Confidence Intervals

The point estimators of population
parameters ($\bar{X}$ and $\hat{p}$ in our case) are random
variables, and they follow (at least approximately) a normal
distribution. Their expected values are the
values of the population parameters ($\mu$ and $p$), and their
standard deviations are $\sigma/\sqrt{n}$ and $\sqrt{p(1-p)/n}$,
respectively.



We can consider each point estimate to be one
of the values of a random variable that follows a
normal distribution.
We want to use our point estimate to find the
population parameter.
However, the probability that our point
estimate actually IS the population
parameter is VERY small.

Therefore, the conclusion is that point
estimates are not very helpful in terms of
finding the population parameter if used
ALONE.


We need to take a different strategy.
Suppose we play a game: I have one number
in my mind, with some restrictions, and you
will guess what that number is.
• A. Your guess can only be ONE number.
• B. Your guess can be a range (or interval) of
numbers.
In which case do you think you have a better
chance of getting the number?



Clearly, B gives you a better chance.
Let’s do that in our estimation here.
Naturally, that strategy is called
Interval Estimation.

Idea of interval estimation.

Form of interval estimation:
• Point Estimate ± Margin of Error
• The result is usually called a confidence interval.
• Recall some of the poll results of the presidential
election: “someone is leading at 52% with a margin
of error of 3%”.

Two things we need to know about interval
estimation:
1. How to do it.
2. How to use it.



Let’s start with the second one, which is true
for all types of interval estimation.
We use the term “confidence” interval, which
means we should have some confidence in
the interval estimation we come up with.
How do we quantify “confidence”?


Remember that if we draw many, many simple random
samples (SRS) and calculate the point estimate for each one
of them, the mean of those point estimates
should be the population parameter of
interest.
A similar idea applies to confidence intervals.



Given that we take many, many SRS and
calculate the confidence interval around the
point estimate from each sample, we expect
a certain proportion of those confidence intervals
to cover the true population parameter.
That “proportion” is our measure of
confidence.
Usually, we use 95% as our confidence level.


Now, how to calculate confidence intervals:
For the sample mean, $\bar{X}$, the confidence interval
is in the form of:
$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
where $\bar{X}$ is the sample mean, $n$ is the sample
size, and $\sigma$ is the population standard deviation.



Now let’s look at $Z_{\alpha/2}$, which is how we take
“confidence” into account.
We assume the sample mean is normally
distributed, or at least approximately
normally distributed.
With 95% probability, the sample mean falls within the range
$\mu \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
around the population parameter $\mu$.

Then if a point estimate $\bar{X}$ is within that distance of $\mu$,
the interval $\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ should cover $\mu$.
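The step from "the sample mean is close to $\mu$" to "the interval covers $\mu$" can be written out as a short derivation. This is a sketch under the normality assumption stated above, with $1 - \alpha$ denoting the confidence level (0.95 in the running example):

```latex
\begin{align*}
P\!\left(\mu - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &= 1 - \alpha \\
\Longleftrightarrow\quad
P\!\left(\left|\bar{X} - \mu\right| \le Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &= 1 - \alpha \\
\Longleftrightarrow\quad
P\!\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &= 1 - \alpha
\end{align*}
```

The distance between $\bar{X}$ and $\mu$ is the same whichever of the two we measure from, so the same margin works in both directions.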


Next question: how do we decide $Z_{\alpha/2}$?
If we are interested in 95%, then $\alpha = 0.05$. Taking into
account that the normal distribution is
symmetric, we split $\alpha$ equally between the two tails,
so we should use the cutoff at
97.5%.

Therefore, a 95% confidence interval based on $\bar{X}$ is
$\bar{X} \pm 1.96 \frac{\sigma}{\sqrt{n}}$
Also, we can try other confidence levels: for
example, if we want a 99% confidence
interval, we should use 2.576, and if we want a
90% confidence interval, we should use 1.645.
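The cutoff values quoted above can be checked with the Python standard library. The following is a minimal sketch; the helper name `z_critical` is made up for illustration and is not part of the original slides.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Two-sided standard normal cutoff for a given confidence level.

    Hypothetical helper for illustration only.
    """
    alpha = 1.0 - confidence_level
    # The normal curve is symmetric, so alpha is split equally between
    # the two tails and the cutoff sits at the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

For 95% this looks up the 97.5% quantile, which is where the value 1.96 comes from.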


Example: Average GPA of management
students.
Using the previous example, if we know that
the standard deviation of GPA among
management students is 0.8, create a 95%
confidence interval for the estimate we got in
our sample. How about a 90% confidence interval?
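A minimal sketch of this calculation in Python follows. The sample mean (3.0) and sample size (64) are hypothetical placeholders, since the earlier example's numbers are not reproduced here; only the standard deviation of 0.8 comes from the example. The function name is likewise made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, confidence_level=0.95):
    """Return (low, high) for sample_mean +/- z * sigma / sqrt(n).

    Assumes the population standard deviation sigma is known.
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample results (NOT from the original example).
x_bar, n = 3.0, 64
sigma = 0.8  # given in the example

for level in (0.95, 0.90):
    low, high = mean_confidence_interval(x_bar, sigma, n, level)
    print(f"{level:.0%} CI: ({low:.3f}, {high:.3f})")
```

With the same sample, the 90% interval comes out narrower than the 95% interval, because a smaller cutoff value is used.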

Some observations:
1. Given that a sample has been drawn, the margin
of error, $Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$, depends only on the
confidence level. The higher the confidence level,
the larger the margin of error.
2. The width of the confidence interval also depends
on the confidence level: the higher the confidence
level, the wider the confidence interval.

Interpretation of a confidence interval (at 95%,
for example):
• Basically, if we draw many samples and calculate
the confidence interval for each, about 95% of those confidence
intervals will cover the true population parameter.
• Or, since each sample is equally likely, we can say that
the interval we are about to compute has a 95% chance of
covering the true population parameter. Once a particular
interval has been calculated, it either covers the parameter
or it does not.
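The "many samples" interpretation can be illustrated with a small simulation. This is a sketch under assumed values: the population mean (3.0), sample size (30), number of trials (2000), and the function name are all hypothetical choices made for the demonstration, and the population is taken to be exactly normal.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_rate(mu, sigma, n, confidence_level=0.95, trials=2000, seed=0):
    """Fraction of simulated confidence intervals that cover mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    margin = z * sigma / sqrt(n)
    covered = 0
    for _ in range(trials):
        # Draw one simple random sample from a normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        # Does the interval x_bar +/- margin contain the true mean?
        if x_bar - margin <= mu <= x_bar + margin:
            covered += 1
    return covered / trials


rate = coverage_rate(mu=3.0, sigma=0.8, n=30)
print(f"Share of intervals covering mu: {rate:.3f}")
```

The printed share should land close to 0.95, though not exactly on it, because the simulation itself is a random experiment.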


A confidence interval is something that we
calculate after we draw a sample and
calculate the point estimate.
You may also ask: what if we do NOT have the
population variance/standard deviation?

What if we want a confidence
interval for a sample proportion? It is very similar:
$\hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
And everything else follows as it does for the
sample mean.

Example: if we want to find a 95% confidence
interval for the proportion of management
students whose GPA is higher than 2.8, what
shall we do and how shall we interpret it?
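One way to carry this out is sketched below in Python. The counts (60 of 100 sampled students with GPA above 2.8) are hypothetical, since no sample data is given for this example, and the function name is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence_level=0.95):
    """Return (low, high) for p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    margin = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical sample: 60 of 100 students have a GPA above 2.8.
low, high = proportion_confidence_interval(successes=60, n=100)
print(f"95% CI for the proportion: ({low:.3f}, {high:.3f})")
```

The interpretation is the same as for the mean: if the sampling were repeated many times, about 95% of intervals built this way would cover the true proportion.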



So far we have assumed that you have the
power to make and carry out decisions about the
sampling scheme and data collection.
Unfortunately, we do not have that power or
those resources most of the time.
In real research, we are always faced with the
question: how large should your
sample be to get a desired margin of
error?

If the population parameter of interest is the
population mean, we will assume that we
know the population standard deviation and
use the following formula:
$n = \frac{(Z_{\alpha/2}\,\sigma)^2}{E^2}$
where $E$ is the desired margin of error.

If our interest is in the population proportion,
then use the following formula:
$n = \frac{(Z_{\alpha/2})^2\, p(1-p)}{E^2}$

How to find $p$?
1. Use other people’s results; 2. Use a pilot study;
3. Use judgment; 4. Start with p = 0.5.

Example: You are interested in the average
GPA of management students, and you know
that the standard deviation of GPA among
management students is 0.8. You want your
estimate to be off by at most 0.4. How will
you determine the number of students to
collect information from?
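A minimal Python sketch of this calculation follows. The example does not state a confidence level, so 95% is assumed here; the function name is hypothetical, and rounding up to the next whole student is a choice made so that the margin of error is not exceeded.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin_of_error, confidence_level=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin_of_error."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    # n = (z * sigma)^2 / E^2, rounded up to a whole observation.
    return ceil((z * sigma / margin_of_error) ** 2)


# sigma = 0.8 and E = 0.4 come from the example; 95% is an assumption.
print(sample_size_for_mean(sigma=0.8, margin_of_error=0.4))
```

Under these assumptions the formula gives about 15.4, so 16 students would be needed.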

Example: You are interested in how many
chocolate beans there are in an M&M packet.
Someone from the factory told you that the
standard deviation of the number of chocolate
beans is 10. How many packs of M&M will
you have to sample to get an estimate that
is within 3 beans of the actual average number of beans
in each packet? What if you change your
desired margin of error to 20?
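As a worked sketch of the arithmetic, assuming a 95% confidence level (the example does not state one), so that $Z_{\alpha/2} = 1.96$, and rounding up to a whole packet:

```latex
\begin{align*}
E = 3:\quad  n &= \frac{(1.96 \times 10)^2}{3^2}  = \frac{384.16}{9}   \approx 42.7 \;\Rightarrow\; n = 43 \\
E = 20:\quad n &= \frac{(1.96 \times 10)^2}{20^2} = \frac{384.16}{400} \approx 0.96 \;\Rightarrow\; n = 1
\end{align*}
```

Loosening the margin of error from 3 to 20 shrinks the required sample dramatically, because $n$ is inversely proportional to $E^2$.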

Example: Again, you are interested in the
proportion of management students with a
GPA higher than 2.8. You heard from
someone who has done this study before
that the proportion in her study was 60%. If you
decide to be off by no more than 5%, how
many students should you include in your
sample?
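This can be sketched in Python as follows. A 95% confidence level is assumed because the example does not give one; the function name is hypothetical, and the result is rounded up to a whole student.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p, margin_of_error, confidence_level=0.95):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin_of_error."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    # n = z^2 * p * (1 - p) / E^2, rounded up to a whole observation.
    return ceil(z ** 2 * p * (1.0 - p) / margin_of_error ** 2)


# p = 0.60 from the earlier study, E = 0.05; 95% is an assumption.
print(sample_size_for_proportion(p=0.60, margin_of_error=0.05))

# With no prior information, p = 0.5 gives the most conservative answer.
print(sample_size_for_proportion(p=0.50, margin_of_error=0.05))
```

Under these assumptions the first call gives 369 students. Starting from p = 0.5 instead gives 385, which is the largest sample the formula can ask for at this margin, since p(1 - p) peaks at 0.5.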