Confidence Interval
In practice, the population mean (µ) is hardly ever known. We normally estimate it with a point estimate or
an interval estimate.
A point estimate is a statistic that estimates a parameter. For example, a sample mean is a point estimate of
the parameter population mean (µ).
An interval estimate is an interval of values that is believed to contain the population mean. An interval
estimate of the population mean (µ) is called a confidence interval.
Let's look at an example. Suppose we randomly select 40 SAT math scores and the sample data are as follows:
300  420  540  680
320  430  550  690
350  430  570  700
370  440  590  710
380  450  600  710
380  460  620  720
390  470  650  720
390  470  650  730
400  480  660  730
400  500  670  750
The sample average is 536.75 and the sample standard deviation is about 140.
A point estimate for the mean of the population of SAT math scores would be 536.75.
An interval estimate of the population mean could be (536.75 - 20, 536.75 + 20) = (516.75, 556.75).
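The point estimate and the first interval estimate can be checked with a short Python sketch. It uses only the standard library; the variable names are illustrative, and the half-width of 20 is the arbitrary value used in the text.

```python
import statistics

# The 40 SAT math scores from the example above.
sat_scores = [
    300, 420, 540, 680,
    320, 430, 550, 690,
    350, 430, 570, 700,
    370, 440, 590, 710,
    380, 450, 600, 710,
    380, 460, 620, 720,
    390, 470, 650, 720,
    390, 470, 650, 730,
    400, 480, 660, 730,
    400, 500, 670, 750,
]

sample_mean = statistics.mean(sat_scores)  # point estimate of the population mean
sample_sd = statistics.stdev(sat_scores)   # sample standard deviation (n - 1 divisor)

half_width = 20  # arbitrary half-width taken from the text
interval = (sample_mean - half_width, sample_mean + half_width)

print(f"sample mean = {sample_mean:.2f}")
print(f"sample standard deviation = {sample_sd:.1f}")
print(f"interval estimate = {interval}")
```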
Question: Can we use an interval that is wider than (516.75, 556.75), so that we can be more confident that our interval estimate contains the population mean?
We can make the interval as wide as we want. For example, other possible interval estimates might be:
a) (536.75 - 30, 536.75 + 30) = (506.75, 566.75)
b) (536.75 - 40, 536.75 + 40) = (496.75, 576.75)
c) (536.75 - 50, 536.75 + 50) = (486.75, 586.75)
Hence, the wider the interval estimate, the more confident we can be that the population mean lies inside the interval.
But note that the wider the interval estimate, the less precise our estimate of the population mean.
Consequently, when constructing an interval estimate, we have to decide which is more important: level of confidence or precision.
To understand the logic behind constructing a confidence interval, we will examine the relationship between
sampling distribution and confidence interval.
Sampling Distribution and Confidence Interval
Illustration of 95% Confidence Interval
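[Figure: an underlying population with mean $\mu$ and standard deviation $\sigma$. Samples 1, 2, 3, 4, 5, ..., k, ... of size $n$ are drawn from it, and the sample mean $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5, \ldots, \bar{x}_k, \ldots$ is calculated for each sample. Together these form the population of sample means, which has mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma/\sqrt{n}$.]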
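The relationship between the underlying population and the population of sample means can be illustrated with a small simulation. This is only a sketch: the population values (mean 500, standard deviation 100), the normal shape of the population, and the number of samples drawn are assumptions chosen for illustration, not values from the text.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

MU, SIGMA = 500, 100  # hypothetical population mean and standard deviation
N = 40                # size of each sample
K = 5000              # number of samples drawn


def draw_sample_mean():
    """Draw one sample of size N and return its sample mean."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    return statistics.fmean(sample)


# A (finite) stand-in for the population of sample means.
sample_means = [draw_sample_mean() for _ in range(K)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sd = SIGMA / N ** 0.5  # sigma / sqrt(n)

print(f"mean of sample means = {mean_of_means:.2f} (population mean = {MU})")
print(f"sd of sample means   = {sd_of_means:.2f} (sigma / sqrt(n) = {theoretical_sd:.2f})")
```

The mean of the simulated sample means should land close to the population mean, and their standard deviation close to $\sigma/\sqrt{n}$.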
Confidence interval is defined as follows:
$$\text{Confidence Interval} = \left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)$$
where $\bar{x}$ = sample mean;
$\sigma$ = underlying population standard deviation;
$n$ = sample size;
$z_{\alpha/2}$ = z-score corresponding to the percentage of confidence intervals containing the population mean.
The quantity $z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ is the margin of error.
Hence, confidence interval = sample mean ± margin of error.
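The definition translates directly into code. The following is a minimal sketch; the function names and the input values in the usage lines are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def margin_of_error(z, sigma, n):
    """Margin of error: z times the standard error sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)


def confidence_interval(sample_mean, z, sigma, n):
    """Return (lower, upper) = sample mean -/+ margin of error."""
    e = margin_of_error(z, sigma, n)
    return (sample_mean - e, sample_mean + e)


# Hypothetical inputs: sample mean 100, sigma 15, n 36, z 1.96.
low, high = confidence_interval(sample_mean=100.0, z=1.96, sigma=15.0, n=36)
print(f"({low:.2f}, {high:.2f})")
```

With these inputs the standard error is 15/6 = 2.5, so the margin of error is 1.96 × 2.5 = 4.9.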
For each sample mean $\bar{x}$ in the population of sample means, we can construct a confidence interval. For a 95% confidence interval, we set $z_{\alpha/2}$ to 1.96. Later on we will see how to find the value of $z_{\alpha/2}$ for any level of confidence.
Sample Mean    Confidence Interval
$\bar{x}_1$    $(\bar{x}_1 - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_1 + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_1 - 1.96\,\sigma/\sqrt{n},\ \bar{x}_1 + 1.96\,\sigma/\sqrt{n})$
$\bar{x}_2$    $(\bar{x}_2 - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_2 + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_2 - 1.96\,\sigma/\sqrt{n},\ \bar{x}_2 + 1.96\,\sigma/\sqrt{n})$
$\bar{x}_3$    $(\bar{x}_3 - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_3 + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_3 - 1.96\,\sigma/\sqrt{n},\ \bar{x}_3 + 1.96\,\sigma/\sqrt{n})$
$\bar{x}_4$    $(\bar{x}_4 - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_4 + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_4 - 1.96\,\sigma/\sqrt{n},\ \bar{x}_4 + 1.96\,\sigma/\sqrt{n})$
$\bar{x}_5$    $(\bar{x}_5 - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_5 + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_5 - 1.96\,\sigma/\sqrt{n},\ \bar{x}_5 + 1.96\,\sigma/\sqrt{n})$
...
$\bar{x}_k$    $(\bar{x}_k - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_k + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_k - 1.96\,\sigma/\sqrt{n},\ \bar{x}_k + 1.96\,\sigma/\sqrt{n})$
We can see that the number of confidence intervals is very large. Some of these confidence intervals contain the population mean (µ) and some do not. When we construct a confidence interval, we hope that it contains the population mean.
In practice we do not know whether our constructed confidence interval contains the population mean (µ) or not. We only know what percentage of all possible confidence intervals contain the population mean. This percentage is dictated by the quantity $z_{\alpha/2}$.
When calculating the value of $z_{\alpha/2}$, we will assume that the population of sample means is approximately normal. This holds when the sample size is at least 30 or the underlying population is normal.
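The idea that a fixed percentage of all possible confidence intervals contain the population mean can be checked by simulation. The sketch below assumes a hypothetical normal population (mean 21, standard deviation 6) so that the true mean is known and each interval can be tested against it; in practice the true mean is not available.

```python
import math
import random

random.seed(1)  # fixed seed so the run is reproducible

MU, SIGMA = 21.0, 6.0  # hypothetical population mean and standard deviation
N = 40                 # sample size
Z = 1.96               # z-score used to build each interval
TRIALS = 10000         # number of samples, hence number of intervals

margin = Z * SIGMA / math.sqrt(N)


def interval_contains_mu():
    """Draw one sample, build its interval, and report whether it contains MU."""
    sample_mean = sum(random.gauss(MU, SIGMA) for _ in range(N)) / N
    return sample_mean - margin <= MU <= sample_mean + margin


hits = sum(interval_contains_mu() for _ in range(TRIALS))
coverage = hits / TRIALS
print(f"{100 * coverage:.1f}% of the intervals contain the population mean")
```

With z = 1.96 the printed percentage should be close to 95%; changing `Z` changes the percentage.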
The table below shows some values of $z_{\alpha/2}$ and the corresponding percentage of confidence intervals containing the population mean:
$z_{\alpha/2}$    Percentage of Confidence Intervals Containing Population Mean
1         68.27%
2         95.45%
3         99.73%
1.645     90%
1.96      95%
2.57      99%
(Note: We can use simulation programs at www.simulation-math.com to illustrate the above table.)
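The percentages in the table are areas under the standard normal curve between $-z_{\alpha/2}$ and $z_{\alpha/2}$. As a sketch, they can be reproduced with `NormalDist` from Python's standard `statistics` module; the helper name `central_area` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def central_area(z):
    """Area under the standard normal curve between -z and z."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)


for z in (1, 2, 3, 1.645, 1.96, 2.57):
    print(f"z = {z:<5}  ->  {100 * central_area(z):.2f}%")
```

The value 2.57 is a rounded z-score, so its area comes out slightly under 99%; a z-score of about 2.576 gives 99% more closely.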
The percentage containing the population mean is calculated by using the standard normal distribution. For example, for $z_{\alpha/2}$ = 1.96, we find the area under the standard normal curve between -1.96 and 1.96. This area is 95% and represents the percentage of confidence intervals containing the population mean.
[Figure: standard normal curve with the area between -1.96 and 1.96 shaded in yellow; the shaded area is 95%.]
We can find the area between -1.96 and 1.96 by using one of the computational tools at www.simulation-math.com.
Question: What would be an appropriate value for $z_{\alpha/2}$?
As $z_{\alpha/2}$ increases, the width of the confidence intervals also increases. That is why the percentage of confidence intervals containing the population mean also increases as $z_{\alpha/2}$ increases.
Thus, if we want a higher percentage of confidence intervals to contain the population mean, we choose a larger value for $z_{\alpha/2}$. But note that a larger value for $z_{\alpha/2}$ leads to wider confidence intervals, and hence to less precision in the estimate of the population mean.
In practice, when we want an interval estimate of the population mean, we first specify what percentage of the confidence intervals we want to contain the population mean. Then we find the corresponding $z_{\alpha/2}$.
Suppose we want the percentage containing the population mean to be 95%. Normally we would say we want to construct a confidence interval with a level of confidence of 95%.
For a level of confidence of 95%:
$\alpha$ = 5% = percentage of confidence intervals not containing the population mean
$\alpha/2$ = 2.5% = 0.025
$z_{\alpha/2}$ = 1.96.
For the standard normal distribution, the z-score is 1.96 if the right-tailed area is 0.025.
[Figure: standard normal curve with an area of 95% between -1.96 and 1.96.]
We can find the z-score corresponding to a right-tailed area of 0.025 by using one of the computational tools at www.simulation-math.com.
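As an alternative to the online tool, the z-score for a given level of confidence can be computed with the inverse of the standard normal cumulative distribution. This is a sketch using `NormalDist.inv_cdf` from Python's standard library; the function name `z_for_confidence` is illustrative.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Return z such that the right-tailed area beyond z is alpha / 2.

    `level` is the level of confidence as a fraction, e.g. 0.95.
    """
    alpha = 1 - level
    # A right-tailed area of alpha / 2 means a left area of 1 - alpha / 2.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"level of confidence {level:.0%}: z = {z_for_confidence(level):.4f}")
```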
Since the level of confidence is specified at 95%, 95% of confidence intervals contain population mean (µ) and
5% do not. Hence, α = 5%.
In practice, to construct a confidence interval, we select a sample of size n at random and then calculate the
sample mean and confidence interval. We do not know if our confidence interval is part of the 95% or part of
the 5%. All we know is that of all possible confidence intervals -- where level of confidence is set at 95% -- 95%
of them contain the population mean and 5% do not.
Thus, we can only say that we are 95% confident that our confidence interval contains the population mean (µ).
Constructing 95% Confidence Interval
The population of ACT scores has a standard deviation of 6. Suppose we randomly select a sample of 40 ACT scores and the data are as follows:
 6  14  18  22
 7  15  18  22
 8  15  19  22
10  16  19  23
11  16  19  23
11  17  20  23
12  17  20  24
12  18  21  24
13  18  21  25
13  18  22  27
Find a confidence interval with a level of confidence of 95%.
Solution:
From the population of ACT scores we can form many, many samples of size 40. One of these many, many samples is the sample of 40 scores listed above.
Note: Sample mean = $\bar{x}$ = 17.48.
For each sample of size 40, a sample mean can be calculated. Hence there are many, many sample means. For each sample mean a confidence interval can be formed. Consequently, there are many, many confidence intervals.
Since our level of confidence is set at 95%, 95% of all confidence intervals contain the population mean and 5% do not.
Since the sample size is greater than 30, the distribution of the sample means is approximately normal and 95% of the z-scores lie between $-z_{\alpha/2}$ and $z_{\alpha/2}$.
[Figure: standard normal curve with Left Area = 2.5%, Middle Area = 95%, Right Area = 2.5%.]
$z_{\alpha/2}$ is the z-score corresponding to a right area of 0.025. Hence, $z_{\alpha/2}$ = 1.96.
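From the earlier discussion, a confidence interval has the form:
$$\left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)$$
where $\bar{x}$ is the sample mean;
$z_{\alpha/2}$ is the number of standard errors the interval extends on each side of the sample mean;
$\sigma$ is the standard deviation of the population of ACT scores;
$n$ is the sample size.
$$\text{Standard Error} = \frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{40}} \approx 0.9486$$
$$\text{Confidence Interval} = \left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)$$
$$= (17.475 - 1.96 \times 0.9486,\ 17.475 + 1.96 \times 0.9486)$$
$$= (15.62,\ 19.33)$$
(The endpoints use the unrounded sample mean 17.475; with the rounded value 17.48 the upper endpoint would round to 19.34.)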
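The whole calculation can be reproduced from the raw data. The sketch below uses the 40 ACT scores and the population standard deviation of 6 given in the example; the variable names are illustrative.

```python
import math
import statistics

# The 40 ACT scores from the example.
act_scores = [
    6, 14, 18, 22,
    7, 15, 18, 22,
    8, 15, 19, 22,
    10, 16, 19, 23,
    11, 16, 19, 23,
    11, 17, 20, 23,
    12, 17, 20, 24,
    12, 18, 21, 24,
    13, 18, 21, 25,
    13, 18, 22, 27,
]

SIGMA = 6    # population standard deviation, given in the example
Z_95 = 1.96  # z-score for a 95% level of confidence

n = len(act_scores)
x_bar = statistics.mean(act_scores)    # unrounded sample mean
standard_error = SIGMA / math.sqrt(n)  # sigma / sqrt(n)
margin = Z_95 * standard_error         # margin of error
ci_95 = (x_bar - margin, x_bar + margin)

print(f"n = {n}, sample mean = {x_bar}")
print(f"standard error = {standard_error:.4f}")
print(f"95% confidence interval = ({ci_95[0]:.2f}, {ci_95[1]:.2f})")
```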
Comments:
We do not know if (15.62, 19.33) contains the population mean or not, since this interval is one of many, many confidence intervals. However, since we know that 95% of the confidence intervals do contain the population mean, we can be 95% confident that the interval (15.62, 19.33) does contain the population mean.
Also, the interval (15.62, 19.33) is an interval estimate of the population mean.
Illustration of 99% Confidence Interval
[Figure: an underlying population with mean $\mu$ and standard deviation $\sigma$. Samples 1, 2, 3, 4, 5, ..., k, ... of size $n$ are drawn from it, and the sample mean $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5, \ldots, \bar{x}_k, \ldots$ is calculated for each sample. Together these form the population of sample means, which has mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma/\sqrt{n}$.]
$$\text{Confidence Interval} = \left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)$$
Sample Mean    Confidence Interval
$\bar{x}_1$    $(\bar{x}_1 - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_1 + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_1 - 2.57\,\sigma/\sqrt{n},\ \bar{x}_1 + 2.57\,\sigma/\sqrt{n})$
$\bar{x}_2$    $(\bar{x}_2 - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_2 + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_2 - 2.57\,\sigma/\sqrt{n},\ \bar{x}_2 + 2.57\,\sigma/\sqrt{n})$
$\bar{x}_3$    $(\bar{x}_3 - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_3 + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_3 - 2.57\,\sigma/\sqrt{n},\ \bar{x}_3 + 2.57\,\sigma/\sqrt{n})$
$\bar{x}_4$    $(\bar{x}_4 - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_4 + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_4 - 2.57\,\sigma/\sqrt{n},\ \bar{x}_4 + 2.57\,\sigma/\sqrt{n})$
$\bar{x}_5$    $(\bar{x}_5 - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_5 + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_5 - 2.57\,\sigma/\sqrt{n},\ \bar{x}_5 + 2.57\,\sigma/\sqrt{n})$
...
$\bar{x}_k$    $(\bar{x}_k - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_k + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_k - 2.57\,\sigma/\sqrt{n},\ \bar{x}_k + 2.57\,\sigma/\sqrt{n})$
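For a level of confidence of 99%:
$\alpha$ = 1%
$\alpha/2$ = 0.5% = 0.005
$z_{\alpha/2}$ = 2.57.
For the standard normal distribution, the z-score is 2.57 if the right-tailed area is 0.005.
[Figure: standard normal curve with an area of 99% between -2.57 and 2.57, obtained with a computational tool at www.simulation-math.com.]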
Since the level of confidence is specified at 99%, 99% of confidence intervals contain population mean (µ) and
1% do not. Hence, α = 1%.
In practice, to construct a confidence interval, we select a sample of size n at random and then calculate the
sample mean and confidence interval. We do not know if our confidence interval is part of the 99% or part of
the 1%. All we know is that of all possible confidence intervals -- where level of confidence is set at 99% -- 99%
of them contain the population mean and 1% do not.
Thus, we can only say that we are 99% confident that our confidence interval contains the population mean (µ).
Example:
The population of ACT scores has a standard deviation of 6. Suppose we randomly select a sample of 40 ACT scores and the data are as follows:
 6  14  18  22
 7  15  18  22
 8  15  19  22
10  16  19  23
11  16  19  23
11  17  20  23
12  17  20  24
12  18  21  24
13  18  21  25
13  18  22  27
Find a confidence interval with a level of confidence of 99%.
Solution:
From the population of ACT scores we can form many, many samples of size 40. One of these many, many samples is the sample of 40 scores listed above.
Note: Sample mean = $\bar{x}$ = 17.48.
For each sample of size 40, a sample mean can be calculated. Hence there are many, many sample means. For each sample mean a confidence interval can be formed. Consequently, there are many, many confidence intervals.
Since our level of confidence is set at 99%, 99% of all confidence intervals contain the population mean and 1% do not.
Since the sample size is greater than 30, the distribution of the sample means is approximately normal and 99% of the z-scores lie between $-z_{\alpha/2}$ and $z_{\alpha/2}$.
[Figure: standard normal curve with Left Area = 0.5%, Middle Area = 99%, Right Area = 0.5%.]
$z_{\alpha/2}$ is the z-score corresponding to a right area of 0.005. Hence, $z_{\alpha/2}$ = 2.57.
From the earlier discussion, a confidence interval has the form:
$$\left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)$$
where $\bar{x}$ is the sample mean;
$z_{\alpha/2}$ is the number of standard errors the interval extends on each side of the sample mean;
$\sigma$ is the standard deviation of the population of ACT scores;
$n$ is the sample size.
$$\text{Standard Error} = \frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{40}} \approx 0.9486$$
$$\text{Confidence Interval} = \left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)$$
$$= (17.48 - 2.57 \times 0.9486,\ 17.48 + 2.57 \times 0.9486)$$
$$= (15.042,\ 19.918)$$
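The value 2.57 is a rounded z-score. As a sketch of how much the rounding matters, the following compares the interval built with 2.57 against the one built with the z-score returned by `NormalDist.inv_cdf` for a right-tailed area of 0.005. The sample mean 17.48, the standard deviation 6, and the sample size 40 come from the example; the helper name `interval` is illustrative.

```python
import math
from statistics import NormalDist

X_BAR, SIGMA, N = 17.48, 6, 40
standard_error = SIGMA / math.sqrt(N)


def interval(z):
    """Confidence interval around X_BAR for a given z-score."""
    margin = z * standard_error
    return (X_BAR - margin, X_BAR + margin)


z_rounded = 2.57                       # value used in the text
z_exact = NormalDist().inv_cdf(0.995)  # left area 0.995 = right-tailed area 0.005

ci_rounded = interval(z_rounded)
ci_exact = interval(z_exact)

print(f"z = {z_rounded}:    ({ci_rounded[0]:.3f}, {ci_rounded[1]:.3f})")
print(f"z = {z_exact:.4f}:  ({ci_exact[0]:.3f}, {ci_exact[1]:.3f})")
```

The interval from the unrounded z-score is slightly wider, but the two agree to about two decimal places.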
Comments:
We do not know if (15.042, 19.918) contains the population mean or not, since this interval is one of many, many confidence intervals. However, since we know that 99% of the confidence intervals do contain the population mean, we can be 99% confident that the interval (15.042, 19.918) does contain the population mean.
Also, the interval (15.042, 19.918) is an interval estimate of the population mean.
Illustration of 90% Confidence Interval
[Figure: an underlying population with mean $\mu$ and standard deviation $\sigma$. Samples 1, 2, 3, 4, 5, ..., k, ... of size $n$ are drawn from it, and the sample mean $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5, \ldots, \bar{x}_k, \ldots$ is calculated for each sample. Together these form the population of sample means, which has mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma/\sqrt{n}$.]
$$\text{Confidence Interval} = \left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)$$
Sample Mean    Confidence Interval
$\bar{x}_1$    $(\bar{x}_1 - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_1 + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_1 - 1.645\,\sigma/\sqrt{n},\ \bar{x}_1 + 1.645\,\sigma/\sqrt{n})$
$\bar{x}_2$    $(\bar{x}_2 - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_2 + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_2 - 1.645\,\sigma/\sqrt{n},\ \bar{x}_2 + 1.645\,\sigma/\sqrt{n})$
$\bar{x}_3$    $(\bar{x}_3 - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_3 + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_3 - 1.645\,\sigma/\sqrt{n},\ \bar{x}_3 + 1.645\,\sigma/\sqrt{n})$
$\bar{x}_4$    $(\bar{x}_4 - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_4 + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_4 - 1.645\,\sigma/\sqrt{n},\ \bar{x}_4 + 1.645\,\sigma/\sqrt{n})$
$\bar{x}_5$    $(\bar{x}_5 - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_5 + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_5 - 1.645\,\sigma/\sqrt{n},\ \bar{x}_5 + 1.645\,\sigma/\sqrt{n})$
...
$\bar{x}_k$    $(\bar{x}_k - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x}_k + z_{\alpha/2}\,\sigma/\sqrt{n}) = (\bar{x}_k - 1.645\,\sigma/\sqrt{n},\ \bar{x}_k + 1.645\,\sigma/\sqrt{n})$
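For a level of confidence of 90%:
$\alpha$ = 10%
$\alpha/2$ = 5% = 0.05
$z_{\alpha/2}$ = 1.644854.
For the standard normal distribution, the z-score is 1.644854 if the right-tailed area is 0.05.
[Figure: standard normal curve with an area of 90% between -1.644854 and 1.644854.]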
Since the level of confidence is specified at 90%, 90% of confidence intervals contain population mean (µ) and
10% do not.
Hence, α = 10%.
In practice, to construct a confidence interval, we select a sample of size n at random and then calculate the sample mean and confidence interval. We do not know if our confidence interval is part of the 90% or part of the 10%. All we know is that of all possible confidence intervals (where the level of confidence is set at 90%), 90% of them contain the population mean and 10% do not.
Thus, we can only say that we are 90% confident that our confidence interval contains the population mean (µ).
Example:
The population of ACT scores has a standard deviation of 6. Suppose we randomly select a sample of 40 ACT scores and the data are as follows:
 6  14  18  22
 7  15  18  22
 8  15  19  22
10  16  19  23
11  16  19  23
11  17  20  23
12  17  20  24
12  18  21  24
13  18  21  25
13  18  22  27
Find a confidence interval with a level of confidence of 90%.
Solution:
From the population of ACT scores we can form many, many samples of size 40. One of these many, many samples is the sample of 40 scores listed above.
Note: Sample mean = $\bar{x}$ = 17.48.
For each sample of size 40, a sample mean can be calculated. Hence there are many, many sample means. For each sample mean a confidence interval can be formed. Consequently, there are many, many confidence intervals.
Since our level of confidence is set at 90%, 90% of all confidence intervals contain the population mean and 10% do not.
Since the sample size is greater than 30, the distribution of the sample means is approximately normal and 90% of the z-scores lie between $-z_{\alpha/2}$ and $z_{\alpha/2}$.
[Figure: standard normal curve with Left Area = 5%, Middle Area = 90%, Right Area = 5%.]
$z_{\alpha/2}$ is the z-score corresponding to a right area of 0.05. Hence, $z_{\alpha/2}$ = 1.644854.
From the earlier discussion, a confidence interval has the form:
$$\left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)$$
where $\bar{x}$ is the sample mean;
$z_{\alpha/2}$ is the number of standard errors the interval extends on each side of the sample mean;
$\sigma$ is the standard deviation of the population of ACT scores;
$n$ is the sample size.
$$\text{Standard Error} = \frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{40}} \approx 0.9486$$
$$\text{Confidence Interval} = \left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)$$
$$= (17.48 - 1.645 \times 0.9486,\ 17.48 + 1.645 \times 0.9486)$$
$$= (15.919,\ 19.040)$$
Comments:
We do not know if (15.919, 19.040) contains the population mean or not, since this interval is one of many, many confidence intervals. However, since we know that 90% of the confidence intervals do contain the population mean, we can be 90% confident that the interval (15.919, 19.040) does contain the population mean.
Also, the interval (15.919, 19.040) is an interval estimate of the population mean.
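The trade-off between level of confidence and precision can be seen by placing the three intervals for the same ACT sample side by side. The sketch below uses the rounded z-scores from the text (1.645, 1.96, 2.57) and the rounded sample mean 17.48, so the endpoints may differ from the worked examples in the last decimal place.

```python
import math

X_BAR, SIGMA, N = 17.48, 6, 40
standard_error = SIGMA / math.sqrt(N)

# Rounded z-scores used in the three worked examples.
Z_VALUES = {"90%": 1.645, "95%": 1.96, "99%": 2.57}

widths = {}
for level, z in Z_VALUES.items():
    margin = z * standard_error
    low, high = X_BAR - margin, X_BAR + margin
    widths[level] = high - low
    print(f"{level}: ({low:.3f}, {high:.3f})  width = {widths[level]:.3f}")
```

The width grows with the level of confidence: the 99% interval is the widest and therefore the least precise of the three.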