Confidence Interval with Unknown Sigma and t-distribution
When the population standard deviation is unknown, we substitute the sample standard deviation for it.
Because of this substitution, constructing a confidence interval requires the Student's t-distribution.
What is a t-distribution?
Assumption: Sampling Distribution is normal (or approximately normal).
The process of forming a t-distribution population is illustrated below:
Underlying population: mean = $\mu$ and standard deviation = $\sigma$.
Draw samples of size n from the underlying population: Sample 1, Sample 2, Sample 3, Sample 4, Sample 5, ..., Sample k.
Calculate the sample mean and standard deviation for each sample: $\bar{x}_1; s_1$, $\bar{x}_2; s_2$, $\bar{x}_3; s_3$, $\bar{x}_4; s_4$, $\bar{x}_5; s_5$, ..., $\bar{x}_k; s_k$.
Population of sample means $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5, \ldots, \bar{x}_k$: mean = $\mu_{\bar{x}} = \mu$ and standard deviation = $\sigma / \sqrt{n}$.
Calculate the t-score for each sample mean:
$$t_k = \frac{\bar{x}_k - \mu}{s_k / \sqrt{n}}$$
Form a new population consisting of the t-scores $t_1, t_2, t_3, t_4, t_5, \ldots, t_k$.
The population consisting of t-scores has a t-distribution with mean of 0 and (n - 1) degrees of freedom, where n is the sample size.
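Sample Mean    Sample Standard Deviation    t-score
$\bar{x}_1$    $s_1$                        $t_1 = \frac{\bar{x}_1 - \mu}{s_1 / \sqrt{n}}$
$\bar{x}_2$    $s_2$                        $t_2 = \frac{\bar{x}_2 - \mu}{s_2 / \sqrt{n}}$
$\bar{x}_3$    $s_3$                        $t_3 = \frac{\bar{x}_3 - \mu}{s_3 / \sqrt{n}}$
$\bar{x}_4$    $s_4$                        $t_4 = \frac{\bar{x}_4 - \mu}{s_4 / \sqrt{n}}$
$\bar{x}_5$    $s_5$                        $t_5 = \frac{\bar{x}_5 - \mu}{s_5 / \sqrt{n}}$
...            ...                          ...
$\bar{x}_k$    $s_k$                        $t_k = \frac{\bar{x}_k - \mu}{s_k / \sqrt{n}}$
The population of t-scores $t_1, t_2, t_3, t_4, t_5, \ldots, t_k$ has a t-distribution with mean 0 and (n - 1) degrees of freedom, where n is the sample size.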
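The process described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the population is taken to be normal, and the values of `mu`, `sigma`, `n` and `k` are hypothetical choices that do not come from the text.

```python
import math
import random
import statistics


def simulate_t_scores(mu, sigma, n, k, seed=0):
    """Draw k samples of size n from a normal population and return their t-scores."""
    rng = random.Random(seed)
    t_scores = []
    for _ in range(k):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)   # sample mean
        s = statistics.stdev(sample)       # sample standard deviation (n - 1 divisor)
        t_scores.append((x_bar - mu) / (s / math.sqrt(n)))
    return t_scores


# Hypothetical population (mean 50, standard deviation 10) and sample size 40.
t_scores = simulate_t_scores(mu=50.0, sigma=10.0, n=40, k=10000)
print("mean of t-scores:", statistics.fmean(t_scores))
print("variance of t-scores:", statistics.variance(t_scores))
```

With many samples, the mean of the simulated t-scores should be close to 0, in line with the statement that the population of t-scores has mean 0.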
A confidence interval is defined as follows:
$$\text{Confidence Interval} = \left( \bar{x} - t_{\alpha/2} \frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2} \frac{s}{\sqrt{n}} \right)$$
where $\bar{x}$ = sample mean;
s = sample standard deviation;
n = sample size;
$t_{\alpha/2}$ = t-score, which depends on the degrees of freedom (n - 1) and the level of confidence.
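The definition above translates directly into a few lines of code. The function name `t_interval` and the input values below are hypothetical, chosen only to illustrate the formula; the t-score is passed in rather than computed.

```python
import math


def t_interval(x_bar, s, n, t_crit):
    """Return (lower, upper) = x_bar -/+ t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical inputs, not taken from the text: a sample of 36 values with
# mean 10 and standard deviation 3. The value 2.030 is the tabulated t-score
# for 35 degrees of freedom and a right-tailed area of 0.025.
lower, upper = t_interval(x_bar=10.0, s=3.0, n=36, t_crit=2.030)
print(lower, upper)
```

Here the standard error is 3 / 6 = 0.5, so the interval extends 2.030 × 0.5 = 1.015 on each side of the sample mean.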
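For each sample mean $\bar{x}$ in the sampling population, we can construct a confidence interval.
Sample Mean    Sample Standard Deviation    Confidence Interval
$\bar{x}_1$    $s_1$                        $\left( \bar{x}_1 - t_{\alpha/2} \frac{s_1}{\sqrt{n}},\ \bar{x}_1 + t_{\alpha/2} \frac{s_1}{\sqrt{n}} \right)$
$\bar{x}_2$    $s_2$                        $\left( \bar{x}_2 - t_{\alpha/2} \frac{s_2}{\sqrt{n}},\ \bar{x}_2 + t_{\alpha/2} \frac{s_2}{\sqrt{n}} \right)$
$\bar{x}_3$    $s_3$                        $\left( \bar{x}_3 - t_{\alpha/2} \frac{s_3}{\sqrt{n}},\ \bar{x}_3 + t_{\alpha/2} \frac{s_3}{\sqrt{n}} \right)$
$\bar{x}_4$    $s_4$                        $\left( \bar{x}_4 - t_{\alpha/2} \frac{s_4}{\sqrt{n}},\ \bar{x}_4 + t_{\alpha/2} \frac{s_4}{\sqrt{n}} \right)$
$\bar{x}_5$    $s_5$                        $\left( \bar{x}_5 - t_{\alpha/2} \frac{s_5}{\sqrt{n}},\ \bar{x}_5 + t_{\alpha/2} \frac{s_5}{\sqrt{n}} \right)$
...            ...                          ...
$\bar{x}_k$    $s_k$                        $\left( \bar{x}_k - t_{\alpha/2} \frac{s_k}{\sqrt{n}},\ \bar{x}_k + t_{\alpha/2} \frac{s_k}{\sqrt{n}} \right)$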
We can see that the number of confidence intervals is very large. Some of these confidence intervals contain
the population mean (µ) and some do not. When we construct a confidence interval, we hope that it
contains the population mean. In practice, we do not know whether our constructed confidence
interval contains the population mean (µ) or not. We only know what percentage of all possible confidence
intervals contain the population mean. That percentage is dictated by the quantity $t_{\alpha/2}$,
which is calculated from the level of confidence.
When calculating the value of $t_{\alpha/2}$, we will assume that the sampling population is normal (or approximately
normal). In other words, the sample size is at least 30 or the underlying population is normal.
The table below shows some values of $t_{\alpha/2}$ and the corresponding percentage of confidence intervals
containing the population mean:
$t_{\alpha/2}$    Degrees of Freedom = n - 1    Percentage of Confidence Intervals Containing Population Mean
                  (sample size minus 1)         (same value as the level of confidence)
1                 35                            67.48%
2                 35                            94.67%
3                 35                            99.53%
1                 39                            67.55%
2                 39                            94.75%
3                 39                            99.56%
1                 99                            67.93%
2                 99                            95.17%
3                 99                            99.68%
(Note: We can use simulation programs at www.simulation-math.com to illustrate the above table.)
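In the spirit of the note above, the following is a minimal simulation sketch, not the programs referred to in the note. It assumes a normal population with hypothetical parameters (`mu = 0`, `sigma = 1`) and counts how often the interval built from each simulated sample contains the population mean. Because it is a simulation, its output varies with the seed and will only approximate tabulated percentages.

```python
import math
import random
import statistics


def coverage(t_value, n, trials=5000, mu=0.0, sigma=1.0, seed=1):
    """Fraction of simulated intervals x_bar -/+ t_value * s / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        margin = t_value * statistics.stdev(sample) / math.sqrt(n)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


# Sample size 40 corresponds to 39 degrees of freedom.
for t_value in (1, 2, 3):
    print(t_value, coverage(t_value, n=40))
```

A larger multiplier gives a wider interval, so the fraction of intervals containing the population mean increases with the multiplier.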
Calculating $t_{\alpha/2}$ Given the Level of Confidence
Illustration of a 95% Confidence Interval with Sample Size 40
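Underlying population: mean = $\mu$ and standard deviation = $\sigma$.
Samples drawn from the underlying population: Sample 1, Sample 2, Sample 3, Sample 4, Sample 5, ..., Sample k.
Calculate the sample mean for each sample: $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5, \ldots, \bar{x}_k$.
Population of sample means: mean = $\mu_{\bar{x}} = \mu$ and standard deviation = $\sigma / \sqrt{n}$.
$$\text{Confidence Interval} = \left( \bar{x} - t_{\alpha/2} \frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2} \frac{s}{\sqrt{n}} \right)$$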
Level of confidence = 95%.
$\alpha$ = 5%
$\alpha/2$ = 2.5% = 0.025
Degrees of freedom = 40 - 1 = 39
[Figure: t-distribution curve with $t_{\alpha/2}$ marked; left area = 2.5%, middle area = 95%, right area = 2.5%.]
For a right-tailed area of 0.025 and 39 degrees of freedom, the corresponding t-score is 2.019.
Hence, $t_{\alpha/2}$ = 2.019.
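The lookup above can also be written as a probability statement. In this sketch, $T_{39}$ is notation introduced here (it does not appear in the original) for a random variable that has a t-distribution with 39 degrees of freedom:

```latex
\alpha = 1 - 0.95 = 0.05, \qquad \frac{\alpha}{2} = 0.025, \qquad \text{df} = n - 1 = 40 - 1 = 39

P\left(T_{39} > t_{\alpha/2}\right) = 0.025
\quad\Longleftrightarrow\quad
P\left(-t_{\alpha/2} \le T_{39} \le t_{\alpha/2}\right) = 0.95
```

The two statements are equivalent because the t-distribution is symmetric about 0, so the left and right tails each hold half of $\alpha$.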
Constructing a Confidence Interval with Unknown Population Standard Deviation
Suppose we randomly select 40 students and ask them how many hours of TV they watch per week.
The sample data are as follows:
2      3.5    4      6
3      3.5    4      6
3      2.5    4      7
2.5    5      4      7
2.5    4.5    5      8
3      2.5    5      8.5
4      2.5    5.5    4.5
4.5    2      4      9
4.5    1      4      10
4.5    5      3      11
Find a point estimate of the population mean and a 95% confidence interval for the population mean.
Solution:
The sample mean for this set of data is 4.625. Hence, a point estimate of the population mean is 4.625.
There are many, many samples of size 40. One of these samples is:
2      3.5    4      6
3      3.5    4      6
3      2.5    4      7
2.5    5      4      7
2.5    4.5    5      8
3      2.5    5      8.5
4      2.5    5.5    4.5
4.5    2      4      9
4.5    1      4      10
4.5    5      3      11
Note: Sample mean = $\bar{x}$ = 4.625 and sample standard deviation = s = 2.246793
Sample size = 40 and degrees of freedom = 40 - 1 = 39
Since our level of confidence is set at 95%, about 95% of all confidence intervals will contain the population
mean and about 5% of the confidence intervals will not contain the population mean.
From the earlier discussion, a confidence interval has the form:
$$\left( \bar{x} - t_{\alpha/2} \frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2} \frac{s}{\sqrt{n}} \right)$$
where
$\bar{x}$ is the sample mean;
$t_{\alpha/2}$ is the number of standard errors from the population mean;
s is the sample standard deviation;
n is the sample size.
Since the sample size is greater than 30, approximately 95% of the t-scores will lie between $-t_{\alpha/2}$ and $t_{\alpha/2}$.
[Figure: t-distribution curve with $t_{\alpha/2}$ marked; left area = 2.5%, middle area = 95%, right area = 2.5%.]
For a right-tailed area of 0.025 and 39 degrees of freedom, the corresponding t-score is 2.019.
Hence, $t_{\alpha/2}$ = 2.019.
Standard Error = $\frac{s}{\sqrt{n}} = \frac{2.246792586}{\sqrt{40}} \approx 0.355249$.
$$\text{Confidence Interval} = \left( \bar{x} - t_{\alpha/2} \frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2} \frac{s}{\sqrt{n}} \right)$$
$$= \left( 4.625 - 2.019 (0.355249),\ 4.625 + 2.019 (0.355249) \right)$$
$$= (3.9077,\ 5.3422)$$
Comments:
We do not know if (3.9077, 5.3422) contains the population mean or not, since this interval is one of many,
many confidence intervals. However, since we know that 95% of the confidence intervals do contain the
population mean, we can be 95% confident that the interval (3.9077, 5.3422) does contain the population
mean.
Also, the interval (3.9077, 5.3422) is an interval estimate of the population mean.
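The calculation above can be reproduced from the raw data with a short script. This is a sketch using only Python's standard library. The t-score 2.019 is taken as quoted in the text rather than computed; a t-table lists about 2.023 for 39 degrees of freedom, which would move each endpoint by roughly 0.001. The variable names are illustrative.

```python
import math
import statistics

# Sample data from the text: weekly TV hours reported by 40 students.
hours = [
    2, 3.5, 4, 6,
    3, 3.5, 4, 6,
    3, 2.5, 4, 7,
    2.5, 5, 4, 7,
    2.5, 4.5, 5, 8,
    3, 2.5, 5, 8.5,
    4, 2.5, 5.5, 4.5,
    4.5, 2, 4, 9,
    4.5, 1, 4, 10,
    4.5, 5, 3, 11,
]

T_SCORE = 2.019  # t-score quoted in the text for 39 degrees of freedom (not computed here)

n = len(hours)
x_bar = statistics.fmean(hours)        # point estimate of the population mean
s = statistics.stdev(hours)            # sample standard deviation (n - 1 divisor)
standard_error = s / math.sqrt(n)
margin = T_SCORE * standard_error
lower, upper = x_bar - margin, x_bar + margin

print(f"n = {n}, mean = {x_bar:.3f}, s = {s:.6f}")
print(f"standard error = {standard_error:.6f}")
print(f"95% confidence interval = ({lower:.4f}, {upper:.4f})")
```

The printed values should agree with the sample mean, standard deviation, standard error and interval given in the text, up to rounding in the last digit. The same steps apply to any other sample once its t-score is known.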
Example 1:
Find the t-score corresponding to a right-tailed area of 0.025 and 34 degrees of freedom.
Example 2:
Find the t-score corresponding to a right-tailed area of 0.05 and 57 degrees of freedom.
Example 3:
Find the t-score corresponding to a left-tailed area of 0.005 and 89 degrees of freedom.
Example 4:
Find the area to the right of the t-score of 1.2 with 128 degrees of freedom.
Example 5:
Find the area between the t-scores of -1.1 and 2.1 with 228 degrees of freedom.
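Examples 1 to 5 are normally answered with a t-table or a calculator. As one possible way to check such answers, the sketch below evaluates areas under the t-distribution by numerically integrating its density (Simpson's rule) and finds t-scores by bisection. The function names are hypothetical, and the numerical method is a choice made here, not one described in the text.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """Area to the left of x, using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3  # area between 0 and |x|
    return 0.5 + half if x >= 0 else 0.5 - half


def t_score_for_right_tail(area, df):
    """t-score with the given area to its right, found by bisection."""
    lo, hi = 0.0, 50.0  # assumes area <= 0.5, so the t-score is non-negative
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1.0 - t_cdf(mid, df) > area:
            lo = mid  # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


print("Example 1:", t_score_for_right_tail(0.025, 34))
print("Example 2:", t_score_for_right_tail(0.05, 57))
print("Example 3:", -t_score_for_right_tail(0.005, 89))  # left tail: negate by symmetry
print("Example 4:", 1.0 - t_cdf(1.2, 128))
print("Example 5:", t_cdf(2.1, 228) - t_cdf(-1.1, 228))
```

Example 3 relies on the symmetry of the t-distribution: the t-score with a left-tailed area of 0.005 is the negative of the t-score with a right-tailed area of 0.005.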
Example 6:
Suppose 40 TCC students were selected randomly and asked about the
number of hours they spend studying per week.
The data are as follows:
1      2.5    3.5    4.5
1.5    2.5    3.5    5
1.5    2.5    3.5    6
1.5    2.5    3.5    6.5
2      2.5    3.5    6.5
2      3      4      6.5
2      3      4      6.5
2      3      4      7
2      3      4      8
2.5    3      4      9.5
$\bar{x}$ = sample average = 3.725 and s = sample standard deviation = 1.960997914
Find a 99% confidence interval for the population mean number of hours
students spend studying.
Solution:
Level of confidence = 99%
α = 1%
α/2 = 0.5% = 0.005 = right-tailed area
Degrees of freedom = 40 - 1 = 39 (where 40 is the sample size)
$t_{\alpha/2}$ = 2.689
$$\text{Confidence Interval} = \left( \bar{x} - t_{\alpha/2} \frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2} \frac{s}{\sqrt{n}} \right)$$
$$= \left( 3.725 - 2.689 \cdot \frac{1.960997914}{\sqrt{40}},\ 3.725 + 2.689 \cdot \frac{1.960997914}{\sqrt{40}} \right)$$
$$= (2.89,\ 4.5587)$$
Thus, we are 99% confident that the mean number of hours students spend
studying is between 2.89 and 4.5587.