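Confidence Interval with Unknown Sigma and t-distribution

When the population standard deviation is unknown, we substitute the sample standard deviation for it. Because of this substitution, constructing a confidence interval requires Student's t-distribution.

What is a t-distribution?

Assumption: The sampling distribution is normal (or approximately normal).

The process of forming a population of t-scores is illustrated below:

Underlying population: mean = $\mu$ and standard deviation = $\sigma$.

Samples of size $n$: Sample 1, Sample 2, Sample 3, Sample 4, Sample 5, ..., Sample k, ...

Calculate the sample mean and standard deviation for each sample: $\bar{x}_1; s_1$, $\bar{x}_2; s_2$, $\bar{x}_3; s_3$, $\bar{x}_4; s_4$, $\bar{x}_5; s_5$, ..., $\bar{x}_k; s_k$, ...

Population of sample means: $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5, \ldots, \bar{x}_k, \ldots$ with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

Calculate the t-score for each sample mean:

$$t_k = \frac{\bar{x}_k - \mu}{s_k/\sqrt{n}}$$

Form a new population from the t-scores $t_1, t_2, t_3, t_4, t_5, \ldots, t_k, \ldots$

Sample Mean    Sample Standard Deviation    t-score
$\bar{x}_1$    $s_1$                        $t_1 = (\bar{x}_1 - \mu)/(s_1/\sqrt{n})$
$\bar{x}_2$    $s_2$                        $t_2 = (\bar{x}_2 - \mu)/(s_2/\sqrt{n})$
$\bar{x}_3$    $s_3$                        $t_3 = (\bar{x}_3 - \mu)/(s_3/\sqrt{n})$
$\bar{x}_4$    $s_4$                        $t_4 = (\bar{x}_4 - \mu)/(s_4/\sqrt{n})$
$\bar{x}_5$    $s_5$                        $t_5 = (\bar{x}_5 - \mu)/(s_5/\sqrt{n})$
...            ...                          ...
$\bar{x}_k$    $s_k$                        $t_k = (\bar{x}_k - \mu)/(s_k/\sqrt{n})$

The population of t-scores $t_1, t_2, t_3, t_4, t_5, \ldots, t_k, \ldots$ has a t-distribution with mean 0 and $(n - 1)$ degrees of freedom, where $n$ is the sample size.

A confidence interval is defined as follows:

$$\text{Confidence Interval} = \left(\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right)$$

where $\bar{x}$ = sample mean; $s$ = sample standard deviation; $n$ = sample size; $t_{\alpha/2}$ = t-score, which depends on the degrees of freedom $(n - 1)$ and the level of confidence.

For each sample mean $\bar{x}$ in the population of sample means, we can construct a confidence interval.

Sample Mean    Sample Standard Deviation    Confidence Interval
$\bar{x}_1$    $s_1$                        $\left(\bar{x}_1 - t_{\alpha/2}\, s_1/\sqrt{n},\ \bar{x}_1 + t_{\alpha/2}\, s_1/\sqrt{n}\right)$
$\bar{x}_2$    $s_2$                        $\left(\bar{x}_2 - t_{\alpha/2}\, s_2/\sqrt{n},\ \bar{x}_2 + t_{\alpha/2}\, s_2/\sqrt{n}\right)$
$\bar{x}_3$    $s_3$                        $\left(\bar{x}_3 - t_{\alpha/2}\, s_3/\sqrt{n},\ \bar{x}_3 + t_{\alpha/2}\, s_3/\sqrt{n}\right)$
$\bar{x}_4$    $s_4$                        $\left(\bar{x}_4 - t_{\alpha/2}\, s_4/\sqrt{n},\ \bar{x}_4 + t_{\alpha/2}\, s_4/\sqrt{n}\right)$
$\bar{x}_5$    $s_5$                        $\left(\bar{x}_5 - t_{\alpha/2}\, s_5/\sqrt{n},\ \bar{x}_5 + t_{\alpha/2}\, s_5/\sqrt{n}\right)$
...            ...                          ...
$\bar{x}_k$    $s_k$                        $\left(\bar{x}_k - t_{\alpha/2}\, s_k/\sqrt{n},\ \bar{x}_k + t_{\alpha/2}\, s_k/\sqrt{n}\right)$

We can see that the number of confidence intervals is very large.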
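The construction described above can be imitated on a computer. The following is a minimal sketch, not part of the original handout: it draws many samples from a normal population, then computes the sample mean, sample standard deviation, t-score, and confidence interval for each one. The population values `MU` and `SIGMA`, the counts `N` and `K`, the value `T_CRIT = 2.0`, and the helper name `sample_summary` are all assumptions chosen only for illustration.

```python
import math
import random


def sample_summary(sample, mu, t_crit):
    """Return x_bar, s, the t-score, and the interval x_bar -/+ t_crit * s / sqrt(n)."""
    n = len(sample)
    x_bar = sum(sample) / n
    # Sample standard deviation with the (n - 1) divisor.
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    std_err = s / math.sqrt(n)
    t = (x_bar - mu) / std_err
    return x_bar, s, t, (x_bar - t_crit * std_err, x_bar + t_crit * std_err)


# Hypothetical settings, chosen only for illustration.
MU, SIGMA = 50.0, 10.0  # underlying (normal) population
N, K = 40, 5000         # sample size and number of samples
T_CRIT = 2.0            # an illustrative value of t_{alpha/2}

rng = random.Random(0)  # fixed seed so the run is repeatable
rows = [
    sample_summary([rng.gauss(MU, SIGMA) for _ in range(N)], MU, T_CRIT)
    for _ in range(K)
]

# Print the first five rows, mirroring the tables above.
for k, (x_bar, s, t, (low, high)) in enumerate(rows[:5], start=1):
    print(f"sample {k}: x_bar={x_bar:.3f}  s={s:.3f}  t={t:+.3f}  CI=({low:.3f}, {high:.3f})")

mean_t = sum(row[2] for row in rows) / K
share_covering = sum(row[3][0] <= MU <= row[3][1] for row in rows) / K
print(f"mean of the t-scores: {mean_t:.3f}")
print(f"share of intervals containing mu: {share_covering:.3f}")
```

In a run like this the mean of the simulated t-scores should be close to 0, and an interval contains $\mu$ exactly when its t-score lies between `-T_CRIT` and `T_CRIT`.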
Some of these confidence intervals contain the population mean (µ) and some do not. When we construct a confidence interval, we hope that it contains the population mean. In practice, we do not know whether our constructed confidence interval contains the population mean (µ) or not. We only know what percentage of all possible confidence intervals contain the population mean. That percentage is determined by the quantity $t_{\alpha/2}$, which is calculated from the level of confidence. When calculating the value of $t_{\alpha/2}$, we assume that the sampling distribution is normal (or approximately normal). In other words, the sample size is at least 30 or the underlying population is normal.

The table below shows some values of $t_{\alpha/2}$ and the corresponding percentage of confidence intervals containing the population mean. This percentage is the same value as the level of confidence.

$t_{\alpha/2}$    Degrees of Freedom = n - 1    Percentage of Confidence Intervals
                  (sample size minus 1)         Containing Population Mean
1                 35                            67.48%
2                 35                            94.67%
3                 35                            99.53%
1                 39                            67.55%
2                 39                            94.75%
3                 39                            99.56%
1                 99                            67.93%
2                 99                            95.17%
3                 99                            99.68%

(Note: We can use simulation programs at www.simulation-math.com to illustrate the above table.)

Calculating $t_{\alpha/2}$ Given the Level of Confidence

Illustration of 95% Confidence Interval with Sample Size 40

Underlying population: mean = $\mu$ and standard deviation = $\sigma$.

Samples of size 40: Sample 1, Sample 2, Sample 3, Sample 4, Sample 5, ..., Sample k, ...

Calculate the sample mean for each sample: $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5, \ldots, \bar{x}_k, \ldots$

Population of sample means: $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5, \ldots, \bar{x}_k, \ldots$ with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

$$\text{Confidence Interval} = \left(\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right)$$

Level of confidence = 95%, so $\alpha$ = 5% and $\alpha/2$ = 2.5% = 0.025.

Degrees of freedom = 40 - 1 = 39

[Figure: t-distribution curve marked at $-t_{\alpha/2}$ and $t_{\alpha/2}$. Left Area = 2.5%, Middle Area = 95%, Right Area = 2.5%.]

For a right-tailed area of 0.025 and 39 degrees of freedom, the corresponding t-score is 2.019. Hence, $t_{\alpha/2}$ = 2.019.
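Each percentage in the table above is the area under a t curve between $-t_{\alpha/2}$ and $t_{\alpha/2}$. As a rough check, the sketch below evaluates that area numerically with the standard density formula for Student's t-distribution and composite Simpson's rule. The integration method and the helper names `t_pdf` and `area_between` are assumptions made for this illustration; they are not taken from the handout or from the simulation programs it mentions.

```python
import math


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # Work with logarithms so the gamma functions cannot overflow for large df.
    log_c = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def area_between(a, b, df, steps=2000):
    """Area under the t curve between a and b, by composite Simpson's rule."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return total * h / 3


# Same combinations of t and degrees of freedom as in the table.
for df in (35, 39, 99):
    for t in (1, 2, 3):
        print(f"t = {t}, df = {df}: {100 * area_between(-t, t, df):.2f}%")
```

The printed percentages should come out close to the table entries; small differences in the last digit can arise from rounding.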
Constructing Confidence Interval with Unknown Population Standard Deviation

Suppose we randomly select 40 students and ask them how many hours of TV they watch per week. The sample data are as follows:

2     3.5   4     6
3     3.5   4     6
3     2.5   4     7
2.5   5     4     7
2.5   4.5   5     8
3     2.5   5     8.5
4     2.5   5.5   4.5
4.5   2     4     9
4.5   1     4     10
4.5   5     3     11

Find a point estimate of the population mean and a 95% confidence interval for the population mean.

Solution:

The sample mean for this set of data is 4.625. Hence, a point estimate of the population mean is 4.625.

There are many, many samples of size 40. One of these samples is the sample listed above.

Note: Sample mean = $\bar{x}$ = 4.625 and sample standard deviation = $s$ = 2.246793.

Sample size = 40 and degrees of freedom = 40 - 1 = 39

Since our level of confidence is set at 95%, about 95% of all confidence intervals will contain the population mean and about 5% will not.

From the earlier discussion, a confidence interval has the form:

$$\left(\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right)$$

where $\bar{x}$ is the sample mean; $t_{\alpha/2}$ is the critical t-score, that is, the number of standard errors by which the interval extends on each side of $\bar{x}$; $s$ is the sample standard deviation; $n$ is the sample size.

Since the sample size is greater than 30, approximately 95% of the t-scores will lie between $-t_{\alpha/2}$ and $t_{\alpha/2}$.

[Figure: t-distribution curve marked at $-t_{\alpha/2}$ and $t_{\alpha/2}$. Left Area = 2.5%, Middle Area = 95%, Right Area = 2.5%.]

For a right-tailed area of 0.025 and 39 degrees of freedom, the corresponding t-score is 2.019. Hence, $t_{\alpha/2}$ = 2.019.

$$\text{Standard Error} = \frac{s}{\sqrt{n}} = \frac{2.246792586}{\sqrt{40}} \approx 0.355249$$

$$\text{Confidence Interval} = \left(\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right) = (4.625 - 2.019 \times 0.355249,\ 4.625 + 2.019 \times 0.355249) = (3.9077,\ 5.3422)$$

Comments: We do not know whether (3.9077, 5.3422) contains the population mean, since this interval is only one of many, many confidence intervals.
However, since we know that 95% of the confidence intervals do contain the population mean, we can be 95% confident that the interval (3.9077, 5.3422) contains the population mean. Also, the interval (3.9077, 5.3422) is an interval estimate of the population mean.

Example 1: Find the t-score corresponding to a right-tailed area of 0.025 and 34 degrees of freedom.

Example 2: Find the t-score corresponding to a right-tailed area of 0.05 and 57 degrees of freedom.

Example 3: Find the t-score corresponding to a left-tailed area of 0.005 and 89 degrees of freedom.

Example 4: Find the area to the right of the t-score of 1.2 with 128 degrees of freedom.

Example 5: Find the area between the t-scores of -1.1 and 2.1 with 228 degrees of freedom.

Example 6: Suppose 40 TCC students were selected randomly and asked about the number of hours they spend studying per week. The data are as follows:

1     2.5   3.5   4.5
1.5   2.5   3.5   5
1.5   2.5   3.5   6
1.5   2.5   3.5   6.5
2     2.5   3.5   6.5
2     3     4     6.5
2     3     4     6.5
2     3     4     7
2     3     4     8
2.5   3     4     9.5

$\bar{x}$ = sample average = 3.725 and $s$ = sample standard deviation = 1.960997914

Find a 99% confidence interval for the population mean of the number of hours students spend studying.

Solution:

Level of confidence = 99%, so $\alpha$ = 1% and $\alpha/2$ = 0.5% = 0.005 = right-tailed area

Degrees of freedom = 40 - 1 = 39 (where 40 is the sample size)

$t_{\alpha/2}$ = 2.689

$$\text{Confidence Interval} = \left(\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right) = \left(3.725 - 2.689 \times \frac{1.960997914}{\sqrt{40}},\ 3.725 + 2.689 \times \frac{1.960997914}{\sqrt{40}}\right) = (2.8912,\ 4.5588)$$

Thus, we are 99% confident that the mean number of hours students spend studying per week is between 2.8912 and 4.5588.
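The arithmetic in the two worked solutions (the TV data and Example 6) can be checked with a short script. This is a minimal sketch using only the Python standard library. The function name `t_confidence_interval` and the variable names are hypothetical, and the critical values 2.019 and 2.689 are not computed here; they are simply copied from the text.

```python
import math
import statistics


def t_confidence_interval(data, t_crit):
    """Return (lower, upper) = x_bar -/+ t_crit * s / sqrt(n) for the given sample."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation, (n - 1) divisor
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Weekly TV hours for the 40 students, in the order listed in the text.
tv_hours = [
    2, 3.5, 4, 6, 3, 3.5, 4, 6, 3, 2.5, 4, 7, 2.5, 5, 4, 7, 2.5, 4.5, 5, 8,
    3, 2.5, 5, 8.5, 4, 2.5, 5.5, 4.5, 4.5, 2, 4, 9, 4.5, 1, 4, 10, 4.5, 5, 3, 11,
]

# Weekly study hours for the 40 students in Example 6.
study_hours = [
    1, 2.5, 3.5, 4.5, 1.5, 2.5, 3.5, 5, 1.5, 2.5, 3.5, 6, 1.5, 2.5, 3.5, 6.5,
    2, 2.5, 3.5, 6.5, 2, 3, 4, 6.5, 2, 3, 4, 6.5, 2, 3, 4, 7, 2, 3, 4, 8,
    2.5, 3, 4, 9.5,
]

# Critical values copied from the text (39 degrees of freedom).
tv_interval = t_confidence_interval(tv_hours, t_crit=2.019)        # 95% level
study_interval = t_confidence_interval(study_hours, t_crit=2.689)  # 99% level

print(f"TV hours:    mean={statistics.mean(tv_hours):.3f}  "
      f"s={statistics.stdev(tv_hours):.6f}  "
      f"CI=({tv_interval[0]:.4f}, {tv_interval[1]:.4f})")
print(f"Study hours: mean={statistics.mean(study_hours):.3f}  "
      f"s={statistics.stdev(study_hours):.6f}  "
      f"CI=({study_interval[0]:.4f}, {study_interval[1]:.4f})")
```

Passing the critical value in as a parameter keeps the script tied to whichever t-table or calculator the reader uses; the endpoints it prints should agree with the worked solutions up to rounding in the last digit.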