Statistics 303
Chapter 7: Inference for Means

Inference for Means
• To this point, when examining the mean of a population, we have always assumed that the population standard deviation (\(\sigma\)) was known.
• In practice this is seldom the case.
• We usually must estimate the population standard deviation \(\sigma\) with the sample standard deviation \(s\) (for a review of \(s\), see pp. 49-50 of the book).
• When we do this, the standardized sample mean no longer follows the standard normal distribution, because of the adjustment for estimating \(\sigma\) with \(s\).
• Thus, instead of using Z, the standard normal distribution, we must use the appropriate t-distribution.

• The t-distribution
  – Although there is only one Z-distribution, there are many t-distributions.
  – In fact, there is a different t-distribution for each sample size used.
  – The shape of each t-distribution is very similar to the Z-distribution, but is slightly flatter, with more area in the tails.
  – The larger the sample size, the closer the t-distribution is to the Z-distribution.
  – The way we distinguish between various t-distributions is by finding the degrees of freedom (df) that correspond to the sample size.
  – When we are looking at only one sample, the degrees of freedom are the sample size minus one: df = n – 1.
  – We say that the one-sample t-statistic
\[t_{n-1} = \frac{\bar{x} - \mu}{s/\sqrt{n}}\]
    has the t distribution with n – 1 degrees of freedom.
  – A table of t distribution critical values can be found in Table D (the last page of the book).
  – Note that these values are areas to the right, not areas to the left as in the Z-table.
  – In Table D, the degrees of freedom are listed in the left column.
  – The probabilities are on top (these probabilities are inside for the Z-table).
  – The individual t-values are inside the table.
  – Make sure to get acquainted with this table and how it differs from the Z-table.
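As a rough illustration of the one-sample t-statistic and its degrees of freedom, the following Python sketch computes both from a small sample. The function name `one_sample_t` and the data values are made up for this illustration and do not come from the course materials.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Return (t, df) for the one-sample t-statistic against a hypothesized mean mu0."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical sample, invented for this sketch only
sample = [4.0, 6.0, 5.0, 7.0, 8.0]
t_stat, df = one_sample_t(sample, mu0=5.0)
print(f"t = {t_stat:.3f}, df = {df}")
```

The returned df is what one would look up in the left column of a t table; the sketch deliberately leaves the table lookup to the reader.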
  – The book, p. 452, shows an example of how the distributions compare.
  – With the change from \(\sigma\) to \(s\), and the change from z* to t*, the steps in producing confidence intervals and hypothesis tests are the same as we have seen previously.
  – In Chapter 1, p. 50, we find that \(s\) is calculated from the data using the formula:
\[s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}\]
    This formula is very cumbersome. Ideally, a computer is used to calculate \(s\), particularly for large data sets.

Confidence Interval for \(\mu\) with Unknown \(\sigma\)
• The formula for a confidence interval for \(\mu\) with unknown \(\sigma\) is
\[\bar{x} \pm t^* \frac{s}{\sqrt{n}}\]
  – \(\bar{x}\) and \(s\) are calculated from the data.
  – \(n\) is the sample size.
  – t* is found in Table D at the back of the book. It must correspond to the appropriate df = n – 1. It is easiest to find the confidence level at the bottom of the table and go up to the correct df.

• Confidence Interval Example
  – An economist wants to determine the average amount a family of four in the United States spends on housing annually. He randomly selects 85 families of size four and finds the amount they spent on housing the previous year.
  – The economist wishes to estimate the mean with 99% confidence.
  – Information given:
    Sample size: n = 85.
    Data: $6,789, $8,233, $4,784, …, $5,974 (85 numbers)
    \(\bar{x}\) = $6,219 and \(s\) = $1,978 (calculated from the data)
    df = n – 1 = 85 – 1 = 84
  – Calculation:
\[\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 6{,}219 \pm 2.639\left(\frac{1{,}978}{\sqrt{85}}\right) = 6{,}219 \pm 566.18 = (5652.82,\ 6785.18)\]
  – t* is found in Table D. We first go to the 99% confidence level at the bottom. Then we go up to 80 df, because df = 84 is not listed (always round down). Thus, t* = 2.639.
  – The interval (5652.82, 6785.18) is a 99% confidence interval for the true average amount a family of four in the United States spends on housing annually.
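The housing calculation above can be checked with a few lines of Python. This is only a sketch: the function name `t_confidence_interval` is an assumption introduced here, and the critical value t* is passed in by hand from the table rather than computed.

```python
import math


def t_confidence_interval(xbar, s, n, t_star):
    """Return (lower, upper, margin) for xbar +/- t* * s / sqrt(n).

    t_star must be looked up by the caller (for example in Table D);
    this sketch does not compute critical values itself.
    """
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin, margin


# Housing example from the slides: n = 85, t* = 2.639 (99% level, 80 df row)
lower, upper, margin = t_confidence_interval(xbar=6219, s=1978, n=85, t_star=2.639)
print(f"margin = {margin:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

Keeping t* as an explicit argument mirrors the table-based workflow of the slides; software that computes exact critical values for df = 84 would give a slightly narrower interval.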
Hypothesis Test for \(\mu\) with Unknown \(\sigma\)
• The steps for a hypothesis test are the same as those seen previously, namely,
  – 1. State the null hypothesis.
  – 2. State the alternative hypothesis.
  – 3. State the level of significance (e.g., \(\alpha = 0.05\)).
  – 4. Calculate the test statistic (note change):
\[t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\]
  – 5. Find the P-value:
    • For a two-sided test: \(\text{P-value} = \Pr(T \le -|t| \text{ or } T \ge |t|) = 2\Pr(T \ge |t|)\)
    • For a one-sided test with \(H_a: \mu > \mu_0\): \(\text{P-value} = \Pr(T \ge t)\)
    • For a one-sided test with \(H_a: \mu < \mu_0\): \(\text{P-value} = \Pr(T \le t)\)
    Because of the limited number of t-values given in Table D, it is more common to find a range for the P-value, rather than the exact value (as will be seen in the example). Computers can be used to obtain exact values.
  – 6. Reject or fail to reject \(H_0\) based on the P-value.
    • If the P-value is less than or equal to \(\alpha\), reject \(H_0\).
    • If the P-value is greater than \(\alpha\), fail to reject \(H_0\).
  – 7. State your conclusion.
    • If \(H_0\) is rejected, “There is significant statistical evidence that the population mean is different from \(\mu_0\).”
    • If \(H_0\) is not rejected, “There is not significant statistical evidence that the population mean is different from \(\mu_0\).”
  Notice that these last two steps are exactly the same as for the case where \(\sigma\) is known.

• T.V. Example
  – Suppose that the data collected from our class survey is a random sample from the entire university (which it obviously is not). We wish to see if there is evidence that the average amount of television watched for students here is more than 7 hours per week.
  – Information given:
    Sample size: n = 38.
    Data (hours per week): 3, 4, 3, 10, 2, 5, 20, 10, 5, 10, 20, 3, 6, 10, 2, 3, 1, 3, 9, 5, 1, 4, 5, 30, 1, 10, 30, 10, 4, 10, 6, 3, 10, 0, 15, 21, 3, 9
    \(\bar{x} = 8.05\), \(s = 7.46\)
    df = n – 1 = 38 – 1 = 37
  – 1. State the null hypothesis: \(H_0: \mu = 7\) or \(H_0: \mu \le 7\)
  – 2. State the alternative hypothesis: \(H_a: \mu > 7\) (from “is more than”)
  – 3. State the level of significance: assume \(\alpha = 0.05\).
  – 4. Calculate the test statistic.
\[t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{8.05 - 7}{7.46/\sqrt{38}} = \frac{1.05}{1.21} = 0.87\]
  – 5. Find the P-value.
    \(\text{P-value} = \Pr(T \ge t) = \Pr(T \ge 0.87)\), which is between 0.15 and 0.20. Remember the table gives probabilities to the right, so we do not use the technique of subtracting from 1. Use df = 30 (rounding down from 37).
  – 6. Do we reject or fail to reject \(H_0\) based on the P-value? The P-value (between 0.15 and 0.20) is greater than \(\alpha = 0.05\). Therefore, we fail to reject \(H_0\).
  – 7. State the conclusion. “There is not significant statistical evidence that the average amount of television watched is more than 7 hours per week at the 0.05 level of significance.”

Matched Pairs t-test
• To this point we have only looked at tests for single samples.
• Soon we will look at confidence intervals and hypothesis tests for comparing two groups.
• When each individual can be given both treatments, we can reduce the two samples to a single sample using a matched pairs design.
• Examples:
  – Students are each given a pre-test and a post-test to determine the amount of material learned in a given time interval.
  – To examine the effect of a new drug, a large group of identical twins is identified. One twin is given a treatment and the other a placebo.
  – An ophthalmologist is examining the importance of the dominant eye in reading. A large group of subjects is asked to read a passage with the dominant eye covered and again with the non-dominant eye covered.
  – It can be seen in each of these examples that something pairs the two responses.
• To analyze matched pairs data, we first reduce the data from two samples to one sample and then analyze the data using one-sample techniques.
• The data is reduced from two samples to one by subtracting one of the responses from the other.
  – We could subtract each pre-test score from each post-test score.
  – We could subtract each placebo response from each treatment response.
  – We could subtract the time taken to read the passage with the non-dominant eye from the time taken to read the passage with the dominant eye.

• Example: Keyboards
  – “Suppose we want to compare two brands of computer keyboards, which we will denote as keyboard 1 and keyboard 2. Keyboard 1 is a standard keyboard, while keyboard 2 is specially designed so that the keys need very little pressure to make them respond. The manufacturer of keyboard 2 would like to claim that typing can be done faster using keyboard 2 … A simple random sample of n = 30 teachers was selected from a population of high-school teachers attending a national conference. Each teacher typed the same page of text once using keyboard 1 and once using keyboard 2. For each teacher the order in which the keyboards were used was determined by the toss of a coin. For each teacher the variable measured was the time (in seconds) to correctly type the page of text…” (from Graybill, Iyer and Burdick, Applied Statistics, 1998).
  – Information given:
    Sample size: n = 30.
    \(\bar{x}_{diff} = -3.53\), \(s_{diff} = 8.56\)
    df = n – 1 = 30 – 1 = 29
  – Reduction to one sample: Difference = K2 – K1 (times in seconds).

    Subject  Keyboard 1  Keyboard 2  Difference (K2 - K1)
    1        348         350           2
    2        435         442           7
    3        369         356         -13
    4        357         360           3
    5        376         373          -3
    6        412         405          -7
    7        396         376         -20
    8        317         314          -3
    9        366         366           0
    10       340         337          -3
    11       347         352           5
    12       315         303         -12
    13       349         338         -11
    14       330         328          -2
    15       335         322         -13
    16       345         351           6
    17       374         361         -13
    18       374         370          -4
    19       380         375          -5
    20       319         318          -1
    21       387         382          -5
    22       313         317           4
    23       303         310           7
    24       404         393         -11
    25       355         362           7
    26       364         364           0
    27       348         355           7
    28       361         368           7
    29       301         291         -10
    30       348         323         -25

  – 1. State the null hypothesis: \(H_0: \mu = 0\) or \(H_0: \mu \ge 0\), where \(\mu\) is the mean of the differences.
  – 2. State the alternative hypothesis: \(H_a: \mu < 0\) (from carefully reading: if typing is faster on keyboard 2, the differences K2 – K1 are negative on average).
  – 3. State the level of significance: assume \(\alpha = 0.05\).
  – 4. Calculate the test statistic.
\[t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{-3.53 - 0}{8.56/\sqrt{30}} = \frac{-3.53}{1.56} = -2.26\]
  – 5. Find the P-value.
    \(\text{P-value} = \Pr(T \le t) = \Pr(T \le -2.26)\). Remember the table gives probabilities to the right; by symmetry, \(\Pr(T \le -2.26) = \Pr(T \ge 2.26)\), which is between 0.01 and 0.02. Use df = 29.
  – 6. Do we reject or fail to reject \(H_0\) based on the P-value? The P-value (between 0.01 and 0.02) is less than \(\alpha = 0.05\). Therefore, we reject \(H_0\).
  – 7. State the conclusion. “There is significant statistical evidence that the average amount of time needed to type the passage is lower for keyboard 2 than keyboard 1 at the 0.05 level of significance.”

Matched Pairs Confidence Interval
• After reducing the data to a single sample, we use the same formula as for a confidence interval for \(\mu\) with unknown \(\sigma\), namely,
\[\bar{x} \pm t^* \frac{s}{\sqrt{n}}\]
  using the mean and standard deviation of the differences.
• Example: Golf Balls
  – “In the manufacture of golf balls two procedures are used. Method I utilizes a liquid center and method II, a solid center. To compare the distance obtained using both types of balls, 12 golfers are allowed to drive a ball of each type, and the length of the drive (in yards) is measured.” (from Milton, McTeer, and Corbet, Introduction to Statistics, 1997)
  – The manufacturer wants to estimate the mean difference with 90% confidence.
  – Information given:
    Sample size: n = 12.
    \(\bar{x}_{diff} = 9.52\), \(s_{diff} = 3.12\)
    df = n – 1 = 12 – 1 = 11

    Golfer  Liquid  Solid  Difference (liquid - solid)
    1       180     172.7   7.3
    2       215.8   202.5  13.3
    3       140.6   128.1  12.5
    4       182.7   173.9   8.8
    5       193.8   180.7  13.1
    6       100.2    88.7  11.5
    7       195.2   188.9   6.3
    8       117.6   108.8   8.8
    9       199     186.5  12.5
    10      179.5   175.9   3.6
    11      122.3   112.7   9.6
    12      106.7    99.8   6.9

  – Calculation:
\[\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 9.52 \pm 1.796\left(\frac{3.12}{\sqrt{12}}\right) = 9.52 \pm 1.62 = (7.90,\ 11.14)\]
  – t* is found in Table D. We first go to the 90% confidence level at the bottom. Then we go up to 11 df. Thus, t* = 1.796.
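The golf ball calculation can be reproduced from the raw drives with a short Python sketch. The function name `matched_pairs_interval` is an assumption made for this illustration, and t* = 1.796 is supplied by hand from the table as in the slides.

```python
import math
import statistics

# Drive lengths in yards for the 12 golfers, copied from the table in the slides
liquid = [180, 215.8, 140.6, 182.7, 193.8, 100.2, 195.2, 117.6, 199, 179.5, 122.3, 106.7]
solid = [172.7, 202.5, 128.1, 173.9, 180.7, 88.7, 188.9, 108.8, 186.5, 175.9, 112.7, 99.8]


def matched_pairs_interval(first, second, t_star):
    """Reduce paired data to differences, then apply the one-sample t interval."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    margin = t_star * sd_diff / math.sqrt(n)
    return mean_diff, sd_diff, (mean_diff - margin, mean_diff + margin)


mean_diff, sd_diff, (lower, upper) = matched_pairs_interval(liquid, solid, t_star=1.796)
print(f"mean = {mean_diff:.2f}, sd = {sd_diff:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

Because the code works with unrounded values of the mean and standard deviation, its endpoints may differ from the hand calculation in the last digit.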
  – The interval (7.90, 11.14) is a 90% confidence interval for the true average difference in distance traveled (liquid – solid) between the two types of golf balls.

Comparing Two Means
• We use the same basic principles for comparing two population means as those used for examining one population mean.
• If the standard deviations \(\sigma_1\) and \(\sigma_2\) for each of the two populations are known, the two-sample z-statistic is
\[z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}\]
  But it is very rare that both population standard deviations are known. We will examine the situation in which they are not known.
• When we are interested in comparing two population means and we are estimating the population standard deviations \(\sigma_1\) and \(\sigma_2\) with \(s_1\) and \(s_2\), the two-sample t-statistic is
\[t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}\]
  with degrees of freedom equal to the smaller of \(n_1 - 1\) and \(n_2 - 1\) (or an appropriate estimate using computer software).
• The null hypothesis can be any of the following:
  \(H_0: \mu_1 = \mu_2\) or \(H_0: \mu_1 \le \mu_2\) or \(H_0: \mu_1 \ge \mu_2\)
• The alternative hypothesis can be any of the following (depending on the question being asked):
  \(H_a: \mu_1 \ne \mu_2\) or \(H_a: \mu_1 > \mu_2\) or \(H_a: \mu_1 < \mu_2\)
  The other steps are the same as those used for the tests we have looked at previously.
• Example: Tomatoes
  – “There has been some discussion among amateur gardeners about the virtues of black plastic versus newspapers as weed inhibitors for growing tomatoes. To compare the two, several rows of tomatoes are planted. Black plastic is used around nine randomly selected plants and newspaper around the remaining ten. All plants start at virtually the same height and receive the same care. The response of interest is the height in feet after a month’s growth.” (from Milton, McTeer, and Corbet, Introduction to Statistics, 1997).
  – Perform a test to see if there is any difference between the average heights with significance level 0.10.
  – Information given:
    Sample sizes: \(n_1 = 9\), \(n_2 = 10\).
    \(\bar{x}_1 = 1.87\), \(\bar{x}_2 = 1.49\), \(s_1 = 0.63\), \(s_2 = 0.43\)
    Black plastic (heights in feet): 1.8, 1.29, 1.13, 2.92, 2.2, 1.25, 2.61, 1.6, 2.06
    Newspaper (heights in feet): 2.57, 1.59, 1.78, 1.37, 1.22, 1.34, 1.43, 1.06, 1.44, 1.12
    df = \(n_1 - 1\) = 9 – 1 = 8, because \(n_1\) is smaller than \(n_2\).
  – 1. State the null hypothesis: \(H_0: \mu_1 = \mu_2\)
  – 2. State the alternative hypothesis: \(H_a: \mu_1 \ne \mu_2\) (from “any difference between”)
  – 3. State the level of significance: \(\alpha = 0.10\)
  – 4. Calculate the test statistic.
\[t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(1.87 - 1.49) - 0}{\sqrt{\dfrac{0.63^2}{9} + \dfrac{0.43^2}{10}}} = \frac{0.38}{0.25} = 1.52\]
  – 5. Find the P-value.
    \(\text{P-value} = 2\Pr(T \ge |t|) = 2\Pr(T \ge 1.52)\). Remember the table gives probabilities to the right. Using df = 8, \(\Pr(T \ge 1.52)\) is between 0.05 and 0.10, so the P-value is 2 × (between 0.05 and 0.10), that is, between 0.10 and 0.20.
  – 6. Do we reject or fail to reject \(H_0\) based on the P-value? The P-value (between 0.10 and 0.20) is greater than \(\alpha = 0.10\). Therefore, we fail to reject \(H_0\).
  – 7. State the conclusion. “There is not significant statistical evidence that the average tomato plant heights are different for the two types of weed inhibitors at the 0.10 level of significance.”

• The confidence interval for the difference of two population means (\(\mu_1 - \mu_2\)) is
\[(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]
  where t* comes from Table D and corresponds to the confidence level desired and df = smaller of \(n_1 - 1\) and \(n_2 - 1\).
• Example: Commercials
  – “There is some concern that TV commercial breaks are becoming longer. The observations on the following slide are obtained on the length in minutes of commercial breaks for the 1984 viewing season and the current season.” (from Milton, McTeer, and Corbet, Introduction to Statistics, 1997)
  – Find a 95% confidence interval for the difference between the true averages of the two seasons.
  – Information given:
    Sample sizes: \(n_1 = 16\), \(n_2 = 16\).
    \(\bar{x}_1 = 2.01\), \(\bar{x}_2 = 2.36\), \(s_1 = 0.49\), \(s_2 = 0.19\)
    df = 16 – 1 = 15, because \(n_1\) and \(n_2\) are the same.

    1984   Current
    2.42   2.28
    2      2.36
    1.17   2.05
    1.18   2.45
    2.32   2.64
    1.84   2.62
    2.16   2.39
    2.35   2.63
    2.4    2.29
    1.47   2.39
    2.82   2.11
    2.04   2.04
    2.23   2.25
    1.95   2.31
    1.38   2.44
    2.42   2.57

  – Calculation:
\[(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (2.01 - 2.36) \pm 2.131\sqrt{\frac{0.49^2}{16} + \frac{0.19^2}{16}} = -0.35 \pm 0.28 = (-0.63,\ -0.07)\]
  – t* is found in Table D. We first go to the 95% confidence level at the bottom. Then we go up to 15 df. Thus, t* = 2.131.
  – The interval (–0.63, –0.07) is a 95% confidence interval for the true difference in average length in minutes of commercial breaks (1984 minus current).

Pooled t test: Comparing Two Means
• The null hypothesis can be any of the following:
  \(H_0: \mu_1 = \mu_2\) or \(H_0: \mu_1 \le \mu_2\) or \(H_0: \mu_1 \ge \mu_2\)
• The alternative hypothesis can be any of the following (depending on the question being asked):
  \(H_a: \mu_1 \ne \mu_2\) or \(H_a: \mu_1 > \mu_2\) or \(H_a: \mu_1 < \mu_2\)

Pooled Estimator
• Previously, we discussed two-sample t procedures from two populations with two unknown standard deviations. We then used the sample standard deviations to estimate the population standard deviations. But what about when the two populations have the same standard deviation? In that case we estimate the common variance \(\sigma^2\) with
\[s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\]
  This estimate is called the pooled estimator of \(\sigma^2\) because it combines the information in both samples.

Test Statistic
• Suppose that an SRS of size \(n_1\) is drawn from a normal population with unknown mean \(\mu_1\) and that an independent SRS of size \(n_2\) is drawn from another normal population with unknown mean \(\mu_2\). Suppose also that the two populations have the SAME standard deviation. Then the two-sample t statistic is
\[t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}\]
• with degrees of freedom equal to \(n_1 + n_2 - 2\).

Confidence Interval
• A level C confidence interval for \(\mu_1 - \mu_2\) is
\[(\bar{x}_1 - \bar{x}_2) \pm t^* s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\]
• where t* comes from Table D and corresponds to the confidence level desired and df = \(n_1 + n_2 - 2\).
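To make the pooled formulas concrete, here is a minimal Python sketch. The function names `pooled_t` and `pooled_interval` are assumptions introduced for this illustration. The inputs reuse the tomato summary statistics purely as sample numbers; the slides analyze that example with the unpooled procedure, so this is not a result from the course. The value t* = 1.740 is the standard upper 0.05 critical value for 17 degrees of freedom, entered by hand.

```python
import math


def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    """Return (sp2, t, df) for the pooled two-sample t statistic."""
    df = n1 + n2 - 2
    # Pooled estimator of the common variance
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    t = (xbar1 - xbar2) / se
    return sp2, t, df


def pooled_interval(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """Return (lower, upper) for (xbar1 - xbar2) +/- t* * sp * sqrt(1/n1 + 1/n2)."""
    sp2, _, _ = pooled_t(xbar1, s1, n1, xbar2, s2, n2)
    margin = t_star * math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


# Illustrative inputs only: tomato summary statistics reused as sample numbers
sp2, t_stat, df = pooled_t(1.87, 0.63, 9, 1.49, 0.43, 10)
lower, upper = pooled_interval(1.87, 0.63, 9, 1.49, 0.43, 10, t_star=1.740)
print(f"sp^2 = {sp2:.4f}, t = {t_stat:.2f}, df = {df}")
print(f"90% interval = ({lower:.2f}, {upper:.2f})")
```

Whether pooling is appropriate depends on the equal-standard-deviation assumption stated above; the sketch does not check that assumption.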