10.1 t Distribution for Inferences about a Mean

LEARNING GOAL
Understand when it is appropriate to use the Student t distribution rather than the normal distribution for constructing confidence intervals or conducting hypothesis tests for population means, and know how to make proper use of the t distribution.

Copyright © 2009 Pearson Education, Inc.

Much of the work in preceding sections assumed that the sampling distribution is normal. However, a review of articles in professional journals shows that professional statisticians rarely use the normal distribution for confidence intervals and hypothesis tests in real applications. A major reason is that using the normal distribution requires that we know the population standard deviation σ. Because we generally do not know σ, we must estimate it with the sample standard deviation s, so statisticians prefer an approach that does not require knowing σ. The Student t distribution, or t distribution for short, is such an approach. It can be used when we do not know the population standard deviation and either the sample size is greater than 30 or the population has a normal distribution.

Inferences about a Population Mean: Choosing between t and Normal Distributions

t distribution:
• Population standard deviation is not known and the population is normally distributed, or
• Population standard deviation is not known and the sample size is greater than 30.

Normal distribution:
• Population standard deviation is known and the population is normally distributed, or
• Population standard deviation is known and the sample size is greater than 30.

Figure 10.1: This figure compares the standard normal distribution to the t distribution for two different sample sizes. Notice that as the sample size gets larger, the t distribution more closely approximates the normal distribution.
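The selection rule above can be written as a small decision function. This is a minimal sketch: the name `choose_distribution` and its parameters are hypothetical, and the `"neither"` result encodes the caution that follows, namely that neither distribution applies to a small sample from a population that is clearly not normal.

```python
def choose_distribution(sigma_known: bool, population_normal: bool, n: int) -> str:
    """Return "normal", "t", or "neither" for inference about a population mean.

    Hypothetical helper encoding the rule above: both distributions require
    n > 30 or a normally distributed population; the normal distribution
    additionally requires a known population standard deviation sigma.
    """
    if not (population_normal or n > 30):
        # Small sample from a population that is not (close to) normal.
        return "neither"
    return "normal" if sigma_known else "t"


print(choose_distribution(sigma_known=False, population_normal=True, n=5))    # t
print(choose_distribution(sigma_known=True, population_normal=False, n=40))   # normal
print(choose_distribution(sigma_known=False, population_normal=False, n=12))  # neither
```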
The t distribution is very similar in shape and symmetry to the normal distribution, but it accounts for the greater variability that is expected with small samples. The real value of the t distribution is that it allows us to extend ideas of confidence intervals and hypothesis tests to many cases in which we cannot use the normal distribution because we do not know the population standard deviation. Keep in mind, however, that it still does not work for all cases. For example, if we have a small sample (size 30 or less) and the sample data suggest that the population has a distribution that is radically different from a normal distribution, then neither the t distribution nor the normal distribution applies. Such cases require other methods not discussed in this book.

Confidence Intervals Using the t Distribution

To specify a confidence interval, we must first calculate the margin of error, E. With a t distribution, the formula is

$$E = t \cdot \frac{s}{\sqrt{n}}$$

where n is the sample size, s is the sample standard deviation, and t is a value that we look up in Table 10.1.

Table 10.1: Critical values of the t distribution (table not reproduced). Column 1 lists degrees of freedom; column 2 gives t values for an area of 0.025 in one tail (0.05 in two tails); column 3 gives t values for an area of 0.05 in one tail (0.10 in two tails).

Steps for finding t values in Table 10.1:
• First, determine the number of degrees of freedom for the sample data, defined to be the sample size minus 1: degrees of freedom for t distribution = n – 1.
• Table 10.1 shows degrees of freedom in column 1. Find the row corresponding to the number of degrees of freedom in your sample data, and then look across the row to find the appropriate t value. For confidence intervals for population means, the t values correspond to 95% confidence in column 2 and 90% confidence in column 3.
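Because Table 10.1 is not reproduced here, the sketch below computes the same critical values numerically instead of looking them up. It is an illustration under stated assumptions, not the book's method: the t density is integrated with Simpson's rule and the two-tailed critical value is found by bisection, using only Python's standard library. The names `t_pdf`, `upper_tail`, `t_critical`, and `margin_of_error` are hypothetical; in practice a statistics library such as SciPy (`scipy.stats.t.ppf`) would normally be used.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # Log of the normalizing constant, computed with lgamma for stability.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """Area to the right of t (t >= 0), using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The distribution is symmetric, so half of the total area lies above 0.
    return 0.5 - total * h / 3


def t_critical(confidence: float, df: int) -> float:
    """t value with area (1 - confidence) / 2 in each tail ("two tails" column)."""
    target = (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):  # bisection: upper_tail shrinks as t grows
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def margin_of_error(s: float, n: int, confidence: float) -> float:
    """E = t * s / sqrt(n), with t taken from n - 1 degrees of freedom."""
    return t_critical(confidence, n - 1) * s / math.sqrt(n)


print(round(t_critical(0.95, 4), 3))             # 2.776
print(round(t_critical(0.90, 9), 3))             # 1.833
print(round(margin_of_error(10.7, 5, 0.95), 1))  # 13.3
```

For 4 degrees of freedom at 95% confidence this reproduces the table value 2.776. As the degrees of freedom grow, `t_critical(0.95, df)` approaches 1.96, the corresponding normal value, which matches the convergence shown in Figure 10.1.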
We use the table values for the “area in two tails” because the margin of error can fall either below the mean or above it. For example, 95% confidence means we are looking for a total area of 0.05 split between the far left and far right tails (0.025 in each) of a t distribution like those shown in Figure 10.1. Once you find the t value for your data and confidence level, you can determine the confidence interval just as we did in Section 8.2, except using the new formula for the margin of error, E.

Confidence Interval for a Population Mean (μ) with the t Distribution

If conditions require use of the t distribution (σ not known, and either n > 30 or the population is normally distributed), the confidence interval for the true value of the population mean (μ) extends from the sample mean minus the margin of error ($\bar{x} - E$) to the sample mean plus the margin of error ($\bar{x} + E$). That is, the confidence interval for the population mean is

$$\bar{x} - E < \mu < \bar{x} + E$$

(or, equivalently, $\bar{x} \pm E$), where the margin of error is

$$E = t \cdot \frac{s}{\sqrt{n}}$$

and we find t from Table 10.1.

EXAMPLE 1 Confidence Interval for Diastolic Blood Pressure

Here are five measures of diastolic blood pressure from randomly selected adult men: 78, 54, 81, 68, 66. These five values result in these sample statistics: n = 5, $\bar{x}$ = 69.4, s = 10.7. Using this sample, construct the 95% confidence interval estimate of the mean diastolic blood pressure level for the population of all adult men.

Solution: Because the population standard deviation is not known and because it is reasonable to assume that blood pressure levels of adult men are normally distributed, we use the t distribution instead of the normal distribution. With a sample of size n = 5, the number of degrees of freedom is

degrees of freedom for t distribution = n – 1 = 5 – 1 = 4
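The sample statistics and the resulting interval in Example 1 can be checked with Python's standard `statistics` module. This minimal sketch does not compute t; it takes t = 2.776 from Table 10.1 (column 2, 4 degrees of freedom), the value the solution uses for 95% confidence. The variable names are illustrative.

```python
import math
import statistics

# Diastolic blood pressure readings from Example 1.
readings = [78, 54, 81, 68, 66]

n = len(readings)
x_bar = statistics.mean(readings)  # sample mean
s = statistics.stdev(readings)     # sample standard deviation (divides by n - 1)
df = n - 1                         # degrees of freedom for the t distribution

t = 2.776                          # Table 10.1, column 2, row for df = 4
E = t * s / math.sqrt(n)           # margin of error

print(n, x_bar, round(s, 1), df)                 # 5 69.4 10.7 4
print(round(x_bar - E, 1), round(x_bar + E, 1))  # 56.1 82.7
```

Because the code uses the unrounded s (about 10.71), E comes out near 13.30 rather than 13.28, but both round to 13.3 and the interval is unchanged.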
For 95% confidence, we use column 2 in Table 10.1 to find that t = 2.776. We now use this value along with the given sample size (n = 5) and sample standard deviation (s = 10.7) to calculate the margin of error, E:

$$E = t \cdot \frac{s}{\sqrt{n}} = 2.776 \cdot \frac{10.7}{\sqrt{5}} \approx 13.3$$

Finally, we use the margin of error and the sample mean to find the 95% confidence interval:

$$\bar{x} - E < \mu < \bar{x} + E$$
$$69.4 - 13.3 < \mu < 69.4 + 13.3$$
$$56.1 < \mu < 82.7$$

Based on the five sample measurements, we have 95% confidence that the limits of 56.1 and 82.7 contain the mean diastolic blood pressure level for the population of all adult men.

Hypothesis Tests Using the t Distribution

When the t distribution is used for a hypothesis test of a claim about a population mean (H0: μ = claimed value), the t value plays the role that the standard score z played when we studied these hypothesis tests in Section 9.2. With the t distribution, instead of calculating the standard score z, we use the following formula to calculate t:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where n is the sample size, $\bar{x}$ is the sample mean, s is the sample standard deviation, and μ is the population mean claimed by the null hypothesis.

Once we have calculated t, we decide whether to reject or not reject the null hypothesis by comparing our value of t to the critical values of t found in Table 10.1. The critical values depend on the type of test as follows:
• Right-tailed test: Reject the null hypothesis if the computed test statistic t is greater than or equal to the value of t found in the column of Table 10.1 labeled “Area in one tail.” Notice that for the one-tailed test, column 2 gives critical values for significance at the 0.025 level and column 3 gives critical values for significance at the 0.05 level.
• Left-tailed test: Reject the null hypothesis if the computed test statistic t is less than or equal to the negative of the value of t found in the column of Table 10.1 labeled “Area in one tail.” Again, because this is a one-tailed test, column 2 gives critical values for significance at the 0.025 level and column 3 gives critical values for significance at the 0.05 level.
• Two-tailed test: Reject the null hypothesis if the absolute value of the computed test statistic t is greater than or equal to the value of t found in the column of Table 10.1 labeled “Area in two tails.” For this case, column 2 gives critical values for significance at the 0.05 level and column 3 gives critical values for significance at the 0.10 level.

The computed test statistic t can also be used to find a P-value; however, that is usually done with the aid of statistical software rather than with tables.

EXAMPLE 2 Right-Tailed Hypothesis Test for a Mean

Listed below are ten randomly selected IQ scores of statistics students:

111 115 118 100 106 108 110 105 113 109

Using methods from Chapter 4, you can confirm that these data have the following sample statistics: n = 10, $\bar{x}$ = 109.5, s = 5.2. Using a 0.05 significance level, test the claim that statistics students have a mean IQ score greater than 100, which is the mean IQ score of the general population.
Solution: Based on the claim that the mean IQ of statistics students is greater than 100, we use the null hypothesis H0: μ = 100 and the alternative hypothesis Ha: μ > 100.

Because the standard deviation of all IQ scores for the population of all statistics students is not known and because it is reasonable to assume that IQ scores of statistics students are normally distributed, we use the t distribution instead of the normal distribution. The value of the t test statistic is computed as follows:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{109.5 - 100}{5.2 / \sqrt{10}} \approx 5.777$$

We now need to compare this value to the appropriate critical value from Table 10.1:
• We find the correct row by recognizing that this data set has n – 1 = 10 – 1 = 9 degrees of freedom.
• Because it is a one-tailed test and we are asked to test for significance at the 0.05 level, we use the values from column 3.
• Looking in the row for 9 degrees of freedom and column 3, we find that the critical value for significance at the 0.05 level is t = 1.833.

Because the sample test statistic t = 5.777 is greater than the critical value t = 1.833, we reject the null hypothesis. We conclude that there is sufficient evidence to support the claim that the mean IQ score is greater than 100.

We can be more precise by using software to compute the P-value for this hypothesis test, which turns out to be 0.000135. Notice that this P-value is much less than 0.05, so we can be quite confident in the decision to reject the null hypothesis and support the claim that the mean IQ score is greater than 100.
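The test in Example 2, together with the three rejection rules for right-, left-, and two-tailed tests, can be sketched in a few lines of Python. This is a hypothetical illustration, not the book's procedure: `one_sample_t` and `reject_h0` are made-up names, and the critical value 1.833 is copied from Table 10.1 rather than computed. The code uses the unrounded sample standard deviation (about 5.19), so its t statistic is about 5.79 rather than the 5.777 obtained from the rounded value s = 5.2; the decision is the same.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Return (t statistic, degrees of freedom) for H0: mu = mu0."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1


def reject_h0(t, critical, tail):
    """Critical-value rules; `critical` is the positive value from Table 10.1."""
    if tail == "right":
        return t >= critical
    if tail == "left":
        return t <= -critical
    if tail == "two":
        return abs(t) >= critical
    raise ValueError("tail must be 'right', 'left', or 'two'")


iq = [111, 115, 118, 100, 106, 108, 110, 105, 113, 109]
t, df = one_sample_t(iq, mu0=100)
print(round(t, 2), df)                 # 5.79 9

# With the rounded statistics from the text (x-bar = 109.5, s = 5.2):
t_rounded = (109.5 - 100) / (5.2 / math.sqrt(10))
print(round(t_rounded, 3))             # 5.777

# Right-tailed test at the 0.05 level: column 3, row for 9 degrees of freedom.
print(reject_h0(t, 1.833, "right"))    # True
```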