Happiness comes not from material wealth but from less desire.

Chapter 17: Inference about a Population Mean (BPS, 5th Ed.)

Conditions for Inference about a Mean
The data are an SRS of size n. The population has a Normal distribution with mean μ and standard deviation σ. Both μ and σ are usually unknown, and we use inference to estimate μ. Problem: because σ is unknown, we cannot use the z procedures learned earlier.

Standard Error
When we do not know the population standard deviation σ (which is usually the case), we must estimate it with the sample standard deviation s. When the standard deviation of a statistic is estimated from data, the result is called the standard error of the statistic. The standard error of the sample mean $\bar{x}$ is $s/\sqrt{n}$.

One-Sample t Statistic
When we estimate σ with s, the one-sample z statistic becomes a one-sample t statistic:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \qquad\longrightarrow\qquad t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
Because the denominator is now the standard error, the statistic no longer follows a Normal distribution. When the population mean equals μ0, the t statistic follows a t distribution with n − 1 degrees of freedom.

The t Distributions
The t density curve is similar in shape to the standard Normal curve: both are symmetric about 0 and bell-shaped. The spread of the t distributions is a bit greater than that of the standard Normal curve (i.e., the t curve is slightly “fatter”). As the degrees of freedom increase, the t density curve approaches the N(0, 1) curve more closely, because s estimates σ more accurately as the sample size increases.

Using Table C
Table C on page 693 gives critical values t* having upper-tail probability p, along with the corresponding confidence level C. The z* values are also displayed at the bottom of the table.
Example: Find the value t* with probability 0.025 to its right under the t(7) density curve.
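Table C can also be reproduced numerically. The sketch below is a minimal illustration using only the Python standard library; `t_upper_tail` and `t_critical` are hypothetical helper names, not a library API. It assumes the substitution u = √df · tan θ, which turns the tail area of the t density into a finite integral of cos^(df − 1) θ that Simpson's rule evaluates accurately, and it finds t* by bisection. The last line compares t* for 1000 degrees of freedom with the Normal z*, illustrating how t(df) approaches N(0, 1).

```python
import math
from statistics import NormalDist


def t_upper_tail(t: float, df: float, steps: int = 2000) -> float:
    """Return P(T > t) for T ~ t(df).

    The substitution u = sqrt(df) * tan(theta) rewrites the tail area of the
    t density as K * (integral of cos(theta)**(df - 1) from atan(t/sqrt(df))
    to pi/2), a finite, smooth integral evaluated here with Simpson's rule.
    """
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    a, b = math.atan(t / math.sqrt(df)), math.pi / 2
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = math.cos(a) ** (df - 1) + math.cos(b) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(a + i * h) ** (df - 1)
    return k * total * h / 3


def t_critical(p: float, df: float) -> float:
    """Return t* with upper-tail probability p under t(df), for 0 < p < 0.5."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > p:  # widen the bracket until it contains t*
        hi *= 2
    for _ in range(60):  # bisection works because the tail area falls as t grows
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(f"t(7), p = 0.025: t* = {t_critical(0.025, 7):.3f}")  # 2.365, as in Table C
    print(f"t(9), p = 0.02:  t* = {t_critical(0.02, 9):.3f}")   # 2.398
    print(f"t(9), p = 0.01:  t* = {t_critical(0.01, 9):.3f}")   # 2.821
    # As df grows, t* approaches the standard Normal z* for the same tail area.
    z_star = NormalDist().inv_cdf(1 - 0.025)
    print(f"t(1000), p = 0.025: t* = {t_critical(0.025, 1000):.3f}; z* = {z_star:.3f}")
```

Statistical packages provide these quantiles directly; the hand-rolled version only makes the link between tail areas and critical values explicit.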
t* = 2.365

One-Sample t Confidence Interval
Take an SRS of size n from a population with unknown mean μ and unknown standard deviation σ. A level C confidence interval for μ is
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$
where t* is the critical value for confidence level C from the t density curve with n − 1 degrees of freedom. This interval is exact when the population distribution is Normal and approximately correct for large n in other cases.

Case Study: American Adult Heights
A study of 7 American adults from an SRS yields an average height of $\bar{x}$ = 67.2 inches and a standard deviation of s = 3.9 inches. With n − 1 = 6 degrees of freedom, Table C gives t* = 2.447 for 95% confidence, so a 95% confidence interval for the average height of all American adults (μ) is
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 67.2 \pm 2.447\,\frac{3.9}{\sqrt{7}} = 67.2 \pm 3.607,$$
that is, 63.593 to 70.807 inches.
“We are 95% confident that the average height of all American adults is between 63.593 and 70.807 inches.”

One-Sample t Test
Like the confidence interval, the t test is close in form to the z test learned earlier. When we estimate σ with s, the test statistic becomes
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
where t follows the t density curve with n − 1 degrees of freedom, and the P-value of t is determined from that curve. The P-value is exact when the population distribution is Normal and approximately correct for large n in other cases.

P-value for Testing Means
Ha: μ > μ0. The P-value is the probability of getting a value as large as or larger than the observed test statistic t.
Ha: μ < μ0. The P-value is the probability of getting a value as small as or smaller than the observed test statistic t.
Ha: μ ≠ μ0. The P-value is two times the probability of getting a value as large as or larger than the absolute value of the observed test statistic t.

Case Study: Sweetening Colas (Ch. 14)
Cola makers test new recipes for loss of sweetness during storage. Trained tasters rate the sweetness before and after storage.
Here are the sweetness losses (sweetness before storage minus sweetness after storage) found by 10 tasters for a new cola recipe:
2.0 0.4 0.7 2.0 -0.4 2.2 -1.3 1.2 1.1 2.3
Are these data good evidence that the cola lost sweetness during storage?

It is reasonable to regard these 10 carefully trained tasters as an SRS from the population of all trained tasters. While we cannot judge Normality from just 10 observations, a stemplot of the data shows no outliers, clusters, or extreme skewness. Thus, P-values for the t test will be reasonably accurate.

1. Hypotheses: H0: μ = 0 versus Ha: μ > 0
2. Test statistic (df = 10 − 1 = 9):
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1.02 - 0}{1.196/\sqrt{10}} = 2.70$$
3. P-value: P-value = P(T > 2.70) = 0.0123 (using a computer). From Table C, the P-value is between 0.01 and 0.02, since t = 2.70 lies between t* = 2.398 (p = 0.02) and t* = 2.821 (p = 0.01).
4. Conclusion: Since the P-value is smaller than α = 0.02, there is quite strong evidence that the new cola loses sweetness on average during storage at room temperature.

Matched Pairs t Procedures
To compare two treatments, subjects are matched in pairs and each treatment is given to one subject in each pair. Before-and-after observations on the same subjects also call for using matched pairs. To compare the responses to the two treatments in a matched pairs design, apply the one-sample t procedures to the observed differences (one treatment observation minus the other). The parameter μ is the mean difference in the responses to the two treatments within matched pairs of subjects in the entire population.

Case Study: Air Pollution
Pollution index measurements were recorded for two areas of a city on each of 8 days. Are the average pollution levels the same for the two areas of the city?
Area A   Area B   A − B
2.92     1.84     1.08
1.88     0.95     0.93
5.35     4.26     1.09
3.81     3.18     0.63
4.69     3.44     1.25
4.86     3.69     1.17
5.81     4.95     0.86
5.55     4.47     1.08

It is reasonable to regard these 8 measurement pairs as an SRS from the population of all paired measurements. While we cannot judge Normality from just 8 observations, a stemplot of the differences shows no outliers, clusters, or extreme skewness. Thus, P-values for the t test will be reasonably accurate.
0 | 689
1 | 11122
These 8 differences have $\bar{x}$ = 1.0113 and s = 0.1960.

1. Hypotheses: H0: μ = 0 versus Ha: μ ≠ 0
2. Test statistic (df = 8 − 1 = 7):
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1.0113 - 0}{0.1960/\sqrt{8}} = 14.594$$
3. P-value: P-value = 2P(T > 14.594) = 0.0000017 (using a computer). From Table C, the P-value is smaller than 2(0.0005) = 0.0010, since t = 14.594 is greater than t* = 5.408 (upper-tail area 0.0005 for df = 7).
4. Conclusion: Since the P-value is smaller than α = 0.001, there is very strong evidence that the mean pollution levels differ between the two areas of the city.

Find a 95% confidence interval to estimate the difference in pollution indexes (A − B) between the two areas of the city (df = 8 − 1 = 7 for t*):
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 1.0113 \pm 2.365\,\frac{0.1960}{\sqrt{8}} = 1.0113 \pm 0.1639,$$
that is, 0.8474 to 1.1752.
We are 95% confident that the pollution index in area A exceeds that of area B by an average of 0.8474 to 1.1752 index points.

Chapter 18: Two-Sample Problems (BPS, 5th Ed.)

Two-Sample Problems
The goal of inference is to compare the responses to two treatments or to compare the characteristics of two populations. We have a separate sample from each treatment or each population. Each sample is separate: the units are not matched, and the samples can be of different sizes.
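Unlike the independent samples just described, the matched pairs data from Chapter 17 reduce to a one-sample problem on the differences A − B. The sketch below assumes only the Python standard library (the t tail helper is repeated so the block runs on its own; `one_sample_t` and `t_interval` are hypothetical names) and reproduces the Chapter 17 calculations: the one-sided cola test, the two-sided air pollution test, and the height interval. Computed directly from the table as printed, the eight differences give s ≈ 0.197 and t ≈ 14.5 rather than the slide's 0.1960 and 14.594; the P-value is still far below 0.001, so the conclusion does not change.

```python
import math
from statistics import mean, stdev


def t_upper_tail(t: float, df: float, steps: int = 2000) -> float:
    """Return P(T > t) for T ~ t(df), using u = sqrt(df) * tan(theta) to turn the
    tail into a finite integral of cos(theta)**(df - 1), then Simpson's rule."""
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    a, b = math.atan(t / math.sqrt(df)), math.pi / 2
    h = (b - a) / steps  # steps must be even
    total = math.cos(a) ** (df - 1) + math.cos(b) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(a + i * h) ** (df - 1)
    return k * total * h / 3


def one_sample_t(data, mu0=0.0, alternative="two-sided"):
    """One-sample t test; returns (xbar, s, t, df, p_value).

    alternative: "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0),
    or "two-sided" (Ha: mu != mu0).
    """
    n = len(data)
    xbar, s = mean(data), stdev(data)  # stdev divides by n - 1
    t = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    if alternative == "greater":
        p = t_upper_tail(t, df)
    elif alternative == "less":
        p = t_upper_tail(-t, df)  # P(T < t) = P(T > -t) by symmetry
    else:
        p = 2 * t_upper_tail(abs(t), df)
    return xbar, s, t, df, p


def t_interval(xbar, s, n, t_star):
    """Return the interval xbar +/- t* * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin


if __name__ == "__main__":
    # Cola sweetness losses (before minus after storage); Ha: mu > 0.
    cola = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]
    xbar, s, t, df, p = one_sample_t(cola, alternative="greater")
    print(f"cola: xbar = {xbar:.2f}, s = {s:.3f}, t = {t:.2f}, df = {df}, P = {p:.4f}")

    # Air pollution: the matched pairs test is a one-sample test on A - B; Ha: mu != 0.
    area_a = [2.92, 1.88, 5.35, 3.81, 4.69, 4.86, 5.81, 5.55]
    area_b = [1.84, 0.95, 4.26, 3.18, 3.44, 3.69, 4.95, 4.47]
    diffs = [a - b for a, b in zip(area_a, area_b)]
    xbar, s, t, df, p = one_sample_t(diffs)
    print(f"pollution: xbar = {xbar:.4f}, s = {s:.4f}, t = {t:.2f}, df = {df}, P = {p:.1e}")

    # Heights: n = 7, so df = 6 and Table C gives t* = 2.447 for 95% confidence.
    lo, hi = t_interval(67.2, 3.9, 7, 2.447)
    print(f"heights: {lo:.3f} to {hi:.3f}")
```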
Case Study: Exercise and Pulse Rates
A study is performed to compare the mean resting pulse rate of adult subjects who regularly exercise to the mean resting pulse rate of those who do not regularly exercise.

Group           n    Mean   Std. dev.
Exercisers      29   66     8.6
Nonexercisers   31   75     9.0

This is an example of when to use the two-sample t procedures.

Conditions for Comparing Two Means
We have two independent SRSs from two distinct populations; that is, one sample has no influence on the other (matching violates independence), and we measure the same variable for both samples. Both populations are Normally distributed, and the means and standard deviations of the populations are unknown. In practice, it is enough that the distributions have similar shapes and that the data have no strong outliers.

Two-Sample t Procedures
To perform inference on the difference of two means (μ1 − μ2), we need the standard deviation of the observed difference $\bar{x}_1 - \bar{x}_2$:
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Two-Sample t Confidence Interval
Draw an SRS of size n1 from a Normal population with unknown mean μ1, and draw an independent SRS of size n2 from another Normal population with unknown mean μ2. A level C confidence interval for μ1 − μ2 is
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where t* is the critical value for confidence level C for the t density curve. The degrees of freedom are equal to the smaller of n1 − 1 and n2 − 1.

Case Study: Exercise and Pulse Rates
Find a 95% confidence interval for the difference in population means (nonexercisers minus exercisers). With df = 28, Table C gives t* = 2.048:
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (75 - 66) \pm 2.048 \sqrt{\frac{(9.0)^2}{31} + \frac{(8.6)^2}{29}} = 9 \pm 4.65,$$
that is, 4.35 to 13.65.
“We are 95% confident that the difference in mean resting pulse rates (nonexercisers minus exercisers) is between 4.35 and 13.65 beats per minute.”
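The pulse-rate interval is simple arithmetic once t* is known. A minimal sketch, assuming the summary statistics from the table and hardcoding t* = 2.048 from Table C (df = 28); `two_sample_se` and `two_sample_interval` are hypothetical helper names:

```python
import math


def two_sample_se(s1: float, n1: int, s2: float, n2: int) -> float:
    """Standard error of xbar1 - xbar2: sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def two_sample_interval(x1, s1, n1, x2, s2, n2, t_star):
    """Return (xbar1 - xbar2) +/- t* * SE as a confidence interval for mu1 - mu2."""
    margin = t_star * two_sample_se(s1, n1, s2, n2)
    diff = x1 - x2
    return diff - margin, diff + margin


if __name__ == "__main__":
    # Group 1 = nonexercisers, group 2 = exercisers (summary data from the table).
    n1, n2 = 31, 29
    df = min(n1 - 1, n2 - 1)  # conservative df used in the text
    t_star = 2.048            # Table C value for df = 28 and C = 95% (hardcoded)
    lo, hi = two_sample_interval(75, 9.0, n1, 66, 8.6, n2, t_star)
    print(f"df = {df}, 95% CI: {lo:.2f} to {hi:.2f}")  # df = 28, 95% CI: 4.35 to 13.65
```

Keeping t* as an argument makes the conservative degrees-of-freedom choice explicit; software that uses more degrees of freedom would pass a slightly smaller t*.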
Two-Sample t Significance Tests
Draw an SRS of size n1 from a Normal population with unknown mean μ1, and draw an independent SRS of size n2 from another Normal population with unknown mean μ2. To test the hypothesis H0: μ1 = μ2, the test statistic is
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
where the second form uses μ1 − μ2 = 0 under H0. Use P-values for the t density curve. The degrees of freedom are equal to the smaller of n1 − 1 and n2 − 1.

P-value for Testing Two Means
Ha: μ1 > μ2. The P-value is the probability of getting a value as large as or larger than the observed test statistic t.
Ha: μ1 < μ2. The P-value is the probability of getting a value as small as or smaller than the observed test statistic t.
Ha: μ1 ≠ μ2. The P-value is two times the probability of getting a value as large as or larger than the absolute value of the observed test statistic t.

Case Study: Exercise and Pulse Rates
Is the mean resting pulse rate of adult subjects who regularly exercise different from the mean resting pulse rate of those who do not regularly exercise?
Null: The mean resting pulse rate of adult subjects who regularly exercise is the same as the mean resting pulse rate of those who do not regularly exercise. [H0: μ1 = μ2]
Alt: The mean resting pulse rate of adult subjects who regularly exercise is different from the mean resting pulse rate of those who do not regularly exercise. [Ha: μ1 ≠ μ2]
Degrees of freedom = 28 (the smaller of 31 − 1 and 29 − 1).

1. Hypotheses: H0: μ1 = μ2 versus Ha: μ1 ≠ μ2
2. Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{75 - 66}{\sqrt{\frac{(9.0)^2}{31} + \frac{(8.6)^2}{29}}} = 3.961$$
3. P-value: P-value = 2P(T > 3.961) = 0.000207 (using a computer). From Table C, the P-value is smaller than 2(0.0005) = 0.0010, since t = 3.961 is greater than t* = 3.674 (upper-tail area 0.0005).
4. Conclusion: Since the P-value is smaller than α = 0.001, there is very strong evidence that the mean resting pulse rates differ between the two populations (nonexercisers and exercisers).

Avoid Inference About Standard Deviations
There are methods for inference about the standard deviations of Normal populations, and most software packages have methods for comparing standard deviations. However, these methods are extremely sensitive to non-Normal distributions, and this lack of robustness does not improve in large samples. Hence inference about population standard deviations is not recommended in basic statistical practice.
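The pulse-rate significance test above can be checked the same way. This sketch reuses the standard-library t tail computation (repeated so the block is self-contained); `two_sample_t_test` is a hypothetical name, and it assumes the conservative df = min(n1 − 1, n2 − 1) used in these notes. With df = 28 the two-sided P-value comes out somewhat larger than the slide's software value of 0.000207, though still below 0.001. Software often uses a larger, data-based degrees of freedom (the Welch-Satterthwaite approximation), which gives a smaller P-value and may account for the difference.

```python
import math


def t_upper_tail(t: float, df: float, steps: int = 2000) -> float:
    """Return P(T > t) for T ~ t(df), using u = sqrt(df) * tan(theta) to turn the
    tail into a finite integral of cos(theta)**(df - 1), then Simpson's rule."""
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    a, b = math.atan(t / math.sqrt(df)), math.pi / 2
    h = (b - a) / steps  # steps must be even
    total = math.cos(a) ** (df - 1) + math.cos(b) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(a + i * h) ** (df - 1)
    return k * total * h / 3


def two_sample_t_test(x1, s1, n1, x2, s2, n2, alternative="two-sided"):
    """Two-sample t test of H0: mu1 = mu2 using the conservative
    df = min(n1 - 1, n2 - 1); returns (t, df, p_value)."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t = (x1 - x2) / se
    df = min(n1 - 1, n2 - 1)
    if alternative == "greater":    # Ha: mu1 > mu2
        p = t_upper_tail(t, df)
    elif alternative == "less":     # Ha: mu1 < mu2
        p = t_upper_tail(-t, df)
    else:                           # Ha: mu1 != mu2
        p = 2 * t_upper_tail(abs(t), df)
    return t, df, p


if __name__ == "__main__":
    # Group 1 = nonexercisers (n = 31, mean 75, s = 9.0); group 2 = exercisers.
    t, df, p = two_sample_t_test(75, 9.0, 31, 66, 8.6, 29)
    print(f"t = {t:.3f}, df = {df}, two-sided P = {p:.6f}")
```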