Graph of the Day [figure]

Lecture 9, Chapter 17. Inference for a population mean (σ unknown)

Objectives (PSLS Chapter 17): inference for the mean of one population (σ unknown)
• Know which sampling distribution to use when σ is unknown
• Know the properties of t distributions
• Be able to apply Student's t-tests
• Be able to use confidence intervals based on the t distribution
• Know how to adapt t procedures for matched pairs designs
• Recognize the limits of robustness for t-tests

Motivating examples: sweetening colas and Guinness beer
How is the sweetness of a cola drink affected by storage? The sweetness loss due to 1 year of storage was evaluated by 10 professional tasters, who compared the sweetness before and after storage:

Taster           1    2    3    4    5     6    7     8    9    10
Sweetness loss  2.0  0.4  0.7  2.0  −0.4  2.2  −1.3  1.2  1.1  2.3
(loss = before − after; a positive value is a loss)

We want to test whether storage results in a loss of sweetness. This can be translated into a statistical alternative hypothesis (Ha), and we can look for evidence against the null hypothesis of no loss of sweetness:
  H0: μ = 0 versus Ha: μ ≠ 0
We are familiar with such tests, except that here we do not know the value of the population parameter σ.

Motivating examples: sweetening colas and Guinness beer (cont.)
• In 1908, Guinness biochemist William Gosset developed the t-test.
• It was used as a means for comparing small samples of beer and beer ingredients, and it may have been applied to beer quality control and process/recipe development.
• It should really be called the Gosset t-test: Gosset published it under a pseudonym ("Student") because Guinness did not want competitors to know they were using statistics to improve and maintain the consistency of their product.
• While Gosset's t-distribution may look at first like only a minor tweak, those small adjustments are so important that engineers and scientists now recognize the t-distribution as essential whenever σ is unknown.

What to do when σ is unknown
Use s. The sample standard deviation s provides an estimate of the population standard deviation σ.
Z-scores are values on a standard deviation scale. You can't calculate a z-score without knowing the standard deviation, but you can estimate one if you have an estimate of the standard deviation.
Just as with the sample mean, s is subject to sampling variation: in any given sample, s will be smaller or larger than σ, so the estimated z-scores will be under- or overestimated. Larger samples give more reliable estimates of σ.

The t distributions
We take one random sample of size n from a Normal population N(μ, σ).
When σ is known, the sampling distribution of x̄ is Normal N(μ, σ/√n), and the statistic
  z = (x̄ − μ) / (σ/√n)
follows the standard Normal distribution N(0, 1).
When σ is estimated by the sample standard deviation s, the statistic
  t = (x̄ − μ) / (s/√n)
follows a t distribution with n − 1 degrees of freedom.

The t and z sampling distributions
[Figure: standard Normal curve and a t distribution (n = 15), both centered at 0 with unit scale σ/√n = 1.]
When n is very large, s is a very good estimate of σ, and the corresponding t distributions are very close to the Normal distribution. The t distributions become wider (heavier-tailed) for smaller sample sizes, reflecting the lack of precision in estimating σ from s.

Using the t-distribution tables
Table C provides the t-values and corresponding confidence ranges for the t distribution. How large does n have to be before the t distribution approximates a z distribution? Note: "degrees of freedom" equals n − 1.
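One way to see how quickly the t distribution approaches the Normal is to compare critical values directly. This is a minimal sketch, not part of the lecture, assuming Python with scipy is available; it prints the two-sided 95% critical value t* for several degrees of freedom next to the Normal value z* ≈ 1.96.

```python
# Sketch: how fast does t* approach z* as df grows? (assumes scipy is installed)
from scipy.stats import norm, t

z_star = norm.ppf(0.975)  # two-sided 95% critical value for the standard Normal
print(f"z* = {z_star:.3f}")

for df in [2, 5, 9, 14, 29, 99, 999]:
    t_star = t.ppf(0.975, df)  # upper 2.5% point of the t distribution with df degrees of freedom
    print(f"df = {df:4d}: t* = {t_star:.3f} (excess over z*: {t_star - z_star:.3f})")
```

Under these assumptions, t* is within about 0.09 of z* by df = 29 and within about 0.03 by df = 99, which is why "very large n" behaves essentially like the known-σ z case.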
Table C
When σ is unknown, we use a t distribution with n − 1 degrees of freedom (df), and the test statistic is
  t = (x̄ − μ) / (s/√n).
When σ is known, we use the Normal distribution and z. Table C shows the z-values and t-values corresponding to landmark P-values/confidence levels.

Standard deviation versus standard error
For a sample of size n, the sample standard deviation s is
  s = √[ Σ(xᵢ − x̄)² / (n − 1) ],
where n − 1 is the "degrees of freedom."
The value s/√n is called the standard error of the mean, SEM. Scientists often present their sample results as mean ± SEM.
Example: A medical study examined the effect of a new medication on seated systolic blood pressure. The results, presented as mean ± SEM for 25 patients, are 113.5 ± 8.9. What is the standard deviation s of the sample data?
  SEM = s/√n  ⇔  s = SEM × √n, so s = 8.9 × √25 = 44.5.

Why n − 1?
If you know x̄, then once you calculate the first n − 1 squared deviations, you know the last one without having to calculate it.
Related example: if n = 5, x̄ = 0, and the first four observations are −2, −1, 0, and 1, then you know the last observation: it must be 2 for all of the given information to be consistent. That last observation is not free to vary, so in this example there are only 4 degrees of freedom.
What happens if we calculate the sample standard deviation using n instead of n − 1 in the denominator?

The one-sample t test
As before, a test of hypotheses requires a few steps:
1. Identify the biological hypothesis.
2. Translate it into a statistical null hypothesis (H0).
3. Choose a significance level α.
4. Calculate t and its degrees of freedom.
5. Find the area under the curve with Table C or software.
6. Estimate the difference and state the P-value.
7. Make a conclusion about H0.
8. Make a biological conclusion.

We draw a random sample of size n from an N(μ, σ) population. To test H0: μ = μ0 when σ is estimated by s, we use the statistic
  t = (x̄ − μ0) / (s/√n),
which has a t distribution with df = n − 1. The resulting t test is robust to deviations from Normality as long as the sample size is large enough.
The P-value is the probability, if H0 were true, of randomly drawing a sample like the one obtained or more extreme in the direction of Ha. The test can be one-sided (one-tailed) or two-sided (two-tailed).
Using Table C: for Ha: μ ≠ μ0, if n = 10 (df = 9) and t = 2.70, then 2.398 < 2.70 < 2.821, so 0.02 < P-value < 0.04.

Sweetening colas (cont.)
Is there evidence that storage results in sweetness loss for the new cola recipe at the 0.05 level of significance (α = 5%)?

Taster           1    2    3    4    5     6    7     8    9    10
Sweetness loss  2.0  0.4  0.7  2.0  −0.4  2.2  −1.3  1.2  1.1  2.3
Average: x̄ = 1.02;  standard deviation: s = 1.196

H0: μ = 0 versus Ha: μ ≠ 0
  t = (x̄ − μ0) / (s/√n) = (1.02 − 0) / (1.196/√10) = 2.70,  df = n − 1 = 9
2.398 < t = 2.70 < 2.821, so 0.02 < P < 0.04. Since P < α, the result is significant (though only mildly so): there is a significant loss of sweetness, on average, following storage.
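As a check on the hand calculation above, here is a minimal sketch of the same one-sample t test in Python, assuming scipy is available; it is not part of the lecture. scipy.stats.ttest_1samp performs the two-sided test of H0: μ = 0 by default.

```python
# Sketch: one-sample t test on the cola sweetness-loss data (assumes scipy is installed)
from scipy.stats import ttest_1samp

sweetness_loss = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]

# Two-sided test of H0: mu = 0 versus Ha: mu != 0
result = ttest_1samp(sweetness_loss, popmean=0)
print(f"t = {result.statistic:.2f}, P = {result.pvalue:.4f}")
# Expected: t = 2.70 with a P-value between 0.02 and 0.04,
# matching the Table C bracket derived above.
```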
Confidence intervals
A confidence interval is a range of values computed so that, with probability (confidence level) C, it contains the true population parameter. We have a set of data from a population with both μ and σ unknown. We use x̄ to estimate μ and s to estimate σ, using a t distribution with df = n − 1.
C is the area between −t* and t*. We find t* in the df = n − 1 row of Table C. The margin of error m is
  m = t* × s/√n,
and the confidence interval is x̄ ± m.
[Figure: t distribution curve with central area C between −t* and t*.]

Sweetening colas (cont.)
What is the true population mean sweetness loss after storage (positive value = loss)? We want 90% confidence.

Taster           1    2    3    4    5     6    7     8    9    10
Sweetness loss  2.0  0.4  0.7  2.0  −0.4  2.2  −1.3  1.2  1.1  2.3
Mean: x̄ = 1.02;  standard deviation: s = 1.196

  m = t* × s/√n = 1.833 × 1.196/√10 ≈ 0.69
  x̄ ± m = 1.02 ± 0.69, i.e., 0.33 to 1.71
With 90% confidence, the true population mean sweetness loss is somewhere between 0.33 and 1.71.

Matched pairs t procedures
Sometimes we want to compare treatments or conditions at the individual level, so the data from the pairs of observations are not independent. Example study designs where the individuals in one sample are related to those in the other sample:
• Pre-test and post-test studies look at data collected on the same sample elements before and after some experiment is performed.
• Twin studies often try to sort out the influence of genetic factors by comparing a variable between sets of twins.
• Using people matched for age, sex, and education in social studies allows us to cancel out the effect of these potential lurking variables.
In these cases, we use the paired data to test for the difference in the two population means. The variable tested becomes x̄diff, the average difference, and we test
  H0: μdifference = 0 versus Ha: μdifference ≠ 0.
Conceptually, this is just like a test for one population mean.

Sweetening colas (revisited)
The sweetness loss due to storage was evaluated by 10 professional tasters (comparing the sweetness before and after storage):
Taster           1    2    3    4    5     6    7     8    9    10
Sweetness loss  2.0  0.4  0.7  2.0  −0.4  2.2  −1.3  1.2  1.1  2.3
We want to test if storage results in a loss of sweetness, thus H0: μ = 0 versus Ha: μ ≠ 0.
Although the text did not mention it explicitly, this is a pre-/post-test design, and the variable is the difference in cola sweetness before and after storage. A matched pairs test of significance is indeed just like a one-sample test.

Does lack of caffeine increase depression?
Randomly selected caffeine-dependent individuals were deprived of all caffeine-rich foods and assigned to receive daily pills. At one time the pills contained caffeine; at another time they were a placebo. Depression was assessed quantitatively (higher scores represent greater depression).

Subject   Depression with caffeine   Depression with placebo   Difference (placebo − caffeine)
1                 5                         16                        11
2                 5                         23                        18
3                 4                          5                         1
4                 3                          7                         4
5                 8                         14                         6
6                 5                         24                        19
7                 0                          6                         6
8                 0                          3                         3
9                 2                         15                        13
10               11                         12                         1
11                1                          0                        −1

This is a matched pairs design with two data points for each subject, so we compute a new variable, "Difference" = placebo minus caffeine. With 11 "difference" points, df = n − 1 = 10. We find:
  x̄diff = 7.36;  sdiff = 6.92;  SEMdiff = sdiff/√n = 6.92/√11 = 2.086
We test H0: μdiff = 0 versus Ha: μdiff ≠ 0:
  t = (x̄diff − 0) / SEMdiff = 7.36 / 2.086 = 3.53
For df = 10, 3.169 < t = 3.53 < 3.581, so the one-tail area is between 0.0025 and 0.005, giving a two-sided P-value between 0.005 and 0.01 (software gives P ≈ 0.005).
Caffeine deprivation causes a significant increase in depression (P < 0.01, n = 11) [assuming the P-value is valid].
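A minimal sketch of the same matched pairs analysis in Python follows, assuming scipy is available; it is not part of the lecture. scipy.stats.ttest_rel takes the two paired samples directly and is equivalent to a one-sample t test on the differences, which the sketch also demonstrates.

```python
# Sketch: matched pairs t test on the caffeine-deprivation data (assumes scipy is installed)
from scipy.stats import ttest_1samp, ttest_rel

depression_caffeine = [5, 5, 4, 3, 8, 5, 0, 0, 2, 11, 1]
depression_placebo = [16, 23, 5, 7, 14, 24, 6, 3, 15, 12, 0]

# Paired test: handles the pairing internally
paired = ttest_rel(depression_placebo, depression_caffeine)
print(f"paired t = {paired.statistic:.2f}, P = {paired.pvalue:.4f}")

# Same result via the one-sample route on the placebo - caffeine differences
diffs = [p - c for p, c in zip(depression_placebo, depression_caffeine)]
one_sample = ttest_1samp(diffs, popmean=0)
print(f"one-sample t = {one_sample.statistic:.2f}, P = {one_sample.pvalue:.4f}")
# Expected: t = 3.53 with a two-sided P-value near 0.005, matching the hand calculation.
```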
Robustness
The t procedures are exactly correct when the population is exactly Normal, which is rare. The t procedures are robust to small deviations from Normality, but:
• The sample must be a random sample from the population.
• Parent populations that produce lots of outliers or are strongly skewed exert a strong influence on the mean, and therefore on the t procedures. Their impact diminishes as the sample size gets larger, because of the Central Limit Theorem.
As a guideline:
• When n < 15, the data must be close to Normal and without outliers.
• When 15 ≤ n < 40, mild skewness is acceptable, but not outliers.
• When n ≥ 40, the t statistic will be valid even with strong skewness.

Red wine, in moderation
Does drinking red wine in moderation increase blood polyphenol levels, thus maybe protecting against heart attacks? Nine randomly selected healthy men were assigned to drink half a bottle of red wine daily for two weeks. The percent change in their blood polyphenol levels was assessed:
  0.7  3.5  4.0  4.9  5.5  7.0  7.4  8.1  8.4
  x̄ = 5.5;  s = 2.517;  df = n − 1 = 8
[Figure: dotplot of the percent change in blood polyphenol level.]
Can we use a t inference procedure for this study? Discuss the assumptions.
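Since n = 9 falls under the n < 15 guideline above, a t procedure is only trustworthy if the data look close to Normal with no outliers. The following is my illustration of how one might check this informally and then proceed, not part of the lecture; it assumes Python with scipy. It prints a crude bar per observation and computes a 95% t confidence interval for the mean percent change.

```python
# Sketch: informal shape check and 95% t confidence interval for the red wine data
# (assumes scipy is installed; not part of the original lecture)
import statistics
from scipy.stats import t

polyphenol_change = [0.7, 3.5, 4.0, 4.9, 5.5, 7.0, 7.4, 8.1, 8.4]
n = len(polyphenol_change)
x_bar = statistics.mean(polyphenol_change)  # 5.5
s = statistics.stdev(polyphenol_change)     # sample sd with n - 1 denominator, about 2.517

# Crude horizontal bars, one per observation, to scan for outliers or strong skew
for x in sorted(polyphenol_change):
    print(f"{x:5.1f} | {'*' * round(2 * x)}")

# 95% t confidence interval for the mean: x_bar +/- t* s/sqrt(n), with df = n - 1
t_star = t.ppf(0.975, n - 1)                # about 2.306 for df = 8
margin = t_star * s / n ** 0.5
print(f"x_bar = {x_bar:.1f}, s = {s:.3f}, 95% CI: {x_bar - margin:.2f} to {x_bar + margin:.2f}")
# Expected: roughly 3.57 to 7.43
```

With a sample this small the check is necessarily informal; the point is simply that the n < 15 guideline asks us to look at the data before trusting the t procedure.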