Chapter 23: Inferences about means

Example: The expression "mad as a hatter" stems from brain damage sustained by 19th-century hatmakers who used mercury to soften felt. Mercury contamination of fish is a substantial concern, and in 1969 the Food and Drug Administration set a standard for the maximum safe level of mercury at .5 micrograms per gram (µg/g), but soon raised the standard to 1.0 µg/g.¹ State guidelines are often set at .5 µg/g. A question that initiates further discussion and investigation is whether fish taken from lakes are safe (in the sense of containing on average less than .5 µg/g of mercury). The data shown in the figure are the average mercury concentrations of largemouth bass taken from Florida lakes.²,³

[Figure: dotplot of average Hg concentration (µg/g), one point per lake, for the sampled Florida lakes (Alligator through Yale); concentrations range from about 0.0 to beyond 1.0 µg/g.]

The question of interest is whether the average mercury concentration in largemouth bass taken from Florida lakes is less than .5 µg/g. The data shown above are measurements on a quantitative variable, whereas the inferential methods that have been developed so far apply to situations involving one or two Bernoulli random variables. The general approach to inference is no different in this situation, though some of the specifics differ. The population parameter for this problem is the mean largemouth bass mercury concentration across all Florida lakes containing largemouth bass. The parameter is denoted by µ, and the usual estimator of µ is the sample mean ȳ. The objectives in this chapter are to construct a confidence interval for µ, and to test the hypotheses H0 : µ ≤ .5 versus Ha : µ > .5.
1 One explanation for the change is that swordfish could not be sold, since the amount of methyl mercury in swordfish was on average 1.0 µg/g.
2 On average, 13 fish were taken from each lake and measured for mercury concentration.
3 Lange TL, Royals HE, Connor LL (1993) Influence of water chemistry on mercury concentration in largemouth bass from Florida lakes. Trans Am Fish Soc 122:74-84.

Mathematical foundation for one-sample inferential methods involving µ

Once again, the theoretical foundation of the confidence interval (and hypothesis test) is the sampling distribution of a statistic, in this case ȳ. According to the Central Limit Theorem, ȳ is approximately normal in distribution when the sample size n is large.⁴ The mean, standard deviation, and the approximate distribution are specified by writing

    ȳ ·∼ N(µ, σ/√n).    (1)

A 100(1 − 2α)% confidence interval for µ is derived by inverting the probability statement

    1 − 2α = P(−z*α ≤ (ȳ − µ)/(σ/√n) ≤ z*α),    (2)

where z*α is a critical value from the standard normal table satisfying 1 − 2α = P(−z*α ≤ Z ≤ z*α). For example, .95 = P(−1.96 ≤ Z ≤ 1.96) ⇒ z*α = 1.96. Inverting statement (2) results in

    1 − 2α = P(ȳ − z*α σ/√n ≤ µ ≤ ȳ + z*α σ/√n).

If σ were known, then the confidence interval would be

    ȳ ± z*α σ/√n.

However, σ is rarely known and must be estimated by the sample standard deviation s.⁵ When σ is replaced by s, the interval needs to be wider to account for the uncertainty in estimating σ. Furthermore, using s in place of σ degrades the accuracy of the normal approximation. Consequently, the Central Limit Theorem does not apply to

    (ȳ − µ)/(s/√n)    (3)

unless the sample size is relatively large. In fact, the standardized mean shown in formula (3) has a t-distribution with n − 1 degrees of freedom⁶ rather than a normal distribution. t-distributions will be used in place of the standard normal distribution in the construction of the confidence interval for µ.

4 The population distribution does not need to be normal for the Central Limit Theorem to hold.
5 The sample standard deviation is

    s = √[ (1/(n − 1)) Σᵢ₌₁ⁿ (yᵢ − ȳ)² ].

6 There are qualifications to this statement which will be ignored for the time being.

The t-distributions are a family of distributions indexed by a single parameter called the degrees of freedom (d.f.).⁷ The figure below shows some t-distributions; the vertical lines mark the 10th percentiles of the distributions.

[Figure: density curves of t-distributions with d.f. = 1, 2, 5, 15, and 100, together with the N(0,1) density, plotted over roughly −4 to 4.]

Properties of the t-distributions:

1. t-distributions are symmetric about zero and have a shape similar to the standard normal distribution.
2. The spread decreases as the degrees of freedom increases.
3. The t-distributions have longer tails than the standard normal distribution.
4. The degrees of freedom is the sole parameter.
5. For large values of n, the t-distribution is indistinguishable from the standard normal distribution.

Confidence intervals for µ are based on the distributional result

    (ȳ − µ)/(s/√n) ∼ tₙ₋₁,

provided that ȳ is the mean of a random sample drawn from a normal distribution with mean µ and standard deviation σ. When the sampled distribution is not normal but moderately symmetric, the distribution is approximately tₙ₋₁, and the confidence interval presented below is approximately correct. The confidence interval is

    ȳ ± t* σ̂(ȳ) = ȳ ± t* s/√n,

where t* is the critical value from a t-distribution with n − 1 degrees of freedom.

7 For now, d.f. = n − 1.

Critical values and tail probabilities from the t-distribution

Table T in De Veaux et al. provides some critical values from the t-distributions. A less confusing t-table can be retrieved from the class webpage (ttable.pdf). In the pdf table, the tabled values are percentiles corresponding to a right-tail area, or upper-tail probability.
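Right-tail areas of the t-distributions can be computed directly rather than read from a table. The sketch below is illustrative only: it uses just the Python standard library, and the helper names t_pdf and t_upper_tail are my own. It integrates the t density numerically and compares P(T ≥ 2) across several of the degrees of freedom shown in the figure with the standard normal tail, illustrating properties 3 and 5 above. (d.f. = 1 is omitted because its very heavy tail would require a much wider integration range.)

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, upper=200.0, steps=4000):
    """Right-tail area P(T >= t), by Simpson's rule on [t, upper]."""
    h = (upper - t) / steps
    total = t_pdf(t, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return total * h / 3

tails = {df: t_upper_tail(2.0, df) for df in (2, 5, 15, 100)}
z_tail = 1 - NormalDist().cdf(2.0)     # standard normal right tail

for df, p in tails.items():
    print(f"P(t_{df} >= 2) = {p:.4f}")  # shrinks toward the normal tail as df grows
print(f"P(Z >= 2)     = {z_tail:.4f}")
```

Every t tail exceeds the normal tail (longer tails), and by d.f. = 100 the two are nearly indistinguishable.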
Recall that the confidence level of a confidence interval is defined to be 100(1 − 2α)%. Given d.f. = n − 1, it is necessary to determine the critical value t* satisfying P(tₙ₋₁ ≥ t*) = α. Look on the line (row) corresponding to the degrees of freedom, then locate the column headed by α. The intersection of that row and column contains the critical value. For example, the critical value t* for a 90% confidence interval with 14 degrees of freedom is 1.761.

As another example, to find t* for a 95% confidence interval when n = 100, note that d.f. = n − 1 = 99. There is no line for 99 degrees of freedom. The convention is to use the next smaller tabled degrees of freedom; specifically, use the line for d.f. = 80. Then t* = 1.990 is used as the critical value. The (almost) exact critical value (t* = 1.9842) can be obtained from StatCrunch by selecting Stat/Calculators/T, setting d.f. = 99 and Prob(x <=) = .975, and clicking the Compute button. Alternatively, the R command qt(p=.975, df=99) provides the same critical value. The pdf t-table line headed by z* provides critical values from the standard normal distribution.

Example: In 1926, Albert Michelson made the first truly accurate measurement of the speed of light. His special expertise was in precision optical technology, and he had already been awarded a Nobel Prize for the instrumentation involved in the Michelson-Morley experiment.⁸,⁹ In 1972, a group at the National Bureau of Standards (NBS) in Boulder, Colorado determined the speed of light in vacuum to be 299,792,456.2 ± 1.1 m/s. Does a confidence interval derived from Michelson's data contain the NBS value?

8 The Michelson data analyzed herein were drawn from Ernest N. Dorsey's 1944 paper "The Velocity of Light," Transactions of the American Philosophical Society, Volume 34, Part 1, pages 1-110, Table 22.
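The critical values quoted above (t* = 1.761 for d.f. = 14 and t* = 1.9842 for d.f. = 99) can be reproduced without a table, StatCrunch, or R. A minimal sketch, standard library only; the function names are my own, and the quantile is found by bisection on a numerically integrated right-tail area, which is a stand-in for R's qt, not its actual method.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, upper=200.0, steps=4000):
    """Right-tail area P(T >= t), by Simpson's rule on [t, upper]."""
    h = (upper - t) / steps
    total = t_pdf(t, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return total * h / 3

def t_star(alpha, df):
    """Critical value t* with P(T >= t*) = alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_star(0.05, 14), 3))    # 90% CI, d.f. = 14: 1.761
print(round(t_star(0.025, 99), 4))   # 95% CI, d.f. = 99: 1.9842
```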
9 Morley, Michelson's early partner, said he feared that Michelson had suffered a softening of the brain early in his career, after Michelson was hospitalized for exhaustion in the 1880s. Michelson's first wife tried to have the scientist committed. His daughter wrote that her father often worked for days without sleeping or eating, that he sat alone at meals so his thinking would not be disturbed, and that in turn he could be arrogant, distant, imperious and rude. The physicist also suffered from recurring nightmares, including one in which he rode a motorcycle up an endless hill.

Michelson made n = 100 measurements of the speed of light. The data are shown in the histogram below. The sample statistics are ȳ = 299.8524 × 10⁶ m/s and s = .0790 × 10⁶ m/s. The standard error of ȳ is s/√n = .0790/√100 = .00790. From R, t* = 1.9842, and the confidence interval for the speed of light (in 10⁶ m/s) is

    ȳ ± t* s/√n = 299.8524 ± 1.9842 × .00790 = [299.8367, 299.8681].

[Figure: histogram of Michelson's 100 measured speeds, ranging from about 299.6 × 10⁶ to 300.1 × 10⁶ m/s.]

95% of all such confidence intervals constructed using this procedure will contain the population mean, yet the current best estimate of the speed of light is not contained in the interval. It appears that Michelson's method was biased upwards: it produced estimates of the true speed that are greater than the current best estimate (the NBS value, 299.7925 × 10⁶ m/s, lies below the lower limit of the interval).

The one-sample t-test

The one-sample t-test is used when there is one population. The variable of interest Y is quantitative, and the parameter of interest is the population mean E(Y) = µ.

Hypotheses: Ha takes one of three forms:

    Ha : µ ≠ µ0    (2-sided alternative)
    Ha : µ < µ0    (1-sided alternatives)
    Ha : µ > µ0

where µ0 is a numerical value specific to the problem. The generic form of H0 is

    H0 : µ = µ0,

though sometimes it is restated as H0 : µ ≤ µ0 or H0 : µ ≥ µ0.
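The interval arithmetic above can be checked in a few lines. This is a minimal sketch (the variable names are mine); the summary statistics and t* come straight from the text, in units of 10⁶ m/s.

```python
import math

n = 100
ybar = 299.8524          # Michelson sample mean, 10^6 m/s
s = 0.0790               # sample standard deviation, 10^6 m/s
t_star = 1.9842          # t critical value, d.f. = 99
nbs = 299.7924562        # 1972 NBS value, 10^6 m/s

se = s / math.sqrt(n)    # standard error of the sample mean
lo = ybar - t_star * se
hi = ybar + t_star * se

print(round(se, 5), round(lo, 4), round(hi, 4))   # 0.0079 299.8367 299.8681
print(lo <= nbs <= hi)                            # False: NBS value lies below the interval
```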
For example, in the largemouth bass mercury contamination example, the hypotheses to be tested are

    H0 : µ ≤ .5 versus Ha : µ > .5,

where µ denotes the mean mercury contamination of largemouth bass in Florida lakes. Changing the null hypothesis from H0 : µ = µ0 to H0 : µ ≤ µ0, for instance, does not affect the test or the p-value, since µ0 is used in the numerator of the t-statistic.

The test statistic is

    t = (ȳ − µ0)/(s/√n),

where µ0 is the null hypothesis value of µ. For this problem, µ0 = .5. The sample quantities are n = 53, ȳ = .527, and s = .341 ⇒ s/√n = .0468. Plugging in the sample values yields

    t = (ȳ − µ0)/(s/√n) = (.527 − .5)/.0468 = .576.

The strength of evidence against H0 and in favor of Ha is measured by the p-value. The p-value is the probability of observing a sample mean as extreme or more extreme (consistent with Ha and contradicting H0) than the observed result, supposing that the null hypothesis is correct. For example, in the largemouth bass mercury contamination example, values of the sample mean that are as or more extreme than the observed value are values of at least .527 µg/g, because Ha : µ > .5 implies that large values of ȳ contradict H0 and support Ha. Consequently, the p-value is P(ȳ ≥ .527 | µ = .5). This form of the p-value is transformed by standardizing ȳ, specifically, by computing the t-statistic as was done above:

    p-value = P(ȳ ≥ .527 | µ = .5) = P(t₅₂ ≥ .576) = .283.

The probability P(t₅₂ ≥ .576) can be computed using R or StatCrunch. Without statistical software, an approximate p-value can be obtained from the t-table. Find the line with the appropriate degrees of freedom, or the line for the largest tabled degrees of freedom that is less than n − 1 = 52 (in this case, use the line for d.f. = 50). Find the two critical values that bracket the value of the t-statistic.
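The t-statistic and p-value above can be reproduced numerically. A minimal sketch using only the standard library: the helpers t_pdf and t_upper_tail (my own names, not from the text) integrate the t₅₂ density to obtain the right-tail area, playing the role of R or StatCrunch here.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, upper=200.0, steps=4000):
    """Right-tail area P(T >= t), by Simpson's rule on [t, upper]."""
    h = (upper - t) / steps
    total = t_pdf(t, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return total * h / 3

# summary statistics for the largemouth bass sample (from the text)
n, ybar, s, mu0 = 53, 0.527, 0.341, 0.5

se = s / math.sqrt(n)
tstat = (ybar - mu0) / se
pval = t_upper_tail(tstat, n - 1)     # one-sided: Ha is mu > .5

print(round(se, 4), round(tstat, 3), round(pval, 3))   # 0.0468 0.576 0.283
```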
If the t-statistic is too small to be bracketed (as it is in this case), then the p-value is larger than the largest probability in the table (the largest probability is .25); report p-value > .25. If the t-statistic is too large to be bracketed, then the p-value is smaller than the smallest probability (the smallest probability is .0005); report p-value < .0005. If Ha is two-sided, then the bracketing lower and upper probabilities are doubled.

Conclusion

Returning to the problem, there is insufficient evidence to conclude that the mean mercury concentration of largemouth bass in Florida lakes is greater than .5 µg/g, because p-value = .283. On the other hand, some particular lakes appear to contain unacceptably large mercury concentrations in the largemouth bass.

The guide to interpreting p-values (the same as was presented in Chapter 20) is:

    P-value                    Strength
    less than .001             very strong evidence
    .001 ≤ p-value < .01       strong evidence
    .01 ≤ p-value < .05        moderate evidence
    .05 ≤ p-value < .10        some evidence
    greater than .10           no evidence

Conditions for t-procedures: The inferential procedures discussed above are accurate provided that the following conditions are met:

1. Independence: The data are a random sample (or at least a representative sample) of the population, and the 10% condition holds (n is less than 10% of the population size).

2. Normality: The data have been sampled from a normal population. t-procedures are robust with respect to the normality condition.¹⁰ Larger sample sizes increase robustness.

Outliers: If outliers are present in the data, then it cannot be assumed that the population distribution is normal. The effect of outliers on confidence intervals is to increase the width relative to the width without the outliers. A wider interval is more likely to bracket µ, so some outliers are tolerable. Outliers affect hypothesis tests by reducing the magnitude of the t-statistic and increasing the p-value relative to the p-value without the outliers. The effect of outliers on an accept/reject decision rule is to reduce the probability of a Type I error. In general, t-procedures are described as conservative (or safe) when there are outliers.

10 Robust procedures are procedures that are not sensitive to mild or moderate deviations from the stated conditions and yield accurate results in the presence of moderate deviations.

Guidelines: Small sample sizes demand greater adherence to normality; specifically,

• If n < 15, use t-procedures only if the sample distribution is without skewness or outliers.
• If 15 ≤ n < 40, t-procedures can be used reliably if the sample distribution is roughly normal (only mild skewness or mild outliers are present).
• If n ≥ 40, t-procedures can be used reliably even for highly skewed populations with more extreme outliers.
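The three guidelines above can be summarized as a small decision helper. This is a sketch only: the function name t_procedures_ok and the three-level coding of skewness and outliers are my own, not from the text.

```python
def t_procedures_ok(n, skew="none", outliers="none"):
    """Screen a sample against the chapter's sample-size guidelines.

    skew and outliers each take "none", "mild", or "strong".
    """
    level = {"none": 0, "mild": 1, "strong": 2}
    worst = max(level[skew], level[outliers])
    if n >= 40:
        return True          # reliable even with heavy skewness and outliers
    if n >= 15:
        return worst <= 1    # roughly normal: only mild deviations tolerated
    return worst == 0        # n < 15: no skewness or outliers allowed

print(t_procedures_ok(53, skew="mild"))           # True: e.g. the bass sample, n = 53
print(t_procedures_ok(20, outliers="strong"))     # False
print(t_procedures_ok(10, skew="mild"))           # False
```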