Chapter 23: Inferences about means
Example: The expression "mad as a hatter" stems from brain damage sustained by 19th-century hatmakers who used mercury to soften felt. Mercury contamination of fish is a substantial concern, and in 1969, the Food and Drug Administration set a standard for the maximum safe level of mercury at .5 micrograms per gram but soon raised the standard to 1.0 mg/g.[1] State guidelines are often set at .5 mg/g. A question that initiates further discussion and investigation is whether fish taken from lakes are safe (in the sense of containing on average less than .5 mg/g of mercury). The data shown below are the average mercury concentrations[2] of largemouth bass taken from Florida lakes.[3]

[Figure: dotplot of average Hg concentration (mg/g), on a scale from 0.0 to 1.0, for largemouth bass from 53 Florida lakes, Alligator through Yale.]
The question of interest is whether the average mercury concentration in largemouth bass taken from Florida lakes is less than .5 mg/g.

The data shown above are measurements on a quantitative variable, whereas the inferential methods that have been developed so far apply to situations involving one or two Bernoulli random variables. The general approach to inference is no different in this situation, though some of the specifics differ.
The population parameter for this problem is the mean largemouth bass mercury concentration across all Florida lakes containing largemouth bass. The parameter is denoted by µ, and the usual estimator of µ is the sample mean ȳ. The objectives in this chapter are to construct a confidence interval for µ and to test the hypotheses

    H0 : µ ≤ .5 versus Ha : µ > .5.
[1] One explanation for the change is that swordfish could not be sold, since the amount of methyl mercury in swordfish on average was 1.0 microgram per gram.
[2] On average, 13 fish were taken from each lake and measured for mercury concentration level.
[3] Lange TL, Royals HE, Connor LL (1993) Influence of water chemistry on mercury concentration in largemouth bass from Florida lakes. Trans Am Fish Soc 122:74-84.
Mathematical foundation for one-sample inferential methods involving µ
Once again, the theoretical foundation of the confidence interval (and hypothesis test) is the sampling distribution of a statistic, in this case ȳ.

According to the Central Limit Theorem, ȳ is approximately normal in distribution when the sample size n is large.[4] The mean, standard deviation, and the approximate distribution are specified by writing

    ȳ ∼ N(µ, σ/√n).    (1)
A 100(1 − 2α)% confidence interval for µ is derived by inverting the probability statement

    1 − 2α = P( −zα* ≤ (ȳ − µ)/(σ/√n) ≤ zα* ),    (2)

where zα* is a critical value from the standard normal table satisfying 1 − 2α = P(−zα* ≤ Z ≤ zα*). For example, .95 = P(−1.96 ≤ Z ≤ 1.96) ⇒ zα* = 1.96. Inverting statement (2) results in

    1 − 2α = P( ȳ − zα* σ/√n ≤ µ ≤ ȳ + zα* σ/√n ).
If σ were known, then the confidence interval would be

    ȳ ± zα* σ/√n.

However, σ is rarely known and must be estimated by the sample standard deviation s.[5] When σ is replaced by s, the interval needs to be wider to account for the uncertainty in estimating σ. Furthermore, using s in place of σ degrades the accuracy of the normal approximation. Consequently, the Central Limit Theorem does not apply to

    (ȳ − µ)/(s/√n)    (3)

unless the sample size is relatively large. In fact, the standardized mean shown in formula (3) has a t distribution with n − 1 degrees of freedom[6] rather than a normal distribution. t distributions will be used in place of the standard normal distribution in the construction of the confidence interval for µ.

[4] The population distribution does not need to be normal for the Central Limit Theorem to hold.
[5] The sample standard deviation is s = √[ Σ (yᵢ − ȳ)² / (n − 1) ], the sum running over i = 1, …, n.
[6] There are qualifications to this statement which will be ignored for the time being.
[Figure: density curves of t-distributions with df = 1, 2, 5, 15, and 100, together with the N(0,1) density, plotted on the interval −4 to 4; vertical lines mark the 10th percentiles of the distributions.]

The t-distributions are a family of distributions indexed by a single parameter called the degrees of freedom (d.f.).[7] The figure above shows some t-distributions. The vertical lines mark the 10th percentiles of the distributions.

Properties of the t-distributions:

1. t-distributions are symmetric about zero and have a shape similar to the standard normal distribution.

2. The spread decreases as the degrees of freedom increases.

3. The t-distributions have longer tails than the standard normal distribution.

4. The degrees of freedom is the sole parameter.

5. For large values of n, the t-distribution is indistinguishable from the standard normal distribution.
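Properties 2 and 5 can be checked numerically. The sketch below (in Python, standard library only; the density formula is the standard one rather than one given in the text, and the function names are mine) evaluates the t density at zero for the degrees of freedom shown in the figure; the peaks rise toward the N(0,1) peak as the degrees of freedom grow.

```python
import math

def t_pdf(x, df):
    # density of the t-distribution with df degrees of freedom
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

normal_peak = 1 / math.sqrt(2 * math.pi)   # N(0,1) density at zero, ~0.3989
for df in (1, 2, 5, 15, 100):
    # peak height increases with df, approaching the normal peak
    print(df, round(t_pdf(0, df), 4))
```

A taller peak with the same total area corresponds to less spread, which is property 2; the df = 100 peak is already within about 0.001 of the normal peak, which is property 5.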
Confidence intervals for µ are based on the distributional result

    (ȳ − µ)/(s/√n) ∼ tₙ₋₁,

provided that ȳ is the mean of a random sample drawn from a normal distribution with mean µ and standard deviation σ. When the sampled distribution is not normal but moderately symmetric, the distribution is approximately tₙ₋₁, and the confidence interval presented below is approximately correct.

The confidence interval is

    ȳ ± t* σ̂(ȳ) = ȳ ± t* s/√n,

where t* is the critical value from a t-distribution with n − 1 degrees of freedom.

[7] For now, df = n − 1.
Critical values and tail probabilities from the t-distribution
Table T in De Veaux et al. provides some critical values from the t-distributions. A less-confusing t-table can be retrieved from the class webpage (ttable.pdf). In the pdf table, the tabled values are percentiles corresponding to a right-tail area, or upper-tail probability.

Recall that the confidence level of a confidence interval is defined to be 100(1 − 2α)%. Given d.f. = n − 1, it is necessary to determine the critical value t* satisfying P(tₙ₋₁ ≥ t*) = α. Look in the row corresponding to the degrees of freedom, then locate the column headed by α. The intersection of the row and column contains the critical value.
For example, the critical value t∗ for a 90% confidence interval with 14 degrees of freedom is 1.761.
For example, to find t* for a 95% confidence interval when n = 100, note that d.f. = n − 1 = 99. There is no line for 99 degrees of freedom. The convention is to use the next smaller tabled degrees of freedom; specifically, use the line for d.f. = 80. Then t* = 1.990 is used as the critical value.
The (almost) exact critical value (t* = 1.9842) can be obtained from StatCrunch by selecting Stat/Calculators/T, setting d.f. = 99 and Prob(x <=) = .975, and clicking the Compute button. Alternatively, the R command qt(p=.975, df=99) provides the same critical value.
The pdf t-table line headed by z ∗ provides critical values from the standard normal distribution.
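The critical values above can also be reproduced numerically rather than from a table. The sketch below (Python, standard library only; t_pdf, t_cdf, and t_crit are my names, and Simpson's rule plus bisection is one implementation choice among many) inverts the t CDF to find t*:

```python
import math

def t_pdf(x, df):
    # density of the t-distribution with df degrees of freedom
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    # P(T <= x) by Simpson's rule on [0, x], using symmetry about zero
    if x < 0:
        return 1 - t_cdf(-x, df, steps)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_crit(upper_tail, df):
    # critical value t* satisfying P(T >= t*) = upper_tail, by bisection
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - upper_tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_crit(0.05, 14), 3))    # 1.761, the 90% CI critical value above
print(round(t_crit(0.025, 99), 4))   # 1.9842, matching qt(p=.975, df=99)
```

The same t_cdf function also gives tail probabilities (p-values) directly, in place of the bracketing procedure described later in the chapter.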
Example: In 1926, Albert Michelson made the first truly accurate measurement of the speed of light. His special expertise was in precision optical technology, and he had already been awarded a Nobel Prize for the instrumentation involved in the Michelson-Morley experiment.[8,9] In 1972, a group at the National Bureau of Standards (NBS) in Boulder, Colorado determined the speed of light in vacuum to be 299,792,456.2 ± 1.1 m/s. Does a confidence interval derived from Michelson's data contain the NBS value?
[8] The Michelson data analyzed herein was drawn from Ernest N. Dorsey's 1944 paper "The Velocity of Light" in the Transactions of the American Philosophical Society, Volume 34, Part 1, Pages 1-110, Table 22.
[9] Morley, Michelson's early partner, said he feared that Michelson had suffered a softening of the brain early in his career, after Michelson was hospitalized for exhaustion in the 1880s. Michelson's first wife tried to have the scientist committed. His daughter wrote that her father often worked for days without sleeping or eating, that he sat alone at meals so his thinking would not be disturbed, and that in turn he could be arrogant, distant, imperious and rude. The physicist also suffered from recurring nightmares, including one in which he rode a motorcycle up an endless hill.
Michelson made n = 100 measurements of the speed of light. The data are shown below in a histogram. The sample statistics are ȳ = 299.8524 × 10⁶ m/s and s = .0790 × 10⁶ m/s. The standard error of ȳ is s/√n = .0790/√100 = .00790. From R, t* = 1.9842, and the confidence interval for the speed of light is

    ȳ ± t* s/√n = 299.8524 ± 1.9842 × .00790
                = [299.8367, 299.8681].

[Figure: histogram of Michelson's 100 measurements; frequency versus speed (×10⁶ m/s), from 299.6 to 300.1.]

95% of all such confidence intervals constructed using this procedure will contain the population mean, yet the current best estimate of the speed of light is not contained in the interval. It appears that Michelson's method was biased downwards: it produced estimates of the true speed that are less than the current best estimate.

The one-sample t-test

The one-sample t-test is used when there is one population. The variable of interest Y is quantitative and the parameter of interest is the population mean E(Y) = µ.
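The interval arithmetic can be checked directly from the summary statistics; a minimal Python sketch (variable names are mine, not from the text):

```python
import math

# Michelson summary statistics from the text, in units of 10^6 m/s;
# t* = 1.9842 is the 95% critical value for d.f. = 99
ybar, s, n, t_star = 299.8524, 0.0790, 100, 1.9842

se = s / math.sqrt(n)                          # standard error of ybar
lo, hi = ybar - t_star * se, ybar + t_star * se
print(f"95% CI: [{lo:.4f}, {hi:.4f}]")         # [299.8367, 299.8681]

nbs = 299.7924562                              # NBS 1972 value, same units
print(lo <= nbs <= hi)                         # False: the interval misses it
```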
Hypotheses: Ha takes one of three forms:

    Ha : µ ≠ µ0    (the 2-sided alternative)
    Ha : µ < µ0
    Ha : µ > µ0    (the 1-sided alternatives),
where µ0 is a numerical value specific to the problem. The generic form of H0 is
H0 : µ = µ 0 ,
though sometimes it is restated as H0 : µ ≤ µ0 or H0 : µ ≥ µ0 . For example, in the
largemouth bass mercury contamination example, the hypotheses to be tested
are
H0 : µ ≤ .5 versus Ha : µ > .5
where µ denotes the mean mercury contamination of largemouth bass in Florida lakes. Changing a null hypothesis from H0 : µ = µ0 to H0 : µ ≤ µ0, for instance, does not affect the test or p-value, since µ0 is used in the numerator of the t-statistic.
The test statistic is

    t = (ȳ − µ0)/(s/√n),

where µ0 is the null-hypothesis value of µ. For this problem, µ0 = .5. The sample quantities are

    n = 53,   ȳ = .527,   and   s = .341 ⇒ s/√n = .0468.

Plugging in the sample values yields

    t = (ȳ − µ0)/(s/√n) = (.527 − .5)/.0468 = .576.
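A quick check of this arithmetic in Python (variable names are mine):

```python
import math

# Largemouth bass summary statistics from the text
n, ybar, s, mu0 = 53, 0.527, 0.341, 0.5

se = s / math.sqrt(n)              # standard error of the sample mean
t = (ybar - mu0) / se              # one-sample t-statistic
print(round(se, 4), round(t, 3))   # 0.0468 0.576
```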
The strength of evidence against H0 and in favor of Ha is measured by the p-value. The p-value is the probability of observing a sample mean as extreme as or more extreme than the observed result (in the direction consistent with Ha and contradicting H0), supposing that the null hypothesis is correct.
For example, in the largemouth bass mercury contamination example, values of the sample mean that are as or more extreme than the observed value are values of at least .527 mg/g, because Ha : µ > .5 implies that large values of ȳ contradict H0 and support Ha. Consequently, the p-value is P(ȳ ≥ .527 | µ = .5). This form of the p-value is transformed by standardizing ȳ; specifically, by computing the t-statistic as was done above. It is

    p-value = P(ȳ ≥ .527 | µ = .5) = P(t₅₂ ≥ .576) = .283.
The probability P(t₅₂ ≥ .576) can be computed using R or StatCrunch. Without statistical software, an approximate p-value can be obtained from the t-table. Find the line with the appropriate degrees of freedom, or the line for the largest tabled degrees of freedom that is less than n − 1 = 52 (in this case, use the line for d.f. = 50). Find the two critical values that bracket the value of the t-statistic. If the t-statistic is too small to be bracketed (as it is in this case), then the p-value is larger than the largest probability in the table (the largest probability is .25). Report p-value > .25.
If the t-statistic is too large to be bracketed, then the p-value is smaller than the smallest probability (the smallest probability is .0005). Report p-value< .0005.
If Ha is two-sided, then the bracketing lower and upper probabilities are doubled.
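Without software or a table, a rough check is still possible: with 52 degrees of freedom the t distribution is nearly standard normal (property 5 above), so the normal CDF approximates the tail area. A Python sketch using only the standard library:

```python
from statistics import NormalDist

# Normal approximation to P(t52 >= .576); for d.f. = 52 the t and
# standard normal distributions are nearly the same
t_stat = 0.576
p_approx = 1 - NormalDist().cdf(t_stat)
print(round(p_approx, 3))   # 0.282, close to the exact value .283
```

The approximation is slightly smaller than the exact p-value because the t distribution has heavier tails than the normal, but it agrees with the table-based conclusion that the p-value exceeds .25.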
Conclusion Returning to the problem, there is insufficient evidence to conclude that the
mean mercury concentration of largemouth bass in Florida lakes is greater than .5 mg/g
because p-value = .283. On the other hand, there are some particular lakes that appear to
contain unacceptably large mercury concentrations in the largemouth bass.
The guide to interpreting p-values (same as was presented in Chapter 20) is:

    P-value                   Strength
    less than .001            very strong evidence
    .001 ≤ p-value < .01      strong evidence
    .01 ≤ p-value < .05       moderate evidence
    .05 ≤ p-value < .10       some evidence
    greater than .10          no evidence
Conditions for t-procedures: The inferential procedures discussed above are accurate provided that the following conditions are met:

1. Independence: The data are a random sample (or at least a representative sample) from the population, and the 10% condition holds (n is less than 10% of the population size).

2. Normality: The data have been sampled from a normal population. t-procedures are robust with respect to the normality condition.[10] Larger sample sizes increase robustness.
Outliers: If outliers are present in the data, then it cannot be assumed that the population distribution is normal. The effect of outliers on confidence intervals is to increase the width relative to the width without the outliers. A wider interval is more likely to bracket µ, so some outliers are tolerable.
Outliers affect hypothesis tests by reducing the magnitude of the t-statistic and increasing the p-value relative to the p-value without the outliers. The effect of outliers on an accept/reject decision rule is to reduce the probability of a Type I error. In general, t-procedures are described as conservative (or safe) when there are outliers.

[10] Robust procedures are procedures that are not sensitive to mild or moderate deviations from the stated conditions and yield accurate results in the presence of moderate deviations.
Guidelines: Small sample sizes demand greater adherence to normality; specifically,
• If n < 15, use t-procedures only if the sample distribution is without skewness or
outliers.
• If 15 ≤ n < 40, t-procedures can be used reliably if the sample distribution is roughly normal (only mild skewness or mild outliers are present).
• If n ≥ 40, t-procedures can be used reliably even for highly skewed populations with
more extreme outliers.
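The one-sample computations of this chapter (sample mean, the sample standard deviation of footnote 5, standard error, t-statistic, and confidence interval) can be collected into a single routine. A sketch in Python (standard library only; the function name and the choice to have the caller supply t* from a table, StatCrunch, or R's qt are mine):

```python
import math

def one_sample_t(data, mu0, t_star):
    """Return the t-statistic for H0: mu = mu0 and the confidence
    interval ybar +/- t* s/sqrt(n), where t* has n - 1 d.f."""
    n = len(data)
    ybar = sum(data) / n
    s = math.sqrt(sum((y - ybar) ** 2 for y in data) / (n - 1))  # sample sd
    se = s / math.sqrt(n)                                        # standard error
    return (ybar - mu0) / se, (ybar - t_star * se, ybar + t_star * se)

# Toy data for illustration; t* = 2.776 is the 95% critical value for d.f. = 4
t, (lo, hi) = one_sample_t([1, 2, 3, 4, 5], mu0=3, t_star=2.776)
```

With real data the same call would take the 53 lake averages or Michelson's 100 measurements in place of the toy list.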