620.152 Introduction to Biomedical Statistics
7. Hypothesis testing
7.1 Introduction and terminology
7.2 Hypothesis testing and confidence intervals
7.3 p-values
7.4 Critical region, types of error and power
7.5 Hypothesis testing for Normal populations
References: Pagano and Gauvreau, Chapter 10.
“I had come to an entirely erroneous conclusion, which shows, my dear Watson,
how dangerous it is to reason from insufficient data.”
Sherlock Holmes, The Speckled Band, 1892.
7.1 Introduction and terminology
We introduce the concepts and terminology for hypothesis testing using the
case of inference on the population mean, µ, based on a random sample from
a Normal population with known variance. In this case, inference is based on
the result
Z = (X̄ − µ)/(σ/√n) ∼ N(0, 1).
Extension to other cases comes later: not until section 7.5.
Hypothesis testing can be regarded as the “other side” of confidence
intervals. We have seen that a confidence interval for the parameter µ
gives a set of “plausible” values for µ.
Suppose we are interested in whether µ = µ0. In determining whether or not µ0 is a plausible value for µ (using a confidence interval) we are really testing µ = µ0 against the alternative that µ ≠ µ0. If µ0 is not a plausible value, then we would reject µ = µ0.
In this subject, we deal only with two-sided confidence intervals and, correspondingly, with two-sided tests, i.e. tests against a two-sided alternative (µ = µ0 vs µ ≠ µ0). There are circumstances in which one-sided tests and one-sided confidence intervals seem more appropriate. Some statisticians argue that they are never appropriate. In any case, we will use only two-sided tests.
All our confidence intervals are based on the central probability interval for the estimator, i.e. that obtained by excluding probability α/2 at each end of the distribution, giving a Q% confidence interval, where Q = 100(1 − α).
In the first instance, this means that our tests are based on rejecting µ = µ0 for an event of probability α/2 at either end of the estimator distribution.
Note: this is not always the case for other test statistics, i.e. test statistics that are not estimators. For example, a test statistic such as U = (X̄ − µ0)² could be used to test µ = µ0. We will consider such cases in a later chapter.
Example (Serum cholesterol level)
The distribution of serum cholesterol level for the population of males
in the US who are hypertensive and who smoke is approximately normal with an unknown mean µ. However, we do know that the mean
serum cholesterol level for the general population of all 20–74-year-old
males is 211 mg/100ml. We might wonder whether the mean cholesterol level of the subpopulation of men who smoke and are hypertensive is different.
Suppose we select a sample of 25 men from this group and their mean
cholesterol level is x̄ = 220 mg/100ml. What can we conclude from
this?
A statistical hypothesis is a statement concerning the probability distribution of a population (a random variable X).
We are concerned with parametric hypotheses, where the distribution of X is specified except for a parameter. In the present case the parameter is µ, and the population distribution is N(µ, σ²). The hypotheses can take the form µ = 6, or µ ≠ 4 (or µ > 10, or 6 < µ < 8, or . . . ).
The hypothesis under test is called the null hypothesis, denoted H0 . It
has a special importance in that it usually reflects the status quo: the
way things were, or should be. Often the null hypothesis represents a
“no effect” hypothesis. The onus is on the experimenter to demonstrate
that an “effect” exists. We don’t reject the null hypothesis unless there
is strong evidence against it.
We always take H0 to be a simple hypothesis: µ = µ0 .
We test the null hypothesis against an alternative hypothesis, denoted by H1. We will always take the alternative hypothesis to be H0′, i.e. the complement of H0 (µ ≠ µ0).
It need not be: it may be another simple hypothesis (µ = µ1) or a one-sided alternative (µ > µ0).
Example (Serum cholesterol level)
State the null and alternative hypotheses for this problem.
The “logic” of the hypothesis testing procedure seems a bit back-to-front at first.
It is based on the contrapositive: [P ⇒ Q] = [Q′ ⇒ P′].
For example: [sheeP ⇒ Quadruped] = [not Quadruped ⇒ not sheeP];
[(x = 2) ⇒ (x² = 4)] = [(x² ≠ 4) ⇒ (x ≠ 2)].
Our application is rather more uncertain:
[(µ = µ0) ⇒ (x̄ ≈ µ0)] = [(x̄ not ≈ µ0) ⇒ (µ ≠ µ0)]
This logic means that we have a (NQR) “proof” of µ ≠ µ0. (If the signs were all equalities rather than (random) approximations, it would be a proof.)
We have no means of “proving” (NQR or otherwise) that µ = µ0 .
We observe the sample and compute x̄.
On the basis of the sample, we must reach a decision: “reject H0”, or not.
Statisticians are reluctant to use “accept H0 ” for “do not reject H0 ”, for
the reasons indicated above. Mind you, this does seem a bit odd when they
can use “success” to mean “the patient dies”. If ever I use “accept H0 ” (and
I’m inclined to), it means only “do not reject H0 ”. In particular, it does not
mean that H0 is true, or even that I think it likely to be true!
“I am getting into your involved habit, Watson, of telling a story backward.”
Sherlock Holmes, The Problem of Thor Bridge, 1927.
To demonstrate the existence of an effect (µ ≠ µ0), the sample must produce evidence against the no-effect hypothesis (µ = µ0).
There are several ways of approaching this. The first, and simplest after Chapter 6, is to compute a confidence interval (which is a good idea in any case); and then to check whether or not the null-hypothesis value µ0 is in the confidence interval.
7.2 Hypothesis testing and confidence intervals
We have seen how to obtain a confidence interval for µ, so there is not
much more to do. In fact, a number of the problems in Problem Set 6
had parts that questioned the plausibility of particular values of µ. This
is now seen to be equivalent to hypothesis testing.
Example We obtain a random sample of n=40 from a Normal population with known standard deviation σ=4. The sample mean is x̄=11.62.
Test the null hypothesis H0 : µ=10 (against a two-sided alternative).
95% CI for µ: 11.62 ± 1.96 × 4/√40 = (10.38, 12.86).
Since the 95% confidence interval does not include 10, we reject the null
hypothesis µ = 10. There is evidence in this sample that µ > 10.
Example (Serum cholesterol level) [n=25, µ0=211, σ=46; x̄=220]
95% CI for µ: 220 ± 1.96 × 46/√25 = (202.0, 238.0).
Since the 95% confidence interval includes 211, we do not reject the null hypothesis µ = 211. There is no evidence in this sample that µ ≠ 211.
7.3 p-value
We measure the strength of the evidence of the sample against H0 by
using the “unlikelihood” of the data if H0 is true. The idea is to work
out how unlikely the observed sample is, assuming µ = µ0 . If it is “too
unlikely”, then we reject H0 ; and otherwise, we do not reject H0 .
The p-value is the probability (if H0 were true) of observing a value as extreme as the one observed. This means that:
p = 2 Pr(X̄′ is at least as far from µ0 as the observed x̄),
where X̄′ denotes a (hypothetical) sample mean, assuming H0 true, i.e. X̄′ ∼ N(µ0, σ²/n); the 2 is because this is a two-sided test, and we must allow for the possibility of being as extreme at the other end of the distribution. Therefore:
p = 2 Pr(X̄′ > x̄) if x̄ > µ0;   p = 2 Pr(X̄′ < x̄) if x̄ < µ0.
We use X̄′ to denote a “pretend” X̄: to work out what would happen if we did the experiment again under other assumptions; in this case, assuming that H0 is true.
Example We obtain a random sample of n=40 from a Normal population with known standard deviation σ=4. The sample mean is x̄=11.62.
Test the null hypothesis H0 : µ=10 (against a two-sided alternative).
[n=40, σ=4, µ0=10, x̄=11.62]
p = 2 Pr(X̄′ > 11.62), where X̄′ ∼ N(10, 4²/40);
p = 2 Pr(X̄′ > 11.62) = 2 Pr(Z > (11.62−10)/(4/√40)) = 2 Pr(Z > 2.56) = 0.010, where Z ∼ N(0, 1).
Example A random sample of fifty observations is obtained from a Normal population with standard deviation 5. The observed sample mean is 8.3. Test the null hypothesis that µ = 10.
[n=50, µ0=10, σ=5, x̄=8.3]
p = 2 Pr(X̄′ < 8.3), where X̄′ ∼ N(10, 5²/50);
p = 2 Pr(X̄′ < 8.3) = 2 Pr(Z < (8.3−10)/(5/√50)) = 2 Pr(Z < −2.40) = 0.016.
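Both p-value calculations follow the same pattern, which can be sketched in Python using the standard library's NormalDist (the function name is mine, for illustration only):

```python
from statistics import NormalDist

def two_sided_p(xbar, mu0, sigma, n):
    """Two-sided p-value for the z-test of H0: mu = mu0 (sigma known)."""
    z = (xbar - mu0) / (sigma / n ** 0.5)
    # p = 2 * Pr(Z' beyond |z|), where Z' ~ N(0, 1)
    return 2 * NormalDist().cdf(-abs(z))

print(f"{two_sided_p(11.62, 10, 4, 40):.3f}")  # 0.010
print(f"{two_sided_p(8.3, 10, 5, 50):.3f}")    # 0.016
```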
Now we must specify what is meant by “too unlikely”; i.e. how small
is “too small” a value for p? It seems sensible to match our idea of
what is “too small”, with what is “implausible”. Thus, if we reject H0
if p < 0.05, then this corresponds exactly to values outside the 95%
confidence interval, i.e. the “implausible” values.
Our standard testing procedure, therefore, is to compute the p-value and to reject H0 if p < 0.05 (and not to reject H0 otherwise). Thus, in both of the above examples, we would reject H0 (at the 5% level of significance).
We have seen how to compute the probability, so there is nothing new there.
What is new here is the terminology that comes with it.
One advantage of the p-value is that it gives a standard measure of the
evidence against H0 .
As we can specify different levels for a confidence interval, we can specify different levels for the test. To correspond to a 99% CI, we would
reject H0 if p < 0.01.
We reject H0 if p < α, where α denotes the significance level of the test.
Typically we use α=0.05, just as we typically use a 95% confidence interval. But we may choose α=0.01 or 0.001 or another value.
When p<α, we reject H0 and say that the result is statistically significant.
Example (Serum cholesterol level) [n=25, µ0=211, σ=46; x̄=220]
p = 2 Pr(X̄′ > 220), where X̄′ ∼ N(211, 46²/25).
Thus p = 2 Pr(Z > (220−211)/(46/√25)) = 2 Pr(Z > 0.978) = 0.328.
Since p > 0.05, we do not reject the null hypothesis µ = 211. There is no evidence in this sample that µ ≠ 211.
7.4 Critical values, types of error and power
The critical value approach specifies a decision rule for rejecting H0 : for
example, for a test of significance level 0.05 the rule is
reject H0 if x̄ < µ0 − 1.96 σ/√n or x̄ > µ0 + 1.96 σ/√n.
This specifies a rejection rule in terms of the estimate: if the estimate (x̄)
is too far away from the H0 value (µ0 ), then H0 is rejected.
The rejection rule is often best expressed in terms of a statistic that has
a standard distribution if H0 is true. Here the test statistic is
Z = (X̄ − µ0)/(σ/√n),
which is such that, if H0 is true, then Z ∼ N(0, 1).
The rule then is to compute the observed value of Z and to see if it
could plausibly be an observation from a standard Normal distribution.
If not, we reject H0 . This leads to the name often used for this test: the
z-test.
Note that Z involves only X̄ and known constants (the null hypothesis
value µ0 , the known standard deviation, σ, and the sample size, n).
In particular, Z does not depend on the unknown parameter µ.
We compute the observed value of Z:
z = (x̄ − µ0)/(σ/√n)
and compare it to the standard Normal distribution (where “plausible” is taken to mean within the central 95% of the distribution). Thus the decision rule is
reject H0 if z < −1.96 or z > 1.96; i.e. if |z| > 1.96,
which corresponds exactly to the rejection region for x̄ given above.
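The decision rule can be packaged as a small function. A sketch (standard library only; the function name is mine):

```python
from statistics import NormalDist

def z_test_decision(xbar, mu0, sigma, n, alpha=0.05):
    """Critical-value form of the two-sided z-test: reject H0 iff |z| > z_crit."""
    z = (xbar - mu0) / (sigma / n ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 when alpha = 0.05
    return z, abs(z) > z_crit

z, reject = z_test_decision(8.3, 10, 5, 50)
print(round(z, 2), reject)   # -2.4 True
```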
Example A random sample of fifty observations is obtained from a Normal population with standard deviation 5. The observed sample mean is 8.3. Test the null hypothesis that µ = 10.
[n=50, µ0=10, σ=5, x̄=8.3]
z = (8.3−10)/(5/√50) = −2.40,
hence we reject H0 (using significance level 0.05) since z < −1.96.
There is evidence in this sample that µ < 10.
Example We obtain a random sample of n=40 from a Normal population with known standard deviation σ=4. The sample mean is x̄=11.62.
Test the null hypothesis H0 : µ=10 (against a two-sided alternative).
[n=40, µ0=10, σ=4, x̄=11.62]
z = (11.62−10)/(4/√40) = 2.56.
Since |z| > 1.96, we reject the null hypothesis µ = 10. There is evidence
in this sample that µ > 10.
Example (Serum cholesterol level) [n=25, µ0=211, σ=46; x̄=220]
z = (x̄ − µ0)/(σ/√n) = (220−211)/(46/√25) = 0.978.
Since |z| < 1.96, we do not reject the null hypothesis µ = 211. There is no evidence in this sample that µ ≠ 211.
Types of error
In deciding whether to accept or reject H0, there is a risk of making two types of error:

                      H0 true                       H1 true
reject H0             × error of type I             ✓ correct
                      prob α                        prob 1−β
                      (α = significance level)      (1−β = power)
don't reject H0       ✓ correct                     × error of type II
                      prob 1−α                      prob β
We want α and β to be small.
The significance level, α is usually pre-set at 0.05; we then do what we
can to make the power large (and hence β small). This will generally
mean taking a bigger sample.
Example (Birth weights)
A researcher thinks that mothers with low socioeconomic status (SES)
deliver babies whose birthweights are lower than “normal”. To test
this hypothesis, a random sample of birthweights from 100 consecutive, full-term, live-born babies from the maternity ward of a hospital
in a low-SES area is obtained. Their mean birthweight is found to be
3240 g. We know from nationwide surveys that the population mean
birthweight is 3400 g with a standard deviation of 700 g.
• Do the data support her hypothesis?
[n=100, x̄=3240; we assume σ=700; µ0=3400]
z = (3240−3400)/(700/√100) = −2.29.
Since |z| > 1.96, we reject H0. There is significant evidence in this sample that the mean birthweight of SES babies is less than the national average.
• Describe the type I and type II errors in this context.
In this context, a type I error is to conclude that “SES babies” are different, when they are actually the same as the rest of the population;
a type II error is to conclude that “SES babies” are the same, when they
are in fact different.
Example (Serum cholesterol level)
• Describe the type I and type II errors in this context.
A type I error is to conclude that the group of interest (SHM = men who
smoke and have hypertension) have different mean serum cholesterol
level from the general population, when they actually have the same
mean.
A type II error is to conclude that the SHM individuals are no different
from the general population with respect to serum cholesterol levels,
when in fact they are different.
• Compute β, the probability of making a type II error, when the true
value of µ is 250.
[n = 25, µ0 = 211, σ = 46]
β = Pr(don't reject H0 | µ = 250)
  = Pr(211 − 1.96×46/√25 < X̄′ < 211 + 1.96×46/√25), where X̄′ ∼ N(250, 46²/25)
  = Pr(−39/(46/√25) − 1.96 < (X̄′ − 250)/(46/√25) < −39/(46/√25) + 1.96)
  = Pr(−6.20 < Z < −2.28), where Z ∼ N(0, 1)
  = 0.0113 − 0.0000
  = 0.011.
This calculation can be done more neatly in terms of Z = (X̄ − 211)/(46/√25).
If X̄′ ∼ N(250, 46²/25), then Z′ ∼ N((250−211)/(46/√25), 1), i.e. Z′ ∼ N(4.24, 1)
[using the result that Y = (X − a)/b has mean (µ − a)/b and variance σ²/b²].
Then:
β = Pr(−1.96 < Z′ < 1.96) = Pr(−6.20 < Z′ − 4.24 < −2.28) = 0.011,
as we obtained above.
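As a numerical check, the same β can be computed directly from the acceptance region. A Python sketch (standard library only; the function name is mine):

```python
from statistics import NormalDist

def beta_z_test(mu0, mu1, sigma, n, alpha=0.05):
    """Pr(type II error) of the two-sided z-test when the true mean is mu1."""
    se = sigma / n ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lo, hi = mu0 - z_crit * se, mu0 + z_crit * se  # acceptance region for xbar
    sampling = NormalDist(mu1, se)                 # true distribution of xbar
    return sampling.cdf(hi) - sampling.cdf(lo)

print(round(beta_z_test(211, 250, 46, 25), 3))   # 0.011
```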
There is a helpful analogy between legal processes, at least in Westminster-style legal systems, and hypothesis testing.
hypothesis testing                    the law
null hypothesis H0                    accused is innocent
alternative hypothesis H1             accused is guilty
don't reject H0                       innocent until proven guilty,
without strong evidence               beyond a reasonable doubt
type I error                          convict an innocent person
type II error                         acquit a guilty person
α = Pr(type I error)                  beyond reasonable doubt
power = 1 − Pr(type II error)         effectiveness of system
                                      in convicting a guilty person
A simple example to illustrate some of the terms used.
Example I have a coin which I think may be biased.
To test this I toss it five times: if I get all heads or all tails, I will say it is
biased, otherwise I’ll say it’s unbiased.
Let θ = probability of obtaining a head, then:
null hypothesis, H0: θ = ½ (unbiased);
alternative hypothesis, H1: θ ≠ ½ (biased);
test statistic, X = number of heads obtained;
test (decision rule): reject H0 if X ∈ {0, 5}.
significance level = Pr(reject H0 | H0 true)
                   = Pr(X ∈ {0, 5} | θ = ½)
                   = (½)⁵ + (½)⁵
                   = 1/16 ≈ 0.06
power = Pr(reject H0 | H1 true)
      = Pr(X ∈ {0, 5} | θ ≠ ½)   (this can't be evaluated)
So, we define the power function:
Q(θ) = Pr(reject H0 | θ)
     = Pr(X ∈ {0, 5} | θ)
     = (1 − θ)⁵ + θ⁵
[Graph of Q(θ) for 0 ≤ θ ≤ 1: a U-shaped curve with minimum 1/16 at θ = 0.5, rising to 1 at θ = 0 and θ = 1.]
Note 1: Q(0.5) is the significance level of the test.
Note 2: Q(0.75) = 0.25⁵ + 0.75⁵ ≈ 0.24; so this is not a particularly good test. But we knew that anyway!
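The power function of this coin test is simple enough to compute directly. A brief Python sketch (not part of the original notes):

```python
def Q(theta):
    """Power function of the coin test: Pr(X in {0, 5}) for X ~ Bi(5, theta)."""
    return (1 - theta) ** 5 + theta ** 5

print(Q(0.5))             # 0.0625, the significance level (1/16)
print(round(Q(0.75), 2))  # 0.24
```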
Example Suppose that Z ∼ N(θ, 1). We observe Z. On the basis of this one observation, we wish to test H0: θ = 0 against H1: θ ≠ 0.
This relatively trivial example has a range of applications!
Z ∼ N(θ, 1)

                                H0 true (θ = 0)      H1 true (θ ≠ 0)
reject H0 (|Z| > 1.96)          × error of type I    ✓ correct
don't reject H0 (|Z| ≤ 1.96)    ✓ correct            × error of type II

If H0 is true, then Z ∼ N(0, 1), so α = Pr(|Z| > 1.96) = 0.05, and the probability of a correct decision is 0.95.
If H1 is true, the power Pr(|Z| > 1.96) depends on the value of θ:

e.g. Z ∼ N(±1, 1):     power = 0.17,   Pr(type II error) = 0.83
e.g. Z ∼ N(±2, 1):     power = 0.52,   Pr(type II error) = 0.48
e.g. Z ∼ N(±3, 1):     power = 0.85,   Pr(type II error) = 0.15
e.g. Z ∼ N(±3.61, 1):  power = 0.95,   Pr(type II error) = 0.05
e.g. Z ∼ N(±4, 1):     power = 0.98,   Pr(type II error) = 0.02
For example, for θ = 3:
power = Pr(|Z| > 1.96), where Z ∼ N(3, 1);
1 − power = Pr(−1.96 < Z < 1.96)
          = Pr(−4.96 < Z − 3 < −1.04), where Z − 3 ∼ N(0, 1)
          = Pr(Z − 3 < −1.04) − Pr(Z − 3 < −4.96)
          = 0.1492 − 0.0000;
∴ power = 0.851.
Except for θ around zero, it is usually the case that only one tail is required (as the other is negligible).
For example, for θ = −1:
power = Pr(|Z| > 1.96), where Z ∼ N(−1, 1);
1 − power = Pr(−1.96 < Z < 1.96)
          = Pr(−0.96 < Z + 1 < 2.96), where Z + 1 ∼ N(0, 1)
          = Pr(Z + 1 < 2.96) − Pr(Z + 1 < −0.96)
          = 0.9985 − 0.1685;
∴ power = 0.170.
Using the above table, we could plot a graph of the power function.
The graph will have a minimum at zero (of 0.05, the significance level),
and increases up to 1 on both sides, as θ moves away from zero:
for θ = 4, or θ = −4 the power is 0.98.
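The power values tabulated above can be reproduced with a few lines of Python (a sketch; standard library only, function name mine):

```python
from statistics import NormalDist

def power(theta):
    """Pr(|Z| > 1.96) when Z ~ N(theta, 1)."""
    nd = NormalDist()   # standardise: Z - theta ~ N(0, 1)
    return nd.cdf(-1.96 - theta) + (1 - nd.cdf(1.96 - theta))

for theta in (0, 1, 2, 3, 3.61, 4):
    print(theta, round(power(theta), 2))
# 0.05, 0.17, 0.52, 0.85, 0.95, 0.98
```

Note that power(0) = 0.05 is just the significance level, and power(θ) = power(−θ) by symmetry.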
For the z-test, the statistic is Z = (X̄ − µ0)/(σ/√n).
If µ = µ0, then Z ∼ N(0, 1).
If µ = µ1, then Z ∼ N(θ, 1), where θ = (µ1 − µ0)/(σ/√n).
We only get one observation on Z, so the z-test is actually equivalent to the example above. We can use the results of that example to work out the power for any z-test, using power = Pr(|Z| > 1.96), where Z ∼ N(θ, 1).
Sample size calculations
To devise a test of significance level 0.05 that has power 0.95 when µ = µ1, we need θ = 3.61,
i.e. (µ1 − µ0)/(σ/√n) = 3.61  ⇒  n = 13σ²/(µ1 − µ0)²   [since 3.61² ≈ 13].
Example (Serum cholesterol level)
Find the required sample size if we want a test to have significance level
0.05 and power 0.95 when µ = 220.
Here µ0 = 211, µ1 = 220 and σ = 46. Therefore:
n > 13×46²/9² = 339.6.
Thus we need a sample of at least 340, in order to ensure a power of
0.95 when the population mean is 220.
The sample size result can be generalised to any significance level α and any specified power 1 − β, as indicated in the following diagram, which shows the derivation of 3.61 ≈ 1.96 + 1.6449.
[Diagram: two standard Normal density curves, centred at 0 and at 3.61, illustrating 3.61 ≈ 1.96 + 1.6449 = z_{1−α/2} + z_{1−β}, with tail areas 0.025 (beyond ±1.96) and 0.05 marked.]
For a z-test of µ = µ0, with significance level α and power 1 − β when µ = µ1, we require
n > (z_{1−α/2} + z_{1−β})² σ² / (µ1 − µ0)²,
where z_q denotes the standard Normal q-quantile.
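This sample-size formula can be sketched in Python (illustrative function name; standard library only):

```python
from math import ceil
from statistics import NormalDist

def sample_size(mu0, mu1, sigma, alpha=0.05, power=0.95):
    """Smallest n giving the two-sided z-test of mu = mu0 the stated power at mu = mu1."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)   # 1.96 when alpha = 0.05
    z_beta = nd.inv_cdf(power)            # z_{1-beta} = 1.6449 when power = 0.95
    return ceil(((z_alpha + z_beta) * sigma / (mu1 - mu0)) ** 2)

print(sample_size(211, 220, 46))   # 340, as in the cholesterol example
```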
7.5 Hypothesis testing for Normal populations
In this section, we consider tests for the parameters (µ and σ) of a Normal population. In each case, we define a statistic that has a “standard” distribution (N, t, χ²) when H0 is true. A decision is then obtained by comparing the observed value of this statistic with the standard distribution.
In reporting the results of the test, you should give the value of the
“standard” statistic, the p-value, and a verbal conclusion/explanation.
It is recommended that you also give a confidence interval in reporting
your results.
z-test (testing µ=µ0 when σ is known/assumed)
We define:
Z = (X̄ − µ0)/(σ/√n)
(in which X̄ is observed; µ0, σ and n are given or assumed known).
If H0 is true, then Z ∼ N(0, 1).
We evaluate the observed value of Z:
z = (x̄ − µ0)/(σ/√n)
and compare it to the standard Normal distribution. For significance level 0.05, we reject H0 if |z| > 1.96.
The p-value is computed using the tail probability for a standard Normal distribution:
p = 2 Pr(Z′ > z) if z > 0;   p = 2 Pr(Z′ < z) if z < 0;   where Z′ ∼ N(0, 1).
Example We obtain a random sample of n=40 from a Normal population with known standard deviation σ=4. The sample mean is x̄=11.62.
Test the null hypothesis H0 : µ=10 (against a two-sided alternative).
[n = 40, σ = 4, x̄ = 11.62, µ0 = 10]
z = (11.62−10)/(4/√40) = 2.56;
p = 2 Pr(Z′ > 2.56) = 0.010.
The sample mean, x̄=11.62; the z-test of µ=10 gives z=2.56, p=0.010.
Thus there is significant evidence in this sample that µ>10.
In reporting this test result, it is recommended that you also give the 95% CI for µ: (10.38, 12.86).
Example The distribution of diastolic blood pressure for the population of female diabetics between the ages of 30 and 34 has an unknown
mean µ, and standard deviation σ = 9.1 mm Hg. Researchers want to
determine whether or not the mean of this population is equal to the
mean diastolic blood pressure of the general population of females in
this age group, which is 74.4 mm Hg. A sample of fifty diabetic women
in this age group is selected and their mean diastolic blood pressure is
79.6 mm Hg.
[n = 50, σ = 9.1, x̄ = 79.6, µ0 = 74.4]
z = (79.6−74.4)/(9.1/√50) = 4.04;
p = 2 Pr(Z′ > 4.04) = 0.000.
Note: p-values should be reported to three decimal places. p = 0.000 means
that p < 0.0005.
The mean blood pressure for this sample of female diabetics is 79.6 mm
Hg. This is significantly greater than the general population value of
74.4 mm Hg (z=4.04, p=0.000).
Example (Renal disease)
The mean serum-creatinine level measured in 12 patients 24 hours after
they received a newly proposed antibiotic was 1.2 mg/dL. The mean
and standard deviation of serum-creatinine level in the general population are 1.0 and 0.4 mg/dL respectively.
Is there evidence to support the claim that their mean serum-creatinine
level is different from that of the general population?
The z-test is available in MINITAB using
Stat > Basic Statistics > 1-Sample Z . . .
and then entering either the column containing the data, or n and x̄; and entering the known value for σ and the null hypothesis value µ0.
For the above example, the following output is obtained:
One-Sample Z
Test of mu = 1 vs not = 1
The assumed standard deviation = 0.4

 N     Mean  SE Mean        95% CI              Z      P
12  1.20000  0.11547  (0.97368, 1.42632)     1.73  0.083
Note that we are assuming the standard deviation of serum-creatinine level is the same in the treated individuals as in the general population (as well as Normality, etc.).
There is no evidence in this sample that the mean serum-creatinine level is different in these patients (z = 1.73, p = 0.083).
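The MINITAB output can be checked with a short Python sketch (standard library only; not part of the original notes):

```python
from statistics import NormalDist

# Renal-disease example: n = 12, xbar = 1.2, sigma = 0.4 (assumed), mu0 = 1.0
n, xbar, sigma, mu0 = 12, 1.2, 0.4, 1.0
se = sigma / n ** 0.5
z = (xbar - mu0) / se
p = 2 * NormalDist().cdf(-abs(z))
ci = (xbar - 1.96 * se, xbar + 1.96 * se)
print(round(se, 5), round(z, 2), round(p, 3))   # 0.11547 1.73 0.083
```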
The z-test provides a routine which can be followed in the other cases.
t-test (testing µ=µ0 when σ is unknown)
We define:
T = (X̄ − µ0)/(S/√n)
(in which X̄ and S are observed; µ0 and n are given).
If H0 is true, then T ∼ t_{n−1}.
We evaluate the observed value of T:
t = (x̄ − µ0)/(s/√n)
and compare it to the t_{n−1} distribution. For significance level 0.05, we reject H0 if |t| > “2” = c_{0.975}(t_{n−1}).
The p-value is computed using the tail probability for a t_{n−1} distribution:
p = 2 Pr(T′ > t) if t > 0;   p = 2 Pr(T′ < t) if t < 0;   where T′ ∼ t_{n−1}.
Example (Cardiology)
A topic of recent clinical interest is the possibility of using drugs to reduce infarct size in patients who have had a myocardial infarction (MI)
within the past 24 hours. Suppose we know that in untreated patients
the mean infarct size is 25. In 18 patients treated with drug, the sample
mean infarct size is 16.2 with a sample standard deviation of 8.4. Is the
drug effective in reducing infarct size?
[µ0 = 25; n = 18, x̄ = 16.2, s = 8.4]
t = (16.2−25)/(8.4/√18) = −4.44;
p = 2 Pr(T′ < −4.44) = 0.000, where T′ ∼ t₁₇.
The sample mean for treated patients, x̄ = 16.2, is significantly less than the known mean for untreated patients of 25 (t = −4.44, p = 0.000).
In reporting this test result, it is recommended that you also give the
95% CI for µ: (12.0, 20.4).
Example (Calorie content)
Many consumers pay careful attention to stated nutritional contents on
packaged foods when making purchases. It is therefore important that
the information on packages be accurate. A random sample of n = 12
frozen dinners of a certain type was selected from production during
a particular period, and calorie content of each one was determined.
Here are the resulting observations.
255 244 239 242 265 245 259 248 225 226 251 233
The stated calorie content is 240. Do the data suggest otherwise?
MINITAB can be used to analyse the data using
Stat > Basic Statistics > 1-Sample t . . .
and then entering either the column containing the data, or n, x̄ and s; and entering the null hypothesis value µ0.
For the above example we obtain
One-Sample T:
Test of mu = 240 vs not = 240

 N    Mean   StDev  SE Mean        95% CI             T      P
12  244.33  12.383    3.575  (236.47, 252.20)      1.21  0.251
There is no significant evidence in this sample that the mean is different
from 240 calories (t = 1.21, p = 0.251).
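The same t statistic can be computed from the raw data with Python's standard library (a sketch; note the standard library has no t-distribution CDF, so the p-value itself would need something like scipy.stats):

```python
from statistics import mean, stdev

calories = [255, 244, 239, 242, 265, 245, 259, 248, 225, 226, 251, 233]
n = len(calories)
xbar, s = mean(calories), stdev(calories)   # sample mean and sample SD
se = s / n ** 0.5
t = (xbar - 240) / se                       # t-statistic for H0: mu = 240
print(round(xbar, 2), round(s, 3), round(se, 3), round(t, 2))
# 244.33 12.383 3.575 1.21 -- matching the MINITAB output
```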
χ2 -test (testing σ=σ0 )
We define:
U = (n − 1)S²/σ0²
(in which S is observed; σ0 and n are given).
If H0 is true, then U ∼ χ²_{n−1}.
We evaluate the observed value of U:
u = (n − 1)s²/σ0²
and compare it to the χ²_{n−1} distribution. For significance level 0.05, we reject H0 if u is outside the central 95% probability interval for the χ²_{n−1} distribution.
The p-value is computed using the tail probability for a χ²_{n−1} distribution:
p = 2 Pr(U′ > u) if u is large;   p = 2 Pr(U′ < u) if u is small;   where U′ ∼ χ²_{n−1}.
Example (Packaging variation)
A packaging line fills nominal 900 g tomato juice jars with an actual mean of 908.5 g. The process should have a standard deviation smaller than 4.25 g per jar (a larger standard deviation leads to too many underweight and overfilled jars). Samples of 61 jars are regularly taken to test the process. One such sample yields a mean of 907.9 g and a standard deviation of 3.74 g.
Does this indicate that the true (population) standard deviation is less
than 4.25? (i.e. is σ < 4.25?)
[σ0 = 4.25; n = 61, s = 3.74]
u = 60×3.74²/4.25² = 46.46; p = 2 Pr(U′ < 46.46) = 0.200, where U′ ∼ χ²₆₀.
There is no significant evidence in this sample that the true standard
deviation is not equal to 4.25.
Note: the 95% CI for σ is (3.74×√(60/83.30), 3.74×√(60/40.48)) = (3.17, 4.55).
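The test statistic itself is a one-liner. A Python sketch (the function name is mine):

```python
def chi2_statistic(s, sigma0, n):
    """u = (n - 1) s^2 / sigma0^2, compared with the chi-square(n-1) distribution."""
    return (n - 1) * s ** 2 / sigma0 ** 2

print(round(chi2_statistic(3.74, 4.25, 61), 2))   # 46.46
```

The observed u = 46.46 lies inside the central 95% interval (40.48, 83.30) for χ²₆₀ (the same quantiles used in the CI note), so H0 is not rejected; the p-value would need a χ² CDF, e.g. from scipy.stats.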
Example (Renal disease)
The mean serum-creatinine level measured in 12 patients 24 hours after
they received a newly proposed antibiotic was 1.20 mg/dL; the sample standard deviation was 0.52 mg/dL. The mean and standard deviation of serum-creatinine level in the general population are 1.0 and 0.4
mg/dL respectively.
Is there evidence that their standard deviation of the serum-creatinine
level for the treated group is different from that of the general population?
[σ0 = 0.4; n = 12, s = 0.52]
u = 11×0.52²/0.4² = 18.59; p = 2 Pr(U′ > 18.59) = 0.138, where U′ ∼ χ²₁₁.
There is no significant evidence in this sample that the true standard
deviation is not equal to 0.4.
Note: the 95% CI for σ is (0.52×√(11/21.92), 0.52×√(11/3.816)) = (0.37, 0.88).
approximate z-test (testing λ=λ0 )
The procedure described above can be applied to cases where the null
distribution is approximately known. For example, we can follow the
routine to obtain an approximate test for the mean of a Poisson distribution.
We define:
Z = (X − λ0)/√λ0
(in which X is observed; λ0 is given).
If H0 is true, then Z ≈ N(0, 1), provided λ0 > 10.
This can then be used in the same way as a z-test. We evaluate the observed value of Z:
z = (x − λ0)/√λ0
and compare it to the standard Normal distribution.
It is common to use this result in epidemiological studies when examining disease occurrence. If X denotes the number of cases of a rare disease in a sub-population of n individuals, then X ≈ Bi(n, p), where n is large and p small. So, to a good approximation, X ≈ Pn(λ).
Example (Occupational health)
Many studies have looked at possible health hazards of workers in the
aluminium industry. In one such study, a group of 8418 male workers
ages 40–64 (either active or retired) on January 1, 1994, were followed
for 10 years for various mortality outcomes. Their mortality rates were
then compared with national male mortality rates in 1998. In one of the
reported findings, there were 21 observed cases of bladder cancer and
an expected number of events from general-population cancer mortality rates of 16.1. Evaluate the statistical significance of this result.
x = 21, λ0 = 16.1
⇒ z = (21−16.1)/√16.1 = 1.22; so p = 0.222.
This result is not significant: there is no evidence in this result to indicate that the occurrence of bladder cancer is different from the general
population.
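A quick check of this arithmetic in Python (standard library only):

```python
from math import sqrt
from statistics import NormalDist

x, lam0 = 21, 16.1            # observed count and null (expected) Poisson mean
z = (x - lam0) / sqrt(lam0)   # approximate z statistic
p = 2 * NormalDist().cdf(-abs(z))
print(round(z, 2), round(p, 3))   # 1.22 0.222
```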
This approximate z-test can be used in a wide variety of situations:
whenever we have a result that says the null distribution is approximately normal.
approximate z-test (testing p=p0 )
Suppose we observe a large number of independent trials and obtain
X successes. To test H0 : p = p0 , where p denotes the probability of
success, we can use
Z = (X − np0)/√(np0(1 − p0)) = (X/n − p0)/√(p0(1 − p0)/n)
(in which X is observed; p0 and n are given).
If H0 is true, then Z ≈ N(0, 1), provided n is large.
This too can then be used in the same way as a z-test. We evaluate the observed value of Z:
z = (x/n − p0)/√(p0(1 − p0)/n)
and compare it to the standard Normal distribution.
Example 100 independent trials resulted in 23 successes. Test the hypothesis that the probability of success is 0.3.
p̂ = x/n = 0.23; z = (0.23 − 0.3)/√(0.3×0.7/100) = −1.528; p = 2 Pr(Z′ < −1.528) = 0.127.
There is no significant evidence in this result that p is not equal to 0.3.
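A sketch of the whole calculation in Python (my own function name; standard library only):

```python
from math import sqrt
from statistics import NormalDist

def approx_z_test_proportion(x, n, p0):
    """Approximate z-test of H0: p = p0 for x successes in n trials (n large)."""
    z = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

z, p = approx_z_test_proportion(23, 100, 0.3)
print(round(z, 3), round(p, 3))   # -1.528 0.127
```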