Hypothesis Testing – Part I
Hypothesis Testing I: 1
Recall:
1. We learned how to describe data
• Made no assumptions about where the data
came from
• Nor about method of sampling
2. We focused on methods of sampling
• Probability samples
• Learned how to calculate probabilities
• Focus on specific probability distributions
Hypothesis Testing I: 2
3. We learned how to estimate unknown population
parameters
• Goal: to try to understand the characteristics
(parameters) of the population that gave rise to
our sample data
4. Now, we’ll learn how to evaluate alternative
explanations for the data we have observed
• The purpose is to test a research hypothesis
Hypothesis Testing I: 3
Examples of Research Hypotheses
1. Early treatment, compared to later treatment, of
patients with an evolving MI will result in better
heart function (ejection fraction) at 24 hours.
2. The implementation of policy “A” will result in a
reduction in the inappropriate use of a particular
drug.
3. The delivery of an educational intervention to
high school students will result in greater use of
“safer” sex practices.
4. The average cost of a particular procedure is $X.
Hypothesis Testing I: 4
To answer such questions,
• we collect data
• then analyze the data
• to see if they are compatible with the
research hypothesis being true.
We reason by use of “proof by contradiction”
• Proof by example won’t work.
The critic can always claim a counter
example must exist.
Hypothesis Testing I: 5
The Logic of Statistical Hypothesis Testing
The investigator starts
• by presuming the NULL explanation, e.g.:
• The treatment has NO benefit
• The new cost is the same as the old
(there is NO difference between cost of new
and old)
• Data are then collected and evaluated for
consistency with the NULL explanation
Hypothesis Testing I: 6
• If the data are NOT consistent with the null
explanation
• then abandon the null explanation in favor of
an alternative
• Typically, it is the alternative explanation that
the investigator would like to advance
Hypothesis Testing I: 7
If the null hypothesis is not true –
Then some alternative hypothesis must be true
This suggests some guidelines:
1. We’ll let
“Ho” represent the null hypothesis
“Ha” represent the alternative hypothesis
(called “H-naught” and “H-a”)
Hypothesis Testing I: 8
2. The null and alternative hypotheses are typically
specified so that
• The null is the one we hope to contradict
• The null and alternative are
  • Mutually exclusive
  • Collectively exhaustive
Both should be specified in advance!
– before the data are collected.
Hypothesis Testing I: 9
The Research Hypothesis

Ho and Ha
Examples
1. Early treatment post MI results in better function
of the heart at 24 hours (better = higher)
Define study with:
Group 1 = “Early”
μ1 = true mean at 24 hours
Group 2 = “Late”
μ2 = true mean at 24 hours
Research hypothesis says μ1 > μ2
⇒ This is the alternative hypothesis since it is the
explanation the investigator seeks to advance.
Hypothesis Testing I: 10
Thus, with the alternative defined, the null is
defined to be anything other than the alternative:
That is, μ1 ≤ μ2
Thus we have:
Ho: μ1 ≤ μ2 ⇒ Ho: μ1 − μ2 ≤ 0
Ha: μ1 > μ2 ⇒ Ha: μ1 − μ2 > 0
Note that we can rewrite the hypotheses to
compare the difference between group means to
zero
Hypothesis Testing I: 11
If our research hypothesis is that “early treatment
leads to different heart function at 24 hours”
then our alternative is that the means are not
equal: μ1 ≠ μ2
Ho: μ1 = μ2 ⇒ Ho: μ1 − μ2 = 0
Ha: μ1 ≠ μ2 ⇒ Ha: μ1 − μ2 ≠ 0
Hypothesis Testing I: 12
The first case is called a one-sided alternative
• we are interested in a change in only one
direction: μ1 > μ2
The second case is called a two-sided alternative
• we consider change in either direction away
from equality: μ1 ≠ μ2
Hypothesis Testing I: 13
Example 2:
The average cost of a particular procedure is $X
Suppose a medical insurance company wants to
pay no more than $500 for a particular surgical
procedure:
Let m= true average cost
The research hypothesis says m  500
 specify this as the alternative hypothesis
Thus,
Ho: m ≥ 500
Ha: m < 500
Hypothesis Testing I: 14
Hypothesis test as “proof by contradiction”
1. Assume null hypothesis is true
2. Determine a “rejection region” corresponding
to values unlikely to occur
using this assumption (Ho true)
3. Either:
“Reject” Ho if the observed data are in the
rejection region.
OR
“Fail to Reject” Ho if the observed data are not in
the rejection region (that is, the data are consistent
with the assumption).
Hypothesis Testing I: 15
For example we will reject Ho: μ ≥ 500
when X̄ is low enough that we believe the
true mean must be less than $500
[Figure: distribution of X̄ with 500 marked on the axis and the rejection region marked]
Hypothesis Testing I: 16
Steps in Constructing a Statistical Hypothesis Test
1. Identify the research question
2. State the assumptions necessary for computing
probabilities
3. Specify Ho and Ha and the α-level (usually α = 0.05)
4. Specify the test statistic
5. Specify a decision rule
6. Compute the test statistic and the achieved
significance (or p-value) from sample data
7. Come to a “Statistical” Decision
8. Reach a Conclusion
9. Report a confidence interval
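Steps 4 to 7 can be sketched in code for the known-σ case used in the examples that follow. This is a minimal sketch rather than anything from the original slides: the function name `z_test`, its `alternative` argument, and the numbers in the sample call are all illustrative assumptions. It uses only `statistics.NormalDist` from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative="two-sided", alpha=0.05):
    """One-sample z-test with known sigma (hypothetical helper)."""
    z = (xbar - mu0) / (sigma / sqrt(n))      # step 4: test statistic
    std_normal = NormalDist()                 # standard normal N(0, 1)
    if alternative == "two-sided":            # Ha: mu != mu0
        p = 2 * std_normal.cdf(-abs(z))
    elif alternative == "greater":            # Ha: mu > mu0
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":               # Ha: mu < mu0
        p = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    reject = p < alpha                        # steps 5 and 7: decision rule
    return z, p, reject

# Illustrative call with made-up numbers (not from the slides)
z, p, reject = z_test(xbar=10.5, mu0=10.0, sigma=2.0, n=64)
print(round(z, 2), round(p, 4), reject)
```

The function returns the statistic and the p-value along with the decision, so the achieved significance can be reported rather than only "reject" or "do not reject".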
Hypothesis Testing I: 17
Example:
1. Identify the research question
Suppose the mean birth weight for 1998 of all US
hospital births is known to be μ = 3400 gm with
σ = 710 gm, based upon national birth certificate
data.
How do births at Hospital A compare?
We are asking
Is the mean birth weight at Hospital A different from
the national mean?
Hypothesis Testing I: 18
Experiment: Collect birth weights of 100
consecutive births at Hospital A and compute our
sample mean of x̄ = 3250 gm.
2. What Assumptions must we make about our data
to compute probabilities?
Assume:
• a random sample of births from
• a population with known σ = 710 gm (known
national standard deviation).
Thus, by the central limit theorem:
X̄ ~ N(μ, σ²/n) ⇒ X̄ ~ N(μ, 710²/100)
Hypothesis Testing I: 19
3. Specify null and alternative hypotheses:
Ho: The true mean birth weight at Hospital A is
the same as the national mean.
Ha: The true mean birth weight at Hospital A is
different from the national mean
Or
Ho: μ = 3400
Ha: μ ≠ 3400
Hypothesis Testing I: 20
4. Compute the Test Statistic
This is where the proof by contradiction thinking
comes in.
We want to know:
If it is true that μ = 3400 gm (Ho)
• what are the chances of observing a sample
mean as far away from μ = 3400 as x̄ = 3250?
Hypothesis Testing I: 21
Since Ha is two-sided
(greater OR less than the value for Ho),
we want to know the probability represented by the
following shaded area:
[Figure: normal curve centered at 3400 with both tails shaded, below 3250 and above 3550, each 150 from the center]
“What are the chances of observing x̄ as far away
from the pop. mean μ = 3400 as the one we have (3250)
in either direction?”
Hypothesis Testing I: 22
We want to compute:
Pr[x̄ ≤ 3250] + Pr[x̄ ≥ 3550]
= (2) Pr[x̄ ≤ 3250]
We know how to do this!
We can transform the probability calculation into an
equivalent one for a standard normal:
z = (x̄ − μ) / (σ/√n)
Hypothesis Testing I: 23
When we use m= 3400, as presumed by Ho , the
resulting quantity is called a
a Test Statistic
More generally
• if we let mo represent the value of m specified
by Ho we have
Test Statistic:
x  mo
z
s/ n
Hypothesis Testing I: 24
5. Specify Decision Rule
“What is the probability of observing a sample
mean as far away from μo as the mean we have
observed?”
This probability calculation is known as:
the achieved significance
or
the significance of the data
or
the p-value
Hypothesis Testing I: 25
Our decision rule might be
Reject Ho if the achieved significance is less
than 0.05
This is equivalent to saying,
• If the probability of observing a sample mean, x̄,
this far or farther from μo
• is less than 5%,
• then we will reject Ho in favor of Ha.
Hypothesis Testing I: 26
6. Compute the test statistic from the sample data:
z = (x̄ − μo) / (σ/√n) = (3250 − 3400) / (710/√100) = −2.11
Hypothesis Testing I: 27
The achieved significance (or p-value) is:
Pr[x̄ ≤ 3250] + Pr[x̄ ≥ 3550]
= (2) Pr[x̄ ≤ 3250]
= (2) Pr[(x̄ − μo)/(σ/√n) ≤ (3250 − 3400)/(710/√100)]
= (2) Pr[z ≤ −2.11]
= (2)(.0174)
= .0348
[Figure: standard normal curve with both tails shaded, below −2.11 and above 2.11]
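The birth-weight calculation can be checked numerically. The sketch below uses only the figures quoted in the slides (3400, 710, 100, 3250) and the standard-library `NormalDist`; the variable names are illustrative. Because it keeps z unrounded, the p-value comes out a shade under the table-based .0348.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 3400.0, 710.0    # national mean and SD (gm), from the slides
n, xbar = 100, 3250.0         # Hospital A sample size and sample mean

se = sigma / sqrt(n)                      # standard error of the mean
z = (xbar - mu0) / se                     # test statistic under Ho
p_value = 2 * NormalDist().cdf(-abs(z))   # two-sided achieved significance

print(f"SE = {se:.1f}, z = {z:.2f}, p = {p_value:.4f}")
```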
Hypothesis Testing I: 28
7. Statistical Decision:
• Since 0.0348 is less than 0.05 we will reject Ho.
• We are saying that x̄ = 3250 is sufficiently
different from µo = 3400
• that it suggests that Ho is not true and should be
abandoned.
• That is, if Ho is true, the probability of a sample
mean this far away is only .0348 or 3.5% – an
unlikely outcome, so reject Ho.
Hypothesis Testing I: 29
8. Conclusion:
Hospital A has babies of significantly different
birth weight than the US average.
In fact, the mean birth weight at Hospital A appears
to be lower than the US mean.
Hypothesis Testing I: 30
9. Compute a Confidence Interval Estimate of the
true mean birth weight for babies at Hospital A
We have all the ingredients to compute a confidence
interval estimate:
x̄ = 3250, σ = 710, n = 100
since the true standard deviation is known we use:
z.975 = 1.96 for a 95% confidence interval:
x̄ ± z.975 (σ/√n) = 3250 ± (1.96)(710/10) = 3250 ± 139.2
95% CI:
(3110.8, 3389.2)
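The same interval can be computed in a few lines. This is a sketch with illustrative variable names; the critical value comes from `NormalDist().inv_cdf(0.975)`, which is about 1.96.

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n = 3250.0, 710.0, 100       # values from the slides

z_crit = NormalDist().inv_cdf(0.975)      # z.975, approximately 1.96
margin = z_crit * sigma / sqrt(n)         # half-width of the interval
ci = (xbar - margin, xbar + margin)

print(f"95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
```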
Hypothesis Testing I: 31
Interpretation:
The hypothesized mean μo = 3400 falls outside
(above) the 95% confidence interval.
It therefore seems likely that the mean birth weight
at Hospital A is less than the overall US mean.
Your confidence interval should give a consistent
result with your hypothesis test.
If it doesn’t – check your work!
Hypothesis Testing I: 32
Comparing CI estimates and Hypothesis Testing
• When conducting a hypothesis test, with an
α = .05 decision rule, we are centering an interval
around the hypothesized mean (μo):
[Figure: interval from μo − z.975 σ/√n to μo + z.975 σ/√n, centered at μo; does x̄ fall inside it?]
• When our observed sample mean (x̄) falls
outside this interval, we interpret this as
indicating, with 0.05 likelihood of error, that our
sample comes from a distribution with a
different mean
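The agreement between the two views can be illustrated with the birth-weight numbers. In this sketch (variable names are illustrative), one interval is centered on the hypothesized mean and the other on the sample mean. Both have the same half-width, so x̄ falls outside the first exactly when the hypothesized mean falls outside the second.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 3400.0, 710.0, 100, 3250.0   # values from the slides
half_width = NormalDist().inv_cdf(0.975) * sigma / sqrt(n)

# Hypothesis-test view: interval centered on the hypothesized mean
test_lo, test_hi = mu0 - half_width, mu0 + half_width
reject_by_test = not (test_lo <= xbar <= test_hi)

# Confidence-interval view: interval centered on the sample mean
ci_lo, ci_hi = xbar - half_width, xbar + half_width
mu0_outside_ci = not (ci_lo <= mu0 <= ci_hi)

print(reject_by_test, mu0_outside_ci)   # the two flags always agree
```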
Hypothesis Testing I: 33
COMMENTS:
1. The “.05 rule” alone is very uninformative
• it leads to a “reject” or “do not reject” with no
information about the data.
A better approach is to report both
• the achieved significance
• confidence interval estimate
You can then interpret these, while also leaving
room for your reader to interpret.
Hypothesis Testing I: 34
2. Don’t forget the conclusion step!
Too often, only a p-value is reported or, worse
still, only a “reject” or “do not reject” is reported.
3. Statistical significance alone gives no clues
about biology.
Keep in mind that a standard error is a function
of sample size n. This means that by increasing n,
the SE can be made smaller and smaller.
Hypothesis Testing I: 35
Eventually, any observation can achieve statistical
significance regardless of its biological relevance.
For example is a statistically significant change in
blood pressure of 1 mm Hg very useful?
If we have a very large n, say n=1000, we might find
such a difference of 1 mm Hg statistically significant,
but it may not be a biologically meaningful
distinction.
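The effect of sample size on significance can be seen by holding a 1 mm Hg difference fixed and varying n. The slides give no standard deviation for blood pressure, so σ = 10 mm Hg below is purely an assumed value for illustration, as is the helper name `two_sided_p`.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(diff, sigma, n):
    """Two-sided z-test p-value for an observed mean difference."""
    z = diff / (sigma / sqrt(n))
    return 2 * NormalDist().cdf(-abs(z))

sigma = 10.0   # ASSUMED population SD in mm Hg (not given in the slides)
diff = 1.0     # observed mean change of 1 mm Hg

p_by_n = {n: two_sided_p(diff, sigma, n) for n in (25, 100, 400, 1000)}
for n, p in p_by_n.items():
    print(f"n = {n:4d}  SE = {sigma / sqrt(n):.3f}  p = {p:.4f}")
```

Under this assumed σ, the same 1 mm Hg difference is far from significant at n = 100 and clearly significant at n = 1000, although its biological relevance has not changed.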
Hypothesis Testing I: 36
4. A statistical hypothesis test uses probabilities
based only on the null hypothesis (Ho) model!
The proof by contradiction thinking asks us to:
• presume that Ho is true
• then examine the plausibility of our data in
light of this assumption.
• We either reject it, or we fail to do so.
We do not prove that Ho is correct.
Hypothesis Testing I: 37
5. We can summarize the results of statistical
hypothesis testing as follows:
                        NULL HYPOTHESIS
DECISION          Actually True        Actually False
Fail to Reject    Correct decision     Type II error (β)
Reject            Type I error (α)     Correct decision
Hypothesis Testing I: 38
IF Ho is true and we (incorrectly) reject Ho
• we have type I error
• we can calculate Pr[type I error] = α
IF Ha is true and we (incorrectly) fail to reject Ho
• we have type II error
• we must have a specific Ha model before
we can calculate Pr[type II error] = β
IF Ha is true and we (correctly) reject Ho
• This occurs with probability = (1 − β)
which we call the “POWER” of a test
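Once a specific Ha is chosen, β and power follow directly from the normal model. The sketch below reuses the birth-weight setting (μo = 3400, σ = 710, n = 100, two-sided α = .05). The specific alternative μ = 3250 is an assumption made here for illustration, and `power_two_sided_z` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided_z(mu0, mu_a, sigma, n, alpha=0.05):
    """Pr[reject Ho] when the true mean is mu_a (two-sided z-test)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = (mu_a - mu0) / (sigma / sqrt(n))   # mean of z when mu = mu_a
    # Ho is rejected when z < -z_crit or z > z_crit; here z ~ N(shift, 1)
    lower_tail = std_normal.cdf(-z_crit - shift)
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    return lower_tail + upper_tail

power = power_two_sided_z(mu0=3400, mu_a=3250, sigma=710, n=100)  # assumed Ha
beta = 1 - power                                                  # Pr[type II error]
alpha_check = power_two_sided_z(mu0=3400, mu_a=3400, sigma=710, n=100)

print(f"power = {power:.3f}, beta = {beta:.3f}, alpha = {alpha_check:.3f}")
```

Setting the alternative mean equal to μo returns α, which is a quick sanity check: when Ho is true, the only way to reject is a type I error.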
Hypothesis Testing I: 39
Example 2
Does a new treatment for cancer increase the survival
time from diagnosis significantly beyond 38.3
months?
A sample of 100 subjects given the new treatment
had a mean survival time of 46.9 months.
Assume the data are a random sample of survival
times from a N(m,s2) with s = 43.3 months.
(e.g., we may know the distribution of survival
times from prior studies)
Hypothesis Testing I: 40
SOLUTION.
2. Assumptions
We have a random sample of n=100 survival times
from a population with σ = 43.3.
Thus, X̄ ~ N(μ, 43.3²/100)
3. Specify Ho and Ha
Research hypothesis suggests an increase in
survival
Ho: μ ≤ 38.3
Ha: μ > 38.3 (one sided!)
Hypothesis Testing I: 41
4. Specify Test Statistic:
Since σ = 43.3 is known, we’ll use
z = (x̄ − μo) / (σ/√n) = (x̄ − 38.3) / (43.3/√100)
5. Decision Rule
• We’ll calculate z using observed data
• compute the achieved significance (p-value)
• and compare this to 0.05
• If it is less than 0.05 we will reject Ho
• otherwise we will “fail to reject” Ho
Hypothesis Testing I: 42
6. Calculations – Achieved significance
Be careful! For a one-sided test, we are
concerned with a probability in only 1 direction
from μo!
Pr[x̄ ≥ 46.9] = Pr[(x̄ − μo)/(σ/√n) ≥ (46.9 − 38.3)/(43.3/√100)]
= Pr[z ≥ 1.986]
= .0233
[Figure: sampling distribution centered at 38.3 with the upper tail beyond x̄ = 46.9 (z = 1.986) shaded; shaded area = .0233]
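The one-sided calculation uses only the upper tail. A minimal sketch with the numbers from the slides follows; variable names are illustrative. Computed from the unrounded z, the tail area is about .0235. The slide's .0233 is the table value at z rounded to 1.99.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 38.3, 43.3, 100, 46.9   # values from the slides

z = (xbar - mu0) / (sigma / sqrt(n))
p_one_sided = 1 - NormalDist().cdf(z)   # Pr[z >= observed]: upper tail only

print(f"z = {z:.3f}, one-sided p = {p_one_sided:.4f}")
```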
Hypothesis Testing I: 43
7. Statistical Decision
.023 < .05 ⇒ Reject Ho
8. Conclusion
It is unlikely that the improvements in survival
time are due to chance. The new treatment
appears to significantly improve survival.
9. Confidence Interval on True Mean survival using
new treatment:
z.975 = 1.96 for a 95% confidence interval, known σ:
x̄ ± z.975 (σ/√n) = 46.9 ± (1.96)(43.3/10) = 46.9 ± 8.49
95% CI: (38.41, 55.39)
Hypothesis Testing I: 44
A note on One-sided hypothesis tests:
Quite often, we are interested in a change in only
one direction:
• Does a new drug increase the proportion of
patients cured?
• Does a new policy decrease the hospital
length of stay (LOS)?
A test that looks at a change in only one direction
seems to make sense.
However in practice this is rarely done.
Hypothesis Testing I: 45
• If it is possible for the change to occur in
either direction
• then a test should look for the change in
either direction.
For example,
• the new drug could actually decrease the
proportion of patients cured,
• or the new policy could potentially result in
increased LOS due to unexpected side
effects.
Standard practice is to use a two-sided test!
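The practical difference between the two choices is a factor of two in the p-value. The sketch below compares them for the survival example (z = 1.986) and for a second value, z = 1.80, which is hypothetical and chosen here only because the two tests then reach different decisions at α = .05.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one-sided upper-tail p, two-sided p) for a z statistic."""
    std_normal = NormalDist()
    upper = 1 - std_normal.cdf(z)            # Ha: mu > mu0
    two_sided = 2 * std_normal.cdf(-abs(z))  # Ha: mu != mu0
    return upper, two_sided

one_a, two_a = p_values(1.986)   # survival example from the slides
one_b, two_b = p_values(1.80)    # hypothetical z, for illustration only

print(f"z = 1.986: one-sided {one_a:.4f}, two-sided {two_a:.4f}")
print(f"z = 1.80:  one-sided {one_b:.4f}, two-sided {two_b:.4f}")
```

For a positive z the two-sided p-value is twice the one-sided one, so a result can pass the one-sided test and fail the two-sided test, as the hypothetical z = 1.80 shows.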
Hypothesis Testing I: 46
Recap of Significance Testing So Far
The Basic Idea
1. Compute the “probability of the data”
(achieved significance) presuming Ho to be true.
• Large Probabilities are consistent with Ho
-- do not reject
• Small Probabilities are NOT consistent with Ho
-- reject
Hypothesis Testing I: 47
“Probability of the Data”
We want to know the probability of a sample
statistic as extreme or more extreme than the one
observed.
[Figure: two panels, “One Sided Alternative” and “Two Sided Alternative”, each showing the distribution determined by Ho centered at 0, with the observed sample statistic (t or z) marked]
Hypothesis Testing I: 48
Next we will consider a couple of examples
that parallel the situations we have discussed
so far for confidence interval estimation.
We will also focus on computer analysis for
conducting hypothesis tests
Hypothesis Testing I: 49
Application 1: One Population, σ² Known, Test of
hypothesis on mean, μ
1. Research Question: Serum enzyme A levels are
measured in 10 patients with a sample mean of 22.
If it is known that the population variance is 45 and
if normality is assumed, are the data consistent with
a population mean of 25?
That is, we have
• n = 10
• σ² = 45
• x̄ = 22
• μo = 25
Hypothesis Testing I: 50
2. Assumptions
• Random sample of serum enzyme A levels
• from a Normal distribution with σ² = 45.
• Thus, X̄ ~ N(μ, 45/10)
Why must we assume normality of
the data for this example?
Is n particularly large for Central Limit Theorem
to hold?
Hypothesis Testing I: 51
3. Specify Ho and Ha:
The wording “are the data consistent with”
suggests a two sided alternative:
Ho: μ = 25
Ha: μ ≠ 25
4. Test Statistic
σ² known suggests use of Normal or
z-transformation:
z = (x̄ − μo) / (σ/√n)
Hypothesis Testing I: 52
5. Decision Rule
We’ll calculate the achieved significance
(p-value) and compare to α = .05
Reject Ho for p < .05, else fail to reject.
6. Calculations
Test Statistic:
z = (x̄ − μo) / (σ/√n) = (22 − 25) / √(45/10) = −1.41
Hypothesis Testing I: 53
Achieved Significance:
p = Pr[z ≤ −1.41] + Pr[z ≥ 1.41] = 0.1586
[Figure: standard normal curve with both tails shaded, below −1.41 and above 1.41. For a 2-sided test, the total shaded area is the achieved level of significance, or the p-value]
Minitab examples
Hypothesis Testing I: 54
7. Statistical Decision
• 0.1586 represents the probability of a sample
mean at least as far away from 25 as 22,
if in fact the true mean is 25.
• 0.1586, or 15.86% is reasonably high.
• This suggests the data are consistent with the
null hypothesis
⇒ Do NOT reject Ho
.1586 > .05
(or .1586 > α)
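The serum enzyme A test can be reproduced from the four numbers given in the research question. This is a sketch with illustrative variable names. With z left unrounded the p-value is about .157, close to the table-based .1586, and the decision is the same.

```python
from math import sqrt
from statistics import NormalDist

n, variance, xbar, mu0 = 10, 45.0, 22.0, 25.0   # values from the slides
alpha = 0.05

se = sqrt(variance / n)                   # sigma / sqrt(n)
z = (xbar - mu0) / se
p_value = 2 * NormalDist().cdf(-abs(z))   # two-sided achieved significance
decision = "reject Ho" if p_value < alpha else "fail to reject Ho"

print(f"SE = {se:.2f}, z = {z:.2f}, p = {p_value:.4f}: {decision}")
```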
Hypothesis Testing I: 55
8. Conclusion
• The data are consistent with a hypothesized
mean serum enzyme A level of μ = 25.
Note
• we have not proven Ho is true, merely that
• with the evidence of our sample, μ = 25 is a
reasonable possibility
• and we cannot reject it.
Hypothesis Testing I: 56
9. 95% Confidence Interval
x̄ ± z.975 (σ/√n) = 22 ± 1.96(2.12) = (17.8, 26.2)
Note that:
⇒ μo = 25 is within the interval
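Computing the interval directly from the numbers above gives about (17.8, 26.2), and μo = 25 lies inside it. A minimal sketch, with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

n, variance, xbar, mu0 = 10, 45.0, 22.0, 25.0   # values from the slides

margin = NormalDist().inv_cdf(0.975) * sqrt(variance / n)   # 1.96 * 2.12
ci = (xbar - margin, xbar + margin)
mu0_inside = ci[0] <= mu0 <= ci[1]

print(f"95% CI: ({ci[0]:.1f}, {ci[1]:.1f}), contains mu0: {mu0_inside}")
```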
Hypothesis Testing I: 57
Using Minitab:
Stat → Basic Stats → 1-Sample Z
Select variable to test
Test mean: Enter μo
Sigma: Enter σ
Hypothesis Testing I: 58
Z-Test
Test of mu = 25.00 vs mu not = 25.00
The assumed sigma = 6.71

Variable    N    Mean   StDev   SE Mean      Z      P
SerA       10   22.00    6.29      2.12  -1.41   0.16
Hypothesis Testing I: 59
Application 2: One Normal Population, σ² UNknown
Test of μ
1. Research Question
A drug company claims a certain capsule contains 2.5
milligrams of a drug X. An independent laboratory
obtained a random sample of 20 capsules and
measured the amount of the drug in each. The
measurements were as follows:
3.31 1.30 0.61 2.42 1.94 2.23 2.35 0.96 2.97 2.91
1.70 2.05 3.15 2.54 1.84 2.23 1.94 0.88 0.83 1.92
Is the drug company claim correct?
Hypothesis Testing I: 60
2. Assumptions
We have data from a random sample from a
normal distribution, σ² unknown.
(Why must we assume normality of the data?
So the t-distribution is applicable!)
3. Specify Ho and Ha
Ho: μ = 2.5 mg
Ha: μ ≠ 2.5 mg (Two-sided)
4. Test Statistic
σ² UNknown suggests use of the t-transformation:
t = (x̄ − μo) / (s/√n), with n − 1 degrees of freedom
Hypothesis Testing I: 61
5. Decision Rule
We’ll calculate the achieved significance level
(two-sided) and compare this to a type I error
of α = 0.05
6. Calculations (x̄ = 2.00, s = .787)
Test Statistic:
t = (x̄ − μo) / (s/√n) = (2.0 − 2.5) / (.787/√20) = −2.83
Achieved significance:
p = Pr[t19 ≤ −2.83] + Pr[t19 ≥ 2.83] = 2(.0054) = .0108
Hypothesis Testing I: 62
7. Statistical Decision
• 0.0108 < 0.05
• Therefore, REJECT Ho
p-value < type I error (α)
8. Conclusion
• The mean amount of drug in the capsules
appears to be significantly different from 2.5
mg.
• In fact, the mean is significantly less than 2.5
mg.
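The one-sample t-test can be run on the 20 capsule measurements listed in the research question. The Python standard library has no t-distribution, so this sketch integrates the Student t density numerically with Simpson's rule. The helpers `t_pdf` and `t_two_sided_p` are illustrative, not a library API. Working from the raw data (x̄ = 2.004 rather than the rounded 2.00) gives t of about −2.82, as in the Minitab output.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3          # Pr[0 <= T <= |t|]
    return 2 * (0.5 - central)       # both tails beyond |t|

data = [3.31, 1.30, 0.61, 2.42, 1.94, 2.23, 2.35, 0.96, 2.97, 2.91,
        1.70, 2.05, 3.15, 2.54, 1.84, 2.23, 1.94, 0.88, 0.83, 1.92]
mu0 = 2.5                            # claimed content in mg

n = len(data)
xbar, s = mean(data), stdev(data)    # stdev uses the n - 1 divisor
se = s / sqrt(n)
t = (xbar - mu0) / se
p_value = t_two_sided_p(t, df=n - 1)

print(f"mean = {xbar:.3f}, s = {s:.3f}, SE = {se:.3f}, t = {t:.2f}, p = {p_value:.3f}")
```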
Hypothesis Testing I: 63
9. Confidence Interval Estimate
x̄ ± t19; .975 (se) = 2.00 ± 2.093(.176) = (1.63, 2.37).
• Note that
The confidence interval does not include the
hypothesized mean amount of 2.50
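The interval can also be computed from the raw capsule data. The standard library offers no inverse t-distribution, so this sketch takes the critical value 2.093 from the slides as given. With the unrounded mean of 2.004, the result is roughly (1.64, 2.37), which still excludes 2.50.

```python
from math import sqrt
from statistics import mean, stdev

data = [3.31, 1.30, 0.61, 2.42, 1.94, 2.23, 2.35, 0.96, 2.97, 2.91,
        1.70, 2.05, 3.15, 2.54, 1.84, 2.23, 1.94, 0.88, 0.83, 1.92]
mu0 = 2.5
t_crit = 2.093                 # t19;.975 as quoted in the slides (table value)

n = len(data)
xbar = mean(data)
se = stdev(data) / sqrt(n)     # estimated standard error of the mean
ci = (xbar - t_crit * se, xbar + t_crit * se)
mu0_inside = ci[0] <= mu0 <= ci[1]

print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}), contains mu0: {mu0_inside}")
```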
Hypothesis Testing I: 64
The particular test just conducted is known as a
ONE-SAMPLE t-TEST.
• We have a sample from a single population
• We are comparing our observed sample
mean to some hypothesized value for the
mean.
Hypothesis Testing I: 65
Using Minitab: Stat → Basic Stats → 1-Sample t
Select variable to test
Enter μo
Hypothesis Testing I: 66
T-Test of the Mean
Test of mu = 2.500 vs mu not = 2.500

Variable    N    Mean   StDev   SE Mean      T      P
drugx      20   2.004   0.787     0.176  -2.82  0.011
Note that the one sample t-test provides you
with estimates of the mean, standard
deviation and standard error.
By checking the Confidence Interval option,
you can get a confidence interval rather than
a hypothesis test.
Hypothesis Testing I: 67