Introduction to Hypothesis Testing
Decision Examples
                    TRUE STATE
DECISION            Innocent            Guilty
Declare innocent    correct decision    ERROR
Declare guilty      ERROR               correct decision
• How can the jury avoid
– Convicting an innocent person?
– Freeing a guilty person?
• Is one kind of error worse than another?
– What does the instruction, “Innocent unless the evidence
proves guilt beyond a reasonable doubt,” suggest about
how our system, in theory, balances the two?
Introduction to Hypothesis Testing
Decision Examples
                             TRUE STATE
DECISION                     No Prostate Cancer    Prostate Cancer
Decide no Prostate Cancer    correct decision      ERROR
Biopsy                       ERROR                 correct decision
• The evidence comes from a painless blood test
– A PSA of 4.0 or less is considered normal (cancer-free)
– A PSA above 4.0 can be caused by infection or by cancer of the
prostate
– Urologists disagree on how far above 4.0, or for how long
above 4.0, the PSA should be to call for a biopsy, and on
how the age of the patient should influence the decision
• Note that the disagreements are about the decision criterion and the
relative costs of the two types of errors
Introduction to Hypothesis Testing
What would you do if you wanted to determine whether a two-sided
coin is fair?
• You’d probably flip it a bunch of times to see if about
1/2 the time it’s “heads” and 1/2 the time it’s “tails”.
• You might also set a criterion by which it would be
considered unfair. For example, you might decide
that if 9 or more of 12 flips come up “heads” (or 9 or
more come up “tails”), the coin is unfair.
• This scenario is a simple hypothesis test. Using what
is known about probabilities and sampling
distributions, even more precise tests may be
developed.
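One way to judge the 12-flip rule is to ask how often it would brand a perfectly fair coin as unfair. The sketch below assumes independent flips with P(heads) = 0.5 and uses only Python's standard library; the helper name `prob_at_least` is hypothetical, not something from these slides.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summing the exact binomial terms."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_flips = 12
cutoff = 9  # the example rule: 9 or more heads, or 9 or more tails

# For a fair coin, "9+ heads" and "9+ tails" are disjoint events with equal
# probability, so the chance the rule calls a fair coin unfair is twice one tail.
p_one_tail = prob_at_least(cutoff, n_flips)
p_false_alarm = 2 * p_one_tail

print(f"P(9+ heads)         = {p_one_tail:.4f}")
print(f"P(rule says unfair) = {p_false_alarm:.4f}")
```

Under these assumptions the rule misfires about 14.6% of the time (299/4096 per tail), well above the 5% guideline described below.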
Introduction to Hypothesis Testing
• as researchers, we need to decide at what
point we believe the coin is unfair
• a typical guideline is to call anything within the
middle 95% of the distribution fair, while the
upper and lower 2.5% would be unfair
[Figure: the middle 95% of the distribution is labeled “fair”; the upper and lower 2.5% tails (α/2 each) are labeled “unfair” and marked as the critical regions]
Introduction to Hypothesis Testing
Probability of each number of heads in 12 flips of a fair coin:
Number of heads    Probability
0                  0.00024
1                  0.0029
2                  0.0161
3                  0.0537
4                  0.1208
5                  0.1936
6                  0.2256
7                  0.1936
8                  0.1208
9                  0.0537
10                 0.0161
11                 0.0029
12                 0.00024
[Figure: bar chart of these probabilities (p against # heads), with a span of 2 standard deviations marked]
• using the addition rule of probability, notice that the
probability of 10, 11, or 12 “heads” out of 12
is < .025 or 2.5%
• the same is true for 0, 1, or 2
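The table and the addition-rule claim can be reproduced directly. This is a minimal sketch assuming a fair coin and independent flips; the variable names are illustrative.

```python
from math import comb

N = 12  # number of flips, as in the table above

# Probability of exactly k heads for a fair coin: C(12, k) / 2**12
pmf = {k: comb(N, k) / 2**N for k in range(N + 1)}

for k, prob in pmf.items():
    print(f"{k:2d} heads: {prob:.5f}")

# Addition rule: the outcomes are mutually exclusive, so their probabilities add.
upper_tail = pmf[10] + pmf[11] + pmf[12]
lower_tail = pmf[0] + pmf[1] + pmf[2]
print(f"P(10, 11, or 12 heads) = {upper_tail:.4f}")
print(f"P(0, 1, or 2 heads)    = {lower_tail:.4f}")
```

Each tail sums to 79/4096 (about 0.0193), just under 0.025; adding the 9-heads outcome would push the upper tail past 0.025.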
Hypothesis Testing
Definition:
• An inferential procedure that uses sample
data to evaluate a hypothesis about a
population
• Hypothesis testing involves a standardized set
of procedures so a researcher can objectively
evaluate a hypothesis
• The process starts with a research question, such as: how will the population mean change after a
treatment (independent variable) is
administered?
Hypothesis Testing: The Steps
1. State the hypotheses: null & alternative
2. Set the criterion
3. Obtain sample data
4. Calculate the test statistic
5. Decide to reject or fail to reject the
null hypothesis and interpret your
decision
1. State the hypotheses
• the null hypothesis, H0, is always the hypothesis
that states that there is no treatment effect, no
change, no difference, etc.
• the alternative hypothesis, H1, states that there
was a treatment effect, usually in terms of the
independent variable, I.V., having an effect on the
dependent variable, D.V.
Hypotheses are always stated in terms of populations.
Remember, even though samples are used, the goal
of inferential statistics is to make statements about
the population of interest.
1. State the hypotheses (cont.)
Null Hypothesis H0
• for example, suppose a researcher wanted to
know what effect smoking marijuana has on
reaction time
• knowing that the population mean on this
particular reaction time instrument is 1.2
seconds, the null hypothesis can be set up as
H0: µ=1.2 sec (control group)
1. State the hypotheses (cont.)
Alternative Hypothesis H1
• when the direction of the effect is not known,
the alternative hypothesis will be stated in
terms of inequality,
H1: µ≠1.2 sec
• there are instances, based on theory or
previous research, when the alternative
hypothesis is stated in terms of direction
– for example, based on previous research, it is known
that smoking marijuana increases the amount of
time it takes to react
H1: µ>1.2 sec
1. State the hypotheses (cont.)
• notice in the previous example that the
null hypothesis, H0, still maintains
equality
• this should always be the case
• therefore,
H0: µ=1.2 sec
H1: µ>1.2 sec
2. Set the criterion
• referring back to the example of flipping the
coin, setting the criterion, α, is the statistical
equivalent of deciding “at what point is the
coin unfair”
• as was already mentioned, the middle 95% is
usually considered “fair”
• in this example, the remaining 5% would be
considered error, therefore the criterion is
α=0.05
[Figure: the middle 95% of the distribution (area = 95%), with a critical region of area 2.5% (α/2) in each tail]
2. Set the criterion (cont.)
• The criterion, α, is also known as Type I
Error
• Type I error is defined as the probability
of rejecting a true null hypothesis
– that is to say, if the null hypothesis is true
and we reject it, there is a predetermined
chance (usually 5%) that we are wrong
• errors will be discussed in detail later on
2. Set the criterion (cont.)
• The criterion delimits what is called the critical region
• The critical region is defined as the extreme scores in a
distribution where the probability of obtaining them is
< α when the null hypothesis is true
[Figure: Two-Tailed Test, with a critical region in each tail (α/2 each); One-Tailed Test, with a single critical region in one tail]
2. Set the criterion (cont.)
• as was previously mentioned, the unit normal
table can be used to calculate area
proportions above or below a score or scores
in a distribution corresponding to a given
percentage
Example
– Find the z-score associated with the upper and
lower scores when considering 95% of a normal
distribution
• “upper and lower scores” → two-tailed test
• α should be divided by 2 before looking up the z-score
• α/2 = 0.05/2 = 0.025
2. Set the criterion (cont.)
[Figure: normal distribution with p = .025 in each tail]
• In Appendix D: Table A, look for p = 0.025 in
the “area beyond z” column
• The z-score is 1.96. Since it’s a two-tailed test,
z = ±1.96.
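The table lookup can be mimicked in code. This sketch uses Python's `statistics.NormalDist` as a stand-in for the unit normal table (an assumption; the slides use Appendix D: Table A), and the helper `area_beyond` is a hypothetical name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # the unit normal distribution (mean 0, sd 1)

def area_beyond(z: float) -> float:
    """Proportion of the unit normal curve above z (the "area beyond z" for z > 0)."""
    return 1 - std_normal.cdf(z)

alpha = 0.05
tail_area = alpha / 2                        # two-tailed: alpha split between the tails
z_crit = std_normal.inv_cdf(1 - tail_area)   # the z that leaves tail_area beyond it

print(f"z_crit = ±{z_crit:.2f}")                       # the lookup done in reverse
print(f"area beyond 1.96 = {area_beyond(1.96):.4f}")   # the forward lookup
```

`inv_cdf` plays the role of searching the table for a given tail area, while `area_beyond` reads the tail area for a given z.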
3. Obtain sample data
• After manipulating the independent variable as specified
by your hypothesis, collect sample data
• Use descriptive statistics to see what
your data look like
4. Calculate the test statistic
• one of the challenges you will face is
deciding which test statistic to use
• you will learn what each one is used for
as the class progresses
5. Decide to reject or fail to reject
• if the test statistic falls in the
critical region, the null
hypothesis is rejected
[Figure: test statistic falling in a critical region (α/2 in each tail)]
• if the test statistic does not fall
in the critical region, the null
hypothesis is NOT rejected
[Figure: test statistic falling between the two critical regions]
Notice that no statements are made
about the alternative hypothesis
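The decision rule can be written as a small function. This is a minimal sketch; the name `reject_null`, the `tails` parameter, and the example z values are hypothetical illustrations. Note that it returns only a verdict on H0 and says nothing about the alternative hypothesis.

```python
def reject_null(z: float, z_crit: float, tails: str = "two") -> bool:
    """Return True if the test statistic z lands in the critical region.

    tails="two":   critical region is z <= -z_crit or z >= +z_crit
    tails="upper": critical region is z >= +z_crit (one-tailed, predicted increase)
    tails="lower": critical region is z <= -z_crit (one-tailed, predicted decrease)
    """
    if tails == "two":
        return abs(z) >= z_crit
    if tails == "upper":
        return z >= z_crit
    if tails == "lower":
        return z <= -z_crit
    raise ValueError("tails must be 'two', 'upper', or 'lower'")

# Hypothetical test statistics checked against the two-tailed cutoff of ±1.96
for z in (-2.3, 0.8, 2.1):
    decision = "reject H0" if reject_null(z, 1.96) else "fail to reject H0"
    print(f"z = {z:+.2f}: {decision}")
```

The two outcomes are "reject H0" and "fail to reject H0"; there is deliberately no "accept H1" branch.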
Caveat:
• hypothesis testing does not “prove” anything
• this is particularly true of the alternative
hypothesis
• probability statements are not made about the
alternative hypothesis because there might still
be other alternative hypotheses
– comments such as “supports the theory” and
“provides evidence to suggest” are common ways
of describing research findings
Example:
Suppose I am interested in determining
whether or not review sessions have any
effect on exam performance. I will
administer the independent variable, a review
session, to a sample of students in an
attempt to determine if this has an effect on
the dependent variable, exam performance.
Based on information gathered in previous
semesters, I know that the population mean
for a given exam is 24.
Step 1: State the hypotheses
• A researcher always states two opposing
hypotheses
NULL HYPOTHESIS:
– States that the treatment has no effect (there is
no change, no difference, nothing happened).
– The null hypothesis is always written as H0.
Example:
– H0: µ=24 (Even with the review session, the
mean exam score is 24)
– µ represents the hypothesized population mean
for students having review sessions
Step 1: State the hypotheses (cont)
ALTERNATIVE HYPOTHESIS: Predicts that the
independent variable will have an effect on the
dependent variable (this is the hypothesis the
researcher “roots” for)
– The alternative hypothesis is written as H1 or HA.
We’ll use H1.
Example:
– H1: µ≠24
– µ represents the hypothesized population mean
for students having review sessions. The true
population mean for these students may be higher
or lower than 24
Step 1: State the hypotheses (cont)
Hypotheses:
• H0: µ=24
• H1: µ≠24
– The task is to choose between these two
hypotheses
– The null hypothesis is the hypothesis that is
actually tested (we can only test one distribution
at a time)
– The null hypothesis states that the mean for the
review population will be 24 -- the same as the
untreated, previous population
Step 2: Setting the criterion
• Our decision is going to be based on a comparison of
our sample mean and the hypothesized population mean
X̄ compared to µ:
– Small discrepancy → fail to reject the null hypothesis
– Large discrepancy → reject the null hypothesis
How far does our sample mean need to be from the
hypothesized mean for us to tell whether the difference is due to our
manipulation or just to sampling error?
The process of answering this question
involves establishing an alpha level.
Step 2: Setting the criterion (cont)
ALPHA LEVEL (LEVEL OF SIGNIFICANCE):
• Alpha, symbolized as α, is an area under the curve that
we use to define “very unlikely”
or “very extreme” sample values
• By convention, α is usually set at .05, .01, or .001
• The alpha level is used to split the distribution into
two sections:
– Sample means that are compatible with the null hypothesis
(the center of the distribution)
– Sample means that are significantly different from the null
hypothesis (the very unlikely values that fall in the tails of
the distribution)
[Figure: the center of the distribution labeled “Compatible with H0”; each tail (α/2) labeled “Incompatible with H0”]
Step 2: Setting the criterion (cont)
• If alpha is set at α=.05, then the extreme 5% of
scores in the sampling distribution would represent
those “extreme” or “unlikely” sample values
• This “extreme” region of the distribution that we
define with α is called the critical region
• If we set α to .05 for our example, this would mean
that if our sample mean falls in the critical region, we
would conclude that the mean of the population of the
review group is not 24 (the same as the non-review
group). It is something larger or smaller, depending
on which tail it falls in.
[Figure: two-tailed test with a 2.5% critical region (α/2) in each tail]
Step 2: Setting the criterion (cont)
Directional vs. Non-directional Hypotheses
(One-tailed vs. Two-Tailed)
TWO-TAILED HYPOTHESIS TEST (NON-DIRECTIONAL):
The alternative hypothesis does not specify the direction of
change in the mean; all that is predicted is that some change
will occur
Example: Do review sessions have
any effect on exam performance?
H0: µ=24
H1: µ≠24
• Sample values that are substantially different from (either larger or
smaller than) the hypothesized population mean would lead
to a rejection of the null hypothesis
Step 2: Setting the criterion (cont)
Directional vs. Non-directional Hypotheses
(One-tailed vs. Two-Tailed)
ONE-TAILED HYPOTHESIS TEST (DIRECTIONAL):
The alternative hypothesis specifies either an increase or a
decrease in the mean due to treatment; a specific
prediction about the direction of change is made
Example: Do review sessions improve exam performance?
H0: µ≤24
H1: µ>24
• Only sample values substantially larger than 24 would lead
to a rejection of the null hypothesis
Step 2: Setting the criterion (cont)
Effects on Alpha:
• Due to convention, alpha is most often set at .05
• For a two-tailed test, alpha must be divided between
the two tails (.025 in each tail of the distribution)
• For a one-tailed test, all of the alpha amount is found
in one tail (.05)
[Figure: Two-Tailed Test with .025 in each tail; One-Tailed Test with .05 in one tail]
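To see how the one-tailed versus two-tailed choice moves the cutoff, this sketch computes the critical z for each conventional α level. It assumes Python's `statistics.NormalDist` as a stand-in for the unit normal table; the `cutoffs` dictionary is just an illustrative container.

```python
from statistics import NormalDist

std_normal = NormalDist()  # unit normal: mean 0, standard deviation 1

# Critical z for each conventional alpha level.
# One-tailed: all of alpha sits in one tail. Two-tailed: alpha/2 in each tail.
cutoffs = {}
for alpha in (0.05, 0.01, 0.001):
    one_tailed = std_normal.inv_cdf(1 - alpha)
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)
    cutoffs[alpha] = (one_tailed, two_tailed)
    print(f"alpha = {alpha}: one-tailed z = {one_tailed:.3f}, "
          f"two-tailed z = ±{two_tailed:.3f}")
```

At α = .05 the one-tailed cutoff (about 1.645) is lower than the two-tailed cutoff (about 1.96), so a test statistic between the two is significant in a one-tailed test but not in a two-tailed one.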
Step 3: Obtain sample data
• In order to ensure that the researcher makes
an objective decision, the data is collected
after the researcher has stated the hypotheses
and set the alpha level.
– Our hypothesis is that the review session will
improve test scores. Thus, we should use a one-tailed
test, α = 0.05
EXAMPLE A: $\bar{X} = 28$, $\sigma_{\bar{X}} = 2.29$
EXAMPLE B: $\bar{X} = 28$, $\sigma_{\bar{X}} = 2.67$
Step 4: Calculate the test statistic
$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
EXAMPLE A: $z = \frac{28 - 24}{2.29} = 1.75$
EXAMPLE B: $z = \frac{28 - 24}{2.67} = 1.50$
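The z computation for both examples can be checked in a few lines. This is a minimal sketch; `z_statistic` is a hypothetical helper name, and the inputs are the sample mean, µ, and standard errors given above.

```python
def z_statistic(sample_mean: float, mu: float, standard_error: float) -> float:
    """z = (X̄ − µ) / σ_X̄: how many standard errors the sample mean lies from µ."""
    return (sample_mean - mu) / standard_error

mu_0 = 24  # population mean under H0
examples = {"A": 2.29, "B": 2.67}  # standard error of the mean for each example

for name, se in examples.items():
    z = z_statistic(28, mu_0, se)
    print(f"Example {name}: z = {z:.2f}")
```

The same 4-point difference yields a smaller z in Example B because its standard error is larger.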
Step 5: Evaluate the null hypothesis
• In the final step, you compare your sample
data to the null hypothesis and make a
decision
• There are 2 possible decisions:
1. Reject the null hypothesis: if our sample
mean is substantially different from what the
null hypothesis predicts (if the sample mean
falls in the critical region)
2. Fail to reject the null hypothesis: if our
sample mean is not substantially different
from the null hypothesis (does not fall in the
critical region)
Step 5: Evaluate the null hypothesis (cont)
1) Reject the null hypothesis:
– The sample mean provides evidence that the
treatment had an effect
– Findings are considered statistically significant
when the null hypothesis is rejected
EXAMPLE A
In Appendix D: Table A, look up the p value for
z = 1.75
• Which column should you look at, B or C?
• Is the p value less than or greater than alpha?
• Did the treatment have an effect?
• Was it statistically significant?
Step 5: Evaluate the null hypothesis (cont)
2) Fail to reject the null hypothesis:
– Findings are considered statistically nonsignificant
when we fail to reject the null hypothesis
EXAMPLE B
In Appendix D: Table A, look up the p value for
z = 1.50
• Which column should you look at, B or C?
• Is the p value less than or greater than alpha?
• Did the treatment have an effect?
• Was it statistically significant?
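To check your table lookups for Examples A and B, this sketch computes the one-tailed p value as the area beyond z, assuming the one-tailed test with α = 0.05 from Step 3 and using Python's `statistics.NormalDist` in place of Appendix D: Table A.

```python
from statistics import NormalDist

alpha = 0.05               # one-tailed test, as set in Step 3
std_normal = NormalDist()  # unit normal distribution

results = {}
for name, z in (("A", 1.75), ("B", 1.50)):
    # One-tailed (upper) p value: the area beyond z
    p_value = 1 - std_normal.cdf(z)
    reject = p_value < alpha
    results[name] = (p_value, reject)
    verdict = "reject H0" if reject else "fail to reject H0"
    print(f"Example {name}: z = {z:.2f}, p = {p_value:.4f} -> {verdict}")
```

Both examples share the same sample mean; only the standard error differs, and that alone moves the p value across the α = 0.05 line.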
Type I & Type II error
• the fifth step of hypothesis testing is deciding to
reject or fail to reject the null hypothesis
• when this decision is made one of two things is
possible, either you are right or you are wrong
                    TRUE STATE
DECISION            H0 true                         H1 true
Do not reject H0    correct decision (p = 1 − α)    Type II error (p = β)
Reject H0           Type I error (p = α)            correct decision (p = 1 − β)
Type I & Type II error
• Type I error, α (alpha), is defined as
the probability of rejecting a true null
hypothesis
• Type II error, β (beta), is defined as
the probability of failing to reject a false
null hypothesis
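Because α is the probability of rejecting a true null hypothesis, a simulation in which H0 is true by construction should reject it in roughly α of the experiments. The population (µ = 24, σ = 5), the sample size n = 25, the seed, and the helper `type_i_error_rate` are hypothetical choices for illustration, not values from the slides.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(n_experiments: int = 10_000, n: int = 25,
                      mu: float = 24.0, sigma: float = 5.0,
                      alpha: float = 0.05, seed: int = 1) -> float:
    """Estimate P(reject H0 | H0 true) by simulating experiments where H0 holds."""
    rng = random.Random(seed)                      # seeded so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff (about 1.96)
    standard_error = sigma / n ** 0.5              # standard deviation of the sample mean
    rejections = 0
    for _ in range(n_experiments):
        # Every sample comes from the null population, so each rejection is a Type I error.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / standard_error
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / n_experiments

rate = type_i_error_rate()
print(f"simulated Type I error rate: {rate:.3f} (alpha = 0.05)")
```

With these settings the printed rate should land close to 0.05; passing `alpha=0.01` should bring it close to 0.01.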
Type I & Type II error analogy
• consider a court case
– H0: not guilty
– H1: guilty
                        TRUE STATE
DECISION                not guilty          guilty
not guilty              correct decision    Type II error
guilty                  Type I error        correct decision
• A Type I error would occur if a jury convicted
an innocent person
• A Type II error would occur if a jury let a guilty
person go free
• Our justice system sets the standard for risking a
Type I error at “beyond a reasonable doubt”,
just as researchers set α at .05, .01, etc.
Type I & Type II error
• example of a Type I error:
A researcher concludes that a certain drug
treatment significantly decreases the
possibility of heart disease when, in fact, it
doesn’t.
• example of a Type II error:
A researcher concludes that a certain drug does
not significantly decrease overactive behavior
in children when, in fact, it does.
                               TRUE STATE
DECISION                       NO decrease heart disease    decrease heart disease
NO decrease heart disease      correct decision             Type II error
decrease heart disease         Type I error                 correct decision