216 Chap 8 Introduction to Hypothesis Testing
Introduction to Hypothesis Testing
1
Hypothesis Testing
A hypothesis test is a statistical procedure
that uses sample data to evaluate a
hypothesis about a population
The hypothesis is stated in terms of the population
Predict sample statistics based on population parameters (e.g., X̄ ≈ µ)
Select a random sample from the population
Compare the observed sample data with the predicted values
2
Step 1: State the Hypotheses
The null hypothesis, H0, states that in the population there is no change, no difference, or no relationship
H0: µ_treatment = constant (e.g., µ)
e.g., H0: µ_treatment = 100
This is read as: “The null hypothesis is that the population mean of people receiving the treatment equals 100”
H0 states that the treatment had no effect
3
H0
The null hypothesis must contain an equal
sign of some sort (=, ≥, ≤)
Statistical tests are designed to reject H0,
never to accept it
4
H1: The Alternative Hypothesis
The alternative hypothesis usually takes the following form:
H1: µ_treatment ≠ constant (e.g., µ)
e.g., H1: µ_treatment ≠ 100
This is read as: “The alternative hypothesis states that the population mean of people receiving the treatment does not equal 100”
H1 states that the treatment had an effect
5
H0 and H1
Together, the null and alternative
hypotheses must be mutually exclusive and
exhaustive
Mutual exclusion implies that H0 and H1
cannot both be true at the same time
Exhaustive implies that every possible value of the population mean must make either H0 or H1 true
6
Step 2: Set the Decision Criteria
What sample means are consistent with H0, and what sample means are consistent with H1?
Separate the distribution of sample means into two sets of regions: one whose means are consistent with H0 and one whose means are consistent with H1
[Figure: distribution of sample means for n = 25, µ = 100, σ = 15, plotted from 90 to 110. Sample means close to H0 (100) are high-probability values if H0 is true; extreme values in either tail are low-probability values if H0 is true.]
7
α Level
The α level (alpha level; level of significance)
is a probability value that is used to define the
very unlikely sample outcomes if H0 is true
Psychologists usually adopt α = 0.05, although α =
0.01 and α = 0.001 are sometimes used
The critical region is composed of the extreme
sample values that are very unlikely (as
specified by the α level) to be obtained if H0 is
true
8
Critical Regions
Since we can reject H0 in two ways (extremely small or extremely large sample means), the α level is divided across the two tails of the distribution
Find the z-score whose area above equals α / 2
z = 1.96 for α = 0.05
Find the sample means that correspond to that z-score, using the standard error σ_M = σ / √n = 15 / √25 = 3
X̄ = 100 + 1.96 · 3 = 105.9
X̄ = 100 – 1.96 · 3 = 94.1
[Figure: the same distribution of sample means (90 to 110), with the critical regions beyond z = –1.96 and z = 1.96 containing the extreme, low-probability values if H0 is true; sample means close to H0 are high-probability values.]
9
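The two critical values above can be reproduced in Python with the standard library's `statistics.NormalDist` in place of a printed z table. This is a minimal sketch that assumes a known population σ, as the z test does; `two_tailed_bounds` is a hypothetical helper name, not part of any library.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_bounds(mu, sigma, n, alpha=0.05):
    """Critical z and the sample-mean cutoffs for a two-tailed z test (sigma known)."""
    standard_error = sigma / sqrt(n)              # sigma_M = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # area above z_crit equals alpha / 2
    return z_crit, mu - z_crit * standard_error, mu + z_crit * standard_error


z_crit, lower, upper = two_tailed_bounds(mu=100, sigma=15, n=25)
print(f"z = {z_crit:.2f}; reject H0 if the sample mean is below {lower:.1f} or above {upper:.1f}")
# prints: z = 1.96; reject H0 if the sample mean is below 94.1 or above 105.9
```

Computing the cutoff with `inv_cdf` rather than hard-coding 1.96 lets the same helper serve other α levels.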
Step 3: Collect Data & Compute
Sample Statistics
Randomly sample from the population
In this example, n = 25
Give the sample the treatment
Measure the dependent variable
Calculate the z-score of the sample mean in the sampling distribution: z = (X̄ – µ) / σ_M, where σ_M = σ / √n
In this example, the sample statistics are X̄ = 107 and s = 14; the population parameters (IQ scores, µ = 100 and σ = 15) are from slide 7
10
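The z-score for the Step 3 example can be checked with a few lines of Python. A minimal sketch; `z_for_sample_mean` is a hypothetical name. It uses the known population σ = 15, which is why the sample standard deviation s = 14 does not enter the calculation.

```python
from math import sqrt


def z_for_sample_mean(sample_mean, mu, sigma, n):
    """z-score of a sample mean in the distribution of sample means (sigma known)."""
    standard_error = sigma / sqrt(n)  # 15 / sqrt(25) = 3 for the IQ example
    return (sample_mean - mu) / standard_error


z = z_for_sample_mean(sample_mean=107, mu=100, sigma=15, n=25)
print(round(z, 2))  # 2.33
```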
Step 4: Make a Decision
 = 107;
z = 2.33
If the sample mean’s zscore is in the extreme tails
of the sampling distribution
(e.g. in the critical region),
reject H0; otherwise, fail to
reject H0
Critical region is z > 1.96 or
z < -1.96 for α = 0.05
The example z is 2.33. It is
in the critical region.
Therefore, reject H0
It is likely the case that the
treatment had an effect
[Figure: distribution of sample means (90 to 110) with the critical regions beyond z = –1.96 and z = 1.96, the extreme, low-probability values if H0 is true.]
11
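The Step 4 decision rule can be written as a small function. This sketch (the name `z_test_decision` is hypothetical) also reports a two-tailed p-value, the quantity behind the “p < .05” wording on slide 24; the p-value is an illustrative addition, not part of the Step 4 procedure as stated.

```python
from statistics import NormalDist


def z_test_decision(z, alpha=0.05):
    """Two-tailed z test: reject H0 when z falls in the critical region."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 when alpha = .05
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # chance of a z this extreme if H0 is true
    decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
    return decision, p_value


decision, p_value = z_test_decision(7 / 3)  # z = 2.33 from the example
print(decision, f"(p = {p_value:.3f})")
```

For the example this reports “reject H0” with p ≈ .02, consistent with z = 2.33 lying beyond 1.96.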
Reject H0 or Fail to Reject H0
The only decisions you ever make in
hypothesis testing are
Reject H0, or
Fail to reject H0
No other decisions are possible
Never reject H1
Never accept H1
Never accept H0
12
Type I (α) Error
A type I (or α) error occurs when a
researcher rejects H0 when H0 is really true
Researcher concludes that the treatment had an
effect when it did not
When H0 is true, this happens with a probability equal to α
13
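One way to see that the Type I error rate matches α is to simulate many experiments in which the treatment truly has no effect and count how often H0 is rejected. A minimal sketch, assuming normally distributed IQ scores with µ = 100 and σ = 15; the repetition count and random seed are arbitrary choices, and `simulated_type_i_rate` is a hypothetical name.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulated_type_i_rate(mu=100, sigma=15, n=25, alpha=0.05, reps=20_000, seed=1):
    """Share of simulated samples that reject a true H0 in a two-tailed z test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(reps):
        # H0 is true: every sample comes from the untreated population.
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs((sample_mean - mu) / standard_error) > z_crit:
            rejections += 1  # each rejection here is a Type I error
    return rejections / reps


rate = simulated_type_i_rate()
print(f"simulated Type I error rate: {rate:.3f}")  # close to alpha = 0.05
```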
Type II (β) Errors
A type II (or β) error occurs when a
researcher fails to reject H0 when H0 is
really false
Researcher concludes that there is insufficient
evidence to suggest that the treatment had an
effect when in fact it does have an effect
When H0 is false, this happens with a probability equal to β
14
β
Unlike α, β is not directly set by the
researcher
β depends on the sample size (n)
β depends on how much the treatment affects
the dependent variable
β depends on the variability of the data
β depends on α
15
Type-I and Type-II Errors
Ideally, we would like to minimize both Type-I and Type-II errors
This is not possible for a given sample size
When we lower the α level to minimize the
probability of making a Type-I error, the β
level will rise
When we lower the β level to minimize the
probability of making a Type-II error, the α
level will rise
16
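The trade-off can be made concrete by computing β for several α levels. This sketch borrows the nutrition example from slide 29 (a 3-point effect, σ = 15, n = 25, one-tailed test) and assumes the z test's normal distribution of sample means; `beta_one_tailed` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def beta_one_tailed(effect, sigma, n, alpha):
    """P(Type II error) for an upper-tailed z test when the true effect is `effect` points."""
    standard_error = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # cutoff in the no-treatment distribution
    # Failing to reject means the sample mean lands below the cutoff; in the
    # treatment distribution that cutoff lies effect / standard_error units lower.
    return NormalDist().cdf(z_crit - effect / standard_error)


betas = {alpha: beta_one_tailed(effect=3, sigma=15, n=25, alpha=alpha)
         for alpha in (0.05, 0.01, 0.001)}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:<6} beta = {beta:.2f}")
```

With these inputs β rises from about .74 (α = .05) to about .91 (α = .01) and .98 (α = .001): fewer Type I errors are bought with more Type II errors when the sample size is fixed.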
Type-I and Type-II Errors
17
Factors that Influence a
Hypothesis Test
The size of the mean difference
The larger the mean difference is, the more likely
you are to reject H0
The variability of the scores
The more variable the scores are, the less likely
you are to reject H0
The number of scores in the sample
The larger the sample size, the more likely you are
to reject H0
18
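Each factor above acts through the z statistic, z = (X̄ – µ) / (σ / √n). The sketch below starts from the IQ example (a 7-point mean difference, σ = 15, n = 25) and changes one factor at a time; the altered values (10 points, σ = 30, n = 100) are hypothetical, chosen only for illustration.

```python
from math import sqrt


def z_statistic(mean_difference, sigma, n):
    """z = (sample mean - mu) / (sigma / sqrt(n))."""
    return mean_difference / (sigma / sqrt(n))


scenarios = {
    "IQ example": z_statistic(7, 15, 25),
    "larger mean difference": z_statistic(10, 15, 25),  # hypothetical
    "more variable scores": z_statistic(7, 30, 25),     # hypothetical
    "larger sample": z_statistic(7, 15, 100),           # hypothetical
}
for label, z in scenarios.items():
    decision = "reject H0" if abs(z) > 1.96 else "fail to reject H0"  # two-tailed, alpha = .05
    print(f"{label:<24} z = {z:5.2f}  {decision}")
```

Doubling σ halves z and pushes the same 7-point difference out of the critical region, while quadrupling n doubles z.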
Assumptions of the z-Score
Hypothesis Test
Random sampling
If the sample is not selected randomly from the
population, it probably will not represent the
population
Independent observations
σ does not change as a result of the
treatment
Distribution of sample means is normal
19
Directional vs Non-Directional
Hypotheses
The hypotheses we have been talking about are
called non-directional hypotheses because they
do not specify how the population mean should
differ from the constant
That is, they do not say whether the population mean should be larger or smaller than the constant
They only state that the population mean should differ from the constant
Tests of non-directional hypotheses are also called two-tailed tests
20
Directional vs Non-Directional
Hypotheses
Directional hypotheses include an ordinal
relation between the population mean and the
constant
That is, they state that the population mean should be larger (or, in the other direction, smaller) than the constant
For directional hypotheses, H0 and H1 are written as:
H0: µ_treatment ≤ constant
H1: µ_treatment > constant
Tests of directional hypotheses are also called one-tailed tests
21
One-Tailed Tests
When performing a one-tailed test, all of the critical region is in one tail of the distribution of sample means
Do not divide α by two when finding the z-score for the critical region
This increases statistical power (the probability of correctly rejecting a false H0) when the effect is in the predicted direction
22
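The difference between the two critical values comes from where α is placed. A minimal sketch using Python's `statistics.NormalDist`; the exact one-tailed value is 1.645, which the slides round to 1.65, and the observed z = 1.80 is a hypothetical result in the predicted direction.

```python
from statistics import NormalDist

alpha = 0.05
z_two_tailed = NormalDist().inv_cdf(1 - alpha / 2)  # alpha split between two tails
z_one_tailed = NormalDist().inv_cdf(1 - alpha)      # all of alpha in the upper tail
print(f"two-tailed: ±{z_two_tailed:.2f}   one-tailed: {z_one_tailed:.3f}")
# prints: two-tailed: ±1.96   one-tailed: 1.645

z_observed = 1.80  # hypothetical sample result in the predicted direction
print("one-tailed rejects H0:", z_observed > z_one_tailed)       # True
print("two-tailed rejects H0:", abs(z_observed) > z_two_tailed)  # False
```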
1 Tailed vs. 2 Tailed
[Figure: standard normal curves from z = –3 to 3. One-tailed: α = .05, z = 1.65, critical region in one tail. Two-tailed: α = .05, z = –1.96 and z = 1.96, critical region in two tails.]
23
Concerns about Hypothesis
Testing
Hypothesis testing focuses on the data, and not
the hypothesis
When we reject H0, we should really say, “This specific sample mean is very unlikely (p < .05) if the null hypothesis is true”
Statistical significance ≠ practical significance
An effect can be small but still statistically significant if the sample size is sufficiently large
24
Effect Size
A measure of effect size is intended to
provide a measurement of the absolute
magnitude of a treatment effect,
independent of the size of the sample(s)
being used
Cohen’s d is a measure of effect size: d = mean difference / standard deviation = (µ_treatment – µ_no treatment) / σ
25
Effect Size
What is the effect size for the example on slide 5?
Magnitude of d    Evaluation of Effect Size
d = 0.2           Small effect
d = 0.5           Medium effect
d = 0.8           Large effect
This is a small effect
26
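A sketch of the effect-size calculation. It assumes the question refers to the IQ example from Steps 3 and 4 (X̄ = 107, µ = 100, σ = 15), which gives d = 7 / 15 ≈ 0.47. The hypothetical `size_label` helper treats the table's values as lower cutoffs; that binning is an assumption, since Cohen offered these values as rough benchmarks, and 0.47 sits just under the medium benchmark.

```python
def cohens_d(mean_difference, sigma):
    """Cohen's d: mean difference divided by the standard deviation."""
    return mean_difference / sigma


def size_label(d):
    """Label |d| by treating 0.2 / 0.5 / 0.8 as lower cutoffs (an assumption; see text)."""
    magnitude = abs(d)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "smaller than small"


d = cohens_d(mean_difference=107 - 100, sigma=15)
print(f"d = {d:.2f} ({size_label(d)} effect)")  # d = 0.47 (small effect)
```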
Statistical Power
Statistical power is the probability that a
statistical test will correctly reject a false H0
Probability that a statistical test will identify a
treatment effect if one really exists
Power = 1 – β = 1 – probability of a Type II
error
27
Statistical Power
Calculate before performing the study
Need to know / estimate
How much the treatment changes the dependent
variable
Sample size
α
σ, µ
28
Statistical Power Example
How much the treatment changes the
dependent variable
Researchers hypothesize that having proper
nutrition during the first two years will increase
IQ by 3 points (notice – 1 tailed)
µ = 100, σ = 15
Sample size
n = 25
α = .05
29
Distribution of Sample Means
If the treatment has no effect, by the central limit
theorem, the distribution of sample means will have:
a mean = population mean = 100
a standard deviation = σ/√n = 15 / √25 = 3
If the treatment has the hypothesized effect, the
distribution of sample means will have
a mean = population mean + effect of treatment = 100 +
3 = 103
a standard deviation = σ/√n = 15 / √25 = 3
Adding a constant to all scores does not change the standard deviation
30
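These properties can be checked by simulation. The sketch assumes a normally distributed population (µ = 100, σ = 15), draws 10,000 samples of n = 25, and models the treatment as adding exactly 3 points to every score, as the slide's reasoning does; the seed, repetition count, and the name `simulate_sample_means` are arbitrary, hypothetical choices.

```python
import random
from statistics import fmean, pstdev


def simulate_sample_means(mu, sigma, n, effect=0.0, reps=10_000, seed=2):
    """Means of `reps` samples of size n; `effect` is added to every score."""
    rng = random.Random(seed)
    return [fmean(rng.gauss(mu, sigma) + effect for _ in range(n)) for _ in range(reps)]


no_treatment = simulate_sample_means(100, 15, 25)
treatment = simulate_sample_means(100, 15, 25, effect=3)
for label, means in (("no treatment", no_treatment), ("treatment", treatment)):
    print(f"{label}: mean of sample means = {fmean(means):.1f}, SD = {pstdev(means):.2f}")
```

Both runs use the same seed, so each treatment mean is the matching no-treatment mean shifted by 3: the center moves from about 100 to about 103 while the SD stays near 15 / √25 = 3.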
z Score of Critical Region
This is a one-tailed test with α = .05
Consult a table to find the z with an area
above equal to .05
z = 1.65
31
Statistical Power Example
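[Figure: no-treatment (centered at 100) and treatment (centered at 103) distributions of sample means on an axis from 91 to 115, with the critical value z = 1.65 marked on a z-axis showing 0, 1, and 2.]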
32
Statistical Power Example
Power equals the area to the right of the critical-region z-score under the treatment distribution of sample means
Areas to the right of the critical-region z-score correspond to rejecting H0
Areas under the treatment distribution of sample means correspond to a false H0
Together, these correspond to rejecting a false H0, which is power
33
Statistical Power Example
Find the z-score in the treatment distribution of sample means that is at the same location as the z-score for the critical region in the no-treatment distribution of sample means
z_treatment = z_critical – z_mean, where z_mean is the z-score of the treatment mean (103) in the no-treatment distribution
z_mean = (103 – 100) / 3 = 1
z_treatment = 1.65 – 1 = 0.65
Power = area above z = 0.65
Power = .26
Only about a 1 in 4 chance of detecting this effect (correctly rejecting H0)
34
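The power calculation can be reproduced in a few lines. A minimal sketch with the hypothetical helper `power_one_tailed`; it uses the exact one-tailed cutoff 1.645 rather than the table's 1.65, which moves the answer only in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def power_one_tailed(effect, sigma, n, alpha=0.05):
    """Power of an upper-tailed z test: the area of the treatment distribution
    of sample means that lies beyond the critical cutoff."""
    standard_error = sigma / sqrt(n)              # 15 / sqrt(25) = 3
    z_critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645
    # Same cutoff, measured in standard errors from the treatment mean.
    z_treatment = z_critical - effect / standard_error
    return 1 - NormalDist().cdf(z_treatment)


power = power_one_tailed(effect=3, sigma=15, n=25)
print(f"power = {power:.2f}, beta = {1 - power:.2f}")  # power = 0.26, beta = 0.74
```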
Factors that Influence Power
Sample size
As sample size increases, power increases
α level
As α decreases (fewer Type I errors), β increases
(more Type II errors), and 1 – β (power) decreases
Number of tails (directional vs non-directional)
One-tailed tests have more statistical power than two-tailed tests when the effect is in the predicted direction. Can you explain why?
35
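The sketch below compares power across sample sizes and between one- and two-tailed tests for the nutrition example (3-point effect, σ = 15, α = .05). The larger sample sizes are hypothetical, and `power_z_test` is a hypothetical helper; for the two-tailed case it counts rejections in both tails. Because the one-tailed cutoff (1.645) sits closer to the treatment distribution than 1.96 does, one-tailed power comes out higher whenever the effect is in the predicted direction.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def power_z_test(effect, sigma, n, alpha=0.05, tails=1):
    """Power of a z test when the true treatment effect is `effect` (> 0) points."""
    shift = effect / (sigma / sqrt(n))  # treatment mean in standard-error units
    if tails == 1:
        z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha)
        return 1 - STANDARD_NORMAL.cdf(z_crit - shift)
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    # A sample mean in either tail of the critical region rejects H0.
    return (1 - STANDARD_NORMAL.cdf(z_crit - shift)) + STANDARD_NORMAL.cdf(-z_crit - shift)


results = {n: (power_z_test(3, 15, n, tails=1), power_z_test(3, 15, n, tails=2))
           for n in (25, 100, 400)}
for n, (one_tailed, two_tailed) in results.items():
    print(f"n = {n:3d}: one-tailed power = {one_tailed:.2f}, two-tailed power = {two_tailed:.2f}")
```

Power rises with n in both columns, and the one-tailed column is higher in every row.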