Introduction to Hypothesis Testing
The One-Sample z Test
• Conditions of Applicability:
– One group of subjects
– Comparing to population with known mean and variance.
• Note: this is not a common situation in Psychology!
PSYC 6130, PROF. J. ELDER
Example: Finish times for the 2005 Toronto Marathon
(Oct 16, 2005)
• Suppose your population of interest is the women who ran the
marathon (slightly artificial).
• You hypothesize that women in their early twenties (20-24) are
faster than the average woman who ran the marathon.
• Here the ‘treatment’ is ‘youth’.
Null Hypothesis Testing
• Largely due to English mathematician Sir R.A. Fisher (1890-1962)
• ‘Proof by contradiction’
• Suppose the null hypothesis is true
– In our example, the null hypothesis is that the finishing times for young
women are drawn from the same distribution as for the rest of the
female contestants.
– Knowing the mean and standard deviation of the population, we can
compute the sampling distribution of the mean for a sample of size n.
This is the null hypothesis distribution.
– The mean time for our sample of young women should be plausible
under this sampling distribution.
– If it is not plausible, it suggests that the null hypothesis is false.
– This lends credence to our alternate hypothesis (that young women are
faster).
How do we judge the plausibility of the null hypothesis?
• The sample mean should be plausible under the sampling
distribution of the mean.
[Figure: sampling distribution $p(\bar{X})$ of the sample mean, with example values of $\bar{X}$ labelled Implausible, Fairly plausible, and Highly plausible.]
Plausibility of the null hypothesis
• The plausibility of the null hypothesis is judged by computing the
probability p of observing a sample mean that is at least as deviant
from the population mean as the value we have observed.
[Figure: sampling distribution $p(\bar{X})$, with the tail probability $p$ marked beyond the observed sample mean $\bar{X}$.]
Plausibility of the null hypothesis
• This computation is simplified by converting to z-scores.
• Under the assumption of normality, we can determine this probability
from a standard normal table.
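$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
[Figure: standard normal density $p(z)$, centred at 0 with standard deviation 1, with the tail probability $p$ marked beyond the observed $z$.]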
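In place of a printed standard normal table, the tail probability can be computed directly. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The function names and the example numbers are illustrative assumptions, not part of the lecture.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def z_score(sample_mean, pop_mean, std_error):
    """Convert a sample mean to a z-score, given the standard error of the mean."""
    return (sample_mean - pop_mean) / std_error

def lower_tail_p(z):
    """Probability that a standard normal value falls at or below z."""
    return STANDARD_NORMAL.cdf(z)

# Illustrative values only: a sample mean one standard error below the population mean.
z = z_score(sample_mean=95.0, pop_mean=100.0, std_error=5.0)
print(f"z = {z:.2f}, lower-tail p = {lower_tail_p(z):.4f}")
```

The lower tail is used here because the marathon hypothesis predicts smaller (faster) times; for a prediction in the other direction the upper tail, `1 - lower_tail_p(z)`, would be the relevant area.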
Results for 2005 Toronto Marathon
$n = 420$
$\mu = 4\text{ hr }16\text{ min} = 256\text{ min}$
$\sigma = 33\text{ min}$
Results for Random Sample
of Women Under 25
$n = 38$
$\bar{X} = 4\text{ hr }9\text{ min} = 249\text{ min}$
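With these numbers the z-score can be worked out by hand. The step below is a sketch that assumes the standard error of the mean is $\sigma/\sqrt{n}$, using the population values $\mu = 256$ min and $\sigma = 33$ min and the sample values $n = 38$ and $\bar{X} = 249$ min.

```latex
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{33}{\sqrt{38}} \approx 5.35 \text{ min},
\qquad
z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{249 - 256}{5.35} \approx -1.31
```

The probability $p$ is then the area of the standard normal distribution below $z \approx -1.31$.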
Statistical Decisions
• We now know the probability of observing a sample mean at least as
extreme as ours if the sample had been drawn from the general female
contestant population, i.e. if our ‘treatment of youth’ had no effect.
• This probability is pretty small. Should we reject the null
hypothesis? This is the process of turning a continuous probability
(a real number) into a binary decision (yes or no).
• If we reject the null hypothesis, there is a chance we will be wrong.
We have to decide what chance we are willing to take, i.e. the
maximum p-value we will accept as grounds for rejecting the null
hypothesis.
• We call this probability threshold the alpha (α) level. A typical value
is .05.
• The α level must be decided prior to the experiment.
Type I and Type II Errors
• Type I Error: the null hypothesis is true and we reject it.
• Type II Error: the null hypothesis is false and we fail to
reject it.
                              Actual Situation
Researcher’s Decision         Null Hypothesis is True             Null Hypothesis is False
Accept the Null Hypothesis    p(accept H0 | H0 true) = 1 − α      p(accept H0 | H0 false) = β
Reject the Null Hypothesis    p(reject H0 | H0 true) = α          p(reject H0 | H0 false) = 1 − β (power)
Type I and Type II Errors
• Which is more serious?
– Type I can be bad, as rejecting the null hypothesis (e.g., ‘This
stuff really works’), may cause actions to be taken that have no
value.
– Type II may not be so bad, if it is understood that the treatment
may still have an effect (we fail to reject the null hypothesis, but
we do not reject the alternate hypothesis).
– But Type II may be bad if it leads to inaction when action would
have produced good results (e.g., a cure for cancer).
One-Tailed vs Two-Tailed Tests
• Our marathon hypothesis was one-tailed, because we
made a specific prediction about the direction of the
effect (young women are faster).
• Suppose we had simply hypothesized that young women
are different.
Two-Tailed Test
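$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
[Figure: standard normal density $p(z)$, centred at 0 with standard deviation 1, with the probability $p$ split between the two tails beyond $-z$ and $+z$.]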
One-Tailed vs Two-Tailed Tests
• Use a one-tailed test when you have a specific reason to
believe the effect will be in a particular direction, and you
do not care if the effect is in the opposite direction.
• Otherwise, use a two-tailed test.
• When the effect is in the predicted direction, a one-tailed test
yields a smaller p value than a two-tailed test (half as large for the z test),
and hence a greater chance of reaching significance for
your directional hypothesis.
• The decision of whether to perform one-tailed or two-tailed tests must be made prior to data collection.
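The relationship between the two kinds of p value can be sketched as follows, using `statistics.NormalDist` from the Python standard library. The function name is illustrative, and the one-tailed value assumes the predicted direction is the lower tail (as in the marathon example, where faster means a smaller time).

```python
from statistics import NormalDist

def p_values(z):
    """Return (one_tailed, two_tailed) p values for a z-score.

    The one-tailed value assumes the predicted effect lies in the lower tail.
    """
    std_normal = NormalDist()
    one_tailed = std_normal.cdf(z)
    # Two-tailed: area beyond |z| in both tails of the symmetric distribution.
    two_tailed = 2.0 * std_normal.cdf(-abs(z))
    return one_tailed, two_tailed

# Effect in the predicted direction (z from the marathon example).
one, two = p_values(-1.31)
print(f"z = -1.31: one-tailed p = {one:.3f}, two-tailed p = {two:.3f}")

# Same size of effect in the unpredicted direction.
wrong_one, wrong_two = p_values(1.31)
print(f"z = +1.31: one-tailed p = {wrong_one:.3f}, two-tailed p = {wrong_two:.3f}")
```

The second call illustrates the cost of a one-tailed test: an effect in the unpredicted direction produces a large one-tailed p value, however big the effect is.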
Basic Procedure for Statistical Inference
1. State the hypothesis
2. Select the statistical test and significance level
3. Select the sample and collect the data
4. Find the region of rejection
5. Calculate the test statistic
6. Make the statistical decision
Step 1. State the Hypothesis
Null hypothesis: marathon times for young women are the same
as for the general female contestant population.
Alternate hypothesis: young women are faster.
$H_0: \mu = \mu_0$
$H_A: \mu < \mu_0$
Step 2. Select the Statistical Test and the
Significance Level
• We are comparing a sample mean to a population with
known mean and standard deviation → z-test
• α = .05 is probably appropriate.
Step 3. Select the Sample and Collect the
Data
• Ideally, we would randomly assign the treatment to a
random sample of the population (Toronto Marathon
women). Is this possible?
• Instead, we randomly sample female contestants under
25.
Step 4. Find the Region of Rejection
• The z value defining the rejection region is called the
critical value for your test, and is a function of the
selected α-level. For this reason, we often denote the
critical value as zα
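[Figure: standard normal density $p(z)$, centred at 0 with standard deviation 1, with the rejection region of area α in the tail.]
$\alpha = .05 \Rightarrow z_\alpha = 1.65$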
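The critical value can also be obtained from the inverse of the standard normal cumulative distribution rather than from a table. This is a sketch using `NormalDist.inv_cdf` from the Python standard library; the function name and the `tail` argument are assumptions made for illustration.

```python
from statistics import NormalDist

def critical_z(alpha, tail="lower"):
    """Critical z value bounding a rejection region of total area alpha."""
    std_normal = NormalDist()
    if tail == "lower":   # reject when z < critical value
        return std_normal.inv_cdf(alpha)
    if tail == "upper":   # reject when z > critical value
        return std_normal.inv_cdf(1.0 - alpha)
    if tail == "two":     # reject when |z| > critical value
        return std_normal.inv_cdf(1.0 - alpha / 2.0)
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

for tail in ("lower", "upper", "two"):
    print(f"alpha = .05, {tail}-tailed: critical z = {critical_z(0.05, tail):.3f}")
```

For the marathon hypothesis (young women are faster, so their mean time is lower) the rejection region is the lower tail, and the critical value carries a negative sign.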
Step 5. Calculate the Test Statistic
$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
Step 6. Make the Statistical Decision
• p < α: Reject null hypothesis.
• p > α: Fail to reject null hypothesis.
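Steps 4 to 6 can be gathered into one small function. The following is a sketch, not the course's own code: the function name, the `ZTestResult` container, and the `tail` argument are hypothetical, and the example call uses the marathon values from the lecture.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist

@dataclass
class ZTestResult:
    z: float            # test statistic
    p: float            # p value for the chosen tail
    reject_null: bool   # True when p < alpha

def one_sample_z_test(sample_mean, n, pop_mean, pop_sd, alpha=0.05, tail="lower"):
    """One-sample z test against a population with known mean and standard deviation."""
    std_error = pop_sd / math.sqrt(n)           # standard error of the mean
    z = (sample_mean - pop_mean) / std_error    # Step 5: test statistic
    std_normal = NormalDist()
    if tail == "lower":
        p = std_normal.cdf(z)
    elif tail == "upper":
        p = 1.0 - std_normal.cdf(z)
    elif tail == "two":
        p = 2.0 * std_normal.cdf(-abs(z))
    else:
        raise ValueError("tail must be 'lower', 'upper', or 'two'")
    return ZTestResult(z=z, p=p, reject_null=p < alpha)  # Step 6: decision

# Marathon example: 38 women under 25, mean 249 min, vs. population 256 +/- 33 min.
result = one_sample_z_test(sample_mean=249.0, n=38, pop_mean=256.0, pop_sd=33.0)
decision = "reject" if result.reject_null else "fail to reject"
print(f"z = {result.z:.2f}, p = {result.p:.3f}: {decision} the null hypothesis")
```

Comparing p with α is equivalent to comparing z with the critical value for the same tail, so either form of the decision rule gives the same answer.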
Example: Height of Female Psychology Graduate Students
Canadian Adult Female Population:
$\mu = 162.10$ cm
$\sigma = 6.55$ cm
Sample: Female students enrolled in PSYC 6130C 2008-09
Assumptions Underlying One-Sample z Test
• Random sampling
• Variable is normal
– CLT: Deviations from normality ok as long as sample is large.
• Dispersion of sampled population is the same as for the
comparison population
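– e.g. suppose means are the same, but dispersion of sampled
population is greater than dispersion of comparison population.
Extreme sample means are then more likely than the null hypothesis
distribution predicts, so the test rejects more often than α even
though the means are equal.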
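The size of this effect can be estimated analytically. The sketch below assumes a normally distributed variable, equal means, and a one-tailed test; under those assumptions the computed z has standard deviation `true_sd / assumed_sd` instead of 1. The function name and the value of 50 min for the larger dispersion are hypothetical.

```python
from statistics import NormalDist

def actual_rejection_rate(alpha, assumed_sd, true_sd):
    """One-tailed rejection rate when means are equal but the sampled
    population's standard deviation (true_sd) differs from the one
    assumed by the z test (assumed_sd)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(alpha)  # lower-tail critical value
    # z is distributed N(0, (true_sd / assumed_sd)^2), so rescale the cutoff.
    return std_normal.cdf(z_crit * assumed_sd / true_sd)

matched = actual_rejection_rate(alpha=0.05, assumed_sd=33.0, true_sd=33.0)
wider = actual_rejection_rate(alpha=0.05, assumed_sd=33.0, true_sd=50.0)
print(f"equal dispersion:  rejection rate = {matched:.3f}")
print(f"larger dispersion: rejection rate = {wider:.3f}")
```

With equal dispersion the rate equals α; with the larger hypothetical dispersion it is well above α, even though the two populations have the same mean.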
Limitations of the One-Sample Test
• Strongly depends on random sampling.
• Better to have two groups of subjects: test (treatment)
group and control group.
• Problem of random sampling reduces to problem of
random assignment to two groups: much easier!
Reporting your results
• Express your result in evocative English, then include
the required numbers.
• Follow APA style.
• Example:
– Young female runners were not found to be significantly faster
than the general female contestant population, z = -1.31, p = .095,
one-tailed.
More on Type I and Type II Errors
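[Figure: experiments divided into those where H0 is true and those where H0 is false; a proportion α of the former and a proportion 1 − β of the latter yield significant results, which together make up the total number of significant results.]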
• Consistent use of a fixed alpha level determines the proportion of null
experiments that generate significant results.
• We don’t have enough information to know how many reported results are
errors, because:
– We don’t know the relative proportion of cases where H0 is true and where H0 is false.
– We don’t know the power of effective experiments.
– Typically only significant results are reported (publication bias).
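To see why these unknowns matter, the proportion of significant results that are Type I errors can be computed for assumed values of each quantity. This is a sketch with hypothetical inputs: the function name, the power of .80, and the two base rates are illustrative, and publication bias is ignored.

```python
def proportion_of_significant_results_that_are_errors(alpha, power, prop_null_true):
    """Share of all significant results that come from experiments where H0 is true."""
    false_positives = alpha * prop_null_true          # H0 true, rejected anyway
    true_positives = power * (1.0 - prop_null_true)   # H0 false, correctly rejected
    return false_positives / (false_positives + true_positives)

# Hypothetical scenarios: same alpha and power, different base rates of true nulls.
for prop_null_true in (0.5, 0.9):
    share = proportion_of_significant_results_that_are_errors(
        alpha=0.05, power=0.80, prop_null_true=prop_null_true)
    print(f"{prop_null_true:.0%} of nulls true: {share:.1%} of significant results are errors")
```

The same α gives very different error shares as the base rate changes, which is why α alone does not tell us how many reported results are wrong.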