Hypothesis Testing - personal.kent.edu
Hypothesis Testing

Research hypotheses are formulated in terms of
the outcome that the experimenter wants, and
an alternative outcome that he or she doesn’t want

I.e. If we’re comparing scores on an exam between two
groups, one with test anxiety and one without, our
hypotheses are:
(1) That the group with test anxiety will score lower
(expected outcome)
(2) That the two groups will score the same (unexpected
outcome)

Hypothesis Testing

The hypothesis that describes the outcome that
we’re expecting/hoping for is the Research
Hypothesis (H1)

The hypothesis that runs counter to our
expectations is the Null Hypothesis (H0)
Hypothesis Testing

We can use the sampling distribution of the mean to determine
the probability that we would obtain the mean of our sample by
chance

I.e. in the same way we can convert a score to a z-score and determine the
probability of obtaining values higher or lower than it
Normal Distribution
[Figure: normal distribution with the cutoff marked at z = +1.645; x-axis: z from -4.00 to +3.50; y-axis: frequency from 0 to 1200]
Hypothesis Testing

If the probability is low (i.e. only a 5% chance
or less), we can assume that chance sampling
error did not produce our results, and that our IV did

I.e. In our comparison of people with test anxiety,
our test-anxious group may also be quite dumb,
resulting in their poor test scores. However, if their
scores are extreme (i.e. low) enough, we can discount
even that possibility
Hypothesis Testing

Why bother with H0 at all?

Technically, we can never prove a particular hypothesis to be
true

You cannot prove the statement “All ducks are black”, because you
would need observations on all ducks that were, are, and ever
will be (i.e. on all ducks)
You can disprove a hypothesis – “All ducks are black” can be easily
proven false by seeing one white (non-black) duck

This is why, technically, we are supposed to talk about
“rejecting H0” rather than “accepting H1”, and about
“failing to reject H0”, never about “proving H0”
Hypothesis Testing

Beginning with the assumption that H0 is true,
and trying to disprove it, also maintains the
scientific spirit of objectivity and skepticism

Objectivity – illustrates that we value the results of
the data more than the hypothesis that, if supported,
would make us happiest (H1)

Skepticism – shows that we are not convinced of
even our own hypothesis until it is confirmed by the data

Hypothesis Testing

In our example of people with (x1) and without
(x2) test anxiety, where our hypothesis is that
people with anxiety will have lower IQ scores:

H0: x1 ≥ x2
H1: x1 < x2

Hypothesis Testing

If, instead, we were testing if the group with
anxiety was different from the average student
population (Hint: Look at the italics), how would
we phrase Ho and H1?

What if we were testing whether or not the two
groups (x1 & x2) were equal?
Hypothesis Testing

How do we know when our sample is rare enough to
reject H0?

Statistical convention says that when the probability of obtaining a
mean as extreme as the one you’ve obtained is only 5% or less,
we can say the result is not due to chance
AKA the probability of rejecting H0 when it is “true” (i.e.
screwing up) = the significance/rejection level, or alpha (α)
HOWEVER, THIS DOES NOT MEAN THAT 5.1% IS
MEANINGLESS!
p < .05
Hypothesis Testing

For our group with test anxiety, if their mean score on
an IQ test was 70, we first convert this into a z-score (μ
= 100, σ = 15)

z = (70 – 100)/15 = -2
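As a quick check of the arithmetic above, here is a minimal Python sketch (Python and the standard-library `statistics.NormalDist` helper are my additions, not part of the slides):

```python
from statistics import NormalDist

# Worked example from the slide: IQ is distributed with mu = 100, sigma = 15,
# and the test-anxious group's mean score is 70.
mu, sigma = 100, 15
group_mean = 70

z = (group_mean - mu) / sigma      # (70 - 100) / 15 = -2.0
p = NormalDist().cdf(z)            # P(Z <= -2), the lower-tail probability

print(f"z = {z}, p = {p:.4f}")     # z = -2.0, p = 0.0228
```

The lower-tail probability here is the same .0228 reported on the next slide.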
Hypothesis Testing


The probability of obtaining a score at or below
z = -2 is .0228, or 2.3%
Since this is below the 5% convention, we would
reject H0 (or “accept” H1)

This score is unlikely in the population of average
individuals

I.e. Our sample of anxious individuals came from a
different population than nonanxious individuals

Hypothesis Testing

α is the p(“accepting” H1 when it is
false/rejecting H0 when it is true), or of making
a mistake called a Type I Error

p(“accepting” H1 when it is false) ≠ p(“accepting”
H1) – the former refers to a type of error, the latter
simply to an outcome

What about the p(“accepting” H0 when it is
false/rejecting H1 when it is true)?

This is called a Type II Error, or β (Beta)
Hypothesis Testing

Why not make α as small as possible?


Because as α [p(Type I Error)] decreases, β [p(Type II Error)] increases
[Figure: overlapping distributions under H0 and H1; Red = α, Blue = β]
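The trade-off can also be shown numerically. This is an illustrative sketch only (Python and the assumed shift of the test statistic to z = 2 under H1 are mine, not the slides'): for a one-tailed z-test, shrinking α pushes the critical value further out, which raises β.

```python
from statistics import NormalDist

nd = NormalDist()
true_mean = 2.0  # assumed location of the test statistic when H1 is true (illustrative)

betas = []
for alpha in (0.10, 0.05, 0.01):
    z_crit = nd.inv_cdf(1 - alpha)       # one-tailed critical value for this alpha
    beta = nd.cdf(z_crit - true_mean)    # P(statistic falls below the cutoff | H1 true)
    betas.append(beta)
    print(f"alpha = {alpha:.2f}  critical z = {z_crit:.3f}  beta = {beta:.3f}")

# As alpha shrinks down the list, beta grows.
```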
Hypothesis Testing

It seems like we care more about Type I Error
than Type II Error. Why?

Scientists are more likely to commit a Type I Error
because they are more motivated to prove their
hypothesis (H1)

I.e. In law, establishing motive is important to proving
guilt; without a motive, there’s little reason to expect
that a crime will occur, let alone to stringently guard
against it

Hypothesis Testing

So long as we’re only willing to take a 5% chance of incorrectly
rejecting H0, it doesn’t matter how we distribute this
5%, as long as it doesn’t exceed 5%

We can place all 5% in one “tail” of the distribution if we
only expect a difference in means in one direction = One-Tailed/Directional Test
We can place half of the 5% (2.5%) in either “tail” if we have no
a priori (beforehand) hypothesis about the direction of our mean
difference = Two-Tailed/Non-Directional Test
The decision of which type of test to use should be made a
priori based on theory, not driven by the data
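The cutoffs implied by the two choices can be computed directly. A short Python sketch (the language and `statistics.NormalDist` are my additions, not part of the slides):

```python
from statistics import NormalDist

nd = NormalDist()
alpha = 0.05

# One-tailed: the entire 5% sits in one tail of the distribution
z_one_tailed = nd.inv_cdf(1 - alpha)       # ~1.645, the cutoff shown in the earlier figure
# Two-tailed: 2.5% sits in each tail
z_two_tailed = nd.inv_cdf(1 - alpha / 2)   # ~1.960

print(round(z_one_tailed, 3), round(z_two_tailed, 3))   # 1.645 1.96
```

Note that the two-tailed cutoff is further from 0: with the same total α, each individual tail gets a stricter criterion.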
Hypothesis Testing
[Figure: rejection region for a One-Tailed Test]
[Figure: rejection regions for a Two-Tailed Test]
Hypothesis Testing

H0 and H1 with One- and Two-Tailed Tests:

For One-Tailed Tests:

If our hypothesis is that group x is lower than group y

H0: x ≥ y
H1: x < y

For Two-Tailed Tests:

If our hypothesis is that group x is either greater than or
less than group y

H0: x = y
H1: x ≠ y
Hypothesis Testing

Psychologists can be sneaky bastards and
covertly increase α by testing one hypothesis
many times, by:

Evaluating one hypothesis with many different
statistical tests

Using more than one measure to operationalize one
DV

i.e. Measuring depression with both the Beck Depression
Inventory-II (BDI-II) and the Minnesota Multiphasic
Personality Inventory-2 (MMPI-2) = testing depression
twice = doubling your α
Hypothesis Testing

What should you do to prevent this from happening?

If you’re testing one hypothesis many different ways or with
many measures, adjust α accordingly with the Bonferroni
Correction

Note: NOT the same as the Beeferoni™ Correction, which
prevents incorrect preparation of Chef Boyardee™
products

Testing w/ 2 tests: Use α = .05/2 = .025
Testing w/ 3 measures of one construct: Use α = .05/3 = .0167
Testing w/ 2 tests and 3 measures: Use α = .05/6 = .008
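The division rule above is simple enough to capture in a one-line helper. A Python sketch (the function name `bonferroni_alpha` is mine, not a standard API):

```python
def bonferroni_alpha(alpha=0.05, n_tests=1, n_measures=1):
    """Divide the overall alpha across every test-measure combination."""
    return alpha / (n_tests * n_measures)

print(bonferroni_alpha(n_tests=2))                           # 0.025
print(round(bonferroni_alpha(n_measures=3), 4))              # 0.0167
print(round(bonferroni_alpha(n_tests=2, n_measures=3), 3))   # 0.008
```

These reproduce the three worked cases on the slide.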
Hypothesis Testing

Example:

Your hypothesis is that males and females will differ
in degree of instrumental aggression (IA =
aggression designed to obtain an end). IA is
measured with the Instrumental Aggression Scale
(IAS) and the Positive and Negative Affect Scale
(PANAS), and the groups are evaluated with both
ANOVA and SEM

What is your corrected α-level?
Hypothesis Testing

Three of the Ten Commandments of Statistics:

1. P-values indicate the probability that your findings
occurred by chance (or the likelihood of obtaining them again
in a similar sample), NOT the strength of the relationship
between an IV and DV

I.e. NEVER SAY: “In my experiment evaluating the influence of
coffee (the IV) on people’s activity levels (the DV), I found highly
significant results at p = .000001, indicating that coffee produces a lot
of activity in people”
CORRECT: “The likelihood that the effect – that coffee boosted
activity levels – was due to sampling error (i.e. chance) was only
.000001”
Hypothesis Testing

Three of the Ten Commandments of Statistics:

2. p = .052, .055, etc. is not “insignificant”, and does
not mean that a relationship between your IV and DV
does not exist, just that it did not meet
“conventional” levels of significance.

3. When testing a hypothesis multiple ways, always
use some corrected level of α (i.e. the Bonferroni
Correction).