Introduction to Hypothesis Testing
Chapter 8

Applying what we know: inferential statistics
z-scores + probability + distribution of sample means → HYPOTHESIS TESTING!
Some Familiar Concepts…

Sampling error: There is always some difference between samples and populations, even when the sample is untreated (control): M ≠ μ just by chance.

So… how can we tell if a difference we observe is due to:
– chance (random sampling error or fluctuation), or
– a treatment effect or true group differences (differences do exist in the population)?
…and Some New Concepts

• H1: Alternative hypothesis
– What we believe to be true
– There is a change, difference, or relationship
But it's easier to disprove than to prove, so…

• H0: Null hypothesis
– No change, no difference, no relationship
– Try to prove this is wrong!
– Disproving H0 provides support for (but does not prove) H1.

Decide ahead of time which sample statistics
(means) are:
– likely to be obtained if H0 is true
– likely to be obtained if H0 not true (critical region!)
THE HYPOTHESIZED (NULL) DISTRIBUTION

[Figure 8-3 (p. 236): The set of potential samples is divided into those that are likely to be obtained and those that are very unlikely if the null hypothesis is true.]
Sampling Distribution Z-scores in a new light

z = (M – μ) / σM

z = (obtained M – hypothesized μ) / (standard error between M and μ)

Ratio of: obtained difference (distance) / typical, expected, standard distance

How far away from typical, or expected, is our sample?
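As a quick illustration, here is a minimal sketch of this calculation (the population values, sample size, and sample mean are made-up numbers, not from the slides):

```python
import math

# Hypothetical example values (not from the slides)
mu = 100        # hypothesized population mean (under H0)
sigma = 15      # population standard deviation
n = 25          # sample size
M = 106         # obtained sample mean

sigma_M = sigma / math.sqrt(n)   # standard error of the mean
z = (M - mu) / sigma_M           # obtained difference / standard (expected) distance

print(f"standard error = {sigma_M:.2f}, z = {z:.2f}")   # z = 2.00 here
```

With these assumed numbers, z = 2.00, which would fall just inside the critical region for a two-tailed test at alpha = .05 (|z| > 1.96).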
Hypotheses

A hypothesis states an expected relationship
between two or more variables.

May be causal: one variable causes the other (A → B).

May be descriptive: one variable is simply related to the other (A is related to B).

Much of this chapter focuses on causal hypotheses, from experimental studies (treatment group and control group).
Where do Hypotheses come from?

• Personal observations, opinions
• Existing research
• Theory
• Models
– more specific and concrete than theories
– usually describe specific relationships among constructs/variables
Scientific Hypotheses Must Be

• Testable: Can a test be designed?
• Falsifiable: Could it potentially be incorrect? Is there room to be disproven?
• Precise: Is it clearly defined?
• Rational: Does it fit with existing facts?
• Parsimonious: Is it as simple as possible?
Hypotheses cannot be proven!

• A single experiment cannot PROVE a hypothesis.
• Hypotheses are only supported or not supported by scientific data.
• We add evidence toward confirmation or disconfirmation of a hypothesis.
A Hypothesis Test vs. A Jury Trial

The null hypothesis: We assume there is no treatment (tx) effect until there is enough evidence to show otherwise.
  Jury trial: Assume an individual is innocent until proven guilty.

The alpha level: We are confident that the tx does have an effect because it is very unlikely that the data could occur simply by chance.
  Jury trial: The jury must be convinced beyond a reasonable doubt before finding the defendant guilty.

The sample data: The research study is conducted to gather data (evidence) to demonstrate that the treatment had an effect.
  Jury trial: The prosecutor presents evidence to demonstrate that the defendant is guilty.

The critical region: Either the sample data fall in the critical region (enough evidence to reject H0) or they don't (not enough evidence to reject H0).
  Jury trial: Either there is enough evidence to convince the jury that the defendant is guilty, or there is not.

The conclusion: If the data aren't in the critical region, the decision is to "fail to reject the null hypothesis." We have not proven that the null is true; we simply have failed to reject it.
  Jury trial: If there is not enough evidence, the decision is "not guilty."
Directional vs. Nondirectional Tests
(one-tailed vs. two-tailed)

Nondirectional hypothesis/test (two-tailed)
– Critical region is split between both tails: on either side of the mean
– Allows the possibility that the tx effect is in either direction
– More common, more conservative test

Directional hypothesis/test (one-tailed)
– H1 specifies the direction of the effect / difference
– Critical region is only in one tail (either above or below the mean)
– Less conservative
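To make the one-tailed / two-tailed distinction concrete, here is a minimal sketch that finds the z critical values for each kind of test (alpha = .05 is an assumption chosen for the example):

```python
from scipy.stats import norm

alpha = 0.05  # assumed significance level for this example

# Two-tailed (nondirectional): alpha is split between both tails
z_two_tailed = norm.ppf(1 - alpha / 2)   # about 1.96; critical region is |z| > 1.96

# One-tailed (directional): the entire alpha goes in one tail
z_one_tailed = norm.ppf(1 - alpha)       # about 1.645; critical region is z > 1.645 (or z < -1.645)

print(f"two-tailed cutoff: +/-{z_two_tailed:.3f}")
print(f"one-tailed cutoff:   {z_one_tailed:.3f}")
```

The one-tailed cutoff is closer to the mean, which is why a directional test is the less conservative choice.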
Error

Type I: Ho is true (the treatment does not have an effect), but:
– the hypothesis test detects a false treatment effect
– we reject Ho even though it's true
– we think we have support for H1 even though it's not true

Type II: Ho is false (the treatment does have an effect), but:
– the hypothesis test fails to detect it
– we retain Ho even though it's false
Type I and Type II Error
(Decision × Actual Situation: No Effect / Ho true vs. Effect Exists / Ho false)

Reject Ho (decide an effect does exist):
– If Ho is true (no effect): Type I Error, a false positive (probability = α). The test detects a nonexistent effect; the test is too sensitive.
– If Ho is false (an effect exists): True positive (correct!). The ability to detect the effect = POWER, p(reject false Ho) = 1 − β; good sensitivity to detect the effect.

Retain Ho (decide no effect exists):
– If Ho is true (no effect): True negative (correct!); good specificity (selectivity) to catch a non-effect.
– If Ho is false (an effect exists): Type II Error, a false negative (probability = β). The test fails to detect a true effect; the test is too specific.
Power

Probability that a test will correctly:
– reject a false null hypothesis
– detect a real treatment effect

In other words: the sensitivity of a statistical test to detect an effect that does exist.
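As an illustration of what power means in practice, here is a minimal simulation sketch (the population values, effect size, and alpha cutoff are assumptions made up for the example): it draws many samples from a population where the treatment effect is real and counts how often a two-tailed z-test rejects Ho.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example values (not from the slides)
mu0, sigma, n = 100, 15, 25        # H0 population mean, population SD, sample size
true_mu = 110                      # the treated population really has a 10-point effect
z_cutoff = 1.96                    # two-tailed critical value for alpha = .05
n_experiments = 10_000

sigma_M = sigma / np.sqrt(n)       # standard error of the mean

# Simulate many studies in which Ho is actually false
sample_means = rng.normal(true_mu, sigma_M, size=n_experiments)
z = (sample_means - mu0) / sigma_M
power = np.mean(np.abs(z) > z_cutoff)   # proportion of studies that correctly reject Ho

print(f"estimated power is about {power:.2f}")
```

The proportion of rejections estimates p(reject false Ho) = 1 − β, the power of the test under these assumed conditions.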
Group Activity!

Make a graphical representation of these concepts:
– Type I error (false positive)
– Type II error (false negative)
– True positive / negative
– Alpha, Beta, Power
– Sensitivity, specificity

Some ideas:
– Draw a concept map, decision tree, or flow chart
– Sketch all possibilities using the null distribution and the alternative distribution (see pp. 266-268)
– Use sample data / a sample hypothesis (Ho and H1)
– Use an analogy (like the trial-by-jury analogy)
Beyond p and chance: Effect Sizes

Limitations of hypothesis tests:
– give the ratio of the obtained to the expected difference
– evaluate the relative size of the obtained difference (or tx effect)
– strongly influenced by sample size (big enough n → small σM → easy to reject Ho!)

Effect sizes:
– give the absolute size of the obtained difference (or tx effect)
– scaled with the standard deviation, not the standard error
– thus, not influenced by sample size

Cohen's d
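A minimal sketch of Cohen's d for a one-sample comparison (the numbers are the same made-up values used in the z-score example above): note that the divisor is the population standard deviation, not the standard error, so d does not change with n.

```python
# Hypothetical example values (not from the slides)
mu = 100      # untreated population mean
sigma = 15    # population standard deviation
M = 106       # treated sample mean

d = (M - mu) / sigma            # Cohen's d: effect size in standard-deviation units
print(f"Cohen's d = {d:.2f}")   # 0.40, a small-to-medium effect by Cohen's guidelines
```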
Figure 8-15 (p. 268): The relationship between sample size and power. The top panel (a) shows a null distribution and a 20-point treatment distribution based on samples of n = 16 and a standard error of 10 points. Notice that the right-hand critical boundary is located in the middle of the treatment distribution, so that roughly 50% of the treated samples fall in the critical region. In the bottom panel (b), the distributions are based on samples of n = 100 and the standard error is reduced to 4 points. In this case, essentially all of the treated samples fall in the critical region and the hypothesis test has power of nearly 100%.
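The figure's numbers can be checked analytically. A minimal sketch (assuming a two-tailed z-test at alpha = .05, and taking the 20-point effect and the stated standard errors as given):

```python
from scipy.stats import norm

effect = 20                        # 20-point treatment effect from Figure 8-15
alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)   # about 1.96

for n, sigma_M in [(16, 10), (100, 4)]:
    # Power: probability that a treated sample mean lands beyond the upper critical boundary
    # (the lower tail contributes a negligible amount, so it is ignored in this sketch)
    power = norm.sf(z_crit - effect / sigma_M)
    print(f"n = {n:3d}, standard error = {sigma_M:2d}: power is about {power:.2f}")

# n = 16  -> power of roughly 0.52 (about half the treated samples fall in the critical region)
# n = 100 -> power of roughly 1.00 (essentially all treated samples fall in the critical region)
```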
For Wednesday

– Finish reading Chapter 9
– Finish HW Chapter 9 (turn in at the start of class)
