Three Views of
Hypothesis Testing
What are we doing?
• In science, we run lots of experiments
• In some cases, there's an "effect"; in others there's not
– Some manipulations have an impact; some don't
– Some drugs work; some don't
• Goal: Tell these situations apart, using the sample data
– If there is an effect, reject the null hypothesis
– If there’s no effect, retain the null hypothesis
• Challenge
– Sample is imperfect reflection of population, so we will
make mistakes
– How to minimize those mistakes?
Analogy: Prosecution
• Think of cases where there is an effect as "guilty"
– No effect: "innocent" experiments
• Scientists are prosecutors
– We want to prove guilt
– Hope we can reject null hypothesis
• Can't be overzealous
– Minimize convicting innocent people
– Can only reject null if we have enough evidence
• How much evidence to require to convict someone?
• Tradeoff
– Low standard: Too many innocent people get convicted
(Type I Error)
– High standard: Too many guilty people get off
(Type II Error)
Binomial Test
• Example: Halloween candy
– Sack holds boxes of raisins and boxes of jellybeans
– Each kid blindly grabs 10 boxes
– Some kids cheat by peeking
• Want to make cheaters forfeit candy
– How many jellybeans before we call kid a cheater?
Binomial Test
• How many jellybeans before we call kid a cheater?
• Work out probabilities for honest kids
– Binomial distribution
– Don’t know distribution of cheaters
• Make a rule so only 5% of honest kids lose their candy
• For each individual above cutoff
– Don't know for sure they cheated
– Only know that if they were honest, chance would have been less than
5% that they'd have so many jellybeans
• Therefore, our rule catches as many cheaters as possible
while limiting Type I errors (false convictions) to 5%
[Figure: binomial distribution of jellybean counts for Honest kids, with the unknown Cheaters distribution sketched beyond the cutoff; x-axis: Jellybeans (0-10), y-axis: probability]
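A minimal sketch of this rule in R, assuming an honest kid's grabs are independent and each box is a jellybean with probability 0.5; the actual mix of the sack isn't given in the slides:

    # Honest kid's jellybean count: X ~ Binomial(10, 0.5)  (0.5 is an assumed mix)
    tail_prob <- 1 - pbinom(0:10, size = 10, prob = 0.5)   # tail_prob[k] = P(X >= k)
    cutoff    <- min(which(tail_prob <= 0.05))             # smallest count with P(X >= cutoff) <= .05
    cutoff                                                 # 9 jellybeans
    1 - pbinom(cutoff - 1, 10, 0.5)                        # actual Type I error rate, about 0.011

With this cutoff, at most 5% of honest kids forfeit their candy, and any kid with a count that unlikely under honesty is treated as a cheater.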
Testing the Mean
• Example: IQs at various universities
– Some schools have average IQs (mean = 100)
– Some schools are above average
– Want to decide based on n students at each school
• Need a cutoff for the sample mean
– Find distribution of sample means for Average schools
– Set cutoff so that only 5% of Average schools will be mistaken as Smart
[Figure: sampling distributions of the mean, p(M), for Average vs. Smart schools, with the cutoff at the 95th percentile of the Average distribution; standard error of M = σ/√n]
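A sketch of that cutoff in R, assuming the IQ standard deviation is known to be σ = 15 and each school contributes n = 25 students; neither number appears on the slide:

    mu0   <- 100              # mean IQ at an Average school
    sigma <- 15               # assumed known population SD of IQ
    n     <- 25               # assumed sample size per school

    se     <- sigma / sqrt(n)                   # SD of the sampling distribution of M
    cutoff <- qnorm(0.95, mean = mu0, sd = se)  # about 104.9
    cutoff

Only 5% of Average schools produce a sample mean above this cutoff, so labeling schools above it as Smart keeps the Type I error rate at 5%.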
Unknown Variance: t-test
• Want to know whether population mean equals μ0
– H0: μ = μ0
– H1: μ ≠ μ0
• Don’t know population variance
– Don’t know distribution of M under H0
– Don’t know Type I error rate for any cutoff
• Use sample variance to estimate standard error of M
– Divide M − μ0 by standard error to get t
– We know distribution of t exactly
[Figure: distribution of t when H0 is true, with the critical value marking off the critical region]
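A sketch of the t statistic in R on a hypothetical sample; the 10 degrees of freedom shown in the slide's figure correspond to n = 11, which is assumed here:

    x  <- c(108, 112, 95, 121, 104, 99, 117, 106, 110, 103, 115)  # hypothetical IQ scores, n = 11
    m0 <- 100                                                      # H0: mu = 100

    n      <- length(x)
    se     <- sd(x) / sqrt(n)        # sample SD estimates the unknown sigma
    t_stat <- (mean(x) - m0) / se    # divide M - mu0 by the estimated standard error
    t_stat

    # Under H0, t has a t distribution with n - 1 = 10 degrees of freedom,
    # so the Type I error rate of any cutoff is known exactly:
    qt(0.95, df = n - 1)             # cutoff giving a 5% one-tailed Type I error rate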
Two-tailed Tests
• Often we want to catch effects on either side
• Split Type I Errors into two critical regions
– Each must have probability α/2
[Figure: H0 distribution with two critical values; an area of α/2 beyond each, H1 on either side, and the test statistic marked]
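A sketch of the two-tailed rule in R, continuing with the assumed 10 degrees of freedom:

    alpha <- 0.05
    df    <- 10

    # Two critical values, each cutting off alpha/2 of the t distribution under H0
    t_crit <- qt(c(alpha / 2, 1 - alpha / 2), df = df)   # about -2.23 and +2.23
    t_crit

    # Reject H0 whenever the test statistic falls beyond either critical value
    t_stat <- 2.57                        # the value marked in the next slide's figure
    abs(t_stat) > qt(1 - alpha / 2, df)   # TRUE, so reject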
An Alternative View: p-values
• p-value
– Probability, if H0 is true, of a value equal to or more extreme than what you actually got
– Measure of how consistent data are with H0
• p > α
– t is within tcrit
– Retain null hypothesis
• p < α
– t is beyond tcrit
– Reject null hypothesis; accept alternative hypothesis
[Figure: t distribution under H0 with tcrit for α = .05 marked; the observed t = 2.57 falls beyond it, giving p = .03]
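The p-value in the figure can be reproduced in R, again assuming the 10 degrees of freedom used in the slide's plots:

    t_obs <- 2.57
    df    <- 10
    alpha <- 0.05

    # Two-tailed p-value: probability under H0 of a t at least this extreme
    p <- 2 * pt(-abs(t_obs), df = df)
    p                                    # about .03

    # Same decision either way: p < alpha exactly when |t| > the critical value
    p < alpha                            # TRUE
    abs(t_obs) > qt(1 - alpha / 2, df)   # TRUE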
Three Views of Inferential Statistics
• p-value & α
– p: measure of consistency with H0
– α: Type I error rate
• Test statistic & critical value
• Effect size & confidence interval
– Confidence interval: values of μ0 that don't lead to rejecting H0
• Can answer hypothesis test at any level
– p vs. α
– Test statistic vs. critical value
– Result predicted by H0 (μ0) vs. confidence interval
[Figure: three panels, one per view: p compared with α as tail areas, the test statistic compared with the critical value, and μ0 compared with the confidence interval around M]
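A sketch in R showing the three views reaching the same decision on the earlier hypothetical sample, with μ0 = 100 and α = .05:

    x     <- c(108, 112, 95, 121, 104, 99, 117, 106, 110, 103, 115)  # hypothetical sample
    m0    <- 100
    alpha <- 0.05

    n      <- length(x)
    se     <- sd(x) / sqrt(n)
    t_stat <- (mean(x) - m0) / se

    # View 1: p-value vs. alpha
    p <- 2 * pt(-abs(t_stat), df = n - 1)
    p < alpha

    # View 2: test statistic vs. critical value
    abs(t_stat) > qt(1 - alpha / 2, df = n - 1)

    # View 3: does the confidence interval around M exclude mu0?
    ci <- mean(x) + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * se
    m0 < ci[1] || m0 > ci[2]

    # All three comparisons agree, and match t.test(x, mu = m0)

Each comparison answers the same hypothesis test; the views differ only in which quantity is reported.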