Three Views of Hypothesis Testing
What are we doing?
• In science, we run lots of experiments
• In some cases, there's an "effect"; in others there's not
– Some manipulations have an impact; some don't
– Some drugs work; some don't
• Goal: Tell these situations apart, using the sample data
– If there is an effect, reject the null hypothesis
– If there’s no effect, retain the null hypothesis
• Challenge
– Sample is an imperfect reflection of the population, so we will make mistakes
– How to minimize those mistakes?
Analogy: Prosecution
• Think of cases where there is an effect as "guilty"
– No effect: "innocent" experiments
• Scientists are prosecutors
– We want to prove guilt
– Hope we can reject null hypothesis
• Can't be overzealous
– Minimize convicting innocent people
– Can only reject null if we have enough evidence
• How much evidence to require to convict someone?
• Tradeoff
– Low standard: Too many innocent people get convicted (Type I Error)
– High standard: Too many guilty people get off (Type II Error)
Binomial Test
• Example: Halloween candy
– Sack holds boxes of raisins and boxes of jellybeans
– Each kid blindly grabs 10 boxes
– Some kids cheat by peeking
• Want to make cheaters forfeit candy
– How many jellybeans before we call kid a cheater?
Binomial Test
• How many jellybeans before we call kid a cheater?
• Work out probabilities for honest kids
– Binomial distribution
– Don’t know distribution of cheaters
• Make a rule so only 5% of honest kids lose their candy
• For each individual above cutoff
– Don't know for sure they cheated
– Only know that if they were honest, chance would have been less than 5% that they'd have so many jellybeans
• Therefore, our rule catches as many cheaters as possible while limiting Type I errors (false convictions) to 5%
[Figure: binomial distribution of jellybean counts (0–10) for Honest kids, unknown distribution for Cheaters, with the cutoff marked; x-axis: Jellybeans; y-axis: probability (0.00–0.20)]
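A minimal sketch of this cutoff rule in Python with scipy, assuming an honest kid is equally likely to grab a box of raisins or a box of jellybeans (p = 0.5 is an assumption; the slides don't state it):

```python
from scipy.stats import binom

# Honest kids: 10 boxes, each jellybeans with probability 0.5 (assumed).
n, p, alpha = 10, 0.5, 0.05

# Smallest cutoff c such that P(X >= c) <= 0.05 for an honest kid.
cutoff = next(c for c in range(n + 1) if binom.sf(c - 1, n, p) <= alpha)

print(cutoff)                       # call a kid a cheater at this many jellybeans or more
print(binom.sf(cutoff - 1, n, p))   # actual Type I error rate at that cutoff
```

With these assumptions the cutoff is 9 jellybeans, and only about 1% of honest kids would be wrongly accused (the discrete distribution means we can't hit 5% exactly).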
Testing the Mean
• Example: IQs at various universities
– Some schools have average IQs (mean = 100)
– Some schools are above average
– Want to decide based on n students at each school
• Need a cutoff for the sample mean
– Find distribution of sample means for Average schools
– Set cutoff so that only 5% of Average schools will be mistaken as Smart
[Figure: sampling distributions of the sample mean for Average (mean 100) and Smart schools, with the 5% cutoff marked; x-axis: z (−4 to 4); y-axis: p(M)]
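As a sketch of the cutoff calculation, assuming IQ scores have a known standard deviation of 15 and a sample of n = 25 students per school (both values are illustrative, not from the slides):

```python
from scipy.stats import norm

mu0, sigma, n, alpha = 100, 15, 25, 0.05   # sigma and n are assumed values

se = sigma / n ** 0.5                      # standard error of the sample mean M
cutoff = norm.ppf(1 - alpha, loc=mu0, scale=se)

# Only 5% of truly Average schools produce a sample mean above this cutoff.
print(cutoff)
```

With these numbers the cutoff is about 104.9; a school whose sample mean exceeds it gets labeled Smart.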
Unknown Variance: t-test
• Want to know whether the population mean equals μ0
– H0: μ = μ0
– H1: μ ≠ μ0
• Don’t know population variance
– Don’t know distribution of M under H0
– Don’t know Type I error rate for any cutoff
• Use sample variance to estimate standard error of M
– Divide M − μ0 by the standard error to get t
– We know the distribution of t exactly
[Figure: distribution of t when H0 is true, with the critical value and critical region marked; x-axis: t (−4 to 4); y-axis: p(t)]
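A minimal sketch of the one-sample t-test, using made-up IQ scores from n = 11 students (the data are illustrative, not from the slides):

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical sample of IQ scores from one school; H0: mu = 100.
scores = np.array([104, 98, 112, 107, 101, 95, 110, 103, 99, 108, 106])
mu0 = 100

# t = (M - mu0) / (s / sqrt(n)), referred to a t distribution with n - 1 df.
t_stat, p_value = ttest_1samp(scores, popmean=mu0)
print(t_stat, p_value)
```

The unknown population variance enters only through the sample-based standard error s/√n, which is why the resulting statistic follows a t distribution rather than a normal one.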
Two-tailed Tests
• Often we want to catch effects on either side
• Split Type I Errors into two critical regions
– Each must have probability α/2
[Figure: null distribution with two critical regions of area α/2 in the tails, the two critical values, and the observed test statistic marked; H1 lies beyond either critical value; x-axis: test statistic (−4 to 4)]
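A sketch of the two-tailed critical values, assuming α = 0.05 and 10 degrees of freedom (the df matches the t distribution drawn in the figures; otherwise it is illustrative):

```python
from scipy.stats import t

alpha, df = 0.05, 10

# Put alpha/2 in each tail: reject H0 if |t| exceeds t_crit.
t_crit = t.ppf(1 - alpha / 2, df)
print(-t_crit, t_crit)   # roughly -2.23 and 2.23
```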
An Alternative View: p-values
• p-value
– Probability of a value equal to or more extreme than what you actually got
– Measure of how consistent data are with H0
• p > α
– t is within tcrit
– Retain null hypothesis
• p < α
– t is beyond tcrit
– Reject null hypothesis; accept alternative hypothesis
[Figure: t distribution with tcrit for α = .05 marked; the observed t = 2.57 lies beyond it, with tail area p = .03; x-axis: t (−4 to 4)]
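A sketch of the p-value computation for the observed statistic shown in the figure (t = 2.57), again assuming 10 degrees of freedom:

```python
from scipy.stats import t

t_obs, df, alpha = 2.57, 10, 0.05

# Two-tailed p-value: probability of a t at least this extreme under H0.
p_value = 2 * t.sf(abs(t_obs), df)
print(p_value)            # about 0.03
print(p_value < alpha)    # True, so reject H0
```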
Three Views of Inferential Statistics
• p-value & α
– Type I error rate (α)
– Measure of consistency with H0 (p)
– p vs. α
• Test statistic & critical value
– Test statistic vs. critical value
• Effect size & confidence interval
– Values of μ0 that don't lead to rejecting H0
– Result predicted by H0 (μ0) vs. confidence interval
• Can answer hypothesis test at any level
[Figure: three panels showing the same test as p vs. α, test statistic vs. critical values, and confidence interval around M vs. μ0; x-axes: z (−4 to 4)]
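A sketch showing that the three views give the same decision, reusing the made-up IQ sample from the t-test example and H0: μ = 100:

```python
import numpy as np
from scipy.stats import t, ttest_1samp

scores = np.array([104, 98, 112, 107, 101, 95, 110, 103, 99, 108, 106])
mu0, alpha = 100, 0.05
n, M, s = len(scores), scores.mean(), scores.std(ddof=1)
se = s / np.sqrt(n)
t_crit = t.ppf(1 - alpha / 2, df=n - 1)

# View 1: p-value vs. alpha
t_stat, p_value = ttest_1samp(scores, popmean=mu0)
print(p_value < alpha)

# View 2: test statistic vs. critical value
print(abs(t_stat) > t_crit)

# View 3: confidence interval around M vs. mu0
ci = (M - t_crit * se, M + t_crit * se)
print(not (ci[0] <= mu0 <= ci[1]))
```

All three prints give the same answer, because each comparison is the same inequality expressed on a different scale.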