Hypothesis testing
Draw inferences about a population based on a sample
Null Hypothesis expresses no difference
Often said “H naught”
Example:
H0: μ = 0 (or any number)
Later…
H0: μ1 = μ2
Alternative Hypothesis
H0: μ = 0; Null Hypothesis
HA: μ ≠ 0; Alternative Hypothesis
Researcher’s predictions should be a priori, i.e. made before looking at the data
- To test a hypothesis about μ, determine X̄ from a random sample
- If H0 is true, what is the probability of an X̄ as far (above or below, i.e. 2-tailed) from μ as the observed X̄ (for a given n)?
- Calculate the normal deviate
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
where Z is the normal deviate, X̄ the sample mean, μ the population mean, and σ_X̄ the SE of the mean
How to determine what proportion of a normal population lies above/below a certain level
If the distribution of Hobbit heights is normal with mean μ = 120 cm and SD σ = 20 cm:
Half < 120 cm & half > 120 cm (the average Hobbit is 120 cm tall)
What is the probability of finding a Hobbit taller than 130 cm?
Calculate the normal deviate
$$Z = \frac{X_i - \mu}{\sigma}$$
- X_i: any point on the normal curve (here, 130 cm)
- μ: mean
- σ: SD
- Z: normal deviate (test statistic)
Z = (130-120)/20 = 0.5
- Calculate P(Z) (Table B.2 in Zar; Table A in S&R)
- If Z is large, the probability of getting a value that extreme when H0 is true is small
- Pre-selected probability level, α, that you require to reject the null is referred to as the significance level
- α = 0.05 is common
-If Z (test statistic) is larger than critical value, then H0
is rejected
-If Z (test statistic) is smaller than critical value, then
H0 is not rejected (failure to reject null)
Table B.2; Zar
Table A S & R
P (probability) (Xi >130 cm) = P (Z>0.50) = 0.3085 or
30.85%
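The Hobbit calculation can be checked without a printed table. The following is a minimal sketch, assuming only the values given above (mean 120 cm, SD 20 cm, cutoff 130 cm); the helper name `upper_tail` is made up for this illustration.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mean_cm, sd_cm = 120.0, 20.0   # Hobbit population values from the example
x_cm = 130.0                   # height of interest

z = (x_cm - mean_cm) / sd_cm   # normal deviate
p_taller = upper_tail(z)       # area under the curve above 130 cm

print(f"Z = {z:.2f}")
print(f"P(X > 130 cm) = {p_taller:.4f}")
```

The printed probability should agree with the table value of 0.3085 to four decimal places.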
What is the probability of finding a hobbit
between 120cm and 130cm tall?
[Figure: normal density curve with axis marks at 120 cm (Z = 0) and 130 cm (Z = 0.5); the area above 130 cm is 0.3085, or 30.85%]
BE AWARE!!
Different tables will give you different areas
under the curve. You need to know what the
table you are looking at is actually telling you!!
S&R Table A: your book gives you the area between the mean (120 cm) and 130 cm, which is 0.1915
You want the area above 130 cm: 0.5 - 0.1915 = 0.3085
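To see how the different table conventions relate, here is a short sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The three variable names are illustrative labels for the three areas a table might report, not terms taken from either textbook.

```python
from statistics import NormalDist

z = 0.5
std_normal = NormalDist()        # standard normal: mean 0, SD 1

cumulative = std_normal.cdf(z)   # area to the left of z
mean_to_z = cumulative - 0.5     # area between the mean and z
upper = 1.0 - cumulative         # area to the right of z

print(f"left of Z:       {cumulative:.4f}")
print(f"mean to Z:       {mean_to_z:.4f}")
print(f"right of Z:      {upper:.4f}")
```

Whichever area a table reports, the other two follow from the facts that the total area is 1 and each half of the curve has area 0.5.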
Statistical Error
Sometimes H0 will be rejected (based on large test
statistic & small P value) even though H0 is really true
i.e., if you had been able to measure the entire population, not a sample, you would have found no difference between μ and some value, but based on X̄ you see a difference.
The mistake of rejecting a true H0 will happen with frequency α
So, if H0 is true, it will be rejected ~5% of the time, because α is frequently set to 0.05
H0: mean = 0
Population (“true”): population mean = 0
Sample (what you see): sample mean = 20
Conclude based on sample mean that population mean ≠ 0, but it really does equal 0 (H0 true), therefore you have falsely rejected H0
Type I Error
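A small simulation can make the Type I error rate concrete. The sketch below is not from the slides: it assumes a hypothetical normal population with mean 0 and a known SD of 20, repeatedly draws samples of 25, and applies a two-tailed Z test at α = 0.05. Because H0 is true in every trial, each rejection is a false rejection.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)                     # fixed seed so the run is repeatable

mu0, sigma, n = 0.0, 20.0, 25      # hypothetical population and sample size
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
se = sigma / math.sqrt(n)          # SE of the mean (sigma assumed known)

trials = 4000
false_rejections = 0
for _ in range(trials):
    # Draw a sample from a population where H0 really is true
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / se
    if abs(z) > z_crit:
        false_rejections += 1

type1_rate = false_rejections / trials
print(f"Observed Type I error rate: {type1_rate:.3f}")
```

The observed rate should land close to 0.05, with some run-to-run scatter if the seed is changed.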
Statistical Error
Sometimes H0 will be accepted (based on small test
statistic & large P value) even though H0 is really false
i.e., if you had been able to measure the entire population, not a sample, you would have found a difference between μ and some value, but based on X̄ you do not see a difference.
The mistake of accepting a false H0 will happen with
frequency β
H0: mean = 0
Population (“true”): population mean = 20
Sample (what you see): sample mean = 0
Conclude based on sample mean that population mean = 0, but it really does not (H0 really false), therefore you have falsely failed to reject H0
Type II Error
Finicky Words
Statistically correct:
- Reject the null hypothesis (or other)
- Fail to reject the null hypothesis
I think these are OK:
- Accept the null hypothesis
- Support the null hypothesis
Prove the null hypothesis to be true
                      H0 is true      H0 is not true
H0 is rejected        Type I error    No error
H0 is not rejected    No error        Type II error
Probability of Type I Error = α
Probability of Type II Error = β
β rarely known or reported
Power of a test = (1 - β) = probability of rejecting a null hypothesis that is really false and that should be rejected
For a given N, α and β are inversely related
Both types of error can be reduced by increasing N
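The trade-off between α, β and N can be sketched numerically. The example below is an illustration under stated assumptions, not a method from the slides: it uses a two-tailed Z test with known SD, a hypothetical true difference of 10 and SD of 20, and a made-up helper `power_two_tailed_z`.

```python
import math
from statistics import NormalDist

def power_two_tailed_z(true_diff, sigma, n, alpha):
    """Power (1 - beta) of a two-tailed Z test when the true mean
    differs from the H0 value by true_diff and sigma is known."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = true_diff / (sigma / math.sqrt(n))   # true difference in SE units
    # Probability that Z falls above +z_crit or below -z_crit
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

for n in (10, 25, 100):
    for alpha in (0.05, 0.01):
        power = power_two_tailed_z(true_diff=10, sigma=20, n=n, alpha=alpha)
        print(f"n = {n:3d}, alpha = {alpha:.2f}: power = {power:.3f}, beta = {1 - power:.3f}")
```

For a fixed n, lowering α raises β; for a fixed α, raising n lowers β. When the true difference is zero, the function returns α, which is the Type I error rate.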
Read pgs 157-169 in S&R
A few more words on hypothesis testing
Methods trace to R.A. Fisher and colleagues. Before
this, opinion of expert was criterion.
Many practicing / publishing statisticians take issue with
the null hypothesis and testing framework (see
upcoming quotes)
Still the dominant paradigm of analysis and you have to
learn it
“Under the usual teaching, the trusting student, to pass the course must
forsake all the scientific sense that (s)he has accumulated so far, and
learn the book, mistakes and all." (Deming 1975)
"Small wonder that students have trouble [with statistical hypothesis
testing]. They may be trying to think." (Deming 1975)
"... surely, God loves the .06 nearly as much as the
.05." (Rosnow and Rosenthal 1989)
"What is the probability of obtaining a dead person (D)
given that the person was hanged (H); that is, in symbol
form, what is p(D|H)? Obviously, it will be very high,
perhaps .97 or higher. Now, let us reverse the question:
What is the probability that a person has been hanged
(H) given that the person is dead (D); that is, what is
p(H|D)? This time the probability will undoubtedly be
very low, perhaps .01 or lower. No one would be likely to
make the mistake of substituting the first estimate (.97)
for the second (.01); that is, to accept .97 as the
probability that a person has been hanged given that the
person is dead. Even though this seems to be an
unlikely mistake, it is exactly the kind of mistake that is
made with the interpretation of statistical significance
testing---by analogy, calculated estimates of p(H|D) are
interpreted as if they were estimates of p(D|H), when
they are clearly not the same." (Carver 1978)
One sample, two tailed tests concerning means
Does the body temperature of a group of 25 crabs differ from room temperature?
TweetyBird parakeet food company wants to know if their mega-bird formulation helps birds grow. They measure the weight gain/loss of 40 birds eating the food for one week. Does this differ from zero?
Describe other scenarios…
One sample, two tailed tests concerning means
Crab Temperature
Null; H0: μ = room temp
Alternative; HA: μ ≠ room temp
Must determine if sample mean (X̄) is different from room temp
Similar to calculating normal deviate, calculate “t”
$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}$$
where t is the t-statistic, X̄ the sample mean, μ the value to which you compare, and s_X̄ the sample SE
William Sealy Gosset (1876 –1937)
Mathematician who worked as a brewer for Guinness
Guinness was a progressive agro-chemical business; Gosset applied statistics in the brewery and on the farm to the selection of the best varieties of barley.
Guinness prohibited publishing by employees due to worry over trade secrets
He used the pseudonym “Student”, and his most famous achievement is now referred to as the Student t-distribution
Gosset was a friend of both Pearson and Fisher, an achievement,
for each had a massive ego and a loathing for the other. Gosset
was a modest man who cut short an admirer with the comment that
“Fisher would have discovered it all anyway.”
One sample, two tailed tests concerning means
Crab Temperature
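For the crabs:
α = 0.05 (set by you ahead)
μ = 24.3 °C
X̄ = 25.03 °C
n = 25
ν = 24 (n - 1)
s² (variance) = 1.80 °C²

Sample SE (s = SD; s² = variance, i.e. mean SS):
$$s_{\bar{X}} = \frac{s}{\sqrt{n}} = \sqrt{\frac{s^2}{n}} = \sqrt{\frac{1.80\ ^\circ\mathrm{C}^2}{25}} = 0.27\ ^\circ\mathrm{C}$$

t-statistic:
$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}} = \frac{25.03\ ^\circ\mathrm{C} - 24.3\ ^\circ\mathrm{C}}{0.27\ ^\circ\mathrm{C}} = 2.704$$

t0.05(2),24 = 2.064 (critical t from Table B in S&R, B3 in Zar)
Test statistic (t) > critical t
Reject null hypothesis, conclude sample did not come from a population with body temp of 24.3 °C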
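The crab calculation can be reproduced in a few lines. This is a sketch using only the summary values given above; the critical value 2.064 is copied from the table rather than computed. Note that the slides round the SE to 0.27 before dividing, so the unrounded t comes out slightly larger than 2.704.

```python
import math

n = 25             # number of crabs
x_bar = 25.03      # sample mean, deg C
mu0 = 24.3         # room temperature (H0 value), deg C
variance = 1.80    # sample variance, deg C squared
t_crit = 2.064     # two-tailed critical t for alpha = 0.05, 24 df (from the table)

se = math.sqrt(variance / n)               # sample SE of the mean
t_exact = (x_bar - mu0) / se               # t with the unrounded SE
t_slides = (x_bar - mu0) / round(se, 2)    # t with SE rounded to 0.27, as in the slides

reject_h0 = abs(t_exact) > t_crit          # two-tailed decision rule

print(f"SE = {se:.4f}")
print(f"t (unrounded SE) = {t_exact:.3f}")
print(f"t (SE rounded to 0.27) = {t_slides:.3f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Either version of t exceeds the critical value, so the conclusion does not depend on the rounding.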
Excel demo
t- distribution = normal distribution for very large
sample sizes
Area outside critical values represents 5% of total area
Expect X̄ so far from μ that it lies in critical area only 5% of time
[Figure: t density for ν = 24, α = 0.05, with critical values at -2.064 and 2.064 and 2.5% of the area in each tail; the observed t = 2.704 lies beyond the upper critical value]
If p < 0.05 (or your chosen α), expect to get a value as extreme as the observed based on chance alone less than 5% (or your chosen %) of the time
Expect to falsely reject a true null ~5% of the time
                      H0 is true      H0 is not true
H0 is rejected        Type I error    No error
H0 is not rejected    No error        Type II error
Assumptions of a t-test
Theoretical basis of t testing assumes that the sample
came from a normal population
But… minor deviation from normality does not affect validity, i.e. the test is “robust to deviation from normality”
Effect of deviation from normality more important with small α
Effect of deviation from normality decreases as N
increases
Assumes that data are random sample
Data must be true replicates (can’t measure the same
crab 25 times; Hurlbert 1984, more later)
One sample, one tailed tests concerning means
Does a drug cause weight loss?
The Jamesville county school board has mandated
that the mean standardized reading test scores should
be above 440. Oak elementary school wants to know if
their mean test score > 440.
Describe other scenarios…..
One sample, one tailed tests concerning means
Weight loss product
Null; H0: μ ≥ 0; weight gain or no change, i.e. no loss
Alternative; HA: μ < 0; weight loss
Must determine if sample mean weight gain (X̄) is significantly less than 0
One sample, one tailed tests concerning means
Weight loss
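For weight loss:
α = 0.05 (set by you ahead)
μ = 0 kg
X̄ = -0.61 kg
n = 12
ν = 11 (n - 1)
s² (variance) = 0.4008 kg²

Sample SE (s = SD; s² = variance, i.e. mean SS):
$$s_{\bar{X}} = \frac{s}{\sqrt{n}} = \sqrt{\frac{s^2}{n}} = \sqrt{\frac{0.4008\ \mathrm{kg}^2}{12}} = 0.18\ \mathrm{kg}$$

t-statistic:
$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}} = \frac{-0.61\ \mathrm{kg} - 0\ \mathrm{kg}}{0.18\ \mathrm{kg}} = -3.389$$

t0.05(1),11 = 1.796 (one-tailed critical t from Table B3; note that the table also gives critical t for 2 tails)
|t| > critical t, and t is negative as HA predicts (-3.389 < -1.796)
Reject null hypothesis, support alternative hypothesis of weight loss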
Excel demo
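For the one-tailed case the direction of the comparison matters: H0 is rejected only when t falls in the lower tail. The sketch below uses the weight-loss summary values from above, with the critical value 1.796 copied from the table. The function name `lower_tail_t_test` is made up for this illustration, and its optional `se_decimals` argument exists only to mimic the rounding used in the slides.

```python
import math

def lower_tail_t_test(x_bar, mu0, variance, n, t_crit, se_decimals=None):
    """One-sample, one-tailed t-test for HA: mu < mu0.
    Returns (t, reject), where reject is True when t <= -t_crit."""
    se = math.sqrt(variance / n)
    if se_decimals is not None:
        se = round(se, se_decimals)    # optional rounding, to match hand calculation
    t = (x_bar - mu0) / se
    return t, t <= -t_crit

# Weight-loss example: 12 subjects, mean change -0.61 kg, variance 0.4008 kg^2
t_exact, reject_exact = lower_tail_t_test(-0.61, 0.0, 0.4008, 12, t_crit=1.796)
t_slides, reject_slides = lower_tail_t_test(-0.61, 0.0, 0.4008, 12, t_crit=1.796,
                                            se_decimals=2)

print(f"t (unrounded SE) = {t_exact:.3f}, reject H0: {reject_exact}")
print(f"t (SE rounded to 0.18) = {t_slides:.3f}, reject H0: {reject_slides}")
```

A large positive t would not lead to rejection here, because weight gain is consistent with H0: μ ≥ 0.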
Confidence limits of mean
[Figure: t density for ν = 24, α = 0.05, with critical values at -2.064 and 2.064; X̄ is expected in the tails only 5% of the time, so 95% of the time X̄ lies in the region between them]
So, if we know X̄, SE and degrees of freedom, we can calculate an interval in which we will be 95% (or other value) confident that the “true mean” (μ) falls
Confidence limits of mean
$$\bar{X} \pm t_{\alpha(2),\nu} \cdot s_{\bar{X}}$$
CI = mean ± (critical t * SE)
For the crabs…
25.03 ± (2.064 * 0.27)
25.03 ± 0.56
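The confidence limits for the crabs can be worked out as follows. This is a minimal sketch that takes the mean, the rounded SE of 0.27 and the critical t of 2.064 directly from the example above rather than recomputing them.

```python
x_bar = 25.03     # sample mean, deg C
se = 0.27         # sample SE as rounded in the example, deg C
t_crit = 2.064    # two-tailed critical t for alpha = 0.05, 24 df (from the table)

half_width = t_crit * se          # critical t * SE
lower = x_bar - half_width        # lower 95% confidence limit
upper = x_bar + half_width        # upper 95% confidence limit

print(f"95% CI: {x_bar} ± {half_width:.2f} = ({lower:.2f}, {upper:.2f})")
```

The interval does not contain the room temperature of 24.3, which matches the rejection of H0 in the two-tailed test at the same α.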
Confidence limits of mean
[Figure: the same density on the scale of the crab data, centered on X̄ = 25.03 with confidence limits at 24.47 and 25.59; t for ν = 24, α = 0.05]
CI is two tailed
The smaller the SE, the smaller the CI. We have a more precise estimate of μ when SE is small
A large N results in smaller SE
Parameter estimate from large sample more precise
than from small sample
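The effect of N on the SE is easy to tabulate. The sketch below holds the variance fixed at the crab value of 1.80 and tries three sample sizes; the sizes 5 and 100 are hypothetical and chosen only for illustration.

```python
import math

variance = 1.80                # sample variance, held fixed for illustration
sample_sizes = (5, 25, 100)    # 25 matches the crab example; the others are hypothetical

# SE of the mean for each sample size: sqrt(s^2 / n)
standard_errors = {n: math.sqrt(variance / n) for n in sample_sizes}

for n, se in standard_errors.items():
    print(f"n = {n:3d}: SE = {se:.3f}")
```

Because the SE falls with the square root of n, quadrupling the sample size only halves the SE. The critical t also shrinks a little as the degrees of freedom grow, so the CI narrows slightly faster than the SE alone suggests.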