11/5/09
Significance Tests
FPP 26-27

Significance tests
 Question:
 Given the collected data, is there evidence against a specified hypothesis about the corresponding parameter?
 In other words, are the data consistent or not with a specified hypothesis?

Logic of significance tests:
 Proof by contradiction:
 1. Assume some hypothesis is true.
 2. Find a statistic (a quantity that depends on data) that takes on extreme values when the assumed hypothesis is false.
 3. Calculate the value of this statistic in the collected data.
 4. Calculate the probability of observing a value of the statistic as or more extreme than the observed value, under the assumed hypothesis.
 5. When this probability is small, one of two things happened:
 A. the assumed hypothesis is correct and a rare event occurred
 B. the assumed hypothesis is incorrect.
 Since rare events are by definition rare, we interpret small probabilities as evidence that the assumed hypothesis is false.
 When the probability is not small, the data provide insufficient evidence to claim that the assumed hypothesis is false.

Significance test for a population percentage
 Civil rights and the 1960s
 In the court case Swain vs. Alabama (1965), it was alleged that there was discrimination against black people in grand jury selection. Census data from the time indicate that 25% of people eligible for grand jury service were black. A random sample of 1050 people called to appear for possible jury duty contained 177 black people. Is there evidence of discrimination?
 Reference:
Devore, J. Probability and Statistics for Engineering and the Sciences. Pacific Grove, CA: Duxbury, 2000, p. 339
Step 1: Formulate hypothesis
 Claim: There is discrimination
 The claim is called the alternative hypothesis. It usually can be translated as "there is some unusual pattern in the data".
 The opposite of this claim is called the null hypothesis. It usually can be translated as "there is nothing unusual going on".
 H0: p = 0.25 vs HA: p < 0.25

Step 2: Find a relevant statistic
 Values of the sample percentage of black jurors much smaller than 0.25 suggest the null hypothesis is not true
 Sample proportion = 177/1050 = 0.1686.
 Is this much smaller than 0.25?
 A good way to determine this is by converting the difference between 0.1686 and 0.25 to standard units:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.1686 - 0.25}{\sqrt{0.25(1 - 0.25)/1050}}$$
Step 3: Calculate z in data
 We get: $z \approx -6.09$
 The sample percentage of black jurors is about six SEs below the hypothesized 25%

Step 4: Calculate the p-value
 When n (the sample size) is large enough, we can use a standard normal curve to calculate the probability of seeing a value of z less than or equal to (i.e. as or more extreme than) the observed value of -6.09
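Steps 2 through 4 for the Swain data can be checked with a few lines of Python. This is only a sketch using the normal approximation described above; the function names are illustrative and not part of the original slides.

```python
from math import erfc, sqrt

def proportion_z(successes, n, p0):
    """Return (sample proportion, z statistic) for H0: p = p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # SE is computed under the null hypothesis
    return p_hat, (p_hat - p0) / se

def lower_tail(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * erfc(-z / sqrt(2))

p_hat, z = proportion_z(177, 1050, 0.25)
# HA: p < 0.25, so "as or more extreme" means the area to the left of z
p_value = lower_tail(z)
print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, p-value = {p_value:.1e}")
```

The p-value comes out far below any conventional cut-off, which is why the slides treat it as approximately 0.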
Conclusion in Swain case
 Because the p-value is approximately 0, we reject the null hypothesis. It is very unlikely that we would observe a sample percentage of 16.86% or smaller if the true percentage was 25%. The data suggest that black jurors were indeed selected less frequently than would have been expected. The data provide some evidence of discrimination.

Stating hypothesis
 Null Hypothesis (H0)
 The statement being tested in a test of significance is called the null hypothesis
 Usually the null hypothesis
 is a statement of “no effect” or “no difference”,
 is a statement about a population,
 is expressed in terms of a (some) parameter(s).
 Example H0: µ=0
Stating hypothesis
 Alternative Hypothesis (Ha)
 name given to the statement we hope or suspect to be true instead of H0
 Example Ha: µ≠0
 Hypotheses always refer to some population or model, not a particular outcome
 We must decide whether the alternative hypothesis (Ha) should be one-sided or two-sided

Stating hypothesis
 One-sided alternative hypotheses:
 Example: Ha: μ < 0 or Ha: μ > 0
 Two-sided alternative hypothesis:
 Example: Ha: μ ≠ 0
Stating hypothesis
 Choosing one-sided or two-sided Hypothesis
 The alternative hypothesis should express the hopes or suspicions we had in mind when we decided to collect the data
 It is cheating to first look at the data and then frame Ha to fit what the data show
 If you do not have a specific direction in advance, use a two-sided alternative

Stating hypothesis
 Example: Your company hopes to reduce the mean time (µ) required to process customer orders. At present, this mean is 3.8 days. You study the process and eliminate some unnecessary steps.
 Q: Did you succeed in decreasing the average process time? Target: to show that the mean is now less than 3.8 days.
 So alternative hypothesis is one-sided
 The null hypothesis is the “no change” value
Ho: μ = 3.8 vs Ha: μ < 3.8
Stating hypothesis
 The mean area of several thousand apartments in a new development is advertised to be 1250 sqft. A tenant group thinks that the apartments are smaller than advertised. They hire an engineer to measure a sample of apartments to test their suspicion.
 H0: µ=1250 vs. Ha: µ<1250

Stating hypothesis
 Experimenters on learning in animals sometimes measure how long it takes a mouse to find its way through a maze. The mean time is 18 seconds for one particular maze. A researcher thinks that a loud noise will cause the mice to complete the maze more slowly. She measures how long each of 10 mice takes with a noise as stimulus.
 H0: µ=18 vs. Ha: µ>18
Stating hypothesis
 Last year, your company’s service technicians took an average of 2.6 hours to respond to trouble calls from business customers who purchased service contracts. Do this year’s data show a different average response time?
 H0: µ = 2.6 vs. Ha: µ ≠ 2.6

Test Statistic
 After correctly formulating the null and alternative hypotheses, we make a comparison between the hypothesized value and the data by using a test statistic.
 Many test statistics can be thought of as a distance between a sample estimate of a parameter and the value of the parameter specified by the null hypothesis
 Most test statistics have generic form: $\dfrac{\text{observed} - \text{expected}}{\text{SE}}$
 Test statistic for a proportion: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
 Test statistic for a mean: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
P-values
 A test of significance assesses the evidence against the null hypothesis and provides a numerical summary of this evidence in terms of a probability
 The idea is that “surprising” outcomes are evidence against Ho
 A surprising outcome is one that is far from what we would expect if Ho were true
 The direction or directions that count as “far from what we would expect” are determined by the alternative hypothesis

P-values
 A test of significance finds the probability of getting an outcome as extreme or more extreme than the actually observed outcome
 Definition: The probability, assuming that H0 is true, that the test statistic would take a value as extreme or more extreme than that actually observed is called the P-value of the test
 The smaller the P-value, the stronger the evidence against H0 provided by the data
P-values
 What does “as or more extreme” really mean?
 When the alternative has a > sign, “as or more extreme” means use the area to the right of the test statistic in the p-value calculation
 When the alternative has a < sign, “as or more extreme” means use the area to the left of the test statistic in the p-value calculation
 When the alternative uses a ≠ sign, “as or more extreme” means values of the test statistic far from zero in both the positive and negative directions.
 For this type of alternative hypothesis, add the areas to the left of -|test statistic| and to the right of |test statistic|
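The three cases above can be written as one small function. This is a sketch that assumes the test statistic is compared with a standard normal curve; the function name and the labels "less", "greater" and "two-sided" are illustrative choices, not notation from the slides.

```python
from math import erfc, sqrt

def normal_p_value(z, alternative):
    """P-value for a z statistic under a standard normal reference curve.

    alternative: "less" (Ha uses <), "greater" (Ha uses >),
                 or "two-sided" (Ha uses a not-equal sign).
    """
    upper_tail = 0.5 * erfc(z / sqrt(2))   # area to the right of z
    lower_tail = 0.5 * erfc(-z / sqrt(2))  # area to the left of z
    if alternative == "greater":
        return upper_tail
    if alternative == "less":
        return lower_tail
    if alternative == "two-sided":
        # area left of -|z| plus area right of |z|
        return erfc(abs(z) / sqrt(2))
    raise ValueError(f"unknown alternative: {alternative!r}")

for alt in ("less", "greater", "two-sided"):
    print(alt, round(normal_p_value(1.5, alt), 4))
```

The same statistic gives three different p-values, which is why the direction of Ha has to be fixed before looking at the data.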
Interpretation of a p-value
 Common misinterpretations of p-values
 The p-value is not the probability that the null hypothesis is true. (the null is either true or not)
 Also, (1 - p-value) is not the probability that the alternative hypothesis is true. (the alternative is either true or not true)
 Correct interpretation
 The p-value is the probability of getting a value of a test statistic as or more extreme than the value of the statistic computed from the collected data, under the assumption that the null hypothesis is true

Enough evidence?
 Below are some guidelines for judging p-values. (Don’t treat these as “golden standards”)

p-value                      Evidence against H0
< 0.01-ish                   very strong
> 0.01-ish and < 0.05-ish    moderate
> 0.05-ish and < 0.10-ish    weak
> 0.10-ish                   practically none
Statistical significance
 To formalize testing further, some researchers advocate strict p-value cutoffs when deciding whether or not to reject null hypotheses.
 Example: reject the null hypothesis when the p-value is less than 0.05. Otherwise, do not reject it.
 When the null hypothesis is rejected, the term used to describe the outcome of the test is “statistically significant”.

Statistical significance
 These cut-offs are called “significance levels”.
 They are typically labeled with the Greek letter α (alpha).
 Example: for a statistical significance level of 0.05, we write α = 0.05
 Made-up example with typical language:
 “We got a p-value of 0.036 and used α = 0.05. The results are statistically significant at the 0.05 level.”
Etruscan example
Exploratory data analysis for Etruscans
 “In the eighth century B.C., the Etruscan civilization was the most advanced in all of Italy.
Its art forms and political innovations were destined to leave indelible marks on the
entire Western world. Originally located in the region now known as Tuscany, it spread
rapidly across the Apennines and eventually overran much of Italy. But as quickly as it
came, it faded. Militarily it was no match for the burgeoning Roman legions, and by the
dawn of Christianity it was all but gone.
 No chronicles of the Etruscan empire have ever been found, and to this day its origin
remains shrouded in mystery. Were the Etruscans native Italians or were they
immigrants? And if they were immigrants, where did they come from? Much of our
knowledge of the Etruscans derives from archaeological investigations and
anthropometric studies… (for example) body measurements to determine…
origins.” (Source: Larsen and Marx, Statistics, 2001, p. 513.)
 A team of archaeologists collected 84 skulls of Etruscan men and measured their head
breadth (in mm). Let’s assume that these 84 men are a random sample of Etruscan men.
If the Etruscan men were native, it makes sense to think that the population average head
breadth of Etruscans is comparable to the head breadth of modern Italians, 132.44 mm.
This assumes evolution has not shifted average head size substantially over the last 2800
years, an assumption that is reasonably close to true.
Significance test
 Step 1: Specify the null and alternative hypothesis
 Claim: true average breadth of Etruscan heads differs from 132.44
 Ho: μ = 132.44 vs Ha: μ ≠ 132.44
 Step 2: Compute a test statistic
$$t = \frac{\text{observed} - \text{expected}}{\text{SE}} = \frac{\bar{x} - \mu_0}{\text{SD}/\sqrt{n}} = \frac{143.77381 - 132.44}{0.6514363} = 17.4$$
 The sample average is over 17 SEs away from the hypothesized average of 132.44
 Step 3: Calculate the p-value
 For all intents and purposes this p-value is zero. Why?
 Step 4: Make a conclusion
 There is enough evidence in the data to conclude that modern Italians and the Etruscans have different average head sizes.

A more wordy conclusion
 It’s practically impossible to observe a difference of 17 SEs by chance alone. Our initial assumption in the null hypothesis is very unlikely to be true. The data overwhelmingly suggest that modern Italians and the Etruscans have different average head sizes, indicating that Etruscans were not native to Italy.
 For those interested, current theory is that Etruscans came from Asia. But it remains a mystery how they got to Italy.
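The Etruscan test statistic and the "this p-value is zero" remark can be checked numerically. The sketch below uses the sample mean and SE quoted on the slide and, as an assumption, a normal approximation in place of the t curve with 83 degrees of freedom. The exact t-based p-value is somewhat larger but still negligible.

```python
from math import erfc, sqrt

x_bar = 143.77381  # sample mean head breadth (mm), from the slide
mu_0 = 132.44      # hypothesized mean under Ho (modern Italians)
se = 0.6514363     # SD / sqrt(n), as given on the slide
n = 84             # number of skulls

t = (x_bar - mu_0) / se
# Ha is two-sided, so add the areas in both tails (normal approximation)
p_value = erfc(abs(t) / sqrt(2))
implied_sd = se * sqrt(n)  # sample SD back-calculated from the SE
print(f"t = {t:.1f}, implied SD = {implied_sd:.2f} mm, p-value = {p_value:.1e}")
```

A statistic 17 SEs from the hypothesized value puts essentially no area in the tails, which answers the "why?" in Step 3.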
Significance test using JMP
Example 1
 A sample of 40 recovering alcoholics was given the State-Trait Inventory Test. The mean score of the 40 recovering alcoholics was 38 with a sample SD of 7. A psychologist suspected that recovering alcoholics in general had a higher mean score than the norm of 35. Does the sample justify the suspicion?
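The slides work this example in JMP. As a rough hand check, the same four steps can be sketched in Python, taking H0: µ = 35 vs. Ha: µ > 35. The p-value here uses a normal approximation (an assumption made to keep the code dependency-free); a t curve with 39 degrees of freedom would give a slightly larger value.

```python
from math import erfc, sqrt

n = 40         # sample size
x_bar = 38.0   # sample mean score
s = 7.0        # sample SD
mu_0 = 35.0    # norm, the value under H0

se = s / sqrt(n)
t = (x_bar - mu_0) / se
# Ha: mu > 35, so use the area to the right of t (normal approximation)
p_value = 0.5 * erfc(t / sqrt(2))
print(f"SE = {se:.3f}, t = {t:.2f}, approximate p-value = {p_value:.4f}")
```

The sample mean sits a little under three SEs above the norm, so the data support the psychologist's suspicion.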
Example 2
 There was concern among health officials in a community that an unusually large percentage of babies with abnormally low birth weight were being born. Abnormally low birth weight here is defined as less than 88 ounces. A sample of 180 births showed 14 babies with abnormally low birth weight. The proportion of births that the officials expect to be abnormally low is 5%. Do the data support the health officials' claims?

My opinion about statistical significance
 DO NOT RELY BLINDLY ON A FIXED CUT-OFF
 Consider two p-values: 0.050001 and 0.049999.
 These two p-values provide the same amount of evidence against the null hypothesis.
 But if we judge strictly by the 0.05 cut-off we don’t reject the null for 0.050001 and we do for 0.049999.
 Ridiculous, no? Consider p-values on their own merits
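Example 2 (low birth weight) happens to give a p-value close to the 0.05 cut-off, so it illustrates the point about fixed cut-offs. The sketch below takes H0: p = 0.05 vs. Ha: p > 0.05 and uses the normal approximation. Note that n × p0 = 9 is on the small side for that approximation, so treat the p-value as rough.

```python
from math import erfc, sqrt

low_weight = 14  # babies with abnormally low birth weight
n = 180          # births sampled
p0 = 0.05        # proportion expected by the officials (H0)

p_hat = low_weight / n
se = sqrt(p0 * (1 - p0) / n)  # SE under the null hypothesis
z = (p_hat - p0) / se
# Ha: p > 0.05, so use the area to the right of z
p_value = 0.5 * erfc(z / sqrt(2))
print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, approximate p-value = {p_value:.3f}")
```

The result lands just under 0.05. A slightly different sample could have landed just over, which is a reason to report the p-value itself and not only "significant" or "not significant".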
Type I and Type II errors
 Possible errors from the decision to reject or not to reject the null hypothesis
 Type I error = reject when Ho is true
 Type II error = fail to reject when Ha is true
 Hypothesis testing is not perfect. You never know if you are making one of these errors!
 Important to replicate the study whenever possible to reduce these errors

The role of sample size
 The chance of making a Type I error does not depend on sample size. (Sample sizes are incorporated into test statistics.)
 The chance of making a Type II error decreases as sample size increases. (Be wary when using tests based on small sample sizes.)
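The effect of sample size on the Type II error can be sketched for a simple case: a one-sided z test of H0: µ = µ0 vs. Ha: µ < µ0 with a known population SD. The value µ0 = 3.8 echoes the order-processing example; the true mean of 3.6 and the SD of 1.0 are hypothetical numbers chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu_0, mu_true, sigma, n, alpha=0.05):
    """Chance of failing to reject H0: mu = mu_0 against Ha: mu < mu_0
    when the true mean is mu_true (one-sided z test, sigma known)."""
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(alpha)      # reject when z < critical_z
    shift = (mu_true - mu_0) * sqrt(n) / sigma  # mean of z when mu_true holds
    power = std_normal.cdf(critical_z - shift)  # chance of rejecting H0
    return 1 - power

for n in (10, 50, 100, 400):
    beta = type_ii_error(mu_0=3.8, mu_true=3.6, sigma=1.0, n=n)
    print(f"n = {n:4d}: Type II error = {beta:.3f}, power = {1 - beta:.3f}")
```

The Type I error rate is fixed at alpha for every n, while the Type II error shrinks as n grows.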
The role of sample size
 When the hypothesized value is NOT very different from the actual value of the parameter, you need a large sample size to reduce the chance of a Type II error.
 In many grant proposals, you have to justify the study size by methods that attempt to minimize the chance of Type II errors.
 These methods are called power analyses.

The role of sample size
 Inferences are always improved by obtaining as much (accurate and relevant) data as possible.
 With large enough sample size, you can reject any false null hypothesis
 However,
Dangers of excessive fishing
 With enough hypothesis tests, you’ll find something statistically significant.
 Some of these statistically significant results may really be Type I errors.
 Try to avoid excessive fishing for statistical significance. If you perform many tests, be sure to report how many you do. And see if results are replicated in separate studies.

Practical vs. statistical significance
 When you get a statistically significant result, consider whether it is practically significant.
 If your sample size is large enough you’ll be able to detect a difference between the hypothesized value of a parameter and its true value if Ho is wrong.
 But is this difference of practical significance?
 Example of weight lifting study
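To put a number on the fishing danger described above: if every null hypothesis tested is actually true and the tests are independent (a simplifying assumption), the chance of at least one Type I error grows quickly with the number of tests. The function name below is illustrative.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """P(at least one Type I error) across independent tests of true nulls."""
    return 1 - (1 - alpha) ** num_tests

for m in (1, 5, 20, 100):
    print(f"{m:3d} tests: {chance_of_false_positive(m):.3f}")
```

With 20 such tests at α = 0.05, it is more likely than not that at least one comes out "statistically significant" by chance alone, which is why the number of tests performed should be reported.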
Non-significant results
 Failing to reject a null hypothesis is not a failed study
 It is just as important to learn that a null hypothesis explains data well as it is to learn that it does not

Relationship between CI and hypothesis tests
 You can use CIs like a hypothesis test
 Example: Say your null hypothesis is Ho: p = 0.5.
 If the 95% CI does not contain the null hypothesis value, e.g. (0.64, 0.70), then the two-sided test has p-value < 0.05
 If the 95% CI contains the null hypothesis value, e.g. (0.47, 0.87), then the two-sided test has p-value > 0.05
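The link between the two example intervals and the two-sided p-value can be sketched as follows. The code assumes each interval is a symmetric normal-based 95% CI (estimate ± 1.96 SE) and reuses that same SE in the test, so the estimate and SE are back-calculated from the interval endpoints. For proportions, a test that computes the SE under Ho can disagree slightly with the CI near the 0.05 boundary.

```python
from math import erfc, sqrt

Z_95 = 1.959964  # standard normal multiplier for a 95% interval

def summarize(ci_low, ci_high, null_value):
    """Recover the estimate and SE from a symmetric 95% CI, then test the null."""
    estimate = (ci_low + ci_high) / 2
    se = (ci_high - ci_low) / (2 * Z_95)
    z = (estimate - null_value) / se
    p_two_sided = erfc(abs(z) / sqrt(2))  # both tails of the normal curve
    contains_null = ci_low <= null_value <= ci_high
    return contains_null, p_two_sided

for interval in [(0.64, 0.70), (0.47, 0.87)]:
    contains, p = summarize(*interval, null_value=0.5)
    print(interval, "contains 0.5:", contains, "| p-value < 0.05:", p < 0.05)
```

In both cases the interval excludes 0.5 exactly when the two-sided p-value is below 0.05.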
CIs vs Hypothesis tests
 Hypothesis tests can identify parameter values that are inconsistent with the data.
 They do not specify parameter values that plausibly could have produced the data.
 Confidence intervals do this. Hence, when given a choice, use CIs over hypothesis tests.

Important caveat
 A hypothesis test will not remedy a poorly designed study
 Bad data yield unreliable p-values