10.2 ap stats new.notebook
August 31, 2009
10.2 Tests of Significance
A test of significance asks whether some difference or effect
is real -or-
is just due to chance sampling variability
A significance test assesses evidence from sample data against some claim about a parameter.
Two ways to test claims or hypotheses about a parameter:
1. Confidence Intervals
2. Significance Tests
Tests of Significance are based on sampling distributions (ch. 9) and standard scores (like z) and tell us how much evidence we have against some claim.
US Court System
1. Assume innocent.
2. Determine legal proceedings.
3. Assess evidence against innocence.
4. Make a decision about guilt.
Significance Testing
1. Assume some claim is true.
2. Determine procedure to follow.
3. Find the probability of getting such a statistic.
4. Make a decision about the claim.
A few more specifics about Significance Tests
1. Assume that some claim is true.
2. Find your statistic and look at the sampling distribution.
3. Find the probability of getting a statistic like that or more extreme.
4. Is the evidence convincing? An outcome that is very unlikely, assuming that the claim is true, is good evidence that the claim should be questioned or rejected.
It's not monkey-business, it's logic!
Hypothesis:
• a claim about a population &/or parameter
• what we assume is true or want to prove
Never write a hypothesis about a statistic! Writing a hypothesis about a statistic is immediately wrong.
Null Hypothesis Ho
A statistical test begins by supposing the effect we want is not present. Then we try to find evidence against this assumption. The claim of no change, no effect, or no difference is the null hypothesis (the null is dull) and is symbolized as Ho. We then evaluate how much evidence there is against the null hypothesis.
The null hypothesis is the statement being tested. See page 565 for more about Ho.
Alternative Hypothesis Ha
The statement that there is a change, an effect, or a difference is the alternative hypothesis and is symbolized as Ha. If we find strong evidence against the null hypothesis (Ho), then we decide the alternative hypothesis (Ha) is probably the true state of things.
More about hypotheses and claims.
The null hypothesis is just that: "null" -- nothing interesting, nothing has changed, no difference, not effective, etc. The burden of proof resides in the alternative hypothesis. Someone might claim that something has changed or that it has not. Either way, the null is no change.
Ex. Suppose we have a medication that's proven effective, but has now been slightly reformulated.
Scenario 1: We've added another ingredient and now we claim the new version is even more effective.
Ho: The cure rate has not changed
Ha: The cure rate is higher
Scenario 2: We've deleted an ingredient that was causing upset stomach in some people and now we claim the new version is still just as effective.
Ho: The cure rate has not changed
Ha: The cure rate is lower
In both cases the null is the same, but in the first case the claim is in the alternative hypothesis and in the second it's in the null. What's different is the strength of the conclusion we can reach. In the first case, rejecting the null offers strong evidence that the claim is true (not quite a definitive proof of the claim, but certainly leaning in that direction). In the second, failing to reject the null merely says we lack evidence that things have changed. This is nowhere near proof that the claim is true. We'll continue to hold the conjecture of equivalent effectiveness, but we certainly haven't proven it (which is why we never "accept" a null hypothesis).
Hypotheses in words
Null Hypothesis Ho:
The true population proportion or mean of what you are trying to study equals the hypothesized value of the parameter.
Alternative Hypothesis Ha:
The true population proportion or mean of what you are trying to study is greater than, less than, or not equal to the hypothesized value of the parameter.
Hypothesis Testing
State both hypotheses at the beginning of EVERY statistical test.
Only one hypothesis will be declared likely TRUE at the end of the test. We can't really prove that one is true, though.
reject Ho
• implies there's sufficient evidence that Ho is false
• like a "guilty" verdict: there's sufficient evidence to convict
fail to reject Ho
• implies there's insufficient evidence that Ho is false
• like a "not guilty" verdict: there's insufficient evidence to convict
accept Ho
• would imply proof that Ho is true
• like declaring someone "innocent": it would imply proof that one is innocent, which is why we never do it
Hypotheses for tests about the mean
Ho: µ=µo
Ha: µ<µo
Ho: µ=µo
Ha: µ>µo
Ho: µ=µo
Ha: µ≠µo
How we quantify the strength of evidence: the P-value
The P-value is
• the probability that the test statistic takes a value at least as extreme as the one observed.
• a conditional probability:
P(seeing this statistic (or one more extreme) | Ho is true)
• the probability of seeing a statistic at least this extreme if Ho is true.
A small P-value is strong evidence against Ho.
How small? That depends on what we're testing.
page 567
The probability, computed assuming that Ho is true, that the observed outcome would take a value as extreme as or more extreme than that actually observed is called the P-value of the test. The smaller the P-value is, the stronger the evidence against Ho provided by the data.
page 569
If we decide in advance that some particular probability is decisive, we use the Greek letter alpha, α, to indicate this level of evidence against Ho that we will insist upon.
If the P-value is as small as or smaller than alpha, α, we say that the data are statistically significant at level α.
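To make the comparison concrete, here is a small sketch that is not from the notebook; it assumes Python with scipy is available and uses a made-up observed z score of 2.1. It turns the standard score into a P-value for a right-tailed test and compares it to α.

from scipy.stats import norm

alpha = 0.05          # significance level chosen in advance
z_observed = 2.1      # hypothetical standard score from some sample

# P-value for a right-tailed test: P(Z >= z_observed), computed assuming Ho is true
p_value = norm.sf(z_observed)        # survival function = 1 - cdf
print(round(p_value, 4))             # about 0.0179

# statistically significant at level alpha if the P-value <= alpha
print("reject Ho" if p_value <= alpha else "fail to reject Ho")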
Statistical Significance is
• a way to say how much evidence we need to make some decision.
• based on α, the significance level, the complement of a confidence level.
• declared if the P-value is as small as or smaller than α.
Evidence Against Null Hypothesis    P-Value
"Some"                              0.05 < P < 0.10
"Moderate"                          0.01 < P < 0.05
"Strong"                            P < 0.01
But we have to be careful about being rigid concerning α (alpha). Ex. Most of us have a pretty good built-in feel for α. We have a coin we suspect is biased. We agree that we expect 10 heads in 20 tosses.
Almost no one would suspect it is biased if 11 heads came up; that's just ordinary random variability. How about 12 heads? That's probably still a reasonable number for a fair coin. At 13, some of us might suspect the coin is not fair. Most of us would really start to question the coin's fairness
at 14 or 15 heads. If we check the binomial probability of getting at least 14 or at least 15 heads in 20 tosses of a fair coin, the chance comes out to roughly 0.06 or 0.02, so most of us reject the assumption of a fair coin somewhere around the conventional α values of 0.01 to 0.05.
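A quick way to check those intuitions (a sketch added here, not part of the notebook, assuming Python with scipy): compute the chance of getting at least 14 or at least 15 heads in 20 tosses of a fair coin.

from scipy.stats import binom

n, p = 20, 0.5                       # 20 tosses of a fair coin

# P(X >= 14) = P(X > 13); binom.sf(k, n, p) gives P(X > k)
print(round(binom.sf(13, n, p), 4))  # about 0.0577
# P(X >= 15)
print(round(binom.sf(14, n, p), 4))  # about 0.0207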
A Significance level is
• the cutoff that the P-value is compared to when deciding whether or not to reject Ho.
• also called the α (alpha) level.
• generally set at .05 or .01.
• the area of the rejection region (also called the critical region) of a distribution.
What if alpha is not given?
NOBODY will argue with α=0.05, but if you think something else is more appropriate, just justify it by appealing to the context. (Since this requires a longer answer that will eat up time on the AP Stat test, it is NOT casually recommended unless the question specifically asks you to address that issue.)
From now on, always follow one of these methods:
The inference toolbox (page 571)
Step 1: ID the population and the parameter we want to draw conclusions about.
Step 2: Choose the appropriate inference procedure and verify the conditions for the procedure.
Step 3: If the conditions are met, find the test statistic and the P­value.
Step 4: Interpret the results in the context of the problem.
PHANTOMS (my usual method)
Population and parameter
Hypotheses in context
Assumptions/conditions verified
Name of the procedure
Test statistic found
Obtain a P value
Make a decision about Ho
State a conclusion in context
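As an illustration only (not part of the notebook), here is a minimal sketch of how the PHANTOMS steps line up in code for a one-sample z test; it assumes Python with scipy, and every number in it is a hypothetical placeholder.

from math import sqrt
from scipy.stats import norm

# P: parameter   -- the true mean mu of the population of interest
# H: hypotheses  -- Ho: mu = mu0   vs   Ha: mu < mu0 (one-sided, for this sketch)
mu0 = 100                      # hypothesized value (placeholder)
# A: assumptions -- SRS, Normal population (or large n), sigma known: verify these by hand
sigma, n, xbar = 15, 25, 94    # placeholder sample information
# N: name of test -- one-sample z test for a mean
# T: test statistic
z = (xbar - mu0) / (sigma / sqrt(n))
# O: obtain a P-value (left tail, because Ha: mu < mu0)
p_value = norm.cdf(z)
# M: make a decision about Ho at alpha = 0.05
decision = "reject Ho" if p_value <= 0.05 else "fail to reject Ho"
# S: state a conclusion in context -- written out in words for the actual problem
print(round(z, 2), round(p_value, 4), decision)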
A comparison of PHANTOMS and the Inference Toolbox
Toolbox step 1:  P  parameter
                 H  hypotheses
Toolbox step 2:  A  assumptions
Toolbox step 3:  N  name of test
                 T  test statistic
                 O  obtain P-value
                 M  make decision
Toolbox step 4:  S  state conclusion in context
Another method you can use is PHAT DC:
1. state Parameter of interest
2. state null and alternate Hypotheses
3. state and verify Assumptions
4. choose a Test, then construct and evaluate a test statistic
5. Decide based on the test statistic whether Ho is supported
6. interpret test results in the Context of the situation
Remember, that's PHAT DC:
Parameter
Hypotheses
Assumptions
Test choice and Test statistic
Decision
Conclusion in context
Other people like to use this: "CATCH" spelled backward
H  hypotheses and parameter
C  check conditions
T  test statistic
A  alpha and P-value
C  conclusion in context
Details! Details! No matter what method you use...
Show how you verified the assumptions/conditions.
Don't just list them and check them off.
Specify the procedure, by name or formula.
Show the values substituted into the formula. (It's an insurance policy -- if you do this but push a wrong calculator button, you might still get full credit.)
Be sure your conclusion is in context. Generic statements don't earn you points.
page 572
To test the hypothesis Ho: µ = µo based on an SRS of size n from a population with unknown mean µ and known standard deviation σ, compute the one-sample z statistic
z = (x̄ - µo) / (σ / √n)
The P-value for a test of Ho against
Ha: µ > µo is P(Z > z)
Ha: µ < µo is P(Z < z)
Ha: µ ≠ µo is 2P(Z > |z|)
The P-values are exact if the population distribution is Normal. Otherwise, they are approximate for large n.
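A compact sketch of these three rules (added for illustration; Python with scipy assumed): given the one-sample z statistic, it returns the P-value for whichever alternative is being tested.

from scipy.stats import norm

def p_value(z, alternative):
    """P-value for a one-sample z statistic, following the rules above."""
    if alternative == "greater":        # Ha: mu > mu0
        return norm.sf(z)               # P(Z > z)
    if alternative == "less":           # Ha: mu < mu0
        return norm.cdf(z)              # P(Z < z)
    return 2 * norm.sf(abs(z))          # Ha: mu != mu0  ->  2 P(Z > |z|)

print(round(p_value(1.96, "two-sided"), 3))   # about 0.05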
Ex. Company A uses bug repellent for its workers. The current repellent protects for 8 hours. A cheaper repellent is being considered. Assume α=.05, that effective times are Normally distributed, and that σ=2 hours. Is there evidence that the cheaper repellent protects for less than 8 hours? The cheaper repellent protected an SRS of 15 workers for an average of 7 hours.
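A quick numerical check of this example (added here as a sketch; Python with scipy assumed). The test is Ho: µ = 8 versus Ha: µ < 8.

from math import sqrt
from scipy.stats import norm

xbar, mu0, sigma, n = 7, 8, 2, 15
z = (xbar - mu0) / (sigma / sqrt(n))    # about -1.94
p_value = norm.cdf(z)                   # left tail: about 0.026
print(round(z, 2), round(p_value, 3))
# 0.026 <= 0.05, so reject Ho: evidence that the cheaper repellent
# protects for less than 8 hours on average.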
Ex. Company B also uses bug repellent for its workers. The current repellent protects for 8 hours. A plant-based repellent is being considered. Assume α=.05, that effective times are Normally distributed, and that σ=2 hours. Is there evidence that the plant-based repellent is any different from the current one? The plant-based repellent protected an SRS of 15 workers for an average of 8.5 hours.
P-value = 2(.166) = .332
At the .05 level of significance, there is not enough evidence to conclude that the average effective life of the plant-based repellent is different from 8 hours.
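The same kind of check for Company B's two-sided test, Ho: µ = 8 versus Ha: µ ≠ 8 (a sketch, Python with scipy assumed); it matches the .332 above up to rounding.

from math import sqrt
from scipy.stats import norm

xbar, mu0, sigma, n = 8.5, 8, 2, 15
z = (xbar - mu0) / (sigma / sqrt(n))    # about 0.97
p_value = 2 * norm.sf(abs(z))           # two-sided: about 0.33
print(round(z, 2), round(p_value, 3))
# 0.33 > 0.05, so fail to reject Ho.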
A couple of ways to remember whether you should reject Ho:
P-value method: picture p as a person and α as an already-set-up hurdle.
• p > α: the person cleared the hurdle, so fail to reject Ho.
• p ≤ α: the person hit the hurdle, so reject Ho.
Test statistic method: the critical z* value based on α marks off the critical region.
• If the test statistic falls in the shaded critical region (rejection region), then we reject Ho.
• If the test statistic falls in the unshaded acceptance region, then we fail to reject Ho.
If the T(wisted) S(ister) goes shopping and the store is brightly lit, and not shaded, she keeps her purchases (fails to reject her purchases). If the T(wisted) S(ister) goes shopping and the lights go out, she rejects her purchases; who would shop in the dark?
page 578: the critical value method
To test the hypothesis Ho: µ = µo based on an SRS of size n from a population with unknown mean µ and known standard deviation σ, compute the one-sample z statistic.
Reject Ho at significance level α against a one-sided alternative
Ha: µ > µo if z > z*
Ha: µ < µo if z < -z*
where z* is the upper α critical value from Table C.
Reject Ho at significance level α against a two-sided alternative
Ha: µ ≠ µo if |z| > z*
where z* is the upper α/2 critical value from Table C.
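A sketch of the critical value method in code (illustration only; Python with scipy assumed, and the z statistic of 1.8 is a made-up value). Instead of reading Table C, norm.ppf gives the upper α (or α/2) critical value z*, and we compare the test statistic to it.

from scipy.stats import norm

alpha = 0.05
z = 1.8                                   # hypothetical one-sample z statistic

# one-sided alternative Ha: mu > mu0
z_star = norm.ppf(1 - alpha)              # upper alpha critical value, about 1.645
print(z > z_star)                         # True -> reject Ho

# two-sided alternative Ha: mu != mu0
z_star2 = norm.ppf(1 - alpha / 2)         # upper alpha/2 critical value, about 1.96
print(abs(z) > z_star2)                   # False -> fail to reject Ho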
We can also use a CI for a significance test.
page 581
A level α two-sided significance test rejects a hypothesis Ho: µ = µo exactly when the value µo falls outside a level 1 - α confidence interval for µ.
Basically, if the hypothesized value for μ is outside the CI, then we reject Ho. This works when Ha says that μ ≠ μo. The significance level will be α if the confidence level is 1 - α.
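To see the duality numerically, here is a sketch (Python with scipy assumed) reusing Company B's numbers: the 95% confidence interval contains µo = 8, which matches the two-sided test's failure to reject Ho at α = 0.05.

from math import sqrt
from scipy.stats import norm

xbar, sigma, n, mu0, alpha = 8.5, 2, 15, 8, 0.05

z_star = norm.ppf(1 - alpha / 2)          # 1.96 for 95% confidence
margin = z_star * sigma / sqrt(n)
ci = (xbar - margin, xbar + margin)       # about (7.49, 9.51)
print(ci)

# reject Ho: mu = mu0 at level alpha exactly when mu0 falls outside the CI
print("reject Ho" if not (ci[0] <= mu0 <= ci[1]) else "fail to reject Ho")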
Ex. Survey of Study Habits and Attitudes (SSHA)
• a psychological test
• measures motivation, attitude toward school, & study habits
• scores range from 0 to 200
• for U.S. college students μ ≈ 115 and σ ≈ 30
A teacher suspects that older students have better attitudes toward school & gives the SSHA to 20 students over age 30. The sample mean is x̄ = 135.2. Assume σ = 30 for the older students. Construct a 90% confidence interval for the mean score μ for older students. Is there evidence that the mean score for older students differs from that of the general college population?
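A sketch of the computation this example asks for (Python with scipy assumed): build the 90% confidence interval and then check whether the national mean of 115 falls inside it.

from math import sqrt
from scipy.stats import norm

xbar, sigma, n = 135.2, 30, 20

z_star = norm.ppf(0.95)                    # about 1.645 for 90% confidence
margin = z_star * sigma / sqrt(n)          # about 11.0
ci = (xbar - margin, xbar + margin)        # roughly (124.2, 146.2)
print(ci)

# 115 lies outside the interval, so a two-sided test at alpha = 0.10
# rejects Ho: mu = 115 -- evidence that the older students' mean differs.
print(ci[0] <= 115 <= ci[1])               # False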