STAT 111 Introductory Statistics
Lecture 11: Hypothesis Testing
June 9, 2004
Today’s Topics
• Hypothesis testing continued
– Test statistics
– P-values
– Statistical Significance
• Testing a population mean
• Using and abusing hypothesis tests
Hypothesis Testing
• Terminology:
– Hypothesis: a statement about the parameters in a
population or model
– Null hypothesis H0: claim which is initially favored
or believed to be true; the claim we try to find
evidence against. Usually a statement of “no effect or
no difference.”
– Alternative hypothesis Ha: claim that we hope or
suspect to be true instead of H0.
Hypothesis Testing
• Hypothesis testing is designed to assess the
strength of the evidence against the null
hypothesis.
• Hypotheses refer to some population or model,
not to any particular outcome.
• Generally, we begin with the alternative
hypothesis Ha and set up H0 as the statement that
the hoped-for effect is not present.
Hypothesis Testing
• Alternative hypotheses can be either one-sided
(ex., μ > 0), or two-sided (ex., μ ≠ 0).
• The alternative hypothesis should express the
hopes or suspicions we bring to the data.
• Two-sided alternatives are generally used unless
we have a specific direction firmly in mind
beforehand.
Finding Evidence: Test Statistics
• Hypothesis testing is based on a statistic that
estimates the parameter that appears in the
hypotheses, usually the same estimate we use in a
confidence interval for the parameter.
– When H0 is true, we expect the estimate to take a
value near the parameter value specified by H0.
– Values of the estimate far from the parameter value
specified by H0 give evidence against H0. The
alternative hypothesis determines which directions
count against H0.
Finding Evidence: Test Statistics
• A test statistic measures compatibility between
the null hypothesis and the data.
• A test statistic is a random variable with a known
distribution. Once a sample is drawn from the
population, we can observe a value for our test
statistic.
• Question: How probable is it that our test
statistic takes a value as extreme as or more
extreme than that which we actually observed, if
the null hypothesis is true?
Finding Evidence: P-values
• A significance test assesses the evidence against
the null hypothesis in terms of probability.
• That is, if the observed outcome is unlikely when the null
hypothesis is true but more probable under the
alternative hypothesis, the outcome we observe is
evidence for Ha against H0.
• The less probable the outcome, the stronger the
evidence that H0 is false.
• Not all test statistics are normal, so we translate
the value of a test statistic into a probability.
Finding Evidence: P-values
• A test of significance finds the probability of
getting an outcome as extreme or more extreme
than the actually observed outcome.
• “Extreme” in this context means “far from what
we would expect if H0 were true.”
• The direction is determined by Ha as well as H0.
Finding Evidence: P-values
• The P-value of a test is the probability, computed
assuming H0 is true, of the test statistic taking a value
as extreme or more extreme than that actually observed.
• The P-value of a test provides information about
the amount of evidence that is in favor of the
alternative hypothesis and against the null.
• The smaller the P-value of a test is, the stronger
the evidence against the null hypothesis provided
by the data.
Finding Evidence: Significance Levels
• How should we draw conclusions about our
hypothesis test based on the P-value?
• We need a cut-off point (decisive value) that we
can compare our P-value to so that we can draw a
conclusion or make a decision about our test.
• This cut-off point is a significance level. It is a
number announced in advance and serves as a
standard on how much evidence against H0 we
need to reject H0. Usually denoted as α, and the
corresponding test is called a level α test.
Statistical Significance
• When the P-value is as small as or smaller than the
significance level, i.e., P-value ≤ α, we say that the data
are statistically significant at level α.
• In other words, we have significant evidence against the
null.
• Whether data are considered statistically significant or
not depends on the significance level; data with a
P-value of 0.03 are statistically significant at level 0.05,
but not at level 0.01.
• The P-value itself is the smallest level α at which the
data are significant.
Statistical Significance
• If the P-value is less than 0.01, there is
overwhelming evidence against the null.
• If the P-value is between 0.01 and 0.05, there is
strong evidence against the null.
• If the P-value is between 0.05 and 0.10, there is
weak evidence against the null.
• If the P-value exceeds 0.10, we are led to believe
that there is no real evidence against the null.
General Procedures of Hypothesis
Testing
• Step 1. State the null hypothesis H0 and alternative
hypothesis Ha. Specify the significance level.
• Step 2. Calculate the value of the test statistic on which
the test will be based. This statistic usually measures
how far the data are from H0.
• Step 3. Find the P-value for the observed data.
• Step 4. State a conclusion. If the P-value is less than or
equal to the significance level α, reject the null in favor
of the alternative hypothesis; if it is greater than α,
conclude that the data do not provide sufficient evidence
to reject the null hypothesis.
z Test for a Population Mean
• Let X1, …, Xn be a simple random sample from
N(μ, σ). Here σ is known and μ is the unknown
parameter of interest.
• The null hypothesis is
– H0: μ = μ0
• The alternative hypothesis could be:
– Ha: μ ≠ μ0
– Ha: μ > μ0
– Ha: μ < μ0
z Test for a Population Mean
• The sample mean $\bar{X}$ is normally distributed with
$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
• If H0 is true, $\mu_{\bar{X}} = \mu_0$ and
$Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
has a standard normal distribution.
• Once an SRS is drawn, we observe
$z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.
• If H0 is true, z should be close to 0.
Case 1: Ha: μ ≠ μ0. Then H0 should be rejected if $\bar{x}$ is too far
away from μ0.
-- The P-value is $2P(Z \ge |z|)$.
Case 2: Ha: μ > μ0. Then H0 should be rejected if $\bar{x}$ is too far
away from μ0 in a direction consistent with Ha, or z is much
larger than 0.
-- The P-value is $P(Z \ge z)$.
Case 3: Ha: μ < μ0. Then H0 should be rejected if $\bar{x}$ is too far
away from μ0 in a direction consistent with Ha, or z is much
smaller than 0.
-- The P-value is $P(Z \le z)$.
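The three cases can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the function name `z_test` and the labels "two-sided", "greater", and "less" are made up for this sketch, and it assumes Python 3.8 or later for `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p_value) for a one-sample z test with known sigma.

    Hypothetical helper: 'alternative' selects which of the three
    cases (two-sided, greater, less) defines "extreme".
    """
    # Standardize the observed sample mean under H0: mu = mu0.
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # mean 0, standard deviation 1

    if alternative == "two-sided":    # Case 1, Ha: mu != mu0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":    # Case 2, Ha: mu > mu0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Case 3, Ha: mu < mu0
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p_value
```

A call such as `z_test(178, 170, 65, 400, "greater")` returns the z statistic and the upper-tail P-value for a test of H0: μ = 170 against Ha: μ > 170.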
Example 1
• A new billing system for a store will be cost
effective only if the mean monthly account is
more than $170.
• An SRS of 400 monthly accounts has a mean of
$178.
• If the accounts are normally distributed with σ =
$65, can we conclude that the new system will be
cost-effective? Carry out a level 0.05 test.
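One way to work this example, assuming the hypotheses are H0: μ = 170 versus Ha: μ > 170 (the slide does not state them explicitly), is the short script below. Variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the example; alpha is the stated level.
mu0, xbar, sigma, n, alpha = 170, 178, 65, 400, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))   # 8 / 3.25
p_value = 1 - NormalDist().cdf(z)      # upper tail, since Ha: mu > 170
reject = p_value <= alpha

print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {reject}")
```

Here z is about 2.46 and the P-value is roughly 0.007, which is below 0.05, so under these assumptions the data support the conclusion that the mean monthly account exceeds $170.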
Example 2
• A manufacturer of sprinkler systems used for fire
protection in office buildings claims that the true
average system-activation temperature is 130° F.
A sample of n = 9 systems, when tested, yields a
sample average activation temperature of
131.08° F.
• If the distribution of activation temperatures is normal
with standard deviation 1.5° F, do the data
contradict the manufacturer’s claim at
significance level α = 0.01?
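A possible worked solution, assuming the hypotheses are H0: μ = 130 versus the two-sided alternative Ha: μ ≠ 130 (the slide leaves the alternative implicit), and using the standard normal table value P(Z ≥ 2.16) ≈ 0.0154:

```latex
\[
z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}
  = \frac{131.08 - 130}{1.5/\sqrt{9}}
  = \frac{1.08}{0.5} = 2.16,
\qquad
P\text{-value} = 2P(Z \ge 2.16) \approx 2(0.0154) = 0.0308 > 0.01 .
\]
```

Under these assumptions the data do not contradict the manufacturer's claim at level 0.01, although the same data would be significant at level 0.05.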
Example 3
• The melting point of each of 16 samples of a
certain brand of vegetable oil was determined,
with 94.32 the sample mean.
• Assume that the distribution of melting point is
normal with σ = 1.20.
• Test H0: μ = 95 versus Ha: μ ≠ 95 using a
two-tailed level 0.01 test.
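The two-tailed test in this example could be carried out as in the following sketch. Variable names are illustrative, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the example.
mu0, xbar, sigma, n, alpha = 95, 94.32, 1.20, 16, 0.01

z = (xbar - mu0) / (sigma / sqrt(n))           # -0.68 / 0.30
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # both tails, Ha: mu != 95
reject = p_value <= alpha

print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {reject}")
```

With z near -2.27 the two-sided P-value is about 0.023, which exceeds 0.01, so H0 is not rejected at this level.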
Rejection Region
• The rejection region is a range of values such that if the
test statistic falls into that range, the null hypothesis is
rejected in favor of the alternative hypothesis.
• To use the rejection region method,
– State hypotheses and specify significance level.
– Find corresponding rejection region.
– Calculate test statistic.
– Reject null hypothesis only if value of test statistic
falls within rejection region; otherwise, do not reject
null.
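For a z statistic, the rejection region can be found from standard normal quantiles. The sketch below is one way to do it; the function names `critical_value` and `in_rejection_region` are hypothetical, and the labels for the alternative are a convention chosen for this example.

```python
from statistics import NormalDist


def critical_value(alpha, alternative="two-sided"):
    """Return the cut-off z* that bounds the level-alpha rejection region."""
    std_normal = NormalDist()
    if alternative == "two-sided":
        # alpha is split equally between the two tails.
        return std_normal.inv_cdf(1 - alpha / 2)
    if alternative == "greater":
        return std_normal.inv_cdf(1 - alpha)
    if alternative == "less":
        return std_normal.inv_cdf(alpha)   # a negative number
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


def in_rejection_region(z, alpha, alternative="two-sided"):
    """True if the observed z falls in the rejection region."""
    z_star = critical_value(alpha, alternative)
    if alternative == "two-sided":
        return abs(z) >= z_star
    if alternative == "greater":
        return z >= z_star
    return z <= z_star
```

For instance, `critical_value(0.05)` is about 1.96, so a two-sided level 0.05 test rejects H0 when |z| ≥ 1.96. This gives the same decision as comparing the P-value with α.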
Example
• Bottles of a popular cola drink are supposed to contain
300 milliliters (ml) of cola, but there is some variation
from bottle to bottle. The distribution of the contents is
normal with standard deviation 3 ml. A student who
suspects that the bottles are being under-filled measures
the contents of six bottles. The results are
299.4; 297.7; 301.0; 298.9; 300.2; 297.0
Is this convincing evidence that the mean contents of
cola bottles is less than the advertised 300ml? Carry out
the test at significance level 0.05.
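A sketch of the computation, taking H0: μ = 300 versus Ha: μ < 300 as the hypotheses implied by the question (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist, mean

contents = [299.4, 297.7, 301.0, 298.9, 300.2, 297.0]  # ml, from the example
mu0, sigma, alpha = 300, 3, 0.05

xbar = mean(contents)
z = (xbar - mu0) / (sigma / sqrt(len(contents)))
p_value = NormalDist().cdf(z)   # lower tail, since Ha: mu < 300
reject = p_value <= alpha

print(f"xbar = {xbar:.2f}, z = {z:.2f}, P-value = {p_value:.3f}, reject H0: {reject}")
```

The sample mean is about 299.03 ml, giving z near -0.79 and a P-value a little above 0.2. That is well above 0.05, so these six bottles are not convincing evidence of under-filling.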
Duality between Confidence Intervals
and Tests
• Suppose we construct a 95% confidence interval
for the population mean μ.
• Then the values of μ that are not in our interval
would seem to be incompatible with the data.
• This sounds like a significance test with α = 0.05
• In particular, any level α two-sided significance
test rejects a hypothesis H0: μ = μ0 exactly when
the value of μ0 falls outside a level 1 – α
confidence interval for μ.
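The duality can be checked numerically. The sketch below builds a level 1 – α z interval and a two-sided z test from the same data and reports whether μ0 lies outside the interval and whether the test rejects. The function name `ci_and_test_agree` is hypothetical, and the numbers in the usage note are borrowed from Example 3.

```python
from math import sqrt
from statistics import NormalDist


def ci_and_test_agree(xbar, sigma, n, mu0, alpha):
    """Return (mu0 outside the CI, test rejects H0) for known sigma."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)

    # Level 1 - alpha confidence interval for mu.
    z_star = std_normal.inv_cdf(1 - alpha / 2)
    lower, upper = xbar - z_star * se, xbar + z_star * se
    outside_interval = not (lower <= mu0 <= upper)

    # Two-sided level alpha test of H0: mu = mu0.
    z = (xbar - mu0) / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    rejected = p_value <= alpha

    return outside_interval, rejected
```

With `xbar=94.32, sigma=1.20, n=16, alpha=0.01`, the two flags match for each trial value of μ0: for μ0 = 95 both are False, and for μ0 = 93 both are True.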
Using/Abusing Hypothesis Tests
• Carrying out a hypothesis test is simple; using a
test wisely is not quite so simple.
• Things to consider when using hypothesis tests:
– Choosing a level of significance
– What statistical significance does not mean
– Ignoring lack of significance
– Validity of statistical inference on some data sets
– Searching for significance
Choosing Significance Levels
• A significance test is designed to give a clear
statement of the degree of evidence provided by
the sample against the null hypothesis.
• Choosing a level α in advance makes sense if you
need to make a decision, but not if you wish only
to describe the strength of your evidence.
• Choose α by asking how much evidence is
required to reject the null hypothesis. This
depends on how plausible the null really is.
Choosing Significance Levels
• If the null is a widely-believed assumption, strong
evidence will be needed to reject it.
• Level of evidence required to reject the null is
affected by the consequences of such a decision.
• Standard levels of significance are 1%, 5%, and
10%, but there is no sharp border between
“significant” and “insignificant.”
• For example, suppose one test yields a P-value of
0.0501 and another yields a P-value of 0.0499, and
our chosen level is α = 0.05. Only the second is
formally significant, yet the two provide essentially
the same evidence.
Statistical Significance
• Rejecting a null hypothesis at one of the usual
levels suggests that there is good evidence that an
effect is present.
• The magnitude of the effect may be extremely
small.
• In particular, for large samples, even tiny
deviations from the null will be significant. I.e.,
we will almost invariably reject the null.
• Statistical significance ≠ practical significance.
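The effect of sample size can be illustrated with made-up numbers. In the sketch below the observed difference between the sample mean and μ0 is held at 0.5 with σ = 65 (both values hypothetical), and only n changes. The function name `two_sided_p` is also an assumption of this example.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(diff, sigma, n):
    """Two-sided z-test P-value for an observed difference xbar - mu0."""
    z = diff / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same tiny difference, increasingly large samples.
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: P-value = {two_sided_p(0.5, 65, n):.3g}")
```

The difference of 0.5 is nowhere near significant at n = 100 but is overwhelmingly significant at n = 1,000,000, even though its practical size has not changed.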
Statistical Significance
• Don’t attach too much importance to statistical
significance – pay attention to actual
experimental results.
• Examine plots of data – if effect you are seeking
is not visible in plots, it might not be large
enough to be practically important.
• Giving confidence intervals for parameters of
interest is wise – size of effect is estimated rather
than simply asking if it is too large to occur
through chance alone.
Validity of Statistical Inference
• Badly designed surveys/experiments produce
invalid results. Formal statistical inference
cannot correct the flaws in a design.
• Hypothesis tests and confidence intervals are
based on laws of probability, and randomization
ensures the applicability of those laws.
• Not all data to analyze will arise from
randomized samples/experiments.
• Confidence in a probability model for the data
Searching for Significance
• Statistical significance is highly desired by
researchers.
• The reasoning behind statistical significance
works well if you decide what effect you are
seeking, design an experiment or sample to
search for it, and use a test of significance to
weigh the evidence you get.
• Tempting to make significance itself the object
of the search.
Searching for Significance
• Running many tests on the same data will eventually
allow you to find significance.
• Not convincing to search for any effect or
pattern and find one. Usual reasoning of
statistical inference does not apply for a
successful search for a pattern.
• Cannot legitimately perform a hypothesis test on
the same data that first suggested that
hypothesis.
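A rough calculation shows why many tests almost guarantee a "significant" result. The sketch below assumes, for simplicity, that the tests are independent and that every null hypothesis is true. Neither assumption comes from the lecture, and tests run on the same data are usually not independent, so the numbers are only indicative. The function name is hypothetical.

```python
def prob_at_least_one_false_positive(alpha, k):
    """Chance that at least one of k independent level-alpha tests
    rejects when all k null hypotheses are actually true."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 20, 100):
    prob = prob_at_least_one_false_positive(0.05, k)
    print(f"{k:>3} tests: P(at least one significant) = {prob:.3f}")
```

Under these assumptions, 20 tests at level 0.05 give about a 64% chance of at least one significant result when no real effect exists.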