Chapter 8
Introduction to Statistical
Inferences
Parameters and Statistics
• A parameter is a numeric characteristic of a
population or distribution, usually symbolized
by a Greek letter, such as μ, the population
mean.
• Inferential Statistics uses sample information to
estimate parameters.
• A Statistic is a number calculated from data.
• There are usually statistics that do the same
job for samples that the parameters do for
populations, such as x̄, the sample mean.
Using Samples for Estimation
[Diagram: the sample, with its known statistic x̄, is used to estimate the population's unknown parameter μ]
The Idea of Estimation
• We want to find a way to estimate the
population parameters.
• We only have information from a sample,
available in the form of statistics.
• The sample mean, x̄, is an estimator of
the population mean, μ.
• This is called a “point estimate” because it
is one point, or a single value.
Interval Estimation
• There is variation in x̄, since it is a random
variable calculated from data.
• A point estimate doesn’t reveal anything about
how much the estimate varies.
• An interval estimate gives a range of values that
is likely to contain the parameter.
• Intervals are often reported in polls, such as
“56% ±4% favor candidate A.” This suggests we
are not sure it is exactly 56%, but we are quite
sure that it is between 52% and 60%.
• 56% is the point estimate, whereas (52%, 60%)
is the interval estimate.
The Confidence Interval
• A confidence interval is a special interval
estimate involving a percent, called the
confidence level.
• The confidence level tells how often, if samples
were repeatedly taken, the interval estimate
would surround the true parameter.
• We can use this notation: (L,U) or (LCL,UCL).
• L and U stand for Lower and Upper endpoints.
The longer versions, LCL and UCL, stand for
“Lower Confidence Limit” and “Upper
Confidence Limit.”
• This interval is built around the point estimate.
Theory of Confidence Intervals
• Alpha (α) represents the probability that when
the sample is taken, the calculated CI will miss
the parameter.
• The confidence level is given by (1-α)×100%,
and is used to name the interval, so for example,
we may have “a 90% CI for μ.”
• After sampling, we say that we are, for
example, “90% confident that we have
captured the true parameter.” (There is no
probability at this point. Either we did or we
didn’t, but we don’t know.)
How to Calculate CI’s
• There are many variations, but most CI’s have
the following basic structure:
• P ± TS
– Where P is the parameter estimate,
– T is a “table” value equal to the number of standard
deviations needed for the confidence level,
– and S is the standard deviation of the estimate.
• The quantity TS is also called the “Error Bound”
(E) or “Margin of Error.”
• The CI should be written as (L,U) where
L= P-TS, and U= P+TS.
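– For example, in the poll above, P = 56% and TS = 4%, so L = 56% - 4% = 52% and U = 56% + 4% = 60%, giving the interval (52%, 60%).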
A Confidence Interval for μ
• If σ is known, and
• the population is normally distributed,
or n>30 (so that we can say x̄ is
approximately normally distributed),
x̄ ± z_(α/2)·σ_x̄, where σ_x̄ = σ/√n,
gives the endpoints for a (1-α)×100% CI for μ.
• Note how this corresponds to the P ± TS
formula given earlier.
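• As a minimal sketch (not part of the original slides), this interval could be computed in Python as shown below; the sample mean, σ, n, and confidence level used here are hypothetical.
```python
import math
from scipy.stats import norm  # standard normal distribution

def z_confidence_interval(x_bar, sigma, n, confidence=0.90):
    """(1-alpha)*100% CI for mu when sigma is known: x_bar +/- z_(alpha/2)*sigma/sqrt(n)."""
    alpha = 1 - confidence
    z = norm.ppf(1 - alpha / 2)              # z_(alpha/2): upper-tail cutoff
    error_bound = z * sigma / math.sqrt(n)   # the "TS" term (margin of error)
    return x_bar - error_bound, x_bar + error_bound   # (L, U)

# Hypothetical data: n = 36 observations, x_bar = 52.3, known sigma = 6
print(z_confidence_interval(52.3, 6.0, 36, confidence=0.90))
```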
Distribution Details
• What is z / 2?
– α is the significance level, P(CI will miss)
– The subscript on z refers to the upper tail
probability, that is, P(Z>z).
– To find this value in the table, look up the
z-value for a probability of .5-α/2.
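– For example, for a 95% CI, α = .05 and α/2 = .025; looking up an area of .5 - .025 = .475 in the table gives z_(α/2) = 1.96.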
Hypothesis Tests
• So far, we have discussed estimating
parameters.
• For example, use a sample mean to estimate
μ, giving both a point estimate and a CI.
• Now we take a different approach. Suppose
we have an existing belief about the value of
μ. This could come from previous research,
or it could be a standard that needs to be met.
• Examples:
– Previous corn hybrids have achieved 100 bu/acre.
We want to show that our new hybrid does better.
– Advertising claims have been made that there are
20 chips in every chocolate chip cookie. Support
or refute this claim.
Framing the Test
• We start with a null hypothesis. This represents the status
quo, or the conclusion if our test cannot prove anything.
• The null hypothesis is denoted by H0: μ=μ0 where μ0
corresponds to the current belief or status quo. (The equal
sign could be replaced with an inequality if appropriate.)
• Example:
– In the corn problem, if our hybrid is not better, it doesn’t beat the
previous yield achievement of 100 bu/acre. Then we have H0:
μ=100 or possibly H0: μ≤100.
– In the cookie problem, if the advertising claims are correct, we have
H0: μ=20 or possibly H0: μ≥20.
• Notice the choice of null hypothesis is not based on what
we hope to prove, but on what is currently accepted.
Framing the Test
• The alternative hypothesis is the result that
you will get if your research proves something
is different from status quo or from what is
expected.
• It is denoted by Ha: μ≠μ0. Sometimes there is
more than one alternative, so we can write H1:
μ≠μ0, H2: μ>μ0, and H3: μ<μ0.
• In the corn problem, if our yield is more than
100 we have proved that our hybrid is better,
so the alternative Ha: μ>100 is appropriate.
Framing the Test
• For the cookie example, if there are fewer than
20 chips per cookie, the advertisers are
wrong and possibly guilty of false advertising,
so we want to prove Ha: μ<20.
• A jar of peanut butter is supposed to have 16
oz in it. If there is too much, the cost goes up,
while if it is too little, consumers will complain.
Therefore we have H0: μ=16 and Ha: μ≠16.
Hypothesis Tests vs. Confidence Intervals
• A hypothesis test makes use of an estimate,
such as the sample mean, but is not directly
concerned with estimation.
• The point is to determine if a proposed value
of the parameter is contradicted by the data.
• A hypothesis test resembles the legal concept
of “innocent until proven guilty.” The null
hypothesis is innocence. If there is not
enough evidence to reject that claim, it
stands.
Accept vs. Reject
• In scientific studies, the null hypothesis is based on
the current theory, which will continue to be believed
unless there is strong evidence to reject it.
• However, the failure to reject the null hypothesis does
not mean it is true, just as the guilty sometimes do go
free because of lack of evidence.
• Thus, statisticians resist saying “accept H0.” When
there is enough evidence, we reject H0, and replace it
with Ha. H0 is never accepted as a result of the test,
since it was assumed to begin with.
• Therefore, we will use the terms “Reject H0” and “Do
Not Reject H0” (DNR) to describe the results of the
test.
Hypothesis Tests of the Mean
• The null hypothesis is initially assumed true.
• It states that the mean has a particular value, μ0.
• Therefore, it follows that the distribution of x̄ has
the same mean, μ0.
• We reason as follows: If we take a sample, we get
a particular sample mean, x̄. If the null hypothesis
is true, x̄ is not likely to be “far away” from μ0. It
could happen, but it’s not likely. Therefore, if x̄ is
“too far away,” we will suspect something is wrong,
and reject the null hypothesis.
• The next slide shows this graphically.
Comments on the Graph
• What we see in the previous graph is the idea
that lots of sample means will fall close to the
true mean. About 68% fall within one standard
deviation. There is still a 32% chance of
getting a sample mean farther away than that.
So, if a mean occurs more than one standard
deviation away, we may still consider it quite
possible that this is a random fluctuation,
rather than a sign that something is wrong
with the null hypothesis.
More Comments
• If we go to two standard deviations, about
95% of observed means would be
included. There is only a 5% chance of
getting a sample mean farther away than
that. So, if a far-away mean occurs (more
than two standard deviations out), we think
it is more likely that it comes from a
different distribution, rather than the one
specified in the null hypothesis.
Choosing a Significance Level
• The next graph shows what it means to
choose a 5% significance level.
• If the null hypothesis is true, there is only a
5% chance that the standardized sample
mean will be above 1.96 or below -1.96.
• These values will serve as a cutoff for the
test.
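• For example, with α = .05, each tail beyond ±1.96 holds α/2 = .025, so the two tails together hold the full 5%.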
[Graph: sampling distribution of x̄ under H0, with 5% of the area beyond the cutoffs ±1.96]
Decision Time
• We have already shown that we can use a
standardized value instead of x̄ to decide
when to reject. We will call this value Z*, the
standard normal test statistic.
• The criterion by which we decide when to
reject the null hypothesis is called a “decision
rule.”
• We establish a cutoff value, beyond which is
the rejection region. If Z* falls into that region,
we will reject H0.
• The next slide shows this for α=.05.
[Graph: standard normal distribution of Z* with rejection regions beyond ±1.96 for α=.05]
Steps in Hypothesis Testing
1. State the null and alternative hypotheses
2. Determine the appropriate type of test
3. State the decision rule (Define the
rejection region)
4. Calculate the test statistic
5. State the decision and the conclusion in
terms of the original problem
Example
• A jar of peanut butter is supposed to have 16 oz in it. If there is
too much, the cost goes up, while if it is too little, consumers will
complain. Assume the amount filled is normally distributed with a
standard deviation of ½ oz. In a random sample of 20 jars, the
mean amount of peanut butter is 16.15 oz. Conduct a test to see
if the jars are properly filled, using α=.05.
• Step 1: Hypotheses: H0: μ=16 and Ha: μ≠16.
• Step 2: Type of test: The population is normal and standard
deviation is given, use Z-test.
• Step 3: Decision Rule: Reject H0 if Z*>1.96 or Z*<-1.96.
• Step 4: Test Statistic: Z* = (16.15 - 16) / (.5/√20) = .15/.1118 ≈ 1.34
• Step 5: Conclusion: Do not reject H0 and conclude the jars may
be properly filled.
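• A minimal Python sketch of this two-tailed Z-test, using the numbers from the example (the function name is my own, not from the slides):
```python
from math import sqrt
from scipy.stats import norm

def two_tailed_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Test H0: mu = mu0 vs Ha: mu != mu0 when sigma is known."""
    z_star = (x_bar - mu0) / (sigma / sqrt(n))   # standardized test statistic
    cutoff = norm.ppf(1 - alpha / 2)             # 1.96 for alpha = .05
    decision = "Reject H0" if abs(z_star) > cutoff else "Do not reject H0"
    return z_star, decision

# Peanut butter example: n = 20 jars, x_bar = 16.15 oz, sigma = 0.5 oz
print(two_tailed_z_test(16.15, 16, 0.5, 20))     # Z* ≈ 1.34 -> do not reject H0
```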
One-tailed Tests
• Our graphs so far have shown tests with
two tails.
• We have also seen that the alternative
hypothesis could be of the form H2: μ>μ0,
or H3: μ<μ0.
• These are one-tailed tests. The rejection
region only goes to one side, and all of α
goes into one tail (it doesn’t split).
 /2
 /2
Example
• Advertising claims have been made that there are 20
chips in every chocolate chip cookie. A sample of 30
cookies gives an average of 18.5 chips per cookie.
Assume the standard deviation is 1.5 and conduct an
appropriate test using α=.05.
• Step 1: Hypotheses: H0: μ=20 and Ha: μ<20.
• Step 2: Type of Test: Sample is 30 and standard
deviation known, use Z-test.
• Step 3: Decision Rule: Reject H0 if Z*<-1.645.
• Step 4: Test statistic: Z* = (18.5 - 20) / (1.5/√30) = -1.5/.2739 ≈ -5.48
• Step 5: Reject H0 and conclude the cookies contain
fewer than 20 chips per cookie on average.
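• A sketch of the same calculation for this lower-tailed test (again in Python, with the cutoff found from α directly rather than from a table):
```python
from math import sqrt
from scipy.stats import norm

def lower_tailed_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Test H0: mu = mu0 vs Ha: mu < mu0 when sigma is known."""
    z_star = (x_bar - mu0) / (sigma / sqrt(n))   # standardized test statistic
    cutoff = norm.ppf(alpha)    # -1.645 for alpha = .05; all of alpha in one tail
    decision = "Reject H0" if z_star < cutoff else "Do not reject H0"
    return z_star, decision

# Cookie example: n = 30 cookies, x_bar = 18.5 chips, sigma = 1.5
print(lower_tailed_z_test(18.5, 20, 1.5, 30))    # Z* ≈ -5.48 -> reject H0
```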
Making Mistakes
• Hypothesis testing is a statistical process involving
random events. As a result, we could make the
wrong decision.
• A Type I Error occurs if we reject H0 when it is
true. The probability of this is known as α, the
level of significance.
• A Type II Error occurs when we fail to reject a
false null hypothesis. The probability of this is
known as β.
• The Power of a test is 1-β. This is the probability
of rejecting the null hypothesis when it is false.
Classification of Errors
Decision          Actual: H0 True             Actual: H0 False
Reject            Type I Error, P(Error)=α    Correct (“Type B”)
Do Not Reject     Correct (“Type A”)          Type II Error, P(Error)=β
Two numbers describe a test
• The significance level of a test is α, the
probability of rejecting H0 if it is true.
• The power of a test is 1-β, the probability
of rejecting H0 if it is false.
• There is a kind of trade-off between
significance and power. We want
significance small and power large, but
they tend to increase or decrease together.
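• The trade-off can be seen numerically. The sketch below (my own illustration, not from the slides) computes the power of a lower-tailed Z-test against one hypothetical true mean; shrinking α also shrinks the power.
```python
from math import sqrt
from scipy.stats import norm

def power_lower_tailed_z_test(mu0, mu_true, sigma, n, alpha):
    """P(reject H0: mu = mu0) for a lower-tailed Z-test when the true mean is mu_true."""
    se = sigma / sqrt(n)        # standard deviation of x_bar
    cutoff = norm.ppf(alpha)    # reject H0 if Z* < cutoff
    # Z* < cutoff  <=>  x_bar < mu0 + cutoff*se; find that probability under mu_true
    return norm.cdf(cutoff + (mu0 - mu_true) / se)

# Cookie-style numbers with a hypothetical true mean of 19.5 chips per cookie
for a in (0.10, 0.05, 0.01):
    print(a, round(power_lower_tailed_z_test(20, 19.5, 1.5, 30, a), 3))
```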
p-Value Testing
• Say you are reporting some research in biology
and in your paper you state that you have rejected
the null hypothesis at the .10 level. Someone
reviewing the paper may say, “What if you used a
.05 level? Would you still have rejected?”
• To avoid this kind of question, researchers began
reporting the p-value, which is actually the
smallest α that would result in a rejection.
• It’s kind of like coming at the problem from
behind. Instead of looking at α to determine a critical
region, we let the estimate show us the critical
region that would “work.”
How p-Values Work
• To simplify the explanation,
let’s look at a right-tailed
means test. We assume a
distribution with mean μ0 and
we calculate a sample mean.
• What if our sample mean (standardized as Z*) fell
right on the boundary of the critical region?
• This is just at the point where we would reject H0.
• So if we calculate the probability of a value greater than x̄,
this corresponds to the smallest α that results in a rejection.
• If the test is two-tailed, we have to double the probability,
because Z* marks one part of the rejection region, and its
negative, -Z*, marks the other part, in the other tail.
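• As a sketch in Python, the p-values for the two worked examples above (doubled for the two-tailed peanut butter test, one-tailed for the cookies) could be computed like this:
```python
from scipy.stats import norm

# Peanut butter example (two-tailed): Z* ≈ 1.34
p_two_tailed = 2 * (1 - norm.cdf(1.34))   # roughly .18, so do not reject at alpha = .05

# Cookie example (lower-tailed): Z* ≈ -5.48
p_one_tailed = norm.cdf(-5.48)            # far below .05, so reject H0

print(round(p_two_tailed, 4), p_one_tailed)
```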
Using a p-Value
• Using a p-Value couldn’t be easier. If p<α, we reject H0.
That’s it.
• p-Values tell us something about the “strength” of a rejection.
If p is really small, we can be very confident in the decision.
• In real world problems, many p-Values turn out to be like
.001 or even less. We can feel very good about a rejection in
this case. However, if p is around .05 or .1, we might be a
little nervous.
• When Fisher originally proposed these ideas early in the last
century, he suggested three categories of decision:
– p < .05 → Reject H0
– .05 ≤ p ≤ .20 → more research needed
– p > .20 → Accept H0