Chapter 12
Confidence Intervals and Hypothesis Tests for Means
© 2010 Pearson Education
12.1 The Sampling Distribution for the Mean
We found confidence intervals for proportions to be
$\hat{p} \pm ME$
where the ME was equal to a critical value, $z^*$, times $SE(\hat{p})$.
Our confidence intervals for means will be
$\bar{y} \pm ME$
where the ME will be a critical value times $SE(\bar{y})$.
The standard deviation of the sample mean is
$SD(\bar{y}) = \dfrac{\sigma}{\sqrt{n}}$
To use this formula, we would need to know the true value of the population standard deviation, σ, which we almost never do.
Instead of σ, we use s, the sample standard deviation from the data. This gives the following formula for the standard error:
$SE(\bar{y}) = \dfrac{s}{\sqrt{n}}$
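As a quick illustration of the standard error formula, the sketch below computes $SE(\bar{y}) = s/\sqrt{n}$ with Python's standard `statistics` module. The function name `standard_error` and the data values are hypothetical, chosen only for illustration; `statistics.stdev` divides by n – 1, which matches the sample standard deviation s.

```python
import math
import statistics

def standard_error(sample):
    """SE(ybar) = s / sqrt(n), where s is the sample SD (divides by n - 1)."""
    n = len(sample)
    s = statistics.stdev(sample)
    return s / math.sqrt(n)

# Hypothetical data, for illustration only
sample = [12.1, 9.8, 11.4, 10.7, 13.0, 10.2, 11.9, 12.5]
print(f"mean = {statistics.fmean(sample):.3f}, SE = {standard_error(sample):.3f}")
```

For the five values 1 through 5, s = √2.5 and SE = √(2.5/5) ≈ 0.707.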
Gosset’s t
William S. Gosset discovered that when he used the standard error $s/\sqrt{n}$ in place of $\sigma/\sqrt{n}$, the shape of the curve was no longer Normal.
He called the new model Student’s t. It is always bell-shaped, but its details change with the sample size.
The Student’s t-models form a family of related distributions
depending on a parameter known as degrees of freedom.
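A small simulation can make Gosset’s observation concrete. The sketch below assumes a Normal population with μ = 0 and σ = 1 and samples of size 5 (illustrative choices, not from the text). It standardizes each sample mean twice, once with the true σ and once with the sample s, and counts how often each value lands beyond ±1.96, the Normal model’s 95% cutoff. The function name `tail_fractions` is hypothetical.

```python
import math
import random
import statistics

def tail_fractions(n=5, reps=20000, cutoff=1.96, seed=1):
    """Fraction of standardized sample means beyond +/- cutoff.

    z uses the true sigma; t uses the sample SD s in its place.
    Assumes a Normal population with mu = 0 and sigma = 1.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    z_hits = t_hits = 0
    for _ in range(reps):
        ys = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = statistics.fmean(ys)
        z = (ybar - mu) / (sigma / math.sqrt(n))                  # known sigma
        t = (ybar - mu) / (statistics.stdev(ys) / math.sqrt(n))  # estimated s
        z_hits += abs(z) > cutoff
        t_hits += abs(t) > cutoff
    return z_hits / reps, t_hits / reps

z_frac, t_frac = tail_fractions()
print(f"beyond ±1.96 with sigma: {z_frac:.3f}; with s: {t_frac:.3f}")
```

With the true σ, about 5% of values fall beyond ±1.96, as the Normal model predicts. With s in place of σ, noticeably more do, which is the fatter-tailed behavior that the Student’s t-model with n – 1 = 4 degrees of freedom describes.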
12.2 A Confidence Interval for Means
Student’s t-models are unimodal, symmetric, and bell-shaped, just like the Normal model.
But t-models with only a few degrees of freedom (solid curve below) have a lower peak than the Normal model (dashed curve below) and much fatter tails.
As the degrees of freedom increase, the t-models look more and more like the Normal model.
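To see the shape comparison numerically, the sketch below evaluates the standard Normal density and the Student’s t density (written with `math.lgamma` to avoid overflow for large df) at the center, x = 0, and in the tail, x = 3. The function names `normal_pdf` and `t_pdf` are hypothetical helpers, not a library API.

```python
import math

def normal_pdf(x):
    """Standard Normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

print(f"{'df':>6} {'height at 0':>12} {'height at 3':>12}")
for df in (1, 2, 5, 30, 1000):
    print(f"{df:>6} {t_pdf(0, df):12.4f} {t_pdf(3, df):12.5f}")
print(f"{'Normal':>6} {normal_pdf(0):12.4f} {normal_pdf(3):12.5f}")
```

The t heights at 0 rise toward the Normal value of about 0.399 as df grows (a lower peak for small df), while at x = 3 the t-models with few degrees of freedom have more density than the Normal (fatter tails).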
12.3 Assumptions and Conditions
Independence Assumption
There is no way to check independence of the data, but we
should think about whether the assumption is reasonable.
Randomization Condition: The data arise from a random
sample or suitably randomized experiment.
10% Condition: The sample size should be no more than 10% of the population. For means, our samples generally are, so this condition will be a problem only if the population is small.
Normal Population Assumption
Student’s t-models won’t work for data that are badly skewed. We assume the data come from a population that follows a Normal model. Because no real data are exactly Normal, this assumption is an idealization, so instead we check a “Nearly Normal” condition.
Nearly Normal Condition: The data come from a distribution
that is unimodal and symmetric. This can be checked by
making a histogram.
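A histogram is the text’s tool for checking the Nearly Normal Condition. As a minimal, dependency-free sketch, the hypothetical `text_histogram` function below bins the data and prints one row of `#` characters per bin, which is enough to spot multiple modes or strong skewness by eye; in practice a plotting tool would usually be used instead.

```python
def text_histogram(data, bins=8, width=40):
    """Print a crude text histogram and return the bin counts."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / bins or 1.0  # guard against all-equal data
    counts = [0] * bins
    for y in data:
        i = min(int((y - lo) / step), bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    biggest = max(counts)
    for i, c in enumerate(counts):
        left = lo + i * step
        bar = "#" * round(width * c / biggest)
        print(f"{left:8.2f} | {bar} {c}")
    return counts

# Hypothetical data, for illustration only
text_histogram([1, 2, 2, 3, 3, 3, 4, 4, 5], bins=5)
```

Here the counts are 1, 2, 3, 2, 1: a single, symmetric peak, consistent with the Nearly Normal Condition.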
• For very small samples (n < 15), the data should follow a
Normal model very closely. If there are outliers or strong
skewness, t methods shouldn’t be used.
• For moderate sample sizes (n between 15 and 40), t
methods will work well as long as the data are unimodal and
reasonably symmetric.
• For sample sizes larger than 40, t methods are safe to use
unless the data are extremely skewed. If outliers are present,
analyses can be performed twice, with the outliers and
without.
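The advice to run the analysis with and without outliers can be sketched as follows. The data values are hypothetical, with 48.0 standing in for a suspected outlier, and `mean_and_se` is a hypothetical helper; the sketch only reports the sample mean and standard error for each version, the quantities any t interval or test would be built from.

```python
import math
import statistics

def mean_and_se(data):
    """Return (sample mean, standard error s / sqrt(n))."""
    return statistics.fmean(data), statistics.stdev(data) / math.sqrt(len(data))

# Hypothetical measurements; 48.0 plays the role of a suspected outlier.
data = [12.0, 14.5, 15.1, 13.2, 16.0, 14.4, 15.3, 13.8, 48.0]
suspect = 48.0
without = [y for y in data if y != suspect]

for label, d in (("with outlier", data), ("without outlier", without)):
    m, se = mean_and_se(d)
    print(f"{label:>16}: n = {len(d)}, mean = {m:.2f}, SE = {se:.2f}")
```

Here the single large value raises both the mean and the standard error noticeably, so the two analyses would tell quite different stories.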
In business, the mean is often the value of consequence.
Even when we must sample from a very skewed distribution,
the Central Limit Theorem tells us that the sampling
distribution of our sample mean will be close to Normal.
We can use Student’s t methods without much worry as long
as the sample size is large enough.
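The Central Limit Theorem claim can be checked by simulation. Real compensation data are not reproduced here, so the sketch below uses an exponential model as a stand-in for a strongly right-skewed population (an assumption made only for illustration). It compares the skewness of individual draws with the skewness of means of samples of size 100, using a simple moment-based skewness measure in the hypothetical helper `skewness`.

```python
import random
import statistics

def skewness(xs):
    """Moment-based sample skewness: the average cubed z-score."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)

rng = random.Random(42)
# Stand-in for a very skewed population: exponential with mean 1.
population_draws = [rng.expovariate(1.0) for _ in range(20000)]
# Means of 2000 samples, each of size 100.
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(100)) for _ in range(2000)]

print(f"skewness of individual values: {skewness(population_draws):.2f}")
print(f"skewness of sample means:      {skewness(means):.2f}")
```

For an exponential population the skewness of individual values is 2, while the skewness of means of 100 values is 2/√100 = 0.2, so the simulated means should look far closer to Normal.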
Example: The histogram below displays the compensation of 500 CEOs. We see an extremely skewed distribution.
Example (continued): Taking samples of 100 CEOs and computing the mean of each sample, we obtain the nearly Normal plot of the sample means below.
12.4 Cautions About Interpreting Confidence Intervals
• If the confidence interval is for the mean, then do not
interpret the results in terms of individuals.
• Don’t forget that the true mean does not vary, but the
confidence interval will vary based on the sample.
• Don’t suggest that a particular confidence interval somehow
sets the standard for every other interval.
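The first two cautions can be illustrated by simulation: the population mean stays fixed while the interval changes from sample to sample, and “95% confidence” describes the long-run fraction of intervals that capture the mean. The sketch assumes a hypothetical Normal population with μ = 50 and σ = 10, samples of size 25, and the tabled 95% critical value 2.064 for 24 df; `simulate_intervals` is a hypothetical helper.

```python
import math
import random
import statistics

N = 25                  # sample size, so df = 24
T_STAR = 2.064          # 95% critical value for 24 df from a standard t-table
MU, SIGMA = 50.0, 10.0  # hypothetical population; MU is fixed and never changes

def simulate_intervals(reps=4000, seed=7):
    """Draw `reps` fresh samples and return the 95% t-interval from each."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(reps):
        ys = [rng.gauss(MU, SIGMA) for _ in range(N)]
        ybar = statistics.fmean(ys)
        me = T_STAR * statistics.stdev(ys) / math.sqrt(N)
        intervals.append((ybar - me, ybar + me))
    return intervals

intervals = simulate_intervals()
for lo, hi in intervals[:3]:
    print(f"({lo:.2f}, {hi:.2f})")  # each sample gives a different interval
covered = sum(lo <= MU <= hi for lo, hi in intervals) / len(intervals)
print(f"fraction of intervals containing MU: {covered:.3f}")
```

No single interval is special: each is one draw from a procedure that captures the fixed mean about 95% of the time.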
12.5 One-Sample t-Test
Finding t-Values by Hand
The Student’s t-model is different for each value of degrees
of freedom.
Printed tables therefore typically limit themselves to 80%, 90%, 95%, and 99% confidence levels.
We can use technology to give critical values for any number
of degrees of freedom and for any confidence levels we
need. More precision won’t necessarily help make good
business decisions.
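Where no table is at hand, critical values can be computed. The sketch below is a standard-library-only stand-in for a statistics package (SciPy’s `scipy.stats.t.ppf`, for example, returns t quantiles directly): it integrates the t density with Simpson’s rule to get the cumulative probability, then uses bisection to find t* for a given confidence level and df. The function names `t_cdf`, `t_critical`, and `t_interval` are hypothetical; `t_interval` applies the one-sample t-interval $\bar{y} \pm t^*_{n-1} \times SE(\bar{y})$.

```python
import math
import statistics

def t_cdf(x, df, steps=1000):
    """P(T <= x): Simpson's rule on [0, |x|] (steps is even) plus symmetry about 0."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(conf, df):
    """Two-sided critical value t* with P(-t* < T < t*) = conf, by bisection."""
    target = 1 - (1 - conf) / 2
    hi = 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t*
        hi *= 2
    lo = 0.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(sample, conf=0.95):
    """One-sample t-interval: ybar +/- t*_{n-1} * s / sqrt(n)."""
    n = len(sample)
    ybar = statistics.fmean(sample)
    me = t_critical(conf, n - 1) * statistics.stdev(sample) / math.sqrt(n)
    return ybar - me, ybar + me

# Critical values for 19 df at the usual table confidence levels
for conf in (0.80, 0.90, 0.95, 0.99):
    print(f"{conf:.0%}: t* = {t_critical(conf, 19):.3f}")

# Hypothetical sample, for illustration only
print(t_interval([1.0, 2.0, 3.0, 4.0, 5.0]))
```

For 19 df the printed values should agree with a t-table to about three decimals (roughly 1.328, 1.729, 2.093, and 2.861). A fixed number of Simpson steps scaled to |x| keeps the integration accurate near the critical values that matter.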
A typical t-table is shown here. The table shows the critical values for varying degrees of freedom, df, and for varying confidence levels.
Since the t-models get closer to the Normal model as df increases, the final row has critical values from the Normal model and is labeled “∞”.
For example, suppose we’ve performed a one-sample t-test with 19 df, obtained a t-statistic of 1.639, and we want the upper-tail P-value.
From the table, we see that 1.639 falls between the critical values 1.328 and 1.729, whose upper-tail probabilities are 0.10 and 0.05. All we can say is that the P-value lies between those two tail probabilities, so 0.05 < P < 0.10.
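Technology can replace the bracketing in this example with an exact P-value. The sketch below includes a Simpson’s-rule t cumulative probability (a hypothetical helper, standing in for a statistics library) and adds a hypothetical `one_sample_t_test`, which computes $t = (\bar{y} - \mu_0)/SE(\bar{y})$ with n – 1 df and returns the upper-tail P-value for the alternative μ > μ₀.

```python
import math
import statistics

def t_cdf(x, df, steps=1000):
    """P(T <= x): Simpson's rule on [0, |x|] (steps is even) plus symmetry about 0."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t_test(sample, mu0):
    """Return (t, df, upper-tail P) for H0: mu = mu0 against HA: mu > mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t_stat = (statistics.fmean(sample) - mu0) / se
    df = n - 1
    return t_stat, df, 1 - t_cdf(t_stat, df)

# The value from the example: t = 1.639 with 19 df
p_upper = 1 - t_cdf(1.639, 19)
print(f"upper-tail P = {p_upper:.4f}")  # falls between 0.05 and 0.10
```

For a two-sided alternative, the P-value would be twice the tail probability beyond |t|.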
12.6 Sample Size
We know that a larger sample will almost always give better
results, but more data costs money, effort, and time.
We know how to find the margin of error for the mean.
$ME = t^*_{n-1} \times SE(\bar{y})$
We also know how to find the standard error for the mean.
$SE(\bar{y}) = \dfrac{s}{\sqrt{n}}$
From these equations we obtain an equation for the sample size n.
$ME = t^*_{n-1} \dfrac{s}{\sqrt{n}}$
Solving for n gives
$n = \left( \dfrac{t^*_{n-1}\, s}{ME} \right)^2$
The equation has several values that we don’t know.
We need to know s, but we won’t know s until we collect
some data, and we want to calculate the sample size before
we collect the data.
Often a “good guess” for s is sufficient.
If we have no idea what the value for s is, we could run a
small pilot study to get some feeling for the size of the
standard deviation.
Without knowing n, we don’t know the degrees of freedom, and we can’t find the critical value, $t^*_{n-1}$.
One common approach is to use the corresponding $z^*$ value from the Normal model.
For example, if you’ve chosen a 95% confidence interval, then use 1.96 (or 2).
If your estimated sample size is 60 or more, your $z^*$ was probably a good guess. If it’s smaller, use $z^*$ first to find n, then replace $z^*$ with the corresponding $t^*_{n-1}$ and calculate the sample size once more.
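The two-pass procedure can be sketched in code. The hypothetical `sample_size` function solves $n = (\text{critical value} \times s / ME)^2$, rounding up, first with $z^*$ from `statistics.NormalDist` and then, if that n is below 60, once more with $t^*_{n-1}$. The t critical value comes from a hypothetical numerical-integration helper standing in for a statistics library; in practice s would be a good guess or a pilot-study estimate.

```python
import math
from statistics import NormalDist

def t_cdf(x, df, steps=1000):
    """P(T <= x): Simpson's rule on [0, |x|] (steps is even) plus symmetry about 0."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(conf, df):
    """Two-sided critical value t* with P(-t* < T < t*) = conf, by bisection."""
    target = 1 - (1 - conf) / 2
    hi = 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t*
        hi *= 2
    lo = 0.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sample_size(s, me, conf=0.95):
    """n so that the margin of error is about `me`, given a guessed SD `s`.

    Pass 1 uses z*; if that n is under 60, pass 2 replaces z* with t*_{n-1}.
    """
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n = max(2, math.ceil((z_star * s / me) ** 2))  # need n >= 2 for n - 1 df
    if n >= 60:
        return n
    t_star = t_critical(conf, n - 1)
    return math.ceil((t_star * s / me) ** 2)

print(sample_size(s=10, me=3))  # hypothetical guess for s and target ME
print(sample_size(s=10, me=1))
```

With a guessed s = 10 and a desired margin of error of 3 at 95% confidence, the $z^*$ pass gives n = 43; because that is below 60, the t pass with 42 df raises the estimate slightly.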
Sample size calculations are never exact.
The margin of error you find after collecting the data won’t
match exactly the one you used to find n.
Before you collect data, it’s always a good idea to check whether the sample size is large enough to give you a good chance of learning what you want to know.
*12.7 Degrees of Freedom – Why n – 1?
If we knew the true population mean, μ, we could find the standard deviation using n instead of n – 1:
$s = \sqrt{\dfrac{\sum (y - \mu)^2}{n}}$
But we don’t know μ, so we use $\bar{y}$ instead. For any sample, $\bar{y}$ is as close to the data values as possible (it minimizes the sum of squared deviations), so the population mean μ will typically be farther away.
If we use $\sum (y - \bar{y})^2$ instead of $\sum (y - \mu)^2$ in the equation to calculate s, our standard deviation will be too small.
We compensate for this by dividing by n – 1 instead of by n:
$s = \sqrt{\dfrac{\sum (y - \bar{y})^2}{n - 1}}$
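A simulation shows why dividing by n – 1 compensates. The sketch assumes a hypothetical Normal population with σ = 2 (so σ² = 4) and samples of size 5, and `average_variance_estimates` is a hypothetical helper. It averages the sum of squared deviations from $\bar{y}$ divided by n and divided by n – 1 over many samples; variances (the squares of the standard deviations) are compared because that is where the n – 1 correction is exact.

```python
import random
import statistics

def average_variance_estimates(n=5, sigma=2.0, reps=20000, seed=3):
    """Average the divide-by-n and divide-by-(n - 1) variance estimates."""
    rng = random.Random(seed)
    total_n = total_n_minus_1 = 0.0
    for _ in range(reps):
        ys = [rng.gauss(0.0, sigma) for _ in range(n)]
        ybar = statistics.fmean(ys)
        ss = sum((y - ybar) ** 2 for y in ys)  # deviations from ybar, since mu is unknown
        total_n += ss / n
        total_n_minus_1 += ss / (n - 1)
    return total_n / reps, total_n_minus_1 / reps

by_n, by_n_minus_1 = average_variance_estimates()
print(f"true variance: 4.0, divide by n: {by_n:.2f}, divide by n - 1: {by_n_minus_1:.2f}")
```

Dividing by n averages out near σ²(n – 1)/n = 3.2, too small, while dividing by n – 1 averages out near the true value of 4.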
What Can Go Wrong?
First, you must decide when to use Student’s t methods.
• Don’t confuse proportions and means. Use Normal models with proportions. Use Student’s t methods with means.
• Be careful of interpretation when confidence intervals overlap. Don’t assume that two population means are equal just because their confidence intervals overlap.
Student’s t methods work only when the Normal Population
Assumption is true.
• Beware of multimodality. If you see this, try to separate the
data into groups.
• Beware of skewed data. If the data are skewed, try re-expressing them.
• Investigate outliers. If they are clearly in error, remove them. If there is no clear reason to remove them, run the analysis with and without the outliers.
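Re-expression often tames skewness. The sketch assumes hypothetical lognormal data (a common stand-in for right-skewed quantities such as compensation) and compares a moment-based skewness measure, computed by the hypothetical helper `skewness`, before and after taking logarithms.

```python
import math
import random
import statistics

def skewness(xs):
    """Moment-based sample skewness: the average cubed z-score."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)

rng = random.Random(11)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(20000)]  # hypothetical skewed data
logged = [math.log(y) for y in raw]                          # re-expressed on the log scale

print(f"skewness before: {skewness(raw):.2f}, after log: {skewness(logged):.2f}")
```

Because the logarithm of lognormal data is exactly Normal, the logged values should show skewness near 0, and t methods could then be applied on the log scale.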
There are other risks when doing inferences about means.
• Watch out for bias. Measurements can be biased.
• Make sure data are independent. Consider whether there
are likely violations of independence in the data collection
methods.
• Make sure that data are from an appropriately randomized
sample.
What Have We Learned?
• What we can say about a population mean is inferred from
the data using the mean and standard deviation of a
representative random sample.
• To describe the sampling distribution of sample means, we use a model selected from the Student’s t family based on our degrees of freedom.
• Our ruler for measuring the variability in sample means is the standard error.
$SE(\bar{y}) = \dfrac{s}{\sqrt{n}}$
• We find the margin of error for a confidence interval using that standard error ruler and a critical value based on a Student’s t-model.
• We use that standard error ruler to test hypotheses about the population mean.
• The reasoning of inference, the need to verify that the
appropriate assumptions are met, and the proper
interpretation of confidence intervals and P-values all
remain the same regardless of whether we are investigating
means or proportions.