```Chapter 12
Confidence Intervals for Means
Pearson
12.1 The Sampling Distribution for the Mean
We found confidence intervals for proportions to be
$\hat{p} \pm ME$
where the ME was equal to a critical value, z*, times SE($\hat{p}$).
Our confidence intervals for means will be
$\bar{y} \pm ME$
where the ME will be a critical value times SE($\bar{y}$).
The standard deviation of the sample mean is given below.
$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$
So we need to know the true value of the population standard
deviation σ, which we rarely do.
Instead of σ, we will use s, the sample standard deviation from
the data. We get the following formula for standard error.
$SE(\bar{y}) = \frac{s}{\sqrt{n}}$
Gosset’s t
William S. Gosset discovered that when he used the standard
error $s/\sqrt{n}$, the shape of the curve was no longer Normal.
He called the new model the Student’s t, which is a model that is
always bell-shaped, but the details change with the sample size.
The Student’s t-models form a family of related distributions
depending on a parameter known as degrees of freedom.
Student’s t-models are unimodal, symmetric, and bell-shaped, just
like the Normal model.
But t-models (solid curve below) with only a few degrees of
freedom have a narrower peak than the Normal model (dashed
curve below) and have much fatter tails.
As the degrees of freedom increase,
the t-models look more and more
like the Normal model.
Example: Find the standard error
Data from a survey of 25 randomly selected customers found a
mean age of 31.84 years and the standard deviation was 9.84
years.
What is the standard error of the mean?
$SE(\bar{y}) = \frac{s}{\sqrt{n}} = \frac{9.84}{\sqrt{25}} = 1.968$
How would the standard error change if the sample size had
been 100 instead of 25? (Assume that s = 9.84 years.)
$SE(\bar{y}) = \frac{s}{\sqrt{n}} = \frac{9.84}{\sqrt{100}} = 0.984$, which is half as large
12.2 A Confidence Interval for Means
Finding t-Values
The Student’s t-model is different for each value of degrees of
freedom.
Typically we limit ourselves to 80%, 90%, 95%, and 99%
confidence levels.
We can use technology to give critical values for any number of
degrees of freedom and for any confidence levels we need. More
precision won’t necessarily help make good business decisions.
A typical t-table is shown here.
The table shows the critical
values for varying degrees of
freedom, df, and for varying
confidence levels.
Since the t-models get closer to
the Normal as df increases, the
final row has critical values from
the Normal model and is labeled
“∞”.
For example, suppose we’ve
performed a one-sample t-test
with 19 df and a t-statistic of
1.639, and we want the upper tail
P-value.
From the table, we see that 1.639
falls between 1.328 and 1.729. All
we can say is that the P-value lies
between the P-values of these two
critical values, so 0.05 < P < 0.10.
Example: Construct a Confidence Interval
Data from a survey of 25 randomly selected customers found a
mean age of 31.84 years and the standard deviation was 9.84
years.
Construct a 95% confidence interval for the mean. Interpret the
interval.
$\bar{y} \pm t^* \times SE(\bar{y}) = 31.84 \pm (2.064)(1.968)$
$= 31.84 \pm 4.062$
$= (27.78, 35.90)$
Interpret the interval. We’re 95% confident the true mean age of
all customers is between 27.78 and 35.90 years.
12.3 Assumptions and Conditions
Independence Assumption
There is no way to check independence of the data, but we
should think about whether the assumption is reasonable.
Randomization Condition: The data arise from a random
sample or suitably randomized experiment.
10% Condition: The sample size should be no more than 10%
of the population. For means our samples generally are, so this
condition will only be a problem if our population is small.
Normal Population Assumption
Student’s t-models won’t work for data that are badly skewed.
We assume the data come from a population that follows a
Normal model. Data being Normal is idealized, so we have a
“nearly normal” condition we can check.
Nearly Normal Condition: The data come from a distribution
that is unimodal and symmetric. This can be checked by
making a histogram.
Nearly Normal Condition:
• For very small samples (n < 15), the data should follow a
Normal model very closely. If there are outliers or strong
skewness, t methods shouldn’t be used.
• For moderate sample sizes (n between 15 and 40), t methods
will work well as long as the data are unimodal and reasonably
symmetric.
• For sample sizes larger than 40 or 50, t methods are safe to
use unless the data are extremely skewed. If outliers are
present, analyses can be performed twice, with the outliers and
without.
In business, the mean is often the value of consequence.
Even when we must sample from a very skewed distribution,
the Central Limit Theorem tells us that the sampling distribution
of our sample mean will be close to Normal.
We can use Student’s t methods without much worry as long
as the sample size is large enough.
The histogram below displays the compensation of 500 CEOs.
We see an extremely skewed distribution.
Taking many samples of 100 CEOs, we obtain the nearly Normal
plot below for the sample means.
Example: Check Assumptions and Conditions
Data from a survey of 25 randomly selected customers found a
mean age of 31.84 years and the standard deviation was 9.84
years. A 95% confidence interval for the mean is (27.78,
35.90). Check conditions for this interval.
Independence: Data were gathered from a random sample and
should be independent.
10% Condition: These customers are fewer than 10% of the
customer population.
Nearly Normal: The histogram is unimodal and approximately
symmetric.
Confidence Intervals
Confidence intervals for means offer new, tempting, wrong
interpretations. Here are some ways to keep from going astray:
• Don’t say, “95% of all the policies sold by this sales rep have
profits between \$942.48 and \$1935.32.” The confidence interval
is about the mean, not about the measurements of individual
policies.
• Don’t say, “We are 95% confident that a randomly selected
policy will have a net profit between \$942.48 and \$1935.32.”
This false interpretation is also about individual policies rather
than about the mean of the policies.
• Don’t say, “The mean profit is \$1438.90 95% of the time.”
That’s about means, but still wrong. It implies that the true
mean varies, when in fact it is the confidence interval that
would have been different had we gotten a different sample.
• Don’t say, “95% of all samples will have mean profits between
\$942.48 and \$1935.32.” That statement suggests that this
interval somehow sets a standard for every other interval. In
fact, this interval is no more (or less) likely to be correct than
any other.
• If the confidence interval is for the mean, then do not interpret
the results in terms of individuals.
• Don’t forget that the true mean does not vary, but the
confidence interval will vary based on the sample.
• Don’t suggest that a particular confidence interval somehow
sets the standard for every other interval.
12.5 Sample Size
We know that a larger sample will almost always give better
results, but more data costs money, effort, and time.
We know how to find the margin of error for the mean.
$ME = t^*_{n-1} \times SE(\bar{y})$
We also know how to find the standard error for the mean.
$SE(\bar{y}) = \frac{s}{\sqrt{n}}$
We can determine the sample size by solving this equation for n.
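As a sketch of that step, treat the critical value as a fixed number for the moment (an assumption, since it actually depends on n through the degrees of freedom). Substituting the standard error into the margin of error and rearranging gives:
$$ME = t^*_{n-1} \times \frac{s}{\sqrt{n}} \quad\Longrightarrow\quad n = \left(\frac{t^*_{n-1} \times s}{ME}\right)^2$$
Round the result up to the next whole number, since a sample size must be an integer.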
The equation has several values that we don’t know.
We need to know s, but we won’t know s until we collect some
data, and we want to calculate the sample size before we collect
the data.
Often a “good guess” for s is sufficient.
If we have no idea what the value for s is, we could run a small
pilot study to get some feeling for the size of the standard
deviation.
Without knowing n, we don’t know the degrees of freedom,
and we can’t find the critical value, $t^*_{n-1}$.
One common approach is to use the corresponding z* value
from the Normal model.
For example, if you’ve chosen a 95% confidence level, then
use 1.96 (or 2).
If your estimated sample size is 60 or more, your z* was
probably a good guess. If it’s smaller, use z* at first to find n,
then replace z* with the corresponding $t^*_{n-1}$ and
calculate the sample size once more.
Sample size calculations are never exact.
The margin of error you find after collecting the data won’t match
exactly the one you used to find n.
Before you collect data, it’s always a good idea to know whether
the sample size is large enough to give you a good chance of
being able to tell you what you want to know.
Example: Find the Sample Size
Data from a survey of 25 randomly selected customers
found a mean age of 31.84 years and the standard deviation
was 9.84 years. A 95% confidence interval for the mean is
(27.78, 35.90).
How large a sample is needed to cut the margin of error in
half? Four times as large, or n = 100.
How large a sample is needed to cut the margin of error by
a factor of 10? One hundred times as large, or n = 2500.
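The scaling behind these answers can be written out, assuming that s and the critical value are held fixed (an approximation, because the critical value shrinks slightly as n grows). Here $n_{old} = 25$ is the original sample size and $n_{new}$ is the size being solved for; both labels are introduced only for this illustration.
$$\frac{ME_{new}}{ME_{old}} = \sqrt{\frac{n_{old}}{n_{new}}}$$
$$\frac{1}{2} = \sqrt{\frac{25}{n_{new}}} \;\Rightarrow\; n_{new} = 4 \times 25 = 100 \qquad\qquad \frac{1}{10} = \sqrt{\frac{25}{n_{new}}} \;\Rightarrow\; n_{new} = 100 \times 25 = 2500$$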
12.6 Degrees of Freedom – Why n – 1?
If we know the true population mean, μ, we can find the
standard deviation using n instead of n – 1.
$s = \sqrt{\frac{\sum (y - \mu)^2}{n}}$
We use $\bar{y}$ instead of μ. For any sample, $\bar{y}$ will be as close to the
data values as possible, and the population mean μ will be farther
away.
If we use $\sum (y - \bar{y})^2$ instead of $\sum (y - \mu)^2$ in the equation to
calculate s, our standard deviation will be too small.
We compensate for this by dividing by n – 1 instead of by n.
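Written out in the same notation, the resulting formula for the sample standard deviation is the standard one:
$$s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$$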
First, you must decide when to use Student’s t methods.
• Don’t confuse proportions and means. Use Normal models with
proportions. Use Student’s t methods with means.
• Be careful of interpretation when confidence intervals overlap.
Don’t assume that the means of overlapping confidence intervals
are equal.
Student’s t methods work only when the Normal Population
Assumption is true.
• Beware of multimodality. If you see this, try to separate the data
into groups.
• Beware of skewed data. If it is skewed, try re-expressing the
data.
• Investigate outliers. If they are clearly in error, remove them. If
they can’t be removed, you might run the analysis with and
without the outlier.
There are other risks when doing inference about means.
• Watch out for bias. Measurements can be biased.
• Make sure data are independent. Consider whether there are
likely violations of independence in the data collection methods.
What Have We Learned?
Know the sampling distribution of the mean.
• To apply the Central Limit Theorem for the mean in practical
applications, we must estimate the standard deviation. This
standard error is
$SE(\bar{y}) = \frac{s}{\sqrt{n}}$
• When we use the SE, the sampling distribution that allows for
the additional uncertainty is Student’s t.
Construct confidence intervals for the true mean, µ.
• A confidence interval for the mean has the form $\bar{y} \pm ME$.
• The Margin of Error is $ME = t^*_{df} \times SE(\bar{y})$.
Find critical values by technology or from tables.
• When constructing confidence intervals for means, the correct
degrees of freedom is n – 1.
Check the Assumptions and Conditions before using any
sampling distribution for inference.
Write clear summaries to interpret a confidence interval.