Confidence Intervals for Population Mean
Business Statistics
Plan for Today
• Inferential Statistics
• Point and Interval Estimates
• Confidence Intervals
• Estimating the required sample size
• Examples
Inferential Statistics
• Goal = use information obtained from a sample to
increase our knowledge about the population from
which the sample was taken (i.e., to estimate or
make inferences about the population)
• 2 types:
– Estimating the value of a population parameter
– Testing a hypothesis
• Using the Sampling Distribution of the
Sample Mean (SDSM) is key
Estimating a population mean
• One of the purposes of randomly sampling a
population is to get an estimate of the mean of the
population
• Usually, the best estimate of a population mean is the
sample mean. Example: the mean SEL test score for a group of
64 students is 77.4, thus 77.4 is the best estimate of the mean for the
population of all students who take the SEL test
• The logic behind it is that you are more likely to get a
sample mean of 77.4 from a population with a mean
of 77.4 than from a population with any other mean: this is a point estimate
Point and Interval Estimates
• Point estimate is when you estimate a specific
value of a population parameter
– The accuracy of the point estimate is measured by the SD of the
sampling distribution (how much sample means typically vary)
• Interval estimate is when you estimate a
range in which the population parameter is
likely to fall
– You can do this because the distribution of sample means
is generally close to a normal curve, thus you know the
percentage of sample means that lie in a given area of the
distribution: about 68% of all sample means lie
within 1 SD of the mean of that distribution
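The 68% figure can be checked with a small simulation. The sketch below is only an illustration under assumed settings that are not in the slides: the population is Uniform(0, 1) (deliberately not bell-shaped), the sample size is 36, and 5000 samples are drawn.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 36                      # sample size (assumed for illustration)
trials = 5000               # number of simulated samples (assumed)
pop_mean = 0.5              # mean of the Uniform(0, 1) population
pop_sd = (1 / 12) ** 0.5    # SD of the Uniform(0, 1) population
std_error = pop_sd / n ** 0.5  # SD of the distribution of sample means

# Draw many samples and record each sample mean.
sample_means = [
    statistics.fmean([random.random() for _ in range(n)])
    for _ in range(trials)
]

# Fraction of sample means within 1 SD (of the sampling distribution)
# of the population mean; it should come out near 0.68.
within = sum(abs(m - pop_mean) <= std_error for m in sample_means)
fraction_within = within / trials
print(f"fraction within 1 SD: {fraction_within:.3f}")
```

The exact fraction varies with the seed and the number of trials, so only rough agreement with 68% should be expected.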
Terminology
• Point estimate: a single number designed to estimate a
quantitative parameter of a population, usually the
corresponding sample statistic
• Interval estimate: an interval bounded by two values that
is calculated from the sample and that is used to estimate the
value of a population parameter
• Confidence interval: an interval estimate with a specified
level of confidence
• Level of confidence 1 − 𝛼 : the proportion of all interval
estimates that include the parameter being estimated –
usually 90% , 95% , 98% or 99%
Example
Take a city, like Trenton, NJ. We want to know
how much time it takes workers living in Trenton
to get to work and back: the commuting time
• Sample = 36 workers from Trenton
• Mean = 49 minutes
• This mean becomes the point estimate for the
population of all Trenton workers
• σ = 15 minutes
Example: continued
• This mean should be close to the population
mean, μ
• SDSM and the CLT tell us how close this mean,
a point estimate, is to the population mean, μ
• Recall: with a large enough sample the SDSM
will be close to normally distributed
Recall: the Empirical Rule
Example: continued
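If we knew the value of $\mu$, the population mean,
then we could have calculated an interval
within which about 95% of the sample average
commuting times should fall:
from $\mu - 2\sigma_{\bar{x}}$ to $\mu + 2\sigma_{\bar{x}}$, i.e.
from $\mu - 2\cdot\dfrac{\sigma}{\sqrt{n}}$ to $\mu + 2\cdot\dfrac{\sigma}{\sqrt{n}}$, i.e.
from $\mu - 2\cdot\dfrac{15}{\sqrt{36}}$ to $\mu + 2\cdot\dfrac{15}{\sqrt{36}}$, i.e.
from $\mu - 5$ to $\mu + 5$ minutes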
Sampling Distribution of $\bar{x}$'s, unknown $\mu$
In algebraic terms: $P(\mu - 5 < \bar{x} < \mu + 5) \approx 95\%$
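The probability statement for the commuting example can be checked numerically. This is a minimal sketch using Python's standard `statistics.NormalDist`; the value chosen for `mu` is an arbitrary placeholder, since the probability does not depend on it.

```python
from math import sqrt
from statistics import NormalDist

sigma = 15   # population standard deviation, minutes (from the example)
n = 36       # sample size (from the example)

std_error = sigma / sqrt(n)    # SD of the sample mean: 15 / 6 = 2.5
half_width = 2 * std_error     # 2 standard errors = 5 minutes

mu = 0.0  # placeholder: the true mean is unknown, and the result does not depend on it
sdsm = NormalDist(mu, std_error)  # normal model for the sample mean

# P(mu - 5 < x_bar < mu + 5)
prob = sdsm.cdf(mu + half_width) - sdsm.cdf(mu - half_width)
print(round(prob, 4))  # 0.9545
```

The exact value for 2 standard errors is about 95.45%, which the Empirical Rule rounds to 95%.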
Interval Estimates
• Interval estimate: an interval bounded by two
values that is calculated from the sample and that is
used to estimate the value of a population parameter
• Level of confidence 1 − 𝛼 : the proportion of all
interval estimates that include the parameter being
estimated
• Confidence interval: an interval estimate with a
specified level of confidence
Example: continued
What are the bounds of the interval centered at
$\bar{x} = 49$ minutes?
From $\bar{x} - 2\sigma_{\bar{x}}$ to $\bar{x} + 2\sigma_{\bar{x}}$, i.e.
from 49 − 5 to 49 + 5 minutes
This means that the 95.44%
confidence interval for μ is
from 44 to 54 minutes.
Confidence Intervals
Summary :
Calculating Confidence Intervals
• Sample mean: $\bar{x}$
• Sample size: $n$
• Population standard deviation: $\sigma$
• Level of confidence we wish to have: $1 - \alpha$
$(1 - \alpha) \cdot 100\%$ expresses how
confident you can be that the population mean falls within this
interval
0.95 · 100% = 95%: you are 95% confident that the
population mean falls within this interval
Step by step
Estimation of Mean μ (σ known)
Assumption: either the general population
has the bell-shaped symmetric distribution,
or the sample size is at least 25.
Confidence Coefficient $z(\alpha/2)$
Constructing a Confidence Interval
• Step 1: Set-Up
– Describe the population parameter of interest
• Step 2: The Confidence Interval Criteria
– Check the assumptions
– Identify the probability distribution and the
formula to be used
– State the level of confidence 𝟏 − 𝜶
• Step 3: The Sample Evidence
– Collect the sample information
Constructing a Confidence Interval
• Step 4: The Confidence Interval
– Determine the confidence coefficient $z(\alpha/2)$
– Find the error bound for a population mean:
$EBM = z(\alpha/2) \cdot \dfrac{\sigma}{\sqrt{n}}$
– Find the lower and upper confidence limits
• Step 5: State the confidence interval:
from $\bar{x} - EBM$ to $\bar{x} + EBM$ (units)
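Steps 4 and 5 can be sketched as a small function. This is an illustrative sketch, not part of the slides: the name `z_interval` is hypothetical, and the confidence coefficient is taken from Python's `statistics.NormalDist` rather than from a printed table, so results can differ slightly from table-based answers.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known.

    Hypothetical helper: returns (lower, upper, ebm).
    """
    alpha = 1 - confidence
    # Confidence coefficient z(alpha/2): cuts off area alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ebm = z * sigma / sqrt(n)  # error bound for the mean
    return x_bar - ebm, x_bar + ebm, ebm

# Commuting example: x_bar = 49, sigma = 15, n = 36, confidence 95.44%
lower, upper, ebm = z_interval(49, 15, 36, 0.9544)
print(f"from {lower:.1f} to {upper:.1f} minutes")
```

For the commuting example this reproduces, up to rounding, the interval from 44 to 54 minutes.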
The confidence coefficient
• Some useful numbers from the table:
If $1 - \alpha = 0.80$ (80%), then $z(\alpha/2) = 1.28$
If $1 - \alpha = 0.90$ (90%), then $z(\alpha/2) = 1.645$
If $1 - \alpha = 0.94$ (94%), then $z(\alpha/2) = 1.88$
If $1 - \alpha = 0.95$ (95%), then $z(\alpha/2) = 1.96$
If $1 - \alpha = 0.96$ (96%), then $z(\alpha/2) = 2.055$
If $1 - \alpha = 0.98$ (98%), then $z(\alpha/2) = 2.33$
If $1 - \alpha = 0.99$ (99%), then $z(\alpha/2) = 2.575$
Check for yourself!
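One way to check these numbers without a printed table is to compute them from the standard normal distribution. The sketch below assumes Python's `statistics.NormalDist`; the helper name `confidence_coefficient` is made up for illustration.

```python
from statistics import NormalDist

def confidence_coefficient(confidence):
    """z(alpha/2) for a given level of confidence 1 - alpha (hypothetical helper)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

# Values as listed on the slide.
table = {0.80: 1.28, 0.90: 1.645, 0.94: 1.88, 0.95: 1.96,
         0.96: 2.055, 0.98: 2.33, 0.99: 2.575}

for level, tabulated in table.items():
    computed = confidence_coefficient(level)
    print(f"1 - alpha = {level:.2f}: table {tabulated}, computed {computed:.4f}")
```

The computed values agree with the slide to within about 0.005; the small differences come from rounding and interpolation in printed tables.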
Example: textbook cost
A random sample of 60 students from X
University has revealed that their average
annual textbook spending is $928. From
previous studies, it is known that the standard
deviation for annual textbook costs can be
taken as $230. Find a 95% confidence interval
for the mean annual textbook costs for all
students at X University.
Example: textbook costs
Step 1: What is the population parameter of
interest?
Step 2: $\sigma = \$230$ is known. Is a sample of 60
students good enough? Yes: 60 is at least 25, so the sampling
distribution is approximately normal; we will
therefore use the standard normal distribution;
the level of confidence is $1 - \alpha = 0.95$ (95%)
Step 3: $n = 60$, $\bar{x} = \$928$
Example: textbook costs
Step 4: 0.95/2 = 0.475, $z(\alpha/2) = 1.96$ (table)
$EBM = z(\alpha/2) \cdot \dfrac{\sigma}{\sqrt{n}} = 1.96 \cdot \dfrac{230}{\sqrt{60}} = 58.2$
$\bar{x} - EBM = 869.8$, $\bar{x} + EBM = 986.2$
Step 5: The 95% confidence interval for the
population mean 𝜇 is:
from $870 to $986
(same precision as the data)
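The arithmetic in Steps 4 and 5 can be reproduced directly. This short sketch uses the table value 1.96 for the confidence coefficient, as the slide does; the variable names are chosen here for illustration.

```python
from math import sqrt

x_bar = 928    # sample mean, dollars
sigma = 230    # population standard deviation, dollars
n = 60         # sample size
z = 1.96       # confidence coefficient for 95%, from the table

ebm = z * sigma / sqrt(n)   # error bound for the mean
lower = x_bar - ebm
upper = x_bar + ebm

print(round(ebm, 1))                    # 58.2
print(round(lower, 1), round(upper, 1)) # 869.8 986.2
print(round(lower), round(upper))       # 870 986
```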
How to decrease the error?
• To decrease the value of EBM (and thus, to
decrease the width of the confidence interval for
$\mu$) there are two possibilities:
(A) Decrease the confidence level. A smaller
confidence level will result in a smaller
$z(\alpha/2)$ and thus, you'll get a smaller EBM.
(B) Increase the size of the sample. A larger
value of $n$ means a larger value of $\sqrt{n}$ and
thus, you'll get a smaller value of EBM.
• Tradeoffs: (A) less certain, (B) more costly
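Both effects can be seen numerically. The sketch below reuses the textbook-cost figures ($\sigma = 230$, $n = 60$) and compares them with two assumed alternatives that are not in the slides: a 90% confidence level, and a sample four times as large. The helper name `error_bound` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def error_bound(confidence, sigma, n):
    """EBM = z(alpha/2) * sigma / sqrt(n) (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)

sigma = 230
baseline = error_bound(0.95, sigma, 60)           # original setting
lower_confidence = error_bound(0.90, sigma, 60)   # option (A): less confidence
larger_sample = error_bound(0.95, sigma, 240)     # option (B): 4 times the sample

print(f"95%, n = 60:  EBM = {baseline:.1f}")
print(f"90%, n = 60:  EBM = {lower_confidence:.1f}")
print(f"95%, n = 240: EBM = {larger_sample:.1f}")
```

Because EBM is proportional to $1/\sqrt{n}$, quadrupling the sample size halves the error bound, which is why option (B) gets expensive quickly.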
Example: practice
A survey by Future Shop involving 35 households
in the area revealed the mean spending of $850
on home electronics during the last year.
Construct a 98% confidence interval for the
average annual spending on home electronics for
all households in the area, if the population
standard deviation is known to be $300.
Answer: from $732 to $968.
Estimating the sample size
• If we wish the error EBM to be smaller than a
certain value, 𝜀, but the confidence level is
fixed at 1 − 𝛼 , we can choose the necessary
sample size:
$\varepsilon > EBM = z(\alpha/2) \cdot \dfrac{\sigma}{\sqrt{n}}$
Thus, $n > \left(\dfrac{z(\alpha/2) \cdot \sigma}{\varepsilon}\right)^{2}$
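The step from the bound on EBM to the bound on $n$ is a routine rearrangement; it is written out here assuming $\varepsilon > 0$ and $\sigma > 0$, so that multiplying and squaring preserve the inequality:

```latex
\begin{align*}
\varepsilon &> z(\alpha/2)\cdot\frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &> \frac{z(\alpha/2)\cdot\sigma}{\varepsilon} \\
n &> \left(\frac{z(\alpha/2)\cdot\sigma}{\varepsilon}\right)^{2}
\end{align*}
```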
Estimating the sample size
• The number $\left(\dfrac{z(\alpha/2) \cdot \sigma}{\varepsilon}\right)^{2}$ rounded up to the
nearest integer is denoted by $n_{min}$: the
minimum required sample size.
• Example: a supermarket manager needs to
estimate the average weekly grocery spending
by his customers at a 90% level of confidence
and with an error not exceeding $10. What is
the minimum sample size needed, if he knows
that the population standard deviation is $60?
Example: grocery shopping
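• Solution.
• Given: $1 - \alpha = 0.9$, $\sigma = \$60$, $\varepsilon = \$10$
• Find: $n_{min}$
• First, we have $z(\alpha/2) = 1.645$
• Now, we compute:
$\left(\dfrac{z(\alpha/2) \cdot \sigma}{\varepsilon}\right)^{2} = \left(\dfrac{1.645 \cdot 60}{10}\right)^{2} = 97.4$
Thus, the minimum required sample size is
$n_{min} = 98$ customers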
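The same calculation can be sketched in code. The function name `min_sample_size` is hypothetical, and the confidence coefficient is passed in as the table value, following the slide.

```python
from math import ceil

def min_sample_size(z, sigma, epsilon):
    """Round (z * sigma / epsilon)^2 up to the nearest integer (hypothetical helper)."""
    return ceil((z * sigma / epsilon) ** 2)

# Grocery example: 90% confidence (z = 1.645), sigma = $60, error at most $10
raw_value = (1.645 * 60 / 10) ** 2
n_min = min_sample_size(1.645, 60, 10)
print(round(raw_value, 1), n_min)  # 97.4 98
```

Rounding goes up rather than to the nearest integer, because 97 customers would leave the error bound slightly above $10.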
Example: practice
An insurance company wants to estimate the
average mileage driven by residents per week in
Hamilton, so that the error does not exceed 20
km at the 99% level of confidence. From other
studies they know that the population standard
deviation can be taken as 100 km. Estimate the
sample size needed for this study.
Answer: 166 drivers