Chapter 7
Confidence Intervals
J.C. Wang
Goal and Objectives
Goal: to learn confidence intervals
Objectives:
- To understand that each interval has two end-points (lower and upper bound) and to interpret the confidence interval
- To compute the confidence interval: a point estimate ± the margin of error
- To determine the sample size
Outline
Introduction
Applications, Definitions and Notation
Confidence Intervals
Computation
z-Confidence Intervals
t-Confidence Intervals
Sample Size Determination
An Example
Comparing Two Populations
Comparing Means of Two Independent Populations
Comparing Means of Two Dependent Groups
Confidence Interval for Proportion
Confidence Interval for Population Proportion
Sample Size Determination
Applications of Estimation in Business
examples
- Store inventory value
- Manufacturing process
- Distribution process
- Drug delivery
- Auditor
Definitions
- Sample statistic: a value computed from the sample (i.e., from data).
- Point estimate (pt.est): a single sample statistic that estimates a population parameter such as the mean or proportion.
- Interval estimate: a range with a lower bound and an upper bound for the true population parameter; it takes into account the sampling distribution of the point estimate.
Notations
to be discussed and used later
- CI: confidence interval
- CVal: critical value
- ME: margin of error
- SE: standard error
- SD: standard deviation
- pt.est.: point estimate
- $z_{\alpha/2}$: normal distribution critical value (use invNorm)
- $t_{n-1}$: Student's t distribution critical value with n − 1 degrees of freedom (use math solver or invT)
Computation of
confidence intervals
- pt.est ± ME
- The point estimate estimates the population mean µ (by x̄) or the population proportion p (by p̂).
- marginOfError = criticalValue × standardError
- In other words, ME = (CVal)(SE).
iClicker Question 7.1 pre-lecture
Standard Error
- Most of the time we will not know the population SD, but we can compute the sample SE of the mean: $SE_{\bar{x}} = \dfrac{s}{\sqrt{n}}$
- Likewise, we will not know the population proportion, but we can compute the sample SE of the proportion: $SE_{\hat{p}} = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Critical Value
- z for the normal distribution
- t for Student's t-distribution
- The Student's t-distribution has n − 1 degrees of freedom, df = n − 1
z-Critical Value
- Notation: $z_{\alpha/2}$ = upper (100 × α/2)th standard normal percentile
- That is: $P(Z > z_{\alpha/2}) = \alpha/2 \iff P(Z \le z_{\alpha/2}) = 1 - \alpha/2$. So, $z_{\alpha/2}$ = invNorm(1 − α/2).
- Example: a 95% confidence interval leaves 2.5% in each tail of the bell-shaped curve; therefore, the z-CVal is $z_{cv} = z_{.025}$ = invNorm(1 − .025) = invNorm(.975) = 1.96.
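For readers without a TI calculator, the invNorm step can be sketched in Python. This is only an illustration: `z_critical` is a hypothetical helper name, and it relies on the standard library's `statistics.NormalDist` as a stand-in for invNorm.

```python
from statistics import NormalDist

def z_critical(confidence_level: float) -> float:
    """Two-tailed standard normal critical value z_{alpha/2}."""
    alpha = 1 - confidence_level            # error rate
    # invNorm(1 - alpha/2): the point with area 1 - alpha/2 to its left
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_critical(0.95), 2))
```

For a 95% confidence level this should agree with the 1.96 quoted on the slide.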
z-Critical Value
continued
[Figure: standard normal curve with central area 1 − α and area α/2 in each tail; the area to the left of $z_{\alpha/2}$ is 1 − α/2.]
t-Critical Value
using TI calculators
1. math → solver → tcdf(L, U, D) − A/T, where
   - L = tcv (to be solved)
   - U = 9999 (i.e., U = ∞)
   - D = df = n − 1
   - A = α (error rate)
   - T = number of tails = 2 for c.i.
2. or use invT(1 − α/2, df)
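The solver idea (find L so that the upper-tail area equals α/2) can be imitated numerically. The sketch below is not what the calculator does internally. It is one possible approach using only the Python standard library, with hypothetical helper names (`t_pdf`, `t_area_0_to`, `t_critical`), Simpson's rule for the area, and bisection for the root. The search bracket of 0 to 200 is an assumption that covers the usual values of α and df.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_area_0_to(x: float, df: int, steps: int = 2000) -> float:
    """Area under the t density between 0 and x (Simpson's rule, even steps)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3

def t_critical(alpha: float, df: int) -> float:
    """Find L such that the area to the right of L equals alpha / 2."""
    target = 0.5 - alpha / 2      # area between 0 and the critical value
    lo, hi = 0.0, 200.0           # assumed bracket for the root
    for _ in range(60):           # bisection
        mid = (lo + hi) / 2
        if t_area_0_to(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, 13), 4))
```

With α = .05 and df = 13 the result should be close to the invT(1 − .025, 13) value used later in the quiz example.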
Cereal Box Packaging Example
Consider a cereal packaging plant in Battle Creek that is concerned with putting 368 grams of cereal into a box.
- What are the costs associated with putting too much cereal in a box?
- What are the costs associated with putting too little cereal in a box?
- Let’s construct a 95% confidence interval.
Cereal Box Packaging Example
continued
- Suppose sample size n = 25
- Suppose sample average x̄ = 365 grams
- Suppose the SD is a process SD; therefore, σ = 15 grams
- Suppose we want a 95% confidence interval
- Therefore, the critical value is $z_{cv}$ = 1.96
Cereal Box Packaging Example
continued, margin of error
- Recall ME = CVal × SE
- The critical value (CVal) for a 95% CI means that the area under the curve in each tail is 5% ÷ 2 or 0.025; therefore, $z_{cv}$ = invNorm(1 − .025) = invNorm(.975) = 1.96
- $SE = \dfrac{\sigma}{\sqrt{n}} = \dfrac{15}{\sqrt{25}} = 3$
- ME = 1.96 × 3 = 5.88
Cereal Box Packaging Example
continued, confidence interval
- The confidence interval is pt.est ± ME
- CI = 365 ± 5.88 = (359.12, 370.88).
- Therefore, we are 95% confident that the population mean is between 359 and 371 grams.
- Since 368, the value printed on the box, is within the interval, there is no reason to conclude that anything is wrong with the manufacturing process.
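The cereal-box computation can be replayed in a few lines of Python. This is a minimal sketch assuming a known process SD; `z_interval` is a hypothetical helper, not a function from the course materials.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence_level=0.95):
    """z-confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # critical value
    me = z * sigma / math.sqrt(n)                             # ME = CVal x SE
    return x_bar - me, x_bar + me

lower, upper = z_interval(365, 15, 25)
print(round(lower, 2), round(upper, 2))
print(lower < 368 < upper)   # is the 368 gram target inside the interval?
```

The bounds should match (359.12, 370.88) up to rounding of the critical value.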
z-Confidence Interval Using TI Calculators
example
Let’s use a TI calculator:
- Do this: STAT → TESTS → Zinterval → STATS ↓ σ:15 ↓ x̄:365 ↓ n:25 ↓ C-Level:.95 ↓ CALCULATE
- READOUT:
  Zinterval
  (359.12, 370.88)
  x̄ = 365
  n = 25
Since 368, the target for the package, is within the interval, production should continue.
Note on z-Confidence Intervals
- The value of z selected for constructing such a confidence interval is called the critical value for the distribution.
- There are different critical values for each level of confidence (or confidence level, CL), 1 − α, where α = significance level, SL (or error rate).
- Frequently used $z_{cv}$:

  SL     CL     2-tailed CVal
  10%    90%    1.645
  5%     95%    1.96
  1%     99%    2.58

Note: There is a trade-off between the width of the confidence interval and the level of confidence: a higher confidence level gives a wider interval.
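The trade-off can be seen numerically by reusing the cereal example values (σ = 15, n = 25) at the three common confidence levels. This is an illustrative sketch; the variable names are assumptions.

```python
import math
from statistics import NormalDist

sigma, n = 15, 25          # cereal example values
widths = {}
for cl in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - cl) / 2)   # two-tailed critical value
    widths[cl] = 2 * z * sigma / math.sqrt(n)    # interval width = 2 x ME
    print(f"CL={cl:.0%}  z={z:.3f}  width={widths[cl]:.2f}")
```

The printed widths grow as the confidence level rises.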
Problem When SD is Unknown
We have been dealing with N(µ, σ) where σ (population or process SD) is known. What happens when σ is unknown, that is, when no population or process SD is available? Is this requirement rigid? Can we estimate the standard deviation from the sample? Let us review some history first.
History of the Student t Distribution
William Gosset, an employee of Guinness Breweries in Ireland, had a preoccupation with making statistical inferences about the mean when the SD was unknown. Since the employees of the company were not allowed to publish their scientific work under their own names, he chose the pseudonym “Student.” Therefore, his contribution is still known as Student’s t-Distribution.
Comparing Standard Normal Curve
with t curves
[Figure: Comparison of Standard Normal with t Curves. Density curves for N(0,1), t1, t5 and t10; horizontal axis x, vertical axis density.]
t-Confidence Interval for the Mean
summer II quiz example
Construct a 95% CI for the mean score for Summer II Quiz
Data of 14 students
Given: 95% CL, x̄ = 25, s = 10.777, n = 14
CVal = $t_{\alpha/2,\,n-1} = t_{.025,\,13}$ = invT(1 − .025, 13) = 2.1604
$SE = \dfrac{10.777}{\sqrt{14}} = 2.8803$
ME = 2.1604 × 2.8803 = 6.2225
pt.est = 25
95% CI = pt.est ± ME = (18.778, 31.222)
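The same steps can be sketched in Python from the summary statistics. The critical value 2.1604 is taken from the slide rather than computed, and `t_interval` is a hypothetical helper name.

```python
import math

def t_interval(x_bar, s, n, t_crit):
    """t-confidence interval for a mean from summary statistics."""
    se = s / math.sqrt(n)      # standard error of the mean
    me = t_crit * se           # margin of error
    return x_bar - me, x_bar + me

# t_crit = invT(1 - .025, 13), as given on the slide
lower, upper = t_interval(25, 10.777, 14, t_crit=2.1604)
print(round(lower, 2), round(upper, 2))
```

Small differences in the last digit relative to the slide come from rounding the critical value to four decimals.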
t-Confidence Interval for the Mean
using TI calculators
- Do this: STAT → TESTS ↓ TInterval → STATS ↓ x̄:25 ↓ Sx:10.777 ↓ n:14 ↓ C-Level:.95 ↓ CALCULATE
- READOUT:
  Tinterval
  (18.778, 31.222)
  x̄ = 25
  n = 14
- We are 95% confident that the true mean quiz score is between 18.8 and 31.2.
iClicker Question 7.2
Sample Size Determination
based on confidence intervals
- What sample size should we use to estimate the average quiz score if we want 95% confidence, ME = 5, and σ = 10.777?
- $n = \dfrac{z^2 \sigma^2}{ME^2} = \dfrac{1.96^2 \times 10.777^2}{5^2} = 17.8 \approx 18$
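A small Python sketch of this sample-size formula, rounding up to the next whole subject. The function name is hypothetical and the default z = 1.96 assumes a 95% confidence level.

```python
import math

def sample_size_for_mean(sigma, me, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= me, i.e. n = (z * sigma / me)^2 rounded up."""
    return math.ceil((z * sigma / me) ** 2)

print(sample_size_for_mean(10.777, 5))
```

Halving the margin of error roughly quadruples the required sample size, because ME enters the formula squared.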
Slow Wave Sleep Example
page 100, problem #1
Data: 21 20 22 7 9 14 23 9 10 25 15 17 11
- (a) x̄ = 15.6154 and s = 6.1310
- (b) population average and SD: not possible.
- (c) the sample average will typically miss the population average by about one SE.
- (d) $SE_{\bar{x}} = s/\sqrt{n} = 6.1310/\sqrt{13} = 1.7$
- (e) ME = CVal × SE = $t_{.975,\,13-1}$ × 1.7 = 2.1788 × 1.7 = 3.704
- (f) 95% CI is 15.6154 ± 3.704 = (11.91, 19.32)
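Parts (a) through (f) can be checked from the raw data with Python's `statistics` module. This is a sketch: the critical value 2.1788 is copied from the slide, and the variable names are illustrative.

```python
import math
import statistics

sws = [21, 20, 22, 7, 9, 14, 23, 9, 10, 25, 15, 17, 11]
t_crit = 2.1788                   # t critical value for df = 12, from the slide

x_bar = statistics.mean(sws)      # (a) sample mean
s = statistics.stdev(sws)         # (a) sample SD (divides by n - 1)
se = s / math.sqrt(len(sws))      # (d) standard error
me = t_crit * se                  # (e) margin of error
ci = (x_bar - me, x_bar + me)     # (f) 95% confidence interval

print(round(x_bar, 4), round(s, 4), round(se, 4), round(me, 3))
print(tuple(round(b, 2) for b in ci))
```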
Slow Wave Sleep Example
continued
- (f) (continued) This can also be done on the calculator (assuming the data have been entered into list 1, L1): STAT → TESTS ↓ tInterval → DATA ↓ List:L1 ↓ CALCULATE
- (g) If the confidence level is reduced to 90%, the new interval will be shorter.
- (h) 90% CI → (12.585, 18.646)
- (i) Interpret the 90% CI: We are 90 percent confident that the true population average is between 12.585 and 18.646.
Slow Wave Sleep Example
continued
- (j) Does the 95% CI suggest that elderly men over 60 spend 20% of their sleep in REM? No, since 20 (%) is not in the 95% CI.
- (k) What sample size should we use if we change the ME to 2.5?
  $n = \dfrac{CVal^2 \times SD^2}{ME^2} = \dfrac{1.96^2 \times 6.13^2}{2.5^2} = 23.10$, so use n = 24.
Eg., CI
A manager of a consumer electronics company wants to investigate the TV viewing habits of residents of a small midwestern town. A random sample of 40 respondents is selected, and each respondent is instructed to keep a detailed record of all TV viewing in a certain week. The mean viewing time per week was X̄ = 15.3 hours with S = 3.8 hours, and 27 respondents watched the evening news on at least three weeknights. Compute the margin of error for a 95% confidence interval.
- Error rate α = 1 − .95 = .05, so 1 − α/2 = 1 − .025 = .975.
- Degrees of freedom = n − 1 = 40 − 1 = 39.
- CVal = invT(.975, 39) = 2.0227, $SE = \dfrac{3.8}{\sqrt{40}} = .6008$
- ME = 2.0227 × .6008 = 1.215
- 95% CI: (15.3 − 1.215, 15.3 + 1.215) = (14.085, 16.515)
Calculate the Margin of Error
from a given confidence interval
Note that, since a confidence interval is computed by
pt. est. ± ME = (LB, UB)
where LB = Lower confidence Bound and UB = Upper confidence Bound, we have
$ME = \dfrac{UB - LB}{2}$.
So, if a 95% confidence interval is known to be (14.085, 16.515), then
$ME = \dfrac{16.515 - 14.085}{2} = 1.215$
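In code, recovering the margin of error from the bounds is a one-liner. The sketch below also recovers the point estimate as the midpoint, which follows from the same pt. est. ± ME form. Both helper names are hypothetical.

```python
def margin_of_error(lb, ub):
    """Half the width of the interval (LB, UB)."""
    return (ub - lb) / 2

def point_estimate(lb, ub):
    """Midpoint of the interval (LB, UB)."""
    return (lb + ub) / 2

print(round(margin_of_error(14.085, 16.515), 3))
print(round(point_estimate(14.085, 16.515), 3))
```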
iClicker Question 7.3
Comparing Means
of two independent populations
- We are not limited to comparing an average to a constant. Suppose we want to compare the means of two independent populations.
- Parameter of interest: δ = µ1 − µ2
- Recall: CI is pt.est ± ME, where
  pt.est = $\bar{x}_1 - \bar{x}_2$,
  ME = CVal × SE, with CVal = $t_{n_1+n_2-2}$ and $SE = \sqrt{SE_1^2 + SE_2^2}$
Example
battery example
A statistics student designed an experiment to see if there was
any real difference in battery life between brand-name AA
batteries and generic AA batteries. He used six pairs of AA
alkaline batteries from two major battery manufacturers: a well-known
brand name and a generic brand. He measured the
length of battery life while playing a CD player continuously. He
recorded the time (in minutes) when the sound stopped.
Battery Example
continued
       Generic   Brand Name
x̄      206       187.4
S      10.3      14.6
n      6         6
Want 95% CI
- (a) What is the standard error?
- (b) What is the 95% CVal?
- (c) What is the ME?
- (d) What is the 95% CI?
- (e) Does this confidence interval suggest that generic AA batteries will last longer than brand-name AA batteries?
- (f) Interpret the 95% CI.
Battery Example
continued, answers
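- (a) pooled SD = $\sqrt{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} = 12.6$, and $SE = \sqrt{\dfrac{12.6^2}{6} + \dfrac{12.6^2}{6}} = 7.27$
- (b) CVal = $t_{n_1+n_2-2}$ = invT(.975, 10) = 2.2281
- (c) ME = CVal × SE = 2.2281 × 7.27 = 16.1983
- (d) 95% CI → (2.35, 34.85), that is, (206 − 187.4) ± ME computed with the unrounded pooled SD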
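The pooled computation can be sketched in Python from the summary statistics. The critical value 2.2281 is taken from the slide, and `pooled_t_interval` is a hypothetical helper. Because the code carries the unrounded pooled SD, its interval agrees with part (d) rather than with the rounded intermediate values.

```python
import math

def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """Pooled-variance t interval for mu1 - mu2 from summary statistics."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    se = sp * math.sqrt(1 / n1 + 1 / n2)   # same as sqrt(sp^2/n1 + sp^2/n2)
    me = t_crit * se
    diff = x1 - x2
    return sp, se, (diff - me, diff + me)

# generic vs. brand name; t_crit = invT(.975, 10) from the slide
sp, se, ci = pooled_t_interval(206, 10.3, 6, 187.4, 14.6, 6, t_crit=2.2281)
print(round(sp, 4), round(se, 4))
print(tuple(round(b, 2) for b in ci))
```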
Battery Example
continued, answers
- (e) Does this confidence interval suggest that generic AA batteries will last longer than brand-name AA batteries? Yes, because zero is not within the interval.
- (f) Interpret the 95% CI. We are 95% confident that the true mean difference in battery life (generic minus brand name) is between 2.35 and 34.85 minutes.
Battery Example
continued, using TI calculator
- Do this: STAT → TESTS ↓ 2-SampTInt → STATS ↓ x̄1:206.0 ↓ Sx1:10.3 ↓ n1:6 ↓ x̄2:187.4 ↓ Sx2:14.6 ↓ n2:6 ↓ C-Level:.95 ↓ Pooled:Yes ↓ CALCULATE
- READOUT:
  2-sampTInt
  (2.3471, 34.853)
  df=10
  :
  Sxp: 12.6342788
  :
- Zero is not within this interval, so we can conclude that there is a difference between the two means.
iClicker Question 7.4
Comparing Means of 2 dependent groups
- We are not limited to comparing two averages of independent samples. Suppose we want to compare the means of two related samples.
- Remember CI = pt. est. ± ME, where pt. est. = $\bar{D} = \bar{X}_1 - \bar{X}_2$.
- ME = CVal × SE, where CVal = $t_{\alpha/2,\,n-1}$ = invT(1 − α/2, n − 1), and $SE = \dfrac{S_{diff}}{\sqrt{n}}$
Example
computer stock prices
We want to compare January 2002 prices vs. January 2003
prices of computer companies, see page 92.
Computer Stock Prices
          Jan. 02   Jan. 03   Diff.
x̄         25.91     17.96     7.946
s         6.34      5.65      6.1426
size n    5         5         5
Computer Stock Prices Example
continued
- What is the Standard Error?
- What is the 95% Critical Value?
- What is the 95% Margin of Error?
- What is a 95% Confidence Interval?
- Does this confidence interval suggest a difference in stock prices between Jan. 2002 and Jan. 2003?
- Interpret the 95% CI
Computer Stock Prices Example
answers
- $SE = \dfrac{s_{diff}}{\sqrt{n}} = \dfrac{6.1426}{\sqrt{5}} = 2.7471$
- CVal = $t_{.025,\,n-1}$ = invT(1 − .025, 4) = 2.7764
- ME = 2.7764 × 2.7471 = 7.6271
- 95% CI → 7.946 ± 7.6271 = (0.3189, 15.573)
- Does this confidence interval suggest a difference in stock prices between Jan. 2002 and Jan. 2003? Yes, because zero is NOT within the CI.
- Interpret the 95% CI: We are 95% confident that the true difference is between .3189 and 15.573.
Computer Stock Prices Example
answers using TI calculators
- Do this: STAT → EDIT and enter the data into L1 and L2, then place the cursor on L3 and do 2nd2 − 2nd1 (i.e., L2 − L1) → STAT → TESTS ↓ tInterval → DATA ↓ List:L3 ↓ C-Level:.95 ↓ CALCULATE
- READOUT:
  TInterval
  (0.3189, 15.573)
  x̄ = 7.946
  Sx = 6.1426
  n = 5
- Zero is not within this interval, so we can conclude that there is a difference between the two means.
West Michigan Telecom Example
problem 13 on page 104
Some stock market analysts have speculated that parts of West
Michigan Telecom might be worth more than the whole. For
example, the company’s communication systems in Ann Arbor
and Detroit can be sold to other communications companies.
Suppose that a stock market analyst chose nine (9) acquisition
experts and asked each to predict the return (in percent) on
investment (ROI) in the company held to the year 2003 if (i) it
does business as usual, or (ii) if it breaks up its communication
system and sells all its parts. Their predictions follow:
West Michigan Telecom Example, continued
Expert      1    2    3    4    5    6    7    8    9
Not Break   12   21   8    20   16   5    18   21   10
Break Up    15   25   12   17   17   10   21   28   15
- $SE = s_{diff}/\sqrt{n} = 2.8626/\sqrt{9} = 0.9542$
- CVal = $t_{\alpha/2,\,n-1} = t_{.025,\,8}$ = invT(1 − .025, 8) = 2.3060
- ME = 2.306 × .9542 = 2.2004
- 95% CI → (1.0218, 5.4226)
- Does this confidence interval suggest a difference between breaking up the company or not? Yes, because zero is NOT within the CI.
- Interpret the 95% CI: We are 95% confident that the true mean difference in the experts' predicted ROI (Break Up minus Not Break) is between 1.0218 and 5.4226.
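The paired computation can be reproduced from the experts' predictions. The sketch below assumes the pairing of the nine "Not Break" and "Break Up" values as read from the slide's table (the transcript's layout is scrambled, so that pairing is an assumption), and it takes the critical value 2.3060 from the slide.

```python
import math
import statistics

# Predicted ROI (%) per expert; pairing assumed from the slide's table
not_break = [12, 21, 8, 20, 16, 5, 18, 21, 10]
break_up = [15, 25, 12, 17, 17, 10, 21, 28, 15]
t_crit = 2.3060                       # invT(1 - .025, 8), from the slide

diffs = [b - a for a, b in zip(not_break, break_up)]   # Break Up minus Not Break
d_bar = statistics.mean(diffs)        # point estimate of the mean difference
s_diff = statistics.stdev(diffs)      # sample SD of the differences
se = s_diff / math.sqrt(len(diffs))
me = t_crit * se
ci = (d_bar - me, d_bar + me)

print(round(d_bar, 4), round(s_diff, 4), round(se, 4), round(me, 4))
print(tuple(round(b, 4) for b in ci))
```

The mean and SD of the differences should agree with the calculator readout (x̄ = 3.22, Sx = 2.8626), which supports the assumed pairing.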
West Michigan Telecom Example
continued, using TI calculators
- STAT → EDIT and enter the data into L1 and L2, place the cursor on L3 and do 2nd2 − 2nd1 (i.e., L2 − L1), then do STAT → TESTS ↓ tInterval → Data ↓ List:L3 ↓ C-Level:.95 ↓ CALCULATE
- READOUT:
  TInterval
  (1.02, 5.42)
  x̄ = 3.22
  Sx: 2.8626
  n = 9
- Zero is not within this interval, so we can conclude that there is a difference between the two means.
Eg., CI #2
An on-line grocery store in a mid-sized midwestern city has more than 10,000 customers. The following statistics summarize the May 1999 prices for a shopping list of eight items at the on-line grocery and a local supermarket. The average difference between the two shopping lists and the standard deviation of the differences are 0.0375 and 0.22, respectively. What is the 90% margin of error?
- Given: n = 8, D̄ = .0375, Sdiff = .22, want 90% CI
- α = 1 − .9 = .1, 1 − α/2 = .95, df = n − 1 = 8 − 1 = 7.
- CVal = invT(.95, 7) = 1.89458, $SE = \dfrac{.22}{\sqrt{8}} = .07778$
- ME = 1.89458 × .07778 = 0.1474
- If a 90% confidence interval was instead given as (−.1099, .18485), then $ME = \dfrac{.18485 - (-.1099)}{2} = 0.1474$
iClicker Question 7.5
Confidence Interval
for population proportion
Suppose we want to estimate the population proportion using intervals.
- Recall CI is pt.est ± ME
- Therefore, use $\text{pt.est} = \dfrac{\text{success}}{\text{sampleSize}} = \dfrac{x}{n}$
- ME = CVal × SE
- This CI works well if n × p > 5 and n × (1 − p) > 5 (that is, it works well if the expected number of successes and the expected number of failures are both greater than 5).
EAS Sensor Example
If a sales clerk fails to remove the EAS sensor when an item is
purchased, it can result in an embarrassing situation for the
customer. A survey was conducted to study consumer reaction
to such false alarms. Of 250 customers surveyed, 40 said that
if they were to set off an EAS alarm because store personnel
did not deactivate the merchandise, then “they would never
shop at the store again.”
EAS Sensor Example
continued
- $\text{pt.est} = \dfrac{40}{250} = 0.16$, $SE_{\hat{p}} = \sqrt{\dfrac{.16(1-.16)}{250}} = 0.02319$
- CVal = $z_{\alpha/2} = z_{.025}$ = invNorm(1 − .025) = 1.96
- ME = 1.96 × 0.02319 = 0.04544
- 95% CI → (.11456, .20544)
- Interpret the 95% CI: We are 95% confident that the true proportion is between 0.1146 and 0.2054.
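The EAS computation can be sketched in Python as follows. `proportion_interval` is a hypothetical helper that mirrors the pt.est ± CVal × SE recipe; it does not check the n × p > 5 condition.

```python
import math
from statistics import NormalDist

def proportion_interval(x, n, confidence_level=0.95):
    """z-confidence interval for a population proportion."""
    p_hat = x / n                                             # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # critical value
    me = z * math.sqrt(p_hat * (1 - p_hat) / n)               # CVal x SE
    return p_hat, me, (p_hat - me, p_hat + me)

p_hat, me, ci = proportion_interval(40, 250)
print(p_hat, round(me, 5))
print(tuple(round(b, 5) for b in ci))
```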
EAS Sensor Example
continued, using TI calculators
- Do this: STAT → TESTS ↓ 1-PropZint ↓ x:40 ↓ n:250 ↓ C-Level:.95 ↓ CALCULATE
- READOUT:
  Zinterval
  (.11456, .20544)
  p̂ = .16
  n = 250
- We are 95% confident that the true proportion is between 11.46% and 20.54%.
Exercise #17 on page 105
Given: x = 600, n = 2000
- $\text{pt.est} = \dfrac{600}{2000} = 0.3$
- $SE_{\hat{p}} = \sqrt{\dfrac{.3(1-.3)}{2000}} = 0.0102$
- CVal = $z_{\alpha/2} = z_{.025}$ = invNorm(1 − .025) = 1.96
- ME = 1.96 × 0.0102 = 0.0201
- 95% CI → (.2799, .3201)
- Interpret the 95% CI: We are 95% confident that the true proportion is between 0.28 and 0.32.
Sample Size Determination
based on CI of proportion
- What is the true proportion of success p?
- Decide which confidence level to use
- Determine the margin of error that you’re willing to accept
EAS Example
Suppose that you are a student with a grant to study this EAS issue, and you realize that there are not enough funds to gather data on 250 subjects. You want to determine a new sample size by relaxing the confidence level to 90% while using p = .16 and the ME of 0.04544. What is the new sample size?
EAS Example
continued, answer
$n = \dfrac{z^2}{ME^2} \times [\hat{p}(1-\hat{p})]$
- CVal90% = $z_{(1-.9)/2} = z_{.05}$ = invNorm(1 − .05) = 1.645
- ME = .04544
- p̂ = .16 (note: see discussions on next slide)
- $n = \dfrac{1.645^2}{.04544^2} \times [.16(1-.16)] = 176.1$, so use n = 177
Sample Size Determination for Proportion
discussions
$n = \dfrac{z^2}{ME^2} \times [\hat{p}(1-\hat{p})]$
- When a rough estimate of p is available (such as that from a pilot study, or some educated guess), use it for p̂ above. Otherwise, use 1/2 for p̂ (a conservative estimate).
- It is recommended to always round n up to the next integer (as n = 177 here, which is rounded up from 176.1).
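A short Python sketch of this rule, defaulting to the conservative p̂ = 1/2 when no estimate is supplied and always rounding up. The function name and its defaults are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(me, confidence_level, p_hat=0.5):
    """n = z^2 * p_hat * (1 - p_hat) / ME^2, rounded up to the next integer."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return math.ceil(z**2 * p_hat * (1 - p_hat) / me**2)

# EAS example: 90% confidence, ME = .04544, rough estimate p_hat = .16
print(sample_size_for_proportion(0.04544, 0.90, p_hat=0.16))
# Same target without any estimate of p (conservative choice p_hat = 1/2)
print(sample_size_for_proportion(0.04544, 0.90))
```

The conservative choice gives a larger n because p̂(1 − p̂) is largest at p̂ = 1/2.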
Eg., CI #3
In a study reported in a well-known business newspaper, a large plastic container manufacturer surveyed 1007 U.S. workers. Of the people surveyed, 665 indicated that they take their lunch to work with them. Of the 665 taking their lunch, 200 reported that they carry the lunch in a brown bag. Consider the population of U.S. workers who take their lunch to work with them. Set up a 90% confidence interval estimate of the population proportion who take brown-bag lunches.
- Given: x = 200, n = 665, 90% CI
- Do this: STAT → TESTS ↓ 1-PropZint ↓ x:200 ↓ n:665 ↓ C-Level:.9 ↓ CALCULATE
- (.2715, .33) is a 90% CI for the population proportion who take brown-bag lunches.
iClicker Question 7.6
Eg., CI #3, continued
In a study reported in a well-known business newspaper, a large plastic container manufacturer surveyed 1007 U.S. workers. Of the people surveyed, 665 indicated that they take their lunch to work with them. Of the 665 taking their lunch, 200 reported that they carry the lunch in a brown bag. Consider the population of U.S. workers who take their lunch to work with them. After reading this story, an analyst wants to determine the sample size necessary for a 95% confidence level of estimating the true proportion of workers who take their lunch to work with them when the margin of error is 0.03.
- Initial estimate is available: $\hat{p} = \dfrac{200}{665} = 0.3$.
- CVal = invNorm(.975) = 1.96 for 95% confidence level
- The required sample size is $n = \dfrac{CVal^2}{ME^2} \times \hat{p} \times (1-\hat{p}) = \dfrac{1.96^2}{0.03^2} \times 0.3 \times (1-0.3) = 896.4 \approx 897$
iClicker Question 7.7