Chapter 6 Contents
- Introduction to estimation (section 6.1)
- Confidence Interval for a Population Mean – Normal (Z) Statistic (section 6.2)
- Confidence Interval for a Population Mean – Student's t Statistic (section 6.3)
- Large Sample Confidence Interval for a Population Proportion (section 6.4)
- Determining the Sample Size (section 6.5)
- Finite Population Correction for Simple Random Sampling (not covered)
- Confidence Interval for a Population Variance (not covered)

The problem of estimation
- Suppose we have a large population and we are interested in gaining some knowledge of the mean of the population
- Want to find the average income of a household in Michigan
- We draw a sample and use the sample average as an estimate of the population mean
- The sample average just gives a number as the estimate of the unknown population mean
- The sample mean is called a Point Estimate of the population mean
- In general we are interested in some parameter (e.g. mean, median, s.d., number of modes) of an unknown population
- Use a statistic to estimate the parameter (e.g. sample mean, sample median, sample s.d.)
- This would be called the Point Estimate
- A point estimate is almost surely wrong
- If the sample average income is $23288, it is unlikely that the population average is exactly $23288
- It is more reasonable to say the average income is close to $23288
- Confidence intervals are an attempt to do this
Confidence intervals
- Suppose we have a population that can be modeled as Normal
- The population mean \(\mu\) is not known
- Using a sample (of size \(n\)), propose a range of values (Confidence interval) for \(\mu\), typically of the form \(\bar X \pm\) 'Margin of Error'
- State a measure of accuracy of the proposed interval (Confidence Level). This is the probability that we will get a sample such that the unknown \(\mu\) is in the proposed interval
Normal mean. Known standard deviation
- The basic model here is: we have a normal population whose mean \(\mu\) is unknown but whose s.d. \(\sigma\) is known
- We plan to draw a sample of size \(n\) and calculate \(\bar X\) from the sample
- We know from the empirical rule that roughly 95% of the sample averages will be within \(2\sigma_{\bar X}\) of \(\mu_{\bar X}\)
- Since \(\mu_{\bar X} = \mu\) and \(\sigma_{\bar X} = \sigma/\sqrt{n}\), we can say
- roughly 95% of the sample averages will be within \(\mu \pm 2\sigma/\sqrt{n}\)
- or, roughly 95% of the sample averages will be such that \(\mu\) is within \(\bar X \pm 2\sigma/\sqrt{n}\)
- We will make all this a bit more general and precise

Recall: if \(X\) is \(N(\mu, \sigma)\), then
\[ P(\mu - 1.96\sigma < X < \mu + 1.96\sigma) = P(-1.96 < Z < 1.96) = .95 \]
[Figure: standard normal density with the central area 0.95 between \(-1.96\) and \(1.96\) shaded]
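Confidence interval for \(\mu\)
- \( P(\mu_{\bar X} - 1.96\,\sigma_{\bar X} < \bar X < \mu_{\bar X} + 1.96\,\sigma_{\bar X}) = .95 \)
- Since \(\mu_{\bar X} = \mu\) and \(\sigma_{\bar X} = \sigma/\sqrt{n}\),
  \( P(\mu - 1.96\,\sigma/\sqrt{n} < \bar X < \mu + 1.96\,\sigma/\sqrt{n}) = .95 \)
- Rewriting the above using a bit of easy algebra (\(\mu - c < \bar X\) is the same as \(\mu < \bar X + c\), and \(\bar X < \mu + c\) is the same as \(\bar X - c < \mu\)),
  \( P(\bar X - 1.96\,\sigma/\sqrt{n} < \mu < \bar X + 1.96\,\sigma/\sqrt{n}) = .95 \)
- We now call \(\bar X \pm 1.96\,\sigma/\sqrt{n}\) a 95% confidence interval for \(\mu\)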
- State the required confidence level, like 80%, 90%, 95%
- Write the confidence level as \(1 - \alpha = \text{conf. level}/100\) and solve for \(\alpha\)
- If the conf. level is 90%, \(1 - \alpha = .9\), \(\alpha = 0.1\), \(\alpha/2 = .05\)
- From the standard normal table find \(z_{\alpha/2}\) such that the area to the right of \(z_{\alpha/2}\) is \(\alpha/2\)
- If the conf. level is 90%, \(z_{.05} = 1.645\) (often rounded to 1.65)
- The required confidence interval is \(\bar X \pm z_{\alpha/2}\,\sigma/\sqrt{n}\)
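The recipe above can be sketched in Python, assuming Python 3.8+ so that `statistics.NormalDist` is available; its `inv_cdf` method plays the role of the normal table. The helper names `z_critical` and `z_interval` are our own, not from the text.

```python
from statistics import NormalDist

def z_critical(conf_level):
    """z_{alpha/2}: the value with area alpha/2 to its right under N(0, 1)."""
    alpha = 1 - conf_level                      # e.g. 0.90 -> alpha = 0.10
    return NormalDist().inv_cdf(1 - alpha / 2)  # replaces the normal-table lookup

def z_interval(xbar, sigma, n, conf_level):
    """Interval xbar +/- z_{alpha/2} * sigma / sqrt(n) for a known sigma."""
    me = z_critical(conf_level) * sigma / n ** 0.5  # margin of error
    return xbar - me, xbar + me

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

The printed values should match the table values used in these notes up to rounding: 1.282, 1.645, 1.960 and 2.576.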
Conf. interval for \(\mu\), \(\sigma\) unknown
- To recap, a confidence interval for \(\mu\) from a normal population is \(\bar X \pm\) 'Margin of error'
- If the confidence level (or confidence coefficient) is \(100(1 - \alpha)\%\), then 'Margin of error' = \(z_{\alpha/2}\,\sigma/\sqrt{n}\)
- Typically \(\sigma\) is not known
- The sample s.d. (denoted by \(s\)) serves as an estimate of \(\sigma\)
- If \(n\) is large (rule of thumb \(n \ge 30\)), \(s\) is a reasonably accurate estimate of \(\sigma\)
- Use the confidence interval \(\bar X \pm z_{\alpha/2}\, s/\sqrt{n}\)
Interpretation of confidence interval
- We called \(\bar X \pm 1.96\,\sigma/\sqrt{n}\) a 95% confidence interval for \(\mu\) because
  \( P(\bar X - 1.96\,\sigma/\sqrt{n} < \mu < \bar X + 1.96\,\sigma/\sqrt{n}) = .95 \)
- What is the random quantity in the above equation?
- \(\bar X\). It is \(\bar X\) that changes from sample to sample; \(\mu\) is a fixed (unknown) number
- In repeated sampling, in 95% of cases the confidence interval calculated from the sample will contain the unknown \(\mu\)
- Say a sample of size \(n\) is 'good' if the 95% conf. interval coming from this sample contains the unknown mean
- Roughly 95% of the samples will be 'good'
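The repeated-sampling interpretation can be checked by simulation. The sketch below draws many samples from a hypothetical \(N(50, 10)\) population with \(n = 25\) (values, repetition count and seed are assumptions chosen for illustration; `coverage` is our own name) and counts how often the sample is 'good'.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=50.0, sigma=10.0, n=25, conf_level=0.95, reps=10_000, seed=1):
    """Fraction of simulated samples whose interval xbar +/- z*sigma/sqrt(n) contains mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    me = z * sigma / n ** 0.5            # same margin of error for every sample (sigma known)
    good = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - me < mu < xbar + me:   # this sample is 'good'
            good += 1
    return good / reps

print(coverage())
```

The printed fraction should be close to 0.95 but not exactly equal, since it is based on a finite number of simulated samples.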
Problem 6.4
tophat 3, 13
- \(n = 90\), \(\bar x = 25.9\), \(s = 2.7\)
- \(\sigma\) unknown but \(n \ge 30\), so \(s\) is used in place of \(\sigma\)
- Confidence coefficient = 95%: \(1 - \alpha = .95\), \(\alpha/2 = .025\), \(z_{.025} = 1.96\)
  conf. interval \(= 25.9 \pm 1.96 \cdot 2.7/\sqrt{90} = 25.9 \pm .56\)
- Confidence coefficient = 90%: \(1 - \alpha = .90\), \(\alpha/2 = .05\), \(z_{.05} = 1.65\)
  conf. interval \(= 25.9 \pm 1.65 \cdot 2.7/\sqrt{90} = 25.9 \pm .47\)
- Confidence coefficient = 99%: \(1 - \alpha = .99\), \(\alpha/2 = .005\), \(z_{.005} = 2.576\)
  conf. interval \(= 25.9 \pm 2.576 \cdot 2.7/\sqrt{90} = 25.9 \pm .73\)
Problem 6.11
- \(n = 307\), \(\bar x = 3.11\), \(s = .66\)
- (b) Confidence coefficient = 98%: \(1 - \alpha = .98\), \(\alpha/2 = .01\), \(z_{.01} = 2.326\)
  conf. interval \(= 3.11 \pm 2.326 \cdot .66/\sqrt{307} = 3.11 \pm .088 = (3.02, 3.20)\)
tophat 24, 32
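A small sketch that re-does the arithmetic of Problems 6.4 and 6.11, using \(s\) in place of \(\sigma\) as the large-sample rule allows. The z values are the table values quoted in the notes; `large_sample_me` is a hypothetical helper name.

```python
from math import sqrt

def large_sample_me(z, s, n):
    """Margin of error z * s / sqrt(n), using s in place of sigma (n >= 30)."""
    return z * s / sqrt(n)

# Problem 6.4: n = 90, xbar = 25.9, s = 2.7 (z values from the notes)
for level, z in (("95%", 1.96), ("90%", 1.65), ("99%", 2.576)):
    print(level, "25.9 +/-", round(large_sample_me(z, 2.7, 90), 2))

# Problem 6.11(b): n = 307, xbar = 3.11, s = .66, 98% -> z_.01 = 2.326
me = large_sample_me(2.326, 0.66, 307)
print("98%:", round(3.11 - me, 2), round(3.11 + me, 2))
```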
variations
- The basic equation in the \(\sigma\) known case is
  Margin of Error \(= ME = z_{\alpha/2}\,\sigma/\sqrt{n}\)
- The equation connects three quantities: ME, the confidence coefficient (through \(z_{\alpha/2}\)), and \(n\)
- If any two are given, we can find the third

Find confidence interval
- We are given \(n\) and the confidence coefficient, hence \(\alpha\) and so \(z_{\alpha/2}\)
- Find the confidence interval
- Equivalently, find ME and set \(\bar X \pm ME\)

Find Confidence Coefficient
- We are given \(n\) and ME
- Find the confidence coefficient
- \(ME = z_{\alpha/2}\,\sigma/\sqrt{n}\), so \(z_{\alpha/2} = ME\sqrt{n}/\sigma\)
- Use the normal table to determine \(\alpha\) from \(P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha\)

Example
Beechcraft, Inc. wants to estimate the average time it takes for the Beechjet corporation jet to climb from sea level to 41,000 feet. From previous experience, company engineers believe that the standard deviation of climbing time is 4 minutes. The model is tested in 100 random trials.
1. If the sample mean is 30 minutes, find an 80% confidence interval for the average climbing time from sea level to 41,000 feet.
2. If Beechcraft, Inc. uses 0.515 as the ME, what is the confidence level (in percentage) associated with the resulting confidence interval?
3. If Beechcraft, Inc. wants the 80% confidence interval for the mean to have width 0.55, find the required sample size.

- \(n = 100\), \(\sigma = 4\)
- Part 1: \(\bar x = 30\), \(1 - \alpha = .8\), \(\alpha/2 = .1\), \(z_{\alpha/2} = 1.28\)
  confidence interval: \(30 \pm 1.28 \cdot 4/\sqrt{100} = 30 \pm .512 = (29.488, 30.512)\)
- Part 2: \(ME = .515 = z_{\alpha/2} \cdot 4/\sqrt{100}\), so \(z_{\alpha/2} = .515\sqrt{100}/4 = 1.29\)
  confidence level \(= 100\,P(-1.29 < Z < 1.29) \approx 80\%\)

Sample size determination
- Here the confidence coefficient is given and ME is given
- \(ME = z_{\alpha/2}\,\sigma/\sqrt{n}\)
- The problem is to find \(n\); the required sample size is
  \[ n = \left(\frac{z_{\alpha/2}\,\sigma}{ME}\right)^2 \]
- Sometimes the accuracy is stated in terms of the width of the interval. Note \(ME = \text{width}/2\)
- This is called the sample size determination problem
- In the last problem (part 3), \(ME = .55/2 = .275\) and, with \(z_{\alpha/2} = 1.29\) from part 2,
  \( n = (1.29 \cdot 4/.275)^2 = 352.07 \)
- So 353 samples are required to ensure that a margin of error of .275 has 80% confidence (\(n\) is always rounded up)
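The three Beechcraft questions amount to solving \(ME = z_{\alpha/2}\sigma/\sqrt{n}\) for the interval, for \(z_{\alpha/2}\), and for \(n\). The Python sketch below assumes the z values used in the notes (1.28 and 1.29) rather than exact quantiles, and takes the required width to be 0.55, the value implied by the worked solution's \(ME = .275\).

```python
from math import ceil, sqrt
from statistics import NormalDist

sigma, n = 4.0, 100           # known s.d. (minutes) and number of trials
std_normal = NormalDist()

# 1. n and confidence level given -> interval (z_.10 = 1.28 from the table)
xbar, z = 30.0, 1.28
me = z * sigma / sqrt(n)
print("80% CI:", (round(xbar - me, 3), round(xbar + me, 3)))

# 2. n and ME given -> confidence level: solve ME = z * sigma / sqrt(n) for z
me_given = 0.515
z_solved = me_given * sqrt(n) / sigma
level = std_normal.cdf(z_solved) - std_normal.cdf(-z_solved)  # P(-z < Z < z)
print("z =", round(z_solved, 3), "confidence level about", round(100 * level), "%")

# 3. confidence level and ME given -> n = (z * sigma / ME)^2, rounded up
me_target = 0.55 / 2          # assumed width 0.55, so ME is half of it
n_needed = (1.29 * sigma / me_target) ** 2
print("n =", round(n_needed, 2), "-> take", ceil(n_needed))
```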
Relationship between \(n\), \(\alpha\) and ME
tophat 84
- \(ME = z_{\alpha/2}\,\sigma/\sqrt{n}\)
- With the confidence level (hence \(z\)) fixed, ME decreases as \(n\) increases (the larger the sample, the smaller the ME)
- With ME fixed, as \(n\) increases, \(z\) increases, i.e. the confidence level increases
- With the confidence level fixed, as ME decreases, \(n\) increases (we need more samples to get a narrower interval)
tophat 11, 16, 95
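A quick numeric illustration of the first two relationships, using a hypothetical \(\sigma = 4\) and a hypothetical fixed \(ME = 0.8\) chosen only for this example.

```python
from math import sqrt
from statistics import NormalDist

sigma = 4.0                            # hypothetical known s.d.
z95 = NormalDist().inv_cdf(0.975)      # z_.025, about 1.96

# Confidence level fixed: ME = z * sigma / sqrt(n) shrinks as n grows
for n in (25, 100, 400):
    print("n =", n, "ME =", round(z95 * sigma / sqrt(n), 3))

# ME fixed (hypothetical 0.8): larger n allows a larger z, hence higher confidence
me = 0.8
for n in (25, 100, 400):
    z = me * sqrt(n) / sigma
    print("n =", n, f"level = {2 * NormalDist().cdf(z) - 1:.1%}")
```

Because ME is proportional to \(1/\sqrt{n}\), quadrupling the sample size halves the margin of error.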
\(\sigma\) unknown
- The basic model here is: we have a normal population whose mean \(\mu\) and s.d. \(\sigma\) are both unknown
- We want to draw a sample of size \(n\) and get a confidence interval for the unknown mean \(\mu\)
- Since \(\sigma\) is not known, we estimate it by the sample standard deviation \(s\)
- This gives us the "t-statistic"
  \[ t = \frac{\bar X - \mu}{s/\sqrt{n}} \]
- The distribution of the t-statistic is no longer normal. It has a distribution called Student's t-distribution with \((n-1)\) degrees of freedom

Student's t-Statistic
The t-statistic has a sampling distribution very much like that of the z-statistic: mound-shaped, symmetric, with mean 0. The primary difference between the sampling distributions of t and z is that the t-statistic is more variable than the z-statistic.
Degrees of Freedom
The actual amount of variability in the sampling
distribution of t depends on the sample size n. A
convenient way of expressing this dependence is to say
that the t-statistic has (n – 1) degrees of freedom (df).
Student's t Distribution
[Figure: standard normal (z) density compared with t (df = 13) and t (df = 5); all are bell-shaped and symmetric about 0, and the t curves have 'fatter' tails]
t-Table
If we want the t-value with an area of .025 to its right and 4 df, we look in the table under the column \(t_{.025}\) for the entry in the row corresponding to 4 df. This entry is \(t_{.025} = 2.776\). The corresponding standard normal z-score is \(z_{.025} = 1.96\).
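The t-table can also be reproduced numerically. In practice a library call such as SciPy's `scipy.stats.t.ppf` does this directly; to stay dependency-free, the sketch below instead integrates the t density with Simpson's rule and inverts the CDF by bisection. The function names, step count and bisection bracket are our own choices, accurate enough to agree with a printed table to about three decimals.

```python
from math import exp, lgamma, log, pi

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus the area from 0 to x (Simpson's rule)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_upper(alpha, df):
    """t value with area alpha (< 0.5) to its right, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:   # too much area to the right: move up
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# t_.025 for increasing df: starts at the table's 2.776 and approaches z_.025 = 1.96
for df in (4, 30, 1000):
    print(df, round(t_upper(0.025, df), 3))
```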
- Note that the distribution depends (unlike the known \(\sigma\) case) on \(n\)
- The confidence interval for \(\mu\) when \(\sigma\) is unknown is given by
  \[ \bar X \pm t_{(n-1),\alpha/2}\,\frac{s}{\sqrt{n}} \]
  where
  - \(t_{(n-1),\alpha/2}\) is found from the t-table (\(n-1\) degrees of freedom, area \(\alpha/2\) to its right)
  - \(s\) is the sample standard deviation
- Degrees of freedom: before estimating \(\sigma\) by \(s\), the \(n\) observations are used to compute \(\bar X\). If \(n\) numbers are to have a fixed mean, then only \((n-1)\) of these numbers can be arbitrary.

- Given the confidence level and ME, find the sample size
  - We cannot give an answer like in the known \(\sigma\) case
  - One solution: take a preliminary sample and estimate \(\sigma\). Proceed with this estimate to decide on a sample size for the second stage
  - We will not pursue this 'two-stage' sampling here
- Given an interval, \(\bar X\) and \(s\), find the confidence level
  - We can answer with software
  - With tables: only limited answers
- The t-table (in the text) does not list all degrees of freedom, so we will work with the closest one in the table
- As \(n \to \infty\), the t-distribution converges to the standard normal
Problems 13, 29, 32, 106

Problem 13
- \(n = 1751\), \(\bar x = 6563\), \(s = 2484\)
- Find a 90% confidence interval
- Since \(n\) is large, we do not need the t-distribution; we can work with normal tables
- \(1 - \alpha = .9\), \(\alpha/2 = .05\), \(z_{\alpha/2} = 1.645\)
- confidence interval: \(6563 \pm 1.645 \cdot 2484/\sqrt{1751} = 6563 \pm 97.65\)

Problem 29
- \(n = 7\), \(\bar x = 89.86\), \(s = 11.63\)
- \(n\) is small and \(\sigma\) is unknown, so we cannot use the normal approximation
- \(1 - \alpha = .95\), \(\alpha/2 = .025\), \(t_{6,.025} = 2.447\)
- confidence interval: \(89.86 \pm 2.447 \cdot 11.63/\sqrt{7} = (79.10, 100.62)\)

Problem 32
- \(n = 20\), \(\bar x = 3.8\), \(s = 1.2\)
- \(1 - \alpha = .9\), \(\alpha/2 = .05\), \(t_{19,.05} = 1.729\)
- confidence interval: \(3.8 \pm 1.729 \cdot 1.2/\sqrt{20} = 3.8 \pm .464\), i.e. \((3.34, 4.26)\)
- The average LOS for women is 4.6. In this hospital the mean LOS is below 4.26 (the upper limit of the interval), so women in this hospital have a smaller LOS.
tophat 40, 41, 46
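The three intervals differ only in the critical value: \(z\) for the large sample, \(t\) with \(n-1\) df for the small ones. A sketch with one helper (the name `interval` is hypothetical), plugging in the table values quoted above:

```python
from math import sqrt

def interval(xbar, s, n, critical):
    """Return (lower, upper, ME) for xbar +/- critical * s / sqrt(n)."""
    me = critical * s / sqrt(n)
    return round(xbar - me, 2), round(xbar + me, 2), me

# Problem 13: large n, so the normal value z_.05 = 1.645 is used
print(interval(6563, 2484, 1751, 1.645))
# Problem 29: n = 7, sigma unknown -> t with 6 df, t_.025 = 2.447
print(interval(89.86, 11.63, 7, 2.447))
# Problem 32: n = 20 -> t with 19 df, t_.05 = 1.729
print(interval(3.8, 1.2, 20, 1.729))
```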
Conf. Int. for PROPORTIONS
- We have a large population with two categories 'S' and 'F'
- \(p\), the proportion of 'S' in the population, is unknown
- We want a confidence interval for \(p\) based on a "large" sample
- \(\hat p\): the sample proportion
- For large \(n\), \(\hat p\) is \(N\!\left(p, \sqrt{p(1-p)/n}\right)\)
- The confidence interval is
  \[ \hat p \pm z\sqrt{\frac{p(1-p)}{n}} \]
- Since this involves the unknown \(p\), replace it by its estimate \(\hat p\):
  \[ \hat p \pm z\sqrt{\frac{\hat p(1-\hat p)}{n}} \]
- \(z\): from normal tables, using the prescribed conf. level
Conditions Required for a Valid Large-Sample Confidence Interval for p
- A random sample is selected from the target population.
- The sample size \(n\) is large. (This condition will be satisfied if both \(n\hat p \ge 15\) and \(n(1-\hat p) \ge 15\). Note that \(n\hat p\) and \(n(1-\hat p)\) are simply the number of successes and number of failures, respectively, in the sample.)

Adjusted Confidence interval
- The confidence interval just discussed performs poorly if \(p\) is close to 0 or to 1
- A better confidence interval, due to Agresti, is
  \[ \tilde p \pm z_{\alpha/2}\sqrt{\frac{\tilde p(1-\tilde p)}{n+4}} \]
  where \(\tilde p = \frac{x+2}{n+4}\) and \(x\) is the number of successes in the sample
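A sketch comparing the two intervals for a hypothetical small sample (2 successes in 40 trials, 95% confidence), where \(n\hat p = 2\) is far below 15. One visible symptom of the large-sample interval's trouble near 0 is a negative lower limit; the adjusted interval stays inside [0, 1] for this sample. The function names are our own.

```python
from math import sqrt

def wald_interval(x, n, z):
    """Large-sample interval p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = x / n
    me = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me

def agresti_interval(x, n, z):
    """Adjusted interval: p_tilde = (x + 2)/(n + 4), standard error uses n + 4."""
    p_tilde = (x + 2) / (n + 4)
    me = z * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - me, p_tilde + me

print(wald_interval(2, 40, 1.96))     # lower limit falls below 0
print(agresti_interval(2, 40, 1.96))
```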
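sample size
- \(ME = z\sqrt{p(1-p)/n}\)
- Solving for \(n\):
  \[ n = \left(\frac{z}{ME}\right)^2 \hat p(1-\hat p) \]
- \(\hat p\) depends on the sample
- \(ME = \text{width}/2\)
- If we have a prior estimate of \(p\), use that in place of \(\hat p\)
- If nothing is known about \(p\), be conservative and use \(p = 1/2\) (since \(p(1-p) \le 1/4\), this never gives too small a sample), i.e.,
  \[ n = \left(\frac{z}{ME}\right)^2 \frac{1}{4} \]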
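A short sketch showing how the guess for \(p\) changes the required sample size, with a hypothetical \(z = 1.96\) and \(ME = .05\); `sample_size_proportion` is our own name. The result is rounded up, because \(n\) must be a whole number at least as large as the formula gives.

```python
from math import ceil

def sample_size_proportion(z, me, p=0.5):
    """n = (z / ME)^2 * p(1 - p), rounded up; p = 1/2 is the conservative default."""
    return ceil((z / me) ** 2 * p * (1 - p))

# p(1 - p) is largest at p = 1/2, so that choice gives the largest n
for p in (0.1, 0.3, 0.5):
    print(p, sample_size_proportion(1.96, 0.05, p))
```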
problems 43, 45, 54, 73

Problem 43
- \(n = 225\), \(\hat p = .46\)
- \(n\hat p = 103.5\), \(n(1-\hat p) = 121.5\)
- Both are larger than 15, so it is o.k. to use the method
- conf. interval: \(z_{.025} = 1.96\), \(\hat p = .46\), \(1-\hat p = .54\)
  \( .46 \pm 1.96\sqrt{.46 \cdot .54/225} = .46 \pm .065 \)

Problem 45
- \(n = 2045\), \(\hat p = 818/2045 = .4\)
- \(\hat p\) is approximately \(N(.4, \sqrt{.4 \cdot .6/2045}) = N(.4, .011)\)
- conf. interval: \(z_{.025} = 1.96\), \(\hat p = .4\), \(1-\hat p = .6\)
  \( .4 \pm 1.96\sqrt{.4 \cdot .6/2045} = (.38, .42) \)
- No. Zillow's claim falls outside the conf. interval

Problem 54
- The population: senior HR executives
- The proportion of HR executives who believe that their managers interview too many people
- \(n = 502\), \(\hat p = 211/502 = .42\)
- \(n\hat p = 211\), \(n(1-\hat p) = 291\)
- Both are larger than 15, so it is o.k. to use the method
- conf. interval: \(z_{.01} = 2.326\), \(\hat p = .42\), \(1-\hat p = .58\)
  \( .42 \pm 2.326\sqrt{.42 \cdot .58/502} = .42 \pm .051 \)
- narrower
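A sketch re-deriving the numbers in Problems 43, 45 and 54 from the counts and z values above (the helper `prop_me` is hypothetical):

```python
from math import sqrt

def prop_me(p_hat, n, z):
    """Margin of error z * sqrt(p_hat(1 - p_hat)/n)."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

# Problem 43: n = 225, p_hat = .46, 95%; check the successes/failures condition too
n, p = 225, 0.46
print(n * p, n * (1 - p), round(prop_me(p, n, 1.96), 3))

# Problem 45: 818 of 2045, 95%
p = 818 / 2045
me = prop_me(p, 2045, 1.96)
print(round(p, 2), round(p - me, 2), round(p + me, 2))

# Problem 54: 211 of 502, 98%
p = 211 / 502
print(round(p, 2), round(prop_me(p, 502, 2.326), 3))
```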
Problem 73
- Preliminary estimate of \(p = 1/3\)
- \(ME = .01\)
- Confidence level = 99%, so \(z_{.005} = 2.576\)
- \( n = (2.576/.01)^2 \cdot \frac{1}{3} \cdot \frac{2}{3} \approx 14746.2 \), so \(n = 14747\) (rounding \(z_{.005}\) to 2.575 gives 14735)

Problem: What is the required sample size for determining the proportion of defective items in a process if the proportion is to be known within 0.05 with 90% confidence? No guess as to the population proportion is available.
- \( n = (z_{.05}/.05)^2 \cdot \frac{1}{4} = (1.645/.05)^2 \cdot \frac{1}{4} \approx 270.6 \), so \(n = 271\)

A polling company wants to estimate the proportion of the population that will vote for party D in the next election. They want to do it with a margin of error of 3% and with 95% confidence. How large a sample should they take?
- \( n = (z_{.025}/.03)^2 \cdot \frac{1}{4} = (1.96/.03)^2 \cdot \frac{1}{4} \approx 1067.1 \), so \(n = 1068\)

Problem 63: A company believes its market share is about 14%. Find the minimum required sample size for estimating the actual market share to within 5% with 90% confidence.
- \( n = (1.65/.05)^2 (.14)(.86) \approx 131.1 \), so \(n = 132\)
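The sample sizes in this section can be recomputed with a single helper (our own name, `sample_size`) that rounds up to the next integer. The z values are the ones quoted above; with \(z_{.005} = 2.576\), Problem 73 comes out as 14747 rather than the 14735 obtained with 2.575.

```python
from math import ceil

def sample_size(z, me, p=0.5):
    """Smallest n with z * sqrt(p(1 - p)/n) <= ME, i.e. ceil((z/ME)^2 * p(1 - p))."""
    return ceil((z / me) ** 2 * p * (1 - p))

print(sample_size(2.576, 0.01, 1 / 3))   # Problem 73, prior guess p = 1/3
print(sample_size(1.645, 0.05))          # defective items, no prior guess
print(sample_size(1.96, 0.03))           # polling: 3% margin, 95% confidence
print(sample_size(1.65, 0.05, 0.14))     # Problem 63, prior guess p = .14
```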