Dr. Ka-fu Wong
ECON1003
Analysis of Economic Data
Ka-fu Wong © 2003
Chap 7- 1
Chapter Seven
Estimation and Confidence Intervals
GOALS
1. Define what is meant by a point estimate.
2. Define the term level of confidence.
3. Construct a confidence interval for the population mean when the population standard deviation is known.
4. Construct a confidence interval for the population mean when the population standard deviation is unknown.
5. Construct a confidence interval for the population proportion.
6. Determine the sample size for attribute and variable sampling.
Point and Interval Estimates
• A point estimate is a single value (statistic) used to estimate a population value (parameter).
• A confidence interval is a range of values within which the population parameter is expected to occur.
Confidence Intervals
• The degree to which we can rely on the statistic is as important as the initial calculation. Remember, most of the time we are working with samples, and samples give us only estimates of the population parameter. Ultimately, we are concerned with the accuracy of the estimate.
1. A confidence interval provides a range of values.
  ◦ Based on observations from one sample.
2. A confidence interval gives information about closeness to the unknown population parameter.
  ◦ Stated in terms of probability.
  ◦ The exact closeness is not known, because knowing it would require knowing the unknown population parameter.
Areas Under the Normal Curve
If we draw an observation from a normally distributed population, the drawn value is likely (a chance of 68.26%) to lie inside the interval (µ-1σ, µ+1σ).
Area under the curve between:
µ ± 1σ: 68.26%
µ ± 2σ: 95.44%
µ ± 3σ: 99.74%
P(µ-1σ < x < µ+1σ) = 0.6826.
[Figure: normal curve with the horizontal axis marked at µ-3σ, µ-2σ, µ-1σ, µ, µ+1σ, µ+2σ, µ+3σ.]
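As a quick check of the areas under the normal curve, the sketch below computes P(µ-kσ < x < µ+kσ) from the error function. The function name `area_within` is only an illustrative choice. The exact values differ slightly in the last digit from figures built out of rounded z-table entries.

```python
import math

def area_within(k: float) -> float:
    """P(mu - k*sigma < X < mu + k*sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * area_within(k):.2f}%")
# within 1 sigma: 68.27%
# within 2 sigma: 95.45%
# within 3 sigma: 99.73%
```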
P(µ-1σ < x < µ+1σ) vs P(x-1σ < µ < x+1σ)
• P(µ-1σ < x < µ+1σ) is the probability that a drawn observation will lie between (µ-1σ, µ+1σ).
Subtract µ and x from every term, then multiply by -1 (which reverses the inequalities):
P(µ-1σ < x < µ+1σ)
= P(µ - 1σ - µ - x < x - µ - x < µ + 1σ - µ - x)
= P(-1σ - x < -µ < 1σ - x)
= P(-(-1σ - x) > -(-µ) > -(1σ - x))
= P(1σ + x > µ > -1σ + x)
= P(x - 1σ < µ < x + 1σ)
• P(x-1σ < µ < x+1σ) is the probability that the population mean will lie between (x-1σ, x+1σ).
P(µ-1σm < m < µ+1σm) vs P(m-1σm < µ < m+1σm)
(m = sample mean; σm = standard error of the sample mean)
• P(µ-1σm < m < µ+1σm) is the probability that the mean of a drawn sample will lie between (µ-1σm, µ+1σm).
P(µ-1σm < m < µ+1σm)
= P(µ - 1σm - µ - m < m - µ - m < µ + 1σm - µ - m)
= P(-1σm - m < -µ < 1σm - m)
= P(-(-1σm - m) > -(-µ) > -(1σm - m))
= P(1σm + m > µ > -1σm + m)
= P(m - 1σm < µ < m + 1σm)
• P(m-1σm < µ < m+1σm) is the probability that the population mean will lie between (m-1σm, m+1σm).
P(µ-a < x < µ+b) vs P(x-a < µ < x+b)
• P(µ-a < x < µ+b) is the probability that a drawn observation will lie between (µ-a, µ+b).
• P(x-a < µ < x+b) is the probability that the population mean will lie between (x-a, x+b).
• Is P(µ-a < x < µ+b) = P(x-a < µ < x+b) in general? NO!
• Generally, P(µ-a < x < µ+b) and P(x-a < µ < x+b) are not equal. They are guaranteed to be equal only if a = b, that is, when the interval is symmetric.
P(µ-a < x < µ+b) = P(x-b < µ < x+a)
• P(µ-a < x < µ+b) is the probability that a drawn observation will lie between (µ-a, µ+b).
P(µ-a < x < µ+b)
= P(µ - a - µ - x < x - µ - x < µ + b - µ - x)
= P(-a - x < -µ < b - x)
= P(-(-a - x) > -(-µ) > -(b - x))
= P(a + x > µ > -b + x)
= P(x - b < µ < x + a)
• P(x-b < µ < x+a) is the probability that the population mean will lie between (x-b, x+a).
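The identity P(µ-a < x < µ+b) = P(x-b < µ < x+a) can be checked by simulation. The sketch below is only an illustration under assumptions of my own: the population is exponential with mean µ = 1 (chosen because it is skewed), a = 0.5 and b = 2, and the function name `simulate` is hypothetical. It also counts the "unswapped" event x-a < µ < x+b to show that it has a different probability for a skewed population.

```python
import math
import random

def simulate(a=0.5, b=2.0, draws=100_000, seed=0):
    """Estimate three probabilities for x drawn from an exponential population with mean 1."""
    mu = 1.0
    rng = random.Random(seed)
    hits_x = hits_swapped = hits_unswapped = 0
    for _ in range(draws):
        x = rng.expovariate(1.0)                  # one observation, population mean 1
        hits_x += (mu - a < x < mu + b)           # x falls in (mu-a, mu+b)
        hits_swapped += (x - b < mu < x + a)      # mu falls in (x-b, x+a)
        hits_unswapped += (x - a < mu < x + b)    # mu falls in (x-a, x+b)
    return hits_x / draws, hits_swapped / draws, hits_unswapped / draws

p_x, p_swapped, p_unswapped = simulate()
exact = math.exp(-0.5) - math.exp(-3.0)           # exact P(0.5 < x < 3) for this population
print(p_x, p_swapped, p_unswapped, exact)
```

The first two estimates agree because they count the same event written two ways. The third differs because, with a and b left in place, the event is a different one unless a = b or the population is symmetric about µ.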
Elements of Confidence Interval Estimation
We are concerned about the probability that the population parameter falls somewhere within the interval around the sample statistic.
[Figure: a confidence interval centered on the sample statistic $\bar{X}$, running from the lower confidence limit $\bar{X} - Z\sigma_{\bar{X}}$ to the upper confidence limit $\bar{X} + Z\sigma_{\bar{X}}$.]
Generally, we consider symmetric confidence intervals only.
Confidence Intervals
The likelihood (probability) that the sample mean of a randomly drawn sample will fall within the interval
$\mu \pm Z\sigma_{\bar{X}} = \mu \pm Z\,\dfrac{\sigma}{\sqrt{n}}$
depends on Z:
• 90% of samples have means within $\mu \pm 1.645\,\sigma_{\bar{X}}$
• 95% of samples have means within $\mu \pm 1.96\,\sigma_{\bar{X}}$
• 99% of samples have means within $\mu \pm 2.58\,\sigma_{\bar{X}}$
Confidence Intervals
The likelihood (or probability) that the sample mean will fall within “1 standard deviation” of the population mean is the same as the likelihood (or probability) that the population mean will fall within “1 standard deviation” of the sample mean. The same holds for any multiple Z of the standard error $\sigma_{\bar{X}}$:
Z | $P(\mu - Z\sigma_{\bar{X}} < \bar{X} < \mu + Z\sigma_{\bar{X}})$ | $P(\bar{X} - Z\sigma_{\bar{X}} < \mu < \bar{X} + Z\sigma_{\bar{X}})$
1.645 | 0.90 | 0.90
1.96 | 0.95 | 0.95
2.58 | 0.99 | 0.99
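The Z values 1.645, 1.96 and 2.58 can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_for_confidence` is an illustrative choice. The 99% value comes out as about 2.576, which rounds to 2.58.

```python
from statistics import NormalDist

def z_for_confidence(level: float) -> float:
    """Two-sided critical value Z such that P(-Z < standard normal < Z) = level."""
    alpha = 1.0 - level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.2f}: Z = {z_for_confidence(level):.3f}")
```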
Level of Confidence
1. The probability that the unknown population parameter falls within the interval.
2. Denoted $(1 - \alpha)$, the level of confidence.
  ◦ $\alpha$ is the probability that the parameter is not within the interval.
3. Typical values are 99%, 95%, 90%.
Interpreting Confidence Intervals
• Once a confidence interval has been constructed, it will either contain the population mean or it will not.
• For a $(1-\alpha)$ confidence interval, for example 95%:
  ◦ Suppose we draw 1000 samples and construct the 95% confidence interval for the population mean from each of the 1000 samples.
  ◦ Some of the intervals contain the population mean, some do not.
  ◦ If the interval is a 95% confidence interval, about 950 of the confidence intervals will contain the population mean.
  ◦ That is, about 95% of the intervals (one per sample) will contain the population mean.
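The repeated-sampling interpretation can be tried out directly. The sketch below is a simulation under assumed values that do not come from the slides: a normal population with µ = 50 and known σ = 8, samples of size 25, and the hypothetical function name `coverage`. It builds a 95% interval from each of 1000 samples and reports the fraction that contain µ.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=50.0, sigma=8.0, n=25, level=0.95, n_samples=1000, seed=1):
    """Fraction of z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_samples):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))  # mean of one sample
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / n_samples

print(f"share of intervals containing mu: {coverage():.3f}")
```

The reported share should be close to 0.95 but will rarely equal it exactly, since 1000 samples is itself a finite experiment.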
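Intervals & Level of Confidence
[Figure: sampling distribution of the mean, centered at $\mu_{\bar{X}} = \mu$, with central area $1 - \alpha$ and area $\alpha/2$ in each tail. Below it, a large number of intervals, each extending from $\bar{X} - Z\sigma_{\bar{X}}$ to $\bar{X} + Z\sigma_{\bar{X}}$.]
$100(1 - \alpha)\%$ of the intervals contain $\mu$; $100\alpha\%$ do not.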
Point Estimates and Interval Estimates
$\bar{X} \pm Z_{\alpha/2}\,\sigma_{\bar{X}} = \bar{X} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
The factors that determine the width of a confidence interval are:
1. The size of the sample (n) from which the statistic is calculated.
2. The variability in the population, usually estimated by s.
3. The desired level of confidence.
[Figure: sampling distribution of $\bar{X}$ centered at $\mu_{\bar{X}} = \mu$, with central area $1 - \alpha$ and area $\alpha/2$ in each tail.]
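To see how the three factors move the width, the sketch below evaluates the full width 2·Z·s/√n for a few settings. The function name `ci_width` and the numbers tried are illustrative assumptions, not values from the slides.

```python
import math
from statistics import NormalDist

def ci_width(s: float, n: int, level: float) -> float:
    """Full width of the z-based interval: 2 * Z_{alpha/2} * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * s / math.sqrt(n)

print(ci_width(s=4, n=49, level=0.95))    # baseline
print(ci_width(s=4, n=196, level=0.95))   # four times the sample size: half the width
print(ci_width(s=8, n=49, level=0.95))    # twice the variability: twice the width
print(ci_width(s=4, n=49, level=0.99))    # higher confidence: wider interval
```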
Point and Interval Estimates
• We may use the z distribution if one of the following conditions holds:
  ◦ The population is normal and its standard deviation is known.
  ◦ The sample has more than 30 observations (the population standard deviation can be known or unknown).
$\bar{X} \pm z\,\dfrac{s}{\sqrt{n}}$
• Technical note:
  ◦ If the random variables A and B are independent and normally distributed, Y = A+B and X = (A+B)/2 will be normally distributed.
  ◦ If the population is normal, the sample mean of a random sample of n observations (for any integer n) will be normally distributed.
Point and Interval Estimates
• Use the t distribution if all of the following conditions are fulfilled:
  ◦ The population is normal.
  ◦ The population standard deviation is unknown and the sample has fewer than 30 observations.
$\bar{X} \pm t\,\dfrac{s}{\sqrt{n}}$
• Note that the t distribution does not cover non-normal populations.
Student’s t-Distribution
• The t-distribution is a family of distributions that is bell-shaped and symmetric like the standard normal distribution but with greater area in the tails. Each distribution in the t-family is defined by its degrees of freedom. As the degrees of freedom increase, the t-distribution approaches the normal distribution.
• Student is a pen name for a statistician named William S. Gosset, who was not allowed to publish under his real name. Gosset assumed the pseudonym Student for this purpose. Student’s t distribution is not meant to reference anything regarding college students.
Student’s t-Distribution
[Figure: density curves of the standard normal (Z), t with df = 13, and t with df = 5, all centered at 0. The t curves are bell-shaped and symmetric, with ‘fatter’ tails than the standard normal.]
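Student’s t Table
Entries are t values for the given upper tail area.
df | .25 | .10 | .05
1 | 1.000 | 3.078 | 6.314
2 | 0.817 | 1.886 | 2.920
3 | 0.765 | 1.638 | 2.353
Assume: n = 3, so df = n - 1 = 2; $\alpha = .10$, so $\alpha/2 = .05$. Reading the row df = 2 and the column .05 gives t = 2.920.
[Figure: t density centered at 0 with upper tail area .05 to the right of t = 2.920.]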
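The first two rows of the table can be reproduced without a statistics library, because the t distribution has closed-form quantiles for df = 1 and df = 2. The sketch below relies on those two special cases only; larger df need numerical methods or a table. The function names are illustrative.

```python
import math

def t_quantile_df1(p: float) -> float:
    """Quantile of t with 1 degree of freedom (the Cauchy distribution)."""
    return math.tan(math.pi * (p - 0.5))

def t_quantile_df2(p: float) -> float:
    """Quantile of t with 2 degrees of freedom."""
    return (2 * p - 1) / math.sqrt(2 * p * (1 - p))

# An upper tail area of .05 corresponds to the cumulative probability p = .95.
for tail in (0.25, 0.10, 0.05):
    p = 1 - tail
    print(f"tail {tail:.2f}: df=1 -> {t_quantile_df1(p):.3f}, df=2 -> {t_quantile_df2(p):.3f}")
```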
Degrees of freedom (df)
• Degrees of freedom refers to the number of independent data values available to estimate the population’s standard deviation. If k parameters must be estimated before the population’s standard deviation can be calculated from a sample of size n, the degrees of freedom are equal to n - k.
• Example: the sum of 3 numbers is 6.
X1 = 1 (or any number)
X2 = 2 (or any number)
X3 = 3 (cannot vary)
Sum = 6
Degrees of freedom = n - 1 = 3 - 1 = 2
t-Values
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
where:
$\bar{x}$ = Sample mean
$\mu$ = Population mean
s = Sample standard deviation
n = Sample size
Confidence interval for mean
($\sigma$ unknown in small sample)
A random sample of n = 25 has $\bar{X} = 50$ and S = 8. Set up a 95% confidence interval estimate for $\mu$.
$\bar{X} - t_{\alpha/2,\,n-1}\,\dfrac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\,\dfrac{S}{\sqrt{n}}$
$50 - 2.0639\cdot\dfrac{8}{\sqrt{25}} \le \mu \le 50 + 2.0639\cdot\dfrac{8}{\sqrt{25}}$
$46.70 \le \mu \le 53.30$
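The arithmetic of this example can be replayed in a few lines. In the sketch below the critical value 2.0639 (t with 24 degrees of freedom, upper tail area .025) is taken from the example as given rather than computed, and the function name `t_interval` is an illustrative choice.

```python
import math

def t_interval(xbar: float, s: float, n: int, t_crit: float):
    """Return (lower, upper) for xbar +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

lower, upper = t_interval(xbar=50, s=8, n=25, t_crit=2.0639)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
# 46.70 <= mu <= 53.30
```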
Central Limit Theorem
• For a population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the means of all possible samples of size n generated from the population will be approximately normally distributed when n is large.
• The mean of the sampling distribution is equal to $\mu$ and the variance is equal to $\sigma^2/n$.
The population distribution: $X \sim\ ?(\mu, \sigma^2)$
The sample mean of n observations: $\bar{X}_n \sim N(\mu, \sigma^2/n)$ (approximately)
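A small simulation can illustrate the two claims about the mean and variance of the sampling distribution. The setup below is assumed for illustration only: an exponential population with µ = 1 and σ² = 1 (clearly non-normal), samples of size n = 30, and 5000 replications. The function name `sample_means` is hypothetical.

```python
import random
from statistics import fmean, pvariance

def sample_means(n=30, reps=5000, seed=0):
    """Means of `reps` independent samples of size n from an exponential population (mean 1)."""
    rng = random.Random(seed)
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

means = sample_means()
print("mean of sample means:    ", fmean(means))      # should be near mu = 1
print("variance of sample means:", pvariance(means))  # should be near sigma^2 / n = 1/30
```

A histogram of `means` would look roughly bell-shaped even though the population itself is strongly skewed.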
Standard Error of the Sample Means
• The standard error of the sample mean is the standard deviation of the sampling distribution of the sample means.
• It is computed by
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
  ◦ $\sigma_{\bar{x}}$ is the symbol for the standard error of the sample mean.
  ◦ $\sigma$ is the standard deviation of the population.
  ◦ n is the size of the sample.
Standard Error of the Sample Means
• If $\sigma$ is not known and $n \ge 30$, the standard deviation of the sample, designated s, is used to approximate the population standard deviation. The formula for the standard error is:
$s_{\bar{x}} = \dfrac{s}{\sqrt{n}}$
95% and 99% Confidence Intervals for the sample mean
• The 95% and 99% confidence intervals are constructed as follows:
  ◦ 95% CI for the sample mean is given by $\mu \pm 1.96\,\dfrac{s}{\sqrt{n}}$
  ◦ 99% CI for the sample mean is given by $\mu \pm 2.58\,\dfrac{s}{\sqrt{n}}$
95% and 99% Confidence Intervals for µ
• The 95% and 99% confidence intervals are constructed as follows:
  ◦ 95% CI for the population mean is given by $\bar{X} \pm 1.96\,\dfrac{s}{\sqrt{n}}$
  ◦ 99% CI for the population mean is given by $\bar{X} \pm 2.58\,\dfrac{s}{\sqrt{n}}$
Constructing General Confidence Intervals for µ
• In general, a confidence interval for the mean is computed by:
$\bar{X} \pm z\,\dfrac{s}{\sqrt{n}}$
EXAMPLE 3
• The Dean of the Business School wants to estimate the mean number of hours worked per week by students. A sample of 49 students showed a mean of 24 hours with a standard deviation of 4 hours. What is the population mean?
• The value of the population mean is not known. Our best estimate of this value is the sample mean of 24.0 hours. This value is called a point estimate.
EXAMPLE 3 (continued)
Find the 95 percent confidence interval for the population mean.
$\bar{X} \pm 1.96\,\dfrac{s}{\sqrt{n}} = 24.00 \pm 1.96\cdot\dfrac{4}{\sqrt{49}} = 24.00 \pm 1.12$
The confidence limits range from 22.88 to 25.12. About 95 percent of the similarly constructed intervals include the population parameter.
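The calculation in Example 3 can be replayed with the short sketch below. The function name `z_interval` is an illustrative choice, and the default z = 1.96 is the 95% value used in the example.

```python
import math

def z_interval(xbar: float, s: float, n: int, z: float = 1.96):
    """Return (lower, upper) for xbar +/- z * s / sqrt(n)."""
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

lower, upper = z_interval(xbar=24.0, s=4.0, n=49)
print(f"{lower:.2f} to {upper:.2f}")
# 22.88 to 25.12
```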
Confidence Interval for a Population Proportion
• The confidence interval for a population proportion is estimated by:
$\hat{p} \pm Z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n-1}}$
EXAMPLE 4
• A sample of 500 executives who own their own home revealed 175 planned to sell their homes and retire to Arizona. Develop a 98% confidence interval for the proportion of executives that plan to sell and move to Arizona.
$\hat{p} = 175/500 = .35$
$(1 - \alpha) = 0.98 \Rightarrow \alpha = 0.02$
$Z_{\alpha/2} = Z_{0.01} = 2.33$
$.35 \pm 2.33\sqrt{\dfrac{(.35)(.65)}{500 - 1}} = .35 \pm .0498$
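The proportion interval of Example 4 can be evaluated as follows. This is a sketch that keeps the n - 1 denominator used in this chapter's formula; the function name `proportion_interval` is an illustrative choice, and Z = 2.33 is the value quoted in the example. With these inputs the margin evaluates to about 0.0498, so it is worth comparing against any hand calculation.

```python
import math

def proportion_interval(successes: int, n: int, z: float):
    """Return (p_hat, margin) with margin = z * sqrt(p_hat * (1 - p_hat) / (n - 1))."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / (n - 1))
    return p_hat, margin

p_hat, margin = proportion_interval(successes=175, n=500, z=2.33)
print(f"{p_hat:.2f} +/- {margin:.4f}")
print(f"interval: {p_hat - margin:.4f} to {p_hat + margin:.4f}")
```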
Finite-Population Correction Factor
• A population that has a fixed upper bound is said to be finite.
• For a finite population, where the total number of objects is N and the size of the sample is n, the following adjustment is made to the standard errors of the sample means and the proportion:
• Standard error of the sample means when $\sigma$ is known:
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}$
• Standard error of the sample means when $\sigma$ is NOT known and needs to be estimated by s:
$\hat{\sigma}_{\bar{x}} = \dfrac{s}{\sqrt{n}}\sqrt{\dfrac{N-n}{N}}$
Finite-Population Correction Factor
• Standard error of the sample proportions:
$\hat{\sigma}_{\hat{p}} = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n-1}}\sqrt{\dfrac{N-n}{N}}$
• This adjustment is called the finite-population correction factor.
• If n/N < .05, the finite-population correction factor is ignored.
EXAMPLE 5
• Given the information in EXAMPLE 3, construct a 95% confidence interval for the mean number of hours worked per week by the students if there are only 500 students on campus.
• Because n/N = 49/500 = .098, which is greater than .05, we use the finite-population correction factor.
With N - 1 in the correction factor:
$24 \pm 1.96\left(\dfrac{4}{\sqrt{49}}\right)\sqrt{\dfrac{500-49}{500-1}} = 24.00 \pm 1.0648$
With N in the correction factor (standard deviation estimated by s):
$24 \pm 1.96\left(\dfrac{4}{\sqrt{49}}\right)\sqrt{\dfrac{500-49}{500}} = 24.00 \pm 1.0637$
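Both versions of the correction in Example 5 can be evaluated with the sketch below. It assumes the correction factor enters under a square root, as in the standard-error formulas for a finite population; the function name `fpc_margin` and its `use_n_minus_1` flag are illustrative choices.

```python
import math

def fpc_margin(z: float, s: float, n: int, N: int, use_n_minus_1: bool) -> float:
    """Margin of error z * (s / sqrt(n)) * sqrt((N - n) / D), with D = N - 1 or N."""
    denominator = N - 1 if use_n_minus_1 else N
    return z * s / math.sqrt(n) * math.sqrt((N - n) / denominator)

margin_known = fpc_margin(1.96, 4, 49, 500, use_n_minus_1=True)
margin_estimated = fpc_margin(1.96, 4, 49, 500, use_n_minus_1=False)
print(f"N - 1 in the factor: 24.00 +/- {margin_known:.4f}")
print(f"N in the factor:     24.00 +/- {margin_estimated:.4f}")
```

Either way the margin is smaller than the uncorrected 1.12, because sampling 49 of only 500 students leaves less of the population unobserved.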
Selecting a Sample Size
• There are 3 factors that determine the size of a sample, none of which has any direct relationship to the size of the population. They are:
  ◦ The degree of confidence selected.
  ◦ The maximum allowable error.
  ◦ The variation in the population.
Selecting a Sample Size
$\bar{X} \pm Z_{\alpha/2}\,\sigma_{\bar{X}} = \bar{X} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
• To find the sample size for a variable:
$E = \dfrac{z \cdot s}{\sqrt{n}} \;\Rightarrow\; n = \left(\dfrac{z \cdot s}{E}\right)^2$
where E is the allowable error, z is the z-value corresponding to the selected level of confidence, and s is the sample standard deviation from the pilot survey.
EXAMPLE 6
• A consumer group would like to estimate the mean monthly electricity charge for a single family house in July within $5 using a 99 percent level of confidence. Based on similar studies the standard deviation is estimated to be $20.00. How large a sample is required?
$n = \left(\dfrac{(2.58)(20)}{5}\right)^2 = 106.50$, which is rounded up to 107.
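The sample-size calculation for a mean can be written as a small function. In the sketch below the function name `sample_size_for_mean` is an illustrative choice; the result is rounded up because a fractional observation cannot be sampled.

```python
import math

def sample_size_for_mean(z: float, s: float, E: float) -> int:
    """Smallest integer n with z * s / sqrt(n) <= E, i.e. n >= (z * s / E)^2."""
    return math.ceil((z * s / E) ** 2)

n_required = sample_size_for_mean(z=2.58, s=20.0, E=5.0)
print(n_required)
# 107
```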
Sample Size for Proportions
• The formula for determining the sample size in the case of a proportion is:
$n = p(1 - p)\left(\dfrac{Z}{E}\right)^2$
• where p is the estimated proportion, based on past experience or a pilot survey; Z is the z value associated with the degree of confidence selected; E is the maximum allowable error the researcher will tolerate.
EXAMPLE 7
• The American Kennel Club wanted to estimate the proportion of children that have a dog as a pet. If the club wanted the estimate to be within 3% of the population proportion, how many children would they need to contact? Assume a 95% level of confidence and that the club estimated that 30% of the children have a dog as a pet.
$n = (.30)(.70)\left(\dfrac{1.96}{.03}\right)^2 = 896.37$, which is rounded up to 897.
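The same idea for a proportion, as a sketch: the function name `sample_size_for_proportion` is an illustrative choice, and the result is rounded up to the next whole observation.

```python
import math

def sample_size_for_proportion(p: float, z: float, E: float) -> int:
    """Smallest integer n with n >= p * (1 - p) * (z / E)^2."""
    return math.ceil(p * (1 - p) * (z / E) ** 2)

n_children = sample_size_for_proportion(p=0.30, z=1.96, E=0.03)
print(n_children)
# 897
```

If no prior estimate of p is available, p = 0.5 maximizes p(1 - p) and therefore gives the most conservative (largest) sample size.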
Summary:
Confidence interval for sample mean
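General confidence interval:
$\hat{\mu} \pm r(\alpha, n) \times \sigma_{\hat{\mu}}$
($\mu$ = population mean; $1 - \alpha$ = confidence level; $\sigma$ = population standard deviation)
When $\sigma$ is known:
• n < 30, population distribution normal: $\hat{\mu} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
• n < 30, population distribution unknown: ?
• n ≥ 30, population distribution normal or unknown: $\hat{\mu} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
When $\sigma$ is unknown:
• n < 30, population distribution normal: $\hat{\mu} \pm t_{\alpha/2,\,n-1}\,\dfrac{\hat{\sigma}}{\sqrt{n}}$
• n < 30, population distribution unknown: ?
• n ≥ 30, population distribution normal or unknown: $\hat{\mu} \pm Z_{\alpha/2}\,\dfrac{\hat{\sigma}}{\sqrt{n}}$
where $\hat{\sigma} = \left[\sum_i (x_i - \hat{\mu})^2 / (n - 1)\right]^{1/2}$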
Summary:
Confidence Interval for sample proportion
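General confidence interval:
$\hat{p} \pm r(\alpha, n) \times \sigma_{\hat{p}}$
(p = population proportion; $1 - \alpha$ = confidence level; $\sigma = [p(1-p)]^{1/2}$ = standard deviation)
When $\sigma$ is known:
• n < 30, population distribution normal: $\hat{p} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
• n < 30, population distribution unknown: ?
• n ≥ 30, population distribution normal or unknown: $\hat{p} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
When $\sigma$ is unknown:
• n < 30, population distribution normal: $\hat{p} \pm t_{\alpha/2,\,n-1}\left[\dfrac{n}{n-1}\right]^{1/2}\dfrac{\hat{\sigma}}{\sqrt{n}}$
• n < 30, population distribution unknown: ?
• n ≥ 30, population distribution normal or unknown: $\hat{p} \pm Z_{\alpha/2}\left[\dfrac{n}{n-1}\right]^{1/2}\dfrac{\hat{\sigma}}{\sqrt{n}}$
where $\hat{\sigma} = [\hat{p}(1-\hat{p})]^{1/2}$
Because $\sigma = [p(1-p)]^{1/2}$, we know $\sigma$ if and only if we know p. If we know p, there is no need to estimate p or to construct the confidence interval for p.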
Chapter Seven
Estimation and Confidence Intervals
- END -