Chapter Outline
2.1 Estimation
• Confidence Interval Estimates for the Population Mean
• Confidence Interval Estimates for the Difference Between Two Population Means
• Confidence Interval Estimates for the Population Proportion
• Confidence Interval Estimates for the Difference Between Two Population Proportions
2.2 Error of Estimation & Determining the Sample Size
• Statistical inference is the process of drawing conclusions from data using statistical methods.
• It is concerned with making conclusions about the characteristics of a population based on the information contained in a sample.
• Since populations are characterized by numerical descriptive measures called parameters, statistical inference is concerned with making inferences about population parameters.
2.1 Estimation
• Two terms used in estimation should be understood first:
i) Estimator: a sample statistic used to estimate a population parameter.
ii) Estimate: the value of an estimator obtained from a particular sample, used to estimate a population parameter.
• An estimate of a population parameter may be expressed in
two ways:
i) Point Estimate
ii) Interval Estimate.
i) Point Estimate
• A point estimate of a population parameter is a single value of a statistic.
• For example, the sample mean x̄ is a point estimate of the population mean μ.
• Similarly, the sample proportion p̂ is a point estimate of the population proportion P.
ii) Interval Estimate
• An interval estimate is defined by two numbers, between which a population parameter is said to lie.
• For example, a < μ < b is an interval estimate of the population mean μ. It indicates that the population mean is greater than a but less than b.
Point estimators
• Choosing the right point estimator for a parameter depends on the properties of the estimators themselves.
• There are four properties an estimator should satisfy for it to be considered a good (best) estimator:
  - Unbiased
  - Consistent
  - Efficient
  - Sufficient
Point estimators for mean, variance, and proportion
Population mean
• Given a sample X1, X2, X3, ..., Xn of size n taken from a certain population with unknown mean µ and variance σ², the sample mean X̄ is the best estimator of µ.
Population variance
• Given a sample X1, X2, X3, ..., Xn of size n taken from a certain population with unknown mean µ and variance σ², the sample variance
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$
is the best estimator of σ².
Population proportion
• Given a sample X1, X2, X3, ..., Xn of size n taken from a certain population with unknown proportion P, the sample proportion P̂ is the best estimator of P.
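To make the three estimators concrete, here is a minimal Python sketch using the standard `statistics` module. The data values are made up for illustration only. Note that `statistics.variance` divides by n − 1, matching the formula for S², while `statistics.pvariance` would divide by n.

```python
import statistics

# Made-up sample, for illustration only.
ages = [21, 19, 23, 22, 20]
attends = [1, 0, 1, 1, 0]  # 1 = sample item has the attribute of interest

x_bar = statistics.mean(ages)        # point estimate of μ
s2 = statistics.variance(ages)       # divisor n - 1: point estimate of σ²
p_hat = sum(attends) / len(attends)  # sample proportion: point estimate of P

# The same S² written out from the formula S² = Σ(Xi - X̄)² / (n - 1)
n = len(ages)
s2_manual = sum((x - x_bar) ** 2 for x in ages) / (n - 1)

print(x_bar, s2, s2_manual, p_hat)  # 21 2.5 2.5 0.6
```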
2.1.1 Confidence Interval
• Each interval is constructed with regard to a given confidence
level and is called a confidence interval.
• The confidence level associated with a confidence interval states how much confidence we have that the interval contains the true population parameter.
• The confidence level is denoted by (1 − α)100%.
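i) Confidence Interval Estimates for Population Mean, (μ)
The (1 − α)100% confidence interval for the population mean μ:
(i) If σ is known and the population is normally distributed:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad \text{or} \quad \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
(ii) If σ is unknown and the sample is large (n ≥ 30):
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \quad \text{or} \quad \bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$$
(iii) If σ is unknown, the population is normally distributed and the sample is small (n < 30):
$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \quad \text{or} \quad \bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$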
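The three cases can be sketched in Python as below, assuming only the standard library. `statistics.NormalDist().inv_cdf` supplies z_{α/2}; since the standard library has no t quantile function, the t critical value is passed in from a t table, as in the notes. The function names are illustrative, not from any particular package.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sd, n, conf):
    """x̄ ± z_{α/2}·sd/√n. Use sd = σ when known, or sd = s when n ≥ 30."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper α/2 point of N(0, 1)
    margin = z * sd / sqrt(n)
    return x_bar - margin, x_bar + margin


def t_interval(x_bar, s, n, t_crit):
    """x̄ ± t_{α/2,n-1}·s/√n for a small sample from a normal population.

    t_crit is read from a t table with n - 1 degrees of freedom,
    because the standard library provides no t quantile function.
    """
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Case (i): n = 20, σ = 15, x̄ = 64.3, 95% confidence
print(z_interval(64.3, 15, 20, 0.95))      # ≈ (57.73, 70.87)
# Case (iii): n = 10, s = 15.7, x̄ = 317.2, 99%, t_{0.005,9} = 3.250
print(t_interval(317.2, 15.7, 10, 3.250))  # ≈ (301.06, 333.34)
```

The two calls use the numbers of the worked examples that follow; case (ii) is `z_interval` with s passed in place of σ.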
Example:
If a random sample of size n = 20 from a normal population with variance σ² = 225 has mean x̄ = 64.3, construct a 95% confidence interval for the population mean μ.
Solution:
It is known that n = 20, x̄ = 64.3 and σ = √225 = 15.
For a 95% CI: (1 − α)100% = 95%, so 1 − α = 0.95, α = 0.05, α/2 = 0.025 and z_{α/2} = z_{0.025} = 1.96.
Hence, the 95% CI is
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 64.3 \pm 1.96\left(\frac{15}{\sqrt{20}}\right) = 64.3 \pm 6.57 = [57.73,\ 70.87]$$
or 57.73 < μ < 70.87.
Thus, we are 95% confident that the population mean μ is between 57.73 and 70.87.
Example:
The brightness of a television picture tube can be evaluated by measuring the amount of current required to achieve a particular brightness level. A random sample of 10 tubes gave a sample mean x̄ = 317.2 microamps and a sample standard deviation s = 15.7 microamps. Find (in microamps) a 99% confidence interval estimate for the mean current required to achieve a particular brightness level.
Solution:
s = 15.7, n = 10 < 30, x̄ = 317.2. σ is unknown and the sample is small, so case (iii) applies (assuming the current is normally distributed).
For a 99% CI: (1 − α)100% = 99%, so 1 − α = 0.99, α = 0.01 and α/2 = 0.005.
From the t-distribution table: t_{α/2, n−1} = t_{0.005, 9} = 3.250.
Hence, the 99% CI is
$$\bar{x} \pm t_{0.005,\,9}\frac{s}{\sqrt{n}} = 317.2 \pm 3.250\left(\frac{15.7}{\sqrt{10}}\right) = [301.0645,\ 333.3355] \text{ microamps}$$
Thus, we are 99% confident that the mean current required to achieve a particular brightness level is between 301.0645 and 333.3355 microamps.
Exercise:
Taking a random sample of 35 individuals waiting to be serviced by the
teller, we find that the mean waiting time was 22.0 min and the standard
deviation was 8.0 min. Using a 90% confidence level, estimate the mean
waiting time for all individuals waiting in the service line.
ii) Confidence Interval Estimates for the Difference Between Two Population Means, (μ1 − μ2)
i) If the variances σ1² and σ2² are known:
$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
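A short sketch of the known-variance case, checked against the fill-volume example below. `diff_means_known_var` is a hypothetical helper name; note that it takes the variances σ1² and σ2², not the standard deviations.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_known_var(x1, x2, var1, var2, n1, n2, conf):
    """(x̄1 - x̄2) ± z_{α/2}·sqrt(σ1²/n1 + σ2²/n2), with σ1² and σ2² known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{α/2}
    se = sqrt(var1 / n1 + var2 / n2)              # standard error of x̄1 - x̄2
    d = x1 - x2
    return d - z * se, d + z * se


# Fill-volume example: σ1 = 0.10, σ2 = 0.15, n1 = 14, n2 = 12, 90% confidence
print(diff_means_known_var(30.5, 29.4, 0.10**2, 0.15**2, 14, 12, 0.90))
```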
ii) If the population variances σ1² and σ2² are unknown, the formula to use depends on the sample sizes and on the assumption about the population variances:
Large samples (n1 ≥ 30, n2 ≥ 30):
• σ1² ≠ σ2²:
$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
• σ1² = σ2²:
$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad S_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$
Small samples (n1 < 30, n2 < 30):
• σ1² ≠ σ2²:
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,v}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad v = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1-1} + \frac{\left(s_2^2/n_2\right)^2}{n_2-1}}$$
• σ1² = σ2²:
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,v}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad S_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}, \qquad v = n_1 + n_2 - 2$$
Example:
Two machines are used to fill plastic bottles with liquid laundry detergent.
The standard deviations of fill volume are known to be σ1 = 0.10 and σ2 = 0.15
fluid ounce for the two machines, respectively. Two random samples of n1 = 14
bottles from machine 1 and n2 = 12 bottles from machine 2 are selected,
and the sample mean fill volumes are x̄1 = 30.5 and x̄2 = 29.4 fluid ounces.
Construct a 90% confidence interval on the mean difference in fill volumes.
Interpret the results.
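Solution:
Machine 1: x̄1 = 30.5, σ1 = 0.10, n1 = 14
Machine 2: x̄2 = 29.4, σ2 = 0.15, n2 = 12
For a 90% CI: (1 − α)100% = 90%, so 1 − α = 0.90, α = 0.1 and α/2 = 0.05.
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (30.5 - 29.4) \pm z_{0.05}\sqrt{\frac{0.10^2}{14} + \frac{0.15^2}{12}} = 1.1 \pm 1.6449(0.0509) = (1.0163,\ 1.1837)$$
We are 90% confident that the mean difference in fill volumes lies between 1.0163 and 1.1837 fluid ounces. Since the whole interval lies above zero, machine 1 appears to fill more on average than machine 2.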
Example:
A study was conducted to compare the starting salaries of university graduates majoring in computer science and engineering. A random sample of 50 recent university graduates in each major was selected and the following information was obtained.
Major              Mean      SD
Computer Science   RM 2500   RM 100
Engineering        RM 2800   RM 150
Construct a 99% confidence interval for the difference in the mean starting salaries for the two majors.
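Solution:
Both samples are large (n_c = n_e = 50) and the population standard deviations are unknown, so the large-sample formula with s_c and s_e is used. For a 99% CI, α = 0.01, α/2 = 0.005 and z_{0.005} = 2.5758.
$$(\bar{x}_c - \bar{x}_e) \pm z_{\alpha/2}\sqrt{\frac{s_c^2}{n_c} + \frac{s_e^2}{n_e}} = (2500 - 2800) \pm z_{0.005}\sqrt{\frac{100^2}{50} + \frac{150^2}{50}} = -300 \pm 2.5758\sqrt{650} = (-365.6703,\ -234.3297)$$
We are 99% confident that the difference in mean starting salaries for the two majors (computer science minus engineering) lies between RM −365.6703 and RM −234.3297.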
Exercise:
randomly selected from the Faculty of Mechanical Engineering. The results for Test 2 of SSM 3763 showed the following data:
Male:   x̄_M = 82, s_M = 8
Female: x̄_F = 76, s_F = 6
Assume that both populations are normally distributed and have equal population variances. Construct a 95% confidence interval for the difference in the two means.
iii) Confidence Interval Estimates for Population Proportion, (p)
The (1 − α)100% confidence interval for p for large samples (n ≥ 30) is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
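A minimal sketch of the large-sample interval for p, checked against the working-women example that follows. The helper name is illustrative, and `NormalDist().inv_cdf` stands in for the z table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p_hat, n, conf):
    """p̂ ± z_{α/2}·sqrt(p̂(1 - p̂)/n), the large-sample interval for p."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Working-women example: n = 1502, p̂ = 0.40, 95% confidence
print(proportion_interval(0.40, 1502, 0.95))  # ≈ (0.375, 0.425)
```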
Example:
According to the analysis of Women Magazine in June 2005, “Stress has
become a common part of everyday life among working women in
Malaysia. The demands of work, family and home place an increasing
burden on average Malaysian women”. According to this poll, 40% of
working women included in the survey indicated that they had little time
to relax. The poll was based on a random sample of 1502 working women
aged 30 and above. Construct a 95% confidence interval for the
corresponding population proportion.
Solution:
Let p be the proportion of all working women aged 30 and above who have a limited amount of time to relax, and let p̂ be the corresponding sample proportion. From the given information,
n = 1502, p̂ = 0.40, q̂ = 1 − p̂ = 1 − 0.40 = 0.60
Hence, the 95% CI is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.40 \pm z_{0.025}\sqrt{\frac{(0.4)(0.6)}{1502}} = 0.40 \pm 1.96(0.01264069) = 0.40 \pm 0.0248 = (0.375,\ 0.425)$$
or 37.5% to 42.5%.
Thus, we can state with 95% confidence that the proportion of all working women aged 30 and above who have a limited amount of time to relax is between 37.5% and 42.5%.
Exercise:
The wedding ceremony for a couple, Jamie and Robbin will be held in
Menara Kuala Lumpur. A survey has been carried out to determine the
proportion of people who will come to the ceremony. From 250
invitations, only 180 people agreed to attend the ceremony. Find a 90%
confidence interval estimate for the proportion of all people who will
attend the ceremony.
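iv) Confidence Interval Estimates for the Difference Between Two Population Proportions, (p1 − p2)
For large samples, the (1 − α)100% confidence interval for p1 − p2 is
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$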
Example:
Two separate surveys were carried out to investigate whether or not the users of the Plus highway were in favour of raising the speed limit on highways. Of the 250 car drivers interviewed, 220 were in favour of raising the speed limit, while of the 200 motorists interviewed, 180 were in favour. Find a 95% confidence interval for the difference between the proportions of car drivers and motorists who are in favour of raising the speed limit.
Solution:
p̂c = 220/250 = 0.88, p̂m = 180/200 = 0.9
Hence, the 95% CI is
$$(\hat{p}_c - \hat{p}_m) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_c\hat{q}_c}{n_c} + \frac{\hat{p}_m\hat{q}_m}{n_m}} = (0.88 - 0.9) \pm z_{0.025}\sqrt{\frac{(0.88)(0.12)}{250} + \frac{(0.9)(0.1)}{200}} = -0.02 \pm 1.96(0.02954) = -0.02 \pm 0.0579 = (-0.0779,\ 0.0379)$$
We are 95% confident that the difference between the proportions of car drivers and motorists who are in favour of raising the speed limit lies between −0.0779 and 0.0379.
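The same computation as a Python sketch using only the standard library (the helper name is hypothetical). Keeping the standard error unrounded (about 0.0295) puts the endpoints near −0.078 and 0.038; small differences from a hand calculation come from how far the standard error is rounded.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_interval(x1, n1, x2, n2, conf):
    """(p̂1 - p̂2) ± z_{α/2}·sqrt(p̂1q̂1/n1 + p̂2q̂2/n2), large samples."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se


# Speed-limit example: 220 of 250 car drivers, 180 of 200 motorists, 95%
print(diff_proportions_interval(220, 250, 180, 200, 0.95))
```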
2.2 Error of Estimation and Choosing the
Sample Size
When we estimate a parameter, all we have is the estimated value from the n measurements contained in the sample. Two questions usually arise:
(i) How far will our estimate lie from the true value of the parameter?
(ii) How many measurements should be included in the sample?
The distance between an estimate and the estimated parameter is called the error of estimation.
For example, if the estimator is approximately normally distributed, about 95% of its estimates lie within 1.96 standard deviations of the true value of the parameter. We would therefore expect the error of estimation to be less than 1.96 standard deviations of the estimator, with probability approximately equal to 0.95.
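The 0.95 statement can be checked by simulation. This sketch assumes a normal population with μ = 50, σ = 10 and samples of size n = 25 (all values invented for illustration) and counts how often the error of estimation |x̄ − μ| falls below 1.96 standard errors.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is reproducible

# Assumed population for illustration: normal with mean 50 and sd 10.
MU, SIGMA, N, TRIALS = 50.0, 10.0, 25, 2000
se = SIGMA / sqrt(N)                       # standard deviation (standard error) of x̄
bound = NormalDist().inv_cdf(0.975) * se   # 1.96 standard errors

within = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    if abs(x_bar - MU) < bound:            # error of estimation is below the bound
        within += 1

share = within / TRIALS
print(share)  # typically close to 0.95
```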
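• To determine the sample size, we first have to define the parameter to be estimated and the standard error of its point estimator.
• Next, choose the bound B on the margin of error and the confidence coefficient (1 − α).
• Then, use one of the following equations to find a suitable sample size n:
$$B = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) \quad \text{or} \quad B = z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) \qquad \text{(estimating } \mu\text{)}$$
$$B = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \qquad \text{(estimating } p\text{)}$$
Solving for n gives
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 \qquad \text{and} \qquad n = \frac{z_{\alpha/2}^2\,\hat{p}\hat{q}}{B^2},$$
with s in place of σ when σ is unknown; the result is rounded up to the next whole number.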
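Solving the bound equations for n can be sketched as below (hypothetical helper names, standard library only). The result is rounded up with `math.ceil`, because rounding down would leave the margin of error slightly above B.

```python
from math import ceil
from statistics import NormalDist


def z_half(conf):
    """Upper α/2 point z_{α/2} of the standard normal for confidence level conf."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def n_for_mean(sd, bound, conf):
    """Smallest n with z_{α/2}·sd/√n ≤ B, i.e. (z_{α/2}·sd / B)² rounded up."""
    return ceil((z_half(conf) * sd / bound) ** 2)


def n_for_proportion(p_hat, bound, conf):
    """Smallest n with z_{α/2}·sqrt(p̂q̂/n) ≤ B, i.e. z²p̂q̂ / B² rounded up."""
    return ceil(z_half(conf) ** 2 * p_hat * (1 - p_hat) / bound**2)


print(n_for_mean(3, 1, 0.99))              # 60
print(n_for_proportion(0.12, 0.05, 0.95))  # 163
```

The two calls use the numbers of the two examples that follow.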
Example:
The college president asks the statistics teacher to estimate the average age
of the students at their college. The statistics teacher would like to be 99%
confident that the estimate is accurate to within 1 year. From a
previous study, the standard deviation of the ages is known to be 3 years.
How large a sample is necessary?
Solution:
B = 1, s = 3, confidence coefficient = 99%; thus 1 − α = 0.99, α = 0.01, α/2 = 0.005 and z_{0.005} = 2.5758.
$$B = z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) \;\Rightarrow\; 1 = 2.5758\left(\frac{3}{\sqrt{n}}\right) \;\Rightarrow\; 0.38823 = \frac{3}{\sqrt{n}}$$
Square both sides:
$$(0.38823)^2 = \frac{9}{n} \;\Rightarrow\; 0.15072 = \frac{9}{n} \;\Rightarrow\; n = 59.71 \approx 60$$
Example:
How large a sample is required if we want to be 95% confident
that the error in using p̂ to estimate p is less than 0.05? If
p̂ = 0.12, find the required sample size.
Solution:
1 − α = 0.95, so α = 0.05 and z_{0.025} = 1.96.
$$B = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \;\Rightarrow\; 0.05 = 1.96\sqrt{\frac{(0.12)(0.88)}{n}} \;\Rightarrow\; 0.02551 = \sqrt{\frac{0.1056}{n}}$$
Square both sides:
$$(0.02551)^2 = \frac{0.1056}{n} \;\Rightarrow\; n = 162.27 \approx 163$$
The sample size is rounded up, since n = 162 would give an error bound slightly larger than 0.05.
Exercise:
The diameter of a two-year-old Sentang tree is normally
distributed with a standard deviation of 8 cm. How many trees
should be sampled if it is required to estimate the mean
diameter to within ± 1.5 cm with 95% confidence?