Chapter 2
Statistical Inference
 Estimation
- Confidence interval estimation for mean and proportion
- Determining sample size

Hypothesis Testing
- Test for one and two means
- Test for one and two proportions
Statistical inference
Statistical inference is the process of drawing conclusions about a
population from data. It is concerned with making conclusions about the
characteristics of a population based on information contained in a
sample. Since populations are characterized by numerical descriptive
measures called parameters, statistical inference is concerned with
making inferences about population parameters.
ESTIMATION
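Two terms must be understood first in estimation: estimator and estimate.
An estimator is a rule or statistic (for example, the sample mean) used to
estimate a population parameter; an estimate is the value that the
estimator takes for a particular sample.
An estimate of a population parameter may be expressed
in two ways: point estimate and interval estimate.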
Point estimate
A point estimate of a population parameter is
a single value of a statistic. For example, the
sample mean x̄ is a point estimate of the
population mean μ. Similarly, the sample
proportion p̂ is a point estimate of the
population proportion p.
Interval estimate
An interval estimate is defined by two numbers,
between which a population parameter is said to
lie.
For example, a < μ < b is an interval estimate of
the population mean μ. It indicates that the
population mean is greater than a but less than b.
Point estimators
Choosing the right point estimator for a parameter depends on the
properties of the estimator itself. There are four properties that an
estimator should satisfy to be considered a good estimator:
• Unbiased
• Consistent
• Efficient
• Sufficient
Confidence Interval
A confidence interval is a range of values constructed from the sample
data so that the population parameter is likely to occur within that
range at a specified probability.
• The specified probability is called the level of confidence.
• It states how much confidence we have that this interval
contains the true population parameter. The confidence level is
denoted by (1 − α)100%.

To compute a confidence interval, we will consider two
situations:
i. We use sample data to estimate μ with X̄, and the
population standard deviation σ is known.
ii. We use sample data to estimate μ with X̄, and the
population standard deviation σ is unknown. In this
case, we substitute the sample standard deviation (s)
for the population standard deviation (σ).
Example 2.1:
A publishing company has just published a new textbook. Before the company
decides the price at which to sell this textbook, it wants to know the average
price of all such textbooks in the market. The research department at the
company took a sample of 36 comparable textbooks and collected
information on their prices. This information produced a mean price of
RM70.50 for this sample. It is known that the standard deviation of the
prices of all such textbooks is RM4.50.
(a) What is the point estimate of the mean price of all such college textbooks?
(b) Construct a 90% confidence interval for the mean price of all such college
textbooks.
Solution:
(a) The point estimate of the mean price of all such college textbooks is
RM70.50, that is, point estimate of μ = x̄ = RM70.50.
(b) It is known that n = 36, x̄ = RM70.50 and σ = RM4.50.
For a 90% CI:
$$(1-\alpha)100\% = 90\% \;\Rightarrow\; 1-\alpha = 0.90,\quad \alpha = 0.10,\quad \frac{\alpha}{2} = 0.05$$
From the normal distribution table: $z_{\alpha/2} = z_{0.05} = 1.65$
Hence, the 90% CI is
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 70.50 \pm 1.65\left(\frac{4.5}{\sqrt{36}}\right) = 70.50 \pm 1.24 = (\text{RM}69.26,\ \text{RM}71.74)$$
Thus, we are 90% confident that the mean price of all such college textbooks
is between RM69.26 and RM71.74.
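The calculation in Example 2.1 can be checked with a few lines of Python. This is a minimal sketch and not part of the original notes: the function name z_interval_mean is an illustrative choice, and the critical value comes from the standard library's NormalDist instead of the printed table. It therefore uses z ≈ 1.6449 rather than the rounded 1.65, so the limits differ from the worked solution in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_mean(xbar, sigma, n, confidence):
    """Two-sided CI for a population mean when sigma is known (hypothetical helper)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    margin = z * sigma / sqrt(n)             # z_{alpha/2} * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Figures from Example 2.1: n = 36, sample mean RM70.50, sigma = RM4.50
lower, upper = z_interval_mean(xbar=70.50, sigma=4.50, n=36, confidence=0.90)
print(f"90% CI: (RM{lower:.2f}, RM{upper:.2f})")
```

With the unrounded critical value the interval is roughly RM69.27 to RM71.73, which agrees with the table-based answer up to rounding.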
Example 2.2:
The brightness of a television picture tube can be evaluated by measuring the
amount of current required to achieve a particular brightness level. A random
sample of 10 tubes indicated a sample mean of x̄ = 317.2 microamps and a sample
standard deviation of s = 15.7 microamps. Find (in microamps) a 99% confidence
interval estimate for the mean current required to achieve a particular
brightness level.
Solution:
s = 15.7, n = 10 < 30, x̄ = 317.2
For a 99% CI:
$$(1-\alpha)100\% = 99\% \;\Rightarrow\; 1-\alpha = 0.99,\quad \alpha = 0.01,\quad \frac{\alpha}{2} = 0.005$$
From the t distribution table: $t_{\alpha/2,\,n-1} = t_{0.005,\,9} = 3.250$
Hence, the 99% CI is
$$\bar{x} \pm t_{0.005,\,9}\frac{s}{\sqrt{n}} = 317.2 \pm 3.250\left(\frac{15.7}{\sqrt{10}}\right) = (301.0645,\ 333.3355)\ \text{microamps}$$
Thus, we are 99% confident that the mean current required to achieve a
particular brightness level is between 301.0645 and 333.3355 microamps.
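The small-sample interval in Example 2.2 can be reproduced the same way. The sketch below is illustrative only: t_interval_mean is a hypothetical helper name, and because the Python standard library has no t-distribution quantile function, the critical value 3.250 is passed in by hand exactly as it is read from the t table in the solution.

```python
from math import sqrt


def t_interval_mean(xbar, s, n, t_crit):
    """Two-sided CI for a mean when sigma is unknown (hypothetical helper).

    t_crit is t_{alpha/2, n-1}, looked up in a t table by the caller.
    """
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


# Figures from Example 2.2: n = 10, sample mean 317.2, s = 15.7, t_{0.005,9} = 3.250
lower, upper = t_interval_mean(xbar=317.2, s=15.7, n=10, t_crit=3.250)
print(f"99% CI: ({lower:.4f}, {upper:.4f}) microamps")
```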
Exercise 2.1:
Taking a random sample of 35 individuals waiting to be served
by the teller, we find that the mean waiting time was 22.0 min and
the standard deviation was 8.0 min. Using a 90% confidence level,
estimate the mean waiting time for all individuals waiting in the
service line.
Answer : [19.7757, 24.2243]
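Confidence Interval Estimates for the Difference Between Two Population Means, μ1 − μ2

i) Variances $\sigma_1^2$ and $\sigma_2^2$ are known:
$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

ii) If the population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown, the formula to use depends on the sample sizes and on the assumption made about the population variances:

Unequal variances, $\sigma_1^2 \neq \sigma_2^2$

Sample size $n_1 \ge 30,\ n_2 \ge 30$:
$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Sample size $n_1 < 30,\ n_2 < 30$:
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,v}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad v = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

Equal variances, $\sigma_1^2 = \sigma_2^2$

Sample size $n_1 \ge 30,\ n_2 \ge 30$:
$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Sample size $n_1 < 30,\ n_2 < 30$:
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,v}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad v = n_1 + n_2 - 2$$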
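The two auxiliary quantities used in the unknown-variance cases, the pooled variance and the degrees of freedom for unequal variances, are easy to get wrong by hand. The following is a minimal sketch of both. The function names pooled_variance and welch_df are illustrative, and the input values (s1 = 2, n1 = 10, s2 = 3, n2 = 12) are made-up numbers chosen only to show the arithmetic; they do not come from any example in these notes.

```python
def pooled_variance(s1, n1, s2, n2):
    """S_p^2 = ((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2), for the equal-variance case."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def welch_df(s1, n1, s2, n2):
    """Degrees of freedom v for the unequal-variance, small-sample case."""
    a = s1**2 / n1
    b = s2**2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


# Hypothetical summary statistics, for illustration only
sp2 = pooled_variance(s1=2.0, n1=10, s2=3.0, n2=12)
v = welch_df(s1=2.0, n1=10, s2=3.0, n2=12)
print(f"pooled variance = {sp2:.4f} with v = {10 + 12 - 2}")
print(f"unequal-variance v = {v:.2f}")
```

The unequal-variance value of v is generally not a whole number; a common convention is to round it down before looking up the t table.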
Example 2.3:
Two machines are used to fill plastic bottles with liquid laundry detergent. The
standard deviations of fill volume are known to be σ1 = 0.10 and σ2 = 0.15 fluid
ounces for the two machines, respectively. Two random samples of n1 = 14
bottles from machine 1 and n2 = 12 bottles from machine 2 are selected,
and the sample mean fill volumes are x̄1 = 30.5 and x̄2 = 29.4 fluid ounces.
Construct a 90% confidence interval on the mean difference in fill volumes.
Interpret the results.
Solution:
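Machine 1: x̄1 = 30.5, σ1 = 0.10, n1 = 14
Machine 2: x̄2 = 29.4, σ2 = 0.15, n2 = 12
For a 90% CI:
$$(1-\alpha)100\% = 90\% \;\Rightarrow\; 1-\alpha = 0.90,\quad \alpha = 0.10,\quad \frac{\alpha}{2} = 0.05$$
$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (30.5 - 29.4) \pm z_{0.05}\sqrt{\frac{0.10^2}{14} + \frac{0.15^2}{12}} = 1.1 \pm 1.6449(0.0509) = (1.0163,\ 1.1837)$$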
We are 90% confident that the difference in mean fill volumes lies
between 1.0163 and 1.1837 fluid ounces. Since both limits are positive,
machine 1 fills more on average than machine 2.
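As a cross-check of Example 2.3, the known-variance formula for two means can be sketched in Python. The helper name z_interval_two_means is an assumption made for illustration, and the critical value is computed with the standard library's NormalDist rather than read from the table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_two_means(x1, sigma1, n1, x2, sigma2, n2, confidence):
    """CI for mu1 - mu2 when both population standard deviations are known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)          # standard error of the difference
    diff = x1 - x2
    return diff - z * se, diff + z * se


# Figures from Example 2.3 (machine 1 vs machine 2)
lower, upper = z_interval_two_means(30.5, 0.10, 14, 29.4, 0.15, 12, confidence=0.90)
print(f"90% CI for mu1 - mu2: ({lower:.4f}, {upper:.4f}) fluid ounces")
```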
Example 2.4:
A study was conducted to compare the starting salaries for university
graduates majoring in computer science and engineering. A random sample of
50 recent university graduates in each major was selected and the following
information was obtained.

Major              Mean      SD
Computer Science   RM 2500   RM 100
Engineering        RM 2800   RM 150

Construct a 99% confidence interval for the difference in the mean starting
salaries for the two majors.
Solution:
$$(\bar{X}_c - \bar{X}_e) \pm z_{\alpha/2}\sqrt{\frac{s_c^2}{n_c} + \frac{s_e^2}{n_e}} = (2500 - 2800) \pm z_{0.005}\sqrt{\frac{100^2}{50} + \frac{150^2}{50}} = -300 \pm 2.5758\sqrt{650} = (-365.6703,\ -234.3297)$$
We are 99% confident that the difference in mean starting salaries for the
two majors lies between -365.6703 and -234.3297 (in RM).
Exercise 2.2:
Eighteen male undergraduate students and 20 female undergraduate students are
randomly selected from the faculty of mechanical engineering. The results for
test 2 of SSM 3763 show the following data:
Male   : X̄_M = 82, S_M = 8
Female : X̄_F = 76, S_F = 6
Assume that both populations are normally distributed and have equal
population variances. Construct a 95% confidence interval for the difference
in the two means.
Answer : [1.3772, 10.6228]
Example 2.5:
According to an analysis in Women Magazine in June 2005, “Stress has
become a common part of everyday life among working women in Malaysia.
The demands of work, family and home place an increasing burden on
average Malaysian women”. According to this poll, 40% of working women
included in the survey indicated that they had a limited amount of time to
relax. The poll was based on a randomly selected sample of 1502 working
women aged 30 and above. Construct a 95% confidence interval for the
corresponding population proportion.
Solution:
Let p be the proportion of all working women aged 30 and above who have a
limited amount of time to relax, and let p̂ be the corresponding sample
proportion. From the given information,
n = 1502, p̂ = 0.40, q̂ = 1 − p̂ = 1 − 0.40 = 0.60
Hence, the 95% CI is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.40 \pm z_{0.025}\sqrt{\frac{(0.4)(0.6)}{1502}} = 0.40 \pm 1.9600(0.01264069) = 0.40 \pm 0.0248 = (0.375,\ 0.425)$$
or 37.5% to 42.5%.
Thus, we can state with 95% confidence that the proportion of all working
women aged 30 and above who have a limited amount of time to relax is
between 37.5% and 42.5%.
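The large-sample interval for a single proportion used in Example 2.5 can be sketched as follows. The function name proportion_interval is a hypothetical choice, and the critical value is taken from the standard library's NormalDist instead of the table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p_hat, n, confidence):
    """Large-sample CI for a population proportion: p_hat +/- z * sqrt(p_hat*q_hat/n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Figures from Example 2.5: p_hat = 0.40, n = 1502
lower, upper = proportion_interval(p_hat=0.40, n=1502, confidence=0.95)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Note that the margin of error is the standard error 0.01264 multiplied by the critical value 1.96, giving about 0.0248.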
Exercise 2.3.
In a random sample of 70 automobiles registered in a
certain state, 28 of them were found to have emission
levels that exceed a state standard. Find a 95% confidence
interval for the proportion of automobiles in the state
whose emission levels exceed the standard.
Answer : [0.2852, 0.5148]
Example 2.6:
Two separate surveys were carried out to investigate whether or not the
users of Plus highway were in favour of raising the speed limit on highways.
Of the 250 car drivers interviewed, 220 were in favour of raising the speed
limit, while of the 200 motorists interviewed, 180 were in favour. Find a 95%
confidence interval for the difference in proportions between the car drivers
and motorists who are in favour of raising the speed limit.
Solution:
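$$\hat{p}_c = \frac{220}{250} = 0.88, \qquad \hat{p}_m = \frac{180}{200} = 0.9$$
Hence, the 95% CI is
$$(\hat{p}_c - \hat{p}_m) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_c\hat{q}_c}{n_c} + \frac{\hat{p}_m\hat{q}_m}{n_m}} = (0.88 - 0.9) \pm z_{0.025}\sqrt{\frac{(0.88)(0.12)}{250} + \frac{(0.9)(0.1)}{200}} \approx -0.02 \pm 1.9600(0.03) = (-0.0788,\ 0.0388)$$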
We are 95% confident that the difference in proportions between the car
drivers and motorists who are in favour of raising the speed limit lies
between -0.0788 and 0.0388.
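The two-proportion interval of Example 2.6 can be sketched in the same style. The helper name two_proportion_interval is illustrative. Unlike the worked solution, which rounds the standard error to 0.03, this sketch keeps the unrounded value (about 0.0295), so it returns a slightly narrower interval of roughly -0.0779 to 0.0379.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, confidence):
    """Large-sample CI for p1 - p2 from success counts x1, x2 and sample sizes n1, n2."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # z_{alpha/2}
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)    # unrounded standard error
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Figures from Example 2.6: 220 of 250 car drivers, 180 of 200 motorists
lower, upper = two_proportion_interval(220, 250, 180, 200, confidence=0.95)
print(f"95% CI for p_c - p_m: ({lower:.4f}, {upper:.4f})")
```

Either way the interval contains zero, so the conclusion drawn from it does not change.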
Exercise 2.4
In a test of the effect of dampness on electric connections,
100 electric connections were tested under damp
conditions and 150 were tested under dry conditions.
Twenty of the damp connections failed and only 10 of the dry
ones failed. Find a 90% confidence interval for the
difference between the proportions of connections that
fail when damp as opposed to dry.
Answer : [0.0595, 0.2072]
Error of estimation and choosing the sample size
When we estimate a parameter, all we have is the
estimated value from the n measurements contained in the
sample. Two questions usually arise:
(i) How far will our estimate lie from the true value
of the parameter?
(ii) How many measurements should be included in
the sample?
The distance between an estimate and the estimated
parameter is called the error of estimation.
For example, if about 95% of estimates lie within 1.96
standard deviations of the true value of the
parameter, then we would expect the error of
estimation to be less than 1.96 standard deviations of
the estimator, with probability approximately
equal to 0.95.
For a proportion, with B denoting the bound on the error of estimation:
$$z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le B$$
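If the error bound for a proportion is written as z_{α/2}·sqrt(p(1 − p)/n) ≤ B, solving for n gives n ≥ z²_{α/2}·p(1 − p)/B², rounded up to the next whole number. The sketch below applies that rearrangement. The function name sample_size_proportion is illustrative, and the inputs (a planning value p = 0.5, B = 0.05, 95% confidence) are hypothetical numbers chosen only to show the calculation; they are not taken from the notes.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(p, bound, confidence):
    """Smallest n with z_{alpha/2} * sqrt(p(1-p)/n) <= bound."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    return ceil(z**2 * p * (1 - p) / bound**2)          # always round up


# Hypothetical inputs, for illustration only
n = sample_size_proportion(p=0.5, bound=0.05, confidence=0.95)
print(f"required sample size: {n}")
```

Rounding up rather than to the nearest integer is a deliberate choice here: rounding down would leave the error slightly larger than the bound.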
Example 2.7:
The college president asks the statistics teacher to estimate the average age
of the students at their college. The statistics teacher would like to be 99%
confident that the estimate is accurate to within 1 year. From a
previous study, the standard deviation of the ages is known to be 3 years.
How large a sample is necessary?
Solution:
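B = 1, s = 3, confidence coefficient = 99%, thus
$$1-\alpha = 0.99,\quad \alpha = 0.01,\quad \frac{\alpha}{2} = 0.005$$
$$z_{\alpha/2}\frac{s}{\sqrt{n}} \le B:\qquad z_{0.005}\frac{3}{\sqrt{n}} \le 1$$
From the table, $z_{0.005} = 2.5758$:
$$2.5758\left(\frac{3}{\sqrt{n}}\right) \le 1 \;\Rightarrow\; \sqrt{n} \ge 7.7274 \;\Rightarrow\; n \ge 59.71 \approx 60 \text{ students}$$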
Exercise 2.5:
The diameter of a two-year-old Sentang tree is normally
distributed with a standard deviation of 8 cm. How many
trees should be sampled if it is required to estimate the
mean diameter to within ± 1.5 cm with 95% confidence?
Answer : 110 trees (n ≥ 109.27, rounded up)
EXERCISES
Exercise
1. A tire manufacturer wishes to investigate the tread life
of its tires. A sample of 10 tires driven 50,000 miles
revealed a sample mean of 0.32 inches of tread
remaining with a standard deviation of 0.09 inches.
Construct a 95 percent confidence interval for the
population mean. Would it be reasonable for the
manufacturer to conclude that after 50,000 miles the
population mean amount of tread remaining is 0.30
inches?
Answer : [0.2556, 0.3844]
Exercise
2. Resin-based composites are used in restorative
dentistry. A comparison was made of the surface hardness of
specimens cured for 40 seconds with constant power
with that of specimens cured for 40 seconds with
exponentially increasing power. Fifteen specimens were cured
with each method. Those cured with constant power had
an average surface hardness (in N/mm²) of 400.9 with a
standard deviation of 10.6. Those cured with exponentially
increasing power had an average surface hardness of
367.2 with a standard deviation of 6.1. Find a 98%
confidence interval for the difference in mean hardness
between specimens cured by the two methods.
Answer: [25.7804, 41.6196]
Exercise
3. The wedding ceremony for a couple, Jamie and Robbin, will be
held in Menara Kuala Lumpur. A survey has been carried out
to determine the proportion of people who will come to the
ceremony. Of 250 invitations, only 180 people agreed to
attend the ceremony. Find a 90% confidence interval estimate
for the proportion of all people who will attend the
ceremony.
Answer : [0.6733, 0.7667]