Chapter 4
Statistical Inferences
Estimation
Chapter 4
Statistical Inference
• Estimation
  - Confidence interval estimation for mean and proportion
  - Determining sample size
• Hypothesis Testing
  - Test for one and two means
  - Test for one and two proportions
Statistical Inference
Statistical inference is the process of drawing conclusions about
the characteristics of a population based on information
contained in a sample. Since populations are characterized by
numerical descriptive measures called parameters, statistical
inference is concerned with making inferences about
population parameters.
ESTIMATION
Two terms should be understood first: estimator and estimate.
An estimator is the sample statistic (the rule) used to estimate
a population parameter; an estimate is the value that the
estimator takes for a particular sample.
An estimate of a population parameter may be expressed in
two ways: point estimate and interval estimate.
Point Estimate
A point estimate of a population parameter is a single value
of a statistic. For example, the sample mean x̄ is a point
estimate of the population mean μ. Similarly, the sample
proportion p̂ is a point estimate of the population
proportion p.
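As a minimal sketch of the two point estimates just described, the snippet below computes a sample mean and a sample proportion. The data values are hypothetical and chosen only for illustration; they do not come from the chapter.

```python
from statistics import mean

# Hypothetical measurements (illustrative values, not from the chapter).
sample = [12.1, 13.4, 12.8, 14.0, 13.2]
x_bar = mean(sample)  # point estimate of the population mean

# Hypothetical yes/no survey answers: 1 = "yes", 0 = "no".
responses = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
p_hat = sum(responses) / len(responses)  # point estimate of the population proportion

print(f"x_bar = {x_bar:.2f}, p_hat = {p_hat:.2f}")
```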
Interval estimate
An interval estimate is defined by two numbers, between
which a population parameter is said to lie.
For example, a < μ < b is an interval estimate of the
population mean μ. It indicates that the population mean is
greater than a but less than b.
Point estimators
Choosing the right point estimator for a parameter depends on
the properties of the estimator itself. A good estimator should
satisfy four properties:
• Unbiased: its expected value equals the parameter.
• Consistent: it gets closer to the parameter as the sample size grows.
• Efficient: it has the smallest variance among unbiased estimators.
• Sufficient: it uses all the information about the parameter in the sample.
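The unbiasedness property can be illustrated with a small simulation. This is only a sketch under assumed settings (normal data with variance 4, samples of size 5, a fixed seed); none of these values come from the chapter. It compares the sample variance that divides by n − 1 with the one that divides by n.

```python
import random
from statistics import variance, pvariance

def average_variance_estimates(reps=5000, n=5, sigma=2.0, seed=0):
    """Average two variance estimators over many simulated samples."""
    rng = random.Random(seed)
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_unbiased += variance(sample)   # divides by n - 1
        total_biased += pvariance(sample)    # divides by n
    return total_unbiased / reps, total_biased / reps

avg_unbiased, avg_biased = average_variance_estimates()
print(f"true variance = 4.0, n-1 divisor: {avg_unbiased:.3f}, n divisor: {avg_biased:.3f}")
```

On average the n − 1 version should land close to the true variance, while the n version tends to come out too small.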
Confidence Interval
• A range of values constructed from the sample data so that
the population parameter is likely to lie within that range with a
specified probability.
• The specified probability is called the level of confidence.
• It states how much confidence we have that the interval
contains the true population parameter. The confidence level is
denoted by (1−α)×100%.
• Example: a 95% level of confidence means that if 100
confidence intervals were constructed, each based on a
different sample from the same population, we would expect about 95
of the intervals to contain the population mean.
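The repeated-sampling interpretation of the confidence level can be checked with a short simulation. The sketch below assumes a normal population with a known standard deviation; the population values (μ = 50, σ = 6), the sample size, and the seed are hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist, mean

def coverage(n_intervals=1000, n=36, mu=50.0, sigma=6.0, level=0.95, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_intervals):
        x_bar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / n_intervals

print(f"observed coverage: {coverage():.3f}")
```

The observed fraction should be close to 0.95, though it varies a little from run to run when the seed changes.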
• To compute a confidence interval for a mean, we consider
two situations:
i. We use sample data to estimate μ with x̄, and the
population standard deviation σ is known.
ii. We use sample data to estimate μ with x̄, and the
population standard deviation is unknown. In this
case, we substitute the sample standard deviation
s for the population standard deviation σ.
Example 2.1:
Find a 95% confidence interval for the population mean for these
values:
a) $n = 36$, $\bar{x} = 13.3$, $s^2 = 3.42$
b) $n = 64$, $\bar{x} = 2.73$, $s^2 = 0.1047$
Solution for a):
1st step: $(1-\alpha) \times 100\% = 95\%$, so $1-\alpha = 0.95$, $\alpha = 0.05$, $\alpha/2 = 0.025$.
2nd step: Find the critical value from the table: $Z_{0.025} = 1.96$.
3rd step: Use the formula, with $s = \sqrt{3.42} = 1.8493$:
$CI = \bar{x} \pm Z_{\alpha/2} \frac{s}{\sqrt{n}} = 13.3 \pm (1.96)\left(\frac{1.8493}{\sqrt{36}}\right) = 13.3 \pm 0.6041 = [12.6959, 13.9041]$
4th step: Conclusion:
The 95% confidence interval for the population mean is from
12.6959 to 13.9041.
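The steps of part a) can be sketched in code as follows. The function name `z_interval` is a hypothetical helper, not something defined in the chapter, and the critical value is computed from the standard normal distribution instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, s, n, level=0.95):
    """Large-sample confidence interval for a mean: x_bar +/- z_{alpha/2} * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% level
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Part a): n = 36, sample mean 13.3, sample variance 3.42.
low, high = z_interval(13.3, sqrt(3.42), 36)
print(f"95% CI: [{low:.4f}, {high:.4f}]")
```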
Example 2.3:
The brightness of a television picture tube can be
evaluated by measuring the amount of current
required to achieve a particular brightness level. A
random sample of 10 tubes gave a sample mean of
317.2 microamps and a sample standard deviation of
15.7 microamps. Find (in microamps) a 99%
confidence interval estimate for the mean current
required to achieve a particular brightness level.
Solution:
$s = 15.7$, $n = 10 < 30$, $\bar{x} = 317.2$
For a 99% CI:
$(1-\alpha) \times 100\% = 99\%$, so $1-\alpha = 0.99$, $\alpha = 0.01$, $\alpha/2 = 0.005$.
From the t distribution table:
$t_{\alpha/2,\,n-1} = t_{0.005,\,9} = 3.250$
Hence the 99% CI is
$317.2 \pm t_{0.005,\,9}\left(\frac{15.7}{\sqrt{10}}\right) = 317.2 \pm 3.250\left(\frac{15.7}{\sqrt{10}}\right) = [301.0645, 333.3355]$ microamps
Thus, we are 99% confident that the mean current required to
achieve a particular brightness level is between 301.0645 and
333.3355 microamps.
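For a small sample the same calculation uses a t critical value. The sketch below takes that value as an argument, read from a t table as in the example, because the Python standard library has no t distribution; `t_interval` is a hypothetical helper name.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Small-sample confidence interval for a mean: x_bar +/- t_crit * s / sqrt(n).

    t_crit is t_{alpha/2, n-1}, looked up in a t table by the caller.
    """
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 2.3: n = 10, sample mean 317.2, s = 15.7, t_{0.005, 9} = 3.250 from the table.
low, high = t_interval(317.2, 15.7, 10, t_crit=3.250)
print(f"99% CI: [{low:.4f}, {high:.4f}] microamps")
```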
Exercise 2.1:
Taking a random sample of 35 individuals waiting to be
serviced by the teller, we find that the mean waiting
time was 22.0 min and the standard deviation was 8.0
min. Using a 90% confidence level, estimate the mean
waiting time for all individuals waiting in the service
line.
Answer : [19.7757, 24.2243]
Exercise:
The mean and standard deviation of the maximum loads
supported by a sample of 60 cables are 11.09 tons and
0.73 tons, respectively. Find a 95% confidence interval for the mean of the
maximum loads of all cables produced by the company.
Example 2.5:
According to a poll, 40% of working women say that they feel
stress at work. The poll was based on a random sample of
1502 working women aged 30 and above. Construct a 95%
confidence interval for the corresponding population
proportion.
Solution:
Let p be the proportion of all working women aged 30 and above
who feel stress at work, and let p̂ be the
corresponding sample proportion. From the given information,
$n = 1502$, $\hat{p} = 0.40$, $\hat{q} = 1 - \hat{p} = 1 - 0.40 = 0.60$
Hence, the 95% CI is:
$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.40 \pm Z_{0.025}\sqrt{\frac{(0.4)(0.6)}{1502}} = 0.4 \pm 1.96\sqrt{\frac{(0.4)(0.6)}{1502}} = 0.4 \pm 0.0248 = [0.3752, 0.4248]$, or 37.52% to 42.48%
Thus, we can state with 95% confidence that the proportion of all working
women aged 30 and above who feel stress at work is between 37.52% and
42.48%.
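The interval for a proportion can be sketched the same way. `proportion_interval` is a hypothetical helper name, and the critical value again comes from the standard normal distribution instead of a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, level=0.95):
    """Large-sample confidence interval for a proportion:
    p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Example 2.5: n = 1502, p_hat = 0.40.
low, high = proportion_interval(0.40, 1502)
print(f"95% CI: [{low:.4f}, {high:.4f}]")
```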
Exercise 2.3
In a random sample of 70 automobiles registered in a
certain state, 28 of them were found to have emission
levels that exceed a state standard. Find a 95%
confidence interval for the proportion of automobiles
in the state whose emission levels exceed the standard.
Answer : [0.2852, 0.5148]
Error of estimation and choosing the sample size
• When we estimate a parameter, all we have is the estimate
computed from the n measurements contained in the sample. Two
questions usually arise:
• (i) How far will our estimate lie from the true value of the
parameter?
• (ii) How many measurements should be included in the
sample?
The distance between an estimate and the estimated
parameter is called the error of estimation.
For example, if most estimates are within 1.96 standard
deviations of the true value of the parameter, then we would
expect the error of estimation to be less than 1.96 standard
deviations of the estimator, with probability approximately
equal to 0.95.
In the process of determining the sample size, we
have to determine the parameter to be estimated and
the standard error of its point estimator. First,
choose the bound B on the margin of error and the
confidence coefficient (1−α). Then use the following
equations to find a suitable sample size n.
For a mean: $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$, where n is rounded up to the nearest whole number.
For a proportion: solve $z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le B$ for n.
E (the bound B above) = margin of error
Example 2.6:
The college president asks the statistics teacher to estimate the
average age of the students at their college. The statistics teacher
would like to be 99% confident that the estimate is
accurate to within 1 year. From a previous study, the standard
deviation of the ages is known to be 3 years.
How large a sample is necessary?
Solution: $B = 1$, $s = 3$, confidence coefficient = 99%, thus
$1-\alpha = 0.99$, $\alpha = 0.01$, $\alpha/2 = 0.005$.
From the table, $Z_{0.005} = 2.5758$.
$Z_{\alpha/2}\frac{s}{\sqrt{n}} \le B$: $Z_{0.005}\frac{3}{\sqrt{n}} \le 1$
$n \ge \left(\frac{2.5758 \times 3}{1}\right)^2 = 59.71$, so $n = 60$ students.
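The sample-size calculation for a mean can be sketched as below. `sample_size_for_mean` is a hypothetical helper name; it rounds up, as the chapter requires, and computes the critical value from the standard normal distribution.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, bound, level):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= bound."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil((z * sigma / bound) ** 2)

# Example 2.6: sigma = 3 years, bound B = 1 year, 99% confidence.
n_required = sample_size_for_mean(sigma=3, bound=1, level=0.99)
print(f"required sample size: {n_required}")
```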
Exercise 2.5:
The diameter of a two-year-old Sentang tree is normally
distributed with a standard deviation of 8 cm. How many
trees should be sampled if it is required to estimate the
mean diameter to within ± 1.5 cm with 95% confidence?
Answer : 110 trees
EXERCISES
Exercise 2.6
A tire manufacturer wishes to investigate the tread life
of its tires. A sample of 10 tires driven 50,000 miles
revealed a sample mean of 0.32 inches of tread
remaining with a standard deviation of 0.09 inches.
Construct a 95 percent confidence interval for the
population mean. Would it be reasonable for the
manufacturer to conclude that after 50,000 miles the
population mean amount of tread remaining is 0.30
inches?
Answer : [0.2556, 0.3844]
Exercise 2.8
The wedding ceremony for a couple, Jamie and Robbin, will be
held in Menara Kuala Lumpur. A survey has been carried out
to determine the proportion of people who will come to the
ceremony. Out of 250 invitations, only 180 people agreed to
attend the ceremony. Find a 90% confidence interval estimate
for the proportion of all people who will attend the ceremony.
Answer : [0.6733, 0.7767]
Hypothesis Testing
WHY DO WE TEST HYPOTHESES?
• To make decisions about populations based on sample
information.
• Example: we wish to know whether a medicine is really effective
in curing a disease. We take a sample of patients, record the
effect of the medicine on them, and make a decision based on those data.
• To reach decisions, it is useful to make assumptions about the
populations. Such assumptions, which may or may not be true, are called
statistical hypotheses.
Definitions
Hypothesis Test:
• It is a process of using sample data and statistical procedures to
decide whether or not to reject a hypothesis (statement) about a
population parameter value (or about its distribution characteristics).
Null Hypothesis, H0
• Generally this is a statement that a population parameter has a
specific value. The null hypothesis is initially
assumed to be true. Therefore, it is the hypothesis to
be tested.
Alternative Hypothesis, H1
• It is a statement about the same population
parameter that is used in the null hypothesis.
Generally it specifies that the
population parameter has a value different, in some
way, from the value given in the null hypothesis.
Rejecting the null hypothesis implies
accepting this alternative hypothesis.
Test Statistic:
• It is a function of the sample data on which the decision is
to be based.
Critical / Rejection region:
• It is the set of values of the test statistic for which the null
hypothesis will be rejected.
Critical point:
• It is the boundary value of the critical region.
P-value:
• The probability, calculated assuming H0 is true, of obtaining a
test statistic at least as extreme as the one observed. The
smaller the p-value is, the more the data contradict H0.
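As a minimal sketch of how a p-value is obtained from a test statistic, the function below assumes the test statistic follows a standard normal distribution under H0 (a Z test). The name `z_p_value` and the `tail` labels are hypothetical, introduced here for illustration.

```python
from statistics import NormalDist

def z_p_value(z, tail="two"):
    """P-value of a Z test statistic under a standard normal null distribution."""
    cdf = NormalDist().cdf
    if tail == "left":    # H1 uses <
        return cdf(z)
    if tail == "right":   # H1 uses >
        return 1 - cdf(z)
    if tail == "two":     # H1 uses "not equal"
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right' or 'two'")

print(f"two-tailed p-value for z = 1.96: {z_p_value(1.96):.4f}")
```

A p-value below the chosen significance level α leads to the same decision as a test statistic falling in the rejection region.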
Procedure for hypothesis testing
1. Define the question to be tested and formulate the hypotheses
that state the problem.
$H_0: \mu = a$ or $\mu \ge a$ or $\mu \le a$
$H_1: \mu \ne a$ or $\mu < a$ or $\mu > a$
2. Choose the appropriate test statistic and calculate its value from the sample.
The choice of test statistic depends on the probability distribution of
the random variable involved in the hypothesis.
3. Establish the test criterion by determining the critical value and critical
region.
4. Draw a conclusion: reject or do not reject the null hypothesis.
Hypothesis tests for a normal population mean, μ
Tails of a Test
                    Two-Tailed Test    Left-Tailed Test    Right-Tailed Test
Sign in H0          =                  ≥                   ≤
Sign in H1          ≠                  <                   >
Rejection Region    In both tails      In the left tail    In the right tail
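The decision rule summarized in the table can be sketched as a small function. `reject_h0` is a hypothetical helper; it takes a positive critical value (z_{α/2} for a two-tailed test, z_α or t_α for a one-tailed test) and reports whether the test statistic falls in the rejection region.

```python
def reject_h0(test_stat, crit, tail):
    """Return True if test_stat lies in the rejection region.

    crit is the positive critical value; tail is 'two', 'left' or 'right'.
    """
    if tail == "two":
        return abs(test_stat) > crit   # rejection region in both tails
    if tail == "left":
        return test_stat < -crit       # rejection region in the left tail
    if tail == "right":
        return test_stat > crit        # rejection region in the right tail
    raise ValueError("tail must be 'two', 'left' or 'right'")

print(reject_h0(2.5, 1.96, "two"))     # statistic beyond the two-tailed critical value
print(reject_h0(-0.8, 1.6449, "left")) # statistic not far enough into the left tail
```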
Example 4.7:
A sample of 50 Internet shoppers were asked how much
they spent per year on the Internet. From this sample, the mean
expense per year on the Internet is 30460 and the sample
standard deviation is 10150. It is desired to test whether
the mean expense is RM32500 per year or
not. Test at α = 0.05.
Solution:
The hypotheses tested are:
$H_0: \mu = 32500$ (claim)
$H_1: \mu \ne 32500$
Test Statistic:
$Z_{test} = \frac{30460 - 32500}{10150/\sqrt{50}} = \frac{-2040}{1435.4268} = -1.4212$
Critical Value: the test is two-tailed (H0 uses =), so α is divided by two:
$\alpha/2 = 0.05/2 = 0.025$, $Z_{0.025} = 1.96$
Rejection Region: $Z < -1.96$ or $Z > 1.96$
Since $-1.96 < Z_{test} = -1.4212 < 1.96$, do not reject $H_0$.
Conclusion: There is not enough evidence to reject the claim that
Internet shoppers spend a mean of RM32500 per year on the Internet.
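The calculation in this example can be reproduced with a short sketch. It uses the standard deviation that appears in the worked computation (10150), and `z_statistic` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(x_bar, mu0, s, n):
    """Z test statistic for a mean: (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / sqrt(n))

alpha = 0.05
z = z_statistic(x_bar=30460, mu0=32500, s=10150, n=50)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96
reject = abs(z) > z_crit

print(f"Z_test = {z:.4f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```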
Example 4.8:
A random sample of 10 individuals who
listen to radio was selected and the hours
per week that each listens to radio was
determined. The data are as follows:
9 8 7 4 8 6 8 8 9 10
Test the hypothesis that the mean hours individuals
listen to radio is less than 8 hours at α = 0.01.
Solution:
The hypotheses tested are:
$H_0: \mu \ge 8$
$H_1: \mu < 8$ (claim)
Test Statistic: $n = 10 < 30$, $\bar{x} = 7.7$, $s = 1.7029$
$t_{test} = \frac{7.7 - 8}{1.7029/\sqrt{10}} = \frac{-0.3}{0.5385} = -0.5571$
Critical Value: $t_{0.01,\,10-1} = t_{0.01,\,9} = 2.8214$
Rejection Region: $t < -2.8214$
Since $t_{test} = -0.5571 > -t_{0.01,\,9} = -2.8214$, do not reject $H_0$.
Conclusion: There is not enough evidence that the mean hours individuals listen
to radio is less than 8 hours.
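Starting from the raw data, the test statistic of this example can be sketched as follows. The critical value is taken from the t table as in the example, because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

hours = [9, 8, 7, 4, 8, 6, 8, 8, 9, 10]  # data from Example 4.8
mu0 = 8

n = len(hours)
x_bar = mean(hours)
s = stdev(hours)                         # sample standard deviation (n - 1 divisor)
t_stat = (x_bar - mu0) / (s / sqrt(n))

t_crit = 2.8214                          # t_{0.01, 9}, read from a t table
reject = t_stat < -t_crit                # left-tailed test

print(f"x_bar = {x_bar}, s = {s:.4f}, t_test = {t_stat:.4f}, reject H0: {reject}")
```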
Exercise 4.1:
A paint manufacturing company claims that the mean drying time for its
paint is at most 45 minutes. A random sample of 35 trials was tested. It is found
that the sample mean drying time is 49.50 minutes with a standard deviation of
3 minutes. Assume that the drying times follow a normal distribution. At the
1% significance level, is there sufficient evidence to support the
company's claim? (Ans: 8.8741, Reject)
Population Proportion, p
Test Statistic: $Z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$, where $q_0 = 1 - p_0$
Null hypothesis     Alternative hypothesis    Rejection Region
$H_0: p \le p_0$    $H_1: p > p_0$            $Z > z_{\alpha}$
$H_0: p \ge p_0$    $H_1: p < p_0$            $Z < -z_{\alpha}$
$H_0: p = p_0$      $H_1: p \ne p_0$          $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$
Example
When working properly, a machine that is used to make chips for
calculators produces 4% defective chips. Whenever the machine
produces more than 4% defective chips it needs an adjustment.
To check if the machine is working properly, the quality control
department at the company often takes a sample of chips and inspects
them to determine if they are good or defective. One such random
sample of 200 chips taken recently from the production line
contained 14 defective chips. Test at the 5% significance level
whether or not the machine needs an adjustment.
Solution:
The hypotheses tested are:
$H_0: p \le 0.04$
$H_1: p > 0.04$ (claim)
Test Statistic:
$\hat{p} = \frac{14}{200} = 0.07$
$Z_{test} = \frac{0.07 - 0.04}{\sqrt{\frac{(0.04)(0.96)}{200}}} = 2.1651$
Critical Value: $Z_{0.05} = 1.6449$
Rejection Region: $Z > 1.6449$
Since $Z_{test} = 2.1651 > Z_{0.05} = 1.6449$, reject $H_0$.
Conclusion: The machine needs an adjustment.
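The proportion test in this example can be sketched in the same way. `proportion_z` is a hypothetical helper name, and the right-tailed critical value is computed from the standard normal distribution instead of a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(p_hat, p0, n):
    """Z test statistic for a proportion: (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

alpha = 0.05
p_hat = 14 / 200                          # sample proportion of defective chips
z = proportion_z(p_hat, p0=0.04, n=200)
z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value, about 1.6449
reject = z > z_crit

print(f"Z_test = {z:.4f}, critical value = {z_crit:.4f}, reject H0: {reject}")
```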
Exercise 4.3:
A manufacturer of a detergent claimed that his detergent is at least 95%
effective in removing tough stains. In a sample of 300 people who had
used the detergent, 279 people claimed that they were satisfied
with the result. Determine whether the manufacturer’s claim is true
at the 1% significance level.
Answer: Do not Reject
EXERCISES
1. A paint manufacturing company claims that the mean drying time for its paint
is at most 45 minutes. A random sample of 20 trials was tested. It is found that the
sample mean drying time is 49.50 minutes with a standard deviation of 3 minutes.
Assume that the drying times follow a normal distribution.
(a) Construct a 99% confidence interval for the mean drying time of the paint.
(b) At the 5% significance level, is there sufficient evidence to support the
company's claim?
(c) Suppose that another manufacturing company wants to estimate the mean
drying time for its paints at 95% confidence level. Given , what is the sample
size of trials required in order to obtain an estimate that is within maximum
error of 3 minutes?
2. A truck loaded with 8000 electronic circuit boards has just pulled
into a firm’s receiving dock. The supplier claims that no more than 4%
of the electronic circuit boards fall outside the most rigid level of
industry performance specifications. In a simple random sample of 300
electronic circuit boards from the shipment, 15 fall outside these
specifications.
(a) Construct the 95% confidence interval for the percentage of all
boards in this shipment that fall outside the specifications.
(b) Test whether the supplier’s claim would appear to be correct at the 10%
significance level.