Lecture 5 (May 14)

Ch9. Inferences Concerning Proportions
Outline
Estimation of Proportions
Hypotheses concerning one proportion
Hypotheses concerning several proportions
Analysis of r*c tables
Goodness of fit
Estimation
In acceptance sampling we are concerned with
the proportion of defectives in a lot, and in life
testing we are concerned with the percentage of
certain components which will perform
satisfactorily during a stated period of time.
It should be clear from these examples that
problems concerning proportions, percentages,
or probabilities are really equivalent.
Estimation
The point estimator of the population
proportion itself is usually the sample
proportion X/n.
If the n trials satisfy the assumptions
underlying the binomial distribution (p. 105),
the mean and the standard deviation of the
number of successes are given by
$np$ and $\sqrt{np(1-p)}$.
Estimator
The mean and the standard deviation of
the proportion of successes (namely, of the
sample proportion) are given by
$$\frac{np}{n} = p \quad\text{and}\quad \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$$
The first of these results shows that the sample
proportion is an unbiased estimator of the binomial
parameter p.
Confidence interval
Construction of confidence interval for the
binomial parameter p (estimator).
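For a given value of p, we first define x0 and x1 such that
$$\sum_{k=0}^{x_0} b(k; n, p) \le \alpha/2 \quad\text{and}\quad \sum_{k=x_1}^{n} b(k; n, p) \le \alpha/2$$
with x0 chosen as large as possible and x1 as small as possible.
Thus, we can assert with a probability of approximately $1-\alpha$,
and at least $1-\alpha$, that the inequality
$$x_0(p) < x < x_1(p)$$
will be satisfied.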
EX
Suppose we want to find approximate 95%
confidence intervals for p for samples of
size n = 20.
x0 and x1 can be determined from the cumulative binomial probabilities B by
$$B(x_0; n, p) \le 0.025, \qquad 1 - B(x_1 - 1; n, p) \le 0.025$$
p     0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9
x0    -    0    1    3    5    7    9    11   14
x1    6    9    11   13   15   17   19   20   -
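The search for x0 and x1 can be sketched in Python as follows. This is only an illustration using the standard library; the helper names `binom_cdf` and `x0_x1` are assumptions and not part of the lecture. It takes x0 as the largest value with B(x0; n, p) ≤ α/2 and x1 as the smallest value with 1 − B(x1 − 1; n, p) ≤ α/2, and returns `None` where no such value exists.

```python
from math import comb

def binom_cdf(x, n, p):
    """Cumulative binomial probability B(x; n, p) = P(X <= x)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def x0_x1(n, p, alpha=0.05):
    """Hypothetical helper: return (x0, x1) for a given p, None where undefined."""
    tail = alpha / 2
    # x0: largest x whose lower tail P(X <= x) does not exceed alpha/2
    x0 = None
    for x in range(n + 1):
        if binom_cdf(x, n, p) <= tail:
            x0 = x
        else:
            break
    # x1: smallest x whose upper tail P(X >= x) does not exceed alpha/2
    x1 = None
    for x in range(n + 1):
        if 1 - binom_cdf(x - 1, n, p) <= tail:
            x1 = x
            break
    return x0, x1

# Tabulate x0 and x1 for n = 20 and p = 0.1, ..., 0.9
for tenths in range(1, 10):
    p = tenths / 10
    print(p, *x0_x1(20, p))
```

For p = 0.1 no x0 exists and for p = 0.9 no x1 exists, because even the most extreme single outcome has probability above 0.025.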
Confidence interval
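For large n
We construct approximate confidence intervals for the
binomial parameter p by using the normal approximation
to the binomial distribution. With probability $1-\alpha$,
$$-z_{\alpha/2} < \frac{X - np}{\sqrt{np(1-p)}} < z_{\alpha/2}$$
Replacing p by x/n inside the square root, this yields
$$\frac{x}{n} - z_{\alpha/2}\sqrt{\frac{\frac{x}{n}\left(1-\frac{x}{n}\right)}{n}} < p < \frac{x}{n} + z_{\alpha/2}\sqrt{\frac{\frac{x}{n}\left(1-\frac{x}{n}\right)}{n}}$$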
EX.
If x=36 of n=100 persons interviewed are
familiar with the tax incentives for installing
certain energy-saving devices, construct a
95% confidence interval for the
corresponding true proportion.
Solution: x/n = 36/100 = 0.36 and $z_{0.025} = 1.96$, hence
$$0.36 - 1.96\sqrt{\frac{0.36(1-0.36)}{100}} < p < 0.36 + 1.96\sqrt{\frac{0.36(1-0.36)}{100}}$$
$$0.266 < p < 0.454$$
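The large-sample interval can be checked numerically with a short Python sketch. The function name `proportion_ci` is an assumption made for illustration; the critical value comes from the standard library's `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Large-sample interval x/n +/- z_{alpha/2} * sqrt((x/n)(1 - x/n)/n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    p_hat = x / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Tax-incentive example: x = 36 of n = 100
low, high = proportion_ci(36, 100)
print(f"{low:.3f} < p < {high:.3f}")
```

With x = 36 and n = 100 the limits round to 0.266 and 0.454, in agreement with the solution.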
Maximum error
The error when we use X/n as an estimator of
p is given by |X/n − p|.
Again using the normal distribution, we
can assert with probability $1-\alpha$ that
the inequality
$$\left|\frac{X}{n} - p\right| \le z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
will be satisfied.
Maximum error of estimate
$$E = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
EX
In a sample survey conducted in a large
city, 136 of 400 persons answered yes to
the question of whether their city's public
transportation is adequate. With 99%
confidence, what can we say about the
maximum error if x/n = 136/400 = 0.34 is used as an
estimate of the corresponding true
proportion?
Substituting x/n = 0.34 for p and $z_{0.005} = 2.575$,
$$E = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} = 2.575\sqrt{\frac{0.34 \times 0.66}{400}} = 0.061$$
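As a quick check of the maximum-error calculation, here is a minimal Python sketch that uses x/n = 136/400 = 0.34 in place of p. The function name `max_error` is hypothetical, and the critical value is computed with `statistics.NormalDist` instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

def max_error(p, n, confidence):
    """Maximum error E = z_{alpha/2} * sqrt(p(1 - p)/n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 2.576 for 99% confidence
    return z * sqrt(p * (1 - p) / n)

# Public-transportation survey: 136 of 400 answered yes
E = max_error(136 / 400, 400, 0.99)
print(round(E, 3))
```

The result rounds to 0.061, so with 99% confidence the error of the estimate 0.34 is at most about 0.061.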
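Sample size determination
If p is known
$$n = p(1-p)\left(\frac{z_{\alpha/2}}{E}\right)^2$$
If p is unknown, we use the fact that $p(1-p) \le 1/4$:
$$n = \frac{1}{4}\left(\frac{z_{\alpha/2}}{E}\right)^2$$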
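The two sample-size formulas can be sketched as one Python function. The name `sample_size`, the rounding up to the next whole number, and the inputs E = 0.04 and 95% confidence are all assumptions made for illustration; they do not come from the lecture.

```python
from math import ceil
from statistics import NormalDist

def sample_size(E, confidence, p=None):
    """Smallest n whose maximum error is at most E.

    Uses p(1 - p) when p is given, and the upper bound 1/4 when it is not.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    pq = 0.25 if p is None else p * (1 - p)
    return ceil(pq * (z / E) ** 2)  # round up so the error bound still holds

# Hypothetical inputs: maximum error 0.04 with 95% confidence
print(sample_size(0.04, 0.95))          # p unknown
print(sample_size(0.04, 0.95, p=0.3))   # p assumed to be 0.3
```

Knowing p never increases the required sample size, because p(1 − p) cannot exceed 1/4.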
9.2 Hypothesis
The test of null hypothesis that a
proportion equals some specified constant
is widely used in sampling inspection,
quality control, and reliability verification.
Statistic for large-sample test concerning p
Null hypothesis: $p = p_0$
$$Z = \frac{X - np_0}{\sqrt{np_0(1-p_0)}}$$
Critical regions for testing $p = p_0$
(Large sample)
Alternative hypothesis     Reject null hypothesis if
$p < p_0$                  $Z < -z_\alpha$
$p > p_0$                  $Z > z_\alpha$
$p \ne p_0$                $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$
EX
In a study designed to investigate whether
certain detonators used with explosives in
coal mining meet the requirement that at
least 90% will ignite the explosive when
charged, it is found that 174 of 200
detonators function properly. Test the null
hypothesis p = 0.9 against the alternative
p < 0.9 at the 0.05 level of significance.
Solution
1. Null hypothesis: $p = 0.9$
Alternative hypothesis: $p < 0.9$
2. Level of significance: $\alpha = 0.05$
3. Criterion: Reject the null hypothesis if Z < −1.645
4. Calculation:
$$Z = \frac{X - np_0}{\sqrt{np_0(1-p_0)}} = \frac{174 - 200(0.9)}{\sqrt{200(0.9)(0.1)}} = -1.41$$
5. The null hypothesis cannot be rejected.
6. P-value: 0.079 > level of significance 0.05
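The steps of this test can be reproduced with a small Python sketch. The function name `one_proportion_z` is an assumption; the critical value and the P-value are taken from `statistics.NormalDist`, so they may differ slightly from table lookups in the last digit.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(x, n, p0):
    """Large-sample statistic Z = (X - n*p0) / sqrt(n*p0*(1 - p0))."""
    return (x - n * p0) / sqrt(n * p0 * (1 - p0))

alpha = 0.05
z = one_proportion_z(174, 200, 0.9)       # detonator example
critical = NormalDist().inv_cdf(alpha)    # left-tail critical value, about -1.645
p_value = NormalDist().cdf(z)             # P(Z <= z) for the alternative p < 0.9
reject = z < critical
print(f"Z = {z:.2f}, P-value = {p_value:.3f}, reject H0: {reject}")
```

Z is about −1.41 and the P-value about 0.079, so the null hypothesis is not rejected at the 0.05 level.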
Hypotheses concerning several proportions
Tests concerning several proportions arise, for example,
when we compare the consumer response to
two different products, or when we decide
whether the proportion of defectives of a
given process remains constant from day
to day.
Testing
$$p_1 = p_2 = \cdots = p_k = p$$
Large-sample test
We require independent random samples of size $n_1, n_2, \ldots, n_k$.
If the corresponding numbers of successes are
$X_1, X_2, \ldots, X_k$, the test we should use is based on the following facts:
1) For large samples, the sampling distribution of
$$Z_i = \frac{X_i - n_i p_i}{\sqrt{n_i p_i(1-p_i)}}$$
is approximately the standard normal distribution.
2) The square of a random variable having the standard
normal distribution is a random variable having the
chi-square distribution with 1 degree of freedom.
3) The sum of k independent random variables having
chi-square distributions with 1 degree of freedom is a
random variable having the chi-square distribution with
k degrees of freedom. (Proofs are not required.)
Cont.
$$\chi^2 = \sum_{i=1}^{k} \frac{(x_i - n_i p_i)^2}{n_i p_i(1-p_i)}$$
is a value of a random variable having approximately the
chi-square distribution with k degrees of freedom.
In practice, we substitute for the $p_i$, which under the
null hypothesis are all equal, the pooled estimate
$$\hat{p} = \frac{x_1 + x_2 + \cdots + x_k}{n_1 + n_2 + \cdots + n_k}$$
The null hypothesis should be rejected if the differences
between the $x_i$ and the $n_i\hat{p}$ are large; the critical region is
$$\chi^2 > \chi^2_\alpha$$
where the number of degrees of freedom is k − 1.
Another approach
            Successes    Failures      Total
Sample 1    x1           n1 − x1       n1
...         ...          ...           ...
Sample k    xk           nk − xk       nk
Total       x            n − x         n
Define the observed cell frequencies $o_{ij}$, $i = 1, 2$, $j = 1, \ldots, k$.
The expected numbers of successes and failures for
the j-th sample are estimated by
$$e_{1j} = n_j\hat{p} \quad\text{and}\quad e_{2j} = n_j(1-\hat{p})$$
thus
$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{k} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$
Samples of three kinds of materials, subjected to
extreme temperature changes, produced the results
shown in the following table
             A      B      C      Total
Successes    41     27     22     90
Failures     79     53     78     210
Total        120    80     100    300
Use the 0.05 level of significance to test whether the
probability of crumbling is the same for the three kinds
of materials.
Solution
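1. Null hypothesis: $p_1 = p_2 = p_3$
Alternative hypothesis: $p_1$, $p_2$, $p_3$ are not all equal
2. Level of significance: $\alpha = 0.05$
3. Criterion: Reject the null hypothesis if $\chi^2 > 5.991$,
the value of $\chi^2_{0.05}$ for 2 degrees of freedom
4. Calculation:
$$\hat{p} = \frac{90}{300} = \frac{3}{10}$$
$$e_{11} = \frac{90 \times 120}{300} = 36, \quad e_{12} = \frac{90 \times 80}{300} = 24, \quad e_{13} = \frac{90 \times 100}{300} = 30$$
and, by subtraction from the column totals, $e_{21} = 84$, $e_{22} = 56$, $e_{23} = 70$, so that
$$\chi^2 = 4.575$$
5. Since 4.575 does not exceed 5.991, the null hypothesis cannot be rejected.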
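The chi-square calculation for the three materials can be verified with a short Python sketch based on the pooled estimate. The function name `chi_square_k_proportions` is an assumption made for illustration; the counts are those of the table in the example.

```python
def chi_square_k_proportions(successes, sizes):
    """Chi-square statistic for H0: p1 = ... = pk, using the pooled estimate."""
    p_hat = sum(successes) / sum(sizes)   # pooled estimate of the common p
    chi2 = 0.0
    for x, n in zip(successes, sizes):
        e_success = n * p_hat             # e_1j = n_j * p_hat
        e_failure = n * (1 - p_hat)       # e_2j = n_j * (1 - p_hat)
        chi2 += (x - e_success) ** 2 / e_success
        chi2 += ((n - x) - e_failure) ** 2 / e_failure
    return chi2, len(sizes) - 1           # degrees of freedom: k - 1

# Materials A, B, C: successes and sample sizes from the table
chi2, df = chi_square_k_proportions([41, 27, 22], [120, 80, 100])
print(round(chi2, 3), df)
```

The statistic rounds to 4.575 with 2 degrees of freedom, below the critical value 5.991.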
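Statistic for test concerning
difference between two proportions
$$Z = \frac{\dfrac{X_1}{n_1} - \dfrac{X_2}{n_2}}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad\text{with}\quad \hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$$
For large samples and under the null hypothesis $p_1 = p_2$, Z is a random variable having
approximately the standard normal distribution.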
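Confidence interval for $p_1 - p_2$ (large samples)
$$\frac{x_1}{n_1} - \frac{x_2}{n_2} \pm z_{\alpha/2}\sqrt{\frac{\frac{x_1}{n_1}\left(1-\frac{x_1}{n_1}\right)}{n_1} + \frac{\frac{x_2}{n_2}\left(1-\frac{x_2}{n_2}\right)}{n_2}}$$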
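Both two-sample formulas can be sketched in Python as below. The function names and the counts (60 of 200 versus 40 of 200) are hypothetical and chosen only to illustrate the arithmetic. The test statistic uses the pooled estimate, while the confidence interval uses the two sample proportions separately.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2, with the pooled estimate in the denominator."""
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def difference_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for p1 - p2."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p1, p2 = x1 / n1, x2 / n2
    half_width = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - half_width, diff + half_width

# Hypothetical counts, not taken from the lecture
z = two_proportion_z(60, 200, 40, 200)
low, high = difference_ci(60, 200, 40, 200)
print(f"Z = {z:.3f}, interval = ({low:.4f}, {high:.4f})")
```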
9.4 Analysis of r*c tables
The key random variable
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$
has approximately the chi-square distribution with (r−1)(c−1) degrees
of freedom.
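A minimal Python sketch of the r × c statistic follows. It assumes the usual estimate of the expected frequencies, (row total × column total) / grand total, which these notes do not spell out for the general case; the function name `chi_square_table` is also an assumption. It is applied to the 2 × 3 materials table from the earlier example.

```python
def chi_square_table(observed):
    """Chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # assumed estimate: e_ij = (row i total)(column j total) / grand total
            e = row_totals[i] * col_totals[j] / total
            chi2 += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Materials table: rows are successes and failures, columns are A, B, C
chi2, df = chi_square_table([[41, 27, 22], [79, 53, 78]])
print(round(chi2, 3), df)
```

For a table with two rows this gives the same value as the pooled-estimate calculation, here about 4.575 with (2−1)(3−1) = 2 degrees of freedom.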
9.5 Goodness of fit
Goodness of fit: we compare an
observed frequency distribution with the
corresponding values of an expected, or
theoretical, distribution.
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$
is a value of a random variable having approximately the chi-square distribution
with k − m degrees of freedom, where k is the number
of terms in the formula and m is the number of
quantities, obtained from the observed data, that are
needed to calculate the expected frequencies.
EX
Page 312~313.
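The goodness-of-fit statistic can be illustrated with a small Python sketch. The data are hypothetical (120 rolls of a die tested against a uniform distribution) and are not the textbook example cited above; the function name `goodness_of_fit` is also an assumption. Only the total count is taken from the data to obtain the expected frequencies, so m = 1.

```python
def goodness_of_fit(observed, expected, m=1):
    """Chi-square statistic with k - m degrees of freedom."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - m

# Hypothetical data: 120 rolls of a die, tested against equal probabilities
observed = [25, 17, 15, 23, 24, 16]
expected = [sum(observed) / 6] * 6      # 20 expected per face
chi2, df = goodness_of_fit(observed, expected, m=1)

critical = 11.070                       # tabled chi-square_{0.05} for 5 degrees of freedom
print(chi2, df, chi2 > critical)
```

With these made-up counts the statistic is 5.0 on 5 degrees of freedom, which does not exceed 11.070, so the uniform model would not be rejected at the 0.05 level.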