http://cc.jlu.edu.cn/ms.html
Medical Statistics
Tao Yuchun
2014.3.25

Chapter 9 Statistical Analysis of Enumeration Data
2. Statistical Inference for Enumeration Data
9.1 Sampling error of frequency
• Example
Suppose that rats fed a certain poison die with probability 0.2 (the death rate).
What will happen when we run the experiment on n = 1, 2, 3, or 4 rats?
Frequency distribution of the number of deaths d among n rats, with the sample death rate d/n:

n   d   Probability of d deaths         Sample rate d/n
1   0   0.8                             0/1 = 0
1   1   0.2                             1/1 = 1
2   0   0.8×0.8 = 0.64                  0/2 = 0
2   1   0.8×0.2 + 0.2×0.8 = 0.32        1/2 = 0.5
2   2   0.2×0.2 = 0.04                  2/2 = 1
3   0   0.8×0.8×0.8 = 0.512             0/3 = 0
3   1   3(0.8×0.8×0.2) = 0.384          1/3 ≈ 0.33
3   2   3(0.8×0.2×0.2) = 0.096          2/3 ≈ 0.67
3   3   0.2×0.2×0.2 = 0.008             3/3 = 1
4   0   0.8×0.8×0.8×0.8 = 0.4096        0/4 = 0
4   1   4(0.8×0.8×0.8×0.2) = 0.4096     1/4 = 0.25
4   2   6(0.8×0.8×0.2×0.2) = 0.1536     2/4 = 0.5
4   3   4(0.8×0.2×0.2×0.2) = 0.0256     3/4 = 0.75
4   4   0.2×0.2×0.2×0.2 = 0.0016        4/4 = 1
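Each probability in the table is a binomial probability, $C(n, d)\,0.2^d\,0.8^{n-d}$, where C(n, d) counts the orders in which d deaths can occur (the coefficients 3, 4 and 6 above). A minimal Python sketch that regenerates the table; the function name `death_distribution` is hypothetical, and it assumes, as the example does, that rats die independently with probability 0.2:

```python
from math import comb

def death_distribution(n, pi=0.2):
    """Return (d, probability, sample rate) for d = 0..n deaths among n rats,
    assuming each rat dies independently with probability pi."""
    rows = []
    for d in range(n + 1):
        prob = comb(n, d) * pi**d * (1 - pi)**(n - d)  # binomial probability
        rows.append((d, prob, d / n))
    return rows

for n in range(1, 5):
    for d, prob, rate in death_distribution(n):
        print(f"n={n}  d={d}  P={prob:.4f}  rate={rate:.2f}")
```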
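In general, suppose the population proportion is π and the sample size is n.
• The sample frequency is $P = \frac{X}{n}$, where X, the number of positive outcomes, is a random variable. The standard error of P is
$$\sigma_P = \sqrt{\frac{\pi(1-\pi)}{n}}$$
• When π is unknown and n is large enough, σ_P is approximately equal to
$$S_P = \sqrt{\frac{p(1-p)}{n}}$$
where p is the observed sample frequency.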
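As a check on $\sigma_P$, the exact standard deviation of the sample rate can be computed directly from the binomial probabilities of the rat example and compared with $\sqrt{\pi(1-\pi)/n}$. A sketch with π = 0.2; `exact_se` is a hypothetical helper name:

```python
from math import comb, sqrt

def exact_se(n, pi):
    """Standard deviation of the sample rate X/n, computed directly from the
    binomial probabilities of X = 0..n."""
    probs = [comb(n, x) * pi**x * (1 - pi)**(n - x) for x in range(n + 1)]
    mean = sum(p * x / n for x, p in enumerate(probs))
    var = sum(p * (x / n - mean) ** 2 for x, p in enumerate(probs))
    return sqrt(var)

for n in range(1, 5):
    formula = sqrt(0.2 * (1 - 0.2) / n)   # sigma_P from the formula
    print(n, round(exact_se(n, 0.2), 4), round(formula, 4))
```

For n = 4 both give 0.2. The formula for σ_P is exact; the normal approximation only enters when intervals and tests are built on it.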
Example 9-1 HBV surface antigen: 200 people were tested and 7 were positive.
$$P = \frac{X}{n} = \frac{7}{200} = 3.5\%$$
$$S_P = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.035(1-0.035)}{200}} = 0.0130 = 1.30\%$$
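• In theory
If the sample size n is large enough and the observed frequency is p, then approximately
$$P \sim N\left(\pi, \frac{p(1-p)}{n}\right)$$
that is, P is approximately normal with mean π, and its variance π(1 − π)/n is estimated by p(1 − p)/n.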
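A small simulation can illustrate the normal approximation. The sketch below assumes π = 0.035 and n = 200 (the values of Example 9-1), draws many samples with Python's `random` module, and compares the mean and standard deviation of the simulated P with π and $\sqrt{\pi(1-\pi)/n}$; the seed and the number of repetitions are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)
pi, n, reps = 0.035, 200, 5000

# Each repetition: count positives among n independent subjects, then P = X/n.
sim_p = [sum(random.random() < pi for _ in range(n)) / n for _ in range(reps)]

print(f"mean of P = {mean(sim_p):.4f}  (pi = {pi})")
print(f"SD of P   = {stdev(sim_p):.4f}  (theory = {sqrt(pi * (1 - pi) / n):.4f})")
```

A histogram of `sim_p` would show the roughly bell-shaped curve; with a small nπ the distribution is visibly skewed, which is why the large-sample condition matters.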
9.2 Confidence Interval of Probability
If the sample size n is large enough and the observed frequency is p, then
• 95% confidence interval for π: $p \pm 1.96\sqrt{\frac{p(1-p)}{n}}$
• 99% confidence interval for π: $p \pm 2.58\sqrt{\frac{p(1-p)}{n}}$
Example 9-2 HBV surface antigen: 200 people were tested and 7 were positive. Calculate the confidence intervals for π.
95% CI: $p \pm 1.96\sqrt{\frac{p(1-p)}{n}} = 3.5\% \pm 1.96 \times 1.30\% = (0.95\%,\ 6.05\%)$
99% CI: $p \pm 2.58\sqrt{\frac{p(1-p)}{n}} = 3.5\% \pm 2.58 \times 1.30\% = (0.15\%,\ 6.85\%)$
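• Distinguish between μ and π for sampling error and confidence interval:
Mean: $\bar{X} = \frac{\sum X}{n}$, $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$, $S_{\bar{X}} = \frac{S}{\sqrt{n}}$; confidence interval for μ: $\bar{X} \pm t_{\alpha/2} S_{\bar{X}}$
Proportion: $p = \frac{X}{n}$, $\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$, $S_p = \sqrt{\frac{p(1-p)}{n}}$; confidence interval for π: $p \pm Z_{\alpha/2} S_p$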
9.3 Hypothesis testing of a proportion (Z test)
(1) Comparison of a sample proportion with a population proportion (one-sample Z test)
Example 9-3 Cerebral infarction
Method        Cases   Cure rate
New method    98      50%
Routine               30%
• 50% is the sample proportion, p = 50%.
• 30% is the population proportion, π0 = 30%.
• Hypotheses and α:
H0: π = π0 = 0.3
H1: π ≠ 0.3
α = 0.05
• Statistic Z:
$$Z = \frac{p - \pi_0}{\sqrt{\frac{\pi_0(1-\pi_0)}{n}}} = \frac{0.5 - 0.3}{\sqrt{\frac{0.3(1-0.3)}{98}}} = 4.32$$
• Decision rule: if |Z| ≥ Z_α, reject H0; otherwise, there is no reason to reject H0 (loosely, "accept H0").
• Z_α values:
Two-sided: $Z_{0.05} = 1.96$, $Z_{0.01} = 2.58$
One-sided: $Z_{0.05} = 1.65$, $Z_{0.01} = 2.33$
Since |Z| = 4.32 > $Z_{0.05}$ = 1.96, H0 is rejected. The new method is better than the routine method.
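A sketch of the one-sample Z test for Example 9-3 using only the standard library. `statistics.NormalDist` supplies the standard normal CDF and quantiles; the function name `one_sample_z` is hypothetical, and the two-sided P value is an addition not shown on the slides:

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(p, pi0, n):
    """Z statistic for H0: pi = pi0, using the null standard error sqrt(pi0(1-pi0)/n)."""
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)

std_normal = NormalDist()                   # mean 0, SD 1
z = one_sample_z(0.5, 0.3, 98)
z_crit = std_normal.inv_cdf(1 - 0.05 / 2)   # two-sided critical value, about 1.96
p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided P value

print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, P = {p_value:.2g}")
print("reject H0" if abs(z) >= z_crit else "no reason to reject H0")
```

For a one-sided test at α = 0.05, the critical value would instead be `std_normal.inv_cdf(1 - 0.05)`, about 1.645.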
(2) Comparison of two sample proportions (two-sample Z test)
Example 9-4
Carrier rate of hepatitis: in B City, 522 people were tested and 24 were carriers, p1 = 4.60% (population carrier rate π1); in the countryside, 478 people were tested and 33 were carriers, p2 = 6.90% (population carrier rate π2).
H0: π1 = π2
H1: π1 ≠ π2
α = 0.05
$$p_c = \frac{X_1 + X_2}{n_1 + n_2} = \frac{24 + 33}{522 + 478} = 0.057$$
$$S_{p_1-p_2} = \sqrt{p_c(1-p_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{0.057(1-0.057)\left(\frac{1}{522} + \frac{1}{478}\right)} = 0.0147$$
• Here p_c is the pooled estimate of the two sample proportions, and $S_{p_1-p_2}$ is the standard error of p1 − p2.
• Statistic Z:
$$Z = \frac{p_1 - p_2}{S_{p_1-p_2}} = \frac{0.046 - 0.069}{0.0147} = -1.565$$
• Decision rule: if |Z| ≥ Z_α, reject H0; otherwise, there is no reason to reject H0 (loosely, "accept H0").
Since |Z| = 1.565 < $Z_{0.05}$ = 1.96, H0 is not rejected. We have no reason to say that the population carrier rates of B City and the countryside differ (π1 = π2 is retained).
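The pooled two-sample Z test for Example 9-4 can be sketched as follows (the function name is hypothetical). Working from the raw counts rather than the rounded proportions 0.046 and 0.069 gives Z ≈ −1.57 instead of −1.565; the conclusion is unchanged.

```python
from math import sqrt

def two_sample_z(x1, n1, x2, n2):
    """Pooled Z test for H0: pi1 = pi2. Returns (pooled p_c, SE of p1 - p2, Z)."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)                       # pooled estimate
    se = sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))     # S_{p1-p2}
    return pc, se, (p1 - p2) / se

pc, se, z = two_sample_z(24, 522, 33, 478)
print(f"pc = {pc:.3f}, S = {se:.4f}, Z = {z:.3f}")
print("reject H0" if abs(z) >= 1.96 else "no reason to reject H0")
```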
Summary
• Parameter estimation and hypothesis testing for a proportion are based on the normal approximation (valid when the sample size is large enough).
• How large is large enough?
By experience, nπ > 5 and n(1 − π) > 5.
For a sample: np > 5 and n(1 − p) > 5.
• If the sample size is small, the Z test cannot be used, and there is no t test for proportions (see a more detailed textbook).
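The rule of thumb can be written as a one-line check. A sketch (the helper name `normal_approx_ok` is hypothetical) applied to the sample counts of Examples 9-1 and 9-4, plus a made-up small sample to show a failing case:

```python
def normal_approx_ok(n, p, threshold=5):
    """Rule of thumb from the summary: np > 5 and n(1 - p) > 5."""
    return n * p > threshold and n * (1 - p) > threshold

print(normal_approx_ok(200, 7 / 200))    # Example 9-1: np = 7, n(1-p) = 193
print(normal_approx_ok(522, 24 / 522))   # Example 9-4, B City: np = 24
print(normal_approx_ok(20, 1 / 20))      # hypothetical small sample: np = 1
```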
9.4 Chi-square test
The Z test can only be used to compare π with a given π0 (one sample) or to compare π1 with π2 (two samples).
If we need to compare more than two samples, the chi-square test is widely used.
(1) Basic idea of the χ² test
• Given a set of actual (observed) frequencies A1, A2, A3, …, we want to test whether the data follow a certain theory.
• If the theory is true, we would expect a set of theoretical frequencies T1, T2, T3, …
• Compare A1, A2, A3, … with T1, T2, T3, …: if they are quite different, the theory might not be true; otherwise, the theory is acceptable.
(2) Chi-square test for a 2×2 table
Example 9-5 Acute lower respiratory infection (theoretical frequencies in parentheses)
Treatment   Effective       Not effective   Total        Effective rate
Drug A      68 (64.82) a    6 (9.18) b      74 (a+b)     91.89%
Drug B      52 (55.18) c    11 (7.82) d     63 (c+d)     82.54%
Total       120 (a+c)       17 (b+d)        137 (n)      87.59%
H0: π1 = π2
H1: π1 ≠ π2
• Here π1 is the population effective rate for drug A, and π2 is the population effective rate for drug B.
α = 0.05
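• To calculate the theoretical frequencies: if H0 is true, the two drugs share a common effective rate, estimated by 120/137, so
T11 = 74 × 120/137 = 64.82,  T21 = 63 × 120/137 = 55.18
T12 = 74 × 17/137 = 9.18,  T22 = 63 × 17/137 = 7.82
In general,
$$T_{RC} = \frac{n_R n_C}{n}$$
where n_R is the row total, n_C is the column total and n is the grand total.
• To compare A with T, use the statistic χ²:
$$\chi^2 = \frac{(A_{11}-T_{11})^2}{T_{11}} + \frac{(A_{12}-T_{12})^2}{T_{12}} + \cdots + \frac{(A_{22}-T_{22})^2}{T_{22}}$$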
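The rule $T_{RC} = n_R n_C / n$ applies cell by cell to a table of any size. A minimal sketch (`expected_frequencies` is a hypothetical name) that reproduces the theoretical frequencies of Example 9-5:

```python
def expected_frequencies(table):
    """Theoretical frequencies T_RC = n_R * n_C / n for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

observed = [[68, 6],    # drug A: effective, not effective
            [52, 11]]   # drug B: effective, not effective
for row in expected_frequencies(observed):
    print([round(t, 2) for t in row])
```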
$$\chi^2 = \sum \frac{(A-T)^2}{T}$$
• The chi-square test was invented by Karl Pearson (1857-1936), so it is also called Pearson's chi-square test.
• If H0 is true, χ² approximately follows a chi-square distribution with ν = (rows − 1)(columns − 1) degrees of freedom.
• If the χ² value is large enough, we doubt H0 and reject it.
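Putting the pieces together, Pearson's statistic is the sum of (A − T)²/T over all cells, with ν = (R − 1)(C − 1). A self-contained sketch (the function name is hypothetical) for Example 9-5; it uses unrounded theoretical frequencies, so it returns about 2.738, whereas the slides' 2.734 comes from theoretical frequencies rounded to two decimals:

```python
def pearson_chi_square(table):
    """Return (chi-square, degrees of freedom) for an R x C table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, a in enumerate(row):
            t = row_totals[i] * col_totals[j] / n   # theoretical frequency T
            chi2 += (a - t) ** 2 / t
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = pearson_chi_square([[68, 6], [52, 11]])
print(f"chi2 = {chi2:.3f}, df = {df}")
print("reject H0" if chi2 >= 3.84 else "H0 not rejected")  # 3.84 = chi2_0.05(1)
```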
• For Example 9-5:
$$\chi^2 = \frac{(68-64.82)^2}{64.82} + \frac{(52-55.18)^2}{55.18} + \frac{(6-9.18)^2}{9.18} + \frac{(11-7.82)^2}{7.82} = 2.734$$
ν = (rows − 1)(columns − 1) = (2 − 1)(2 − 1) = 1.
• $\chi^2_\alpha(\nu) = \chi^2_{0.05}(1) = 3.84$. Now χ² = 2.734 < 3.84, so P > 0.05 and H0 is not rejected. We have no reason to say that the effects of the two treatments are different.
• Question: What is $\chi^2_\alpha(\nu)$? Why does χ² < $\chi^2_\alpha(\nu)$ imply P > 0.05?
• The chi-square distribution is a distribution for a continuous variable.
• It has one parameter, ν (the degrees of freedom), which determines the shape of the χ² curve.
• Areas under the χ² curve give χ² probabilities.
[Figure: the χ² curves for different ν (ν = 3, 5, 10, 30)]
• The Table for 2 distribution.
• 2 critical value denotes 2α(ν) , α is probability,
ν is degree of freedom.
•The area under the 2 curve means [for20.05(1)]:
• For 22 table, there is a specific formula of chisquare calculation:
a
b
a+b

ad  bc   n
 
a  bc  d a  c b  d 
2
c
a+c
d
c+d
b+d n
2
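• For Example 9-5:
    68     6      74
    52     11     63
    120    17     137
$$\chi^2 = \frac{(68 \times 11 - 6 \times 52)^2 \times 137}{74 \times 63 \times 120 \times 17} = 2.738$$
The small difference from 2.734 comes from rounding the theoretical frequencies to two decimals in the cell-by-cell calculation.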
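The shortcut formula needs only the four cell counts. A sketch (both function names are hypothetical) that also confirms it agrees with the cell-by-cell sum Σ(A − T)²/T:

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for a 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

def chi_square_cells(a, b, c, d):
    """Same statistic as the sum of (A - T)^2 / T over the four cells."""
    n = a + b + c + d
    cells = [(a, (a + b) * (a + c) / n), (b, (a + b) * (b + d) / n),
             (c, (c + d) * (a + c) / n), (d, (c + d) * (b + d) / n)]
    return sum((obs - t) ** 2 / t for obs, t in cells)

print(round(chi_square_2x2(68, 6, 52, 11), 4), round(chi_square_cells(68, 6, 52, 11), 4))
```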
• The chi-square test requires a large sample.
• Pearson's chi-square statistic follows the chi-square distribution only approximately.
• For a 2×2 table:
(1) If n ≥ 40 and every T ≥ 5, the χ² test is applicable;
(2) If n < 40 or any T < 1, the χ² test is not applicable; use Fisher's exact test instead;
(3) If n ≥ 40 but 1 ≤ T < 5 in some cell, the χ² test needs a continuity correction.
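The three rules can be encoded as a small decision helper. This is only a sketch of the rules of thumb above (the function name and return strings are hypothetical); it computes the smallest theoretical frequency and reports which version of the test the rules point to. The second call uses the data of Example 9-6.

```python
def choose_2x2_test(a, b, c, d):
    """Apply the rules of thumb for a 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    t_min = min(r * k / n for r in rows for k in cols)   # smallest theoretical frequency
    if n < 40 or t_min < 1:
        return "Fisher's exact test"
    if t_min < 5:
        return "corrected chi-square"
    return "Pearson chi-square"

print(choose_2x2_test(68, 6, 52, 11))   # Example 9-5
print(choose_2x2_test(28, 2, 12, 4))    # Example 9-6
```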
• The correction formula of 2 test for 22 table :
 A  T  0.5
2
 
2
T
2
n

 ad  bc    n
2

2
  
a  bc  d a  c b  d 
Example 9-6 Hematosepsis
Treatment   Effective     No effect    Total   Effective rate (%)
Drug A      28 (26.09)    2 (3.91)     30      93.33
Drug B      12 (13.91)    4 (2.09)     16      75.00
Total       40            6            46      86.96
• Here n = 46 > 40, but T12 = 30 × 6/46 = 3.91 < 5 and T22 = 16 × 6/46 = 2.09 < 5.
• You should therefore use the corrected χ² formula for a 2×2 table:
    28    2    30
    12    4    16
    40    6    46
H0: π1 = π2,  H1: π1 ≠ π2,  α = 0.05
$$\chi^2 = \frac{\left(|28 \times 4 - 2 \times 12| - \frac{46}{2}\right)^2 \times 46}{30 \times 16 \times 40 \times 6} = 1.687$$
ν = (2 − 1)(2 − 1) = 1, $\chi^2_{0.05}(1) = 3.84$
χ² = 1.687 < 3.84, so P > 0.05 and H0 is not rejected.
We have no reason to say that the effects of the two treatments are different.
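A sketch of the continuity-corrected statistic for Example 9-6, in both the shortcut form and the cell-by-cell form Σ(|A − T| − 0.5)²/T; the function names are hypothetical. The uncorrected value is printed alongside to show how much the correction shrinks the statistic.

```python
def corrected_2x2(a, b, c, d):
    """Continuity-corrected chi-square: (|ad - bc| - n/2)^2 * n / product of margins."""
    n = a + b + c + d
    return (abs(a * d - b * c) - n / 2) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

def corrected_cells(a, b, c, d):
    """Same statistic as the sum of (|A - T| - 0.5)^2 / T over the four cells."""
    n = a + b + c + d
    cells = [(a, (a + b) * (a + c) / n), (b, (a + b) * (b + d) / n),
             (c, (c + d) * (a + c) / n), (d, (c + d) * (b + d) / n)]
    return sum((abs(obs - t) - 0.5) ** 2 / t for obs, t in cells)

def uncorrected_2x2(a, b, c, d):
    """Uncorrected shortcut chi-square, for comparison."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

corrected = corrected_2x2(28, 2, 12, 4)
print(f"corrected = {corrected:.3f}, uncorrected = {uncorrected_2x2(28, 2, 12, 4):.3f}")
print("reject H0" if corrected >= 3.84 else "H0 not rejected")  # 3.84 = chi2_0.05(1)
```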
(3) Chi-square test for an R×C table
Example 9-7 Leukaemia
Table 9-7 Blood types of patients with different types of leukaemia
Disease               A     B     O     AB    Total
Acute leukaemia       58    49    59    18    184
Chronic leukaemia     43    27    33    8     111
Total                 101   76    92    26    295
H0: The distributions of blood types in the two populations are the same.
H1: The distributions of blood types in the two populations are not all the same.
• The formula of the χ² test statistic for an R×C table:
$$\chi^2 = n\left(\sum \frac{A^2}{n_R n_C} - 1\right)$$
where n_R is the row total and n_C is the column total.
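• For Example 9-7:
$$\chi^2 = 295\left(\frac{58^2}{184 \times 101} + \frac{49^2}{184 \times 76} + \cdots + \frac{8^2}{111 \times 26} - 1\right) = 1.84$$
ν = (R − 1)(C − 1) = (2 − 1)(4 − 1) = 3. From the χ² table, $\chi^2_{0.05}(3) = 7.81$. Now χ² = 1.84 < 7.81, so P > 0.05 and H0 is not rejected. We have no reason to say that the distributions of blood types differ between the two populations.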
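The R×C shortcut $n\left(\sum A^2/(n_R n_C) - 1\right)$ is algebraically the same as Σ(A − T)²/T. A short sketch (the function name is hypothetical) that evaluates it for Example 9-7:

```python
def chi_square_rxc(table):
    """Shortcut chi-square n * (sum A^2 / (n_R * n_C) - 1) and df = (R-1)(C-1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    s = sum(a * a / (row_totals[i] * col_totals[j])
            for i, row in enumerate(table) for j, a in enumerate(row))
    return n * (s - 1), (len(table) - 1) * (len(table[0]) - 1)

blood = [[58, 49, 59, 18],   # acute leukaemia: A, B, O, AB
         [43, 27, 33, 8]]    # chronic leukaemia: A, B, O, AB
chi2, df = chi_square_rxc(blood)
print(f"chi2 = {chi2:.2f}, df = {df}")
print("reject H0" if chi2 >= 7.81 else "H0 not rejected")  # 7.81 = chi2_0.05(3)
```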
•Question: Why 2=1.84 < 20.05(3)=7.81, then
P > 0.05 ?
•The answer is in this figure !
(4) Cautions for the chi-square test
(1) Both 2×2 tables and R×C tables are called contingency tables; the 2×2 table is a special case of the R×C table.
(2) When R > 2, "H0 is rejected" only means that there are differences among some of the groups; it does not necessarily mean that all the groups differ from each other.
(3) The χ² test requires a large sample. By experience:
• The theoretical frequencies should be greater than 5 in more than 4/5 of the cells;
• The theoretical frequency in every cell should be greater than 1.
Otherwise, the chi-square test cannot be used directly.
• If these requirements are violated, what should we do?
(1) Increase the sample size.
(2) Reorganize the categories: pool some categories, or drop some categories.
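The two rules of thumb for an R×C table can be checked directly from the theoretical frequencies. A sketch (the function name is hypothetical), applied to Example 9-7 and to a made-up 2×2 table whose small expected counts fail the first rule:

```python
def chi_square_requirements_met(table):
    """Rules of thumb for an R x C table: more than 4/5 of cells with T > 5,
    and every cell with T > 1."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]
    share_above_5 = sum(t > 5 for t in expected) / len(expected)
    return share_above_5 > 4 / 5 and min(expected) > 1

print(chi_square_requirements_met([[58, 49, 59, 18], [43, 27, 33, 8]]))  # Example 9-7
print(chi_square_requirements_met([[1, 9], [2, 8]]))                      # hypothetical table
```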
• You should know: the chi-square test is a very important method of statistical inference for enumeration data!