Supplement: Splus Programs
Logistic Regression:
>options(contrasts=c("contr.treatment", "contr.poly"))
>import.data(DataFrame="smoke.data",FileName="c://smoke.data.txt",
             FileType="ASCII")
#columns 1-2: two-column binomial response; column 3: parents' smoking status
>student.smoke=as.matrix(smoke.data[,1:2])
>parents.smoke=factor(smoke.data[,3])
>smoke.d=data.frame(student.smoke,parents.smoke)
#fit logistic regression model
>smoke.glm=glm(student.smoke~parents.smoke,data=smoke.d,
               family=binomial(link=logit),na.action=na.fail)
>coefficient<-summary(smoke.glm)$coefficients
>anova(smoke.glm,test="Chisq")
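When a logistic regression has a single factor as its only predictor, as in the fit above, the maximum likelihood estimates under treatment contrasts have a closed form: the intercept is the log-odds in the baseline level and every other coefficient is a log odds ratio against that level. The Python sketch below illustrates this. The contents of smoke.data are not shown in these notes, so the counts, the level names, and the helper names (`treatment_coefficients`, `score`) are all hypothetical.

```python
import math

# Hypothetical (smokers, non-smokers) counts for each level of parental smoking.
# Illustrative numbers only; they are NOT the contents of smoke.data.
counts = {"none": (20, 80), "one": (30, 70), "both": (45, 55)}
baseline = "none"


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def treatment_coefficients(counts, baseline):
    """Closed-form MLEs for a one-factor logistic model, treatment contrasts."""
    yes0, no0 = counts[baseline]
    coefs = {"(Intercept)": logit(yes0 / (yes0 + no0))}
    for level, (yes, no) in counts.items():
        if level != baseline:
            coefs[level] = logit(yes / (yes + no)) - coefs["(Intercept)"]
    return coefs


def score(counts, coefs, baseline):
    """Gradient of the binomial log-likelihood at coefs (zero at the MLE)."""
    grad = {name: 0.0 for name in coefs}
    for level, (yes, no) in counts.items():
        eta = coefs["(Intercept)"] + (0.0 if level == baseline else coefs[level])
        fitted = (yes + no) / (1.0 + math.exp(-eta))  # n_i * p_i
        grad["(Intercept)"] += yes - fitted
        if level != baseline:
            grad[level] += yes - fitted
    return grad


coefs = treatment_coefficients(counts, baseline)
grad = score(counts, coefs, baseline)
for name in coefs:
    print(name, round(coefs[name], 4), "score:", round(grad[name], 10))
```

The score check is there only to confirm that the closed-form values solve the likelihood equations; an iterative fit such as glm should arrive at the same estimates.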
Log-linear Model:
>ship.type=factor(rep(c("A","B","C","D","E"),each=8))
>year.constr=factor(rep(rep(c("1960-64","1965-69","1970-74","1975-79"),
                            each=2),5))
>period.oper=factor(rep(rep(c("1960-74","1975-79"),4),5))
>months.serv=c(127,63,1095,1095,1512,3353,0,2244,
               44882,17176,28609,20370,7064,13099,0,7117,
               1179,552,781,676,783,1948,0,274,
               251,105,288,192,349,1208,0,2051,
               45,0,789,437,1157,2161,0,542)
>damage.number=c(0,0,3,4,6,18,0,11,
                 39,29,58,53,12,44,0,18,
                 1,1,0,1,6,2,0,1,
                 0,0,0,0,2,11,0,4,
                 0,0,7,7,5,12,0,1)
>ship.dataframe=data.frame(damage.number,months.serv,ship.type,
                           year.constr,period.oper)
#log(0) is -Inf, so the offset is set to 0 for cells with no months of service
>log.months.serv=log(months.serv)
>log.months.serv[months.serv==0]=0
>log.ship.dataframe=data.frame(damage.number,log.months.serv,
                               ship.type,year.constr,period.oper)
#fit log-linear model
>ship.glm=glm(damage.number~offset(log.months.serv)+ship.type+
              year.constr+period.oper,data=log.ship.dataframe,
              family=poisson(link=log),na.action=na.fail)
#coefficients and standard errors, with the dispersion fixed at 1.69
>table.6.3=round(summary(ship.glm,dispersion=1.69)$coefficients[,1:2],2)
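The role of the offset is easiest to see in a model simpler than the one fitted above. If ship type were the only factor, the Poisson log-linear model with offset log(months of service) would have maximum likelihood rates equal to total damage incidents divided by total months of service within each type. The Python sketch below computes these crude rates from the same data vectors. The one-factor model and the helper name `crude_rates` are assumptions made for illustration; this is not the model fitted in the listing.

```python
import math

ship_types = ["A", "B", "C", "D", "E"]  # 8 consecutive cells per type
months_serv = [127, 63, 1095, 1095, 1512, 3353, 0, 2244,
               44882, 17176, 28609, 20370, 7064, 13099, 0, 7117,
               1179, 552, 781, 676, 783, 1948, 0, 274,
               251, 105, 288, 192, 349, 1208, 0, 2051,
               45, 0, 789, 437, 1157, 2161, 0, 542]
damage_number = [0, 0, 3, 4, 6, 18, 0, 11,
                 39, 29, 58, 53, 12, 44, 0, 18,
                 1, 1, 0, 1, 6, 2, 0, 1,
                 0, 0, 0, 0, 2, 11, 0, 4,
                 0, 0, 7, 7, 5, 12, 0, 1]


def crude_rates(types, months, damage, per_type=8):
    """Damage incidents per month of service for each ship type.

    Cells with zero service add nothing to either total, so they need
    no special handling here.
    """
    rates = {}
    for k, ship in enumerate(types):
        cells = slice(k * per_type, (k + 1) * per_type)
        rates[ship] = sum(damage[cells]) / sum(months[cells])
    return rates


rates = crude_rates(ship_types, months_serv, damage_number)
for ship in ship_types:
    # log(rate) is on the scale of the linear predictor (log link)
    print(ship, rates[ship], math.log(rates[ship]))
```

Once year of construction and period of operation enter the model, the estimates no longer have this closed form and an iterative fit such as glm is needed.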
Supplement: Exponential Family
Definition (canonical exponential family):
The exponential family has a density of the canonical form
$$f(x_1, x_2, \ldots, x_p; \eta_1, \eta_2, \ldots, \eta_s) = f(x, \eta) = \exp\left[\sum_{j=1}^{s} \eta_j T_j(x) - A(\eta)\right] h(x).$$
Example 1:
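Let $X \sim N(\mu, \sigma^2)$. Then,
$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] = \exp\left[\frac{\mu}{\sigma^2}x - \frac{1}{2\sigma^2}x^2 - \frac{\mu^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\right],$$
where
$$\eta_1 = \frac{\mu}{\sigma^2}, \quad T_1(x) = x, \quad \eta_2 = -\frac{1}{2\sigma^2}, \quad T_2(x) = x^2, \quad A(\eta) = \frac{\mu^2}{2\sigma^2} + \frac{1}{2}\log(2\pi\sigma^2).$$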
Important Result 1:
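$$E[T_j(X)] = \frac{\partial A(\eta)}{\partial \eta_j}$$
and
$$\mathrm{cov}(T_j(X), T_k(X)) = \frac{\partial^2 A(\eta)}{\partial \eta_j \, \partial \eta_k}.$$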
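As a check, Important Result 1 can be applied to the normal density of Example 1. The expression for $A(\eta)$ in terms of the natural parameters is not written out in these notes; it is obtained here by substituting $\sigma^2 = -1/(2\eta_2)$ and $\mu = -\eta_1/(2\eta_2)$ into $A(\eta) = \mu^2/(2\sigma^2) + \frac{1}{2}\log(2\pi\sigma^2)$.

```latex
\begin{align*}
A(\eta) &= -\frac{\eta_1^2}{4\eta_2} + \frac{1}{2}\log\left(-\frac{\pi}{\eta_2}\right),
  \qquad \eta_1 = \frac{\mu}{\sigma^2},\quad \eta_2 = -\frac{1}{2\sigma^2},\\
E[T_1(X)] &= \frac{\partial A(\eta)}{\partial \eta_1} = -\frac{\eta_1}{2\eta_2} = \mu,\\
E[T_2(X)] &= \frac{\partial A(\eta)}{\partial \eta_2}
           = \frac{\eta_1^2}{4\eta_2^2} - \frac{1}{2\eta_2} = \mu^2 + \sigma^2,\\
\mathrm{Var}[T_1(X)] &= \frac{\partial^2 A(\eta)}{\partial \eta_1^2} = -\frac{1}{2\eta_2} = \sigma^2.
\end{align*}
```

These agree with the familiar moments $E(X) = \mu$, $E(X^2) = \mu^2 + \sigma^2$ and $\mathrm{Var}(X) = \sigma^2$.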
Example 2:
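Let $X \sim \mathrm{Binomial}(n, p)$. Then,
$$f(x; p) = \binom{n}{x} p^x (1-p)^{n-x} = \binom{n}{x} \exp\left[x \log p + (n-x)\log(1-p)\right] = \binom{n}{x} \exp\left[x \log\frac{p}{1-p} + n\log(1-p)\right],$$
where
$$\eta_1 = \log\frac{p}{1-p}, \quad T_1(x) = x, \quad h(x) = \binom{n}{x},$$
$$A(\eta_1) = -n\log(1-p) = n\log\left(1+\exp(\eta_1)\right),$$
since
$$\exp(\eta_1) = \frac{p}{1-p} \;\Rightarrow\; 1 + \exp(\eta_1) = \frac{1}{1-p} \;\Rightarrow\; \log\left(1+\exp(\eta_1)\right) = -\log(1-p).$$
Thus,
$$E[T_1(X)] = E(X) = \frac{\partial A(\eta_1)}{\partial \eta_1} = \frac{n\exp(\eta_1)}{1+\exp(\eta_1)} = n \cdot \frac{p/(1-p)}{1/(1-p)} = np,$$
$$\mathrm{Var}[T_1(X)] = \mathrm{Cov}(T_1(X), T_1(X)) = \frac{\partial^2 A(\eta_1)}{\partial \eta_1^2} = \frac{n\exp(\eta_1)}{1+\exp(\eta_1)} - \frac{n\exp(\eta_1)\exp(\eta_1)}{\left(1+\exp(\eta_1)\right)^2} = np - np^2 = np(1-p).$$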
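The binomial calculation can be checked numerically: differentiating $A(\eta_1) = n\log(1+\exp(\eta_1))$ by finite differences should reproduce $np$ and $np(1-p)$. The Python sketch below does this for one illustrative choice of $n$ and $p$. The values and the helper name `derivatives` are assumptions made here, not part of the notes.

```python
import math


def A(eta, n):
    """Log-partition function of Binomial(n, p), with eta = log(p / (1 - p))."""
    return n * math.log1p(math.exp(eta))


def derivatives(f, x, h=1e-4):
    """Central finite-difference estimates of f'(x) and f''(x)."""
    first = (f(x + h) - f(x - h)) / (2 * h)
    second = (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)
    return first, second


n, p = 10, 0.3  # illustrative values only
eta = math.log(p / (1 - p))
mean, var = derivatives(lambda e: A(e, n), eta)

print("dA/d eta     :", mean, " vs n*p       =", n * p)
print("d2A/d eta^2  :", var, " vs n*p*(1-p) =", n * p * (1 - p))
```

The finite-difference values agree with the exact moments only up to discretization and rounding error, so the comparison should be read with a tolerance.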
Important Result 2:
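The moment generating function of $T(X) = (T_1(X), T_2(X), \ldots, T_s(X))$ is
$$M_T(t) = M_T(t_1, t_2, \ldots, t_s) = E\left[\exp\left(t_1 T_1(X) + t_2 T_2(X) + \cdots + t_s T_s(X)\right)\right] = \exp\left[A(\eta + t) - A(\eta)\right]$$
and the cumulant generating function is
$$K_T(t) = \log M_T(t) = A(\eta + t) - A(\eta).$$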
Example 3:
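Let $X \sim \mathrm{Poisson}(\lambda)$. Then,
$$f(x; \lambda) = \frac{\lambda^x \exp(-\lambda)}{x!} = \frac{1}{x!}\exp\left[x\log\lambda - \lambda\right],$$
where
$$\eta_1 = \log\lambda, \quad T_1(x) = x, \quad h(x) = \frac{1}{x!}, \quad A(\eta_1) = \exp(\eta_1) = \lambda.$$
The moment generating function of $T_1(X) = X$ is
$$M_T(t) = \exp\left[A(\eta_1 + t) - A(\eta_1)\right] = \exp\left[\exp(\eta_1 + t) - \exp(\eta_1)\right] = \exp\left\{\exp(\eta_1)\left[\exp(t) - 1\right]\right\} = \exp\left\{\lambda\left[\exp(t) - 1\right]\right\}$$
and the cumulant generating function is
$$K_T(t) = \lambda\left[\exp(t) - 1\right].$$
Note that
$$K_T'(0) = \lambda\exp(t)\big|_{t=0} = \lambda = E(X)$$
and
$$K_T''(0) = \lambda\exp(t)\big|_{t=0} = \lambda = \mathrm{Var}(X).$$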
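The Poisson moment generating function obtained from Important Result 2 can be compared with a direct evaluation of $E[\exp(tX)]$ as a series. The Python sketch below does so for one illustrative pair of values. The helper names, the chosen $\lambda$ and $t$, and the truncation at 200 terms are assumptions made here.

```python
import math


def mgf_formula(t, lam):
    """exp{lam * (exp(t) - 1)}, i.e. exp[A(eta + t) - A(eta)] with A(eta) = exp(eta)."""
    return math.exp(lam * (math.exp(t) - 1.0))


def mgf_direct(t, lam, terms=200):
    """Truncated series sum over x of exp(t*x) * lam**x * exp(-lam) / x!."""
    total = 0.0
    term = math.exp(-lam)  # x = 0 term
    for x in range(terms):
        total += term
        term *= lam * math.exp(t) / (x + 1)  # ratio of term x+1 to term x
    return total


lam, t = 2.5, 0.4  # illustrative values only
print("formula:", mgf_formula(t, lam))
print("series :", mgf_direct(t, lam))
```

The two printed numbers should agree to many digits, since the series terms decay factorially and the truncation error is negligible for moderate $\lambda \exp(t)$.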
Important Result 3:
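$T(X) = (T_1(X), T_2(X), \ldots, T_s(X))$ is distributed according to an exponential family with density
$$f(t_1, t_2, \ldots, t_s; \eta_1, \eta_2, \ldots, \eta_s) = f(t, \eta) = \exp\left[\sum_{j=1}^{s} \eta_j t_j - A(\eta)\right] h^{*}(t),$$
where $h^{*}(t)$ is the base function of the distribution of $T$, which in general differs from $h(x)$.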
Important Result 4:
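$T(X) = (T_1(X), T_2(X), \ldots, T_s(X))$ is sufficient, and it is complete provided the natural parameter space contains an $s$-dimensional open set.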
Supplement: Analysis of Variance
Regression Treatment of One-Way Classification:
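Model I:
Let
$$Y_{ij} = u + t_i + \epsilon_{ij}, \quad i = 1, 2, \ldots, I; \; j = 1, 2, \ldots, J_i, \quad \sum_{i=1}^{I} J_i t_i = 0,$$
and let $\mu_i = u + t_i$. Then, $Y_{ij} = \mu_i + \epsilon_{ij}$.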
The regression model in matrix form is
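$$Y = \begin{bmatrix} Y_{11} \\ Y_{12} \\ \vdots \\ Y_{1J_1} \\ Y_{21} \\ Y_{22} \\ \vdots \\ Y_{2J_2} \\ \vdots \\ Y_{I1} \\ Y_{I2} \\ \vdots \\ Y_{IJ_I} \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 1 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ 0 & 0 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_I \end{bmatrix} + \begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \vdots \\ \epsilon_{1J_1} \\ \epsilon_{21} \\ \epsilon_{22} \\ \vdots \\ \epsilon_{2J_2} \\ \vdots \\ \epsilon_{I1} \\ \epsilon_{I2} \\ \vdots \\ \epsilon_{IJ_I} \end{bmatrix} = X\mu + \epsilon.$$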
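$$X^t X = \begin{bmatrix} J_1 & 0 & \cdots & 0 \\ 0 & J_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_I \end{bmatrix}, \qquad X^t Y = \begin{bmatrix} \sum_{j=1}^{J_1} Y_{1j} \\ \sum_{j=1}^{J_2} Y_{2j} \\ \vdots \\ \sum_{j=1}^{J_I} Y_{Ij} \end{bmatrix} = \begin{bmatrix} J_1 \bar{Y}_1 \\ J_2 \bar{Y}_2 \\ \vdots \\ J_I \bar{Y}_I \end{bmatrix},$$
where $\bar{Y}_i = \dfrac{\sum_{j=1}^{J_i} Y_{ij}}{J_i}$. Then,
$$b = \left(X^t X\right)^{-1} X^t Y = \begin{bmatrix} \bar{Y}_1 \\ \bar{Y}_2 \\ \vdots \\ \bar{Y}_I \end{bmatrix}.$$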
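The normal equations above can be traced on a small unbalanced example. The Python sketch below builds the indicator design matrix, forms $X^t X$ and $X^t Y$, and recovers the group means as $b$. The data values and the helper name `design_matrix` are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical unbalanced data: I = 3 groups with J_i = 3, 2, 4 observations.
groups = [[4.0, 5.0, 6.0], [10.0, 12.0], [1.0, 2.0, 3.0, 6.0]]


def design_matrix(groups):
    """One row per observation, one indicator column per group."""
    n_groups = len(groups)
    rows = []
    for i, obs in enumerate(groups):
        for _ in obs:
            rows.append([1.0 if k == i else 0.0 for k in range(n_groups)])
    return rows


X = design_matrix(groups)
Y = [y for obs in groups for y in obs]
n_groups = len(groups)

# X^t X is diagonal with the group sizes J_i on the diagonal.
XtX = [[sum(row[a] * row[c] for row in X) for c in range(n_groups)]
       for a in range(n_groups)]
# X^t Y holds the group totals J_i * Ybar_i.
XtY = [sum(row[a] * y for row, y in zip(X, Y)) for a in range(n_groups)]
# Because X^t X is diagonal, its inverse acts element by element.
b = [XtY[a] / XtX[a][a] for a in range(n_groups)]

print("X^t X =", XtX)
print("X^t Y =", XtY)
print("b     =", b)
```

Dividing by the diagonal entries is valid only because the indicator columns are orthogonal; a general design would need a proper linear solve.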
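Thus,
$$SS(b) = b^t X^t Y = \begin{bmatrix} \bar{Y}_1 & \bar{Y}_2 & \cdots & \bar{Y}_I \end{bmatrix} \begin{bmatrix} J_1 \bar{Y}_1 \\ J_2 \bar{Y}_2 \\ \vdots \\ J_I \bar{Y}_I \end{bmatrix} = \sum_{i=1}^{I} J_i \bar{Y}_i^2$$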
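and
$$\begin{aligned}
RSS(\text{model I}) &= \text{residual sum of squares} = Y^t Y - SS(b) \\
&= \sum_{i=1}^{I}\sum_{j=1}^{J_i} Y_{ij}^2 - \sum_{i=1}^{I} J_i \bar{Y}_i^2 \\
&= \sum_{i=1}^{I}\sum_{j=1}^{J_i} \left(Y_{ij}^2 - \bar{Y}_i^2\right) \\
&= \sum_{i=1}^{I}\sum_{j=1}^{J_i} \left(Y_{ij} - \bar{Y}_i\right)^2,
\end{aligned}$$
since
$$\begin{aligned}
\sum_{j=1}^{J_i} \left(Y_{ij} - \bar{Y}_i\right)^2 &= \sum_{j=1}^{J_i} \left(Y_{ij}^2 - 2Y_{ij}\bar{Y}_i + \bar{Y}_i^2\right) \\
&= \sum_{j=1}^{J_i} Y_{ij}^2 - 2\bar{Y}_i \sum_{j=1}^{J_i} Y_{ij} + J_i \bar{Y}_i^2 \\
&= \sum_{j=1}^{J_i} Y_{ij}^2 - 2J_i\bar{Y}_i^2 + J_i \bar{Y}_i^2 \\
&= \sum_{j=1}^{J_i} \left(Y_{ij}^2 - \bar{Y}_i^2\right).
\end{aligned}$$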
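To test
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_I = u \iff H_0: t_1 = t_2 = \cdots = t_I = 0,$$
the reduced model under $H_0$ is
$$Y_{ij} = u + \epsilon_{ij} \quad (\text{model 1}).$$
Thus,
$$\hat{Y}_{ij} = \bar{Y} = \frac{\sum_{i=1}^{I}\sum_{j=1}^{J_i} Y_{ij}}{\sum_{i=1}^{I} J_i},$$
$$SS(u) = \text{regression sum of squares under } H_0 = n\bar{Y}^2, \quad \text{where } n = \sum_{i=1}^{I} J_i,$$
$$RSS(\text{model 1}) = \text{residual sum of squares under } H_0 = \sum_{i=1}^{I}\sum_{j=1}^{J_i} \left(Y_{ij} - \hat{Y}_{ij}\right)^2 = \sum_{i=1}^{I}\sum_{j=1}^{J_i} \left(Y_{ij} - \bar{Y}\right)^2.$$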
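$$\begin{aligned}
RSS(\text{model 1}) - RSS(\text{model I}) &= \sum_{i=1}^{I}\sum_{j=1}^{J_i} \left(Y_{ij} - \bar{Y}\right)^2 - \sum_{i=1}^{I}\sum_{j=1}^{J_i} \left(Y_{ij} - \bar{Y}_i\right)^2 \\
&= \sum_{i=1}^{I}\sum_{j=1}^{J_i} \left[\left(Y_{ij}^2 - 2Y_{ij}\bar{Y} + \bar{Y}^2\right) - \left(Y_{ij}^2 - 2Y_{ij}\bar{Y}_i + \bar{Y}_i^2\right)\right] \\
&= \sum_{i=1}^{I} \left[2J_i\bar{Y}_i\left(\bar{Y}_i - \bar{Y}\right) - J_i\bar{Y}_i^2 + J_i\bar{Y}^2\right] \\
&= \sum_{i=1}^{I} J_i\left(2\bar{Y}_i^2 - 2\bar{Y}_i\bar{Y} - \bar{Y}_i^2 + \bar{Y}^2\right) \\
&= \sum_{i=1}^{I} J_i\left(\bar{Y}_i^2 - 2\bar{Y}_i\bar{Y} + \bar{Y}^2\right) \\
&= \sum_{i=1}^{I} J_i\left(\bar{Y}_i - \bar{Y}\right)^2.
\end{aligned}$$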
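Then, the F statistic is
$$F = \frac{\left[RSS(\text{model 1}) - RSS(\text{model I})\right] / (I-1)}{RSS(\text{model I}) / (n-I)} = \frac{\sum_{i=1}^{I} J_i\left(\bar{Y}_i - \bar{Y}\right)^2 / (I-1)}{\sum_{i=1}^{I}\sum_{j=1}^{J_i}\left(Y_{ij} - \bar{Y}_i\right)^2 / (n-I)}.$$
When $H_0$ is true, $F \sim F_{I-1,\, n-I}$.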
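The F statistic and the identity behind its numerator can be checked on a small unbalanced data set. The Python sketch below computes both residual sums of squares directly, confirms that their difference equals the between-group sum $\sum_i J_i(\bar{Y}_i - \bar{Y})^2$, and forms F. The data and the function name `one_way_anova` are hypothetical and not taken from the notes.

```python
# Hypothetical unbalanced data: I = 3 groups with J_i = 3, 2, 4 observations.
groups = [[4.0, 5.0, 6.0], [10.0, 12.0], [1.0, 2.0, 3.0, 6.0]]


def one_way_anova(groups):
    """Return RSS of the full and reduced models, the between-group SS, and F."""
    n = sum(len(obs) for obs in groups)
    n_groups = len(groups)
    grand_mean = sum(sum(obs) for obs in groups) / n
    means = [sum(obs) / len(obs) for obs in groups]

    # Full model (one mean per group): residuals around the group means.
    rss_full = sum((y - m) ** 2 for obs, m in zip(groups, means) for y in obs)
    # Reduced model under H0 (common mean): residuals around the grand mean.
    rss_reduced = sum((y - grand_mean) ** 2 for obs in groups for y in obs)
    # Between-group sum of squares, sum of J_i * (Ybar_i - Ybar)^2.
    between = sum(len(obs) * (m - grand_mean) ** 2
                  for obs, m in zip(groups, means))

    f_stat = (between / (n_groups - 1)) / (rss_full / (n - n_groups))
    return rss_full, rss_reduced, between, f_stat


rss_full, rss_reduced, between, f_stat = one_way_anova(groups)
print("RSS(full)            =", rss_full)
print("RSS(reduced)         =", rss_reduced)
print("RSS(reduced) - full  =", rss_reduced - rss_full)
print("between-group SS     =", between)
print("F                    =", f_stat)
```

The statistic would then be referred to the F distribution with $I-1$ and $n-I$ degrees of freedom, which are 2 and 6 for this toy data set.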
Homework
1. The following table refers to 661 children with birth weights between 650 g
and 1749 g, all of whom survived for at least one year. The variables of interest are:
Cardiac: mild heart problems of the mother during pregnancy
Comps: gynaecological problems during pregnancy
Smoking: mother smoked at least one cigarette per day during the first
months of pregnancy.
BW: birth weight less than 1250 g
Cardiac    Yes                     No
Comps      Yes         No          Yes         No
Smoking    Yes   No    Yes   No    Yes   No    Yes   No
BW  Yes     10   25     12   15     18   12     42   45
    No       7    5     22   19     10   12    202  205
Analyze the data and interpret the relationship between the children's birth
weights and the mothers' habits and health conditions.
2. The Splus built-in data frame Insurance (in the library MASS) gives the
numbers of policyholders, Holders, and the numbers of car insurance claims made
by those policyholders, Claims. There are three explanatory variables: District
(four levels), Group (of car, four levels), and Age (four ordered levels).
Analyze the data with models up to the three-way interaction, using the offset
log(Holders). Which factors determine the number of claims?