Occurrence and timing of events depend on exposure
Risk depends on exposure
Exposure to the risk of an event
Person-age record file (Educ and MS are time-varying covariates)

ID  Age  Educ  MS  Exposure
 1   13     1   0       1
 1   14     1   0       1
 1   15     1   0       1
 1   16     1   0       1
 1   17     1   0       1
 1   18     1   0       1
 1   19     2   0       1
 1   20     2   0       1
 1   21     2   0       1
 1   22     2   0       1
 1   23     2   0       0.5
 2   13     1   0       1
 2   14     1   0       1
 2   15     1   0       1
 2   16     1   0       1
 2   17     1   0       1
 2   18     0   0       1
 2   19     0   1       0.5
 3   13     1   0       1
 3   14     1   0       1
 3   15     1   0       1
 3   16     1   0       1
 3   17     1   0       1
 3   18     0   0       1
 3   19     0   0       1
 3   20     2   0       1
 3   21     2   1       0.5
 4   13     1   0       1
 4   14     1   0       1
 4   15     0   0       1
 4   16     0   0       1
 4   17     0   0       1
 4   18     0   0       1
 4   19     0   0       1
 4   20     0   1       0.5
Age at first marriage and age at change in education: person-years file
Educ: 0 = not in school full-time
      1 = secondary education
      2 = postsecondary education
Marriage [MS]: 0 = not married
               1 = married
Censoring: last record of ID 1 (age 23, MS = 0)
Event: last records of IDs 2, 3 and 4 (MS = 1)
Events and exposures

EDUC    Events (marriages)  Exposure
1                        0        18
2                        1         6
0                        2         9
Total                    3        33

O/E rate = 3/33 = 0.0909
All age periods prior to marriage and age at marriage are included.
Source: Yamaguchi, 1991, p. 22
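The events-and-exposures table can be reproduced from the person-age records. The sketch below is only an illustration: the compact `persons` layout and all variable names are assumptions made for this example, while the values are transcribed from the person-age file (one record per age, starting at age 13).

```python
from collections import defaultdict

# Each person: (educ, ms, exposure) for consecutive ages starting at 13.
# Layout and names are illustrative; values come from the person-age file.
persons = {
    1: [(1, 0, 1)] * 6 + [(2, 0, 1)] * 4 + [(2, 0, 0.5)],               # censored at 23
    2: [(1, 0, 1)] * 5 + [(0, 0, 1)] + [(0, 1, 0.5)],                   # marries at 19
    3: [(1, 0, 1)] * 5 + [(0, 0, 1)] * 2 + [(2, 0, 1)] + [(2, 1, 0.5)], # marries at 21
    4: [(1, 0, 1)] * 2 + [(0, 0, 1)] * 5 + [(0, 1, 0.5)],               # marries at 20
}

events = defaultdict(int)      # marriages per education category
exposure = defaultdict(float)  # person-years at risk per education category

for rows in persons.values():
    for educ, ms, expo in rows:
        events[educ] += ms
        exposure[educ] += expo

# Occurrence/exposure rate per category, and overall
rates = {educ: events[educ] / exposure[educ] for educ in exposure}
total_events = sum(events.values())
total_exposure = sum(exposure.values())
overall_rate = total_events / total_exposure

for educ in (1, 2, 0):
    print(educ, events[educ], exposure[educ], round(rates[educ], 4))
print("Total", total_events, total_exposure, round(overall_rate, 4))
```

Half a year of exposure is assigned to the age at which a person marries or is censored, which is why the last record of each person contributes 0.5.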
Exposure: examples
• To risk of conception
• To risk of infection (e.g. malaria, HIV)
• To marriage
• To risk of divorce
• To risk of dying
• Health risk
Exposure to risk
Whenever an event or act gives rise to gain or loss that cannot be predicted
Risk of the unexpected
Williams et al., 1995, Risk management and insurance, McGraw-Hill, New York, p. 16
Exposure analysis
• Being exposed or not
• If exposed, level of exposure (intensity)
• Factors affecting level of exposure (e.g. age, contacts, etc.)
• Interventions may affect level of exposure
– Contraceptives and sterilisation are used to prevent unwanted pregnancies
– Breastfeeding prolongs postpartum amenorrhoea (PPA)
– Immunisation prevents (reduces) risk of infectious disease
– Lifestyle reduces/increases risk of lung cancer
• Which mechanism(s) determine level of exposure
– e.g. Breastfeeding stimulates production of prolactin hormone, which inhibits ovulation
Hobcraft and Little, ??
Risk levels and differentials
Risk measures
Prediction of risk levels
Determinants of differential risk levels
Risk = potential variation in outcome
(Objective) risk measures
• Count: number of events during given period (observation window)
– Count data
• Probability: probability of an outcome: proportion of risk set experiencing a given outcome (event) at least once
– Basis = risk set
– Risk set = all persons at risk at given point in time
• Rate: number of events per time unit of exposure (person-time)
– Basis: duration of exposure (duration at risk)
– Rate (general) = change in one quantity per unit change in another quantity (usually time; other possible measures include space, miles travelled)
Risk measures
• Difference of probabilities: p1 - p2 (risk difference)
• Relative risk: ratio of probabilities (focus: risk factor)
– prob. of event in presence of risk factor / prob. of event in absence of risk factor (control group; reference category): p1 / p2
• Odds: odds on an outcome: ratio of favourable outcomes to unfavourable outcomes. Chance of one outcome rather than another: p1 / (1-p1)
The odds are what matter when placing a bet on a given outcome, i.e. when something is at stake. Odds reflect the degree of belief in a given outcome.
Relation odds and relative risk: Agresti, 1996, p. 25
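A minimal sketch of these measures, applied to the proportions of females and males who left home early (135/278 and 74/252, taken from the leaving-home table in these slides). The function names are illustrative. The last line checks an identity that follows directly from the definitions, OR = RR × (1 - p2)/(1 - p1); it is offered here as plain algebra, not as a quotation of the cited page.

```python
def risk_difference(p1, p2):
    """Difference of probabilities p1 - p2."""
    return p1 - p2

def relative_risk(p1, p2):
    """Ratio of probabilities p1 / p2 (p2 is the reference category)."""
    return p1 / p2

def odds(p):
    """Odds on an outcome with probability p."""
    return p / (1 - p)

def odds_ratio(p1, p2):
    """Ratio of the odds in the target group to the odds in the reference group."""
    return odds(p1) / odds(p2)

# Proportions leaving home early (before age 20)
p_females = 135 / 278
p_males = 74 / 252

rd = risk_difference(p_females, p_males)
rr = relative_risk(p_females, p_males)
or_ = odds_ratio(p_females, p_males)

# Identity linking odds ratio and relative risk
or_from_rr = rr * (1 - p_males) / (1 - p_females)

print(round(rd, 3), round(rr, 3), round(or_, 3), round(or_from_rr, 3))
```

The odds ratio is further from 1 than the relative risk here because the outcome is common; the two measures approach each other only when both probabilities are small.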
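Risk measures
• Odds: two categories (binary data)
$$\text{Odds} = \frac{p}{1-p} \qquad (\text{range [odds scale]}: 0 \ldots \infty)$$
$$p = \frac{\text{Odds}}{1+\text{Odds}} = \frac{1}{1+\text{Odds}^{-1}}$$
$$\ln(\text{odds}) = \ln\frac{p}{1-p} = \operatorname{logit}(p) = \eta \qquad (\text{range}: -\infty, +\infty)$$
$$p = \frac{\exp[\eta]}{1+\exp[\eta]} = \frac{1}{1+\exp[-\eta]}$$
In regression analysis, η is the linear predictor: $\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$
Parameters of logistic regression: ln(odds) and ln(odds ratio)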
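The conversions between probability, odds and logit can be sketched in a few lines. The function names are illustrative and the coefficients `beta0` and `beta1` are hypothetical values chosen only to show how a linear predictor maps back to a probability.

```python
import math

def odds(p):
    """Odds = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1 - p)

def prob_from_odds(o):
    """Inverse of odds(): p = Odds / (1 + Odds)."""
    return o / (1 + o)

def logit(p):
    """ln(odds), defined for 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """p = 1 / (1 + exp(-eta))."""
    return 1 / (1 + math.exp(-eta))

# Tabulate the three scales for a few probabilities
for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(p, round(odds(p), 3), round(logit(p), 3))

# Hypothetical linear predictor eta = beta0 + beta1 * x
beta0, beta1 = -1.0, 0.5
for x in (0, 1, 2):
    print(x, round(inv_logit(beta0 + beta1 * x), 3))
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0, and the logit is symmetric around that point: logit(1 - p) = -logit(p).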
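Risk measures
• Odds: multiple categories (polytomous data)
Select category 3 as reference category
$$\text{Odds}_1 = \frac{p_1}{p_3}, \qquad \text{Odds}_2 = \frac{p_2}{p_3}, \qquad \text{Odds}_3 = \frac{p_3}{p_3} = 1$$
$$\ln(\text{odds}_1) = \ln\frac{p_1}{p_3} = \operatorname{logit}(p_1) = \eta_1$$
$$\ln(\text{odds}_2) = \ln\frac{p_2}{p_3} = \operatorname{logit}(p_2) = \eta_2$$
$$\ln p_1 - \ln p_3 = \eta_1 \;\Rightarrow\; p_1 = \exp[\eta_1]\, p_3$$
$$p_3 = 1 - p_1 - p_2$$
$$p_1 = \frac{\exp[\eta_1]}{1 + \exp[\eta_1] + \exp[\eta_2]} = \frac{\exp[\eta_1]}{\sum_j \exp[\eta_j]}$$
$$p_i = \frac{\exp[\eta_i]}{\sum_j \exp[\eta_j]} \qquad (\eta_3 = 0 \text{ for the reference category})$$
Parameters of logistic regression: ln(odds) and ln(odds ratio)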
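A short sketch of the multi-category case, with the last category as reference (linear predictor fixed at 0). The function names and the predictor values 0.5 and -1.0 are hypothetical, chosen only to show that the probabilities sum to one and that the logits relative to the reference category are recovered.

```python
import math

def multinomial_probs(etas):
    """Probabilities from linear predictors of the non-reference categories.

    The reference category is placed last and has eta = 0, so exp(eta) = 1.
    """
    exps = [math.exp(eta) for eta in etas] + [1.0]
    denom = sum(exps)
    return [e / denom for e in exps]

def baseline_logits(probs):
    """ln(p_i / p_ref) for every non-reference category (reference is last)."""
    ref = probs[-1]
    return [math.log(p / ref) for p in probs[:-1]]

etas = [0.5, -1.0]            # hypothetical eta_1 and eta_2
probs = multinomial_probs(etas)
recovered = baseline_logits(probs)

print([round(p, 4) for p in probs])
print([round(e, 4) for e in recovered])
```

With two categories the same functions reduce to the binary logit: `multinomial_probs([eta])` returns the pair (p, 1 - p).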
Risk measures
• Odds ratio: ratio of odds (focus: risk indicator, covariate)
– odds in target group / odds in control group [reference category]: ratio of favourable to unfavourable outcomes in the target group over the same ratio in the control group. The odds ratio measures the ‘belief’ in a given outcome in two different populations or under two different conditions. If the odds ratio is one, the two populations or conditions are similar.
Target group: k=1; Control group: k=2
$$\theta_{12} = \frac{\text{Odds}_{k=1}}{\text{Odds}_{k=2}} = \frac{p_{1 \mid k=1} \,/\, p_{2 \mid k=1}}{p_{1 \mid k=2} \,/\, p_{2 \mid k=2}}$$
Parameters of logistic regression: ln(odds) and ln(odds ratio)
Risk measures in epidemiology
• Prevalence: proportion (refers to status)
• Incidence rate: rate at which events (new cases) occur over a defined time period [events per person-time]. Incidence rate is also referred to as incidence density (e.g. Young, 1998, p. 25; Goldhaber and Fireman, 1991).
• Case-fatality ratio: proportion of sick people who die of a disease (measure of severity of disease). It is not a rate (Young, 1998, p. 27).
Confusion:
Birth defect prevalence: proportion of live births having defects
Birth defect incidence: rate of development of defects among all embryos over the period of gestation (Young, 1998, p. 48)
Risk measures in epidemiology
• Attributable risk (among the exposed): proportion of events (diseases) attributable to being exposed: [p1-p2]/p1 (since non-exposed can also develop disease)
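As a worked illustration with hypothetical values (not taken from the source), suppose the probability of disease is p1 = 0.20 among the exposed and p2 = 0.05 among the non-exposed:

```latex
\[
\text{Attributable risk} = \frac{p_1 - p_2}{p_1} = \frac{0.20 - 0.05}{0.20} = 0.75
\]
```

Under these assumed values, three quarters of the cases among the exposed would be attributed to the exposure; the remaining quarter corresponds to the risk that the non-exposed also carry.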
(Subjective) risk measures
• Subjective probability: degree of belief about the outcome of a trial or process, or about the future. It is the perception of the probability of an outcome or event. ‘It is highly dependent on judgment’ (Keynes, 1921, A treatise on probability, Macmillan, London). Keynes regarded probability as a subjective concept: our judgment (intuition, gut feeling) about the likelihood of the outcome.
– See also value-expectancy theory: attractiveness of an alternative (option) depends on the subjective probability of an outcome and the value or utility of the outcome (Fishbein and Ajzen, 1975).
In case of multiple categories, select a reference category
Reference category is coded 0
Various coding schemes!
Coding schemes
• Contrast coding: one category is reference category (simple contrast coding; dummy coding). Model parameters are deviations from reference category.
• Indicator variable coding: indicator (0,1) variables
• Cornered effect coding (Wrigley, 1985, pp. 132-136) [0,1]
• Effect coding: the mean is the reference. Model parameters are deviations from the mean.
• Centred effect coding (Wrigley, 1985, pp. 132-136) [-1,+1]
• Other types of coding: see e.g. SPSS Advanced Statistics, Appendix A
Vermunt, 1997, p. 10
Coding schemes
• Categories are coded:
– Binary: [0,1], [-1,+1], [1,2]
– Multiple: [0,1,2,3,..], [set of binary]
e.g. 3 categories:
0 0 0
0 1 0
0 0 1
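The two main schemes can be sketched for a three-category variable. This is an illustration under stated assumptions: the helper names are hypothetical, the education labels are borrowed from the marriage example, and the helpers return k-1 columns (the usual way of writing the constraint that the slide shows as a column of zeros for the reference category).

```python
def dummy_coding(categories, reference):
    """Cornered (0/1) coding: the reference category is coded 0 on every column."""
    others = [c for c in categories if c != reference]
    return {c: [1 if c == o else 0 for o in others] for c in categories}

def effect_coding(categories, reference):
    """Centred (-1/+1) coding: the reference category is coded -1 on every column."""
    others = [c for c in categories if c != reference]
    return {
        c: [-1] * len(others) if c == reference else [1 if c == o else 0 for o in others]
        for c in categories
    }

levels = ["not in school", "secondary", "postsecondary"]
dummy = dummy_coding(levels, reference="not in school")
effect = effect_coding(levels, reference="not in school")

for level in levels:
    print(f"{level:15s} dummy={dummy[level]} effect={effect[level]}")
```

With dummy coding each parameter is a deviation from the reference category; with effect coding the columns sum to zero over the categories, so each parameter is a deviation from the (unweighted) mean.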
Coding schemes
Selection of reference category depends on research question
Example
Number of young adults leaving home by age and sex, Netherlands, 1961 birth cohort

Age                     Females  Males  Total
Early (LT 20)               135     74    209
Late (GE 20)                143    178    321
Total                       278    252    530
Censored at interview        13     40     53
TOTAL                       291    292    583

The survey (Sept. 1987 - Febr. 1988):
Sample of 583 young adults born in 1961
530 left home before survey
53 censored cases
Descriptive statistics
Young adults leaving home by age and sex, Netherlands, 1961 birth cohort

A. Counts
Sex       Early (LT 20)  Late (GE 20)  Total
Females             135           143    278
Males                74           178    252
Total               209           321    530

B. Probabilities
Sex       Early (LT 20)  Late (GE 20)  Total
Females            0.49          0.51   1.00
Males              0.29          0.71   1.00
F+M                0.39          0.61   1.00

C. ODDS and LOGIT
Sex       ODDS: Early/Late  LOGIT: Early/Late
Females               0.94             -0.058
Males                 0.42             -0.878
F+M                   0.65             -0.429
Reference categories: Late [≥ 20], Males
Odds on leaving home early (rather than late)      Logit
- Males: 74/178 = 0.416                            -0.878
- Females: 135/143 = 0.944                         -0.058
Odds ratio (θ): 0.944/0.416 = 2.27                  0.820
(if we bet that a person leaves home early, we should bet on females; they are the ‘winners’ - leave home early)
Var(θ) = θ² [1/135+1/143+1/74+1/178] = 0.173
ln θ = 0.820
Var(ln θ) = 1/135+1/143+1/74+1/178 = 0.0335 (Selvin, 1991, p. 345)
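The odds ratio and its variance can be checked directly from the four cell counts. The sketch below follows the formulas on the slide. The 95% confidence interval at the end is an addition for illustration: it assumes a normal approximation on the log scale and is not part of the source.

```python
import math

# Cell counts from the leaving-home table
a, b = 135, 143   # females: early, late
c, d = 74, 178    # males (reference): early, late

theta = (a / b) / (c / d)               # odds ratio, females vs males
log_theta = math.log(theta)             # ln odds ratio
var_log_theta = 1/a + 1/b + 1/c + 1/d   # variance of ln(theta)
var_theta = theta**2 * var_log_theta    # approximate variance of theta

# Assumption: normal approximation for ln(theta), 95% interval
se = math.sqrt(var_log_theta)
ci_low = math.exp(log_theta - 1.96 * se)
ci_high = math.exp(log_theta + 1.96 * se)

print(round(theta, 3), round(log_theta, 3))
print(round(var_log_theta, 4), round(var_theta, 3))
print(round(ci_low, 2), round(ci_high, 2))
```

Working on the log scale is convenient because ln θ is symmetric around 0: swapping the reference category changes only its sign, while the variance stays the same.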
Leaving home
Table: Number of young adults leaving home by age and sex

Age     Females  Males  Total
< 20        135     74    209
≥ 20        143    178    321
Total       278    252    530

Are males more likely to leave home early than females?
Dummy coding: reference category: (i) females; (ii) leaving home late
Logit model:
$$\operatorname{logit}(p_i) = \ln\frac{p_i}{1-p_i}$$
i is sex (i=1 for females and 2 for males)
ODDS
Females (reference): 135/143 = 0.9440
Males: 74/178 = 0.4157
ODDS RATIO
ODDSmales/ODDSfemales = 0.4157/0.9440 = 0.4404
LOGIT p is ln(0.9440) = -0.05757 for females and ln(0.4157) = -0.8777 for males
Ln odds ratio = -0.8201
NOTE that -0.8777 = -0.05757 - 0.8201
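The note above is the logit model written out: the logit for males equals the logit for the reference category plus the ln odds ratio. A minimal sketch, assuming a dummy variable `male` (0 for females, 1 for males) and the illustrative parameter names `beta0` and `beta1`, which do not appear in the source:

```python
import math

odds_females = 135 / 143   # reference category
odds_males = 74 / 178

beta0 = math.log(odds_females)               # logit for females
beta1 = math.log(odds_males / odds_females)  # ln odds ratio, males vs females

def prob_early(male):
    """Probability of leaving home early from logit(p) = beta0 + beta1 * male."""
    eta = beta0 + beta1 * male
    return 1 / (1 + math.exp(-eta))

print(round(beta0, 5), round(beta1, 4), round(beta0 + beta1, 4))
print(round(prob_early(0), 3), round(prob_early(1), 3))
```

Because the model has as many parameters as there are groups, the predicted probabilities reproduce the observed proportions 135/278 and 74/252 exactly.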
Relation probabilities, odds and logit
[Figure: Odds and probabilities. Odds (scale 0.0 to 10.0) and logit (scale -2.5 to 2.5) plotted against probability (x-axis: Probability, 0.0 to 1.0)]
Risk analysis: models
Prediction of risk levels and differential risk levels
Probability models and regression models
– Counts → Poisson r.v. → Poisson distribution → Poisson regression / log-linear model
– Probabilities → binomial and multinomial r.v. → binomial and multinomial distribution → logistic regression / logit model (parameter p, probability of occurrence, is also called risk; e.g. Clayton and Hills, 1993, p. 7)
– Rates → Occurrences/exposure → Poisson r.v. → log-rate model