Log-linear modeling and missing data

Occurrence and timing of events depend on exposure
Risk depends on exposure

Exposure to the risk of an event

Person-age record file (person-years file) with time-varying covariates: age at first marriage and age at change in education

ID  Age  Educ  MS  Exposure  Note
1   13   1     0   1
1   14   1     0   1
1   15   1     0   1
1   16   1     0   1
1   17   1     0   1
1   18   1     0   1
1   19   2     0   1
1   20   2     0   1
1   21   2     0   1
1   22   2     0   1
1   23   2     0   0.5       Censoring
2   13   1     0   1
2   14   1     0   1
2   15   1     0   1
2   16   1     0   1
2   17   1     0   1
2   18   0     0   1
2   19   0     1   0.5       Event
3   13   1     0   1
3   14   1     0   1
3   15   1     0   1
3   16   1     0   1
3   17   1     0   1
3   18   0     0   1
3   19   0     0   1
3   20   2     0   1
3   21   2     1   0.5       Event
4   13   1     0   1
4   14   1     0   1
4   15   0     0   1
4   16   0     0   1
4   17   0     0   1
4   18   0     0   1
4   19   0     0   1
4   20   0     1   0.5       Event

Educ: 0 = not in school full-time; 1 = secondary education; 2 = postsecondary education
Marriage [MS]: 0 = not married; 1 = married

Events and exposures

EDUC   Events (marriages)  Exposure
1      0                   18
2      1                   6
0      2                   9
Total  3                   33

O/E rate = 3/33 = 0.0909
All age periods prior to marriage and age at marriage are included.
Source: Yamaguchi, 1991, p. 22

Exposure: examples
• To risk of conception
• To risk of infection (e.g. malaria, HIV)
• To marriage
• To risk of divorce
• To risk of dying
• Health risk

Exposure to risk
Exposure to risk exists whenever an event or act gives rise to gain or loss that cannot be predicted.
Risk of the unexpected
Williams et al., 1995, Risk management and insurance, McGraw-Hill, New York, p. 16

Exposure analysis
• Being exposed or not
• If exposed, level of exposure (intensity)
• Factors affecting level of exposure (e.g. age, contacts, etc.)
• Interventions may affect level of exposure
  – Contraceptives and sterilisation are used to prevent unwanted pregnancies
  – Breastfeeding prolongs postpartum amenorrhoea (PPA)
  – Immunisation prevents (reduces) risk of infectious disease
  – Lifestyle reduces/increases risk of lung cancer
• Which mechanism(s) determines level of exposure
  – e.g. Breastfeeding stimulates production of prolactin hormone, which inhibits ovulation
Hobcraft and Little, ??
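As a check on the events-and-exposures table from the person-age record file, the following minimal sketch aggregates events and exposure by education level and computes the overall occurrence/exposure rate. The record tuples are transcribed from the file above; the variable names (`records`, `events`, `exposure`, `overall_rate`) are illustrative assumptions, not part of the source.

```python
from collections import defaultdict

# One (educ, ms, exposure) tuple per person-age record, transcribed from the
# person-age record file. Names are illustrative, not from the source.
records = (
    # ID 1: ages 13-23, censored at age 23 (half a year of exposure)
    [(1, 0, 1.0)] * 6 + [(2, 0, 1.0)] * 4 + [(2, 0, 0.5)]
    # ID 2: ages 13-19, marries at age 19
    + [(1, 0, 1.0)] * 5 + [(0, 0, 1.0)] + [(0, 1, 0.5)]
    # ID 3: ages 13-21, marries at age 21
    + [(1, 0, 1.0)] * 5 + [(0, 0, 1.0)] * 2 + [(2, 0, 1.0)] + [(2, 1, 0.5)]
    # ID 4: ages 13-20, marries at age 20
    + [(1, 0, 1.0)] * 2 + [(0, 0, 1.0)] * 5 + [(0, 1, 0.5)]
)

events = defaultdict(int)      # marriages per education level
exposure = defaultdict(float)  # person-years at risk per education level

for educ, ms, person_years in records:
    events[educ] += ms
    exposure[educ] += person_years

total_events = sum(events.values())
total_exposure = sum(exposure.values())
overall_rate = total_events / total_exposure  # occurrence/exposure (O/E) rate

for educ in (1, 2, 0):
    print(f"Educ {educ}: events={events[educ]}, exposure={exposure[educ]}")
print(f"Total: events={total_events}, exposure={total_exposure}, "
      f"O/E rate={overall_rate:.4f}")
```

Each record contributes its exposure to the education level held at that age, so a person who changes education contributes person-time to more than one row of the table.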
Risk levels and differentials
• Risk measures
• Prediction of risk levels
• Determinants of differential risk levels

Risk = potential variation in outcome

(Objective) risk measures
• Count: number of events during a given period (observation window)
  – Count data
• Probability: probability of an outcome, i.e. the proportion of the risk set experiencing a given outcome (event) at least once
  – Basis = risk set
  – Risk set = all persons at risk at a given point in time
• Rate: number of events per time unit of exposure (person-time)
  – Basis: duration of exposure (duration at risk)
  – Rate (general) = change in one quantity per unit change in another quantity (usually time; other possible measures include space, miles travelled)

Risk measures
• Difference of probabilities: $p_1 - p_2$ (risk difference)
• Relative risk: ratio of probabilities (focus: risk factor)
  – probability of event in presence of risk factor / probability of event in absence of risk factor (control group; reference category): $p_1 / p_2$
• Odds: odds on an outcome: ratio of favourable outcomes to unfavourable outcomes. Chance of one outcome rather than another: $p_1 / (1 - p_1)$

The odds are what matter when placing a bet on a given outcome, i.e. when something is at stake. Odds reflect the degree of belief in a given outcome.
Relation odds and relative risk: Agresti, 1996, p. 25

Risk measures
• Odds: two categories (binary data)

$$\text{Odds} = \frac{p}{1-p} \qquad (\text{range [odds scale]}: 0 \ldots \infty)$$

$$p = \frac{\text{Odds}}{1 + \text{Odds}} = \frac{1}{1 + \text{Odds}^{-1}}$$

$$\ln(\text{odds}) = \ln\frac{p}{1-p} = \operatorname{logit}(p) = \eta \qquad (\text{range}: -\infty, +\infty)$$

$$p = \frac{\exp[\eta]}{1 + \exp[\eta]} = \frac{1}{1 + \exp[-\eta]}$$

In regression analysis, $\eta$ is the linear predictor: $\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots$
Parameters of logistic regression: ln(odds) and ln(odds ratio)

Risk measures
• Odds: multiple categories (polytomous data)
Select category 3 as reference category

$$\text{Odds}_1 = \frac{p_1}{p_3}, \qquad \text{Odds}_2 = \frac{p_2}{p_3}, \qquad \text{Odds}_3 = \frac{p_3}{p_3} = 1$$

$$\ln(\text{odds}_1) = \ln\frac{p_1}{p_3} = \operatorname{logit}(p_1) = \eta_1, \qquad \ln(\text{odds}_2) = \ln\frac{p_2}{p_3} = \operatorname{logit}(p_2) = \eta_2$$

$$\ln p_1 - \ln p_3 = \eta_1 \;\Rightarrow\; p_1 = p_3 \exp[\eta_1]$$

$$p_1 = \frac{\exp[\eta_1]}{1 + \exp[\eta_1] + \exp[\eta_2]} = \frac{\exp[\eta_1]}{\sum_j \exp[\eta_j]}, \qquad p_3 = 1 - p_1 - p_2$$

$$p_i = \frac{\exp[\eta_i]}{\sum_j \exp[\eta_j]} \qquad (\eta_3 = 0 \text{ for the reference category})$$

Risk measures
• Odds ratio: ratio of odds (focus: risk indicator, covariate)
  – odds in target group / odds in control group [reference category]: ratio of favourable to unfavourable outcomes in the target group over the same ratio in the control group

The odds ratio measures the ‘belief’ in a given outcome in two different populations or under two different conditions. If the odds ratio is one, the two populations or conditions are similar.
Target group: k = 1; control group: k = 2

$$\theta_{12} = \frac{\text{Odds}_{k=1}}{\text{Odds}_{k=2}} = \frac{p_{1,k=1} / p_{2,k=1}}{p_{1,k=2} / p_{2,k=2}}$$

Risk measures in epidemiology
• Prevalence: proportion (refers to status)
• Incidence rate: rate at which events (new cases) occur over a defined time period [events per person-time]. Incidence rate is also referred to as incidence density (e.g. Young, 1998, p. 25; Goldhaber and Fireman, 1991).
• Case-fatality ratio: proportion of sick people who die of a disease (measure of severity of disease). Is not a rate!! (Young, 1998, p. 27)

Confusion:
  – Birth defect prevalence: proportion of live births having defects
  – Birth defect incidence: rate of development of defects among all embryos over the period of gestation (Young, 1998, p. 48)

Risk measures in epidemiology
• Attributable risk (among the exposed): proportion of events (diseases) attributable to being exposed: $[p_1 - p_2] / p_1$ (since the non-exposed can also develop the disease)

(Subjective) risk measures
• Subjective probability: degree of belief about the outcome of a trial or process, or about the future. It is the perception of the probability of an outcome or event. ‘It is highly dependent on judgment’ (Keynes, 1921, A treatise on probability, Macmillan, London). Keynes regarded probability as a subjective concept: our judgment (intuition, gut feeling) about the likelihood of the outcome.
  – See also value-expectancy theory: the attractiveness of an alternative (option) depends on the subjective probability of an outcome and the value or utility of the outcome (Fishbein and Ajzen, 1975).

In case of multiple categories, select a reference category
Reference category is coded 0
Various coding schemes!

Coding schemes
• Contrast coding: one category is the reference category (simple contrast coding; dummy coding). Model parameters are deviations from the reference category.
  – Indicator variable coding: indicator (0,1) variables
  – Cornered effect coding (Wrigley, 1985, pp. 132-136) [0,1]
• Effect coding: the mean is the reference. Model parameters are deviations from the mean.
  – Centred effect coding (Wrigley, 1985, pp. 132-136) [-1,+1]
• Other types of coding: see e.g. SPSS Advanced Statistics, Appendix A
Vermunt, 1997, p. 10

Coding schemes
• Categories are coded:
  – Binary: [0,1], [-1,+1], [1,2]
  – Multiple: [0,1,2,3,..], [set of binary], e.g. 3 categories:
      0 0 0
      0 1 0
      0 0 1

Coding schemes
Selection of reference category depends on research question

Example
Number of young adults leaving home by age and sex, Netherlands, 1961 birth cohort

Age                    Females  Males  Total
Early (LT 20)          135      74     209
Late (GE 20)           143      178    321
Total                  278      252    530
Censored at interview  13       40     53
TOTAL                  291      292    583

The survey (Sept. 1987 - Febr. 1988): sample of 583 young adults born in 1961; 530 left home before the survey; 53 censored cases

Descriptive statistics
Young adults leaving home by age and sex, Netherlands, 1961 birth cohort

A. Counts
Age            Females  Males  Total
Early (LT 20)  135      74     209
Late (GE 20)   143      178    321
Total          278      252    530

B. Probabilities
Sex      Early (LT 20)  Late (GE 20)  Total
Females  0.49           0.51          1.00
Males    0.29           0.71          1.00
F+M      0.39           0.61          1.00

C. ODDS and LOGIT
Sex      ODDS: Early/Late  LOGIT: Early/Late
Females  0.94              -0.058
Males    0.42              -0.878
F+M      0.65              -0.429

Reference categories: Late [GE 20], Males

Odds on leaving home early (rather than late)
           Odds              Logit
- Males:   74/178 = 0.416    -0.878
- Females: 135/143 = 0.944   -0.058

Odds ratio ($\theta$): 0.944/0.416 = 2.27; $\ln\theta$ = 0.820
(if we bet that a person leaves home early, we should bet on females; they are the ‘winners’: they leave home early)

$$\operatorname{Var}(\theta) = \theta^2 \left[\tfrac{1}{135} + \tfrac{1}{143} + \tfrac{1}{74} + \tfrac{1}{178}\right] = 0.1729$$

$$\operatorname{Var}(\ln\theta) = \tfrac{1}{135} + \tfrac{1}{143} + \tfrac{1}{74} + \tfrac{1}{178} = 0.0335$$

Selvin, 1991, p. 345

Leaving home
Using the table of young adults leaving home by age (< 20, ≥ 20) and sex given above:
Are males more likely to leave home early than females?
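The descriptive quantities for the leaving-home table (odds, logits, odds ratio and the two variances) can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the four cell counts from the table; the dictionary layout and the names `odds`, `theta` and `var_ln_theta` are illustrative choices, and the variance of the odds ratio uses the usual first-order (delta-method) approximation that the slide formula implies.

```python
import math

# Cell counts from the leaving-home table (1961 birth cohort, Netherlands).
counts = {
    "females": {"early": 135, "late": 143},
    "males": {"early": 74, "late": 178},
}


def odds(group):
    """Odds on leaving home early rather than late."""
    return group["early"] / group["late"]


odds_f = odds(counts["females"])
odds_m = odds(counts["males"])
logit_f = math.log(odds_f)
logit_m = math.log(odds_m)

# Odds ratio with males as the reference category.
theta = odds_f / odds_m
ln_theta = math.log(theta)

# Var(ln theta) is the sum of reciprocal cell counts;
# Var(theta) follows from the first-order approximation theta^2 * Var(ln theta).
var_ln_theta = sum(1 / n for group in counts.values() for n in group.values())
var_theta = theta ** 2 * var_ln_theta

print(f"odds: females={odds_f:.3f}, males={odds_m:.3f}")
print(f"logit: females={logit_f:.3f}, males={logit_m:.3f}")
print(f"odds ratio={theta:.2f}, ln(odds ratio)={ln_theta:.3f}")
print(f"Var(ln theta)={var_ln_theta:.4f}, Var(theta)={var_theta:.4f}")
```

Swapping the reference category to females simply inverts the odds ratio and changes the sign of its logarithm; the variance of the log odds ratio is unaffected.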
Dummy coding: reference categories: (i) females; (ii) leaving home late

Logit model:

$$\operatorname{logit}(p_i) = \ln\frac{p_i}{1 - p_i}$$

where i is sex (i = 1 for females and 2 for males)

ODDS
Females (reference): 135/143 = 0.9440
Males: 74/178 = 0.4157

ODDS RATIO
ODDS males / ODDS females = 0.4157/0.9440 = 0.4404

LOGIT
logit(p) is ln(0.9440) = -0.05757 for females and ln(0.4157) = -0.8777 for males
Ln odds ratio = -0.8201
NOTE that -0.8777 = -0.05757 - 0.8201

Because the odds ratio is below one (and its logarithm is negative), males are less likely than females to leave home early.

[Figure: Relation between probabilities, odds and logit. Horizontal axis: probability (0.0 to 1.0); vertical axes: odds and logit.]

Risk analysis: models
Prediction of risk levels and differentials in risk levels

Probability models and regression models
– Counts → Poisson r.v. → Poisson distribution → Poisson regression / log-linear model
– Probabilities → binomial and multinomial r.v. → binomial and multinomial distribution → logistic regression / logit model (parameter p, probability of occurrence, is also called risk; e.g. Clayton and Hills, 1993, p. 7)
– Rates → occurrences/exposure → Poisson r.v. → log-rate model
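The dummy-coded logit model for the leaving-home example can be checked numerically. The sketch below assumes a saturated model with one dummy variable x (0 = females, the reference; 1 = males), so the intercept and slope equal the observed logit for females and the log odds ratio; the names `beta0`, `beta1` and `predict` are hypothetical labels for illustration, and no model-fitting library is involved.

```python
import math


def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Probability corresponding to a linear predictor eta."""
    return 1 / (1 + math.exp(-eta))


# Observed proportions leaving home early (before age 20).
p_females = 135 / 278
p_males = 74 / 252

# Saturated model: logit(p) = beta0 + beta1 * x, with x = 1 for males.
beta0 = logit(p_females)                   # logit for the reference category
beta1 = logit(p_males) - logit(p_females)  # ln(odds ratio), males vs females


def predict(x):
    """Predicted probability of leaving home early for dummy value x."""
    return inv_logit(beta0 + beta1 * x)


print(f"beta0 = {beta0:.5f}, beta1 = {beta1:.4f}")
print(f"odds ratio = {math.exp(beta1):.4f}")
print(f"P(early | females) = {predict(0):.3f}, P(early | males) = {predict(1):.3f}")
```

Exponentiating the slope recovers the odds ratio, and applying the inverse logit to the linear predictor recovers the observed proportions, which is the relationship the NOTE on the slide expresses on the logit scale.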
