Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Multilevel Regression Models sean f. reardon 17 june, 2004 Outline I. II. III. IV. What/Why Multilevel Regression? The Basic Multilevel Regression Model Growth Models A Taste of Advanced Topics Part I.A. What are multilevel data and multilevel analysis? What are multilevel data? •Multilevel data are data where observations are clustered in units •Observations within the same unit may be more similar than observations in separate units, on average –What effect does this have on estimation and statistical inference? Examples of multilevel data with contextual clustering • Observations of students, clustered within schools • Observations of siblings, clustered within families • Observations of individuals, clustered within countries, states, or neighborhoods Examples of multilevel data with intra-person clustering • Repeated test scores, clustered within students • Multiple measures of a latent construct, clustered within persons Other examples of multilevel data • Patients, clustered within doctors • Coefficient estimates, clustered within studies (meta-analysis) • Widget sizes, clustered within factories • And so on… What is multilevel regression analysis? • Also called – Hierarchical Linear Models – Mixed Models – Multilevel Models – Growth Models – Slopes-as-Outcomes Models Multilevel Regression Models • A form of regression models • Used to answer questions about the relationship of context to individual outcomes • Used to estimate both within-unit and betweenunit relationships (and cross-level interactions) – e.g., within- vs. between-school relationships between SES and achievement Part I.B. What’s wrong with OLS? The OLS Model Yi 0 1 X i i i ~ N 0, 2 Assumptions of OLS • • • • Linearity Errors are normally distributed Errors are homoskedastic Errors are uncorrelated/independent – Knowing the error term for one observation is not informative of the error term of any other observation Some Example Data • Data from Early Childhood Longitudinal Study-Kindergarten Cohort (NCES, 19982004) – Longitudinal study of 21,000 kindergarten students in K class of 1998-99 – Followed through fifth grade (2003-04) ECLS-K data • Subsample – 399 kindergarten students – sampled from 17 schools • Math Score: – Fall kindergarten math test scores – Administered 2-3 months into school year • Age – Age in months at time of math assessment – Ranges from 60-79 months What is the relationship between age and math scores? • Note: this is NOT a growth model • It is a cross-sectional model • A growth model requires repeated measures, so we can observe intraindividual growth OLS Regression: Math on Age . reg math age Source | SS df MS -------------+-----------------------------Model | 1765.41947 1 1765.41947 Residual | 22896.5737 420 54.5156517 -------------+-----------------------------Total | 24661.9932 421 58.5795563 Number of obs F( 1, 420) Prob > F R-squared Adj R-squared Root MSE = = = = = = 422 32.38 0.0000 0.0716 0.0694 7.3835 -----------------------------------------------------------------------------math | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------age | .4956666 .0871016 5.69 0.000 .3244572 .666876 _cons | -10.91381 6.049008 -1.80 0.072 -22.80391 .9762943 ------------------------------------------------------------------------------ OLS Regression: Math on Age Next look at the residuals from this model. Are they homoscedastic? Normally distributed? Independent? Math IRT vs. Age in Months: 10 20 30 40 50 Whole Sample With OLS Regression Line 60 65 70 Age in Months Math IRT Scale Scores 75 OLS Predicted Value 80 60 212 213 216 376 379 435 452 460 466 481 488 621 635 643 0 20 40 60 0 20 40 60 0 20 40 196 60 1119 1212 Total 0 20 40 60 665 65 60 65 70 75 80 60 65 70 75 80 60 65 70 75 80 60 65 70 75 80 Age in Months Math IRT Scale Scores Graphs by fall school identification number OLS Predicted Value 70 75 80 Math IRT vs. Age in Months: 10 15 20 25 30 School #216 60 65 70 Age in Months Math IRT Scale Scores OLS--Sample 75 Math IRT vs. Age in Months: 20 30 40 50 School #1212 65 70 Age in Months Math IRT Scale Scores OLS--Sample 75 • Residuals look correlated with each other within schools • Formal test of this dependence – ANOVA Random Effects ANOVA One-way Analysis of Variance for resid: Residuals Number of obs = R-squared = 422 0.1638 Source SS df MS F Prob > F ------------------------------------------------------------------------Between s_id1 3750.169 17 220.59817 4.65 0.0000 Within s_id1 19146.405 404 47.392091 ------------------------------------------------------------------------Total 22896.574 421 54.386161 Intraclass Asy. correlation S.E. [95% Conf. Interval] -----------------------------------------------0.13487 0.05204 0.03287 0.23687 Estimated SD of s_id1 effect Estimated SD within s_id1 Est. reliability of a s_id1 mean (evaluated at n=23.44) 2.718128 6.884191 0.78517 60 212 213 216 376 379 435 452 460 466 481 488 621 635 643 0 20 40 60 0 20 40 60 0 20 40 196 60 60 1119 1212 Total 0 20 40 665 60 65 70 75 80 60 65 70 75 80 60 65 70 75 80 60 65 Age in Months Math IRT Scale Scores OLS--Sample OLS--School Graphs by fall school identification number 70 75 80 65 70 75 80 Math IRT vs. Age in Months: 15 20 25 30 35 Within-school fitted lines 60 65 70 Age in Months within-school fitted regression lines 75 80 OLS--Whole Sample Consequences of NonIndependence of Residuals • The computation of standard errors in OLS depends on the assumption of independence of errors • If errors are not independent, then standard errors will, in general, be too small (so the probability of Type I errors is larger than it should be) Two extreme examples • n individuals observed from each of K schools (total of nK observations) • if Yik= Yjk for all i and j in school k, then knowing k completely determines Y, so there are really only K unique observations • In this case, we can just treat each school as a single observation (with outcome Y.k), and use OLS on the sample of K schools Two extreme examples • n individuals observed from each of K schools (total of nK observations) • if YikYjk for all i and j and k, then knowing k tells us nothing about Y, so there are really nK unique observations • In this case, there is no dependence of the errors, so we can use OLS on the sample of nK students. When do we need multilevel regression? • In the intermediate case, where knowing the school gives us some, but not complete information about Y. • e.g., test scores vary both within and between schools • e.g., individuals vary within and between neighborhoods • e.g., mood varies both within individuals (over time) and between individuals Intermission I Part II.A. Farewell OLS What we know so far • Two observations within the same unit may be more similar than two observations chosen at random • If the regression model does not explain all of the between-unit differences (and it is unlikely that they will), we will have correlated errors within units • This is a violation of the independence of residuals assumption in OLS • At a minimum, this results in incorrect standard errors (too small) How do we allow dependence in the regression model? • We want a model that explicitly allows the level of the outcome variable to vary across level-two units • For example, we want to let the mean reading score differ across schools • So let’s write a model that allows this Some notation • i indexes level-one units (people within schools, observations within persons) • j indexes level-two units (e.g., schools, if we have students nested within schools) • We will use r to denote a level-one residual, and u to denote a level-two residual Farewell OLS: Our first multilevel model • Instead of : Yij 0 1 X ij ij • Let’s write: Yij 0 1 X ij u j rij Farewell OLS: Our first multilevel model Yij 0 1 X ij u j rij Farewell OLS: Our first multilevel model Yij 0 1 X ij u j rij Outcome for observation i in unit j Farewell OLS: Our first multilevel model Yij 0 1 X ij u j rij Outcome for observation i in unit j Intercept Farewell OLS: Our first multilevel model Yij 0 1 X ij u j rij Outcome for observation i in unit j Intercept Value of X for observation i in unit j Coefficient Farewell OLS: Our first multilevel model Yij 0 1 X ij u j rij Outcome for observation i in unit j Residual term specific to unit j Intercept Value of X for observation i in unit j Coefficient Farewell OLS: Our first multilevel model Yij 0 1 X ij u j rij Outcome for observation i in unit j Residual term specific to observation i in unit j Residual term specific to unit j Intercept Value of X for observation i in unit j Coefficient Farewell OLS: Our first multilevel model Yij 0 1 X ij u j rij Outcome for observation i in unit j Residual term specific to observation i in unit j Residual term specific to unit j Intercept Value of X for observation i in unit j Coefficient What is uj? • • • • • A residual term Specific to unit j Common to all observations in unit j Subscript j, no subscript i Interpretation: the difference between the overall intercept and the intercept in unit j What is rij? • A residual term • Specific to observation i in unit j • Has a mean of 0, so any part of ij that is common to all observations within j has been removed • So the rij’s may be independent • Not guaranteed to be independent Features of this model • Note that: ij = uj + rij • We also have: Var(ij) = Var(uj + rij) = Var(uj) + Var(rij) + 2*Cov(uj,rij) = Var(uj) + Var(rij) • We will come back to variance decomposition later Features of this model • The level of Yij – after adjusting for Xij – may vary across the units • We have made no assumptions yet about the distribution of the uj’s or the rij’s. • The relationship between X and Y does not depend on j (1 does not depend on j) So how do we estimate this model? • We want an estimate of 1 , the relationship between Xij and Yij. • Two approaches: – Fixed Effects estimator – Random Effects estimator Another way to write this model Yij 0 1 X ij u j rij 0 u j 1 X ij rij 0 j 1 X ij rij where 0 j 0 u j The fixed effects estimator Yij 0 j 1 X ij rij • We have ‘absorbed’ the level-two error terms (the uj’s) into the intercept • Now each aggregate unit has its own intercept; so between-unit variation is accounted for in the intercepts • This solves the dependence problem with the rij’s (they may still not be independent, but not because of unexplained variation between-level-two units) The fixed effects estimator Yij 0 j 1 X ij rij • Three methods of obtaining the fixed effects estimator 1 from this model: – Dummy variables for each unit – Change or difference scores – Deviations from mean unit values • All three are mathematically equivalent • All can be estimated via OLS, with some adjustment of the degrees of freedom 15 20 25 30 35 Math: Within School Slopes (Fixed-Effects) vs. OLS 60 65 70 Age in Months Fixed-Effects Model 75 OLS 80 The random effects estimator • We treat the variance between units as consisting of parameter variance (true variance between units) and error variance (extra variance produced because of sampling) • We treat units with larger samples as having more reliable estimated unit means The random effects estimator • So our estimate of the unit mean for a particular unit is a weighted average of the unit mean estimated in a fixed effects model and the overall mean—our estimates are shrunken toward the grand mean • Let’s see a picture of this: Math: Within School Slopes vs. OLS 30 25 20 15 15 20 25 Math IRT Scale Score 30 35 Fixed-Effects 35 Random Effects 60 65 70 Age in Months Xb + u[s_id1] 75 80 Fitted values 60 65 70 Age in Months Xb + u[s_id1] 75 80 Fitted values Part II.B. The basic multilevel model The basic multilevel model • Random effects ANOVA is the simplest random effects model • The random effects model is a very simple kind of multilevel model • So we are building up here to the multilevel model The multilevel model as a random effects model •We write the random effects model as: Yij = 0 + uj + rij •We can also write it as: Yij = 0j + rij . . The multilevel model as a random effects model •We write the random effects model as: Yij = 0 + uj + rij •We can also write it as: Yij = 0j + rij . . Level-1 model The multilevel model as a random effects model •We write the random effects model as: Yij = 0 + uj + rij •We can also write it as: Yij = 0j + rij 0j = 00 + uj . Level-1 model The multilevel model as a random effects model •We write the random effects model as: Yij = 0 + uj + rij •We can also write it as: Yij = 0j + rij 0j = 00 + uj Level-1 model Level-2 model (here we’re using the 00 notation where before we used 0; this is the notation of HLM) HLM Notation (Null Model) • Level-1 model: Yij = 0j + rij • Level-2 model 0j = 00 + uj • Mixed model: Yij = 00 + uj+ rij HLM Notation • Level-1 model: Yij = 0j + 1jXij + rij • Level-2 model: 0j = 00 + uj 1j = 10 • Mixed model: Yij = 00 + 10Xij +uj+ rij HLM Notation • Mixed model: Yij = 00 + 10Xij +uj+ rij Structural part of the model HLM Notation • Mixed model: Yij = 00 + 10Xij +uj+ rij Structural part of the model Stochastic (random) part of the model HLM Notation • Mixed model: Yij = 00 + 10Xij +uj+ rij Fixed Effects HLM Notation • Mixed model: Yij = 00 + 10Xij +uj+ rij Fixed Effects Random Effect Part II.C. Variance Decomposition Reminder: The unconditional (null) random effects model •The one-way random effects ANOVA model: Yij = 00 + uj + rij Mixed (composite) model •We can also write it as: Yij = 0j + rij Level-1 model Level-2 model 0j = 00 + uj •Useful as a baseline model •Allows us to decompose the variance Variance decomposition • Var(Yij) = Var(uj) + Var(rij) = 00 + 2 • Intraclass Correlation (): the proportion of the total variance in Yij that is between level2 units = 00 /(00 + 2) Multilevel Analyses • Analytic Problems: – Explain variation in means across units – Estimate within- and between-unit relationships – Distinguish contextual from compositional variation in means across units – Explain how and why within-unit relationships differ across units Explaining variation in means across units • Why do some schools have higher mean achievement levels? • Why do some hospitals have lower mortality rates? • Why do some countries have higher infant mortality rates? Explaining variation in means across units • Means-as-outcomes regression (MLM) Yij = 0j + rij 0j = 00 + 01Wj + uj Note: why don’t we just compute the means of Y in each unit and use OLS at level 2? where Wj is a variable indicating some characteristic of unit j (no i subscript) – Wj may be inherent to level-2 • School curriculum, doctor/patient ratio, regime type – Wj may be a compositional property of unit j • School racial composition, patient diagnosis composition, average maternal education level Explaining variation in means across units • Called “means-as-outcomes” because the Wj’s can only explain mean differences in Yij across units (Wj only predicts the intercept, not the slope) • uj is now a level-2 residual • We can compute an R2 at both levels of the model: – 00 from the null model is the total level-2 variance that can be explained – 2 R between – 2 R within = [00 (null) - 00 (model)]/[00 (null)] = [2 (null) - 2(model)]/[2 (null)] Example (null model) The outcome variable is MATH1 Final estimation of fixed effects: ---------------------------------------------------------------------------Standard Approx. Fixed Effect Coefficient Error T-ratio d.f. P-value ---------------------------------------------------------------------------For INTRCPT1, B0 INTRCPT2, G00 20.369682 0.129474 157.327 867 0.000 ---------------------------------------------------------------------------Final estimation of variance components: ----------------------------------------------------------------------------Random Effect Standard Variance df Chi-square P-value Deviation Component ----------------------------------------------------------------------------INTRCPT1, U0 3.25337 10.58439 867 3733.93573 0.000 level-1, R 6.46230 41.76128 ----------------------------------------------------------------------------- Intraclass correlation: = 10.58439 / (10.58439 + 41.76128) = 0.202 Example: Means-as-outcomes model The outcome variable is MATH1 Final estimation of fixed effects: ---------------------------------------------------------------------------Standard Approx. Fixed Effect Coefficient Error T-ratio d.f. P-value ---------------------------------------------------------------------------For INTRCPT1, B0 INTRCPT2, G00 19.585257 0.133633 146.560 866 0.000 S_PRIVAT, G01 3.691165 0.286361 12.890 866 0.000 ---------------------------------------------------------------------------- Final estimation of variance components: ----------------------------------------------------------------------------Random Effect Standard Variance df Chi-square P-value Deviation Component ----------------------------------------------------------------------------INTRCPT1, U0 2.87151 8.24560 866 3070.97445 0.000 level-1, R 6.46291 41.76917 ----------------------------------------------------------------------------- Level-2 variance explained: R2 = (10.58439 - 8.24560) / (10.58439) = 0.221 Part II.D. Variable Centering Variable Centering • An important topic with major implications for fitting and interpreting multilevel models… Variable Centering • An important topic with major implications for fitting and interpreting multilevel models… • …which we will not have time to cover today. Part II.E. Random Coefficients and the Full 2-Level Model Individual-Level Model Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij Slope on X1 for unit j Intercept for unit j Outcome for person i in unit j Slope on X2 for unit j Slope on XK for unit j Contextual Model Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij 0j = 00 1j = 10 2j = 20 … Kj = K0 In OLS, the intercept and slopes are fixed – they are the same in all units Contextual Questions • Does the intercept vary across units? • Can we predict the intercepts using level-2 covariates (Z’s)? • Do the slopes vary across units? • Can we predict the slopes using level-2 covariates (Z’s)? Does the intercept vary across units? Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij 0j = 00 + u0j 1j = 10 2j = 20 … Kj = K0 In the random effects model, the intercept varies around some grand mean intercept (00), and the slopes are fixed – they are the same in all units Test H0: Var(u0j) = 0 Can we predict the intercepts? Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij 0j = 00 + 01Z1 + 02Z2 + … + 0MZM + u0j 1j = 10 Here, the Zm’s predict the 2j = 20 intercept. … Kj = K0 Test H0: 0m = 0 Do the slopes vary across units? Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij 0j = 00 + u0j 1j = 10 + u1j 2j = 20 + u2j The intercept and each of the slopes varies around thei grand means (the k0’s) … Kj = K0 + uKj Test H0: Var(ukj) = 0 Can we predict the slopes? Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij 0j = 00 + 01Z1 + 02Z2 + … + 0MZM + u0j 1j = 10 + 11Z1 + 12Z2 + … + 1MZM + u1j 2j = 20 + 21Z1 + 22Z2 + … + 2MZM + u2j … Kj = K0 + K1Z1 + K2Z2 + … + KMZM + uKj Here, the Zm’s predict the slopes. Test H0: km = 0 Example • ECLS-K Fall Kindergarten data • 8,799 white and black students in 807 schools (618 public, 189 private schools) • SES measured by standarized SES variable • Outcome is Fall K math score Research Questions • What is the within-school relationship between race and SES and math scores? • Do average math scores vary across schools? • Are math scores higher in private schools? • Does the relationship between SES and math scores vary across schools? • Is the relationship between SES and math scores lower in private schools? Example (cont.) • See HLM command files lecture8[a-g].hlm and corresponding HLM output files lecture8[a-g].txt • We will meet in the lab 2/11/04. Intermission II Part III.A. Growth Modeling Growth Models • Allow us to model development over time and to investigate correlates of betweenperson variation in growth trajectories over time. • The fitted model describes an expected growth trajectory for each person, rather than a single expected value on an outcome measure Examples • Modeling inter-individual changes in some outcome – School achievement, income, attitudes • Modeling growth in national characteristics – GNP, population, etc • Modeling change in organizational outcomes – Business profits, hospital mortality rates 45 268 314 442 514 569 624 723 918 949 978 1105 1542 1552 1653 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 9 11 12 13 14 15 11 12 13 14 15 age Graphs by id 11 12 13 14 15 11 12 13 14 15 45 268 314 442 514 569 624 723 918 949 978 1105 1542 1552 1653 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 9 11 12 13 14 15 11 12 13 14 15 age Graphs by id 11 12 13 14 15 11 12 13 14 15 The Growth Model • Made up of a within-unit model of change and a between-unit model of interindividual variation in change • Requires repeated measures of outcome within each unit • Requires multilevel error structure since errors are likely not independent Within-unit model of change • Outcome varies as a function of time Yit = fi(time) + rit • Simple case: f is linear: Yit = 0i + 1i(timeit) + rit Within-unit model of change • Outcome varies as a function of time Yit = fi(time) + rit • Simple case: f is linear: Yit = 0i + 1i(timeit) + rit Intercept for unit i Within-unit model of change • Outcome varies as a function of time Yit = fi(time) + rit • Simple case: f is linear: Yit = 0i + 1i(timeit) + rit Intercept for unit i Growth slope for unit i 45 268 314 442 514 569 624 723 918 949 978 1105 1542 1552 1653 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 9 11 12 13 14 15 11 12 13 14 15 age Graphs by id 11 12 13 14 15 11 12 13 14 15 4 3 2 1 0 11 12 13 age 14 15 Between-unit model • Model between-unit differences in growth trajectories Yit = 0i + 1i(timeit) + eit 0i = 00 + 01(Xi) + r0i 1i = 10 + 11(Xi) + r1i Between-unit model of intercept Between-unit model of slope 4 3 2 1 0 11 12 13 age 14 15 Within-unit model for change • Simple case: f is linear: Yit = 0i + 1i(timeit) + rit • Need to specify zero-point for time – Pick a point that is interpretable and substantively meaningful for your study – e.g., age; time in school, time since institution opened, calendar time, etc… – Affects estimation and interpretation of the intercept Defining time TOLERANCEit = 0i + 1i(AGEit) + eit 0i = 00 + r0i 1i = 10 + r1i Final estimation of fixed effects: ---------------------------------------------------------------------------Standard Approx. Fixed Effect Coefficient Error T-ratio d.f. P-value ---------------------------------------------------------------------------For INTRCPT1, B0 INTRCPT2, G00 -0.081187 0.511521 -0.159 15 0.876 For TIME slope, B1 INTRCPT2, G10 0.130812 0.043074 3.037 15 0.009 ---------------------------------------------------------------------------- Defining time TOLERANCEit = 0i + 1i(AGEit-11) + eit 0i = 00 + r0i 1i = 10 + r1i Final estimation of fixed effects: ---------------------------------------------------------------------------Standard Approx. Fixed Effect Coefficient Error T-ratio d.f. P-value ---------------------------------------------------------------------------For INTRCPT1, B0 INTRCPT2, G00 1.357750 0.074445 18.238 15 0.000 For TIME slope, B1 INTRCPT2, G10 0.130812 0.043074 3.037 15 0.009 ---------------------------------------------------------------------------- Modeling Inter-personal variation in growth trajectories TOLERANCEit = 0i + 1i(AGEit-11) + eit 0i = 00 + 01(MALEi) + r0i 1i = 10 + 11(MALEi) + r1i Final estimation of fixed effects: ---------------------------------------------------------------------------Standard Approx. Fixed Effect Coefficient Error T-ratio d.f. P-value ---------------------------------------------------------------------------For INTRCPT1, B0 INTRCPT2, G00 1.355556 0.102740 13.194 14 0.000 MALE, G01 0.005016 0.155328 0.032 14 0.975 For TIME slope, B1 INTRCPT2, G10 0.102333 0.058323 1.755 14 0.101 MALE, G11 0.065095 0.088177 0.738 14 0.473 ---------------------------------------------------------------------------- Parameters of the growth model Yit = 0i + 1i(timeit) + eit 0i = 00 + 01(Xi) + r0i 1i = 10 + 11(Xi) + r1i Structural parameters of the growth model 0i: true intercept for individual i 1i: true slope for individual i 00: population average intercept (for individuals with X=0) 01: population average difference in level-one intercept for individuals with one unit difference in X 10: population average slope (for individuals with X=0) 01: population average difference in level-one slope for individuals with one unit difference in X Stochastic (random) parameters of the growth model Var(eit) = e2: level–1 residual variance Var(r0i) = 02: level–2 residual variance in true intercept (0i) Var(r1i) = 12: level–2 residual variance in true slope (1i) Cov(r0i, r1i) = 01: level–2 residual covariance in true intercept (0i) and true slope (1i) Part III.B. Additional Issues in Growth Modeling Growth Modeling Issues • • • • • • • Timing of observations Centering the time variable Variable numbers of observations Missing observations Time-varying covariates Slope-intercept covariances Non-linear growth curves Timing of observations • If each person (unit) has the same number of observations, and if the timing of observations is the same for all units,the data (and design) are said to be balanced. • In this case, the growth model is equivalent to a repeated measures ANOVA • But if not, the growth model is mode flexible than the repeated measures ANOVA model Centering time in growth models • Group-mean centering time results in unbiased estimate of average within-person growth rate • Any other centering of time results in biased estimate of average within-person growth rate if individuals’ mean times are correlated with their mean outcomes • In balanced design, mean time is the same for all persons, so centering does not affect slope estimate • In unbalanced design, it may be necessary to center time • Centering time affects interpretation of the intercept Variable numbers of observations • Number and timing of observations may differ by design or because of missingness • Types of missingness (see S&W p. 157-159): – MCAR: missing completely at random – CDD: covariate dependent dropout – MAR: missing at random • The growth model estimates are unbiased under any of these types of missingness Time-varying covariates • So far, we have considered only the relationship between stable person-level covariates and the growth trajectory • What about time-varying covariates? • A time-varying covariate is a covariate whose value changes over time • Examples from S&W chapter 5 Examining the covariance matrix in growth models Yit = 0i + 1i(timeit) + eit 0i = 00 + 01(Xi) + r0i 1i = 10 + 11(Xi) + r1i Var(r0i) = 02: level–2 residual variance in true intercept (0i) Var(r1i) = 12: level–2 residual variance in true slope (1i) Cov(r0i, r1i) = 01: level–2 residual covariance in true intercept (0i) and true slope (1i) The slope-intercept covariance • Do individuals with high initial values of Y have faster growth rates of Y? – Is r0i correlated with r1i? • It depends on how we center the time variable • It is possible to observe any correlation (-1 to +1) between r0i and r1i depending where we define the intercept. CNLSY reading fitted growth curves PIAT reading score 65.00 52.50 40.00 27.50 15.00 -0.50 1.00 2.50 4.00 Age (centered at 6.5 years) 5.50 Positive mood fitted growth curves Positive mood level 250.00 212.50 175.00 137.50 100.00 0 1.67 3.33 5.00 Time (days since start of treatment) 6.67 Non-linear growth curves • So far we have assumed that each individual’s growth trajectory is linear (constant growth rate over time) • Now we consider cases where growth may be non-linear – Polynomial curve – Piecewise linear – Discontinuous Polynomial growth curves • Quadratic growth trajectory: Yit = 0i + 1i(timeit) + 2i(time2it) + eit 18 16 14 Hourly Wage ($) 0i = 00 + r0i 1i = 10 + r1i 2i = 20 + r2i Wage Growth 12 10 8 6 4 2 0 18 20 22 24 Age 26 28 30 Piecewise linear growth curves • Piecewise growth trajectory: Yit = 0i + 1i(time1it) + 2i(time2it) + eit 18 16 14 Hourly Wage ($) 0i = 00 + r0i 1i = 10 + r1i 2i = 20 + r2i Wage Growth 12 10 8 6 4 2 0 18 20 22 GED 24 Age 26 28 30 Discontinuous growth curves • Discontinuous growth trajectory: Yit = 0i + 1i(timeit) + 2i(eventit) + eit 18 16 14 Hourly Wage ($) 0i = 00 + r0i 1i = 10 + r1i 2i = 20 + r2i Wage Growth 12 10 8 6 4 2 0 18 20 22 GED 24 Age 26 28 30 Discontinuous growth curves • Discontinuous growth trajectory (with timevarying growth rate): Yit = 0i + 1i(timeit) + 2i(eventit) + 3i(timeit*eventit) + eit Wage Growth 18 16 14 Hourly Wage ($) 0i = 00 + r0i 1i = 10 + r1i 2i = 20 + r2i 3i = 30 + r3i 12 10 8 6 4 2 0 18 20 22 GED 24 Age 26 28 30 Intermission III Part IV.A. A Taste of Advanced Topics 3+ Level Models Examples of 3-level data • • • • • Students > classrooms > schools Students > schools > districts Patients > Doctors > Hospitals Children > Families > Neighborhoods Repeated observations > individuals > contexts Data may have more than 3 levels—but the more levels, the more data needed to model relationships. • For example: Repeated obs > students > classrooms > schools > districts > countries > planets … The 3-level null model Yijk 0 jk eijk Level 1 model 0 jk 00k r0 jk Level 2 model 00k 000 u00k Level 3 model Yijk 000 u00k r0 jk eijk True grand mean intercept Level-3 error/residual term: deviation of cluster k mean from grand mean Level-1 error/residual: deviation of observation i from cluster j mean Level-2 error/residual term: deviation of cluster j mean from cluster k mean 3-level variance decomposition • 02 = Var(eijk): true within level-1 variance (variance within j, between i) • 002 = Var(r0jk): true level-2 variance (variance within k, between j) • 0002 = Var(u00k): true between k level-3 (variance within k, between j) • Var(Yijk)= 02 + 002 + 0002 3-level variance decomposition • Remember the ICC from the 2-level model: Proportion of true variance in Y that lies between clusters – ICC = 002 /(02 + 002) • We apply the same logic to the 3-level model: • Proportion of total variance that lies between level-3 units: 0002 /(02 + 002 + 0002 ) • Proportion of level-1 + level –2 variance that lies between level-2 units: 002 /(02 + 002 ) The 3-level growth model Ytjk 0 jk 1 jk timetjk etjk 0 jk 00k 01k X jk r0 jk 1 jk 10k 11k X jk r1 jk 00k 01k 10k 11k 000 001Z k u00k 010 100 101Z k u10k 110 Part IV.B. A Taste of Advanced Topics: Meta-Analysis Meta-Analysis • The problem: we often have a lot of studies, each trying to estimate the same parameter – effect of small classes on learning rates – effect of welfare receipt on income, maternal depression, child welfare, etc. Multiple studies • Suppose we conducted a number of similar studies to estimate the effect of treatment T on outcome Y. • Each study gives us an estimate of d, the standardized effect of T on Y. • The estimates of d may vary across studies – Why? • We would like to estimate the true average effect of T in the population Possible reasons for varying estimates across studies • The d’s may vary because of sampling variance (each study is conducted with a different sample) • The d’s may vary because of differences in the populations of each study sample • The d’s may vary because of differences in the study design (e.g., different instruments) • The d’s may vary because of differences in the treatment studied (differences in implementation, duration, etc.) Approach 1 • In each study, we fit a regression model to estimate the treatment effect 1: Yi 0 1 Ti ri • But the treatment effect may vary across studies: 1 j 10 u1 j • Here 10 is the true mean effect of T across studies, and u1j is the deviation of the effect in study j from this mean. Approach 1 • If we had access to all the data from each study, we could fit a multilevel model: Yij 0 j 1 j Tij rij 0 j 00 u0 j 1 j 10 u1 j • But what if we don’t have access to all the original data? Approach 2 • Recall, that in each study, we have a regression model like this: Yi 0 1 Ti ri • Typically, each study will report the estimate of 1 and some measure of its sampling variance (its standard error) • Remember (lecture 4) that we estimate the grand mean by weighting individual estimates by their precision (the inverse of the variance of the estimate) • We use the standard errors of the study-specific effect estimates to construct these weights, so we don’t need the original data. Key Points • We need from each study an estimate of the treatment effect and its sampling variance (standard error) • The treatment effects must be measured in the same metric across all studies Part IV.C. A Taste of Advanced Topics: Cross-Classified Data and Models Cross-Classified Data • Observations nested in multiple, nonhierarchical units – e.g. persons nested in schools and neighborhoods – patients nested in multiple doctors/clinics – students nested in multiple classrooms – repeated observations on relationship dynamics nested in partners (where partner changes are common) Cross-Classified Data • Yijk is outcome Y for person i in neighborhood j and school k Yijk 0 jk rijk 0 jk 000 u0 j v0 k w0 jk Cross-Classified Data • In 3-level hierarchical data, each observation can be decomposed into: – – – – – a grand mean (common to all observations) a row-specific (e.g. school) deviation from the grand mean a column-specific (e.g. nbhd) deviation from the grand mean a row*column-specific deviation and individual deviation from the row*column mean Yijk 000 u0 j v0 k w0 jk rijk Conclusion Resources: • Textbooks – Raudenbush & Bryk (2002) Hierarchical Linear Models. Sage. – Singer & Willett (2003) Applied Longitudinal Data Analysis • Multilevel Listserv – http://www.nursing.teaching.man.ac.uk/staff/mcampbell/mu ltilevel.html – http://www.jiscmail.ac.uk/lists/multilevel.html Resources: • Software – – – – – HLM MLWin SAS (PROC MIXED) SPSS Stata (-gllamm-)