Multilevel Models 1 Sociology 229: Advanced Regression Copyright © 2010 by Evan Schofer Do not copy or distribute without permission Announcements • Assignment 4 Due • Assignments 2 & 3 handed back. Multilevel Data • Often we wish to examine data that is “clustered” or “multilevel” in structure – Classic example: Educational research • Students are nested within classes • Classes are nested within schools • Schools are nested within districts or US states • We often refer to these as “levels” • • • • Ex: If the study is individual/class/school… Level 1 = individual level Level 2 = classroom Level 3 = school – Note: Some stats books/packages label differently! Multilevel Data • Students nested in class, school, and state • Variables at each level may affect student outcomes California Oregon School Class Class School Class Class School Class Class Class Class School Class Class Class Class Multilevel Data • Simpler example: 2-level data Class Class Class Class Class Class • Which can be shown as: Level 2 Level 1 Class 1 S1 S2 Class 2 S3 S1 S2 Class 3 S3 S1 S2 S3 Multilevel Data • We are often interested in effects of variables at multiple levels • • • • • Ex: Predicting student test scores Individual level: grades, SES, gender, race, etc. Class level: Teacher qualifications, class size, track School: Private vs. public, resources State: Ed policies (funding, tests), budget – And, it is useful to assess the relative importance of each level in predicting outcomes • Should educational reforms target classrooms? Schools? Individual students? • Which is most likely to have big consequences? Multilevel Data • Repeated measurement is also “multilevel” or “clustered” • Measurement at over time (T1, T2, T3…) is nested within persons (or firms or countries) • Level 1 is the measurement (at various points in time) • Level 2 = the individual Person 1 Person 2 Person 3 Person 4 T1 T2 T3 T4 T5 T1 T2 T3 T4 T5 T1 T2 T3 T4 T5 T1 T2 T3 T4 T5 Multilevel Data • Examples of multilevel/clustered data: • Individuals from same family – Ex: Religiosity • People in same country (in a cross-national survey) – Ex: Civic participation • Firms from within the same industry – Ex: Firm performance • Individuals measured repeatedly – Ex: Depression • Workers within departments, firms, & industries – Ex: Worker efficiency – Can you think of others? Example: Pro-environmental values • Source: World Values Survey (27 countries) • Let’s simply try OLS regression . reg supportenv age male dmar demp educ incomerel ses Source | SS df MS -------------+-----------------------------Model | 2761.86228 7 394.551755 Residual | 105404.878 27799 3.79167876 -------------+-----------------------------Total | 108166.74 27806 3.89005036 Number of obs F( 7, 27799) Prob > F R-squared Adj R-squared Root MSE = = = = = = 27807 104.06 0.0000 0.0255 0.0253 1.9472 -----------------------------------------------------------------------------supportenv | Coef. Std. Err. t P>|t| [95% Conf. 
Interval] -------------+---------------------------------------------------------------age | -.0021927 .000803 -2.73 0.006 -.0037666 -.0006187 male | .0960975 .0236758 4.06 0.000 .0496918 .1425032 dmar | .0959759 .02527 3.80 0.000 .0464455 .1455063 demp | -.1226363 .0254293 -4.82 0.000 -.172479 -.0727937 educ | .1117587 .0058261 19.18 0.000 .1003393 .1231781 incomerel | .0131716 .0056011 2.35 0.019 .0021931 .0241501 ses | .0922855 .0134349 6.87 0.000 .0659525 .1186186 _cons | 5.742023 .0518026 110.84 0.000 5.640487 5.843559 Aggregation • If you want to focus on higher-level hypotheses (e.g., schools, not children), you can aggregate • Make “school” the unit of analysis • OLS regression analysis of school-level variables • Individual-level variables (e.g., student achievement) can be included as school averages (aggregates) – Ex: Model average school test score as a function of school resources and average student SES • Problem: Approach destroys individual-level data • Also: Loss of statistical power (Tabachnick & Fidel 2007) • Also: Can’t draw individual-level interpretations: ecological fallacy. Example: Pro-environmental values • Aggregation: Analyze country means (N=27) . reg supportenv age male dmar demp educ incomerel ses Source | SS df MS -------------+-----------------------------Model | 2.58287267 7 .36898181 Residual | 7.72899325 19 .406789119 -------------+-----------------------------Total | 10.3118659 26 .396610228 Number of obs F( 7, 19) Prob > F R-squared Adj R-squared Root MSE = 27 = 0.91 = 0.5216 = 0.2505 = -0.0257 = .6378 -----------------------------------------------------------------------------supportenv | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------age | .0211517 .0391649 0.54 0.595 -.0608215 .1031248 male | 3.966173 4.479358 0.89 0.387 -5.409232 13.34158 dmar | .8001333 1.127099 0.71 0.486 -1.558913 3.15918 demp | -.0571511 1.165915 -0.05 0.961 -2.497439 2.383137 educ | .3743473 .2098779 1.78 0.090 -.0649321 .8136268 incomerel | .148134 .1687438 0.88 0.391 -.2050508 .5013188 ses | -.4126738 .4916416 -0.84 0.412 -1.441691 .6163439 _cons | 2.031181 3.370978 0.60 0.554 -5.024358 9.08672 Note loss of statistical power – few variables are significant when N is only 27 Ecological Fallacy • Issue: Data aggregation limits your ability to draw conclusions about level-1 units • The “ecological fallacy” – Robinson, W.S. (1950). "Ecological Correlations and the Behavior of Individuals". American Sociological Review 15: 351–357 • Among US states, immigration rate correlates positively with average literacy • Does this mean that immigrants tend to be more literate than US citizens? • NO: You can’t assume an individual-level correlation! – The correlation at individual level is actually negative – But: immigrants settled in states with high levels of literacy – yielding a correlation in aggregate statistics. OLS Approaches • Another option: Just use OLS regression • Allows you to focus on lower-level units – No need for aggregation • Ex: Just analyze individuals as the unit of analysis, ignoring clustering among schools • Include independent variables measured at the individual-level and other levels • Problems: • 1. Violates OLS assumptions (see below) • 2. OLS can’t take full advantage of richness of multilevel data – Ex: Complex variation in intercepts, slopes across groups. 
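To make the two strategies just described concrete, here is a minimal Stata sketch, assuming the World Values Survey variables used in the example (supportenv, age, male, dmar, demp, educ, incomerel, ses, and a country identifier). The slides do not show the aggregation step itself, so the collapse-based construction below is just one plausible way to do it:
. * Pooled OLS on individuals, ignoring clustering by country
. reg supportenv age male dmar demp educ incomerel ses
. * Aggregation: collapse to country means, then run OLS on the country-level observations
. preserve
. collapse (mean) supportenv age male dmar demp educ incomerel ses, by(country)
. reg supportenv age male dmar demp educ incomerel ses
. restore
Note that after collapsing, N falls from roughly 27,807 individuals to 27 countries, which is the loss of statistical power noted above.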
Multilevel Data: Problems • Issue: Multilevel data often results in violation of OLS regression assumption • OLS requires an independent random sample… • Students from the same class (or school) are not independent… and may have correlated error – If you don’t control for sources of correlated error, models tend to underestimate standard errors • This leads to false rejection of H0 – “Type I Error” -- Too many asterisks in table • This is a serious issue, as we always want to err in the direction of conservatism Multilevel Data: Problems • Why might nested data have correlated error? – Example: Student performance on a test • Students in a given classroom may share & experience common (unobserved) characteristics • Ex: Maybe the classroom is too dark, causing all students to perform poorly on tests – If all those students score poorly, they fall below the regression line… and have negative error – But OLS regression requires that error be “random” – Within-class error should be random, not consistently negative – Other sources of within-class (or school) error • An especially good teacher; poor school funding • Other ideas? Multilevel Data: Problems • Sources of correlated error within groups – Ex: Cross-national study of homelessness • People in welfare states have a common unobserved characteristic: access to generous benefits – Ex: Study of worker efficiency in workgroups • Group members may influence each other (peer pressure) leading to group commonalities. Multilevel Data: Problems • When is multilevel data NOT a problem? – Answer: If you can successfully control for potential sources of correlated error • Add a control to OLS model for: classroom, school, and state characteristics that would be sources of correlated error in each group • Ex: Teacher quality, class size, budget, etc… • But: We often can’t identify or measure all relevant sources of correlated error • Thus, we need to abandon simple OLS regression and try other approaches. Example: Pro-environmental values • Source: World Values Survey (~26 countries) . reg supportenv age male dmar demp educ incomerel ses Source | SS df MS -------------+-----------------------------Model | 2761.86228 7 394.551755 Residual | 105404.878 27799 3.79167876 -------------+-----------------------------Total | 108166.74 27806 3.89005036 Number of obs F( 7, 27799) Prob > F R-squared Adj R-squared Root MSE = = = = = = 27807 104.06 0.0000 0.0255 0.0253 1.9472 -----------------------------------------------------------------------------supportenv | Coef. Std. Err. t P>|t| [95% Conf. 
Interval] -------------+---------------------------------------------------------------age | -.0021927 .000803 -2.73 0.006 -.0037666 -.0006187 male | .0960975 .0236758 4.06 0.000 .0496918 .1425032 dmar | .0959759 .02527 3.80 0.000 .0464455 .1455063 demp | -.1226363 .0254293 -4.82 0.000 -.172479 -.0727937 educ | .1117587 .0058261 19.18 0.000 .1003393 .1231781 incomerel | .0131716 .0056011 2.35 0.019 .0021931 .0241501 ses | .0922855 .0134349 6.87 0.000 .0659525 .1186186 _cons | 5.742023 .0518026 110.84 0.000 5.640487 5.843559 Robust Standard Errors • Strategy #1: Improve our estimates of the standard errors – Option 1: Robust Standard Errors • reg y x1 x2 x3, vce(robust) • The Huber / White / “Sandwich” estimator • An alternative method of computing standard errors that is robust to a variety of assumption violations – Provides accurate estimates in presence of heteroskedasticity • Also, robust to model misspecification – Note: Freedman’s criticism: What good are accurate SEs if coefficients are biased due to poor specification? • Doesn’t fix the clustered error problem… Example: Pro-environmental values • Robust Standard Errors . reg supportenv age male dmar demp educ incomerel ses, robust Linear regression Number of obs F( 7, 27799) Prob > F R-squared Root MSE = = = = = 27807 102.48 0.0000 0.0255 1.9472 -----------------------------------------------------------------------------| Robust supportenv | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------age | -.0021927 .0008113 -2.70 0.007 -.0037829 -.0006024 male | .0960975 .0237017 4.05 0.000 .049641 .142554 dmar | .0959759 .025602 3.75 0.000 .0457948 .146157 demp | -.1226363 .0251027 -4.89 0.000 -.1718388 -.0734339 educ | .1117587 .0057498 19.44 0.000 .1004888 .1230286 incomerel | .0131716 .0056017 2.35 0.019 .002192 .0241513 ses | .0922855 .0135905 6.79 0.000 .0656474 .1189237 _cons | 5.742023 .0527496 108.85 0.000 5.638631 5.845415 Standard errors shift a tiny bit… fairly similar to OLS in this case Robust Cluster Standard Errors • Option 2: “Robust cluster” standard errors – An extension of robust SEs to address clustering • reg y x1 x2 x3, vce(cluster groupid) – Note: Cluster implies robust (vs. regular SEs) • It is easy to adapt robust standard errors to address clustering in data; See: – http://www.stata.com/support/faqs/stat/robust_ref.html – http://www.stata.com/support/faqs/stat/cluster.html • Result: SE estimates typically increase, which is appropriate because non-independent cases aren’t providing as much information compared to a sample of independent cases. Example: Pro-environmental values • Robust Cluster Standard Errors . reg supportenv age male dmar demp educ incomerel ses, cluster(country) Linear regression Number of clusters (country) = 26 Number of obs = F( 7, 25) = Prob > F = R-squared = Root MSE = 27807 12.94 0.0000 0.0255 1.9472 -----------------------------------------------------------------------------| Robust supportenv | Coef. Std. Err. t P>|t| [95% Conf. 
Interval] -------------+---------------------------------------------------------------age | -.0021927 .0017599 -1.25 0.224 -.0058172 .0014319 male | .0960975 .0341053 2.82 0.009 .0258564 .1663386 dmar | .0959759 .0722285 1.33 0.196 -.0527815 .2447333 demp | -.1226363 .0820805 -1.49 0.148 -.2916842 .0464115 educ | .1117587 .0301004 3.71 0.001 .0497658 .1737515 incomerel | .0131716 .0260334 0.51 0.617 -.0404452 .0667885 ses | .0922855 .0405742 2.27 0.032 .0087214 .1758496 _cons | 5.742023 .2451109 23.43 0.000 5.237208 6.246838 Cluster standard errors really change the picture. Several variables lose statistical significance. Dummy Variables • Another solution to correlated error within groups/clusters: Add dummy variables • Include a dummy variable for each Level-2 group, to explicitly model variance in means • A simple version of a “fixed effects” model (see below) • Ex: Student achievement; data from 3 classes • Level 1: students; Level 2: classroom • Create dummy variables for each class – Include all but one dummy variable in the model – Or include all dummies and suppress the intercept Yi DClass 2 X i DClass 3 X i X i i Dummy Variables • What is the consequence of adding group dummy variables? • A separate intercept is estimated for each group • Correlated error is absorbed into intercept – Groups won’t systematically fall above or below the regression line • In fact, all “between group” variation (not just error) is absorbed into the intercept – Thus, other variables are really just looking at within group effects – This can be good or bad, depending on your goals. Dummy Variables • Note: You can create a set of dummy variables in stata as follows: • xi i.classid – creates dummy variables for each unique value of the variable “classid” – Creates variables named _Iclassid_1, _Iclassid2, etc • These dummies can be added to the analysis by specifying the variable: _Iclassid* • Ex: reg y x1 x2 x3 _Iclassid*, nocons – “nocons” removes the constant, allowing you to use a full set of dummies. Alternately, you could drop one dummy. Example: Pro-environmental values • Dummy variable model . reg supportenv age male dmar demp educ incomerel ses _Icountry* Source | SS df MS -------------+-----------------------------Model | 11024.1401 32 344.504377 Residual | 97142.6001 27774 3.49760928 -------------+-----------------------------Total | 108166.74 27806 3.89005036 Number of obs F( 32, 27774) Prob > F R-squared Adj R-squared Root MSE = = = = = = 27807 98.50 0.0000 0.1019 0.1009 1.8702 -----------------------------------------------------------------------------supportenv | Coef. Std. Err. t P>|t| [95% Conf. 
Interval] -------------+---------------------------------------------------------------age | -.0038917 .0008158 -4.77 0.000 -.0054906 -.0022927 male | .0979514 .0229672 4.26 0.000 .0529346 .1429683 dmar | .0024493 .0252179 0.10 0.923 -.046979 .0518777 demp | -.0733992 .0252937 -2.90 0.004 -.1229761 -.0238223 educ | .0856092 .0061574 13.90 0.000 .0735404 .097678 incomerel | .0088841 .0059384 1.50 0.135 -.0027554 .0205237 ses | .1318295 .0134313 9.82 0.000 .1055036 .1581554 _Icountry_32 | -.4775214 .085175 -5.61 0.000 -.6444687 -.3105742 _Icountry_50 | .3943565 .0844248 4.67 0.000 .2288798 .5598332 _Icountry_70 | .1696262 .0865254 1.96 0.050 .0000321 .3392203 … dummies omitted … _Icountr~891 | .243995 .0802556 3.04 0.002 .08669 .4012999 _cons | 5.848789 .082609 70.80 0.000 5.686872 6.010707 Dummy Variables • Benefits of the dummy variable approach • It is simple – Just estimate a different intercept for each group • sometimes the dummy interpretations can be of interest • Weaknesses • Cumbersome if you have many groups • Uses up lots of degrees of freedom (not parsimonious) • Makes it hard to look at other kinds of group dummies – Non-varying group variables = collinear with dummies • Can be problematic if your main interest is to study effects of variables across groups – Dummies purge that variation… focus on within-group variation – If there isn’t much within group variation, there isn’t much to analyze – Related point: fixed effects can amplify noise (e.g., in panel data). Dummy Variables • Note: Dummy variables are a simple example of a “fixed effects” model (FEM) • Effect of each group is modeled as a “fixed effect” rather than a random variable • Also can be thought of as the “within-group” estimator – Looks purely at variation within groups – Stata can do a Fixed Effects Model without the effort of using all the dummy variables • Simply request the “fixed effects” estimator in xtreg. Fixed Effects Model (FEM) • Fixed effects model: Yij j X ij ij • For i cases within j groups • Therefore j is a separate intercept for each group • It is equivalent to solely at within-group variation: Yij Y j ( X ij X j ) ij j • X-bar-sub-j is mean of X for group j, etc • Model is “within group” because all variables are centered around mean of each group. Fixed Effects Model (FEM) . xtreg supportenv age male dmar demp educ incomerel ses, i(country) fe Fixed-effects (within) regression Group variable (i): country Number of obs Number of groups = = 27807 26 R-sq: Obs per group: min = avg = max = 511 1069.5 2154 within = 0.0220 between = 0.0368 overall = 0.0239 F(7,27774) = 89.23 corr(u_i, Xb) = 0.0213 Prob > F = 0.0000 -----------------------------------------------------------------------------supportenv | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------age | -.0038917 .0008158 -4.77 0.000 -.0054906 -.0022927 male | .0979514 .0229672 4.26 0.000 .0529346 .1429683 dmar | .0024493 .0252179 0.10 0.923 -.046979 .0518777 demp | -.0733992 .0252937 -2.90 0.004 -.1229761 -.0238223 educ | .0856092 .0061574 13.90 0.000 .0735404 .097678 incomerel | .0088841 .0059384 1.50 0.135 -.0027554 .0205237 ses | .1318295 .0134313 9.82 0.000 .1055036 .1581554 _cons | 5.878524 .052746 111.45 0.000 5.775139 5.981908 -------------+---------------------------------------------------------------sigma_u | .55408807 Identical to dummy variable model! 
sigma_e | 1.8701896 rho | .08069488 (fraction of variance due to u_i) ------------------------------------------------------------------------------ F test that all u_i=0: F(25, 27774) = 94.49 Prob > F = 0.0000
ANOVA: A Digression • Suppose you wish to model variable Y for j groups (clusters) • Ex: Wages for different racial groups • Definitions: • The grand mean is the mean of all groups – Y-bar • The group mean is the mean of a particular subgroup of the population – Y-bar-sub-j
ANOVA: Concepts & Definitions • Y is the dependent variable • We are looking to see whether Y depends upon the particular group a person is in • The effect of a group is the difference between a group's mean & the grand mean • The effect is denoted by alpha (α) • If Y-bar = $8.75 and Y-bar for Group 1 = $8.90, then α for Group 1 = $0.15 • The effect of being in group j is: $\alpha_j = \bar{Y}_j - \bar{Y}$ • It is like a deviation, but for a group.
ANOVA: Concepts & Definitions • ANOVA is based on partitioning deviation • We initially calculated deviation as the distance of a point from the grand mean: $d_i = Y_i - \bar{Y}$ • But you can also think of deviation from a group mean (called "e"): $e_{i,\text{Group 1}} = Y_{i,\text{Group 1}} - \bar{Y}_{\text{Group 1}}$ • Or, for any case i in group j: $e_{ij} = Y_{ij} - \bar{Y}_j$
ANOVA: Concepts & Definitions • The location of any case is determined by: • The grand mean, μ, common to all cases • The group "effect" α, common to members of the group – The distance between a group and the grand mean: "between group" variation • The within-group deviation (e), called "error" – The distance from the group mean to a case's value
The ANOVA Model • This is the basis for a formal model: • For any population with mean μ • Comprised of J subgroups, with Nj cases in each group • Each with a group effect α • The location of any individual can be expressed as follows: $Y_{ij} = \mu + \alpha_j + e_{ij}$ • Yij refers to the value of case i in group j • eij refers to the "error" (i.e., deviation from the group mean) for case i in group j
Sum of Squared Deviation • We are most interested in two parts of the model • The group effects: αj – Deviation of the group from the grand mean • Individual case error: eij – Deviation of the individual from the group mean • Each is a deviation that can be summed up • Remember, we square deviations when summing • Otherwise, they add up to zero • Remember, variance is just squared deviation
Sum of Squared Deviation • The total deviation can be partitioned into αj and eij components • That is, αj + eij = total deviation: $\alpha_j = \bar{Y}_j - \bar{Y}$ and $e_{ij} = Y_{ij} - \bar{Y}_j$, so $\alpha_j + e_{ij} = (\bar{Y}_j - \bar{Y}) + (Y_{ij} - \bar{Y}_j) = Y_{ij} - \bar{Y}$
Sum of Squared Deviation • The total deviation can be partitioned into αj and eij components • The total variance (SStotal) is made up of: – αj: between-group variance (SSbetween) – eij: within-group variance (SSwithin) – SStotal = SSbetween + SSwithin
ANOVA & Fixed Effects • Note that the ANOVA model is similar to the fixed effects model • But the FEM also includes an X term to model a linear trend • ANOVA: $Y_{ij} = \mu + \alpha_j + e_{ij}$ • Fixed Effects Model: $Y_{ij} = \alpha_j + \beta X_{ij} + \epsilon_{ij}$ • In fact, if you don't specify any X variables, they are pretty much the same
Within Group & Between Group Models • Group-effect dummy variables in a regression model create a specific estimate of group effects for all cases • Bs & error are based on the remaining "within group" variation • We could do the opposite: ignore within-group variation and just look at differences between groups • Stata's xtreg command can do this, too • This is essentially just modeling group means!
Between Group Model .
xtreg supportenv age male dmar demp educ incomerel ses, i(country) be Between regression (regression on group means) Group variable (i): country Number of obs Number of groups = = 27 27 R-sq: Obs per group: min = avg = max = 1 1.0 1 within = . between = 0.2505 overall = 0.2505 sd(u_i + avg(e_i.))= .6378002 F(7,19) Prob > F = = 0.91 0.5216 -----------------------------------------------------------------------------supportenv | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------age | .0211517 .0391649 0.54 0.595 -.0608215 .1031248 male | 3.966173 4.479358 0.89 0.387 -5.409232 13.34158 dmar | .8001333 1.127099 0.71 0.486 -1.558913 3.15918 demp | -.0571511 1.165915 -0.05 0.961 -2.497439 2.383137 educ | .3743473 .2098779 1.78 0.090 -.0649321 .8136268 incomerel | .148134 .1687438 0.88 0.391 -.2050508 .5013188 ses | -.4126738 .4916416 -0.84 0.412 -1.441691 .6163439 _cons | 2.031181 3.370978 0.60 0.554 -5.024358 9.08672 Note: Results are identical to the aggregated analysis… Note that N is reduced to 27 Fixed vs. Random Effects • Dummy variables produce a “fixed” estimate of the intercept for each group • But, models don’t need to be based on fixed effects • Example: The error term (ei) • We could estimate a fixed value for all cases – This would use up lots of degrees of freedom – even more than using group dummies • In fact, we would use up ALL degrees of freedom – Stata output would simply report back the raw data (expressed as deviations from the constant) • Instead, we model e as a random variable – We assume it is normal, with standard deviation sigma. Random Effects • A simple random intercept model – Notation from Rabe-Hesketh & Skrondal 2005, p. 4-5 Random Intercept Model Yij 0 j ij • Where is the main intercept • Zeta () is a random effect for each group – Allowing each of j groups to have its own intercept – Assumed to be independent & normally distributed • Error (e) is the error term for each case – Also assumed to be independent & normally distributed • Note: Other texts refer to random intercepts as uj or nj. Random Effects • Issue: The dummy variable approach (ANOVA, FEM) treats group differences as a fixed effect • Alternatively, we can treat it as a random effect • Don’t estimate values for each case, but model it • This requires making assumptions – e.g., that group differences are normally distributed with a standard deviation that can be estimated from data. Linear Random Intercepts Model • The random intercept idea can be applied to linear regression • • • • Often called a “random effects” model… Result is similar to FEM, BUT: FEM looks only at within group effects Aggregate models (“between effects”) looks across groups – Random effects models is a hybrid: a weighted average of between & within group effects • It exploits between & within information, and thus can be more efficient than FEM & aggregate models. – IF distributional assumptions are correct. Linear Random Intercepts Model . xtreg supportenv age male dmar demp educ incomerel ses, i(country) re Random-effects GLS regression Group variable (i): country R-sq: within = 0.0220 between = 0.0371 overall = 0.0240 Random effects u_i ~ Gaussian corr(u_i, X) = 0 (assumed) Assumes normal uj, uncorrelated with X vars Number of obs Number of groups = = 27807 26 Obs per group: min = avg = max = 511 1069.5 2154 Wald chi2(7) Prob > chi2 625.50 0.0000 = = -----------------------------------------------------------------------------supportenv | Coef. Std. Err. 
z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------age | -.0038709 .0008152 -4.75 0.000 -.0054688 -.0022731 male | .0978732 .0229632 4.26 0.000 .0528661 .1428802 dmar | .0030441 .0252075 0.12 0.904 -.0463618 .05245 demp | -.0737466 .0252831 -2.92 0.004 -.1233007 -.0241926 educ | .0857407 .0061501 13.94 0.000 .0736867 .0977947 incomerel | .0090308 .0059314 1.52 0.128 -.0025945 .0206561 ses | .131528 .0134248 9.80 0.000 .1052158 .1578402 _cons | 5.924611 .1287468 46.02 0.000 5.672272 6.17695 -------------+---------------------------------------------------------------sigma_u | .59876138 SD of u (intercepts); SD of e; intra-class correlation sigma_e | 1.8701896 rho | .09297293 (fraction of variance due to u_i) Linear Random Intercepts Model • Notes: Model can also be estimated with maximum likelihood estimation (MLE) • Stata: xtreg y x1 x2 x3, i(groupid) mle – Versus “re”, which specifies weighted least squares estimator • Results tend to be similar • But, MLE results include a formal test to see whether intercepts really vary across groups – Significant p-value indicates that intercepts vary . xtreg supportenv age male dmar demp educ incomerel ses, i(country) mle Random-effects ML regression Number of obs = 27807 Group variable (i): country Number of groups = 26 … MODEL RESULTS OMITTED … /sigma_u | .5397755 .0758087 .4098891 .7108206 /sigma_e | 1.869954 .0079331 1.85447 1.885568 rho | .0769142 .019952 .0448349 .1240176 -----------------------------------------------------------------------------Likelihood-ratio test of sigma_u=0: chibar2(01)= 2128.07 Prob>=chibar2 = 0.000 Choosing Models • Which model is best? • There is much discussion (e.g, Halaby 2004) • Fixed effects are most consistent under a wide range of circumstances • Consistent: Estimates approach true parameter values as N grows very large • But, they are less efficient than random effects – In cases with low within-group variation (big between group variation) and small sample size, results can be very poor – Random Effects = more efficient • But, runs into problems if specification is poor – Esp. if X variables correlate with random group effects – Usually due to omitted variables. Hausman Specification Test • Hausman Specification Test: A tool to help evaluate fit of fixed vs. random effects • Logic: Both fixed & random effects models are consistent if models are properly specified • However, some model violations cause random effects models to be inconsistent – Ex: if X variables are correlated to random error • In short: Models should give the same results… If not, random effects may be biased – If results are similar, use the most efficient model: random effects – If results diverge, odds are that the random effects model is biased. In that case use fixed effects… Hausman Specification Test • Strategy: Estimate both fixed & random effects models • Save the estimates each time • Finally invoke Hausman test – Ex: • • • • • xtreg var1 var2 var3, i(groupid) fe estimates store fixed xtreg var1 var2 var3, i(groupid) re estimates store random hausman fixed random Hausman Specification Test • Example: Environmental attitudes fe vs re . hausman fixed random Direct comparison of coefficients… ---- Coefficients ---| (b) (B) (b-B) sqrt(diag(V_b-V_B)) | fixed random Difference S.E. 
-------------+---------------------------------------------------------------age | -.0038917 -.0038709 -.0000207 .0000297 male | .0979514 .0978732 .0000783 .0004277 dmar | .0024493 .0030441 -.0005948 .0007222 demp | -.0733992 -.0737466 .0003475 .0007303 educ | .0856092 .0857407 -.0001314 .0002993 incomerel | .0088841 .0090308 -.0001467 .0002885 ses | .1318295 .131528 .0003015 .0004153 -----------------------------------------------------------------------------b = consistent under Ho and Ha; obtained from xtreg B = inconsistent under Ha, efficient under Ho; obtained from xtreg Test: Ho: difference in coefficients not systematic chi2(7) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 2.70 Prob>chi2 = 0.9116 Non-significant pvalue indicates that models yield similar results… Within & Between Effects • Issue: What is the relationship between within-group effects and between-group effects? • FEM models within-group variation • BEM models between group variation (aggregate) – Usually they are similar • Ex: Student skills & test performance • Within any classroom, skilled students do best on tests • Between classrooms, classes with more skilled students have higher mean test scores – BUT… Within & Between Effects • But: Between and within effects can differ! • Ex: Effects of wealth on attitudes toward welfare • At the country level (between groups): – Wealthier countries (high aggregate mean) tend to have prowelfare attitudes (ex: Scandinavia) • At the individual level (within group) – Wealthier people are conservative, don’t support welfare • Result: Wealth has opposite between vs within effects! – Watch out for ecological fallacy!!! – Issue: Such dynamics often result from omitted level-1 variables (omitted variable bias) • Ex: If we control for individual “political conservatism”, effects may be consistent at both levels… Within & Between Effects / Centering • Multilevel models & “centering” variables • Grand mean centering: computing variables as deviations from overall mean • Often done to X variables • Has effect that baseline constant in model reflects mean of all cases – Useful for interpretation • Group mean centering: computing variables as deviation from group mean • Useful for decomposing within vs. between effects • Often in conjunction with aggregate group mean vars. Within & Between Effects • You can estimate BOTH within- and betweengroup effects in a single model • Strategy: Split a variable (e.g., SES) into two new variables… – 1. Group mean SES – 2. Within-group deviation from mean SES » Often called “group mean centering” • Then, put both variables into a random effects model • Model will estimate separate coefficients for between vs. within effects – Ex: • egen meanvar1 = mean(var1), by(groupid) • egen withinvar1 = var1 – meanvar1 • Include mean (aggregate) & within variable in model. Within & Between Effects • Example: Pro-environmental attitudes . xtreg supportenv meanage withinage male dmar demp educ incomerel ses, i(country) mle Random-effects ML regression Group variable (i): country Random effects ~ Gaussian Between & withinu_i effects are opposite. Older countries are MORE environmental, but older people are LESS. Omitted variables? Wealthy European countries Log strong likelihood -56918.299 with green =parties have older populations! Number of obs Number of groups = = 27807 26 Obs per group: min = avg = max = 511 1069.5 2154 LR chi2(8) Prob > chi2 620.41 0.0000 = = -----------------------------------------------------------------------------supportenv | Coef. Std. Err. 
z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------meanage | .0268506 .0239453 1.12 0.262 -.0200812 .0737825 withinage | -.003903 .0008156 -4.79 0.000 -.0055016 -.0023044 male | .0981351 .0229623 4.27 0.000 .0531299 .1431403 dmar | .003459 .0252057 0.14 0.891 -.0459432 .0528612 demp | -.0740394 .02528 -2.93 0.003 -.1235873 -.0244914 educ | .0856712 .0061483 13.93 0.000 .0736207 .0977216 incomerel | .008957 .0059298 1.51 0.131 -.0026651 .0205792 ses | .131454 .0134228 9.79 0.000 .1051458 .1577622 _cons | 4.687526 .9703564 4.83 0.000 2.785662 6.58939 Generalizing: Random Coefficients • Linear random intercept model allows random variation in intercept (mean) for groups • But, the same idea can be applied to other coefficients • That is, slope coefficients can ALSO be random! Random Coefficient Model Yij 1 1 j 2 X ij 2 j X ij ij Yij 1 1 j 2 2 j X ij ij Which can be written as: • Where zeta-1 is a random intercept component • Zeta-2 is a random slope component. Linear Random Coefficient Model Both intercepts and slopes vary randomly across j groups Rabe-Hesketh & Skrondal 2004, p. 63 Random Coefficients Summary • Some things to remember: • Dummy variables allow fixed estimates of intercepts across groups • Interactions allow fixed estimates of slopes across groups – Random coefficients allow intercepts and/or slopes to have random variability • The model does not directly estimate those effects – Just as we don’t estimate coefficients of “e” for each case… • BUT, random components can be predicted after you run a model – Just as you can compute residuals – random error – This allows you to examine some assumptions (normality). STATA Notes: xtreg, xtmixed • xtreg – allows estimation of between, within (fixed), and random intercept models • • • • xtreg y x1 x2 x3, i(groupid) fe - fixed (within) model xtreg y x1 x2 x3, i(groupid) be - between model xtreg y x1 x2 x3, i(groupid) re - random intercept (GLS) xtreg y x1 x2 x3, i(groupid) mle - random intercept (MLE) • xtmixed – allows random slopes & coefs • “Mixed” models refer to models that have both fixed and random components • xtmixed [depvar] [fixed equation] || [random eq], options • Ex: xtmixed y x1 x2 x3 || groupid: x2 – Random intercept is assumed. Random coef for X2 specified. STATA Notes: xtreg, xtmixed • Random intercepts • xtreg y x1 x2 x3, i(groupid) mle – Is equivalent to • xtmixed y x1 x2 x3 || groupid: , mle • xtmixed assumes random intercept – even if no other random effects are specified after “groupid” – But, we can add random coefficients for all Xs: • xtmixed y x1 x2 x3 || groupid: x1 x2 x3 , mle cov(unstr) – Useful to add: “cov(unstructured)” • Stata default treats random terms (intercept, slope) as totally uncorrelated… not always reasonable • “cov(unstr) relaxes constraints regarding covariance among random effects (See Rabe-Hesketh & Skrondal). STATA Notes: GLLAMM • Note: xtmixed can do a lot… but GLLAMM can do even more! • “General linear & latent mixed models” • Must be downloaded into stata. Type “search gllamm” and follow instructions to install… – GLLAMM can do a wide range of mixed & latentvariable models • Multilevel models; Some kinds of latent class models; Confirmatory factor analysis; Some kinds of Structural Equation Models with latent variables… and others… • Documentation available via Stata help – And, in the Rabe-Hesketh & Skrondal text. Random intercepts: xtmixed • Example: Pro-environmental attitudes . 
xtmixed supportenv age male dmar demp educ incomerel ses || country: , mle Mixed-effects ML regression Group variable: country Wald chi2(7) = 625.75 Log likelihood = -56919.098 Number of obs Number of groups = = 27807 26 Obs per group: min = avg = max = 511 1069.5 2154 Prob > chi2 0.0000 = -----------------------------------------------------------------------------supportenv | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------age | -.0038662 .0008151 -4.74 0.000 -.0054638 -.0022687 male | .0978558 .0229613 4.26 0.000 .0528524 .1428592 dmar | .0031799 .0252041 0.13 0.900 -.0462193 .0525791 demp | -.0738261 .0252797 -2.92 0.003 -.1233734 -.0242788 educ | .0857707 .0061482 13.95 0.000 .0737204 .097821 incomerel | .0090639 .0059295 1.53 0.126 -.0025578 .0206856 ses | .1314591 .0134228 9.79 0.000 .1051509 .1577674 _cons | 5.924237 .118294 50.08 0.000 5.692385 6.156089 -----------------------------------------------------------------------------[remainder of output cut off] Note: xtmixed yields identical results to xtreg , mle Random intercepts: xtmixed • Ex: Pro-environmental attitudes (cont’d) supportenv | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------age | -.0038662 .0008151 -4.74 0.000 -.0054638 -.0022687 male | .0978558 .0229613 4.26 0.000 .0528524 .1428592 dmar | .0031799 .0252041 0.13 0.900 -.0462193 .0525791 demp | -.0738261 .0252797 -2.92 0.003 -.1233734 -.0242788 educ | .0857707 .0061482 13.95 0.000 .0737204 .097821 incomerel | .0090639 .0059295 1.53 0.126 -.0025578 .0206856 ses | .1314591 .0134228 9.79 0.000 .1051509 .1577674 _cons | 5.924237 .118294 50.08 0.000 5.692385 6.156089 ----------------------------------------------------------------------------------------------------------------------------------------------------------Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+-----------------------------------------------country: Identity | sd(_cons) | .5397758 .0758083 .4098899 .7108199 -----------------------------+-----------------------------------------------sd(Residual) | 1.869954 .0079331 1.85447 1.885568 -----------------------------------------------------------------------------LR test vs. linear regression: chibar2(01) = 2128.07 Prob >= chibar2 = 0.0000 xtmixed output puts all random effects below main coefficients. Here, they are “cons” (constant) for groups defined by “country”, plus residual (e) Non-zero SD indicates that intercepts vary Random Coefficients: xtmixed • Ex: Pro-environmental attitudes (cont’d) . xtmixed supportenv age male dmar demp educ incomerel ses || country: educ, mle [output omitted] supportenv | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------age | -.0035122 .0008185 -4.29 0.000 -.0051164 -.001908 male | .1003692 .0229663 4.37 0.000 .0553561 .1453824 dmar | .0001061 .0252275 0.00 0.997 -.0493388 .049551 demp | -.0722059 .0253888 -2.84 0.004 -.121967 -.0224447 educ | .081586 .0115479 7.07 0.000 .0589526 .1042194 incomerel | .008965 .0060119 1.49 0.136 -.0028181 .0207481 ses | .1311944 .0134708 9.74 0.000 .1047922 .1575966 _cons | 5.931294 .132838 44.65 0.000 5.670936 6.191652 -----------------------------------------------------------------------------Random-effects Parameters | Estimate Std. Err. [95% Conf. 
Interval] -----------------------------+------------------------------------------------ country: Independent | sd(educ) | .0484399 .0087254 .0340312 .0689492 sd(_cons) | .6179026 .0898918 .4646097 .821773 -----------------------------+------------------------------------------------ sd(Residual) | 1.86651 .0079227 1.851046 1.882102 ------------------------------------------------------------------------------ LR test vs. linear regression: chi2(2) = 2187.33 Prob > chi2 = 0.0000 Here, we have allowed the slope of educ to vary randomly across countries. Educ (slope) varies, too!
Random Coefficients: xtmixed • What if the random intercept or slope coefficients aren't significantly different from zero? • Answer: that means there isn't much random variability in the slope/intercept • Conclusion: You don't need to specify that random parameter – Also: Models include an LR test to compare with a simple OLS model (no random effects) • If the models don't differ (chi-square is not significant), stick with the simpler model.
Random Coefficients: xtmixed • What are random coefficients doing? • Let's look at results from a simplified model – Only a random slope & intercept for education [Figure: fitted lines by country; x-axis: highest educational level attained. The model fits a different slope & intercept for each group!]
Random Coefficients • Why bother with random coefficients? – 1. A solution for clustering (non-independence) – Usually people just use random intercepts, but slopes may be an issue also – 2. You can create a better-fitting model – If slopes & intercepts vary, a random coefficient model may fit better – Assuming distributional assumptions are met – Model fit compared to OLS can be tested… – 3. Better predictions – Attention to group-specific random effects can yield better predictions (e.g., slopes) for each group » Rather than just looking at the "average" slope for all groups.
Random Coefficients • 4. Multilevel models explicitly put attention on levels of causality • Higher-level / "contextual" effects versus individual / unit-level effects • A technology for separating out between/within • NOTE: this can be done without random effects – But it goes hand-in-hand with clustered data… • Note: Be sure you have enough level-2 units! – Ex: Models of individual environmental attitudes • Adding level-2 effects: Democracy, GDP, etc. – Ex: Classrooms • Is it student SES, or "contextual" class/school SES?
Multilevel Model Notation • So far, we have expressed random effects in a single equation: Random Coefficient Model: $Y_{ij} = \beta_1 + \zeta_{1j} + \beta_2 X_{ij} + \zeta_{2j} X_{ij} + \epsilon_{ij}$ • However, it is common to separate levels: Level 1 equation: $Y_{ij} = \beta_{1j} + \beta_{2j} X_{ij} + \epsilon_{ij}$; Intercept equation: $\beta_{1j} = \gamma_1 + u_{1j}$; Slope equation: $\beta_{2j} = \gamma_2 + u_{2j}$ • Gamma (γ) = constant; u = random effect • Here, we specify a random component for the level-1 constant & slope
Multilevel Model Notation • The "separate equation" formulation is no different from what we did before… • But it is a vivid & clear way to present your models • All random components are obvious because they are stated in separate equations • NOTE: Some software (e.g., HLM) requires this – Rules: • 1. Specify an OLS model, just like normal • 2. Consider which OLS coefficients should have a random component – These could be the intercept or any X (slope) coefficient • 3. Specify an additional formula for each random coefficient… adding random components where desired
Cross-Level Interactions • Does context (i.e., level-2) influence the effect of level-1 variables? – Example: Effect of poverty on homelessness • Does it interact with welfare state variables?
– Ex: Effect of gender on math test scores • Is it different in coed vs. single-sex schools? – Can you think of others? Cross-level interactions • Idea: specify a level-2 variable that affects a level-1 slope Level 1 equation Yij 1 2 X ij ij Intercept equation 1 1 u1 j Slope equation with interaction 2 2 3 Z j u2 j Cross-level interaction: Level-2 variable Z affects slope (B2) of a level-1 X variable Coefficient 3 reflects size of interaction (effect on B2 per unit change in Z) Cross-level Interactions • Cross-level interaction in single-equation form: Random Coefficient Model with cross-level interaction Yij 1 1 j 2 X ij 2 j X ij 3X ij Z j ij – Stata strategy: manually compute cross-level interaction variables • Ex: Poverty*WelfareState, Gender*SingleSexSchool • Then, put interaction variable in the “fixed” model – Interpretation: B3 coefficient indicates the impact of each unit change in Z on slope B2 • If B3 is positive, increase in Z results in larger B2 slope. Cross-level Interactions • Pro-environmental attitudes . xtmixed supportenv age male dmar demp educ income_dev inc_meanXeduc ses || country: income_mean , mle cov(unstr) Mixed-effects ML regression Group variable: country Interaction between country mean Number of obs = 27807 income and individual-level education Number of groups = 26 supportenv | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------age | -.0038786 .0008148 -4.76 0.000 -.0054756 -.0022817 male | .1006206 .0229617 4.38 0.000 .0556165 .1456246 dmar | .0041417 .025195 0.16 0.869 -.0452395 .0535229 demp | -.0733013 .0252727 -2.90 0.004 -.1228348 -.0237678 educ | -.035022 .0297683 -1.18 0.239 -.0933668 .0233227 income_dev | .0081591 .005936 1.37 0.169 -.0034753 .0197934 inc_meanXeduc| .0265714 .0064013 4.15 0.000 .0140251 .0391177 ses | .1307931 .0134189 9.75 0.000 .1044926 .1570936 _cons | 5.892334 .107474 54.83 0.000 5.681689 6.102979 ------------------------------------------------------------------------------ Interaction: inc_meanXeduc has a positive effect… The education slope is bigger in wealthy countries Note: main effects change. “educ” indicates slope when inc_mean = 0 Cross-level Interactions • Random part of output (cont’d from last slide) . xtmixed supportenv age male dmar demp educ income_dev inc_meanXeduc ses || country: income_mean , mle cov(unstr) -----------------------------------------------------------------------------Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+-----------------------------------------------country: Unstructured | sd(income~n) | .5419256 .2095339 .253995 1.156256 sd(_cons) | 2.326379 .8679172 1.11974 4.8333 corr(income~n,_cons) | -.9915202 .0143006 -.999692 -.7893791 -----------------------------+-----------------------------------------------sd(Residual) | 1.869388 .0079307 1.853909 1.884997 -----------------------------------------------------------------------------LR test vs. linear regression: chi2(3) = 2124.20 Prob > chi2 = 0.0000 Random components: Income_mean slope allowed to have random variation Interceps (“cons”) allowed to have random variation “cov(unstr)” allows for the possibility of correlation between random slopes & intercepts… generally a good idea. 
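Spelled out in the notation of the earlier random coefficient model, the model estimated above is roughly $Y_{ij} = \beta_1 + \zeta_{1j} + (\beta_2 + \zeta_{2j})X_{ij} + \beta_3 X_{ij} Z_j + \epsilon_{ij}$, where X is individual education, Z is country mean income, and $\beta_3$ is the cross-level interaction coefficient. The slides do not show how income_mean, income_dev, and inc_meanXeduc were built, so the following is only a plausible sketch using the group-mean-centering strategy described earlier:
. * Country mean of relative income, individual deviation from it, and the cross-level interaction term
. egen income_mean = mean(incomerel), by(country)
. gen income_dev = incomerel - income_mean
. gen inc_meanXeduc = income_mean*educ
. * Mixed model with the interaction in the fixed part and a random slope for income_mean
. xtmixed supportenv age male dmar demp educ income_dev inc_meanXeduc ses || country: income_mean, mle cov(unstr)
Note that plain arithmetic such as the deviation above uses gen; egen is reserved for functions like mean().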
Beyond 2-level models • Sometimes data has 3 levels or more • • • • Ex: School, classroom, individual Ex: Family, individual, time (repeated measures) Can be dealt with in xtmixed, GLLAMM, HLM Note: stata manual doesn’t count lowest level – What we call 3-level is described as “2-level” in stata manuals – xtmixed syntax: specify “fixed” equation and then random effects starting with “top” level • xtmixed var1 var2 var3 || schoolid: var2 || classid:var3 – Again, specify unstructured covariance: cov(unstr) Beyond Linear Models • Stata can specify multilevel models for dichotomous & count variables – Random intercept models • • • • xtlogit – logistic regression – dichotomous xtpois – poisson regression – counts xtnbreg – negative binomial – counts xtgee – any family, link… w/random intercept – Random intercept & coefficient models – Plus, allows more than 2 levels… • xtmelogit – mixed logit model • xtmepoisson – mixed poisson model Panel Data • Panel data is a multilevel structure • Cases measured repeatedly over time • Measurements are ‘nested’ within cases Person 1 Person 2 Person 3 Person 4 T1 T2 T3 T4 T5 T1 T2 T3 T4 T5 T1 T2 T3 T4 T5 T1 T2 T3 T4 T5 – Obviously, error is clustered within cases… but… – Error may also be clustered by time • Historical time events or life-course events may mean that cases aren’t independent – Ex: All T1s and all T5s • Ex: Models of economic growth… certain periods (e.g., Oil shocks of 1970s) affect all countries. Panel Data • Issue: panel data may involve clustering across cases & time • Good news: Stata’s “xt” commands were made for this • Allow specification of both ID and TIME clusters… • Ex: xtreg var1 var2 var3, mle i(countryid) t(year) – Note: You can also “mix and match” fixed and random effects • Ex: You can use dummies (manually) to deal with timeclustering with a random effect for case ids Panel Data: serial correlation • Panel data may have another problem: • Sequential cases may have correlated error – Ex: Adjacent years (1950 & 1951 or 2007 & 2008) may be very similar. Correlation denoted by “rho” (r) • Called “autocorrelation” or “serial correlation” • “Time-series” models are needed • xtregar – xtreg, for cases in which the error-term is “first-order autoregressive” • First order means the prior time influences the current – Only adjacent time-points… assumes no effect of those prior • Can be used to estimate FEM, BEM, or GLS model • Use option “lbi” to test for autocorrelation (rho = 0?). Panel Data: Choosing a Model • If clustering is mainly a nuisance: • Adjust SEs: vce(cluster caseid) • Or simple fixed or random effects – Choice between fixed & random • Fixed is “safer” – reviewers are less likely to complain – If hausman test works, random = OK, too • But, if cross-sectional variation is of interest, fixed can be a problem… – In that case, use random effects… and hope the reviewers don’t give you grief. Panel Data: Choosing a Model • If you have substantive interests in cross-level dynamics, mixed models are probably the way to go… • Plus, you can create a better-fitting model – Allows you to relax the assumption that slopes are the same across groups.
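To make the syntax in this closing section concrete, here is a minimal, hypothetical sketch; the variable names (y, x1, x2, schoolid, classid, countryid, year) are placeholders rather than names from the course data:
. * Three-level random intercept model (e.g., students in classes in schools); highest level listed first
. xtmixed y x1 x2 || schoolid: || classid: , mle
. * Panel data: declare panel id and time variable, then fit a random intercept model
. xtset countryid year
. xtreg y x1 x2, re
. * Random-effects model with first-order autoregressive (AR(1)) errors; lbi requests the autocorrelation test
. xtregar y x1 x2, re lbi
. * Random intercept logit for a dichotomous outcome on the same panel
. xtlogit y x1 x2, re
As discussed above, cluster-robust standard errors, a fixed effects model, or a random/mixed model are all defensible, depending on whether the clustering is mainly a nuisance or of substantive interest.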