Download Lecture5-12-09 - University of Washington

Methods for Multilevel Analysis
XH Andrew Zhou, PhD
Professor, Department of Biostatistics, University of Washington

Examples of Multilevel (Hierarchical) Data
• Individual-family-neighborhood
• Students-classroom-school district
• Patient-provider-facility (the Ambulatory Care Quality Improvement Project, ACQUIP)
• Other types, e.g., multiple outcomes nested within an individual

ACQUIP Alcohol Trial
• A group-randomized trial.
• Intervention: at each visit, providers were given feedback on the patient's general perceived health status and on the condition-specific perceived health status for 6 common conditions: chronic obstructive pulmonary disease (COPD), coronary artery disease (CAD), hypertension, depression, diabetes, and alcohol problems.
• Outcome at 1-year follow-up: patient self-report of advice about alcohol from their provider (a binary outcome).

Hierarchical Nature of the Data
• Patients-providers-facilities
• Patient characteristics, e.g., advice at baseline, comorbidity
• Provider characteristics, e.g., panel size
• Facility characteristics, e.g., urban vs. rural

Research Questions
• Was the intervention significantly associated with patient self-reports of advice about alcohol from their providers one year after the intervention?
• What are the independent effects of patient-level, provider-level, and facility-level factors?
• How large are the provider-to-provider and facility-to-facility variabilities, and to what degree can they be explained by patient-level, provider-level, and facility-level factors?

Research Questions, Cont.
• Do facilities differ in expected outcomes after controlling for individual-level, provider-level, and facility-level factors?
• Do providers differ in expected outcomes after controlling for individual-level, provider-level, and facility-level factors?
Multilevel (Hierarchical) Models
A hierarchical model analysis treats the sites and the providers as random effects and partitions the total variation in the outcome into the parts attributable to each level of the hierarchy.

An Example Using a Two-Level Linear Model on Schools
• A study of the relationship between a single student-level predictor (say, socioeconomic status, SES) and one student-level outcome (mathematics achievement) in J schools randomly drawn from the entire population of schools.

The SES-Achievement Relationship in One School
• Figure 2.1 provides a scatterplot of this relationship. Our regression model would be
$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2),$$
where $Y_i$ is the math achievement score for student i, $x_i$ is the socioeconomic status of student i, $\beta_0$ is the intercept, and $\beta_1$ is the slope.

Centering the Covariates
• $\beta_0$ is defined as the expected achievement of a student whose SES is zero, which may not be a meaningful value.
• It may be helpful to rescale the independent variable, X, so that the intercept is meaningful. We center SES by subtracting the mean SES from each score; the intercept is then the expected achievement of a student with average SES.
• Figure 2.2 shows the regression model with centering.

The SES-Achievement Relationship in Two Schools
• Figure 2.3 shows separate regression models for two schools:
$$Y_{i1} = \beta_{01} + \beta_{11} x_{i1} + \epsilon_{i1}, \qquad \epsilon_{i1} \sim N(0, \sigma^2),$$
$$Y_{i2} = \beta_{02} + \beta_{12} x_{i2} + \epsilon_{i2}, \qquad \epsilon_{i2} \sim N(0, \sigma^2),$$
where $Y_{ij}$ is the achievement of student i in school j, $x_{ij}$ is the SES of student i in school j, and $\beta_{0j}$ and $\beta_{1j}$ are the intercept and slope in school j.
• The two lines indicate that School 1 and School 2 differ in two ways:
(1) School 1 has a higher mean than School 2 ($\beta_{01} > \beta_{02}$).
(2) SES is less predictive of achievement in School 1 than in School 2 ($\beta_{11} < \beta_{12}$).
• If students had been randomly assigned to the two schools, we could say that School 1 is both more “effective” and more “equitable”.
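The effect of centering on the intercept, and the two ways two schools can differ, can be checked with a short least-squares fit. This is a minimal sketch in Python (standard library only); the function name `fit_school` and the toy SES and achievement values are hypothetical, and SES is centered at each school's own mean, which is one of several possible centering choices.

```python
def fit_school(ses, achievement, center=True):
    """Ordinary least-squares fit of achievement on SES within one school.

    Returns (intercept, slope). With center=True, SES is centered at the
    school's own mean, so the intercept is the predicted achievement at
    average SES (equal to the school's mean achievement). With
    center=False, the intercept is the prediction at SES = 0.
    """
    n = len(ses)
    x_bar = sum(ses) / n
    y_bar = sum(achievement) / n
    sxx = sum((x - x_bar) ** 2 for x in ses)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ses, achievement))
    slope = sxy / sxx  # centering shifts the intercept but not the slope
    intercept = y_bar if center else y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data for two schools (not from the lecture).
school_1 = ([1, 2, 3, 4], [2, 4, 5, 7])
school_2 = ([1, 2, 3, 4], [0, 2, 4, 6])
for name, (ses, ach) in [("School 1", school_1), ("School 2", school_2)]:
    b0, b1 = fit_school(ses, ach)
    print(f"{name}: intercept={b0:.2f}, slope={b1:.2f}")
```

These toy schools give intercepts 4.50 and 3.00 and slopes 1.60 and 2.00, the same qualitative pattern described for Figure 2.3 (higher mean, flatter slope in School 1).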
Of course, students are not assigned to schools at random, so such interpretations of school effects are unwarranted without taking into account other differences in student composition.

The SES-Achievement Relationship in J Schools (2-Level Variance Component)
$$Y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \epsilon_{ij}, \qquad \epsilon_{ij} \sim N(0, \sigma^2),$$
where $Y_{ij}$ is the achievement of student i in school j, $x_{ij}$ is the SES of student i in school j, and $\beta_{0j}$ and $\beta_{1j}$ are the intercept and slope in school j.
• For each school, effectiveness and equity are described by the pair of values $(\beta_{0j}, \beta_{1j})$.
• It is often sensible and convenient to assume that the intercept and slope have a bivariate normal distribution across the population of schools:
$$E(\beta_{0j}) = \gamma_0, \quad \mathrm{Var}(\beta_{0j}) = \tau_{00}, \quad E(\beta_{1j}) = \gamma_1, \quad \mathrm{Var}(\beta_{1j}) = \tau_{11}, \quad \mathrm{Cov}(\beta_{0j}, \beta_{1j}) = \tau_{01}.$$

Interpretation
• $\gamma_0$: the average school mean for the population of schools
• $\tau_{00}$: the population variance among the school means
• $\gamma_1$: the average SES-achievement slope for the population of schools
• $\tau_{11}$: the population variance among the slopes
• $\tau_{01}$: the population covariance between slopes and intercepts

• Figure 2.4 provides a scatterplot of the relationship between $\beta_{0j}$ and $\beta_{1j}$ for a hypothetical sample of 200 schools.
• There is more dispersion among means than among slopes ($\tau_{00} > \tau_{11}$).
• The two effects tend to be negatively correlated ($\tau_{01} < 0$): schools with high average achievement, $\beta_{0j}$, tend to have a weak SES-achievement relationship, $\beta_{1j}$.

Modeling the Second Level
• Having examined graphically how schools vary in their intercepts and slopes, we wish to develop a model that predicts $\beta_{0j}$ and $\beta_{1j}$ from school characteristics.
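To make $\gamma_0$, $\gamma_1$, $\tau_{00}$, $\tau_{11}$, and $\tau_{01}$ concrete, the sketch below computes their sample analogues from a list of per-school (intercept, slope) pairs. It rests on a simplifying assumption: the pairs are treated as if the true $\beta_{0j}$ and $\beta_{1j}$ were observed. In practice they are estimated with error, so the sample variances of per-school OLS estimates overstate $\tau_{00}$ and $\tau_{11}$; the multilevel estimation methods in these slides account for that. The function name and the three example pairs are hypothetical.

```python
def school_moments(pairs):
    """Sample analogues of (gamma_0, gamma_1, tau_00, tau_11, tau_01).

    pairs: list of (beta_0j, beta_1j) tuples, one per school.
    Uses the J - 1 divisor for the variances and the covariance.
    """
    j = len(pairs)
    g0 = sum(b0 for b0, _ in pairs) / j
    g1 = sum(b1 for _, b1 in pairs) / j
    t00 = sum((b0 - g0) ** 2 for b0, _ in pairs) / (j - 1)
    t11 = sum((b1 - g1) ** 2 for _, b1 in pairs) / (j - 1)
    t01 = sum((b0 - g0) * (b1 - g1) for b0, b1 in pairs) / (j - 1)
    return g0, g1, t00, t11, t01


# Hypothetical schools: higher-achieving schools have flatter SES slopes.
print(school_moments([(50.0, 2.0), (52.0, 1.0), (54.0, 0.0)]))
```

For these three schools the result is $\tau_{00} = 4 > \tau_{11} = 1$ and $\tau_{01} = -2 < 0$, the same qualitative pattern described for Figure 2.4.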
• Let $W_j$ be an indicator that takes the value one for Catholic schools and zero for public schools.

Two-Level Linear Model, Cont.
$$\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}, \qquad u_{0j} \sim N(0, \tau_{00}),$$
$$\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}, \qquad u_{1j} \sim N(0, \tau_{11}),$$
$$\mathrm{Cov}(u_{0j}, u_{1j}) = \tau_{01}.$$

Interpretation
• $\gamma_{00}$: the mean achievement for public schools
• $\gamma_{01}$: the mean achievement difference between Catholic and public schools
• $\gamma_{10}$: the average SES-achievement slope in public schools
• $\gamma_{11}$: the mean difference in SES-achievement slope between Catholic and public schools
• $u_{0j}$: the unique effect of school j on mean achievement, holding $W_j$ constant
• $u_{1j}$: the unique effect of school j on the SES-achievement slope, holding $W_j$ constant

Estimation Methods
• The parameters of these second-level regression models cannot be estimated directly, because their outcomes, $\beta_{0j}$ and $\beta_{1j}$, are not observed. However, the data contain the information needed for this estimation.

Estimation Methods, Cont.
• Combining the models from the two levels (with SES centered), we obtain
$$Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10}(X_{ij} - \bar{X}) + \gamma_{11} W_j (X_{ij} - \bar{X}) + \epsilon^*_{ij},$$
$$\epsilon^*_{ij} = u_{0j} + u_{1j}(X_{ij} - \bar{X}) + \epsilon_{ij}.$$

Estimation Methods, Cont.
• The combined model is not the typical linear model assumed in ordinary least squares (OLS). Efficient estimation and accurate hypothesis tests based on OLS require random errors that are independent, normally distributed, and of constant variance.
• In contrast, the random errors $\epsilon^*_{ij}$ in the combined model are dependent within each school and have non-constant variances.

Estimation Methods, Cont.
• The variance of the random errors has the more complicated form
$$\mathrm{Var}(\epsilon^*_{ij}) = \tau_{00} + 2\tau_{01}(X_{ij} - \bar{X}) + \tau_{11}(X_{ij} - \bar{X})^2 + \sigma^2.$$

Estimation Methods, Cont.
• Although standard regression analysis is not appropriate, such models can be estimated by iterative maximum likelihood procedures.
• Figure 2.5 provides a graphical representation of the model specified above.
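The variance and covariance of the composite error follow directly from $\epsilon^*_{ij} = u_{0j} + u_{1j}(X_{ij} - \bar{X}) + \epsilon_{ij}$: for two students i and k in the same school, $\mathrm{Cov}(\epsilon^*_{ij}, \epsilon^*_{kj}) = \tau_{00} + \tau_{01}[(X_{ij} - \bar{X}) + (X_{kj} - \bar{X})] + \tau_{11}(X_{ij} - \bar{X})(X_{kj} - \bar{X})$, plus $\sigma^2$ when i = k. The sketch below builds this within-school covariance matrix; the function name and the parameter values in the example are hypothetical.

```python
def school_covariance(centered_ses, tau00, tau11, tau01, sigma2):
    """Covariance matrix of the composite errors for one school.

    centered_ses: list of X_ij - X_bar for the students in the school.
    Entry (i, k) is tau00 + tau01 * (x_i + x_k) + tau11 * x_i * x_k,
    with sigma2 added on the diagonal (i == k).
    """
    m = len(centered_ses)
    cov = [[0.0] * m for _ in range(m)]
    for i, xi in enumerate(centered_ses):
        for k, xk in enumerate(centered_ses):
            cov[i][k] = tau00 + tau01 * (xi + xk) + tau11 * xi * xk
            if i == k:
                cov[i][k] += sigma2  # level-1 variance only on the diagonal
    return cov


# Hypothetical values: two students with centered SES +1 and -1.
v = school_covariance([1.0, -1.0], tau00=4.0, tau11=1.0, tau01=-0.5, sigma2=2.0)
print(v)
```

The two diagonal entries (6 and 8) differ even though the students share the same variance parameters, which is the non-constant variance noted above, and the nonzero off-diagonal entry (3) is the within-school dependence.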
• In Figure 2.5, two hypothetical plots show the association between $\beta_{0j}$ and $\beta_{1j}$, one for public schools and one for Catholic schools. The plots show that Catholic schools have both higher mean achievement and weaker SES effects than public schools.

Estimation Methods, Cont.
There are three types of parameters to be estimated:
1. Fixed effects ($\gamma_{00}, \gamma_{01}, \gamma_{10}, \gamma_{11}$)
2. Random level-1 coefficients ($\beta_{0j}, \beta_{1j}$)
3. Variance-covariance components ($\sigma^2, \tau_{00}, \tau_{11}, \tau_{01}$)

Three Common Estimation Methods
• The maximum likelihood (ML) method is a general estimation procedure that produces the values of the population parameters that maximize the probability of observing the data given the model.
• Iterative generalized least squares (IGLS) and restricted iterative generalized least squares (RIGLS).
• Bayesian methods.

ML Method
Two different likelihood functions:
1. Full maximum likelihood (FML): both the regression coefficients and the variance components are included in the likelihood function.
2. Restricted maximum likelihood (RML): only the variance components are included in the likelihood function, and the regression coefficients are estimated in a second step.

Comparison of the Two Methods
• FML is more efficient and provides estimates of both the variance components and the fixed-effect parameters. However, FML may produce biased estimates of the variance components.
• RML provides less biased estimates of the variance components and, when the groups are balanced, is equivalent to the ANOVA estimates, which are optimal in that case.
• FML continues to be used because (1) its computation is generally easier, and (2) two models that differ in their fixed parameters can be compared with likelihood-based tests.
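The source of the FML bias is easiest to see in an ordinary single-level regression, where both approaches have closed forms: maximizing the full likelihood gives $\hat\sigma^2 = \mathrm{RSS}/n$, while the restricted likelihood gives $\mathrm{RSS}/(n - p)$, with p the number of regression coefficients. The single-level setting is a simplification chosen for illustration (the multilevel case generally has no closed form), and the data below are hypothetical.

```python
def residual_variance_estimates(x, y):
    """Fit y = b0 + b1 * x by OLS and return (ML, REML) estimates of sigma^2.

    ML divides the residual sum of squares by n; REML divides by n - p,
    where p = 2 regression coefficients, which removes the downward bias.
    """
    n, p = len(x), 2
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return rss / n, rss / (n - p)


ml, reml = residual_variance_estimates([1, 2, 3, 4], [2, 4, 5, 7])
print(ml, reml)
```

With n = 4 and p = 2, the ML estimate (about 0.05) is half the REML estimate (about 0.10); the gap shrinks as n grows relative to p.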
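• With RML, however, only models that differ in the random part can be compared with likelihood-based tests.

IGLS and RIGLS
• The combined model is
$$Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10}(X_{ij} - \bar{X}) + \gamma_{11} W_j (X_{ij} - \bar{X}) + \epsilon^*_{ij},$$
where $\epsilon^*_{ij} = u_{0j} + u_{1j}(X_{ij} - \bar{X}) + \epsilon_{ij}$ is the composite error, with
$$\mathrm{Var}(\epsilon^*_{ij}) = \tau_{00} + 2\tau_{01}(X_{ij} - \bar{X}) + \tau_{11}(X_{ij} - \bar{X})^2 + \sigma^2,$$
$$\mathrm{Cov}(\epsilon^*_{ij}, \epsilon^*_{kj}) = \tau_{00} + \tau_{01}\big[(X_{ij} - \bar{X}) + (X_{kj} - \bar{X})\big] + \tau_{11}(X_{ij} - \bar{X})(X_{kj} - \bar{X}) \quad \text{for } i \neq k.$$
• Equivalently, in matrix form,
$$\mathbf{Y} \sim N_n(\mathbf{X}\boldsymbol{\gamma}, \mathbf{V}), \qquad \mathbf{Y} = (Y_{11}, \ldots, Y_{m_J J})', \qquad \boldsymbol{\gamma} = (\gamma_{00}, \gamma_{01}, \gamma_{10}, \gamma_{11})',$$
where $\mathbf{X}$ is the design matrix, $m_j$ is the number of students in school j, $n = \sum_j m_j$, and $\mathbf{V}$ is the covariance matrix with the entries given above.

IGLS and RIGLS, Cont.
• If $\sigma^2$, $\tau_{00}$, $\tau_{11}$, and $\tau_{01}$ were known, the covariance matrix $\mathbf{V}$ could be constructed immediately, and estimation could be performed with generalized least squares (GLS), $\hat{\boldsymbol{\gamma}} = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{V}^{-1}\mathbf{Y}$.
• Without knowledge of the covariance matrix, estimation is instead an iterative process known as iterative generalized least squares (IGLS).

IGLS and RIGLS, Cont.
• The first step is to start with reasonable estimates of the fixed parameters, typically the OLS estimates, which assume $\tau_{00} = \tau_{11} = \tau_{01} = 0$.
• From these estimates, the raw residuals are formed:
$$\tilde{y}_{ij} = y_{ij} - \hat{\gamma}_{00} - \hat{\gamma}_{01} W_j - \hat{\gamma}_{10}(X_{ij} - \bar{X}) - \hat{\gamma}_{11} W_j (X_{ij} - \bar{X}).$$

IGLS and RIGLS, Cont.
• Let $\tilde{\mathbf{Y}}$ be the vector of raw residuals. It can be shown that $E[\tilde{\mathbf{Y}}\tilde{\mathbf{Y}}'] = \mathbf{V}$ when the fixed effects are at their true values, so the cross-products of the raw residuals carry information about the variance components.
• Estimating the variance components involves an application of GLS. GLS is a regression technique used when the error terms from OLS estimation display non-random patterns, such as correlation.

IGLS and RIGLS, Cont.
• With the variance-component estimates from GLS, the procedure returns to the fixed part of the model and calculates new estimates of the fixed effects.
• The procedure alternates between the fixed and random parts in this way until convergence, that is, until the parameter estimates no longer change from iteration to iteration.

IGLS and RIGLS, Cont.
• IGLS may produce biased estimates of the random parameters because it does not take into account the sampling variation of the estimated fixed effects.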
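The alternation between the fixed and random parts can be sketched for the simplest case, a random-intercept model $y_{ij} = \gamma_0 + u_j + \epsilon_{ij}$ with $\mathrm{Var}(u_j) = \tau$ and $\mathrm{Var}(\epsilon_{ij}) = \sigma^2$. The sketch makes a simplifying assumption that departs from full IGLS: in the random step, the within-school residual cross-products are regressed on their expected values ($\tau + \sigma^2$ on the diagonal, $\tau$ off it) by unweighted least squares, which reduces to two averages, whereas IGLS proper uses a GLS regression at that step. The fixed step is the GLS weighted mean. The function name and data are hypothetical, and at least one school must have two or more students.

```python
def igls_random_intercept(groups, max_iter=100, tol=1e-10):
    """Simplified IGLS-style alternation for y_ij = gamma + u_j + e_ij.

    groups: list of lists of outcomes, one inner list per school.
    Returns (gamma, tau, sigma2).
    """
    all_y = [y for g in groups for y in g]
    gamma = sum(all_y) / len(all_y)  # OLS start: tau = 0 gives the grand mean
    tau, sigma2 = 0.0, 0.0
    for _ in range(max_iter):
        # Random step: cross-products of raw residuals within each school.
        diag, off = [], []
        for g in groups:
            r = [y - gamma for y in g]
            for i, ri in enumerate(r):
                for k, rk in enumerate(r):
                    (diag if i == k else off).append(ri * rk)
        # Unweighted least squares on E[r_i r_k] = tau + sigma2 * (i == k).
        tau_new = max(sum(off) / len(off), 0.0)
        sigma2_new = max(sum(diag) / len(diag) - tau_new, 1e-12)
        # Fixed step: GLS estimate of gamma given the current variances.
        weights = [len(g) / (sigma2_new + len(g) * tau_new) for g in groups]
        gamma_new = (sum(w * sum(g) / len(g) for w, g in zip(weights, groups))
                     / sum(weights))
        converged = (abs(gamma_new - gamma) < tol and abs(tau_new - tau) < tol
                     and abs(sigma2_new - sigma2) < tol)
        gamma, tau, sigma2 = gamma_new, tau_new, sigma2_new
        if converged:
            break
    return gamma, tau, sigma2


print(igls_random_intercept([[1, 2, 3], [4, 5, 6]]))
```

For these balanced toy data the fixed step keeps $\hat\gamma_0$ at the grand mean 3.5, and the procedure settles at $\hat\sigma^2 = 1$ and $\hat\tau \approx 1.92$. The ANOVA estimate of $\tau$, which coincides with RML for balanced groups, is $(13.5 - 1)/3 \approx 4.17$, so the residual-based estimate is much smaller: the residuals are formed from an estimated, rather than the true, fixed effect, which biases the variance estimate downward.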
• This bias may be most severe in small samples. However, unbiased estimates can be produced using restricted iterative generalized least squares (RIGLS).
• The main difference between the two is that IGLS produces maximum likelihood estimates, whereas RIGLS produces restricted maximum likelihood estimates.

Bayesian Method
• Bayesian methods combine prior information about the parameters with the information contained in the data to produce a posterior distribution.
• Markov chain Monte Carlo (MCMC) methods are commonly used to generate a random sample from a posterior distribution. MCMC methods are also iterative and include Gibbs sampling and Metropolis-Hastings sampling.
• MCMC methods tend to produce more accurate interval estimates in small samples.

Three-Level Binary Response Models for the Alcohol Drinking Data
• Let $Y_{ijk}$ be the binary response indicating whether patient i, cared for by provider j in hospital k, received advice about drinking.
• $X_{ijk}$ is the intervention status of patient i of provider j in hospital k.

Three-Level Logistic Regression
$$\mathrm{logit}\big(\Pr(Y_{ijk} = 1)\big) = \beta_{0jk} + \beta_1 X_{ijk} + e_{ijk},$$
$$\beta_{0jk} = \beta_0 + v_k + u_{jk},$$
$$v_k \sim N(0, \sigma_v^2), \qquad u_{jk} \sim N(0, \sigma_u^2), \qquad e_{ijk} \sim N(0, \sigma_e^2).$$
• The parameter $\sigma_e^2$ provides a natural test of whether the assumption of binomial variation is valid. If $\sigma_e^2$ is significantly different from one, the data are said to exhibit extra-binomial variation: if $\sigma_e^2$ is less than one, the data are said to be under-dispersed, and if it is greater than one, over-dispersed.

Two Estimation Methods
Two estimation methods for multilevel logistic regression models are described briefly below:
• A quasi-likelihood approach
• A Bayesian approach with MCMC methods

Two Quasi-Likelihood Methods
• In the quasi-likelihood approach, the first step in the estimation is to approximate the nonlinear logistic regression equation using a Taylor series expansion.
• A Taylor series approximates a nonlinear function by an infinite series of terms.
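To see what truncating a Taylor series does, the sketch below approximates the inverse logit, $\mathrm{expit}(\eta) = 1/(1 + e^{-\eta})$, around an expansion point $\eta_0$ using its first two derivatives, $p_0(1 - p_0)$ and $p_0(1 - p_0)(1 - 2p_0)$ with $p_0 = \mathrm{expit}(\eta_0)$. It is only the scalar building block of a quasi-likelihood method: in MQL the expansion point would come from the fixed part of the linear predictor, and in PQL from the fixed part plus the predicted random effects. The function names are hypothetical, and no multilevel model is fitted here.

```python
import math


def expit(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def taylor_expit(eta, eta0, order=1):
    """First- or second-order Taylor approximation of expit(eta) around eta0."""
    p0 = expit(eta0)
    d1 = p0 * (1.0 - p0)          # first derivative of expit at eta0
    d2 = d1 * (1.0 - 2.0 * p0)    # second derivative of expit at eta0
    approx = p0 + d1 * (eta - eta0)
    if order >= 2:
        approx += 0.5 * d2 * (eta - eta0) ** 2
    return approx


true_p = expit(1.5)
for order in (1, 2):
    approx = taylor_expit(1.5, eta0=1.0, order=order)
    print(order, round(approx, 4), "error:", round(abs(approx - true_p), 4))
```

Moving from the first-order to the second-order expansion cuts the approximation error at $\eta = 1.5$ from about 0.012 to about 0.0004; the error grows as $\eta$ moves away from the expansion point.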
• If only the first-order term of the Taylor series is used, the estimation is known as a first-order approximation; if the second-order term is also used, it is known as a second-order approximation.
• If the Taylor series is expanded about the fixed parameters only, the estimation is known as marginal quasi-likelihood (MQL).

Two Quasi-Likelihood Methods, Cont.
• If the Taylor series is expanded about both the fixed and the random parameters, the estimation is known as penalized quasi-likelihood (PQL).
• Once the quasi-likelihood has been formed, the IGLS and RIGLS procedures can be applied to estimate the parameter values.

Bayesian Method
• The MCMC method used for the logistic regression models in this paper is Metropolis-Hastings sampling.

ACQUIP Alcohol Trial
• Binary outcome at 1-year follow-up: patient self-report of advice about alcohol from their provider.
• Patient-level covariates.
• Provider-level covariates.

The Alcohol Example, Cont.
• Random assignment at the firm level should ensure that, on average, the two groups are balanced on the baseline covariates. However, imbalance may still occur, and confounding may still present a problem.
• Patient-level potential confounders: hypertension, liver disease, being a smoker in the past year, and the AUDIT score.
• Provider-level potential confounders: the number of patients per provider (panel size) and provider training.
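Metropolis-Hastings sampling can be illustrated on a deliberately small problem: a single binomial proportion on the logit scale, $\theta = \mathrm{logit}(p)$, with a $N(0, 10^2)$ prior. This is a sketch under that simplifying assumption, not the three-level model; a multilevel sampler would update the fixed effects, the random effects $u_{jk}$ and $v_k$, and the variance parameters in turn. The proposal is a symmetric random walk, so the acceptance probability reduces to the posterior ratio. The function names, tuning values, and counts (30 successes out of 100) are hypothetical.

```python
import math
import random


def log_posterior(theta, y, n, prior_sd):
    """Binomial log-likelihood on the logit scale plus a N(0, prior_sd^2) log-prior."""
    # Numerically stable log(1 + exp(theta)).
    if theta > 0:
        softplus = theta + math.log1p(math.exp(-theta))
    else:
        softplus = math.log1p(math.exp(theta))
    return y * theta - n * softplus - 0.5 * (theta / prior_sd) ** 2


def metropolis_hastings(y, n, prior_sd=10.0, step=0.5,
                        n_iter=20000, burn_in=2000, seed=1):
    """Random-walk Metropolis-Hastings draws of theta = logit(p)."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, y, n, prior_sd)
    draws, accepted = [], 0
    for t in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_posterior(proposal, y, n, prior_sd)
        # Accept with probability min(1, posterior ratio); 1 - random() avoids log(0).
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
            accepted += 1
        if t >= burn_in:
            draws.append(theta)
    return draws, accepted / n_iter


draws, rate = metropolis_hastings(30, 100)
p_mean = sum(1.0 / (1.0 + math.exp(-d)) for d in draws) / len(draws)
print(f"posterior mean of p: {p_mean:.3f}, acceptance rate: {rate:.2f}")
```

With a nearly flat prior, the posterior mean of p should land close to the observed proportion 0.30. In practice the step size is tuned so that a moderate fraction of proposals is accepted, and convergence is checked before the draws are used.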
Alcohol Example
$$\begin{aligned}
\mathrm{logit}\big(\Pr(Y_{ijk} = 1)\big) = {} & \beta_{0jk} + \beta_1 X_{ijk} + \beta_2\,\mathrm{AdviceAtBaseline}_{ijk} + \beta_3\,\mathrm{Hypertension}_{ijk} + \beta_4\,\mathrm{LiverDisease}_{ijk} \\
& + \beta_5\,\mathrm{PastYearSmoker}_{ijk} + \beta_6\,\mathrm{BaselineAUDIT}_{ijk} + \beta_7\,\mathrm{PanelSize}_{ijk} + \beta_8\,\mathrm{Fellow}_{ijk} \\
& + \beta_9\,\mathrm{NP}_{ijk} + \beta_{10}\,\mathrm{PA}_{ijk} + \beta_{11}\,\mathrm{Resident}_{ijk} + \beta_{12}\,\mathrm{RN}_{ijk} + e_{ijk},
\end{aligned}$$
$$\beta_{0jk} = \beta_0 + v_k + u_{jk}, \qquad v_k \sim N(0, \sigma_v^2), \qquad u_{jk} \sim N(0, \sigma_u^2), \qquad e_{ijk} \sim N(0, \sigma_e^2).$$

Three-Level Logistic Regression, Cont.
• Hypertension, LiverDisease, and PastYearSmoker are dichotomous variables equal to one if the patient reported the condition and zero otherwise.
• BaselineAUDIT is the patient's AUDIT score at baseline, a continuous variable ranging from 0 to 40; PanelSize indicates the range of the provider's panel size.
• Fellow, NP, PA, Resident, and RN are dichotomous variables representing the categorical variable provider type. The reference provider type is staff physician.

Results
• Table 3.5.1 shows the MQL estimates, and Table 3.5.2 shows the PQL estimates, each under the combinations of first-order and second-order approximation and of binomial and extra-binomial assumptions.

Results for Fixed Effects
• The estimates of the fixed effects are quite stable across estimation procedures.
• The estimate of the intervention effect is approximately 1.35, indicating that a patient in the intervention group is more likely to report advice than a patient in the control group.
• This result is not significant with a two-tailed test but is significant with a one-tailed test. The one-tailed p-values range from 0.02 to 0.05, depending on which estimate is considered.
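The difference between the one-tailed and two-tailed conclusions is simple arithmetic on a Wald statistic $z = \hat\beta / \mathrm{SE}(\hat\beta)$: when the estimate lies in the hypothesized direction, the one-sided p-value is half the two-sided one. The sketch below computes the standard normal tail with `math.erfc`; the function name is hypothetical, and the estimate and standard error in the example are illustrative, not the trial's values.

```python
import math


def wald_p_values(estimate, se):
    """Return (z, one-sided p, two-sided p) for H0: coefficient = 0.

    The one-sided p-value tests the alternative 'coefficient > 0'.
    """
    z = estimate / se
    one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(Z >= z)
    two_sided = math.erfc(abs(z) / math.sqrt(2.0))    # P(|Z| >= |z|)
    return z, one_sided, two_sided


# Hypothetical log-odds estimate and standard error.
z, p1, p2 = wald_p_values(0.30, 0.17)
print(f"z={z:.2f}, one-sided p={p1:.3f}, two-sided p={p2:.3f}")
```

Any one-sided p-value between 0.025 and 0.05 corresponds to a two-sided p-value between 0.05 and 0.10, which is significant at the 5% level only under the one-sided test.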
Results for Fixed Effects, Cont.
• A patient self-report of advice at baseline and the patient's baseline AUDIT score are the only other variables significantly associated with a patient self-report of advice on the one-year follow-up survey.
• None of the provider-level variables is associated with a patient self-report of advice on the one-year follow-up survey.

Results for Variances and Covariances
• The estimates of the variance components vary slightly more across estimation procedures in this model.
• The estimate of the site-level variance component has increased from approximately zero to the range 0.01 to 0.04. However, the confidence intervals for these estimates tend to include zero, indicating, as before, that there may be little or no residual clustering at the site level.
• The provider-level variance component estimates range from 0.01 to 0.16, the greatest variation across the different estimation procedures.

Results for Variances and Covariances, Cont.
• The majority of the residual variation is at the patient level. The estimates of the patient-level variance component remain close to one, supporting the assumption of binomial variation at the patient level.
