Download Microsoft PowerPoint - NCRM EPrints Repository

A simulation study of the effect of sample size and level of interpenetration on inference from cross-classified multilevel logistic regression models
Rebecca Vassallo
ESRC Research Methods Festival, July 2012

Introduction
• Influence of the interviewer and area on survey response behaviour
• Reflects unmeasured factors including the interviewer’s and area’s characteristics
• Violation of the assumption of independence of observations
• Standard analytical techniques will underestimate standard errors and can result in incorrect inference (Snijders & Bosker, 1999)
• Multilevel modelling has become a popular method in analysing area and interviewer effects on nonresponse
• Estimation problem relating to the identifiability of area and interviewer variation
• Interpenetrated sample design considered as the gold standard for separating interviewer effects from area effects
• Restrictions in field administration capabilities and survey costs only allow for partial interpenetration
• Multilevel cross-classified specification used in such cases (Von Sanden, 2004)
• No studies available examining the properties of parameter estimates from such models under different conditions

Study Aims
• Examine the implications of interviewer dispersal patterns within different scenarios on the quality of parameter estimates
• Quality measures: percentage relative bias, confidence interval coverage, power of significance tests and correlation of random parameter estimates
• Scenarios vary in sample size, overall response rate, and the area and interviewer variances
• Identify the smallest interviewer pool and the most geographically-restrictive interviewer case allocation required for acceptable levels of bias and power

Methodology: Simulation Model
• Model: $\operatorname{logit}(p_{ijs}) = \beta_0 + u_j + v_s$; $u_j \sim N(0, \sigma_u^2)$; $v_s \sim N(0, \sigma_v^2)$
• Stata Version 12 calling MLwiN Version 2.25 through the ‘runmlwin’ command (Leckie & Charlton, 2011)
• Markov Chain Monte Carlo (MCMC) estimation method
• MCMC method produces less biased estimates compared to first-order marginal quasi-likelihood (MQL) and second-order penalised quasi-likelihood (PQL) (Browne, 1998; Browne & Draper, 2006)
• IRIDIS High Performance Computing Facility cluster at the University of Southampton

Methodology: Data Generating Procedure
• The overall probability $\pi$ of the outcome for an area and interviewer with zero random effects determines the overall intercept, i.e. $\beta_0 = \operatorname{logit}(\pi)$ (fixed for all cases)
• Cluster-specific random effects for each interviewer and area are generated separately from $N(0, \sigma_u^2)$ and $N(0, \sigma_v^2)$
• $u_j$ and $v_s$ are generated for every simulation, but kept constant across scenarios where the only factor that changes is the interviewer case allocation
• The allocation of workload from different areas to specific interviewers is limited to a finite number of possibilities
• $\operatorname{logit}(p_{ijs})$ is computed for each case and converted to a probability $p_{ijs}$
• Values of the dependent variable $Y_{ijs}$, a dichotomous outcome for each case, are generated from a Bernoulli distribution with probability $p_{ijs}$
• For each scenario of the experimental design, 1000 simulated datasets are generated using R Version 2.11.1

Methodology: Simulation Factors
• Simulated scenarios vary in the following factors:
  - the overall sample size ($N$)
  - the number of interviewers and areas ($N_{ints}$; $N_{areas}$)
  - the interviewer-area classifications [which vary in terms of the number of areas each interviewer works in (maximum 6 areas) and the overlap in the interviewers working in neighbouring areas]
  - the ICC (variances $\sigma_u^2$ and $\sigma_v^2$)
  - the overall probability of the outcome variable ($\pi$)
• Medium scenario design (similar to values observed in a real dataset, a realistic starting point): 120 areas (48 cases/area) allocated to 240 interviewers (24 cases/int), totalling 5760 cases, $\sigma_v^2 = 0.3$, $\sigma_u^2 = 0.3$, $\pi = 0.8$

Methodology: Quality Assessment Measures
• Correlation between area and interviewer parameter estimates; high negative values indicate identifiability problems:
  $$\rho = \frac{1}{1000} \sum_{i=1}^{1000} \frac{\operatorname{cov}_i(\hat\sigma_u^2, \hat\sigma_v^2)}{\operatorname{sd}_i(\hat\sigma_u^2)\,\operatorname{sd}_i(\hat\sigma_v^2)}$$
• Percentage relative bias:
  $$\frac{1}{1000} \sum_{i=1}^{1000} \frac{\hat\theta_i - \theta}{\theta} \times 100$$
• Confidence interval coverage: the 95% Wald confidence interval and the 95% MCMC quantiles are compared to the nominal 95%
• Power of Wald test: proportion of simulations in which the null hypothesis is correctly rejected

Results: Power of Tests
• For the medium scenario, power ≈ 1 for all interviewer case allocations
• For smaller $N$, more sparse allocations are required to get power > 0.85
• Lower $\sigma_v^2$ (0.2) results in lower power
• When $N_{ints} = N_{areas}$, more interviewer dispersion is required for acceptable levels of power
• Higher $\pi$ (0.9) requires 2 areas/int for power > 0.9
• Reduced interviewer overlapping for a constant number of areas/int does not improve power

Results: Correlation between $\sigma_v^2$ and $\sigma_u^2$ Estimates
• For all scenarios, high negative $\rho$ (>-0.4) is observed when interviewers work in 1 area only
• No substantial change in $\rho$ with varying total sample sizes
• Very high negative $\rho$ (up to -0.9) for $N_{ints} = N_{areas}$ scenarios; $\rho$ only reduced to <-0.1 when the interviewer is working in 4+ areas (compared to 2+ areas/int for $N_{ints} = 2 N_{areas}$ scenarios)
• Higher $\rho$ with increasing $\pi$ up to 2 areas/int allocations; thereafter no change in $\rho$ by $\pi$
• Lower $\rho$ with increasing $\sigma_v^2$ up to 3 areas/int allocations
• Lower $\rho$ with less interviewer overlapping for the 2 areas/int cases

Results: Percentage Relative Bias
• In most scenarios with $N = 5760$, the percentage relative bias is around 1-2% once interviewers are allocated to 2+ areas
• Further interviewer dispersion (3+ areas) and less interviewer overlapping do not yield systematic drops in bias
• When interviewers are working in 2+ areas, the bias in the $\sigma_v^2$ estimate is generally greater than the bias in the $\sigma_u^2$ estimate [when $N_{ints} = 2 N_{areas}$]
• Greater bias observed for smaller sample sizes, with the scenario including 1440 cases with $N_{ints} = N_{areas}$ obtaining bias values between 5-13% for all allocations

Results: Confidence Interval Coverage
• Close to the 95% nominal rate in all scenarios
• Some cases of under-coverage or over-coverage for scenarios where the interviewer works in just one area:
  - 87% coverage ($N = 5760$, $N_{ints} = 2 N_{areas}$, $\sigma_v^2 = 0.2$, $\sigma_u^2 = 0.3$, $\pi = 0.8$, one area/int) for the $\sigma_v^2$ CI
  - 88% coverage ($N = 2880$ or $N = 1440$, $N_{ints} = 2 N_{areas}$, $\sigma_v^2 = 0.3$, $\sigma_u^2 = 0.3$, $\pi = 0.8$, one area/int) for the $\sigma_v^2$ CI
  - 100% coverage ($N = 5760$, 2880 or 1440, $N_{ints} = N_{areas}$, $\sigma_v^2 = 0.3$, $\sigma_u^2 = 0.3$, $\pi = 0.8$, one area/int) for the $\sigma_v^2$ and $\sigma_u^2$ CIs
• No clear evidence that the MCMC quantiles perform better than the Wald asymptotic normal CIs

Conclusion
• Interpenetration not required to distinguish between area and interviewer variation
• Good quality estimates obtained for large sample sizes (≈6000 cases) if interviewers work in at least two areas
• Better estimates obtained when the number of interviewers is greater than the number of areas
• Higher overall probabilities and smaller variances (smaller ICC) require more interviewer dispersion for some survey conditions
• The extent of interviewer overlapping shown not to be important
• Results and their implications can be extended to other applications

Acknowledgements
• University of Southampton, School of Social Sciences Teaching Studentship
• UK Economic and Social Research Council (ESRC), PhD Studentship (ES/1026258/1)
• Gabriele B. Durrant & Peter W. F. Smith, PhD Supervisors

References
• Browne, W. J. (1998). Applying MCMC Methods to Multi-level Models. PhD thesis, University of Bath.
• Browne, W. & Draper, D. (2006). A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Bayesian Analysis, 1, 473-514.
• Leckie, G. & Charlton, C. (2011). runmlwin: Stata module for fitting multilevel models in the MLwiN software package. Centre for Multilevel Modelling, University of Bristol.
• Snijders, T. A. B. & Bosker, R. J. (1999). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modelling. London: Sage.
• Von Sanden, N. D. (2004). Interviewer effects in household surveys: estimation and design. Unpublished PhD thesis, School of Mathematics and Applied Statistics, University of Wollongong.
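
Illustrative sketch: data generating procedure
The data generating procedure described in the methodology slides can be sketched in code. The study generated its datasets in R Version 2.11.1; the Python below is only a stand-in that uses the standard library. The function names, the rotating allocation rule in `allocate_cases`, and the reading of u_j as the interviewer effect and v_s as the area effect are assumptions made for this sketch, not details taken from the slides.

```python
import math
import random


def allocate_cases(n_areas, ints_per_area, cases_per_int, areas_per_int):
    """Return one (interviewer, area) pair per case.

    Hypothetical allocation rule (not from the slides): each interviewer
    has a "home" area and splits the workload equally over the next
    `areas_per_int` areas, wrapping around at the last area.
    """
    if cases_per_int % areas_per_int != 0:
        raise ValueError("cases_per_int must be divisible by areas_per_int")
    if areas_per_int > n_areas:
        raise ValueError("areas_per_int cannot exceed n_areas")
    share = cases_per_int // areas_per_int
    cases = []
    for home in range(n_areas):
        for r in range(ints_per_area):
            interviewer = home * ints_per_area + r
            for step in range(areas_per_int):
                area = (home + step) % n_areas
                cases.extend([(interviewer, area)] * share)
    return cases


def draw_random_effects(n_ints, n_areas, var_u, var_v, rng):
    """Draw u_j ~ N(0, var_u) and v_s ~ N(0, var_v).

    Assumption: u indexes interviewers and v indexes areas.
    """
    u = [rng.gauss(0.0, math.sqrt(var_u)) for _ in range(n_ints)]
    v = [rng.gauss(0.0, math.sqrt(var_v)) for _ in range(n_areas)]
    return u, v


def generate_outcomes(cases, u, v, pi, rng):
    """Generate a Bernoulli outcome for every case."""
    beta0 = math.log(pi / (1.0 - pi))  # intercept: logit of overall probability
    data = []
    for interviewer, area in cases:
        eta = beta0 + u[interviewer] + v[area]   # logit(p_ijs)
        p = 1.0 / (1.0 + math.exp(-eta))         # inverse logit
        y = 1 if rng.random() < p else 0         # Bernoulli(p) draw
        data.append((y, interviewer, area))
    return data


# Medium scenario from the slides: 120 areas, 240 interviewers, 24 cases
# per interviewer, both variances 0.3, overall probability 0.8.
rng = random.Random(2012)  # arbitrary seed
u, v = draw_random_effects(240, 120, 0.3, 0.3, rng)
for areas_per_int in (1, 2):
    # The same random effects are reused; only the allocation changes.
    cases = allocate_cases(120, 2, 24, areas_per_int)
    data = generate_outcomes(cases, u, v, 0.8, rng)
    rate = sum(y for y, _, _ in data) / len(data)
    print(areas_per_int, len(data), round(rate, 3))
```

Drawing the random effects once and reusing them across allocations mirrors the slide's statement that u_j and v_s are held constant across scenarios in which only the interviewer case allocation changes.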
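
Illustrative sketch: quality assessment measures
The percentage relative bias, Wald confidence interval coverage and Wald test power listed under the quality assessment measures can be computed from per-simulation estimates and standard errors roughly as follows. This is a minimal sketch, not the study's code: the function names are hypothetical, the two-sided z test in `wald_power` is an assumption (the slides do not say how the Wald test was set up), and the input numbers are made up purely to exercise the functions.

```python
Z_95 = 1.959964  # 97.5% quantile of the standard normal distribution


def percent_relative_bias(estimates, true_value):
    """Mean of (estimate - true) / true over simulations, times 100."""
    deviations = [(est - true_value) / true_value for est in estimates]
    return 100.0 * sum(deviations) / len(deviations)


def wald_coverage(estimates, std_errors, true_value, z=Z_95):
    """Share of simulations whose Wald interval contains the true value."""
    covered = 0
    for est, se in zip(estimates, std_errors):
        if est - z * se <= true_value <= est + z * se:
            covered += 1
    return covered / len(estimates)


def wald_power(estimates, std_errors, z=Z_95):
    """Share of simulations rejecting H0: parameter = 0.

    Assumption: a two-sided z test on estimate / standard error.
    """
    rejected = sum(1 for est, se in zip(estimates, std_errors)
                   if abs(est / se) > z)
    return rejected / len(estimates)


# Made-up estimates of a variance parameter whose true value is 0.3.
toy_estimates = [0.33, 0.27, 0.30, 0.36]
toy_std_errors = [0.02, 0.01, 0.02, 0.02]
print(percent_relative_bias(toy_estimates, 0.3))
print(wald_coverage(toy_estimates, toy_std_errors, 0.3))
print(wald_power(toy_estimates, toy_std_errors))
```

In the study each measure is taken over 1000 simulated datasets per scenario; the four toy values here only show the mechanics.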
