Ethnic dealignment? Combining individual and aggregate data to improve estimates of ethnic voting in Britain in 2001 and 2005

Stephen Fisher, Jane Key, Nicky Best, Sylvia Richardson
Department of Sociology, University of Oxford, and Department of Epidemiology and Public Health, Imperial College, London
http://www.bias-project.org.uk

Outline
- Introduction and substantive issues
- Methods and Results
  - Standard multilevel model
  - Ecological and Hierarchical Related Regression
  - Results
- Discussion and further work

Introduction and substantive issues

NCRM BIAS project: Overall goals
- To develop a set of statistical frameworks for combining data from multiple sources
- To improve our capacity to handle limitations inherent in observational data
- Key statistical tools: Bayesian hierarchical models and ideas from graphical models form the basic building blocks for these developments

A decline in ethnic minority support for Labour?
- Ethnic minority vote for Labour consistently around 80% from 1974 to 2001
- Between 2001 and 2005 there were:
  - Islamic terrorist attacks
  - US and UK led invasions of Afghanistan and Iraq
  - Heightened security and suspicion of non-whites
  - Unlawful detention of foreign terror suspects
  - Convictions of British soldiers for Iraqi prisoner abuse
- These and other events are thought to have undermined support for Labour among ethnic minorities.
- On the other hand, the harsh stance on immigration in the Conservative 2005 election campaign may have alienated ethnic minority voters.
- This paper seeks to test whether the gap in Labour vote between whites and non-whites narrowed between 2001 and 2005.

Data
- Problem: Not enough high quality survey data on ethnic minorities
- So combine individual and aggregate data using a new class of multilevel models developed by BIAS
- Individual-level: British Election Study post-election surveys.
  97 registered ethnic minorities in 2001, and 137 in 2005
- Constituency-level:
  - 2001 and 2005 election results
  - 2001 Census data on % who are non-white
- Population: Focus on Labour voting as a proportion of the registered population, since the census might be a reasonable proxy for the registered population, but not for the voting population.

Methods and Results

Data sources: individual data

| Source of data | Resolution | Coverage | Variables | N individuals | N constituencies |
|---|---|---|---|---|---|
| British Election Study, 2001, 2005 | Individual | Sample of electorate | Outcome*, Predictor$ | 2,669 (2001); 3,431 (2005) | 128 (2001); 128 (2005) |
| General Election results, 2001, 2005 | Area (Parliamentary Constituency) | Electorate | Outcome† | 43.2 million (2001); 43.1 million (2005) | 641 (2001); 628 (2005) |
| 2001 Census | Area (Parliamentary Constituency) | Population | Predictor‡ | 57.1 million | 641 |

* Outcome = Vote choice (Labour / other)
† Outcome = Proportion voting Labour
$ Predictor = Ethnicity (white / non-white)
‡ Predictor = Proportion non-white

Multilevel model for individual BES data

[Diagram: graphical model with parameters $\alpha$, $\beta$, $\sigma^2$, area effects $\theta_i$, covariate $x_{ij}$ and outcome $y_{ij}$, with plates for person j and area i]

- $y_{ij}$ = voted Labour (1) / other (0)
- $x_{ij}$ = non-white (1) / white (0)
- $y_{ij} \sim \text{Bernoulli}(p_{ij})$, person j, area i
- $\text{logit}\, p_{ij} = \alpha + \beta x_{ij} + \theta_i$
- $\theta_i \sim \text{Normal}(0, \sigma^2)$
- $\beta$ = individual-level effect of ethnicity on vote choice
- $\theta_i$ = "unexplained" area effects

Results: Share of the vote for whites and non-whites based on BES survey data

A.
As a proportion of voters

| Year | Group | N | % Labour | Std. error | 95% CI |
|---|---|---|---|---|---|
| 2001 | White | 1876 | 43 | 1.4 | (40, 46) |
| 2001 | Non-White | 67 | 72 | 6.1 | (60, 84) |
| 2005 | White | 2546 | 36 | 1.3 | (33, 38) |
| 2005 | Non-White | 97 | 51 | 6.3 | (38, 63) |

B. As a proportion of electorate

| Year | Group | N | % Labour | Std. error | 95% CI |
|---|---|---|---|---|---|
| 2001 | White | 2572 | 33 | 1.1 | (31, 35) |
| 2001 | Non-White | 97 | 55 | 5.7 | (43, 66) |
| 2005 | White | 3294 | 27 | 1.0 | (25, 29) |
| 2005 | Non-White | 137 | 37 | 5.0 | (27, 47) |

Results from regression analysis of BES electorate data

[Figure: estimated effect of ethnicity on Labour voting (log odds ratio, axis from -1.0 to 1.0) for 2001, 2005 and the change between them, from models with no random effect and with a random effect]

Comments
- Between 2001 and 2005, the non-white Labour vote drops from
  - 72% to 51% (voters)
  - 55% to 37% (electorate)
- Small sample size → large SE and CI for the proportion of non-whites voting Labour
- Gap in Labour vote between whites and non-whites narrows from
  - 29 points in 2001 to 15 points in 2005 (voters)
  - 22 points in 2001 to 10 points in 2005 (electorate)
- But the change is not statistically significant (multilevel analysis)
- Is this just because the sample size is too small? What can we learn from aggregate data?
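As a rough cross-check on the comments above, the change in the gap can be compared with its standard error using only the figures in tables A and B. This is a back-of-envelope sketch, not the multilevel analysis reported in the slides: it assumes the four estimates in each table are independent and that a normal approximation is adequate. The function name `gap_change` is made up for illustration.

```python
import math

# Percent voting Labour and standard errors, copied from tables A and B.
# Keys: (year, group) -> (percent Labour, standard error in points).
voters = {
    (2001, "white"): (43, 1.4), (2001, "non-white"): (72, 6.1),
    (2005, "white"): (36, 1.3), (2005, "non-white"): (51, 6.3),
}
electorate = {
    (2001, "white"): (33, 1.1), (2001, "non-white"): (55, 5.7),
    (2005, "white"): (27, 1.0), (2005, "non-white"): (37, 5.0),
}

def gap_change(table):
    """Return (gap 2001, gap 2005, change, SE of change, z).

    Assumption: all four estimates are independent, so variances add.
    """
    def gap(year):
        p_nw, se_nw = table[(year, "non-white")]
        p_w, se_w = table[(year, "white")]
        # SE of a difference of independent estimates
        return p_nw - p_w, math.hypot(se_nw, se_w)

    g01, se01 = gap(2001)
    g05, se05 = gap(2005)
    change = g05 - g01
    se_change = math.hypot(se01, se05)
    return g01, g05, change, se_change, change / se_change

for name, table in [("voters", voters), ("electorate", electorate)]:
    g01, g05, change, se, z = gap_change(table)
    print(f"{name}: gap {g01} -> {g05}, change {change} "
          f"(SE {se:.1f}), z = {z:.2f}")
```

Under these assumptions |z| is roughly 1.6 for both voters and the electorate, below the conventional 1.96 threshold. This agrees with the slides' point that the standard errors for non-whites are too large for the narrowing to reach significance.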
Standard ecological regression model

[Diagram: graphical model with parameters $a$, $b$, $\tau^2$, area effects $c_i$, and area-level data $Y_i$, $X_i$, $N_i$, with a plate for area i]

- $Y_i$ = number voting Labour
- $N_i$ = registered electorate
- $X_i$ = proportion non-white
- $Y_i \sim \text{Binomial}(q_i, N_i)$, area i
- $\text{logit}\, q_i = a + b X_i + c_i$
- $c_i \sim \text{Normal}(0, \tau^2)$
- $b$ = effect of area ethnicity on probability of voting Labour
- $b \neq \beta$ (the individual-level effect) → ecological bias

Ecological bias

Bias in ecological studies can be caused by:
- Confounding
  - Confounders can be area-level (between-area) or individual-level (within-area).
  - → include control variables and/or random effects in the model
- Non-linear covariate-outcome relationship, combined with within-area variability of the covariate
  - No bias if the covariate is constant within an area (contextual effect)
  - Bias increases as within-area variability increases
  - ...unless models are refined to account for this hidden variability

Alleviating ecological bias
- Alleviate bias associated with within-area covariate variability
- Obtain information on the within-area distribution $f_i(x)$ of covariates, e.g.
  from individual-level data
- Use this to form a well-specified model for ecological data by integrating (averaging) the underlying individual-level model:
  $Y_i \sim \text{Binomial}(q_i, N_i)$; $q_i = \int p_{ij}(x) f_i(x)\, dx$
  - $q_i$ is the average group-level probability (of voting Labour)
  - $p_{ij}(x)$ is the individual-level probability given covariates x
  - $f_i(x)$ is the distribution of covariate x within area i

Alleviating ecological bias
- Consider a single binary covariate x, e.g. white/non-white
  - $f_i(x)$ → proportion of individuals with x = 1 in each area
- Individual-level model
  - $p_{ij}$ = probability of voting Labour
  - $\log p_{ij} = \alpha + \beta x_{ij}$ (log link assumed for simplicity)
  - → $p_{ij} = e^{\alpha}$ if person j is white ($x_{ij} = 0$)
  - → $p_{ij} = e^{\alpha + \beta}$ if person j is non-white ($x_{ij} = 1$)
- Integrated group-level model
  - $X_i$ = proportion non-white in area i (mean of $x_{ij}$)
  - $q_i$ = average probability (proportion) voting Labour in area i
  - $q_i = \sum_j p_{ij} / N_i = e^{\alpha}(1 - X_i) + e^{\alpha + \beta} X_i$

Standard ecological regression model
- $Y_i \sim \text{Binomial}(q_i, N_i)$, area i
- $\text{logit}\, q_i = a + b X_i + c_i$
- $c_i \sim \text{Normal}(0, \tau^2)$

Integrated ecological regression model
- $Y_i \sim \text{Binomial}(q_i, N_i)$, area i
- $q_i = \int p_{ij}(x_{ij}, \alpha, \beta, \theta_i) f_i(x)\, dx$
- $\theta_i \sim \text{Normal}(0, \sigma^2)$
- $\beta$ can be interpreted as the individual-level effect of ethnicity on probability of voting Labour

Combining individual and aggregate data

[Diagram: the multilevel model for individual data ($x_{ij}$, $y_{ij}$; person j within area i) and the integrated ecological model ($Y_i$, $X_i$, $N_i$; area i) joined through the shared parameters $\alpha$, $\beta$, $\theta_i$ and $\sigma^2$]

- Hierarchical Related Regression (HRR) model
- Joint likelihood for $y_{ij}$ and $Y_i$ depending on shared parameters $\alpha$, $\beta$, $\theta_i$ and $\sigma^2$
- Estimation carried out using R software (maximum likelihood) or WinBUGS (Bayesian)
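The integrated ecological model and the HRR joint likelihood can be sketched in a few lines. This is a minimal illustration, not the authors' R or WinBUGS code. It assumes a single binary covariate, so the integral reduces to a two-term weighted sum, and uses the logit link of the multilevel model. It evaluates the joint log-density for given area effects rather than integrating them out, and treats the survey respondents as independent of the area counts. All function names and the numbers in the usage lines are hypothetical.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def integrated_q(alpha, beta, theta_i, X_i):
    """Area-level probability: individual-level probabilities averaged
    over the within-area distribution of a binary covariate."""
    p_white = expit(alpha + theta_i)
    p_nonwhite = expit(alpha + beta + theta_i)
    return (1.0 - X_i) * p_white + X_i * p_nonwhite

def naive_q(a, b, c_i, X_i):
    """Standard ecological model: area proportion plugged into the link."""
    return expit(a + b * X_i + c_i)

def hrr_log_joint(alpha, beta, sigma, theta, individuals, areas):
    """Joint log-density of individual and aggregate data (HRR-style).

    individuals: list of (area index i, x_ij, y_ij)
    areas:       list of (Y_i, N_i, X_i), one entry per area
    theta:       list of area effects, one per area
    The binomial coefficient is omitted (constant in the parameters).
    """
    total = 0.0
    # Individual-level Bernoulli contributions
    for i, x, y in individuals:
        p = expit(alpha + beta * x + theta[i])
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    # Aggregate binomial contributions using the integrated probability
    for i, (Y, N, X) in enumerate(areas):
        q = integrated_q(alpha, beta, theta[i], X)
        total += Y * math.log(q) + (N - Y) * math.log(1.0 - q)
    # Normal(0, sigma^2) density for the shared area effects
    for t in theta:
        total += -0.5 * math.log(2 * math.pi * sigma**2) - t**2 / (2 * sigma**2)
    return total

# Hypothetical toy inputs: two areas, three survey respondents.
areas = [(300, 1000, 0.05), (450, 1000, 0.40)]
individuals = [(0, 0, 0), (0, 1, 1), (1, 1, 1)]
value = hrr_log_joint(-0.7, 0.9, 0.5, [0.0, 0.1], individuals, areas)
```

With no within-area variability (`X_i` equal to 0 or 1) and matching parameters, the two area-level probabilities coincide. For intermediate `X_i` they differ, which is the source of ecological bias that the integrated model is meant to alleviate.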
Comparison of results from individual and HRR analysis

[Figure: estimated effect of ethnicity on Labour voting (log odds ratio, axis from -1.0 to 1.0) for 2001, 2005 and the change between them, comparing the individual-data analysis with the combined (HRR) analysis, each with no random effect and with a random effect]

Discussion and further work

Conclusions
- BES survey estimates a halving of the gap in Labour voting between whites and non-whites, from 29 to 15 points
- Due to small N for ethnic minorities, this is not statistically significant
- Combined aggregate and individual level HRR analysis suggests a significant decline in the ethnic voting gap
- But if constituency-level random effects are allowed for, the change is again statistically insignificant
  - → considerable heterogeneity between constituencies
  - → suggests other important individual or area predictors
- Lack of statistical significance
  - may reflect data problems (see below)
  - may be 'real': BES may over-estimate the change in Labour share of the ethnic vote (a quota sample reported that 66% of ethnic minorities questioned voted Labour in 2005, compared with 51% in BES)

Substantive Data Limitations
- Norm is to consider share of the vote, so it is unfortunate that this can't be done using the HRR model
  - But Labour voting as a share of the electorate is still a valid issue, and substantive conclusions are likely to be similar.
- Ethnic minorities aren't all the same
  - Previous research suggests Blacks are more Labour than South Asians
  - Unfortunately there is not enough data or variance (at both levels) to explore differences between minority groups.
- Other sources of ecological bias are likely due to the absence of controls for other relevant variables, e.g.
  socio-economic factors
  - HRR models can be extended to include additional variables
  - Requires constituency-level data on the joint distribution of ethnicity and other relevant variables

Strengths of HRR approach...
- Aims to provide individual-level inference using aggregate data by:
  - Fitting an integrated individual-level model to alleviate one source of ecological bias
  - Including samples of individual data to help identify effects
- Uses data from all constituencies, not just those in the BES survey
- Improves precision of parameter estimates

...and limitations of HRR approach
- Integrated individual-level model relies on large contrasts in the predictor proportion across areas
  - Limited variation in % non-white across constituencies (median 2.7%, 95th percentile 33%; only 9 constituencies in 2005 had a non-white majority)
  - Our estimates may not be completely free from ecological bias (Jackson et al, 2006)
- Estimation of the ethnicity effect is strongly confounded with area random effects

Further Work
- Further analysis will consider fuller model specifications with ethnic contextual effects and individual and aggregate level control variables
- Also intend to investigate inclusion of other sources of individual-level data, such as opinion polls, in HRR models

References
- Fisher S, Key J, Best N, Richardson S. Ethnic dealignment? Combining individual and aggregate data to improve estimates of ethnic voting in Britain in 2001 and 2005. Paper in preparation.
- Jackson C, Best N and Richardson S. (2008) Studying place effects on health by synthesising individual and area-level outcomes. Social Science and Medicine, 67:1995-2006
- Jackson C, Best N and Richardson S. (2008) Hierarchical related regression for combining aggregate and individual data in studies of socio-economic disease risk factors. J Royal Statistical Society Series A: Statistics in Society, 171(1):159-178
- Jackson C, Best N and Richardson S.
  (2006) Improving ecological inference using individual-level data. Statistics in Medicine, 25(12):2136-2159

Papers available from www.bias-project.org.uk

Validated turnout based on BES data

| Year | Group | N | % Voting | Std. error | 95% CI |
|---|---|---|---|---|---|
| 2001 | White | 2572 | 72 | 1.1 | (70, 74) |
| 2001 | Non-White | 97 | 66 | 5.7 | (54, 77) |
| 2005 | White | 3294 | 75 | 1.0 | (73, 77) |
| 2005 | Non-White | 137 | 73 | 4.5 | (64, 81) |

Simulation Study

[Figure: estimates compared with the true effect (true log OR) from individual data, from area data (ecological model), and from area data plus a sample of 10 individuals (ecological + individual model). Scenarios: range of % exposed across areas of 0-25%, 0-50% and 0-100% (100 areas each), and 0-25% (25 areas). Horizontal axis: estimated effect of exposure on outcome (log OR of IHD for smokers), from -0.5 to 1.5]