DIF detection using OLR - University of California, Davis
DIF detection using (Ordinal) Logistic Regression
Laura Gibbons, PhD
Paul K. Crane, MD MPH
Internal Medicine
University of Washington

Outline
• Brief statistical background
• DIFdetect package
• What do we do when we find DIF?
• New, simpler, faster solutions!
• Discussion

Statistical background
• Recall definition of DIF: when a demographic characteristic interferes with the relationship expected between ability level and responses to an item
• A conditional definition; have to control for ability level, or else we can’t differentiate between DIF and differential test impact

The 2 Parameter Logistic model
• Logit P(Y=1|a,b,θ) = Da(θ-b)
  – Produces an item characteristic curve
  – Models probability that a person correctly responds to an item given the item parameters (a,b) and their person level θ
  – D is a constant
  – a, b notation reversed from biomedical conventions

The 2 PL model
• Logit P(Y=1|a,b,θ) = Da(θ-b)
  – b is the item difficulty
    • When θ=b, 50% probability of getting the item correct
  – a is item discrimination
    • a determines slope around the point where θ=b

Modest Uniform DIF
[Figure: Item characteristic curves for "Close your eyes" in Spanish and English speakers; vertical axis 0 to 1, horizontal axis -3 to 3]

Non-Uniform DIF
[Figure: Item category characteristic curves for the item “ability to walk 1 block” separately in African Americans (yellow lines) and whites; vertical axis: probability of endorsing (0 to 1), horizontal axis: physical functioning (-3.0 to 3.0)]

Uniform and Non-uniform DIF
[Figure: Item characteristic curves for "Repeating Phrase" in English and Spanish speakers; vertical axis 0 to 1, horizontal axis -3 to 3]

Logistic regression applied to DIF detection
• Swaminathan and Rogers (1990)
• Tested two models:
  – P(Y=1|X, group) = f(β1X+β2*group+β3*X*group)
  – P(Y=1|X) = f(β1X)
• Compared the difference in -2 log likelihoods of these two models to a chi squared distribution with 2 df
• Uniform and non-uniform DIF tested at the same time

Camilli and Shepard (1994)
• Recommended a two step procedure, to first test for non-uniform DIF and then for uniform DIF
  – P(Y=1|X, group) = f(β1X+β2*group+β3*X*group)
  – P(Y=1|X, group) = f(β1X+β2*group)
  – P(Y=1|X) = f(β1X)
• -2 log likelihoods of each pair of models compared to determine non-uniform DIF and uniform DIF in two separate steps

Millsap and Everson (1994)
• Dismissive of “observed score” techniques such as logistic regression
• X contains several items that have DIF, so adjusting for X is theoretically problematic
• Advocated latent approaches such as IRT for DIF detection

Zumbo (1999)
• Extended Swaminathan and Rogers framework to ordinal logistic regression case to handle polytomous items
• Did not address latent trait; also used a single step rather than two steps

Crane, van Belle, Larson (2004)
• Logistic regression model is a reparameterization of the IRT model, as long as IRT-derived θ estimates are used as ability scores
• Raised the issue of multiple hypothesis testing of non-uniform DIF

Crane et al. (2004) – 2
• Biggest change in terms of specific criteria for uniform DIF
  – Recognized that non-uniform and uniform DIF were analogous to effect modification and confounding
  – Employed epidemiological thinking about how to detect confounding relationships from the data; size of effect.

Crane et al. (2004) – 3
• Same models used (though now θ not X)
  – P(Y=1|θ, group) = f(β1θ+β2*group)
  – P(Y=1|θ) = f(β1'θ)
• Determine the impact of including the group term on the magnitude of the relationship between θ and item responses
• Determine size of |(β1-β1')/β1|. If this is large, uniform DIF (confounding) is present

Work still pending
• “Optimal” criteria for uniform and non-uniform DIF are unknown
  – Adjust α for multiple hypotheses?
  – Effect size for non-uniform DIF? In huge data sets, likely to have a significant interaction term.
  – What proportional change in β1 is meaningful UDIF?

Also under investigation
• What is the role of model fit statistics? For example, if NU DIF is present, the model with group and ability only should not fit.
• How important is the proportional odds/Graded response assumption? Should stereotype or other models be used in some instances?

DIFdetect package
• Can download from the web
• www.alz.washington.edu/DIFDETECT/welcome.html
• STATA-based user-friendly package

For those who tire of clicking
• Difd varlist, ABility(str) GRoups(str) [with lots of optional specifications]

Outline revisited
• Brief statistical background
• DIFdetect package
• What do we do when we find DIF?
• New, simpler, faster solutions!
• Discussion

What to do when we find DIF?
• In educational settings, items with DIF are often discarded
• Unattractive option for us
  – Tests are too short as it is; lose variation
  – Lose precision
  – DIF doesn’t mean that the item doesn’t measure the underlying construct at all, just that it does so differently in different groups

What do we do – 2
• Need a technique to incorporate items found to have DIF differently than DIF-free items
• Precedent for this approach in Reise, Widaman, and Pugh (1993)
  – Constrain parameters for DIF-free items to be identical across groups
  – Estimate parameters for items found with DIF separately in appropriate groups

Compensatory DIF
• Compensatory DIF occurs when DIF in some items leads to erroneous findings in other items
  – Both false-positive and false-negative DIF findings

Adjust ability for DIF
1. Rearrange the data to estimate a DIF-adjusted theta score in PARSCALE
2. Use that new theta estimate to evaluate for compensatory DIF
• Repeat steps 1 and 2 until the same items are identified each time = no compensatory DIF

Rearrange data for PARSCALE
                      Population A   Population B
DIF free item 1       Present        Present
DIF free item 2       Present        Present
DIF free item 3       Present        Present
…
DIF free item (n-1)   …
DIF Item n            Present        Missing
DIF Item n            Missing        Present
DIF Item n+1          Present        Missing
DIF Item n+1          Missing        Present

Modified data set
0001 12XX2
0002 12XX4
0003 01XX3
…
0132 1X2X2
0133 0X1X3
0134 1X2X4
…
0932 0XX22
0933 1XX23
0934 0XX14
…

New tools!
1. Difforpar itemlist, ID(id) RUnname(test0) ABility(ability0) GRoups(group) [with lots of optional specifications]
   Look at log for lack of convergence, dropped variables, nonsense output, and other warnings.

New tools, continued
2. run PARSCALE with code_test0.psl
3. run thetain: thetain origdata origid test0 [merges thetatest0 and sethetatest0 into original data set]

The process continues
• Repeat steps 1-3 with the new thetas until the same items come up with DIF
• For short lists, you can read the log file
• For long lists, examine vars_testN.txt
• When finished, you can check Difd.dta for model fit and assumptions

Adjusting for additional groups
• mergevirtual origdata originalid [merges itemdata (containing final virtual items) into original data set]
• run DIFforPar with the next group, with the new list of some original and virtual items (can copy from vars_testN.txt) and do it all again!

Other tools for Stata
• PrePar: Writes code and data for Parscale
  – Syntax: prepar namelist, ID(str) ru()
• DIFforSRZ: Do file for DIFdetect using SRZ 1-step criteria
  – Syntax: run difforsrz abil ru
  – Set variable list, group, criteria, in the do file.

Coming soon
• DIFforPar extended for grouped variables with more than 2 categories; continuous in Stata, grouped in Parscale.
• Samemetric.ado (for now use: prepardata itemlist, ID(str) RUnname(str)).
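The data rearrangement from the "Rearrange data for PARSCALE" and "Modified data set" slides can be sketched in a few lines of Python. This is only an illustration, not the code behind Difforpar or PrePar: the function names, the toy responses, the three group labels, and the fixed-width record layout (ID, a space, one character per item, "X" for missing) are all assumptions made for the example. Each item flagged with DIF is split into one virtual item per group, and a person's response appears only in the column for their own group.

```python
def split_dif_item(responses, groups, group_order, missing="X"):
    """Split one item into one virtual item (column) per group.

    A person's response is kept only in the column for their own group;
    every other column gets the missing code.
    """
    return [
        [r if person_group == g else missing
         for r, person_group in zip(responses, groups)]
        for g in group_order
    ]


def make_records(ids, groups, items, dif_items, group_order, missing="X"):
    """Build one text record per person: ID, a space, one character per column.

    `items` maps item name -> list of single-character responses. Items named
    in `dif_items` are replaced by their group-specific virtual items.
    """
    columns = []
    for name, responses in items.items():
        if name in dif_items:
            columns.extend(
                split_dif_item(responses, groups, group_order, missing))
        else:
            columns.append(list(responses))
    return [
        f"{person_id} " + "".join(col[i] for col in columns)
        for i, person_id in enumerate(ids)
    ]


# Hypothetical toy data: three people, one from each of three groups.
ids = ["0001", "0132", "0932"]
groups = ["A", "B", "C"]
items = {
    "free1": ["1", "1", "0"],  # DIF-free item, shared by all groups
    "dif1": ["2", "2", "2"],   # item flagged with DIF, split by group
    "free2": ["2", "2", "2"],  # DIF-free item, shared by all groups
}

records = make_records(ids, groups, items,
                       dif_items={"dif1"}, group_order=["A", "B", "C"])
for line in records:
    print(line)
```

With this toy input the three records come out in the same shape as the first, fourth, and seventh rows of the "Modified data set" slide: the DIF-free items line up in shared columns, while the DIF item occupies a different column in each group.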
Have we adjusted for DIF/controlled for confounding?
• Can only adjust for measured covariates
• Confounders such as education level may mean different things for different groups
• Unmeasured confounders
• May lack power or data may be too sparse

Adjusted cognitive ability scores
• So far our adjusted scores correlate highly with non-adjusted scores.
• May contain additional information.
• Language DIF

Questions and comments
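As a closing illustration of the criteria from the statistical background, the sketch below computes the three quantities discussed there, starting from models that have already been fitted: the single-step 2 df likelihood ratio test, the separate 1 df test of the interaction term for non-uniform DIF, and the proportional change |(β1-β1')/β1| for uniform DIF. The log likelihoods and coefficients are made-up numbers, the function names are hypothetical, and no cutoff is applied because the slides state that the optimal criteria are still unknown.

```python
import math


def lr_test(loglik_reduced, loglik_full, df):
    """Likelihood ratio test of two nested models.

    Returns the difference in -2 log likelihoods and its chi-squared
    p-value. Only df = 1 and df = 2 are handled, using closed forms for
    the chi-squared upper tail.
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        p_value = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only handles df = 1 or df = 2")
    return stat, p_value


def proportional_change(beta1_with_group, beta1_without_group):
    """|(β1 - β1') / β1|, where β1 comes from the model that includes the
    group term and β1' from the model with ability only."""
    return abs((beta1_with_group - beta1_without_group) / beta1_with_group)


# Hypothetical log likelihoods for one item (not real results).
ll_ability_only = -520.0   # f(β1'θ)
ll_with_group = -516.0     # f(β1θ + β2*group)
ll_interaction = -515.5    # f(β1θ + β2*group + β3*θ*group)

# Single-step test: uniform and non-uniform DIF together, 2 df.
stat_2df, p_2df = lr_test(ll_ability_only, ll_interaction, df=2)

# Two-step approach, first step: interaction term only, 1 df.
stat_nu, p_nu = lr_test(ll_with_group, ll_interaction, df=1)

# Uniform DIF as confounding: hypothetical β1 = 1.20 and β1' = 1.32.
change = proportional_change(1.20, 1.32)

print(f"2 df test: chi2 = {stat_2df:.2f}, p = {p_2df:.4f}")
print(f"non-uniform DIF test: chi2 = {stat_nu:.2f}, p = {p_nu:.4f}")
print(f"proportional change in beta1 = {change:.3f}")
```

How large a proportional change, or how small a p-value, should count as DIF is left to the analyst, in line with the open questions listed under "Work still pending".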