A Pragmatic Model for Data Quality Assessment in Clinical Research

Michael G. Kahn, M.D., Ph.D.
Department of Pediatrics, University of Colorado, Denver
Colorado Clinical and Translational Sciences Institute
Department of Clinical Informatics, The Children's Hospital

Electronic Data Methods Forum Methods Symposium, 17-October-2011

Funding was provided by a contract from AcademyHealth. Additional support was provided by AHRQ 1R01HS019912-01 (Scalable PArtnering Network for CER: Across Lifespan, Conditions, and Settings), AHRQ 1R01HS019908 (Scalable Architecture for Federated Translational Inquiries Network), and NIH/NCRR Colorado CTSI Grant Number UL1 RR025780 (Colorado Clinical and Translational Sciences Institute).

Disclosures
Presentation based on AcademyHealth-supported paper: "A Pragmatic Framework for Single-Site and Multi-Site Data Quality Assessment in Electronic Health Record-Based Clinical Research"
Michael G. Kahn*,1,2, Marsha A. Raebel3,4, Jason M. Glanz3,5, Karen Riedlinger6, John F. Steiner3
1. Department of Pediatrics, University of Colorado Anschutz Medical Center, Aurora, Colorado
2. Colorado Clinical and Translational Sciences Institute, University of Colorado Anschutz Medical Center, Aurora, Colorado
3. Institute for Health Research, Kaiser Permanente Colorado, Denver, Colorado
4. School of Pharmacy, University of Colorado, Aurora, Colorado
5. Department of Epidemiology, Colorado School of Public Health, Aurora, Colorado
6. Northwest Kaiser Center for Health Research, Portland, Oregon

What is the issue?
• Poor data quality can invalidate research findings
– Cohort identification
– Risk factors / exposures / confounders
– Interventions
– Outcomes
• Data quality in non-research settings even more problematic
– Documentation practices
– Workflow
– Diligence to data quality
• Our focus: how to assess data quality systematically?

Why is a systematic data quality assessment framework useful?
• We all do various data quality reviews
– We know what we've looked at
– We may not know what we haven't looked at
• A comprehensive evaluation of data quality is too resource intensive
– Need to focus on aspects that matter
– If needs change, are existing DQ assessments adequate?

Key Features of this Presentation
• A comprehensive data quality framework adapted from the information sciences for clinical research
– Definitions of data quality: multi-dimensional, context-dependent
• Operationalize DQ assessments
– Uses the framework to ensure coverage
• A formative proposal: data quality meta-data tags

Data Quality Assessment Stages
• Stage 1: initial assessments of source data sets prior to analysis
– Simple global analyses, visualizations, descriptive statistics
– Both single-site and multi-site
• Stage 2: study-specific analytic subsets with complex models and detailed data validations focused on dependent and independent variables

A trivial example: Marital Status by Age
Would this result be worrisome?
It's tough being 6 years old…

Should we be worried?
• No
– Large numbers will swamp out the effect of anomalous data, or trimmed data can be used
– Simulation techniques are insensitive to small errors
• Yes
– Observed site variation may be driven by differences in data quality, not clinical practices
– Genomic associations look for small signals (small differences in risks) amongst populations

Hyperkalemia with K+-sparing Agents

Comparative Temporal Trends: Serum Glucose

Data quality assessment lifecycles
1. Site level: extraction from EMR; data quality assessments
2. Multi-site level: data quality assessments
3. Final analytic data set: data merging

Multi-site quality assessment workflows
• Many loops
• Many decisions (diamonds)

Data quality dimensions from the IS/CS literature
• Terms used in the Information Sciences literature to describe data quality
Wand Y, Wang R. Anchoring data quality dimensions in ontological foundations. Comm ACM. 1996;39(11):86-95.
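The "marital status by age" example above is exactly the kind of anomaly an automated cross-field check can surface during a Stage 1 assessment. A minimal Python sketch; the record layout, field names, and age threshold are hypothetical, not taken from the presentation:

```python
def flag_marital_age_anomalies(records, min_age=10):
    """Return records whose marital status is implausible given age.

    The threshold and status values are illustrative assumptions.
    """
    implausible = {"Married", "Divorced", "Widowed"}
    return [r for r in records
            if r.get("age") is not None
            and r["age"] < min_age
            and r.get("marital_status") in implausible]

# Synthetic patient records for illustration
patients = [
    {"id": 1, "age": 6,  "marital_status": "Married"},  # anomalous
    {"id": 2, "age": 34, "marital_status": "Married"},  # plausible
    {"id": 3, "age": 6,  "marital_status": "Single"},   # plausible
]
anomalies = flag_marital_age_anomalies(patients)
```

Checks like this are cheap to run across an entire extract, which is what makes them suitable for the simple, global Stage 1 pass described above.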
Defining data quality: The "Fit for Use" Model
• Borrowed from industrial quality frameworks
– Juran (1951): "Fitness for Use"
• Design, conformance, availability, safety, and field use
• Multiple adaptations by the information science community
– Not all adaptations are clearly specified
– Not all adaptations are consistent
– Not linked to measurement/assessment methods

The Wang and Strong Data Quality Model
• Interviews with a broad set of data consumers
• Yielded 118 data quality features: four categories, fifteen dimensions
• Includes features of the data and the data system
• Our modification: two data categories, five dimensions
Wang, R. and D. Strong (1996). "Beyond accuracy: What data quality means to data consumers." J. Management Information Systems 12(4): 5-34.

How to measure data quality?
• Need to link the conceptual framework with methods
• Maydanchik: five classes of data quality rules
– Attribute domain: validate individual values
– Relational integrity: accurate relationships between tables, records, and fields across multiple tables
– Historical: time-varying data
– State-dependent: changes follow expected transitions
– Dependency: follow real-world behaviors
Maydanchik, A. (2007). Data quality assessment. Bradley Beach, NJ, Technics Publications.

Data Quality Assessment METHODS
• Five classes of data quality rules, 30 assessment methods
– Attribute domain rules (5 methods)
– Relational integrity (4 methods)
– Historical (9 methods)
– State-dependent (7 methods)
– Dependency (5 methods)
• Time and change assessments dominate!

Dimension 1: Attribute domain constraints
Dimension 2: Relational integrity rules
Dimension 3: Historical data rules
Dimension 4: State-dependent rules
Dimension 5: Attribute dependency rules

How to use this framework
• Determine which aspects of data quality matter most at Stage 1
– What is needed to support Stage 2
– What is doable with the data sources?
– What can the project afford to do?
– What needs to be done once versus repeatedly
• Write up a data quality assessment plan
– What's in, what's out
– And why

Extra credit slides: A formative proposal
• President's Council of Advisors on Science and Technology (PCAST)
– Recommended mandatory "metadata" tags attached to all HIT data elements
• Metadata are descriptions of the data
• PCAST proposed tags: data provenance, privacy permissions/restrictions

Extra credit slides: A formative proposal
• The CER community defines metadata tags that describe data quality for data elements and data sets
– Simple distributions (mean, median, min, max, missingness, histograms)
• à la OMOP OSCAR
– A more comprehensive set of measures derived from this framework
• If you are interested in this concept, contact me!

A Pragmatic Model for Data Quality Assessment in Clinical Research
Michael G. Kahn, M.D., Ph.D.
[email protected]

Validation of Electronic Clinical Data for Comparative Effectiveness Research
EDM Forum Methods Stakeholder Symposium, October 27, 2011
David R.
Flum, MD, MPH, Principal Investigator
Beth Devine, PharmD, MBA, PhD, Lead Investigator, CER Core; Investigator, HIT Core
University of Washington, Seattle, WA
Sponsored by AHRQ 1 R01 HS020025-01 – Enhanced Registries for QI & CER

Outline
• SCOAP
• SCOAP CERTN
• SCOAP CERTN Validation Study

The SCOAP Story
• Washington State grassroots initiative to "own" surgical quality and cost
– Clinician led
– Foundation for Healthcare Quality
– Life Sciences Discovery Fund project grant, 2007
• Simple concept: benchmarking
– Data sharing/feedback network
– Interventions: checklist initiative
• Now self-sustaining
o Improving healthcare for all Washingtonians
o Millions in reduced healthcare costs
o Development in neurosurgery and orthopedic spine
• SCOAP: a Learning Healthcare system

SCOAP Bends the Cost Curve
[Chart: average cost/case in 2009 dollars, 2006–2009, SCOAP vs. non-SCOAP hospitals, all SCOAP procedures combined (35,994 patients); cumulative difference of $67.3 million]

SCOAP + AHRQ = SCOAP CERTN
• SCOAP: a learning healthcare system needs…
– Long-term care, medication use, and outcomes
– Outpatient and inpatient data
– Patient-reported outcomes
– Broader range of clinical areas
– Leverage EHRs
– Stakeholder engagement
• AHRQ: interest in creating enhanced registries
– Use multiple data sources, link to multiple EHRs, and evaluate data across 2+ care delivery sites
– Overcome common barriers of patient registries
– Create CER and link outcomes to advancing QI
– Provide a platform for dissemination & translation

SCOAP CERTN
• Comparative Effectiveness Research Translation Network
– Uses stakeholders and real-world clinical practice to address questions of comparative benefit and risk for:
o Payers, Industry, Policy makers
– Builds automated data flow
– Leverages SCOAP's success to disseminate/translate research findings
• Project 1: Partially automate SCOAP data capture
– "The Validation Study for semi-automated data collection of machine-readable data"
• Project 2: CER study
in peripheral artery disease – intermittent claudication

Partial automation improves access to data
• Deploy Microsoft Amalga UIS™, leveraging site-specific electronic medical records to capture clinical data
• Data used for SCOAP QI registries and SCOAP CERTN projects
– Access more data on more patients for queries
– Potential access to data buried in text via text mining
– Parallel validation of partial automation and text mining
• Increases scalability for SCOAP to expand into other clinical disciplines for both QI and CER
• Amalga installation at sites is covered by SCOAP CERTN

Benefits to SCOAP Hospitals
• Partial automation of data
– Decreases manual effort for SCOAP data abstraction
– Participation in additional SCOAP registries for QI and CER with minimal additional resource burden
• Site benefits of Amalga installation
– Interfaces into Amalga done by Microsoft
– Ability to run an unlimited number of ad hoc queries against their own data in Amalga using SCOAP CERTN base views
– Evaluation of the Amalga system against their needs
– After grant end, Microsoft is willing to negotiate a mutually agreed pricing model and business terms to continue and/or expand Amalga use
• Participation in a learning network that takes advantage of best practices around semi-automated abstraction of QI measures
• Learning from the ongoing development of text mining tools

SCOAP CERTN Validation Study: Overview
• Specific Aim: validate machine-readable data flowing from Amalga for 2 uses:
o SCOAP QI
o Comparative effectiveness research using the CERTN network
• Inclusion criteria: all SCOAP patients
• Data sources – variables (from over 700 data elements)
– Core SCOAP, Vascular-Interventional SCOAP, General SCOAP
– Metrics from national QI initiatives
• Validation dataset
– Armus (manual abstraction), Amalga (automated extraction)
– Medical records = gold standard

CERTN Validation Study: Study Aims
1) Calculate the concordance in the combined Armus-Amalga Validation dataset.
2) Validate data elements coming from the Armus database by estimating precision, recall, and F-measure of elements, comparing against the gold standard (medical record).
3) Validate machine-readable data elements flowing from Amalga by estimating precision, recall, and F-measure of elements, comparing against the gold standard (medical record).
4) Establish a "benchmark" to indicate when Amalga is ready to "go-live".
5) Estimate the degree to which Amalga is "learning" by comparing validation metrics extracted from Amalga over each quarterly iteration of Amalga.

CERTN Validation Study: Evaluation Plan (1)
• Codify terms
– Gold standard = medical record
– Referent = Armus database (SCOAP QI database)
• Codify metrics used in the current SCOAP QI process with the Armus database
– Inter-rater reliability
o comparing data abstraction from medical records among abstractors
– Level of precision (accuracy)
o Armus database compared to medical record
– Level of recall (completeness)
o Armus database from medical record
– F-measure: harmonic mean of precision and recall

CERTN Validation Study: Evaluation Plan (2)
• Establish a benchmark for determining when data from Amalga are accurate and complete enough to "Go-Live" at hospital sites
• Determine the list of metrics to be compared – all machine-readable
– Core (with QI value), General SCOAP, Vascular-Interventional SCOAP
– 'Packets' of metrics (demographic/treatment, binary/categorical)
• In Amalga, identify step(s) that require validation
• Establish a timeframe for comparison
– Armus: image at the time of quarterly report generation
– Amalga
o Ensure dataflow is complete and correct (iterate), then
o Compare periodically to Armus data – daily, weekly, monthly

CERTN Validation Study: Evaluation Plan (3)
• Develop & execute protocols and algorithms to check Amalga against the referent (Armus)
• Apply statistical procedures to estimate precision and recall
• Develop and execute methods to address discrepancies and discordant values in the Armus and Amalga databases
– precision/recall
o If Armus is incorrect, follow QI procedures
o If Amalga is incorrect, enter the correct data into a pre-populated web form that tracks overrides and comments
o All original and overridden data are retained in the dataset
o Involve the HIT Core to improve precision and recall of the data flow from Amalga
• Develop and execute an audit process to address concordance – a random sample of concordant metrics – precision/recall
• Iterate the process until benchmarks are reached, then "Go-Live" at hospitals

[Diagram: data flow from hospital EHR interfaces (ADT, labs, medications, text reports, and transcription) at Hospitals #1 through n, through Amalga/MSFT CDR and NLP, into the SCOAP database (General, Surgical, Pediatric, Vascular/Interventional registries); DUA/BAA between hospital and UW/MSFT, BAA equivalent between hospital and FHCQ (SCOAP); SCOAP patients only beyond the CDR]
• SCOAP is a Coordinated Quality Improvement Program (RCW 43.70.510). The exchange of data from a hospital to SCOAP is exempt from disclosure.
• The "use" of PHI is acceptable under HIPAA laws and regulations as part of clinical quality assurance and quality improvement.
• The SCOAP hospital privacy statement needs to broadly state that patient information may be used for quality improvement.
• Currently, manual abstractors gather data from the chart for SCOAP's 700+ data points on the data collection form.
• In parallel, data about SCOAP-only patients from each site database is transferred into a "SCOAP" database, and Amalga generates a subset of SCOAP data points.
• Through an automated comparison of the two versions of the SCOAP datasets, discrepancies are identified and a set of automated "flags" is generated. These flags (e.g., data points 14, 98, 612, and 707 are discrepant for patient 12345) generate, for each patient, a list of data points with two values (one manually obtained and one obtained via automation using Amalga and NLP).
• The QI staff will compare Value 1 versus Value 2 for data point X (blinded as to which is human abstraction versus automated abstraction using Amalga and NLP) and mark which one is correct. The QI staff will find the relevant segment of the record (e.g., operative report) and attach this snippet of text in a de-identified fashion.
• Researchers get a list of data flags (e.g., data points 14, 98, 612, and 707) and responses indicating which value was correct, as well as relevant, non-identifiable chart-based snippets (e.g., relevant segments of operative reports) that will help facilitate improvement of Amalga abstraction and NLP.
• To verify that when manual and automated approaches agree they are in fact both correct, a random subset of concordant data points will be sent for manual review by QI staff using the same processes as above.
• Protections: CQIP protections; AHRQ statutory confidentiality protection of research data. QI or site coordination staff and the SCOAP data manager work with identified data; CERTN investigators and research staff/analysts see de-identified data only.
• Describing this comparative or "learning loop" process in academic publications and presentations is of interest to CERTN project leads.

Questions?
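The precision, recall, and F-measure comparisons described in the evaluation plan can be sketched in a few lines of Python. This is an illustrative computation only, not the study's actual code; the data-element values and patient IDs are made up:

```python
def precision_recall_f(extracted, gold):
    """Score one data element across patients.

    extracted, gold: dicts mapping patient id -> value; None means the
    value was not captured. A value counts as correct when it matches
    the gold standard (here, the medical record).
    """
    correct = sum(1 for pid, v in extracted.items()
                  if v is not None and gold.get(pid) == v)
    n_extracted = sum(1 for v in extracted.values() if v is not None)
    n_gold = sum(1 for v in gold.values() if v is not None)
    precision = correct / n_extracted if n_extracted else 0.0  # accuracy
    recall = correct / n_gold if n_gold else 0.0               # completeness
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)                       # harmonic mean
    return precision, recall, f

# Made-up values for a single data element (e.g., surgical approach)
gold      = {1: "lap", 2: "open", 3: "lap", 4: "open"}  # medical record
extracted = {1: "lap", 2: "open", 3: "open", 4: None}   # automated feed
p, r, f = precision_recall_f(extracted, gold)
```

Run per data element and per quarterly iteration, metrics like these are what would show whether an automated feed is "learning" and when it clears a go-live benchmark.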
Jay Desai, HealthPartners Research Foundation
EDM Forum Methods Symposium, Washington, DC, October 27, 2011

Team
• Jay Desai (HealthPartners Research Foundation)
• Patrick O'Connor (HealthPartners Research Foundation)
• Greg Nichols (Kaiser Permanente Northwest)
• Joe Selby (KPNC, PCORI)
• Pingsheng Wu (Vanderbilt University)
• Tracy Lieu (Harvard Pilgrim)

Diabetes Datalink
• HMO Research Network: approximately 11 million member population
• SUrveillance, PREvention, and ManagEment of Diabetes Mellitus (SUPREME-DM)
• Twelve participating health systems
• Funded by AHRQ

Diabetes Registries for CER, Surveillance, and Other Research Purposes
• The gold standard problem
• Sensitivity, specificity, positive predictive value
• Confidence and understanding
• Understanding the contribution of EHD sources
• Variation in EHD data sources
• Population representativeness
• Case retention

The Gold Standard Problem
• Biological gold standard for diabetes identification: elevated blood glucose levels
– With good care management, glucose levels may be below the threshold for a diabetes diagnosis
– Remission due to substantial weight loss or bariatric surgery
• Comparative validity
– Medical record documentation
– Self-report
– Claims-based diagnosis codes

Diabetes Case Identification versus True Diabetes Population

                 Actual Yes   Actual No
Case ID Yes          A            B        A + B
Case ID No           C            D        C + D
                   A + C        B + D

B = false positives; C = false negatives
Sensitivity = A/(A+C)
Specificity = D/(B+D)
PPV = A/(A+B)
Example: 90% sensitivity, 99% specificity, and 5% prevalence yield roughly 81% PPV.

Tailoring the Diabetes Case Definition to Specific Research Questions
• High sensitivity: maximize inclusion of potential cases
– Observational studies
– Surveillance
– Population-based quality metrics
– CER
– Attenuated results, but may have broader generalizability
• High positive predictive value: maximize identification of 'true' cases
– Studies involving subject interventions
– Registries that guide clinical interactions
– Accountability tied to providers or systems
– Intervention studies
– CER
– Stringent case identification; potential selection bias

Building Confidence in Case Identification
• Vary time frames using the same case identification criteria
– Shorter time frames: more confident, but capture fewer cases
– Longer time frames: less confident, but may capture more cases
– What is ideal, especially with no gold standard?
– Periodic recapture
• Prioritize case identification criteria
– Assign probabilities of 'true case'
– Prioritize data sources
– The more independent data sources support an identification… the more confidence.

Building the DataLink Registry
• Initial registry construction
– Broad sweep: any indication of diabetes from any electronic health data source

Datalink: Prevalent Diabetes Case Definition
• Enrolled beginning in 2005; look back at data beginning from 2000
• Within a 2-year time period…
– At least 2 face-to-face outpatient diabetes diagnoses, or
– At least 1 inpatient diabetes diagnosis, or
– At least 1 anti-glycemic pharmacy dispense or claim (excluding metformin, thiazolidinediones, or exenatide when no other criteria are met), or
– At least 2 elevated blood glucose levels (HbA1c, fasting plasma glucose, random plasma glucose) or one elevated OGTT

2008-9 Diabetes Case Identification at a DataLink Health System
[Venn diagram: outpatient & inpatient diagnoses (92%), pharmacy dispense (68%), laboratory results (63%); overlap percentages shown as ?%]

Dynamic and Static Cohorts

Figure 1.
Dynamic & Static Diabetes Prevalence at One DataLink Health System (2008-9)

• Dynamic cohort
– Cumulative case identification over multiple years
– Enter as a new case or care system member
– Leave due to death or disenrollment
• Static cohort
– Identification over a defined time period, then followed
– Enter… none.

[Bar chart: prevalence of 8.2% in the dynamic cohort (2000-9) vs. 7.7% in the static cohort (2008-9)]

Differential Use and Characteristics of Electronic Health Data Sources
• There is wide variation across health systems in the primary source for case identification.
• There may be selection bias associated with specific data sources. This could affect case mix and therefore results.
• Initial case identification: at least 2 elevated blood glucose levels (lab)
• Initial case identification: at least 1 diabetes drug pharmacy claim

2008-9 Diabetes Case Identification at a DataLink Health System
[Venn diagram: outpatient & inpatient diagnoses (92%), pharmacy dispense (68%), laboratory results (63%); overlap percentages shown as ?%]

Data Sources: Added Value
• Insurance and pharmacy claims are routinely used for diabetes case identification.
– Numerous validation studies against medical record or self-report
• What is the added case identification value of clinical data found in EMRs?
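The "added value" question can be answered mechanically: sweep the sources in a fixed order and count the cases each one contributes beyond those already identified, which is the logic behind a step-wise contribution table. A sketch with synthetic patient IDs, not the DataLink data:

```python
# Sources ordered as in a step-wise approach: claims diagnoses first,
# then pharmacy, then laboratory results. Patient ID sets are synthetic.
sources = {
    "outpatient_dx": {1, 2, 3, 4, 5},
    "inpatient_dx":  {2, 6},
    "pharmacy":      {1, 3, 7, 8},
    "lab_glucose":   {4, 9},
}

identified = set()   # cumulative case set
added = {}           # cases each source adds beyond the prior ones
for name, cases in sources.items():
    new = cases - identified
    added[name] = len(new)
    identified |= cases
```

Because set union is order-insensitive but the incremental counts are not, the ordering of sources encodes an analytic choice: the same lab result "adds" cases only if claims have not already found them.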
Step-wise Contributions of Electronic Health Data Sources to Diabetes Case Identification (2008-9) at One DataLink Health System

                                           Prevalence (N)    % Cases
A  2 Outpatient Claims Dx                  7.0% (12,916)       91%
B  1 Inpatient Claims Dx Only              0.1% (111)          +1%
C  A + B                                   7.1% (13,027)
D  1 Diabetes Drug Claim Only              0.4% (678)          +5%
E  C + D                                   7.4% (13,705)
F  2 Elevated Blood Glucose Tests Only     0.2% (422)          +3%
G  E + F                                   7.7% (14,127)

Patient Characteristics Based on Qualifying Case Identification EHD (2008-9) at One DataLink Health System
(columns A–G as defined in the step-wise table above)

                         A    B    C    D    E    F    G
Female (%)              49   60   49   79   51   50   51
18-44 years (%)         12   26   12   58   14    8   14
HbA1c < 8 (%)           81   91   81   91   81   98   82
LDL-c < 100 (%)         74   71   74   46   73   56   72
Current smoker (%)      11   11   11   14   11   15   11

Population Representativeness
• CER studies assess the relative effectiveness of various treatments and systems of care in defined patient populations.
• Does the population include the uninsured?
– No: if the population is defined based on insurance claims
– Probably: if the population is defined based on EMR data
• Units of analysis: patient, provider, clinic, health system
– Large multi-site registries are more likely to provide representative 'units of analysis'
– HIEs have the potential to include smaller, less integrated systems

Percent Retention of 2002 Incident Diabetes Cohort at One DataLink Health System
[Chart: values not recoverable from the extraction]

Comparing Selected Characteristics of the 2006 Incident Cohort by Retention Status through 2010

Baseline characteristic   Retained cohort   Lost to disenrollment
Female                         51%                 50%
18-44 years                    15%                 27%
45-64 years                    52%                 55%
65+ years                      32%                 16%
Current smoker                 15%                 19%
BMI ≥ 30 kg/m2                 60%                 62%
HbA1c < 8%                     86%                 78%
LDL-c < 100 mg/dl              51%                 43%
SBP < 140 mmHg                 80%                 79%
DBP < 90 mmHg                  94%                 90%

Summary
• No realistic EHD gold standard for many conditions.
• When designing a registry, think multi-purpose
– Maximize case capture so that a variety of case definitions can be derived depending on specific study needs.
– Consider developing several case definitions with different levels of confidence (sensitivity & PPV).

Summary
• For CER studies we are interested in defined patient populations, providers, clinics, health systems…
• The greater the diversity of health systems participating in a disease registry, the better… the more representative.
• EMR-derived registries may include the uninsured and be most representative.

Summary
• Cohorts developed using insurance claims have substantial attrition due to disenrollment over time.
• It is important to compare the demographic and clinical characteristics of the retained population with those lost to follow-up.
• CER studies requiring long follow-up to outcomes may be challenging if based on secondary use of EHD data.
• This may improve as health systems become regionally connected so patients can be tracked across systems (HIEs)?
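The sensitivity/PPV trade-off that runs through these summaries follows directly from the 2x2 relationships shown earlier. A small sketch of how PPV depends on prevalence; the 90%/99%/5% figures echo the slide's worked example, and the exact PPV differs slightly with rounding:

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from test characteristics and prevalence."""
    tp = sensitivity * prevalence              # A, as a population fraction
    fp = (1 - specificity) * (1 - prevalence)  # B, as a population fraction
    return tp / (tp + fp)

# With 90% sensitivity and 99% specificity at 5% prevalence,
# PPV lands in the low 80s (percent); at 1% prevalence the same
# case-identification rule yields more false positives than true ones.
p = ppv(0.90, 0.99, 0.05)
```

This is why a case definition that is acceptable for surveillance in a high-prevalence population can be too loose for an intervention registry drawn from a lower-prevalence one.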