A Pragmatic Model for Data Quality
Assessment in Clinical Research
Michael G. Kahn, M.D., Ph.D.
Department of Pediatrics University of Colorado, Denver
Colorado Clinical and Translational Sciences Institute
Department of Clinical Informatics, The Children’s Hospital
Electronic Data Methods Forum
Methods Symposium
17-October-2011
Funding was provided by a contract from AcademyHealth. Additional support was provided by AHRQ 1R01HS019912-01 (Scalable
PArtnering Network for CER: Across Lifespan, Conditions, and Settings), AHRQ 1R01HS019908 (Scalable Architecture for
Federated Translational Inquiries Network), and NIH/NCRR Colorado CTSI Grant Number UL1 RR025780 (Colorado Clinical and
Translational Sciences Institute).
Disclosures
Presentation based on AcademyHealth supported paper:
“A Pragmatic Framework for Single-Site and Multi-Site Data Quality
Assessment in Electronic Health Record-Based Clinical Research”
Michael G. Kahn*,1,2, Marsha A. Raebel3,4, Jason M. Glanz3,5, Karen Riedlinger6, John F. Steiner3
1. Department of Pediatrics, University of Colorado Anschutz Medical Center, Aurora, Colorado
2. Colorado Clinical and Translational Sciences Institute, University of Colorado Anschutz Medical Center, Aurora, Colorado
3. Institute for Health Research, Kaiser Permanente Colorado, Denver, Colorado
4. School of Pharmacy, University of Colorado, Aurora, Colorado
5. Department of Epidemiology, Colorado School of Public Health, Aurora, Colorado
6. Northwest Kaiser Center for Health Research, Portland, Oregon
What is the issue?
• Poor data quality can invalidate research findings
– Cohort identification
– Risk factors / exposures / confounders
– Interventions
– Outcomes
• Data quality in non-research settings is even more problematic
– Documentation practices
– Workflow
– Diligence to data quality
• Our focus: how to assess data quality systematically?
Why is a systematic data quality assessment framework useful?
• We all do various data quality reviews
– We know what we’ve looked at
– We may not know what we haven’t looked at
• A comprehensive evaluation of data quality is too resource-intensive
– Need to focus on the aspects that matter
– If needs change, are existing DQ assessments adequate?
Key Features of this Presentation
• A comprehensive data quality framework adapted
from information sciences for clinical research
– Definitions of data quality
• Multi-dimensional
• Context-dependent
• Operationalize DQ assessments
– Uses framework to ensure coverage
• A formative proposal – data quality meta-data tags
Data Quality Assessment Stages
• Stage 1: Initial assessments of source data sets prior to analysis
– Simple global analyses, visualizations, descriptive statistics
– Both single-site and multi-site
• Stage 2: Study-specific analytic subsets with complex models and detailed data validations focused on dependent and independent variables
A trivial example: Marital Status by Age
Would this result be worrisome?

It’s tough being 6 years old…
Should we be worried?
• No
– Large numbers will swamp out the effect of anomalous data, or trimmed data can be used
– Simulation techniques are insensitive to small errors
• Yes
– Observed site variation may be driven by differences in data quality, not clinical practices
– Genomic associations look for small signals (small differences in risk) amongst populations
Hyperkalemia with K+-sparing Agents

Comparative Temporal Trends: Serum Glucose
Data quality assessment lifecycles
1. Site level: extraction from EMR → data quality assessments
2. Multi-site level: data merging → data quality assessments
3. Final analytic data set
Multi-site quality assessment workflows
• Many loops
• Many decisions (diamonds)
Data quality dimensions from the IS/CS literature
• Terms used in Information Sciences literature to
describe data quality
Wand Y, Wang R. Anchoring data quality dimensions in ontological foundations. Comm ACM. 1996;39(11):86-95.
Defining data quality: The “Fit for Use” Model
• Borrowed from industrial quality frameworks
– Juran (1951): “Fitness for Use”
• design, conformance, availability, safety, and field use
• Multiple adaptations by information science
community
– Not all adaptations are clearly specified
– Not all adaptations are consistent
– Not linked to measurement/assessment methods
The Wang and Strong Data Quality Model
• Interviews with a broad set of data consumers
• Yielded 118 data quality features
– Four categories
– Fifteen dimensions
• Includes features of the data and the data system
• Our modification: two data categories
– Five dimensions
Wang, R. and D. Strong (1996). "Beyond accuracy: What data quality means to data consumers." J. Management Information Systems 12(4): 5-34.
How to measure data quality?
• Need to link conceptual framework with methods
• Maydanchik: five classes of data quality rules (one illustrative rule per class is sketched after the citation below)
– Attribute domain: validate individual values
– Relational integrity: accurate relationships between tables, records, and fields across multiple tables
– Historical: time-varying data
– State-dependent: changes follow expected transitions
– Dependency: values follow real-world behaviors
Maydanchik, A. (2007). Data quality assessment. Bradley Beach, NJ, Technics Publications.
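To make these rule classes concrete, here is a minimal sketch of one illustrative rule per class; the record layout, field names, and thresholds are hypothetical assumptions for illustration, not drawn from Maydanchik or the deck.

```python
from datetime import date

# A hypothetical patient record; field names are illustrative only.
record = {"age": 6, "marital_status": "Married",
          "encounter_date": date(2011, 3, 14), "birth_date": date(2005, 1, 2),
          "status": "discharged", "prior_status": "admitted",
          "sex": "M", "pregnant": False}

# 1. Attribute domain: validate an individual value against its legal range.
def check_domain(rec):
    return 0 <= rec["age"] <= 120

# 2. Relational integrity: a field must reference an existing row elsewhere.
def check_relational(patient_id, valid_patient_ids):
    return patient_id in valid_patient_ids

# 3. Historical: time-varying data must be temporally consistent.
def check_historical(rec):
    return rec["birth_date"] <= rec["encounter_date"]

# 4. State-dependent: changes must follow expected state transitions.
ALLOWED_TRANSITIONS = {("admitted", "discharged"), ("admitted", "transferred")}
def check_state(rec):
    return (rec["prior_status"], rec["status"]) in ALLOWED_TRANSITIONS

# 5. Dependency: attribute combinations must follow real-world behavior.
# The deck's trivial example: a married 6-year-old fails a dependency rule.
def check_marital_age(rec):
    return not (rec["marital_status"] == "Married" and rec["age"] < 14)

print(check_marital_age(record))  # False: flag this record for review
```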
Data Quality Assessment METHODS
• Five classes of data quality rules → 30 assessment methods
– Attribute domain rules (5 methods)
– Relational integrity (4 methods)
– Historical (9 methods)
– State-dependent (7 methods)
– Dependency (5 methods)
• Time and change assessments dominate!!
Dimension 1: Attribute domain constraints
Dimension 2: Relational integrity rules
Dimension 3: Historical data rules
Dimension 4: State-dependent rules
Dimension 5: Attribute dependency rules
How to use this framework
• Determine which aspects of data quality matter most at Stage 1
– What is needed to support Stage 2?
– What is doable with the data sources?
– What can the project afford to do?
– What needs to be done once versus repeatedly?
• Write up a data quality assessment plan
– What’s in, what’s out
– And why
Extra credit slides: A formative proposal
• President’s Council of Advisors on Science and
Technology (PCAST)
– Recommended mandatory “metadata” tags attached
to all HIT data elements
• Metadata are descriptions of the data
• PCAST proposed tags: data provenance, privacy
permissions/restrictions
Extra credit slides: A formative proposal
• CER community defines metadata tags that
describe data quality for data elements and data
sets
– Simple distributions (mean, median, min, max, missingness, histograms)
• à la OMOP OSCAR
– A more comprehensive set of measures derived from this framework (a sketch follows below)
• If you are interested in this concept, contact me!
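As a rough illustration of what a data quality metadata tag might contain, here is a minimal sketch in Python; the tag names and OSCAR-style summary statistics are assumptions for illustration, not a published schema.

```python
import statistics

def dq_metadata(values):
    """Compute hypothetical data quality metadata tags for one data element,
    in the spirit of OMOP OSCAR-style descriptive summaries."""
    present = [v for v in values if v is not None]
    return {
        "n": len(values),
        "missingness": 1 - len(present) / len(values),  # fraction missing
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
    }

# Example: serum glucose values (mg/dL) with one missing entry.
print(dq_metadata([95, 102, None, 110, 88]))
```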
A Pragmatic Model for Data Quality
Assessment in Clinical Research
Michael G. Kahn, M.D., Ph.D.
[email protected]
Validation of Electronic Clinical Data for
Comparative Effectiveness Research
EDM Forum Methods Stakeholder Symposium
October 27, 2011
David R. Flum, MD, MPH
Principal investigator
Beth Devine, PharmD, MBA, PhD
Lead investigator, CER Core
Investigator, HIT Core
University of Washington, Seattle, WA
Sponsored by AHRQ 1 R01 HS020025-01 – Enhanced Registries for QI & CER
Outline
• SCOAP
• SCOAP CERTN
• SCOAP CERTN Validation Study

The SCOAP Story
• Washington State grassroots initiative to “own” surgical quality and cost
– Clinician led
– Foundation for Healthcare Quality
– Life Sciences Discovery Fund project grant, 2007
• Simple concept: benchmarking
– Data sharing/feedback network
– Interventions: checklist initiative
• Now self-sustaining
– Improving healthcare for all Washingtonians
– Millions in reduced healthcare costs
– Development in neurosurgery and orthopedic spine
• SCOAP – a Learning Healthcare System
SCOAP Bends the Cost Curve
[Figure: Average cost/case (2009 dollars), 2006–2009, all SCOAP procedures combined (35,994 patients). Two trend lines, SCOAP hospitals vs. non-SCOAP hospitals (y-axis $10,000–$22,000), with the gap between them annotated as $67.3 million.]
SCOAP + AHRQ = SCOAP CERTN
• SCOAP: a learning healthcare system needs…
– Long-term care, medication use, and outcomes
– Outpatient and inpatient data
– Patient-reported outcomes
– Broader range of clinical areas
– Leverage of EHRs
– Stakeholder engagement
• AHRQ: interest in creating enhanced registries
– Use multiple data sources, link to multiple EHRs, and evaluate data across 2+ care delivery sites
– Overcome common barriers of patient registries
– Create CER and link outcomes to advancing QI
– Provide a platform for dissemination & translation
SCOAP CERTN
• Comparative Effectiveness Research Translation Network
– Uses stakeholders and real-world clinical practice to address questions of comparative benefit and risk for:
o Payers, industry, policy makers
– Builds automated data flow
– Leverages SCOAP’s success to disseminate/translate research findings
• Project 1: Partially automate SCOAP data capture
– “The Validation Study for semi-automated data collection of machine-readable data”
• Project 2: CER study in peripheral artery disease – intermittent claudication
Partial automation improves access to data
• Deploy Microsoft Amalga UIS™, leveraging site‐specific electronic medical records to capture clinical data
• Data used for SCOAP QI registries and SCOAP CERTN projects
– Access more data on more patients for queries
– Potential access to data buried in text via text mining
– Parallel validation of partial automation and text mining
• Increases scalability for SCOAP to expand into other clinical disciplines for both QI and CER
• Amalga installation at sites is covered by SCOAP CERTN
Benefits to SCOAP Hospitals
• Partial automation of data
– Decreases manual effort for SCOAP data abstraction
– Participation in additional SCOAP registries for QI and CER with minimal additional resource burden
• Site benefits of Amalga installation
– Interfaces into Amalga done by Microsoft
– Ability to run an unlimited number of ad hoc queries against their own data in Amalga using SCOAP CERTN base views
– Evaluation of the Amalga system against their needs
– After grant end, Microsoft willing to negotiate a mutually agreed pricing model and business terms to continue and/or expand Amalga use
• Participation in a learning network that takes advantage of best practices around semi-automated abstraction of QI measures
• Learning from the ongoing development of text mining tools
SCOAP CERTN Validation Study: Overview
• Specific aim
– Validate machine-readable data flowing from Amalga for 2 uses:
o SCOAP QI
o Comparative effectiveness research using the CERTN network
• Inclusion criteria – all SCOAP patients
• Data sources – variables (from over 700 data elements)
– Core SCOAP, Vascular-Interventional SCOAP, General SCOAP
– Metrics from national QI initiatives
• Validation dataset
– Armus (manual abstraction), Amalga (automated extraction)
– Medical records = gold standard
SCOAP CERTN Validation Study: Overview
CERTN Validation Study: Study Aims
1) Calculate the concordance in the combined Armus-Amalga validation dataset.
2) Validate data elements coming from the Armus database by estimating precision, recall, and F-measure of elements, comparing against the gold standard (medical record).
3) Validate machine-readable data elements flowing from Amalga by estimating precision, recall, and F-measure of elements, comparing against the gold standard (medical record).
4) Establish a “benchmark” to indicate when Amalga is ready to “go-live”.
5) Estimate the degree to which Amalga is “learning” by comparing validation metrics extracted from Amalga over each quarterly iteration.

CERTN Validation Study: Evaluation Plan (1)
• Codify terms
– Gold standard = medical record
– Referent = Armus database (SCOAP QI database)
• Codify metrics used in the current SCOAP QI process with the Armus database
– Inter-rater reliability
o Comparing data abstraction from medical records among abstractors
– Level of precision (accuracy)
o Armus database compared to medical record
– Level of recall (completeness)
o Armus database compared to medical record
– F-measure: harmonic mean of precision and recall (a computational sketch follows)
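A minimal sketch of these three metrics computed at the field level, treating the medical record as gold standard; the record structures and field names here are hypothetical, not SCOAP's actual data points.

```python
def validation_metrics(abstracted, gold):
    """Field-level precision, recall, and F-measure of an abstracted record
    against the gold-standard medical record (both dicts of field -> value)."""
    # True positives: fields captured with the correct value.
    tp = sum(1 for field, value in abstracted.items() if gold.get(field) == value)
    precision = tp / len(abstracted) if abstracted else 0.0  # accuracy of what was captured
    recall = tp / len(gold) if gold else 0.0                 # completeness vs. the chart
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)             # harmonic mean
    return precision, recall, f_measure

# Hypothetical example: one field wrong, one field missing.
gold = {"asa_class": "2", "smoker": "no", "bmi": "31"}
armus = {"asa_class": "2", "smoker": "yes"}
print(validation_metrics(armus, gold))  # (0.5, 0.333..., 0.4)
```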
CERTN Validation Study: Evaluation Plan (2)
• Establish a benchmark for determining when data from Amalga are accurate and complete enough to “Go-Live” at hospital sites
• Determine the list of metrics to be compared – all machine-readable
– Core (with QI value), General SCOAP, Vascular-Interventional SCOAP
– ‘Packets’ of metrics (demographic/treatment, binary/categorical)
• In Amalga, identify step(s) that require validation
• Establish a timeframe for comparison
– Armus: image at the time of quarterly report generation
– Amalga:
o Ensure dataflow is complete and correct (iterate), then
o Compare periodically to Armus data (daily, weekly, monthly)

CERTN Validation Study: Evaluation Plan (3)
• Develop & execute protocols and algorithms to check Amalga against the referent (Armus)
– Apply statistical procedures to estimate precision and recall
• Develop and execute methods to address discrepancies and discordant values in the Armus and Amalga databases (precision/recall)
– If Armus is incorrect, follow QI procedures
– If Amalga is incorrect, enter correct data into a pre-populated web form with tracked overrides and comments
o All original and overridden data retained in the dataset
o Involve the HIT Core to improve precision and recall of the data flow from Amalga
• Develop and execute an audit process to address concordance – a random sample of concordant metrics (precision/recall)
• Iterate the process until benchmarks are reached, then “Go-Live” at hospitals
Quality Improvement
[Figure: Data-flow diagram. Hospitals #1, #2, … n feed EHR interfaces (ADT, labs, medications, text reports, and transcription) into the Amalga MSFT clinical data repository (all patients or all SCOAP patients); SCOAP patients only flow on, via NLP, to the SCOAP QI registries (General Surgical, Pediatric, Vascular/Interventional). Data use agreements: DUA/BAA from hospital to UW/MSFT; a BAA-equivalent from hospital to FHCQ (SCOAP) already exists.]
• SCOAP is a Coordinated Quality Improvement Program (RCW 43.70.510). The exchange of data from a hospital to SCOAP is exempt from disclosure.
• The “use” of PHI is acceptable under HIPAA laws and regulations as part of clinical quality assurance and quality improvement.
• Hospital privacy statements need to broadly include that patient information may be used for quality improvement.
Quality Improvement
[Figure: The same data flow, with a VALIDATE step comparing manual abstraction against Amalga/NLP extraction.]
• Current manual abstractors gather data from the chart for SCOAP’s 700+ data points on the data collection form.
• In parallel, data about SCOAP-only patients from each site database are transferred into a “SCOAP” database, and Amalga generates a subset of SCOAP data points.
• Through an automated comparison of the two versions of the SCOAP datasets, discrepancies are identified and a set of automated “flags” is generated. These flags (e.g., data points 14, 98, 612, and 707 are discrepant for patient 12345) generate for each patient a list of data points with two values (one manually obtained and one obtained via automation using Amalga and NLP).
• The QI staff will compare Value 1 versus Value 2 for data point X (blinded as to which is human abstraction versus automated abstraction using Amalga and NLP) and mark which one is correct. The QI staff will find the relevant segment of the record (e.g., the operative report) and attach this snippet of text in a de-identified fashion.
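The automated comparison step described above might look like the following sketch; the data-point numbering and record layout are hypothetical illustrations, not SCOAP's actual schema.

```python
def flag_discrepancies(manual, automated):
    """Compare the manually abstracted and Amalga/NLP-derived versions of one
    patient's SCOAP data points; return a flag for every discrepant point,
    carrying both values for blinded QI review."""
    flags = []
    for point in sorted(set(manual) | set(automated)):
        value_1, value_2 = manual.get(point), automated.get(point)
        if value_1 != value_2:
            flags.append({"data_point": point, "value_1": value_1, "value_2": value_2})
    return flags

# Hypothetical patient 12345: data points 14 and 98 disagree between versions.
manual = {14: "yes", 98: "ASA 2", 612: "open"}
automated = {14: "no", 98: "ASA 3", 612: "open"}
print(flag_discrepancies(manual, automated))
```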
Quality Improvement → Research
[Figure: The same validation workflow, now also feeding research.]
• Researchers get a list of data flags (e.g., data points 14, 98, 612, and 707) and responses indicating which value was correct, as well as relevant, non-identifiable chart-based snippets (e.g., relevant segments of operative reports) that will help facilitate improvement of Amalga abstraction and NLP.
• To verify that when manual and automated approaches agree they are in fact both correct, a random subset of concordant data points will be sent for manual review by QI staff using the same processes as above.
Quality Improvement → Research: Protections
[Figure: The same workflow, annotated with protections. The QI side (QI or site coordination staff & SCOAP data manager) operates under CQIP protections; the research side (CERTN investigators & research staff/analysts) operates under AHRQ statutory confidentiality protection of research data; only de-identified data cross to research.]
• Describing this comparative or “learning loop” process in academic publications and presentations is of interest to CERTN project leads.
Questions?

Jay Desai
HealthPartners Research Foundation
EDM Forum Methods Symposium
Washington, DC
October 27, 2011
Team
• Jay Desai (HealthPartners Research Foundation)
• Patrick O’Connor (HealthPartners Research Foundation)
• Greg Nichols (Kaiser Permanente Northwest)
• Joe Selby (KPNC PCORI)
• Pingsheng Wu (Vanderbilt University)
• Tracy Lieu (Harvard Pilgrim)
Diabetes Datalink
• HMO Research Network
• Approximately 11 million member population
• SUrveillance, PREvention, and ManagEment of Diabetes Mellitus (SUPREME-DM)
• Twelve participating health systems
• Funded by AHRQ
Diabetes Registries for CER, Surveillance, and Other Research Purposes
• The Gold Standard problem
• Sensitivity, Specificity, Positive Predictive Value
• Confidence and Understanding
• Understand contribution of EHD sources
• Variation in EHD data sources
• Population Representativeness
• Case Retention
The Gold Standard Problem
• Biological gold standard for diabetes identification: elevated blood glucose levels
– With good care management, glucose levels may be below the threshold for diabetes diagnosis
– Remission due to substantial weight loss, bariatric surgery
• Comparative validity
– Medical record documentation
– Self-report
– Claims-based diagnosis codes
Diabetes Case Identification versus True Diabetes Population

              Actual Yes            Actual No             Total
Case ID Yes   A (true positives)    B (false positives)   A + B
Case ID No    C (false negatives)   D (true negatives)    C + D
Total         A + C                 B + D

Sensitivity = A / (A + C)
Diabetes Case Identification versus True Diabetes Population
A
False Positives
(B)
False Negatives
(C)
Specificity = B/(B+D)
(D)
Actual Yes
Actual No
Case ID Yes
A
B
A + B
Case ID No
C
D
C + D
A + C
B + D
Diabetes Case Identification versus True Diabetes Population
A
False Positives
(B)
Sensitivity = A/(A+C)
False Negatives
(C)
Example:
90% Sensitivity
99% Specificity
5% Prevalence
81% PPV
PPV = A/(A+B)
Actual Yes
Actual No
Case ID Yes
A
B
A + B
Case ID No
C
D
C + D
A + C
B + D
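Working the example through Bayes’ rule with the stated sensitivity, specificity, and prevalence:

$$\mathrm{PPV} = \frac{\mathrm{sens}\cdot p}{\mathrm{sens}\cdot p + (1-\mathrm{spec})(1-p)} = \frac{0.90 \times 0.05}{0.90 \times 0.05 + 0.01 \times 0.95} = \frac{0.045}{0.0545} \approx 0.83$$

So even with 99% specificity, roughly one in six flagged patients is a false positive, because prevalence is low.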
Tailoring Diabetes Case Definition to Specific Research Questions

High Sensitivity
• Maximize inclusion of potential cases
• Observational studies
• Surveillance
• Population-based quality metrics
• CER
• Attenuated results, but may have broader generalizability

High Positive Predictive Value
• Maximize identification of ‘true’ cases
• Studies involving subject interventions
• Registries that guide clinical interactions
• Accountability tied to providers or systems
• Intervention studies
• CER
• Stringent case identification
• Potential selection bias
Building Confidence in Case Identification
• Vary time frames using the same case identification criteria
– Shorter time frames: more confident but capture fewer cases
– Longer time frames: less confident but may capture more cases
• What is ideal, especially with no gold standard?
• Periodic recapture

Building Confidence in Case Identification
• Prioritizing case identification criteria
– Assign probabilities of ‘true case’
– Prioritize data sources
• The more independent data sources identifying a case… the more confidence.
Building the DataLink Registry
Initial registry construction
Broad sweep:
Any indication of diabetes from any electronic health data source
Datalink: Prevalent Diabetes Case Definition
• Enrolled beginning in 2005
• Look back at data beginning from 2000
• Within a 2-year time period… (one way to code this is sketched below)
– At least 2 face-to-face outpatient diabetes diagnoses, or
– At least 1 inpatient diabetes diagnosis, or
– At least 1 anti-glycemic pharmacy dispense or claim (excluding metformin, thiazolidinediones, or exenatide when no other criteria are met), or
– At least 2 elevated blood glucose levels (HbA1c, fasting plasma glucose, random plasma glucose) or one elevated OGTT
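As a sketch, this case definition could be coded roughly as follows, assuming a hypothetical per-patient summary dict over one 2-year window; all field names are illustrative assumptions, not the SUPREME-DM implementation.

```python
# Drugs that do not qualify on their own (may be prescribed off-label or for
# prediabetes) unless another criterion is also met.
NONSPECIFIC_DRUGS = {"metformin", "thiazolidinedione", "exenatide"}

def is_prevalent_case(pt):
    """Apply the prevalent diabetes definition to a hypothetical per-patient
    summary for one 2-year window."""
    other_criteria = (pt["outpatient_dm_dx"] >= 2      # 2+ outpatient diagnoses
                      or pt["inpatient_dm_dx"] >= 1    # 1+ inpatient diagnosis
                      or pt["elevated_glucose_tests"] >= 2
                      or pt["elevated_ogtt"] >= 1)
    # A drug dispense qualifies alone unless it is one of the excluded drugs,
    # in which case another criterion must also be met.
    specific_rx = any(d not in NONSPECIFIC_DRUGS for d in pt["dm_drugs"])
    drug_criterion = specific_rx or (bool(pt["dm_drugs"]) and other_criteria)
    return other_criteria or drug_criterion

pt = {"outpatient_dm_dx": 1, "inpatient_dm_dx": 0,
      "elevated_glucose_tests": 2, "elevated_ogtt": 0,
      "dm_drugs": ["metformin"]}
print(is_prevalent_case(pt))  # True: two elevated glucose tests qualify
```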
2008-9 Diabetes Case Identification at a DataLink Health System
[Figure: Venn diagram of case-identification sources: outpatient & inpatient diagnoses (92%), pharmacy dispense (68%), laboratory results (63%); overlap percentages shown as “?%”.]
Dynamic and Static Cohorts
• Dynamic cohort
– Cumulative case identification over multiple years
– Enter as new case or care system member
– Leave due to death or disenrollment
• Static cohort
– Identification over a defined time period, then followed
– Enter… none
[Figure 1: Dynamic & static diabetes prevalence at one DataLink health system (2008-9). Dynamic cohort (2000-9): 8.2% prevalence; static cohort (2008-9): 7.7% prevalence.]
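A small sketch of the distinction, using hypothetical membership records with identification and exit dates:

```python
from datetime import date

# Hypothetical case records: (patient_id, date_identified, date_left_or_None)
cases = [("p1", date(2003, 5, 1), None),
         ("p2", date(2008, 2, 1), None),
         ("p3", date(2001, 7, 1), date(2006, 1, 1))]  # disenrolled/died 2006

def dynamic_cohort(cases, start, end):
    """Cumulative: anyone identified in the window who has not left by its end."""
    return {pid for pid, ident, left in cases
            if start <= ident <= end and (left is None or left > end)}

def static_cohort(cases, start, end):
    """Fixed: only cases identified within the defined window; no new entries."""
    return {pid for pid, ident, left in cases if start <= ident <= end}

print(dynamic_cohort(cases, date(2000, 1, 1), date(2009, 12, 31)))  # {'p1', 'p2'}
print(static_cohort(cases, date(2008, 1, 1), date(2009, 12, 31)))   # {'p2'}
```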
Differential Use and Characteristics of Electronic Health Data Sources
• There is wide variation across health systems regarding the primary source for case identification.
• There may be selection bias associated with specific data sources. This could affect case mix and therefore results.

Initial Case Identification: At least 2 elevated blood glucose levels (lab)
Initial Case Identification: At least 1 diabetes drug pharmacy claim
2008-9 Diabetes Case Identification at a DataLink Health System
[Figure: Venn diagram of case-identification sources: outpatient & inpatient diagnoses (92%), pharmacy dispense (68%), laboratory results (63%); overlap percentages shown as “?%”.]
Data Sources: Added Value
• Insurance and pharmacy claims are routinely used for diabetes case identification.
– Numerous validation studies against medical record or self-report
• What is the added case identification value of clinical data found in EMRs?
Step-wise Contributions of Electronic Health Data Sources to Diabetes Case Identification (2008-9) at One DataLink Health System

Column  Criterion                              Prevalence (N)    % Cases
A       2 outpatient claims Dx                 7.0% (12,916)     91%
B       1 inpatient claims Dx only             0.1% (111)        +1%
C       A + B                                  7.1% (13,027)
D       1 diabetes drug claim only             0.4% (678)        +5%
E       C + D                                  7.4% (13,705)
F       2 elevated blood glucose tests only    0.2% (422)        +3%
G       E + F                                  7.7% (14,127)
Patient Characteristics Based on Qualifying Case Identification EHD (2008-9) at One DataLink Health System

Column  Criterion                     Prevalence (N)   % Cases  Female  18-44 yrs  HbA1c<8  LDL-c<100  Smoker
A       Outpatient claims Dx          7.0% (12,916)    91%      49%     12%        81%      74%        11%
B       Inpatient claims Dx only      0.1% (111)       1%       60%     26%        91%      71%        11%
C       A + B                         7.1% (13,027)             49%     12%        81%      74%        11%
D       Diabetes drug claim only      0.4% (678)       5%       79%     58%        91%      46%        14%
E       C + D                         7.4% (13,705)             51%     14%        81%      73%        11%
F       Blood glucose lab test only   0.2% (422)       3%       50%     8%         98%      56%        15%
G       E + F                         7.7% (14,127)             51%     14%        82%      72%        11%

(Select patient characteristics shown as column percentages.)
Population Representativeness
• CER studies?
– Assess relative effectiveness of various treatments and systems of care in defined patient populations
• Uninsured
– No: if population defined based on insurance claims
– Probably: if population defined based on EMR
• Units of analysis
– Patient, provider, clinic, health system
– Large multi-site registries more likely to provide representative ‘units of analysis’
– HIE potential to include smaller, less integrated systems

Percent Retention of 2002 Incident Diabetes Cohort at One DataLink Health System
Comparing Selected Characteristics of 2006 Incident Cohort by Retention Status through 2010

Baseline characteristic   Retained cohort   Lost to disenrollment
Female                    51%               50%
18-44 years               15%               27%
45-64 years               52%               55%
65+ years                 32%               16%
Current smoker            15%               19%
BMI ≥ 30 kg/m²            60%               62%
HbA1c < 8%                86%               78%
LDL-c < 100 mg/dl         51%               43%
SBP < 140 mmHg            80%               79%
DBP < 90 mmHg             94%               90%
Summary
• No realistic EHD gold standard for many conditions.
• When designing a registry:
– Think multi-purpose
– Maximize case capture so that a variety of case definitions can be derived depending on specific study needs
– Consider developing several case definitions with different levels of confidence (sensitivity & PPV)
Summary
• For CER studies we are interested in defined patient populations, providers, clinics, health systems…
• The greater the diversity of health systems participating in a disease registry, the better… the more representative.
• EMR-derived registries may include the uninsured and be most representative.
Summary
• Cohorts developed using insurance claims have substantial attrition due to disenrollment over time.
• It is important to compare demographic and clinical characteristics of the retained population with those lost to follow-up.
• CER studies requiring long follow-up to outcomes may be challenging if based on secondary use of EHD data.
• Might this improve as health systems become regionally connected so patients can be tracked across systems (HIEs)?