Adding more value by smart
querying and Natural Language
Processing
Eneida Mendonca MD, PhD, FACMI, FAAP
Associate Professor
Pediatrics, Biostatistics and Medical Informatics
Industrial and Systems Engineering (Affiliated)
Assistant Director, Clinical/Health Informatics
Institute for Clinical and Translational Research (ICTR)
University of Wisconsin - Madison
Healthcare is Changing
Consortium on Technology for Proactive Care
EPISODIC, REACTIVE
FOCUS ON DISEASE
Northeastern University
PROACTIVE and PREVENTIVE
FOCUS ON WELLBEING
QUALITY OF LIFE
HOSPITAL-CENTRIC
PATIENT-CENTRIC, HOME-BASED
TRAINING & EXPERIENCE
BASED
MORE EVIDENCE – BASED
DECISION SUPPORT
FRAGMENTED, LOCAL DATA
INTEROPERABLE, EHR AVAILABLE
ANYWHERE, ANYTIME
PASSIVE PATIENTS
EMPOWERED, ENGAGED,
INFORMED, PARTICIPATING
Big Data in Healthcare
•  EHR Types and sources
•  Structured
•  Coded (e.g. ICD, SNOMED, CPT)
•  Numerical values (e.g. temperature, BP, HR, age)
•  Unstructured (~ 95%)
•  Text (e.g. discharge summary, email)
•  Image (e.g. radiological, microscopy, photos)
•  Audio (e.g. dictation, heart murmur)
•  Other data:
•  Geographic coding + environmental data
•  Biobanking
•  Census data
•  Tumor registries
Weber GM, Mandl KD, Kohane IS. Finding the Missing Link for Big Biomedical Data. JAMA. Published online May 22, 2014.
Making healthcare smarter
Challenges for Patient Identification &
Intervention
•  Data Challenges
–  Large Scale: Up to 10s of millions of patients
–  High Dimensionality: Thousands of dimensions spanning many years
–  Semi-Structured: Clinical notes, imaging, medical codes
–  Distributed: Multiple providers and representations
–  Sparse and Irregular: Periodic visits, different for each person
–  Uncertain: Subjective, data entry errors, bias for billing
–  Incomplete: Many items missing from the medical record
•  Task Challenges
–  Critical decisions: May literally mean life or death
–  No clear right answer: Evidence is often ambiguous
–  Limited time: Manage complexity, multiple granularity
–  Domain experts are people…
•  Too much (or too little) trust in numbers
•  “But my patients are different…”
•  Users resistant to technology change
Biomedical Informatics Core
“Provide ICTR with computational and informational infrastructure and services by developing and implementing technologies necessary for clinical and translational research investigators at UW-Madison and the Marshfield Clinic Research Foundation (MCRF).”
BMI Core expertise
Lead: Umberto Tachinardi - Co-Lead (MCRF): Amit Acharya
Area Leaders
UW Faculty 7 PhDs UW Staff 6 Masters 2 Students
Marshfield Clinic Leaders: A. Acharya, P. Peissig
Bioinformatics, Imaging Informatics, Clinical Informatics, IT Services
M. Colin, R. Bruce, E. Mendonca, T. Mish (J. BooP UWCCC)
Clinical Informatics
Theoretical
•  Clinical decision support
•  Data mining
•  Quality, outcomes, predictive modeling
•  Data modeling
•  Controlled terminologies
•  Privacy and Security
Practical
•  Design research databases and
implement OnCore and REDCap
applications
•  Natural language processing of
Clinical Narratives
•  Regulatory compliance of clinical
research databases
•  Development of algorithms for
complex queries
•  Ontology development tool (IntellaQ)
Pre-award consultation
Researchers want data …
…when they find us (BMIC), we help them learn that:
-  From hypothesis to data retrieval is not always a straight line
-  Identifying a cohort may need specific definitions not in the data
-  They don’t know if the data is available
-  They may need:
-  authorization to get the data
-  to secure/protect data/privacy
-  new methods to be developed (e.g. NLP, imaging/
genomics pipelines)
-  to share data, and learn how to do that
-  access to tools (mostly BMIC-operated)
Cohort identification
•  Study feasibility counts
–  Do I have a patient population large enough to draw a statistically meaningful sample?
•  Internal studies
•  Collaborative studies (e.g., PCORNet)
–  Define phenotypes of interest
–  Create internal mapping for data definitions from network EHR and clinical studies
•  Planning and recruitment
–  Site Identification, Investigator Identification
–  Predicting Enrollment
–  Facilitating Recruitment...
•  Regarding recruitment
–  Clinical data can expedite identification of subjects
•  Used to pre-identify
•  Generate mailing/call lists
–  EHRs can facilitate evaluation and enrollment
•  Review of chart made easier
•  Privacy issues offer some
Query building/execution
After learning what they need, and how to get that…
…they need data to be collected and presented:
-  BMI people help them translate what they say they want, to what is
available (e.g. map between English and ICD/SNOMED, cluster lab
data)
-  Transform the mapped request into database language (SQL or
i2b2 tool)
-  Execute the codes and retrieve data
-  Prepare data for delivery
-  cleansing
-  quality check
-  reruns
-  secure and package
-  Deliver!
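The steps above (map the request to codes, turn it into database language, execute, retrieve) can be sketched roughly as follows. This is a minimal illustration only: the `diagnosis` table, its column names, and the term-to-code mapping are hypothetical and are not the BMI Core's actual schema or tooling.

```python
import sqlite3

# Hypothetical mapping from a researcher's plain-English term to ICD-9-CM
# code prefixes; a real mapping would come from a curated terminology.
TERM_TO_ICD9_PREFIXES = {
    "inflammatory bowel disease": ["555", "556"],
}


def build_cohort_query(term, start_date, end_date):
    """Translate a mapped request into a parameterized SQL query."""
    prefixes = TERM_TO_ICD9_PREFIXES[term]
    code_filter = " OR ".join("icd9_code LIKE ?" for _ in prefixes)
    sql = (
        "SELECT DISTINCT patient_id FROM diagnosis "
        f"WHERE ({code_filter}) AND dx_date BETWEEN ? AND ? "
        "ORDER BY patient_id"
    )
    params = [prefix + "%" for prefix in prefixes] + [start_date, end_date]
    return sql, params


def run_example():
    """Execute the query against a tiny in-memory table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE diagnosis (patient_id INTEGER, icd9_code TEXT, dx_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO diagnosis VALUES (?, ?, ?)",
        [
            (1, "555.9", "2011-03-02"),
            (2, "556.0", "2009-07-15"),   # outside the date range
            (3, "250.00", "2012-01-20"),  # code does not match
            (4, "556.9", "2014-11-30"),
        ],
    )
    sql, params = build_cohort_query(
        "inflammatory bowel disease", "2010-01-01", "2015-12-31"
    )
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

Calling `run_example()` returns the patient identifiers that satisfy both the code filter and the date range. Cleansing, quality checks, and secure packaging would follow as separate steps.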
What is i2b2?
•  Software used to explore anonymous patient
cohorts and to identify patients once research
begins
•  Simplifies research across multiple clinical data
sets
•  Standardized platform for academic and
clinical research across organizations
•  Answers the question: Do I have a patient
population large enough to draw a statistically
meaningful sample?
i2b2 Web Client
Grouping Boxes (Query Logic)
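In the i2b2 query tool, concepts placed in the same group are combined with OR, and separate groups are combined with AND. A minimal sketch of that logic is shown here; the concept labels and the in-memory `facts` structure are hypothetical and only stand in for the real i2b2 data model.

```python
def run_group_query(groups, facts):
    """Return sorted patient ids matching every group.

    groups: list of sets of concept codes (OR within a set, AND across sets).
    facts:  dict mapping patient id -> set of concept codes recorded for them.
    """
    return sorted(
        patient_id
        for patient_id, concepts in facts.items()
        # A group matches when the patient has at least one of its concepts.
        if all(concepts & group for group in groups)
    )


# Hypothetical facts: asthma diagnosis codes and inhaled corticosteroid labels.
EXAMPLE_FACTS = {
    101: {"ICD9:493.90", "MED:fluticasone"},
    102: {"ICD9:493.90"},
    103: {"ICD9:493.00", "MED:budesonide"},
    104: {"MED:fluticasone"},
}

EXAMPLE_GROUPS = [
    {"ICD9:493.90", "ICD9:493.00"},       # group 1: any asthma code
    {"MED:fluticasone", "MED:budesonide"},  # group 2: any inhaled corticosteroid
]
```

With these example inputs, `run_group_query(EXAMPLE_GROUPS, EXAMPLE_FACTS)` keeps only patients 101 and 103, who have both an asthma code and one of the listed medications.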
i2b2 Data
•  Patient Information
–  Demographic Information
–  Age
–  Race
–  Ethnicity
–  Gender
–  Marital Status
–  Location (Postal Code)
•  Biometric Information
–  Height
–  Weight
–  Body Mass Index (BMI)
–  Blood Pressure
•  Systolic
•  Diastolic
•  Patient History Information
•  Social History
–  Alcohol Use History
–  Drug Use History
–  Tobacco Use History
i2b2 Data
•  Diagnosis Information
–  ICD-9-CM Codes
–  ICD-O Codes
•  Procedure Information
–  CPT Codes
–  HCPCS Codes
–  ICD-9-CM Procedure Codes
•  Allergies and Allergens Information
–  Allergy Information
–  Allergen Information
•  Vaccine and Immunization Information
–  CVX Codes
–  MVX Codes
•  Medication Information
–  GPI Codes
•  Lab Information
–  LOINC Codes
•  Encounter Information
–  Age at Encounter
–  Length of Stay
–  Encounter Location (Postal Code)
–  Encounter Type
–  Patient Class
De-identified Data Set
•  De-identified Data Set i2b2 instance:
–  18 forms of PHI are removed
–  3 digit zip code
•  wildcard concept (53703 becomes 537)
•  if the US Census population is small, 000 is displayed
–  All Dates are shifted by a given random seed (+/- 30)
–  Age is displayed by year only
•  full birth year is not exposed
•  Patients >= 90 referred to by a single concept
•  Based on safe harbor
Potential users for i2b2
•  UW Health Researchers
•  Professionally Credentialed Physicians
•  Non-Accredited Fellows
•  Accredited Fellows
•  Residents
•  Physician Assistants & Nurse Practitioners
–  “Don’t call them mid-levels!”
•  Affiliated Physicians (i.e. DFM PCPs)
•  IT related groups
–  IS (formerly known as ITS), HIMC, BPAD, DABI, etc.
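The rules listed for the de-identified i2b2 instance (3-digit zip code, shifted dates, capped ages) can be illustrated with a small sketch. Several details here are assumptions rather than the documented implementation: the shift is treated as a number of days, it is derived per patient from a secret so that a patient's dates stay consistent with each other, and the list of small-population zip prefixes is a placeholder.

```python
import hashlib
from datetime import date, timedelta

# Placeholder list: the real set of small-population 3-digit zip prefixes
# comes from US Census data and is not given in the slides.
SMALL_POPULATION_ZIP3 = {"036", "059"}


def zip3(postal_code):
    """Keep only the first three digits; mask small-population areas as 000."""
    prefix = postal_code[:3]
    return "000" if prefix in SMALL_POPULATION_ZIP3 else prefix


def date_offset(patient_id, secret, max_days=30):
    """Derive a stable per-patient shift in [-max_days, +max_days] (assumed: days)."""
    digest = hashlib.sha256(f"{secret}:{patient_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days


def shift_date(original, patient_id, secret):
    """Shift a date by the patient's offset so intervals within a patient are kept."""
    return original + timedelta(days=date_offset(patient_id, secret))


def age_concept(age_years):
    """Report age in whole years; ages of 90 and above share a single concept."""
    return ">=90" if age_years >= 90 else str(int(age_years))
```

For example, `zip3("53703")` gives `"537"`, and every date for the same patient moves by the same number of days, so the time between two encounters is unchanged after shifting.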
Some i2b2 numbers
Concept category                      Number of occurrences
NUMBER OF PATIENTS:                   2,490,638
GENDER FEMALE:                        1,246,647
GENDER MALE:                          1,202,972
GENDER UNKNOWN:                       41,019
BIOMETRIC: NUMBER OF PATIENTS         756,781
BIOMETRIC: BMI                        3,792,083
BIOMETRIC: DIASTOLIC                  6,812,531
DIAGNOSIS: BILL                       13,564,794
DIAGNOSIS: ENCOUNTER                  66,688,519
PROCEDURE: ORDERED                    65,683,720
LAB TOTAL:
LAB NUMBER OF PATIENTS:
MEDICATION TOTAL:
MEDICATION NUMBER OF PATIENTS:
SOCIAL HISTORY: TOTAL
SOCIAL HISTORY: NUMBER OF PATIENTS
256,629,737
887,676
12,090
766,673
64,589,592
797,963
Examples of requests: Study feasibility counts
We are interested in women whose breast cancer returned after more than 15 years. How many women fit this category? We are also interested in any biobanked material. Might there be biopsy samples or other materials also maintained for these women?
Examples of requests: Data
Our IRB protocol states that we will submit a request to obtain a list of patients that have ICD-9 codes associated with inflammatory diseases of the GI tract. Our stated time period in our IRB application is 1/1/2010 through 12/31/2015.
Requesting data
–  Inclusion and exclusion criteria
–  Inclusion Date Range
–  Study population (age, gender, race, ethnicity)
–  Data elements needed
–  Define the constraints (filters)
•  Inpatient vs. Outpatient
•  “Medical home” here
•  Department restrictions
–  Data format for delivery
NLP Infrastructure
Unstructured Information Management Architecture (UIMA)
Other uses of text
•  Template sentences
•  Pre-configured templates (sentences) often used to document visits
•  Pre-configured list of choices used for certain documentation (e.g., signs and symptoms related to particular problems, medication lists)
•  Pre-configured forms, context dependent (coded + text)
Query building/execution
After learning what they need, and how to get that…
…they need data to be collected and presented:
- BMI people help them translate what they say they want, to what is available
(e.g. map between English and ICD/SNOMED, cluster lab data)
WAIT! Data is NOT structured! (not a number or a code)
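When the answer sits in narrative text rather than in a code or a number, some form of NLP is needed to turn it into something that can be queried. The toy sketch below uses plain regular expressions with a simple negation check to pull tobacco status out of a note. It is not the UIMA-based pipeline mentioned earlier, and the patterns and status labels are illustrative assumptions only.

```python
import re

# Illustrative patterns; a production system would use a proper clinical
# NLP pipeline with far richer vocabulary and context handling.
MENTION = re.compile(r"\b(?:smok(?:es|er|ing|ed|e)?|tobacco)\b", re.IGNORECASE)
NEGATION_CUE = re.compile(r"\b(?:no|non|not|never|denies)\b", re.IGNORECASE)


def tobacco_status(note):
    """Return 'affirmed', 'negated', or 'unknown' for the first tobacco mention."""
    # Split into rough sentences so a negation cue only affects its own sentence.
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        mention = MENTION.search(sentence)
        if mention is None:
            continue
        # Look for a negation cue in the text before the mention.
        preceding = sentence[: mention.start()]
        if NEGATION_CUE.search(preceding):
            return "negated"
        return "affirmed"
    return "unknown"
```

The resulting status can then be stored as a coded value alongside the structured data, which is what makes the narrative usable in cohort queries.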
Query building/execution
[Diagram labels: Research data needs; Honest Broker; i2b2; Automatic Querying; Human Operated Data Querying; Structured Data; EDW; NLP; EHR; DATA]
Research Data Workflow (current)
[Diagram labels: Request for data (clinical, quality, research, unknown/mixed); ServiceNow Triage; DABI; HIMC; BMI Core; Epic team; BPAD]
Query workflow
[Diagram labels: Enter request; Triage; Aggregate count; Requires IRB; QI/QA study; Assess Feasibility; “Scientific” Check; Compliance Check; Query is validated; Data prepared for delivery]
REDCap
Research Administration
Protocols
•  Basic Protocol Setup
•  Regulatory Tracking (e.g. SRC, IRB, CTRC)
•  Protocol activation
•  DSMC Reviews
Subjects
•  Screening
•  Registration
•  Consenting
•  Eligibility
•  Subject Status
•  SAEs
•  Deviations
Protocol Setup & Management
Calendars
•  Procedures & Evaluations
•  Treatment & FU Schedules
•  Visit Tolerances
•  Foot Notes
eCRFs
•  Forms Design
•  Assign forms to studies
Financials
•  Study Budgets
•  Negotiated Rates
•  SOC vs. Research
•  CMS compliance assistance
•  Payment Milestones
Study Data Management
Visit Tracking & Data Capture
Visits
•  Automated subject calendars
•  Visit & Procedure Tracking
•  Additional Visits & Procedures
•  SOC Vs. Research
•  eCRF completion
•  Query resolution
•  Data Monitoring
•  Data Discrepancy Management
•  Data Export
Financial Management
•  Automated invoice items
•  Generate Invoices
•  Study Payment tracking
•  Unplanned visits & procedures
•  Exceptions
OnCore vs REDCap
ICTR OnCore CRM is required for the following protocols:
•  All therapeutic intervention protocols being conducted within the auspices of UW-Madison or the UW-Madison Affiliated Covered Entity (ACE) that involve the use of:
–  Drugs (including over-the-counter medications, vitamins, herbals, and supplements)
–  Biologics (including vaccines and stem cells)
–  Devices (including Emergency, One-Time and Humanitarian Use Device)
–  Radiation therapy or medical imaging
•  Protocols that use an investigator-held Investigational New Drug (IND) or Investigational Device Exemption (IDE)
•  Protocols that use University of Wisconsin Hospital and Clinics (UWHC) or University of Wisconsin Medical Foundation (UWMF) ancillary services (including Clinical Research Unit) that will result in UWHC or UWMF charges or fees that will be billed to the subject or research account (NOTE: subjects participating in this protocol category must also be registered through UWHC Patient Registration)
•  Protocols under the purview of the ICTR Data Monitoring Committee
REDCap - Standard Features
•  Supports “Classic” & “Longitudinal” projects
•  Surveys
•  eCRF data entry typing & validation
•  REDCap Shared Library
•  Data Dictionary
•  Branching Logic
•  Multi-Arm and Multi-event studies
•  Record Management
REDCap - Data Tools
•  Data Import/Export tools
•  Built-in data quality tools
–  Basic analytics tools
–  Data quality checks
•  Standard reporting
•  Data Security
–  User Permissions/Roles
–  Data Access groups
REDCap - Advanced Features
•  Security/Audit Logs/HIPAA data
•  Data Import/Export
•  Data Access Groups
•  Randomization
•  Double Data Entry
•  Project Linking
•  SQL Query Data Type
•  API
•  DTS
Data Security
Quality of Care
•  Which asthma patients are taking an inhaled corticosteroid?
•  Are we following the guidelines for young patients with hypertension?
•  * How am I doing as compared to others? (as a physician)
Tools for QI/QA and research
•  Recruiting patients at the point-of-care (CDSS – alerting systems)
•  Creating customized data entry forms that translate to coded data
•  Patient-reported outcomes questionnaires
•  Linking EHR data to information resources
Tools for QI/QA and research
•  Creating operational dashboard reports
•  Clinical flowsheets
•  Creating patient sets and storing patient data (Clinical registries)
•  Creating order sets (e.g., chemotherapy)
Narratives (texts)
Methods Development
Molecular data
BMIC will provide:
-  NLP, Imaging and Genomic
pipeline development
-  Computational methods for
image quantification
-  Molecular data analysis
-  Data mining
Images
Tools Development
Now that we have learned what people want, we
find ways to automate it so others
can also benefit. We develop tools that
our customers can use more
easily!
-  Re-usable pipelines for: image, genomic and
text analysis
-  Ontology editor (IntellaQ)
-  De-Identification and re-identification tools
(honest broker)
-  Secure computing solutions
-  Storage and computing solutions (for large
datasets)
Data Analysis
With data in their hands, researchers still use the
BMIC to help them analyze the data they got. The
analysis is certainly a mix of biostatistics and
biomedical informatics tools. The BMIC provides
expertise on:
-  Machine learning
-  Data mining
-  Big Data processing
-  Phenotyping
-  Genotyping
-  Complex image analysis
-  Ontology development
Infrastructure operations/support
The infrastructure provided by the BMI Core
depends on people and technology. Those
infrastructural services depend on many things being
available and operational:
-  Network
-  Servers
-  Databases
-  Applications
-  Security
Keeping the lights ON!
Tools for Research and Analytics
Analytics
•  Toolbox of Analytics Components
–  Patient similarity analytics
–  Predictive modeling
–  Clustering
–  Process mining
…
•  Key Properties
–  Scalable
–  Designed for sparse data
–  Computationally efficient for population-wide analyses
•  Data model designed for analytics
•  Separate model training and scoring phases
–  Learning techniques that can incorporate user feedback
[Figure labels: Query patient (x1, x2, …, xN); Patient population; Similarity Analysis; Patient similarity assessment in clinical factor/feature space]
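Patient similarity in a clinical feature space can be illustrated with a small sketch. The slides do not say which similarity measure is used, so cosine similarity is swapped in here as an assumption. Patients are represented as sparse dictionaries of feature weights, in keeping with the "designed for sparse data" property, and the feature names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse feature vectors (dicts)."""
    shared = a.keys() & b.keys()
    dot = sum(a[key] * b[key] for key in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, population, k=2):
    """Return the ids of the k patients most similar to the query patient."""
    scored = [
        (cosine_similarity(query, features), patient_id)
        for patient_id, features in population.items()
    ]
    # Highest similarity first; ties broken by patient id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [patient_id for _, patient_id in scored[:k]]


# Hypothetical population with binary clinical features.
EXAMPLE_POPULATION = {
    "A": {"dx:asthma": 1, "med:ics": 1},
    "B": {"dx:hypertension": 1},
    "C": {"dx:asthma": 1},
}
EXAMPLE_QUERY = {"dx:asthma": 1, "med:ics": 1}
```

Only features a patient actually has are stored, so the cost of each comparison depends on the number of recorded features rather than on the full dimensionality of the feature space.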
Visualization Analysis of Patient Cohorts
[Figure labels: Query patient (x1, x2, …, xN); Patient population; Similarity Analysis; Clinically similar to …; Patient similarity assessment in clinical factor/feature space; Visual outcome analysis]
Dashboard Techniques
•  “At-a-glance” summary of analytics results
Dashboards
Wide Range of Visualization Use Cases
•  Dashboard / “At a Glance”
–  e.g., “What is the patient’s risk score?”
•  In-Depth Exploratory Analysis
–  e.g., “Understanding impacts of variations in care”
–  Expertise through interaction
•  Key for scaling to “big data”
Visualization Analysis of Patient Cohorts
[Figure labels: Query patient (x1, x2, …, xN); Patient population; Similarity Analysis; Clinically similar to …; Patient similarity assessment in clinical factor/feature space; Visual cohort refinement; Visual outcome analysis]
Scaling for Big Data
[Figure labels: Many Features; Many Patients; Feature Selection; Cohort Selection; Additional Analytics; Feedback via Expert Interaction]
Disease surveillance
© 2011 IBM Corporation
http://healthmap.org/
Personal healthcare analytics?
Personal fitness monitoring
Acknowledgment of authors of slides included in this presentation:
•  Umberto Tachinardi, MD, PhD – UW Madison
•  Peter Embi, MD – OSU Biomedical Informatics
•  Philip Payne, PhD – OSU Biomedical Informatics
•  James Cimino, MD – CC/NIH
•  Lynn Vogel, PhD – MD Anderson
•  Sass Babyon, BS – UW Health
•  Stephanie Berkson – UW Health
•  Grace Flood, MD – UW Health
•  Kurt Riegel – UW Health
•  HIMC and BMI staff – UW Health/UW-Madison SMPH
All slides are copyright of the UW Health and UW-Madison SMPH, unless otherwise identified.
Thank you! Tak!
Questions? Spørgsmål?