Adding more value by smart querying and Natural Language Processing

Eneida Mendonca, MD, PhD, FACMI, FAAP
Associate Professor, Pediatrics, Biostatistics and Medical Informatics
Industrial and Systems Engineering (Affiliated)
Assistant Director, Clinical/Health Informatics
Institute for Clinical and Translational Research (ICTR)
University of Wisconsin - Madison

Healthcare is Changing
Consortium on Technology for Proactive Care, Northeastern University
• EPISODIC, REACTIVE → PROACTIVE and PREVENTIVE
• FOCUS ON DISEASE → FOCUS ON WELLBEING, QUALITY OF LIFE
• HOSPITAL-CENTRIC → PATIENT-CENTRIC, HOME-BASED
• TRAINING & EXPERIENCE BASED → MORE EVIDENCE-BASED DECISION SUPPORT
• FRAGMENTED, LOCAL DATA → INTEROPERABLE, EHR AVAILABLE ANYWHERE, ANYTIME
• PASSIVE PATIENTS → EMPOWERED, ENGAGED, INFORMED, PARTICIPATING

Big Data in Healthcare
• EHR: types and sources
  – Structured
    • Coded (e.g. ICD, SNOMED, CPT)
    • Numerical values (e.g. temperature, BP, HR, age)
  – Unstructured (~ 95%)
    • Text (e.g. discharge summary, email)
    • Image (e.g. radiological, microscopy, photos)
    • Audio (e.g. dictation, heart murmur)
• Other data:
  – Geographic coding + environmental data
  – Biobanking
  – Census data
  – Tumor registries

Weber GM, Mandl KD, Kohane IS. Finding the Missing Link for Big Biomedical Data. JAMA.
Published online May 22, 2014.

Making healthcare smarter

Challenges for Patient Identification & Intervention
• Data Challenges
  – Large Scale: Up to 10s of millions of patients
  – High Dimensionality: Thousands of dimensions spanning many years
  – Semi-Structured: Clinical notes, imaging, medical codes
  – Distributed: Multiple providers and representations
  – Sparse and Irregular: Periodic visits, different for each person
  – Uncertain: Subjective, data entry errors, bias for billing
  – Incomplete: Many items missing from the medical record
• Task Challenges
  – Critical decisions: May literally mean life or death
  – No clear right answer: Evidence is often ambiguous
  – Limited time: Manage complexity, multiple granularity
  – Domain experts are people…
    • Too much (or too little) trust in numbers
    • “But my patients are different…”
    • Users resistant to technology change

Biomedical Informatics Core
“Provide ICTR with computational and informational infrastructure and services by developing and implementing technologies necessary for clinical and translational research investigators at UW-Madison and the Marshfield Clinic Research Foundation (MCRF).”

BMI Core expertise
Lead: Umberto Tachinardi - Co-Lead (MCRF): Amit Acharya
UW Faculty; 7 PhDs; UW Staff; 6 Masters; 2 Students
Marshfield Clinic Leaders: A. Acharya, P. Peissig
Area Leaders
• Bioinformatics: M. Colin
• Imaging Informatics: R. Bruce
• Clinical Informatics: E. Mendonca
• IT Services: T. Mish (J. BooP UWCCC)

Clinical Informatics
Theoretical
• Clinical decision support
• Data mining
• Quality, outcomes, predictive modeling
• Data modeling
• Controlled terminologies
• Privacy and Security
Practical
• Design research databases and implement OnCore and REDCap applications
• Natural language processing of Clinical Narratives
• Regulatory compliance of clinical research databases
• Development of algorithms for complex queries
• Ontology development tool (IntellaQ)

Pre-award consultation
Researchers want data …
…when they find us (BMIC), we help them learn that:
- From hypothesis to data retrieval is not always a straight line
- Identifying a cohort may need specific definitions not in the data
- They don’t know if the data is available
- They may need:
  - authorization to get the data
  - to secure/protect data/privacy
  - new methods to be developed (e.g. NLP, imaging/genomics pipelines)
  - to share data, and learn how to do that
  - access to tools (mostly BMIC-operated)

Cohort identification
• Study feasibility counts
  – Do I have a patient population large enough to draw a statistically meaningful sample?
    • Internal studies
    • Collaborative studies (e.g., PCORnet)
  – Define phenotypes of interest
  – Create internal mapping for data definitions from network EHR and clinical studies
• Planning and recruitment
  – Site Identification, Investigator Identification
  – Predicting Enrollment
  – Facilitating Recruitment...
• Regarding recruitment
  – Clinical data can expedite identification of subjects
    • Used to pre-identify
    • Generate mailing/call lists
  – EHRs can facilitate evaluation and enrollment
    • Review of chart made easier
    • Privacy issues offer some

Query building/execution
After learning what they need, and how to get that…
…they need data to be collected and presented:
- BMI people help them translate what they say they want to what is available (e.g. map between English and ICD/SNOMED, cluster lab data)
- Transform the mapped request into database language (SQL or i2b2 tool)
- Execute the code and retrieve data
- Prepare data for delivery
  - cleansing
  - quality check
  - reruns
  - secure and package
- Deliver!

What is i2b2?
• Software used for finding anonymous patient cohorts, as well as identifying patients once research begins
• Simplifies research across multiple clinical data sets
• Standardized platform for academic and clinical research across organizations
• Answers the question: Do I have a patient population large enough to draw a statistically meaningful sample?

i2b2 Web Client
Grouping Boxes (Query Logic)

i2b2 Data
• Patient Information
  – Demographic Information
  – Age
  – Race
  – Ethnicity
  – Gender
  – Marital Status
  – Location (Postal Code)
• Biometric Information
  – Height
  – Weight
  – Body Mass Index (BMI)
  – Blood Pressure
    • Systolic
    • Diastolic
• Patient History Information
• Social History
  – Alcohol Use History
  – Drug Use History
  – Tobacco Use History

i2b2 Data
• Diagnosis Information
  – ICD-9-CM Codes
  – ICD-O Codes
• Procedure Information
  – CPT Codes
  – HCPCS Codes
  – ICD-9-CM Procedure Codes
• Allergies and Allergens Information
  – Allergy Information
  – Allergen Information
• Vaccine and Immunization Information
  – CVX Codes
  – MVX Codes
• Medication Information
  – GPI Codes
• Lab Information
  – LOINC Codes
• Encounter Information
  – Age at Encounter
  – Length of Stay
  – Encounter Location (Postal Code)
  – Encounter Type
  – Patient Class

De-identified Data Set
• De-identified Data Set i2b2 instance:
  – 18 forms of PHI are removed
  – 3-digit zip code
    • wildcard concept (53703 becomes 537)
    • if the US Census population is small, 000 is displayed
  – All dates are shifted by a given random seed (+/- 30)
  – Age is displayed by year only
    • full birth year is not exposed
    • Patients >= 90 referred to by a single concept
• Based on safe harbor

Potential users for i2b2
• UW Health Researchers
• Professionally Credentialed Physicians
• Non-Accredited Fellows
• Accredited Fellows
• Residents
• Physician Assistants & Nurse Practitioners
  – “Don’t call them mid-levels!”
• Affiliated Physicians (i.e. DFM PCPs)
• IT related groups
  – IS (formerly known as ITS), HIMC, BPAD, DABI, etc.

Some i2b2 numbers
Concept category: Number of occurrences
• NUMBER OF PATIENTS: 2,490,638
• GENDER FEMALE: 1,246,647
• GENDER MALE: 1,202,972
• GENDER UNKNOWN: 41,019
• BIOMETRIC: NUMBER OF PATIENTS: 756,781
• BIOMETRIC: BMI: 3,792,083
• BIOMETRIC: DIASTOLIC: 6,812,531
• DIAGNOSIS: BILL: 13,564,794
• DIAGNOSIS: ENCOUNTER: 66,688,519
• PROCEDURE: ORDERED: 65,683,720
• LAB TOTAL, LAB NUMBER OF PATIENTS, MEDICATION TOTAL, MEDICATION NUMBER OF PATIENTS, SOCIAL HISTORY: TOTAL, SOCIAL HISTORY: NUMBER OF PATIENTS
• Counts listed for these six categories: 256,629,737; 887,676; 12,090; 766,673; 64,589,592; 797,963

Examples of requests: Study feasibility counts
We are interested in women whose breast cancer returned after more than 15 years. How many women fit this category? We are also interested in any biobanked material. Might there be biopsy samples or other materials also maintained for these women?

Examples of requests: Data
Our IRB protocol states that we will submit a request to obtain a list of patients that have ICD-9 codes associated with inflammatory diseases of the GI tract. Our stated time period in our IRB application is 1/1/2010 through 12/31/2015.

Requesting data
– Inclusion and exclusion criteria
– Inclusion Date Range
– Study population (age, gender, race, ethnicity)
– Data elements needed
– Define the constraints (filters)
  • Inpatient vs. Outpatient
  • “Medical home” here
  • Department restrictions
– Data format for delivery

NLP Infrastructure
Unstructured Information Management Architecture (UIMA)

Other uses of text
• Template sentences
• Pre-configured templates (sentences) often used to document visits
• Pre-configured list of choices used for certain documentation (e.g., signs and symptoms related to particular problems, medication lists)
• Pre-configured forms, context dependent (coded + text)

Query building/execution
After learning what they need, and how to get that…
…they need data to be collected and presented:
- BMI people help them translate what they say they want to what is available (e.g. map between English and ICD/SNOMED, cluster lab data)
WAIT! Data is NOT structured! (not a number or a code)

Query building/execution
[Figure: research data needs; honest broker; i2b2 (automatic querying of structured data); EDW (human-operated data querying); NLP; EHR; data]

Research Data Workflow (current)
[Figure: requests for data (clinical, quality, research, unknown/mixed); ServiceNow; triage; DABI, HIMC, BMI Core, Epic team, BPAD]

Query workflow
[Figure: enter request; triage; aggregate count; requires IRB; QI/QA study; assess feasibility; “scientific” check; compliance check; query is validated; data prepared for delivery; REDCap]

Research Administration
Protocols
• Basic Protocol Setup
• Regulatory Tracking (e.g. SRC, IRB, CTRC)
• Protocol activation
• DSMC Reviews
Subjects
• Screening
• Registration
• Consenting
• Eligibility
• Subject Status
• SAEs
• Deviations

Protocol Setup & Management
Calendars
• Procedures & Evaluations
• Treatment & FU Schedules
• Visit Tolerances
• Foot Notes
eCRFs
• Forms Design
• Assign forms to studies
Financials
• Study Budgets
• Negotiated Rates
• SOC vs. Research
• CMS compliance assistance
• Payment Milestones

Study Data Management
Visit Tracking & Data Capture
• Visits
  – Automated subject calendars
  – Visit & Procedure Tracking
  – Additional Visits & Procedures
  – SOC vs. Research
• eCRF completion
• Query resolution
• Data Monitoring
• Data Discrepancy Management
• Data Export
Financial Management
• Automated invoice items
• Generate Invoices
• Study Payment tracking
• Unplanned visits & procedures
• Exceptions

OnCore vs REDCap
ICTR OnCore CRM is required for the following protocols:
• All therapeutic intervention protocols being conducted within the auspices of UW-Madison or the UW-Madison Affiliated Covered Entity (ACE) that involve the use of:
  – Drugs (including over-the-counter medications, vitamins, herbals, and supplements)
  – Biologics (including vaccines and stem cells)
  – Devices (including Emergency, One-Time and Humanitarian Use Device)
  – Radiation therapy or medical imaging
• Protocols that use an investigator-held Investigational New Drug (IND) or Investigational Device Exemption (IDE)
• Protocols that use University of Wisconsin Hospital and Clinics (UWHC) or University of Wisconsin Medical Foundation (UWMF) ancillary services (including Clinical Research Unit) that will result in UWHC or UWMF charges or fees that will be billed to the subject or research account (NOTE: subjects participating in this protocol category must also be registered through UWHC Patient Registration)
• Protocols under the purview of the ICTR Data Monitoring Committee

REDCap - Standard Features
• Supports “Classic” & “Longitudinal” projects
• Surveys
• eCRF data entry typing & validation
• REDCap Shared Library
• Data Dictionary
• Branching Logic
• Multi-Arm and Multi-event studies
• Record Management

REDCap - Data Tools
• Data Import/Export tools
• Built-in data quality tools
  – Basic analytics tools
  – Data quality checks
• Standard reporting
• Data Security
  – User Permissions/Roles
  – Data Access groups

REDCap - Advanced Features
• Security/Audit Logs/HIPAA data
• Data Import/Export
• Data Access Groups
• Randomization
• Double Data Entry
• Project Linking
• SQL Query Data Type
• API
• DTS
• Data Security

Quality of Care
• Which asthma patients are taking an inhaled corticosteroid?
• Are we following the guidelines for young patients with hypertension?
• How am I doing as compared to others? (as a physician)

Tools for QI/QA and research
• Recruiting patients at the point-of-care (CDSS – alerting systems)
• Creating customized data entry forms that translate to coded data
• Patient-reported outcomes questionnaires
• Linking EHR data to information resources

Tools for QI/QA and research
• Creating operational dashboard reports
• Clinical flowsheets
• Creating patient sets and storing patient data (Clinical registries)
• Creating order sets (e.g., chemotherapy)

Methods Development
Narratives (texts), molecular data, images
BMIC will provide:
- NLP, Imaging and Genomic pipeline development
- Computational methods for image quantification
- Molecular data analysis
- Data mining

Tools Development
Now that we have learned what people want, we find ways to automate it so others can also benefit. We develop tools that can more easily be used by our customers!
- Re-usable pipelines for: image, genomic and text analysis
- Ontology editor (IntellaQ)
- De-identification and re-identification tools (honest broker)
- Secure computing solutions
- Storage and computing solutions (for large datasets)

Data Analysis
With data in their hands, researchers still use the BMIC to help them analyze it. The analysis is certainly a mix of biostatistics and biomedical informatics tools. The BMIC provides expertise on:
- Machine learning
- Data mining
- Big Data processing
- Phenotyping
- Genotyping
- Complex image analysis
- Ontology development

Infrastructure operations/support
The infrastructure provided by the BMI Core depends on people and technology. Those infrastructure services depend on many things being available and operational:
- Network
- Servers
- Databases
- Applications
- Security
Keeping the lights ON!
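
The de-identification rules listed earlier for the i2b2 instance (3-digit zip code, shifted dates, ages of 90 and over pooled into one concept) can be sketched in a few lines of Python. This is a minimal illustration, not the honest broker tool itself: the field names, the list of small-population ZIP prefixes, the per-patient date offset, and reading "+/- 30" as days are all assumptions that the slides do not state.

```python
import datetime as dt
import random

# Hypothetical ZIP3 prefixes with a small census population. The slides give
# neither the list nor the population threshold; these are placeholders.
SMALL_POPULATION_ZIP3 = frozenset({"036", "059"})


def zip3(postal_code: str, small_prefixes=SMALL_POPULATION_ZIP3) -> str:
    """Keep the first three ZIP digits; show 000 for small-population areas."""
    prefix = postal_code.strip()[:3]
    return "000" if prefix in small_prefixes else prefix


def shift_days(patient_id: str, seed: str, max_days: int = 30) -> int:
    """Derive a repeatable offset in [-max_days, +max_days] for one patient.

    Assumption: one offset per patient, in days, derived from a secret seed.
    """
    rng = random.Random(f"{seed}:{patient_id}")
    return rng.randint(-max_days, max_days)


def shift_date(date: dt.date, patient_id: str, seed: str) -> dt.date:
    """Move a date by the patient's offset, preserving intervals in a record."""
    return date + dt.timedelta(days=shift_days(patient_id, seed))


def display_age(age_years: int) -> str:
    """Show age in whole years, pooling everyone aged 90 or over."""
    return "90+" if age_years >= 90 else str(age_years)


def deidentify(record: dict, seed: str) -> dict:
    """Apply the three rules to a record that uses hypothetical field names."""
    pid = record["patient_id"]
    return {
        "zip3": zip3(record["postal_code"]),
        "age": display_age(record["age_years"]),
        "encounter_date": shift_date(record["encounter_date"], pid, seed),
    }
```

Using the same offset for every date of a given patient keeps lengths of stay and time between encounters intact, which is one common reason for shifting dates rather than removing them.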
Tools for Research and Analytics

Analytics
• Toolbox of Analytics Components
  – Patient similarity analytics
  – Predictive modeling
  – Clustering
  – Process mining
  – …
• Key Properties
  – Scalable
  – Designed for sparse data
  – Computationally efficient for population-wide analyses
    • Data model designed for analytics
    • Separate model training and scoring phases
  – Learning techniques that can incorporate user feedback
[Figure: Similarity Analysis. A query patient (x1, x2, …, xN) is compared with each member of the patient population (x1, x2, …, xN): patient similarity assessment in clinical factor/feature space]

Visualization Analysis of Patient Cohorts
[Figure: query patient; "Clinically similar to …?"; similarity analysis over the patient population (patient similarity assessment in clinical factor/feature space); visual outcome analysis]

Dashboard Techniques
• “At-a-glance” summary of analytics results

Wide Range of Visualization Use Cases
• Dashboard / “At a Glance”
  – e.g., “What is the patient’s risk score?”
• In-Depth Exploratory Analysis
  – e.g., “Understanding impacts of variations in care”
  – Expertise through interaction
• Key for scaling to “big data”

Visualization Analysis of Patient Cohorts
[Figure: query patient; "Clinically similar to …?"; similarity analysis over the patient population (patient similarity assessment in clinical factor/feature space); visual cohort refinement; visual outcome analysis]

Scaling for Big Data
[Figure: many features, many patients; feature selection; cohort selection; additional analytics; feedback via expert interaction]

Disease surveillance
http://healthmap.org/
© 2011 IBM Corporation

Personal healthcare analytics?
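
The patient similarity assessment described in the Analytics slides above can be illustrated with a small sketch. The slides do not name a similarity measure, so cosine similarity is used here purely as an assumption; the dictionary representation reflects the "designed for sparse data" property, and the patients and feature names are made up.

```python
import math


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between sparse vectors stored as {feature: value}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector only
    dot = sum(value * b.get(feature, 0.0) for feature, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a patient with no recorded features matches no one
    return dot / (norm_a * norm_b)


def most_similar(query: dict, population: dict, top_k: int = 3) -> list:
    """Rank the population by similarity to the query patient, best first."""
    scored = [
        (patient_id, cosine_similarity(query, features))
        for patient_id, features in population.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up patients and features, for illustration only.
population = {
    "A": {"dx:asthma": 1.0, "rx:inhaled_corticosteroid": 1.0},
    "B": {"dx:asthma": 1.0, "dx:hypertension": 1.0},
    "C": {"dx:hypertension": 1.0},
}
query = {"dx:asthma": 1.0, "rx:inhaled_corticosteroid": 1.0}
ranked = most_similar(query, population)
```

Only features that are actually present are stored, so the cost of one comparison grows with the number of recorded features per patient rather than with the thousands of possible dimensions.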
Personal fitness monitoring

Acknowledgment of authors of slides included in this presentation:
• Umberto Tachinardi, MD, PhD – UW Madison
• Peter Embi, MD – OSU Biomedical Informatics
• Philip Payne, PhD – OSU Biomedical Informatics
• James Cimino, MD – CC/NIH
• Lynn Vogel, PhD – MD Anderson
• Sass Babyon, BS – UW Health
• Stephanie Berkson – UW Health
• Grace Flood, MD – UW Health
• Kurt Riegel – UW Health
• HIMC and BMI staff – UW Health/UW-Madison SMPH

All slides are copyright of UW Health and UW-Madison SMPH, unless otherwise identified.

Thank you! Tak!
Questions? Spørgsmål?