Searching and Exploring Biomedical Data
Vagelis Hristidis
School of Computing and Information Sciences
Florida International University

Roadmap
Why is it challenging to search EMRs?
XOntoRank: Leveraging Ontologies to improve sensitivity in EMR search
ObjectRank: Use authority flow to rank EMR entities
BioNav: Using MeSH to explore the results of PubMed queries

ELECTRONIC MEDICAL RECORDS (EMRs)
Adoption of EMRs is hard due to political reasons
◦ No unique patient id
◦ Confidentiality
◦ HIPAA (Health Insurance Portability and Accountability Act)
Move towards XML-based formats. One of the most promising: Health Level 7's Clinical Document Architecture (CDA).
EMRs pose new challenges for Computer Scientists
◦ Confidentiality, authentication, secure exchange
◦ Storage, Scalability
◦ Dictionaries, term disambiguation
◦ Search for interesting patterns (Data Mining)
◦ Data Integration, Schema mapping
◦ Searching and Exploring

SAMPLE CDA FRAGMENT

CDA Document – Tree View

LIMITATIONS OF Traditional IR, General XML Search
Text-based search engines do not exploit the XML tags or the hierarchical structure of XML.
Treating the whole XML document as a single unit is unacceptable given the possibly large sizes of XML documents.
Proximity in XML can also be measured in terms of containment edges.
EMRs have known but complex semantics.
EMRs include free text, numeric data, time sequences, negative statements.
Routine references in EMRs to external information sources like dictionaries and ontologies.

Syntax vs. Semantics in Schema
Example – query "Asthma Theophylline"
More details at [Hristidis et al. NSF Symposium on Next Generation of Data Mining '07]

XOntoRank: Leverage Ontological Knowledge
Algorithm to enhance keyword search using ontological knowledge (e.g., SNOMED) [ICDE'08 poster, ICDE'09 full paper]
Fragment of the ontology ("Medical Dictionary"). Concepts shown:
◦ 301229001 Bronchial Finding
◦ 118946009 Disorder of Thorax
◦ 50043002 Disorder of Respiratory system
◦ 41427001 Disorder of Bronchus
◦ 79688008 Respiratory Obstruction
◦ 405944004 Asthmatic Bronchitis
◦ 195967001 Asthma
◦ 266364000 Asthma attack
◦ 82094008 Lower respiratory tract structure
◦ 955009 Bronchial Structure
Relationship labels shown: Is a, Finding site of, May be

Example 1
q = {"bronchitis", "albuterol"}
result = Observation node (code; value: Bronchitis; value: Albuterol)

Example 2
q = {"asthma", "albuterol"}
result = ???

XOntoRank
A CDA node may be associated with a query keyword w through the ontology.
XOntoRank first assigns scores to ontological concepts
◦ OntoScore OS(): Semantic relevance of a concept c in the ontology to a query keyword w.
Then, given these scores, assign Node Scores NS() to document nodes.
Other aggregation functions are possible.

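The two-step scoring described above can be illustrated with a minimal Python sketch. It is not the XOntoRank algorithm as published: the scoring formulas are not given in the text, so the sketch assumes an OntoScore of 1.0 for concepts that match the keyword, decaying by a fixed factor per hop in an undirected view of the ontology, and it assumes `max` as the aggregation used for the node score. The function names, the `decay` and `threshold` parameters, and the toy concept graph (which does not reproduce actual SNOMED relationships) are all hypothetical.

```python
from collections import deque


def onto_scores(edges, seeds, decay=0.5, threshold=0.1):
    """Assumed OntoScore: 1.0 at the seed concepts, multiplied by `decay`
    for every hop away from them; scores below `threshold` are dropped."""
    # Build an undirected adjacency map from the edge list.
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    scores = {c: 1.0 for c in seeds}
    queue = deque(seeds)
    # Breadth-first search: the first visit to a concept uses the fewest hops.
    while queue:
        concept = queue.popleft()
        next_score = scores[concept] * decay
        if next_score < threshold:
            continue
        for neighbour in adj.get(concept, ()):
            if neighbour not in scores:
                scores[neighbour] = next_score
                queue.append(neighbour)
    return scores


def node_score(text_score, concepts, os_scores):
    """Assumed Node Score: the larger of the node's own text-match score and
    the best OntoScore among the concepts the node references."""
    onto = max((os_scores.get(c, 0.0) for c in concepts), default=0.0)
    return max(text_score, onto)


# Hypothetical toy ontology (illustrative edges only).
toy_edges = [
    ("asthma", "asthmatic_bronchitis"),
    ("asthmatic_bronchitis", "bronchitis"),
    ("bronchitis", "disorder_of_bronchus"),
    ("disorder_of_bronchus", "disorder_of_respiratory_system"),
]

# Concepts whose label matches the keyword "asthma" act as seeds.
os_asthma = onto_scores(toy_edges, ["asthma"])

# A document node that does not contain "asthma" but references "bronchitis".
print(node_score(0.0, ["bronchitis"], os_asthma))
```

Under these assumptions, a document node that only references the concept "bronchitis" still receives a non-zero score for the keyword "asthma", which is the situation Example 2 raises.
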
Computing OntoScore of Concept Given Query Keyword
Three ways to view the ontology graph:
◦ As an unlabeled, undirected graph.
◦ As a taxonomy.
◦ As a complete set of relationships.

Authority Flow Ranking in EMRs
Query: "pericardial effusion"
A subset of the electronic health record dataset: seven nodes (v1 to v7) connected by associated_with, prescribed_to, prescribed_by and recorded_by edges. Node excerpts visible in the figure:
◦ Hospitalization: History="48 year old.."
◦ Medication: TimeStampCreated="2003-02-13 21:57:00.0"..
◦ Cardiac: PatientID="1438"
◦ Hospitalization: TimeStampCreated="2004-10-27 22:00:00.0", History="18 year old boy with an aggressive form of chest lymphoma…", Allergies="NKDA"…
◦ EventsPlan: TimeStampCreated="2004-11-03 11:57:00.0", Events="….small residual pericardial effusion….."
◦ Employee: TimeStampCreated="2004-12-23 14:03:00.0", Title="Pediatric Cardiologist"….
◦ EventsPlan: Events="4 month old baby… pericardial effusion...", Complication="apical impulse … Echo large increasing pericardial effusion…"
Work under submission.

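Authority-flow ranking of the kind the roadmap names (ObjectRank) can be sketched as a random walk that restarts at the nodes containing the query keywords. The code below is a generic ObjectRank-style power iteration under stated assumptions, not the Clinical ObjectRank configuration used in this work: the node names, edge types, per-edge-type transfer rates, the damping factor `d` and the iteration count are hypothetical.

```python
from collections import Counter


def edge_weights(typed_edges, rates):
    """Split each edge type's transfer rate evenly among a node's
    outgoing edges of that type."""
    out_deg = Counter((src, etype) for src, etype, _ in typed_edges)
    weights = {}
    for src, etype, dst in typed_edges:
        w = rates[etype] / out_deg[(src, etype)]
        weights[(src, dst)] = weights.get((src, dst), 0.0) + w
    return weights


def authority_flow(weights, base_set, d=0.85, iters=50):
    """Power iteration: with probability d follow a weighted edge,
    otherwise jump back to a node of the base set (keyword matches)."""
    nodes = {n for edge in weights for n in edge} | set(base_set)
    jump = {n: (1.0 / len(base_set) if n in base_set else 0.0) for n in nodes}
    rank = dict(jump)
    for _ in range(iters):
        nxt = {n: (1.0 - d) * jump[n] for n in nodes}
        for (src, dst), w in weights.items():
            nxt[dst] += d * w * rank[src]
        rank = nxt
    return rank


# Hypothetical data graph: (source, edge type, destination).
typed_edges = [
    ("events_1", "associated_with", "hosp_1"),
    ("events_2", "associated_with", "hosp_1"),
    ("hosp_1", "of", "patient_1"),
    ("events_1", "recorded_by", "employee_1"),
    ("hosp_2", "of", "patient_2"),
]
# Hypothetical authority transfer rates per edge type.
rates = {"associated_with": 0.5, "of": 0.7, "recorded_by": 0.3}

# Base set: nodes assumed to contain the query phrase.
rank = authority_flow(edge_weights(typed_edges, rates), ["events_1", "events_2"])

for node, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.4f}")
```

In this toy graph, nodes that never mention the query phrase (the hospitalization, patient and employee connected to the matching events) still receive authority, while the unconnected `hosp_2` and `patient_2` receive none.
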
Authority Flow Ranking
Schema of the EMR dataset
Labels visible in the schema figure: entities Events, Employee, Patient, Hospitalization, Medication; edges created_by, for, of, prescribed_by, prescribed_to, recorded_by; pair labels A-E, A-H, P-M, H-E, M-E, P-E, H-M; and a truncated label "Associated_".

User Study

Explaining Subgraph

User Study Results
Two bar charts (vertical axis from 0 to 1): Mean Sensitivity and Mean Specificity, each comparing CO085BM25, BM25, CO085 and CO030.
BM25: Traditional Information Retrieval Ranking Function
CO: Clinical ObjectRank (Authority Flow)

Biological Databases (cont'd) – Results Navigation [ICDE09, TKDE 2010]
With SUNY Buffalo. Demo at http://db.cse.buffalo.edu/bionav/
Most publications in PubMed are annotated with Medical Subject Headings (MeSH) terms.
Present results in MeSH tree.
Propose navigation model and smart expansion techniques that may skip tree levels.

BioNav: Exploring PubMed Results
- Query Keyword: prothymosin
- Number of results: 313
- Navigation Tree stats:
  • # of nodes: 3941
  • depth: 10
  • total citations: 30897
Big tree with many duplicates!
MESH (313)
  Amino Acids, Peptides, and Protei
    Proteins (307)
      Nucleoproteins (40)
        Histones (15)
        4 more nodes
      45 more nodes
    2 more nodes
  Biological Phenomena, … (217)
    Cell Physiology (161)
      Cell Growth Processes (99)
      15 more nodes
    3 more nodes
  Genetic Processes (193)
    Gene Expression (92)
      Transcription, Genetic (25)
      1 more node
    10 more nodes
  95 more nodes
Static Navigation Tree for query "prothymosin"

BioNav: Exploring PubMed Results
Reveal to the user a selected set of descendant concepts that:
(a) Collectively contain all results
(b) Minimize the expected user navigation cost
Unlike static navigation, not all children of the root are necessarily revealed.

BioNav Evaluation
Bar chart of Overall Navigation Cost (# of Concepts Revealed + # of EXPAND Actions), comparing Static and BioNav.

References
Abhijith Kashyap, Vagelis Hristidis, Michalis Petropoulos, and Sotiria Tavoulari. Effective Navigation of Query Results Based on Concept Hierarchies. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2010.
Fernando Farfán, Vagelis Hristidis, Anand Ranganathan, and Michael Weiner. XOntoRank: Ontology-Aware Search of Electronic Medical Records. IEEE International Conference on Data Engineering (ICDE), 2009.
Abhijith Kashyap, Vagelis Hristidis, Michalis Petropoulos, and Sotiria Tavoulari. BioNav: Effective Navigation on Query Results of Biomedical Databases. IEEE International Conference on Data Engineering (ICDE), 2009.
Vagelis Hristidis, Fernando Farfán, Redmond P. Burke, Anthony F. Rossi, and Jeffrey A. White. Information Discovery on Electronic Medical Records. National Science Foundation Symposium on Next Generation of Data Mining and Cyber-Enabled Discovery for Innovation (NGDM), 2007.

Supported by
NSF IIS-0811922: Information Discovery on Domain Data Graphs, 2008-2011
NSF CAREER IIS-0952347, 2010-2015