Knowledge-based Information Management for Biomedical Applications

Wesley Chu
Computer Science Department
University of California, Los Angeles, CA
[email protected]
www.kmed.cs.ucla.edu

Outline
  - Data types
  - Uses of knowledge bases to enhance information management
  - Sample systems
      - Structured data
      - Multi-media
      - Free-text
  - Conclusion

Information Formats Used in Biomedical Applications
  - Structured data
  - Multi-media
  - Images
  - Semi-structured
  - Free-text

Uses of Knowledge Bases to Enhance Information Management
  - Approximate matching
      - Query conditions
      - Image features
      - Similar conceptual terms

Uses of Knowledge Bases to Enhance Information Management (cont.)
  - KB query processing
      - Similarity query answering
      - Associative query answering
      - Scenario-specific query answering
  - Sentinel: triggering and alerting

Examples of KB Information Systems
  - CoBase (1990-1998), DARPA: a database that cooperates with the user, for structured data
  - KMeD (1991-2000), NSF: a knowledge-based medical multi-media database
  - Medical Digital Library (2001-2005), NIH: a knowledge-based digital file room for patient care, education, and research

CoBase
www.cobase.cs.ucla.edu
  Project leader: Wesley W. Chu
  Graduate students: K. Chiang, C. Larson, R. Lee, M. Merzbacher, M. Minock, Frank Meng, Wenlei Mao, Mark Yang, K. Zhang
  Staff: Q. Chen, Gladys Chow, Hua Yang

CoBase: Cooperative Databases
  Conventional query answering
    - Need to know the detailed database schema
    - Cannot get approximate answers
    - Cannot answer conceptual queries
  Cooperative query answering
    - Derive approximate answers
    - Answer conceptual queries
    - Provide additional relevant answers that the user does not (or does not know how to) ask for

Cooperative Queries
  [Figure labels: Heterogeneous Information Sources, CoBase Servers, Domain Knowledge]
  Example queries
    - Find a nearby friendly airport where an F-15 can land
    - Find hospitals with facilities similar to St. John's near LAX
    - Find a seaport with a railway facility in Los Angeles
  CoBase provides
    - Relaxation
    - Approximation
    - Association
    - Explanation

Generalization and Specialization
  - Generalization: specific query -> conceptual query -> more conceptual query
  - Specialization: more conceptual query -> conceptual query -> specific query

Cooperative Querying for Medical Applications
  - Query: Find the treatment used for the tumor similar-to (loc, size) X1 on 12-year-old Korean males.
  - Relaxed query: Find the treatment used for the tumor Class X on preteen Asians.
  - Association: the success rate, side effects, and cost of the treatment.

Type Abstraction Hierarchies for Medical Domain
  Tumor (location, size)
    - Class X [loc1 loc3] [s1 s3]
        - X1 [loc1 s1]
        - X2 [loc2 s2]
        - X3 [loc3 s3]
    - Class Y [locY sY]
  Age
    - Preteens: 9, 10, 11, 12
    - Teen
    - Adult
  Ethnic group
    - Asian: Korean, Chinese, Japanese, Filipino
    - African
    - European

KB: Type Abstraction Hierarchy
  - Uses clustering techniques to group similar
      - Attribute values
      - Image features
      - Spatial relationships among objects
  - Provides a multi-level (conceptual) knowledge representation

Data Mining for TAH for Numerical Attribute Values
  - Clustering metric: relaxation error
      - Difference between the exact value and the returned approximate value
      - Relaxation error is weighted by the probability of occurrence of each value
      - Can be extended to multiple attributes

Query Relaxation
  [Figure: flowchart. Query -> Database -> Answers? If yes: Display. If no: Relax Attribute (using TAHs) -> Query Modification -> Query.]

Summary: CoBase
  - Derive approximate answers
  - Answer conceptual queries
  - Provide associative query answers

KMeD
www.kmed.cs.ucla.edu
  PI: Wesley Chu, Ph.D., Computer Science Department
  Co-PIs: A. Cardenas, Ph.D., Computer Science Department; Ricky Taira, Ph.D., School of Medicine
  Graduate students: Alex Bui, Christina Chu, John Dionisio, T. Plattner, D. Johnson, C. Hsu, T. Ieong
  Consultants: Denise Aberle, M.D.; C.M. Breant, Ph.D.

KMeD Goal: Retrieval of Images by Features and Content
  - Features: size, shape, texture, density, histology
  - Spatial relations: angle of coverage, shortest distance, overlapping ratio, contact ratio, relative direction
  - Evolution of object growth: fusion, fission

Characteristics of Medical Queries
  - Multimedia
  - Temporal
  - Evolutionary
  - Spatial
  - Imprecise

Knowledge-Based Image Model
  [Figure: three levels.]
    - Knowledge level: TAHs on tumor size, SR(t,b), and SR(t,l)
    - Schema level: Brain, Tumor, Lateral Ventricle, linked by SR(t,b) and SR(t,l)
    - Representation level (features and content)
  Legend: SR: spatial relation; b: brain; t: tumor; l: lateral ventricle

Knowledge-Based Query Processing
  [Figure: Queries -> Query Analysis and Feature Selection -> Knowledge-Based Content Matching via TAHs -> Query Relaxation -> Query Answers]

User Model
  Customizes the system to each user's interests, preferences, needs, and goals (e.g., query conditions, relaxation control).
  - User type
  - Default parameter values
  - Feature and content matching policies
      - Complete match
      - Partial match

User Model (cont.)
  - Relaxation control policies
      - Relaxation order
      - Unrelaxable object
      - Preference list
      - Measure for ranking
  - Triggering conditions

Query Preprocessing
  - Segment and label contours for objects of interest
  - Determine relevant features and spatial relationships (e.g., location, containment, intersection) of the selected objects
  - Organize the features and spatial relationships of objects into a feature database
  - Classify the feature database into a Type Abstraction Hierarchy (TAH)

Similarity Query Answering
  - Determine relevant features based on query input
  - Select a TAH based on these features
  - Traverse the TAH nodes to match all the images with similar features in the database
  - Present the images and rank their similarity (e.g., by mean square error)

Visual Query Language and Interface
  - Point-click-drag interface
  - Objects may be represented by icons
  - Spatial relationships among objects are represented graphically

Visual Query Example
  - Retrieve brain tumor cases where a tumor is located in the region indicated in the picture

Summary: KMeD
  - Image retrieval by feature and content
  - Matching images based on features
  - Processing of queries based on spatial relationships among objects
  - Answering of imprecise queries
  - Expression of queries via a visual query language
  - Integrated view of temporal multimedia data in a timeline metaphor

Medical Digital Library
www.kmed.cs.ucla.edu
  Project leader: Wesley W. Chu
  Graduate students: Victor Z. Liu, Wenlei Mao, Qinghua Zou
  Consultants: Hooshang Kangarloo, M.D.; Denise Aberle, M.D.
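
The CoBase and KMeD slides describe relaxation error and TAH-driven query relaxation only in outline. The following is a minimal Python sketch of both ideas, not the CoBase implementation. It assumes a single numerical attribute, reads "weighted by the probability of occurrence" as the expected absolute difference between two values drawn from the same cluster, and ranks answers by squared error. The names `TAHNode`, `find_leaf`, `relaxed_query`, and the teen and adult age bounds are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


def relaxation_error(values):
    """Expected |x_i - x_j| inside a cluster, weighting each value by its
    frequency (an assumed reading of the slide's definition)."""
    counts = Counter(values)
    n = len(values)
    return sum((ci / n) * (cj / n) * abs(xi - xj)
               for xi, ci in counts.items()
               for xj, cj in counts.items())


@dataclass
class TAHNode:
    """One node of a type abstraction hierarchy over a numeric range."""
    label: str
    low: float
    high: float
    children: list = field(default_factory=list)
    parent: Optional["TAHNode"] = field(default=None, repr=False)

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def find_leaf(node, value):
    """Descend to the most specific node whose range covers the value."""
    for child in node.children:
        if child.low <= value <= child.high:
            return find_leaf(child, value)
    return node


def relaxed_query(root, records, target, min_answers):
    """Start at the most specific node; generalize one level at a time
    until enough answers are found or the root is reached."""
    node = find_leaf(root, target)
    while True:
        hits = [r for r in records if node.low <= r <= node.high]
        if len(hits) >= min_answers or node.parent is None:
            # Rank answers by squared error from the requested value.
            return node.label, sorted(hits, key=lambda r: (r - target) ** 2)
        node = node.parent


# Age hierarchy loosely following the slide (preteens 9-12); the other
# ranges are illustrative assumptions.
root = TAHNode("all ages", 0, 120)
preteen = root.add(TAHNode("preteen", 9, 12))
root.add(TAHNode("teen", 13, 19))
root.add(TAHNode("adult", 20, 120))
preteen.add(TAHNode("9-10", 9, 10))
preteen.add(TAHNode("11-12", 11, 12))
```

Calling `relaxed_query(root, [9, 10, 11, 14, 35], 12, 3)` first tries the "11-12" node, finds too few answers, and relaxes to "preteen". This mirrors how the example query about 12-year-olds is relaxed to preteens.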
Data Types Used in a Medical Digital Library
  - Structured data (patient lab data, demographic data, ...): CoBase
  - Images (X-rays, MRI, CT scans): KMeD
  - Free-text (patient reports, teaching files, literature, news articles): FTRS (free-text retrieval system)

A Free-Text Retrieval System (FTRS)
  [Figure: inputs -> knowledge-based Free-Text Retrieval System (FTRS) -> query results]
  Inputs
    - Ad hoc query
    - Patient report for content correlation
  Query results
    - Patient reports
    - Medical literature
    - Teaching materials
    - News articles

A Sample Patient Report
  ...
  Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE)
  ...
  FINAL DIAGNOSIS:
  - LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION):
  - LUNG CANCER, SMALL CELL, STAGE II.
  ...

Scenario-Specific Retrieval
  [Figure: the sample patient report (lung cancer, small cell, stage II) leads to two scenarios.]
    - How to diagnose the disease? -> diagnosis-related articles
    - How to treat the disease? -> treatment-related articles

Challenge I: Indexing for Free-Text
  - Extracting key concepts in the free-text for indexing
      - Free-text: Lung cancer, small cell, stage II
      - Concept term in knowledge source: stage II small cell lung cancer
  - Conventional methods use NLP
      - Not scalable

Challenge II: Mismatch Between Terms Used in Query and Documents
  Example
    - Query: ... lung cancer ...
    - Document 1: ... lung carcinoma ... (?)
    - Document 2: ... lung neoplasm ... (?)
    - Document 3: anti-cancer drug combinations ... (?)

Challenge III: Terms Used in the Query Are Too General
  - Expand the general terms in the query into the specific terms that are used in the document
      - Query: lung cancer, diagnosis options (no match)
      - Expanded query: lung cancer, chest x-ray, bronchography, ... (match)
      - Document: ... the effectiveness of chest x-ray and bronchography on patients with lung cancer ...

A Medical KB: Unified Medical Language System (UMLS)
  - Metathesaurus: controlled vocabulary (1.6M biomedical phrases, representing 800K concepts)
  - Semantic Network: classifies concepts into classes (e.g., disease and syndrome, treated by, therapeutic procedure)
  - SPECIALIST Lexicon

Using Knowledge Sources to Resolve These Challenges
  - Challenge I: Automatic indexing of free text
  - Challenge II: Mismatch between terms in the query and the documents
  - Challenge III: Terms in the query are too general

IndexFinder: Extracting Domain-Specific Key Concepts
  Technique
    - Permute words from the text to generate concept candidates.
    - Use the knowledge base to select the valid candidates.
  Problem
    - Valid candidates may be irrelevant to the document.
    - Redundant concepts

Filtering out Irrelevant Concepts
  - Syntactic filter: limit the permutation of words to within a sentence.
  - Semantic filter
      - Use the semantic type (e.g., body part, disease, treatment, diagnosis) to filter out irrelevant concepts.
      - Use the ISA relationship to filter out general concepts and yield specific concepts.

Phrase-based Vector Space Model (VSM)
  [Figure: a query ("... lung cancer ...") is matched against documents ("... lung carcinoma ...", "... lung neoplasm ...", "... anti-cancer drug combinations ...") through a knowledge source. Figure labels: "lung cancer = lung carcinoma", "parent_of", "missing!!!".]

Phrase-based VSM Examples
  - Query: "lung cancer ..."
      Phrases: [(C0242379); "lung" "cancer"] ...
  - Document: "anti-cancer drug combinations ..."
      Phrases: [(C0003393); "anti" "cancer" "drug" "combin"] ...

Query Expansion (QE)
  Queries in the following form benefit from expansion:
    <key concept> + <general supporting concept(s)>
    e.g., lung cancer + treatment options
  After expansion:
    <key concept> + <specific supporting concept(s)>
    e.g., lung cancer + chemotherapy, radiotherapy

Knowledge-based Scenario-specific Expansion
  [Figure: statistically derived expansion terms combined with the knowledge-source relation "Therapeutic or Preventive Procedure treats Disease or Syndrome". Terms shown: lung cancer, heart disease, study, patient, result, survive, increase, mediastinoscopy, bronchoscopy, chemotherapy, radiotherapy, heart surgery.]

Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS)
  [Figure: precision (y-axis, 0 to 1) versus recall (x-axis, 0 to 1) for four methods.]
    - Knowledge-based expansion (Phrase VSM)
    - Statistical expansion (Stem VSM)
    - Phrase VSM (no expansion)
    - Stem VSM (no expansion)
  Overall improvement: 33%, 100 queries vs. 5%, 50 queries

FTRS: Scenario-specific Query Answering
  Sample templates: "<disease>, treatment", "<disease>, diagnosis"
  [Figure: lung cancer -> IndexFinder -> template "<disease>, treatment" -> "lung cancer, treatment" -> Query Expansion -> "lung cancer, radiotherapy, chemotherapy, cisplatin, ..." -> Phrase-based VSM Engine -> relevant documents]

FTRS: Scenario-specific Content Correlation
  IndexFinder extracts key concepts from free-text for content correlation.
  [Figure: Patient Report -> IndexFinder -> Scenario Selection (query templates, e.g., treatment, diagnosis) -> Query Expansion -> Phrase-based VSM Engine -> relevant documents]

Summary: KB Free-text Retrieval
  Technologies
    - IndexFinder: extracts key concepts from the free-text
    - Phrase-based VSM: a new document indexing paradigm (a concept and its word stems) to improve retrieval effectiveness
    - Knowledge-based query expansion: matches the query with scenario-specific documents
  Together these provide scenario-specific free-text retrieval.

Conclusions
  Knowledge sources provide
    - Approximate matching
        - Query conditions
        - Image features
    - Query processing
        - Similarity query answering
        - User modeling
        - Associative answering
        - Triggering and alerting
    - Document retrieval
        - Convert ad hoc free-text into controlled vocabulary
        - Phrase-based VSM
        - Content correlation
        - Scenario-specific retrieval
  They increase the capabilities and effectiveness of information management.

Acknowledgement
  This research is supported by DARPA, NSF Grant #9619345, and NIC/NIH Grant #4442511-33780.
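
To illustrate the free-text retrieval ideas summarized above, here is a minimal Python sketch of phrase-based matching and scenario-specific query expansion. It is not the FTRS implementation. The tiny `CONCEPTS` and `SCENARIO_TERMS` tables are hypothetical stand-ins for UMLS; only the two concept IDs and the example terms come from the slides. Lowercased word tokens stand in for real stemming, and cosine similarity over a combined concept-plus-stem vector is an assumed simplification of the phrase-based VSM.

```python
import math
import re
from collections import Counter

# Toy knowledge base (hypothetical). Mapping "lung carcinoma" to the same
# concept follows the slide's "lung cancer = lung carcinoma".
CONCEPTS = {
    "lung cancer": "C0242379",
    "lung carcinoma": "C0242379",
    "anti-cancer drug combinations": "C0003393",
}

# Hypothetical scenario knowledge: specific supporting concepts for a
# (disease, scenario) pair, using the terms shown on the slides.
SCENARIO_TERMS = {
    ("lung cancer", "treatment"): ["chemotherapy", "radiotherapy"],
    ("lung cancer", "diagnosis"): ["chest x-ray", "bronchography"],
}


def stems(text):
    """Crude stand-in for a stemmer: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def concepts(text):
    """Concept IDs of every known phrase that occurs in the text."""
    lowered = text.lower()
    return [cid for phrase, cid in CONCEPTS.items() if phrase in lowered]


def stem_vector(text):
    """Word-level vector, as in a conventional stem-based VSM."""
    return Counter("stem:" + s for s in stems(text))


def phrase_vector(text):
    """Stem dimensions plus one dimension per matched concept."""
    vec = stem_vector(text)
    vec.update("concept:" + c for c in concepts(text))
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def expand_query(disease, scenario):
    """Replace a general scenario word with specific supporting terms."""
    return ", ".join([disease] + SCENARIO_TERMS.get((disease, scenario), []))
```

In this sketch, the query "lung cancer" scores higher against "lung carcinoma" with `phrase_vector` than with `stem_vector`, because the shared concept adds a matching dimension. Its score against "anti-cancer drug combinations" drops, because the concepts differ. `expand_query("lung cancer", "treatment")` yields a query containing the specific terms chemotherapy and radiotherapy.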