IFAC 2003 keynote - UCLA Computer Science

Knowledge-based Information Management
for Biomedical Applications
Wesley Chu
Computer Science Department
University of California
Los Angeles, CA
www.kmed.cs.ucla.edu
Outline
Data types
Uses of knowledge bases to enhance information management
Sample systems
  Structured data
  Multi-media
  Free-text
Conclusion
Information Formats used in
Biomedical Applications

Structured data
Multi-media images
Semi-structured data
Free-text
Uses of Knowledge Bases to Enhance
Information Management

Approximate matching

Query conditions

Image features

Similar conceptual terms
Uses of Knowledge Bases to Enhance
Information Management (cont.)
KB query processing
  Similarity query answering
  Associative query answering
  Scenario-specific query answering
  Sentinel: triggering and alerting
Examples of KB Information Systems

CoBase (1990-1998), DARPA
  A database that cooperates with the user for structured data
KMeD (1991-2000), NSF
  A knowledge-based medical multi-media database
Medical Digital Library (2001-2005), NIH
  A knowledge-based digital file room for patient care, education, and research
CoBase
www.cobase.cs.ucla.edu

Project leader: Wesley W. Chu

Graduate students:
K. Chiang
C. Larson
R. Lee
M. Merzbacher
M. Minock
Frank Meng
Wenlei Mao
Mark Yang
K. Zhang

Staff:
Q. Chen
Gladys Chow
Hua Yang
CoBase: Cooperative Databases

Conventional query answering
  Need to know the detailed database schema
  Cannot get approximate answers
  Cannot answer conceptual queries
Cooperative query answering
  Derive approximate answers
  Answer conceptual queries
  Provide additional relevant answers that the user does not (or does not know how to) ask for
Cooperative Queries
[Figure: cooperative queries, CoBase servers, domain knowledge, and heterogeneous information sources]
Example queries:
  Find a nearby friendly airport that can land F-15
  Find hospitals with facility similar to St. John’s near LAX
  Find a seaport with railway facility in Los Angeles
CoBase provides:
  Relaxation
  Approximation
  Association
  Explanation
Generalization and Specialization
Generalization: Specific Query -> Conceptual Query -> More Conceptual Query
Specialization: More Conceptual Query -> Conceptual Query -> Specific Query
Cooperative Querying for Medical Applications

Query
  Find the treatment used for the tumor similar-to (loc, size) X1 on 12-year-old Korean males.
Relaxed Query
  Find the treatment used for the tumor Class X on preteen Asians.
Association
  The success rate, side effects, and cost of the treatment.
Type Abstraction Hierarchies for Medical Domain
Tumor (location, size)
  Class X [loc1 loc3] [s1 s3]
    X1 [loc1 s1]
    X2 [loc2 s2]
    X3 [loc3 s3]
  Class Y [locY sY]
Age
  Preteens: 9, 10, 11, 12
  Teen
  Adult
Ethnic Group
  Asian: Korean, Chinese, Japanese, Filipino
  African
  European
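The relaxation from "12 year-old Korean" to "preteen Asian" can be sketched as a walk up these hierarchies. This is a minimal illustration, not the CoBase implementation: the dictionary layout and the function names generalize, specialize, and relax_query are assumptions, and only values that appear on the slide are used.

```python
# Hypothetical encoding of three TAHs: each value maps to its parent concept.
TUMOR_TAH = {"X1": "Class X", "X2": "Class X", "X3": "Class X"}
AGE_TAH = {"9": "Preteens", "10": "Preteens", "11": "Preteens", "12": "Preteens"}
ETHNIC_TAH = {"Korean": "Asian", "Chinese": "Asian", "Japanese": "Asian"}


def generalize(value, tah):
    """Move one level up the hierarchy; values with no parent are returned unchanged."""
    return tah.get(value, value)


def specialize(concept, tah):
    """Return the values directly below a concept, sorted for stable output."""
    return sorted(v for v, parent in tah.items() if parent == concept)


def relax_query(query, tahs):
    """Generalize every query condition by one level using its attribute's TAH."""
    return {attr: generalize(val, tahs[attr]) for attr, val in query.items()}


tahs = {"tumor": TUMOR_TAH, "age": AGE_TAH, "ethnic_group": ETHNIC_TAH}
query = {"tumor": "X1", "age": "12", "ethnic_group": "Korean"}
relaxed = relax_query(query, tahs)
print(relaxed)
```

Specializing a relaxed concept, for example specialize("Class X", TUMOR_TAH), recovers the concrete values that the approximate query would accept.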
KB: Type Abstraction Hierarchy

Uses a clustering technique to group similar
  Attribute values
  Image features
  Spatial relationships among objects
Provides multi-level knowledge (conceptual) representation
Data mining for TAH for Numerical
Attribute Values

Clustering metrics: relaxation error



Difference between the exact value and the
returned approximate value
Relaxation error is weighted by the
probability of occurrence of each value
Can be extended to multiple attributes
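One way to read this metric, for a single numerical attribute, is as the expected absolute difference between the value asked for and the value returned from the same cluster, with both weighted by how often they occur. The sketch below follows that reading; the exact formula and the names relaxation_error and best_binary_split are assumptions for illustration, not taken from the slides.

```python
from collections import Counter


def relaxation_error(values):
    """Expected |x_i - x_j| when x_i is requested and x_j from the same cluster is returned.

    Probabilities of occurrence are estimated from value frequencies in `values`.
    """
    counts = Counter(values)
    total = sum(counts.values())
    prob = {v: c / total for v, c in counts.items()}
    return sum(prob[a] * prob[b] * abs(a - b) for a in prob for b in prob)


def best_binary_split(values):
    """Return (error, cut) for the cut that minimizes the weighted error of both halves.

    Returns None when all values are identical and no split is possible.
    """
    best = None
    n = len(values)
    for cut in sorted(set(values))[1:]:
        left = [v for v in values if v < cut]
        right = [v for v in values if v >= cut]
        err = (len(left) / n) * relaxation_error(left) \
            + (len(right) / n) * relaxation_error(right)
        if best is None or err < best[0]:
            best = (err, cut)
    return best


print(best_binary_split([1, 2, 3, 10, 11, 12]))
```

Applying the split recursively to each half would yield a hierarchy over the attribute; that recursion is omitted here to keep the sketch short.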
Query Relaxation
[Flowchart: Query; Database; Answers? Yes: Display. No: Relax Attribute (using TAHs), then Query Modification and back to the Database]
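The loop in the flowchart can be sketched as follows: run the query, and if nothing matches, relax one attribute through its TAH, modify the query, and try again. The in-memory rows, the treatment labels T1 and T2, the relax_order parameter, and the function name are hypothetical and exist only for illustration.

```python
# Hypothetical TAHs: each value maps to its parent concept.
AGE_TAH = {"9": "Preteens", "10": "Preteens", "11": "Preteens", "12": "Preteens"}
ETHNIC_TAH = {"Korean": "Asian", "Chinese": "Asian", "Japanese": "Asian"}


def answer_with_relaxation(query, rows, tahs, relax_order):
    """Return (answers, final conditions), relaxing attributes in order until rows match."""
    conditions = {attr: {val} for attr, val in query.items()}
    pending = list(relax_order)
    while True:
        answers = [r for r in rows
                   if all(r[a] in allowed for a, allowed in conditions.items())]
        if answers or not pending:
            return answers, conditions
        attr = pending.pop(0)
        tah = tahs[attr]
        parents = {tah[v] for v in conditions[attr] if v in tah}
        # Query modification: accept every value that shares a parent concept.
        conditions[attr] |= {v for v, p in tah.items() if p in parents}


rows = [
    {"age": "10", "ethnic_group": "Korean", "treatment": "T1"},
    {"age": "12", "ethnic_group": "Japanese", "treatment": "T2"},
]
answers, conditions = answer_with_relaxation(
    {"age": "12", "ethnic_group": "Korean"},
    rows,
    {"age": AGE_TAH, "ethnic_group": ETHNIC_TAH},
    relax_order=["age", "ethnic_group"],
)
print(answers)
```

Relaxing one attribute at a time keeps the answer as close as possible to the original query; here the age condition is widened to all preteens before the ethnic group is touched.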
Summary: CoBase



Derive Approximate Answers
Answer Conceptual Queries
Provide Associative Query Answers
KMeD
www.kmed.cs.ucla.edu



PI: Wesley Chu, Ph.D, Computer Science Department
Co-PIs:
  A. Cardenas, Ph.D, Computer Science Department
  Ricky Taira, Ph.D, School of Medicine
Graduate students:
Alex Bui
Christina Chu
John Dionisio
T. Plattner
D. Johnson
C. Hsu
T. Ieong

Consultants:
Denise Aberle, M.D.
C.M. Breant, Ph.D
KMeD Goal: Retrieval of Images by
Features & Content
Features
  size, shape, texture, density, histology
Spatial Relations
  angle of coverage, shortest distance, overlapping ratio, contact ratio, relative direction
Evolution of Object Growth
  fusion, fission
Characteristics of Medical Queries





Multimedia
Temporal
Evolutionary
Spatial
Imprecise
Knowledge-Based Image Model
[Figure: three-level image model]
Knowledge Level: TAHs for Tumor Size, SR(t,b), SR(t,l), and Lateral Ventricle
Schema Level: Brain, Tumor, and Lateral Ventricle, related by SR(t,b) and SR(t,l)
Representation Level (features and content)
Legend: SR: Spatial Relation; b: Brain; t: Tumor; l: Lateral Ventricle
Knowledge-Based Query Processing
[Flow diagram. Stages shown: Queries; Query Analysis and Feature Selection; Knowledge-Based Content Matching Via TAHs; Query Relaxation; Query Answers]
User Model
Customizes query answering to the user's interests, preferences, needs, and goals
(e.g., query conditions, relaxation control)
  User type
  Default Parameter Values
  Feature and Content Matching Policies
    Complete Match
    Partial Match
User Model (cont.)

Relaxation Control Policies

Relaxation Order

Unrelaxable Object

Preference List

Measure for Ranking

Triggering conditions
Query Preprocessing




Segment and label contours for objects of
interest
Determine relevant features and spatial
relationships (e.g., location, containment,
intersection) of the selected objects
Organize the features and spatial relationships
of objects into a feature database
Classify the feature database into a Type
Abstraction Hierarchy (TAH)
Similarity Query Answering




Determine relevant features based on
query input
Select TAH based on these features
Traverse through the TAH nodes to
match all the images with similar
features in the database
Present the images and rank their
similarity (e.g., by mean square error)
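The final ranking step can be sketched as below. It assumes the candidate images have already been collected from the selected TAH nodes and that features are on comparable scales; the image identifiers, feature names, and function names are hypothetical.

```python
def mean_square_error(a, b):
    """Mean of squared differences between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rank_by_similarity(query_features, images, feature_names):
    """Return image ids ordered from most to least similar to the query.

    `images` maps an image id to a dict of feature values.
    """
    q = [query_features[f] for f in feature_names]
    scored = [
        (mean_square_error(q, [feats[f] for f in feature_names]), image_id)
        for image_id, feats in images.items()
    ]
    return [image_id for _, image_id in sorted(scored)]


images = {
    "img_a": {"size": 2.0, "distance": 1.0},
    "img_b": {"size": 5.0, "distance": 4.0},
    "img_c": {"size": 2.5, "distance": 1.5},
}
ranking = rank_by_similarity({"size": 2.0, "distance": 1.2}, images, ["size", "distance"])
print(ranking)
```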
Visual Query Language and Interface



Point-click-drag interface
Objects may be represented by icons
Spatial relationships among objects
are represented graphically
Visual Query Example
Retrieve brain tumor
cases where a tumor is
located in the region as
indicated in the picture
Summary: KMeD






Image retrieval by feature and content
Matching images based on features
Processing of queries based on spatial relationships
among objects
Answering of imprecise queries
Expression of queries via visual query language
Integrated view of temporal multimedia data in a
timeline metaphor
Medical Digital Library
www.kmed.cs.ucla.edu


Project leader: Wesley W. Chu
Graduate students:
Victor Z. Liu
Wenlei Mao
Qinghua Zou

Consultants:
Hooshang Kangarloo, M.D.
Denise Aberle, M.D.
Data Types Used in a Medical
Digital Library



Structured data (patient lab data,
demographic data,…)--CoBase
Images (X rays, MRI, CT scans)--KMeD
Free-text (Patient reports, Teaching
files, Literature, News articles)--FTRS
(Free-text retrieval system)
A Free-Text Retrieval System (FTRS)
[Figure: Knowledge-based Free-Text Retrieval System (FTRS)]
Inputs: ad hoc query; patient report for content correlation
Document collections: patient reports, medical literature, teaching materials, news articles
Output: query results
A Sample Patient Report
…
Tissue Source:
LUNG (FINE NEEDLE ASPIRATION) (LEFT
LOWER LOBE)
…
FINAL DIAGNOSIS:
- LUNG NODULE, LEFT LOWER LOBE (FINE
NEEDLE ASPIRATION):
- LUNG CANCER, SMALL CELL, STAGE II.
…
Scenario-Specific Retrieval
[Figure: the sample patient report, with two scenario questions]
How to treat the disease? -> Treatment-related articles
How to diagnose the disease? -> Diagnosis-related articles
Challenge I: Indexing for Free-Text
Extracting key concepts in the free text for indexing
  Free-text: Lung cancer, small cell, stage II
  Concept terms in knowledge source: stage II small cell lung cancer
Conventional methods use NLP
  Not scalable
Challenge II: Mismatch between
terms used in query and documents
Example
Query: … lung cancer, …
Document 1: … lung carcinoma … ?
Document 2: … lung neoplasm … ?
Document 3: anti-cancer drug combinations… ?
Challenge III: Terms used in the
query are too general
Expanding the general terms in the query
to specific terms that are used in the
document
Query: lung cancer, diagnosis options ?
Expanded query: lung cancer, chest x-ray, bronchography, … √
Document: … the effectiveness of chest x-ray and
bronchography on patients with lung cancer …
A Medical KB: Unified Medical
Language System (UMLS)
  Meta-thesaurus: controlled vocabulary (1.6M biomedical phrases, representing 800K concepts)
  Semantic Network: classifies concepts into classes and relates them (e.g., disease and syndrome, treated by, therapeutic procedure)
  Specialized Lexicon
Using knowledge sources to
resolve these challenges



Challenge I: Automatic indexing of free
text
Challenge II : Mismatch between terms
in the query and the documents
Challenge III: Terms in the query are
too general
IndexFinder: Extracting domain-specific key concepts

Technique



Permute words from text to generate
concept candidates.
Use knowledge base to select the valid
candidates.
Problem


Valid candidates may be irrelevant to the
document.
Redundant concepts.
Filtering out Irrelevant Concepts

Syntactic filter:


Limit permutation of words within a
sentence.
Semantic filter:


Use the semantic type (e.g. body part,
disease, treatment, diagnosis) to filter out
irrelevant concepts
Use ISA relationship to filter out general
concepts and yield specific concepts.
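The technique and the filters above can be sketched on a toy knowledge base. This is an illustration under stated assumptions, not IndexFinder itself: the KB and ISA tables are hypothetical miniatures, word order is ignored by using unordered word combinations instead of explicit permutations, and exhaustive enumeration is used only because the input is a single short sentence.

```python
from itertools import combinations

# Hypothetical miniature knowledge base: word set -> (concept name, semantic type).
KB = {
    frozenset(["lung", "cancer"]): ("lung cancer", "disease"),
    frozenset(["small", "cell", "lung", "cancer"]): ("small cell lung cancer", "disease"),
    frozenset(["cancer"]): ("cancer", "disease"),
    frozenset(["cell"]): ("cell", "body part"),
}
# Hypothetical ISA links: specific concept -> more general concept.
ISA = {"small cell lung cancer": "lung cancer", "lung cancer": "cancer"}


def candidates(sentence, max_len=4):
    """Syntactic filter: combine only words that occur in the same sentence."""
    words = sorted(set(sentence.lower().replace(",", " ").split()))
    for n in range(1, max_len + 1):
        for combo in combinations(words, n):
            yield frozenset(combo)


def ancestors(concept):
    """All more general concepts reachable through ISA links."""
    seen = set()
    while concept in ISA:
        concept = ISA[concept]
        seen.add(concept)
    return seen


def index_sentence(sentence, wanted_types):
    """Keep valid candidates of the wanted semantic types, dropping general concepts."""
    found = {KB[c] for c in candidates(sentence) if c in KB}
    kept = {name for name, sem_type in found if sem_type in wanted_types}
    general = set().union(*(ancestors(name) for name in kept))
    return sorted(kept - general)


concepts = index_sentence("Lung cancer, small cell, stage II", {"disease"})
print(concepts)
```

The semantic-type filter removes "cell", and the ISA filter removes "cancer" and "lung cancer" because a more specific concept was found in the same sentence.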
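Phrase-based Vector Space Model
(VSM)
[Figure: matching a query against documents through the knowledge source]
Query: … lung cancer, …
Document: … lung carcinoma …; … lung neoplasm …; … anti-cancer drug combinations …
Knowledge source entries shown: lung cancer = lung carcinoma; lung neoplasm; anti-cancer drug combinations; relation parent_of
Figure annotations: √, ?, missing!!!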
Phrase-based VSM Examples
Query: “lung cancer …”
  Phrases: [(C0242379); “lung” “cancer”] …
Document: “anti-cancer drug combinations …”
  Phrases: [(C0003393); “anti” “cancer” “drug” “combin”] …
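A rough sketch of how such phrases could be compared: each phrase contributes one dimension for its concept identifier and one per word stem, so two texts with different concepts can still score above zero through a shared stem. The weighting scheme, the stem_weight parameter, and the function names are assumptions for illustration, not the formula used in the system.

```python
import math
from collections import Counter


def vectorize(phrases, stem_weight=0.5):
    """Build a sparse vector from (concept_id, [stems]) pairs."""
    vec = Counter()
    for concept_id, stems in phrases:
        vec[("concept", concept_id)] += 1.0
        for stem in stems:
            vec[("stem", stem)] += stem_weight
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm = math.sqrt(sum(w * w for w in u.values())) \
        * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


query_phrases = [("C0242379", ["lung", "cancer"])]
doc_phrases = [("C0003393", ["anti", "cancer", "drug", "combin"])]

with_stems = cosine(vectorize(query_phrases), vectorize(doc_phrases))
concepts_only = cosine(vectorize(query_phrases, 0.0), vectorize(doc_phrases, 0.0))
print(with_stems, concepts_only)
```

With concepts alone the two texts do not match at all; adding stems gives partial credit for the shared stem "cancer".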
Query Expansion (QE)
Queries in the following form benefit
from expansion:
<key concept> + <general supporting concept(s)>
  e.g. lung cancer + treatment options
Expansion yields:
<key concept> + <specific supporting concept(s)>
  e.g. lung cancer + chemotherapy, radiotherapy
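The expansion step can be sketched as a lookup that swaps the general supporting concept for the specific concepts related to the key concept. The relation tables below are hypothetical stand-ins for a knowledge source, and the function name expand is illustrative.

```python
# Hypothetical relation tables standing in for a knowledge source.
TREATS = {
    "lung cancer": ["chemotherapy", "radiotherapy"],
    "heart disease": ["heart surgery"],
}
DIAGNOSES = {"lung cancer": ["mediastinoscopy", "bronchoscopy"]}

# Hypothetical mapping from a general supporting concept to the relation it implies.
SCENARIOS = {"treatment options": TREATS, "diagnosis options": DIAGNOSES}


def expand(key_concept, supporting_concept):
    """Replace a general supporting concept with specific ones tied to the key concept.

    The query is returned unchanged when the knowledge source has nothing to offer.
    """
    relation = SCENARIOS.get(supporting_concept)
    if relation is None or key_concept not in relation:
        return [key_concept, supporting_concept]
    return [key_concept] + relation[key_concept]


print(expand("lung cancer", "treatment options"))
```

Restricting the lookup to the key concept keeps terms for other diseases, such as heart surgery, out of a lung cancer query.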
Knowledge-based Scenario-specific Expansion
[Figure: candidate expansion terms for a query, Knowledge Source + Statistical]
Knowledge Source relation: Therapeutic or Preventive Procedure treats Disease or Syndrome
Terms shown: heart disease, lung cancer, study, patient, result, survive, increase, mediastinoscopy, bronchoscopy, chemotherapy, radiotherapy, heart surgery
Retrieval Effectiveness Comparison
(Corpus: OHSUMED, KB: UMLS)
[Figure: precision (0 to 1) versus recall (0 to 1) curves for Knowledge-based expansion (Phrase VSM), Statistical expansion (Stem VSM), Phrase VSM (no expansion), and Stem VSM (no expansion)]
Overall improvement: 33%, 100 queries vs. 5%, 50 queries
FTRS: Scenario-specific
Query Answering
Sample templates: “<disease>, treatment”, “<disease>, diagnosis”
[Figure: query answering pipeline. Components shown: IndexFinder, Template, Query Expansion, Phrase-based VSM Engine, relevant documents]
Example: “lung cancer” with the template “<disease>, treatment” gives the query “lung cancer, treatment”, which Query Expansion turns into: lung cancer, radiotherapy, chemotherapy, cisplatin, …
FTRS: Scenario-specific
content correlation
IndexFinder extracts key concepts from free-text for content correlation
[Figure: content correlation pipeline. Components shown: Patient Report, IndexFinder, Scenario Selection, Query Templates (e.g. treatment, diagnosis, etc.), Query Expansion, Phrase-based VSM Engine, relevant documents]
Summary: KB Free-text retrieval

Technologies



  IndexFinder – extracts key concepts from the free text
  Phrase-based VSM – a new document indexing paradigm (concept and its word stems) to improve retrieval effectiveness
  Knowledge-based query expansion – matches the query with scenario-specific documents
Together these provide scenario-specific free-text retrieval
Conclusions

Knowledge sources provide
  Approximate matching
    Query conditions
    Image features
  Query processing
    Similarity query answering
    User modeling
    Associative answering
    Triggering and alerting
  Document retrieval
    Convert ad hoc free-text into controlled vocabulary
    Phrase-based VSM
    Content correlation
    Scenario-specific retrieval
Increase the capabilities and effectiveness of information management
Acknowledgement
This research is supported by DARPA, NSF Grant # 9619345,
and NIC/NIH Grant # 4442511-33780