Download Medical Concept Representation: From Classification to

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
AMIA 05; CG Chute
Biomedical Informatics
Biomedical Informatics
The Continuum Of Biomedical Informatics
Bioinformatics meets Medical Informatics
Reference
Ontologies in Biomedicine
Biology
Medicine
10
9
Christopher G. Chute, MD DrPH
8
7
6
Professor and Chair, Biomedical Informatics
Mayo Clinic College of Medicine
Rochester, Minnesota, USA
5
4
3
2
1
G
en
om
Pr
et
ot ics
ab
eo
ol
M
m
ic
ol
Pa ics
e
th
M cul
w
ol
a
ec r M ays
ul
od
ar
el
Si
i
m ng
C
ul
el
at
lu
i
M
la
ol
r M on
ec
od
ul
e
a
G
en r A ls
om ss
a
ys
ic
T
B
io est
sp
in
g
ec
im
e
La ns
b
D
is
ea Tri Dat
al
a
se
s
&
D
M Syn at a
e
dr
Pa dic
om
tie al
Im e
nt
a
Re
gi
ng
c
E H or
d
R
D
St
at
a
ru
ct
ur
es
0
M
AMIA, Washington DC, 2005
© Mayo Clinic College of Medicine 2005
Biomedical Informatics
2
Biomedical Informatics
Big Science Collaboration
and Semantics
• Communication of findings and results
Making Shared Context Explicit
• Human publication
• Sharing of data resources as building blocks
• Foundation for incremental, big-science
• Harnessing computing requires formalization
• Data format and structures
• Discrete terms, vocabulary, ontology
• Non-ambiguous concepts, non-overlapping terms
• Information models and problem architectures
• Standards, conventions, and shared context
© Mayo Clinic College of Medicine 2005
CONCEPT
CONCEPT
Symbolises
Refers To
Refers To
Symbolises
“I see a ClipArt
image of a rose”
Stands For
Referent
“Rose”,
“ClipArt”
Symbol
“Rose”,
“ClipArt”
Symbol
Context
Stands For
Context
Syntax
Terminologies
3
Biomedical Informatics
Formal Shared
Context
© Mayo Clinic College of Medicine 2005
Terminologies
[From Solbrig]4
Biomedical Informatics
If science is communication
what is its language?
• Most ontologies and vocabularies are created to
meet a specific application or use-case
• Despair their re-use in alternative contexts
• Virtually all terminologies invoke concepts “out
of domain”
Science Interlingua
Interlocking Reference Terminologies
Reference Terminology
• A coherent organization of concepts about a
well-characterized domain, e.g.
• LOINC – drugs
• SNOMED, MeSH – anatomy, drugs
• Identifying common “atoms” an elusive goal
• Fraught with composition
• Micro-information models (sentences and ¶’s)
© Mayo Clinic College of Medicine 2005
© Mayo Clinic College of Medicine
2005
5
• Units of Measure
• Pharmaceuticals
• Anatomy
• Cellular processes
• Reference concepts that underpin scientific
expression
• Abstract concept space that serves no
application need (with or without the real world)
© Mayo Clinic College of Medicine 2005
6
1
AMIA 05; CG Chute
Biomedical Informatics
Biomedical Informatics
The VA, FDA, NLM Drug Ontology Projects
Science Interlingua
Babble vs Babel (and other dyslexiae)
• Will an Esperanto of Science fail out of the box?
[modified from Steve Brown]
Unique Ingredient
Identifier: UNII
Chemical
Structure
VHA
Drug Class
Labeled
Properties
Drug Component
Clinical Drug
Of Action,
Mechanism
Pharmacokinetics
Finished Dose Form
Effect,
Physiologic
Intent,
Therapeutic
• Is it language for humans?
• It is language for computers?
• Is “the” science ontology [or interlocking plurality]
• a period table?
• a body of weights and measures?
• a dictionary?
• a thesaurus?
• a language?
• Whither cognition?
• Inference is not understanding
Active Ingredient
Dosage
Form
Drug Product
Indications
Packaged Drug (NDC)
Product Label
© Mayo Clinic College of Medicine 2005
7
Biomedical Informatics
© Mayo Clinic College of Medicine 2005
Familiar Points Along Continuum
Modern Health Vocabularies
• Nomenclature – Highly Detailed Descriptions
(SNOMED)
• Classification – Organized Aggregation of
Descriptions into a Rubric (ICDs)
• Groupings – High Level Categories of Rubrics
(DRGs)
Nomenclature
Detailed
9
Biomedical Informatics
Machine Abstracted,
Algorithmically
Derived
Classifications
Clinical
Classifications
Surveys, Scales,
and Datasets
Clinical Observations and
Patient Record Recording
Molecular, Genomic, Cellular
Increasing Detail and Richness
© Mayo Clinic College of Medicine
2005
© Mayo Clinic College of Medicine 2005
10
Blois, 1988
Medicine and the nature of vertical reasoning
• Molecular: receptors, enzymes, vitamins, drugs
• Genes, SNPs, gene regulation
• Physiologic pathways, regulatory changes
• Cellular metabolism, interaction, meiosis,…
• Tissue function, integrity
• Organ function, pathology
• Organism (Human), disease
• Sociology, environment, nutrition, mental health…
High Level Groups,
e.g. DRGs
© Mayo Clinic College of Medicine 2005
Classification Groups
Grouped
Biomedical Informatics
Abstraction Layers
Narrow Boundary of Ideal, Human Recording
Increasing Abstraction
8
Biomedical Informatics
What are factors that erode
shared context in biomedicine
• Concept granularity (specificity)
• Vertical scope (molecules to society)
• Divergent concept content (codes)
• Divergent information models
• Terminology – Information model boundary
• Human use of “machine” concepts
• Human orneriness
Humanly
Recorded
© Mayo Clinic College of Medicine 2005
11
© Mayo Clinic College of Medicine 2005
12
2
AMIA 05; CG Chute
Biomedical Informatics
Biomedical Informatics
Content vs. Structure
Contest or Synergy? Computer Equivalent?
Domain-specific expansion of “MS”
Semantics by domain context
Cardiology mitral stenosis
Neurology multiple sclerosis
Anesthesia morphine sulfate
Obstetrics magnesium sulfate
Research science manuscript
Physics millisecond
Education Master of Science
U.S. Postal Service Mississippi
Computer science Microsoft
Correspondence female name prefix
© Mayo Clinic College of Medicine 2005
Family History of Breast Cancer
Family History of Heart Disease
Family History of Stroke
Family History
Breast Cancer
Heart Disease
Stroke
Terminologic
Model
Information
Model
Equivalent
Content
[adapted from Rossi-Mori]
13
Biomedical Informatics
© Mayo Clinic College of Medicine 2005
14
Biomedical Informatics
Information Model (HL7)
Terminology Model (SNOMED)
HL7 RIM
targetSiteCode(Observation)
targetSiteCode(Procedure)
methodCode
(Observation & Procedure)
approachSiteCode(Procedure)
priorityCode(Act)
Context mis-match
Challenges for Reference concepts
• Cultural aggregation and splitting
SNOMED CT Attribute
“finding site”
• 7 words for “rice” in Thai
• 17 works for “snow” in Inuktitut
)Must the reference enumerate the superset?
• Composition vs Terms
• Pre-coordinated terms – “Colon Cancer”
• Neologisms for “sentence concepts”
)Should the reference include conceptual atoms
“procedure site”
“method”
“approach,” “access”
and molecules?
“priority”
• Composition is in the eye of the beholder
[adapted from Markwell]
© Mayo Clinic College of Medicine 2005
15
Biomedical Informatics
© Mayo Clinic College of Medicine 2005
16
Biomedical Informatics
Reference Truth: Variations in Identity
On orthologs, paralogs, and SNPs
• Identity in context of resolving to same concept
in a reference terminology
• Enzymes that share function: Sulfotransferase
• Orthologs across primates
• Paralogs (including pseudogenes) in humans
• Polymorphisms between individuals
© Mayo Clinic College of Medicine 2005
© Mayo Clinic College of Medicine
2005
SULT6B1 Sequence
Differences Between Primates
Location in Human SULT6B1
17
Nucleotide Sequence
Nucleotide
Amino Acid
Human
Gorilla
Chimpanzee
A148G
G274A
A314C
A321G
G336C
C390G
T400C
G429C
GCT(538-540)CCC
C609A
C636T
A651C
A697G
LYS50GLU
GLU92LYS
LYS105THR
THR107THR
LEU112PHE
PHE130LEU
PHE134LEU
ARG143SER
ALA180PRO
ALA203ALA
HIS212HIS
PRO217PRO
SER233GLY
A
G
A
A
G
C
T
G
GCT
C
C
A
A
A
A
C
G
C
G
T
C
CCC
A
C
C
A
G
A
A
G
C
C
C
C
GCT
A
T
A
G
© Mayo Clinic College of Medicine 2005
18
3
AMIA 05; CG Chute
Biomedical Informatics
Biomedical Informatics
Human SULT Genes
SULT1A2
~5.1 kb
SULT1A3
~7.3 kb
270
105
SULT1A1
~4.4 kb
347
74
152
126
98
127
95
181
152
126
98
127
95
181
152
126
98
127
95
181
SULT Data Mining
Pseudogenes
Exon length ( bp)
337
SULT1C3
196
2
>142
113
88
113
3
129
4
98
6
95
5
127
7a
181
7b
181
8a
>113
8b
>113
499
~18 kb
4998
SULT1B1
>28 kb
432
SULT1C1
~20.1 kb
>164
129
172
126
95
98
127
95
181
98
127
95
181
851
2129
3035
137
1421
4144
233
>113
SULT6B
361
1
2
106
3
113
4
90
4b
84
5
127
6
95
7
157
8
246
~29 kb
542
SULT1C2
~10.1 kb
97
SULT1E1
~18 kb
154
215
SULT2A1
~16 kb
SULT2B1
~48 kb
152
SULT4A1
~38 kb
544
126
98
127
95
181
126
98
127
95
181
127
95
178
127
95
181
127
95
139
209
209
248
131
81
974
730
3840
407
3349
4231
3541
3368
167
955
291
1605
© Mayo Clinic College of Medicine 2005
19
Biomedical Informatics
© Mayo Clinic College of Medicine 2005
20
Biomedical Informatics
Discriminate Differences of Kind
Reference Terminology
• At what level of granularity?
• What is a Sulfotransferase?
• What level of detail should a reference
terminology of enzymes convey?
• Is the logical limit of all reference terminologies
a basis on types of quark?
© Mayo Clinic College of Medicine 2005
21
© Mayo Clinic College of Medicine 2005
22
Biomedical Informatics
LexGrid as a Terminology Interchange
• Proliferation of ontologies and vocabularies
• Varieties of formats and terminology models
• Various versions over time
• Hard to find appropriate resource
• Establish “web of terminology” to link content
• Extension of Semantic Web concept
• Common tools, formats, and interfaces
LexGrig.org & HL7 CTS integrated into cBIO
© Mayo Clinic College of Medicine 2005
© Mayo Clinic College of Medicine
2005
23
4
Related documents