Knowledge Discovery for Cancer Informatics and Public Health Informatics: Techniques, Case Studies, and Lessons Learned
Hsinchun Chen
Director, Artificial Intelligence Lab, University of Arizona
Acknowledgements: NIH, NSF, NCI, ACC, NTU
Artificial Intelligence Lab Research
• UA MIS Department (4th ranked); 30+ research scientists; $25M funding since 1990; Chen, IEEE and AAAS Fellow
• Intelligence & Security Informatics research: NSF, DOJ, CIA; COPLINK system deployed in 1600+ agencies; Dark Web for countering terrorism
• Biomedical Informatics research: NLM, NCI; Chen NLM Scientific Counselor; HelpfulMed, GeneScene system, and BioPortal
Artificial Intelligence Lab, MIS, University of Arizona
A Little Promotion
GeneScene: Cancer Pathway Knowledge Extraction and Visualization

GeneScene Team
• Text Mining & Knowledge Integration
• Data Mining
• Visualization & System Development
• Domain Experts
• User Studies & Evaluation
Team members:
Dr. Gondy Leroy (Claremont)
Byron Marshall (Oregon SU)
Dan McDonald (Utah SU)
Zan Huang (Penn SU)
Jiexun Li (Drexel U)
Chun-Ju Tseng
Shauna Eggers
Dr. Jesse Martinez
Dr. George Watts
Dr. Bernie Futscher (AZCC)
Dr. Hua Su
Dr. Karin Quinones
Outline
• GeneScene overview
• Research directions
  - Text mining
  - Knowledge integration
  - Data mining
• GeneScene Visualizer
Knowledge Explosion: PubMed
[Charts: number of new PubMed publications per year, and accumulated publications, 1980 to 2003]
• Average number of new citations appearing in PubMed:
  - In 1980: 746/day
  - In 2004: 1,640/day
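The two daily averages above can be turned into approximate yearly totals and an implied growth rate. A minimal sketch, assuming 365-day years and constant compound growth between 1980 and 2004 (both are simplifications, not claims from the slide):

```python
# Daily averages of new PubMed citations, as quoted on the slide.
DAILY_1980 = 746
DAILY_2004 = 1640


def yearly_total(per_day: float, days: int = 365) -> float:
    """Approximate citations per year from a daily average (ignores leap years)."""
    return per_day * days


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    print(f"1980: ~{yearly_total(DAILY_1980):,.0f} citations/year")
    print(f"2004: ~{yearly_total(DAILY_2004):,.0f} citations/year")
    rate = compound_annual_growth(DAILY_1980, DAILY_2004, 2004 - 1980)
    print(f"Implied growth: {rate:.1%} per year")
```

Under those assumptions this gives roughly 272,000 new citations per year in 1980, about 599,000 in 2004, and an implied growth rate of about 3.3% per year.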
GeneScene Overview
• Motivation
  - Relieving information overload in biomedical research
  - Automating the processes of knowledge extraction and data analysis
• Focus: genetic regulatory pathways
  - Dissecting regulatory networks is crucial for a thorough understanding of biological processes
  - The complexity of biological networks raises challenges for computational research
• Research goals
  - To develop novel Natural Language Processing (NLP) techniques to support information extraction
  - To develop machine learning and data mining techniques to support high-throughput data analysis
  - To create an integrated framework for pathway-related knowledge representation and visualization
  - Ultimately, to provide biomedical researchers with a pathway-related knowledge discovery and integration platform
• Funding: NIH/NLM, 1 R33 LM07299-01 (May 2002 – April 2006)
GeneScene: Components & Scope
• Ontology-enhanced Knowledge Integration: to aggregate and consolidate pathway relations extracted from the literature and to integrate them with existing knowledge sources using biomedical ontologies
• Text Mining of Biomedical Literature: to automatically extract regulatory relations between biological entities from free text
• Data Mining for Genomic Studies: to extract regulatory pathway information based on genomic data & other resources
• Visualization of Regulatory Pathways: to facilitate accessing, understanding, and analyzing extracted pathway knowledge
Text Mining
• Extract all pathway-relevant relations from text
• Relations with gene or protein names on either end of the relation are extracted
• Two types of relations in GeneScene
  - Co-occurrence relations (Concept Space): relations between terms that often co-occur in a set of abstracts
    - HelpfulMed (Cancer Space)
  - Linguistic relations: precise & semantically rich relations from each abstract
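Co-occurrence (Concept Space) relations weight term pairs by how often they appear in the same abstracts. The sketch below uses a deliberately simplified asymmetric weight, W(a → b) = df(a, b) / df(a), where df counts abstracts; the actual Concept Space weighting also uses term-frequency and inverse-document-frequency factors, so treat this as an illustration of the idea rather than the HelpfulMed formula.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def concept_space(abstracts: Iterable[Set[str]]) -> Dict[Tuple[str, str], float]:
    """Asymmetric co-occurrence weights W(a -> b) = df(a, b) / df(a).

    df(a) counts abstracts containing term a; df(a, b) counts abstracts containing both.
    """
    df: Counter = Counter()
    co_df: Counter = Counter()
    for terms in abstracts:
        df.update(terms)
        for a, b in combinations(sorted(terms), 2):
            co_df[(a, b)] += 1
    weights: Dict[Tuple[str, str], float] = {}
    for (a, b), n_ab in co_df.items():
        weights[(a, b)] = n_ab / df[a]
        weights[(b, a)] = n_ab / df[b]
    return weights


def related_terms(weights: Dict[Tuple[str, str], float], term: str,
                  top_n: int = 5) -> List[Tuple[str, float]]:
    """Terms most strongly associated with `term`, highest weight first (ties by name)."""
    scored = [(b, w) for (a, b), w in weights.items() if a == term]
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top_n]
```

Because each weight is normalized by the source term's document frequency, it is asymmetric: a rare term that always appears with a common one gets a high weight toward it, but not the reverse.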
HelpfulMED Search of Medical Websites
HelpfulMED Search of Evidence-based Databases
[Screenshot callouts: What does the database cover? Search which databases? How many documents? Enter search term.]
Consulting HelpfulMED Cancer Space (Thesaurus)
[Screenshot callouts: enter search term; select relevant search terms; new terms are posted; search again, or find relevant webpages.]
Browsing HelpfulMED Cancer Map
[Screenshot sequence, steps 1 to 5: Visual Site Browser; top-level map; "Diagnosis, Differential"; "Brain Neoplasms"; "Brain Tumors".]
Chinese Medical Intelligence (CMI)
• Goal:
  - Providing medical and health information services to both researchers and the public
• Content:
  - 350,000 high-quality medical webpages collected from mainland China, Hong Kong, and Taiwan
  - Meta-search of 3 large general Chinese search engines
• Key Features:
  - Built-in Simplified/Traditional Chinese encoding conversion
  - Dynamic summarization for both Simplified and Traditional Chinese
  - Automatic categorization
  - Visualization using SOM (self-organizing map)
[Screenshot callouts: select websites from mainland China, Hong Kong, and Taiwan; select search engines from mainland China, Hong Kong, and Taiwan; results are from both Simplified and Traditional Chinese; Traditional Chinese results have been converted into Simplified Chinese; original encoding of the result; Simplified/Traditional Chinese summarization (Simplified Chinese summary, Traditional Chinese summary); Chinese folder display; Chinese visualization with SOM.]
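Simplified/Traditional handling has two separate parts: decoding the source encoding (commonly Big5 for Traditional and GB2312 for Simplified Chinese) and mapping characters between the two scripts. The sketch below uses Python's built-in `big5` and `gb2312` codecs; the five-entry character table is hypothetical, and CMI's actual conversion method is not described here.

```python
# Hypothetical five-entry Traditional -> Simplified table; a real converter needs a
# full character (and phrase) mapping, including one-to-many cases.
TRAD_TO_SIMP = str.maketrans({"醫": "医", "學": "学", "腫": "肿", "藥": "药", "療": "疗"})


def big5_to_gb2312(raw: bytes) -> bytes:
    """Decode Big5 (Traditional) bytes, map characters, re-encode as GB2312 (Simplified)."""
    text = raw.decode("big5")                  # bytes -> Unicode text
    simplified = text.translate(TRAD_TO_SIMP)  # character-level script mapping
    return simplified.encode("gb2312")         # raises on characters GB2312 cannot represent
```

A Traditional character missing from the table that GB2312 cannot represent makes `encode` raise `UnicodeEncodeError`. Some Simplified characters also correspond to several Traditional ones, so the reverse direction is not a simple table lookup.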
GeneScene Full Parser: Arizona Relation Parser (ARP)
• Syntax and semantics are combined in a single hybrid parsing grammar, as opposed to a pipelined approach
• Over 150 new word classes are introduced, while many of the original syntactic word classes (e.g., noun, verb) are retained
• The new word classes have semantic and lexical properties
• The semantic and syntactic properties of the new tags are not explicitly detailed in the dictionary; rather, they are determined by the parsing rules that define them
• The rules that apply to a tag reveal its syntactic and semantic roles
ARP: Architecture
[Diagram, three stages. Tagging: a sentence splitter, the AZ Phrase Tagger, and the AZ POS Tagger, drawing on a dictionary/lexicon, contextual & lexical rules, transition rules, and correction rules. Hybrid parsing: pre-processing, then a parser (FSA) that combines tags in a tag chart using grammar rules, followed by correction of parsing errors. Relation extraction: identify knowledge patterns and separate conjunctions (using knowledge patterns and conjunction rules), apply semantic constraints by identifying gene, hormone, and protein names (GO & HUGO), and write the relations to flat files.]
Architecture diagram for the parser, consisting of three main stages: tagging, parsing, and relation extraction
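A toy illustration of the tagging, parsing, and relation-extraction flow, producing the same (Entity 1, Negation, Connector, Entity 2) shape as ARP's output. This is not ARP: the word classes, the connector lexicon, and the single NP-connector-NP rule are hypothetical simplifications.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical word classes mixing syntax and semantics, loosely in the spirit of
# ARP's extended tag set (ARP itself uses 150+ classes and an FSA-based parser).
CONNECTORS = {  # surface form -> stem
    "induce": "induce", "induces": "induce", "induced": "induce",
    "suppresses": "suppress", "blocked": "block",
    "inhibits": "inhibit", "activates": "activate",
}
AUXILIARIES = {"do", "does", "did"}
NEGATIONS = {"not", "no"}
ADVERBS = {"also"}


@dataclass(frozen=True)
class Relation:
    entity1: str
    negated: bool
    connector: str
    entity2: str


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag(sentence: str) -> List[tuple]:
    """Attach one word class per token; unknown words are treated as NP material."""
    tagged = []
    for tok in re.findall(r"[A-Za-z0-9][\w-]*", sentence):
        low = tok.lower()
        if low in CONNECTORS:
            tagged.append((tok, "CONN"))
        elif low in AUXILIARIES:
            tagged.append((tok, "AUX"))
        elif low in NEGATIONS:
            tagged.append((tok, "NEG"))
        elif low in ADVERBS:
            tagged.append((tok, "ADV"))
        else:
            tagged.append((tok, "NP"))
    return tagged


def extract(sentence: str) -> Optional[Relation]:
    """Return the first NP-CONN-NP relation, noting any negation before the connector."""
    tagged = tag(sentence)
    for i, (tok, cls) in enumerate(tagged):
        if cls != "CONN":
            continue
        left = [w for w, c in tagged[:i] if c == "NP"]
        right = []
        for w, c in tagged[i + 1:]:
            if c != "NP":
                break
            right.append(w)
        if left and right:
            negated = any(c == "NEG" for _, c in tagged[:i])
            return Relation(" ".join(left), negated, CONNECTORS[tok.lower()], " ".join(right))
    return None


def extract_all(text: str) -> List[Relation]:
    return [r for r in map(extract, split_sentences(text)) if r is not None]
```

On "mutant p53 blocked E1A-induced apoptosis" this returns (mutant p53, False, block, E1A-induced apoptosis) but misses the embedded E1A-induces-apoptosis relation; nested relations like that are the kind of case ARP's richer word classes and grammar rules address.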
Problem: Gene Pathway
• Title: Key roles for E2F1 in signaling p53-dependent apoptosis and in cell division within developing tumors.
• Abstract: Apoptosis induced by the p53 tumor suppressor can attenuate cancer growth in preclinical animal models. Inactivation of the pRb proteins in mouse brain epithelium by the T121 oncogene induces aberrant proliferation and p53-dependent apoptosis. p53 inactivation causes aggressive tumor growth due to an 85% reduction in apoptosis. Here, we show that E2F1 signals p53-dependent apoptosis since E2F1 deficiency causes an 80% apoptosis reduction. E2F1 acts upstream of p53 since transcriptional activation of p53 target genes is also impaired. Yet, E2F1 deficiency does not accelerate tumor growth. Unlike normal cells, tumor cell proliferation is impaired without E2F1, counterbalancing the effect of apoptosis reduction. These studies may explain the apparent paradox that E2F1 can act as both an oncogene and a tumor suppressor in experimental systems.
[Figure: a domain expert's think-aloud protocol while reading this abstract, in three columns (Action, Protocols, Graphic Representation):
  - Reads "E2F1 signals p53-dependent apoptosis" and draws E2F1, p53, and apoptosis
  - Infers: "So, I'm assuming... a straight line pathway..."
  - Expert errs and corrects
  - Reads "E2F1 acts upstream of p53" and redraws the graph as E2F1 → p53 → apoptosis
  - Reads "E2F1 deficiency does not accelerate tumor growth" and adds tumor growth
  - Final graph]
Prepositions: OF/BY/IN
[Figure: finite-state automaton (states q0 to q18) for relations expressed with the prepositions OF, BY, and IN. Transitions are labeled with nominalizations (-ion), adjectives/nouns/-ed verbs, noun phrases (NP), verbs, auxiliaries (Aux), modifiers (mod), and negation.]
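The automaton above recognizes relations expressed through nominalizations and prepositions, such as "inhibition of p53 by MDM2". A minimal finite-state sketch of one such path, NOM of NP [by NP]; the states, token classes, and nominalization lexicon are illustrative, not ARP's actual q0 to q18 machine.

```python
from typing import Optional, Tuple

# Hypothetical nominalization lexicon: noun -> verb stem.
NOMINALIZATIONS = {
    "inhibition": "inhibit", "activation": "activate",
    "induction": "induce", "phosphorylation": "phosphorylate",
}

# (state, token class) -> next state. Illustrative states only.
TRANSITIONS = {
    ("q0", "NOM"): "q1",  # e.g. "inhibition"
    ("q1", "OF"): "q2",   # "of"
    ("q2", "NP"): "q3",   # theme, e.g. "p53"
    ("q3", "NP"): "q3",   # multi-word theme
    ("q3", "BY"): "q4",   # "by"
    ("q4", "NP"): "q5",   # agent, e.g. "MDM2"
    ("q5", "NP"): "q5",   # multi-word agent
}
ACCEPTING = {"q3", "q5"}


def classify(token: str) -> str:
    low = token.lower()
    if low in NOMINALIZATIONS:
        return "NOM"
    if low in ("of", "by", "in"):
        return low.upper()
    return "NP"


def parse_nominalization(phrase: str) -> Optional[Tuple[Optional[str], str, str]]:
    """Return (agent, action, theme) for 'NOM of NP [by NP]', or None if rejected."""
    state, action, theme, agent = "q0", "", [], []
    for tok in phrase.split():
        cls = classify(tok)
        nxt = TRANSITIONS.get((state, cls))
        if nxt is None:
            return None  # no transition: the phrase does not fit this pattern
        if cls == "NOM":
            action = NOMINALIZATIONS[tok.lower()]
        elif cls == "NP" and nxt == "q3":
            theme.append(tok)
        elif cls == "NP" and nxt == "q5":
            agent.append(tok)
        state = nxt
    if state not in ACCEPTING:
        return None
    return (" ".join(agent) or None, action, " ".join(theme))
```

A phrase without a "by" agent, such as "activation of p53 target genes", is still accepted in state q3 and returns no agent.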
Example Map (one abstract)
Arizona Relation Parser Output
Original sentence | Entity 1 | Negation | Connector | Entity 2
(1) wild-type p53 tumor suppressor protein, which induces […] apoptosis… | wild-type p53 tumor suppressor protein | False | induces | apoptosis
(2) Wt p53 also induced significant apoptosis | Wt p53 | False | also induced | significant apoptosis
(3) oncogene mutant p53 suppresses apoptosis | oncogene mutant p53 | False | suppresses | apoptosis
(4) mutant p53 blocked E1A-induced apoptosis | mutant p53 | False | blocked | E1A-induced apoptosis
(4), second relation | E1A | False | induced | apoptosis
(5) mutant p53 […] does not induce […] apoptosis | mutant p53 | True | does not induce | apoptosis
Text Mining Statistics (Jan. 2005)
Collection | P53 | AP1 | Yeast
Number of abstracts | 205,820 | 400,487 | 68,025
Number of abstracts with relations extracted | 87,903 | 90,773 | 28,971
Linguistic relations (full parser) | 182,499 | 172,116 | 54,805
Concept Space relations | 2,724,099 | 3,265,524 | 6,535,737
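Two ratios help read the table: the share of abstracts that yielded at least one linguistic relation, and the number of linguistic relations per productive abstract. A small sketch computing them from the table's counts:

```python
from typing import Dict

# Counts copied from the January 2005 statistics table above.
STATS: Dict[str, Dict[str, int]] = {
    "P53": {"abstracts": 205_820, "with_relations": 87_903, "linguistic": 182_499},
    "AP1": {"abstracts": 400_487, "with_relations": 90_773, "linguistic": 172_116},
    "Yeast": {"abstracts": 68_025, "with_relations": 28_971, "linguistic": 54_805},
}


def summarize(stats: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """Extraction yield and linguistic relations per productive abstract."""
    return {
        name: {
            "yield": s["with_relations"] / s["abstracts"],
            "relations_per_abstract": s["linguistic"] / s["with_relations"],
        }
        for name, s in stats.items()
    }


if __name__ == "__main__":
    for name, m in summarize(STATS).items():
        print(f"{name}: {m['yield']:.1%} of abstracts yielded relations, "
              f"{m['relations_per_abstract']:.2f} relations each")
```

This gives yields of about 43% (P53), 23% (AP1), and 43% (Yeast), and roughly 1.9 to 2.1 linguistic relations per productive abstract.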
Knowledge Integration: Organizing Relations
• Relations are more useful when well organized:
  - Multiple name strings of the same biological entities or processes are aggregated
  - Important contextual information is captured
  - Entities are cross-referenced to outside resources
• Well-organized relations should help with domain-appropriate analysis tasks
An Example: Context and Term Variation
• 4 somewhat contradictory PubMed abstract snippets*:
  - (1) wild-type p53 tumor suppressor protein, which induces […] apoptosis…
  - (2) Wt p53 also induced significant apoptosis
  - (3) oncogene mutant p53 suppresses apoptosis
  - (4) mutant p53 blocked E1A-induced apoptosis
• (1) and (2) say that “p53 induces apoptosis”
• (3) and (4) say that “p53 inhibits apoptosis”
* From PubMed documents 10594026, 8643473, 11809683, and 11375269
An Example: Context and Term Variation
• Analyzing the context more closely:
  - Wild-type (1) and wt (2) p53 are non-mutated
  - Mutant p53 in (3) and (4) is mutated
  - p53 protein (1) is a protein
  - Oncogene p53 (3) is a gene
• Identifying context is important in organizing extracted information. Words near “p53” suggest that while normal p53 induces apoptosis, mutated p53 inhibits it
* From PubMed documents 10594026, 8643473, 11809683, and 11375269
Biological Entity Recognition and Identification
• To aggregate relations, we need to recognize and identify biological entities
• Recognition finds substance references in text; identification matches those references to external resources (Tuason et al. 2004)
• Three key object recognition difficulties (Fukuda et al.; Palakal et al.):
  - Compound words
  - Ambiguous expressions
  - New or unknown words
Aggregation System Design
[Diagram: PubMed abstracts are parsed by the Arizona Relation Parser (ARP) into relational triples. Lexicon curation draws on HUGO, RefSeq, SGD, LocusLink, and GO to build feature lexicons and an aggregatable substance lexicon. The BioAggregate Tagger applies decompositional tagging to the relational triples, producing aggregated relations that are displayed in the Network Visualizer.]
A Decompositional Approach to Biomedical Concept Matching
• The BioAggregate tagger decomposes the name strings found in a relation by left-to-right, longest-first pattern matching, using domain-appropriate lexicons of feature-signaling terms
• Lexicons are built from existing resources and human-generated lists:
  - Substance names in LocusLink, RefSeq, HUGO, SGD, etc.
  - Biological processes in Gene Ontology
  - Lexicons of other features
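Left-to-right, longest-first matching can be sketched as a greedy scan that, at each position, tries the longest lexicon term first and falls back to shorter ones. The lexicon below is a hypothetical fragment using examples from these slides (p53/tp53/trp53, wild-type, mutant); the BioAggregate tagger's real lexicons and feature names may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical feature lexicon: term -> (feature, normalized value).
LEXICON: Dict[str, Tuple[str, str]] = {
    "wild-type": ("Mutation", "not mutated"),
    "wt": ("Mutation", "not mutated"),
    "mutant": ("Mutation", "mutated"),
    "oncogene": ("SubstanceType", "gene"),
    "protein": ("SubstanceType", "protein"),
    "tumor suppressor protein": ("SubstanceType", "protein"),
    "p53": ("AggregatableSubstance", "TP53"),
    "tp53": ("AggregatableSubstance", "TP53"),
    "trp53": ("AggregatableSubstance", "TP53"),
}
MAX_TERM_LEN = max(len(term.split()) for term in LEXICON)


def decompose(name: str) -> Tuple[Dict[str, str], List[str]]:
    """Scan left to right, always trying the longest lexicon term first."""
    tokens = name.lower().split()
    features: Dict[str, str] = {}
    unmatched: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            term = " ".join(tokens[i:i + n])
            if term in LEXICON:
                feature, value = LEXICON[term]
                features.setdefault(feature, value)  # keep the first value seen
                i += n
                break
        else:  # no lexicon term starts at token i
            unmatched.append(tokens[i])
            i += 1
    return features, unmatched
```

Trying "tumor suppressor protein" before "protein" consumes the whole phrase; a shortest-first scan would leave "tumor" and "suppressor" as unmatched residue.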
Features For Decompositional Matching
Feature lexicon: explanation
• Aggregatable Substance: A gene and its product(s). All references to a particular gene and its product(s) share the same Aggregatable Substance value, e.g., p53, tp53, and trp53.
• Mutation: Indicates the status of an aggregatable substance. It has only two values: mutated or not mutated (wild-type).
• Substance Type: The "type" of an aggregatable substance. Currently three types are recognized: gene, protein, and mRNA.
• Associator: Essentially verbs. This feature attempts to resolve verbs that occur in multiple forms but share the same stem; e.g., inhibit, inhibits, inhibited, and inhibiting all share the Connector Associator value "inhibit."
• Function: A biological process, such as apoptosis or angiogenesis (as in the biological_process list of Gene Ontology), or an action performed on an aggregatable substance, such as phosphorylation or inhibition.
• Species: The species/organism information associated with an entity or relation.
• Cellular Component: The sub-cellular component or location of an entity or relation.
• Stopword: Common words judged to meet the standard “ignoring this word will not mischaracterize pathway relations.”
P53 Testbed
• ARP extracted 182,499 relations from 87,903 PubMed abstracts related to the gene p53
• As extracted, the relations show very little overlap: they contain 142,974 distinct entity names and 127,397 distinct relational pairs
5 Levels of Aggregation Granularity (from more detailed to more abstract)
Aggregation level | Possible applications
Baseline (string match of entities and connector) | basis of comparison
Feature Match (feature synonym resolution) | detailed pathway analysis
Typed Substance (distinguish genes and proteins) | pathway analysis; granularity is comparable to some human-curated databases
Aggregatable Substance | explore the function of a gene and its gene products
Simple Pathway (substance/function; 4 categories for connectors) | high-level overviews and input to some machine learning algorithms
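Aggregation can be thought of as reducing each relation to a key that keeps fewer features at each level, then counting distinct keys. The sketch applies this to snippets (1) to (4); which features each level keeps, and the connector categories at the Simple Pathway level, are assumptions for illustration, not the published definitions.

```python
from typing import Callable, Dict, List

# Snippets (1)-(4) after decomposition; the feature values here are illustrative.
RELATIONS: List[Dict[str, str]] = [
    {"e1": "wild-type p53 tumor suppressor protein", "conn": "induces", "e2": "apoptosis",
     "substance": "TP53", "mutation": "not mutated", "type": "protein",
     "assoc": "induce", "func": "apoptosis"},
    {"e1": "Wt p53", "conn": "also induced", "e2": "significant apoptosis",
     "substance": "TP53", "mutation": "not mutated", "type": "unspecified",
     "assoc": "induce", "func": "apoptosis"},
    {"e1": "oncogene mutant p53", "conn": "suppresses", "e2": "apoptosis",
     "substance": "TP53", "mutation": "mutated", "type": "gene",
     "assoc": "suppress", "func": "apoptosis"},
    {"e1": "mutant p53", "conn": "blocked", "e2": "E1A-induced apoptosis",
     "substance": "TP53", "mutation": "mutated", "type": "unspecified",
     "assoc": "block", "func": "apoptosis"},
]

# Hypothetical connector categories for the Simple Pathway level (two of four shown).
CATEGORY = {"induce": "activate", "suppress": "inhibit", "block": "inhibit"}

# Hypothetical key per level: each level keeps fewer features than the one before.
LEVELS: Dict[str, Callable[[Dict[str, str]], tuple]] = {
    "baseline": lambda r: (r["e1"], r["conn"], r["e2"]),
    "feature_match": lambda r: (r["substance"], r["mutation"], r["type"], r["assoc"], r["func"]),
    "typed_substance": lambda r: (r["substance"], r["type"], r["assoc"], r["func"]),
    "aggregatable_substance": lambda r: (r["substance"], r["assoc"], r["func"]),
    "simple_pathway": lambda r: (r["substance"], CATEGORY[r["assoc"]], r["func"]),
}


def distinct_relations(relations: List[Dict[str, str]], level: str) -> int:
    """Count distinct relations once each is reduced to the given level's key."""
    return len({LEVELS[level](r) for r in relations})


def connector_views(relations: List[Dict[str, str]], with_mutation: bool) -> List[tuple]:
    """Simple Pathway edges, optionally split by the Mutation feature."""
    return sorted({(r["substance"], r["mutation"] if with_mutation else "any",
                    CATEGORY[r["assoc"]], r["func"]) for r in relations})
```

The four snippets collapse from 4 distinct relations at the baseline level to 2 at the Simple Pathway level. With the Mutation feature dropped, TP53 appears to both activate and inhibit apoptosis; keeping it separates non-mutated TP53 (activate) from mutated TP53 (inhibit), mirroring the context analysis of these snippets.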
Network Consolidation Results
[Chart: Distinct Items Identified in the 182,660 P53 Relations at Different Levels of Aggregation. Number of distinct entities, relations, and disjoint relations (0 to 60,000) at each aggregation level: Baseline, Feature Match, Typed Substance, Typed No Residual, Aggregatable Substance, Simple Pathway.]
• The numbers of distinct entities and relations are sharply reduced across the levels of aggregation
• When fewer relations are disjoint, the knowledge network encompasses more information
• Network density increased 20-fold
Genomic Data Mining in AI Lab
• Joint learning of genetic networks from
microarray data & existing knowledge
• Gene selection for cancer diagnosis
Gene Selection for Cancer Diagnosis
• Gene array data have been widely used for cancer classification/prediction (Golub et al., 1999; Ben-Dor et al., 2001)
• The major problems of gene array data (Model et al., 2001; Lu & Han, 2003):
  - High dimensionality (hundreds or thousands of genes)
  - Small number of available samples
  - Most genes are irrelevant to cancer distinction
  - Genes interact with each other
• It is important to identify the marker genes for cancer diagnosis
Experiment: Ovarian Cancer Diagnosis
• Ovarian cancer
  - 25,580 projected cases in 2004
  - 16,090 deaths estimated in 2004
  - 53% overall 5-year survival
  - 31% 5-year survival in those with distant metastases at diagnosis
  - 75% of cases diagnosed in late stages (III & IV)
• Predict survival of ovarian cancer: alive or dead?
  - Clinical measurements
    • Two attributes: stage & grade
  - Gene methylation level
    • Differentially methylated between normal and cancer tissues
Ovarian Cancer Methylation Array
• University of Iowa Gynecologic Oncology tumor bank (provided by Dr. Bernie Futscher at AZCC)
  - 114 DNA samples
    • 11 normal ovary
    • 19 Stage I
    • 18 Stage II
    • 24 Stage III
    • 17 Stage IV
    • 25 low malignant potential (LMP)
  - 6560 genes
• The top 1000 genes with the highest standard deviation across all samples are regarded as potentially relevant
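The variance filter described above keeps the genes whose methylation levels vary most across samples. A minimal sketch; it uses the sample standard deviation, and the input layout (one row of values per gene) is an assumption:

```python
import statistics
from typing import List, Sequence


def top_k_by_std(matrix: Sequence[Sequence[float]], gene_ids: Sequence[str], k: int) -> List[str]:
    """Return the k gene IDs whose values vary most across samples.

    matrix[g] holds gene g's methylation values over all samples; ties are broken by ID.
    """
    spreads = [(statistics.stdev(values), gid) for gid, values in zip(gene_ids, matrix)]
    spreads.sort(key=lambda t: (-t[0], t[1]))
    return [gid for _, gid in spreads[:k]]
```

In the study setting this would be called with the 6560-gene matrix and k = 1000.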
Gene Selection Techniques
• Individual gene ranking
  - F-statistics (Mendenhall & Sincich, 1995)
• Gene subset selection
  - Optimal search algorithms
    • Genetic algorithm (GA) (Holland, 1975)
    • Tabu search (TS) (Glover, 1986)
  - Evaluation criteria
    • Maximum relevance & minimum redundancy (MRMR) (Ding & Peng, 2003)
    • Support vector machine (SVM) (Vapnik, 1995)
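A minimal sketch of two ingredients named above: per-gene F-statistic ranking (one-way ANOVA F) and tabu search over gene subsets. The subset score is left to the caller (in the study it would be an MRMR- or SVM-based criterion); the one-gene add/drop neighborhood, the tenure, and the toy score in the test are assumptions, not the study's settings.

```python
import statistics
from typing import Callable, Dict, FrozenSet, Iterable, List, Sequence, Tuple


def f_statistic(values: Sequence[float], labels: Sequence[str]) -> float:
    """One-way ANOVA F: between-class mean square over within-class mean square."""
    groups: Dict[str, List[float]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    grand = statistics.fmean(values)
    means = {y: statistics.fmean(g) for y, g in groups.items()}
    between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items()) / (k - 1)
    within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g) / (n - k)
    return between / within if within > 0 else float("inf")


def tabu_search(candidates: Iterable[str], score: Callable[[FrozenSet[str]], float],
                start: Iterable[str], iterations: int = 50, tenure: int = 5
                ) -> Tuple[FrozenSet[str], float]:
    """Maximize score(subset) by adding or dropping one gene per move."""
    candidates = list(candidates)
    current = frozenset(start)
    best, best_score = current, score(current)
    tabu_until: Dict[str, int] = {}  # gene -> last iteration in which flipping it is tabu
    for it in range(iterations):
        moves = []
        for g in candidates:
            neighbor = current ^ {g}  # flip membership of one gene
            if not neighbor:
                continue
            s = score(neighbor)
            # Tabu moves are allowed only if they beat the best score so far (aspiration).
            if tabu_until.get(g, -1) >= it and s <= best_score:
                continue
            moves.append((s, g, neighbor))
        if not moves:
            break
        s, g, current = max(moves, key=lambda m: m[0])
        tabu_until[g] = it + tenure
        if s > best_score:
            best, best_score = current, s
    return best, best_score
```

The best neighbor is taken even when it is worse than the current subset, which lets the search climb out of local optima while the tabu list stops it from immediately undoing recent moves.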
Marker Genes for Survival Prediction
• Q1: Which genes can be used to predict the survival of ovarian cancer based on their methylation level?
Method | #F (genes) | N | Mean accuracy (%) | StDev
Full set | 1000 | 30 | 67.690 | 2.100
F-stat | 100 | 30 | 77.398 | 1.414
GA/MRMR | 57 | 30 | 76.199 | 2.018
GA/SVM | 39 | 30 | 80.263 | 1.592
TS/MRMR | 24 | 30 | 80.702 | 1.613
TS/SVM | 96 | 30 | 82.807 | 2.205
Pooled StDev = 1.847
[Plot: individual 95% CIs for each mean, based on the pooled StDev, on a 70.0 to 85.0 scale]
• Conclusion
  - TS/SVM selected 96 of the 1000 genes and achieved the highest prediction accuracy (82.807%)
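The "Individual 95% CIs For Mean Based on Pooled StDev" display corresponds to intervals of mean ± t·s_p/√n, where s_p is the pooled standard deviation. A sketch that recomputes s_p from the rounded per-method values in the Q1 table (it comes out at about 1.848, versus the reported 1.847) and builds the intervals; reading N as 30 runs per method, and the t value of about 1.974 for 174 degrees of freedom, are assumptions.

```python
import math
from typing import Dict, List, Tuple

# (method, #genes, N runs, mean accuracy %, StDev), copied from the Q1 table above.
Q1: List[Tuple[str, int, int, float, float]] = [
    ("Full set", 1000, 30, 67.690, 2.100),
    ("F-stat", 100, 30, 77.398, 1.414),
    ("GA/MRMR", 57, 30, 76.199, 2.018),
    ("GA/SVM", 39, 30, 80.263, 1.592),
    ("TS/MRMR", 24, 30, 80.702, 1.613),
    ("TS/SVM", 96, 30, 82.807, 2.205),
]
# Assumed two-sided 95% t quantile for 180 - 6 = 174 degrees of freedom (approximate).
T_975 = 1.974


def pooled_sd(rows: List[Tuple[str, int, int, float, float]]) -> float:
    """sqrt( sum (n_i - 1) * s_i^2 / (sum n_i - k) )."""
    num = sum((n - 1) * sd ** 2 for _, _, n, _, sd in rows)
    dof = sum(n for _, _, n, _, _ in rows) - len(rows)
    return math.sqrt(num / dof)


def confidence_intervals(rows: List[Tuple[str, int, int, float, float]],
                         t: float = T_975) -> Dict[str, Tuple[float, float]]:
    """Individual CIs: mean +/- t * s_p / sqrt(n), as in a one-way ANOVA printout."""
    sp = pooled_sd(rows)
    return {name: (mean - t * sp / math.sqrt(n), mean + t * sp / math.sqrt(n))
            for name, _, n, mean, _ in rows}
```

Under these assumptions the TS/SVM interval (about 82.1 to 83.5) does not overlap the TS/MRMR interval (about 80.0 to 81.4).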
Methylation vs. Clinical Diagnosis
• Q2: Will methylation-based methods perform better than clinical diagnosis in predicting the survival of ovarian cancer?
Method | #F (features) | N | Mean accuracy (%) | StDev
Clinical | 2 | 30 | 75.281 | 1.770
Full set | 1000 | 30 | 57.566 | 2.747
F-stat | 70 | 30 | 75.581 | 1.413
GA/MRMR | 40 | 30 | 75.506 | 1.756
GA/SVM | 46 | 30 | 79.813 | 0.205
TS/MRMR | 24 | 30 | 77.715 | 1.868
TS/SVM | 30 | 30 | 81.948 | 1.769
Pooled StDev = 1.790
[Plot: individual 95% CIs for each mean, based on the pooled StDev, with axis ticks at 63.0, 70.0, and 77.0]
• Conclusions: prediction accuracy
  - Full set < Clinical < Marker genes (selected by TS/SVM)
GeneScene Visualizer
• To provide a graphical presentation of large-scale regulatory networks composed of pathway relations extracted through text mining technologies
• Three testing collections
  - p53 (87,903 abstracts)
  - AP1 (90,773 abstracts)
  - Yeast (28,971 abstracts)
• Currently loading and parsing the entire PubMed for cancer pathways
  - ~7 million abstracts
GeneScene Visualizer: Functionality
• Searching: by specific elements, e.g., diseases
or genes
• Network-based exploration and navigation
• Accessing the underlying PubMed abstract
• Saving and loading search and visualization
results
• Various manipulations on the table and network
view of the retrieved relations: filter, sort, zoom,
highlight, isolate, expand, print, etc.
GeneScene Visualizer V1.5
Effect of Aggregation
• Same relations, before and after aggregation
[Screenshots: Baseline level; Simple Pathway level]
Effect of Aggregation: Mutation Feature
• When mutant and non-mutant TP53 are combined, an apparent conflict arises: TP53 both activates and inhibits MDM2.
• When the Mutation feature is selected, non-mutant TP53 is shown to activate MDM2 and mutant TP53 to inhibit it.
User Feedback
General Comments
• Interviewees were generally impressed with the features and usefulness of the system
  - “In my head I've been trying to do what this is doing for you.”
  - “It took me a few weeks just to find that Sin3 interacts with p53, where when you type this in [to GeneScene] it's right there.”
  - “Just playing around [with the system] I am seeing things that I didn't know before.”
  - “If this is the entire Medline, I would probably use it every time I search.”
  - “I think a lot of people would get a lot of use out of this, as long as it doesn't scare them off in the beginning.”
Lessons Learned
• Biomedical information is precise, but terminologies are fluid
• Biomedical professionals need search and analysis help
• Biomedical linguistic parsing and ontologies are promising for biomedical text mining
• There is a need to integrate biomedical data (gene microarray) and text mining (literature)
Ongoing Research
• Combining bottom-up data mining
(MicroArray/Methylation) with top-down text
mining results
• Creating CancerPath testbed for cancer
genomic network visualization
• Biological networks topological analysis (growth,
preferential attachment)
• Other biomedical applications: plant science
pathway (Arabidopsis; Galbraith Lab); infectious
disease surveillance
BioPortal: Infectious Disease Information Sharing, Surveillance, Analysis, and Visualization
Research Partners and Supports
• University of Arizona
• University of California, Davis
• Kansas State University
• University of Utah
• Arizona Department of Public Health
• New York State Department of Health/HRI
• California Department of Health Services/PHFE
• U.S. Geological Survey
• The SIMI Group
• National Taiwan University
• NSF
• CIA/ITIC
• DHS
• DOD/AFMIC
• CDC
• AZDPS
UA Team Members
• Dr. Hsinchun Chen
• Dr. Daniel Zeng
• Lu Tseng
• Cathy Larson
• Kira Joslin
• Wei Chang
• James Ma
• Hsinmin Lu
• Ping Yan
• Aaron Sun
• Keith Alcock
• Sapna Brahmanandam
• Milind Chabbi
• Yuan Wang
Outline
• Project Background
• BioPortal Achievements
  - System Architecture
  - System Functionalities
  - BioPortal Collaboration Framework
• New Developments
  - International Foot-and-mouth Disease Monitoring
  - Syndromic Surveillance
  - Disease Contact Tracing
BioPortal Project Goals
• Demonstrate and assess the technical feasibility and
scalability of an infectious disease information
sharing (across species and jurisdictions), alerting,
and analysis framework.
• Develop and assess advanced data mining and
visualization techniques for infectious disease data
analysis and predictive modeling.
• Identify important technical and policy-related
challenges in developing a national infectious disease
information infrastructure.
Information Sharing Infrastructure Design
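[Diagram: data providers connect to BioPortal through adaptors: NYSDOH over the PHINMS network, CADHS over an XML/HL7 network, and new partners through additional adaptors, with SSL/RSA-secured links. Incoming data pass through a Data Ingest Control Module (cleansing/normalization) into the Info-Sharing Infrastructure and its Portal Data Store (MS SQL 2000).]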
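Each provider's feed passes through an adaptor and the cleansing/normalization step before reaching the portal data store. A minimal sketch of that idea: per-provider adaptors map differently shaped records into one common schema. All field names, date formats, and the `CaseRecord` schema are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical common schema used inside the portal data store."""
    source: str
    disease: str
    species: str
    county: str
    report_date: date


def from_nysdoh(raw: Dict[str, str]) -> CaseRecord:
    # Hypothetical NYSDOH field names; dates assumed to be MM/DD/YYYY.
    return CaseRecord(
        source="NYSDOH",
        disease=raw["DISEASE"].strip().upper(),
        species=raw["SPECIES"].strip().lower(),
        county=raw["COUNTY"].strip().title(),
        report_date=datetime.strptime(raw["DATE"].strip(), "%m/%d/%Y").date(),
    )


def from_cadhs(raw: Dict[str, str]) -> CaseRecord:
    # Hypothetical CADHS field names; dates assumed to be ISO 8601 (YYYY-MM-DD).
    return CaseRecord(
        source="CADHS",
        disease=raw["disease"].strip().upper(),
        species=raw["host"].strip().lower(),
        county=raw["county"].strip().title(),
        report_date=date.fromisoformat(raw["date"].strip()),
    )


ADAPTORS: Dict[str, Callable[[Dict[str, str]], CaseRecord]] = {
    "NYSDOH": from_nysdoh,
    "CADHS": from_cadhs,
}


def ingest(source: str, raw_records: Iterable[Dict[str, str]]) -> List[CaseRecord]:
    """Route each raw record through its provider's adaptor (cleansing/normalization)."""
    adaptor = ADAPTORS[source]
    return [adaptor(r) for r in raw_records]
```

Adding a new jurisdiction then means writing one more adaptor function and registering it, without touching the data store schema.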
Data Access Infrastructure Design
[Diagram: public health professionals, researchers, policy makers, law enforcement agencies, and other users reach the WNV-BOT Portal through a browser (IE/Mozilla/…) over an SSL connection. Portal functions: data search and query, spatial-temporal visualization, analysis/prediction, alert management (HAN or personal), and dataset privileges management. These run on a web server (Tomcat 4.21 / Struts 1.2) with a user access control API (Java) over the data store (MS SQL 2000), which holds the data and the access privilege definitions.]
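The data access layer checks dataset privileges before serving queries. BioPortal's access control API is written in Java and its interface is not described here; this Python sketch only illustrates a deny-by-default dataset privilege check, with hypothetical user groups and privilege levels (the dataset names are borrowed from the slides).

```python
from enum import IntEnum
from typing import Dict, Tuple


class Privilege(IntEnum):
    NONE = 0
    AGGREGATE = 1  # summary counts only
    RECORD = 2     # individual case records


# Hypothetical access privilege definitions: (user group, dataset) -> granted level.
PRIVILEGES: Dict[Tuple[str, str], Privilege] = {
    ("ny_public_health", "NY_DeadBird"): Privilege.RECORD,
    ("researcher", "NY_DeadBird"): Privilege.AGGREGATE,
    ("researcher", "CA_Mosquito"): Privilege.AGGREGATE,
}


def can_access(group: str, dataset: str, needed: Privilege) -> bool:
    """Deny by default; allow when the granted level is at least the needed level."""
    return PRIVILEGES.get((group, dataset), Privilege.NONE) >= needed
```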
Datasets Integrated: WNV, BOT
Index | Dataset | Test data available | Data duration (MM/YY) | Data size | Spatial granularity | Temporal granularity
1 | [NY] WNV Human | Yes | Test data | 574 | Zip | Date
2 | [NY] Dead Bird | Yes | Test data | 942 | Lat/Long (shifted) |
3 | [NY] Mosquito | Yes | Test data | 815 | County | Date
4 | [NY] WNV Captive Animal | Yes | Test data | 39 | Zip | Date
5 | [NY] Botulism Human | Yes | Test data | 10 | Zip | Date
6 | [CA] WNV Human | Yes | 09/03–10/03 | 186 | County | Date
7 | [CA] Dead Bird | Yes | 01/03–10/03 | 3032 | City/zip | Minutes
8 | [CA] Chicken Sera | Yes | 04/03–10/03 | 18887 | Site | Date
9 | [CA] Mosquito Pool | Yes | 01/98–10/03 | 3518 | Site | Date
10 | [CA] Botulism | Yes | 01/01–12/02 | 53 | Zip | Date
11 | [USGS] EPIZOO - Preliminary | Yes | 07/99–09/03 | 46 events | County | Date
12 | [USGS] EPIZOO - WNV | Yes | 08/1999–07/2004 | 113 events | County | Date
13 | [USGS] EPIZOO - Botulism | Yes | 12/1989–12/2004 | 702 events | County | Date
14 | [UC Davis] FMD | Yes | 1996–2003 | 3288 | Site/Province | Date/Month
15 | International FMD | Yes | 01/1982–03/2005 | 6789 | Province | Non-temporal
16 | BioWatch | Yes | 1/10–1/17 2004 | 480 | Exact site | Date
17 | [CA] Mosquito Treatment | Yes | 1/14–11/30 2004 | 6194 | Exact site | Date
18 | National Infant Botulism | Yes | 1/16–11/25 2004 | 15 | Zip | Date
Communications/Messaging
• Scalable, flexible, lightweight, and extensible. Easy to include:
  - New diseases
  - New jurisdictions
  - New techniques!
• Messaging infrastructure (installed and tested)
  - NYSDOH-UA: PHIN MS
  - CADHS-UA: Regional message broker
  - NWHC-UA: PHIN MS
• XML generation/conversion
  - NY_DeadBird, NY_Alerts, NY_BotHuman, NY_WNVHuman, NY_CaptiveAnimal, NY_Mosquito
  - CA_BotHuman, CA_WNVHuman, CA_DeadBird, CA_Chicken, CA_Mosquito
  - USGS_Epizoo
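The XML generation step can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names below are hypothetical; BioPortal's actual message schemas (and any HL7 mapping) are not described here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable


def dead_bird_to_xml(records: Iterable[Dict[str, object]]) -> str:
    """Serialize dead-bird reports; the element and attribute names are hypothetical."""
    root = ET.Element("NY_DeadBird")
    for rec in records:
        report = ET.SubElement(root, "Report", id=str(rec["id"]))
        for field in ("species", "county", "date", "wnv_result"):
            ET.SubElement(report, field).text = str(rec[field])
    return ET.tostring(root, encoding="unicode")
```

Passing `encoding="unicode"` makes `ET.tostring` return a `str` rather than bytes, which is convenient before handing the message to a transport layer.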
Spatio-Temporal Data Mining &
Hotspot Analysis
• A hotspot is a condition indicating some form of
clustering in a spatial and temporal distribution
(Rogerson & Sun 2001; Theophilides et al. 2003; Patil & Tailie 2004).
• For WNV, localized clusters of dead birds typically
identify high-risk disease areas (Gotham et al. 2001).
• Automatic detection of dead bird clusters using
hotspot analysis can help predict disease outbreaks
and aid in effective allocation of prevention/control
resources.
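SaTScan searches circular windows for an excess of cases using a likelihood ratio. The sketch below is a simplified, single-pass circular scan using Kulldorff's Bernoulli log-likelihood ratio, treating new-period reports as cases and baseline-period reports as controls (one plausible reading of a baseline/case-period setup). It omits SaTScan's Monte Carlo significance testing, and it is not RNNH, which is a nearest-neighbor hierarchical clustering method.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def _xlogy(x: float, y: float) -> float:
    """x * log(y), with the convention 0 * log(0) = 0."""
    return x * math.log(y) if x > 0 else 0.0


def bernoulli_llr(c: int, n: int, C: int, N: int) -> float:
    """Bernoulli log-likelihood ratio for a window holding c of the C cases among
    n of the N points; 0 unless the case proportion inside exceeds the one outside."""
    if n == 0 or n == N or c / n <= (C - c) / (N - n):
        return 0.0
    inside = _xlogy(c, c / n) + _xlogy(n - c, (n - c) / n)
    out_n, out_c = N - n, C - c
    outside = _xlogy(out_c, out_c / out_n) + _xlogy(out_n - out_c, (out_n - out_c) / out_n)
    null = _xlogy(C, C / N) + _xlogy(N - C, (N - C) / N)
    return inside + outside - null


def circular_scan(cases: List[Point], controls: List[Point], max_fraction: float = 0.5
                  ) -> Tuple[float, Optional[Point], List[Point]]:
    """Grow circles around every point to include its nearest neighbours and return
    (best LLR, centre, points inside). No significance test is performed."""
    points = [(p, 1) for p in cases] + [(p, 0) for p in controls]
    N, C = len(points), len(cases)
    best: Tuple[float, Optional[Point], List[Point]] = (0.0, None, [])
    for centre, _ in points:
        ordered = sorted(points, key=lambda q: math.dist(centre, q[0]))
        c = 0
        for n, (_, is_case) in enumerate(ordered[: int(max_fraction * N)], start=1):
            c += is_case
            llr = bernoulli_llr(c, n, C, N)
            if llr > best[0]:
                best = (llr, centre, [q for q, _ in ordered[:n]])
    return best
```

In practice the highest-scoring window would then be compared against scores from randomly relabeled data before being reported as a hotspot.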
Case Study (NY WNV)
[Timeline: dates March 5, May 26, and July 2, dividing the data into a baseline period and a new-case period; record counts shown: 140 and 224.]
• On May 26, 2002, the first dead bird with WNV was found in NY
• Based on NY's test dataset
Analysis results from SaTScan and RNNH
• SaTScan picks a large cluster: 71 new cases, 7 baseline cases
• RNNH picks a small cluster: 53 new cases, 6 baseline cases
[Screenshots: NY dead bird analysis, baseline + new cases in 2002; hotspots (SaTScan #1, SaTScan #2, RNNH) around a high-density area; zoom-in showing a close-up of the hotspots and the results in the zoomed-in area.]
Dead Bird Hotspots Identified
WNV/BOT BioPortal
Acknowledgment: NSF, ITIC, NYSDH, CDHS, USGS (Drs. Kvach and Ascher)
[Screenshots of the WNV/BOT portal: user login; user main page with the available dataset list; dataset name; advanced spatial/temporal search criteria (time range, county/state, positive cases); choose WNV disease data; select CA dead bird, chicken, and NY dead bird data; select background maps (NY/CA population, rivers, and lakes); results listed in a table; start STV.]
[Spatial-temporal visualizer (STV) screenshots, NY dead bird data: GIS, timeline, and periodic pattern views with a control panel; specify bird species; zoom in on NY; view all year 2001 data; move the time slider with a 2-week window and with a 3-year window. Temporal pattern: over the 3-year span, cases are concentrated in May/June, with a similar pattern each year. Spatial pattern (with the population map overlay enabled): dead bird cases migrate from Long Island into upstate NY and distribute along populated areas near the Hudson River until the season ends.]
BioPortal HotSpot Analysis: RSVC, SaTScan, and CrimeStat integrated (first visual, real-time hotspot analysis system for disease surveillance)
• West Nile virus in California
Hotspot Analysis-Enabled STV
[Screenshot callouts: regular STV; select target geographic area; select baseline and case periods; select algorithms; hotspots found!; select a hotspot to highlight its case points.]
International FMD BioPortal
Acknowledgment: DHS, DOD, UC Davis (Drs. Thurmond and Lynch)
International FMD BioPortal Goals
• Real-time, web-based situational awareness of FMD
outbreaks worldwide through the establishment of an
international information sharing and analysis system
• FMDv characterization at the genomic level integrated with
associated epidemiological information and modeling tools to
forecast national, regional, and/or international spread and
the prospect of import into the U.S. and the rest of North America
• Web-based crisis management of resources—facilities,
personnel, diagnostics, and therapeutics
Research Plans
• Global FMD epidemiological data
  - (Near) real-time data collection
  - Web-based information sharing and analysis
• International FMD news
  - Indexed collection of global FMD news
  - Search and visualization of the FMD news via the web
• FMD genetic/sequence data
  - Predictive model using phylogenetic, spatial, and temporal information to stop FMD at the border
  - Visualization of FMD events in time, space, and genetic space
Preliminary Global FMD Dataset
• Provider: UC Davis FMD Lab
• Information sources: reference labs and OIE
• Coverage: 28 countries globally
• Time span: May 1905 – March 2005
• Dataset size: 30,000+ records, of which 6789 records are complete
• Host species: Cattle, Caprine, Ovine, Bovine, Swine, NK, Elephant, Buffalo, Sheep, Camelidae, Goat
Regionwise Distribution of FMD Data
[Pie charts. By region: South America 66%, Central and South Asia 15%, Europe 14%, Middle East Asia 4%, Africa 1%. By host species: Bovine 37%, Ovine 37%, Swine 11%, Cattle 5%, Caprine 4%; Buffaloes, Goats, Sheep, Elephant, and Camelidae 3% or less each.]
Global FMD Coverage in BioPortal
FMD Migration Visualization using BioPortal (cases in South Asia)
[Screenshot callout: FMD cases travel back and forth between countries]
International FMD News
• Provider: UC Davis FMD Lab
• Information sources: Google, Yahoo, and open Internet sources
• Time span: Oct 4, 2004 – present (real-time messaging under development)
• Data size: 460 events (6/21/05)
• Coverage: 51 countries (Africa: 11, Asia: 16, Europe: 12, Americas: 12)
[Pie chart of events by region: America 27%, Europe 27%, Asia 15%, Australia 14%, Africa 11%, undefined 5%, "Aisa" 1%]
Searching FMD News
• http://fmd.ucdavis.edu/
• Searchable by:
  - Date range
  - Country
  - Keyword
Visualizing FMD News on BioPortal
FMD Genetic Information Analysis
• Genome clustering analysis
  - Phylogenetic clustering
  - Spatial clustering
  - Temporal clustering
• Hotspot detection among gene sequences
  - Create a tree structure based on the semantic distance between gene sequences
  - Automatically detect the dense portion of the tree
  - Identify the connection between the semantic cluster and the geographic pattern of gene sequences
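As a simple stand-in for tree-based "dense portion" detection, the sketch below computes p-distances (the proportion of differing sites) between aligned sequences and links isolates whose distance falls under a threshold (single linkage). This is not MEGA3's tree-building method; the threshold and the toy sequences in the test are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage_groups(seqs: Dict[str, str], threshold: float) -> List[List[str]]:
    """Group isolates connected by chains of pairwise p-distances <= threshold."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

The resulting groups could then be color-coded on the map, in the spirit of the phylogenetic-group coloring described next.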
FMD Genetic Visualization
• Goal: Extend STV to incorporate a third dimension, phylogenetic distance
  - Include a phylogenetic tree
  - Identify phylogenetic groups and color-code the isolate points on the map
  - Leverage available NCBI tools such as BLAST
• Proof of concept: SAT 2 & 3 analysis
  - Data: 54 partial DNA sequence records in South Africa received from UC Davis FMD Lab (Bastos, A.D. et al. 2000, 2003)
  - Date range: 1978–1998
  - Countries covered: South Africa, Zimbabwe, Zambia, Namibia, Botswana
Sample FMD Sequence Records
[Screenshots: color-coded view (MEGA3); textual view of gene sequence]
Phylogenetic Tree of Sample FMD Data
[Figure: phylogenetic tree with Groups 1 to 6 marked; 6 groups identified within 2 major families (MEGA3; based on sequence similarity)]
Genetic, Spatial, and Temporal Visualization of FMD Data
[Screenshot callouts: color-coded phylogenetic tree; color-coded isolate locations; isolate appearances in time]
FMD Time Sequence Analysis
[Screenshot callouts: first-family cases appeared throughout the period; second-family cases existed before 1993 and reappeared after 1997]
FMD Periodic Pattern Analysis
[Screenshot callout: second-family cases concentrated in February, while first-family cases spread evenly]
Locations of Family 1 records: selected only groups 1, 2, and 3 and found a spatial cluster
Locations of Family 2 records: selected only groups 4, 5, and 6; sparse isolate locations
Syndromic Surveillance
2015/10/12
Chief Complaints As a Data Source
• Chief complaints (CCs) are short free-text phrases entered by triage practitioners describing the reason for a patient’s ER visit
 Examples: lt foot pain [left foot pain]; cp [chest pain]; sob [shortness of breath]; so [should be ‘sob’]; poss uti [possibly urinary tract infection]
• Advantages of using CCs for surveillance purposes
 Timeliness: diagnosis results become available on average 6 hours later than CCs
 Availability and low cost: most hospitals have free-text CCs available in electronic form
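The normalization implied by these examples can be sketched as token-level abbreviation expansion. This is a minimal illustration, assuming a small hypothetical synonym table built only from the examples on this slide; it is not the EMT-P synonym list or any production system's rule set.

```python
import re

# Hypothetical synonym table seeded with the slide's examples; a real system
# would use a curated resource instead.
SYNONYMS = {
    "lt": "left",
    "cp": "chest pain",
    "sob": "shortness of breath",
    "so": "shortness of breath",  # typo for "sob", per the example above
    "poss": "possibly",
    "uti": "urinary tract infection",
}


def normalize_chief_complaint(cc: str) -> str:
    """Lower-case, tokenize, and expand known abbreviations token by token."""
    tokens = re.findall(r"[a-z0-9/]+", cc.lower())
    return " ".join(SYNONYMS.get(token, token) for token in tokens)


for raw in ["lt foot pain", "cp", "so", "poss UTI"]:
    print(f"{raw!r} -> {normalize_chief_complaint(raw)!r}")
```

Mapping "so" unconditionally would also rewrite the ordinary English word "so", which is why a lookup table like this normally needs context rules alongside it.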
Existing CC Classification Methods
• Keyword match + synonym list + mapping rules: DOHMH (NY City), EARS; Mikosz et al. (2004)
• Weighted keyword match (vector cosine method) + mapping rules: ESSENCE; Sniegoski (2004)
• Naïve Bayesian: RODS; Olszewski (2003), Ivanov et al. (2002)
• Bayesian network: N/A; Chapman et al. (2004)
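To make the "weighted keyword match (vector cosine method)" entry concrete, here is a minimal sketch: the CC and each syndrome's keyword profile are treated as term-weight vectors, and the CC is assigned to the syndrome with the highest cosine similarity. The keyword profiles, weights, and threshold are hypothetical, and the mapping rules that systems such as ESSENCE add on top are omitted.

```python
import math
from collections import Counter

# Hypothetical weighted keyword profiles (not ESSENCE's actual lists)
SYNDROME_PROFILES = {
    "respiratory": {"cough": 1.0, "shortness": 0.8, "breath": 0.8, "wheezing": 0.7},
    "gastrointestinal": {"vomiting": 1.0, "diarrhea": 1.0, "nausea": 0.8, "abdominal": 0.6},
}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify_cc(cc: str, threshold: float = 0.1) -> str:
    """Assign the syndrome whose profile is most similar to the CC's term counts."""
    vector = {term: float(n) for term, n in Counter(cc.lower().split()).items()}
    syndrome, score = max(
        ((name, cosine(vector, profile)) for name, profile in SYNDROME_PROFILES.items()),
        key=lambda pair: pair[1],
    )
    return syndrome if score >= threshold else "unclassified"


print(classify_cc("cough and shortness of breath"))
print(classify_cc("ankle sprain"))
```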
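Overall System Design
[Figure: three-stage pipeline mapping chief complaints to symptoms, symptom groups, and syndromes]
• Stage 1: CC standardization (chief complaints → symptoms), using the EMT-P synonym list and UMLS concepts
• Stage 2: Symptom grouping (symptoms → EMT-P symptom groups), using the EMT-P symptom grouping table, the UMLS ontology, and a weighted semantic similarity score
• Stage 3: Syndrome classification (symptom groups → syndromes), using the EARS syndrome rules and EARS symptom table in the JESS rule engine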
A Stage 2 Example: CC Concepts → Symptom Group Concepts
[Figure: UMLS paths from CC concepts (e.g., coagulopathy, purpura, blood in urine, ureteral stone, coma, ecchymosis) to symptom group concepts, with path lengths of 4, 5, and 6]
• bleeding = 1/4 + 1/5 + 1/6 ≈ 0.62
• other = 1/5 = 0.2
• coma = 1/5 = 0.2
• dead = 1/5 = 0.2
• altered_mental_status = 1/5 = 0.2
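The arithmetic for "bleeding" suggests a simple rule: each ontology path from a CC concept to a symptom-group concept contributes the reciprocal of its length, and contributions to the same group are summed. A minimal sketch under that reading, assuming the UMLS paths and their lengths have already been extracted (that step is not shown, and the function name is hypothetical):

```python
from collections import defaultdict


def weighted_semantic_scores(paths: list[tuple[str, int]]) -> dict[str, float]:
    """Score each symptom-group concept as the sum of 1/length over the
    ontology paths that reach it from the chief complaint's concepts."""
    scores: defaultdict[str, float] = defaultdict(float)
    for group, length in paths:
        if length <= 0:
            raise ValueError("path length must be a positive number of steps")
        scores[group] += 1.0 / length
    return dict(scores)


# Path lengths taken from the scores on the slide
example_paths = [
    ("bleeding", 4), ("bleeding", 5), ("bleeding", 6),
    ("other", 5), ("coma", 5), ("dead", 5), ("altered_mental_status", 5),
]
scores = weighted_semantic_scores(example_paths)
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))  # bleeding 0.62
```

Summing over paths rewards groups reached by several independent routes, which is why "bleeding" outscores groups reached by a single path of length 5.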
Summary of Stage 2 Performance
[Figure: two pie charts. First chart: the 3,001 concepts contained in 1,835 CC records from Stage 1 (segments of 53%, 44%, and 3%). Second chart: the 417 unique concepts (segments of 64%, 25%, and 11%). Both split concepts into those covered by the EARS/EMT-P, additional coverage suggested by our weighted semantic similarity score approach, and unidentified concepts.]
BioPortal – Taiwan Syndromic
Surveillance
Multi-lingual Chief Complaints:
Chinese Example
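• Data characteristics:
 Mixed expressions in both Chinese and English
• 頭痛;頭暈;FEVER;腹痛;噁心嘔吐多次;旅遊史(無) [headache; dizziness; FEVER; abdominal pain; repeated nausea and vomiting; travel history (none)]
• 車禍,導致左手背A/W,疼痛不適,咳嗽有痰 [car accident causing A/W on the back of the left hand; pain and discomfort; cough with sputum]
• 18% of CC records from NTU Med. Center contain Chinese expressions.
• Some hospitals have 100% of CC records in Chinese (for example, 馬偕紀念醫院 [Mackay Memorial Hospital])
 Misspellings and typographical errors are not a serious problem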
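Separating the Chinese and English parts of such mixed records can be approximated with Unicode ranges. A minimal sketch, assuming Chinese characters fall in the CJK Unified Ideographs block (U+4E00 to U+9FFF) and that expressions are delimited by whitespace or ASCII/full-width punctuation; the function name and separator set are illustrative, not BioPortal's implementation.

```python
import re

# Runs of CJK Unified Ideographs (U+4E00 to U+9FFF), covering common Chinese characters.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")
# Separators: whitespace plus ASCII and full-width ; , ( )
SEPARATORS = re.compile(r"[\s;,()\uff0c\uff1b\uff08\uff09]+")


def split_languages(cc: str) -> tuple[list[str], list[str]]:
    """Return (Chinese expressions, non-Chinese tokens) found in one CC."""
    chinese = CJK_RUN.findall(cc)
    remainder = CJK_RUN.sub(" ", cc)  # blank out Chinese runs, keep the rest
    english = [token for token in SEPARATORS.split(remainder) if token]
    return chinese, english


print(split_languages("頭痛;頭暈;FEVER;腹痛;噁心嘔吐多次;旅遊史(無)"))
print(split_languages("車禍,導致左手背A/W,疼痛不適,咳嗽有痰"))
```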
Chinese CC Preprocessing: System Design
[Figure: preprocessing pipeline from raw Chinese CCs to translated Chinese phrases]
• Stage 0.1: Separate Chinese and English expressions in the raw CCs
• Stage 0.2: Chinese phrase segmentation, using Chinese medical phrases, common Chinese phrases, and mutual information, to produce segmented Chinese phrases
• Stage 0.3: Chinese phrase translation, using a Chinese-to-English dictionary, to produce translated Chinese phrases
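The slides list mutual information as an input to phrase segmentation but do not give the formula. One common formulation is pointwise mutual information (PMI) between adjacent characters: a phrase boundary is placed wherever two characters co-occur no more often than chance would predict. The sketch below assumes that formulation, a toy training corpus, and a threshold of 0; it omits the medical and common phrase lists the design also uses, and all names are hypothetical.

```python
import math
from collections import Counter


def train_char_counts(corpus: list[str]) -> tuple[Counter, Counter]:
    """Count single characters and adjacent character pairs in a phrase corpus."""
    chars: Counter = Counter()
    pairs: Counter = Counter()
    for text in corpus:
        chars.update(text)
        pairs.update(text[i:i + 2] for i in range(len(text) - 1))
    return chars, pairs


def segment(text: str, chars: Counter, pairs: Counter, threshold: float = 0.0) -> list[str]:
    """Split `text` between adjacent characters whose PMI,
    log2 p(xy) / (p(x) p(y)), is at or below `threshold`."""
    if not text:
        return []
    n_chars, n_pairs = sum(chars.values()), sum(pairs.values())

    def pmi(x: str, y: str) -> float:
        if pairs[x + y] == 0:
            return float("-inf")  # never seen together: always a boundary
        p_xy = pairs[x + y] / n_pairs
        return math.log2(p_xy / ((chars[x] / n_chars) * (chars[y] / n_chars)))

    words = [text[0]]
    for prev, cur in zip(text, text[1:]):
        if pmi(prev, cur) > threshold:
            words[-1] += cur  # strongly associated pair: same phrase
        else:
            words.append(cur)  # weak association: start a new phrase
    return words


# Hypothetical training phrases
corpus = ["頭痛", "頭痛", "腹痛", "頭暈", "咳嗽", "咳嗽"]
chars, pairs = train_char_counts(corpus)
print(segment("頭痛咳嗽", chars, pairs))
```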
Result: Self Validation
• Applied the 280 translations to 1,978 chief complaints from Hospital A
• 1,610 (82%) records are in English
• 368 (18%) records contain Chinese
• 36% contain trivial information
E.g., “r/o septic shock 外院轉入” [r/o septic shock, transferred from another hospital]
• 64% contain non-trivial information
E.g., “poor intake and 味覺喪失” [poor intake and loss of taste]
• 67% have a complete translation
• 2% have a partial translation
• 20% have no translation
General Grouping
• Taiwan surveillance data visualization: 2.2M+ scrubbed chief complaint records
Group by Hospital
Group by Syndrome Classification
Disease Outbreak Detection Using
Chief Complaints
• Markov Switching
Model
Data Source
• Emergency Department Free-text Chief
Complaints (CCs)
 Medical practitioners use both Chinese and
English to record CCs
 368,151 records; 23.77% contain Chinese characters
 Time period: 2000-6-30 to 2003-4-27
• Use the BioPortal Multilingual CC Classifier to classify CC records into syndromes
Syndrome Prevalence
• Botulism-Like (1.4%)
• Constitutional (25.4%)
• Gastrointestinal (26.4%)
• Hemorrhagic (6.4%)
• Neurological (14.1%)
• Rash (2.4%)
• Respiratory (17.8%)
 Upper Respiratory (7.3%)
 Lower Respiratory (12.7%)
• Fever (18%)
• Other (34.9%)
• Choose Resp. and GI syndromes for further analysis
 Two syndromes with high prevalence
 Can be extended to other syndromes
GI Syndrome Time Series
[Figure: Gastrointestinal syndrome counts, 2001 to 2003 (y-axis roughly 60 to 180), and the autocorrelation function (ACF) of the series for lags 0 to 400]
GI Syndrome Time Series (cont’d)
• Strong day-of-week effect
• Seasonal effect is less strong
• Sporadic jumps
• Seems to have 1 to 2 peaks per year
[Figure: GI time series (counts)]
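A strong day-of-week effect shows up in the ACF as peaks at lags 7, 14, 21, and so on. A minimal sketch of the standard sample autocorrelation in plain Python; the synthetic series (five ordinary days and a two-day bump, repeated for 20 weeks) is hypothetical and only illustrates the weekly pattern.

```python
def autocorrelation(series: list[float], lag: int) -> float:
    """Sample autocorrelation at `lag` (the standard ACF estimator)."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be in [0, len(series))")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant")
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


# Hypothetical daily counts: 20 weeks with a recurring two-day bump
daily = [100, 100, 100, 100, 100, 140, 140] * 20
for lag in (1, 3, 7, 14):
    print(lag, round(autocorrelation(daily, lag), 3))
```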
Estimation Results (GI Syndrome)
[Figure: estimation results for 2001 to 2003, with panels for the outbreak state, jumps, and post-jump size]
Estimation Results (GI Syndrome) (cont’d)
• Jumps appear around Chinese New Year
• The Markov switching model identified 4 high GI-count periods:
 2000-12-23 to 2001-1-28 (New Year’s Eve: Jan. 23)
 2002-1-29 to 2002-3-15 (New Year’s Eve: Feb. 11)
 2002-5-9 to 2002-10-14
 2002-12-13 to 2003-2-18 (New Year’s Eve: Jan. 31)
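The slides do not give the full model specification (the estimation output also tracks jumps and post-jump sizes). As a simplified illustration only, the sketch below assumes a two-regime model in which daily GI counts are Poisson with either a normal or a high mean, and the regime follows a Markov chain with known transition probabilities; the Hamilton (forward) filter then gives the probability of the high regime on each day. All parameter values and the synthetic series are hypothetical; in practice the parameters would be estimated from data.

```python
import math


def poisson_logpmf(k: int, lam: float) -> float:
    """log P(K = k) for a Poisson distribution with mean `lam`."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def filter_high_state(counts: list[int], lam_normal: float, lam_high: float,
                      p_stay_normal: float = 0.98, p_stay_high: float = 0.9,
                      p_high_initial: float = 0.05) -> list[float]:
    """Hamilton (forward) filter for a two-state Markov-switching Poisson model.

    Returns P(state_t = high | counts up to day t) for every day t.
    """
    trans = [[p_stay_normal, 1 - p_stay_normal],  # from the normal state
             [1 - p_stay_high, p_stay_high]]      # from the high state
    predicted = [1 - p_high_initial, p_high_initial]
    filtered_high = []
    for k in counts:
        # Combine the one-step prediction with the Poisson likelihood in log space.
        logs = [math.log(predicted[s]) + poisson_logpmf(k, lam)
                for s, lam in enumerate((lam_normal, lam_high))]
        top = max(logs)
        weights = [math.exp(v - top) for v in logs]
        total = sum(weights)
        posterior = [w / total for w in weights]
        filtered_high.append(posterior[1])
        # Propagate through the transition matrix to predict the next day.
        predicted = [posterior[0] * trans[0][j] + posterior[1] * trans[1][j]
                     for j in range(2)]
    return filtered_high


def high_periods(probs: list[float], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Inclusive (start, end) index ranges where P(high) exceeds `threshold`."""
    periods, start = [], None
    for t, p in enumerate(probs):
        if p > threshold and start is None:
            start = t
        elif p <= threshold and start is not None:
            periods.append((start, t - 1))
            start = None
    if start is not None:
        periods.append((start, len(probs) - 1))
    return periods


# Hypothetical series: 30 ordinary days, a 10-day surge, then 30 ordinary days
counts = [100] * 30 + [160] * 10 + [100] * 30
probs = filter_high_state(counts, lam_normal=100.0, lam_high=160.0)
print(high_periods(probs))  # [(30, 39)]
```

Reading off runs of high filtered probability is how a switching model turns a noisy count series into dated "high GI-count periods" like those listed above.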
Taiwan SARS Contact Tracing
Social Network Analysis in Epidemiology
• Conceptualizing a population as a set of individuals linked
together to form a large social network provides a fruitful
perspective for better understanding the spread of some
infectious diseases. (Klovdahl, 1985)
• Social Network Analysis in epidemiology has two major
activities:
 Network Construction
• Link the whole set of persons in a particular population with
relationships or types of contacts
 Network Analysis
• Measure and make inferences about structural properties of the social networks through which infectious agents spread
A Taxonomy of Network Construction
• Sexually Transmitted Disease (STD)
 Sexual contact: AIDS (Klovdahl, 1985); Gonorrhea (Ghani et al., 1997); Syphilis (Rothenberg et al., 1998)
 Drug use: AIDS (Klovdahl et al., 1994); AIDS (Rothenberg et al., 1998)
 Needle sharing: AIDS (Klovdahl et al., 1994); AIDS (Rothenberg et al., 1998)
 Social contact: AIDS (Klovdahl et al., 1994); AIDS (Rothenberg et al., 1998)
• Tuberculosis (TB)
 Personal contact: Klovdahl et al. (2001); McElroy et al. (2003)
 Geographical contact: Klovdahl et al. (2001); McElroy et al. (2003)
• Severe Acute Respiratory Syndrome (SARS)
 Source of infection: CDC* (2003); Shen et al. (2004)
 Personal contact: Meyers et al. (2005)
*CDC: Centers for Disease Control and Prevention
A Taxonomy of Network Analysis
• Network visualization: show the spread of an infectious agent transmitted from one person to another
 Examples: AIDS (Klovdahl, 1985); Syphilis (Rothenberg et al., 1998); SARS (CDC*, 2003); SARS (Shen et al., 2004)
• Network measurement: study the structure of a population through which an infectious agent is transmitted during close personal contact
 Examples: Syphilis (Rothenberg et al., 1998); AIDS (Klovdahl et al., 1994); AIDS (Rothenberg et al., 1998)
• Network simulation: evaluate the spread of an infectious agent within a population under different network parameters, to develop disease containment strategies or programs
 Examples: Gonorrhea (Ghani et al., 1997); SARS (Meyers et al., 2005)
*CDC: Centers for Disease Control and Prevention
Network Visualization
• Utilize social network to visualize the transmission of an
infectious agent from one person to another within a
particular population
• Focus on the identification of
 Subgroups within the population
 Characteristics of each subgroup
 Bridges between subgroups, which transmit a disease from one subgroup to another
[Figures: Clusters in Singapore (source: CDC, 2003); Syphilis transmission (Rothenberg et al., 1998)]
Epidemic Phases and Social Networks
• Potterat et al. (2001) proposed that the structure of sexual networks is a more reliable indicator of STD epidemic phase.
 Two sexual networks in Colorado Springs, U.S. were compared:
• Bacterial STD from 1990 to 1991 (an STD outbreak)
• Chlamydia from 1996 to 1999 (stable or declining phase)
 The sexual network in the stable or declining phase was relatively:
• Fragmented
• Dendritic
• Lacking in cyclic structures
• Cunningham et al. (2004) further examined the relationship between network characteristics and epidemic phases.
 After the epidemic:
• Macro-level structure
 Average distance declined.
 Density increased.
• Micro-level structure
 Numbers of n-cliques and k-plexes declined.
Research Test Bed
• We use Taiwan SARS data as our research test
bed.
• SARS (Severe Acute Respiratory Syndrome) is a
novel infectious disease which emerged in 2002.
 The first human case was identified in Guangdong Province,
China on November 16, 2002. (Donnelly et al., 2004)
 A 65-year-old doctor from Guangdong Province stayed at a hotel in Hong Kong in February 2003 and infected at least 17 other guests and visitors at the hotel, some of whom later traveled to other countries and initiated local transmission of SARS. (Peiris et al., 2006)
 26 countries, including Vietnam, Singapore, Canada, and
Taiwan, reported SARS cases.
 Financial impact: $50B
SARS in Taiwan
• The first SARS case in Taiwan was a Taiwanese businessman who traveled to Guangdong Province via Hong Kong in early February 2003.
 Had onset of symptoms on February 26, 2003
 Infected two family members and one healthcare worker
• Eighty percent of probable SARS cases were infected in hospital settings.
 The first hospital outbreak began at a municipal hospital on April 23, 2003.
 A total of seven hospital outbreaks were reported.
 Hospital shopping and transfers were suspected of triggering these sequential hospital outbreaks.
Taiwan SARS Data
• Taiwan SARS data was collected by the Graduate Institute of
Epidemiology at National Taiwan University during the
SARS period.
• In this dataset, there are 961 patients, including 638
suspected SARS patients and 323 confirmed SARS patients.
• The contact-tracing data of patients in this dataset has two
main categories, personal and geographical contacts, and
nine types of contacts.
 Personal contacts: family member, roommate, colleague/classmate, and close
contact
 Geographical contacts: foreign-country travel, hospital visit, high risk area
visit, hospital admission history, and workplace
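Geographical contacts can be turned into patient-to-patient links through a one-mode projection: two patients are linked when they share a location such as a hospital or high-risk area. A minimal sketch, assuming records of (patient ID, location ID) pairs; the IDs are hypothetical, and the slides do not say whether time windows were also applied when linking patients.

```python
from collections import defaultdict
from itertools import combinations


def project_to_patients(geo_contacts: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """Link every pair of patients who share a geographical node:
    a one-mode projection of the patient-location network."""
    patients_at = defaultdict(set)
    for patient, location in geo_contacts:
        patients_at[location].add(patient)
    edges = set()
    for patients in patients_at.values():
        # sorted() gives each undirected edge one canonical (a, b) order
        edges.update(combinations(sorted(patients), 2))
    return edges


# Hypothetical contact records: (patient ID, location ID)
records = [("P1", "Hospital H"), ("P2", "Hospital H"), ("P3", "Hospital H"),
           ("P3", "Area X"), ("P4", "Area X")]
print(sorted(project_to_patients(records)))
```

Here P3 is the only patient connecting the Hospital H group to Area X, the kind of bridge that geographical nodes help reveal.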
Taiwan SARS Data (Cont.)
• Hospital admission history is the contact type with the largest number of records (43%).
• Personal contacts consist primarily of family member records.
Type of contact: records / suspected patients / confirmed patients
• Personal
 Family member: 177 / 48 / 63
 Roommate: 18 / 11 / 15
 Colleague/classmate: 40 / 26 / 23
 Close contact: 11 / 10 / 12
• Geographical
 Foreign-country travel: 162 / 100 / 27
 Hospital visit: 215 / 110 / 79
 High-risk area visit: 38 / 30 / 7
 Hospital admission history: 622 / 401 / 153
 Workplace: 142 / 22 / 120
• Total: 1,425 / 638 / 323
Research Design
Phase Analysis (Cont.)
• Network Partition
 We partition each contact network on a weekly basis, with links accumulating from week to week.
 From 2/24 to 5/4, there are 10 weeks in total.
[Figure: the personal contact network partitioned into cumulative weekly snapshots, Week 1 (starting 2/24) through Week 10 (ending 5/4)]
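A weekly partition with linkage accumulation amounts to cumulative snapshots: week k contains every link recorded up to the end of week k. A minimal sketch, assuming each link carries a single contact date; the links and dates are hypothetical.

```python
from datetime import date, timedelta


def cumulative_weekly_snapshots(dated_edges: list[tuple[tuple[str, str], date]],
                                start: date, weeks: int) -> list[set[tuple[str, str]]]:
    """Snapshot k (k = 1..weeks) holds every link dated before start + 7k days,
    so links accumulate from one week to the next."""
    snapshots = []
    for k in range(1, weeks + 1):
        cutoff = start + timedelta(days=7 * k)
        snapshots.append({edge for edge, day in dated_edges if day < cutoff})
    return snapshots


# Hypothetical links with contact dates
links = [(("P1", "P2"), date(2003, 2, 25)),
         (("P2", "P3"), date(2003, 3, 18)),
         (("P3", "P4"), date(2003, 5, 4))]
weekly = cumulative_weekly_snapshots(links, start=date(2003, 2, 24), weeks=10)
print([len(snapshot) for snapshot in weekly])  # [1, 1, 1, 2, 2, 2, 2, 2, 2, 3]
```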
Phase Analysis (Cont.)
• Network Measurement
 We investigate two factors that contribute to disease transmission at the macro-structure level:
• Density: how intensely people are linked together
 Density
 Average degree of nodes
• Transferability: the degree to which people can infect others
 Betweenness
 Number of components
[Figure: example networks contrasting lower vs. higher density and lower vs. higher transferability]
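Density, average degree, and number of components can be computed directly from an edge list. A minimal sketch, assuming an undirected network without self-loops; betweenness is omitted here (graph libraries such as NetworkX provide `betweenness_centrality`), and the slides do not say how node-level betweenness was summarized per network. The node and edge IDs are hypothetical.

```python
def network_measures(nodes: list[str], edges: set[tuple[str, str]]) -> dict[str, float]:
    """Density, average degree, and number of connected components of an
    undirected simple graph."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        if a != b:  # ignore self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)
    n = len(adjacency)
    m = sum(len(neighbors) for neighbors in adjacency.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    average_degree = 2 * m / n if n else 0.0

    # Count connected components with an iterative depth-first search.
    seen: set[str] = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            for neighbor in adjacency[stack.pop()]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return {"density": density, "average_degree": average_degree, "components": components}


# Hypothetical network: a triangle P1-P2-P3, a pendant P4, and an isolated P5
example = network_measures(["P1", "P2", "P3", "P4", "P5"],
                           {("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")})
print(example)
```

Adding links raises density and average degree while merging components, which is the direction of change the connectivity and phase analyses track.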
Connectivity Analysis
• Geographical contacts provide much higher connectivity than personal contacts in the network construction.
 Decrease the number of components from 961 to 82
 Increase the average degree from 0.31 to 108.62
Contacts applied in network construction: average degree (patient nodes) / maximum degree (patient nodes) / number of components
• Personal contacts: 0.31 / 4 / 847
• Geographical contacts: 108.62 / 474 / 82
• Personal + geographical contacts: 108.85 / 474 / 10
Connectivity Analysis (Cont.)
• Hospital admission history provides the highest connectivity of nodes in the network construction.
• Hospital visits provide the second-highest connectivity.
• This result is consistent with the fact that most patients were infected in the hospital outbreaks during the SARS period.
Contacts applied in network construction: average degree / maximum degree / number of components
• Personal contacts
 Family member: 0.204 / 4 / 893
 Roommate: 0.031 / 2 / 946
 Colleague/classmate: 0.06 / 3 / 934
 Close contact: 0.023 / 1 / 949
• Geographical contacts
 Foreign-country travel: 2.727 / 41 / 848
 Hospital visits: 10.077 / 105 / 753
 High-risk area visit: 1.388 / 36 / 924
 Hospital admission history: 50.479 / 289 / 409
 Workplace: 4.694 / 61 / 823
One-Mode Network with Only Patient Nodes
[Figure legend: suspected patients; confirmed patients]
Contact Network with Geographical Nodes
[Figure legend: areas; hospitals; suspected patients; confirmed patients]
Potential Bridges Among Geographical Nodes
• Including geographical nodes helps reveal people who may act as bridges, transferring disease from one subgroup to another.
Network Visualization (Cont.)
• For a hospital outbreak, including geographical nodes and contacts in the network also helps reveal possible disease transmission scenarios within the hospital.
• Background of the example
 Mr. L, a laundry worker in H Hospital, had a fever on 2003/4/16 and was reported as a suspected SARS patient.
 Nurse C took care of Mr. L on 4/16 and 4/17.
 Nurse C and Ms. N, another laundry worker in H Hospital, began to have symptoms on 4/21.
 H Hospital was reported to have a SARS outbreak on 4/24.
 Nurse C’s daughter had a fever on 5/1.
Phase Analysis – Density
• Normalized density and average degree show similar patterns:
 In the importation phase, the foreign-country contact network increases dramatically in Week 4 (3/17-3/23), followed by the personal contact network.
 In the hospital outbreak phase, both the personal and hospital networks increase dramatically. But in Week 10, the personal network still increases while the hospital network decreases.
[Figure: normalized density and average degree of the foreign-country, hospital, and personal contact networks by week (Weeks 2 to 10), with the importation and hospital outbreak phases marked]
Phase Analysis – Transferability
• From betweenness, we can see that the personal network does not have enough transferability until Week 9.
 In the importation phase, the personal network forms only several small fragments, without large groups.
• From the number of components, the hospital network is the only one that can consistently link patients together.
[Figure: betweenness and number of components of the foreign-country, hospital, and personal contact networks by week (Weeks 2 to 10), with the importation and hospital outbreak phases marked]
Ongoing Research
• Worldwide infectious disease breaking news
collection, monitoring, and analysis
• Markov-switching model based disease
surveillance
• Infectious disease social network analysis and
contact tracing
• Other public health concerns and infectious
disease applications
Building Research Partnership
• Emerging critical medical and public health
concerns
• Willing and engaging international domain
(biomedical) partners and funding sources
• Data, data, and more data
• From academic research to scalable
solutions/systems and lasting impacts
For more information:
BioPortal web site:
http://www.bioportal.org
AI Lab web site:
http://ai.arizona.edu