Genomic Variation associated with Malignancy
• Genomic Information (Molecular Entity Types): Gene, Variation
• Phenomic Information (Phenotypic Entity Types): Malignancy Types, Site, Histology, Clinical Stage, Differentiation Status, Developmental State, Heredity Status
Flow Chart for Manual Annotation Process
(Figure: Biomedical Literature, Entity Definitions, Annotators (Experts), Manually Annotated Texts, Annotation Ambiguity, Machine-learning Algorithm, Auto-Annotated Texts)
Defining biomedical entities
Data Gathering
"A point mutation was found at codon 12 (G → A)." → Variation
Data Classification
• point mutation → Variation.Type
• codon 12 → Variation.Location
• G → Variation.InitialState
• A → Variation.AlteredState
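The data-classification step above splits one Variation mention into four sub-typed fields. A minimal sketch of that decomposition, assuming a single fixed sentence shape; the `Variation` class, `PATTERN`, and `classify` are hypothetical names, and the regular expression is only an illustration (the slides describe manual annotation and CRF extractors, not regex matching):

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variation:
    """Hypothetical record mirroring the Variation sub-types."""
    type: str                             # Variation.Type, e.g. "point mutation"
    location: str                         # Variation.Location, e.g. "codon 12"
    initial_state: Optional[str] = None   # Variation.InitialState
    altered_state: Optional[str] = None   # Variation.AlteredState


# Illustrative pattern for this one sentence shape only; not a general extractor.
PATTERN = re.compile(
    r"(?P<type>point mutation)\s+was found at\s+(?P<location>codon \d+)"
    r"\s*\((?P<initial>[ACGT])\s*(?:→|->)\s*(?P<altered>[ACGT])\)"
)


def classify(sentence: str) -> Optional[Variation]:
    """Return the sub-classified Variation, or None if the pattern does not match."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    return Variation(m["type"], m["location"], m["initial"], m["altered"])


print(classify("A point mutation was found at codon 12 (G → A)."))
```

Keeping the four fields separate, rather than one flat "Variation" span, is what lets later steps ask, for example, how often a given location carries a given altered state.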
Defining biomedical entities
• Conceptual boundaries
  - Sub-classification of entities
  - Levels of specificity
    Gene entity: Gene > Protein kinase (superfamily) > MAPK (gene family) > MAPK10
    Malignancy type entity: Cancer/Tumor > Carcinoma > Lung carcinoma > Squamous cell lung carcinoma
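The two levels-of-specificity hierarchies can be stored as explicit child-to-parent links, so a mention's level is computable instead of implicit. A minimal sketch; the `PARENT` table, `lineage`, `specificity`, and `is_subsumed_by` are hypothetical names, and the links encode only the two chains shown on the slide:

```python
from typing import Dict, List

# Hypothetical child -> parent links encoding the two hierarchies above.
PARENT: Dict[str, str] = {
    "MAPK10": "MAPK (gene family)",
    "MAPK (gene family)": "Protein kinase (superfamily)",
    "Protein kinase (superfamily)": "Gene",
    "Squamous cell lung carcinoma": "Lung carcinoma",
    "Lung carcinoma": "Carcinoma",
    "Carcinoma": "Cancer/Tumor",
}


def lineage(term: str) -> List[str]:
    """Return the term followed by progressively more general ancestors."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def specificity(term: str) -> int:
    """Depth below the root: 0 for the most general level."""
    return len(lineage(term)) - 1


def is_subsumed_by(term: str, ancestor: str) -> bool:
    """True if a mention of `term` is also, more generally, a mention of `ancestor`."""
    return ancestor in lineage(term)


print(lineage("Squamous cell lung carcinoma"))
print(specificity("MAPK10"), specificity("Carcinoma"))
```

An explicit hierarchy makes it possible to score an extractor at a coarser level when desired; whether the original annotation scheme did so is not stated.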
  - Conceptual overlaps between entities
    Symptom: subjective or objective evidence of disease.
    Disease: a specific pathological process with a characteristic set of symptoms.
    Example: Arrhythmia vs. Long QT Syndrome
  - Domain-specific clarification
    Gene entity clarification: regulatory elements, such as promoters (e.g., the TATA box)
• Syntactical boundaries
  - Text boundary issues ("The K-ras gene…")
  - Pronoun co-reference (this gene, it, they)
  - Structural overlap: entity within entity, same entity type ("MAP kinase kinase kinase")
  - Structural overlap: entity within entity, different entity type ("Squamous cell lung carcinoma")
  - Discontinuous mentions ("N- and K-ras")
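Discontinuous mentions such as "N- and K-ras" share one head across coordinated prefixes, so a single text span stands for two genes. A minimal sketch of expanding such a span, assuming a hypothetical heuristic that handles only the "X-, Y- and Z-head" shape; `COORD` and `expand` are illustrative names, not part of the described system:

```python
import re
from typing import List

# Hypothetical rule: one or more "prefix-" items, then "and"/"or", then "prefix-head".
COORD = re.compile(r"((?:\w+-,?\s+)+)(?:and|or)\s+(\w+)-(\w+)")


def expand(mention: str) -> List[str]:
    """Expand a coordinated mention into one full name per prefix."""
    m = COORD.fullmatch(mention)
    if m is None:
        return [mention]                          # not coordinated: leave unchanged
    prefixes = re.findall(r"(\w+)-", m.group(1))  # e.g. ["N"] or ["H", "N"]
    prefixes.append(m.group(2))                   # the last prefix, attached to the head
    head = m.group(3)
    return [f"{p}-{head}" for p in prefixes]


print(expand("N- and K-ras"))
print(expand("H-, N- and K-ras"))
```

Whether the annotation guidelines tag the expanded names or the literal span is a separate decision; this sketch only shows that the expansion itself is mechanical for this shape.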
Semantic ambiguity challenges
• Ambiguity within an entity type: "CAT" can denote catalase or glycine-N-acyltransferase (GLYAT)
• Ambiguity between entity types: "CAT" can be a gene entity or an organism
• Gene entity ambiguity
  - 3% of human genes share aliases
  - Extensive ambiguity of gene names between species (mouse and human)
  - Sub-types: Gene.general, Gene.gene/RNA, Gene.protein
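Within-type ambiguity can be detected by inverting a name-to-alias table and flagging aliases that point to more than one gene. A minimal sketch; `PAIRS` is a toy table built only from the CAT example above (a real index would come from a curated nomenclature resource), and `build_alias_index` and `ambiguous_aliases` are hypothetical names:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Toy (gene name, alias) pairs taken from the CAT example; not a real nomenclature dump.
PAIRS: List[Tuple[str, str]] = [
    ("catalase", "CAT"),
    ("glycine-N-acyltransferase", "CAT"),
    ("glycine-N-acyltransferase", "GLYAT"),
]


def build_alias_index(pairs: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each (case-folded) alias to the set of gene names it can denote."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, alias in pairs:
        index[alias.upper()].add(name)
    return index


def ambiguous_aliases(index: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Keep only aliases shared by two or more genes."""
    return {alias: sorted(names) for alias, names in index.items() if len(names) > 1}


print(ambiguous_aliases(build_alias_index(PAIRS)))
```

Case-folding is a design choice: it helps match symbol variants but also merges "CAT" with the ordinary word "cat", which is exactly the between-type ambiguity listed above.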
Entity types and sub-types
• Gene: Gene, RNA, Protein
• Variation: Type, Location, Initial State, Altered State
• Malignancy Type: Site, Histology, Clinical Stage, Differentiation Status, Heredity Status, Developmental State
• Additional entity types: Physical Measurement, Cellular Process, Expressional Status, Environmental Factor, Clinical Treatment, Clinical Outcome, Research System, Research Methodology, Drug Effect
Entity definitions: http://www.ldc.upenn.edu/mamandel/itre/annotators/onco/definitions.html
Manual Annotation Corpus Release
Jena University Language & Information Engineering Lab: http://www.julielab.de
K Bretonnel Cohen and Lawrence Hunter, BMC Bioinformatics. 2006; 7(Suppl 3): S5.
Summary -- Entity Definition
• Developed an iterative process for biomedical entity definition;
• Defined genomic and phenotypic entities with distinct conceptual and syntactical boundaries in genomic variation of malignancy;
• Constructed a manually annotated corpus of 1442 oncology-focused MEDLINE abstracts.
Named Entity Extractors
"Mycn is amplified in neuroblastoma."
• Mycn → Gene
• amplified → Variation type
• neuroblastoma → Malignancy type
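Sequence-labeling extractors such as CRFs are usually trained on one label per token rather than on spans. A minimal sketch of converting the span annotations above into token labels, assuming a BIO (begin/inside/outside) encoding, which the slides do not specify; `to_bio` and the label strings are hypothetical:

```python
from typing import List, Tuple


def to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert token-index spans (start, end_exclusive, label) into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"        # first token of the mention
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"        # continuation tokens
    return tags


tokens = ["Mycn", "is", "amplified", "in", "neuroblastoma", "."]
spans = [(0, 1, "Gene"), (2, 3, "VariationType"), (4, 5, "MalignancyType")]
for tok, tag in zip(tokens, to_bio(tokens, spans)):
    print(f"{tok}\t{tag}")
```

The B/I distinction matters for adjacent mentions of the same type, which would otherwise merge into one.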
Automated Extractor Development
• Training and testing data
  - 1442 cancer-focused MEDLINE abstracts
  - 70% for training, 30% for testing
• Machine-learning algorithm
  - Conditional Random Fields (CRFs)
  - Sets of features:
    - Orthographic features (capitalization, punctuation, digit/number/alphanumeric/symbol);
    - Character n-grams (N = 2, 3, 4);
    - Prefix/suffix (e.g., *oma);
    - Offset conjunction (3 consecutive word tokens);
    - Domain-specific lexicon (NCI neoplasm list).
  - Example: "Lung cancer is the … of carcinoma deaths worldwide." (MType labels on "Lung" and "cancer")
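The feature sets listed above can be computed per token as a dictionary of named features, a form many CRF toolkits accept. A minimal sketch under stated assumptions: `LEXICON` is a tiny hypothetical stand-in for the NCI neoplasm list, "offset conjunction" is interpreted here as conjoining the word identities at offsets -1, 0, +1, and all function names are hypothetical:

```python
import re
from typing import Dict, List, Set

# Hypothetical stand-in for the NCI neoplasm list; phrases are lowercased.
LEXICON: Set[str] = {"carcinoma", "neuroblastoma", "lung cancer"}


def token_features(tok: str) -> Dict[str, bool]:
    low = tok.lower()
    has_digit = any(c.isdigit() for c in tok)
    f: Dict[str, bool] = {
        # Orthographic features
        "init_cap": tok[:1].isupper(),
        "all_caps": tok.isupper(),
        "has_digit": has_digit,
        "alphanumeric": bool(re.search(r"[A-Za-z]", tok)) and has_digit,
        "has_punct": bool(re.search(r"[^\w\s]", tok)),
        # Suffix feature, e.g. "*oma"
        "suffix3=" + low[-3:]: True,
    }
    # Character n-grams for N = 2, 3, 4
    for n in (2, 3, 4):
        for i in range(len(low) - n + 1):
            f[f"char{n}={low[i:i + n]}"] = True
    return f


def lexicon_flags(tokens: List[str]) -> List[bool]:
    """Mark tokens covered by any (possibly multi-word) lexicon entry."""
    low = [t.lower() for t in tokens]
    flags = [False] * len(tokens)
    for phrase in LEXICON:
        words = phrase.split()
        for i in range(len(low) - len(words) + 1):
            if low[i:i + len(words)] == words:
                for j in range(i, i + len(words)):
                    flags[j] = True
    return flags


def sentence_features(tokens: List[str]) -> List[Dict[str, bool]]:
    in_lex = lexicon_flags(tokens)
    feats = []
    for i, tok in enumerate(tokens):
        f = token_features(tok)
        f["in_lexicon"] = in_lex[i]
        # Offset conjunction over 3 consecutive tokens (i-1, i, i+1)
        ctx = [tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
               for j in (i - 1, i, i + 1)]
        f["conj=" + "|".join(ctx)] = True
        feats.append(f)
    return feats


for f in sentence_features(["Lung", "cancer", "is", "the"])[:2]:
    print(sorted(k for k, v in f.items() if v)[:6])
```

The lexicon feature marks every token of a multi-word entry such as "lung cancer", so the CRF can learn both the boundary and the type from one signal.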
Extractor Performance
Entity                 Precision   Recall
Gene                   0.864       0.787
Variation Type         0.8556      0.8695
Location               0.8430      0.8035
State-Initial          0.8541      0.7990
State-Sub              0.7722      0.8286
Overall                0.7809      0.7870
Malignancy type        0.8456      0.8493
Clinical Stage         0.8005      0.8310
Site                   0.8438      0.8218
Histology              0.6492      0.6555
Developmental State    0.7774      0.7500
• Precision: (true positives)/(true positives + false positives)
• Recall: (true positives)/(true positives + false negatives)
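The two formulas above, plus their harmonic mean (F1), which is often reported alongside them, can be computed from raw counts. A minimal sketch; `prf` is a hypothetical helper and the counts in the example are illustrative, not taken from the evaluation:

```python
from typing import Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative counts only: 3 correct mentions, 1 spurious, 3 missed.
print(prf(3, 1, 3))
```

Guarding each denominator avoids division by zero when an extractor returns nothing for a rare entity type.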
Malignancy mentions in a sample abstract (slide legend: normal text vs. malignancies)
PMID: 15316311
Morphologic and molecular characterization of renal cell carcinoma in children and young adults.
A new WHO classification of renal cell carcinoma has been introduced in 2004. This classification includes the recently described renal cell carcinomas with the ASPL-TFE3 gene fusion and carcinomas with a PRCC-TFE3 gene fusion. Collectively, these tumors have been termed Xp11.2 or TFE3 translocation carcinomas, which primarily occur in children and young adults. To further study the characteristics of renal cell carcinoma in young patients and to determine their genetic background, 41 renal cell carcinomas of patients younger than 22 years were morphologically and genetically characterized. Loss of heterozygosity analysis of the von Hippel-Lindau gene region and screening for VHL gene mutations by direct sequencing were performed in 20 tumors. TFE3 protein overexpression, which correlates with the presence of a TFE3 gene fusion, was assessed by immunohistochemistry. Applying the new WHO classification for renal cell carcinoma, there were 6 clear cell (15%), 9 papillary (22%), 2 chromophobe, and 2 collecting duct carcinomas. Eight carcinomas showed translocation carcinoma morphology (20%). One carcinoma occurred 4 years after a neuroblastoma. Thirteen tumors could not be assigned to types specified by the new WHO classification: 10 were grouped as unclassified (24%), including a unique renal cell carcinoma with prominently vacuolated cytoplasm and WT1 expression. Three carcinomas occurred in combination with nephroblastoma. Molecular analysis revealed deletions at 3p25-26 in one translocation carcinoma, one chromophobe renal cell carcinoma, and one papillary renal cell carcinoma. There were no VHL mutations. Nuclear TFE3 overexpression was detected in 6 renal cell carcinomas, all of which showed areas with voluminous cytoplasm and foci of papillary architecture, consistent with a translocation carcinoma phenotype. The large proportion of TFE3 "translocation" carcinomas and "unclassified" carcinomas in the first two decades of life demonstrates that renal cell carcinomas in young patients contain genetically and phenotypically distinct tumors with further potential for novel renal cell carcinoma subtypes. The far lower frequency of clear cell carcinomas and VHL alterations compared with adults suggests that renal cell carcinomas in young patients have a unique genetic background.
CRF-based Extractor vs. Pattern Matcher
• The testing corpus
  - 39 manually annotated MEDLINE abstracts selected
  - 202 malignancy type mentions identified
• The pattern matching system (baseline)
  - 5,555 malignancy types extracted from the NCI neoplasm ontology
  - Case-insensitive exact string matching applied
  - 85 malignancy type mentions (42.1%) recognized correctly
• The malignancy type extractor
  - 190 malignancy type mentions (94.1%) recognized correctly
  - Included all the baseline-identified mentions
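The baseline above is case-insensitive exact string matching against a term list. A minimal sketch of such a matcher, assuming two details the slides do not state: word-boundary anchoring and longest-term-first alternation. `build_matcher` and `find_mentions` are hypothetical names, and the three-term list stands in for the 5,555 NCI terms:

```python
import re
from typing import List, Tuple


def build_matcher(terms: List[str]) -> "re.Pattern":
    """Compile one case-insensitive alternation; longer terms are tried first."""
    alts = sorted({t.lower() for t in terms}, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(map(re.escape, alts)) + r")\b", re.IGNORECASE)


def find_mentions(matcher: "re.Pattern", text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched text) for every exact term occurrence."""
    return [(m.start(), m.end(), m.group(0)) for m in matcher.finditer(text)]


matcher = build_matcher(["Renal cell carcinoma", "Carcinoma", "Neuroblastoma"])
print(find_mentions(matcher, "Renal cell carcinomas in NB and neuroblastoma."))
```

On this example the plural "carcinomas" and the acronym "NB" are both missed, which are two of the failure types listed in the next table.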
The Types of Mentions NOT Identified by Pattern Matching
Mention Type                      Mention Example              NCI List Entry
Acronyms                          NB                           Neuroblastoma
Lexical variants (plural forms)   Renal cell carcinomas        Renal cell carcinoma
Polymorphic expressions           Lung cancer (tumor/tumour)   Lung neoplasm
Higher levels of specificity      Solid tumor                  <More specific tumor>
Tumor names with modifiers        Translocation carcinoma      Carcinoma
Normalization
Synonymous mentions mapped to one UMLS Metathesaurus Concept Unique Identifier (CUI), C0000735:
abdominal neoplasm; abdomen neoplasm; Abdominal tumour; Abdominal neoplasm NOS; Abdominal tumor; Abdominal Neoplasms; Abdominal Neoplasm; Neoplasm, Abdominal; Neoplasms, Abdominal; Neoplasm of abdomen; Tumour of abdomen; Tumor of abdomen; ABDOMEN TUMOR
UMLS Metathesaurus: 19,397 CUIs with 92,414 synonyms
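Normalization maps many surface strings to one identifier, so the vocabulary is naturally stored as an inverted index from synonym to CUI. A minimal sketch; `ROWS` is a hypothetical excerpt of the slide's synonym list, and `build_index` and `lookup` are illustrative names. A set of CUIs is kept per synonym because, in general, one string can belong to more than one concept:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical excerpt of (CUI, synonym) pairs from the example above.
ROWS = [
    ("C0000735", "Abdominal Neoplasms"),
    ("C0000735", "Abdominal tumour"),
    ("C0000735", "Neoplasm of abdomen"),
    ("C0000735", "ABDOMEN TUMOR"),
]


def build_index(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each lowercased synonym to the set of CUIs it names."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for cui, synonym in rows:
        index[synonym.lower()].add(cui)
    return dict(index)


def lookup(index: Dict[str, Set[str]], mention: str) -> Set[str]:
    """Exact (case-insensitive) lookup; unseen strings return an empty set."""
    return index.get(mention.lower(), set())


index = build_index(ROWS)
print(lookup(index, "abdomen tumor"), lookup(index, "Abdominal tumours"))
```

The second lookup fails because only the singular "Abdominal tumour" is listed, which is the gap the rule-based procedure below is meant to close.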
Normalization – Computational Procedures
• Rule-based algorithm
  - Applied to both entity mentions and vocabulary terms (UMLS Metathesaurus):
    - Case insensitivity (carcinoma/Carcinoma)
    - Space/punctuation removal (lung-cancer/lungcancer)
    - Stemming (neuroblastoma/neuroblastomas)
  - Applied to mentions only:
    - First/last character removal (extra space/punctuation)
    - First/last word removal (translocation lung carcinoma)
• Evaluated the accuracy and priority of the rules
  - 1,000 randomly selected entity mentions
  - Chose the best-performing rule combination and order
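The rules above split into a shared key function (applied to both sides) and mention-only fallbacks tried in sequence. A minimal sketch under stated assumptions: the rule order is hypothetical (the slides say it was chosen empirically), plural "s" stripping stands in for a real stemmer, and all IDs except C0000735 are placeholders, not UMLS CUIs. `shared_key`, `mention_variants`, `build_vocab`, and `normalize` are hypothetical names:

```python
import re
from typing import Dict, List, Optional


def shared_key(s: str) -> str:
    """Rules applied to both mentions and vocabulary terms."""
    s = s.lower()                    # case insensitivity
    s = re.sub(r"[\W_]+", "", s)     # space/punctuation removal
    return re.sub(r"s$", "", s)      # crude plural stripping (stand-in for stemming)


def mention_variants(mention: str) -> List[str]:
    """Mention-only rules, tried in a fixed (hypothetical) order."""
    variants = [mention]
    trimmed = mention.strip(" .,;:-()[]")      # first/last character removal
    if trimmed != mention:
        variants.append(trimmed)
    words = trimmed.split()
    if len(words) > 1:
        variants.append(" ".join(words[1:]))   # first word removal
        variants.append(" ".join(words[:-1]))  # last word removal
    return variants


def build_vocab(terms: Dict[str, str]) -> Dict[str, str]:
    return {shared_key(term): cui for term, cui in terms.items()}


def normalize(mention: str, vocab: Dict[str, str]) -> Optional[str]:
    """Return the identifier of the first variant whose key is in the vocabulary."""
    for v in mention_variants(mention):
        cui = vocab.get(shared_key(v))
        if cui is not None:
            return cui
    return None


vocab = build_vocab({
    "Abdominal Neoplasm": "C0000735",
    "Lung carcinoma": "X-LUNG-CARCINOMA",   # placeholder ID, not a real CUI
    "Neuroblastoma": "X-NEUROBLASTOMA",     # placeholder ID, not a real CUI
})
print(normalize("translocation lung carcinoma", vocab))
```

Because each fallback loosens the match, order matters: trying word removal before exact matching could map a correct specific mention to a more general concept.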
MEDLINE Data Processing
• Tagging MEDLINE pre-2006 abstracts
  - 15,433,668 MEDLINE abstracts
  - 9,153,340 total (non-unique) and 580,002 distinct malignancy type mentions
  - ~60% of extracted mentions matched to UMLS CUIs
  - 1,642 CPU-hours (2.44 days on a 28-CPU cluster)
• Infrastructure construction (PostgreSQL database)
Gene-Malignancy-Evidence Matrix
• 21,493,687 normalized gene symbols (16,875 unique)
• 5,398,954 normalized malignancy types (4,166 CUIs)
• 3,100,773 distinct Gene-Malignancy-Evidence relations

Gene     Malignancy                       Evidence (PMID)
A1BG     Adenocarcinoma                   1634938
A1BG     Adenocarcinoma                   2292657
A1BG     Adenocarcinoma                   3566173
……
ABCC1    Lung Carcinoma                   11156254
ABCC1    Lung Carcinoma                   11159731
ABCC1    Lung Carcinoma                   11172691
……
B3GAT1   Breast Neoplasm                  6870377
B3GAT1   Breast Neoplasm                  9129046
B3GAT1   Breast Neoplasm                  9701020
……
ERVK6    Stage IV Melanoma of the Skin    9056412
ERVK6    Stage IV Melanoma of the Skin    9620301
ERVK6    Stage IV Melanoma of the Skin    9640365
……
NFKB1    Colon Carcinoma                  12842827
NFKB1    Colon Carcinoma                  12901803
NFKB1    Colon Carcinoma                  12934082
……
VIM      Gastrointestinal Stromal Tumor   12375611
VIM      Gastrointestinal Stromal Tumor   12657940
VIM      Gastrointestinal Stromal Tumor   12673425
……
Gene-Malignancy Relations Ranked by Frequency
(Chart: top-ranked pairs shown include TP53-Carcinoma, ESR1-Breast Carcinoma, and ESR1-Breast Neoplasm; y-axis range 6,500 to 6,850)
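The matrix and the frequency ranking can be expressed as one deduplicated relation table and one grouped query. A minimal sketch under stated assumptions: in-memory SQLite stands in for the PostgreSQL database, the table and column names are hypothetical, ranking by number of distinct supporting abstracts is an assumed definition of "frequency", and the PMIDs 1 to 3 are toy values, not real evidence:

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; schema names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE gene_malignancy_evidence (
        gene       TEXT NOT NULL,     -- normalized gene symbol
        malignancy TEXT NOT NULL,     -- normalized malignancy type
        pmid       INTEGER NOT NULL,  -- supporting abstract
        PRIMARY KEY (gene, malignancy, pmid)  -- one row per distinct relation
    )
""")

# Toy co-occurrences (PMIDs 1-3 are placeholders); the duplicate is ignored.
rows = [
    ("TP53", "Carcinoma", 1),
    ("TP53", "Carcinoma", 2),
    ("TP53", "Carcinoma", 2),
    ("ESR1", "Breast Neoplasm", 3),
]
conn.executemany(
    "INSERT OR IGNORE INTO gene_malignancy_evidence VALUES (?, ?, ?)", rows)

# Rank gene-malignancy pairs by the number of distinct supporting abstracts.
ranked = conn.execute("""
    SELECT gene, malignancy, COUNT(*) AS n_abstracts
    FROM gene_malignancy_evidence
    GROUP BY gene, malignancy
    ORDER BY n_abstracts DESC, gene, malignancy
""").fetchall()
print(ranked)
```

The composite primary key is what turns repeated mentions into distinct Gene-Malignancy-Evidence relations; without it, one abstract mentioning a pair several times would inflate its rank.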
Summary -- Extractor Development and Application
• Developed well-performing automated entity extractors across genomic and phenotypic domains;
• Constructed a rule-based computational procedure for normalization;
• Applied the extractors and normalizers to all MEDLINE abstracts;
• Imported the extracted information into a relational database.
Text Mining Applications -- Hypothesizing NB Candidate Genes
Two distinct subtypes of neuroblastoma
                      NB Subtype A                     NB Subtype B
Developmental State   Younger age                      Older age
Biology               Differentiation                  Proliferation
Clinical Stage        Lower stage                      Higher stage
Clinical Outcome      Favorable                        Unfavorable
Trk Expression        High-level expression of NTRK1   High-level expression of NTRK2
• Two distinct subtypes of neuroblastoma
  - Distinct clinical behaviors (favorable vs. unfavorable)
  - NGF/NTRK1 (TrkA) vs. BDNF/NTRK2 (TrkB) signaling pathways
  - Determine the early response genes differentiating the two pathways
  - More precise prognosis and clinical intervention

NB Subtype   Trk Signaling   Angiogenesis   Differentiation   Drug Resistance   Tumorigenicity
A            NTRK1/NGF       Inhibits       Yes               Inhibits          Inhibits
B            NTRK2/BDNF      Promotes       No                Promotes          Promotes
Experimental design
• NTRK1-SH-SY5Y cells + NGF; NTRK2-SH-SY5Y cells + BDNF
• RNA extraction at 0, 1.5, 4, and 12 hrs
• Affymetrix U133A Expression Array (RMAExpress normalization, SAM test)
• 751 differentially expressed genes
Microarray Expression Data Analysis
• Gene Set 1 (NTRK1, NTRK2): 468 genes
• Gene Set 2 (NTRK2, NTRK1): 283 genes
Example gene symbols shown: NALP1, RALY, CDC2L6, RASGRP2, KCNK3, RPS6KA1, SEC61A2, VGF, CACNA1C, TBX3, THRA, B4GALT5, NRXN2, GNB5, RAI2, FRS3
Differentially represented genes in biomedical literature
• NTRK1 vs. NTRK2 pathway: genes/proteins differentially associated with each receptor in the literature
• Preferential association: a gene co-occurs with one receptor at least 5 times as often as with the other
• Assumption: co-occurrence frequency reflects functional correlation
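The preferential-association rule above is a ratio threshold on two co-occurrence counts. A minimal sketch, assuming the "5 times" criterion is a fold ratio and that a gene must co-occur with the favored receptor at least once; `preferential_association` and the gene names `GENE_A` to `GENE_D` with their counts are hypothetical:

```python
from typing import Dict, Tuple


def preferential_association(
    counts: Dict[str, Tuple[int, int]], fold: float = 5.0
) -> Dict[str, str]:
    """counts maps gene -> (co-occurrences with NTRK1, co-occurrences with NTRK2)."""
    result: Dict[str, str] = {}
    for gene, (c1, c2) in counts.items():
        if c1 > 0 and c1 >= fold * c2:
            result[gene] = "NTRK1"
        elif c2 > 0 and c2 >= fold * c1:
            result[gene] = "NTRK2"
        # otherwise: no preferential association (balanced or no evidence)
    return result


# Hypothetical counts for illustration only.
counts = {"GENE_A": (10, 2), "GENE_B": (3, 15), "GENE_C": (6, 4), "GENE_D": (0, 0)}
print(preferential_association(counts))
```

With a pure ratio, a gene seen once with NTRK1 and never with NTRK2 already qualifies; a minimum-count threshold would be a natural refinement, but the slides do not say whether one was used.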
NTRK1/NTRK2 Preferentially Associated Genes in Literature
• LitSet 1 (NTRK1-associated genes): 514
• LitSet 2 (NTRK2-associated genes): 157
Microarray Expression Data vs. NTRK1/NTRK2 Associated Genes in Literature
• Gene Set 1 (468 genes) overlaps NTRK1-associated genes (514) in 18 genes
• Gene Set 2 (283 genes) overlaps NTRK2-associated genes (157) in 4 genes
Functional Pathway Analysis
Determine gene enrichment scores for six
selected functional pathways:
CD -- Cell Death;
CGP -- Cell Growth and Proliferation;
CCSI -- Cell-to-Cell Signaling and Interaction;
CM -- Cell Morphology;
NSDF -- Nervous System Development and Function;
CAO -- Cellular Assembly and Organization.
Functional Pathway Analysis (genes per pathway: count, percentage of group; Ingenuity Pathway Analysis Tool Kit)
Group                      CD            CGP           CCSI          CM            NSDF         CAO
Overall Group (N=10,459)   1979, 18.9%   2251, 21.5%   1492, 14.3%   1068, 10.2%   897, 8.58%   755, 7.22%
Array Group (N=751)        153, 20.4%    154, 20.5%    57, 9.98%     85, 11.3%     108, 19.6%   103, 13.7%
Text Group (N=550)         309, 56.2%    304, 55.3%    186, 33.8%    219, 39.8%    148, 26.9%   115, 20.9%
Overlap Group (N=22)       12, 54.5%     3, 13.6%      7, 31.8%      7, 31.8%      9, 40.9%     11, 50%
Hypergeometric Test P-values
Group           CD       CGP      CCSI     CM       NSDF     CAO
Array Group     0.152    0.746    0.999    0.146    <0.001   <0.001
Text Group      0.0166   0.0216   0.0227   0.0109   <0.001   <0.001
Overlap Group   <0.001   0.728    0.009    0.001    <0.001   <0.001
Hypergeometric Test between Array and Overlap Groups
Group           CD       CGP      CCSI      CM       NSDF     CAO
Overlap Group   <0.001   0.728    0.00940   0.0124   <0.001   0.0117
Multiple-test corrected p-values (Bonferroni step-down)
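The enrichment p-values above come from a one-sided hypergeometric test, and the corrected values use the Bonferroni step-down (Holm) procedure. A minimal sketch using only the standard library; `hypergeom_sf` and `holm` are hypothetical helper names, and the example assumes the Overall Group (N=10,459) as background with the Overlap Group's CD count (12 of 22), since the slides do not state which background each test used. In practice `scipy.stats.hypergeom.sf(k - 1, M, n, N)` computes the same tail, and `statsmodels.stats.multitest.multipletests(..., method="holm")` the same correction:

```python
from math import comb
from typing import List


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population M, n annotated genes, N drawn)."""
    upper = min(n, N)
    total = sum(comb(n, x) * comb(M - n, N - x) for x in range(k, upper + 1))
    return total / comb(M, N)  # exact integer arithmetic until this division


def holm(pvals: List[float]) -> List[float]:
    """Holm (Bonferroni step-down) adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


# Overlap Group, Cell Death: 12 of 22 genes, background 1979 of 10,459 (assumed).
p_cd = hypergeom_sf(12, 10459, 1979, 22)
print(f"{p_cd:.2e}")
print(holm([0.01, 0.04, 0.03, 0.005]))
```

For this input the tail probability falls below 0.001, consistent with the Overlap Group CD entry in the tables above.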
RT-PCR Experimental Validation
11 out of 22 genes selected for RT-PCR validation:
Symbol    Description
CAMK4     calcium/calmodulin-dependent protein kinase IV
VSNL1     visinin-like 1
TBC1D8    TBC1 domain family, member 8 (with GRAM domain)
RPS6KA1   ribosomal protein S6 kinase, 90kDa, polypeptide 1
EFNB3     ephrin-B3
B3GAT1    beta-1,3-glucuronyltransferase 1 (glucuronosyltransferase P)
GNAS      GNAS complex locus
NEFH      neurofilament, heavy polypeptide 200kDa
INA       internexin neuronal intermediate filament protein, alpha
NEFL      neurofilament, light polypeptide 68kDa
TYRO3     TYRO3 protein tyrosine kinase
RT-PCR Experimental Validation: EFNB3
(Figure: EFNB3 RT-PCR measurements for TrkA and TrkB at 0, 1.5, 4, and 12 hr)
EFNB3 Discussion
• EFNB3 (ephrin-B3) belongs to a family of ligands that bind to Eph family receptor tyrosine kinases
• Implicated in axon guidance and vertebrate nervous system development
• Exhibited growth-suppressive activity against NB cells in vitro
• Preferentially and significantly associated with low tumor stage and favorable clinical outcome in primary neuroblastoma tumors
RT-PCR Experimental Validation: TYRO3
(Figure: TYRO3 RT-PCR measurements for TrkA and TrkB at 0, 1.5, 4, and 12 hr)
TYRO3 Discussion
• Transmembrane receptor tyrosine kinase activated by GAS6
• GAS6 has been shown to promote human fetal oligodendrocyte survival without proliferation
• GAS6 may also contribute to cell adhesion and immune responses
• Further study of GAS6/TYRO3 signaling is needed
Summary -- NB Application
• Prioritized array-determined differentially expressed genes by integrating text mining results
• Pathway analysis showed that the literature-based method enriches functionally relevant genes
• RT-PCR experiments further validated the inferential power of text mining
Conclusion
• Created a process for iteratively and precisely defining biomedical semantic types directly from the literature
• Developed automated entity extractors across genomic and phenotypic domains in malignancy with satisfactory accuracy
• Applied this computational entity recognition and normalization process to all MEDLINE abstracts
• Integrated text mining results with neuroblastoma experimental data to hypothesize candidate genes differentiating neuroblastoma subtypes
Future Directions
• Increasing the dimensions of the information matrix
• Context-based normalization algorithm
• Relation extraction with deeper semantic parsing
Acknowledgement
Penn BioIE Team:
Dr. Mark Liberman
Dr. Mark Mandel
Dr. Ryan McDonald
Dr. Fernando Pereira
Annotator team
White Lab:
Steve Carroll
Hawren Fang
Kevin Murphy
Brodeur Lab:
Dr. Garrett Brodeur
Ms. Ruth Ho
Dr. Jane Minturn
CHOP NAP Core:
Dr. Eric Rappaport
CHOP Bioinformatics Core:
Dr. Xiaowu Gai
Dr. Jim Zhang