Laboratory for
Text information, Mining, Analysis and Prediction
Construction of Molecular
Networks and Pathways using
OMICs and Literature Data
Mathew Palakal and Meeta Pradhan
School of Informatics
IUPUI
Indiana University-Purdue University, Indianapolis
From Bibliomics to Target Discovery for
Colorectal Cancer
CRC related Keywords
BioSIFTER: Literature harvesting and Personalization
BioMAP: Mining and Identification of novel biomarkers
BioMAP: BioMedical Literature Mining
“A major challenge faced by biologists is to identify the most
significant genes in a disease that can be targeted.”
Nodes/Links
Experimental Data
Our Hypothesis: Augmenting the
experimental data with literature
data can help to identify novel
molecules that may be of significant
relevance to the study under
consideration.
New Nodes/Links
Augmented with
Literature Data
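The hypothesis above can be illustrated with a minimal sketch: merge an experimental edge list with literature-derived edges and report which nodes and links are new. The function name `augment_network` and the toy edges are assumptions made for illustration; they are not the study's data or the BioMAP implementation.

```python
def augment_network(experimental_edges, literature_edges):
    """Merge two undirected edge lists and report what the literature adds."""
    def normalize(edges):
        # Store each undirected edge as a frozenset so (a, b) equals (b, a).
        return {frozenset(edge) for edge in edges}

    exp = normalize(experimental_edges)
    lit = normalize(literature_edges)
    merged = exp | lit
    exp_nodes = set().union(*exp) if exp else set()
    merged_nodes = set().union(*merged) if merged else set()
    return {
        "edges": merged,
        "new_nodes": merged_nodes - exp_nodes,  # nodes seen only in literature
        "new_edges": lit - exp,                 # links seen only in literature
    }


# Toy input (hypothetical, for illustration only).
experimental = [("MLH1", "MSH2"), ("MSH2", "CDK8")]
literature = [("MSH2", "MLH1"), ("CDK8", "P53"), ("P53", "EP300")]

result = augment_network(experimental, literature)
print(sorted(result["new_nodes"]))
print(len(result["new_edges"]))
```

Edges are stored as frozensets so that a literature link reported in the opposite direction is not counted as new.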
Regulatory Network Construction and Analysis
Experiment Data: hMLH1 (DNA repair), MSH2 (DNA repair), CDK8 (Wnt signaling)
Literature augmented data: SMAD4, P53, NF-kB, AKT1, PAK1, SOS
Protein Interaction Prediction (P53, EP300)
• Gene Ontology Annotation Similarity Association
• Structural Interactions
• Pfam Domain Interactions
• Sequence Potential Analysis
Interaction Scoring: (i) First Principle Methods, (ii) Machine Learning
CRC TF Network
CRC miRNA Network
Annotating the Interaction Network with miRNA and miRNA Expression Data
Multi-scale Multi-level Analysis
• Topological Analysis
• Sub-Graph Analysis
• Hypergeometric Associations
Validation of the Significant Genes
Experiments on TF Networks
Set of 48 keywords:
myh, mlh1, cdk8, crcs7, dcc, crcs6, tgfbr1, tpx2, crcs,
apc, hnpcc7, msh2, mlh1, braf, hnpcc, msh6, pten,
fus1, cxcl2, rad18, hgf, axin2, casp3, prl3, nat1, gstm1,
gstt1, cyp2c9, bcl2, prmt1, sn38, cpt11, proxy, smad3,
igfbp1, pdgfb, capg, plk1, ifim1, csnk2a2, mbl2, pms2,
colorectal cancer, cxcl2, igfir, cyp27b1, cyp24, mucins
Literature Mining
• Retrieved 133,923 articles.
• Obtained 2724 unique Swiss-Prot entry names.
Protein Interaction Prediction
Input: 2724 proteins (Swiss-Prot entries)
Protein-protein interaction prediction is based on:
• Gene Ontology Annotation Similarity Association
• Structural Interaction
• Pfam Domain Interaction
• Sequence Potential Analysis
Sliding Window Algorithm for PPI Prediction
Physico-chemical parameters for probable interacting interface identification:
• Hydrophobicity
• Accessibility
• Residue Interface Propensity
Structures shown (PDB ID, chain): 2F1Y A, 1C26 A, 1Z1M A, 1L3E B, 3BIY A
P53 : EP300 = Total Interacting Score (Number of Interface Residues and Number of Structures Interacting)
Protein | % structure Interacting | Protein | % structure Interacting
P53_HUMAN | 70 | MDM2_HUMAN | 93
P53_HUMAN | 59 | EP300_HUMAN | 100
P53_HUMAN | 67 | MDM4_HUMAN | 100
UBP7_HUMAN | 100 | P53_HUMAN | 74
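As a rough illustration of the sliding-window idea on this slide, the sketch below averages a single per-residue property over a fixed-width window and flags windows above a cutoff. It is an assumption-laden simplification: only hydrophobicity is used (the Kyte-Doolittle scale), whereas the slide also lists accessibility and residue interface propensity, and the window width, threshold, and function names are hypothetical rather than the authors' settings.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_window_scores(sequence, window=5, scale=KYTE_DOOLITTLE):
    """Return (start_index, mean_score) for every full window.

    Raises KeyError for residues missing from `scale` (e.g. 'X').
    """
    if window < 1 or window > len(sequence):
        return []
    values = [scale[residue] for residue in sequence]
    total = sum(values[:window])
    scores = [(0, total / window)]
    for start in range(1, len(values) - window + 1):
        # Slide by one residue: add the entering value, drop the leaving one.
        total += values[start + window - 1] - values[start - 1]
        scores.append((start, total / window))
    return scores


def candidate_windows(sequence, window=5, threshold=1.5):
    """Return (start_index, subsequence) for windows scoring at or above threshold."""
    return [
        (start, sequence[start:start + window])
        for start, score in sliding_window_scores(sequence, window)
        if score >= threshold
    ]


# Toy sequence (hypothetical): a charged stretch followed by a hydrophobic one.
print(candidate_windows("DDDDDIIIII", window=5, threshold=4.0))
```

A fuller version would combine several per-residue scales into one window score before thresholding; how the original tool weights them is not stated on the slide.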
Transcription Factor Network Generation for CRC
• 117 transcription factors
• 277 non-transcription factors
• 700 interactions
Multi-level Multi-parametric Approach to Identify Significant
Transcription Factors in CRC Network
• Topological Analysis
  Node strength = function(Protein Interaction Propensity Score, Topological Features)
• Sub-Graph Analysis
• Hypergeometric Associations
A multi-parametric approach is used to identify significant transcription factors.
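One possible reading of two of the scoring ideas on this slide, as a sketch under stated assumptions: node strength is taken here as the sum of interaction propensity scores over a node's edges (the usual weighted-degree definition; the slide does not give the actual function), and the hypergeometric association is the upper-tail probability of finding at least `hits` annotated members in a sub-graph. Function names and the toy scores are hypothetical.

```python
from math import comb


def node_strength(weighted_edges):
    """Sum the edge scores incident to each node of an undirected network."""
    strength = {}
    for a, b, score in weighted_edges:
        strength[a] = strength.get(a, 0.0) + score
        strength[b] = strength.get(b, 0.0) + score
    return strength


def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when `sample` items are drawn without replacement
    from `population` items, of which `annotated` carry the annotation."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy propensity scores (hypothetical, for illustration only).
edges = [("P53", "EP300", 0.9), ("P53", "MDM2", 0.8), ("EP300", "SMAD4", 0.4)]
ranking = sorted(node_strength(edges).items(), key=lambda kv: kv[1], reverse=True)
print(ranking[0][0])

# Chance of at least 5 annotated nodes in a 5-node sub-graph,
# given 5 annotated nodes among 10 in the whole network.
print(hypergeom_pvalue(10, 5, 5, 5))
```

Ranking by strength and filtering by a small hypergeometric p-value are two independent views of significance; combining them into one score would need weights that the slide does not specify.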
Results: Significant Transcription Factors in CRC
Network
Highly Scored Common Transcription Factors:
c-Jun, NF-kB, P53, STAT3, SP1, STAT1, c-MYC, E2F1, SMAD3, MEF2A
Highly Scored Unique Transcription Factors (Topological, Hypergeometric, Module):
LEF1, MEF2C, SMAD2, SMAD4, ELK-1, PPARA
DAND5, RXRA, ESR1, ATF-2, SP3, RARA, PPARD
P73, ETS1, ETS2, GATA-1, FOXA1, FOXA2, SLUG, HAND1, SNAIL, VDR, TF7L2, ITF-2, REST, SRF, IRF1
Result: A Highly-scored Module
Module nodes: PIAS1, C-JUN, ATF-2, MAPK14, MAPK1, ELK-1, MK10, ESR1, JNK1, MK09
Validation of the Significant Genes
Global Transcription Factor Association Network
showing Functional Groups
Annotation of miRNA with Transcription
Factors in CRC
• Expression dataset: GSE14985
• 3 normal samples, 3 colon samples
• No. of miRNA: 723
• Top 100 differentially expressed miRNA are identified.
• 26 upregulated and 74 downregulated miRNA are further analyzed.
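The slide does not say which statistic was used to pick the top 100 miRNA. As a hedged sketch only, the example below ranks miRNA by the absolute log2 fold change between mean tumor and mean normal expression and then splits the top entries into up- and down-regulated sets. The function name, the pseudocount, and the toy values are assumptions; none of them come from GSE14985.

```python
from math import log2
from statistics import mean


def rank_differential(expression, n_top=100, pseudocount=1.0):
    """Split the n_top most changed miRNA into (up, down) lists.

    expression maps a miRNA name to (normal_values, tumor_values).
    The pseudocount avoids division by zero for unexpressed miRNA.
    """
    change = {
        name: log2((mean(tumor) + pseudocount) / (mean(normal) + pseudocount))
        for name, (normal, tumor) in expression.items()
    }
    # Largest absolute change first; ties keep their input order.
    ranked = sorted(change, key=lambda name: abs(change[name]), reverse=True)
    top = ranked[:n_top]
    up = [name for name in top if change[name] > 0]
    down = [name for name in top if change[name] < 0]
    return up, down


# Toy data with three replicates per group (hypothetical names and values).
toy = {
    "miR-A": ([1, 1, 1], [7, 7, 7]),     # higher in tumor
    "miR-B": ([15, 15, 15], [3, 3, 3]),  # lower in tumor
    "miR-C": ([5, 5, 5], [5, 5, 5]),     # unchanged
}
up, down = rank_differential(toy, n_top=2)
print(up, down)
```

With only three samples per group, a fold-change ranking like this is sensitive to noise; a moderated test statistic would be a common alternative, but that choice is outside what the slide states.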
Novel miRNA Identified
Up-regulated
Novel miRNA | Target of miRNA | Relevance to cancer
hsa-miR-663 | CCND1, FOS, PTEN, TGFBR1 | Not reported*
hsa-miR-630 | ATM, BAX, BCL2, BCL2L2, CASP3, p53, TP73 | Not reported*
hsa-miR-424 | ATF2, BCR, CCND1, CDK6, CHEK1, E2F1, EGFR, ESR1, ETS1, FLT3, HIF1A, MUC1, MYB, RARA, RUNX1, SMAD3, SP2, WEE1 | Kidney, Pancreatic cancer
Novel miRNA Identified
Down-regulated
Novel miRNA | Target of miRNA | Disease
hsa-let-7c | BBC3, BCL2, MCL1, MEF2C, MYC, NGF, PPARA, ADAM9 | Lung, hepatocellular cancer
hsa-let-7d | BDNF, CCND1, EGFR, SMAD3 | Epithelial ovarian cancer
hsa-let-7i | BCL2, HIF1A, NFKB1, TLR4 | Breast cancer
hsa-miR-103 | BMP7, CDK6, PPARA | Pancreatic cancer
hsa-miR-100 | AKT1, CCND1, ESR1, FGFR3, JUN, P53, MYC | Oral squamous cell carcinoma
hsa-miR-99a | AKT1, BDNF, CCND1, JUN, IGF1, MYC, p53 | Bladder cancer
hsa-miR-30e | BCL2L2, ERBB2 | Lung cancer
hsa-miR-425 | SMAD3 | Glioblastoma
hsa-miR-361-5p | AKT1, IRS1 | Ovarian cancer
hsa-miR-494 | AKT1, CDK6, JUN, PTEN | Cardiac Hypertrophy
hsa-miR-331-3p | AKT1, EGFR, ERBB2 | Epithelial ovarian cancer
miRNA-gene Network
Number of miRNA Associated with CRC
Related Pathways
Validation of the Significant Genes
Module: Brca1:p53:c-Myc
Pathway: Brca1 as a transcription regulator
Domain: DNA Damage
Protein-Protein Interaction Prediction Tool
Algorithm for Interacting Proteins
Publications
M. Pradhan, P. Gandra, M. Palakal, Predicting Protein-Protein Interactions using First Principle
Methods and Statistical Scoring, ACM International Symposium on Biocomputing, Calicut, 2010.
M. Pradhan and M. Palakal, Global analysis of transcription factors and functional domains in CRC.
(Manuscript under preparation).
M. Pradhan and M. Palakal, Identifying CRC specific pathways and biomarkers from literature
augmented proteomics data, BIOCOMP 2010.
M. Pradhan and M. Palakal, Global analysis of miRNA target genes in colon rectal cancer, IEEE BIBM,
Hong Kong, 2010.
M. Pradhan and M. Palakal, Global analysis of transcription factors in CRC using protein interaction
networks. (Manuscript in final stages).
M. Pradhan and M. Palakal, Identifying candidate pathways and genes in CRC: meta-analysis of gene
expression data (Manuscript in preparation).
M. Pradhan and M. Palakal, Machine Learning for Predicting Protein Interactions (Manuscript in
preparation).
M. Pradhan, P. Sanders and M. Palakal, Algorithm for Protein-drug binding predictions (Manuscript in
preparation).
Y. Pandit, M. Pradhan and M. Palakal, Database for Protein-Protein Interaction Predictions
(Manuscript in preparation).
Acknowledgements
The TiMAP team:
Meeta Pradhan
Shielly Hartanto
Premchand Gandra
Deepali Jhamb
Rini Pauly
Gokul Kilaru
Philip Sanders
Yogesh Pandit
Sijin C. A.
Tulip Nadu
Kshithija Nagulapalli
http://regen.informatics.iupui.edu/research/
Questions?