Laboratory for Text Information, Mining, Analysis and Prediction (TiMAP)

Construction of Molecular Networks and Pathways using OMICs and Literature Data
Mathew Palakal and Meeta Pradhan
School of Informatics, IUPUI (Indiana University-Purdue University, Indianapolis)

From Bibliomics to Target Discovery for Colorectal Cancer
CRC-related keywords -> BioSIFTER (literature harvesting and personalization) -> BioMAP (mining and identification of novel biomarkers)

BioMAP: BioMedical Literature Mining
"A major challenge faced by biologists is to identify the most significant genes in a disease that can be targeted."
Experimental data provide an initial set of nodes and links.
Our hypothesis: augmenting the experimental data with literature data can help identify novel molecules that may be of significant relevance to the study under consideration, adding new nodes and links to the network.

Regulatory Network Construction and Analysis (multi-scale, multi-level analysis)
- Experiment data: hMLH1 (DNA repair), MSH2 (DNA repair), CDK8 (Wnt signaling)
- Literature-augmented data: SMAD4, P53, NF-kB, AKT1, PAK1, SOS
- Protein interaction prediction (example pair: P53-EP300): Gene Ontology annotation similarity, association, structural interactions, Pfam domain interactions, sequence potential analysis
- Interaction scoring: (i) first principle methods, (ii) machine learning
- Networks: CRC transcription factor (TF) network and miRNA network
- Identification of significant nodes in the network: topological analysis, sub-graph analysis, hypergeometric associations
- Annotation of the interaction network with miRNAs and miRNA expression data
- Validation of the significant genes

Experiments on TF Networks
Set of 48 keywords: myh, mlh1, cdk8, crcs7, dcc, crcs6, tgfbr1, tpx2, crcs, apc, hnpcc7, msh2, mlh1, braf, hnpcc, msh6, pten, fus1, cxcl2, rad18, hgf, axin2, casp3, prl3, nat1, gstm1, gstt1, cyp2c9, bcl2, prmt1, sn38, cpt11, proxy, smad3, igfbp1, pdgfb, capg, plk1, ifim1, csnk2a2, mbl2, pms2, colorectal cancer, cxcl2, igfir, cyp27b1, cyp24, mucins

Literature Mining
Retrieved 133,923 articles.
Obtained 2724 unique Swiss-Prot entry names.

Protein Interaction Prediction
Input: 2724 proteins.
Protein-protein interaction prediction is based on:
- Gene Ontology annotation similarity
- Association
- Structural interaction
- Pfam domain interaction
- Sequence potential analysis
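The slides list Gene Ontology annotation similarity as one basis for interaction prediction but do not say how the similarity is measured. As a minimal sketch, assuming a protein pair is scored by the Jaccard index of the two proteins' GO term sets, the Python below ranks candidate pairs. The function names `go_jaccard` and `rank_pairs` and the toy annotations are hypothetical and not part of BioMAP.

```python
def go_jaccard(terms_a: set, terms_b: set) -> float:
    """Jaccard index |A & B| / |A | B| of two GO term sets; 0.0 if both are empty."""
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


def rank_pairs(annotations: dict, min_score: float = 0.0) -> list:
    """Score every unordered protein pair; return (score, a, b) tuples, highest first."""
    names = sorted(annotations)
    scored = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            s = go_jaccard(annotations[a], annotations[b])
            if s >= min_score:
                scored.append((s, a, b))
    scored.sort(reverse=True)
    return scored


if __name__ == "__main__":
    # Toy annotations with placeholder term labels (not real GO assignments).
    toy = {
        "PROT_A": {"go_1", "go_2", "go_3"},
        "PROT_B": {"go_2", "go_3", "go_4"},
        "PROT_C": {"go_5"},
    }
    for score, a, b in rank_pairs(toy):
        print(f"{a}-{b}: {score:.2f}")
```

In a fuller pipeline this score would be only one feature, combined with the association, structural, domain and sequence evidence listed above.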
Sliding Window Algorithm for PPI Prediction
Physico-chemical parameters used to identify probable interface residues: hydrophobicity, accessibility and residue interface propensity.
P53 : EP300 = total interacting score (number of interface residues and number of structures interacting)
Structures (PDB ID, chain): 2F1Y A, 1C26 A, 1Z1M A, 1L3E B, 3BIY A

P53
Protein | % structures interacting
P53_HUMAN | 70
P53_HUMAN | 59
P53_HUMAN | 67
UBP7_HUMAN | 100

EP300
Protein | % structures interacting
MDM2_HUMAN | 93
EP300_HUMAN | 100
MDM4_HUMAN | 100
P53_HUMAN | 74

Transcription Factor Network Generation for CRC
117 transcription factors
277 non-transcription factors
700 interactions
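The slide names the three physico-chemical parameters but not how they are combined. The sketch below assumes a weighted per-residue sum of the three values, averaged over a centred window that is truncated at the chain ends. Residues whose window mean reaches a threshold are flagged as probable interface residues. Window size, weights and threshold are illustrative assumptions, and `combined_scores` and `predict_interface` are hypothetical names.

```python
from statistics import mean


def combined_scores(hydro, access, propensity, weights=(1.0, 1.0, 1.0)):
    """Weighted per-residue sum of hydrophobicity, accessibility and interface propensity."""
    if not len(hydro) == len(access) == len(propensity):
        raise ValueError("parameter lists must have equal length")
    wh, wa, wp = weights
    return [wh * h + wa * a + wp * p for h, a, p in zip(hydro, access, propensity)]


def predict_interface(scores, window=5, threshold=1.0):
    """Indices of residues whose centred window mean is >= threshold.

    Windows are truncated at the chain ends rather than padded."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    hits = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        if mean(scores[lo:hi]) >= threshold:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Toy, pre-normalized values for a 9-residue segment (not real scales).
    hydro = [0.1, 0.2, 0.9, 0.8, 0.9, 0.7, 0.1, 0.0, 0.2]
    access = [0.2, 0.1, 0.7, 0.9, 0.8, 0.6, 0.1, 0.2, 0.1]
    prop = [0.0, 0.1, 0.6, 0.7, 0.8, 0.5, 0.2, 0.1, 0.0]
    scores = combined_scores(hydro, access, prop)
    print(predict_interface(scores, window=3, threshold=1.5))
```

Counting the flagged residues per structure gives the kind of "number of interface residues" input that the total interacting score above refers to. The exact scoring formula is not given on the slide.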
Multi-level, Multi-parametric Approach to Identify Significant Transcription Factors in the CRC Network
Node strength = function(protein interaction propensity score, topological features)
A multi-parametric approach is used to identify significant transcription factors.

Results: Significant Transcription Factors in the CRC Network
Highly scored common transcription factors: c-Jun, NF-kB, P53, STAT3, SP1, STAT1, c-MYC, E2F1, SMAD3, MEF2A
Highly scored unique transcription factors:
- Topological: LEF1, MEF2C, SMAD2, SMAD4, ELK-1, PPARA
- Hypergeometric: DAND5, RXRA, ESR1, ATF-2, SP3, RARA, PPARD
- Module: P73, ETS1, ETS2, GATA-1, FOXA1, FOXA2, SLUG, HAND1, SNAIL, VDR, TF7L2, ITF-2, REST, SRF, IRF1
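The slide defines node strength only as a function of the interaction propensity score and topological features. One plausible reading, offered here as an assumption, blends two measures, each scaled to [0, 1]. The first is weighted degree, the sum of propensity scores on a node's edges. The second is plain degree. The mixing weight `alpha`, the function name `node_strength` and the toy edges are hypothetical.

```python
from collections import defaultdict


def node_strength(edges, alpha=0.5):
    """Score nodes of an undirected network.

    edges: iterable of (u, v, propensity) tuples with propensity >= 0.
    score = alpha * weighted_degree / max_weighted_degree
            + (1 - alpha) * degree / max_degree
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    weighted = defaultdict(float)  # sum of propensity scores on incident edges
    degree = defaultdict(int)      # number of incident edges (a topological feature)
    for u, v, w in edges:
        for node in (u, v):
            weighted[node] += w
            degree[node] += 1
    max_w = max(weighted.values(), default=0.0) or 1.0  # avoid dividing by zero
    max_d = max(degree.values(), default=0) or 1
    return {
        n: alpha * weighted[n] / max_w + (1 - alpha) * degree[n] / max_d
        for n in degree
    }


if __name__ == "__main__":
    # Toy network with placeholder node names and made-up propensity scores.
    toy_edges = [("TF_A", "GENE_1", 0.9), ("TF_A", "GENE_2", 0.4), ("TF_B", "GENE_1", 0.2)]
    for node, score in sorted(node_strength(toy_edges).items(), key=lambda kv: -kv[1]):
        print(f"{node}: {score:.3f}")
```

Other topological features, such as betweenness or clustering coefficient, could be added as further normalized terms in the same way.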
Result: A Highly-scored Module
Module members: PIAS1, C-JUN, ATF-2, MAPK14, MAPK1, ELK-1, MK10, ESR1, JNK1, MK09

Validation of the Significant Genes (figures)
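Hypergeometric association is listed among the node-identification methods. It tests whether a group of nodes, such as a module, contains more members of a category than chance would predict. A minimal standard-library sketch computes the upper-tail probability P(X >= k). In the example, the population of 394 nodes (117 transcription factors + 277 non-transcription factors) comes from the TF network slide, but the module size and TF count are hypothetical. The function name is also an illustrative choice.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """Return P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: total nodes; successes: nodes in the category (e.g. TFs);
    draws: size of the group tested (e.g. a module); k: category members observed.
    """
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("need 0 <= successes, draws <= population")
    total = comb(population, draws)
    top = min(successes, draws)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible outcomes contribute nothing to the sum.
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), top + 1)
    )
    return hits / total


if __name__ == "__main__":
    n_tf, n_non_tf = 117, 277           # from the TF network slide
    module_size, tfs_in_module = 10, 7  # hypothetical values for illustration
    p = hypergeom_upper_tail(tfs_in_module, n_tf + n_non_tf, n_tf, module_size)
    print(f"P(X >= {tfs_in_module}) = {p:.3g}")
```

Exact integer arithmetic with `math.comb` avoids the underflow issues of multiplying many small probabilities at this network size.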
Global Transcription Factor Association Network Showing Functional Groups (figure)

Annotation of miRNA with Transcription Factors in CRC
Expression dataset: GSE14985 (3 normal samples, 3 colon samples)
Number of miRNAs: 723
The top 100 differentially expressed miRNAs were identified; 26 up-regulated and 74 down-regulated miRNAs were analyzed further.

Novel miRNAs Identified: Up-regulated
Novel miRNA | Targets of miRNA | Literature relevance to cancer
hsa-miR-663 | CCND1, FOS, PTEN, TGFBR1 | Not reported*
hsa-miR-630 | ATM, BAX, BCL2, BCL2L2, CASP3, p53, TP73 | Not reported*
hsa-miR-424 | ATF2, BCR, CCND1, CDK6, CHEK1, E2F1, EGFR, ESR1, ETS1, FLT3, HIF1A, MUC1, MYB, RARA, RUNX1, SMAD3, SP2, WEE1 | Kidney, pancreatic cancer

Novel miRNAs Identified: Down-regulated
Novel miRNA | Targets of miRNA | Disease
hsa-let-7c | | Lung, hepatocellular cancer
hsa-let-7d | | Epithelial ovarian cancer
hsa-let-7i | BCL2, HIF1A, NFKB1, TLR4 | Breast cancer
hsa-miR-103 | BMP7, CDK6, PPARA | Pancreatic cancer
hsa-miR-100 | AKT1, CCND1, ESR1, FGFR3, JUN, MYC, P53 | Oral squamous cell carcinoma
hsa-miR-99a | AKT1, BDNF, CCND1, IGF1, JUN, MYC, p53 | Bladder cancer
hsa-miR-30e | BCL2L2, ERBB2 | Lung cancer
hsa-miR-425 | SMAD3 | Glioblastoma
hsa-miR-361-5p | AKT1, IRS1 | Ovarian cancer
hsa-miR-494 | AKT1, CDK6, JUN, PTEN | Cardiac hypertrophy
hsa-miR-331-3p | AKT1, EGFR, ERBB2 | Epithelial ovarian cancer
Further targets shown for the let-7c and let-7d rows: BBC3, BCL2, MCL1, MEF2C, MYC, NGF, PPARA, ADAM9; BDNF, CCND1, EGFR, SMAD3

miRNA-gene Network (figure)

Number of miRNAs Associated with CRC-related Pathways (figure)
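The slides report the top 100 differentially expressed miRNAs from GSE14985 but do not state the statistic used. As a rough sketch, assuming miRNAs are ranked by the absolute log2 fold change of group means, the function below returns the up- and down-regulated members of the top N. A pseudocount avoids division by zero. `top_differential` is a hypothetical name, and the toy values are not from GSE14985.

```python
from math import log2
from statistics import mean


def top_differential(normal, tumor, top_n=100, pseudocount=1.0):
    """Rank miRNAs by |log2 fold change| of group means (tumor vs normal).

    normal, tumor: dicts mapping an miRNA ID to non-negative expression values.
    Returns (up, down): lists of (miRNA ID, log2FC) within the top_n.
    """
    shared = sorted(normal.keys() & tumor.keys())
    fold = {
        m: log2((mean(tumor[m]) + pseudocount) / (mean(normal[m]) + pseudocount))
        for m in shared
    }
    # Largest absolute change first; ties broken by ID for a stable order.
    ranked = sorted(fold.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:top_n]
    up = [(m, v) for m, v in ranked if v > 0]
    down = [(m, v) for m, v in ranked if v < 0]
    return up, down


if __name__ == "__main__":
    # Toy values (three samples per group), not data from GSE14985.
    normal = {"mir_x": [2.0, 2.5, 1.5], "mir_y": [8.0, 9.0, 7.0], "mir_z": [4.0, 4.0, 4.0]}
    tumor = {"mir_x": [9.0, 8.0, 10.0], "mir_y": [2.0, 1.0, 3.0], "mir_z": [4.1, 3.9, 4.0]}
    up, down = top_differential(normal, tumor, top_n=2)
    print("up:", up)
    print("down:", down)
```

With only three samples per group, a variance-aware test such as a moderated t-statistic is usually preferable to fold change alone. Fold change is used here only to keep the sketch short.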
Validation of the Significant Genes
Module: Brca1 : p53 : c-Myc
Pathway: Brca1 as a transcription regulator
Domain: DNA damage

Protein-Protein Interaction Prediction Tool
Algorithm for interacting proteins (figure)

Publications
- M. Pradhan, P. Gandra and M. Palakal, "Predicting Protein-Protein Interactions using First Principle Methods and Statistical Scoring," ACM International Symposium on Biocomputing, Calicut, 2010.
- M. Pradhan and M. Palakal, "Global analysis of transcription factors and functional domains in CRC" (manuscript under preparation).
- M. Pradhan and M. Palakal, "Identifying CRC specific pathways and biomarkers from literature augmented proteomics data," BIOCOMP 2010.
- M. Pradhan and M. Palakal, "Global analysis of miRNA target genes in colon rectal cancer," IEEE BIBM, Hong Kong, 2010.
- M. Pradhan and M. Palakal, "Global analysis of transcription factors in CRC using protein interaction networks" (manuscript in final stages).
- M. Pradhan and M. Palakal, "Identifying candidate pathways and genes in CRC: meta-analysis of gene expression data" (manuscript in preparation).
- M. Pradhan and M. Palakal, "Machine Learning for Predicting Protein Interactions" (manuscript in preparation).
- M. Pradhan, P. Sanders and M. Palakal, "Algorithm for Protein-drug binding predictions" (manuscript in preparation).
- Y. Pandit, M. Pradhan and M. Palakal, "Database for Protein-Protein Interaction Predictions" (manuscript in preparation).

Acknowledgements
The TiMAP team: Meeta Pradhan, Shielly Hartanto, Premchand Gandra, Deepali Jhamb, Rini Pauly, Gokul Kilaru, Philip Sanders, Yogesh Pandit, Sijin C. A., Tulip Nadu, Kshithija Nagulapalli
http://regen.informatics.iupui.edu/research/

Questions?