Functional Interpretation of Large-scale Omics Data through Pathway and Network Analysis
Bio-Trac 40 (Protein Bioinformatics), October 9, 2008
Zhang-Zhi Hu, M.D., Research Associate Professor
Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center

Overview
• Introduction
  – What are large-scale omics data?
  – What do they tell you? How to interpret?
• Approaches
  – Omics data integration
  – Resources: databases and tools
• Case studies
• Systems biology
  – Top-down, bottom-up
  – Pathway, network modeling

Bioinformatics focus is changing…
• Individual molecules
  – DNA, RNA, proteins
  – Sequence, structure, function
  – Evolutionary analysis
• Population of molecules
  – Genome, proteome and other “-omes”
  – Interactions, complexes
  – Genomics, Proteomics
  – Pathways, processes
  – High level organizations

From One Gene: multiple genetic variants, multiple transcripts, multiple protein products… and PTMs…

To Global Knowledge: The “-ome” and “-omics”
• Genome
• Transcriptome
• Proteome
• Metabolome
• Other “-omes”: ORFeome, Promoterome, Interactome, Receptome, Phenome, more…

Gastric Cancer: Global analysis
[Figure: genes of the ECM cluster]
Potential gene markers: SPARC, COL3A1, SULF1, YARS, ABCA5, THY1, SIDT2
Corresponding to ECM cluster (Chen et al., 2003; Qiu et al., 2007)

Identification of novel MAP kinase pathway signaling targets
(PMA/TPA, K562 cells, MAPK pathway targets)
~3500 spots; digest of U-24; ~91 spot changes reproducible
Twenty-five targets of this signaling pathway were identified, of which only five were previously characterized as MKK/ERK effectors. The remaining targets suggest novel roles for this signaling cascade in cellular processes of nuclear transport, nucleotide excision repair, nucleosome assembly, membrane trafficking, and cytoskeletal regulation.
-- Mol Cell. 6:1343-54, 2000

Drosophila Embryo Interaction Map
Using Y2H technology: 102 bait proteins homologous to human cancer genes; 2300 interactions detected, 710 with high confidence.
The proteins in the map that bear an RA (Ras Association) or RBD (Raf-like Ras-binding) domain define a discrete subnetwork around Ras-like GTPases (colored in yellow).
The exploration of the present map leads to numerous biological hypotheses and expands our knowledge of regulatory protein networks important in human cancer, as shown by the biological analysis of a particularly interesting network surrounding the Ras oncogene.
Genome Res. 15:376-84, 2005.

Strategy for Functional Analyses of Omics Data
• Omics data: microarray, 2D, IP, MS, etc.
• Protein mapping
• Bioinformatics databases, data integration: gene, protein, PPI, pathway, PTM, etc.
• Text mining: literature (MEDLINE)
• Functional annotation
• Functional analysis, biological insights
  – GO profiling: molecular function, biological process, cellular component (~50% GO annotations)
  – Molecular networks (e.g. interaction, association)
  – Biological pathways (e.g. KEGG, Reactome, PID, BioCarta) (<10% pathway annotations)
• Pathway, network, biomarker discovery

Methods for Functional Analysis
• Omics data integration
• Functional profiling
• Pathway analysis
• Resources/knowledgebases
  – Molecular databases
  – Omics data repositories
• Bioinformatics tools
  – Open source: DAVID, FatiGO, iProXpress
  – Commercial: Ingenuity, GeneGO
• Literature
  – Text mining

Principles of multi-omics data integration for Systems Biology
Protein-Centric -Omics Analysis (iProXpress)
[Figure: omics data types mapped onto proteins for functional profiling and analysis]
• Transcriptomics: mRNA microarray; dbEST coding EST
• Epigenomics: DNA methylation profiling of coding genes
• Proteomics: protein, protein precursor, splicing forms, function, sites
• Peptidomics: peptide, natural peptides, protease/peptidase
• Metabolomics: metabolites (HMDB), enzymes
• Genomics: dbSNP/HapMap NS-SNP
• Functional profiling and analysis: metabolic pathways, signaling pathways, biological processes

ID Mapping
Batch gene/protein retrieval and profiling
• Enter ID, gi #
• Information matrix
• Functional profiling
http://pir.georgetown.edu/pirwww/search/idmapping.shtml

Protein annotations
Well annotated entry: human p53 (P53_HUMAN), 21 years!
• Comments (CC line)
• Features (FT line)
• References (RX line)
• Cross References (DR line)
• GO

• what molecular function?
• what biological process?
• what cellular component?

Biological Pathways and Networks
• Signaling pathways
• Metabolic pathways
• Organelle biogenesis
• Molecular networks

Pathways: Human metabolic maps
Global gene expression in skeletal muscle from gastric bypass patients before surgery and 1 year afterward. General trend after surgery: upregulated anaerobic metabolism; down-regulated oxidative phosphorylation.
Color key: green, down-regulated genes; red, up-regulated genes; white, no data available.
Proc Natl Acad Sci U S A. 2007 Feb 6;104(6):1777-82
http://www.pnas.org/cgi/data/0610772104/DC1/30

Databases of Protein Functions
• Metabolic Pathways
  – KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
  – EcoCyc: Encyclopedia of E. coli Genes and Metabolism
  – MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
• Inter-Molecular Interactions and Regulatory Pathways
  – IntAct: Protein interaction data from literature and user submission
  – BIND: Descriptions of interactions, molecular complexes and pathways
  – DIP: Catalogs experimentally determined interactions between proteins
  – Reactome: A curated knowledgebase of biological pathways
  – Pathway Interaction Database (PID)
  – BioCarta: Biological pathways of human and mouse
  – Pathway Commons
• GO and GO annotation projects

Gene Ontology (GO)

GO Slim
http://www.geneontology.org/GO.slims.shtml

Biological Pathway Resource Collection
http://www.pathguide.org/
• Protein-protein interactions
• Metabolic pathways
• Signaling pathways
• Pathway diagrams
• Transcription factors / gene regulatory networks
• Protein-compound interactions
• Genetic interaction networks

Pathway Commons
http://www.pathwaycommons.org/pc/home.do

KEGG Metabolic & Regulatory Pathways
KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions.
(http://www.genome.ad.jp/kegg/pathway.html)

BioCarta Cellular Pathways
(http://www.biocarta.com/index.asp)
Transforming Growth Factor (TGF) beta signaling [Homo sapiens]

Reactome: events and objects (including modified forms and complex)
Transforming Growth Factor (TGF) beta signaling [Homo sapiens]
(http://reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=170834&)
• Event -> REACT_6879.1: Activated type I receptor phosphorylates R-SMAD directly [Homo sapiens]
• Object -> REACT_7364.1: Phospho-R-SMAD [cytosol]
• Event -> REACT_6760.1: Phospho-R-SMAD forms a complex with CO-SMAD [Homo sapiens]
• Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD complex [cytosol]
• Event -> REACT_6726.1: The phospho-R-SMAD:CO-SMAD transfers to the nucleus
• Object -> REACT_7382.2: Phospho-R-SMAD:CO-SMAD complex [nucleoplasm]
……

PID
Transforming Growth Factor beta signaling

Transforming Growth Factor (TGF) beta signaling: Reactome and PID
~26 proteins in PID are not defined in Reactome, while only 2 in Reactome are not defined in PID.

TGF-beta signaling – comparison between PID and Reactome
[Figure: pathway diagram spanning cytoplasm and nucleus. Inputs shown: Ca2+, growth signals, stress signals. Components shown include LAP, TGF-b, Furin, TGF-beta receptor (I and II), Smad 2, Smad 4, Shc, XIAP, CaM, TAK1, MEKK1, MAPKKK, ERK1/2 and Ski. Outcomes shown include degradation, DNA binding and transcription regulation, the P38 MAPK pathway and the JNK cascade. PRO identifiers shown: PRO:000000616, PRO:000000410, PRO:000000650, PRO:000000523.]
Legend: Phosphorylation (P) at Serine (S), Threonine (T) and Tyrosine (Y); Ubiquitination (U) at Lysine (K). Components are marked as common in both Reactome & PID or as only reported in Reactome.
* All others are in PID.
Not all components in the pathway from both databases are listed.

• GEO: a gene expression/molecular abundance repository
  http://www.ncbi.nlm.nih.gov/geo/
• IntAct: open source database system and analysis tools for protein interaction data
• PRIDE: centralized, standards compliant, public data repository for proteomics data
  http://www.ebi.ac.uk/pride/

Analysis Tools
• iProXpress – http://pir.georgetown.edu/iproxpress/
• DAVID – http://david.abcc.ncifcrf.gov/
• Babelomics - FatiGO – http://babelomics.bioinfo.cipf.es/
• Commercial:
  – Ingenuity: http://www.ingenuity.com/
  – GeneGO: http://www.genego.com/
• Visual tools:
  – Cytoscape: http://www.cytoscape.org/
  – CellDesigner: http://www.celldesigner.org/

iProXpress: Integrative analysis of proteomic and gene expression data
http://pir.georgetown.edu/iproxpress/
• Data: MS spectrum, peptide identification, protein identification
• Information: function, pathway, family
• Knowledge: categorize, statistics, association

iProXpress – Pathway Profiling
• Organelle proteome data sets (ER, Mit)
• Protein information matrix: extensive annotations including protein name, family classification, function, protein-protein interaction, pathway…
• Functional profiling: iterative categorization, sorting, cross-dataset comparison, coupled with manual examination.
[Figure: KEGG pathway]

iProXpress Analysis Interface
[Screenshot with numbered callouts 1 to 8]
Cross-data groups comparative profiling

DAVID
http://david.abcc.ncifcrf.gov/

A Literature-Derived Network for Yeast
All MEDLINE abstracts processed using statistical co-occurrence and NLP methods:
• Functional association (co-occurrence) – grey shades
• Physical interaction – green
• Regulation of expression – red
• Phosphorylation – dark blue
• Dephosphorylation – light blue
Inference: Ssn3 -> Hsp104 (b) and Ume6 -> Ino2 & Erg9 (c) expressions
Jensen et al., 2006

Case Studies
Pathway studies: analysis of proteomics and gene expression data from cancer research
I. Estrogen Signaling Pathways (estrogen-induced apoptosis)
   Breast cancer cells (+E2); IP (AIB1, pY); 1D-gel; MS/MS
II. Purine Metabolic Pathways (radiation-induced DNA repair)
   Human fibroblast (AT patient) + irradiation; 2D-gel; MS; DNA microarray
III. Melanosome Biogenesis (comparative organelle proteomic profiling)
   Melanoma cell; isolation of stage-specific melanosomes; MS

I. Estrogen Signaling Pathways (estrogen-induced apoptosis)
[Figure: experimental design]
• Breast cancer cells: MCF-7 and MCF-7/5C (estrogen deprived condition, mimicking clinical condition: 2nd phase antiestrogen drug resistance)
• E2 treatment: 200 nM for 2 h; outcomes shown: growth, apoptosis
• Signaling pathway: early events?
• pY-IP and AIB1-IP, MS proteomics
• Integrated bioinformatics: expression profiling, pathway/network mapping
Hu ZZ, et al. (2008) US HUPO

Proteins only in E2 treated MCF-7/5C cells from both pY-IP and AIB1-IP
GO profiling (biological process):
• Transcription
• Cell communication
• Chromosome remodeling & co-repression, cell cycle inhibition, apoptosis

Pathway Mapping:
• G(o) alpha-2 subunit (pY/5C +E2)
• RAP1GAP (AIB1/5C +E2)

Hypothesized E2-induced Apoptosis Pathways
[Figure: hypothesized pathways in cytoplasm and nucleus involving E2, GPR30 (?), Gas (?), GNAO2, AIB1, Rap1a, Rap1GAP, ERa, MEK, ERK, BAD, CDK1, Sirt3, TLE3, CIP29 and RUNX3, with proteins marked by source (pY-IP, AIB1-IP). Functions annotated in the figure: G(o) alpha-2, GPCR signaling; growth inhibition/apoptosis; BAD-mediated apoptosis; histone modification, apoptosis; co-repression, apoptosis; cell cycle arrest/apoptosis; cell growth.]

Text mining for protein-protein interaction (PPI) information

II. Purine Metabolic Pathways (radiation-induced DNA repair)
[Figure: experimental design]
• AT patient fibroblasts exposed to ionizing radiation: AT5BIVA (ATM-mutated, sensitive to IR damage) and ATCL8 (ATM introduced, ATM-wild type, resistant to IR damage)
• 2D-gel/MS: proteins differentially expressed (1093)
• DNA microarray: mRNAs differentially expressed (231)
Hu ZZ, et al. (2008) J Prot. Bioinfo.
• Integrated bioinformatics: expression profiling, pathway/network mapping
• Intersections (13 proteins/genes)

KEGG pathway profiles
(RRM2)

Purine metabolic pathway
[Figure: purine metabolic pathway showing ATP, dATP, GTP, dGTP, ADP, dADP, GDP and dGDP, with DNA synthesis and DNA repair. The ADP/dADP and GDP/dGDP steps are labeled 1.17.4.1, ribonucleoside diphosphate reductase subunit M2 (RRM2).]

Functional Association Networks
[Figure: network containing RRM2, HDAC1, p53 and BRCA1]
RRM2 is connected to other major DNA repair and cell cycle proteins, such as p53, BRCA1 and HDAC1.

RRM2 in radiation-induced ATM-p53-mediated DNA repair pathway
[Figure: ATM, p53, HDAC1, BRCA1, RRM2 and RRM1 (RR complex), DNA repair]

III. Organelle Proteomes
Comparative organelle proteome profiling makes it possible to propose key proteins potentially involved in the regulation of organelle biogenesis.
[Figure: Schematic drawing of melanosome biogenesis pathway and key proteins involved in each stage. Compartments shown include endoplasmic reticulum, Golgi, early endosome, late endosome, lysosome, stage I hybrid organelle, stage II and stage IV melanosomes, cell membrane, melanocyte and keratinocytes. Proteins shown include Pmel17, MART1, TYR, Tyrp1, DCT, Matp, OA1, Rab5, Rab5c, Rab27a, Rab38, Myo-Va, Lyst, AP-2a, SLC24A5 (golden), vATPase G2, Atp7a, VAP-A, Sec24, MGST3, PEDF, Flotillin-2, Vinculin, Drebrin, ARPC4, b-actin, P21-rac1, and molecular motors (kinesin, dynein/dynactin, dynamin, myosin V, myosin Ic, Id).]
Legend: V, newly identified and validated; M, mouse color gene homolog; P, proposed new protein. Untagged are known melanosome proteins.
Chi A, et al. (2006) J. Prot. Res.

Towards Systems Biology (Nature 422:193, 2003)
• Genomics
• Transcriptomics
• Proteomics
• Metabolomics
• Bibliomics, literature mining
• …omics
• Bioinformatics
Integrated knowledge and tools are needed for Systems Biology research

What is Systems Biology?
Systems Biology, 2004, 1(1):19-27.
‘Systems biology defines and analyses the interrelationships of all of the elements in a functioning system in order to understand how the system works.’ -- Leroy Hood
• How an organism works from an overall perspective.
• Interactions of parts of biological systems
  – how molecules work together to serve a regulatory function in cells or between cells.
  – how cells work to make organs, how organs work to make a person.
• Systems biology is the converse of reductionist biology.

Reductionist vs. Systems Biology
The driving force in 20th century biology has been reductionism:
• From the population to the individual
• From the individual to the cell
• From the cell to the biomolecule
• From the biomolecule to the genome
• From the genome to the genome sequence
With the publication of genome sequences, reductionist biology has reached its endpoint.
The driving force for 21st century biology will be integration:
• Integrating the activity of genes and regulators into regulatory networks
• Integrating the interactions of amino acids into protein folding predictions
• Integrating the interactions of metabolites into metabolic networks
• Integrating the interactions of cells into organisms
• Integrating the interactions of individuals into ecosystems

Universal Organizing Principles
• Level 4: Large-scale organization
• Level 3: Functional modules
• Level 2: Regulatory motif, pathway
• Level 1: Omics data, information
Although the individual components are unique to a given organism, the topologic properties of cellular networks share surprising similarities with those of natural and social networks.

Approaches: top-down or bottom-up
Three types of models
• top-down: systemic-data driven, to discover or refine pre-existing models that describe the measured data (more on regulatory models). Emerges as dominant method due to “-omics”.
• bottom-up: starts with the molecular properties to construct models to predict systemic properties, followed by validation and model refinement (more on kinetic models) (Silicon cell program: http://www.siliconcell.net/)
Bruggeman FJ, Westerhoff HV. Trends Microbiol. 2007 15:45-50.

Top-down
• Yeast two-hybrid
• Combination of techniques (Y2H, protein arrays)
• Integration of other types of information (expression, localization or genetic studies)
• Result: dynamic biologically relevant interaction subnetworks
Curr Opin Chem Biol. 2006 Dec;10(6):551-8.

Bottom-up: EGFR-GAB1-ERK/Akt network
EGFR signaling network model is constructed based on the reaction stoichiometry and kinetic constants.
J Biol Chem. 2006 281:19925-38
The model allows predictions of temporal patterns of cellular responses to EGF under diverse perturbations (e.g., EGF doses):
• The dynamics of GAB1 tyr-phosphorylation is controlled by positive GAB1-PI3K and negative MAPK-GAB1 feedbacks.
• The essential function of GAB1 is to enhance PI3K/Akt activation and extend the duration of Ras/MAPK signaling.
• GAB1 plays a critical role in cell proliferation and tumorigenesis by amplifying positive interactions between survival and mitogenic pathways.

Gene regulatory networks (GRNs)
WIRED: Systems biology looks at the connections between components in cells.
Essential elements of the role of Dorsal in establishing dorsoventral polarity in Drosophila embryonic development
Reprod Toxicol. 19:281-90, 2005

Modeling of the main modules of cell-cycle progression
Three functional units:
• Start function: onset of S-phase
• Cyclin cascades (C1, C2, C3)
• End function: onset of mitosis to cell division
Chembiochem 5:1322-33, 2004

Challenges to Systems Biology
• A complete characterization of an organism (molecular constituents, interactions, cell function)
• Spatial-temporal molecular characterization of a cell
• A thorough systems analysis of “molecular response” of a cell to external/internal perturbations
• Information must be integrated into mathematical models to enable knowledge testing by formulating hypotheses and discovery of new biological mechanisms…

Cellular Maps?
signaling, metabolism, gene regulation …
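
Illustration: a toy bottom-up kinetic model
The bottom-up slides above describe models built from reaction stoichiometry and kinetic constants. The sketch below shows that idea at its smallest scale. It is not the EGFR-GAB1 model from the cited paper. It assumes a single hypothetical reversible phosphorylation reaction (S <-> Sp) with mass-action kinetics, and the rate constants, function names and the forward Euler integrator are all illustrative choices.

```python
# Toy bottom-up model (hypothetical reaction and constants, for illustration only).
# Reaction: S <-> Sp, with total substrate conserved: [S] + [Sp] = total.
# Rate law (mass action): d[Sp]/dt = k_phos * [S] - k_dephos * [Sp]


def simulate_phosphorylation(k_phos, k_dephos, total=1.0, dt=0.001, t_end=20.0):
    """Integrate the rate law with forward Euler, starting from [Sp] = 0.

    Returns a list of (time, [Sp]) pairs, including the initial point.
    """
    sp = 0.0
    trajectory = [(0.0, sp)]
    steps = int(round(t_end / dt))
    for i in range(1, steps + 1):
        s = total - sp  # conservation of total substrate
        sp += dt * (k_phos * s - k_dephos * sp)
        trajectory.append((i * dt, sp))
    return trajectory


def steady_state_fraction(k_phos, k_dephos):
    """Analytical steady state of [Sp] / total for this one-reaction model."""
    return k_phos / (k_phos + k_dephos)


if __name__ == "__main__":
    # Perturb the (hypothetical) kinase rate constant and compare the
    # simulated end point with the analytical steady state.
    for k_phos in (0.5, 1.0, 2.0):
        final_time, final_sp = simulate_phosphorylation(k_phos, k_dephos=1.0)[-1]
        expected = steady_state_fraction(k_phos, 1.0)
        print(f"k_phos={k_phos}: simulated [Sp]={final_sp:.4f}, analytical={expected:.4f}")
```

Changing k_phos plays the role of a perturbation (for example, a different stimulus dose): the model predicts a different time course and steady state, which could then be compared with measurements and used to refine the model.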