CONFIDENTIAL

Building and using protein interaction networks: industry perspective
Systems Biology for Drug Discovery
Andrej Bugrim
GeneGo, Inc.
Copyright GeneGo 2000-2003

Topics
• Annotation process and collecting network content for industrial-type applications
• Biological and disease ontologies: how to improve and use them in functional analysis
• Tools: utilizing network data in pharmaceutical R&D

Multi-level understanding of human biology
• Level of phenotype
• Level of cell process/network
• Level of protein
• Relations between levels: causative relations, mechanistic relations

Disease-centered knowledge base in MetaMiner (Oncology example)
GG annotation team:
• Disease group
• Network group
• Specialty group
• Chemistry group
Annotated content:
• Causative disease associations: DNA, RNA, protein levels
• Protein-protein, protein-DNA, protein-RNA interactions
• Biomarkers
• Ligand-receptor interactions: drugs, leads, hits
Compare:
• General BC schema
• Causative BC models
• BC-perturbed cell processes
• Other cancers chosen by Consortium

Content

Three interaction domains in MetaCore
Ligands (metabolites, peptides, xenobiotics) and membrane receptors:
• 1,600 drugs w/targets
• 4,100 endogenous metabolites
• >21,000 ligand-receptor interactions
• 850 GPCRs and other membrane receptors
• 110 nuclear hormone receptors
Signal transduction (G proteins, secondary messengers, kinases, phosphatases) and transcription factors:
• 172K manually curated physical signaling interactions
• 538 canonical maps
• 42,000 13-step canonical signal transduction pathways
• 924 human transcription factors
• 6,000 target genes
Core effect: metabolic pathways (metabolites):
• 11,300 metabolic reactions
• 116 fine metabolic maps
• 4,100 endogenous metabolites

MetaBase Content Overview
- Database
  • Chemical compounds: 580,000
  • Drugs: 8,590
  • Chemical reactions: 35,600
  • Metabolic networks: 251
- Network
  • Proteins + genes: 13,402
  • Transcription factors: 924
  • Chemical compounds: 26,000
  • Drugs: 2,740
  • Endogenous compounds: 4,100
  • Proteins linked to drugs: 2,711
  • Reactions: 5,330
  • Small molecule ligands for human receptors: 3,510
  • Blockers for ion channels: 629
  • PubMed journals: 3,100
  • PubMed articles: 81,400
  • Total number of interactions: 177,000
- Content
  • GeneGo regulatory networks: 120
  • GeneGo disease networks: 88
  • Maps: 538
  • Regulatory maps: 325
  • Metabolic maps: 116
  • Traditional metabolic maps (EC): 97
  • Diseases: 4,920

MetaBase content by type (database)
[Chart. Labels and values are listed in extraction order; where two labels share a line, the pairing of values to labels is uncertain.]
• Chemical compounds: 580,000
• Reaction substrates with kinetic data: 3,580
• Compounds with structures / Compounds in network: 27,418 / 25,662
• Metabolic reactions; Genes (human: 35,600 38,700), total: 137,500
• Human proteins / Compounds in reactions: 14,570 / 15,700
• Drug metabolites: 3,422
• Drugs: 8,590
• Endogenous compounds: 4,100

Network interactions
Manually curated interactions (172,787):
• Signalling interactions: 137,297 (79%)
  - Protein-protein: 87,675 (51%)
  - Small molecule-protein: 42,383 (26%)
  - Y2H "Interactome": 2,370 (1%)
  - With virus proteins: 335 (0%)
  - ChIP-chip: 980 (1%)
  - With microRNA: 1,620 (1%)
  - Logical relations: 1,934 (1%)
• Metabolic reactions: 35,490 (21%)
Protein-protein interactions:
• Activation/inhibition via binding: 43,079 (52%)
• Covalent modification: 5,967 (8%)
• Unspecified regulation: 3,990 (5%)
• Regulation of transcription: 15,725 (21%)
• Influence on expression: 10,120 (14%)
Small molecule-protein interactions:
• Binding to receptors: 14,497 (34%)
• Regulation of other proteins: 6,218 (15%)
• Regulation of enzymes: 8,898 (21%)
• Binding to kinases: 6,984 (16%)
• Regulation of transporters: 5,786 (14%)
All interactions are taken from articles indexed in PubMed: 3,100 PubMed journals, 81,400 PubMed articles.

Types of interactions in network
• Effects: activation, inhibition, unspecified
• Interaction classes: direct interaction, indirect interaction
• Mechanisms (across both classes): phosphorylation, dephosphorylation, other type of covalent modification, binding, transport, cleavage, transcription regulation, transformation, catalysis, competition, influence on expression, unspecified

Distribution of interactions by mechanism
• Binding: 48%
• Transcription regulation: 15%
• Influence on expression: 12%
• Catalysis: 8%
• Unspecified: 6.4%
• Phosphorylation: 4.1%
• Transport: 2%
• Cleavage: 2%
• Covalent modification: 1%
• Transformation: 1%
• Dephosphorylation: 0.5%
• Competition: 0.1%

Network objects
Total number of nodes: 40,229
• Proteins: 13,406
  - Enzymes: 2,910
  - Kinases: 626
  - Phosphatases: 137
  - Proteases: 352
  - Transcription factors: 924
  - Membrane receptors: 764
  - Receptor ligands: 640
  - Nuclear hormone receptors: 110
  - Transporters: 804
  - Ion channels: 217
  - Other: 5,922
• Chemical compounds: 25,662
  - Xenobiotic compounds: 15,955
  - Metabolites of xenobiotics: 1,924
  - Drug metabolites: 1,032
  - Drugs: 2,741
  - Endogenous compounds: 4,010
• Metabolic reactions: 5,353

Proteins: distribution by tissue & localization
[Figure: two bar charts. "Proteins: distribution by tissue" covers adrenal gland, brain, colon, heart, kidney, liver, lung, mammary gland, marrow, ovary, pancreas, placenta, prostate, retina, salivary gland, skin, spinal cord, spleen, testes, thymus, thyroid, tonsil, trachea, upper GI tract, uteri, "common for all these tissues", and "unspecified". "Proteins: distribution by cell compartment" covers nucleus, integral to plasma membrane, cytoplasm, plasma membrane, membrane fraction, mitochondrion, extracellular region, extracellular space, integral to membrane, soluble fraction, membrane, cytosol, endoplasmic reticulum, intracellular, Golgi apparatus, proteinaceous extracellular matrix, cytoskeleton, actin cytoskeleton, and lysosome. Per-bar values could not be matched reliably to labels in the extracted text.]

Molecular functions in Database
• Binding: 8,503 (46%)
• Catalytic activity: 4,086 (23%)
• Signal transducer activity: 2,535 (13%)
• Transcription regulator activity: 1,396 (7%)
• Transporter activity: 1,078 (6%)
• Enzyme regulator activity: 599 (3%)
• Structural molecule activity: 459 (2%)
• Motor activity: 77 (0%)
• Translation regulator activity: 75 (0%)
• Antioxidant activity: 51 (0%)
• Chaperone regulator activity: 11 (0%)
• Chemoattractant activity: 8 (0%)
• Chemorepellant activity: 3 (0%)

Endogenous compounds (4,100 total)
• 3,070 endogenous compounds involved in metabolic reactions: 6,819 reactions with endogenous compounds only
• 751 endogenous ligands for 498 receptors, with 2,455 interactions
• 4,000 (98%) of endogenous compounds in network
• 15,962 network interactions with endogenous metabolites
• 3,600 compounds with structures and brutto-formulas (the other 700 are “generic”: they contain acyl-, alkyl- and other variable groups)
Endogenous compounds by origin:
• Lipids: 43%
• Other: 19%
• Carbohydrates: 15%
• Peptides: 10%
• Vitamins/Cofactors: 6%
• Fatty acids: 5%
• Steroids: 4%
• Nucleotides: 2%

Network and pathway statistics in GeneGo
• >40,000 nodes
• ~177,000 edges
• Average node degree: 3.77
• 241 million shortest pathways
• Average shortest pathway length: 5.3811
• 42,000 13-step canonical signal transduction pathways
• 200 canonical metabolic pathways: major metabolic fluxes like glycolysis or TCA
• 72,000 pathways on metabolic maps: pathways analogous to KEGG (KEGG has 42,500)
[Diagram: Enzyme1, reaction1, metabolite, reaction2, Enzyme2]

Pathways in regulatory network
• Start: TMR (transmembrane receptor)
• Through: kinases
• To: TF (transcription factor)
• End: target genes
[Diagram: three numbered example pathways with nodes A, B, C, D, a, ab and edge labels +P, B, Tr, Z.]

Ontologies
Knowledge base (ontologies)
Movie analogy:
• By genre: Drama, Action, Romance, Horror, Foreign
• By director: Lynch, Tarantino, Leone, Stone, Antonioni
• By actor: Pitt, Nicholson, Depp, Redford, Damon
• By year: 2007, 2006, 2005, 2004, 2003
• How do you compare “action” movies vs. Tarantino movies vs. 2003 movies?
• They cannot be compared, because they belong to different categories
Mixed ontologies: molecular pathway, cellular process, disease, metabolic process

Multiple ontologies in MetaDiscovery Platform: multi-dimensional knowledge base on human biology

Enrichment in GO and GeneGo processes
GO processes:
• Resolution: list of proteins
• No connections between proteins
• No signaling/effect within process
GeneGo process networks:
• Resolution: interactions between proteins
• Connections between all proteins in folder
• Clear signaling path, effect within process
Data set:
• 4 samples from 4 patients
• Disease/norm from same patients
• Affy U133A arrays

Inflammation
• Genes from GO process “Inflammatory response”: 231 (152 in networks, 79 not in networks)
• Genes from GO processes “Inflammatory response” and “Immune response”: 446 (247 in networks, 199 not in networks)
• Genes from GO process “Immune response”: 613 (345 in networks, 268 not in networks)
• Genes in 15 process networks: 1,642
• Genes added to networks: 1,297

Diseases
• 38,709 human genes total
  - Human genes linked to diseases: 6,318 (17%)
  - Human genes not linked to diseases: 32,391 (83%)
• 4,881 diseases, based on MeSH
  - Diseases linked to genes: 1,630 (34%)
  - Diseases with no gene links: 3,251 (66%)
• 21,264 unique articles, indexed in PubMed
• 6,318 genes are linked to 1,630 diseases

Disease tree: Neoplasms by Site
[Figure: disease tree.]

Drug toxicity tree
• 38 drug-induced pathological processes
• Folders from MeSH
• Folders created at GeneGo based on reviews
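As a small worked check of the Inflammation slide above, the following sketch computes what share of each GO gene set is covered by the process networks. The assignment of the in-network and not-in-network counts to the three gene sets is an interpretation of the slide text (the counts do add up to the stated totals of 231, 446 and 613), and the names `GO_GENE_SETS`, `network_coverage` and `summarize` are hypothetical, introduced only for illustration.

```python
# Counts are (in networks, not in networks), as read from the Inflammation slide.
# ASSUMPTION: which counts belong to which gene set is inferred from the totals.
GO_GENE_SETS = {
    "Inflammatory response": (152, 79),
    "Inflammatory response and Immune response": (247, 199),
    "Immune response": (345, 268),
}


def network_coverage(in_networks: int, not_in_networks: int) -> float:
    """Fraction of a gene set that is present in the process networks."""
    return in_networks / (in_networks + not_in_networks)


def summarize(gene_sets):
    """Map each gene set name to (total genes, fraction in networks)."""
    return {
        name: (inside + outside, network_coverage(inside, outside))
        for name, (inside, outside) in gene_sets.items()
    }


if __name__ == "__main__":
    for name, (total, fraction) in summarize(GO_GENE_SETS).items():
        print(f"{name}: {total} genes, {fraction:.1%} in networks")
```

Under this reading, somewhat more than half of each GO gene set is already present in the process networks.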
Gene-Disease connections in public domain and GeneGo
OMIM, GENE:
• Only genetic info (mutations, SNPs)
• No expression
• No protein activity, loc
MeSH:
• Only citations with disease names; low trust
• Only a hierarchical disease tree structure
The public domain does not have structured information about disease connectivity (by clinical classification) or about causative relations with genes and proteins.
GeneGo:
• Hierarchical structure of disease classification: 4,888 diseases
• Genes associated with diseases: 6,429
• Cited articles: 33,792

Content. Cancer maps and networks. Breast Cancer: general scheme
[Figure: map.]

Angiogenesis in tumor growth
[Figure: map.]

Fine metabolic differences between rodents and human
[Diagram comparing unique genes in human versus mouse and rat.]
• Unique genes and orthologs catalyze one reaction: 141 mouse genes, 74 rat genes (there are no human orthologs for Protein A)
• Unique genes catalyze unique reactions: 9 mouse genes, 2 rat genes
• Orthologs catalyze different reactions: 1 mouse gene, 1 rat gene

Tools

Data analysis workflow in MetaDiscovery suite
Inputs:
• Molecular bio data
• Metabolites
• Structures: sdf, MOL
• HTS, HCS
• ISIS DB
• Custom interactions data: Y2H, pull-down, co-expression, annotation
Customization modules:
• MetaLink
• PathwayEditor
• MapEditor: custom maps, networks, pathways
MetaCore/MetaDrug platform:
• Signature networks: diseases, drug response
• P-value scoring
• Ontologies: GO processes, GeneGo processes, canonical pathways, metabolic networks, diseases, toxicities
• Cross-experiment comparison: time series, multi-patient cohorts, multiple logical operations, complete report
• Network alignment: multiple algorithms, sub-network queries
Outputs:
• SBML, BioPax
• Med. chemistry: indications, toxicities, off-site effects
• Biology: biomarkers, pathway-based targets
• Modeling software: CellDesigner, Virtual Cell

MetaCore™ Platform
• Pathway editor
• Statistics for pathways, processes, networks
• Data: microarrays, SAGE, proteomics, siRNA, metabolites, custom interactions
• Logical operations module
• Network building tools
• Visualization tools
• Curated interactions from the literature
• Oracle-based database

Pathways Integration
Interactive, static maps
- 550 maps
- Signaling, regulation, metabolism, diseases
- Backbone of formalized “state of art” in the field
Networks of protein interactions
- Dynamic; built “on-the-fly”
- Exploratory tool
- Build new pathways for genes of interest

Choose direction and checkpoints within network building page
• From: histamine
• Through: histamine H1 receptor
• To: Actin

False discovery rate filter
• Threshold: 0.01 (Apply)
• Non-significant bars become semi-transparent

New customization modules
• MapEditor: custom maps synchronized with MC/MD database
  - Draw pathway maps from scratch
  - Transform gene lists into networks into pathway maps
  - Edit MetaCore’s canonical maps
  - View and score your maps within the context of canonical maps
  - Map experimental data on custom maps
• MetaLink: overlaying custom interactions
  - Import custom interactions (Y2H, co-expression, pull-down, etc.)
  - Visualize using GeneGo network building algorithms
  - Score “unknown” proteins (high IP potential) based on relevance to “benchmark” networks built from MetaCore interactions
• PathwayEditor: annotation technology transfer, at the database level
  - Custom annotation of interactions, compounds, diseases, metabolism in the framework of the internal annotation system at GeneGo
  - Use the annotation forms, workflows and QC system developed at GeneGo
  - Novel objects are imported and integrated with pre-existing data in MetaCore

Adding Localizations
• Additional localizations can be added

Your NEW map is now an interactive part of MetaCore
• Users can visualize their experimental data on the new map

Mapping interaction sets on networks
Resulting Direct Interactions network:
• Pink interactions are from the uploaded links file
• Mouse over an interaction to see the uploaded weight value
• Blue interactions are in both the links file and the MetaCore database

Algorithms

Old and new ways to analyze data
Current way of analysis: all significance calculations are done before mapping onto the network
• Full data tables
• Statistical procedures, thresholds of fold change and p-value, either in MC or in 3rd-party tools
• Sets of genes
• Connect them on the network one way or another: too many choices, no clear way to choose
New way of analysis: significance calculations follow the mapping onto the network
• Full data tables
• Apply to global network
• Statistical procedures in MC based on concurrent analysis of expression profiles and connectivity
• Sets of network modules

Samples are analyzed in pathway’s expression space

          Sample 1   Sample 2   Sample 3   Sample 4
Gene 1    1          4          3          2
Gene 2    4          2          7          6
Gene 3    2          9          3          8
Gene 4    2          5          4          2

Network signatures for compound effects
[Figure panels: Mestranol; Tamoxifen; Phenobarbital; Phenobarbital.]

Finding topologically significant nodes
• Node B, topologically significant: 4 out of 6 nodes regulated by B are differentially expressed, which is more than a random share = significant
• Node C, not topologically significant: only 1 out of 6 nodes regulated by C is differentially expressed, which could be due to a random event = not significant
• In reality the algorithm also considers nodes beyond first-degree neighbors
[Diagram legend: node A; differentially expressed genes; non-differentially expressed genes.]

Why is JAK1 significant in this dataset?
• Regulation via JAK1; feedback loops
• JAK1 provides an essential network conduit between PLAUR and many differentially expressed targets of STAT1
• Topological significance helps to find important links in pathways that do not come up on HT screens

Regulation of lipid metabolism
• Topologically significant nodes revealed by the new algorithm
• Differentially expressed genes identified by microarray and confirmed by proteomic screen

Putting it all together: network activity inference
- Identifying causal relations between putative input and output signals
- Tracking effects of molecular perturbation through activation/inhibition cascades
[Diagram labels: predicted input; experimental data: start cascade; scoring intermediary nodes; inferred activity; experimental data: terminate cascade; predicted target.]

Work in progress
• Finding patterns of significance (based on one experiment):
  - Significant neighborhoods
  - Significant receptors (by underlying cascade)
  - Significant transcription factors (by upstream cascade)
  - Significant interaction types (by distribution of expression at terminals)
• Finding common and different pathway modules (based on multiple samples):
  - Looking for “differential pathways”: modules that distinguish one group of samples from another
  - Finding common motifs in a group of pathway modules
• Inferring patterns of network activity:
  - Identifying causal relations between putative input and output signals
  - Tracking effects of molecular perturbation through activation/inhibition cascades
• Looking into mutual gene-process information and Bayesian inference of significance:
  - If gene G occurs only in process P, its up-/down-regulation is significant evidence for inferring P’s status
  - If gene G occurs in many other processes in addition to P, its up-/down-regulation is not significant evidence for inferring P’s status

Future products

MetaMiner Consortiums for 2007
• Oncology (breast cancer, 4 other cancers)
• Metabolic diseases (diabetes II, obesity, metabolic syndrome)
• CNS and neurodegenerative diseases
• Immunological and autoimmune diseases

MetaMiner consortiums: analytical platform for disease areas
Inputs:
• HTS, HCS
• Cancer consortium labs
• Cancer-relevant annotations, databases
• Active compounds: screening, analysis
MetaMiner (Oncology) platform:
• Data parsing, normalization
• Experimental data depository
• Maps for disease, processes, drug action
• Custom maps for projects
• Data analysis
Outputs:
• Biomarkers:
  - Combination of different types
  - Expression
  - Secreted proteins
  - Metabolites
  - Convergence hubs (core effectors)
• Drug targets:
  - Divergence hubs on networks
  - “Druggability” testing
  - Pathways connectivity
• Compounds scoring:
  - Indications
  - Toxicities
  - Off-site effects

MetaTox consortium. Functional descriptors
• Mapping on descriptors
• Enrichment by category
• Pathway maps
• Toxicity, process maps
• Sub-networks, modules, nodes
• Predictive models
• Indexing & scoring by tox. category
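The "Finding topologically significant nodes" slide earlier in the deck contrasts a regulator with 4 of 6 differentially expressed targets against one with 1 of 6. One common way to quantify "more than a random share" is a hypergeometric tail probability; the sketch below uses that test as a stand-in and does not claim to reproduce GeneGo's algorithm (which, per the slide, also looks beyond first-degree neighbors). The background sizes (1,000 nodes, 100 of them differentially expressed) and the function name `hypergeom_tail` are assumptions made purely for illustration.

```python
from math import comb


def hypergeom_tail(total_nodes: int, total_de: int,
                   neighbors: int, de_neighbors: int) -> float:
    """P(X >= de_neighbors) for X ~ Hypergeometric.

    X counts differentially expressed (DE) nodes among `neighbors` nodes
    drawn without replacement from `total_nodes`, of which `total_de` are DE.
    """
    denominator = comb(total_nodes, neighbors)
    upper = min(neighbors, total_de)
    numerator = sum(
        comb(total_de, k) * comb(total_nodes - total_de, neighbors - k)
        for k in range(de_neighbors, upper + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # ASSUMED background: 1,000 network nodes, 100 differentially expressed.
    N, K = 1000, 100
    p_b = hypergeom_tail(N, K, neighbors=6, de_neighbors=4)  # node B: 4 of 6
    p_c = hypergeom_tail(N, K, neighbors=6, de_neighbors=1)  # node C: 1 of 6
    print(f"node B: p = {p_b:.4g}")
    print(f"node C: p = {p_c:.4g}")
```

With this assumed background, node B's p-value falls well below a 0.01 threshold while node C's does not, which matches the qualitative verdicts on the slide. The conclusion depends on the background rate of differential expression, so the numbers here are illustrative only.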