Evidence networks for the analysis of biological systems

Rainer Breitling
IBLS – Molecular Plant Science group
Bioinformatics Research Centre
University of Glasgow, Scotland, UK

Background
Datasets and evidence networks in post-genomic biology

Genomics
Fully sequenced genomes (1995-2004):
• 18 archaea
• 163 bacteria
• 3 protozoa
• 24 yeast species and fungi
• 2 plants (Arabidopsis, rice)
• 2 insects (flies, honey bee)
• 2 worms (C. elegans, C. briggsae)
• 3 fish (fugu, puffer, zebrafish)
• chicken, cow, dog, mouse, rat, chimp
• human
lots of “lists” of genes

Transcriptomics
• microarrays measure gene expression levels (mRNA concentrations)
• relative or absolute values
• in organisms, tissues, cells
• produce gene lists (e.g., which genes are up-regulated by a disease, by drug treatment, in a certain tissue)

Proteomics
• 2D gels, liquid chromatography, and mass spectrometry measure protein concentrations
• in tissues, cells, organelles
• detect chemical modifications and processing of proteins
• produces lists of protein variants that are different among conditions

Metabolomics
• chromatography and mass spectrometry measure metabolite concentrations
• in tissues, cells, body fluids, cell culture medium
• produces lists of affected metabolites

Evidence networks
• relate items (genes, proteins, metabolites) that “have something to do with each other”
• relationship is based on objective evidence
• represented as bipartite graphs
  – two classes of nodes: items and evidence
  – automated analysis of results possible
  – intuitive visualization and links to literature

Types of evidence networks
• Relationship can be based on
  – physical neighborhood
  – phyletic pattern similarity
  – expressional correlation
  – biophysical similarity
  – chemical transformation
  – functional co-operation
  – literature co-citations

Example: phyletic pattern similarity
      A O M P K Z Y Q V D R L B C E F G H S N U J X I T W
phy:  a o m p k z y - - d - l - - - - - - - - - - - i t -
22    aompkzy--d-l-----------it-    NtpA    [C]    H+-ATPase subunit A
17    aompkzy--d-l-----------it-    NtpB    [C]    H+-ATPase subunit B
17    aompkzy--d-l-----------it-    NtpD    [C]    H+-ATPase subunit D
18    aompkzy--d-l-----------it-    NtpI    [C]    H+-ATPase subunit I

What is the big picture?
Graph-based iterative Group Analysis for the automated interpretation of biological datasets
lists + graphs = understanding

What does this list mean?
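The phyletic-pattern example above can be turned into a small bipartite evidence network. The following is a minimal sketch, assuming that "similarity" means an identical pattern string; the function name and the `hypoX` item are hypothetical and not part of the slides.

```python
from collections import defaultdict

def bipartite_evidence_network(patterns):
    """Return (item, evidence) edges of a bipartite evidence network.

    `patterns` maps an item name to its phyletic pattern string.
    Items that share an identical pattern are linked to one evidence node.
    Exact identity is an assumption made for this sketch.
    """
    by_pattern = defaultdict(list)
    for item, pattern in patterns.items():
        by_pattern[pattern].append(item)

    edges = []
    for pattern, items in by_pattern.items():
        if len(items) < 2:
            continue  # a pattern seen only once relates nothing
        for item in sorted(items):
            edges.append((item, "phy:" + pattern))
    return edges

# Patterns for the four H+-ATPase subunits are taken from the slide.
patterns = {
    "NtpA": "aompkzy--d-l-----------it-",
    "NtpB": "aompkzy--d-l-----------it-",
    "NtpD": "aompkzy--d-l-----------it-",
    "NtpI": "aompkzy--d-l-----------it-",
    "hypoX": "a" + "-" * 25,  # hypothetical item with a different pattern
}
edges = bipartite_evidence_network(patterns)
for item, evidence in edges:
    print(item, "--", evidence)
```

The two node classes stay separate: items appear only on the left of each edge and evidence nodes only on the right, which is what makes the graph bipartite.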
Rank  Fold-Change  Gene Symbol  Gene Title
1     26.45        TNFAIP6      tumor necrosis factor, alpha-induced protein 6
2     25.79        THBS1        thrombospondin 1
3     23.08        SERPINE2     serine (or cysteine) proteinase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 2
4     21.5         PTX3         pentaxin-related gene, rapidly induced by IL-1 beta
5     18.82        THBS1        thrombospondin 1
6     16.68        CXCL10       chemokine (C-X-C motif) ligand 10
7     18.23        CCL4         chemokine (C-C motif) ligand 4
8     14.85        SOD2         superoxide dismutase 2, mitochondrial
9     13.62        IL1B         interleukin 1, beta
10    11.53        CCL20        chemokine (C-C motif) ligand 20
11    11.82        CCL3         chemokine (C-C motif) ligand 3
12    11.27        SOD2         superoxide dismutase 2, mitochondrial
13    10.89        GCH1         GTP cyclohydrolase 1 (dopa-responsive dystonia)
14    10.73        IL8          interleukin 8
15    9.98         ICAM1        intercellular adhesion molecule 1 (CD54), human rhinovirus receptor
16    9.97         SLC2A6       solute carrier family 2 (facilitated glucose transporter), member 6
17    8.36         BCL2A1       BCL2-related protein A1
18    7.33         TNFAIP2      tumor necrosis factor, alpha-induced protein 2
19    6.97         SERPINB2     serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 2
20    6.69         MAFB         v-maf musculoaponeurotic fibrosarcoma oncogene homolog B (avian)

iterative Group Analysis (iGA)
iGA uses a simple hypergeometric distribution to obtain p-values
Breitling et al., BMC Bioinformatics, 2004, 5:34

Graph-based iGA
Breitling et al., BMC Bioinformatics, 2004, 5:100
• Step 1: build the network
• Step 2: assign ranks to genes
• Step 3: find local minima (figure labels: p = 1/8 = 0.125, p = 6/8 = 0.75, p = 2/8 = 0.25)
• Step 4: extend subgraph from minima (figure labels: p=0.014, p=0.018, p=0.125, p=1)
• Step 5: select p-value minimum (figure labels: p=0.014, p=0.018, p=0.125, p=1)

Advantages of GiGA
• fast, unbiased and comprehensive analysis
• assignment of statistical significance values to interpretation
• detection of significant changes even if data are too noisy to reliably detect changed genes
• statistically meaningful interpretation already without replication experiments
• detection of patterns even for small absolute changes
• flexible use of annotations + intuitive visualization

Example 1: Microarrays
Gene expression changes during the yeast diauxic shift

Yeast diauxic shift study
DeRisi et al. (1997) Science 278: 680-6

Yeast diauxic shift study
UP, time points 0h, 9.5h, 11.5h, 13.5h, 15.5h, 18.5h, 20.5h (GO ID - class, in slide order):
6144 - purine base metabolism
6099 - tricarboxylic acid cycle
6099 - tricarboxylic acid cycle
3773 - heat shock protein activity
6099 - tricarboxylic acid cycle
9277 - cell wall (sensu Fungi)
3773 - heat shock protein activity
5749 - respiratory chain complex II (sensu Eukarya)
6099 - tricarboxylic acid cycle
3773 - heat shock protein activity
297 - spermine transporter activity
6950 - response to stress
6121 - oxidative phosphorylation, succinate to ubiquinone
5977 - glycogen metabolism
5749 - respiratory chain complex II (sensu Eukarya)
15846 - polyamine transport
297 - spermine transporter activity
8177 - succinate dehydrogenase (ubiquinone) activity
6950 - response to stress
6121 - oxidative phosphorylation, succinate to ubiquinone
4373 - glycogen (starch) synthase activity
3773 - heat shock protein activity
4373 - glycogen (starch) synthase activity
8177 - succinate dehydrogenase (ubiquinone) activity
15846 - polyamine transport
4373 - glycogen (starch) synthase activity
4129 - cytochrome c oxidase activity
6537 - glutamate biosynthesis
5353 - fructose transporter activity
7039 - vacuolar protein catabolism
5751 - respiratory chain complex IV (sensu Eukarya)
6097 - glyoxylate cycle
15578 - mannose transporter activity
6950 - response to stress
5749 - respiratory chain complex II (sensu Eukarya)
5750 - respiratory chain complex III (sensu Eukarya)
7039 - vacuolar protein catabolism
4129 - cytochrome c oxidase activity
6121 - oxidative phosphorylation, succinate to ubiquinone
9060 - aerobic respiration
8645 - hexose transport
5751 - respiratory chain complex IV (sensu Eukarya)
8177 - succinate dehydrogenase (ubiquinone) activity
4129 - cytochrome c oxidase activity

GiGA results – diauxic shift
Down-regulated genes using GeneOntology-based network

locus     gene description ("anchor gene")                          p-value    members  max. rank
YHL015W   ribosomal protein S20                                     5.87E-86   39       48
YMR217W   GMP synthase                                              3.38E-13   9        172
YDR144C   aspartyl protease|related to Yap3p                        4.06E-08   6        242
YNL065W   multidrug resistance transporter                          4.02E-05   3        141
YLR062C                                                             6.41E-05   4        367
YGL225W   May regulate Golgi function and glycosylation in Golgi    1.12E-04   4        422
YPR074C   transketolase 1                                           1.44E-04   4        449

total genes measured in network: 4087.
Labelled network regions: small ribosomal subunit, large ribosomal subunit, nucleolar rRNA processing, translational elongation

GiGA case study – diauxic shift
Up-regulated genes using metabolic network

locus     gene description                                              p-value    members  max. rank
YER065C   isocitrate lyase                                              4.96E-53   39       54
YGR088W   catalase T                                                    3.09E-10   11       106
YFR015C   glycogen synthase (UDP-glucose-starch glucosyltransferase)    2.08E-04   3        45
YJR073C   unsaturated phospholipid N-methyltransferase                  3.85E-04   5        156
YDR001C   neutral trehalase                                             5.01E-04   3        60
YCR014C   DNA polymerase IV                                             5.44E-04   17       481
YIR038C   glutathione transferase                                       8.64E-04   5        183

total genes measured in network: 744.
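The p-value labels on the Graph-based iGA slides (1/8, 6/8, 2/8, then 0.014, 0.018, 0.125, 1) are consistent with the hypergeometric probability that all members of a subgraph fall within the top ranks of the list. The sketch below is one reading of the five steps under that assumption, not the published implementation. The function names and the 8-gene path graph are hypothetical and chosen only for illustration.

```python
from math import comb

def subgraph_p_value(total_genes, members, max_rank):
    """Chance that `members` randomly placed genes all rank <= max_rank."""
    return comb(max_rank, members) / comb(total_genes, members)

def local_minima(graph, rank):
    """Step 3: nodes that rank better (lower) than all of their neighbors."""
    return [n for n in graph if all(rank[n] < rank[m] for m in graph[n])]

def extend_from(seed, graph, rank, total_genes):
    """Steps 4-5: grow a subgraph from `seed`, keep the p-value minimum."""
    members = {seed}
    best_p = subgraph_p_value(total_genes, 1, rank[seed])
    best_members = {seed}
    while True:
        frontier = {m for n in members for m in graph[n]} - members
        if not frontier:
            break
        # Admit the best-ranked neighbor, then every node reachable
        # through it whose rank does not exceed that neighbor's rank.
        nxt = min(frontier, key=rank.get)
        limit = rank[nxt]
        stack = [nxt]
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(m for m in graph[node]
                         if m not in members and rank[m] <= limit)
        max_rank = max(rank[n] for n in members)
        p = subgraph_p_value(total_genes, len(members), max_rank)
        if p < best_p:
            best_p, best_members = p, set(members)
    return best_p, best_members

# Hypothetical network: a path g1-g2-g3-g4-g7-g5-g6-g8, rank = gene number.
order = ["g1", "g2", "g3", "g4", "g7", "g5", "g6", "g8"]
graph = {g: [] for g in order}
for a, b in zip(order, order[1:]):
    graph[a].append(b)
    graph[b].append(a)
rank = {g: int(g[1:]) for g in order}

minima = sorted(local_minima(graph, rank))
best_p, best_members = extend_from("g1", graph, rank, total_genes=len(rank))
print(minima, round(best_p, 3), sorted(best_members))
```

With 8 genes, a single gene at rank 1, 6, or 2 gives 1/8, 6/8, and 2/8, matching the step 3 labels. Three members with maximum rank 3 give 1/56 (about 0.018) and four members with maximum rank 4 give 1/70 (about 0.014); that these are the configurations drawn on the slides is an assumption.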
Labelled network regions: respiratory chain complex II, glyoxylate cycle, citrate (TCA) cycle, oxidative phosphorylation (complex V), respiratory chain complex III, respiratory chain complex IV

Example 2: Metabolomics
Changes in metabolic profiles in drug-treated trypanosomes

GiGA applied to metabolomics data
• Challenge: No annotation available
• Solution: Build evidence network based on hypothetical reactions between observed masses (= mass differences)

Metabolite tree of mass 257.1028 (glycerylphosphorylcholine)
• shown at 6, 4, and 2 generations
• colors indicate changes of metabolite signals compared to untreated samples after 60 min pentamidine (red = down, green = up)

GiGA metabolite trees for one experimental example
Choline tree found by GiGA (most significant subgraph, p < 10^-13) extracted from

Summary
• post-genomic technologies produce “lists”
• neighborhood relationships yield “evidence networks” (graphs)
• lists + graphs = biological insights
• GiGA graph analysis highlights and connects relevant areas in the “evidence network”

Acknowledgements
• Pawel Herzyk – Sir Henry Wellcome Functional Genomics Facility
• Anna Amtmann & Patrick Armengaud – IBLS Molecular Plant Science group
• Mike Barrett – IBLS Parasitology Research group
• FGF academic users: Wilhelmina Behan, Simone Boldt, Anna Casburn-Jones, Gillian Douce, Paul Everest, Michael Farthing, Heather Johnston, Walter Kolch, Peter O'Shaughnessy, Susan Pyne, Rosemary Smith, Hawys Williams

Contact
Rainer Breitling
Bioinformatics Research Centre
Davidson Building A416
University of Glasgow, Scotland, UK
[email protected]
http://www.brc.dcs.gla.ac.uk/~rb106x
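For the metabolomics example (Example 2), an evidence network built from mass differences could be sketched as follows. This is a minimal illustration, not the method used in the study: the transformation table, the tolerance, and every observed mass except 257.1028 are assumptions. The mass differences are standard monoisotopic values.

```python
from itertools import combinations

# Illustrative transformations (monoisotopic mass differences in Da).
# The list actually used in the study is not given on the slides.
TRANSFORMATIONS = {
    "H2": 2.015650,
    "CH2": 14.015650,
    "O": 15.994915,
    "H2O": 18.010565,
    "HPO3": 79.966331,
}

def mass_difference_edges(masses, tolerance=0.002):
    """Link two masses when their difference matches a transformation.

    Returns (lower_mass, higher_mass, transformation_name) tuples.
    `tolerance` is an absolute window in Da (assumed value).
    """
    edges = []
    for low, high in combinations(sorted(masses), 2):
        diff = high - low
        for name, delta in TRANSFORMATIONS.items():
            if abs(diff - delta) <= tolerance:
                edges.append((low, high, name))
    return edges

# 257.1028 is from the slide; the other masses are hypothetical.
observed = [257.1028, 239.0922, 271.1185, 300.0000]
edges = mass_difference_edges(observed)
for low, high, name in edges:
    print(f"{low:.4f} -> {high:.4f} via {name}")
```

Each matched pair becomes an edge, so ranked signal changes can then be analysed on this network in the same way as gene ranks on an annotation-based network.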