Network Biology Data
Biological, Conceptual and Computational Issues around Network, System, and Pathway Data
The Abstract and The Concrete

Topic Outline
• Lessons from the Genome Program and abstract ideas for transforming data into information when looking at systems data.
• Two examples of concrete tools (ready for use):
  - WebGestalt (for large sets of genes)
  - Ingenuity (for networks)
• A concrete thing: Bioinformatics Resource Center (under development)
• Other tools under development

Human Genome Project (HGP): Genome-encoded “parts list” as data integrator
Past Lessons and Future Directions in Data…
• Common data elements of genes and gene products (transcripts and proteins)
• Enabling integration and comparison of data in NEW ways…
• Individualized genotype data within populations
• Genome data; phenotype and system data
• GeneKeyDB and related work as an integrative foundation that can help merge with other data.

HGP highlighted some ways to succeed or fail with large data sets
• Are the lessons learned applicable to systems biology expression, proteomics, and genetic data sets? Yes.
• But are some new approaches needed to understand SYSTEM data? Yes.

Biggest Lesson: A biodata item has 2 questions attached to it… (Mayr)
• A datum… How? Why?
• HGP showed the importance of the “why” questions in thinking about and organizing data.
• Genome data; other genotype, phenotype, and system data

HGP results and Future Issues for new data…
Genotype + Environment + DEVELOPMENT ==> Phenotype
1) Astounding results: importance of network thinking in development and physiology for data to explain phenotype (e.g. PAX6)
2) Some relevance from HGP data approaches, but… need new bioinformatics tools for network data and thinking…
• Δ data in cellular signaling networks
• Δ data in regulatory networks
• Δ data in protein coding

A way of thinking about data…
• Δ data in cellular signaling networks
Bioinformatics: Finding the (genotypic, environmental data) difference that makes the (phenotypic data) difference.
(Many differences that make an interesting difference are NOT at protein coding, but at complex networks.)
• Δ data in regulatory networks
• Δ data in protein coding

What is a “Network” way of viewing data…
Nodes or Vertices may be
• Genes
• Gene products
• Hormones, signals
• Metabolites
• Publications
• Functional Sequence Elements
Edges or Lines may be
• Undirected vs. directed
• Weighted vs. unweighted
Could be…
• Co-expression networks
• Gene regulatory networks
• Cell-cell communication and signal transduction networks
• Phylogenetic relationships among genes, species, networks: orthology, paralogy, etc. (trees, clades, etc.)
• Gene Ontology or other Directed Acyclic Graphs
A biological network can be expressed and manipulated in terms of “graph theory.” Combinatorial algorithms are needed to analyze graphs.
Example edge weights in the figure: + 0.9, + 1.7, + 1.2
e.g. Alon U. 2003. Science 301: 1866; Barabasi. Linked. 2003. Plume Books; Barabasi AL, Oltvai ZN. 2004. Nat. Rev. Genetics 5: 101

What is a “Network” way of viewing data…
• Edges may reflect experimental correlation (can be undirected) vs. mechanistic & directed relationships.
• Tightly connected modules might be found…
• A module might be loosely analogous to a protein sequence module that is conserved, duplicated, and diverged.
• Might see similarity across different tissue, species, etc.

Data Storage & Collaborative Bioinformatics
• Existing Knowledge
• Large Molecular data sets
• Phenotype Data
• GeneKeyDB
• Microarray data, proteome, etc.
• MuTrack
• Genetic Data
• WebQTL (Williams et al., UTHSC)
• Gene-centered data integration (via GeneKeyDB, BioFoundation) and comparative, Boolean, and other operations on gene sets & networks. WebGestalt and Ingenuity are two examples.
• Need to collaborate, integrate, and COMPARE to find differences in biological NETWORKS.

Collaborative, Integrative, and Comparative Bioinformatics
• Integrative Bioinformatics: Genotype & Phenotype Data Sets
• Comparative Bioinformatics
• Data Visualization & Data Mining & Stats
• Comparative Cladistic Phylogenetic Analysis
• Graph Algorithms
• Sequence and Network Modularity
• Network Analysis
• CS, Stats
• Network modules: Duplicated, Diverged, Converged

WebGestalt
Web-based Gene Set Analysis Toolkit
http://bioinfo.vanderbilt.edu/webgestalt
Bing Zhang
Can upload gene sets based on
1) IDs (e.g. affy, locus link, protein IDs from chip, proteome, etc.)
2) Genome Location
Or…
3) Gene Ontology (common biological process, molecular function, cellular location)

Manipulate data, as set of genes or gene products
RNA expression, proteome, genomics, statistical genetics, etc. all produce lists of genes that may function in a network.

1 of 3 things to do
Boolean operations on multiple sets or retrieving orthologs.

2 of 3 things to do
Retrieve Data and other IDs

3rd thing to do
“Unusual” Properties across set
• e.g. What GO (biological processes, molecular functions, and cellular locations) are in the set? Are there any that seem to occur more than expected…
• Co-occurrence of genes and publications (GRIF)
• Protein Domains in set
• Chromosome locations in set…

Pathways in set (1)

Pathways in set (2)

Ingenuity
A commercial tool for manipulating graphs (networks).
VU License http://bioinfo.vanderbilt.edu/wiki/Ingenuity
(Also some open source tools: Cytoscape, GeNetViz, etc.)
Use of the commercial tool Ingenuity by Dr. N. Deanne and Dr. Beauchamp

Pathways (3)

Bioinformatics Resource Center
Developing a Bioinformatics Resource Center (BRC) that will consist of:
• Training infrastructure and applied workshops. Support faculty using existing tools and databases (CaBIG, custom statistical packages, NCBI genomics, imaging, molecular structure resources).
• Collaborative IT: Establish accessible databases in shared cores and support faculty using these resources. …
• Integrative IT: Web sites that integrate information from disparate data sets.
• Comparative IT: Systems biology: comparing data across multiple platforms to identify new patterns: tissues and cells, molecular pathways, model organisms, toxins, etc.
(taken from VUMC Strategic Plan)

Other systems…
Construction projects that can be further formed by your needs…
• CollabCore and Lab Blogs
• Genepedia, GeneKeyDB, BioFoundation
• Extensions to WebGestalt
• TFCAT, GeneCAT, CladeCAT, Pazar

Acknowledgments
Bing Zhang, Stefan Kirov, Leslie Galloway, Barbara Jackson, Betty Lou Alspaugh, Oakley Crawford, Suzanne Baktash, Xinxia Peng, Harold Shanafield, Sam Wang, Adam Tebbe, Shawn Ericson, Jeff Horner

A few collaborators…
• Bonnie LaFleur
• Shawn Levy
• Phil Dexheimer
• Michael Langston (CS collaborator)
• Wyeth Wasserman
• Dan Goldowitz and the TMGC
• Rob Williams et al. (WebQTL, etc.)
• Erich Baker
• Dan Beauchamp
• Natasha Deanne
• Chad Johnson
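
Worked sketch: gene set operations
The WebGestalt slides describe Boolean operations on multiple gene sets and asking whether a GO category occurs in a set more often than expected. The Python sketch below illustrates both ideas on toy data. It is not WebGestalt's implementation: the gene IDs, the set names, and the function `over_representation` are hypothetical, and the upper-tail hypergeometric test is assumed here as one common way to quantify "more than expected".

```python
from math import comb

# Hypothetical gene IDs for illustration only; real sets would come from
# uploaded ID lists (e.g. affy or locus link IDs).
background = {f"g{i}" for i in range(1, 21)}        # all genes assayed
expression_hits = {"g1", "g2", "g3", "g4", "g5", "g6"}
qtl_hits = {"g4", "g5", "g6", "g7", "g8", "g9"}
go_category = {"g1", "g2", "g3", "g4", "g15"}       # hypothetical GO term

# Boolean operations on multiple gene sets.
shared = expression_hits & qtl_hits                 # in both sets
either = expression_hits | qtl_hits                 # in at least one set
expression_only = expression_hits - qtl_hits        # in the first set only


def over_representation(gene_set, category, universe):
    """Return (overlap, p) where p = P(X >= overlap) under a
    hypergeometric model: drawing len(gene_set) genes at random
    from the universe, how often would at least this many fall
    in the category?"""
    universe = set(universe)
    gene_set = set(gene_set) & universe
    category = set(category) & universe
    n_universe = len(universe)
    n_category = len(category)
    n_set = len(gene_set)
    overlap = len(gene_set & category)
    total = comb(n_universe, n_set)
    upper_tail = sum(
        comb(n_category, i) * comb(n_universe - n_category, n_set - i)
        for i in range(overlap, min(n_category, n_set) + 1)
    )
    return overlap, upper_tail / total


overlap, p_value = over_representation(expression_hits, go_category, background)

print("shared genes:", sorted(shared))
print("genes in either set:", len(either))
print("category overlap:", overlap)
print("upper-tail p-value:", round(p_value, 4))
```

In practice many categories are tested at once, so the p-values would need a multiple-testing correction before any category is called over-represented.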