GUS: A Functional Genomics Data Management System

Chris Stoeckert, Ph.D.
Center for Bioinformatics and Dept. of Genetics
University of Pennsylvania

ASM Conference on Functional Genomics and Bioinformatics Approaches to Infectious Disease Research
October 8, 2004
Portland, Oregon


Database Options for Integrated Functional Genomics

Requirements
    Covers genomics and functional genomics
    Active and open developer community
Options
    GUS: Genomics Unified Schema
    Chado: generic model organism database (GMOD http://www.gmod.org)


A Few GUS Web Sites

[Diagram. Sites shown: U. Penn, Sanger Institute, U. Georgia, U. Toronto, U. Chicago, Virginia Bioinformatics Institute. Other labels: Flora, Centromere Database, Phytophthora sojae genome. Architecture shown for GUS: Java Servlets; Object Layer for Data Loading; Oracle RDBMS holding the DoTS, RAD, TESS, SRES, and Core namespaces.]


GUS (Genomics Unified Schema)
http://www.gusdb.org

Namespace   Domain                     Features
DoTS        Sequence and annotation    EST clusters; gene models
RAD         Gene expression            MIAME/MAGE-OM
TESS        Gene regulation            TFBS organization
SRES        Shared resources           Ontologies
Core        Data provenance            Documentation

[Diagram. Example tasks drawn across the SRES, RAD, DoTS, and TESS namespaces: BioMaterial annotation; EST clustering and assembly; identifying shared TF binding sites; genomic alignment and comparative sequence analysis.]


Examples of GUS users

Large sequencing center
Lightly staffed genomics project
Multiple plant species: Brett Tyler, Virginia Bioinformatics Institute and collaborators
Expression-based project
CryptoDB: Kissinger Lab, University of Georgia
Data mining project
GeneDB: Pathogen Sequencing Unit at the Sanger Institute
dbDirt: Allen Okey, University of Toronto
Bioinformatics Core Facility
University of Pennsylvania Bioinformatics Core Facility


GUS Project Goals

Provide:
    A platform for broad genomics data integration
    An infrastructure system for functional genomics
Support:
    Websites with advanced query capabilities
    Research-driven queries and mining


GUS components

[Diagram. Data sources: your data, GenBank, NRDB, dbEST, SNPs, gene traps, microarrays, phenotypes, pathways, orthologs, taxonomy, GO, SO, EC, More… Components: Pipeline API; Plugins (data loaders); Data Load API; Perl Object Layer; Warehouse (Oracle or PostgreSQL); Web Development Kit; queries and analysis.]


Functional genomics with GUS

[Diagram. Labels: Expression (RAD); Proteomics; ImmunoHistChem; In Situ Hybridization; Sequence & Features; Central Dogma; Regulation (TESS); Interaction. The MIAME, MIAPE, and MISFISHIE tracks each show Study, Sample, Image Analysis, and Statistical Processing. Overall outcome: Functional Annotation of the Genome. Standards sites: www.mged.org, psidev.sf.net, www.scgap.org.]


GUS versus chado

GUS represents biology in the database tables
    Forces applications to load and retrieve data consistently
Chado represents biology in the applications
    Allows flexibility in what can be stored, but applications may not be consistent


Central dogma and sequences

[Diagram, built up over four slides.]
Slide 1: Gene Feature, RNA Feature, Protein Feature; NA Sequence, AA Sequence
Slide 2: Gene, RNA, Protein; Gene Feature, RNA Feature, Protein Feature; NA Sequence, AA Sequence
Slide 3: Gene, RNA, Protein; multiple genes (Gene 1, Gene 2) for an RNA; multiple sequences (experimental variety), genome; NA Sequence, AA Sequence
Slide 4: Gene, RNA, Protein; Gene Instance, RNA Instance, Protein Instance; Gene Feature, RNA Feature, Protein Feature; NA Sequence, AA Sequence


Obtaining and Using GUS

www.gusdb.org
More info at www.gusdb.org/documentation
Active gusdev mailing list
Relatively straightforward to install
Loading data is a struggle for new users
    Growing number of tools available
    Addressing how to use and write tools with visits
Web Development Kit (WDK) to generate web sites on GUS


Current GUS Developers

At Penn
    Steve Fischer: Project manager, WDK
    Elisabetta Manduchi: RAD project manager, RAD study annotator
    Angel Pizarro: Schema development, proteomics, MAGE export
    Mike Saffitz: DBA, web services, Postgres
    Dave Barkan: WDK, GO pipeline, Apollo interface
    Thomas Gan: WDK, genomic alignments pipeline
    John Iodice: ApiDoTS pipeline, data loading
    Li Li: OrthoMCL pipeline
    Junmin Liu: RAD websites, expression displays
    Debbie Pinney: Data loaders, Hum and MusDoTS pipeline
    Jonathan Schug: TESS, architecture and schema development
    Trish Whetzel: Data loading, RAD, schema development
    Plus the rest of the group, who contribute through various GUS-based projects
Pathogen Sequencing Unit, Sanger Institute
Kissinger Group, U. of Georgia
Terry Clark, U. of Chicago


WDKTestSite

Developed in collaboration with Adrian Tivey & Marie-Adele Rajandream (PSU, Sanger Institute)


The PlasmoDB Team

Shailesh Date, Kobby Essien, Martin Fraunholz, Bindu Gajria, Greg Grant, John Iodice, Jessie Kissinger, Philip Labo, Li Li, Jules Milgram, David Roos, Chris Stoeckert, Trish Whetzel
NIAID grant: R01 AI058515


GUS supports a wide variety of queries

Suppose you want to find all kinases in P. falciparum


Gene Report Pages Integrate Genomics and Functional Genomics


RAD Study-Annotator

Covers the MIAME checklist and exploits the MGED Ontology
Allows entering of very specific details of an experiment
Web-based forms:
    Modular structure
    Written in PHP
    Front-end data integrity checks using JavaScript
Manages data privacy based on Project/Group selections present in the GUS schema
Manduchi et al. 2004 Bioinformatics 20:452-459.
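
The "central dogma" slides and the kinase query example suggest how a question such as "find all kinases in P. falciparum" maps onto linked gene, RNA, and protein feature tables. The following is a minimal sketch of that idea only. Every table name, column name, and data row is hypothetical and invented for illustration; it is not the actual GUS schema, the WDK, or the Perl object layer, and it uses an in-memory SQLite database rather than Oracle or PostgreSQL.

```python
import sqlite3

# Hypothetical, heavily simplified tables for illustration only;
# these are NOT the actual GUS table or column names.
SCHEMA = """
CREATE TABLE na_sequence (
    na_sequence_id INTEGER PRIMARY KEY,
    organism       TEXT NOT NULL
);
CREATE TABLE aa_sequence (
    aa_sequence_id INTEGER PRIMARY KEY
);
CREATE TABLE gene_feature (
    gene_feature_id INTEGER PRIMARY KEY,
    na_sequence_id  INTEGER NOT NULL REFERENCES na_sequence,
    name            TEXT NOT NULL
);
CREATE TABLE rna_feature (
    rna_feature_id  INTEGER PRIMARY KEY,
    gene_feature_id INTEGER NOT NULL REFERENCES gene_feature
);
CREATE TABLE protein_feature (
    protein_feature_id INTEGER PRIMARY KEY,
    rna_feature_id     INTEGER NOT NULL REFERENCES rna_feature,
    aa_sequence_id     INTEGER NOT NULL REFERENCES aa_sequence,
    product            TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO na_sequence VALUES (?, ?)",
        [(1, "Plasmodium falciparum"), (2, "Toxoplasma gondii")],
    )
    conn.executemany("INSERT INTO aa_sequence VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO gene_feature VALUES (?, ?, ?)",
        [(1, 1, "demo_gene_A"), (2, 1, "demo_gene_B"), (3, 2, "demo_gene_C")],
    )
    conn.executemany(
        "INSERT INTO rna_feature VALUES (?, ?)", [(1, 1), (2, 2), (3, 3)]
    )
    conn.executemany(
        "INSERT INTO protein_feature VALUES (?, ?, ?, ?)",
        [
            (1, 1, 1, "protein kinase, putative"),
            (2, 2, 2, "hypothetical protein"),
            (3, 3, 3, "serine/threonine kinase"),
        ],
    )
    return conn


def find_genes_by_product(conn, organism, keyword):
    """Walk gene -> RNA -> protein and filter on the protein product text."""
    query = """
        SELECT g.name, p.product
        FROM gene_feature g
        JOIN na_sequence s     ON s.na_sequence_id = g.na_sequence_id
        JOIN rna_feature r     ON r.gene_feature_id = g.gene_feature_id
        JOIN protein_feature p ON p.rna_feature_id = r.rna_feature_id
        WHERE s.organism = ? AND p.product LIKE ?
        ORDER BY g.name
    """
    return conn.execute(query, (organism, f"%{keyword}%")).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for name, product in find_genes_by_product(
        demo, "Plasmodium falciparum", "kinase"
    ):
        print(name, product)
```

Because the gene, RNA, and protein relationships live in the tables themselves, any application issuing this kind of join sees the same structure, which is the consistency argument made on the "GUS versus chado" slide.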


Vision for GUS

Installable for every lab
    Improve install scripts, documentation
    Postgres version
Extendable to all areas of functional genomics
    Sequence, array-based expression experiments
    Array CGH, 2-D gel electrophoresis, mass spectrometry, yeast 2-hybrids
    In situ hybridizations, metabolites
Interoperable with other GUS installations and with common tools
    Exchange files and scripts, MAGE-ML (use community standards)
    Web services (exchange objects)
    Interface with open source tools such as Gbrowse, Artemis, Apollo


Standards and Ontologies for Functional Genomics 2

October 23-26, 2004
Held at the University of Pennsylvania Medical School
www.jax.org/courses/events
Co-hosted by The Jackson Laboratory, University of Pennsylvania, and European Bioinformatics Institute
Student scholarships available
Funded in part by NHGRI, NCRR, NERC, GSK
Photo by R. Kennedy, B Trist, R. Tarver, for GPTMC