• Swiss-Prot Fortaleza
• Sybil: Comparative Analysis System
• Gemina: Epidemiological Resource
Owen White
July 31st, 2006

ISMB POSTERS
• Sam Angiuoli: Ergatis/Sybil, Poster H-82
• Aaron Gussman: Gemina, Poster B-46

Sybil: Comparative Analysis System

Sybil searches and computes
• Searches:
  – All-vs-all blastp searches
  – MUMmer: nucleotide/protein SNPs
• Clustering:
  – Evaluation of protein-match networks
  – Scoring system set by the user
  – COGs: bidirectional best hits
  – Jaccard COG clustering for transitive closure
  – Also paralogs
• Syntenic blocks:
  – Collections of J-COGs between species
  – Runs of J genes without K non-homologous intervening genes

How Sybil Computations Are Performed
• BLAST
• Position effect (conserved gene order)
• MUMmer
  – SNPs
• PROmer
• Gene families
  – COGs
  – Paralogs
[Diagram components: blastp, COG, PE, MUMmer, SNPs, PROmer, BSML Dumper, BSML Loader. Primary output: BSML-XML]

Data Prep for Comparative Analysis
[Diagram components: GenBank files, EMBL files, custom files, BSML, GFF3, Chado (GMOD Consortium); blastp, COG, PE, MUMmer, PROmer, SNPs, BSML Dumper, BSML Loader; Jaccard clustering; Jaccard-filtered orthologs]

Match Reduction
Fig. 6. Using a minimum spanning tree (MST) algorithm to remove redundant matches. Protein cluster image before (left) and after (right) applying the MST filter.

Sybil: Chromosomal summaries
Preferences for pop-up displays are user configurable.
[Screenshot labels: Jaccard-filtered COGs; syntenic blocks]

Fig. 1. Whole-genome alignment of GBS strains
Fig. 2. GBS core genome
Fig. 3. GBS pan-genome
(Figs. 1-3: Tettelin, Hervé et al. (2005) Proc. Natl. Acad. Sci. USA 102, 13950-13955. Copyright ©2005 by the National Academy of Sciences.)

Other Sybil Features
• Open source.
• sybil.sourceforge.net
• Complete demo database
• Other packages:
  – Chado relational database
  – BSML XML (Bioinformatic Sequence Markup Language)
  – BioPerl (Lincoln Stein's Bio::Graphics package)
  – Apache Batik SVG toolkit
  – MUMmer suffix-tree alignment tools

Important: To run Sybil
• You must load data into Chado.
• We have flat-file BSML parsers.
• To be released as open source.

Ergatis: latest discussion
(Sam Angiuoli, Ergatis/Sybil Poster H-82)
• me: then when're we releasing ergatis?
• Sam: so, the plan was that all these scripts would just come bundled with Ergatis
• me: right.
• Sam: we need a deadline
• me: oh. is this on the record? I think I'll just put this chat in my power point for tomorrow.
• Sam: i really don't think there is that much we need to do in order to release it. most of the concerns will be about how a user can install and configure it to point to their installs of all the 3rd party search tools they'd want to use.

Gemina: Epidemiological Resource

Defining Infection Systems
[Diagram: the components of an Infection System, with example categories]
• Pathogen: bacterial and viral pathogens; NIAID Category A, B & C Priority Pathogens
• Host: human and animal
• Transmission Method: direct, indirect, mechanical, vector-borne
• Anatomy: animal structure, body region, cardiovascular system, cell, digestive system, endocrine system
• Disease: blood and blood-forming organs diseases, circulatory system diseases, complications of pregnancy, digestive system diseases, genitourinary system diseases
• Symptom
• Reservoir
• Geographic Location

Infection Systems distinguish modes of transmission, hosts, and disease
Pathogen | Host | Transmission Method | Anatomy | Disease
Clostridium botulinum C | Bos taurus | indirect: vehicle-borne ingestion | gastrointestinal (GI) tract | Foodborne botulism
Clostridium botulinum F | Homo sapiens | indirect: vehicle-borne ingestion | gastrointestinal (GI) tract | Infant botulism
Clostridium botulinum B | Homo sapiens | direct: contact | skin | Wound botulism
Clostridium botulinum | Homo sapiens | indirect: airborne | respiratory tract | Botulism
Mycobacterium tuberculosis | Homo sapiens | direct: droplet spread | respiratory tract | Tuberculosis
Mycobacterium tuberculosis | Homo sapiens | indirect: airborne | respiratory tract | Tuberculosis
Mycobacterium tuberculosis | Homo sapiens | - | brain | Meningitis
Mycobacterium tuberculosis | Pan troglodytes | indirect: airborne | lymph nodes | Tuberculosis

Ontologies & Controlled Vocabularies in Gemina
• Infectious-disease and body-system oriented
• Hierarchical query and retrieval
• Mapping of terms from newly defined threat_systems and MRS terms
• Vocabularies: disease, anatomy, symptom, transmission method, reservoir, geographic location
• Term counts shown on the slide: 1667, 1322, 424, 16, 243, 964

Disease: +diseases of the respiratory system +infectious and parasitic diseases +arthropod-borne viral disease +intestinal infectious diseases +other bacterial diseases +bacterial infection +gas gangrene +staphylococcus infection +tetanus
Reservoir: +animal reservoir +arthropod +mollusc +environmental reservoir +soil +food +human reservoir +blood +respiratory tract
Anatomy Ontology: +Animal_structure +Body_region +Cardiovascular_system +Cell +Digestive_system +Embryonic_structure +Endocrine_system +Fluids_and_secretions +Hemic_and_immune_system +Integumentary_system +Musculoskeletal_system +Nervous_system +Respiratory_system +Sense_organ +Stomatognathic_system +Tissue +Urogenital_system
Respiratory_system: +larynx +lung +pharynx +nasopharynx +oropharynx
Geographic location: +Africa +Americas +Caribbean +Central America +North America +South America +Argentina +Bolivia +Brazil +North Region +Northeast Region +Rio Grande do Norte +Sergipe +Fortaleza +Central West Region +Antarctic Regions +Arctic Regions +Asia +Atlantic Islands +Europe +Indian Ocean Islands +Oceania +Oceans and Seas +World Wide

Gemina query page
• Select topic tabs to add terms to the Selection Summary
• Scroll down the list of choices, or click Tree view to navigate the hierarchy of terms
• Query Anatomy Ontology for
  terms including ‘tissue’
• Identify Infection Systems involving nerve tissue, select them, and add them to the Selection Box

Gemina Search Results
• View and sort Infection Systems by topic
• Unique ID
• Navigate back to the Gemina Query Page

Curated GEMINA Infection Systems (as of July 28th, 2006)
NIAID Category | Pathogen | Number of Infection Systems | Number of Geographic Locations
A | Bacillus anthracis | 18 | -
A | Clostridium botulinum | 61 | 257
A | Francisella tularensis | 44 | 18
A | Yersinia pestis | 33 | 48
B | Brucella abortus | 3 | -
B | Brucella canis | 7 | -
B | Brucella melitensis | 18 | -
B | Brucella spp. | 11 | -
B | Brucella suis | 15 | -
B | Burkholderia mallei | 55 | 47
B | Burkholderia pseudomallei | 210 | 108
B | Campylobacter jejuni | 42 | 148
B | Clostridium perfringens | 120 | 30
B | Coxiella burnetii | 67 | 69
B | Escherichia coli | 328 | 545
B | Listeria monocytogenes | 96 | 191
B | Rickettsia prowazekii | 10 | 74
B | Salmonella typhimurium | 105 | 89
B | Staphylococcus aureus | 100 | 86
B | Vibrio cholerae | 31 | 178
C | Influenza | 168 | 97
C | Mycobacterium tuberculosis | 74 | 1867
Total | 22 pathogens | 1616 | 3852

Microbial Rosetta Stone (MRS): a database that relates microorganism names, taxonomic classifications, diseases, and scientific literature for the most important human, animal and plant microbial pathogens, with linkage to public genomic sequence databases.

Applications of Gemina
• Pathogen identification applications:
  – biodefense, animal health care, food safety, diagnostics, pathology, clinical research, forensics, drug discovery
• Under Open Access.
• Disease/Anatomy/Symptoms
  – DNA sequence, genomes
  – Physical resources
  – Proteomic data

Case Study: Submit queries of multiple terms to view related Infection Systems

Microbial Identification of Clinically Significant Microbes
NIH Clinical Center Collaboration: Dr. Patrick Murray
• Creation of an identification clinical reference set
• Identify unique signature tags to distinguish organisms
• Goal: identify the minimum number of tests (50 bp unique signatures) needed to identify gram-negative rod bacteria using Pyrosequencing
• Genus-level identification
• Species- and strain-level identification
• Test set: clinical isolates of gram-negative rods not reliably identified by biochemical testing: 140 Proteobacteria

Case Study 2: Insignia (Homeland Security) and PANDA
[Diagram: PANDA data flow]
• Sequence data flow: data input from NCBI (genomic sequence); annotation extractor; Chado genome annotation; diagnostics: DNA signatures (Univ. MD)
• Epidemiology data flow: data input from TIGR (Infection Systems); MRS database schema (pathogens and disease); TAXON_ID; Gemina web interface

DNA datasets outside of GenBank that we have identified and included in PANDA
Organism Name | Sequencing Center
Acidobacteria bacterium Ellin345 | DOE Joint Genome Institute
Acinetobacter baumannii | Genoscope
Actinobacillus actinomycetemcomitans HK1651 | University of Oklahoma
Bacteriovorax marinus | Wellcome Trust Sanger Institute
Bordetella avium | Wellcome Trust Sanger Institute
Burkholderia cenocepacia J2315 | Wellcome Trust Sanger Institute
Chromohalobacter salexigens | DOE Joint Genome Institute
Citrobacter rodentium | Wellcome Trust Sanger Institute
Clavibacter michiganensis subsp. sepedonicus | Wellcome Trust Sanger Institute
Clostridium botulinum A | Wellcome Trust Sanger Institute
Clostridium difficile 630 | Wellcome Trust Sanger Institute
Erwinia amylovora | Wellcome Trust Sanger Institute
Escherichia coli 042 | Wellcome Trust Sanger Institute
Escherichia coli E2348/69 | Wellcome Trust Sanger Institute
Francisella tularensis subsp. holarctica FSC200 | Baylor College of Medicine
Frankia sp. EAN1pec | DOE Joint Genome Institute
Geobacillus stearothermophilus 10 | University of Oklahoma
Helicobacter mustelae | Wellcome Trust Sanger Institute
Lactobacillus brevis | DOE Joint Genome Institute
Mannheimia haemolytica PHL213 | Baylor College of Medicine
Methylobacterium extorquens AM1 | Integrated Genomics
Mycobacterium marinum M | Wellcome Trust Sanger Institute
Mycobacterium microti | Wellcome Trust Sanger Institute
Neisseria meningitidis FAM18 | Wellcome Trust Sanger Institute
Paenibacillus larvae subsp. larvae | Baylor College of Medicine
Proteus mirabilis | Wellcome Trust Sanger Institute
Pseudomonas fluorescens SBW25 | Wellcome Trust Sanger Institute
Rhizobium leguminosarum bv. viciae 3841 | Wellcome Trust Sanger Institute
Rhodobacter capsulatus SB1003 | Integrated Genomics
Salmonella bongori 12149 | Wellcome Trust Sanger Institute
Salmonella typhimurium DT104 | Wellcome Trust Sanger Institute
Salmonella typhimurium SL1344 | Wellcome Trust Sanger Institute
Salmonella typhimurium TR7095 | Wellcome Trust Sanger Institute
Serratia marcescens subsp. marcescens Db11 | Wellcome Trust Sanger Institute
Shewanella baltica | DOE Joint Genome Institute
Shigella dysenteriae M131649 | Wellcome Trust Sanger Institute
Shigella sonnei 53G | Wellcome Trust Sanger Institute
Spiroplasma kunkelii CR2-3x | University of Oklahoma
Streptococcus equi | Wellcome Trust Sanger Institute
Streptococcus equi subsp. zooepidemicus | Wellcome Trust Sanger Institute
Streptococcus pneumoniae 23F | Wellcome Trust Sanger Institute
Streptococcus pyogenes Manfredo | Wellcome Trust Sanger Institute
Streptococcus suis P1/7 | Wellcome Trust Sanger Institute
Streptococcus uberis 0140J | Wellcome Trust Sanger Institute
Thermoanaerobacter ethanolicus | DOE Joint Genome Institute
Vibrio salmonicida LFI1238 | Wellcome Trust Sanger Institute
Wolbachia endosymbiont of Onchocerca volvulus | Wellcome Trust Sanger Institute
Wolbachia pipientis | Wellcome Trust Sanger Institute
Yersinia enterocolitica (type O:8) | Wellcome Trust Sanger Institute

Ongoing Development
• Creating links to Insignia from the Results page
• Enable choice of target and background genomes from Gemina Search Results
• Links to Web resources for each pathogen
• Community involvement in development of ontologies
• Workshop on Ontology of Diseases: Nov. 6-7, 2006
• Inclusion of additional datasets (Scotland: disease data)

• Sam Angiuoli: Sybil/Ergatis, Poster H-82
• Jonathan Crabtree: Sybil interface
• Aaron Gussman: Gemina, Poster B-46
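The Sybil clustering slide lists "COGs: bidirectional best hits" as the basis for cross-genome clusters. Here is a minimal Python sketch of that idea, not Sybil's implementation. It assumes each genome-vs-genome blastp run has already been parsed into a dictionary that maps hypothetical (query, subject) protein IDs to bit scores. A pair is kept only when each protein is the other's top-scoring hit.

```python
from typing import Dict, List, Tuple

# Assumed input shape: {(query_id, subject_id): bit_score} for one
# genome-vs-genome blastp run. Higher bit score = better hit.
Hits = Dict[Tuple[str, str], float]


def best_hits(hits: Hits) -> Dict[str, str]:
    """Map each query protein to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in hits.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Toy scores with hypothetical protein IDs.
    a_vs_b = {("a1", "b1"): 200.0, ("a1", "b2"): 150.0,
              ("a2", "b2"): 90.0, ("a3", "b2"): 120.0}
    b_vs_a = {("b1", "a1"): 195.0, ("b2", "a3"): 118.0, ("b2", "a2"): 88.0}
    print(bidirectional_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a3', 'b2')]
```

In the toy data, a2's best hit is b2, but b2's best hit is a3, so (a2, b2) is not reciprocal and is dropped.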
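The Sybil slides also mention Jaccard clustering for transitive closure. One plausible reading is sketched below under stated assumptions. First, compute the Jaccard coefficient |A ∩ B| / |A ∪ B| between the match sets of two proteins. Next, link pairs at or above a user-chosen threshold. Finally, take connected components (the transitive closure of those links) with a union-find structure. The input shape, threshold and function names are hypothetical, and Sybil's actual scoring may differ.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(x: Set[str], y: Set[str]) -> float:
    """Jaccard coefficient |x & y| / |x | y| (0.0 for two empty sets)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def jaccard_clusters(matches: Dict[str, Set[str]], threshold: float) -> List[List[str]]:
    """Link proteins whose match sets overlap at or above `threshold`, then
    return the connected components (transitive closure of the links)."""
    parent = {p: p for p in matches}

    def find(p: str) -> str:
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p, q in combinations(matches, 2):
        if jaccard(matches[p], matches[q]) >= threshold:
            parent[find(p)] = find(q)  # merge the two components

    clusters: Dict[str, List[str]] = {}
    for p in matches:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in clusters.values())


# Toy match sets: each protein's set of BLAST matches, including itself (assumption).
MATCHES = {
    "p1": {"p1", "p2", "p3"},
    "p2": {"p1", "p2", "p3"},
    "p3": {"p2", "p3", "p4"},
    "p4": {"p3", "p4"},
    "p5": {"p5"},
}

if __name__ == "__main__":
    print(jaccard_clusters(MATCHES, threshold=0.5))  # [['p1', 'p2', 'p3', 'p4'], ['p5']]
```

p1 and p4 share only one match (coefficient 0.25), yet they end up in one cluster because each is linked to p3. That chaining is what transitive closure adds over pairwise filtering.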
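The Sybil syntenic-block rule, "runs of J genes without K non-homologous intervening genes", can be read as a chaining procedure over ortholog pairs ordered by gene position. The sketch below assumes that reading, and the function name is hypothetical. Consecutive pairs stay in one block while fewer than K genes intervene on both genomes, and a block is reported only if it holds at least J pairs. On genome A the skipped genes have no pair of their own, so they are non-homologous. On genome B the gap is simply counted as gene distance, which is a simplification. Only same-orientation (collinear) runs are handled, and inversions are not modeled.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (gene index on genome A, gene index on genome B)


def syntenic_blocks(pairs: List[Pair], j: int, k: int) -> List[List[Pair]]:
    """Chain ortholog pairs into collinear runs.

    Consecutive pairs (sorted by position on genome A) stay in the same run
    while fewer than k genes intervene on both genomes; runs containing at
    least j pairs are reported as syntenic blocks.
    """
    blocks: List[List[Pair]] = []
    current: List[Pair] = []
    for a, b in sorted(pairs):
        if current:
            prev_a, prev_b = current[-1]
            gap_a = a - prev_a - 1  # genes skipped on genome A
            gap_b = b - prev_b - 1  # genes skipped on genome B (same orientation only)
            if not (0 <= gap_a < k and 0 <= gap_b < k):
                # Gap too large (or order reversed): close the current run.
                if len(current) >= j:
                    blocks.append(current)
                current = []
        current.append((a, b))
    if len(current) >= j:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    pairs = [(0, 100), (1, 101), (3, 102), (10, 200), (11, 201)]
    print(syntenic_blocks(pairs, j=3, k=2))  # [[(0, 100), (1, 101), (3, 102)]]
```

With j=3 the second run, (10, 200) and (11, 201), is too short to report. Lowering j to 2 would report it as a second block.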
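Fig. 6 describes removing redundant matches from a protein cluster with a minimum spanning tree. Assume each match edge carries a distance where smaller means a stronger match, such as an E-value. Under that assumption, Kruskal's algorithm keeps just enough of the strongest edges to connect every protein without cycles. This is a generic Kruskal sketch with hypothetical names, not the filter used to produce Fig. 6.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[float, str, str]  # (distance, protein_u, protein_v); smaller = stronger match


def mst_filter(proteins: Sequence[str], edges: List[Edge]) -> List[Edge]:
    """Kruskal's algorithm: keep the strongest edges that connect the proteins
    without forming a cycle; every other (redundant) edge is dropped."""
    parent: Dict[str, str] = {p: p for p in proteins}

    def find(p: str) -> str:
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    kept: List[Edge] = []
    for edge in sorted(edges):  # strongest (smallest distance) first
        _, u, v = edge
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two separate components: keep it
            parent[root_u] = root_v
            kept.append(edge)
    return kept


if __name__ == "__main__":
    edges = [(1.0, "A", "B"), (2.0, "B", "C"), (3.0, "A", "C"),
             (4.0, "C", "D"), (5.0, "B", "D")]
    print(mst_filter(["A", "B", "C", "D"], edges))
    # [(1.0, 'A', 'B'), (2.0, 'B', 'C'), (4.0, 'C', 'D')]
```

A cluster of n connected proteins keeps exactly n - 1 edges. If the input graph is disconnected, the result is a spanning forest instead.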
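Gemina describes an Infection System as a pathogen, host, transmission method, anatomy and disease, queried hierarchically through its ontologies. The sketch below is a minimal illustration under assumptions. The record is modeled as a dataclass whose field names are hypothetical, the ontology is a parent-to-children map, and a query on a term also matches records annotated with any descendant term. The nesting of nasopharynx and oropharynx under pharynx is assumed. The records are toy placeholders, not curated Gemina data.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class InfectionSystem:
    """One Infection System record (field names are assumptions)."""
    system_id: str
    pathogen: str
    host: str
    transmission: str
    anatomy: str
    disease: str


def descendants(tree: Dict[str, List[str]], term: str) -> Set[str]:
    """Return `term` plus every term below it in a parent -> children map."""
    found = {term}
    stack = [term]
    while stack:
        for child in tree.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def query_by_anatomy(systems: List[InfectionSystem],
                     tree: Dict[str, List[str]], term: str) -> List[str]:
    """IDs of systems annotated with `term` or any of its descendant terms."""
    terms = descendants(tree, term)
    return [s.system_id for s in systems if s.anatomy in terms]


# Terms from the Respiratory_system slide; the parent/child nesting is assumed.
RESPIRATORY_TREE = {
    "Respiratory_system": ["larynx", "lung", "pharynx"],
    "pharynx": ["nasopharynx", "oropharynx"],
}

# Toy records with placeholder pathogens and diseases (hypothetical).
SYSTEMS = [
    InfectionSystem("IS-1", "pathogen_A", "Homo sapiens", "indirect: airborne", "lung", "disease_A"),
    InfectionSystem("IS-2", "pathogen_B", "Homo sapiens", "direct: contact", "skin", "disease_B"),
    InfectionSystem("IS-3", "pathogen_C", "Bos taurus", "direct: droplet spread", "nasopharynx", "disease_C"),
]

if __name__ == "__main__":
    print(query_by_anatomy(SYSTEMS, RESPIRATORY_TREE, "Respiratory_system"))  # ['IS-1', 'IS-3']
    print(query_by_anatomy(SYSTEMS, RESPIRATORY_TREE, "pharynx"))  # ['IS-3']
```

Expanding the query term to its descendants before matching keeps each record annotated with one specific term. A broad query such as Respiratory_system still finds it.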
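The clinical identification case study looks for unique 50 bp signature tags that distinguish organisms, and Insignia lets users choose target and background genomes. A naive illustration of the concept collects every k-mer of the target and subtracts every k-mer found on either strand of each background sequence. The function names are hypothetical, and this brute-force set difference is not Insignia's actual algorithm. It is memory-hungry on whole genomes and ignores near-matches that a real assay design would also need to exclude.

```python
from typing import Iterable, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (non-ACGT characters pass through)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_signatures(target: str, backgrounds: Iterable[str], k: int = 50) -> Set[str]:
    """k-mers present in the target but absent from both strands of every background."""
    signatures = kmers(target, k)
    for background in backgrounds:
        signatures -= kmers(background, k)
        signatures -= kmers(reverse_complement(background), k)
    return signatures


if __name__ == "__main__":
    # Tiny toy sequences with k=4; the case study itself uses k=50.
    print(sorted(unique_signatures("ACGTTGCA", ["TTGCAAAA"], k=4)))  # ['ACGT', 'CGTT', 'GTTG']
    print(unique_signatures("AAACCCGG", ["CCGGGTTT"], k=4))  # set()
```

The second toy call shows why both strands matter. The background's reverse complement is exactly the target, so none of the target's 4-mers is a usable signature, even though most of them never appear on the background's forward strand.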