Bioscience
Discovering virulence genes present in novel strains and metagenomes
Chris Stubben
IC postdoc, B-7
Operated by Los Alamos National Security, LLC for NNSA

Overview
• Review current functional classification systems
• Discuss Virulence Factor Ontology
• Identify virulence genes in novel strains and metagenomes

Slide 3
Functional classification systems
• EC numbers for enzymes (1956)
• Swiss-Prot keywords (1986)
• E. coli gene functions, M. Riley (1993)
• TIGR role categories (1995)
• Gene Ontology (1998)
[Figure labels: function, gene]

Slide 4
What functions are related to virulence?
• Some systems have a few terms
– Swiss-Prot keywords = virulence, toxin, antibiotic resistance
– TIGR roles = pathogenesis, toxin production and resistance
• Gene Ontology (GO) also has pathogenesis, resistance to antibiotics, plus many more GO terms related to the enzymatic activity of toxins

Slide 5
Gene Ontology (GO)
• 25,688 terms in three structured controlled vocabularies (ontologies)
– 15,098 biological processes
– 2,186 cellular components
– 8,404 molecular functions
• Standard for eukaryotic gene annotation
• Increasingly used for prokaryotes
– TIGR (2002)
– Plant pathogens by PAMGO at VBI (2005)
– Human pathogens at 8 BRCs (2006)

Slide 6
Bioinformatics Resource Centers (BRC)
• NIAID-funded, $100 million effort to create eight bioinformatics centers for human pathogens
• Goal is to provide easy access to genomic data from multiple strains, as eukaryotic model organism databases do
BRCs = ?

Slide 7
Example: Toxin annotation in GO
Step 1, Assign GO terms, maybe
– activation of Rho GTPase activity
– N-terminal peptidyl-glutamine deamination
– actin cytoskeleton reorganization
– stress fiber formation

Slide 8
Step 2, add references and evidence codes
[Diagram labels: Virulence Protein, Function]
Experimental
• Knockout mutants (IMP)
• Overexpression phenotypes (IDA)
• Genetic interactions (IGI)
• Microarrays (IEP or RCA)
Computational
Sequence similarity
• BLAST alignments (ISA)
• Orthologous proteins (ISO)
• Hidden Markov models of protein families or domains (ISM)
Genomic context
• Phylogenetic profiles, conserved neighborhoods, gene fusion, shared regulatory sites, etc. (IGC)

Slide 9
Example: Toxin searches in GO
• If a gene is annotated to ‘adenylate cyclase activity’, how do you know it’s a toxin?
• It may also be annotated to “cell killing” or a related term, but is that enough?
• An alternative is to define virulence factors and toxins (both outside the scope of GO) in a new ontology

Slide 10
Why we need a Virulence Factor ontology
• Lots of effort to characterize pathogenic processes and systems (e.g., BRCs)
• Many different definitions of pathogen, virulence and virulence factors
• Not clear which terms in GO may be related to toxins and virulence (BRCs have already assigned 750,000 GO terms to 300,000 genes)

Slide 11
Virulence Factor Ontology working group
• Goal is to combine existing toxin and virulence terms from various groups into a single ontology
– TVFac and antibiotic resistance (AR) terms at LANL
– Gemina virulence factors and AR terms at U. of Maryland
– PAMGO terms in GO
• Participants
– MITRE. Lynette Hirschman, Marc Colosimo, and others
– LANL. Chris Stubben, Murray Wolinsky and Jian Song
– U. of Maryland IGS. Lynn Schriml and Michelle Gwinn

Slide 12
Virulence Factor Ontology (VFO)
• Three new ontologies, one very simple that points to additional terms in GO or to new ontologies
• Virulence factor (definition needed!)
– toxin associated processes (New)
– antibiotic resistance (New)
– adhesion
– entry into host
– acquisition of nutrients from host
– avoidance of host defenses
– growth within host
– modification of host morphology
– dissemination from host
[Side label: simplified GO trees (slims)]

Slide 13
Virulence genes in novel strains
• Emerging, engineered and novel strains will most likely be sequenced quickly using next-generation sequencing technologies,
• and then compared to near-neighbor strains using sequence similarity (BLAST) or models (HMMs like Pfams, TIGRFams, FIGFams, EnteroFams, etc.).

Slide 14
Compare novel strains to what?
• Very few manual annotations are available for prokaryotes, especially in public databases like NCBI and UniProt
Table 1. Percentage of genes in UniProt with functional assignments to Gene Ontology terms based on experimental evidence in the primary literature.
“Curated information from the literature serves as the gold-standard data set for comparative analyses” (Nature, Sep 10, 2008)
Use BRCs!

Slide 15
BRC annotations
• Genome annotations should have references and evidence codes signifying whether annotations were produced experimentally or computationally
3.8% of Y. pestis CO92 with manual annotations

Slide 16
Y. pestis CO92 annotations at ERIC
Table 1 and 2. Sequence features and coding sequence annotations for Y. pestis CO92 at ERIC

Slide 17
Yersinia antibiotic resistance genes
Table 1 and 2. Antibiotic resistance genes found using Swiss-Prot keyword search ‘antibiotic resistance’ in UniProt and using GO term search ‘response to antibiotic’ in ERIC.
Only one gene in common!
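
The comparison on Slide 17 amounts to intersecting the gene lists returned by the two searches. The following is a minimal sketch, assuming each search result has been exported as a list of gene identifiers; the function name `compare_hits` and the identifiers below are hypothetical placeholders, not actual Yersinia results.

```python
def compare_hits(uniprot_hits, eric_hits):
    """Split two search results into shared and source-specific genes."""
    a, b = set(uniprot_hits), set(eric_hits)
    return {
        "both": sorted(a & b),          # found by both searches
        "uniprot_only": sorted(a - b),  # Swiss-Prot keyword search only
        "eric_only": sorted(b - a),     # GO term search only
    }


# Hypothetical identifiers, for illustration only
uniprot_hits = ["geneA", "geneB", "geneC"]
eric_hits = ["geneC", "geneD"]

result = compare_hits(uniprot_hits, eric_hits)
print(len(result["both"]), "gene(s) in common:", result["both"])
```

A small overlap, as reported on the slide, suggests the two annotation vocabularies are not being applied to the same genes, so neither search alone can be treated as complete.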

Slide 18
Vibrio toxins in GO, UniProt, and NMPDR

Slide 19
Virulence genes in metagenomes
• A recent comparison of virulence genes in chicken, cow, mouse and human gut metagenomes (metavirulomes) was based on SEED subsystem categories at NMPDR
• An alternative is to use GO term mappings to protein family and domain databases like Pfam

Slide 20
IMG/metagenomes from JGI
• Select metagenomes and save

Slide 21
Create abundance profiles
• Compare using Pfam, COG, or TIGRfam abundance profiles

Slide 22
Find virulence genes
• Use GO term mappings to the Pfam database to find virulence genes

ID       Map to GO term          Pfam                                             Air 1   Air 2   Soil    Whalefall  Human 7
PF00144  response to antibiotic  Beta-lactamase                                   0.3094  0.2349  0.2757  0.1087     0.0191
PF05139  response to antibiotic  Erythromycin esterase                            0.0041  0.0114  0.0240  0.0000     0.0064
PF05223  response to antibiotic  NTF2-like N-terminal transpeptidase domain       0.0000  0.0000  0.0010  0.0000     0.0000
PF07091  pathogenesis            Ribosomal RNA methyltransferase (FmrO)           0.0000  0.0000  0.0021  0.0000     0.0000
PF01289  pathogenesis            Thiol-activated cytolysin                        0.0000  0.0000  0.0021  0.0000     0.0000
PF01376  pathogenesis            Heat-labile enterotoxin beta chain               0.0000  0.0000  0.0000  0.0000     0.0000
PF03023  pathogenesis            MviN-like protein                                0.0247  0.0341  0.0459  0.0225     0.1146
PF03077  pathogenesis            Putative vacuolating cytotoxin                   0.0000  0.0000  0.0000  0.0037     0.0000
PF03945  pathogenesis            delta endotoxin, N-terminal domain               0.0041  0.0000  0.0000  0.0000     0.0000
PF05394  pathogenesis            Avirulence protein                               0.0000  0.0000  0.0010  0.0000     0.0000
PF05480  pathogenesis            Staphylococcus haemolytic protein                0.0000  0.0000  0.0010  0.0000     0.0000
PF05658  pathogenesis            Hep_Hag                                          0.0289  0.0265  0.0073  0.0112     0.0127
PF05662  pathogenesis            Haemagglutinin                                   0.0289  0.0379  0.0021  0.0000     0.0191
PF05932  pathogenesis            Tir chaperone protein (CesT)                     0.0000  0.0000  0.0010  0.0000     0.0000
PF07269  pathogenesis            T-complex transport apparatus lipoprotein VirB7  0.0000  0.0000  0.0010  0.0000     0.0000
PF07675  pathogenesis            Cleaved Adhesin Domain                           0.0000  0.0000  0.0000  0.0000     0.0064
PF07822  pathogenesis            Neurotoxin B-IV-like protein                     0.0000  0.0000  0.0010  0.0000     0.0000
PF09025  pathogenesis            YopR Core                                        0.0000  0.0000  0.0000  0.0037     0.0000
PF09207  pathogenesis            Yeast killer toxin                               0.0000  0.0000  0.0010  0.0000     0.0000
PF06414  pathogenesis            Zeta toxin                                       0.0000  0.0076  0.0010  0.0000     0.0000
PF06769  pathogenesis            Plasmid encoded toxin Txe                        0.0041  0.0038  0.0010  0.0037     0.0191
PF02794  pathogenesis            RTX toxin acyltransferase family                 0.0000  0.0038  0.0000  0.0000     0.0000

Slide 23
Need better mappings to virulence genes
• Current GO term mappings miss most virulence-associated genes.
Table 1 and 2. PFAMs and TIGRfams overrepresented in air compared to soil

ID       PFAM
PF00593  TonB dependent receptor
PF07715  TonB-dependent Receptor Plug Domain
PF03466  LysR substrate binding domain
PF00126  Bacterial regulatory helix-turn-helix protein, lysR family
PF00440  Bacterial regulatory proteins, tetR family
PF00873  AcrB/AcrD/AcrF family
PF00015  Methyl-accepting chemotaxis protein (MCP) signaling domain
PF07992  Pyridine nucleotide-disulphide oxidoreductase
PF00106  short chain dehydrogenase
PF01381  Helix-turn-helix

ID         TIGRFAM
TIGR00014  arsenate reductase (glutaredoxin)
TIGR01297  cation diffusion facilitator family transporter
TIGR01782  TonB-dependent receptor
TIGR02606  putative addiction module antidote protein
TIGR01552  prevent-host-death family protein
TIGR01435  putative glutamate--cysteine ligase
TIGR01352  TonB family C-terminal domain
TIGR01509  haloacid dehalogenase superfamily
TIGR00093  pseudouridine synthase family
TIGR02690  arsenical resistance protein ArsH

[Abundance values for the two tables, in extracted order under the column labels that precede them; the original row and column alignment is not recoverable]
Air 1 Air 2: 0.90 0.94 0.68 0.48 0.42 0.77 0.29 0.78 0.99 0.31
Air 1 Soil: 0.87 0.95 0.58 0.42 0.43 0.58 0.22 0.79 0.89 0.38
Air 2: 0.40 0.29 0.23 0.25 0.25 0.20 1.01 0.49 0.54 0.11 0.16 0.33 0.16 0.14 0.17 0.43 0.05 0.58 0.74 0.17
Whalefall Human 7: 0.31 0.00 0.37 0.02 0.52 0.18 0.38 0.27 0.28 0.20 0.64 0.03 0.42 0.00 0.77 0.49 0.58 0.23 0.23 0.64 0.05 0.13 0.01 0.03 0.14 0.00 0.68 0.28 0.28 0.02
Whalefall Human 7: 0.04 0.00 0.16 0.24 0.00 0.00 0.00 0.00 0.18 0.16 0.22 0.42 0.37 0.03 0.28 0.42 0.18 0.16 0.00 0.00
Soil: 0.30 0.46 0.26 0.28 0.48 0.20 0.74 0.46 0.40 0.28

Slide 24
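
The lookup described on Slide 22 (use GO term mappings to pick virulence-related Pfam families out of an abundance profile) can be sketched as follows. This is an illustration under stated assumptions, not the workflow used for the slides: the mapping lines imitate the style of a pfam2go file (the exact format is assumed), the function names are hypothetical, the abundance values are the Soil column from the Slide 22 table, and `PF_UNMAPPED` is a made-up family with a placeholder value.

```python
# GO terms treated as virulence-related, following Slide 22
VIRULENCE_TERMS = {"response to antibiotic", "pathogenesis"}

# Illustrative mapping lines; format assumed to be
# "Pfam:<ID> <name> > GO:<term> ; GO:<accession>"
PFAM2GO = """\
Pfam:PF00144 Beta-lactamase > GO:response to antibiotic ; GO:0046677
Pfam:PF05139 Erythromycin_esterase > GO:response to antibiotic ; GO:0046677
Pfam:PF03023 MviN > GO:pathogenesis ; GO:0009405
"""


def parse_pfam2go(text):
    """Return {pfam_id: set of GO term names} from mapping lines."""
    mapping = {}
    for line in text.splitlines():
        if not line.startswith("Pfam:"):
            continue  # skip comments and blank lines
        left, right = line.split(" > ", 1)
        pfam_id = left.split()[0][len("Pfam:"):]
        term = right.split(" ; ")[0][len("GO:"):]
        mapping.setdefault(pfam_id, set()).add(term)
    return mapping


def virulence_profile(abundance, mapping, terms=VIRULENCE_TERMS):
    """Keep only families mapped to at least one virulence-related term."""
    return {
        pfam_id: value
        for pfam_id, value in abundance.items()
        if mapping.get(pfam_id, set()) & terms
    }


# Soil abundances from the Slide 22 table, plus one hypothetical unmapped family
soil = {
    "PF00144": 0.2757,
    "PF05139": 0.0240,
    "PF03023": 0.0459,
    "PF_UNMAPPED": 1.0,  # placeholder, not real data
}

hits = virulence_profile(soil, parse_pfam2go(PFAM2GO))
for pfam_id, value in sorted(hits.items()):
    print(pfam_id, value)
```

Any family without a mapping to one of the chosen GO terms is dropped, even if it is virulence-associated. That is the limitation raised on Slide 23.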