Problems with metagenome annotation

How much has been sequenced?
[Figure: number of known sequences vs. year, marking the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and environmental sequencing.]

If the database doubles every 15 months, how often do you need to rerun your sample?

Long Queues
MG-RAST speed does not depend on metagenome (MG) size!
Days to weeks vs. minutes to seconds

The SEED database
• Started with a few subsystems
• Grew to over 2,000 subsystems
• Unmanageable!
• Needed a solution so the annotators could find their subsystems
• Created a hierarchy

Over 2,000 Subsystems
Three-level “hierarchy”:
• Amino Acids and Derivatives
  – Alanine, serine, and glycine
    • Serine Biosynthesis
• Amino Acids and Derivatives
  – Lysine, threonine, methionine, and cysteine
    • Methionine Biosynthesis

Classification                                      # Subsystems
Experimental Subsystems                             498
Clustering-based subsystems                         352
Carbohydrates                                       160
Cofactors, Vitamins, Prosthetic Groups, Pigments    123
Amino Acids and Derivatives                         96
Protein Metabolism                                  95
Virulence, Disease, Defense                         70
Miscellaneous                                       70
RNA Metabolism                                      65
Membrane Transport                                  65
Respiration                                         62
Cell Wall and Capsule                               62
Regulation and Cell signaling                       51
Virulence                                           49
Stress Response                                     43
DNA Metabolism                                      41
Aromatic Compounds                                  38
Phages                                              36
Secondary Metabolism                                34
Iron acquisition and metabolism                     31
Nucleosides and Nucleotides                         24
Sulfur Metabolism                                   20
Dormancy and Sporulation                            17
Plant-prokaryote                                    12
Motility and Chemotaxis                             11
Plant cell walls and outer surfaces                 10
Phages                                              10
Cell Division and Cell Cycle                        10
Photosynthesis                                      9
Metabolite damage                                   8
Phosphorus Metabolism                               7
Potassium metabolism                                4
Transcriptional regulation                          2
Plasmids                                            2
Central metabolism                                  2
Autotrophy                                          2

FQ8D8DZ01AWR9I has one hit:
• xxx07431423 (fig|448385.11.peg.379), DNA-directed RNA polymerase beta' subunit (EC 2.7.7.6)
  – Subsystem: “RNA polymerase bacterial”

FQ8D8DZ02G8RSI has two hits:
• xxx02998721, e-value 3e-04, “hypothetical protein”
• xxx05921978, e-value 4e-03, “Fibrinogen-binding protein”
Fibrinogen-binding protein is in the subsystem “Streptococcus pyogenes virulome”.

FQ8D8DZ02GF820 has 207 hits: Glutamate synthase [NADPH] large chain (EC 1.4.1.13), which appears in these subsystems:
• Ammonia assimilation
• Ammonium metabolism H. pylori
• Glutamine, Glutamate, Aspartate and Asparagine Biosynthesis
• Iron-sulfur experimental

FQ8D8DZ02GF820 has 250 hits: does it matter?

Compare things that are the same!
Know which version of the database you used.
Recompute if you are not sure!
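The question of how often to rerun a sample when the database doubles every 15 months comes down to exponential growth: after t months the database holds S0 · 2^(t/15) sequences, where S0 is its size when the sample was last annotated. The sketch below assumes steady doubling at exactly that rate (the slides pose 15 months as a hypothetical); the function names and the "rerun once it has doubled" policy are illustrative assumptions.

```python
import math

# Assumed doubling time in months (the hypothetical figure from the slides).
DOUBLING_MONTHS = 15.0


def growth_factor(months_elapsed: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """How many times larger the database is after `months_elapsed` months."""
    return 2.0 ** (months_elapsed / doubling_months)


def months_until(factor: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months until the database reaches `factor` times its current size."""
    return doubling_months * math.log2(factor)


if __name__ == "__main__":
    for years in (1, 2, 5):
        print(f"after {years} year(s): {growth_factor(12 * years):.1f}x")
    # Hypothetical policy: rerun whenever the reference set has doubled.
    print(f"rerun every {months_until(2.0):.0f} months")
```

Five years is four doublings, so under this assumption a sample annotated five years ago was searched against roughly one sixteenth of the current reference set.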
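The three-level SEED classification maps naturally onto a nested mapping: top-level category, then second-level group, then subsystem. The sketch below holds only the two example paths from the slides; the dictionary layout and the helpers `path_to` and `count_by_level1` are illustrative assumptions, not SEED's actual storage format.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: level 1 -> level 2 -> list of subsystems.
# Only the two example paths from the slides are included.
HIERARCHY: Dict[str, Dict[str, List[str]]] = {
    "Amino Acids and Derivatives": {
        "Alanine, serine, and glycine": ["Serine Biosynthesis"],
        "Lysine, threonine, methionine, and cysteine": ["Methionine Biosynthesis"],
    },
}


def path_to(subsystem: str) -> Optional[Tuple[str, str, str]]:
    """Return (level 1, level 2, subsystem) for a subsystem, or None if absent."""
    for level1, groups in HIERARCHY.items():
        for level2, subsystems in groups.items():
            if subsystem in subsystems:
                return (level1, level2, subsystem)
    return None


def count_by_level1() -> Dict[str, int]:
    """Number of subsystems filed under each top-level category."""
    return {
        level1: sum(len(subs) for subs in groups.values())
        for level1, groups in HIERARCHY.items()
    }


if __name__ == "__main__":
    print(path_to("Methionine Biosynthesis"))
    print(count_by_level1())
```

Tallying subsystems per top-level category over the full hierarchy is one way a classification table like the one in these slides could be produced. For repeated lookups, a reverse index from subsystem to path built once would avoid the linear scan.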
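Read FQ8D8DZ02G8RSI shows why the annotation rule matters: its better-scoring hit (3e-04) is a "hypothetical protein", while the weaker hit (4e-03) is a role that belongs to a subsystem. A minimal sketch of two possible rules follows, assuming each hit carries an e-value, a role, and the subsystems that role belongs to. The `Hit` type and both rules are illustrative assumptions, not MG-RAST's documented algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hit:
    """One database hit for a read (fields are assumptions for this sketch)."""
    target: str
    evalue: float
    role: str
    subsystems: List[str] = field(default_factory=list)


def best_hit(hits: List[Hit]) -> Optional[Hit]:
    """Lowest e-value wins, whether or not its role is in any subsystem."""
    return min(hits, key=lambda h: h.evalue) if hits else None


def best_subsystem_hit(hits: List[Hit]) -> Optional[Hit]:
    """Lowest e-value among hits whose role belongs to at least one subsystem."""
    return best_hit([h for h in hits if h.subsystems])


if __name__ == "__main__":
    # The two hits reported for read FQ8D8DZ02G8RSI.
    hits = [
        Hit("xxx02998721", 3e-04, "hypothetical protein"),
        Hit("xxx05921978", 4e-03, "Fibrinogen-binding protein",
            ["Streptococcus pyogenes virulome"]),
    ]
    print(best_hit(hits).role)
    print(best_subsystem_hit(hits).role)
```

The first rule labels the read "hypothetical protein" and contributes nothing to any subsystem; the second counts it toward "Streptococcus pyogenes virulome". Neither is obviously right, which is why results computed under different rules should not be compared directly.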
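The closing advice (compare like with like, know the database version) can be enforced mechanically by storing the database version alongside every annotation result and refusing to compare results produced against different versions. A minimal sketch; `AnnotationRun`, `compare_runs`, and any version labels are hypothetical and not part of MG-RAST.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AnnotationRun:
    """Result of annotating one sample (fields are illustrative assumptions)."""
    sample: str
    db_version: str  # hypothetical label for the reference database release
    subsystem_counts: Dict[str, int]


def compare_runs(a: AnnotationRun, b: AnnotationRun) -> Dict[str, int]:
    """Per-subsystem difference (b minus a); refused unless both used the same database."""
    if a.db_version != b.db_version:
        raise ValueError(
            f"{a.sample} used {a.db_version} but {b.sample} used {b.db_version}; "
            "recompute both against the same database first"
        )
    keys = set(a.subsystem_counts) | set(b.subsystem_counts)
    return {k: b.subsystem_counts.get(k, 0) - a.subsystem_counts.get(k, 0) for k in keys}
```

Raising an error rather than warning makes "recompute if you are not sure" the default path: the comparison simply cannot be made until both samples have been annotated against the same release.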