15. DirtySecretsOfMetagenomes

Problems with
metagenome annotation
How much has been sequenced?
[Figure: number of known sequences plotted against year. Labels on the plot: environmental sequencing; first bacterial genome; 100 bacterial genomes; 1,000 bacterial genomes.]
If the database doubles every 15 months, how
often do you need to rerun your sample?
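As a rough back-of-the-envelope sketch of the question above: if the database doubles every 15 months and growth is steady, its size after t months is 2^(t/15) times the starting size. The function names here are illustrative and do not come from the slides; steady exponential growth is an assumption.

```python
import math

DOUBLING_MONTHS = 15  # doubling period stated on the slide


def growth_factor(months, doubling_months=DOUBLING_MONTHS):
    """Relative database size after `months`, assuming steady exponential growth."""
    return 2 ** (months / doubling_months)


def fraction_unseen(months, doubling_months=DOUBLING_MONTHS):
    """Share of the current database that did not exist at the last run."""
    return 1 - 1 / growth_factor(months, doubling_months)


def months_until_growth(factor, doubling_months=DOUBLING_MONTHS):
    """Months needed for the database to grow by `factor`."""
    return doubling_months * math.log2(factor)


for m in (6, 15, 30):
    print(f"{m:>2} months since last run: "
          f"{growth_factor(m):.2f}x larger, "
          f"{fraction_unseen(m):.0%} of sequences are new")
```

Under these assumptions, a sample last compared 15 months ago has never been searched against half of what is in the database today.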
Long Queues
MG-RAST speed is not dependent on MG size!
Days to weeks
Minutes to seconds
The SEED database
• Started with a few subsystems
• Over 2,000 subsystems
• Unmanageable!
• Needed a solution so the annotators could find their subsystems.
• Created hierarchy
Over 2,000 Subsystems
Three level “hierarchy”
• Amino Acids and Derivatives
  – Alanine, serine, and glycine
    • Serine Biosynthesis
• Amino Acids and Derivatives
  – Lysine, threonine, methionine, and cysteine
    • Methionine Biosynthesis
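One way to picture the three levels is as a nested mapping from category to subcategory to subsystem. The sketch below holds only the two branches shown on the slide; the variable and function names are hypothetical and say nothing about how the SEED stores its hierarchy internally.

```python
# Three levels: category -> subcategory -> list of subsystems.
# Only the two example branches from the slide are included.
HIERARCHY = {
    "Amino Acids and Derivatives": {
        "Alanine, serine, and glycine": ["Serine Biosynthesis"],
        "Lysine, threonine, methionine, and cysteine": ["Methionine Biosynthesis"],
    },
}


def find_subsystem(name, hierarchy=HIERARCHY):
    """Return every (level 1, level 2, subsystem) path whose subsystem matches `name`."""
    return [
        (level1, level2, subsystem)
        for level1, level2s in hierarchy.items()
        for level2, subsystems in level2s.items()
        for subsystem in subsystems
        if subsystem == name
    ]


print(find_subsystem("Methionine Biosynthesis"))
```

With this layout an annotator can look up a subsystem by name and see where it sits, instead of scanning a flat list of more than 2,000 entries.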
Classification                                    # Subsystems
Experimental Subsystems                           498
Clustering-based subsystems                       352
Carbohydrates                                     160
Cofactors, Vitamins, Prosthetic Groups, Pigments  123
Amino Acids and Derivatives                       96
Protein Metabolism                                95
Virulence, Disease, Defense                       70
Miscellaneous                                     70
RNA Metabolism                                    65
Membrane Transport                                65
Respiration                                       62
Cell Wall and Capsule                             62
Regulation and Cell signaling                     51
Virulence                                         49
Stress Response                                   43
DNA Metabolism                                    41
Aromatic Compounds                                38
Phages                                            36
Secondary Metabolism                              34
Iron acquisition and metabolism                   31
Nucleosides and Nucleotides                       24
Sulfur Metabolism                                 20
Dormancy and Sporulation                          17
Plant-prokaryote                                  12
Motility and Chemotaxis                           11
Plant cell walls and outer surfaces               10
Phages                                            10
Cell Division and Cell Cycle                      10
Photosynthesis                                    9
Metabolite damage                                 8
Phosphorus Metabolism                             7
Potassium metabolism                              4
Transcriptional regulation                        2
Plasmids                                          2
Central metabolism                                2
Autotrophy                                        2
FQ8D8DZ01AWR9I
One hit: xxx07431423 (fig|448385.11.peg.379)
DNA-directed RNA polymerase beta' subunit (EC 2.7.7.6)
RNA polymerase bacterial
FQ8D8DZ02G8RSI has two hits:
xxx02998721   3e-04   “hypothetical protein”
xxx05921978   4e-03   “Fibrinogen-binding protein”
Fibrinogen-binding protein is in subsystem “Streptococcus pyogenes virulome”
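The two-hit read above shows how the rule for picking a hit changes the annotation. The sketch below contrasts two possible rules, "lowest E-value" and "lowest E-value among hits that belong to a subsystem". Both rules, the `Hit` type, and the assumption that "hypothetical protein" belongs to no subsystem are illustrative; the slides do not say which rule any particular pipeline applies.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    subject_id: str
    evalue: float
    function: str
    subsystem: Optional[str]  # None = assumed to be in no subsystem


# The two hits for read FQ8D8DZ02G8RSI, as listed on the slide.
hits = [
    Hit("xxx02998721", 3e-04, "hypothetical protein", None),
    Hit("xxx05921978", 4e-03, "Fibrinogen-binding protein",
        "Streptococcus pyogenes virulome"),
]


def best_hit(hits):
    """Rule 1: take the hit with the lowest E-value, whatever its function."""
    return min(hits, key=lambda h: h.evalue)


def best_hit_with_subsystem(hits):
    """Rule 2: take the lowest E-value among hits that map to a subsystem."""
    candidates = [h for h in hits if h.subsystem is not None]
    return min(candidates, key=lambda h: h.evalue) if candidates else None


print(best_hit(hits).function)
print(best_hit_with_subsystem(hits).function)
```

Under rule 1 the read is annotated as a hypothetical protein and contributes to no subsystem; under rule 2 the same read counts toward the Streptococcus pyogenes virulome on the strength of a weaker hit.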
FQ8D8DZ02GF820
207 hits
Glutamate synthase [NADPH] large chain (EC 1.4.1.13)
• Ammonia assimilation
• Ammonium metabolism H. pylori
• Glutamine, Glutamate, Aspartate and Asparagine Biosynthesis
• Iron-sulfur experimental

FQ8D8DZ02GF820 has 250 hits:
Does it matter?

• Compare things that are the same!
• Know which version of the database you used
• Recompute if you are not sure!
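A minimal sketch of how these three rules could be enforced in practice: store the database name and version with every annotation run, and refuse to compare runs whose versions differ. The `AnnotationRun` class, its fields, and the version labels are hypothetical. The example reuses the 207 and 250 hit counts for read FQ8D8DZ02GF820 and assumes, purely for illustration, that they came from two different database versions.

```python
from dataclasses import dataclass


@dataclass
class AnnotationRun:
    sample: str
    database: str
    database_version: str  # hypothetical label recorded at annotation time
    hit_counts: dict       # read ID -> number of hits


def comparable(a, b):
    """Two runs are comparable only if database and version both match."""
    return (a.database, a.database_version) == (b.database, b.database_version)


def compare(a, b):
    """Pair up hit counts per read, refusing to mix database versions."""
    if not comparable(a, b):
        raise ValueError(
            "Runs used different database versions; recompute before comparing."
        )
    reads = sorted(set(a.hit_counts) | set(b.hit_counts))
    return {r: (a.hit_counts.get(r, 0), b.hit_counts.get(r, 0)) for r in reads}


# Version labels "v1" and "v2" are made up for this example.
old_run = AnnotationRun("sample A", "SEED", "v1", {"FQ8D8DZ02GF820": 207})
new_run = AnnotationRun("sample A", "SEED", "v2", {"FQ8D8DZ02GF820": 250})

try:
    compare(old_run, new_run)
except ValueError as err:
    print(err)
```

Raising an error rather than warning is a deliberate choice here: a silent comparison across versions would attribute the difference in hit counts to the samples instead of to the database.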