Metagenomic Analysis Using MEGAN4

Peter R. Hoyt, Director, OSU Bioinformatics Graduate Certificate Program
Matthew Vaughn, iPlant, University of Texas Super Computing Center

Introduction
In METAGENOMICS, the aim is to understand the composition and operation of complex microbial consortia in environmental samples through sequencing and analysis of their DNA. Similarly, metatranscriptomics and metaproteomics target the RNA and proteins obtained from such samples.
Technological advances in next-generation sequencing methods are fueling a rapid increase in the number and scope of environmental sequencing projects. In consequence, there is a dramatic increase in the volume of sequence data to be analyzed.
http://ab.inf.uni-tuebingen.de/software/megan/welcome.html

The Importance of Metagenomics is Driven by Sequencing Costs
The $100 Human Genome

Basic Computational Metagenomics
The first three basic computational tasks for such data are:
§ taxonomic analysis (“who is out there?”)
§ functional analysis (“what are they doing?”)
§ comparative analysis (“how do different samples compare?”)
This is an immense conceptual and computational challenge that MEGAN is designed to address.
Algorithms in Bioinformatics

MEGAN-4 USES
§ Taxonomic
§ Metagenomic
§ Metatranscriptomic
§ Metaproteomic
§ 16S rRNA sequences
§ Function/Gene Ontology (SEED)
§ Metabolomics/Pathway Analyses (KEGG)
§ Comparative Genomics

Getting started
Prepare a dataset for use with MEGAN:
1. First compare reads against a database of reference sequences,
e.g. a BLASTX search against the NCBI-NR database.
2. The reads file and the resulting BLAST file can be directly imported into MEGAN*.
MEGAN then performs automatic taxonomic or functional classification;
functional classification uses SEED, KEGG, or both.
3. Multiple datasets can be opened simultaneously for comparative views.
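Step 1 above could be scripted roughly as follows. This is only a sketch: it assumes the NCBI BLAST+ `blastx` program and a local protein database named `nr`, and the file names, E-value cutoff and thread count are hypothetical choices, not settings from the slides. The command is built and printed rather than executed.

```python
import shlex


def build_blastx_command(reads_fasta, database="nr",
                         output_path="reads.blastx",
                         evalue=1e-5, threads=4):
    """Return the argument list for a BLAST+ blastx run (nothing is executed).

    All default values are illustrative assumptions.
    """
    return [
        "blastx",
        "-query", reads_fasta,        # FASTA file of DNA reads
        "-db", database,              # protein reference database
        "-out", output_path,          # BLAST report to import alongside the reads
        "-evalue", str(evalue),       # hypothetical significance cutoff
        "-num_threads", str(threads),
    ]


command = build_blastx_command("reads.fasta")
print(shlex.join(command))
```

To actually run the search, the list could be passed to `subprocess.run(command, check=True)` on a machine where BLAST+ and the database are installed.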
[Figure: workflow diagram. Labels: Metagenomic sample; DNA-RNA-Protein; Raw Digital Data (example reads); BLAST against reference databases (nr, nt, Ref seq, pdb, rdb); MEGAN4; Comparative Data]

Taxonomic analysis
MEGAN can be used to interactively explore
the dataset. The figure shows the assignment of
reads to the NCBI taxonomy.
Each node is labeled by a taxon and the
number of reads assigned to the taxon.
The size of a node is scaled logarithmically to
represent the number of assigned reads.
Tree display options allow you to interactively
drill down to the individual BLAST hits and to
export all reads.
One can select a set of taxa and then use
MEGAN to generate different types of charts.
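The logarithmic node scaling mentioned above can be illustrated with a small sketch. The formula, the constants and the read counts are assumptions made for illustration; the slides do not give the exact scaling MEGAN uses.

```python
import math


def node_radius(read_count, min_radius=2.0, scale=3.0):
    """Radius that grows with log10 of the read count.

    min_radius and scale are hypothetical display constants.
    """
    if read_count < 0:
        raise ValueError("read_count must be non-negative")
    # +1 keeps the logarithm defined for nodes with zero assigned reads
    return min_radius + scale * math.log10(read_count + 1)


# Made-up read counts, for illustration only
assigned_reads = {"Bacteria": 10000, "Proteobacteria": 1000, "Archaea": 10}
for taxon, count in assigned_reads.items():
    print(f"{taxon}: {count} reads, radius {node_radius(count):.2f}")
```

With a logarithmic scale, a node with a thousand times more reads is drawn only a few times larger, so rare taxa stay visible next to dominant ones.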

Multiple Chart Options are Available

Functional analysis using the SEED classification
SEED[1] is a comparative genomics environment of curated genomic data. The
following figure shows a part of the SEED analysis of a marine metagenome
sample. MEGAN attempts to map each read to a
SEED functional role using the highest-
scoring BLAST protein match that has a
known functional role.
SEED rooted trees are “multi-labeled”
because different leaves may represent
the same functional role (if it occurs in
different types of subsystems).
The current complete SEED tree has
about 13,000 nodes.
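The read-to-role mapping described above can be sketched as follows. The tuple layout, the reference identifiers and the role names are hypothetical; this is not MEGAN's internal data model, only an illustration of "take the best-scoring hit that has a known role".

```python
def assign_functional_roles(hits, role_of_reference):
    """Assign each read the role of its highest-scoring hit with a known role.

    hits: iterable of (read_id, reference_id, bit_score) tuples.
    role_of_reference: dict mapping reference_id to a functional role.
    Reads whose hits all lack a known role are left out of the result.
    """
    best = {}
    for read_id, reference_id, bit_score in hits:
        role = role_of_reference.get(reference_id)
        if role is None:
            continue  # hit has no known functional role, so skip it
        if read_id not in best or bit_score > best[read_id][0]:
            best[read_id] = (bit_score, role)
    return {read_id: role for read_id, (_, role) in best.items()}


# Made-up example data
example_hits = [
    ("read1", "refA", 90.0),  # best score overall, but no known role
    ("read1", "refB", 80.0),
    ("read1", "refC", 50.0),
    ("read2", "refA", 70.0),  # only hit has no known role
]
example_roles = {"refB": "role X", "refC": "role Y"}
assignments = assign_functional_roles(example_hits, example_roles)
print(assignments)
```

Here `read1` is assigned through `refB`, not `refA`, because `refA` has the higher score but no known role.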
[1] http://www.theseed.org/wiki/Main_Page

Functional analysis using the KEGG classification
To perform a KEGG analysis, MEGAN
attempts to match each read to a KEGG
orthology (KO) accession number, using the
best hit to a reference sequence.
Reads are then assigned to enzymes and
pathways. The KEGG classification is
represented by a rooted tree whose leaves
represent pathways. See:
http://www.kegg.jp/kegg/pathway.html
Each pathway can also be inspected visually,
for example the citric acid cycle (shown).
These views provide inferences regarding the
cellular activities of a sample.
KEGG displays the different participating enzymes
as numbered rectangles. MEGAN shades
each such rectangle so as to indicate the
number of reads assigned to the
corresponding enzyme.
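A minimal sketch of the counting and shading idea follows. The KO and pathway identifiers are placeholders, not real KEGG accessions, and the linear 0 to 255 shading rule is an assumption; the slides do not say how MEGAN computes its shades.

```python
from collections import Counter


def count_reads_per_pathway(read_to_ko, ko_to_pathways):
    """Count reads per pathway; a KO may belong to several pathways."""
    counts = Counter()
    for ko in read_to_ko.values():
        for pathway in ko_to_pathways.get(ko, ()):
            counts[pathway] += 1
    return counts


def shade(count, max_count):
    """Map a read count to a 0-255 intensity (hypothetical linear rule)."""
    if max_count <= 0:
        return 0
    return round(255 * count / max_count)


# Placeholder identifiers, for illustration only
read_to_ko = {"read1": "KO_A", "read2": "KO_A", "read3": "KO_B"}
ko_to_pathways = {"KO_A": ["pathway_1", "pathway_2"], "KO_B": ["pathway_1"]}

pathway_counts = count_reads_per_pathway(read_to_ko, ko_to_pathways)
enzyme_counts = Counter(read_to_ko.values())
max_reads = max(enzyme_counts.values())
for ko, count in enzyme_counts.items():
    print(ko, count, shade(count, max_reads))
```

Because one KO can appear in several pathways, the pathway totals can add up to more than the number of reads.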

KEGG Pathways and examples
KEGG (Kyoto Encyclopedia of Genes and Genomes) “is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies”
KEGG is used to observe patterns in metabolic pathways, functional hierarchies, diseases, ortholog groups, genes and genomes. KEGG is heavily used by the metabolism community, and for comparative transcriptomics.
Here are some examples of the KEGG results from our metabolic samples. Do they suggest anything to you?

Comparative analysis using the SEED classification
MEGAN also supports the
simultaneous analysis and
comparison of the SEED functional
content of multiple metagenomes, or
multiple timepoints/samples (shown).
A comparative view of assignments
to a KEGG pathway is also possible.

Computational comparison of metagenomes
MEGAN's analysis window compares multiple datasets.
This enables creating distance matrices for a collection of datasets using
different ecological indices.
MEGAN supports a number of different
methods for calculating a distance matrix.
These can be visualized either using a split
network calculated using the neighbor-net
algorithm, or using a multi-dimensional
scaling plot. NeighborNet[1] is an algorithm that computes unrooted phylogenetic networks from molecular sequence data.
The figure shows a comparison of eight
marine datasets based on the taxonomic
content of the datasets and computed using
Goodall’s index.
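To make the idea of a distance matrix over datasets concrete, here is a small sketch. It swaps in the Bray-Curtis dissimilarity because it is simple to state; it is not Goodall's index used for the figure, and the sample names and counts are made up.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two taxon -> read-count dicts."""
    taxa = set(counts_a) | set(counts_b)
    numerator = sum(abs(counts_a.get(t, 0) - counts_b.get(t, 0)) for t in taxa)
    denominator = sum(counts_a.get(t, 0) + counts_b.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0


def distance_matrix(datasets):
    """Return (sorted dataset names, square matrix of pairwise distances)."""
    names = sorted(datasets)
    matrix = [[bray_curtis(datasets[a], datasets[b]) for b in names]
              for a in names]
    return names, matrix


# Made-up taxonomic profiles, for illustration only
datasets = {
    "sample1": {"taxonA": 6, "taxonB": 4},
    "sample2": {"taxonA": 4, "taxonB": 6},
    "sample3": {"taxonC": 10},
}
names, matrix = distance_matrix(datasets)
for name, row in zip(names, matrix):
    print(name, [f"{d:.2f}" for d in row])
```

A matrix like this is the input that a split network or a multi-dimensional scaling plot would then visualize.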
[1] Bryant and Moulton: Neighbor-net, an agglomerative method for the construction of phylogenetic networks. Molecular Biology and Evolution 21 (2003)

Comparative Taxonomic Visualization
MEGAN provides a comparison view that is based on a tree in which each node
shows the number of reads assigned to it for each of the datasets.
This can be done either as a pie chart, a bar chart or as a heat map.
Once the datasets are all individually opened,
MEGAN provides a “compare” dialog.
The following figure shows the taxonomic
comparison of all eight marine datasets.
Here, each node in the NCBI taxonomy is
shown as a bar chart indicating the number of
reads (normalized, if desired) from each
dataset assigned to the node.
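Normalizing read counts before comparing datasets of different sizes might look like the sketch below. Scaling every dataset to the size of the smallest one is an assumption chosen for illustration; the slides do not state which normalization MEGAN applies, and the counts are made up.

```python
def normalize_counts(datasets, target_total=None):
    """Rescale each dataset's counts so that all datasets share one total.

    datasets: dict mapping dataset name to a taxon -> read-count dict.
    target_total: common total; defaults to the smallest dataset's total
    (an illustrative choice, not necessarily what MEGAN does).
    """
    totals = {name: sum(counts.values()) for name, counts in datasets.items()}
    if any(total <= 0 for total in totals.values()):
        raise ValueError("every dataset needs at least one assigned read")
    if target_total is None:
        target_total = min(totals.values())
    return {
        name: {taxon: count * target_total / totals[name]
               for taxon, count in counts.items()}
        for name, counts in datasets.items()
    }


# Made-up counts, for illustration only
raw = {
    "sample1": {"taxonA": 50, "taxonB": 50},
    "sample2": {"taxonA": 150, "taxonB": 50},
}
normalized = normalize_counts(raw)
print(normalized)
```

After normalization, bars for the same node can be compared across datasets without the larger dataset dominating simply because it has more reads.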