Quantifying Environmental Adaptation of Metabolic Pathways in

Tara A. Gianoulis, Jeroen Raes
April 13, 2010
Presenter: Quan Zhang
 Introduction
 Data collection
 Three methods
 Linear Model (LM)
 Canonical correlation analysis (CCA)
 Discriminative partition matching (DPM)
 Results : three case studies
 Energy conversion strategies
 Balancing amino acid Synthesis vs. Import
 Lipid and Glycan metabolism
 Conclusion
 Discussion
 It is critical to understand:
 Environmental influence on microbial communities vs.
how microbes reshape their environment.
 Direct sequencing:
 First large-scale technique that allows us to see the
functions of these microbial communities
 Evidence for genomic adaptations:
 Comparative metagenomics approaches
 Sequence composition, genome size, evolutionary rates,
metabolic capabilities in different environments
A one-dimensional representation of the environmental metabolic profiles for microbes sampled from nine biomes.
Dinsdale EA, et al. (2008) Functional metagenomic profiling of nine biomes. Nature 452:629–632
 Previous studies used a rough, categorical definition of environment
 For example: marine vs. land
 This study treated environments explicitly as a set of
continuous features
 For example: temperature, sample depth …
 Define metabolic footprint of distinct environments
 Footprint: the set of metabolic pathways that depend
on or covary with the environment
 Data collection
 Global Ocean Survey (GOS) dataset: filter size 0.1-0.8
 Discard Sargasso Sea 11
 Remaining 37 sites from CAMERA
 Environmental features
 temperature, sample depth, water depth, salinity and
monthly average chlorophyll level
 Processing feature data
 Average the salinity over all nonzero values (excluding the freshwater site)
 corroborate the missing measurements using World Ocean
Assign the peptides to a particular site using a mapping algorithm
that cross-referenced between reads, scaffolds, and peptides based
on predicted gene coordinates.
The “multiple sites” peptide distribution is similar to the distribution of all
peptides, which implies that there are no major differences in assembly quality.
 Assign the peptide to a pathway
 Similarity search tool: BLASTP
 Database: STRING 7.0 (current version: STRING 8.2)
 Threshold: bit score > 60, 80% consistency among the top 5 hits
 Assign pathway frequency for each site
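The assignment rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the input format (a list of `(pathway, bitscore)` pairs per peptide), the function names, and the reading of "80% consistency" as "the winning pathway accounts for at least 80% of the retained top hits" are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def assign_pathway(hits, min_bitscore=60.0, top_n=5, min_agreement=0.8):
    """Return the consensus pathway for one peptide, or None.

    hits: list of (pathway, bitscore) pairs (hypothetical input format).
    """
    # Keep only hits above the bit-score threshold, best first, at most top_n.
    kept = sorted((h for h in hits if h[1] > min_bitscore),
                  key=lambda h: h[1], reverse=True)[:top_n]
    if not kept:
        return None
    pathway, count = Counter(p for p, _ in kept).most_common(1)[0]
    # Assumed reading of "80% consistency": share of the retained top hits.
    return pathway if count / len(kept) >= min_agreement else None


def pathway_frequencies(peptide_hits, peptide_site):
    """Per-site relative frequency of each assigned pathway."""
    counts = defaultdict(Counter)
    for peptide, hits in peptide_hits.items():
        pathway = assign_pathway(hits)
        if pathway is not None:
            counts[peptide_site[peptide]][pathway] += 1
    return {site: {p: n / sum(c.values()) for p, n in c.items()}
            for site, c in counts.items()}
```

Peptides without a sufficiently consistent assignment are simply dropped here; whether the original analysis handled them the same way is not stated on the slide.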
 Build two matrices
 Rows are sites, columns are environmental features
 Rows are sites, columns are metabolic features
 Determine the first order relationships between each pair of
metabolic and environmental features
 Two directions:
 Environmental factor as the response variable, predicted from a subset of
pathway frequencies
 Pathway frequency as the response variable, predicted from environmental features
 Determine the subset of predictive variables:
 Stepwise regression
 Akaike’s information criterion (AIC)
 Top 20 pathways showing the highest pairwise correlation were
 Limitation:
 Views each feature in isolation
 There are hidden dependencies among the environmental features
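As a rough illustration of stepwise regression guided by AIC, the sketch below greedily adds one predictor at a time to an ordinary least-squares model and stops when AIC no longer improves. It assumes forward selection only and a Gaussian error model; the slide does not say which stepwise variant was used, and the function names are hypothetical.

```python
import numpy as np


def aic(y, X):
    """AIC of an ordinary least-squares fit of y on X (X includes the intercept)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    # Gaussian log-likelihood up to an additive constant: n*log(RSS/n) + 2k.
    return n * np.log(rss / n) + 2 * k


def forward_stepwise(y, predictors):
    """Greedy forward selection; predictors has shape (n_sites, n_features).

    Returns the indices of the selected columns, in the order they were added.
    """
    n = len(y)
    selected = []
    design = np.ones((n, 1))  # start from the intercept-only model
    best = aic(y, design)
    while len(selected) < predictors.shape[1]:
        scores = {j: aic(y, np.column_stack([design, predictors[:, j]]))
                  for j in range(predictors.shape[1]) if j not in selected}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no remaining predictor lowers the AIC
        selected.append(j_best)
        design = np.column_stack([design, predictors[:, j_best]])
        best = scores[j_best]
    return selected
```

For the first direction, `y` would be one environmental feature and `predictors` the pathway-frequency matrix; for the second direction the roles swap. With only 37 sites, the number of selected predictors has to stay well below the number of sites for the fit to be meaningful.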
Predicting specific environmental parameters from subsets of metabolic pathways.
Gianoulis T A et al. PNAS 2009;106:1374-1379
©2009 by National Academy of Sciences
 Canonical correlation analysis (CCA)
 Determines whether a global relationship between
environmental and metabolic features exists
 Calculates the relative contribution of each feature to
the global relationship by weighting both sets of
features simultaneously.
 Discriminative partition matching (DPM)
 Analyzes whether groupings of sites based on similar
environmental features also shared functional
(pathway) similarities
 Looks at the relationship between two groups of variables
 species variables vs. environment variables (community ecology)
 genetic variables vs. [...] variables (population [...])
Given a linear combination of the X variables:
$F = f_1 X_1 + f_2 X_2 + \dots + f_p X_p$
and a linear combination of the Y variables:
$G = g_1 Y_1 + g_2 Y_2 + \dots + g_q Y_q$
The first canonical correlation is:
the maximum correlation coefficient between F and G, over all F and G.
$F_1 = \{f_{11}, f_{12}, \dots, f_{1p}\}$ and $G_1 = \{g_{11}, g_{12}, \dots, g_{1q}\}$
are the corresponding first canonical variates (dimensions).
The second canonical correlation is:
the maximum correlation coefficient between F and G, over all F orthogonal to $F_1$ and all G orthogonal to $G_1$.
$F_2 = \{f_{21}, f_{22}, \dots, f_{2p}\}$ and $G_2 = \{g_{21}, g_{22}, \dots, g_{2q}\}$
are the corresponding second canonical variates (dimensions).
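The canonical correlations defined above can be computed numerically. The sketch below uses a standard QR-plus-SVD route rather than anything taken from the paper. It assumes more sites (rows) than variables in each block and full column rank. That assumption would not hold for a full pathway matrix with only 37 sites, where some form of regularization or feature reduction would be needed.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X and Y (rows = sites).

    Assumes X and Y have more rows than columns and full column rank.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the two centered column spaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the cosines of the principal angles
    # between the two subspaces, i.e. the canonical correlations, largest first.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)
```

The first returned value corresponds to the first canonical correlation, the second to the second, and so on, up to min(p, q) values.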
Figure panels: amino acid; lipid synthesis and glycan; energy conversion
 For environmental metadata
 Cluster sites based on their quantitative environmental
 Two or more clusters
 For metabolism matrices
 Partition the sites in the metabolism matrices into 2 site sets
 Calculate the mean frequency of each pathway in each site set.
 If the means of the pathway frequencies between 2 site
sets were not significantly different:
environment-based partitioning does not reflect
functional differences
 If they do differ significantly:
environmental features are related to that specific
aspect of metabolism
 Specifically, the Benjamini-Hochberg procedure was used to
correct the p-values
When a two-sample t-test is performed on a gene, the p-value
measures how significantly the two groups differ.
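Because one test is run per pathway, the raw p-values need a multiple-testing correction. A minimal sketch of the Benjamini-Hochberg adjustment follows; the function name is hypothetical, and the p-values are taken as given (for example, from per-pathway two-sample t-tests between the two site sets).

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A pathway would then be reported as differing between the site sets when its adjusted p-value falls below the chosen false discovery rate.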
 Similarities
 Both are used to explore relationships between metabolism
and quantitative environmental parameters
 Differences
 DPM:
 All environmental variables are equally important when defining
the site sets
 Robust to noise
 May lose individual differences among sites and their relationships
to the environment
 CCA:
 Weights each environmental feature and each metabolic pathway
 More sensitive, but more susceptible to noise
•NMI stands for Normalized Mutual Information
•NMI attempts to determine how well one classification is able to predict the second
•If the NMI and transposed NMI scores are both high, then either classification is good at predicting the other.
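To make the idea of NMI and its transpose concrete, here is a small sketch. It assumes an asymmetric normalization, dividing the mutual information by the entropy of the classification being predicted; the slides do not state which normalization was used, so treat this as one plausible choice rather than the definition used in the study.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same items."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def nmi(a, b):
    """Fraction of the uncertainty in b that is removed by knowing a."""
    h = entropy(b)
    return mutual_information(a, b) / h if h > 0 else 0.0
```

Under this convention `nmi(a, b)` and the transposed score `nmi(b, a)` can differ: a fine-grained clustering can fully predict a coarse one while the reverse is not true.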
 Energy conversion strategies
 Balancing amino acid Synthesis vs. Import
 Lipid and Glycan metabolism
 Many of the environmentally-dependent pathways
were associated with energy conversion.
 Ample diversification in energy conversion strategies
 Helps organisms maintain adequate energy levels
despite changing environmental conditions
Figure label: light capture and electron [...]
 Phenomenon: Metabolic pathways associated with
amino acid and cofactor transport and metabolism
varied greatly with environment
 This variation may be a way to cope with the
oligotrophic (nutrient-limited) nature of the oceans
 Example: changes in amino acid uptake strategies
Amino acid uptake is sensitive to light availability, which could be an
additional factor in their variation.
Temperature and chlorophyll appear to have the strongest
influence on the metabolic pathways.
 Phenomenon: correlation of amino acid biosynthesis
pathways with the environment was unrelated to the
energetic cost of synthesizing a particular amino acid
 Significant positive correlation between the structural
correlation of the amino acid pathways and their
dependence on potentially limiting cofactors
 Import of exogenous amino acids may be preferred
when cofactors are limiting
 Methionine is a central amino acid in oceanic
 Cobalamin is a cobalt-containing cofactor required for methionine synthesis.
 Reduction of methionine is caused by cofactor
 Observation :
 synthesis of methionine and cobalamin
 amino acid transporters , methionine degradation
 Thus, methionine has a significant role in shaping
downstream environmental adaptations.
 Lipids and glycans are important components of the
microbial cell membrane
As expected, lipid and
glycan metabolism
were related to
Explanation: Depth contributed significantly to lipid metabolism, since microbes
need to maintain the optimal buoyancy for growth.
 This method associates microbial community
functions with quantitative, continuous features of the
 Metabolic pathway footprints can be used to predict
environmental conditions when those data are not
 Five environmental features alone (temperature,
sample depth, water depth, salinity and monthly
average chlorophyll level) cannot fully describe
real-world environmental complexity
 <0.3% of proteins in the GOS dataset were characterized
as viral, but the true fraction is expected to be much higher
 Other questions?