Tara A. Gianoulis, Jeroen Raes
April 13, 2010
Presenter: Quan Zhang
 Introduction
 Data collection
 Three methods
 Linear Model (LM)
 Canonical correlation analysis (CCA)
 Discriminative partition matching (DPM)
 Results : three case studies
 Energy conversion strategies
 Balancing amino acid Synthesis vs. Import
 Lipid and Glycan metabolism
 Conclusion
 Discussion
 It is critical to understand:
 Environmental influence on microbial communities VS.
how microbes reshape their environment.
 Direct sequencing:
 First large-scale technique that allows us to see the
functions of these microbial communities
 Evidence for genomic adaptations:
 Comparative metagenomics approaches
 Sequence composition, genome size, evolutionary rates,
metabolic capabilities in different environments
A one-dimensional
representation of the
environmental metabolic profiles
for microbes sampled from nine
environments.
Dinsdale EA, et al. (2008) Functional metagenomic profiling of
nine biomes. Nature 452:629–632
 Previous studies used a coarse definition of environment
 For example: marine vs. land
 This study treated environments explicitly as a set of
continuous features
 For example: temperature, sample depth …
 Define metabolic footprint of distinct environments
 Footprint: the set of metabolic pathways that depend on or covary with the environment
 Data collection
 Global Ocean Survey (GOS) dataset: filter size 0.1-0.8 µm
 Discard Sargasso Sea 11
 Remaining 37 sites from CAMERA
 Environmental features
 temperature, sample depth, water depth, salinity and
monthly average chlorophyll level
 Processing feature data
 Average the salinity over all nonzero measurements (except the freshwater site)
 Fill in or corroborate missing measurements using the World Ocean Database
Assign the peptides to a particular site using a mapping algorithm
that cross-referenced between reads, scaffolds, and peptides based
on predicted gene coordinates.
The peptide distribution for “multiple sites” is similar to the distribution of all peptides, which suggests there are no major differences in assembly quality
 Assign the peptide to a pathway
 Similarity search tool: BLASTP
 Database: STRING 7.0 (current version: STRING 8.2)
 Threshold: bit score > 60, 80% consistency among the top 5 hits
 Assign pathway frequency for each site
 Build two matrices
 Rows are sites, columns are environmental features
 Rows are sites, columns are metabolic features
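The assignment and counting steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the reading of "80% consistency" as "at least 80% of the retained top-5 hits name the same pathway", the normalization of counts into frequencies, and all function, site, and pathway names are hypothetical.

```python
from collections import Counter, defaultdict

def assign_pathway(hits, min_bitscore=60, top_n=5, min_agreement_pct=80):
    """Return the consensus pathway for one peptide, or None.

    hits: list of (pathway, bitscore) tuples, one per BLASTP hit.
    """
    # Keep hits above the bit-score threshold, best first, then take the top N.
    kept = sorted((h for h in hits if h[1] > min_bitscore),
                  key=lambda h: h[1], reverse=True)[:top_n]
    if not kept:
        return None
    pathway, count = Counter(p for p, _ in kept).most_common(1)[0]
    # Integer comparison avoids floating-point edge cases at exactly 80%.
    if count * 100 >= min_agreement_pct * len(kept):
        return pathway
    return None

def pathway_frequencies(peptides):
    """Build {site: {pathway: frequency}} from (site, hits) pairs.

    Frequency = count / number of assigned peptides at that site
    (this normalization is an assumption).
    """
    counts = defaultdict(Counter)
    for site, hits in peptides:
        pathway = assign_pathway(hits)
        if pathway is not None:
            counts[site][pathway] += 1
    return {site: {p: c / sum(ctr.values()) for p, c in ctr.items()}
            for site, ctr in counts.items()}

# Toy input; site and pathway names are made up.
peptides = [
    # 4 of 5 hits agree, so the peptide is assigned to photosynthesis.
    ("site_A", [("photosynthesis", 120), ("photosynthesis", 95),
                ("photosynthesis", 90), ("photosynthesis", 88),
                ("lipid_metabolism", 70)]),
    # Both retained hits agree.
    ("site_A", [("lipid_metabolism", 150), ("lipid_metabolism", 61)]),
    # Only 1 of 2 hits agree, so the peptide is left unassigned.
    ("site_A", [("photosynthesis", 80), ("lipid_metabolism", 75)]),
    # The only hit is below the bit-score threshold.
    ("site_B", [("photosynthesis", 55)]),
]
freqs = pathway_frequencies(peptides)
print(freqs)
```

Each row of the metabolic matrix would then be one site's frequency dictionary laid out over a fixed pathway order; the environmental matrix uses the same site order.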
 Determine the first order relationships between each pair of
metabolic and environmental features
 Two directions:
 An environmental factor as the response variable, predicted from a subset of pathway frequencies
 A pathway frequency as the response variable, predicted from the environmental factors
 Determine the subset of predictive variables:
 Stepwise regression
 Akaike’s information criterion (AIC)
 Top 20 pathways showing the highest pairwise correlation were
used
 Limitation:
 Views each feature in isolation
 There are hidden dependencies among the environmental features
Ref: http://en.wikipedia.org/wiki
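As a rough illustration of stepwise regression guided by AIC, the sketch below does greedy forward selection with ordinary least squares in NumPy. The slides do not say whether the selection was forward, backward, or bidirectional, so the forward direction is an assumption, as are the function names and the synthetic data.

```python
import numpy as np

def aic_ols(X, y):
    """AIC of a least-squares fit of y on X (X already includes the intercept)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    # Gaussian log-likelihood form of AIC, up to an additive constant.
    return n * np.log(rss / n) + 2 * k

def forward_stepwise(X, y):
    """Greedily add the column of X that lowers AIC most; stop when none does."""
    n = X.shape[0]
    intercept = np.ones((n, 1))
    selected = []
    best_aic = aic_ols(intercept, y)
    while len(selected) < X.shape[1]:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        scores = {j: aic_ols(np.hstack([intercept, X[:, selected + [j]]]), y)
                  for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best_aic:
            break  # no candidate improves the criterion
        selected.append(j_best)
        best_aic = scores[j_best]
    return selected, best_aic

# Synthetic example: 37 "sites", 4 candidate predictors, only column 1 matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(37, 4))
y = 3.0 * X[:, 1] + 0.1 * rng.normal(size=37)
selected, aic = forward_stepwise(X, y)
print("selected columns:", selected)
```

In the setting of the slides, the columns of X would be the 20 pre-filtered pathway frequencies and y one environmental feature, or the reverse for the other direction.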
Predicting specific environmental parameters from subsets of metabolic pathways.
Gianoulis T A et al. PNAS 2009;106:1374-1379
©2009 by National Academy of Sciences
 Canonical correlation analysis (CCA)
 Determines whether a global relationship between
environmental and metabolic features exists
 Calculates the relative contribution of each feature to
the global relationship by weighting both sets of
features simultaneously.
 Discriminative partition matching (DPM)
 Analyzes whether groupings of sites based on similar
environmental features also shared functional
(pathway) similarities
 Looks at the relationships between two groups of variables
 species variables vs. environment variables (community ecology)
 genetic variables vs. environmental variables (population genetics)
[Figure: data matrix with units as rows and two groups of variables, X’s and Y’s, as columns]
Ref: http://myweb.dal.ca/hwhitehe/BIOL4062/redundancy.ppt
Given a linear combination of X variables:
$F = f_1 X_1 + f_2 X_2 + \dots + f_p X_p$
and a linear combination of Y variables:
$G = g_1 Y_1 + g_2 Y_2 + \dots + g_q Y_q$

The first canonical correlation is:
the maximum correlation coefficient between $F$ and $G$, over all $F$ and $G$.
$F_1 = \{f_{11}, f_{12}, \dots, f_{1p}\}$ and $G_1 = \{g_{11}, g_{12}, \dots, g_{1q}\}$
are the corresponding first canonical variates (dimensions).

The second canonical correlation is:
the maximum correlation coefficient between $F$ and $G$, over all $F$ orthogonal to $F_1$ and all $G$ orthogonal to $G_1$.
$F_2 = \{f_{21}, f_{22}, \dots, f_{2p}\}$ and $G_2 = \{g_{21}, g_{22}, \dots, g_{2q}\}$
are the corresponding second canonical variates (dimensions).
Ref: http://myweb.dal.ca/hwhitehe/BIOL4062/redundancy.ppt
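The definition above can be made concrete with a small NumPy sketch. It uses the standard QR-plus-SVD route to the canonical correlations and is not necessarily how the paper computed them. It assumes more sites than features in each block and full-rank data; with many pathways and only 37 sites a regularized variant would be needed. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def cca(X, Y):
    """Canonical correlations and weights for X (n x p) and Y (n x q).

    Assumes n > p, n > q and full column rank in both blocks.
    """
    Xc = X - X.mean(axis=0)          # center each variable
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)        # orthonormal bases for each block
    Qy, Ry = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    f = np.linalg.solve(Rx, U)       # column i holds the weights of F_(i+1)
    g = np.linalg.solve(Ry, Vt.T)    # column i holds the weights of G_(i+1)
    return np.clip(s, 0.0, 1.0), f, g

# Synthetic example: one Y variable is almost a linear function of X.
rng = np.random.default_rng(0)
X = rng.normal(size=(37, 3))
Y = np.column_stack([
    X[:, 0] - 2.0 * X[:, 2] + 0.01 * rng.normal(size=37),
    rng.normal(size=37),
])
corrs, f, g = cca(X, Y)

Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
F1, F2 = Xc @ f[:, 0], Xc @ f[:, 1]  # first and second canonical variates of X
G1 = Yc @ g[:, 0]                    # first canonical variate of Y
print("canonical correlations:", corrs)
```

The correlation between F1 and G1 reproduces the first canonical correlation, and F2 is uncorrelated with F1, which is the orthogonality condition in the definition.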
[Figure: pathway groups labeled amino acid metabolism, lipid synthesis and glycan metabolism, and energy conversion]
 For environmental metadata
 Cluster sites based on their quantitative environmental
metadata
 Two or more clusters
 For metabolism matrices
 Partition the sites in the metabolism matrices into 2 site sets
 Calculate the mean frequency of each pathway in each site set.
 If the means of the pathway frequencies between 2 site
sets were not significantly different:
environment-based partitioning does not reflect
functional differences
 If they do differ significantly:
environmental features are related to that specific
aspect of metabolism
 Specifically, the Benjamini-Hochberg procedure was used to correct the p-values for multiple testing
When a two-sample t-test is performed on a gene, the p-value measures how significant the difference between the two groups of samples is.
Ref: http://www.silicongenetics.com/Support/GeneSpring/GSnotes/analysis_guides/mtc.pdf
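The multiple-testing correction can be sketched in plain Python. The sketch assumes the per-pathway p-values have already been obtained from two-sample t-tests between the two site sets; the p-values, the 0.05 cutoff, and the function name are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values, one per pathway.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
print(adj, significant)
```

Pathways whose adjusted p-value stays below the chosen cutoff are the ones for which the environment-based partition reflects a functional difference.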
 Similarities
 Both are used to explore relationships between metabolism
and quantitative environmental parameters
 Differences
 DPM
 All environmental variables are equally important when defining
the site sets
 Robust to noise
 May lose individual differences among sites and their relationships
to the environment
 CCA
 Weights each environmental feature and each metabolic pathway
independently
 More sensitive, but more susceptible to noise
• NMI stands for Normalized Mutual Information.
• NMI attempts to determine how well one classification is able to predict the second classification.
• If the NMI and transposed NMI scores are high, then either classification is good at predicting the other.
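A small Python sketch of NMI between two site classifications follows. The slides do not give the normalization, so dividing the mutual information by the entropy of the predicted classification is an assumption; it is chosen because it makes the score asymmetric, which matches the mention of a "transposed" NMI. Function names and labels are hypothetical.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (natural log) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two labelings of the same items."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
    return sum((c / n) * log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())

def nmi(a, b):
    """I(A;B) / H(B): how well labeling a predicts labeling b (assumed form)."""
    h = entropy(b)
    return mutual_information(a, b) / h if h > 0 else 0.0

# Four sites: a fine-grained partition and a coarser one it refines.
fine = [0, 1, 2, 3]
coarse = [0, 0, 1, 1]
forward = nmi(fine, coarse)      # fine fully determines coarse
transposed = nmi(coarse, fine)   # coarse only partly determines fine
print(forward, transposed)
```

With these toy labels the forward score is 1 and the transposed score is 0.5, so a high score in only one direction means only one classification predicts the other.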
 Energy conversion strategies
 Balancing amino acid Synthesis vs. Import
 Lipid and Glycan metabolism
 Many of the environmentally-dependent pathways
were associated with energy conversion.
 Ample diversification in energy conversion strategies
observed
 Helps organisms maintain adequate energy levels
despite changing environmental conditions
[Figure labels: light capture and electron transport; ATP synthase]
 Phenomenon: Metabolic pathways associated with
amino acid and cofactor transport and metabolism
varied greatly with environment
 This variation may be a way to cope with the
oligotrophic (nutrient-limited) nature of the oceans
 Example: changes in amino acid uptake strategies
Amino acid uptake is sensitive to light availability, which could be an additional factor in this variation.
Temperature and chlorophyll had the strongest influence on the metabolic pathways.
 Phenomenon: correlation of amino acid biosynthesis
pathways with the environment was unrelated to the
energetic cost of synthesizing a particular amino acid
 Significant positive correlation between the structural
correlation of the amino acid pathways and their
dependence on potentially limiting cofactors
 Import of exogenous amino acids may be preferred
when cofactors are limiting
 Methionine is a central amino acid in oceanic
microorganisms.
 Cobalamin, a cofactor in methionine synthesis, contains cobalt.
 Reduction of methionine is caused by cofactor limitation.
 Observation:
 synthesis of methionine and cobalamin
 amino acid transporters, methionine degradation
 Thus, methionine has a significant role in shaping downstream environmental adaptations.
 Lipid & glycans are important components in
microbial cell membrane
As expected, lipid and glycan metabolism were related to environmental conditions.
Explanation: Depth contributed significantly to lipid metabolism, since microbes need to maintain the optimal buoyancy for growth.
 This method associates microbial community
functions with quantitative, continuous features of the
environment
 Metabolic pathway footprints can be used to predict
environmental conditions when those data are not
available
 Five environmental features (temperature, sample depth, water depth, salinity, and monthly average chlorophyll level) cannot fully describe real-world environmental complexity
 <0.3% of proteins in the GOS dataset were characterized as viral, but the true fraction is expected to be much higher
 Other questions?