METABOLOMICS
File name Metabolic Profiling and Metabolomics 2011
DEFINITIONS & BACKGROUND
 Metabolome = the total metabolite pool (by analogy to genome, transcriptome, proteome).
Terminology summary – Fig. 1
The term 'metabolome' refers to the entire complement of all the low molecular weight
metabolites in a sample such as a leaf, fruit, or tuber. Low molecular weight metabolites are
small organic molecules that include sugars, amino acids, organic acids, sugar phosphates,
cofactors, secondary products, etc.
 Metabolomics (more-or-less synonymous with metabolic profiling, metabolic phenotyping)
= high-throughput analysis of metabolites.
Metabolomics is the simultaneous ('multiparallel') measurement of the levels of a large number
of cellular metabolites (typically several hundred). Many of these are not identified (i.e. are just
peaks in a profile).
Metabolomics analysis is like a snapshot, showing which compounds are present and at what
relative levels at a specific time point.
More generally, metabolomics refers to a holistic analytical approach to metabolism that is not
guided by specific hypotheses. Instead, metabolomics sets out to determine how (in principle,
all) metabolite levels respond to genetic or environmental changes and, from the data, to generate
new hypotheses.
 Fluxomics – Branch of metabolomics that measures the turnover of metabolites in pathways
using isotopic labels such as 13C. Fluxomics is just beginning; instead of a snapshot of
metabolism, it provides a movie.
 History & Development
Metabolic profiling is not new. Profiling for clinical detection of human disease using blood and
urine samples has been carried out for >30 years. Chuck Sweeley at MSU pioneered this, using
gas chromatography/mass spectrometry (GC-MS). See:
Gates SC, Sweeley CC (1978) Quantitative metabolic profiling based on gas chromatography.
Clin Chem 24:1663-73. Quantitative metabolic profiles of volatilizable components of human
biological fluids, e.g. urinary organic acids, were established using GC/MS. Data were
processed by computer and statistical methods for analyzing metabolic profiles were developed.
[Note that all the elements of metabolic profiling are here.]
Plant metabolic biochemists (e.g. Lothar Willmitzer) were also among the early leaders in the field.
Metabolomics is expanding to catch up with other multiparallel analytical techniques
(transcriptomics, proteomics) but remains far less developed and less accessible.
 Plant Metabolome Size
For all plant species together, this is estimated to be 90,000-200,000 compounds. There are far
fewer in any one species, e.g. ~5,000 in Arabidopsis.
The plant metabolome is much larger than that of yeast, where there are far fewer metabolites
than genes or proteins (<600 metabolites vs. 6000 genes). The size of the plant metabolome
reflects the vast array of plant secondary compounds. This makes metabolic profiling in plants
much harder than in other organisms.
 Metabolomics Compared to Genomics, Transcriptomics & Proteomics
Differences between metabolomics and the other multiparallel approaches:
(a) Conceptual:
1 Gene → 1 mRNA → 1 Protein → Many Metabolites
(and conversely: Many Proteins → 1 Metabolite)
There is no direct relationship between metabolite and gene in the way there is between genes
and mRNAs and proteins. A single gene does not specify the level of a single metabolite, i.e. its
pool size (although it may determine whether the metabolite is present or absent).
Rather, as MCA teaches, the level of a metabolite is determined by the activities of all the
enzymes of all the pathways that involve that metabolite, and by effectors that act on these
enzymes. In practice, therefore, metabolite levels change according to developmental,
physiological, and pathological states.
Biological variance in metabolite levels (i.e., the variation between genetically identical plants
grown in the same conditions) is accordingly large – about 10× the analytical variability – and
limits the resolution of metabolomics.
(b) Chemical: Unlike nucleic acids and proteins, metabolites have a vast range of chemical
structures and properties. Their molecular weights span two orders of magnitude (30–3000 Da).
Therefore no single extraction or analysis method works for all metabolites. (By contrast, DNA
sequencing, microarrays, and MS analysis of proteins are all general methods.)
(c) Dynamic: Many metabolite levels change with half-times of minutes or seconds – far faster
than those of nucleic acids or proteins. Thus valuable information is lost if sampling times are too far
apart. Also, drastic artifactual changes can occur in the short interval between harvest and
extraction; this adds to biological variance.
 Power of Metabolomics – Metabolomics analysis can powerfully complement
transcriptomics and proteomics. Metabolomes are a step nearer actual function.
Transcriptomes or proteomes are very inadequate monitors of cell function because there is no
simple relationship between mRNA or protein levels and metabolism.
Thus changes in mRNA level or protein level in mutants or transgenics are usually not closely
linked to changes in metabolic function or phenotype as a whole.
Part of the reason for this is the non-linear relation between mRNA and protein levels (Fig. 2)
and the typically hyperbolic relation between enzyme level and in vivo flux rate (see MCA
class). Another cause is the high level of functional redundancy in plant metabolism – i.e.
parallel or alternative pathways for the same process.
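The hyperbolic enzyme-flux relation can be made concrete with a small sketch. The functional form and the half-saturation constant `k = 0.1` are assumptions chosen for illustration (they are not values from these notes), and `relative_flux` is a hypothetical helper name. It shows how a large drop in enzyme level can leave flux almost unchanged.

```python
def relative_flux(enzyme_level, k=0.1):
    """Flux relative to wild type for an assumed hyperbolic enzyme-flux relation.

    enzyme_level is expressed relative to wild type (wild type = 1.0);
    k is an assumed half-saturation constant in the same relative units.
    The expression is normalized so that relative_flux(1.0) == 1.0.
    """
    return enzyme_level * (1.0 + k) / (enzyme_level + k)


# Flux remaining as the enzyme level is reduced step by step.
for level in (1.0, 0.5, 0.25, 0.1):
    print(f"enzyme at {level:4.0%} of wild type -> flux at {relative_flux(level):.0%}")
```

Under these assumptions, halving the enzyme level lowers the flux by less than 10%, which is the kind of behavior that makes mRNA or protein changes poor predictors of metabolic phenotype.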
Silent Knockout Mutations. ~90% of Arabidopsis knockout mutations are silent – i.e. have no
visible phenotype and so provide no clues to gene function. (The search for some sort of visible
phenotype therefore often becomes desperate.) The situation in yeast is similar – up to 85% of
yeast genes are not needed for survival.
When there is little or no change in growth rate (visible phenotype) of a knockout mutant, the
pool sizes of metabolites have altered so as to compensate for the effect of the mutation, leaving
metabolic fluxes unchanged. Thus – intuitively – mutations that are silent when scored for
metabolic fluxes or growth rate (growth rate is the sum of all metabolic fluxes) should have
obvious effects on metabolite levels. There is a firm theoretical basis for this in MCA.
Example. In the Chloroplast 2010 project (phenotype analysis of knockouts of Arabidopsis
genes encoding predicted chloroplast proteins):
Fig. 3 – Various knockouts showed essentially normal growth and color but highly abnormal
free amino acid profiles, e.g. At1g50770 (‘Aminotransferase-like’)
METABOLIC PROFILING METHODS
 Sample Preparation
Metabolites are typically extracted in aqueous or methanolic media, then fractionated into
lipophilic and polar phases that are then analyzed separately. Further fractionation of each phase
may follow to split metabolites into classes prior to analysis.
No single extraction procedure works for all metabolites because conditions that stabilize one
type of compound will destroy other types or interfere with their analysis. Therefore the
extraction protocol has to be tailored to the metabolites to be profiled.
In practice, these considerations mean that metabolic profiling is often confined to fairly stable
compounds that can be extracted together. These include major primary metabolites (sugars,
sugar phosphates, amino acids, and organic acids) and certain secondary metabolites (e.g.,
phenylpropanoids, alkaloids).
The most comprehensive profiling can cover several hundred such compounds, many of which
are unidentified. Many crucial metabolites, particularly minor or unstable ones, are currently
being missed in metabolomics analyses.
 Main Analytical Techniques
• Gas Chromatography/Mass-Spectrometry (GC/MS)
In GC/MS, the sample is first derivatized to increase metabolite stability and volatility. The
derivatized mix is then fractionated by a gas chromatograph that is coupled to a mass
spectrometer.
The mass spectrometer scans the peaks emerging from the GC column at frequent intervals (~1
sec) and so acquires the mass spectrum of each peak, from which peaks can be identified and
quantified. Mass spectrometry ‘weighs’ ionized individual molecules and their fragments.
Molecules are identified from their fragmentation pattern and ‘weights’ (mass/charge ratios –
m/z values), with the help of mass spectra libraries, and can be quantified from peak size.
Overlapping peaks can be deconvoluted because the spectra of their constituents are distinct
(Fig. 4).
Unfortunately, knowing only the exact masses of molecules and their fragments is not enough to
identify them. A huge number of chemical structures can have the same exact mass. This is why
libraries of retention times and mass spectra, determined for standard compounds, are critical.
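As an illustration of why exact mass alone is ambiguous, the sketch below computes monoisotopic masses from elemental formulas. Glucose/fructose and leucine/isoleucine are chosen here as familiar isomer pairs (they are not examples given in these notes), and `monoisotopic_mass` is a hypothetical helper.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def monoisotopic_mass(formula):
    """Sum the isotope masses for a formula given as {element: atom count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# Isomers share an elemental formula, so they share an exact mass.
compounds = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "fructose": {"C": 6, "H": 12, "O": 6},
    "leucine": {"C": 6, "H": 13, "N": 1, "O": 2},
    "isoleucine": {"C": 6, "H": 13, "N": 1, "O": 2},
}

for name, formula in compounds.items():
    print(f"{name:12s} {monoisotopic_mass(formula):.4f} Da")
```

Each isomer pair prints an identical mass, so only retention time and fragmentation pattern, checked against standards, can tell the members apart.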
The major challenge for metabolomics is identification of unknown peaks. Basically, standards
are essential to the process. If there is no standard, a compound cannot be identified with
certainty. Thus, the more novel the compound, the less powerful metabolomics becomes.
Mass spectrometry (MS) metabolomic datasets provide relative quantification of cellular
metabolites (i.e. fold changes in levels between different samples). Absolute quantification (i.e.
moles per weight of tissue) is possible with MS methods but requires an authentic standard for
each metabolite to be quantified.
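The difference between relative and absolute quantification can be sketched as follows. All peak areas and amounts are made-up illustrative numbers, the function names are hypothetical, and the single-point calibration assumes a detector response that is linear and passes through zero.

```python
def fold_change(peak_area_sample, peak_area_reference):
    """Relative quantification: ratio of peak areas for the same metabolite."""
    return peak_area_sample / peak_area_reference


def absolute_content(peak_area, standard_area, standard_nmol, fresh_weight_g):
    """Absolute quantification (nmol per g fresh weight).

    Uses a single-point calibration against an authentic standard,
    assuming a linear response through the origin.
    """
    area_per_nmol = standard_area / standard_nmol
    return peak_area / area_per_nmol / fresh_weight_g


# Illustrative peak areas for one metabolite in a mutant and in the wild type.
mutant_area, wild_type_area = 4.0e6, 1.0e6
print("fold change:", fold_change(mutant_area, wild_type_area))

# The same mutant peak, calibrated with 10 nmol of standard (area 2.0e6)
# and 0.1 g fresh weight of extracted tissue.
print("nmol per g FW:", absolute_content(mutant_area, 2.0e6, 10.0, 0.1))
```

The fold change needs no standard at all, whereas the absolute value depends entirely on the calibration for that particular metabolite.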
Animated explanation of GC/MS: http://www.shsu.edu/~chm_tgc/sounds/flashfiles/GCMS.swf
Tutorial on MS: http://www.asms.org/whatisms/page_index.html
• Liquid Chromatography/Mass-Spectrometry (LC/MS)
In LC/MS (also termed high-performance liquid chromatography/MS, HPLC/MS), the samples are
not derivatized before analysis and an HPLC instrument is used for separation. LC/MS is more
suitable than GC/MS for labile compounds, for those that are hard to derivatize, or hard to render
volatile. LC/MS is less developed than GC/MS. A closely related method is capillary
electrophoresis (CE)/MS.
Fig. 5 – Profiling example: Metabolites related to plant isoprenoid biosynthesis. The total ion
chromatogram (TIC) is the total output of the ion detector; the extracted ion chromatograms
(EICs) are the outputs for particular ions characteristic of isoprenoid synthesis intermediates.
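The relation between the TIC and the EICs can be sketched with a toy data set. Each scan is represented as a mapping from m/z to intensity; the m/z values and intensities are invented for illustration and the function names are hypothetical.

```python
# Four consecutive scans; each maps m/z value -> ion intensity (made-up data).
scans = [
    {73: 100.0, 147: 20.0},
    {73: 150.0, 147: 60.0, 217: 10.0},
    {73: 90.0, 147: 200.0, 217: 40.0},
    {73: 80.0, 147: 50.0, 217: 5.0},
]


def total_ion_chromatogram(scans):
    """TIC: summed intensity of all ions in each scan."""
    return [sum(scan.values()) for scan in scans]


def extracted_ion_chromatogram(scans, mz):
    """EIC: intensity of one chosen ion in each scan (0 if not detected)."""
    return [scan.get(mz, 0.0) for scan in scans]


print("TIC:        ", total_ion_chromatogram(scans))
print("EIC m/z 147:", extracted_ion_chromatogram(scans, 147))
print("EIC m/z 217:", extracted_ion_chromatogram(scans, 217))
```

An ion that is characteristic of one compound gives a clean EIC peak even where the TIC shows several co-eluting compounds.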
• Nuclear Magnetic Resonance (NMR) Spectroscopy
Advantages of NMR over MS:
- NMR does not destroy the sample
- NMR can both detect and quantify metabolites, because the signal intensity is determined only by the
molar concentration
- NMR can provide comprehensive structural information, including stereochemistry
Many atoms have nuclei that are NMR active, but most NMR data are collected for 1H and 13C
since these are present in all organic molecules.
The main weakness of NMR is low sensitivity relative to MS. It is therefore less suited for
analysis of trace compounds. As the natural abundance of 13C is only 1.1%, 13C-NMR is less
sensitive than 1H-NMR. Recent developments have considerably increased sensitivity, making it
less of a problem.
NMR uses radio-frequency (RF) radiation and magnetic fields. RF radiation is used to stimulate
nuclei present within molecules. The information obtained is displayed as a spectrum. The
horizontal axis is the chemical shift (delta, in units of ppm), which is a measure of the position at
which RF absorption occurs relative to an internal standard (tetramethylsilane, TMS). The
vertical axis is the intensity of the absorption. As with other spectral techniques, compounds
have characteristic spectra. More than 100 metabolites occur in plants at levels high enough for
analysis by NMR, so NMR spectra of mixtures contain many peaks.
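The chemical shift scale can be illustrated with the standard definition: the frequency offset from the reference, divided by the spectrometer frequency, expressed in parts per million. The frequencies below are illustrative values and `chemical_shift_ppm` is a hypothetical helper.

```python
def chemical_shift_ppm(signal_hz, reference_hz, spectrometer_hz):
    """Chemical shift (delta, ppm) of a signal relative to the reference (TMS)."""
    return (signal_hz - reference_hz) / spectrometer_hz * 1e6


# A 1H signal 1000 Hz downfield of TMS, taking the TMS signal as 0 Hz.
for spectrometer_mhz in (500.0, 250.0):
    delta = chemical_shift_ppm(1000.0, 0.0, spectrometer_mhz * 1e6)
    print(f"{spectrometer_mhz:.0f} MHz spectrometer: delta = {delta:.2f} ppm")
```

Dividing by the spectrometer frequency makes shifts recorded on different instruments directly comparable, which is what allows spectra to be matched against reference data.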
Fig. 6 – Profiling example: 1H-NMR spectra of extracts of leaves of various Verbascum species
(medicinal plants)
Signal overlap is a problem in the complex spectra of plant extracts. Signal overlap hampers
metabolite identification and quantification. Better signal resolution can be obtained using
various types of 2D NMR spectroscopy. These approaches cut signal overlap by spreading the
resonances in a second dimension.
Example: Heteronuclear single quantum coherence (HSQC) spectroscopy. The 2D spectrum has
one axis for 1H and the other for a heteronucleus (an atomic nucleus other than a proton), usually
13C or 15N. The spectrum contains a peak for each unique proton attached to the heteronucleus
being considered.
Fig. 7 – HSQC used to select for protons directly bonded to 13C. (a) 1D 1H NMR spectrum of an
equimolar mixture of the 26 small-molecule standards. (b) 2D 1H–13C HSQC NMR spectra of
the same synthetic mixture overlaid onto a spectrum of aqueous whole-plant extract from
Arabidopsis. Note the greatly improved resolution.
NMR tutorial: http://www.cis.rit.edu/htbooks/nmr/
 Data Analysis
The avalanche of metabolome data is very difficult to analyze. There are also
challenges in archiving such data; a standard framework for this is in place.
The problems in extracting meaning from large data sets are similar for all forms of profiling.
The goal is to recognize patterns for further exploration.
Various data mining tools are used for this. These statistical tools reduce data complexity by
focusing on the information content of a given data set, i.e. they try to ‘tame’ the wild profusion
of profiling data. Unlike many other statistical procedures, these methods are mostly applied
when there are no a priori hypotheses.
Data mining tools include cluster analysis (CA) and principal components analysis (PCA). The
metabolite data can be known or unidentified peaks.
CA and PCA can establish ‘guilt by association’ – they can point to where in metabolism
mutations act from the similarity of their metabolite profiles to those of known mutations.
External factors (e.g. toxins, herbicides, environmental insults) can be studied in an analogous
way.
Thus, in principle, the function of an unknown gene can be determined by comparing the
metabolic profile of a mutant in that gene with a library of such profiles generated by deleting
individual genes of known function.
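A minimal sketch of this library-matching idea is given below, using Pearson correlation as the similarity measure (one possible choice, assumed here). The library entries, the metabolite profiles (log fold changes versus wild type), and the names are all invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical reference library: profiles of knockouts of known function,
# each a list of log fold changes for the same five metabolites.
library = {
    "amino acid biosynthesis knockout": [2.1, 1.8, -0.2, 0.1, -0.1],
    "starch metabolism knockout": [-0.1, 0.2, 1.9, -1.5, 0.0],
    "lipid metabolism knockout": [0.0, -0.3, 0.1, 0.2, 2.2],
}

# Profile of a mutant in a gene of unknown function.
unknown = [1.9, 2.0, 0.0, -0.1, 0.2]

# Rank the library entries by similarity to the unknown profile.
best = max(library, key=lambda name: pearson(unknown, library[name]))
for name, profile in library.items():
    print(f"{name:34s} r = {pearson(unknown, profile):+.2f}")
print("closest match:", best)
```

The closest match only suggests where in metabolism the unknown gene acts; it is a hypothesis to test, not a functional assignment.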
Caution: This approach may not be so useful for dissecting metabolic responses to normal
environmental variations (e.g. in nutrient level, soil aeration, salinity, water supply). There is
good reason from MCA theory and from observation to expect such variations to cause relatively
little change in metabolite levels. This is because all enzymes in affected pathways tend to be up- or down-regulated together (Fell, 2005).
Two key drawbacks of clustering and other current data mining methods are:
- Typically, they detect only simple, one-to-one linear relationships. They do not detect
non-linear or multi-input relationships, which are common in biology.
- They do not assign confidence levels, so it is not clear which clusters are trustworthy
when the input data are not well separated.
● Cluster Analysis (CA)
CA is a set of statistical methods that group similar data together. The group (‘cluster’) members
have certain properties in common and the resultant classification can yield new insights. The
classification reduces the dimensionality of a data set. Data are presented in dendrograms that
emphasize natural groupings.
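How a dendrogram is built can be sketched with a bare-bones agglomerative clustering. Euclidean distance and average linkage are assumed here as one common choice (the notes do not specify a method), and the sample names and metabolite levels are invented.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two metabolite profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean distance over all pairs of samples drawn from the two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(profiles):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], profiles))
        distance = average_linkage(a, b, profiles)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, distance))
    return merges


# Made-up levels of three metabolites in wild-type and transgenic samples.
profiles = {
    "WT-1": [1.0, 1.0, 1.0],
    "WT-2": [1.1, 0.9, 1.0],
    "SP-1": [3.0, 0.2, 1.5],
    "SP-2": [2.9, 0.3, 1.6],
}

merges = agglomerate(profiles)
for a, b, distance in merges:
    print(f"merge {a} + {b} at distance {distance:.2f}")
```

The merge history is exactly what a dendrogram draws: each merge is a branch point, and the merge distance sets its height.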
Profiling example: Fig. 8 – Dendrogram of the metabolic profiles of transgenic potato tubers and
tubers incubated in a range of glucose concentrations (0 to 500 mM). Note that:
- The glucose-fed samples form a cluster that is nearer the cluster of wild-type samples
than any of the transgenics.
- Independent transgenic lines carrying the same transgene (e.g., the four ‘SP’ lines)
tend to cluster together (the principle of ‘guilt by association’).
● Principal Component Analysis (PCA)
PCA uses all the metabolite data from a sample to compute an individual metabolic profile that is
then compared to all the other profiles. In essence, PCA takes the resulting cloud of data points
and rotates it such that the maximum variability is visible – i.e. the extraction of principal
components amounts to a variance maximizing rotation of the original variable space. PCA finds
the vectors (‘principal components’) that give the best overall sample separation.
The data can be represented as two- or three-dimensional plots in which the axes (principal
components or vectors) are those that include as much as possible of the total information
derived from metabolic variances.
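The rotation described above can be sketched with NumPy's singular value decomposition. This is a minimal version that only mean-centers the data (real analyses often also scale each metabolite, which is not done here), and the sample profiles are invented.

```python
import numpy as np


def pca(data, n_components=2):
    """PCA by SVD of the mean-centered data matrix (rows = samples).

    Returns the sample scores on the first components and the fraction
    of total variance that each of those components explains.
    """
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Made-up levels of four metabolites (columns) in six samples (rows).
profiles = np.array([
    [1.0, 1.0, 1.0, 1.0],   # wild type, replicate 1
    [1.1, 0.9, 1.0, 1.1],   # wild type, replicate 2
    [3.0, 0.2, 1.5, 1.0],   # transgene A, line 1
    [2.9, 0.3, 1.6, 0.9],   # transgene A, line 2
    [1.0, 1.1, 0.2, 2.5],   # transgene B, line 1
    [0.9, 1.0, 0.3, 2.6],   # transgene B, line 2
])

scores, explained = pca(profiles)
print("fraction of variance explained:", np.round(explained, 3))
print("PC1/PC2 scores per sample:")
print(np.round(scores, 2))
```

Plotting the two score columns against each other gives the usual PCA plot; the explained fractions show how much of the total variance that plot retains.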
Profiling example: Fig. 9 – Clusters found after PCA of the same data set for potato
tubers as above. Note that:
- The two components chosen account together for 69% of the total metabolic variance, i.e.
only about one-third of the original variation has been lost during data reduction.
- As before, the glucose-fed samples form a cluster that is nearer the cluster of wild-type
samples than any of the transgenics.
- Again, independent transgenic lines carrying the same transgene (e.g., the four ‘SP’ lines)
tend to cluster together.
● Simple Correlations
Computer-generated pairwise plots of every metabolite in the data set against every other metabolite can be informative. But when hundreds of metabolites are analyzed the potential number
of such plots is very large – many thousands – and most of them will show no relationship.
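The number of pairwise plots follows from simple counting: n metabolites give n(n-1)/2 distinct pairs. The data set sizes below are illustrative.

```python
from math import comb

# Number of distinct metabolite pairs for data sets of different sizes.
for n_metabolites in (100, 300, 500):
    n_pairs = comb(n_metabolites, 2)   # n * (n - 1) / 2
    print(f"{n_metabolites} metabolites -> {n_pairs} pairwise plots")
```

Because the count grows roughly with the square of the number of metabolites, the plots have to be screened automatically (for example by correlation coefficient) rather than inspected one by one.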
Profiling examples: Fig. 10 – correlations between pairs of metabolites among transgenic potato
tubers. Note:
- The linear correlation (Frame A) between glucose-6-phosphate and fructose-6-phosphate
levels. These metabolites are interconvertible by phosphoglucose isomerase, which
catalyzes a near-equilibrium reaction. A linear relation is thus predicted.
- The non-linear correlation between methionine and lysine levels (Frame C), in which lysine
accumulates continuously but methionine reaches a plateau. This is expected because
methionine synthesis is under tighter feedback and feedforward control than lysine.
 Metabolomics Resources
http://fiehnlab.ucdavis.edu/ Oliver Fiehn’s group at UC Davis. Includes databases.
http://www.noble.org/plantbio/MS/metabolomics.html Lloyd Sumner’s group at the Noble
Foundation. Useful short summary of analytical approaches and bioinformatics involved in
metabolomics.
http://dbkgroup.org/default.htm Douglas Kell’s group at University of Manchester – a
gateway site with explanations of metabolic profiling technologies and links to other useful sites.
Useful values (for interpreting metabolite concentration data):
- In typical plant tissues, dry weight is ~10% of fresh weight (so that there is ~ 0.9 ml of
water per gram fresh weight)
- In very rough terms, the cytoplasmic volume is 10% of the total tissue water volume.
(‘Cytoplasm’ includes mitochondria, plastids, peroxisomes, nucleus, and cytosol). The
vacuolar volume is 70% of total water, and extracellular water is 20%. The extracellular
water compartment is also termed the apoplast; the cytoplasmic + vacuole (i.e. intracellular)
water compartment is also termed the symplast.
- Plant leaves typically have a protein content of ~20% of dry weight. N content × 6.25 =
protein content (i.e. protein is ~16% N). The free amino acid content of plant tissues is
usually only a few percent of the protein-bound amino acid content.
- The osmotic potential of a typical plant cell is ~ -10 bars. A 1 molar solution of a sugar or
other non-dissociating solute has an osmotic potential of ~ -25 bars; that of a 1 molar
solution of a salt such as NaCl is ~ -45 bars. Thus the intracellular accumulation of high
concentrations of small molecules or salts has osmotic implications.
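The useful values above can be combined in a short worked sketch. The constants are the rough figures given in these notes; the example inputs (100 nmol per g fresh weight, 0.2 M solute, 3.2% N) and the function names are hypothetical.

```python
# Rough values taken from the list above.
WATER_ML_PER_G_FW = 0.9
COMPARTMENT_FRACTION = {"cytoplasm": 0.10, "vacuole": 0.70, "apoplast": 0.20}
BARS_PER_MOLAR_NONDISSOCIATING = -25.0
PROTEIN_PER_NITROGEN = 6.25


def concentration_mM(nmol_per_g_fw, compartment=None):
    """Concentration (mM) if the metabolite is confined to one compartment.

    With compartment=None the metabolite is assumed to be spread evenly
    through the total tissue water.
    """
    volume_ml = WATER_ML_PER_G_FW
    if compartment is not None:
        volume_ml *= COMPARTMENT_FRACTION[compartment]
    return nmol_per_g_fw / volume_ml / 1000.0   # nmol/ml = micromolar


def osmotic_potential_bars(molar):
    """Osmotic potential of a non-dissociating solute at the given molarity."""
    return BARS_PER_MOLAR_NONDISSOCIATING * molar


def protein_percent(nitrogen_percent):
    """Protein content (% dry weight) estimated from N content (% dry weight)."""
    return nitrogen_percent * PROTEIN_PER_NITROGEN


print(f"100 nmol/g FW in total water: {concentration_mM(100.0):.2f} mM")
print(f"100 nmol/g FW in cytoplasm:   {concentration_mM(100.0, 'cytoplasm'):.2f} mM")
print(f"0.2 M sugar: {osmotic_potential_bars(0.2):.1f} bars")
print(f"3.2% N: {protein_percent(3.2):.1f}% protein")
```

The same tissue content corresponds to a tenfold higher concentration if the metabolite is confined to the cytoplasm, which is why the assumed compartment matters when interpreting profiling data.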