Identification of thermophilic species by the amino
acid compositions deduced from their genomes
David P. Kreil and Christos A. Ouzounis
University of Cambridge and European Bioinformatics Institute, Computational
Genomics Group, Research Programme, The European Bioinformatics Institute,
EMBL Outstation, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK
Reporter: Yu Lun Kuo
E-mail: [email protected]
Date: October 26, 2006
1/17
Outline
• Introduction
• Materials and Methods
• Results
• Discussion and Conclusion
2/17
Introduction
• The properties of thermophilic proteins have
been examined over the past two decades.
• Thermophilic proteins show preferences for particular
amino acids, but general rules have not yet emerged.
• The analysis covers not only homologous proteins,
but also proteins unique to particular species.
3/17
Introduction
• Results are presented for the genomes of six archaea, 19
bacteria, and eukaryotic organisms.
• Using two different approaches, several factors that
determine amino acid composition can be deduced
• GC content of the coding sequences is the
dominant influence on amino acid composition
– Thermophilic species can still be identified
4/17
Materials and Methods
• Data sources and tools
• Exploratory data analysis
• Sensitivity analysis, sampling adequacy and
significance
5/17
Data Sources and Tools
• Obtained from public databases
– EBI (European Bioinformatics Institute)
– NCBI (National Center for Biotechnology Information)
– SRS – Access to multiple molecular biology databases
– EPCLUST (Expression Profile data CLUSTering and
analysis)
– Hierarchical clustering
– PCA (Principal Components Analysis)
6/17
Exploratory Data Analysis
• Global amino acid compositions were determined
for all organisms
– The rows of the matrix correspond to the listed
data sources
– The columns correspond to the respective
percentage amino acid content
7/17
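The composition matrix described on this slide can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `composition`, the toy proteomes, and the organism names are all hypothetical, and only the 20 standard amino acids are counted.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids (one-letter codes)

def composition(proteins):
    """Percentage content of each amino acid, pooled over all proteins of one organism."""
    counts = Counter()
    for seq in proteins:
        # Ignore ambiguous or non-standard symbols such as X or *
        counts.update(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]

# Hypothetical toy proteomes; real input would be all predicted proteins of a genome
proteomes = {
    "organism_A": ["MKKLE", "MEEKR"],
    "organism_B": ["MAAGG", "MGALA"],
}

# One row per organism, one column per amino acid (in AMINO_ACIDS order)
matrix = {name: composition(seqs) for name, seqs in proteomes.items()}
```

Each row sums to 100, so rows from genomes of very different sizes remain directly comparable.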
Exploratory Data Analysis
• Interpretation of the principal factors was supported by two variables
– GC ratio (GC counts vs. AT counts)
– A binary variable (therm)
• The binary variable, therm
– 0 (zero) - mesophilic
– 1 (one) - thermophilic
8/17
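The two supporting variables could be derived along these lines. The slide only says "GC counts vs. AT counts", so treating the GC ratio as GC count divided by AT count is an assumption; a GC percentage is shown alongside it. The function names, sequences, and `therm` labels are hypothetical.

```python
def gc_counts(coding_sequences):
    """Return (GC count, AT count) pooled over all coding sequences."""
    gc = at = 0
    for seq in coding_sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    return gc, at

def gc_ratio(coding_sequences):
    """GC count divided by AT count (assumed definition of the GC ratio)."""
    gc, at = gc_counts(coding_sequences)
    if at == 0:
        raise ValueError("no A or T bases found")
    return gc / at

def gc_percent(coding_sequences):
    """GC count as a percentage of all A, C, G, T bases."""
    gc, at = gc_counts(coding_sequences)
    if gc + at == 0:
        raise ValueError("no bases found")
    return 100.0 * gc / (gc + at)

# Hypothetical toy data: coding sequences per organism and the binary variable therm
coding = {"organism_A": ["ATGC", "GGCC"], "organism_B": ["ATAT", "ATGA"]}
therm = {"organism_A": 1, "organism_B": 0}  # 1 = thermophilic, 0 = mesophilic

ratios = {name: gc_ratio(seqs) for name, seqs in coding.items()}
```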
Sensitivity Analysis, Sampling Adequacy and Significance
• Miscellaneous clustering methods were tried
– Average linkage (UPGMA)
– Complete linkage (Maximum distance method)
– Single linkage (Minimum distance method)
– Weighted pair group method (WPGMA)
• PCA was repeated to verify that this
weighting did not affect any conclusions
– 20 amino acids with equal weight
9/17
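The four linkage rules listed on this slide differ only in how the distance from a newly merged cluster to every other cluster is updated. The sketch below is a plain-Python illustration of that difference, not the EPCLUST implementation; the function name `agglomerate` and the toy points are hypothetical.

```python
def agglomerate(dist, method="average"):
    """Agglomerative clustering on a symmetric distance matrix (list of lists).

    Returns the merges in order as (members_a, members_b, merge_distance).
    """
    n = len(dist)
    clusters = {i: (i,) for i in range(n)}          # active cluster id -> members
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}

    def get(a, b):
        return d[(a, b) if a < b else (b, a)]

    merges, next_id = [], n
    while len(clusters) > 1:
        ids = sorted(clusters)
        # Find the closest pair of active clusters
        a, b = min(((x, y) for i, x in enumerate(ids) for y in ids[i + 1:]),
                   key=lambda pair: get(*pair))
        na, nb = len(clusters[a]), len(clusters[b])
        for k in ids:
            if k in (a, b):
                continue
            dka, dkb = get(k, a), get(k, b)
            if method == "single":        # minimum distance method
                new = min(dka, dkb)
            elif method == "complete":    # maximum distance method
                new = max(dka, dkb)
            elif method == "average":     # UPGMA: weighted by cluster sizes
                new = (na * dka + nb * dkb) / (na + nb)
            elif method == "weighted":    # WPGMA: both subclusters count equally
                new = 0.5 * (dka + dkb)
            else:
                raise ValueError("unknown method: " + method)
            d[(k, next_id)] = new
        merges.append((clusters[a], clusters[b], get(a, b)))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges

# Hypothetical toy example: four points on a line
points = [0, 1, 3, 7]
dist = [[abs(p - q) for q in points] for p in points]
heights = {m: [h for _, _, h in agglomerate(dist, m)]
           for m in ("single", "complete", "average", "weighted")}
```

Comparing the merge heights across methods on the same distance matrix is one simple way to check whether a grouping is robust to the choice of linkage.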
Results
[Figure: clustering of amino acid compositions. Red – more than average; Green – less than average. Labelled groups: unusually high GC ratio (57-67%), high GC ratio, thermophilic.]
10/17
Results (PCA of Amino Acid Composition)
• A clear separation of thermophiles and
mesophiles along the second principal axis
– Point labels: 0 – mesophile, 1 – thermophile
– Colours: Archaea – Red, Bacteria – Green,
Eukaryote – Purple, Outgroup – Blue
11/17
Component Loadings
• High Loading
– Absolute component loadings > 0.6
• Component loadings can be interpreted as
correlation coefficients
• Component 1
– Correlates with GC ratio
• Component 2
– Correlates with therm
12/17
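The statement that component loadings can be read as correlation coefficients can be checked numerically. The sketch below runs a PCA on standardized variables (so every variable has equal weight) using simple power iteration for the first component only. The data are hypothetical, and the helper names are illustrative rather than taken from any tool named in the slides.

```python
import math

def standardize(columns):
    """z-score each variable; assumes no variable is constant."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        out.append([(x - mean) / sd for x in col])
    return out

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_component(columns, iterations=500):
    """Loadings and scores of the first principal component of the correlation matrix."""
    z = standardize(columns)
    p = len(z)
    corr = [[pearson(z[i], z[j]) for j in range(p)] for i in range(p)]
    v = [1.0 + 0.1 * i for i in range(p)]           # arbitrary starting vector
    for _ in range(iterations):                     # power iteration
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p)) for i in range(p))
    scores = [sum(v[i] * z[i][k] for i in range(p)) for k in range(len(z[0]))]
    loadings = [v[i] * math.sqrt(eigenvalue) for i in range(p)]
    return loadings, scores

# Hypothetical data: three variables (lists) observed on six samples
data = [
    [1, 2, 3, 4, 5, 6],
    [2, 1, 4, 3, 6, 5],
    [6, 5, 3, 4, 1, 2],
]
loadings, scores = first_component(data)
# Each loading should match the correlation between its variable and the component scores
correlations = [pearson(col, scores) for col in data]
```

With this reading, the slide's threshold of an absolute loading above 0.6 marks variables that correlate strongly with a component.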
Statistical Evidence and Specific Feature of Thermophilic Species
• PCA
– Starting from the distinct groups of thermophiles
and mesophiles as obtained
• Gln (Q) & Glu (E)
– Have very high component loadings for component 2
• Table 2 summarizes the results and most of
the statistical evidence
– PCA factor loadings (from very high to low)
– Average difference between thermophiles and mesophiles
– Raw correlations with the binary variable therm
– More: in thermophiles > in mesophiles
– Less: in thermophiles < in mesophiles
13/17
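Two of the quantities mentioned on this slide, the average difference between thermophiles and mesophiles and the raw correlation with the binary variable therm, can be sketched as below. The percentages are invented for illustration only and are not values from Table 2; `aa_percent` stands for the content of any one amino acid across six hypothetical species.

```python
import math

def pearson(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_difference(values, therm):
    """Mean in thermophiles (therm == 1) minus mean in mesophiles (therm == 0)."""
    thermo = [v for v, t in zip(values, therm) if t == 1]
    meso = [v for v, t in zip(values, therm) if t == 0]
    return sum(thermo) / len(thermo) - sum(meso) / len(meso)

# Hypothetical, made-up numbers for one amino acid in six species
aa_percent = [8.5, 8.9, 9.1, 6.0, 6.4, 5.8]
therm = [1, 1, 1, 0, 0, 0]

diff = mean_difference(aa_percent, therm)   # positive: "more" in thermophiles
r = pearson(aa_percent, therm)              # raw correlation with therm
```

A positive difference corresponds to the slide's "More" category and a negative one to "Less".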
Discussion and Conclusion
• The results discern several underlying factors
that influence amino acid composition
– Completely sequenced genomes of 27 species
– Employing different methods of data analysis
• The two most prominent observations
– Dominant effect of GC pressure
– Clear identification of thermophilic species
14/17
Discussion and Conclusion
• PCA found GC ratio to be the most important
factor
• Environmental adaptations would also be
expected to play a role
– A. pernix lies at some distance from the other
thermophiles
15/17
Discussion and Conclusion
• This holds not only for individual proteins or groups
of proteins but also for entire genomes
– GC content has a stronger influence on amino
acid composition than adaptation to extreme
environments (e.g., thermophily)
– It would be interesting to extend the analysis to different phyla
16/17
Thanks
17/17