Protein Sectors: Evolutionary Units of
Three-Dimensional Structure
Cell (2009)
Najeeb Halabi, Olivier Rivoire, Stanislas Leibler,
and Rama Ranganathan
presented by Jianewei Zhu
Summary
Proteins display a hierarchy of structural features at
primary, secondary, tertiary, and higher-order levels,
an organization that guides our current understanding
of their biological properties and evolutionary origins.
Here, we reveal a structural organization distinct from
this traditional hierarchy by statistical analysis of
correlated evolution between amino acids. Applied to
the S1A serine proteases, the analysis indicates a
decomposition of the protein into three quasi-independent groups of correlated amino acids that we
term “protein sectors.”
Summary
Each sector is physically connected in the tertiary
structure, has a distinct functional role, and constitutes
an independent mode of sequence divergence in the
protein family. Functionally relevant sectors are
evident in other protein families as well, suggesting
that they may be general features of proteins. We
propose that sectors represent a structural organization
of proteins that reflects their evolutionary histories.
Introduction
• Data support two main findings:
– protein domains have a heterogeneous internal
organization of amino acid interactions that can
comprise multiple functionally distinct subdivisions
(the sectors)
– these sectors define a decomposition of proteins that is
distinct from the hierarchy of primary, secondary,
tertiary, and quaternary structure. We propose that the
sectors are features of protein structures that reflect the
evolutionary histories of their conserved biological
properties.
Results
• From Amino Acid Sequence to Sectors
• Statistical Independence
• Structural Connectivity
• Biochemical Independence
• Independent Sequence Divergence
• Sectors in Other Protein Families
Experimental Procedures
• Sequence alignment construction and annotation, and statistical coupling analysis (SCA) to identify sectors
• Minimum discriminatory information (MDI) method to analyze statistical independence
• Interpretation of the sectors’ structural connectivity using previously published approaches
• Protein purification and kinetic assays to measure catalytic power, and thermal denaturation assays to measure stability, as tests of biochemical independence
• PCA of the corresponding similarity matrices to reveal independent sequence divergence
From Amino Acid Sequence to Sectors
• Multiple sequence alignments (MSA)
• Measures of positional conservation
• Measures of sequence similarity
• SCA calculations
• Spectral cleaning
• Sector identification
• “Pseudo sectors”
• Representation of significant correlations
Multiple sequence alignments (MSA)
Family   PDB    No. of sequences   No. of positions
S1A      3TGI   1470               223
PDZ      1BE9   240                92
PAS      2V0W   1104               123
SH2      1AYA   582                79
SH3      2ABL   492                52
Note: 3TGI (S1A family) is rat trypsin.
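Measures of positional conservation
• The conservation of amino acid a at position i is measured by the relative entropy

$$D_i^{(a)} = f_i^{(a)} \ln\frac{f_i^{(a)}}{q^{(a)}} + \left(1 - f_i^{(a)}\right) \ln\frac{1 - f_i^{(a)}}{1 - q^{(a)}}$$

where $f_i^{(a)}$ is the observed frequency of a at position i and $q^{(a)}$ is its background frequency, with
q = (0.073, 0.025, 0.050, 0.061, 0.042, 0.072, 0.023, 0.053,
0.064, 0.089, 0.023, 0.043, 0.052, 0.040, 0.052, 0.073,
0.056, 0.063, 0.013, 0.033)
• The overall conservation $D_i$ of position i is well approximated by $D_i^{(a_i)}$, where $a_i$ is the most prevalent amino acid at position i.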
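The conservation measure can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes the 20 background values are listed in alphabetical one-letter order (ACDEFGHIKLMNPQRSTVWY), and it assumes gaps count toward the number of sequences but are not scored. The function names are hypothetical.

```python
import math

# Background frequencies q^(a) from the slide.
# ASSUMPTION: the 20 values are in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Q = dict(zip(AMINO_ACIDS, (
    0.073, 0.025, 0.050, 0.061, 0.042, 0.072, 0.023, 0.053, 0.064, 0.089,
    0.023, 0.043, 0.052, 0.040, 0.052, 0.073, 0.056, 0.063, 0.013, 0.033)))


def relative_entropy(f: float, q: float) -> float:
    """D = f ln(f/q) + (1 - f) ln((1 - f)/(1 - q)), using the convention 0 ln 0 = 0."""
    d = 0.0
    if f > 0:
        d += f * math.log(f / q)
    if f < 1:
        d += (1 - f) * math.log((1 - f) / (1 - q))
    return d


def conservation(column: str) -> dict:
    """D_i^(a) for every amino acid a in one alignment column.

    Frequencies are counts divided by the number of sequences, so gap
    characters lower every f_i^(a) without being scored themselves.
    """
    n = len(column)
    return {a: relative_entropy(column.count(a) / n, Q[a]) for a in AMINO_ACIDS}


def position_conservation(column: str) -> float:
    """Approximate D_i by D_i^(a_i), where a_i is the most prevalent amino acid."""
    a_i = max(AMINO_ACIDS, key=column.count)
    return conservation(column)[a_i]
```

For a fully conserved glycine column, D reduces to ln(1/q_G). An amino acid seen at exactly its background frequency scores zero.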
Measures of sequence similarity
Given a set S of positions, we define a similarity matrix $\Gamma_{st}^{(S)}$ between pairs of sequences s and t. Averages in this definition are taken over all amino acids a and all positions i in the set S under consideration. S is taken to be either a sector or all positions, and the sequences are represented in the one-dimensional space spanned by the first eigenvector of the similarity matrix $\Gamma_{st}^{(S)}$.
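The slide does not reproduce the exact formula for Γ, so the sketch below uses a hypothetical stand-in. Γ_st^(S) is taken as the fraction of positions in S at which s and t carry the same residue. If x_si^(a) is a 0/1 indicator, averaging x_si^(a) x_ti^(a) over amino acids and positions gives exactly this, up to a constant factor.

Each sequence's one-dimensional coordinate is then its component along the leading eigenvector, found here by power iteration. The function names are illustrative only.

```python
import math
from typing import Iterable, List, Sequence


def similarity_matrix(seqs: Sequence[str], positions: Iterable[int]) -> List[List[float]]:
    """Gamma_st^(S): fraction of positions in S where sequences s and t are identical.

    ASSUMPTION: stand-in for the paper's definition (0/1 indicators averaged
    over amino acids and positions, up to a constant factor).
    """
    pos = list(positions)
    n = len(seqs)
    return [[sum(seqs[s][i] == seqs[t][i] for i in pos) / len(pos) for t in range(n)]
            for s in range(n)]


def top_eigenvector(mat: List[List[float]], iters: int = 500) -> List[float]:
    """Leading eigenvector of a symmetric nonnegative matrix by power iteration."""
    n = len(mat)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(mat[r][c] * v[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project_sequences(seqs: Sequence[str], positions: Iterable[int]) -> List[float]:
    """1-D coordinate of each sequence along the first eigenvector of Gamma^(S)."""
    return top_eigenvector(similarity_matrix(seqs, positions))
```

Calling `project_sequences` once with a sector's positions and once with all positions gives the two representations the slide compares.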
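SCA calculations
• In general, a covariance matrix reporting pairwise correlations between positions can be defined as

$$C_{ij}^{(ab)} = f_{ij}^{(ab)} - f_i^{(a)} f_j^{(b)}$$

where $f_{ij}^{(ab)}$ is the joint frequency of amino acid a at position i and amino acid b at position j.
• The binary approximation keeps only the most prevalent amino acid $a_i$ at each position:

$$C_{ij} = f_{ij}^{(a_i a_j)} - f_i^{(a_i)} f_j^{(a_j)}$$

• SCA matrices are obtained by weighting these covariance matrices by a function $\phi = \partial D / \partial f$ of the positional conservations. For the relative-entropy conservation $D_i^{(a)}$ this gives

$$\phi_i^{(a)} = \ln\frac{f_i^{(a)}\left(1 - q^{(a)}\right)}{\left(1 - f_i^{(a)}\right) q^{(a)}}, \qquad \tilde{C}_{ij}^{(ab)} = \phi_i^{(a)}\, C_{ij}^{(ab)}\, \phi_j^{(b)}$$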
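A minimal sketch of the binary-approximation SCA matrix φ_i C_ij φ_j is below. It is an illustration under stated assumptions, not the published implementation:

- The background frequencies are assumed to be in alphabetical one-letter order.
- Ties for the most prevalent amino acid go to the first in that order.
- Frequencies are clipped by a hypothetical `eps` so that φ stays finite for fully conserved positions.

```python
import math
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Background frequencies from the slide; ASSUMPTION: alphabetical one-letter order.
Q = dict(zip(AMINO_ACIDS, (
    0.073, 0.025, 0.050, 0.061, 0.042, 0.072, 0.023, 0.053, 0.064, 0.089,
    0.023, 0.043, 0.052, 0.040, 0.052, 0.073, 0.056, 0.063, 0.013, 0.033)))


def most_prevalent(seqs: Sequence[str], i: int) -> str:
    """a_i: most frequent amino acid at position i (ties go to the first in AMINO_ACIDS)."""
    return max(AMINO_ACIDS, key=lambda a: sum(s[i] == a for s in seqs))


def sca_matrix_binary(seqs: Sequence[str], eps: float = 1e-6) -> List[List[float]]:
    """phi_i * C_ij * phi_j with C_ij = f_ij^(a_i a_j) - f_i^(a_i) f_j^(a_j)."""
    n, length = len(seqs), len(seqs[0])
    a = [most_prevalent(seqs, i) for i in range(length)]
    # x[s][i] = 1 if sequence s carries a_i at position i, else 0.
    x = [[1.0 if s[i] == a[i] else 0.0 for i in range(length)] for s in seqs]
    f = [sum(row[i] for row in x) / n for i in range(length)]
    phi = []
    for i in range(length):
        fi = min(max(f[i], eps), 1.0 - eps)  # clip to keep the log finite (ASSUMPTION)
        q = Q[a[i]]
        phi.append(math.log(fi * (1.0 - q) / ((1.0 - fi) * q)))  # dD/df
    c = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(length):
            fij = sum(row[i] * row[j] for row in x) / n
            c[i][j] = phi[i] * (fij - f[i] * f[j]) * phi[j]
    return c
```

Two positions that always co-vary get a positive weighted covariance. Two positions whose residues combine independently get exactly zero.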
Spectral cleaning
• Due to statistical and historical noise, most correlations reported by $C_{ij}$ are not functionally significant.
• For the S1A family, the spectrum of $C_{ij}$ consists of 223 eigenvalues $\lambda_1 > \cdots > \lambda_{223}$; only the top 5 modes are interpreted.
Eigenvalue spectra for the matrix corresponding to the S1A serine protease family (top panel) and for 100 trials of randomizing the S1A sequence alignment (bottom panel). The randomization scrambles the order of amino acids in each alignment column independently, so the amino acid frequencies at each position are unchanged. This analysis shows that the bulk of the spectrum (the lowest 218 of 223 eigenvalues) can be attributed to limited sampling of sequences.
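The randomization control described in the caption can be sketched as an independent shuffle of each alignment column. The function name and the seeded `random.Random` are illustrative choices. Recomputing the SCA spectrum on each scrambled alignment is left out here.

```python
import random
from typing import List, Sequence


def scramble_columns(seqs: Sequence[str], rng: random.Random) -> List[str]:
    """Shuffle each alignment column independently.

    Per-position amino acid frequencies are preserved exactly; correlations
    between positions are destroyed, apart from those arising by chance.
    """
    length = len(seqs[0])
    columns = []
    for i in range(length):
        col = [s[i] for s in seqs]
        rng.shuffle(col)
        columns.append(col)
    return ["".join(columns[i][k] for i in range(length)) for k in range(len(seqs))]
```

Generating, for example, `[scramble_columns(seqs, random.Random(seed)) for seed in range(100)]` and computing each spectrum gives the noise floor against which the top modes of the real alignment stand out.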
Spectral cleaning
• Among the significant modes, the first mode has a distinctive property: it describes a “coherent” correlation of all positions, and historical noise is expected to produce such coherent correlations between sequence positions.
• For SCA matrices with a dominant first mode, the first eigenvector simply reports the net contribution of each position to the total correlation.
• The first mode is therefore irrelevant for decomposing the protein sequence into functional units, and it is removed.
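Removing the first mode amounts to subtracting its rank-one contribution, C' = C − λ₁ |1⟩⟨1|. The sketch below finds that mode by power iteration.

It assumes the first mode dominates in magnitude and is not orthogonal to the uniform starting vector. Both assumptions are reasonable for a coherent mode whose components share a sign. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def leading_mode(c: Matrix, iters: int = 1000) -> Tuple[float, List[float]]:
    """(lambda_1, |1>) of a symmetric matrix by power iteration."""
    n = len(c)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(c[r][k] * v[k] for k in range(n)) for r in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix: nothing to remove
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[r] * c[r][k] * v[k] for r in range(n) for k in range(n))
    return lam, v


def remove_first_mode(c: Matrix) -> Matrix:
    """C - lambda_1 |1><1|: the matrix with its coherent first mode subtracted."""
    lam, v = leading_mode(c)
    n = len(c)
    return [[c[r][k] - lam * v[r] * v[k] for k in range(n)] for r in range(n)]
```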
Sector identification
• The three sectors in the serine protease family are identified by examining the intermediate modes 2 to 5, and are best visualized by the second and fourth eigenvectors (|2⟩ and |4⟩) of $C_{ij}$.
• We use bracket notation for eigenvectors: |2⟩ denotes the second eigenvector, with component along position i given by ⟨i|2⟩.
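The slides do not spell out how positions are assigned to sectors once |2⟩ and |4⟩ are computed. The sketch below uses a deliberately simplified, hypothetical rule that is not the paper's procedure. Each position i is labelled by the eigenvector and sign with the largest |⟨i|k⟩|, provided it exceeds a chosen threshold; otherwise the position is left unassigned. The threshold value is an assumption.

```python
from typing import Dict, List, Optional, Tuple

Direction = Tuple[int, int]  # (eigenvector index k, sign +1 or -1)


def assign_sectors(eigvecs: Dict[int, List[float]],
                   threshold: float) -> List[Optional[Direction]]:
    """Label each position by the strongest eigenvector direction, or None."""
    n = len(next(iter(eigvecs.values())))
    labels: List[Optional[Direction]] = []
    for i in range(n):
        best: Optional[Direction] = None
        best_val = threshold
        for k, vec in eigvecs.items():
            comp = vec[i]  # <i|k>
            if abs(comp) > best_val:
                best, best_val = (k, 1 if comp > 0 else -1), abs(comp)
        labels.append(best)
    return labels
```

Using signed directions lets a single eigenvector separate two groups at its positive and negative ends.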
Sector identification
• The Red Sector
• The Blue Sector
• The Green Sector
Sector identification
• The figure and its explanation are on page 22 of the supplemental data PDF.
“Pseudo sectors”
• For the statistical method followed here to yield protein sectors, the sequences must be distributed sufficiently uniformly.
• This example illustrates one limitation of identifying sectors from the eigenvectors of the correlation matrix $C_{ij}$: when the sequences in the alignment are non-uniformly distributed, and in particular when distinct subfamilies of sequences are present, this approach can identify pseudo-sectors.
• The figure and its explanation are on page 23 of the supplemental data PDF.
Representation of significant correlations
(E) SCA matrix after reduction of statistical
noise and of global coherent correlations. The
65 positions that remain fall into three groups
of positions (red, blue, and green, termed
“sectors”), each displaying strong intragroup
correlations and weak intergroup correlations.
In each sector, positions are ordered by
descending magnitude of contribution (Figure
S3), showing that sector positions comprise
a hierarchy of correlation strengths.
Statistical Independence
• The minimum discriminatory information (MDI) method:
– aims at generalizing the definition of positional conservation based on relative entropies to include correlations between positions;
– rests on principles completely distinct from those of the SCA method;
– tests independence: if two sectors are independent, then the correlation entropy of the two taken together must equal the sum of their individual correlation entropies.
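The additivity criterion can be illustrated with a simpler quantity than MDI, which is an explicit swap. The sketch uses the empirical total correlation (multi-information) of a set of alignment columns, not the MDI correlation entropy, which comes from a maximum-entropy model.

For disjoint position sets A and B, TC(A ∪ B) − TC(A) − TC(B) equals the mutual information between them. That gap is zero exactly when the two sets are independent in the sample. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, List, Sequence


def entropy(symbols: Iterable[Hashable]) -> float:
    """Shannon entropy (nats) of the empirical distribution of symbols."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum(c / n * math.log(c / n) for c in counts.values())


def correlation_entropy(seqs: Sequence[str], positions: List[int]) -> float:
    """Total correlation: sum of single-position entropies minus the joint entropy."""
    singles = sum(entropy(s[i] for s in seqs) for i in positions)
    joint = entropy(tuple(s[i] for i in positions) for s in seqs)
    return singles - joint


def independence_gap(seqs: Sequence[str], sector_a: List[int], sector_b: List[int]) -> float:
    """TC(A u B) - TC(A) - TC(B) for disjoint A, B; equals the mutual information I(A;B)."""
    return (correlation_entropy(seqs, sector_a + sector_b)
            - correlation_entropy(seqs, sector_a)
            - correlation_entropy(seqs, sector_b))
```

When every combination of sector A and sector B states occurs equally often, the gap vanishes. When the two sectors' states are locked together, it is positive.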
Structural Connectivity
• No sector corresponds to any known subdivision of
proteins by primary structure segments, secondary
structure elements, or subdomain architecture.
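Whether a sector is physically connected in the tertiary structure can be checked as connectivity of a contact graph. The sketch below makes two simplifying assumptions: each position has a single representative 3-D coordinate, and two positions are in contact within a hypothetical 4 Å cutoff. Neither choice is taken from the paper.

```python
import math
from collections import deque
from typing import Dict, Iterable, Tuple

Coord = Tuple[float, float, float]


def is_connected(coords: Dict[int, Coord], sector: Iterable[int], cutoff: float = 4.0) -> bool:
    """True if the sector's positions form one connected component of the contact graph.

    ASSUMPTION: one coordinate per position; contact means distance <= cutoff.
    """
    nodes = list(sector)
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:  # breadth-first search over contacts within the sector
        i = queue.popleft()
        for j in nodes:
            if j not in seen and math.dist(coords[i], coords[j]) <= cutoff:
                seen.add(j)
                queue.append(j)
    return len(seen) == len(set(nodes))
```

Note that contiguity in this sense is independent of sequence order. Positions far apart in the primary structure can still form one connected sector.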
Biochemical Independence
• Protein purification and kinetic assays to measure catalytic power, as a test of biochemical independence
• Thermal denaturation assays to measure stability, as a test of biochemical independence
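Mutational effects from the two assay types can be put on comparable scales. This is one common convention, assumed here rather than taken from the slides. Catalytic power is expressed as kcat/KM and converted to a free-energy change, ΔΔG = −RT ln[(kcat/KM)_mut / (kcat/KM)_wt]. Stability is expressed as a shift in melting temperature, ΔTm. The function names are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant, kJ/(mol*K)


def ddg_catalysis(kcat_km_mut: float, kcat_km_wt: float, temperature: float = 298.15) -> float:
    """ΔΔG in kJ/mol from kcat/KM ratios; positive means the mutant is a poorer catalyst."""
    return -R * temperature * math.log(kcat_km_mut / kcat_km_wt)


def delta_tm(tm_mut: float, tm_wt: float) -> float:
    """Shift in melting temperature; negative means the mutant is destabilized."""
    return tm_mut - tm_wt
```

Under this convention, a tenfold drop in kcat/KM at 298.15 K corresponds to roughly 5.7 kJ/mol. Biochemical independence would appear as mutations that shift one of these two quantities while leaving the other near zero.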
Independent Sequence Divergence
• PCA of the corresponding similarity matrices reveals independent modes of sequence divergence
• The figure and its explanation are on page 7 of the protein sectors PDF.
Sectors in Other Protein Families
• Two sectors are evident in the PSD95/Dlg1/ZO1
(PDZ) domain family of protein interaction modules
(blue and red, Figures 7A and S11)
• Two sectors are also evident in the Per/Arnt/Sim
(PAS) domain.
• Physically contiguous sectors are also evident in the
SH2 and SH3 families of interaction modules
(Figures 7C, 7D, S13, and S14).
• The figure and its explanation are on page 9 of the protein sectors PDF.
What can we learn?