Protein Sectors: Evolutionary Units of Three-Dimensional Structure
Cell (2009)
Najeeb Halabi, Olivier Rivoire, Stanislas Leibler, and Rama Ranganathan
presented by Jianewei Zhu

Summary
Proteins display a hierarchy of structural features at primary, secondary, tertiary, and higher-order levels, an organization that guides our current understanding of their biological properties and evolutionary origins. Here, we reveal a structural organization distinct from this traditional hierarchy by statistical analysis of correlated evolution between amino acids. Applied to the S1A serine proteases, the analysis indicates a decomposition of the protein into three quasi-independent groups of correlated amino acids that we term "protein sectors."

Each sector is physically connected in the tertiary structure, has a distinct functional role, and constitutes an independent mode of sequence divergence in the protein family. Functionally relevant sectors are evident in other protein families as well, suggesting that they may be general features of proteins. We propose that sectors represent a structural organization of proteins that reflects their evolutionary histories.

Introduction
• Data support two main findings:
  – protein domains have a heterogeneous internal organization of amino acid interactions that can comprise multiple functionally distinct subdivisions (the sectors)
  – these sectors define a decomposition of proteins that is distinct from the hierarchy of primary, secondary, tertiary, and quaternary structure.
• We propose that the sectors are features of protein structures that reflect the evolutionary histories of their conserved biological properties.

Results
• From Amino Acid Sequence to Sectors
• Statistical Independence
• Structural Connectivity
• Biochemical Independence
• Independent Sequence Divergence
• Sectors in Other Protein Families

Experimental Procedures
• Sequence alignment construction, annotation, and statistical coupling analysis (SCA) to obtain the sectors
• Minimum discriminatory information (MDI) method to analyze statistical independence
• Interpretation of the sectors' structural connectivity, following previously published approaches
• Protein purification and kinetic assays to measure catalytic power, and thermal denaturation assays to measure stability, as tests of biochemical independence
• PCA of the corresponding similarity matrices to assess independent sequence divergence

From Amino Acid Sequence to Sectors
• Multiple sequence alignments (MSA)
• Measures of positional conservation
• Measures of sequence similarity
• SCA calculations
• Spectral cleaning
• Sector identification
• "Pseudo sectors"
• Representation of significant correlations

Multiple sequence alignments (MSA)

Family   PDB    Sequences   Positions
S1A      3TGI   1470        223
PDZ      1BE9   240         92
PAS      2V0W   1104        123
SH2      1AYA   582         79
SH3      2ABL   492         52

Note: 3TGI in the S1A family is rat trypsin.

Measures of positional conservation
• The conservation $D_i^{(a)}$ of amino acid $a$ at position $i$ is computed from its observed frequency $f_i^{(a)}$ and its background frequency $q^{(a)}$.
• The background frequencies are q = (0.073, 0.025, 0.050, 0.061, 0.042, 0.072, 0.023, 0.053, 0.064, 0.089, 0.023, 0.043, 0.052, 0.040, 0.052, 0.073, 0.056, 0.063, 0.013, 0.033).
• $D_i^{(a)}$ is well approximated by $D_i^{(a_i)}$.

Measures of sequence similarity
Given a set $S$ of positions, we define a similarity matrix $\Gamma_{st}^{(S)}$ between pairs of sequences $s$, $t$; the averages in its definition are taken over all amino acids $a$ and positions $i$ in the set $S$ under consideration. $S$ is taken to be either a sector or all the positions, and the sequences are represented in the one-dimensional space spanned by the first eigenvector of the similarity matrix $\Gamma_{st}^{(S)}$.
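
The formula for the positional conservation did not survive in these slides, so the following is only a minimal sketch. It assumes the binary relative-entropy (Kullback-Leibler) form between the observed frequency f and the background frequency q, and it assumes that the 20 background values listed above are ordered alphabetically by one-letter amino acid code. Neither assumption is stated in the slides, and the function names are hypothetical.

```python
import math
from collections import Counter

# Assumption: the slide's q vector is ordered alphabetically by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Q_VALUES = (0.073, 0.025, 0.050, 0.061, 0.042, 0.072, 0.023, 0.053, 0.064, 0.089,
            0.023, 0.043, 0.052, 0.040, 0.052, 0.073, 0.056, 0.063, 0.013, 0.033)
BACKGROUND = dict(zip(AMINO_ACIDS, Q_VALUES))


def binary_relative_entropy(f, q):
    """Relative entropy (in nats) between a Bernoulli(f) and a Bernoulli(q).

    Assumed form of the conservation measure; terms with zero weight are skipped.
    """
    total = 0.0
    if f > 0.0:
        total += f * math.log(f / q)
    if f < 1.0:
        total += (1.0 - f) * math.log((1.0 - f) / (1.0 - q))
    return total


def positional_conservation(column):
    """Return (a_i, D) for one alignment column given as a string.

    a_i is taken to be the most frequent amino acid in the column (an
    assumption); gaps and unknown symbols count toward the number of
    sequences but are never chosen as a_i.
    """
    counts = Counter(r for r in column if r in BACKGROUND)
    if not counts:
        raise ValueError("column contains no standard amino acids")
    amino_acid, n = counts.most_common(1)[0]
    f = n / len(column)
    return amino_acid, binary_relative_entropy(f, BACKGROUND[amino_acid])


if __name__ == "__main__":
    # A toy column: leucine in three of five sequences, one gap.
    print(positional_conservation("LLLA-"))
```

Under this assumed form, a fully conserved rare residue such as tryptophan scores much higher than a fully conserved common residue such as leucine, because the score depends on how far the observed frequency departs from the background.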

SCA calculations
• In general, a covariance matrix $C_{ij}$ reporting pairwise correlations between positions can be defined.
• A binary approximation of this covariance matrix is used.
• SCA matrices are obtained by weighting these covariance matrices by a function $\phi = \partial D / \partial f$ of the positional conservations.

Spectral cleaning
• Due to statistical and historical noise, most correlations reported by $C_{ij}$ are not functionally significant.
• The spectrum of $C_{ij}$ is composed of 223 eigenvalues $\lambda_1 > \dots > \lambda_{223}$; only the top 5 modes of $C_{ij}$ are interpreted.
• Figure: Eigenvalue spectra for the SCA matrix of the S1A serine protease family (top panel) and for one hundred randomizations of the S1A sequence alignment (bottom panel). The randomization scrambles the order of amino acids in each alignment column independently, so the amino acid frequencies at each position never change. This analysis shows that the bulk of the spectrum (the lowest 218 of the 223 eigenvalues) can be attributed to limited sampling of sequences.
• Among the significant modes, the first mode has a distinctive property: it describes a "coherent" correlation of all positions, and historical noise is expected to produce such coherent correlations between sequence positions.
• For SCA matrices with a dominant first mode, the first eigenvector should simply report the net contribution of each position to the total correlation.
• The first mode is therefore irrelevant for decomposing the protein sequence into functional units and is removed.

Sector identification
• The three sectors in the serine protease family are identified by examining the intermediate modes 2 to 5, and are best visualized by the second and fourth eigenvectors ($|2\rangle$ and $|4\rangle$) of $C_{ij}$.
• Bracket notation is used for eigenvectors: $|2\rangle$ denotes the second eigenvector, with its component along position $i$ given by $\langle i|2\rangle$.
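
The column-wise randomization described under spectral cleaning can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the toy alignment are hypothetical, and the alignment is assumed to be a list of equal-length strings. The point it shows is that shuffling each column independently destroys correlations between positions while leaving every position's amino acid frequencies untouched.

```python
import random
from collections import Counter


def shuffle_columns(alignment, seed=None):
    """Return a randomized copy of the alignment.

    Each column is permuted independently across sequences, so per-position
    amino acid counts are preserved while inter-position correlations are
    scrambled.
    """
    if not alignment:
        return []
    n_positions = len(alignment[0])
    if any(len(seq) != n_positions for seq in alignment):
        raise ValueError("all sequences must have the same length")
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*alignment)]
    for col in columns:
        rng.shuffle(col)  # permute residues within this position only
    # Transpose back from columns to sequences.
    return ["".join(row) for row in zip(*columns)]


def column_counts(alignment):
    """Amino acid counts at each position, used to check frequency preservation."""
    return [Counter(col) for col in zip(*alignment)]


if __name__ == "__main__":
    toy = ["ACDE", "ACDF", "GCDE", "AHDE", "AKLE"]  # hypothetical toy alignment
    randomized = shuffle_columns(toy, seed=0)
    print(randomized)
    print(column_counts(toy) == column_counts(randomized))
```

Repeating this for many seeds and recomputing the eigenvalue spectrum each time would give the kind of null distribution the slide compares against the real spectrum.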
• The Red Sector
• The Blue Sector
• The Green Sector
• The image and its description are on page 22 of the supplemental data PDF.

"Pseudo sectors"
• For the statistical method followed here to lead to protein sectors, one requirement is that the sequences be distributed sufficiently uniformly.
• This example illustrates one limitation of identifying sectors from the eigenvectors of the correlation matrix $C_{ij}$: when the sequences in the alignment are non-uniformly distributed, and particularly when distinct subfamilies of sequences are present, the approach can identify pseudo-sectors.
• The image and its description are on page 23 of the supplemental data PDF.

Representation of significant correlations
• Figure, panel (E): SCA matrix after reduction of statistical noise and of global coherent correlations. The 65 positions that remain fall into three groups of positions (red, blue, and green, termed "sectors"), each displaying strong intragroup correlations and weak intergroup correlations. In each sector, positions are ordered by descending magnitude of contribution (Figure S3), showing that sector positions comprise a hierarchy of correlation strengths.

Statistical Independence
• The minimum discriminatory information (MDI) method:
  – It aims to generalize the definition of positional conservation based on relative entropies to include correlations between positions.
  – Its principles are completely distinct from those of the SCA method.
  – If two sectors are independent, then the correlation entropy of the two taken together must be the sum of their correlation entropies taken individually.

Structural Connectivity
• No sector corresponds to any known subdivision of proteins by primary structure segments, secondary structure elements, or subdomain architecture.
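
The additivity criterion stated under Statistical Independence can be illustrated with a toy calculation. The sketch below does not implement the MDI correlation entropy, whose definition is not given in these slides. As a plainly labeled stand-in, it uses the ordinary relative entropy of a joint distribution over two groups of positions against a product-form background, which is additive exactly when the two groups are independent. All names and the toy distributions are hypothetical.

```python
import math
from itertools import product


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for dict-based distributions."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0.0)


def product_distribution(pa, pb):
    """Joint distribution of two independent groups with marginals pa and pb."""
    return {(a, b): pa[a] * pb[b] for a, b in product(pa, pb)}


def additivity_gap(joint, pa, pb, qa, qb):
    """Entropy of the two groups together minus the sum of individual entropies.

    pa and pb must be the marginals of `joint`. The gap is zero when the
    groups are independent and positive when they are correlated.
    """
    together = relative_entropy(joint, product_distribution(qa, qb))
    separately = relative_entropy(pa, qa) + relative_entropy(pb, qb)
    return together - separately


if __name__ == "__main__":
    qa = {0: 0.5, 1: 0.5}
    qb = {0: 0.6, 1: 0.4}
    pa = {0: 0.8, 1: 0.2}
    pb = {0: 0.3, 1: 0.7}
    # Independent groups: the gap vanishes up to rounding error.
    print(additivity_gap(product_distribution(pa, pb), pa, pb, qa, qb))
```

With this stand-in, the gap equals the mutual information between the two groups, so a clearly nonzero value signals that the groups do not behave as independent units.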

Biochemical Independence
• Protein purification and kinetic assays measure catalytic power as one test of biochemical independence.
• Thermal denaturation assays measure stability as a second test of biochemical independence.

Independent Sequence Divergence
• PCA of the corresponding similarity matrices is used to assess independent sequence divergence.
• The image and its description are on page 7 of the protein sectors PDF.

Sectors in Other Protein Families
• Two sectors are evident in the PSD95/Dlg1/ZO1 (PDZ) domain family of protein interaction modules (blue and red, Figures 7A and S11).
• Two sectors are also evident in the Per/Arnt/Sim (PAS) domain.
• Physically contiguous sectors are also evident in the SH2 and SH3 families of interaction modules (Figures 7C, 7D, S13, and S14).
• The image and its description are on page 9 of the protein sectors PDF.

What can we learn?