Class #14. Modeling the Structure of a Protein (II)
Prof. Ramón Garduño Juárez
Molecular Modeling, Drug Design

Outline
• Protein structure
• Secondary structure prediction
• Protein folding
• Tertiary structure prediction
– Ab initio structure prediction
– Homology modeling
– Fold recognition

Protein Structure
APRKFFVGGNWKMNGDKKSLG
ELIHTLNGAKLSADTEVVCGA
PSIYLDFARQKLDAKIGVAAQ
NCYKVPKGAFTGEISPAMIKD
IGAAWVILGHSERRHVFGESD
ELIGQKVAHALAEGLGVIACI
GEKLDEREAGITEKVVFEQTK
AIADNVKDWSKVVLAYEPVWA
IGTGKTATPQQAQEVHEKLRG
WLKSHVSDAVAQSTRIIYGGS
VTGGNCKELASQHDVDGFLVG
GASLKPEFVDIINAKH
= [figure not reproduced]

Protein secondary structures
Alpha-helix:
• Right-handed helix
• 3.6 residues per helix turn
• Hydrogen bond between residues n and n+4
Beta strand and beta sheet

Side Chain Conformation
• The side chain atoms of amino acids are named with letters of the Greek alphabet (the naming scheme is shown in the original figure).
• The side chain torsion angles are named chi1, chi2, chi3, etc. (illustrated for lysine in the original figure).

Secondary structure prediction
• Rule-based approach
• Each residue is assigned to one of three classes: alpha, beta, or coil.
• Propensity: each of the 20 amino acids is assigned a probability of being alpha, beta, or coil.
• Some straightforward observations (H = hydrophobic, P = polar): HHPPHHPP might be alpha; HPHPHPHP might be beta.
• Neural network models and information theory are used. The success rate is 65-70%.
• When multiple sequence alignments are used, the rate can be improved to > 70%.
• In CASP4 and CASP5, PHD can achieve 80% accuracy.

Secondary structure prediction: history
• 1974. Chou and Fasman propose a statistical method based on the propensities of amino acids to adopt secondary structures, derived from their observed locations in 15 protein structures determined by X-ray diffraction. Clearly these statistics derive from the particular stereochemical and physicochemical properties of the amino acids.
Rather than a position-by-position analysis, the propensity of a position is calculated as an average over the 5 or 6 residues surrounding it. On a larger set of 62 proteins, the base method reports a success rate of 50% (page 446).
• 1978. Garnier improved the method by taking statistically significant pair-wise interactions between residues into account. This improved the success rate to 62% (page 447).
• 1993. Levin improved the prediction level by using multiple sequence alignments. The reasoning is as follows. Conserved regions in a multiple sequence alignment provide a strong evolutionary indicator of a role in the function of the protein. Those regions are also likely to have conserved structure, including secondary structure, and their joint propensities strengthen the prediction. This improved the success rate to 69%.
• 1994. Rost and Sander combined neural networks with multiple sequence alignments. The idea of a neural net is to create a complex network of interconnected nodes, where progress from one node to the next depends on satisfying a weighted function that has been derived by training the net with data of known results, in this case protein sequences with known secondary structures. The success rate is 72% (page 450).

Chou-Fasman
• Calculate the frequency of each of the 20 amino acids in helices, sheets, and turns.
• The frequency of residue i in structure s is divided by the frequency of all residues in structure s to give a propensity score.
• First scan the sequence to find a short stretch of amino acids with a high probability of helix or sheet (4 of 6 residues for alpha, 3 of 5 for beta, with scores larger than 1).
• Then extend the stretch until the prediction value over four amino acids drops below 1.
• Turns are predicted when:
– The score for 4 amino acids in a turn is larger than in helix and sheet
– The position-dependent score in a turn is larger than 7.5 × 10^-5

Protein folding and unfolding
• Levinthal paradox: by enumeration, a 100-residue protein needs 10^29 years to find its native structure.
Certain pathways should exist to guide the folding.
• Lattice model and atomic model
• The atomic model is used mostly in unfolding simulations
• Folding@home
• IBM Blue Gene
• "New view" of protein folding (Peter Wolynes): folding through funnels in the energy landscape
• Not all sequences have a unique native structure

Protein Folding
• Proteins are synthesized from RNA templates as long polypeptide chains with specific amino acid sequences that fold into three-dimensional bundles whose structure governs their function.
– In living organisms, the specific steps of the folding process have been hard to discern experimentally and characterize theoretically.
– It seems that all the information needed to get to a precise three-dimensional shape is "in there already," contained in the one-dimensional amino acid sequence.
• Protein structures are determined by a large number of conflicting and largely canceling forces exerted on the protein residues by the surrounding solvent and other residues in the protein chain.
– Hydrophobic, entropic, electrostatic, van der Waals, etc.
– In one type of protein, the globular protein, the molecule can spontaneously and reproducibly fold to a compact, well-defined structure.

Ab initio Prediction of Protein Structure
• Need to find a potential function for which E(S, C_native) < E(S, C_non-native) holds.
• Need to construct an algorithm to find the global minimum of this function.
– Both are unsolved problems.

Levinthal's Paradox: for a protein with N residues, the size of its conformational space is about 10^N states.
– Assume the main chain conformation of each residue is adequately represented by 10 torsion-angle states.
– Neglect all side chain conformations.
– For a chain of 100 residues, no physically achievable search algorithm would enable it to complete its folding process.
• Even if the atoms could move at the speed of light, the search would take 10^82 seconds, whereas the age of the universe is estimated at only 10^17 seconds.
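The order-of-magnitude argument above can be reproduced in a few lines of Python. This is a minimal sketch, not the lecturer's own calculation: the function name `levinthal_log10_seconds` is hypothetical, and the sampling time of 10^-18 s per conformation is an assumption chosen here only because it reproduces the 10^82 s figure quoted above; the slides do not state which sampling rate was used. The arithmetic is done in log10 to keep the numbers manageable.

```python
import math


def levinthal_log10_seconds(n_residues, states_per_residue=10,
                            seconds_per_state=1e-18):
    """log10 of the time (in seconds) needed to visit every conformation once.

    states_per_residue = 10 follows the 10^N estimate in the slides;
    seconds_per_state = 1e-18 is an ASSUMED sampling time, not a value
    given in the source.
    """
    log10_states = n_residues * math.log10(states_per_residue)
    return log10_states + math.log10(seconds_per_state)


LOG10_AGE_OF_UNIVERSE_S = 17  # estimate quoted in the slides

log10_search = levinthal_log10_seconds(100)
print(f"conformations:      ~10^{100 * math.log10(10):.0f}")
print(f"exhaustive search:  ~10^{log10_search:.0f} s")
print(f"age of universe:    ~10^{LOG10_AGE_OF_UNIVERSE_S} s")
print(f"search is too slow by a factor of "
      f"~10^{log10_search - LOG10_AGE_OF_UNIVERSE_S:.0f}")
```

Changing `seconds_per_state` by several orders of magnitude barely dents the gap, which is the point of the paradox: the conclusion does not depend on the exact sampling rate.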
• Proteins do not fold by searching the entire conformational space.
• Are there folding pathways?
• Could proteins exist in metastable states?

Bovine Pancreatic Trypsin Inhibitor (BPTI)
• BPTI is composed of 58 amino acid residues folded into a single compact domain.
– The folded conformation is stabilized by three disulfide bonds, and reducing all three disulfides leads to nearly complete unfolding.
– In the absence of any one of the disulfides, however, the protein retains nearly all of its folded structure but is significantly destabilized.
• BPTI can be unfolded by reducing its disulfides and can then be refolded upon the addition of an appropriate oxidant, such as the disulfide forms of glutathione or dithiothreitol.
– The folding pathway was analyzed by chemically trapping and analyzing intermediate species containing one or two disulfide bonds (Creighton).
– After trapping, the intermediates were physically separated by chromatography and their disulfide bonds identified by peptide mapping.
– The rates of interconversion among the various species were analyzed to determine a kinetic mechanism.
• A folding pathway has been derived (shown in the original figure). The various intermediates are identified by the disulfide bonds they contain. Thus, the [30-51,5-14] intermediate contains two disulfide bonds, linking cysteines 30 and 51 and cysteines 5 and 14. Each of the major intermediates in the pathway, or an analog of the intermediate, has been studied by NMR spectroscopy (e.g., in the laboratories of T.E. Creighton, P.S. Kim and C.K. Woodward), and the schematic representations of the intermediates are drawn to indicate qualitatively the extent to which they contain structure found in the native protein.

Mutational analysis of protein folding
• One or more amino acid residues can be replaced to alter interactions.
By measuring the effects of these changes on the native protein and folding intermediates, the roles of the altered residues at various stages of folding can be inferred.
– Amino acid replacements at different sites can have quite different effects on the stabilities of the various intermediates.
– Mutations generally have small effects.

Phi-value analysis
• Computer modeling such as molecular dynamics can be used to generate transition state structures and to calculate phi-values.

Protein Folding Landscape Theory (Wolynes, Onuchic, Dill, Chan, Sali, Karplus, Brooks, etc.)
• Proteins fold on timescales ranging from a microsecond to a few minutes, so they obviously drive or are driven quickly toward the native state.
• Folding can be described as the descent of the folding chain down a "folding funnel," with local roughness of the funnel reflecting the potential for transient trapping in local minima and the overall slope of the funnel representing the thermodynamic drive to the native state.
• A key notion is that, in all but the final stages of folding, there exists an ensemble of structures; protein folding consequently occurs via multiple pathways.

Funnel Theory
• There cannot be a single pathway.
– If an ensemble of denatured proteins all must pass through a single narrow pathway in their phase space, then there must be a large reduction in entropy upon entering this path. This step would consequently be very unlikely and rate limiting.
– It is much more likely that proteins fold via many different pathways.
• This picture of protein folding dynamics, while similar to classical transition state theory, is different in spirit.
– In this picture of two-state systems, the barrier is a free energy barrier: an energetic barrier does not exist.
– The transition state is composed of a broad ensemble of structures rather than one particular structure.
– This does not mean that the transition state is completely random.
The transition state may be characterized by partial structure in the form of stable pieces of secondary structure or partially correct backbone shape.
• Protein folding can be given a lower-order description as a quasi-static evolution of an ensemble of equilibrium structures from the denatured state to the native state over a relatively modest free energy barrier.
– By grouping protein states according to a reaction coordinate, this process can be re-expressed as a diffusion equation.
• What constitutes a well-designed protein (well designed as a folder, not well designed functionally)?
– Well-designed proteins are those that have a large diffusion constant at temperatures low enough for the native state to be stable and populated.
– That is, well-designed proteins can quickly find the native state. (Of course, for real proteins this temperature must also be between 0 and 100 °C.)
– Probably one reason why a protein's native state is only marginally stable is that greater stability of the native state would result in less specific residue-residue interactions, leading to a much lower diffusion constant and difficulty folding.
– This problem is evident in proteins with disulfide bridges: misformed disulfide bridges slow down the folding process.
• A simplified lattice model system of 27 beads (Sali et al., 1994). In this model, protein folding was simulated, and the lowest energy state could be identified by enumerating the various states and calculating explicit contacts along the lattice.
– 200 random sequences of the 27 beads were generated and each was subjected to the simulation.
– The number density of different folds was calculated versus energy, entropy, and free energy at incremental stages of folding from Q=0 to Q=1 (Q = number of correct contacts / total contacts).
– These data yielded a two-tiered time course that modeled protein folding.
• First, a protein quickly progresses from a huge number of conformations (10^16) to a much smaller population (10^10).
• These structures then progress through a slow, rate-limiting step until one of the 10^3 native-like structures is found (Q > 0.8).
• Once a suitable structure is found, the protein quickly folds into the native form.
• The topology of the protein native state appears to influence the folding mechanism.
– In all-alpha proteins, the formation of the tertiary (native) structure occurs concurrently with the formation of secondary structure.
– In a mixed α/β protein, a general collapse (reduction in radius of gyration) occurs first, followed by evolution toward the native state.
• The corresponding diagrams show the folding landscapes in terms of the radius of gyration (in angstroms, vertical axis) versus the fraction of native contacts (horizontal axis).
– The all-alpha protein finds its native structure straightforwardly;
– the others "collapse" in two stages, as shown by the L-shaped landscapes.

Quiz 5
• In an MD simulation, if one wants to raise the temperature from 0 K to 300 K in a total of 100,000 timesteps, give two algorithms to achieve this. Give detailed steps (for example, which calculation is performed from which step to which step).
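As a supplement to the folding-landscape slides above, the two coordinates used there, the fraction of native contacts Q and the radius of gyration, can be computed from bead coordinates as in the sketch below. It is an illustration under stated assumptions, not the Sali et al. code: the function names, the 8-bead cubic-lattice chain, the contact cutoff of one lattice unit, and the rule that contacting beads must be at least three positions apart along the chain are all hypothetical choices made for this example. Distances are in lattice units rather than angstroms.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of the beads from their centroid."""
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    mean_sq = sum(math.dist(c, centroid) ** 2 for c in coords) / n
    return math.sqrt(mean_sq)


def contacts(coords, cutoff=1.0, min_separation=3):
    """Pairs (i, j) of non-bonded beads closer than the cutoff.

    min_separation=3 is an assumed convention that excludes bonded
    neighbours along the chain.
    """
    n = len(coords)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if math.dist(coords[i], coords[j]) <= cutoff + 1e-9
    }


def fraction_native_contacts(coords, native_coords, cutoff=1.0):
    """Q = number of native contacts present / total native contacts."""
    native = contacts(native_coords, cutoff)
    if not native:
        raise ValueError("native structure has no contacts at this cutoff")
    return len(native & contacts(coords, cutoff)) / len(native)


# Toy 8-bead chain: a compact 2x2x2 "native" fold and a fully extended chain.
NATIVE = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
          (0, 1, 1), (1, 1, 1), (1, 0, 1), (0, 0, 1)]
EXTENDED = [(i, 0, 0) for i in range(8)]

for name, chain in (("extended", EXTENDED), ("native", NATIVE)):
    q = fraction_native_contacts(chain, NATIVE)
    rg = radius_of_gyration(chain)
    print(f"{name:9s} Q = {q:.2f}   Rg = {rg:.3f}")
```

Plotting Rg against Q for the frames of a trajectory gives the kind of landscape described above: a chain that collapses before forming native contacts traces an L shape, with Rg dropping first and Q rising afterwards.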
