Protein structure prediction.

Protein domains can be defined based on:
• Geometry: a group of residues with high contact density; the number of contacts within domains is higher than the number of contacts between domains.
  - chain-continuous domains
  - chain-discontinuous domains
• Kinetics: domain as an independently folding unit.
• Physics: domain as a rigid body linked to other domains by flexible linkers.
• Genetics: minimal fragment of a gene that is capable of performing a specific function.

Domains as recurrent units of proteins.
• The same or similar domains are found in different proteins.
• Each domain has a well-determined compact structure and performs a specific function.
• Proteins evolve through domain duplication and domain shuffling.
• Protein domains are classified by comparing their recurrent sequence, structure and functional features (Conserved Domain Database).

Protein folds.
• Fold definition: two folds are similar if they have a similar arrangement of secondary structure elements, SSEs (architecture), and connectivity (topology). Sometimes a few SSEs may be missing.
• Fold classification: structural similarity between folds is searched using structure-structure comparison algorithms.

Definition of protein folds.
Protein fold – arrangement of secondary structures into a unique topology/tertiary structure.
Examples of alpha/beta proteins:
• TIM beta/alpha-barrel: contains parallel beta-sheet barrel, closed; n=8, S=8; strand order 12345678, surrounded by alpha-helices.
• NAD(P)-binding Rossmann-fold domains: core of 3 layers, a/b/a; parallel beta-sheet of 6 strands, order 321456.

Fold recognition.
Unsolved problem: direct prediction of protein structure from physico-chemical principles.
Solved problem: recognizing which of the known folds are similar to the fold of an unknown protein.
Fold recognition is based on observations/assumptions:
- The overall number of different protein folds is limited (1000-3000 folds)
- The native protein structure is in its ground state (minimum energy)

Protein structure prediction flowchart (from D.W. Mount).
[Flowchart; box labels: Protein sequence; Database similarity search; Does sequence align with a protein of known structure?; Protein family analysis; Relationship to known structure?; Structural analysis; Is there a predicted structure?; Three-dimensional comparative modeling; Predicted three-dimensional structural model; Three-dimensional structural analysis in laboratory.]

Protein structure prediction.
Prediction of the three-dimensional structure of a protein from its sequence. Different approaches:
- Homology modeling (predicted structure has a very close homolog in the structure database).
- Fold recognition (predicted structure has an existing fold).
- Ab initio prediction (predicted structure has a new fold).

Homology modeling.
Aims to produce protein models with accuracy close to experimental and is used for:
- Protein structure prediction
- Drug design
- Prediction of functionally important sites (active or binding sites)

Steps of homology modeling.
1. Template recognition & initial alignment.
2. Backbone generation.
3. Loop modeling.
4. Side-chain modeling.
5. Model optimization.

1. Template recognition.
Recognition of similarity between the target and template.
Target – protein with unknown structure.
Template – protein with known structure.
Main difficulty – deciding which template to pick when there are multiple candidate template structures.
Template structure can be found by searching for structures in PDB using sequence-sequence alignment methods.

Two zones of sequence alignment.
Two sequences are guaranteed to fold into the same structure if their length and sequence identity fall into the “safe” zone.
[Plot: sequence identity (%, 0-100) versus alignment length (residues, 50-200), with regions labeled "Homology modeling zone" and "Twilight zone".]

2. Backbone generation.
If the alignment between target and template is ready, copy the backbone coordinates of those template residues that are aligned. If two aligned residues are the same, copy their side chain coordinates as well.

3. Insertions and deletions.
insertion  AHYATPTTT
deletion   AH---TPSS
Insertions and deletions occur mostly between secondary structures, in the loop regions. Loop conformations are difficult to predict.
Approaches to loop modeling:
- Knowledge-based: searches the PDB for loops with known structure.
- Energy-based: an energy function is used to evaluate the quality of a loop, which is optimized by energy minimization or Monte Carlo.

4. Side chain modeling.
Side chain conformations – rotamers. In similar proteins side chains have similar conformations.
If % identity is high, side chain conformations can be copied from template to target.
If % identity is not very high, side chains are modeled using libraries of rotamers, and the different rotamers are scored with energy functions. Among candidate rotamers with energies E1, E2 and E3, the one with the lowest energy is chosen: E = min(E1, E2, E3).
Problem: side chain configurations depend on the backbone conformation, which is predicted, not real.

5. Model optimization.
Energy optimization of the entire structure. Since the conformation of the backbone depends on the conformations of the side chains and vice versa, an iterative approach is used: predict rotamers, shift the backbone, and repeat.

Classwork I: Homology modeling.
- Go to NCBI Entrez, search for gi461699
- Do Blast search against PDB
- Repeat the same for gi60494508
- Compare the results

Fold recognition.
Goal: to find a protein with known structure which best matches a given sequence.
Since the similarity between the target and its closest template is not high, sequence-sequence alignment methods fail.
Solution: threading – sequence-structure alignment method.

Threading – method for structure prediction.
Sequence-structure alignment: the target sequence is compared to all structural templates from the database.
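The comparison of one target against every template can be pictured as a simple ranking loop. The Python sketch below is only an illustration under stated assumptions: the alignment is gapless (target residue i sits at template position i, so no alignment search is done), each template is reduced to a list of contacting position pairs, the pair potentials in `w` are made-up numbers, and lower scores are taken to be better. The names `pair_key`, `contact_score` and `rank_templates` are hypothetical.

```python
from typing import Dict, List, Tuple

Contact = Tuple[int, int]                 # 0-based template positions in contact
Potential = Dict[Tuple[str, str], float]  # pair potential keyed by residue types


def pair_key(a: str, b: str) -> Tuple[str, str]:
    """Order-independent key, so that w(A, Y) == w(Y, A)."""
    return (a, b) if a <= b else (b, a)


def contact_score(seq: str, contacts: List[Contact], w: Potential) -> float:
    """Sum pair potentials over the template contacts (gapless alignment)."""
    total = 0.0
    for i, j in contacts:
        if i < len(seq) and j < len(seq):
            # Pairs missing from w contribute 0 (an assumption of this sketch).
            total += w.get(pair_key(seq[i], seq[j]), 0.0)
    return total


def rank_templates(seq: str,
                   templates: Dict[str, List[Contact]],
                   w: Potential) -> List[Tuple[float, str]]:
    """Score the target against every template; lowest score first."""
    scored = [(contact_score(seq, contacts, w), name)
              for name, contacts in templates.items()]
    return sorted(scored)


# Made-up potentials and templates, for illustration only.
w = {
    pair_key("A", "Y"): -0.4,
    pair_key("I", "W"): -0.2,
    pair_key("A", "I"): 0.3,
}
templates = {
    "template_1": [(0, 2), (1, 3)],  # contacts Ala-Tyr and Trp-Ile for "AWYI"
    "template_2": [(0, 3)],          # contact Ala-Ile for "AWYI"
}

ranking = rank_templates("AWYI", templates, w)
print(ranking[0][1])  # name of the best-scoring template
```

A real threading program would also search over alignments with gaps for each template; here the alignment is fixed so that only the ranking step is visible.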
Threading requires:
- Alignment method (dynamic programming, Monte Carlo, …)
- Scoring function, which yields a relative score for each alternative alignment

Scoring function for threading.
• Contact-based scoring function depends on the amino acid types of two residues and the distance between them.
• Sequence-sequence alignment scoring function does not depend on the distance between two residues.
• If the distance between two nonadjacent residues in the template is less than 8 Å, these residues make a contact.

Scoring function for threading.
[Figure: four residues, Ala, Trp, Tyr and Ile, forming the contacts Ala-Tyr and Ile-Trp.]
$S = \sum_{i,j=1}^{N} w(a_i, a_j)$; for the example in the figure, $S = w(\mathrm{Ala}, \mathrm{Tyr}) + w(\mathrm{Ile}, \mathrm{Trp})$
w is calculated from the frequency of amino acid contacts in PDB; $a_i$ – amino acid type of the target sequence aligned with position “i” of the template; N – number of contacts.

Classwork I: calculate the score for target sequence “ATPIIGGLPY” aligned to the template structure which is defined by the contact matrix.
[Two tables whose layout is not recoverable from the extracted text: a contact matrix over template positions 1-10 (contacts marked *), and a matrix of contact potentials w for residue types A, T, P, Y, I, G, L. Potential values in extraction order: -0.2 -0.1 0 -0.1 0.5 -0.2 0.2 0.3 -0.1 -0.2 -0.3 0.1 0 -0.2 -0.4 -0.1 0.1 -0.2 -0.4 -0.2 -0.1 -0.2 0.3 0.2 0.4 0.4 0.2 0.3]

Alignment algorithms.
• Dynamic programming. “Frozen approximation”: traceback in the alignment matrix is not possible for interactions between two amino acids, so that
  $S = \sum_{i,j=1}^{N} w(a_i, b_j)$
  where b is the amino acid type from the template, not from the target; now the score of every position does not depend on the alignment elsewhere in the sequence.
• Monte Carlo: optimize the sum of residue-residue contact potentials by a Monte Carlo alignment algorithm.

CASP prediction competitions.

Threading model validation.
• Correct bond lengths and bond angles (>> 3.8 Angstroms)
• Correct placement of functionally important sites
• Prediction of global topology, not partial alignment (minimum number of gaps)

Placement of functionally important sites in threading.
Prediction of the structure of methylglyoxal synthase based on the template of carbamoyl phosphate synthase.

Classwork II: Homology modeling.
- Go to NCBI Entrez, search for gi461699
- Do Blast search against PDB
- Repeat the same for gi60494508
- Predict functionally important sites

GenThreader http://bioinf.cs.ucl.ac.uk/psipred.
1. Predicts secondary structures for the target sequence.
2. Makes sequence profiles (PSSMs) for each template sequence.
3. Uses a threading scoring function to find the best matching profile.

Classwork III.
- Go to http://bioinf.cs.ucl.ac.uk/psipred
- Go over the options of the protein structure prediction program
- Predict the structure for the protein sequence (“gwu_thread_seq.txt”)
http://bioinf2.cs.ucl.ac.uk/psiout/29594540 ad0cf784.gen.html

Protein engineering and protein design.
Protein engineering – altering a protein sequence to change protein function or structure.
Protein design – designing a de novo protein which satisfies a given requirement.

Protein engineering strategies.
Goals:
• Design proteins with a certain function
• Increase activity of enzymes
• Increase binding affinity and specificity of proteins
• Increase protein stability
• Design proteins which bind novel ligands

Protein engineering uses combinatorial libraries.
• Random mutagenesis introduces different mutations in many genes of interest.
• Active proteins are separated from inactive ones:
  - in vivo (measuring the effect on the whole cell)
  - in vitro (phage display: the gene is inserted into phage DNA, expressed, and selected if it binds an immobilized target protein)

Specificity of Kunitz inhibitors can be optimized by protein engineering.
• Kunitz domains – specific inhibitors of trypsin-like proteinases, highly conserved structure with only 33% identity.
• Each Kunitz domain recognizes one or more proteinases through the binding loop (yellow).
• The phage display method found mutants of Kunitz inhibitors which have higher specificity than native ones.
• Modeling of mutant proteins showed that the enhanced specificity is caused by increased complementarity between the binding loop and the active site.

Native state can be stabilized by reducing the difference in entropy between folded and unfolded conformations.
$\Delta G = \Delta H - T\Delta S$
[Figure: free energy G along the reaction coordinate, with the unfolded state U, the folded state F and the difference ΔG between them.]

Model system: lysozyme from bacteriophage T4.
• Lysozyme has the ability to lyse certain bacteria by hydrolyzing the β-linkage between N-acetylmuramic acid (NAM) and N-acetylglucosamine (NAG) of the peptidoglycan layer in the bacterial cell wall.
• The conformational transition in lysozyme involves the cooperative movement of its two lobes relative to each other.

Disulfide bridges increase protein stability.
• Stability is increased by reducing the number of unfolded conformations (since the enthalpic contribution will be the same for folded and unfolded states).
• Task: to find positions on the backbone where cysteines can be introduced for disulfide bond formation.

Strategy of introducing a new disulfide bond.
B. Matthews, 1989:
• Analysis of disulfide bond geometries in existing structures.
• Analysis of all pairs of amino acids which are close in space.
• Energy optimization of candidate disulfide bonds.
• Analysis of the destabilizing effect of exchanging native amino acids for Cys.
As a result, three disulfide bonds were introduced into lysozyme through mutagenesis experiments.

Stability of mutants compared to wild-type protein.
Measure of stability – melting temperature Tm, at which 50% of the enzyme is inactivated during reversible heat denaturation. For the wild type, Tm = 42 °C.
• All mutants were more stable than the wild type.
• The longer the loop between the Cys residues, the larger the effect (the more restricted the unfolded state).
• The more disulfide bonds were introduced, the more stable the mutant.
From B. Matthews et al.

Attempts to fill cavities to stabilize lysozyme failed…
• Introduction of cavities the size of a –CH3 group destabilizes the protein by ~1 kcal/mol.
• T4 lysozyme has two cavities; mutations Leu → Phe and Ala → Val destabilize the protein by ~0.5-1.0 kcal/mol.
• The new side chains (Val and Phe) adopt unfavorable conformations in the cavities.

Classwork IV: analyzing the lysozyme mutants.
• Retrieve structure neighbors (1PQM and 1KNI) of 2LZM.
• Which mutant might have increased stability, and why?

Can structural scaffolds be reduced in size while maintaining function?
A. Braisted & J.A. Wells used the Z-domain (58 residues) of bacterial protein A:
• removed the third helix (truncated protein – 38 residues);
• mutated residues in the first and second helices;
• used phage display to select active forms;
• restored the binding of the truncated protein.

Designing an amino acid sequence that will fold into a given structure.
• Inverse protein folding problem: designing a sequence which will fold into a given structure – much easier than the folding problem!
• B. Dahiyat & S. Mayo: designed a sequence of a zinc finger domain that does not require stabilization by Zn.
• The wild-type protein domain is stabilized by Zn (bound to two Cys and two His); the mutant is stabilized by hydrophobic interactions.

Paracelsus challenge: convert one fold into another by changing 50% of residues.
• It is a challenge because all proteins with > 30% identity seem to have the same fold.
• L. Regan et al: Protein G (mainly beta-sheet) was converted to Rop protein (alpha-helical) by changing only 50% of residues.