Protein Structure Prediction
[Based on Structural Bioinformatics, section VII]

Predicting protein 3D structure
Goal: 3D structure from 1D sequence
What kind of fold may the given sequence adopt?
• An existing fold: fold recognition, comparative modeling
• A new fold: ab-initio

Measuring progress
• CASP – Critical Assessment of Structure Prediction
• CAFASP – Critical Assessment of Fully Automated Structure Prediction
• Targets: unpublished NMR or X-ray structures
• Goal: predict the target 3D structure and submit it for independent and comparative review

What forces hold the structure?
• Hydrogen bonds
• Charge-charge interactions: positively charged groups prefer to be situated against negatively charged groups
• Hydrophobic effect
• Disulfide bonds: S-S bonds between cysteine residues

Homology modeling
Based on two major observations:
1. The structure of a protein is uniquely defined by its amino acid sequence.
2. Similar sequences adopt practically identical structures; distantly related sequences still fold into similar structures.
[Figures: Growth of the Protein Data Bank; Fraction of new folds; Two zones of sequence alignment (Rost, Protein Eng. 1999)]

The 7 steps to homology modeling
1. Template recognition and initial alignment
   - BLAST, FASTA
2. Alignment correction
   - Better alignment, MSA
3. Backbone generation
   - Copy backbone atoms [and side-chains of conserved residues]
4. Loop modeling
   - Knowledge based
   - Energy based
5. Side-chain modeling
   - Rotamer: a low-energy side-chain conformation
   - Rotamer library [backbone independent, dependent]
   - HUGE search space [~5^N]
   - High accuracy for residues in the hydrophobic core [90%], much lower for residues on the surface [50%]
6. Model optimization
   - Predict the side-chains, then the resulting shifts in the backbone, then the rotamers for the new backbone …
7. Model validation
   - Calculating the model's energy
   - Determination of normality indices: bond lengths, bond and torsion angles; inside/outside distribution of polar residues; radial distribution function

Fold recognition
Which of the known folds is likely to be similar to the (unknown) fold of a new protein when only its amino-acid sequence is known?
[Figure: Fraction of new folds (PDB new entries in 1998). Koppensteiner et al., 2000, JMB 296:1139-1152.]

Unrelated proteins adopt similar folds
Only 100 folds account for ~50% of all protein superfamilies
Possible explanations:
1. Divergent evolution
2. Convergent evolution
3. Limited number of folds
4. Misguided analysis

Proteins as seen by a Biologist
Does a new protein sequence belong to a given family of proteins (with a specific set of mutation rules)?
Fold recognition is based on:
• Sequence alignment, multiple sequence alignment
• Profile HMM, PSI-BLAST

Proteins as seen by a Physicist
"Thermodynamic hypothesis": the native conformation of a protein corresponds to a global free energy minimum of the system (protein + solvent)
Naïve approach: given a correct energy function, search for the native structure in the conformational space

Threading
Threading: energy-based fold recognition
Define:
1. Protein model and interaction description
2. Alignment algorithm
3. Energy parameterization

$E = \sum_{\text{positions } i,j} E_{a_i b_j}$

E_ab (excerpt):
       A    C    D    E   ...
  A   -3   -1    0    0
  C   -1   -4    1    2
  D    0    1    5    6
  E    0    2    6    7
  ...
[Figure: example residues (A, C, D, E) assigned to numbered structure positions 1-10]

Find best fold for a protein sequence: fold recognition (threading)
[Figure: the query sequence MAHFPGFGQSLLFGYPVYVFGD... is threaded onto potential folds 1) ... 56) ... n), each receiving a score (e.g. -10, -123)]
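The threading energy above, a sum of pairwise terms E_ab over positions i, j, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any particular threading program: the E_ab values are read from the slide's flattened table and may be misordered, the sum is assumed to run over contacting position pairs only, the placement is ungapped (no alignment algorithm), and the two contact lists (`hairpin`, `helix_like`) are hypothetical.

```python
# Pairwise potential E_ab for residue types A, C, D, E.
# Assumption: values as read from the slide's table; treated as symmetric.
E_AB = {
    ("A", "A"): -3, ("A", "C"): -1, ("A", "D"): 0, ("A", "E"): 0,
    ("C", "C"): -4, ("C", "D"): 1, ("C", "E"): 2,
    ("D", "D"): 5, ("D", "E"): 6,
    ("E", "E"): 7,
}


def pair_energy(a, b):
    """Look up E_ab regardless of the order of the two residue types."""
    return E_AB[(a, b)] if (a, b) in E_AB else E_AB[(b, a)]


def threading_energy(sequence, contacts):
    """Sum E_ab over the contacting position pairs (i, j) of one fold.

    Assumption: residue k of the sequence sits at position k of the fold
    (ungapped placement), so no alignment step is modelled here.
    """
    return sum(pair_energy(sequence[i], sequence[j]) for i, j in contacts)


def best_fold(sequence, folds):
    """Score the sequence on every fold and return the lowest-energy one."""
    scores = {name: threading_energy(sequence, contacts)
              for name, contacts in folds.items()}
    return min(scores, key=scores.get), scores


# Hypothetical fold library: each fold is a list of contacting positions.
FOLDS = {
    "hairpin": [(0, 5), (1, 4), (2, 3)],
    "helix_like": [(0, 3), (1, 4), (2, 5)],
}

if __name__ == "__main__":
    best, scores = best_fold("ACDEAC", FOLDS)
    print(scores)  # {'hairpin': 4, 'helix_like': 0}
    print(best)    # helix_like
```

A real threading method would also search over alignments of the sequence to each fold; the sketch only shows how a given placement is scored and how folds are ranked by that score.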
GenTHREADER (Jones, 1999, JMB 287:797-815)
For each template:
• provide an MSA
• align the query sequence with the MSA
• assess the alignment by sequence alignment score
• assess the alignment by pairwise potentials
• assess the alignment by solvation function
• record lengths of: alignment, query, template
[Figure: Essentials of GenTHREADER]

Ab-initio folding
Goal: predict structure from "first principles"
Requires:
• A free energy function, sufficiently close to the "true potential"
• A method for searching the conformational space
Benefits:
• Works for novel folds
• Shows that we understand the process

Ab-initio folding – the challenge
1. Current potential functions have limited accuracy
2. The conformational space is HUGE
Possible simplifications:
• Reduced representation
• Simplified potentials
• Coarse search strategies

Representation
Detailed representation: include all atoms of the protein and the surrounding solvent (computationally expensive)
• Implicit solvent models
• United atom representation
• Side-chain as centroid or Cα
• Restricted side-chain configurations (rotamers)
• Restricted backbone torsion angles

Rosetta [Simons et al. 1997]
• "Structural" signatures recur within protein structures
• Use these as cues during structure search
I-sites Library – a catalog of local sequence-structure correlations
[Figure: example motifs: serine hairpin, type-I hairpin, frayed helix]

Rosetta: a folding simulation program
[Figure: fragment insertion Monte Carlo loop: choose a fragment, change backbone torsion angles, convert to 3D, evaluate the energy function, accept or reject]

Potential functions
• Molecular mechanics: models the forces that determine protein conformation
   - Van der Waals: Lennard-Jones 12-6
   - Electrostatic: Coulomb's law
• Scoring functions: empirically derived from solved structures
   - Useful with reduced complexity models
   - Useful in treating aspects of protein thermodynamics

Search methods
• Molecular dynamics: simulates the motion of a molecule in a given potential
   - Impractical …
• Coarse sampling of energy landscape:
   - Simulated annealing, genetic algorithms, …
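The accept-or-reject loop in the Rosetta slide (change backbone angles, evaluate the energy, accept or reject) combined with simulated annealing can be sketched as a Metropolis Monte Carlo search over torsion angles. This is a toy illustration, not Rosetta's algorithm or energy function: `toy_energy` is a hypothetical stand-in whose minimum lies at -60 degrees for every angle, a random perturbation of one angle replaces fragment insertion, and the geometric cooling schedule and all parameter values are assumptions.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical energy: zero when every torsion angle is -60 degrees."""
    return sum(1.0 - math.cos(math.radians(a + 60.0)) for a in angles)


def anneal(n_angles=10, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule.

    Returns (best_angles, best_energy, start_energy).
    """
    rng = random.Random(seed)
    angles = [rng.uniform(-180.0, 180.0) for _ in range(n_angles)]
    energy = start_energy = toy_energy(angles)
    best_angles, best_energy = list(angles), energy

    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))

        # Move: perturb one angle (stand-in for inserting a fragment),
        # wrapping the result back into [-180, 180).
        i = rng.randrange(n_angles)
        old = angles[i]
        angles[i] = (old + rng.gauss(0.0, 30.0) + 180.0) % 360.0 - 180.0

        # Metropolis criterion: always accept downhill moves, accept
        # uphill moves with probability exp(-delta / t).
        delta = toy_energy(angles) - energy
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            energy += delta
            if energy < best_energy:
                best_angles, best_energy = list(angles), energy
        else:
            angles[i] = old  # reject: restore the previous angle

    return best_angles, best_energy, start_energy


if __name__ == "__main__":
    best_angles, best_energy, start_energy = anneal()
    print(f"start energy {start_energy:.3f}, best energy {best_energy:.3f}")
```

At high temperature most uphill moves are accepted, so the search explores broadly; as the temperature drops, the walk settles into a low-energy region. This is the coarse sampling of the energy landscape that the slide contrasts with molecular dynamics.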