Protein Structure Prediction

Protein Structure
• Amino-acid chains can fold to form 3-dimensional structures
• Proteins are sequences that have a (more or less) stable 3-dimensional configuration

Why Is Structure Important?
• The structure a protein takes is crucial for its function
• Forms “pockets” that can recognize an enzyme substrate
• Positions the side chains of specific groups together to form areas with desired chemical/electrical properties
• Creates firm structures such as collagen, keratins, fibroins

Determining Structure
• X-ray and NMR methods make it possible to determine the structure of proteins and protein complexes
• These methods are expensive and difficult
  • Processing a single protein can take several months of work
• A centralized database (PDB) contains all solved protein structures
  • XYZ coordinates of atoms within a specified precision
  • ~19,000 solved structures

[Figure: Growth of the Protein Data Bank]

Structure is Sequence Dependent
• Experiments show that for many proteins, the 3-dimensional structure is a function of the sequence
  • Force the protein to lose its structure by introducing agents that change the environment
  • After the sequence is put back in water, the original conformation/activity is restored
• However, for complex proteins, there are cellular processes that “help” in folding

[Figure: Amino Acids]

What Forces Hold the Structure?
• Structure is supported by several types of chemical bonds/forces
• Hydrogen bonds
• Charge-charge interactions
  • Positively charged groups prefer to be situated against negatively charged groups
• Disulfide bonds
  • S-S bonds between cysteine residues
  • These form during folding
• Hydrophobic effect

Levels of Structure

Secondary Structure
• α-helix
• β-strands

Hydrogen Bonds in α-Helices

β-Strands Form Sheets
• Parallel
• Anti-parallel
• These sheets are held together by hydrogen bonds across strands

Angular Coordinates
• Secondary structures force specific angles between residues

Ramachandran Plot
• We can relate angles to types of structures

Labeling Secondary Structure
• Using both hydrogen bond patterns and angles, we can assign secondary structure labels from the XYZ coordinates of the amino acids
• These do not lead to an absolute definition of secondary structure

Prediction of Secondary Structure
• Input: amino-acid sequence
• Output: annotation sequence of three classes:
  • alpha
  • beta
  • other (sometimes called coil/turn)
• Measure of success: percentage of residues that were correctly labeled

Protein Folds: sequential, spatial and topological arrangement of secondary structures
• The Globin fold

Approaches for Structure Prediction
• Homology modeling (25-30% identity as a predictor)
• Fold recognition
  • Remote homology
• Ab initio prediction
  • Heavy computations

Newly Determined Structures: Fraction of New Folds
[Figure: fraction of new folds (PDB new entries in 1998). Koppensteiner et al., 2000, JMB 296:1139-1152.]

A Finite Number of Protein Folds
• Aim: recognize the fold that “matches” a given sequence
• Approaches:
  • PSI-BLAST, profile HMMs, etc.
  • Threading

Threading: Essential Components
• Structural template
• Neighbor definition
• Energy function

[Figure: structural template with the sequence ACCECADAAC placed at positions 1 to 10]

Energy of a threading: $E = \sum_{\text{positions } i,j} E_{a_i b_j}$

Example for ACCECADAAC: -3 -1 -4 -4 -1 -4 -3 -3 = -23

Contact energies E_ab:

      A    C    D    E
A    -3   -1    0    0
C    -1   -4    1    2
D     0    1    5    6
E     0    2    6    7

Find the Best Fold for a Protein Sequence: Fold Recognition (Threading)
[Figure: the query sequence MAHFPGFGQSLLFGYPVYVFGD... is scored against candidate folds 1) ... 56) ... n); scores shown include -10 and -123]
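The threading score above is a sum of pairwise contact energies over neighboring positions, which can be sketched in a few lines. This is a minimal illustration, not the slide author's code: the E_ab values are read off the slide's energy table as best it can be reconstructed, and `HYPOTHETICAL_CONTACTS` is an invented neighbor list (the slide's actual template contacts are not recoverable), chosen only so that it yields the same eight terms as the slide's example.

```python
# Symmetric contact energies E_ab, as read from the slide's table
# (an assumed reconstruction of a garbled figure).
ENERGY = {
    ("A", "A"): -3, ("A", "C"): -1, ("A", "D"): 0, ("A", "E"): 0,
    ("C", "C"): -4, ("C", "D"): 1, ("C", "E"): 2,
    ("D", "D"): 5, ("D", "E"): 6,
    ("E", "E"): 7,
}


def pair_energy(a, b):
    """Look up E_ab regardless of the order of the two residue types."""
    return ENERGY[(a, b)] if (a, b) in ENERGY else ENERGY[(b, a)]


def threading_energy(sequence, contacts):
    """Sum E_ab over all neighboring template positions (1-based indices)."""
    return sum(pair_energy(sequence[i - 1], sequence[j - 1]) for i, j in contacts)


# Hypothetical neighbor definition: pairs of template positions in contact.
HYPOTHETICAL_CONTACTS = [
    (1, 9), (1, 10), (2, 10), (3, 5), (2, 9), (5, 10), (6, 8), (6, 9),
]

score = threading_energy("ACCECADAAC", HYPOTHETICAL_CONTACTS)
print(score)  # -3 -1 -4 -4 -1 -4 -3 -3 = -23
```

Fold recognition would repeat this for every template (each with its own neighbor list and alignment) and keep the lowest-energy fold.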
[Figure label: potential fold, 20.5]

GenTHREADER (Jones, 1999, JMB 287:797-815)
• For each template, provide an MSA
• Align the query sequence with the MSA
• Assess the alignment by sequence alignment score
• Assess the alignment by pairwise potentials
• Assess the alignment by solvation function
• Record lengths of: alignment, query, template

[Figure: Essentials of GenTHREADER]

Ab-initio Structure Recognition
• Goal: predict structure from “first principles”
• Benefits:
  • Works for novel folds
  • Shows that we understand the process

Approaches to Ab-initio Prediction: Molecular Dynamics
• Simulates the forces that govern the protein within water
• Since proteins fold naturally, this would lead to the solved structure
• Problems:
  • Thousands of atoms
  • Huge number of time steps to reach the folded protein
  • Intractable problem

Approaches to Ab-initio Prediction: Minimal Energy
• Assumption: the folded form is the minimal-energy conformation of the protein
• Decomposition:
  • Define an energy function
  • Search for the 3-D conformation that minimizes the energy

Energy Function
• Accounts for the forces that act on the molecule
  • Van der Waals forces
  • Covalent bonds
  • Hydrogen bonds
  • Charges
  • Hydrophobic effects
• Issues:
  • Estimating parameters
  • How do we compute it? O((# atoms)^2)

Simplified Energy Functions
• Different levels of granularity
• Residue-residue energy function (bead model)
• Partial model
  • Backbone as a bead
  • Side chain as a rigid body that can move with respect to the backbone
• Many other variants

Search Strategy
• High-dimensional search problem
• How do we represent partial solutions?
  • Position of each atom (too detailed!)
  • Position of each residue (too coarse!)
  • Intermediate solutions (e.g., backbone and side chain)

Search Strategy: Representation Tradeoffs
• X, Y, Z coordinates
  • Easy to compute distances between residues
  • Might represent infeasible solutions
• Angles between successive residues
  • Easy to ensure a “legal” protein
  • Harder to compute distances

Search Strategy: Typical Approach
• Secondary structure prediction
• Attempts at different conformations, keeping secondary structure fixed
• Finer moves, relaxing secondary structure
• Use:
  • Greedy search
  • Simulated annealing
  • …

Rosetta Method
• Idea: “structural” signatures recur within protein structures
• Use these as cues during structure search

Local Structure Motifs
• I-sites Library = a catalog of local sequence-structure correlations
  • Diverging type-2 turn
  • Frayed helix
  • Serine hairpin
  • Proline helix C-cap
  • Type-I hairpin
  • Alpha-alpha corner
  • Glycine helix N-cap

[Figure: Example: non-polar alpha-helix]
[Figure: Example: non-polar beta-strand]
[Figure: Example: Gly alpha-C-cap Type 1]

Construction of the I-sites Library
• Construct profiles (PSI-BLAST like) for each solved structure
• Collect every possible segment of fixed length (len = 3, 9, 15)
• Perform k-means clustering of the segments
• Check each cluster for a “coherent” structure (in terms of dihedral angles)
• Prune incoherent structures
• Iteratively refine the remaining clusters by removing structurally different segments, redefining cluster membership, etc.

All Proteins Can Be Constructed from Fragments
• Recent experiment: for representative proteins, backbones were assembled from a library of 1000 different 5-residue fragments.
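If backbones can be assembled from short fragments, one way to search conformation space is to repeatedly swap a fragment's angles into the backbone and accept or reject the move under simulated annealing. The sketch below is a toy illustration of that idea in the angle representation, not the Rosetta implementation: `toy_energy`, the three-fragment library, the approximate helix/strand angles, and the cooling schedule are all assumptions made for the example.

```python
import math
import random

HELIX = (-60.0, -45.0)    # approximate alpha-helix (phi, psi), degrees
STRAND = (-120.0, 130.0)  # approximate beta-strand (phi, psi), degrees


def toy_energy(angles):
    """Hypothetical stand-in energy: squared deviation from helical angles."""
    return sum((phi - HELIX[0]) ** 2 + (psi - HELIX[1]) ** 2 for phi, psi in angles)


def fragment_search(n_residues, fragments, steps=2000,
                    t_start=50000.0, t_end=10.0, seed=0):
    """Fragment-insertion search with a Metropolis test and geometric cooling."""
    rng = random.Random(seed)
    angles = [STRAND] * n_residues            # start from an extended chain
    energy = toy_energy(angles)
    best_angles, best_energy = list(angles), energy
    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Choose a fragment and a position, then overwrite the backbone angles.
        fragment = rng.choice(fragments)
        start = rng.randrange(n_residues - len(fragment) + 1)
        candidate = angles[:start] + list(fragment) + angles[start + len(fragment):]
        cand_energy = toy_energy(candidate)
        delta = cand_energy - energy
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            angles, energy = candidate, cand_energy
            if energy < best_energy:
                best_angles, best_energy = list(angles), energy
    return best_angles, best_energy


# Hypothetical library of 3-residue fragments.
FRAGMENTS = [
    [HELIX, HELIX, HELIX],
    [STRAND, STRAND, STRAND],
    [HELIX, STRAND, HELIX],
]

best, best_e = fragment_search(12, FRAGMENTS)
print(best_e)
```

Working in angles keeps every candidate a “legal” chain; a realistic energy function would first have to convert the angles to 3D coordinates, which this toy energy skips.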
Rosetta: A Folding Simulation Program
• Fragment insertion Monte Carlo
[Diagram: starting from the backbone torsion angles, choose a fragment from the fragment library, change the backbone angles, convert to 3D, evaluate the energy function, then accept or reject]

Rosetta’s Energy Function: Sequence-Dependent Features
• Residue-residue contact energies are derived from the database

Rosetta’s Energy Function: Sequence-Independent Features
• Current structure → vector representation → probabilities from the database
• The energy score for a contact between secondary structures is summed using database statistics.

Rosetta Prediction Results
• 61% “topologically correct”
• 60% “locally correct”
• 73% secondary structure (Q3) correct
http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php

Evaluation of Partially Correct Predictions
• Tertiary structure %correct is the fraction of the sequence that is in a 30-residue window with RMSD < 6.0 Å.
• Local structure %correct is the fraction of the sequence that has mda < 90°.
• mda = maximum deviation in backbone angles over an 8-residue window.
[Figure: RMSD along the sequence for window sizes L = 8, 20 and 30 with the 6.0 Å cutoff (tertiary structure), and MDA along the sequence with the 90° cutoff (local structure)]

Examples (prediction vs. true structure)
• T0116 262-322 (61 residues): topologically correct (rmsd = 5.9 Å), but a helix is mispredicted as a loop.
• T0121 126-199 (66 residues): topologically correct (rmsd = 5.9 Å), but a loop is mispredicted as a helix.
• T0122 57-153 (97 residues): ...contains a 53-residue stretch with max deviation = 96°.
• T0112 153-213: low rmsd (5.6 Å) and all angles correct (mda = 84°), but topologically wrong! (This is rare.)
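The local-structure measure defined above (mda over an 8-residue window, 90° cutoff) can be sketched as follows. This is one reading of the slide, not the evaluators' code: it assumes only (phi, psi) angles are compared, that deviations are measured on the circle, and that a residue counts as correct when it lies in at least one window whose mda is below the cutoff. The function names are made up for the example.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def window_mda(pred, true, start, window=8):
    """Maximum deviation in (phi, psi) over one window of residues."""
    return max(
        max(angle_diff(p[0], t[0]), angle_diff(p[1], t[1]))
        for p, t in zip(pred[start:start + window], true[start:start + window])
    )


def local_percent_correct(pred, true, window=8, cutoff=90.0):
    """Percentage of residues covered by at least one window with mda < cutoff."""
    n = len(true)
    correct = [False] * n
    for start in range(n - window + 1):
        if window_mda(pred, true, start, window) < cutoff:
            for k in range(start, start + window):
                correct[k] = True
    return 100.0 * sum(correct) / n


# Toy data: a 10-residue helix whose first residue is predicted 180 degrees off in phi.
true_angles = [(-60.0, -45.0)] * 10
pred_angles = [(120.0, -45.0)] + [(-60.0, -45.0)] * 9
print(local_percent_correct(pred_angles, true_angles))  # 90.0
```

The tertiary-structure measure has the same shape, with a 30-residue window and a superposition RMSD below 6.0 Å in place of the angle test.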