Protein Structure Prediction

Protein Structure
• Amino-acid chains can fold to form 3-dimensional structures
• Proteins are sequences that have a (more or less) stable 3-dimensional configuration

Why is Structure Important?
The structure a protein takes is crucial for its function:
• Forms "pockets" that can recognize an enzyme substrate
• Positions the side chains of specific groups together to form regions with desired chemical/electrical properties
• Creates firm structures such as collagen, keratins, fibroins

Determining Structure
• X-ray and NMR methods make it possible to determine the structure of proteins and protein complexes
• These methods are expensive and difficult
• Processing one protein could take several months of work
A centralized database (PDB) contains all solved protein structures:
• XYZ coordinates of atoms within a specified precision
• ~19,000 solved structures

[Figure: Growth of the Protein Data Bank]

Structure is Sequence Dependent
• Experiments show that for many proteins, the 3-dimensional structure is a function of the sequence
  • Force the protein to lose its structure by introducing agents that change the environment
  • After the sequences are put back in water, the original conformation/activity is restored
• However, for complex proteins, there are cellular processes that "help" in folding

[Figure: Amino Acids]

What Forces Hold the Structure?
Structure is supported by several types of chemical bonds/forces:
• Hydrogen bonds
• Charge-charge interactions: positively charged groups prefer to be situated against negatively charged groups
• Disulfide bonds: S-S bonds between cysteine residues; these form during folding
• Hydrophobic effect

[Figure: Levels of structure]

Secondary Structure
• α-helix
• β-strands

[Figure: Hydrogen Bonds in α-Helices]

β-Strands form Sheets
• Parallel
• Anti-parallel
These sheets are held together by hydrogen bonds across strands.

Angular Coordinates
• Secondary structures force specific angles between residues

Ramachandran Plot
• We can relate angles to types of structures

Labeling Secondary Structure
• Using both hydrogen bond patterns and angles, we can assign secondary structure labels from the XYZ coordinates of the amino acids
• These do not lead to an absolute definition of secondary structure

Prediction of Secondary Structure
Input:
• Amino-acid sequence
Output:
• Annotation sequence of three classes:
  • alpha
  • beta
  • other (sometimes called coil/turn)
Measure of success:
• Percentage of residues that were correctly labeled

Protein Folds: sequential, spatial and topological arrangement of secondary structures
[Figure: The Globin fold]

Approaches for structure prediction
• Homology modeling (25-30% identity as a predictor)
• Fold recognition
  • Remote homology
• Ab initio prediction
  • Heavy computations

[Figure: Newly determined structures and the fraction of new folds among PDB new entries in 1998. Koppensteiner et al., 2000, JMB 296:1139-1152.]

A Finite Number of Protein Folds
Aim: recognize the fold that "matches" a given sequence
Approaches:
• PSI-Blast, Profile HMMs, etc.
• Threading

Threading: Essential components
• structural template
• neighbor definition
• energy function
The energy of a sequence placed on a template is the sum of pairwise energies over neighboring positions:
$E = \sum_{\text{positions } i,j} E_{a_i b_j}$
Example: for the sequence ACCECADAAC on a ten-position template, -3-1-4-4-1-4-3-3 = -23.
[Figure: a ten-position structural template labeled with residues A, C, D, E, alongside the pairwise energy table E_ab for residue types A, C, D, E]

Find best fold for a protein sequence: Fold recognition (threading)
[Figure: the query sequence MAHFPGFGQSLLFGYPVYVFGD... is threaded onto candidate folds 1) ... 56) ... n), and each threading receives a score, e.g. -10 and -123]
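The three threading components above (template, neighbor definition, energy function) can be sketched in a few lines. This is only an illustrative sketch: the energy table `PAIR_ENERGY`, the two templates in `TEMPLATES`, and their neighbor pairs are hypothetical values chosen for the example, not the numbers from the slide, and the sequence is assumed to be aligned to the template position by position with no gaps.

```python
# Hypothetical pairwise energies E_ab for a four-letter alphabet (lower is better).
# Keys are stored in sorted order so that E_ab == E_ba.
PAIR_ENERGY = {
    ("A", "A"): -3, ("A", "C"): -1, ("A", "D"): 0, ("A", "E"): 1,
    ("C", "C"): -4, ("C", "D"): 0, ("C", "E"): 2,
    ("D", "D"): 5, ("D", "E"): 6,
    ("E", "E"): 7,
}

# Hypothetical templates: each is a neighbor definition, i.e. a list of
# 0-based position pairs (i, j) that are in contact in that fold.
TEMPLATES = {
    "fold_1": [(0, 5), (1, 4), (2, 9), (5, 8), (0, 7)],
    "fold_2": [(0, 3), (3, 6), (6, 9), (1, 6)],
}


def threading_energy(sequence, contacts, pair_energy=PAIR_ENERGY):
    """Sum E_ab over all neighboring positions of a gapless threading."""
    total = 0
    for i, j in contacts:
        key = tuple(sorted((sequence[i], sequence[j])))
        total += pair_energy[key]
    return total


def best_fold(sequence, templates=TEMPLATES):
    """Return the template name with the lowest threading energy."""
    return min(templates, key=lambda name: threading_energy(sequence, templates[name]))


seq = "ACCECADAAC"
for name, contacts in TEMPLATES.items():
    print(name, threading_energy(seq, contacts))
print("best:", best_fold(seq))
```

A real threading program also has to search over alignments (with gaps) between the sequence and each template; the sketch skips that step and only shows how one fixed placement is scored.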
Essentials of GenTHREADER (Jones, 1999, JMB 287:797-815)
For each template, provide an MSA:
• align the query sequence with the MSA
• assess the alignment by sequence alignment score
• assess the alignment by pairwise potentials
• assess the alignment by solvation function
• record lengths of: alignment, query, template

Ab-initio Structure Recognition
Goal:
• Predict structure from "first principles"
Benefits:
• Works for novel folds
• Shows that we understand the process

Approaches to Ab-initio Prediction: Molecular Dynamics
• Simulates the forces that govern the protein within water
• Since proteins fold naturally, this would lead to the solved structure
Problems:
• Thousands of atoms
• Huge number of time steps to reach the folded protein
• Intractable problem

Approaches to Ab-initio Prediction: Minimal Energy
• Assumption: the folded form is the minimal energy conformation of the protein
Decomposition:
• Define an energy function
• Search for the 3-D conformation that minimizes the energy

Energy Function
• Account for the forces that act on the molecule:
  • Van der Waals forces
  • Covalent bonds
  • Hydrogen bonds
  • Charges
  • Hydrophobic effects
Issues:
• Estimating parameters
• How do we compute it? Evaluating all pairs is O((# atoms)^2)

Simplified Energy Functions
Different levels of granularity:
• Residue-residue energy function (bead model)
• Partial model
  • Backbone as a bead
  • Side chain as a rigid body that can move with respect to the backbone
• Many other variants

Search Strategy
• High-dimensional search problem
How do we represent partial solutions?
• Position of each atom (too detailed!)
• Position of each residue (too coarse!)
• Intermediate solutions (e.g., backbone and side chain)

Search Strategy: Representation tradeoffs
• X,Y,Z coordinates
  • Easy to compute distances between residues
  • Might represent infeasible solutions
• Angles between successive residues
  • Easy to ensure a "legal" protein
  • Harder to compute distances

Search Strategy: Typical approach
• Secondary structure prediction
• Attempts at different conformations keeping secondary structure fixed
• Finer moves relaxing secondary structure
Use:
• Greedy search
• Simulated annealing
• …

Rosetta Method
Idea:
• "Structural" signatures recur within protein structures
• Use these as cues during structure search

Local structure motifs
I-sites Library = a catalog of local sequence-structure correlations
• Diverging type-2 turn
• Frayed helix
• Serine hairpin
• Proline helix C-cap
• Type-I hairpin
• Alpha-alpha corner
• Glycine helix N-cap

[Figure: Example: Non-polar alpha-helix]
[Figure: Example: Non-polar beta-strand]
[Figure: Example: Gly alpha-C-cap Type 1]

Construction of I-sites library
• Construct profiles (PSI-BLAST like) for each solved structure
• Collect every possible segment of fixed length (len = 3, 9, 15)
• Perform k-means clustering of segments
• Check each cluster for a "coherent" structure (in terms of dihedral angles)
• Prune incoherent structures
• Iteratively refine the remaining clusters by removing structurally different segments, redefining cluster membership, etc.

All proteins can be constructed from fragments
Recent experiment: for representative proteins, backbones were assembled from a library of 1000 different 5-residue fragments.
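The angle representation and the simulated-annealing search mentioned under Search Strategy can be illustrated on a toy problem. The sketch below is not a protein force field: it assumes a 2D chain of beads with unit bond lengths, and `toy_energy`, its `contact` and `clash` cutoffs, and the cooling schedule are all hypothetical choices made for the example. Because the chain is stored as turning angles, every candidate is a "legal" chain, and distances have to be recomputed from the angles each time, which is the tradeoff described above.

```python
import math
import random


def chain_coords(angles):
    """Convert turning angles (radians) into 2D bead positions with unit bonds."""
    x, y, heading = 0.0, 0.0, 0.0
    coords = [(x, y)]
    for a in angles:
        heading += a
        x += math.cos(heading)
        y += math.sin(heading)
        coords.append((x, y))
    return coords


def toy_energy(angles, contact=1.5, clash=0.8):
    """Hypothetical energy: -1 per non-bonded contact, +10 per steric clash.

    The double loop over bead pairs is the O(n^2) cost noted for energy functions.
    """
    coords = chain_coords(angles)
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip directly bonded neighbors
            d = math.dist(coords[i], coords[j])
            if d < clash:
                energy += 10.0
            elif d < contact:
                energy -= 1.0
    return energy


def anneal(n_angles, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing over turning angles; returns the best state seen."""
    rng = random.Random(seed)
    angles = [0.0] * n_angles  # start from a fully extended chain
    energy = toy_energy(angles)
    best_angles, best_energy = list(angles), energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(n_angles)
        old = angles[i]
        angles[i] = old + rng.uniform(-0.5, 0.5)
        new_energy = toy_energy(angles)
        delta = new_energy - energy
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            energy = new_energy
            if energy < best_energy:
                best_angles, best_energy = list(angles), energy
        else:
            angles[i] = old  # reject the move and restore the previous angle
    return best_angles, best_energy
```

Calling `anneal(10)` returns a list of 10 turning angles and its energy. Greedy search is the special case that never accepts a move with `delta > 0`.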
Rosetta: a folding simulation program
Fragment insertion Monte Carlo over backbone torsion angles:
• Choose a fragment
• Change the backbone angles
• Convert to 3D
• Evaluate the energy function
• Accept or reject

Rosetta's energy function: sequence-dependent features
• Residue-residue contact energies are derived from the database

Rosetta's energy function: sequence-independent features
• Current structure, in vector representation
• Probabilities from the database
• The energy score for a contact between secondary structures is summed using database statistics

Rosetta prediction results
• 61% "topologically correct"
• 60% "locally correct"
• 73% secondary structure (Q3) correct
http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php

Evaluation of partially correct predictions
• Tertiary structure %correct is the fraction of the sequence that is in a 30-residue window with RMSD < 6.0Å
• Local structure %correct is the fraction of the sequence that has mda < 90°
• mda = maximum deviation in backbone angles over an 8-residue window
[Figure: RMSD along the sequence for window sizes L=30, L=20 and L=8 with the 6.0Å cutoff marked (tertiary structure), and MDA along the sequence with the 90° cutoff marked (local structure)]

T0116 262-322 (61 residues)
[Figure: prediction vs. true structure]
Topologically correct (rmsd=5.9Å) but a helix is mispredicted as a loop.

T0121 126-199 (66 residues)
[Figure: prediction vs. true structure]
Topologically correct (rmsd=5.9Å) but a loop is mispredicted as a helix.

T0122 57-153 (97 residues)
[Figure: prediction vs. true structure]
...contains a 53-residue stretch with max deviation = 96°

T0112 153-213
[Figure: prediction vs. true structure]
Low rmsd (5.6Å) and all angles correct (mda = 84°), but topologically wrong! (This is rare.)
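The local-structure measure defined above (mda over an 8-residue window, with a 90° cutoff) can be written out as a short sketch. Two points are assumptions rather than something the slides state: backbone angles are taken to be (phi, psi) pairs in degrees, and a residue is counted as correct if it lies in at least one window whose mda is below the cutoff, mirroring the wording of the tertiary-structure measure. The function names are hypothetical.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def window_mda(pred, true, start, window=8):
    """Maximum deviation in phi or psi over one window of residues."""
    return max(
        max(angle_diff(p[0], t[0]), angle_diff(p[1], t[1]))
        for p, t in zip(pred[start:start + window], true[start:start + window])
    )


def local_fraction_correct(pred, true, window=8, threshold=90.0):
    """Fraction of residues covered by at least one window with mda < threshold.

    pred and true are equal-length lists of (phi, psi) tuples in degrees.
    """
    if len(pred) != len(true):
        raise ValueError("predicted and true angle lists must have equal length")
    n = len(pred)
    if n < window:
        return 0.0  # no complete window fits, so nothing can be scored
    covered = [False] * n
    for start in range(n - window + 1):
        if window_mda(pred, true, start, window) < threshold:
            for k in range(start, start + window):
                covered[k] = True
    return sum(covered) / n
```

For a 12-residue helix in which only the first residue's phi is off by 180°, the only failing window is the one starting at residue 0, so 11 of the 12 residues count as locally correct.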