Structural Bioinformatics: Protein Tertiary Structure Prediction

The different levels of protein structure
• Primary: the linear amino acid sequence.
• Secondary: α-helices, β-sheets and loops.
• Tertiary: the 3D shape of the fully folded polypeptide chain.

The 3D structure of a protein is stored in a coordinate file. Each atom is represented by a coordinate in 3D (X, Y, Z), and the coordinate file can be viewed graphically. [Figure: retinol-binding protein (RBP); a description is given in slides 35-36.]

Predicting 3D structure
An outstanding, difficult problem. Two approaches:
• Based on sequence homology: comparative (homology) modeling.
• Based on structural homology: fold recognition (threading).

Comparative modeling
Similar sequences suggest similar structures. [Figure: sequence and structure alignments of two retinol-binding proteins.]

Structure alignments
There are many different algorithms for structural alignment. The outputs of a structural alignment are a superposition of the atomic coordinates and the minimal root mean square distance (RMSD) between the structures. The RMSD of two aligned structures indicates how far they diverge from one another: low RMSD values mean similar structures.

Comparative modeling builds a protein structure model based on the sequence alignment of the target to one or more related protein structures in the database. The accuracy of a comparative model is usually related to the sequence identity on which it is based:
• >50% sequence identity: high accuracy.
• 30%-50% sequence identity: about 90% of the protein can be modeled.
• <30% sequence identity: low accuracy (many errors).
However, other parameters (such as the length of the aligned region) can also influence the results.

Modeling a sequence based on known structures consists of four major steps:
1. Finding one or more known structures related to the sequence to be modeled (templates), using sequence comparison methods such as PSI-BLAST.
2. Aligning the sequence with the templates.
3. Building a model.
4. Assessing the model.

What is a good model?
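When a reference structure is available, one common way to judge a model is the RMSD between the two structures after optimal superposition, i.e. the minimum over rotations R and translations t of sqrt((1/N) Σ ||R·p_i + t − q_i||²). The sketch below computes this with the Kabsch algorithm using NumPy. It is one standard superposition method, not necessarily the one used by the alignment programs mentioned above, and it assumes the equivalent atoms (for example Cα atoms) of the two structures are already paired one-to-one in the same order. The function name `kabsch_rmsd` is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """Minimal RMSD between two (N, 3) coordinate sets of paired atoms,
    after optimal rigid-body superposition (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays of the same shape (N, 3)")

    # Remove the translation: center both sets on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)

    # 3x3 covariance matrix and its singular value decomposition.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so R is a proper rotation (no mirror image).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Rotate the centered P onto the centered Q and measure the deviation.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

A rotated and translated copy of a structure gives an RMSD of essentially zero, while genuinely different conformations give larger values. In practice the atom pairing has to come from a sequence or structure alignment first; this function does not perform that step.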
Fold recognition
Protein folds: the sequential and spatial arrangement of secondary structures. [Figure: example folds, globin and TIM barrel.]
Similar folds usually mean similar function. [Figure: homeodomain fold, found in transcription factors.]
However, the same fold can have multiple functions: the Rossmann fold has 12 different functions, and the TIM barrel has 31.

• Fold recognition attempts to detect similarities between protein 3D structures that have no significant sequence similarity.
• It searches for folds that are compatible with a particular sequence.
• It "turns the protein folding problem on its head": rather than predicting how a sequence will fold, it predicts how well a fold will fit a sequence.

Basic steps in fold recognition: compare the query sequence against a library of all known protein folds (a finite number).
Query sequence: MTYGFRIPLNCERWGHKLSTVILKRP...
Goal: find the folding template that the sequence fits best. There are different ways to evaluate the sequence-structure fit.
[Figure: fold library entries 1) ... 56) ... n), each with a sequence-structure fit score (values shown include -10, -123 and 20.5, with example sequence MAHFPGFGQSLLFGYPVYVFGD...); one entry is marked as the potential fold.]

Ab initio modeling
• Computes the molecular structure from the laws of physics and chemistry alone.
Theoretically the ideal solution; practically nearly impossible. Why?
– Exceptionally complex calculations.
– Our understanding of biophysics is incomplete.

How do we know what is a good prediction?
CASP: Critical Assessment of Structure Prediction
• A competition among different groups to predict the 3D structures of proteins that are about to be solved experimentally.
• Current state:
– Ab initio: the worst, but greatly improved in recent years.
– Comparative modeling: performs very well when homologous sequences with known structures exist.
– Fold recognition: performs well.

What can you do?
FOLDIT: Solve Puzzles for Science
A computer game to fold proteins: http://fold.it/portal/puzzles

What's next: predicting function from structure
Structural genomics: a large-scale structure determination project designed to cover all representative protein structures. [Figure: ATP-binding domain of protein MJ0577; Zarembinski et al., Proc. Natl. Acad. Sci. USA 95:15189 (1998).]
As a result of the structural genomics initiative, many structures of proteins with unknown function have been solved.
Wanted! Automated methods to predict function from the protein structures produced by the structural genomics project.

An "out of the box" approach for predicting function from structure
[Figure: a DNA-binding interface and an RNA-binding interface.]
RNA- and DNA-binding interfaces tend to have different geometric features.

Applying differential geometry to characterize DNA- and RNA-binding proteins
• k1: minimal curvature
• k2: maximal curvature
• H = (k1 + k2) / 2: mean curvature
• K = k1 · k2: Gaussian curvature
Surface points are classified by shape: peak, pit, ridge, valley, saddle ridge, saddle valley, flat, and minimal surface. [Figure: frequency of points in each shape class.]

Applying differential geometry to DNA and RNA function prediction
RNA-binding surfaces are distinguished from DNA-binding surfaces based on differential geometric features (76% for RNA-binding, 78% for DNA-binding). [Figure: frequency of points per shape class; RNA pattern vs. DNA pattern.]
Differential geometry can correctly determine whether a given binding domain binds RNA or DNA (Shazman et al., NAR 2011).

How can we view the protein structure?
• Download the coordinates of the structure from the PDB: http://www.rcsb.org/pdb/
• Launch a 3D viewer program. In this example we use PyMOL, which can be downloaded freely from the PyMOL homepage: http://pymol.org
• Load the coordinates into the viewer.

PyMOL example
• Launch PyMOL
• Open the file "1aqb" (PDB coordinate file)
• Display the sequence
• Hide everything
• Show main chain / hide main chain
• Show cartoon
• Color by ss (secondary structure)
• Color red
• Color green, resi 1:40
Help: http://pymol.org
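To see what the downloaded coordinate file actually contains, the sketch below reads atom records from a file in the classic PDB text format, using its fixed column layout (atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, X/Y/Z in 31-38, 39-46, 47-54). It assumes the legacy PDB format rather than mmCIF, keeps only the first model of multi-model (e.g. NMR) files, and is independent of PyMOL. The names `Atom`, `parse_atoms` and `ca_coordinates`, and the file name `1aqb.pdb` in the usage note, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Atom:
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "GLY"
    chain: str     # chain identifier, e.g. "A"
    res_seq: int   # residue sequence number
    x: float
    y: float
    z: float


def parse_atoms(lines: Iterable[str]) -> List[Atom]:
    """Read ATOM/HETATM records from PDB-format lines (first model only)."""
    atoms: List[Atom] = []
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # stop after the first model of a multi-model file
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21].strip(),
                res_seq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            ))
    return atoms


def ca_coordinates(atoms: List[Atom]) -> List[Tuple[float, float, float]]:
    """Return the (X, Y, Z) coordinates of the alpha-carbon atoms."""
    return [(a.x, a.y, a.z) for a in atoms if a.name == "CA"]
```

For example, `with open("1aqb.pdb") as f: atoms = parse_atoms(f)` would give one `Atom` per coordinate line, and `ca_coordinates(atoms)` the backbone trace that a viewer draws as a cartoon.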