Building 3D protein models: fold recognition / threading

Why make a structural model for your protein?
• The structure can provide clues to function through structural similarity with other proteins.
• With a structure it is easier to guess the location of active sites.
• With a structure we can plan more precise experiments in the lab.
• We can apply docking algorithms to the structures (both with other proteins and with small molecules).

Protein Modeling Methods
• Ab initio methods: solution of the protein folding problem; search in conformational space
• Energy-based methods: energy minimization; molecular simulation
• Knowledge-based methods: homology modeling; fold recognition / threading

Why do we need Ab Initio Methods?
• For new folds and for sequences with very little sequence homology (<15%).
(Data taken from PDB, http://www.rcsb.org/pdb/holdings.html)

Predicting Protein Structure: Threading / Fold Recognition

Basis
* It is estimated that there are only around 1,000 to 10,000 stable folds in nature.
* Fold recognition is essentially finding the best fit of a sequence to a set of candidate folds.
* Select the best sequence-fold alignment using a fitness scoring function.

The Threading Problem
• Find the best way to “mount” the residue sequence of one protein on a known structure taken from another protein.

Why is it called threading?
• Threading a specific sequence through all known folds.
• For each fold, estimate the probability that the sequence can have that fold.

Threading: Basic Strategy
[Figure: a query sequence (dhgakdflsdfjaslfkjsdlfjsdfjasd) is compared against a library of folds, followed by scoring and selection. Labels: spatial interactions, template, sequence.]

Protein Threading
• Conserved core segments
[Figure: two structurally similar proteins, A and B, with conserved core segments (labelled I, J, K, L); spatial adjacencies (interactions); a possible threading with a sequence.]

Input/Output of Protein Threading
• Core segments C[1..m]
• Amino acid sequence a[1..n]
• Pairwise amino acid scoring function g(…)
[Figure label: THREADING]

Fold recognition (Threading)
The sequence MAAGYAVLS + known protein folds → structural model
[Figure: the input sequence, annotated by residue type (H-bond donor, H-bond acceptor, glycine, hydrophobic), is compared against a library of folds of known proteins annotated in the same way. The three example folds score S = -2, Z = -1; S = 5, Z = 1.5; and S = 20, Z = 5.]
[Table: a scoring table indexed by position on the sequence and by amino acid type (A, N, D, …, C, …, Y), with additional Gop and Gext (gap-opening and gap-extension) columns. The individual values are not recoverable from the extracted text.]

Fold recognition / Threading
Disadvantages:
• Threading methods seldom lead to the alignment quality that is needed for homology modeling.
• Less than 30% of the predicted first hits are true remote homologues (PredictProtein).

Threading resources
• TOPITS: heuristic threader, part of a larger structure prediction system
• 3D-PSSM: integrated system; does its own MSA and secondary structure predictions and then threading
• GenTHREADER: similar to 3D-PSSM

Side chain construction
In homology modelling, the side chains are built from the template structures when there is high similarity between the protein being built and the templates. Without such similarity, the side chains can be built using rotamer libraries. The score is a compromise between the probability of a rotamer and its fit at the specific position. Comparing the scores of all the rotamers for a given amino acid determines the preferred rotamer.
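The rotamer scoring described above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not give a formula, so the score here is assumed to be the negative log of the rotamer-library probability plus a weighted clash (fit) energy, and the rotamer names, probabilities and energies are hypothetical.

```python
import math

def rotamer_score(probability, clash_energy, weight=1.0):
    """Assumed score (lower is better): rare rotamers and poor local fit are both penalised."""
    return -math.log(probability) + weight * clash_energy

def best_rotamer(candidates, weight=1.0):
    """Return the name of the candidate rotamer with the lowest score."""
    return min(candidates, key=lambda name: rotamer_score(*candidates[name], weight))

# Hypothetical candidates for one residue: name -> (library probability, clash energy)
candidates = {
    "g+": (0.60, 3.0),  # most common in the library, but clashes at this position
    "t": (0.30, 0.2),   # less common, fits well
    "g-": (0.10, 0.0),  # rare, fits perfectly
}

preferred = best_rotamer(candidates)
print(preferred)
```

With these made-up numbers the most probable rotamer loses to a less common one that fits its surroundings better, which is the compromise the slide describes. Setting `weight=0.0` ignores the fit term and simply returns the most probable rotamer.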
In spite of the huge size of the problem (each side chain influences its neighbours), there are quite successful algorithms for it. In this work we examined differences in the structures of amino acid side chains around point mutations.

• Conformation: a given set of dihedral angles that defines a structure.
• Rotamer: an energetically favourable conformation.
[Figure: example side chains, Asn and Phe.]

Ab initio
The sequence MAAGYAVLS → structural model

Ab initio methods for modelling
• This field is of great theoretical interest but has so far found very little practical application.
• There is no use of sequence alignments and no direct use of known structures.
• The basic idea is to build an empirical function that simulates the real physical forces and the potentials of chemical contacts.
• If we had a perfect function and could scan all possible conformations, we would be able to detect the correct fold.

Predicting Protein Structure: Ab Initio Methods
[Figure: flowchart. Labels: sequence; prediction; secondary structure; tertiary structure; energy minimization; mean field potentials; low energy structures; validation; predicted structure.]

Ab initio Methods
• Simplified models: simplified alphabet (HP); simplified representation (lattice)
• Build-up techniques
• Deterministic methods: quantum mechanics; diffusion equations
• Stochastic searches: Monte Carlo; genetic algorithms

Rosetta approach
• Rosetta (David Baker): consistently outstanding performer in the last two CASPs
• Integrated method
  – I-Sites: much finer-grained substructures than secondary structures. A library of all structures each amino acid 9-mer is found in (taken from PDB)
  – Heuristic global energy function to estimate the quality of folds
  – Monte Carlo search through assignments of I-Sites to minimize the energy function
• Also HMMSTR, an HMM-driven method for assigning I-Sites
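The idea of a Monte Carlo search that minimizes an energy function can be illustrated on the simplest model named in these slides, the HP alphabet on a lattice. The sketch below is not Rosetta and does not use I-Sites: it assumes a 2D square lattice, an energy of -1 for every pair of H residues that are lattice neighbours but not chain neighbours, and a Metropolis acceptance rule. The function names, the move set and the example sequence are all choices made for this illustration.

```python
import math
import random

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def fold(moves):
    """Turn a sequence of moves into lattice coordinates; return None if the chain overlaps itself."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in coords:
            return None
        coords.append(nxt)
    return coords

def hp_energy(sequence, coords):
    """-1 for every pair of H residues that touch on the lattice without being chain neighbours."""
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    energy -= 1
    return energy

def monte_carlo(sequence, steps=5000, temperature=1.0, seed=0):
    """Metropolis search over move strings; returns (lowest energy seen, its moves)."""
    rng = random.Random(seed)
    moves = ["R"] * (len(sequence) - 1)  # start from a fully extended chain
    energy = hp_energy(sequence, fold(moves))
    best = (energy, list(moves))
    for _ in range(steps):
        trial = list(moves)
        trial[rng.randrange(len(trial))] = rng.choice("UDLR")
        trial_coords = fold(trial)
        if trial_coords is None:  # reject self-overlapping chains
            continue
        trial_energy = hp_energy(sequence, trial_coords)
        delta = trial_energy - energy
        # Always accept downhill moves; accept uphill moves with Boltzmann probability
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            moves, energy = trial, trial_energy
            if energy < best[0]:
                best = (energy, list(moves))
    return best

best_energy, best_moves = monte_carlo("HPHPPHHPHH")
print(best_energy, "".join(best_moves))
```

The search keeps the lowest-energy conformation it has visited, so the result is never worse than the extended starting chain (energy 0). Occasionally accepting uphill moves is what lets the search escape local minima; lowering `temperature` makes it greedier.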
Rosetta prediction method
• Define a global scoring function that estimates the probability of a structure given a sequence
• Generate a version of I-Sites with fixed-length subsequences (9 amino acids)
  – Calculate P(I-Site | sequence) for all sequences and I-Sites
• Generate structures by Monte Carlo sampling of assignments of fixed-size I-Sites to subsequences
• End up with an ensemble of plausible structures

Rosetta is way ahead
• CASP 4 results.
• CASP 5 similar, but not as dramatic.

Fully automated predictions
• CAFASP-2
• Meta-servers work best
  – Integrate predictions from several other servers
  – Significantly better predictions than any individual approach
• Several public meta-servers available:
  – http://bioinfo.pl/Meta/ is best all-around
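One simple way to picture how a meta-server might integrate predictions from several servers is a confidence-weighted vote. The sketch below is an assumption made for illustration only: the slides do not say how any real meta-server combines results, and the server names, fold identifiers and confidence values are hypothetical.

```python
from collections import defaultdict

def consensus_fold(server_hits):
    """Pick the fold with the highest summed confidence across servers.

    server_hits maps a server name to a list of (fold_id, confidence) pairs,
    with confidence assumed to lie between 0 and 1.
    """
    totals = defaultdict(float)
    for hits in server_hits.values():
        for fold_id, confidence in hits:
            totals[fold_id] += confidence
    return max(totals, key=totals.get)

# Hypothetical output from three servers for the same query sequence
server_hits = {
    "server_a": [("fold_1", 0.50), ("fold_2", 0.40)],
    "server_b": [("fold_2", 0.45), ("fold_3", 0.30)],
    "server_c": [("fold_1", 0.35), ("fold_2", 0.30)],
}

print(consensus_fold(server_hits))
```

In this made-up example fold_2 is the top hit of only one server, yet it wins the consensus because every server lists it with reasonable confidence. That is one intuition for why combining servers can beat any individual approach.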