Comparative modeling
Ole Lund, Associate Professor, CBS, BioCentrum, DTU

Comparative modeling
- Also known as homology modeling
- Uses a template from a related protein to build the model
- Based on the finding that proteins with similar sequences often have similar structures
  - Protein structures tend to remain approximately the same even when many amino acids have changed during evolution
  - Selection for conservation of structure?

Why make structural models?
- Fast and cheap alternative to experimental determination of structures (X-ray and NMR)
  - Not as accurate as experimental methods
  - Not all proteins can be modeled with current methods
- Applications
  - Drug discovery (requires an accurate model)
  - Planning new experiments (mutations)
  - Understanding of function

Steps in comparative modeling
1. Find template
2. Make alignment
3. Build loops
4. Model side chains
5. Refinement
6. Evaluate model

Recovery from errors
- An error at an earlier step normally cannot be recovered at a later step
  - The alignment cannot make up for a wrong choice of template
  - Loop modeling cannot make up for a wrong alignment
- Errors may be discovered at a later step and corrected by going back to the earlier step, i.e.
  by selecting a new (and better) template

Template identification
- Search with sequence
  - Blast
  - Psi-Blast
  - Fold recognition methods
- Use significance levels (P or E values), not %ID
- BLAST reports E-values:
  - the number of random hits expected to be found with a given score or better
- Rather than P values:
  - the probability of finding at least one random hit with a given score or better
- P = 1 - exp(-E), and conversely E = -ln(1 - P)
  - http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
- Use biological information
  - Functional annotation in databases
  - Active site/motifs

Example: Query sequence
>gi|2065035|emb|CAA65601.1| beta-lactamase [Chryseobacterium meningosepticum]
MLKKIKISLILALGLTSLQAFGQENPDVKIEKLKDNLYVYTTYNTFNGTKYAANAVYLVTDKGVVVIDCP
WGEDKFKSFTDEIYKKHGKKVIMNIATHSHDDRAGGLEYFGKIGAKTYSTKMTDSILAKENKPRAQYTFD
NNKSFKVGKSEFQVYYPGKGHTADNVVVWFPKEKVLVGGCIIKSADSKDLGYIGEAYVNDWTQSVHNIQQ
KFSGAQYVVAGHDDWKDQRSIQHTLDLINEYQQKQKASN

Since the discovery of penicillin, bacteria have developed defense mechanisms against these drugs. This has become a particular problem during the last decades, as certain pathogenic bacteria have become resistant to antibiotics. The primary defense mechanism is the production of beta-lactamases, enzymes that cleave beta-lactam antibiotics.
http://www.matfys.kvl.dk/~antony/

Blast search vs. pdb
http://www.ncbi.nlm.nih.gov/blast/

>gi|3318914|pdb|1A7T|A  Chain A, Metallo-Beta-Lactamase With Mes
 gi|3318915|pdb|1A7T|B  Chain B, Metallo-Beta-Lactamase With Mes
 gi|3891997|pdb|1A8T|A  Chain A, Metallo-Beta-Lactamase In Complex With L-159,061
 gi|3891998|pdb|1A8T|B  Chain B, Metallo-Beta-Lactamase In Complex With L-159,061
 Length = 232

 Score = 126 bits (317), Expect = 7e-30
 Identities = 62/216 (28%), Positives = 111/216 (51%), Gaps = 1/216 (0%)

Query: 27  DVKIEKLKDNLYVYTTYNTFNG-TKYAANAVYLVTDKGVVVIDCPWGEDKFKSFTDEIYK 85
           D+ I +L D +Y Y +     G     +N + ++ +    ++D P  + + +   + +
Sbjct: 10  DISITQLSDKVYTYVSLAEIEGWGMVPSNGMIVINNHQAALLDTPINDAQTEMLVNWVTD 69

Query: 86  KHGKKVIMNIATHSHDDRAGGLEYFGKIGAKTYSTKMTDSILAKENKPRAQYTFDNNKSF 145
               KV   I  H H D  GGL Y  + G ++Y+ +MT  +  ++  P  ++ F ++ +
Sbjct: 70  SLHAKVTTFIPNHWHGDCIGGLGYLQRKGVQSYANQMTIDLAKEKGLPVPEHGFTDSLTV 129

Query: 146 KVGKSEFQVYYPGKGHTADNVVVWFPKEKVLVGGCIIKSADSKDLGYIGEAYVNDWTQSV 205
            +     Q YY G GH  DN+VVW P E +L GGC++K   +  +G I +A V  W +++
Sbjct: 130 SLDGMPLQCYYLGGGHATDNIVVWLPTENILFGGCMLKDNQTTSIGNISDADVTAWPKTL 189

Query: 206 HNIQQKFSGAQYVVAGHDDWKDQRSIQHTLDLINEY 241
             ++ KF  A+YVV GH ++     I+HT  ++N+Y
Sbjct: 190 DKVKAKFPSARYVVPGHGNYGGTELIEHTKQIVNQY 225

Template sequence
1A8TB. Chain B, Metallo-...[gi:3891998]

LOCUS       1A8T_B       232 aa       linear   BCT 23-MAR-1998
DEFINITION  Chain B, Metallo-Beta-Lactamase In Complex With L-159,061.
ACCESSION   1A8T_B
VERSION     1A8T_B  GI:3891998
DBSOURCE    pdb: molecule 1A8T, chain 66, release Mar 23, 1998;
            deposition: Mar 23, 1998;
            class: Hydrolase;
            source: Mol_id: 1; Organism_scientific: Bacteroides Fragilis;
            Strain: Tal3636; Variant: Clinical Isolate; Gene: Ccra;
            Expression_system: Escherichia Coli;
            Exp. method: X-Ray Diffraction.
KEYWORDS    .
SOURCE      Bacteroides fragilis
  ORGANISM  Bacteroides fragilis
            Bacteria; Bacteroidetes; Bacteroides (class); Bacteroidales;
            Bacteroidaceae; Bacteroides.
……………
ORIGIN
        1 aqksvkisdd isitqlsdkv ytyvslaeie gwgmvpsngm ivinnhqaal ldtpindaqt
       61 emlvnwvtds lhakvttfip nhwhgdcigg lgylqrkgvq syanqmtidl akekglpvpe
      121 hgftdsltvs ldgmplqcyy lggghatdni vvwlptenil fggcmlkdnq ttsignisda
      181 dvtawpktld kvkakfpsar yvvpghgnyg gteliehtkq ivnqyiests kp
//

Template recognition
BlaB - Beta lactamase
Template 1A8T Chain A

Alignment of query and template
- Look at the alignment used to find the template
  - Are secondary structure elements, active sites and other motifs aligned?
  - Can gaps be closed?
  - Is there room for the insertions?
- Change the alignment manually, or by using a different alignment program or different alignment parameters
  - Take care not to change it for the worse
  - On average I only make things slightly worse by manual intervention!

Alignment
BlaB - Beta lactamase

BLAB    EKLKDNLYVYTTYNTFNGTKY-AANAVYLVTDKGVVVIDCPWGEDKFKSFTDEIYKKHGKKVIMNIATHS
1A8T.A  TQLSDKVYTYVSLAEIEGWGMVPSNGMIVINNHQAALLDTPINDAQTEMLVNWVTDSLHAKVTTFIPNHW

BLAB    HDDRAGGLEYFGKIGAKTYSTKMTDSILAKENKPRAQYTFDNNKSFKVGKSEFQVYYPGKGHTADNVVVW
1A8T.A  HGDCIGGLGYLQRKGVQSYANQMTIDLAKEKGLPVPEHGFTDSLTVSLDGMPLQCYYLGGGHATDNIVVW

BLAB    FPKEKVLVGGCIIKSADSKDLGYIGEAYVNDWTQSVHNIQQKFSGAQYVVAGHDDWKDQRSIQHTLDLIN
1A8T.A  LPTENILFGGCMLKDNQTTSIGNISDADVTAWPKTLDKVKAKFPSARYVVPGHGNYGGTELIEHTKQIVN

BLAB    EYQQKQK
1A8T.A  QYIESTS

Sequence identity 27%

Template vs alignment identification
- If the template was hard to find, the correct alignment will be tough to make
- If the template is correct, part of the model will normally be correct

Build loops
- Fragment based methods
  - Many implementations (M Levitt, L Holm, D Baker etc.)
  - Fast
- Energy based methods
  - Avoid stereo-chemically infeasible solutions
  - Can see what is bad but not what is good!
- Combination of methods is often used
- No method can move the model (very much) towards the native conformation, i.e. reduce the root mean square deviation (RMSD), which measures how many Ångstrøms you are off
- http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php

Loops: The Rosetta method
- Find fragments (10 per amino acid) with the same sequence and secondary structure profile as the query sequence
- Combine them using a Monte Carlo scheme to build the loop
- Baker et al.

Model side chains
- Knowledge based methods
  - SCWRL performed well in CASP4 (http://dunbrack.fccc.edu/SCWRL3.php, http://dunbrack.fccc.edu/scwrl3protsci.pdf)
- Energy calculations
  - Slow

SCWRL (Bower, Cohen & Dunbrack)
- Sidechain placement With a Rotamer Library
- Assumes constant bond angles and bond lengths
1. Each residue begins in its most favored rotamer
2. Rotamer search to remove steric clashes between sidechains and backbone
3. Rotamer search to remove steric clashes between sidechains

Model (red) vs template (blue) [figure]

Model evaluation
- Is the structure unlikely?
- Distributions of
  - Dihedral angles (fraction in most favored regions)
  - Bond lengths and angles
- Procheck
  - www.biochem.ucl.ac.uk/~roman/procheck/procheck.html

Example of Procheck output [figure]

Benchmarking comparative modeling
- CASP
  - Critical Assessment of Structure Predictions
  - Sequences from about-to-be-solved structures are given to groups, who submit their predictions before the structure is published
- EVA
  - Newly solved structures are sent to prediction servers
  - Evaluates automatic servers

CASP4: Best overall fold
1. Venclovas, C
2. Baker, D
3. Sternberg, M
4. Rychlewski, L (Bioinfo.PL)
5. SBI-AT
Tramontano et al., 2001

CASP4: Best details of models
1. Venclovas, C
2. Sternberg, M
3. Honig, B
4. Baker, D
5. SBI-AT
Tramontano et al., 2001

Accuracy of SwissModel [figure]

EVA
http://cubic.bioc.columbia.edu/eva/cm/res/rank.html
Analysis of fold accuracy (% equivalent positions). Ranking of the methods:
1. sdsc1
2. 3djigsaw
3. SwissModel
4. cphmodels
5. esypred

Links to modeling servers
- Database of links - http://mmtsb.scripps.edu/cgi-bin/renderrelres?protmodel
- SwissModel - www.expasy.ch/swissmod/SM_FIRST.html
- SDSC1 - http://cl.sdsc.edu/hm.html
- 3D-Jigsaw - www.bmm.icnet.uk/servers/3djigsaw/
- ESyPred3D - http://www.fundp.ac.be/urbm/bioinfo/esypred/
- CPHmodels - www.cbs.dtu.dk/services/CPHmodels-2.0

Practical conclusions
- Several servers exist in the public domain
- Template and alignment must be correct
- Loops are difficult to model
- More info on comparative modeling
  - http://speedy.embl-heidelberg.de/gtsp/
  - http://www.cmbi.kun.nl/gv/course/index.html
  - http://www.umass.edu/microbio/chime/explorer/homolmod.htm
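
Worked example: E-values and P-values

The template identification slide relates the two significance measures by P = 1 - exp(-E). The sketch below converts in both directions. The function names e_to_p and p_to_e are made up for this illustration and are not part of BLAST or any library; the E-values in the loop are arbitrary examples, apart from 7e-30, which is the Expect value of the 1A7T/1A8T hit shown earlier.

```python
import math


def e_to_p(e_value: float) -> float:
    """Probability of at least one chance hit, P = 1 - exp(-E)."""
    if e_value < 0.0:
        raise ValueError("E-value must be non-negative")
    # -expm1(-E) equals 1 - exp(-E) but keeps precision for very small E.
    return -math.expm1(-e_value)


def p_to_e(p_value: float) -> float:
    """Expected number of chance hits, E = -ln(1 - P)."""
    if not 0.0 <= p_value < 1.0:
        raise ValueError("P-value must be in the range [0, 1)")
    # log1p(-P) equals ln(1 - P) but keeps precision for very small P.
    return -math.log1p(-p_value)


for e in (7e-30, 0.01, 1.0, 10.0):
    print(f"E = {e:g} -> P = {e_to_p(e):.3g}")
```

For small values the two measures are practically identical (P is approximately E), so the distinction only matters for weak hits with E around 0.1 or larger.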
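
Worked example: counting identities in an alignment

The BLAST report for the 1A8T hit states "Identities = 62/216 (28%)". The sketch below shows one way such a figure can be computed from the aligned query and subject strings, copied from the four alignment blocks of that report. The helper name count_identities is hypothetical, and counting gap columns in the denominator is an assumption chosen to match how the report states the total (216 columns).

```python
def count_identities(query: str, subject: str) -> tuple[int, int]:
    """Return (identical columns, total columns) for two aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as identical only if both residues match and
    # neither side is a gap character.
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)


# (query, subject) pairs for the four blocks of the BLAST alignment.
BLOCKS = [
    ("DVKIEKLKDNLYVYTTYNTFNG-TKYAANAVYLVTDKGVVVIDCPWGEDKFKSFTDEIYK",
     "DISITQLSDKVYTYVSLAEIEGWGMVPSNGMIVINNHQAALLDTPINDAQTEMLVNWVTD"),
    ("KHGKKVIMNIATHSHDDRAGGLEYFGKIGAKTYSTKMTDSILAKENKPRAQYTFDNNKSF",
     "SLHAKVTTFIPNHWHGDCIGGLGYLQRKGVQSYANQMTIDLAKEKGLPVPEHGFTDSLTV"),
    ("KVGKSEFQVYYPGKGHTADNVVVWFPKEKVLVGGCIIKSADSKDLGYIGEAYVNDWTQSV",
     "SLDGMPLQCYYLGGGHATDNIVVWLPTENILFGGCMLKDNQTTSIGNISDADVTAWPKTL"),
    ("HNIQQKFSGAQYVVAGHDDWKDQRSIQHTLDLINEY",
     "DKVKAKFPSARYVVPGHGNYGGTELIEHTKQIVNQY"),
]

query = "".join(q for q, _ in BLOCKS)
subject = "".join(s for _, s in BLOCKS)
identical, length = count_identities(query, subject)
# Integer division truncates the percentage.
print(f"Identities = {identical}/{length} ({100 * identical // length}%)")
```

As the template identification slide stresses, a percentage like this is only a description of the alignment; the decision to trust a template should rest on the E-value or P-value.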
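
Worked example: RMSD

The loop building slide describes the root mean square deviation (RMSD) as "how many Ångstrøms you are off". A minimal sketch of the calculation follows, assuming the two structures are already superimposed and their atoms are paired in the same order (no optimal superposition is performed here). The function name rmsd and the coordinates are invented for illustration and do not come from the BlaB model.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up coordinates in Ångstrøm: every model atom is shifted by 1 Å in x.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
model = [(x + 1.0, y, z) for x, y, z in native]
print(f"RMSD = {rmsd(native, model):.2f} Å")
```

In practice the model and the experimental structure are first superimposed (for example on their C-alpha atoms) before the deviation is measured; that step is deliberately left out of this sketch.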