Structural alignment - Structural Biology Labs
Protein structure
- Every protein is defined by a unique sequence (primary structure) that folds into a unique shape (tertiary, or three-dimensional, structure).
- However, proteins with similar sequences adopt very similar structures.
(Figure: cyclophilin from B. malayi and cyclophilin A from H. sapiens)

Why structural alignment?
We already have sequence alignment (e.g. Clustal):
KTHLCV
KSHA-V
- It gives us an idea of the correspondence between the amino acids of two (or more) proteins.
- That makes it possible to infer information about the function and evolution of the protein,
- if the sequences are similar enough!

What is the twilight zone?
- Sequence alignment unambiguously distinguishes between protein pairs of similar and non-similar structure only when the pairwise sequence identity is high.
- High sequence identity roughly means over 40 %.
- The signal gets blurred in the twilight zone of 20-35 % sequence identity.

More on the twilight zone
- More than 90 % of sequence pairs with a sequence identity lower than 25 % have different structures.
- The significance of a sequence alignment is length dependent: the longer the alignment, the lower the identity required for it to be called significant. Nevertheless, the threshold converges to 25 % for alignments longer than 80 amino acids.
- The 'more similar than identical' rule can reduce the number of false positives.
- Using intermediate sequences to find links between more distant families can also reduce the number of false positives.

How far can the sequence identity drop?
- Average sequence identity of random alignments: 5.6 %
- Average sequence identity of remote homologues: 8.5 %

How does it work?
(Figure from http://www.biochem.unizh.ch/antibody/Introduction/Institutsseminar97/source/slide2.htm)

Numbers
Given an average protein length of 300 amino acids, there are 20^300 possible ways of building the average protein, more than the number of atoms in the universe.
In reality, only a few hundred thousand sequences are known.
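The size of this sequence space is easy to check with exact integer arithmetic. The sketch below is only an illustration: the constant names are made up for this example, and the figure of 10^80 atoms is an assumed, commonly quoted order-of-magnitude estimate that does not come from the slides.

```python
# Size of the sequence space for a protein of average length.
ALPHABET_SIZE = 20             # the 20 standard amino acids
AVERAGE_LENGTH = 300           # average protein length quoted on the slide
ATOMS_IN_UNIVERSE = 10 ** 80   # assumption: rough order-of-magnitude estimate

# Python integers have arbitrary precision, so this is exact.
possible_sequences = ALPHABET_SIZE ** AVERAGE_LENGTH
digits = len(str(possible_sequences))

print(digits)                                  # 391
print(possible_sequences > ATOMS_IN_UNIVERSE)  # True
```

In other words, 20^300 is a 391-digit number, so a few hundred thousand known sequences cover a vanishingly small part of the space.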
It is believed that the number of basic protein folds is between 1,500 and 5,000.

Structural alignment, because:
- Structures are better conserved than sequences.
- Structural alignment can imply a functional similarity that is not detectable from a sequence alignment.
- It might help to improve a sequence alignment when structures are available (phylogenetic studies, homology modeling).
- It will improve sequence alignment methods (substitution matrices and gap penalties derived from structural alignments).
- It will improve sequence prediction methods.

Sequence versus structural alignment
1   2   3   4   5   6   7   8   9   10  11  12  13  14
PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS

Is it difficult to make a structural alignment?
Yes, it is.
- Structural alignment is an NP-hard problem, that is, at least as hard as any problem solvable in nondeterministic polynomial time. In other words, no efficient exact algorithm is known for it.
- Even if an exact solution were available, the result would be correct from a technical point of view, but not necessarily from a biological one.

General solution
Use a heuristic approach:
1. Represent proteins A and B in some coordinate-independent space.
2. Compare A and B.
3. Optimize the alignment between A and B (e.g. minimize the R.M.S.D.).
4. Measure the statistical significance of the alignment against some random set of structure comparisons.

"...in some coordinate-independent space..."
Make the problem easier by:
- comparing only distance matrices of atoms
- comparing secondary structure elements (SSEs)
- comparing cartoons
- comparing vectors of SSEs
- combining the methods mentioned above
- ...
None of the methods guarantees finding the closest structure, and two methods can disagree at all amino acid positions. Nevertheless, they can still provide valuable insight into the history of the protein and give hints about its function.
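To make the idea of a coordinate-independent representation concrete, here is a minimal sketch of the distance-matrix option from the list above. It is not the algorithm of any particular server: the function names and the toy coordinates are invented for illustration, and it assumes that the residue correspondence between the two structures is already known (point i of A matches point i of B).

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between points (e.g. C-alpha atoms)."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def distance_matrix_rmsd(coords_a, coords_b):
    """Root-mean-square difference between two intramolecular distance matrices.

    Assumes point i of A corresponds to point i of B.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of points")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(coords_a)
    # Only the upper triangle is needed: the matrices are symmetric.
    squared = [(da[i][j] - db[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(squared) / len(squared)) if squared else 0.0

def rotate_z_and_shift(coords, angle, shift):
    """Rotate points about the z axis by `angle` radians, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]

# Toy coordinates, made up for this example (not a real structure).
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.1, 4.2, 2.0)]
moved = rotate_z_and_shift(trace, math.radians(40), (10.0, -5.0, 2.5))
distorted = trace[:-1] + [(9.0, 9.0, 9.0)]

print(distance_matrix_rmsd(trace, moved))      # essentially zero
print(distance_matrix_rmsd(trace, distorted))  # clearly above zero
```

Because the internal distances do not change when a structure is rotated or translated, the two proteins can be compared without superimposing them first. The hard part that the real methods address, finding the residue correspondence itself, is deliberately left out here.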
Methods for fold comparison
Server   Location                                                    Method
CE       http://cl.sdsc.edu                                          Extension of optimal path [1]
DALI     http://www2.ebi.ac.uk/dali                                  Distance-matrix alignment [2]
DEJAVU   http://portray.bmc.uu.se/cgi-bin/dennis/dejavu.pl           SSE alignment with Cα-atom optimisation [3]
LOCK     http://gene.stanford.edu/LOCK/                              Absolute orientation of corresponding points [4]
MATRAS   http://bongo.lab.nig.ac.jp/~takawaba/Matras.html            Markov transition model of evolution [5]
PRIDE    http://hydra.icgeb.trieste.it/pride/                        Cα-Cα atom distances [6]
SSM      http://www.ebi.ac.uk/msd-srv/ssm/ssmstart.html              Graph matching algorithm
TOP      http://bioinfo1.mbfys.lu.se/TOP                             SSE alignment [7]
TOPS     http://tops.ebi.ac.uk/tops/compare1.html                    TOPS-diagram alignment [8]
TOPSCAN  http://www.rubic.rdg.ac.uk/~andrew/bioinf.org/topscan       Secondary topology-string alignment [9]
VAST     http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html  Vector alignment [10]

Protein structure classification
If you want to know which structures are similar to a known structure, these systems might help:
A) Manual - SCOP
B) Semi-automatic - CATH
C) Automatic - FSSP

CATH
C (class) - secondary structure composition
A (architecture) - overall shape, orientation of secondary structure elements
T (topology) - overall shape, orientation of secondary structure elements + connectivity
H (homologous superfamily) - any one of the following:
- sequence identity >= 35 %, with 60 % of the larger structure equivalent to the smaller
- SSAP score >= 80.0 and sequence identity >= 20 %, with 60 % of the larger structure equivalent to the smaller
- SSAP score >= 80.0, with 60 % of the larger structure equivalent to the smaller, and domains that have related functions
S (sequence families) - clustering based on the level of sequence identity

Summary
- Structural alignment can help with protein annotation even when the sequence similarity is not significant.
- The sequence identity of two proteins with similar structures can be lower than 10 %; the number of folds is limited.
- Recent progress in protein structure determination increases the usefulness of structural alignment.
- Structural alignment is a difficult problem that is solved by heuristic methods.
- These methods simplify the problem by moving from 3D space to 2D space, sacrificing the optimal result for speed.

Summary II
- Different methods can provide completely different alignments.
- In our results, CE, Dali, Matras and Vast were the best servers for finding structural relatives.
- Several structural classification systems have been developed (CATH, FSSP, SCOP); they provide a hierarchical classification of protein structures and make it possible to infer functional and evolutionary relationships between proteins.
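One simple way to quantify how much two methods disagree is to compare the residue pairs that each alignment proposes. The sketch below is an illustration only: the function names are hypothetical, and it assumes that the two gapped rows of the 14-column example shown earlier are two alternative alignments of the same second protein against the first.

```python
def aligned_pairs(ref_row, query_row, gap="---"):
    """Set of (ref_index, query_index) residue pairs matched by an alignment."""
    if len(ref_row) != len(query_row):
        raise ValueError("alignment rows must have the same number of columns")
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for ref_res, query_res in zip(ref_row, query_row):
        if ref_res != gap and query_res != gap:
            pairs.add((i, j))
        if ref_res != gap:
            i += 1
        if query_res != gap:
            j += 1
    return pairs

def agreement(pairs_a, pairs_b):
    """Jaccard index: pairs shared by both alignments / pairs in either one."""
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 1.0

reference = "PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS".split()
# Assumption: two alternative alignments of the same second protein.
alignment_1 = "PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS".split()
alignment_2 = "PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS".split()

pairs_1 = aligned_pairs(reference, alignment_1)
pairs_2 = aligned_pairs(reference, alignment_2)
shared = pairs_1 & pairs_2

print(len(pairs_1), len(pairs_2), len(shared))  # 11 11 9
print(round(agreement(pairs_1, pairs_2), 3))    # 0.692
```

Here the two alignments differ only in where the three-residue gap is placed, yet they already disagree about the partners of two residues (THR and PRO). Alignments produced by different servers can diverge far more.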