Why align sequences?

Lots of sequences with unknown structure and function. A few sequences with known structure and function.
  If they align, they are similar.
  If they are similar, then they might have the same structure or function.
  If one of them has known structure/function, then alignment to the other yields insight about how the structure or function works.

©CMBI 2005

Sequence Alignment

The purpose of a sequence alignment is to line up all residues in the sequences that were derived from the same residue position in the ancestral gene or protein.

[Figure: sequences A and B; gap = insertion or deletion]

Alignment

To carry over information from a well studied protein sequence and its structure to a newly discovered protein sequence, we need a sequence alignment that represents the protein structures today, a structural alignment.

Alignment

The implicit meaning of placing amino acid residues below each other in the same column of a protein (multiple) sequence alignment is that they are at the “same” position in the 3D structures of the corresponding proteins!

Two very simple examples:
1) the 3 active site residues of the serine protease we saw earlier
2) Cys-bridges:

   STCTKGALKLPVCRK
   TSCTEG--RLPGCKR

Things one can do with a good alignment

Carry information from a well studied to a less well studied protein. Such information can be:
  Phosphorylation sites
  Glycosylation sites
  Stabilizing mutations
  Membrane anchors
  Ion binding sites
  Ligand binding residues
  Cellular localization

Significance of alignment

One can only transfer information if the similarity between the two sequences is significantly high. Schneider (group of Sander) determined the “threshold curve” for transferring structural information from one known protein structure to another protein sequence:

If the sequences are > 80 aa long, then >25% sequence identity is enough to reliably transfer structural information.
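As a rough illustration of the rule just quoted, the sketch below computes percent identity for an aligned pair and applies only the "> 80 aa and > 25% identity" statement. Two things here are assumptions, not part of the slides: identity is counted over columns where neither sequence has a gap (other definitions exist), and "length" is taken as the number of such columns. The function names are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def may_transfer_structure(a: str, b: str, gap: str = "-"):
    """Apply only the rule quoted in the text (> 80 aa and > 25% identity).

    Returns None for shorter alignments: the threshold curve for those is
    not given in the text, so no verdict is attempted.
    """
    aligned = sum(1 for x, y in zip(a, b) if x != gap and y != gap)
    if aligned <= 80:
        return None
    return percent_identity(a, b, gap) > 25.0


# The Cys-bridge pair from the slides: 6 identical residues in 13 ungapped columns.
print(round(percent_identity("STCTKGALKLPVCRK", "TSCTEG--RLPGCKR"), 1))
print(may_transfer_structure("STCTKGALKLPVCRK", "TSCTEG--RLPGCKR"))
```

The short Cys-bridge pair is far below 80 residues, so the second call gives no verdict even though its identity is high.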
If the sequences are smaller in length, a higher percentage of identity is needed.

Structure is much more conserved than sequence!

Significance of alignment (2)

Aligning sequences by hand

Most information that enters the alignment procedure comes from the physico-chemical properties of the amino acids.

Examples: which is the better alignment (left or right)?

1) CPISRTWASIFRCW     CPISRTWASIFRCW
   CPISRT---LFRCW     CPISRTL---FRCW

2) CPISRTRASEFRCW     CPISRTRASEFRCW
   CPISRTK---FRCW     CPISRT---KFRCW

Aligning sequences by hand (2)

The procedure of aligning depends on the information available:
1) Use “only” the identity of the amino acid and its physico-chemical properties. This is more or less what alignment programs do.
2) Also use explicitly the secondary structure preference of the amino acids.
3) Use 3D information if one or more of the structures in the alignment are known.

In most cases you will start with an alignment program (e.g. CLUSTAL) and then use your knowledge of the amino acids to improve the alignment, for instance by correcting the position of gaps.
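The left-or-right examples above can be scored with a very small sketch. The residue grouping and the 2/1/0 scores below are illustrative assumptions, not a standard substitution matrix and not what CLUSTAL uses; they only show how physico-chemical similarity can decide between two gap placements.

```python
# Coarse physico-chemical groups (an illustrative assumption).
GROUPS = [
    set("AVLIM"),  # aliphatic / hydrophobic
    set("FWY"),    # aromatic
    set("KRH"),    # positively charged
    set("DE"),     # negatively charged
    set("STNQ"),   # polar, uncharged
]


def column_score(x: str, y: str) -> int:
    """2 for identical residues, 1 for same group, 0 otherwise or for gaps."""
    if x == "-" or y == "-":
        return 0
    if x == y:
        return 2
    if any(x in g and y in g for g in GROUPS):
        return 1
    return 0


def alignment_score(top: str, bottom: str) -> int:
    """Sum the column scores of two aligned sequences of equal length."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    return sum(column_score(x, y) for x, y in zip(top, bottom))


examples = {
    "1 left": ("CPISRTWASIFRCW", "CPISRT---LFRCW"),   # L under I
    "1 right": ("CPISRTWASIFRCW", "CPISRTL---FRCW"),  # L under W
    "2 left": ("CPISRTRASEFRCW", "CPISRTK---FRCW"),   # K under R
    "2 right": ("CPISRTRASEFRCW", "CPISRT---KFRCW"),  # K under E
}
for name, (top, bottom) in examples.items():
    print(name, alignment_score(top, bottom))
```

Under these assumed groups the left alignment scores higher in both examples, because it pairs Leu with Ile and Lys with Arg.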
Helix

[Figure]

Helix preferences

Position   -4   -3   -2   -1    1    2    3    4    5   total
State       -    -    -    -    H    H    H    H    H
ALA       143  148   99   58  189  205  187  241  268    1538
CYS        24   31   29   22   14   17   18   33   17     205
ASP        98  110  121  260   98  197  167   49   86    1186
GLU        91  100   71   71  152  287  269   70  147    1258
PHE        53   70   90   29   68   46   49  107   65     577
GLY       207  246  166  192   96  127   99   65   60    1258
HIS        48   50   39   46   28   36   38   24   30     339
ILE        94   81  133   19   79   45   68  161   99     779
LYS        99   98   80   46   98  105   69   80  154     829
LEU       105  111  188   50  140   84  113  281  209    1281
MET        37   20   51   13   26   22   54   61   67     351
ASN       103   83   89  206   46   62   55   37   77     758
PRO       143  136  121   99  240   78   40    0    0     857
GLN        48   58   40   38   83   93  124   76  101     661
ARG        82   63   59   51   71   75   61  114  109     685
SER       112  128   98  292  105  126   99   48   76    1084
THR       106   99  119  253   91   80  115   72   67    1002
VAL       141  107  132   37  117   74  120  208  120    1056
TRP        29   25   29   14   30   26   28   30   29     240
TYR        66   65   75   33   58   44   56   72   48     517

Helix preferences and alignment

1) S G V S P D Q L A A L K L I L E L A L K
2) G T S L E T A L L M Q I A Q K L I A G

[Figure: preferred helix positions (-4 to 5) annotated under the residues of both sequences]

Final alignment:

S G V S P D Q L A A L K L I L E L A L K
- G T S L E T A L L M Q I A Q K L I A G

A ‘real’ example of threading

[Figure: structures 1 and 2]

If you know that in structure 1 the Ala is pointing outside and the Ser is pointing inside: where does the Arg in structure 2 go? (And what will CLUSTAL choose?)

An even more real example

  1    2    3    4    5    6    7    8    9    10   11
  ILE  CYS  ARG  LEU  PRO  GLY  SER  ALA  GLU  ALA  VAL
  VAL  CYS  ARG  THR  PRO  ---  ---  ---  GLU  ALA  ILE
  VAL  CYS  ARG  ---  ---  ---  THR  PRO  GLU  ALA  ILE

[Figure: the same alignment with one-letter residue labels per column]

Multiple sequence alignment

Multiple sequence alignments can confirm or improve pair-wise sequence alignments:

CWPVAASYGR
CWPT---YGR
CWPTA-SYGR
CWPTLGLFGR

CWPVAASYGR
?
CWPTA-SYGR

Multiple sequence alignment

Multiple sequence alignments can reveal structural information:

ASCTRGCIKLPTCKKMGRCTGY
STCTKGALKLPVCRKMGKSSAY
ATSTHGCMKLPCSRRFGKCSSY
TSCTEGCLRLPGCKRFGRCTSY
TTCTKGLLKLPGCKRFGKSSAY
ASSTKGCMKLPVSRRFGRCTAY

Multiple sequence alignment

Multiple sequence alignments can validate PROSITE search results. In N-{P}-[ST]-{P} the N is the glycosylation site. The chance of finding N-{P}-[ST]-{P} is rather high. So how can you be sure? Look at the multiple sequence alignment:

ASLRNASTVVTIGDTITGNLTLASYHW
GSIKNGSSVITLPGTMEGNLSTTTYHY
ATLRNASTVMEINGTITGDLTLASFHW

Summary

  Bioinformatics is all about obtaining information.
  Everything you can find in a database saves you doing experiments.
  Sequence alignment is important for carrying over information between ‘similar proteins’.
  To align sequences, you need to understand the amino acids.
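The PROSITE check on the N-{P}-[ST]-{P} slide can be sketched in a few lines. This is a minimal illustration that assumes the three sequences are already aligned without gaps, as they are on the slide, so that string positions are alignment columns. The pattern is translated by hand into a regular expression, and the function names are hypothetical.

```python
import re

# N-{P}-[ST]-{P} as a regular expression. The lookahead makes overlapping
# hits visible; note that [^P] would also match a gap character, which is
# why an ungapped alignment is assumed here.
SITE = re.compile(r"(?=N[^P][ST][^P])")


def site_columns(seq: str) -> set:
    """1-based columns of every N that starts a pattern match."""
    return {m.start() + 1 for m in SITE.finditer(seq)}


def conserved_sites(aligned: list) -> list:
    """Columns where the pattern matches in every sequence of the alignment."""
    hits = [site_columns(s) for s in aligned]
    if not hits:
        return []
    return sorted(set.intersection(*hits))


alignment = [
    "ASLRNASTVVTIGDTITGNLTLASYHW",
    "GSIKNGSSVITLPGTMEGNLSTTTYHY",
    "ATLRNASTVMEINGTITGDLTLASFHW",
]
for seq in alignment:
    print(seq, sorted(site_columns(seq)))
print("conserved:", conserved_sites(alignment))
```

Each sequence has two hits on its own, but only the N in column 5 matches in all three, which is the kind of support the slide asks for before trusting a hit.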