Protein Architecture: Four Levels

(A) Sequence alignment

Problem: How similar are two sequences?
s = B A N A N A
t = A N A N A S

More precisely:

#1) Find the weight of the transformation that turns s into t, where each path is a specific sequence of mutations and the weights of all paths are summed.
[Figure: two mutation paths from s to t, one via x and y with weights $p_1, p_2, p_3$, one via x' and y' with weights $p'_1, p'_2, p'_3$]
$p = p_1 p_2 p_3 + p'_1 p'_2 p'_3$

#2) Align the amino acid sequences such that the weight of the mutations is maximised, e.g.:
B A N A N A
  A N A N A S

Requirement: "Scoring matrix" for
• mutations $s_i \to t_i$: $p_{s_i,t_i}$
• insertion / deletion (gap): $p_G$
Common scoring matrices: BLOSUM, PAM
[Figure: BLOSUM80 scoring matrix]

Example:
s = R I - L V S D K V I
t = R I S L V - - K A I
$p = 1 \cdot 1 \cdot p_G \cdot 1 \cdot 1 \cdot p_G^2 \cdot 1 \cdot p_{VA} \cdot 1$
[Figure: alignment grid with s along one axis and t along the other. Diagonal steps carry the weight $p(s_i, t_j)$, horizontal and vertical steps carry $p_G$. Cell $w_{i,j}$ is reached from $w_{i-1,j-1}$, $w_{i-1,j}$ and $w_{i,j-1}$.]

Task:
• Find the shortest (weighted) path, i.e. the alignment of maximal weight (#2)
• Sum over all paths (#1)

Number of possible paths/alignments:
$\binom{2n}{n} \approx \frac{2^{2n}}{\sqrt{\pi n}}$
☞ n = 100: $10^{59}$
☞ n = 1000: $10^{600}$
→ NP-problem? No!
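The path count quoted above can be checked numerically. The following is a minimal sketch that compares the exact binomial coefficient with the approximation $2^{2n}/\sqrt{\pi n}$; the function names are illustrative and not part of the lecture.

```python
import math


def alignment_path_count(n: int) -> int:
    """Exact value of the slide's path count, the binomial coefficient C(2n, n)."""
    return math.comb(2 * n, n)


def log10_stirling_estimate(n: int) -> float:
    """log10 of the approximation 2^(2n) / sqrt(pi * n)."""
    return 2 * n * math.log10(2) - 0.5 * math.log10(math.pi * n)


for n in (100, 1000):
    exact_log10 = math.log10(alignment_path_count(n))
    # Print n, the exact exponent and the approximated exponent side by side.
    print(n, round(exact_log10, 2), round(log10_stirling_estimate(n), 2))
```

Both exponents agree to within a fraction of a percent, so brute-force enumeration of all alignments is out of reach already for n = 100.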
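Needleman / Wunsch (1970), Smith / Waterman (1981)

Idea (analogous to the path integral for the Schrödinger eq.): compute the complete sum $w_{ij}$ over all paths to (i, j) recursively:

$w_{ij} = w_{i-1,j-1}\, p_{s_i,t_j} + w_{i-1,j}\, p_G + w_{i,j-1}\, p_G$   (solves #1)
$w_{ij} = \max\{\, w_{i-1,j-1}\, p_{s_i,t_j},\; w_{i-1,j}\, p_G,\; w_{i,j-1}\, p_G \,\}$   (solves #2)

Computational cost: $O(n^2)$ (like a route planner)
"Dynamic programming"

Close relation to Smoluchowski/Feynman path integrals

[Figure: paths from $(x_0, t_0)$ to $(x_1, t_1)$, each weighted by its action]

$\psi(x_1, t_1) = \int dx_0\, \psi(x_0, t_0) \int_{\text{all paths}} \mathcal{D}x(t)\, \exp\left( \frac{i}{\hbar} \int_{(x_0,t_0)}^{(x_1,t_1)} dt\, L(x, \dot{x}, t) \right)$

with $e^{iS/\hbar} = \exp\left( \frac{i}{\hbar} \int_{(x_0,t_0)}^{(x_1,t_1)} dt\, L(x, \dot{x}, t) \right)$.

Discretisation: $x_0, x_1, x_2, \dots, x_n$

$\psi(x_n, t_n) = \int dx_0\, \psi(x_0, t_0) \int dx_1\, e^{\frac{i}{\hbar} \int_{t_0}^{t_1} dt\, L} \cdots \int dx_{n-1}\, e^{\frac{i}{\hbar} \int_{t_{n-1}}^{t_n} dt\, L}$

$\psi(x_{i+1}, t_{i+1}) = \int dx_i\, \psi(x_i, t_i)\, e^{\frac{i}{\hbar} \int_{t_i}^{t_{i+1}} dt\, L}$
$= \int dx_i\, \psi(x_i, t_i) \exp\left( \frac{i}{\hbar} \int_{t_i}^{t_{i+1}} dt \left( \frac{\dot{x}^2}{2} - V(x) \right) \right)$
$= \int dx_i\, \psi(x_i, t_i) \exp\left( \frac{i}{\hbar} \int_{t_i}^{t_{i+1}} dt \left( \frac{1}{2} \left( \frac{x_{i+1} - x_i}{\Delta t} \right)^2 - V(x) \right) \right)$

Develop $\psi(x, t)$ in powers of $\Delta x$ … (mass m = 1):

$\frac{\partial \psi}{\partial t} = \frac{i}{\hbar} \left( \frac{\hbar^2}{2} \frac{\partial^2}{\partial x^2} - V(x) \right) \psi$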
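The two recursions for $w_{ij}$ can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the weights (match 1, mismatch 0.1, gap 0.3) are made-up stand-ins for a real scoring matrix such as BLOSUM, and the function name is hypothetical.

```python
def alignment_weight(s, t, p_gap=0.3, p_mismatch=0.1, mode="sum"):
    """Fill the table w[i][j] with the multiplicative recursion.

    mode="sum": total weight of all paths (task #1).
    mode="max": weight of the single best alignment (task #2).
    The default weights are hypothetical toy values.
    """
    if mode not in ("sum", "max"):
        raise ValueError("mode must be 'sum' or 'max'")
    combine = sum if mode == "sum" else max
    n, m = len(s), len(t)
    w = [[0.0] * (m + 1) for _ in range(n + 1)]
    w[0][0] = 1.0  # the empty alignment has weight 1
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                # Diagonal step: align s[i-1] with t[j-1].
                p = 1.0 if s[i - 1] == t[j - 1] else p_mismatch
                candidates.append(w[i - 1][j - 1] * p)
            if i > 0:
                # Vertical step: gap in t.
                candidates.append(w[i - 1][j] * p_gap)
            if j > 0:
                # Horizontal step: gap in s.
                candidates.append(w[i][j - 1] * p_gap)
            w[i][j] = combine(candidates)
    return w[n][m]


best = alignment_weight("BANANA", "ANANAS", mode="max")
total = alignment_weight("BANANA", "ANANAS", mode="sum")
print(best, total)
```

With these toy weights the best alignment of BANANA and ANANAS uses two gaps and five matches. Each table cell is filled once from three neighbours, which gives the $O(n^2)$ cost.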
Sequence alignment: hemoglobin of mammals

Sequence comparison: hemoglobin alpha chain vs beta chain
window size: 31, match: +5, mismatch: -4

(B) Phylogenetic trees

Given: N sequences s(1), … s(N)
Task: Find the most probable evolutionary tree:
• Example
  s(1) = B A N A N A
  s(2) = A N A N A S
  s(3) = H O T D O G
  [Figure: tree of the three sequences, axis: distance]
• Cost: NP-complete
☞ Trees for different proteins are (usually) similar
☞ Reconstruction of evolution
Problem: horizontal gene transfer

Examples:
• Phylogenetic tree of dogs (Nature 438, 803-819)
• Phylogenetic tree of vertebrates (Nature 496, 311-316)
• Phylogenetic tree of ribosomal RNA (Wikimedia)
• Phylogenetic tree of Indo-European languages (Science 337, 957-960 (2012))

(C) Structure prediction: from sequence to structure

• "Folding problem"
• Ab initio → only possible for smallest proteins (since recently)

(a) Secondary structure prediction

Chou-Fasman method (empirical)
• Calculate propensities from known structures (A: amino acid, S: secondary structure):
  $P(S|A) \propto \frac{P(A|S)}{P(A)} = \frac{n_{A,S}/n_S}{n_A/n}$
• Search for regions with high (average) propensities for certain secondary structures
• Search secondary structure boundaries (e.g., "helix breakers" such as proline)
☞ 75% prediction rate (compare to random guess: 33%)

log frequencies of amino acids in secondary structure elements

A.A.   P<a>   P<b>   P<t>   Hp
A      1.42   0.83   0.66    1.80
R      0.98   0.93   0.95   -4.50
N      0.67   0.89   1.56   -3.50
D      1.01   0.54   1.46   -3.50
C      0.70   1.19   1.19    2.50
Q      1.11   1.10   0.98   -3.50
E      1.51   0.37   0.74   -3.50
G      0.57   0.75   1.56   -0.40
H      1.00   0.87   0.95   -3.20
I      1.08   1.60   0.47    4.50
L      1.21   1.30   0.59    3.80
K      1.16   0.74   1.01   -3.90
M      1.45   1.05   0.60    1.90
F      1.13   1.38   0.60    2.80
P      0.57   0.55   1.52   -1.60
S      0.77   0.75   1.43   -0.80
T      0.83   1.19   0.96   -0.70
W      1.08   1.37   0.96   -0.90
Y      0.69   1.47   1.14   -1.30
V      1.06   1.70   0.50    4.20

(b) Homology modelling

Observation in PDB: Similar sequence (30% identity) → similar structure
☞ Strategy:
• Search homologous sequence with known structure
• Align sequences
• Change differing amino acids
• Meet sterical criteria (avoid atomic overlaps), and other criteria
• Optimize rotamers
Critical: correct alignment
[Figure: Aquaporin-1 and GlpF; GlpF model based on Aqp1]

(c) Protein threading

No homologous structure available?
☞ Into which known fold does the sequence fit best?
☞ Find the known fold with the maximal … $\sum_{i=1}^{N} \ln p(a_i, s_j)$
[Figure: sequence / structure statistics, a table of $p(a_i, s_j)$ with amino acids (aa) as rows and α-helix, β-sheet, turn as columns]
Improvements: better statistics, e.g. consider triplets, spatial neighbours, cys-cys bonds, …

Residue   non-polar surface area [Å^2]   estimated hydrophobic effect [kcal/mol]
Trp       236                            4.11
Leu       164                            4.10
Ile       155                            3.88
Phe       194                            3.46
Met       137                            3.43
Val       135                            3.38
Pro       124                            3.10
Lys       122                            3.05
Tyr       154                            2.81
His       129                            2.45
Thr        90                            2.25
Arg        89                            2.23
Ala        86                            2.15
Glu        69                            1.73
Gln        66                            1.65
Ser        56                            1.40
Cys        48                            1.20
Gly        47                            1.18
Asp        45                            1.13
Asn        42                            1.05

(d) Empirical potentials

E.g., ψ-angles between amino acids, e.g., Ala-Asn:
[Figure: histogram $h(\psi)$ with peaks 1, 2, 3 and the resulting potential V with the corresponding minima]
$V = -k_B T \ln h(\psi)$
☞ 20x20 pair interactions $V_{ij}$
☞ minimize $\sum_{i=1}^{N} V_{S_i,S_{i+1}}$

Bottom line: structure prediction is still not very accurate and reliable!

[Figure: Ramachandran plots]
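The empirical potential from slide (d) turns an observed angle histogram into an energy by Boltzmann inversion, $V = -k_B T \ln h(\psi)$. The sketch below illustrates this under stated assumptions: the bin counts are made-up numbers, not data from the lecture, the function name is hypothetical, and the potential is shifted so that its minimum is zero.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)


def potential_from_histogram(counts, temperature=300.0):
    """Convert angle-bin counts into V = -kB*T*ln(h), shifted so min(V) = 0."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    kt = K_B * temperature
    potential = []
    for c in counts:
        if c == 0:
            # Bin never observed: treated as forbidden (infinite energy).
            potential.append(math.inf)
        else:
            potential.append(-kt * math.log(c / total))
    v_min = min(potential)
    return [v - v_min for v in potential]


# Hypothetical counts of psi angles in six 60-degree bins (made-up numbers).
counts = [5, 40, 10, 0, 80, 15]
print([round(v, 2) for v in potential_from_histogram(counts)])
```

The most populated bin becomes the minimum of V, and a bin with half as many observations lies $k_B T \ln 2$ above it, so peaks in the histogram map onto wells in the potential.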
