Bioscience Reports 3, 225-232 (1983)
Printed in Great Britain
© 1983 The Biochemical Society

Amino acid sequence restriction in relation to proteolysis

Hans JÖRNVALL and Bengt PERSSON
Department of Chemistry I, Karolinska Institutet, S-104 01 Stockholm 60, Sweden

(Received 28 January 1983)

Distributions of amino acid residues in proteins show that proline is overrepresented in sequence positions following two basic residues ({Lys/Arg}-{Lys/Arg}), i.e. at sites similar to those susceptible to proteolytic cleavages of hormonal pro-forms. Conformational correlations further show that {Lys/Arg}-{Lys/Arg}-Pro sequences are often (8/11) not adjacent to elements of secondary structure, whereas the opposite applies to {Lys/Arg}-{Lys/Arg}-nonPro sequences (82/103 adjacent to elements of secondary structure). These distribution patterns from proteins in general also seem applicable in individual protein groups, as demonstrated for some dehydrogenases. It appears possible that {Lys/Arg}-{Lys/Arg}-nonPro constitutes a restricted sequence in proteins, and that proline, in addition to elements of secondary structure, contributes a means of avoiding unacceptable proteolytic processings of proteins in general.

Apart from containing the information for conformation and function, the primary structures of proteins have short sequences serving as 'signals' for special properties. These signals are often sites for modifications and are common between different proteins. Examples are: polypeptide glycosylations (one type affecting Asn in Asn-X-Thr/Ser (1)); phosphorylations (e.g. on Ser in different structures (2,3)); hormonal pro-form cleavages (often, but apparently not invariably (4), after dibasic structures like Lys-Arg-X (5,6)); cleavages (after small residues (7)) of signal peptides in pre-forms of secreted proteins; or N-terminal acetylations (of small residues in special structures (8)). However, all these sequence types are also common elsewhere in proteins, and for correct modifications, additional factors must therefore restrict the sequence-directed specificities.

One such restriction factor is caused by conformation. For example, sequences which actually get glycosylated differ from those otherwise identical that do not, by being in different secondary structures (often in reverse turns (9,10)). Similarly, non-cleaved bonds in pro-form sequences appear stabilized in secondary structure, whereas otherwise identical but cleaved bonds are in regions without secondary structure (11). Furthermore, protease cleavages are often limited to domain borders or other accessible parts (cf. 12-14).

Another restriction might simply be selection against potentially sensitive sequences at places where modification should not be directed. Thus, the Asn-X-Ser/Thr glycosylation signal is underrepresented in proteins in general (15), possibly indicating a tendency to avoid unfavourable glycosylation, and therefore reflecting a 'restricted sequence' in proteins.

It appears possible that one further type of restricted sequence could apply to proteolytic signals, where sequence signals can be limited by the nature of residues subsequent to potentially sensitive cleavage sites. Peptide bonds involving proline or with proline in an adjacent position reduce sensitivities to many proteases, and proline may protect some structural proteins against degradation (16).
In the present study, the possible importance of proline in proteolysis was tested by comparisons of all structures of the type occurring in pro-form cleavages, i.e. at the arrow in {Lys/Arg}-{Lys/Arg}↓X. Evidence was obtained that Pro, together with elements of secondary structure, is important to protect potentially sensitive sites. NonPro bonds at such sites appear to form a type of restricted sequence in proteins, indicating a general sequence restriction in relation to proteolysis.

Table 1. Occurrence of proline after dibasic sequences in proteins versus occurrence of proline in general at any position
Proline residues after the dibasic structures were taken from reference 18, proline in general from reference 17. In both cases, basic proteins disturb the usefulness of the values for general interpretations, since such proteins frequently have many basic residues together. For example, without correction for this, Arg-Arg is followed by one further Arg in 36% of known sequences (18). Therefore, occurrences below are recalculated to show the distribution of proline after subtraction of lysine and arginine from the calculations.

Structure                   Occurrence of Pro in this position in proteins in general
                            (after subtraction of Lys, Arg; see above) (%)
Residue after Lys-Arg       10.1
Residue after Lys-Lys        6.8
Residue after Arg-Arg        8.8
Residue after Arg-Lys        3.3
Residue at any position      5.9

Materials and Methods

Most amino acid sequences and conformations were taken from references 17-19, and the remaining ones from references 20-24, as indicated.
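The recalculation described in the legend to Table 1 can be illustrated with a short sketch. The Python below is a hypothetical illustration, not the authors' procedure: the function names, the toy sequences, and the reading of "subtraction of lysine and arginine" as dropping Lys and Arg from the denominator are all assumptions made here.

```python
from collections import Counter

BASIC = {"K", "R"}  # one-letter codes for Lys and Arg


def residue_after_dibasic(sequences):
    """Tally the residue that follows each dibasic pair, keyed by pair (e.g. 'KR')."""
    counts = {}
    for seq in sequences:
        # Stop two short of the end so a following residue always exists.
        for i in range(len(seq) - 2):
            pair = seq[i:i + 2]
            if pair[0] in BASIC and pair[1] in BASIC:
                counts.setdefault(pair, Counter())[seq[i + 2]] += 1
    return counts


def proline_percent(counter):
    """Percentage of Pro among the tallied residues, ignoring Lys and Arg.

    Assumption: the correction removes basic residues from the total, so
    that runs of basic residues do not dilute the proline share.
    """
    total = sum(n for aa, n in counter.items() if aa not in BASIC)
    if total == 0:
        return None
    return 100.0 * counter.get("P", 0) / total


# Toy sequences invented for illustration only (not data from the paper).
toy = ["AKRPGG", "MKRAKK", "GKRKP", "TKRSL"]
counts = residue_after_dibasic(toy)
for pair in sorted(counts):
    print(pair, dict(counts[pair]), proline_percent(counts[pair]))
```

In the toy set, Lys-Arg is followed once each by Pro, Ala, Lys, and Ser; dropping the Lys leaves three residues, one of them Pro. Overlapping pairs are counted separately, which is one possible convention among several.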
Results

Proline occurrence after dibasic structures in general

The occurrence in proteins of proline after dibasic structures (18) versus that at any position (17) is shown in Table 1. The values obtained suggest that proline is over-represented after three of these four dibasic sequences in proteins in general (especially after Lys-Arg). Since Lys-Arg is a signal for the proteases cleaving pro-form structures, proline may serve a protective function towards unfavourable proteolysis. This appears common enough to be visible already on distributions in general.

Proline occurrence related to known conformations

Dibasic structures with known conformations, both with and without a subsequent Pro, are listed in Tables 2 and 3, respectively.

Table 2. Properties of sequences of the type {Lys/Arg}-{Lys/Arg}-Pro in proteins of known conformation
Sequence and structure data are from references 17-19 except for crystallin, which is from reference 20. Numbers list positions of the first residue shown (in one-letter codes). Secondary structures are listed as α, β, or neither, and indicate the type of structure known to cover the sequence listed. In the case of cytochrome c, the listing is α although covering a proline residue, since the structure determined is from a homologue lacking the proline. The two listed β-structures are also atypical, being at ends of H-bondings. Surface (+) or non-surface (-) positions are given for those entries that lack α or β structures in the regions listed.

Proteins listed: Alcohol dehydrogenase; Cytochrome c; Immunoglobulin, γ1-chain; Immunoglobulin, [?]-chain; Phospholipase A2; Prothrombin; Elastase; Triosephosphate isomerase; γ-crystallin, B-chain.
Sequences listed (position of first residue, sequence): 18 KKP; 247 KKP; 94 KKP; 90 KKP; 12 KKP; 55 KRP; 16 KRP; 429 KRP; 218 RKP; 2 RKP; 11 RRP.
Secondary structure entries: one (α) and two (β); the remaining entries are listed as neither.
[Row-by-row alignment of proteins, sequences, and surface (+/-) marks is not recoverable.]

Table 3. [Dibasic sequences without a subsequent Pro in proteins of known conformation; table contents not recoverable.]

Many of the sequences are surface-positioned, which is natural since a dibasic structure can generally not be accepted internally in a stable protein conformation. Of more interest, however, is the fact that the majority of the {Lys/Arg}-{Lys/Arg}-Pro sequences (Table 2) are in regions lacking secondary structure. Thus, of a total of 11 {Lys/Arg}-{Lys/Arg}-Pro in structures of known conformations, 8 are in regions without close association to elements of secondary structure (Table 4). In contrast, for {Lys/Arg}-{Lys/Arg}-X (X ≠ Pro) sequences, the opposite applies: many are close to elements of secondary structure (82 of 103, Table 4). Naturally, frequent borderline cases exist, and several structures are difficult to judge. Nevertheless, the shifts in general occurrence of nonPro residues, as shown in Table 4, are substantial. It is therefore possible that increased stabilization of proteins against cleavages after dibasic structures can be obtained not only by conformation, but also by a subsequent proline residue. The proline contribution appears both common and general enough to be visible in whole protein properties (Tables 1-4).
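As a worked check of the two counts quoted above (8 of 11 Pro-containing sequences away from secondary structure; 82 of 103 nonPro sequences close to it), the shares can be computed directly. This is only an arithmetic sketch: the dictionary layout and key names are assumptions made here, and the complementary counts (3 and 21) are obtained by subtraction from the quoted totals.

```python
# Counts quoted in the text; complements follow by subtraction (11 - 8, 103 - 82).
counts = {
    "dibasic-Pro": {"alpha_or_beta": 3, "neither": 8},
    "dibasic-nonPro": {"alpha_or_beta": 82, "neither": 21},
}


def share(row, key):
    """Percentage of a row's sequences that fall in the given conformation class."""
    return 100.0 * row[key] / sum(row.values())


pro_unstructured = share(counts["dibasic-Pro"], "neither")
nonpro_structured = share(counts["dibasic-nonPro"], "alpha_or_beta")

print(f"dibasic-Pro without secondary structure: {pro_unstructured:.1f}%")
print(f"dibasic-nonPro near secondary structure: {nonpro_structured:.1f}%")
```

The two percentages come out at roughly 73% and 80%, which is the opposite tendency described in the text. No significance test is attempted here, since the paper itself notes that several assignments are borderline.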
Dehydrogenases - a single protein family

In order to test the results from average distributions of different proteins, dehydrogenases with known conformations were similarly analysed, as an example of single protein families. As shown in Table 5, dibasic structures not protected by a subsequent proline residue are preferentially found in dehydrogenase regions of ordered structure. Conclusions from proteins in general are therefore noticeable even in individual protein families, and restricted sequences may complement conformations in providing stable proteins.

Table 4. Summary of correlations between sequences and conformations
Values denote the number of sequences of each type (counting +/- as -) in Tables 2 and 3. Some of the surface/non-surface distinctions are questionable.

                         Conformation
Sequence                 α or β     Neither, surface (+)     Neither, non-surface (-)
{K/R}-{K/R}-X            82         16                       5
{K/R}-{K/R}-P             3          8                       0

Table 5. Conformations around Lys-Lys sequences in dehydrogenases with known tertiary structures
Data from references 21-24. As shown, two Lys-Lys-Pro structures are accessible without protection from elements of secondary structure, whereas four structures not protected by Pro are inside long α-helices, close to elements of secondary structure, or partly shielded.

Alcohol dehydrogenase
  Lys-Lys-Pro (positions 18-20)      irregular structure; superficial
  Lys-Lys-Pro (positions 247-249)    irregular structure; between end of β and start of [...]
  Lys-Lys-Phe (positions 338-340)    after α; partly shielded in inter-domain cleft
Lactate dehydrogenase
  Lys-Lys-Ser (positions 317-319)    inside α-helix (covering 309-324)
Glyceraldehyde-3-phosphate dehydrogenase
  Lys-Lys-Val (positions 114-116)    at start of β (covering 115-118 (23) or 115-120 (24))
  Lys-Lys-Val (positions 256-258)    inside α-helix (covering 251-265)

Discussion

'Restricted' sequence

The data in Tables 1-5 suggest a role of proline residues in regulation of proteolysis.
Proline is over-represented at potentially sensitive sites in regions not stabilized by secondary structure. Consequently, dibasic structures not stabilized by either secondary structure or subsequent proline appear to form a type of 'restricted sequence' in proteins not destined for proteolysis. Thus, both sequence signals and sequence restrictions seem to apply to proteolysis, in a similar mode as earlier demonstrated for glycosylation. The pro-form cleavage may of course also be regulated by additional factors, and sequence restrictions may also apply to further types of proteolysis (especially, perhaps, to the similar trypsin-type of specificity).

In conclusion, restricted sequences appear applicable to two protein modifications, cleavages (this work) and glycosylations (15), and may indicate a general principle of protein regulation. Independent of possible extensions to other modification signals or of additions of other proteolysis signals, the present data suggest the presence of a new type of restricted sequence, demonstrating the importance of protection against proteolysis.

Acknowledgements

Structural studies facilitating this work were supported by grants from the Swedish Medical Research Council (project 13X-3532), the Swedish Cancer Society (project 1806), and the Magn. Bergvall's Foundation.

References

1. Neuberger A, Gottschalk A, Marshall RD & Spiro RG (1972) in 'Glycoproteins' (Gottschalk A, ed), 2nd ed, pp 464-470, Elsevier, Amsterdam.
2. Feramisco JR, Glass DB & Krebs EG (1980) J. Biol. Chem. 255, 4240-4245.
3. Mercier J-C, Grosclaude F & Ribadeau-Dumas B (1971) Eur. J. Biochem. 23, 41-51.
4. Ekman R, Håkanson R & Jörnvall H (1981) FEBS Lett. 132, 265-268.
5. Steiner DF (1976) in 'Peptide Hormones' (Parsons JA, ed), University Park Press, Baltimore, pp 49-64.
6. Pradayrol L, Jörnvall H, Mutt V & Ribet A (1980) FEBS Lett. 109, 55-58.
7. Austen BM (1979) FEBS Lett. 103, 308-313.
8. Jörnvall H (1975) J. Theor. Biol. 55, 1-12.
9. Aubert J-P, Biserte G & Loucheux-Lefebvre M-H (1976) Arch. Biochem. Biophys. 175, 410-418.
10. Beeley JG (1977) Biochem. Biophys. Res. Commun. 76, 1051-1055.
11. Geisow MJ & Smyth DG (1980) in 'The Enzymology of Post-Translational Modification of Proteins', vol 1 (Freedman RB & Hawkins HC, eds), Academic Press, London, pp 259-287.
12. Porter RR (1959) Biochem. J. 73, 119-126.
13. Walsh KA, Ericsson LH, Parmelee DC & Titani K (1981) Ann. Rev. Biochem. 50, 261-284.
14. Jörnvall H & Philipson L (1980) Eur. J. Biochem. 104, 237-247.
15. Hunt LT & Dayhoff MO (1970) Biochem. Biophys. Res. Commun. 39, 757-765.
16. Jörnvall H, Pettersson U & Philipson L (1974) Eur. J. Biochem. 48, 179-192.
17. Dayhoff MO, ed (1972) 'Atlas of Protein Sequence and Structure', and (1973) Suppl. 1, (1976) Suppl. 2, and (1978) Suppl. 3, National Biomedical Research Foundation, Silver Springs, Maryland.
18. Dayhoff MO, ed (1978) in 'Protein Segment Dictionary', National Biomedical Research Foundation, Silver Springs, Md.
19. Feldmann RJ, ed (1976) in 'Atlas of Macromolecular Structure on Microfiche' (AMSOM), Tracor Jitco, Inc., Rockville, Md.
20. Blundell T, Lindley P, Miller L, Moss D, Slingsby C, Tickle I, Turnell B & Wistow G (1981) Nature (London) 289, 771-777.
21. Eklund H, Nordström B, Zeppezauer E, Söderlund G, Ohlsson I, Boiwe T, Söderberg B-O, Tapia O, Brändén C-I & Åkeson Å (1976) J. Mol. Biol. 102, 27-59.
22. Holbrook JJ, Liljas A, Steindel SJ & Rossmann MG (1975) in 'The Enzymes' (Boyer PD, ed), vol 11, pp 191-292, Academic Press.
23. Harris JI & Waters M (1976) in 'The Enzymes' (Boyer PD, ed), vol 13, pp 1-49, Academic Press.
24. Harris JI & Walker JE (1977) in 'Pyridine Nucleotide-Dependent Dehydrogenases' (Sund H, ed), pp 43-61, Walter de Gruyter, Berlin.