Eur. J. Biochem. 128, 565-575 (1982)
© FEBS 1982

Structural Prediction of Membrane-Bound Proteins

Patrick ARGOS, J. K. Mohana RAO, and Paul A. HARGRAVE
Department of Biological Sciences, Purdue University, West Lafayette, Indiana; and School of Medicine and Department of Chemistry and Biochemistry, Southern Illinois University at Carbondale

(Received April 19/July 6, 1982)

A prediction algorithm based on physical characteristics of the twenty amino acids and refined by comparison to the proposed bacteriorhodopsin structure was devised to delineate likely membrane-buried regions in the primary sequences of proteins known to interact with the lipid bilayer. Application of the method to the sequence of the carboxyl-terminal one-third of bovine rhodopsin predicted a membrane-buried helical hairpin structure. With the use of lipid-buried segments in bacteriorhodopsin as well as regions predicted by the algorithm in other membrane-bound proteins, a hierarchical ranking of the twenty amino acids in their preferences to be in lipid contact was calculated. A helical wheel analysis of the predicted regions suggests which helical faces are within the protein interior and which are in contact with the lipid bilayer.

X-ray diffraction studies of crystalline soluble proteins have resulted in nearly one hundred known tertiary structures [1,2]. With their advent have come many secondary-structure prediction methods which require only a knowledge of the amino acid sequence (cf. [3-5]). These techniques generally rely on a statistical or informational analysis of the frequency with which the 20 amino acids appear within the observed secondary structures (α-helices, β-strands, and reverse turns). Since their data base is composed of only soluble protein structures, the prediction algorithms may not be applicable to primary sequences of proteins that are in contact with a membrane.
The only membrane-bound polypeptide topology known to a reasonably high resolution has been determined by electron-scattering techniques for the purple-membrane protein of Halobacterium halobium [6,7], whose seven rods of electron density are believed to be α-helical segments that traverse the membrane [8]. Such a structure should be predictable from the amino acid sequence alone through utilization of the physico-chemical characteristics of the residues interacting with the uniquely apolar membrane environment. A prediction algorithm based on these physical characteristics was developed in the present work and refined by application to the bacteriorhodopsin primary structure. The correlation between the predicted helical and turn regions and those of the model structure suggested by Engelman et al. [8] was 0.69, a value at least comparable to the prediction accuracy of techniques devised for soluble proteins [5,9,10]. The prediction method was then applied to the known C-terminal sequence of bovine rhodopsin and to the primary sequences of other membrane-bound proteins. Two turn segments and two helical regions were predicted for bovine rhodopsin. The algorithm clearly delineated likely membrane-bound regions within the amino acid sequences of the other proteins. A hierarchical ranking of the twenty amino acids in their preference to be membrane buried was then determined. A helical 'wheel' analysis of the predicted regions indicated the helical faces within the protein interior and in contact with the lipid bilayer.

Enzymes. Adenosinetriphosphatase (EC 3.6.1.3); superoxide dismutase (EC 1.15.1.1); lysozyme (EC 3.2.1.17); ribonuclease (EC 3.1.27.5); adenylate kinase (EC 2.7.4.3).
MATERIALS AND METHODS

The physical parameters used to characterize the twenty amino acids include: (1) the hydrophobicity index measured by Nozaki and Tanford [11], (2) the free energy of transfer calculated by Janin [12], (3) the residue 'surrounding' hydrophobicity statistically determined by Manavalan and Ponnuswamy [13], (4) the experimental hydration potential of Wolfenden et al. [14], (5) estimated standard free energies of transfer for a given residue type in a helix in water to a helix in a non-polar phase [15], (6) the conformational propensity parameters for reverse turn configurations as calculated by Levitt [16], (7) residue bulkiness and (8) polarity listed by Jones [17], and (9) helical conformational preference values determined in the present investigation from the putative buried helical regions in signal sequences. The hydrophobicity measures 1-5 were all made positive and the characteristics were normalized to a value of 1.0. All the parameters are listed in their normalized and scaled form in Table 1.

The physical characteristics were selected as likely candidates to discern the membrane-buried segments of proteins, except for the turn potential, which would point to the exposed peptide spans connecting the secondary structure in lipid contact. Sequence spans with high hydrophobicity, bulkiness, and membrane-buried helical potential as well as low polarity and turn preference would be expected for regions interacting with the uniquely apolar membrane. Various hydrophobicity measures were utilized as there were no a priori means to judge their sensitivity and ability to select membrane-bound segments. Parameters 1-5 attempt to quantify the hydrophobic character of the twenty amino acids through empirical investigations or statistical analyses of the exterior and interior positions of residues as they appear in known protein structures. Wolfenden et al. [14] do not list the hydration potentials for glycine and arginine.
These potentials were assigned according to the method of Moews et al. [18], who proportioned the residue hydration states to their degree of buriedness in soluble proteins as determined by Chothia [19]. Characteristic 9 is a buried helical conformational measure similar to the Chou-Fasman preference parameters for helices in soluble proteins [20,21]. It is defined as the ratio of the fractional occurrence of an amino acid within the putative helical regions in signal sequences to its fractional occurrence in known protein structures [16,22].

Table 1. Scaled and normalized physical parameters for the twenty amino acids
The parameters are numbered according to the scheme given in the text. References for the parameters are also given in the text. Columns: (1) hydrophobicity index; (2) free energy of transfer; (3) surrounding hydrophobicity; (4) hydration potential; (5) buried transfer free energy; (6) turn preference; (7) bulk; (8) polarity; (9) signal sequence helical potential

Amino acid   (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)
Ala          0.61   1.24   1.01   2.00   0.50   0.82   0.75   0.97   1.81
Cys          1.07   1.60   1.14   1.51   0.40   0.84   0.88   0.66   1.89
Asp          0.46   0.71   0.84   0.01   2.28   1.48   0.76   1.56   0.05
Glu          0.47   0.65   0.92   0.13   1.96   1.05   0.88   1.48   0.11
Phe          2.02   1.36   1.09   1.58   0.00   0.61   1.29   0.62   1.96
Gly          0.07   1.24   0.97   2.07   0.72   1.73   0.22   1.08   0.49
His          0.61   1.01   0.94   0.12   1.44   0.73   0.89   1.25   0.31
Ile          2.22   1.48   1.22   2.03   0.19   0.54   1.39   0.62   1.45
Lys          1.15   0.00   0.88   0.23   1.60   1.01   1.02   1.36   0.06
Leu          1.53   1.36   1.16   2.06   0.21   0.62   1.39   0.59   3.23
Met          1.18   1.30   1.12   1.47   0.15   0.41   1.06   0.68   2.67
Asn          0.06   0.77   0.89   0.20   1.33   1.35   0.83   1.39   0.23
Pro          1.95   0.89   0.88   0.93   1.42   2.01   1.13   0.96   0.76
Gln          0.00   0.65   0.91   0.25   1.23   1.02   0.94   1.26   0.72
Arg          0.60   0.24   0.91   0.00   3.10   0.93   0.93   1.26   0.20
Ser          0.05   1.01   0.87   0.92   1.03   1.40   0.62   1.10   0.97
Thr          0.05   0.95   0.91   0.94   0.91   1.08   1.03   1.03   0.84
Val          1.32   1.42   1.22   2.00   0.29   0.49   1.40   0.71   1.08
Trp          2.65   0.24   1.08   0.79   0.29   0.79   1.41   0.65   0.77
Tyr          1.88   0.83   1.04   0.75   0.95   1.11   1.17   0.74   0.39
A 705-residue data base compiled from signal sequences [15,23] was used to determine the preference parameter.

Plots of the amino acid sequence number versus a given parametric value for a particular amino acid in a primary sequence were calculated. The curves were then 'smoothed' through several cycles according to the methodology of Rose and Siddhartha [24], who determine a least-squares line for all successive seven-point groups to calculate points for the smoothed graph. The plots for the nine parameters were then scaled such that their average value was 0.0 and their standard deviations from the zero line were the same. A plot resulting from an additive combination of the various parameters, appropriately weighted, was then calculated to give the best prediction of the bacteriorhodopsin structure as determined by Engelman et al. [8]. Amino acids with a sequence number corresponding to a combined parametric value greater than zero were considered to be in a helical conformation while those below zero were assigned as turn residues.

The weighting factors for each of the characteristics in the combined graph were determined by a linear least-squares procedure. A hypothetical curve corresponding to the Engelman bacteriorhodopsin structure was devised such that trigonometric sine curves were fitted to each of the seven helical regions and eight turn spans; that is, the hypothetical value $P_i$ for the ith amino acid in the sequence and the kth helix or turn region is given by

$$P_i = F_k \sin(2\pi f_k i), \qquad 0 \le f_k i \le 1/2 \tag{1}$$

where $F_k$ is +1.0 for helical spans and -1.0 for turns, $f_k$ is the appropriate scale factor for the kth secondary structural region, and i is the amino acid number of the particular bacteriorhodopsin residue. The weighting factors $W_l$ for each of the l characteristic values ($C_l$) were then determined by linear least squares from the equation

$$\sum_{l=1}^{9} W_l C_{l,i} = P_i, \qquad i = 1, \ldots, N \tag{2}$$

where N is the total number of amino acids in bacteriorhodopsin.
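The curve-processing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' program: the function names are hypothetical, end points that lack a full seven-point window are simply left unchanged, and the half-sine target samples each residue at the midpoint of its position within a region (an arbitrary choice that keeps every value non-zero). The weights of Eqn 2 would then be obtained by passing the standardized curves and the target to any linear least-squares routine, which is not shown.

```python
import math

def smooth7(values, cycles=1):
    """Seven-point least-squares smoothing, repeated `cycles` times.

    For a symmetric window, the least-squares line evaluated at the centre
    point equals the window mean, so each interior point is replaced by the
    mean of its window. End points without a full window are left unchanged
    (an assumption of this sketch).
    """
    out = list(values)
    for _ in range(cycles):
        prev = out[:]
        for c in range(3, len(prev) - 3):
            out[c] = sum(prev[c - 3:c + 4]) / 7.0
    return out

def standardize(curve):
    """Shift a curve to zero mean and scale it to unit standard deviation."""
    n = len(curve)
    mean = sum(curve) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in curve) / n)
    return [(v - mean) / sd for v in curve]

def target_curve(regions):
    """Half-sine target curve in the spirit of Eqn 1.

    `regions` is a list of (start, end, amplitude) with `end` exclusive;
    amplitude is positive for helices and negative for turns.
    """
    target = []
    for start, end, amplitude in regions:
        length = end - start
        f = 1.0 / (2.0 * length)  # f * position runs from 0 to 1/2
        for j in range(length):
            target.append(amplitude * math.sin(2.0 * math.pi * f * (j + 0.5)))
    return target

def combine(curves, weights):
    """Weighted sum of parameter curves (left-hand side of Eqn 2)."""
    return [sum(w * c[i] for w, c in zip(weights, curves))
            for i in range(len(curves[0]))]
```

Residues whose combined value is positive would then be read as helical and the rest as turn, as described in the text.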
The correlation coefficient (CF) used to measure the agreement between prediction and observation is defined by Matthews [25] as

$$\mathrm{CF} = \frac{wx - yz}{\sqrt{(w + y)(w + z)(x + y)(x + z)}} \tag{3}$$

where w is the number of residues predicted and observed helical, x is the number predicted and observed in the turn configuration, y is the number predicted nonhelical and observed helical, and z is the number predicted helical and observed nonhelical.

Concern regarding the fact that the model of Engelman et al. [8] is not fully established by experiment is allayed by the reasonableness of the model and its basis in empirical studies. The structural model does represent the best fit of the Henderson and Unwin electron density [6] and the Khorana et al. amino acid sequence [26]. A recent refinement of the projected bacteriorhodopsin electron density to 3.7 Å resolution [27] confirms the Henderson and Unwin results even to the extent of a density region that can possibly be assigned to the lysine-retinal attachment site in the C-terminal helix. Furthermore, the helices were conservatively placed, which would be advantageous for the prediction algorithm. In the present work the bacteriorhodopsin model served primarily to select the best parameters and refine their weighting factors. This was tested by calculating a plot based on all nine characteristics with unit weights of appropriate sign. The plot was essentially the same as that resulting from model refinement, with two exceptions: the peak for the fourth helix was not as high (vide infra), and the correlation with the proposed structure was 0.60 instead of the 0.69 obtained after refinement. The unit weight approach also gave curves for the other proteins tested similar to those resulting from the refined weighting factors. The ability of the prediction method to detect membrane-buried segments is thus more a consequence of the physical parameters selected than of the bacteriorhodopsin model structure.
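As a worked illustration of the Matthews coefficient, the sketch below counts w, x, y and z (as defined in the text) from two per-residue assignments and evaluates the standard Matthews formula. The function name and the boolean encoding (True for helical, False for turn) are assumptions made for this example; returning 0.0 when the denominator vanishes is a convention of the sketch, not something stated in the paper.

```python
import math

def matthews_correlation(predicted, observed):
    """Matthews correlation between two per-residue helix assignments.

    `predicted` and `observed` are equal-length sequences of booleans,
    True meaning helical and False meaning turn (non-helical).
    """
    w = x = y = z = 0
    for p, o in zip(predicted, observed):
        if p and o:
            w += 1          # predicted helical, observed helical
        elif not p and not o:
            x += 1          # predicted turn, observed turn
        elif not p and o:
            y += 1          # predicted non-helical, observed helical
        else:
            z += 1          # predicted helical, observed non-helical
    denom = math.sqrt((w + y) * (w + z) * (x + y) * (x + z))
    return (w * x - y * z) / denom if denom else 0.0
```

A perfect prediction gives 1.0, a completely inverted one gives -1.0, and a prediction no better than chance gives a value near zero.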
Cross-correlation coefficients were determined for various pairs of the physical parameters. If the twenty-component series for two physical characteristics are respectively designated by $[x_i]$ and $[y_i]$, then their cross correlation (CCF) becomes

$$\mathrm{CCF} = \frac{\sum_{i=1}^{20} (x_i - \bar{x})(y_i - \bar{y})}{\left[ \sum_{i=1}^{20} (x_i - \bar{x})^2 \sum_{i=1}^{20} (y_i - \bar{y})^2 \right]^{1/2}} \tag{4}$$

where $\bar{x}$ and $\bar{y}$ are the respective average values for the two physical parameters.

Normalized Chou-Fasman preference parameters ($P_M$) are calculated for the twenty amino acids to indicate their likelihood to be within a membrane. They are defined as

$$P_{M,i} = \frac{N_{M,i} \Big/ \sum_{j=1}^{20} N_{M,j}}{N_{S,i} \Big/ \sum_{j=1}^{20} N_{S,j}} \tag{5}$$

where $N_{M,i}$ is the number of times the ith amino acid type appears within primary sequences predicted to be membrane buried and $N_{S,i}$ is the number of times the ith amino acid type appears within sequences of soluble proteins. The denominator, which is the fractional occurrence of a residue within the soluble protein data base, normalizes the $P_M$ values [20,21].

RESULTS AND DISCUSSION

Bacteriorhodopsin

The 248-residue primary structure of bacteriorhodopsin determined by Khorana et al. [26] was used in the present investigation. The seven helices of the model structure are sequentially labelled A-G from the N to the C termini, following the pattern designated by Engelman et al. [8]. Application of the refined prediction method to the sequence is shown in Fig. 1 and was developed as described below. The algorithm essentially consists of the addition of several physical characteristics (e.g. hydrophobicity) for each amino acid within the sequence. A plot of the additive function versus sequence number results in maxima and minima indicating respectively helical and turn regions in the bacteriorhodopsin structure.

Plots of the bacteriorhodopsin sequence number versus the residue characteristic value were determined for each of the nine physical parameters (Fig. 1); three cycles of smoothing were performed. Each of the graphs was then standardized to a zero parametric value. It was observed that positive peaks in the graphs for all the hydrophobic parameters except the buried transfer free energy correlated with the helical regions in the Engelman model of bacteriorhodopsin, while the negative troughs corresponded to the turn regions between the buried helical segments (Fig. 1). The phase was opposite for the buried transfer free energy. These results would be expected given the membrane environment of the helices. A graph resulting from the addition of the five hydrophobic curves (with appropriate sign change for the buried transfer free energy) followed by two cycles of smoothing yielded a predicted correlation of 0.50 relative to the model structure derived from electron scattering data [6,7]. The combined graph correlated better with the observed structure than any of the individual plots. However, the additive hydrophobic function was unable to distinguish helices F and G and predicted them as one long helix. Bulk and the buried helical preference parameter also correlated positively with the model helical segments, while polarity and the turn potential showed an inverse relationship. A combined graph of the four nonhydrophobic parameters, with appropriate sign reversals for polarity and turn preferences, showed a correlation of 0.49 with the model structure; however, the D helix was completely missed in the prediction (Fig. 1). The weak D helix response can be explained by the large number of glycine residues within the helix. The glycine turn potential is strong [16] and its bulk is minimal [17].

Fig. 1. Plot of the amino acid sequence number for bacteriorhodopsin (BACTR) versus various amino acid physical parameters discussed in the text. The solid lines have undergone three cycles of smoothing. The dashed line refers to the helical (positive peaks) and turn (negative troughs) regions of the bacteriorhodopsin model structure. Abbreviations as well as the text numbering scheme used for the physical characteristics are (1) HFBNT, hydrophobicity index [11]; (2) HFBJN, free energy of transfer [12]; (3) HFBMP, surrounding hydrophobicity [13]; (4) HDPTL, hydration potential [14]; (5) BTRFE, buried transfer free energy [15]; (6) TURN, Chou-Fasman turn conformational preference parameter [16]; (7) BULK, bulk parameter [17]; (8) PLRTY, polarity [17]; (9) HELBR, membrane-buried helical potential determined from signal sequences (Table 1); and HELCF, Chou-Fasman helical conformational preference parameter [16]

Weighting factors for all nine parameters were then determined using the least-squares procedure based on the hypothetical trigonometric curve derived from the model structure. The value of $F_k$ was chosen as 1.0 (Eqn 1). The resulting curve predicted well all the helical and turn segments except the D helix and adjacent turn spans and the turn segment between helices F and G. In the hypothetical curve these latter regions were given $F_k$ values of 2.0 to 8.0 in integral steps, with weighting factors and prediction correlation coefficients (Eqn 3) calculated at each step. It was clear from the correlations with the model bacteriorhodopsin structure as well as from visual inspection that $F_k$ = 4.0 provided the best curve. Four of the nine weighting factors had values near zero or a sign opposite from that expected. The hydration potential, turn propensity, polarity, buried transfer free energy, and bulk had respective weights of 2.42, -1.68, -0.76, -0.73 and 0.52. The ratio of hydrophobic to non-hydrophobic contributions to the final plot was 1.08. The weighted physical parameters are listed in Table 2.

Table 2. The five weighted physical parameters that provided the best prediction of the helical secondary structural regions in bacteriorhodopsin
The parameters are numbered according to the scheme given in the text. Columns: (4) hydration potential; (5) buried transfer free energy; (6) turn preference; (7) bulk; (8) polarity

Amino acid   (4)     (5)     (6)     (7)     (8)
Ala          4.84   -0.36   -1.38    0.39   -0.74
Cys          3.65   -0.29   -1.41    0.46   -0.50
Asp          0.02   -1.66   -2.49    0.40   -1.19
Glu          0.31   -1.43   -1.76    0.46   -1.12
Phe          3.82    0.00   -1.03    0.67   -0.47
Gly          5.01   -0.53   -2.91    0.11   -0.82
His          0.29   -1.05   -1.23    0.46   -0.95
Ile          4.91   -0.14   -0.91    0.71   -0.47
Lys          0.56   -1.17   -1.70    0.53   -1.03
Leu          4.99   -0.15   -1.04    0.71   -0.45
Met          3.56   -0.11   -0.69    0.55   -0.52
Asn          0.48   -0.97   -2.27    0.43   -1.06
Pro          2.25   -1.04   -3.38    0.59   -0.73
Gln          0.61   -0.90   -1.71    0.49   -0.96
Arg          0.00   -2.26   -1.56    0.48   -0.96
Ser          2.23   -0.75   -2.35    0.32   -0.84
Thr          2.27   -0.66   -1.81    0.54   -0.78
Val          3.84   -0.21   -0.82    0.73   -0.54
Trp          1.91   -0.21   -1.33    0.73   -0.49
Tyr          1.82   -0.69   -1.86    0.51   -0.56

The selection of the hydration potential and the buried transfer free energy as the best hydrophobicity measures for membrane-buried amino acids would be expected. Nozaki and Tanford [11] compared empirically the hydration states of isolated residues in an aqueous environment or in solutions containing ethanol or dioxane, which are questionable in their ability to mimic the lipid bilayer environment. The surrounding hydrophobicities of Manavalan and Ponnuswamy [13] were calculated by averaging the Nozaki-Tanford values for residues surrounding a given amino acid type within known soluble protein structures, which again are unlikely to reproduce the membrane environment given their interior hydrogen bonds. Similarly, the Janin transfer free energies [12] were determined through the buried/accessible surface area ratios of amino acids in known soluble tertiary structures. The Wolfenden values [14] result from measured vapor-water partition coefficients for model compounds identical to each of the amino acids. The buried free energies [15] are estimates for transfer of residues originally in a helix in water to a helix in a non-polar phase lacking hydrogen-bonding capacity.

Fig. 2 displays the weighted five-parameter plot for bacteriorhodopsin; two cycles of smoothing were used after the weighted combination of the already smoothed graphs for the five parameters. The additive function clearly resulted in a better correlation (0.69) with the model structure than did any of the individual parameters. Helices F and G were distinguishable and the D helix region was delineated by a positive peak value (Fig. 2). If the combined parametric value for a particular amino acid was greater than or equal to zero, the residue was assigned as helical; the turn regions were indicated by parametric values less than zero. Table 3 lists the predicted and observed helical regions. The combined approach predicted correctly 82% of the helical residues and 89% of the turn regions. The missed predictions at the N-termini of helices C and F are primarily caused by the appearance of charged amino acids.

Fig. 2. Plot of the amino acid sequence number for bacteriorhodopsin [26] versus a weighted five-parameter characteristic value for a given amino acid. The termini of predicted helices were determined by the intersection points of the positive peaks with the zero line. Turn regions were similarly delineated at the trough intersection points. Helical segments determined from the electron scattering data [6-8] are shown as black bars and labelled A-G according to Engelman et al. [8]. The parametric value is shown as zero at each of the seven terminal residues due to the end-effects of the least-squares 'smoothing' procedures [24]

Table 3. Transmembrane helices of bacteriorhodopsin
The primary structure used is that of Khorana et al. [26]. The predicted helices are those determined from the prediction method discussed in the present work while the model helices are those proposed by Engelman et al. [8]

Helical designation   Predicted helices   Model helices
A                      11- 29               8- 31
B                      45- 66              41- 64
C                      88- 99              78-101
D                     105-127             107-130
E                     134-156             133-157
F                     174-192             167-191
G                     200-225             198-224

Some researchers (cf. [23,28,29]) have utilized the Chou-Fasman conformational propensity parameters calculated from soluble protein structures to predict the secondary structure of proteins that are at least partially membrane-bound. It is noteworthy that the Chou-Fasman helical parameter [16] resulted in a correlation of 0.01 between the predicted and model bacteriorhodopsin structures, while the helical propensities determined from signal sequences were able to locate five of the seven helices in bacteriorhodopsin, yielding a correlation accuracy of 0.35 (Fig. 1). Use of the Chou-Fasman β-strand preferences [16] showed a correlation of 0.48. This would be expected, as the hydration potential plays a significant role in predicting the bacteriorhodopsin helices and the β-strand parameters and hydration potential show a CCF value of 0.46 (Eqn 4). The helical preferences display a near random cross correlation of -0.10 with the hydration potential. The conformational preference parameter calculated from the putative helices in signal sequences cross correlates with the β-preference values and the hydration potentials as 0.32 and 0.76 respectively. These results suggest that erroneous secondary structure predictions based on the soluble protein preferences would be expected for primary structural regions that exist within membranes.

Bovine Rhodopsin

The weighted five-parameter algorithm was applied to the 108 residues in the bovine rhodopsin C-terminal sequence [30-32], where the amino acid at the C-terminus is designated 1'. Two helices separated by a turn region and a long C-terminal turn span were clearly delineated (Fig. 3). Helix 1 at the C-terminus is comprised of residues 43' to 66' while helix 2 ranges from 73' to 97'. Since the rise per residue in an α-helix is near 1.5 Å, a 25-residue stretch can traverse about 35 Å, a typical membrane width. It is also striking that the C-terminal helices in both the bacterial and bovine rhodopsins are characterized by a two-prong peak with their minima corresponding to lysine residues known to be retinal attachment sites for both proteins [33,34]. Replacement of the lysine by a leucine did not alter the shape of the peak for bovine rhodopsin, while the bacteriorhodopsin peak tended toward some smoothing. This result may indicate a special environment for retinal binding in the two rhodopsins and yet some differences relating to the unique functions of the chromophore in the two rhodopsins. Though the prediction algorithm has been calculated for only the C-terminal residues of bovine rhodopsin, it is suggested that its fold is similar to bacteriorhodopsin. Further support is provided by circular dichroic studies which indicate the secondary structure of the bovine rhodopsin to be 58% helical [35], linear dichroism which suggests aligned transmembrane helices [36,37], and neutron diffraction investigations that indicate about half of the bovine rhodopsin molecular mass to be buried in lipid [38]. A knowledge of the complete primary structure will allow a more significant test of the hypothesis. The details of the structure-function relationship of the helical structure are discussed elsewhere [32].

Fig. 3. Plot of the amino acid sequence number for the 108 C-terminal residues in bovine rhodopsin [32] versus a weighted five-parameter characteristic value for a given amino acid. The residues are numbered from the C-terminus as 1' to 108'. The two predicted helical spans designated helix 1 and helix 2 are shown as black bars on the zero line. The residue at 85' was not determined but was taken as cysteine, which is the comparable amino acid in the homologous ovine rhodopsin sequence [66]

Membrane-bound Proteins

The weighted five-parameter algorithm was applied to the primary structure of other proteins known to be at least partially bound to a lipid bilayer in an attempt to delineate the buried regions. The proteins included the pore-forming outer membrane protein I (porin) from Escherichia coli B/r [39], human erythrocyte glycophorin A [40], bovine heart cytochrome c1 [41], the proteolipid subunit of the ATP synthase from spinach chloroplasts [42], coliphage M13 coat protein [43-46], bovine microsomal cytochrome b5 [47-49], the proteolipid subunit of the mitochondrial adenosinetriphosphatase from Neurospora crassa [50], and various amino-terminal extension sequences of secreted proteins (signal sequences) as listed by Austen [23] and von Heijne [15]. The plots for these proteins and an exemplary signal sequence (ovine pre κ-casein [51]) are shown in Fig. 4. Table 4 lists the amino acid sequence spans that are associated with peak heights greater than 1.0. Due to the end effects of the seven-point smoothing process, the seven residues at the two termini of the primary structures are assigned a zero parametric value in Fig. 4. If a peak height occurred near these regions, the terminal amino acid number was calculated as K ± N, where K is the nearest residue number whose parametric value ($P_K$) was unaffected by smoothing and N is the rounded integral value of ($P_K$/0.2). Table 4 also shows the amino acid spans suggested to be within the membrane by chemical or theoretical investigations.

Table 4. Amino acid sequence spans 'predicted' to be membrane buried by the algorithm discussed in the present work as well as those regions 'suggested' by chemical or theoretical investigations
Spans are only given if their weighted five-parameter peak heights were greater than or equal to 1.0. An 'X' in the 'suggested' column indicates that a span with specific amino acid boundaries was not given by the authors listed in the references

Molecule                  Predicted region   Peak height   Suggested region   References
Glycophorin A              61- 98            3.8            65- 98            [40,67]
Cytochrome b5             103-128            3.5           103-125            [68]
Cytochrome c1             204-229            3.0           X                  [41]
ATP synthase                9- 20            1.1           X                  [42]
                           52- 74            2.1
M13 coat protein            9- 20            1.0           X                  [44,53]
                           45- 62            1.8
Pre κ-casein                4- 18            1.7             4- 21            [23,51]
Porin                       1- 22            1.0           X                  [39]
                           80- 93            1.0
                          125-156            1.2
                          180-193            1.3
                          252-273            1.8
                          280-313            1.3
                          329-340            1.3
Proteolipid (N. crassa)    25- 42            1.2           X                  [50,63]
                           56- 80            1.3

Comments:
Glycophorin A: hydrophobic domain determined by tryptic cleavage sites.
Cytochrome b5: two C-terminal helical segments suggested to be buried completely in membrane.
Cytochrome c1: strongly hydrophobic span noted near predicted region.
ATP synthase: residues 51-81 and 1-36 observed as largely hydrophobic.
M13 coat protein: two spans suggested on either side of alanyl residue 24, an exposed protease cleavage site.
Pre κ-casein: signal protease cleaves at Ala-21.
Porin: residues 83-194 noted as nonpolar.
Proteolipid (N. crassa): chemical evidence suggests Glu-65 to be membrane-buried; circular dichroic studies indicate a helical hairpin structure.
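The way spans are read off a combined curve, with termini at the zero-line intersections and a minimum peak height for reporting, can be sketched as follows. This is a minimal illustration with hypothetical function names. It treats only strictly positive values as part of a span so that zero-valued terminal residues do not start one, which is a simplification, and it implements the K ± N end correction literally as stated in the text.

```python
def buried_spans(curve, min_peak=1.0):
    """Return (start, end, peak) for each positive run whose maximum is at
    least `min_peak`. Residue numbers are 1-based and inclusive."""
    spans = []
    start = None
    for i, value in enumerate(curve):
        if value > 0 and start is None:
            start = i                       # curve rises through the zero line
        elif value <= 0 and start is not None:
            peak = max(curve[start:i])      # curve falls back to the zero line
            if peak >= min_peak:
                spans.append((start + 1, i, peak))
            start = None
    if start is not None:                   # a run that reaches the last residue
        peak = max(curve[start:])
        if peak >= min_peak:
            spans.append((start + 1, len(curve), peak))
    return spans

def extrapolated_terminus(k, p_k, direction):
    """Terminal residue K +/- N with N = round(P_K / 0.2); `direction` is
    -1 towards the N-terminus and +1 towards the C-terminus."""
    return k + direction * round(p_k / 0.2)
```

For example, a curve of `[0, -0.5, 0.4, 1.2, 0.6, -0.3, 0.2, 0.5, -1.0, 2.1, 3.0]` yields two reported spans, residues 3-5 and 10-11, while the low peak at residues 7-8 is dropped.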
The predicted membrane-buried regions for glycophorin A and cytochrome b5 agree well with the segments determined chemically. Cytochrome c1 also displays a tail that is likely to be encompassed by the lipid bilayer. The parametric peaks for these three proteins are the largest in the sample, ranging from 3.0 to 3.8. Since the prediction algorithm was refined by identification of the helices in bacteriorhodopsin, it is suggested that the buried spans in the three proteins will adopt a similar secondary structural configuration. The lengths of the predicted helices are compatible with typical membrane widths to be spanned. It is also noteworthy that the buried tails are in the C-terminal regions of the molecules. Membrane-buried helices have been proposed for signal sequences (cf. [15,23,52]). Though the κ-casein example shows a peak height of only 1.7, the delineated membrane segment is in good agreement with protease cleavage experiments [51].

Fig. 4. Plot of the amino acid sequence number (AA NO) for residues in various membrane-bound proteins versus a weighted five-parameter characteristic value for a given amino acid. The residues are numbered from the N-terminus for all proteins. The termini of membrane-buried regions were determined by the intersection points of the positive peaks with the zero line. The parametric value is shown as zero at each of the seven terminal amino acids due to the end-effects of the least-squares 'smoothing' procedures [24]. The sources of the amino acid sequences are referenced in the text
Similar investigations suggest two buried helical regions for M13 coat protein [53] and yet the parametric maximums for the two regions are only 1.8 and 1.0. Though the membrane-inserted spans of ATP synthase are not well known chemically, the algorithm indicates two such regions.

The character of the porin plot is certainly an exception. There are seven peaks greater than 1.0 with an average length of 21 residues; no buried tail is indicated. Porin is considered to be an integral membrane protein which forms hydrophilic channels allowing the diffusion of various low molecular weight solutes (see Lodish and Rothman [54] for a recent review). Circular dichroism and infrared spectroscopy indicate that a large fraction of the polypeptide exists in β-conformation [55]. Kennedy [56] has proposed various types of regular structures which can favorably interact with the bilayer interior: α-helices, β-barrels, and a family of β-helices. Engelman and Steitz [52] suggest from energetic consideration of polypeptide structures in a lipid environment that only α and 3₁₀ helices will be observed in the membrane hydrophobic interior. Porin has been recently crystallized [57]; a knowledge of its tertiary structure must await the X-ray diffraction results.

Table 5. Normalized membrane-buried preference values

Amino acid   Normalized preference value
Met          2.96
Leu          2.93
Phe          2.03
Ile          1.67
Ala          1.56
Cys          1.23
Val          1.14
Trp          1.08
Thr          0.91
Ser          0.81
Pro          0.76
Tyr          0.68
Gly          0.62
Gln          0.51
Arg          0.45
His          0.29
Asn          0.27
Glu          0.23
Lys          0.15
Asp          0.14

A few peak values in the remaining soluble regions of cytochrome b5 approach 1.0 (Fig. 4).
In order to distinguish between membrane-buried segments and hydrophobic regions in soluble proteins, the algorithm was applied to the primary sequences of eight soluble proteins, with two taken from each of the four basic structural classes [58]: all-α (sperm-whale myoglobin and Themiste pyroides hemerythrin); all-β (bovine Cu,Zn-superoxide dismutase and jack-bean concanavalin A); α + β (hen egg-white lysozyme and bovine pancreatic ribonuclease A); and α/β (bacteriophage T4 thioredoxin and porcine muscle adenyl kinase). The primary structures are listed by Levitt and Greer [22]. The plots for the soluble proteins resulted in 43 peaks with parametric values greater than 0.0; their average height was 1.1 with a standard deviation of 0.5, while the largest peak was 1.8, noted in myoglobin. It is suggested that primary structural spans with parametric maximums greater than 2.0 are likely to be within the lipid environment. For peak heights between 0.0 and 2.0 it would be difficult without experimental facts to distinguish between hydrophobic areas in the soluble portion of the protein and protein fragments in contact with or buried within the lipid surface. However, the prediction results coupled with chemical evidence should be useful in discriminating among various structural models for protein-lipid interactions.

Chou-Fasman preference parameters were calculated for the twenty amino acids using a membrane-bound protein data base (Eqn 5). It consisted of 1125 residues taken from the seven helical spans of bacteriorhodopsin, the signal sequence segments delineated by von Heijne [15] and Austen [23], and residues associated with a peak value greater than 2.0 for the protein structures predicted in Table 4. Also included were segments from several homologous ATP synthase proteolipids listed by Sebald and Wachter [42].
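Eqn 5 is not reproduced in this section, so the following Python sketch is only an assumed form of a Chou-Fasman-style preference: the fractional occurrence of a residue type in the membrane-buried data base divided by its fractional occurrence in a soluble-protein reference set. The function name and all residue counts are hypothetical and serve only to show the arithmetic.

```python
from typing import Dict


def normalized_preferences(membrane_counts: Dict[str, int],
                           reference_counts: Dict[str, int]) -> Dict[str, float]:
    """Ratio of residue fractions: membrane-buried set over reference set.

    Assumed form, not the paper's Eqn 5. Values above 1.0 mean the residue
    is over-represented in the membrane-buried set relative to the reference.
    """
    total_membrane = sum(membrane_counts.values())
    total_reference = sum(reference_counts.values())
    preferences = {}
    for residue, n_reference in reference_counts.items():
        if n_reference == 0:
            continue  # ratio undefined without reference occurrences
        membrane_fraction = membrane_counts.get(residue, 0) / total_membrane
        reference_fraction = n_reference / total_reference
        preferences[residue] = membrane_fraction / reference_fraction
    return preferences


# Invented counts for three residue types, for illustration only.
membrane = {"Leu": 30, "Gly": 8, "Lys": 2}
reference = {"Leu": 20, "Gly": 20, "Lys": 20}
for residue, value in normalized_preferences(membrane, reference).items():
    print(residue, round(value, 2))
```

With these invented counts the ratios come out at about 2.25 for Leu, 0.6 for Gly and 0.15 for Lys, which mirrors the qualitative ordering of nonpolar over charged residues.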
The resulting membrane-buried preference values are given in Table 5; they were normalized according to the distribution of the amino acids in known soluble protein structures listed by Levitt and Greer [22]. The correlations of the membrane-bound preferences with the Chou-Fasman helical and sheet conformational preferences [16] are 0.32 and 0.36 respectively, indicating that the helical and sheet parameters would yield erroneous predictions for lipid-buried protein segments. Of the five hydrophobic parameters, correlation with the Nozaki-Tanford values was the smallest at 0.44.

Buried Helical Sidedness

Engelman and Zaccai [59] have suggested through neutron-scattering experiments that bacteriorhodopsin is an 'inside-out' protein, contrasting with the structural organization of soluble proteins. By comparing their electron density map of deuterated valines and phenylalanines with the bacteriorhodopsin model structure, they indicated which helical faces were in contact with the exterior lipid bilayer. The charged and polar residues were distributed on the interior faces exposed to the protein, while the strongly nonpolar groups were on the exterior faces associated with the membrane side. With the use of the amino acid preferences for lipid contact (Table 5), it was possible to detect the exterior and interior faces for other membrane-bound proteins. The successive Cα atoms in a helical peptide backbone would, in projection down the helical axis, lie on a circle with successive positions rotated by 100°. Lines emanating from the circle center and passing through the Cα positions would show the projected direction of the associated amino acid side groups, giving the appearance of a helical 'wheel' [60]. Only 18 unique Cα points separated by 20° would exist on the wheel regardless of the helical length, as there are on the average only 3.6 residues per turn of an α-helix. This would result in nine possible pairs of faces for any helix.
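The wheel geometry described above can be sketched in Python. The sketch places residue i at an angle of 100° times i, maps it to one of the 18 positions spaced 20° apart, and enumerates the nine ways a diameter can divide the wheel into two faces, counting on each face the residues whose Table 5 preference is below 1.0. The function names, the one-letter keys and the test sequence are hypothetical; picking the division with the largest count difference follows the (N1 - N2) criterion that this section uses for face assignment.

```python
from typing import List, Tuple

# Normalized membrane-buried preferences from Table 5, keyed here by
# one-letter code for brevity (the paper lists three-letter codes).
PREFERENCE = {
    "M": 2.96, "L": 2.93, "F": 2.03, "I": 1.67, "A": 1.56,
    "C": 1.23, "V": 1.14, "W": 1.08, "T": 0.91, "S": 0.81,
    "P": 0.76, "Y": 0.68, "G": 0.62, "Q": 0.51, "R": 0.45,
    "H": 0.29, "N": 0.27, "E": 0.23, "K": 0.15, "D": 0.14,
}

DEGREES_PER_RESIDUE = 100          # rotation between successive C-alpha atoms
SLOT_DEGREES = 20                  # spacing of the unique wheel positions
N_SLOTS = 360 // SLOT_DEGREES      # 18 positions


def wheel_slot(index: int) -> int:
    """Wheel position (0..17) of the residue at 0-based sequence index."""
    return (index * DEGREES_PER_RESIDUE % 360) // SLOT_DEGREES


def face_splits(sequence: str) -> List[Tuple[int, int, int]]:
    """For each of the nine face pairs return (first_slot, n1, n2).

    n1 counts residues with preference < 1.0 on slots first_slot..first_slot+8,
    n2 counts those on the opposite nine slots.
    """
    per_slot = [0] * N_SLOTS
    for i, residue in enumerate(sequence):
        if PREFERENCE[residue] < 1.0:
            per_slot[wheel_slot(i)] += 1
    total = sum(per_slot)
    half = N_SLOTS // 2
    splits = []
    for k in range(half):
        n1 = sum(per_slot[k + j] for j in range(half))
        splits.append((k, n1, total - n1))
    return splits


def best_split(sequence: str) -> Tuple[int, int, int]:
    """Face pair with the largest difference between n1 and n2."""
    return max(face_splits(sequence), key=lambda s: abs(s[1] - s[2]))


# Hypothetical 18-residue helix: lysines placed so they share one face.
print(best_split("KKLLKKLLKLLKKLLKKL"))  # (0, 9, 0)
```

In this made-up example all nine lysines fall on slots 0 through 8, so one face carries every low-preference residue and the opposite face carries none.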
If amino acids with membrane-buried preferences less than 1.0 (Table 5) tend to cluster on one side of a helical wheel, then the lipid-facing and protein-facing helical sides can be inferred. The helical faces were chosen such that the arithmetic difference between the numbers of residues N with a preference less than 1.0 on the two helical sides (N1 - N2) was a maximum. Wheels for the two predicted helices in the C-terminal region of the bovine rhodopsin sequence are shown in Fig. 5a. The (N1 - N2) values are respectively (9-2) and (6-0). It is clear that amino acids not favoring lipid contact can be generally relegated to one side of the helices. These results suggest that the retinal-binding lysine sits within the protein interior. A wheel analysis for the seven model helices (A through G) in bacteriorhodopsin resulted in respective (N1 - N2) values of (9-1), (7-3), (7-1), (7-3), (5-2), (7-4), and (6-3). Wheels for helices A and C are given as examples in Fig. 5b. The amphipathic nature of putative membrane-buried helices has also been noted for other proteins; for example, diphtheria toxin fragment B [28,61] and E. coli murein-lipoprotein [62].

Fig. 5. Helical wheels for the different membrane-bound proteins. (a) Helical wheels [60] for the two predicted membrane-buried helices in the C-terminal segment of bovine rhodopsin. The views are projections down the helical axis with successive Cα positions rotated by 100° along the projected peptide backbones. The directions of the projected side chains are indicated by spokes emanating from the wheel center. Amino acids with membrane-buried propensities (Table 5) less than 1.0 are boxed. The side of the helices suggested to face the protein interior is indicated by thick lines. (b) Helical wheels for bacteriorhodopsin model helices A and C as proposed by Engelman et al. [8]. (c) Helical wheels for the two predicted membrane-bound regions in coliphage M13 coat protein. (d) Helical wheels for the two predicted membrane-bound segments in the proteolipid subunit of the mitochondrial adenosinetriphosphatase from Neurospora crassa. (e) Helical wheel for the predicted membrane-buried region in human erythrocyte glycophorin A.

A model consisting of two membrane-buried helices has been proposed for the coliphage M13 coat protein [53]. The prediction algorithm indicates two possible segments of the M13 protein in contact with lipid (Table 3 and Fig. 4). A helical wheel analysis of these regions (Fig. 5c) suggests a sidedness for the putative helices, thus supporting the model of two helices in contact with each other and the membrane. A similar structure can be inferred for the proteolipid subunit of the mitochondrial adenosinetriphosphatase from N. crassa. The prediction routines applied to the subunit primary sequence [50] suggest two membrane-buried segments (Fig. 4) whose wheels show sidedness (Fig. 5d). Recent circular dichroic studies also suggest a helical hairpin structure for the proteolipid monomer [63]. Since the major subunit of the ATP phosphohydrolase complex of the N. crassa mitochondrial inner membrane is likely to be a hexamer of the proteolipid subunits, a twelve-helical bundle can be envisaged for the proton-translocating structure [42].

The prediction method indicated glycophorin A residues 67-98 as membrane-buried candidates. Recent cross-linking studies (A. H. Ross, R. Radhakrishnan, R. J. Robson and H.
G. Khorana, unpublished results) suggest that Glu-70 is within the bilayer and that residues 62-96 probably define the boundaries of the membrane-embedded segment in glycophorin A. A helical wheel for amino acids 73-94 displays a distinct sidedness (Fig. 5e). Terminal spans were not included in the wheel as they contained charged amino acids which are likely to interact with the polar lipid headpieces and therefore may not participate in the transmembrane helical structure. The distinct facedness of the putative glycophorin helix would suggest that glycophorin A acts as a dimer within the membrane, which is also supported by cross-linking investigations (see Marchesi et al. [65] for a review).

Conclusions

A prediction algorithm based on the physical characteristics of the twenty amino acids can be used to indicate which regions in the primary sequence of a membrane-bound protein are likely to be in contact with the lipid bilayer. A helical wheel analysis of these primary structural spans coupled with chemical evidence allows discrimination among hypothetical models of secondary structural association within the membrane.

The authors wish to express thanks to Drs William Cramer, Schlomo Pundak, and Nancy Scavarda for many useful discussions and to Marilyn Anthony for help in the preparation of the manuscript. P.A. acknowledges support from the National Institutes of Health (grant no. GM27682) and the American Cancer Society (Faculty Research Award no. FRA173). P.A.H. acknowledges support from the National Institutes of Health (grant nos. EY1275 and ER2875) and from the National Science Foundation (grant no. PCM77-17808).

REFERENCES

1. Matthews, B. W. (1976) Annu. Rev. Phys. Chem. 27, 493-523.
2. Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977) J. Mol. Biol. 112, 532-542.
3. Schulz, G. E. & Schirmer, R. H. (1979) Principles of Protein Structure, pp. 108-130.
Springer, Berlin, Heidelberg, New York.
4. Sternberg, M. J. E. & Thornton, J. M. (1978) Nature (Lond.) 271, 15-20.
5. Chou, P. Y. & Fasman, G. D. (1978) Adv. Enzymol. 47, 45-148.
6. Henderson, R. & Unwin, P. N. T. (1975) Nature (Lond.) 257, 28-32.
7. Michel, H., Oesterhelt, D. & Henderson, R. (1980) Proc. Natl Acad. Sci. USA, 77, 338-342.
8. Engelman, D. M., Henderson, R., McLachlan, A. D. & Wallace, B. A. (1980) Proc. Natl Acad. Sci. USA, 77, 2023-2027.
9. Argos, P., Schwarz, J. & Schwarz, J. (1976) Biochim. Biophys. Acta, 439, 261-273.
10. Lenstra, J. A. (1977) Biochim. Biophys. Acta, 491, 333-338.
11. Nozaki, Y. & Tanford, C. (1971) J. Biol. Chem. 246, 2211-2217.
12. Janin, J. (1979) Nature (Lond.) 277, 491-492.
13. Manavalan, P. & Ponnuswamy, P. K. (1978) Nature (Lond.) 275, 673-674.
14. Wolfenden, R. V., Cullis, P. M. & Southgate, C. C. F. (1979) Science (Wash. DC) 206, 575-577.
15. von Heijne, G. (1981) Eur. J. Biochem. 116, 419-422.
16. Levitt, M. (1978) Biochemistry, 17, 4277-4285.
17. Jones, D. D. (1975) J. Theor. Biol. 50, 167-183.
18. Moews, P. C., Knox, J. R., Waxman, D. J. & Strominger, J. L. (1981) Int. J. Peptide Protein Res. 17, 211-218.
19. Chothia, C. (1976) J. Mol. Biol. 105, 1-14.
20. Chou, P. Y. & Fasman, G. D. (1974) Biochemistry, 13, 211-221.
21. Chou, P. Y. & Fasman, G. D. (1974) Biochemistry, 13, 222-245.
22. Levitt, M. & Greer, J. (1977) J. Mol. Biol. 114, 181-239.
23. Austen, B. M. (1979) FEBS Lett. 103, 308-313.
24. Rose, G. D. & Siddhartha, R. (1980) Proc. Natl Acad. Sci. USA, 77, 4643-4647.
25. Matthews, B. W. (1975) Biochim. Biophys. Acta, 405, 442-451.
26. Khorana, H. G., Gerber, G. E., Herlihy, W. C., Gray, C. P., Anderegg, R. J., Nihei, K. & Biemann, K. (1979) Proc. Natl Acad. Sci. USA, 76, 5045-5050.
27. Hayward, S. B. & Stroud, R. M. (1981) J. Mol. Biol. 151, 491-517.
28. Lambotte, P., Falmagne, P., Capian, C., Zanen, J., Ruysschaert, J. & Dirk, J. (1980) J.
Cell Biol. 87, 837-840.
29. Frank, G., Brunner, J., Hauser, H., Wacker, H., Semenza, G. & Zuber, H. (1978) FEBS Lett. 96, 183-188.
30. Siemiatkowski-Juszczak, E. (1981) Ph.D. Dissertation, Southern Illinois University, Carbondale, Illinois, USA.
31. Hargrave, P. A., Fong, S.-L., McDowell, J. H., Mas, M. T., Curtis, D. R., Wang, J. K., Siemiatkowski-Juszczak, E. & Smith, D. P. (1980) Neurochem. Int. 1, 231-244.
32. Hargrave, P. A., McDowell, J. H., Siemiatkowski-Juszczak, E., Fong, S.-L., Kuhn, H., Wang, J. K., Curtis, D. R., Mohana Rao, J. K., Argos, P. & Feldman, R. J. (1982) Vision Res. in the press.
33. Bayley, H., Huang, K.-S., Radhakrishnan, R., Ross, A. H., Takagaki, Y. & Khorana, H. G. (1981) Proc. Natl Acad. Sci. USA, 78, 2225-2229.
34. Wang, J. K., McDowell, J. H. & Hargrave, P. A. (1980) Biochemistry, 19, 5111-5117.
35. Stubbs, G. W., Smith, H. G. & Litman, B. J. (1976) Biochim. Biophys. Acta, 426, 46-56.
36. Michel-Villaz, J., Saibil, H. R. & Chabre, M. (1979) Proc. Natl Acad. Sci. USA, 76, 4405-4408.
37. Rothschild, K. J., Sanches, R., Hsiao, T. L. & Clark, N. A. (1980) Biophys. J. 31, 53-64.
38. Saibil, H., Chabre, M. & Worcester, D. (1976) Nature (Lond.) 262, 266-270.
39. Chen, R., Kramer, C., Schmidmayr, W. & Henning, U. (1979) Proc. Natl Acad. Sci. USA, 76, 5014-5017.
40. Tomita, M., Furthmayr, H. & Marchesi, V. T. (1978) Biochemistry, 17, 4756-4770.
41. Wakabayashi, S., Matsubara, H., Kim, C. H., Kawai, K. & King, T. E. (1980) Biochem. Biophys. Res. Commun. 97, 1548-1554.
42. Sebald, W. & Wachter, E. (1980) FEBS Lett. 122, 307-311.
43. Ashbeck, V. F., Beyreuther, K., Kohler, H., von Wettstein, G. & Braunitzer, G. (1969) Hoppe-Seyler's Z. Physiol. Chem. 350, 1047-1066.
44. Chang, C. N., Blobel, G. & Model, P. (1978) Proc. Natl Acad. Sci. USA, 75, 361-365.
45. Sugimoto, K., Sugisaki, H. & Takanami, M. (1977) J. Mol. Biol. 110, 487-507.
46. Nakashima, Y. & Konigsberg, W. (1974) J. Mol. Biol. 88, 598-600.
47.
Ozols, J. & Strittmatter, P. O. (1969) J. Biol. Chem. 244, 6617-6618.
48. Tsugita, A., Kobayashi, M., Tani, S., Kyo, S., Rashid, M. A., Yoshida, Y., Kajihara, T. & Hagihara, B. (1970) Proc. Natl Acad. Sci. USA, 67, 442-447.
49. Fleming, P. J., Dailey, H. A., Corcoran, D. & Strittmatter, P. (1978) J. Biol. Chem. 253, 5369-5372.
50. Sebald, W., Machleidt, W. & Wachter, E. (1980) Proc. Natl Acad. Sci. USA, 77, 785-789.
51. Mercer, J. C., Haze, G., Gaye, P. & Hue, D. (1978) Biochem. Biophys. Res. Commun. 82, 1236-1245.
52. Engelman, D. M. & Steitz, T. A. (1981) Cell, 23, 411-422.
53. Date, T., Goodman, J. M. & Wickner, W. T. (1980) Proc. Natl Acad. Sci. USA, 77, 4669-4673.
54. Lodish, H. F. & Rothman, J. E. (1979) Sci. Am. 240, 38-53.
55. Rosenbusch, J. P. (1974) J. Biol. Chem. 249, 8019-8029.
56. Kennedy, S. J. (1978) J. Membrane Biol. 42, 265-279.
57. Garavito, R. M. & Rosenbusch, J. P. (1980) J. Cell Biol. 86, 327-329.
58. Levitt, M. & Chothia, C. (1976) Nature (Lond.) 261, 552-558.
59. Engelman, D. M. & Zaccai, G. (1980) Proc. Natl Acad. Sci. USA, 77, 5894-5898.
60. Schiffer, M. & Edmundson, A. B. (1967) Biophys. J. 7, 121-135.
61. Kayser, G., Lambotte, P., Falmagne, P., Capian, C., Zanen, J. & Ruysschaert, J. (1981) Biochem. Biophys. Res. Commun. 99, 358-363.
62. Inouye, M. (1974) Proc. Natl Acad. Sci. USA, 71, 2396-2400.
63. Mao, D. & Wallace, B. A. (1982) Biophys. J. 37, 325a.
64. Reference deleted.
65. Marchesi, V. T., Furthmayr, H. & Tomita, M. (1976) Annu. Rev. Biochem. 45, 667-698.
66. Finlay, J. B. C., Brett, M. & Pappin, D. J. C. (1981) Nature (Lond.) 293, 314-316.
67. Tomita, M. & Marchesi, V. T. (1975) Proc. Natl Acad. Sci. USA, 72, 2964-2968.
68. Dailey, H. A. & Strittmatter, P. (1981) J. Biol. Chem. 276, 3951-3955.

P. Argos and J. K. M. Rao, Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA 47907
P. A. Hargrave, School of Medicine and Department of Chemistry and Biochemistry,
Southern Illinois University, Carbondale, Illinois, USA 62901