Proteins of Unusual Composition

Peptide and Protein Group Colloquium, Organized and Edited by Dr A. Lyall (Glaxo Group Research, Greenford), held in London 27 September 1990

Biochemical Society Transactions (1991), Volume 19

Proteins containing unusual amino acid sequences

R. P. Ambler
Institute of Cell and Molecular Biology, Division of Biological Sciences, University of Edinburgh, Mayfield Road, Edinburgh EH9 3JR, U.K.

Introduction

Unusual amino acid sequences

When experimentally determined amino acid sequences of proteins first became available for study, they were looked at for statistically non-random elements, and for evidence of repeating units. Thus Brenner [1] used an analysis of dipeptide frequencies, at a time when only about 60% of the 400 possible dipeptide sequences had been recognized in natural proteins, as a way of deciding if some hypotheses about the nature of the genetic code were possible. Early studies showed there to be imperfect repetitive sequence elements in collagen, compatible with the repeat distances recognized by fibre diffraction. Nevertheless, in general, proteins appeared to consist of non-repetitive information that seemed to be very specific, and which was being carefully edited by natural selection. Monkeys did not seem to be getting at the typewriter.

Dayhoff [2] surveyed the amino acid compositions of proteins of known sequence. Some amino acids, for instance glycine, alanine and leucine, were more abundant in proteins than others, with methionine and the aromatic amino acids generally present in lower amounts. With proteins that were rich in a particular amino acid, the richness seemed to be distributed throughout the sequence. Thus with the cytochromes c', proteins for which many homologous sequences are known [3], and which are characteristically alanine-rich (around 18-20%), the Ala-Ala, Ala-Ala-Ala and Ala-Ala-Ala-Ala elements are not located in specific parts of the sequence.
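The two kinds of survey described above, dipeptide coverage and amino acid composition, can be illustrated with a minimal Python sketch. The function names and the toy sequence are hypothetical and chosen only for illustration; they are not taken from the original analyses, and the sketch assumes sequences are given in the one-letter code.

```python
from collections import Counter
from itertools import product

# One-letter codes of the 20 protein amino acids
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> dict[str, float]:
    """Return the percentage of each amino acid that is present in seq."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return {}
    return {a: 100.0 * counts[a] / total for a in AMINO_ACIDS if counts[a]}


def dipeptide_coverage(seqs: list[str]) -> float:
    """Fraction of the 400 possible dipeptides seen at least once in seqs."""
    possible = {a + b for a, b in product(AMINO_ACIDS, repeat=2)}
    seen = set()
    for s in seqs:
        s = s.upper()
        # Collect every overlapping pair of adjacent residues
        seen.update(s[i:i + 2] for i in range(len(s) - 1))
    return len(seen & possible) / len(possible)


# Hypothetical alanine-rich toy sequence, not taken from any database
toy = "AAKAAGLAAAV"
print(f"Ala content: {composition(toy)['A']:.1f}%")
print(f"Dipeptide coverage: {dipeptide_coverage([toy]):.4f}")
```

Applied to a whole database rather than a single toy sequence, `dipeptide_coverage` gives the kind of figure quoted for the early data (about 60% of the 400 possibilities observed).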
Cytochrome c' is largely made up of α-helix [4], and it would seem that in three of the four helices almost every position is alanine in one or other of the known sequences.

Forty years after the publication of the insulin B-chain sequence [5], and with the databases now containing more than 10^6 peptide bonds of sequence, our perception of protein sequences is somewhat different. Repeating motifs occur, with periodicities up to at least 48 residues (e.g. in the bacterial ice-nucleation proteins [6]), and long runs of the same amino acid have been found. Runs of 30 or more consecutive residues of asparagine, aspartic acid, glutamine or glutamic acid have been found in proteins from a variety of eukaryotic sources. For example, there is a string of 33 successive aspartic acid residues in a chicken laminin-binding protein [7]. A bone protein, osteopontin, contains a run of nine aspartic acid residues [8]. This protein binds very strongly to hydroxylapatite in vitro and interacts with the bone matrix in vivo, and the authors wonder if the Asp9 element may be responsible for attachment.

It has long been recognized that protamines, proteins that replace histones in the chromatin of mature fish sperm, contain nearly 70% arginine. More recently, proteins containing similarly high contents of other amino acids have been recognized through the translation of open reading frames of DNA sequences. Thus there is a glycine-rich structural protein in plant cell walls [9], but although around 70% of the residues are glycine the protein contains no runs longer than Gly,.

A search of sequence databases for runs of consecutive identical amino acids shows that not all the '20' protein amino acids have the same tendency to occur in runs. Thus, the largest number of consecutive residues of tryptophan, almost always an amino acid present in low relative amounts in proteins, was three, and the highest occurrence for tyrosine was six.
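A search for runs of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the original database searches are not described in detail, the function names `longest_runs` and `runs_at_least` are hypothetical, and the toy sequence is invented rather than taken from a database.

```python
from itertools import groupby


def longest_runs(seq: str) -> dict[str, int]:
    """Map each residue in seq to the length of its longest consecutive run."""
    best: dict[str, int] = {}
    for residue, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length > best.get(residue, 0):
            best[residue] = length
    return best


def runs_at_least(seq: str, min_len: int) -> list[tuple[str, int, int]]:
    """List (residue, start index, length) for every run of at least min_len."""
    hits = []
    pos = 0
    for residue, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            hits.append((residue, pos, length))
        pos += length
    return hits


# Hypothetical toy sequence: Asp5, Gln3, Leu6 and Trp2 runs
toy = "MK" + "D" * 5 + "QQQ" + "L" * 6 + "WW" + "G"
print(longest_runs(toy))
print(runs_at_least(toy, 5))
```

Taking the maximum of `longest_runs` over every entry in a database, residue by residue, would give a table of the longest run found for each of the 20 amino acids.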
Runs of the branched side-chain amino acids valine and isoleucine are short (maximum seven) and uncommon, which is likely to relate to the steric hindrance caused by their side-chains. In contrast, runs of six or so leucine residues are quite common, although runs longer than eight have not yet been found. The longest runs seem to occur with the amide and acid residues, as noted above. Runs of glutamine residues are particularly common. They occur in many seed storage proteins (see below), and as the major component of the ‘opa’ boxes in developmentally important proteins including some that contain the homoeo box structural element [10]. Runs of other amino acids also occur in homoeotic proteins, often bracketing the ‘homoeo box’, and there has been speculation that these runs modulate translation at various stages of differentiation.

These simple repetitive sequences occur in several different types of protein. Most striking are the surface antigen proteins of Plasmodium [11] and other protozoa, with runs of histidine, serine, aspartic acid, asparagine and phenylalanine already recognized. One explanation for their existence is that the structures have evolved to hinder immunological disposal of parasitic protozoa, but the occurrence of similar repeated sequences in free-living relatives makes such an explanation less attractive. Storage proteins in seeds are often made up from imperfect multiple repeats of short elements rich in a single amino acid [12]. This amino acid is often glutamine, and runs of up to about 20 glutamine residues occur in wheat gliadins. Other repeat sequences appear to be part of structurally significant domains. Thus the longest detected valine run (seven) occurs in a human pulmonary-surfactant-associated protein, within a run of 16 valine/leucine/isoleucine residues; the protein has a recognizable function in helping to reduce surface tension [13].

In several proteins runs and unusual sequences occur close to the C-termini.
Thus long runs of acidic amino acids occur, followed by a stop codon, at the C-terminus of Drosophila troponin-T [14], the laminin-binding protein mentioned above, in non-histone chromosomal proteins [15], and in a mouse homoeotic protein [16]. The well-known (‘V8’) protease from Staphylococcus aureus has an acidic C-terminal domain that contains a run of 12 imperfect -Pro-Asp-Asn- repeats. The domain has been characterized at both the protein (R. P. Ambler, unpublished work) and the DNA levels [17]. That this region contained an anomalous structure was first suspected when it was found that there was not a single -Asp-Pro- sequence present. This peptide bond is the most acid labile of all the 400 possibilities derived from the 20 protein amino acids.

Occurrence of runs of amino acids

In the next section some of the long runs for each of the ‘20’ protein amino acids are shown. The amino acids are taken in the alphabetical order of the one-letter code. The runs were found by a search of databases in the Summer of 1990.

Alanine

Ala,, near N-terminus of the Drosophila homoeo box protein engrailed [18].

Some fish serum antifreeze peptides contain more than 70% alanine, but the longest runs are Ala,, which occur several times [19].

Cysteine

The databases record Cys7 in a sheep low-sulphur keratin fraction. However, the reference shows this entry to be a peptide composition, not a sequence [2].

A sequence Cys, is reported to occur in the protein endozepine [21].

Aspartic acid

Asp33 as the C-terminal domain in chicken laminin-binding protein [7].

Asp,, in a hypothetical cowpox virus protein [22]; every codon in this run is GAT.

Asp,, in a Plasmodium aspartic-acid-rich protein, which also contains a long phenylalanine run [23].

Asp,, very close to the C-terminus of what is now recognized to be a yeast ubiquitin-conjugating enzyme (EC 6.3.2.19; [24, 25]): 20 of the 23 terminal residues are acids, and there are two -Asp-Asp-Met- tripeptides.
Glutamic acid

Glu,, as the C-terminal domain of Drosophila troponin-T [14]; all but five of the final 55 residues are Asp or Glu.

Glu3, in a Plasmodium glutamic-acid-rich protein [26]. This sequence is in a region of 97 residues of which all but four are Asp or Glu, but this is followed by a short very basic region before the stop codon.

Runs of glutamic acid occur at the C-terminus of several vertebrate developmental proteins (e.g. ref [16]), but do not seem to occur in such proteins from Drosophila.

Phenylalanine

Phe,, in the signal peptide of a Plasmodium aspartic-acid-rich protein [23]; the sequence is MYLFIYIFFFFFFFFFFVIVqkdie..., with the DNA sequence for the phenylalanines all thymine. Only one other run as long as Phe, was found in the databases.

Glycine

Gly24 in the human androgen receptor [27]; in the rat there is only Gly,. The protein also contains glutamine runs (see below).

Gly20 near the N-terminus of human calpain, a calcium-dependent protease [28a]. This region also contains a separate Gly,, run. The runs are present in the rabbit and the pig enzymes, but the lengths are different.

Gly,, in the human homoeo box protein HOX-2G [28].

In the glycine-rich structural proteins of the plant cell wall [9], the longest consecutive run is Gly,, but there is a long sequence (280 residues) entirely of the motif -Gly-Xaa-Gly-Xaa-, Xaa also often being Gly.

Histidine

His,, in the mouse homoeotic protein ERA-1 [29].

His, occurs several times in the Plasmodium histidine-rich glycoprotein [30]. This protein contains 74% histidine, and despite containing 304 residues (43 kDa) completely lacks the eight amino acids cysteine, isoleucine, lysine, methionine, asparagine, glutamine, arginine and serine.

Isoleucine

Ile, in the N-terminal signal sequence of the mitochondrial NADH-ubiquinone oxidoreductase of Leishmania tarentolae [31]. In a run of 19 residues, all but 3 are Ile, Leu or Val.
All the Leishmania mitochondrial proteins seem to be isoleucine rich.

Lysine

Lyss in a human translational initiation factor [32]; the same protein also contains two other Lys6 sequences. The authors suggest these regions interact directly with nucleic acid.

Lys, in the yeast potassium transporter protein TRK1 [33]. This sequence is immediately preceded by (Asn),-Arg, and elsewhere there is Asp,,. These are the two most notably highly charged elements in a sequence containing several potential membrane-spanning domains.

Lysine repeats appear to be infrequent compared with those of either histidine or arginine.

Leucine

Leu,, in a kinase-related transforming protein from feline sarcoma virus [34]; the sequence is part of a longer hydrophobic region. This protein also contains Ser,, (see below).

Runs of six to eight leucines occur in a wide variety of proteins, but longer ones are rare.

Methionine

Met7 in an α-amidating protein (EC 3.5.1.-) from Xenopus laevis [35], although there is only Met5 in a closely related protein from the same organism.

The hinge-ligament proteins of some bivalve molluscs are rich in methionine (up to 25% [36]).

Asparagine

Asn20 in a gene from Dictyostelium cloned using a genomic repetitive sequence as a probe [37]. The same probe also identified putative proteins containing Thr and Gln repeats.

Asn,, in a Plasmodium asparagine-rich protein [38].

Asn,, in the Drosophila homoeotic protein deformed [39, 40], and sequences nearly as long in other homoeotic proteins.

A yeast transcription factor contains a sequence rich in asparagine and threonine, which is discussed below.

Proline

Pro26 (followed by -Ser-Pro,,) in the Epstein-Barr virus nuclear protein BYRF1 [41].

Pro,, in the proline-rich (42/127 residues) C-terminal domain of the protease acrosin, which may function to bind fucose [42]. The rest of this protein is homologous to other serine proteases.

Pro,, in the mouse homoeotic protein HOX2.6 [43].
Pro8 in human androgen receptor [27]. This is in a region of the protein where the majority of residues are well conserved between human and rat, but in the rat sequence two of the prolines are changed.

Hydroxyproline occurs as 45% of the plant cell-wall protein extensin. Lamport [44] showed the presence of -Ser-Hyp-Hyp-Hyp-Hyp- elements, and his findings were extended by Chen & Varner [45], who showed that 25 of these sequences occur in the 280 residues of this highly glycosylated 86 kDa protein.

Glutamine

Gln,, very near the N-terminus of the mouse protein mopa [46], which is differentially expressed in tissues and stages of development, and which also contains another Gln27 in the first 100 residues. Polyglutamine sequences occur in many Drosophila homoeotic proteins, and are the element recognized in the opa repetitive sequences [47].

Gln,, in the yeast glucose repression mediator protein CYC8 [48, 49]. This protein also contains a Gln,, run near the N-terminus, while in the middle of the molecule there is a stretch that is almost perfect (Gln-Ala),, followed by the Gln,,.

Gln,, in rat androgen receptor, although the block is Gln,, in a human protein and 100 residues earlier in the sequence [27].

In a rat glucocorticoid receptor there is Gln,, 75 residues from the N-terminus of a 790-residue protein [50], but it is thought that this is not important to the functioning of the protein, as the corresponding human protein lacks this element, while in the mouse there is only Gln,.

Multiple glutamines are common in seed storage proteins, including Gln,, in wheat gliadin.

Arginine

Arg,, towards the C-terminus of the Drosophila homoeotic protein caudal [51]. The N-terminus contains several His and Asn repeats, instead of the Gln repeats which are more common in Drosophila homoeotic proteins.

Many protamines and sperm histones contain runs of up to seven arginines, and runs occur in several viral proteins.
Serine

Ser,, (and Ser25-Pro-Ser7) in Xenopus vitellogenin [52]. This very serine-rich region is processed to form phosvitin or the phosvettes.

Ser,, in a Plasmodium serine-repeat protein (Bzik, D. J., unpublished work, but listed in the databases).

Ser,, in the feline sarcoma virus protein that also contains the long Leu repeat (see previously).

Threonine

Thr,, in Drosophila simulans salivary glue protein [53]. The protein is evolving rapidly in Drosophila, so the Thr,, run is not conserved between species, but all have an ‘A + C’ region rich in threonine.

Thr, in yeast transcription factor ADR6 [54], in a 30-residue sequence containing only threonine and asparagine. This protein also contains glutamine repeats.

Valine

Val7 in a bovine pulmonary surfactant-associated protein [13], in a very hydrophobic region that is functionally necessary (...KRLLIVVVWVVLVVVWIGAM...). This region is always hydrophobic, but differs in detail between species.

Valine runs are not common.

Tryptophan

No runs of more than Trp3 have yet been identified.

Tyrosine

Tyr6 in a human immunoglobulin heavy-chain precursor [55].

Classes of proteins that contain unusual sequences

In printouts of runs of amino acids, some types of protein occur again and again. These include viral proteins, plant and seed structural proteins, surface proteins from protozoan parasites, and the protein products of homoeotic genes, although the level of viral protein occurrence may primarily reflect their abundance in the databases. Simple repetitions of a single amino acid do not seem to be common in bacterial proteins.

There is also a set of miscellaneous proteins where there is a run of acidic amino acids at the extreme C-terminus, often being ended by the stop codon. These are sometimes all aspartic acid or all glutamic acid, but in a rat non-histone chromosomal protein there is a run of 30 mixed acidic residues, preceded by the sequence -Lys-Lys-Lys-Lys- [15].
The authors speculate about a function in nucleosome assembly or DNA replication.

It has been shown that the plant seed [12] and cell-wall [56] proteins are made up from repetitions of peptide elements. The elements are likely to be ‘pipe-bends’ with secondary structures that enable the proteins to pack well and form the requisite overall structures. Some divergence is taking place, so elements are not necessarily identical. The seed protein elements are very rich in glutamine, and the long runs form when other residues are mutated to glutamine.

Proteins made up from repeated structural elements also occur in animals, for example collagen, with the long-known -Gly-Xaa-Yaa- motif, where Xaa is often proline and Yaa hydroxyproline. Similarly, in bacteria there are simple elements, such as -Asp-Asn-Pro- in staphylococcal protease, as well as much more complex ones such as those in the ice-nucleation proteins [6]. Possible explanations for the occurrence of repeating elements in the surface proteins of protozoal parasites are discussed by Ridley [11].

Many homoeotic genes from Drosophila and mammals have now been cloned and sequenced. They contain a recognizable domain, the homoeo box [10], which is about 60 residues long and contains a helix-turn-helix DNA-binding site. In the domains on either side of this central unit there are often long runs of a single amino acid. While glutamine runs are common, making up the widely distributed opa box, 10 of the 20 protein amino acids are already known to occur in runs in developmental proteins.

It is unlikely that runs will always have a precise structural purpose. Evidence for this assertion comes from inter-species comparisons, which show that run lengths are highly variable in proteins. Examples known are from proteins as diverse as the protease calpain [28a] and an androgen receptor [27].
In the latter protein, if the run regions are excluded, the rat and the human protein have more than 90% identity, but the long glutamine runs are at positions 100 residues apart in the sequence, and there are indications that individual humans have glutamine runs of different lengths.

In most cases the runs are more pronounced at the amino acid rather than the nucleotide level, indicating that they have existed long enough that third-base mutation has resulted in a mixture of codons being used. However, a run of phenylalanines in a Plasmodium protein is all coded by TTT [23], and a run of Asp,, in a hypothetical cowpox virus protein is all coded by GAT [22].

The occurrence of long runs of amino acids and complex repetitive elements in proteins is a demonstration of the versatility of this class of molecule. I hope that we shall soon have a better understanding of their origin, their function, and the structures that they take up.

I would like to thank Andrew Lyall and Sarah McQuay of the Biocomputing Research Unit, University of Edinburgh, for running the database searches.

1. Brenner, S. (1957) Proc. Natl. Acad. Sci. U.S.A. 43, 687-694
2. Dayhoff, M. O. (ed.) (1972) Atlas of Protein Sequence and Structure, Vol. 5, National Biomedical Research Foundation, Washington D.C.
3. Ambler, R. P., Bartsch, R. G., Daniel, M., Kamen, M. D., McLellan, L., Meyer, T. E. & Van Beeumen, J. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 6854-6857
4. Weber, P. C., Bartsch, R. G., Cusanovich, M. A., Hamlin, R. C., Howard, A., Jordan, S. A., Kamen, M. D., Meyer, T. E., Weatherford, D. W., Xuong, N. H. & Salemme, F. R. (1980) Nature (London) 286, 302-304
5. Sanger, F. & Tuppy, H. (1951) Biochem. J. 49, 463-481, 481-490
6. Warren, G. & Corotto, L. (1989) Gene 85, 239-242
7. Clegg, D. O., Helder, J. C., Hann, B. C., Hall, D. E. & Reichardt, D. F. (1988) J. Cell Biol. 107, 699-705
8. Oldberg, A., Franzén, A. & Heinegård, D. (1986) Proc. Natl. Acad. Sci. U.S.A. 83, 8819-8823
9. Condit, C. M. & Meagher, R. B.
(1986) Nature (London) 323, 178-181
10. Gehring, W. J. (1987) Science 236, 1245-1252
11. Ridley, R. G. (1991) Biochem. Soc. Trans. 19, 525-528
12. Shewry, P. R., Hull, G. & Tatham, A. S. (1991) Biochem. Soc. Trans. 19, 528-530
13. Glasser, S. W., Korfhagen, T. R., Perme, C. M., Pilot-Matias, T. J., Kister, S. E. & Whitsett, J. A. (1988) J. Biol. Chem. 263, 10326-10331
14. Bullard, B., Leonard, K., Larkins, A., Butcher, G., Karlik, C. & Fyrberg, E. (1988) J. Mol. Biol. 204, 621-637
15. Paonessa, G., Frank, R. & Cortese, R. (1987) Nucleic Acids Res. 15, 9077
16. Kessel, M., Schulze, F., Fibi, M. & Gruss, P. (1987) Proc. Natl. Acad. Sci. U.S.A. 84, 5306-5310
17. Carmona, C. & Gray, G. L. (1987) Nucleic Acids Res. 15, 6757
18. Kassis, J. A., Poole, S. J., Wright, D. K. & O'Farrell, P. (1986) EMBO J. 5, 3583-3589
19. Davies, P. L., Roach, A. H. & Hew, C.-L. (1982) Proc. Natl. Acad. Sci. U.S.A. 79, 335-339
20. reference deleted
21. Webb, N. R., Rose, T. M., Malik, N., Marquardt, H., Shoyab, M., Todaro, G. J. & Lee, D. C. (1987) DNA 6, 71-80
22. Patel, D. D. & Pickup, D. J. (1987) EMBO J. 6, 3787-3794
23. Lenstra, R., D'Auriol, L., Andrieu, B., Le Bras, J. & Galibert, F. (1987) Biochem. Biophys. Res. Commun. 146, 368-377
24. Reynolds, P., Weber, S. & Prakash, L. (1985) Proc. Natl. Acad. Sci. U.S.A. 82, 168-172
25. Jentsch, S., McGrath, J. P. & Varshavsky, A. (1987) Nature (London) 329, 131-134
26. Triglia, T., Stahl, H.-D., Crewther, P. E., Silva, A., Anders, R. F. & Kemp, D. J. (1988) Mol. Biochem. Parasitol. 31, 199-202
27. Lubahn, D. B., Joseph, D. R., Sar, M., Tan, J., Higgs, H. N., Larson, R. E., French, F. S. & Wilson, E. M. (1988) Mol. Endocrinol. 2, 1265-1275
28. Acampora, D., D'Esposito, M., Faiella, A., Pannese, M., Miogliaccio, E., Morelli, F., Stornaiulo, A., Nigro, V., Simeone, A. & Bonicelli, E. (1989) Nucleic Acids Res. 17, 10385-10402
28a. Miyake, S., Emori, Y. & Suzuki, K. (1986) Nucleic Acids Res. 14, 8805-8817
29. Larosa, G. J.
& Gudas, L. (1988) Mol. Cell. Biol. 8, 3906-3917
30. Ravetch, J. V., Feder, R., Pavlovec, A. & Blobel, G. (1984) Nature (London) 312, 616-620
31. Simpson, L., Neckelmann, N., De La Cruz, V. F., Simpson, A. M., Feagin, J. E., Jasmer, D. P. & Stuart, K. J. (1987) J. Biol. Chem. 262, 6182-6196
32. Pathak, V. K., Nielsen, P. J., Trachsel, H. & Hershey, J. W. B. (1988) Cell (Cambridge, Mass.) 54, 633-639
33. Gaber, R. F., Styles, C. A. & Fink, G. R. (1987) Mol. Cell. Biol. 8, 2848-2859
34. Woolford, J., McAuliffe, A. & Rohrschneider, L. R. (1988) Cell (Cambridge, Mass.) 55, 965-977
35. Ohsuye, K., Kitano, K., Wada, Y., Fuchimura, K., Tanaka, S., Mizuno, K. & Matsuo, H. (1988) Biochem. Biophys. Res. Commun. 150, 1275-1281
36. Kikuchi, Y. & Tamiya, N. (1987) Biochem. J. 242, 505-510
37. Shaw, D. R., Richter, H., Giorda, R., Ohmachi, T. & Ennis, H. (1989) Mol. Gen. Genet. 218, 453-459
38. Stahl, H.-D., Bianco, A. E., Crewther, P. E., Burkot, T., Coppel, R. L., Brown, G. V., Anders, R. F. & Kemp, D. J. (1986) Nucleic Acids Res. 14, 3089-3102
39. Laughon, A., Carroll, S. B., Storfer, F. A., Riley, P. D. & Scott, M. P. (1985) Cold Spring Harbor Symp. Quant. Biol. 50, 253-262
40. Regulski, M., McGinnis, N., Chadwick, R. & McGinnis, W. (1987) EMBO J. 6, 767-777
41. Baer, R., Bankier, A. T., Biggin, M. D., Deininger, P. L., Farrell, P. J., Gibson, T. J., Hatfull, G., Hudson, G. S., Satchwell, S. C., Seguin, C., Tuffnell, P. S. & Barrell, B. (1984) Nature (London) 310, 207-211
42. Adham, I. M., Maier, W.-M., Hoyer-Fender, S., Tsauosidou, S., Engel, W. & Klemm, U. (1989) Eur. J. Biochem. 182, 562-568
43. Graham, A., Papalopulu, N., Lorimer, J., McVey, J. H., Tuddenham, E. G. D. & Krumlauf, R. (1988) Genes Dev. 2, 1424-1438
44. Lamport, D. T. A. (1977) in Recent Advances in Phytochemistry (Loewus, F. A. & Runeckles, B. C., eds.) vol. 11, pp. 79-115, Plenum, New York
45. Chen, J. & Varner, J. E. (1985) EMBO J. 4, 2145-2151
46. Duboule, D. J., Haenlin, M., Galliot, B. & Mohier, E.
(1987) Mol. Cell. Biol. 7, 2003-2006
47. Wharton, K. A., Yedvobnick, B., Finnerty, V. G. & Artavanis-Tsakonas, S. (1985) Cell (Cambridge, Mass.) 40, 55-62
48. Schultz, J. & Carlson, M. (1987) Mol. Cell. Biol. 7, 3637-3645
49. Trumbly, R. J. (1988) Gene 73, 97-111
50. Miesfeld, R., Rusconi, S., Godowski, P. J., Maler, B. A., Okret, S., Wikstrom, A.-C., Gustafsson, J.-A. & Yamamoto, K. R. (1986) Cell (Cambridge, Mass.) 46, 389-399
51. Mlodzik, M. & Gehring, W. J. (1987) Cell (Cambridge, Mass.) 48, 465-478
52. Gerber-Huber, S., Nardelli, D., Haefliger, J.-A., Cooper, D. N., Givel, F., Germond, J.-E., Engel, J., Green, N. M. & Wahli, W. (1987) Nucleic Acids Res. 15, 4737-4760
53. Martin, C. H., Mayeda, C. A. & Meyerowitz, E. M. (1988) J. Mol. Biol. 201, 273-287
54. O'Hara, P. J., Horowitz, H., Eichinger, H. & Young, E. T. (1988) Nucleic Acids Res. 16, 10153-10170
55. Kipps, T. J., Tomhave, E., Pratt, L. F., Duffy, S., Chen, P. P. & Carson, D. A. (1989) Proc. Natl. Acad. Sci. U.S.A. 86, 5913-5917
56. Showalter, A. M. & Rumeau, D. (1989) in Organisation and Assembly of Animal and Plant Extracellular Matrix (Adair, W. S. & Mecham, R. P., eds.), Academic Press

Received 17 December 1990