Proteins of Unusual Sequence Composition
Peptide and Protein Group Colloquium, Organized and Edited by Dr A. Lyall (Glaxo Group Research,
Greenford), held in London 27 September 1990
Proteins containing unusual amino acid sequences
R. P. Ambler
Institute of Cell and Molecular Biology, Division of Biological Sciences, University of Edinburgh, Mayfield Road,
Edinburgh EH9 3JR, U.K.
Introduction
Unusual amino acid sequences
When experimentally determined amino acid sequences of proteins first became available for
study, they were examined for statistically non-random elements, and for evidence of repeating
units. Thus Brenner [1] used an analysis of dipeptide frequencies, at a time when only about 60%
of the 400 possible dipeptide sequences had been recognized in natural proteins, as a way of
deciding if some hypotheses about the nature of the genetic code were possible. Early studies
showed there to be imperfect repetitive sequence elements in collagen, compatible with the
repeat distances recognized by fibre diffraction. Nevertheless, in general, proteins appeared
to consist of non-repetitive information that seemed to be very specific, and which was being
carefully edited by natural selection. Monkeys did not seem to be getting at the typewriter.
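The kind of dipeptide survey referred to here can be illustrated with a short sketch. It is not Brenner's procedure; the function name `dipeptide_coverage` and the toy sequences are assumptions introduced only for illustration, and sequences are assumed to be held as one-letter strings.

```python
from itertools import product

# The 20 protein amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dipeptide_coverage(sequences):
    """Return (number of distinct dipeptides seen, number possible)."""
    possible = {a + b for a, b in product(AMINO_ACIDS, repeat=2)}  # 20 * 20 = 400
    seen = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            pair = seq[i:i + 2]
            if pair in possible:  # skip non-standard symbols such as X
                seen.add(pair)
    return len(seen), len(possible)


# Hypothetical toy input, not real database entries.
seen, possible = dipeptide_coverage(["MKVLAAGG", "GGSAAK"])
print(f"{seen} of {possible} dipeptides observed")
```

Applied to a real collection of sequences, the first number divided by the second gives the fraction of possible dipeptides that have been observed.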
Dayhoff [2] surveyed the amino acid compositions of proteins of known sequence. Some
amino acids, for instance glycine, alanine and leucine, were more abundant in proteins than
others, with methionine and the aromatic amino acids generally present in lower amounts. With
proteins that were rich in a particular amino acid, the richness seemed to be distributed
throughout the sequence. Thus with the cytochromes c', proteins for which many homologous
sequences are known [3], and which are characteristically alanine-rich (around 18-20%), the
Ala-Ala, Ala-Ala-Ala and Ala-Ala-Ala-Ala elements are not located in specific parts of the
sequence. Cytochrome c' is largely made up of α-helix [4], and it would seem that in
three of the four helices almost every position is alanine in one or other of the known sequences.
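Both quantities discussed in this paragraph, the overall composition and the distribution of a residue along the chain, are simple to compute. The following is a minimal sketch, assuming one-letter sequences; `composition_percent` and `windowed_percent` are hypothetical helper names, and the window size is an arbitrary choice for the user.

```python
from collections import Counter


def composition_percent(seq):
    """Percentage of each residue type in a one-letter protein sequence."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def windowed_percent(seq, residue, window):
    """Percentage of `residue` in consecutive non-overlapping windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    chunks = [seq[i:i + window] for i in range(0, len(seq), window)]
    return [100.0 * chunk.count(residue) / len(chunk) for chunk in chunks]


# Toy sequence (hypothetical): alanine-rich, with the alanine spread out.
toy = "AKAAGLAAEAALKA"
print(composition_percent(toy)["A"])
print(windowed_percent(toy, "A", 7))
```

Roughly equal percentages across windows would correspond to the richness being distributed throughout the sequence; a single window with a very high value would point to a localized run.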
Forty years after the publication of the insulin B-chain sequence [5], and with the databases
now containing more than 10^6 peptide bonds of sequence, our perception of protein sequences is
somewhat different. Repeating motifs occur, with periodicities up to at least 48 residues (e.g.
in the bacterial ice-nucleation proteins [6]), and long runs of the same amino acid have been
found. Runs of 30 or more consecutive residues of asparagine, aspartic acid, glutamine or
glutamic acid have been found in proteins from a variety of eukaryotic sources. For example
there is a string of 33 successive aspartic acid residues in a chicken laminin-binding protein
[7]. A bone protein, osteopontin, contains a run of nine aspartic acid residues [8]. This
protein binds very strongly to hydroxylapatite in vitro and interacts with the bone matrix in
vivo, and the authors wonder if the Asp9 element may be responsible for attachment.
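One simple way to look for the repeating motifs mentioned above is to compare a sequence with itself at a fixed offset. The sketch below is an illustration only, not the method used in any of the cited studies; `identity_at_lag` and `best_period` are hypothetical names, and exact residue identity is assumed as the matching criterion (imperfect repeats would give values below 1).

```python
def identity_at_lag(seq, lag):
    """Fraction of positions i at which seq[i] equals seq[i + lag]."""
    pairs = len(seq) - lag
    if lag <= 0 or pairs <= 0:
        raise ValueError("lag must be positive and shorter than the sequence")
    matches = sum(1 for i in range(pairs) if seq[i] == seq[i + lag])
    return matches / pairs


def best_period(seq, max_lag):
    """Lag (1..max_lag) with the highest self-identity; ties go to the shortest lag."""
    return max(range(1, max_lag + 1), key=lambda lag: identity_at_lag(seq, lag))


# Toy sequence (hypothetical) with a perfect four-residue repeat.
toy = "AGTSAGTSAGTS"
print(best_period(toy, 6), identity_at_lag(toy, 4))
```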
It has long been recognized that protamines,
proteins that replace histones in the chromatin of
mature fish sperm, contain nearly 70% arginine.
More recently, proteins containing similarly high
contents of other amino acids have been recognized
through the translation of open reading frames of
DNA sequences. Thus there is a glycine-rich structural protein in plant cell walls [9], but although
around 70% of the residues are glycine the protein
contains no runs longer than Gly,.
Biochemical Society Transactions (1991) Volume 19
A search of sequence databases for runs of consecutive identical amino acids shows that not all
the '20' protein amino acids have the same tendency to occur in runs. Thus, the largest number
of consecutive tryptophan residues, almost always an amino acid present in low relative amounts
in proteins, was three, and the highest occurrence for tyrosine was six. Runs of the branched
side-chain amino acids valine and isoleucine are short (maximum seven) and uncommon, which is
likely to relate to the steric hindrance caused by their side-chains. In contrast, runs of six
or so leucine residues are quite common, although runs longer than eight have not yet been
found. The longest runs seem to occur with the amide and acid residues, as noted above.
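A run search of this kind might be sketched as follows. This is a minimal illustration, assuming the sequences are available as one-letter strings; it is not the program used for the database searches reported here, and `longest_runs` and `merge_longest` are hypothetical names.

```python
from itertools import groupby


def longest_runs(seq):
    """Map each residue to the length of its longest run of consecutive copies."""
    best = {}
    for residue, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length > best.get(residue, 0):
            best[residue] = length
    return best


def merge_longest(sequences):
    """Longest run of each residue over a collection of sequences."""
    overall = {}
    for seq in sequences:
        for residue, length in longest_runs(seq).items():
            overall[residue] = max(overall.get(residue, 0), length)
    return overall


# Toy sequences (hypothetical), standing in for database entries.
print(merge_longest(["MAAAGDDDDA", "WWWQQQQL"]))
```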
Runs of glutamine residues are particularly common. They occur in many seed storage proteins
(see below), and as the major component of the ‘opa’ boxes in developmentally important
proteins including some that contain the homoeo box structural element [10]. Runs of other
amino acids also occur in homoeotic proteins, often bracketing the ‘homoeo box’, and there has
been speculation that these runs modulate translation at various stages of differentiation.
These simple repetitive sequences occur in several different types of protein. Most striking
are the surface antigen proteins of Plasmodium [11], and other protozoa, with runs of
histidine, serine, aspartic acid, asparagine and phenylalanine already recognized. One
explanation for their existence is that the structures have evolved to hinder immunological
disposal of parasitic protozoa, but the occurrence of similar repeated sequences in free-living
relatives makes such an explanation less attractive. Storage proteins in seeds are often made
up from imperfect multiple repeats of short elements rich in a single amino acid [12]. This
amino acid is often glutamine, and runs of up to about 20 of this amino acid occur in wheat
gliadins. Other repeat sequences appear to be part of structurally significant domains. Thus
the longest detected valine sequence (seven) occurs in a human pulmonary-surfactant-associated
protein, in a run of 16 valine/leucine/isoleucine residues, in a protein with a recognizable
function in helping to reduce surface tension [13].
In several proteins runs and unusual sequences occur close to the C-termini. Thus long
runs of acidic amino acids occur, followed by a stop codon, at the C-terminus of Drosophila
troponin-T [14], of the laminin-binding protein mentioned above, of non-histone chromosomal
proteins [15], and of a mouse homoeotic protein [16].
The well-known (‘V8’) protease from Staphylococcus aureus has an acidic C-terminal
domain that contains a run of 12 imperfect -Pro-Asp-Asn- repeats. The domain has been
characterized at both the protein (R. P. Ambler, unpublished work) and the DNA levels [17].
That this region contained an anomalous structure was first suspected when it was found that
there was not a single -Asp-Pro- sequence present. This peptide bond is the most acid labile of
all the 400 possibilities derived from the 20 protein amino acids.
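The two observations made about this domain, the absence of a particular dipeptide and the presence of imperfect tandem repeats, can both be checked mechanically. The sketch below uses a made-up toy sequence, not the staphylococcal protease sequence; `count_overlapping` and `imperfect_repeat_units` are hypothetical names, and the tolerance of one mismatch per unit is an arbitrary assumption about what counts as an imperfect repeat.

```python
def count_overlapping(seq, peptide):
    """Number of (possibly overlapping) occurrences of `peptide` in `seq`."""
    seq, peptide = seq.upper(), peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    last_start = len(seq) - len(peptide)
    return sum(1 for i in range(last_start + 1) if seq.startswith(peptide, i))


def imperfect_repeat_units(seq, unit, max_mismatches=1):
    """Count frame-aligned units, read from the start of `seq`, that differ
    from `unit` at no more than `max_mismatches` positions."""
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    count = 0
    for i in range(0, len(seq) - k + 1, k):
        mismatches = sum(1 for a, b in zip(seq[i:i + k], unit) if a != b)
        if mismatches <= max_mismatches:
            count += 1
    return count


# Toy domain (hypothetical): Pro-Asp-Asn units, one of them imperfect.
toy = "PDNPDSPDNPDN"
print(count_overlapping(toy, "DP"), imperfect_repeat_units(toy, "PDN"))
```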
Occurrence of runs of amino acids
In the next section some of the long runs for each of the ‘20’ protein amino acids are shown.
The amino acids are taken in the alphabetical order of the one-letter code. The runs were found
by a search of databases in the summer of 1990.
Alanine
Ala,, near the N-terminus of the Drosophila homoeo box protein engrailed [18].
Some fish serum antifreeze peptides contain more than 70% alanine, but the longest runs are
Ala,, which occur several times [19].
Cysteine
The databases record Cys7 in a sheep low-sulphur
keratin fraction. However, the reference shows this
entry to be a peptide composition, not a sequence
[2]. A sequence Cys, is reported to occur in the
protein endozepine [21].
Aspartic acid
Asp33 as the C-terminal domain in chicken laminin-binding protein [7].
Asp,, in a hypothetical cowpox virus protein [22]; every codon in this run is GAT.
Asp,, in a Plasmodium aspartic-acid-rich protein, which also contains a long phenylalanine run
[23].
Asp,, very close to the C-terminus of what is now recognized to be a yeast
ubiquitin-conjugating enzyme (EC 6.3.2.19; [24, 25]): 20 of the 23 terminal residues are acids,
and there are two -Asp-Asp-Met- tripeptides.
Glutamic acid
Glu,, as the C-terminal domain of Drosophila troponin-T [14]; all but five of the final 55
residues are Asp or Glu.
Glu3, in a Plasmodium glutamic-acid-rich protein [26]. This sequence is in a region of 97
residues of which all but four are Asp or Glu, but this is followed by a short very basic
region before the stop codon.
Runs of glutamic acid occur at the C-terminus of several vertebrate developmental proteins
(e.g. ref. [16]), but do not seem to occur in such proteins from Drosophila.
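Statements of the form "all but five of the final 55 residues are Asp or Glu" amount to counting acidic residues in a C-terminal window. A minimal sketch follows, assuming a one-letter sequence; `acidic_tail` is a hypothetical name and the toy sequence is invented for illustration.

```python
def acidic_tail(seq, window):
    """Return (acidic residues, residues examined) for the last `window` residues."""
    if window <= 0:
        raise ValueError("window must be positive")
    tail = seq.upper()[-window:]
    acidic = sum(1 for aa in tail if aa in "DE")  # D = Asp, E = Glu
    return acidic, len(tail)


# Toy sequence (hypothetical) ending in a mixed acidic stretch.
acidic, examined = acidic_tail("MKTAYDEEDDE", 7)
print(f"all but {examined - acidic} of the final {examined} residues are Asp or Glu")
```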
Phenylalanine
Phe,, in the signal peptide of a Plasmodium aspartic-acid-rich protein [23]; the sequence is
MYLFIYIFFFFFFFFFFVIVqkdie..., with the DNA sequence for the phenylalanines all thymine. Only
one other run as long as Phe, was found in the databases.
Glycine
Gly24 in the human androgen receptor [27]; in the rat there is only Gly,. The protein also
contains glutamine runs (see below).
Gly20 near the N-terminus of human calpain, a calcium-dependent protease [28a]. This region
also contains a separate Gly,, run. The runs are present in the rabbit and the pig enzymes, but
the lengths are different.
Gly,, in the human homoeo box protein HOX-2G [28].
In the glycine-rich structural proteins of the
plant cell wall [9], the longest consecutive run is
Gly,, but there is a long sequence (280 residues)
entirely of the motif -Gly-Xaa-Gly-Xaa-, Xaa also
often being Gly.
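A stretch built entirely from the -Gly-Xaa- motif can be measured with a regular expression. The sketch below is illustrative only; `longest_gly_xaa_stretch` is a hypothetical name, Xaa is taken to be any residue (including Gly), and a lookahead is used so that every possible starting position is tried.

```python
import re

# Lookahead so that stretches starting at every position are considered;
# group 1 holds the longest Gly-Xaa stretch beginning at that position.
GLY_XAA = re.compile(r"(?=((?:G[A-Z])+))")


def longest_gly_xaa_stretch(seq):
    """Length in residues of the longest stretch made only of Gly-Xaa pairs."""
    best = 0
    for match in GLY_XAA.finditer(seq.upper()):
        best = max(best, len(match.group(1)))
    return best


# Toy sequence (hypothetical): Gly-Xaa pairs flanked by other residues.
print(longest_gly_xaa_stretch("AAGSGGGAPP"))
```

Note that a stretch found this way need not contain any long run of consecutive glycines, which is the distinction drawn in the text.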
Histidine
His,, in the mouse homoeotic protein ERA-1 [29].
His, occurs several times in the Plasmodium histidine-rich glycoprotein [30]. This protein
contains 74% histidine, and despite containing 304 residues (43 kDa) completely lacks the eight
amino acids cysteine, isoleucine, lysine, methionine, asparagine, glutamine, arginine and
serine.
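Listing the amino acids that a sequence lacks is a one-line set operation. A minimal sketch, with `absent_residues` as a hypothetical name and a toy sequence rather than the glycoprotein itself:

```python
# The 20 protein amino acids in one-letter code, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def absent_residues(seq):
    """Return the standard amino acids (one-letter code) missing from `seq`."""
    present = set(seq.upper())
    return [aa for aa in AMINO_ACIDS if aa not in present]


# Toy sequence (hypothetical) containing only 12 of the 20 amino acids.
print(absent_residues("HHAHHDEFGLPTVWYHH"))
```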
Isoleucine
Ile, in the N-terminal signal sequence of the mitochondrial NADH-ubiquinone oxidoreductase of
Leishmania tarentolae [31]. In a run of 19 residues, all but 3 are Ile, Leu or Val. All the
Leishmania mitochondrial proteins seem to be isoleucine rich.
Lysine
Lyss in a human translational initiation factor [32]; the same protein also contains two other
Lys6 sequences. The authors suggest these regions interact directly with nucleic acid.
Lys, in the yeast potassium transporter protein TRK1 [33]. This sequence is immediately
preceded by (Asn),-Arg, and elsewhere there is Asp,,. These are the two most notably highly
charged elements in a sequence containing several potential membrane-spanning domains.
Lysine repeats appear to be infrequent compared with those of either histidine or arginine.
Leucine
Leu,, in a kinase-related transforming protein from feline sarcoma virus [34]; the sequence is
part of a longer hydrophobic region. This protein also contains Ser,, (see below).
Runs of six to eight leucines occur in a wide
variety of proteins, but longer ones are rare.
Methionine
Met7 in an α-amidating protein (EC 3.5.1.-) from Xenopus laevis [35], although there is only
Met5 in a closely related protein from the same organism.
The hinge-ligament proteins of some bivalve molluscs are rich in methionine (up to 25% [36]).
Asparagine
Asn20 in a gene from Dictyostelium cloned using a genomic repetitive sequence as a probe [37].
The same probe also identified putative proteins containing Thr and Gln repeats.
Asn,, in a Plasmodium asparagine-rich protein [38].
Asn,, in the Drosophila homoeotic protein deformed [39, 40], and sequences nearly as long in
other homoeotic proteins.
A yeast transcription factor contains a
sequence rich in asparagine and threonine, which is
discussed below.
Proline
Pro26 (followed by -Ser-Pro,,) in the Epstein-Barr virus nuclear protein BYRF1 [41].
Pro,, in the proline-rich (42/127 residues) C-terminal domain of the protease acrosin, which
may function to bind fucose [42]. The rest of this protein is homologous to other serine
proteases.
Pro,, in the mouse homoeotic protein HOX2.6 [43].
Pro8 in human androgen receptor [27]. This is in a region of the protein where the majority of
residues are well conserved between human and rat, but in the rat sequence two of the prolines
are changed.
Hydroxyproline makes up 45% of the plant cell-wall protein extensin. Lamport [44] showed the
presence of -Ser-Hyp-Hyp-Hyp-Hyp- elements, and his findings were extended by Chen & Varner
[45], who showed that 25 of these sequences occur in the 280 residues of this highly
glycosylated 86 kDa protein.
Glutamine
Gln,, very near the N-terminus of the mouse protein mopa [46], which is differentially
expressed in tissues and stages of development, and which also contains another Gln27 in the
first 100 residues. Polyglutamine sequences occur in many Drosophila homoeotic proteins, and
are the element recognized in the opa repetitive sequences [47].
Gln,, in the yeast glucose repression mediator protein CYC8 [48, 49]. This protein also
contains a Gln,, run near the N-terminus, while in the middle of the molecule there is a
stretch that is almost perfect (Gln-Ala),, followed by the Gln,,.
Gln,, in rat androgen receptor, although the block is Gln,, in the human protein and 100
residues earlier in the sequence [27]. In a rat glucocorticoid receptor there is Gln,, 75
residues from the N-terminus of a 790-residue protein [50], but it is thought that this is not
important to the functioning of the protein, as the corresponding human protein lacks this
element, while in the mouse there is only Gln,.
Multiple glutamines are common in seed
storage proteins, including Gln,, in wheat gliadin.
Arginine
Arg,, towards the C-terminus of the Drosophila
homoeotic protein caudal [51]. The N-terminus
contains several His and Asn repeats, instead of the
Gln repeats which are more common in Drosophila
homoeotic proteins.
Many protamines and sperm histones contain
runs of up to seven arginines, and runs occur in
several viral proteins.
Serine
Ser,, (and Ser25-Pro-Ser7) in Xenopus vitellogenin [52]. This very serine-rich region is
processed to form phosvitin or the phosvettes.
Ser,, in a Plasmodium serine-repeat protein
(Bzik, D. J., unpublished work, but listed in the databases).
Ser,, in the feline sarcoma virus protein that
also contains the long Leu repeat (see previously).
Threonine
Thr,, in Drosophila simulans salivary glue protein
[53]. The protein is evolving rapidly in Drosophila,
so the Thr,, run is not conserved between species,
but all have an ‘A + C‘ region rich in threonine.
Thr, in yeast transcription factor ADR6 [54],
in a 30-residue sequence containing only threonine
and asparagine. This protein also contains
glutamine repeats.
Valine
Val7 in a bovine pulmonary surfactant-associated protein [13], in a very hydrophobic region
that is functionally necessary (...KRLLIVVVWVVLVVVWIGAM...). This region is always
hydrophobic, but differs in detail between species. Valine runs are not common.
Tryptophan
No runs of more than Trp3 have yet been identified.
Tyrosine
Tyr6 in a human immunoglobulin heavy-chain precursor [55].
Classes of proteins that contain
unusual sequences
In printouts of runs of amino acids, some types of
protein occur again and again. These include viral
proteins, plant and seed structural proteins, surface
proteins from protozoan parasites, and the protein
products of homoeotic genes, although the level of
viral protein occurrence may primarily reflect their
abundance in the databases. Simple repetitions of a
single amino acid do not seem to be common in
bacterial proteins.
There is also a set of miscellaneous proteins in which there is a run of acidic amino acids at
the extreme C-terminus, often ended by the stop codon. These runs are sometimes all aspartic
acid or all glutamic acid, but in a rat non-histone chromosomal protein there is a run of 30
mixed acidic residues, preceded by the sequence -Lys-Lys-Lys-Lys- [15]. The authors speculate
about a function in nucleosome assembly or DNA replication.
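Mixed runs of this kind are missed by a search for consecutive identical residues, but they can be found by treating a class of residues as equivalent. A minimal sketch follows, assuming a one-letter sequence; `longest_class_run` is a hypothetical name, and the default class "DE" (Asp and Glu) is chosen to match the acidic runs described here.

```python
def longest_class_run(seq, residue_class="DE"):
    """Length of the longest run made up only of residues in `residue_class`."""
    best = current = 0
    for aa in seq.upper():
        current = current + 1 if aa in residue_class else 0
        best = max(best, current)
    return best


# Toy C-terminus (hypothetical): four lysines followed by mixed acidic residues.
print(longest_class_run("AGKKKKDEEDDE"))
```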
It has been shown that the plant seed [12] and cell-wall [56] proteins are made up from
repetitions of peptide elements. The elements are likely to be ‘pipe-bends’ with secondary
structures that enable the proteins to pack well and form the requisite overall structures.
Some divergence is taking place, so elements are not necessarily identical. The seed protein
elements are very rich in glutamine, and the long runs form when other residues are mutated to
glutamine. Proteins made up from repeated structural elements also occur in animals, for
example collagen, with the long-known -Gly-Xaa-Yaa- motif, where Xaa is often proline and Yaa
hydroxyproline. Similarly, in bacteria there are simple elements, such as -Asp-Asn-Pro- in
staphylococcal protease, as well as much more complex ones like those in the ice-nucleation
proteins [6].
Possible explanations for the occurrence of repeating elements in the surface proteins of
protozoal parasites are discussed by Ridley [11].
Many homoeotic genes from Drosophila and mammals have now been cloned and sequenced.
They contain a recognizable domain, the homoeo box [10], about 60 residues long, which
contains a helix-turn-helix DNA-binding site. In the domains on either side of this central
unit there are often long runs of a single amino acid. While glutamine runs are common, making
up the widely distributed opa box, 10 of the 20 protein amino acids are already known to occur
in runs in developmental proteins.
It is unlikely that runs will always have a precise structural purpose. Evidence for this assertion
comes from inter-species comparisons, which show
that run lengths are highly variable in proteins.
Examples known are from proteins as diverse as
the protease calpain [28a] and an androgen receptor
[27]. In the latter protein, if the run regions are
excluded, the rat and the human protein have more
than 90% identity, but the long glutamine runs are
at positions 100 residues apart in the sequence, and
there are indications that individual humans have
different length glutamine runs.
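An inter-species comparison of this sort reduces to listing the position and length of each run in two related sequences. The sketch below uses invented toy sequences, not the receptor sequences themselves; `runs_of` is a hypothetical name and the minimum run length is an arbitrary threshold.

```python
from itertools import groupby


def runs_of(seq, residue, min_length=2):
    """List (start, length) for each run of `residue` at least `min_length`
    residues long; `start` is a 0-based position."""
    runs, pos = [], 0
    for aa, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if aa == residue and length >= min_length:
            runs.append((pos, length))
        pos += length
    return runs


# Toy sequences (hypothetical) standing in for the same protein from two species.
species_a = "MEVQQQQQQLGRVYPRPPSK"
species_b = "MEVQQLGRVYPRQQQQQQQQPPSK"
print(runs_of(species_a, "Q", 4))
print(runs_of(species_b, "Q", 4))
```

Differences in the reported start positions and lengths correspond to the variation in run position and run length described in the text.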
In most cases the runs are more pronounced at the amino acid rather than the nucleotide level,
indicating that they have existed long enough that third-base mutation has resulted in a
mixture of codons being used. However, a run of phenylalanines in a Plasmodium protein is all
coded by TTT [23], and a run of Asp,, in a hypothetical cowpox virus protein is all coded by
GAT [22].
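Whether a run is also uniform at the nucleotide level can be checked by counting the codons that encode it. The following is a minimal sketch, assuming the in-frame DNA for the run has already been extracted; `codon_counts` is a hypothetical name and the DNA strings are invented examples. In the standard genetic code phenylalanine is encoded by TTT or TTC and aspartic acid by GAT or GAC.

```python
from collections import Counter


def codon_counts(dna):
    """Count the codons in an in-frame coding sequence."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(dna[i:i + 3] for i in range(0, len(dna), 3))


# Toy runs (hypothetical): one uniform at the DNA level, one with mixed codons.
uniform_phe = "TTT" * 6
mixed_asp = "GATGACGATGATGACGAT"
print(len(codon_counts(uniform_phe)), len(codon_counts(mixed_asp)))
```

A single distinct codon indicates a run that is as pronounced in the DNA as in the protein; several distinct codons are what would be expected after third-base mutation.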
The occurrence of long runs of amino acids
and complex repetitive elements in proteins is a
demonstration of the versatility of this class of molecule. I hope that we shall soon have a better understanding of their origin, their function, and the
structures that they take up.
I would like to thank Andrew Lyall and Sarah McQuay of
the Biocomputing Research Unit, University of Edinburgh, for running the database searches.
1. Brenner, S. (1957) Proc. Natl. Acad. Sci. U.S.A. 43, 687-694
2. Dayhoff, M. O. (ed.) (1972) Atlas of Protein Sequence and Structure, Vol. 5, National Biomedical Research Foundation, Washington D.C.
3. Ambler, R. P., Bartsch, R. G., Daniel, M., Kamen, M. D., McLellan, L., Meyer, T. E. & Van Beeumen, J. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 6854-6857
4. Weber, P. C., Bartsch, R. G., Cusanovich, M. A., Hamlin, R. C., Howard, A., Jordan, S. A., Kamen, M. D., Meyer, T. E., Weatherford, D. W., Xuong, N. H. & Salemme, F. R. (1980) Nature (London) 286, 302-304
5. Sanger, F. & Tuppy, H. (1951) Biochem. J. 49, 463-481, 481-490
6. Warren, G. & Corotto, L. (1989) Gene 85, 239-242
7. Clegg, D. O., Helder, J. C., Hann, B. C., Hall, D. E. & Reichardt, D. F. (1988) J. Cell Biol. 107, 699-705
8. Oldberg, A., Franzén, A. & Heinegard, D. (1986) Proc. Natl. Acad. Sci. U.S.A. 83, 8819-8823
9. Condit, C. M. & Meagher, R. B. (1986) Nature (London) 323, 178-181
10. Gehring, W. J. (1987) Science 236, 1245-1252
11. Ridley, R. G. (1991) Biochem. Soc. Trans. 19, 525-528
12. Shewry, P. R., Hull, G. & Tatham, A. S. (1991) Biochem. Soc. Trans. 19, 528-530
13. Glasser, S. W., Korfhagen, T. R., Perme, C. M., Pilot-Matias, T. J., Kister, S. E. & Whitsett, J. A. (1988) J. Biol. Chem. 263, 10326-10331
14. Bullard, B., Leonard, K., Larkins, A., Butcher, G., Karlik, C. & Fyrberg, E. (1988) J. Mol. Biol. 204, 621-637
15. Paonessa, G., Frank, R. & Cortese, R. (1987) Nucleic Acids Res. 15, 9077
16. Kessel, M., Schulze, F., Fibi, M. & Gruss, P. (1987) Proc. Natl. Acad. Sci. U.S.A. 84, 5306-5310
17. Carmona, C. & Gray, G. L. (1987) Nucleic Acids Res. 15, 6757
18. Kassis, J. A., Poole, S. J., Wright, D. K. & O'Farrell, P. (1986) EMBO J. 5, 3583-3589
19. Davies, P. L., Roach, A. H. & Hew, C.-L. (1982) Proc. Natl. Acad. Sci. U.S.A. 79, 335-339
20. reference deleted
21. Webb, N. R., Rose, T. M., Malik, N., Marquardt, H., Shoyab, M., Todaro, G. J. & Lee, D. C. (1987) DNA 6, 71-80
22. Patel, D. D. & Pickup, D. J. (1987) EMBO J. 6, 3787-3794
23. Lenstra, R., D'Auriol, L., Andrieu, B., Le Bras, J. & Galibert, F. (1987) Biochem. Biophys. Res. Commun. 146, 368-377
24. Reynolds, P., Weber, S. & Prakash, L. (1985) Proc. Natl. Acad. Sci. U.S.A. 82, 168-172
25. Jentsch, S., McGrath, J. P. & Varshavsky, A. (1987) Nature (London) 329, 131-134
26. Triglia, T., Stahl, H.-D., Crewther, P. E., Silva, A., Anders, R. F. & Kemp, D. J. (1988) Mol. Biochem. Parasitol. 31, 199-202
27. Lubahn, D. B., Joseph, D. R., Sar, M., Tan, J., Higgs, H. N., Larson, R. E., French, F. S. & Wilson, E. M. (1988) Mol. Endocrinol. 2, 1265-1275
28. Acampora, D., D'Esposito, M., Faiella, A., Pannese, M., Miogliaccio, E., Morelli, F., Stornaiulo, A., Nigro, V., Simeone, A. & Bonicelli, E. (1989) Nucleic Acids Res. 17, 10385-10402
28a. Miyake, S., Emori, Y. & Suzuki, K. (1986) Nucleic Acids Res. 14, 8805-8817
29. Larosa, G. J. & Gudas, L. (1988) Mol. Cell. Biol. 8, 3906-3917
30. Ravetch, J. V., Feder, R., Pavlovec, A. & Blobel, G. (1984) Nature (London) 312, 616-620
31. Simpson, L., Neckelmann, N., De La Cruz, V. F., Simpson, A. M., Feagin, J. E., Jasmer, D. P. & Stuart, K. J. (1987) J. Biol. Chem. 262, 6182-6196
32. Pathak, V. K., Nielsen, P. J., Trachsel, H. & Hershey, J. W. B. (1988) Cell (Cambridge, Mass.) 54, 633-639
33. Gaber, R. F., Styles, C. A. & Fink, G. R. (1987) Mol. Cell. Biol. 8, 2848-2859
34. Woolford, J., McAuliffe, A. & Rohrschneider, L. R. (1988) Cell (Cambridge, Mass.) 55, 965-977
35. Ohsuye, K., Kitano, K., Wada, Y., Fuchimura, K., Tanaka, S., Mizuno, K. & Matsuo, H. (1988) Biochem. Biophys. Res. Commun. 150, 1275-1281
36. Kikuchi, Y. & Tamiya, N. (1987) Biochem. J. 242, 505-510
37. Shaw, D. R., Richter, H., Giorda, R., Ohmachi, T. & Ennis, H. (1989) Mol. Gen. Genet. 218, 453-459
38. Stahl, H.-D., Bianco, A. E., Crewther, P. E., Burkot, T., Coppel, R. L., Brown, G. V., Anders, R. F. & Kemp, D. J. (1986) Nucleic Acids Res. 14, 3089-3102
39. Laughon, A., Carroll, S. B., Storfer, F. A., Riley, P. D. & Scott, M. P. (1985) Cold Spring Harbor Symp. Quant. Biol. 50, 253-262
40. Regulski, M., McGinnis, N., Chadwick, R. & McGinnis, W. (1987) EMBO J. 6, 767-777
41. Baer, R., Bankier, A. T., Biggin, M. D., Deininger, P. L., Farrell, P. J., Gibson, T. J., Hatfull, G., Hudson, G. S., Satchwell, S. C., Seguin, C., Tuffnell, P. S. & Barrell, B. (1984) Nature (London) 310, 207-211
42. Adham, I. M., Maier, W.-M., Hoyer-Fender, S., Tsauosidou, S., Engel, W. & Klemm, U. (1989) Eur. J. Biochem. 182, 562-568
43. Graham, A., Papalopulu, N., Lorimer, J., McVey, J. H., Tuddenham, E. G. D. & Krumlauf, R. (1988) Genes Dev. 2, 1424-1438
44. Lamport, D. T. A. (1977) in Recent Advances in Phytochemistry (Loewus, F. A. & Runeckles, B. C., eds.) vol. 11, pp. 79-115, Plenum, New York
45. Chen, J. & Varner, J. E. (1985) EMBO J. 4, 2145-2151
46. Duboule, D. J., Haenlin, M., Galliot, B. & Mohier, E. (1987) Mol. Cell. Biol. 7, 2003-2006
47. Wharton, K. A., Yedvobnick, B., Finnerty, V. G. & Artavanis-Tsakonas, S. (1985) Cell (Cambridge, Mass.) 40, 55-62
48. Schultz, J. & Carlson, M. (1987) Mol. Cell. Biol. 7, 3637-3645
49. Trumbly, R. J. (1988) Gene 73, 97-111
50. Miesfeld, R., Rusconi, S., Godowski, P. J., Maler, B. A., Okret, S., Wikstrom, A.-C., Gustafsson, J.-A. & Yamamoto, K. R. (1986) Cell (Cambridge, Mass.) 46, 389-399
51. Mlodzik, M. & Gehring, W. J. (1987) Cell (Cambridge, Mass.) 48, 465-478
52. Gerber-Huber, S., Nardelli, D., Haefliger, J.-A., Cooper, D. N., Givel, F., Germond, J.-E., Engel, J., Green, N. M. & Wahli, W. (1987) Nucleic Acids Res. 15, 4737-4760
53. Martin, C. H., Mayeda, C. A. & Meyerowitz, E. M. (1988) J. Mol. Biol. 201, 273-287
54. O'Hara, P. J., Horowitz, H., Eichinger, H. & Young, E. T. (1988) Nucleic Acids Res. 16, 10153-10170
55. Kipps, T. J., Tomhave, E., Pratt, L. F., Duffy, S., Chen, P. P. & Carson, D. A. (1989) Proc. Natl. Acad. Sci. U.S.A. 86, 5913-5917
56. Showalter, A. M. & Rumeau, D. (1989) in Organisation and Assembly of Animal and Plant Extracellular Matrix (Adair, W. S. & Mecham, R. P., eds.), Academic Press
Received 17 December, 1990