Clinical Science (1993) 84, 119-128 (Printed in Great Britain)
Glaxo/MRS Young Investigator Prize
Molecular studies on an ancient gene encoding for
carbamoyl-phosphate synthetase
J. P. SCHOFIELD
MRC Molecular Genetics Unit, Hills Road, Cambridge, U.K.
1. Carbamoyl-phosphate synthetase (EC 6.3.5.5.)
catalyses the synthesis of carbamoyl phosphate, the
immediate precursor of arginine and pyrimidine biosynthesis, and is highly conserved throughout evolution. The large subunit of all carbamoyl-phosphate
synthetases sequenced to date comprises two highly
homologous halves, the product of a proposed ancestral gene duplication. The sequences of the enzymes
of Escherichia coli, Drosophila melanogaster, Saccharomyces cerevisiae, rat and Syrian hamster all
have duplications, suggesting that this event occurred
in the progenote predating the separation of the
major phyla. Until now, only limited data on
carbamoyl-phosphate synthetase were available for
the primitive eukaryote Dictyostelium discoideum
and for the Archaea Methanosarcina barkeri MS.
The DNA sequence of the D. discoideum carbamoyl-phosphate synthetase gene and additional sequence for the
carbamoyl-phosphate synthetase gene of M. barkeri
MS have been determined, and a duplicated structure
for both is clearly demonstrated.
2. Genes with ancient duplications provide unique
information on their evolution. A study of the
intron/exon organization of the rat carbamoyl-phosphate synthetase I gene and the hamster carbamoyl-phosphate synthetase II gene in the CAD
multi-gene complex shows that at least some of their
introns are very old. Evidence is provided that some
introns must have been present in the ancestral
precursor before its duplication.
3. The human carbamoyl-phosphate synthetase I gene
has been isolated and characterized. A human liver
cDNA library was constructed and probed for
carbamoyl-phosphate synthetase I. A human genomic
DNA cosmid library was also probed for the
carbamoyl-phosphate synthetase I gene. The cDNA
sequence of the human carbamoyl-phosphate synthetase I gene has been determined, and work has
been initiated to confirm that at least part of this
gene is contained within two cosmids spanning 46 kb.
This will enable future studies to be made on mutations in this gene in the rare autosomal recessive
deficiency of carbamoyl-phosphate synthetase I.
INTRODUCTION
Traditional evolutionary studies have relied on
the application of paleontology and comparative anatomy. From these studies a systematic
view of evolution has been developed, with a central
scheme that highly complex multicellular organisms
evolved from simple unicellular ones over 600 million years ago. Nucleic acid sequence information
has allowed molecular evolutionists to frame
questions concerning the origin of cells and the
structure of genes within the earliest forms. Molecular data accumulated over the past decade suggest
that the common ancestor of all life arose between
3500 and 4000 million years ago [1]. It is now
generally accepted that investigations based on the
genotype rather than the phenotype are most
revealing.
The first gene product to be studied extensively,
and which established the concept of a molecular
chronometer, was cytochrome c [2]. Although its
analysis extended the understanding of the eukaryotic branch of the tree, it has been evolving too fast
to be used at the earlier phylogenetic levels.
Furthermore, this molecule is not found in many
bacteria, and is not functionally constant.
The quantitative molecular analysis of 16S ribosomal RNAs [3] from several hundreds of organisms has led to the conclusion that there are three
separate and distinct cell lineages from which all
modern cells are derived: eubacteria, archaebacteria
and eukaryotes. This classification supersedes the
more traditional division into prokaryotes and
eukaryotes, as at the molecular level archaebacteria
are no more closely related to prokaryotes than
they are to eukaryotes. The subdivisions have
Key words: DNA sequencing, gene structure, introns, molecular evolution, polymerase chain reaction.
Abbreviations: CPSase, carbamoyl-phosphate synthetase; PCR, polymerase chain reaction.
Correspondence: Dr J. P. Schofield, Clinic 12, Addenbrooke's Hospital, Cambridge CB2 2QQ, U.K.
recently been named Bacteria, Archaea and Eukarya
[4]. No one of these lineages predates the other
two, and all three were derived from a common
ancestor, the progenote [4]. Whether the progenote
was itself a true organism, or represented a prebiotic
state of a primitive genetic order, is unresolved.
Eukaryotic genes, as well as a small number of
prokaryotic and organellar genes, have long intervening unexpressed sequences (introns) dividing the
coding sequence into pieces (exons). The existence of
introns in contemporary genomes has led to several
mechanistic and historical questions. The debate on
the function and origin of introns continues, and
this study deals solely with the question of how old
introns are, i.e. whether they were present from the beginning or
inserted later during evolution. Since the discovery
of introns 15 years ago [5, 6] many of the original
concepts of the structure of genes have had to be
completely revised. The 'introns-early' school proposes that introns were present in the progenote,
and the trend since then has been towards loss [7,
8]. Intron loss [9] was explained as a consequence of
the ability of certain introns to self-splice as a
remnant of the proposed original 'RNA world' [10].
All of the genes quoted as examples of exon shuffling are relatively modern in evolutionary terms. In
order to examine the origin of introns, the structure
of ancient genes encoding proteins fundamental to
enzymic pathways in existence predating prokaryotes and eukaryotes (i.e. the progenote) must be
studied [S]. A unique insight into the origin of
introns would be obtained by studying a gene with
an ancient tandem duplication (ancient referring to
a gene with representatives from each of the three
major domains). This was the reason for choosing the gene
encoding for carbamoyl-phosphate synthetase
(CPSase), an ancient gene with a tandem duplication and representatives from each of the three
recently defined domains of Bacteria, Archaea and
Eukarya.
Any candidate gene product proposed as a molecular model from which to study the structural
evolution of large vertebrate genes should have
representatives from each of the three major
lineages. Studies on the phylogenetic relationships
using 16S ribosomal RNAs are limited when one
begins to pose questions on the evolution of the
structure of large eukaryotic genes. The gene product must not share the limitations of cytochrome c.
Rather the gene should closely resemble the 16S
rRNA parameters if it is to prove useful as a
molecular chronometer (see above).
The gene encoding for CPSase [carbamoyl-phosphate synthetase (glutamine-hydrolysing), EC
6.3.5.5] fulfils these criteria, being large, universally
distributed, of constant function and highly conserved over great phylogenetic distance. Sequence
data for the CPSase gene are published for Escherichia coli [11], Saccharomyces cerevisiae [12], Drosophila melanogaster [13], Syrian hamster [14] and
rat [15]. CPSase catalyses the formation of the
[Fig. 1 reaction scheme: glutamine, CO2, 2ATP and H2O, acted on by the GATase and CPSase subunits, yield carbamoyl phosphate.]
Fig. 1. Synthesis of carbamoyl phosphate. The small (42 kDa) subunit,
glutamine aminotransferase (GATase), catalyses the hydrolysis of glutamine.
Cysteine (C) and histidine (H) are key active-site amino acids. The free
amino group is released to the large (120 kDa) subunit. Two ATP molecules
are required at nucleotide-binding sites (NBDs). The symmetry of structure
and function between the two halves of the large subunit has suggested that
each half acts in a separate but coordinated mechanism to catalyse the two
partial reactions of the phosphorylation of bicarbonate to carboxy phosphate, and the phosphorylation of carbamate to carbamoyl phosphate.
highly reactive compound carbamoyl phosphate, the
immediate precursor for arginine and pyrimidine
biosynthesis (Fig. 1). The enzyme is a dimer composed of a small (42 kDa) and a large (120 kDa)
subunit. The small subunit catalyses the hydrolysis
of glutamine (requiring active-site cysteine and histidine) releasing the free amino group to the large
subunit [16]. The large subunit catalyses the formation of carbamoyl phosphate in a complex reaction
between ammonia, carbon dioxide and water. In
E. coli, the small and large subunits are encoded by
the carA [17] and carB [18] genes, respectively. The
amino acid translation reveals a high degree of
homology (39%) between the N-terminal and C-terminal halves of the carB gene, suggesting that it
has arisen from an ancient duplication of a smaller
ancestral gene [11]. That the duplication occurred
in the progenote would be strongly supported by
confirmation of duplication in the CPSase gene of
representatives from the two other major lineages,
Archaea and Eukarya.
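A figure such as the 39% quoted for the two halves of carB is typically obtained by aligning the halves and counting matching positions. The text does not state exactly how the value was calculated, so the following is only a minimal sketch, assuming a pre-computed pairwise alignment and counting identity over ungapped columns. The function name and the toy sequences are hypothetical and are not taken from carB.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over the ungapped columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns where either half has a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment (hypothetical residues, for illustration only).
n_half = "AC-EFGH"
c_half = "ACDEYGH"
identity = percent_identity(n_half, c_half)
print(f"{identity:.1f}% identity over ungapped columns")
```

Other conventions divide by the full alignment length or by the shorter sequence, so values from different studies are comparable only when the denominator is the same.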
The synthesis of carbamoyl phosphate in higher
eukaryotes is catalysed by two separate enzymes:
CPSase I for arginine biosynthesis, and CPSase II
for pyrimidine biosynthesis. These two enzymes are
encoded by separate nuclear genes, and CPSase I is
transported into the mitochondria, whereas
CPSase II is active in the cytoplasm. By studying
the molecular structure of the nuclear-encoded
CPSase I gene, whose product is directed into the
mitochondrion, insight into the structure of the gene
as it existed when captured in the endosymbiont
[19] may be gained. Then, by comparing the
duplicated structure with that of the cytoplasmic
CPSase II relative, the molecular structure of the
common ancestor before gene duplication may also
be inferred. For example, it should be possible to
establish whether the CPSase II gene within the
CAD gene (a multi-gene complex also encoding
Aspartate carbamoyltransferase and Dihydro-orotase activity for pyrimidine biosynthesis de novo)
Evolution of the carbamoyl-phosphatase synthetase gene
[14] was simply a copy from the CPSase I gene
after integration of the latter into the host nuclear
genome. This feature is important when considering
the evolution of the two enzymes from a common
ancestor.
Following the accumulation of data on the
detailed structure of the CPSase I gene, it is a
logical progression to investigate clinical conditions
involving molecular defects of the gene. Gelehrter
and Snodgrass [20] published the first clear case
report of lethal neonatal hyperammonaemia secondary to an almost complete deficiency of CPSase I.
An autosomal recessive mode of inheritance was
assigned after a family study [21]. A severely hyperammonaemic infant who failed to respond to therapeutic intervention was reported [22]. The authors
postulated that the patient either failed to transcribe
CPSase mRNA or that the mRNA was transcribed
but not translated. They concluded that a distinction between the two possibilities awaited the isolation of a human CPSase I cDNA probe to use in
RNA hybridization studies. Fundamental therefore
to the elucidation of the molecular defect(s) underlying CPSase I deficiency is the DNA sequence of
the normal human CPSase I gene. This would be
predicted to be a large project, as it would be
inferred from evolutionary studies that the sequence
of the human mRNA would closely resemble that of
rat CPSase I, at around 5.7 kb [15].
METHODS
Genomic DNA isolation
DNA was isolated from freeze-dried Methanosarcina barkeri MS by the grinding method in liquid
nitrogen [23]. Genomic DNA from the AX-2 strain
of the slime mould Dictyostelium discoideum was a
kind gift from R. Insall, MRC Laboratory of
Molecular Biology, Cambridge, U.K. Rat, hamster
and human high-molecular-mass genomic DNA was
extracted and purified by standard protocols
[24, 25].
RNA isolation
Total cellular RNA was extracted from normal
human liver tissue by a single-step acid-phenol
procedure [26]. Polyadenylated mRNA was rapidly
purified from the total RNA by oligo(dT) affinity
chromatography [27], using a Poly(A) Quik mRNA
column (Stratagene, La Jolla, CA, U.S.A.).
Human liver cDNA synthesis, library construction and
screening
Synthesis of first-strand cDNA used purified
human liver mRNA as a template in conjunction
with avian myeloblastosis virus reverse transcriptase
('Super-RT', Anglian Biotechnology, Colchester,
U.K.). Second-strand cDNA replacement synthesis
was by nick translation [28]. BamHI synthetic
adaptor oligonucleotides were ligated on to the ends
of the double-stranded cDNA to increase ligation
efficiency into BamHI-digested λ vector arms.
Before ligation the cDNA was size-selected to maximize the yield of full-length cDNA clones. Ligated
cDNA replaced the 'stuffer' fragment of λ phage,
and was packaged with a high-efficiency packaging
mix (Stratagene). For full coverage of the liver
cDNA library, 10^6 independent plaques were
screened [29]. A high-specific-activity 1 kb CPSase I
cDNA probe was generated by the random hexamer
method [30]. Positive replica signals were plaque-purified by serial dilution platings. A human genomic DNA cosmid library in Lorist 6 vector with
average insert size of 33-45 kb was a kind gift from
L. Buluwela, MRC Laboratory of Molecular
Biology, Cambridge, U.K.
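The number of clones that must be screened for a library to be likely to contain a given sequence is often estimated with the Clarke and Carbon relation N = ln(1 - P) / ln(1 - f), where f is the insert size as a fraction of the genome. The text does not say this relation was used here. The sketch below applies it to the quoted cosmid insert range of 33-45 kb, assuming a haploid human genome of about 3 x 10^9 bp and a target probability of 0.99, neither of which comes from the paper.

```python
import math


def clones_for_coverage(probability: float, insert_bp: float, genome_bp: float) -> int:
    """Clarke-Carbon estimate of clones needed to include a sequence with given probability."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    fraction = insert_bp / genome_bp  # share of the genome carried by one clone
    return math.ceil(math.log(1.0 - probability) / math.log1p(-fraction))


GENOME_BP = 3_000_000_000  # assumed haploid human genome size, not from the text
for insert_bp in (33_000, 45_000):  # insert range quoted for the Lorist 6 library
    n = clones_for_coverage(0.99, insert_bp, GENOME_BP)
    print(f"{insert_bp // 1000} kb inserts: about {n} clones for P = 0.99")
```

Larger inserts reduce the number of clones required, which is one reason a cosmid library suits a gene expected to span tens of kilobases.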
Polymerase chain reaction (PCR)
The PCR [31] was used extensively, and new
variations were developed. To prove that the carB
gene of M. barkeri MS was a duplication, the N-terminal-encoding half of the gene was amplified
using a combination of a specific anti-sense primer
and a degenerate sense primer [32]. Computer
multiple-alignment of all known primary amino acid
sequences of the large subunit of CPSase was
performed on a DEC-VAX mainframe computer.
The redundant sense primer was designed to
amplify from the most highly conserved 5' gene
sequence (Fig. 2), and SalI restriction enzyme recognition sites were added to the 5'-end of the primers
to facilitate later cloning of the PCR product(s).
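The redundancy of a degenerate primer is the product of the number of codons available for each amino acid it covers. As a rough check on the 256-fold redundancy quoted in Fig. 2, the sketch below counts codons under the standard genetic code. It assumes that the final tyrosine of the QAGEFDY motif contributes only its two invariant bases (TA), which is an inference from the 20-base core of the primer and is not stated in the text. It also derives the anti-sense primer core as the reverse complement of the coding sequence given for RPSYVL. Function names are illustrative.

```python
# Number of codons per amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def primer_degeneracy(peptide: str) -> int:
    """Number of distinct sequences obtained by fully reverse-translating a peptide."""
    total = 1
    for residue in peptide:
        total *= CODON_COUNTS[residue]
    return total


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Six full codons (QAGEFD); the trailing Tyr is assumed to add only 'TA'.
sense_redundancy = primer_degeneracy("QAGEFD")
# Coding sequence for RPSYVL as given in Fig. 2; the primer anneals to this strand.
antisense_core = reverse_complement("CGTCCTTCCTATGTGCTT")
print(sense_redundancy, antisense_core)
```

Including the third base of the tyrosine codon would double the redundancy, so truncating the primer after the invariant bases keeps the pool smaller without losing specificity at the 3' end.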
Reactions were performed either in 0.5 ml
Eppendorf tubes, or in thermostable polycarbonate
plates (Hi-Temp 96; Techne, Cambridge, U.K.,
designed by J.P.S.) according to the number of
reactions. The PCR mix was prepared on ice to
minimize non-specific amplification. A 50 µl reaction
mix contained: 0.5-1 µg of genomic DNA,
100 mmol/l neutralized deoxynucleotide triphosphates, 5 µl of 10× reaction buffer (100 mmol/l Tris-HCl, pH 8.3 at 25°C, 500 mmol/l KCl and 15 mmol/l
MgCl2), 1 µmol of each primer/l, 0.5 µl of Taq
polymerase (2.5 units, Cetus; Norwalk, CT, U.S.A.)
and sterile double-distilled water to 50 µl. The mix
was overlaid with light mineral oil (50 µl), before a
brief vortex and pulse centrifugation. A programmable thermocycler (Techne PHC-2, Cambridge,
U.K.) was pre-heated to 95°C before incubating the
reaction tubes in the machine to minimize non-specific amplification.
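The working concentrations of the buffer components follow from diluting 5 µl of the 10× buffer into the 50 µl reaction. The short sketch below performs that arithmetic, taking the 10× values quoted above. The helper name is illustrative.

```python
def final_concentration(stock: float, volume_added_ul: float, total_volume_ul: float) -> float:
    """Concentration after dilution, in the same units as the stock."""
    return stock * volume_added_ul / total_volume_ul


# 10x reaction buffer composition in mmol/l, as listed in the text.
BUFFER_10X_MMOL_PER_L = {"Tris-HCl": 100.0, "KCl": 500.0, "MgCl2": 15.0}

final_mmol_per_l = {
    component: final_concentration(stock, volume_added_ul=5.0, total_volume_ul=50.0)
    for component, stock in BUFFER_10X_MMOL_PER_L.items()
}

for component, value in final_mmol_per_l.items():
    print(f"{component}: {value} mmol/l in the 50 µl reaction")
```

This gives 10 mmol/l Tris-HCl, 50 mmol/l KCl and 1.5 mmol/l MgCl2 in the final mix.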
Amplification profiles differed according to the
hybridization temperature of the primer pair, as well
as the predicted length of the product. An amplification profile for the D. discoideum CPSase II gene
[Fig. 2 alignment panels: N-terminus and C-terminus sequences of rat CPSase I, hamster CAD, Drosophila CAD, yeast URA2, D. discoideum, E. coli carB and M. barkeri MS carB around the conserved motifs. PCR sense primer (motif AGEFDY, 256-fold redundancy, SalI site at the 5' end): 5' ACTGTCGAC CAGGCAGGAGAATTCGATTA 3', with alternative bases at the degenerate positions. Anti-sense PCR primer (motif RPSYVL): coding strand 5' CGT CCT TCC TAT GTG CTT 3'; primer 3' GCA GGA AGG ATA CAC GAA CAGCTGTCA 5', including the SalI site.]
Fig. 2. Computer design of M. barkeri PCR primers. Computer multiple alignment of all known primary amino acid sequences
of the large subunit of CPSase was performed on a DEC-VAX mainframe computer. The alignment showed long stretches of highly
conserved sequences between all species. This information was used in conjunction with the partial DNA sequence data available for
the M. barkeri carB gene [23] to select the anti-sense primer RPSYVL. The primer was the reverse translation of the published nucleic
acid sequence, and was extended at its 5' end to include a SalI restriction enzyme recognition site to facilitate cloning of the
PCR product. To obtain the longest carB PCR product the most conserved sequence at the N-terminus was used from which to design
a redundant sense primer QAGEFDY.
[Fig. 3 schematic and gel photograph: primers A and B on the D. discoideum gene (CPSase and DHOase regions labelled) and the 2.4 kb product beside size markers.]
Fig. 3. PCR strategy for the CPSase gene of D. discoideum (DD). The PCR oligonucleotide primers, A and B, amplified a single 2.4 kb
fragment, seen here run on a 0.6% agarose gel against λ HindIII size markers. Abbreviations: GATase, glutamine aminotransferase; DHOase, dihydro-orotase; ATCase, aspartate carbamoyltransferase.
was 35 cycles of: 95°C strand dissociation for
0.5 min, 58°C primer annealing for 0.5 min and 72°C
enzyme extension for 1 min (predicted product size
of 2.4 kb, Fig. 3). For the rat and hamster CPSase
genes, PCR primers were derived from the known
cDNA sequences and were designed to flank
computer-predicted intron sites (Fig. 4). As the
length of the introns was unknown, the PCR cycle
profiles were adjusted for individual primer pairs. In
some instances non-specific amplification was only
circumvented by the application of nested PCR [33].
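The cycling profile quoted for the D. discoideum gene can be turned into a total programmed hold time and an effective extension rate. The sketch below does this for 35 cycles of 0.5 min, 0.5 min and 1 min with a 2.4 kb product. It ignores ramp times between temperatures, which depend on the thermocycler and are not given in the text, and the class name is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleProfile:
    cycles: int
    denature_min: float
    anneal_min: float
    extend_min: float

    def hold_time_min(self) -> float:
        """Total programmed hold time, excluding temperature ramps."""
        return self.cycles * (self.denature_min + self.anneal_min + self.extend_min)

    def extension_min_per_kb(self, product_kb: float) -> float:
        """Extension time allowed per kilobase of expected product."""
        return self.extend_min / product_kb


profile = CycleProfile(cycles=35, denature_min=0.5, anneal_min=0.5, extend_min=1.0)
print(f"Hold time: {profile.hold_time_min():.0f} min")
print(f"Extension: {profile.extension_min_per_kb(2.4):.2f} min/kb")
```

For this profile the hold time is 70 min and the extension allowance is about 0.42 min per kb of product.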
DNA cloning and recombinant screening
Before cloning of cDNA or genomic DNA PCR
products into M13 phage or plasmid host, one-tenth
of the reaction was subjected to agarose mini-gel
electrophoresis [25] to determine the number of
bands, their size and approximate yield. If necessary,
[Fig. 4 schematics: (a) primers 1A and 1B placed either side of a predicted intron site, shown on the gene and on the cDNA; (b) outer primers A and B with lettered internal primers flanking predicted intron positions.]
Fig. 4. PCR strategy for intron/exon gene dissection. (a) PCR across intron/exon boundaries. Oligonucleotide primers used in
PCR amplification (e.g. 1A/1B) were recessed from the predicted intron-exon junction to facilitate rapid sequence orientation of the
known cDNA open reading frame to the non-coding intron sequence. (b) Nested PCR of large DNA fragments. Primers A and B
amplified a large product, which served as input template for subsequent internal amplifications across predicted intron-exon
boundaries, e.g. e+h, g+k, m+B, etc.
PCR products were further gel-purified and digested
with appropriate restriction enzyme(s) according to
established procedures [25]. Ligation was into a
similarly digested host vector, and transformed into
competent E. coli cells. Recombinant screening was
rapidly performed by PCR in thermostable polycarbonate plates [34], using Universal M13 forward
and reverse sequencing primers as PCR primers.
Mini-preparation of plasmid recombinant DNA
[35] provided a sufficiently pure template for DNA
sequencing.
DNA sequencing
DNA sequencing was by modifications of the
dideoxy chain termination method [25]. Alternative
methods were also used to avoid cloning and
recombinant screening of PCR products before
DNA sequencing. The two most reliable were the
techniques of solid-phase sequencing of 5'-biotin-labelled PCR products to streptavidin-coated paramagnetic beads [36], and linear amplification
sequencing [37]. In the latter method chain
termination sequencing was with four spectrally
distinct fluorescent dye-labelled dideoxynucleotides
(DyeDeoxy Terminators; Applied Biosystems, Foster
City, CA, U.S.A.). The terminated products were
electrophoresed on an Applied Biosystems 373A
semi-automated sequencing machine consisting of a
laser excitation source coupled to a microcomputer
for data acquisition and analysis.
Sequence data for human CPSase I cDNA was
input into a computer database, contigs joined and
assembled using the Staden packages run on a
DEC-VAX mainframe computer.
RESULTS
M. barkeri MS carB gene
Semi-redundant PCR of M. barkeri MS genomic
DNA resulted in several products. The inter-primer
predicted distance was around 2 kb, yet the dominant product was much smaller at around 0.4 kb.
The sequence of this product revealed that the
redundant primer had annealed to the similar
sequence at the 5' end of the C-terminal-encoding
half rather than the 5' end of the N-terminal-encoding half of carB. The redundant sense primer
was redesigned, as well as a new anti-sense primer,
to inhibit dual priming of the sense primer. A
product of the predicted 1.6 kb (Fig. 5) was amplified and directly sequenced on the ABI 373A automated sequencer. Sufficient sequence information
was determined to clearly demonstrate that the carB
gene of the Archaea M. barkeri MS has an internal
duplication, and that the duplication is at an equivalent position to that in the E. coli carB gene.
D. discoideum CPSase II gene
The complete nucleotide and derived amino acid
sequences of the D. discoideum CPSase II gene
within the PYR1-3 multigene [38] confirm a clear
gene duplication [39]. Alignment of the N- and C-terminal halves shows 28.7% sequence identity and
51% sequence similarity. There are 3126 nucleotides
of open reading frame, encoding 1042 amino acids,
and no introns (EMBL no. X55433). These data
establish that the CPSase II gene of the Eukarya
D. discoideum has a tandem duplicated structure,
which it shares with the Bacteria E. coli carB and
the Archaea M. barkeri MS carB genes.
Genomic organization of rat CPSase I and hamster
CPSase II genes
In contrast to the 13 introns of the rat CPSase I
gene encoding the C-terminal half, the N-terminal
half is expanded by only eight introns. All the
boundaries conform to the GT-AG consensus
sequence for nuclear pre-mRNA introns [40]. These
introns add a further ~13 kb to the 1656 nucleotides of the exon sequence. The gene spans approximately 43 kb, divided between 14.6 kb for the gene
encoding the N-terminal half and 28.7 kb for the
C-terminal half. Several areas of the gene were only
amplified by using a nested approach, and primer
pairs were designed to overlap both upstream and
downstream of predicted intron positions to ensure
complete coverage. The first intron of the gene
encoding the N-terminal half is one codon downstream from the predicted site when compared with
intron 2 of the C-terminal half. The other concordant intron position is intron 5 of the N-terminal
half with intron 9 of the C-terminal half. This intron
is in exactly the same place and phase in each half
of the duplicated CPSase I gene.
The 3.2 kb CPSase II cDNA sequence of Syrian
hamster CAD [14] is a duplication of 1.6 kb halves.
PCR amplification of each half demonstrated a
predominant 6.6 kb product for the N-terminal-encoding half and ~3.8 kb for the C-terminal-encoding half, the size difference being accounted for
by introns (Fig. 6). Cloning of these products consistently failed, most likely as a result of insert instability. Definition of the intron-exon structure of the
hamster CPSase II gene was achieved by secondary
amplification from the large PCR products as a
template in conjunction with internal primer pairs.
The gene is composed of 17 introns, divided
between eight introns in the N-terminal-encoding
half and nine in the C-terminal-encoding half. All
the introns observed the GT-AG consensus [40],
the intron lengths ranging from ~0.1 to ~3 kb. A
computer alignment of the structures for the rat
CPSase I and hamster CPSase II genes indicated
clear homology, with a common tandem duplication
structure. A comparison between the intron
positions for each half of each gene indicates that at
least two pairs are concordant, e.g. intron 5 is
concordant between all halves of rat and hamster
CPSase genes. Several other introns are concordant,
e.g. between the gene encoding the N-terminal half
of hamster CPSase II and the C-terminal half of rat
CPSase I (Fig. 7).
[Fig. 6 gel photographs: CPSase A and CPSase B products, with the 6.6 kb band marked.]
Fig. 6. Large fragment amplification of the hamster CAD CPSase II gene. The gel photographs show the results of amplifying
each half of the tandem gene duplication. These products were themselves used for internal PCR at predicted intron positions
(Fig. 4b).
Human CPSase I gene
It was predicted that the strong homology for
CPSase I would apply to the human liver mRNA. A
pair of rat CPSase I primers designed to amplify the
N-terminal-encoding half of the large subunit were
used to amplify human liver cDNA. The product
yield was increased by performing a second round
nested amplification with a pair of internal primers
(Fig. 8). Cloning and sequencing of the 1 kb product confirmed it as encoding CPSase I with high
sequence homology to rat, but not absolute identity
and therefore not a contaminant. A human liver
library of high titre (5 x 10' plaque-forming units/µg
of cDNA) was constructed and probed under
stringent conditions (65°C overnight) with radiolabelled human CPSase I cDNA PCR product.
Screening resulted in five purified plaques, from
which DNA was purified and sequenced. The
nucleotide sequence for human CPSase I was
obtained from both strands to confirm the sequence
(J. P. Schofield, unpublished work). There is 98%
amino acid sequence identity with rat CPSase I,
with tandem duplication of the large subunit. A
human genomic cosmid library screen for CPSase I
gene resulted in two clones. Restriction enzyme
analysis estimates that the two clones span ~46 kb
of the human CPSase I gene. Partial sequencing of
one of the clones confirms that it contains the
CPSase I sequence.
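The comparison of intron positions between the two halves of a duplicated gene, as reported above for the rat and hamster CPSase genes, amounts to asking which introns fall at the same aligned codon and in the same phase. The following is a minimal sketch of that comparison. The aligned codon coordinates and phases are hypothetical placeholders chosen to mirror the pattern described in the text (one pair offset by a single codon, one pair in exactly the same place and phase); they are not the published positions, and the names are illustrative.

```python
from typing import List, NamedTuple, Tuple


class Intron(NamedTuple):
    name: str
    aligned_codon: int  # column in the alignment of the two gene halves
    phase: int          # 0, 1 or 2: position of the intron within the codon


def concordant_pairs(
    n_half: List[Intron], c_half: List[Intron], tolerance_codons: int = 0
) -> List[Tuple[str, str]]:
    """Pairs of introns sharing phase and lying within the codon tolerance."""
    pairs = []
    for n_intron in n_half:
        for c_intron in c_half:
            same_phase = n_intron.phase == c_intron.phase
            offset = abs(n_intron.aligned_codon - c_intron.aligned_codon)
            if same_phase and offset <= tolerance_codons:
                pairs.append((n_intron.name, c_intron.name))
    return pairs


# Hypothetical coordinates, for illustration only.
n_terminal = [Intron("N1", 41, 0), Intron("N5", 210, 1)]
c_terminal = [Intron("C2", 40, 0), Intron("C9", 210, 1)]

exact = concordant_pairs(n_terminal, c_terminal)
near = concordant_pairs(n_terminal, c_terminal, tolerance_codons=1)
print(exact, near)
```

Allowing a small tolerance captures pairs such as an intron lying one codon downstream of its predicted partner, at the cost of admitting more chance matches.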
DISCUSSION
Genes with ancient duplications provide unique
information on their evolution. The highly conserved product of the CPSase I gene is a powerful
new molecular model for gene evolution. In proposing that the tandem gene duplication had
occurred in the progenote it is clearly important to
provide representatives from each of the three
domains: Bacteria, Archaea and Eukarya [4].
During DNA sequencing upstream of the argC
gene in the Archaea M. barkeri MS, Morris and
Reeve [23] made the chance discovery of the 3' end
of the carB gene. Unfortunately, there was insufficient DNA sequence information to establish
whether there was a tandem gene duplication like
that in the carB gene of E. coli. There was, however,
sufficient sequence to apply successfully an adaptation of the PCR using a redundant amplification
primer [32]. If this technique had not been available, a genomic library would have been required,
and probed with a 5' sequence from the known carB
gene. The carB gene of M. barkeri has now been
proven to be a tandem duplication, the junction
occurring at the same position as in E. coli carB.
Tandem duplications have been clearly demonstrated in each of the three domains. The hypothesis
that the CPSase gene duplicated in their common
ancestor, i.e. the progenote, is now conclusively supported. Having established this single duplication event, the
question now focuses on the origin of introns, using
the CPSase gene as a unique model. Introns had
previously been described in the 3' half of the rat
CPSase I gene [15]. The hypothesis was that if
introns were present in the common progenotic
ancestral gene before duplication, then several, if not
all, should be in concordant positions when comparing the two halves of intron-containing genes. As
the rat CPSase I cDNA and the partial gene
sequence were known, by performing a computer
alignment of the two cDNA halves and marking the
position of the known introns, predictions of the
position of the remaining 5' introns could be made.
The PCR was used in a novel application to amplify
across predicted intron-exon boundaries. The
alternatives would have been heteroduplex mapping
or genomic library screening and sequencing. For
this particular requirement the former would have
been too insensitive as exact intron-exon boundaries were necessary for comparative purposes. A
potential major limitation of the PCR-based technique is the upper size limit of intron which can be
amplified. This research demonstrates that prolonged extension times of 1 min/kb of target template are excessive, and by similarly decreasing the
annealing and extension times the total cycling time
can be significantly reduced. There is the added
benefit of retaining activity of the relatively thermo-
the situation is rather more complex, with the
introns being of various ages, some truly ancient,
with others having been inserted or lost.
[Alignment figure: N- and C-terminal halves of rat CPSase I (RATNSEQ, RATCSEQ) and hamster CPSase II (HAMNSEQ, HAMCSEQ).]
To provide further supportive information, the
Syrian hamster CPSase II gene within the CAD
multi-gene complex [14] was similarly dissected by
the PCR. The cDNA sequence was computer
aligned with the rat sequence, and the intron
positions of the latter were marked. The primary
amino acid sequences of rat CPSase I and hamster
CPSase II are highly homologous, the products of a
common tandemly duplicated ancestor. The hamster
gene was found to contain 18 introns, nine for each
half. One of these is in the same position and phase
in each half, as well as concordant with one of the
proposed ancient pair of introns in the rat CPSase I
gene. Furthermore, three single introns in the
hamster gene are concordant with other rat introns,
suggesting that these too are ancient in origin. The
fact that the rat CPSase I and hamster CPSase II
genes have different intron-exon structures proves
that one is not simply a duplication of the same
nuclear-encoded gene. Rather, it is likely that the
CPSase I gene was introduced into the eukaryotic
--FLGVAEOLHNEGF~LFATEAT--SDWLNANNVPATPVA-W---PSOE---GONPSLSS
RATCSEO
nuclear genome along with the majority of other
......VELETPTDKRlFVVAAALWAGYSVERLYELTRlOCWFLHRMKRIVTHAOLLEOH
HAMNSEO
SELLPTVRLLESLGYSLYASLGT--ADFYTEHGV~VTAVO-W---HFEEAVDGECPPORS
HAMCSEO
mitochondria1 genes after endosymbiosis. In con. .
trast, the CPSase I1 gene was probably indepenNSESVTEETLROAKEIGF--SDKOISKCLGLTEAOTRELRLKKNIHPWVK~DTLAAEYP
RATNSEO
IRIYIRDGSIOLVINLP.....NNMNTKFVHONYVIRRTAVDSC~........
ALLTNF.
RATCSEO
dently acquired from another source.
R G O P L S O O L L H O A K C L G F - - S O K O IA L A V L S ~ E L A V R K L R O E L G I C P A V K OIDTVAAEWP
HAMNSEO
ILDOLAENHFELVINLSMRGAGGRRLSSFVTKGYRTRRLAADFSV
.......P L I I O I K
Further support for an ancient origin of some
HAMCSEO
introns
in the CPSase gene was sought by comSVTNYL-YVTYNG6EHDIKFD-EH
RATNSEO
VTKLFAEAV-OKARTVOSKSLFYR
RATCSEO
pleting the DNA sequence of an early eukaryote,
AOTNYL-YLTYWGNTHOLOF---R
HANNSEO
the slime mould D . discoideum. Faure et al. [38]
CTKLFVpLGOIGPAPPLKVHVDC
HANCSEO
obtained
a partial sequence from each end of the
92
CPSase I1 gene in the PYR1-3 gene complex (equivalent to CAD). They predicted that the CPSase
Fig. 7. Alignment of rat and hamster CPSase. Primary amino acid
moiety would be intron-less, as they had found to
sequences for the N-terminal (RATNSEQ, HAMNSEQ) and C-terminal
be the case for the rest of the PYR1-3 multi-gene.
(RATCSEQ, HAMCSEQ) halves of rat and hamster CPSase were computer
aligned. lntron positions are indicated (v),
The most highly conserved
However, some D . discoideum genes contain very
ancient introns between the rat and hamster genes are boxed.
short introns, and so the missing -2.4kb of the
CPSase gene was amplified by PCR as a preliminary to DNA sequencing. This confirmed that the
stable enzyme Tuq polymerase, and so increasing
gene was again a tandem duplication with the same
the amplification efficiency. Using these techniques
junction between halves as for other species. The
the 5’ half of the rat CPSase I gene was shown to
gene was uninterrupted, as had been predicted. A
have eight introns, two of which were concordant
further practical problem arose, namely the difficulty
with those in homologous positions in the 3‘ half. If
in sequencing the cloned D. discoideum DNA when
introns had been inserted after the duplication
in a pUC plasmid. This was circumvented by subevent, coincidences would have been highly unlikely.
cloning into M13, as well as using the solid-phase
The conclusion is that the two concordant introns
sequencing technique to walk along the 2.4 kb PCR
were already present in the single ancestral gene
product [391.
before duplication. Whether they have been selecIn conclusion, the CPSase gene is a tandemly
tively retained because they are at significant
duplicated progenotic gene. The single gene conpositions separating functional domains of the
tained several introns before duplication in the
translated protein remains to be elucidated, as little
progenote. All of the introns have been lost from
information is currently available on the CPSase
Bacteria and Archaea as a result of selective evolufolded protein structure. With regard to the remaintionary pressure to streamline their genomes. Simiing discordant introns, it is only possible to specularly the introns were lost from the Eukarya D. dislate that either they were inserted later or are the
coideum. The structure of the CPSase gene in rat
residue of a much larger number of introns in the
and hamster is a mosaic of introns of various ages,
ancestral gene, some of which have been randomly
the concordant pairs being the most ancient. An
lost with the passage of time. It is more likely that
alternative hypothesis is that the ancestral gene was
RATNSEO
RATCSEO
HAMNSEO
HAMCSEO
-
u
I
.
Evolution of the carbamoyl-phosphatase synthetase gene
5’
I
I
CPSase A
4
I
I
I
*
CPSase B
I27
3’
4
12B 13B
I
Ikb
-9
Fig. 8. Nestec
before cloning
maximize the p
run against I I
uninterrupted, introns being inserted later during
evolution.
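The concordance argument used above (introns at the same position and phase in both halves of the duplicated gene are unlikely to coincide by chance) can be illustrated with a short Python sketch. It is not the analysis performed in this work. The intron coordinates are hypothetical and are not read from Fig. 7, the function names are invented for illustration, and the chance estimate assumes that introns insert uniformly and independently at a fixed number of candidate sites, which ignores any insertion-site preference.

```python
from math import comb

# An intron site is (aligned_codon_column, phase): phase 0 lies between codons,
# phases 1 and 2 lie after the first or second base of a codon.


def concordant_introns(half_a, half_b):
    """Return intron sites found at the same aligned column and phase in both halves."""
    return sorted(set(half_a) & set(half_b))


def p_shared_by_chance(n_sites, n_a, n_b, k):
    """Probability that two independently and uniformly placed intron sets
    (sizes n_a and n_b, drawn without replacement from n_sites candidate sites)
    share at least k sites. This is a hypergeometric tail probability."""
    total = comb(n_sites, n_b)
    hits = sum(comb(n_a, i) * comb(n_sites - n_a, n_b - i)
               for i in range(k, min(n_a, n_b) + 1))
    return hits / total


# Hypothetical coordinates, for illustration only (not taken from Fig. 7).
n_half = [(40, 0), (112, 1), (205, 2), (310, 0)]
c_half = [(40, 0), (98, 2), (205, 2), (333, 1)]
print(concordant_introns(n_half, c_half))

# Nine introns per half, as reported for the hamster gene; the number of
# candidate sites (500 aligned codons x 3 phases) is an assumption.
print(p_shared_by_chance(n_sites=1500, n_a=9, n_b=9, k=1))
```

Under this simple model the chance of a shared site falls as the number of candidate sites grows and as more coincidences are required, which is the sense in which concordant introns point to a pre-duplication origin.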
The knowledge of the intron-exon structure, and the
techniques accumulated from the previous experiments, were applied to the isolation and sequencing
of human CPSase I cDNA. This is the first step
towards a molecular understanding of the rare
autosomal recessive disease of CPSase I deficiency.
Cosmids have also been isolated as part of the goal
to develop a genomic DNA analysis of diseased
patients and carriers. This would then allow a
simple blood sample to be analysed as a useful
screening procedure. Either of the primers could be
modified to facilitate a simple colorimetric assay
after PCR, and before direct DNA sequencing of the
amplified disease locus (or loci). There are several
possible modifications of the techniques described to
be explored, applicable to both CPSase I deficiency
and other more common diseases.
ACKNOWLEDGMENTS
This work was supported by an MRC Training
Fellowship. I am indebted to Professor Sydney
Brenner for his continuing support, advice and
encouragement.
REFERENCES
1. Fox GE, Stackebrandt E, Hespell RB, et al. The phylogeny of prokaryotes.
Science (Washington, DC) 1980; 209: 457-63.
2. Fitch WM, Margoliash E. Construction of phylogenetic trees. Science
(Washington, DC) 1967; 155: 279-84.
3. Woese CR, Fox GE. Phylogenetic structure of the prokaryotic domain: the
primary kingdoms. Proc Natl Acad Sci USA 1977; 74: 5088-90.
4. Woese CR, Kandler O, Wheelis ML. Towards a natural system of organisms:
proposal for the domains Archaea, Bacteria, and Eukarya. Proc Natl Acad Sci
USA 1990; 87: 4576-9.
5. Berget SM, Moore C, Sharp PA. Spliced segments at the 5' terminus of
adenovirus 2 late mRNA. Proc Natl Acad Sci USA 1977; 74: 3171-5.
6. Chow LT, Gelinas RE, Broker TR, Roberts RJ. An amazing sequence
arrangement at the 5' ends of adenovirus 2 messenger RNA. Cell 1977; 12:
1-8.
7. Darnell JE, Doolittle WF. Speculations on the early course of evolution. Proc
Natl Acad Sci USA 1986; 83: 1271-5.
8. Gilbert W, Marchionni M, McKnight G. On the antiquity of introns. Cell
1986; 46: 151-3.
9. Seraphin B, Boulet A, Simon M, Faye G. Construction of a yeast strain devoid
of mitochondrial introns and its use to screen nuclear genes involved in
mitochondrial splicing. Proc Natl Acad Sci USA 1987; 84: 6810-4.
10. Joyce GF. RNA evolution and the origins of life. Nature (London) 1989; 338:
217-23.
11. Nyunoya H, Lusty CJ. The carB gene of Escherichia coli: a duplicated gene
coding for the large subunit of carbamoyl phosphate synthetase. Proc Natl
Acad Sci USA 1983; 80: 4629-33.
12. Lusty CJ, Widgren EE, Broglie KE, Nyunoya H. Yeast carbamyl phosphate
synthetase. J Biol Chem 1983; 258: 14466-72.
13. Freund JN, Jarry BP. The rudimentary gene of Drosophila melanogaster encodes
four enzymatic functions. J Mol Biol 1987; 193: 1-13.
14. Simmer JP, Kelly RE, Rinker AG, Scully JL, Evans DR. Mammalian carbamyl
phosphate synthetase (CPS). J Biol Chem 1990; 265: 10395-402.
15. Nyunoya H, Broglie KE, Widgren EE, Lusty CJ. Characterisation and
derivation of the gene coding for mitochondrial carbamyl phosphate
synthetase I of rat. J Biol Chem 1985; 260: 9346-56.
16. Trotta PP, Pinkus LM, Haschmeyer RH, Meister A. Reversible dissociation of
the monomer of glutamine-dependent carbamyl phosphate synthetase into
catalytically active heavy and light subunits. J Biol Chem 1974; 249: 492-9.
17. Pierard A, Glansdorff N, Mergeay M, Wiame JM. Control of the biosynthesis
of carbamoyl phosphate in Escherichia coli. J Mol Biol 1965; 14: 23-36.
18. Mergeay M, Gigot D, Beckmann J, et al. Physiology and genetics of carbamoyl
phosphate synthesis in Escherichia coli K12. Mol Gen Genet 1974; 133:
299-316.
19. Yang D, Oyaizu Y, Oyaizu H, Olsen GJ, Woese CR. Mitochondrial origins.
Proc Natl Acad Sci USA 1985; 82: 4443-7.
20. Gelehrter TD, Snodgrass PJ. Lethal neonatal deficiency of carbamyl phosphate
synthetase. N Engl J Med 1974; 290: 430-3.
21. McReynolds JW, Crowley B, Mahoney MJ, Rosenberg LE. Autosomal recessive
inheritance of human mitochondrial carbamyl phosphate synthetase deficiency.
Am J Hum Genet 1981; 33: 345-53.
22. Graf L, McIntyre P, Hoogenraad N, et al. A carbamyl phosphate synthetase
deficiency with no detectable immunoreactive enzyme and no translatable
mRNA. J Inher Metab Dis 1984; 7: 104-6.
23. Morris CJ, Reeve JN. Conservation of structure in the human gene encoding
argininosuccinate synthetase and the argG genes of the archaebacteria
Methanosarcina barkeri MS and Methanococcus vannielii. J Bacteriol 1988; 170:
3125-30.
24. Blin N, Stafford DW. A general method for isolation of high molecular weight
DNA from eukaryotes. Nucleic Acids Res 1976; 3: 2303-8.
25. Sambrook J, Fritsch EF, Maniatis T, eds. Molecular cloning: a laboratory
manual. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, 1989.
26. Chomczynski P, Sacchi N. Single-step method of RNA isolation by
guanidinium thiocyanate-phenol-chloroform extraction. Anal Biochem 1987;
162: 156-9.
27. Aviv H, Leder P. Purification of biologically active globin messenger RNA by
chromatography on oligothymidylic acid-cellulose. Proc Natl Acad Sci USA
1972; 69: 1408-12.
28. Gubler U, Hoffman BJ. A simple and very efficient method for generating
cDNA libraries. Gene 1983; 25: 263-9.
29. Benton WD, Davis RW. Screening λgt recombinant clones by hybridisation to
single plaques in situ. Science (Washington, DC) 1977; 196: 180-2.
30. Feinberg AP, Vogelstein B. A technique for radiolabelling DNA restriction
endonuclease fragments to high specific activity. Anal Biochem 1984; 137:
266-7.
31. Saiki RK, Gelfand DH, Stoffel S, et al. Primer-directed enzymatic amplification
of DNA with a thermostable DNA polymerase. Science (Washington, DC)
1988; 239: 487-91.
32. Girgis SI, Alevizaki M, Denny P, Ferrier GJM, Legon S. Generation of DNA
probes for peptides with highly degenerate codons using mixed primer PCR.
Nucleic Acids Res 1988; 16: 10371.
33. Mullis K, Faloona F, Scharf S, Saiki R, Horn G, Erlich H. Specific enzymatic
amplification of DNA in vitro. Cold Spring Harbor Symp Quant Biol 1986; 51:
263-73.
34. Schofield JP, Vaudin M, Kettle S, Jones DSC. A rapid semi-automated
microtiter plate method for analysis and sequencing by PCR from bacterial
stocks. Nucleic Acids Res 1989; 17: 9498.
35. Jones DSC, Schofield JP. A rapid method for isolating high quality plasmid
DNA suitable for DNA sequencing. Nucleic Acids Res 1990; 18: 7463-4.
36. Schofield JP, Vaudin M, Jones DSC. Fluorescent and radioactive solid phase
dideoxy sequencing of PCR products in microtitre plates. Methods Enzymol
1992 (In press).
37. Craxton M. Linear amplification sequencing, a powerful method for
sequencing DNA. Methods (A companion to Methods Enzymol) 1991; 3: 20-6.
38. Faure M, Camonis JH, Jacquet M. Molecular characterisation of a Dictyostelium
discoideum gene encoding a multifunctional enzyme of the pyrimidine pathway.
Eur J Biochem 1989; 179: 345-58.
39. Elgar G, Schofield JP. Carbamoyl phosphate synthetase (CPSase) in the PYR1-3
multigene of Dictyostelium discoideum. DNA Sequence 1991; 2: 219-26.
40. Breathnach R, Benoist C, O'Hare K, et al. Ovalbumin gene: evidence for a
leader sequence in mRNA and DNA sequences at the exon-intron
boundaries. Proc Natl Acad Sci USA 1978; 75: 4853-7.