Clinical Science (1993) 84, 119-128 (Printed in Great Britain)

Glaxo/MRS Young Investigator Prize

Molecular studies on an ancient gene encoding for carbamoyl-phosphate synthetase

J. P. SCHOFIELD
MRC Molecular Genetics Unit, Hills Road, Cambridge, U.K.

1. Carbamoyl-phosphate synthetase (EC 6.3.5.5) catalyses the synthesis of carbamoyl phosphate, the immediate precursor of arginine and pyrimidine biosynthesis, and is highly conserved throughout evolution. The large subunit of all carbamoyl-phosphate synthetases sequenced to date comprises two highly homologous halves, the product of a proposed ancestral gene duplication. The sequences of the enzymes of Escherichia coli, Drosophila melanogaster, Saccharomyces cerevisiae, rat and Syrian hamster all have duplications, suggesting that this event occurred in the progenote predating the separation of the major phyla. Until now, only limited data on carbamoyl-phosphate synthetase were available for the primitive eukaryote Dictyostelium discoideum and for the Archaea Methanosarcina barkeri MS. The DNA sequence of the D. discoideum carbamoyl-phosphate synthetase gene and additional sequence for the carbamoyl-phosphate synthetase gene of M. barkeri MS have been determined, and a duplicated structure for both is clearly demonstrated.

2. Genes with ancient duplications provide unique information on their evolution. A study of the intron/exon organization of the rat carbamoyl-phosphate synthetase I gene and the hamster carbamoyl-phosphate synthetase II gene in the CAD multi-gene complex shows that at least some of their introns are very old. Evidence is provided that some introns must have been present in the ancestral precursor before its duplication.

3. The human carbamoyl-phosphate synthetase I gene has been isolated and characterized. A human liver cDNA library was constructed and probed for carbamoyl-phosphate synthetase I. A human genomic DNA cosmid library was also probed for the carbamoyl-phosphate synthetase I gene.
The cDNA sequence of the human carbamoyl-phosphate synthetase I gene has been determined, and work has been initiated to confirm that at least part of this gene is contained within two cosmids spanning 46 kb. This will enable future studies to be made on mutations in this gene in the rare autosomal recessive deficiency of carbamoyl-phosphate synthetase I.

Key words: DNA sequencing, gene structure, introns, molecular evolution, polymerase chain reaction.
Abbreviations: CPSase, carbamoyl-phosphate synthetase; PCR, polymerase chain reaction.
Correspondence: Dr J. P. Schofield, Clinic 12, Addenbrooke's Hospital, Cambridge CB2 2QQ, U.K.

INTRODUCTION

Traditional evolutionary studies have relied on the application of paleontology and comparative anatomy. From these studies a systematic view of evolution has been developed, with a central scheme that highly complex multicellular organisms evolved from simple unicellular ones over 600 million years ago. Nucleic acid sequence information has allowed molecular evolutionists to frame questions concerning the origin of cells and the structure of genes within the earliest forms. Molecular data accumulated over the past decade suggest that the common ancestor of all life arose between 3500 and 4000 million years ago [1]. It is now generally accepted that investigations based on the genotype rather than the phenotype are most revealing.

The first gene product to be studied extensively, and which established the concept of a molecular chronometer, was cytochrome c [2]. Although its analysis extended the understanding of the eukaryotic branch of the tree, it has been evolving too fast to be used at the earlier phylogenetic levels. Furthermore, this molecule is not found in many bacteria, and is not functionally constant. The quantitative molecular analysis of 16S ribosomal RNAs [3] from several hundreds of organisms has led to the conclusion that there are three separate and distinct cell lineages from which all modern cells are derived: eubacteria, archaebacteria and eukaryotes. This classification supersedes the more traditional division into prokaryotes and eukaryotes, as at the molecular level archaebacteria are no more closely related to prokaryotes than they are to eukaryotes. The subdivisions have recently been named Bacteria, Archaea and Eukarya [4]. No one of these lineages predates the other two, and all three were derived from a common ancestor, the progenote [4]. Whether the progenote was itself a true organism, or represented a prebiotic state of a primitive genetic order, is unresolved.

Eukaryotic genes, as well as a small number of prokaryotic and organellar genes, have long intervening unexpressed sequences (introns) dividing the coding sequence into pieces (exons). The existence of introns in contemporary genomes has led to several mechanistic and historical questions. The debate on the function and origin of introns continues, and this study deals solely with the question of how old introns are, i.e. whether they were present from the beginning or were inserted later during evolution. Since the discovery of introns 15 years ago [5, 6] many of the original concepts of the structure of genes have had to be completely revised. The 'introns-early' school proposes that introns were present in the progenote, and the trend since then has been towards loss [7, 8]. Intron loss [9] was explained to be a property of the ability of certain introns to self-splice as a remnant of the proposed original 'RNA world' [10]. All of the genes quoted as examples of exon shuffling are relatively modern in evolutionary terms. In order to examine the origin of introns, the structure of ancient genes encoding proteins fundamental to enzymic pathways in existence predating prokaryotes and eukaryotes (i.e. the progenote) must be studied [8].
A unique insight into the origin of introns would be obtained by studying a gene with an ancient tandem duplication (ancient referring to a gene with representatives from each of the three major domains). This was the reason for choosing the gene encoding for carbamoyl-phosphate synthetase (CPSase), an ancient gene with a tandem duplication and representatives from each of the three recently defined domains of Bacteria, Archaea and Eukarya. Any candidate gene product proposed as a molecular model from which to study the structural evolution of large vertebrate genes should have representatives from each of the three major lineages. Studies on the phylogenetic relationships using 16S ribosomal RNAs are limited when one begins to pose questions on the evolution of the structure of large eukaryotic genes. The gene product must not share the limitations of cytochrome c. Rather the gene should closely resemble the 16S rRNA parameters if it is to prove useful as a molecular chronometer (see above). The gene encoding for CPSase [carbamoyl-phosphate synthetase (glutamine-hydrolysing), EC 6.3.5.5] fulfils these criteria, being large, universally distributed, of constant function and highly conserved over great phylogenetic distance. Sequence data for the CPSase gene are published for Escherichia coli [11], Saccharomyces cerevisiae [12], Drosophila melanogaster [13], Syrian hamster [14] and rat [15].

Fig. 1. Synthesis of carbamoyl phosphate. The small (42 kDa) subunit, glutamine aminotransferase (GATase), catalyses the hydrolysis of glutamine. Cysteine (C) and histidine (H) are key active-site amino acids. The free amino group is released to the large (120 kDa) subunit. Two ATP molecules are required at nucleotide-binding sites (NBDs). The symmetry of structure and function between the two halves of the large subunit has suggested that each half acts in a separate but coordinated mechanism to catalyse the two partial reactions of the phosphorylation of bicarbonate to carboxy phosphate, and the phosphorylation of carbamate to carbamoyl phosphate.

CPSase catalyses the formation of the highly reactive compound carbamoyl phosphate, the immediate precursor for arginine and pyrimidine biosynthesis (Fig. 1). The enzyme is a dimer composed of a small (42 kDa) and a large (120 kDa) subunit. The small subunit catalyses the hydrolysis of glutamine (requiring active-site cysteine and histidine), releasing the free amino group to the large subunit [16]. The large subunit catalyses the formation of carbamoyl phosphate in a complex reaction between ammonia, carbon dioxide and water. In E. coli, the small and large subunits are encoded by the carA [17] and carB [18] genes, respectively. The amino acid translation reveals a high degree of homology (39%) between the N-terminal and C-terminal halves of the carB gene, suggesting that it has arisen from an ancient duplication of a smaller ancestral gene [11]. That the duplication occurred in the progenote would be strongly supported by confirmation of duplication in the CPSase gene of representatives from the two other major lineages, Archaea and Eukarya.

The synthesis of carbamoyl phosphate in higher eukaryotes is catalysed by two separate enzymes: CPSase I for arginine biosynthesis, and CPSase II for pyrimidine biosynthesis. These two enzymes are encoded by separate nuclear genes, and CPSase I is transported into the mitochondria, whereas CPSase II is active in the cytoplasm. By studying the molecular structure of the nuclear-encoded CPSase I gene, whose product is directed into the mitochondrion, insight into the structure of the gene as it existed when captured in the endosymbiont [19] may be inferred.
Then, by comparing the duplicated structure with that of the cytoplasmic CPSase II relative, the molecular structure of the common ancestor before gene duplication may also be inferred. For example, it should be possible to establish whether the CPSase II gene within the CAD gene (a multi-gene complex also encoding Aspartate carbamoyltransferase and Dihydroorotase activity for pyrimidine biosynthesis de novo) [14] was simply a copy from the CPSase I gene after integration of the latter into the host nuclear genome. This feature is important when considering the evolution of the two enzymes from a common ancestor.

Following the accumulation of data on the detailed structure of the CPSase I gene, it is a logical progression to investigate clinical conditions involving molecular defects of the gene. Gelehrter and Snodgrass [20] published the first clear case report of lethal neonatal hyperammonaemia secondary to an almost complete deficiency of CPSase I. An autosomal recessive mode of inheritance was assigned after a family study [21]. A severely hyperammonaemic infant who failed to respond to therapeutic intervention was reported [22]. The authors postulated that the patient either failed to transcribe CPSase mRNA or that the mRNA was transcribed but not translated. They concluded that a distinction between the two possibilities awaited the isolation of a human CPSase I cDNA probe to use in RNA hybridization studies. Fundamental therefore to the elucidation of the molecular defect(s) underlying CPSase I deficiency is the DNA sequence of the normal human CPSase I gene. This would be predicted to be a large project, as it would be inferred from evolutionary studies that the sequence of the human mRNA would closely resemble that of rat CPSase I, at around 5.7 kb [15].

METHODS

Genomic DNA isolation

DNA was isolated from freeze-dried Methanosarcina barkeri MS by the grinding method in liquid nitrogen [23].
Genomic DNA from the AX-2 strain of the slime mould Dictyostelium discoideum was a kind gift from R. Insall, MRC Laboratory of Molecular Biology, Cambridge, U.K. Rat, hamster and human high-molecular-mass genomic DNA was extracted and purified by standard protocols [24, 25].

RNA isolation

Total cellular RNA was extracted from normal human liver tissue by a single-step acid-phenol procedure [26]. Polyadenylated mRNA was rapidly purified from the total RNA by oligo(dT) affinity chromatography [27], using a poly(A) Quik mRNA column (Stratagene, La Jolla, CA, U.S.A.).

Human liver cDNA synthesis, library construction and screening

Synthesis of first-strand cDNA used purified human liver mRNA as a template in conjunction with avian myeloblastosis virus reverse transcriptase 121 ('Super-RT', Anglian Biotechnology, Colchester, U.K.). Second-strand cDNA replacement synthesis was by nick translation [28]. BamHI synthetic adaptor oligonucleotides were ligated on to the ends of the double-stranded cDNA to increase ligation efficiency into BamHI-digested λ vector arms. Before ligation the cDNA was size-selected to maximize the yield of full-length cDNA clones. Ligated cDNA replaced the 'stuffer' fragment of λ phage, and was packaged with a high-efficiency packaging mix (Stratagene). For full coverage of the liver cDNA library, 10^6 independent plaques were screened [29]. A high-specific-activity 1 kb CPSase I cDNA probe was generated by the random hexamer method [30]. Positive replica signals were plaque-purified by serial dilution platings. A human genomic DNA cosmid library in Lorist 6 vector with average insert size of 33-45 kb was a kind gift from L. Buluwela, MRC Laboratory of Molecular Biology, Cambridge, U.K.

Polymerase chain reaction (PCR)

The PCR [31] was used extensively, and new variations were developed. To prove that the carB gene of M.
barkeri MS was a duplication, the N-terminal-encoding half of the gene was amplified using a combination of a specific anti-sense primer and a degenerate sense primer [32]. Computer multiple-alignment of all known primary amino acid sequences of the large subunit of CPSase was performed on a DEC-VAX mainframe computer. The redundant sense primer was designed to amplify from the most highly conserved 5' gene sequence (Fig. 2), and SalI restriction enzyme recognition sites were added to the 5'-end of the primers to facilitate later cloning of the PCR product(s). Reactions were performed either in 0.5 ml Eppendorf tubes, or in thermostable polycarbonate plates (Hi-Temp 96; Techne, Cambridge, U.K., designed by J.P.S.) according to the number of reactions. The PCR mix was prepared on ice to minimize non-specific amplification. A 50 µl reaction mix contained: 0.5-1 µg of genomic DNA, 100 mmol/l neutralized deoxynucleotide triphosphates, 5 µl of 10 x reaction buffer (100 mmol/l Tris-HCl, pH 8.3 at 25°C, 500 mmol/l KCl and 15 mmol/l MgCl2), 1 µmol of each primer/l, 0.5 µl of Taq polymerase (2.5 units, Cetus; Norwalk, CT, U.S.A.) and sterile double-distilled water to 50 µl. The mix was overlaid with light mineral oil (50 µl), before a brief vortex and pulse centrifugation. A programmable thermocycler (Techne PHC-2, Cambridge, U.K.) was pre-heated to 95°C before incubating the reaction tubes in the machine to minimize non-specific amplification. Amplification profiles differed according to the hybridization temperature of the primer pair, as well as the predicted length of the product. An amplification profile for the D. discoideum CPSase II gene was 35 cycles of: 95°C strand dissociation for 0.5 min, 58°C primer annealing for 0.5 min and 72°C enzyme extension for 1 min (predicted product size of 2.4 kb, Fig. 3).

Fig. 2. Computer design of M. barkeri PCR primers. [Alignment panel not reproduced: conserved N-terminal and C-terminal sequence blocks of the CPSase large subunit from rat CPSase I, hamster CAD, Drosophila CAD, yeast URA2, D. discoideum, E. coli carB and M. barkeri MS, with the sense primer (256-fold redundancy) and the anti-sense primer, each carrying a SalI site.] Computer multiple alignment of all known primary amino acid sequences of the large subunit of CPSase was performed on a DEC-VAX mainframe computer. The alignment showed long stretches of highly conserved sequences between all species. This information was used in conjunction with the partial DNA sequence data available for the M. barkeri carB gene [23] to select the anti-sense primer RPSYVL. The primer was the reverse translation of the published nucleic acid sequence, and was extended at its 5' end to include a SalI restriction enzyme recognition site to facilitate cloning of the PCR product. To obtain the longest carB PCR product the most conserved sequence at the N-terminus was used from which to design a redundant sense primer QAGEFDY.

Fig. 3. PCR strategy for the CPSase gene of D. discoideum (DD). The PCR oligonucleotide primers, A and B, amplified a 2.4 kb single fragment, seen here run on a 0.6% agarose gel against λ HindIII size markers. Abbreviations: GATase, glutamine aminotransferase; DHOase, dihydro-orotase; ATCase, aspartate carbamoyltransferase.

For the rat and hamster CPSase genes, PCR primers were derived from the known cDNA sequences and were designed to flank computer-predicted intron sites (Fig. 4). As the length of the introns was unknown, the PCR cycle profiles were adjusted for individual primer pairs.
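As a rough check on the D. discoideum amplification profile quoted above, the programmed hold times can be added up and the extension step expressed per kilobase of predicted product. The sketch below is illustrative only: the class and method names are hypothetical, and ramp times between temperatures, which depend on the thermocycler, are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleProfile:
    """One PCR cycle, with hold times in minutes (ramp times are ignored)."""
    denature_min: float
    anneal_min: float
    extend_min: float
    cycles: int

    def total_hold_minutes(self) -> float:
        """Total programmed hold time over all cycles."""
        per_cycle = self.denature_min + self.anneal_min + self.extend_min
        return per_cycle * self.cycles

    def extension_min_per_kb(self, product_kb: float) -> float:
        """Extension time allowed per kilobase of predicted product."""
        if product_kb <= 0:
            raise ValueError("product_kb must be positive")
        return self.extend_min / product_kb


# Values taken from the text: 35 cycles of 0.5 min at 95 C, 0.5 min at 58 C
# and 1 min at 72 C, for a predicted 2.4 kb product.
dd_profile = CycleProfile(denature_min=0.5, anneal_min=0.5, extend_min=1.0, cycles=35)

print(dd_profile.total_hold_minutes())                 # 70.0
print(round(dd_profile.extension_min_per_kb(2.4), 2))  # 0.42
```

On these figures the extension step allows well under 1 min per kb of predicted product.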
In some instances non-specific amplification was only circumvented by the application of nested PCR [33].

DNA cloning and recombinant screening

Before cloning of cDNA or genomic DNA PCR products into M13 phage or plasmid host, one-tenth of the reaction was subjected to agarose mini-gel electrophoresis [25] to determine the number of bands, their size and approximate yield. If necessary, PCR products were further gel-purified and digested with appropriate restriction enzyme(s) according to established procedures [25]. Ligation was into a similarly digested host vector, and transformed into competent E. coli cells. Recombinant screening was rapidly performed by PCR in thermostable polycarbonate plates [34], using Universal M13 forward and reverse sequencing primers as PCR primers. Mini-preparation of plasmid recombinant DNA [35] provided a sufficiently pure template for DNA sequencing.

Fig. 4. PCR strategy for intron/exon gene dissection. (a) PCR across intron/exon boundaries. Oligonucleotide primers used in PCR amplification (e.g. 1A/1B) were recessed from the predicted intron-exon junction to facilitate rapid sequence orientation of the known cDNA open reading frame to the non-coding intron sequence. (b) Nested PCR of large DNA fragments. Primers A and B amplified a large product, which served as input template for subsequent internal amplifications across predicted intron-exon boundaries, e.g. e+h, g+k, m+B, etc.

DNA sequencing

DNA sequencing was by modifications of the dideoxy chain termination method [25]. Alternative methods were also used to avoid cloning and recombinant screening of PCR products before DNA sequencing.
The two most reliable were the techniques of solid-phase sequencing of 5'-biotin-labelled PCR products bound to streptavidin-coated paramagnetic beads [36], and linear amplification sequencing [37]. In the latter method chain termination sequencing was with four spectrally distinct fluorescent dye-labelled dideoxynucleotides (DyeDeoxy Terminators; Applied Biosystems, Foster City, CA, U.S.A.). The terminated products were electrophoresed on an Applied Biosystems 373A semi-automated sequencing machine consisting of a laser excitation source coupled to a microcomputer for data acquisition and analysis. Sequence data for human CPSase I cDNA were entered into a computer database, and contigs were joined and assembled using the Staden packages run on a DEC-VAX mainframe computer.

RESULTS

M. barkeri MS carB gene

Semi-redundant PCR of M. barkeri MS genomic DNA resulted in several products. The inter-primer predicted distance was around 2 kb, yet the dominant product was much smaller at around 0.4 kb. The sequence of this product revealed that the redundant primer had annealed to the similar sequence at the 5' end of the C-terminal-encoding half rather than the 5' end of the N-terminal-encoding half of carB. The redundant sense primer was redesigned, as well as a new anti-sense primer, to inhibit dual priming of the sense primer. A product of the predicted 1.6 kb (Fig. 5) was amplified and directly sequenced on the ABI 373A automated sequencer. Sufficient sequence information was determined to clearly demonstrate that the carB gene of the Archaea M. barkeri MS has an internal duplication, and that the duplication is at an equivalent position to that in the E. coli carB gene.

D. discoideum CPSase II gene

The complete nucleotide and derived amino acid sequences of the D. discoideum CPSase II gene within the PYR1-3 multigene [38] confirm a clear gene duplication [39]. Alignment of the N- and C-terminal halves shows 28.7% sequence identity and 51% sequence similarity.
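To make the identity figure concrete, the following is a minimal sketch of how percent identity can be computed from two already-aligned halves. It is not the author's procedure: the function name and the toy sequences are hypothetical, and the denominator used here (columns where neither sequence has a gap) is only one of several conventions, which the text does not specify. Sequence similarity would additionally need a scoring scheme for conservative substitutions, which is not shown.

```python
def percent_identity(aligned_a: str, aligned_b: str, gaps: str = "-.") -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a in gaps or b in gaps:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (hypothetical, not CPSase sequence).
toy_n_half = "ACDE-GHIK"
toy_c_half = "ACNEFGHLK"

print(percent_identity(toy_n_half, toy_c_half))  # 75.0
```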
There are 3126 nucleotides of open reading frame, encoding 1042 amino acids, and no introns (EMBL no. X55433). These data establish that the CPSase II gene of the Eukarya D. discoideum has a tandem duplicated structure, which it shares with the Bacteria E. coli carB and the Archaea M. barkeri MS carB genes.

Genomic organization of rat CPSase I and hamster CPSase II genes

In contrast to the 13 introns of the rat CPSase I gene encoding the C-terminal half, the N-terminal half is expanded by only eight introns. All the boundaries conform to the GT-AG consensus sequence for nuclear pre-mRNA introns [40]. These introns add a further ~13 kb to the 1656 nucleotides of the exon sequence. The gene spans approximately 43 kb, divided between 14.6 kb for the gene encoding the N-terminal half and 28.7 kb for the C-terminal half. Several areas of the gene were only amplified by using a nested approach, and primer pairs were designed to overlap both upstream and downstream of predicted intron positions to ensure complete coverage. The first intron of the gene encoding the N-terminal half is one codon downstream from the predicted site when compared with intron 2 of the C-terminal half. The other concordant intron position is intron 5 of the N-terminal half with intron 9 of the C-terminal half. This intron is in exactly the same place and phase in each half of the duplicated CPSase I gene.

The 3.2 kb CPSase II cDNA sequence of Syrian hamster CAD [14] is a duplication of 1.6 kb halves. PCR amplification of each half demonstrated a predominant 6.6 kb product for the N-terminal-encoding half and ~3.8 kb for the C-terminal-encoding half, the size difference being accounted for by introns (Fig. 6). Cloning of these products consistently failed, most likely as a result of insert instability. Definition of the intron-exon structure of the hamster CPSase II gene was achieved by secondary amplification from the large PCR products as a template in conjunction with internal primer pairs. The gene is composed of 17 introns, divided between eight introns in the N-terminal-encoding half and nine in the C-terminal-encoding half. All the introns observed the GT-AG consensus [40], the intron lengths ranging from ~0.1 to ~3 kb. A computer alignment of the structures for the rat CPSase I and hamster CPSase II genes indicated clear homology, with a common tandem duplication structure. A comparison between the intron positions for each half of each gene indicates that at least two pairs are concordant, e.g. intron 5 is concordant between all halves of rat and hamster CPSase genes. Several other introns are concordant, e.g. between the gene encoding the N-terminal half of hamster CPSase II and the C-terminal half of rat CPSase I (Fig. 7).

Fig. 6. Large fragment amplification of the hamster CAD CPSase II gene. The gel photographs show the results of amplifying each half of the tandem gene duplication. These products were themselves used for internal PCR at predicted intron positions (Fig. 4b).

Human CPSase I gene

It was predicted that the strong homology for CPSase I would apply to the human liver mRNA. A pair of rat CPSase I primers designed to amplify the N-terminal-encoding half of the large subunit were used to amplify human liver cDNA. The product yield was increased by performing a second round nested amplification with a pair of internal primers (Fig. 8). Cloning and sequencing of the 1 kb product confirmed it as encoding CPSase I with high sequence homology to rat, but not absolute identity and therefore not a contaminant. A human liver library of high titre (5 x 10' plaque-forming units/µg of cDNA) was constructed and probed under stringent conditions (65°C overnight) with radiolabelled human CPSase I cDNA PCR product. Screening resulted in five purified plaques, from which DNA was purified and sequenced. The nucleotide sequence for human CPSase I was obtained from both strands to confirm the sequence (J. P. Schofield, unpublished work). There is 98% amino acid sequence identity with rat CPSase I, with tandem duplication of the large subunit. A human genomic cosmid library screen for the CPSase I gene resulted in two clones. Restriction enzyme analysis estimates that the two clones span ~46 kb of the human CPSase I gene. Partial sequencing of one of the clones confirms that it contains the CPSase I sequence.

DISCUSSION

Genes with ancient duplications provide unique information on their evolution. The highly conserved product of the CPSase I gene is a powerful new molecular model for gene evolution. In proposing that the tandem gene duplication had occurred in the progenote it is clearly important to provide representatives from each of the three domains: Bacteria, Archaea and Eukarya [4]. During DNA sequencing upstream of the argC gene in the Archaea M. barkeri MS, Morris and Reeve [23] made the chance discovery of the 3' end of the carB gene. Unfortunately, there was insufficient DNA sequence information to establish whether there was a tandem gene duplication like that in the carB gene of E. coli. There was, however, sufficient sequence to apply successfully an adaptation of the PCR using a redundant amplification primer [32]. If this technique had not been available, a genomic library would have been required, and probed with a 5' sequence from the known carB gene. The carB gene of M. barkeri has now been proven to be a tandem duplication, the junction occurring at the same position as in E. coli carB. Tandem duplications have been clearly demonstrated in each of the three domains. The evidence that the CPSase gene duplicated in their common ancestor, i.e. the progenote, is now conclusive. Having established this single duplication event, the question now focuses on the origin of introns, using the CPSase gene as a unique model.
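The intron comparisons reported in the Results rest on a simple idea: two introns are concordant when they fall at the same position of the aligned halves and in the same codon phase. A minimal sketch of that check follows, assuming each intron is recorded as a residue index in the ungapped sequence plus a phase of 0, 1 or 2. The function names, the data layout and the toy sequences are hypothetical and do not reproduce the author's computer alignment.

```python
from typing import List, Tuple

# (0-based residue index in the ungapped sequence, codon phase 0/1/2)
Intron = Tuple[int, int]


def residue_to_column(aligned: str, gaps: str = "-.") -> List[int]:
    """For each residue of the ungapped sequence, return its alignment column."""
    return [col for col, ch in enumerate(aligned) if ch not in gaps]


def concordant_introns(
    aligned_a: str, introns_a: List[Intron],
    aligned_b: str, introns_b: List[Intron],
) -> List[Tuple[Intron, Intron]]:
    """Pair introns that share both alignment column and phase."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    col_a = residue_to_column(aligned_a)
    col_b = residue_to_column(aligned_b)
    # Index the second half's introns by (alignment column, phase).
    sites_b = {(col_b[res], phase): (res, phase) for res, phase in introns_b}
    pairs = []
    for res, phase in introns_a:
        key = (col_a[res], phase)
        if key in sites_b:
            pairs.append(((res, phase), sites_b[key]))
    return pairs


# Toy aligned halves and intron positions (hypothetical, not CPSase data).
n_half = "MKLG-SGGAT"
c_half = "M--GQSGGAT"
n_introns = [(2, 0), (6, 1)]
c_introns = [(1, 2), (5, 1)]

print(concordant_introns(n_half, n_introns, c_half, c_introns))
# [((6, 1), (5, 1))]
```

In the toy data the second intron of each half maps to the same alignment column with the same phase, so it is reported as concordant even though its residue index differs between the halves. A near miss such as the one-codon offset described in the Results would need an explicit tolerance, which this sketch deliberately omits.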
Introns had previously been described in the 3' half of the rat CPSase I gene [15]. The hypothesis was that if introns were present in the common progenotic ancestral gene before duplication, then several, if not all, should be in concordant positions when comparing the two halves of intron-containing genes. As the rat CPSase I cDNA and the partial gene sequence were known, by performing a computer alignment of the two cDNA halves and marking the position of the known introns, predictions of the position of the remaining 5' introns could be made. The PCR was used in a novel application to amplify across predicted intron-exon boundaries. The alternatives would have been heteroduplex mapping or genomic library screening and sequencing. For this particular requirement the former would have been too insensitive as exact intron-exon boundaries were necessary for comparative purposes. A potential major limitation of the PCR-based technique is the upper size limit of intron which can be amplified. This research demonstrates that prolonged extension times of 1 min/kb of target template are excessive, and by similarly decreasing the annealing and extension times the total cycling time can be significantly reduced. There is the added benefit of retaining activity of the relatively thermostable enzyme Taq polymerase, and so increasing the amplification efficiency. Using these techniques the 5' half of the rat CPSase I gene was shown to have eight introns, two of which were concordant with those in homologous positions in the 3' half. If introns had been inserted after the duplication event, coincidences would have been highly unlikely. The conclusion is that the two concordant introns were already present in the single ancestral gene before duplication. Whether they have been selectively retained because they are at significant positions separating functional domains of the translated protein remains to be elucidated, as little information is currently available on the CPSase folded protein structure.

... the situation is rather more complex, with the introns being of various ages, some truly ancient, with others having been inserted or lost. To provide further supportive information, the Syrian hamster CPSase II gene within the CAD multi-gene complex [14] was similarly dissected by the PCR. The cDNA sequence was computer aligned with the rat sequence, and the intron positions of the latter were marked. The primary amino acid sequences of rat CPSase I and hamster CPSase II are highly homologous, the products of a common tandemly duplicated ancestor. The hamster gene was found to contain 18 introns, nine for each half. One of these is in the same position and phase in each half, as well as concordant with one of the proposed ancient pair of introns in the rat CPSase I gene. Furthermore, three single introns in the hamster gene are concordant with other rat introns, suggesting that these too are ancient in origin. The fact that the rat CPSase I and hamster CPSase II genes have different intron-exon structures proves that one is not simply a duplication of the same nuclear-encoded gene. Rather, it is likely that the CPSase I gene was introduced into the eukaryotic nuclear genome along with the majority of other mitochondrial genes after endosymbiosis. In contrast, the CPSase II gene was probably independently acquired from another source.

Fig. 7. Alignment of rat and hamster CPSase. [Alignment panel not reproduced.] Primary amino acid sequences for the N-terminal (RATNSEQ, HAMNSEQ) and C-terminal (RATCSEQ, HAMCSEQ) halves of rat and hamster CPSase were computer aligned. Intron positions are indicated (v). The most highly conserved ancient introns between the rat and hamster genes are boxed.

Further support for an ancient origin of some introns in the CPSase gene was sought by completing the DNA sequence of an early eukaryote, the slime mould D. discoideum. Faure et al. [38] obtained a partial sequence from each end of the CPSase II gene in the PYR1-3 gene complex (equivalent to CAD). They predicted that the CPSase moiety would be intron-less, as they had found to be the case for the rest of the PYR1-3 multi-gene. However, some D. discoideum genes contain very short introns, and so the missing ~2.4 kb of the CPSase gene was amplified by PCR as a preliminary to DNA sequencing. This confirmed that the gene was again a tandem duplication with the same junction between halves as for other species. The gene was uninterrupted, as had been predicted. A further practical problem arose, namely the difficulty in sequencing the cloned D. discoideum DNA when in a pUC plasmid. This was circumvented by subcloning into M13, as well as using the solid-phase sequencing technique to walk along the 2.4 kb PCR product [39].

In conclusion, the CPSase gene is a tandemly duplicated progenotic gene. The single gene contained several introns before duplication in the progenote. All of the introns have been lost from Bacteria and Archaea as a result of selective evolu-
With regard to the remaintionary pressure to streamline their genomes. Simiing discordant introns, it is only possible to specularly the introns were lost from the Eukarya D. dislate that either they were inserted later or are the coideum. The structure of the CPSase gene in rat residue of a much larger number of introns in the and hamster is a mosaic of introns of various ages, ancestral gene, some of which have been randomly the concordant pairs being the most ancient. An lost with the passage of time. It is more likely that alternative hypothesis is that the ancestral gene was RATNSEO RATCSEO HAMNSEO HAMCSEO - u I . Evolution of the carbamoyl-phosphatase synthetase gene 5’ I I CPSase A 4 I I I * CPSase B I27 3’ 4 12B 13B I Ikb -9 Fig. 8. Nestec before cloning maximize the p run against I I uninterrupted, introns being inserted later during evolution. The knowledge of the intron-exon structure, and techniques accumulated from the previous experiments were applied to the isolation and sequencing of human CPSase I cDNA. This is the first step towards a molecular understanding of the rare autosomal recessive disease of CPSase I deficiency. Cosmids have also been isolated as part of the goal to develop a genomic DNA analysis of diseased patients and carriers. This would then allow a simple blood sample to be analysed as a useful screening procedure. Either of the primers could be modified to facilitate a simple colorimetric assay after PCR, and before direct DNA sequencing of the amplified disease locus (or loci). There are several possible modifications of the techniques described to be explored, applicable to both CPSase I deficiency and other more common dieases. ACKNOWLEDGMENTS This work was supported by an MRC Training Fellowship. I am indebted to Professor Sydney Brenner for his continuing support, advice and encouragement. REFERENCES I. Fox GE, Stackebrandt RB, Hespell 1, et a1 The phylogeny of prokaryotes. Science (Washington, DC) 1980; 20% 45763. 2. 
Fitch WM, Margoliash E. Construction of phylogenetic trees. Science (Washington, DC) 1967; 155 27-4. qested primers (3A and 128) were used t o generate a single product screening. The most 5’ sequence of the large subunit was amplified t o rose gel photograph shows identical samples of PCR products ( - I kb) 3. Woese CR, Fox GE. Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci USA 1977; 7 4 5088-90. 4. Woese CR, Kandler 0, Wheelis ML. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eukarya. Proc Natl Acad Sci USA 1990 81: 4576-9. 5. Berget SM, Moore, C, Sharp PA. Spliced segments at the 5’ terminus of adenovirus 2 late mRNA. Proc Natl Acad Sci USA 1977; 7 4 3171-5. 6. Chow LT, Gelinas RE, Broker TR, Roberts RT. An amazing sequence arrangement at the 5‘ ends of adenovirus 2 messenger RNA. Cell 1977; I 2 1-8. 7. Darnell JE. Doolittle WF. Speculations on the early course of evolution. Proc Natl Acad Sci USA 1986; 8 3 1271-5. 8. Gilbert W, Marchionni M, McKnight G. On the antiquity of introns. Cell 1986; 46: 151-3. 9. Scraphin B, Boulet A, Simon M, Faye G. Construction of a yeast strain devoid of mitochondrial introns and its use t o screen nuclear genes involved in mitochondrial splicing. Proc Natl Acad Sci USA 1987; 84. 68104. 10. Joyce GF. RNA evolution and the origins of life. Nature (London) 1989; 338: 217-23. I I. Nyunoya H, Lusty CJ. The car6 gene of Escherichia coli: a duplicated gene coding for the large subunit of carbamoyl phosphate synthetase. Proc Natl Acad Sci USA 1983; 80: 4629-33. 12. Lusty CJ, Widgren EE, Broglie KE, Nyunoya H. Yeast carbamyl phosphate synthetase. J Biol Chem 1983; 258: 14466-72. 13. Freund IN, Jarry BP. The rudimentary gene of Drosophila melanogaster encodes four enzymatic functions. J Mol Biol 1987; I 9 3 1-13. 14. Simmer JP, Kelly RE, Rinker AG, Scully JL, Evans DR. Mammalian carbamyl phosphate synthetase (CPS). 
Proc Natl Acad Sci USA 1990; 265: 10395-402.
15. Nyunoya H, Broglie KE, Widgren EE, Lusty CJ. Characterisation and derivation of the gene coding for mitochondrial carbamyl phosphate synthetase I of rat. J Biol Chem 1985; 260: 9346-56.
16. Trotta PP, Pinkus LM, Haschemeyer RH, Meister A. Reversible dissociation of the monomer of glutamine-dependent carbamyl phosphate synthetase into catalytically active heavy and light subunits. J Biol Chem 1974; 249: 492-9.
17. Pierard A, Glansdorff N, Mergeay M, Wiame JM. Control of the biosynthesis of carbamoyl phosphate in Escherichia coli. J Mol Biol 1965; 14: 23-36.
18. Mergeay M, Gigot D, Beckmann J, et al. Physiology and genetics of carbamoyl phosphate synthesis in Escherichia coli K12. Mol Gen Genet 1974; 133: 299-316.
19. Yang D, Oyaizu Y, Oyaizu H, Olsen GJ, Woese CR. Mitochondrial origins. Proc Natl Acad Sci USA 1985; 82: 4443-7.
20. Gelehrter TD, Snodgrass PJ. Lethal neonatal deficiency of carbamyl phosphate synthetase. N Engl J Med 1974; 290: 430-3.
21. McReynolds JW, Crowley B, Mahoney MJ, Rosenberg LE. Autosomal recessive inheritance of human mitochondrial carbamyl phosphate synthetase deficiency. Am J Hum Genet 1981; 33: 345-53.
22. Graf L, McIntyre P, Hoogenraad N, et al. A carbamyl phosphate synthetase deficiency with no detectable immunoreactive enzyme and no translatable mRNA. J Inher Metab Dis 1984; 7: 104-6.
23. Morris CJ, Reeve JN. Conservation of structure in the human gene encoding argininosuccinate synthetase and the argG genes of the archaebacteria Methanosarcina barkeri MS and Methanococcus vannielii. J Bacteriol 1988; 170: 3125-30.
24. Blin N, Stafford DW. A general method for isolation of high molecular weight DNA from eukaryotes. Nucleic Acids Res 1976; 3: 2303-8.
25. Sambrook J, Fritsch EF, Maniatis T, eds. Molecular cloning: a laboratory manual. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, 1989.
26. Chomczynski P, Sacchi N. Single-step method of RNA isolation by guanidinium thiocyanate-phenol-chloroform extraction. Anal Biochem 1987; 162: 156-9.
27. Aviv H, Leder P. Purification of biologically active globin messenger RNA by chromatography on oligothymidylic acid-cellulose. Proc Natl Acad Sci USA 1972; 69: 1408-12.
28. Gubler U, Hoffman BJ. A simple and very efficient method for generating cDNA libraries. Gene 1983; 25: 263-9.
29. Benton WD, Davis RW. Screening λgt recombinant clones by hybridisation to single plaques in situ. Science (Washington, DC) 1977; 196: 180-2.
30. Feinberg AP, Vogelstein B. A technique for radiolabelling DNA restriction endonuclease fragments to high specific activity. Anal Biochem 1984; 137: 266-7.
31. Saiki RK, Gelfand DH, Stoffel S, et al. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science (Washington, DC) 1988; 239: 487-91.
32. Girgis SI, Alevizaki M, Denny P, Ferrier GJM, Legon S. Generation of DNA probes for peptides with highly degenerate codons using mixed primer PCR. Nucleic Acids Res 1988; 16: 10371.
33. Mullis K, Faloona F, Scharf S, Saiki R, Horn G, Erlich H. Specific enzymatic amplification of DNA in vitro. Cold Spring Harbor Symp Quant Biol 1986; 51: 263-73.
34. Schofield JP, Vaudin M, Kettle S, Jones DSC. A rapid semi-automated microtiter plate method for analysis and sequencing by PCR from bacterial stocks. Nucleic Acids Res 1989; 17: 9498.
35. Jones DSC, Schofield JP. A rapid method for isolating high quality plasmid DNA suitable for DNA sequencing. Nucleic Acids Res 1990; 18: 7463-4.
36. Schofield JP, Vaudin M, Jones DSC. Fluorescent and radioactive solid phase dideoxy sequencing of PCR products in microtitre plates. Methods Enzymol 1992 (In press).
37. Craxton M. Linear amplification sequencing, a powerful method for sequencing DNA. Methods (A companion to Methods Enzymol) 1991; 3: 20-6.
38. Faure M, Camonis JH, Jacquet M. Molecular characterisation of a Dictyostelium discoideum gene encoding a multifunctional enzyme of the pyrimidine pathway. Eur J Biochem 1989; 179: 345-58.
39. Elgar G, Schofield JP. Carbamoyl phosphate synthetase (CPSase) in the PYR1-3 multigene of Dictyostelium discoideum. DNA Sequence 1991; 2: 219-26.
40. Breathnach R, Benoist C, O'Hare K, et al. Ovalbumin gene: evidence for a leader sequence in mRNA and DNA sequences at the exon-intron boundaries. Proc Natl Acad Sci USA 1978; 75: 4853-7.