JOURNAL OF EXPERIMENTAL ZOOLOGY (MOL DEV EVOL) 294:47–62 (2002)

Genomic Analysis of Hox Clusters in the Sea Lamprey Petromyzon marinus

STEVEN Q. IRVINE1, JANET L. CARR2†, WENDY J. BAILEY3, KAZUHIKO KAWASAKI4, NOBUYOSHI SHIMIZU5, CHRIS T. AMEMIYA6, AND FRANK H. RUDDLE1*

1 Yale University, Department of Molecular, Cellular and Developmental Biology, New Haven, Connecticut
2 Genaissance Pharmaceuticals, New Haven, Connecticut
3 Merck and Co., Inc., Department of Bioinformatics, West Point, Pennsylvania
4 Department of Anthropology, Pennsylvania State University, University Park, Pennsylvania
5 Keio University School of Medicine, Department of Molecular Biology, Tokyo, Japan
6 Virginia Mason Research Center, Seattle, Washington

ABSTRACT

The sea lamprey Petromyzon marinus is among the most primitive of extant vertebrates. We are interested in the organization of its Hox gene clusters, because, as a close relative of the gnathostomes, this information would help to infer Hox cluster organization at the base of the gnathostome radiation. We have partially mapped the P. marinus Hox clusters using phage, cosmid, and P1 artificial chromosome libraries. Complete homeobox sequences were obtained for the 22 Hox genes recovered in the genomic library screens and analyzed for cognate group identity. We estimate that the clusters are somewhat larger than those of mammals (roughly 140 kbp vs. 105 kbp) but much smaller than the single Hox cluster of the cephalochordate amphioxus (at more than 260 kb). We never obtained more than three genes from any single cognate group from the genomic library screens, although it is unlikely that our screen was exhaustive, and therefore conclude that P. marinus has a total of either three or four Hox clusters. We also identify four highly conserved non-coding sequence motifs shared with higher vertebrates in a genomic comparison of Hox 10 genes. J. Exp. Zool. (Mol. Dev. Evol.) 294:47–62, 2002. © 2002 Wiley-Liss, Inc.
† Co-first author.
Grant sponsor: National Science Foundation; Grant numbers: IBN9630567 (FHR), DBI-9803937 (SQI); Grant sponsor: National Institutes of Health; Grant number: GM-09966 (FHR); Grant sponsor: National Science Foundation/Alfred P. Sloan Postdoctoral Research Fellowship in Molecular Evolution; Grant number: DBI-9803937
*Correspondence to: Dr. Frank H. Ruddle, Department of Molecular, Cellular and Developmental Biology, Yale University, P.O. Box 208103, New Haven, CT 06520–8103. Email: [email protected]
Received 14 August 2001; Accepted 10 January 2002
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/jez.10090

The Hox classes of homeobox transcription factor genes are conserved in all animals and have critical roles in developmental patterning (DeRobertis, ’94). The discovery that these genes are arrayed in conserved genomic clusters and that their position in the cluster is related to their expression patterns along the body axis is one of the most striking findings of modern biology (Gehring, ’94). All invertebrates have a single Hox cluster. All vertebrates examined to date, on the other hand, have multiple Hox clusters, with as many as seven found, for example, in the zebrafish (Amores et al., ’98). The ubiquitous presence of multiple Hox clusters in vertebrates, given the single cluster in invertebrates, has led to the idea that cluster multiplicity is of great importance in the evolution of vertebrate lineages and in the control of developmental patterning (Holland and Garcia-Fernandez, ’96a; Ruddle et al., ’99). A history of cluster duplications in the vertebrates would help explain the significance of different cluster numbers in various lineages. If the history of the clusters varies between lineages it would have profound implications for the evolution of the gene orthologs and their associated regulatory sequences. For example, if there have been independent cluster duplications within separate vertebrate lineages, such as occurred in teleost fish, it means that these independently derived clusters are not directly homologous to clusters in other vertebrates, and have their own evolutionary histories.

The agnathan vertebrates are the most primitive true vertebrate taxa extant (Forey and Janvier, ’93). Thus they occupy a pivotal phylogenetic position between the invertebrate cephalochordates such as amphioxus, which has one cluster (Garcia-Fernandez and Holland, ’94), and the gnathostomes, or all other living vertebrates, which have multiple clusters. While most agnathan groups are extinct, two still survive, the lampreys and hagfishes, forming a group called the cyclostomes. A traditional view based on morphological study holds that the lampreys are more closely related to gnathostomes than hagfishes are (Forey and Janvier, ’93; Rasmussen et al., ’98). Molecular studies have found either a monophyletic grouping of lampreys and hagfishes (Mallatt and Sullivan, ’98; Stock and Whitt, ’92) or support for the traditional view (Rasmussen et al., ’98). Because the lamprey is far more amenable to laboratory culture and embryology than the hagfish, we have chosen to assess Hox cluster organization in the sea lamprey Petromyzon marinus. Regardless of the phylogenetic relationships within the cyclostomes, examination of the Hox cluster in the lamprey will help to reconstruct the ancestral state of the Hox clusters at the initial radiation of the vertebrates.

Three previous studies (Pendleton et al., ’93; Sharman and Holland, ’98; W. J. Bailey, unpublished) have used polymerase chain reaction (PCR) screening to estimate gene composition and cluster number in lampreys. These data, in combination with the present mapping study in the sea lamprey Petromyzon marinus, indicate that either three or four Hox clusters exist in this species.
We discuss the implications of this prediction for the evolution of vertebrate Hox clusters.

MATERIALS AND METHODS

P. marinus libraries

Three P. marinus genomic libraries were used in this study. The first was a cosmid library constructed by a commercial laboratory (Stratagene, La Jolla, CA) (Pendleton et al., ’93). The second was a lambda phage library made as outlined below. Finally, we screened a P1 Artificial Chromosome (PAC) library (Amemiya et al., ’96).

Construction and plating of phage library

Lamprey genomic DNA prepared from embryos from a single-pair mating was provided by J. W. Pendleton. DNA was partially digested with Sau3AI using 0.4–0.6 units for 30 µg of DNA in 1 ml, 37°C for 1 hour. Size-fractionated DNA was ligated with LambdaGEM-11 BamHI arms (Promega, Madison, WI), packaged in GigaPack Gold (Epicentre Technologies, Madison, WI) and titered using KW251 cells. For the primary screen of the library, 8 × 10^5 pfu total were screened by hybridization to a Hox cognate group 11 probe (see below).

Analysis of genomic library clones

Primary picks were amplified by PCR with a 3′ modification of the HoxE and HoxF primers (Bartels et al., ’93) that allows them to be used with the CloneAMP pAMP1 kit (GibcoBRL, Rockville, MD). PCR was performed as in Bartels et al. (’93). PCR products were cloned into the pAMP1 cloning vector following manufacturer’s guidelines. Five clones from each phage isolate were sequenced, and this sequence was used to give the clone a tentative identity. Identity of the clones was confirmed by direct sequencing from phage clones using 32P-end-labeled oligonucleotide primers and the Thermosequenase (Amersham, Piscataway, NJ) cycle sequencing kit. Phage, cosmid, and PAC clones were restriction mapped for the following enzymes: EcoRI, XhoI, NotI, SfiI, and SpeI, using endprobes, oligonucleotides, homeobox probes, et al. specific to certain phage clones.
All phage clones were cut with a series of three restriction enzymes: AvaII, BanI, and HincII. These digests were run on an agarose gel, blotted, and the filters were probed with dig-labeled oligos to the T7 or SP6 promoter sites. For each “end,” a restriction site was chosen that cut the DNA between 500 bp and 1 kbp from the promoter site. The phage DNA was then digested with that enzyme and used as a template for synthesis of riboprobe.

DNA probes were labeled by PCR using digoxigenin-11-dUTP (10X DNA labeling mix, Roche, Indianapolis, IN). Primers 5E5 and HoxF were used and lamprey group 11 homeoboxes were used as template. Oligonucleotide probes were end-labeled with digoxigenin-11-ddUTP (Roche) and terminal transferase (Roche) following the manufacturer’s instructions with the following modifications: the reaction mixture was incubated at 37°C for 30 minutes and no further purification was performed before hybridization. Riboprobes were prepared by incubating linearized DNA with dig RNA labeling mix (Roche), RNA polymerase buffer, and either the T7 or SP6 RNA polymerase in the presence of an RNAse inhibitor (RNAsin; Promega) at 40°C for 30–60 minutes.

Phylogenetic and genomic analysis

Homeobox nucleotide and conceptual amino acid sequences from mouse and amphioxus Hox cognate groups 1 to 11 were obtained from GenBank. Homeodomain amino acid sequences were analyzed by neighbor-joining trees constructed using the Neighbor program, based on protein distance matrices derived using the PAM-Dayhoff option of ProtDist, both programs from the Phylip 3.5 package (Felsenstein, ’95). Genomic sequences were aligned and compared using PipMaker (Schwartz et al., 2000) available at http://nog.cse.psu.edu/pipmaker. Low complexity and repeat sequences were masked using RepeatMasker (Smith and Green, unpublished data) available at http://ftp.genome.washington.edu/cgi-bin/RepeatMasker.

RESULTS

Three separate P.
marinus genomic libraries were screened for Hox genes by filter hybridization. Three positive clones from a cosmid library were obtained and restriction mapped. CosA2 of approximately 34 kb contains a Hox 1 gene. Two overlapping clones, Cos4A4 and Cos2B, comprise 68 kb of total sequence and contain Hox5, Hox6, Hox7, Hox8, and Hox9 genes. Other contigs were constructed using a lambda phage library for genomic walking. Two additional large contigs were obtained in this manner. One from nine overlapping clones of 38 kb total contains Hox2 and Hox3 genes. The other, also from nine overlapping clones, is 52 kb in length and contains Hox5/6, Hox6/7, and Hox8 genes. A remaining seven contigs and five individual phage clones contained single Hox genes, which we were unable to link with other clones. These results are summarized in Figure 1. Detailed restriction maps are available on request.

Estimated cluster size

The average spacing of the 12 linked Hox genes is 15.6 kb, slightly more than the average spacing of genes in the mouse Hox clusters, which is 13.8 kb. Assuming a similar average number of genes per cluster (8.75 in the mouse), that would give an estimate of average cluster size of about 136 kb. This size is much smaller than that of the amphioxus cluster, which is 260 kb in length from the Hox1 to the Hox10 genes.

Gene orthology

The entire homeobox was sequenced from each clone that had a unique restriction map. Nucleotide sequences are available in GenBank, accession numbers AF410908–AF410925. The deduced amino acid sequences are shown aligned to mouse and amphioxus sequences in Figure 2. To assign genes to Hox cognate groups, the complete homeodomain amino acid sequences from P. marinus were aligned with those from mouse and amphioxus. This alignment was then used in a neighbor-joining analysis resulting in the bootstrapped tree of Figure 3. The known mouse Hox gene cognates cluster together on the neighbor-joining tree in all cases.
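The cluster-size estimate given under "Estimated cluster size" is simply the product of two figures quoted in the text, and can be checked with a minimal sketch. The function and constant names below are illustrative and do not come from the paper; only the inputs (15.6 kb average spacing in the lamprey, 8.75 genes per cluster in the mouse) are taken from the text.

```python
def estimate_cluster_size_kb(mean_spacing_kb: float, genes_per_cluster: float) -> float:
    """Approximate cluster length as mean gene spacing times genes per cluster."""
    return mean_spacing_kb * genes_per_cluster


# Inputs quoted in the text; the names are hypothetical.
LAMPREY_MEAN_SPACING_KB = 15.6   # average spacing of the 12 linked lamprey Hox genes
MOUSE_GENES_PER_CLUSTER = 8.75   # assumed to hold for the lamprey as well

lamprey_estimate_kb = estimate_cluster_size_kb(
    LAMPREY_MEAN_SPACING_KB, MOUSE_GENES_PER_CLUSTER
)
print(f"Estimated lamprey cluster size: {lamprey_estimate_kb:.1f} kb")
```

The result, 136.5 kb, matches the "about 136 kb" stated above. The estimate inherits the assumption that lamprey clusters carry roughly as many genes as mouse clusters.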
Note that for the posterior (i.e., 5′) cognate groups the amphioxus genes do not cluster with those of mouse and lamprey, a finding termed “posterior flexibility” by Ferrier et al. (2000). Excluding the amphioxus genes, one or more lamprey homeodomains clustered with the complete complement of mouse cognates for groups 1, 2, 3, 8, and 11 with bootstrap confidence levels of 97% or greater. Groups 4, 9, and 10 are recovered at lower bootstrap proportions. Because of the high degree of amino acid similarity between homeodomains of groups 5, 6, and 7, these groups were not resolved in the tree. We attempted to resolve the relationships between these groups by including 3 residues N-terminal and 6 residues C-terminal of the homeodomain for each gene in a neighbor-joining analysis. These additional data, which include some of the conserved residues outside the homeodomain (Sharkey et al., ’97), failed to resolve the relationships with significant bootstrap proportions. (For example, Amphi-Hox5, the three mouse Hox5 genes, and PmHoxN5 and PmHoxJ5/6/7 genes form a clade, but the bootstrap proportion is only 55%; data not shown.) Where we have linkage data for the P. marinus genes (HoxK6/7, HoxL5/6, HoxN5, HoxN6, HoxN7, HoxQ8, HoxQ8a, HoxV9), orthology was assigned based on cluster position to the extent possible. Names incorporating more than one paralogy group number, such as HoxK6/7, reflect ambiguous assignments. It is possible that these assignments based on cluster position could be incorrect if there have been tandem duplications of individual genes after cluster duplication. The most likely candidates for this situation are PmHoxN6 and PmHoxN7. These genes have identical nucleotide sequences over the entire homeobox, although they begin to diverge immediately outside (Fig. 4). This conservation of even synonymous sites suggests that the two genes were tandemly duplicated relatively recently, possibly with an extreme level of codon bias preserving third positions, or that gene conversion within the homeobox has occurred.

Fig. 1. Contig maps of lamprey Hox genes. Contigs are designated by lines with relative positions of homeoboxes shown by thick bars. Constituent clones are shown below each contig. Linkage relationships between individual contigs are unknown and relative positions of unlinked clones are arbitrary.

Fig. 2. Homeodomain amino acid sequences. Conceptual translations based on DNA sequences. Lamprey homeodomains (bold type) shown aligned with amphioxus and mouse sequences obtained from GenBank. The sequence of Drosophila melanogaster Antennapedia is shown at the top of the figure for reference.

Fig. 3. Gene orthology estimation. Neighbor-joining tree using complete homeodomain amino acid sequences. Amino acid distances calculated using a PAM/Dayhoff matrix, consensus tree of 100 bootstrap replications. Internal branch lengths are proportional to bootstrap support, with bootstrap percentages shown at nodes. Where bootstrap support is lower than 50% branches were collapsed. Cognate group members shown by brackets.

Fig. 4. Illustration of instances where nucleotide sequences of different gene cognates are identical within the region of the homeobox amplified using common degenerate PCR primers. The PCR amplified region is shown shaded. Each of the sequences shown is known by restriction maps of the phage clones to be derived from a separate gene. However, sequence based on the PCR generated fragment alone is insufficient to distinguish the cognates.

Comparison of PCR survey and genomic mapping data

In order to relate the different lamprey Hox gene sequences to each other, the identifiers of PCR sequences from two surveys and of sequences from genomic clones found in this study and in a similar study by another group (A. Force and J.
Postlethwait, personal communication) are listed in Table 1. Sequence names at the same horizontal position in the table are the same sequence.

TABLE 1. Clone summary
Column headings: Putative cognate group | Unique PCR sequence [1] | Matching Irvine et al. genomic clone [2] | Matching Force et al. genomic clone [3] | Total unique clones assignable to cognate group | Genomic clones assignable to cognate group
Putative cognate groups: 1, 2, 3, 4, 5/6/7, 8, 9, 10, 11
Unique PCR sequences: a, b, c[4], d[4], e, g[4], h, (l), (n), l[4], j, k, l, f, m[4], n, o[4], p[4], #139, q, r[4], s[4], t, u, v, w, 11.1, 11.6, 11.8, x
Irvine et al. genomic clones: 1B, E2, 3, G4, J5/6/7, K6/7, L5/6, F5/6/7, N5, N6, N7, Q8, Q8a, R8, T9, V9, W10a, W10b, X10, Y11, Z11a, Z11b
Force et al. genomic clones: pethox1w, pethox3y[5], pethox4y, pethox4x, pethox4w, pethox5w, pethox83, pethox51, pethox31, pethox6w, pethox5x, PCRHx13(9), pethox9w, pethox9x, pethox9y, pethox10w, pethox11w
Clone counts (total unique; genomic): 4 1 1 1 6 2 2[6] 4 3 13 8 2 3 4 3 3 3 4 3
[1] Pendleton, et al. PCR survey and W. Bailey unpublished.
[2] This study.
[3] Force and Pendleton, personal communication.
[4] 6 nt or fewer differences with another PCR clone.
[5] May be the same as genomic clone 3.
[6] This number will be 1 if genomic clone 3 and pethox3y are the same.

Some homeobox nucleotide sequences showed little or no variation within certain regions, as mentioned for PmHoxN6 and N7. There are four different sets of sequences, each of which could only be detected as one sequence in the PCR survey. These sequences are shown in Figure 4. The region that is amplified by PCR is shown shaded. In each of the four sets, the sequence is identical in the region amplified by PCR, but there are mismatches outside that region. Mapping information confirms that these sequences belong to different genes. Each of the sequences is surrounded by a unique restriction map. The uniqueness of the HoxN5, -6, and -7 and the HoxQ8 and -Q8a sequences is further supported by linkage analysis. The three HoxN sequences are linked to each other in one cluster, and their positions allow us to assign each to a distinct cognate group, with the reservations noted above.
The two “Q” sequences are in different linkage groups. HoxQ8 is linked to the three HoxN genes, whereas HoxQ8a is linked to HoxK6/7 and HoxL5/6.

In addition to these genes, which share the same sequence within a portion of the homeobox, we also found a number of PCR fragments that differed by six or fewer nucleotides from another sequence (noted by footnote 4 in Table 1). According to the calculations of Misof and Wagner (’96), these sequences may represent allelic variants of the same gene. Of the sequences that differed by seven or more nucleotides, only four were not identified in our library screens. The medial sequences #139 and “h” were only found in one survey each and may be PCR artifacts or possibly contamination from another organism. In cognate groups 1 and 9, the sequences “a” and “u” were found in both PCR surveys, but not in our libraries (although another group has found this sequence in their cosmid library screen; A. Force and J. Postlethwait, personal communication). These two sequences are probably genuinely missing from our genomic clones.

Though the phage library screen produced most of the Hox genes identified by PCR, it is possible that some genes were missed. From a total of 22 different genes found, there were six genes (27%) for which only one phage clone was isolated. That suggests that Hox-containing clones were not isolated to saturation from the library.

Number of Hox clusters in P. marinus

According to the PCR data summarized in Tables 1 and 2, one might conclude that there are four Hox clusters in the lamprey P. marinus. However, if clones with fewer than seven nucleotide differences as compared with another cognate group member are excluded, assuming they are alleles of another gene, and the remaining unique sequences are combined with the total number of genomic clones recovered, the resulting numbers are consistent with a total of three Hox clusters.
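The exclusion rule just described (treating PCR fragments with fewer than seven nucleotide differences as possible alleles) can be sketched as a simple pairwise comparison. This is an illustrative sketch, not the procedure used in the study: the function names are hypothetical, the sequences are made-up placeholders rather than lamprey data, and the fragments are assumed to be already aligned and of equal length.

```python
ALLELE_MAX_DIFFERENCES = 6  # "fewer than seven nucleotide differences" in the text


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


def may_be_allelic(seq_a: str, seq_b: str) -> bool:
    """Flag a pair as possible allelic variants under the six-or-fewer rule."""
    return count_differences(seq_a, seq_b) <= ALLELE_MAX_DIFFERENCES


# Made-up 20-nt fragments for illustration only (not real homeobox sequence).
frag_1 = "ACGTACGTACGTACGTACGT"
frag_2 = "ACGTACGAACGTACGTACCT"  # differs from frag_1 at 2 positions
frag_3 = "TGCATGCATGCATGCATGCA"  # differs from frag_1 at every position

print(count_differences(frag_1, frag_2), may_be_allelic(frag_1, frag_2))
print(count_differences(frag_1, frag_3), may_be_allelic(frag_1, frag_3))
```

A fixed mismatch count is only meaningful for fragments of a fixed length, as is the case for products amplified with a common primer pair.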
This is also the case if the PCR data for the lamprey Lampetra planeri (Sharman and Holland, ’98) are considered (Table 2). On the other hand, the Hox clusters of teleosts and mammals have experienced many losses of genes within the various cognate groups. Only groups Hox4 and Hox9 in mammals, for example, have a cognate representative in all four clusters. Our lamprey genomic survey, in combination with data from the laboratory of J. Postlethwait (personal communication), found three members for cognate groups 4, 8, 9, 10, and 11, and found 8 members total for groups 5, 6, and 7 (Table 1). If the lamprey clusters have a similar level of gene loss as those of the mammals, there would be more than three clusters. Therefore, either three or four Hox clusters are likely to exist in P. marinus.

TABLE 2. Numbers of genes recovered from surveys sorted by cognate group
Column headings: Cognate group | PCR surveys: Pendleton et al., 1993; Bailey, unpublished; Sharman and Holland, 1998 | Genomic clones: This study
Values: 1 2 3 4* 1 0 4* 1 1 3 1 1 1 1 1 4 2* 1 5 1 5 6 7 9* 7* 8 9 10 11 12 13 2* 4* 2 – – – 2* 4* 2 3 – – 7 2 3 2 – – 1 3 2 3 3 – –
Data based on Petromyzon marinus, except Sharman and Holland, 1998, based on Lampetra planeri.
*Indicates that one or more clones had fewer than seven nucleotide differences from others over the homeobox and may be allelic variants, leading to possible overestimation of gene number.

The Hox9 cognate group can be useful for determining cluster identity because it has representatives in all four mammalian clusters. We tested whether the two lamprey Hox9 genes found in our genomic screen could be assigned to particular clusters based on phylogenetic analysis with the mouse orthologs and using amphioxus Hox9 as an outgroup. We used nucleotide sequences for the entire homeodomain in maximum parsimony analysis and excluded third codon positions on the assumption that these were saturated for nucleotide substitutions.
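Excluding third codon positions, as described above, amounts to keeping two of every three sites in an in-frame coding sequence. The sketch below shows one way to do this. The function name is hypothetical, the input is assumed to begin at the first position of a codon, and the example sequences are made up rather than taken from the study.

```python
def drop_third_codon_positions(coding_seq: str) -> str:
    """Return the sequence with every third codon position removed.

    Assumes the input is in frame, i.e. index 0 is the first position of a codon.
    """
    if len(coding_seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(base for i, base in enumerate(coding_seq) if i % 3 != 2)


# Made-up example: codons ATG, GCC, AAA keep AT, GC, AA.
print(drop_third_codon_positions("ATGGCCAAA"))

# A 180 bp homeobox would retain 120 sites for the parsimony analysis.
placeholder_homeobox = "ACG" * 60  # placeholder of the right length, not real sequence
print(len(drop_third_codon_positions(placeholder_homeobox)))
```

Removing these sites reduces noise from saturated synonymous changes, at the cost of leaving fewer characters for the analysis.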
We found that in the best-supported tree, the two lamprey genes clustered together as a sister group to the mammalian Hox9 genes. The next best-supported tree grouped both genes with Hoxc-9 (data not shown). However, neither tree had significant bootstrap values (54% and 40%, respectively), probably because there were only 16 non-synonymous character state changes in the ingroup. In addition, given the low number of informative characters, it is possible that the lamprey genes are clustering together due to differences in codon bias between lampreys and mammals. In short, based on the data we currently have, we cannot rule for or against duplication of one or more of the lamprey clusters independent of the duplications leading to the gnathostome clusters.

Genomic analysis of the lamprey HoxW10a region

In order to examine a portion of the lamprey Hox clusters in greater detail, we sequenced the entire 30 kb PAC clone Pm18, which contains the gene HoxW10a (GenBank accession no. AF464190). A diagram of the genomic organization of this clone is shown in Figure 5, along with a series of percent identity plot (PIP) alignments to Hox clusters from other vertebrate species. A first observation on examination of Figure 5 is that in comparison to the sequences from other vertebrates, the lamprey has a higher presence of simple repeat sequences and a greater proportion of CpG islands. In addition, as compared with the other vertebrates, the lamprey sequence around HoxW10a is expanded in several respects. First, the inferred intron of HoxW10a is larger than those of the other sequences (approximately 7.5 kb as compared with a maximum of 3 kb for the other Hox10 genes). Second, there is a transposable element (Tc1-like transposase sequence) inserted downstream of HoxW10a. Transposable elements are rarely, if ever, found in gnathostome Hox clusters. In addition, the PIP analysis identifies four regions of high sequence conservation in non-coding DNA.
These conserved elements are located in the same relative positions with respect to the Hox10 exons in all the sequences compared, but in the lamprey are spread out in a manner consistent with a general expansion of the genome around HoxW10a. Furthermore, despite the 20 kb of sequence we obtained 3′ of HoxW10a exon 2, we failed to encounter an adjacent Hox gene on this clone. This is in contrast to the average 5.4 and 9.4 kb distances to the first exon of Hox9 for the other Hoxa and Hoxc sequences, respectively.

The sequence alignments for the four conserved elements are shown in Figure 6. These sequences are conserved over total divergence times ranging from more than 8 × 10^8 years for C1 and C2 to at least 1.5 × 10^9 years for the A2 sequence. Interestingly, the A2 and C2 elements are found in both the Hoxa and Hoxc clusters of other vertebrates. This presence in other clusters is reminiscent of the HB-1 element found in several locations in the Drosophila Hox cluster and in multiple clusters in vertebrates, and which has been shown to be responsive to Hox proteins (Haerry and Gehring, ’96; Haerry and Gehring, ’97). Another possibility is that this sequence is a basic cluster control element, as has been proposed for a conserved sequence found in all four vertebrate Hox clusters downstream of HOXA7, HOXB7, HOXC8, and HOXD8 (termed H8/7-6 FCS; Kim et al., 2000). The C1 element is also found as an inverted duplication in the lamprey, termed C3 (Figs. 5 and 6), but not in the mammalian sequences examined. Because the sequence is largely preserved in its new location, this duplication must either be recent or it must retain some functional significance.

Fig. 5. Percent identity plots (PIPs) of sequence alignments between lamprey PAC clone Pm18 (HoxW10a) and the corresponding genomic regions of various Hoxa-10 and Hoxc-10 genes available in GenBank. The species and cluster names listed first to the left of the PIPs are the reference sequences for which the exon locations and repeated elements are depicted above the PIP, e.g. Fugu Hoxa vs. lamprey Pm18 shows the genomic organization around the Fugu Hoxa-10 gene with that sequence locally aligned by the PipMaker program (Schwartz et al., 2000) with Pm18. Conserved regions are circled and the corresponding sequences linked by solid lines between PIPs. Refs. and GenBank acc. nos.: Fugu, Aparicio et al. (’97) U92573; Striped bass, Snell et al. (’99) AF089743; Horned shark, Kim et al. (2000), AF224262; Mouse Hoxa, AC015583; Mouse Hoxc, AC021667; Human HOXA, AC004080; Human HOXC, NT_009563.

Fig. 6. Sequence alignments for conserved regions identified in the PIPs of Figure 5. The inverted C3 element is shown aligned with the C1 sequences. Abbreviations: Fr, Fugu rubripes; Hf, Heterodontus francisci; Hs, Homo sapiens; Mm, Mus musculus; Ms, Morone saxatilis; Pm, Petromyzon marinus. Refs. as in Figure 5.

DISCUSSION

In these studies we have isolated phage, cosmid, and PAC clones that contain most of the lamprey Hox genes previously identified by PCR (Pendleton et al., ’93), along with several new genes. We have determined the restriction maps of all of these clones, including linking some of them into clusters. We have also determined the nucleotide sequence of the homeobox (180 bp) for all of these sequences. Though the cluster map is incomplete, it does show that there are at least three clusters of Hox genes in the lamprey, for two of which we have shown linkage data. Thus, the lamprey is the most primitive chordate described to have multiple Hox clusters.

We have also identified non-coding sequences shared between P. marinus and higher vertebrates, and conserved for more than one billion years of divergence time.
These regions are likely to be regulatory elements, as shown by studies of sequence conservation combined with experimental assessment of function (Hardison, 2000; Loots et al., 2000). The extreme level of conservation of these elements suggests that they perform some basic function in regulating nearby Hox genes or that they are general cluster control or insulation elements.

Lamprey clusters and hypotheses of duplication history

As of yet, we have insufficient data to determine with confidence the identities and evolutionary histories of the lamprey Hox clusters. However, based on certain assumptions, evolutionary hypotheses can be proposed for testing as more data become available. We propose two initial assumptions in constructing these hypotheses. First, we assume a three-step duplication scenario based on the analysis of Bailey et al. (’97). This study used extensive sequence both from Hox4 and Hox9 paralogy group genes combined with sequence from the linked collagen genes to reconstruct the most likely duplication scenario based on several phylogenetic reconstruction methods. The model proposes that the ancestral Hox cluster was similar to the tetrapod D cluster, based on outgroup analysis, which duplicated to create an A-like cluster, which in turn produced the B and C clusters, i.e., (D(A(B,C))). We also assume a four-cluster organization at the base of the gnathostome radiation. This is a parsimonious assumption given that all tetrapods examined have four Hox clusters. In addition, library and PCR screening data (Kim et al., 2000; C.-B. Kim, personal communication) suggest that sharks, the most primitive gnathostomes, have four or fewer Hox clusters. In this view, the seven or more Hox clusters existing in ray-finned fishes are the result of one or more duplication events in that lineage after the divergence of the lobe-finned fishes (Amores et al., ’98).

We propose four alternate scenarios, based on the preceding assumptions (Fig. 7).
In Case A, the common ancestor of lampreys and gnathostomes had one Hox cluster, with two or three independent cluster duplications occurring within the lamprey lineage. Three cluster duplications then occurred in the basal gnathostome lineage to create the four clusters retained in tetrapods. In this case, true vertebrates evolved without multiple Hox clusters. However, if the suggestion that one or more lamprey Hox clusters is not directly related to those of tetrapods is correct, it would lend weight to this hypothesis.

In Case B, two Hox clusters were present in a primitive agnathan, and one or two independent cluster duplications occurred in the lamprey lineage to give three or four clusters. Thus, two cluster duplications occurred at the base of the gnathostome clade to complete the four ancestral tetrapod Hox clusters. This scenario is consistent with that of Amores et al. (’98), who argue for a two-step duplication scenario, with a duplication of both ancestral agnathan clusters, possibly simultaneously by genome duplication, to produce the four-cluster ancestral gnathostome arrangement. This view would still be, in part, consistent with the possibility presented above that one or more lamprey Hox clusters is a sister to all the tetrapod clusters.

In Case C, three Hox clusters are present in the ancestral agnathan, the result of two cluster duplications, with a possible independent duplication in the lineage of the modern lamprey. An additional duplication occurs in the basal gnathostome lineage, producing the final tetrapod cluster complement.

Finally, in Case D, all the duplications leading to the gnathostome Hox clusters have occurred before the divergence of lampreys. In this case, all the lamprey clusters would be directly homologous to those of mammals. This scenario is the most parsimonious of the four in the absence of significant evidence of independent duplications along the lamprey lineage.
We propose that Case D is the best working hypothesis, because we believe there are most likely four lamprey clusters, as judged from gene numbers in each paralogy group, and allowing for levels of gene loss consistent with those of other vertebrate groups. Note that numerous other possibilities also exist if cluster losses have occurred in one or more lineages. In fact, preliminary PCR survey data suggest that four clusters may exist in hagfishes (Bailey and Wagner, unpublished data), supporting the notion that the ancestral agnathan had four clusters. If P. marinus has only three clusters, Case D could still be the correct duplication history, with a cluster loss in the lamprey lineage (Sharman and Holland, ’98). The difficulty of distinguishing between these hypotheses illustrates the need for both complete mapping and extensive sequence data for use in phylogenetic analysis. As these data become available, a clearer picture of the lamprey Hox clusters will emerge, and tests of the above hypotheses will be possible. This work, in turn, will enable the reconstruction of the Hox cluster complement at the origin of the vertebrate radiation.

Fig. 7. Four hypotheses of vertebrate Hox cluster duplication. Numbers of Hox clusters at ancestral nodes are shown in ovals. Bars represent cluster duplication events, with dashed bars representing an additional duplication for the case of four Hox clusters in the lamprey. Hox cluster complements are indicated at termini of branches, with Hox clusters independently duplicated within a branch shown dashed. See text for further base assumptions.

Hox cluster number and morphological complexity

It has been proposed that the increasing complexity of vertebrate body plans over evolutionary time might be related to expansions in the number of Hox clusters (Kappen et al., ’89).
The fact that an agnathan vertebrate has at least three clusters suggests that the cluster duplications occurred at the very base of the vertebrate radiation and long preceded increases in axial complexity (Holland and Garcia-Fernandez, ’96b; Ruddle et al., ’99). However, although lampreys and hagfishes have relatively simple axial body plans, extinct groups of agnathans, such as cephalaspids, had considerable axial complexity (Forey and Janvier, ’93). If three or even four Hox clusters existed in agnathan groups, it may be the case that rather than a gradual increase in Hox gene and cluster number accompanying cladogenesis in the vertebrates, the cluster duplications occurred early in the chordate radiation and were the permissive condition for evolution of the basic vertebrate body plan, and sufficient for the evolution of the morphological complexity within agnathans.

ACKNOWLEDGMENTS

The authors gratefully acknowledge the technical assistance of Kimberly Hartwell and Stephanie Atiyeh.

LITERATURE CITED

Amemiya CT, Ota T, Litman GW. 1996. Construction of P1 artificial chromosome (PAC) libraries from lower vertebrates. Nonmammalian genomic analysis: a practical guide. San Diego: Academic Press.
Amores A, Force A, Yan YL, Joly L, Amemiya C, Fritz A, Ho RK, Langeland J, Prince V, Wang YL, Westerfield M, Ekker M, Postlethwait JH. 1998. Zebrafish hox clusters and vertebrate genome evolution. Science 282:1711–1714.
Aparicio S, Hawker K, Cottage A, Mikawa Y, Zuo L, Venkatesh B, Chen E, Krumlauf R, Brenner S. 1997. Organization of the Fugu rubripes Hox clusters: evidence for continuing evolution of vertebrate Hox complexes. Nature Genet 16:79–83.
Bailey WJ, Kim J, Wagner GP, Ruddle FH. 1997. Phylogenetic reconstruction of vertebrate Hox cluster duplications. Mol Biol Evol 14:843–853.
Bartels JL, Murtha M, Ruddle FH. 1993. Multiple Hox/HOM-class homeoboxes in Platyhelminthes. Mol Phyl Evol 2:143–151.
DeRobertis EM. 1994.
The homeobox in cell differentiation and evolution. In: Duboule D, editor. Guidebook to the homeobox genes. Oxford: Sambrook & Tooze. p 13–23.
Felsenstein J. 1995. PHYLIP (Phylogeny Inference Package), computer program distributed by the author. Version 3.57c. Seattle: Department of Genetics, Univ. of Washington.
Ferrier DEK, Minguillon C, Holland PWH, Garcia-Fernandez J. 2000. The amphioxus Hox cluster: deuterostome posterior flexibility and Hox14. Evol Dev 2:284–293.
Forey P, Janvier P. 1993. Agnathans and the origin of jawed vertebrates. Nature 361:129–134.
Garcia-Fernandez J, Holland PWH. 1994. Archetypal organization of the amphioxus Hox gene cluster. Nature 370:563–566.
Gehring WJ. 1994. A history of the homeobox. In: Duboule D, editor. Guidebook to the homeobox genes. Oxford: Sambrook & Tooze. p 3–10.
Haerry TE, Gehring WJ. 1996. Intron of the mouse Hoxa-7 gene contains conserved homeodomain binding sites that can function as an enhancer element in Drosophila. Proc Natl Acad Sci USA 93:13884–13889.
Haerry TE, Gehring WJ. 1997. A conserved cluster of homeodomain binding sites in the mouse Hoxa-4 intron functions in Drosophila embryos as an enhancer that is directly regulated by Ultrabithorax. Dev Biol 186:1–15.
Hardison RC. 2000. Conserved noncoding sequences are reliable guides to regulatory elements. Tr Genet 16:369–372.
Holland PW, Garcia-Fernandez J. 1996a. Hox genes and chordate evolution. Dev Biol 173:382–95.
Holland PWH, Garcia-Fernandez J. 1996b. Hox genes and chordate evolution. Dev Biol 173:382–395.
Kappen C, Schughart K, Ruddle FH. 1989. Two steps in the evolution of antennapedia-class vertebrate homeobox genes. Proc Natl Acad Sci USA 86:5459–5463.
Kim C, Amemiya C, Bailey W, Kawasaki K, Mezey J, Miller W, Minoshima S, Shimizu N, Wagner G, Ruddle F. 2000. Hox cluster genomics in the horn shark, Heterodontus francisci. Proc Natl Acad Sci USA 97:1655–1660.
Loots GG, Locksley RM, Blankespoor CM, Wang ZE, Miller W, Rubin EM, Frazer KA. 2000.
Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 288:136–140.
Mallatt J, Sullivan J. 1998. 28S and 18S rDNA sequences support the monophyly of lampreys and hagfishes. Mol Biol Evol 15:1706–1718.
Misof BY, Wagner GP. 1996. Evidence for four Hox clusters in the killifish Fundulus heteroclitus (Teleostei). Mol Phyl Evol 5:309–322.
Pendleton JW, Nagai BK, Murtha MT, Ruddle FH. 1993. Expansion of the Hox gene family and the evolution of chordates. Proc Natl Acad Sci USA 90:6300–6304.
Rasmussen A-S, Janke A, Arnason U. 1998. The mitochondrial DNA molecule of the hagfish (Myxine glutinosa) and vertebrate phylogeny. J Mol Evol 46:382–388.
Ruddle FH, Carr JL, Kim C-B, Ledje C, Shashikant CS, Wagner G. 1999. Evolution of chordate Hox gene clusters. Annals NYAS 870:238–248.
Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R, Hardison R, Miller W. 2000. PipMaker: a web server for aligning two genomic DNA sequences. Genome Res 10:577–586.
Sharkey M, Graba Y, Scott MP. 1997. Hox genes in evolution: protein surfaces and paralog groups. Tr Genet 13:145–151.
Sharman AC, Holland PWH. 1998. Estimation of Hox gene cluster number in lampreys. Int J Dev Biol 42:617–620.
Snell EA, Scemama J-L, Stellwag EJ. 1999. Genomic organization of the Hoxa4-Hoxa10 region from Morone saxatilis: implications for Hox gene evolution among vertebrates. J Exp Zool (Mol Dev Evol) 285:41–49.
Stock DW, Whitt GS. 1992. Evidence from 18S ribosomal RNA sequences that lampreys and hagfishes form a natural group. Science 257:787–789.