7

Amniote Phylogenomics: Testing Evolutionary Hypotheses with BAC Library Scanning and Targeted Clone Analysis of Large-Scale DNA Sequences from Reptiles

Andrew M. Shedlock, Daniel E. Janes, and Scott V. Edwards

Summary

Phylogenomics research integrating established principles of systematic biology and taking advantage of the wealth of DNA sequences being generated by genome science holds promise for answering long-standing evolutionary questions with orders of magnitude more primary data than in the past. Although it is unrealistic to expect whole-genome initiatives to proceed rapidly for commercially unimportant species such as reptiles, practical approaches utilizing genomic libraries of large-insert clones pave the way for a phylogenomics of species that are nevertheless essential for testing evolutionary hypotheses within a phylogenetic framework. This chapter reviews the case for adopting genome-enabled approaches to evolutionary studies and outlines a program for using bacterial artificial chromosome (BAC) libraries or plasmid libraries as a basis for completing "genome scans" of reptiles. We have used BACs to close a critical gap in the genome database for Reptilia, the sister group of mammals, and present the methodological approaches taken to achieve this as a guideline for designing similar comparative studies. In addition, we provide a detailed step-by-step protocol for BAC-library screening and shotgun sequencing of specific clones containing target genes of evolutionary interest. Taken together, the genome scanning and shotgun sequencing techniques offer complementary diagnostic potential and can substantially increase the scale and power of analyses aimed at testing evolutionary hypotheses for nonmodel species.

Key Words: Amniote; Reptilia; BAC library; genome scan; genomic signature; retroelement; simple sequence repeat; sex-linked marker; shotgun cloning; EE0.6.

From: Methods in Molecular Biology: Phylogenomics. Edited by: W. J. Murphy © Humana Press Inc., Totowa, NJ

1. Introduction

1.1. Genome-Enabled Phylogenetics and BAC Libraries

Completion of the human and other genome projects constitutes a leap in the scale on which the genome can be organized and studied. However, the technology that makes producing whole-genome assemblies feasible will likely be more difficult to transfer to other disciplines in biology, particularly the evolutionary sciences, than more accessible DNA diagnostics have been, most notably the polymerase chain reaction. Nevertheless, comparative approaches and forays into nonmodel species are already being made by evolutionary biologists in an effort to integrate genome science with established principles of systematics, forging the exciting new area of interdisciplinary research aptly coined phylogenomics. Some would argue that merely having genomic resources for single representatives of, say, fish, mammals, and birds is sufficient to broaden genomics to include evolutionary biology. Furthermore, the cost of genome projects, at least in the short term, is still high enough to prohibit easy transfer of even basic genome resources to the community of evolutionary biologists. The rise of national Genome Centers and the increasing interest of such centers in tackling questions in nonmodel species is also strong evidence that the biology of the 21st century will be much more team-, organization-, and resource-driven than was biology in earlier decades. Indeed, given existing technologies, it is clear that systems-level biology even on single species requires not only advanced technologies and robotics not readily available to single-PI research programs, but also large numbers of staff and infrastructure for coordinating activities. We believe that, despite the logistical challenges to scaling up the size of comparative DNA sequence studies by orders of magnitude, evolutionary biology needs to embrace genomic technology.
In particular, the use of genomic libraries such as those composed of bacterial artificial chromosome (BAC) clones is one of several excellent ways to do this.

BAC libraries are large-insert genomic libraries that are currently the optimal starting point for large-scale analysis of genome evolution in eukaryotes (1). Typically BAC clones can faithfully propagate fragments of DNA on average ~150 kb in length. They have a number of advantages over earlier types of genomic libraries, such as lambda and cosmid libraries, which can accommodate only smaller insert sizes. Although yeast artificial chromosome (YAC) vectors can hold much larger inserts, they are much less stable than BACs: BACs are much less susceptible to inter- and intraclone recombination owing to their low copy number (1–2) per E. coli cell and the presence of genes ensuring faithful propagation and passage to daughter cells during cell division (2). For these reasons and their ease of manipulation, growth, and isolation of clones, BAC libraries have formed the backbone of many efforts to sequence complete genomes, including the human genome. Furthermore, through the use of homologous recombination to modify BACs with reporter constructs and their use as substrates for transgenesis (both within species and between species), many new directions in functional biology are opened through BAC libraries (3–6).

The details of BAC library construction are extensive and go well beyond the scope of the present chapter (7,8). We therefore do not include protocols for BAC library production here but rather focus on the use of such libraries for conducting comparative studies. BAC libraries are typically produced by and can be accessed in collaboration with laboratories equipped with semiautomated accessories and robotics optimized for producing and managing BAC resources, exemplified by Production Centers affiliated with the U.S.
National Human Genome Research Institute's BAC Resource Network (NHGRI; http://www.genome.gov). In addition to the BAC libraries themselves, one of the most important advancements in eukaryotic genome science is the adoption of standardized methods of archiving libraries and clones in microtiter plates, with one clone per well, thereby ensuring their long-term survival and utility to the scientific community (9). Such protocols represent a vast improvement over earlier bulk storage methods, which frequently resulted in significant loss of clones in the long term. In particular, we have benefited from collaboration with and protocols developed by Amemiya and colleagues (7) and from protocols and applications reviewed extensively by Zhao (1,10).

The availability of BAC libraries from an expanding diversity of nonmodel species provides an ideal resource to increase access to genome-scale science by comparative biology. Although it is impractical to expect broad comparative studies to proceed rapidly for nonmodel organisms on a whole-genome basis, it is attractive and far less expensive to estimate the structure of poorly known genomes by mining BAC libraries from diverse taxa essential for testing numerous hypotheses within a phylogenetic context. First, BAC libraries are in principle applicable to any organism with sufficient high-quality genomic DNA. Thus, the diversity of nonmodel species and those that cannot be maintained in captivity are all potential targets for BAC libraries. Second, in conjunction with ancillary methods of shotgun cloning and sequencing, BAC libraries provide access to an order-of-magnitude increase in the amount of DNA sequence data that can be brought to bear on problems of systematics and molecular evolution. Shotgun sequencing approaches are still relatively uncommon in nonmodel vertebrates (11,12). Despite recent substantial increases in the amount of sequence data devoted to questions of systematics in a variety of clades (13–16),
BAC libraries offer yet larger increases that could resolve trees even further (17). More importantly, BAC-enabled studies can provide a genuinely genomic window into molecular evolutionary processes, revealing on a large scale the vast array of non-nucleotide types of molecular variation, such as indels, duplications, and rearrangements, that hold considerable promise for molecular systematics but have thus far been underexploited (18–22).

Other key trends in evolutionary biology for which BACs are relevant include the positional cloning of candidate genes for phenotypic traits, enhancement of QTL mapping efforts, population genetics, chromosome evolution, the role of gene regulation in evolution, and the evolution of multigene families. As we discuss below, these research directions in the Reptilia lag far behind similar studies in mammals, partly because of the lack of suitable molecular reagents with which to tackle these problems. With the large database on developmental genetics, immunogenetics, and genome and mapping efforts provided by the chicken, these fields are now poised to be extended into a comparative framework as the relevant tools for accessing genes and gene families, especially BAC libraries, continue to become available.

It is also important to note that several of the genome-scanning approaches we suggest can be implemented with genomic libraries other than BAC libraries. For example, simple plasmid libraries with 2–7 kb inserts can provide a rich resource for examining many of the same issues as can be analyzed using BAC libraries. However, plasmid library analysis will incur some important limitations compared to BACs, such as the inability to perform downstream hybridizations of selected BACs to chromosomes using FISH, or to delimit large chromosomal regions with simple paired-end sequences of clones using comparative bioinformatics tools.
Thus, although plasmid libraries will in principle yield the same type of multimegabase sequence data sets from end-reads as will BACs, the uses to which these can be put are more limited than with BACs.

1.2. Reptile Phylogenomics and the Amniote Ancestor

Reptiles are a critical group of vertebrates for understanding the evolutionary dynamics of amniote genome evolution (Fig. 1). Reptiles are far more taxonomically diverse (~17,000 vs. ~4,500 species), and arguably more developmentally and chromosomally diverse than mammals. They also exhibit a diversity of environmental and genetic sex determining mechanisms relative to mammals (23). Comparing data from the chicken genome to that of mammals in the absence of relevant outgroup information from nonavian reptiles has limited our ability to accurately infer the genomic condition of our common amniote ancestor. Recent attempts to close this large gap in the comparative genomics literature have tested alternative models of amniote genome evolution summarized by Waltari and Edwards (24). Results from phylogenomic analysis of multimegabase large-insert clone sequence from exemplar nonavian species indicate that the ancestral amniote likely had a relatively large genome with a diverse repetitive landscape and GC content similar to that observed for many mammals and that underwent a series of sequential size reductions in the lineage leading to birds (25,26).

Fig. 1. Diagram summarizing the conventional view of amniote relationships. The distribution of genome sizes, measured in picograms, for each lineage is represented by bars over branch tips. The question mark and dotted arrow indicate a growing body of molecular evidence suggesting turtles are more derived among reptilian clades than their traditional basal position illustrated here.
Moreover, integrating paleontological data on bone cell size with that for genome size and interspersed repeat abundance in extant amniote species has revealed that the substantial reduction of noncoding elements leading to a streamlining of the chicken genome likely evolved in theropod dinosaurs ~130 Myr before the origin of avian flight (27). The largely uncharted genomic landscape of squamate reptiles remains an important open avenue for exploring the diversity of amniote genome structure and for developing a predictive theory of genome evolution (28). Table 1 outlines some of the major genomic features for three nonavian reptile focal species as well as chicken and human for comparison. Genomic clone libraries for the American alligator (Alligator mississippiensis), turtle (Chrysemys picta), and anole (Anolis smaragdinus) have been analyzed within a phylogenetic context using methods of investigation outlined in detail below (25).

1.3. Historical Genome Dynamics

Two major goals of our phylogenomic approach have been to synthesize a model of genome evolution in reptilian clades over the past 310 Myr of vertebrate evolution and to infer ancestral conditions in the amniote common ancestor. Because we have gleaned genome statistics from unaligned sequence data, many of the genome characters we seek to understand are continuous variables, as opposed to the discrete characters provided by aligned DNA sequences. For this reason, the approaches we have used are somewhat different from those used to calculate ancestral sequence states for whole mammalian chromosomes (e.g., 31). Our approach integrates estimates of genome size, global base composition, abundance and diversity of repetitive elements, and phylogenetic relationships among species-specific signatures of higher order sequence complexity.
In particular, this can be achieved by (1) mapping the phylogenetic distribution of retroelements, which are known to modulate genome size; (2) mapping the phylogenetic distribution of particular simple sequence repeat (SSR) subclasses, which may affect global base composition; and (3) calculating rates of frequency change in DNA words along branches of a phylogeny derived from genomic signatures. Taken together, these three provide a means for sketching the history of genome dynamics among lineages being investigated by genome scanning.

Table 1
Some General Features of Reptilian and Human Genomes

Species     Genome size (pg)   Haploid chromosome no.   Microchromosomes?
Alligator   2.49               16                       No
Turtle      2.57               ~25                      Yes
Chicken     1.25               39                       Yes
Anole       2.2                ~18                      Yes
Human       3.5                23                       No

Sources: refs. 29,30,81–84.

2. Materials

2.1. Manipulating and Screening BAC Libraries

1. Electrocompetent, DH10B T1-resistant Escherichia coli cells (cat. no. 12033015, Invitrogen).
2. LB broth (Miller).
3. Nylon filters.
4. Chloramphenicol.
5. Glycerol.
6. LB agar.
7. Lysis buffer solution: 2X SSC, 5% SDS.
8. Proteinase K buffer solution: 50 mM Tris (pH 8), 50 mM EDTA, 100 mM NaCl, 1% N-lauryl sarcosine.
9. 10 μg/mL proteinase K.
10. 2X SSC.
11. Fabric pen (cat. no. PN1, Cleaner's Supply).
12. XL-10 Gold competent E. coli cells (cat. no. 200315, Stratagene).
13. (γ-32P)-dCTP.
14. Prime-It II Random Primer Labeling Kit (cat. no. 300385, Stratagene).
15. Microspin columns (cat. no. 27-5120-01, GE Healthcare).
16. Sonicated, nonhomologous herring sperm DNA (cat. no. D1815, Promega).
17. Hybridization mesh (cat. no. RPN2519, GE Healthcare).
18. Hybridization buffer solution: 18.75 mL 20X SSPE, 3.75 mL 100X Denhardt's solution, 3.75 mL 10% (w/v) SDS, and 48.75 mL H2O.
19. 1X washing buffer solution: 935 mL H2O, 50 mL 20X SSC, 5 mL 20% SDS, 10 mL 5% pyrophosphate.
20. Metal cassette (cat. no. S-14, Spectronics).
21. Biomax MR X-ray film (cat. no. 8567232, Kodak).
22. X-omat film processor (cat. no. 1000A, Kodak).
23. Stripping solution 1: 0.2 N NaOH.
24. Stripping solution 2: 0.1 M Tris-HCl (pH 7.5), 0.1X SSC, 0.1% (w/v) SDS.
25. Stripping solution 3: 0.1X SSC, 0.1% (w/v) SDS.
26. Resuspension buffer (cat. no. 19051, Qiagen).
27. RNase A (cat. no. 19101, Qiagen).
28. Lysis buffer (cat. no. 19052, Qiagen).
29. Neutralization buffer (cat. no. 19053, Qiagen).
30. Isopropanol.
31. 70% ethanol.
32. TE buffer.
33. Restriction enzymes (EcoRI and HindIII).
34. Hydro-shear (cat. no. JHSH000000-1, Genomic Solutions).
35. Micrococcus (cat. no. 159972, MP Biomedicals).
36. Qiaquick kit (cat. no. 28106, Qiagen).
37. Blunting solution for 10 samples: 516.67 μL 5X T4 DNA polymerase buffer (cat. no. M0203L, New England Biolabs), 258.33 μL BSA (1 mg/mL; cat. no. B9001S, New England Biolabs), 258.33 μL dNTPs (1 mM; cat. no. 10766020, Roche), 193.75 μL H2O, 64.58 μL T4 DNA polymerase (cat. no. M0203L, New England Biolabs).
38. 0.5 M EDTA.
39. BstXI/EcoRI Adapter (cat. no. N41818, Invitrogen).
40. Linking solution for 10 samples: 38.75 μL 10X T4 DNA ligase buffer (cat. no. M0202L, New England Biolabs), 38.75 μL T4 DNA ligase (cat. no. M0202L, New England Biolabs).
41. Vector (cat. no. A362A, Promega).
42. Ligating solution for 10 samples: 23.3 μL 10X T4 DNA ligase buffer, 17.5 μL T4 DNA ligase.
43. GC5 cells (cat. no. 62-7000-16W, PGC Scientific).
44. SOC medium (cat. no. S1625, Teknova).
45. LB solution: 1.8 L LB broth (Miller), 3 mL chloramphenicol, 0.2 L glycerol.
46. 96-well 800 μL uniplate (cat. no. 7701-1800, Whatman).
47. 96-well 800 μL unifilter (cat. no. 7700-1806, Whatman).
48. Big Dye Terminator v3.1 Cycle Sequencing Kit (cat. no. 4336917, Applied Biosystems).
49. M13 primers.

3. Methods

3.1. Comparative Investigation of Genomic Libraries

Phylogenomic analysis of BAC libraries can be divided into two general categories of investigation: (1) characterization of genome structure and comparative analysis of multimegabase sequence data based on surveys of paired-end reads of clones sampled randomly from each genome; and (2) targeted studies of genes and genomic neighborhoods based on protocols aimed at characterizing particular loci of interest from a small subset of selected clones. These two approaches to interrogating the library complement each other in terms of their sampling designs, analysis of primary data, and scales of inference about genome evolution. Although genome scanning of BAC-ends provides a statistical estimate of the overall structure of the genomes investigated, it does not typically allow for direct comparisons of particular genes of interest, nor does it generate homologous, alignable sequences from specific regions of the genome that can be analyzed by conventional systematic methods. Conversely, although targeted BAC assays allow for investigation of particular genes of interest, they can provide little inference about the genome-wide distribution of genomic elements or global base composition and historical dynamics of genome evolution among the species being investigated. Taken together, the two lines of investigation can provide a synergy of understanding about global patterns of genome dynamics as well as fine-scale evolutionary analysis of particular chromosomal regions in the context of genomic neighborhoods. Each of these two experimental approaches is summarized below with examples from investigating reptilian genomes within a comparative framework.

3.2. Genome Scanning of Nonavian Reptiles

The following sections outline basic steps taken in completing a BAC-enabled phylogenomic analysis of major lineages of Reptilia.
The experimental strategy exploits the relative ease with which a BAC library can be used to generate a multimegabase dataset of nonoverlapping nucleotide sequences sampled randomly from throughout a genome to complete a so-called "genome scan." Although we have applied this approach to investigating the phylogenomics of reptiles and the structure of the ancestral amniote genome, the methods outlined here serve as a guideline for similar investigations that may test evolutionary hypotheses regarding other nonmodel species using large-scale comparative sequence analysis. Completing a phylogenomic study based on genome scans of BAC libraries relies heavily on existing genomic resources, informatics tools, access to a diversity of online genomic databases, and an integration of computational methods aimed at testing evolutionary hypotheses within a phylogenetic framework. A flowchart summarizing the basic steps and integration of methods we have employed to investigate amniote genome evolution is illustrated in Fig. 2 (see Note 1).

1. Selection of vector primers. If you are creating your own genomic library, design the vector sequence of each individual clone to contain both forward and reverse priming sites near the point of fragment insertion. If you are using an existing library, determine the sequence priming sites of vectors used to build the library. Standard vector sequencing primers include commercially available forward and reverse oligos such as M13 and T7 which can be used to sequence ~500–1000 bp of original genomic sequence along each end of the clone insert.

2. Sequence nonoverlapping clone inserts. Use forward and reverse vector primers to gather paired end-sequences from randomly selected clones in each library. Set up sequencing PCR reactions in 96-well format with ABI Big Dye cycle sequencing kits and gather primary data using a high-throughput capillary array DNA analyzer such as the model ABI 3730 (see Note 2).
Compile paired end-reads for thousands of clones isolated from libraries of target species to produce a multimegabase data set of genomic sequence sampled randomly across each genome. A cartoon summarizing the genome scanning process for a BAC library is shown in Fig. 3, with several targeted downstream applications illustrated.

3. Screen end-sequences for poor base calling and contamination. Employ the Phred base calling software program (32) and exclude poor reads with quality value (QV) scores of less than 20. Identify and remove vector sequences from the data set using the NCBI tool VecScreen (http://www.ncbi.nlm.nih.gov/projects/VecScreen/).

4. Evaluate base composition. Derive statistical estimates of global GC content with a 95% confidence interval from observed GC values in multimegabase data compiled from BAC-end sequences distributed across the genome (see Note 3). Take into account any intragenomic variation in GC distribution owing to genomic features such as isochore structure and CpG islands as well as uneven local GC distributions present in given sequence reads. Do this by checking for autocorrelation of bases up to 50–100 nt away from a focal base and assuming a model in which the GC value for each read follows a binomial distribution and in which the GC values are independently and identically distributed with an unknown density (see Note 4).

Fig. 2. Flowchart summarizing the series of basic steps taken in completing a BAC-enabled genome scan study. Paired BAC-end sequences are compiled into large-scale datasets and examined for structural features. Complexity, measured in terms of genomic signatures of DNA word frequencies, is related phylogenetically and integrated with structural data and estimates of character change along branches to synthesize models of historical genome dynamics and test evolutionary hypotheses within a phylogenetic framework.

5. Create genomic signatures of word-frequencies.
Large-scale primary sequence derived from genome scans contains compositional information above the level of homologous alignable nucleotide sites that may reflect species-specific differences in patterns of mutation, repair, and selection on the molecular level. Evaluate this higher order organizational structure using motif-counting procedures (33–35). Do this by counting frequencies of all possible short n-letter oligos, or DNA words, using a 1-bp sliding-window approach and plotting genomic signatures (see http://genstyle.imed.jussieu.fr/). Examples of annotated visual summaries of genomic signatures for the American alligator and a human are presented in Fig. 4, along with a legend explaining the scheme of pixel representation shown for frequencies of all possible 16,384 (4^7) 7-nt words in ~2.5 Mb of BAC-end sequence in each signature.

Fig. 3. Cartoon detailing the use of BAC clones to establish primary data for large-scale sequence scanning. Clone inserts randomly arrayed in a BAC library are sequenced with known forward and reverse vector primers. Compilation of multimegabase data sets sampled across target genomes approximates global structural features and allows for a diversity of genomic analyses, including plots of continuous character distributions, base composition profiling, and mapping of sequence synteny onto reference genome data assemblies.

6. Relate genomic signatures with phylogenetic methods. Generate Euclidean distances among genomic signatures from multiple species based on the square root of the sum of the squared differences in frequency of motifs. Normalize each signature for genome-wide base compositional differences prior to generating distance matrices. Do this by subtracting the expected frequency of each motif on the basis of single-letter base composition from the observed frequency for each species.
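The word counting of step 5 and the normalized Euclidean distance of step 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the cited studies: the function names and the toy sequences are hypothetical, and windows containing characters other than A, C, G, or T are simply skipped, which is only one possible convention.

```python
from itertools import product
from math import sqrt


def word_frequencies(seq, n):
    """Relative frequencies of all 4**n DNA words, counted with a 1-bp sliding window."""
    seq = seq.upper()
    counts = {"".join(w): 0 for w in product("ACGT", repeat=n)}
    total = 0
    for i in range(len(seq) - n + 1):
        word = seq[i:i + n]
        if word in counts:  # assumption: skip windows with N or other ambiguity codes
            counts[word] += 1
            total += 1
    return {w: (c / total if total else 0.0) for w, c in counts.items()}


def normalized_signature(seq, n):
    """Observed word frequency minus the frequency expected from single-base composition."""
    observed = word_frequencies(seq, n)
    base = word_frequencies(seq, 1)
    signature = {}
    for word, freq in observed.items():
        expected = 1.0
        for b in word:
            expected *= base[b]
        signature[word] = freq - expected
    return signature


def euclidean_distance(sig_a, sig_b):
    """Square root of the sum of squared differences in word frequency."""
    return sqrt(sum((sig_a[w] - sig_b[w]) ** 2 for w in sig_a))


# Toy sequences standing in for multimegabase end-read data sets (hypothetical).
toy_a = "ACGTACGTTTGACCAGTACGATCGATCGGCTA"
toy_b = "TTTTTTAAAAAACGCGCGCGATATATATGCGC"
sig_a = normalized_signature(toy_a, 3)
sig_b = normalized_signature(toy_b, 3)
print(len(sig_a), euclidean_distance(sig_a, sig_b))
```

For real data, the pairwise distances computed this way for all species would fill the distance matrix that is passed to a tree-building program; 7-nt words give 4^7 = 16,384 entries per signature.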
Construct phylogenetic trees from the distance matrix using distance methods such as the neighbor-joining (NJ) algorithm (36). Evaluate statistical support for NJ tree topology using bootstrap replication. Create pseudomatrices by resampling words with replacement and calculating signatures and distances for each pseudomatrix (37) (see Note 5).

Fig. 4. Examples of genomic signatures for (A) American alligator and (B) human, visualized by pixel representations of all possible 7-nt DNA word frequencies contained in ~2.5 Mb of genomic sequence per signature. The diagram in (C) illustrates the order of pixel counts used to construct signatures of n-letter words. Darker pixels correspond to higher frequencies. 1, 3, 8, and 9 correspond to C7, G7, A7, and T7 words, respectively, plotted at corners of the signatures. 2, 6, and 7 mark regions of CG-poor words. 4 and 5 mark diagonal lines formed by densities of 7-letter words composed only of pyrimidines or purines, respectively. 10 marks adjacent regions exhibiting high densities of microsatellite motifs apparent in human but absent in alligator.

7. Estimate the diversity of interspersed repeats. Complete informatics searches for interspersed transposable genetic elements present in paired end-sequences of clones from genomic libraries with local alignment tools implemented in the program RepeatMasker (38). Evaluate search hits with care, because they may reflect an inherent ascertainment bias related to both the incomplete nature of the reference database used by RepeatMasker and the relative level of divergence between query and reference sequences (see Note 6).

8. Estimate the diversity of tandem repeats. Count the density of microsatellites or SSRs (see Note 7) in BAC-end sequences corresponding to specific length categories using the search options built into the online informatics tool Tandem Repeats Finder (39).
Algorithms used by Tandem Repeats Finder detect and score target elements independently of any underlying reference database; however, search results can be influenced by the alignment parameter settings used for a given query sequence.

9. Calculate the global density of repetitive elements in target genomes. Obtain genome-size information for target species, defined by haploid nuclear DNA content measured in picograms (C-value). Use C-values to calculate the fraction of each genome that is surveyed by paired end-sequences of clone inserts sampled from a genomic library. Obtain C-values either from direct experimental measurements (e.g., flow cytometry and buoyant density analysis) or from the literature. The Animal Genome Size Database (http://www.genomesize.com/) is a useful online reference for obtaining statistics on genome size and karyotype information from a wide diversity of organisms. For example, we can estimate the fraction of our target reptile species surveyed by BAC genome scanning using information publicly available on the Genome Size Database as follows: Alligator mississippiensis, 2.49 pg, 50 chromosomes; Chrysemys picta, 2.57 pg, 36 chromosomes; Anolis smaragdinus, 2.2 pg, 32 chromosomes.

Conversion of picograms to base pairs: 2.49 pg × (9.78 × 10^8 bp/pg) = 2.435 Gbp; 2.57 × (9.78 × 10^8) = 2.513 Gbp; 2.20 × (9.78 × 10^8) = 2.152 Gbp.

Estimated fractions of genomes surveyed: 2,519,551 bp alligator BAC-end seqs/2.435 Gbp = 0.1035% alligator genome sampled; 2,432,811 bp turtle BAC-end seqs/2.513 Gbp = 0.0968% turtle genome sampled; 1,358,158 bp Anolis plasmid end seqs/2.152 Gbp = 0.0631% Anolis genome sampled.

The total number of repeats per genome and its standard deviation can be estimated using the following formulas (see Note 8):

\[ \text{Total copies} = \frac{(\text{raw repeat counts}) \times (\text{genome size in bp})}{\text{size of sequence data set in bp}} \]

\[ \text{Standard deviation} = \pm \frac{\sqrt{\text{raw repeat counts}} \times (\text{genome size in bp})}{\text{size of sequence data set in bp}} \]

10. Estimate the evolutionary rates of word frequency change among lineages. Employ comparative methods to calculate the evolutionary rates of word-frequency change for specific branches on the amniote tree on the basis of the phylogenetic generalized least squares (PGLS) approach implemented in the PGLS-ancestor module of the software package COMPARE v. 4.6 (40,41) (see Note 9). As an example, we have used an NJ distance tree with the following published divergence time estimates (Myr) for major clades in the hypothesis assuming the existence of a common ancestor at 310 Myr: squamates, 245 (42); birds, 222 (43); crocodilians and turtles, 207 (42); and rodents and primates, 85 (44,45). Compare differential rates estimated along specific branches in the phylogeny in order to illuminate macroevolutionary trends in genomic change for particular clades. Evaluate the contribution of particularly active repetitive or oligonucleotide subsets of elements to changes in total genome composition by examining frequency change of individual elements along specific branches.

3.3. Targeted Studies of Genes and Genomic Neighborhoods

In addition to supporting genome scans to estimate the structure and complexity among large numbers of randomly selected sequences, BAC libraries can also be used to target and describe specific genes and their local genomic neighborhoods. To illustrate the steps of gene-targeting using a BAC library, this section describes the location and description of a BAC insert as completed for EE0.6, a heterochromatic marker found on the Z chromosome in emu, Dromaius novaehollandiae (46). In summary, an emu BAC library (see http://evogen.jgi.doe.gov/second_levels/BACs/BAC_library_stats/Dnova_stats.html) was screened with a radioactively labeled probe, EE0.6, and a single hybridizing clone was detected and isolated from the arrayed library for more targeted investigation by means of shotgun-subcloning, fragment sequencing, and contig assembly.
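Returning briefly to step 9 of Section 3.2, the genome-size conversion, the fraction of each genome surveyed, and the scaling of raw repeat counts to whole-genome estimates can be checked with a short Python sketch. The function names are hypothetical, the conversion factor of roughly 978 Mb per picogram is the one used in step 9, and the standard deviation assumes that raw repeat counts behave as Poisson counts (so the error scales with the square root of the count); the repeat count of 120 is a made-up value for illustration only.

```python
from math import sqrt

BP_PER_PG = 9.78e8  # approximate conversion: 1 pg of DNA is about 978 Mb


def genome_size_bp(c_value_pg):
    """Haploid genome size in base pairs from a C-value in picograms."""
    return c_value_pg * BP_PER_PG


def fraction_surveyed(sampled_bp, c_value_pg):
    """Fraction of the genome covered by the end-sequence data set."""
    return sampled_bp / genome_size_bp(c_value_pg)


def estimate_total_copies(raw_count, sampled_bp, c_value_pg):
    """Scale a raw repeat count to the whole genome.

    Returns (total copies, standard deviation); the standard deviation
    assumes Poisson-distributed counts, which is an assumption of this sketch.
    """
    scale = genome_size_bp(c_value_pg) / sampled_bp
    return raw_count * scale, sqrt(raw_count) * scale


# C-values (pg) and end-sequence data set sizes (bp) quoted in step 9.
surveys = {
    "alligator": (2.49, 2_519_551),
    "turtle": (2.57, 2_432_811),
    "anole": (2.20, 1_358_158),
}

for name, (c_value, sampled) in surveys.items():
    size_gbp = genome_size_bp(c_value) / 1e9
    pct = 100 * fraction_surveyed(sampled, c_value)
    print(f"{name}: {size_gbp:.3f} Gbp, {pct:.4f}% sampled")

# Hypothetical raw count of 120 repeats found in the alligator data set.
copies, sd = estimate_total_copies(120, 2_519_551, 2.49)
print(f"estimated copies: {copies:.0f} +/- {sd:.0f}")
```

The loop reproduces the genome sizes (2.435, 2.513, and 2.152 Gbp) and sampled fractions (0.1035%, 0.0968%, and 0.0631%) quoted in step 9.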
We present a detailed step-by-step protocol of this process below under items (1)–(11). The library screening and shotgun assembly of particular BAC clones can be used for comparative investigation of particular homologous functional elements within genomes in a manner that complements the phylogenomic analysis of BAC-enabled genome scans as outlined above. This example provides an illustration of the type of genome analysis provided by BAC clones that would be impossible to achieve with a simple plasmid library owing to the limited coverage of each clone in small-insert libraries. A flowchart summarizing targeted sequence analysis of a BAC clone is presented in Fig. 5.

Before a BAC library can be screened, nylon filters must be prepared with a gridded representation of the clones contained in the 384-well plates that hold the library. The emu BAC library, for example, contains 133,632 clones. The emu BACs are inserted into electrocompetent, DH10B T1-resistant Escherichia coli cells. Each clone is suspended in a separate well with LB broth (Miller), chloramphenicol, and glycerol.

1. Filter preparation from the emu BAC library. Remove 384-well plates from frozen storage (−80°C) and thaw for ~90 min. Grid each clone in the emu BAC library to a nylon filter with a Genetix Q-bot or other automated colony picker. Each filter represents clones from 48 384-well plates. Soak each filter in LB broth and 25 mg/mL chloramphenicol before gridding. After gridding, incubate filters at 37°C for 17 h on a large bioassay dish filled with 300 mL 1.5% LB agar and 12.5 μg chloramphenicol (25 mg/mL). Transfer each filter to filter paper on a glass dish, saturated with lysis buffer (2X SSC, 5% SDS). Incubate filters at room temperature for 3 min, microwave at maximum power for 3 min, and transfer to a Pyrex baking dish filled with proteinase K buffer and 10 μg/mL proteinase K. Cover the Pyrex dish with plastic wrap and incubate at 37°C for ~2 h with gentle rocking.
Once the filters are cleared of colony debris, rinse them in 2X SSC for 2 min, air-dry, and crosslink for 2 min. Label each filter with a fabric pen to indicate its orientation and the identity of the 48 plates gridded onto it. Store gridded filters dry at room temperature until their first use.

Fig. 5. Generation and analyses of bacterial artificial chromosome (BAC) sequence data. BAC libraries are stored in frozen media in 384-well plates. For gene targeting, a two-dimensional representation of the library is printed onto nylon filters. Sequence probes are hybridized to the filters to locate a BAC clone that contains the probe sequence. Once the BAC clone is selected, the BAC insert is sheared into fragments, shotgun-subcloned, sequenced, and assembled. Once assembled, BAC inserts provide raw data for large-scale sequence analyses of repeat density and synteny.

2. Probe generation. Generate the sequence of interest from genomic DNA by PCR. Ligate the DNA and transform it into XL-10 Gold competent E. coli cells. Amplify transformed clones by PCR using the original gene primers. Purify PCR products from clone DNA and store at a concentration of 25 ng/μL until the day of hybridization. Before hybridization, label 25 ng DNA with [α-32P]dCTP using the Prime-It II Random Primer Labeling Kit. Filter labeled probes through microspin columns to remove unincorporated nucleotides.

3. Filter prehybridization and hybridization. Denature 1.5 mL sonicated, nonhomologous herring sperm DNA by heating at 100°C for 10 min and then chilling on ice for 2 min. Denature the labeled probe DNA by the same method. Sandwich each filter between layers of hybridization mesh and roll them into lead-lined glass cylinders. Prehybridize filters by adding 75 mL hybridization buffer and 1.5 mL herring sperm DNA. Incubate filters, buffer, and herring sperm DNA with rolling at 65°C for 6 h.
After prehybridization, decant liquid from the cylinders and replace with 25 mL hybridization buffer, 0.5 mL denatured herring sperm DNA, and denatured probe DNA. Incubate cylinders with rolling at 65°C for 12 h.

4. Washing and autoradiography. Decant liquid from cylinders and replace with 25 mL 1X washing buffer. Incubate cylinders at 50°C for 15 min. Decant washing solution. Repeat washing twice. After three washes, blot excess buffer from individual filters with paper towels. Wrap each filter in plastic wrap and place, DNA side up, in a metal cassette under an undeveloped Biomax MR X-ray film. Store cassettes at −80°C for 1 wk.

5. Developing films and picking clones. After autoradiography, develop films with an X-omat film processor. Strip filters of radioactivity by saturating them with three solutions: stripping solution 1 for 20 min; stripping solution 2 for 10 min; and stripping solution 3 for 10 min. Wrap filters and store wet at −20°C. Positive hybridizations are indicated by two black spots on film. The two spots match the pattern in which each clone’s DNA was gridded to the filter by the Q-bot. To illustrate this technique, results from an autoradiograph of a BAC library screening assay of zebra finch (Taeniopygia guttata), screened with an Mhc class II probe, are shown in Fig. 6. By referencing the location of the spots and the plates gridded onto the filter, select clones from the BAC library to confirm that they contain the sequence of interest.

6. Growing clones for preliminary PCR survey. Pick each candidate BAC clone from its 384-well plate and incubate in 500 μL LB broth (Miller) and chloramphenicol at 37°C with shaking (250 rpm) for 17 h. PCR each clone with the original gene primers. Purify DNA from clones that produce a distinguishable band on an electrophoretic gel for Southern blotting.

7. Purification, restriction, and Southern blotting. Incubate the positive clone in 10 mL LB broth (Miller) and chloramphenicol at 37°C with shaking (250 rpm) for 18 h.
Decant culture into a 1.5-mL Eppendorf tube and centrifuge at 13,000 rpm for 5 min. Decant the LB and add more culture to the tube. Centrifuge the tube again. Continue decanting and centrifuging until the bacterial pellets from the entire 10 mL culture are contained in one tube. Resuspend pellets in 30 μL resuspension buffer and RNase A, and mix with 30 μL lysis buffer (Qiagen) and 30 μL neutralization buffer. Rock the solution gently several times, incubate on ice for 15 min, and centrifuge at 13,000 rpm for 15 min. Remove the supernatant carefully, mix it with 150 μL of isopropanol, and incubate at −20°C for 30 min. Centrifuge the tube at 13,000 rpm for 15 min and discard the supernatant. Mix the BAC DNA pellet with 30 μL 70% ethanol, rock gently, and centrifuge at 13,000 rpm for 5 min. After the ethanol is removed, air-dry the pellet and resuspend it in 10 μL TE buffer. Digest the BAC DNA with EcoRI and HindIII, and transfer it to a filter for hybridization with a labeled EE0.6 probe as previously described. Autoradiography provides additional support for the presence of the sequence of interest in the BAC clone’s insert.

8. Shearing. Once the target BAC clone has been detected and isolated by the above protocols, the shotgun assembly can be completed, beginning with shearing the insert and continuing through subcloning and contig assembly as described in steps (8)–(11) below. A cartoon summarizing details of the BAC shotgun sequencing and assembly process is shown in Fig. 7. Fragment 30 μg purified BAC DNA in (200 μL solution with a hydroshear. Estimate the concentration of sheared DNA by comparison to known concentrations of micrococcus in an electrophoretic gel.

Fig. 6. Signal of hybridization between DNA probe and a bacterial artificial chromosome (BAC)-gridded nylon filter. A radioactive or bioluminescent probe is hybridized to a nylon filter that is gridded with DNA from the clones in the library. The organization and location of hybridization signals (arrows pointing at two dots) indicate the identity of the clone that carries the probe sequence. The results shown are from American alligator (Alligator mississippiensis) and were screened with a DMRT1 probe. This filter illustrates how BAC clones can be localized to individual wells by the pattern of hybridization within each 4 × 4 spotted grid.

Fig. 7. Bacterial artificial chromosome (BAC) library shotgun-subcloning and sequencing. BAC inserts carry 100–200 kb of sequence. BAC inserts are sheared into smaller fragments and incorporated as smaller inserts in smaller vectors. The fragments can be sequenced and joined into contigs by bioinformatics software.

9. End repair and shotgun subcloning. After hydroshearing, fragmented DNAs have sticky ends that are incompatible with those of a cloning vector. To make the fragment ends uniform, blunt-end and end-repair 30 μg DNA. Purify hydrosheared DNAs with the Qiaquick kit and blunt-end by incubation at 120°C for 40 min with 100 μL blunting mix, followed by addition of 8 μL 0.5 M EDTA to each sample and incubation at 70°C for 10 min. After cleaning samples with the Qiaquick kit again, mix 21 μL DNA with 3 μL BstXI/EcoRI Adapter and 6 μL linking mix and incubate at 16°C for 17 h, 70°C for 10 min, and end at 4°C. Purify samples with the Qiaquick kit again and gel-extract DNAs between 4 and 7 kb for ligation. Purify gel extracts with the Qiaquick kit. Mix 15.5 μL of sample with 1 μL vector and 3.5 μL ligating mix and incubate at 16°C for 17 h, 70°C for 10 min, and end at 4°C. Mix 1 μL ligated sample with 100 μL thawed GC5 cells and incubate at 0°C for 40 min, 42°C for 15 s, and 0°C for 1 min. Transfer samples to 900 μL SOC medium, seal, shake, and incubate at 37°C for 1 h. Spread samples on large square bioassay dishes filled with LB agar (Miller) and chloramphenicol, invert, and incubate at 37°C for 17 h.
Pick resultant colonies with a Genetix QPix or other automated colony picker into 96-well deep-well plates filled with 300 μL of LB mix. Incubate colony plates at 37°C and shake at 250 rpm for 17 h. Centrifuge plates at 5000g at 12°C for 6 min. Decant supernatant and add 150 μL resuspension buffer and RNase A, 150 μL lysis buffer, and 150 μL neutralization buffer to each sample while shaking at 250 rpm. Centrifuge samples at 5000g at 23°C for 30 min. Fill a 96-well 800 μL uniplate with 290 μL of 99% isopropanol and cover with a 96-well 800 μL unifilter. Add 370 μL of sample supernatant to the unifilter. Seal the unifilter and centrifuge at 3000g at 14°C for 15 min. Discard the unifilter and the supernatant in the uniplate. Air-dry pellets, wash twice with 70% ethanol, vacuum air-dry, and resuspend in 15 μL TE buffer. Centrifuge uniplates at 500 rpm at room temperature for 2 min and incubate at room temperature for 1 h. Freeze samples at −20°C.

10. Sequencing and assembling contigs. Label samples with the Big Dye Terminator v3.1 Cycle Sequencing Kit using M13 primers and sequence them. Name the resulting sequence files according to the St. Louis naming scheme, which consists of three parts: (1) sample identity, (2) sequence direction (b for forward or g for reverse), and (3) unique sequence identity (e.g., 121.b.241). Construct contigs by analyzing sequence files with the Phred, Phrap, and Consed assembly software. Query Autofinish for suggested primers for closing gaps between contigs (32,47–49).

11. Bioinformatics. Query contigs produced from the BAC insert for RefSeq genes within relevant libraries (human, mouse, chicken, and so on) with the UCSC genome browser. Scan contigs for gene content with SeqHelp, Genscan, and GeneParser (50–52). Annotate alignments to relevant libraries with the Apollo software package (53). Measure repeat density of the BAC sequence with the Miropeats and RepeatMasker software packages (38,54).
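The three-part read names described in step 10 are easy to parse when sorting sequence files before assembly. The sketch below assumes names are dot-separated exactly as in the example 121.b.241, with b for forward and g for reverse reads as stated in that step. The `ReadName` type and `parse_read_name` function are hypothetical helpers written for illustration; they are not part of Phred, Phrap, or Consed.

```python
from typing import NamedTuple


class ReadName(NamedTuple):
    sample: str      # part 1: sample identity
    direction: str   # part 2: "forward" (b) or "reverse" (g)
    read_id: str     # part 3: unique sequence identity


# Direction codes as given in the naming scheme in the text.
_DIRECTIONS = {"b": "forward", "g": "reverse"}


def parse_read_name(name: str) -> ReadName:
    """Split a name such as '121.b.241' into its three parts."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected three dot-separated parts: {name!r}")
    sample, code, read_id = parts
    if code not in _DIRECTIONS:
        raise ValueError(f"unknown direction code {code!r} in {name!r}")
    return ReadName(sample, _DIRECTIONS[code], read_id)


print(parse_read_name("121.b.241"))
```

Rejecting malformed names early, rather than guessing, keeps mislabeled reads out of the assembly input.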
Measure polymorphism using the DnaSP software package (55). These software packages, among others, allow analyses of sequences generated from screening, subcloning, sequencing, and assembling BAC DNA. An example of annotations produced by the above methods for 41 kb of microchromosomal sequence from emu, Dromaius novaehollandiae, is illustrated in Fig. 8.

Fig. 8. Annotations of microchromosomal sequence from emu, Dromaius novaehollandiae. (A) The Apollo Genome Annotation and Curation Tool indicates conservation of sequence among queried species. The letters M, H, and C refer to the mouse, human, and chicken genomes to which 41 kb of microchromosomal emu sequence was compared in this query. (B) The UCSC genome browser aligned 16.8 kb of microchromosomal emu sequence to chromosome 17 in the chicken genome. Conserved sequences among other species and repeats are also noted in the UCSC output.

4. Notes

1. Our reptile studies surveyed large-insert libraries for the American alligator and painted turtle, constructed by colleagues at the NIH BAC Resource Network Production Center at the Benaroya Research Institute at Virginia Mason, Seattle, WA (www.benaroyaresearch.org). We also utilized sequences from a whole-genome plasmid library generated at the Washington University Genome Sequencing Center in St. Louis, MO (http://genome.wustl.edu/home.cgi).

2. Clone sequencing was conducted at the Institute for Genomic Research, Rockville, MD (http://www.tigr.org), and the Washington University Genome Sequencing Center, and followed the published protocols of Zhao and colleagues (1,10). A total of 4656 random clones were examined to produce a total of 8638 nonoverlapping, high-quality, edited paired-end sequence reads, yielding 2,519,551 bp of alligator, 2,432,811 bp of turtle, and 1,358,158 bp of anole original sequence for comparative phylogenomic analysis.
Among all reads, the average lengths were 769, 703, and 677 bp for alligator, turtle, and anole, respectively.

3. The pattern of nucleotide frequency observed in genome scans provides a means to estimate the global base composition of genomes and to infer hierarchical phylogenetic structure that may be reflected in compositional patterns among genome scans of target species. In particular, because guanine + cytosine (GC)-rich regions in RNA polymerase II promoter regions are essential for efficient transcription of genes in eukaryotes (56), the relative GC content and the distribution of GC-rich and GC-poor regions in genomes have become a standard index for evaluating the protein-coding components of genomes. GC content has been estimated indirectly by experimental methods, such as DNA ultracentrifugation and flow cytometry (29,57), but is increasingly analyzed directly with informatics tools and has been described in extensive detail within the context of whole-genome assemblies of model species (25,58–60). Moreover, analyzing the relationships between GC content and other organismal parameters, such as genome size, cell size, metabolic rate, and physiological constraints on life history, is an expanding line of investigation that illuminates the evolutionary dynamics and possible selective forces shaping genome structure (24,29,30,57,61).

4. We have used a statistical approach with genome scanning of alligator and turtle BAC-end sequences to estimate global GC values that are almost identical to those published based on buoyant densities in CsCl gradients and on flow cytometry (~42%; [29,57]), indicating that GC content is elevated in alligators and turtles relative to values derived from the chicken and human genome assemblies (~40%; [25,60]).

5.
Analysis of genomic signatures from birds has extended this phylogenetic approach to vertebrates and was shown to recover major branches in the avian tree, although phylogenetic relationships near the tips of the tree were clearly incongruent with those from traditional sequence analysis (62). Alternatively, unsupervised neural network algorithms known as self-organizing maps (SOMs) can be used to infer phylogenetic relationships based on oligonucleotide frequencies in unaligned genome sequences (63,64). Such SOMs are based on the frequencies of short (2 or 3 bp) oligonucleotides estimated from bulk sequence data and have been used to characterize the diversity of species present in environmental genomic samples (62,63,65). Although the signature approach remains exploratory, it provides valuable new ways of harvesting phylogenetic information from a wide variety of unalignable genomic sequences and has been shown to corroborate results of conventional analyses based on aligned sequences from homologous nucleotide positions (33,62). In general, when seeking to estimate phylogenetic relationships, we do not advocate relying upon the phenetic approach provided by signatures as a replacement for conventional methods of phylogenetic inference using a matrix of aligned nucleotides and established models of substitution, when available. However, we believe that novel, informatics-rich methods of inference such as genomic signatures provide an exciting option for phylogenetic analysis of higher order patterns of complexity inherent to genomic signatures, while at the same time providing insight into global patterns of genomic change not conditioned on a specific chromosome or gene region.

6. Ascertainment bias tends to provide a conservative underestimate of the true repeat diversity in target species, especially for highly derived novel TEs not annotated in model-organism genome assemblies.
To minimize this problem, there is growing interest in developing informatics tools that do not rely on reference sequences and that can directly detect structural components of certain classes of TEs, such as the tRNA-derived secondary structure in SINEs. Existing online informatics tools such as the tRNA-scan Search Server (http://rna.wustl.edu/GtRDB/Hs/Hs-align.html) and mfold (http://www.bioinfo.rpi.edu/applications/mfold/old/rna/form1.cgi) have been used successfully to identify novel families of phylogenetically useful retroelements in orders of placental mammals (66). The integration of scripting from text processing languages such as Perl (67) or Python (68) with the proliferation of online bioinformatics resources promises to greatly facilitate de novo discovery of phylogenetically diverse repetitive elements, in addition to relying upon referencing against annotated databases.

7. SSRs, or microsatellites, are tandemly repeated DNA motifs of 1–6 bp found commonly throughout eukaryotic genomes (69). SSRs are unstable and highly mutable and are thought to be primarily a result of polymerase slippage resulting in misalignment of reassociating strands during DNA replication (70), although other mechanisms such as unequal recombination and gene conversion have been characterized in detail (71,72). The balance between adding repeat units and removing them by mismatch repair enzyme machinery is a dynamic process that can influence genome evolution by contributing substantial genetic variation (73,74), introducing mutational bias (75,76), and altering transcriptional activity (73,77) as well as global oligonucleotide frequencies. Moreover, the frequency of SSR types in vertebrates is uneven (78,79) and overall SSR abundance has been shown to correlate with genome size (80).
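The definition of SSRs given in this note, tandem repeats of 1–6 bp motifs, translates into a simple pattern search of the kind that scripting languages make easy. The following Python sketch uses a regular expression with a backreference to find perfect repeats only. The threshold of four copies and the function name `find_ssrs` are illustrative assumptions rather than values taken from our analyses, and dedicated tools such as Tandem Repeats Finder (39) handle imperfect repeats that this sketch ignores.

```python
import re


def find_ssrs(seq, min_copies=4, max_motif=6):
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    The lazy quantifier makes the regex try the shortest motif first,
    so a run such as AAAAAA is reported as motif A, not AA or AAA.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Toy sequence (not real data) containing four copies of AC.
print(find_ssrs("TTGACACACACGT"))  # [(3, 'AC', 4)]
```

Start positions are 0-based, and ambiguous bases such as N break a repeat because they fall outside the `[ACGT]` character class.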
It is, therefore, of interest to examine the profile of SSRs in nonavian reptiles in an effort to understand the influence of repetitive elements on vertebrate genome evolution within a comparative framework.

8. To estimate the density of both repeats and n-letter word frequencies for genomic signature analysis, we assumed that, because these elements are rare events and are more or less uniformly distributed in the genome, the total number of repeats in the sampled region follows a Poisson distribution with rate Nr, where N is the total number of repeats in the genome and r is the relative fraction of total repeats contained in the sampled region. Using this approach, departures from the Poisson model associated with localized uneven distributions of elements should provide a conservative estimate of element abundance. For any genome scan study, sampling a limited number of clone ends may yield interspersed and tandem repeat estimates that are biased relative to whole-genome counts for both bioinformatic and experimental reasons.

9. The Martins–Hansen method (41) utilizes the topological information in a given phylogeny with branch lengths set proportional to time and the values of continuous characters for each species, be they oligonucleotide word frequencies, base composition, repeat abundance, or others. COMPARE then calculates rates as the regression slope for a generalized least squares model describing how well time predicts the variance between pairwise taxon divergence. Rates of evolution along specific branches of a genomic signature tree can be compared in terms of the change in value of any continuous character per million years when divergence times are known from the fossil record.

Acknowledgments

We thank Chris Amemiya, Tom Miyake, Robert Macey, Jeff Froula, Zhenshan Wang, Shaying Zhao, Jyoti Shetty, Marcia Lara, Jonathan Losos, Wes Warren, and Pat Minx for technical support with genomic library construction, cloning, and end-sequencing.
Charles Chapus, Chris Botka, Amir Karger, and Lakshmanan Iyer provided computational and informatics support. Jun Liu, Tingting Zhang, and Patrick Deschavanne contributed statistical expertise and help with data analysis. We thank John Wakeley, Mike Sorenson and members of the Edwards Lab for numerous helpful discussions, especially Nancy Rotzel and Chris Balakrishnan, for critical comment on the manuscript and assistance with illustrations. This work was produced in part with support from NSF grant IBN-0207870 to SVE and from Harvard University.

References

1. Zhao, S. and Stodolsky, M., eds (2004) Bacterial Artificial Chromosomes: Library Construction, Physical Mapping, and Sequencing, Humana, Totowa, NJ.
2. Sambrook, J. and Russell, D. W. (2001) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
3. Carvajal, J. J., Cox, D., Summerbell, D., and Rigby, P. W. (2001) A BAC transgenic analysis of the Mrf4/Myf5 locus reveals interdigitated elements that control activation and maintenance of gene expression during muscle development. Development 128, 1857–1868.
4. Giraldo, P. and Montoliu, L. (2001) Size matters: use of YACs, BACs and PACs in transgenic animals. Transgenic Res. 10, 83–103.
5. Heintz, N. (2000) Analysis of mammalian central nervous system gene expression and function using bacterial artificial chromosome-mediated transgenesis. Hum. Mol. Genet. 9, 937–943.
6. Takahashi, R., Ito, K., Fujiwara, Y., et al. (2000) Generation of transgenic rats with YACs and BACs: preparation procedures and integrity of microinjected DNA. Exp. Anim. 49, 229–233.
7. Amemiya, C. T., Zhong, T. P., Silverman, G. A., Fishman, M. C., and Zon, L. I. (1999) Zebrafish YAC, BAC and PAC genomic libraries. Methods Cell Biol. 60, 235–258.
8. Choi, S. and Kim, U.-J. (2001) Construction of a bacterial artificial chromosome library. In: Genomics Protocols (Starkey, M. P.
and Elaswarapu, R., eds), pp. 57–68, Humana, Totowa, NJ.
9. Zehetner, G., Pack, M., and Schäfer, K. (2001) Preparation and screening of high-density cDNA arrays with genomic clones. In: Genomics Protocols (Starkey, M. P. and Elaswarapu, R., eds), pp. 169–188, Humana, Totowa, NJ.
10. Zhao, S., Shatsman, S., Ayodeji, B., et al. (2001) Mouse BAC ends quality assessment and sequence analyses. Genome Res. 11, 1736–1745.
11. Gasper, J. S., Shiina, T., Inoko, H., and Edwards, S. V. (2001) Songbird genomics: analysis of 45 kb upstream of a polymorphic Mhc Class II gene in red-winged blackbirds (Agelaius phoeniceus). Genomics 75, 26–34.
12. Kim, C. B., Amemiya, C., Bailey, W., et al. (2000) Hox cluster genomics in the horn shark, Heterodontus francisci. Proc. Natl Acad. Sci. USA 97, 1655–1660.
13. Giribet, G., Edgecombe, G. D., and Wheeler, W. C. (2001) Arthropod phylogeny based on eight molecular loci and morphology. Nature 413, 157–161.
14. Madsen, O., Scally, M., Douady, C. J., et al. (2001) Parallel adaptive radiations in two major clades of placental mammals. Nature 409, 610–614.
15. Murphy, W. J., Eizirik, E., Johnson, W. E., et al. (2001) Molecular phylogenetics and the origins of placental mammals. Nature 409, 614–618.
16. Soltis, P. S., Soltis, D. E., and Chase, M. W. (1999) Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402, 402–404.
17. Edwards, S. V., Jennings, W. B., and Shedlock, A. M. (2005) Phylogenetics of modern birds in the era of genomics. Proc. R. Soc. Lond. Ser. B 272, 979–992.
18. Baker, R. J., Longmire, J. L., Maltbie, M., Hamilton, M. J., and VandenBussche, R. A. (1997) DNA synapomorphies for a variety of taxonomic levels from a cosmid library from the new world bat Macrotus waterhousii. Syst. Biol. 46, 579–589.
19. Rokas, A. and Holland, P. W. H. (2000) Rare molecular changes as a tool for phylogenetics. Trends Ecol. Evol. 15, 454–459.
20. Shedlock, A. M., Milinkovitch, M.
C., and Okada, N. (2000) SINE evolution, missing data, and the origin of whales. Syst. Biol. 49, 808–817.
21. Shedlock, A. M. and Okada, N. (2000) SINE insertions: powerful tools for molecular systematics. Bioessays 22, 148–160.
22. Shedlock, A. M., Takahashi, K., and Okada, N. (2004) SINEs of speciation: tracking lineages with retroposons. Trends Ecol. Evol. 19, 545–553.
23. Sarre, S., Georges, A., and Quinn, A. (2004) The ends of a continuum: genetic and temperature-dependent sex determination in reptiles. Bioessays 26, 639–645.
24. Waltari, E. and Edwards, S. V. (2002) Evolutionary dynamics of intron size, genome size, and physiological correlates in archosaurs. Am. Nat. 160, 539–552.
25. Shedlock, A. M., Botka, C. W., Zhao, S., et al. (2007) Phylogenomics of non-avian reptiles and the structure of the ancestral amniote genome. Proc. Natl Acad. Sci. USA 104, 2767–2772.
26. Shedlock, A. M. (2006) Phylogenomic investigation of CR1 LINE diversity in reptiles. Syst. Biol. 55, 902–911.
27. Organ, C. L., Shedlock, A. M., Meade, A., Pagel, M., and Edwards, S. V. (2007) Origin of avian genome size and structure in non-avian dinosaurs. Nature 446, 180–184.
28. Shedlock, A. M. (2006) Exploring frontiers in the DNA landscape: An introduction to the symposium “Genome analysis and the molecular systematics of retroelements”. Syst. Biol. 55, 871–874.
29. Vinogradov, A. E. (1998) Genome size and GC-percent in vertebrates as determined by flow cytometry: the triangular relationship. Cytometry 31, 100–109.
30. Olmo, E. (1986) Animal Cytogenetics, Vol. 4: Chordata, No. 3A: Reptilia. Gebrüder Borntraeger, Berlin.
31. Blanchette, M., Green, E. D., Miller, W., and Haussler, D. (2004) Reconstructing large regions of an ancestral mammalian genome in silico. Genome Res. 14, 2412–2423.
32. Ewing, B. and Green, P. (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194.
33. Chapus, C., Dufraigne, C., Edwards, S. V., et al.
(2005) Exploration of phylogenetic data using a global sequence analysis method. BMC Evol. Biol. 5, 63.
34. Deschavanne, P., Giron, A., Vilain, J., Fagot, G., and Fertil, B. (1999) Genomic signature: characterization and classification of species assessed by chaos game representation of sequences. Mol. Biol. Evol. 16, 1391–1399.
35. Karlin, S. and Ladunga, I. (1994) Comparisons of eukaryotic genomic sequences. Proc. Natl Acad. Sci. USA 91, 12,832–12,836.
36. Saitou, N. and Nei, M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425.
37. Felsenstein, J. (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783–791.
38. Smit, A. F. A., Hubley, R., and Green, P. (2004) RepeatMasker Open-3.0.5 (http://www.repeatmasker.org).
39. Benson, G. (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580.
40. Martins, E. P. (2004) COMPARE, version 4.6b. Computer programs for the statistical analysis of comparative data. Distributed by the author at http://compare.bio.indiana.edu/. Department of Biology, Indiana University, Bloomington IN.
41. Martins, E. P. and Hansen, T. F. (1997) Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into analysis of interspecific data. Am. Nat. 149, 646–667.
42. Hedges, S. B. and Poling, L. L. (1999) A molecular phylogeny of reptiles. Science 283(5404), 998–1001.
43. Kumar, S. and Hedges, S. B. (1998) A molecular timescale for vertebrate evolution. Nature 392, 917–920.
44. Hasegawa, M., Thorne, J. L., and Kishino, H. (2003) Time scale of eutherian evolution estimated without assuming a constant rate of molecular evolution. Genes Genet. Syst. 78, 267–283.
45. Springer, M. S., Murphy, W. J., Eizirik, E., and O’Brien, S. J. (2003) Placental mammal diversification and the Cretaceous–Tertiary boundary.
Proc. Natl Acad. Sci. USA 100(3), 1056–1061.
46. Ogawa, A., Murata, K., and Mizuno, S. (1998) The location of Z- and W-linked marker genes and sequence on the homomorphic sex chromosomes of the ostrich and the emu. Proc. Natl Acad. Sci. USA 95(8), 4415–4418.
47. Gordon, D. C. (2004) Viewing and editing assembled sequences using Consed. In: Current Protocols in Bioinformatics (Baxevanis, A. D. and Davison, D. B., eds), Wiley, New York.
48. Gordon, D., Desmarais, C., and Green, P. (2001) Automated finishing with Autofinish. Genome Res. 11(4), 614–625.
49. Gordon, D., Abajian, C., and Green, P. (1998) Consed: a graphical tool for sequence finishing. Genome Res. 8(3), 195–202.
50. Lee, M. K., Lynch, E. D., and King, M. C. (1998) SeqHelp: a program to analyze molecular sequences utilizing common computational resources. Genome Res. 8(3), 306–312.
51. Burge, C. and Karlin, S. (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268(1), 78–94.
52. Snyder, E. E. and Stormo, G. D. (1995) Identification of protein-coding regions in genomic DNA. J. Mol. Biol. 248(1), 1–18.
53. Lewis, S. E., Searle, S. M. J., Harris, N., et al. (2002) Apollo: a sequence annotation editor. Genome Biol. 3, research0082.1–14.
54. Parsons, J. D. (1995) Miropeats: graphical DNA sequence comparisons. Comput. Appl. Biosci. 11, 615–619.
55. Rozas, J., Sanchez-DelBarrio, J. C., Messeguer, X., and Rozas, R. (2003) DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19(18), 2496–2497.
56. Watson, J. D., Hopkins, N. H., Roberts, J. W., Steitz, J. A., and Weiner, A. M. (1987) Molecular Biology of the Gene, 4th Ed., Benjamin Cummings, Menlo Park.
57. Hughes, S., Clay, O., and Bernardi, G. (2002) Compositional patterns in reptilian genomes. Gene 295, 323–329.
58. Jaillon, O., Aury, J.-M., Brunet, F., et al. (2004) Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype.
Nature 431(7011), 946–957.
59. Rat Genome Sequencing Project Consortium. (2004) Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428, 493–521.
60. International Chicken Genome Sequencing Consortium. (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432(7018), 695–716.
61. Hughes, A. and Piontkivska, H. (2005) DNA repeat arrays in chicken and human genomes and the adaptive evolution of avian genome size. BMC Evol. Biol. 5(1), 12.
62. Edwards, S. V., Fertil, B., Giron, A., and Deschavanne, P. (2002) A genomic schism in birds revealed by phylogenetic analysis of DNA strings. Syst. Biol. 51(4), 599–613.
63. Abe, T., Kanaya, S., Kinouchi, M., et al. (2003) Informatics for unveiling hidden genome signatures. Genome Res. 13, 693–702.
64. Dopazo, J. and Carazo, J. M. (1997) Phylogenetic reconstruction using an unsupervised growing neural network that adopts the topology of a phylogenetic tree. J. Mol. Evol. 44, 226–233.
65. Uchiyama, T., Abe, T., Ikemura, T., and Watanabe, K. (2005) Substrate-induced gene-expression screening of environmental metagenome libraries for isolation of catabolic genes. Nat. Biotechnol. 23, 88–93.
66. Churakov, G., Smit, A. F. A., Brosius, J., and Schmitz, J. (2004) A novel abundant family of retroposed elements (DAS-SINEs) in the nine-banded armadillo (Dasypus novemcinctus). Mol. Biol. Evol. 22, 886–893.
67. Wall, L., Christiansen, T., and Orwant, J. (2000) Programming Perl, 3rd Ed., O’Reilly Media, Cambridge.
68. Lutz, M. (2001) Programming Python, 2nd Ed., O’Reilly Media, Cambridge.
69. Tautz, D. and Renz, M. (1984) Simple sequences are ubiquitous repetitive components of eukaryotic genomes. Nucleic Acids Res. 12, 4127–4138.
70. Richards, R. I. and Sutherland, G. R. (1994) Simple repeat DNA is not replicated simply. Nat. Genet. 6, 114–116.
71. Almeida, P. and Penha-Gonçalves, C.
(2004) Long perfect dinucleotide repeats are typical of vertebrates, show motif preferences and size convergence. Mol. Biol. Evol. 21, 1226–1233.
72. Majewski, J. and Ott, J. (2000) GT repeats are associated with recombination on human chromosome 22. Genome Res. 10, 1108–1114.
73. Kashi, Y., King, D. C., and Soller, M. (1997) Simple sequence repeats as a source of quantitative genetic variation. Trends Genet. 13, 74–78.
74. Tautz, D., Trick, M., and Dover, G. (1986) Cryptic simplicity in DNA is a major source of genetic variation. Nature 322, 652–656.
75. Amos, W., Sawcer, S. J., Feakes, R. W., and Rubinsztein, D. C. (1996) Microsatellites show mutational bias and heterozygote instability. Nat. Genet. 13, 390–391.
76. Primmer, C. R., Saino, N., Møller, A. P., and Ellegren, H. (1996) Directional evolution in germline microsatellite mutations. Nat. Genet. 13, 391–393.
77. Gerber, H. P., Seipel, K., Georgiev, O., et al. (1994) Transcriptional activation modulated by homopolymeric glutamine and proline stretches. Science 263, 808–811.
78. Tóth, G., Gáspári, Z., and Jurka, J. (2000) Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res. 10, 967–998.
79. Primmer, C. R., Raudsepp, T., Chowdhary, B. P., Møller, A. P., and Ellegren, H. (1997) Low frequency of microsatellites in the avian genome. Genome Res. 7(5), 471–482.
80. Hancock, J. M. (1996) Simple sequence repeats and the expanding genome. Bioessays 18, 421–425.
81. Burt, D. W. (2002) Origin and evolution of avian microchromosomes. Cytogenet. Genome Res. 96(1–4), 97–112.
82. Epplen, J. T., Diedrich, U., Wagenmann, M., Schmidtke, J., and Engel, W. (1979) Contrasting DNA sequence organisation patterns in sauropsidian genomes. Chromosoma 75, 199–214.
83. Gregory, T. R. (2001) Animal Genome Size Database (http://www.genomesize.com).
84. King, M., Honeycutt, R., and Contreras, N.
(1986) Chromosomal repatterning in crocodiles: C, G, and N-banding and in situ hybridization of 18S and 26S rRNA cistrons. Genetica 27, 191–201.