7
Amniote Phylogenomics: Testing Evolutionary
Hypotheses with BAC Library Scanning
and Targeted Clone Analysis of Large-Scale
DNA Sequences from Reptiles
Andrew M. Shedlock, Daniel E. Janes, and Scott V. Edwards
Summary
Phylogenomics research integrating established principles of systematic biology and
taking advantage of the wealth of DNA sequences being generated by genome science
holds promise for answering long-standing evolutionary questions with orders of magnitude
more primary data than in the past. Although it is unrealistic to expect whole-genome
initiatives to proceed rapidly for commercially unimportant species such as reptiles, practical
approaches utilizing genomic libraries of large-insert clones pave the way for a phylogenomics of species that are nevertheless essential for testing evolutionary hypotheses within
a phylogenetic framework. This chapter reviews the case for adopting genome-enabled
approaches to evolutionary studies and outlines a program for using bacterial artificial chromosome (BAC) libraries or plasmid libraries as a basis for completing “genome scans” of
reptiles. We have used BACs to close a critical gap in the genome database for Reptilia,
the sister group of mammals, and present the methodological approaches taken to achieve
this as a guideline for designing similar comparative studies. In addition, we provide a detailed
step-by-step protocol for BAC-library screening and shotgun sequencing of specific clones
containing target genes of evolutionary interest. Taken together, the genome scanning and
shotgun sequencing techniques offer complementary diagnostic potential and can substantially increase the scale and power of analyses aimed at testing evolutionary hypotheses for
nonmodel species.
Key Words: Amniote; Reptilia; BAC library; genome scan; genomic signature;
retroelement; simple sequence repeat; sex-linked marker; shotgun cloning; EE0.6.
From: Methods in Molecular Biology: Phylogenomics
Edited by: W. J. Murphy © Humana Press Inc., Totowa, NJ
Shedlock et al.
1. Introduction
1.1. Genome-Enabled Phylogenetics and BAC Libraries
Completion of the human and other genome projects constitutes a leap in
the scale on which the genome can be organized and studied. However, the technology that makes producing whole-genome assemblies feasible will likely be
more difficult to transfer to other disciplines in biology, particularly the evolutionary sciences, than other more accessible DNA diagnostics have been, most
notably the polymerase chain reaction. Nevertheless, comparative
approaches and forays into nonmodel species are already being made by evolutionary biologists in an effort to integrate genome science with established
principles of systematics, forging the exciting new area of interdisciplinary
research aptly coined phylogenomics. Some would argue that merely having
genomic resources for single representatives of, say, fish, mammals, and birds
is sufficient to broaden genomics to include evolutionary biology. Furthermore,
the cost of genome projects, at least in the short term, is still high enough to
prohibit easy transfer of even basic genome resources to the community of evolutionary biologists. The rise of national Genome Centers and the increasing
interest of such centers in tackling questions in nonmodel species is also strong
evidence that the biology of the 21st century will be much more team-,
organization-, and resource-driven than was biology in earlier decades. Indeed,
given existing technologies, it is clear that systems-level biology even on single
species requires not only advanced technologies and robotics not readily available to single-PI research programs, but also large numbers of staff and infrastructure
for coordinating activities.
We believe that, despite the logistical challenges to scaling up the size of
comparative DNA sequence studies by orders of magnitude, evolutionary biology
needs to embrace genomic technology. In particular, the use of genomic libraries
such as those composed of bacterial artificial chromosome (BAC) clones is one
of several excellent ways to do this. BAC libraries are large-insert genomic
libraries that are currently the optimal starting point for large-scale analysis of
genome evolution in eukaryotes (1). Typically BAC clones can faithfully propagate fragments of DNA on average ~150 kb in length. They have a number of
advantages over earlier types of genomic libraries, such as lambda and cosmid
libraries, which can accommodate only smaller insert sizes. Although yeast
artificial chromosome (YAC) vectors can hold much larger inserts, they are
much less stable than BACs: BACs are much less susceptible to inter- and intraclone recombination owing to their low copy number (one to two copies per E. coli cell) and
the presence of genes ensuring faithful propagation and passage to daughter
cells during cell division (2). For these reasons and their ease of manipulation,
growth, and isolation of clones, BAC libraries form the current backbone
of many efforts to sequence complete genomes, including the human genome.
Furthermore, through the use of homologous recombination to modify BACs
with reporter constructs and their use as substrates for transgenesis (both within
species and between species), many new directions in functional biology are
opened through BAC libraries (3–6).
The details of BAC library construction are extensive and go well beyond
the scope of the present chapter (7,8). We therefore do not include protocols
for BAC library production here but rather focus on the use of such libraries for
conducting comparative studies. BAC libraries are typically produced by and
can be accessed in collaboration with laboratories equipped with semiautomated
accessories and robotics optimized for producing and managing BAC resources,
exemplified by Production Centers affiliated with the U.S. National Human
Genome Research Institute’s BAC Resource Network (NHGRI; http://www.genome.gov). In addition to the BAC libraries themselves, one of the most important advancements in eukaryotic genome science is the adoption of standardized
methods of archiving of libraries and clones in microtiter plates, with one clone
per well, thereby ensuring their long-term survival and utility to the scientific
community (9). Such protocols represent a vast improvement over earlier bulk
storage methods, which frequently result in significant loss of clones in the long
term. In particular, we have benefited from collaboration with and protocols
developed by Amemiya and colleagues (7) and from protocols and applications
reviewed extensively by Zhao (1,10).
The availability of BAC libraries from an expanding diversity of nonmodel
species provides an ideal resource to increase access to genome-scale science
by comparative biology. Although it is impractical to expect broad comparative
studies to proceed rapidly for nonmodel organisms on a whole-genome basis,
it is attractive and far less expensive to estimate the structure of poorly known
genomes by mining BAC libraries from diverse taxa essential for testing numerous
hypotheses within a phylogenetic context.
First, BAC libraries are in principle applicable to any organism with sufficient
high-quality genomic DNA. Thus, the diversity of nonmodel species and those
which cannot be maintained in captivity are all potential targets for BAC libraries.
Second, in conjunction with ancillary methods of shotgun cloning and sequencing,
BAC libraries provide access to an order-of-magnitude increase in the amount
of DNA sequence data that can be brought to bear on problems of systematics
and molecular evolution. Shotgun sequencing approaches are still relatively
uncommon in nonmodel vertebrates (11,12). Despite recent substantial increases
in the amount of sequence data devoted to questions of systematics in a variety
of clades (13–16), BAC libraries offer yet larger increases that could resolve
trees even further (17). More importantly, BAC-enabled studies can provide
a genuinely genomic window into molecular evolutionary processes, revealing
on a large scale the vast array of non-nucleotide types of molecular variation,
such as indels, duplications, and rearrangements, that hold considerable promise
for molecular systematics but have thus far been underexploited (18–22).
Other key trends in evolutionary biology for which BACs are relevant include
the positional cloning of candidate genes for phenotypic traits, enhancement
of QTL mapping efforts, population genetics, chromosome evolution, the role
of gene regulation in evolution, and the evolution of multigene families. As we
discuss below, these research directions in the Reptilia lag far behind similar
studies in mammals, partly because of the lack of suitable molecular reagents
with which to tackle these problems. With the large database on developmental
genetics, immunogenetics, and genome and mapping efforts provided by the
chicken, these fields are now poised to be extended into a comparative framework as the relevant tools for accessing genes and gene families, especially BAC
libraries, continue to become available.
It is also important to note that several of the genome scanning approaches
we suggest can be implemented with genome libraries other than BAC libraries.
For example, simple plasmid libraries with 2–7 kb inserts can provide a rich
resource for examining many of the same issues as can be analyzed using BAC
libraries. However, plasmid library analysis will incur some important limitations
compared to BACs, such as the inability to perform downstream hybridizations
of selected BACs to chromosomes using FISH, or to delimit large chromosomal
regions with simple paired-end sequences of clones using comparative bioinformatics tools. Thus, although plasmid libraries will in principle yield the same
type of multimegabase sequence data sets from end-reads as will BACs, the uses
to which these can be put are more limited than with BACs.
1.2. Reptile Phylogenomics and the Amniote Ancestor
Reptiles are a critical group of vertebrates for understanding the evolutionary
dynamics of amniote genome evolution (Fig. 1). Reptiles are far more taxonomically diverse (~17,000 vs. ~4,500 species), and arguably more developmentally
and chromosomally diverse than mammals. They also exhibit a diversity of
environmental and genetic sex determining mechanisms relative to mammals
(23). Comparing data from the chicken genome to that of mammals in the
absence of relevant outgroup information from nonavian reptiles has limited
our ability to accurately infer the genomic condition of our common amniote
ancestor. Recent attempts to close this large gap in the comparative genomics
literature have tested alternative models of amniote genome evolution summarized by Waltari and Edwards (24). Results from phylogenomic analysis of
multi-megabase large clone insert sequence from exemplar non-avian species
indicate that the ancestral amniote likely had a relatively large genome with a
diverse repetitive landscape and GC content similar to that observed for many
mammals, and that this genome underwent a series of sequential size reductions in the lineage
leading to birds (25,26). Moreover, integrating paleontological data on bone cell
size with that for genome size and interspersed repeat abundance in extant amniote
species has revealed that the substantial reduction of noncoding elements
leading to a streamlining of the chicken genome likely evolved in theropod
dinosaurs ~130 Myr before the origin of avian flight (27). The largely uncharted
genomic landscape of squamate reptiles remains an important open avenue
for exploring the diversity of amniote genome structure and for developing a
predictive theory of genome evolution (28). Table 1 outlines some of the major
genomic features for three nonavian reptile focal species as well as chicken
and human for comparison. Genomic clone libraries for the American alligator (Alligator mississippiensis), turtle (Chrysemys picta), and anole (Anolis
smaragdinus) have been analyzed within a phylogenetic context using methods
of investigation outlined in detail below (25).
Fig. 1. Diagram summarizing the conventional view of amniote relationships. The distribution
of genome sizes, measured in picograms, for each lineage is represented by bars over branch
tips. The question mark and dotted arrow indicate a growing body of molecular evidence
suggesting turtles are more derived among reptilian clades than their traditional basal
position illustrated here.
1.3. Historical Genome Dynamics
Two major goals of our phylogenomic approach have been to synthesize a
model of genome evolution in reptilian clades over the past 310 Myr of vertebrate
evolution and to infer ancestral conditions in the amniote common ancestor.
Because we have gleaned genome statistics from unaligned sequence data,
many of the genome characters we seek to understand are continuous variables,
as opposed to the discrete characters provided by aligned DNA sequences. For
this reason, the approaches we have used are somewhat different from those
used to calculate ancestral sequence states for whole mammalian chromosomes
(e.g., 31). Our approach integrates estimates of genome size, global base
composition, abundance and diversity of repetitive elements, and phylogenetic
relationships among species-specific signatures of higher order sequence complexity. In particular, this can be achieved by (1) mapping the phylogenetic
distribution of retroelements which are known to modulate genome size; (2)
mapping the phylogenetic distribution of particular simple sequence repeat
(SSR) subclasses, which may affect global base composition; and (3) calculating rates of frequency change in DNA words along branches of a phylogeny
derived from genomic signatures. Taken together these three provide a means
for sketching the history of genome dynamics among lineages being investigated
by genome scanning.
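As a rough illustration of what a genomic signature of DNA words involves, the following Python sketch counts n-letter words with a 1-bp sliding window, subtracts the frequency expected from single-letter base composition, and computes a Euclidean distance between two signatures, following the description given under Subheading 3.2 (steps 5 and 6). It is a minimal sketch and not the software used in the cited studies; the function names and the choice to skip windows containing ambiguous bases are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"

def word_frequencies(reads, n):
    """Relative frequency of every n-letter DNA word across a set of reads.

    Words are counted with a 1-bp sliding window within each read, so no
    window spans two different end-reads. Windows containing anything other
    than A, C, G, or T are skipped (an assumption of this sketch).
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - n + 1):
            word = read[i:i + n]
            if all(base in BASES for base in word):
                counts[word] += 1
    total = sum(counts.values())
    words = ("".join(p) for p in product(BASES, repeat=n))
    return {w: (counts[w] / total if total else 0.0) for w in words}

def normalized_signature(reads, n):
    """Observed word frequency minus the frequency expected from base composition."""
    observed = word_frequencies(reads, n)
    base = word_frequencies(reads, 1)
    return {w: observed[w] - math.prod(base[b] for b in w) for w in observed}

def signature_distance(sig_a, sig_b):
    """Euclidean distance: square root of the summed squared frequency differences."""
    return math.sqrt(sum((sig_a[w] - sig_b[w]) ** 2 for w in sig_a))
```

With n = 7 each signature holds 4^7 = 16,384 word frequencies. A matrix of pairwise distances from `signature_distance` could then be passed to any neighbor-joining implementation.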
2. Materials
2.1. Manipulating and Screening BAC Libraries
1. Electrocompetent, DH10B T1-resistant Escherichia coli cells (cat. no. 12033015,
Invitrogen).
2. LB broth (Miller).
3. Nylon filters.
4. Chloramphenicol.
5. Glycerol.
6. LB agar.
7. Lysis buffer solution: 2X SSC, 5% SDS.
Table 1
Some General Features of Reptilian and Human Genomes
                          Alligator  Turtle  Chicken  Anole  Human
Genome size (pg)          2.49       2.57    1.25     2.2    3.5
Haploid chromosome no.    16         ~25     39       ~18    23
Microchromosomes?         No         Yes     Yes      Yes    No
Sources: refs. 29,30,81–84.
8. Proteinase K buffer solution: 50 mM Tris (pH 8), 50 mM EDTA, 100 mM NaCl,
1% N-lauryl sarcosine.
9. 10 μg/mL proteinase K.
10. 2X SSC.
11. Fabric pen (cat. no. PN1, Cleaner’s Supply).
12. XL-10 Gold competent E. coli cells (cat. no. 200315, Stratagene).
13. (γ-32P)-dCTP.
14. Prime-It II Random Primer Labeling Kit (cat. no. 300385, Stratagene).
15. Microspin columns (cat. no. 27-5120-01, GE Healthcare).
16. Sonicated, nonhomologous herring sperm DNA (cat. no. D1815, Promega).
17. Hybridization mesh (cat. no. RPN2519, GE Healthcare).
18. Hybridization buffer solution: 18.75 mL 20X SSPE, 3.75 mL 100X Denhardt’s
solution, 3.75 mL 10% (w/v) SDS, and 48.75 mL H2O.
19. 1X washing buffer solution: 935 mL H2O, 50 mL 20X SSC, 5 mL 20% SDS, 10 mL
5% pyrophosphate.
20. Metal cassette (cat. no. S-14, Spectronics).
21. Biomax MR X-ray film (cat. no. 8567232, Kodak).
22. X-omat film processor (cat. no. 1000A, Kodak).
23. Stripping solution 1: 0.2 N NaOH.
24. Stripping solution 2: 0.1 M Tris-HCl (pH 7.5), 0.1X SSC, 0.1% (w/v) SDS.
25. Stripping solution 3: 0.1X SSC, 0.1% (w/v) SDS.
26. Resuspension buffer (cat. no. 19051, Qiagen).
27. RNase A (cat. no. 19101, Qiagen).
28. Lysis buffer (cat. no. 19052, Qiagen).
29. Neutralization buffer (cat. no. 19053, Qiagen).
30. Isopropanol.
31. 70% ethanol.
32. TE buffer.
33. Restriction enzymes (EcoRI and HindIII).
34. Hydro-shear (cat. no. JHSH000000-1, Genomic Solutions).
35. Micrococcus (cat. no. 159972, MP Biomedicals).
36. Qiaquick kit (cat. no. 28106, Qiagen).
37. Blunting solution for 10 samples: 516.67 μL 5X T4 DNA polymerase buffer
(cat. no. M0203L, New England Biolabs), 258.33 μL BSA (1 mg/mL; cat. no.
B9001S, New England Biolabs), 258.33 μL dNTPs (1 mM; cat. no. 10766020,
Roche), 193.75 μL H2O, 64.58 μL T4 DNA polymerase (cat. no. M0203L, New
England Biolabs).
38. 0.5 M EDTA.
39. BstXI/EcoRI Adapter (cat. no. N41818, Invitrogen).
40. Linking solution for 10 samples: 38.75 μL 10X T4 DNA ligase buffer (cat. no.
M0202L, New England Biolabs), 38.75 μL T4 DNA ligase (cat. no. M0202L, New
England Biolabs).
41. Vector (cat. no. A362A, Promega).
42. Ligating solution for 10 samples: 23.3 μL 10X T4 DNA ligase buffer, 17.5 μL T4
DNA ligase.
43. GC5 cells (cat. no. 62-7000-16W, PGC Scientific).
44. SOC medium (cat. no. S1625, Teknova).
45. LB solution: 1.8 L LB broth (Miller), 3 mL chloramphenicol, 0.2 L glycerol.
46. 96-well 800 μL uniplate (cat. no. 7701-1800, Whatman).
47. 96-well 800 μL unifilter (cat. no. 7700-1806, Whatman).
48. Big Dye Terminator v3.1 Cycle Sequencing Kit (cat. no. 4336917, Applied Biosystems).
49. M13 primers.
3. Methods
3.1. Comparative Investigation of Genomic Libraries
Phylogenomic analysis of BAC-libraries can be divided into two general
categories of investigation: (1) characterization of genome structure and comparative analysis of multimegabase sequence data based on surveys of paired-end
reads of clones sampled randomly from each genome; and (2) targeted studies
of genes and genomic neighborhoods based on protocols aimed at characterizing
particular loci of interest from a small subset of selected clones. These two
approaches to interrogating the library complement each other in terms of their
sampling designs, analysis of primary data, and scales of inference about genome
evolution. Although genome scanning of BAC-ends provides a statistical estimate of the overall structure of the genomes investigated, it does not typically
allow for direct comparisons of particular genes of interest nor does it generate
homologous, alignable sequences from specific regions of the genome which can
be analyzed by conventional systematic methods. Conversely, although targeted
BAC assays will allow for investigation of particular genes of interest, they can
provide little inference about the genome-wide distribution of genomic elements
or global base composition and historical dynamics of genome evolution among
the species being investigated. Taken together the two lines of investigation can
provide a synergy of understanding about global patterns of genome dynamics
as well as fine-scale evolutionary analysis of particular chromosomal regions in
the context of genomic neighborhoods. Each of these two experimental approaches
is summarized below with examples from investigating reptilian genomes within
a comparative framework.
3.2. Genome Scanning of Nonavian Reptiles
The following sections outline basic steps taken in completing a BAC-enabled
phylogenomic analysis of major lineages of Reptilia. The experimental strategy
exploits the relative ease with which a BAC library can be used to generate a
multimegabase dataset of nonoverlapping nucleotide sequences sampled randomly
from throughout a genome to complete a so-called “genome scan.” Although
we have applied this approach to investigating the phylogenomics of reptiles
and the structure of the ancestral amniote genome, the methods outlined here
serve as a guideline for similar investigations that may test evolutionary hypotheses regarding other nonmodel species using large-scale comparative sequence
analysis. Completing a phylogenomic study based on genome scans of BAC
libraries relies heavily on existing genomic resources, informatics tools, access
to a diversity of online genomic databases, and an integration of computational
methods aimed at testing evolutionary hypotheses within a phylogenetic framework. A flowchart summarizing the basic steps and integration of methods
we have employed to investigate amniote genome evolution is illustrated in
Fig. 2 (see Note 1).
1. Selection of vector primers. If you are creating your own genomic library, design
the vector sequence of each individual clone to contain both forward and reverse
priming sites near the point of fragment insertion. If you are using an existing
library, determine the sequence priming sites of vectors used to build the library.
Standard vector sequencing primers include commercially available forward and
reverse oligos such as M13 and T7 which can be used to sequence ~500–1000 bp
of original genomic sequence along each end of the clone insert.
2. Sequence nonoverlapping clone inserts. Use forward and reverse vector primers
to gather paired end-sequences from randomly selected clones in each library. Set
up sequencing PCR reactions in 96-well format with ABI Big Dye cycle sequencing
kits and gather primary data using a high-throughput capillary array DNA analyzer
such as the model ABI 3730 (see Note 2). Compile paired end-reads for thousands
of clones isolated from libraries of target species to produce a multimegabase
data set of genomic sequence sampled randomly across each genome. A cartoon
summarizing the genome scanning process for a BAC library is shown in Fig. 3,
with several targeted downstream applications illustrated.
3. Screen end-sequences for poor base calling and contamination. Employ the Phred
base calling software program (32) and exclude poor reads with quality value (QV)
scores of less than 20. Identify and remove vector sequences from the data set using
the NCBI tool VecScreen (http://www.ncbi.nlm.nih.gov/projects/VecScreen/).
4. Evaluate base composition. Derive statistical estimates of global GC content with
a 95% confidence interval from observed GC values in multimegabase data compiled from BAC-end sequences distributed across the genome (see Note 3). Take into
account any intragenomic variation in GC distribution owing to genomic features
such as isochore structure and CpG islands as well as uneven local GC distributions
present in given sequence reads. Do this by checking for autocorrelation of bases
up to 50–100 nt away from a focal base and assuming a model in which the GC
value for each read follows a binomial distribution and the GC probabilities of the
reads are independently and identically distributed with an unknown density
(see Note 4).
Fig. 2. Flowchart summarizing the series of basic steps taken in completing a BAC-enabled
genome scan study. Paired BAC-end sequences are compiled into large-scale datasets and
examined for structural features. Complexity measured in terms of genomic signatures
of DNA word frequencies is related phylogenetically and integrated with structural data
and estimates of character change along branches to synthesize models of historical
genome dynamics and test evolutionary hypotheses within a phylogenetic framework.
5. Create genomic signatures of word frequencies. Large-scale primary sequence
derived from genome scans contains compositional information above the level of
homologous alignable nucleotide sites that may reflect species-specific differences
in patterns of mutation, repair, and selection on the molecular level. Evaluate this
higher order organizational structure using motif-counting procedures (33–35). Do
this by counting frequencies of all possible short n-letter oligos, or DNA words, using
a 1-bp sliding-window approach and plotting genomic signatures (see http://genstyle.imed.jussieu.fr/). Examples of annotated visual summaries of genomic signatures
for the American alligator and a human are presented in Fig. 4, along with a legend
explaining the scheme of pixel representation shown for frequencies of all possible
16,384 (4^7) 7-nt words in ~2.5 Mb of BAC-end sequence in each signature.
Fig. 3. Cartoon detailing the use of BAC clones to establish primary data for large-scale
sequence scanning. Clone inserts randomly arrayed in a BAC library are sequenced
with known forward and reverse vector primers. Compilation of multimegabase data
sets sampled across target genomes approximates global structural features and allows for
a diversity of genomic analyses, including plots of continuous character distributions,
base composition profiling, and mapping of sequence synteny onto reference genome
data assemblies.
6. Relate genomic signatures with phylogenetic methods. Generate Euclidean distances
among genomic signatures from multiple species based on the square root of the
sum of the squared differences in motif frequencies. Normalize each signature for genome-wide base compositional differences prior to generating distance
matrices. Do this by subtracting the expected frequency of each motif on the basis
of single-letter base composition from the observed frequency for each species.
Construct phylogenetic trees from the matrix using distance methods such as the
neighbor-joining (NJ) algorithm (36). Evaluate statistical support for NJ tree topology using bootstrap replication. Create pseudomatrices by resampling words
with replacement and calculating signatures and distances for each pseudomatrix
(37) (see Note 5).
Fig. 4. Examples of genomic signatures for (A), American alligator, and (B), human,
visualized by pixel representations of all possible 7-nt DNA word frequencies contained
in ~2.5 Mb of genomic sequence per signature. Diagram in (C) illustrates the order of
pixel counts used to construct signatures of n-letter words. Darker pixels correspond
to higher frequencies. 1, 3, 8, 9, correspond to C7, G7, A7, and T7 words, respectively,
plotted at corners of the signatures. 2, 6, 7, mark regions of CG-poor words. 4, 5, mark
diagonal lines formed by densities of 7-letter words composed only of pyrimidines or
purines, respectively. 10 marks adjacent regions exhibiting high densities of microsatellite
motifs apparent in human but absent in alligator.
7. Estimate the diversity of interspersed repeats. Complete informatics searches for
interspersed transposable genetic elements present in paired end-sequences of clones
from genomic libraries with local alignment tools implemented in the program
RepeatMasker (38). Evaluate hits from searches that may reflect an inherent ascertainment bias related to both the incomplete nature of the reference database used
by RepeatMasker and the relative level of divergence between query and reference
sequences (see Note 6).
8. Estimate the diversity of tandem repeats. Count the density of microsatellites or SSRs
(see Note 7) in BAC-end sequences corresponding to specific length categories using
the search options built into the online informatics tool Tandem Repeats Finder
(39). Algorithms used by Tandem Repeats Finder detect and score target elements
independently of any underlying reference database; however, search results can be
influenced by alignment parameter settings used for a given query sequence.
9. Calculate the global density of repetitive elements in target genomes. Obtain
genome-size information for target species, defined by haploid nuclear DNA
content measured in picograms (C-value). Use C-values to calculate the fraction of
genomes that are surveyed by paired end sequences of clone inserts sampled for
a genomic library. Obtain C-values either from direct experimental measurements
(e.g., flow cytometry and buoyant density analysis) or from the literature. The Animal
Genome Size Database (http://www.genomesize.com/) is a useful online reference
for obtaining statistics on genome size and karyotype information from a wide
diversity of organisms. For example, we can estimate the fraction of our target
reptile species surveyed by BAC genome scanning using information publicly available on the Genome Size Database as follows: Alligator mississippiensis, 2.49 pg,
50 chromosomes; Chrysemys picta, 2.57 pg, 36 chromosomes; Anolis smaragdinus,
2.2 pg, 32 chromosomes. Conversion of picograms to base pairs: 2.49 pg × (9.78 ×
10^8 bp/pg) = 2.435 Gbp; 2.57 × (9.78 × 10^8) = 2.513 Gbp; 2.20 × (9.78 × 10^8) =
2.152 Gbp. Estimated fractions of genomes surveyed: 2,519,551 bp alligator
BAC-end seqs/2.435 Gbp = 0.1035% alligator genome sampled; 2,432,811 bp turtle
BAC end seqs/2.513 Gbp = 0.0968% turtle genome sampled; 1,358,158 bp Anolis
plasmid end seqs/2.152 Gbp = 0.0631% Anolis genome sampled. The total number of
repeats per genome and its standard deviation can be estimated using the following
formulas (see Note 8):
$$\text{Total copies} = \frac{(\text{raw repeat counts}) \times (\text{genome size in bp})}{\text{size of sequence data set in bp}}$$
$$\text{Standard deviation} = \pm \frac{\sqrt{\text{raw repeat counts}} \times (\text{genome size in bp})}{\text{size of sequence data set in bp}}$$
10. Estimate the evolutionary rates of word frequency change among lineages. Employ
comparative methods to calculate the evolutionary rates of word-frequency change for
specific branches on the amniote tree on the basis of the phylogenetic generalized least
squares (PGLS) approach implemented in the PGLS-ancestor module of the software
package COMPARE v. 4.6 (40,41) (see Note 9). As an example, we have used a NJ
distance tree with the following published divergence time estimates (Myr) for major
clades in the hypothesis assuming the existence of a common ancestor at 310 Myr:
squamates, 245 (42); birds, 222 (43); crocodilians and turtles, 207 (42); and rodents
and primates, 85 (44,45). Compare differential rates estimated along specific branches
in the phylogeny in order to illuminate macroevolutionary trends in genomic change
for particular clades. Evaluate the contribution of particularly active repetitive or
oligonucleotide subsets of elements to changes in total genome composition by examining frequency change of individual elements along specific branches.
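The arithmetic in step 9 can be sketched in a few lines of Python. The conversion factor (9.78 × 10^8 bp per pg), the C-values, and the data set sizes are taken from the text above; the function names are ours, the raw repeat count in the usage example is purely hypothetical, and the standard deviation is computed on the assumption that it scales with the square root of the raw count, which is our reading of the formula in step 9.

```python
import math

BP_PER_PG = 9.78e8  # 1 pg of DNA corresponds to about 978 Mb

def genome_size_bp(c_value_pg):
    """Convert a haploid C-value in picograms to base pairs."""
    return c_value_pg * BP_PER_PG

def fraction_surveyed(dataset_bp, c_value_pg):
    """Fraction of the genome covered by the end-sequence data set."""
    return dataset_bp / genome_size_bp(c_value_pg)

def estimate_total_copies(raw_count, dataset_bp, c_value_pg):
    """Scale a raw repeat count up to the whole genome.

    Returns (total copies, standard deviation). The standard deviation
    assumes the raw count behaves like a Poisson count (sd = sqrt(count)),
    which is an assumption of this sketch.
    """
    scale = genome_size_bp(c_value_pg) / dataset_bp
    return raw_count * scale, math.sqrt(raw_count) * scale

if __name__ == "__main__":
    # C-values (pg) and end-sequence totals (bp) quoted in step 9.
    scans = {
        "alligator": (2.49, 2_519_551),
        "turtle": (2.57, 2_432_811),
        "Anolis": (2.20, 1_358_158),
    }
    for name, (c_value, dataset_bp) in scans.items():
        pct = 100 * fraction_surveyed(dataset_bp, c_value)
        print(f"{name}: {genome_size_bp(c_value) / 1e9:.3f} Gbp, {pct:.4f}% sampled")
    # Hypothetical raw count of 100 hits for one repeat family in the alligator scan.
    total, sd = estimate_total_copies(100, 2_519_551, 2.49)
    print(f"estimated copies: {total:.0f} +/- {sd:.0f}")
```

Because the scans cover only about 0.1% of each genome, every raw hit is multiplied by roughly a thousand, so rare repeat families carry proportionally large uncertainty.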
3.3. Targeted Studies of Genes and Genomic Neighborhoods
In addition to supporting genome scans to estimate the structure and complexity
among large numbers of randomly selected sequences, BAC libraries can also
be used to target and describe specific genes and their local genomic neighborhoods. To illustrate the steps of gene-targeting using a BAC library, this section
describes the location and description of a BAC insert as completed for EE0.6,
a heterochromatic marker found on the Z chromosome in emu, Dromaius novaehollandiae (46). In summary, an emu BAC library (see http://evogen.jgi.doe.gov/second_levels/BACs/BAC_library_stats/Dnova_stats.html) was screened
with a radioactively labeled probe, EE0.6, and a single hybridizing clone was
detected and isolated from the arrayed library for more targeted investigation
by means of shotgun-subcloning, fragment sequencing and contig assembly.
We present a detailed step-by-step protocol of this process below under items
(1)–(11). The library screening and shotgun assembly of particular BAC clones
can be used for comparative investigation of particular homologous functional
elements within genomes in a manner that complements the phylogenomic
analysis of BAC-enabled genome scans as outlined above. This example provides
an illustration of the type of genome analysis provided by BAC clones that
would be impossible to achieve with a simple plasmid library owing to the limited
coverage of each clone in small insert libraries.
A flowchart summarizing targeted sequence analysis of a BAC clone is
presented in Fig. 5. Before a BAC library can be screened, nylon filters must be
prepared with a gridded representation of the clones contained in the 384-well
plates that hold the library. The emu BAC library, for example, contains 133,632
clones. The emu BACs are inserted into electrocompetent, DH10B T1-resistant
Escherichia coli cells. Each clone is suspended in a separate well with LB broth
(Miller), chloramphenicol, and glycerol.
1. Filter preparation from the emu BAC library. Remove 384-well plates from frozen
storage (−80°C) and thaw for ~90 min. Grid each clone in the emu BAC library to a
nylon filter with a Genetix Q-bot or other automated colony picker. Each filter represents clones from 48 384-well plates. Soak each filter in LB broth and 25 mg/mL
chloramphenicol before gridding. After gridding, incubate filters at 37°C for 17 h on
a large bioassay dish filled with 300 mL 1.5% LB agar and 12.5 μg chloramphenicol
(25 mg/mL). Transfer each filter to filter paper on a glass dish, saturated with lysis
buffer (2X SSC, 5% SDS). Incubate filters at room temperature for 3 min, microwave
at maximum power for 3 min and transfer to a Pyrex baking dish filled with proteinase K buffer and 10 μg/mL proteinase K. Cover the Pyrex dish with plastic wrap
and incubate at 37°C for ~2 h with gentle rocking. Once the filters are cleared of
colonial debris, rinse them in 2X SSC for 2 min, air-dry and crosslink for 2 min.
Label each filter with a fabric pen to indicate orientation and identity of the 48 plates
gridded to the filter. Store gridded filters dry at room temperature until their first use.
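The library dimensions given in step 1 determine how many filters a complete screen requires. The short Python sketch below works through that arithmetic. It assumes that every 384-well plate is full, that each clone is spotted twice (as implied by the paired signals described in step 5), and that spots are arranged in the 4 × 4 blocks mentioned in the legend of Fig. 6. The function name `filter_layout` and its parameter names are hypothetical and used only for illustration.

```python
import math


def filter_layout(clones, wells_per_plate=384, plates_per_filter=48,
                  spots_per_clone=2, block_size=16):
    """Return plate, filter, and spot counts for an arrayed library.

    spots_per_clone=2 and block_size=16 (a 4 x 4 block) are assumptions
    drawn from the duplicate-spot pattern described in the protocol.
    """
    plates = math.ceil(clones / wells_per_plate)
    filters = math.ceil(plates / plates_per_filter)
    clones_per_filter = wells_per_plate * plates_per_filter
    spots_per_filter = clones_per_filter * spots_per_clone
    return {
        "plates": plates,
        "filters": filters,
        "clones_per_full_filter": clones_per_filter,
        "spots_per_full_filter": spots_per_filter,
        "blocks_per_full_filter": spots_per_filter // block_size,
    }


# Emu BAC library size quoted in the text.
layout = filter_layout(133_632)
for key, value in layout.items():
    print(f"{key}: {value}")
```

For the emu library this gives 348 plates and 8 filters, the last of which is only partly filled.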
Fig. 5. Generation and analyses of bacterial artificial chromosome (BAC) sequence
data. BAC libraries are stored in frozen media in 384-well plates. For gene-targeting, a
two-dimensional representation of the library is printed onto nylon filters. Sequence probes
are hybridized to the filters to locate a BAC clone that contains the probe sequence.
Once the BAC clone is selected, the BAC insert is sheared into fragments, shotgun-subcloned, sequenced, and assembled. Once assembled, BAC inserts provide raw data
for large-scale sequence analyses of repeat density and synteny.
2. Probe generation. Generate sequence of interest from genomic DNA by PCR. Ligate
and transform the DNA into XL-10 Gold competent E. coli cells. PCR-amplify transformed
clones using the original gene primers. Purify PCR products from clone DNA and store
at a concentration of 25 ng/μL until the day of hybridization. Before hybridization,
label 25 ng DNA with (α-32P)-dCTP with the Prime-It II Random Primer Labeling Kit.
Filter labeled probes through microspin columns to remove unlabeled nucleotides.
3. Filter prehybridization and hybridization. Denature 1.5 mL sonicated, nonhomologous
herring sperm DNA by heating at 100°C for 10 min and then chill on ice for 2 min.
By the same method, also denature labeled probe DNA. Sandwich each filter
between layers of hybridization mesh and roll them in lead-lined glass cylinders.
Prehybridize filters by adding 75 mL hybridization buffer and 1.5 mL herring sperm
DNA. Incubate filters, buffer, and herring sperm DNA with rolling at 65°C for 6 h.
After prehybridization, decant liquid from the cylinders and replace with 25 mL
hybridization buffer, 0.5 mL denatured herring sperm DNA, and denatured probe
DNA. Incubate cylinders with rolling at 65°C for 12 h.
4. Washing and autoradiography. Decant liquid from cylinders and replace with 25 mL
1X washing buffer. Incubate cylinders at 50°C for 15 min. Decant washing solution. Repeat washing twice. After three washes, blot excess buffer from individual
filters with paper towels. Wrap each filter in plastic wrap and place, DNA side up,
in a metal cassette under an undeveloped Biomax MR X-ray film. Store cassettes
at −80°C for 1 wk.
5. Developing films and picking clones. After autoradiography, develop films with an
X-omat film processor. Strip filters of radioactivity by saturating them with three
solutions: stripping solution 1 for 20 min; stripping solution 2 for 10 min; and
stripping solution 3 for 10 min. Wrap filters and store wet at −20°C. Positive hybridizations are indicated by two black spots on film. The two spots match the pattern
in which each clone’s DNA was gridded to the filter by the Q-bot. To illustrate this
technique, results from an autoradiograph of a BAC library screening assay of
zebra finch (Taeniopygia guttata), screened with an Mhc class II probe, are shown
in Fig. 6. By referencing the location of the spots and the plates gridded onto the
filter, select clones from the BAC library for confirmation of sequence of interest
within the clone.
6. Growing clones for preliminary PCR survey. Pick each candidate BAC clone from
its 384-well plate and incubate in 500 μL LB broth (Miller) and chloramphenicol
at 37°C with shaking (250 rpm) for 17 h. PCR each clone with original gene
primers. Purify clones that produce a distinguishable band in an electrophoretic gel
for Southern blotting.
7. Purification, restriction, and Southern blotting. Incubate positive clone in 10 mL
LB broth (Miller) and chloramphenicol at 37°C with shaking (250 rpm) for 18 h.
Decant culture into a 1.5-mL Eppendorf tube and centrifuge at 13,000 rpm for
5 min. Decant LB and replace with new culture. Centrifuge tube again. Continue
decanting and centrifugation until all bacterial pellets from 10 mL culture are contained in one tube. Resuspend pellets in 30 μL resuspension buffer and RNase A,
and mix with 30 μL lysis buffer (Qiagen) and 30 μL neutralization buffer. Rock
solution gently several times, incubate on ice for 15 min, and centrifuge at 13,000 rpm
for 15 min. Remove supernatant carefully, mix with 150 μL of isopropanol and incubate at −20°C for 30 min. Centrifuge the tube at 13,000 rpm for 15 min and discard
the supernatant. Mix the BAC DNA pellet with 30 μL 70% ethanol, rock gently,
and centrifuge at 13,000 rpm for 5 min. After ethanol is removed, air-dry the pellet
and resuspend in 10 μL TE buffer. Restrict BAC DNA with EcoRI and HindIII,
and anneal to a filter for hybridization with a labeled EE0.6 probe as previously
described. Autoradiography provides additional support for the presence of the
sequence of interest in the BAC clone’s insert.
8. Shearing. Once the target BAC clone has been detected and isolated by the above
protocols, the shotgun assembly can be completed beginning with shearing the
insert and continuing through subcloning and contig assembly as described in steps
(8)–(11) below. A cartoon summarizing details of the BAC shotgun sequencing and
assembly process is illustrated in Fig. 7. Fragment 30 μg purified BAC DNA in
200 μL solution with a hydroshear. Estimate the concentration of sheared DNA
by comparison to known concentrations of micrococcus in an electrophoretic gel.
Fig. 6. Signal of hybridization between DNA probe and a bacterial artificial chromosome
(BAC)-gridded nylon filter. A radioactive or bioluminescent probe is hybridized to a
nylon filter that is gridded with DNA from the clones in the library. The organization
and location of hybridization signals (arrows pointing at two dots) indicate the identity
of the clone that carries the probe sequence. The results shown are from American
alligator (Alligator mississippiensis) and were screened with a DMRT1 probe. This
filter illustrates how BAC clones can be localized to individual wells by the pattern of
hybridization within each 4 × 4 spotted grid.
Fig. 7. Bacterial artificial chromosome (BAC) library shotgun-subcloning and
sequencing. BAC inserts carry 100–200 kb of sequence. BAC inserts are sheared into
smaller fragments and incorporated as smaller inserts in smaller vectors. The fragments
can be sequenced and joined into contigs by bioinformatics software.
9. End repair and shotgun subcloning. After hydroshearing, fragmented DNAs have
sticky ends inconsistent with ends for a potential cloning vector. To make the fragment ends uniform, blunt-end and end-repair 30 μg DNA. Purify hydrosheared
DNAs with the Qiaquick kit and blunt-end by incubation at 120°C for 40 min with
100 μL blunting mix followed by addition of 8 μL 0.5M EDTA to each sample and
incubation at 70°C for 10 min. After cleaning samples with Qiaquick kit again, mix
21 μL DNA with 3 μL BstXI/EcoRI Adapter and 6 μL linking mix and incubate at
16°C for 17 h, 70°C for 10 min, and end at 4°C. Purify samples with the Qiaquick
kit again and gel-extract DNAs between 4 and 7 kb for ligation. Purify gel extracts
with the Qiaquick kit. Mix 15.5 μL of sample with 1 μL vector and 3.5 μL ligating
mix and incubate at 16°C for 17 h, 70°C for 10 min, and end at 4°C. Mix 1 μL
ligated sample with 100 μL thawed GC5 cells and incubate at 0°C for 40 min, 42°C
for 15 s, 0°C for 1 min. Transfer samples to 900 μL SOC medium, seal, shake, and
incubate at 37°C for 1 h. Spread samples on large square bioassay dishes filled
with LB agar (Miller) and chloramphenicol, invert, and incubate at 37°C for 17 h.
Pick resultant colonies by Genetix QPix or other automated colony picker into
the 96-well deep plates filled with 300 μL of LB mix. Incubate colony plates at
37°C and shake at 250 rpm for 17 h. Centrifuge plates at 5000g at 12°C for 6 min.
Decant supernatant and add 150 μL resuspension buffer and RNase A, 150 μL lysis
buffer, and 150 μL neutralization buffer to each sample while shaking at 250 rpm.
Centrifuge samples at 5000g at 23°C for 30 min. Fill a 96-well 800 μL uniplate
with 290 μL of 99% isopropanol and cover with a 96-well 800 μL unifilter. Add
370 μL of sample supernatant to the unifilter. Seal the unifilter and centrifuge at
3000g at 14°C for 15 min. Discard the unifilter and the supernatant in the uniplate.
Air-dry pellets, wash twice with 70% ethanol, vacuum air-dry and resuspend in
15 μL TE buffer. Centrifuge uniplates at 500 rpm at room temperature for 2 min
and incubate at room temperature for 1 h. Freeze samples at −20°C.
10. Sequencing and assembling contigs. Label samples with the Big Dye Terminator
v3.1 Cycle Sequencing Kit using M13 primers and sequence. Name resultant
sequence files according to the St. Louis naming scheme, which consists of three
parts: (1) sample identity, (2) sequence direction (b for forward or g for reverse),
and (3) unique sequence identity (e.g., 121.b.241). Construct contigs by analyzing
sequence files with Phred, Phrap, and Consed assembly software. Query Autofinish
for suggested primers for closing gaps between contigs (32,47–49).
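The three-part read name described in step 10 is simple to validate before files are passed to the assembly software. The following Python sketch splits a name such as 121.b.241 into its fields. It assumes that the name contains exactly three dot-separated fields and no file extension, which may not hold for every sequencing center's output. The names `ReadName` and `parse_read_name` are hypothetical.

```python
from typing import NamedTuple


class ReadName(NamedTuple):
    sample: str      # part 1: sample identity
    direction: str   # part 2: "forward" (b) or "reverse" (g)
    read_id: str     # part 3: unique sequence identity


# Direction codes as given in the protocol text.
_DIRECTIONS = {"b": "forward", "g": "reverse"}


def parse_read_name(name: str) -> ReadName:
    """Split a 'sample.direction.id' read name into its three parts."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected three dot-separated fields: {name!r}")
    sample, code, read_id = parts
    if code not in _DIRECTIONS:
        raise ValueError(f"unknown direction code {code!r} in {name!r}")
    return ReadName(sample, _DIRECTIONS[code], read_id)


print(parse_read_name("121.b.241"))
```

Rejecting malformed names at this stage is a design choice intended to keep forward and reverse reads of the same subclone correctly paired during assembly.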
11. Bioinformatics. Query contigs produced from the BAC insert for Refseq genes
within relevant libraries (human, mouse, chicken, and so on) with the UCSC genome
browser. Scan contigs for gene content with SeqHelp, Genscan, and GeneParser
(50–52). Annotate alignments to relevant libraries with the Apollo software package
(53). Measure repeat density of the BAC sequence with the Miropeats and RepeatMasker software packages (38,54). Measure polymorphism using the DnaSP
software package (55). These software packages, among others, allow analyses of
sequences generated from screening, subcloning, sequencing, and assembling of BAC
DNA. An example of annotations produced by the above methods for 41 kb of
microchromosomal sequence from emu, Dromaius novaehollandiae, is illustrated
in Fig. 8.
4. Notes
1. Our reptile studies surveyed large-insert libraries for the American alligator
and painted turtle, constructed by colleagues at the NIH BAC Resource Network
Production Center at the Benaroya Research Institute at Virginia Mason, Seattle, WA
(www.benaroyaresearch.org). We also utilized sequences from a whole-genome
plasmid library generated at the Washington University Genome Sequencing Center
in St. Louis, MO (http://genome.wustl.edu/home.cgi).
2. Clone sequencing was conducted at the Institute for Genomic Research, Rockville,
MD (http://www.tigr.org), and the Washington University Genome Sequencing
Center, and followed the published protocols of Zhao and colleagues (1,10). A total
of 4656 random clones were examined to produce a total of 8638 nonoverlapping
high-quality, edited paired end sequence reads, yielding 2,519,551 bp of alligator,
2,432,811 bp of turtle, and 1,358,158 bp of anole original sequence for comparative
phylogenomic analysis. Among all reads, the average lengths were 769, 703, and
677 bp for alligator, turtle, and anole, respectively.
Fig. 8. Annotations of microchromosomal sequence from emu, Dromaius novaehollandiae. (A) The Apollo Genome Annotation and Curation Tool indicates conservation
of sequence among queried species. The letters M, H, and C refer to the mouse, human and
chicken genomes to which 41 kb of microchromosomal emu sequence was compared
in this query. (B) The UCSC genome browser aligned 16.8 kb of microchromosomal
emu sequence to chromosome 17 in the chicken genome. Conserved sequences among
other species and repeats are also noted in the UCSC output.
3. The pattern of nucleotide frequency observed in genome scans provides a means
to estimate global base composition of genomes of species and to infer phylogenetic
relationships of hierarchical structure that may be reflected by compositional patterns
among genome scans of target species. In particular, because guanine + cytosine
(GC) or GC-rich regions in the RNA polymerase II promoter region are essential
for efficient transcription of genes in eukaryotes (56), the relative GC content and
the distribution of GC rich/poor regions in genomes have become a standard index
of evaluating protein coding components of genomes. GC content has been estimated indirectly by experimental methods, such as DNA ultracentrifugation and
flow cytometry (29,57), but is increasingly analyzed directly with informatics tools
and has been described in extensive detail within the context of whole-genome
assemblies of model species (25,58–60). Moreover, analyzing the relationships
between GC content and other organismal parameters, such as genome size, cell size,
metabolic rate, and physiological constraints on life history, is an expanding line
of investigation that illuminates the evolutionary dynamics and possible selective
forces shaping genome structure (24,29,30,57,61).
4. We have used a statistical approach with genome scanning of alligator and turtle
BAC-end sequences to estimate global GC values that are almost identical to those
published based on buoyant densities in CsCl gradients and on flow cytometry
(~42%; [29,57]), indicating that GC content is elevated in alligators and turtles
relative to values derived from the chicken and human genome assemblies (~40%;
[25,60]).
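As a point of reference for the GC values discussed in Notes 3 and 4, the Python sketch below computes a pooled GC fraction directly from a set of reads. It is a plain base count, not the statistical approach referred to in Note 4, and it assumes that ambiguous bases (anything other than A, C, G, or T) should be excluded from the denominator. The function names are hypothetical.

```python
def gc_counts(seq: str):
    """Return (G+C count, A+C+G+T count) for one sequence, ignoring other symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc, gc + at


def gc_fraction(seq: str) -> float:
    """GC fraction of a single sequence."""
    gc, total = gc_counts(seq)
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


def pooled_gc(reads) -> float:
    """GC fraction over all reads combined, so longer reads carry more weight."""
    gc = total = 0
    for read in reads:
        g, t = gc_counts(read)
        gc += g
        total += t
    if total == 0:
        raise ValueError("reads contain no unambiguous bases")
    return gc / total


print(gc_fraction("GGCCAATTNN"))
print(pooled_gc(["GC", "ATAT", "GCNN"]))
```

Pooling counts across reads, rather than averaging per-read fractions, keeps short reads from having a disproportionate influence on the global estimate.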
5. Analysis of genomic signatures from birds has extended this phylogenetic approach
to vertebrates and was shown to recover major branches in the avian tree, although
phylogenetic relationships near the tips of the tree were clearly incongruent with
traditional sequence analysis (62). Alternatively, unsupervised neural
network algorithms, known as self-organizing maps (SOMs), can be used to infer
phylogenetic relationships based on oligo frequencies in unaligned genome sequences
(63,64). Such SOMs are based on the frequencies of short (2 or 3 bp) oligonucleotides estimated from bulk sequence data and have been used to characterize the
diversity of species present in environmental genomic samples (62,63,65). Although
the signature approach remains exploratory, it is providing valuable new ways of
harvesting phylogenetic information present in a wide variety of unalignable genomic
sequences and has been shown to corroborate results of conventional analyses based
on aligned sequences from homologous nucleotide positions (33,62). In general,
when seeking to estimate phylogenetic relationships, we do not advocate relying
upon the phenetic approach provided by signatures as a replacement for conventional methods of phylogenetic inference using a matrix of aligned nucleotides and
established models of substitution, when available. However, we believe that novel,
informatics-rich methods of inference such as genomic signatures provide an exciting
option for phylogenetic analysis of higher order patterns of complexity inherent to
genomic signatures, while at the same time providing insight into global patterns
of genomic change not conditioned on a specific chromosome or gene region.
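The oligonucleotide frequencies that underlie the signatures in Note 5 can be tabulated with a few lines of Python. The sketch below counts overlapping words of length k, normalizes them to frequencies, and compares two signatures. The use of Euclidean distance is an assumption made for illustration; other metrics could be substituted, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_frequencies(seq: str, k: int = 3) -> dict:
    """Relative frequency of every overlapping k-letter word in seq.

    Words containing symbols other than A, C, G, or T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set(BASES):
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid words of length k in sequence")
    words = ("".join(p) for p in product(BASES, repeat=k))
    return {w: counts[w] / total for w in words}


def signature_distance(sig_a: dict, sig_b: dict) -> float:
    """Euclidean distance between two signatures built with the same k."""
    return math.sqrt(sum((sig_a[w] - sig_b[w]) ** 2 for w in sig_a))


sig1 = kmer_frequencies("ACGTACGTACGT", k=2)
sig2 = kmer_frequencies("AAAACCCCGGGG", k=2)
print(signature_distance(sig1, sig2))
```

A matrix of such pairwise distances among taxa could then be passed to a distance-based tree-building method such as neighbor joining (36).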
6. Ascertainment bias tends to provide a conservative underestimate of the true repeat
diversity in target species, especially for highly derived novel TEs not annotated in
model-organism genome assemblies. To minimize this problem there is a growing
interest in developing informatics tools that do not rely on reference sequences and
can directly detect structural components of certain classes of TEs, such as the
tRNA-derived secondary structure in SINEs. Existing online informatics tools such
as the tRNAscan-SE Search Server (http://rna.wustl.edu/GtRDB/Hs/Hs-align.html)
and mfold (http://www.bioinfo.rpi.edu/applications/mfold/old/rna/form1.cgi) have
been used successfully to identify novel families of phylogenetically useful retroelements in orders of placental mammals (66). The integration of scripting from text
processing languages such as Perl (67) or Python (68) with the proliferation of online
resources of bioinformatics promises to greatly facilitate discovery of phylogenetically
diverse repetitive elements de novo in addition to relying upon referencing against
annotated databases.
7. SSRs, or microsatellites, are tandemly duplicated units of 1–6 bp DNA motifs found
commonly throughout eukaryotic genomes (69). SSRs are unstable and highly
mutable and are thought to be primarily a result of polymerase slippage resulting
in misalignment of reassociating strands during DNA replication (70), although other
mechanisms such as unequal recombination and gene conversion have been characterized in detail (71,72). The balance between adding repeat units and removing
them by mismatch repair enzyme machinery is a dynamic process that can influence
genome evolution by contributing substantial genetic variation (73,74), introducing
mutational bias (75,76) and altering transcriptional activity (73,77) as well as global
oligonucleotide frequencies. Moreover, the frequency of SSR types in vertebrates is
uneven (78,79) and overall SSR abundance has been shown to correlate with genome
size (80). It is, therefore, of interest to examine the profile of SSRs in nonavian
reptiles in an effort to understand the influence of repetitive elements on vertebrate
genome evolution within a comparative framework.
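The definition of SSRs in Note 7 (tandem repeats of 1–6 bp motifs) maps naturally onto a regular expression with a back-reference. The Python sketch below is a minimal detector for perfect repeats only. The threshold of four tandem copies is an arbitrary assumption chosen for illustration, and dedicated programs such as Tandem Repeats Finder (39) handle imperfect repeats that this sketch does not. The function name `find_ssrs` is hypothetical.

```python
import re


def find_ssrs(seq: str, max_motif: int = 6, min_repeats: int = 4):
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    The lazy quantifier tries the shortest motif first, so a run such as
    AAAAAAAA is reported as motif 'A' rather than 'AA'.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


example = "GG" + "CA" * 5 + "TTT"
print(find_ssrs(example))
```

Start positions are zero-based. Tabulating hits by motif length across reads gives the kind of SSR profile discussed in the note.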
8. In order to estimate the density of both repeats and also n-letter word frequencies
for genomic signature analysis, we assumed that, since these elements are rare
events and are more or less uniformly distributed in the genome, the total number
of repeats in the sampled region follows a Poisson distribution with rate Nr, where
N is the total number of repeats in the genome and r is the relative fraction of total
repeats contained in the sampled region. Using this approach, departures from the
Poisson model associated with localized uneven distributions of elements should
provide a conservative estimate of element abundance. For any genome scan study,
sampling a limited number of clone ends may yield interspersed and tandem repeat
estimates that are biased from whole-genome counts for both bioinformatic and
experimental reasons.
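Under the Poisson model of Note 8, the observed count in the sampled region has mean Nr, so the genome-wide total can be estimated as the observed count divided by r. The Python sketch below applies this, together with a normal-approximation interval. It assumes that r can be taken as the fraction of the genome covered by the sampled reads, which follows only if elements are uniformly distributed, and the interval is a rough approximation that is poor for small counts. The function name and the numbers in the example are hypothetical, not values from the study.

```python
import math


def estimate_genome_count(observed: int, sampled_bp: int, genome_bp: int,
                          z: float = 1.96):
    """Estimate genome-wide element count N from a sampled count.

    Model: observed ~ Poisson(N * r), with r = sampled_bp / genome_bp.
    Returns (N_hat, lower, upper) using a normal approximation, in which
    the standard error of the observed count is sqrt(observed).
    """
    if observed < 0 or sampled_bp <= 0 or genome_bp < sampled_bp:
        raise ValueError("invalid counts or lengths")
    scale = genome_bp / sampled_bp          # equals 1 / r
    n_hat = observed * scale
    se = math.sqrt(observed) * scale
    return n_hat, max(0.0, n_hat - z * se), n_hat + z * se


# Hypothetical example: 100 elements found in 1 Mb sampled from a 1 Gb genome.
n_hat, lower, upper = estimate_genome_count(100, 1_000_000, 1_000_000_000)
print(f"N = {n_hat:.0f} (approx. 95% interval {lower:.0f} to {upper:.0f})")
```

The width of the interval scales with the square root of the observed count, which makes clear why estimates for rare element classes from a limited number of clone ends are imprecise.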
9. The Martins–Hansen method (41) utilizes the topological information in a given
phylogeny with branch lengths set proportional to time and the values for continuous
characters for each species, be they oligonucleotide word frequencies, base composition, repeat abundance, and others. COMPARE then calculates rates as the
regression slope for a generalized least squares model describing how well time
predicts the variance between pairwise taxon divergence. Rates of evolution along
specific branches of a genomic signature tree can be compared in terms of the change
in value of any continuous character per million years when divergence times are
known from the fossil record.
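To make the idea of a rate expressed as character change per million years concrete, the Python sketch below regresses the absolute difference between pairs of taxa on the time separating them. It is a deliberately simplified stand-in: an ordinary least-squares slope through the origin that treats all pairs as independent, not the generalized least squares model implemented in COMPARE, which accounts for shared branches in the phylogeny. The function name, the data layout, and the example values are hypothetical.

```python
def pairwise_rate(values: dict, divergence_mya: dict) -> float:
    """Least-squares slope (through the origin) of |trait difference|
    on separation time.

    values: taxon name -> continuous character (e.g., GC fraction).
    divergence_mya: (taxon_a, taxon_b) -> time since the pair split, in Myr.
    Separation time is taken as twice the divergence time, because change
    accumulates along both lineages.
    """
    num = den = 0.0
    for (a, b), t in divergence_mya.items():
        diff = abs(values[a] - values[b])
        path = 2.0 * t
        num += path * diff
        den += path * path
    if den == 0.0:
        raise ValueError("no pairs with nonzero divergence time")
    return num / den


# Hypothetical values for three taxa.
traits = {"A": 0.40, "B": 0.42, "C": 0.46}
splits = {("A", "B"): 10.0, ("A", "C"): 30.0, ("B", "C"): 30.0}
print(pairwise_rate(traits, splits))
```

Because pairs that share internal branches are not independent, this slope should be read only as a rough descriptive summary; the Martins–Hansen framework (41) exists precisely to correct for that non-independence.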
Acknowledgments
We thank Chris Amemiya, Tom Miyake, Robert Macey, Jeff Froula,
Zhenshan Wang, Shaying Zhao, Jyoti Shetty, Marcia Lara, Jonathan Losos, Wes
Warren, and Pat Minx for technical support with genomic library construction,
cloning, and end-sequencing. Charles Chapus, Chris Botka, Amir Karger, and
Lakshmanan Iyer provided computational and informatics support. Jun Liu,
Tingting Zhang, and Patrick Deschavanne contributed statistical expertise and
help with data analysis. We thank John Wakeley, Mike Sorenson and members
of the Edwards Lab for numerous helpful discussions, especially Nancy Rotzel
and Chris Balakrishnan, for critical comment on the manuscript and assistance
with illustrations. This work was produced in part with support from NSF grant
IBN-0207870 to SVE and from Harvard University.
References
1. Zhao, S. and Stodolsky, M., eds (2004) Bacterial Artificial Chromosomes: Library
Construction, Physical Mapping, and Sequencing, Humana, Totowa, NJ.
2. Sambrook, J. and Russell, D. W. (2001) Molecular Cloning: A Laboratory Manual.
Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
3. Carvajal, J. J., Cox, D., Summerbell, D., and Rigby, P. W. (2001) A BAC transgenic
analysis of the Mrf4/Myf5 locus reveals interdigitated elements that control activation and maintenance of gene expression during muscle development. Development
128, 1857–1868.
4. Giraldo, P. and Montoliu, L. (2001) Size matters: use of YACs, BACs and PACs in
transgenic animals. Transgenic Res. 10, 83–103.
5. Heintz, N. (2000) Analysis of mammalian central nervous system gene expression
and function using bacterial artificial chromosome-mediated transgenesis. Hum. Mol.
Genet. 9, 937–943.
6. Takahashi, R., Ito, K., Fujiwara, Y., et al. (2000) Generation of transgenic rats with
YACs and BACs: preparation procedures and integrity of microinjected DNA. Exp.
Anim. 49, 229–233.
7. Amemiya, C. T., Zhong, T. P., Silverman, G. A., Fishman, M. C., and Zon, L. I. (1999)
Zebrafish YAC, BAC and PAC genomic libraries. Methods Cell Biol. 60, 235–258.
8. Choi, S. and Kim, U.-J. (2001) Construction of a bacterial artificial chromosome
library. In: Genomics Protocols (Starkey, M. P. and Elaswarapu, R., eds), pp. 57–68,
Humana, Totowa, NJ.
9. Zehetner, G., Pack, M., and Schäfer, K. (2001) Preparation and screening of high-density cDNA arrays with genomic clones. In: Genomics Protocols (Starkey, M. P.
and Elaswarapu, R., eds), pp. 169–188, Humana, Totowa, NJ.
10. Zhao, S., Shatsman, S., Ayodeji, B., et al. (2001) Mouse BAC ends quality assessment
and sequence analyses. Genome Res. 11, 1736–1745.
11. Gasper, J. S., Shiina, T., Inoko, H., and Edwards, S. V. (2001) Songbird genomics:
analysis of 45 kb upstream of a polymorphic Mhc Class II gene in red-winged
blackbirds (Agelaius phoeniceus). Genomics 75, 26–34.
12. Kim, C. B., Amemiya, C., Bailey, W., et al. (2000) Hox cluster genomics in the
horn shark, Heterodontus francisci. Proc. Natl Acad. Sci. USA 97, 1655–1660.
13. Giribet, G., Edgecombe, G. D., and Wheeler, W. C. (2001) Arthropod phylogeny
based on eight molecular loci and morphology. Nature 413, 157–161.
14. Madsen, O., Scally, M., Douady, C. J., et al. (2001) Parallel adaptive radiations in
two major clades of placental mammals. Nature 409, 610–614.
15. Murphy, W. J., Eizirik, E., Johnson, W. E., et al. (2001) Molecular phylogenetics
and the origins of placental mammals. Nature 409, 614–618.
16. Soltis, P. S., Soltis, D. E., and Chase, M. W. (1999) Angiosperm phylogeny inferred
from multiple genes as a tool for comparative biology. Nature 402, 402–404.
17. Edwards, S. V., Jennings, W. B., and Shedlock, A. M. (2005) Phylogenetics of
modern birds in the era of genomics. Proc. R. Soc. Lond. Ser. B 272, 979–992.
18. Baker, R. J., Longmire, J. L., Maltbie, M., Hamilton, M. J., and VandenBussche, R. A.
(1997) DNA synapomorphies for a variety of taxonomic levels from a cosmid library
from the new world bat Macrotus waterhousii. Syst. Biol. 46, 579–589.
19. Rokas, A. and Holland, P. W. H. (2000) Rare molecular changes as a tool for phylogenetics. Trends Ecol. Evol. 15, 454–459.
20. Shedlock, A. M., Milinkovitch, M. C., and Okada, N. (2000) SINE evolution, missing
data, and the origin of whales. Syst. Biol. 49, 808–817.
21. Shedlock, A. M. and Okada, N. (2000) SINE insertions: powerful tools for molecular
systematics. Bioessays 22, 148–160.
22. Shedlock, A. M., Takahashi, K., and Okada, N. (2004) SINEs of speciation: tracking
lineages with retroposons. Trends Ecol. Evol. 19, 545–553.
23. Sarre, S., Georges, A., and Quinn, A. (2004) The ends of a continuum: genetic
and temperature-dependent sex determination in reptiles. Bioessays 26, 639–645.
24. Waltari, E. and Edwards, S. V. (2002) Evolutionary dynamics of intron size, genome size,
and physiological correlates in archosaurs. Am. Nat. 160, 539–552.
25. Shedlock, A. M., Botka, C. W., Zhao, S., et al. (2007) Phylogenomics of non-avian
reptiles and the structure of the ancestral amniote genome. Proc. Natl Acad. Sci. USA
104, 2767–2772.
26. Shedlock, A. M. (2006) Phylogenomic investigation of CR1 LINE diversity in
reptiles. Syst. Biol. 55, 902–911.
27. Organ, C. L., Shedlock, A. M., Meade, A., Pagel, M., and Edwards, S. V. (2007)
Origin of avian genome size and structure in non-avian dinosaurs. Nature 446,
180–184.
28. Shedlock, A. M. (2006) Exploring frontiers in the DNA landscape: An introduction
to the symposium “Genome analysis and the molecular systematics of retroelements”.
Syst. Biol. 55, 871–874.
29. Vinogradov, A. E. (1998) Genome size and GC-percent in vertebrates as determined
by flow cytometry: the triangular relationship. Cytometry 31, 100–109.
30. Olmo, E. (1986) Animal Cytogenetics, Vol. 4: Chordata, No. 3A: Reptilia. Gebrüder
Borntraeger, Berlin.
31. Blanchette, M., Green, E. D., Miller, W., and Haussler, D. (2004) Reconstructing large
regions of an ancestral mammalian genome in silico. Genome Res. 14, 2412–2423.
32. Ewing, B. and Green, P. (1998) Base-calling of automated sequencer traces using
phred. II. Error probabilities. Genome Res. 8, 186–194.
33. Chapus, C., Dufraigne, C., Edwards, S. V., et al. (2005) Exploration of phylogenetic
data using a global sequence analysis method. BMC Evol. Biol. 5, 63.
34. Deschavanne, P., Giron, A., Vilain, J., Fagot, G., and Fertil, B. (1999) Genomic
signature: characterization and classification of species assessed by chaos game
representation of sequences. Mol. Biol. Evol. 16, 1391–1399.
35. Karlin, S. and Ladunga, I. (1994) Comparisons of eukaryotic genomic sequences.
Proc. Natl Acad. Sci. USA 91, 12,832–12,836.
36. Saitou, N. and Nei, M. (1987) The neighbor-joining method: a new method for
reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425.
37. Felsenstein, J. (1985) Confidence limits on phylogenies: an approach using the
bootstrap. Evolution 39, 783–791.
38. Smit, A. F. A., Hubley, R., and Green, P. (2004) RepeatMasker Open-3.0.5 (http://
www.repeatmasker.org).
39. Benson, G. (1999) Tandem repeats finder: a program to analyze DNA sequences.
Nucleic Acids Res. 27, 573–580.
40. Martins, E. P. (2004) COMPARE, version 4.6b. Computer programs for the statistical
analysis of comparative data. Distributed by the author at http://compare.bio.indiana.
edu/. Department of Biology, Indiana University, Bloomington IN.
41. Martins, E. P. and Hansen, T. F. (1997) Phylogenies and the comparative method:
a general approach to incorporating phylogenetic information into analysis of interspecific data. Am. Nat. 149, 646–667.
42. Hedges, S. B. and Poling, L. L. (1999) A molecular phylogeny of reptiles. Science
283(5404), 998–1001.
43. Kumar, S. and Hedges, B. (1998) A molecular timescale for vertebrate evolution.
Nature 392, 917–920.
44. Hasegawa, M., Thorne, J. L., and Kishino, H. (2003) Time scale of eutherian evolution estimated without assuming a constant rate of molecular evolution. Genes Genet.
Syst. 78, 267–283.
45. Springer, M. S., Murphy, W. J., Eizirik, E., and O’Brien, S. J. (2003) Placental
mammal diversification and the Cretaceous-Tertiary boundary. Proc. Natl Acad.
Sci. USA 100(3), 1056–1061.
46. Ogawa, A., Murata, K., and Mizuno, S. (1998) The location of Z- and W-linked
marker genes and sequence on the homomorphic sex chromosomes of the ostrich and
the emu. Proc. Natl Acad. Sci. USA 95(8), 4415–4418.
47. Gordon, D. C. (2004) Viewing and editing assembled sequences using Consed. In:
Current Protocols in Bioinformatics (Baxevanis, A. D. and Davison, D. B., eds),
Wiley, New York.
48. Gordon, D., Desmarais, C., and Green, P. (2001) Automated finishing with Autofinish.
Genome Res. 11(4), 614–625.
49. Gordon, D., Abajian, C., and Green, P. (1998) Consed: a graphical tool for sequence
finishing. Genome Res. 8(3), 195–202.
50. Lee, M. K., Lynch, E. D., and King, M. C. (1998) SeqHelp: a program to analyze
molecular sequences utilizing common computational resources. Genome Res. 8(3),
306–312.
51. Burge, C. and Karlin, S. (1997) Prediction of complete gene structures in human
genomic DNA. J. Mol. Biol. 268(1), 78–94.
52. Snyder, E. E. and Stormo, G. D. (1995) Identification of protein-coding regions in
genomic DNA. J. Mol. Biol. 248(1), 1–18.
53. Lewis, S. E., Searle, S. M. J., Harris, N., et al. (2002) Apollo: a sequence annotation
editor. Genome Biol. 3, research0082.1–14.
54. Parsons, J. D. (1995) Miropeats: graphical DNA sequence comparisons. Comput.
Appl. Biosci. 11, 615–619.
55. Rozas, J., Sanchez-DelBarrio, J. C., Messeguer, X., and Rozas, R. (2003) DnaSP,
DNA polymorphism analyses by the coalescent and other methods. Bioinformatics
19(18), 2496–2497.
56. Watson, J. D., Hopkins, N. H., Roberts, J. W., Steitz, J. A., and Weiner, A. M.
(1987) Molecular Biology of the Gene, 4th Ed., Benjamin Cummings, Menlo Park.
57. Hughes, S., Clay, O., and Bernardi, G. (2002) Compositional patterns in reptilian
genomes. Gene 295, 323–329.
58. Jaillon, O., Aury, J.-M., Brunet, F., et al. (2004) Genome duplication in the
teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype.
Nature 431(7011), 946–957.
59. Rat Genome Sequencing Project Consortium. (2004) Genome sequence of the
Brown Norway rat yields insights into mammalian evolution. Nature 428, 493–521.
60. International Chicken Genome Sequencing Consortium. (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate
evolution. Nature 432(7018), 695–716.
61. Hughes, A. and Piontkivska, H. (2005) DNA repeat arrays in chicken and human
genomes and the adaptive evolution of avian genome size. BMC Evol. Biol. 5(1), 12.
62. Edwards, S. V., Fertil, B., Giron, A., and Deschavanne, P. (2002) A genomic schism in
birds revealed by phylogenetic analysis of DNA strings. Syst. Biol. 51(4), 599–613.
63. Abe, T., Kanaya, S., Kinouchi, M., et al. (2003) Informatics for unveiling hidden
genome signatures. Genome Res. 13, 693–702.
64. Dopazo, J. and Carazo, J. M. (1997) Phylogenetic reconstruction using an unsupervised growing neural network that adopts the topology of a phylogenetic tree. J. Mol.
Evol. 44, 226–233.
65. Uchiyama, T., Abe, T., Ikemura, T., and Watanabe, K. (2005) Substrate-induced
gene-expression screening of environmental metagenome libraries for isolation of
catabolic genes. Nat. Biotechnol. 23, 88–93.
66. Churakov, G., Smit, A. F. A., Brosius, J., and Schmitz, J. (2004) A novel abundant
family of retroposed elements (DAS-SINEs) in the nine-banded armadillo
(Dasypus novemcinctus). Mol. Biol. Evol. 22, 886–893.
67. Wall, L., Christiansen, T., and Orwant, J. (2000) Programming Perl, 3rd Ed.,
O’Reilly Media, Cambridge.
68. Lutz, M. (2001) Programming Python, 2nd Ed., O’Reilly Media, Cambridge.
69. Tautz, D. and Renz, M. (1984) Simple sequences are ubiquitous repetitive components of eukaryotic genomes. Nucleic Acids Res. 12, 4127–4138.
70. Richards, R. I. and Sutherland, G. R. (1994) Simple repeat DNA is not replicated
simply. Nat. Genet. 6, 114–116.
71. Almeida, P. and Penha-Gonçalves, C. (2004) Long perfect dinucleotide repeats are
typical of vertebrates, show motif preferences and size convergence. Mol. Biol.
Evol. 21, 1226–1233.
72. Majewski, J. and Ott, J. (2000) GT repeats are associated with recombination on
human chromosome 22. Genome Res. 10, 1108–1114.
73. Kashi, Y., King, D. C., and Soller, M. (1997) Simple sequence repeats as a source
of quantitative genetic variation. Trends Genet. 13, 74–78.
74. Tautz, D., Trick, M., and Dover, G. (1986) Cryptic simplicity in DNA is a major
source of genetic variation. Nature 322, 652–656.
75. Amos, W., Sawcer, S. J., Feakes, R. W., and Rubinsztein, D. C. (1996) Microsatellites show mutational bias and heterozygote instability. Nat. Genet. 13, 390–391.
76. Primmer, C. R., Saino, N., Møller, A. P., and Ellegren, H. (1996) Directional evolution in germline microsatellite mutations. Nat. Genet. 13, 391–393.
77. Gerber, H. P., Seipel, K., Georgiev, O., et al. (1994) Transcriptional activation modulated by homopolymeric glutamine and proline stretches. Science 263, 808–811.
78. Tóth, G., Gáspári, Z., and Jurka, J. (2000) Microsatellites in different eukaryotic
genomes: survey and analysis. Genome Res. 10, 967–998.
79. Primmer, C. R., Raudsepp, T., Chowdhary, B. P., Møller, A. P., and Ellegren, H.
(1997) Low frequency of microsatellites in the avian genome. Genome Res. 7(5),
471–482.
80. Hancock, J. M. (1996) Simple sequence repeats and the expanding genome.
Bioessays 18, 421–425.
81. Burt, D. W. (2002) Origin and evolution of avian microchromosomes. Cytogenet.
Genome Res. 96(1–4), 97–112.
82. Epplen, J. T., Diedrich, U., Wagenmann, M., Schmidtke, J., and Engel, W. (1979)
Contrasting DNA sequence organisation patterns in sauropsidian genomes.
Chromosoma 75, 199–214.
83. Gregory, T. R. (2001) Animal Genome Size Database (http://www.genomesize.com).
84. King, M., Honeycutt, R., and Contreras, N. (1986) Chromosomal repatterning in
crocodiles: C, G, and N-banding and in situ hybridization of 18S and 26S rRNA
cistrons. Genetica 27, 191–201.