Download 24 Recombination Hotspots in Nonallelic Homologous Recombination

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Meiosis wikipedia , lookup

Mitochondrial DNA wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

Minimal genome wikipedia , lookup

Mutation wikipedia , lookup

Gene wikipedia , lookup

RNA-Seq wikipedia , lookup

Tag SNP wikipedia , lookup

Human genetic variation wikipedia , lookup

Genetic engineering wikipedia , lookup

History of genetic engineering wikipedia , lookup

Gene desert wikipedia , lookup

Public health genomics wikipedia , lookup

Point mutation wikipedia , lookup

Neocentromere wikipedia , lookup

Zinc finger nuclease wikipedia , lookup

Transposable element wikipedia , lookup

NUMT wikipedia , lookup

Genome (book) wikipedia , lookup

Gene expression programming wikipedia , lookup

Whole genome sequencing wikipedia , lookup

Designer baby wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Metagenomics wikipedia , lookup

Microevolution wikipedia , lookup

Microsatellite wikipedia , lookup

Pathogenomics wikipedia , lookup

Polyploid wikipedia , lookup

Segmental Duplication on the Human Y Chromosome wikipedia , lookup

Holliday junction wikipedia , lookup

Non-coding DNA wikipedia , lookup

Copy-number variation wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Genomic library wikipedia , lookup

Genomics wikipedia , lookup

Helitron (biology) wikipedia , lookup

Human Genome Project wikipedia , lookup

Human genome wikipedia , lookup

No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup

Homologous recombination wikipedia , lookup

Genome evolution wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Genome editing wikipedia , lookup

Cre-Lox recombination wikipedia , lookup

Transcript
Chapter 24 / NAHR Hotspots
24
341
Recombination Hotspots in Nonallelic
Homologous Recombination
Matthew E. Hurles, PhD
and James R. Lupski, MD, PhD
CONTENTS
INTRODUCTION
IDENTIFICATION OF RECOMBINATION HOTSPOTS
FEATURES OF AHR AND NAHR HOTSPOTS
MECHANISTIC BASIS OF RECOMBINATION HOTSPOTS
EVOLUTIONARY ORIGINS OF HOTSPOTS
CONCLUSIONS AND FUTURE WORK
SUMMARY
REFERENCES
INTRODUCTION
Rearrangement breakpoints resulting from nonallelic homologous recombination (NAHR)
are typically clustered within small, well-defined portions of the segmental duplications that
promote the rearrangement. These NAHR “hotspots” have been identified in every NAHRpromoted rearrangement in which breakpoint junctions have been sequenced in sufficient
numbers. Enhancement of recombinatorial activity in NAHR hotspots varies from 3 to 237
times more than in the surrounding “cold” duplicated sequence. NAHR hotspots share many
features in common with allelic homologous recombination (AHR) hotspots. Both AHR and
NAHR hotspots appear to be relatively small (<2 kb) and are initiated by double-strand breaks.
Gene conversion events as well as crossovers are enhanced at NAHR hotspots. Recent work
has improved our understanding of the origins of NAHR and AHR hotspots, with both appearing to be relatively short-lived phenomena. Our present understanding of NAHR hotspots
comes from a limited number of locus-specific studies. In the future, we can expect genomewide analyses to provide many further insights.
During meiosis, AHR occurs between homologous chromosomes, generating haplotypic
diversity in the succeeding generation. The distribution of these recombination events throughout the genome has been known to be nonrandom for decades, but in the past 5 years ever-finer
spatial resolutions have revealed dramatic heterogeneity in AHR rates at the DNA sequence
level. These studies have demonstrated the existence of recombination hotspots, where AHR
From: Genomic Disorders: The Genomic Basis of Disease
Edited by: J. R. Lupski and P. Stankiewicz © Humana Press, Totowa, NJ
341
342
Part V / Functional Aspects of Genome Structure
rates can be several orders of magnitude more than in surrounding “cold” regions. In parallel
to these developments, sufficient numbers of breakpoints of selected NAHR rearrangements
have been characterized at the DNA sequence level to resolve the distribution of crossovers in
these cases. This has similarly led to the identification of NAHR hotspots within paralogous
sequences. This chapter profiles AHR and NAHR hotspots and discusses their similarities with
a view to understanding the molecular mechanisms underpinning pathogenic NAHR.
It is necessary to be clear when defining what constitutes a recombination hotspot. In the
most general sense, a hotspot is a region of elevated activity relative to its surroundings.
However, activity in the “surroundings” can be quantified in a number of ways. We consider
a recombination hotspot to be an interval of DNA, defined at the sequence level, which manifests elevated recombinatorial activity relative to its immediate flanking sequences. In principle, a recombination hotspot could alternatively be defined as exhibiting elevated
recombinatorial activity relative to the genome average. It is worth noting that the DNA1
hotspot identified experimentally in the major histocompatibility complex (MHC) region (1)
using the former criterion actually exhibits lower recombinatorial activity than the genome
average. Likewise, crossover in the NAHR hotspot in the Charcot-Marie-Tooth disease type 1A
(CMT1A)-REP (2) does not appear to be significantly more frequent than the average genomewide recombination rate and has been referred to as a positional specificity for strand exchange
(3). Nonetheless, studies of both regions revealed that not all homologous sequences are equal
and a “punctate” pattern of crossovers is revealed for both AHR and NAHR.
In addition, homologous recombination is a process that can result in one of two outcomes:
a crossover event or a gene conversion. Consequently, we consider that a recombination
hotspot may become apparent because of elevated levels of either outcome.
The Importance of Recombination Hotspots
There are a handful of pathogenic mutation processes that operate in the human genome
(e.g., base substitution, replication slippage, NAHR). For some of these mutational mechanisms we have a good understanding of the rates and locations at which these mutations arise
(e.g., base substitution and replication slippage), whereas for NAHR we do not. Examining the
distribution of recombination events at the sequence level is perhaps one of the most important
clues we can have for understanding the basis of pathogenic NAHR. In a more general sense,
the fine-mapping of all homologous recombination (HR) processes can only help our admittedly basic comprehension of what is a fundamental cellular process.
In recent years there has been a growing desire to be able to use patterns of linkage disequilibrium (LD) throughout the genome to design more efficient association studies for the detection of genes predisposing to complex diseases (4). These patterns of LD have been investigated
at high resolution at several sites within the genome and a common finding is that the physical
distance over which LD persists in the genome is highly variable (5). This variability leads to
a “block-like” haplotypic structure in which regions of high LD are separated by shorter
sequences across which LD is minimal (6). One of the main causal mechanisms by which this
block-like structure might arise is the existence of extreme AHR rate heterogeneity. The idea
being that recombination rates are low or absent within haplotype blocks but that hotspots of
recombinatorial activity map between blocks. Other possible causal mechanisms of these
haplotypic structures exist, however these alternative processes are dependent on populationspecific demographic factors. Given the immense effort it is taking to map haplotypic structures in four human populations in the HapMap project (7), it is important to know to what
Chapter 24 / NAHR Hotspots
343
degree it is possible to predict haplotypic structures in other populations from the data on these
four. If the distribution of LD is determined mainly by population demography then haplotype
structures are likely to differ markedly between populations. If, however, recombination
hotspots, a molecular rather then population-based process, accounts for patterns of LD and
these hotspots are shared between populations, then haplotype structures are much more likely
to be shared between populations.
In addition, accurate estimates of recombination rates are required when drawing evolutionary inferences from autosomal variation, for example, the estimation of the very recent age of
the HIV-resistant CCR5-Δ32 mutation (8). The apparent recent origin of this variant has lead
to further work examining the possible role of the CCR5-Δ32 mutation in conferring resistance
to recent plagues within the past 1000 years (9,10). This dating estimation is almost entirely
predicated on an assumption of recombination rate homogeneity across a 46-Mb portion of
chromosome 3.
IDENTIFICATION OF RECOMBINATION HOTSPOTS
The most obvious means to examine recombination rate heterogeneity is to identify a
sufficient number of individual recombination events within a set of defined physical intervals,
such that the frequency of recombination in one interval can be compared with that in another.
This can be achieved at a variety of different spatial resolutions. Despite this conceptual
similarity, the methods used to identify AHR and NAHR hotspots have traditionally been very
different.
Finding NAHR Hotspots
NAHR events are generally ascertained as a result of their deleterious phenotypic outcomes;
usually conveyed by gene dosage effects secondary to DNA rearrangements (e.g., deletion or
duplication). In this sense they are similar to phenotype-based screens in model organisms. De
novo NAHR events are subsequently identified by comparing the genomes of patients and their
parents. A collection of de novo NAHR events can then be fine-mapped to identify and ultimately sequence the rearrangement breakpoints. Not all recurrent rearrangements result from
NAHR (11) and so the preliminary identification of similar sized rearrangements must be
followed by the mapping of breakpoints to blocks of duplicated sequence.
Most NAHR breakpoints have been mapped to a pair of duplicated sequences (also referred
to as low-copy repeats, segmental duplications, or duplicons), but not to a specific interval
within these duplicated sequences. There are two major complicating factors that hinder the
fine-mapping of NAHR breakpoints within duplicated sequences: the presence of the duplicated sequences in multiple copies, and the complex sequence variation within duplicated
sequences.
Somatic cell hybrids that retain only the rearranged copy of a chromosome and not its
unrearranged homolog have been found to be extremely useful in disentangling the signals
from the multiple copies of the duplicated sequences that have driven the NAHR events.
Reducing the copy number of the sequences flanking the breakpoint interval enables the
isolation of a junction-specific breakpoint that can be subsequently sequenced.
The complex pattern of sequence variation within duplicated sequences means that variants
apparent within the reference sequence that distinguish between proximal and distal copies of
a duplicated sequence (known as cis-morphisms or paralogous sequence variants [PSVs]) may
344
Part V / Functional Aspects of Genome Structure
exhibit multi-site variation (12). In other words, rather than being diagnostic of the proximal
copy of a duplicated sequence a variant may be absent from either copy, present in both copies
or even present only in the distal copy. Sequencing parental chromosomes to characterize
sequence variation in the duplicated sequences prior to NAHR can be critical to defining the
precise interval between two PSVs between which the breakpoint must map.
Finding AHR Hotspots
In contrast to NAHR, AHR events are generally ascertained through their generation of
recombinant haplotypes rather than any phenotypic effects. Initially, recombinant haplotypes
were identified though the pedigree analysis of haplotypes of sparsely distributed markers.
Unfortunately, the relatively low frequency of recombination events across the genome as a
whole (approx 50 crossovers per male meiosis) means that to obtain accurate estimates over even
relatively large physical distances requires large numbers of meioses. It is not feasible to type the
number of meioses required to get accurate estimates of recombination frequency for the shorter
distances (tens of kilobases) over which the patterns of LD described above are observed (13).
In recent years, researchers have taken advantage of the fact that the 50 million sperm in
every milliliter of semen each represent an independent product of meiosis to characterize male
AHR at much finer resolution. By amplifying recombinant-specific haplotypes by PCR from
pools of accurately quantified numbers of sperm it has become possible to estimate recombination frequencies over intervals of hundreds of bases (see Fig. 1) (14). However, the painstaking optimization required to achieve the levels of sensitivity necessary to detect single
recombinant haplotypes (spanning <10 kb) against a background of thousands of nonrecombinant haplotypes renders it difficult to expand the initial locus-specific studies of recombination rate heterogeneity across the whole genome.
More recently, a new class of statistical methods have been devised that can infer recombination rate heterogeneity from the patterns of sequence variation within a population (15,16).
Rather than collect individual recombination events this approach relies on being able to detect
the cumulative impact of recombination on haplotypic structures. Having been verified against
known, experimentally determined hotspots, these methods provide the prospect of being able
to scan across the entire genome using datasets of haplotype diversity that are already being
generated in the HapMap project.
FEATURES OF AHR AND NAHR HOTSPOTS
Known NAHR Hotspots
Only for a minority of genomic disorders have the breakpoints of the disease-causing
rearrangements been mapped to specific intervals within duplicated sequences. However, for
every case in which sufficient numbers of breakpoints have been precisely mapped, substantial
NAHR rate heterogeneity within the duplicated sequence is observed. Table 1 documents the
known NAHR hotspots. It can be seen that the list of rearrangements is only a small subset of
the comprehensive list of genomic disorders given elsewhere in this book.
Known AHR Hotspots
The number of experimentally determined AHR hotspots is similarly limited and are highly
nonrandomly distributed throughout the genome. Some were initially identified at low resolution
in pedigree analysis of a few well-characterized loci (e.g., β-globin, the MHC and the pseudo-
Chapter 24 / NAHR Hotspots
345
Fig. 1. Rare recombination events in spermatogenesis generate a low frequency of recombinant
haplotypes. Rare sperm in which selected allelic homologous recombination (AHR) or nonallelic homologous recombination (NAHR) events have occurred are shown in gray. Gray/black arrows indicate
homologous sequences. Variants that differ between the homologous sequences are shown above the
arrow. Novel haplotypes diagnostic of different reciprocal recombination events are show for (A)
NAHR: deletion and duplication and (B) AHR. The problem of identifying recombination (AHR or
NAHR) events in sperm can, therefore, be reduced to one of identifying recombinant haplotypes in
sperm DNA against a high background of nonrecombinant haplotypes.
346
Part V / Functional Aspects of Genome Structure
Table 1
Known NAHR Hotspots
Size of
Maximum contiguous
size of
homology
Factor of
hotspot
block enhancementa Refs.
Disease
Chr.
Repeats
Hotspot
Azoospermia
(AZFa)
Azoospermia
(AZFa)
Azoospermia
NF1
HNPP
CMT1A
SMS
SMS
SMS
Y
HERV15
ID1
1.3 kb
10 kb
5X
47–50
Y
HERV15
ID2
1.7 kb
10 kb
3X
47–50
933 bp
2 kb
557 bp
741 bp
1.1 kb
1.6 kb
524 bp
92 kb
85 kb
24 kb
24 kb
170 kb
170 kb
124 kb
66X
20X
31X
18X
29X
26X
237X
51
52
53
20
54
54
55
3.4 kb
143 kb
9–18X
56
WilliamsBeuren
syndrome
(WBS)
Y
17
17
17
17
17
17
7
Distal P1/P5
—
NF1-REPs
—
CMT1A-REPs
—
CMT1A-REPs
—
SMS-REPs
Block A hotspot
SMS-REPs
Block C hotspot
LCR17pA
—
and
LCR17pD
WBS-LCRs Block B hotspot
Bc/Bm
on normal
(not inverted)
background
a The factor of enhancement (F) is calculated according to the equation below, which takes into account the size
of the hotspot (H), the size of the contiguous duplicated block of appreciable homology (R), and the proportion of
breakpoints found within the hotspot (q); F = Rq/H.
HERV, human endogenous retrovirus; NF1, neurofibromatosis 1; HNPP, hereditary neuropathy with liability
to pressure palsies; CMT1A, Charcot-Marie-Tooth disease type 1A; SMS, Smith-Magenis syndrome; LCR, lowcopy repeat; Chr., chromosome.
autosomal region) or from patterns of LD (e.g., the MHC) and were subsequently fine-mapped
in males using sperm typing methods. Other AHR hotspots were identified in sequences
flanking minisatellites during studies of minisatellite mutation dynamics in sperm (reviewed
by ref. 17).
Properties of AHR and NAHR Hotspots
Despite the relative paucity of well-characterized AHR and NAHR hotspots, a number of
shared features of both types of hotspots have been observed (18). Three characteristics shared
by both NAHR and AHR hotspots are: (1) they occupy clearly delimited short intervals less
than 2 kb in size; (2) there are no obvious sequence similarities either within each class of
hotspot or between them; and (3) they appear to be coincident with gene conversion events.
In addition to the absence of sequence conservation between hotspots of both HR classes,
there is no apparent shared sequence context (e.g., GC content, CpG dinucleotide frequency,
poly[A]/poly[T] fraction, dispersed repeats or the presence of [AC]n repeats) that is consistently associated with the presence of hotspots. It may be that recombination hotspots have
more similar properties in higher levels of chromatin organization (e.g., chromatin conformation, epigenetic marks, histone modification, intranuclear organization) than they do in pri-
Chapter 24 / NAHR Hotspots
347
Fig. 2. Crossover frequencies across two sites, one encompassing an nonallelic homologous recombination (NAHR) and the other an allelic homologous recombination (AHR) hotspot. The number of
crossover events observed in neighbouring intervals of homologous sequences separated by single
nucleotide polymorphisms (AHR) or paralogous sequence variants (NAHR) demonstrate the existence
of tightly confined hotspots for both AHR and NAHR. The CMT1A-REP data come from ref. 20 and
the SHOX data come from ref. 46. The number above each column indicates the number of individual
events detected.
mary sequence. Alternatively, a non-B DNA conformation may initiate breaks that stimulate
recombinatorial activity. Most of the NAHR hotspots listed in Table 1 are associated with
longer stretches of identity between the paralogous sequences in which they reside. This is not
surprising given the known dependence of HR rate on levels of sequence similarity and the
presence of much greater variation in sequence identity between paralogous sequences than
between allelic sequences in the human genome. This has lead to the idea that gene conversion
between paralogous sequences could potentiate subsequent rearrangements as a result of generating greater similarity between the paralogs (homogenization). Such gene conversion events
could be considered “premutations” for genomic disorders, analogous to those apparent for
triplet repeat disorders.
The morphology of AHR hotspots is better characterized than for NAHR hotspots and it
appears that for AHR hotspots a central peak in activity is surrounded by symmetrical tails (see
Fig. 2). This commonality among AHR hotspots has lead to suggestions that they reflect
underlying similarities between hotspots in the initiation and processing of recombination
events (19).
The intensity of recombination hotspots can be measured in a variety of different ways.
Intensity can be considered to be relative, for example the enhancement in activity in hotspots
over flanking cold regions, or absolute intensities, for example recombination rate per meiosis.
It has been estimated that the SHOX AHR hotspot in the pseudo-autosomal region has a
348
Part V / Functional Aspects of Genome Structure
recombination rate of 3/1000 meioses, which is considerably lower than the intensity of some
of the most intense hotspots in yeast (10–100/1000 meioses) (19). We can use the incidence
of autosomal dominant genomic disorders that result from NAHR to estimate an absolute
intensity of their associated NAHR hotspots. Mutations in the 741-bp CMT1A-REP hotspot
are found in 56% of all CMT1A duplications, which occur at a de novo rate of approx 1/2500
meioses, indicating an absolute intensity of 0.22/1000 meioses (20).
Limited evidence on sex-specific AHR and NAHR rates suggest that there are sex-dependent
differences in NAHR hotspot intensities that mirror the well-studied sex-specific differences
in the chromosome-wide distribution and frequency of AHR. For example, de novo CMT1A
duplications occur preferentially in the paternal germline (21), whereas the TAP2 AHR hotspot
appears to be much more active in the female germline (22).
Further complexities in the variability of NAHR hotspot activity are suggested by the
finding that the preferential location of breakpoints of neurofibromatosis 1 deletions are different in meiotic and mitotic events (23), and that the intensity of the CMT1A-REP hotspot
varies among individuals (3). Disentangling the various factors that underlie quantitative
differences in NAHR and AHR hotspot intensity between individuals, between hotspots, and
between haplotypes should lead to improved understanding of the mechanistic basis underlying recombination rate heterogeneity.
MECHANISTIC BASIS OF RECOMBINATION HOTSPOTS
Our understanding of the mechanistic basis of homologous recombination in mammals has
been driven primarily by the correlation of theoretical models with the roles played by homologs of the well-characterized proteins involved in recombination processes in yeast and bacteria. However, HR is involved in a multiplicity of tightly regulated processes in both soma and
germline. In addition, it appears likely that HR can proceed via a number of related, but distinct,
pathways. Therefore, it is reasonable to suspect that different subsets of HR proteins are
recruited during these different processes. Despite these caveats, we can characterize HR as
a process initiated by a double-strand break (DSB), which is subsequently processed to facilitate a homology search, which results in the formation of an intermediate comprising at least
one Holliday junction that is then resolved to generate either a crossover or a gene conversion
event (24). In principle, a recombination hotspot could result from biases at any one of these
stages (i.e., initiation, processing, and resolution) that favor a resulting crossover.
The Machinery of HR
Meiotic recombination hotspots in yeast result primarily from a biased distribution of DSBs
introduced by the protein Spo11 (25). Targeting Spo11 in a sequence-specific manner by
fusing it with a DNA-binding domain is sufficient to generate a recombination hotspot in a
previously cold region (26). Furthermore, it has been shown that a pre-existing hotspot can be
repressed by changes in chromatin structure (27). In yeast, as in other eukaryotes, homologs
of the bacterial RecA protein catalyze the invasion of the processed DSB into homologous
sequences. Components of the mismatch repair machinery are involved in the assessment of
the degree of homology. The ratio of gene conversion to crossover at a hotspot has been found
to differ both between mitosis and meiosis, and between different hotspots in the same genome.
Moreover, it has been argued that there may be different classes of hotspot in yeast (28).
What relevance do these findings have for mammalian HR? There is recent evidence that
DSBs form preferentially in hotspot regions in mice (29), but otherwise it remains to be proven
Chapter 24 / NAHR Hotspots
349
that mammalian HR hotspots also result from preferential initiation by DSBs. Evidence from
sperm-typing (30) and population genetic studies suggest that the majority of recombination
intermediates at meiotic AHR hotspots are resolved as gene conversions rather than as crossovers, but there is also evidence of variation in the gene conversion/crossover ratio between
hotspots (30). This suggests that hotspot activity is not solely determined by initiation, but also
by subsequent events in HR.
The sequential recruitment of different proteins to sites of HR during mammalian meiotic
recombination has been mapped cytogenetically (31). This mapping has shown that the number of initial sites of DNA–DNA interactions as revealed by foci of RecA homologs seems to
be 10-fold larger than the number of eventual crossover events. Only the minority of sites that
recruit the mismatch repair protein MLH1 are thought to undergo crossovers. The majority of
sites that do not appear to be associated with crossovers recruit a different mismatch repair
protein, MSH4. Perhaps this excess of MSH4-recruiting sites reflects the ratio of gene conversion to crossover events observed in the sperm-typing study.
Evidence From Additional Observations
Additional observations from diverse sources provide further insights into the nature of HR
hotspots. In mice it has been demonstrated that AHR hotspot activity declines as sequence
similarity lessens (32). This mirrors the enrichment of NAHR hotspots in the more similar
portions of paralogous sequences. Clearly, some kind of sequence similarity threshold is
necessary, but not sufficient for hotspot activity, although this threshold is poorly understood
at present. The situation in NAHR may be complicated by the competition between nonallelic
and allelic homologs to a DSB arising within a sequence that is duplicated elsewhere in the
genome. Competition between these alternate substrates may determine the ratio of AHR to
NAHR and could reflect such factors as the sequence similarity, orientation, and spatial separation between potential substrates in the meiotic nucleus.
Could it be that an NAHR hotspot merely reflects an AHR hotspot within a paralogous
sequence, which its paralog is of similar sequence identity to the allelic homolog? In this regard
the NAHR hotspots residing in the male-specific portion of the Y (MSY) chromosome are
particularly interesting given that this region of the genome is constitutively haploid. Paralogous
sequences in the MSY that contain these hotspots lack an allelic pairing partner during meiosis.
Consequently, these sequences do not undergo AHR, although some of them are known to
undergo frequent paralogous gene conversion (33).
EVOLUTIONARY ORIGINS OF HOTSPOTS
The evolution of recombination hotspots both within humans and between humans and
higher primates has been studied only minimally. The evolutionary origins of recombination
hotspots is closely related both to the underlying mechanisms of HR, and also to the factors that
modulate HR rates. We argue that if we were to have a better understanding of the origins of
hotspots, we would also more fully understand their underlying mechanism.
In humans, there is conflicting evidence on the issue of whether recombination hotspots are
shared between populations. The CMT1A NAHR hotspot appears to be conserved between
populations (34,35), as do the AHR hotspots in the class II region of the MHC (36), whereas
a study identifying AHR hotspots from large-scale population genetic data finds support for
the existence of population-specific hotspots (15). Looking deeper into evolutionary history,
it appears that the TAP2 AHR hotspot at least, is not conserved in chimpanzees (37).
350
Part V / Functional Aspects of Genome Structure
DSB models for HR suggests that hotspots should be ephemeral phenomena (38). A sequence
variant that acts as preferential site for the initiation of HR through the generation of a DSB
is likely to be preferentially removed during HR as the initiating strand is gene converted to
the state of the less active variant on the homologous sequence. This bias would reduce the
frequency of the recombinogenic variant in the population in a process known as “meiotic
drive” (a biased inheritance of alleles). This prediction of meiotic drive has been verified
experimentally in both human and mouse AHR hotspots (32,39).
This leads to the obvious question, if recombinogenic variants are doomed to extinction,
how do recombination hotspots arise in the first place? One suggestion is that the number of
potential recombination hotspots is more than the number of active hotspots, and that the
focusing of activity on a single location allows the other “cryptic” hotspots to evolve neutrally.
Once an active hotspot wanes, a nearby cryptic hotspot could become active. This suggests a
dynamic relationship between frequency and activity of recombinogenic variants and recombination hotspot location. An intriguing situation that may have a bearing on this relationship
is the finding that alternate hotspots exist in the mouse MHC depending on the haplotypes that
are recombining (32). Additional insights into the transient nature of recombination hotspots
comes from the observations of large inter-individual variation in the intensity of specific AHR
hotspots (39,40) that is associated with single base changes in and around the hotspots. It has
yet to be demonstrated that inter-individual variation in NAHR hotspot activity can be ascribed
to single base changes in the locality of the hotspot.
Comparative Studies: Evolutionary Origins of NAHR Hotspots
The likely dependence of NAHR rates on levels of sequence similarity between paralogs
suggests that monitoring the evolution of this similarity may inform our understanding of the
time-depth of NAHR hotspots. One recent study of NAHR between paralogous human endogenous retrovirus (HERV) proviral sequences, flanking the Y-chromosome Azoospermia Factor a (AZFa) locus that when deleted causes male infertility, provided evidence for several
hominid-specific gene conversion events that may have rendered the associated hotspots better
substrates for chromosomal rearrangements in humans than in either chimpanzees or gorillas
(41). Gene conversion generates signatures of concerted evolution in alignments of comparative sequences (see Fig. 3) and in the AZFa-HERVs these signatures are coincident with the
location of the NAHR hotspots. If we continue the “premutation” analogy, homogenizing
permutations have occurred on the hominid lineage and become fixed in an ancestral species
to modern humans. However, because gene conversion and chromosomal rearrangement reflect the alternative products of a common intermediate, it may be that a recombinogenic
sequence motif/structure underpins the association and the increased sequence identity resulting from gene conversion plays only a minor role in determining the frequency of chromosomal
rearrangement. Nevertheless, the coincidence of signatures of concerted evolution and recurrent breakpoints of chromosomal rearrangements (mapped at the DNA sequence level) may
enable the identification of putative rearrangement hotspots from analysis of comparative
sequences from great apes.
Why Do Recombination Hotspots Exist?
Our lack of understanding of recombination rate heterogeneity in the human genome is
emphasised by our ignorance as to why recombination hotspots exist in the first place. Recombination rate heterogeneity seems to be a general feature of sexually reproducing eukaryotic
Chapter 24 / NAHR Hotspots
351
Fig. 3. The effect of concerted evolution on phylogeny. Independent gene conversion events between
paralogous sequences in two sequences lead to the sequences being homogenized within a species. As
a consequence, phylogenies constructed from these sequences exhibit a different topology from that
expected given the sequence of duplication and speciation events. This phylogenetic pattern can be used
to detect regions of the genome undergoing “concerted evolution.” The “true phylogeny” shows the
actual evolutionary relationships among the four sequence as a result of duplication preceding speciation. The “apparent phylogeny” is the phylogeny that would be reconstructed from an alignment of the
four sequences after undergoing concerted evolution.
species. Are AHR hotspots themselves of benefit to such organisms or are they a by-product
of the need to regulate recombination events to ensure correct segregation of chromosomes?
There are a number of plausible evolutionary scenarios that explain why hotspots themselves
might be beneficial. For example: AHR hotspots might ensure that advantageous linkage
groups are maintained together, or they might facilitate the operation of selection by ensuring
that selective pressures operate independently at two loci under selection on either side of a
hotspot (42). It is less easy to find reasons to explain why NAHR hotspots might exist, other
than the fact that they are related mechanistically to AHR hotspots.
One intriguing prospect is that if NAHR and AHR hotspots are linked by a common mechanism, then the two processes are likely to co-evolve. Given the often pathogenic nature of
NAHR, it could be that the machinery of AHR evolved to minimize NAHR. This certainly
seems possible. Although individual pathogenic NAHR-promoted rearrangements are so rare
that locus-specific selective pressures are likely to be too weak to affect evolutionary change,
cumulatively these rearrangements may well assert an appreciable selective pressure. Is there
any evidence that AHR has evolved to minimize NAHR? In a hypothetical 3-Gb genome
comprised of only single copy sequences, the minimal length of sequence identity between
recombining partners required to ensure that HR occurs between allelic sequences needs only
be approx 20-bp long (frequency of random reoccurrence of a 20mer is 420, which is much more
than 3 Gb). This length is at least an order of magnitude shorter than the apparent minimal
length in mammals, which seems likely to be more than 200 bp (43). This could well reflect
352
Part V / Functional Aspects of Genome Structure
the presence of so many dispersed repeats (and, hence, potential for NAHR) in the human
genome. It is also possible that the lower rate of AHR in males reflects enhancement of NAHR
on the sex chromosomes in the heterogametic germline, where a lack of meiotic pairing partner
outside of the pseudo-autosomal regions may facilitate pathogenic intrachromosomal rearrangements (44). An additional possibility to take into account when extrapolating from studies dissecting HR processes in model organisms is that the apparently recent proliferation of
segmental duplications in primate genomes (45) may have imposed further modification on
AHR processes, such that significant differences exist between AHR in primates and in other,
less-duplicated, mammalian genomes.
CONCLUSIONS AND FUTURE WORK
Currently, only a small number of NAHR and AHR hotspots have been characterized in
detail at the molecular level. However, whenever NAHR hotspots have been sought, they have
been found. Despite this paucity of well-characterized HR hotspots, we have seen that NAHR
and AHR hotspots share several important features in common, which may suggest that they
share similar molecular mechanisms. However, there are no obvious shared sequence motifs
between AHR or NAHR hotspots that might allow us to predict the position of hotspots on the
basis of the single reference sequence at our disposal. Although it is difficult to draw firm
conclusions from so few data, it seems that NAHR hotspots are likely to be less intense (in
terms of locus-specific events per generation) than AHR hotspots. This is not altogether surprising given: (1) competition between allelic and paralogous sequences and (2) a requirement
for chromosomal gymnastics for NAHR to proceed. In contrast to many other fundamental
processes in the human genome, there are good mechanistic reasons, and limited evidence, that
individual NAHR and AHR hotspots are not likely to be highly conserved over large evolutionary distances.
Undoubtedly, what is required for a quantum leap in our understanding of HR hotspots are
methods that allow these hotspots to be identified in a high-throughput manner. At present, all
NAHR hotspots have been identified on the basis of phenotypes associated with the rearrangements they promote. This requirement to collect sufficient numbers of rare rearrangements from
the population represents the major barrier to such high-throughput methods. A haplotype-driven
(as opposed to a phenotype-driven) assay for NAHR in sperm could be the answer. Sperm-typing
methods have been successful in recent years in characterising novel AHR hotspots (13). Candidate locations to test for NAHR hotspot activity could be identified by scanning comparative
sequences of segmental duplications in great ape species for signals of concerted evolution.
Sperm-typing methods of assaying genome “rearrangeability” would also help to address
many of the outstanding fundamental questions concerning NAHR hotspots, which include:
• To what degree do rates of NAHR vary between loci, individuals, and populations, and what
factors underscore this variation?
• What is the relationship between gene conversion hotspots and rearrangement hotspots?
• Are NAHR hotspots simply AHR hotspots within paralogous sequences?
• Are there different classes of NAHR and AHR hotspots?
• Are NAHR and AHR hotspots necessarily short-lived?
Although our focus on a specific class of mutational mechanisms underlying relatively rare
genetic diseases may appear to be of esoteric interest, we would argue that the role played by
NAHR in other more common diseases with a genetic component is likely to have been under-
Chapter 24 / NAHR Hotspots
353
ascertained. We think that it is reasonable to assume that every mutational mechanism in the
human genome contributes to Mendelian and complex diseases, albeit to varying extents.
Currently, NAHR has only been convincingly linked to the former. It has been well recognized
that our ability to dissect the genetic basis of complex disease susceptibility will depend greatly
on the underlying allelic architecture. This allelic architecture is itself dependent on assumptions about the dynamics of the underlying mutation process. Consequently, our ability to
detect the contribution of NAHR to complex disease will depend in large degree on the presently unknown rates and locations of NAHR in the human genome. For example, the widespread existence of NAHR hotspots suggests that chromosomal rearrangements that are
indistinguishable at the DNA sequence level may arise recurrently on diverse haplotypic
backgrounds. Such haplotypic heterogeneity within a population could hinder the detection of
any such etiologically important variants. Greater comprehension of the mutation dynamics of
NAHR (and other mutational processes) should be used to improve the experimental design
of searches for variants conferring susceptibility to complex diseases.
SUMMARY
NAHR hotspots within duplicated sequences that promote rearrangements have been found whenever they have been sought with sufficient resolution. AHR and NAHR hotspots share a number of
important features in common, implying that they share the same underlying mechanism. Many
questions remain regarding the numbers and locations of NAHR hotspots and new methodologies will
be needed to gain a genome-wide understanding of this fundamental mutational process.
REFERENCES
1. Jeffreys AJ, Kauppi L, Neumann R. Intensely punctate meiotic recombination in the class II region of the major
histocompatibility complex. Nat Genet 2001;29:217–222.
2. Reiter LT, Murakami T, Koeuth T, et al. A recombination hotspot responsible for two inherited peripheral
neuropathies is located near a mariner transposon-like element. Nat Genet 1996;12:288–297.
3. Han LL, Keller MP, Navidi W, Chance PF, Arnheim N. Unequal exchange at the Charcot-Marie-Tooth disease
type 1A recombination hot-spot is not elevated above the genome average rate. Hum Mol Genet 2000;9:
1881–1889.
4. Pritchard JK, Przeworski M. Linkage disequilibrium in humans: models and data. Am J Hum Genet 2001;69:
1–14.
5. Dawson E, Abecasis GR, Bumpstead S, et al. A first-generation linkage disequilibrium map of human chromosome 22. Nature 2002;418:544–548.
6. Gabriel SB, Salomon R, Pelet A, et al. The structure of haplotype blocks in the human genome. Science
2002;296:2225–2229.
7. The International HapMap Consortium. The International HapMap Project. Nature 2003;426:789–796.
8. Stephens JC, Reich DE, Goldstein DB, et al. Dating the origin of the CCR5-Delta32 AIDS-resistance allele by
the coalescence of haplotypes. Am J Hum Genet 1998;62:1507–1515.
9. Mecsas J, Franklin G, Kuziel WA, Brubaker RR, Falkow S, Mosier DE. Evolutionary genetics: CCR5 mutation
and plague protection. Nature 2004;427:606.
10. Galvani AP, Slatkin M. Evaluating plague and smallpox as historical selective pressures for the CCR5-Delta
32 HIV-resistance allele. Proc Natl Acad Sci USA 2003;100:15,276–15,279.
11. Kurahashi H, Emanuel BS. Long AT-rich palindromes and the constitutional t(11;22) breakpoint. Hum Mol
Genet 2001;10:2605–2617.
12. Fredman D, White SJ, Potter S, Eichler EE, Den Dunnen JT, Brookes AJ. (2004) Complex SNP-related
sequence variation in segmental genome duplications. Nat Genet 2004;36:861–866.
13. Arnheim N, Calabrese P, Nordborg M. Hot and cold spots of recombination in the human genome: the reason
we should find them and how this can be achieved. Am J Hum Genet 2003;73:5–16.
354
Part V / Functional Aspects of Genome Structure
14. Jeffreys AJ, Murray J, Neumann R. High-resolution mapping of crossovers in human sperm defines a
minisatellite-associated recombination hotspot. Molecular Cell 1998;2:267–273.
15. Crawford DC, Bhangale T, Li N, et al. Evidence for substantial fine-scale variation in recombination rates
across the human genome. Nat Genet 2004;36:700–706.
16. McVean GA, Myers SR, Hunt S, Deloukas P, Bentley DR, Donnelly P. The fine-scale structure of recombination rate variation in the human genome. Science 2004;304:581–584.
17. de Massy B. Distribution of meiotic recombination sites. Trends Genet 2003;19:514–522.
18. Lupski JR. Hotspots of homologous recombination in the human genome: not all homologous sequences are
equal. Genome Biology 2004;5:242.
19. Kauppi L, Jeffreys AJ, Keeney S. Where the crossovers are: recombination distributions in mammals. Nat Rev
Genet 2004;5:413–424.
20. Lopes J, Ravise N, Vandenberghe A, et al. Fine mapping of de novo CMT1A and HNPP rearrangements within
CMT1A-REPs evidences two distinct sex-dependent mechanisms and candidate sequences involved in recombination. Hum Mol Genet 1998;7:141–148.
21. Palau F, Lofgren A, De Jonghe P, et al. Origin of the de novo duplication in Charcot-Marie-Tooth disease type
1A: unequal nonsister chromatid exchange during spermatogenesis. Hum Mol Genet 1993;2:2031–2035.
22. Jeffreys AJ, Ritchie A, Neumann R. High resolution analysis of haplotype diversity and meiotic crossover in
the human TAP2 recombination hotspot. Hum Mol Genet 2000;9:725–733.
23. Kehrer-Sawatzki H, Kluwe L, Sandig C, et al. High frequency of mosaicism among patients with neurofibromatosis type 1 (NF1) with microdeletions caused by somatic recombination of the JJAZ1 gene. Am J Hum
Genet 2004;75:410–423.
24. Szostak JW, Orr-Weaver TL, Rothstein RJ, Stahl FW. The double-strand-break repair model for recombination. Cell 1983;33:25–35.
25. Baudat F, Nicolas A. Clustering of meiotic double-strand breaks on yeast chromosome III. Proc Natl Acad Sci
USA 1997;94:5213–5218.
26. Pecina A, Smith KN, Mezard C, Murakami H, Ohta K, Nicolas A. Targeted stimulation of meiotic recombination. Cell 2002;111:173–184.
27. Ben-Aroya S, Mieczkowski PA, Petes TD, Kupiec M. The compact chromatin structure of a Ty repeated
sequence suppresses recombination hotspot activity in Saccharomyces cerevisiae. Mol Cell 2004;15:221–231.
28. Petes TD. Meiotic recombination hot spots and cold spots. Nat Rev Genet 2001;2:360–369.
29. Qin J, Richardson LL, Jasin M, Handel MA, Arnheim N. Mouse strains with an active H2-Ea meiotic recombination hot spot exhibit increased levels of H2-Ea-specific DNA breaks in testicular germ cells. Mol Cell Biol
2004;24:1655–1666.
30. Jeffreys AJ, May CA. Intense and highly localized gene conversion activity in human meiotic crossover hot
spots. Nat Genet 2004;36:151–156.
31. Moens PB, Kolas NK, Tarsounas M, Marcon E, Cohen PE, Spyropoulos B. The time course and chromosomal
localization of recombination-related proteins at meiosis in the mouse are compatible with models that can
resolve the early DNA-DNA interactions without reciprocal recombination. J Cell Sci 2002;115:1611–1622.
32. Yauk CL, Bois PR, Jeffreys AJ. High-resolution sperm typing of meiotic recombination in the mouse MHC
Ebeta gene. Embo J 2003;22:1389–1397.
33. Rozen S, Skaletsky H, Marszalek JD, et al. Abundant gene conversion between arms of palindromes in human
and ape Y chromosomes. Nature 2003;423:873–876.
34. Timmerman V, Rautenstrauss B, Reiter LT, et al. Detection of the CMT1A/HNPP recombination hotspot in
unrelated patients of European descent. J Med Genet 1997;34:43–49.
35. Warner LE, Reiter LT, Murakami T, Lupski JR. Molecular mechanisms for Charcot-Marie-Tooth disease and
related demyelinating peripheral neuropathies. Cold Spring Harb Symp Quant Biol 1996;61:659–671.
36. Kauppi L, Sajantila A, Jeffreys AJ. Recombination hotspots rather than population history dominate linkage
disequilibrium in the MHC class II region. Hum Mol Genet 2003;12:33–40.
37. Ptak SE, Roeder AD, Stephens M, Gilad Y, Paabo S, Przeworski M. Absence of the TAP2 human recombination hotspot in chimpanzees. PLoS Biol 2004;2:849–855.
38. Boulton A, Myers RS, Redfield RJ. The hotspot conversion paradox and the evolution of meiotic recombination. Proc Natl Acad Sci USA 1997;94:8058–8063.
39. Jeffreys AJ, Neumann R. Reciprocal crossover asymmetry and meiotic drive in a human recombination hot
spot. Nat Genet 2002;31:267–271.
Chapter 24 / NAHR Hotspots
355
40. Monckton DG, Neumann R, Guram T, et al. Minisatellite mutation-rate variation associated with a flanking
DNA sequence polymorphism. Nat Genet 1994;8:162–170.
41. Hurles ME, Willey D, Matthews L, Hussain SS. Origins of chromosomal rearrangement hotspots in the human
genome: evidence from the AZFa deletion hotspots. Genome Biol 2004;5:R55.
42. Hey J. What’s so hot about recombination hotspots? PLoS Biol 2004;2:730–733.
43. Waldman AS, Liskay RM. Dependence of intrachromosomal recombination in mammalian cells on uninterrupted homology. Mol Cell Biol 1988;8:5350–5357.
44. Hurles ME, Jobling MA. A singular chromosome. Nat Genet 2003;34:246–247.
45. Bailey JA, Gu Z, Clark RA, et al. Recent segmental duplications in the human genome. Science 2002;297:
1003–1007.
46. May CA, Shone AC, Kalaydjieva L, Sajantila A, Jeffreys AJ. Crossover clustering and rapid decay of linkage
disequilibrium in the Xp/Yp pseudoautosomal gene SHOX. Nat Genet 2002;31:272–275.
47. Kamp C, Hirschmann P, Voss H, Huellen K, Vogt PH. Two long homologous retroviral sequence blocks in
proximal Yq11 cause AZFa microdeletions as a result of intrachromosomal recombination events. Hum Mol
Genet 2000;9:2563–2572.
48. Blanco P, Shlumukova M, Sargent CA, Jobling MA, Affara N, Hurles ME. Divergent outcomes of
intrachromosomal recombination on the human Y chromosome: male infertility and recurrent polymorphism.
J Med Genet 2000;37:752–758.
49. Bosch E, Jobling MA. Duplications of the AZFa region of the human Y chromosome are mediated by homologous recombination between HERVs and are compatible with male fertility. Hum Mol Genet 2003;12:
341–347.
50. Sun C, Skaletsky H, Rozen S, et al. Deletion of azoospermia factor a (AZFa) region of human Y chromosome
caused by recombination between HERV15 proviruses. Hum Mol Genet 2000;9:2291–2296.
51. Repping S, Skaletsky H, Lange J, et al. Recombination between palindromes P5 and P1 on the human Y
chromosome causes massive deletions and spermatogenic failure. Am J Hum Genet 2002;71:906–922.
52. Lopez-Correa C, Dorschner M, Brems H, et al. Recombination hotspot in NF1 microdeletion patients. Hum
Mol Genet 2001;10:1387–1392.
53. Reiter LT, Hastings PJ, Nelis E, De Jonghe P, Van Broeckhoven C, Lupski JR. Human meiotic recombination
products revealed by sequencing a hotspot for homologous strand exchange in multiple HNPP deletion patients.
Am J Hum Genet 1998;62:1023–1033.
54. Bi W, Park SS, Shaw CJ, Withers MA, Patel PI, Lupski JR. Reciprocal crossovers and a positional preference
for strand exchange in recombination events resulting in deletion or duplication of chromosome 17p11.2. Am
J Hum Genet 2003;73:1302–1315.
55. Shaw CJ, Withers MA, Lupski JR. Uncommon deletions of the Smith-Magenis syndrome region can be
recurrent when alternate low-copy repeats act as homologous recombination substrates. Am J Hum Genet
2004;75:75–81.
56. Bayes M, Magano LF, Rivera N, Flores R, Perez Jurado LA. Mutational mechanisms of Williams-Beuren
syndrome deletions. Am J Hum Genet 2003;73:131–151.