Chapter 24

Recombination Hotspots in Nonallelic Homologous Recombination

Matthew E. Hurles, PhD and James R. Lupski, MD, PhD

CONTENTS
INTRODUCTION
IDENTIFICATION OF RECOMBINATION HOTSPOTS
FEATURES OF AHR AND NAHR HOTSPOTS
MECHANISTIC BASIS OF RECOMBINATION HOTSPOTS
EVOLUTIONARY ORIGINS OF HOTSPOTS
CONCLUSIONS AND FUTURE WORK
SUMMARY
REFERENCES

INTRODUCTION

Rearrangement breakpoints resulting from nonallelic homologous recombination (NAHR) are typically clustered within small, well-defined portions of the segmental duplications that promote the rearrangement. These NAHR “hotspots” have been identified in every NAHR-promoted rearrangement in which breakpoint junctions have been sequenced in sufficient numbers. Enhancement of recombinatorial activity in NAHR hotspots varies from 3 to 237 times more than in the surrounding “cold” duplicated sequence. NAHR hotspots share many features in common with allelic homologous recombination (AHR) hotspots. Both AHR and NAHR hotspots appear to be relatively small (<2 kb) and are initiated by double-strand breaks. Gene conversion events as well as crossovers are enhanced at NAHR hotspots. Recent work has improved our understanding of the origins of NAHR and AHR hotspots, with both appearing to be relatively short-lived phenomena. Our present understanding of NAHR hotspots comes from a limited number of locus-specific studies. In the future, we can expect genomewide analyses to provide many further insights.

During meiosis, AHR occurs between homologous chromosomes, generating haplotypic diversity in the succeeding generation. The distribution of these recombination events throughout the genome has been known to be nonrandom for decades, but in the past 5 years ever-finer spatial resolutions have revealed dramatic heterogeneity in AHR rates at the DNA sequence level.
These studies have demonstrated the existence of recombination hotspots, where AHR rates can be several orders of magnitude more than in surrounding “cold” regions. In parallel to these developments, sufficient numbers of breakpoints of selected NAHR rearrangements have been characterized at the DNA sequence level to resolve the distribution of crossovers in these cases. This has similarly led to the identification of NAHR hotspots within paralogous sequences. This chapter profiles AHR and NAHR hotspots and discusses their similarities with a view to understanding the molecular mechanisms underpinning pathogenic NAHR.

From: Genomic Disorders: The Genomic Basis of Disease. Edited by: J. R. Lupski and P. Stankiewicz © Humana Press, Totowa, NJ. Part V / Functional Aspects of Genome Structure.

It is necessary to be clear when defining what constitutes a recombination hotspot. In the most general sense, a hotspot is a region of elevated activity relative to its surroundings. However, activity in the “surroundings” can be quantified in a number of ways. We consider a recombination hotspot to be an interval of DNA, defined at the sequence level, which manifests elevated recombinatorial activity relative to its immediate flanking sequences. In principle, a recombination hotspot could alternatively be defined as exhibiting elevated recombinatorial activity relative to the genome average. It is worth noting that the DNA1 hotspot identified experimentally in the major histocompatibility complex (MHC) region (1) using the former criterion actually exhibits lower recombinatorial activity than the genome average. Likewise, crossover in the NAHR hotspot in the Charcot-Marie-Tooth disease type 1A (CMT1A)-REP (2) does not appear to be significantly more frequent than the average genomewide recombination rate and has been referred to as a positional specificity for strand exchange (3).
Nonetheless, studies of both regions revealed that not all homologous sequences are equal and a “punctate” pattern of crossovers is revealed for both AHR and NAHR. In addition, homologous recombination is a process that can result in one of two outcomes: a crossover event or a gene conversion. Consequently, we consider that a recombination hotspot may become apparent because of elevated levels of either outcome.

The Importance of Recombination Hotspots

There are a handful of pathogenic mutation processes that operate in the human genome (e.g., base substitution, replication slippage, NAHR). For some of these mutational mechanisms we have a good understanding of the rates and locations at which these mutations arise (e.g., base substitution and replication slippage), whereas for NAHR we do not. Examining the distribution of recombination events at the sequence level is perhaps one of the most important clues we can have for understanding the basis of pathogenic NAHR. In a more general sense, the fine-mapping of all homologous recombination (HR) processes can only help our admittedly basic comprehension of what is a fundamental cellular process.

In recent years there has been a growing desire to be able to use patterns of linkage disequilibrium (LD) throughout the genome to design more efficient association studies for the detection of genes predisposing to complex diseases (4). These patterns of LD have been investigated at high resolution at several sites within the genome and a common finding is that the physical distance over which LD persists in the genome is highly variable (5). This variability leads to a “block-like” haplotypic structure in which regions of high LD are separated by shorter sequences across which LD is minimal (6). One of the main causal mechanisms by which this block-like structure might arise is the existence of extreme AHR rate heterogeneity.
The idea is that recombination rates are low or absent within haplotype blocks but that hotspots of recombinatorial activity map between blocks. Other possible causal mechanisms of these haplotypic structures exist; however, these alternative processes are dependent on population-specific demographic factors. Given the immense effort it is taking to map haplotypic structures in four human populations in the HapMap project (7), it is important to know to what degree it is possible to predict haplotypic structures in other populations from the data on these four. If the distribution of LD is determined mainly by population demography, then haplotype structures are likely to differ markedly between populations. If, however, recombination hotspots, a molecular rather than population-based process, account for patterns of LD and these hotspots are shared between populations, then haplotype structures are much more likely to be shared between populations.

In addition, accurate estimates of recombination rates are required when drawing evolutionary inferences from autosomal variation, for example, the estimation of the very recent age of the HIV-resistant CCR5-Δ32 mutation (8). The apparent recent origin of this variant has led to further work examining the possible role of the CCR5-Δ32 mutation in conferring resistance to recent plagues within the past 1000 years (9,10). This dating estimation is almost entirely predicated on an assumption of recombination rate homogeneity across a 46-Mb portion of chromosome 3.

IDENTIFICATION OF RECOMBINATION HOTSPOTS

The most obvious means to examine recombination rate heterogeneity is to identify a sufficient number of individual recombination events within a set of defined physical intervals, such that the frequency of recombination in one interval can be compared with that in another. This can be achieved at a variety of different spatial resolutions.
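The interval-by-interval comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Interval` class, the function names, and the event counts are hypothetical and are not taken from any study cited in this chapter. Each interval's event density is compared with the pooled density of the remaining intervals, which is one simple way of expressing activity relative to flanking sequence.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    start: int   # first base of the interval (bp)
    end: int     # one past the last base of the interval (bp)
    events: int  # recombination events mapped to this interval


def events_per_kb(interval: Interval) -> float:
    """Event density of one interval, in events per kilobase."""
    return interval.events / ((interval.end - interval.start) / 1000)


def relative_activity(intervals: list[Interval]) -> list[float]:
    """Density of each interval divided by the pooled density of all others."""
    if len(intervals) < 2:
        raise ValueError("need at least two intervals to make a comparison")
    ratios = []
    for i, interval in enumerate(intervals):
        others = intervals[:i] + intervals[i + 1:]
        background_events = sum(o.events for o in others)
        background_kb = sum(o.end - o.start for o in others) / 1000
        background = background_events / background_kb
        ratios.append(events_per_kb(interval) / background
                      if background > 0 else float("inf"))
    return ratios


# Hypothetical counts (assumption): a 1-kb interval flanked by two 5-kb intervals.
example = [Interval(0, 5000, 2), Interval(5000, 6000, 20), Interval(6000, 11000, 3)]
ratios = relative_activity(example)
```

With these made-up counts the central interval carries 20 events per kb against a pooled flanking density of 0.5 events per kb, so its ratio is 40, whereas both flanking intervals fall below 1. Real data would also call for a statistical test of whether such a ratio could arise by chance, which this sketch omits.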
Despite this conceptual similarity, the methods used to identify AHR and NAHR hotspots have traditionally been very different.

Finding NAHR Hotspots

NAHR events are generally ascertained as a result of their deleterious phenotypic outcomes, usually conveyed by gene dosage effects secondary to DNA rearrangements (e.g., deletion or duplication). In this sense they are similar to phenotype-based screens in model organisms. De novo NAHR events are subsequently identified by comparing the genomes of patients and their parents. A collection of de novo NAHR events can then be fine-mapped to identify and ultimately sequence the rearrangement breakpoints. Not all recurrent rearrangements result from NAHR (11), and so the preliminary identification of similar-sized rearrangements must be followed by the mapping of breakpoints to blocks of duplicated sequence. Most NAHR breakpoints have been mapped to a pair of duplicated sequences (also referred to as low-copy repeats, segmental duplications, or duplicons), but not to a specific interval within these duplicated sequences.

There are two major complicating factors that hinder the fine-mapping of NAHR breakpoints within duplicated sequences: the presence of the duplicated sequences in multiple copies, and the complex sequence variation within duplicated sequences. Somatic cell hybrids that retain only the rearranged copy of a chromosome and not its unrearranged homolog have been found to be extremely useful in disentangling the signals from the multiple copies of the duplicated sequences that have driven the NAHR events. Reducing the copy number of the sequences flanking the breakpoint interval enables the isolation of a junction-specific breakpoint that can be subsequently sequenced.
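Once a junction has been sequenced, fine-mapping amounts to finding where the recombinant sequence switches from matching one copy of the duplicated sequence to matching the other. The sketch below illustrates that logic in Python. It assumes that the bases distinguishing the proximal and distal copies are already known for the parental chromosome on which the rearrangement arose. The function name, data layout, positions, and bases are all hypothetical.

```python
def breakpoint_interval(variants, junction):
    """Locate the interval in which a junction switches between copies.

    variants: list of (position, proximal_base, distal_base), sorted by position.
    junction: dict mapping position -> base observed in the junction sequence.
    Returns (left, right), the positions of the two informative variants that
    flank the switch, or None if there is not exactly one switch.
    """
    origins = []
    for position, proximal_base, distal_base in variants:
        base = junction.get(position)
        if base == proximal_base and base != distal_base:
            origins.append((position, "proximal"))
        elif base == distal_base and base != proximal_base:
            origins.append((position, "distal"))
        # Sites that match neither or both copies are uninformative and skipped.
    switches = [(a, b) for a, b in zip(origins, origins[1:]) if a[1] != b[1]]
    if len(switches) != 1:
        return None  # no switch, or a more complex pattern than one crossover
    (left, _), (right, _) = switches[0]
    return left, right


# Hypothetical copy-distinguishing variants and junction sequence (assumption).
example_variants = [(100, "A", "G"), (450, "C", "T"), (900, "G", "A"), (1500, "T", "C")]
example_junction = {100: "A", 450: "C", 900: "A", 1500: "C"}
interval = breakpoint_interval(example_variants, example_junction)
```

In this made-up case the junction is proximal-type at positions 100 and 450 and distal-type at 900 and 1500, so the breakpoint is assigned to the interval between 450 and 900. Returning `None` for patterns with no switch or several switches is a design choice of the sketch: such junctions need inspection rather than automatic assignment.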
The complex pattern of sequence variation within duplicated sequences means that variants apparent within the reference sequence that distinguish between proximal and distal copies of a duplicated sequence (known as cis-morphisms or paralogous sequence variants [PSVs]) may exhibit multi-site variation (12). In other words, rather than being diagnostic of the proximal copy of a duplicated sequence, a variant may be absent from either copy, present in both copies, or even present only in the distal copy. Sequencing parental chromosomes to characterize sequence variation in the duplicated sequences prior to NAHR can be critical to defining the precise interval between two PSVs within which the breakpoint must map.

Finding AHR Hotspots

In contrast to NAHR, AHR events are generally ascertained through their generation of recombinant haplotypes rather than any phenotypic effects. Initially, recombinant haplotypes were identified through the pedigree analysis of haplotypes of sparsely distributed markers. Unfortunately, the relatively low frequency of recombination events across the genome as a whole (approx 50 crossovers per male meiosis) means that obtaining accurate estimates over even relatively large physical distances requires large numbers of meioses. It is not feasible to type the number of meioses required to get accurate estimates of recombination frequency for the shorter distances (tens of kilobases) over which the patterns of LD described above are observed (13).

In recent years, researchers have taken advantage of the fact that the 50 million sperm in every milliliter of semen each represent an independent product of meiosis to characterize male AHR at much finer resolution. By amplifying recombinant-specific haplotypes by PCR from pools of accurately quantified numbers of sperm it has become possible to estimate recombination frequencies over intervals of hundreds of bases (see Fig. 1) (14).
However, the painstaking optimization required to achieve the levels of sensitivity necessary to detect single recombinant haplotypes (spanning <10 kb) against a background of thousands of nonrecombinant haplotypes renders it difficult to expand the initial locus-specific studies of recombination rate heterogeneity across the whole genome.

More recently, a new class of statistical methods has been devised that can infer recombination rate heterogeneity from the patterns of sequence variation within a population (15,16). Rather than collecting individual recombination events, this approach relies on being able to detect the cumulative impact of recombination on haplotypic structures. Having been verified against known, experimentally determined hotspots, these methods provide the prospect of being able to scan across the entire genome using datasets of haplotype diversity that are already being generated in the HapMap project.

Fig. 1. Rare recombination events in spermatogenesis generate a low frequency of recombinant haplotypes. Rare sperm in which selected allelic homologous recombination (AHR) or nonallelic homologous recombination (NAHR) events have occurred are shown in gray. Gray/black arrows indicate homologous sequences. Variants that differ between the homologous sequences are shown above the arrow. Novel haplotypes diagnostic of different reciprocal recombination events are shown for (A) NAHR: deletion and duplication and (B) AHR. The problem of identifying recombination (AHR or NAHR) events in sperm can, therefore, be reduced to one of identifying recombinant haplotypes in sperm DNA against a high background of nonrecombinant haplotypes.

FEATURES OF AHR AND NAHR HOTSPOTS

Known NAHR Hotspots

Only for a minority of genomic disorders have the breakpoints of the disease-causing rearrangements been mapped to specific intervals within duplicated sequences. However, for every case in which sufficient numbers of breakpoints have been precisely mapped, substantial NAHR rate heterogeneity within the duplicated sequence is observed. Table 1 documents the known NAHR hotspots. It can be seen that the list of rearrangements is only a small subset of the comprehensive list of genomic disorders given elsewhere in this book.

Table 1
Known NAHR Hotspots

Disease | Chr. | Repeats | Hotspot | Size of hotspot | Maximum size of contiguous homology block | Factor of enhancement (a) | Refs.
Azoospermia (AZFa) | Y | HERV15 | ID1 | 1.3 kb | 10 kb | 5X | 47–50
Azoospermia (AZFa) | Y | HERV15 | ID2 | 1.7 kb | 10 kb | 3X | 47–50
Azoospermia | Y | Distal P1/P5 | - | 933 bp | 92 kb | 66X | 51
NF1 | 17 | NF1-REPs | - | 2 kb | 85 kb | 20X | 52
HNPP | 17 | CMT1A-REPs | - | 557 bp | 24 kb | 31X | 53
CMT1A | 17 | CMT1A-REPs | - | 741 bp | 24 kb | 18X | 20
SMS | 17 | SMS-REPs | Block A hotspot | 1.1 kb | 170 kb | 29X | 54
SMS | 17 | SMS-REPs | Block C hotspot | 1.6 kb | 170 kb | 26X | 54
SMS | 17 | LCR17pA and LCR17pD | - | 524 bp | 124 kb | 237X | 55
Williams-Beuren syndrome (WBS) | 7 | WBS-LCRs | Block B hotspot Bc/Bm on normal (not inverted) background | 3.4 kb | 143 kb | 9–18X | 56

(a) The factor of enhancement (F) is calculated from the size of the hotspot (H), the size of the contiguous duplicated block of appreciable homology (R), and the proportion of breakpoints found within the hotspot (q): F = Rq/H.
HERV, human endogenous retrovirus; NF1, neurofibromatosis 1; HNPP, hereditary neuropathy with liability to pressure palsies; CMT1A, Charcot-Marie-Tooth disease type 1A; SMS, Smith-Magenis syndrome; LCR, low-copy repeat; Chr., chromosome.

Known AHR Hotspots

The number of experimentally determined AHR hotspots is similarly limited, and these hotspots are highly nonrandomly distributed throughout the genome. Some were initially identified at low resolution in pedigree analysis of a few well-characterized loci (e.g., β-globin, the MHC and the pseudo-autosomal region) or from patterns of LD (e.g., the MHC) and were subsequently fine-mapped in males using sperm typing methods. Other AHR hotspots were identified in sequences flanking minisatellites during studies of minisatellite mutation dynamics in sperm (reviewed by ref. 17).

Properties of AHR and NAHR Hotspots

Despite the relative paucity of well-characterized AHR and NAHR hotspots, a number of shared features of both types of hotspots have been observed (18). Three characteristics shared by both NAHR and AHR hotspots are: (1) they occupy clearly delimited short intervals less than 2 kb in size; (2) there are no obvious sequence similarities either within each class of hotspot or between them; and (3) they appear to be coincident with gene conversion events.

Fig. 2. Crossover frequencies across two sites, one encompassing a nonallelic homologous recombination (NAHR) hotspot and the other an allelic homologous recombination (AHR) hotspot. The numbers of crossover events observed in neighbouring intervals of homologous sequences separated by single nucleotide polymorphisms (AHR) or paralogous sequence variants (NAHR) demonstrate the existence of tightly confined hotspots for both AHR and NAHR. The CMT1A-REP data come from ref. 20 and the SHOX data come from ref. 46. The number above each column indicates the number of individual events detected.

In addition to the absence of sequence conservation between hotspots of both HR classes, there is no apparent shared sequence context (e.g., GC content, CpG dinucleotide frequency, poly[A]/poly[T] fraction, dispersed repeats or the presence of [AC]n repeats) that is consistently associated with the presence of hotspots. It may be that recombination hotspots have more similar properties in higher levels of chromatin organization (e.g., chromatin conformation, epigenetic marks, histone modification, intranuclear organization) than they do in primary sequence.
Alternatively, a non-B DNA conformation may initiate breaks that stimulate recombinatorial activity.

Most of the NAHR hotspots listed in Table 1 are associated with longer stretches of identity between the paralogous sequences in which they reside. This is not surprising given the known dependence of HR rate on levels of sequence similarity and the presence of much greater variation in sequence identity between paralogous sequences than between allelic sequences in the human genome. This has led to the idea that gene conversion between paralogous sequences could potentiate subsequent rearrangements as a result of generating greater similarity between the paralogs (homogenization). Such gene conversion events could be considered “premutations” for genomic disorders, analogous to those apparent for triplet repeat disorders.

The morphology of AHR hotspots is better characterized than that of NAHR hotspots, and it appears that for AHR hotspots a central peak in activity is surrounded by symmetrical tails (see Fig. 2). This commonality among AHR hotspots has led to suggestions that they reflect underlying similarities between hotspots in the initiation and processing of recombination events (19).

The intensity of recombination hotspots can be measured in a variety of different ways. Intensity can be considered to be relative, for example the enhancement in activity in hotspots over flanking cold regions, or absolute, for example the recombination rate per meiosis. It has been estimated that the SHOX AHR hotspot in the pseudo-autosomal region has a recombination rate of 3/1000 meioses, which is considerably lower than the intensity of some of the most intense hotspots in yeast (10–100/1000 meioses) (19). We can use the incidence of autosomal dominant genomic disorders that result from NAHR to estimate an absolute intensity of their associated NAHR hotspots.
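Both measures of intensity reduce to simple arithmetic, sketched here in Python for the CMT1A-REP hotspot. The relative measure uses the formula given in the footnote to Table 1, F = Rq/H. The absolute measure multiplies the proportion of rearrangements falling in the hotspot by the de novo rate of the rearrangement. The function names are our own, and the input values (741 bp, 24 kb, 56%, 1/2500) are those quoted in this chapter for CMT1A.

```python
def enhancement_factor(hotspot_bp, homology_bp, fraction_in_hotspot):
    """Relative intensity F = R*q/H, as defined in the footnote to Table 1."""
    return homology_bp * fraction_in_hotspot / hotspot_bp


def absolute_intensity_per_1000(fraction_in_hotspot, de_novo_rate):
    """Hotspot events per 1000 meioses, given the de novo rate per meiosis."""
    return fraction_in_hotspot * de_novo_rate * 1000


# CMT1A-REP values quoted in this chapter.
relative = enhancement_factor(hotspot_bp=741, homology_bp=24_000,
                              fraction_in_hotspot=0.56)
absolute = absolute_intensity_per_1000(fraction_in_hotspot=0.56,
                                       de_novo_rate=1 / 2500)

print(f"relative enhancement: {relative:.1f}X")          # about 18X
print(f"absolute intensity: {absolute:.2f}/1000 meioses")  # about 0.22
```

The relative figure agrees with the 18X listed for CMT1A in Table 1. Note that F depends on the size chosen for the homology block R, so comparisons between loci are only as good as the definition of "appreciable homology" used for each.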
Mutations in the 741-bp CMT1A-REP hotspot are found in 56% of all CMT1A duplications, which occur at a de novo rate of approx 1/2500 meioses, indicating an absolute intensity of 0.22/1000 meioses (20).

Limited evidence on sex-specific AHR and NAHR rates suggests that there are sex-dependent differences in NAHR hotspot intensities that mirror the well-studied sex-specific differences in the chromosome-wide distribution and frequency of AHR. For example, de novo CMT1A duplications occur preferentially in the paternal germline (21), whereas the TAP2 AHR hotspot appears to be much more active in the female germline (22). Further complexities in the variability of NAHR hotspot activity are suggested by the finding that the preferential location of breakpoints of neurofibromatosis 1 deletions differs between meiotic and mitotic events (23), and that the intensity of the CMT1A-REP hotspot varies among individuals (3). Disentangling the various factors that underlie quantitative differences in NAHR and AHR hotspot intensity between individuals, between hotspots, and between haplotypes should lead to improved understanding of the mechanistic basis underlying recombination rate heterogeneity.

MECHANISTIC BASIS OF RECOMBINATION HOTSPOTS

Our understanding of the mechanistic basis of homologous recombination in mammals has been driven primarily by the correlation of theoretical models with the roles played by homologs of the well-characterized proteins involved in recombination processes in yeast and bacteria. However, HR is involved in a multiplicity of tightly regulated processes in both soma and germline. In addition, it appears likely that HR can proceed via a number of related, but distinct, pathways. Therefore, it is reasonable to suspect that different subsets of HR proteins are recruited during these different processes.
Despite these caveats, we can characterize HR as a process initiated by a double-strand break (DSB), which is subsequently processed to facilitate a homology search, which results in the formation of an intermediate comprising at least one Holliday junction that is then resolved to generate either a crossover or a gene conversion event (24). In principle, a recombination hotspot could result from biases at any one of these stages (i.e., initiation, processing, and resolution) that favor a resulting crossover.

The Machinery of HR

Meiotic recombination hotspots in yeast result primarily from a biased distribution of DSBs introduced by the protein Spo11 (25). Targeting Spo11 in a sequence-specific manner by fusing it with a DNA-binding domain is sufficient to generate a recombination hotspot in a previously cold region (26). Furthermore, it has been shown that a pre-existing hotspot can be repressed by changes in chromatin structure (27). In yeast, as in other eukaryotes, homologs of the bacterial RecA protein catalyze the invasion of the processed DSB into homologous sequences. Components of the mismatch repair machinery are involved in the assessment of the degree of homology. The ratio of gene conversion to crossover at a hotspot has been found to differ both between mitosis and meiosis, and between different hotspots in the same genome. Moreover, it has been argued that there may be different classes of hotspot in yeast (28).

What relevance do these findings have for mammalian HR? There is recent evidence that DSBs form preferentially in hotspot regions in mice (29), but otherwise it remains to be proven that mammalian HR hotspots also result from preferential initiation by DSBs.
Evidence from sperm-typing (30) and population genetic studies suggests that the majority of recombination intermediates at meiotic AHR hotspots are resolved as gene conversions rather than as crossovers, but there is also evidence of variation in the gene conversion/crossover ratio between hotspots (30). This suggests that hotspot activity is not solely determined by initiation, but also by subsequent events in HR.

The sequential recruitment of different proteins to sites of HR during mammalian meiotic recombination has been mapped cytogenetically (31). This mapping has shown that the number of initial sites of DNA–DNA interactions, as revealed by foci of RecA homologs, seems to be 10-fold larger than the number of eventual crossover events. Only the minority of sites that recruit the mismatch repair protein MLH1 are thought to undergo crossovers. The majority of sites that do not appear to be associated with crossovers recruit a different mismatch repair protein, MSH4. Perhaps this excess of MSH4-recruiting sites reflects the ratio of gene conversion to crossover events observed in the sperm-typing study.

Evidence From Additional Observations

Additional observations from diverse sources provide further insights into the nature of HR hotspots. In mice it has been demonstrated that AHR hotspot activity declines as sequence similarity lessens (32). This mirrors the enrichment of NAHR hotspots in the more similar portions of paralogous sequences. Clearly, some kind of sequence similarity threshold is necessary, but not sufficient, for hotspot activity, although this threshold is poorly understood at present. The situation in NAHR may be complicated by the competition between nonallelic and allelic homologs to a DSB arising within a sequence that is duplicated elsewhere in the genome.
Competition between these alternate substrates may determine the ratio of AHR to NAHR and could reflect such factors as the sequence similarity, orientation, and spatial separation between potential substrates in the meiotic nucleus. Could it be that an NAHR hotspot merely reflects an AHR hotspot within a paralogous sequence whose paralog is of similar sequence identity to the allelic homolog? In this regard the NAHR hotspots residing in the male-specific portion of the Y (MSY) chromosome are particularly interesting given that this region of the genome is constitutively haploid. Paralogous sequences in the MSY that contain these hotspots lack an allelic pairing partner during meiosis. Consequently, these sequences do not undergo AHR, although some of them are known to undergo frequent paralogous gene conversion (33).

EVOLUTIONARY ORIGINS OF HOTSPOTS

The evolution of recombination hotspots both within humans and between humans and higher primates has been studied only minimally. The evolutionary origins of recombination hotspots are closely related both to the underlying mechanisms of HR, and also to the factors that modulate HR rates. We argue that if we were to have a better understanding of the origins of hotspots, we would also more fully understand their underlying mechanism.

In humans, there is conflicting evidence on the issue of whether recombination hotspots are shared between populations. The CMT1A NAHR hotspot appears to be conserved between populations (34,35), as do the AHR hotspots in the class II region of the MHC (36), whereas a study identifying AHR hotspots from large-scale population genetic data finds support for the existence of population-specific hotspots (15). Looking deeper into evolutionary history, it appears that the TAP2 AHR hotspot, at least, is not conserved in chimpanzees (37).

DSB models for HR suggest that hotspots should be ephemeral phenomena (38).
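The expected loss of a recombinogenic variant under biased transmission can be illustrated with a toy recursion. This is a sketch under simplifying assumptions of our own, not a model taken from the cited work: an infinite, randomly mating population, no mutation or selection, and a fixed transmission deficit `delta`, so that a heterozygote passes on the recombinogenic variant to a fraction 1/2 - delta of its gametes. Under those assumptions the variant frequency p changes each generation to p - 2·delta·p·(1 - p).

```python
def next_frequency(p, delta):
    """One generation of biased transmission under random mating.

    Homozygous carriers (frequency p*p) always transmit the variant.
    Heterozygotes (frequency 2*p*(1-p)) transmit it with probability
    0.5 - delta, which gives p' = p - 2*delta*p*(1 - p).
    """
    return p - 2 * delta * p * (1 - p)


def trajectory(p0, delta, generations):
    """Variant frequency over time, starting from frequency p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], delta))
    return freqs


# Hypothetical parameter values (assumption): 1% transmission deficit.
freqs = trajectory(p0=0.5, delta=0.01, generations=200)
```

With any positive `delta` the frequency declines every generation, and with `delta = 0` it stays constant. The value of `delta` used here is arbitrary and chosen only to make the decline visible. A finite population would add genetic drift on top of this deterministic trend.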
A sequence variant that acts as a preferential site for the initiation of HR through the generation of a DSB is likely to be preferentially removed during HR as the initiating strand is gene converted to the state of the less active variant on the homologous sequence. This bias would reduce the frequency of the recombinogenic variant in the population in a process known as “meiotic drive” (a biased inheritance of alleles). This prediction of meiotic drive has been verified experimentally in both human and mouse AHR hotspots (32,39).

This leads to the obvious question: if recombinogenic variants are doomed to extinction, how do recombination hotspots arise in the first place? One suggestion is that the number of potential recombination hotspots is greater than the number of active hotspots, and that the focusing of activity on a single location allows the other “cryptic” hotspots to evolve neutrally. Once an active hotspot wanes, a nearby cryptic hotspot could become active. This suggests a dynamic relationship between frequency and activity of recombinogenic variants and recombination hotspot location. An intriguing situation that may have a bearing on this relationship is the finding that alternate hotspots exist in the mouse MHC depending on the haplotypes that are recombining (32).

Additional insights into the transient nature of recombination hotspots come from the observations of large inter-individual variation in the intensity of specific AHR hotspots (39,40) that is associated with single base changes in and around the hotspots. It has yet to be demonstrated that inter-individual variation in NAHR hotspot activity can be ascribed to single base changes in the locality of the hotspot.

Comparative Studies: Evolutionary Origins of NAHR Hotspots

The likely dependence of NAHR rates on levels of sequence similarity between paralogs suggests that monitoring the evolution of this similarity may inform our understanding of the time-depth of NAHR hotspots.
One recent study of NAHR between paralogous human endogenous retrovirus (HERV) proviral sequences, flanking the Y-chromosome Azoospermia Factor a (AZFa) locus that when deleted causes male infertility, provided evidence for several hominid-specific gene conversion events that may have rendered the associated hotspots better substrates for chromosomal rearrangements in humans than in either chimpanzees or gorillas (41). Gene conversion generates signatures of concerted evolution in alignments of comparative sequences (see Fig. 3) and in the AZFa-HERVs these signatures are coincident with the location of the NAHR hotspots. If we continue the “premutation” analogy, homogenizing premutations have occurred on the hominid lineage and become fixed in an ancestral species to modern humans. However, because gene conversion and chromosomal rearrangement reflect the alternative products of a common intermediate, it may be that a recombinogenic sequence motif/structure underpins the association and the increased sequence identity resulting from gene conversion plays only a minor role in determining the frequency of chromosomal rearrangement. Nevertheless, the coincidence of signatures of concerted evolution and recurrent breakpoints of chromosomal rearrangements (mapped at the DNA sequence level) may enable the identification of putative rearrangement hotspots from analysis of comparative sequences from great apes.

Fig. 3. The effect of concerted evolution on phylogeny. Independent gene conversion events between paralogous sequences in two species lead to the sequences being homogenized within a species. As a consequence, phylogenies constructed from these sequences exhibit a different topology from that expected given the sequence of duplication and speciation events. This phylogenetic pattern can be used to detect regions of the genome undergoing “concerted evolution.” The “true phylogeny” shows the actual evolutionary relationships among the four sequences as a result of duplication preceding speciation. The “apparent phylogeny” is the phylogeny that would be reconstructed from an alignment of the four sequences after undergoing concerted evolution.

Why Do Recombination Hotspots Exist?

Our lack of understanding of recombination rate heterogeneity in the human genome is emphasised by our ignorance as to why recombination hotspots exist in the first place. Recombination rate heterogeneity seems to be a general feature of sexually reproducing eukaryotic species. Are AHR hotspots themselves of benefit to such organisms or are they a by-product of the need to regulate recombination events to ensure correct segregation of chromosomes? There are a number of plausible evolutionary scenarios that explain why hotspots themselves might be beneficial. For example, AHR hotspots might ensure that advantageous linkage groups are maintained together, or they might facilitate the operation of selection by ensuring that selective pressures operate independently at two loci under selection on either side of a hotspot (42).

It is less easy to find reasons to explain why NAHR hotspots might exist, other than the fact that they are related mechanistically to AHR hotspots. One intriguing prospect is that if NAHR and AHR hotspots are linked by a common mechanism, then the two processes are likely to co-evolve. Given the often pathogenic nature of NAHR, it could be that the machinery of AHR evolved to minimize NAHR. This certainly seems possible. Although individual pathogenic NAHR-promoted rearrangements are so rare that locus-specific selective pressures are likely to be too weak to effect evolutionary change, cumulatively these rearrangements may well exert an appreciable selective pressure. Is there any evidence that AHR has evolved to minimize NAHR?
In a hypothetical 3-Gb genome composed only of single-copy sequences, the minimal length of sequence identity between recombining partners required to ensure that HR occurs between allelic sequences need only be approx 20 bp (a given 20mer is expected to recur at random once in every 4^20, or approx 1.1 × 10^12, positions, which is far more than the 3 × 10^9 positions in a 3-Gb genome). This length is at least an order of magnitude shorter than the apparent minimal length in mammals, which seems likely to be more than 200 bp (43). This could well reflect the presence of so many dispersed repeats (and, hence, potential for NAHR) in the human genome. It is also possible that the lower rate of AHR in males reflects enhancement of NAHR on the sex chromosomes in the heterogametic germline, where the lack of a meiotic pairing partner outside of the pseudo-autosomal regions may facilitate pathogenic intrachromosomal rearrangements (44). An additional possibility to take into account when extrapolating from studies dissecting HR processes in model organisms is that the apparently recent proliferation of segmental duplications in primate genomes (45) may have imposed further modification on AHR processes, such that significant differences exist between AHR in primates and in other, less-duplicated, mammalian genomes.

CONCLUSIONS AND FUTURE WORK

Currently, only a small number of NAHR and AHR hotspots have been characterized in detail at the molecular level. However, whenever NAHR hotspots have been sought, they have been found. Despite this paucity of well-characterized HR hotspots, we have seen that NAHR and AHR hotspots share several important features in common, which may suggest that they share similar molecular mechanisms. However, there are no obvious shared sequence motifs between AHR or NAHR hotspots that might allow us to predict the position of hotspots on the basis of the single reference sequence at our disposal.
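The approx 20-bp figure given above for a hypothetical single-copy genome follows from simple counting, and it can be checked with a short calculation. The sketch below is illustrative only: it assumes an idealized genome of independent, equiprobable bases, and the function name `min_unique_kmer_length` and the constant `GENOME_SIZE_BP` are hypothetical names introduced here, not part of any cited method.

```python
def min_unique_kmer_length(genome_size_bp: int, alphabet_size: int = 4) -> int:
    """Return the smallest k for which the number of possible k-mers
    (alphabet_size ** k) exceeds the number of positions in the genome.

    Assumption: bases are independent and equally frequent, which real
    genomes (with repeats and skewed composition) do not satisfy.
    """
    k = 1
    while alphabet_size ** k <= genome_size_bp:
        k += 1
    return k


# Hypothetical 3-Gb genome made up only of single-copy sequence.
GENOME_SIZE_BP = 3_000_000_000

# Shortest length at which possible sequences outnumber genome positions.
k_min = min_unique_kmer_length(GENOME_SIZE_BP)

# Expected number of chance occurrences of one specific 20-mer
# elsewhere in the idealized genome.
expected_chance_matches_20mer = GENOME_SIZE_BP / 4 ** 20

print(f"smallest k with 4^k > genome size: {k_min}")
print(f"4^20 = {4 ** 20}")
print(f"expected chance matches of a 20-mer: {expected_chance_matches_20mer:.4f}")
```

Under these assumptions the threshold is 16 bp, and at 20 bp a given sequence is expected to recur by chance only about 0.003 times, which is consistent with the approximate figure quoted in the text. The contrast with the more than 200 bp apparently required in mammals (43) is what the argument rests on.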
Although it is difficult to draw firm conclusions from so few data, it seems that NAHR hotspots are likely to be less intense (in terms of locus-specific events per generation) than AHR hotspots. This is not altogether surprising given: (1) competition between allelic and paralogous sequences and (2) a requirement for chromosomal gymnastics for NAHR to proceed. In contrast to many other fundamental processes in the human genome, there are good mechanistic reasons, and limited evidence, that individual NAHR and AHR hotspots are not likely to be highly conserved over large evolutionary distances. Undoubtedly, what is required for a quantum leap in our understanding of HR hotspots is a set of methods that allow these hotspots to be identified in a high-throughput manner. At present, all NAHR hotspots have been identified on the basis of phenotypes associated with the rearrangements they promote. This requirement to collect sufficient numbers of rare rearrangements from the population represents the major barrier to such high-throughput methods. A haplotype-driven (as opposed to a phenotype-driven) assay for NAHR in sperm could be the answer. Sperm-typing methods have been successful in recent years in characterising novel AHR hotspots (13). Candidate locations to test for NAHR hotspot activity could be identified by scanning comparative sequences of segmental duplications in great ape species for signals of concerted evolution. Sperm-typing methods of assaying genome “rearrangeability” would also help to address many of the outstanding fundamental questions concerning NAHR hotspots, which include:
• To what degree do rates of NAHR vary between loci, individuals, and populations, and what factors underlie this variation?
• What is the relationship between gene conversion hotspots and rearrangement hotspots?
• Are NAHR hotspots simply AHR hotspots within paralogous sequences?
• Are there different classes of NAHR and AHR hotspots?
• Are NAHR and AHR hotspots necessarily short-lived?

Although our focus on a specific class of mutational mechanisms underlying relatively rare genetic diseases may appear to be of esoteric interest, we would argue that the role played by NAHR in other more common diseases with a genetic component is likely to have been underascertained. We think that it is reasonable to assume that every mutational mechanism in the human genome contributes to Mendelian and complex diseases, albeit to varying extents. Currently, NAHR has only been convincingly linked to the former. It has been well recognized that our ability to dissect the genetic basis of complex disease susceptibility will depend greatly on the underlying allelic architecture. This allelic architecture is itself dependent on assumptions about the dynamics of the underlying mutation process. Consequently, our ability to detect the contribution of NAHR to complex disease will depend in large degree on the presently unknown rates and locations of NAHR in the human genome. For example, the widespread existence of NAHR hotspots suggests that chromosomal rearrangements that are indistinguishable at the DNA sequence level may arise recurrently on diverse haplotypic backgrounds. Such haplotypic heterogeneity within a population could hinder the detection of any such etiologically important variants. Greater comprehension of the mutation dynamics of NAHR (and other mutational processes) should be used to improve the experimental design of searches for variants conferring susceptibility to complex diseases.

SUMMARY

NAHR hotspots within duplicated sequences that promote rearrangements have been found whenever they have been sought with sufficient resolution. AHR and NAHR hotspots share a number of important features in common, implying that they share the same underlying mechanism.
Many questions remain regarding the numbers and locations of NAHR hotspots, and new methodologies will be needed to gain a genome-wide understanding of this fundamental mutational process.

REFERENCES

1. Jeffreys AJ, Kauppi L, Neumann R. Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat Genet 2001;29:217–222.
2. Reiter LT, Murakami T, Koeuth T, et al. A recombination hotspot responsible for two inherited peripheral neuropathies is located near a mariner transposon-like element. Nat Genet 1996;12:288–297.
3. Han LL, Keller MP, Navidi W, Chance PF, Arnheim N. Unequal exchange at the Charcot-Marie-Tooth disease type 1A recombination hot-spot is not elevated above the genome average rate. Hum Mol Genet 2000;9:1881–1889.
4. Pritchard JK, Przeworski M. Linkage disequilibrium in humans: models and data. Am J Hum Genet 2001;69:1–14.
5. Dawson E, Abecasis GR, Bumpstead S, et al. A first-generation linkage disequilibrium map of human chromosome 22. Nature 2002;418:544–548.
6. Gabriel SB, Salomon R, Pelet A, et al. The structure of haplotype blocks in the human genome. Science 2002;296:2225–2229.
7. The International HapMap Consortium. The International HapMap Project. Nature 2003;426:789–796.
8. Stephens JC, Reich DE, Goldstein DB, et al. Dating the origin of the CCR5-Delta32 AIDS-resistance allele by the coalescence of haplotypes. Am J Hum Genet 1998;62:1507–1515.
9. Mecsas J, Franklin G, Kuziel WA, Brubaker RR, Falkow S, Mosier DE. Evolutionary genetics: CCR5 mutation and plague protection. Nature 2004;427:606.
10. Galvani AP, Slatkin M. Evaluating plague and smallpox as historical selective pressures for the CCR5-Delta 32 HIV-resistance allele. Proc Natl Acad Sci USA 2003;100:15,276–15,279.
11. Kurahashi H, Emanuel BS. Long AT-rich palindromes and the constitutional t(11;22) breakpoint. Hum Mol Genet 2001;10:2605–2617.
12. Fredman D, White SJ, Potter S, Eichler EE, Den Dunnen JT, Brookes AJ. Complex SNP-related sequence variation in segmental genome duplications. Nat Genet 2004;36:861–866.
13. Arnheim N, Calabrese P, Nordborg M. Hot and cold spots of recombination in the human genome: the reason we should find them and how this can be achieved. Am J Hum Genet 2003;73:5–16.
14. Jeffreys AJ, Murray J, Neumann R. High-resolution mapping of crossovers in human sperm defines a minisatellite-associated recombination hotspot. Mol Cell 1998;2:267–273.
15. Crawford DC, Bhangale T, Li N, et al. Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat Genet 2004;36:700–706.
16. McVean GA, Myers SR, Hunt S, Deloukas P, Bentley DR, Donnelly P. The fine-scale structure of recombination rate variation in the human genome. Science 2004;304:581–584.
17. de Massy B. Distribution of meiotic recombination sites. Trends Genet 2003;19:514–522.
18. Lupski JR. Hotspots of homologous recombination in the human genome: not all homologous sequences are equal. Genome Biol 2004;5:242.
19. Kauppi L, Jeffreys AJ, Keeney S. Where the crossovers are: recombination distributions in mammals. Nat Rev Genet 2004;5:413–424.
20. Lopes J, Ravise N, Vandenberghe A, et al. Fine mapping of de novo CMT1A and HNPP rearrangements within CMT1A-REPs evidences two distinct sex-dependent mechanisms and candidate sequences involved in recombination. Hum Mol Genet 1998;7:141–148.
21. Palau F, Lofgren A, De Jonghe P, et al. Origin of the de novo duplication in Charcot-Marie-Tooth disease type 1A: unequal nonsister chromatid exchange during spermatogenesis. Hum Mol Genet 1993;2:2031–2035.
22. Jeffreys AJ, Ritchie A, Neumann R. High resolution analysis of haplotype diversity and meiotic crossover in the human TAP2 recombination hotspot. Hum Mol Genet 2000;9:725–733.
23. Kehrer-Sawatzki H, Kluwe L, Sandig C, et al. High frequency of mosaicism among patients with neurofibromatosis type 1 (NF1) with microdeletions caused by somatic recombination of the JJAZ1 gene. Am J Hum Genet 2004;75:410–423.
24. Szostak JW, Orr-Weaver TL, Rothstein RJ, Stahl FW. The double-strand-break repair model for recombination. Cell 1983;33:25–35.
25. Baudat F, Nicolas A. Clustering of meiotic double-strand breaks on yeast chromosome III. Proc Natl Acad Sci USA 1997;94:5213–5218.
26. Pecina A, Smith KN, Mezard C, Murakami H, Ohta K, Nicolas A. Targeted stimulation of meiotic recombination. Cell 2002;111:173–184.
27. Ben-Aroya S, Mieczkowski PA, Petes TD, Kupiec M. The compact chromatin structure of a Ty repeated sequence suppresses recombination hotspot activity in Saccharomyces cerevisiae. Mol Cell 2004;15:221–231.
28. Petes TD. Meiotic recombination hot spots and cold spots. Nat Rev Genet 2001;2:360–369.
29. Qin J, Richardson LL, Jasin M, Handel MA, Arnheim N. Mouse strains with an active H2-Ea meiotic recombination hot spot exhibit increased levels of H2-Ea-specific DNA breaks in testicular germ cells. Mol Cell Biol 2004;24:1655–1666.
30. Jeffreys AJ, May CA. Intense and highly localized gene conversion activity in human meiotic crossover hot spots. Nat Genet 2004;36:151–156.
31. Moens PB, Kolas NK, Tarsounas M, Marcon E, Cohen PE, Spyropoulos B. The time course and chromosomal localization of recombination-related proteins at meiosis in the mouse are compatible with models that can resolve the early DNA-DNA interactions without reciprocal recombination. J Cell Sci 2002;115:1611–1622.
32. Yauk CL, Bois PR, Jeffreys AJ. High-resolution sperm typing of meiotic recombination in the mouse MHC Ebeta gene. EMBO J 2003;22:1389–1397.
33. Rozen S, Skaletsky H, Marszalek JD, et al. Abundant gene conversion between arms of palindromes in human and ape Y chromosomes. Nature 2003;423:873–876.
34. Timmerman V, Rautenstrauss B, Reiter LT, et al. Detection of the CMT1A/HNPP recombination hotspot in unrelated patients of European descent. J Med Genet 1997;34:43–49.
35. Warner LE, Reiter LT, Murakami T, Lupski JR. Molecular mechanisms for Charcot-Marie-Tooth disease and related demyelinating peripheral neuropathies. Cold Spring Harb Symp Quant Biol 1996;61:659–671.
36. Kauppi L, Sajantila A, Jeffreys AJ. Recombination hotspots rather than population history dominate linkage disequilibrium in the MHC class II region. Hum Mol Genet 2003;12:33–40.
37. Ptak SE, Roeder AD, Stephens M, Gilad Y, Paabo S, Przeworski M. Absence of the TAP2 human recombination hotspot in chimpanzees. PLoS Biol 2004;2:849–855.
38. Boulton A, Myers RS, Redfield RJ. The hotspot conversion paradox and the evolution of meiotic recombination. Proc Natl Acad Sci USA 1997;94:8058–8063.
39. Jeffreys AJ, Neumann R. Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat Genet 2002;31:267–271.
40. Monckton DG, Neumann R, Guram T, et al. Minisatellite mutation-rate variation associated with a flanking DNA sequence polymorphism. Nat Genet 1994;8:162–170.
41. Hurles ME, Willey D, Matthews L, Hussain SS. Origins of chromosomal rearrangement hotspots in the human genome: evidence from the AZFa deletion hotspots. Genome Biol 2004;5:R55.
42. Hey J. What’s so hot about recombination hotspots? PLoS Biol 2004;2:730–733.
43. Waldman AS, Liskay RM. Dependence of intrachromosomal recombination in mammalian cells on uninterrupted homology. Mol Cell Biol 1988;8:5350–5357.
44. Hurles ME, Jobling MA. A singular chromosome. Nat Genet 2003;34:246–247.
45. Bailey JA, Gu Z, Clark RA, et al. Recent segmental duplications in the human genome. Science 2002;297:1003–1007.
46. May CA, Shone AC, Kalaydjieva L, Sajantila A, Jeffreys AJ. Crossover clustering and rapid decay of linkage disequilibrium in the Xp/Yp pseudoautosomal gene SHOX. Nat Genet 2002;31:272–275.
47. Kamp C, Hirschmann P, Voss H, Huellen K, Vogt PH. Two long homologous retroviral sequence blocks in proximal Yq11 cause AZFa microdeletions as a result of intrachromosomal recombination events. Hum Mol Genet 2000;9:2563–2572.
48. Blanco P, Shlumukova M, Sargent CA, Jobling MA, Affara N, Hurles ME. Divergent outcomes of intrachromosomal recombination on the human Y chromosome: male infertility and recurrent polymorphism. J Med Genet 2000;37:752–758.
49. Bosch E, Jobling MA. Duplications of the AZFa region of the human Y chromosome are mediated by homologous recombination between HERVs and are compatible with male fertility. Hum Mol Genet 2003;12:341–347.
50. Sun C, Skaletsky H, Rozen S, et al. Deletion of azoospermia factor a (AZFa) region of human Y chromosome caused by recombination between HERV15 proviruses. Hum Mol Genet 2000;9:2291–2296.
51. Repping S, Skaletsky H, Lange J, et al. Recombination between palindromes P5 and P1 on the human Y chromosome causes massive deletions and spermatogenic failure. Am J Hum Genet 2002;71:906–922.
52. Lopez-Correa C, Dorschner M, Brems H, et al. Recombination hotspot in NF1 microdeletion patients. Hum Mol Genet 2001;10:1387–1392.
53. Reiter LT, Hastings PJ, Nelis E, De Jonghe P, Van Broeckhoven C, Lupski JR. Human meiotic recombination products revealed by sequencing a hotspot for homologous strand exchange in multiple HNPP deletion patients. Am J Hum Genet 1998;62:1023–1033.
54. Bi W, Park SS, Shaw CJ, Withers MA, Patel PI, Lupski JR. Reciprocal crossovers and a positional preference for strand exchange in recombination events resulting in deletion or duplication of chromosome 17p11.2. Am J Hum Genet 2003;73:1302–1315.
55. Shaw CJ, Withers MA, Lupski JR. Uncommon deletions of the Smith-Magenis syndrome region can be recurrent when alternate low-copy repeats act as homologous recombination substrates. Am J Hum Genet 2004;75:75–81.
56. Bayes M, Magano LF, Rivera N, Flores R, Perez Jurado LA. Mutational mechanisms of Williams-Beuren syndrome deletions. Am J Hum Genet 2003;73:131–151.