Genetic characterization of commercially available outbred mice and an assessment of their utility for QTL mapping

Introduction

Genetic dissection of complex disease and quantitative phenotypes in the mouse is limited by a lack of resources for gene identification. In contrast to human genome-wide association studies (GWAS), which exploit accumulated historical recombinations to map susceptibility loci with a resolution measured in tens of kilobases, genetic mapping in mice typically uses crosses between inbred strains that deliver a resolution measured in tens of megabases. To improve mapping resolution significantly we need a population of mice with population genetics similar to those of humans: a large effective population size and a high density of independent recombination events. It might appear that these criteria could be met by completely outbred wild mice, but their use for mapping would encounter the same drawbacks that afflict human GWAS: (i) tens of thousands of subjects are needed for robust detection of common causal variants and (ii) the majority of the genetic variance remains unexplained, even with these large sample sizes.

One potential solution is to map in a population in which susceptibility loci consist entirely of known higher-frequency alleles. We have previously demonstrated the potential of this approach using a genetically heterogeneous stock (HS) for high-resolution genetic mapping. The HS is descended from eight known inbred strains, subjected to approximately 50 generations of pseudo-random breeding, thereby introducing multiple recombinants. Each animal is a fine-grained mosaic of the progenitors, making high-resolution mapping possible, and the known ancestry means that, in contrast to wild-derived populations, which are likely to contain many rare variants, every allele can be traced back to the founders.
In consequence, loci detected in the HS explained on average three quarters of the phenotypic variance of each mapped phenotype (over 100 analysed to date). Quantitative trait loci (QTLs) that contribute to variation in common complex phenotypes in an HS can be mapped into intervals of about 3 Mb, a substantial improvement over mapping in inbred strain crosses, but still too large for gene-level resolution.

Previously, we showed that mapping in a commercially available stock, HsdOla:MF1 UK mice, could identify genes. Even though the MF1’s origins are unclear, sequence analysis indicated that it could be modeled as if the animals were descended from inbred strains. We exploited this feature of the MF1 to identify Rgs2 as a gene underlying a quantitative trait locus for an anxiety-related phenotype. Subsequently, a US colony of MF1 has been used to obtain sub-megabase mapping resolution on a genome-wide level for QTLs influencing transcript abundance.

Success with the MF1 suggests that commercially available outbred stocks may be an important resource for gene identification. Not only could they deliver genome-wide gene-level mapping resolution, but they could also be cheaper to use for mapping than traditional laboratory strains, which have to be maintained and crossed within the user’s laboratory: outbred mice are simply imported, phenotyped and then genotyped. However, there has been no systematic examination of the genetic architecture of commercially available mice.
To date only about half a dozen colonies have been examined, from different and often unrelated perspectives: investigations of eight colonies of outbred Swiss mice, using assays of protein variation, indicated that the colonies had the same amount of variation as that found in fully outbred mouse or human populations {Rice, 1980 #263; Cui, 1993 #1591}; examination of outbred CD-1 mice found high levels of population substructure {Aldinger, 2009 #8005}; and genetic drift has been documented in a colony of CFLP mice {Papaioannou, 1980 #8002}.

Important gaps in our current knowledge need to be filled if we are to determine the suitability of commercial stocks for gene identification. First, we lack linkage disequilibrium (LD) maps: low LD will favour high-resolution mapping. Second, we do not know to what extent colonies are genetically related. We do not know to what extent the frequency of alleles varies between colonies, nor what fraction of variants is rare or private to specific colonies. Stocks with different names are assumed to be genetically different, but we do not know the extent of that differentiation, nor the extent to which colonies with the same name but sold by different suppliers are genetically similar. Mapping in colonies that consist primarily of high-frequency variants will require fewer animals. Furthermore, colonies that contain alleles common to laboratory strains would enable loci already detected in inbred strain crosses to be mapped and, potentially, the genes identified.

Results

Colony breeding protocols, size, age and health status

We contacted commercial providers of outbred stocks throughout the world, requesting details on colony sizes, colony history and protocols for maintaining stocks. Table 1 summarizes results from the XX companies that agreed to provide this information. We estimate that this represents XX of global colonies of outbred mice.
There is considerable variation in the way animals are maintained, and Table 1 documents practices that rule out colonies for genetic mapping. Since unintended directional selection (for example culling small mice) and genetic drift alter genetic diversity, some breeders maintain heterozygosity by periodically crossing the stock to animals taken from a much smaller population (the protocol is called IGS, which stands for….). In consequence a small number of chromosomes are distributed widely throughout the population, introducing large regions of linkage disequilibrium, which significantly reduces mapping resolution. With the exception of YY colonies, which we examined to confirm this prediction, we did not genetically characterize colonies using the IGS breeding scheme.

Colonies also vary considerably in size, age and health status. Larger colonies (such as XX) maintain heterozygosity better than smaller colonies. This is because mouse colonies behave very much like finite island populations, except for imposed bottlenecks or forcible introduction of new alleles. The time required for a neutral allele to go to fixation in a population, and hence to reduce heterozygosity, is approximately four times the effective population size (Ne), measured in generations. The age of a colony also affects mapping resolution: older colonies accumulate more recombinations, and mapping resolution depends primarily on the number of generations since the colony was founded. Finally, health status will determine a colony’s suitability for academic laboratories that impose strict health criteria for allowing animals into their facilities. For example only XX colonies had sufficiently clean health reports to be considered for admission into the Mary Lyon Centre, MRC, Harwell UK.

Genetic structure: inbreeding and population stratification

We started by comparing measures of inbreeding and genetic relatedness within each colony.
High rates of inbreeding make colonies less suitable for mapping because they contain fewer (if any) segregating QTLs. Colonies that consist of a mixture of relatives (such as siblings, half siblings, cousins, second-degree and third-degree relatives) will be difficult to use for mapping because the differing degrees of genetic relatedness introduce population structure.

We screened all populations with 351 markers at four loci chosen so that they could also be used to map QTLs and assess linkage disequilibrium (Table 2). SNPs were spaced so as to allow us to make inferences about both long- and short-range LD. Each of the four regions extends for approximately 4 megabases (Mb), with a mean intermarker distance of 47 Kb. The regions cover four large-effect QTLs detected in the HS that are easy and inexpensive to phenotype (large-effect QTLs can be detected with relatively few animals). The region on chromosome 17 includes the MHC, which is highly polymorphic in wild populations and therefore a sensitive indicator of any loss of heterozygosity. However, it should be noted that the LD structure of the MHC is atypical of the genome. While these four loci constitute less than 1% of the genome, it is unlikely that they are unrepresentative; if QTLs cannot be mapped at high resolution here, it is unlikely that colonies will be suitable for genome-wide mapping. Our aim was to compare and rank colonies, which could be done with genotypes from the four loci.

We included three control populations with known genetic characteristics: 8 HS mice, 109 collaborative cross (CC) mice (a set of XX recombinant inbred lines being created from eight inbred strains and at generation XX of inbreeding when analysed) and 94 inbred lines. We also included a population of wild mice caught from multiple sites in Arizona, which consists of unrelated individuals and is more likely to represent a fully outbred population, similar to that used in a human GWAS.
Table 3 gives three measures of inbreeding: heterozygosity (inbred colonies will score low on this measure); the percentage of markers that failed a test of Hardy-Weinberg equilibrium (HWE) (colonies that consist of inbred but unrelated individuals will have high scores); and a coefficient of inbreeding that compares the observed with the expected number of homozygous genotypes {Purcell, 2007 #8008}. A measure of relatedness is given in Figure 1: the pairwise extent of similarity between individuals, using the identity by state (IBS) of markers (IBS distance: (IBS2 + 0.5 × IBS1) / (number of SNP pairs)) {Purcell, 2007 #8008}.

The measures detect different features of the genetic structure of the colonies. While low heterozygosity, high HWE failure and a high inbreeding coefficient correctly identify the inbred strains, the collaborative cross, which is still not completely inbred, scores relatively well on heterozygosity (19%) but is identified as inbred by its high inbreeding coefficient (Table 3). The IBS distance correctly identifies the CC, inbred strains, HS, and the wild-Lausanne mice as containing more highly related individuals than expected by chance.

There are some surprising findings for the commercial outbreds. Four colonies are almost inbred: NTac:NIHBS-US, ClrHli:CD1-IL, Hsd:NIHSBC-IL and BK:W-UK. With heterozygosities < 5%, almost none of the markers we genotyped were polymorphic in these colonies. A further five colonies have heterozygosities less than 10% and so are unlikely to be useful for mapping (nor indeed for most of the purposes for which outbred stocks are intended). Three colonies have inbreeding coefficients greater than 20% (HsdHu:SABRA-IL, Sca:NMRI-SE_10an, HsdOla:MF1-IL) and a further seven have values greater than 10%. Heterozygosity across all populations (including wild mice, HS, CC and inbred strains) is just over 25%, with about 80% of the total genetic variation attributable to variation within colonies.
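The three within-colony measures described above (heterozygosity, the inbreeding coefficient and the IBS score) can be sketched in a few lines. This is a minimal illustration, not the PLINK implementation cited in the text: it assumes genotypes coded as 0, 1 or 2 copies of one allele with no missing calls, and the function names are hypothetical. The inbreeding coefficient uses the simple method-of-moments form F = (O - E) / (N - E), where O and E are the observed and expected numbers of homozygous genotypes across N markers.

```python
def heterozygosity(colony):
    """Fraction of genotype calls that are heterozygous (coded 1).

    colony: list of individuals, each a list of genotypes coded 0/1/2.
    """
    calls = [g for individual in colony for g in individual]
    return sum(1 for g in calls if g == 1) / len(calls)


def inbreeding_coefficient(individual, allele_freqs):
    """Method-of-moments F = (O - E) / (N - E) for one animal.

    O: observed homozygous genotypes; E: number expected under
    Hardy-Weinberg proportions given the colony allele frequencies.
    Returns None when every marker is monomorphic (F is undefined).
    """
    n = len(individual)
    observed_hom = sum(1 for g in individual if g != 1)
    expected_hom = sum(1.0 - 2.0 * p * (1.0 - p) for p in allele_freqs)
    if n == expected_hom:
        return None
    return (observed_hom - expected_hom) / (n - expected_hom)


def ibs_score(a, b):
    """(IBS2 + 0.5 * IBS1) / number of SNP pairs, as defined in the text."""
    shared = [2 - abs(x - y) for x, y in zip(a, b)]  # alleles shared per SNP
    ibs2 = shared.count(2)
    ibs1 = shared.count(1)
    return (ibs2 + 0.5 * ibs1) / len(shared)
```

Averaging `inbreeding_coefficient` over the animals in a colony gives a figure comparable in spirit to the mean inbreeding coefficient of Table 3, although the tabulated values may include small-sample corrections that this sketch omits.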
Restricting attention to commercial stocks alone gives a weighted mean Fst of 0.108. This contrasts with human populations, where estimates of Fst are typically less than 5% (Reich, etc).

Genetic relatedness

We carried out a principal component analysis (PCA) using genotypes from all populations to investigate the genetic relationship between colonies. The first two principal components explain 52% of the variation, but neither component is easy to interpret. Figure 3 plots the two components, superimposing the stock name (figure 3a), the producer of each colony (figure 3b) and the country of origin (figure 3c). Stocks that we obtained from a single producer (for example SABRA) form relatively discrete groups, but it was not possible to differentiate stocks that originate from different producers and countries (such as CD1 and NMRI). A similar picture was found using multi-dimensional scaling of an IBS pairwise distance matrix (PLINK) (Figure 4a, 4b, 4c).

This result suggested that many of the commercial stocks are derived from a common set of founders. We attempted to identify this set by considering each genome as originating from K ancestral populations, to determine genetic ancestry regardless of population identity (). We looked at values of K from 3 to 12. Figure X shows results for K = 9, and plots of results are given in Supp Fig X. Across the top of each plot we show the names of the outbred stocks, and on the bottom the name of each colony. The proportions of shared ancestry vary considerably between stocks. MF1 and TO stocks consist largely of a single and unique component. CFW divides into two: the Crl-derived animals have the same ancestry, different from the Hsd CFW. The latter is indistinguishable from one stock of NMRI (Hsd:NMRI-DE). The large groups of CD1 and NMRI mice are heterogeneous, with large variation between suppliers.

We evaluated the relationships between populations using Wright’s fixation indices (Fst), calculated using population allele frequencies (Beaumont).
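As a rough illustration of how a fixation index summarises differences in allele frequency between colonies, the sketch below computes Nei's Gst = (Ht - Hs) / Ht from per-colony allele frequencies. This is a simplified stand-in, not the estimator used for the Fst values reported here: it weights colonies equally and ignores sample-size corrections, and the function name and input layout are assumptions.

```python
def gst(freqs_by_colony):
    """Nei's Gst from allele frequencies.

    freqs_by_colony: one list per colony, each holding the frequency of the
    same reference allele at every SNP (all lists the same length).
    Returns (Ht - Hs) / Ht, with both heterozygosities summed over SNPs,
    or None when no SNP is polymorphic in the pooled population.
    """
    n_colonies = len(freqs_by_colony)
    n_snps = len(freqs_by_colony[0])
    hs_total = 0.0  # expected heterozygosity within colonies
    ht_total = 0.0  # expected heterozygosity of the pooled population
    for s in range(n_snps):
        ps = [colony[s] for colony in freqs_by_colony]
        hs_total += sum(2.0 * p * (1.0 - p) for p in ps) / n_colonies
        p_bar = sum(ps) / n_colonies
        ht_total += 2.0 * p_bar * (1.0 - p_bar)
    if ht_total == 0.0:
        return None
    return (ht_total - hs_total) / ht_total
```

With the heterozygosities quoted in the Discussion (Ht 0.252, Hs 0.200) the same ratio gives about 0.206, close to the quoted Gst of 0.207; the small difference presumably reflects rounding of the inputs.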
Figure XX(a) shows agglomerative clustering of the Fst distances (without an outgroup it is not possible to root the tree). MF1 stocks cluster together, as do NIHS, but there is less consistency for CD1 and NMRI: while stocks from the same supplier aggregate (e.g. CD1 from Crl on the left of the figure), there is no clear partitioning of the two stocks. As in the ancestry analysis, the Hsd CFW stock clusters with NMRI from the same supplier (Hsd:NMRI-DE). However, we find that the CFW from Crl clusters with other CD1 stocks.

We assessed population structure within each colony using multidimensional scaling of the IBS pairwise distance matrices. Supp Figure X shows results for all populations; representative examples are shown in Figure X. We found two or more clusters in eighteen populations. Populations maintained by the IGS system showed population structure, as expected, but we also found evidence of structure in XX populations that were maintained by a random breeding method.

Mapping resolution

We assessed mapping resolution using three measures: (i) haplotypic diversity across each region; (ii) genetic diversity, measured by the SNPs’ average minor allele frequency (MAF); and (iii) mean LD decay radius (defined here as the mean physical separation in bp between SNPs at which the squared correlation coefficient R2 drops below 0.5). We estimated the variance of mean MAF and LD decay radius by resampling 80% of the data 500 times and recalculating both measures.

We phased haplotypes across the four regions using fastPHASE, following the procedure described in Conrad 2006. We expect the proportion of shared haplotypes between colonies to reflect the genetic relationships. Comparison with the clustering based on Fst shows good agreement (figure X), validating the haplotype reconstruction.
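One possible reading of the LD decay radius and its resampling variance is sketched below. Several choices here are assumptions rather than the procedure used in the analysis: r2 is computed from genotype dosages (0/1/2) instead of phased haplotypes, SNP pairs are grouped into distance bins of a hypothetical width `bin_width`, the radius is taken as the midpoint of the first bin whose mean r2 falls below 0.5, and the resampling draws 80% of the animals without replacement.

```python
import random
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation of two dosage vectors; None if undefined."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic SNP in this sample
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def ld_decay_radius(positions, genos, bin_width, threshold=0.5):
    """Midpoint (bp) of the first distance bin with mean r2 < threshold.

    positions: bp coordinate of each SNP; genos[s][i]: dosage of animal i
    at SNP s. Returns None if no bin falls below the threshold.
    """
    bins = {}
    for s, t in combinations(range(len(positions)), 2):
        r2 = r_squared(genos[s], genos[t])
        if r2 is None:
            continue
        b = abs(positions[s] - positions[t]) // bin_width
        bins.setdefault(b, []).append(r2)
    for b in sorted(bins):
        if sum(bins[b]) / len(bins[b]) < threshold:
            return (b + 0.5) * bin_width
    return None


def resampled_radii(positions, genos, bin_width, n_rep=500, frac=0.8, seed=0):
    """Recompute the radius on n_rep random subsets of the animals."""
    rng = random.Random(seed)
    n_animals = len(genos[0])
    k = max(2, int(frac * n_animals))
    radii = []
    for _ in range(n_rep):
        keep = rng.sample(range(n_animals), k)
        subset = [[snp[i] for i in keep] for snp in genos]
        radii.append(ld_decay_radius(positions, subset, bin_width))
    return radii
```

The spread of the values returned by `resampled_radii` (ignoring any `None` entries) then serves as an estimate of the sampling variance of the radius for one colony.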
The total number of haplotypes ...

Figure xx shows the results for all populations analysed, sorted by the LD decay radius (there were insufficient genotypes to calculate an LD decay radius for NTac:NIHBS-US and ClrHli:CD1-IL). The mean is shown as a black bar and the 95% confidence intervals as a grey box, with outliers displayed at both extremities. Many colonies have a mean LD decay radius comparable to that found in the wild Arizona mice (0.8 Mb), which we can use as a benchmark for a stock appropriate for gene-level mapping resolution. By contrast the HS has a value of 2.9 Mb. There are 29 populations with a mean MAF less than 0.05 or a mean LD decay radius greater than 2 Mb. After combining these with exclusions on the basis of poor genetic structure, 35 populations remain that have properties conducive to high-resolution mapping; ten of these have LD decay radii of less than 1 Mb.

Populations vary considerably in the extent of LD at the different loci. Figure xx shows LD plots (from Haploview) on chromosome 17 for six populations. Variation is such that although HsdWin:NMRI-NL has a mean LD decay radius of just over 1 Mb, it will be of little use for mapping in the MHC region.

We compared our findings from 351 markers with those obtained from whole-genome analyses. We used whole-genome mouse SNP arrays to interrogate six colonies, chosen to cover a range of LD decay measures: Crl:CFW(SW)-US_P08, HsdWin:CFW1-NL, HsdWin:NMRI-NL, Hsd:ICR(CD-1)-FR, RjHan:NMRI-FR and Crl:NMRI(Han)-FR. Figure 4 shows good agreement between the decay of LD with distance averaged across the genome (in 100 Kb windows) and the LD decay detected by the 351 SNPs. Comparable measures were found for genetic structure (table 3).

Temporal variation

The genetic characteristics of colonies will vary over time because unintended directional selection and genetic drift alter genetic diversity.
We assessed XX colonies on XX occasions

Sequence analysis and novel variants

We used two methods to determine the extent and nature of sequence variation. First, we used PCR to amplify 22 fragments of about 1.2 Kb (see Supp Table xxx for primer information). We randomly selected eight regions from a 5 Mb QTL region we previously mapped on mouse chromosome 1 (REF), four regions from each of three loci involved in HDL, CD4 and MCV traits (REF) and two regions from the AKP2 locus. We sequenced 12 animals from each of the three pilot populations (HsdWin:CFW-1 NL HNL1, Crl:CFW US K71 and HsdWin:NMRI NL HNL1), 12 wild mice (DNA provided to us by Alexandre Reymond, University of Lausanne) and 10 classical inbred strains (A/J, AKR/J, BALB/cJ, C3H/HeJ, C57BL/6J, CBA/J, DBA/2J, LP/J, I/LnJ and RIII/DmMobJ). We discovered 120 SNPs (see Supp Table xx for detailed information). Wild mice have an average of one SNP every 200 bp, but this rate varies between populations: HsdWin:CFW-1 and Crl:CFW have a frequency of one SNP every 350 bp, whereas HsdWin:NMRI has on average one SNP every 520 bp. Nine of the SNPs are coding variants (table ). We found three novel variants (a rate of 2.5%) in Crl:CFW (positioned at chr1:173306046, chr1:173368101 and chr17:34785468) and only one (a rate of 0.8%) in each of HsdWin:CFW-1 and HsdWin:NMRI (chr17:34785468).

Our locus-specific sequencing data suggest that HsdWin:CFW-1 is closely related to the wild-derived inbred strain PWK, whereas Crl:CFW and HsdWin:NMRI are related to Swiss-derived inbred strains (e.g. NOD and FVB).

Genome sequencing

Genome wide analysis of sequence variants

CNVs?

QTL mapping

A critical determinant of the usefulness of a stock is whether it can be used to replicate and fine-map QTLs detected in other populations {Yalcin, 2004 #32}. We analysed 200 animals from three colonies: Crl:CFW (USA), HsdWin:CFW (Netherlands) and HsdWin:NMRI (Netherlands).
Blood samples were taken from a tail vein and we performed assays for serum alkaline phosphatase (ALP), the ratio of CD4+ to CD8+ T-cells, the concentration of high-density lipoproteins (HDL) in serum and mean red cell volume. We found significant association results for three phenotypes (ALP, CD4/CD8 ratio and HDL). Applying a conservative Bonferroni correction for testing 351 markers for four phenotypes in three populations gives a threshold of 4.93 on the -log10(P) scale, which, as figure XX shows, is exceeded over a 1 Mb interval on chromosome 4 for ALP, a 0.5 Mb region on chromosome 1 for HDL and a 2 Mb region on chromosome 17 for CD4/CD8 ratio. The QTLs are detected in different populations: ALP in Crl:CFW (with less significant evidence for association in HsdWin:NMRI); HDL in HsdWin:CFW; and CD4/CD8 in Crl:CFW and HsdWin:CFW.

The extent of the association signal seen in Fig X could be due to linkage disequilibrium between markers, to the presence of multiple independent effects within the same region or to undetected population structure. To distinguish between these alternatives we used a resample model averaging (RMA) procedure developed in our analysis of the HS (). The data were re-sampled (without replacement) 2,000 times, and in each resample forward selection was used to determine which markers to keep in a model explaining phenotypic variation. We determined the performance and resolution of the method by simulating a QTL at each polymorphic marker in the three regions and in all populations. As expected, confidence intervals depended on the location of the QTL within a region of high LD, and varied from less than 100 Kb to more than 2 Mb (Fig). Results of RMA mapping of the three phenotypes are shown in Figure X, with the strength of pairwise LD indicated by a grayscale above the plots (where black circles are R2 of 1). We found no evidence of multiple effects at these loci (as indicated by the logP of second and subsequent rounds of forward selection falling below significance thresholds).
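The Bonferroni threshold of 4.93 quoted above for the single-marker association tests can be reproduced with a short calculation. The sketch assumes a family-wise significance level of 0.05, which is not stated in the text but gives the reported value when divided across 351 × 4 × 3 = 4,212 tests and expressed as -log10(P); the function name is illustrative.

```python
import math


def bonferroni_logp_threshold(n_markers, n_phenotypes, n_populations, alpha=0.05):
    """-log10 of the per-test P-value after Bonferroni correction."""
    n_tests = n_markers * n_phenotypes * n_populations
    return -math.log10(alpha / n_tests)


threshold = bonferroni_logp_threshold(351, 4, 3)
print(round(threshold, 2))  # 4.93
```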
The ALP locus remains diffusely spread over a 1 Mb region in both the Crl:CFW and HsdWin:NMRI populations. However, much higher resolution is seen for mapping CD4/CD8 ratio and HDL, where the 95% confidence intervals (from simulation) are less than 200 Kb in the vicinity of the QTL (?high resolution figure??)

Characterization of the molecular basis of CD4/CD8: h2ealpha is within the location we have identified, chr17:34,421,575-34,579,223

Characterization of the molecular basis of HDL. The locus is chr1:173.6-173.7 Mb; this excludes apoa2 (chr1:173,155,220-173,156,501). It includes Cd48, SlamF1, CD84 and SlamF6. NOTE A DUPLICATION IN LOOKSEQ at 173,759,999-173,775,001. Deletion at 173735500-173745500.

Discussion

We have characterized XX commercially available mouse colonies, from YY breeders in ZZ locations across the world. We document considerable variation in genetic diversity between colonies, estimate inbreeding, population structure and linkage disequilibrium for each colony, catalogue sequence variation and show that some colonies have properties appropriate for high-resolution mapping on a genome-wide scale.

About 80% of the variation is within colonies.

On the basis of low heterozygosity, evidence for unexpectedly high genetic relatedness and evidence of population structure, 38 colonies overall can be excluded.

Gst: Ht 0.252, Hs 0.200, Gst 0.207, r 0.343. Hs and Ht are the mean heterozygosity within populations and in the entire population, respectively.

Variation between colonies: no single colony is ideal. A larger survey of the genomes of all colonies is needed.

Colonies fluctuate, partly through the breeders’ fault, as they move stock around or introduce large chunks of the genome.

Companies do not

Greater awareness of genetic variation

METHODS

Sequencing

PCR

LD

Genetic mapping

Where necessary, phenotypes are transformed into Gaussian deviates.
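The text does not say how phenotypes were converted to Gaussian deviates. One common choice is a rank-based inverse normal transformation, sketched here as an assumption: each value is replaced by the standard normal quantile of (rank - 0.5) / n, with tied values sharing their average rank. The function name is hypothetical.

```python
from statistics import NormalDist


def to_gaussian_deviates(values):
    """Rank-based inverse normal transform (ties get the average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a run of tied values
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]
```

The transform preserves the ordering of the animals while removing skew and outliers, which is the property that matters for the analysis of variance and permutation steps in this section.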
Covariates (such as gender, age, experimenter, time) that explain a significant fraction of each phenotype’s variance (ANOVA P-value < 0.01) are included in subsequent statistical analyses. We use two mapping methods: a single-point analysis of variance of each marker and a multi-point method. Haplotypes are reconstructed as mosaics of known inbred strains using a dynamic programming algorithm that minimises the number of breakpoints required {Yalcin, 2004 #32}. These strains are used as progenitors for the multipoint analysis (probabilistic ancestral haplotype reconstruction in the HAPPY package) {Mott, 2000 #96}. Region-wide significance levels are estimated by permuting the transformed phenotype values 1,000 times.

TABLES

Table 1: Mouse providers, location, breeding protocols, health status

Table 2: QTLs and SNPs used to assess colonies

Phenotype      Chr  Start (Mb)  End (Mb)  No. of markers
Red cells MCV  1    131.6       134.5     42
CD4/CD8        17   32.6        38.9      112
ALP            4    136.2       139       72
HDL            1    172.6       177.2     125

Table 3: Genetic characteristics of outbred mouse colonies

Population               No.  % genotyped  % homozygote  Het.  Pct MAF < 5%  Pct fail HWE  Mean inbreeding coef
Aai:ICR-US               24   98.83        75.92         0.08  6.80          2.27          2.76
BK:W_UK                  48   92.17        87.25         0.04  3.12          2.27          8.78
BomTac:NMRI-DK-151       23   91.98        65.16         0.16  2.83          1.70          -5.68
BomTac:NMRI-DK-160       24   93.11        65.72         0.15  3.97          1.98          4.57
CC                       109  89.17        5.38          0.19  2.83          89.24         67.28
ClrHli:CD1_IL            20   94.65        93.20         0.01  2.83          0.57          -16.50
Crl:CD1(ICR)-DE          48   94.07        40.51         0.19  18.98         7.08          10.26
Crl:CD1(ICR)-FR          48   94.26        32.01         0.28  15.01         4.53          6.00
Crl:CD1(ICR)-IT          48   95.15        33.71         0.31  13.31         5.38          4.70
Crl:CD1.ICR_UK           48   93.20        30.88         0.27  13.88         3.97          4.40
Crl:CD1(ICR)-US_C61      24   96.81        31.44         0.30  12.75         2.27          0.68
Crl:CD1(ICR)-US_H43      24   96.07        36.54         0.29  9.92          3.97          6.00
Crl:CD1(ICR)-US_H48      24   95.88        37.68         0.30  1.70          2.55          -4.18
Crl:CD1(ICR)-US_K64      48   93.91        29.46         0.30  14.16         5.38          -1.41
Crl:CD1(ICR)-US_K95      24   97.14        44.19         0.28  3.12          2.27          -10.45
Crl:CD1(ICR)-US_P10      24   96.41        42.21         0.22  15.58         1.98          1.56
Crl:CD1(ICR)-US_R16      24   96.86        38.24         0.35  3.40          2.83          -12.10
Crl:CD1.ICR-US_iso       30   97.37        37.96         0.24  11.90         4.25          13.73
Crl:CF1-US               48   94.92        25.50         0.35  4.82          6.80          10.04
Crl:CFW(SW)-US_K71       48   94.25        41.36         0.26  4.25          4.53          6.28
Crl:CFW(SW)-US_P08       48   97.27        29.18         0.22  24.36         0.00          4.65
Crl:MF1_UK               47   93.04        64.87         0.13  1.13          1.13          -2.06
Crl:NMRI(Han)-DE         48   94.74        39.94         0.27  11.61         4.82          1.93
Crl:NMRI(Han)-FR         48   85.44        37.39         0.26  5.67          6.23          12.01
Crl:NMRI(Han)-HU         48   90.37        39.66         0.26  8.22          6.52          0.43
Crl:OF1-FR_B22           24   91.89        26.63         0.35  6.80          6.80          -5.27
Crl:OF1-FR_B41           24   93.77        27.76         0.35  9.07          6.80          -7.98
Crl:OF1-HU               48   92.54        28.05         0.35  5.10          6.80          -1.35
Crlj:CD1(ICR)-JP         48   94.79        41.93         0.21  8.22          7.08          4.61
HS                       48   98.44        21.81         0.43  0.57          2.83          -3.88
HanRcc:NMRI-CH           48   94.17        66.29         0.20  1.98          1.98          -11.67
Hla:(ICR)CVF-US          48   83.42        49.29         0.21  12.46         4.82          -3.13
Hsd:ICR(CD-1)-DE         48   89.89        47.03         0.29  4.25          5.10          2.13
Hsd:ICR(CD-1)-ES         48   88.56        46.46         0.26  7.37          5.38          3.49
Hsd:ICR(CD-1)-FR         48   93.52        45.04         0.28  5.10          5.38          5.60
Hsd:ICR(CD-1)-IL         48   86.08        43.91         0.29  6.23          3.68          -6.55
Hsd:ICR(CD-1)-IT         48   88.94        47.03         0.28  2.83          4.82          7.52
Hsd:ICR(CD-1)-MX         48   91.28        47.88         0.30  5.10          13.60         -11.34
Hsd:ICR(CD-1)-UK         48   92.96        46.18         0.28  5.95          3.97          -0.34
Hsd:ICR(CD-1)-US         48   95.99        48.16         0.28  6.80          5.38          4.36
Hsd:ND4-US               48   93.68        69.97         0.07  17.00         2.27          4.89
Hsd:NIHSBC_IL            12   91.64        90.93         0.02  1.42          0.57          3.11
Hsd:NIHS_UK_C            15   93.75        68.56         0.11  10.48         1.70          6.36
Hsd:NIHS_UK_G            33   92.63        75.07         0.11  3.40          3.12          -5.09
Hsd:NIHS-US              48   92.11        54.67         0.19  6.52          9.92          -18.01
Hsd:NSA(CF1)-US          48   93.30        30.88         0.34  12.18         11.61         1.90
HsdHu:SABRA_IL           48   91.97        45.04         0.22  5.67          22.38         25.44
HsdIco:OF1-IT            48   90.48        30.31         0.34  2.27          13.60         5.22
HsdOla:MF1_IL            8    90.51        50.42         0.21  0.00          1.70          21.38
HsdOla:MF1-UK_C          48   72.71        26.06         0.21  10.20         4.25          5.31
HsdOla:MF1_UK_G          48   93.90        41.08         0.28  7.37          3.40          -0.65
HsdOla:MF1-US_202A_iso   24   93.87        75.35         0.13  1.70          0.85          -6.90
HsdOla:MF1-US_202A_prod  24   94.76        75.35         0.13  1.13          0.85          -9.21
HsdOla:TO_UK             48   93.63        71.10         0.10  4.25          3.68          9.47
HsdWin:CFW1-DE           48   87.64        49.01         0.24  9.92          7.93          -0.88
HsdWin:CFW1-NL           48   82.99        51.84         0.21  7.93          4.82          3.62
HsdWin:NMRI-DE           48   90.78        58.07         0.20  6.80          2.27          -8.87
HsdWin:NMRI-NL           64   93.96        57.79         0.19  5.95          3.12          2.11
HsdWin:NMRI_UK           32   93.92        62.89         0.12  15.58         1.70          -4.89
IcrTac:ICR-US            36   89.28        69.69         0.06  13.31         2.55          5.40
Inbreds_94_strains       94   91.25        0.00          0.00  2.83          98.58         100.00
NTac:NIHBS-US            36   91.71        93.77         0.01  1.98          0.57          -53.44
RjHan:NMRI-FR            48   92.58        31.16         0.28  14.45         13.60         17.80
RjOrl:Swiss-FR           48   91.68        64.87         0.17  1.70          3.40          -9.22
Sca:NMRI_SE-10an         24   75.51        70.82         0.09  5.38          5.38          22.31
Sca:NMRI_SE_22           24   80.63        75.07         0.09  3.97          3.12          15.16
Sim:(SW)fBR-US_A1        48   94.56        74.50         0.10  5.67          3.68          12.43
Sim:(SW)fBR-US_B1        24   95.82        79.60         0.11  1.42          1.13          -7.87
Tac:SW-US                36   92.67        46.18         0.33  1.98          3.97          -2.00
Wild_Arizona             96   85.77        17.85         0.26  13.31         38.81         27.86

Table: Whole genome analyses

Population          No.  Markers  Genos.  Hom.   Het.  MAF    HWE   Inbreed coef
Crl:CFW(SW)-US_P08  22   169,333  97.30   71.06  0.19  8.00   6.36  -20.86
HsdWin:CFW1-NL      22   152,716  97.17   74.55  0.18  4.98   7.15  -20.70
HsdWin:NMRI-NL      26   164,287  97.41   73.02  0.13  4.51   7.23  -18.33
Hsd:ICR(CD-1)-FR    20   623,124  87.24   45.19  0.22  10.50  1.53  -11.82
RjHan:NMRI-FR       13   171,198  96.49   63.33  0.18  4.69   7.59  -10.62
Crl:NMRI(Han)-FR    20   623,124  87.04   38.09  0.24  11.09  4.55  3.14

Figure 1: Linkage disequilibrium decay radius and minor allele frequencies in outbred mice. The figure shows the distribution of 250 analyses of resampled data.

Figure 2

Figure 3

Figure 4: Linkage disequilibrium in six colonies at the MHC locus on mouse chromosome 17

Figure 4: Decay of linkage disequilibrium with distance; whole genome compared to locus-specific analyses

Figure 5: QTL mapping of three phenotypes in three colonies

Figure 6: Simulation of resample model averaging. Performance of the RMA method depends on the position of the QTL and the population analysed. Here the resolution of the RMA (indicated by the distribution of the black dots) varies according to the position of a simulated QTL (indicated by dotted red lines).

Figure 7: Resample model averaging and linkage disequilibrium

SUPPLEMENTAL MDS FIGURE OF ALL POPULATIONS