Genetic characterization of commercially available outbred mice and an assessment of their utility for QTL mapping

Abstract

Diversity between stocks: mapping possible in one and not another; gene-level resolution possible at some loci, not all; stock choice important. Limited genetic diversity and descent from a common set of alleles, present in laboratory inbred strains. Mapping resolution demonstrated down to under 100 Kb. 35 stocks provide sub-1 Mb mapping; haplotype maps provided for 6.

Introduction

What characterizes an ideal population for gene mapping studies? Mouse geneticists have reason to envy the success of human genome-wide association studies (GWAS), but not necessarily to adopt their practice, for example by using wild mice {Laurie, 2007 #7307}. Doing so entails the same drawbacks that afflict human GWAS: tens of thousands of subjects are needed for robust detection of common causal variants, and the majority of the genetic variance remains unexplained even with these large sample sizes. What are the alternatives? One solution, available to mouse geneticists, is to design an ideal population by breeding. The design principles for the ideal population can be expressed in terms of linkage disequilibrium (LD) decay, the fall-off in correlation between genotypes with increasing distance between markers. High rates of decay are found in populations with large effective population sizes (minimizing the effects of homozygosity due to genetic drift) and many generations of random mating (introducing large numbers of recombinants that break up correlations between genotypes). Unfortunately, a necessary corollary is the presence of rare alleles, as allele frequencies drift to extremes and new, rare alleles arise as a consequence of mutations.
The more rare alleles in a population, and the more they contribute to phenotypic variation, the more difficult it will be to detect the responsible quantitative trait loci (QTLs) using genome-wide association strategies that genotype only common alleles {Dickson, 2010 #8190}. The best strategy to create a stock where there are few if any rare variants, while maintaining high genetic diversity and low LD, might seem to be to choose animals from highly divergent populations, such as wild mice caught in many locations {Bonhomme, 2007 #8193}, or from animals currently being used to create the collaborative cross, a set of 1,000 recombinant inbred lines derived from highly genetically divergent progenitor strains {Churchill, 2004 #6639}. However, consideration of the properties required for mapping reveals that this strategy is not ideal. Mice from different populations will have a proportion of variants in common and a proportion of variants that are unique to the animals (being present in one population only). LD decay for the latter, private, variants will depend solely on recombinants accumulated during the creation of the stock, while LD decay for the former, common, variants will depend on the ancestry of the two founding populations. It follows that high mapping resolution is best obtained by using animals from the same mating population to reduce the number of private alleles.

Surprisingly, the ideal population may already be available. Commercial mouse breeders, such as Harlan and Charles River Laboratories, maintain large colonies of outbred mice that may have the necessary genetic structure. Some outbred stocks are known to derive from animals from a single population, such as the ‘Swiss’ stocks, which descend from two male and seven female mice imported from Lausanne, Switzerland {Lynch, 1969 #8188}. Furthermore, LD in some outbred stocks has been shown to allow high-resolution mapping {Ghazalpour, 2008 #8189}, sufficient to identify genes {Yalcin, 2004 #6820}.
However, other findings argue against the use of commercial outbreds for genetic mapping: investigations of eight colonies of outbred Swiss mice, using assays of protein variation, indicated that the colonies had the same amount of variation found in fully outbred mouse or human populations {Rice, 1980 #263; Cui, 1993 #1591}; examination of outbred CD-1 mice found high levels of population substructure {Aldinger, 2009 #8005}; and genetic drift has been documented in a colony of CFLP mice {Papaioannou, 1980 #8002}.

The groundwork responsible for the successful application of human GWAS required both the development of sufficient markers and the genetic characterization of different populations. Similar work is needed in mouse genetics. Dense marker sets and tools for their genotyping are now available, but we lack systematic characterization of the genetic architecture of suitable populations. In this paper we set out to estimate: (i) the degree of genetic relatedness within and between commercially available outbred populations, and thereby determine whether inbreeding and population structure preclude the use of the population; (ii) linkage disequilibrium (LD) in each stock (low LD will favour high-resolution mapping); (iii) the proportions of common and rare variants. In order to assess the latter we tested the hypothesis that stocks are descended from a common source: the laboratory inbred strains. Populations in which this assumption holds true, and which have low levels of LD, are most suitable for high-resolution mapping.

Results

Stocks, colonies and genetic markers

Table 1 lists the populations that we obtained for this study and the numbers of animals we used.
We included three control populations, with known genetic characteristics: 12 heterogeneous stock mice {Valdar, 2006 #7175}, 109 collaborative cross mice {Churchill, 2004 #6639}, 94 inbred strains {Shifman, 2006 #7159} and a population of wild mice caught from multiple sites in Arizona that is likely to represent a fully outbred population, similar to that used in a human GWAS {Laurie, 2007 #7307}. We use the term “colony” to mean a population of mice maintained as a mating population at a single location, and “stock” to mean a collection of colonies that are given the same stock designation by the breeders. For example, Crl:CFW-SW and Crl:CFW-UK are two colonies from the same stock. One might expect colonies from the same stock to have descended from the same founding population and to differ only to a relatively minor extent through genetic drift, but breeding practices may invalidate this assumption (for example when two colonies are mixed). Therefore, where possible, we treat colonies as separate populations. We follow the international standardized nomenclature for outbred stocks {Festing, 1993 #8218}, but add two further pieces of information: a two-letter code for the country of origin and a code for the colony name used by the commercial provider (e.g. Crl:CFW-US-P08).

There is considerable variation in the size of colonies and the way animals are maintained (Table 1). Since unintended directional selection (for example culling small mice) and genetic drift alter genetic diversity, some breeders maintain heterozygosity by periodically crossing the stock to animals taken from a much smaller population (the protocol is called IGS, which stands for….). In consequence, a small number of chromosomes are distributed widely throughout the population, introducing large regions of linkage disequilibrium that significantly reduce mapping resolution.
With the exception of YY colonies, which we examined to confirm this prediction, we did not genetically characterize colonies using the IGS breeding scheme.

We analysed all colonies with 351 markers at two loci on chromosome 1 (131.6-134.5 Mb and 172.6-177.2 Mb), one locus on chromosome 4 (136.2-139 Mb) and one locus on chromosome 17 (32.6-38.9 Mb). We also carried out genome-wide analyses in a subset of animals and stocks. SNPs at the four loci were spaced so as to allow us to make inferences about both long- and short-range LD. Each of the four regions extends for approximately 4 megabases (Mb), with a mean intermarker distance of 47 Kb. They were chosen because they include large-effect QTLs detected in the HS that are easy and inexpensive to phenotype (large-effect QTLs can be detected with relatively few animals): serum alkaline phosphatase (ALP), the ratio of CD4+ to CD8+ T-cells, concentration of high-density lipoproteins (HDL) in serum and mean red cell volume. The region on chromosome 17 includes the MHC, which is highly polymorphic in wild populations and is therefore a sensitive indicator of any loss of heterozygosity. While these four loci constitute less than 1% of the genome, if QTLs cannot be mapped at high resolution here, it is unlikely that colonies will be suitable for genome-wide mapping.

Inbreeding, genetic relatedness and genetic drift

We started by comparing measures of inbreeding. High rates of inbreeding make colonies less suitable for mapping because they contain fewer (if any) segregating QTLs. Colonies that consist of a mixture of relatives (such as siblings, half siblings, cousins, second-degree and third-degree relatives) will be difficult to use for mapping because the differing degrees of genetic relatedness introduce population structure.
Table 2 gives four measures of inbreeding: mean minor allele frequency (MAF); heterozygosity (inbred colonies will score low on this measure); the percentage of markers that failed a test of Hardy-Weinberg equilibrium (HWE) (colonies that consist of inbred but unrelated individuals will have high scores); and a coefficient of inbreeding that compares the observed versus expected number of homozygous genotypes {Purcell, 2007 #8008}.

The measures detect different features of the genetic structure of the colonies. While low heterozygosity, high HWE failure and high inbreeding coefficient correctly identify the inbred strains, the collaborative cross, which at the time of genotyping (2008) was not completely inbred, scores relatively well on heterozygosity (19%) but is identified as inbred by its high inbreeding coefficient (Table 2).

There are some surprising findings on the degree of genetic heterogeneity in commercial outbreds. Four colonies are almost inbred: NTac:NIHBS-US, ClrHli:CD1-IL, Hsd:NIHSBC-IL and BK:W-UK. With heterozygosities < 5%, almost all the markers we genotyped were not polymorphic. A further five colonies have heterozygosities less than 10% and so are unlikely to be useful for mapping (nor indeed for most of the stocks’ intended purposes). Three colonies have inbreeding coefficients greater than 20% (HsdHu:SABRA-IL, Sca:NMRI-SE_10an, HsdOla:MF1-IL) and a further seven have values greater than 10%.

We attempted to distinguish between colonies using methods applied in human genetic analysis, but while principal components and Fst analysis revealed population differentiation (Supp Figs), we could find no feature (not stock, colony, producer or country of origin) that satisfactorily accounted for the distribution. These difficulties led us to determine genetic ancestry regardless of stock identity.
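The inbreeding coefficient described above compares observed with expected homozygous genotype counts. A minimal sketch of that comparison follows. It assumes genotypes coded as minor-allele counts (0, 1, 2), expected homozygosity taken from Hardy-Weinberg proportions at each marker, and no small-sample correction; the function and variable names are illustrative and are not taken from the software used in the study. Table 2 appears to report the coefficient as a percentage, whereas the sketch returns a proportion.

```python
def inbreeding_coefficient(genotypes, minor_allele_freqs):
    """Estimate F = (O - E) / (N - E) for one animal.

    genotypes: minor-allele counts (0, 1 or 2) per marker, None if missing.
    minor_allele_freqs: colony minor allele frequency per marker.
    O is the observed number of homozygous genotypes, E the number expected
    under Hardy-Weinberg proportions, N the number of markers used.
    """
    observed_hom = 0
    expected_hom = 0.0
    n_used = 0
    for g, p in zip(genotypes, minor_allele_freqs):
        # Skip missing genotypes and markers that are not polymorphic.
        if g is None or not (0.0 < p < 1.0):
            continue
        n_used += 1
        if g in (0, 2):
            observed_hom += 1
        # Expected homozygosity at this marker: 1 - 2p(1 - p).
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)
    if n_used == 0:
        return None
    return (observed_hom - expected_hom) / (n_used - expected_hom)


freqs = [0.5, 0.5, 0.5, 0.5]
fully_homozygous = inbreeding_coefficient([0, 2, 0, 2], freqs)
as_expected = inbreeding_coefficient([0, 1, 2, 1], freqs)
print(fully_homozygous, as_expected)
```

A positive value indicates more homozygous genotypes than expected from the colony allele frequencies; a negative value indicates an excess of heterozygotes.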
We considered each colony as originating from K unknown ancestral populations and looked at values of K from 3 to 12 (80 – check with Amelie) using the FRAPPE software package {Li, 2008 #8220} {Tang, 2005 #8219}.

Two results were noteworthy. First, at no value of K were we able to differentiate all stocks. In a few cases a single component predominates, uniquely distinguishing a stock (MF1 and CFW stocks are examples), but in general stocks differ in the proportions of common ancestry. This is true of the most widely used stocks, CD1 and NMRI (Figure 1). Ancestry also confirms the similarity between ICR and CD1, which are essentially the same stock. Second, there is considerable variation within a stock, which is largely explained by variation between colonies, as shown for example by the varying proportions of colour in the CD1 and NMRI stocks (Figure 1).

One likely contribution to variation is population structure within the colonies. We looked for evidence of this using multi-dimensional scaling of IBS pairwise distance matrices {Li, 2008 #8222}. Supp Figure X shows results for all populations; representative examples are shown in Figure 2. We found two or more clusters in eighteen populations.

Finally, we looked at allele frequency fluctuation over time, which is expected to occur due to unintended directional selection and random genetic drift. Results obtained from Hsd:MF1 animals used in 2003 were strikingly different from those purchased in 2007: heterozygosity fell from 30% to 5% and the inbreeding coefficient rose from 3 to more than 30. We discovered that, due to infection, the colony had been re-formed from a small number of rederived founders, thereby introducing a severe population bottleneck and explaining the changes in genetic architecture. However, such drastic changes are unusual.
We surveyed five more colonies at least one year after our initial analysis and found good agreement between the heterozygosity, relatedness and inbreeding measured on the two occasions (Table 4).

QTL Mapping resolution

We assessed mapping resolution at the four test loci by the LD decay radius, defined as the mean physical separation in base pairs (bp) between SNPs at which the squared correlation coefficient (R2) drops below 0.5. Figure 3 shows results for all populations analysed (there were insufficient genotypes to calculate LD for NTac:NIHBS-US and ClrHli:CD1-IL). Populations suitable for high-resolution mapping should have a low LD decay radius and a high mean MAF.

Average figures of LD decay mask variation between regions. For example, HsdWin:NMRI-NL has a mean LD decay radius of just over 1 Mb, but it will be of little use for mapping the MHC region, where LD is extensive. However, a region with high LD in one population may have low LD in another. This locus-to-locus variation means that no single population is ideal and that genome-wide haplotype maps are needed.

Therefore we explored genome-wide variation in LD in six colonies, chosen to cover a range of mean LD decay measures. After genotyping using the mouse diversity array {Yang, 2009 #8223} and discarding non-polymorphic markers, haplotype blocks were estimated using PLINK {Purcell, 2007 #7230}, which implements the same block-finding algorithm found in HAPLOVIEW {Barrett, 2005 #6834}. Measures of relatedness and inbreeding agreed with those obtained from the single-locus analyses (Table 3). Over the genome, mean block length varied between the six colonies: Crl:CFW.SW-US 403.9 Kb (standard deviation (sd) 570.9), Crl:NMRI.Han-FR 39.53 Kb (sd 58.7), Hsd:ICR.CD1-FR 51.1 Kb (sd 79.5), HsdWin:CFW-NL 440.1 Kb (sd 573.8), HsdWin:NMRI-NL 374.5 Kb (sd 525.5) and RjHan:NMRI-FR 264.0 Kb (sd 398.0).
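One way to compute an LD decay radius of the kind defined above is sketched here. The definition leaves some choices open, so this sketch makes explicit assumptions: pairwise R2 is the squared Pearson correlation of 0/1/2 genotype codes, SNP pairs are grouped into fixed-width distance bins, and the radius is reported as the midpoint of the first bin whose mean R2 falls below 0.5. The bin width, function names and toy data are illustrative only and do not reproduce the procedure used for Figure 3.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 coding)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return None  # monomorphic marker: correlation undefined
    return sxy * sxy / (sxx * syy)


def ld_decay_radius(positions, genotypes, bin_width=100_000, threshold=0.5):
    """Midpoint (bp) of the first distance bin whose mean R2 is below threshold.

    positions: base-pair position of each SNP.
    genotypes: one genotype vector per SNP, same animals in the same order.
    Returns None if mean R2 never drops below the threshold.
    """
    sums, counts = {}, {}
    for i, j in combinations(range(len(positions)), 2):
        r2 = r_squared(genotypes[i], genotypes[j])
        if r2 is None:
            continue
        b = abs(positions[j] - positions[i]) // bin_width
        sums[b] = sums.get(b, 0.0) + r2
        counts[b] = counts.get(b, 0) + 1
    for b in sorted(sums):
        if sums[b] / counts[b] < threshold:
            return (b + 0.5) * bin_width
    return None


# Toy data: two tightly linked SNPs and a third, distant, weakly correlated SNP.
toy_positions = [0, 50_000, 250_000]
toy_genotypes = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [2, 0, 1, 1, 2, 0],
]
toy_radius = ld_decay_radius(toy_positions, toy_genotypes)
print(toy_radius)
```

With real data the choice of bin width affects the estimate, so a smoothed or model-fitted decay curve may be preferable to simple binning.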
As expected, LD varied considerably across the genome and we present the findings for each chromosome at http://www.well.ox.ac.uk/mouse/outbreds/haploview.

Haplotypes in commercial outbreds are found in laboratory strains

We estimated the contribution of each inbred strain to each stock’s genetic architecture by reconstructing the genome of each mouse as a probabilistic mosaic of the founders using a hidden Markov model {Mott, 2000 #5686}. We used the Perlegen NIEHS genotypes {Frazer, 2007 #7202} as a reference set of 15 inbred founders, analysed all stocks at the four loci (Figure 4a) and performed genome-wide analyses in a subset of colonies (Figure 4b). While there is considerable variation between colonies, two general patterns are clear in both locus-specific and genome-wide analyses. First, in all colonies the fraction accounted for by classical inbred strains ranges from 42% (the NIHS colonies) to 80% (most ICR/CD1). Averaged across all colonies and over the four loci, most inbred strains contribute between 3-8% of the haplotype fraction, whilst 129, FVB and NOD contribute 12-14%. Second, the wild-derived strains (WSB, CAST, FVB, MOLF) contribute the least (3-5%). The NIHS stocks contain the highest contribution of the Swiss mouse FVB (25-35%). NMRI are 15-20% FVB and 15% 129, CD1 about 15% FVB and MF1 only 5%. The CFW stocks all contain about 15% FVB.

Sequence analysis and novel variants

Probabilistic ancestral haplotype reconstruction assumes that the haplotypes of the progenitors are identical to those of the outbreds. We used two methods to determine whether this was true. First, we used PCR to amplify 22 fragments of about 1.2 Kb (see Supp Table xxx for primer information). We randomly selected eight regions from a 5 Mb QTL region we previously mapped on mouse chromosome 1 (REF), four regions from three loci involved in HDL, CD4 and MCV traits (REF) and two regions from the AKP2 locus.
We sequenced 12 animals from three populations (HsdWin:CFW-1 NL HNL1, Crl:CFW US K71 and HsdWin:NMRI NL HNL1), 12 wild mice (DNA provided to us by Alexandre Reymond, University of Lausanne) and 10 classical inbred strains (A/J, AKR/J, BALB/cJ, C3H/HeJ, C57BL/6J, CBA/J, DBA/2J, LP/J, I/LnJ and RIII/DmMobJ). We discovered 120 SNPs (see Supp Table xx for detailed information). Wild mice have an average of one SNP every 200 bp, but this rate varies between colonies: HsdWin:CFW-1 and Crl:CFW have a frequency of one SNP every 350 bp, whereas HsdWin:NMRI has on average one SNP every 520 bp. Nine of the SNPs are coding variants (table ). We found 3 novel variants (giving a rate of 2.5%) in Crl:CFW (positioned on chr1:173306046, chr1:173368101 and chr17:34785468) and only one (rate 0.8%) in each of HsdWin:CFW-1 and HsdWin:NMRI (chr17:34785468). Our locus-specific sequencing data suggest that HsdWin:CFW-1 is related to the wild-derived inbred strain PWK, whereas Crl:CFW and HsdWin:NMRI are related to Swiss-derived inbred strains (e.g. NOD and FVB).

Second, we used next generation sequencing to estimate genome-wide rates of novel SNPs. We took two approaches: sequencing at ten-fold coverage DNA from four mice from one colony (), and restriction enzyme digest enrichment. [ FASTERIS RESULTS]

QTL mapping

The implication from haplotype reconstruction and sequence analyses is that colonies are descended from a common set of progenitors. Consequently, many of the same alleles, though differing in frequency, will contribute to phenotypic variation in different colonies. We directly investigated this hypothesis by mapping QTLs contributing to variation in four phenotypes (serum alkaline phosphatase (ALP), the ratio of CD4+ to CD8+ T-cells, concentration of high-density lipoproteins (HDL) in serum and mean red cell volume) in three populations (Crl:CFW (USA), HsdWin:CFW (Netherlands) and HsdWin:NMRI (Netherlands)).
We tested this with a joint analysis in which QTLs were mapped simultaneously in the three stocks. This showed that a model assuming a single trait effect for each founder strain, independent of stock, fitted the data as well as a model in which each stock had independent effects. [ RICHARD ]

Finally, we directly investigated the extent to which variation in allele frequency and in LD affects mapping resolution. We analysed the data by ANOVA at each marker (single marker analysis). Applying a conservative Bonferroni correction for testing 351 markers for four phenotypes in three populations gives a logP threshold of 4.93, which, as Figure 5 shows, is exceeded over a 1 Mb interval on chromosome 4 for ALP, a 0.5 Mb region on chromosome 1 for HDL and a 2 Mb region on chromosome 17 for CD4/CD8 ratio. Figure X shows that QTLs are detected in different populations: ALP in Crl:CFW (with less significant evidence for association in HsdWin:NMRI); HDL in HsdWin:CFW; CD4/CD8 in both Crl:CFW and HsdWin:CFW.

We determined the most likely position of the QTL by resample model averaging, a procedure developed in our analysis of the HS {Valdar, 2009 #7988}. We determined the performance and resolution of the method by simulating a QTL at each polymorphic marker in the three regions and in all populations. As expected, confidence intervals depended on the location of the QTL within a region of high LD, and varied from less than 100 Kb to more than 2 Mb (examples are given in Supplemental Figure X). We found no evidence of multiple effects at these loci (as indicated by the logP of second and subsequent rounds of forward selection falling below significance thresholds). The ALP locus remains diffusely spread over a 1 Mb region in both the Crl:CFW and HsdWin:NMRI populations. However, much higher resolution is seen for mapping CD4/CD8 ratio and HDL, where the 95% confidence intervals (from simulation) are less than 200 Kb in the vicinity of the QTL.
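The Bonferroni threshold quoted above can be checked with a few lines of arithmetic. The sketch below assumes a family-wise significance level of 0.05 (the level is not stated in the text) and expresses the threshold as a negative log10 P-value; the function name is illustrative.

```python
import math


def bonferroni_logp_threshold(n_markers, n_phenotypes, n_populations, alpha=0.05):
    """Return -log10 of the per-test P-value needed for family-wise level alpha.

    alpha = 0.05 is an assumption; the number of tests is the product of
    markers, phenotypes and populations.
    """
    n_tests = n_markers * n_phenotypes * n_populations
    return -math.log10(alpha / n_tests)


threshold = bonferroni_logp_threshold(351, 4, 3)
print(round(threshold, 2))  # 4.93
```

With 351 x 4 x 3 = 4,212 tests, a 0.05 family-wise level corresponds to a per-test P-value of about 1.2e-5, which reproduces the 4.93 threshold.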
Figure X plots the position of the most significant locus identified by forward selection and indicates the LD structure of the region above the plots (where red circles are R2 of 1).

Characterization of the molecular basis of CD4/CD8: h2ealpha is within the location we have identified (chr17:34,421,575-34,579,223)

Characterization of the molecular basis of HD.

Discussion

Commercially available outbred mice are used primarily by the pharmaceutical industry for toxicology testing, on the assumption that they model outbred human populations, a view supported by limited genetic surveys {Rice, 1980 #263}. In fact, very little is known about their genetic architecture, and assumptions about the combined effects of fluctuating allele frequencies (due to genetic drift) and lack of genetic quality control have led some to argue against their use in genetic investigations {Chia, 2005 #8130; Festing, 1999 #8134}. Our catalogue of the genetic structure of commercially available stocks makes a systematic evaluation possible for the first time. We have established three important features.

First, variation between colonies is large. Fst, a measure of variation within and between populations, is 0.454 (in contrast, values for human populations are typically less than 0.05 {Reich, 2009 #8148}). The source of this variation is not straightforward. Stock names (such as NMRI or CD1) do not account for it, nor does the supplier, nor the country. While we can show that some stocks, such as TO and MF1, do indeed have a unique genetic ancestry, many do not.

To a large extent variation is colony-specific. Mouse colonies are often believed to behave very much like finite island populations.
In that case, except for imposed bottlenecks (as happened with the MF1) or the forcible introduction of new alleles (as happens with breeding schemes like IGS that introduce large unrecombined chunks of the genome), genetic variation will depend on the effective population size (Ne): assuming random mating, the time required for a neutral allele to go to fixation in a population, and hence to reduce heterozygosity, is approximately equal to four times Ne generations. Given that so many colonies are maintained with effective population sizes of many thousands, colony genetic architecture should be stable. Consistent with this view, our analyses of five colonies over two years found little evidence for changes in allele frequencies and LD values.

Second, the number of alleles segregating in colonies is relatively limited (compared to a wild population). Almost all of the genetic variants can be found in classical laboratory strains. Both locus-specific and genome-wide sequencing support this conclusion, and haplotype reconstruction demonstrates how variants in the outbreds can be modelled as descending from inbred progenitors.

Third, in terms of mapping resolution, no mouse colony is comparable to a human population. Using an LD criterion, the finest mapping interval achievable in any colony is at least twice as large as that obtainable in human populations. Applying the same definition of a haplotype LD block as used in human LD studies, we find that average block size varies between colonies from 40 to 400 Kb. By contrast, average block length is 9 Kb in African populations and 18 Kb in European populations {Gabriel, 2002 #6159}.

These observations have important implications for the use of commercial outbreds for genetic mapping. The extent of LD means that genome-wide coverage can be obtained with fewer SNPs (about 200,000, and fewer for colonies with larger blocks) than in human populations, but resolution may fall short of gene level in many parts of the genome.
This means that high-resolution mapping of a locus may be possible in one colony but not in another: no single colony is ideal. We have shown this for the MHC region on chromosome 17, where high-resolution mapping was possible in the HsdWin:CFW but not in the Crl:CFW stocks.

However, as we have repeatedly emphasized, mapping resolution is not the only useful measure of a colony’s suitability for GWAS. Another critical measure is allele frequency. Large numbers of rare variants contributing to phenotypic variation in a population will make the trait difficult to map using standard GWAS designs. Here our data reveal a favorable situation: QTL mapping assuming a common set of founder strains shows that the QTLs replicate between stocks in a consistent manner. These findings suggest that quantitative differences in allele frequencies, rather than the existence of private alleles, are responsible for the population differences.

Furthermore, the limited sequence diversity means it is possible to impute the sequence of any commercially available mouse from a dense SNP map. Thus the full catalogue of sequence variation in a stock could be obtained by sequencing the inbred strains presumed to be founders for it, and genotyping the stock at a skeleton of SNPs. Therefore we should be able to detect the effect of all variants, a situation that has so far eluded studies in completely outbred populations.

Our catalogue of the genetic structure of commercially available stocks, the first of its kind, makes it possible to rank colonies according to their utility for genetic mapping. After excluding colonies with poor genetic structure, we have identified 35 populations that have properties conducive to high-resolution mapping. These 35 populations appear to be substantially superior to currently available resources for high-resolution mapping. LD is, for example, lower than in the HS or collaborative cross animals.
Our results now make it possible for geneticists to make informed choices on the use of the stocks and to use them for GWAS of complex traits in mice. [By mapping in different colonies we identified a deletion in the promoter of h2-ealpha as the molecular change that contributes to variation in CD4/CD8 levels. This locus has recently been identified in humans and the homologous gene is therefore a prime candidate (). Furthermore, our work on the HDL locus has identified two previously unsuspected candidates (Slams and CD48)].

Acknowledgements

METHODS

Mice

Genotyping

Sequencing

Phenotyping

We analysed 200 animals from three colonies: Crl:CFW (USA), HsdWin:CFW (Netherlands) and HsdWin:NMRI (Netherlands). Blood samples were taken from a tail vein and we performed assays for serum alkaline phosphatase (ALP), the ratio of CD4+ to CD8+ T-cells, concentration of high-density lipoproteins (HDL) in serum and mean red cell volume.

LD

Genetic mapping

Where necessary, phenotypes are transformed into Gaussian deviates. Covariates (such as gender, age, experimenter, time) that explain a significant fraction of each phenotype’s variance with ANOVA P-value < 0.01 are included in subsequent statistical analyses. We use two mapping methods: a single-point analysis of variance of each marker and a multi-point method. Haplotypes are reconstructed as mosaics of known inbred strains using a dynamic programming algorithm that minimises the number of breakpoints required {Yalcin, 2004 #32}. These strains are used as progenitors for the multipoint analysis (probabilistic ancestral haplotype reconstruction, in the HAPPY package) {Mott, 2000 #96}. Region-wide significance levels are estimated by permuting the transformed phenotype values 1,000 times.
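The permutation procedure for region-wide significance can be sketched as below. This is an illustration under stated assumptions, not the analysis code used in the study: the per-marker statistic is the squared correlation between genotype and phenotype (a stand-in for the ANOVA and HAPPY statistics), the threshold is the 95th percentile of the maximum statistic across markers, and all names and the toy data are hypothetical.

```python
import random


def squared_correlation(x, y):
    """Squared Pearson correlation; 0.0 if either vector has no variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def max_statistic(genotype_matrix, phenotype):
    """Largest per-marker association statistic across the region."""
    return max(squared_correlation(g, phenotype) for g in genotype_matrix)


def permutation_threshold(genotype_matrix, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Region-wide threshold from the permutation distribution of the maximum.

    Phenotypes are shuffled across animals, which breaks any genotype-phenotype
    association while preserving the LD structure among markers.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_max.append(max_statistic(genotype_matrix, shuffled))
    null_max.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_max[index]


# Toy region: three markers genotyped in eight animals.
toy_markers = [
    [0, 1, 2, 0, 1, 2, 0, 1],
    [2, 1, 0, 2, 1, 0, 1, 1],
    [0, 0, 1, 1, 2, 2, 1, 0],
]
toy_phenotype = [0.1, 0.9, 2.1, -0.2, 1.2, 1.8, 0.0, 1.1]
toy_threshold = permutation_threshold(toy_markers, toy_phenotype)
print(toy_threshold)
```

Taking the maximum over markers within each permutation is what makes the threshold region-wide rather than per-marker.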
TABLES

Table 1 – Mouse providers, location, breeding protocols, health status

Table 2 – Genetic characteristics of outbred mouse colonies

Population | No. | % genotyped | % homozygote | Het. | Pct MAF < 5% | Pct fail HWE | Mean inbreeding coef | Haplotypes | LD decay radius | Mean MAF
Aai:ICR-US | 24 | 88.83 | 75.92 | 0.08 | 6.80 | 2.27 | 2.76 | 2.21 | 1.88 | 0.026
BK:W_UK | 48 | 92.17 | 87.25 | 0.04 | 3.12 | 2.27 | 8.78 | 0.63 | 1.12 | 0.024
BomTac:NMRI-DK151 | 23 | 91.98 | 65.16 | 0.16 | 2.83 | 1.70 | -5.68 | 2.61 | 1.07 | 0.068
BomTac:NMRI-DK160 | 24 | 93.11 | 65.72 | 0.15 | 3.97 | 1.98 | 4.57 | 2.29 | 0.87 | 0.075
CC | 109 | 89.17 | 5.38 | 0.19 | 2.83 | 89.24 | 67.28 | 3.61 | 2.78 | 0.254
ClrHli:CD1_IL | 20 | 94.65 | 93.20 | 0.01 | 2.83 | 0.57 | -16.50 | 0.95 | NA | 0.008
Crl:CD1.ICR_UK | 48 | 93.20 | 30.88 | 0.27 | 13.88 | 3.97 | 4.40 | 4.04 | 1.00 | 0.126
Crl:CD1.ICR-US_iso | 30 | 97.37 | 37.96 | 0.24 | 11.90 | 4.25 | 13.73 | 3.67 | 1.37 | 0.152
Crl:CD1(ICR)-DE | 48 | 94.07 | 40.51 | 0.19 | 18.98 | 7.08 | 10.26 | 2.48 | 1.24 | 0.090
Crl:CD1(ICR)-FR | 48 | 94.26 | 32.01 | 0.28 | 15.01 | 4.53 | 6.00 | 2.46 | 0.73 | 0.133
Crl:CD1(ICR)-IT | 48 | 95.15 | 33.71 | 0.31 | 13.31 | 5.38 | 4.70 | 4.40 | 0.76 | 0.161
Crl:CD1(ICR)-US_C61 | 24 | 96.81 | 31.44 | 0.30 | 12.75 | 2.27 | 0.68 | 7.58 | 1.18 | 0.114
Crl:CD1(ICR)-US_H43 | 24 | 96.07 | 36.54 | 0.29 | 9.92 | 3.97 | 6.00 | 4.08 | 0.89 | 0.130
Crl:CD1(ICR)-US_H48 | 24 | 95.88 | 37.68 | 0.30 | 1.70 | 2.55 | -4.18 | 4.88 | 1.46 | 0.103
Crl:CD1(ICR)-US_K64 | 48 | 93.91 | 29.46 | 0.30 | 14.16 | 5.38 | -1.41 | 2.06 | 0.84 | 0.075
Crl:CD1(ICR)-US_K95 | 24 | 97.14 | 44.19 | 0.28 | 3.12 | 2.27 | -10.45 | 7.54 | 1.06 | 0.136
Crl:CD1(ICR)-US_P10 | 24 | 96.41 | 42.21 | 0.22 | 15.58 | 1.98 | 1.56 | 4.29 | 1.08 | 0.100
Crl:CD1(ICR)-US_R16 | 24 | 96.86 | 38.24 | 0.35 | 3.40 | 2.83 | -12.10 | 3.21 | 1.22 | 0.085
Crl:CF1-US | 48 | 94.92 | 25.50 | 0.35 | 4.82 | 6.80 | 10.04 | 4.90 | 2.37 | 0.194
Crl:CFW(SW)-US_K71 | 48 | 94.25 | 41.36 | 0.26 | 4.25 | 4.53 | 6.28 | 3.60 | 0.86 | 0.084
Crl:CFW(SW)-US_P08 | 48 | 91.27 | 29.18 | 0.22 | 24.36 | 0.00 | 4.65 | 1.85 | 1.65 | 0.068
Crl:MF1_UK | 47 | 93.04 | 64.87 | 0.13 | 1.13 | 1.13 | -2.06 | 2.21 | 4.06 | 0.053
Crl:NMRI(Han)-DE | 48 | 94.74 | 39.94 | 0.27 | 11.61 | 4.82 | 1.93 | 3.56 | 1.11 | 0.128
Crl:NMRI(Han)-FR | 48 | 85.44 | 37.39 | 0.26 | 5.67 | 6.23 | 12.01 | 3.58 | 1.21 | 0.139
Crl:NMRI(Han)-HU | 48 | 90.37 | 39.66 | 0.26 | 8.22 | 6.52 | 0.43 | 3.77 | 1.07 | 0.120
Crl:OF1-FR_B22 | 24 | 91.89 | 26.63 | 0.35 | 6.80 | 6.80 | -5.27 | 6.00 | 2.04 | 0.168
Crl:OF1-FR_B41 | 24 | 93.77 | 27.76 | 0.35 | 9.07 | 6.80 | -7.98 | 5.67 | 2.36 | 0.161
Crl:OF1-HU | 50 | 92.54 | 28.05 | 0.35 | 5.10 | 6.80 | -1.35 | 4.80 | 2.27 | 0.162
Crlj:CD1(ICR)-JP | 48 | 94.79 | 41.93 | 0.21 | 8.22 | 7.08 | 4.61 | 3.54 | 1.34 | 0.073
HanRcc:NMRI-CH | 48 | 94.17 | 66.29 | 0.20 | 1.98 | 1.98 | -11.67 | 3.40 | 1.47 | 0.102
Hla:(ICR)CVF-US | 48 | 83.42 | 49.29 | 0.21 | 12.46 | 4.82 | -3.13 | 6.33 | 0.79 | 0.098
HS | 12 | 90.44 | 21.81 | 0.43 | 0.57 | 2.83 | -3.88 | 1.77 | 2.03 | 0.207
Hsd:ICR(CD-1)-DE | 53 | 89.89 | 47.03 | 0.29 | 4.25 | 5.10 | 2.13 | 2.94 | 1.08 | 0.153
Hsd:ICR(CD-1)-ES | 48 | 88.56 | 46.46 | 0.26 | 7.37 | 5.38 | 3.49 | 3.08 | 1.49 | 0.147
Hsd:ICR(CD-1)-FR | 64 | 93.52 | 45.04 | 0.28 | 5.10 | 5.38 | 5.60 | 3.27 | 0.99 | 0.155
Hsd:ICR(CD-1)-IL | 48 | 86.08 | 43.91 | 0.29 | 6.23 | 3.68 | -6.55 | 3.44 | 1.34 | 0.143
Hsd:ICR(CD-1)-IT | 48 | 88.94 | 47.03 | 0.28 | 2.83 | 4.82 | 7.52 | 3.40 | 1.07 | 0.162
Hsd:ICR(CD-1)-MX | 48 | 91.28 | 47.88 | 0.30 | 5.10 | 13.60 | -11.34 | 2.50 | 1.07 | 0.153
Hsd:ICR(CD-1)-UK | 48 | 92.96 | 46.18 | 0.28 | 5.95 | 3.97 | -0.34 | 3.02 | 1.24 | 0.147
Hsd:ICR(CD-1)-US | 48 | 95.99 | 48.16 | 0.28 | 6.80 | 5.38 | 4.36 | 2.50 | 1.05 | 0.149
Hsd:ND4-US | 48 | 93.68 | 69.97 | 0.07 | 17.00 | 2.27 | 4.89 | 2.06 | 1.79 | 0.036
Hsd:NIHS_UK_C | 15 | 93.75 | 68.56 | 0.11 | 10.48 | 1.70 | 6.36 | 4.87 | 1.02 | 0.055
Hsd:NIHS_UK_G | 33 | 92.63 | 75.07 | 0.11 | 3.40 | 3.12 | -5.09 | 4.76 | 2.04 | 0.084
Hsd:NIHS-US | 48 | 92.11 | 54.67 | 0.19 | 6.52 | 9.92 | -18.01 | 0.63 | 2.45 | 0.011
Hsd:NIHSBC_IL | 12 | 91.64 | 90.93 | 0.02 | 1.42 | 0.57 | 3.11 | 4.08 | 1.05 | 0.047
Hsd:NSA(CF1)-US | 48 | 93.30 | 30.88 | 0.34 | 12.18 | 11.61 | 1.90 | 5.04 | 1.30 | 0.160
HsdHu:SABRA_IL | 48 | 91.97 | 45.04 | 0.22 | 5.67 | 22.38 | 25.44 | 2.98 | 2.55 | 0.146
HsdIco:OF1-IT | 48 | 90.48 | 30.31 | 0.34 | 2.27 | 13.60 | 5.22 | 4.77 | 1.82 | 0.187
HsdOla:MF1_IL | 8 | 90.51 | 50.42 | 0.21 | 0.00 | 1.70 | 21.38 | 6.00 | 3.38 | 0.141
HsdOla:MF1_UK_G | 56 | 93.90 | 41.08 | 0.28 | 7.37 | 3.40 | -0.65 | 2.89 | 3.14 | 0.132
HsdOla:MF1-UK_C | 184 | 72.71 | 26.06 | 0.21 | 10.20 | 4.25 | 5.31 | 1.64 | 3.18 | 0.132
HsdOla:MF1US_202A_iso | 24 | 93.87 | 75.35 | 0.13 | 1.70 | 0.85 | -6.90 | 2.63 | 0.53 | 0.061
HsdOla:MF1US_202A_prod | 24 | 94.76 | 75.35 | 0.13 | 1.13 | 0.85 | -9.21 | 2.54 | 2.38 | 0.061
HsdOla:TO_UK | 48 | 93.63 | 71.10 | 0.10 | 4.25 | 3.68 | 9.47 | 1.85 | 2.84 | 0.049
HsdWin:CFW1-DE | 48 | 87.64 | 49.01 | 0.24 | 9.92 | 7.93 | -0.88 | 3.42 | 1.51 | 0.127
HsdWin:CFW1-NL | 48 | 82.99 | 51.84 | 0.21 | 7.93 | 4.82 | 3.62 | 3.63 | 0.89 | 0.112
HsdWin:NMRI_UK | 32 | 93.92 | 62.89 | 0.12 | 15.58 | 1.70 | -4.89 | 1.25 | 1.51 | 0.049
HsdWin:NMRI-DE | 48 | 90.78 | 58.07 | 0.20 | 6.80 | 2.27 | -8.87 | 1.85 | 1.10 | 0.098
HsdWin:NMRI-NL | 64 | 93.96 | 57.79 | 0.19 | 5.95 | 3.12 | 2.11 | 1.86 | 1.04 | 0.099
IcrTac:ICR-US | 36 | 89.28 | 69.69 | 0.06 | 13.31 | 2.55 | 5.40 | 1.50 | 1.92 | 0.013
Inbreds_94_strains | 94 | 91.25 | 0.00 | 0.00 | 2.83 | 98.58 | 100.00 | 2.81 | 2.32 | 0.326
NTac:NIHBS-US | 36 | 91.71 | 93.77 | 0.01 | 1.98 | 0.57 | -53.44 | 0.39 | NA | 0.003
RjHan:NMRI-FR | 48 | 92.58 | 31.16 | 0.28 | 14.45 | 13.60 | 17.80 | 4.00 | 1.00 | 0.132
RjOrl:Swiss-FR | 48 | 91.68 | 64.87 | 0.17 | 1.70 | 3.40 | -9.22 | 2.10 | 0.88 | 0.078
Sca:NMRI_SE_22 | 24 | 80.63 | 75.07 | 0.09 | 3.97 | 3.12 | 15.16 | 2.92 | 1.09 | 0.047
Sca:NMRI_SE-10an | 24 | 75.51 | 70.82 | 0.09 | 5.38 | 5.38 | 22.31 | 2.17 | 1.10 | 0.054
Sim:(SW)fBR-US_A1 | 48 | 94.56 | 74.50 | 0.10 | 5.67 | 3.68 | 12.43 | 1.96 | 3.02 | 0.056
Sim:(SW)fBR-US_B1 | 24 | 95.82 | 79.60 | 0.11 | 1.42 | 1.13 | -7.87 | 2.58 | 3.05 | 0.050
Tac:SW-US | 36 | 92.67 | 46.18 | 0.33 | 1.98 | 3.97 | -2.00 | 3.33 | 1.30 | 0.159
Wild_Arizona | 96 | 85.77 | 17.85 | 0.26 | 13.31 | 38.81 | 27.86 | 7.64 | 0.38 | 0.169

Table 3: Whole genome analyses

Population | No. | Markers | Genos. | Hom. | Het. | MAF | HWE | Inbreed coef
Crl:CFW(SW)-US_P08 | 22 | 169,333 | 97.30 | 71.06 | 0.19 | 8.00 | 6.36 | -20.86
HsdWin:CFW1-NL | 22 | 152,716 | 97.17 | 74.55 | 0.18 | 4.98 | 7.15 | -20.70
HsdWin:NMRI-NL | 26 | 164,287 | 97.41 | 73.02 | 0.13 | 4.51 | 7.23 | -18.33
Hsd:ICR(CD-1)-FR | 20 | 623,124 | 87.24 | 45.19 | 0.22 | 10.50 | 1.53 | -11.82
RjHan:NMRI-FR | 13 | 171,198 | 96.49 | 63.33 | 0.18 | 4.69 | 7.59 | -10.62
Crl:NMRI(Han)-FR | 20 | 623,124 | 87.04 | 38.09 | 0.24 | 11.09 | 4.55 | 3.14

Table 4: Temporal variation

Population | Month / Year | No. | Het. | Pct MAF < 5% | Pct fail HWE | Mean inbreeding coef
Crl:CD1.ICR_US_K64 | Nov 2007 | 48 | 0.300 | 14.16 | 5.38 | -1.41
Crl:CD1.ICR_US_K64 | Aug 2009 | 24 | 0.322 | 4.25 | 1.98 | -5.33
Crl:CFW.SW_US_P08 | June 2008 | 206 | 0.216 | 24.36 | 0.00 | 4.65
Crl:CFW.SW_US_P08 | Oct 2009 | 36 | 0.254 | 11.33 | 2.83 | -5.29
HsdIco:OF1_IT | Nov 2007 | 48 | 0.343 | 2.27 | 13.60 | 5.22
HsdIco:OF1_IT | Feb 2008 | 48 | 0.357 | 9.63 | 9.07 | -3.73
HsdOla:MF1_UK | 2003 | 52 | 0.297 | 2.27 | 1.98 | 3.34
HsdOla:MF1_UK | | 192 | 0.051 | 1.98 | 31.16 | 31.20
HsdWin:CFW1_NL | Nov 2007 | 48 | 0.205 | 7.93 | 4.82 | 3.62
HsdWin:CFW1_NL | Aug 2008 | 234 | 0.204 | 12.18 | 0.00 | 10.19
HsdWin:NMRI_NL | Aug 2007 | 64 | 0.191 | 5.95 | 3.12 | 2.11
HsdWin:NMRI_NL | Aug 2008 | 200 | 0.190 | 8.50 | 0.00 | 0.29

Figure 1. Ancestry inferred from the frappe program at K = 9. The length of each colored segment corresponds to the ancestry coefficient of each mouse, plotted along the horizontal axis. Mice are labeled by stock name (along the bottom) and by commercial provider (along the top). Mice of the same colony were grouped together (giving rise to blocks of common ancestry, as seen for example to the right of the CD1 cluster) but individual colony labels are omitted.

Figure 2. Multi-dimensional scaling of identity by state pairwise distances, calculated using PLINK. The figure shows a reduced representation of the results, plotting the position on the first dimension (horizontal axis) against position on the second dimension (vertical axis).

Figure 3. Linkage disequilibrium decay radius (black) and minor allele frequencies (red) in outbred mice. The scale of the vertical axis is megabases for the decay radius and ten times the value of the mean allele frequency (so a value of 2 is 0.2).

Figure 4. Proportion of laboratory inbred strain haplotypes found in commercial outbred stocks. 4a) The proportion for all colonies analysed at four loci. 4b) Genome-wide analysis for six colonies.

Figure 5. QTL mapping of three phenotypes in three colonies.

Figure 6. Resample-based mapping of three phenotypes with LD structure.

Supplemental Figure: Simulation of resample model averaging. Performance of the SMA method depends on the position of the QTL and the population analysed.
Here the resolution of the RMA (indicated by the distribution of the black dots) varies according to the position of a simulated QTL (indicated by dotted red lines).
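As a reading aid for the Het., Mean MAF and Mean inbreeding coef columns of Table 2, the following minimal Python sketch shows one standard way such per-marker quantities can be derived from genotype counts at a biallelic marker. It is an illustration under stated assumptions, not the procedure used to produce the tables: the function name `marker_stats` is hypothetical, and the inbreeding coefficient is assumed to be F = 1 - H_obs / H_exp expressed as a percentage, with H_exp the heterozygosity expected under Hardy-Weinberg equilibrium.

```python
def marker_stats(n_aa, n_ab, n_bb):
    """Summarise one biallelic marker from genotype counts.

    Hypothetical helper for illustration only. Returns the minor allele
    frequency, the observed heterozygosity, and an inbreeding coefficient
    in percent (assumed estimator: F = 1 - H_obs / H_exp).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")

    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    maf = min(p, 1 - p)               # minor allele frequency
    h_obs = n_ab / n                  # observed heterozygosity
    h_exp = 2 * p * (1 - p)           # expected under Hardy-Weinberg

    # F is undefined for a monomorphic marker (expected heterozygosity 0).
    f_pct = None if h_exp == 0 else 100 * (1 - h_obs / h_exp)
    return {"maf": maf, "het": h_obs, "inbreeding_pct": f_pct}


if __name__ == "__main__":
    # Genotype counts in Hardy-Weinberg proportions give F close to 0.
    print(marker_stats(25, 50, 25))
    # A sample with no heterozygotes at a polymorphic marker gives F = 100.
    print(marker_stats(50, 0, 50))
```

Under this assumed estimator, a negative coefficient indicates more heterozygotes than Hardy-Weinberg proportions predict, and a fully homozygous sample at a polymorphic marker gives 100, which matches the direction of the values reported for the inbred strain panel.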