Genetic characterization of commercially available outbred
mice and an assessment of their utility for QTL mapping
Abstract
Diversity between stocks – mapping possible in one and not another; gene
level resolution possible at some loci, not all; stock choice important;
Limited genetic diversity and descent from a common set of alleles, present in
laboratory inbred strains. Mapping resolution demonstrated down to under
100 Kb. 35 stocks provide sub-1 Mb mapping; haplotype maps provided for 6.
Introduction
What characterizes an ideal population for gene mapping studies? Mouse
geneticists have reason to envy the success of human genome-wide
association studies (GWAS), but not necessarily to adopt their practice, for
example by using wild mice {Laurie, 2007 #7307}. So doing entails the same
drawbacks that afflict human GWAS: tens of thousands of subjects are
needed for robust detection of common causal variants and the majority of the
genetic variance remains unexplained, even using these large sample sizes.
What are the alternatives? One solution, available to mouse geneticists, is to
design an ideal population by breeding.
The design principles for the ideal population can be expressed in
terms of linkage disequilibrium (LD) decay, the fall-off in correlation between
genotypes with increasing distance between markers. High rates of decay are
found in populations with large effective population sizes (minimizing the
effects of homozygosity due to genetic drift) and many generations of random
mating (introducing large numbers of recombinants that break up correlations
between genotypes). Unfortunately a necessary corollary is the presence of
rare alleles as allele frequencies drift to extremes and new, rare, alleles arise
as a consequence of mutations. The more rare alleles in a population, and the
more they contribute to phenotypic variation, the more difficult it will be to
detect the responsible quantitative trait loci (QTLs) using genome-wide
association strategies that genotype only common alleles {Dickson, 2010
#8190}.
The best strategy to create a stock where there are few if any rare
variants, while maintaining high genetic diversity and low LD, might seem to
be to choose animals from highly divergent populations, such as wild mice
caught in many locations{Bonhomme, 2007 #8193}, or from animals currently
being used to create the collaborative cross, a set of 1,000 recombinant
inbred lines derived from highly genetically divergent progenitor strains
{Churchill, 2004 #6639}. However consideration of the properties required for
mapping reveals that this strategy is not ideal.
Mice from different populations will have a proportion of variants in
common and a proportion of variants that are unique to the animals (being
present in one population only). LD decay for the latter private variants will
depend solely on recombinants accumulated during the creation of the stock,
while LD decay for the former, common, variants will depend on the ancestry
of the two founding populations. It follows that high mapping resolution is best
obtained by using animals from the same mating population to reduce the
number of private alleles.
Surprisingly the ideal population may already be available. Commercial
mouse breeders, such as Harlan and Charles River Laboratories, maintain
large colonies of outbred mice that may have the necessary genetic structure.
Some outbred stocks are known to derive from animals from a single
population, such as the ‘Swiss’ stocks, which descend from two male and
seven female mice imported from Lausanne, Switzerland {Lynch, 1969 #8188}.
Furthermore, LD in some outbred stocks has been shown to allow
high-resolution mapping {Ghazalpour, 2008 #8189}, sufficient to identify genes
{Yalcin, 2004 #6820}.
However, other findings argue against the use of commercial outbreds
for genetic mapping: investigations of eight colonies of outbred Swiss mice,
using assays of protein variation, indicated that the colonies had the same
amount of variation found in fully outbred mouse or human populations {Rice,
1980 #263; Cui, 1993 #1591}; examination of outbred CD-1 mice found high
levels of population substructure {Aldinger, 2009 #8005} and genetic drift has
been documented in a colony of CFLP mice {Papaioannou, 1980 #8002}.
Groundwork responsible for the successful application of human
GWAS required both the development of sufficient markers as well as the
genetic characterization of different populations. Similar work is needed in
mouse genetics. Dense marker sets and tools for their genotyping are now
available, but we lack systematic characterization of the genetic architecture
of suitable populations. In this paper we set out to estimate: (i) the degree of
genetic relatedness within and between commercially available outbred
populations, and thereby determine whether inbreeding and population
structure preclude the use of the population; (ii) linkage-disequilibrium (LD) in
each stock (low LD will favour high-resolution mapping); (iii) the proportions of
common and rare variants. In order to assess the latter we tested the
hypothesis that stocks are descended from a common source: the laboratory
inbred strains. Populations in which this assumption holds true, and which
have low levels of LD, are most suitable for high-resolution mapping.
Results
Stocks, colonies and genetic markers
Table 1 lists the populations that we obtained for this study and the numbers
of animals we used. We included three control populations with known
genetic characteristics (12 heterogeneous stock mice {Valdar, 2006 #7175},
109 collaborative cross mice {Churchill, 2004 #6639} and 94 inbred strains
{Shifman, 2006 #7159}) and a population of wild mice caught from multiple
sites in Arizona that is likely to represent a fully outbred population, similar to
that used in a human GWAS {Laurie, 2007 #7307}.
We use the term “colony” to mean a population of mice maintained as
a mating population at a single location, and “stock” to mean a collection of
colonies that are given the same stock designation by the breeders. For
example Crl:CFW-SW and Crl:CFW-UK are two colonies from the same
stock. One might expect colonies from the same stock to have descended
from the same founding population and only to differ to a relatively minor
extent caused by genetic drift but breeding practices may invalidate this
assumption (for example when two colonies are mixed). Therefore, where
possible, we treat colonies as separate populations. We follow the
international standardized nomenclature for outbred stocks {Festing, 1993
#8218}, but add two further pieces of information: a two letter code for the
country of origin and a code for the colony name used by the commercial
provider (e.g. Crl:CFW-US-P08).
There is considerable variation in the size of colonies and the way
animals are maintained (Table 1). Since unintended directional selection (for
example culling small mice) and genetic drift alter genetic diversity, some
breeders maintain heterozygosity by periodically crossing the stock to animals
taken from a much smaller population (the protocol is called IGS (which
stands for….). In consequence a small number of chromosomes are
distributed widely throughout the population, introducing large regions of
linkage disequilibrium that significantly reduce mapping resolution. With the
exception of YY colonies, which we examined to confirm this prediction, we
did not genetically characterize colonies using the IGS breeding scheme.
We analysed all colonies with 351 markers at two loci on chromosome
1 (131.6-134.5 Mb and 172.6-177.2 Mb), one locus on chromosome 4
(136.2-139 Mb) and one locus on chromosome 17 (32.6-38.9 Mb). We also carried
out genome-wide analyses in a subset of animals and stocks. SNPs at the
four loci were spaced so as to allow us to make inferences about both long
and short range LD. Each of the four regions extends for approximately 4
megabases (Mb) with a mean intermarker distance of 47 Kb. They were
chosen because they include large effect QTLs detected in the HS that are
easy and inexpensive to phenotype (large effect QTLs can be detected with
relatively few animals): serum alkaline phosphatase (ALP), the ratio of CD4+
to CD8+ T-cells, concentration of high-density lipoproteins (HDL) in serum and
mean red cell volume. The region on chromosome 17 includes the MHC,
highly polymorphic in wild populations and a sensitive indicator therefore of
any loss of heterozygosity. While these four loci constitute less than 1% of the
genome, if QTLs cannot be mapped at high resolution here, it is unlikely that
colonies will be suitable for genome-wide mapping.
Inbreeding, genetic relatedness and genetic drift
We started by comparing measures of inbreeding. High rates of inbreeding
make colonies less suitable for mapping because they contain fewer (if any)
segregating QTLs. Colonies that consist of a mixture of relatives (such as
siblings, half siblings, cousins, second degree and third degree relatives) will
be difficult to use for mapping because the differing degrees of genetic
relatedness introduce population structure.
Table 2 gives four measures of inbreeding: mean minor allele
frequency (MAF), heterozygosity (inbred colonies will score low on this
measure); the percentage of markers that failed a test of Hardy Weinberg
equilibrium (HWE) (colonies that consist of inbred but unrelated individuals,
will have high scores) and a coefficient of inbreeding that compares the
observed versus expected number of homozygous genotypes {Purcell, 2007
#8008}.
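The measures above can be illustrated with a minimal sketch. It assumes genotypes are coded as 0, 1 or 2 copies of one allele (None for a missing call), and the function names are hypothetical. The inbreeding coefficient is the simple method-of-moments form F = (O - E) / (N - E), comparing observed and expected homozygous genotype counts; it omits any small-sample correction that a package such as PLINK may apply, and it returns a fraction rather than the percentages shown in Table 2.

```python
def allele_freq(genos):
    """Frequency of the counted allele at one marker (genotypes 0/1/2, None = missing)."""
    called = [g for g in genos if g is not None]
    return sum(called) / (2 * len(called))


def minor_allele_freq(genos):
    """Minor allele frequency at one marker."""
    p = allele_freq(genos)
    return min(p, 1 - p)


def heterozygosity(genos):
    """Observed fraction of heterozygous calls at one marker."""
    called = [g for g in genos if g is not None]
    return sum(1 for g in called if g == 1) / len(called)


def inbreeding_coefficient(individual, freqs):
    """F = (O - E) / (N - E) for one animal.

    individual: genotypes of the animal across markers
    freqs: allele frequency of each marker in the colony
    Returns None when every usable marker is monomorphic (N == E).
    """
    observed = 0      # observed homozygous genotypes
    expected = 0.0    # expected homozygous genotypes under Hardy-Weinberg
    n = 0             # markers with a called genotype
    for g, p in zip(individual, freqs):
        if g is None:
            continue
        n += 1
        observed += 1 if g != 1 else 0
        expected += 1 - 2 * p * (1 - p)
    denom = n - expected
    if denom == 0:
        return None
    return (observed - expected) / denom
```

An animal that is homozygous at every polymorphic marker scores F = 1 under this definition, while an animal carrying exactly the expected number of homozygous genotypes scores F = 0; negative values indicate an excess of heterozygotes.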
The measures detect different features of the genetic structure of the
colonies. While low heterozygosity, high HWE failure and high inbreeding
coefficient correctly identify the inbred strains, the collaborative cross, which
at the time of genotyping (2008) was not completely inbred, scores relatively
well on heterozygosity (19%), but is identified as inbred by its high
inbreeding coefficient (table 2).
There are some surprising findings on the degree of genetic
heterogeneity in commercial outbreds. Four colonies are almost inbred:
NTac:NIHBS-US, ClrHli:CD1-IL, Hsd:NIHSBC-IL and BK:W-UK. With
heterozygosities < 5%, almost all the markers we genotyped were not
polymorphic. A further five colonies have heterozygosities less than 10% and
so are unlikely to be useful for mapping (nor indeed to be useful for most
of the stocks’ intended purposes). Three colonies have inbreeding
coefficients greater than 20% (HsdHu:SABRA-IL, Sca:NMRI-SE_10an,
HsdOla:MF1-IL) and a further seven have values greater than 10%.
We attempted to distinguish between colonies using methods applied
in human genetic analysis, but while principal components and Fst analysis
revealed population differentiation (Supp Figs) we could find no feature (not
stock, colony, producer or country of origin) that satisfactorily accounted for
the distribution. These difficulties led us to determine genetic ancestry
regardless of stock identity. We considered each colony as originating from K
unknown ancestral populations and looked at values of K from 3 to 12 (80 –
check with Amelie) using the FRAPPE software package{Li, 2008 #8220}
{Tang, 2005 #8219}.
Two results were noteworthy. First, at no value of K were we able to
differentiate all stocks. In a few cases a single component predominates,
uniquely distinguishing a stock (MF1 and CFW stocks are examples), but in
general stocks differ in the proportions of common ancestry. This is true of the
most widely used stocks, CD1 and NMRI (Figure 1). Ancestry also confirms
the similarity between ICR and CD1, essentially the same stocks. Second,
there is considerable variation within a stock, which is largely explained by
variation between colonies, as shown for example by the varying proportions
of colour in the CD1 and NMRI stocks (Figure 1).
One likely contribution to variation is from population structure within
the colonies. We looked for evidence of this using multi-dimensional scaling of
IBS pairwise distance matrices {Li, 2008 #8222}. Supp Figure X shows results
for all populations; representative examples are shown in Figure 2. We found
two or more clusters in eighteen populations.
Finally we looked at allele frequency fluctuation over time, which is
expected to occur due to unintended directional selection and random genetic
drift. Results obtained from Hsd:MF1 animals used in 2003 were strikingly
different from those purchased in 2007: heterozygosity fell from 30% to 5%
and the inbreeding coefficient rose from 3 to more than 30. We discovered
that due to infection the colony had been reformed from a small number of
re-derived founders, thereby introducing a severe population bottleneck and
explaining the changes in genetic architecture. However such drastic changes
are unusual. We surveyed five more colonies, at least one year after our initial
analysis and found good agreement between heterozygosity, relatedness and
inbreeding (Table 4) measured on the two occasions.
QTL Mapping resolution
We assessed mapping resolution at the four test loci by the LD decay radius,
defined as the mean physical separation in base pairs (bp) between SNPs at
which the squared correlation coefficient (R2) drops below 0.5. Figure 3
shows results for all populations analysed (there were insufficient genotypes
to calculate LD for NTac:NIHBS-US and ClrHli:CD1-IL). Populations suitable
for high-resolution mapping should have low LD decay radius and high mean
MAF.
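One way to compute an LD decay radius of this kind is sketched below. It is an illustration under stated assumptions, not the procedure used for Figure 3: R2 is taken as the squared Pearson correlation between unphased 0/1/2 genotype vectors, marker pairs are grouped into fixed-width distance bins (the `bin_size` parameter is our own device), and the radius is reported as the midpoint of the first bin whose mean R2 falls below the threshold. All function names are hypothetical.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 coding)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a monomorphic marker carries no LD information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def ld_decay_radius(positions, genotypes, bin_size, threshold=0.5):
    """Midpoint (bp) of the first distance bin whose mean R2 is below threshold.

    positions: base-pair coordinate of each marker
    genotypes: one genotype vector per marker, animals in the same order
    Returns None if mean R2 never drops below the threshold.
    """
    bins = {}
    for i, j in combinations(range(len(positions)), 2):
        r2 = r_squared(genotypes[i], genotypes[j])
        if r2 is None:
            continue
        index = abs(positions[j] - positions[i]) // bin_size
        bins.setdefault(index, []).append(r2)
    for index in sorted(bins):
        values = bins[index]
        if sum(values) / len(values) < threshold:
            return (index + 0.5) * bin_size
    return None
```

Monomorphic markers are skipped, which is one reason a nearly inbred colony can leave too few informative pairs for the calculation.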
Average figures of LD decay mask variation between regions. For
example HsdWin:NMRI-NL has a mean LD decay radius of just over 1, but it
will be of little use mapping the MHC region where LD is extensive. However
a region with high LD in one population may have low LD in another. This
locus to locus variation means that no single population is ideal and that
genome-wide haplotype maps are needed. Therefore we explored genome-wide
variation in LD in six colonies, chosen to cover a range of mean LD
decay measures. After genotyping using the mouse diversity array {Yang,
2009 #8223} and discarding non-polymorphic markers, haplotype blocks were
estimated using PLINK {Purcell, 2007 #7230}, which implements the same
block finding algorithm found in HAPLOVIEW {Barrett, 2005 #6834}.
Measures of relatedness and inbreeding agreed with those obtained from the
single locus analyses (table 3).
Over the genome, mean block length varied between the six colonies:
Crl:CFW.SW-US 403.9 Kb (standard deviation (sd) 570.9), Crl:NMRI.Han-FR
39.53 Kb (sd 58.7), Hsd:ICR.CD1-FR 51.1 Kb (sd 79.5), HsdWin:CFW-NL
440.1 Kb (sd 573.8), HsdWin:NMRI-NL 374.5 Kb (sd 525.5) and
RjHan:NMRI-FR 264.0 Kb (sd 398.0). As expected, LD varied considerably
across the genome and we present the findings for each chromosome at
http://www.well.ox.ac.uk/mouse/outbreds/haploview.
Haplotypes in commercial outbreds are found in laboratory strains
We estimated the contribution of each inbred strain to each stock’s genetic
architecture by reconstructing the genome of each mouse as a probabilistic
mosaic of the founders using a hidden Markov model {Mott, 2000 #5686}. We
used the Perlegen NIEHS genotypes {Frazer, 2007 #7202} as a reference set
of 15 inbred founders and analysed all stocks at the four loci (figure 4a) and
performed genome-wide analyses in a subset of colonies (figure 4b).
While there is considerable variation between colonies two general
patterns are clear in both locus-specific and genome-wide analyses. First, in
all colonies the fraction accounted for by classical inbred strains ranges from
42% (the NIHS colonies) to 80% (most ICR/CD1). Averaged across all
colonies and over the four loci, most inbred strains contribute between 3-8%
of the haplotype fraction, whilst 129, FVB and NOD contribute 12-14%.
Second the wild-derived strains (WSB, CAST, FVB, MOLF) contribute the
least (3-5%). The NIHS stocks contain the highest contribution of the Swiss
mouse FVB (25-35%). NMRI are 15-20% FVB and 15% 129, CD1 about 15%
FVB and MF1 only 5%. The CFW stocks all contain about 15% FVB.
Sequence analysis and novel variants
Probabilistic ancestral haplotype reconstruction assumes that the haplotypes
of the progenitors are identical to those of the outbreds. We used two
methods to determine whether this was true.
First, we used PCR to amplify 22 fragments of about 1.2 Kb (see Supp
Table xxx for primer information). We randomly selected eight regions from a
5 Mb QTL region we previously mapped on mouse chromosome 1 (REF), four
regions from each of three loci involved in HDL, CD4 and MCV traits (REF) and
two regions from the AKP2 locus. We sequenced 12 animals from three
populations (HsdWin:CFW-1 NL HNL1, Crl:CFW US K71 and HsdWin:NMRI
NL HNL1), 12 wild mice (DNA provided to us by Alexandre Reymond,
University of Lausanne) and 10 classical inbred strains (A/J, AKR/J, BALB/cJ,
C3H/HeJ, C57BL/6J, CBA/J, DBA/2J, LP/J, I/LnJ and RIII/DmMobJ).
We discovered 120 SNPs (see Supp Table xx for detailed information).
Wild mice have an average of one SNP every 200 bp but this rate varies
between colonies: HsdWin:CFW-1 and Crl:CFW have a frequency of one SNP
every 350 bp, whereas HsdWin:NMRI has on average one SNP every
520 bp. Nine of the SNPs are coding variants (table ). We found 3 novel
variants (giving a rate of 2.5%) in Crl:CFW (positioned on chr1:173306046,
chr1:173368101 and chr17:34785468) and only one (rate 0.8%) in each of
HsdWin:CFW-1 and HsdWin:NMRI (chr17:34785468). Our locus-specific
sequencing data suggest that HsdWin:CFW-1 is related to the wild-derived inbred
strain PWK whereas Crl:CFW and HsdWin:NMRI are related to Swiss-derived
inbred strains (e.g. NOD and FVB).
Second we used next generation sequencing to estimate genome wide
rates of novel SNPs. We took two approaches, sequencing at tenfold
coverage DNA from four mice from one colony () and restriction enzyme
digest enrichment.
[ FASTERIS RESULTS]
QTL mapping
The implication from haplotype reconstruction and sequence analyses is
that colonies are descended from a common set of progenitors. Consequently
many of the same alleles, though differing in frequency, will contribute to
phenotypic variation in different colonies. We directly investigated this
hypothesis by mapping QTLs contributing to variation in four phenotypes
(serum alkaline phosphatase (ALP), the ratio of CD4+ to CD8+ T-cells,
concentration of high-density lipoproteins (HDL) in serum and mean red cell
volume) in three populations (Crl:CFW (USA), HsdWin:CFW (Netherlands)
and HsdWin:NMRI (Netherlands)). We tested this with a joint analysis in which
QTLs were mapped simultaneously in the three stocks. This showed that a
model assuming a single trait effect for each founder strain, independent of
stock, fitted the data as well as a model in which each stock had independent
effects. [ RICHARD ]
Finally we directly investigated the extent to which variation in allele
frequency and in LD affects mapping resolution. We analysed the data by
ANOVA at each marker (single marker analysis). Applying a conservative
Bonferroni correction for testing 351 markers for four phenotypes in three
populations gives a logP threshold of 4.93, which, as Figure 5 shows, is exceeded
over a 1 Mb interval on chromosome 4 for ALP, a 0.5 Mb region on
chromosome 1 for HDL and a 2 Mb region on chromosome 17 for
CD4/CD8 ratio. Figure X shows that QTLs are detected in different
populations: ALP in Crl:CFW (with less significant evidence for association in
HsdWin:NMRI); HDL in HsdWin:CFW; CD4/CD8 in both Crl:CFW and
HsdWin:CFW.
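The Bonferroni threshold can be reproduced with a short calculation. The sketch assumes a family-wise error rate of 0.05, which is not stated in the text but yields the quoted value; the function name is hypothetical.

```python
import math


def bonferroni_logp(alpha, n_tests):
    """-log10 of the per-test P-value needed for family-wise error rate alpha."""
    return -math.log10(alpha / n_tests)


# 351 markers x 4 phenotypes x 3 populations = 4212 tests
n_tests = 351 * 4 * 3
threshold = bonferroni_logp(0.05, n_tests)  # alpha = 0.05 is an assumption
print(round(threshold, 2))  # 4.93
```

Because markers in LD are correlated, the number of independent tests is smaller than 4212, which is why this correction is conservative.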
We determined the most likely position of the QTL by resample model
averaging, a procedure developed in our analysis of the HS {Valdar, 2009
#7988}. We determined the performance and resolution of the method by
simulating a QTL at each polymorphic marker in the three regions and in all
populations. As expected, confidence intervals depended on the location of
QTL within a region of high LD, and varied from less than 100 Kb to more than
2 Mb (examples are given in Supplemental Figure X).
We found no evidence of multiple effects at these loci (as indicated by
the logP of second and subsequent rounds of forward selection falling below
significance thresholds). The ALP locus remains diffusely spread over a 1
megabase region in both the Crl:CFW and HsdWin:NMRI populations.
However much higher resolution is seen for mapping CD4/CD8 ratio and HDL
where the 95% confidence interval (from simulation) is less than 200 Kb in
the vicinity of the QTL. Figure X plots the position of the most significant locus
identified by forward selection and indicates the LD structure of the region
above the plots (where red circles are R2 of 1).
Characterization of the molecular basis of CD4/CD8 – h2ealpha is
within the location we have identified chr17:34,421,575-34,579,223
Characterization of the molecular basis of HD.
Discussion
Commercially available outbred mice are used primarily by the
pharmaceutical industry for toxicology testing, on the assumption that they
model outbred human populations, a view supported by limited genetic
surveys {Rice, 1980 #263}. In fact very little is known about their genetic
architecture and assumptions about the combined effects of fluctuating allele
frequencies (due to genetic drift) and lack of genetic quality control have led
some to argue against their use in genetic investigations {Chia, 2005 #8130;
Festing, 1999 #8134}. Our catalogue of the genetic structure of commercially
available stocks makes a systematic evaluation possible for the first time. We
have established three important features.
First, variation between colonies is large. Fst, a measure of variation
within and between populations, is 0.454 (in contrast, values for human
populations are typically less than 0.05 {Reich, 2009 #8148}). The source of this
variation is not straightforward. Stock names (such as NMRI or CD1) do not
account for it, nor does the supplier, nor the country. While we can show that
some stocks, such as TO and MF1, do indeed have a unique genetic
ancestry, many do not.
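For readers unfamiliar with Fst, the following sketch shows one textbook form of the statistic for a single biallelic marker, Fst = (HT - HS) / HT, where HT is the expected heterozygosity of the pooled population and HS is the mean expected heterozygosity within colonies. It assumes equal weighting of colonies and is not necessarily the estimator used to obtain the value of 0.454; the function name is hypothetical.

```python
def fst(freqs):
    """Fst at one biallelic marker from per-colony allele frequencies.

    freqs: frequency of the same allele in each colony (equal weights assumed)
    Returns None when the marker is monomorphic across all colonies.
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_total = 2 * p_bar * (1 - p_bar)                      # pooled expected heterozygosity
    if h_total == 0:
        return None
    h_within = sum(2 * p * (1 - p) for p in freqs) / k     # mean within-colony value
    return (h_total - h_within) / h_total
```

Two colonies fixed for alternative alleles give Fst = 1, and colonies with identical allele frequencies give Fst = 0, so a value near 0.45 indicates that a large share of the variation lies between colonies.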
To a large extent variation is colony specific. Mouse colonies are often
believed to behave very much like finite island populations. If so, then
except for imposed bottlenecks (as happened with the MF1) or the forcible
introduction of new alleles (as happens with breeding schemes like IGS that
introduce large unrecombined chunks of the genome), genetic variation will
depend on the effective population size (Ne): assuming random mating, the
time required for a neutral allele to go to fixation in a population, and hence to
reduce heterozygosity, is approximately four times Ne generations. Given that so
many colonies are maintained with effective population sizes of many
thousands, colony genetic architecture should be stable. Consistent with this
view, our analyses of five colonies over two years found little evidence for
changes in allele frequencies and LD values.
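The dependence of stability on Ne can be made concrete with the standard drift expectation for a closed, randomly mating population, H_t = H_0 (1 - 1/(2 Ne))^t. The sketch below is an idealized illustration only (no selection, mutation or migration), and the function names and example values are our own.

```python
import math


def expected_heterozygosity(h0, ne, generations):
    """Expected heterozygosity after drift: H_t = H_0 * (1 - 1/(2*Ne))**t."""
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** generations


def generations_to_halve(ne):
    """Generations needed for expected heterozygosity to fall by half."""
    return math.log(0.5) / math.log(1.0 - 1.0 / (2.0 * ne))
```

Under these assumptions a colony with Ne of 1,000 would need well over a thousand generations to lose half its heterozygosity, whereas a colony passed through a bottleneck of Ne = 10 would lose half in about 14 generations.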
Second, the number of alleles segregating in colonies is relatively
limited (compared to a wild population). Almost all of the genetic variants can
be found in classical laboratory strains. Both locus-specific and genome-wide
sequencing support this conclusion, and haplotype reconstruction
demonstrates how variants in the outbreds can be modeled as descending
from inbred progenitors.
Third, in terms of mapping resolution, no mouse colony is comparable
to a human population. Using an LD criterion, the best mapping resolution in
any colony is at least twofold coarser than that obtainable in human
populations. Applying the
same definition of a haplotype LD block as used in human LD studies, we find
average block size varies between colonies from 40 to 400 Kb. By contrast in
African populations average block length is 9 Kb, and 18 Kb in European
populations {Gabriel, 2002 #6159}.
These observations have important implications for the use of
commercial outbreds for genetic mapping. The extent of LD means that
genome-wide coverage can be obtained with fewer SNPs (about 200,000, and
fewer for colonies with larger blocks) than in human populations, but resolution
may fall short of gene level in many parts of the genome. This means that
high resolution mapping of a locus may be possible in one colony, but not in
another – no single colony is ideal. We have shown this for the MHC region
on chromosome 17, where high-resolution mapping was possible in the
HsdWin:CFW but not in Crl:CFW stocks.
However, as we have repeatedly emphasized, mapping resolution is
not the only useful measure of a colony’s suitability for GWAS. Another
critical measure is allele frequency. Large numbers of rare variants
contributing to phenotypic variation in a population will make the trait difficult
to map using standard GWAS designs. Here our data reveal a favorable
situation: QTL mapping assuming a common set of founder strains shows that
the QTLs replicate between stocks in a consistent manner. These findings
suggest that quantitative differences in allele frequencies, rather than the
existence of private alleles, are responsible for the population differences.
Furthermore, the limited sequence diversity means it is possible to impute the
sequence of any commercially available mouse from a dense SNP map. Thus
the full catalogue of sequence variation in a stock could be obtained by
sequencing the inbred strains presumed to be founders for it, and genotyping
the stock at a skeleton of SNPs. Therefore we should be able to detect the
effect of all variants, a situation that has so far eluded studies in completely
outbred populations.
Our catalogue of the genetic structure of commercially available stocks,
the first of its kind, makes it possible to rank colonies according to their utility
for genetic mapping. Combined with exclusions on the basis of poor genetic
structure, we have identified 35 populations that have properties conducive to
high-resolution mapping. These 35 populations appear to be substantially
superior to currently available resources for high-resolution mapping. LD is for
example lower than in the HS or collaborative cross animals. Our results now
make it possible for geneticists to make informed choices on the use of the
stocks and to use them for GWAS studies of complex traits in mice.
[By mapping in different colonies we identified a deletion in the
promoter of h2-ealpha as the molecular change that contributes to variation in
CD4/CD8 levels. This locus has recently been identified in humans and the
homologous gene is therefore a prime candidate (). Furthermore, our work on
the HDL locus has identified two previously unsuspected candidates (Slams
and CD48)].
Acknowledgements
METHODS
Mice
Genotyping
Sequencing
Phenotyping
We analysed 200 animals from three colonies: Crl:CFW (USA), HsdWin:CFW
(Netherlands) and HsdWin:NMRI (Netherlands). Blood samples were taken
from a tail vein and we performed assays for serum alkaline phosphatase
(ALP), the ratio of CD4+ to CD8+ T-cells, concentration of high-density
lipoproteins (HDL) in serum and mean red cell volume.
LD
Genetic mapping
Where necessary, phenotypes are transformed into Gaussian deviates.
Covariates (such as gender, age, experimenter, time) that explain a significant
fraction of each phenotype’s variance with ANOVA P-value<0.01 are included
in subsequent statistical analyses. We use two mapping methods: a single
point analysis of variance of each marker and a multi-point method.
Haplotypes are reconstructed as mosaics of known inbred strains using a
dynamic programming algorithm that minimises the number of breakpoints
required {Yalcin, 2004 #32}. These strains are used as progenitors for the
multipoint analysis (probabilistic ancestral haplotype reconstruction in the
HAPPY package) {Mott, 2000 #96}. Region-wide significance levels are
estimated by permuting the transformed phenotype values 1,000 times.
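The idea of reconstructing a haplotype as a minimum-breakpoint mosaic can be sketched as follows. This is an illustrative dynamic programme, not the published implementation: it assumes phased haplotypes given as strings of alleles, requires an exact allele match at every marker, and ignores missing data and genotyping error. The function name and the toy strains are hypothetical.

```python
import math


def min_breakpoint_mosaic(target, strains):
    """Copy `target` from the given strains using the fewest breakpoints.

    target: sequence of alleles along one haplotype
    strains: dict mapping strain name to an allele sequence of the same length
    Returns (number_of_breakpoints, list_of_strain_names_per_marker),
    or None if some marker cannot be matched by any strain.
    """
    names = list(strains)
    # cost[s] = fewest breakpoints for a mosaic that ends in strain s
    cost = {s: (0 if strains[s][0] == target[0] else math.inf) for s in names}
    back = []
    for i in range(1, len(target)):
        best_prev = min(names, key=lambda s: cost[s])
        new_cost, pointers = {}, {}
        for s in names:
            if strains[s][i] != target[i]:
                new_cost[s], pointers[s] = math.inf, None
                continue
            stay = cost[s]                  # keep copying from the same strain
            switch = cost[best_prev] + 1    # introduce one breakpoint
            if stay <= switch:
                new_cost[s], pointers[s] = stay, s
            else:
                new_cost[s], pointers[s] = switch, best_prev
        cost = new_cost
        back.append(pointers)
    end = min(names, key=lambda s: cost[s])
    if cost[end] == math.inf:
        return None
    path = [end]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return cost[end], path


# Toy example with three hypothetical strains
result = min_breakpoint_mosaic("AABBA", {"X": "AAAAA", "Y": "BBBBB", "Z": "AABBB"})
print(result)  # (1, ['Z', 'Z', 'Z', 'Z', 'X'])
```

A haplotype carrying an allele absent from every reference strain has no valid mosaic under this exact-match rule, which is one simple way a novel variant would reveal itself.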
TABLES
Table 1 – Mouse providers, location, breeding protocols, health status
Table 2 – Genetic characteristics of outbred mouse colonies
Population | No. | % genotyped | % homozygote | Het. | Pct MAF < 5% | Pct fail HWE | Mean inbreeding coef | Haplotypes | LD decay radius | Mean MAF
Aai:ICR-US | 24 | 88.83 | 75.92 | 0.08 | 6.80 | 2.27 | 2.76 | 2.21 | 1.88 | 0.026
BK:W_UK | 48 | 92.17 | 87.25 | 0.04 | 3.12 | 2.27 | 8.78 | 0.63 | 1.12 | 0.024
BomTac:NMRI-DK151 | 23 | 91.98 | 65.16 | 0.16 | 2.83 | 1.70 | -5.68 | 2.61 | 1.07 | 0.068
BomTac:NMRI-DK160 | 24 | 93.11 | 65.72 | 0.15 | 3.97 | 1.98 | 4.57 | 2.29 | 0.87 | 0.075
CC | 109 | 89.17 | 5.38 | 0.19 | 2.83 | 89.24 | 67.28 | 3.61 | 2.78 | 0.254
ClrHli:CD1_IL | 20 | 94.65 | 93.20 | 0.01 | 2.83 | 0.57 | -16.50 | 0.95 | NA | 0.008
Crl:CD1.ICR_UK | 48 | 93.20 | 30.88 | 0.27 | 13.88 | 3.97 | 4.40 | 4.04 | 1.00 | 0.126
Crl:CD1.ICR-US_iso | 30 | 97.37 | 37.96 | 0.24 | 11.90 | 4.25 | 13.73 | 3.67 | 1.37 | 0.152
Crl:CD1(ICR)-DE | 48 | 94.07 | 40.51 | 0.19 | 18.98 | 7.08 | 10.26 | 2.48 | 1.24 | 0.090
Crl:CD1(ICR)-FR | 48 | 94.26 | 32.01 | 0.28 | 15.01 | 4.53 | 6.00 | 2.46 | 0.73 | 0.133
Crl:CD1(ICR)-IT | 48 | 95.15 | 33.71 | 0.31 | 13.31 | 5.38 | 4.70 | 4.40 | 0.76 | 0.161
Crl:CD1(ICR)-US_C61 | 24 | 96.81 | 31.44 | 0.30 | 12.75 | 2.27 | 0.68 | 7.58 | 1.18 | 0.114
Crl:CD1(ICR)-US_H43 | 24 | 96.07 | 36.54 | 0.29 | 9.92 | 3.97 | 6.00 | 4.08 | 0.89 | 0.130
Crl:CD1(ICR)-US_H48 | 24 | 95.88 | 37.68 | 0.30 | 1.70 | 2.55 | -4.18 | 4.88 | 1.46 | 0.103
Crl:CD1(ICR)-US_K64 | 48 | 93.91 | 29.46 | 0.30 | 14.16 | 5.38 | -1.41 | 2.06 | 0.84 | 0.075
Crl:CD1(ICR)-US_K95 | 24 | 97.14 | 44.19 | 0.28 | 3.12 | 2.27 | -10.45 | 7.54 | 1.06 | 0.136
Crl:CD1(ICR)-US_P10 | 24 | 96.41 | 42.21 | 0.22 | 15.58 | 1.98 | 1.56 | 4.29 | 1.08 | 0.100
Crl:CD1(ICR)-US_R16 | 24 | 96.86 | 38.24 | 0.35 | 3.40 | 2.83 | -12.10 | 3.21 | 1.22 | 0.085
Crl:CF1-US | 48 | 94.92 | 25.50 | 0.35 | 4.82 | 6.80 | 10.04 | 4.90 | 2.37 | 0.194
Crl:CFW(SW)-US_K71 | 48 | 94.25 | 41.36 | 0.26 | 4.25 | 4.53 | 6.28 | 3.60 | 0.86 | 0.084
Crl:CFW(SW)-US_P08 | 48 | 91.27 | 29.18 | 0.22 | 24.36 | 0.00 | 4.65 | 1.85 | 1.65 | 0.068
Crl:MF1_UK | 47 | 93.04 | 64.87 | 0.13 | 1.13 | 1.13 | -2.06 | 2.21 | 4.06 | 0.053
Crl:NMRI(Han)-DE | 48 | 94.74 | 39.94 | 0.27 | 11.61 | 4.82 | 1.93 | 3.56 | 1.11 | 0.128
Crl:NMRI(Han)-FR | 48 | 85.44 | 37.39 | 0.26 | 5.67 | 6.23 | 12.01 | 3.58 | 1.21 | 0.139
Crl:NMRI(Han)-HU | 48 | 90.37 | 39.66 | 0.26 | 8.22 | 6.52 | 0.43 | 3.77 | 1.07 | 0.120
Crl:OF1-FR_B22 | 24 | 91.89 | 26.63 | 0.35 | 6.80 | 6.80 | -5.27 | 6.00 | 2.04 | 0.168
Crl:OF1-FR_B41 | 24 | 93.77 | 27.76 | 0.35 | 9.07 | 6.80 | -7.98 | 5.67 | 2.36 | 0.161
Crl:OF1-HU | 50 | 92.54 | 28.05 | 0.35 | 5.10 | 6.80 | -1.35 | 4.80 | 2.27 | 0.162
Crlj:CD1(ICR)-JP | 48 | 94.79 | 41.93 | 0.21 | 8.22 | 7.08 | 4.61 | 3.54 | 1.34 | 0.073
HanRcc:NMRI-CH | 48 | 94.17 | 66.29 | 0.20 | 1.98 | 1.98 | -11.67 | 3.40 | 1.47 | 0.102
Hla:(ICR)CVF-US | 48 | 83.42 | 49.29 | 0.21 | 12.46 | 4.82 | -3.13 | 6.33 | 0.79 | 0.098
HS | 12 | 90.44 | 21.81 | 0.43 | 0.57 | 2.83 | -3.88 | 1.77 | 2.03 | 0.207
Hsd:ICR(CD-1)-DE | 53 | 89.89 | 47.03 | 0.29 | 4.25 | 5.10 | 2.13 | 2.94 | 1.08 | 0.153
Hsd:ICR(CD-1)-ES | 48 | 88.56 | 46.46 | 0.26 | 7.37 | 5.38 | 3.49 | 3.08 | 1.49 | 0.147
Hsd:ICR(CD-1)-FR | 64 | 93.52 | 45.04 | 0.28 | 5.10 | 5.38 | 5.60 | 3.27 | 0.99 | 0.155
Hsd:ICR(CD-1)-IL | 48 | 86.08 | 43.91 | 0.29 | 6.23 | 3.68 | -6.55 | 3.44 | 1.34 | 0.143
Hsd:ICR(CD-1)-IT | 48 | 88.94 | 47.03 | 0.28 | 2.83 | 4.82 | 7.52 | 3.40 | 1.07 | 0.162
Hsd:ICR(CD-1)-MX | 48 | 91.28 | 47.88 | 0.30 | 5.10 | 13.60 | -11.34 | 2.50 | 1.07 | 0.153
Hsd:ICR(CD-1)-UK | 48 | 92.96 | 46.18 | 0.28 | 5.95 | 3.97 | -0.34 | 3.02 | 1.24 | 0.147
Hsd:ICR(CD-1)-US | 48 | 95.99 | 48.16 | 0.28 | 6.80 | 5.38 | 4.36 | 2.50 | 1.05 | 0.149
Hsd:ND4-US | 48 | 93.68 | 69.97 | 0.07 | 17.00 | 2.27 | 4.89 | 2.06 | 1.79 | 0.036
Hsd:NIHS_UK_C | 15 | 93.75 | 68.56 | 0.11 | 10.48 | 1.70 | 6.36 | 4.87 | 1.02 | 0.055
Hsd:NIHS_UK_G | 33 | 92.63 | 75.07 | 0.11 | 3.40 | 3.12 | -5.09 | 4.76 | 2.04 | 0.084
Hsd:NIHS-US | 48 | 92.11 | 54.67 | 0.19 | 6.52 | 9.92 | -18.01 | 0.63 | 2.45 | 0.011
Hsd:NIHSBC_IL | 12 | 91.64 | 90.93 | 0.02 | 1.42 | 0.57 | 3.11 | 4.08 | 1.05 | 0.047
Hsd:NSA(CF1)-US | 48 | 93.30 | 30.88 | 0.34 | 12.18 | 11.61 | 1.90 | 5.04 | 1.30 | 0.160
HsdHu:SABRA_IL | 48 | 91.97 | 45.04 | 0.22 | 5.67 | 22.38 | 25.44 | 2.98 | 2.55 | 0.146
HsdIco:OF1-IT | 48 | 90.48 | 30.31 | 0.34 | 2.27 | 13.60 | 5.22 | 4.77 | 1.82 | 0.187
HsdOla:MF1_IL | 8 | 90.51 | 50.42 | 0.21 | 0.00 | 1.70 | 21.38 | 6.00 | 3.38 | 0.141
HsdOla:MF1_UK_G | 56 | 93.90 | 41.08 | 0.28 | 7.37 | 3.40 | -0.65 | 2.89 | 3.14 | 0.132
HsdOla:MF1-UK_C | 184 | 72.71 | 26.06 | 0.21 | 10.20 | 4.25 | 5.31 | 1.64 | 3.18 | 0.132
HsdOla:MF1US_202A_iso | 24 | 93.87 | 75.35 | 0.13 | 1.70 | 0.85 | -6.90 | 2.63 | 0.53 | 0.061
HsdOla:MF1US_202A_prod | 24 | 94.76 | 75.35 | 0.13 | 1.13 | 0.85 | -9.21 | 2.54 | 2.38 | 0.061
HsdOla:TO_UK | 48 | 93.63 | 71.10 | 0.10 | 4.25 | 3.68 | 9.47 | 1.85 | 2.84 | 0.049
HsdWin:CFW1-DE | 48 | 87.64 | 49.01 | 0.24 | 9.92 | 7.93 | -0.88 | 3.42 | 1.51 | 0.127
HsdWin:CFW1-NL | 48 | 82.99 | 51.84 | 0.21 | 7.93 | 4.82 | 3.62 | 3.63 | 0.89 | 0.112
HsdWin:NMRI_UK | 32 | 93.92 | 62.89 | 0.12 | 15.58 | 1.70 | -4.89 | 1.25 | 1.51 | 0.049
HsdWin:NMRI-DE | 48 | 90.78 | 58.07 | 0.20 | 6.80 | 2.27 | -8.87 | 1.85 | 1.10 | 0.098
HsdWin:NMRI-NL | 64 | 93.96 | 57.79 | 0.19 | 5.95 | 3.12 | 2.11 | 1.86 | 1.04 | 0.099
IcrTac:ICR-US | 36 | 89.28 | 69.69 | 0.06 | 13.31 | 2.55 | 5.40 | 1.50 | 1.92 | 0.013
Inbreds_94_strains | 94 | 91.25 | 0.00 | 0.00 | 2.83 | 98.58 | 100.00 | 2.81 | 2.32 | 0.326
NTac:NIHBS-US | 36 | 91.71 | 93.77 | 0.01 | 1.98 | 0.57 | -53.44 | 0.39 | |
RjHan:NMRI-FR | 48 | 92.58 | 31.16 | 0.28 | 14.45 | 13.60 | 17.80 | 4.00 | 1.00 | 0.132
RjOrl:Swiss-FR | 48 | 91.68 | 64.87 | 0.17 | 1.70 | 3.40 | -9.22 | 2.10 | 0.88 | 0.078
Sca:NMRI_SE_22 | 24 | 80.63 | 75.07 | 0.09 | 3.97 | 3.12 | 15.16 | 2.92 | 1.09 | 0.047
Sca:NMRI_SE-10an | 24 | 75.51 | 70.82 | 0.09 | 5.38 | 5.38 | 22.31 | 2.17 | 1.10 | 0.054
Sim:(SW)fBR-US_A1 | 48 | 94.56 | 74.50 | 0.10 | 5.67 | 3.68 | 12.43 | 1.96 | 3.02 | 0.056
Sim:(SW)fBR-US_B1 | 24 | 95.82 | 79.60 | 0.11 | 1.42 | 1.13 | -7.87 | 2.58 | 3.05 | 0.050
Tac:SW-US | 36 | 92.67 | 46.18 | 0.33 | 1.98 | 3.97 | -2.00 | 3.33 | 1.30 | 0.159
Wild_Arizona
96
85.77
17.85
0.26
13.31
38.81
27.86
7.64
0.38
0.169
NA
0.003
Table 3: Whole genome analyses
Population | No. | Markers | Genos. | Hom. | Het. | MAF | HWE | Inbreed coef
Crl:CFW(SW)-US_P08 | 22 | 169,333 | 97.30 | 71.06 | 0.19 | 8.00 | 6.36 | -20.86
HsdWin:CFW1-NL | 22 | 152,716 | 97.17 | 74.55 | 0.18 | 4.98 | 7.15 | -20.70
HsdWin:NMRI-NL | 26 | 164,287 | 97.41 | 73.02 | 0.13 | 4.51 | 7.23 | -18.33
Hsd:ICR(CD-1)-FR | 20 | 623,124 | 87.24 | 45.19 | 0.22 | 10.50 | 1.53 | -11.82
RjHan:NMRI-FR | 13 | 171,198 | 96.49 | 63.33 | 0.18 | 4.69 | 7.59 | -10.62
Crl:NMRI(Han)-FR | 20 | 623,124 | 87.04 | 38.09 | 0.24 | 11.09 | 4.55 | 3.14
Table 4: Temporal variation
Population | Month | Year | No. | Het. | Pct MAF < 5% | Pct fail HWE | Mean inbreeding coef
Crl:CD1.ICR_US_K64 | Nov | 2007 | 48 | 0.300 | 14.16 | 5.38 | -1.41
Crl:CD1.ICR_US_K64 | Aug | 2009 | 24 | 0.322 | 4.25 | 1.98 | -5.33
Crl:CFW.SW_US_P08 | June | 2008 | 206 | 0.216 | 24.36 | 0.00 | 4.65
Crl:CFW.SW_US_P08 | Oct | 2009 | 36 | 0.254 | 11.33 | 2.83 | -5.29
HsdIco:OF1_IT | Nov | 2007 | 48 | 0.343 | 2.27 | 13.60 | 5.22
HsdIco:OF1_IT | Feb | 2008 | 48 | 0.357 | 9.63 | 9.07 | -3.73
HsdOla:MF1_UK | | 2003 | 52 | 0.297 | 2.27 | 1.98 | 3.34
HsdOla:MF1_UK | | | 192 | 0.051 | 1.98 | 31.16 | 31.20
HsdWin:CFW1_NL | Nov | 2007 | 48 | 0.205 | 7.93 | 4.82 | 3.62
HsdWin:CFW1_NL | Aug | 2008 | 234 | 0.204 | 12.18 | 0.00 | 10.19
HsdWin:NMRI_NL | Aug | 2007 | 64 | 0.191 | 5.95 | 3.12 | 2.11
HsdWin:NMRI_NL | Aug | 2008 | 200 | 0.190 | 8.50 | 0.00 | 0.29
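The per-colony summary columns used in the tables (percentage genotyped, heterozygosity, percentage of markers with MAF below 5%, inbreeding coefficient) can be illustrated with a minimal sketch. This is not the authors' pipeline: the genotype coding (0/1/2 alternate-allele counts, None for a missing call), the function names, and the definition of the inbreeding coefficient as 100 × (1 − observed/expected heterozygosity) are all assumptions made here for illustration.

```python
from typing import Dict, List, Optional

# Assumed coding: alternate-allele count (0, 1 or 2); None marks a missing call.
Genotype = Optional[int]


def marker_stats(genotypes: List[Genotype]) -> Optional[Dict[str, float]]:
    """Summarise one biallelic marker across the animals of one colony."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # marker failed in every animal
    n = len(called)
    p = sum(called) / (2 * n)  # alternate-allele frequency
    return {
        "maf": min(p, 1 - p),
        "het_obs": sum(1 for g in called if g == 1) / n,
        "het_exp": 2 * p * (1 - p),  # Hardy-Weinberg expectation
    }


def colony_summary(markers: List[List[Genotype]]) -> Dict[str, float]:
    """Aggregate marker statistics into table-style summary columns."""
    total_calls = sum(len(m) for m in markers)
    made_calls = sum(1 for m in markers for g in m if g is not None)
    stats = [s for s in (marker_stats(m) for m in markers) if s is not None]
    if not stats:
        raise ValueError("no marker has any called genotype")
    sum_obs = sum(s["het_obs"] for s in stats)
    sum_exp = sum(s["het_exp"] for s in stats)
    return {
        "pct_genotyped": 100 * made_calls / total_calls,
        "mean_het": sum_obs / len(stats),
        "pct_maf_below_5": 100 * sum(1 for s in stats if s["maf"] < 0.05) / len(stats),
        # Hypothetical definition: percentage deficit of heterozygotes
        # relative to the Hardy-Weinberg expectation.
        "inbreeding_pct": 100 * (1 - sum_obs / sum_exp) if sum_exp > 0 else float("nan"),
    }


# Toy data: three markers typed in four animals.
example = [[0, 1, 2, 1], [0, 0, 0, 0], [1, 1, None, 1]]
summary = colony_summary(example)
```

Under this definition an excess of heterozygotes gives a negative coefficient, and a fully homozygous but polymorphic panel, such as a set of inbred strains, gives 100.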
Figure 1. Ancestry inferred from the frappe program at K = 9. The length of
each colored bar corresponds to the ancestry coefficient of each mouse, plotted
along the horizontal axis. Mice are labeled by stock name (along the bottom)
and by commercial provider (along the top). Mice of the same colony were
grouped together (giving rise to blocks of common ancestry, as seen for
example to the right of the CD1 cluster) but individual colony labels are omitted.
Figure 2. Multi-dimensional scaling of identity by state pairwise distances,
calculated using PLINK. The figure shows a reduced representation of the
results, plotting the position on the first dimension (horizontal axis) against
position on the second dimension (vertical axis).
Figure 3. Linkage disequilibrium decay radius (black) and minor allele
frequencies (red) in outbred mice. The scale of the vertical axis is megabases
for the decay radius and ten times the value of the mean allele frequency (so
a value of 2 is 0.2).
Figure 4. Proportion of laboratory inbred strain haplotypes found in
commercial outbred stocks. 4a) The proportion for all colonies analysed at four
loci. 4b) Genome-wide analysis for six colonies.
Figure 5. QTL mapping of three phenotypes in three colonies
Figure 6. Resample-based mapping of three phenotypes with LD structure.
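The analysis behind Figure 2 (multi-dimensional scaling of pairwise identity-by-state distances) can be sketched in a few lines. The paper used PLINK; the code below is not PLINK's implementation but a minimal pure-Python illustration of the same idea. The genotype coding (0/1/2), the distance definition (mean allele difference divided by two), the power-iteration eigen-solver and the toy mice are all assumptions made for this example.

```python
import math
from typing import List, Optional, Tuple

Genotype = Optional[int]  # assumed coding: 0, 1 or 2 copies of one allele


def ibs_distance(g1: List[Genotype], g2: List[Genotype]) -> float:
    """Mean per-marker allele difference (0 = identical, 1 = opposite homozygotes)."""
    diffs = [abs(a - b) / 2 for a, b in zip(g1, g2) if a is not None and b is not None]
    return sum(diffs) / len(diffs) if diffs else 0.0


def _top_eigenpair(m: List[List[float]], iters: int = 500) -> Tuple[float, List[float]]:
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    n = len(m)
    v = [float(i + 1) for i in range(n)]  # fixed, non-constant start vector
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract
            return 0.0, [0.0] * n
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * mv[i] for i in range(n)), v


def classical_mds(dist: List[List[float]], dims: int = 2) -> List[List[float]]:
    """Classical (Torgerson) scaling: double-centre squared distances, take top eigenvectors."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        lam, v = _top_eigenpair(b)
        scale = math.sqrt(max(lam, 0.0))  # negative eigenvalues contribute nothing
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass finds the next dimension.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical mice from two colonies, typed at four markers.
mice = {
    "colonyA_1": [0, 0, 0, 0],
    "colonyA_2": [0, 0, 0, 1],
    "colonyB_1": [2, 2, 2, 2],
    "colonyB_2": [2, 2, 2, 1],
}
names = list(mice)
dist = [[ibs_distance(mice[a], mice[b]) for b in names] for a in names]
coords = classical_mds(dist)  # coords[i] = (dimension 1, dimension 2) for names[i]
```

Plotting the first column of `coords` against the second gives the kind of reduced representation described in the caption; in this toy case the first dimension separates the two hypothetical colonies.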
Supplemental Figure: Simulation of resample model averaging
Performance of the RMA method depends on the position of the QTL and the
population analysed. Here the resolution of the RMA (indicated by the
distribution of the black dots) varies according to the position of a simulated
QTL (indicated by dotted red lines).