Download A genome-wide association scan in pig identifies novel regions

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Gene nomenclature wikipedia , lookup

Genomics wikipedia , lookup

Ridge (biology) wikipedia , lookup

Copy-number variation wikipedia , lookup

Polycomb Group Proteins and Cancer wikipedia , lookup

Biology and consumer behaviour wikipedia , lookup

Genetic engineering wikipedia , lookup

Genomic library wikipedia , lookup

Population genetics wikipedia , lookup

Twin study wikipedia , lookup

History of genetic engineering wikipedia , lookup

Minimal genome wikipedia , lookup

Nutriepigenomics wikipedia , lookup

RNA-Seq wikipedia , lookup

Pathogenomics wikipedia , lookup

Gene wikipedia , lookup

Epigenetics of human development wikipedia , lookup

Heritability of IQ wikipedia , lookup

Gene desert wikipedia , lookup

Human genetic variation wikipedia , lookup

Genomic imprinting wikipedia , lookup

X-inactivation wikipedia , lookup

Gene expression programming wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Gene expression profiling wikipedia , lookup

Genome evolution wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Public health genomics wikipedia , lookup

Designer baby wikipedia , lookup

Genome-wide association study wikipedia , lookup

Genome (book) wikipedia , lookup

Microevolution wikipedia , lookup

Molecular Inversion Probe wikipedia , lookup

Tag SNP wikipedia , lookup

SNP genotyping wikipedia , lookup

Quantitative trait locus wikipedia , lookup

Transcript
Published December 2, 2014
A genome-wide association scan
in pig identifies novel regions associated with feed efficiency trait1
Goutam Sahana,*2 Veronika Kadlecová,*† Henrik Hornshøj,* Bjarne Nielsen,‡ and Ole F. Christensen*
*Department of Molecular Biology and Genetics, Aarhus University, Blichers Alle 20, Postboks 50, DK-8830 Tjele,
Denmark; †Department of Animal Husbandry, Czech University of Life Sciences Prague, 165 21 Prague 6-Suchdol, Czech
Republic; and ‡Pig Research Centre, Danish Agriculture and Food Council, Axeltorv 3, DK-1609 Copenhagen V, Denmark
ABSTRACT: Feed conversion ratio (FCR) is an economically important trait in pigs, and feed accounts for
a significant proportion of the costs involved in pig production. In this study we used a high-density SNP chip
panel, Porcine SNP60 BeadChip, to identify the association between FCR and SNP markers and to study the
genetic architecture of the trait. After quality control, a
total of 30,847 SNP that could be mapped to the 18 porcine autosomes (SSC) using the pig genome assembly
10.2 were used in the analyses. Deregressed estimated
breeding value was used as the response variable. A total
of 3,071 Duroc pigs had both FCR data and genotype
data. The linkage disequilibrium (r2) between adjacent
markers was 0.56. Two association mapping approaches were used: a linear mixed model (LMM) based on
single-locus regression analysis and a Bayesian variable
selection approach (BVS). A total of 79 significant (P <
0.0001) SNP associations on 6 chromosomes were identified by LMM analyses. Out of these, 10 SNP crossed
the genome-wide significance threshold. These 10 SNP
were all located on SSC 4 and 14. In the BVS analysis, a
total of 44 SNP located on 12 chromosomes had posterior
probability more than or equal to 0.05 (i.e., Bayes factor
≥ 10). Thirteen SNP were identified by both LMM and
BVS. These 13 SNP were located on 4 chromosomes:
SSC 4, 7, 8, and 14. Hypoxia inducible factor 1, alpha
subunit inhibitor (HIF1AN) and ladybird homeobox 1
(LBX1) are 2 possible candidate genes affecting FCR on
SSC 4 and 14, respectively. The study provides a list of
SNP associated with FCR and also offers valuable information on the genetic architecture and candidate genes
for this trait.
Key words: feed conversion ratio, gene mapping, genome-wide association study, pig
© 2013 American Society of Animal Science. All rights reserved.
INTRODUCTION
Feed conversion ratio (FCR) is an economically
important trait in pigs, and feed accounts for a significant proportion of the costs involved in pig production. Therefore, selection that improves FCR has been
an important part of the breeding goal for commercial
pig breeding (Okine et al., 2004). Genomic selection
has been implemented in pig breeding (Ostersen et al.,
1The work was partly funded by the Danish Ministry of Food,
Agriculture and Fisheries (grant 3405-11-0279). The authors gratefully
acknowledge Luc Janss for help on the use of Bayz software and the
Pig Research Centre, the Danish Agricultural and Food Council, for
the data used in the study.
2Corresponding author: [email protected]
Received July 12, 2012.
Accepted November 29, 2012.
J. Anim. Sci. 2013.91:1041–1050
doi:10.2527/jas2012-5643
2011; Christensen et al., 2012) for improvement of
economic traits including FCR. However, knowledge
of QTL influencing FCR may further improve genetic
selection models by using prior knowledge of chromosomal regions harboring QTL. Several QTL affecting
economic traits in pigs have previously been reported,
primarily on the basis of within-family linkage analysis
using microsatellite markers. The results are listed in
the Pig Quantitative Trait Locus database (pig QTLdb)
located at www.animalgenome.org/cgi-bin/QTLdb/SS/
index (Hu and Reecy, 2007; Hu et al., 2010). However,
availability of the PorcineSNP60 BeadChip (Illumina,
San Diego, CA), which contains more than 62,000 SNP
spread over entire porcine genome, gives us the opportunity to carry out association studies to map chromosomal regions associated with economically important
traits in pigs at the population level.
1041
1042
Sahana et al.
Several QTL for FCR discovered using linkage studies with microsatellite markers have been listed in the
pig QTLdb on various SSC (9, 13, 16, 18, and X). Such
linkage information can be used only for within-family
selection, and the linkage QTL intervals are generally
wide and therefore are not suitable for candidate gene
searches. However, association mapping (also known as
linkage disequilibrium mapping) with dense SNP markers can map QTL segregating in a population, and as it
uses historic recombinations, the QTL intervals become
much narrower than in a linkage study (Risch, 2000).
Furthermore, there are often pronounced disagreements
between results obtained from different genome-wide
association studies in different populations. There could
be multiple possible reasons for these disagreements, for
example, the original discovery was a false positive, the
QTL is segregating in 1 population, or lack of statistical
SRZHU%RORUPDDHWDO7KHUHIRUHFRQ¿UPDWLRQ
of published QTL is necessary in the population under
study before such information is included in genomic
VHOHFWLRQ DV SULRU NQRZOHGJH RU IRU ¿QHPDSSLQJ 47/
with an aim to identify candidate genes that underlie the
QTL. The objective of this study was to identify SNP
associated with FCR in Danish Duroc pigs.
0$7(5,$/6$1'0(7+2'6
Animal Care and Use Committee approval was not
obtained for this study because no animal was handled
in this experiment.
et al. (2011). In brief, the deregression procedure of
Garrick et al. (2009) adjusts for ancestral information,
such that the deregressed EBV only contain their own
information and the information of the descendant on
each animal. Deregressed EBV have unequal variances
and therefore should be used in a weighted analysis. The
weight for the ith animal was estimated using Garrick et
al. (2009) as
1− h2
wi =
⎡⎛ 1 − ri2 ⎞ 2 ⎤
⎢⎜ c + 2 ⎟ h ⎥
ri ⎠ ⎦
⎣⎝
,
where c = the part of the genetic variance not explained
by markers, h2 = the heritability of the trait, and ri2 = the
reliability of the deregressed EBV of the ith animal. The
average reliability of the deregressed EBV was 0.21
(SD = 0.13). The parameter c was assumed to be 0.1,
DQGWKHKHULWDELOLW\IRU)&5ZDVXVLQJ2VWHUVHQHW
al. (2011). Determining the proportion of genetic variance not captured by markers is not straightforward as
it will vary with the trait, the population, and also the
marker density. Christensen et al. (2012) reported that a
proportion in the range of 0.1 to 0.4 is a good assumption. Jensen et al. (2012) reported that 77.2% of the total
additive genetic variance was explained by markers in
cattle (using Bovine SNP50 BeadChip).We have chosen
IRUSDUDPHWHUFWREHFRQVLVWHQWZLWK2VWHUVHQHWDO
(2011). However, the aim of the present study is to map
the major QTL, and therefore, the choice of value for
SDUDPHWHUFZLOOQRWKDYHDPDMRULQÀXHQFHRQWKHFRQclusions of this study.
Animals and Phenotypes
The Duroc pig data were collected as part of the genetic evaluation system in Denmark. All data were supplied by the Pig Research Centre, Danish Agriculture
and Food Council, Axeltorv, Denmark. The pedigree
was traced back to 1984; no animals were imported into
the system, and nearly all great-grandparents of animals
with records were known. There was a total of 371,233
individuals in the pedigree, and a total of 30,091 animals
were recorded for FCR. However, the pedigree connected to the genotyped animals had only 11,146 individuals. The trait analyzed was FCR (feed intake/BW gain)
in the interval 30 to 100 kg, recorded from 1992. The
animals with FCR records were all males, and most litters only had 1 or 2 animals with FCR records. The EBV
for FCR were calculated by single-trait animal model,
ZLWK UHJUHVVLRQ HIIHFW RI VWDUW ZHLJKW ¿[HG HIIHFW RI
herd-week-section, random effect of pen, and a random
DGGLWLYHJHQHWLFHIIHFWDVLQ2VWHUVHQHWDO7KH
phenotype used for association analysis was deregressed
estimated breeding values for FCR. The details of the
HVWLPDWLRQ RI GHUHJUHVVHG (%9 DUH JLYHQ E\ 2VWHUVHQ
SNP Array Genotyping
Marker data were obtained for 3,071 animals selected as those having the largest accuracy for the
EBV (born in 1998 to 2010) using the Illumina Porcine
SNP60 BeadChip (Illumina). The SNP data were editHG XVLQJ WKH VDPH FULWHULD DV LQ 2VWHUVHQ HW DO Each animal had a call rate greater than 0.95, a minor
allele frequency greater than 0.05, a call-frequency
score greater than 0.95, a genCall score greater than
0.65, and a heterozygote frequency that did not deviate
1
from Hardy-Weinberg expectations by more than 4 pq
, where p and q are the allele frequencies at the marker.
The term 14 pq corresponds to a one-fourth SD unit when
assuming a binomial distribution of alleles. A total of
33,756 SNP passed these quality control criteria, among
which 30,847 SNP could be mapped to the pig genome
assembly version 10.2 (SSC10.2; NCBI GenBank ID
GCA_000003025.4) using GMAP alignment software
:X DQG:DWDQDEH 2QO\ WKH 613 ZLWK NQRZQ
genome positions were used for association analyses,
because without the known assembly position, it will not
Association studies for feed conversion ratio in pigs
EHSRVVLEOHWROLQNVLJQL¿FDQWO\DVVRFLDWHG613WRFDQdidate genes. The missing genotypes and the haplotype
phases were determined using the software fastPHASE
(Scheet and Stephens, 2006), and linkage disequilibrium
(r2) between adjacent markers were estimated using the
Haploview software (Barrett et al., 2005).
Statistical Analyses
Association Analyses. The 2 methods used to associate SNP and phenotypic data are described below.
7KH ¿UVW DSSURDFK ZDV D 613E\613 DQDO\VLV ZKHUH
HDFK613ZDV¿WWHGVHSDUDWHO\LQDOLQHDUPL[HGPRGHO
(/00) using Yu et al. (2005). Complex familial relationship is the primary confounding factor in genomewide association studies (GWAS) in livestock population. Linear mixed model analysis, which includes the
relationship among individuals through a polygenic effect, is able to control the false-positives due to family
structure (Yu et al., 2005; Sahana et al., 2010). The second approach we used was Bayesian variable selection
(%96; George and McCulloch, 1993).
Linear Mixed Model. The association between the
SNP and the phenotype was assessed by a single-locus
regression analysis for each SNP separately, using a linear
mixed model (Yu et al., 2005). The model was as follows:
y = 1μ + mg + Zu + e ,
[1]
where \ is the vector FCR phenotypes (deregressed
EBV), 1 is a vector of ones with length equal to the number of observations, μ is the general mean, m is a vector
ZLWK JHQRW\SLF LQGLFDWRUV í RU DVVRFLDWLQJ UHcords to the marker effect, g is a scalar of the associated
additive effect of the SNP, Z is an incidence matrix relating phenotypes to the corresponding random polygenic
effect, u is a vector of the random polygenic effect with
the normal distribution N(0, AT2u ), where A is the additive relationship matrix and σ2u is the polygenic variance, and e is a vector of random environmental deviates
with the normal distribution N (0, W −1σ2e ) , where σe2 is
the error variance and W is the diagonal matrix containing weights of the deregressed estimated breeding valXHV7KHPRGHOZDV¿WWHGE\5(0/XVLQJWKHVRIWZDUH
DMU (Madsen and Jensen, 2011), and testing was done
using a Wald test against a null hypothesis of g = 0.
Bayesian Variable Selection. This method is based
on specifying a mixture distribution for SNP effects, and
DOO613PDUNHUVDUH¿WWHGVLPXOWDQHRXVO\LQWKHPRGHO
(George and McCulloch, 1993). The markers with heavily shrunken effects were placed in 1 group, effectively
removing them from the model, and a second group with
more mildly shrunken effects was used to identify the
most plausible multi-SNP model(s).
1043
It was assumed that most markers had very small
effects on the trait (99.5% of SNP in this analysis) and
only a few markers (0.5%) had large effects. Therefore,
among the total 30,847 SNP, a priori about 154 SNP
were assumed to have an effect on FCR. The allocation
of each SNP to either of these 2 distributions was done
using an indicator variable in Gibbs sampling. The averaged mixture indicator estimated a posterior probability for that SNP to come from the distribution with
large effects, which was interpreted as the probability
for the presence of an associated marker or QTL. The
analysis was performed using Bayz software (Janss,
2011), and the variances of the 2 mixture components
were estimated. This approach can therefore be viewed
as a Bayesian multiple regression model selection
method. The model is as follows:
y = 1μ + Mb + e ,
[2]
where \ȝDQGe are as described using Eq. [1], M is
the design matrix with dimension n × j allocating phenotypes to marker effects (n is the number of animals, j
is the number of markers), E is a vector of marker ef2
2
fects, with b π 0 N (0, σ β 0 I ) + π 1 N (0, σ β 1 I ), where p0
2
2
= 99.5, p1 = 0.05, and σ β 0 = 0.01 × σ β 1 . Similar to Eq.
[1], the variances of residuals were proportional to the
reciprocals of the weights of the deregressed EBV. The
Gibbs sampling chain was run for 50,000 iterations, and
WKH¿UVWLWHUDWLRQVZHUHGLVFDUGHGDVEXUQLQ
6LJQL¿FDQW $VVRFLDWLRQV DQG )'5 The genomeZLGH VLJQL¿FDQW DVVRFLDWLRQ DW WKH VLJQL¿FDQFH
level after Bonferroni multiple testing correction was a
P-value of 1.6 × 10-6, i.e., –log10(P-value) > 5.79. For
/00ZH¿UVWH[DPLQHGWKHQXPEHURI613WKDWKDGJHQRPHZLGHVLJQL¿FDQWDVVRFLDWLRQZLWK)&5+RZHYHU
the tests conducted here were not independent and were
linked because of linkage disequilibrium (/') between
markers. The Bonferroni correction may result in too
stringent a threshold and hence many false-negative
UHVXOWV 7KHUHIRUH ZH DOVR FRQVLGHUHG D OLEHUDO VLJQL¿cant threshold, and a SNP was considered to have a sigQL¿FDQWDVVRFLDWLRQZLWK)&5LILWV±ORJ10(P-value) was
greater than 4 (i.e., P-value < 0.0001). The false discovHU\UDWHIRUWKLVQRPLQDOVLJQL¿FDQWWKUHVKROGFRQVLGHUHG
in LMM analyses was calculated using Benjamini and
Hochberg (1995) as m × P/S, where m is the number of
tests, P is the PYDOXHVXVHGWRFDOODQDVVRFLDWLRQVLJQL¿FDQWDQG6LVWKHQXPEHURI613GHFODUHGVLJQL¿FDQWDW
this threshold. The calculated false discovery rate may
be overestimated as the tests conducted are not independent because of LD between markers. For BVS, the SNP
with posterior probability of the mixture indicator larger
than 0.05, which corresponds to a Bayes factor (posteri-
1044
Sahana et al.
than 0.05, which corresponds to a Bayes factor (posterior probability/prior probability) of ≥10.0, were reported
as having an effect on the phenotype.
Chromosomal Mapping of Genes near the Significant
SNP. To provide information regarding identity and
function of genes at mapped SNP markers, the chromosomal positions of a locally created set of Augustus gene
predictions (http://gbi.agrsci.dk/pig/sscrofa10_2_annotation) was used. A list of genes closest to the significant SNP was extracted, allowing a maximum distance
of 1 Mb between SNP and genes. Putative gene identities were established by BLAST homology search to
known human gene transcripts downloaded from the
NCBI RefSeq sequence database. The biological function of genes and possible relation to feed efficiency was
subsequently investigated.
RESULTS
Minor Allele Frequencies and Linkage Disequilibrium
The association analyses were carried out for the
30,847 SNP that could be mapped on the 18 porcine
autosomes (SSC) using pig genome assembly SSC10.2.
The number of SNP on a chromosome ranged from
1,420 on SSC18 to 7,027 on SSC1. The minor allele frequencies of the selected SNP were uniformly distributed
between 0.05 and 0.50 (Supplementary Fig. 1; see http://
journalofanimalscience.org). This indicates that the SNP
panel used in this study represents mainly the common
variation in the genome. The linkage disequilibrium (r2)
was studied on chromosome (SSC14), and the results
are presented in Fig. 1. The average distance between
adjacent SNP was 64 kb, and the average r2 between
adjacent SNP was 0.56. The average r2 values at 100,
1,000, and 10,000 kb distance were 0.43, 0.30, and 0.08,
respectively. The average r2 at 50 Mb was 0.02 on this
chromosome.
Figure 1. Decay of average linkage disequilibrium (r2) on chromosome
14 in Duroc pigs at various distances (in million base pairs, from 0 to 50 Mb).
Association Results
The SNP showing genome-wide significant association with FCR in the LMM analyses are presented in
Table 1, and the Manhattan plot is shown in Fig. 2. The
distribution of allele substitution effects are presented in
Fig. 3. There were a total of 79 significant associations
(P < 0.0001) located on 6 chromosomes (Supplementary
Table 1. Single nucleotide polymorphism showing genome-wide significant associations with feed conversion ratio
(FCR) in Duroc pigs using the linear mixed model (LMM) approach1
SNP
Chr.
Pos(Bp)
MAF
b.value
SE(b)
-Log10(p-value)
Percent of DEBV_var
CASI0010164
4
63896889
0.456
0.026
0.005
6.708
1.04
ALGA0081097
4
64003194
0.456
−0.026
0.005
6.705
1.04
MARC0014536
14
120439294
0.452
0.025
0.005
6.086
0.96
MARC0111695
14
121418432
0.456
−0.026
0.005
6.691
1.04
ASGA0066126
14
121556657
0.453
0.026
0.005
6.566
1.04
MARC0012665
14
121885221
0.496
0.028
0.005
7.793
1.22
MARC0005041
14
122096887
0.497
−0.028
0.005
7.670
1.22
ASGA0066146
14
122124455
0.496
−0.029
0.005
7.891
1.31
ALGA0081147
14
122390163
0.500
−0.029
0.005
8.044
1.31
H3GA0042111
14
123496090
0.493
0.027
0.005
7.033
1.13
1Chr. = chromosome; Pos(Bp) = position on the chromosome (in base pair); MAF = minor allele frequency; b.value = allele substitution effect; SE(b) = SE
for the b.value; Percent of DEBV_var = % of variance for deregressed EBV explained by the SNP.
1045
Association studies for feed conversion ratio in pigs
Table 1; see http://journalofanimalscience.org). The
largest number of significant (P < 0.0001) SNP was located on SSC16 (44 SNP), followed by SSC14 (19 SNP).
Ten SNP located on 2 chromosomes, SSC4 (2 SNP) and
SSC14 (8 SNP), crossed the genome-wide significant
threshold. Five significant SNP were located within
161.8 kb on SSC4, and 2 of them showed genome-wide
significant association. All 5 SNP were in high LD (Fig.
4), indicating that they all are likely linked to the same
QTL. The variances of deregressed EBV explained by
individual significant SNP ranged between 0.56% and
1.31% (Supplementary Table 1).
We considered a SNP to have significant association with FCR if its P-value was less than 0.0001 in the
LMM analyses. Therefore, the false discovery in our
study was (0.0001 × 30847)/79 = 3.9%, meaning 4% of
the 79 associations discovered by LMM analyses are
likely to be false.
In the BVS analysis, a total of 44 SNP located
on 12 chromosomes had posterior probability greater
than or equal to 0.05 (i.e., Bayes factor of 10 or more;
Table 2, Fig. 5). Out of these SNP, 13 were also identified by LMM analyses. These 13 SNP were located
Figure 2. Manhattan plot showing association with feed conversion ratio
trait for all the SNP used in the analysis in Duroc pigs. The horizontal line indicates genome-wide significant threshold. See online version for figure in color.
on 4 chromosomes: SSC4 (3 SNP), SSC7 (1 SNP),
SSC8 (1 SNP), and SSC14 (8 SNP). The distribution
of allele substitution effects for the BVS model is presented in Supplementary Fig. 2 (see http://journalofanimalscience.org).
Table 3 presents the genes located closest to the significant SNP discovered by LMM and/or BVS approaches. It was not possible to detect a specific candidate gene
for all the QTL regions because some of them were wide
(>5 Mb), but some candidate genes could be selected on
the basis of the known gene function in other species.
Candidate genes for the QTL on SSC4 (63.8 Mb) and
SSC14 (121 Mb) are discussed in the next section.
DISCUSSION
Extent of Linkage Disequilibrium
The LD observed in this Duroc pig population was
quite high (average r2 = 0.56 between adjacent markers).
The average r2 at 1,000 kb distance was 0.30; therefore,
the currently used SNP chip should be sufficient for a
whole-genome scan (Meuwissen et al., 2001). However,
high LD at long distance limits fine mapping the QTL
as SNP quite far from the actual QTL position will also
show association because of the extent of LD. Badke
et al. (2012) have studied LD in 4 pig breeds, including Duroc, using the same SNP chip used in the present study. In general, the estimates of LD at different
distances reported by Badke et al. (2012) were less than
those estimated in our study (for example, the r2 between adjacent SNP was 0.46). One of the main reasons
for these discrepancies could be the populations used
to estimate LD. Although Badke et al. (2012) used unrelated trios (separated for at least 2 generations), we
used all the individuals included in this GWAS study irrespective of their relationship. This might have resulted
in the presence of multiple copies of the same haplotype that are identical by descent because of the inclu-
Figure 3. Distribution of allele substitution effects obtained in SNP-by-SNP
analyses in linear mixed model approach. See online version for figure in color.
1046
Sahana et al.
sion of closely related individuals in our sample. Du
et al. (2007) estimated r2 from 4,500 SNP markers in
6 commercial lines of pigs and found estimates of average r2 = 0.51 for markers less than 0.1 centimorgan
(cM; ~100 kb). Uimari and Tapio (2011) also used the
Illumina Porcine SNP60 BeadChip and reported an average r2 of 0.43 and 0.46 for adjacent markers in Finnish
Landrace and Yorkshire pig populations.
Associations with FCR
We used 2 approaches for association mapping,
LMM analyses and a Bayesian variable selection approach. Although the LMM does SNP-by-SNP analysis,
BVS fits all the markers simultaneously. In this study 13
SNP were identified by both LMM and BVS approaches. The LMM identified more SNP (79) associated with
FCR compared with BVS (44 SNP). As BVS considered all the markers simultaneously, given that a SNP is
included in a model, there is little chance that a second
SNP, in LD with the first SNP, will also be included in
a sample. However, if the SNP are in high LD, it is possible that 1 among the SNP in strong LD is picked in
a different iteration of the Markov Chain Monte Carlo
(MCMC) run, thereby relegating the QTL association
to a region rather than to a specific SNP (Sahana et al.,
2010). For example, on SSC14, 8 SNP in close proximity had a Bayes factor of >10, but the highest posterior probability for an individual SNP was only 0.218.
We also observed high LD among these SNP (Fig.6).
A similar phenomenon was also observed on SSC16,
where 11 closely located SNP had a Bayes factor of >10
Figure 4. Linkage disequilibrium (r2) plot for the markers on chromosome 4 showing association with feed conversion ratio in Duroc pigs. See
online version for figure in color.
but none of them individually had high posterior probability.
In the present study the number of chromosomal regions (1 or more closely located significant SNP) showing significant associations with FCR were clustered on
the basis of physical distances to 8 QTL in LMM and
19 for BVS. One reason for the lower number of QTL
discovered by LMM could be the stringent significant
threshold in LMM, whereby some QTL were missed.
There was a marked difference between LMM and
BVS regarding the location of the QTL on SSC16.
Although the significant SNP in LMM were located between 32.4 and 38.9 Mb, the significant SNP for BVS
were at 77.8 to 84.2 Mb. There was suggestive evidence of association at around the BVS QTL in LMM
also, but none of these SNP crossed the significance
threshold set in the study for multiple testing corrections. There is a possibility that 2 QTL for FCR are
segregating on this chromosome.
The detected QTL for FCR from the present study
were compared with the previously published QTL
and/or genes for FCR from the same chromosome as observed in the present study. Direct comparison between
data obtained in this study and those from previous QTL
studies is hindered by the fact that locations given in
centimorgans on different genome assemblies do not
necessarily reflect the same physical location on the
genome. Therefore, the physical locations on the QTL
(in Mb) as given in the SSC10.2 build in the pig QTLdb
(www.animalgenome.org/cgi-bin/QTLdb/SS/index)
were used to compare to the results of previous studies. The majority of the earlier QTL studies were based
on within-family linkage analysis, and therefore, QTL
intervals were generally large.
In the present study, the most promising QTL for
FCR was located at 63.8 to 64.0 Mb on SSC4. Fontanesi
et al. (2010a) reported an association of the cathepsin K
(CTSK) gene located at 102 Mb on SSC4 affecting the
G:F ratio. Fontanesi et al. (2010b) also studied the cathepsin S (CTSS) gene located on SSC4 at 102 Mb, where
QTL for meat and fat deposition were reported. They
found that polymorphism in CTSS genes was associated
with feed:BW gain ratio and ADG. Cepica et al. (2003)
reported QTL associated with longer carcass, greater BW,
and greater meat content at 89 to 98 Mb on SSC4 near the
genes V-ATPase, ATP1A2, and ATP1B1. Therefore, the
QTL identified on SSC4 in this study is a novel one.
Zhang et al. (2009) reported a QTL for FCR on SSC7
at 34 Mb. Insulin-like growth factor 2 mRNA binding
protein (IGF2BP) 1 is located at 59.3 Mb on SSC7. In
the present study the significant SNP was located at
52.5 Mb by both LMM and BVS. Although IGF2BP1
cannot be ruled out, it is most unlikely that the QTL we
observed is due to this gene. Liu et al. (2007) reported a
Association studies for feed conversion ratio in pigs
QTL for FCR on SSC16, but the location of the QTL was
very unspecific (1 to 68 Mb). On SSC16 LMM identified
a QTL at 32 to 38 Mb with the peak at 35 Mb, whereas
the QTL identified by BVS was at 77.7 to 84.1 Mb. It is
possible that there are 2 QTL segregating on this chromosome and the QTL intervals were overlapping from
Table 2. Single nucleotide polymorphisms showing association with feed conversion ratio (FCR) in Duroc pigs in a
Bayesian variable selection approach (Bayes factor ≥ 10)1
SNP
CHR.
POS(BP)
MAF
POST.PROB
DIAS0003220
2
144806634
0.247
0.051
INRA0046679
4
63841307
0.473
0.065
CASI0010164
4
63896889
0.456
0.050
MARC0049861
4
63988781
0.473
0.060
ALGA0026782
4
100674729
0.125
0.050
ALGA0026797
4
100869004
0.129
0.054
M1GA0006282
4
113676430
0.426
0.053
H3GA0013913
4
114077157
0.455
0.078
ASGA0024642
5
14387665
0.427
0.068
ALGA0033360
5
91226878
0.500
0.058
ASGA0030530
7
2277165
0.246
0.050
M1GA0009803
7
28080480
0.235
0.140
SIRI0000155
7
28872346
0.235
0.179
M1GA0010288
7
52545007
0.218
0.088
MARC0055215
8
83821187
0.382
0.086
ASGA0043216
9
60564587
0.320
0.059
MARC0068186
10
52697274
0.264
0.051
ALGA0074502
14
5375680
0.341
0.056
H3GA0042050
14
120546415
0.495
0.070
ASGA0066126
14
121556657
0.453
0.054
MARC0015087
14
121582671
0.491
0.055
MARC0012665
14
121885221
0.496
0.183
MARC0005041
14
122096887
0.497
0.158
ASGA0066146
14
122124455
0.496
0.154
ALGA0081147
14
122390163
0.500
0.218
H3GA0042111
14
123496090
0.493
0.090
ASGA0093353
15
15507497
0.434
0.083
ASGA0084877
15
38003646
0.292
0.055
ASGA0069422
15
49225855
0.233
0.051
H3GA0047121
16
77781169
0.306
0.066
M1GA0021168
16
77794510
0.310
0.064
H3GA0047156
16
78321022
0.375
0.053
ALGA0092260
16
82729055
0.476
0.061
ALGA0092264
16
82766361
0.471
0.060
ASGA0074743
16
82855966
0.472
0.060
M1GA0021418
16
82877833
0.472
0.061
ALGA0107462
16
82915871
0.472
0.063
ASGA0102627
16
83908627
0.496
0.061
ASGA0074775
16
84145049
0.469
0.064
ASGA0074804
16
84160066
0.392
0.101
ALGA0093843
17
26830847
0.224
0.058
ALGA0093863
17
27104034
0.453
0.201
MARC0056150
18
57909445
0.234
0.051
ALGA0098916
18
58907611
0.307
0.054
1Chr. = chromosome; Pos(Bp) = position on the chromosome (in base
pair); MAF = minor allele frequency; post.prob = posterior probability for the
SNP being included in the model.
1047
linkage analyses. This could be the reason for Liu et al.
(2007) reporting such a large QTL interval. Houston et
al. (2005) suggested a QTL at 12 Mb on SSC13 for daily
feed intake. Insulin-like growth factor 2 mRNA binding
protein 2 is located at 93.4 Mb on SSC13. Therefore,
the QTL identified by LMM at 28.2 Mb is an additional
genetic factor on SSC13 affecting FCR.
Candidate Genes for Feed Efficiency
We examined the genes located close to the associated SNP. Some of the QTL regions are wide (>5 Mb),
and several genes are located within those QTL regions.
Therefore, it is not possible to pinpoint a specific candidate gene for such a large QTL region. The strongest
evidence of QTL for FCR was observed on SSC4 and
SSC14. Here there were 2 strong candidate genes that
might be responsible for the QTL effects in these 2 genomic regions affecting feed efficiency.
On SSC4, the SNP CASI0010164 at 63896889
bp showing the strongest association [−log10(P-value) = 6.708] with FCR for this QTL is located within
the gene Hypoxia inducible factor 1, alpha subunit inhibitor (HIF1AN), which is located at 63736245 to
63907065 bp. Another SNP, ALGA0081092, located
within the HIF1AN gene, which is also associated with
FCR (Supplementary Table 1). Zhang et al. (2010), using
mice with a null mutation in the factor-inhibiting HIF-1α
(FIH) gene, found that hypoxia-inducible factors (HIF)
Figure 5. The association results for the SNP using the Bayesian variable selection approach. (top) The estimate of allele substitution effects of the
SNP and (bottom) the posterior probabilities of the SNP being in the model.
See online version for figure in color.
1048
Sahana et al.
Table 3. Genes located closest the associated SNP1
Chr.
Start_Bp
End_Bp
2
144718313 144880260
4
63736245
63907065
4
63909866
63916434
4
63954630
63997673
4
87134024
87185018
4
4
4
100703017 100750827
100845127 100982954
113175592 113353702
4
4
113658484 113701147
114052764 114163466
5
14214605
14290330
5
91180096
91262406
7
2293813
2311064
7
28038952
28077106
7
28810803
28891854
7
52535543
52554486
8
20474676
20536888
8
24729340
24731185
8
83616671
83722239
9
61475647
61539927
10 52583372
13 25517874
52733037
25618569
13 28066059
28221607
13 28488958
28607562
14 4720242
4748260
14 119905252 119985534
14 120054422 120080877
14 120166395 120214754
14 120274015 120322538
Gene_name
Gene_symbol
Sparc/osteonectin, cwcv SPOCK1
and kazal-like domains
proteoglycan (testican) 1
Hypoxia inducible factor 1, HIF1AN
alpha subunit inhibitor
NADH dehydrogenase
NDUFB8
(ubiquinone) 1 beta
subcomplex, 8, 19kDa,
nuclear gene encoding
mitochondrial protein
Wingless-type MMTV integra- WNT8B
tion site family, member 8B
Minichromosome maintenance MCM4
complex component 4,
transcript variant 1
CD5 molecule-like
CD5L
Fc receptor-like 4
FCRL4
Mannosidase, alpha,
MAN1A2
class 1A, member 2
CD101 molecule
CD101
Immunoglobulin
IGSF3
superfamily, member 3
(IGSF3), transcript variant 1
NUAK family,
NUAK1
SNF1-like kinase, 1
PREDICTED: chromosome C12orf63
12 open reading frame 63
Chromosome 6
C6orf145
open reading frame 145
Pre-B-cell
PBX2
leukemia homeobox 2
Butyrophilin-like 2
BTNL2
(MHC class II associated)
Minichromosome maintenance MCM3
complex component 3
Recombination signal
RBPJ
binding protein for immunoglobulin kappa J region
(RBPJ), transcript variant 1
Protocadherin 7,
PCDH7
transcript variant c
Doublecortin-like kinase 2, DCLK2
transcript variant 1
V-etserythroblastosis virus ETS1
E26 oncogene homolog 1
(avian), transcript variant 3
FERM domain containing 4A FRMD4A
Sodium channel,
SCN5A
voltage-gated, type V, alpha
subunit, transcript variant 3
Unc-51-like
ULK4
kinase 4 (C. elegans)
Trafficking protein,
TRAK1
kinesin binding 1
(TRAK1), transcript variant 2
ATPase, H+
ATP6V1B2
transporting, lysosomal
56/58kDa, V1 subunit B2
Cyclin M1
CNNM1
Glutamic-oxaloacetic
GOT1
transaminase 1, soluble
(aspartate aminotransferase 1))
NK2 homeobox 3
NKX2-3
Ectonucleoside triphosphate ENTPD7
diphosphohydrolase 7
Chr.
Start_Bp
End_Bp
14 120345005 120484143
Gene_name
Gene_symbol
CutC copper transporter CUTC
homolog (E. coli)
14 120490521 120575847
Dynamin binding protein DNMBP
14 121295795 121368923
Hypoxia inducible factor 1, HIF1AN
alpha subunit inhibitor
14 121481667 121599455
Paired box 2 (PAX2),
PAX2
transcript variant d
14 121897333 121899278
Ladybird homeobox 1
LBX1
14 122360746 122380773
Deleted in primary ciliary DPCD
dyskinesia homolog (mouse)
14 122729561 122905102
Chromosome 10
C10orf76
open reading frame 76
14 123492265 123505827
Tripartite motif
TRIM8
containing 8 (TRIM8)
14 126126955 126181469
Sortilin-related VPS10
SORCS3
domain containing receptor 3
15 15281842 15301048
Speckle-type POZ protein-like SPOPL
15 37991566 38019080
Rho guanine nucleotide
ARHGEF10
exchange factor (GEF) 10
16 32537265 32559273
ISL LIM homeobox 1
ISL1
16 33953237 33955108
Pelota homolog (Drosophila) PELO
16 33979957 34130428
Integrin, alpha 1 (ITGA1) ITGA1
16 34164917 34262520
Integrin, alpha
ITGA2
2 (CD49B, alpha 2
subunit of VLA-2 receptor)
16 34271233 34290806
Molybdenum
MOCS2
cofactor synthesis 2
(MOCS2), transcript variant 3
16 34779010 34784235
Follistatin, transcript
FST
variant FST317
16 34843343 34908030
NADH dehydrogenase
NDUFS4
(ubiquinone) Fe-S protein 4,
18kDa (NADH-coenzyme
Q reductase) (NDUFS4),
nuclear gene encoding
mitochondrial protein
16 35316886 35475423
ADP-ribosylation factor-like 15 ARL15
16 35788147 35838239
Heat shock 27kDa protein 3 HSPB3
16 35962856 35966677
Sorting nexin 18 (SNX18), SNX18
transcript variant 2
16 36288841 36296526
Endothelial
ESM1
cell-specific molecule 1
(ESM1), transcript variant 2
16 37296881 37396018
Ankyrin repeat domain 55 ANKRD55
16 37530326 37613274
Interleukin 31 receptor A IL31RA
(IL31RA), transcript variant 1
16 38138052 38210961
Mitogen-activated protein MAP3K1
kinase kinasekinase 1
16 38869597 38871285
Actin, beta-like 2
ACTBL2
16 77759785 77802067
Solute carrier
SLC36A1
family 36 (proton/amino
acid symporter), member 1
16 78291978 78338684
TNFAIP3 interacting protein 1 TNIP1
16 82659680 82792978
ADAM metallopeptidase with ADAMTS16
thrombospondin type 1 motif, 16
16 84587087 84601558
Iroquois homeobox 2,
IRX2
transcript variant 1
17 27581211 27672350
MACRO domain
MACROD2
containing 2 (MACROD2),
transcript variant 1
18 57998005 58008669
Inhibin, beta A
INHBA
18 58822206 58966286
Chromosome 7 open reading C7orf10
frame 10, transcript variant 4
1Chr. = chromosome; Start_Bp = staring position of the gene; End_Bp =
end position of the gene.
Association studies for feed conversion ratio in pigs
1049
Figure 6. Linkage disequilibrium (r2) plot for the markers on chromosome 14 showing association with feed conversion ratio from 11.99 to 12.6 Mb region
in Duroc pigs.
is an essential regulator of metabolism: mice lacking FIH
exhibit reduced BW, increased metabolic rate, hyperventilation, and improved glucose and lipid homeostasis and
are resistant to high-fat-diet-induced BW gain and hepatic steatosis. Neuron-specific loss of FIH influenced some
of the major metabolic phenotypes of the global null
animals: those mice have reduced BW, increased metabolic rate, and enhanced insulin sensitivity and are also
protected against high-fat-diet-induced BW gain. Their
results demonstrated that FIH acts to a significant degree
through the nervous system to regulate metabolism.
On SSC14, the ladybird homeobox 1 (LBX1) gene
located at 121897333 to 121899278 bp is a strong candidate gene for FCR. There were 4 genome-wide significant SNP very closely located (121885221 to 122390163
bp) to the LBX1 gene. In mice, this gene is a key regulator of muscle precursor cell migration and is required for
the acquisition of dorsal identities of forelimb muscles
(Watanabe et al., 2011). They generated conditional
Lbx1-null mice using the Cre-loxP system. In Lbx1-null
mouse lines, Pax3-expressing limb muscle precursor
cells were seriously reduced during embryonic development, and eventually the limb extensor muscles were
lost after birth. The LBX1 gene can have an effect on
muscle growth in pigs and thereby on FRC, which is a
ratio between body growth and feed intake.
The marker effects are overestimated in LMM because only 1 marker is fitted at a time. However, BVS
considered all the markers simultaneously, and therefore, it is expected to give more accurate effect size
estimates. In BVS, there were only 7 SNP with allele
substitution effects (absolute values) larger than 0.001
(Supplementary Fig. 2). The maximum variance explained by a marker was only 0.00137% in SD units on
deregressed EBV. If we assume the closely located SNP
picked up by BVS represent 1 QTL, then the present
study was able to identify approximately 20 QTL affecting FCR. No QTL with large effect size points toward a
highly polygenic nature of the trait.
This study provides a list of SNP associated with
FCR and also offers valuable information on the genetic
architecture and candidate genes for this trait. The information about SNP or regions showing association with
FCR may be used as prior information in genomic selection and also for identifying candidate genes affecting
FCR. Hypoxia inducible factor 1, alpha subunit inhibitor
and LBX1 are 2 possible candidate genes affecting FCR
on SSC4 and SSC14, respectively. As we observed high
LD between adjacent markers, it will be necessary to
analyze other pig populations to come closer to the candidate genes responsible for the identified QTL effects.
LITERATURE CITED
Badke, Y. M., R. O. Bates, C. W. Ernst, C. Schwab, and J. P. Steibel.
2012. Estimation of linkage disequilibrium in four US pig
breeds. BMC Genomics 13:24.
1050
Sahana et al.
Barrett, J. C., B. Fry, J. Maller, and M. J. Daly. 2005. Haploview:
Analysis and visualization of LD and haplotype maps.
Bioinformatics 21:263–265.
Benjamini, Y., and Y. Hochberg. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing.
J. R. Stat. Soc. Series B Stat. Methodol. 57:289–300.
Bolormaa, S., B. J. Hayes, K. Savin, R. Hawken, W. Barendse, P. F.
Arthur, R. M. Herd, and M. E. Goddard. 2011. Genome-wide
association studies for feedlot and growth traits in cattle. J.
Anim. Sci. 89:1684–1697.
Cepica, S., A. Stratil, M. Kopecny, P. Blazkova, J. Schröffel Jr., R.
Davoli, L. Fontanesi, G. Reiner, H. Bartenshlager, and G. Moser.
2003. Linkage and QTL mapping for Sus scrofa chromosome 4.
J. Anim. Breed. Genet. 120:28–37.
Christensen, O. F., P. Madsen, B. Nielsen, T. Ostersen, and G. Su.
2012. Single-step methods for genomic evaluation in pigs.
Animal 6:1565–1571.
Du, F., A. Clutter, and M. Lohuis. 2007. Characterizing linkage disequilibrium in pig populations. Int. J. Biol. Sci. 3:166–178.
Fontanesi, L., E. Scotti, L. Buttazzoni, S. Dall’Olio, R. Davoli, and V.
Russo. 2010a. A single nucleotide polymorphism in the porcine
cathepsin K (CTSK) gene is associated with back fat thickness and
production traits in Italian Duroc pigs. Mol. Biol. Rep. 37:491–495.
Fontanesi, L., C. Speroni, L. Buttazzoni, E. Scotti, L. N. Costa, R. Davoli,
and V. Russo. 2010b. Association between cathepsin L (CTSL) and
cathepsin S (CTSS) polymorphisms and meat production and carcass traits in Italian Large White pigs. Meat Sci. 85:331–338.
Garrick, D. J., J. F. Taylor, and R. L. Fernando. 2009. Deregressing
estimated breeding values and weighting information for genomic regression analyses. Genet. Sel. Evol. 41:55.
George, E. I., and R. E. McCulloch. 1993. Variable selection via
Gibbs sampling. J. Am. Stat. Assoc. 88:881–889.
Houston, R. D., C. S. Haley, A. L. Archibald, and K. A. Rance. 2005.
A QTL affecting daily feed intake maps to chromosome 2 in
pigs. Mamm. Genome 16:464–470.
Hu, Z.-L., C. A. Park, E. R. Fritz, and J. M. Reecy. 2010. QTLdb: A
comprehensive database tool building bridges between genotypes and phenotypes. Proc. 9th World Congr. Genet. Appl.
Livest. Prod. Leipzig, Germany.
Hu, Z.-L., and J. M. Reecy. 2007. Animal QTLdb: Beyond a repository: A public platform for QTL comparisons and integration
with diverse types of structural genomic information. Mamm.
Genome 18:1–4.
Janss, L. 2011. Bayz manual version 2.02. Janss Biostat., Leiden,
Netherlands.
Jensen, J., G. Su, and P. Madsen.2012. Partitioning additive genetic
variance into genomic and remaining polygenic components for
complex traits in dairy cattle. BMC Genet. 13:44.
Liu, G., D. G. Jennen, E. Tholen, H. Juengst, T. Kleinwächter, M.
Hölker, D. Tesfaye, G. Un, H. J. Schreinemachers, and E.
Murani. 2007. A genome scan reveals QTL for growth, fatness,
leanness and meat quality in a Duroc-Pietrain resource population. Anim. Genet. 38: 241–252
Madsen, P., and J. Jensen. 2011. A user’s guide to DMU, a package
for analyzing multivariate mixed models, version 6, Release 5.
http://www.dmu.agrsci.dk/dmuv6_guide.5.0.pdf. (Accessed 5
January 2011).
Meuwissen, T. H., B. J. Hayes, and M. E. Goddard. 2001. Prediction
of total genetic value using genome-wide dense marker maps.
Genetics. 157:1819–1829.
Okine, E. K., J. A. Basarab, L. A. Goonewardene, and P. Mir. 2004.
Residual feed intake and feed efficiency: Differences and implications. Proc. FL Rumin. Nutr. Symp., Gainesville, FL. p. 27–38.
Ostersen, T., O. F. Christensen, M. Henryon, B. Nielsen, G. Su, and P.
Madsen. 2011. Deregressed EBV as the response variable yield
more reliable genomic predictions than traditional EBV in purebred pigs. Genet. Sel. Evol. 43:38.
Risch, N. J. 2000. Searching for genetic determination in the new
millennium. Nature 405:847–856.
Sahana, G., B. Guldbrandtsen, L. Janss, and M. S. Lund. 2010.
Comparison of association mapping methods in a complex pedigreed population. Genet. Epidemiol. 34:455–462.
Scheet, P., and M. Stephens. 2006. A fast and flexible statistical
model for large-scale population genotype data: Applications to
inferring missing genotypes and haplotypic phase. Am. J. Hum.
Genet. 78:629–644.
Uimari, P., and M. Tapio. 2011. Extent of linkage disequilibrium
and effective population size in Finnish Landrace and Finnish
Yorkshire pig breeds. J. Anim. Sci. 89:609–614.
Watanabe, S., S. Matsushita, M. Hayasaka, and K. Hanaoka. 2011.
Generation of a conditional null allele of Lbx1. Genesis 49:803–810.
Wu, T. D., and C. K. Watanabe. 2005. GMAP: A genomic mapping and alignment program for mRNA and EST sequences.
Bioinformatics 21:1859–1875.
Yu, J., G. Pressoir, W. H. Briggs, I. Vroh Bi, M. Yamasaki, J. F.
Doebley, M. D. McMullen, B. S. Gaut, D. M. Nielsen, J. B.
Holland, S. Kresovich, and E. S. Buckler. 2005. A unified
mixed-model method for association mapping that accounts for
multiple levels of relatedness. Nat Genet. 38:203–208.
Zhang, N., Z. Fu, S. Linke, J. Chicher, J. J. Gorman, D. Visk, G. G.
Haddad, L. Poellinger, D. J. Peet, F. Powell, and R. S. Johnson.
2010. The asparaginyl hydroxylase factor inhibiting HIF-1alpha
is an essential regulator of metabolism. Cell Metab. 11:364–378.
Zhang, Z. Y., J. Ren, D. R. Ren, J. W. Ma, Y. M. Guo, and L. S. Huang.
2009. Mapping quantitative trait loci for feed consumption and
feeding behaviors in a White Duroc × Chinese Erhualian resource population. J. Anim. Sci. 87: 3458–3463.