The Impact of Pedigree Relationship
on Molecular Breeding Value Accuracy
Accuracy: The accuracy of an EPD for an animal depends on the amount of data we have on that animal. The data can include the trait measured on the animal or on its relatives. It can also include data on correlated traits and genomic information in the form of a MBV. The amount of information that each type of data contributes to the prediction of the true EPD depends on the strength of the linkage between the data and the true EPD. These linkages are illustrated by the arrows in Figure 1. The strength of the linkages depends on the heritability, the genetic correlation between the traits, and the correlation between individuals as measured by their pedigree relationship. A MBV can be thought of as another correlated trait whose heritability is close to one.
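As a rough illustration of how an additional source of information raises accuracy, the sketch below combines one own phenotype with a MBV using standard two-source selection index theory. It is not taken from this fact sheet. The function name `combined_accuracy`, the standardization of every variable to unit variance, and the assumptions that the MBV has a heritability of one and shares no environmental error with the phenotype are all made here for illustration. Accuracy is expressed as the correlation between the prediction and the true EPD.

```python
import math


def combined_accuracy(h2: float, r_g: float) -> float:
    """Correlation between the true EPD and the best linear index of
    one own phenotype and one MBV (all variables standardized).

    Illustrative assumptions, not from the source:
      corr(phenotype, true EPD) = sqrt(h2)
      corr(MBV, true EPD)       = r_g   (MBV heritability taken as 1)
      corr(phenotype, MBV)      = sqrt(h2) * r_g
    """
    if not (0.0 <= h2 <= 1.0 and -1.0 <= r_g <= 1.0):
        raise ValueError("h2 must be in [0, 1] and r_g in [-1, 1]")
    denom = 1.0 - h2 * r_g ** 2
    if denom == 0.0:
        # h2 = 1 and |r_g| = 1: either source alone is already perfect.
        return 1.0
    # Squared accuracy r' P^-1 r for the two information sources.
    r2 = (h2 + r_g ** 2 - 2.0 * h2 * r_g ** 2) / denom
    return math.sqrt(max(r2, 0.0))


if __name__ == "__main__":
    # Hypothetical heritability of 0.25 and a range of genetic correlations.
    for r_g in (0.0, 0.3, 0.6, 0.9):
        print(f"r_g={r_g:.1f}  accuracy={combined_accuracy(0.25, r_g):.3f}")
```

With a genetic correlation of zero the MBV adds nothing and the accuracy equals the square root of the heritability. As the genetic correlation approaches one, the accuracy approaches one regardless of the heritability.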
Written by Stephen D. Kachman
Department of Statistics
University of Nebraska–Lincoln
Introduction: Incorporating genomic information through the use of molecular breeding values (MBV) into a genetic evaluation consists of 1) estimating single nucleotide polymorphism (SNP) effects using a set of training data, 2) estimating genetic parameters (primarily the genetic correlation) using a set of validation data, and 3) predicting genomic enhanced expected progeny differences (EPD), or GeEPD, using a set of evaluation data. Both the weight assigned to genomic information and the estimated accuracy of a GeEPD are appropriate only when the animals in the validation data set and the animals in the evaluation data set have similar pedigree relationships to the animals in the training data set. When the animals in the training and validation data sets have a closer pedigree relationship than those in the training and evaluation data sets, the weight given to genomic information will be too high and the estimated accuracies will be inflated.
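Figure 1: Linkages between various types of data to an animal’s true EPD for a trait. [Diagram labels: Trait, Correlated Trait, Related Individual, True EPD; linkages labeled heritability, genetic correlation, and pedigree relationship.]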
Genetic Correlation: The key parameter in determining what weight a MBV should receive in a genetic evaluation, and how much of an impact a MBV has on the accuracy, is the genetic correlation between the phenotypic trait and the MBV. The genetic correlation measures the association between a MBV and the true EPD. For there to be a strong association between a MBV and the true EPD, the SNP must be located close to quantitative trait loci (QTL) for the trait, the alleles for those SNP must line up with the alleles for the corresponding QTL, and these conditions must hold for many of the biologically significant QTL for the trait.
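To make the link between the genetic correlation and the weight on a MBV concrete, here is a minimal sketch based on a textbook two-source selection index. It is not the procedure used in any particular national evaluation. The name `index_weights` is hypothetical, and the sketch assumes that the phenotype and the MBV are both standardized to unit variance, that the MBV has a heritability of one, and that the phenotype and the MBV are related only through the true genetic merit.

```python
import math
from typing import Tuple


def index_weights(h2: float, r_g: float) -> Tuple[float, float]:
    """Return (weight on phenotype, weight on MBV) for predicting the
    standardized true genetic merit from one phenotype and one MBV.

    Illustrative assumptions: corr(phenotype, merit) = sqrt(h2),
    corr(MBV, merit) = r_g, corr(phenotype, MBV) = sqrt(h2) * r_g.
    """
    if not (0.0 <= h2 < 1.0 and -1.0 <= r_g <= 1.0):
        raise ValueError("h2 must be in [0, 1) and r_g in [-1, 1]")
    h = math.sqrt(h2)
    denom = 1.0 - h2 * r_g ** 2  # determinant of the 2 x 2 correlation matrix
    b_pheno = h * (1.0 - r_g ** 2) / denom
    b_mbv = r_g * (1.0 - h2) / denom
    return b_pheno, b_mbv


if __name__ == "__main__":
    # Hypothetical heritability of 0.25.
    for r_g in (0.0, 0.3, 0.6, 0.9):
        b_pheno, b_mbv = index_weights(0.25, r_g)
        print(f"r_g={r_g:.1f}  phenotype weight={b_pheno:.3f}  MBV weight={b_mbv:.3f}")
```

Under these assumptions the weight on the MBV rises with the genetic correlation while the weight on the phenotype falls, which is why an inflated estimate of the genetic correlation shifts too much emphasis onto the genomic information.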
Construction of a Genomic enhanced EPD: The steps in incorporating genomic information in the form of a MBV into a genomic enhanced EPD (GeEPD) are illustrated in Figure 2. The three steps involved are 1) construction of a MBV using the training data, 2) estimation of genetic parameters such as the genetic correlation using the validation data, and 3) estimation of GeEPD using the evaluation data.
Figure 2: The steps and data sets involved in incorporating genomic information into a national cattle evaluation to produce a GeEPD. [Diagram labels: Training Data, Validation Data, Evaluation Data, Pedigree Relationship, Estimated SNP Effects, Estimated Genetic Correlation, Genomic Enhanced EPD.]
Constructing a MBV involves estimating the SNP effects from which the MBV is calculated. The training data needs to include animals with both genomic information and phenotypic information, where the phenotypic information might be provided in the form of phenotype based EPDs. Since estimation of SNP effects involves linking genomic information with phenotypic information, the estimated effects will be weighted towards animals with both genotypic information and accurate phenotype based EPDs.
Validation of a MBV involves estimation of genetic parameters, including the genetic correlation. A MBV is constructed based on how well it performed for the animals in the training data, so validation using the training data will result in overly optimistic estimates of the genetic correlation. In addition, the performance of a MBV for a group of animals being evaluated is expected to deteriorate as their pedigree relationship with the animals in the training data becomes weaker. Therefore, the validation data should be constructed so that the pedigree relationships between the validation and training data are similar to the pedigree relationships between the evaluation and training data.
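The optimism that comes from validating on the training data can be seen in a toy simulation. The sketch below is not the author's analysis and makes strong simplifying assumptions: SNP are independent with allele frequency 0.5, the first few SNP are themselves the QTL, SNP effects are estimated one marker at a time, and the "new" animals are completely unrelated to the training animals, which is an extreme stand-in for a weak pedigree relationship. The correlation between the MBV and the phenotype is used as an observable proxy for how well the MBV performs. All function names and parameter values are hypothetical.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_animals(rng, n, effects, noise_sd):
    """Genotypes are centered allele counts (-1, 0, 1) at frequency 0.5;
    phenotype = sum of SNP effects + environmental noise."""
    genos = [[rng.choice((-1, 0, 0, 1)) for _ in effects] for _ in range(n)]
    phenos = [
        sum(g * a for g, a in zip(row, effects)) + rng.gauss(0.0, noise_sd)
        for row in genos
    ]
    return genos, phenos


def compare_fit(seed=1, n_train=100, n_new=100, n_snp=400, n_qtl=10):
    """Return (correlation in training animals, correlation in new animals)."""
    rng = random.Random(seed)
    effects = [1.0] * n_qtl + [0.0] * (n_snp - n_qtl)
    noise_sd = (0.5 * n_qtl) ** 0.5  # environmental variance = genetic variance
    train_g, train_y = simulate_animals(rng, n_train, effects, noise_sd)
    new_g, new_y = simulate_animals(rng, n_new, effects, noise_sd)

    # Step 1: estimate each SNP effect by single-marker regression (training data).
    mean_y = sum(train_y) / n_train
    est = []
    for j in range(n_snp):
        col = [row[j] for row in train_g]
        sxx = sum(v * v for v in col)
        sxy = sum(v * (y - mean_y) for v, y in zip(col, train_y))
        est.append(sxy / sxx if sxx > 0 else 0.0)

    # Step 2: the MBV is the sum of genotype times estimated SNP effect.
    def mbv(row):
        return sum(g * b for g, b in zip(row, est))

    r_train = pearson([mbv(r) for r in train_g], train_y)
    r_new = pearson([mbv(r) for r in new_g], new_y)
    return r_train, r_new


if __name__ == "__main__":
    r_train, r_new = compare_fit()
    print(f"training animals: {r_train:.2f}   new animals: {r_new:.2f}")
```

Because the estimated SNP effects absorb noise that is specific to the training animals, the MBV is expected to track the training phenotypes far more closely than the phenotypes of the new animals. Real populations fall between these extremes, depending on how closely the animals being evaluated are related to the training animals.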
Estimation of GeEPD in a national evaluation
makes use of both the SNP effects estimated using a set of training data to calculate the MBVs
and the genetic correlation estimated using a set
of validation data.
The training, validation, and evaluation data sets differ in several important respects. To effectively estimate SNP effects, the training data consists of records on animals that have genotype information and are also rich in phenotype information. Because MBV are trained using animals from the training data, a MBV tends to work best with animals that are either from or closely related to animals in the training data used to construct the MBV. Effectively estimating the genetic correlation needed in the evaluation stage requires a validation data set which likewise consists of records on animals that have genotype information and are rich in phenotype information. In addition, animals in the validation data set and animals in the evaluation data set with little phenotypic information should have similar pedigree relationships to the animals in the training data.
Implications: The pedigree relationships between the training, validation, and evaluation data sets have a large impact on obtaining accurate estimates of the true effectiveness of a MBV, on giving the proper weight to genomic information, and on obtaining accurate estimates of the accuracy of a GeEPD. Selecting a validation population which is more closely related to the training population than to the evaluation population will result in overestimating the effectiveness of the MBV, giving too much weight to genomic information, and overstating the accuracy of the resulting GeEPD.
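The size of these effects can be sketched with a two-source selection index in which the weights are derived from the genetic correlation estimated in the validation data, while the animals being evaluated actually have a lower genetic correlation. This is an illustration under stated assumptions rather than the method of any national evaluation: one standardized phenotype and one standardized MBV per animal, a MBV heritability of one, and no shared environmental error. The function name and its parameters are hypothetical.

```python
import math
from typing import Tuple


def claimed_and_realized_accuracy(
    h2: float, r_g_validation: float, r_g_evaluation: float
) -> Tuple[float, float]:
    """Return (accuracy the index claims, accuracy it actually achieves).

    Weights are computed as if the genetic correlation were r_g_validation;
    the realized accuracy is evaluated at the true value r_g_evaluation.
    """
    if not (0.0 < h2 < 1.0):
        raise ValueError("h2 must be strictly between 0 and 1")
    if not (-1.0 <= r_g_validation <= 1.0 and -1.0 <= r_g_evaluation <= 1.0):
        raise ValueError("genetic correlations must be in [-1, 1]")
    h = math.sqrt(h2)
    ra, rt = r_g_validation, r_g_evaluation

    # Index weights under the assumed (validation) genetic correlation.
    denom = 1.0 - h2 * ra ** 2
    b_pheno = h * (1.0 - ra ** 2) / denom
    b_mbv = ra * (1.0 - h2) / denom

    # Accuracy the evaluation would report if ra were correct.
    claimed = math.sqrt(b_pheno * h + b_mbv * ra)

    # Accuracy actually achieved when the true genetic correlation is rt.
    cov_index_merit = b_pheno * h + b_mbv * rt
    var_index = b_pheno ** 2 + b_mbv ** 2 + 2.0 * b_pheno * b_mbv * h * rt
    realized = cov_index_merit / math.sqrt(var_index)
    return claimed, realized


if __name__ == "__main__":
    # Hypothetical values: heritability 0.25, validation 0.8, evaluation 0.4.
    claimed, realized = claimed_and_realized_accuracy(0.25, 0.8, 0.4)
    print(f"claimed accuracy={claimed:.2f}  realized accuracy={realized:.2f}")
```

When the two genetic correlations agree, the claimed and realized accuracies coincide. With the hypothetical values above, the index claims an accuracy of about 0.82 but achieves 0.50, which is no better than the phenotype alone at that heritability.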