The Impact of Pedigree Relationship on Molecular Breeding Value Accuracy

Written by Stephen D. Kachman
Department of Statistics
University of Nebraska–Lincoln

Accuracy: The accuracy of an EPD for an animal depends on the amount of data we have on that animal. The data can include the trait measured on the animal or on its relatives. It can also include data on correlated traits and genomic information in the form of a MBV. The amount of information that each type of data contributes to the prediction of the true EPD depends on the strength of the linkage between the data and the true EPD. These linkages are illustrated by the arrows in Figure 1. The strength of the linkages depends on the heritability, the genetic correlation between the traits, and the correlation between individuals as measured by their pedigree relationship. A MBV can be thought of as another correlated trait whose heritability is close to one.

Introduction: Incorporating genomic information through the use of molecular breeding values (MBV) into a genetic evaluation consists of 1) estimating single nucleotide polymorphism (SNP) effects using a set of training data, 2) estimating genetic parameters (primarily the genetic correlation) using a set of validation data, and 3) predicting genomic enhanced expected progeny differences (EPD), or GeEPD, using a set of evaluation data. Assigning the proper weight to genomic information, and correctly estimating its effect on the accuracy of a GeEPD, depends on the animals in the validation data set and the animals in the evaluation data set having similar pedigree relationships to the animals in the training data set.
When the animals in the training and validation data sets have a closer pedigree relationship than those in the training and evaluation data sets, the weight given to genomic information will be too high and the estimated accuracies will be inflated.

Figure 1: Linkages between various types of data and an animal's true EPD for a trait. [Diagram labels: Trait, Related Individual, Correlated Trait, True EPD, Heritability, Pedigree Relationship, Genetic Correlation.]

Genetic Correlation: The key parameter in determining what weight a MBV should receive in a genetic evaluation, and how much of an impact a MBV has on the accuracy, is the genetic correlation between the phenotypic trait and the MBV. The genetic correlation measures the association between a MBV and the true EPD. A strong association between a MBV and the true EPD requires that SNP be located close to quantitative trait loci (QTL) for the trait, that alleles for those SNP line up with alleles for the corresponding QTL, and that these conditions hold for many of the biologically significant QTL for the trait.

Construction of a Genomic Enhanced EPD: The steps in incorporating genomic information in the form of a MBV into a genomic enhanced EPD (GeEPD) are illustrated in Figure 2. The three steps involved are 1) construction of a MBV using the training data, 2) estimation of genetic parameters such as the genetic correlation using the validation data, and 3) estimation of GeEPD using the evaluation data.

Figure 2: The steps and data sets involved in incorporating genomic information into a national cattle evaluation to produce a GeEPD. [Diagram labels: Training Data, Validation Data, Evaluation Data, Pedigree Relationship, Estimated SNP Effects, Estimated Genetic Correlation, Genomic Enhanced EPD.]

The construction of a MBV involves the estimation of the SNP effects used to construct the MBV.
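As a rough illustration of what building a MBV from estimated SNP effects can look like, the sketch below assumes the common additive form, in which an animal's MBV is the sum of its allele counts (0, 1, or 2) multiplied by the estimated effect of each SNP. The function name, the three-SNP panel, and the effect values are hypothetical and are not taken from any actual evaluation.

```python
def molecular_breeding_value(allele_counts, snp_effects):
    """Return the MBV as the sum of allele count times estimated SNP effect.

    Assumption: a purely additive model with one estimated effect per SNP.
    """
    if len(allele_counts) != len(snp_effects):
        raise ValueError("need exactly one estimated effect per genotyped SNP")
    return sum(count * effect for count, effect in zip(allele_counts, snp_effects))


# Hypothetical SNP effects, as if estimated from a training data set.
estimated_effects = [0.5, -0.2, 0.1]

# Hypothetical genotype of one animal: copies of the counted allele at each SNP.
animal_genotype = [2, 1, 0]

mbv = molecular_breeding_value(animal_genotype, estimated_effects)
print(f"MBV = {mbv:.2f}")
```

Once the effects are fixed, any genotyped animal can be scored this way, which is why how well those effects transfer to animals outside the training data matters.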
The training data need to include animals with both genomic information and phenotypic information, where the phenotypic information might be provided in the form of phenotype based EPDs. Since estimation of SNP effects involves linking genomic information with phenotypic information, the estimated effects will be weighted towards animals with both genotypic information and accurate phenotype based EPDs.

Validation of a MBV involves estimation of genetic parameters, including the genetic correlation. A MBV is constructed based on how well it performed for the animals in the training data, so validation using the training data will result in overly optimistic estimates of the genetic correlation. In addition, the performance of a MBV for a group of animals being evaluated is expected to deteriorate as their pedigree relationship with the animals in the training data becomes weaker. Therefore, the validation data should be constructed so that the pedigree relationships between the validation and training data are similar to the pedigree relationships between the evaluation and training data.

Estimation of GeEPD in a national evaluation makes use of both the SNP effects estimated using a set of training data, which are used to calculate the MBVs, and the genetic correlation estimated using a set of validation data.

The training, validation, and evaluation data sets differ in several important respects. To effectively estimate SNP effects, the training data should consist of records on animals that both have genotype information and are rich in phenotype information. Because MBV are trained using animals from the training data, a MBV tends to work best with animals that are either from, or closely related to, the training data used to construct it. Effectively estimating the genetic correlation needed in the evaluation stage likewise requires a validation data set consisting of records on animals that both have genotype information and are rich in phenotype information.
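One way to check whether a candidate validation set meets the pedigree requirement described above is to compare average pedigree relationships with the training animals. The sketch below assumes the standard tabular method for additive relationships and a small hypothetical pedigree; the animal names, group assignments, and function names are illustrative only.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix by the tabular method.

    pedigree: list of (animal, sire, dam) with parents listed before their
    offspring; an unknown parent is given as None.
    """
    ids = [animal for animal, _, _ in pedigree]
    index = {animal: k for k, animal in enumerate(ids)}
    n = len(ids)
    A = [[0.0] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        parents = []
        for parent in (sire, dam):
            if parent is None:
                continue
            p = index[parent]
            if p >= i:
                raise ValueError("parents must be listed before their offspring")
            parents.append(p)
        # Relationship with each earlier animal: half the sum over known parents.
        for j in range(i):
            A[i][j] = A[j][i] = 0.5 * sum(A[j][p] for p in parents)
        # Diagonal: one plus the inbreeding coefficient.
        inbreeding = 0.5 * A[parents[0]][parents[1]] if len(parents) == 2 else 0.0
        A[i][i] = 1.0 + inbreeding
    return ids, A


def mean_relationship(ids, A, group_a, group_b):
    """Average additive relationship over all pairs between two groups."""
    index = {animal: k for k, animal in enumerate(ids)}
    total = sum(A[index[a]][index[b]] for a in group_a for b in group_b)
    return total / (len(group_a) * len(group_b))


# Hypothetical pedigree: V1 is a son of training sire S1, E1 a grandson.
pedigree = [
    ("S1", None, None), ("D1", None, None), ("D2", None, None), ("D3", None, None),
    ("T1", "S1", "D1"), ("V1", "S1", "D2"), ("E1", "V1", "D3"),
]
ids, A = relationship_matrix(pedigree)
training, validation, evaluation = ["S1", "T1"], ["V1"], ["E1"]

validation_to_training = mean_relationship(ids, A, validation, training)
evaluation_to_training = mean_relationship(ids, A, evaluation, training)
print(validation_to_training, evaluation_to_training)
```

In this toy pedigree the validation animal is more closely related to the training animals than the evaluation animal is, which is the situation the validation data should be constructed to avoid.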
The pedigree relationship to the training animals should also be similar for animals in the validation data set and for animals in the evaluation data set that have little phenotypic information.

Implications: The pedigree relationships between the training, validation, and evaluation data sets have a large impact on obtaining accurate estimates of the true effectiveness of a MBV, on giving the proper weight to genomic information, and on obtaining accurate estimates of the accuracy of a GeEPD. Selecting a validation population that is more closely related to the training population than the evaluation population is will result in overestimating the effectiveness of the MBV, giving too much weight to genomic information, and overstating the accuracy of the resulting GeEPD.
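The effect of an overestimated genetic correlation can be illustrated with a small selection index calculation. The sketch assumes a simplified setting that is not part of any national evaluation: each animal has one own phenotypic record and one MBV, all variances are standardized, the MBV has a heritability of exactly one, and its only association with the phenotype is through the genetic correlation. The heritability (0.3) and the two genetic correlations (0.6 from validation, 0.3 in the evaluation animals) are hypothetical values, and accuracy here means the correlation between the index and true genetic merit.

```python
import math


def index_weights(h2, r_g):
    """Index weights (phenotype, MBV) for predicting true genetic merit.

    Assumes var(phenotype) = var(MBV) = 1 and var(genetic merit) = h2, so that
    cov(phenotype, MBV) = cov(genetic merit, MBV) = r_g * sqrt(h2).
    """
    c = r_g * math.sqrt(h2)
    det = 1.0 - c * c
    b_phenotype = (h2 - c * c) / det
    b_mbv = c * (1.0 - h2) / det
    return b_phenotype, b_mbv


def index_accuracy(weights, h2, r_g_actual):
    """Correlation between the index and true genetic merit when the genetic
    correlation in the animals being evaluated is r_g_actual."""
    b_phenotype, b_mbv = weights
    c = r_g_actual * math.sqrt(h2)
    cov_index_merit = b_phenotype * h2 + b_mbv * c
    var_index = b_phenotype**2 + b_mbv**2 + 2.0 * b_phenotype * b_mbv * c
    return cov_index_merit / math.sqrt(var_index * h2)


h2 = 0.3             # hypothetical heritability of the trait
r_validation = 0.6   # hypothetical estimate from a closely related validation set
r_evaluation = 0.3   # hypothetical value in the less related evaluation animals

inflated_weights = index_weights(h2, r_validation)
proper_weights = index_weights(h2, r_evaluation)

claimed = index_accuracy(inflated_weights, h2, r_validation)   # what is reported
realized = index_accuracy(inflated_weights, h2, r_evaluation)  # what is achieved
attainable = index_accuracy(proper_weights, h2, r_evaluation)  # best possible

print(f"MBV weight: used {inflated_weights[1]:.3f}, proper {proper_weights[1]:.3f}")
print(f"accuracy: claimed {claimed:.3f}, realized {realized:.3f}, attainable {attainable:.3f}")
```

Under these assumptions the MBV receives more weight than it should, the claimed accuracy exceeds what any index could achieve in the evaluation animals, and the accuracy actually realized falls below that of an index built with the correct genetic correlation.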