But what are genomic (additive) relationships?

Andres Legarra
INRA UMR GenPhySE, Toulouse, France

What I want to show in this talk
• Give an overview of some estimators of relationships in a historical context
• Explain why "all genomic relationships are equal"

Kinship
The Romance words for kinship (e.g., Spanish "parentesco", French "parenté") obviously come from the Latin "parentes"

So what is kinship?
• Socially it has a "pedigree" interpretation
  • e.g., "all royal families are related"
• However, pedigrees "go back forever"
• We need a more rigorous definition

True relationships
• Two individuals are genetically identical (for a trait) if they carry the same genotype at the causal QTLs or genes
  • This is a biological fact
  • If I share blood group OO with somebody, I am "like" his twin (for that trait)
• The genetics of one locus for two diploid individuals can be described using Gillois' identity coefficients

Genes and phenotypes
• Heredity seems to act in a linear manner
• Fisher (1918) explained why this is so:
  • The "substitution" effect of one allele is the regression of phenotype on genotype:
    $a = (\mathbf{z}'\mathbf{z})^{-1}\mathbf{z}'\mathbf{y}$, with $\mathbf{z} = (0, 1, 2)'$, e.g., for genotypes (CC, CT, TT)
• Under most realistic models of gene action and mutation, this "substitution" effect explains a large part of the genetic variance of a trait
  • Even if biology is complex (interactions)

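The regression argument can be checked numerically. The sketch below is illustrative only: it assumes Hardy-Weinberg genotype frequencies and Falconer-style genotypic values -a (CC), d (CT), +a (TT), and `substitution_effect` is a hypothetical helper name. It computes the least-squares slope of genotypic value on gene content and compares it with the textbook substitution effect alpha = a + d(q - p), showing how part of the dominance deviation is absorbed into the linear (additive) effect.

```python
from fractions import Fraction


def substitution_effect(p, a, d):
    """Least-squares slope of genotypic value on gene content z in {0, 1, 2}.

    Hypothetical setup: Hardy-Weinberg proportions, z = number of T alleles
    (frequency p), genotypic values -a (CC), d (CT), +a (TT).
    """
    q = 1 - p
    freq = {0: q * q, 1: 2 * p * q, 2: p * p}   # HWE genotype frequencies
    value = {0: -a, 1: d, 2: a}                 # genotypic values
    mean_z = sum(f * z for z, f in freq.items())
    mean_g = sum(f * value[z] for z, f in freq.items())
    cov_zg = sum(f * (z - mean_z) * (value[z] - mean_g) for z, f in freq.items())
    var_z = sum(f * (z - mean_z) ** 2 for z, f in freq.items())
    return cov_zg / var_z  # average effect of substituting C by T


p, a, d = Fraction(3, 10), Fraction(1), Fraction(1, 2)
alpha = substitution_effect(p, a, d)
print(alpha)                     # regression slope
print(a + d * ((1 - p) - p))     # textbook a + d(q - p)
```

With p = 0.3, a = 1 and d = 0.5 both lines print 6/5; exact fractions are used so that the comparison is not blurred by rounding.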
Why additive relationships
• Diploids transmit alleles, not genotypes, to their offspring
• Additive substitution effects adequately describe the genetic superiority of the offspring of one (selected) parent mated at random to a population
  • This is the reason why we use Breeding Values for genetic improvement
  • In practice, even if mating is not at random, they are very good guides
  • See the next talk for consideration of "non-random" matings
• Additive substitution effects ⇒ additive relationships

Relationships
• Relationships were conceived as standardized covariances (Fisher, Wright)
  • $\mathrm{Cov}(u_i, u_j) = Rel_{ij}\,\sigma^2_u$
  • $Rel_{ij}$: "some" relationship
  • $\sigma^2_u$: "some" variance component
• Genetic relationships are due to shared (Identical By State) alleles at causal genes
• These genes are unknown (and many will likely remain so)
• Use proxies:
  • Pedigree relationships
  • Marker relationships

Classical view of a pedigree
• Base animals are drawn from a very large, unselected population
• All 2n founder alleles are different
  • E.g., for blood group we would have 10 different alleles (there are actually 3)
• These assumptions are false but work reasonably well

Pedigree relationships
• (Malécot, Wright)
• Kinship or coancestry $\phi_{ij}$ of $i$ and $j$: probability that one allele taken at random from each individual is identical by descent
• Additive relationship $A_{ij} = 2\phi_{ij}$

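Computation of relationship coefficients from pedigree
(with $\Delta_1, \dots, \Delta_9$ the condensed identity coefficients of individuals X and Y)
$$a_{XY} = 2\Delta_1 + \Delta_3 + \Delta_5 + \Delta_7 + \tfrac{1}{2}\Delta_8$$
$$d_{XY} = \Delta_7 \qquad d^{*}_{XY} = \Delta_1$$
$$c_{XY} = \Delta_1 + \tfrac{1}{2}\Delta_5 \qquad c_{YX} = \Delta_1 + \tfrac{1}{2}\Delta_3$$
• e.g., following Karigl (1981)
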
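As a small worked check of these formulas (a sketch; the function name and dictionary keys are hypothetical), the code maps Jacquard's nine condensed identity coefficients to the additive, dominance and cross coefficients of this slide, using the standard values for non-inbred full sibs (Δ7 = 1/4, Δ8 = 1/2, Δ9 = 1/4) and for non-inbred parent and offspring (Δ8 = 1).

```python
def relationship_coefficients(delta):
    """Slide coefficients from Jacquard's condensed identity coefficients
    delta = (D1, ..., D9) of two individuals X and Y."""
    if len(delta) != 9 or abs(sum(delta) - 1.0) > 1e-12:
        raise ValueError("expected nine identity coefficients summing to 1")
    d1, d2, d3, d4, d5, d6, d7, d8, d9 = delta
    return {
        "a_XY": 2 * d1 + d3 + d5 + d7 + 0.5 * d8,  # additive relationship (2 x kinship)
        "d_XY": d7,                                # dominance relationship
        "c_XY": d1 + 0.5 * d5,
        "c_YX": d1 + 0.5 * d3,
    }


# Standard values for non-inbred relatives
full_sibs = (0, 0, 0, 0, 0, 0, 0.25, 0.5, 0.25)
parent_offspring = (0, 0, 0, 0, 0, 0, 0, 1.0, 0)
for name, delta in [("full sibs", full_sibs), ("parent-offspring", parent_offspring)]:
    print(name, relationship_coefficients(delta))
```

Both pairs have an additive relationship of 0.5, but only the full sibs share a dominance relationship (0.25), because only they can carry both alleles identical by descent.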
[Figure 2. The pedigree of the Jicaque Indians Julio and Mencha.]

Table 1. Detailed coefficients of identity for four pairs involving Julio, Mencha and their progeny.
Coefficient   Julio-Mencha   Progeny-progeny   Julio-progeny   Mencha-progeny
 1            0.01025        0.06580           0.03467         0.03467
 2            0.02393        0.04333           0.08252         0.00000
 3            0.02490        0.04333           0.00000         0.08252
 4            0.02393        0.04333           0.06665         0.06665
 5            0.02490        0.04333           0.08228         0.08228
 6            0.00708        0.00729           0.00000         0.00000
 7            0.05103        0.02383           0.00000         0.00000
 8            0.05103        0.02383           0.00000         0.00000
 9            0.05127        0.24713           0.08228         0.06665
10            0.10937        0.15900           0.29248         0.00000
11            0.16992        0.15900           0.00000         0.29248
12            0.00708        0.00729           0.06665         0.08228
13            0.05103        0.02383           0.00000         0.29248
14            0.05103        0.02383           0.29248         0.00000
15            0.34326        0.08582           0.00000         0.00000
Source: Garcia-Cortes, Genet Sel Evol 2015

Pedigree relationships: A
• Systematic "tabular" rules to compute any $A_{ij}$ (Emik & Terrill 1949)
• The whole array of $A_{ij}$ is arranged in a matrix $\mathbf{A}$.
• $\mathbf{A}^{-1}$ is very sparse and easy to create and manipulate (Henderson 1976)
• Extraordinary development of whole-pedigree methods in livestock genetics
  • E.g., computing inbreeding for 15 generations including $10^6$ sheep takes minutes

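A minimal sketch of the two pedigree tools mentioned here, assuming animals are numbered so that parents precede offspring (the pedigree and the function names are hypothetical): the tabular method fills A row by row, and Henderson's rules build A⁻¹ directly from each animal and its parents, here with Mendelian-sampling variances that account for inbred parents. Multiplying the two should return the identity matrix.

```python
def tabular_A(sires, dams):
    """Additive relationship matrix by the tabular method.

    Animals are numbered 0..n-1 with parents before offspring; None = unknown.
    """
    n = len(sires)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s, d = sires[i], dams[i]
        for j in range(i):
            # a_ij = half the sum of a_j,sire and a_j,dam over i's known parents
            A[i][j] = A[j][i] = sum(0.5 * A[j][par] for par in (s, d) if par is not None)
        # a_ii = 1 + F_i, with F_i = a_sire,dam / 2
        A[i][i] = 1.0 + (0.5 * A[s][d] if s is not None and d is not None else 0.0)
    return A


def henderson_A_inverse(sires, dams, A):
    """A^-1 by Henderson's rules; the Mendelian-sampling variance of each
    animal uses the parents' inbreeding, read from the diagonal of A."""
    n = len(sires)
    Ainv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        parents = [par for par in (sires[i], dams[i]) if par is not None]
        var_ms = 1.0 - sum(0.25 * A[par][par] for par in parents)
        alpha = 1.0 / var_ms
        Ainv[i][i] += alpha
        for par in parents:
            Ainv[i][par] -= alpha / 2
            Ainv[par][i] -= alpha / 2
            for other in parents:
                Ainv[par][other] += alpha / 4
    return Ainv


# Hypothetical pedigree: 0 and 1 founders, 2 = 0 x 1, 3 = 0 x 2, 4 = 2 x 3
sires = [None, None, 0, 0, 2]
dams = [None, None, 1, 2, 3]
A = tabular_A(sires, dams)
Ainv = henderson_A_inverse(sires, dams, A)
check = [[sum(A[i][k] * Ainv[k][j] for k in range(5)) for j in range(5)] for i in range(5)]
print("F of animals 3 and 4:", A[3][3] - 1, A[4][4] - 1)
```

In this toy pedigree animal 3 has F = 0.25 and animal 4 has F = 0.375. Real implementations store A⁻¹ sparsely and never form A; dense lists are used here only to keep the check readable.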
Pedigree relationships: A
• Early uses of markers were to infer pedigrees or relationships
  • Gather markers, then reconstruct pedigrees, then construct A
• In conservation genetics, molecular markers have often been used to estimate pedigree relationships
  • Either estimates of $A_{xy}$, or estimates of "the most likely relation" (son, daughter, cousins, whatever)
  • Li and Horvitz 1953, Cockerham 1969, Ritland 1996, Caballero & Toro 2002, and many others
• With abundant marker data we can do better than this

The infinitesimal world was a happy one
• There was a mythical large, unrelated "base" population from which everything originated
• Chromosomes "did not exist" for BLUPers
• Genes were Identical By State ONLY if they were Identical By Descent (IBD)
• Relationships could be computed from pedigree
• Dominance and additivity could "easily" be considered
• Only inbreeding resulted in complications

The genomic world is a cruel one
• What is a genomic base population?
• Chromosomes exist and are finite
• Markers are "measures" of DNA that are "read" (and cost money!!)
• We estimate relationships
  • …yet we do not agree on a common definition of relationships
  • (see recent reviews: E. A. Thompson, doi:10.1534/genetics.112.148825; Speed & Balding, doi:10.1038/nrg3821)

Realized relationships
• Identical By Descent relationships based on pedigree are average relationships, which assume infinitely many loci.
• "Real" IBD relationships $\tilde{\mathbf{A}}$ are a bit different due to finite genome size (Hill and Weir, 2011)
• Therefore $\mathbf{A}$ is the expectation of the realized relationships $\tilde{\mathbf{A}}$
• SNPs are more informative than A.
  • Two full sibs might have a correlation of 0.4 or 0.6
  • You need many markers to get these "fine relationships"

Traditional Pedigree
Animal
  Sire: Sire of Sire, Dam of Sire
  Dam: Sire of Dam, Dam of Dam
Source (this and the next three slides): VanRaden, Interbull annual meeting 2007

Genomic Pedigree
[Figure: genomic pedigree diagram]

Haplotype Pedigree
atagatcgatcg
ctgtagcgatcg
ctgtagcttagg
agatctagatcg
agggcgcgcagt
ctgtctagatcg
cgatctagatcg
atgtcgcgcagt
cggtagatcagt
agagatcgcagt
agagatcgatct
atgtcgctcacg
atggcgcgaacg
ctatcgctcagg

Genotype Pedigree
Count number of second allele:
121101011110
111211120200
101121101111
122221121111
101101111102
011111012011
121120011010
0 = homozygous for first allele (alphabetically)
1 = heterozygous
2 = homozygous for second allele (alphabetically)

Comparison of expected and observed variances – relationship/sharing
4401 full-sib pairs, 400-800 markers
         Expected   Observed
Mean     0.5        0.498
SD       0.039      0.036
Range               0.37-0.63
Source: Visscher et al.
Slide from W. G. Hill

Genomic relationships
Can be seen as:
• Estimators of whole-genome IBD relationships
  • Ritland 1996, VanRaden 2007 and many others
  • One such software from the pre-SNP era is SPAGeDi (Hardy and Vekemans)
• Relationships at the QTL loci
  • e.g., Nejati-Javaremi et al., 1997
• Both at the same time
  • The estimator of VanRaden 2007, 2008 (== Yang et al. 2010) has received great attention

Genomic relationships
• In practice, most of these estimators behave very similarly to each other
  • First, because they involve some form of distance across genotypes, and most distances are very similar
  • Second, because in a covariance/BLUP/REML/linear world they are often mathematically equivalent

IBS and IBD
• IBS at markers ($r_{ij}$) is a frequently used estimator of realized IBD ($\tilde{A}_{ij}$)
• Individuals can be identical by descent, or identical by state at the founders:
  $r_{ij} = \tilde{A}_{ij} + (2 - \tilde{A}_{ij})(p^2 + q^2)$
• Thus, IBS is biased upwards with respect to IBD.
• This has originated a bunch of estimators, with a common problem: where to get p from.
• For a detailed account, see Toro et al. (2011, Genet Sel Evol)

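The bias can be made concrete for one biallelic marker. This sketch (hypothetical helper names) evaluates the IBS/IBD relation of this slide for a founder allele frequency p and then inverts it, which is only possible if p is known; hence the question of where to get p from.

```python
def ibs_from_ibd(a_ibd, p):
    """Expected IBS relationship (0-2 scale) at one biallelic marker, given the
    realized IBD relationship a_ibd and founder allele frequency p."""
    s = p ** 2 + (1.0 - p) ** 2      # P(two founder alleles are IBS) = p^2 + q^2
    return a_ibd + (2.0 - a_ibd) * s


def ibd_from_ibs(r_ibs, p):
    """Invert the relation: a = (r - 2s) / (1 - s); requires knowing p."""
    s = p ** 2 + (1.0 - p) ** 2
    return (r_ibs - 2.0 * s) / (1.0 - s)


for p in (0.5, 0.2):
    for a in (0.0, 0.5, 1.0):
        r = ibs_from_ibd(a, p)
        print(f"p={p}  IBD={a:.2f}  IBS={r:.3f}  IBD recovered={ibd_from_ibs(r, p):.3f}")
```

With p = 0.5, unrelated individuals already show an IBS relationship of 1 on the 0-2 scale. The inversion is undefined for a monomorphic marker, where 1 - (p² + q²) = 0.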
The quantitative genetics of markers
• Consider gene content coding of {AA, Aa, aa} as m = {0, 1, 2}
• Cockerham, 1969:
  • For two individuals, the covariance of their gene contents is
    $\mathrm{Cov}(m_i, m_j) = \tilde{A}_{ij}\,2pq$
  • In other words, two related individuals will show similar genotypes at the markers
• This leads to the method of covariances of VanRaden (2008)

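Cockerham's result can be verified exactly by enumerating Mendelian transmissions. This is a sketch under simplifying assumptions (one biallelic locus, unrelated non-inbred parents in Hardy-Weinberg proportions; the helper names are hypothetical). Parent and offspring, and full sibs, both have a relationship of 0.5, so the covariance of their gene contents should equal 0.5 × 2pq = pq.

```python
from fractions import Fraction
from itertools import product


def allele_prob(allele, p):
    """Population frequency of an allele; allele 1 (counted in m) has frequency p."""
    return p if allele == 1 else 1 - p


def cov_parent_offspring(p):
    """Exact Cov(m_parent, m_offspring); the other parent is a random HWE individual."""
    e_xy = e_x = e_y = Fraction(0)
    for par in product((0, 1), repeat=2):
        w_par = allele_prob(par[0], p) * allele_prob(par[1], p)
        for k, other in product((0, 1), repeat=2):
            w = w_par * Fraction(1, 2) * allele_prob(other, p)  # k: transmitted copy
            x, y = par[0] + par[1], par[k] + other
            e_xy, e_x, e_y = e_xy + w * x * y, e_x + w * x, e_y + w * y
    return e_xy - e_x * e_y


def cov_full_sibs(p):
    """Exact Cov(m_i, m_j) for full sibs of two unrelated, non-inbred HWE parents."""
    e_xy = e_x = e_y = Fraction(0)
    for sire, dam in product(product((0, 1), repeat=2), repeat=2):
        w_par = Fraction(1)
        for allele in sire + dam:
            w_par *= allele_prob(allele, p)
        # each sib independently receives one random copy from each parent
        for s1, d1, s2, d2 in product((0, 1), repeat=4):
            w = w_par * Fraction(1, 16)
            x, y = sire[s1] + dam[d1], sire[s2] + dam[d2]
            e_xy, e_x, e_y = e_xy + w * x * y, e_x + w * x, e_y + w * y
    return e_xy - e_x * e_y


p = Fraction(3, 10)
two_pq = 2 * p * (1 - p)
print("parent-offspring:", cov_parent_offspring(p), "=", Fraction(1, 2) * two_pq)
print("full sibs:       ", cov_full_sibs(p), "=", Fraction(1, 2) * two_pq)
```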
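VanRaden's "first G"
Genotypes coded {0, 1, 2}
$$\mathbf{G} = \frac{(\mathbf{M} - 2\mathbf{P})(\mathbf{M} - 2\mathbf{P})'}{2\sum_i p_i q_i}$$
• $\mathbf{M} - 2\mathbf{P}$: genotypes shifted to refer to the average of a population with allele frequencies p
• $2\sum_i p_i q_i$: scaled to refer to the genetic variance of a population with allele frequencies p
• If base allelic frequencies are used, G is an unbiased and efficient estimator of realized IBD relationships
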
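A minimal sketch of VanRaden's first G in plain Python, using as toy input the 7 × 12 genotype matrix from the "Genotype Pedigree" slide (VanRaden 2007). Allele frequencies are estimated from these seven animals, an assumption made for illustration only (base-population frequencies are what make G unbiased); `vanraden_G` is a hypothetical name.

```python
# Genotypes of 7 animals x 12 SNPs from the "Genotype Pedigree" slide (VanRaden 2007),
# coded as the number of copies of the second allele.
GENOTYPES = [
    "121101011110",
    "111211120200",
    "101121101111",
    "122221121111",
    "101101111102",
    "011111012011",
    "121120011010",
]


def vanraden_G(M, p=None):
    """G = (M - 2P)(M - 2P)' / (2 sum p_i q_i); p defaults to sample frequencies."""
    n, L = len(M), len(M[0])
    if p is None:
        p = [sum(row[k] for row in M) / (2.0 * n) for k in range(L)]
    Z = [[row[k] - 2.0 * p[k] for k in range(L)] for row in M]   # centre: M - 2P
    scale = 2.0 * sum(pk * (1.0 - pk) for pk in p)                # scale: 2 sum p_i q_i
    return [[sum(zi[k] * zj[k] for k in range(L)) / scale for zj in Z] for zi in Z]


M = [[int(c) for c in row] for row in GENOTYPES]
G = vanraden_G(M)
for row in G:
    print(" ".join(f"{g:6.2f}" for g in row))
n = len(G)
print("average of diag(G):", sum(G[i][i] for i in range(n)) / n)
print("average of G:      ", sum(map(sum, G)) / n ** 2)   # ~0 when p comes from the sample
```

Because each column of M - 2P sums to zero when p is the sample frequency, the average of all elements of G is zero up to rounding.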
Some properties of G
• If p are computed from the sample
  • In HWE & Linkage Equilibrium:
    • Average of Diag(G) = 1
    • Average (G) = 0
  • With average inbreeding F:
    • Average of Diag(G) = 1 + F
Genotype frequencies with inbreeding F:
  AA: $q^2 + pqF$
  Aa: $2pq(1 - F)$
  aa: $p^2 + pqF$

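The 1 + F property can be checked for a single locus with exact arithmetic. This sketch (hypothetical function name, Fraction inputs assumed) uses the genotype frequencies listed on this slide and computes the expected diagonal element of G, E[(m - 2p)²]/(2pq), which reduces to 1 + F.

```python
from fractions import Fraction


def expected_diag_G(p, F):
    """Expected diagonal element of G at one locus, E[(m - 2p)^2] / (2pq),
    using the genotype frequencies of a population with inbreeding F."""
    q = 1 - p
    freq = {0: q * q + p * q * F,       # AA
            1: 2 * p * q * (1 - F),     # Aa
            2: p * p + p * q * F}       # aa (m counts allele a, frequency p)
    assert sum(freq.values()) == 1
    assert sum(m * f for m, f in freq.items()) == 2 * p   # mean gene content
    return sum(f * (m - 2 * p) ** 2 for m, f in freq.items()) / (2 * p * q)


p = Fraction(3, 10)
for F in (Fraction(0), Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)):
    print(f"F = {F}: E[diag(G)] = {expected_diag_G(p, F)}")
```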
Some intriguing properties of G
• If p are computed from the data
  • This implies that E(Breeding Values) = 0
• Positive and negative inbreeding
  • Some individuals are more heterozygous than the average of the population (OK, no biological problem)
• Positive and negative genomic relationships
  • This implies that individuals i and j are more distinct than an average pair of individuals in the data
  • Fixing negative estimates of relationships to 0 is wrong practice

Not positive definite
• Strandén & Christensen (2011) showed that if the p's are averages across the sample, then G is not positive definite (has no inverse)
• We could use BLUP equations with non-inverted G (Henderson, 1984) ⇒ see exercises
• Instead, we use
  $\mathbf{G} = 0.99\,\frac{(\mathbf{M} - 2\mathbf{P})(\mathbf{M} - 2\mathbf{P})'}{2\sum_i p_i q_i} + 0.01\,\mathbf{I}$
  or something similar

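The lack of an inverse can be seen directly: when p is the sample average, every row of G sums to zero, so G1 = 0 and G is singular. The sketch below (plain Python, hypothetical helper names, the toy genotypes of the "Genotype Pedigree" slide) shows this, attempts a Cholesky factorization of G, which is expected to fail, and then factorizes the blended 0.99G + 0.01I, whose eigenvalues are all at least 0.01.

```python
GENOTYPES = [
    "121101011110",
    "111211120200",
    "101121101111",
    "122221121111",
    "101101111102",
    "011111012011",
    "121120011010",
]


def vanraden_G(M):
    """G with allele frequencies computed from the sample itself."""
    n, L = len(M), len(M[0])
    p = [sum(row[k] for row in M) / (2.0 * n) for k in range(L)]
    Z = [[row[k] - 2.0 * p[k] for k in range(L)] for row in M]
    scale = 2.0 * sum(pk * (1.0 - pk) for pk in p)
    return [[sum(a * b for a, b in zip(zi, zj)) / scale for zj in Z] for zi in Z]


def cholesky(S, tol=1e-10):
    """Lower-triangular L with S = L L'; raises if S is not positive definite."""
    n = len(S)
    Lo = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(Lo[i][k] * Lo[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    raise ValueError(f"pivot {s:.2e} at row {i}: not positive definite")
                Lo[i][i] = s ** 0.5
            else:
                Lo[i][j] = s / Lo[j][j]
    return Lo


M = [[int(c) for c in row] for row in GENOTYPES]
G = vanraden_G(M)
n = len(G)
# Columns of M - 2P sum to zero, so G times a vector of ones is (numerically) zero
row_sums = [sum(row) for row in G]
print("max |G 1|:", max(abs(x) for x in row_sums))
try:
    cholesky(G)
except ValueError as err:
    print("G:", err)
# Blending with the identity makes the matrix positive definite
G_blend = [[0.99 * G[i][j] + (0.01 if i == j else 0.0) for j in range(n)] for i in range(n)]
L_blend = cholesky(G_blend)
print("blended G factorized; smallest pivot:", min(L_blend[i][i] for i in range(n)) ** 2)
```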
[Screenshot: Bernardo, R., "What If We Knew All the Genes for a Quantitative Trait in Hybrid Crops?", Perspectives, Volume 41, Number 1, January–February 2001]
ABSTRACT: Plant genomics programs are expected to decipher the sequence and function of genes controlling important traits. Most of the important traits in crops are quantitative and are controlled jointly by many loci. What if we knew all the genes for a quantitative trait in hybrid crops? Will genomics enhance hybrid crop breeding, which currently involves selection on the basis of phenotypes rather than gene information? With maize (Zea mays L.) as a model species, I found through computer simulation that gene information is most useful in selection when few loci (e.g., 10) control the trait. With many loci (≥50), the least squares estimates of gene effects become imprecise. Gene information consequently improves selection efficiency among hybrids by only 10% or less, and actually becomes detrimental to selection as more loci become known. Increasing the population size and trait heritability to improve the estimates of gene effects also improves phenotypic selection, leaving little room for improvement of selection efficiency via gene information. The typical reductionist approach in genomics therefore has limited potential for enhancing selection for quantitative traits in hybrid crops.
• Instead of trying to estimate QTL effects, we could use identity by state relationships at the QTL loci

IBS relationships at the QTL
[Excerpt from Nejati-Javaremi et al. (1997):]
$$TA_l = 2 \times \frac{1}{4}\sum_{i=1}^{2}\sum_{j=1}^{2} I_{ij} = \frac{1}{2}\sum_{i=1}^{2}\sum_{j=1}^{2} I_{ij} \qquad (1)$$
where $I_{ij}$ is the identity of the $i$th allele of the first individual with the $j$th allele of the second ($I_{ij}$ takes the value of 1 if the two alleles are identical and zero if they are not); TA is the total allelic relationship. The coefficient of 2 emphasizes that TA is twice the coefficient of relationship (Malécot, 1948) and is analogous to the numerator relationship (Wright, 1922). Total allelic identity of individuals x and y averaged over L loci affecting the trait is then
$$TA_{xy} = \frac{\sum_{l=1}^{L} TA_l}{L} = \frac{\sum_{l=1}^{L} \frac{1}{2}\sum_{i=1}^{2}\sum_{j=1}^{2} I_{lij}}{L} \qquad (2)$$
TA matrix for six individuals (1 and 2 are the parents; 3 to 6 are their progeny):
      1     2     3     4     5     6
1    1.8   0.8   1.2   1.6   1.2   1.6
2    0.8   1.4   1.0   1.2   1.2   1.2
3    1.2   1.0   1.2   1.2   1.0   1.2
4    1.6   1.2   1.2   1.8   1.4   1.8
5    1.2   1.2   1.0   1.4   1.4   1.4
6    1.6   1.2   1.2   1.8   1.4   1.8
The elements of the TA matrix follow directly from application of Equation 2 to the genotype information at the five loci. Although individuals 3 to 6 are all progeny of a single pair of parents, hence full sibs, their total allelic relationship ranges from 1.0 to 1.8, in contrast to the value of .5 for all progeny estimated from pedigree information. Their parents also have a total allelic relationship of .8 with each other, as compared with zero relationship in the pedigree method.

Relationship (2) can also be obtained in mathematical form, without counting, as (Toro et al. 2011):
$$r_{ij} = \frac{m_i m_j + (2 - m_i)(2 - m_j)}{2} = (m_i - 1)(m_j - 1) + 1$$
for one biallelic locus with gene content m coded {0, 1, 2}, averaged over loci as in (2).

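Equations (1) and (2) and the counting-free form can be checked against each other. In this sketch (hypothetical helper names; biallelic markers with gene content m in {0, 1, 2}), the IBS relationship is computed by explicitly comparing the 2 × 2 allele pairs and by the closed form, then averaged over the 12 markers of the toy "Genotype Pedigree" data, which gives the matrix (M - 1)(M - 1)'/L + 11'.

```python
GENOTYPES = [
    "121101011110",
    "111211120200",
    "101121101111",
    "122221121111",
    "101101111102",
    "011111012011",
    "121120011010",
]


def ibs_by_counting(mi, mj):
    """Eq. (1): half the number of identical pairs among the 2 x 2 allele comparisons."""
    alleles_i = ["B"] * mi + ["A"] * (2 - mi)   # m = number of B alleles
    alleles_j = ["B"] * mj + ["A"] * (2 - mj)
    return sum(x == y for x in alleles_i for y in alleles_j) / 2


def ibs_closed_form(mi, mj):
    """Counting-free form for a biallelic locus: (m_i - 1)(m_j - 1) + 1."""
    return (mi - 1) * (mj - 1) + 1


def G_ibs(M):
    """Eq. (2) averaged over L markers, in matrix form (M - 1)(M - 1)'/L + 11'."""
    L = len(M[0])
    return [[sum((a - 1) * (b - 1) for a, b in zip(ri, rj)) / L + 1 for rj in M] for ri in M]


M = [[int(c) for c in row] for row in GENOTYPES]
G = G_ibs(M)
L = len(M[0])
G_counted = [[sum(ibs_by_counting(a, b) for a, b in zip(ri, rj)) / L for rj in M] for ri in M]
for row in G:
    print(" ".join(f"{g:5.2f}" for g in row))
```

All entries lie between 0 and 2, as expected for twice a probability, and no allele frequencies are needed: this is the sense in which IBS relationships implicitly use p = 0.5.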
IBS relationships at the markers
• These relationships are twice probabilities, and hence range between 0 and 2 (no negative values)
• Because we do not know all QTLs, we use dense markers instead
Hence:
• $\mathbf{G}_{IBS}$ is a genomic relationship matrix based on Identity By State at the markers
• And $\mathbf{G}_{IBS}$ is also a biased estimate of realized IBD relationships $\tilde{\mathbf{A}}$ (as we saw before)

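Estimation of marker effects
• If every marker is a QTL, a way to estimate the marker effects $\mathbf{a}$ is to use a Bayesian method called RR-BLUP, SNP-BLUP, "ridge regression"…
$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{M}\mathbf{a} + \mathbf{e}$$
or, taking $\mathbf{Z} = \mathbf{M} - 2\mathbf{P}$,
$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{a} + \mathbf{e}$$
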
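From marker effects to covariances
• Under reasonable assumptions,
  • $\mathrm{Var}(\mathbf{a}) = \mathbf{I}\,\frac{\text{genetic variance}}{2\sum_i p_i q_i} = \mathbf{I}\sigma^2_a$
• Define the genetic value as $\mathbf{u} = \mathbf{Z}\mathbf{a}$
• The covariance of genetic values is $\mathrm{Cov}(\mathbf{u}) = \mathbf{Z}\mathbf{Z}'\,\frac{\text{genetic variance}}{2\sum_i p_i q_i}$
So the relationship matrix for this model is, again,
$$\mathbf{G} = \frac{\mathbf{Z}\mathbf{Z}'}{2\sum_i p_i q_i}$$
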
Recapitulation
• By three ways (estimation of realized relationships, IBS at the QTL, RR-BLUP) we arrive at similar or identical models, and in particular at
$$\mathbf{G} = \frac{(\mathbf{M} - 2\mathbf{P})(\mathbf{M} - 2\mathbf{P})'}{2\sum_i p_i q_i}$$
• This is because the same concepts are used over and over
• But are these GBLUPs really the same?

GBLUP == RRBLUP
• The equivalence is tersely shown in VanRaden 2008 and fully shown in Strandén and Garrick 2009 (JDS)
• More equivalences are shown in Strandén and Christensen 2011 (GSE)
• Shifting of genotypes in $\mathbf{Z} = \mathbf{M} - 2\mathbf{P}$
  • Irrelevant if there is an overall mean or fixed effect in the model
  • EBVs are shifted by a constant
• Scaling G by $2\sum_i p_i q_i$ and the marker-effect prior variance $\sigma^2_a$ must be equivalent, i.e., $\sigma^2_a = \frac{\text{genetic variance}}{2\sum_i p_i q_i}$
  • Often this is not done, because variances are estimated by, e.g., REML, which explains the minor differences

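GBLUP == RRBLUP
• We can jump from GBLUP to RRBLUP:
$$\hat{\mathbf{u}} = \mathbf{Z}\hat{\mathbf{a}} \qquad \hat{\mathbf{a}} = \frac{1}{2\sum_i p_i q_i}\,\mathbf{Z}'\mathbf{G}^{-1}\hat{\mathbf{u}}$$
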
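The equivalence can be checked numerically. The sketch below uses the toy genotypes of the "Genotype Pedigree" slide, hypothetical phenotypes and variance components, and a model whose only fixed effect is an overall mean. GBLUP is solved through V = Gσ²_u + Iσ²_e, because with p taken from the sample G has no inverse; marker effects are back-solved as σ²_a Z'V⁻¹(y - 1μ̂), which equals Z'G⁻¹û/(2Σp_iq_i) whenever G is invertible. RR-BLUP is solved from its own mixed model equations with λ = σ²_e/σ²_a. `solve` is a hypothetical helper, not a library API.

```python
GENOTYPES = [
    "121101011110", "111211120200", "101121101111", "122221121111",
    "101101111102", "011111012011", "121120011010",
]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    W = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(W[r][c]))
        W[c], W[piv] = W[piv], W[c]
        for r in range(c + 1, n):
            f = W[r][c] / W[c][c]
            for k in range(c, n + 1):
                W[r][k] -= f * W[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (W[r][n] - sum(W[r][k] * x[k] for k in range(r + 1, n))) / W[r][r]
    return x


M = [[int(c) for c in row] for row in GENOTYPES]
y = [10.2, 9.1, 11.5, 10.8, 9.7, 10.1, 11.0]   # hypothetical phenotypes
var_u, var_e = 1.0, 2.0                         # hypothetical variance components
n, L = len(M), len(M[0])
p = [sum(r[k] for r in M) / (2.0 * n) for k in range(L)]
Z = [[r[k] - 2.0 * p[k] for k in range(L)] for r in M]
two_sum_pq = 2.0 * sum(pk * (1.0 - pk) for pk in p)
var_a = var_u / two_sum_pq                      # marker-effect variance
G = [[sum(a * b for a, b in zip(zi, zj)) / two_sum_pq for zj in Z] for zi in Z]

# GBLUP via V = G var_u + I var_e (G itself is singular here, so it is not inverted)
V = [[G[i][j] * var_u + (var_e if i == j else 0.0) for j in range(n)] for i in range(n)]
Vinv_1, Vinv_y = solve(V, [1.0] * n), solve(V, y)
mu_gblup = sum(Vinv_y) / sum(Vinv_1)            # GLS estimate of the mean
w = solve(V, [yi - mu_gblup for yi in y])       # V^-1 (y - 1 mu)
u_gblup = [var_u * sum(G[i][j] * w[j] for j in range(n)) for i in range(n)]
# Back-solving marker effects: var_a Z' V^-1 (y - 1 mu)
a_back = [var_a * sum(Z[i][k] * w[i] for i in range(n)) for k in range(L)]

# RR-BLUP mixed model equations for (mu, a), with lambda = var_e / var_a
lam = var_e / var_a
C = [[float(n)] + [sum(Z[i][k] for i in range(n)) for k in range(L)]]
for k in range(L):
    C.append([sum(Z[i][k] for i in range(n))]
             + [sum(Z[i][k] * Z[i][m] for i in range(n)) + (lam if k == m else 0.0)
                for m in range(L)])
rhs = [sum(y)] + [sum(Z[i][k] * y[i] for i in range(n)) for k in range(L)]
sol = solve(C, rhs)
mu_rr, a_rr = sol[0], sol[1:]
u_rr = [sum(Z[i][k] * a_rr[k] for k in range(L)) for i in range(n)]

for ug, ur in zip(u_gblup, u_rr):
    print(f"GBLUP {ug:8.4f}   RR-BLUP Z a {ur:8.4f}")
```

The two columns agree to rounding error, and so do the back-solved and directly estimated marker effects, because both models imply the same covariance matrix for y.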
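GBLUP == GBLUP
• For all matrices of the kind $\mathbf{G} = \frac{(\mathbf{M} - 2\mathbf{P})(\mathbf{M} - 2\mathbf{P})'}{2\sum_i p_i q_i}$:
  • Changing allele frequencies in $\mathbf{P}$ shifts EBVs by a constant
  • Changing allele frequencies in $\frac{1}{2\sum_i p_i q_i}$ "scales" G
    • But we can compensate through a change in the "genetic variance"
    • E.g., $10 \begin{bmatrix}1.1 & 0.55\\ 0.55 & 1.1\end{bmatrix} = 11 \begin{bmatrix}1 & 0.5\\ 0.5 & 1\end{bmatrix}$
• So, if variances are estimated by REML or "corrected" according to $2\sum_i p_i q_i$, results should be identical: see exercises
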
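A sketch of both statements under the same toy setup (genotypes of the "Genotype Pedigree" slide, hypothetical phenotypes and variances, an overall mean as the only fixed effect): GBLUP is run once with sample allele frequencies and once with P = 0.5, each time setting the "genetic variance" to σ²_a·2Σp_iq_i so that the change of scale is compensated. The EBVs should then differ only by a constant, which the mean absorbs. `solve` and `gblup` are hypothetical helpers.

```python
GENOTYPES = [
    "121101011110", "111211120200", "101121101111", "122221121111",
    "101101111102", "011111012011", "121120011010",
]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    W = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(W[r][c]))
        W[c], W[piv] = W[piv], W[c]
        for r in range(c + 1, n):
            f = W[r][c] / W[c][c]
            for k in range(c, n + 1):
                W[r][k] -= f * W[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (W[r][n] - sum(W[r][k] * x[k] for k in range(r + 1, n))) / W[r][r]
    return x


def gblup(M, y, p, var_a, var_e):
    """GBLUP with Z = M - 2P and genetic variance var_u = var_a * 2 sum p_i q_i,
    i.e. the genetic variance is rescaled together with G."""
    n, L = len(M), len(M[0])
    Z = [[r[k] - 2.0 * p[k] for k in range(L)] for r in M]
    two_sum_pq = 2.0 * sum(pk * (1.0 - pk) for pk in p)
    var_u = var_a * two_sum_pq
    G = [[sum(a * b for a, b in zip(zi, zj)) / two_sum_pq for zj in Z] for zi in Z]
    V = [[G[i][j] * var_u + (var_e if i == j else 0.0) for j in range(n)] for i in range(n)]
    Vinv_1, Vinv_y = solve(V, [1.0] * n), solve(V, y)
    mu = sum(Vinv_y) / sum(Vinv_1)              # GLS estimate of the mean
    w = solve(V, [yi - mu for yi in y])
    return [var_u * sum(G[i][j] * w[j] for j in range(n)) for i in range(n)]


M = [[int(c) for c in row] for row in GENOTYPES]
y = [10.2, 9.1, 11.5, 10.8, 9.7, 10.1, 11.0]   # hypothetical phenotypes
var_a, var_e = 0.2, 2.0                         # hypothetical variances
n, L = len(M), len(M[0])
p_sample = [sum(r[k] for r in M) / (2.0 * n) for k in range(L)]
p_half = [0.5] * L                              # e.g. the P implicit in G_IBS
u_sample = gblup(M, y, p_sample, var_a, var_e)
u_half = gblup(M, y, p_half, var_a, var_e)
shift = [uh - us for uh, us in zip(u_half, u_sample)]
print("differences:", " ".join(f"{s:.6f}" for s in shift))
```

All printed differences are the same number: changing P changes the EBVs of every animal by the same amount, which has no consequence for ranking or selection.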
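GBLUP == GBLUP with IBS
• In fact, $\mathbf{G}_{IBS} = \frac{(\mathbf{M} - 2\mathbf{P})(\mathbf{M} - 2\mathbf{P})'}{L} + \mathbf{1}\mathbf{1}'$, with L the number of markers
  • With $\mathbf{P}$ containing 0.5
• So, $\mathbf{G}_{IBS}$ is a "VanRaden-style" G matrix
• Again, using the right variance components we get the same EBVs
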
OK, so what should I use?
• It does not matter much; all models are equivalent
  • If using REML or Bayesian methods, you get the variance components right
  • If you use pre-estimated variance components, you want to use comparable variances
    • This is a bit tricky, but in most cases the "default" G works just fine
• Comparing variance components and $h^2$ across G's gets tricky
  • E.g., $\mathbf{G}_{IBS}$ overestimates $h^2$ with respect to A
  • See Legarra 2016, TPB, for these aspects

Compatibility of marker and pedigree relationships
• Populations evolve with time, but genotypes came years after the pedigree started
• Genomic predictions are shifted with respect to pedigree predictions
• Compatibility is achieved if both relationships refer to the same genetic base:
  • Same average BV at the base
  • Same genetic variance at the base
• Quite active work for SSGBLUP

Finally, the Single-Step GBLUP
• You want to combine G in part of the population with A in the whole population
• But in fact, A contains information about the "likely" genotypes of animals that have not been genotyped (e.g., the daughter of an animal "AA" will receive an "A" allele)

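Covariances of all individuals
(Legarra et al. 2009; Aguilar et al., 2010; Christensen & Lund, 2010)
Let
$$\mathbf{A} = \begin{bmatrix}\mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22}\end{bmatrix}$$
where subscript 1 denotes non-genotyped and subscript 2 genotyped individuals. Then
$$\mathrm{Var}\begin{pmatrix}\mathbf{u}_1\\ \mathbf{u}_2\end{pmatrix} = \mathbf{H} = \begin{bmatrix}\mathbf{H}_{11} & \mathbf{H}_{12}\\ \mathbf{H}_{21} & \mathbf{H}_{22}\end{bmatrix} = \begin{bmatrix}\mathbf{A}_{11} - \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{A}_{21} + \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{G}\mathbf{A}_{22}^{-1}\mathbf{A}_{21} & \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{G}\\ \mathbf{G}\mathbf{A}_{22}^{-1}\mathbf{A}_{21} & \mathbf{G}\end{bmatrix}$$
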
Covariances of all individuals
In $\mathbf{H}$:
• $\mathbf{G}$ comes from genotypes
• $\mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{G}\mathbf{A}_{22}^{-1}\mathbf{A}_{21}$ is the variance of the prediction of genotypes from genotyped to non-genotyped individuals
• $\mathbf{A}_{11} - \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{A}_{21}$ is the error in the prediction
• The prediction "generates" a covariance: $\mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{G}$

• Incredibly, $\mathbf{H}^{-1}$ is very simple:
$$\mathbf{H}^{-1} = \mathbf{A}^{-1} + \begin{bmatrix}\mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1}\end{bmatrix}$$
  • $\mathbf{A}^{-1}$: inverse of the regular pedigree relationship matrix
  • $+\mathbf{G}^{-1}$: correcting for genomic relationships…
  • $-\mathbf{A}_{22}^{-1}$: …and avoiding "double counting"

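The simplicity of H⁻¹ can be checked on a toy example. This sketch assumes a hypothetical five-animal pedigree in which two full sibs are genotyped, with a made-up positive definite G for them; it builds H block by block, inverts it directly, and compares the result with A⁻¹ plus the correction G⁻¹ - A₂₂⁻¹ added to the genotyped block. All helper names are hypothetical, and dense inversion is used only for the check; large single-step evaluations are not computed this way.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    W = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(W[r][c]))
        W[c], W[piv] = W[piv], W[c]
        for r in range(c + 1, n):
            f = W[r][c] / W[c][c]
            for k in range(c, n + 1):
                W[r][k] -= f * W[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (W[r][n] - sum(W[r][k] * x[k] for k in range(r + 1, n))) / W[r][r]
    return x


def inverse(A):
    """Dense inverse, column by column (toy sizes only)."""
    n = len(A)
    cols = [solve(A, [1.0 if r == c else 0.0 for r in range(n)]) for c in range(n)]
    return [[cols[c][r] for c in range(n)] for r in range(n)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def tabular_A(sires, dams):
    """Additive relationships by the tabular method (parents before offspring)."""
    n = len(sires)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s, d = sires[i], dams[i]
        for j in range(i):
            A[i][j] = A[j][i] = sum(0.5 * A[j][par] for par in (s, d) if par is not None)
        A[i][i] = 1.0 + (0.5 * A[s][d] if s is not None and d is not None else 0.0)
    return A


def block(A, rows, cols):
    return [[A[r][c] for c in cols] for r in rows]


# Hypothetical pedigree: 0, 1 founders; 2 and 3 full sibs (0 x 1); 4 = 2 x unknown dam
A = tabular_A([None, None, 0, 0, 2], [None, None, 1, 1, None])
ngen, gen = [0, 1, 4], [2, 3]          # index set 1 (non-genotyped), 2 (genotyped)
order = ngen + gen
A11, A12, A22 = block(A, ngen, ngen), block(A, ngen, gen), block(A, gen, gen)
G = [[1.05, 0.45], [0.45, 0.95]]       # hypothetical genomic relationships of 2 and 3

A22inv = inverse(A22)
B = matmul(A12, A22inv)                      # A12 A22^-1
BA21 = matmul(B, transpose(A12))             # A12 A22^-1 A21
BGBt = matmul(matmul(B, G), transpose(B))    # A12 A22^-1 G A22^-1 A21
H11 = [[A11[i][j] - BA21[i][j] + BGBt[i][j] for j in range(3)] for i in range(3)]
H12 = matmul(B, G)
H = [H11[i] + H12[i] for i in range(3)] + [transpose(H12)[i] + G[i] for i in range(2)]

# Shortcut: H^-1 = A^-1 + [[0, 0], [0, G^-1 - A22^-1]] in the same animal order
Hinv_short = inverse(block(A, order, order))
Ginv = inverse(G)
for r in range(2):
    for c in range(2):
        Hinv_short[3 + r][3 + c] += Ginv[r][c] - A22inv[r][c]

Hinv_direct = inverse(H)
err = max(abs(Hinv_direct[i][j] - Hinv_short[i][j]) for i in range(5) for j in range(5))
print("max |H^-1 (direct) - H^-1 (shortcut)|:", err)
```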
Fun
• Relationships across technically unrelated populations

Breeds:
• Lacaune
• Manech Tête Rousse (MTR)
• Manech Tête Noire (MTN)
• Latxa Cara Negra - Euskadi (LCN-EUS)
• Latxa Cara Negra - Navarre (LCN-NA)
• Latxa Cara Rubia (LCR)
• Basco-Béarnaise (BB)

PCA
• The first component distinguishes Lacaune from the rest
• Two Lacaune sub-populations, corresponding to two AI centers (that do not exchange)
• LCR and MTR overlap (recent exchanges of rams)
• LCN-NA lies between MTN and LCN-EUS (exchanges, but less frequent)
• BB is isolated

[Figure: PCA of 7 dairy sheep breeds, PC 1 vs PC 2; groups labelled Lacaune, BB, LCN-EUS, LCN-NA, LCR, MTN and MTR]

As a conclusion
• Markers have changed our way of thinking about relationships
• We can copy many, but not all, concepts from pedigree relationships to genomic relationships
  • E.g., genomic relationships are not restricted to the range of probabilities
• Most derivations are plain transpositions from classic papers
• Most information is "out there", but reading and linking the different pieces of information takes time
• Take your dataset and have fun

More info:
• Financing through INRA SelGen metaprogram
• Notes on my web page
• UGA Ignacy Misztal's course (http://nce.ads.uga.edu)
• and…

http://icms.org.uk/workshops/statistical
In particular, see David Balding's talk

References
Aguilar, I., I. Misztal, D. L. Johnson, A. Legarra, S. Tsuruta et al., 2010 Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J Dairy Sci 93: 743-752.
Aguilar, I., I. Misztal, A. Legarra and S. Tsuruta, 2011 Efficient computations of genomic relationship matrix and other matrices used in the single-step evaluation. Journal of Animal Breeding and Genetics 128: 422-428.
Emik, L. O., and C. E. Terrill, 1949 Systematic procedures for calculating inbreeding coefficients. J Hered 40: 51-55.
Li, C. C., and D. G. Horvitz, 1953 Some methods of estimating the inbreeding coefficient. Am J Hum Genet 5: 107-117.
Cockerham, C. C., 1969 Variance of gene frequencies. Evolution 23: 72-84.
Ritland, K., 1996 Estimators for pairwise relatedness and individual inbreeding coefficients. Genetical Research 67: 175-185.
Caballero, A., and M. A. Toro, 2002 Analysis of genetic diversity for the management of conserved subdivided populations. Conservation Genetics 3: 289.
VanRaden, P. M., 2008 Efficient methods to compute genomic predictions. J Dairy Sci 91: 4414-4423.
Hill, W. G., and B. S. Weir, 2011 Variation in actual relationship as a consequence of Mendelian sampling and linkage. Genet Res (Camb): 1-18.
Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders et al., 2010 Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42: 565-569.
VanRaden, P. M., C. P. Van Tassell, G. R. Wiggans, T. S. Sonstegard, R. D. Schnabel et al., 2009 Invited review: reliability of genomic predictions for North American Holstein bulls. J Dairy Sci 92: 16-24.
Legarra, A., I. Aguilar and I. Misztal, 2009 A relationship matrix including full pedigree and genomic information. J Dairy Sci 92: 4656-4663.
Christensen, O. F., and M. S. Lund, 2010 Genomic prediction when some animals are not genotyped. Genet Sel Evol 42: 2.
Tanner, M. A., and W. H. Wong, 1987 The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association 82: 528-540.
Gengler, N., P. Mayeres and M. Szydlowski, 2007 A simple method to approximate gene content in large pedigree populations: application to the myostatin gene in dual-purpose Belgian Blue cattle. Animal 1: 21-28.
McPeek, M. S., X. Wu and C. Ober, 2004 Best linear unbiased allele-frequency estimation in complex pedigrees. Biometrics 60: 359-367.
Christensen, O. F., 2012 Compatibility of pedigree-based and marker-based relationship matrices for single-step genetic evaluation. Genet Sel Evol 44: 37.
Lourenco, D., I. Misztal, S. Tsuruta, I. Aguilar, T. Lawlor et al., 2014 Are evaluations on young genotyped animals benefiting from the past generations? J Dairy Sci 97: 3930-3942.
Chen, C., I. Misztal, I. Aguilar, S. Tsuruta, S. Aggrey et al., 2011 Genome-wide marker-assisted selection combining all pedigree phenotypic information with genotypic data in one step: an example using broiler chickens. Journal of Animal Science 89: 23-28.
Vitezica, Z., I. Aguilar, I. Misztal and A. Legarra, 2011 Bias in genomic predictions for populations under selection. Genetics Research: in press.
Jacquard, A., 1970 Genetic structures of populations (Structures génétiques des populations).
General review:
Legarra, A., O. F. Christensen, I. Aguilar and I. Misztal, 2014 Single Step, a general approach for genomic selection. Livestock Science.

Thanks
• Leopoldo Alfonso and Catherine Bastien for the invitation
• My long-time collaborators and discussion partners on this topic: I. Misztal, I. Aguilar, M. A. Toro, L. A. Garcia-Cortes, L. Varona, Z. G. Vitezica, O. F. Christensen, P. M. VanRaden, J. M. Elsen, A. Ricard and many others
• Financing through INRA SelGen metaprogram