
Quantitative Trait Loci, QTL
An introduction to quantitative genetics and common methods for mapping loci underlying continuous traits

Why study quantitative traits?
• Many (most) human traits and disorders are complex in the sense that they are governed by several genetic loci and are also influenced by environmental agents;
• Many of these traits vary continuously and need specialized statistical models and methods for localizing and estimating the genetic contributions;
• In addition, in several cases there are potential benefits from studying continuously varying quantities as opposed to a binary affected/unaffected response.

For example:
• in a study of risk factors, the underlying quantitative phenotypes that predispose to disease may be more etiologically homogeneous than the disease phenotype itself;
• some qualitative phenotypes occur once a threshold for susceptibility has been exceeded, e.g. type 2 diabetes, obesity, etc.;
• in such a case the binary phenotype (affected/unaffected) is not as informative as the actual phenotypic measurements.

A pedigree representation
[Figure: pedigree diagram.]

Variance and variability
• methods for linkage analysis of QTL in humans rely on a partitioning of the total variability of trait values;
• in statistical theory, the variance is the expected squared deviation around the mean value $\mu_Y = E(Y)$: $V(Y) = E[(Y - \mu_Y)^2]$;
• it can be estimated from data as $s^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$;
• the square root of the variance is called the standard deviation.

A simple model for the phenotype
$Y = X + e$, where
• Y is the phenotypic value, i.e. the trait value;
• X is the genotypic value, i.e. the mean or expected phenotypic value given the genotype;
• e is the environmental deviation, with mean 0;
• we assume that the total phenotypic variance is the sum of the genotypic variance and the environmental variance, $V(Y) = V(X) + V(e)$, i.e.
the environmental contribution is assumed independent of the genotype of the individual.

Distribution of Y: a single biallelic locus
[Figure: distribution of Y for a single biallelic locus.]

A single biallelic locus: genetic effects
Genotype          A1A1    A1A2        A2A2
Genotypic value   0       (1 + k)a    2a
• a is the homozygous effect;
• k is the dominance coefficient;
• k = 0 means complete additivity;
• k = 1 means complete dominance (of A2);
• k > 1 if A2 is overdominant.

Example: the pygmy gene, pg
• From data we have the following mean weights: $X_{++} = 14$ g, $X_{+pg} = 12$ g, $X_{pgpg} = 6$ g;
• $2a = 14 - 6 = 8$ implies $a = 4$;
• $(1 + k)a = 12 - 6 = 6$ implies $k = 0.5$.
The data suggest that the pygmy gene is recessive, although not completely.

Decomposition of the genotypic value, X
• $X_{ij}$ is the mean of Y for $A_iA_j$ individuals;
• when k = 0 the two alleles of a biallelic locus behave in a completely additive fashion: X is a linear function of the number of A2 alleles;
• we can then think of each allele as contributing a purely additive effect to X;
• this can be generalized to k ≠ 0 by decomposing X into additive contributions of alleles together with deviations resulting from dominance;
• the generalization is accomplished using least-squares regression of X on the gene content.

Least-squares linear regression
• $X = \hat{X} + \delta$, i.e. fitted value + residual deviation;
• minimize the sum of squared residuals;
• $V(X) = V(\hat{X}) + V(\delta)$, the variance decomposition.

Model
• $X_{ij} = \hat{X}_{ij} + \delta_{ij} = \mu + \alpha_i + \alpha_j + \delta_{ij}$;
• $\mu$ is the population mean phenotype, $\alpha_i$ is the additive effect of allele $A_i$, and $\delta_{ij}$ is the residual deviation due to dominance;
• $\hat{X}_{ij} = \mu + \alpha_1 N_1 + \alpha_2 N_2$, with $N_k$ the number of $A_k$ alleles in the genotype;
• hence $\hat{X}_{ij} = \mu + 2\alpha_1$ for A1A1, $\mu + \alpha_1 + \alpha_2$ for A1A2, and $\mu + 2\alpha_2$ for A2A2;
• with $\alpha_1 p_1 + \alpha_2 p_2 = 0$ and $\alpha = \alpha_2 - \alpha_1$ it follows that $\alpha_1 = -p_2\alpha$ and $\alpha_2 = p_1\alpha$.

Interpretations
• in the linear regression $X = \hat{X} + \delta$, $\hat{X}$ is the heritable component of the genotype and $\delta$ is the non-heritable part;
• the sum of an individual's additive allelic effects, $\alpha_i + \alpha_j$, is called the breeding value and is denoted $\Lambda_{ij}$;
• under random mating $\alpha_i$ can be interpreted as the average excess of allele $A_i$;
• this is defined as the difference between the expected phenotypic value when one allele (e.g. the paternally transmitted one) is fixed at $A_i$ and the population average, $\mu$.

Linear regression
• $p_k$ = proportion of $A_k$ alleles in the population;
• the expected additive effect of a randomly drawn allele is 0, i.e. $\alpha_1 p_1 + \alpha_2 p_2 = 0$;
• which implies that the corresponding population variance is $\alpha_1^2 p_1 + \alpha_2^2 p_2$;
• since $N_1 = 2 - N_2$ for a biallelic locus, $X_{ij} = \tilde{\mu} + \alpha N_2 + \delta_{ij}$, where $\tilde{\mu} = \mu + 2\alpha_1$ and $\alpha = \alpha_2 - \alpha_1$.

Graphically
[Figure: regression of the genotypic value X on the number of A2 alleles.]

Linear regression model: solving for α
• $X_{ij} = \tilde{\mu} + \alpha N_2 + \delta_{ij}$
X           N2    prob.
0           0     $p_1^2$
a(1 + k)    1     $2p_1p_2$
2a          2     $p_2^2$
• $\alpha = \mathrm{Cov}(X, N_2) / \mathrm{Var}(N_2)$
• $E(X) = a(1+k)\cdot 2p_1p_2 + 2a\,p_2^2 = 2ap_2(1 + p_1k)$
• $V(X) = a^2(1+k)^2\cdot 2p_1p_2 + 4a^2p_2^2 - 4a^2p_2^2(1 + p_1k)^2$
• $E(N_2) = 2p_2$, $\mathrm{Var}(N_2) = 2p_1p_2$
• $E(XN_2) = a(1+k)\cdot 1\cdot 2p_1p_2 + 2a\cdot 2\cdot p_2^2 = 2ap_2\,(2p_2 + p_1(1+k))$
• $\mathrm{Cov}(X, N_2) = 2ap_2\,[-2p_2(1 + p_1k) + 2p_2 + p_1(1+k)] = 2ap_1p_2\,[1 + k - 2p_2k] = 2ap_1p_2\,[1 + k(p_1 - p_2)]$
• $\Rightarrow \alpha = a\,[1 + k(p_1 - p_2)]$

Average excesses
• $\alpha_i^* = E(X \mid \text{one allele is } A_i) - \mu_X$
• $\alpha_1^* = X_{12}\,P(\text{other allele is } A_2 \mid A_1) + X_{11}\,P(\text{other allele is } A_1 \mid A_1) - \mu_X$
• under random mating, $\alpha_1^* = X_{12}p_2 + X_{11}p_1 - \mu_X = (\alpha_1 + \alpha_2)p_2 + 2\alpha_1p_1 = \alpha_1$, where the second step uses that the dominance deviations average to zero ($\delta_{12}p_2 + \delta_{11}p_1 = 0$) and the last step uses $\alpha_1p_1 + \alpha_2p_2 = 0$.

Interpretations under random mating
• $\alpha = a\,[1 + k(p_1 - p_2)]$; $\alpha_1 = -p_2\alpha$; $\alpha_2 = p_1\alpha$.

Population parameters for k ≠ 0
• $\alpha$ is called the average effect of allelic substitution: substitute A1 → A2 for a randomly chosen A1 allele;
• the expected change in X is then $(X_{12} - X_{11})\,p_1 + (X_{22} - X_{12})\,p_2$;
• which equals $\alpha$ (a simple calculation).
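The single-locus quantities above can be checked numerically. The following is a minimal Python sketch and not part of the original slides: the function names are hypothetical, and the allele frequency p2 = 0.3 is an assumed value chosen only for illustration. It recovers a and k from the pygmy means and compares the expected change under allelic substitution with α = a[1 + k(p1 - p2)].

```python
def genetic_effects(x_low, x_het, x_high):
    """Return (a, k) from the three genotypic means.

    x_low is the mean of the baseline homozygote (genotypic value 0),
    x_het the heterozygote mean, x_high the other homozygote mean.
    """
    a = (x_high - x_low) / 2.0        # 2a = difference between the homozygotes
    k = (x_het - x_low) / a - 1.0     # (1 + k)a = heterozygote minus baseline
    return a, k


def allelic_effects(a, k, p2):
    """Return (alpha, alpha1, alpha2) under random mating."""
    p1 = 1.0 - p2
    alpha = a * (1.0 + k * (p1 - p2))  # average effect of allelic substitution
    return alpha, -p2 * alpha, p1 * alpha


def substitution_change(a, k, p2):
    """Expected change in X when a randomly chosen A1 allele becomes A2."""
    p1 = 1.0 - p2
    x11, x12, x22 = 0.0, a * (1.0 + k), 2.0 * a
    return (x12 - x11) * p1 + (x22 - x12) * p2


# Pygmy example from the slides: pg/pg = 6 g, +/pg = 12 g, +/+ = 14 g.
a, k = genetic_effects(6.0, 12.0, 14.0)
print(a, k)  # 4.0 0.5

# p2 = 0.3 is an assumed allele frequency, not a value from the slides.
alpha, alpha1, alpha2 = allelic_effects(a, k, p2=0.3)
print(alpha, substitution_change(a, k, p2=0.3))
```

The two printed values on the last line should agree up to floating-point rounding, and the allelic effects satisfy p1·α1 + p2·α2 = 0 by construction.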
α: average effect of allelic substitution (A1 → A2)
$p_1(X_{12} - X_{11}) + p_2(X_{22} - X_{12}) = p_1\,a(1+k) + p_2\,a(1-k) = a\,[1 + k(p_1 - p_2)] = \alpha$
[Figure: α as a function of p2 and k.]

Partitioning the genetic variance
• the variance, V(X), of the genotypic values in a population is called the genetic variance: $V(X) = V(\hat{X} + \delta) = V(\hat{X}) + V(\delta) = V_A + V_D$;
• $V_A = 2p_1p_2\alpha^2 = 2(p_1\alpha_1^2 + p_2\alpha_2^2)$ is the additive genetic variance, i.e. the variance associated with additive allelic effects;
• $V_D = (2p_1p_2ak)^2$ is the dominance genetic variance, i.e. the variance due to dominance deviations.

VA
• from the allelic effects: $V_A = 2(p_1\alpha_1^2 + p_2\alpha_2^2)$, with $p_1\alpha_1 + p_2\alpha_2 = 0$;
• from the linear regression: $V_A = \alpha^2\,\mathrm{Var}(N_2) = \alpha^2\,[(2p_1p_2 + 4p_2^2) - (2p_1p_2 + 2p_2^2)^2] = 2p_1p_2\alpha^2 = 2p_1p_2a^2\,[1 + k(p_1 - p_2)]^2$.

V(X), VA and VD are functions of p2 and k
[Figure: V(X), VA (dashed) and VD (dotted) as functions of p2 and k.]
• $V_A = 2p_1p_2\,[a(1 + k(p_1 - p_2))]^2$;
• $V_D = (2p_1p_2ak)^2$.

Example: the Booroola gene (Lynch and Walsh, 1998)

In summary
• The homozygous effect, a, and the dominance coefficient, k, are intrinsic properties of allelic products.
• The additive effect, $\alpha_i$, and the average excess, $\alpha_i^*$, are properties of alleles in a particular population.
• The breeding value is a property of a particular individual in reference to a particular population. It is the sum of the additive effects of an individual's alleles.
• The additive genetic variance, $V_A$, is a property of a particular population. It is the variance of the breeding values of individuals in the population.

Multilocus traits
• Do the separate locus effects combine in an additive way, or do non-linear interactions exist between different loci (epistasis)?
• Do the genes at different loci segregate independently?
• Does gene expression vary with the environmental context (gene-by-environment interaction)?
• Are specific genotypes associated with particular environments (covariation of genotypic values and environmental effects)?
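The single-locus partition V(X) = VA + VD described earlier in this section can be verified directly from the genotype distribution. The sketch below is illustrative only: the function names and the grid of (k, p2) values are assumptions, and random mating is assumed throughout, as on the slides.

```python
def variance_components(a, k, p2):
    """Return (VA, VD) for a biallelic locus under random mating."""
    p1 = 1.0 - p2
    alpha = a * (1.0 + k * (p1 - p2))      # average effect of allelic substitution
    v_a = 2.0 * p1 * p2 * alpha ** 2       # additive genetic variance
    v_d = (2.0 * p1 * p2 * a * k) ** 2     # dominance genetic variance
    return v_a, v_d


def genotypic_variance(a, k, p2):
    """V(X) computed directly from genotypic values 0, (1+k)a, 2a."""
    p1 = 1.0 - p2
    values = [0.0, a * (1.0 + k), 2.0 * a]
    probs = [p1 ** 2, 2.0 * p1 * p2, p2 ** 2]   # Hardy-Weinberg proportions
    mean = sum(x * w for x, w in zip(values, probs))
    return sum(w * (x - mean) ** 2 for x, w in zip(values, probs))


# Assumed illustration grid: a = 1, several dominance coefficients and frequencies.
for k in (0.0, 0.5, 1.0, 2.0):
    for p2 in (0.1, 0.5, 0.9):
        v_a, v_d = variance_components(1.0, k, p2)
        v_x = genotypic_variance(1.0, k, p2)
        print(k, p2, round(v_a, 4), round(v_d, 4), round(v_x, 4))
```

For k = 0 the dominance variance vanishes and V(X) = VA. For k ≠ 0 the split between VA and VD depends on the allele frequency, which is why VA is described as a property of a particular population.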
Example: epistasis
Average length of vegetative internodes in the lateral branch (in mm) of teosinte. Table from Lynch and Walsh (1998).

Two independently segregating loci
• Extending the least-squares decomposition of X: $X = \mu + \Lambda_1 + \delta_1 + \Lambda_2 + \delta_2 + \varepsilon$;
• $\Lambda_k$ is the breeding value at the k'th locus, $\delta_k$ is the dominance deviation at the k'th locus, and $\varepsilon$ is a residual term due to epistasis;
• if the loci are independently segregating, $V(X) = V(\Lambda_1) + V(\Lambda_2) + V(\delta_1) + V(\delta_2) + V(\varepsilon) = V_{A,1} + V_{A,2} + V_{D,1} + V_{D,2} + V(\varepsilon) = V_A + V_D + V(\varepsilon)$.

Neglecting V(ε)
• the epistatic variance components contributing to V(ε) are often small compared to VA and VD;
• in linkage analysis it is thus often assumed that V(ε) = 0;
• note, however, that the relative magnitude of the variance components provides only limited insight into the physiological mode of gene action;
• epistatic interactions can greatly inflate the additive and/or dominance components of variance.

Resemblance between relatives
A model for the trait values of two relatives: $Y_k = X_k + e_k$, k = 1, 2, where for the k'th relative
• $Y_k$ is the phenotypic value;
• $X_k$ is the genotypic value;
• $e_k$ is the mean-zero environmental deviation;
• the $e_k$'s are assumed to be mutually independent and also independent of the $X_k$'s.
Hence, the covariance of the trait values of two relatives is given by the genetic covariance, $C(X_1, X_2)$, i.e. $C(Y_1, Y_2) = C(X_1, X_2)$.

A (preliminary) formula for C(X1, X2)
For a single-locus trait, $C(X_1, X_2) = c_1V_A + c_2V_D$.
• $c_1$ and $c_2$ are constants determined by the type of relationship between the two relatives;
• the same formula applies for multilocus traits if no epistatic variance components are included in the model, i.e. V(ε) = 0;
• in this latter case $V_A$ and $V_D$ are given by summation of the corresponding locus-specific contributions.

Joint distribution of sibling trait values
[Figure: joint distribution of sibling trait values.] Single biallelic, dominant (k = 1) model. Correlation 0.46.
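As a rough illustration of the preliminary formula C(X1, X2) = c1·VA + c2·VD, the sketch below computes the expected trait correlation of full siblings for a single biallelic locus. It assumes the standard full-sib constants c1 = 1/2 and c2 = 1/4, which match the sibling values of 2θ and Δ given under "Measures of relatedness". The parameter values (a, p2, environmental variance) are assumptions made here, so the result is not meant to reproduce the correlation of 0.46 quoted for the figure, whose parameters are not given.

```python
def sib_correlation(a, k, p2, v_env):
    """Expected trait correlation of full sibs for one biallelic locus.

    Assumes random mating, no epistasis, and independent environmental
    deviations with variance v_env.
    """
    p1 = 1.0 - p2
    alpha = a * (1.0 + k * (p1 - p2))
    v_a = 2.0 * p1 * p2 * alpha ** 2       # additive genetic variance
    v_d = (2.0 * p1 * p2 * a * k) ** 2     # dominance genetic variance
    c1, c2 = 0.5, 0.25                     # full-sib constants (assumed standard values)
    covariance = c1 * v_a + c2 * v_d
    return covariance / (v_a + v_d + v_env)


# Assumed values: complete dominance, equal allele frequencies, no environmental variance.
r = sib_correlation(a=1.0, k=1.0, p2=0.5, v_env=0.0)
print(round(r, 4))  # 0.4167
```

Adding environmental variance lowers the correlation, while a purely additive locus (k = 0) with no environmental variance gives exactly 1/2.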
Measures of relatedness
• N = the number of alleles shared IBD by two relatives at a given locus;
• the kinship coefficient, θ, is given by 2θ = E(N)/2, i.e. twice the kinship coefficient equals the expected proportion of alleles shared IBD at the locus;
• the coefficient of fraternity, Δ, is defined as Δ = P(N = 2).

Some examples (with $z_k = P(N = k)$)
• Siblings: $(z_0, z_1, z_2) = (1/4, 1/2, 1/4)$, implying E(N) = 1. Thus θ = 1/4 and Δ = 1/4.
• Parent-offspring: $(z_0, z_1, z_2) = (0, 1, 0)$, implying E(N) = 1. Thus θ = 1/4 and Δ = 0.
• Grandparent-grandchild: $(z_0, z_1, z_2) = (1/2, 1/2, 0)$, implying E(N) = 1/2. Thus θ = 1/8 and Δ = 0.

Covariance formula for a single locus
Under the assumed model,
$X_1 = \mu + \alpha_i^1 + \alpha_j^1 + \delta_{ij}^1$
$X_2 = \mu + \alpha_i^2 + \alpha_j^2 + \delta_{ij}^2$
$\mathrm{Cov}(X_1, X_2) = \mathrm{Cov}(\alpha_i^1 + \alpha_j^1,\ \alpha_i^2 + \alpha_j^2) + \mathrm{Cov}(\delta_{ij}^1, \delta_{ij}^2)$
$C(Y_1, Y_2) = C(X_1, X_2) = 2\theta V_A + \Delta V_D = \frac{E(N)}{2}V_A + P(N = 2)\,V_D$

A single locus; perfect marker data
$C(Y_1, Y_2 \mid N) = \frac{N}{2}V_A + I\{N = 2\}\,V_D$, with $I\{N = 2\} = 1$ if N = 2 and 0 if N = 0 or N = 1, i.e.
• $C(Y_1, Y_2 \mid N) = 0$ if N = 0;
• $C(Y_1, Y_2 \mid N) = V_A/2$ if N = 1;
• $C(Y_1, Y_2 \mid N) = V_A + V_D$ if N = 2.

Covariance formula for multiple loci
For n independently segregating loci, assuming no epistatic interaction, i.e. putting V(ε) = 0,
$C(Y_1, Y_2) = C(X_1, X_2) = 2\theta V_A + \Delta V_D = 2\theta\sum_l V_{A,l} + \Delta\sum_l V_{D,l} = \sum_l\left[\frac{E(N_l)}{2}V_{A,l} + P(N_l = 2)\,V_{D,l}\right]$;
$N_l$ is the number of alleles shared IBD at locus l; $V_{A,l}$ and $V_{D,l}$ are the locus-specific additive and dominance variance contributions, respectively.

Covariance formula for multiple loci, continued
Define for every pair of relatives, given the marker data $MD_x$,
$\pi^{(x)} = E[N_x \mid MD_x]/2$ and $\pi_2^{(x)} = P(N_x = 2 \mid MD_x)$.
For two related individuals we then have
$C(Y_1, Y_2 \mid MD_x) = \sum_l\left[\frac{E[N_l \mid MD_x]}{2}V_{A,l} + P(N_l = 2 \mid MD_x)\,V_{D,l}\right] = \pi^{(x)}V_{A,x} + \pi_2^{(x)}V_{D,x} + 2\theta V_{A,R} + \Delta V_{D,R}$,
where the subscript x refers to the test locus and R to the remaining loci.

Haseman-Elston method
• uses pairs of relatives of the same type, most often sib pairs;
• for each relative pair, calculate the squared phenotypic difference $Z = (Y_1 - Y_2)^2$;
• given $MD_x$, regress the Z's on the expected proportion of alleles IBD at the test locus, $\pi^{(x)} = E[N_x \mid MD_x]/2$;
• a slope coefficient β < 0, if statistically significant, is considered evidence for linkage.

HE: an example
[Figure: HE regression; horizontal axis: proportion of marker alleles identical by descent. The solid line is the fitted regression line; the dotted line indicates the true underlying relationship.]

HE: motivation
$E[(Y_1 - Y_2)^2] = V[Y_1 - Y_2] = V(Y_1) + V(Y_2) - 2C(Y_1, Y_2) = 2V(Y) - 2C(Y_1, Y_2)$
Assume strictly additive gene action at each locus, i.e. $V_D = 0$. Then, for a putative QTL at x,
$E[(Y_1 - Y_2)^2 \mid MD_x] = 2V(Y) - 2C(Y_1, Y_2 \mid MD_x) = 2V(Y) - 2[\pi^{(x)}V_{A,x} + 2\theta V_{A,R}]$
Note: this is a linear function of $\pi^{(x)}$.

HE: linkage test
$E[(Y_1 - Y_2)^2 \mid MD_x] = \alpha + \beta\,\pi^{(x)}$, where $\alpha = 2[V(Y) - 2\theta V_{A,R}]$ and $\beta = -2V_{A,x}$ (here α denotes the regression intercept, not the average effect of allelic substitution).
The linkage test is $H_0: \beta = 0$ (i.e. $V_{A,x} = 0$) versus $H_1: \beta < 0$.

HE: examples with simulated data
[Figure: simulated data from n = 200 sib pairs; top to bottom: h² = 0.50, 0.33, 0.25.]

Heritability and power
• for a given locus we may define the locus-specific heritability as the proportion of the total variance 'explained' by that particular site, e.g. (in the narrow sense) $h^2 = V_A / V(Y)$;
• the locus-specific heritability is the single most important parameter for the power of QTL linkage methods;
• heritabilities below about 10% lead, in general, to unrealistically large sample sizes.

HE: two-point analysis
$E[(Y_1 - Y_2)^2 \mid \text{marker genotypes}] = \tilde{\alpha} + \tilde{\beta}\,\pi^{(m)}$,
where $\pi^{(m)}$ is the expected proportion of marker alleles shared IBD.
• $\tilde{\beta}$ depends on the type of relatives considered;
• for sib pairs, $\tilde{\beta} = -2(1 - 2\theta)^2V_{A,l}$, where θ here denotes the recombination fraction between marker and trait locus;
• the recombination fraction (θ) and the effect size ($V_{A,l}$) are confounded and cannot be estimated separately.

HE: in summary
Simple, transparent and comparatively robust, but:
• poor statistical power in many settings;
• different types of relatives cannot be mixed;
• parents and their offspring cannot be used in HE;
• the assumptions of the statistical model are not generally satisfied.
Remedy:
• use one of several suggested extensions of HE;
• alternatively, use VCA instead.

VCA
[Diagram: QTL, polygenes and independent environment contributing to the trait value.]
Mathematically: $Y_i = \mu + \beta^{T}a_i + g_i + q_i + e_i$,
where μ is the population mean, a are the "environmental" predictor variables (with coefficient vector β), q is the major trait locus, g is the polygenic effect, and e is the residual error.

VCA: an additive model
$Y = \mu + \sum_{i=1}^{p}\beta_i z_i + \sum_{l=1}^{n}X_l + e$
$E(Y) = \mu + \sum_{i=1}^{p}\beta_i z_i$
$V(Y) = V_A + V_D + V(e) = V_{A,x} + V_{D,x} + V_{A,R} + V_{D,R} + V(e)$
$C(Y_1, Y_2 \mid MD_x) = \pi^{(x)}V_{A,x} + \pi_2^{(x)}V_{D,x} + 2\theta V_{A,R} + \Delta V_{D,R}$

VCA: major assumption
The joint distribution of the phenotypic values in a pedigree is assumed to be multivariate normal with the given mean values, variances and covariances;
• the multivariate normal distribution is completely specified by the mean values, variances and covariances;
• the likelihood, L, of the data can be calculated and we can estimate the variance components $V_{A,x}$, $V_{D,x}$, $V_{A,R}$, $V_{D,R}$.

VCA: linkage test
The linkage test of $H_0: V_{A,x} = V_{D,x} = 0$ uses the LOD score statistic
$\mathrm{LOD}_x = \log_{10}\frac{L(\text{full model})}{L(V_{A,x} = V_{D,x} = 0)}$
When the position of the test locus, x, is varied over a chromosomal region, the result can be summarized in a LOD score curve.

VCA vs HE: LOD score profiles
[Figure from Pratt et al., Am. J. Hum. Genet. 66:1153-1157 (2000).]

Linkage methods for QTL
• a fully parametric linkage approach is difficult;
• model-free tests comprise the alternative choice;
• we will discuss Haseman-Elston regression (HE) and variance components analysis (VCA).
Both can be viewed as two-step procedures:
1. use polymorphic molecular markers to extract information on inheritance patterns;
2. evaluate evidence for a trait-influencing locus at specified locations.

Similarities and differences
• HE and VCA are based on estimated IBD sharing given marker data;
• both methods require specification of a statistical model ('model-free' means 'does not require specification of a genetic model');
• similarity in IBD sharing is used to evaluate trait similarity using either linear regression (HE) or variance components analysis (VCA).
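The two-step view of Haseman-Elston regression can be sketched with a small simulation. This is a hedged illustration rather than the procedure used for the figures in the slides: it assumes a single additive biallelic QTL (k = 0), perfect IBD information at the trait locus (so that π = N/2 is known exactly), no residual polygenic component, and arbitrarily chosen parameter values. All function names are hypothetical, and only the Python standard library is used.

```python
import random


def simulate_sib_pairs(n_pairs, a=1.0, p2=0.5, env_sd=0.7, seed=1):
    """Return a list of (pi, Z) with pi = N/2 and Z = (Y1 - Y2)^2."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Parental genotypes: two alleles each, coded 1 for A2 and 0 for A1.
        father = [int(rng.random() < p2) for _ in range(2)]
        mother = [int(rng.random() < p2) for _ in range(2)]
        sibs = []
        for _ in range(2):
            pat, mat = rng.randrange(2), rng.randrange(2)  # transmitted parental alleles
            x = a * (father[pat] + mother[mat])            # additive genotypic value
            y = x + rng.gauss(0.0, env_sd)                 # phenotype Y = X + e
            sibs.append((pat, mat, y))
        (pat1, mat1, y1), (pat2, mat2, y2) = sibs
        n_ibd = int(pat1 == pat2) + int(mat1 == mat2)      # alleles shared IBD
        pairs.append((n_ibd / 2.0, (y1 - y2) ** 2))
    return pairs


def least_squares(points):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


pairs = simulate_sib_pairs(5000)
intercept, slope = least_squares(pairs)
print(intercept, slope)
```

Under these assumptions VA = 2·p1·p2·a² = 0.5, so the fitted slope should scatter around -2·VA = -1 and the intercept around 2·V(Y). A clearly negative slope is the signal that HE interprets as evidence for linkage.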
