Quantitative Trait Loci, QTL: An introduction to quantitative genetics and common methods for mapping of loci underlying continuous traits

Why study quantitative traits?
• Many (most) human traits/disorders are complex in the sense that they are governed by several genetic loci as well as being influenced by environmental agents;
• Many of these traits are intrinsically continuously varying and need specialized statistical models/methods for the localization and estimation of genetic contributions;
• In addition, in several cases there are potential benefits from studying continuously varying quantities as opposed to a binary affected/unaffected response. For example:
  • in a study of risk factors, the underlying quantitative phenotypes that predispose to disease may be more etiologically homogeneous than the disease phenotype itself;
  • some qualitative phenotypes occur once a threshold for susceptibility has been exceeded, e.g. type 2 diabetes, obesity, etc.;
  • in such a case the binary phenotype (affected/unaffected) is not as informative as the actual phenotypic measurements;

[Figure: A pedigree representation]

Variance and variability
• methods for linkage analysis of QTL in humans rely on a partitioning of the total variability of trait values;
• in statistical theory, the variance is the expected squared deviation around the mean value $\mu_Y = E(Y)$: $V(Y) = E[(Y - \mu_Y)^2]$;
• it can be estimated from data as $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$;
• the square root of the variance is called the standard deviation;

A simple model for the phenotype
$Y = X + e$, where
• Y is the phenotypic value, i.e. the trait value;
• X is the genotypic value, i.e. the mean or expected phenotypic value given the genotype;
• e is the environmental deviation, with mean 0;
• we assume that the total phenotypic variance is the sum of the genotypic variance and the environmental variance, $V(Y) = V(X) + V(e)$, i.e. the environmental contribution is assumed independent of the genotype of the individual;

[Figure: Distribution of Y: a single biallelic locus]

A single biallelic locus: genetic effects

Genotype    Genotypic value
A1A1        0
A1A2        a(1 + k)
A2A2        2a

• a is the homozygous effect;
• k is the dominance coefficient:
  • k = 0 means complete additivity,
  • k = 1 means complete dominance (of A2),
  • k > 1 if A2 is overdominant.

Example: The pygmy gene, pg
• From data we have the following mean values of weight: $X_{++} = 14$ g, $X_{+pg} = 12$ g, $X_{pgpg} = 6$ g;
• $2a = 14 - 6 = 8$ implies $a = 4$;
• $(1 + k)a = 12 - 6 = 6$ implies $k = 0.5$.
The data suggest recessivity (although not complete) of the pygmy gene.

Decomposition of the genotypic value, X
• $X_{ij}$ is the mean of Y for $A_iA_j$ individuals;
• when k = 0 the two alleles of a biallelic locus behave in a completely additive fashion: X is a linear function of the number of A2 alleles;
• we can then think of each allele contributing a purely additive effect to X;
• this can be generalized to k ≠ 0 by decomposing X into additive contributions of alleles together with deviations resulting from dominance;
• the generalization is accomplished using least-squares regression of X on the gene content;

Least-squares linear regression
• $X = \hat{X} + \delta$, i.e. fitted value plus residual deviation;
• the fit minimizes the sum of squared residuals;
• $V(X) = V(\hat{X}) + V(\delta)$, a variance decomposition;

Model 1
$$X_{ij} = \hat{X}_{ij} + \delta_{ij} = \mu + \alpha_i + \alpha_j + \delta_{ij}$$
where μ is the population mean phenotype, $\alpha_i$ is the additive effect of allele $A_i$, and $\delta_{ij}$ is the residual deviation due to dominance;
$$\hat{X}_{ij} = \mu + \alpha_1 N_1 + \alpha_2 N_2,$$
with $N_k$ the number of $A_k$ alleles in the genotype, i.e. $\hat{X}_{11} = \mu + 2\alpha_1$ for A1A1, $\hat{X}_{12} = \mu + \alpha_1 + \alpha_2$ for A1A2, and $\hat{X}_{22} = \mu + 2\alpha_2$ for A2A2.
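where $\alpha_1 p_1 + \alpha_2 p_2 = 0$; writing $\alpha = \alpha_2 - \alpha_1$, this gives $\alpha_1 = -p_2\alpha$ and $\alpha_2 = p_1\alpha$.

Interpretations
• in the linear regression $X = \hat{X} + \delta$, $\hat{X}$ is the heritable component of the genotype and δ is the non-heritable part;
• the sum of an individual's additive allelic effects, $\alpha_i + \alpha_j$, is called the breeding value and is denoted $\Lambda_{ij}$;
• under random mating $\alpha_i$ can be interpreted as the average excess of allele $A_i$;
• this is defined as the difference between the expected phenotypic value when one allele (e.g. the paternally transmitted one) is fixed at $A_i$ and the population average, μ;

Linear Regression
• $p_k$ is the proportion of $A_k$ alleles in the population;
• the expected additive effect of a randomly drawn allele is 0, i.e. $\alpha_1 p_1 + \alpha_2 p_2 = 0$, which implies that the corresponding population variance is $\alpha_1^2 p_1 + \alpha_2^2 p_2$;
• since for a biallelic locus $N_1 = 2 - N_2$,
$$X_{ij} = \tilde{\mu} + \alpha N_2 + \delta_{ij}, \quad \text{where } \tilde{\mu} = \mu + 2\alpha_1,\ \alpha = \alpha_2 - \alpha_1.$$

[Figure: Linear regression, graphically]

Linear Regression Model solving
$X_{ij} = \tilde{\mu} + \alpha N_2 + \delta_{ij}$

X          N2    prob.
0          0     p1²
a(1+k)     1     2p1p2
2a         2     p2²

$$\alpha = \frac{\mathrm{cov}(X, N_2)}{\mathrm{var}(N_2)}$$
$$E(X) = a(1+k)\,2p_1p_2 + 2a\,p_2^2 = 2ap_2(1 + p_1k)$$
$$V(X) = a^2(1+k)^2\,2p_1p_2 + 4a^2p_2^2 - 4a^2p_2^2(1 + p_1k)^2$$
$$E(N_2) = 2p_2, \qquad \mathrm{Var}(N_2) = 2p_1p_2$$
$$E(XN_2) = a(1+k)\cdot 1\cdot 2p_1p_2 + 2a\cdot 2\cdot p_2^2 = 2ap_2\,[2p_2 + p_1(1+k)]$$
$$\mathrm{Cov}(X, N_2) = 2ap_2\,[2p_2 + p_1(1+k) - 2p_2(1 + p_1k)] = 2ap_1p_2\,[1 + k - 2p_2k] = 2ap_1p_2\,[1 + k(p_1 - p_2)]$$
$$\Rightarrow \quad \alpha = a\,[1 + k(p_1 - p_2)]$$

Average excesses
$$\alpha_i^* = E(X \mid \text{one allele is } A_i) - \mu$$
$$\alpha_1^* = X_{12}\,P(\text{other allele is } A_2 \mid A_1) + X_{11}\,P(\text{other allele is } A_1 \mid A_1) - \mu \overset{\text{random mating}}{=} X_{12}p_2 + X_{11}p_1 - \mu = (\alpha_1 + \alpha_2)p_2 + 2\alpha_1p_1 = \alpha_1,$$
where the dominance deviations drop out because $\delta_{12}p_2 + \delta_{11}p_1 = 0$ under random mating, and the last step uses $\alpha_1p_1 + \alpha_2p_2 = 0$.

Interpretations under random mating
• $\alpha = a[1 + k(p_1 - p_2)]$; $\alpha_1 = -p_2\alpha$; $\alpha_2 = p_1\alpha$.

Population parameters for k ≠ 0
• α is called the average effect of allelic substitution: substitute A1 → A2 for a randomly chosen A1 allele;
• then the expected change in X is $(X_{12} - X_{11})p_1 + (X_{22} - X_{12})p_2$;
• which equals α (simple calculations).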
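The single-locus quantities above can be checked numerically. The sketch below is a minimal illustration under stated assumptions: Hardy-Weinberg genotype frequencies (random mating), genotypic values measured relative to A1A1, the pygmy allele pg taken as A1, and a hypothetical frequency p2 = 0.3 for the + allele (not given in the slides). It recovers a and k from the genotype means, then computes α three ways (least-squares slope, closed form $a[1 + k(p_1 - p_2)]$, average effect of substitution) and the average excesses. All function names are illustrative.

```python
def genotype_values(a, k):
    """Genotypic values of A1A1, A1A2, A2A2 (N2 = 0, 1, 2), relative to A1A1."""
    return [0.0, a * (1 + k), 2 * a]


def hwe_probs(p2):
    """Hardy-Weinberg genotype frequencies for N2 = 0, 1, 2 (random mating)."""
    p1 = 1 - p2
    return [p1 * p1, 2 * p1 * p2, p2 * p2]


def effects_from_means(x11, x12, x22):
    """Homozygous effect a and dominance coefficient k from genotype means."""
    a = (x22 - x11) / 2
    k = (x12 - x11) / a - 1
    return a, k


def regression_alpha(a, k, p2):
    """Least-squares slope of X on N2: cov(X, N2) / var(N2)."""
    xs, ps, ns = genotype_values(a, k), hwe_probs(p2), [0, 1, 2]
    ex = sum(p * x for p, x in zip(ps, xs))
    en = sum(p * n for p, n in zip(ps, ns))
    exn = sum(p * x * n for p, x, n in zip(ps, xs, ns))
    var_n = sum(p * n * n for p, n in zip(ps, ns)) - en ** 2
    return (exn - ex * en) / var_n


def closed_form_alpha(a, k, p2):
    """alpha = a [1 + k (p1 - p2)]."""
    return a * (1 + k * ((1 - p2) - p2))


def substitution_effect(a, k, p2):
    """Expected change in X when a random A1 allele is replaced by A2."""
    x11, x12, x22 = genotype_values(a, k)
    return (x12 - x11) * (1 - p2) + (x22 - x12) * p2


def average_excesses(a, k, p2):
    """(alpha_1*, alpha_2*): E(X | one allele is A_i) - mu under random mating."""
    x11, x12, x22 = genotype_values(a, k)
    p1 = 1 - p2
    mu = sum(p * x for p, x in zip(hwe_probs(p2), (x11, x12, x22)))
    return x11 * p1 + x12 * p2 - mu, x12 * p1 + x22 * p2 - mu


# Pygmy gene, taking pg as A1: X_pgpg = 6 g, X_+pg = 12 g, X_++ = 14 g.
a, k = effects_from_means(x11=6, x12=12, x22=14)
print(a, k)  # 4.0 0.5
p2 = 0.3     # hypothetical frequency of the + allele
# Each of the three values is approximately 4.8.
print(regression_alpha(a, k, p2), closed_form_alpha(a, k, p2), substitution_effect(a, k, p2))
# Approximately (-1.44, 3.36), i.e. (-p2 * alpha, p1 * alpha).
print(average_excesses(a, k, p2))
```

Computing the slope directly from the genotype table, rather than plugging into the closed form, is what makes the agreement a check of the derivation rather than a restatement of it.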
Average effect of allelic substitution
Substituting A2 for a randomly chosen A1 allele turns A1A1 into A1A2 (probability $p_1$) and A1A2 into A2A2 (probability $p_2$):
$$\alpha = p_1(X_{12} - X_{11}) + p_2(X_{22} - X_{12}) = p_1a(1+k) + p_2a(1-k) = a[1 + k(p_1 - p_2)]$$

[Figure: α as a function of p2 and k]

Partitioning the genetic variance
• the variance, V(X), of the genotypic values in a population is called the genetic variance:
$$V(X) = V(\hat{X}) + V(\delta) = V_A + V_D$$
• $V_A = 2p_1p_2\alpha^2 = 2(p_1\alpha_1^2 + p_2\alpha_2^2)$ is the additive genetic variance, i.e. the variance associated with additive allelic effects;
• $V_D = (2p_1p_2ak)^2$ is the dominance genetic variance, i.e. the variance due to dominance deviations;

$$V_A = 2(p_1\alpha_1^2 + p_2\alpha_2^2), \quad \text{with } p_1\alpha_1 + p_2\alpha_2 = 0;$$
$$V_A = \alpha^2\,\mathrm{Var}(N_2) = \alpha^2(2p_1p_2 + 4p_2^2 - 4p_2^2) = 2p_1p_2\alpha^2 = 2p_1p_2a^2[1 + k(p_1 - p_2)]^2 \quad \text{(linear regression)}$$

[Figure: V(X), V_A (dashed) and V_D (dotted) as functions of p2 and k, with $V_A = 2p_1p_2[a(1 + k(p_1 - p_2))]^2$ and $V_D = (2p_1p_2ak)^2$]

Example: The Booroola gene (Lynch and Walsh, 1998)

In summary
• The homozygous effect a and the dominance coefficient k are intrinsic properties of allelic products.
• The additive effect $\alpha_i$ and the average excess $\alpha_i^*$ are properties of alleles in a particular population.
• The breeding value is a property of a particular individual in reference to a particular population. It is the sum of the additive effects of an individual's alleles.
• The additive genetic variance, $V_A$, is a property of a particular population. It is the variance of the breeding values of individuals in the population.

Multilocus traits
• Do the separate locus effects combine in an additive way, or do non-linear interactions between different loci exist: epistasis?
• Do the genes at different loci segregate independently?
• Does gene expression vary with the environmental context: gene-by-environment interaction?
• Are specific genotypes associated with particular environments: covariation of genotypic values and environmental effects?

Example: epistasis
Average length of vegetative internodes in the lateral branch (in mm) of teosinte. Table from Lynch and Walsh (1998).
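The single-locus variance partition $V(X) = V_A + V_D$ can be verified numerically. In this minimal sketch (assuming Hardy-Weinberg proportions, the pygmy-gene effects a = 4 and k = 0.5, and a hypothetical allele frequency p2 = 0.3), $V_A$ and $V_D$ are computed directly as the variances of the fitted values $\hat{X}$ and of the residuals δ, and then compared with the closed forms $2p_1p_2\alpha^2$ and $(2p_1p_2ak)^2$. The function names are illustrative.

```python
def variance_partition(a, k, p2):
    """Return (V(X), V_A, V_D) for one biallelic locus under Hardy-Weinberg proportions.

    V_A and V_D are computed as the variances of the least-squares fitted
    values X-hat and of the residuals delta = X - X-hat, respectively.
    """
    p1 = 1 - p2
    xs = [0.0, a * (1 + k), 2 * a]            # genotypic values for N2 = 0, 1, 2
    ps = [p1 * p1, 2 * p1 * p2, p2 * p2]      # genotype frequencies
    mu = sum(p * x for p, x in zip(ps, xs))
    v_x = sum(p * (x - mu) ** 2 for p, x in zip(ps, xs))
    alpha = a * (1 + k * (p1 - p2))           # regression slope (average effect)
    # The least-squares line passes through (E(N2), E(X)) = (2 p2, mu).
    fitted = [mu + alpha * (n - 2 * p2) for n in range(3)]
    v_a = sum(p * (f - mu) ** 2 for p, f in zip(ps, fitted))
    v_d = sum(p * (x - f) ** 2 for p, x, f in zip(ps, xs, fitted))
    return v_x, v_a, v_d


def closed_forms(a, k, p2):
    """V_A = 2 p1 p2 alpha^2 and V_D = (2 p1 p2 a k)^2."""
    p1 = 1 - p2
    alpha = a * (1 + k * (p1 - p2))
    return 2 * p1 * p2 * alpha ** 2, (2 * p1 * p2 * a * k) ** 2


v_x, v_a, v_d = variance_partition(a=4, k=0.5, p2=0.3)
print(v_x, v_a + v_d)                               # both approximately 10.3824
print((v_a, v_d), closed_forms(a=4, k=0.5, p2=0.3))  # approximately (9.6768, 0.7056) twice
```

Setting k = 0 makes every residual vanish, so $V_D = 0$ and all genetic variance is additive, matching the statement that k = 0 means complete additivity.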
Two independently segregating loci
• Extending the least-squares decomposition of X:
$$X = \mu + \Lambda_1 + \delta_1 + \Lambda_2 + \delta_2 + \varepsilon$$
• $\Lambda_k$ is the breeding value of the k'th locus, $\delta_k$ is the dominance deviation of the k'th locus, and ε is a residual term due to epistasis;
• if the loci are independently segregating,
$$V(X) = V(\Lambda_1) + V(\Lambda_2) + V(\delta_1) + V(\delta_2) + V(\varepsilon) = V_{A,1} + V_{A,2} + V_{D,1} + V_{D,2} + V(\varepsilon) = V_A + V_D + V(\varepsilon)$$

Neglecting V(ε)
• the epistatic variance components contributing to V(ε) are often small compared to $V_A$ and $V_D$;
• in linkage analysis it is thus often assumed that V(ε) = 0;
• note, however, that the relative magnitude of the variance components provides only limited insight into the physiological mode of gene action;
• epistatic interactions can greatly inflate the additive and/or dominance components of variance;

Resemblance between relatives
A model for the trait values of two relatives: $Y_k = X_k + e_k$, k = 1, 2, where for the k'th relative
• $Y_k$ is the phenotypic value,
• $X_k$ is the genotypic value,
• $e_k$ is the mean-zero environmental deviation;
• the $e_k$'s are assumed to be mutually independent and also independent of the genotypic values $X_1, X_2$.
Hence, the covariance of the trait values of two relatives is given by the genetic covariance, $C(X_1, X_2)$, i.e. $C(Y_1, Y_2) = C(X_1, X_2)$.

A (preliminary) formula for $C(X_1, X_2)$
For a single-locus trait
$$C(X_1, X_2) = c_1V_A + c_2V_D$$
• $c_1$ and $c_2$ are constants determined by the type of relationship between the two relatives;
• the same formula applies to multilocus traits if no epistatic variance components are included in the model, i.e. V(ε) = 0;
• in this latter case $V_A$ and $V_D$ are given by summation of the corresponding locus-specific contributions.

Joint distribution of sibling trait values
[Figure: single biallelic, dominant (k = 1) model; correlation 0.46]

Measures of relatedness
• N = the number of alleles shared identical by descent (IBD) by two relatives at a given locus;
• the kinship coefficient, θ, is given by $2\theta = E(N)/2$, i.e. twice the kinship coefficient equals the expected proportion of alleles shared IBD at the locus.
• The coefficient of fraternity, Δ, is defined as $\Delta = P(N = 2)$.

Some examples, with $z_i = P(N = i)$
• Siblings: $(z_0, z_1, z_2) = (1/4, 1/2, 1/4)$, implying E(N) = 1. Thus θ = 1/4 and Δ = 1/4;
• Parent-offspring: $(z_0, z_1, z_2) = (0, 1, 0)$, implying E(N) = 1. Thus θ = 1/4 and Δ = 0;
• Grandparent-grandchild: $(z_0, z_1, z_2) = (1/2, 1/2, 0)$, implying E(N) = 1/2. Thus θ = 1/8 and Δ = 0;

Covariance formula for a single locus
Under the assumed model
$$X_1 = \mu + \alpha_{i_1} + \alpha_{j_1} + \delta_{i_1j_1}, \qquad X_2 = \mu + \alpha_{i_2} + \alpha_{j_2} + \delta_{i_2j_2}$$
$$\mathrm{Cov}(X_1, X_2) = \mathrm{Cov}(\alpha_{i_1} + \alpha_{j_1},\ \alpha_{i_2} + \alpha_{j_2}) + \mathrm{Cov}(\delta_{i_1j_1},\ \delta_{i_2j_2})$$
$$C(Y_1, Y_2) = C(X_1, X_2) = 2\theta V_A + \Delta V_D = \frac{E(N)}{2}V_A + P(N = 2)\,V_D$$

A single locus; perfect marker data
$$C(Y_1, Y_2 \mid N) = \frac{N}{2}V_A + I_{\{N=2\}}V_D, \quad \text{with } I_{\{N=2\}} = \begin{cases}1 & \text{if } N = 2\\ 0 & \text{if } N = 0 \text{ or } N = 1\end{cases}$$
i.e.
$$C(Y_1, Y_2 \mid N) = \begin{cases}0 & \text{if } N = 0\\ V_A/2 & \text{if } N = 1\\ V_A + V_D & \text{if } N = 2\end{cases}$$

Covariance formula for multiple loci
n independently segregating loci, assuming no epistatic interaction, i.e. putting V(ε) = 0:
$$C(Y_1, Y_2) = C(X_1, X_2) = 2\theta V_A + \Delta V_D = 2\theta\sum_l V_{A,l} + \Delta\sum_l V_{D,l} = \sum_l\left[\frac{E(N_l)}{2}V_{A,l} + P(N_l = 2)\,V_{D,l}\right];$$
$N_l$ is the number of alleles shared IBD at locus l; $V_{A,l}$ and $V_{D,l}$ are the locus-specific additive and dominance variance contributions, respectively.

Covariance formula for multiple loci, continued
Define for every pair of relatives
$$\pi(x) = E[N_x \mid MD_x]/2 \quad \text{and} \quad z_2(x) = P(N_x = 2 \mid MD_x),$$
where $MD_x$ denotes the marker data around the test locus x. For two related individuals we then have
$$C(Y_1, Y_2 \mid MD_x) = \sum_l\left[\frac{E[N_l \mid MD_x]}{2}V_{A,l} + P(N_l = 2 \mid MD_x)\,V_{D,l}\right] = \pi(x)V_{A,x} + z_2(x)V_{D,x} + 2\theta V_{A,R} + \Delta V_{D,R},$$
where $V_{A,R}$ and $V_{D,R}$ collect the contributions of the remaining loci, unlinked to x.

Haseman-Elston method
• Uses pairs of relatives of the same type: most often sib pairs;
• for each relative pair, calculate the squared phenotypic difference $Z = (Y_1 - Y_2)^2$;
• given $MD_x$, regress the Z's on the expected proportion of alleles IBD, $\pi(x) = E[N_x \mid MD_x]/2$, at the test locus;
• a slope coefficient β < 0, if statistically significant, is considered as evidence for linkage;

HE: an example
[Figure: HE regression against the proportion of marker alleles identical by descent; solid line: fitted regression line; dotted line: true underlying relationship]

HE: motivation
$$E[(Y_1 - Y_2)^2] = V[Y_1 - Y_2] = V(Y_1) + V(Y_2) - 2C(Y_1, Y_2) = 2V(Y) - 2C(Y_1, Y_2)$$
(the first equality holds because $E(Y_1) = E(Y_2)$). Assume strictly additive gene action at each locus, i.e. $V_D = 0$. Then, for a putative QTL at x,
$$E[(Y_1 - Y_2)^2 \mid MD_x] = 2V(Y) - 2C(Y_1, Y_2 \mid MD_x) = 2V(Y) - 2[\pi(x)V_{A,x} + 2\theta V_{A,R}]$$
NOTE: This is a linear function in π(x)!

HE: linkage test
$$E[(Y_1 - Y_2)^2 \mid MD_x] = \alpha + \beta\pi(x), \quad \text{where } \alpha = 2[V(Y) - 2\theta V_{A,R}],\ \beta = -2V_{A,x}$$
The linkage test is $H_0: \beta = 0$ ($V_{A,x} = 0$) vs $H_1: \beta < 0$.

HE: examples with simulated data
[Figure: simulated data from n = 200 sib pairs; top to bottom: h² = 0.50, 0.33, 0.25]

Heritability and power
• for a given locus we may define the locus-specific heritability as the proportion of the total variance "explained" by that particular site, e.g. (in the narrow sense) $h^2 = V_A / V(Y)$;
• the locus-specific heritability is the single most important parameter for the power of QTL linkage methods;
• heritabilities below 10% lead, in general, to unrealistically large sample sizes.

HE: two-point analysis
$$E[(Y_1 - Y_2)^2 \mid \text{marker genotypes}] = \alpha + \beta\tilde{\pi}(m)$$
where $\tilde{\pi}(m)$ is the expected proportion of marker alleles shared IBD.
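The single-point HE regression from the method, motivation and linkage-test slides can be illustrated by simulation. This is a minimal, hypothetical setup, not the one behind the simulated examples in the slides: one additive QTL (k = 0), unrelated parents whose alleles are drawn independently with frequency p2, no residual genetic loci ($V_{A,R} = 0$), and perfect marker data, so that π(x) = N/2 is known exactly. Under these assumptions the expected slope is $\beta = -2V_{A,x}$ and the intercept is $2V(Y)$. The names `simulate_sib_pairs` and `ols` are illustrative.

```python
import random


def simulate_sib_pairs(n_pairs, a, p2, env_sd, seed=1):
    """Simulate sib pairs for one additive QTL (k = 0) with known IBD sharing.

    Returns (pi, z): pi = N/2 (perfect marker data at the QTL) and
    z = (Y1 - Y2)^2, the squared phenotypic difference.
    """
    rng = random.Random(seed)
    pis, zs = [], []
    for _ in range(n_pairs):
        # Parental alleles: 1 = A2 (frequency p2), 0 = A1; parents unrelated.
        father = [int(rng.random() < p2) for _ in range(2)]
        mother = [int(rng.random() < p2) for _ in range(2)]
        # Mendelian transmission: which paternal/maternal copy each sib receives.
        f1, m1, f2, m2 = [rng.randrange(2) for _ in range(4)]
        n_ibd = int(f1 == f2) + int(m1 == m2)
        # Additive genotypic value X = a * N2 plus an independent environmental deviation.
        y1 = a * (father[f1] + mother[m1]) + rng.gauss(0.0, env_sd)
        y2 = a * (father[f2] + mother[m2]) + rng.gauss(0.0, env_sd)
        pis.append(n_ibd / 2)
        zs.append((y1 - y2) ** 2)
    return pis, zs


def ols(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


a, p2, env_sd = 1.0, 0.5, 1.0          # hypothetical effect size, allele frequency, noise
va_x = 2 * p2 * (1 - p2) * a ** 2      # additive variance at the QTL
pis, zs = simulate_sib_pairs(10_000, a, p2, env_sd)
intercept, slope = ols(pis, zs)
print(f"slope {slope:.2f} (theory {-2 * va_x:.2f})")
print(f"intercept {intercept:.2f} (theory {2 * (va_x + env_sd ** 2):.2f})")
```

With these settings $V_{A,x} = 0.5$, so the fitted slope should lie near -1 and the intercept near 3. In a real analysis π(x) would be estimated from marker genotypes, and the slope would be tested against $H_0: \beta = 0$ rather than just compared with theory.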
• $\tilde{\pi}(m)$ depends on the type of relatives considered;
• for sib pairs $\beta = -2(1 - 2\theta)^2V_{A,l}$, where θ here denotes the recombination fraction between the marker and the trait locus;
• the recombination fraction (θ) and the effect size ($V_{A,l}$) are confounded and cannot be separately estimated;

HE: in summary
Simple, transparent and comparatively robust, but:
• poor statistical power in many settings;
• different types of relatives cannot be mixed;
• parents and their offspring cannot be used in HE (they always share exactly one allele IBD, so π does not vary);
• the assumptions of the statistical model are not generally satisfied;
Remedy:
• use one of several suggested extensions of HE;
• alternatively, use VCA instead.

VCA
[Figure: trait value decomposed into QTL, polygenes and independent environment]
Mathematically:
$$Y_i = \mu + \boldsymbol{\beta}^T\mathbf{a}_i + g_i + q_i + e_i$$
where $Y_i$ is the trait value, μ is the population mean, $\mathbf{a}_i$ are the "environmental" predictor variables (with regression coefficients β), q is the major trait locus, g is the polygenic effect, and e is the residual error.

VCA: an additive model
$$Y = \mu + \sum_{i=1}^{p}\beta_iz_i + \sum_{l=1}^{n}X_l + e, \qquad E(Y) = \mu + \sum_{i=1}^{p}\beta_iz_i;$$
$$V(Y) = V_A + V_D + V(e) = V_{A,x} + V_{D,x} + V_{A,R} + V_{D,R} + V(e)$$
$$C(Y_1, Y_2 \mid MD_x) = \pi(x)V_{A,x} + P(N_x = 2 \mid MD_x)\,V_{D,x} + 2\theta V_{A,R} + \Delta V_{D,R}$$

VCA: major assumption
The joint distribution of the phenotypic values in a pedigree is assumed to be multivariate normal with the given mean values, variances and covariances;
• the multivariate normal distribution is completely specified by the mean values, variances and covariances;
• the likelihood, L, of the data can therefore be calculated, and we can estimate the variance components $V_{A,x}$, $V_{D,x}$, $V_{A,R}$, $V_{D,R}$;

VCA: linkage test
The linkage test of $H_0: V_{A,x} = V_{D,x} = 0$ uses the LOD score statistic
$$\mathrm{LOD}_x = \log_{10}\frac{L(\text{full model})}{L(V_{A,x} = V_{D,x} = 0)}$$
When the position of the test locus, x, is varied over a chromosomal region, the result can be summarized in a LOD score curve.

VCA vs HE: LOD score profiles
From Pratt et al., Am. J. Hum. Genet. 66:1153-1157 (2000).

Linkage methods for QTL
• A fully parametric linkage approach is difficult;
• Model-free tests comprise the alternative choice;
• We will discuss
  • Haseman-Elston regression (HE);
  • Variance Components Analysis (VCA);
Both can be viewed as two-step procedures:
1. use polymorphic molecular markers to extract information on inheritance patterns;
2. evaluate the evidence for a trait-influencing locus at specified locations;

Similarities and differences
• HE and VCA are based on estimated IBD sharing given marker data;
• both methods require specification of a statistical model! ("model-free" means "does not require specification of a genetic model")
• similarity in IBD sharing is used to evaluate trait similarity using either linear regression (HE) or variance components analysis (VCA);
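Both methods rest on the same link between IBD sharing and trait covariance, derived in the relatives slides. As a minimal sketch, assuming no epistasis, non-inbred relatives and exact rational arithmetic via Python's standard `fractions` module, the helpers below compute θ and Δ from the IBD distribution $(z_0, z_1, z_2)$, the covariance $2\theta V_A + \Delta V_D$, the conditional covariance given N, and the VCA covariance given marker-based π(x) and $P(N_x = 2 \mid MD_x)$. The `RELATIONSHIPS` table reproduces the examples above; all names are illustrative, and the variance components used are hypothetical.

```python
from fractions import Fraction as F

# IBD distributions (z0, z1, z2) = (P(N=0), P(N=1), P(N=2)) from the examples above.
RELATIONSHIPS = {
    "siblings": (F(1, 4), F(1, 2), F(1, 4)),
    "parent-offspring": (F(0), F(1), F(0)),
    "grandparent-grandchild": (F(1, 2), F(1, 2), F(0)),
}


def relatedness(z):
    """Return (theta, Delta) with 2*theta = E(N)/2 and Delta = P(N = 2)."""
    z0, z1, z2 = z
    expected_n = 0 * z0 + 1 * z1 + 2 * z2
    return expected_n / 4, z2


def covariance(z, va, vd):
    """C(Y1, Y2) = 2*theta*V_A + Delta*V_D (no epistasis)."""
    theta, delta = relatedness(z)
    return 2 * theta * va + delta * vd


def covariance_given_n(n, va, vd):
    """C(Y1, Y2 | N) = (N/2) V_A + I{N = 2} V_D (perfect marker data)."""
    return F(n, 2) * va + (vd if n == 2 else 0)


def vca_covariance(pi_x, z2_x, z, va_x, vd_x, va_r, vd_r):
    """pi(x) V_A,x + P(N_x = 2 | MD_x) V_D,x + 2 theta V_A,R + Delta V_D,R."""
    theta, delta = relatedness(z)
    return pi_x * va_x + z2_x * vd_x + 2 * theta * va_r + delta * vd_r


va, vd = F(2), F(1)  # hypothetical variance components
for name, z in RELATIONSHIPS.items():
    theta, delta = relatedness(z)
    # Averaging C(Y1, Y2 | N) over the IBD distribution recovers C(Y1, Y2).
    averaged = sum(p * covariance_given_n(n, va, vd) for n, p in enumerate(z))
    print(name, theta, delta, covariance(z, va, vd), averaged)
```

The averaging step mirrors the passage from "perfect marker data" to the unconditional formula; likewise, with uninformative markers (π(x) = 2θ, $P(N_x = 2 \mid MD_x) = \Delta$), `vca_covariance` reduces to `covariance` with the locus and residual components pooled.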