Quantitative Trait Loci (QTL): An introduction to quantitative genetics and common methods for mapping loci underlying continuous traits

Why study quantitative traits?
• Many (most) human traits/disorders are complex in the sense that they are governed by several genetic loci as well as being influenced by environmental agents;
• Many of these traits are intrinsically continuously varying and need specialized statistical models/methods for the localization and estimation of genetic contributions;
• In addition, in several cases there are potential benefits from studying continuously varying quantities as opposed to a binary affected/unaffected response. For example:
• in a study of risk factors, the underlying quantitative phenotypes that predispose to disease may be more etiologically homogeneous than the disease phenotype itself;
• some qualitative phenotypes occur once a threshold for susceptibility has been exceeded, e.g. type 2 diabetes, obesity, etc.;
• in such a case the binary phenotype (affected/unaffected) is not as informative as the actual phenotypic measurements;

[Figure: A pedigree representation]

Variance and variability
• methods for linkage analysis of QTL in humans rely on a partitioning of the total variability of trait values;
• in statistical theory, the variance is the expected squared deviation around the mean value $\mu_Y = E(Y)$: $V(Y) = E[(Y - \mu_Y)^2]$;
• it can be estimated from data as $s^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$;
• the square root of the variance is called the standard deviation;

A simple model for the phenotype
$Y = X + e$, where
• $Y$ is the phenotypic value, i.e. the trait value;
• $X$ is the genotypic value, i.e. the mean or expected phenotypic value given the genotype;
• $e$ is the environmental deviation with mean 0.
• We assume that the total phenotypic variance is the sum of the genotypic variance and the environmental variance, $V(Y) = V(X) + V(e)$, i.e. the environmental contribution is assumed independent of the genotype of the individual;

[Figure: Distribution of Y: a single biallelic locus]

A single biallelic locus: genetic effects
Genotype | A1A1 | A1A2 | A2A2
Genotypic value | 0 | a(1 + k) | 2a
• $a$ is the homozygous effect,
• $k$ is the dominance coefficient:
• $k = 0$ means complete additivity,
• $k = 1$ means complete dominance (of A2),
• $k > 1$ if A2 is overdominant.

Example: The pygmy gene, pg
• From data we have the following mean values of weight: $X_{++} = 14$ g, $X_{+pg} = 12$ g, $X_{pg\,pg} = 6$ g;
• $2a = 14 - 6 = 8$ implies $a = 4$;
• $(1 + k)a = 12 - 6 = 6$ implies $k = 0.5$.
The data suggest recessivity (although not complete) of the pygmy gene.

Decomposition of the genotypic value, X
• $X_{ij}$ is the mean of $Y$ for $A_iA_j$ individuals;
• when $k = 0$ the two alleles of a biallelic locus behave in a completely additive fashion: $X$ is a linear function of the number of A2 alleles;
• we can then think of each allele as contributing a purely additive effect to $X$;
• this can be generalized to $k \neq 0$ by decomposing $X$ into additive contributions of alleles together with deviations resulting from dominance;
• the generalization is accomplished using least-squares regression of $X$ on the gene content;

Least-squares linear regression
$X = \hat{X} + \delta$, i.e. fitted value + residual deviation, chosen to minimize the sum of squared residuals;
$V(X) = V(\hat{X}) + V(\delta)$ (variance decomposition).

Model 1
$X_{ij} = \hat{X}_{ij} + \delta_{ij} = \mu + \alpha_i + \alpha_j + \delta_{ij}$, where $\mu$ is the population mean phenotype, $\alpha_i$ is the additive effect of allele $A_i$, and $\delta_{ij}$ is the residual deviation due to dominance;
$\hat{X}_{ij} = \mu + \alpha_1 N_1 + \alpha_2 N_2$, with $N_k$ the number of $A_k$ alleles in the genotype, i.e.
$\hat{X}_{11} = \mu + 2\alpha_1$ for A1A1, $\hat{X}_{12} = \mu + \alpha_1 + \alpha_2$ for A1A2, $\hat{X}_{22} = \mu + 2\alpha_2$ for A2A2;
$\alpha_1 p_1 + \alpha_2 p_2 = 0$, $\alpha = \alpha_2 - \alpha_1$, $\alpha_1 = -p_2\alpha$, $\alpha_2 = p_1\alpha$.

Interpretations
• in the linear regression $X = \hat{X} + \delta$, $\hat{X}$ is the heritable component of the genotype and $\delta$ is the non-heritable part;
• the sum of an individual's additive allelic effects, $\alpha_i + \alpha_j$, is called the breeding value and is denoted $\Lambda_{ij}$;
• under random mating $\alpha_i$ can be interpreted as the average excess of allele $A_i$;
• this is defined as the difference between the expected phenotypic value when one allele (e.g. the paternally transmitted one) is fixed at $A_i$ and the population average, $\mu$;

Linear Regression
• $p_k$ = proportion of $A_k$ alleles in the population;
• the expected additive effect of a randomly drawn allele is 0, i.e. $\alpha_1 p_1 + \alpha_2 p_2 = 0$, so the corresponding population variance is $\alpha_1^2 p_1 + \alpha_2^2 p_2$;
• since $N_1 = 2 - N_2$ for a biallelic locus, $X_{ij} = \tilde{\mu} + \alpha N_2 + \delta_{ij}$, where $\tilde{\mu} = \mu + 2\alpha_1$ and $\alpha = \alpha_2 - \alpha_1$.

[Figure: the regression of X on N2, shown graphically]

Linear Regression Model solving
• $X_{ij} = \tilde{\mu} + \alpha N_2 + \delta_{ij}$, with

X | N2 | prob.
0 | 0 | $p_1^2$
a(1 + k) | 1 | $2p_1p_2$
2a | 2 | $p_2^2$

• $\alpha = \mathrm{cov}(X, N_2)/\mathrm{var}(N_2)$, where
$E(X) = a(1+k)\,2p_1p_2 + 2a\,p_2^2 = 2ap_2(1 + p_1k)$
$V(X) = a^2(1+k)^2\,2p_1p_2 + 4a^2p_2^2 - 4a^2p_2^2(1 + p_1k)^2$
$E(N_2) = 2p_2$, $\mathrm{Var}(N_2) = 2p_1p_2$
$E(XN_2) = a(1+k)\cdot 1\cdot 2p_1p_2 + 2a\cdot 2\cdot p_2^2 = 2ap_2\,[2p_2 + p_1(1+k)]$
$\mathrm{Cov}(X, N_2) = E(XN_2) - E(X)E(N_2) = 2ap_2[2p_2 + p_1(1+k) - 2p_2(1 + p_1k)] = 2ap_1p_2[1 + k - 2p_2k] = 2ap_1p_2[1 + k(p_1 - p_2)]$
$\Rightarrow \alpha = a[1 + k(p_1 - p_2)]$

Average excesses
$\alpha_i^* = E(X \mid \text{one allele is } A_i) - \mu$
$\alpha_1^* = X_{12}\,P(\text{other allele is } A_2 \mid A_1) + X_{11}\,P(\text{other allele is } A_1 \mid A_1) - \mu$
$= X_{12}p_2 + X_{11}p_1 - \mu$ (random mating)
$= (\alpha_1 + \alpha_2 + \delta_{12})p_2 + (2\alpha_1 + \delta_{11})p_1 = \alpha_1$,
since $\alpha_1p_1 + \alpha_2p_2 = 0$ and, under random mating, $\delta_{11}p_1 + \delta_{12}p_2 = 0$.

Interpretations under random mating
• $\alpha = a[1 + k(p_1 - p_2)]$; $\alpha_1 = -p_2\alpha$; $\alpha_2 = p_1\alpha$;

Population parameters for $k \neq 0$
• $\alpha$ is called the average effect of allelic substitution: substitute $A_1 \to A_2$ for a randomly chosen $A_1$ allele;
• the expected change in $X$ is then $(X_{12} - X_{11})p_1 + (X_{22} - X_{12})p_2$,
• which equals $\alpha$ (simple calculations).
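The three routes to $\alpha$ above (least-squares slope, closed form, average effect of substitution) can be checked numerically. The following Python sketch is a minimal illustration assuming Hardy-Weinberg genotype frequencies; it uses the pygmy-gene values $a = 4$, $k = 0.5$ with genotypic values measured from the pg/pg mean (so A1 = pg and A2 = +). The allele frequency $p_2 = 0.3$ and all function names are hypothetical, chosen only for illustration.

```python
# Genotypic values are measured relative to the A1A1 mean (here pg/pg),
# so X = 0, a(1 + k), 2a for N2 = 0, 1, 2 copies of A2 (here the + allele).

def genotypic_values(a, k):
    return [0.0, a * (1 + k), 2 * a]

def hwe_frequencies(p2):
    """Genotype probabilities for N2 = 0, 1, 2 under random mating (Hardy-Weinberg)."""
    p1 = 1 - p2
    return [p1 ** 2, 2 * p1 * p2, p2 ** 2]

def regression_slope(a, k, p2):
    """Least-squares slope alpha = cov(X, N2) / var(N2)."""
    x = genotypic_values(a, k)
    w = hwe_frequencies(p2)
    n2 = [0, 1, 2]
    e_x = sum(wi * xi for wi, xi in zip(w, x))
    e_n = sum(wi * ni for wi, ni in zip(w, n2))
    e_xn = sum(wi * xi * ni for wi, xi, ni in zip(w, x, n2))
    var_n = sum(wi * ni ** 2 for wi, ni in zip(w, n2)) - e_n ** 2
    return (e_xn - e_x * e_n) / var_n

def alpha_closed_form(a, k, p2):
    """alpha = a[1 + k(p1 - p2)]."""
    p1 = 1 - p2
    return a * (1 + k * (p1 - p2))

def substitution_effect(a, k, p2):
    """Expected change in X when a random A1 allele is replaced by A2."""
    p1 = 1 - p2
    x11, x12, x22 = genotypic_values(a, k)
    return (x12 - x11) * p1 + (x22 - x12) * p2

a, k, p2 = 4.0, 0.5, 0.3          # pygmy-gene a and k; p2 is hypothetical
alpha = regression_slope(a, k, p2)
print(round(alpha, 6), round(alpha_closed_form(a, k, p2), 6),
      round(substitution_effect(a, k, p2), 6))          # 4.8 4.8 4.8
print("alpha1 =", round(-p2 * alpha, 6), "alpha2 =", round((1 - p2) * alpha, 6))
# alpha1 = -1.44 alpha2 = 3.36
```

Computing the slope from the genotype distribution, rather than fitting simulated individuals, keeps the check exact: any disagreement between the three functions would point to an algebra error rather than sampling noise.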
Average effect of allelic substitution
Substituting $A_1 \to A_2$ turns A1A1 into A1A2 (probability $p_1$) or A1A2 into A2A2 (probability $p_2$), so
$\alpha = p_1(X_{12} - X_{11}) + p_2(X_{22} - X_{12}) = p_1a(1+k) + p_2a(1-k) = a[1 + k(p_1 - p_2)]$

[Figure: α as a function of p2 and k]

Partitioning the genetic variance
• the variance, $V(X)$, of the genotypic values in a population is called the genetic variance:
$V(X) = V(\hat{X}) + V(\delta) = V_A + V_D$
• $V_A = 2p_1p_2\alpha^2 = 2(p_1\alpha_1^2 + p_2\alpha_2^2)$ is the additive genetic variance, i.e. the variance associated with additive allelic effects;
• $V_D = (2p_1p_2ak)^2$ is the dominance genetic variance, i.e. the variance due to dominance deviations;
$V_A = 2(p_1\alpha_1^2 + p_2\alpha_2^2)$, since each of the two alleles has a mean-zero effect ($p_1\alpha_1 + p_2\alpha_2 = 0$);
$V_A = \alpha^2 V(N_2) = \alpha^2[(2p_1p_2 + 4p_2^2) - 4p_2^2] = 2p_1p_2\alpha^2 = 2p_1p_2a^2[1 + k(p_1 - p_2)]^2$ (linear regression).

[Figure: $V(X)$, $V_A$ (dashed) and $V_D$ (dotted) as functions of p2 and k, with $V_A = 2p_1p_2[a(1 + k(p_1 - p_2))]^2$ and $V_D = (2p_1p_2ak)^2$]

[Figure] Example: The Booroola gene (Lynch and Walsh, 1998)

In summary
• The homozygous effect $a$ and the dominance coefficient $k$ are intrinsic properties of allelic products.
• The additive effect $\alpha_i$ and the average excess $\alpha_i^*$ are properties of alleles in a particular population.
• The breeding value is a property of a particular individual in reference to a particular population. It is the sum of the additive effects of an individual's alleles.
• The additive genetic variance, $V_A$, is a property of a particular population. It is the variance of the breeding values of individuals in the population.

Multilocus traits
• Do the separate locus effects combine in an additive way, or are there non-linear interactions between different loci (epistasis)?
• Do the genes at different loci segregate independently?
• Does gene expression vary with the environmental context (gene-by-environment interaction)?
• Are specific genotypes associated with particular environments (covariation of genotypic values and environmental effects)?

Example: epistasis
[Table: Average length of vegetative internodes in the lateral branch (in mm) of teosinte. Table from Lynch and Walsh (1998).]
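The single-locus partition $V(X) = V_A + V_D$ can be verified directly from the genotype distribution. This is a minimal Python sketch, assuming Hardy-Weinberg frequencies and genotypic values 0, $a(1+k)$, $2a$; the value $a = 4$ is borrowed from the pygmy example, while the grid of $p_2$ and $k$ values and the function name `variance_components` are hypothetical.

```python
def variance_components(a, k, p2):
    """Return (V(X), V_A, V_D) for one biallelic locus under Hardy-Weinberg frequencies."""
    p1 = 1 - p2
    x = [0.0, a * (1 + k), 2 * a]              # genotypic values for N2 = 0, 1, 2
    w = [p1 ** 2, 2 * p1 * p2, p2 ** 2]        # genotype frequencies
    mean = sum(wi * xi for wi, xi in zip(w, x))
    v_x = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x))   # genetic variance
    alpha = a * (1 + k * (p1 - p2))            # average effect of substitution
    alpha1, alpha2 = -p2 * alpha, p1 * alpha   # additive allelic effects
    v_a = 2 * (p1 * alpha1 ** 2 + p2 * alpha2 ** 2)   # equals 2 p1 p2 alpha^2
    v_d = (2 * p1 * p2 * a * k) ** 2           # dominance variance
    return v_x, v_a, v_d

# Check V(X) = V_A + V_D over a small grid of allele frequencies and k values.
for k in (0.0, 0.5, 1.0, 2.0):
    for p2 in (0.1, 0.3, 0.5, 0.9):
        v_x, v_a, v_d = variance_components(4.0, k, p2)
        assert abs(v_x - (v_a + v_d)) < 1e-9
        print(f"k={k:.1f} p2={p2:.1f}  V(X)={v_x:7.4f}  V_A={v_a:7.4f}  V_D={v_d:7.4f}")
```

Printing the table over $p_2$ and $k$ reproduces, in numbers, what the figure of $V(X)$, $V_A$ and $V_D$ shows graphically: $V_D$ vanishes when $k = 0$, and for $k > 0$ the split between $V_A$ and $V_D$ depends on the allele frequency as well as on $k$.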
Two independently segregating loci
• Extending the least-squares decomposition of $X$: $X = \mu + \Lambda_1 + \Lambda_2 + \delta_1 + \delta_2 + \varepsilon$
• $\Lambda_k$ is the breeding value of the k'th locus, $\delta_k$ is the dominance deviation of the k'th locus, and $\varepsilon$ is a residual term due to epistasis;
• if the loci are independently segregating,
$V(X) = V(\Lambda_1) + V(\Lambda_2) + V(\delta_1) + V(\delta_2) + V(\varepsilon) = V_{A,1} + V_{A,2} + V_{D,1} + V_{D,2} + V(\varepsilon) = V_A + V_D + V(\varepsilon)$

Neglecting V(ε)
• the epistatic variance components contributing to $V(\varepsilon)$ are often small compared to $V_A$ and $V_D$;
• in linkage analysis it is thus often assumed that $V(\varepsilon) = 0$;
• note however: the relative magnitude of the variance components provides only limited insight into the physiological mode of gene action;
• epistatic interactions can greatly inflate the additive and/or dominance components of variance;

Resemblance between relatives
A model for the trait values of two relatives: $Y_k = X_k + e_k$, $k = 1, 2$, where for the k'th relative
• $Y_k$ is the phenotypic value,
• $X_k$ is the genotypic value,
• $e_k$ is the mean-zero environmental deviation;
• the $e_k$'s are assumed to be mutually independent and also independent of the genotypic values $X_1$ and $X_2$.
Hence, the covariance of the trait values of two relatives is given by the genetic covariance, $C(X_1, X_2)$, i.e. $C(Y_1, Y_2) = C(X_1, X_2)$.

A (preliminary) formula for $C(X_1, X_2)$
For a single-locus trait, $C(X_1, X_2) = c_1V_A + c_2V_D$
• $c_1$ and $c_2$ are constants determined by the type of relationship between the two relatives;
• the same formula applies for multilocus traits if no epistatic variance components are included in the model, i.e. $V(\varepsilon) = 0$;
• in this latter case $V_A$ and $V_D$ are given by summation of the corresponding locus-specific contributions.

[Figure: Joint distribution of sibling trait values. Single biallelic, dominant (k = 1) model. Correlation 0.46.]

Measures of relatedness
• $N$ = the number of alleles shared identical by descent (IBD) by two relatives at a given locus;
• the kinship coefficient, $\theta$, is given by $2\theta = E(N)/2$, i.e. twice the kinship coefficient equals the expected proportion of alleles shared IBD at the locus.
• The coefficient of fraternity, $\Delta$, is defined as $\Delta = P(N = 2)$.

Some examples, writing $(z_0, z_1, z_2) = (P(N = 0), P(N = 1), P(N = 2))$
• Siblings: $(z_0, z_1, z_2) = (1/4, 1/2, 1/4)$, implying $E(N) = 1$. Thus $\theta = 1/4$ and $\Delta = 1/4$;
• Parent-offspring: $(z_0, z_1, z_2) = (0, 1, 0)$, implying $E(N) = 1$. Thus $\theta = 1/4$ and $\Delta = 0$;
• Grandparent-grandchild: $(z_0, z_1, z_2) = (1/2, 1/2, 0)$, implying $E(N) = 1/2$. Thus $\theta = 1/8$ and $\Delta = 0$.

Covariance formula for a single locus
Under the assumed model
$X^1 = \mu + \alpha_i^1 + \alpha_j^1 + \delta_{ij}^1$, $X^2 = \mu + \alpha_i^2 + \alpha_j^2 + \delta_{ij}^2$,
$\mathrm{Cov}(X^1, X^2) = \mathrm{Cov}(\alpha_i^1 + \alpha_j^1, \alpha_i^2 + \alpha_j^2) + \mathrm{Cov}(\delta_{ij}^1, \delta_{ij}^2)$,
$C(Y_1, Y_2) = C(X_1, X_2) = 2\theta V_A + \Delta V_D = \frac{E(N)}{2}V_A + P(N = 2)\,V_D$

A single locus; perfect marker data
$C(Y_1, Y_2 \mid N) = \frac{N}{2}V_A + I_{\{N=2\}}V_D$, with $I_{\{N=2\}} = 1$ if $N = 2$ and $I_{\{N=2\}} = 0$ if $N = 0$ or $N = 1$, i.e.
$C(Y_1, Y_2 \mid N) = 0$ if $N = 0$; $V_A/2$ if $N = 1$; $V_A + V_D$ if $N = 2$.

Covariance formula for multiple loci
$n$ independently segregating loci, assuming no epistatic interaction, i.e. putting $V(\varepsilon) = 0$:
$C(Y_1, Y_2) = C(X_1, X_2) = 2\theta V_A + \Delta V_D = \sum_l (2\theta V_{A,l} + \Delta V_{D,l}) = \sum_l \left(\frac{E(N_l)}{2}V_{A,l} + P(N_l = 2)\,V_{D,l}\right)$;
$N_l$ is the number of alleles shared IBD at locus $l$; $V_{A,l}$ and $V_{D,l}$ are the locus-specific additive and dominance variance contributions, respectively.

Covariance formula for multiple loci (continued)
Define for every pair of relatives $\pi(x) = E[N_x \mid MD_x]/2$ and $\Delta(x) = P(N_x = 2 \mid MD_x)$, where $MD_x$ denotes the marker data around the test locus $x$.
For two related individuals we then have
$C(Y_1, Y_2 \mid MD_x) = \sum_l \left(\frac{E[N_l \mid MD_x]}{2}V_{A,l} + P(N_l = 2 \mid MD_x)\,V_{D,l}\right) = \pi(x)V_{A,x} + \Delta(x)V_{D,x} + 2\theta V_{A,R} + \Delta V_{D,R}$,
where $V_{A,R}$ and $V_{D,R}$ collect the contributions of the remaining loci (assumed unlinked to $x$).

Haseman-Elston method
• Uses pairs of relatives of the same type: most often sib pairs;
• for each relative pair, calculate the squared phenotypic difference $Z = (Y_1 - Y_2)^2$;
• given $MD_x$, regress the $Z$'s on the expected proportion of alleles IBD, $\pi(x) = E[N_x \mid MD_x]/2$, at the test locus;
• a slope coefficient $\beta < 0$, if statistically significant, is considered as evidence for linkage;

HE: an example
[Figure: squared sib-pair differences plotted against the proportion of marker alleles identical by descent. Solid line is the fitted regression line; dotted line indicates the true underlying relationship.]

HE: motivation
$E[(Y_1 - Y_2)^2] = V[Y_1 - Y_2] = V(Y_1) + V(Y_2) - 2C(Y_1, Y_2) = 2V(Y) - 2C(Y_1, Y_2)$ (the first equality holds because $E(Y_1) = E(Y_2)$).
Assume strictly additive gene action at each locus, i.e. $V_D = 0$. Then, for a putative QTL at $x$,
$E[(Y_1 - Y_2)^2 \mid MD_x] = 2V(Y) - 2C(Y_1, Y_2 \mid MD_x) = 2V(Y) - 2[\pi(x)V_{A,x} + 2\theta V_{A,R}]$
NOTE: This is a linear function in $\pi(x)$!

HE: linkage test
$E[(Y_1 - Y_2)^2 \mid MD_x] = \alpha + \beta\pi(x)$, where $\alpha = 2[V(Y) - 2\theta V_{A,R}]$ and $\beta = -2V_{A,x}$.
The linkage test is $H_0: \beta = 0$ ($\Leftrightarrow V_{A,x} = 0$) vs $H_1: \beta < 0$.

[Figure] HE: examples with simulated data. Simulated data from n = 200 sib pairs; top to bottom: $h^2 = 0.50$, $0.33$, $0.25$.

Heritability and power
• for a given locus we may define the locus-specific heritability as the proportion of the total variance "explained" by that particular site, e.g. (in the narrow sense) $h^2 = V_A/V(Y)$, with $V_A$ the additive variance contributed by the locus;
• the locus-specific heritability is the single most important parameter for the power of QTL linkage methods;
• heritabilities below 10% lead, in general, to unrealistically large sample sizes.

HE: two-point analysis
$E[(Y_1 - Y_2)^2 \mid \text{marker genotypes}] = \tilde{\alpha} + \tilde{\beta}\,\tilde{\pi}(m)$,
where $\tilde{\pi}(m)$ is the expected proportion of marker alleles shared IBD.
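The relatedness measures and the Haseman-Elston regression described above can be illustrated on simulated sib pairs. The Python sketch below is a simplified illustration, not the method as implemented in any particular package: it assumes perfect marker data (so the IBD proportion $\pi$ at the test locus is known exactly), strictly additive gene action ($V_D = 0$), and it draws trait values directly from the covariance model $C(Y_1, Y_2 \mid MD_x) = \pi(x)V_{A,x} + 2\theta V_{A,R}$ rather than from explicit genotypes. The helper names, the seed and the variance values $V_{A,x} = 0.4$, $V_{A,R} = 0.2$, $V(e) = 0.4$ are hypothetical.

```python
import numpy as np

def kinship_and_fraternity(z):
    """Return (theta, Delta) from the IBD distribution z = (z0, z1, z2)."""
    z0, z1, z2 = z
    expected_n = z1 + 2 * z2            # E(N)
    return expected_n / 4, z2           # 2*theta = E(N)/2, Delta = P(N = 2)

def simulate_sib_pairs(n, v_ax, v_ar, v_e, rng):
    """Hypothetical simulator: sib-pair trait values drawn directly from the
    additive covariance model (V_D = 0) with perfect marker data, so the
    IBD proportion pi = N/2 at the test locus is known exactly."""
    theta, _ = kinship_and_fraternity((0.25, 0.5, 0.25))   # siblings
    pi_hat = rng.choice([0.0, 0.5, 1.0], size=n, p=[0.25, 0.5, 0.25])
    v = v_ax + v_ar + v_e                          # V(Y)
    c = pi_hat * v_ax + 2 * theta * v_ar           # C(Y1, Y2 | MD_x)
    shared = np.sqrt(c) * rng.standard_normal(n)   # component common to both sibs
    y1 = shared + np.sqrt(v - c) * rng.standard_normal(n)
    y2 = shared + np.sqrt(v - c) * rng.standard_normal(n)
    return y1, y2, pi_hat

def haseman_elston(y1, y2, pi_hat):
    """Regress Z = (Y1 - Y2)^2 on pi; return (intercept, slope, t statistic)."""
    z = (y1 - y2) ** 2                             # squared sib-pair differences
    slope, intercept = np.polyfit(pi_hat, z, 1)    # polyfit returns highest power first
    resid = z - (intercept + slope * pi_hat)
    se = np.sqrt(resid.var(ddof=2) / ((pi_hat - pi_hat.mean()) ** 2).sum())
    return intercept, slope, slope / se            # clearly negative t suggests linkage

rng = np.random.default_rng(2024)
y1, y2, pi_hat = simulate_sib_pairs(5000, v_ax=0.4, v_ar=0.2, v_e=0.4, rng=rng)
intercept, slope, t = haseman_elston(y1, y2, pi_hat)
# Theory: slope = -2 V_A,x = -0.8, intercept = 2[V(Y) - 2 theta V_A,R] = 1.8
print(f"intercept={intercept:.2f} slope={slope:.2f} t={t:.1f}")
```

The ordinary least-squares standard error used for the t statistic ignores that the variance of $Z$ changes with $\pi$, so it is only a rough guide; with imperfect markers, `pi_hat` would be replaced by the estimated $E[N_x \mid MD_x]/2$.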
• In this two-point regression, $\tilde{\beta}$ depends on the type of relatives considered;
• for sib pairs, $\tilde{\beta} = -2(1 - 2\theta)^2 V_{A,l}$, where here $\theta$ denotes the recombination fraction between the marker and the trait locus;
• the recombination fraction ($\theta$) and the effect size ($V_{A,l}$) are confounded and cannot be separately estimated;

HE: in summary
Simple, transparent and comparatively robust, but:
• poor statistical power in many settings;
• different types of relatives cannot be mixed;
• parents and their offspring cannot be used in HE (they always share exactly one allele IBD, so $\pi$ does not vary);
• the assumptions of the statistical model are not generally satisfied;
• Remedy:
• use one of several suggested extensions of HE;
• alternatively, use VCA instead.

VCA
[Figure: QTL, polygenes, independent environment]
Mathematically: $Y_i = \mu + \beta^T a_i + g_i + q_i + e_i$ (trait value), where $\mu$ is the population mean, $a_i$ are the "environmental" predictor variables with regression coefficients $\beta$, $q_i$ is the major trait locus, $g_i$ is the polygenic effect, and $e_i$ is the residual error.

VCA: an additive model
$Y = \mu + \sum_{i=1}^{p}\beta_i z_i + \sum_{l=1}^{n} X_l + e$
$E(Y) = \mu + \sum_{i=1}^{p}\beta_i z_i$;
$V(Y) = V_A + V_D + V(e) = V_{A,x} + V_{D,x} + V_{A,R} + V_{D,R} + V(e)$;
$C(Y_1, Y_2 \mid MD_x) = \pi(x)V_{A,x} + \Delta(x)V_{D,x} + 2\theta V_{A,R} + \Delta V_{D,R}$

VCA: major assumption
The joint distribution of the phenotypic values in a pedigree is assumed to be multivariate normal with the given mean values, variances and covariances;
• the multivariate normal distribution is completely specified by its mean values, variances and covariances;
• the likelihood, $L$, of the data can therefore be calculated, and we can estimate the variance components $V_{A,x}$, $V_{D,x}$, $V_{A,R}$, $V_{D,R}$;

VCA: linkage test
The linkage test of $H_0: V_{A,x} = V_{D,x} = 0$ uses the LOD score statistic
$LOD_x = \log_{10}\frac{L(\text{full model})}{L(V_{A,x} = V_{D,x} = 0)}$,
with each likelihood maximized over the remaining parameters. When the position of the test locus, $x$, is varied over a chromosomal region, the result can be summarized in a LOD score curve.

[Figure] VCA vs HE: LOD score profiles. From Pratt et al., Am. J. Hum. Genet. 66:1153-1157 (2000).

Linkage methods for QTL
• A fully parametric linkage approach is difficult;
• model-free tests are the alternative choice;
• We will discuss
• Haseman-Elston regression (HE);
• Variance components analysis (VCA);
Both can be viewed as two-step procedures:
1. use polymorphic molecular markers to extract information on inheritance patterns;
2. evaluate evidence for a trait-influencing locus at specified locations;

Similarities and differences
• HE and VCA are based on estimated IBD sharing given marker data;
• both methods require specification of a statistical model! ("model-free" means "does not require specification of a genetic model")
• similarity in IBD sharing is used to evaluate trait similarity using either linear regression (HE) or variance components analysis (VCA);
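The VCA LOD score can likewise be illustrated on simulated sib pairs. The Python sketch below is a crude stand-in for a real VCA program, under loudly stated assumptions: sib pairs only, perfect marker data, $V_D = 0$ (so only $V_{A,x}$, $V_{A,R}$ and $V(e)$ are estimated), and the bivariate normal likelihood is maximized by a coarse grid search over the proportions of $V(Y)$ due to each component, with the mean and total variance fixed at their sample values, instead of by full maximum likelihood. All function names, the seed and the parameter values are hypothetical.

```python
import numpy as np

def simulate_sib_pairs(n, v_ax, v_ar, v_e, rng):
    """Hypothetical simulator: sib pairs drawn from the additive covariance model
    (V_D = 0, 2*theta = 1/2 for sibs) with perfect marker data (pi = N/2 known)."""
    pi_hat = rng.choice([0.0, 0.5, 1.0], size=n, p=[0.25, 0.5, 0.25])
    v = v_ax + v_ar + v_e
    c = pi_hat * v_ax + 0.5 * v_ar
    shared = np.sqrt(c) * rng.standard_normal(n)
    y1 = shared + np.sqrt(v - c) * rng.standard_normal(n)
    y2 = shared + np.sqrt(v - c) * rng.standard_normal(n)
    return y1, y2, pi_hat

def pair_loglik(y1, y2, pi_hat, mu, v_ax, v_ar, v_e):
    """Bivariate-normal log-likelihood of all sib pairs under the additive model."""
    v = v_ax + v_ar + v_e                  # V(Y)
    c = pi_hat * v_ax + 0.5 * v_ar         # C(Y1, Y2 | MD_x) for sibs, V_D = 0
    s = y1 + y2 - 2 * mu                   # sum and difference are independent normals
    d = y1 - y2
    var_s, var_d = 2 * (v + c), 2 * (v - c)
    ll = (np.log(2.0)                      # Jacobian of (Y1, Y2) -> (S, D)
          - 0.5 * (np.log(2 * np.pi * var_s) + s ** 2 / var_s)
          - 0.5 * (np.log(2 * np.pi * var_d) + d ** 2 / var_d))
    return float(ll.sum())

def max_loglik(y1, y2, pi_hat, with_qtl, grid=np.linspace(0.0, 1.0, 41)):
    """Crude maximization: mu and V(Y) fixed at sample values; grid over the
    proportions of V(Y) due to V_A,x and V_A,R (V_A,x forced to 0 under H0)."""
    pooled = np.concatenate([y1, y2])
    mu, v_tot = pooled.mean(), pooled.var()
    best = -np.inf
    for f_ax in (grid if with_qtl else [0.0]):
        for f_ar in grid:
            f_e = 1.0 - f_ax - f_ar
            if f_e <= 1e-6:                # keep V(e) strictly positive
                continue
            ll = pair_loglik(y1, y2, pi_hat, mu, f_ax * v_tot, f_ar * v_tot, f_e * v_tot)
            best = max(best, ll)
    return best

def lod_score(y1, y2, pi_hat):
    """LOD_x = log10 L(full model) - log10 L(V_A,x = 0)."""
    ll_full = max_loglik(y1, y2, pi_hat, with_qtl=True)
    ll_null = max_loglik(y1, y2, pi_hat, with_qtl=False)
    return (ll_full - ll_null) / np.log(10)

rng = np.random.default_rng(7)
y1, y2, pi_hat = simulate_sib_pairs(2000, v_ax=0.4, v_ar=0.2, v_e=0.4, rng=rng)
lod = lod_score(y1, y2, pi_hat)
print(f"LOD = {lod:.1f}")
```

Because the null grid is a subset of the full grid, this LOD is never negative. A realistic analysis would also estimate the mean, total variance and dominance components, handle general pedigrees, and use a proper numerical optimizer.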