Basic QTL Analysis

Is there an association between marker genotype and quantitative trait phenotype?
- Classify progeny by marker genotype
- Compare phenotypic means between classes (t-test or ANOVA)
- Significance = marker linked to QTL
- Difference between means = estimate of QTL effect

g = genotypic effect, g = (µ1 - µ2)/2
µ1 = trait mean for genotypic class AA
µ2 = trait mean for genotypic class aa

[Figure: trait value y against genotypic classes aa and AA on the x axis, with intercept βo]

Notations for single-QTL models in backcross and F2 populations

Model                Genotype  Value  Genetic effect
Backcross (Qq x QQ)  QQ        µ1
                     Qq        µ2     g = 0.5(µ1 - µ2)
DH (qq x QQ)         QQ        µ1
                     qq        µ2     g = 0.5(µ1 - µ2)
F2 (Qq x Qq)         QQ        µ1
                     Qq        µ2
                     qq        µ3     Additive a = 0.5(µ1 - µ3)
                                      Dominance d = 0.5(2µ2 - µ1 - µ3)

Single-marker analysis

• How it works
  – Finds associations between marker genotype and trait value

$$y_j = \mu + f(A) + \varepsilon_j$$

  A (marker) ---- r ---- Q (putative QTL)

• When to use
  – Order of markers unknown or incomplete maps
  – Quick scan
  – Find best possible QTLs
  – Identify missing or incorrectly formatted data
• Limitations
  – Underestimates QTL number and effects
  – QTL position cannot be precisely determined

r = recombination fraction between marker A and the putative QTL Q
yj = trait value for the jth individual in the population
μ = population mean
f(A) = function of marker genotype
εj = residual associated with the jth individual

Single-marker analysis in backcross progeny

• Parents: AAQQ x aaqq
• Backcross: aaqq x AaQq, or AaQq x AAQQ
• BC progeny:

Progeny of aaqq x AaQq  Progeny of AaQq x AAQQ  Expected frequency
AaQq                    AAQQ                    0.5(1 - r)
Aaqq                    AAQq                    0.5r
aaQq                    AaQQ                    0.5r
aaqq                    AaQq                    0.5(1 - r)

r is the recombination frequency between A and Q.

Expected QTL genotypic frequencies conditional on marker genotypes

Joint frequency
Marker genotype  Observed count  Marginal frequency  QQ          Qq
AA               n1              0.5                 0.5(1 - r)  0.5r
Aa               n2              0.5                 0.5r        0.5(1 - r)

Conditional frequency
Marker genotype  Observed count  Marginal frequency  QQ     Qq     Expected trait value
AA               n1              0.5                 1 - r  r      (1 - r)µ1 + rµ2
Aa               n2              0.5                 r      1 - r  rµ1 + (1 - r)µ2

Single-marker analysis

  A (marker) ---- r ---- Q (putative QTL)

- Simple t-test
- Analysis of variance
- Linear regression
- Likelihood

Simple t-test using backcross progeny

H0: [μAa - μaa] = 0, which holds when (a + d) = 0 or r = 0.5

$$Y_{j(i)k} = \mu + M_i + g(M)_{j(i)} + e_{j(i)k}$$

With a pooled variance:

$$t_M = \frac{\hat{\mu}_{Aa} - \hat{\mu}_{aa}}{\sqrt{\hat{s}_M^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$

With separate class variances:

$$t_M = \frac{\hat{\mu}_{Aa} - \hat{\mu}_{aa}}{\sqrt{\frac{\hat{s}_{Aa}^2}{n_1} + \frac{\hat{s}_{aa}^2}{n_2}}}$$

t-distribution with df = N – 2

Yj(i)k = trait value for individual j with genotype i in replication k
μ = population mean
Mi = effect of the marker genotype
g(M)j(i) = genotypic effect that cannot be explained by the marker genotype
ej(i)k = error term
µAa = trait mean for genotypic class Aa
µaa = trait mean for genotypic class aa
s2M = pooled variance within the two classes
n1, n2 = number of individuals in the two marker classes

If tM is significant, then a QTL is declared to be linked to the marker.

Analysis of variance using backcross progeny

H0: [μAa - μaa] = 0, which holds when (a + d) = 0 or r = 0.5

Source          df        MS (Mean Square)  Expected MS
Total Genetics  N - 1     MSG               $\sigma_e^2 + b\sigma_G^2$
Marker          1         MSM               $\sigma_e^2 + b[\sigma_{G(QTL)}^2 + 4r(1-r)a^2] + bc(1-2r)^2 a^2$
G(Marker)       N - 2     MSG(M)            $\sigma_e^2 + b[\sigma_{G(QTL)}^2 + 4r(1-r)a^2]$
Residual        N(b - 1)  MSE               $\sigma_e^2$

$$F = \frac{MSM}{MSG(M)}$$

F-distribution with 1 and N – 2 df
If F is significant, then a QTL is declared to be linked to the marker.
F = t² if the df for the numerator is 1.

N = no. of individuals in the population
b = no. of replications
r = recombination fraction

Analysis of variance using SAS (A simple example)

    /* Read individual ID, trait value, and two marker genotypes */
    data a;
      input Individuals Trait1 Marker1 $ Marker2 $;  /* $ marks character variables */
      cards;
    1 1.57 A B
    2 1.35 B A
    3 10.7 B B
    …
    ;
    /* ANOVA of Trait1 on the marker genotypes; lsmeans prints marker-class means */
    proc glm data=a;
      class Marker1 Marker2;
      model Trait1 = Marker1 Marker2;
      lsmeans Marker1 Marker2;
    run;

Linear regression using backcross progeny

$$y_j = \beta_0 + \beta_1 x_j + \varepsilon_j$$

H0: [μAa - μaa] = 0, which holds when (a + d) = 0 or r = 0.5
R2: percent of the phenotypic variance explained by the QTL

yj = trait value for the jth individual
xj = dummy variable
βo = intercept for the regression
β1 = slope for the regression
εj = random error

Dummy variables: aa = -1, Aa = 1

Expectations:
E(βo) = 0.5(µAa + µaa) = mean for the trait
E(β1) = 0.5(1 - 2r)(µAa - µaa) = (1 - 2r)g = 0.5(a + d)(1 - 2r)

[Figure: regression of trait value y on the dummy variable x for genotypic classes aa (x = -1) and Aa (x = 1), with intercept βo and slope β1]

Linear regression using backcross progeny

Interpretation of results depends on the coding of the dummy variables.

[Figure, left panel: y = 3 + x + e, with µ = 3, µAa = 4, µaa = 2, g = 0.5(µAa - µaa) = 1]
[Figure, right panel: y = 3 - x + e, with µ = 3, µAa = 2, µaa = 4, g = 0.5(µAa - µaa) = -1]

A likelihood approach using backcross progeny

Joint distribution function:

$$L = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \sum_{j=1}^{2} p(Q_j \mid M_i) \exp\left[ -\frac{(y_i - \mu_j)^2}{2\sigma^2} \right]$$

Here p(Qj | Mi) is the conditional frequency of QTL genotype j given the marker genotype of individual i.

A likelihood approach using backcross progeny (cont.)

$$\ln L(\mu_1, \mu_2, \sigma^2, r) = \sum_{i=1}^{N} \ln\left\{ \sum_{j=1}^{2} p(Q_j \mid M_i) \exp\left[ -\frac{(y_i - \mu_j)^2}{2\sigma^2} \right] \right\} - \frac{N}{2} \ln(2\pi\sigma^2)$$

$$\ln L(\mu_1 = \mu_2) = -\frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - \mu)^2 - \frac{N}{2} \ln(2\pi\sigma^2)$$

$$\ln L(r = 0.5) = \sum_{i=1}^{N} \ln\left\{ \frac{1}{2} \exp\left[ -\frac{(y_i - \mu_1)^2}{2\sigma^2} \right] + \frac{1}{2} \exp\left[ -\frac{(y_i - \mu_2)^2}{2\sigma^2} \right] \right\} - \frac{N}{2} \ln(2\pi\sigma^2)$$

A likelihood approach using backcross progeny (cont.)

G-statistics (Weller, 1986)

H0: [μAa - μaa] = 0, which holds when (a + d) = 0 or r = 0.5

Likelihood ratio test statistic (LR):

$$G = 2\left[ \ln L(\hat{\mu}_{Aa}, \hat{\mu}_{aa}, \hat{\sigma}^2, \hat{r}) - \ln L(r = 0.5) \right]$$

ln L(r = 0.5) is the log of the probability of occurrence of the data under the null hypothesis.
G is distributed asymptotically as a chi-square variable with one degree of freedom.

$$G = 2\left[ \ln L(\hat{\mu}_{Aa}, \hat{\mu}_{aa}, \hat{\sigma}^2, \hat{r}) - \ln L(\mu_{Aa} = \mu_{aa}) \right]$$

The t-test is approximately equivalent to the likelihood ratio test using this second formula.

LOD score

LOD: logarithm of the odds ratio, the base-10 counterpart of G
LR = 2 ln(10) × LOD = 4.605 LOD
LOD = 0.217 LR
LOD is interpreted as the base-10 logarithm of an odds ratio (probability of observing the data under linkage / probability of observing the same data under no linkage).
No theoretical distribution is needed to interpret a LOD score.
Key value: LOD ≥ 3 (H1 is 1000 times more likely than H0, no linkage) (approx: p = 0.001)
p = probability of type I error
Type I error: false positive (declare a QTL when there is no QTL)

[Figure: G-statistics and LOD score]

Single-marker analysis: Summary

• Identify marker-trait associations
• Identify missing or incorrectly formatted data
• Genetic map is not required
• Divide the population into subpopulations based on the allelic segregation of individual loci (one marker at a time)
• Get trait means for each subpopulation (genotypic class)
• Determine if the subpopulation trait means are significantly different
• Limitations
  – Underestimates QTL number and effects
  – QTL position cannot be precisely determined
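
Worked sketch of the single-marker tests

The t-test, ANOVA, regression, and LOD conversion described in this deck can be tried on simulated data. The following is a minimal sketch in Python (chosen only for illustration; the deck itself uses SAS), assuming a backcross to the aaqq parent, no replication (b = 1), and normally distributed residuals. The function names (simulate_backcross, pooled_t, anova_f, regression, lr_to_lod) and all parameter values are hypothetical and do not come from the slides.

```python
import math
import random
from statistics import mean, variance


def simulate_backcross(n, r, mu_Qq, mu_qq, sigma, seed=0):
    """Simulate trait values for marker classes Aa and aa (hypothetical setup).

    Each individual gets marker genotype Aa or aa with probability 0.5; the
    QTL genotype differs from the marker genotype with probability r.
    """
    rng = random.Random(seed)
    y_Aa, y_aa = [], []
    for _ in range(n):
        marker_is_Aa = rng.random() < 0.5
        recombinant = rng.random() < r
        qtl_is_Qq = marker_is_Aa != recombinant  # recombination flips the QTL allele
        y = rng.gauss(mu_Qq if qtl_is_Qq else mu_qq, sigma)
        (y_Aa if marker_is_Aa else y_aa).append(y)
    return y_Aa, y_aa


def pooled_t(y_Aa, y_aa):
    """Pooled-variance t statistic for H0: mu_Aa - mu_aa = 0, with df = N - 2."""
    n1, n2 = len(y_Aa), len(y_aa)
    s2 = ((n1 - 1) * variance(y_Aa) + (n2 - 1) * variance(y_aa)) / (n1 + n2 - 2)
    t = (mean(y_Aa) - mean(y_aa)) / math.sqrt(s2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def anova_f(y_Aa, y_aa):
    """F = MS(marker) / MS(within marker classes), with 1 and N - 2 df."""
    n1, n2 = len(y_Aa), len(y_aa)
    grand = mean(list(y_Aa) + list(y_aa))
    ms_marker = n1 * (mean(y_Aa) - grand) ** 2 + n2 * (mean(y_aa) - grand) ** 2
    ms_within = ((n1 - 1) * variance(y_Aa) + (n2 - 1) * variance(y_aa)) / (n1 + n2 - 2)
    return ms_marker / ms_within


def regression(y_Aa, y_aa):
    """Least-squares intercept and slope with dummy coding aa = -1, Aa = 1."""
    x = [1.0] * len(y_Aa) + [-1.0] * len(y_aa)
    y = list(y_Aa) + list(y_aa)
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def lr_to_lod(lr):
    """Convert a likelihood ratio statistic to a LOD score: LOD = LR / (2 ln 10)."""
    return lr / (2 * math.log(10))


# Hypothetical parameter values: 200 progeny, r = 0.1, QTL class means 4 and 2.
y_Aa, y_aa = simulate_backcross(n=200, r=0.1, mu_Qq=4.0, mu_qq=2.0, sigma=1.0)
t, df = pooled_t(y_Aa, y_aa)
b0, b1 = regression(y_Aa, y_aa)
print("t =", round(t, 3), "with df =", df)
print("F =", round(anova_f(y_Aa, y_aa), 3), "and t^2 =", round(t * t, 3))
print("intercept =", round(b0, 3), "slope =", round(b1, 3))
print("LOD for LR = 13.8:", round(lr_to_lod(13.8), 3))
```

With the -1/1 coding the fitted slope is half the difference between the two marker-class means, so under these assumptions it estimates (1 - 2r)g rather than g itself, which is one way to see why single-marker analysis underestimates QTL effects when r > 0.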