Modifying the Schwarz Bayesian Information Criterion to locate multiple interacting Quantitative Trait Loci

1. M. Bogdan, J.K. Ghosh and R.W. Doerge, Genetics 2004 167: 989-999.
2. M. Bogdan and R.W. Doerge, "Mapping multiple interacting QTL by multidimensional genome searches"

$X_{ia}$: genotype of the i-th individual at locus a
$X_{ia} = 1/2$: individual is heterozygous at locus a
$X_{ia} = -1/2$: individual is homozygous at locus a
For loci a and b at distance $d_{ab} = 10$ cM: $\rho(X_{ia}, X_{ib}) = 0.81$

Data for QTL mapping

$Y_1, \ldots, Y_n$: vector of trait values for n backcross individuals
$X = [X_{ij}]$, $1 \le i \le n$, $1 \le j \le m$: genotypes of m markers

Standard methods of QTL mapping

One QTL model

$$Y_i = \mu + \beta Q_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \qquad (1)$$

$Q_i \in \{-1/2, 1/2\}$: QTL genotype

Search over markers: fit model (1) at each marker and choose the markers for which the likelihood exceeds a preestablished threshold value as candidate QTL locations.

Interval mapping (Lander and Botstein, 1989)

• Consider a fixed position between markers

$I_i$: state of the flanking markers

$$I_i \in \left\{ \left(-\tfrac12, -\tfrac12\right), \left(-\tfrac12, \tfrac12\right), \left(\tfrac12, -\tfrac12\right), \left(\tfrac12, \tfrac12\right) \right\}$$

$p_i = P(Q_i = \tfrac12 \mid I_i)$: easy to compute

$$Y_i = \mu + \beta Q_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

$$f(Y_i \mid I_i) = p_i \, N\!\left(\mu + \tfrac{\beta}{2}, \sigma^2\right) + (1 - p_i) \, N\!\left(\mu - \tfrac{\beta}{2}, \sigma^2\right)$$

$$L(Y \mid I) = \prod_{i=1}^{n} f(Y_i \mid I_i)$$

1. Estimate μ, β, and σ by the EM algorithm and compute the corresponding likelihood.
2. Repeat this procedure for a new possible QTL location.
3. Plot the resulting likelihoods as a function of the assumed QTL position.

• Problems with interval mapping
a) Not able to distinguish closely linked QTL
b) Not able to detect epistatic QTL (involved only in interactions)

• Solution
Estimate the location of several QTL at once using a multiple regression model (Kao et al. 1999)

$$Y_i = \mu + \sum_{j=1}^{p} \beta_j Q_{ij} + \sum_{1 \le j < l \le p} \gamma_{jl} Q_{ij} Q_{il} + \varepsilon_i$$

Problem: estimation of the number of additive and interaction terms

$$Y_i = \mu + \sum_{j=1}^{p} \beta_j X_{i h_j} + \sum_{j=1}^{r} \gamma_j X_{i k_j} X_{i u_j} + \varepsilon_i$$

$X_{ij}$: genotype of the j-th marker
average number of markers: 200 to 400

Bayesian Information Criterion

• Choose the model which maximizes $\log L - \tfrac12 k \log n$

L: likelihood of the data for a given model
k: number of parameters in the model
n: sample size

Broman (1997) and Broman and Speed (2002): BIC overestimates the number of QTL

How to modify BIC?

$M_i$: i-th linear model (specifies which markers are included in the regression)
$\theta = (\mu, \beta_1, \ldots, \beta_p, \gamma_1, \ldots, \gamma_r, \sigma)$: vector of parameters for $M_i$
$f_i(\theta)$: density of the prior distribution for θ
$\pi(i)$: prior probability of $M_i$
$L(Y \mid \theta)$: likelihood of the data given the vector of parameters θ
$m_i(Y)$: likelihood of the data given the model $M_i$

$$m_i(Y) = \int L(Y \mid \theta) f_i(\theta) \, d\theta$$

$$P(M_i \mid Y) \propto \pi(i) \, m_i(Y)$$

BIC neglects π(i) and uses the asymptotic approximation

$$\log m_i(Y) \approx \log L(Y \mid \hat\theta) - \tfrac12 (p + r + 2) \log n$$

Neglecting π(i) = assigning the same prior probability to all models = assigning high prior probability to the event that there are many regressors

Example: 200 markers
200 models with one additive term
$\binom{200}{2} = 19\,900$ models with one interaction or with two additive terms
$\binom{200}{100} \approx 9.05 \times 10^{58}$ models with 100 additive terms

Idea: supplement BIC with a more realistic prior distribution π

$$\tilde S(i) = \log \pi(i) + \log L(Y \mid \hat\theta) - \tfrac12 \big(p(i) + r(i)\big) \log n$$

$$\log L(Y \mid \hat\theta) = -\tfrac{n}{2} \log RSS + C(n)$$

RSS: residual sum of squares from the regression

$$S(i) = n \log RSS + \big(p(i) + r(i)\big) \log n - 2 \log \pi(i)$$

Since $S(i) = -2 \tilde S(i) + 2 C(n)$, maximizing $\tilde S(i)$ is equivalent to minimizing $S(i)$.

Choice of π (George and McCulloch, 1993)

M: number of markers
$N = \frac{M(M-1)}{2}$: number of potential interactions
α: the probability that the i-th additive term appears in the model
ν: the probability that the j-th interaction term appears in the model
$\mathcal{M}$: model with p additive terms and r interactions

$$\pi(\mathcal{M}) = \alpha^{p} \nu^{r} (1 - \alpha)^{M - p} (1 - \nu)^{N - r}$$

Prior distribution on the number of additive terms: p ~ Binomial(M, α)
Prior distribution on the number of interactions: r ~ Binomial(N, ν)

We choose $\alpha = 1/l$ and $\nu = 1/u$, so that

$$\log \pi(\mathcal{M}) = C(M, N, l, u) - p \log(l - 1) - r \log(u - 1)$$

$$S(i) = n \log RSS + (p + r) \log n + 2p \log(l - 1) + 2r \log(u - 1)$$

$$E(p) = \frac{M}{l}, \qquad E(r) = \frac{N}{u}$$

The choice of l and u should depend on the prior knowledge about the number of QTL.
Our choice: for the sample size 200, the probability of wrongly detecting a QTL (when there are none) is ≈ 0.05.
We keep E(p) and E(r) equal to 2.2.
The choice is supported by a theoretical bound on the type I error based on the Bonferroni inequality.

With l = M/2.2 and u = N/2.2, and with l - 1 and u - 1 replaced by l and u:

$$S(i) = n \log RSS + (p + r) \log n + 2p \log(M / 2.2) + 2r \log(N / 2.2)$$

The additional penalty is similar to the Risk Inflation Criterion of Foster and George (2k log t, where t is the total number of available regressors) and to the modification of BIC proposed by Siegmund (2004).

Search over 12 chromosomes, markers spaced every 10 cM

n     h2      p    corr   extr   r   corr   extr
200   0       0    0.95   0.03   0   -      0.02
500   0       0    0.99   0.01   0   -      0
200   0.2     1    1      0.03   0   0      0.02
200   0.195   0    -      0.01   1   0.95   0.04

n     h2      p    corr   extr   r   corr   extr
200   0.55    0    -      0.02   3   2.88   0.08
200   0.5     7    5.06   0.26   0   -      0.09
500   0.5     7    6.99   0.14   0   -      0.03
200   0.43    12   2.39   0.31   0   -      0.03
500   0.43    12   9.68   0.47   0   -      0.02
200   0.71    12   9.53   0.75   0   -      0.02
200   0.53    2    1.95   0.04   5   2.11   0.11
500   0.53    2    2      0.03   5   3.47   0.08

• The criterion adjusts well to the number of available markers.
• For n = 200 the criterion detects almost all additive QTL with individual h2 = 0.13 and interactions with h2 = 0.2.
• For n = 500 the criterion detects almost all additive QTL with individual h2 = 0.06 and interactions with h2 = 0.12.
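As a rough illustration of the modified criterion S(i) = n log RSS + (p + r) log n + 2p log(M/2.2) + 2r log(N/2.2), the following Python sketch scores a single candidate model by ordinary least squares. The function name `mbic`, its arguments, and the toy data (20 independent markers with one additive QTL at marker 3) are assumptions made for this example and do not come from the slides; real marker genotypes are correlated along the chromosome.

```python
import numpy as np


def mbic(y, X, additive=(), interactions=(), expected_terms=2.2):
    """Criterion S = n log RSS + (p + r) log n
    + 2 p log(M / expected_terms) + 2 r log(N / expected_terms).

    y: trait values, shape (n,)
    X: marker genotypes coded -1/2 or 1/2, shape (n, M), with M >= 2
    additive: marker indices entering as additive terms
    interactions: pairs of marker indices entering as interaction terms
    Smaller values indicate a better model.
    """
    n, M = X.shape
    N = M * (M - 1) // 2  # number of potential pairwise interactions
    columns = [np.ones(n)]  # intercept mu
    columns += [X[:, j] for j in additive]
    columns += [X[:, j] * X[:, k] for j, k in interactions]
    design = np.column_stack(columns)
    # Ordinary least squares fit; RSS is the residual sum of squares.
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    p, r = len(additive), len(interactions)
    return (n * np.log(rss) + (p + r) * np.log(n)
            + 2 * p * np.log(M / expected_terms)
            + 2 * r * np.log(N / expected_terms))


# Toy data (assumption): 20 independent markers, one additive QTL at marker 3.
rng = np.random.default_rng(0)
n, M = 200, 20
X = rng.choice([-0.5, 0.5], size=(n, M))
y = 2.0 * X[:, 3] + rng.normal(size=n)

s_null = mbic(y, X)                       # no terms
s_true = mbic(y, X, additive=[3])         # the simulated QTL
s_extra = mbic(y, X, additive=[3], interactions=[(0, 1)])  # one spurious term
print(s_null, s_true, s_extra)
```

The sketch only evaluates the criterion for models that are handed to it. Choosing which models to compare (for example by a forward search over markers and marker pairs) is a separate step that is not shown here.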
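Bound for the type I error

Here the criterion is written in its log-likelihood form, so larger values indicate a better model.

$S_1$: the maximum of the criterion over all one-dimensional models
$S_0 = \log L_0(Y \mid \hat\mu, \hat\sigma)$: the value of the criterion for the null model
D: the number of terms chosen by our criterion

$$P(D > 0) \approx P(S_1 > S_0)$$

$S_{M_i}$: the value of the criterion for a given one-dimensional model

$S_{M_i} > S_0$ if

$$2 \log \frac{L(Y \mid \hat\theta_{M_i})}{L_0(Y \mid \hat\theta_0)} > \log n + 2 \big(\log(l - 1) \text{ or } \log(u - 1)\big),$$

with log(l - 1) for an additive term and log(u - 1) for an interaction term.

$$P(S_{M_i} > S_0) \approx 2 P\!\left(Z > \sqrt{\log n + 2 \big(\log(l - 1) \text{ or } \log(u - 1)\big)}\right), \qquad Z \sim N(0, 1)$$

By the Bonferroni inequality and the bound

$$P(Z > x) \le \frac{1}{\sqrt{2\pi}\, x} \exp\!\left(-\frac{x^2}{2}\right)$$

we obtain

$$P(S_1 > S_0) \le \frac{2M}{(l - 1)\, C_1(l, n)} + \frac{2N}{(u - 1)\, C_2(u, n)},$$

where $C_1(l, n) = \sqrt{2\pi n \,(\log n + 2 \log(l - 1))}$ and $C_2(u, n) = \sqrt{2\pi n \,(\log n + 2 \log(u - 1))}$.

For $l = M/2.2$ and $u = N/2.2$ the right-hand side is approximately

$$\frac{4.4}{\sqrt{2\pi n}} \left( \frac{1}{\sqrt{\log n + 2 \log(l - 1)}} + \frac{1}{\sqrt{\log n + 2 \log(u - 1)}} \right)$$

For n = 200 and typical values of M this yields values in the range between 0.057 and 0.08.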
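The bound can be evaluated numerically. The Python sketch below sums the Bonferroni terms 2M/((l - 1) C_1) and 2N/((u - 1) C_2) for l = M/2.2 and u = N/2.2. The function name `type1_bound` and the marker counts in the loop are assumptions chosen for illustration; the value 132 corresponds to 12 chromosomes with 11 markers each, which is one possible reading of the simulation setup rather than something stated in the slides.

```python
import math


def type1_bound(n, M, expected_terms=2.2):
    """Bonferroni-type upper bound on P(S_1 > S_0) under the null model.

    n: sample size
    M: number of markers (assumed large enough that l > 1 and u > 1)
    expected_terms: prior expected number of additive terms and of
        interaction terms, so that l = M / expected_terms, u = N / expected_terms
    """
    N = M * (M - 1) / 2  # number of potential pairwise interactions
    l = M / expected_terms
    u = N / expected_terms

    def family_bound(count, inv_prior):
        # x^2 is the threshold for 2 log(L / L_0) of a one-dimensional model.
        x2 = math.log(n) + 2 * math.log(inv_prior - 1)
        # count * 2 P(Z > x), using P(Z > x) <= exp(-x^2 / 2) / (x sqrt(2 pi))
        # and exp(-x^2 / 2) = 1 / (sqrt(n) * (inv_prior - 1)).
        return 2 * count / ((inv_prior - 1) * math.sqrt(2 * math.pi * n * x2))

    return family_bound(M, l) + family_bound(N, u)


for M in (20, 132, 400):
    print(M, round(type1_bound(200, M), 3))
```

Because the function keeps the exact factors M/(l - 1) and N/(u - 1) instead of rounding them to 2.2, its output can differ slightly from the simplified expression with the constant 4.4.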
