Mixed model analysis to discover cis-regulatory haplotypes in A. thaliana

Fanghong Zhang*, Stijn Vansteelandt*, Olivier Thas*, Marnik Vuylsteke#
* Ghent University
# VIB (Flanders Institute for Biotechnology)
IAP workshop, Ghent, Sept. 18th, 2008

Overview
• Genetic background
• Objectives
• Data
• Methodology
• Results
• Conclusions

Genetic background
• Regulation of gene expression is affected either in:
  - Cis: affecting the expression of only one of the two alleles in a heterozygous individual;
  - Trans: affecting the expression of both alleles in a heterozygous individual.

Genetic background
• Why search for cis-regulatory variants?
  - "Low-hanging fruit": the search window is a small genomic region.
  - Fast screening for markers in LD (linkage disequilibrium) with the expression trait.
• How to search for cis-regulatory variants?
  - Using the GASED (Genome-wide Allelic Specific Expression Difference) approach (Kiekens et al., 2006).
  - Based on a diallel design, which is very popular in plant breeding for estimating GCA (general combining ability) and SCA (specific combining ability).

Genetic background
• What is the GASED approach?
• The expression of a gene in the kth offspring of the cross i × j (an F1 hybrid) can be written as (c: cis-element, t: trans-element):

\[ y_{ijk} = \mu + \underbrace{c_i + ct_{ii}}_{\text{from parent } i} + \underbrace{c_j + ct_{jj}}_{\text{from parent } j} + \underbrace{ct_{ij} + ct_{ji}}_{\text{from both (cross-terms)}} + \varepsilon_{ijk} \]

• Genotypic variation (in case homozygous):

\[ y_{ijk} = \mu + gca_i + gca_j + sca_{ij} + \varepsilon_{ijk} \]

• In case there is a cis-effect: $gca_i \neq gca_j$. A cis-regulatory divergence completely explains the difference between the two parental lines.
• In case there is no trans-effect: $sca_{ij} = 0$.

Objectives of this study
• Use mixed model analysis to discover cis-regulated Arabidopsis genes.
• Based on the GASED approach, partition the between-F1-hybrid genotypic variation for mRNA abundance into additive and non-additive variance components, to differentiate between cis- and trans-regulatory changes and to assign allele-specific expression differences to cis-regulatory variation.
• Find the associated haplotypes (sets of SNPs) for the selected cis-regulated genes.
• Survey cis-regulatory variation systematically to identify "superior alleles".

Flow chart
Data contains all expressed genes (25527 genes).
• Step I: Choose genes with significant genotypic variation: $\sigma^2_{genotype} > 0$
• Step II: Choose genes from Step I with no trans-regulatory variation: $\sigma^2_{sca_{ij}} = 0$
• Step III: Choose genes from Step II displaying significant allelic imbalance due to cis-regulatory variation: $gca_i \neq gca_j$
• Step IV: Choose genes from Step III showing significant association with the haplotype blocks found: $\beta_{SNP_i} \neq 0$

Data
Data acquisition:
1) Scan the arrays
2) Quantitate each spot
3) Subtract noise from background
4) Normalize
5) Export table: the data for us to analyze

Methodology - Step I: Mixed-Model Equations
Gene X: expression values
• Full model (dye and replicate are FIXED effects; genotype and array are RANDOM effects; error is the residual):

\[ y_{klnm} = \mu + dye_k + replicate_l + genotype_n + array_m + error_{klnm} \]

• Reduced model:

\[ y_{klnm} = \mu + dye_k + replicate_l + array_m + error_{klnm} \]

• $error \sim N(0, \Sigma_e)$, $\Sigma_e = I_{220}\,\sigma^2_e$
• $array \sim N(0, \Sigma_a)$, $\Sigma_a = I_{110}\,\sigma^2_a$
• $genotype \sim N(0, \Sigma_{genotype})$, $\Sigma_{genotype} = G = K\sigma^2_g$
• K = 55 x 55 marker-based relatedness matrix, calculated as $1 - d_R$, where $d_R$ is Rogers' distance (Rogers, 1972; Reif et al., 2005).

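The relatedness matrix K = 1 - d_R used in Step I can be illustrated with a minimal sketch. It assumes the standard form of Rogers' distance, $d_R = \frac{1}{m}\sum_{i=1}^{m}\sqrt{\tfrac{1}{2}\sum_{j}(p_{ij}-q_{ij})^2}$, and toy marker data; the function names `rogers_distance` and `relatedness_matrix` and the three-locus example are hypothetical and not taken from the presentation.

```python
import math

def rogers_distance(p, q):
    """Rogers' distance between two lines.

    p and q are lists of loci; each locus is a list of allele frequencies.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same loci")
    total = 0.0
    for p_locus, q_locus in zip(p, q):
        squared = sum((pa - qa) ** 2 for pa, qa in zip(p_locus, q_locus))
        total += math.sqrt(0.5 * squared)
    return total / len(p)

def relatedness_matrix(lines):
    """K[a][b] = 1 - d_R(line a, line b)."""
    n = len(lines)
    return [[1.0 - rogers_distance(lines[a], lines[b]) for b in range(n)]
            for a in range(n)]

# Toy example (assumption): two homozygous parents at three biallelic loci.
P1 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
P2 = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
# The F1 hybrid carries one allele from each parent at every locus.
F1 = [[(a + b) / 2 for a, b in zip(l1, l2)] for l1, l2 in zip(P1, P2)]

K = relatedness_matrix([P1, P2, F1])
```

In this toy case the hybrid lies halfway between its parents, so `rogers_distance(F1, P1)` equals half of `rogers_distance(P1, P2)`, and the diagonal of `K` is 1.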
Methodology - Step I: Mixed-Model Equations
• K = 55 x 55 marker-based relatedness matrix, built from Rogers' distance (Rogers, 1972; Reif et al., 2005):

\[ d_R = \frac{1}{m} \sum_{i=1}^{m} \sqrt{\frac{1}{2} \sum_{j=1}^{n_i} (p_{ij} - q_{ij})^2}, \qquad d_R \in [0, 1] \]

• $d_R(F_1, P_1) = d_R(F_1, P_2) = d_R(P_1, P_2)/2$ (Melchinger et al., 1991)
• $p_{ij}$ and $q_{ij}$ are the allele frequencies of the jth allele at the ith locus.
• $n_i$ is the number of alleles at the ith locus (i.e. $n_i = 2$).
• $m$ is the number of loci (i.e. m = 210,205).

Methodology - Step I: Multiple testing correction
• Gene X: $H_0: \sigma^2_g = 0$ vs $H_a: \sigma^2_g > 0$
• Likelihood ratio test (REML): $LRT \sim 0.5\,\chi^2(0) + 0.5\,\chi^2(1)$
• 25527 genes: p-value, then adjusted q-value (FDR)
• FDR: false discovery rate. How many of the called positives are false? A 5% FDR means that 5% of the calls are false positives.
• Storey et al. (2002): the q-value represents the FDR. Estimate the proportion of features that are truly null, $\hat{\pi}_0$:

\[ \widehat{qval}(t) = \frac{m\,\hat{\pi}_0\,t}{\#(pval \le t)} \]

• We use the adjusted q-value to represent the FDR.

Methodology - Step I: Multiple testing correction
• Storey et al. estimate $\hat{\pi}_0 = m_0/m$ under the assumption that the true null p-values are uniformly distributed on (0, 1):

\[ \widehat{qvalue}(t) = \frac{m\,\hat{\pi}_0\,t}{\#(pvalue \le t)}, \qquad t \in (0, 1) \]

• We estimate $\hat{\pi}_{0\_adj} = m_0/m$ under the assumption that 50% of the true null p-values are uniformly distributed on (0, 0.5) and the other 50% are exactly 0.5:

\[ \widehat{adjusted\_qvalue}(t) = \frac{m\,\hat{\pi}_{0\_adj}\,t}{\#(pvalue \le t)}, \qquad t \in (0, 0.5) \]

Methodology - Step II: Mixed-Model Equations
Gene X: expression values
• Full model (dye and replicate are FIXED effects; gca, sca and array are RANDOM effects; error is the residual):

\[ y_{klijm} = \mu + dye_k + replicate_l + gca_i + gca_j + sca_{ij} + array_m + error_{klijm} \]

\[ \Sigma_{genotype} = K\sigma^2_g = K(\sigma^2_{gca_i} + \sigma^2_{gca_j} + \sigma^2_{sca_{ij}}) = LL^T (I_{1 \times 10}, I_{1 \times 45})(\sigma^2_{gca}, \sigma^2_{sca}) = L\,(I_{1 \times 10}, I_{1 \times 45})(\sigma^2_{gca}, \sigma^2_{sca})\,L^T \]

• L is the Cholesky factor of K ($K = LL^T$).
• Reduced model:

\[ y_{klijm} = \mu + dye_k + replicate_l + gca_i + gca_j + array_m + error_{klijm} \]

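The Step I testing recipe (a p-value from the 0.5 χ²(0) + 0.5 χ²(1) mixture, then a q-value of the form m·π0·t / #(p ≤ t)) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, π0 is passed in by the caller because the slides do not spell out its estimator, and the running-minimum step that keeps q-values monotone is a common convention added here.

```python
import math

def mixture_lrt_pvalue(lrt):
    """P-value under the 0.5*chi2(0) + 0.5*chi2(1) null distribution."""
    if lrt <= 0.0:
        # Point mass at zero: the slides treat these p-values as exactly 0.5.
        return 0.5
    # For one degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(lrt / 2.0))

def q_values(pvalues, pi0):
    """q(t) = m * pi0 * t / #(p <= t), evaluated at each observed p-value.

    pi0 is the assumed proportion of true nulls (supplied by the caller).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q never increases as p decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        raw = pi0 * m * pvalues[i] / rank
        running_min = min(running_min, raw)
        q[i] = running_min
    return q

# Hypothetical LRT statistics for four genes.
lrts = [12.0, 3.84, 0.0, 6.5]
pvals = [mixture_lrt_pvalue(x) for x in lrts]
qvals = q_values(pvals, pi0=0.8)
```

Genes whose q-value falls below the chosen cutoff would be carried forward to the next step.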
Methodology - Step II: Multiple testing correction
• Gene X: $H_0: \sigma^2_{sca} = 0$ vs $H_a: \sigma^2_{sca} > 0$
• Likelihood ratio test (REML): $LRT \sim 0.5\,\chi^2(0) + 0.5\,\chi^2(1)$
• 20976 genes: p-value, then qa-value (FNR)
• FNR: false non-discovery rate (Genovese et al., 2002). How many of the called negatives are false? A 5% FNR means that 5% of the negative calls are false negatives.
• Since we are interested in selecting genes that test negative for an $sca_{ij}$ effect, we control the FNR instead of the FDR.
• We use the qa-value to represent the FNR.

Methodology - Step II: Multiple testing correction
• False non-discovery rate (FNR), where T is the number of false negatives and R is the number of rejected hypotheses among the m tests:

\[ FNR = E\left[ \frac{T}{m - R} \,\middle|\, (m - R) > 0 \right] \Pr\big((m - R) > 0\big) \]

\[ \widehat{qaval}(t) = 1 - \frac{m\,\hat{\pi}_0\,(1 - t)}{\#(pval > t)} \]

• $\hat{\pi}_0$ is the estimate of the proportion of features that are truly null.

Methodology - Step III: Mixed-Model Equations
• Model:

\[ y_{klijm} = \mu + dye_k + replicate_l + gca_i + gca_j + array_m + error_{klijm} \]

• Gene X: test the 45 pairs $gca_i = gca_j$? That is: g1 = g2? g1 = g3? g1 = g4? ... g1 = g10? g2 = g3? g2 = g4? g2 = g5? ... g2 = g10? ... g9 = g10?
• Two-sample dependent t-test:

\[ standard\_t = \frac{\hat{g}_1 - \hat{g}_2}{SE(\hat{g}_1 - \hat{g}_2)} \]

\[ non\text{-}standard\_t = \frac{(\hat{g}_1 - g_1) - (\hat{g}_2 - g_2)}{SE\big((\hat{g}_1 - g_1) - (\hat{g}_2 - g_2)\big)} \]

• $\hat{g}_1$ is the BLUP of $g_1$; $\hat{g}_2$ is the BLUP of $g_2$.
• Non-standard p-value: the true null p-values are not uniformly distributed on (0, 1).

Methodology - Step III: Multiple testing correction
• Gene X: $H_0: gca_i = gca_j$ vs $H_a: gca_i \neq gca_j$
• Two-sample t-test on the BLUPs.
• Simulate the $H_0$ distribution from the real data: simulation-based p-value.
• 1380 genes: simulation-based p-value, then q-value (FDR)

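The qa-value used to control the FNR in Step II, qa(t) = 1 - m·π0·(1 - t) / #(p > t), can be sketched as below. This is a minimal illustration: the function name `qa_values` is hypothetical, π0 is supplied by the caller, and clipping the estimate to [0, 1] and returning `None` when no gene is called negative are choices made here, not details from the slides.

```python
import bisect

def qa_values(pvalues, pi0):
    """Estimate qa(t) = 1 - m*pi0*(1 - t) / #(p > t) at each observed p-value t."""
    m = len(pvalues)
    sorted_p = sorted(pvalues)
    out = []
    for t in pvalues:
        # Number of genes called negative at threshold t, i.e. with p > t.
        n_negative = m - bisect.bisect_right(sorted_p, t)
        if n_negative == 0:
            out.append(None)  # no negatives at this threshold, estimate undefined
            continue
        estimate = 1.0 - m * pi0 * (1.0 - t) / n_negative
        # Clip to a valid proportion (assumption made for this sketch).
        out.append(min(1.0, max(0.0, estimate)))
    return out

# Hypothetical p-values for four genes and an assumed pi0 of 0.5.
example = qa_values([0.1, 0.2, 0.6, 0.9], pi0=0.5)
```

With these toy inputs, calling every gene with p > 0.2 negative gives an estimated false non-discovery proportion of 0.2 among those calls.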
Methodology - Step IV: Mixed-Model Equations
Gene X (cis-regulated)
[Diagram: the gene on its chromosome with the nearby tag SNPs SNP1, SNP2, SNP3, ..., SNPi]
• Full model (dye, replicate and the SNP terms are FIXED effects; genotype and array are RANDOM effects; error is the residual):

\[ y_{klim} = \mu + dye_k + replicate_l + \sum_i \beta_{SNP_i} \cdot SNP_i + genotype_i + array_m + error_{klim} \]

• $genotype \sim N(0, \Sigma_{genotype})$, $\Sigma_{genotype} = G = K\sigma^2_g$; K = 55 x 55 marker-based relatedness matrix.
• $array \sim N(0, \Sigma_a)$, $\Sigma_a = I_{110}\,\sigma^2_a$; $error \sim N(0, \Sigma_e)$, $\Sigma_e = I_{220}\,\sigma^2_e$
• Reduced model:

\[ y_{klim} = \mu + dye_k + replicate_l + genotype_i + array_m + error_{klim} \]

Methodology - Step IV: Multiple testing correction
• Gene X (cis-regulated): $H_0: \beta_{SNP_1} = \beta_{SNP_2} = \dots = \beta_{SNP_i} = 0$ vs $H_a$: at least one $\beta_{SNP_i} \neq 0$
• Likelihood ratio test (ML): $LRT \sim \chi^2(2n)$, where n is the number of SNPs.
• 836 genes: p-value, then q-value (FDR)

Results
Data contains all expressed genes (25527 genes).
• Step I: $\sigma^2_{genotype} > 0$, adjusted q-value < 0.0005: 20979 genes
• Step II: $\sigma^2_{sca_{ij}} = 0$, adjusted qa-value < 0.01: 1328 genes
• Step III: $gca_i \neq gca_j$, q-value < 0.01: 972 genes
• Step IV: $\beta_{SNP_i} \neq 0$, q-value < 0.01: 859 genes

Results
• Among all 25527 genes, 20979 genes have significant genotypic variation (q-value < 0.0005). (Step I)
• Among these 20979 genes, 1328 genes have no trans-regulatory effect (qa-value < 0.01). (Step II)
• Among these 1328 genes, 972 genes show significantly different allelic expression (q-value < 0.01); these 972 genes are identified as cis-regulated. (Step III)
• We confirm the discovery of these 972 cis-regulated genes in Step IV:
  - An allelic expression difference caused by a cis-regulatory variant implies a nearby polymorphism (SNP), in LD, that controls expression.
  - We indeed found that 96.5% of the selected cis-regulated genes have associated polymorphisms (haplotype blocks) nearby.

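The Step IV likelihood ratio test can be sketched with the standard library alone, because a chi-square distribution with an even number of degrees of freedom has a closed-form tail probability: $P(\chi^2_{2n} > x) = e^{-x/2}\sum_{k=0}^{n-1}(x/2)^k/k!$. The sketch follows the slides in using 2n degrees of freedom for n SNPs; the function names and the log-likelihood values are hypothetical, and in practice the log-likelihoods would come from fitting the full and reduced models by ML.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper tail probability P(chi2_df > x) for an even, positive df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def snp_lrt_pvalue(loglik_full, loglik_reduced, n_snps):
    """P-value of LRT = 2 * (loglik_full - loglik_reduced) against chi2(2 * n_snps)."""
    lrt = 2.0 * (loglik_full - loglik_reduced)
    return chi2_sf_even_df(max(lrt, 0.0), 2 * n_snps)

# Hypothetical ML log-likelihoods for one gene with 3 tag SNPs.
p = snp_lrt_pvalue(loglik_full=-250.0, loglik_reduced=-262.5, n_snps=3)
```

The resulting per-gene p-values would then be converted to q-values to control the FDR, as in the earlier steps.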
Conclusions
• The mixed-model approach used here for association mapping, with the kinship matrix included, is more appropriate than other recent methods for identifying cis-regulated genes (the p-values are more reliable).
• At each step, statistical significance is specified more accurately by controlling the FDR or the FNR.
• Using simulation-based p-values when testing differences between random effects increases the power to detect associations.
• A comprehensive analysis of gene expression variation in plant populations has been described.
• This mixed-model analysis strategy provides a detailed characterization of both the genetic and the positional effects in the genome.
• This detailed statistical analysis provides a robust and useful framework for future analyses of gene expression variation in large samples.
• Advanced statistical methods look promising for making interesting discoveries in genetics.

Many thanks for your attention!
