Regulatory variation and its functional consequences
Chris Cotsapas
[email protected]

Motivating questions
• How do phenotypes vary across individuals?
  – Regulatory changes drive cellular and organismal traits
  – Likely also drive evolutionary differences
• How are genes (co)regulated?
  – Pathways, processes, contexts

Regulatory variation
• What do “interesting” variants do?
• Genetic changes to:
  – Coding sequence **
  – Gene expression levels
  – Splice isoform levels
  – Methylation patterns
  – Chromatin accessibility
  – Transcription factor binding kinetics
  – Cell signaling
  – Protein-protein interactions
~88% of GWAS hits are regulatory

Genetic variation alters regulation
• Protein levels
  – Maize (Damerval 94)
• Expression levels
  – Yeast, maize, mouse, humans (Brem 02, Schadt 03, Stranger 05, Stranger 07)
• RNA splicing
  – Humans (Pickrell 12, Lappalainen 13)
• Methylation and DNase I peak strength
  – Humans (Degner 12; Gibbs 12)

Genetics of gene expression (eQTL)
• cis-eQTL
  – The position of the eQTL maps near the physical position of the gene.
  – Promoter polymorphism?
  – Insertion/Deletion?
  – Methylation, chromatin conformation?
• trans-eQTL
  – The position of the eQTL does not map near the physical position of the gene.
  – Regulator?
  – Direct or indirect?
Modified from Cheung and Spielman 2009 Nat Genet

QT association
• Analysis of the relationship between a dependent or outcome variable (phenotype) and one or more independent or predictor variables (SNP genotype)
Linear regression equation: $Y_i = b_0 + b_1 X_i + e_i$
Logistic regression equation: $\ln\left(\frac{p_i}{1 - p_i}\right) = b_0 + b_1 X_i + e_i$
[Figure: continuous trait value against number of A1 alleles (0, 1, 2), with a fitted line of intercept b0 and slope b1]

eQTL analysis: a GWAS for every gene
[Figure: one panel per gene: gene 1, gene 2, gene 3, gene 4, gene 5, gene N]

Cis-eQTL analysis: Test SNPs within a pre-defined distance of gene
[Figure labels: 1 Mb window, probe, gene, SNPs]

cis-eQTLs are rather common
Nica et al PLoS Genet 2011

Cis-eQTLs cluster around TSS
Stranger et al PLoS Genet 2012

Open question
WHERE ARE THE TRANS eQTLS?
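Worked example (not from the original slides): the QT association model Y_i = b0 + b1 X_i + e_i and the cis-window test can be sketched in a few lines of Python. Everything here is illustrative: the function names `qt_association` and `cis_snps`, the simulated genotypes, the gene coordinates, and the choice to measure the 1 Mb window from the gene start and end are all assumptions. The p-value uses a normal approximation to the t distribution, which is only reasonable for large samples.

```python
import math
import numpy as np

def qt_association(genotype, expression):
    """Fit expression_i = b0 + b1 * genotype_i + e_i by ordinary least squares."""
    x = np.asarray(genotype, dtype=float)    # number of A1 alleles: 0, 1 or 2
    y = np.asarray(expression, dtype=float)  # continuous trait value
    n = x.size
    x_c = x - x.mean()
    sxx = np.sum(x_c ** 2)
    b1 = np.sum(x_c * (y - y.mean())) / sxx  # slope
    b0 = y.mean() - b1 * x.mean()            # intercept
    resid = y - (b0 + b1 * x)
    sigma2 = np.sum(resid ** 2) / (n - 2)    # residual variance
    t = b1 / math.sqrt(sigma2 / sxx)
    # Assumption: normal approximation to the two-sided t-test p-value.
    p_approx = math.erfc(abs(t) / math.sqrt(2.0))
    return b0, b1, t, p_approx

def cis_snps(snp_positions, gene_start, gene_end, window=1_000_000):
    """Indices of SNPs within `window` bp of the gene (assumed window definition)."""
    pos = np.asarray(snp_positions)
    keep = (pos >= gene_start - window) & (pos <= gene_end + window)
    return np.flatnonzero(keep)

# Hypothetical data: 200 individuals, 5 SNPs, one gene at 5.00-5.02 Mb.
rng = np.random.default_rng(1)
n = 200
snp_positions = [1_000_000, 4_500_000, 5_010_000, 5_900_000, 9_000_000]
genotypes = rng.binomial(2, 0.3, size=(n, 5))
# SNP at index 2 is simulated as the cis-eQTL with slope 1.0.
expression = 1.0 * genotypes[:, 2] + rng.normal(0.0, 0.5, size=n)

cis_idx = cis_snps(snp_positions, gene_start=5_000_000, gene_end=5_020_000)
results = {int(j): qt_association(genotypes[:, j], expression) for j in cis_idx}
for j, (b0, b1, t, p) in results.items():
    print(f"SNP {j}: b0={b0:.3f} b1={b1:.3f} t={t:.2f} p~{p:.2e}")
```

Running one such regression for every SNP in the window, for every gene, is what makes cis-eQTL analysis "a GWAS for every gene"; dropping the window restriction gives the whole-genome (trans) scan.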
trans hotspots (yeast)
Brem et al Science 2002
Yvert et al Nat Genet 2003

Whole-genome eQTL analysis is an independent GWAS for expression of each gene
[Figure: one panel per gene: gene 1, gene 2, gene 3, gene 4, gene 5, gene N]

Issues with trans mapping
• Power
  – Genome-wide significance is 5e-8
  – Multiple testing on ~20K genes
  – Sample sizes clearly inadequate
• Data structure
  – Bias corrections deflate variance
  – Non-normal distributions
• Sample sizes
  – Far too small

But…
• Assume that trans eQTLs affect many genes…
• …and you can use multivariate methods!
Hore et al Nat Genet 2016

MHC class I; Hore et al Nat Genet 2016

Histone RNA processing; Hore et al Nat Genet 2016

trans-eQTL implies over-dispersion

Cross-phenotype meta-analysis
[Figure: three panels of −log(p) distributions, one with λ = 1 and two with λ ≠ 1]
$S_{\mathrm{CPMA}} \sim \frac{L(\text{data} \mid \lambda \neq 1)}{L(\text{data} \mid \lambda = 1)}$
Cotsapas et al, PLoS Genetics 2011

[Figure: ROC curves (true positive rate against false positive rate), one panel per sample size from N = 50 to N = 400 in steps of 50, with curves for NCP = 1, 2, 3, 4, 5]
Brynedal et al AJHG to appear

Comparing eQTLs from three African populations
Brynedal et al AJHG to appear

Prediction 1
• Allelic effects should be conserved between two populations
[Figure: genes with p1 < 0.05 and genes with p2 < 0.05; effect directions YRI: + + - - +, LWK: + + - - +; panels: L vs Y, M vs Y, L vs M]
Brynedal et al AJHG to appear

Prediction 2
• Target genes should overlap
[Figure: genes with p1 < 0.05 and genes with p2 < 0.05; panels: L vs Y, M vs Y, L vs M]
Brynedal et al AJHG to appear

RNAseq, GTEx
NEXT-GEN SEQUENCING DATA

GTEx – Genotype-Tissue EXpression
An NIH common fund project
Current: 35 tissues from 50 donors
Scale up: 20K tissues from 900 donors.
Novel methods groups: 5 current + RFA

How can we make RNAseq useful?
• Standard eQTLs
  – Montgomery et al, Pickrell et al Nature 2010
• Isoform eQTLs
  – Depth of sequence!
• Long genes are preferentially sequenced
• Abundant genes/isoforms ditto
• Power!?
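Worked example (not from the original slides): the cross-phenotype meta-analysis idea, comparing L(data | λ ≠ 1) against L(data | λ = 1), can be sketched as a likelihood-ratio test. This sketch assumes that −ln(p) across genes follows an exponential distribution with rate λ, so that λ = 1 corresponds to uniform (null) p-values, and that twice the log likelihood ratio is compared to a chi-square distribution with 1 degree of freedom. The function name `cpma_statistic` and the simulated p-values are hypothetical, and the published method may differ in detail.

```python
import math
import numpy as np

def cpma_statistic(pvalues):
    """Likelihood-ratio test of exponential rate lambda = lambda_hat vs lambda = 1."""
    p = np.clip(np.asarray(pvalues, dtype=float), 1e-300, 1.0)  # guard against log(0)
    x = -np.log(p)                 # one -ln(p) value per gene
    n = x.size
    total = x.sum()
    lam_hat = n / total            # maximum-likelihood estimate of the rate
    loglik_alt = n * math.log(lam_hat) - lam_hat * total
    loglik_null = -total           # exponential log likelihood at lambda = 1
    stat = 2.0 * (loglik_alt - loglik_null)
    # Chi-square (1 df) tail probability: P(X > s) = erfc(sqrt(s / 2)).
    pval = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return lam_hat, stat, pval

# Hypothetical data: one SNP tested against 2000 genes.
rng = np.random.default_rng(0)
null_p = rng.uniform(size=2000)            # no trans effect: uniform p-values
shifted_p = null_p.copy()
shifted_p[:200] = shifted_p[:200] ** 3     # 10% of genes given smaller p-values

lam_null, stat_null, p_null = cpma_statistic(null_p)
lam_shift, stat_shift, p_shift = cpma_statistic(shifted_p)
print(f"null:    lambda_hat={lam_null:.3f} stat={stat_null:.2f}")
print(f"shifted: lambda_hat={lam_shift:.3f} stat={stat_shift:.2f}")
```

The point of the design is that no single gene needs to reach genome-wide significance: a SNP that nudges many genes shifts the whole distribution of −ln(p), which moves the estimated rate away from 1.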
• Mapping biases due to SNPs

RNAseq combined with other techs
• Regulons: TF gene sets via ChIP-seq
  – Look for trans effects
• Open chromatin states (DNase I; methylation)
  – Find active genes
  – Changes in epigenetic marks correlated to RNA
  – Genetic effects
• RNA/DNA comparisons
  – Simultaneous SNP detection/genotyping
  – RNA editing ???