Regulatory variation and its functional consequences
Chris Cotsapas
[email protected]
Motivating questions
• How do phenotypes vary across individuals?
– Regulatory changes drive cellular and organismal traits
– Likely also drive evolutionary differences
• How are genes (co)regulated?
– Pathways, processes, contexts
Regulatory variation
• What do “interesting” variants do?
• Genetic changes to:
– Coding sequence **
– Gene expression levels
– Splice isoform levels
– Methylation patterns
– Chromatin accessibility
– Transcription factor binding kinetics
– Cell signaling
– Protein-protein interactions
~88% of GWAS hits are regulatory
Genetic variation alters regulation
• Protein levels
– Maize (Damerval 94)
• Expression levels
– Yeast, maize, mouse, humans (Brem 02, Schadt 03, Stranger 05, Stranger 07)
• RNA splicing
– Humans (Pickrell 12, Lappalainen 13)
• Methylation and DNase I peak strength
– Humans (Degner 12; Gibbs 12)
Genetics of gene expression (eQTL)
• cis-eQTL
– The position of the eQTL maps near the physical position of the gene.
– Promoter polymorphism?
– Insertion/Deletion?
– Methylation, chromatin conformation?
• trans-eQTL
– The position of the eQTL does not map near the physical position of the gene.
– Regulator?
– Direct or indirect?
Modified from Cheung and Spielman 2009 Nat Gen
Cis-eQTL analysis:
Test SNPs within a pre-defined distance of gene
[Figure: SNPs tested within a 1Mb window on either side of the gene (probe)]
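A minimal sketch of the window selection described on this slide, assuming the 1Mb distance is measured from the ends of the gene body (the slide does not say whether the anchor is the gene body, the TSS or the probe). The `Snp` record, the `cis_snps` helper and the toy coordinates are hypothetical names and values used only for illustration.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    snp_id: str
    chrom: str
    pos: int  # base-pair position


def cis_snps(snps, gene_chrom, gene_start, gene_end, window=1_000_000):
    """Return SNPs on the gene's chromosome within `window` bp of the gene body."""
    lo = max(0, gene_start - window)
    hi = gene_end + window
    return [s for s in snps if s.chrom == gene_chrom and lo <= s.pos <= hi]


# Toy data (made up): only snpB falls inside the window around the gene.
snps = [
    Snp("snpA", "1", 500_000),
    Snp("snpB", "1", 2_400_000),
    Snp("snpC", "1", 5_000_000),
    Snp("snpD", "2", 2_000_000),
]
selected = cis_snps(snps, gene_chrom="1", gene_start=2_000_000, gene_end=2_050_000)
print([s.snp_id for s in selected])
```

Restricting the test set this way keeps the number of tests per gene small, which is the main reason cis scans are better powered than genome-wide trans scans.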
QT association
• Analysis of the relationship between a dependent or outcome variable (phenotype) and one or more independent or predictor variables (SNP genotype)
• Linear regression equation (continuous trait): Y_i = b0 + b1 X_i + e_i
• Logistic regression equation (binary trait): ln(p_i / (1 - p_i)) = b0 + b1 X_i
[Figure: continuous trait value plotted against number of A1 alleles (0, 1, 2); the fitted line has intercept b0 and slope b1]
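The linear model above can be fitted by ordinary least squares. The following is a small sketch using only the Python standard library; the function name `fit_additive` and the toy genotype and expression values are illustrative assumptions, not data from the talk.

```python
def fit_additive(genotypes, expression):
    """Ordinary least squares for expression = b0 + b1 * allele_count."""
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(expression) / n
    # Sum of squares of X and cross-product of X and Y around their means.
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, expression))
    b1 = sxy / sxx            # slope: change in trait per copy of allele A1
    b0 = mean_y - b1 * mean_x  # intercept: expected trait value at 0 copies
    return b0, b1


# Toy data (made up): number of A1 alleles and an expression value per individual.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]
b0, b1 = fit_additive(genotypes, expression)
print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")
```

In practice covariates such as sex are added to the model, and the slope is tested against zero to obtain the association p-value.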
eQTL analysis: a GWAS for every gene
[Figure: one genome-wide association scan per gene, for gene 1 through gene N]
cis-eQTLs are rather common
Nica et al PLoS Genet 2011
Cis-eQTLs cluster around TSS
Stranger et al
PLoS Genet 2012
trans hotspots (yeast)
Brem et al Science 2002
Yvert et al Nat Genet 2003
Candidate genes, perturbations underlying organismal phenotypes
DOES REGULATORY VARIATION ALTER
PHENOTYPE? APPLICATION TO GWAS
Rationale
• How do disease/trait variants actually alter
biology?
• If they change regulation, then:
– Change in gene expression/isoform use
– Phenotypic consequence*
Risk variant → Molecular trait → Cellular trait → Disease risk
RTC and CPSM in the CD58 Locus
[Figure: eQTL and GWAS association signals with RTC and CPSM scores for markers M1 to M5 (Trait #1, Trait #2), plotted against physical position (Mbp), 116.5 to 117.5]
The PAINTOR model
• If a SNP is causal, then r2 should predict association of other SNPs in the area
• Correlations between test statistics Z are approximated by a multivariate normal (MVN) given local pairwise LD structure
Parameters
λ: standardized effect size
Z: association statistic
C: indicator of causality
m: SNP considered
Kichaev et al. PLoS Genet. 2014
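To illustrate how LD predicts association at neighbouring SNPs, here is a hedged sketch of the mean of the MVN approximation only. It assumes the mean takes the form E[Z] = Σ(λ ∘ C), with Σ the matrix of signed pairwise correlations r (not r2); that parameterization, the function name `expected_z` and the toy LD matrix are assumptions for illustration, and this is not the PAINTOR implementation.

```python
def expected_z(ld, effect, causal):
    """Mean Z score at each SNP: E[Z_j] = sum_k ld[j][k] * effect[k] * causal[k]."""
    m = len(ld)
    return [
        sum(ld[j][k] * effect[k] * causal[k] for k in range(m))
        for j in range(m)
    ]


# Toy LD matrix (made up) of signed correlations r between three SNPs.
ld = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
effect = [5.0, 5.0, 5.0]  # lambda: standardized effect size if the SNP is causal
causal = [1, 0, 0]        # C: only the first SNP is causal

print(expected_z(ld, effect, causal))
```

With only the first SNP causal, the expected statistic at each neighbour shrinks in proportion to its correlation with that SNP, which is the pattern a fine-mapping method compares against the observed Z scores.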
[Figure: eQTL and GWAS signals across markers M1 to M5 for Trait #1 and Trait #2, contrasting distinct and shared association signals]
          Disease loci                  With any eQTL***           Explained by a shared eQTL variant****
Disease   Known*  Densely genotyped**   CD4+  CD14+  LCL  Total    CD4+  CD14+  LCL  Total
IBD       110     69                    69    69     68   69       6     7      1    11
CD        30      19                    18    18     18   18       2     1      0    3
UC        23      10                    10    9      10   10       2     1      3    4
MS        93      55                    54    55     55   56       8     3      3    10
T1D       44      40                    39    40     36   40       2     0      2    4
CEL       38      34                    34    34     34   34       3     1      0    4
RA        44      34                    34    34     34   34       1     0      0    1

* Excluding conditional hits
** Defined by immunochip's densely genotyped fine-mapping intervals. Excluding MHC
*** Loosely identified by cis-eQTL signal within +/-100kb from index SNPs. cis-eQTL is defined by the association p-value < 0.05.
**** FDR < 0.05
The numbers are all based on unique loci.
[Figure: regional association plots overlaying eQTL and GWAS signals at two loci, with x-axis positions 45.60 to 45.75 and 7.7 to 8.3]
Tissues: LCL, CD14+, CD4+ (per-row tissue assignment lost in extraction)

MS GWAS                    eQTL association       GWAS-eQTL   Joint likelihood result
SNP           p-value      Gene       p-value     LD (r2)     Empirical P   Bonferroni
rs35967351    4.79E-08     SLAMF7     5.42E-08    0.94        0             7E-03
rs9989735     2.22E-16     SP140      1.29E-09    0.88        0             7E-03
rs1021156     4.20E-11     ZC2HC1A    3.35E-30    0.99        0             7E-03
rs1021156     4.20E-11     PKIA       1.12E-09    0.99        1E-05         7E-03
rs1359062     6.09E-13     RGS1       1.61E-21    0.95        0             7E-03
rs1021156     4.20E-11     ZC2HC1A    3.38E-40    1.00        0             7E-03
rs201202118   3.01E-16     METTL21B   1.95E-21    0.99        0             7E-03
rs35967351    4.79E-08     NHLH1      8.22E-05    0.99        0             7E-03
rs71624119    6.56E-10     ANKRD55    1.99E-10    1.00        3E-05         2E-02
rs917116      2.04E-09     JAZF1      6.16E-16    0.94        0             7E-03
rs60600003    3.22E-11     ELMO1      1.17E-08    0.81        1E-05         7E-03
rs1021156     4.20E-11     ZC2HC1A    4.51E-12    0.98        0             7E-03
rs201202118   3.01E-16     METTL21B   8.84E-21    1.00        0             7E-03
rs12946510    6.63E-05     GSDMB      4.15E-17    0.79        0             7E-03
rs12946510    6.63E-05     ORMDL3     5.65E-13    0.88        0             7E-03
Open question
DOES REGVAR REVEAL CO-REGULATION?
A.K.A. WHERE ARE THE TRANS eQTLS?
Whole-genome eQTL analysis is an independent GWAS for expression of each gene
[Figure: one genome-wide association scan per gene, for gene 1 through gene N]
Issues with trans mapping
• Power
– Genome-wide significance is 5e-8
– Multiple testing on ~20K genes
– Sample sizes clearly inadequate
• Data structure
– Bias corrections deflate variance
– Non-normal distributions
• Sample sizes
– Far too small
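The power problem can be made concrete with a line of arithmetic. This sketch assumes a simple Bonferroni correction across genes on top of the usual genome-wide threshold, which is one conservative way to combine the two numbers on the slide and not necessarily the correction used in any particular study.

```python
import math

genome_wide_alpha = 5e-8   # conventional single-trait GWAS threshold
n_genes = 20_000           # approximate number of expression traits tested

# Bonferroni across genes: every SNP-gene test must clear this threshold.
per_test_alpha = genome_wide_alpha / n_genes
print(f"per-test threshold: {per_test_alpha:.1e}")
print(f"-log10 threshold: {-math.log10(per_test_alpha):.1f}")
```

A threshold this stringent is out of reach for all but the largest effects at the sample sizes typical of expression cohorts.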
But…
• Assume that trans eQTLs affect many genes…
• …and you can use cross-trait methods!
Association data (s × p matrix of Z scores):
Z1,1  Z1,2  …  Z1,p
Z2,1  …     …  …
:     :        :
Zs,1  …     …  Zs,p
Cross-phenotype meta-analysis
[Figure: distributions of −log(p) under λ = 1 and under λ ≠ 1]
S_CPMA ~ L(data | λ≠1) / L(data | λ=1)
Cotsapas et al, PLoS Genetics
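A sketch of the likelihood-ratio idea for a single SNP, assuming that under the null −ln(p) across traits follows an exponential distribution with rate λ = 1, and that the statistic is compared with a chi-square distribution with one degree of freedom. The function name `cpma` and the example p-values are made up for illustration; this is not the authors' code.

```python
import math


def cpma(p_values):
    """Likelihood-ratio test of whether -ln(p) follows Exponential(rate = 1)."""
    x = [-math.log(p) for p in p_values]
    n = len(x)
    total = sum(x)
    rate_hat = n / total                                     # MLE of the rate
    loglik_null = -total                                     # rate fixed at 1
    loglik_alt = n * math.log(rate_hat) - rate_hat * total   # rate at its MLE
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    # Survival function of a chi-square with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p


# One SNP tested against ten traits (made-up p-values).
null_like = [0.9, 0.5, 0.3, 0.7, 0.1, 0.6, 0.2, 0.8, 0.4, 0.95]
shared = [1e-4, 1e-3, 1e-5, 0.01, 1e-3, 0.02, 1e-4, 0.5, 1e-6, 1e-3]

stat_null, p_null = cpma(null_like)
stat_shared, p_shared = cpma(shared)
print(f"null-like: S = {stat_null:.2f}, p = {p_null:.3g}")
print(f"shared:    S = {stat_shared:.2f}, p = {p_shared:.3g}")
```

The test asks whether the whole set of p-values for a SNP departs from the null distribution, so many modest associations can add up to evidence even when no single trait reaches significance.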
CPMA for correlated traits
• Empirical assessment to account for
correlation
• Simulate Z scores under covariance,
recalculate CPMA
• Construct distribution of CPMA for dataset,
call significance
with Ben Voight, U Penn
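The three steps above can be sketched as follows, assuming null Z scores are drawn from a multivariate normal with the trait covariance matrix and converted to two-sided p-values. All function names, the toy covariance matrix and the observed Z scores are illustrative assumptions rather than the analysis actually run.

```python
import math
import random


def cpma_stat(p_values):
    """CPMA-style likelihood-ratio statistic on -ln(p)."""
    x = [-math.log(max(p, 1e-300)) for p in p_values]
    n, total = len(x), sum(x)
    rate_hat = n / total
    return max(0.0, 2.0 * (n * math.log(rate_hat) - n + total))


def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov must be positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def simulate_z(low, rng):
    """One draw of correlated null Z scores: z = L e, with e ~ N(0, I)."""
    e = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(len(low))]


def z_to_p(z):
    """Two-sided p-value for a standard normal Z score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def empirical_cpma_p(observed_z, cov, n_sim=2000, seed=0):
    """Fraction of simulated null CPMA statistics at least as large as observed."""
    rng = random.Random(seed)
    low = cholesky(cov)
    observed = cpma_stat([z_to_p(z) for z in observed_z])
    hits = 0
    for _ in range(n_sim):
        sim = cpma_stat([z_to_p(z) for z in simulate_z(low, rng)])
        if sim >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Four traits with pairwise correlation 0.5 (made-up covariance).
cov = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
p_strong = empirical_cpma_p([4.0, 3.5, 4.2, 3.8], cov)
p_weak = empirical_cpma_p([0.3, -0.5, 0.8, -0.2], cov)
print(p_strong, p_weak)
```

Because the null is simulated under the observed covariance, correlated traits that would inflate an analytic p-value are accounted for in the empirical one.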
Experimental design
• 610,180 SNPs: MAF > 0.15 in CEU and YRI, LD pruned (r2 < 0.2)
• 8368 transcripts detectable on Illumina arrays
• 108 CEU individuals* and 109 YRI individuals*
• plink association per population (Transcript ~ SNP, sex), giving CEU p-values and YRI p-values
• CPMA on each set of p-values, giving CEU CPMA scores and YRI CPMA scores
• Threshold: >95%ile sim CPMA
* Stranger et al Nat Genet 2007 (LCL data; publicly available)
Target sets of genes
• trans-acting variant: SNP with CPMA evidence
• Target genes: genes affected by trans-acting
variant (i.e. regulon)
Prediction 1
• Allelic effects should be conserved between
two populations
– Binomial test on paired observations for all genes with P < 0.05 in at least one population
[Figure: genes with pCEU < 0.05 or pYRI < 0.05; direction of allelic effect (+/-) per gene in CEU compared with YRI, for a concordant and a discordant example]
True for 1124/1311 SNPs (binomial p < 0.05)
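A hedged sketch of a sign-concordance test of this kind for one SNP, assuming each gene contributes one pair of effect directions and that concordance is tested against a chance rate of 0.5 with an exact two-sided binomial test. The helper names and the effect values are made up for illustration.

```python
import math


def binomial_two_sided_p(k, n, prob=0.5):
    """Exact two-sided binomial p-value: sum of outcomes no more likely than k."""
    pmf = [math.comb(n, i) * prob**i * (1 - prob) ** (n - i) for i in range(n + 1)]
    return sum(q for q in pmf if q <= pmf[k] * (1 + 1e-9))


def sign_concordance(effects_ceu, effects_yri):
    """Count genes whose allelic effect has the same sign in both populations."""
    pairs = [(a, b) for a, b in zip(effects_ceu, effects_yri) if a != 0 and b != 0]
    concordant = sum((a > 0) == (b > 0) for a, b in pairs)
    return concordant, len(pairs), binomial_two_sided_p(concordant, len(pairs))


# Made-up effect estimates for 20 target genes of one SNP.
effects_ceu = [0.5] * 10 + [-0.4] * 10
effects_yri = [0.3] * 9 + [-0.2] + [-0.6] * 8 + [0.1] * 2

concordant, n_genes, p_value = sign_concordance(effects_ceu, effects_yri)
print(concordant, n_genes, p_value)
```

Only the direction of each effect is used, so the test does not depend on effect sizes being comparable between the two populations.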
Prediction 2
• Target genes should overlap
– Identify by mixture-of-Gaussians classification
– Empirical p from the distribution of overlaps between N_CEU and N_YRI genes across SNPs
[Figure: overlap between genes with pCEU < 0.05 and genes with pYRI < 0.05]
True for 600/1311 SNPs (empirical p < 0.05)
What about the target genes?
• Regulons:
– Encode proteins more connected than expected by chance
www.broadinstitute.org/mpg/dapple.php
Rossin et al 2011 PLoS Genetics
What about the target genes?
• Regulons:
– Encode proteins enriched for TF targets (ENCODE LCL data)
– 24/67 filtered TFs significant
– Binomial overlap test
[Figure: overlap between trans target genes and ChIP-seq LCL target genes]
TF        p-value
CEBPB     3.7 x 10^-142
HDAC8     7.8 x 10^-122
FOS       2.5 x 10^-96
JUND      3.7 x 10^-88
NFYB      3.3 x 10^-71
ETS1      3.8 x 10^-63
FAM48A    2.1 x 10^-61
FOXA1     1.4 x 10^-33
GATA1     4.6 x 10^-33
HEY1      7.8 x 10^-32
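A minimal sketch of a binomial overlap test, assuming the null is that each trans target gene is a ChIP-seq target of the TF with probability equal to the background fraction of genes bound by that TF. The function name and the counts below are invented for illustration and are not taken from the table above.

```python
import math


def binomial_enrichment_p(k, n, background):
    """One-sided P(X >= k) for X ~ Binomial(n, background)."""
    return sum(
        math.comb(n, i) * background**i * (1 - background) ** (n - i)
        for i in range(k, n + 1)
    )


# Made-up counts: 50 trans target genes, 20 of them bound by the TF,
# and 10% of all expressed genes bound by the TF in the background.
n_targets = 50
n_overlap = 20
background = 0.10

p_enrich = binomial_enrichment_p(n_overlap, n_targets, background)
print(f"enrichment p = {p_enrich:.3g}")
```

A one-sided test is used here because the question is whether the regulon contains more TF targets than expected, not fewer.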
Summary
• Regulatory variation is common
• It affects gene expression levels
• Likely many other types:
– DNA accessibility, chromatin states
– Transcript splicing, processing, turnover
• Has phenotypic consequences
– GWAS
– Some cellular assays (not discussed here)
Open questions
• Discover regulatory elements (cis)
– Promoters, enhancers etc
• Gene regulatory circuits (trans)
• Dynamics of regulation
– Splicing variation, processing, degradation
• Phenotypic consequences
– Cellular assays required
• Tie in to organismal phenotype
RNAseq, GTEx
NEXT-GEN SEQUENCING DATA
GTEx – Genotype-Tissue EXpression
An NIH common fund project
Current: 35 tissues from 50 donors
Scale up: 20K tissues from 900 donors.
Novel methods groups: 5 current + RFA
How can we make RNAseq useful?
• Standard eQTLs
– Montgomery et al, Pickrell et al Nature 2010
• Isoform eQTLs
– Depth of sequence!
• Long genes are preferentially sequenced
• Abundant genes/isoforms ditto
• Power!?
• Mapping biases due to SNPs
RNAseq combined with other techs
• Regulons: TF gene sets via ChIP-seq
– Look for trans effects
• Open chromatin states (DNase I; methylation)
– Find active genes
– Changes in epigenetic marks correlated to RNA
– Genetic effects
• RNA/DNA comparisons
– Simultaneous SNP detection/genotyping
– RNA editing ???