Causes of regulatory variation in the
human genome
Manolis Dermitzakis
The Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Cambridge, UK
Human Genome:
~25,000 genes
1-1.5% of the human DNA is coding
Is the remaining 98.5% “junk”?
Gene expression as a phenotype
• Altered patterns of gene expression → disease.
– e.g., Type 1 diabetes, Burkitt’s lymphomas.
• Widespread intraspecific variation.
• Heritable genetic variation for transcript levels.
– Familial aggregation of expression profiles (Cheung et al. 2003).
– In humans, ~30% of surveyed loci exhibited a genetic component for expression differences (Monks et al. 2004; Schadt et al. 2003).
• Much of the influential variation is located cis- to the coding locus.
– In humans, mouse, and maize, 35%-50% of the genetic basis for intraspecific differences in transcription level is cis- to the coding locus (e.g. Morley et al. 2004; Schadt et al. 2003; Stranger et al. 2005; Cheung et al. 2005, etc.).
Stranger and Dermitzakis 2006
Why study gene expression
• Describe and dissect regulatory variation
• Annotate regulatory elements in the
human genome
• Support disease studies to interpret
statistical signals
• Distribution of molecular effects in the
genome
• Natural selection
Outline
• Gene expression variation – recent studies
• Analysis of gene expression with HapMap phase
II SNPs
• Update on CNV-expression associations
• Natural selection and cis regulatory effects
Nature of regulatory variation
[Figure: schematic of a regulatory element (REG) and a GENE on DNA, with expression shown at the levels of i) Pre-mRNA, ii) mRNA, iii) Protein and iv) DNA.]
Stranger and Dermitzakis, Human Genomics 2005
Effects of Copy Number Variation on
gene expression
[Figure: REG/GENE schematics for four scenarios: Additional gene copy; Increase of distance from regulatory element; New regulatory element; Gene interruption.]
Gene expression association mapping
[Figure: frequency histogram of expression levels for individuals with genotypes AA, AG and GG.]
Stranger et al. PLoS Genet 2005
Whole-genome gene expression
illumina Human 6 x 2 gene GEX arrays
~48,000 transcripts
24,000 RefSeq
24,000 other transcripts
270 HapMap individuals:
CEU: 30 trios, 90 total
CHB: 45 unrelated
JPT: 45 unrelated
YRI: 30 trios, 90 total
Cell line → RNA
2 IVTs each person (IVT1, IVT2)
2 replicate hybridizations each IVT (rep1, rep2 from IVT1; rep3, rep4 from IVT2)
Quantile normalization of all replicates of
each individual.
Median normalization across all
individuals of a population.
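The two normalization steps above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the study: it assumes log-scale intensities, equal-length probe lists, and an additive median shift, and the function names are hypothetical.

```python
from statistics import median

def quantile_normalize(replicates):
    """Quantile-normalize equal-length lists of intensities (one list per replicate)."""
    n_probes = len(replicates[0])
    # Sort each replicate, then average across replicates at every rank.
    sorted_reps = [sorted(rep) for rep in replicates]
    rank_means = [sum(col) / len(col) for col in zip(*sorted_reps)]
    normalized = []
    for rep in replicates:
        # Probe indices ordered by intensity (ties broken by position).
        order = sorted(range(n_probes), key=lambda i: rep[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized

def median_normalize(individuals):
    """Shift each individual's values so that all individuals share one median.

    Assumption: values are on a log scale, so the adjustment is additive.
    """
    medians = [median(ind) for ind in individuals]
    target = median(medians)
    return [[v - m + target for v in ind] for ind, m in zip(individuals, medians)]
```

In this sketch, quantile_normalize would be applied to the four replicates of one individual, and median_normalize to the resulting per-individual profiles within one population.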
HapMap SNPs
Individuals: 60 CEU, 45 CHB, 44 JPT, 60 YRI
14,072 genes
Phase I HapMap; MAF > 0.05 (~1 SNP/5kb)
CEU: 762,447 SNPs
CHB: 695,601 SNPs
JPT: 689,295 SNPs
YRI: 799,242 SNPs
Copy Number Variation dataset
• Genome Structural Variation Consortium
– Redon et al. Nature Nov 22, 2006
• Array-CGH using a whole genome tile path array
– Median clone size ~170 kb
– All 270 HapMap individuals
• Quantitative values (log2 ratios) representing diploid genome copy number, not genotypes.
• 1117 CNVs called from log2 ratios
– Calls based on standard deviation of log2 ratios
– Many CNVs experimentally verified
26,563 clones
93.7% euchromatic genome
Linear regression for SNPs, CNVs and expression
[Figure: expression level regressed on SNP genotype (CC, CT, TT coded as 0, 1, 2) and on CNV clone signal (log2 ratio).]
- slope of line
- p-value
- r2
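As a rough sketch of the regression described here, the snippet below fits expression on a predictor that is either a genotype coded 0/1/2 or a clone log2 ratio. The data are made up and the function name is hypothetical. It returns the slope, r2 and the t statistic; the p-value would come from a t distribution with n - 2 degrees of freedom, which is left out to keep the example within the standard library. It assumes neither variable is constant.

```python
import math

def regress(x, y):
    """Least-squares fit of y on x; returns (slope, r2, t_statistic)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy)
    # t statistic for H0: slope == 0, with n - 2 degrees of freedom.
    if r2 >= 1.0:
        t = math.copysign(math.inf, slope)
    else:
        t = math.copysign(math.sqrt(r2 * (n - 2) / (1.0 - r2)), slope)
    return slope, r2, t

# Genotypes CC/CT/TT coded as 0/1/2 copies of the T allele (illustrative data only).
genotypes = [0, 0, 1, 1, 2, 2]
expression = [8.0, 8.2, 8.5, 8.7, 9.0, 9.2]
slope, r2, t = regress(genotypes, expression)
```

For the CNV analysis the same function would take the clone log2 ratios as x in place of the genotype codes.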
SNP cis-analysis:
SNPs within 1Mb of probe midpoint (2Mb window centred on the probe)
CNV cis-analysis:
clone midpoint within 2Mb of probe midpoint (4Mb window centred on the probe)
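The two cis windows can be expressed as a simple distance check. This is a sketch under the assumptions that positions are base-pair coordinates on the same chromosome and that the helper names are hypothetical.

```python
SNP_HALF_WINDOW = 1_000_000   # SNPs: 1Mb either side of the probe midpoint (2Mb window)
CNV_HALF_WINDOW = 2_000_000   # CNV clones: 2Mb either side (4Mb window)

def midpoint(start, end):
    """Midpoint of a probe or clone given its start and end coordinates."""
    return (start + end) / 2

def in_cis_window(variant_pos, probe_start, probe_end, half_window):
    """True if variant_pos lies within half_window bp of the probe midpoint."""
    return abs(variant_pos - midpoint(probe_start, probe_end)) <= half_window
```

For a SNP, variant_pos is the SNP coordinate; for a CNV clone it is midpoint(clone_start, clone_end).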
Permutation
[Figure: GENOTYPES matrix (g11 … gin) alongside GENE EXPRESSION values (Exp1 … Expi); the expression values are permuted relative to the genotypes.]
- 10,000 permutations – each time keep lowest p-value
- Null distribution of 10,000 extreme p-values
- Compare observed p-values to the tails of the null
Doerge and Churchill 1996
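The permutation scheme above can be sketched as follows for a single gene. Assumptions: no missing genotypes, so that for a fixed sample size the lowest regression p-value corresponds to the highest r2 and the latter can stand in for it; and an empirical p-value is reported instead of an explicit threshold. Function names are hypothetical.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation between x and y (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)

def best_r2(snps, expression):
    """Strongest association over all SNPs tested against one gene."""
    return max(r_squared(g, expression) for g in snps)

def permutation_pvalue(snps, expression, n_perm=10_000, seed=0):
    """Empirical p-value of the best observed association for one gene."""
    rng = random.Random(seed)
    observed = best_r2(snps, expression)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        # Shuffle expression across individuals; genotypes stay fixed,
        # which preserves the correlation structure among SNPs.
        rng.shuffle(shuffled)
        if best_r2(snps, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Keeping only the most extreme statistic per permutation is what corrects for the number of SNPs tested against each gene.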
CNV vs. SNP associations
Stranger et al. Science 2007
CNVs and SNPs mostly capture
different effects
• Relative impact on gene expression:
82% SNPs
18% CNVs
• Only 13% of genes with CNV association also had a SNP association
in the same population
– biased toward large effect size.
– CNV and SNP variation are highly correlated (p-value 0.001).
Custom vs. Genome-wide
[Stranger et al. 2005 PLoS Genet and
Stranger et al. 2007 Science]
• 2 batches of 60 CEU individuals
– grown independently at two different labs
– RNA extraction and labelling by different labs
and people
– Run on custom and genome-wide (gw) Illumina arrays
– 97% of associations at the 0.05 permutation threshold from the custom array analysis were also detected in the gw analysis
HapMap phase II analysis
• ~ 4 million SNP genotypes made publicly
available for the 270 HapMap individuals.
• Density: 1 SNP / 700 bp
• Includes ~50% of expected common SNPs in
these populations.
• 2.2 million SNPs analyzed (MAF>0.05)
Phase I vs. Phase II
cis- significant genes (0.001)
Population   phase I HapMap   both / phase I   both   both / phase II   phase II HapMap
CEU          286              90%              258    86%               299
CHB          317              85%              269    85%               318
JPT          337              87%              297    87%               341
YRI          356              87%              310    79%               394
Phase I vs. Phase II
[Figure: -log10(pvalue) against chromosome coordinate (64,500,000 to 66,300,000), one panel for phase I and one for phase II.]
Population sharing of cis- associations
Populations        Number of genes
CEU-CHB-JPT-YRI    66
CEU-CHB-JPT        38
CEU-CHB-YRI        13
CEU-JPT-YRI        9
CHB-JPT-YRI        30
CEU-CHB            20
CEU-JPT            14
CEU-YRI            36
CHB-JPT            45
CHB-YRI            21
JPT-YRI            28
CEU only           111
CHB only           94
JPT only           121
YRI only           205
SUM (Non-redundant genes)   851
gene associations in at least 2 populations: 320 (38% of total)
gene associations in single populations: 531 (62% of total)
Associated SNP position relative to TSS
[Figure: -log10(pvalue) against SNP position relative to TSS (-1,000,000 to 1,000,000), one panel each for CEU, CHB, JPT and YRI.]
Distribution of regulatory elements around the TSS
ENCODE Nature 2007
Direction of allelic effect
same SNP-gene combination across populations
[Figure: THAP5 log2 expression by rs40915 genotype (CC, CT, TT) in Population 1 and Population 2, with one pair of panels labelled AGREEMENT and one labelled OPPOSITE.]
Direction of allelic effect
[Figure: regression gradients compared between the two populations of each pair, one panel per pair: CEU-CHB, CEU-JPT, CEU-YRI, CHB-JPT, CHB-YRI, JPT-YRI.]
Pooling populations
Spurious associations
[Figure: frequency histogram of log2exp for Pop1 and Pop2.]
Conditional permutations
Permute data within each pop separately then perform test
[Figure: GENOTYPES matrix (g11 … gin) alongside GENE EXPRESSION values (Exp1 … Expi), permuted within a population; repeated for each of the 4 populations (X4).]
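A minimal sketch of the conditional permutation: expression values are shuffled only among individuals of the same population, so differences in mean expression between populations are kept in the null distribution rather than showing up as spurious associations. The function name and the population-label list are hypothetical.

```python
import random

def permute_within_populations(expression, populations, rng):
    """Shuffle expression values among individuals of the same population only.

    expression  -- one value per individual
    populations -- population label per individual, in the same order
    rng         -- a random.Random instance
    """
    permuted = list(expression)
    by_pop = {}
    for idx, pop in enumerate(populations):
        by_pop.setdefault(pop, []).append(idx)
    for indices in by_pop.values():
        values = [expression[i] for i in indices]
        rng.shuffle(values)
        for i, v in zip(indices, values):
            permuted[i] = v
    return permuted
```

The permuted vector would then be tested against the pooled genotypes exactly as in the single-population permutation.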
Multi-population analysis
overlap                #genes "multipop"   #genes "by_pop"
overlap with 0 pops    447                 NA
overlap with 1 pop     226                 530
overlap with 2 pops    120                 164
overlap with 3 pops    74                  90
overlap with 4 pops    62                  66
Total                  929                 850
Proportion of single population cis- associated genes detected in multi-population analysis
[Figure 2A: proportion detected (0.3 to 1.0) against the number of populations sharing the association in cis- single population analyses (1 to 4), with series for the 2, 3 and 4 multipop analyses.]
SGPP2
[Figure: -log10(pvalue) against chromosome position (222,500,000 to 224,000,000) for the 4-population multipop analysis and for CEU, CHB, JPT and YRI separately, alongside frequency histograms of Adjusted_R^2; panel sizes shown are N = 80, 31, 29, 34 and 39.]
Trans- phase II HapMap association
– Biological hypotheses: functional categories
• Regulatory SNPs identified from cis- analysis (52%)
• Non-synonymous SNPs (39%)
• Splice site SNPs (7%)
• miRNA SNPs (1%)
[Figure: rSNPs, spliceSNPs, nsSNPs and miRNA SNPs placed relative to REG and GENE on DNA.]
Genome-wide associations
~ 25,000 SNPs per population x 14,072 genes
Trans- associations
Significant genes at the 10^-3 threshold (14,072 genes tested)
      (1) linear    (2) CEU-CHB-JPT-YRI   (3) CEU-CHB-JPT   (4) CHB-JPT   overlap   overlap   overlap
      regression    multipop              multipop          multipop      1&2       1&3       1&4
CEU   44            44                    52                39            9         12        12
CHB   37            44                    52                39            10        14        15
JPT   38            44                    52                39            10        14        16
YRI   23            44                    52                39            7         7         7
Linear regression, non-redundant: 108 genes; 4 pops: 5; >= 2 pops: 16
correction at 0.001: 15 genes estimated false positives, FDR = 33%-39%
correction at 0.01: 150 genes estimated false positives, FDR = 60%-75%
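The false-positive estimates follow from simple arithmetic: the number of genes tested times the significance threshold gives the expected number of false positives, and dividing by the number of significant genes gives an approximate FDR. The sketch below uses hypothetical function names. The slide rounds 14,072 x 0.001 up to 15 and does not say which gene counts produce its 33%-39% range, so the denominator of 44 used in the test is only an illustration.

```python
def expected_false_positives(n_genes_tested, threshold):
    """Expected number of genes passing the threshold by chance alone."""
    return n_genes_tested * threshold

def estimated_fdr(n_false_positives, n_significant):
    """Approximate false discovery rate: expected false positives / observed hits."""
    return n_false_positives / n_significant
```

With 14,072 genes, a 0.001 threshold gives about 14 expected false positives and a 0.01 threshold about 141, close to the 15 and 150 quoted above.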
Enrichment of regulatory SNPs and
deficit of nsSNPs in trans- associations
       regulatory SNPs (cis 0.001)   ns SNPs               splice SNPs         miRNA SNPs
       ratio    p-value              ratio    p-value      ratio    p-value    ratio    p-value
CEU    6.05     3.23E-24             0.15     1.22E-21     0.49     0.07       0        1
CHB    3.69     7.90E-10             0.24     1.91E-09     0.76     0.71       0        1
JPT    3.15     2.06E-07             0.31     8.82E-07     0.71     0.55       0        1
3-6x more likely that a cis regulatory effect
explains a trans regulatory effect
Multi-pop CNV analysis
• Combined 4 populations: 193 genes at
0.001 (48 overlap with the 99 from single
population analysis)
• Combined 3 populations: 173 genes at
0.001 (42 overlap with the 99 from single
population analysis)
CNV trans effects
[Figure: REG/GENE schematics of possible trans effects of CNVs, with panels labelled: Additional gene copy; Variable expression; Biological pathway; Trans-position; Increase of distance from regulatory element; New regulatory element; Gene interruption.]
Trans effects - CEU
[Figure: CNV_chromosome against GENE_chromosome for trans associations in CEU.]
Trans effects - YRI
[Figure: CNV_chromosome against GENE_chromosome for trans associations in YRI.]
Gene expression and natural selection
[Figure: -log10(pvalue) against SNP position relative to TSS (-100,000 to 100,000), one panel each for CEU, CHB, JPT and YRI.]
With Sridhar Kudaravalli and Jonathan Pritchard (unpublished)
Co-segregating regulatory variants can
drive differential isoform expression
[Figure: gene X with regulatory variants (C / T; one allele drives high expression, the other drives low expression) co-segregating with coding variants (G / C), giving rise to different protein isoforms.]
SUMMARY
• Cis- and trans-acting genetic variation influences mRNA levels.
• CNV effects detected are largely not captured by SNPs.
• Structural variation (copy number polymorphism) influences transcript level variation.
• Many detected associations are shared across human populations – replication of effects.
• Signal is concentrated symmetrically within 100 kb of the promoter.
• Trans-acting effects of CNVs: interpretation.
• Primary effects of trans associations are largely cis regulatory effects.
• Cis regulatory effects under positive selection.
Acknowledgements
Cambridge University
Barbara Stranger
Alexandra Nica
Antigone Dimas
Christine Bird
Matthew Forrest
Catherine Ingle
Claude Beazley
Panos Deloukas
Matt Hurles
Mark Dunning
Natalie Thorne
Simon Tavaré
Stanford
Daphne Koller
illumina
Jill Orwick
Mark Gibbs
Genome Structural Variation Consortium
Richard Redon, Nigel Carter, Charles Lee, Chris Tyler-Smith, Stephen Scherer
Wellcome Trust for funding
The HapMap Consortium