Download SI - Regulatory circuits

Document related concepts
no text concepts found
Transcript
— Supplementary Figures —
Tissue-specific regulatory circuits reveal variable
modular perturbations across complex diseases
Daniel Marbach1,2,∗ , David Lamparter1,2 , Gerald Quon3,4 , Manolis Kellis3,4 , Zoltán Kutalik1,2,5 ,
Sven Bergmann1,2
1
2
3
4
5
Department of Medical Genetics, University of Lausanne, Switzerland
Swiss Institute of Bioinformatics, Lausanne, Switzerland
Broad Institute, MIT, Cambridge, MA, USA
Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
Institute of Social and Preventive Medicine, University Hospital of Lausanne, Switzerland
* Correspondence should be addressed to D.M. ([email protected])
Website: http://regulatorycircuits.org
1
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figures
1
Linking TFs, promoters, enhancers, and genes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4
2
Enhancer–promoter activity correlations for chromosome 22 . . . . . . . . . . . . . . . . . . . . . . . . .
5
3
Q-Q plot for enhancer–promoter correlation coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . .
6
4
Dependence of networks on individual samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7
5
Dependence of networks on individual samples for primary cells, tissues, and cell lines . . . . . . . . . .
8
6
In-degree distributions of regulatory circuits at each layer . . . . . . . . . . . . . . . . . . . . . . . . . .
9
7
Out-degree distributions of regulatory circuits at each layer . . . . . . . . . . . . . . . . . . . . . . . . .
10
8
TF–promoter out-degree distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11
9
ChIP-seq validation of TF–enhancer and TF–promoter edges . . . . . . . . . . . . . . . . . . . . . . . .
12
10
ChIP-seq validation results for each of 5 cell lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
13
11
eQTL validation of enhancer–gene edges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
14
12
ChIP-seq and eQTL validation: AUPR vs. F-score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
15
13
Correlation between regulatory edges and target gene expression . . . . . . . . . . . . . . . . . . . . . .
16
14
Hierarchical clustering of regulatory networks across cell types and tissues . . . . . . . . . . . . . . . . .
17
15
Pairwise similarity of cell type and tissue-specific regulatory networks . . . . . . . . . . . . . . . . . . .
18
16
Dendrogram and annotation of nervous system clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . .
19
17
Dendrogram of mesenchyme clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
20
18
Annotation of mesenchyme clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
21
19
Dendrogram and annotation of immune system clusters . . . . . . . . . . . . . . . . . . . . . . . . . . .
22
20
Dendrogram of trunk organs clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
23
21
Annotation of trunk organs clusters
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
24
22
Dendrogram and annotation of neuron-associated clusters . . . . . . . . . . . . . . . . . . . . . . . . . .
25
23
Dendrogram and annotation of epithelium clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
26
24
Developmental origin of regulatory networks pertaining to different high-level clusters . . . . . . . . . .
27
25
Comparing number of TFs per gene across lineages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
28
26
Comparing network properties across lineages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
29
27
Overview of network connectivity enrichment analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . .
31
28
Illustrative example for network connectivity enrichment analysis . . . . . . . . . . . . . . . . . . . . . .
32
29
Correlation between neighboring genes due to linkage disequilibrium . . . . . . . . . . . . . . . . . . . .
33
30
Validation of connectivity enrichment scores using phenotype-label permutations . . . . . . . . . . . . .
34
31
Connectivity enrichment scores for different types of networks . . . . . . . . . . . . . . . . . . . . . . . .
35
32
Overview of network connectivity enrichment results for all traits . . . . . . . . . . . . . . . . . . . . . .
37
33
Connectivity enrichment for the psychiatric, cross-disorder study . . . . . . . . . . . . . . . . . . . . . .
38
34
Connectivity enrichment for schizophrenia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
39
35
Connectivity enrichment for anorexia nervosa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
40
36
Connectivity enrichment for bipolar disorder
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
41
37
Connectivity enrichment for inflammatory bowel disease . . . . . . . . . . . . . . . . . . . . . . . . . . .
42
38
Connectivity enrichment for ulcerative colitis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
43
39
Connectivity enrichment for Crohn’s disease . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
44
40
Connectivity enrichment for rheumatoid arthritis, multiple sclerosis, and hepatitis C resolvers . . . . . .
45
41
Connectivity enrichment for Alzheimer’s disease . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
46
42
Connectivity enrichment for narcolepsy with and without HLA region . . . . . . . . . . . . . . . . . . .
47
43
Connectivity enrichment for blood lipids (HDL and LDL) . . . . . . . . . . . . . . . . . . . . . . . . . .
48
2
Nature Methods: doi:10.1038/nmeth.3799
44
45
46
Connectivity enrichment for blood lipids (TC and TG) . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Connectivity enrichment for body mass index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Connectivity enrichment for β-cell function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
49
50
52
47
Connectivity enrichment for advanced macular degeneration (neovascular) . . . . . . . . . . . . . . . . .
53
3
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 1
b
Promoter
TF motif
Distal enhancer
TF motif – promoter distance
TF motif
>10-fold enrichment
(motif confidence 0.9)
Fold enrichment
TF motif – enhancer distance
50
Motif
confidence
1.0
0.7
20
Motifs within enhancers:
>10-fold enrichment
10.0
Fold enrichment
a
0.1
Motif
confidence
7.5
1.0
0.7
5.0
0.1
2.5
0
0
500
1000
0
c Promoter
d
cis-eQTL
Gene
>10-fold
enrichment
1000
Empirical distribution
Smooth fit (weight
function)
1.0
0.75
150
100
0.5
Weight
200
Density
Fold enrichment
750
cis-eQTL – gene distance
500
250
500
Target gene
Promoter – gene distance
250
Distance from enhancer (bp)
Distance from promoter (bp)
0.25
500kb
50
0.0
0
0
1kb
0
500
1000
10kb
100kb
1mb
Distance from gene (log scale)
Distance from TSS (bp)
Supplementary Figure 1: Linking TFs, promoters, enhancers, and genes
(a) Enrichment of TF motif instances near CAGE-defined promoters. High-confidence motifs are >10-fold enriched in
a window of -400–50bp around promoters (negative: upstream; positive: downstream).
(b) Enrichment of TF motif instances near distal, CAGE-defined enhancers. High-confidence motifs are >10-fold
enriched inside (distance = 0bp) enhancers. In contrast to promoters, motif enrichment only occurs inside — not
around — the CAGE-defined enhancers, indicating that the signature of bidirectional transcription used for their
detection comprises the entire enhancer region.
(c) Enrichment of CAGE-defined promoters near the annotated transcription start site (TSS) of genes. Promoters are
>10-fold enriched in a window of -250–500bp around the TSS of genes.
(a-c) Regions where >10-fold enrichment was found (grey areas) are used to link TF motifs to promoters, TF motifs
to enhancers, and promoters to genes.
(d) Distance distribution of cis-eQTLs from their target genes (note: logarithmic scale on x-axis, orientation
[up/downstream] is not distinguished). The smooth fit of the empirical distribution is used as a weighting function to
probabilistically link enhancers to genes (second y-axis).
See also Methods for a more detailed discussion of these analyses.
4
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 2
Enhancer-promoter
correlation
0.5
0.0
NA (Enhancer-enhancer and
promoter-promoter pairs)
Enhancers
Promoters
Chromosome 22
Supplementary Figure 2: Enhancer–promoter activity correlations for chromosome 22
In order to investigate whether enhancers globally (i.e., across different cell types and tissues) associate with specific
target promoters, we computed Pearson’s correlation coefficient for all enhancer–promoter pairs on a given chromosome.
The correlation matrix was visualized as a heatmap, where promoters and enhancers were ordered according to their
position on the chromosome. Here, chromosome 22 is shown as an illustrative example, the same observations were
made for other chromosomes. We found that most enhancers do not globally correlate more strongly with specific,
nearby promoters (blue cells are scattered through the matrix). Instead, most enhancers positively correlate with
many promoters throughout the chromosome, many of which are too far away to be likely regulatory targets of the
enhancer. The reason is that many promoters are themselves co-expressed. Moreover, cell type-specific interactions
are often missed when using correlation or correlation-like statistics1 (this conclusion is also supported by a genomewide analysis in 3). It is possible that more sophisticated approaches (e.g., based on an initial clustering of the
input data,2, 3 statistics designed to capture cell type-specific interactions,1 or machine learning approaches integrating
additional data4 ) would reveal significant relationships. However, the performance of these different methods for the
data at hand is not yet well understood and we thus opted for a parsimonious approach to link enhancers to potential
target promoters (see Methods).
5
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 3
Observed quantiles (–log10)
Enhancer–promoter correlation (Q–Q plot)
Expected quantiles (–log10)
Supplementary Figure 3: Q-Q plot for enhancer–promoter correlation coefficients
We sought to test whether enhancers show increased correlation with their target promoters. To this end, we computed
Pearson’s correlation coefficient for all enhancer–promoter pairs. We divided the correlation coefficients in two sets:
one for enhancer–promoter pairs that are on the same chromosome (intra-chromosomal pairs), and one for pairs where
the enhancer and promoter are on different chromosomes (inter-chromosomal pairs). Since the target promoters of
an enhancer are on the same chromosome, the correlation coefficients corresponding to true enhancer–target promoter
interactions are all in the intra-chromosomal set, while the inter-chromosomal correlation coefficients correspond to the
expected null distribution for non-interacting enhancer–promoter pairs. Thus, if enhancers showed increased correlation
with their target promoters, the first distribution would be shifted towards greater positive correlation. The Q-Q plot
shows that this is not the case: the two distributions are identical. We conclude that enhancers do not show increased
correlation with specific target promoters at a global level, i.e., across cell types and tissues (note that the FANTOM5
data consists of samples from different cell types and tissues — naturally, the situation would be different for a dataset
consisting of many samples from the same cell type or tissue). See Supplementary Fig. 2 for a more detailed
discussion of these observations. (Note, for computational reasons the Q-Q plot only includes enhancer–promoter pairs
for chromosomes 13–22; results are identical for the remaining chromosomes).
6
Nature Methods: doi:10.1038/nmeth.3799
a
Number of networks
Supplementary Figure 4
150
100
50
0
1
Stable edges
(leave one sample out)
b
2
3
4
5
6
7
8
9
15
33
Number of samples per network
100%
95%
90%
85%
80%
75%
1
2
3
4
5
6
7
8
9
15
33
Number of samples per network
Supplementary Figure 4: Dependence of networks on individual samples
The compendium of 394 cell type and tissue-specific regulatory networks is based on 808 cell and tissue samples from
FANTOM5: first a network was constructed for each of the 808 samples, and then networks of biological replicates and
other closely related samples were merged (Supplementary Table 1, Methods).
(a) The histogram shows the number of samples merged for each of the 394 final networks. Most networks consist of
few samples: 361 networks (92%) merge only one, two or three samples. 20 networks consist of 4 samples and only 13
networks have more than 4 samples.
(b) Dependence of networks on individual samples. For each of the networks consisting of more than one sample, we
assessed the dependence on each of its samples. To this end, we removed one sample at a time and evaluated the
percentage of edges that remained in the network (referred to as stable edges). Boxplots show the percentage of stable
network edges when removing each of the samples (Supplementary Table 1 includes the result for each sample). These
results confirm that the merged samples of a network are generally very similar and redundant to each other: e.g., for
networks consisting of three samples, on average 94% of edges remain the same when leaving one of the samples out.
7
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 5
Number of networks
Primary cells
Tissues
Cell lines
75
50
25
0
Stable edges
(leave one sample out)
100%
95%
90%
85%
80%
75%
1
2
3
4
5
6
7
8
9
33
Number of samples per network
1
2
3
4
5
6
7
8
9
Number of samples per network
1 2
3
4
5
6
7
8
9
15
Number of samples per network
Supplementary Figure 5: Dependence of networks on individual samples for primary cells, tissues, and
cell lines
Top row: histograms showing the number of samples merged per network; bottom row: dependence of networks on
individual samples. For explanation, see legend of Supplementary Fig. 4. Here, the same results are shown broken
down by primary cells, tissues, and cell lines. Stability of edges is slightly lower for cell lines than for primary cells and
tissues. This is because merged primary cell or tissue samples are mostly biological replicates (same cell type or tissue
from different donors), while merged cell lines are merely of the same cancer subtype (tumor samples from different
patients are expected to have higher heterogeneity than healthy tissue samples).
8
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 6
Distribution
1.00
Cumulative distribution
3
Count (log10)
Median
Log scale
Cumulative distribution
#TFs per enhancer
2
1
0.75
0.50
0.25
0.00
0
1
50
100
150
50
1
#TFs per enhancer
100
150
#TFs per enhancer
#TFs per promoter
Cumulative distribution
1.00
Count (log10)
3
2
1
0
0.75
0.50
0.25
0.00
1
50
100
150
1
#TFs per promoter
50
100
150
#TFs per promoter
#Enhancers per gene isoform
Cumulative distribution
1.00
Count (log10)
3
2
1
0
1
10
20
30
0.75
0.50
0.25
0.00
40
1
#Enhancers per gene isoform
10
20
30
40
#Enhancers per gene isoform
#Promoters per gene isoform
Cumulative distribution
Count (log10)
1.00
3
2
1
0
0.75
0.50
0.25
0.00
1
5
10
15
20
#Promoters per gene isoform
1
5
10
15
20
#Promoters per gene isoform
Supplementary Figure 6: In-degree distributions of regulatory circuits at each layer
In-degree distributions (left side) show the number of nodes (y-axis) for each in-degree (number of incoming links,
x-axis). Distributions are plotted with a logarithmic scale on the y-axis, revealing approximately exponential decay
(linear relationship in the semi-log plots). Exponential in-degree distributions, which do not have extreme hubs as
in scale-free networks, have been previously observed for gene networks.5, 6 Presumably, this is because it would be
difficult to sensibly integrate a huge number of TF inputs at a single enhancer or promoter (a different phenomenon
are hot regions where huge numbers of TFs bind non-specifically, but likely without direct regulatory effect7 ). Extreme
hubs are also not expected for the other types of links (enhancer–gene and promoter–gene) because they are confined
to a certain region on a chromosome, i.e., the number of potential interaction partners is physically limited.
9
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 7
Distribution
Cumulative distribution
#Enhancers per TF
1.00
Cumulative distribution
2
Log scale
Count (log10)
Median
1
0
0.75
0.50
0.25
0.00
1
1
1000 2000 3000 4000
1000 2000 3000 4000
#Enhancers per TF
#Enhancers per TF
#Promoters per TF
1.00
Cumulative distribution
Count (log10)
2
1
Heavy tail
0
0
10000
0.75
0.50
0.25
0.00
20000
1
#Promoters per TF
10000
20000
#Promoters per TF
#Gene isoforms per enhancer
Cumulative distribution
1.00
Count (log10)
3
2
1
0.75
0.50
0.25
0.00
0
1
25
50
75
1
25
50
75
#Gene isoforms per enhancer
#Gene isoforms per enhancer
#Gene isoforms per promoter
Cumulative distribution
Count (log10)
1.00
3
2
1
0.75
0.50
0.25
0.00
0
1
5
10
15
20
25
#Gene isoforms per promoter
1
5
10
15
20
25
#Gene isoforms per promoter
Supplementary Figure 7: Out-degree distributions of regulatory circuits at each layer
Out-degree distributions (left side) show the number of nodes (y-axis) for each out-degree bin (number of outgoing
links, counts were obtained using equally spaced bins on x-axis). Similar to the in-degree distributions (previous figure),
the out-degree distributions show roughly exponential decay with one exception: the number of promoters per TF has
a heavy tail corresponding to extreme hubs that are characteristic of scale-free networks (cf. Supplementary Fig.
8). In other words, regulatory networks are scale-free only at the level of promoters, but not at the level of enhancers,
suggesting that the widely cited scale-free property of gene networks6 is due to the promoter-centric view of previous
studies. Concerning the enhancer–gene and promoter–gene links, we do not expect extreme hubs because of physical
constraints, as discussed in Supplementary Fig. 6.
10
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 8
TF–promoter outdegree distribution
2
Count (log10)
Log scale
Median
1
0
3
5
#Promoters per TF (log10)
Log scale
Supplementary Figure 8: TF–promoter out-degree distribution
Same plot as shown in the second row of Supplementary Fig. 7, but using a log scale for both axes, confirming
that the out-degree distribution of TFs with respect to promoters follows roughly a power-law in the right tail (linear
relationship in the log-log plot).
11
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 9
a
Inference methods
Expression correlation
TF motifs
Expression correlation & TF motifs
Enhancer expression & TF motifs*
Random
Reference
ChIP-seq replicates
TF – enhancer edges
1
2
3
4
i
ii
0
*Retained method
b
Expression correlation
TF motifs
Expression correlation & TF motifs
Promoter expression & TF motifs*
Random
ChIP-seq replicates
0.25
0.5
AUPR
0.75
1
TF – promoter edges
1
2
3
4
i
ii
0
0.25
0.5
AUPR
0.75
1
Supplementary Figure 9: ChIP-seq validation of TF–enhancer and TF–promoter edges
Assessment of different approaches to infer edges between TFs and regulatory elements (enhancers and promoters).
See legend of Fig. 2a for details. While Fig. 2a shows the overall performance across enhancers and promoters,
here we show results individually for (a) enhancers and (b) promoters. For both the retained method to construct
regulatory circuits (*) and the ChIP-seq replicates, there is no significant difference in performance between enhancers
and promoters (p > 0.05, two-sided Wilcoxon rank-sum test).
12
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 10
GM12878
48 samples
TF–enhancer edges
Inference methods
Expression correlation
1
1
TF motifs
2
2
Expression correlation & TF motifs
3
3
Enhancer expression & TF motifs*
4
4
Random
i
i
ChIP-seq replicates
ii
ii
Reference
0
K562
42 samples
*Retained method
HepG2
40 samples
0.5
AUPR
0.75
1
1
1
TF motifs
2
2
Expression correlation & TF motifs
3
3
Enhancer expression & TF motifs*
4
4
Random
i
i
ChIP-seq replicates
ii
ii
0.25
0.5
AUPR
0.75
1
Expression correlation
1
1
TF motifs
2
2
Expression correlation & TF motifs
3
3
Enhancer expression & TF motifs*
4
4
Random
i
i
ChIP-seq replicates
ii
ii
0
25 samples
0.25
Expression correlation
0
0.25
0.5
AUPR
0.75
1
Expression correlation
1
1
TF motifs
2
2
Expression correlation & TF motifs
3
3
Enhancer expression & TF motifs*
4
4
Random
i
i
ChIP-seq replicates
ii
ii
0
HUVEC
6 samples
TF–promoter edges
0.25
0.5
AUPR
0.75
1
Expression correlation
1
1
TF motifs
2
2
Expression correlation & TF motifs
3
3
Enhancer expression & TF motifs*
4
4
Random
i
i
ChIP-seq replicates
ii
ii
0
0.25
0.5
AUPR
0.75
1
0
0.25
0.5
AUPR
0.75
1
0
0.25
0.5
AUPR
0.75
1
0
0.25
0.5
AUPR
0.75
1
0
0.25
0.5
AUPR
0.75
1
0
0.25
0.5
AUPR
0.75
1
Supplementary Figure 10: ChIP-seq validation results for each of 5 cell lines
Assessment of different approaches to infer edges between TFs and regulatory elements (enhancers and promoters)
for each of 5 ENCODE cell lines. See legend of Fig. 2a for details. While Fig. 2a shows the overall performance
across enhancers, promoters and cell lines, here we show results individually for TF–enhancer and TF–promoter edges
(columns) and the different cell lines (rows). For each cell line, the number of samples (ChIP-seq experiments) is
indicated. Note that there are only 6 ChIP-seq experiemnts for HUVEC (human umbilical vein endothelial cells). The
retained method to construct regulatory circuits (*) outperforms alternative approaches in each of the cell lines, but
there is substantial variation in performance both for the retained method and ChIP-seq replicates across cell lines.
Notably, the retained method performs as well as the ChIP-seq replicates in K562. There is also substantial variation
in performance for different TFs in the same cell line, ranging from poor (AUPR comparable to random predictions)
to excellent prediction accuracy (AUPR comparable to ChIP-seq replicates), which is expected due to varying quality
of TF motifs and/or ChIP-seq antibodies. Comparing the performance for TF–enhancer vs. TF–promoter edges, we
observe the same trends as in Supplementary Fig. 9.
13
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 11
a
Matching tissues (9 samples)
Inference methods
Maximum expression correlation 1
Minimum genomic distance 2
Tissue-specific activity & distance* 3
Reference
Random i
*Retained method
b
0
0.2
AUPR
0.4
Closely related tissues (57 samples)
Maximum expression correlation 1
Minimum genomic distance 2
Tissue-specific activity & distance* 3
Random i
0
0.2
AUPR
0.4
Supplementary Figure 11: eQTL validation of enhancer–gene edges
Assessment of different approaches to link enhancers to target genes using eQTL data from GTEx. See legend of Fig.
2b for details. We mapped the 13 tissues from GTEx to corresponding tissue samples from FANTOM (Supplementary
Table 3). We distinguished between exact matches and closely related tissues. For example, one of the GTEx tissues is
the aorta. There is a matching tissue sample in FANTOM, but also several aortic cell types including aortic endothelial
cells, fibroblasts and smooth muscle cells, which we label as closely related tissues.
(a) Results for 9 matching GTEx–FANTOM tissue pairs (this panel is identical to Fig. 2b).
(b) Results for 57 closely related GTEx–FANTOM tissue pairs. The results for matching tissues are confirmed in the
broader set of closely related tissues. As expected, for our tissue-specific circuits (the retained method) performance
is slightly better for matching tissues than for closely related tissues. For the other methods (random, maximum
correlation, and minimum distance), there is no difference because these methods do not infer links in a tissue-specific
manner.
14
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 12
a
TF – enhancer/promoter edges
Inference methods
Expression correlation 1
1
TF motifs 2
2
Expression correlation & TF motifs 3
3
Enhancer expression & TF motifs* 4
4
Reference
Random i
i
ChIP-seq replicates ii
ii
0
*Retained method
0.25
0.5
b
Inference methods
Maximum expression correlation
Minimum genomic distance
Tissue-specific activity & distance*
Reference
Random
*Retained method
0.75
1
0
0.25
AUPR
0.5
0.75
1
F-score
Enhancer – gene edges
1
1
2
2
3
3
i
i
0
0.2
0.4
AUPR
0
0.1
0.2
0.3
F-score
Supplementary Figure 12: ChIP-seq and eQTL validation: AUPR vs. F-score
Comparison of validation results using two alternative metrics to assess the accuracy of edge predictions: the area
under the precision-recall curve (AUPR, left) and the F-score (right). The plots showing the AUPR are identical to
Fig. 2 of the main text, they are reproduced here for comparison with the F-score. Refer to the legend of Fig. 2 for
explanation. The choice of the AUPR to report validation results is motivated in Methods, but similar results were
obtained with other performance metrics. Here we show in addition the F-score of edge predictions: the F-score is a
popular metric used in the network inference literature, it is defined as the harmonic mean of precision and recall at
a given cutoff. The plots show the F-score for the top 50% of the predicted edges, similar results were obtained at
different cutoffs. We conclude that our results are robust and not specific to the AUPR as performance metric.
15
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 13
a
b
Background
Correlation between strength of TF
inputs and expression of genes
●
●
Density
Genes present in network
●
●
●
●
●
●
Correlation
Genes ranked by expression
Supplementary Figure 13: Correlation between regulatory edges and target gene expression
Gene expression data (RNA-seq) was obtained from the Roadmap Epigenomics project for 40 tissues that are also
present in our network compendium (Supplementary Table 4, Methods).
(a) Overlap of weakly, moderately, and highly expressed genes with regulatory networks across the 40 tissues. For each
tissue, genes were ranked by their expression level (RPKM). Boxplots show the percentage of genes that are part of
the regulatory network for the bottom 10%, mid 10%, and top 10% most highly expressed genes. On average, 98% of
the highly and 90% of the moderately expressed genes are present in our networks, i.e., they have regulatory inputs
(incoming edges) in the given tissue. As expected, most of the weakly expressed genes have no active regulatory inputs
in the given tissue, i.e., they are not present in the corresponding network.
(b) Correlation between regulatory edge strength and target gene expression. For each gene, the correlation between
its TF input (sum of weights of incoming edges) and expression was evaluated across tissues. The plot shows the
distribution of correlation coefficients for all genes in green. The background/expected distribution is shown in grey
(correlation coefficients for TF inputs of gene i and expression levels of gene j, where i = j). There is strong positive
correlation between the strength of TF inputs and the expression of the same gene. The plot shows results using
Spearman’s correlation coefficient (median = 0.53), similar results were obtained using Pearson correlation (median =
0.49). Correlation is stronger when considering only Roadmap Epigenomic tissues that were considered to be a good
match for a network in our compendium (results shown here) than when including also the secondary matches / related
tissues given in Supplementary Table 4 (median Spearman’s correlation = 0.42).
16
Nature Methods: doi:10.1038/nmeth.3799
3
Adult forebrain
4
Mesenchymal, mixed
5
Sarcoma
6
Endothelial cells
7
Mesenchymal stem & smooth muscle cells
8
Connective tissue & muscle cells
9
Connective tissue & integumental cells
10
11
12
Trunk organs
13
14
22
23
Myeloid leukemia
Endo-epithelial cells
Adenocarcinoma
Liver & kidney Male reproductive organs
Gastrointestinal system
Heart
Mouth, throat & skeletal muscle tissue
Lung
24
Glands & internal genitalia
15
16
17
18
19
20
21
Epithelium
c
Lymphocytes
Myeloid leukocytes
Lymphocytes of B lineage
Lymphoma
Immune organs
Pineal gland & eye
Neuron-associated cells & cancer
27 Astrocytes & pigment cells
28 Neuroectodermal tumors & sarcoma
25
26
30
31
Epithelial cells
Extraembryonic membrane
Epithelial cells of kidney & uterus
32
Lung epithelium & lung cancer
29
Cluster 10
Neurons & fetal brain
Nervous system & adult hindbrain
Cluster 11
1
2
Individual networks
CD4+ T cells
Natural killer cells
CD8+ T cells
!"ve regulatory T cells
!ry conventional T cells
!ry regulatory T cells
!"ve conventional T cells
CD14+ monocytes
Mast cell
Langerhans cells, immature
Langerhans cells, migratory
CD19+ B cells
Dendritic cells, plasmacytoid
CD14+CD16+ monocytes
Neutrophils
Whole blood ribopure
Peripheral blood mononuclear cells
Basophils
Lymphocytes
Myeloid leukocytes
Other
c
Cluster 24
b
Neuronassociated
b
32 clusters of networks
Mesenchyme
Immune system
~400 cell-type-specific networks
a
Nervous
system
Supplementary Figure 14
Individual networks
Thyroid, fetal
Thyroid, adult
Parotid gland, adult
Salivary gland, adult
Submaxillary gland, adult
Seminal vesicle, adult
Ductus deferens, adult
Uterus, fetal
Ovary, adult
Cervix, adult
Uterus, adult
Breast, adult
Dura mater, adult
Vein, adult
Adipose tissue, adult
Vagina, adult
Smooth muscle, adult
Prostate, adult
Bladder, adult
Glands
Male internal genitalia
Female internal genitalia
Supplementary Figure 14: Hierarchical clustering of regulatory networks across cell types and tissues
(a) Clustering of the 394 cell type and tissue-specific regulatory networks based on edge similarity (see also Supplementary Fig. 15). Clusters are highly consistent with developmental and functional relationships between cell types,
tissues and organs. The cutoff indicated by the red dashed line leads to 32 coherent clusters, annotated to the right
based on enriched terms from cell, anatomical and disease ontologies. For each of these clusters, a high-level network
was derived by merging the corresponding individual networks (Methods).
(b-c) Detailed view of representative clusters of networks (red boxes in Panel a; remaining clusters are shown in Supplementary Figs. 16–23).
(b) Cluster 10 consists of regulatory networks from lymphocytes including T cells and natural killer cells, while Cluster
11 mostly contains networks from myeloid leukocytes. Cf. Supplementary Fig. 19.
(c) Cluster 24 joins regulatory networks from glands and internal genitalia (seminal vesicle, ovary, breast, and prostate
belonging to both categories). Functionally related networks are consistently grouped also at the sub-cluster level (e.g.,
endocrine glands, salivary glands, and male and female genitalia, respectively). Cf. Supplementary Figs. 20–21
17
Nature Methods: doi:10.1038/nmeth.3799
Frequency
Supplementary Figure 15
0
394 regulatory networks
32 clusters
0.5
Edge similarity
1
(i) Nervous system
(ii) Mesenchyme
(iii) Immune system
(iv) Trunk organs
(v) Neuron-associated
(vi) Epithelium
Supplementary Figure 15: Pairwise similarity of cell type and tissue-specific regulatory networks
Hierarchical clustering of regulatory networks was performed based on edge similarity (Methods). The cutoff indicated
by the red, dashed line leads to 32 coherent clusters, which are annotated in Supplementary Fig. 14 and shown
in detail below (Supplementary Figs. 16–23). The six high-level groups (i–vi) separate regulatory networks of major
types of tissues and anatomical systems, which also share distinct developmental origins (Supplementary Fig. 24).
Regulatory networks of the nervous system form a particularly tight group and are the most dissimilar to networks of
other groups.
18
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 16
a
Regulatory networks
Neural stem cells
Neurons
brain, fetal
1. Neurons & fetal brain
parietal lobe, fetal
temporal lobe, fetal
occipital lobe, fetal
Assigned cluster names
optic nerve
spinal cord, fetal
cerebellum, adult
corpus callosum, adult
substantia nigra, adult
2. Nervous system & adult hindbrain
pons, adult
spinal cord, adult
medulla oblongata, adult
locus coeruleus, adult
caudate nucleus, adult
putamen, adult
diencephalon, adult
thalamus, adult
globus pallidus, adult
amygdala, adult
hippocampus, adult
parietal lobe, adult
medial frontal gyrus, adult
occipital cortex, adult
medial temporal gyrus
3. Adult forebrain
cerebral meninges, adult
brain, adult
medial temporal gyrus, adult
olfactory region, adult
occipital lobe, adult
nucleus accumbens, adult
paracentral gyrus, adult
insula, adult
frontal lobe, adult
occipital pole, adult
temporal lobe, adult
postcentral gyrus, adult
b
3. Adult forebrain
1. Neurons, fetal brain
Q-value
3.57E-02
9.66E-02
ID
UBERON:0000922
UBERON:0011216
Term
embryo
organ system subdivision
2. Nervous system, hindbrain
Q-value
1.95E-06
3.14E-06
3.59E-05
5.29E-05
1.07E-04
1.11E-04
1.11E-04
2.70E-04
1.03E-03
1.86E-03
1.86E-03
3.57E-03
3.57E-03
6.00E-03
1.81E-02
3.13E-02
3.13E-02
3.13E-02
3.13E-02
3.13E-02
3.13E-02
6.45E-02
7.64E-02
2.96E-03
ID
UBERON:0001016
UBERON:0004732
UBERON:0001017
UBERON:0010314
UBERON:0011216
UBERON:0002028
UBERON:0004733
UBERON:0004121
UBERON:0002298
UBERON:0000073
UBERON:0002616
UBERON:0001895
UBERON:0002680
UBERON:0000955
UBERON:0000477
UBERON:0000122
UBERON:0000988
UBERON:0001137
UBERON:0002240
UBERON:0005162
UBERON:0005174
UBERON:0000063
UBERON:0000481
UBERON:0007023
Term
nervous system
segmental subdivision of nervous system
central nervous system
structure with developmental contribution from neural crest
organ system subdivision
hindbrain
segmental subdivision of hindbrain
ectoderm-derived structure
brainstem
regional part of nervous system
regional part of brain
metencephalon
regional part of metencephalon
brain
anatomical cluster
neuron projection bundle
pons
dorsum
spinal cord
multi cell component structure
back organ
organ segment
multi-tissue structure
adult organism
Q-value
1.68E-20
1.68E-20
1.87E-19
9.13E-19
9.90E-19
9.90E-19
1.44E-18
2.79E-18
7.92E-18
1.24E-17
3.90E-17
3.90E-17
1.87E-14
6.79E-13
2.50E-12
2.50E-12
1.04E-11
1.52E-11
2.32E-11
5.60E-11
1.08E-07
5.08E-05
8.76E-05
3.58E-04
3.58E-04
3.58E-04
3.58E-04
3.58E-04
4.49E-03
4.49E-03
1.11E-02
1.11E-02
1.48E-02
1.48E-02
1.48E-02
1.48E-02
4.17E-02
3.06E-14
ID
UBERON:0001890
UBERON:0002780
UBERON:0001869
UBERON:0001017
UBERON:0000073
UBERON:0002616
UBERON:0001893
UBERON:0000955
UBERON:0001016
UBERON:0002791
UBERON:0002020
UBERON:0003528
UBERON:0011216
UBERON:0002619
UBERON:0000203
UBERON:0000956
UBERON:0010314
UBERON:0001950
UBERON:0004121
UBERON:0000481
UBERON:0000477
UBERON:0000064
UBERON:0000200
UBERON:0000454
UBERON:0002420
UBERON:0007245
UBERON:0010009
UBERON:0010011
UBERON:0001871
UBERON:0009663
UBERON:0000125
UBERON:0002308
UBERON:0000204
UBERON:0000349
UBERON:0000369
UBERON:0002435
UBERON:0002021
UBERON:0007023
Term
forebrain
regional part of forebrain
cerebral hemisphere
central nervous system
regional part of nervous system
regional part of brain
telencephalon
brain
nervous system
regional part of telencephalon
gray matter of neuraxis
brain grey matter
organ system subdivision
regional part of cerebral cortex
pallium
cerebral cortex
structure with developmental contribution from neural crest
neocortex
ectoderm-derived structure
multi-tissue structure
anatomical cluster
organ part
gyrus
cerebral subcortex
basal ganglion
nuclear complex of neuraxis
aggregate regional part of brain
collection of basal ganglia
temporal lobe
telencephalic nucleus
neural nucleus
nucleus of brain
ventral part of telencephalon
limbic system
corpus striatum
striatum
occipital lobe
adult organism
Supplementary Figure 16: Dendrogram and annotation of nervous system clusters
(a) Dendrogram showing networks and cluster names for (i) Nervous system (cf. Supplementary Fig. 15).
(b) Enriched annotations from cell, anatomical and disease ontologies for each cluster of networks (p-values were
corrected using Benjamini-Hochberg procedure; Methods). Horizontal lines separate groups of related annotations. We
named clusters by inspecting both the individual networks (a) and the enriched annotations (b, selected terms are
highlighted in yellow) with the aim to define succinct names that fit the majority of corresponding cells and tissues.
19
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 17
Regulatory networks
*/"r/"$
!r/"$
trr"$r""$$$
"/J"$r""$$$
'Y<$$$*<$"r!rry
Fibrob$"*<$"r!rry
h"$'$$Y"r€$$
'Y<$$$[=$"$V
4. Mesenchymal, mixed
="//o~/"$$$$
$""$$$
'r$$$
=""$$$"/"$$$
Y"h"$</$$$
~J"/"$$$
</\=/"$$$
$$$"/"$$$
$myob$""$$$
xtrasZ$"$m^Y/"/"$$$
"/"$rt^""/"$$$
al;$"/$$"/"$$$
$"/J$$$<J"/"$$$
=ry"/"$$$
J"$$"/"$$$
"/"$$$
$rYY"$$<$"/"/"$$$
$"/"$$$
J$"$$$
"/"$$$
5. Sarcoma
$Y$"$$$$
Jra<$"$$</$$$
"$$"r""/"$$$
th/"/"$$$
J"$$$
\=/<Y"$$$
\=/"/"$$$
mixm<$$r"</$$$
Y$"J$$<$"/"/"$$$
"J"/"$$$
m^\=/"/"$$$
""$""/"$$$
Yw""$$$
_"'<"$Y$"$$$
"$X$r<$"/Y$"$$$
Y$"$$$/o;"<$"/
Y$"$$$@Y"
6. Endothelial cells
Y$"$$$Yr"
Y$"$$$[=$"$;
Y$"$$$V
Y$"$$$!rry
Y$"$$$!r
h"$/<//$$o;ar""/""
h"$/<//$$o;ar""/rJYo;ary
h"$/<//$$o;ar""/$\o;ary
h"$/<//$$="//ow
h"$/<//$$"/"
h"$/<//$$"
$my"$$$
Y$"$$$
7. Mesenchymal stem & smooth muscle cells
'Y<$$$[=$"$!rry
Fibrob$""/"
'Y<$$$!r
'Y<$$$'<=$a;"!rry
'Y<$$$"/
'Y<$$$}r"$Yr"!rry
'Y<$$$/"r!rry
'Y<$$$r"YY"$
h"$$$$
Myoblast
Pr
SZ$"$m<$$$\f/"y<=m<$<$"
SZ$"$<$$$
SZ$"$<$'"$$$$
Tr"=<$"/YworZ$$
Fibrob$"|<;al
Kr"
h"$'$$="//ow
Fibrob$"@Y"
'Y<$$$Y"J"$
'Y<$$$/Y"$
8. Connective tissue & muscle cells
h"$'$$Y"
_"'$$"$$`$z
h"$'$$<=$"$
"/"y
Ir*JY$"$$$
'Y<$$$r"V"<$"/
Fibrob$"Y/*$^<
J"$$$
?b$"\f/"
'Y<$$$Tr"Y"$
Hair F$$$?</'Y"Y$$
!<$<*<$<$$
]<$<*<$<$$
!<=<"<
!=/"
!"$
Y//\\
'o;
Y/\\
r"$"$Y$"$$$$
r"$=ry"$""$h"$$$$
'Y<$$$$
'Y<$$$*/"
*/"'/"$$$
h"$'$$"=r"
<$/$[/r'"'$$
'Y<$$$[r
Fibrob$"ZwalZ/warb</J
9. Connective tissue & integumental cells
P"/"/"$$$
*/""$
*/"=/"
Fibrob$"Zrmal
Fibrob$"Z/Y"my"
Fibrob$"Z"$m<<$"/"/hy
Hair F$$$rmal P"$$"$$
Olf"rY$"$$$
Fibrob$"Pr"$@J"K"<$
Fibrob$"XJ;al
Fibrob$"Pr"$@J"Kf"$
?blast
Fibrob$"!r!;"$
*/"<=<"<
*/";ral
h"$'$$"
Fibrob$"rmal
Supplementary Figure 17: Dendrogram of mesenchyme clusters
Dendrogram showing networks and cluster names for (ii) Mesenchyme (cf. Supplementary Fig. 15). Annotations
are shown in Supplementary Fig. 18.
20
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 18
4. Mesenchymal, mixed
Q-value
8.15E-02
8.15E-02
8.15E-02
8.15E-02
4.40E-02
ID
UBERON:0000078
UBERON:0002012
UBERON:0008886
UBERON:0005406
DOID:3307
7. Mesenchymal stem & smooth muscle cells
Term
mixed ectoderm/mesoderm/endoderm-derived structure
pulmonary artery
pulmonary vascular system
perirenal fat
teratoma
5. Sarcoma
Q-value
5.03E-02
7.42E-02
6.91E-07
2.70E-06
2.05E-05
1.36E-03
5.06E-03
2.82E-02
2.99E-03
2.99E-03
ID
UBERON:0000949
UBERON:0002368
DOID:4
DOID:14566
DOID:162
DOID:0050687
DOID:7
DOID:1115
DOID:0060100
DOID:201
Term
endocrine system
endocrine gland
disease
disease of cellular proliferation
cancer
cell type cancer
disease of anatomical entity
sarcoma
musculoskeletal system cancer
connective tissue cancer
6. Endothelial cells
Q-value
8.56E-09
8.56E-09
8.56E-09
1.40E-07
5.90E-07
5.90E-07
7.99E-07
1.34E-06
3.14E-06
3.14E-06
4.70E-06
5.70E-06
1.15E-05
1.41E-05
1.61E-05
2.98E-05
2.50E-04
2.77E-04
1.03E-03
1.03E-03
1.03E-03
1.81E-02
1.97E-02
6.12E-13
3.38E-11
3.38E-11
3.38E-11
8.91E-11
8.06E-09
2.46E-04
5.12E-04
7.81E-04
2.76E-02
2.76E-02
ID
UBERON:0001986
UBERON:0004638
UBERON:0004852
UBERON:0000055
UBERON:0002049
UBERON:0007798
UBERON:0003914
UBERON:0000487
UBERON:0001981
UBERON:0004537
UBERON:0006914
UBERON:0004111
UBERON:0000025
UBERON:0004535
UBERON:0001009
UBERON:0000490
UBERON:0000483
UBERON:0000119
UBERON:0001917
UBERON:0003915
UBERON:0004700
UBERON:0000477
UBERON:0004120
CL:0000115
CL:0000213
CL:0000215
CL:0002078
CL:0002139
CL:0000071
CL:0000076
CL:0000066
CL:1000413
CL:0002543
CL:0002544
Term
endothelium
blood vessel endothelium
cardiovascular system endothelium
vessel
vasculature
vascular system
epithelial tube
simple squamous epithelium
blood vessel
blood vasculature
squamous epithelium
anatomical conduit
tube
cardiovascular system
circulatory system
unilaminar epithelium
epithelium
cell layer
endothelium of artery
endothelial tube
arterial system endothelium
anatomical cluster
mesoderm-derived structure
endothelial cell
lining cell
barrier cell
meso-epithelial cell
endothelial cell of vascular tree
blood vessel endothelial cell
squamous epithelial cell
epithelial cell
endothelial cell of artery
vein endothelial cell
aortic endothelial cell
Q-value
3.47E-03
2.52E-02
2.95E-05
2.95E-05
2.95E-05
6.36E-05
6.36E-05
3.58E-04
3.58E-04
4.52E-04
5.45E-04
9.32E-04
1.41E-03
2.75E-03
2.75E-03
8.15E-02
2.47E-03
3.39E-03
4.57E-03
6.05E-03
8.60E-07
5.53E-05
2.75E-04
3.82E-04
3.82E-04
5.12E-04
7.29E-02
1.44E-02
8.63E-03
ID
UBERON:0003914
UBERON:0000025
UBERON:0001637
UBERON:0003509
UBERON:0004572
UBERON:0004571
UBERON:0004573
UBERON:0001981
UBERON:0004537
UBERON:0004535
UBERON:0001009
UBERON:0000055
UBERON:0004120
UBERON:0002049
UBERON:0007798
UBERON:0001533
CL:0000134
CL:0000048
CL:0000723
CL:0000034
CL:0000359
CL:0000192
CL:0000187
CL:0000211
CL:0000393
CL:0000183
CL:0002595
CL:0002494
DOID:2394
Term
epithelial tube
tube
artery
arterial blood vessel
arterial system
systemic arterial system
systemic artery
blood vessel
blood vasculature
cardiovascular system
circulatory system
vessel
mesoderm-derived structure
vasculature
vascular system
subclavian artery
mesenchymal cell
multi fate stem cell
somatic stem cell
stem cell
vascular associated smooth muscle cell
smooth muscle cell
muscle cell
electrically active cell
electrically responsive cell
contractile cell
smooth muscle cell of the subclavian artery
cardiocyte
ovarian cancer
8. Connective tissue & muscle cells
Q-value
5.16E-02
5.16E-02
8.98E-02
6.29E-03
7.08E-03
2.06E-04
8.11E-04
1.14E-03
1.14E-03
1.62E-02
1.28E-03
ID
UBERON:0001134
UBERON:0002036
UBERON:0002204
UBERON:0002384
UBERON:0006876
CL:0000183
CL:0000187
CL:0000211
CL:0000393
CL:0000188
CL:0002320
Term
skeletal muscle tissue
striated muscle tissue
musculoskeletal system
connective tissue
vasculature of organ
contractile cell
muscle cell
electrically active cell
electrically responsive cell
skeletal muscle cell
connective tissue cell
9. Connective tissue & integumental cells
Q-value
2.80E-03
2.80E-03
6.29E-03
5.58E-02
2.32E-14
1.86E-03
3.05E-15
8.51E-07
1.39E-03
1.32E-02
1.58E-02
8.95E-02
ID
UBERON:0002199
UBERON:0002416
UBERON:0003102
UBERON:0002097
UBERON:0002384
UBERON:0004923
CL:0002320
CL:0000057
CL:0002620
CL:0002334
CL:0000499
CL:0000136
Term
integument
integumental system
surface structure
skin of body
connective tissue
organ component layer
connective tissue cell
fibroblast
skin fibroblast
preadipocyte
stromal cell
fat cell
Supplementary Figure 18: Annotation of mesenchyme clusters
Enriched annotations for each cluster shown in Supplementary Fig. 17. Explanation in legend of Supplementary
Fig. 16.
21
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 19
a
CD4+ T Cells
Natural Killer Cells
CD8+ T Cells
10. Lymphocytes
CD4+CD25+CD45RA+ naive regulatory T cells
!ry conventional T cells
!ry regulatory T cells
!"ve conventional T cells
CD14+ Monocytes
Mast cell
langerhans cells, immature
langerhans cells, migratory
CD19+ B Cells
Dendr$$$""
11. Myeloid leukocytes
CD14+CD16+ Monocytes
Neutrophils
Whole blood (ribopure)
Peripheral Blood Mononuclear Cells
Basophils
B lymphoblastoid cell line
b cell line
xeroderma pigentosum b cell line
plasma cell leukemia cell line
splenic lymphoma with villous lymphocytes cell line
12. Lymphocytes of B lineage
non T non B acute lymphoblastic leukemia (ALL) cell line
lymphoma, malignant, hair$$$$$
acute lymphoblastic leuk"`!@@z$$$
\\<$"/J$$$Y"$$$
Burkitts lymphoma cell line
chronic lymphocytic leuk"`@@z$$$
Hodgkins lymphoma cell line
acute lymphoblastic leuk"`!@@z$$$
hairy cell leukemia cell line
mycosis fungoides, T cell lymphoma cell line
"<$$$$<kemia cell line
13. Lymphoma
anaplastic large cell lymphoma cell line
myeloma cell line
lymphangiectasia cell line
hereditary spherocytic anemia cell line
NK T cell leukemia cell line
Endothelial Progenitor Cells
Dendr$$"</rived
"/Y"Jrived
thymus, fetal
14. Immune organs
thymus, adult
blood, adult
lymph node, adult
spleen, fetal
spleen, adult
Reticulocytes
chronic myelogenous leukemia (CML) cell line
acute myeloid leukemia (FAB M6) cell line
leukemia, chronic megakaryoblastic cell line
acute myeloid leukemia (FAB M7) cell line
acute myeloid leukemia (FAB M0) cell line
acute myeloid leukemia (FAB M1) cell line
15. Myeloid leukemia
acute myeloid leukemia (FAB M4) cell line
acute myeloid leukemia (FAB M2) cell line
‚
$$"<$="//ow derived
acute myeloid leukemia (FAB M5) cell line
acute myeloid leukemia (FAB M4eo) cell line
biphenotypic B myelomonocytic leukemia cell line
acute myeloid leukemia (FAB M3) cell line
myelodysplastic syndrome cell line
b
10. Lymphocytes
Q-value
1.14E-03
1.14E-03
1.14E-03
1.50E-02
1.50E-02
1.69E-02
2.39E-02
4.15E-02
ID
CL:0000789
CL:0000791
CL:0002419
CL:0000542
CL:0002242
CL:0000624
CL:0000084
CL:0002087
13. Lymphoma
Term
alpha-beta T cell
mature alpha-beta T cell
mature T cell
lymphocyte
nucleate cell
CD4-positive, alpha-beta T cell
T cell
nongranular leukocyte
11. Myeloid leukocytes
Q-value
1.46E-09
7.89E-08
1.74E-07
8.60E-07
2.14E-04
9.46E-04
7.83E-03
7.83E-03
7.86E-03
5.09E-02
5.09E-02
5.09E-02
5.09E-02
5.09E-02
5.09E-02
ID
CL:0000738
CL:0000988
CL:0000219
CL:0000766
CL:0000763
CL:0000576
CL:0000451
CL:0000990
CL:0002087
CL:0000094
CL:0000453
CL:0000860
CL:0002057
CL:0002393
CL:0002397
Term
leukocyte
hematopoietic cell
motile cell
myeloid leukocyte
myeloid cell
monocyte
dendritic cell
conventional dendritic cell
nongranular leukocyte
granulocyte
Langerhans cell
classical monocyte
CD14-positive, CD16-negative classical monocyte
intermediate monocyte
CD14-positive, CD16-positive monocyte
12. Lymphocytes of B lineage
Q-value
3.28E-10
2.30E-07
2.30E-07
2.32E-06
3.32E-05
3.94E-04
6.39E-04
8.02E-02
1.58E-03
1.09E-02
1.09E-02
ID
CL:0000945
CL:0000542
CL:0002242
CL:0002087
CL:0000738
CL:0000988
CL:0000219
CL:0000236
DOID:0060058
DOID:0060083
DOID:2531
Term
lymphocyte of B lineage
lymphocyte
nucleate cell
nongranular leukocyte
leukocyte
hematopoietic cell
motile cell
B cell
lymphoma
immune system cancer
hematologic cancer
Q-value
1.79E-08
1.79E-08
4.77E-08
1.10E-07
2.20E-07
2.70E-06
5.47E-06
2.09E-09
2.09E-09
4.75E-05
7.01E-05
3.19E-04
2.40E-03
2.40E-03
2.60E-03
ID
CL:0000542
CL:0002242
CL:0000084
CL:0000738
CL:0002087
CL:0000988
CL:0000219
DOID:0060083
DOID:2531
DOID:0050686
DOID:0060058
DOID:4
DOID:1240
DOID:162
DOID:14566
Term
lymphocyte
nucleate cell
T cell
leukocyte
nongranular leukocyte
hematopoietic cell
motile cell
immune system cancer
hematologic cancer
organ system cancer
lymphoma
disease
leukemia
cancer
disease of cellular proliferation
14. Immune organs
Q-value
8.51E-05
1.11E-04
1.11E-04
5.96E-04
5.96E-04
2.60E-02
3.13E-02
3.13E-02
3.13E-02
7.58E-02
ID
UBERON:0002193
UBERON:0004177
UBERON:0005057
UBERON:0002390
UBERON:0002405
UBERON:0000077
UBERON:0002370
UBERON:0005058
UBERON:0009113
UBERON:0002106
Term
hemolymphoid system
hemopoietic organ
immune organ
hematopoietic system
immune system
mixed endoderm/mesoderm-derived structure
thymus
hemolymphoid system gland
thymic region
spleen
15. Myeloid leukemia
Q-value
6.12E-13
7.02E-08
3.25E-17
1.92E-13
2.48E-10
2.48E-10
4.46E-05
1.29E-03
3.73E-03
4.39E-03
ID
CL:0000763
CL:0000988
DOID:8692
DOID:1240
DOID:0060083
DOID:2531
DOID:0050686
DOID:4
DOID:162
DOID:14566
Term
myeloid cell
hematopoietic cell
myeloid leukemia
leukemia
immune system cancer
hematologic cancer
organ system cancer
disease
cancer
disease of cellular proliferation
Supplementary Figure 19: Dendrogram and annotation of immune system clusters
Explanation in legend of Supplementary Fig. 16.
22
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 20
Regulatory networks
hepatoblastoma cell line
hepatocellular carcinoma cell line
colon carcinoma cell line
16. Endo-epithelial cells
Intestinal epithelial cells (polarized)
Prostate Epithelial Cells
rectal cancer cell line
adenocarcinoma cell line
signet ring carcinoma cell line
17. Adenocarcinoma
tubular adenocarcinoma cell line
lung adenocarcinoma cell line
bile duct carcinoma cell line
epididymis, adult
testis, adult
18. Male reproductive organs
Clontech Human Universal Reference Total RNA
kidney, adult
kidney, fetal
Univ/"$]!_<"]rmal Tissues Biochain
SABiosciences XpressRef Human Universal Total RNA 19. Liver & kidney
liver, fetal
Hepatocyte
liver, adult
duodenum, fetal
colon, fetal
small intestine, fetal
small intestine, adult
colon, adult
20. Gastrointestinal system
pancreas, adult
rectum, fetal
appendix, adult
gall bladder, adult
stomach, fetal
hearricuspid valve, adult
hear<$valve, adult
heart, adult
hearral valve, adult
21. Heart
left ventricle, adult
left atrium, adult
heart, fetal
diaphragm, fetal
skeletal muscle, fetal
skeletal m<$$<muscle
skeletal muscle, adult
skin, fetal
umbilical cord, fetal
throat, adult
trachea, adult
22. Mouth, throat & skeletal muscle tissue
throat, fetal
trachea, fetal
tonsil, adult
esophagus, adult
penis, adult
tongue, adult
tongue, fetal
aorta, adult
lung, right lower lobe, adult
23. Lung
lung, fetal
lung, adult
thyroid, fetal
thyroid, adult
parotid gland, adult
salivary gland, adult
submaxillary gland, adult
ductus deferens, adult
seminal vesicle, adult
uterus, fetal
ovary, adult
24. Glands & internal genitalia
cervix, adult
uterus, adult
breast, adult
dura mater, adult
vein, adult
adipose tissue, adult
vagina, adult
smooth muscle, adult
prostate, adult
bladder, adult
Supplementary Figure 20: Dendrogram of trunk organs clusters
Dendrogram showing networks and cluster names for (iv) Trunk organs (cf. Supplementary Fig. 15). Corresponding
annotations are shown in Supplementary Fig. 21.
23
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 21
21. Heart
16. Endo-epithelial cells
Q-value
2.79E-02
2.79E-02
2.79E-02
2.79E-02
2.79E-02
3.98E-02
4.60E-02
4.60E-02
7.21E-02
6.13E-02
2.37E-02
5.22E-02
6.13E-02
8.44E-02
8.44E-02
8.44E-02
8.44E-02
ID
UBERON:0001242
UBERON:0001262
UBERON:0001277
UBERON:0004786
UBERON:0004808
UBERON:0004119
UBERON:0000485
UBERON:0003929
UBERON:0003350
UBERON:0001007
CL:0002563
CL:0002076
CL:0000066
CL:0000181
CL:0000182
CL:0000412
CL:0000417
Term
intestinal mucosa
wall of intestine
intestinal epithelium
gastrointestinal system mucosa
gastrointestinal system epithelium
endoderm-derived structure
simple columnar epithelium
gut epithelium
epithelium of mucosa
digestive system
intestinal epithelial cell
endo-epithelial cell
epithelial cell
metabolising cell
hepatocyte
polyploid cell
endopolyploid cell
17. Adenocarcinoma
Q-value
1.34E-03
8.63E-03
2.38E-02
2.59E-02
3.35E-02
3.35E-02
ID
DOID:299
DOID:305
DOID:162
DOID:14566
DOID:0050687
DOID:4
Term
adenocarcinoma
carcinoma
cancer
disease of cellular proliferation
cell type cancer
disease
Q-value
6.86E-10
1.33E-07
3.14E-06
2.07E-05
4.73E-04
4.73E-04
4.73E-04
6.52E-04
1.56E-02
2.09E-02
2.16E-02
2.16E-02
3.38E-02
ID
UBERON:0007100
UBERON:0000948
UBERON:0003103
UBERON:0001009
UBERON:0000946
UBERON:0003978
UBERON:0004151
UBERON:0004535
UBERON:0010313
UBERON:0010314
UBERON:0002081
UBERON:0002133
UBERON:0007023
Term
circulatory organ
heart
compound organ
circulatory system
cardial valve
valve
cardiac chamber
cardiovascular system
neural crest-derived structure
structure with developmental contribution from neural crest
cardiac atrium
atrioventricular valve
adult organism
22. Mouth, throat & skeletal muscle tissue
Q-value
1.07E-02
7.49E-02
8.34E-03
8.34E-03
3.61E-02
3.61E-02
3.61E-02
7.49E-02
7.49E-02
6.12E-03
ID
UBERON:0000475
UBERON:0000341
UBERON:0001134
UBERON:0002036
UBERON:0000383
UBERON:0001015
UBERON:0002385
UBERON:0001630
UBERON:0005090
UBERON:0000922
Term
organism subdivision
throat
skeletal muscle tissue
striated muscle tissue
musculature of body
musculature
muscle tissue
muscle organ
muscle structure
embryo
ID
UBERON:0000025
UBERON:0000117
UBERON:0000170
UBERON:0000171
UBERON:0002048
UBERON:0004111
UBERON:0000065
UBERON:0005178
UBERON:0001004
UBERON:0005181
UBERON:0000915
Term
tube
respiratory tube
pair of lungs
respiration organ
lung
anatomical conduit
respiratory tract
thoracic cavity organ
respiratory system
thoracic segment organ
thoracic segment of trunk
18. Male reproductive organs
Q-value
4.81E-02
ID
UBERON:0003135
Term
male reproductive organ
19. Liver & kidney
Q-value
3.68E-03
3.68E-03
4.96E-03
4.96E-03
3.52E-02
3.52E-02
6.17E-02
6.61E-02
7.75E-02
ID
UBERON:0005172
UBERON:0005173
UBERON:0000916
UBERON:0002417
UBERON:0002107
UBERON:0006925
UBERON:0005177
UBERON:0009569
UBERON:0002423
Term
abdomen organ
abdominal segment organ
abdomen
abdominal segment of trunk
liver
digestive gland
trunk organ
subdivision of trunk
hepatobiliary system
20. Gastrointestinal system
Q-value
1.76E-06
1.76E-05
4.06E-05
2.47E-04
2.52E-04
9.30E-04
2.71E-03
1.07E-02
1.69E-02
3.61E-02
8.51E-03
ID
UBERON:0005409
UBERON:0000160
UBERON:0004921
UBERON:0001007
UBERON:0001555
UBERON:0000059
UBERON:0000481
UBERON:0001155
UBERON:0004119
UBERON:0000922
UBERON:0011216
Term
gastrointestinal system
intestine
subdivision of digestive tract
digestive system
digestive tract
large intestine
multi-tissue structure
colon
endoderm-derived structure
embryo
organ system subdivision
23. Lung
Q-value
7.98E-03
1.23E-02
1.23E-02
1.23E-02
1.23E-02
3.02E-02
3.38E-02
3.68E-02
8.77E-02
4.17E-02
8.15E-02
24. Glands & internal genitalia
Q-value
1.87E-02
2.39E-02
2.39E-02
3.57E-02
2.76E-02
2.76E-02
2.76E-02
2.76E-02
3.59E-02
8.27E-02
7.00E-08
ID
UBERON:0004175
UBERON:0000990
UBERON:0005156
UBERON:0003133
UBERON:0001044
UBERON:0003294
UBERON:0003408
UBERON:0010047
UBERON:0002530
UBERON:0002553
UBERON:0007023
Term
internal genitalia
reproductive system
reproductive structure
reproductive organ
salivary gland
gland of foregut
gland of gut
oral gland
gland
anatomical cavity
adult organism
Supplementary Figure 21: Annotation of trunk organs clusters
Enriched annotations for each cluster shown in Supplementary Fig. 20. Explanation in legend of Supplementary
Fig. 16.
24
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 22
a
Regulatory networks
pineal gland, adult
eye, fetal
25. Pineal gland & eye
retina, adult
retinoblastoma cell line
medulloblastoma cell line
merkel cell carcinoma cell line
carcinoid cell line
neuroblastoma cell line
neuroectodermal tumor cell line
testicular germ cell embryonal carcinoma cell line
26. Neuron-associated cells & cancer
teratocarcinoma cell line
argyrophil small cell carcinoma cell line
gastric cancer cell line
pituitary gland, adult
small cell lung carcinoma cell line
small cell gastrointestinal carcinoma cell line
melanoma cell line
Melanocyte
Retinal Pigment Epithelial Cells
Lens Epithelial Cells
27. Astrocytes & pigment cells
Ciliary Epithelial Cells
!//=ral cortex
!//=$$<
cord blood derived cell line
embryonic kidney cell line
mesothelioma cell line
_$$
epitheloid cancer cell line
anaplastic squamous cell carcinoma cell line
synovial sarcoma cell line
28. Sarcoma & neuroectodermal tumors
rhabdomyosarcoma cell line
Wilms tumor cell line
peripheral neuroectodermal tumor cell line
neuroepithelioma cell line
carcinosarcoma cell line
small cell cervical cancer cell line
somatostatinoma cell line
b
25. Eye
Q-value
27. Astrocytes & pigment cells
ID
Term
Q-value
3.67E-03
4.67E-03
2.79E-02
2.79E-02
2.79E-02
2.79E-02
3.52E-02
4.33E-02
4.33E-02
5.10E-02
6.61E-02
7.75E-02
7.75E-02
1.28E-02
4.45E-02
4.45E-02
4.45E-02
4.45E-02
6.88E-02
No enriched terms
26. Neuron-associated cells & cancer
Q-value
9.22E-02
3.61E-04
4.21E-04
5.19E-04
7.57E-04
5.83E-03
5.83E-03
1.89E-02
2.63E-02
4.15E-02
ID
CL:0000095
DOID:162
DOID:14566
DOID:0050687
DOID:4
DOID:2994
DOID:3095
DOID:305
DOID:0050686
DOID:3119
Term
neuron associated cell
cancer
disease of cellular proliferation
cell type cancer
disease
germ cell cancer
germ cell and embryonal cancer
carcinoma
organ system cancer
gastrointestinal system cancer
ID
UBERON:0007625
UBERON:0010371
UBERON:0000019
UBERON:0000047
UBERON:0004088
UBERON:0010230
UBERON:0000970
UBERON:0001456
UBERON:0002104
UBERON:0004121
UBERON:0000020
UBERON:0001032
UBERON:0004456
CL:0000075
CL:0000126
CL:0000127
CL:0000128
CL:0000147
CL:0000325
Term
pigment epithelium of eye
ecto-epithelium
camera-type eye
simple eye
ocular region
eyeball of camera-type eye
eye
face
visual system
ectoderm-derived structure
sense organ
sensory system
entire sense organ system
columnar/cuboidal epithelial cell
macroglial cell
astrocyte
oligodendrocyte
pigment cell
stuff accumulating cell
28. Sarcoma & neuroectodermal tumors
Q-value
1.34E-03
2.40E-03
2.60E-03
8.63E-03
1.00E-02
1.21E-02
9.65E-02
ID
DOID:0050687
DOID:1115
DOID:4
DOID:162
DOID:14566
DOID:171
DOID:169
Term
cell type cancer
sarcoma
disease
cancer
disease of cellular proliferation
neuroectodermal tumor
neuroendocrine tumor
Supplementary Figure 22: Dendrogram and annotation of neuron-associated clusters
Explanation in legend of Supplementary Fig. 16.
25
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 23
a
salivary acinar cells
keratoacanthoma cell line
cervical cancer cell line
nasal epithelial cells
Corneal Epithelial Cells
Esophageal Epithelial Cells
Ker"ral
Bronchial Epithelial Cell
29. Epithelial cells
Tracheal Epithelial Cells
Urothelial Cells
Small Airway Epithelial Cells
Mammary Epithelial Cell
Gingival epithelial cells
"$$"†rived cells
Sebocyte
Ker"rmal
pancreatic carcinoma cell line
choriocarcinoma cell line
placenta, adult
30. Extraembryonic membrane
chorionic membrane cells
amniotic membrane cells
serous cystadenocarcinoma cell line
renal cell carcinoma cell line
serous adenocarcinoma cell line
clear cell carcinoma cell line
endometrial cancer cell line
Placental Epithelial Cells
31. Epithelial cells of kidney & uterus
Amniotic Epithelial Cells
Alveolar Epithelial Cells
Renal Mesangial Cells
Renal Proximal Tubular Epithelial Cell
Renal Epithelial Cells
Renal Cortical Epithelial Cells
maxillary sinus tumor cell line
breast carcinoma cell line
bronchioalveolar carcinoma cell line
papillotubular adenocarcinoma cell line
mucinous adenocarcinoma cell line
gall bladder carcinoma cell line
ductal cell carcinoma cell line
prostate cancer cell line
tr""$$$"/"$$$
malignant trichilemmal cyst cell line
32. Lung epithelium & lung cancer
squamous cell carcinoma cell line
epidermoid carcinoma cell line
oral squamous cell carcinoma cell line
bronchogenic carcinoma cell line
glassy cell carcinoma cell line
squamous cell lung carcinoma cell line
$"/J$$keratinizing squamous carcinoma cell line
acantholytic squamous carcinoma cell line
pharyngeal carcinoma cell line
bronchial squamous cell carcinoma cell line
b
29. Epithelial cells
Q-value
4.89E-05
2.75E-04
1.78E-03
2.03E-03
2.98E-02
5.13E-02
7.29E-02
ID
CL:0000066
CL:0002076
CL:0002077
CL:0002159
CL:0002251
CL:0002368
CL:0000622
31. Epithelial cells of kidney & uterus
Term
epithelial cell
endo-epithelial cell
ecto-epithelial cell
general ecto-epithelial cell
epithelial cell of alimentary canal
respiratory epithelial cell
acinar cell
30. Extraembryonic membrane
Q-value
1.37E-02
4.60E-02
4.60E-02
ID
UBERON:0000478
UBERON:0000158
UBERON:0005631
Term
extraembryonic structure
membranous layer
extraembryonic membrane
32. Lung epithelium & lung cancer
Q-value
7.21E-02
2.58E-02
2.58E-02
7.21E-02
9.57E-02
9.73E-02
1.94E-08
4.48E-04
6.20E-02
2.40E-03
1.80E-02
2.70E-14
1.99E-11
2.82E-09
3.63E-09
1.11E-08
1.21E-08
ID
UBERON:0000115
UBERON:0000464
UBERON:0000466
UBERON:0004802
UBERON:0004807
UBERON:0005911
CL:0000066
CL:0000076
CL:0000082
DOID:0050615
DOID:1324
DOID:305
DOID:0050687
DOID:162
DOID:14566
DOID:4
DOID:1749
Term
lung epithelium
anatomical space
immaterial anatomical entity
respiratory tract epithelium
respiratory system epithelium
endo-epithelium
epithelial cell
squamous epithelial cell
epithelial cell of lung
respiratory system cancer
lung cancer
carcinoma
cell type cancer
cancer
disease of cellular proliferation
disease
squamous cell carcinoma
Q-value
1.61E-05
1.61E-05
1.61E-05
1.61E-05
3.59E-05
3.59E-05
9.47E-05
9.47E-05
9.47E-05
9.47E-05
1.41E-04
1.41E-04
2.20E-04
3.58E-04
3.58E-04
1.95E-03
1.95E-03
7.64E-03
7.64E-03
9.02E-03
1.07E-02
1.07E-02
4.33E-02
9.57E-02
2.66E-07
2.66E-07
9.71E-06
1.22E-05
6.67E-05
6.67E-05
2.75E-04
2.75E-04
4.60E-02
4.60E-02
6.16E-03
1.11E-02
1.89E-02
ID
UBERON:0001285
UBERON:0004819
UBERON:0006555
UBERON:0007684
UBERON:0002113
UBERON:0011143
UBERON:0001231
UBERON:0004211
UBERON:0004810
UBERON:0009773
UBERON:0001008
UBERON:0006554
UBERON:0000489
UBERON:0001225
UBERON:0008987
UBERON:0000353
UBERON:0001851
UBERON:0005172
UBERON:0005173
UBERON:0003103
UBERON:0000916
UBERON:0002417
UBERON:0003914
UBERON:0000474
CL:0002518
CL:1000497
CL:1000449
CL:0000066
CL:1000494
CL:1000507
CL:0002584
CL:0002681
CL:0002149
CL:0002255
DOID:120
DOID:193
DOID:299
Term
nephron
kidney epithelium
excretory tube
uriniferous tubule
kidney
upper urinary tract
nephron tubule
nephron epithelium
nephron tubule epithelium
renal tubule
excretory system
urinary system structure
cavitated compound organ
cortex of kidney
renal parenchyma
parenchyma
cortex
abdomen organ
abdominal segment organ
compound organ
abdomen
abdominal segment of trunk
epithelial tube
female reproductive system
kidney epithelial cell
kidney cell
epithelial cell of nephron
epithelial cell
epithelial cell of renal tubule
kidney tubule cell
renal cortical epithelial cell
kidney cortical cell
epithelial cell of uterus
stromal cell of endometrium
female reproductive organ cancer
reproductive organ cancer
adenocarcinoma
Supplementary Figure 23: Dendrogram and annotation of epithelium clusters
Explanation in legend of Supplementary Fig. 16.
26
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 24
(i) Nervous system: ectoderm, neural crest
P-value
1.69E-21
2.94E-20
1.63E-08
1
2
Neurons, fetal brain
Nervous system, hindbrain
3
Forebrain
4
Mesenchymal, mixed
5
Sarcoma
6
Endothelial cells
7
Mesenchymal, smooth muscle cells
8
Mesenchymal, skeletal muscle cells
9
Connective tissue, integumental cells
ID
UBERON:0004121
UBERON:0010314
UBERON:0010316
Term
ectoderm-derived structure
structure with developmental contribution from neural crest
germ layer / neural crest
(ii) Mesenchyme: mesoderm
P-value
3.07E-05
1.39E-14
ID
UBERON:0004120
CL:0000222
Term
mesoderm-derived structure
mesodermal cell
Embryonic germ layers
10
11
12
13
14
Ectoderm, neural crest
Mesoderm
Endoderm
Lymphocytes
Myeloid leukocytes
Lymphocytes of B lineage
Lymphoma
Immune organs
(iii) Immune system: mesoderm
Myeloid leukemia
Endo-epithelial cells
Adenocarcinoma
Male reproductive organs
Liver, kidney
Gastrointestinal system
Heart
22 Mouth, throat, skeletal muscle tissue
23 Lung
15
16
17
18
19
20
21
24
P-value
4.41E-03
8.60E-03
8.60E-03
8.60E-03
Eye
Neuron-associated cell, cancer
27 Astrocytes, pigment cells
28 Sarcoma, neuroectodermal tumors
30
31
Epithelial cells
Extraembryonic membrane
Epithelial cells of kidney, uterus
32
Lung epithelium, lung cancer
P-value
4.29E-08
2.57E-11
ID
UBERON:0004119
UBERON:0010316
Term
endoderm-derived structure
germ layer / neural crest
(v) Neuron-associated: ectoderm, neural crest
Glands, internal genitalia
25
26
29
(iv) Trunk organs: mixed origin
ID
UBERON:0004121
UBERON:0002346
UBERON:0010316
CL:0000333
Term
ectoderm-derived structure
neurectoderm
germ layer / neural crest
migratory neural crest cell
(vi) Epithelium: mixed origin
P-value
3.27E-03
ID
UBERON:0004119
Term
endoderm-derived structure
Supplementary Figure 24: Developmental origin of regulatory networks pertaining to different highlevel clusters
We performed annotation enrichment analysis for each high-level cluster (i-vi, Supplementary Fig. 15), focusing
specifically on terms related to the three embryonic germ layers (ectoderm and neural crest, mesoderm, and endoderm).
Regulatory networks of the first cluster (i, nervous system) are derived from ectoderm and neural crest. Regulatory
networks of the second cluster (ii, mesenchyme) are mesoderm-derived. Immune cells (clusters 11-13) are derived from
hematopoietic stem cells (the corresponding high-level cluster [iii, immune system] does not show significant enrichment
for mesoderm; this is just an artifact because immune cells and corresponding cancer cells were not systematically annotated as mesodermal cells in the FANTOM5 sample ontology). The fourth cluster (iv, trunk organs) groups regulatory
networks of mixed origin (endoderm and neural crest show significant enrichment, but we also found mesoderm-derived
tissues). The fifth cluster (v, neuron-associated) enriches for ectoderm and neural crest. Epithelial cells (cluster vi) are
derived from all three germ layers. In summary, regulatory networks that are clustered together often share a common
developmental origin.
27
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 25
Immune cells
Mean number of TFs per gene
11 Myeloid leukocytes
10 Lymphocytes
Nervous system
2 Nervous system & adult hindbrain
1 Neurons & fetal brain
3 Adult forebrain
12.5
10
7.5
11 10
2 1 3
26 23 21 20 16 30 19 14 22 7 12 15 31 28 6 29 13 27 17 24 9 32 8 5 4
Cluster index
Supplementary Figure 25: Comparing number of TFs per gene across lineages
Boxplots show the mean number of TF inputs per gene (i.e., the mean indegree) for all networks across the 32 clusters
defined in Supplementary Fig. 14 (weak edges with weights below 0.05 were not counted for the indegrees). Immune
cells, followed by cells and tissues of the nervous system, have the highest number of TFs per gene, indicating that they
rely on more intricate, combinatorial regulation than other cells and tissues (the observed differences between the mean
indegrees of immune cells, nervous system, and the remaining clusters are statistically significant; Supplementary
Fig. 26). Both immune cells and neurons perform highly adaptive functions that require integration of diverse
extra-cellular signals, e.g., to recognize distinct pathogens or guide neuronal wiring. Our networks suggest that the
orchestration of transcriptional responses in these cells involves intricate regulatory programs.
28
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 26
a
b
11. Myeloid leukocytes
10. Lymphocytes
2. Nervous system & adult hindbrain
1. Neurons & fetal brain
3. Adult forebrain
]</""$$‡"/
23. Lung
21. Heart
20. Gastrointestinal system
Y$"$$$s & hepatocytes
30. Extraembryonic membrane
19. Liver & kidney
14. Immune organs
22. Mouth, throat & skeletal muscle tissue
7. Mesenchymal stem & smooth muscle cells
12. Lymphocytes of B lineage
15. Myeloid leukemia
31. Epithelial cells of kidney & uterus
28. Neuroectodermal tumor & sarcoma
6. Endothelial cells
29. Epithelial cells
13. Lymphoma
27. Astrocytes & pigment cell
17. Adenocarcinoma
24. Glands & internal genitalia
9. Connective tissue & integumental cells
32. Lung epithelium & lung cancer
8. Connectivte tissue & mucle cells
5. Sarcoma
4. Mesenchymal, mixed
Immune
cells
p<10-6
Nervous
system
p<0.01
7.5
10
12.5
100
Mean number of TFs per gene
c
150
200
d
Immune
cells
11. Myeloid leukocytes
10. Lymphocytes
2. Nervous system & adult hindbrain
1. Neurons & fetal brain
3. Adult forebrain
]</""$$‡"/
23. Lung
21. Heart
20. Gastrointestinal system
Y$"$$$s & hepatocytes
30. Extraembryonic membrane
19. Liver & kidney
14. Immune organs
22. Mouth, throat & skeletal muscle tissue
7. Mesenchymal stem & smooth muscle cells
12. Lymphocytes of B lineage
15. Myeloid leukemia
31. Epithelial cells of kidney & uterus
28. Neuroectodermal tumor & sarcoma
6. Endothelial cells
29. Epithelial cells
13. Lymphoma
27. Astrocytes & pigment cell
17. Adenocarcinoma
24. Glands & internal genitalia
9. Connective tissue & integumental cells
32. Lung epithelium & lung cancer
8. Connectivte tissue & mucle cells
5. Sarcoma
4. Mesenchymal, mixed
Nervous
system
8000
10000
12000
Number of active genes per network
100
150
200
29
250
Average network clustering coefficient
Supplementary Figure 26: Comparing network properties across lineages
(Caption next page.)
Nature Methods: doi:10.1038/nmeth.3799
250
Mean number of target genes per TF
Supplementary Figure 26: (Previous page.)
Boxplots show network properties for all 394 regulatory networks, grouped according to the 32 clusters defined in
Supplementary Fig. 14. The same cluster order on the vertical axis is used in all four panels.
(a) Mean number of TFs per gene (i.e., mean indegree; same as Supplementary Fig. 25). Clusters were ordered on
the vertical axis by their median. Immune cells, followed by cells and tissues of the nervous system, have the highest
number of TFs per gene. The observed differences are statistically significant: immune cells have a higher mean indegree
than nervous system networks (p<10-6, two-sided Wilcoxon rank-sum test), and the latter come before other networks
with a high mean indegree, such as those of lung (p<0.01). Note, cluster 26 (neuron-associated cells and cancer) also
contains some neural tissues and related tumors, which explains why it comes right after the nervous system clusters.
(b) Mean number of target genes per TF (i.e., mean outdegree). Networks of immune cells and nervous system clusters
have relatively high mean outdegree; however, in contrast to the mean indegree, differences between immune cells,
nervous system, and lung (as an example of another cluster with high mean outdegree) are not statistically significant
(p>0.05).
(c) The number of active genes in each network mainly depends on whether it is derived from a cell or tissue sample.
Tissues comprise multiple cell types and thus tend to have more active genes (the union of the genes active in each
cell type). Nervous system networks have a similar number of active genes as other tissue-specific networks (e.g.,
lung). Immune cell networks have a lower number of active genes, but again not significantly different from other cell
type-specific networks (e.g., connective tissue and muscle cells).
(d) There are no notable differences in the average network clustering coefficient between different lineages.
30
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 27
Network
GWAS
Summary statistics
(1) Gene-level p-values
(2) Pairwise connectivity
Significance
SNP p-values
How “close” are any two
genes in the network?
Connectivity matrix
(diffusion kernel)
Reference population
SNP correlation matrix
Pascal
Gene p-value
Genotypes
(3) Connectivity enrichment curves
Connectivity enrichment
Do trait-associated genes
cluster within network?
Original data
Gene-label permutations
Genes ranked by p-value
Phenotype
Genotype
G
G
G
G
A
G
A
G
A
A
Genotype-phenotype vs.
gene label permutations
A
A
(4) Connectivity enrichment scores
Summarize enrichment curves
Area under the curve
(5) Validation
LD and other counfounders
accounted for?
Original data
Gene-label
permutations
Enrichment score =
–log10(empirical p-value)
Supplementary Figure 27: Overview of network connectivity enrichment analysis
Given summary statistics from a GWAS and a network, our pipeline evaluates whether genes perturbed by traitassociated variants are more densely inter-connected than expected. Here we give a general overview of the approach,
see Supplementary Fig. 28 for an illustrative example.
(1) SNP association p-values from the GWAS are summarized at the level of genes using Pascal,8 a fast and accurate
tool that corrects for LD structure using information from a reference population (we used the European panel of the
1000 genomes project9 ).
(2) A pairwise connectivity matrix is computed, which defines how "close" any two genes are in the network based on
a random-walk kernel.10
(3) Genes are ranked by their GWAS p-value, from most to least significant, and the mean connectivity between the
top k genes is evaluated — as k is varied over the complete list — both for the original data and random permutations
(Supplementary Fig. 28). A conservative approach is used for the permutations, where the network structure
remains completely fixed and only the labels of genes with similar network degree are shuffled.11 Gene pairs that are
in LD are excluded from the analysis (Supplementary Fig. 29).
(4) Enrichment curves are summarized by the signed area under the curve, both for the original data and the permutations, and a corresponding empirical p-value is computed.
(5) We validated that LD structure and other potential confounders are accounted for by showing that enrichment
scores based on gene label permutations are equivalent to those obtained using genotype-phenotype permutations
(Supplementary Fig. 30).
31
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 28
Network connectivy
Higher than
expected
Lower than
expected
b
HDL cholesterol
Original data
10,000 permutations
(99% confidence interval)
Genome-wide significant genes
(GWAS p-value < 10)
Signed area under the curve
a
Original data
Enrichment score =
–log10(q-value)
Permutations
0
500
1000
1500
Genes ranked by GWAS p-value
Supplementary Figure 28: Illustrative example for network connectivity enrichment analysis
Network connectivity analysis is illustrated using an example trait and network that show strong enrichment (HDL
cholesterol and the global regulatory network defined in Methods).
(a) Genes are ranked by GWAS p-value. For each position k in the ranked list, the network connectivity between the
top k genes is computed using a random-walk kernel (red line; positive and negative values indicate higher and lower
connectivity than expected, respectively). The same is done for 10,000 permutations of the data (the grey area contains
99% of the corresponding curves). In this example, the top 500 genes show significantly increased connectivity, which
includes genes with GWAS p-values that do not reach the genome-wide significance threshold (dashed line).
(b) To summarize connectivity enrichment, the signed area under the curve is computed both for the original data
(red line) and the permutations (boxplot), resulting in an empirical p-value and corresponding score that gauge how
clustered trait-associated genes are within the network.
32
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 29
Chromosome 1
Chromosome 6
1000
750
Genes
Genes
750
500
500
250
250
HLA region
250
500
750
1000
250
Genes
500
750
Genes
Correlation of GWAS p-values
Neighboring genes
(distance <1mb)
0.00 0.25 0.50 0.75
Supplementary Figure 29: Correlation between neighboring genes due to linkage disequilibrium
The GWAS association signal of neighboring genes is often correlated due to linkage disequilibrium (LD). Here we
show the pairwise correlation between gene scores (GWAS p-values summarized by gene using Pascal8 ) across 500
simulated phenotypes (Methods). Genes of chromosome 1 and 6 are shown, ordered by genomic position (the same
observations were be made for other chromosomes). Genes that are less than 1 mega-base apart (magenta overlay) often
show increased correlation. This is relevant for network analysis because neighboring genes are often also functionally
related and/or co-regulated, i.e., they are also neighbors at the level of pathways or networks. An extreme example is
the human leukocyte antigen (HLA) gene cluster on chromosome 6. If not properly corrected for, such clusters of genes
that are both correlated and functionally related lead to inflated pathway or network enrichment scores. To ensure
that our results were not confounded by neighboring, functionally related genes, we took a conservative approach and
completely excluded all gene pairs with a distance of less than 1 mega-base from the network connectivity enrichment
analysis (Methods). Since the HLA genes form an exceptionally large cluster that also shows strong association with
many immune-related traits, we further completely excluded all genes in the HLA region from the connectivity analysis.
This ensures that the observed network connectivity enrichment is not driven by the HLA genes or similar gene clusters.
33
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 30
Score
G
en
ePh
la
be
en
lp
ot
yp
er
m
eut
la
be
at
io
lp
ns
er
m
ut
at
io
ns
(Signed area under enrichment curve)
No significant difference
Supplementary Figure 30: Validation of connectivity enrichment scores using phenotype-label permutations
Our method computes connectivity enrichment scores empirically based on permutations of the data. It is thus crucial
that permutations are done appropriately such that no bias is introduced. Phenotype label permutation is generally
considered the gold standard for this purpose because it preserves both the network structure and genomic architecture
(e.g., LD structure and gene clusters, see Supplementary Fig. 29). However, it is computationally intensive and
requires access to genotypic data, which are rarely shared. Thus, a standard approach used in network-based GWAS
analysis is within-degree gene label permutation: the labels of genes with similar network degree are shuffled, while
the network structure itself remains completely fixed11, 12 (i.e., no edges are rewired, only node labels are reassigned).
However, in our case the relevant confounder is not the degree of the gene, but rather its mean pairwise connectivity
across all other genes (we call this the centrality of the gene), which is closely related — but not equivalent — to
the degree. Thus, we used the same approach as within-degree gene label permutation, but in our case the genes
were binned based on their centrality (Methods). This permutation method implies that the null distribution of
connectivity between the top k genes is entirely a function of the centrality of these genes.11 In order to validate the
within-centrality gene label permutation method, we generated 500 phenotype-label permutations and compared the
resulting scores (signed area under the connectivity enrichment curve; Methods). We found no significant difference
(p=0.18, two-sided Wilcoxon rank-sum test), thus confirming that our method based on within-centrality gene label
permutation adequately corrects for confounders. I.e., the resulting enrichment scores are equivalent to those obtained
using phenotype label permutation. Note that this only works because our method excludes gene pairs that are in LD
as explained in Supplementary Fig. 29 — if these gene pairs are included scores become inflated.
34
Nature Methods: doi:10.1038/nmeth.3799
-s
p
Nature Methods: doi:10.1038/nmeth.3799
pe
ci
Ti Pro fic
ss te re
ue in gu
-s -pr lat
pe ot or
c e y
G
Ti lo G ific in i cir
ss ba lo c nt cu
ue l r ba o- er its
-s eg l c ex act
pe u o pr io
ci lato -ex es n
fic r p si
D y (C res on
N
as hI sio
e P- n
fo se
ot q)
pr
in
ts
1
All genes
0
35
–log10(q-value)
Enrichment score
b
-s
0
–log10(q-value)
Regulatory
(ChIP-seq)
–log10(q-value)
–log10(q-value)
Networks
Tissue-specific
co-expression
0
Global coexpression
Tissue-specific
regulatory
–log10(q-value)
Protein
interaction
–log10(q-value)
Psychiatric
ue
–log10(q-value)
Enrichment score
/
Sc
hi zo De Bi
p /
pr po A hre /
es lar no ni
siv d re a
In
fla
e iso xia
di rd
m
so er
m U
at l
Au rde
or ce
y ra A tis r
b
Rh C ow tive DHm
eu roh el d co D
li
m n
He Mu ato's disea tis
pa ltip id ise se
tit le ar as
Al Tyis C scl thri e
zh pe re er tis
ei 1 so os
m d lv is
Pa
er ia e
's b r
rk
in Na dis etes
s
o
HD n rco eas s
's
LDL c dislepse
h
y
L
Bl
T
ot choleseas
o
al o te e
Cood
l
ro pre T cho esterol
na ss rig le ro
ry ur lyc ste l
G Fartee (s eridrol
lyc a r ys e
at sti y d to s
e ng is lic
Tyd he gl eas )
pe m uc e
2 og ose
Fa
st 2h diablobi
i
In ng r g e n
In sul pr luc tes
sulinin s oinsose
" re ecr uli
s e n
M
Fa$$ istation
a
st \< nc
M cul
in e
ac ar
g ul de
in ar g
su
de en
l
ge era H BMin
ne tio e I
ig
r
O ation (w ht
st n e
eo (d t)
po ry
ro )
sis
a
Ti
ss
ec
i
Ti Pro fic
ss te re
ue in gu
-s -pr lat
pe ot or
c e y
Ti Glo G ific in i cir
ss ba lo c nt cu
ue l r ba o- er its
-s eg l c ex act
pe u o pr io
ci lato -ex es n
fic r p si
D y (C res on
N
as hI sio
e P- n
fo se
ot q)
pr
in
ts
ss
ue
Ti
Supplementary Figure 31
Neurodegenerative Cardiovascular
Anthropometric
Immune-related
Blood lipids
Glycemic
Others
2
1
0
2
1
0
2
1
0
2
1
2
1
10% FDR
GWAS
c
Genome-wide
significant genes
2
2
1
0
Supplementary Figure 31: Connectivity enrichment scores for different types of networks
(Caption next page.)
Supplementary Figure 31: (Previous page.)
Connectivity enrichment scores for different types of networks and GWAS traits. See legend of Fig. 2c and Methods
for a description of the different networks.
(a) Same as Fig. 2c, but here we show results for all traits, not just those showing significant connectivity enrichment
for at least one network. Some traits do not show enrichment, which may be either because the signal was too weak,
the right tissues were not profiled, or other types of networks may be more relevant for these traits.
(b-c) Overall connectivity enrichment scores when considering (b) all genes and (c) only genes with GWAS p-values
that reach genome-wide significance.
(b) This panel summarizes the results of Fig. 2c. Tissue-specific regulatory networks show the strongest connectivity
enrichment, followed by protein-protein interaction networks and tissue-specific co-expression networks. Among the four
protein-protein interaction networks tested (Methods), the InWeb database13 showed the best connectivity enrichment,
covering the largest number of traits. The regulatory networks from ENCODE based on ChIP-seq14 and DNaseI
footprints15 only show weak connectivity enrichment for most traits. Note that the networks based on DNaseI footprints
were likely not comprehensive enough to reveal perturbed regulatory modules because they are limited to TF genes
and promoter-level interactions (Methods).
(c) When considering only genes that pass the GWAS significance threshold, the connectivity enrichment scores are
much lower, suggesting substantial contribution of weakly associated genes.
36
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 32
a
e
Y""
Cardiovascular
High-level networks Individual networks
High-level networks Individual networks
Cross-disorder
Suppl. Fig. 33
Suppl. Fig. 33
%'"+ No enrichment
No enrichment
Schizophrenia
Suppl. Fig. 34
Suppl. Fig. 34
;
No enrichment
=
@
Suppl. Fig. 35
No enrichment
Bipolar disorder
No enrichment
Suppl. Fig. 36
Depressive disorder
No enrichment
No enrichment
No enrichment
No enrichment
ADHD
No enrichment
f
No enrichment
J""
High-level networks Individual networks
No enrichment
Z
Fasting glucose
No enrichment
No enrichment
J"$
No enrichment
No enrichment
TX
No enrichment
No enrichment
X$"
No enrichment
No enrichment
Fasting proinsulin
No enrichment
No enrichment
Suppl. Fig. 37
Suppl. Fig. 37
Insulin secretion
No enrichment
No enrichment
Ulcerative colitis
Suppl. Fig. 38
Suppl. Fig. 38
Insulin resistance
No enrichment
No enrichment
Crohns disease
Suppl. Fig. 39
Suppl. Fig. 39
Beta-cell function
Suppl. Fig. 46
No enrichment
Suppl. Fig. 40a
Suppl. Fig. 40a
Fasting insulin
No enrichment
No enrichment
Multiple sclerosis
No enrichment
Suppl. Fig. 40b
Hepatitis C resolvers
No enrichment
Suppl. Fig. 40c
T
No enrichment
No enrichment
High-level networks Individual networks
g
"
High-level networks Individual networks
c
Neurodegenerative
%
=
Suppl. Fig. 45
Suppl. Fig. 45
Height
No enrichment
No enrichment
High-level networks Individual networks
rs disease
Suppl. Fig. 41
Suppl. Fig. 41
!"
Suppl. Fig. 42
Suppl. Fig. 42
Parkinsons disease
No enrichment
No enrichment
h
Others
High-level networks Individual networks
d
Blood lipids
>?'
@"+
Suppl. Fig. 47
Suppl. Fig. 47
>?'+
No enrichment
No enrichment
Osteoporosis
No enrichment
No enrichment
High-level networks Individual networks
HDL cholesterol
Suppl. Fig. 43a
Not tested
LDL cholesterol
Suppl. Fig. 43b
Not tested
Total cholesterol
Suppl. Fig. 44a
Not tested
T$"
Suppl. Fig. 44b
Not tested
!["
"@
"
Trait-relevant cell types rank first
Unrelated cell types (false positives)
Globally strong enrichment (diverse cell types)
No enrichment (score < 1.0)
Supplementary Figure 32: Overview of network connectivity enrichment results for all traits
Traits are listed in same order as in Supplementary Fig. 31. Refer to the indicated supplementary figures and
Results for details. Possible reasons why some traits did not show significant connectivity enrichment are: (1) the
signal was too weak (four out of five GWASs with <10,000 individuals were not enriched in any network), (2) the
relevant tissues were not profiled (e.g., our library does not include pancreatic islet cells relevant for type 2 diabetes16 ),
or (3) other types of networks (e.g., post-transcriptional) may be more relevant for these traits.
(a–c) Psychiatric, immune-related, and neurodegenerative disorders showed either highly specific connectivity enrichment in disease-related cell types and tissues or no enrichment at all, with the exception of Crohn’s disease.
(d) Blood lipid traits showed strong connectivity enrichment in many diverse networks.
(e–f ) Cardiovascular and glycemic traits showed no connectivity enrichment except for one likely false positive (β-cell
function).
(g–h) Body mass index and age-related macular degeneration (AMD) of neovascular type consistently showed connectivity enrichment in trait-related networks, while the remaining traits showed no enrichment.
37
Nature Methods: doi:10.1038/nmeth.3799
a
Supplementary Figure 33
b Individual networks
High-level networks
Nervous system & adult hindbrain (2)
Neurons & fetal brain (1)
Forebrain (3)
Rank
1
2
3
0.0
1.0
Neural stem cells
Caudate nucleus, adult
Thalamus, adult
Locus coeruleus, adult
Frontal lobe, adult
Occipital cortex, adult
Pons, adult
Spinal cord, fetal
Neurons
Parietal lobe, adult
Olfactory region, adult
Hippocampus, adult
Cerebellum, adult
Amygdala, adult
Brain, adult
Medial frontal gyrus, adult
Paracentral gyrus, adult
Medial temporal gyrus, adult
Globus pallidus, adult
Spinal cord, adult
Diencephalon, adult
Insula, adult
Postcentral gyrus, adult
Temporal lobe, adult
Cerebral meninges, adult
Occipital pole, adult
Nucleus accumbens, adult
Substantia nigra, adult
1
2
3
4
2.0
–log10(q-value)
5
6
7
8
c
9
Cognitive control
10
11
Basal ganglia
Thalamus
Prefrontal cortex
Caudate nucleus
Thalamic nuclei
12
Rank
Frontal lobe
13
14
15
16
d
Cerebellar modulation of cognition
17
18
Frontal lobe
19
Prefrontal cortex
20
21
22
Thalamus
Pons
Thalamic nuclei
Locus coeruleus
23
24
25
26
Basal ganglia
Caudate nucleus
Cerebellum
27
28
0.0
1.0
2.0
–log10(q-value)
Supplementary Figure 33: Connectivity enrichment for the psychiatric, cross-disorder study
Connectivity enrichment scores for the psychiatric, cross-disorder study across (a) the 32 high-level networks and (b)
the corresponding individual networks. All networks with score > 1.0 are shown (10% FDR, dashed line). Numbers
in parentheses correspond to cluster indexes (Supplementary Fig. 14). (a) The psychiatric cross-disorder study
showed the strongest clustering of perturbed genes precisely in the three high-level networks of nervous system and
brain. (b) After neural stem cells, the strongest connectivity enrichment was observed in the caudate nucleus, thalamus,
and locus coeruleus, which are all implicated in psychiatric disorders. The caudate nucleus is a basic structure of the
basal ganglia, strongly innervated by dopamine neurons, which performs important functions related to goal-directed
action, memory, learning, and emotion. Dopamine is an important neurotransmitter implicated in psychiatric disorders,
e.g., the dopaminergic system of the basal ganglia manifests pathological anomalies in schizophrenia patients and is
the primary target of current antipsychotic drugs.17 The thalamus, located adjacent to the caudate nucleus, is an
information processing and communication hub involved in sensory and cognitive processes that has been implicated in
the pathophysiology of schizophrenia, for instance.18 The locus coeruleus is part of the pons, which also showed strong
signal (7th rank). It is responsible for physiological responses to emotional pain, stress, and panic and has widespread
projections utilizing the neurotransmitter noradrenalin (the locus coeruleus-noradrenergic system). Dysregulation of
this system contributes to cognitive dysfunction associated with a variety of psychiatric disorders, including attentiondeficit hyperactivity disorder, sleep and arousal disorders, and post-traumatic stress disorder.19 (c–d) Schematic
representation of key cerebral circuits underlying cognitive function that are disrupted in psychiatric disorders (adapted
from Millan et al.,20 colors match the corresponding networks in Panel b). The brain structures showing the strongest
clustering of perturbed genes are not only individually relevant to psychiatric disorders, as explained above, but they
also constitute the core components of these integrated cognitive circuits, further supporting an etiological model
where their dysregulation is a cause of impaired cognitive function in patients. (c) The frontal lobe, basal ganglia, and
thalamus form a feedback loop that integrates cognitive control, attention, and working memory.20 (d) Cognition is
modulated by the cerebellum through feedback loops with the basal ganglia and the cortex, mainly via the thalamus
and the pons.20
38
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 34
Adult forebrain (3)
Nervous system and adult hindbrain (2)
0.0
1.0
2.0
–log10(q-value)
Rank
Rank
High-level networks
1
2
Individual networks
Putamen
Globus pallidus
Occipital cortex
Frontal lobe
Caudate nucleus
Locus coeruleus
Insula
Hippocampus
Cerebellum
Optic nerve
Amygdala
Medial temporal gyrus
Thalamus
Spinal cord
Basal ganglia
Neural
1
2
3
4
5
6
7
8
9
10
11
12
13
14
0.0
1.0
2.0
–log10(q-value)
Supplementary Figure 34: Connectivity enrichment for schizophrenia
Same as Fig. 3a, but showing all networks at FDR < 10%.
39
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 35
Rank
High-level networks
Neural
Endocrine system
Other
Glands and internal genitalia (24)
Pineal gland and eye (25)
Nervous system and adult hindbrain (2)
Mouth, throat and skeletal muscle tissue (22)
Liver and kidney (19)
Extraembryonic membrane (30)
Mesenchymal stem and smooth muscle (7)
Myeloid leukocytes (11)
Heart (21)
1
2
3
4
5
6
7
8
9
0.0
1.0
2.0
–log10(q-value)
Supplementary Figure 35: Connectivity enrichment for anorexia nervosa
Anorexia nervosa showed the strongest clustering of associated genes in high-level regulatory networks of the endocrine
system (hormonal glands, Supplementary Fig. 14c) followed by nervous system and hindbrain, suggesting that endocrine
dysregulation, a hallmark of chronic starvation in anorexia nervosa,21 may be implicated.
40
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 36
Individual networks
Amygdala, adult
Medial temporal gyrus
Cerebral meninges, adult
Postcentral gyrus, adult
Temporal lobe, adult
Occipital pole, adult
Occipital lobe, adult
Medial temporal gyrus, adult
Brain, adult
Putamen, adult
1
2
3
Rank
4
5
6
7
8
9
10
0.0
1.0
2.0
–log10(q-value)
Supplementary Figure 36: Connectivity enrichment for bipolar disorder
Bipolar disorder did not show connectivity enrichment for the 32 high-level networks, but we nevertheless tested for
enrichment in all individual networks of the nervous system (clusters 1–3 in Supplementary Fig. 14). All networks
with score > 1.0 are shown (10% FDR, dashed line). The strongest clustering of perturbed genes is observed in the
regulatory network of the amygdala, a group of nuclei playing key roles in memory modulation, emotional learning,
emotional reactions, and mood. There are consistent findings in the literature suggesting that abnormalities in the
structure and function of the amygdala are implicated in the etiology of bipolar disorder.22
41
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 37
Immune-related
Vascular system
Includes cell types
of vascular system
High-level networks
Rank
0.0
1.0
Rank
Connective tissue and muscle cells (8)
Nervous system and adult hindbrain (2)
Myeloid leukocytes (11)
Immune organs (14)
Astrocytes and pigment cells (27)
Mesenchymal and smooth muscle (7)
Mesenchymal, mixed (4)
Glands and internal genitalia (24)
Endothelial cells (6)
Mouth, throat and skeletal muscle (22)
Lung (23)
Lymphocytes (10)
1
2
3
4
5
6
7
8
9
10
11
12
2.0
–log10(q-value)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
Individual networks
Endothelial cells (aortic)
Endothelial cells (vein)
Endothelial cells (lymphatic)
Endothelial cells (microvascular)
Fibroblast (cardiac)
Endothelial cells (umbilical vein)
Smooth muscle cells (coronary artery)
Spleen, fetal
Medulla oblongata, adult
Mast cell
Skeletal muscle, fetal
Smooth muscle cells (aortic)
Dendritic cells (monocyte immature)
Mesenchymal stem cells (hepatic)
Endothelial cells (artery)
Renal glomerular endothelial cells
Substantia nigra, adult
Naive conventional T cells
Smooth muscle cells (umbilical artery)
Mesenchymal precursor (ovary, left)
Thyroid, adult
Vein, adult
Astrocyte (cerebellum)
Placental epithelial cells
Spinal cord, adult
Spleen, adult
Optic nerve
Breast, adult
Submaxillary gland, adult
Mesenchymal precursor (ovary, right)
Memory conventional T cells
Skeletal muscle (soleus muscle)
Thyroid, fetal
Fibroblast (pulmonary artery)
Astrocyte (cerebral cortex)
Locus coeruleus, adult
0.0
1.0
2.0
–log10(q-value)
Supplementary Figure 37: Connectivity enrichment for inflammatory bowel disease
Same as Fig. 3b, but showing all networks at FDR < 10%.
42
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 38
High-level networks
1
Mesenchymal stem & smooth muscle cells (7)
2
Astrocytes & pigment cells (27)
3
Nervous system & adult hindbrain (2)
4
Connective tissue & muscle cells (8)
5
Endothelial cells (6)
6
Mesenchymal, mixed (4)
7
Gastrointestinal system (20)
8
Connective tissue & integumental cells (9)
9
Mouth, throat & skeletal muscle tissue (22)
10
Immune organs (14)
11
Extraembryonic membrane (30)
b
0.0
1.0
2.0
–log10(q-value)
Immune-related
Vascular system
Includes cell types of vascular system
Other
Rank
Rank
a
Individual networks
Endothelial cells (microvascular)
1
Spleen, fetal
2
Endothelial cells (umbilical vein)
3
Endothelial cells (lymphatic)
4
Medulla oblongata, adult
5
Spinal cord, adult
6
Substantia nigra, adult
7
Skeletal muscle, fetal
8
Appendix, adult
9
Cerebellum, adult
10
CD4+ T cells
11
Fibroblast (cardiac)
12
Synoviocyte
13
CD14+ monocytes
14
Naive conventional T cells
15
Endothelial cells (aortic)
16
0.0 0.5 1.0 1.5
–log10(q-value)
Supplementary Figure 38: Connectivity enrichment for ulcerative colitis
Connectivity enrichment scores for ulcerative colitis across (a) the 32 high-level networks and (b) the corresponding
individual networks. Results are highly consistent with those of inflammatory bowel disease, as expected (see Fig. 3b
and main text). Namely, strong connectivity enrichment is observed for endothelial cells and immune organs. Endothelial cells drive pathophysiological processes including immune cell emigration and vascular growth (angiogenesis) that
are direct or indirect targets of many drugs used to treat ulcerative colitis.23 The spleen is the body’s largest reservoir
of monocytes and regulates inflammation through their storage and deployment.24
43
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 39
Connective tissue & muscle cells (8)
Y$"$$$`z
1
2
0.0
1.0
Individual networks
Rank
b
High-level networks
Rank
a
Fibroblast (choroid plexus)
1
0.0
1.0
2.0
–log10(q-value)
2.0
–log10(q-value)
Supplementary Figure 39: Connectivity enrichment for Crohn’s disease
Connectivity enrichment scores for Crohn’s disease across (a) the 32 high-level networks and (b) the corresponding
individual networks. Connectivity enrichment is much weaker than for inflammatory bowel disease (IBD) and ulcerative
colitis (UC) and does not seem to reveal relevant cell types.
(a) The high-level network for connective tissue & muscle cells (cluster 8) includes vascular cell types and also showed
signal for IBD and UC. However, other clusters that include vascular cell types did not show signal here.
(b) The vascular cells showing strong clustering of perturbed genes for IBD and UC were not recovered here. Only one
cell type shows significant connectivity enrichment (fibroblasts from the choroid plexus), which is likely a false positive.
44
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 40
Rheumatoid arthritis
Rank
a
High-level networks
Immune organs (14)
1
0.0
1.0
2.0
Rank
–log10(q-value)
Individual networks
1
Neutrophils
0.0
1.0
2.0
–log10(q-value)
Multiple sclerosis
Rank
b
Individual networks
1
2
!ry conventional T cells
3
CD8+ T cells
4
Neutrophils
5
6
Langerhans cells (migratory)
7
Spleen, adult
8
Spleen, fetal
9
Lymph node, adult
10
Dendritic cells (monocyte immature derived)
11
CD4+ T cells
12
Endothelial progenitor cells
13
CD14+CD16+ monocytes
14
Langerhans cells (immature)
15
Dendritic cells (plasmacytoid)
0.0
1.0
2.0
Hepatitis C resolvers
Rank
c
Immune-related
Other
Individual networks
1
CD14+CD16+ monocytes
0.0
1.0
2.0
Supplementary Figure 40: Connectivity enrichment for rheumatoid arthritis, multiple sclerosis, and
hepatitis C resolvers
Rheumatoid arthritis showed borderline connectivity enrichment in the high-level network of immune organs (a), while
multiple sclerosis and hepatitis C resolvers did not show signal in any high-level network. We nevertheless tested these
three immune-related traits for enrichment in all individual networks of the immune system.
(a) For rheumatoid arthritis, the strongest evidence for perturbed regulatory modules was found in neutrophils, which
have an activated phenotype in patients and contribute to pathogenesis through the release of cytotoxins, immunoregulatory molecules, and putative autoantigens.25
(b) Diverse immune cells showed connectivity enrichment for multiple sclerosis (MS), in particular T cells and monocytes. Indeed, the most consistent pathological finding in MS is the presence of T cell and monocyte/macrophage
infiltrates among areas of myelin breakdown and gliosis in the central nervous system.26
(c) Hepatitis C resolvers, i.e., patients who spontaneously cleared the virus, showed borderline connectivity enrichment
in monocytes, which have been proposed to spread infection over extended time because they get infected by the
hepatitis C virus with few cytopathic effects.27
45
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 41
b
High-level networks
Adult forebrain (3)
Endothelial cells (6)
Connective tissue & integumental cells (9)
Rank
1
2
3
0.0
1.0
2.0
–log10(q-value)
Individual networks
Endothelial cells (umbilical vein)
Medial temporal gyrus, adult
Locus coeruleus, adult
Neural stem cells
Occipital pole, adult
Adipocyte (subcutaneous)
Medial frontal gyrus, adult
1
2
3
Rank
a
4
5
Brain structures
Vascular system
Includes cell types of vascular system
Other
6
7
0.0
1.0
2.0
–log10(q-value)
Supplementary Figure 41: Connectivity enrichment for Alzheimer’s disease
Connectivity enrichment scores for Alzheimer’s disease across (a) the 32 high-level networks and (b) the corresponding
individual networks.
(a) The strongest clustering of perturbed genes was found in regulatory networks of adult forebrain and endothelial
cells, supporting the view that neurovascular dysregulation is implicated.28 In contrast, these results do not confirm
recent evidence for genetically driven dysregulation of immune cells in Alzheimer’s disease,29 however, the most relevant
cell type (microglia) was not present in our library.
(b) Besides from endothelial cells mentioned above, several brain structures as well as neural stem cells showed connectivity enrichment. The medial temporal gyrus ranks second: medial temporal lobe atrophy is predictive of AD in
subjects with mild cognitive impairment.30 The locus coeruleus (3rd rank) is also particularly vulnerable in the early
stages of AD and its degeneration contributes to disease pathology.31
46
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 42
a
1
Individual networks (without HLA region)
Neurons and fetal brain (1)
0.0
1.0
Rank
Rank
High-level networks
Neural stem cells
1
2.0
0.0
–log10(q-value)
1.0
2.0
–log10(q-value)
Individual networks (with HLA region)
Rank
b
CD4+ T cells
Neural stem cells
1
2
0.0
1.0
2.0
–log10(q-value)
Supplementary Figure 42: Connectivity enrichment for narcolepsy with and without HLA region
Narcolepsy is a rare sleep disorder caused by an autoimmune attack targeting hypocretin-producing neurons.32 We
generally excluded the HLA region from the connectivity enrichment analysis because of its strong LD and association
with many immune-related traits — this is a conservative measure often taken in network-based analyses of GWAS
data to make sure that results are not driven by the HLA region11 (see also Methods and Supplementary Fig. 29).
(a) We observed connectivity enrichment only in regulatory networks of neural stem cells when the HLA region was
excluded. This example shows that if the relevant cell type is not available (in this case the hypocretin neurons),
networks of related cell types may still show signal.
(b) Given that the HLA region shows particularly strong association with narcolepsy,32 we also run the connectivity
enrichment analysis with the HLA region included (but still excluding all gene pairs that are in LD, see Methods) for all
regulatory networks of the nervous and immune system. This analysis confirmed the observed clustering of perturbed
genes in neural stem cells, while also showing signal for T cells, consistent with the autoimmune basis of narcolepsy
and its association with the T-cell receptor alpha locus.32, 33
47
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 43
a
b
HDL
LDL
High-level networks
3
4
5
7
8
9
10
11
12
13
14
15
17
18
19
20
21
22
23
24
0.0
1.0
Heart (21)
Neurons & fetal brain (1)
Lung epithelium & lung cancer (32)
Y$"$$$`z
Nervous system & adult hindbrain (2)
Pineal gland & eye (25)
Astrocytes & pigment cells (27)
Lung (23)
Adenocarcinoma (17)
]</""$$‡"/`z
Mesenchymal, mixed (4)
1
2
3
4
5
Rank
2
Rank
High-level networks
Epithelial cells (29)
Neuroectodermal tumors & sarcoma (28)
]</""$$‡"/`z
Pineal gland & eye (25)
Glands & internal genitalia (24)
Heart (21)
Adenocarcinoma (17)
Myeloid_leukemia (15)
Leukocytes (11)
Connective tissue & integumental cells (9)
Mesenchymal & smooth muscle cells (7)
Adult forebrain (3)
Nervous system & adult hindbrain (2)
Neurons & fetal brain (1)
Immune organs (14)
Sarcoma (5)
Lung epithelium & lung cancer (32)
Extraembryonic membrane (30)
Mouth, throat & skeletal muscle tissue (22)
Connective tissue & muscle cells (8)
Mesenchymal, mixed (4)
Astrocytes & pigment cells (27)
Lymphoma (13)
Epithelial cells of kidney & uterus (31)
1
7
8
9
10
11
0.0
1.0
2.0
–log10(q-value)
2.0
–log10(q-value)
Supplementary Figure 43: Connectivity enrichment for blood lipids (HDL and LDL)
Connectivity enrichment scores for high-density lipoprotein (HDL) and low-density lipoprotein (LDL) across the 32
high-level networks. We found strong clustering of perturbed genes in diverse high-level networks: over 75% of all
networks show connectivity enrichment for HDL and over 30% for LDL. The same observations are made when testing
individual networks: a large number of diverse cell types show connectivity enrichment (results not shown). Accordingly,
the global regulatory network (obtained by taking the union of all cell type and tissue-specific networks, see Methods)
also shows strong connectivity enrichment (Supplementary Fig. 28). Indeed, all human cells synthesize cholesterol,
and thus share related pathways, because it is an essential component of cell membranes. Cells that regulate cholesterol
levels in the blood, e.g., hepatocytes, likely rely on some of these shared cholesterol pathways, explaining the broad
connectivity enrichment.
48
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 44
a
b
Total cholesterol
Triglycerides
High-level networks
3
7
8

10
11
13
3
7
8

10
11
13
–log10(q-value)
17
0.0
1.0
@<JY$<‡$<J"/`‚z
]<//"$</‡"/"`z
!/‡J$$`z
]</""$$‡"/`z
*"$J$"‡y`z
X$"‡r"$J"$"`
z
@<J`‚z
Hear`z
Imm</J"`
z
v<‡J<"$$$`z
'"/"`z
Adult \orebr"`‚z
Nerv<‡"<$Y=r"`z
Neurons & \etal br"`z
Mouth, throat & sZeletal m<$<`z
;<‡muscle cells (8)
My$$<Z"`z
@v/‡Ze`z
Y"$K^`
z
Y$"$$$\Zey & uterus (31)
@Y\$"J`z
Y$"$$$`z
!"/"`z
@Y"`‚z
"$//<ve organs (18)
Y$"$$$`z
Extraembry=rane (3)
Y$"$$$`z
Mesenchymal & smooth muscle cells (7)
1
"Z
*"$J$"‡y`z
@<J`‚z
Hear`z
Neurons & \etal br"`z
@<JY$<‡$<J"/`‚z
!/‡J$$`z
@v/‡Ze`z
Y"$K^`
z
'"/"`z
X$"‡r"$J"$"`
z
Y$"$$$`z
v<‡J<"$$$`z
]</""$$‡"/`z
!"/"`z
Mesenchymal & smooth muscle cells (7)
1
"Z
High-level networks
18

‚


0.0
1.0
–log10(q-value)
Supplementary Figure 44: Connectivity enrichment for blood lipids (TC and TG)
Connectivity enrichment scores for total cholesterol (TC) and triglycerides (TG) across the 32 high-level networks. We
found strong clustering of perturbed genes in diverse high-level networks: close to 50% of all networks show connectivity
enrichment for TC and over 90% for TG. The same observations are made when testing individual networks: a large
number of diverse cell types show connectivity enrichment (results not shown). This suggests that general pathways
are implicated; see legend of Supplementary Fig. 43 for details.
49
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 45
a
b
High-level networks
Mesenchymal, mixed (4)
Endothelial cells (6)
Gastrointestinal system (20)
Immune organs (14)
Mouth, throat & skeletal muscle tissue (22)
Pineal gland & eye (25)
1
2
Rank
3
4
5
6
0.0
1.0
Individual networks
Tonsil, adult
Tongue, adult
Colon, adult
Thymus, adult
Esophagus, adult
Small intestine, adult
Gall bladder, adult
Adipocyte (perirenal)
Hepatic mesenchymal tumor cell line
Sertoli cells
Trachea, adult
Colon, fetal
Tongue, fetal
Appendix, adult
Spleen, adult
Skeletal muscle, fetal
Mesenchymal stem cells
Renal glomerular endothelial cells
Hepatic sinusoidal endothelial cells
Spleen, fetal
Endothelial cells (microvascular)
Bone marrow stromal cell line
Basal cell carcinoma cell line
Eye, fetal
Penis, adult
Throat, fetal
Endothelial cells (artery)
Preadipocyte (perirenal)
Lymph node, adult
Thymus, fetal
Rectum, fetal
Duodenum, fetal
Endothelial cells (lymphatic)
Pancreas, adult
Neurofibroma cell line
Diaphragm, fetal
Trachea fetal
Tridermal teratoma cell line
Throat adult
Endothelial cells (thoracic)
Ewings sarcoma cell line
Stomach, fetal
Umbilical cord, fetal
Retina adult
Pineal gland, adult
Skin fetal
1
2
3
4
5
6
7
2.0
–log10(q-value)
8
9
10
Annotation (UBERON terms)
Immune system
Digestive system
Both terms
Other
11
12
13
14
15
16
17
18
19
Ontology enrichment (UBERON)
Rank
Term
1 embryo
20
Q-value
21
0.0008
22
2 immune system
0.013
3 digestive system
0.017
Rank
c
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
0.0
1.0
2.0
–log10(q-value)
Supplementary Figure 45: Connectivity enrichment for body mass index
(Caption next page.)
50
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 45: (Previous page.)
Connectivity enrichment scores for body mass index (BMI) across (a) the 32 high-level networks and (b) the corresponding individual networks.
(a) BMI shows connectivity enrichment in diverse high-level networks, including those of the gastrointestinal system
(rank 3) and immune organs (rank 4). Together with ulcerative colitis (Supplementary Fig. 38), BMI is the only
trait showing connectivity enrichment for the gastrointestinal system.
(b) When testing individual networks, the strongest clustering of BMI-associated genes was found in networks of the
digestive and immune system (red and purple indicate networks associated with the UBERON ontology terms digestive
system and immune system, respectively, based on the sample annotation provided by FANTOM5). For the digestive
system, we observed strong connectivity enrichment both for the lower gastrointestinal tract (small intestine, colon,
rectum, gall bladder, appendix) and the upper gastrointestinal tract (esophagus, stomach) and related networks (tongue
and tonsils, which group together with the esophagus in the hierarchical clustering of networks; Supplementary Fig.
20). For the immune system, we observed strong connectivity enrichment in different immune organs (tonsils, thymus,
spleen, lymph nodes).
(c) The overrepresentation of digestive and immune system networks was confirmed through ontology enrichment
analysis (besides the general term embryo (fetal samples), immune system and digestive system are the only significant
terms; Methods).
These results suggest that perturbed pathways specific to the digestive system impact BMI. The digestive system is
directly responsible for energy intake through (1) regulation of appetite by signaling to the brain and (2) digestion of
nutrients. The strong signal for both the intestinal tract and the immune system may further suggest a potential link
with the gut microbiome, which has recently been discovered to have a heritable component that impacts BMI.34, 35
Interestingly, previous studies by the GIANT consortium implicated mainly the central nervous system in obesity
susceptibility (i.e., the receiver-end of appetite signals), and not the digestive and immune systems.36 Reconciling
these results is an important avenue for future work. Possibly, the regulatory circuit-based analysis presented here and
pathway analyses employed by the GIANT consortium (e.g., DEPICT37 ) capture complementary aspects that may
together give a fuller picture of tissues and biological pathways that impact BMI.
51
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 46
Rank
High-level networks
Y$"$$$`z
1
0.0
1.0
2.0
–log10(q-value)
Supplementary Figure 46: Connectivity enrichment for β-cell function
Connectivity enrichment scores for β-cell function (HOMA-β) across the 32 high-level networks. Only the network
of cluster 16 (endo-epithelial cells) shows borderline enrichment. Given that none of the individual networks in this
cluster is related to β-cells, our library does not include β-cells or pancreatic islets, and no other glycemic traits showed
connectivity enrichment, this is likely a false positive.
52
Nature Methods: doi:10.1038/nmeth.3799
Supplementary Figure 47
Individual networks
Mesenchymal, mixed (4)
0.0
1.0
2.0
–log10(q-value)
Vascular
Tumors (induce vascularization)
Other
Rank
Rank
High-level networks
1
Smooth muscle cells (umbilical vein)
Spindle cell sarcoma cell line
Neurofibroma cell line
Osteoclastoma cell line
Basal cell carcinoma cell line
Fibroblast (pulmonary artery)
Preadipocyte (perirenal)
Bone marrow stromal cell line
Sertoli cells
Mesenchymal stem cells
Hepatic mesenchymal tumor cell line
Tridermal teratoma cell line
Ewings sarcoma cell line
1
2
3
4
5
6
7
8
9
10
11
12
13
0.0
1.0
2.0
–log10(q-value)
Supplementary Figure 47: Connectivity enrichment for advanced macular degeneration (neovascular)
Same as Fig. 3c, but showing all networks at FDR < 10%. Age-related macular degeneration (AMD) of neovascular
type shows the strongest connectivity enrichment in regulatory networks of vascular smooth muscle cells followed by
diverse tumors, which induce vascularization to achieve growth. As a control, we further confirmed that the dry form
of AMD, which does not involve neovascularization, does not show any connectivity enrichment in these networks.
53
Nature Methods: doi:10.1038/nmeth.3799
References
[1] Marstrand, T. T. & Storey, J. D. Identifying and mapping cell-type-specific chromatin programming of gene expression.
Proceedings of the National Academy of Sciences 111, E645–E654 (2014).
[2] Maurano, M. T. et al. Systematic Localization of Common Disease-Associated Variation in Regulatory DNA. Science 337,
1190–1195 (2012).
[3] Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012).
[4] Roy, S. et al. A predictive modeling approach for cell line-specific long-range regulatory interactions. Nucleic Acids Research
43, 8694–8712 (2015).
[5] Babu, M. M., Luscombe, N. M., Aravind, L., Gerstein, M. & Teichmann, S. A. Structure and evolution of transcriptional
regulatory networks. Current Opinion in Structural Biology 14, 283–291 (2004).
[6] Albert, R. Scale-free networks in cell biology. Journal of Cell Science 118, 4947–4957 (2005).
[7] Yip, K. Y. et al. Classification of human genomic regions based on experimentally determined binding sites of more than
100 transcription-related factors. Genome Biology 13, R48 (2012).
[8] Lamparter, D., Marbach, D., Rico, R., Kutalik, Z. & Bergmann, S. Fast and rigorous computation of gene and pathway
scores from SNP-based summary statistics. PLoS Comp Bio (in press).
[9] The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491,
56–65 (2012).
[10] Kondor, R. I. & Lafferty, J. D. Diffusion Kernels on Graphs and Other Discrete Input Spaces. In Proceedings of the
Nineteenth International Conference on Machine Learning, ICML ’02, 315–322 (Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA, 2002).
[11] Rossin, E. J. et al. Proteins encoded in genomic regions associated with immune-mediated disease physically interact and
suggest underlying biology. PLoS Genetics 7, e1001273 (2011).
[12] Erten, S., Bebek, G., Ewing, R. M. & Koyutürk, M. DADA: Degree-Aware Algorithms for Network-Based Disease Gene
Prioritization. BioData mining 4, 19 (2011).
[13] Lage, K. et al. A human phenome-interactome network of protein complexes implicated in genetic disorders. Nature
biotechnology 25, 309–316 (2007).
[14] Gerstein, M. B. et al. Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91–100
(2012).
[15] Neph, S. et al. Circuitry and Dynamics of Human Transcription Factor Regulatory Networks. Cell 150, 1274–1286 (2012).
[16] Boyer, L. A. et al. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122, 947–956 (2005).
[17] Perez-Costas, E., Melendez-Ferro, M. & Roberts, R. C. Basal ganglia pathology in schizophrenia: dopamine connections
and anomalies. Journal of neurochemistry 113, 287–302 (2010).
[18] Hazlett, E. A. et al. Three-Dimensional Analysis With MRI and PET of the Size, Shape, and Function of the Thalamus in
the Schizophrenia Spectrum. American Journal of Psychiatry (2014).
[19] Berridge, C. W. & Waterhouse, B. D. The locus coeruleus-noradrenergic system: modulation of behavioral state and
state-dependent cognitive processes. Brain Research. Brain Research Reviews 42, 33–84 (2003).
[20] Millan, M. J. et al. Cognitive dysfunction in psychiatric disorders: characteristics, causes and the quest for improved
therapy. Nature Reviews Drug Discovery 11, 141–168 (2012).
[21] Misra, M. & Klibanski, A. Endocrine consequences of anorexia nervosa. The Lancet Diabetes & Endocrinology 2, 581–592
(2014).
[22] Garrett, A. & Chang, K. The role of the amygdala in bipolar disorder development. Development and Psychopathology 20,
1285–1296 (2008).
[23] Cromer, W. E., Mathis, J. M., Granger, D. N., Chaitanya, G. V. & Alexander, J. S. Role of the endothelium in inflammatory
bowel diseases. World Journal of Gastroenterology 17, 578–593 (2011).
[24] Swirski, F. K. et al. Identification of Splenic Reservoir Monocytes and Their Deployment to Inflammatory Sites. Science
325, 612–616 (2009).
[25] Wright, H. L., Moots, R. J. & Edwards, S. W. The multifactorial role of neutrophils in rheumatoid arthritis. Nature Reviews
Rheumatology 10, 593–601 (2014).
[26] Filion, L. G., Graziani-Bowering, G., Matusevicius, D. & Freedman, M. S. Monocyte-derived cytokines in multiple sclerosis.
Clinical and Experimental Immunology 131, 324–334 (2003).
54
Nature Methods: doi:10.1038/nmeth.3799
[27] Revie, D. & Salahuddin, S. Z. Role of macrophages and monocytes in hepatitis C virus infections. World Journal of
Gastroenterology : WJG 20, 2777–2784 (2014).
[28] Iadecola, C. Neurovascular regulation in the normal brain and in Alzheimer’s disease. Nature Reviews. Neuroscience 5,
347–360 (2004).
[29] Gjoneska, E. et al. Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer/’s disease. Nature
518, 365–369 (2015).
[30] Visser, P. J., Verhey, F. R. J., Hofman, P. a. M., Scheltens, P. & Jolles, J. Medial temporal lobe atrophy predicts
Alzheimer’s disease in patients with minor cognitive impairment. Journal of Neurology, Neurosurgery, and Psychiatry 72,
491–497 (2002).
[31] Heneka, M. T. et al. Locus ceruleus controls Alzheimer’s disease pathology by modulating microglial functions through
norepinephrine. Proceedings of the National Academy of Sciences of the United States of America 107, 6058–6063 (2010).
[32] Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy.
Nature Genetics 42, 786–789 (2010).
[33] Hallmayer, J. et al. Narcolepsy is strongly associated with the T-cell receptor alpha locus. Nature Genetics 41, 708–711
(2009).
[34] Goodrich, J. K. et al. Human genetics shape the gut microbiome. Cell 159, 789–799 (2014).
[35] Sommer, F. & Bäckhed, F. The gut microbiota — masters of host development and physiology. Nature Reviews Microbiology
11, 227–238 (2013).
[36] Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
[37] Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nature
Communications 6, 5890 (2015).
55
Nature Methods: doi:10.1038/nmeth.3799
Related documents