Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
— Supplementary Figures — Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases Daniel Marbach1,2,∗ , David Lamparter1,2 , Gerald Quon3,4 , Manolis Kellis3,4 , Zoltán Kutalik1,2,5 , Sven Bergmann1,2 1 2 3 4 5 Department of Medical Genetics, University of Lausanne, Switzerland Swiss Institute of Bioinformatics, Lausanne, Switzerland Broad Institute, MIT, Cambridge, MA, USA Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA Institute of Social and Preventive Medicine, University Hospital of Lausanne, Switzerland * Correspondence should be addressed to D.M. ([email protected]) Website: http://regulatorycircuits.org 1 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figures 1 Linking TFs, promoters, enhancers, and genes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2 Enhancer–promoter activity correlations for chromosome 22 . . . . . . . . . . . . . . . . . . . . . . . . . 5 3 Q-Q plot for enhancer–promoter correlation coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 4 Dependence of networks on individual samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 5 Dependence of networks on individual samples for primary cells, tissues, and cell lines . . . . . . . . . . 8 6 In-degree distributions of regulatory circuits at each layer . . . . . . . . . . . . . . . . . . . . . . . . . . 9 7 Out-degree distributions of regulatory circuits at each layer . . . . . . . . . . . . . . . . . . . . . . . . . 10 8 TF–promoter out-degree distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 9 ChIP-seq validation of TF–enhancer and TF–promoter edges . . . . . . . . . . . . . . . . . . . . . . . . 12 10 ChIP-seq validation results for each of 5 cell lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 11 eQTL validation of enhancer–gene edges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 12 ChIP-seq and eQTL validation: AUPR vs. F-score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 13 Correlation between regulatory edges and target gene expression . . . . . . . . . . . . . . . . . . . . . . 16 14 Hierarchical clustering of regulatory networks across cell types and tissues . . . . . . . . . . . . . . . . . 17 15 Pairwise similarity of cell type and tissue-specific regulatory networks . . . . . . . . . . . . . . . . . . . 18 16 Dendrogram and annotation of nervous system clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 17 Dendrogram of mesenchyme clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 18 Annotation of mesenchyme clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 19 Dendrogram and annotation of immune system clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 20 Dendrogram of trunk organs clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 21 Annotation of trunk organs clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 22 Dendrogram and annotation of neuron-associated clusters . . . . . . . . . . . . . . . . . . . . . . . . . . 25 23 Dendrogram and annotation of epithelium clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 24 Developmental origin of regulatory networks pertaining to different high-level clusters . . . . . . . . . . 27 25 Comparing number of TFs per gene across lineages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 26 Comparing network properties across lineages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 27 Overview of network connectivity enrichment analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 28 Illustrative example for network connectivity enrichment analysis . . . . . . . . . . . . . . . . . . . . . . 32 29 Correlation between neighboring genes due to linkage disequilibrium . . . . . . . . . . . . . . . . . . . . 33 30 Validation of connectivity enrichment scores using phenotype-label permutations . . . . . . . . . . . . . 34 31 Connectivity enrichment scores for different types of networks . . . . . . . . . . . . . . . . . . . . . . . . 35 32 Overview of network connectivity enrichment results for all traits . . . . . . . . . . . . . . . . . . . . . . 37 33 Connectivity enrichment for the psychiatric, cross-disorder study . . . . . . . . . . . . . . . . . . . . . . 38 34 Connectivity enrichment for schizophrenia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 35 Connectivity enrichment for anorexia nervosa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 36 Connectivity enrichment for bipolar disorder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 37 Connectivity enrichment for inflammatory bowel disease . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 38 Connectivity enrichment for ulcerative colitis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 39 Connectivity enrichment for Crohn’s disease . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 40 Connectivity enrichment for rheumatoid arthritis, multiple sclerosis, and hepatitis C resolvers . . . . . . 45 41 Connectivity enrichment for Alzheimer’s disease . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 42 Connectivity enrichment for narcolepsy with and without HLA region . . . . . . . . . . . . . . . . . . . 47 43 Connectivity enrichment for blood lipids (HDL and LDL) . . . . . . . . . . . . . . . . . . . . . . . . . . 48 2 Nature Methods: doi:10.1038/nmeth.3799 44 45 46 Connectivity enrichment for blood lipids (TC and TG) . . . . . . . . . . . . . . . . . . . . . . . . . . . . Connectivity enrichment for body mass index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Connectivity enrichment for β-cell function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 50 52 47 Connectivity enrichment for advanced macular degeneration (neovascular) . . . . . . . . . . . . . . . . . 53 3 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 1 b Promoter TF motif Distal enhancer TF motif – promoter distance TF motif >10-fold enrichment (motif confidence 0.9) Fold enrichment TF motif – enhancer distance 50 Motif confidence 1.0 0.7 20 Motifs within enhancers: >10-fold enrichment 10.0 Fold enrichment a 0.1 Motif confidence 7.5 1.0 0.7 5.0 0.1 2.5 0 0 500 1000 0 c Promoter d cis-eQTL Gene >10-fold enrichment 1000 Empirical distribution Smooth fit (weight function) 1.0 0.75 150 100 0.5 Weight 200 Density Fold enrichment 750 cis-eQTL – gene distance 500 250 500 Target gene Promoter – gene distance 250 Distance from enhancer (bp) Distance from promoter (bp) 0.25 500kb 50 0.0 0 0 1kb 0 500 1000 10kb 100kb 1mb Distance from gene (log scale) Distance from TSS (bp) Supplementary Figure 1: Linking TFs, promoters, enhancers, and genes (a) Enrichment of TF motif instances near CAGE-defined promoters. High-confidence motifs are >10-fold enriched in a window of -400–50bp around promoters (negative: upstream; positive: downstream). (b) Enrichment of TF motif instances near distal, CAGE-defined enhancers. High-confidence motifs are >10-fold enriched inside (distance = 0bp) enhancers. In contrast to promoters, motif enrichment only occurs inside — not around — the CAGE-defined enhancers, indicating that the signature of bidirectional transcription used for their detection comprises the entire enhancer region. (c) Enrichment of CAGE-defined promoters near the annotated transcription start site (TSS) of genes. Promoters are >10-fold enriched in a window of -250–500bp around the TSS of genes. (a-c) Regions where >10-fold enrichment was found (grey areas) are used to link TF motifs to promoters, TF motifs to enhancers, and promoters to genes. (d) Distance distribution of cis-eQTLs from their target genes (note: logarithmic scale on x-axis, orientation [up/downstream] is not distinguished). The smooth fit of the empirical distribution is used as a weighting function to probabilistically link enhancers to genes (second y-axis). See also Methods for a more detailed discussion of these analyses. 4 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 2 Enhancer-promoter correlation 0.5 0.0 NA (Enhancer-enhancer and promoter-promoter pairs) Enhancers Promoters Chromosome 22 Supplementary Figure 2: Enhancer–promoter activity correlations for chromosome 22 In order to investigate whether enhancers globally (i.e., across different cell types and tissues) associate with specific target promoters, we computed Pearson’s correlation coefficient for all enhancer–promoter pairs on a given chromosome. The correlation matrix was visualized as a heatmap, where promoters and enhancers were ordered according to their position on the chromosome. Here, chromosome 22 is shown as an illustrative example, the same observations were made for other chromosomes. We found that most enhancers do not globally correlate more strongly with specific, nearby promoters (blue cells are scattered through the matrix). Instead, most enhancers positively correlate with many promoters throughout the chromosome, many of which are too far away to be likely regulatory targets of the enhancer. The reason is that many promoters are themselves co-expressed. Moreover, cell type-specific interactions are often missed when using correlation or correlation-like statistics1 (this conclusion is also supported by a genomewide analysis in 3). It is possible that more sophisticated approaches (e.g., based on an initial clustering of the input data,2, 3 statistics designed to capture cell type-specific interactions,1 or machine learning approaches integrating additional data4 ) would reveal significant relationships. However, the performance of these different methods for the data at hand is not yet well understood and we thus opted for a parsimonious approach to link enhancers to potential target promoters (see Methods). 5 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 3 Observed quantiles (–log10) Enhancer–promoter correlation (Q–Q plot) Expected quantiles (–log10) Supplementary Figure 3: Q-Q plot for enhancer–promoter correlation coefficients We sought to test whether enhancers show increased correlation with their target promoters. To this end, we computed Pearson’s correlation coefficient for all enhancer–promoter pairs. We divided the correlation coefficients in two sets: one for enhancer–promoter pairs that are on the same chromosome (intra-chromosomal pairs), and one for pairs where the enhancer and promoter are on different chromosomes (inter-chromosomal pairs). Since the target promoters of an enhancer are on the same chromosome, the correlation coefficients corresponding to true enhancer–target promoter interactions are all in the intra-chromosomal set, while the inter-chromosomal correlation coefficients correspond to the expected null distribution for non-interacting enhancer–promoter pairs. Thus, if enhancers showed increased correlation with their target promoters, the first distribution would be shifted towards greater positive correlation. The Q-Q plot shows that this is not the case: the two distributions are identical. We conclude that enhancers do not show increased correlation with specific target promoters at a global level, i.e., across cell types and tissues (note that the FANTOM5 data consists of samples from different cell types and tissues — naturally, the situation would be different for a dataset consisting of many samples from the same cell type or tissue). See Supplementary Fig. 2 for a more detailed discussion of these observations. (Note, for computational reasons the Q-Q plot only includes enhancer–promoter pairs for chromosomes 13–22; results are identical for the remaining chromosomes). 6 Nature Methods: doi:10.1038/nmeth.3799 a Number of networks Supplementary Figure 4 150 100 50 0 1 Stable edges (leave one sample out) b 2 3 4 5 6 7 8 9 15 33 Number of samples per network 100% 95% 90% 85% 80% 75% 1 2 3 4 5 6 7 8 9 15 33 Number of samples per network Supplementary Figure 4: Dependence of networks on individual samples The compendium of 394 cell type and tissue-specific regulatory networks is based on 808 cell and tissue samples from FANTOM5: first a network was constructed for each of the 808 samples, and then networks of biological replicates and other closely related samples were merged (Supplementary Table 1, Methods). (a) The histogram shows the number of samples merged for each of the 394 final networks. Most networks consist of few samples: 361 networks (92%) merge only one, two or three samples. 20 networks consist of 4 samples and only 13 networks have more than 4 samples. (b) Dependence of networks on individual samples. For each of the networks consisting of more than one sample, we assessed the dependence on each of its samples. To this end, we removed one sample at a time and evaluated the percentage of edges that remained in the network (referred to as stable edges). Boxplots show the percentage of stable network edges when removing each of the samples (Supplementary Table 1 includes the result for each sample). These results confirm that the merged samples of a network are generally very similar and redundant to each other: e.g., for networks consisting of three samples, on average 94% of edges remain the same when leaving one of the samples out. 7 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 5 Number of networks Primary cells Tissues Cell lines 75 50 25 0 Stable edges (leave one sample out) 100% 95% 90% 85% 80% 75% 1 2 3 4 5 6 7 8 9 33 Number of samples per network 1 2 3 4 5 6 7 8 9 Number of samples per network 1 2 3 4 5 6 7 8 9 15 Number of samples per network Supplementary Figure 5: Dependence of networks on individual samples for primary cells, tissues, and cell lines Top row: histograms showing the number of samples merged per network; bottom row: dependence of networks on individual samples. For explanation, see legend of Supplementary Fig. 4. Here, the same results are shown broken down by primary cells, tissues, and cell lines. Stability of edges is slightly lower for cell lines than for primary cells and tissues. This is because merged primary cell or tissue samples are mostly biological replicates (same cell type or tissue from different donors), while merged cell lines are merely of the same cancer subtype (tumor samples from different patients are expected to have higher heterogeneity than healthy tissue samples). 8 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 6 Distribution 1.00 Cumulative distribution 3 Count (log10) Median Log scale Cumulative distribution #TFs per enhancer 2 1 0.75 0.50 0.25 0.00 0 1 50 100 150 50 1 #TFs per enhancer 100 150 #TFs per enhancer #TFs per promoter Cumulative distribution 1.00 Count (log10) 3 2 1 0 0.75 0.50 0.25 0.00 1 50 100 150 1 #TFs per promoter 50 100 150 #TFs per promoter #Enhancers per gene isoform Cumulative distribution 1.00 Count (log10) 3 2 1 0 1 10 20 30 0.75 0.50 0.25 0.00 40 1 #Enhancers per gene isoform 10 20 30 40 #Enhancers per gene isoform #Promoters per gene isoform Cumulative distribution Count (log10) 1.00 3 2 1 0 0.75 0.50 0.25 0.00 1 5 10 15 20 #Promoters per gene isoform 1 5 10 15 20 #Promoters per gene isoform Supplementary Figure 6: In-degree distributions of regulatory circuits at each layer In-degree distributions (left side) show the number of nodes (y-axis) for each in-degree (number of incoming links, x-axis). Distributions are plotted with a logarithmic scale on the y-axis, revealing approximately exponential decay (linear relationship in the semi-log plots). Exponential in-degree distributions, which do not have extreme hubs as in scale-free networks, have been previously observed for gene networks.5, 6 Presumably, this is because it would be difficult to sensibly integrate a huge number of TF inputs at a single enhancer or promoter (a different phenomenon are hot regions where huge numbers of TFs bind non-specifically, but likely without direct regulatory effect7 ). Extreme hubs are also not expected for the other types of links (enhancer–gene and promoter–gene) because they are confined to a certain region on a chromosome, i.e., the number of potential interaction partners is physically limited. 9 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 7 Distribution Cumulative distribution #Enhancers per TF 1.00 Cumulative distribution 2 Log scale Count (log10) Median 1 0 0.75 0.50 0.25 0.00 1 1 1000 2000 3000 4000 1000 2000 3000 4000 #Enhancers per TF #Enhancers per TF #Promoters per TF 1.00 Cumulative distribution Count (log10) 2 1 Heavy tail 0 0 10000 0.75 0.50 0.25 0.00 20000 1 #Promoters per TF 10000 20000 #Promoters per TF #Gene isoforms per enhancer Cumulative distribution 1.00 Count (log10) 3 2 1 0.75 0.50 0.25 0.00 0 1 25 50 75 1 25 50 75 #Gene isoforms per enhancer #Gene isoforms per enhancer #Gene isoforms per promoter Cumulative distribution Count (log10) 1.00 3 2 1 0.75 0.50 0.25 0.00 0 1 5 10 15 20 25 #Gene isoforms per promoter 1 5 10 15 20 25 #Gene isoforms per promoter Supplementary Figure 7: Out-degree distributions of regulatory circuits at each layer Out-degree distributions (left side) show the number of nodes (y-axis) for each out-degree bin (number of outgoing links, counts were obtained using equally spaced bins on x-axis). Similar to the in-degree distributions (previous figure), the out-degree distributions show roughly exponential decay with one exception: the number of promoters per TF has a heavy tail corresponding to extreme hubs that are characteristic of scale-free networks (cf. Supplementary Fig. 8). In other words, regulatory networks are scale-free only at the level of promoters, but not at the level of enhancers, suggesting that the widely cited scale-free property of gene networks6 is due to the promoter-centric view of previous studies. Concerning the enhancer–gene and promoter–gene links, we do not expect extreme hubs because of physical constraints, as discussed in Supplementary Fig. 6. 10 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 8 TF–promoter outdegree distribution 2 Count (log10) Log scale Median 1 0 3 5 #Promoters per TF (log10) Log scale Supplementary Figure 8: TF–promoter out-degree distribution Same plot as shown in the second row of Supplementary Fig. 7, but using a log scale for both axes, confirming that the out-degree distribution of TFs with respect to promoters follows roughly a power-law in the right tail (linear relationship in the log-log plot). 11 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 9 a Inference methods Expression correlation TF motifs Expression correlation & TF motifs Enhancer expression & TF motifs* Random Reference ChIP-seq replicates TF – enhancer edges 1 2 3 4 i ii 0 *Retained method b Expression correlation TF motifs Expression correlation & TF motifs Promoter expression & TF motifs* Random ChIP-seq replicates 0.25 0.5 AUPR 0.75 1 TF – promoter edges 1 2 3 4 i ii 0 0.25 0.5 AUPR 0.75 1 Supplementary Figure 9: ChIP-seq validation of TF–enhancer and TF–promoter edges Assessment of different approaches to infer edges between TFs and regulatory elements (enhancers and promoters). See legend of Fig. 2a for details. While Fig. 2a shows the overall performance across enhancers and promoters, here we show results individually for (a) enhancers and (b) promoters. For both the retained method to construct regulatory circuits (*) and the ChIP-seq replicates, there is no significant difference in performance between enhancers and promoters (p > 0.05, two-sided Wilcoxon rank-sum test). 12 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 10 GM12878 48 samples TF–enhancer edges Inference methods Expression correlation 1 1 TF motifs 2 2 Expression correlation & TF motifs 3 3 Enhancer expression & TF motifs* 4 4 Random i i ChIP-seq replicates ii ii Reference 0 K562 42 samples *Retained method HepG2 40 samples 0.5 AUPR 0.75 1 1 1 TF motifs 2 2 Expression correlation & TF motifs 3 3 Enhancer expression & TF motifs* 4 4 Random i i ChIP-seq replicates ii ii 0.25 0.5 AUPR 0.75 1 Expression correlation 1 1 TF motifs 2 2 Expression correlation & TF motifs 3 3 Enhancer expression & TF motifs* 4 4 Random i i ChIP-seq replicates ii ii 0 25 samples 0.25 Expression correlation 0 0.25 0.5 AUPR 0.75 1 Expression correlation 1 1 TF motifs 2 2 Expression correlation & TF motifs 3 3 Enhancer expression & TF motifs* 4 4 Random i i ChIP-seq replicates ii ii 0 HUVEC 6 samples TF–promoter edges 0.25 0.5 AUPR 0.75 1 Expression correlation 1 1 TF motifs 2 2 Expression correlation & TF motifs 3 3 Enhancer expression & TF motifs* 4 4 Random i i ChIP-seq replicates ii ii 0 0.25 0.5 AUPR 0.75 1 0 0.25 0.5 AUPR 0.75 1 0 0.25 0.5 AUPR 0.75 1 0 0.25 0.5 AUPR 0.75 1 0 0.25 0.5 AUPR 0.75 1 0 0.25 0.5 AUPR 0.75 1 Supplementary Figure 10: ChIP-seq validation results for each of 5 cell lines Assessment of different approaches to infer edges between TFs and regulatory elements (enhancers and promoters) for each of 5 ENCODE cell lines. See legend of Fig. 2a for details. While Fig. 2a shows the overall performance across enhancers, promoters and cell lines, here we show results individually for TF–enhancer and TF–promoter edges (columns) and the different cell lines (rows). For each cell line, the number of samples (ChIP-seq experiments) is indicated. Note that there are only 6 ChIP-seq experiemnts for HUVEC (human umbilical vein endothelial cells). The retained method to construct regulatory circuits (*) outperforms alternative approaches in each of the cell lines, but there is substantial variation in performance both for the retained method and ChIP-seq replicates across cell lines. Notably, the retained method performs as well as the ChIP-seq replicates in K562. There is also substantial variation in performance for different TFs in the same cell line, ranging from poor (AUPR comparable to random predictions) to excellent prediction accuracy (AUPR comparable to ChIP-seq replicates), which is expected due to varying quality of TF motifs and/or ChIP-seq antibodies. Comparing the performance for TF–enhancer vs. TF–promoter edges, we observe the same trends as in Supplementary Fig. 9. 13 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 11 a Matching tissues (9 samples) Inference methods Maximum expression correlation 1 Minimum genomic distance 2 Tissue-specific activity & distance* 3 Reference Random i *Retained method b 0 0.2 AUPR 0.4 Closely related tissues (57 samples) Maximum expression correlation 1 Minimum genomic distance 2 Tissue-specific activity & distance* 3 Random i 0 0.2 AUPR 0.4 Supplementary Figure 11: eQTL validation of enhancer–gene edges Assessment of different approaches to link enhancers to target genes using eQTL data from GTEx. See legend of Fig. 2b for details. We mapped the 13 tissues from GTEx to corresponding tissue samples from FANTOM (Supplementary Table 3). We distinguished between exact matches and closely related tissues. For example, one of the GTEx tissues is the aorta. There is a matching tissue sample in FANTOM, but also several aortic cell types including aortic endothelial cells, fibroblasts and smooth muscle cells, which we label as closely related tissues. (a) Results for 9 matching GTEx–FANTOM tissue pairs (this panel is identical to Fig. 2b). (b) Results for 57 closely related GTEx–FANTOM tissue pairs. The results for matching tissues are confirmed in the broader set of closely related tissues. As expected, for our tissue-specific circuits (the retained method) performance is slightly better for matching tissues than for closely related tissues. For the other methods (random, maximum correlation, and minimum distance), there is no difference because these methods do not infer links in a tissue-specific manner. 14 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 12 a TF – enhancer/promoter edges Inference methods Expression correlation 1 1 TF motifs 2 2 Expression correlation & TF motifs 3 3 Enhancer expression & TF motifs* 4 4 Reference Random i i ChIP-seq replicates ii ii 0 *Retained method 0.25 0.5 b Inference methods Maximum expression correlation Minimum genomic distance Tissue-specific activity & distance* Reference Random *Retained method 0.75 1 0 0.25 AUPR 0.5 0.75 1 F-score Enhancer – gene edges 1 1 2 2 3 3 i i 0 0.2 0.4 AUPR 0 0.1 0.2 0.3 F-score Supplementary Figure 12: ChIP-seq and eQTL validation: AUPR vs. F-score Comparison of validation results using two alternative metrics to assess the accuracy of edge predictions: the area under the precision-recall curve (AUPR, left) and the F-score (right). The plots showing the AUPR are identical to Fig. 2 of the main text, they are reproduced here for comparison with the F-score. Refer to the legend of Fig. 2 for explanation. The choice of the AUPR to report validation results is motivated in Methods, but similar results were obtained with other performance metrics. Here we show in addition the F-score of edge predictions: the F-score is a popular metric used in the network inference literature, it is defined as the harmonic mean of precision and recall at a given cutoff. The plots show the F-score for the top 50% of the predicted edges, similar results were obtained at different cutoffs. We conclude that our results are robust and not specific to the AUPR as performance metric. 15 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 13 a b Background Correlation between strength of TF inputs and expression of genes ● ● Density Genes present in network ● ● ● ● ● ● Correlation Genes ranked by expression Supplementary Figure 13: Correlation between regulatory edges and target gene expression Gene expression data (RNA-seq) was obtained from the Roadmap Epigenomics project for 40 tissues that are also present in our network compendium (Supplementary Table 4, Methods). (a) Overlap of weakly, moderately, and highly expressed genes with regulatory networks across the 40 tissues. For each tissue, genes were ranked by their expression level (RPKM). Boxplots show the percentage of genes that are part of the regulatory network for the bottom 10%, mid 10%, and top 10% most highly expressed genes. On average, 98% of the highly and 90% of the moderately expressed genes are present in our networks, i.e., they have regulatory inputs (incoming edges) in the given tissue. As expected, most of the weakly expressed genes have no active regulatory inputs in the given tissue, i.e., they are not present in the corresponding network. (b) Correlation between regulatory edge strength and target gene expression. For each gene, the correlation between its TF input (sum of weights of incoming edges) and expression was evaluated across tissues. The plot shows the distribution of correlation coefficients for all genes in green. The background/expected distribution is shown in grey (correlation coefficients for TF inputs of gene i and expression levels of gene j, where i = j). There is strong positive correlation between the strength of TF inputs and the expression of the same gene. The plot shows results using Spearman’s correlation coefficient (median = 0.53), similar results were obtained using Pearson correlation (median = 0.49). Correlation is stronger when considering only Roadmap Epigenomic tissues that were considered to be a good match for a network in our compendium (results shown here) than when including also the secondary matches / related tissues given in Supplementary Table 4 (median Spearman’s correlation = 0.42). 16 Nature Methods: doi:10.1038/nmeth.3799 3 Adult forebrain 4 Mesenchymal, mixed 5 Sarcoma 6 Endothelial cells 7 Mesenchymal stem & smooth muscle cells 8 Connective tissue & muscle cells 9 Connective tissue & integumental cells 10 11 12 Trunk organs 13 14 22 23 Myeloid leukemia Endo-epithelial cells Adenocarcinoma Liver & kidney Male reproductive organs Gastrointestinal system Heart Mouth, throat & skeletal muscle tissue Lung 24 Glands & internal genitalia 15 16 17 18 19 20 21 Epithelium c Lymphocytes Myeloid leukocytes Lymphocytes of B lineage Lymphoma Immune organs Pineal gland & eye Neuron-associated cells & cancer 27 Astrocytes & pigment cells 28 Neuroectodermal tumors & sarcoma 25 26 30 31 Epithelial cells Extraembryonic membrane Epithelial cells of kidney & uterus 32 Lung epithelium & lung cancer 29 Cluster 10 Neurons & fetal brain Nervous system & adult hindbrain Cluster 11 1 2 Individual networks CD4+ T cells Natural killer cells CD8+ T cells !"ve regulatory T cells !ry conventional T cells !ry regulatory T cells !"ve conventional T cells CD14+ monocytes Mast cell Langerhans cells, immature Langerhans cells, migratory CD19+ B cells Dendritic cells, plasmacytoid CD14+CD16+ monocytes Neutrophils Whole blood ribopure Peripheral blood mononuclear cells Basophils Lymphocytes Myeloid leukocytes Other c Cluster 24 b Neuronassociated b 32 clusters of networks Mesenchyme Immune system ~400 cell-type-specific networks a Nervous system Supplementary Figure 14 Individual networks Thyroid, fetal Thyroid, adult Parotid gland, adult Salivary gland, adult Submaxillary gland, adult Seminal vesicle, adult Ductus deferens, adult Uterus, fetal Ovary, adult Cervix, adult Uterus, adult Breast, adult Dura mater, adult Vein, adult Adipose tissue, adult Vagina, adult Smooth muscle, adult Prostate, adult Bladder, adult Glands Male internal genitalia Female internal genitalia Supplementary Figure 14: Hierarchical clustering of regulatory networks across cell types and tissues (a) Clustering of the 394 cell type and tissue-specific regulatory networks based on edge similarity (see also Supplementary Fig. 15). Clusters are highly consistent with developmental and functional relationships between cell types, tissues and organs. The cutoff indicated by the red dashed line leads to 32 coherent clusters, annotated to the right based on enriched terms from cell, anatomical and disease ontologies. For each of these clusters, a high-level network was derived by merging the corresponding individual networks (Methods). (b-c) Detailed view of representative clusters of networks (red boxes in Panel a; remaining clusters are shown in Supplementary Figs. 16–23). (b) Cluster 10 consists of regulatory networks from lymphocytes including T cells and natural killer cells, while Cluster 11 mostly contains networks from myeloid leukocytes. Cf. Supplementary Fig. 19. (c) Cluster 24 joins regulatory networks from glands and internal genitalia (seminal vesicle, ovary, breast, and prostate belonging to both categories). Functionally related networks are consistently grouped also at the sub-cluster level (e.g., endocrine glands, salivary glands, and male and female genitalia, respectively). Cf. Supplementary Figs. 20–21 17 Nature Methods: doi:10.1038/nmeth.3799 Frequency Supplementary Figure 15 0 394 regulatory networks 32 clusters 0.5 Edge similarity 1 (i) Nervous system (ii) Mesenchyme (iii) Immune system (iv) Trunk organs (v) Neuron-associated (vi) Epithelium Supplementary Figure 15: Pairwise similarity of cell type and tissue-specific regulatory networks Hierarchical clustering of regulatory networks was performed based on edge similarity (Methods). The cutoff indicated by the red, dashed line leads to 32 coherent clusters, which are annotated in Supplementary Fig. 14 and shown in detail below (Supplementary Figs. 16–23). The six high-level groups (i–vi) separate regulatory networks of major types of tissues and anatomical systems, which also share distinct developmental origins (Supplementary Fig. 24). Regulatory networks of the nervous system form a particularly tight group and are the most dissimilar to networks of other groups. 18 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 16 a Regulatory networks Neural stem cells Neurons brain, fetal 1. Neurons & fetal brain parietal lobe, fetal temporal lobe, fetal occipital lobe, fetal Assigned cluster names optic nerve spinal cord, fetal cerebellum, adult corpus callosum, adult substantia nigra, adult 2. Nervous system & adult hindbrain pons, adult spinal cord, adult medulla oblongata, adult locus coeruleus, adult caudate nucleus, adult putamen, adult diencephalon, adult thalamus, adult globus pallidus, adult amygdala, adult hippocampus, adult parietal lobe, adult medial frontal gyrus, adult occipital cortex, adult medial temporal gyrus 3. Adult forebrain cerebral meninges, adult brain, adult medial temporal gyrus, adult olfactory region, adult occipital lobe, adult nucleus accumbens, adult paracentral gyrus, adult insula, adult frontal lobe, adult occipital pole, adult temporal lobe, adult postcentral gyrus, adult b 3. Adult forebrain 1. Neurons, fetal brain Q-value 3.57E-02 9.66E-02 ID UBERON:0000922 UBERON:0011216 Term embryo organ system subdivision 2. Nervous system, hindbrain Q-value 1.95E-06 3.14E-06 3.59E-05 5.29E-05 1.07E-04 1.11E-04 1.11E-04 2.70E-04 1.03E-03 1.86E-03 1.86E-03 3.57E-03 3.57E-03 6.00E-03 1.81E-02 3.13E-02 3.13E-02 3.13E-02 3.13E-02 3.13E-02 3.13E-02 6.45E-02 7.64E-02 2.96E-03 ID UBERON:0001016 UBERON:0004732 UBERON:0001017 UBERON:0010314 UBERON:0011216 UBERON:0002028 UBERON:0004733 UBERON:0004121 UBERON:0002298 UBERON:0000073 UBERON:0002616 UBERON:0001895 UBERON:0002680 UBERON:0000955 UBERON:0000477 UBERON:0000122 UBERON:0000988 UBERON:0001137 UBERON:0002240 UBERON:0005162 UBERON:0005174 UBERON:0000063 UBERON:0000481 UBERON:0007023 Term nervous system segmental subdivision of nervous system central nervous system structure with developmental contribution from neural crest organ system subdivision hindbrain segmental subdivision of hindbrain ectoderm-derived structure brainstem regional part of nervous system regional part of brain metencephalon regional part of metencephalon brain anatomical cluster neuron projection bundle pons dorsum spinal cord multi cell component structure back organ organ segment multi-tissue structure adult organism Q-value 1.68E-20 1.68E-20 1.87E-19 9.13E-19 9.90E-19 9.90E-19 1.44E-18 2.79E-18 7.92E-18 1.24E-17 3.90E-17 3.90E-17 1.87E-14 6.79E-13 2.50E-12 2.50E-12 1.04E-11 1.52E-11 2.32E-11 5.60E-11 1.08E-07 5.08E-05 8.76E-05 3.58E-04 3.58E-04 3.58E-04 3.58E-04 3.58E-04 4.49E-03 4.49E-03 1.11E-02 1.11E-02 1.48E-02 1.48E-02 1.48E-02 1.48E-02 4.17E-02 3.06E-14 ID UBERON:0001890 UBERON:0002780 UBERON:0001869 UBERON:0001017 UBERON:0000073 UBERON:0002616 UBERON:0001893 UBERON:0000955 UBERON:0001016 UBERON:0002791 UBERON:0002020 UBERON:0003528 UBERON:0011216 UBERON:0002619 UBERON:0000203 UBERON:0000956 UBERON:0010314 UBERON:0001950 UBERON:0004121 UBERON:0000481 UBERON:0000477 UBERON:0000064 UBERON:0000200 UBERON:0000454 UBERON:0002420 UBERON:0007245 UBERON:0010009 UBERON:0010011 UBERON:0001871 UBERON:0009663 UBERON:0000125 UBERON:0002308 UBERON:0000204 UBERON:0000349 UBERON:0000369 UBERON:0002435 UBERON:0002021 UBERON:0007023 Term forebrain regional part of forebrain cerebral hemisphere central nervous system regional part of nervous system regional part of brain telencephalon brain nervous system regional part of telencephalon gray matter of neuraxis brain grey matter organ system subdivision regional part of cerebral cortex pallium cerebral cortex structure with developmental contribution from neural crest neocortex ectoderm-derived structure multi-tissue structure anatomical cluster organ part gyrus cerebral subcortex basal ganglion nuclear complex of neuraxis aggregate regional part of brain collection of basal ganglia temporal lobe telencephalic nucleus neural nucleus nucleus of brain ventral part of telencephalon limbic system corpus striatum striatum occipital lobe adult organism Supplementary Figure 16: Dendrogram and annotation of nervous system clusters (a) Dendrogram showing networks and cluster names for (i) Nervous system (cf. Supplementary Fig. 15). (b) Enriched annotations from cell, anatomical and disease ontologies for each cluster of networks (p-values were corrected using Benjamini-Hochberg procedure; Methods). Horizontal lines separate groups of related annotations. We named clusters by inspecting both the individual networks (a) and the enriched annotations (b, selected terms are highlighted in yellow) with the aim to define succinct names that fit the majority of corresponding cells and tissues. 19 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 17 Regulatory networks */"r/"$ !r/"$ trr"$r""$$$ "/J"$r""$$$ 'Y<$$$*<$"r!rry Fibrob$"*<$"r!rry h"$'$$Y"r$$ 'Y<$$$[=$"$V 4. Mesenchymal, mixed ="//o~/"$$$$ $""$$$ 'r$$$ =""$$$"/"$$$ Y"h"$</$$$ ~J"/"$$$ </\=/"$$$ $$$"/"$$$ $myob$""$$$ xtrasZ$"$m^Y/"/"$$$ "/"$rt^""/"$$$ al;$"/$$"/"$$$ $"/J$$$<J"/"$$$ =ry"/"$$$ J"$$"/"$$$ "/"$$$ $rYY"$$<$"/"/"$$$ $"/"$$$ J$"$$$ "/"$$$ 5. Sarcoma $Y$"$$$$ Jra<$"$$</$$$ "$$"r""/"$$$ th/"/"$$$ J"$$$ \=/<Y"$$$ \=/"/"$$$ mixm<$$r"</$$$ Y$"J$$<$"/"/"$$$ "J"/"$$$ m^\=/"/"$$$ ""$""/"$$$ Yw""$$$ _"'<"$Y$"$$$ "$X$r<$"/Y$"$$$ Y$"$$$/o;"<$"/ Y$"$$$@Y" 6. Endothelial cells Y$"$$$Yr" Y$"$$$[=$"$; Y$"$$$V Y$"$$$!rry Y$"$$$!r h"$/<//$$o;ar""/"" h"$/<//$$o;ar""/rJYo;ary h"$/<//$$o;ar""/$\o;ary h"$/<//$$="//ow h"$/<//$$"/" h"$/<//$$" $my"$$$ Y$"$$$ 7. Mesenchymal stem & smooth muscle cells 'Y<$$$[=$"$!rry Fibrob$""/" 'Y<$$$!r 'Y<$$$'<=$a;"!rry 'Y<$$$"/ 'Y<$$$}r"$Yr"!rry 'Y<$$$/"r!rry 'Y<$$$r"YY"$ h"$$$$ Myoblast Pr SZ$"$m<$$$\f/"y<=m<$<$" SZ$"$<$$$ SZ$"$<$'"$$$$ Tr"=<$"/YworZ$$ Fibrob$"|<;al Kr" h"$'$$="//ow Fibrob$"@Y" 'Y<$$$Y"J"$ 'Y<$$$/Y"$ 8. Connective tissue & muscle cells h"$'$$Y" _"'$$"$$`$z h"$'$$<=$"$ "/"y Ir*JY$"$$$ 'Y<$$$r"V"<$"/ Fibrob$"Y/*$^< J"$$$ ?b$"\f/" 'Y<$$$Tr"Y"$ Hair F$$$?</'Y"Y$$ !<$<*<$<$$ ]<$<*<$<$$ !<=<"< !=/" !"$ Y//\\ 'o; Y/\\ r"$"$Y$"$$$$ r"$=ry"$""$h"$$$$ 'Y<$$$$ 'Y<$$$*/" */"'/"$$$ h"$'$$"=r" <$/$[/r'"'$$ 'Y<$$$[r Fibrob$"ZwalZ/warb</J 9. Connective tissue & integumental cells P"/"/"$$$ */""$ */"=/" Fibrob$"Zrmal Fibrob$"Z/Y"my" Fibrob$"Z"$m<<$"/"/hy Hair F$$$rmal P"$$"$$ Olf"rY$"$$$ Fibrob$"Pr"$@J"K"<$ Fibrob$"XJ;al Fibrob$"Pr"$@J"Kf"$ ?blast Fibrob$"!r!;"$ */"<=<"< */";ral h"$'$$" Fibrob$"rmal Supplementary Figure 17: Dendrogram of mesenchyme clusters Dendrogram showing networks and cluster names for (ii) Mesenchyme (cf. Supplementary Fig. 15). Annotations are shown in Supplementary Fig. 18. 20 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 18 4. Mesenchymal, mixed Q-value 8.15E-02 8.15E-02 8.15E-02 8.15E-02 4.40E-02 ID UBERON:0000078 UBERON:0002012 UBERON:0008886 UBERON:0005406 DOID:3307 7. Mesenchymal stem & smooth muscle cells Term mixed ectoderm/mesoderm/endoderm-derived structure pulmonary artery pulmonary vascular system perirenal fat teratoma 5. Sarcoma Q-value 5.03E-02 7.42E-02 6.91E-07 2.70E-06 2.05E-05 1.36E-03 5.06E-03 2.82E-02 2.99E-03 2.99E-03 ID UBERON:0000949 UBERON:0002368 DOID:4 DOID:14566 DOID:162 DOID:0050687 DOID:7 DOID:1115 DOID:0060100 DOID:201 Term endocrine system endocrine gland disease disease of cellular proliferation cancer cell type cancer disease of anatomical entity sarcoma musculoskeletal system cancer connective tissue cancer 6. Endothelial cells Q-value 8.56E-09 8.56E-09 8.56E-09 1.40E-07 5.90E-07 5.90E-07 7.99E-07 1.34E-06 3.14E-06 3.14E-06 4.70E-06 5.70E-06 1.15E-05 1.41E-05 1.61E-05 2.98E-05 2.50E-04 2.77E-04 1.03E-03 1.03E-03 1.03E-03 1.81E-02 1.97E-02 6.12E-13 3.38E-11 3.38E-11 3.38E-11 8.91E-11 8.06E-09 2.46E-04 5.12E-04 7.81E-04 2.76E-02 2.76E-02 ID UBERON:0001986 UBERON:0004638 UBERON:0004852 UBERON:0000055 UBERON:0002049 UBERON:0007798 UBERON:0003914 UBERON:0000487 UBERON:0001981 UBERON:0004537 UBERON:0006914 UBERON:0004111 UBERON:0000025 UBERON:0004535 UBERON:0001009 UBERON:0000490 UBERON:0000483 UBERON:0000119 UBERON:0001917 UBERON:0003915 UBERON:0004700 UBERON:0000477 UBERON:0004120 CL:0000115 CL:0000213 CL:0000215 CL:0002078 CL:0002139 CL:0000071 CL:0000076 CL:0000066 CL:1000413 CL:0002543 CL:0002544 Term endothelium blood vessel endothelium cardiovascular system endothelium vessel vasculature vascular system epithelial tube simple squamous epithelium blood vessel blood vasculature squamous epithelium anatomical conduit tube cardiovascular system circulatory system unilaminar epithelium epithelium cell layer endothelium of artery endothelial tube arterial system endothelium anatomical cluster mesoderm-derived structure endothelial cell lining cell barrier cell meso-epithelial cell endothelial cell of vascular tree blood vessel endothelial cell squamous epithelial cell epithelial cell endothelial cell of artery vein endothelial cell aortic endothelial cell Q-value 3.47E-03 2.52E-02 2.95E-05 2.95E-05 2.95E-05 6.36E-05 6.36E-05 3.58E-04 3.58E-04 4.52E-04 5.45E-04 9.32E-04 1.41E-03 2.75E-03 2.75E-03 8.15E-02 2.47E-03 3.39E-03 4.57E-03 6.05E-03 8.60E-07 5.53E-05 2.75E-04 3.82E-04 3.82E-04 5.12E-04 7.29E-02 1.44E-02 8.63E-03 ID UBERON:0003914 UBERON:0000025 UBERON:0001637 UBERON:0003509 UBERON:0004572 UBERON:0004571 UBERON:0004573 UBERON:0001981 UBERON:0004537 UBERON:0004535 UBERON:0001009 UBERON:0000055 UBERON:0004120 UBERON:0002049 UBERON:0007798 UBERON:0001533 CL:0000134 CL:0000048 CL:0000723 CL:0000034 CL:0000359 CL:0000192 CL:0000187 CL:0000211 CL:0000393 CL:0000183 CL:0002595 CL:0002494 DOID:2394 Term epithelial tube tube artery arterial blood vessel arterial system systemic arterial system systemic artery blood vessel blood vasculature cardiovascular system circulatory system vessel mesoderm-derived structure vasculature vascular system subclavian artery mesenchymal cell multi fate stem cell somatic stem cell stem cell vascular associated smooth muscle cell smooth muscle cell muscle cell electrically active cell electrically responsive cell contractile cell smooth muscle cell of the subclavian artery cardiocyte ovarian cancer 8. Connective tissue & muscle cells Q-value 5.16E-02 5.16E-02 8.98E-02 6.29E-03 7.08E-03 2.06E-04 8.11E-04 1.14E-03 1.14E-03 1.62E-02 1.28E-03 ID UBERON:0001134 UBERON:0002036 UBERON:0002204 UBERON:0002384 UBERON:0006876 CL:0000183 CL:0000187 CL:0000211 CL:0000393 CL:0000188 CL:0002320 Term skeletal muscle tissue striated muscle tissue musculoskeletal system connective tissue vasculature of organ contractile cell muscle cell electrically active cell electrically responsive cell skeletal muscle cell connective tissue cell 9. Connective tissue & integumental cells Q-value 2.80E-03 2.80E-03 6.29E-03 5.58E-02 2.32E-14 1.86E-03 3.05E-15 8.51E-07 1.39E-03 1.32E-02 1.58E-02 8.95E-02 ID UBERON:0002199 UBERON:0002416 UBERON:0003102 UBERON:0002097 UBERON:0002384 UBERON:0004923 CL:0002320 CL:0000057 CL:0002620 CL:0002334 CL:0000499 CL:0000136 Term integument integumental system surface structure skin of body connective tissue organ component layer connective tissue cell fibroblast skin fibroblast preadipocyte stromal cell fat cell Supplementary Figure 18: Annotation of mesenchyme clusters Enriched annotations for each cluster shown in Supplementary Fig. 17. Explanation in legend of Supplementary Fig. 16. 21 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 19 a CD4+ T Cells Natural Killer Cells CD8+ T Cells 10. Lymphocytes CD4+CD25+CD45RA+ naive regulatory T cells !ry conventional T cells !ry regulatory T cells !"ve conventional T cells CD14+ Monocytes Mast cell langerhans cells, immature langerhans cells, migratory CD19+ B Cells Dendr$$$"" 11. Myeloid leukocytes CD14+CD16+ Monocytes Neutrophils Whole blood (ribopure) Peripheral Blood Mononuclear Cells Basophils B lymphoblastoid cell line b cell line xeroderma pigentosum b cell line plasma cell leukemia cell line splenic lymphoma with villous lymphocytes cell line 12. Lymphocytes of B lineage non T non B acute lymphoblastic leukemia (ALL) cell line lymphoma, malignant, hair$$$$$ acute lymphoblastic leuk"`!@@z$$$ \\<$"/J$$$Y"$$$ Burkitts lymphoma cell line chronic lymphocytic leuk"`@@z$$$ Hodgkins lymphoma cell line acute lymphoblastic leuk"`!@@z$$$ hairy cell leukemia cell line mycosis fungoides, T cell lymphoma cell line "<$$$$<kemia cell line 13. Lymphoma anaplastic large cell lymphoma cell line myeloma cell line lymphangiectasia cell line hereditary spherocytic anemia cell line NK T cell leukemia cell line Endothelial Progenitor Cells Dendr$$"</rived "/Y"Jrived thymus, fetal 14. Immune organs thymus, adult blood, adult lymph node, adult spleen, fetal spleen, adult Reticulocytes chronic myelogenous leukemia (CML) cell line acute myeloid leukemia (FAB M6) cell line leukemia, chronic megakaryoblastic cell line acute myeloid leukemia (FAB M7) cell line acute myeloid leukemia (FAB M0) cell line acute myeloid leukemia (FAB M1) cell line 15. Myeloid leukemia acute myeloid leukemia (FAB M4) cell line acute myeloid leukemia (FAB M2) cell line $$"<$="//ow derived acute myeloid leukemia (FAB M5) cell line acute myeloid leukemia (FAB M4eo) cell line biphenotypic B myelomonocytic leukemia cell line acute myeloid leukemia (FAB M3) cell line myelodysplastic syndrome cell line b 10. Lymphocytes Q-value 1.14E-03 1.14E-03 1.14E-03 1.50E-02 1.50E-02 1.69E-02 2.39E-02 4.15E-02 ID CL:0000789 CL:0000791 CL:0002419 CL:0000542 CL:0002242 CL:0000624 CL:0000084 CL:0002087 13. Lymphoma Term alpha-beta T cell mature alpha-beta T cell mature T cell lymphocyte nucleate cell CD4-positive, alpha-beta T cell T cell nongranular leukocyte 11. Myeloid leukocytes Q-value 1.46E-09 7.89E-08 1.74E-07 8.60E-07 2.14E-04 9.46E-04 7.83E-03 7.83E-03 7.86E-03 5.09E-02 5.09E-02 5.09E-02 5.09E-02 5.09E-02 5.09E-02 ID CL:0000738 CL:0000988 CL:0000219 CL:0000766 CL:0000763 CL:0000576 CL:0000451 CL:0000990 CL:0002087 CL:0000094 CL:0000453 CL:0000860 CL:0002057 CL:0002393 CL:0002397 Term leukocyte hematopoietic cell motile cell myeloid leukocyte myeloid cell monocyte dendritic cell conventional dendritic cell nongranular leukocyte granulocyte Langerhans cell classical monocyte CD14-positive, CD16-negative classical monocyte intermediate monocyte CD14-positive, CD16-positive monocyte 12. Lymphocytes of B lineage Q-value 3.28E-10 2.30E-07 2.30E-07 2.32E-06 3.32E-05 3.94E-04 6.39E-04 8.02E-02 1.58E-03 1.09E-02 1.09E-02 ID CL:0000945 CL:0000542 CL:0002242 CL:0002087 CL:0000738 CL:0000988 CL:0000219 CL:0000236 DOID:0060058 DOID:0060083 DOID:2531 Term lymphocyte of B lineage lymphocyte nucleate cell nongranular leukocyte leukocyte hematopoietic cell motile cell B cell lymphoma immune system cancer hematologic cancer Q-value 1.79E-08 1.79E-08 4.77E-08 1.10E-07 2.20E-07 2.70E-06 5.47E-06 2.09E-09 2.09E-09 4.75E-05 7.01E-05 3.19E-04 2.40E-03 2.40E-03 2.60E-03 ID CL:0000542 CL:0002242 CL:0000084 CL:0000738 CL:0002087 CL:0000988 CL:0000219 DOID:0060083 DOID:2531 DOID:0050686 DOID:0060058 DOID:4 DOID:1240 DOID:162 DOID:14566 Term lymphocyte nucleate cell T cell leukocyte nongranular leukocyte hematopoietic cell motile cell immune system cancer hematologic cancer organ system cancer lymphoma disease leukemia cancer disease of cellular proliferation 14. Immune organs Q-value 8.51E-05 1.11E-04 1.11E-04 5.96E-04 5.96E-04 2.60E-02 3.13E-02 3.13E-02 3.13E-02 7.58E-02 ID UBERON:0002193 UBERON:0004177 UBERON:0005057 UBERON:0002390 UBERON:0002405 UBERON:0000077 UBERON:0002370 UBERON:0005058 UBERON:0009113 UBERON:0002106 Term hemolymphoid system hemopoietic organ immune organ hematopoietic system immune system mixed endoderm/mesoderm-derived structure thymus hemolymphoid system gland thymic region spleen 15. Myeloid leukemia Q-value 6.12E-13 7.02E-08 3.25E-17 1.92E-13 2.48E-10 2.48E-10 4.46E-05 1.29E-03 3.73E-03 4.39E-03 ID CL:0000763 CL:0000988 DOID:8692 DOID:1240 DOID:0060083 DOID:2531 DOID:0050686 DOID:4 DOID:162 DOID:14566 Term myeloid cell hematopoietic cell myeloid leukemia leukemia immune system cancer hematologic cancer organ system cancer disease cancer disease of cellular proliferation Supplementary Figure 19: Dendrogram and annotation of immune system clusters Explanation in legend of Supplementary Fig. 16. 22 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 20 Regulatory networks hepatoblastoma cell line hepatocellular carcinoma cell line colon carcinoma cell line 16. Endo-epithelial cells Intestinal epithelial cells (polarized) Prostate Epithelial Cells rectal cancer cell line adenocarcinoma cell line signet ring carcinoma cell line 17. Adenocarcinoma tubular adenocarcinoma cell line lung adenocarcinoma cell line bile duct carcinoma cell line epididymis, adult testis, adult 18. Male reproductive organs Clontech Human Universal Reference Total RNA kidney, adult kidney, fetal Univ/"$]!_<"]rmal Tissues Biochain SABiosciences XpressRef Human Universal Total RNA 19. Liver & kidney liver, fetal Hepatocyte liver, adult duodenum, fetal colon, fetal small intestine, fetal small intestine, adult colon, adult 20. Gastrointestinal system pancreas, adult rectum, fetal appendix, adult gall bladder, adult stomach, fetal hearricuspid valve, adult hear<$valve, adult heart, adult hearral valve, adult 21. Heart left ventricle, adult left atrium, adult heart, fetal diaphragm, fetal skeletal muscle, fetal skeletal m<$$<muscle skeletal muscle, adult skin, fetal umbilical cord, fetal throat, adult trachea, adult 22. Mouth, throat & skeletal muscle tissue throat, fetal trachea, fetal tonsil, adult esophagus, adult penis, adult tongue, adult tongue, fetal aorta, adult lung, right lower lobe, adult 23. Lung lung, fetal lung, adult thyroid, fetal thyroid, adult parotid gland, adult salivary gland, adult submaxillary gland, adult ductus deferens, adult seminal vesicle, adult uterus, fetal ovary, adult 24. Glands & internal genitalia cervix, adult uterus, adult breast, adult dura mater, adult vein, adult adipose tissue, adult vagina, adult smooth muscle, adult prostate, adult bladder, adult Supplementary Figure 20: Dendrogram of trunk organs clusters Dendrogram showing networks and cluster names for (iv) Trunk organs (cf. Supplementary Fig. 15). Corresponding annotations are shown in Supplementary Fig. 21. 23 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 21 21. Heart 16. Endo-epithelial cells Q-value 2.79E-02 2.79E-02 2.79E-02 2.79E-02 2.79E-02 3.98E-02 4.60E-02 4.60E-02 7.21E-02 6.13E-02 2.37E-02 5.22E-02 6.13E-02 8.44E-02 8.44E-02 8.44E-02 8.44E-02 ID UBERON:0001242 UBERON:0001262 UBERON:0001277 UBERON:0004786 UBERON:0004808 UBERON:0004119 UBERON:0000485 UBERON:0003929 UBERON:0003350 UBERON:0001007 CL:0002563 CL:0002076 CL:0000066 CL:0000181 CL:0000182 CL:0000412 CL:0000417 Term intestinal mucosa wall of intestine intestinal epithelium gastrointestinal system mucosa gastrointestinal system epithelium endoderm-derived structure simple columnar epithelium gut epithelium epithelium of mucosa digestive system intestinal epithelial cell endo-epithelial cell epithelial cell metabolising cell hepatocyte polyploid cell endopolyploid cell 17. Adenocarcinoma Q-value 1.34E-03 8.63E-03 2.38E-02 2.59E-02 3.35E-02 3.35E-02 ID DOID:299 DOID:305 DOID:162 DOID:14566 DOID:0050687 DOID:4 Term adenocarcinoma carcinoma cancer disease of cellular proliferation cell type cancer disease Q-value 6.86E-10 1.33E-07 3.14E-06 2.07E-05 4.73E-04 4.73E-04 4.73E-04 6.52E-04 1.56E-02 2.09E-02 2.16E-02 2.16E-02 3.38E-02 ID UBERON:0007100 UBERON:0000948 UBERON:0003103 UBERON:0001009 UBERON:0000946 UBERON:0003978 UBERON:0004151 UBERON:0004535 UBERON:0010313 UBERON:0010314 UBERON:0002081 UBERON:0002133 UBERON:0007023 Term circulatory organ heart compound organ circulatory system cardial valve valve cardiac chamber cardiovascular system neural crest-derived structure structure with developmental contribution from neural crest cardiac atrium atrioventricular valve adult organism 22. Mouth, throat & skeletal muscle tissue Q-value 1.07E-02 7.49E-02 8.34E-03 8.34E-03 3.61E-02 3.61E-02 3.61E-02 7.49E-02 7.49E-02 6.12E-03 ID UBERON:0000475 UBERON:0000341 UBERON:0001134 UBERON:0002036 UBERON:0000383 UBERON:0001015 UBERON:0002385 UBERON:0001630 UBERON:0005090 UBERON:0000922 Term organism subdivision throat skeletal muscle tissue striated muscle tissue musculature of body musculature muscle tissue muscle organ muscle structure embryo ID UBERON:0000025 UBERON:0000117 UBERON:0000170 UBERON:0000171 UBERON:0002048 UBERON:0004111 UBERON:0000065 UBERON:0005178 UBERON:0001004 UBERON:0005181 UBERON:0000915 Term tube respiratory tube pair of lungs respiration organ lung anatomical conduit respiratory tract thoracic cavity organ respiratory system thoracic segment organ thoracic segment of trunk 18. Male reproductive organs Q-value 4.81E-02 ID UBERON:0003135 Term male reproductive organ 19. Liver & kidney Q-value 3.68E-03 3.68E-03 4.96E-03 4.96E-03 3.52E-02 3.52E-02 6.17E-02 6.61E-02 7.75E-02 ID UBERON:0005172 UBERON:0005173 UBERON:0000916 UBERON:0002417 UBERON:0002107 UBERON:0006925 UBERON:0005177 UBERON:0009569 UBERON:0002423 Term abdomen organ abdominal segment organ abdomen abdominal segment of trunk liver digestive gland trunk organ subdivision of trunk hepatobiliary system 20. Gastrointestinal system Q-value 1.76E-06 1.76E-05 4.06E-05 2.47E-04 2.52E-04 9.30E-04 2.71E-03 1.07E-02 1.69E-02 3.61E-02 8.51E-03 ID UBERON:0005409 UBERON:0000160 UBERON:0004921 UBERON:0001007 UBERON:0001555 UBERON:0000059 UBERON:0000481 UBERON:0001155 UBERON:0004119 UBERON:0000922 UBERON:0011216 Term gastrointestinal system intestine subdivision of digestive tract digestive system digestive tract large intestine multi-tissue structure colon endoderm-derived structure embryo organ system subdivision 23. Lung Q-value 7.98E-03 1.23E-02 1.23E-02 1.23E-02 1.23E-02 3.02E-02 3.38E-02 3.68E-02 8.77E-02 4.17E-02 8.15E-02 24. Glands & internal genitalia Q-value 1.87E-02 2.39E-02 2.39E-02 3.57E-02 2.76E-02 2.76E-02 2.76E-02 2.76E-02 3.59E-02 8.27E-02 7.00E-08 ID UBERON:0004175 UBERON:0000990 UBERON:0005156 UBERON:0003133 UBERON:0001044 UBERON:0003294 UBERON:0003408 UBERON:0010047 UBERON:0002530 UBERON:0002553 UBERON:0007023 Term internal genitalia reproductive system reproductive structure reproductive organ salivary gland gland of foregut gland of gut oral gland gland anatomical cavity adult organism Supplementary Figure 21: Annotation of trunk organs clusters Enriched annotations for each cluster shown in Supplementary Fig. 20. Explanation in legend of Supplementary Fig. 16. 24 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 22 a Regulatory networks pineal gland, adult eye, fetal 25. Pineal gland & eye retina, adult retinoblastoma cell line medulloblastoma cell line merkel cell carcinoma cell line carcinoid cell line neuroblastoma cell line neuroectodermal tumor cell line testicular germ cell embryonal carcinoma cell line 26. Neuron-associated cells & cancer teratocarcinoma cell line argyrophil small cell carcinoma cell line gastric cancer cell line pituitary gland, adult small cell lung carcinoma cell line small cell gastrointestinal carcinoma cell line melanoma cell line Melanocyte Retinal Pigment Epithelial Cells Lens Epithelial Cells 27. Astrocytes & pigment cells Ciliary Epithelial Cells !//=ral cortex !//=$$< cord blood derived cell line embryonic kidney cell line mesothelioma cell line _$$ epitheloid cancer cell line anaplastic squamous cell carcinoma cell line synovial sarcoma cell line 28. Sarcoma & neuroectodermal tumors rhabdomyosarcoma cell line Wilms tumor cell line peripheral neuroectodermal tumor cell line neuroepithelioma cell line carcinosarcoma cell line small cell cervical cancer cell line somatostatinoma cell line b 25. Eye Q-value 27. Astrocytes & pigment cells ID Term Q-value 3.67E-03 4.67E-03 2.79E-02 2.79E-02 2.79E-02 2.79E-02 3.52E-02 4.33E-02 4.33E-02 5.10E-02 6.61E-02 7.75E-02 7.75E-02 1.28E-02 4.45E-02 4.45E-02 4.45E-02 4.45E-02 6.88E-02 No enriched terms 26. Neuron-associated cells & cancer Q-value 9.22E-02 3.61E-04 4.21E-04 5.19E-04 7.57E-04 5.83E-03 5.83E-03 1.89E-02 2.63E-02 4.15E-02 ID CL:0000095 DOID:162 DOID:14566 DOID:0050687 DOID:4 DOID:2994 DOID:3095 DOID:305 DOID:0050686 DOID:3119 Term neuron associated cell cancer disease of cellular proliferation cell type cancer disease germ cell cancer germ cell and embryonal cancer carcinoma organ system cancer gastrointestinal system cancer ID UBERON:0007625 UBERON:0010371 UBERON:0000019 UBERON:0000047 UBERON:0004088 UBERON:0010230 UBERON:0000970 UBERON:0001456 UBERON:0002104 UBERON:0004121 UBERON:0000020 UBERON:0001032 UBERON:0004456 CL:0000075 CL:0000126 CL:0000127 CL:0000128 CL:0000147 CL:0000325 Term pigment epithelium of eye ecto-epithelium camera-type eye simple eye ocular region eyeball of camera-type eye eye face visual system ectoderm-derived structure sense organ sensory system entire sense organ system columnar/cuboidal epithelial cell macroglial cell astrocyte oligodendrocyte pigment cell stuff accumulating cell 28. Sarcoma & neuroectodermal tumors Q-value 1.34E-03 2.40E-03 2.60E-03 8.63E-03 1.00E-02 1.21E-02 9.65E-02 ID DOID:0050687 DOID:1115 DOID:4 DOID:162 DOID:14566 DOID:171 DOID:169 Term cell type cancer sarcoma disease cancer disease of cellular proliferation neuroectodermal tumor neuroendocrine tumor Supplementary Figure 22: Dendrogram and annotation of neuron-associated clusters Explanation in legend of Supplementary Fig. 16. 25 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 23 a salivary acinar cells keratoacanthoma cell line cervical cancer cell line nasal epithelial cells Corneal Epithelial Cells Esophageal Epithelial Cells Ker"ral Bronchial Epithelial Cell 29. Epithelial cells Tracheal Epithelial Cells Urothelial Cells Small Airway Epithelial Cells Mammary Epithelial Cell Gingival epithelial cells "$$"rived cells Sebocyte Ker"rmal pancreatic carcinoma cell line choriocarcinoma cell line placenta, adult 30. Extraembryonic membrane chorionic membrane cells amniotic membrane cells serous cystadenocarcinoma cell line renal cell carcinoma cell line serous adenocarcinoma cell line clear cell carcinoma cell line endometrial cancer cell line Placental Epithelial Cells 31. Epithelial cells of kidney & uterus Amniotic Epithelial Cells Alveolar Epithelial Cells Renal Mesangial Cells Renal Proximal Tubular Epithelial Cell Renal Epithelial Cells Renal Cortical Epithelial Cells maxillary sinus tumor cell line breast carcinoma cell line bronchioalveolar carcinoma cell line papillotubular adenocarcinoma cell line mucinous adenocarcinoma cell line gall bladder carcinoma cell line ductal cell carcinoma cell line prostate cancer cell line tr""$$$"/"$$$ malignant trichilemmal cyst cell line 32. Lung epithelium & lung cancer squamous cell carcinoma cell line epidermoid carcinoma cell line oral squamous cell carcinoma cell line bronchogenic carcinoma cell line glassy cell carcinoma cell line squamous cell lung carcinoma cell line $"/J$$keratinizing squamous carcinoma cell line acantholytic squamous carcinoma cell line pharyngeal carcinoma cell line bronchial squamous cell carcinoma cell line b 29. Epithelial cells Q-value 4.89E-05 2.75E-04 1.78E-03 2.03E-03 2.98E-02 5.13E-02 7.29E-02 ID CL:0000066 CL:0002076 CL:0002077 CL:0002159 CL:0002251 CL:0002368 CL:0000622 31. Epithelial cells of kidney & uterus Term epithelial cell endo-epithelial cell ecto-epithelial cell general ecto-epithelial cell epithelial cell of alimentary canal respiratory epithelial cell acinar cell 30. Extraembryonic membrane Q-value 1.37E-02 4.60E-02 4.60E-02 ID UBERON:0000478 UBERON:0000158 UBERON:0005631 Term extraembryonic structure membranous layer extraembryonic membrane 32. Lung epithelium & lung cancer Q-value 7.21E-02 2.58E-02 2.58E-02 7.21E-02 9.57E-02 9.73E-02 1.94E-08 4.48E-04 6.20E-02 2.40E-03 1.80E-02 2.70E-14 1.99E-11 2.82E-09 3.63E-09 1.11E-08 1.21E-08 ID UBERON:0000115 UBERON:0000464 UBERON:0000466 UBERON:0004802 UBERON:0004807 UBERON:0005911 CL:0000066 CL:0000076 CL:0000082 DOID:0050615 DOID:1324 DOID:305 DOID:0050687 DOID:162 DOID:14566 DOID:4 DOID:1749 Term lung epithelium anatomical space immaterial anatomical entity respiratory tract epithelium respiratory system epithelium endo-epithelium epithelial cell squamous epithelial cell epithelial cell of lung respiratory system cancer lung cancer carcinoma cell type cancer cancer disease of cellular proliferation disease squamous cell carcinoma Q-value 1.61E-05 1.61E-05 1.61E-05 1.61E-05 3.59E-05 3.59E-05 9.47E-05 9.47E-05 9.47E-05 9.47E-05 1.41E-04 1.41E-04 2.20E-04 3.58E-04 3.58E-04 1.95E-03 1.95E-03 7.64E-03 7.64E-03 9.02E-03 1.07E-02 1.07E-02 4.33E-02 9.57E-02 2.66E-07 2.66E-07 9.71E-06 1.22E-05 6.67E-05 6.67E-05 2.75E-04 2.75E-04 4.60E-02 4.60E-02 6.16E-03 1.11E-02 1.89E-02 ID UBERON:0001285 UBERON:0004819 UBERON:0006555 UBERON:0007684 UBERON:0002113 UBERON:0011143 UBERON:0001231 UBERON:0004211 UBERON:0004810 UBERON:0009773 UBERON:0001008 UBERON:0006554 UBERON:0000489 UBERON:0001225 UBERON:0008987 UBERON:0000353 UBERON:0001851 UBERON:0005172 UBERON:0005173 UBERON:0003103 UBERON:0000916 UBERON:0002417 UBERON:0003914 UBERON:0000474 CL:0002518 CL:1000497 CL:1000449 CL:0000066 CL:1000494 CL:1000507 CL:0002584 CL:0002681 CL:0002149 CL:0002255 DOID:120 DOID:193 DOID:299 Term nephron kidney epithelium excretory tube uriniferous tubule kidney upper urinary tract nephron tubule nephron epithelium nephron tubule epithelium renal tubule excretory system urinary system structure cavitated compound organ cortex of kidney renal parenchyma parenchyma cortex abdomen organ abdominal segment organ compound organ abdomen abdominal segment of trunk epithelial tube female reproductive system kidney epithelial cell kidney cell epithelial cell of nephron epithelial cell epithelial cell of renal tubule kidney tubule cell renal cortical epithelial cell kidney cortical cell epithelial cell of uterus stromal cell of endometrium female reproductive organ cancer reproductive organ cancer adenocarcinoma Supplementary Figure 23: Dendrogram and annotation of epithelium clusters Explanation in legend of Supplementary Fig. 16. 26 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 24 (i) Nervous system: ectoderm, neural crest P-value 1.69E-21 2.94E-20 1.63E-08 1 2 Neurons, fetal brain Nervous system, hindbrain 3 Forebrain 4 Mesenchymal, mixed 5 Sarcoma 6 Endothelial cells 7 Mesenchymal, smooth muscle cells 8 Mesenchymal, skeletal muscle cells 9 Connective tissue, integumental cells ID UBERON:0004121 UBERON:0010314 UBERON:0010316 Term ectoderm-derived structure structure with developmental contribution from neural crest germ layer / neural crest (ii) Mesenchyme: mesoderm P-value 3.07E-05 1.39E-14 ID UBERON:0004120 CL:0000222 Term mesoderm-derived structure mesodermal cell Embryonic germ layers 10 11 12 13 14 Ectoderm, neural crest Mesoderm Endoderm Lymphocytes Myeloid leukocytes Lymphocytes of B lineage Lymphoma Immune organs (iii) Immune system: mesoderm Myeloid leukemia Endo-epithelial cells Adenocarcinoma Male reproductive organs Liver, kidney Gastrointestinal system Heart 22 Mouth, throat, skeletal muscle tissue 23 Lung 15 16 17 18 19 20 21 24 P-value 4.41E-03 8.60E-03 8.60E-03 8.60E-03 Eye Neuron-associated cell, cancer 27 Astrocytes, pigment cells 28 Sarcoma, neuroectodermal tumors 30 31 Epithelial cells Extraembryonic membrane Epithelial cells of kidney, uterus 32 Lung epithelium, lung cancer P-value 4.29E-08 2.57E-11 ID UBERON:0004119 UBERON:0010316 Term endoderm-derived structure germ layer / neural crest (v) Neuron-associated: ectoderm, neural crest Glands, internal genitalia 25 26 29 (iv) Trunk organs: mixed origin ID UBERON:0004121 UBERON:0002346 UBERON:0010316 CL:0000333 Term ectoderm-derived structure neurectoderm germ layer / neural crest migratory neural crest cell (vi) Epithelium: mixed origin P-value 3.27E-03 ID UBERON:0004119 Term endoderm-derived structure Supplementary Figure 24: Developmental origin of regulatory networks pertaining to different highlevel clusters We performed annotation enrichment analysis for each high-level cluster (i-vi, Supplementary Fig. 15), focusing specifically on terms related to the three embryonic germ layers (ectoderm and neural crest, mesoderm, and endoderm). Regulatory networks of the first cluster (i, nervous system) are derived from ectoderm and neural crest. Regulatory networks of the second cluster (ii, mesenchyme) are mesoderm-derived. Immune cells (clusters 11-13) are derived from hematopoietic stem cells (the corresponding high-level cluster [iii, immune system] does not show significant enrichment for mesoderm; this is just an artifact because immune cells and corresponding cancer cells were not systematically annotated as mesodermal cells in the FANTOM5 sample ontology). The fourth cluster (iv, trunk organs) groups regulatory networks of mixed origin (endoderm and neural crest show significant enrichment, but we also found mesoderm-derived tissues). The fifth cluster (v, neuron-associated) enriches for ectoderm and neural crest. Epithelial cells (cluster vi) are derived from all three germ layers. In summary, regulatory networks that are clustered together often share a common developmental origin. 27 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 25 Immune cells Mean number of TFs per gene 11 Myeloid leukocytes 10 Lymphocytes Nervous system 2 Nervous system & adult hindbrain 1 Neurons & fetal brain 3 Adult forebrain 12.5 10 7.5 11 10 2 1 3 26 23 21 20 16 30 19 14 22 7 12 15 31 28 6 29 13 27 17 24 9 32 8 5 4 Cluster index Supplementary Figure 25: Comparing number of TFs per gene across lineages Boxplots show the mean number of TF inputs per gene (i.e., the mean indegree) for all networks across the 32 clusters defined in Supplementary Fig. 14 (weak edges with weights below 0.05 were not counted for the indegrees). Immune cells, followed by cells and tissues of the nervous system, have the highest number of TFs per gene, indicating that they rely on more intricate, combinatorial regulation than other cells and tissues (the observed differences between the mean indegrees of immune cells, nervous system, and the remaining clusters are statistically significant; Supplementary Fig. 26). Both immune cells and neurons perform highly adaptive functions that require integration of diverse extra-cellular signals, e.g., to recognize distinct pathogens or guide neuronal wiring. Our networks suggest that the orchestration of transcriptional responses in these cells involves intricate regulatory programs. 28 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 26 a b 11. Myeloid leukocytes 10. Lymphocytes 2. Nervous system & adult hindbrain 1. Neurons & fetal brain 3. Adult forebrain ]</""$$"/ 23. Lung 21. Heart 20. Gastrointestinal system Y$"$$$s & hepatocytes 30. Extraembryonic membrane 19. Liver & kidney 14. Immune organs 22. Mouth, throat & skeletal muscle tissue 7. Mesenchymal stem & smooth muscle cells 12. Lymphocytes of B lineage 15. Myeloid leukemia 31. Epithelial cells of kidney & uterus 28. Neuroectodermal tumor & sarcoma 6. Endothelial cells 29. Epithelial cells 13. Lymphoma 27. Astrocytes & pigment cell 17. Adenocarcinoma 24. Glands & internal genitalia 9. Connective tissue & integumental cells 32. Lung epithelium & lung cancer 8. Connectivte tissue & mucle cells 5. Sarcoma 4. Mesenchymal, mixed Immune cells p<10-6 Nervous system p<0.01 7.5 10 12.5 100 Mean number of TFs per gene c 150 200 d Immune cells 11. Myeloid leukocytes 10. Lymphocytes 2. Nervous system & adult hindbrain 1. Neurons & fetal brain 3. Adult forebrain ]</""$$"/ 23. Lung 21. Heart 20. Gastrointestinal system Y$"$$$s & hepatocytes 30. Extraembryonic membrane 19. Liver & kidney 14. Immune organs 22. Mouth, throat & skeletal muscle tissue 7. Mesenchymal stem & smooth muscle cells 12. Lymphocytes of B lineage 15. Myeloid leukemia 31. Epithelial cells of kidney & uterus 28. Neuroectodermal tumor & sarcoma 6. Endothelial cells 29. Epithelial cells 13. Lymphoma 27. Astrocytes & pigment cell 17. Adenocarcinoma 24. Glands & internal genitalia 9. Connective tissue & integumental cells 32. Lung epithelium & lung cancer 8. Connectivte tissue & mucle cells 5. Sarcoma 4. Mesenchymal, mixed Nervous system 8000 10000 12000 Number of active genes per network 100 150 200 29 250 Average network clustering coefficient Supplementary Figure 26: Comparing network properties across lineages (Caption next page.) Nature Methods: doi:10.1038/nmeth.3799 250 Mean number of target genes per TF Supplementary Figure 26: (Previous page.) Boxplots show network properties for all 394 regulatory networks, grouped according to the 32 clusters defined in Supplementary Fig. 14. The same cluster order on the vertical axis is used in all four panels. (a) Mean number of TFs per gene (i.e., mean indegree; same as Supplementary Fig. 25). Clusters were ordered on the vertical axis by their median. Immune cells, followed by cells and tissues of the nervous system, have the highest number of TFs per gene. The observed differences are statistically significant: immune cells have a higher mean indegree than nervous system networks (p<10-6, two-sided Wilcoxon rank-sum test), and the latter come before other networks with a high mean indegree, such as those of lung (p<0.01). Note, cluster 26 (neuron-associated cells and cancer) also contains some neural tissues and related tumors, which explains why it comes right after the nervous system clusters. (b) Mean number of target genes per TF (i.e., mean outdegree). Networks of immune cells and nervous system clusters have relatively high mean outdegree; however, in contrast to the mean indegree, differences between immune cells, nervous system, and lung (as an example of another cluster with high mean outdegree) are not statistically significant (p>0.05). (c) The number of active genes in each network mainly depends on whether it is derived from a cell or tissue sample. Tissues comprise multiple cell types and thus tend to have more active genes (the union of the genes active in each cell type). Nervous system networks have a similar number of active genes as other tissue-specific networks (e.g., lung). Immune cell networks have a lower number of active genes, but again not significantly different from other cell type-specific networks (e.g., connective tissue and muscle cells). (d) There are no notable differences in the average network clustering coefficient between different lineages. 30 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 27 Network GWAS Summary statistics (1) Gene-level p-values (2) Pairwise connectivity Significance SNP p-values How “close” are any two genes in the network? Connectivity matrix (diffusion kernel) Reference population SNP correlation matrix Pascal Gene p-value Genotypes (3) Connectivity enrichment curves Connectivity enrichment Do trait-associated genes cluster within network? Original data Gene-label permutations Genes ranked by p-value Phenotype Genotype G G G G A G A G A A Genotype-phenotype vs. gene label permutations A A (4) Connectivity enrichment scores Summarize enrichment curves Area under the curve (5) Validation LD and other counfounders accounted for? Original data Gene-label permutations Enrichment score = –log10(empirical p-value) Supplementary Figure 27: Overview of network connectivity enrichment analysis Given summary statistics from a GWAS and a network, our pipeline evaluates whether genes perturbed by traitassociated variants are more densely inter-connected than expected. Here we give a general overview of the approach, see Supplementary Fig. 28 for an illustrative example. (1) SNP association p-values from the GWAS are summarized at the level of genes using Pascal,8 a fast and accurate tool that corrects for LD structure using information from a reference population (we used the European panel of the 1000 genomes project9 ). (2) A pairwise connectivity matrix is computed, which defines how "close" any two genes are in the network based on a random-walk kernel.10 (3) Genes are ranked by their GWAS p-value, from most to least significant, and the mean connectivity between the top k genes is evaluated — as k is varied over the complete list — both for the original data and random permutations (Supplementary Fig. 28). A conservative approach is used for the permutations, where the network structure remains completely fixed and only the labels of genes with similar network degree are shuffled.11 Gene pairs that are in LD are excluded from the analysis (Supplementary Fig. 29). (4) Enrichment curves are summarized by the signed area under the curve, both for the original data and the permutations, and a corresponding empirical p-value is computed. (5) We validated that LD structure and other potential confounders are accounted for by showing that enrichment scores based on gene label permutations are equivalent to those obtained using genotype-phenotype permutations (Supplementary Fig. 30). 31 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 28 Network connectivy Higher than expected Lower than expected b HDL cholesterol Original data 10,000 permutations (99% confidence interval) Genome-wide significant genes (GWAS p-value < 10) Signed area under the curve a Original data Enrichment score = –log10(q-value) Permutations 0 500 1000 1500 Genes ranked by GWAS p-value Supplementary Figure 28: Illustrative example for network connectivity enrichment analysis Network connectivity analysis is illustrated using an example trait and network that show strong enrichment (HDL cholesterol and the global regulatory network defined in Methods). (a) Genes are ranked by GWAS p-value. For each position k in the ranked list, the network connectivity between the top k genes is computed using a random-walk kernel (red line; positive and negative values indicate higher and lower connectivity than expected, respectively). The same is done for 10,000 permutations of the data (the grey area contains 99% of the corresponding curves). In this example, the top 500 genes show significantly increased connectivity, which includes genes with GWAS p-values that do not reach the genome-wide significance threshold (dashed line). (b) To summarize connectivity enrichment, the signed area under the curve is computed both for the original data (red line) and the permutations (boxplot), resulting in an empirical p-value and corresponding score that gauge how clustered trait-associated genes are within the network. 32 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 29 Chromosome 1 Chromosome 6 1000 750 Genes Genes 750 500 500 250 250 HLA region 250 500 750 1000 250 Genes 500 750 Genes Correlation of GWAS p-values Neighboring genes (distance <1mb) 0.00 0.25 0.50 0.75 Supplementary Figure 29: Correlation between neighboring genes due to linkage disequilibrium The GWAS association signal of neighboring genes is often correlated due to linkage disequilibrium (LD). Here we show the pairwise correlation between gene scores (GWAS p-values summarized by gene using Pascal8 ) across 500 simulated phenotypes (Methods). Genes of chromosome 1 and 6 are shown, ordered by genomic position (the same observations were be made for other chromosomes). Genes that are less than 1 mega-base apart (magenta overlay) often show increased correlation. This is relevant for network analysis because neighboring genes are often also functionally related and/or co-regulated, i.e., they are also neighbors at the level of pathways or networks. An extreme example is the human leukocyte antigen (HLA) gene cluster on chromosome 6. If not properly corrected for, such clusters of genes that are both correlated and functionally related lead to inflated pathway or network enrichment scores. To ensure that our results were not confounded by neighboring, functionally related genes, we took a conservative approach and completely excluded all gene pairs with a distance of less than 1 mega-base from the network connectivity enrichment analysis (Methods). Since the HLA genes form an exceptionally large cluster that also shows strong association with many immune-related traits, we further completely excluded all genes in the HLA region from the connectivity analysis. This ensures that the observed network connectivity enrichment is not driven by the HLA genes or similar gene clusters. 33 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 30 Score G en ePh la be en lp ot yp er m eut la be at io lp ns er m ut at io ns (Signed area under enrichment curve) No significant difference Supplementary Figure 30: Validation of connectivity enrichment scores using phenotype-label permutations Our method computes connectivity enrichment scores empirically based on permutations of the data. It is thus crucial that permutations are done appropriately such that no bias is introduced. Phenotype label permutation is generally considered the gold standard for this purpose because it preserves both the network structure and genomic architecture (e.g., LD structure and gene clusters, see Supplementary Fig. 29). However, it is computationally intensive and requires access to genotypic data, which are rarely shared. Thus, a standard approach used in network-based GWAS analysis is within-degree gene label permutation: the labels of genes with similar network degree are shuffled, while the network structure itself remains completely fixed11, 12 (i.e., no edges are rewired, only node labels are reassigned). However, in our case the relevant confounder is not the degree of the gene, but rather its mean pairwise connectivity across all other genes (we call this the centrality of the gene), which is closely related — but not equivalent — to the degree. Thus, we used the same approach as within-degree gene label permutation, but in our case the genes were binned based on their centrality (Methods). This permutation method implies that the null distribution of connectivity between the top k genes is entirely a function of the centrality of these genes.11 In order to validate the within-centrality gene label permutation method, we generated 500 phenotype-label permutations and compared the resulting scores (signed area under the connectivity enrichment curve; Methods). We found no significant difference (p=0.18, two-sided Wilcoxon rank-sum test), thus confirming that our method based on within-centrality gene label permutation adequately corrects for confounders. I.e., the resulting enrichment scores are equivalent to those obtained using phenotype label permutation. Note that this only works because our method excludes gene pairs that are in LD as explained in Supplementary Fig. 29 — if these gene pairs are included scores become inflated. 34 Nature Methods: doi:10.1038/nmeth.3799 -s p Nature Methods: doi:10.1038/nmeth.3799 pe ci Ti Pro fic ss te re ue in gu -s -pr lat pe ot or c e y G Ti lo G ific in i cir ss ba lo c nt cu ue l r ba o- er its -s eg l c ex act pe u o pr io ci lato -ex es n fic r p si D y (C res on N as hI sio e P- n fo se ot q) pr in ts 1 All genes 0 35 –log10(q-value) Enrichment score b -s 0 –log10(q-value) Regulatory (ChIP-seq) –log10(q-value) –log10(q-value) Networks Tissue-specific co-expression 0 Global coexpression Tissue-specific regulatory –log10(q-value) Protein interaction –log10(q-value) Psychiatric ue –log10(q-value) Enrichment score / Sc hi zo De Bi p / pr po A hre / es lar no ni siv d re a In fla e iso xia di rd m so er m U at l Au rde or ce y ra A tis r b Rh C ow tive DHm eu roh el d co D li m n He Mu ato's disea tis pa ltip id ise se tit le ar as Al Tyis C scl thri e zh pe re er tis ei 1 so os m d lv is Pa er ia e 's b r rk in Na dis etes s o HD n rco eas s 's LDL c dislepse h y L Bl T ot choleseas o al o te e Cood l ro pre T cho esterol na ss rig le ro ry ur lyc ste l G Fartee (s eridrol lyc a r ys e at sti y d to s e ng is lic Tyd he gl eas ) pe m uc e 2 og ose Fa st 2h diablobi i In ng r g e n In sul pr luc tes sulinin s oinsose " re ecr uli s e n M Fa$$ istation a st \< nc M cul in e ac ar g ul de in ar g su de en l ge era H BMin ne tio e I ig r O ation (w ht st n e eo (d t) po ry ro ) sis a Ti ss ec i Ti Pro fic ss te re ue in gu -s -pr lat pe ot or c e y Ti Glo G ific in i cir ss ba lo c nt cu ue l r ba o- er its -s eg l c ex act pe u o pr io ci lato -ex es n fic r p si D y (C res on N as hI sio e P- n fo se ot q) pr in ts ss ue Ti Supplementary Figure 31 Neurodegenerative Cardiovascular Anthropometric Immune-related Blood lipids Glycemic Others 2 1 0 2 1 0 2 1 0 2 1 2 1 10% FDR GWAS c Genome-wide significant genes 2 2 1 0 Supplementary Figure 31: Connectivity enrichment scores for different types of networks (Caption next page.) Supplementary Figure 31: (Previous page.) Connectivity enrichment scores for different types of networks and GWAS traits. See legend of Fig. 2c and Methods for a description of the different networks. (a) Same as Fig. 2c, but here we show results for all traits, not just those showing significant connectivity enrichment for at least one network. Some traits do not show enrichment, which may be either because the signal was too weak, the right tissues were not profiled, or other types of networks may be more relevant for these traits. (b-c) Overall connectivity enrichment scores when considering (b) all genes and (c) only genes with GWAS p-values that reach genome-wide significance. (b) This panel summarizes the results of Fig. 2c. Tissue-specific regulatory networks show the strongest connectivity enrichment, followed by protein-protein interaction networks and tissue-specific co-expression networks. Among the four protein-protein interaction networks tested (Methods), the InWeb database13 showed the best connectivity enrichment, covering the largest number of traits. The regulatory networks from ENCODE based on ChIP-seq14 and DNaseI footprints15 only show weak connectivity enrichment for most traits. Note that the networks based on DNaseI footprints were likely not comprehensive enough to reveal perturbed regulatory modules because they are limited to TF genes and promoter-level interactions (Methods). (c) When considering only genes that pass the GWAS significance threshold, the connectivity enrichment scores are much lower, suggesting substantial contribution of weakly associated genes. 36 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 32 a e Y"" Cardiovascular High-level networks Individual networks High-level networks Individual networks Cross-disorder Suppl. Fig. 33 Suppl. Fig. 33 %'"+ No enrichment No enrichment Schizophrenia Suppl. Fig. 34 Suppl. Fig. 34 ; No enrichment = @ Suppl. Fig. 35 No enrichment Bipolar disorder No enrichment Suppl. Fig. 36 Depressive disorder No enrichment No enrichment No enrichment No enrichment ADHD No enrichment f No enrichment J"" High-level networks Individual networks No enrichment Z Fasting glucose No enrichment No enrichment J"$ No enrichment No enrichment TX No enrichment No enrichment X$" No enrichment No enrichment Fasting proinsulin No enrichment No enrichment Suppl. Fig. 37 Suppl. Fig. 37 Insulin secretion No enrichment No enrichment Ulcerative colitis Suppl. Fig. 38 Suppl. Fig. 38 Insulin resistance No enrichment No enrichment Crohns disease Suppl. Fig. 39 Suppl. Fig. 39 Beta-cell function Suppl. Fig. 46 No enrichment Suppl. Fig. 40a Suppl. Fig. 40a Fasting insulin No enrichment No enrichment Multiple sclerosis No enrichment Suppl. Fig. 40b Hepatitis C resolvers No enrichment Suppl. Fig. 40c T No enrichment No enrichment High-level networks Individual networks g " High-level networks Individual networks c Neurodegenerative % = Suppl. Fig. 45 Suppl. Fig. 45 Height No enrichment No enrichment High-level networks Individual networks rs disease Suppl. Fig. 41 Suppl. Fig. 41 !" Suppl. Fig. 42 Suppl. Fig. 42 Parkinsons disease No enrichment No enrichment h Others High-level networks Individual networks d Blood lipids >?' @"+ Suppl. Fig. 47 Suppl. Fig. 47 >?'+ No enrichment No enrichment Osteoporosis No enrichment No enrichment High-level networks Individual networks HDL cholesterol Suppl. Fig. 43a Not tested LDL cholesterol Suppl. Fig. 43b Not tested Total cholesterol Suppl. Fig. 44a Not tested T$" Suppl. Fig. 44b Not tested ![" "@ " Trait-relevant cell types rank first Unrelated cell types (false positives) Globally strong enrichment (diverse cell types) No enrichment (score < 1.0) Supplementary Figure 32: Overview of network connectivity enrichment results for all traits Traits are listed in same order as in Supplementary Fig. 31. Refer to the indicated supplementary figures and Results for details. Possible reasons why some traits did not show significant connectivity enrichment are: (1) the signal was too weak (four out of five GWASs with <10,000 individuals were not enriched in any network), (2) the relevant tissues were not profiled (e.g., our library does not include pancreatic islet cells relevant for type 2 diabetes16 ), or (3) other types of networks (e.g., post-transcriptional) may be more relevant for these traits. (a–c) Psychiatric, immune-related, and neurodegenerative disorders showed either highly specific connectivity enrichment in disease-related cell types and tissues or no enrichment at all, with the exception of Crohn’s disease. (d) Blood lipid traits showed strong connectivity enrichment in many diverse networks. (e–f ) Cardiovascular and glycemic traits showed no connectivity enrichment except for one likely false positive (β-cell function). (g–h) Body mass index and age-related macular degeneration (AMD) of neovascular type consistently showed connectivity enrichment in trait-related networks, while the remaining traits showed no enrichment. 37 Nature Methods: doi:10.1038/nmeth.3799 a Supplementary Figure 33 b Individual networks High-level networks Nervous system & adult hindbrain (2) Neurons & fetal brain (1) Forebrain (3) Rank 1 2 3 0.0 1.0 Neural stem cells Caudate nucleus, adult Thalamus, adult Locus coeruleus, adult Frontal lobe, adult Occipital cortex, adult Pons, adult Spinal cord, fetal Neurons Parietal lobe, adult Olfactory region, adult Hippocampus, adult Cerebellum, adult Amygdala, adult Brain, adult Medial frontal gyrus, adult Paracentral gyrus, adult Medial temporal gyrus, adult Globus pallidus, adult Spinal cord, adult Diencephalon, adult Insula, adult Postcentral gyrus, adult Temporal lobe, adult Cerebral meninges, adult Occipital pole, adult Nucleus accumbens, adult Substantia nigra, adult 1 2 3 4 2.0 –log10(q-value) 5 6 7 8 c 9 Cognitive control 10 11 Basal ganglia Thalamus Prefrontal cortex Caudate nucleus Thalamic nuclei 12 Rank Frontal lobe 13 14 15 16 d Cerebellar modulation of cognition 17 18 Frontal lobe 19 Prefrontal cortex 20 21 22 Thalamus Pons Thalamic nuclei Locus coeruleus 23 24 25 26 Basal ganglia Caudate nucleus Cerebellum 27 28 0.0 1.0 2.0 –log10(q-value) Supplementary Figure 33: Connectivity enrichment for the psychiatric, cross-disorder study Connectivity enrichment scores for the psychiatric, cross-disorder study across (a) the 32 high-level networks and (b) the corresponding individual networks. All networks with score > 1.0 are shown (10% FDR, dashed line). Numbers in parentheses correspond to cluster indexes (Supplementary Fig. 14). (a) The psychiatric cross-disorder study showed the strongest clustering of perturbed genes precisely in the three high-level networks of nervous system and brain. (b) After neural stem cells, the strongest connectivity enrichment was observed in the caudate nucleus, thalamus, and locus coeruleus, which are all implicated in psychiatric disorders. The caudate nucleus is a basic structure of the basal ganglia, strongly innervated by dopamine neurons, which performs important functions related to goal-directed action, memory, learning, and emotion. Dopamine is an important neurotransmitter implicated in psychiatric disorders, e.g., the dopaminergic system of the basal ganglia manifests pathological anomalies in schizophrenia patients and is the primary target of current antipsychotic drugs.17 The thalamus, located adjacent to the caudate nucleus, is an information processing and communication hub involved in sensory and cognitive processes that has been implicated in the pathophysiology of schizophrenia, for instance.18 The locus coeruleus is part of the pons, which also showed strong signal (7th rank). It is responsible for physiological responses to emotional pain, stress, and panic and has widespread projections utilizing the neurotransmitter noradrenalin (the locus coeruleus-noradrenergic system). Dysregulation of this system contributes to cognitive dysfunction associated with a variety of psychiatric disorders, including attentiondeficit hyperactivity disorder, sleep and arousal disorders, and post-traumatic stress disorder.19 (c–d) Schematic representation of key cerebral circuits underlying cognitive function that are disrupted in psychiatric disorders (adapted from Millan et al.,20 colors match the corresponding networks in Panel b). The brain structures showing the strongest clustering of perturbed genes are not only individually relevant to psychiatric disorders, as explained above, but they also constitute the core components of these integrated cognitive circuits, further supporting an etiological model where their dysregulation is a cause of impaired cognitive function in patients. (c) The frontal lobe, basal ganglia, and thalamus form a feedback loop that integrates cognitive control, attention, and working memory.20 (d) Cognition is modulated by the cerebellum through feedback loops with the basal ganglia and the cortex, mainly via the thalamus and the pons.20 38 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 34 Adult forebrain (3) Nervous system and adult hindbrain (2) 0.0 1.0 2.0 –log10(q-value) Rank Rank High-level networks 1 2 Individual networks Putamen Globus pallidus Occipital cortex Frontal lobe Caudate nucleus Locus coeruleus Insula Hippocampus Cerebellum Optic nerve Amygdala Medial temporal gyrus Thalamus Spinal cord Basal ganglia Neural 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0.0 1.0 2.0 –log10(q-value) Supplementary Figure 34: Connectivity enrichment for schizophrenia Same as Fig. 3a, but showing all networks at FDR < 10%. 39 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 35 Rank High-level networks Neural Endocrine system Other Glands and internal genitalia (24) Pineal gland and eye (25) Nervous system and adult hindbrain (2) Mouth, throat and skeletal muscle tissue (22) Liver and kidney (19) Extraembryonic membrane (30) Mesenchymal stem and smooth muscle (7) Myeloid leukocytes (11) Heart (21) 1 2 3 4 5 6 7 8 9 0.0 1.0 2.0 –log10(q-value) Supplementary Figure 35: Connectivity enrichment for anorexia nervosa Anorexia nervosa showed the strongest clustering of associated genes in high-level regulatory networks of the endocrine system (hormonal glands, Supplementary Fig. 14c) followed by nervous system and hindbrain, suggesting that endocrine dysregulation, a hallmark of chronic starvation in anorexia nervosa,21 may be implicated. 40 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 36 Individual networks Amygdala, adult Medial temporal gyrus Cerebral meninges, adult Postcentral gyrus, adult Temporal lobe, adult Occipital pole, adult Occipital lobe, adult Medial temporal gyrus, adult Brain, adult Putamen, adult 1 2 3 Rank 4 5 6 7 8 9 10 0.0 1.0 2.0 –log10(q-value) Supplementary Figure 36: Connectivity enrichment for bipolar disorder Bipolar disorder did not show connectivity enrichment for the 32 high-level networks, but we nevertheless tested for enrichment in all individual networks of the nervous system (clusters 1–3 in Supplementary Fig. 14). All networks with score > 1.0 are shown (10% FDR, dashed line). The strongest clustering of perturbed genes is observed in the regulatory network of the amygdala, a group of nuclei playing key roles in memory modulation, emotional learning, emotional reactions, and mood. There are consistent findings in the literature suggesting that abnormalities in the structure and function of the amygdala are implicated in the etiology of bipolar disorder.22 41 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 37 Immune-related Vascular system Includes cell types of vascular system High-level networks Rank 0.0 1.0 Rank Connective tissue and muscle cells (8) Nervous system and adult hindbrain (2) Myeloid leukocytes (11) Immune organs (14) Astrocytes and pigment cells (27) Mesenchymal and smooth muscle (7) Mesenchymal, mixed (4) Glands and internal genitalia (24) Endothelial cells (6) Mouth, throat and skeletal muscle (22) Lung (23) Lymphocytes (10) 1 2 3 4 5 6 7 8 9 10 11 12 2.0 –log10(q-value) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 Individual networks Endothelial cells (aortic) Endothelial cells (vein) Endothelial cells (lymphatic) Endothelial cells (microvascular) Fibroblast (cardiac) Endothelial cells (umbilical vein) Smooth muscle cells (coronary artery) Spleen, fetal Medulla oblongata, adult Mast cell Skeletal muscle, fetal Smooth muscle cells (aortic) Dendritic cells (monocyte immature) Mesenchymal stem cells (hepatic) Endothelial cells (artery) Renal glomerular endothelial cells Substantia nigra, adult Naive conventional T cells Smooth muscle cells (umbilical artery) Mesenchymal precursor (ovary, left) Thyroid, adult Vein, adult Astrocyte (cerebellum) Placental epithelial cells Spinal cord, adult Spleen, adult Optic nerve Breast, adult Submaxillary gland, adult Mesenchymal precursor (ovary, right) Memory conventional T cells Skeletal muscle (soleus muscle) Thyroid, fetal Fibroblast (pulmonary artery) Astrocyte (cerebral cortex) Locus coeruleus, adult 0.0 1.0 2.0 –log10(q-value) Supplementary Figure 37: Connectivity enrichment for inflammatory bowel disease Same as Fig. 3b, but showing all networks at FDR < 10%. 42 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 38 High-level networks 1 Mesenchymal stem & smooth muscle cells (7) 2 Astrocytes & pigment cells (27) 3 Nervous system & adult hindbrain (2) 4 Connective tissue & muscle cells (8) 5 Endothelial cells (6) 6 Mesenchymal, mixed (4) 7 Gastrointestinal system (20) 8 Connective tissue & integumental cells (9) 9 Mouth, throat & skeletal muscle tissue (22) 10 Immune organs (14) 11 Extraembryonic membrane (30) b 0.0 1.0 2.0 –log10(q-value) Immune-related Vascular system Includes cell types of vascular system Other Rank Rank a Individual networks Endothelial cells (microvascular) 1 Spleen, fetal 2 Endothelial cells (umbilical vein) 3 Endothelial cells (lymphatic) 4 Medulla oblongata, adult 5 Spinal cord, adult 6 Substantia nigra, adult 7 Skeletal muscle, fetal 8 Appendix, adult 9 Cerebellum, adult 10 CD4+ T cells 11 Fibroblast (cardiac) 12 Synoviocyte 13 CD14+ monocytes 14 Naive conventional T cells 15 Endothelial cells (aortic) 16 0.0 0.5 1.0 1.5 –log10(q-value) Supplementary Figure 38: Connectivity enrichment for ulcerative colitis Connectivity enrichment scores for ulcerative colitis across (a) the 32 high-level networks and (b) the corresponding individual networks. Results are highly consistent with those of inflammatory bowel disease, as expected (see Fig. 3b and main text). Namely, strong connectivity enrichment is observed for endothelial cells and immune organs. Endothelial cells drive pathophysiological processes including immune cell emigration and vascular growth (angiogenesis) that are direct or indirect targets of many drugs used to treat ulcerative colitis.23 The spleen is the body’s largest reservoir of monocytes and regulates inflammation through their storage and deployment.24 43 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 39 Connective tissue & muscle cells (8) Y$"$$$`z 1 2 0.0 1.0 Individual networks Rank b High-level networks Rank a Fibroblast (choroid plexus) 1 0.0 1.0 2.0 –log10(q-value) 2.0 –log10(q-value) Supplementary Figure 39: Connectivity enrichment for Crohn’s disease Connectivity enrichment scores for Crohn’s disease across (a) the 32 high-level networks and (b) the corresponding individual networks. Connectivity enrichment is much weaker than for inflammatory bowel disease (IBD) and ulcerative colitis (UC) and does not seem to reveal relevant cell types. (a) The high-level network for connective tissue & muscle cells (cluster 8) includes vascular cell types and also showed signal for IBD and UC. However, other clusters that include vascular cell types did not show signal here. (b) The vascular cells showing strong clustering of perturbed genes for IBD and UC were not recovered here. Only one cell type shows significant connectivity enrichment (fibroblasts from the choroid plexus), which is likely a false positive. 44 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 40 Rheumatoid arthritis Rank a High-level networks Immune organs (14) 1 0.0 1.0 2.0 Rank –log10(q-value) Individual networks 1 Neutrophils 0.0 1.0 2.0 –log10(q-value) Multiple sclerosis Rank b Individual networks 1 2 !ry conventional T cells 3 CD8+ T cells 4 Neutrophils 5 6 Langerhans cells (migratory) 7 Spleen, adult 8 Spleen, fetal 9 Lymph node, adult 10 Dendritic cells (monocyte immature derived) 11 CD4+ T cells 12 Endothelial progenitor cells 13 CD14+CD16+ monocytes 14 Langerhans cells (immature) 15 Dendritic cells (plasmacytoid) 0.0 1.0 2.0 Hepatitis C resolvers Rank c Immune-related Other Individual networks 1 CD14+CD16+ monocytes 0.0 1.0 2.0 Supplementary Figure 40: Connectivity enrichment for rheumatoid arthritis, multiple sclerosis, and hepatitis C resolvers Rheumatoid arthritis showed borderline connectivity enrichment in the high-level network of immune organs (a), while multiple sclerosis and hepatitis C resolvers did not show signal in any high-level network. We nevertheless tested these three immune-related traits for enrichment in all individual networks of the immune system. (a) For rheumatoid arthritis, the strongest evidence for perturbed regulatory modules was found in neutrophils, which have an activated phenotype in patients and contribute to pathogenesis through the release of cytotoxins, immunoregulatory molecules, and putative autoantigens.25 (b) Diverse immune cells showed connectivity enrichment for multiple sclerosis (MS), in particular T cells and monocytes. Indeed, the most consistent pathological finding in MS is the presence of T cell and monocyte/macrophage infiltrates among areas of myelin breakdown and gliosis in the central nervous system.26 (c) Hepatitis C resolvers, i.e., patients who spontaneously cleared the virus, showed borderline connectivity enrichment in monocytes, which have been proposed to spread infection over extended time because they get infected by the hepatitis C virus with few cytopathic effects.27 45 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 41 b High-level networks Adult forebrain (3) Endothelial cells (6) Connective tissue & integumental cells (9) Rank 1 2 3 0.0 1.0 2.0 –log10(q-value) Individual networks Endothelial cells (umbilical vein) Medial temporal gyrus, adult Locus coeruleus, adult Neural stem cells Occipital pole, adult Adipocyte (subcutaneous) Medial frontal gyrus, adult 1 2 3 Rank a 4 5 Brain structures Vascular system Includes cell types of vascular system Other 6 7 0.0 1.0 2.0 –log10(q-value) Supplementary Figure 41: Connectivity enrichment for Alzheimer’s disease Connectivity enrichment scores for Alzheimer’s disease across (a) the 32 high-level networks and (b) the corresponding individual networks. (a) The strongest clustering of perturbed genes was found in regulatory networks of adult forebrain and endothelial cells, supporting the view that neurovascular dysregulation is implicated.28 In contrast, these results do not confirm recent evidence for genetically driven dysregulation of immune cells in Alzheimer’s disease,29 however, the most relevant cell type (microglia) was not present in our library. (b) Besides from endothelial cells mentioned above, several brain structures as well as neural stem cells showed connectivity enrichment. The medial temporal gyrus ranks second: medial temporal lobe atrophy is predictive of AD in subjects with mild cognitive impairment.30 The locus coeruleus (3rd rank) is also particularly vulnerable in the early stages of AD and its degeneration contributes to disease pathology.31 46 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 42 a 1 Individual networks (without HLA region) Neurons and fetal brain (1) 0.0 1.0 Rank Rank High-level networks Neural stem cells 1 2.0 0.0 –log10(q-value) 1.0 2.0 –log10(q-value) Individual networks (with HLA region) Rank b CD4+ T cells Neural stem cells 1 2 0.0 1.0 2.0 –log10(q-value) Supplementary Figure 42: Connectivity enrichment for narcolepsy with and without HLA region Narcolepsy is a rare sleep disorder caused by an autoimmune attack targeting hypocretin-producing neurons.32 We generally excluded the HLA region from the connectivity enrichment analysis because of its strong LD and association with many immune-related traits — this is a conservative measure often taken in network-based analyses of GWAS data to make sure that results are not driven by the HLA region11 (see also Methods and Supplementary Fig. 29). (a) We observed connectivity enrichment only in regulatory networks of neural stem cells when the HLA region was excluded. This example shows that if the relevant cell type is not available (in this case the hypocretin neurons), networks of related cell types may still show signal. (b) Given that the HLA region shows particularly strong association with narcolepsy,32 we also run the connectivity enrichment analysis with the HLA region included (but still excluding all gene pairs that are in LD, see Methods) for all regulatory networks of the nervous and immune system. This analysis confirmed the observed clustering of perturbed genes in neural stem cells, while also showing signal for T cells, consistent with the autoimmune basis of narcolepsy and its association with the T-cell receptor alpha locus.32, 33 47 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 43 a b HDL LDL High-level networks 3 4 5 7 8 9 10 11 12 13 14 15 17 18 19 20 21 22 23 24 0.0 1.0 Heart (21) Neurons & fetal brain (1) Lung epithelium & lung cancer (32) Y$"$$$`z Nervous system & adult hindbrain (2) Pineal gland & eye (25) Astrocytes & pigment cells (27) Lung (23) Adenocarcinoma (17) ]</""$$"/`z Mesenchymal, mixed (4) 1 2 3 4 5 Rank 2 Rank High-level networks Epithelial cells (29) Neuroectodermal tumors & sarcoma (28) ]</""$$"/`z Pineal gland & eye (25) Glands & internal genitalia (24) Heart (21) Adenocarcinoma (17) Myeloid_leukemia (15) Leukocytes (11) Connective tissue & integumental cells (9) Mesenchymal & smooth muscle cells (7) Adult forebrain (3) Nervous system & adult hindbrain (2) Neurons & fetal brain (1) Immune organs (14) Sarcoma (5) Lung epithelium & lung cancer (32) Extraembryonic membrane (30) Mouth, throat & skeletal muscle tissue (22) Connective tissue & muscle cells (8) Mesenchymal, mixed (4) Astrocytes & pigment cells (27) Lymphoma (13) Epithelial cells of kidney & uterus (31) 1 7 8 9 10 11 0.0 1.0 2.0 –log10(q-value) 2.0 –log10(q-value) Supplementary Figure 43: Connectivity enrichment for blood lipids (HDL and LDL) Connectivity enrichment scores for high-density lipoprotein (HDL) and low-density lipoprotein (LDL) across the 32 high-level networks. We found strong clustering of perturbed genes in diverse high-level networks: over 75% of all networks show connectivity enrichment for HDL and over 30% for LDL. The same observations are made when testing individual networks: a large number of diverse cell types show connectivity enrichment (results not shown). Accordingly, the global regulatory network (obtained by taking the union of all cell type and tissue-specific networks, see Methods) also shows strong connectivity enrichment (Supplementary Fig. 28). Indeed, all human cells synthesize cholesterol, and thus share related pathways, because it is an essential component of cell membranes. Cells that regulate cholesterol levels in the blood, e.g., hepatocytes, likely rely on some of these shared cholesterol pathways, explaining the broad connectivity enrichment. 48 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 44 a b Total cholesterol Triglycerides High-level networks 3 7 8 10 11 13 3 7 8 10 11 13 –log10(q-value) 17 0.0 1.0 @<JY$<$<J"/`z ]<//"$</"/"`z !/J$$`z ]</""$$"/`z *"$J$"y`z X$"r"$J"$"` z @<J`z Hear`z Imm</J"` z v<J<"$$$`z '"/"`z Adult \orebr"`z Nerv<"<$Y=r"`z Neurons & \etal br"`z Mouth, throat & sZeletal m<$<`z ;<muscle cells (8) My$$<Z"`z @v/Ze`z Y"$K^` z Y$"$$$\Zey & uterus (31) @Y\$"J`z Y$"$$$`z !"/"`z @Y"`z "$//<ve organs (18) Y$"$$$`z Extraembry=rane (3) Y$"$$$`z Mesenchymal & smooth muscle cells (7) 1 "Z *"$J$"y`z @<J`z Hear`z Neurons & \etal br"`z @<JY$<$<J"/`z !/J$$`z @v/Ze`z Y"$K^` z '"/"`z X$"r"$J"$"` z Y$"$$$`z v<J<"$$$`z ]</""$$"/`z !"/"`z Mesenchymal & smooth muscle cells (7) 1 "Z High-level networks 18 0.0 1.0 –log10(q-value) Supplementary Figure 44: Connectivity enrichment for blood lipids (TC and TG) Connectivity enrichment scores for total cholesterol (TC) and triglycerides (TG) across the 32 high-level networks. We found strong clustering of perturbed genes in diverse high-level networks: close to 50% of all networks show connectivity enrichment for TC and over 90% for TG. The same observations are made when testing individual networks: a large number of diverse cell types show connectivity enrichment (results not shown). This suggests that general pathways are implicated; see legend of Supplementary Fig. 43 for details. 49 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 45 a b High-level networks Mesenchymal, mixed (4) Endothelial cells (6) Gastrointestinal system (20) Immune organs (14) Mouth, throat & skeletal muscle tissue (22) Pineal gland & eye (25) 1 2 Rank 3 4 5 6 0.0 1.0 Individual networks Tonsil, adult Tongue, adult Colon, adult Thymus, adult Esophagus, adult Small intestine, adult Gall bladder, adult Adipocyte (perirenal) Hepatic mesenchymal tumor cell line Sertoli cells Trachea, adult Colon, fetal Tongue, fetal Appendix, adult Spleen, adult Skeletal muscle, fetal Mesenchymal stem cells Renal glomerular endothelial cells Hepatic sinusoidal endothelial cells Spleen, fetal Endothelial cells (microvascular) Bone marrow stromal cell line Basal cell carcinoma cell line Eye, fetal Penis, adult Throat, fetal Endothelial cells (artery) Preadipocyte (perirenal) Lymph node, adult Thymus, fetal Rectum, fetal Duodenum, fetal Endothelial cells (lymphatic) Pancreas, adult Neurofibroma cell line Diaphragm, fetal Trachea fetal Tridermal teratoma cell line Throat adult Endothelial cells (thoracic) Ewings sarcoma cell line Stomach, fetal Umbilical cord, fetal Retina adult Pineal gland, adult Skin fetal 1 2 3 4 5 6 7 2.0 –log10(q-value) 8 9 10 Annotation (UBERON terms) Immune system Digestive system Both terms Other 11 12 13 14 15 16 17 18 19 Ontology enrichment (UBERON) Rank Term 1 embryo 20 Q-value 21 0.0008 22 2 immune system 0.013 3 digestive system 0.017 Rank c 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 0.0 1.0 2.0 –log10(q-value) Supplementary Figure 45: Connectivity enrichment for body mass index (Caption next page.) 50 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 45: (Previous page.) Connectivity enrichment scores for body mass index (BMI) across (a) the 32 high-level networks and (b) the corresponding individual networks. (a) BMI shows connectivity enrichment in diverse high-level networks, including those of the gastrointestinal system (rank 3) and immune organs (rank 4). Together with ulcerative colitis (Supplementary Fig. 38), BMI is the only trait showing connectivity enrichment for the gastrointestinal system. (b) When testing individual networks, the strongest clustering of BMI-associated genes was found in networks of the digestive and immune system (red and purple indicate networks associated with the UBERON ontology terms digestive system and immune system, respectively, based on the sample annotation provided by FANTOM5). For the digestive system, we observed strong connectivity enrichment both for the lower gastrointestinal tract (small intestine, colon, rectum, gall bladder, appendix) and the upper gastrointestinal tract (esophagus, stomach) and related networks (tongue and tonsils, which group together with the esophagus in the hierarchical clustering of networks; Supplementary Fig. 20). For the immune system, we observed strong connectivity enrichment in different immune organs (tonsils, thymus, spleen, lymph nodes). (c) The overrepresentation of digestive and immune system networks was confirmed through ontology enrichment analysis (besides the general term embryo (fetal samples), immune system and digestive system are the only significant terms; Methods). These results suggest that perturbed pathways specific to the digestive system impact BMI. The digestive system is directly responsible for energy intake through (1) regulation of appetite by signaling to the brain and (2) digestion of nutrients. The strong signal for both the intestinal tract and the immune system may further suggest a potential link with the gut microbiome, which has recently been discovered to have a heritable component that impacts BMI.34, 35 Interestingly, previous studies by the GIANT consortium implicated mainly the central nervous system in obesity susceptibility (i.e., the receiver-end of appetite signals), and not the digestive and immune systems.36 Reconciling these results is an important avenue for future work. Possibly, the regulatory circuit-based analysis presented here and pathway analyses employed by the GIANT consortium (e.g., DEPICT37 ) capture complementary aspects that may together give a fuller picture of tissues and biological pathways that impact BMI. 51 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 46 Rank High-level networks Y$"$$$`z 1 0.0 1.0 2.0 –log10(q-value) Supplementary Figure 46: Connectivity enrichment for β-cell function Connectivity enrichment scores for β-cell function (HOMA-β) across the 32 high-level networks. Only the network of cluster 16 (endo-epithelial cells) shows borderline enrichment. Given that none of the individual networks in this cluster is related to β-cells, our library does not include β-cells or pancreatic islets, and no other glycemic traits showed connectivity enrichment, this is likely a false positive. 52 Nature Methods: doi:10.1038/nmeth.3799 Supplementary Figure 47 Individual networks Mesenchymal, mixed (4) 0.0 1.0 2.0 –log10(q-value) Vascular Tumors (induce vascularization) Other Rank Rank High-level networks 1 Smooth muscle cells (umbilical vein) Spindle cell sarcoma cell line Neurofibroma cell line Osteoclastoma cell line Basal cell carcinoma cell line Fibroblast (pulmonary artery) Preadipocyte (perirenal) Bone marrow stromal cell line Sertoli cells Mesenchymal stem cells Hepatic mesenchymal tumor cell line Tridermal teratoma cell line Ewings sarcoma cell line 1 2 3 4 5 6 7 8 9 10 11 12 13 0.0 1.0 2.0 –log10(q-value) Supplementary Figure 47: Connectivity enrichment for advanced macular degeneration (neovascular) Same as Fig. 3c, but showing all networks at FDR < 10%. Age-related macular degeneration (AMD) of neovascular type shows the strongest connectivity enrichment in regulatory networks of vascular smooth muscle cells followed by diverse tumors, which induce vascularization to achieve growth. As a control, we further confirmed that the dry form of AMD, which does not involve neovascularization, does not show any connectivity enrichment in these networks. 53 Nature Methods: doi:10.1038/nmeth.3799 References [1] Marstrand, T. T. & Storey, J. D. Identifying and mapping cell-type-specific chromatin programming of gene expression. Proceedings of the National Academy of Sciences 111, E645–E654 (2014). [2] Maurano, M. T. et al. Systematic Localization of Common Disease-Associated Variation in Regulatory DNA. Science 337, 1190–1195 (2012). [3] Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012). [4] Roy, S. et al. A predictive modeling approach for cell line-specific long-range regulatory interactions. Nucleic Acids Research 43, 8694–8712 (2015). [5] Babu, M. M., Luscombe, N. M., Aravind, L., Gerstein, M. & Teichmann, S. A. Structure and evolution of transcriptional regulatory networks. Current Opinion in Structural Biology 14, 283–291 (2004). [6] Albert, R. Scale-free networks in cell biology. Journal of Cell Science 118, 4947–4957 (2005). [7] Yip, K. Y. et al. Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biology 13, R48 (2012). [8] Lamparter, D., Marbach, D., Rico, R., Kutalik, Z. & Bergmann, S. Fast and rigorous computation of gene and pathway scores from SNP-based summary statistics. PLoS Comp Bio (in press). [9] The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012). [10] Kondor, R. I. & Lafferty, J. D. Diffusion Kernels on Graphs and Other Discrete Input Spaces. In Proceedings of the Nineteenth International Conference on Machine Learning, ICML ’02, 315–322 (Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002). [11] Rossin, E. J. et al. Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology. PLoS Genetics 7, e1001273 (2011). [12] Erten, S., Bebek, G., Ewing, R. M. & Koyutürk, M. DADA: Degree-Aware Algorithms for Network-Based Disease Gene Prioritization. BioData mining 4, 19 (2011). [13] Lage, K. et al. A human phenome-interactome network of protein complexes implicated in genetic disorders. Nature biotechnology 25, 309–316 (2007). [14] Gerstein, M. B. et al. Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91–100 (2012). [15] Neph, S. et al. Circuitry and Dynamics of Human Transcription Factor Regulatory Networks. Cell 150, 1274–1286 (2012). [16] Boyer, L. A. et al. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122, 947–956 (2005). [17] Perez-Costas, E., Melendez-Ferro, M. & Roberts, R. C. Basal ganglia pathology in schizophrenia: dopamine connections and anomalies. Journal of neurochemistry 113, 287–302 (2010). [18] Hazlett, E. A. et al. Three-Dimensional Analysis With MRI and PET of the Size, Shape, and Function of the Thalamus in the Schizophrenia Spectrum. American Journal of Psychiatry (2014). [19] Berridge, C. W. & Waterhouse, B. D. The locus coeruleus-noradrenergic system: modulation of behavioral state and state-dependent cognitive processes. Brain Research. Brain Research Reviews 42, 33–84 (2003). [20] Millan, M. J. et al. Cognitive dysfunction in psychiatric disorders: characteristics, causes and the quest for improved therapy. Nature Reviews Drug Discovery 11, 141–168 (2012). [21] Misra, M. & Klibanski, A. Endocrine consequences of anorexia nervosa. The Lancet Diabetes & Endocrinology 2, 581–592 (2014). [22] Garrett, A. & Chang, K. The role of the amygdala in bipolar disorder development. Development and Psychopathology 20, 1285–1296 (2008). [23] Cromer, W. E., Mathis, J. M., Granger, D. N., Chaitanya, G. V. & Alexander, J. S. Role of the endothelium in inflammatory bowel diseases. World Journal of Gastroenterology 17, 578–593 (2011). [24] Swirski, F. K. et al. Identification of Splenic Reservoir Monocytes and Their Deployment to Inflammatory Sites. Science 325, 612–616 (2009). [25] Wright, H. L., Moots, R. J. & Edwards, S. W. The multifactorial role of neutrophils in rheumatoid arthritis. Nature Reviews Rheumatology 10, 593–601 (2014). [26] Filion, L. G., Graziani-Bowering, G., Matusevicius, D. & Freedman, M. S. Monocyte-derived cytokines in multiple sclerosis. Clinical and Experimental Immunology 131, 324–334 (2003). 54 Nature Methods: doi:10.1038/nmeth.3799 [27] Revie, D. & Salahuddin, S. Z. Role of macrophages and monocytes in hepatitis C virus infections. World Journal of Gastroenterology : WJG 20, 2777–2784 (2014). [28] Iadecola, C. Neurovascular regulation in the normal brain and in Alzheimer’s disease. Nature Reviews. Neuroscience 5, 347–360 (2004). [29] Gjoneska, E. et al. Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer/’s disease. Nature 518, 365–369 (2015). [30] Visser, P. J., Verhey, F. R. J., Hofman, P. a. M., Scheltens, P. & Jolles, J. Medial temporal lobe atrophy predicts Alzheimer’s disease in patients with minor cognitive impairment. Journal of Neurology, Neurosurgery, and Psychiatry 72, 491–497 (2002). [31] Heneka, M. T. et al. Locus ceruleus controls Alzheimer’s disease pathology by modulating microglial functions through norepinephrine. Proceedings of the National Academy of Sciences of the United States of America 107, 6058–6063 (2010). [32] Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nature Genetics 42, 786–789 (2010). [33] Hallmayer, J. et al. Narcolepsy is strongly associated with the T-cell receptor alpha locus. Nature Genetics 41, 708–711 (2009). [34] Goodrich, J. K. et al. Human genetics shape the gut microbiome. Cell 159, 789–799 (2014). [35] Sommer, F. & Bäckhed, F. The gut microbiota — masters of host development and physiology. Nature Reviews Microbiology 11, 227–238 (2013). [36] Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015). [37] Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nature Communications 6, 5890 (2015). 55 Nature Methods: doi:10.1038/nmeth.3799