Download Table S1 - Genetics

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Biology and consumer behaviour wikipedia , lookup

Neurogenomics wikipedia , lookup

Gene expression programming wikipedia , lookup

Transcript
1
Supplementary File Captions
2
3
File S1: Supplementary Methods and Results.
4
5
File S2: Candidate SNPs and genes. SNPs identified as candidates in each
6
control and desiccation generation 13 replicate with respect to the mass-bred
7
population. SNP effect and gene annotation (SNPEff results) and candidate gene
8
lists for each pool are reported. For the desiccation-selected replicate line gene
9
lists, whether each gene was present in the putative lab adaptation gene list
10
(genes occurring in at least one control line replicate) is also indicated.
11
12
File S3: Allele counts and SNP frequency results for candidate SNPs identified
13
in each control and desiccation-selected replicate. See included Readme file for
14
details on data format.
15
16
File S4: Category enrichment for candidate genes. Gene enrichment test
17
results for gene lists significantly differentiated between the mass-bred and
18
generation 13 pools, for all control and desiccation-selected replicates.
19
20
21
Figure S1: Number of SNPs called in each mass bred—generation 13
22
comparison. Results are presented for different levels of minimum coverage
23
depth (x-axis) and minimum count of minor allele across both pools (panels).
24
The combination of 0 minimum coverage depth and 3 minimum minor allele
25
count was used for further analysis.
26
27
Figure S2: Chromosome level plots of genomic regions enriched in SNPs
28
that showed significant differentiation between generation 0 and
29
generation 13 pools. Grey line shows the proportion of SNPs that were
30
significantly-differentiated in 100 Kbp windows; black line shows the proportion
31
in 1 Mb windows. Horizontal black line segments highlight regions containing
32
100 Kbp windows with a proportion of significantly-differentiated SNPs in the
33
80th percentile (genome-wide). Where such windows occurred within 1 Mbp of
34
each other, they were combined into a single region (see Methods for analysis
35
performed on these regions). The pale blue line shows the proportion of SNPs
36
with allele frequency <0.1 at generation 0 (‘low-frequency SNPs’) per 100 Kbp
37
window; the dark blue line shows the same parameter calculated over 1 Mbp
38
windows. Asterisks indicate cases where the proportion of low-frequency SNPs
39
showed a significant positive correlation with the proportion of significantly-
40
differentiated SNPs (see Figure S6) after excluding windows without
41
significantly-differentiated SNPs.
42
43
Figure S3: Distribution of sequencing coverage depth in the generation 13
44
pool for candidate SNPs (solid coloured line) and 100,000 randomly-selected
45
SNPs (coloured dashed line) for each replicate. The distribution is also shown for
46
the same candidate SNPs (solid black line) and randomly-selected SNPs (dashed
47
grey line) for the generation 0 mass-bred pool.
48
49
Figure S4: Boxplots of frequency change (from generation 0 to generation
50
13 of the selection regime) in each replicate line for candidate SNPs. Boxes
51
display the range of observations (whiskers), the median value (thick line), first
52
and third quartiles (box ends), and outliers (points).
53
54
Figure S5: Correlation between the number of true associations and the
55
number of false associations detected 10Kbp (left panel) and 1Mbp (right
56
panel) away from a causal variant explaining 1% of broad-sense heritability. The
57
results are based on 2 million independent simulations under the model
58
presented in File S1. The densities are colour-coded on a log-scale. Excluding the
59
simulations where no association was found, we observed a positive correlation
60
between number of true associations and short-range false associations and a
61
negative correlation between number of true associations and long-range false
62
associations, suggesting (rare) short-range false positives could be due to linkage
63
while long-range false positive could be due to drift alone.
64
65
Figure S6: Scatterplots of the proportion of significantly-differentiated
66
SNPs versus the proportion of low-starting-frequency SNPs (allele
67
frequency < 0.1) across 100 Kbp genomic windows for each chromosome
68
and each replicate line. Between 235 and 320 windows were examined per
69
chromosome. Dashed pink line and solid red line show linear relationships for
70
the full dataset, and excluding windows without significantly-differentiated
71
SNPs, respectively. Excluding windows without significantly-differentiated SNPs
72
left between 3 and 230 datapoints. R-squared and significance values are
73
reported where P < 0.05. Note y-axis scale differs between panels.
74
75
Figure S7: Observed lab adaptation candidate SNP overlap compared to
76
expected. Histograms of the distribution of overlap (no. shared SNPs) among all
77
replicate combinations of the simulated control-replicate gene lists. The
78
histogram shows the distribution of overlap measures for 1000 simulations.
79
Density is shown by the black line and the corresponding normal distribution is
80
shown by the blue line. The red vertical line shows the number of genes shared
81
between the observed control candidate lists, labelled as a percentile of the
82
normal distribution.
83
84
Figure S8: Observed desiccation candidate SNP overlap compared to
85
expected. Histograms of the distribution of overlap (no. shared SNPs) among all
86
replicate combinations of the simulated desiccation-replicate gene lists. The
87
histogram shows the distribution of overlap measures for 1000 simulations.
88
Density is shown by the black line and the corresponding normal distribution is
89
shown by the blue line. The red vertical line shows the number of genes shared
90
between the observed desiccation candidate lists, labelled as a percentile of the
91
normal distribution.
92
93
Figure S9: Enrichment of candidate SNPs in genomic location categories.
94
The arrows show the difference in frequency of SNPs mapping to a genomic
95
location type, between the list of all SNPs and the list of candidate SNPs only.
96
Black arrows indicate this difference was significant by a 2 test (P < 0.05), grey
97
arrows indicate no significant difference. Separate calculations were performed
98
for each control and desiccation-line replicate comparison between the mass-
99
bred and generation 13.
100
101
Figure S10: Enrichment of candidate SNPs in genetic effect categories. The
102
arrows show the difference in frequency of SNPs mapping to a genomic effect
103
type, between the list of all SNPs and the list of candidate SNPs (only those
104
within 1000 bp of an annotated gene in each case). Low: either a synonymous
105
variant, a variant producing a premature start codon, a variant located near but
106
not within a splice site (within 1-3 bases of an exon or 3-8 bases of an intron), or
107
a synonymous or non-synonymous mutation affecting a start or stop codon;
108
Moderate: a non-synonymous coding variant; High: variant causing loss of a start
109
codon, loss or gain of a stop codon or loss or gain of a splice donor or acceptor
110
site (2 bases after coding exon start or 2 bases before coding exon end);
111
Modifier: any other variant, including upstream, downstream, 5’- and 3’-UTR
112
locations. Black arrows indicate this difference was significant by a 2 test (P <
113
0.05), grey arrows indicate no significant difference. Separate calculations were
114
performed for each control and desiccation-line replicate.
115
116
Figure S11: Observed eQTL/veQTL overlap compared to expected.
117
Histograms of the null distribution of the number of candidate SNPs expected to
118
hit known eQTL/veQTL loci (Huang et al. 2015) for each control (A-E) and
119
desiccation-selected (F-J) replicate line. The histogram shows the distribution of
120
eQTL/veQTL SNP numbers for 1000 simulations. Density is shown by the black
121
line and the corresponding normal distribution is shown by the blue line. The
122
red vertical line shows the number of known eQTL/veQTL loci hit in the real
123
candidate SNP lists, labelled as a percentile of the normal distribution.
124
125
Figure S12: Observed eQTL/veQTL target gene overlap compared to
126
expected. Histograms of the distribution of overlap (no. shared genes targetted
127
by candidate eQTL/veQTL SNPs) among all replicate combinations of the
128
simulated desiccation-candidate SNP lists. The histogram shows the distribution
129
of overlap measures for 1000 simulations. Density is shown by the black line and
130
the corresponding normal distribution is shown by the blue line. The red vertical
131
line shows the number of genes shared between the observed desiccation-
132
candidate eQTL/veQTL target lists, labelled as a percentile of the normal
133
distribution.
134
135
Figure S13: Heatmap of lab adaptation gene overlap among replicates. This
136
shows the overlap in putative lab-adaptation genes among the control (C1-C5)
137
and desiccation-selected (D1-D5) replicates. Red indicates that gene was
138
significantly differentiated between mass-bred and generation 13 pool for that
139
line; grey indicates no significant differentiation.
140
141
Figure S14: Observed lab adaptation gene occurrence in desiccation
142
replicates compared to expected. Histograms of the distribution of overlap
143
(no. shared genes) among each simulated desiccation-candidate gene list and the
144
corresponding simulated putative lab-adaptation gene list. The histogram shows
145
the distribution of overlap measures for 1000 simulations. Density is shown by
146
the black line and the corresponding normal distribution is shown by the blue
147
line. The red vertical line shows the number of genes shared between the
148
observed desiccation-candidate list and the putative lab-adaptation list, labelled
149
as a percentile of the normal distribution.
150
151
Figure S15: Observed lab adaptation candidate gene overlap compared to
152
expected. Histograms of the distribution of overlap (no. shared genes) among all
153
replicate combinations of the simulated control-replicate gene lists. The
154
histogram shows the distribution of overlap measures for 1000 simulations.
155
Density is shown by the black line and the corresponding normal distribution is
156
shown by the blue line. The red vertical line shows the number of genes shared
157
between the observed control candidate lists, labelled as a percentile of the
158
normal distribution.
159
160
Figure S16: Observed desiccation candidate gene overlap compared to
161
expected. Histograms of the distribution of overlap (no. shared genes) among all
162
replicate combinations of the simulated desiccation-candidate gene lists. The
163
histogram shows the distribution of overlap measures for 1000 simulations.
164
Density is shown by the black line and the corresponding normal distribution is
165
shown by the blue line. The red vertical line shows the number of genes shared
166
between the observed desiccation-candidate lists, labelled as a percentile of the
167
normal distribution.
168
169
Figure S17: Heatmap of desiccation candidate gene overlap among
170
replicates. Heatmap showing the overlap in putative desiccation-resistance
171
candidate among the desiccation-selected (D1-D5) replicates. Red indicates that
172
gene was significantly differentiated between mass-bred and generation 13 pool
173
for that replicate; grey indicates no significant differentiation.
174
175
Figure S18: Comparison of gene list overlap due to single-mapping SNPs
176
(SNPs that were mapped to only one gene) and overlap due to multi-
177
mapping SNPs (SNPs that mapped to more than one gene). Each point shows
178
the level of overlap (number of overlapping genes) calculated for one possible
179
pair, trio, quartet or the overlap among all 5 replicates in the control (panel A) or
180
selected line (panel B) treatment. The x- and y-axis are log10-transformed in
181
panel B for ease of visualisation. Grey dashed line shows the expected slope of 1,
182
i.e. no difference between the single- and multi-mapping SNP results on the
183
extent of overlap. The red line in panel B shows the line of best fit from observed
184
points.
185
186
Figure S19: Variance in allele frequency among replicates binned by
187
starting allele frequency. Mean standardized variance f versus starting allele
188
frequency among replicate generation 13 control lines (blue) and desiccation-
189
selected lines (red) measured at desiccation candidate loci (‘des’), lab-adaptation
190
candidate loci (‘lab’) or a random subset of neutral loci (‘neutral’). Loci were
191
assigned to bins of width 0.05 by starting (mass-bred) allele frequency, shown
192
on the x-axis.
193
194
Figure S20: Observed protein-protein interaction network overlap
195
compared to expected based on SNP resampling. Histograms of zero-order
196
PPI network size, number of listed genes with known interactions, and network
197
density for simulated networks. Size is represented as node count (A-E) and edge
198
count (F-J) distributions simulated by resampling from the entire mapped D.
199
melanogaster gene list (r6.01). Number of resampled genes with known
200
interactions (i.e. genes that appear in the background Drosophila PPI network) is
201
shown in panels K-O. Network density (the average connectivity across all
202
nodes) is shown in panels P-T. For network density only, the background
203
simulations were conducted by selecting subnetworks from the background
204
network of the same size (same no. nodes) as the observed network. For all
205
panels, the histogram shows the distribution of size measure for 1000
206
simulations. Distribution density is shown by the black line and the
207
corresponding normal distribution is shown by the blue line. The red vertical
208
line shows the size of the observed network, labelled as a percentile of the
209
normal distribution.
210
211
Figure S21: Observed protein-protein interaction network overlap
212
compared to expected based on gene resampling. Histograms of zero-order
213
PPI network overlap node count (A-J, S-AD) and edge count (K-T, AE-AN)
214
distributions simulated by resampling from the entire mapped D. melanogaster
215
gene list (r6.01). The histogram shows the distribution of overlap measures for
216
1000 simulations. Density is shown by the black line and the corresponding
217
normal distribution is shown by the blue line. The red vertical line shows the
218
overlap observed among the real networks, labelled as a percentile of the normal
219
distribution.
220
221
Figure S22: Relationship between gene length and likelihood of
222
representation in a candidate gene list. For each panel, the number of cases
223
(out of 1000) in which a gene appeared in the simulated candidate list is plotted
224
against the gene’s length (in base pairs, including 1000 bp up- and down-
225
stream). Grey points are genes not hit in the observed candidate gene list; red
226
points are genes observed in the real candidate gene list. Results are shown for
227
all individual control (A-E) and desiccation-selected replicate lines (F-J) as well
228
as for the lists of genes overlapping among replicates. Putative lab adaptation
229
genes were removed from desiccation-selected gene lists in each simulation and
230
in the observed data.
Table S1: Summary statistics for each replicate line. Total raw read count,
mean depth of genome coverage, number of SNPs called, Bonferroni-corrected Pvalue threshold, number of candidate SNPs, number of candidate genes and zeroorder PPI network sizes for each replicate line.
Table S2: Power analysis. Posterior probability of identifying at least one SNP
association with desiccation tolerance in a simulated causal gene (True Positive
Rate), a neutral gene located ~10Kbp from the causal variant (short-range False
Positive Rate), or a neutral gene located ~1Mbp from the causal variant (longrange False Positive Rate). The results are based on 2 million independent
simulations under the model presented in File S1.
Table S3: Inversion frequency results. Frequencies of diagnostic SNPs assayed
for various chromosomal inversions in the mass-bred and all generation 13
replicates.
Table S4: Desiccation candidate SNPs with putative high impact. SNPs
identified as putative high-impact by SNPEff (Cingolani et al. 2012) that
increased significantly in frequency between the mass-bred and generation 13 in
the desiccation replicates. These SNPs caused the loss or gain of a stop codon,
splice acceptor or donor site and hence may impact protein function. Gene
function is presented where this is known, using information from Flybase
(Santos et al. 2015) and references cited.