Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
1 Supplementary File Captions 2 3 File S1: Supplementary Methods and Results. 4 5 File S2: Candidate SNPs and genes. SNPs identified as candidates in each 6 control and desiccation generation 13 replicate with respect to the mass-bred 7 population. SNP effect and gene annotation (SNPEff results) and candidate gene 8 lists for each pool are reported. For the desiccation-selected replicate line gene 9 lists, whether each gene was present in the putative lab adaptation gene list 10 (genes occurring in at least one control line replicate) is also indicated. 11 12 File S3: Allele counts and SNP frequency results for candidate SNPs identified 13 in each control and desiccation-selected replicate. See included Readme file for 14 details on data format. 15 16 File S4: Category enrichment for candidate genes. Gene enrichment test 17 results for gene lists significantly differentiated between the mass-bred and 18 generation 13 pools, for all control and desiccation-selected replicates. 19 20 21 Figure S1: Number of SNPs called in each mass bred—generation 13 22 comparison. Results are presented for different levels of minimum coverage 23 depth (x-axis) and minimum count of minor allele across both pools (panels). 24 The combination of 0 minimum coverage depth and 3 minimum minor allele 25 count was used for further analysis. 26 27 Figure S2: Chromosome level plots of genomic regions enriched in SNPs 28 that showed significant differentiation between generation 0 and 29 generation 13 pools. Grey line shows the proportion of SNPs that were 30 significantly-differentiated in 100 Kbp windows; black line shows the proportion 31 in 1 Mb windows. Horizontal black line segments highlight regions containing 32 100 Kbp windows with a proportion of significantly-differentiated SNPs in the 33 80th percentile (genome-wide). Where such windows occurred within 1 Mbp of 34 each other, they were combined into a single region (see Methods for analysis 35 performed on these regions). The pale blue line shows the proportion of SNPs 36 with allele frequency <0.1 at generation 0 (‘low-frequency SNPs’) per 100 Kbp 37 window; the dark blue line shows the same parameter calculated over 1 Mbp 38 windows. Asterisks indicate cases where the proportion of low-frequency SNPs 39 showed a significant positive correlation with the proportion of significantly- 40 differentiated SNPs (see Figure S6) after excluding windows without 41 significantly-differentiated SNPs. 42 43 Figure S3: Distribution of sequencing coverage depth in the generation 13 44 pool for candidate SNPs (solid coloured line) and 100,000 randomly-selected 45 SNPs (coloured dashed line) for each replicate. The distribution is also shown for 46 the same candidate SNPs (solid black line) and randomly-selected SNPs (dashed 47 grey line) for the generation 0 mass-bred pool. 48 49 Figure S4: Boxplots of frequency change (from generation 0 to generation 50 13 of the selection regime) in each replicate line for candidate SNPs. Boxes 51 display the range of observations (whiskers), the median value (thick line), first 52 and third quartiles (box ends), and outliers (points). 53 54 Figure S5: Correlation between the number of true associations and the 55 number of false associations detected 10Kbp (left panel) and 1Mbp (right 56 panel) away from a causal variant explaining 1% of broad-sense heritability. The 57 results are based on 2 million independent simulations under the model 58 presented in File S1. The densities are colour-coded on a log-scale. Excluding the 59 simulations where no association was found, we observed a positive correlation 60 between number of true associations and short-range false associations and a 61 negative correlation between number of true associations and long-range false 62 associations, suggesting (rare) short-range false positives could be due to linkage 63 while long-range false positive could be due to drift alone. 64 65 Figure S6: Scatterplots of the proportion of significantly-differentiated 66 SNPs versus the proportion of low-starting-frequency SNPs (allele 67 frequency < 0.1) across 100 Kbp genomic windows for each chromosome 68 and each replicate line. Between 235 and 320 windows were examined per 69 chromosome. Dashed pink line and solid red line show linear relationships for 70 the full dataset, and excluding windows without significantly-differentiated 71 SNPs, respectively. Excluding windows without significantly-differentiated SNPs 72 left between 3 and 230 datapoints. R-squared and significance values are 73 reported where P < 0.05. Note y-axis scale differs between panels. 74 75 Figure S7: Observed lab adaptation candidate SNP overlap compared to 76 expected. Histograms of the distribution of overlap (no. shared SNPs) among all 77 replicate combinations of the simulated control-replicate gene lists. The 78 histogram shows the distribution of overlap measures for 1000 simulations. 79 Density is shown by the black line and the corresponding normal distribution is 80 shown by the blue line. The red vertical line shows the number of genes shared 81 between the observed control candidate lists, labelled as a percentile of the 82 normal distribution. 83 84 Figure S8: Observed desiccation candidate SNP overlap compared to 85 expected. Histograms of the distribution of overlap (no. shared SNPs) among all 86 replicate combinations of the simulated desiccation-replicate gene lists. The 87 histogram shows the distribution of overlap measures for 1000 simulations. 88 Density is shown by the black line and the corresponding normal distribution is 89 shown by the blue line. The red vertical line shows the number of genes shared 90 between the observed desiccation candidate lists, labelled as a percentile of the 91 normal distribution. 92 93 Figure S9: Enrichment of candidate SNPs in genomic location categories. 94 The arrows show the difference in frequency of SNPs mapping to a genomic 95 location type, between the list of all SNPs and the list of candidate SNPs only. 96 Black arrows indicate this difference was significant by a 2 test (P < 0.05), grey 97 arrows indicate no significant difference. Separate calculations were performed 98 for each control and desiccation-line replicate comparison between the mass- 99 bred and generation 13. 100 101 Figure S10: Enrichment of candidate SNPs in genetic effect categories. The 102 arrows show the difference in frequency of SNPs mapping to a genomic effect 103 type, between the list of all SNPs and the list of candidate SNPs (only those 104 within 1000 bp of an annotated gene in each case). Low: either a synonymous 105 variant, a variant producing a premature start codon, a variant located near but 106 not within a splice site (within 1-3 bases of an exon or 3-8 bases of an intron), or 107 a synonymous or non-synonymous mutation affecting a start or stop codon; 108 Moderate: a non-synonymous coding variant; High: variant causing loss of a start 109 codon, loss or gain of a stop codon or loss or gain of a splice donor or acceptor 110 site (2 bases after coding exon start or 2 bases before coding exon end); 111 Modifier: any other variant, including upstream, downstream, 5’- and 3’-UTR 112 locations. Black arrows indicate this difference was significant by a 2 test (P < 113 0.05), grey arrows indicate no significant difference. Separate calculations were 114 performed for each control and desiccation-line replicate. 115 116 Figure S11: Observed eQTL/veQTL overlap compared to expected. 117 Histograms of the null distribution of the number of candidate SNPs expected to 118 hit known eQTL/veQTL loci (Huang et al. 2015) for each control (A-E) and 119 desiccation-selected (F-J) replicate line. The histogram shows the distribution of 120 eQTL/veQTL SNP numbers for 1000 simulations. Density is shown by the black 121 line and the corresponding normal distribution is shown by the blue line. The 122 red vertical line shows the number of known eQTL/veQTL loci hit in the real 123 candidate SNP lists, labelled as a percentile of the normal distribution. 124 125 Figure S12: Observed eQTL/veQTL target gene overlap compared to 126 expected. Histograms of the distribution of overlap (no. shared genes targetted 127 by candidate eQTL/veQTL SNPs) among all replicate combinations of the 128 simulated desiccation-candidate SNP lists. The histogram shows the distribution 129 of overlap measures for 1000 simulations. Density is shown by the black line and 130 the corresponding normal distribution is shown by the blue line. The red vertical 131 line shows the number of genes shared between the observed desiccation- 132 candidate eQTL/veQTL target lists, labelled as a percentile of the normal 133 distribution. 134 135 Figure S13: Heatmap of lab adaptation gene overlap among replicates. This 136 shows the overlap in putative lab-adaptation genes among the control (C1-C5) 137 and desiccation-selected (D1-D5) replicates. Red indicates that gene was 138 significantly differentiated between mass-bred and generation 13 pool for that 139 line; grey indicates no significant differentiation. 140 141 Figure S14: Observed lab adaptation gene occurrence in desiccation 142 replicates compared to expected. Histograms of the distribution of overlap 143 (no. shared genes) among each simulated desiccation-candidate gene list and the 144 corresponding simulated putative lab-adaptation gene list. The histogram shows 145 the distribution of overlap measures for 1000 simulations. Density is shown by 146 the black line and the corresponding normal distribution is shown by the blue 147 line. The red vertical line shows the number of genes shared between the 148 observed desiccation-candidate list and the putative lab-adaptation list, labelled 149 as a percentile of the normal distribution. 150 151 Figure S15: Observed lab adaptation candidate gene overlap compared to 152 expected. Histograms of the distribution of overlap (no. shared genes) among all 153 replicate combinations of the simulated control-replicate gene lists. The 154 histogram shows the distribution of overlap measures for 1000 simulations. 155 Density is shown by the black line and the corresponding normal distribution is 156 shown by the blue line. The red vertical line shows the number of genes shared 157 between the observed control candidate lists, labelled as a percentile of the 158 normal distribution. 159 160 Figure S16: Observed desiccation candidate gene overlap compared to 161 expected. Histograms of the distribution of overlap (no. shared genes) among all 162 replicate combinations of the simulated desiccation-candidate gene lists. The 163 histogram shows the distribution of overlap measures for 1000 simulations. 164 Density is shown by the black line and the corresponding normal distribution is 165 shown by the blue line. The red vertical line shows the number of genes shared 166 between the observed desiccation-candidate lists, labelled as a percentile of the 167 normal distribution. 168 169 Figure S17: Heatmap of desiccation candidate gene overlap among 170 replicates. Heatmap showing the overlap in putative desiccation-resistance 171 candidate among the desiccation-selected (D1-D5) replicates. Red indicates that 172 gene was significantly differentiated between mass-bred and generation 13 pool 173 for that replicate; grey indicates no significant differentiation. 174 175 Figure S18: Comparison of gene list overlap due to single-mapping SNPs 176 (SNPs that were mapped to only one gene) and overlap due to multi- 177 mapping SNPs (SNPs that mapped to more than one gene). Each point shows 178 the level of overlap (number of overlapping genes) calculated for one possible 179 pair, trio, quartet or the overlap among all 5 replicates in the control (panel A) or 180 selected line (panel B) treatment. The x- and y-axis are log10-transformed in 181 panel B for ease of visualisation. Grey dashed line shows the expected slope of 1, 182 i.e. no difference between the single- and multi-mapping SNP results on the 183 extent of overlap. The red line in panel B shows the line of best fit from observed 184 points. 185 186 Figure S19: Variance in allele frequency among replicates binned by 187 starting allele frequency. Mean standardized variance f versus starting allele 188 frequency among replicate generation 13 control lines (blue) and desiccation- 189 selected lines (red) measured at desiccation candidate loci (‘des’), lab-adaptation 190 candidate loci (‘lab’) or a random subset of neutral loci (‘neutral’). Loci were 191 assigned to bins of width 0.05 by starting (mass-bred) allele frequency, shown 192 on the x-axis. 193 194 Figure S20: Observed protein-protein interaction network overlap 195 compared to expected based on SNP resampling. Histograms of zero-order 196 PPI network size, number of listed genes with known interactions, and network 197 density for simulated networks. Size is represented as node count (A-E) and edge 198 count (F-J) distributions simulated by resampling from the entire mapped D. 199 melanogaster gene list (r6.01). Number of resampled genes with known 200 interactions (i.e. genes that appear in the background Drosophila PPI network) is 201 shown in panels K-O. Network density (the average connectivity across all 202 nodes) is shown in panels P-T. For network density only, the background 203 simulations were conducted by selecting subnetworks from the background 204 network of the same size (same no. nodes) as the observed network. For all 205 panels, the histogram shows the distribution of size measure for 1000 206 simulations. Distribution density is shown by the black line and the 207 corresponding normal distribution is shown by the blue line. The red vertical 208 line shows the size of the observed network, labelled as a percentile of the 209 normal distribution. 210 211 Figure S21: Observed protein-protein interaction network overlap 212 compared to expected based on gene resampling. Histograms of zero-order 213 PPI network overlap node count (A-J, S-AD) and edge count (K-T, AE-AN) 214 distributions simulated by resampling from the entire mapped D. melanogaster 215 gene list (r6.01). The histogram shows the distribution of overlap measures for 216 1000 simulations. Density is shown by the black line and the corresponding 217 normal distribution is shown by the blue line. The red vertical line shows the 218 overlap observed among the real networks, labelled as a percentile of the normal 219 distribution. 220 221 Figure S22: Relationship between gene length and likelihood of 222 representation in a candidate gene list. For each panel, the number of cases 223 (out of 1000) in which a gene appeared in the simulated candidate list is plotted 224 against the gene’s length (in base pairs, including 1000 bp up- and down- 225 stream). Grey points are genes not hit in the observed candidate gene list; red 226 points are genes observed in the real candidate gene list. Results are shown for 227 all individual control (A-E) and desiccation-selected replicate lines (F-J) as well 228 as for the lists of genes overlapping among replicates. Putative lab adaptation 229 genes were removed from desiccation-selected gene lists in each simulation and 230 in the observed data. Table S1: Summary statistics for each replicate line. Total raw read count, mean depth of genome coverage, number of SNPs called, Bonferroni-corrected Pvalue threshold, number of candidate SNPs, number of candidate genes and zeroorder PPI network sizes for each replicate line. Table S2: Power analysis. Posterior probability of identifying at least one SNP association with desiccation tolerance in a simulated causal gene (True Positive Rate), a neutral gene located ~10Kbp from the causal variant (short-range False Positive Rate), or a neutral gene located ~1Mbp from the causal variant (longrange False Positive Rate). The results are based on 2 million independent simulations under the model presented in File S1. Table S3: Inversion frequency results. Frequencies of diagnostic SNPs assayed for various chromosomal inversions in the mass-bred and all generation 13 replicates. Table S4: Desiccation candidate SNPs with putative high impact. SNPs identified as putative high-impact by SNPEff (Cingolani et al. 2012) that increased significantly in frequency between the mass-bred and generation 13 in the desiccation replicates. These SNPs caused the loss or gain of a stop codon, splice acceptor or donor site and hence may impact protein function. Gene function is presented where this is known, using information from Flybase (Santos et al. 2015) and references cited.