* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Download Supplement. 2014. Nature Methods.
Survey
Document related concepts
Biology and consumer behaviour wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
Minimal genome wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Gene expression profiling wikipedia , lookup
Polycomb Group Proteins and Cancer wikipedia , lookup
Transcript
Supplementary Figure 1. Pair-wise comparison of MEF cells. The matrix shows pair-‐wise comparisons between first seven MEF single-‐cell measurements. The x and y axis of each plot show log10 RPM estimates in the cell corresponding to a given column and row respectively. The set of smoothed scatter plots on the lower left shows the overall correspondence between the transcript abundances estimated in two given cells. The upper right corner shows three-‐component mixture model, separating genes that “drop-‐out” in one of the cells (green component shows drop-‐out events in the column cell, red component shows drop-‐out events in the row cell). The correlated component is shown in blue. The percent of genes within each component is shown in the legend. Some cells, such as MEF_53 show consistently higher drop-‐out rates. Nature Methods: doi:10.1038/nmeth.2967 Supplementary Figure 2. Pair-wise comparison of cells from 16-cell mouse embryo measured by Deng et al. Similar to Supp. Figure 1, the matrix shows pair-‐wise comparisons between first seven four cells from 16-‐cell embryo measured by Deng et al. Despite the reduced noise of the newer protocol, the drop-‐out events are notable, and are more common in some cells then others (e.g. cell 1-‐13 shows ~30% drop-‐out most comparisons, whereas cell 1-‐12 shows ~15%). Nature Methods: doi:10.1038/nmeth.2967 GURSïRXWrDWH 0.0 0.2 0.4 0.6 0.8 1.0 a. 1 3 4 3 4 log10(rpm) 0 1 2 IUHTXHQF\ 10000 20000 30000 log10(rpm) 0 c. 2 PHDUVRQU 0.80 0.85 0.90 0.95 1.00 b. 0 0 1 2 3 4 5 log10(rpm) 3HDUVRQUEHWZHHQFHOOV 0.85 0.90 0.95 1.00 d. 1%LQWHUFHSW 1%VORSH GURSRXWLQWHUFHSW GURSRXWVORSH Supplementary Figure 3. Robustness and accuracy of the model fitting. a. Examples of predicted (solid lines) and true (dashed) drop-‐out dependencies obtained on simulated data (see simulation description below). b. Correlation of predicted drop-‐out rates is shown as a function of expression magnitude based on 100 random simulations (95% confidence band shown). Correlation remains relatively high (r~0.9) even for genes expressed at low magnitudes. c. Distribution of expression magnitudes illustrates that relatively few genes would be impacted by slight decrease in correlation at the extreme expression magnitudes. d. Dependency of the model parameters on the selection criteria for “reliable” genes whose median expression magnitude estimates are used for fitting the error models. The plot shows correlation between the model parameters obtained using the default estimate (obtained requiring at least 10 non-‐failed measurements of a gene) and other values of the minimum number of non-‐failed measurements (cells) per gene. e. Performance of the proposed direct and reciprocal adjusted similarity measures for different values of stabilization parameter k. The boxplots show mean fraction of cells classified correctly is shown is shown for the Pearson measure (blue), direct (red) and reciprocal (green) adjusted measures for different values of k (x-‐axis). Boxplots illustrate variability of the performance based on 50 random single-‐cell dataset simulations. In each simulation, 20 cells of each type were sampled. Simulation of single-cell data. In each iteration, we simulated 50 single-‐cell measurements, using a set of 5000 genes. The e. expression magnitudes were drawn from the 0.90 empirical distribution of the MEF expression 0.85 magnitudes in Islam et al. (across all of the 0.80 cells). The library size was chosen for each 0.75 simulated cell from a uniform distribution 0.70 0.65 U(0.4-‐3 million reads), with the negative 0.60 binomial overdispersion parameter (size) 0.55 chosen from a uniform distribution U(0.3, 2). The drop-‐out dependency on the expression magnitudes were simulated using logistic function (intercept ~ N(1.5,0.5), slope UHFLSURFDO GLUHFW ~N(3,1)) to approximately resemble dependencies observed in the Islam et al. data. For a-‐c. 500 (10%) differentially expressed genes were simulated. For classification measurements shown in e., a smaller number (50) of differentially expressed genes were used to pose a more challenging classification problem. Nature Methods: doi:10.1038/nmeth.2967 1 14 0.99 0.975 0.95 0.925 0.9 0.875 1 0.85 0.99 0.95 0.975 0.9 0.925 0.875 0.85 VR Q DU 4 6 8 10 12 PLQLPXPQXPEHURIFHOOVUHTXLUHGGXULQJILWWLQJ 3H PHDQIUDFWLRQFRUUHFWO\FODVVLILHG 2 0.6 1.0 a. genes higher in ESCs 0.8 0.6 0.2 SCDE (AUC=0.81) DESeq (AUC=0.74) CuffDiff2 (AUC=0.59) SCA (AUC=0.73) 0.0 0.0 0.00 0.4 true positive rate 0.4 0.3 0.2 0.1 true positive rate 0.5 0.02 0.04 0.06 0.08 0.0 0.10 0.2 0.4 0.6 0.8 1.0 false posititive rate false posititive rate 0.8 0.6 0.2 SCDE (AUC=0.84) DESeq (AUC=0.82) CuffDiff2 (AUC=0.67) SCA (AUC=0.80) 0.0 0.0 0.00 0.4 true positive rate 0.4 0.3 0.2 0.1 true positive rate 0.5 0.6 1.0 b. genes higher in MEFs 0.02 0.04 0.06 false positive rate 0.08 0.10 0.0 0.2 0.4 0.6 0.8 1.0 false positive rate Supplementary Figure 4. Performance in detecting genes significantly upregulated in ES or MEF cells. Similar to the Figure 2c of the main manuscript, the ROC curves are used to assess the performance of the differential expression analysis methods. In this case, the genes were ranked based on the significance of their up-‐ regulation in ES (a.) or MEF (b.) cells. Most methods perform better in detecting genes that are significantly higher in MEFs, which is likely related to much higher overall RNA-‐seq signals detected in MEF cells (as opposed to ES cells with relatively low RNA content and fewer detected reads). Nature Methods: doi:10.1038/nmeth.2967 8-cell vs. 16-cell embryo reciprocal 0 0.54 0.56 5 0.58 10 0.60 15 0.62 number of discrepant edges fraction correctly classified 0.64 20 0 0.6 2 0.7 4 0.8 6 0.9 number of discrepant edges fraction correctly classified 1.0 Supplementary Figure 5. Clustering performance on reciprocal reciprocal early embryo cells direct direct Pearson Pearson measured by Deng et al. Bray−Curtis Bray−Curtis a. Separating cells derived from 8-‐cell and 16-‐cell embryos. The left plot shows fraction of cells correctly classified by the top-‐level clustering split as a function of the number of top differentially expressed 0 2000 4000 6000 8000 10000 0 2000 4000 6000 8000 10000 genes removed (as in Figure 2d in the main manuscript). top differential genes omitted top differential genes omitted The right plot shows another b. 8-cell vs. early blastocyst measure of clustering reciprocal performance – number of direct Pearson discrepant edges (lower Bray−Curtis number corresponds to better performance, see figures c,d below). 10 cells of each type were randomly sampled in each iteration. By reciprocal direct both measures, the Pearson reciprocal distance measure Bray−Curtis shows better performance. b. Separating cells derived 0 2000 4000 6000 8000 10000 0 2000 4000 6000 8000 10000 from 8-‐cell embryos and top differential genes omitted top differential genes omitted early blastocysts. This is a more challenging contrast, ● 8-cell embryo cells c. and the top-‐level split ● 16-cell embryo cells discrepant edges identified by all of the examined methods does poorly in separating the two types of cells. However the “number of discrepant edges” (right plot) demonstrates that the drop-‐ out weighed distance measures (reciprocal and d. direct) tend to group 8-‐cell embryo cells in one part of the resulting tree. c,d. Lower number of discrepant edges corresponds to better classification performance. All measured 8-‐cell and 16-‐ cell embryo cells are clustered using reciprocal-‐weighted distance (c.) or Bray-‐Curtis distance (d.). The discrepant edges (highlighted in red) mark all branches of the tree that contain more than one type of cells within them. Here, a clustering derived using reciprocal-‐weighted distance (c.) groups most of the 8-‐cell embryo cells into one branch and the resulting number of discrepant edges is small. By contrast, clustering derived using Bryan-‐Curtis distance distributes 8-‐cell embryo cells more evenly, resulting in many more mixed branches, and a higher number of discrepant edges. a. ● ● ● ● ●● ● ●● ● ● ●● Bray-Curtis ●● ●● ● ●●● ●● ●●● ●● ● ● ●● ●● ● ●● ●●●● ●● ●● ● ●●● ●● ●●●● ● ● ●●●● ● ● ● ●● ●●●●● ●●●● ●● ● ● ●● ● ● ● ● ●●● ● ●●●●●●● ●●●●●● ●●●● ●●●●●●●●● ●●●● ●● ●● Nature Methods: doi:10.1038/nmeth.2967 ● ●●● ● ●●● ●●●● ●●●●●● ● ●● ● ●●●●● ●● ●●