MOL 17061
SUPPLEMENTAL DATA AND STATISTICAL ANALYSIS
A final assessment of the modeling effort is to investigate how robust the predicted GI50
profiles are with respect to perturbations in the gene expression measurements. Initially, the
model was subjected to a jackknife test in which each cell line was removed in turn and the
model's parameters were re-fitted. This caused a maximum reduction in the Pearson correlation
coefficient (PCC) of 4%, indicating no particular sensitivity to the exclusion of either
sensitive or insensitive cell lines.
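The leave-one-out procedure described above can be sketched as follows. This is only an illustration: the prediction model is not specified in this supplement, so the sketch assumes a hypothetical stand-in, an ordinary least-squares linear model mapping the expression of the selected genes to -log10(GI50). It also assumes that the "reduction of the PCC" is relative and that the PCC after each refit is computed over all cell lines, including the one left out. The function names are hypothetical.

```python
import numpy as np


def pcc(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def fit_linear(X, y):
    """Hypothetical stand-in model: ordinary least squares with an intercept.
    X has one row per cell line and one column per gene."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predicted -log10(GI50) for each row (cell line) of X."""
    return coef[0] + X @ coef[1:]


def jackknife_max_pcc_drop(X, y):
    """Leave each cell line out in turn, refit, and return the full-data PCC
    together with the largest relative drop in PCC over all refits.
    Assumption: each refit is scored on all cell lines, including the held-out one."""
    full = pcc(predict_linear(fit_linear(X, y), X), y)
    worst = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # boolean mask that drops cell line i
        coef = fit_linear(X[keep], y[keep])    # refit on the remaining cell lines
        r = pcc(predict_linear(coef, X), y)
        worst = max(worst, (full - r) / full)  # relative reduction of the PCC
    return full, worst
```

With 60 cell lines and seven genes, `X` would have shape (60, 7) and `y` would hold the measured -log10(GI50) values.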
More importantly, two statistical properties need to be quantified: a) how well the model
predicts the cell lines that are sensitive and, conversely, b) how well it predicts the
insensitive cell lines. Both measures would be important in a practical setting.
Sensitive cell lines are designated by selecting a critical maximum concentration for the
GI50 response: cells that show 50% growth inhibition at lower concentrations are termed
sensitive, while cells at or above this concentration are classified as insensitive. The
dose-response curves used to measure the GI50 span 10⁻⁴ to 10⁻⁸ M, and the 60 cell lines
show a range of GI50 values within this interval. Supplemental Table 1 shows the cell lines
designated as sensitive for GI50 concentrations less than 10⁻⁶ M.
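The sensitive/insensitive designation above amounts to a simple threshold rule. The minimal sketch below (function names are hypothetical) applies it both to GI50 in molar units and to the -log10(GI50) scale used in Supplemental Table 1, where a critical concentration of 10⁻⁶ M corresponds to 6.0 and the inequality flips because a lower GI50 gives a higher -log10(GI50).

```python
import numpy as np


def is_sensitive(gi50_molar, critical_molar=1e-6):
    """True where GI50 is below the critical maximum concentration (sensitive);
    False where it is at or above that concentration (insensitive)."""
    return np.asarray(gi50_molar, dtype=float) < critical_molar


def is_sensitive_neglog(neg_log_gi50, critical_neglog=6.0):
    """Same rule on the -log10(GI50) scale: a lower GI50 means a higher
    -log10(GI50), so sensitive lines lie strictly above the critical value."""
    return np.asarray(neg_log_gi50, dtype=float) > critical_neglog
```

For example, `is_sensitive([1e-8, 1e-6, 5e-5])` marks only the first line as sensitive, because a GI50 exactly at the critical concentration counts as insensitive.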
Using the model predictions for different choices of the critical maximum concentration, we
can count how many cell lines are correctly predicted to be sensitive (true positives, tp) and
how many are correctly predicted to be insensitive (true negatives, tn), as well as the false
positives (fp) and false negatives (fn). The ability of the method to discern true hits is
gauged by the sensitivity, defined as the number of true positives divided by the total number
of truly sensitive cell lines in the dataset, tp/(tp+fn). Likewise, the ability to identify
insensitive cell lines is given by the specificity, defined as the number of cell lines
correctly predicted to be insensitive divided by the total number of truly insensitive cell
lines in the dataset, tn/(tn+fp).
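The two measures defined above can be computed directly from measured and predicted -log10(GI50) values once a critical value is chosen. This is a minimal sketch; the function names are hypothetical, and it assumes that both classes are present in the measured data so that neither denominator is zero.

```python
import numpy as np


def confusion_counts(measured_neglog, predicted_neglog, critical_neglog):
    """Count tp, tn, fp, fn, calling a cell line sensitive when its
    -log10(GI50) is strictly above the critical value."""
    truth = np.asarray(measured_neglog, dtype=float) > critical_neglog
    pred = np.asarray(predicted_neglog, dtype=float) > critical_neglog
    tp = int(np.sum(pred & truth))    # correctly predicted sensitive
    tn = int(np.sum(~pred & ~truth))  # correctly predicted insensitive
    fp = int(np.sum(pred & ~truth))   # predicted sensitive, actually insensitive
    fn = int(np.sum(~pred & truth))   # predicted insensitive, actually sensitive
    return tp, tn, fp, fn


def sensitivity_specificity(measured_neglog, predicted_neglog, critical_neglog):
    """Sensitivity tp/(tp+fn) and specificity tn/(tn+fp)."""
    tp, tn, fp, fn = confusion_counts(measured_neglog, predicted_neglog, critical_neglog)
    return tp / (tp + fn), tn / (tn + fp)
```

Evaluating `sensitivity_specificity` at critical values of 6.0, 7.0 and 8.0 gives the kind of summary reported in the second and third columns of Supplemental Table 1.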
These numbers are given in the second and third columns of Supplemental Table 1 and
indicate that the model fully resolves the cell set at a critical GI50 of 10⁻⁶ M. To obtain
a more realistic measure of the model's performance, it is necessary to see how these
predictions hold up when noise is introduced into the gene expression measurements. The
expression of each of the seven selected genes in the 60 cell lines was perturbed with
Gaussian noise: each expression value was randomly re-sampled from a normal distribution
with the same mean as the measured value $e$ and a standard deviation $\sigma = c \cdot e$,
where $c$ was varied between 0.00 and 0.25. These calculations are summarized in Figure 5
using a critical GI50 of 10⁻⁶ M for sensitivity. The ability to discern insensitive cell
lines is shown as the specificity in Supplemental Figure 1; it depends only weakly on the
noise in the gene expression signal, decreasing by less than 10% at the highest noise level
studied. Identification of the sensitive cell lines depends more strongly on the applied
noise, with sensitivity reduced by 50% at 25% random Gaussian noise. If the sensitivity of
the model may drop by at most 10%, less than 5% noise can be tolerated in the RT/PCR
measurements. We calculated the corresponding correlation coefficients: for 5% noise, repeat
RT/PCR measurements must have a minimum PCC of 0.70. Although strict, this level of
accuracy is achievable with RT/PCR methods that use the same primer design and multiple
samples (Stahlberg et al., 2004).
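The noise study described above can be sketched as a Monte Carlo loop. The sketch is illustrative, not the authors' code, and rests on three assumptions. First, `predict` is a hypothetical callable that maps a (cell lines × genes) expression matrix to predicted -log10(GI50). Second, the width $c \cdot e$ is treated as a standard deviation; if it was meant as a variance, the scale would instead be $\sqrt{c\,e}$. Third, sensitivity and specificity are averaged over a fixed number of trials per noise level, and the measured data must contain both sensitive and insensitive lines.

```python
import numpy as np


def perturb_expression(expr, c, rng):
    """Resample each expression value e from N(e, (c*e)^2): the noise standard
    deviation is proportional to the measured value (assumed interpretation)."""
    expr = np.asarray(expr, dtype=float)
    return rng.normal(loc=expr, scale=c * np.abs(expr))


def noise_scan(expr, predict, measured_neglog, critical_neglog, c_values,
               n_trials=200, seed=0):
    """For each noise level c, return (c, mean sensitivity, mean specificity)
    over n_trials perturbed copies of the expression matrix."""
    rng = np.random.default_rng(seed)
    truth = np.asarray(measured_neglog, dtype=float) > critical_neglog
    results = []
    for c in c_values:
        sens, spec = [], []
        for _ in range(n_trials):
            pred = np.asarray(predict(perturb_expression(expr, c, rng))) > critical_neglog
            tp = np.sum(pred & truth)
            fn = np.sum(~pred & truth)
            tn = np.sum(~pred & ~truth)
            fp = np.sum(pred & ~truth)
            sens.append(tp / (tp + fn))
            spec.append(tn / (tn + fp))
        results.append((c, float(np.mean(sens)), float(np.mean(spec))))
    return results
```

At $c = 0$ the perturbed matrix equals the measured one, so the scan reproduces the noiseless sensitivity and specificity. Increasing $c$ toward 0.25 traces curves of the kind shown in Supplemental Figure 1.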
REFERENCES
Stahlberg A, Hakansson J, Xian X, Semb H and Kubista M (2004) Properties of the reverse
transcription reaction in mRNA quantification. Clin Chem 50:509-15.
SUPPLEMENTAL TABLES
Table 1. Characteristics of the predicted GI50 values.
Statistical analysis showing the characteristics of the predicted GI50 values. Depending on
how cells are classified as sensitive (high -log(GI50)) or insensitive (low -log(GI50)), the
ability of the procedure to positively identify sensitive cell lines varies. At a critical
maximum concentration of 10⁻⁷ M, almost 90% of the sensitive cell lines were detected
without any false positive predictions; likewise, all insensitive cell lines were predicted
without error. Supplemental Figure 1 shows how these measures change when noise is applied
to the data.
Critical      tp/(tp+fn)   tn/(tn+fp)   Number of         Cell lines tagged as sensitive
-log(GI50)                              sensitive cells
6.0           1.00         1.00         8                 T-47D, MCF-7, TK-10, IGROV1, HCC-2998, SR, NCI-H460, OVCAR-5
7.0           0.86         1.00         7                 T-47D, MCF-7, TK-10, IGROV1, HCC-2998, SR, NCI-H460
8.0           0.44         0.91         5                 T-47D, MCF-7, TK-10, IGROV1, HCC-2998
SUPPLEMENTAL FIGURES
Figure 1. Influence of noise on the prediction model. Because the experimental determination
of gene expression levels is associated with some uncertainty, we modeled the robustness of
the procedure by randomly re-sampling each expression value from a Gaussian distribution
with the same mean but varying width. Both the specificity and the sensitivity decline
steadily with increasing noise, as expected.