MOL 17061

SUPPLEMENTAL DATA AND STATISTICAL ANALYSIS

A final assessment of the modeling effort is to investigate how robust the predicted GI50 profiles are with respect to perturbations in the gene expression measurements. Initially, the model was subjected to a jackknife test in which each cell line was removed in turn and the model’s parameters were re-fitted. This caused a maximum reduction of the PCC of 4%, indicating no particular sensitivity to the exclusion of either sensitive or insensitive cell lines.

More importantly, two statistical issues need to be quantified: a) how well the model predicts the cell lines that are sensitive and, conversely, b) how well the model predicts the insensitive cell lines. Both measures would be of importance in a practical setting. The designation of sensitive cell lines is made by selecting a maximum concentration value for the GI50 response; cells that show 50% growth inhibition at lower concentrations are termed sensitive, while cells that are at or above this concentration are classified as insensitive. The dose response curves used to measure the GI50 range from 10^-4 to 10^-8 M, and the 60 cell lines show a range of concentrations in this interval. Supplemental Table 1 shows the cell lines designated as sensitive for GI50 concentrations less than 10^-6 M.

Using the model predictions for different selections of the critical maximum concentration, we can calculate how many cell lines are correctly predicted to be sensitive (true positives, tp), how many are correctly predicted to be insensitive (true negatives, tn), as well as the false positives (fp) and false negatives (fn). The ability of the method to discern true hits is gauged by the sensitivity, defined as the number of true positives divided by the total number of truly sensitive cell lines in the dataset, tp/(tp+fn).
Likewise, the ability to identify insensitive cell lines is given by the specificity, defined as the number of true negatives divided by the total number of truly insensitive cell lines in the dataset, tn/(tn+fp). These numbers are given in the second and third columns of Supplemental Table 1, and indicate that the model fully resolves the cell set at a critical GI50 of 10^-6 M for sensitivity.

To gauge a more realistic performance of the model, it is necessary to see how these predictions hold up when noise is introduced in the gene expression measurements. The gene expression values for each of the seven selected genes in the 60 cell lines were perturbed with Gaussian noise, such that expression values were randomly re-sampled from a normal distribution with the same mean as the measured value e and a variance given as c·e, where c is varied between 0.00 and 0.25. These calculations are summarized in Figure 5 using a critical GI50 of 10^-6 M for sensitivity.

The ability to discern insensitive cell lines is shown as the specificity in Supplemental Figure 1 and is not strongly dependent on the noise in the gene expression signal, with a reduction in specificity of less than 10% at the highest noise level studied. Identification of the sensitive cell lines shows a stronger dependency on the applied noise, with a 50% reduction in sensitivity at an applied 25% random Gaussian noise. If a reduction in the sensitivity of the model of at most 10% is accepted, less than 5% noise can be tolerated in the RT/PCR measurements. We calculated the corresponding correlation coefficients, and a 5% noise tolerance translates to repeat RT/PCR measurements having a minimum PCC of 0.70. Although strict, this level of accuracy is achievable with RT/PCR methods using the same primer design and employing multiple samples (Stahlberg et al., 2004).
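The noise analysis described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used for the paper: the function names, the `model` callable and the demo data are hypothetical, and the noise is drawn with a standard deviation of c·|e|, which is one possible reading of the spread described in the text (the text calls c·e a variance).

```python
import random


def confusion_counts(measured, predicted, threshold):
    """Count tp, tn, fp, fn for -log10(GI50) values.

    A cell line counts as sensitive when -log10(GI50) is above the
    threshold, i.e. its GI50 lies below the critical concentration.
    """
    tp = tn = fp = fn = 0
    for m, p in zip(measured, predicted):
        truly_sensitive = m > threshold
        called_sensitive = p > threshold
        if truly_sensitive and called_sensitive:
            tp += 1
        elif truly_sensitive:
            fn += 1
        elif called_sensitive:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """tp / (tp + fn); NaN when there are no truly sensitive lines."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


def specificity(tn, fp):
    """tn / (tn + fp); NaN when there are no truly insensitive lines."""
    return tn / (tn + fp) if (tn + fp) else float("nan")


def perturb(expression, c, rng):
    """Resample each expression value e from a normal distribution
    centred on e. ASSUMPTION: standard deviation = c * |e|."""
    return [rng.gauss(e, c * abs(e)) for e in expression]


def noise_sweep(model, expression, measured, threshold, levels,
                trials=200, seed=0):
    """Mean sensitivity and specificity at each noise level c.

    `model` is a hypothetical callable mapping one cell line's gene
    expression values to a predicted -log10(GI50).
    """
    rng = random.Random(seed)
    results = {}
    for c in levels:
        sens_total = spec_total = 0.0
        for _ in range(trials):
            predicted = [model(perturb(genes, c, rng))
                         for genes in expression]
            tp, tn, fp, fn = confusion_counts(measured, predicted,
                                              threshold)
            sens_total += sensitivity(tp, fn)
            spec_total += specificity(tn, fp)
        results[c] = (sens_total / trials, spec_total / trials)
    return results


if __name__ == "__main__":
    # Made-up data: four cell lines, two genes, a toy additive model.
    expression = [[3.5, 3.5], [2.0, 2.0], [3.2, 3.2], [2.5, 2.5]]
    measured = [7.0, 4.0, 6.4, 5.0]
    sweep = noise_sweep(sum, expression, measured, threshold=6.0,
                        levels=[0.0, 0.05, 0.10, 0.25])
    for c, (sens, spec) in sweep.items():
        print(f"c={c:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Averaging over many resampling trials at each noise level is a design choice of this sketch; the text does not state how many resamplings were used.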
REFERENCES

Stahlberg A, Hakansson J, Xian X, Semb H and Kubista M (2004) Properties of the reverse transcription reaction in mRNA quantification. Clin Chem 50:509-15.

SUPPLEMENTAL TABLES

Table 1. Characteristics of the predicted GI50 values. Statistical analysis showing the characteristics of the predicted GI50 values. Depending on how cells are classified as sensitive (high -log(GI50)) and insensitive (low -log(GI50)), the ability of the procedure to positively identify sensitive cell lines varies. At a critical maximum concentration of 10^-7 M almost 90% of the sensitive cell lines were detected without any false predictions; likewise, all insensitive cell lines could be predicted without error. In Supplemental Figure 1 these functions are measured when noise is applied to the data.

-log(GI50)   tp/(tp+fn)   tn/(tn+fp)   Number of sensitive cells   Cell lines tagged as sensitive
6.0          1.00         1.00         8                           T-47D, MCF-7, TK-10, IGROV1, HCC-2998, SR, NCI-H460, OVCAR-5
7.0          0.86         1.00         7                           T-47D, MCF-7, TK-10, IGROV1, HCC-2998, SR, NCI-H460
8.0          0.44         0.91         5                           T-47D, MCF-7, TK-10, IGROV1, HCC-2998

SUPPLEMENTAL FIGURES

Figure 1. Influence of noise on the prediction model. Since the experimental determination of gene expression levels is associated with some uncertainty, we modeled the robustness of the procedure by randomly re-sampling a Gaussian distribution with the same average but with varying width. The specificity and sensitivity measures show a steady decline with increased noise, as expected.