Supplementary Figures

Standard Exon Array Quality Control Analysis

CS1 and CS2 Array QC

Every array image was initially analysed using the Affymetrix Expression Console software version 1.2 (www.affymetrix.com). QC parameters were examined as per Affymetrix guidelines (Affymetrix White Paper: Quality Assessment of Exon and Gene Arrays, Revision Date: 2007-04-06, Revision Version: 1.1). The purpose of examining these was to find outlier arrays that differed substantially from the majority of the cohort and to determine the reproducibility and consistency of signals across arrays.

Sample Quality Metrics

Six metrics are routinely used to assess the sample and hybridization quality of arrays:

| Metric | Meaning | Description |
| --- | --- | --- |
| Pos vs neg AUC (AUC) | Area under the curve (AUC) value for a ROC curve plotting detection of positive controls against false detection of negative controls. | Overall data quality measurement. Perfect data scores 1. Scores of 0.5 mean there is no difference between positive and negative controls. |
| All probeset mean (APM) | Mean signal of all probesets in the analysis. | Detects bright or dim arrays. Value should be consistent between replicates. |
| All probeset RLE mean (RLE) | Mean absolute relative log expression. | Comparison of a probeset signal to the median signal. Higher values indicate the signal is different from the others. |
| Percent present (%P) | Percent of exon-level probesets detected based on the DABG algorithm. | Values should be similar for similar samples. FFPE varies more and tends to be lower. Samples should not be more than 10% lower than typical values. |
| 5’ hybridization and labeling controls | Bacterial spikes, labeled independently but added to the hybridization cocktail; 5’ based expression. | Four spikes are input in increasing concentration, so signal values should be BioB < BioC < BioD < Cre. |
| 3’ hybridization and labeling controls | As above, but for 3’ based expression. | As above, but for 3’ based expression. |

5’ and 3’ bacterial spikes were plotted as simple bar graphs to ensure the concentration ratios were as expected. For all arrays this was the case. An example plot is shown below.

For the remaining metrics, an array was considered an outlier if the metric value was more than 10% above or below the mean value for the whole cohort. No array failed on more than two metrics; therefore all arrays were retained for downstream analysis.

In addition, PCA (Principal Components Analysis) plots were used to examine the overall variance structure in the two cohorts. PCA was carried out using all 1.4 million probesets on the array, after RMA normalisation but prior to any probeset filtering. In the first two Principal Components, there were no distinct outliers for either cohort.

Example of bacterial spike-in controls (CS2):

Supplementary Figure 1. Bar charts showing the relationship between the spiked-in bacterial hybridisation controls for CS2 (A: 5’, B: 3’). For each sample, red, green, blue and cyan bars represent the expression of BioB, BioC, BioD and Cre, respectively. The expression levels for each sample show the correct trend, where BioB < BioC < BioD < Cre.

Summary of QC metrics (CS1 and CS2):

Supplementary Figure 2. Radial plots summarising the QC metrics for CS1 (A) and CS2 (B). Each coloured solid line represents the metric for the samples in the cohort, and the corresponding dashed lines represent +/- 10% of the mean value for the cohort. Where the solid line crosses the dashed line, the sample can be considered an outlier for that metric.
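The outlier rule and the spike-in ordering check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used for the study: the function names, the dictionary layout and the sample values are all hypothetical, and the actual metrics were produced by Expression Console.

```python
from statistics import mean

SPIKE_ORDER = ["BioB", "BioC", "BioD", "Cre"]  # expected increasing signal


def flag_outliers(metric_values, tolerance=0.10):
    """Return the sample ids whose value lies more than `tolerance`
    (as a fraction of the cohort mean) above or below the cohort mean.

    metric_values: {sample_id: value} for a single QC metric (assumed layout).
    """
    cohort_mean = mean(metric_values.values())
    limit = tolerance * abs(cohort_mean)
    return {s for s, v in metric_values.items() if abs(v - cohort_mean) > limit}


def count_failures(qc_table, tolerance=0.10):
    """Count, per sample, how many metrics flag it as an outlier.

    qc_table: {metric_name: {sample_id: value}} (assumed layout).
    """
    failures = {}
    for values in qc_table.values():
        outliers = flag_outliers(values, tolerance)
        for sample in values:
            failures[sample] = failures.get(sample, 0) + (sample in outliers)
    return failures


def spikes_in_order(signals):
    """True if the spike-in signals satisfy BioB < BioC < BioD < Cre."""
    values = [signals[name] for name in SPIKE_ORDER]
    return all(a < b for a, b in zip(values, values[1:]))


# Hypothetical values, for illustration only.
example_qc = {
    "AUC": {"s1": 0.90, "s2": 0.91, "s3": 0.70},
    "APM": {"s1": 5.00, "s2": 5.10, "s3": 5.05},
}
example_failures = count_failures(example_qc)
# Arrays failing on more than two metrics would be candidates for exclusion.
excluded = [s for s, n in example_failures.items() if n > 2]
```

With a per-sample failure count in hand, the "more than two metrics" criterion reduces to a single comparison, which makes the threshold easy to vary when checking how sensitive the retained cohort is to it.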
PCA score plots of RMA normalised probesets (CS1 and CS2):

Figure 3. Plots of scores for the first two Principal Components from a PCA of CS1 (A) and CS2 (B). In both cases, RMA normalised probesets are filtered to retain exonic probesets that hit the genome once only. For CS1 there are no obvious outliers to the ‘cloud’ of samples distributed throughout the PC space. For CS2, there are 4 samples that lie outside the main grouping of samples (V183, V194, V212 and V228). Cross-referencing these samples against the radial plots indicates that they are outliers for only one of the QC metrics; they are therefore included in the downstream analysis.

Figure 4. (A) X-Y scatterplot showing the median stem-loop probeset expression (miRNA 2.0 array) for all 1105 stem-loop probes. The y-axis values are the median of eight CS1 samples and the x-axis values are the median of eight CS2 samples. (B) X-Y scatterplot showing stem-loop probeset expression (miRNA 2.0 array) for all 1105 stem-loop probes in two cell lines. The y-axis values are the expression of stem-loop probesets in Me180 (cervix SCC cell line) and the x-axis values are the expression of stem-loop probesets in HeLa cells (cervix AC cell line).

[Panel A: RNA yield vs. age. x-axis: Age of FFPE block; y-axis: RNA yield (ng); R² = 0.0008]
[Panel B: RNA Quality vs. age. x-axis: Age of FFPE block; y-axis: RIN; R² = 0.0039]

Supplementary Figure 5. (A) X-Y scatter showing RNA yield (ng) against the age of the FFPE block. (B) X-Y scatter showing the RNA quality (RIN) against the age of the FFPE block. Combined cervix series.

[Panel A: Correlation between 260/280 and age of block. x-axis: Age of FFPE block; y-axis: 260/280; R² = 0.0268]
[Panel B: Correlation between 260/230 and age of block. x-axis: Age of FFPE block; y-axis: 260/230; R² = 0.0539]

Supplementary Figure 6. (A) X-Y scatter showing the 260/280 ratio against the age of the FFPE block.
(B) X-Y scatter showing the 260/230 ratio against the age of the FFPE block. Combined cervix series.

Housekeeper gene expression (CS1):

Figure 7. Boxplots showing the distribution of gene level expression (median of the exonic probesets) for a set of recognised housekeeper genes in CS1 arrays. A: all arrays in the cohort; B: arrays passing a 20% Present (DABG p < 0.05) filter; C: arrays passing a 15% Present filter. The distribution and range of expression do not differ between %P cutoffs, suggesting that the expression of these genes is stable across the cohort.

Housekeeper gene expression (CS2):

Figure 8. Boxplots showing the distribution of gene level expression (median of the exonic probesets) for a set of recognised housekeeper genes in CS2 arrays. A: all arrays in the cohort; B: arrays passing a 20% Present (DABG p < 0.05) filter; C: arrays passing a 15% Present filter. The distribution and range of expression do not differ between %P cutoffs, suggesting that the expression of these genes is stable across the cohort.

[Plot: y-axis: log2(relative expression of hsa-miR-205), normalised to hsa-miR-16.1 and hsa-miR-26b; x-axis: AC and SCC groups within cell lines, CS1 and CS2; fold-change (FC) annotations on the plot: 7.6, 6.9 and 11.8]

Supplementary Figure 9. Taqman qRT-PCR data showing the log2 of relative expression of hsa-miR-205, normalised to hsa-miR-16.1 and hsa-miR-26b, across the random subsets from the three series: cell lines, CS1 and CS2 samples.

Supplementary Table 1. miRNA FFPE subsets (Cervix)

| Cohort | Tumour type | Number of patients | Median age of sample (years) | Median %DABG |
| --- | --- | --- | --- | --- |
| CS1 subset (“young”) | Cervix cancer | 8 | 12 [8-12] | 28.4 [15.8-38.5] |
| CS2 subset (“old”) | Cervix cancer | 8 | 19 [17-20] | 16.0 [11.0-20.3] |

[ ] Numbers in square brackets represent the range.
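The %P filter applied to the housekeeper boxplots (Figures 7 and 8) can be illustrated as follows. This is a sketch under assumptions rather than the procedure actually used: it assumes that %P is the percentage of probesets with a DABG p-value below 0.05 and that an array "passes" a filter when its %P is at or above the cutoff. The function names, data layout and p-values are hypothetical.

```python
def percent_present(dabg_p_values, alpha=0.05):
    """Percentage of probesets on one array with DABG p-value below `alpha`."""
    if not dabg_p_values:
        raise ValueError("no probeset p-values supplied")
    detected = sum(1 for p in dabg_p_values if p < alpha)
    return 100.0 * detected / len(dabg_p_values)


def arrays_passing(dabg_by_array, min_percent, alpha=0.05):
    """Sorted ids of arrays whose %P is at or above `min_percent`.

    dabg_by_array: {array_id: [p-value per probeset]} (assumed layout).
    Treating the cutoff as inclusive is an assumption of this sketch.
    """
    return sorted(
        array_id
        for array_id, p_values in dabg_by_array.items()
        if percent_present(p_values, alpha) >= min_percent
    )


# Hypothetical DABG p-values, for illustration only.
example_dabg = {
    "a1": [0.01, 0.02, 0.04, 0.20, 0.30, 0.50, 0.60, 0.70, 0.80, 0.90],  # 3 of 10 detected
    "a2": [0.01, 0.10, 0.20, 0.30, 0.40, 0.50],                          # 1 of 6 detected
    "a3": [0.03, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90],  # 1 of 10 detected
}
pass_20 = arrays_passing(example_dabg, 20)
pass_15 = arrays_passing(example_dabg, 15)
```

Running the same function at both cutoffs gives the two array subsets whose expression distributions are compared in panels B and C.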