Supplementary Figures
Standard Exon Array Quality Control Analysis
CS1 and CS2 Array QC
Every array image was initially analysed using the Affymetrix Expression Console software version 1.2
(www.affymetrix.com).
QC parameters as per Affymetrix guidelines (Affymetrix White Paper: Quality Assessment of Exon
and Gene Arrays, Revision Date: 2007-04-06, Revision Version: 1.1) were examined. The purpose was
to identify outlier arrays that differed substantially from the majority of the cohort and to assess
the reproducibility and consistency of signals across arrays.
Sample Quality Metrics
Six metrics are routinely used to assess the sample and hybridization quality of arrays:
Metric | Meaning | Description
Pos vs neg auc (AUC) | Area under the curve (AUC) value for a ROC curve plotting detection of positive controls against false detection of negative controls. | Overall data quality measurement. Perfect data scores 1. Scores of 0.5 mean there is no difference between positive and negative controls.
All probeset mean (APM) | Mean signal of all probesets in the analysis | Detects bright or dim arrays. Value should be consistent between replicates
All probeset RLE mean (RLE) | Mean absolute relative log expression | Comparison of a probeset signal to the median signal. Higher values indicate the signal is different from the others
Percent present (%P) | Percent of exon-level probesets detected based on DABG algorithm | Values should be similar for similar samples. FFPE varies more and tends to be lower. Samples should not be more than 10% lower than typical values
5’ hybridization and labeling controls | Bacterial spikes, labeled independently but added to hybridization cocktail, 5’ based expression | Four spikes are input in increasing concentration so signal values should be BioB < BioC < BioD < Cre.
3’ hybridization and labeling controls | As above, but for 3’ based expression | As above, but for 3’ based expression
The 5’ and 3’ bacterial spikes were plotted as simple bar graphs to confirm that the signals followed
the expected concentration order. This was the case for all arrays. An example plot is shown below.
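The ordering check on the spike-in controls can also be expressed programmatically. The following is a minimal sketch, assuming the spike signals have been exported per array into a simple mapping; the function name, array names and signal values are hypothetical and are not taken from the study.

```python
# Expected order of the bacterial spike-in controls, lowest to highest concentration.
SPIKE_ORDER = ("BioB", "BioC", "BioD", "Cre")


def spikes_in_expected_order(signals):
    """Return True if the signal strictly increases along BioB < BioC < BioD < Cre."""
    values = [signals[name] for name in SPIKE_ORDER]
    return all(low < high for low, high in zip(values, values[1:]))


# Hypothetical log2 signals for two arrays (illustrative values only).
arrays = {
    "array_01": {"BioB": 8.1, "BioC": 9.4, "BioD": 11.2, "Cre": 12.9},
    "array_02": {"BioB": 8.3, "BioC": 8.0, "BioD": 11.0, "Cre": 12.5},
}

# Collect the arrays whose spikes do not follow the expected trend.
failed = [name for name, signals in arrays.items()
          if not spikes_in_expected_order(signals)]
print(failed)  # ['array_02']
```

The same check would be run separately for the 5’ and the 3’ spike signals.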
For the remaining metrics, an array was considered an outlier if its metric value was more than 10%
above or below the mean value for the whole cohort. No array failed on more than 2 metrics, so all
arrays were retained for the downstream analysis.
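As an illustration of the 10% rule, the sketch below flags each array per metric and counts how many metrics each array fails. It assumes the metrics are available as a nested mapping (metric name, then array name, then value); all names and numbers are hypothetical, and the cohort mean is taken over all arrays, including any outliers.

```python
from statistics import mean


def outlier_flags(values, tolerance=0.10):
    """Flag values more than `tolerance` (a fraction) above or below the cohort mean."""
    cohort_mean = mean(values.values())
    limit = tolerance * abs(cohort_mean)
    return {array: abs(value - cohort_mean) > limit
            for array, value in values.items()}


def failure_counts(metrics, tolerance=0.10):
    """Count, for each array, the number of metrics on which it is an outlier."""
    counts = {}
    for per_array in metrics.values():
        for array, flagged in outlier_flags(per_array, tolerance).items():
            counts[array] = counts.get(array, 0) + int(flagged)
    return counts


# Hypothetical metric values for four arrays (illustrative only).
metrics = {
    "pos_vs_neg_auc": {"a1": 0.85, "a2": 0.86, "a3": 0.84, "a4": 0.70},
    "all_probeset_mean": {"a1": 5.0, "a2": 5.1, "a3": 4.9, "a4": 5.0},
    "percent_present": {"a1": 40.0, "a2": 41.0, "a3": 39.0, "a4": 30.0},
}

counts = failure_counts(metrics)
# An array is excluded only if it fails on more than 2 metrics.
excluded = [array for array, n in counts.items() if n > 2]
print(counts, excluded)
```

In this made-up example, array a4 is an outlier on two metrics and is therefore still retained under the "more than 2 metrics" criterion.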
In addition, PCA (Principal Components Analysis) plots were used to examine the overall variance
structure in the two cohorts. PCA was carried out using all 1.4 million probesets on the array, after
RMA normalisation but prior to any probeset filtering.
In the first two Principal Components, there were no distinct outliers for either cohort.
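For readers who want to reproduce this kind of plot, the sketch below computes scores on the first two Principal Components from a samples-by-probesets matrix using a singular value decomposition. It assumes NumPy is available and uses a small simulated matrix in place of the RMA-normalised data; it is not the software or the settings used in the study.

```python
import numpy as np


def pca_scores(matrix, n_components=2):
    """Return PCA scores and explained-variance fractions.

    Rows of `matrix` are samples (arrays); columns are probesets.
    """
    # Centre each probeset (column) on its mean across samples.
    centred = matrix - matrix.mean(axis=0)
    # SVD of the centred matrix: the scores are U scaled by the singular values.
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Simulated log2 expression for 10 samples and 200 probesets (illustrative only).
rng = np.random.default_rng(0)
expression = rng.normal(loc=6.0, scale=1.0, size=(10, 200))

scores, explained = pca_scores(expression)
print(scores.shape)  # (10, 2)
```

Plotting the first column of `scores` against the second gives a score plot in which samples lying far from the main cloud can be inspected as potential outliers.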
Example of bacterial spike-in controls (CS2):
[Figure: panels A and B]
Supplementary Figure 1. Bar charts showing the relationship between the spiked-in
bacterial hybridisation controls for CS2 (A: 5’, B: 3’). For each sample, the red,
green, blue and cyan bars represent the expression of BioB, BioC, BioD and Cre, respectively.
The expression levels for each sample show the correct trend, where BioB <
BioC < BioD < Cre.
Summary of QC metrics (CS1 and CS2):
[Figure: panels A and B]
Supplementary Figure 2. Radial plots summarising the QC metrics for CS1 (A) and
CS2 (B). Each coloured solid line represents a metric for the samples in the cohort,
and the corresponding dashed lines represent +/- 10% of the mean value for the
cohort. Where the solid line crosses a dashed line, the sample can be considered an
outlier for that metric.
PCA score plots of RMA normalised probesets (CS1 and CS2):
[Figure: panels A and B]
Supplementary Figure 3. Plots of scores for the first two Principal Components from a PCA of CS1 (A)
and CS2 (B). In both cases, RMA normalised probesets are filtered to retain exonic
probesets that hit the genome once only. For CS1 there are no obvious outliers to the
‘cloud’ of samples distributed throughout the PC space. For CS2, there are 4 samples
that lie outside the main grouping of samples (V183, V194, V212 and V228). Cross-reference
of these samples to the radial plots indicates that they are outliers for only one
of the QC metrics, so they are included for downstream analysis.
[Figure: panels A and B]
Supplementary Figure 4. (A) x-y scatterplot showing the median stem-loop probeset expression
(miRNA 2.0 array) for all 1105 stem-loop probes. y-axis values are the median of eight
CS1 samples and x-axis values are the median of eight CS2 samples. (B) x-y
scatterplot showing stem-loop probeset expression (miRNA 2.0 array) for all 1105
stem-loop probes in two cell lines. y-axis values are the expression of stem-loop
probesets in Me180 (cervix SCC cell line) and x-axis values are the expression of
stem-loop probesets in HeLa cells (cervix AC cell line).
[Panel A: RNA yield vs. age. x-axis: Age of FFPE block; y-axis: RNA yield (ng); R2 = 0.0008]
[Panel B: RNA Quality vs. age. x-axis: Age of FFPE block; y-axis: RIN; R2 = 0.0039]
Supplementary Figure 5. (A) X-Y scatter showing RNA yield (ng) against the age of
FFPE block. (B) X-Y scatter showing the RNA quality (RIN) against the age of the FFPE
block. Combined cervix series.
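The R2 values reported on these scatter plots are consistent with the coefficient of determination of a simple linear fit, which for one predictor equals the squared Pearson correlation. The sketch below shows that calculation under this assumption; the function name and the example data are hypothetical and do not reproduce the study's measurements.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy ** 2) / (sxx * syy)


# Hypothetical block ages (years) and RNA yields (ng); illustrative only.
block_age = [1, 2, 3, 4]
rna_yield = [300, 100, 100, 300]

# The yields show no linear trend with age, so R2 is 0.
print(r_squared(block_age, rna_yield))
```

An R2 close to 0, as in both panels, indicates that block age explains almost none of the variation in the plotted quantity under a linear model.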
[Panel A: Correlation between 260/280 and age of block. x-axis: Age of FFPE block; y-axis: 260/280; R2 = 0.0268]
[Panel B: Correlation between 260/230 and age of block. x-axis: Age of FFPE block; y-axis: 260/230; R2 = 0.0539]
Supplementary Figure 6. (A) X-Y scatter showing 260/280 ratio against the age of FFPE
block. (B) X-Y scatter showing the 260/230 ratio against the age of the FFPE block.
Combined cervix series.
Housekeeper gene expression (CS1):
[Figure: panels A, B and C]
Supplementary Figure 7. Boxplots showing the distribution of gene level expression (median of the exonic
probesets) for a set of recognised housekeeper genes in CS1 arrays. A: all arrays in the
cohort, B: those arrays passing a 20% Present (DABG p < 0.05) filter and C: arrays passing a
15% Present filter. The distribution and range of expression do not differ between the %P
cut-offs, suggesting that the expression of these genes is stable across the cohort.
Housekeeper gene expression (CS2):
[Figure: panels A, B and C]
Supplementary Figure 8. Boxplots showing the distribution of gene level expression (median of the exonic
probesets) for a set of recognised housekeeper genes in CS2 arrays. A: all arrays in the
cohort, B: those arrays passing a 20% Present (DABG p < 0.05) filter and C: arrays passing a
15% Present filter. The distribution and range of expression do not differ between the %P
cut-offs, suggesting that the expression of these genes is stable across the cohort.
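The two quantities used in the housekeeper boxplots, the percent of probesets called present at DABG p < 0.05 and the gene level expression taken as the median of a gene's exonic probesets, can be sketched as follows. The data layout, function names and values are assumptions made for illustration only.

```python
from statistics import median


def percent_present(dabg_p_values, alpha=0.05):
    """Percent of probesets with a DABG p-value below `alpha`."""
    detected = sum(1 for p in dabg_p_values if p < alpha)
    return 100.0 * detected / len(dabg_p_values)


def gene_level_expression(exonic_probeset_signals):
    """Gene level expression as the median of the gene's exonic probeset signals."""
    return median(exonic_probeset_signals)


# Hypothetical DABG p-values per array (illustrative only).
dabg = {
    "array_a": [0.01, 0.20, 0.03, 0.60, 0.04],
    "array_b": [0.30, 0.50, 0.02, 0.70, 0.90, 0.40],
}

pct = {array: percent_present(p) for array, p in dabg.items()}
passing_20 = [array for array, value in pct.items() if value >= 20.0]
passing_15 = [array for array, value in pct.items() if value >= 15.0]
print(passing_20, passing_15)

# Hypothetical exonic probeset signals for one housekeeper gene on one array.
print(gene_level_expression([7.9, 8.4, 8.1, 9.0, 8.2]))  # 8.2
```

Here array_b passes the 15% filter but not the 20% filter, which is the kind of difference between panels B and C that the captions describe.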
[Bar chart. y-axis: log2(relative expression of hsa-miR-205), normalised to hsa-miR-16.1 & hsa-miR-26b; groups: Cell lines, CS1 and CS2, each with bars labelled AC, SC and C; fold-change annotations: FC: 7.6, FC: 6.9 and FC: 11.8]
Supplementary Figure 9. Taqman qRT-PCR data showing the log2 of relative
expression of hsa-miR-205 normalised to hsa-miR-16.1 and hsa-miR-26b across the
random subsets from the three series; cell lines, CS1 and CS2 samples.
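The caption does not state how the relative expression or the fold changes (FC) were calculated. One common approach for qRT-PCR data, shown here purely as an assumption, is the delta-Ct method: the target Ct is normalised to the mean Ct of the reference miRNAs, and log2 relative expression is the negative of that difference. All Ct values and function names below are hypothetical.

```python
from statistics import mean


def log2_relative_expression(ct_target, ct_references):
    """log2 of 2 ** -(Ct_target - mean Ct of the reference genes)."""
    delta_ct = ct_target - mean(ct_references)
    return -delta_ct


def fold_change(log2_group_a, log2_group_b):
    """Linear fold change of group A over group B from log2 relative expression."""
    return 2 ** (mean(log2_group_a) - mean(log2_group_b))


# Hypothetical Ct values: target hsa-miR-205, references hsa-miR-16.1 and hsa-miR-26b.
sample_log2 = log2_relative_expression(30.0, [24.0, 26.0])
print(sample_log2)  # -5.0

# Hypothetical log2 relative expression for two groups of samples.
print(fold_change([-2.0, -4.0], [-5.0, -7.0]))  # 8.0
```

Whether the FC values annotated in the figure are linear fold changes or log2 differences cannot be determined from the caption alone.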
Cervix
Cohort | Tumour type | Number of patients | Median age of sample (years) | Median %DABG
CS1 subset (“young”) | Cervix cancer | 8 | 12 [8-12] | 28.4 [15.8-38.5]
CS2 subset (“old”) | Cervix cancer | 8 | 19 [17-20] | 16.0 [11.0-20.3]
[ ] Numbers in square brackets represent the range.
Supplementary Table 1. miRNA FFPE subsets