Table 2 - BioMed Central

Table 2 - 64 statistical measures used to capture the features of each scatter P-plot. The scores were assigned to the call clusters on a P-plot after the UHT GetGenos software made an XX, XY, YY, or fail call for every signal. These 64 scores were then combined into input vectors for neural network training and for auto-classification of new P-plots by the trained neural network. x is the blue fluorescent intensity of the signal, and y is the green fluorescent intensity of the signal. PV = x/(x+y) is the p-value of the SNP signal from one DNA sample, i.e. the fraction of the total intensity contributed by the blue channel; it serves as the horizontal axis of the P-plot (Figure 1). logrf = log10(x+y) measures signal intensity and serves as the vertical axis of the P-plot (Figure 1). The following measures were calculated using standard formulas from [10]:

- position (mean and median);
- spread (standard deviation and q-range);
- skewness;
- kurtosis;
- bimodal indices.

Cluster separation was measured by differences and ratios of cluster center values. The Hardy-Weinberg chi-square (row 60) was used to check how likely the observed allele frequencies are to occur in a natural population. It was calculated only when the total number of calls on the P-plot exceeded 25.
| Number | Description | Formula | Features measured |
|---|---|---|---|
| 1-3 | Number of samples in XX, XY, YY clusters | Nxx, Nxy, Nyy | Number of XX, XY and YY samples |
| 4-6 | Mean logrf values of XX, XY, YY clusters | MEAN[log10(x+y)] | Average signal strengths of XX, XY and YY samples |
| 7-9 | Standard deviations of logrf values of XX, XY, YY clusters | STD[log10(x+y)] | Spread of signal strengths of XX, XY and YY samples |
| 10-12 | Skewness of logrf values of XX, XY, YY clusters | SKEWNESS[log10(x+y)] | Skewness of logrf of XX, XY and YY samples, respectively |
| 13-15 | Kurtosis of logrf values of XX, XY, YY clusters | KURTOSIS[log10(x+y)] | Kurtosis of logrf values of XX, XY and YY samples |
| 16-18 | Medians of logrf values of XX, XY, YY clusters | MEDIAN[log10(x+y)] | Centers of logrf values of XX, XY and YY samples |
| 19-21 | Q-ranges of logrf values of XX, XY, YY clusters | QRANGE[log10(x+y)] | Spread of logrf values of XX, XY and YY samples |
| 22-24 | Bimodal indices of logrf values of XX, XY, YY clusters | BIMOD[log10(x+y)] | Whether logrf values in XX, XY and YY samples have more than one mode |
| 25-27 | Number of XX, XY, YY samples with logrf >= 4 | | Number of XX, XY and YY samples with usual signal strength |
| 28-30 | Mean PV values of XX, XY, YY clusters | MEAN[x/(x+y)] | Average PV values of XX, XY and YY samples |
| 31-33 | Standard deviations of PV values of XX, XY, YY clusters | STD[x/(x+y)] | Spread of PV values of XX, XY and YY samples |
| 34-36 | Skewness of PV values of XX, XY, YY clusters | SKEWNESS[x/(x+y)] | Skewness of PV values of XX, XY and YY samples |
| 37-39 | Kurtosis of PV values of XX, XY, YY clusters | KURTOSIS[x/(x+y)] | Kurtosis of PV values of XX, XY and YY samples |
| 40-42 | Medians of PV values of XX, XY, YY clusters | MEDIAN[x/(x+y)] | Centers of PV values of XX, XY and YY samples |
| 43-45 | Q-ranges of PV values of XX, XY, YY clusters | QRANGE[x/(x+y)] | Spread of PV values of XX, XY and YY samples |
| 46-48 | Bimodal indices of PV values of XX, XY, YY clusters | BIMOD[x/(x+y)] | Whether PV values in XX, XY and YY samples have more than one mode |
| 49 | Number of calls failed by UHT GetGenos or by reviewers | | Whether there is any overall reaction problem |
| 50 | Percentage of calls failed by UHT GetGenos | | Whether there is any overall signal problem |
| 51 | Number of negative controls | | |
| 52 | Maximum of logrf values of negative controls | MAXneg[log10(x+y)] | Whether any negative controls show positive signal strength |
| 53 | Total number of all calls (N) | Nxx + Nxy + Nyy + Nfail | |
| 54 | Number of samples with logrf < 3.0 | | Number of low-signal samples |
| 55 | Percentage of samples with logrf < 3.0 | | Proportion of low-signal samples |
| 56 | Maximum size of the three call clusters | MAX[Nxx, Nxy, Nyy] | Whether any of the three cluster types has enough samples |
| 57 | Minimum size of the three call clusters | MIN[Nxx, Nxy, Nyy] | Whether all three cluster types are present |
| 58 | Distance between medians of PV values of XX and XY clusters | MEDIANxx[x/(x+y)] - MEDIANxy[x/(x+y)] | Separation between XX and XY clusters |
| 59 | Distance between medians of PV values of XY and YY clusters | MEDIANxy[x/(x+y)] - MEDIANyy[x/(x+y)] | Separation between XY and YY clusters |
| 60 | Deviation from Hardy-Weinberg equilibrium (chi-square test) [6,7] | | Likelihood of the observed allele frequency distribution occurring in a natural population |
| 61 | Ratio of the median PV value of the XX cluster to that of the XY cluster | MEDIANxx[x/(x+y)] / MEDIANxy[x/(x+y)] | Separation of XX and XY clusters |
| 62 | Ratio of the median PV value of the XY cluster to that of the YY cluster | MEDIANxy[x/(x+y)] / MEDIANyy[x/(x+y)] | Separation of XY and YY clusters |
| 63-65 | Ratio of the maximum logrf of negative controls to the median logrf of the XX, XY, YY clusters | MAXneg[log10(x+y)] / MEDIANxx,xy,yy[log10(x+y)] | Comparison of negative-control signal strength with sample signal strength; a noise-to-signal ratio |
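The per-cluster measures in rows 1-48 can be sketched in Python as shown below. This is a minimal sketch, not the authors' code. The caption cites [10] for the formulas without reproducing them, so the sketch makes several assumptions:

- skewness and kurtosis use moment-based (biased) estimates;
- "q-range" is the interquartile range;
- "bimodal index" is Sarle's bimodality coefficient.

All function and key names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Tuple

def pv(x: float, y: float) -> float:
    """PV = x/(x+y): blue-channel share of total intensity (P-plot horizontal axis)."""
    return x / (x + y)

def logrf(x: float, y: float) -> float:
    """logrf = log10(x+y): total signal intensity (P-plot vertical axis)."""
    return math.log10(x + y)

def _central_moment(v: List[float], k: int) -> float:
    m = statistics.fmean(v)
    return sum((a - m) ** k for a in v) / len(v)

def skewness(v: List[float]) -> float:
    """Assumed definition: moment-based skewness g1 = m3 / m2**1.5."""
    m2 = _central_moment(v, 2)
    return _central_moment(v, 3) / m2 ** 1.5 if m2 > 0 else math.nan

def excess_kurtosis(v: List[float]) -> float:
    """Assumed definition: moment-based excess kurtosis g2 = m4 / m2**2 - 3."""
    m2 = _central_moment(v, 2)
    return _central_moment(v, 4) / m2 ** 2 - 3 if m2 > 0 else math.nan

def qrange(v: List[float]) -> float:
    """Interquartile range Q3 - Q1 (assumed meaning of 'q-range')."""
    q1, _, q3 = statistics.quantiles(v, n=4)
    return q3 - q1

def bimodality(v: List[float]) -> float:
    """Sarle's bimodality coefficient, an assumed stand-in for BIMOD.

    Values above 5/9 (the value for a uniform distribution) are
    commonly read as a hint of bimodality.
    """
    n = len(v)
    if n < 4:
        return math.nan
    g1, g2 = skewness(v), excess_kurtosis(v)
    return (g1 ** 2 + 1) / (g2 + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

def cluster_summary(signals: List[Tuple[float, float]]) -> Dict[str, float]:
    """Rows 1-48 for ONE call cluster, given its (x, y) signals.

    Needs at least two signals: statistics.stdev raises StatisticsError otherwise.
    """
    logs = [logrf(x, y) for x, y in signals]
    out: Dict[str, float] = {
        "n": len(signals),                                  # rows 1-3
        "n_usual_signal": sum(1 for v in logs if v >= 4),   # rows 25-27
    }
    for name, values in (("logrf", logs), ("pv", [pv(x, y) for x, y in signals])):
        out[f"mean_{name}"] = statistics.fmean(values)
        out[f"std_{name}"] = statistics.stdev(values)
        out[f"skew_{name}"] = skewness(values)
        out[f"kurt_{name}"] = excess_kurtosis(values)
        out[f"median_{name}"] = statistics.median(values)
        out[f"qrange_{name}"] = qrange(values)
        out[f"bimod_{name}"] = bimodality(values)
    return out
```

Calling `cluster_summary` once per call cluster (XX, XY, YY) yields the three values reported in each of the "1-3", "4-6", ... rows.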
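Rows 49-50 and 53-57 are simple counts over the calls on one P-plot. The following is a minimal sketch under stated assumptions:

- Each signal is an (x, y, call) tuple, where call is one of "XX", "XY", "YY", or "fail". This input layout and the function name are hypothetical.
- The percentage rows are taken relative to N (row 53), which the table does not state explicitly.
- Calls failed by reviewers (row 49) are not modeled.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Signal = Tuple[float, float, str]  # hypothetical layout: (x, y, call)

def count_features(signals: List[Signal]) -> Dict[str, float]:
    """Count-based P-plot features; assumes every call is XX, XY, YY or fail."""
    calls = Counter(call for _, _, call in signals)
    n_xx, n_xy, n_yy, n_fail = (calls[c] for c in ("XX", "XY", "YY", "fail"))
    n_total = n_xx + n_xy + n_yy + n_fail  # row 53: N
    n_low = sum(1 for x, y, _ in signals if math.log10(x + y) < 3.0)  # row 54
    return {
        "n_fail": n_fail,  # row 49 (software fails only)
        "pct_fail": 100.0 * n_fail / n_total if n_total else math.nan,  # row 50
        "n_total": n_total,
        "n_low_signal": n_low,
        "pct_low_signal": 100.0 * n_low / n_total if n_total else math.nan,  # row 55
        "max_cluster": max(n_xx, n_xy, n_yy),  # row 56
        "min_cluster": min(n_xx, n_xy, n_yy),  # row 57: 0 means a cluster is missing
    }
```

A minimum cluster size of zero is the simplest signal that one genotype class is absent from the plot, which is what row 57 is described as measuring.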
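Rows 52, 58-59 and 61-65 compare cluster centers and negative controls using the formulas in the table. In this minimal sketch, the inputs are PV and logrf values already grouped by call cluster, plus the logrf values of the negative controls. The function names and the choice to return NaN for empty clusters or zero denominators are assumptions, not taken from the source.

```python
import math
import statistics
from typing import Dict, List

def _median(v: List[float]) -> float:
    return statistics.median(v) if v else math.nan  # NaN for an empty cluster

def _ratio(a: float, b: float) -> float:
    return a / b if b else math.nan  # avoid division by zero

def separation_features(pv_by_cluster: Dict[str, List[float]],
                        logrf_by_cluster: Dict[str, List[float]],
                        neg_logrf: List[float]) -> Dict[str, float]:
    med_pv = {c: _median(pv_by_cluster.get(c, [])) for c in ("XX", "XY", "YY")}
    max_neg = max(neg_logrf) if neg_logrf else math.nan  # row 52: MAXneg[log10(x+y)]
    feats = {
        "max_neg_logrf": max_neg,
        "dist_xx_xy": med_pv["XX"] - med_pv["XY"],           # row 58
        "dist_xy_yy": med_pv["XY"] - med_pv["YY"],           # row 59
        "ratio_xx_xy": _ratio(med_pv["XX"], med_pv["XY"]),   # row 61
        "ratio_xy_yy": _ratio(med_pv["XY"], med_pv["YY"]),   # row 62
    }
    for c in ("XX", "XY", "YY"):  # rows 63-65: noise-to-signal per cluster
        feats[f"noise_to_signal_{c.lower()}"] = _ratio(
            max_neg, _median(logrf_by_cluster.get(c, [])))
    return feats
```

Medians rather than means are used as cluster centers here because the table's formulas for rows 58-65 are all written in terms of MEDIAN.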
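Row 60 is a Hardy-Weinberg chi-square, computed only when the P-plot has more than 25 calls. The table cites [6,7] without giving the formula. This sketch assumes the usual Pearson chi-square: genotype counts are compared against the expectations p², 2pq and q² from the estimated X-allele frequency p, with one degree of freedom. It also assumes that "total number of calls" means N from row 53. The function names are hypothetical.

```python
import math
from typing import Optional

def hwe_chi_square(n_xx: int, n_xy: int, n_yy: int, n_fail: int = 0,
                   min_calls: int = 25) -> Optional[float]:
    """Pearson chi-square for deviation from Hardy-Weinberg equilibrium.

    Returns None unless N = n_xx + n_xy + n_yy + n_fail exceeds min_calls.
    """
    if n_xx + n_xy + n_yy + n_fail <= min_calls:
        return None
    n = n_xx + n_xy + n_yy  # only genotyped samples enter the test
    if n == 0:
        return None
    p = (2 * n_xx + n_xy) / (2 * n)  # estimated X-allele frequency
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_xx, n_xy, n_yy)
    # Skip genotype classes with zero expected count (monomorphic SNP).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

def chi_square_p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))
```

For example, 50 XX, 0 XY and 50 YY calls give p = 0.5 and expected counts of 25, 50 and 25, so the statistic is 25 + 50 + 25 = 100. This indicates a strong heterozygote deficit, which on a P-plot often points to a missing or mis-called XY cluster.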