Table 2 - The 64 statistical measures used to capture the features of each scatter P-plot. The scores were assigned to the call clusters on each P-plot after the UHT GetGenos software made an XX, XY, YY, or fail call for every signal. These 64 scores were then combined into the input vectors used for neural network training and for auto-classification of new P-plots by the trained neural network. x is the blue fluorescent intensity of the signal and y is the green fluorescent intensity of the signal. PV = x/(x+y) is the p-value of the SNP signal from one DNA sample and serves as the horizontal axis of the P-plot (Figure 1). logrf = log10(x+y) measures signal intensity and serves as the vertical axis of the P-plot (Figure 1). Measures of position (mean and median), spread (standard deviation and q-range), skewness, kurtosis, and bimodal indices were calculated using standard formulas from [10]. Separation of clusters was measured by differences and ratios of cluster center values. The Hardy-Weinberg chi-square (the score in row 60) was used to check the likelihood of the observed allele frequencies occurring in a natural population. It was calculated only when the total number of calls on the P-plot exceeded 25.
| Number | Description | Formula | Features measured |
|---|---|---|---|
| 1-3 | Number of samples in XX, XY, YY clusters | Nxx, Nxy, Nyy | Number of XX samples, XY samples, YY samples |
| 4-6 | Mean logrf values of XX, XY, YY clusters | MEAN[log10(x+y)] | Average signal strengths of XX, XY and YY samples |
| 7-9 | Standard deviations of logrf values of XX, XY, YY clusters | STD[log10(x+y)] | Spread of signal strengths of XX, XY and YY samples |
| 10-12 | Skewness of logrf values of XX, XY, YY clusters | SKEWNESS[log10(x+y)] | Skewness of logrf of XX samples, XY samples, YY samples respectively |
| 13-15 | Kurtosis of logrf values of XX, XY, YY clusters | KURTOSIS[log10(x+y)] | Kurtosis of logrf values of XX samples, XY samples, YY samples |
| 16-18 | Medians of logrf values of XX, XY, YY clusters | MEDIAN[log10(x+y)] | Centers of logrf values of XX samples, XY samples, YY samples |
| 19-21 | Q-ranges of logrf values of XX, XY, YY clusters | QRANGE[log10(x+y)] | Spread of logrf values of XX samples, XY samples, YY samples |
| 22-24 | Bimodal indices of logrf values of XX, XY, YY clusters | BIMOD[log10(x+y)] | Measures whether logrf values in XX, XY, YY samples have more than one mode |
| 25-27 | Number of XX, XY, YY samples with logrf >= 4 | | Number of XX, XY, YY samples with usual signal strength |
| 28-30 | Mean PV values of XX, XY, YY clusters | MEAN[x/(x+y)] | Average PV values of XX, XY and YY samples |
| 31-33 | Standard deviations of PV values of XX, XY, YY clusters | STD[x/(x+y)] | Spread of PV values of XX, XY and YY samples |
| 34-36 | Skewness of PV values of XX, XY, YY clusters | SKEWNESS[x/(x+y)] | Skewness of PV values of XX samples, XY samples, YY samples |
| 37-39 | Kurtosis of PV values of XX, XY, YY clusters | KURTOSIS[x/(x+y)] | Kurtosis of PV values of XX samples, XY samples, YY samples |
| 40-42 | Medians of PV values of XX, XY, YY clusters | MEDIAN[x/(x+y)] | Centers of PV values of XX samples, XY samples, YY samples |
| 43-45 | Q-ranges of PV values of XX, XY, YY clusters | QRANGE[x/(x+y)] | Spread of PV values of XX samples, XY samples, YY samples |
| 46-48 | Bimodal indices of PV values of XX, XY, YY clusters | BIMOD[x/(x+y)] | Measures whether PV values in XX, XY, YY samples have more than one mode |
| 49 | Number of calls failed by UHT GetGenos or by reviewers | | Measures whether there is any overall reaction problem |
| 50 | Percentage of calls failed by UHT GetGenos | | Measures whether there is any overall signal problem |
| 51 | Number of negative controls | | |
| 52 | Maximum of logrf values of negative controls | MAXneg[log10(x+y)] | Measures whether there are any negative controls with positive signal strength |
| 53 | Total number of all calls (N) | Nxx + Nxy + Nyy + Nfail | |
| 54 | Number of samples with logrf < 3.0 | | Number of low signal samples |
| 55 | Percentage of samples with logrf < 3.0 | | Proportion of low signal samples |
| 56 | Maximum size of three call-clusters | MAX[Nxx, Nxy, Nyy] | Whether there are enough samples in any of the three types of clusters |
| 57 | Minimum size of three call-clusters | MIN[Nxx, Nxy, Nyy] | Measures whether all three types of clusters are present |
| 58 | Distance between medians of PV values of XX and XY clusters | MEDIANxx[x/(x+y)] - MEDIANxy[x/(x+y)] | Measures the separation between XX cluster and XY cluster |
| 59 | Distance between medians of PV values of XY and YY clusters | MEDIANxy[x/(x+y)] - MEDIANyy[x/(x+y)] | Measures the separation between XY cluster and YY cluster |
| 60 | Deviation from Hardy-Weinberg equilibrium, chi-square test (6,7) | | Measures the likelihood of the observed allele frequency distribution happening in a natural population |
| 61 | Ratio of median PV values of XX cluster to that of XY cluster | MEDIANxx[x/(x+y)] / MEDIANxy[x/(x+y)] | Measures separation of XX and XY cluster |
| 62 | Ratio of median PV values of XY cluster to that of YY cluster | MEDIANxy[x/(x+y)] / MEDIANyy[x/(x+y)] | Measures separation of XY and YY cluster |
| 63-65 | Ratio of maximum logrf of negative controls to median of logrfs of XX, XY, YY clusters | MAXneg[log10(x+y)] / MEDIANxx,xy,yy[log10(x+y)] | Comparison of signal strength of negative controls to samples; measures noise-to-signal ratio |
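As a rough illustration of how a few of these scores could be computed, the Python sketch below derives PV and logrf from the two intensities, summarizes one call cluster, and evaluates separation and Hardy-Weinberg scores. It is not the UHT GetGenos implementation. The function names, the `min_calls` parameter, and the choice to apply the 25-call threshold to the three genotype counts are assumptions made for this example. The chi-square uses the textbook Hardy-Weinberg expectation, which may differ in detail from the test cited in the table. Skewness, kurtosis, q-range, and bimodal indices are omitted because the caption defers their formulas to [10].

```python
import math
from statistics import mean, median, stdev


def pv(x, y):
    """PV = x / (x + y), the horizontal axis of the P-plot."""
    return x / (x + y)


def logrf(x, y):
    """logrf = log10(x + y), the vertical axis of the P-plot."""
    return math.log10(x + y)


def cluster_scores(signals):
    """Summarize one call cluster given a list of (x, y) intensity pairs."""
    pvs = [pv(x, y) for x, y in signals]
    lrs = [logrf(x, y) for x, y in signals]
    return {
        "n": len(signals),
        "mean_logrf": mean(lrs),
        "std_logrf": stdev(lrs) if len(lrs) > 1 else 0.0,
        "median_logrf": median(lrs),
        "mean_pv": mean(pvs),
        "median_pv": median(pvs),
    }


def separation(median_a, median_b):
    """Difference and ratio of two cluster medians (as in rows 58-59, 61-62)."""
    return median_a - median_b, median_a / median_b


def hardy_weinberg_chi2(n_xx, n_xy, n_yy, min_calls=25):
    """Textbook Hardy-Weinberg chi-square from genotype counts.

    Returns None unless the number of calls exceeds min_calls
    (assumption: the threshold is applied to the genotype calls only).
    """
    n = n_xx + n_xy + n_yy
    if n <= min_calls:
        return None
    p = (2 * n_xx + n_xy) / (2 * n)  # frequency of allele X
    q = 1.0 - p
    observed = (n_xx, n_xy, n_yy)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip genotype classes with zero expected count to avoid dividing by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    xx = [(900.0, 100.0), (950.0, 50.0), (880.0, 120.0)]
    xy = [(500.0, 500.0), (520.0, 480.0), (490.0, 510.0)]
    s_xx, s_xy = cluster_scores(xx), cluster_scores(xy)
    print(separation(s_xx["median_pv"], s_xy["median_pv"]))
    print(hardy_weinberg_chi2(30, 40, 30))
```

With counts of 30, 40, and 30 the allele frequency is 0.5, the expected counts are 25, 50, and 25, and the statistic is 1 + 2 + 1 = 4.0.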