Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
TASK QUARTERLY 8 No 2 (2004), 217–221 PATTERN RECOGNITION OF SACROILEITIS WITH THE USE OF MULTISTAGE LOGIC WITH A FUZZY LOSS FUNCTION ROBERT BURDUK Chair of Systems and Computer Networks, Faculty of Electronics, Wroclaw University of Technology, Wyb. Wyspiańskiego 27, 50-370 Wroclaw, Poland [email protected] (Received 12 September 2003; revised manuscript received 30 January 2004) Abstract: The article describes the problem of pattern recognition of sacroileitis. Classification is based on a scheme of multistage recognition with a fuzzy loss function dependent on the node of the decision tree. Decision rules are based on k-nearest neighbors at particular internal nodes of the decision-tree. Paper presents influence of comparison fuzzy numbers on classifications results. Keywords: multistage classifier, sacroileitis, fuzzy loss function, k-nearest neighbor method 1. Medical description of the problem Sacroileitis (SI-itis) belongs to the group of rheumatoid arthritis [1]. SI-itis is a chronic, mostly progressive, inflammatory process, embracing lower back-hip joints, small spine joints, fibrous rings and lower back ligaments, leading to their gradual stiffening. The cause of the disease is not well-known; infectious and hereditary factors are the most likely contributing causes. Participation of microbes is suggested by the considerable frequency of contagions, embracing especially ureters, prior to symptoms of the disease. Genetic predisposition to SI-itis is confirmed by its occurrence in affected families in about 20% of male and 8% of female relatives of the first degree. It has been shown, that the HLA B27 antigen is present in 90–96% of the ill in comparison to 5–14% in a test population of healthy individuals. From among those with a positive HLA B27 antigen morbidity on sacroileitis is only 1%. It is considered that the presence of the antigen testifies to a genetic predisposition to falling ill under the influence of environmental factors, probably infectious. A correct diagnosis is more dificult due to the long time of diesease development and large probability of first symptoms for different diseases. The period after which we have final disease results ranges from 5 to 10 years. A sufficiently early diagnosis of disease is important, for correct treatment to be prescribed already in the initial phase of SI-itis. Therefore, an attempt at a computer-aided prognosis of the development of sacroileitis is well justified. tq208h-e/217 9 IV 2004 BOP s.c., http://www.bop.com.pl 218 R. Burduk 2. Data description Clinical material has been derived from the Research Institute of Rheumatic Diseases in Piestany. 84 patients with SI-itis were examined. Each case record was provided with a final diagnosis made after 5 years. In the multistage classifier the following diseases and groups of diseases of the rheumatoid type have been defined as classes: – – – – ankylosing spondylitis (definitive) (1), ankylosing spondylitis (probable) (2), sacroileitis persistens (3), others (4). The latter class includes the following diseases: – – – – arthritis periferal, sacroileitis present, arthritis periferal, sacroileitis absent, the Reiters syndrome, lumbagia only. 20 clinical features were used in examination, the exact description and characterization of which is given in [2]. 3. Description of the method Let us consider a pattern recognition problem, in which the number of classes is equal to M . Let us assume that the classes are organised in a (N + 1) horizontal decision tree. Let us number all the nodes of the constructed decision-tree with consecutive numbers of 0,1,2,... reserving 0 for the root-node. Let us then assign numbers of classes from the M = {1,2,...,M } set to the terminal nodes so that each one of them is labelled with the number of class connected with that node. This allows the introduction of the following notation: M (n) – the set of numbers of nodes, whose distance from the root is n, SN −1 n = 0,1,2,...,N . In particular, M (0) = {0}, M (N ) = M , M̄ = n=0 M (n) – the set of interior node numbers (non-terminal), Mi ⊆ M (N ) – the set of class numbers attainable from the ith node (i ∈ M̄ ), M i – the set of numbers of immediate descendant nodes i (i ∈ M̄ ), mi – number of the immediate predecessor of the ith node (i 6= 0), L̃(iN ,jN ) = L̃w – fuzzy loss function dependent on the node of the decision tree, where w is the first common predecessor of the nodes iN and jN . We will continue to adopt the probabilistic model of the recognition problem, i.e. we will assume that the class number of the pattern being recognised, jN ∈ M (N ), and its observed features x are realisations of a couple of random variables, JN and X. A multistage classifier for the prognosis of SI-itis development has been proposed [2], so the selection of decision logic is a result of earlier researches. A two-level decision logic is presented in Figure 1, where the natural numbers are as described in the previous section. The interior nodes (0), (5) and (6) are described as sacroileitis, ankylosing spondylitis (the Bechtterew disease) and other rheumatoid diseases, respectively. For the prognosis of SI-itis development, a fuzzy loss function dependent on the node of the decision tree has been used. This loss function means that the loss tq208h-e/218 9 IV 2004 BOP s.c., http://www.bop.com.pl 219 Pattern Recognition of Sacroileitis... 0 5 6 1 2 3 4 Figure 1. Decision tree for the prognosis of SI-itis depends on the node of the decision tree at which a misclassification is made and allows an imprecise, linguistic description of loss. We assume the following linguistic descriptions: small, medium, big for a loss function connected with the suitable internal nodes (5), (6) and (0). In Figure 2 fuzzy loss functions are shown, represented with triangular fuzzy numbers. L̃5 L̃6 L̃0 1 0 1 2 3 Figure 2. Fuzzy loss functions for the prognosis of SI-itis Decision rules at each interior node are based on the k-nearest neighbors (k − N N ) rule for a fuzzy loss function [3]: (k−N N ∗ ) Ψin ,Sm (xin ) = in+1 , whereas X n + (L̃in − L̃in+1 )kiin+1 [L̃in+1 − L̃jn+2 )q̂(jn+2 /in+1 ,jn+2 )kjinn+2 + jn+2 ∈M in+1 X + jn+3 ∈M [L̃jn+2 − L̃jn+3 )q̂(jn+3 /in+1 ,jn+3 )kjinn+3 + jn+2 X +··· + [L̃jN −1 q̂(jN /in+1 ,jN )kjinN ]...] = jN ∈M jN −1 n = max (L̃in − L̃l )klin + l∈M in X [L̃l − L̃jn+2 )q̂(jn+2 /k,jn+2 )kjinn+2 + jn+2 ∈M l X + jn+3 ∈M [L̃jn+2 − L̃jn+3 )q̂(jn+3 /k,jn+3 )kjinn+3 + jn+2 +··· + X o [L̃jN −1 q̂(jN /k,jN )kjinN ]···] , jN ∈M jN −1 tq208h-e/219 9 IV 2004 BOP s.c., http://www.bop.com.pl (1) 220 R. Burduk where kji means the quantity of neighbors of points xi from sequence Xi,j (sequence of measurements of learning objects Xi , i ∈ M̄ , belonging to node j ∈ M (n)) in k i nearest neighbors. Number k i is given and can vary for various i. Numbers k = 1, 3 and 5 were used for a two-stage prognosis of the SI-itis development. The probability of correct classification q̂(in /ik ,in ) was estimated by the leave-one-out method. Table 1. Results of feature selection in sacroileitis Node List of the ranged feature numbers 0 14 12 9 15 4 2 5 14 4 9 5 3 7 6 9 6 19 15 3 16 Table 1 lists the numbers of features selected at each interior node of the twostage classifier of Figure 1 with a fuzzy loss function dependent on the node of the decision tree. Each row presents a list of 6 features from the entire set of 20 symptoms, selected and ordered by the sequential procedure according to their classification accuracy [2]. 4. Results of the recognition algorithm In order to evaluate the multistage classifier with a fuzzy loss function dependent on the node of the decision tree, a test was carried out with a different methods of comparison of fuzzy numbers [4–6]. Results of the frequency of correct classification are presented in Table 2. Table 2. Frequency of correct classification [%] in two-stage classifiers with zero-one and fuzzy loss functions Loss k function 0–1 Method for fuzzy loss function Yager Chen Campos-Gonzalez λ=0 λ = 0.5 λ=1 1 63.1 63.1 63.1 63.1 63.1 63.1 3 71.4 71.4 71.4 69.1 71.4 71.4 5 65.5 65.5 65.5 63.1 65.5 66.6 Comparing the frequencies of correct classification, we can formulate the following conclusions: among all the investigated algorithms the best result is always for k = 3. This regularity persists for all methods of comparison of fuzzy numbers and for the zero-one loss function. Provided a suitable selection of parameter λ for method [4] and k = 5, we can obtain better recognition results than for the zero-one loss function. 5. Conclusion The obtained results seem to confirm the usefulness of multistage recognition algorithms with a fuzzy loss function in computer-aided medical diagnosis. For a longer learning set and a greater k algorithm with a fuzzy loss function it would probably tq208h-e/220 9 IV 2004 BOP s.c., http://www.bop.com.pl Pattern Recognition of Sacroileitis... 221 be better than an algorithm with the zero-one loss function. It is necessary to further test other methods of comparison of fuzzy numbers and apply other (parametric or non-parametric) estimators of conditional probability density functions. References [1] Meske S and Lamparter-Lang R 1997 Rheumatoid Arthritis, PZWL, Warsaw (in Polish) [2] Kurzyński M, Masaryk P and Svec V 1989 Systems Science, Wroclaw 15 59 [3] Burduk R 2003 Computer Algorithms of Multistage Classifier with the Probabilistic-fuzzy Model and Their Application in Medicine, Internal Report, Chair of Systems and Computer Networks, Wroclaw (in Polish) [4] Campos L M and González A 1989 Fuzzy Sets and Systems 29 145 [5] Chen S H 1985 Fuzzy Sets and Systems 17 113 [6] Yager R 1981 Information Sciences 24 143 tq208h-e/221 9 IV 2004 BOP s.c., http://www.bop.com.pl 222 TASK QUARTERLY 8 No 2 (2004) tq208h-e/222 9 IV 2004 BOP s.c., http://www.bop.com.pl