Local Linear Radial Basis Function Neural Networks for Classification of Breast Cancer Data

M. R. Senapati (a), S. P. Das (b), P. K. Champati (c), P. K. Routray (d)

(a) Department of Computer Science and Engineering, Centurion Institute of Technology, Centurion University of Technology and Management, 752050, India. Phone: +91 94371 32010. E-mail: [email protected]
(b) Department of Computer Science and Engineering, Templecity Institute of Technology & Engineering, Biju Pattnaik University of Technology, Rourkela-752002, India. E-mail: [email protected]
(c) Department of Computer Science and Engineering, Ajay Binay Institute of Technology, Biju Pattnaik University of Technology, Rourkela-752002, India. E-mail: [email protected]
(d) Department of Computer Science and Engineering, NM Institute of Engineering and Technology, Biju Pattnaik University of Technology, Rourkela-752002, India. E-mail: [email protected]

Abstract: Breast cancer is the most common type of cancer in women and the leading cause of cancer deaths among them. This paper presents experiments on the classification of breast cancer tumors and proposes a local linear radial basis function neural network (LLRBFNN) for the classification and recognition of breast cancer. The experiments were conducted on breast cancer data from the University of Wisconsin Hospital, Madison. The network is trained on the breast cancer data using a feed-forward model and the back-propagation learning algorithm with momentum and a variable learning rate, and its performance is evaluated. The result is compared with a wide range of classifiers. The evaluations show that the proposed approach is robust and effective and gives better correct classification than the other classifiers.

Keywords: Local linear radial basis function neural network (LLRBFNN), Radial basis function neural network (RBFNN), Multilayer perceptron (MLP), Wisconsin breast cancer (WBC).

1. Introduction

The incidence of breast cancer in women, especially in developed countries, has increased significantly in the last few years. The causes of the disease, and the reasons for the increase, are not well understood. At present there is no method to prevent breast cancer; only early detection can increase the survival rate. Among the various methods to detect breast cancer, mammography is considered the most reliable, but it has a major drawback: the sheer volume of mammograms to be rated by a physician decreases the accuracy. Multiple reading of mammograms (consecutive reading by two physicians) increases accuracy, but at a higher cost. That is why computer-aided classification techniques are needed to achieve accuracy and increase the chance of survival [1, 2]. Digital mammograms are among the most difficult medical images to read due to their low contrast and the differences in tissue types. Important visual clues to breast cancer include preliminary signs of masses and calcification clusters. Unfortunately, in the early stages of breast cancer these signs are subtle and vary in appearance, making diagnosis difficult for the specialist. This is the main reason for developing classification systems to assist specialists in medical institutions, and much research on breast cancer classification has been done recently [3-6]. Early diagnosis requires an accurate and reliable procedure that allows physicians to distinguish benign breast tumors from malignant ones; finding an accurate and effective diagnosis method is therefore very important. Many AI methods have shown better results than those obtained by experimental methods. For example, in 1997, Burke et al.
[7] compared the accuracy of the TNM staging system with that of a multilayer back-propagation artificial neural network (ANN) for predicting the 5-year survival of patients with breast carcinoma. The ANN increased the prediction capacity by 10%, reaching a final result of 54%. They used the following parameters: tumor size, number of positive regional lymph nodes, and distant metastasis. Domingos [8] used a breast cancer database from the UCI repository to classify patient survival using a unification of two widely used empirical approaches: rule induction and instance-based learning. In 2000, Boros et al. [9] used the logical analysis of data method to predict the nature of the tumor, malignant or benign; the Breast Cancer (Wisconsin) database was used for this purpose, and the classification capacity was 97.2%. The same database was used by Street and Kim [10], who combined several classifiers to create a high-scale classifier, and by Wang and Witten [11], who presented a general modeling method for optimal probability prediction over future observations and obtained 96.7% classification. Huang et al. [12] constructed a classifier with the minimax probability machine (MPM), which provides a worst-case bound on the probability of misclassifying future data points based on reliable estimates of the means and covariance matrices of the classes from the training data. They used the same database as Domingos; the classification capacity was 82.5%. In 2012, Senapati et al. [13] used K-PSO to predict the nature of the tumor, malignant or benign, with a classification accuracy of 96.43%. The method proposed here classifies the breast cancer data downloaded from the University of Wisconsin Hospital, Madison.
Basically, the objective of this prediction technique is to assign a patient either to a "benign" group that does not have breast cancer or to a "malignant" group for which there is strong evidence of breast cancer. In this paper a local linear radial basis function neural network (LLRBFNN) is proposed for breast cancer detection, in which the connection weights between the hidden-layer units and the output units are replaced by local linear models and the parameters of the network are updated using back propagation. Simulation results for the breast cancer pattern classification problem were compared with some common classification techniques; the results show the effectiveness of the proposed method.

The rest of the paper is organized as follows. The LLRBFNN is described in Sec. 2. The radial basis function network is explained in Sec. 3. The MLP algorithm for training the LLRBFNN is presented in Sec. 4. A short discussion, together with experimental results on pattern classification for the Wisconsin Breast Cancer (WBC) problem, is given in Sec. 5. Finally, concluding remarks are drawn in the last section, Sec. 6.

2. Local Linear Radial Basis Function Neural Network

The local linear radial basis function neural network is in fact a modification of the RBFNN: if a local linear summation is used in the last layer of an RBFNN, the number of learning parameters increases, but the mapping ability of the network improves, i.e., the performance (accuracy) increases, and a local linear model is created. An RBFNN with local linear models in the hidden layer is called an LLRBFNN. The motivations for introducing local linear models into an RBFNN are as follows: local linear models provide a more parsimonious interpolation in high-dimensional spaces when the modeling samples are sparse, and they converge faster than an RBFNN, in fewer epochs. The local linear model parameters are randomly initialized at the beginning and are optimized by the multilayer perceptron (MLP) algorithm discussed in Section 4.
The LLRBFNN is characterized by localized activation of the hidden-layer units. The connection weights associated with the hidden nodes can be viewed as locally accurate linear models. The mapping function of the network, with Gaussian activation functions and a weighted linear summation in the output neuron, is given by

f(x) = Σ_{i=1}^{n} v_i z_i(x)   (1)

where x = [x_1, x_2, ..., x_m] is the input vector,

v_i = w_{i0} + w_{i1} x_1 + ... + w_{im} x_m

is the ith local linear model, whose coefficients w_{ij} play the role of the weights between the hidden and output layers, and

z_i(x) = exp(−||x − c_i||² / (2σ_i²))

is the activation function of the ith hidden neuron. Here c_i is the center of the ith activation function, σ_i is the parameter controlling the smoothness of the activation function, called the spread, and ||x − c_i|| denotes the Euclidean distance between the input and the center.

Fig-1 A local linear radial basis function neural network

Eq. (1) is a family of functions generated from one single localized function by the operation of centers and weights. The activity of the linear model v_i (i = 1, 2, ..., n) is determined by the associated locally significant unit. Obviously, the localization of the ith unit of the hidden layer is determined by the spread parameter σ_i and the center parameter c_i. According to previous researchers, these two parameters can either be predetermined based on radial basis function transformation theory or be determined by a training algorithm. The computational steps involved in implementing the LLRBFNN for classification are:

1. For untrained inputs, initialize the centers (c), weights (w), and spreads (σ): c = c_init, w = w_init, and σ = σ_init (initialization).
2. Update the LLRBFNN centers and weights using error back propagation.
3. Calculate the mean error.
4. Test the centers and weights for convergence.

The centers and weights are updated one by one in each iteration, i.e., with each new training input to the LLRBFNN.
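The mapping of Eq. (1) can be sketched in a few lines of code. This is a minimal illustration under the definitions above, not the authors' implementation; the function name and array shapes are assumptions:

```python
import numpy as np

def llrbfnn_forward(x, centers, sigmas, W):
    """LLRBFNN output f(x) = sum_i v_i * z_i(x), Eq. (1).

    x: input vector (m,); centers: (n, m); sigmas: (n,);
    W: local linear coefficients (n, m+1), W[i] = [w_i0, w_i1, ..., w_im].
    """
    # Gaussian activations z_i(x) = exp(-||x - c_i||^2 / (2 sigma_i^2))
    z = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2.0 * sigmas ** 2))
    # local linear models v_i = w_i0 + w_i1*x_1 + ... + w_im*x_m
    v = W[:, 0] + W[:, 1:] @ x
    # weighted linear summation in the output neuron
    return float(np.dot(v, z))
```

Each hidden unit thus contributes a full linear model v_i rather than the single constant weight of a plain RBFNN, which is where the extra parameters and the improved mapping ability come from.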
The spread of the LLRBFNN is taken as 1 and is not updated during training. An intrinsic feature of radial basis function networks is the localized activation of the hidden-layer units, so that the connection weights associated with the units can be viewed as locally accurate piecewise constant models whose validity for a given input is indicated by the activation functions. Compared to the multilayer perceptron, this local capacity provides advantages such as learning efficiency and structural transparency. However, it also causes a problem: due to the crudeness of the local approximation, a large number of basis function units must be employed to approximate a given system. A shortcoming of the LLRBFNN is therefore that, for higher-dimensional problems, many hidden-layer units are needed.

3. Radial Basis Function Neural Network

Fig-2 Radial basis function neural network

The radial basis function network shown above has the following structure:
- Input layer: no activation function (no calculation at the level of this layer)
- Hidden layer: Gaussian function
- Output layer: linear function

The hidden neurons provide a set of functions that constitute an arbitrary basis for the input patterns. The hidden neurons are known as radial centers and are represented by the vectors c_1, c_2, ..., c_h. The dimension of each center depends on the number of inputs, and the number of hidden neurons is not fixed (it ranges between the number of output neurons and the number of inputs). The transformation from the input space to the hidden-unit space is nonlinear, whereas the transformation from the hidden-unit space to the output space is linear. The radial basis function in the hidden layer produces a significant non-zero response only when the input falls within a small localized region of the input space.
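The three-layer structure just described, a nonlinear Gaussian hidden layer feeding a linear output, can be sketched as follows (an illustrative sketch only; the function name, shapes, and shared spread are assumptions):

```python
import numpy as np

def rbf_forward(x, centers, w, sigma):
    """RBFNN output: linear combination of Gaussian hidden activations.

    x: input (m,); centers: radial centers (h, m); w: output weights (h,).
    """
    # hidden layer: localized Gaussian response for each radial center
    phi = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2.0 * sigma ** 2))
    # output layer: purely linear function of the hidden activations
    return float(w @ phi)
```

Note how an input far from every center yields near-zero activations and hence a near-zero output, which is the localized-response property described above.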
An input vector x that lies in the receptive field of center c_j activates that center, and by a proper choice of weights the target output is obtained. The output is given as

y = Σ_{i=1}^{h} w_i φ_i   (2)

where w_i is the weight of the ith center and φ is a radial basis function, here the Gaussian:

φ(z) = exp(−z² / (2σ²))   (3)

with

z = ||x − c_i||   (4)

||x − c_i|| = sqrt(Σ_j (x_j − c_{ij})²)   (5)

and σ a constant.

Fig-3 Information about spreads

As shown in Fig. 3, the spread is the width of the curve. The spread should not be so large that each neuron effectively responds over the same large area of the input space.

3.1 Learning in RBFNN

Training an RBFNN requires the optimal selection of the parameters c_i and w_i, i = 1, 2, ..., h. Several techniques are used to update the weights and centers (the pseudo-inverse technique, hybrid learning, and gradient descent learning). Here, gradient descent learning is used: supervised training driven by an error-correcting term. The learning steps are as follows:

1. The width (σ) is fixed according to the spread of the centers. From Fig. 2, C = [c_1, c_2, ..., c_h] and W = [w_1, w_2, ..., w_h], and y^d is the desired output.
2. The cost function is

E = (1/2)(y^d − y)²   (6)

3. The update rule for center learning is

c_{ij}(t+1) = c_{ij}(t) − η₁ ∂E/∂c_{ij},  i = 1 to h   (7)

4. The weight update rule is

w_i(t+1) = w_i(t) − η₂ ∂E/∂w_i   (8)

5. By the chain rule,

∂E/∂c_{ij} = (∂E/∂y)(∂y/∂φ_i)(∂φ_i/∂z_i)(∂z_i/∂c_{ij}) = −(y^d − y) w_i (∂φ_i/∂z_i)(∂z_i/∂c_{ij})

where ∂φ_i/∂z_i = −(z_i/σ²) φ_i and ∂z_i/∂c_{ij} = −(x_j − c_{ij})/z_i.

6. After simplification, the update rule for center learning becomes

c_{ij}(t+1) = c_{ij}(t) + η₁ (y^d − y) w_i (φ_i/σ²)(x_j − c_{ij})

where η₁ is the center learning rate.
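The learning steps above can be combined into a single gradient-descent step per training pattern. This is an illustrative sketch of Eqs. (6)-(8) with a shared spread σ; the function name and default learning rates are assumptions, not values from the paper:

```python
import numpy as np

def rbf_train_step(x, y_d, centers, w, sigma, eta1=0.05, eta2=0.05):
    """One supervised gradient-descent step for an RBFNN; updates in place."""
    diff = x - centers                                    # (h, m): x_j - c_ij
    phi = np.exp(-np.sum(diff ** 2, axis=1) / (2 * sigma ** 2))
    y = float(w @ phi)                                    # Eq. (2)
    err = y_d - y                                         # from E = 0.5*(y_d - y)^2
    # center update: c_ij += eta1 * err * w_i * (phi_i / sigma^2) * (x_j - c_ij)
    centers += eta1 * err * (w * phi / sigma ** 2)[:, None] * diff
    # weight update: w_i += eta2 * err * phi_i
    w += eta2 * err * phi
    return y
```

Repeated application on a training pattern drives the squared error (y^d − y)² down, which is the convergence behavior checked in step 4 of the LLRBFNN procedure.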
Similarly, differentiating E with respect to the linear weight gives

∂E/∂w_i = −(y^d − y) φ_i   (9)

so the update rule for the linear weight is

w_i(t+1) = w_i(t) + η₂ (y^d − y) φ_i   (10)

where η₂ is the weight learning rate, φ_i is the radial basis activation of the ith hidden unit, and w_i is the weight connecting the ith hidden unit to the output unit.

4. Multilayer Perceptron (MLP) Fundamentals

The multilayer perceptron (MLP) is a widely used technique for solving nonlinear optimization and learning problems. An MLP is a kind of feed-forward artificial neural network, a mathematical model inspired by biological neural networks, and it can be used for various machine learning tasks such as classification. The sizes of the input and output layers determine what kind of data an MLP can accept: the number of neurons in the input layer determines the dimension of the input features, and the number of neurons in the output layer determines the dimension of the output labels. Above the lowermost input layer there are usually one or more intermediate, or hidden, layers, followed by an output layer at the top. The weights measure the degree of correlation between the activity levels of the neurons they connect. In this paper, error back propagation, a method for numerical optimization problems, is applied to optimize the parameters of the LLRBFNN.

4.1 Multilayer Perceptron Algorithm

The basic component of a multilayer perceptron is the neuron. The neurons are aligned in layers, and in any two adjacent layers the neurons are connected in pairs by weighted edges. A multilayer perceptron consists of at least three layers of neurons: one input layer, one or more hidden layers, and one output layer. The number of neurons in the hidden layer is a design issue: with too few neurons the model cannot learn complex decision boundaries, while too many neurons decrease the generalization of the model.
The MLP is used by feeding the input features to the input layer and reading the result from the output layer. The results are calculated in a feed-forward manner, from the input layer to the output layer; one step of feed-forward is illustrated in the following figure.

Fig-4 Network architecture of MLPs

For each layer except the input layer, the value of the current neuron is calculated as a linear combination of the values output by the neurons of the previous layer, where each weight determines the contribution of a neuron in the previous layer to the current neuron. The linear combination is then passed through a nonlinear function that constrains the output to a restricted range; typically the sigmoid or tanh() function is used.

- Input layer: no activation function (no calculation at the level of this layer)
- Hidden layer: hyperbolic tangent, H = tanh(weights × inputs):

H_j = tanh(Σ_{i=1}^{n} w_{ij} x_i)   (11)

- Output layer: hyperbolic tangent, O = tanh(bias + Σ (weights × hidden values)):

O = tanh(b_0 + Σ_{j=1}^{h} H_j v_j)   (12)

where w_{ij} and v_j are weights, x_i is an input, and b_0 is the bias.

The proposed model takes the training data after normalization and passes it through the network model; after several epochs (iterations), when the MSE drops towards zero, the training is finished. The test dataset is then passed through the network to check the model's validity, and finally the classified and misclassified classes are obtained, together with the classification graph. A neural network learns by continuously adjusting the synaptic weights of the connections between layers of neurons until a satisfactory response is produced. In the present work, the MLP network was applied to estimate outputs based on an analysis of the data captured by the inputs.
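Eqs. (11)-(12) amount to the following forward pass. This is an illustrative sketch; the names and shapes are assumed, and the hidden layer has no bias term, matching Eq. (11):

```python
import numpy as np

def mlp_forward(x, W, v, b0):
    """Feed-forward pass of the MLP in Eqs. (11)-(12).

    x: inputs (n,); W: input-to-hidden weights (h, n);
    v: hidden-to-output weights (h,); b0: output bias.
    """
    h = np.tanh(W @ x)                  # H_j = tanh(sum_i w_ij x_i), Eq. (11)
    return float(np.tanh(b0 + v @ h))   # O = tanh(b0 + sum_j H_j v_j), Eq. (12)
```

Because tanh saturates at ±1, the output is automatically constrained to the restricted range mentioned above, which is convenient for two-class labels such as benign/malignant.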
The weight readjustment method employed was back propagation, which consists of propagating the mean squared error generated at the output backwards through the layers of neurons, readjusting the weights of the connections so as to reduce the error in the next iteration.

4.2 The Weight Update Equations

The update equations differ for the hidden-to-output weights and the input-to-hidden weights. For the hidden-to-output weights v_j, differentiating the error E = (d − out)² gives

∂E/∂v_j = −2(d − out)(1 − out²) h_j

so the generalized update equation is

v_j(t+1) = v_j(t) + 2η (d − out)(1 − out²) h_j   (13)

For the input-to-hidden weights w_ij,

∂E/∂w_ij = −2(d − out)(1 − out²) v_j (1 − h_j²) x_i

so the generalized update equation is

w_ij(t+1) = w_ij(t) + 2η (d − out)(1 − out²) v_j (1 − h_j²) x_i   (14)

Although RBF networks can be trained much faster for approximating nonlinear input-output mappings, MLPs may require a smaller number of parameters.

5. Discussion

To evaluate the performance of the algorithms, the Wisconsin dataset is used. First, the training data are fed to the neural network models (LLRBFNN, RBFNN, and MLP), and the back propagation technique is used to optimize the parameters. The centers are initialized to random values; the weights of the network are initialized to ones in the RBFNN and LLRBFNN, whereas in the MLP the weights are taken randomly. During the training process, the centers and weights are updated once after the application of each input pattern. The cost function is taken to be the error function. The training of the networks is continued for 200 epochs, during which both the centers and the weights converge. The performance of the networks is judged by the mean square error.

5.1 Normalization

We apply the local linear radial basis function neural network explained in Sec.
2 to the Wisconsin Breast Cancer (WBC) database and compare its performance to the most common classification methods in both the computer science and statistics literatures. All computations are implemented using MATLAB v6.5 on a dual-core personal computer with a clock speed of 2.4 GHz. As is commonly done, we normalize the input variables to make them independent of the measurement units. Thus the predictors are normalized into the interval (0, 1) using the formula

x_new = (x_old − x_min) / (x_max − x_min)   (15)

We have taken 70% of the total 699 patterns at random for training purposes. The new LLRBFNN technique is compared with two other classifiers to evaluate its performance with respect to the correct classification rate and the time it takes to be trained.

5.2 Wisconsin Breast Cancer (WBC)

The data set was obtained from the University of Wisconsin Hospital, Madison. WBC is a nine-dimensional data set with the following features: (i) clump thickness; (ii) uniformity of cell size; (iii) uniformity of cell shape; (iv) marginal adhesion; (v) single epithelial cell size; (vi) bare nuclei; (vii) bland chromatin; (viii) normal nucleoli; and (ix) mitoses. For our classification purposes, 489 exemplars were used for training and the remaining 210 exemplars for testing.

5.3 Results and Comparison

From Tables 1, 2, and 3 (confusion matrices), Figures 5, 6, and 7 (convergence of centers and weights), Figures 8, 9, and 10 (mean square error), and Figures 11, 12, and 13 (classification results), obtained with the local linear radial basis function neural network (LLRBFNN), the radial basis function neural network (RBFNN), and the multilayer perceptron (MLP) respectively, we observe the following:

1. With the LLRBFNN method, both center and weight convergence are far better than with the RBFNN method.
2. The MLP and the radial basis function neural network show the worst accuracy.
3. The LLRBFNN gives the highest accuracy for the same number of iterations.
4. The accuracy of the proposed technique for malignant-type cancer is almost 99.59%.
5. The overall accuracy of the LLRBFNN is 98.00%.

We draw our conclusion in Sec. 6 using the information shown in Tables 1, 2, and 3.

Table.1 The confusion matrix obtained from LLRBFNN

                      Benign    Malignant
Benign                 445          8
Malignant               17        240
% of classification   97.16      99.59

Overall classification percentage: 98.00%

Table.2 The confusion matrix obtained from RBFNN

                      Benign    Malignant
Benign                 443         16
Malignant               30        240
% of classification   96.72      99.59

Overall classification percentage: 97.71%

Table.3 The confusion matrix obtained from MLP

                      Benign    Malignant
Benign                 442          5
Malignant               16        236
% of classification   96.51      97.93

Overall classification percentage: 97.00%

Fig-5(a) Convergence of centers in LLRBFNN
Fig-5(b) Convergence of weights in LLRBFNN
Fig-6(a) Convergence of centers in RBFNN
Fig-6(b) Convergence of weights in RBFNN
Fig-7(a) Convergence of weights from input to hidden layer in MLP
Fig-7(b) Convergence of weights from hidden to output layer in MLP
Fig-8 Mean square error of LLRBFNN
Fig-9 Mean square error of RBFNN
Fig-10 Mean square error of MLP
Fig-11 Classification (LLRBFNN)
Fig-12 Classification (RBFNN)
Fig-13 Classification (MLP)

5.4 Comparative Study with Other Techniques

The proposed technique is compared with the RBFNN and MLP techniques; the comparison is depicted in Fig-14. The comparison shows that the proposed technique gives better classification than some of the existing techniques.

Fig-14 Percentage of classification obtained from different techniques

6. Conclusion

Even though mammography is one of the best techniques for breast cancer detection, in some cases radiologists cannot detect cancers despite their experience.
Therefore, computer-aided methods like the one presented in this paper could assist medical staff and increase the accuracy of detection. Statistics show that only 20%-30% of suspected breast cancer cases turn out to be cancerous. In the case of a false negative the cancer remains undetected, which could lead to higher cost or even the cost of a human life. This is the trade-off that motivated us to develop a classification system. In this paper we presented a technique for breast cancer classification using the LLRBFNN. The technique provides an overall classification rate of 98.00%. The classification accuracy for malignant cancer is 99.59%, which is encouraging, but the accuracy for benign cancer is 97.16%, i.e., the false positive rate is high, which may cause unnecessary hardship for patients. The technique was compared with different methods already developed, and we can empirically say that the proposed approach has better performance, quality, and generalization than common existing approaches. However, more work is needed to evaluate the performance of the proposed method on other medical and/or other science or business databases. It is well known that data mining techniques are more suitable for larger databases; we intend to use larger databases from medical science and/or the business sector to evaluate the performance of the technique. The technique also needs to be evaluated on time series data to validate the findings.

References

1. Antonie M-L, Zaine OR, Coman A (2001) Application of data mining techniques for medical image classification. In: Proceedings of the 2nd international workshop on multimedia data mining (MDM/KDD'2001), San Francisco, USA, August 26, pp 94-101
2. Lai SM, Li X, Bischof WF (1989) On techniques for detecting circumscribed masses in mammograms. IEEE Trans Med Imaging 8(4):377-386
3. Senapati MR, Mohanty AK, Dash S, Dash PK (2013) Local linear wavelet neural network for breast cancer recognition.
Neural Computing and Applications 22(1):125-131, DOI: 10.1007/s00521-011-0670-y
4. Karabatak M, Ince M (2009) An expert system for detection of breast cancer based on association rules and neural network. Expert Syst Appl 36(2):3465-3469
5. Helvie MA, Hadjiiski LM, Makariou E, Chan HP, Petrick N, Sahiner B, Lo SCB, Freedman M, Adler D, Bailey J, et al (2004) Sensitivity of noncommercial computer-aided detection system for mammographic breast cancer detection: A pilot clinical trial. Radiology 231:208-214
6. Kobatake H, Murakami M, Takeo H, Nawano S (1999) Computer detection of malignant tumors on digital mammograms. IEEE Trans Med Imaging 18:369-378
7. Burke HB, Goodman PH, Rosen DB, et al (1997) Artificial neural networks improve the accuracy of cancer survival prediction. Cancer 79(4):857-862
8. Domingos P (1996) Unifying instance-based and rule-based induction. Mach Learn 24(2):141-168
9. Boros E, Hammer P, Ibaraki T (2000) An implementation of logical analysis of data. IEEE Trans Knowl Data Eng 12(2):292-306
10. Street WN, Kim Y (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining (KDD '01), ACM, San Francisco, CA, pp 377-382
11. Wang Y, Witten IH (2002) Modeling for optimal probability prediction. In: Proceedings of the 19th international conference on machine learning (ICML '02), pp 650-657
12. Huang K, Yang H, King I (2004) Biased minimax probability machine for medical diagnosis. In: Proceedings of the 8th international symposium on artificial intelligence and mathematics (AIM '04), Fort Lauderdale, FL
13. Senapati MR, Panda G, Dash PK (2012) Hybrid approach using KPSO and RLS for RBFNN design for breast cancer detection. Neural Computing and Applications, Springer, DOI: 10.1007/s00521-012-1286-6