Early Heart Disease Prediction System

Mohamad Akram1, Santhi K2, Sriramaneni Raviteja3
1 B-Tech, School of Computer Science and Engineering, VIT University, Vellore, [email protected]
2 Associate Professor, School of Computer Science and Engineering, VIT University, Vellore, [email protected]
3 B-Tech, School of Computer Science and Engineering, VIT University, Vellore, [email protected]

ABSTRACT
The health care industry collects large amounts of health-related data, but these data are rarely mined for the hidden information that could support decision making. The adoption of technology tools in the medical field is increasing day by day. Data mining has various applications in the health care sector, including the detection of false insurance claims and the prediction of disease to prevent deaths. In this paper, we examine different data mining approaches for the prognosis of heart disease. According to a 2012 WHO study, heart disease causes 7.4 million deaths globally. An Early Heart Disease Prediction System (EHPS) could help reduce these deaths because it predicts the occurrence of the disease in advance. Real-world data, collected and stored in a database, have been used for this research. EHPS is a user-friendly, computer-based application developed using front-end web technologies such as HTML and CSS and back-end technologies such as PHP and MySQL.

Keywords: healthcare; data mining; artificial neural network; naïve Bayes; EHPS.

1. INTRODUCTION
The health care industry generally holds a large amount of information, yet the knowledge discovered from the available data is poor. There is an abundance of information available within health care systems.
There is a lack of effective and accurate analytical tools on the market for finding trends in the data. Knowledge discovery and data classification have applications in many sectors, such as health care, agriculture and business. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use [10]. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating. Data mining is the analysis step of the "knowledge discovery in databases" (KDD) process.

The early heart disease prediction system aims to decrease the mortality rate caused by heart disease by warning the user or patient about the probability of developing heart disease in advance, before it is too late to be treated by a medical practitioner. The main objective is to develop a system for early heart disease prediction using a data mining technique such as the naïve Bayes algorithm. EHPS can extract hidden patterns associated with heart disease from a large database. It can answer complex queries for diagnosing heart disease and thus help doctors or experts make critical clinical decisions that other systems do not support. It can also reduce operation costs when effective treatments are used, and it aims to enhance visualization and ease of interpretation.

Fig-1: implementation of EHPS.

2. LITERATURE SURVEY
Data mining extracts useful, usually hidden, information from a large collection of data.
Each data mining technique serves a different purpose depending on the modeling objective. The two main objectives are classification and prediction; here we work on prediction using classified data. [7]

Three main classification techniques are considered: decision trees, artificial neural networks and naïve Bayes.

Decision trees: These algorithms include classification and regression trees (CART), Iterative Dichotomiser 3 (ID3) and C4.5. They differ mainly in how splits are selected, when a node stops splitting, and how a class is assigned to a non-split node. Decision trees such as CART and C4.5 can also handle continuous data, not only categorical data. The algorithm for C4.5 is:
1. Check for the base cases.
2. For each attribute, compute the normalized information gain obtained by splitting on it.
3. Select the attribute with the highest normalized information gain.
4. Create a decision node that splits on the attribute selected in the previous step.
5. Recurse on the sublists obtained by splitting, and add the resulting nodes as children of the decision node.
[16]

Artificial neural networks: An artificial neural network (ANN), often just called a neural network, is a mathematical computational model based on biological neural networks; in other words, it is an emulation of a biological neural system. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. In most cases, an ANN is an adaptive system that changes its structure based on internal or external information that flows through the network during the learning phase. [12]

Supervised Learning: In supervised learning, the network is trained by providing it with matching input and output patterns. These input-output pairs can be provided by an external agent or by the system that contains the neural network. [7]

Unsupervised Learning: In unsupervised learning, an output unit is trained to respond to clusters of patterns within the input. In this paradigm, the system is expected to discover statistically salient features of the input population.
Unlike the supervised paradigm, there is no a priori set of categories into which the patterns are to be classified; rather, the system develops a new representation of the input stimuli. [7]

Reinforcement Learning: This type of learning may be considered an intermediate form of the two learning types above. Here the learning machine acts on the environment and receives a feedback response from it. The learning system grades its action as good or bad based on the environmental response and adjusts its parameters accordingly. [7]

Naïve Bayes algorithm: The naïve Bayes algorithm, built on Bayes' rule, is the basis for many machine learning and data mining methods. The rule is used to create models with predictive capabilities. It provides new ways of understanding and exploring data. [12]

Since there are many techniques for achieving the objectives of this research, we weigh the pros and cons of each technique relative to the others.

Data mining tasks:
1. Predictive model
2. Descriptive model
The main tasks under the predictive model are classification, regression, time series analysis and prediction.

3. RESEARCH METHODOLOGY
The main objective of this paper on the prediction of heart attacks is to build a computer-based medical decision support system using the naïve Bayes data mining model. It is implemented as a web-based system, which ensures fast and simple diagnosis and decision making. From the answers supplied by the user, the system reports whether or not the patient has heart disease, while providing quality support to the user. Naïve Bayes, or Bayes' rule, is the basic principle behind many existing data mining and machine learning methods. The algorithm is used to create models with predictive capabilities. It offers new ways of exploring and understanding data.
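As a rough illustration of how a naïve Bayes classifier of the kind described above could turn a user's answers into a prediction, the following is a minimal Python sketch. It is not the authors' PHP implementation. The attribute names (chest_pain, high_bp, smoker), the six toy records and the add-one (Laplace) smoothing are assumptions made purely for illustration; the sketch only relies on the general principle that the posterior of a class is proportional to its prior multiplied by the likelihood of each attribute value.

```python
from collections import Counter, defaultdict


def train_naive_bayes(records, labels):
    """Build the frequency tables a categorical naive Bayes model needs."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class_label)][value] -> frequency
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute, needed for smoothing.
    attribute_values = defaultdict(set)
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(i, label)][value] += 1
            attribute_values[i].add(value)
    return class_counts, value_counts, attribute_values


def predict(model, record):
    """Return (most probable class, normalized posterior for every class)."""
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for i, value in enumerate(record):
            # Likelihood of this attribute value given the class.
            # Add-one smoothing is an assumption, not taken from the paper.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(attribute_values[i])
            score *= numerator / denominator
        scores[label] = score
    evidence = sum(scores.values())  # normalizing constant over all classes
    posteriors = {label: s / evidence for label, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Hypothetical toy data: (chest_pain, high_bp, smoker); label 1 = heart disease.
records = [
    ("typical", "yes", "yes"),
    ("typical", "yes", "no"),
    ("atypical", "yes", "yes"),
    ("none", "no", "no"),
    ("none", "no", "yes"),
    ("atypical", "no", "no"),
]
labels = [1, 1, 1, 0, 0, 0]

model = train_naive_bayes(records, labels)
prediction, posteriors = predict(model, ("typical", "yes", "yes"))
print(prediction, posteriors)
```

With these made-up records the classifier favours class 1 for the query, because all three answers occur more often among the class-1 records. Dividing by the sum of the class scores plays the role of dividing by the probability of the observed attributes, so the reported posteriors sum to one.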
[10] Advantages of naïve Bayes over other models. It is suitable:
1) when the data set is large;
2) when the attributes used in the algorithm are independent of one another;
3) when more efficient results are required than other existing data mining models provide. [5]

Fig-3: implementation of EHPS. [5]

NAIVE BAYES ALGORITHM
Retrieve from the database all records associated with a class label; each record is represented by an n-dimensional attribute vector X = (x1, x2, x3, ...). Let DS be a data set containing the frequency table associated with the test data. Using Bayes' theorem, the posterior probability P(A|x) is calculated from P(A), P(x) and P(x|A); the effect of the value of one predictor on a given class is assumed to be independent of the values of the other predictors. [5]

$$P(A \mid x) = \frac{P(x \mid A)\, P(A)}{P(x)}$$

where
P(A|x) = posterior probability of the class given the attribute,
P(A) = prior probability of the class,
P(x|A) = likelihood, i.e. the probability of the attribute given the class,
P(x) = prior probability of the predictor.

The probability of each attribute is calculated using the above equation, and the class with the maximum resulting value is the predicted class. [2]

The time complexity of the algorithm is O(n).

4. DATA SOURCE
Databases in the medical field, containing large amounts of information about patients and their health conditions, have been collected by different organizations, and the data have been made publicly available. The data records, along with the medical attributes, were obtained from the web source https://archive.ics.uci.edu/ml/datasets/Heart+Disease. A total of 207 Cleveland records are present in the database. The “Diagnosis” attribute is the predictable attribute, with value “0” for patients with no heart disease and values “1, 2, 3, 4” for the presence of heart disease in the patient. “Id”, i.e. the patient id, is used as the key, and the remaining attributes are the input attributes to be entered by the user. It is presumed that problems such as inconsistent, duplicate and missing data have been resolved. [13]

Fig-2: database attributes information. [13]

5. RESULTS
Data set:
Fig-4: Data set of the Heart Disease Prediction System
Fig. 4 shows the data set of the heart disease database.

Input Process:
Fig-5: Input Process of the Heart Disease Prediction System
Fig-6: Continuation of Input Process of the Heart Disease Prediction System
Fig. 5 and Fig. 6 show the input process of EHPS. When the user enters the values of the attributes described in the data source section and submits the form, the prediction is made using the naïve Bayes algorithm, and the result is presented to the user as a report.

6. CONCLUSION
The main objective of the paper was to make the prediction of heart disease simpler and faster. Reliable data mining methods are used to access the available patient information. EHPS (Early Heart Disease Prediction System) is developed using the naïve Bayes classification algorithm as a web-based decision support system for forecasting the occurrence of heart attack. Hidden knowledge has been extracted from the heart disease database. EHPS is intended to answer even tough queries accurately and to predict, as accurately as possible, the possibility that a person will develop heart disease. The result is displayed to the user as a report. EHPS gives physicians a second opinion about their patients. EHPS can be further extended beyond coronary heart disease to predict other diseases such as tumors, cancer and diabetes. Features such as allowing hospitals to upload useful data can also be included in EHPS.

REFERENCES
1. Han, J., Kamber, M.: “Data Mining Concepts and Techniques”, Morgan Kaufmann Publishers, 2006.
2. Priyanka N., Dr. Pushpa Ravikumar, “Computer Based-Medical Decision Support System for prediction of Heart attack using Data mining techniques”, International Journal of Advanced Research in Computer and Communication Engineering, Vol. 5, Issue 4, April 2016.
3. Shweta Kharya, “Using Data Mining Techniques for Diagnosis and Prognosis of Cancer Disease”, International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol. 2, No. 2, April 2012.
4. M. Durairaj, V. Ranjani, “Data Mining Applications in Health care sector: A study”, International Journal of Science and Technology Research, Volume 2, Issue 10, October 2013.
5. G. Subbalakshmi, Ramesh, M, Chinna Rao, “Decision Support in Heart Disease Prediction System using Naive Bayes”, ISSN: 0976-5166, Vol. 2, No. 2, Apr-May 2011.
6. Ruben D. Canlas Jr., MSIT, MBA, “Data Mining in Health Care: Current applications and issues”, Center for Conscious Living Foundation Inc., 5 August 2009.
7. Ho, T.J.: “Data Mining and Data Warehousing”, Prentice Hall, 2005.
8. Obenshain, M.K.: “Application of Data Mining Techniques to Healthcare Data”, Infection Control and Hospital Epidemiology, 25(8), 690-695, 2004.
9. Sellappan Palaniappan, Rafiah Awang, “Intelligent Heart Disease Prediction System Using Data Mining Techniques”, IJCSNS International Journal of Computer Science and Network Security, Vol. 8, No. 8, August 2008.
10. K. Srinivas, B. Kavihta Rani, Dr. A. Govardhan, “Applications of Data Mining Techniques in Healthcare and Prediction of heart attacks”, (IJCSE) International Journal on Computer Science and Engineering, Vol. 02, No. 02, 2010, 250-255.
11. Kaur, H., Wasan, S. K.: “Empirical Study on Applications of Data Mining Techniques in Healthcare”, Journal of Computer Science 2(2), 194-200, 2006.
12. Tang, Z. H., MacLennan, J.: “Data Mining with SQL Server 2005”, Indianapolis: Wiley, 2005.
13. Sellappan, P., Chua, S.L.: “Model-based Healthcare Decision Support System”, Proc. of Int. Conf. on Information Technology in Asia CITA’05, 45-50, Kuching, Sarawak, Malaysia, 2005.
14. Thuraisingham, B.: “A Primer for Understanding and Applying Data Mining”, IT Professional, 28-31, 2000.
15. https://archive.ics.uci.edu/ml/datasets/Heart+Disease
16. Badr Hssina, Abdelkarim Merbouha, Hanane Ezzikouri, Mohammed Erritali, “A comparative study of decision tree ID3 and C4.5”, (IJACSA) International Journal of Advanced Computer Science and Applications, Special Issue on Advances in Vehicular Ad Hoc Networking and Applications.
17. Monika Gandhi, Dr. Shailendra Narayan Singh, “Predictions in Heart Disease Using Techniques of Data Mining”, 1st International Conference on Futuristic Trend in Computational Analysis and Knowledge Management (ABLAZE-2015).
18. K. Sudhakar, Dr. M. Manimekalai, “Study of Heart Disease Prediction using Data Mining”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 1, January 2014.
19. Deepali Chandna, “Diagnosis of Heart Disease Using Data Mining Algorithm”, (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (2), 1678-1680, 2014.