Early Heart Disease Prediction System
Mohamad Akram1, Santhi K2, Sriramaneni Raviteja3
1 B-Tech, School of Computer Science and Engineering, VIT University, Vellore, [email protected]
2 Associate Professor, School of Computer Science and Engineering, VIT University, Vellore, [email protected]
3 B-Tech, School of Computer Science and Engineering, VIT University, Vellore, [email protected]
_____________________________________________________________________________________
ABSTRACT
The health care industry collects a large amount of healthcare-related data, but these data are rarely mined
for the hidden information that could support decision making. The adoption of technology tools in the
medical field is increasing day by day. Data mining approaches have various applications in the health care
sector, including the detection of false insurance claims and the prediction of disease to prevent deaths or
fatal illness. In this paper, we examine different data mining approaches for the prognosis of heart disease.
According to a 2012 WHO study, heart disease results in 7.4 million deaths globally. An Early Heart Disease
Prediction System (EHPS) could help to minimize these deaths, since it predicts the occurrence of the disease
in advance. Real-world data, collected and stored in a database, have been used for this research. EHPS is a
computer-based, user-friendly application developed using front-end web technologies such as HTML and CSS
and back-end technologies such as PHP and MySQL.
Keywords: healthcare; data mining; artificial neural network; naïve bayes; EHPS.
-------------------------------------------------------------------------------------------------------------------------------------------------
1. INTRODUCTION
The health care industry generally holds a large amount of information, yet the knowledge discovered from the
available data is poor. Information is abundant within health care systems, but effective and accurate analytical
tools for finding trends in the data are lacking in the market. Knowledge discovery and data classification have
applications in many sectors, such as health care, agriculture and business. The general objective of the data mining
process is to extract information from a data set and transform it into an understandable structure for further use [10].
Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and
inference considerations, interestingness metrics, complexity considerations, post-processing of discovered
structures, visualization, and online updating. Data mining is the analysis step of the "knowledge discovery in
databases" process, or KDD.
The early heart disease prediction system aims to decrease the mortality rate caused by heart disease by
cautioning the user/patient about the probability of developing heart disease in advance, before it is too late to be
treated by a medical practitioner. The main objective is to develop a system for early heart disease prediction using
a data mining technique such as the naïve Bayes algorithm. The system, EHPS, can extract hidden patterns associated
with heart disease from a large database. It can answer complex queries for diagnosing heart disease and thus help
doctors or experts to make crucial clinical decisions that other systems do not support. It can also reduce operating
costs when effective treatments are used, and it aims to enhance visualization and ease of interpretation.
Fig-1: implementation of EHPS.
2. LITERATURE SURVEY
Data mining extracts useful, usually hidden, information from a large collection of data. Each data mining
technique serves a different purpose depending on the modeling objective. The two main objectives are classification
and prediction; here we work on prediction using classified data. [7]
There are three main classification techniques: decision trees, artificial neural networks and the naïve Bayes algorithm.
Decision tree algorithms include classification and regression trees (CART), iterative dichotomiser 3 (ID3) and C4.5.
These algorithms differ only in how splits are selected, when to stop splitting a node, and how a class is assigned to
a non-split node. Decision trees can also handle continuous data (not only categorical data).
The algorithm for C4.5 is:
1. Check for the base cases.
2. For each attribute, find the normalized information gain from splitting on it.
3. Select the attribute with the highest normalized information gain.
4. Create a decision node that splits on the attribute selected in the previous step.
5. Recurse on the sublists obtained by splitting on that attribute, and add the resulting nodes as children of the node. [16]
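Steps 2 and 3 above can be illustrated with a minimal Python sketch. It assumes categorical attributes only and uses the usual definition of normalized information gain (information gain divided by split information); the function names and the toy records are hypothetical and are not taken from [16] or from EHPS.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Normalized information gain (gain ratio) of splitting on column `attr`."""
    total = len(rows)
    # Group the class labels by the value each row has for the attribute.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Information gain = entropy before the split minus weighted entropy after it.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(
        (len(p) / total) * math.log2(len(p) / total) for p in partitions.values()
    )
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels, attrs):
    """Step 3: select the attribute with the highest gain ratio."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))


# Hypothetical toy records: (chest_pain, high_bp); label 1 = disease, 0 = none.
rows = [("yes", "yes"), ("yes", "no"), ("no", "yes"), ("no", "no")]
labels = [1, 1, 0, 0]
print(best_attribute(rows, labels, attrs=[0, 1]))
```

In this toy data the first column separates the two classes perfectly, so it is the attribute chosen for the decision node; a full C4.5 implementation would then recurse on each partition as in steps 4 and 5.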
An artificial neural network (ANN), often just called a neural network, is a mathematical or computational model
based on biological neural networks; in other words, it is an emulation of a biological neural system. It
consists of an interconnected group of artificial neurons and processes information using a connectionist approach
to computation. In most cases, an ANN is an adaptive system that changes its structure based on internal or external
information that flows through the network during the learning phase. [12]
Supervised Learning: In supervised learning, the network is trained by providing it with input and output
matching patterns. These input-output pairs can be provided by an external agent, or by the system which contains
the neural network. [7]
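As a small illustration of training on input-output matching patterns, the following Python sketch fits a single artificial neuron with the classic perceptron learning rule. The choice of a perceptron, the logical AND task and all names are assumptions made for illustration; they are not the network used in [7] or in EHPS.

```python
def predict(weights, bias, inputs):
    """Step activation: output 1 when the weighted sum is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=50, lr=1.0):
    """Fit weights and bias to (inputs, target) pairs with the perceptron rule."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = predict(weights, bias, inputs)
            error = target - output  # 0 when the output already matches the target
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Input-output pairs for logical AND (a linearly separable toy task).
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
print([predict(weights, bias, x) for x, _ in and_samples])
```

The external agent here is simply the list of labeled pairs: each time the output disagrees with the supplied target, the weights are nudged toward the target.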
Unsupervised Learning: In unsupervised learning an output unit is trained to respond to clusters of patterns
within the input. In this paradigm, the system is supposed to discover statistically salient features of the input
population. Unlike the supervised paradigm, there is no a priori set of categories into which the
patterns are to be classified; rather, the system develops a new representation of the input stimuli. [7]
Reinforcement Learning: This type of learning may be considered an intermediate form of the above two
learning types. Here the learning machine performs an action on the environment and receives a feedback response
from it. The learning system grades its action as good or bad based on the environmental response and
adjusts its parameters accordingly. [7]
Naïve Bayes algorithm: Bayes’ rule, on which the naïve Bayes algorithm is built, is the basis for many
machine-learning and data mining methods. The rule is used to create models with predictive capabilities. It
provides new ways of understanding and exploring data. [12]
Since there are many techniques that could achieve the objectives of this research, we weigh the pros and cons of
each technique relative to the others.
Data mining tasks:
• Predictive model
• Descriptive model
The main tasks under the predictive model are:
• Classification
• Regression
• Time series analysis
• Prediction
3. RESEARCH METHODOLOGY
The fundamental aim of this paper on the Prediction of Heart Attack System is to build a computer-based medical
decision support system using the naïve Bayes data mining model. It is implemented as a web-based system, which
allows swift and simple diagnosis and decision making. From the responses given by the user, the system reports
whether or not the patient has coronary illness, while also providing quality support to the user.
Bayes’ rule, on which naïve Bayes is built, is the basic principle behind many existing data mining and machine
learning methods. The algorithm is used to create models with predictive capabilities, and it offers better ways of
investigating and understanding data. [10]
Naïve Bayes is preferred over other models:
1) When the amount of data is large.
2) When the attributes used in the algorithm are independent of one another.
3) When efficient results are required compared with other existing data mining models. [5]
Fig-3: implementation of EHPS. [5]
NAIVE BAYES ALGORITHM
• Retrieve all the data associated with the class label from the DB; each record is associated with an
n-dimensional attribute vector X = (x1, x2, x3, ..., xn).
• Let DS be a data set that contains the frequency table associated with the test data.
• Using Bayes’ theorem, the posterior probability P(A|x) is calculated from P(A), P(x) and P(x|A); the value of
each predictor is assumed to be independent of the values of the other predictors. [5]

$$P(A \mid x) = \frac{P(x \mid A)\,P(A)}{P(x)}$$

P(A|x) = posterior probability of the class given the attribute.
P(A) = prior probability of the class.
P(x|A) = likelihood, i.e. the probability of the attribute given the class.
P(x) = prior probability of the predictor.
The probability for each attribute is calculated using the above equation, and the class with the maximum
resulting value is the predicted class. [2]
The time complexity of the algorithm is O(n).
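The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the EHPS implementation (which the paper describes as PHP and MySQL): attributes are treated as categorical, add-one (Laplace) smoothing is introduced so that unseen values do not give zero probabilities, and the function names and toy records are hypothetical.

```python
from collections import Counter, defaultdict


def train(records, labels):
    """Build the frequency tables: class counts and per-attribute value counts."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] = number of matching records
    value_counts = defaultdict(Counter)
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(i, label)][value] += 1
    # Distinct values seen per attribute, needed for the smoothing term.
    domains = [set(r[i] for r in records) for i in range(len(records[0]))]
    return class_counts, value_counts, domains


def posterior(model, x):
    """Return P(A|x) for every class A, assuming independent attributes."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(A)
        for i, value in enumerate(x):
            # Likelihood P(x_i|A) with add-one smoothing (an assumption here).
            seen = value_counts[(i, label)][value]
            score *= (seen + 1) / (count + len(domains[i]))
        scores[label] = score  # proportional to P(x|A) * P(A)
    evidence = sum(scores.values())  # P(x), the normalizing constant
    return {label: s / evidence for label, s in scores.items()}


def predict(model, x):
    """The class with the maximum posterior probability is the prediction."""
    probs = posterior(model, x)
    return max(probs, key=probs.get)


# Hypothetical toy records: (chest_pain_type, exercise_angina); 1 = disease.
records = [("typical", "yes"), ("typical", "yes"), ("atypical", "no"),
           ("none", "no"), ("typical", "no"), ("none", "no")]
labels = [1, 1, 0, 0, 1, 0]
model = train(records, labels)
print(predict(model, ("typical", "yes")))
```

Training makes a single pass over the n records to fill the frequency tables, which is consistent with the O(n) time complexity stated above.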
4. DATA SOURCE
Databases related to the medical field, containing a huge amount of information about patients and their medical
conditions, have been collected by different organizations, and the data have been made public.
The data records, along with the medical attributes, were acquired from the web source
https://archive.ics.uci.edu/ml/datasets/Heart+Disease. A total of 207 Cleveland records are present in the database.
The “Diagnosis” attribute is the predictable attribute, with value “0” for patients with no heart disease and values
“1, 2, 3, 4” indicating the presence of heart disease in the patient. “Id”, i.e. the patient id, is used as the key, and the
remaining attributes are the input attributes to be entered by the user. It is presumed that all problems such as
inconsistent data, duplicate data and missing data have been rectified. [13]
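The encoding of the “Diagnosis” attribute described above can be expressed as a small Python sketch. Collapsing the values 1 to 4 into a single "disease present" label is an assumption made here for illustration, and the function name and patient ids are hypothetical.

```python
def has_heart_disease(diagnosis):
    """Map a Diagnosis value to a yes/no label.

    0 means no heart disease; 1, 2, 3 and 4 all indicate its presence.
    """
    if diagnosis not in (0, 1, 2, 3, 4):
        raise ValueError(f"unexpected Diagnosis value: {diagnosis!r}")
    return diagnosis > 0


# Hypothetical rows keyed by patient Id, each holding only the Diagnosis value.
diagnoses = {101: 0, 102: 2, 103: 4, 104: 0}
labels = {patient_id: has_heart_disease(d) for patient_id, d in diagnoses.items()}
print(labels)
```

Rejecting values outside 0 to 4 is a simple guard for the case where the presumed data cleaning has missed an inconsistent record.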
Fig-2: database attributes information. [13]
5. RESULTS
Data set:
Fig-4: Data set of the Heart Disease Prediction System
Fig. 4 shows the data set of the heart disease database.
Input Process:
Fig-5: Input Process of the Heart Disease Prediction System
Fig-6: Continuation of Input Process of the Heart Disease Prediction System
Fig. 5 and Fig. 6 show the input process of EHPS. When the user enters the values of the required attributes
mentioned in the data source section and submits the form on the page, the prediction is made using the naïve
Bayes algorithm and the result is presented to the user as a report.
6. CONCLUSION
Making the prediction of heart disease simpler and faster was the main objective of the paper. Reliable data mining
methods are used to access the available patient information. EHPS (Early Heart Disease Prediction System) is
developed using the naïve Bayes classification algorithm as a web-based decision support system for forecasting
the occurrence of a heart attack. Hidden knowledge has been extracted from the heart disease database. EHPS finds
solutions with accurate results even for tough queries, and it predicts the possibility of heart disease in a person
with the utmost accuracy. The result is displayed to the user as a report. EHPS gives physicians a second opinion
about their patients. EHPS can be further extended beyond coronary heart disease to predict other diseases such as
tumors, cancer and diabetes. Features such as hospitals uploading useful data can also be included in EHPS.
REFERENCES
1. Han, J., Kamber, M.: “Data Mining Concepts and Techniques”, Morgan Kaufmann Publishers, 2006.
2. Priyanka N., Dr. Pushpa Ravikumar, “Computer Based-Medical Decision Support System for prediction of
Heart attack using Data mining techniques”, International Journal of Advanced Research in Computer and
Communication Engineering, Vol. 5, Issue 4, April 2016.
3. Shweta Kharya, “Using Data Mining Techniques for Diagnosis and Prognosis of Cancer Disease”,
International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol. 2,
No. 2, April 2012.
4. M. Durairaj, V. Ranjani, “Data Mining Applications in Health care sector: A study”, International Journal
of Science and Technology Research, Volume 2, Issue 10, October 2013.
5. G. Subbalakshmi, Ramesh, M., Chinna Rao, “Decision Support in Heart Disease Prediction System using
Naive Bayes”, ISSN: 0976-5166, Vol. 2 No. 2, Apr-May 2011.
6. Ruben D. Canlas Jr., MSIT, MBA, “Data Mining in Health Care: Current applications and issues”, Center
for conscious living Foundation Inc., 5 August 2009.
7. Ho, T.J.: “Data Mining and Data Warehousing”, Prentice Hall, 2005.
8. Obenshain, M.K.: “Application of data Mining Techniques to Healthcare Data”, Infection Control and
Hospital Epidemiology, 25(8), 690–695, 2004.
9. Sellappan Palaniappan, Rafiah Awang, “Intelligent Heart Disease Prediction System Using Data Mining
Techniques”, IJCSNS International Journal of Computer Science and Network Security, Vol. 8 No. 8,
August 2008.
10. K. Srinivas, B. Kavihta Rani, Dr. A. Govardhan, “Applications of Data Mining Techniques in Healthcare and
Prediction of heart attacks”, (IJCSE) International Journal on Computer Science and Engineering, Vol. 02,
No. 02, 2010, 250-255.
11. Kaur, H., Wasan, S. K.: “Empirical Study on Applications of Data Mining Techniques in Healthcare”,
Journal of Computer Science 2(2), 194-200, 2006.
12. Tang, Z. H., MacLennan, J.: “Data Mining with SQL Server 2005”, Indianapolis: Wiley, 2005.
13. Sellappan, P., Chua, S.L.: “Model-based Healthcare Decision Support System”, Proc. of Int. Conf. on
Information Technology in Asia CITA’05, 45-50, Kuching, Sarawak, Malaysia, 2005.
14. Thuraisingham, B.: “A Primer for Understanding and Applying Data Mining”, IT Professional, 28-31,
2000.
15. https://archive.ics.uci.edu/ml/datasets/Heart+Disease
16. Badr Hssina, Abdelkarim Merbouha, Hanane Ezzikouri, Mohammed Erritali, “A comparative study of
decision tree ID3 and C4.5”, (IJACSA) International Journal of Advanced Computer Science and
Applications, Special Issue on Advances in Vehicular Ad Hoc Networking and Applications.
17. Monika Gandhi, Dr. Shailendra Narayan Singh, “Predictions in Heart Disease Using Techniques of Data
Mining”, 1st International Conference on Futuristic trend in Computational Analysis and Knowledge
Management (ABLAZE-2015).
18. K. Sudhakar, Dr. M. Manimekalai, “Study of Heart Disease Prediction using Data Mining”, International
Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 1, January
2014.
19. Deepali Chandna, “Diagnosis of Heart Disease Using Data Mining Algorithm”, 1678-1680, (IJCSIT)
International Journal of Computer Science and Information Technologies, Vol. 5 (2), 2014.