Fuzzy Database for Heart Disease Diagnosis
Rehana Parvin
Department of Computer Science
Ryerson University
Toronto, Ontario M5B2K3, Canada
[email protected]

Dr. Abdolreza Abhari
Department of Computer Science
Ryerson University
Toronto, Ontario M5B2K3, Canada
[email protected]

Keywords: fuzzy database, inaccurate data, membership function, linguistic representation, SQL.

Abstract
Using a database is a well-known method for storing information. In regular database systems, the sheer volume of data sometimes makes it impossible to satisfy the user's criteria and to provide the exact information needed to make a decision. In this paper, a medical fuzzy database is introduced to help provide users with accurate information when the data in the database are inaccurate. Inaccuracy in data means imprecise or vague values (like the words used in human conversation) or uncertainty in using the available information required for decision making. In this paper, a fuzzy database management system is introduced to diagnose the severity of a patient's heart disease by using existing data in common database systems.

1. Introduction
Information plays an important role in the progress of modern civilization in every sphere, from earth to sky, earth to planet, and earth to ocean. People are making amazing discoveries at a rapid pace based on information. Therefore, it is necessary that accurate information be obtained.
Data sizes are growing day by day, so it is a challenge to deal with the large amounts of data stored in databases. A regular database is based on Boolean logic, which means that a piece of information is either completely true or completely false. It is impossible to deal with imprecise information in classical database management systems. It is therefore important to find convenient ways to store and manage human perception based data, which is often vague and uncertain, in a regular database system.
Many data management tools are available within health care systems, but the analysis tools are not sufficient to discover hidden relationships in the data. Most medical information is vague, imprecise and uncertain. Extracting correct information from this data is considered an art, because it is complicated by many factors and its solution involves literally all of a human's abilities, including intuition and the subconscious.
Medical diagnosis is a complicated task that must be performed accurately and efficiently. According to the World Health Organization, 12 million deaths occur each year due to heart disease. It is the primary cause of death in adults. In the United States, 50% of deaths are due to cardiovascular disease. In fact, one person dies every 34 seconds from cardiovascular disease in the United States. Similarly, in other developed countries heart disease is one of the main causes of adult death. In order to decrease the mortality rate of cardiovascular disease, the disease must be diagnosed at an early stage.
Fuzzy data management capability in a database is important for storing vague data. Ignoring vague data management risks losing important information that may be useful for some applications. A database that supports vague, imprecise and uncertain information is called a fuzzy database. It is based on fuzzy logic and fuzzy set theory, which were introduced by Zadeh [1].
This paper is organized as follows: Section 2 describes related work. Section 3 presents expert opinion regarding the risk factors of heart disease and the fuzzy sets of the system. Section 4 deals with the methodology and the proposed system, including the algorithm of the system, the membership functions, the inputs and output of the system, SQL (Structured Query Language), several snapshots of the system, and system testing. Lastly, Section 5 concludes and discusses future work.
2. Related Works
There are many papers published related to the diagnosis of various diseases.
Shradhanjali [2] developed a fuzzy Petri net application for heart disease diagnosis. The rule base is associated with transitions for certainty factors. A fuzzy Petri net is drawn for the rule base, and the truth values of propositions are used to reach a decision about the disease.
Lavanya et al. [3] designed a fuzzy rule based inference system for the detection and diagnosis of lung cancer. The dataset, obtained from domain experts, covers symptoms, stages and treatment facilities, and the system aims to provide an efficient and easy method for diagnosing lung cancer.
Adeli et al. [4] proposed a Fuzzy expert
system for the Heart Disease Diagnosis. The
developed system uses fuzzy logic. In their system
the crisp value is fuzzified to get fuzzy values. The
expert system uses those fuzzy values and the output
is also fuzzy. The fuzzy output is defuzzified to get a
crisp output.
Soni et al. [5] designed an Intelligent and Effective Heart Disease Prediction System using weighted associative classifiers. They used Java as the front end and MS Access as the back-end tool. They consider only two cases for prediction (Heart Disease and No Heart Disease).
Neshat et al. [6] developed a Fuzzy Expert
System for diagnosis of Liver Disorder. They
considered two cases, people with healthy liver, and
people with unhealthy liver along with calibration of
disease risk intensity measure. The fuzzy inference
system is developed in MATLAB software.
Kadhim et al. [7] implemented a fuzzy expert system to diagnose back pain. The rules were developed by experts, and the decision sequence is illustrated by a decision tree.
All the works described above are systems that use fuzzy logic to diagnose disease. Most of them are based on developing a fuzzy inference engine by using MATLAB.
The difference between our work and these works is that we build our system on the data already available in the database. By focusing on database concepts, we developed a system that uses SQL on common database management systems such as Microsoft Access.
3. Expert's Opinion regarding the risk factors of Heart Disease and Fuzzy sets of the system
Here a very brief explanation of fuzzy theory in medical applications is presented. A crisp property P can be defined by a function µ: X → {0, 1}, whereas a fuzzy property is described by µ: X → [0, 1], where µ(x) indicates the degree to which x has the property. To clarify fuzzy logic in medical science, a simple example described in [8] and [9] is explained below.
Consider fever, where the linguistic variable is 'High Fever'. If the body temperature x is greater than 39 degrees Celsius, then µ(x) for the medical concept 'High Fever' is 1, which means x surely has 'High Fever'. If x is less than 38.5 degrees Celsius, then µ(x) for 'High Fever' is 0, which means x surely does not have 'High Fever'. If x is in the interval [38.5, 39] degrees Celsius, then x has the property 'High Fever' to some degree (i.e., a membership value) in [0, 1].
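The fever example can be sketched in a few lines of Python. The thresholds 38.5 and 39 come from the text; the linear ramp between them and the function name `high_fever_membership` are assumptions made for illustration, since the text only says the degree lies somewhere in [0, 1].

```python
def high_fever_membership(temp_c):
    """Degree to which a body temperature (in degrees Celsius) is 'High Fever'."""
    if temp_c >= 39.0:
        return 1.0  # surely 'High Fever'
    if temp_c <= 38.5:
        return 0.0  # surely not 'High Fever'
    # Assumption: membership rises linearly across the interval [38.5, 39].
    return (temp_c - 38.5) / (39.0 - 38.5)


if __name__ == "__main__":
    for t in (37.0, 38.75, 40.0):
        print(t, high_fever_membership(t))
```

A crisp property would return only 0 or 1; the ramp is what makes the property fuzzy.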
In our proposed system, each input is assigned to fuzzy sets according to the range it falls in. Then, to achieve a more accurate result, the membership value of each input is multiplied by the output membership function to generate the rule strength.
Taking the union of all rule strengths gives 44 rules. The rule strengths are generated according to the input, and we take the maximum of the union fuzzy set to produce the Heart Disease Condition as the desired output.
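One possible reading of this step is sketched below in Python. It assumes that each rule carries a single input membership value and a single output membership value, that the rule strength is their product, and that the final condition is the output label of the strongest rule. The function names, the rule representation and the sample numbers are all hypothetical; the 44 rules themselves are not listed in the text.

```python
def rule_strength(input_mf, output_mf):
    """Strength of one rule: input membership times output membership (assumed)."""
    return input_mf * output_mf


def strongest_condition(rules):
    """rules: iterable of (output_label, input_mf, output_mf) tuples.

    Returns the output label with the maximum rule strength and that strength.
    """
    strengths = {}
    for label, input_mf, output_mf in rules:
        s = rule_strength(input_mf, output_mf)
        # Union of the rule strengths per output label: keep the maximum.
        strengths[label] = max(strengths.get(label, 0.0), s)
    best = max(strengths, key=strengths.get)
    return best, strengths[best]


if __name__ == "__main__":
    # Made-up rule activations, for illustration only.
    sample_rules = [
        ("Healthy", 0.2, 1.0),
        ("Mild", 0.8, 0.5),
        ("Severe", 0.6, 1.0),
    ]
    print(strongest_condition(sample_rules))
```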
According to the American Heart Association [10], the major risk factors of heart disease are those that significantly increase the risk of heart and blood vessel (cardiovascular) disease. The more risk factors one is exposed to, the greater the chance of developing heart disease. Some risk factors, such as increasing age and gender, are fixed at birth and cannot be changed. People aged 65 or older are more likely to die of heart disease than people of other ages. Men have a greater risk of developing heart disease than women. However, older women have a risk similar to that of men. Cholesterol level is also affected by age and gender.
Blood sugar is another contributing factor that increases the risk of developing heart disease. If blood sugar is not controlled, a high risk remains. Blood pressure is another risk factor: high blood pressure increases the heart's workload and causes the heart muscle to thicken and stiffen. This stiffening of the heart impairs its normal function. Stress, an inactive lifestyle, smoking, alcohol, diet and nutrition are other risk factors that influence heart disease in the long run.
Therefore, in our system the input variables are: Age, Gender, Chest Pain, Cholesterol, Blood Pressure, Blood Sugar, Heart Beat, Electrocardiography (ECG), Exercise, Old Peak and Thallium Scan. The output is the Heart Disease Condition, shown as linguistic terms.
4. Methodology and Proposed System
In this section, the algorithm and methodology are explained first, and then snapshots of the system functions and testing are shown.
4.1 Algorithm
I. 11 inputs: Age, Gender, ChestPain, Cholesterol, BloodPressure, BloodSugar, HeartBeat, ECG, Exercise, OldPeak, ThalliumScan, Category of Heart Disease.
II. Output: Heart Disease Condition, shown as linguistic terms.
III. Each input has fuzzy variables.
IV. Each fuzzy variable is associated with a membership function.
V. The membership function is calculated for each fuzzy variable.
VI. The rule strength is calculated based on the membership function of the fuzzy variable.
VIII. Heart Disease Condition is calculated by taking the maximum of the selected output set as the final result.
4.2 Membership Function, Input and Output
For the input variable Cholesterol, we use Low Density Lipoprotein (LDL). However, it is also possible to use High Density Lipoprotein (HDL). For Blood Pressure, Systolic Blood Pressure is used.
Each fuzzy variable needs a membership function, and the rule strength is calculated from these membership functions.
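Because the ranges of neighbouring fuzzy sets overlap, a single reading can belong to more than one fuzzy set at once. The short Python sketch below illustrates this for systolic Blood Pressure, using the ranges listed in Table 3. Treating the closed ranges (for example 127-153) as inclusive on both ends is an assumption, and the names `BLOOD_PRESSURE_SETS` and `linguistic_labels` are hypothetical.

```python
# Range tests taken from Table 3 (Blood Pressure).
# Assumption: ranges written as "a-b" include both end points.
BLOOD_PRESSURE_SETS = {
    "Low": lambda bp: bp < 134,
    "Medium": lambda bp: 127 <= bp <= 153,
    "High": lambda bp: 142 <= bp <= 172,
    "Very High": lambda bp: bp > 154,
}


def linguistic_labels(bp):
    """Return every fuzzy set whose range contains the reading `bp`."""
    return [name for name, in_range in BLOOD_PRESSURE_SETS.items() if in_range(bp)]


if __name__ == "__main__":
    for reading in (120, 130, 150, 160, 180):
        print(reading, linguistic_labels(reading))
```

A reading of 150, for instance, falls in both Medium and High; the membership functions then decide how strongly it belongs to each.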
i. Age: This input consists of four fuzzy sets, i.e. linguistic variables (Young, Mid, Old, Very Old). Each linguistic variable has a membership function associated with it. The ranges of the fuzzy sets for Age are shown in Table 1.
Table 1 Age
Input Field    Range      Linguistic Representation
Age            < 38       Young
               33-45      Mid
               40-58      Old
               > 52       Very Old
Figure 1 Membership function for Age
ii. Chest Pain: This input field has four Chest Pain types: Typical Angina, Atypical Angina, Non-Angina, and Asymptomatic. A patient can have only one type of Chest Pain at a time. ChestPain is encoded as 1 = Typical Angina, 2 = Atypical Angina, 3 = Non-Angina and 4 = Asymptomatic.
iii. Cholesterol: This input field influences the result much more than the other input fields. Cholesterol can be Low Density Lipoprotein (LDL) or High Density Lipoprotein (HDL). In our system, we consider only LDL. However, it is possible to consider HDL instead of LDL. We use only one type at a time. This field has four fuzzy sets. Each fuzzy variable is associated with a membership function. The ranges of the fuzzy sets for Cholesterol are given in Table 2.
Table 2 Cholesterol
Input Field    Range      Linguistic Representation
Cholesterol    < 197      Low
               188-250    Medium
               217-307    High
               > 281      Very High
iv. Gender: This input field has two representations (Male and Female). 1 represents male and 0 represents female.
v. Blood Pressure: Another important risk factor is Blood Pressure. It can be of Systolic, Diastolic or Mean type. In our system, we consider Systolic Blood Pressure, although it is possible to choose any type of Blood Pressure. This field has four fuzzy sets. The ranges for the linguistic variable representation are given in Table 3. The membership function is calculated based on the range.
Table 3 Blood Pressure
Input Field       Range      Linguistic Representation
Blood Pressure    < 134      Low
                  127-153    Medium
                  142-172    High
                  > 154      Very High
Figure 2 Membership function for Blood Pressure
vi. Heart rate: This field has three fuzzy sets. Each linguistic representation is associated with a membership function. The ranges for each linguistic representation are given in Table 4.
Table 4 Linguistic Representation for Heart rate
Input Field    Range      Fuzzy sets
Heart rate     < 141      Low
               111-194    Medium
               > 152      High
vii. Blood Sugar: This field plays an important role in changing the results. It has two linguistic representations. Each fuzzy variable is associated with a membership function based on the range. The ranges of the fuzzy sets are given in Table 5.
Table 5 Blood Sugar
Input Field    Range      Linguistic Representation
Blood Sugar    >= 120     Yes (1)
               < 120      No (0)
viii. Electrocardiography (ECG): An ECG records the electrical impulses of the heart muscle as several waves on graph paper, such as the P wave, Q wave, S wave and T wave. Normally the waves stay in the upper part of the graph. If any of the waves goes down, it is considered an abnormality. In developing our system, we assume that the S wave and T wave going down represents the abnormality, and this abnormality is named ST_Tabnormal in our system. This input field has three fuzzy sets: Normal, ST_Tabnormal and Hypertrophy. The ranges for the fuzzy sets are given in Table 6.
Table 6 ECG
Input Field                  Range        Fuzzy sets
Electrocardiography (ECG)    < 0.4        Normal
                             0.4 - 1.8    ST_T abnormal
                             > 1.8        Hypertrophy
Figure 3 Membership function for ECG
ix. Exercise: This field indicates whether the patient needs an exercise test. This input field has two fuzzy sets. If the patient requires an exercise test, the value 1 is entered, and if the patient does not need an exercise test, the value 0 is entered. The linguistic representations are "Yes" for 1 and "No" for 0. Each fuzzy set has a membership function associated with it.
x. Old Peak: This field is the ST depression induced by exercise relative to rest. The meaning of ST depression is related to the ECG field: it means that the patient's T wave and S wave in the ECG graph were previously down. Old Peak is needed to confirm the present condition of the T wave and S wave of the ECG. It has three fuzzy sets. Each fuzzy variable is associated with a membership function. The ranges for the fuzzy sets are given in Table 7.
Table 7 Old Peak
Input Field    Range        Fuzzy sets
Old Peak       < 2          Low
               1.5 - 4.2    Risk
               > 2.5        Terrible
xi. Thallium Scan: A thallium scan shows the redistribution of the heart image. This input field has three linguistic representations: Normal, Reversible Defect and Fixed Defect. It depends on the number of hours after which a heart image appears on the screen of the gamma camera, which is able to detect radioactive dye in the body. To develop our system, we assume that for Normal the heart image appears within 3 hours, for Fixed Defect it appears within 6 hours, and for Reversible Defect it appears within 7 hours. The linguistic representation for Thallium Scan is given in Table 8.
Table 8 Thallium Scan
Input Field      Range    Fuzzy sets
Thallium Scan    3        Normal
                 6        Fixed Defect
                 7        Reversible Defect
Output: The output is the presence of heart disease, valued from 0 (no presence, i.e. healthy condition) to 4. As the integer value increases, the heart disease risk increases. We divide the output into the fuzzy sets {Healthy, Mild, Moderate, Severe, VerySevere}. The ranges and membership function for the output variable are given in Table 9 and Figure 4.
Table 9 Output
Output Field    Range    Fuzzy sets
Result          <= 1     Healthy
                0-2      Mild
                1-3      Moderate
                2-4      Severe
                > 3      VerySevere
Figure 4 Membership function for Output
4.3 SQL (Structured Query Language)
In our developed system, all the inputs are connected together through SQL to reach the desired output. We build one Patient table containing all the inputs and then combine the inputs with SQL queries to get the desired output. One SQL query from our system is given below:
    SELECT Patient.Age,
      IIf([Age]<=29, 1, IIf([Age] Between 29 And 38, Format((38-[Age])/9,"#.00"), 0)) AS YoungMF,
      IIf([Age]=38, 1, IIf([Age] Between 33 And 38, Format(([Age]-33)/5,"#.00"), IIf([Age] Between 38 And 45, Format((45-[Age])/7,"#.00"), 0))) AS MidMF,
      IIf([Age]=48, 1, IIf([Age] Between 40 And 48, Format(([Age]-40)/8,"#.00"), IIf([Age] Between 48 And 58, Format((58-[Age])/10,"#.00"), 0))) AS OldMF,
      IIf([Age]>=60, 1, IIf([Age] Between 52 And 60, Format(([Age]-52)/8,"#.00"), 0)) AS VeryOldMF
    FROM Patient;
The above query computes the membership value of Age in each of the fuzzy sets Young, Mid, Old and Very Old.
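For readers who want to check the membership values outside Microsoft Access, the same piecewise-linear functions can be sketched in Python. The breakpoints (29, 33, 38, 40, 45, 48, 52, 58, 60) are copied from the query; the function name `age_memberships` is hypothetical, and the two-decimal formatting applied by `Format` in the query is deliberately left out.

```python
def age_memberships(age):
    """Membership of `age` in Young, Mid, Old and Very Old (breakpoints from the query)."""
    # Young: 1 up to age 29, then falls linearly to 0 at 38.
    if age <= 29:
        young = 1.0
    elif age <= 38:
        young = (38 - age) / 9
    else:
        young = 0.0

    # Mid: rises from 33 to a peak at 38, then falls to 0 at 45.
    if 33 <= age <= 38:
        mid = (age - 33) / 5
    elif 38 < age <= 45:
        mid = (45 - age) / 7
    else:
        mid = 0.0

    # Old: rises from 40 to a peak at 48, then falls to 0 at 58.
    if 40 <= age <= 48:
        old = (age - 40) / 8
    elif 48 < age <= 58:
        old = (58 - age) / 10
    else:
        old = 0.0

    # Very Old: rises from 52 and stays at 1 from age 60 onwards.
    if age >= 60:
        very_old = 1.0
    elif age >= 52:
        very_old = (age - 52) / 8
    else:
        very_old = 0.0

    return {"Young": young, "Mid": mid, "Old": old, "VeryOld": very_old}


if __name__ == "__main__":
    for sample_age in (25, 35, 56, 70):
        print(sample_age, age_memberships(sample_age))
```

At age 56, for example, the patient is partly Old (0.2) and partly Very Old (0.5), which is the overlap that Table 1 describes.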
The following figure gives an overall summary of our system:
[Block diagram with the components: Crisp Input, Fuzzification, Inference Engine, Heart Disease Rules, Heart Disease Condition]
Figure 5 Developed Fuzzy Expert Database Systems
4.4 Snapshots of the System
Some snapshots of the developed system are shown below.
Figure 6 Input field Age with membership function
Figure 7 Diagnostic of Heart Disease with risk factors
Figure 8 The Input Form for the Fuzzy Diagnostic of Heart Disease with all the risk factors
4.5 System Testing
We have tested our developed system with test values and also with the Cleveland Clinic Foundation's processed heart disease dataset. For more details, please see [11].
5. Conclusion and Future work
There are many applications based on fuzzy data analysis. However, building a fuzzy inference system on top of current database applications is rare. In medical diagnosis, huge collections of data are available in regular databases. The problem with using these databases is that inaccuracy in the data makes searching for some information impossible. In our proposed fuzzy database system, we use the same data with regular SQL to build fuzzy rules. The advantage of our system is that we can use the data already available in current database systems for decision making. We used standard SQL, so we do not need any special tools such as SQLF or FSQL. In the future, we want to extend our work with more research on using fuzzy databases for other diseases such as cancer and diabetes.
References:
[1] Zadeh, L.A., 1965, "Fuzzy sets", Information and Control, Vol. 8, pp. 338-353.
[2] Shradhanjali, 2012, "Fuzzy Petri Net Application: Heart Disease Diagnosis", International Conference on Computing and Control Engineering.
[3] Lavanya, K., Durai, M.A., N.Ch. Sriman Narayana Iyengar, 2011, "Fuzzy Rule Based Inference System for Detection and Diagnosis of Lung Cancer", International Journal of Latest Trends in Computing, Vol. 2, Issue 1, pp. 165-169.
[4] Adeli, A., Mehdi Neshat, 2010, "A Fuzzy Expert System for Heart Disease Diagnosis", Proceedings of the International Multi Conference of Engineers and Computer Scientists, Vol. I, ISSN 2078-0966.
[5] Soni, J., Ansari, U., Dipesh Sharma, 2011, "Intelligent and Effective Heart Disease Prediction System using Weighted Associative Classifiers", International Journal on Computer Science and Engineering, Vol. 3, No. 6, pp. 2385-2392.
[6] Neshat, M., Yaghobi, M., Naghibi, M.B., Esmaelzadeh, A., 2008, "Fuzzy Expert System Design for Diagnosis of Liver Disorders", International Symposium on Knowledge Acquisition and Modeling, IEEE, pp. 252-256.
[7] Kadhim, M., Alam, M., Harleen Kaur, 2011, "Design and Implementation of Fuzzy Expert System for Back Pain Diagnosis", International Journal of Innovative Technology & Creative Engineering, Vol. 1, No. 9, pp. 16-22.
[8] Klaus-Peter Adlassnig, 1986, "Fuzzy Set Theory in Medical Diagnosis", IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-16, No. 2, pp. 260-265.
[9] Phuong, N., Vladik Kreinovich, 2001, "Fuzzy Logic and its Applications in Medicine", International Journal of Medical Informatics, Vol. 62, No. 2-3, pp. 165-173.
[10] American Heart Association, Inc., 2006, "Risk Factors and Coronary Heart Disease", Journal of American Heart Association.
[11] Robert Detrano, M.D., Ph.D., V.A. Medical Center, Long Beach and Cleveland Clinic Foundation. Available: archive.ics.uci.edu/ml/datasets/Heart+Disease