Download Advances in Natural and Applied Sciences

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

K-nearest neighbors algorithm wikipedia , lookup

Transcript
Advances in Natural and Applied Sciences, 9(6) Special 2015, Pages: 639-644
AENSI Journals
Advances in Natural and Applied Sciences
ISSN:1995-0772 EISSN: 1998-1090
Journal home page: www.aensiweb.com/ANAS
Automatic Ontology Construction Through Decision Tree Classification Techniques
1
R. Geetha Ramani and 2S. Siva Sankari
1
Associate Professor, Department of Information Science and Technology, College of Engineering, Anna University, Chennai-600025,
India.
2
Research Scholar, Department of Information Science and Technology, College of Engineering, Anna University, Chennai-600025, India.
ARTICLE INFO
Article history:
Received 12 October 2014
Received in revised form 26 December
2014
Accepted 1 January 2015
Available online 25 February 2015
Keywords:
Automatic Ontology building,
Classification trees,
Data Mining, Hepatitis,
OWL.
ABSTRACT
Background: Hepatitis is one of the most infectious diseases which affects majority of
the population in all age group. Diagnosing hepatitis is a major issue for general
practitioners. Building of Ontology would provide the required solution. Ontology is
conceptualization of domain and its terms and relationships expressing formal
specification. The technique of ontology finds its application in almost every area, some
of which includes medicine, e-commerce, chemistry, education etc. Construction of
ontology involves laborious and time intensive task. Objective: In this paper, automatic
building of ontology structure is attempted through data mining classification
techniques. Classification of training data is carried out using various Decision tree
classification algorithms An analysis on outcome of various classifier (J48, Rep Tree,
Random Tree) with respect to their classification accuracy is carried out to choose the
best knowledge structure which could be further used for automatic ontology
construction through using OWL (Web Ontology Language). Results: Among the
various classification algorithms analyzed, J48 had yielded a higher accuracy of
89.13%. Conclusion: Hence the rules (Decision Tree) with features evolved by J48 are
considered for ontology construction through OWL. The training data considered for
this research work is Hepatitis dataset, which is available at UCI Machine Learning
Repository.
© 2015 AENSI Publisher All rights reserved.
To Cite This Article: R. Geetha Ramani and S. Siva Sankari., Automatic Ontology Construction Through Decision Tree Classification
Techniques. Adv. in Nat. Appl. Sci., 9(6): 639-644, 2015
INTRODUCTION
Hepatitis, the fifth most death causing diseases after heart disease, stroke, chest disease and cancer
(Mougiakakou, Valavanis and Nikita, et al., 2009) causes 1.5 million death worldwide each year. Various risk
factors for Hepatitis includes blood transfusions, tattoos and piercing, drug abuse, haemodialysis, health
workers, and sexual contact with hepatitis carrier (Shankaracharya, Kumari and Vidyarthi, 2012). Early stage
diagnosis is very difficult in general population due to the lack of regular routine check up as well as awareness.
Hence diagnosis totally depends on visual task done by expert doctors based on their expertise. Hence ontology
construction for Hepatitis diagnosis can help serve this situation.
Ontology is defined as a formal explicit specification of conceptualization of a domain and its relationships
(Asunción Gómez-Pérez, Mariano Fernández-López and Oscar Corcho, 2004). Though ontology has many-fold
applications, construction of ontology structure still remains a complex task. The first step in ontology
construction is building a framework. The main components of core of building an ontology structure includes
domain identification concepts and relationships identification in the domain of interest. Taxonomical Hierarchy
structure, organized by Super-sub concept relationship which includes Classes, Sub-Classes, Super-Classes,
Category definition and inter-relationship definition is created. Hence it would be highly useful if the process of
Ontology building could be automated.
In this proposed method, automatic ontology construction is attempted through data mining techniques and
subsequently diagnosing hepatitis. This paper is organized as follows: Section 1 presents the study of related
work. Section 2 details on the proposed methodology for automatic ontology constructions using classification
techniques. Section 3 concludes the paper and also gives direction on the future research in this area.
1.
Related Work:
Only a few attempts have been made to automatically group the concepts, some of which are summarized
below COBWEB clustering algorithm was adopted by software agents to automatically generate concepts for
Corresponding Author: R. Geetha Ramani, Associate Professor, Department of Information Science and Technology,
College of Engineering, Anna University, Chennai-60025, India.
640
R. Geetha Ramani and S. Siva Sankari, 2015
Advances in Natural and Applied Sciences, 9(6) Special 2015, Pages: 639-644
music domain (Clerkin, Cunningham and Hayes, 2002). Structured knowledge was created for gene-product
using iterative statistical information extraction in combination with nearest neighbour clustering (Blaschke, and
Valencia, 2002). Formal Concept Analysis was used to formally abstract data as conceptual structures (Quan,
Hui, Fong and Cao, 2004). A further refinement to Formal Concept Analysis was made in (Ganter, Stumme and
Wille, 2005) by incorporating fuzzy logic in it to deal with uncertainties in data and interpret the concept
hierarchy. The fuzzy formal concept analysis was used in automatic generation of ontology for scholarly
semantic web. TextOntEx constructs ontology from natural domain text using semantic pattern-based
approach, and analyze natural domain text to extract candidate relations, and map them into meaning
representation to facilitate ontology representation (Wuermli, Wrobel, Hui and Joller, 2003). Based on the data
mining outputs from rule sets and decision trees, Ontologies were built automatically. RDF, RDF-S and
DAML+OIL were used for defining Ontologies (Dahab, Hassan and Rafea, 2007).
In this work automatic ontology building is attempted through the rules generated by the classification
algorithm. The following section discusses the Hepatitis dataset and the proposed methodology.
2.
Proposed Method of Automatic ontology construction:
The dataset used for processing the proposed methodology is detailed here. This hepatitis disease dataset
deals with whether patients with hepatitis will either live or die. The used data source in this study was taken
from UCI machine learning repository. The purpose of the dataset is to predict the presence or absence of
hepatitis disease given the results of various medical tests carried out on a patient.
Table 1: Features of Hepatitis Dataset.
No
Feature Name
1
Age
2
Sex
3
Steroid
4
Antivirals
5
Fatigue
6
Malaise
7
Anorexia
8
Liver big
9
Liver firm
10
Spleen palpable
11
Spiders
12
Ascites
13
Varices
14
Bilirubin
15
Alk phosphate
16
SGOT
17
ALBUMIN
18
PROTIME
19
HISTOLOGY
Domain values of Feature
10,20,30,40,50,60,70,80
Male, Female
Yes, No
Yes, No
Yes, No
Yes, No
Yes, No
Yes, No
Yes, No
Yes, No
Yes, No
Yes, No
Yes, No
0.39,0.80,1.20,2.00,3.00,4.00
3,38,01,20,16,02,00,250
13,10,02,00,30,04,00,500
2.1,3.0,3.8,4.5,5.0,6.0
10,20,30,40,50,60,70,80,90
Yes, No
The proposed framework is given in Fig 1. The proposed framework consists of domain identification, data
pre-processing, Building Decision tree and OWL construction.
Domain Identification
Domain Data
TABLE I. INFORMATION ABOUT THE FEATURES OF THE HEPATITIS
Data Pre-Processing
DATASET
Attributes
J48,REP
Tree,Random
Tree
Building DecisionTree
Name of
Number features
Extracted knowledge The values of features
1 Age
10,20,30,40,50,60,70,80
2 Sex
Male, Female
Classes, Sub classes,
OWL
3 Steroid
Yes, No
Super classes and
Category
4 Antivirals
Yes, No
5 Fatigue
Yes, No
Built Ontology
6 Malaise
Yes, No
7 Anorexia
Yes, No
Fig. 1: Automatic ontological structure framework through classification techniques.
8 Liver big
Yes, No
9 Liver firm
Yes, No
10 Spleen palpable Yes, No
11 Spiders
Yes, No
12 Ascites
Yes, No
641
R. Geetha Ramani and S. Siva Sankari, 2015
Advances in Natural and Applied Sciences, 9(6) Special 2015, Pages: 639-644
Domain Identification:
For the construction of the current ontology framework, the domain is chosen as hepatitis dataset, which is
obtained from UCI machine learning repository [http://archive.ics.uci.edu/ml/datasets/Hepatitis]. The dataset
contains 155 instances distributed between two classes (die, live) die with 32 instances and live with 123
instances. There are 19 features or attributes, 13 attributes are binary while 6 attributes with 6-8 discrete values
and some missing data. The goal of the dataset is to forecast the presence or absence of hepatitis virus.
Data Pre-Processing:
The hepatitis domain data is given as input to the data pre-processing step. To make data processing
interoperable between WEKA and OWL, the blank spaces in the attribute names are removed. This dataset
contains some missing values. The existing classifiers itself had procedure for handling the missing values. In
the case of J48 classifier, any split on an attribute with missing value will be done with weights proportional to
frequencies of the observed non-missing values (Ian Witten, Eibe Frank and Mark AFrank's, 2005). The preprocessed data is considered further for classification.
Classification:
Classification is used to classify data into predefined class labels. Class in classification is the attribute or
feature in a data set, in which users are most interested. Classification can be used to diagnose hepatitis and
prognosis based on symptoms and health conditions (Shomona Gracia Jacob et al., 2012). Decision tree learning
is one of the most widely used techniques for classification (Geetha Ramani and Jacob, 2013). In the present
study, three different state-of-art supervised machine learning algorithms namely J48, Rep Tree and Random
tree algorithm were analyzed. J48 implements C4.5 decision tree learning algorithm (Quinlan, 1993- Esposito,
Malerba and G. Semeraro, 1997). In this proposed method J48 algorithm serves to be the best one with the
highest accuracy of 89.13% through the validation procedure namely viz. percentage split with the proposition
of 70-30 (70% of the data is used for training and 30% of the data) is utilized for testing. The results are
tabulated in Table 2. Hence rules (in the form Decision tree) generated through J48 classification algorithm
(Shomona Gracia Jacob and Geetha Ramani, 2012) was used for building the Ontology structure.
Table 2: Performance comparison of various classifiers.
Classifiers
J48
Rep Tree
Random Tree
Accuracy
89.13%
82.6%
78.2%
WEKA Decision Tree:
A WEKA Decision tree was evolved and serialized into dot format using WEKAAPI to read the document
and create the ontology using the OWLAPI. We used J48 decision tree algorithm to discover and extract
knowledge from structure data. Then we build ontology from the generated decision tree.
OWL Construction:
OWL construction using Java in Eclipse .We integrates the Java with OWLAPI. The implementation of this
work was carried out using WEKA 3.6.10, an open source data mining tool and Protégé 5, open source tool for
ontology framework creation. To extend the decision tree used to automatically construct of extend branch and
terminal branch of ontology. The process of construction is given as Pseudo code.
Pseudo code:
Begin
Loop for each (object e of edges-array)
Edge e1 = e as Edge
Node head = Nodes(e1.head) as Node
Node tail = Nodes(e1.tail) as Node
if(head. degree is greater than 1)
//extend branch
owl:extendBranch(head,tail,e1);
else
//build terminal branch
owl:terminateBranch(head,tail,e1);
end if
End for
642
R. Geetha Ramani and S. Siva Sankari, 2015
Advances in Natural and Applied Sciences, 9(6) Special 2015, Pages: 639-644
// Generating OWL structure - Extending at head node.
superCls = tail.label_with tail.ID;
subCls = head.label with head.ID;
//node specific descriptor
Class node_descriptor = tail.label with tail.ID;
//generalized descriptor
OWLClass descriptor =tail.label;
//apply label annotations
RDFSLabel:superCls,tail.label with tail ID);
RDFSLabel:subCls,head.label with head.ID);
//make tail node parent of head node
Subclass(supercls,subcls);
//make descriptor child of descriptor class.
SubClass of node_descriptor);
// Generating OWL strcuture - Terminating at leaf node
superCls = tail.label with tail.id
node_descriptor = tail.label with tail.ID;
descriptor = tail.label;
generalcategory = new (owl:Class("Category");
//apply label annotations
RDFSLabel of superCls,of tail.label and tail.ID;
RDFSLabelof node_descriptor,of tail.label with
tail.ID;
RDFSLabel of (descriptor,t.label);
//make tail node parent of head node
SubClass(superCls, scategory);
//make descriptor child of descriptor class.
SubClass(new owl:Class("Descriptor"),n_descriptor);
SubClass(n_descriptor,descriptor);
SubClass(generalcategory, category);
SubClass(category, scategory);
The automatic ontology construction of hierarchical structure, consisting of a set of classes organized in a
structured manner to represent the domain’s salient classes, a set of slots associated to classes to describe their
properties and relationships, and a set of instances of those classes. In OWL, classes are interpreted as sets of
sub classes. The hierarchy structure of Hepatitis is depicted in Fig 2. Automatically construct a OWL structure
generation, which consists of nodes.
Fig. 2: Taxonomy Hierarchy Structure for Hepatitis Ontology.
643
xxxx et al, 2015
Advances in Natural and Applied Sciences, x(x) Special 2015, Pages:X-X
is-a
is-a
is-a
SGOT
SGOT_N16
FATIGUE
is-a
ANTIVIRALS
is-a
FATIGUE_N13
is-a
is-a
is-a
is-a
PROTIME_N4
SEX
SEX_N7
ANTIVIRALS_N15
is-a
is-a
is-a
is-a
is-a
is-a
is-a
PROTIME
LIVER_FIRM_N9
MALAISE
is-a
MALAISE_N2
Descriptor
is-a
is-a
is-a
AGE
is-a
LIVEN6
AGE_N10
is-a
LIVEN14
is-a
is-a
SPIDERS_N1
is-a
LIVEN23
ASCITES_N0
is-a
is-a
is-a
is-a
Thing
ALBUMIN_N24
is-a
is-a
LIVEN17
is-a
is-a
LIVEN3
is-a
is-a
LIVER_FIRM_N2
64
LIVEN8
is-a
is-a
is-a
is-a
LIVEN8
LIVER_FIRM
is-a
is-a
LIVEN11
DIEN5
LIVER_BIG_N18
is-a
Category
DIEN5
ALBUMIN_N27
is-a
is-a
LIVER_BIG
is-a
is-a
is-a
DIEN12
DIEN25
DIEN19
LIVEN30
is-a
is-a
ALBUMIN_N20
LIVEN28
DIEN29
is-a
is-a
is-a
is-a
DIEN21
LIVEN22
is-a
Fig. 3: Automatically Constructed Ontology Structure.
ALBUMIN
In Extend branch represents head node denoted as Descriptor and Terminal branch represents the tail node
denoted as Category. Each and every node assigned as super class (superCls) and sub class (subCls) with their
node identification (ID) value like N0,N1...so on. In extend branch, super class will create a tail label (tail.label)
and tail identification (tail.ID). Subsequently sub class, create a head label (head.label) and head identification
(head.ID). Based on the domain, Ascites_N0 is assigned as a root node with the head Label(Ascites) with
head.ID (N0). Super class of the node Ascites_N0 have two subclass as Spiders_N1 and Albumin_N24.
Similarly Sub Classes are assigned to other Super Classes.
In Terminal branch, have predict the presence (Die) and absence (Live) of the disease. Each branch in the
decision tree may have a set of leaves. Each leaf in the decision tree represents a classification rule as well as
target class (Category). Based on the two branches, Automatic construction ontology from the extracted
knowledge represented in the decision tree was shown in the Fig 2.
Each and every node has, is-a hierarchy relationship between the Super class and Subclass. The building
ontology structure can visualize in OWLViz.The Fig 3 shows the OWL visualize tree .The automatic ontology
644
xxxx et al, 2015
Advances in Natural and Applied Sciences, x(x) Special 2015, Pages:X-X
construction for Hepatitis disease through J48 would assist the medical practitioners to great extent. The
automatic ontology construction provided the optimal solution and could provided efficient inference.
3.
Conclusion:
Hepatitis is one of the most death causing diseases whose diagnosis still remains challenging for the
medical practitioners. Thus, application of computational approaches for Hepatitis prognosis is of great
demand.. Since manual Ontology construction is a complex task, automatic ontology construction techniques
are sought. In this paper, automatic ontology construction is attempted through the decision trees generated by
the classification techniques. To get the optimal decision tree, different classification algorithms were
investigated, out of which J48 performed the best yielding an accuracy of 89.13%. The generated rules are used
for automatic construction of Ontology .The evolved ontology will have optimal set of important Descriptor
and Category that will aid the diagnosis of Hepatitis. This method of Ontology structure construction would be
great help to the medical community as well as various domain areas.
REFERENCES
Asunción Gómez-Pérez, Mariano Fernández-López, Oscar Corcho, 2004. Ontological Engineering: With
Examples from the Areas of Knowledge Management, E-commerce and the Semantic Web, Springer, 1: 5-10.
Blaschke, C., A. Valencia, 2002. Automatic Ontology Construction from the Literature”, Genome
Informatics, 13: 201-213.
Clerkin, P., P. Cunningham and C. Hayes, 2002. Ontology Discovery for the Semantic Web Using
Hierarchical Clustering, Trinity College Dublin, Ireland.
Dahab, M.Y., H. Hassan and A. Rafea, 2007. TextOntoEx: Automatic ontology construction from natural
English text, Expert Systems with Applications, Elsevier, 34: 1474-1480.
DOI:10.1038/npre.2012.7093.1:Posted2.
Esposito, F., D. Malerba and G. Semeraro, 1997. A comparative Analysis of Methods for Pruning Decision
Trees, IEEE transactions on pattern analysis and machine intelligence, 19(5): 476-491.
Ganter, B., G. Stumme, R. Wille, (Eds.), 2005. Formal Concept Analysis: Foundations and Applications.
Lecture Notes in Artificial Intelligence, Springer-Verlag, 3626.
Geetha Ramani, R. and S.G. Jacob, 2013. Improved classification of Lung cancer tumors based on
structural and physicochemical properties of proteins using data mining models. PLoS ONE, 8(3): e58772.
Ian H. Witten, Eibe Frank and Mark AFrank's textbook, 2005. Data Mining Practical Machine Learning
Tools and Techniques, 2nd. Ed.
Mougiakakou, G.S., K.I. Valavanis, A. Nikita, et al. 2009. Diagnostic Support Systems and Computational
Intelligence: Differential Diagnosis of Hepatic Lesions from Computed Tomography Images”, IGI.
Quan, T.T., S.C. Hui, A.C.M. Fong and T.H. Cao, 2004. Automatic generation of ontology for scholarly
semantic Web. In: Lecture Notes in Computer Science, 3298: 726-740.
Quinlan, J.R., 1993. C4.5:programs for machine learning: Morgan Kaufmann Publishers Inc., 302.
Shankaracharya, Kumari, S., S.A. Vidyarthi, 2012. Development of java based graphical user interface for
diagnosis of hepatitis using mixture of expert, Nature proceeding.
Shomona Gracia Jacob and R. Geetha Ramani, 2012. Evolving efficient classification rules from
Cardiotocography data through data mining methods and techniques. European Journal of Scientific Research,
78(3): 468-480.
Shomona Gracia Jacob, R. Geetha Ramani and P. Nancy, 2012. Efficient classifier for classification of
Hepatitis C Virus clinical data through data mining algorithms and techniques. Proceedings of the International
Conference on Computer Applications, Techno Forum Group, Pondicherry, India, 27-31.
UCI Machine Learning Repository, [Online]. Available: http://archive.ics.uci.edu/ml/datasets/Hepatitis.
Wuermli, O., A. Wrobel, S.C. Hui and J.M. Joller, 2003. Data Mining For Ontology Building: Semantic
Web Overview, Diploma Thesis–Dep. of Computer science, Nanyang Technological University.