Advances in Natural and Applied Sciences, 9(6) Special 2015, Pages: 639-644
AENSI Journals
Advances in Natural and Applied Sciences
ISSN: 1995-0772 EISSN: 1998-1090
Journal home page: www.aensiweb.com/ANAS

Automatic Ontology Construction Through Decision Tree Classification Techniques

1 R. Geetha Ramani and 2 S. Siva Sankari

1 Associate Professor, Department of Information Science and Technology, College of Engineering, Anna University, Chennai-600025, India.
2 Research Scholar, Department of Information Science and Technology, College of Engineering, Anna University, Chennai-600025, India.

ARTICLE INFO
Article history:
Received 12 October 2014
Received in revised form 26 December 2014
Accepted 1 January 2015
Available online 25 February 2015

Keywords: Automatic Ontology building, Classification trees, Data Mining, Hepatitis, OWL.

ABSTRACT
Background: Hepatitis is one of the most infectious diseases and affects a majority of the population across all age groups. Diagnosing hepatitis is a major challenge for general practitioners, and building an ontology can help address it. An ontology is a formal specification of the conceptualization of a domain, its terms and their relationships. Ontologies find application in almost every area, including medicine, e-commerce, chemistry and education. Constructing an ontology, however, is a laborious and time-intensive task.
Objective: In this paper, automatic building of the ontology structure is attempted through data mining classification techniques. The training data is classified using several decision tree classification algorithms (J48, REP Tree, Random Tree), and their classification accuracy is compared to choose the best knowledge structure, which is then used for automatic ontology construction in OWL (Web Ontology Language).
Results: Among the classification algorithms analyzed, J48 yielded the highest accuracy, 89.13%.
Conclusion: Hence the rules (decision tree) and features evolved by J48 are used for ontology construction through OWL. The training data considered for this research work is the Hepatitis dataset, which is available at the UCI Machine Learning Repository.

© 2015 AENSI Publisher All rights reserved.

To Cite This Article: R. Geetha Ramani and S. Siva Sankari., Automatic Ontology Construction Through Decision Tree Classification Techniques. Adv. in Nat. Appl. Sci., 9(6): 639-644, 2015

INTRODUCTION

Hepatitis, the fifth leading cause of death after heart disease, stroke, chest disease and cancer (Mougiakakou, Valavanis and Nikita, et al., 2009), causes 1.5 million deaths worldwide each year. Risk factors for hepatitis include blood transfusions, tattoos and piercing, drug abuse, haemodialysis, health-care work, and sexual contact with a hepatitis carrier (Shankaracharya, Kumari and Vidyarthi, 2012). Early-stage diagnosis is very difficult in the general population because of the lack of regular check-ups and of awareness. Diagnosis therefore depends entirely on visual examination by expert doctors, based on their expertise. Ontology construction for hepatitis diagnosis can help in this situation.

An ontology is defined as a formal, explicit specification of the conceptualization of a domain and its relationships (Gómez-Pérez, Fernández-López and Corcho, 2004). Although ontologies have many applications, constructing an ontology structure remains a complex task. The first step in ontology construction is building a framework. The core components of building an ontology structure are domain identification and the identification of concepts and relationships in the domain of interest.
A taxonomical hierarchy, organized by super-sub concept relationships, is then created; it includes classes, sub-classes, super-classes, category definitions and inter-relationship definitions. It would therefore be highly useful if the process of ontology building could be automated. In the proposed method, automatic ontology construction is attempted through data mining techniques and subsequently applied to diagnosing hepatitis.

This paper is organized as follows: Section 1 presents the study of related work. Section 2 details the proposed methodology for automatic ontology construction using classification techniques. Section 3 concludes the paper and also gives direction on future research in this area.

Corresponding Author: R. Geetha Ramani, Associate Professor, Department of Information Science and Technology, College of Engineering, Anna University, Chennai-600025, India.

1. Related Work:

Only a few attempts have been made to group concepts automatically; some of them are summarized below. The COBWEB clustering algorithm was adopted by software agents to automatically generate concepts for the music domain (Clerkin, Cunningham and Hayes, 2002). Structured knowledge was created for gene products using iterative statistical information extraction in combination with nearest neighbour clustering (Blaschke and Valencia, 2002). Formal Concept Analysis was used to formally abstract data as conceptual structures (Ganter, Stumme and Wille, 2005). A further refinement to Formal Concept Analysis was made by Quan, Hui, Fong and Cao (2004), who incorporated fuzzy logic to deal with uncertainties in data and to interpret the concept hierarchy. The fuzzy formal concept analysis was used in automatic generation of an ontology for the scholarly semantic web.
TextOntoEx constructs an ontology from natural domain text using a semantic pattern-based approach: it analyzes the text to extract candidate relations and maps them into a meaning representation to facilitate ontology representation (Dahab, Hassan and Rafea, 2007). Ontologies have also been built automatically from data mining outputs such as rule sets and decision trees, with RDF, RDF-S and DAML+OIL used for defining the ontologies (Wuermli, Wrobel, Hui and Joller, 2003). In this work, automatic ontology building is attempted through the rules generated by the classification algorithm. The following section discusses the Hepatitis dataset and the proposed methodology.

2. Proposed Method of Automatic ontology construction:

The dataset used for the proposed methodology is detailed here. This hepatitis disease dataset deals with whether patients with hepatitis will live or die. The data source used in this study was taken from the UCI machine learning repository. The purpose of the dataset is to predict the presence or absence of hepatitis disease given the results of various medical tests carried out on a patient.

Table 1: Features of Hepatitis Dataset.
No   Feature Name       Domain values of Feature
1    Age                10, 20, 30, 40, 50, 60, 70, 80
2    Sex                Male, Female
3    Steroid            Yes, No
4    Antivirals         Yes, No
5    Fatigue            Yes, No
6    Malaise            Yes, No
7    Anorexia           Yes, No
8    Liver big          Yes, No
9    Liver firm         Yes, No
10   Spleen palpable    Yes, No
11   Spiders            Yes, No
12   Ascites            Yes, No
13   Varices            Yes, No
14   Bilirubin          0.39, 0.80, 1.20, 2.00, 3.00, 4.00
15   Alk phosphate      33, 80, 120, 160, 200, 250
16   SGOT               13, 100, 200, 300, 400, 500
17   ALBUMIN            2.1, 3.0, 3.8, 4.5, 5.0, 6.0
18   PROTIME            10, 20, 30, 40, 50, 60, 70, 80, 90
19   HISTOLOGY          Yes, No

The proposed framework is given in Fig 1. It consists of domain identification, data pre-processing, building the decision tree and OWL construction.

[Figure: Domain Identification -> Domain Data -> Data Pre-Processing -> Attributes -> Building Decision Tree (J48, REP Tree, Random Tree) -> Extracted knowledge -> OWL (Classes, Sub classes, Super classes and Category) -> Built Ontology]
Fig. 1: Automatic ontological structure framework through classification techniques.

Domain Identification:
For the construction of the current ontology framework, the chosen domain is the hepatitis dataset, which is obtained from the UCI machine learning repository [http://archive.ics.uci.edu/ml/datasets/Hepatitis]. The dataset contains 155 instances distributed between two classes: die with 32 instances and live with 123 instances. There are 19 features (attributes): 13 are binary and 6 take 6-8 discrete values, and some values are missing. The goal of the dataset is to forecast the presence or absence of hepatitis virus.

Data Pre-Processing:
The hepatitis domain data is given as input to the data pre-processing step. To make data processing interoperable between WEKA and OWL, the blank spaces in the attribute names are removed. The dataset contains some missing values, which the classifiers handle with their own built-in procedures. In the case of the J48 classifier, any split on an attribute with a missing value is done with weights proportional to the frequencies of the observed non-missing values (Witten and Frank, 2005). The pre-processed data is then passed on for classification.

Classification:
Classification assigns data to predefined class labels.
The class is the attribute or feature of a data set in which users are most interested. Classification can be used for the diagnosis and prognosis of hepatitis based on symptoms and health conditions (Shomona Gracia Jacob et al., 2012). Decision tree learning is one of the most widely used techniques for classification (Geetha Ramani and Jacob, 2013). In the present study, three state-of-the-art supervised machine learning algorithms, namely J48, REP Tree and Random Tree, were analyzed. J48 implements the C4.5 decision tree learning algorithm (Quinlan, 1993; Esposito, Malerba and Semeraro, 1997). In the proposed method, J48 performs best, with the highest accuracy of 89.13% under percentage-split validation with a 70-30 proportion (70% of the data is used for training and 30% for testing). The results are tabulated in Table 2. Hence the rules (in the form of a decision tree) generated by the J48 classification algorithm (Shomona Gracia Jacob and Geetha Ramani, 2012) were used for building the ontology structure.

Table 2: Performance comparison of various classifiers.
Classifier     Accuracy
J48            89.13%
REP Tree       82.6%
Random Tree    78.2%

WEKA Decision Tree:
A WEKA decision tree was evolved and serialized into DOT format using the WEKA API; the resulting document is then read and the ontology is created using the OWL API. We used the J48 decision tree algorithm to discover and extract knowledge from structured data, and then built the ontology from the generated decision tree.

OWL Construction:
The OWL construction is implemented in Java in Eclipse, with Java integrated with the OWL API. The implementation was carried out using WEKA 3.6.10, an open source data mining tool, and Protégé 5, an open source tool for ontology framework creation. The decision tree is traversed to automatically construct the extend branches and terminal branches of the ontology. The process of construction is given as pseudo code.
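The construction step depends on reading the serialized tree back into node and edge arrays. The following Python sketch illustrates one way to do this; it is not the authors' Java implementation. It assumes the DOT text has the shape that WEKA's J48 typically emits (one `Nk [label="..."]` line per node and one `Ni->Nj [label="..."]` line per branch). The sample tree, its instance counts and the helper name `parse_dot` are hypothetical.

```python
import re

# Hypothetical DOT text in the assumed J48 style; the tree and counts are invented.
DOT_TEXT = '''digraph J48Tree {
N0 [label="ASCITES" ]
N0->N1 [label="= no"]
N1 [label="SPIDERS" ]
N1->N2 [label="= no"]
N2 [label="LIVE (10.0)" shape=box style=filled ]
N1->N3 [label="= yes"]
N3 [label="DIE (4.0)" shape=box style=filled ]
N0->N4 [label="= yes"]
N4 [label="DIE (6.0/1.0)" shape=box style=filled ]
}'''

NODE_RE = re.compile(r'^(N\d+)\s*\[label="([^"]*)"')
EDGE_RE = re.compile(r'^(N\d+)->(N\d+)\s*\[label="([^"]*)"')


def parse_dot(text):
    """Return ({node_id: label}, [(tail_id, head_id, branch_test), ...])."""
    nodes, edges = {}, []
    for line in text.splitlines():
        line = line.strip()
        edge = EDGE_RE.match(line)
        if edge:
            edges.append(edge.groups())
            continue
        node = NODE_RE.match(line)
        if node:
            node_id, label = node.groups()
            # Drop a trailing instance count such as " (10.0)" from leaf labels.
            nodes[node_id] = re.sub(r'\s*\(.*\)$', '', label)
    return nodes, edges


nodes, edges = parse_dot(DOT_TEXT)
print(nodes)
print(edges)
```

Each edge keeps the tail (parent) first and the head (child) second, which is the orientation the construction loop relies on.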
Pseudo code:

    Begin
    Loop for each (object e of edges-array)
        Edge e1 = e as Edge
        Node head = Nodes(e1.head) as Node
        Node tail = Nodes(e1.tail) as Node
        if (head.degree is greater than 1)
            // extend branch
            owl:extendBranch(head, tail, e1);
        else
            // build terminal branch
            owl:terminateBranch(head, tail, e1);
        end if
    End for

    // Generating OWL structure - Extending at head node.
    superCls = tail.label with tail.ID;
    subCls = head.label with head.ID;
    // node specific descriptor
    node_descriptor = tail.label with tail.ID;
    // generalized descriptor
    descriptor = tail.label;
    // apply label annotations
    RDFSLabel(superCls, tail.label with tail.ID);
    RDFSLabel(subCls, head.label with head.ID);
    // make tail node parent of head node
    SubClass(superCls, subCls);
    // make descriptor child of descriptor class
    SubClass(node_descriptor, descriptor);

    // Generating OWL structure - Terminating at leaf node.
    superCls = tail.label with tail.ID;
    node_descriptor = tail.label with tail.ID;
    descriptor = tail.label;
    // leaf (target class) names
    category = head.label;
    scategory = head.label with head.ID;
    generalcategory = new owl:Class("Category");
    // apply label annotations
    RDFSLabel(superCls, tail.label with tail.ID);
    RDFSLabel(node_descriptor, tail.label with tail.ID);
    RDFSLabel(descriptor, tail.label);
    // make tail node parent of head node
    SubClass(superCls, scategory);
    // make descriptor child of descriptor class
    SubClass(new owl:Class("Descriptor"), node_descriptor);
    SubClass(node_descriptor, descriptor);
    SubClass(generalcategory, category);
    SubClass(category, scategory);

The automatically constructed ontology is a hierarchical structure consisting of a set of classes organized in a structured manner to represent the domain's salient classes, a set of slots associated with classes to describe their properties and relationships, and a set of instances of those classes. In OWL, classes are interpreted as sets of individuals, so a sub class denotes a subset of its super class. The hierarchy structure of Hepatitis is depicted in Fig 2.
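The loop in the pseudo code can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' Java code: the degree of a node is taken to be the number of edges touching it, descriptor nodes are named `LABEL_ID` and leaf nodes `LABELID` (following names such as ASCITES_N0 and LIVEN6 in the paper), and the small tree and the function name `build_subclass_pairs` are hypothetical. The Descriptor bookkeeping and the label annotations are left out to keep the sketch short.

```python
from collections import Counter

# Hypothetical tree: node id -> label, and (tail, head) edges with the parent first.
nodes = {"N0": "ASCITES", "N1": "SPIDERS", "N2": "LIVE", "N3": "DIE", "N4": "DIE"}
edges = [("N0", "N1"), ("N1", "N2"), ("N1", "N3"), ("N0", "N4")]


def build_subclass_pairs(nodes, edges):
    """Return ordered, de-duplicated (super class, sub class) name pairs."""
    degree = Counter()
    for tail, head in edges:
        degree[tail] += 1
        degree[head] += 1

    pairs = []
    for tail, head in edges:
        super_cls = f"{nodes[tail]}_{tail}"
        if degree[head] > 1:
            # Extend branch: the head is another test node, so it becomes
            # a sub class of the tail node.
            pairs.append((super_cls, f"{nodes[head]}_{head}"))
        else:
            # Terminal branch: the head is a leaf carrying the target class.
            category = nodes[head]
            s_category = f"{category}{head}"
            pairs.append((super_cls, s_category))
            pairs.append(("Category", category))
            pairs.append((category, s_category))
    # dict.fromkeys keeps the first occurrence of each pair, in order.
    return list(dict.fromkeys(pairs))


pairs = build_subclass_pairs(nodes, edges)
for super_cls, sub_cls in pairs:
    print(f"{sub_cls} is-a {super_cls}")
```

A leaf has only its incoming edge, so its degree is 1 and it falls into the terminal branch; every other head node has at least one outgoing edge as well and is extended.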
The automatically generated OWL structure consists of nodes.

Fig. 2: Taxonomy Hierarchy Structure for Hepatitis Ontology.

[Figure: is-a graph rooted at Thing with the branches Descriptor and Category; descriptor nodes include ASCITES_N0, SPIDERS_N1, SGOT_N16 and ALBUMIN_N24, and category nodes include LIVEN6 and DIEN5]
Fig. 3: Automatically Constructed Ontology Structure.

Nodes created in the extend branch are denoted as Descriptor, and nodes created in the terminal branch are denoted as Category. Every node is assigned as a super class (superCls) or sub class (subCls) together with its node identification (ID) value, such as N0, N1 and so on. In the extend branch, the super class is created from the tail label (tail.label) and tail identification (tail.ID), and the sub class from the head label (head.label) and head identification (head.ID). For the hepatitis domain, Ascites_N0 is the root node, with the label Ascites and the ID N0. The super class Ascites_N0 has two sub classes, Spiders_N1 and Albumin_N24. Sub classes are assigned to the other super classes in the same way. The terminal branch predicts the presence (Die) or absence (Live) of the disease. Each branch in the decision tree may have a set of leaves. Each leaf in the decision tree represents a classification rule as well as a target class (Category).
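The super class and sub class assignments described above map directly onto OWL SubClassOf axioms. The following minimal Python sketch, written without the OWL API that the authors used, serializes such pairs in OWL 2 functional-style syntax. The ontology IRI `http://example.org/hepatitis` and the function name `to_functional_syntax` are placeholders chosen for illustration; the two pairs are the Ascites_N0 example from the text.

```python
BASE = "http://example.org/hepatitis"  # hypothetical ontology IRI

# (super class, sub class) pairs taken from the example in the text.
pairs = [("ASCITES_N0", "SPIDERS_N1"), ("ASCITES_N0", "ALBUMIN_N24")]


def to_functional_syntax(pairs, base=BASE):
    """Serialize (super, sub) pairs as an OWL 2 functional-syntax document."""
    classes = sorted({name for pair in pairs for name in pair})
    lines = [f"Prefix(:=<{base}#>)", f"Ontology(<{base}>"]
    lines += [f"Declaration(Class(:{cls}))" for cls in classes]
    # Functional syntax lists the sub class first, then the super class.
    lines += [f"SubClassOf(:{sub} :{sup})" for sup, sub in pairs]
    lines.append(")")
    return "\n".join(lines)


owl_text = to_functional_syntax(pairs)
print(owl_text)
```

Text in this form can be saved to a file and opened in an ontology editor such as Protégé to inspect the resulting is-a hierarchy.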
Based on these two branches, the ontology is constructed automatically from the knowledge extracted in the decision tree, as shown in Fig 2. Every node has an is-a hierarchy relationship between super class and sub class. The built ontology structure can be visualized in OWLViz; Fig 3 shows the OWLViz tree. The automatic ontology construction for hepatitis through J48 would assist medical practitioners to a great extent. Because it is built from the most accurate of the three decision trees, the constructed ontology could provide efficient inference.

3. Conclusion:

Hepatitis is one of the leading causes of death, and its diagnosis remains challenging for medical practitioners. Computational approaches for hepatitis prognosis are therefore in great demand. Since manual ontology construction is a complex task, automatic ontology construction techniques are sought. In this paper, automatic ontology construction is attempted through the decision trees generated by classification techniques. To obtain the best decision tree, different classification algorithms were investigated, of which J48 performed best, yielding an accuracy of 89.13%. The generated rules are used for automatic construction of the ontology. The evolved ontology has a set of important Descriptor and Category classes that will aid the diagnosis of hepatitis. This method of ontology structure construction would be of great help to the medical community as well as to other domains.

REFERENCES

Blaschke, C. and A. Valencia, 2002. Automatic Ontology Construction from the Literature, Genome Informatics, 13: 201-213.
Clerkin, P., P. Cunningham and C. Hayes, 2002. Ontology Discovery for the Semantic Web Using Hierarchical Clustering, Trinity College Dublin, Ireland.
Dahab, M.Y., H. Hassan and A. Rafea, 2007. TextOntoEx: Automatic ontology construction from natural English text, Expert Systems with Applications, Elsevier, 34: 1474-1480.
Esposito, F., D. Malerba and G. Semeraro, 1997. A comparative Analysis of Methods for Pruning Decision Trees, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5): 476-491.
Ganter, B., G. Stumme and R. Wille (Eds.), 2005. Formal Concept Analysis: Foundations and Applications. Lecture Notes in Artificial Intelligence, Springer-Verlag, 3626.
Geetha Ramani, R. and S.G. Jacob, 2013. Improved classification of Lung cancer tumors based on structural and physicochemical properties of proteins using data mining models. PLoS ONE, 8(3): e58772.
Gómez-Pérez, A., M. Fernández-López and O. Corcho, 2004. Ontological Engineering: With Examples from the Areas of Knowledge Management, E-commerce and the Semantic Web, Springer, 1: 5-10.
Mougiakakou, G.S., K.I. Valavanis, A. Nikita, et al., 2009. Diagnostic Support Systems and Computational Intelligence: Differential Diagnosis of Hepatic Lesions from Computed Tomography Images, IGI.
Quan, T.T., S.C. Hui, A.C.M. Fong and T.H. Cao, 2004. Automatic generation of ontology for scholarly semantic Web. In: Lecture Notes in Computer Science, 3298: 726-740.
Quinlan, J.R., 1993. C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers Inc., 302.
Shankaracharya, Kumari, S., S.A. Vidyarthi, 2012. Development of java based graphical user interface for diagnosis of hepatitis using mixture of expert, Nature Precedings. DOI:10.1038/npre.2012.7093.1.
Shomona Gracia Jacob and R. Geetha Ramani, 2012. Evolving efficient classification rules from Cardiotocography data through data mining methods and techniques. European Journal of Scientific Research, 78(3): 468-480.
Shomona Gracia Jacob, R. Geetha Ramani and P. Nancy, 2012. Efficient classifier for classification of Hepatitis C Virus clinical data through data mining algorithms and techniques. Proceedings of the International Conference on Computer Applications, Techno Forum Group, Pondicherry, India, 27-31.
UCI Machine Learning Repository, [Online]. Available: http://archive.ics.uci.edu/ml/datasets/Hepatitis.
Witten, I.H. and E. Frank, 2005. Data Mining: Practical Machine Learning Tools and Techniques, 2nd Ed.
Wuermli, O., A. Wrobel, S.C. Hui and J.M. Joller, 2003. Data Mining For Ontology Building: Semantic Web Overview, Diploma Thesis, Dep. of Computer Science, Nanyang Technological University.