Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Generating Rules through WEKA Dr. Alka Arora 1. Introduction Weka is open source software under the GNU General Public License. System is developed at the University of Waikato in New Zealand. “Weka” stands for the Waikato Environment for Knowledge Analysis. The software is freely available at http://www.cs.waikato.ac.nz/ml/weka. The system is written using object oriented language Java. There are several different levels at which Weka can be used. Weka provides implementations of state-of-the-art data mining and machine learning algorithms. Weka contains modules for data preprocessing, classification, clustering and association rule extraction. Main features of Weka include: 49 data preprocessing tools 76 classification/regression algorithms 8 clustering algorithms 15 attribute/subset evaluators + 10 search algorithms for feature selection. 3 algorithms for finding association rules 3 graphical user interfaces – “The Explorer” (exploratory data analysis) – “The Experimenter” (experimental environment) – “The KnowledgeFlow” (new process model inspired interface) 1.1 Weka: Download and Installation Download Weka (the stable version) from http://www.cs.waikato.ac.nz/ml/weka/ – Choose a self-extracting executable (including Java VM) – (If you are interested in modifying/extending weka there is a developer version that includes the source code) After download is completed, run the self extracting file to install Weka, and use the default set-ups. 1.2 Start the Weka From windows desktop, – click “Start”, choose “All programs”, Choose “Weka 3.6” to start Weka – Then the first interface window appears: Weka GUI Chooser. 1.3 Weka Application Interfaces Explorer – preprocessing, attribute selection, learning, visualiation Experimenter – testing and evaluating machine learning algorithms Knowledge Flow – visual design of KDD process – Explorer Simple Command-line – A simple interface for typing commands 1.4 WEKA data formats Data can be imported from a file in various formats: ARFF (Attribute Relation File Format) has two sections: – the Header information defines attribute name, type and relations. – the Data section lists the data records. CSV: Comma Separated Values (text file) Data can also be read from a database using ODBC connectivity. 1.4.1 Attribute Relation File Format (arff) ARFF format of animal dataset is shown below: @relation Animal_clustered @attribute Instance_number numeric @attribute Name {frog,toad,elephant,cat,dog,rabbit,jaguar,whale} @attribute Metabolism {ectothermic,endothermic} @attribute Cover {wetskin,hair} @attribute Dentition {superior,no,complete,the_youngest} @attribute Reproduction {oviparous,viviparous} @attribute 'Number of feet' numeric @data 0,frog,ectothermic,wetskin,superior,oviparous,4 1,toad,ectothermic,wetskin,no,oviparous,4 2,elephant,endothermic,hair,complete,viviparous,4 3,cat,endothermic,hair,complete,viviparous,4 4,dog,endothermic,hair,complete,viviparous,4 5,rabbit,endothermic,hair,complete,viviparous,4 6,jaguar,endothermic,hair,complete,viviparous,4 7,whale,endothermic,hair,the_youngest,viviparous,0 . 2. Weka: Knowledge Flow GUI Knowledge Flow is new graphical user interface in WEKA. It is Java-Beans-based interface for setting up and running machine learning experiments. In this, data sources, classifiers, etc. are beans and can be connected graphically. Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator” These layouts can be saved and loaded again later. 2.1 Example – Type: Classification – Feature selection: GainRatio; Ranker (top 3) – Algorithm: ID3 – Training: Weather_nominal.arff – Test: Weather_nominal.arff Steps to carry out this experiment in Weka Knowledge Flow are presented here: References: [1] Holmes, A. Donkin, I. H. Witten, WEKA: A Machine Learning Workbench, In Proceedings of the Second Australian and New Zealand Conference on Intelligent Information Systems, 357-361,1994. [2] Software, available at: http://www.cs.waikato.ac.nz/~ml/ [3] I. H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementation, Morgan Kaufmann publishers, 1999. [4] http://webpages.uncc.edu/~wjiang3/TA/weka/practiceWEKA.pdf [5] www.cs.waikato.ac.nz/ml/weka/index_documentation.html [6] http://www.cs.utexas.edu/users/ml/tutorials/Weka-tut/index.htm [7] http://www.cs.ru.nl/~peterl/teaching/DM/weka.pdf