Weka Data Mining Software
Kai Adam
Proseminar 'Methods and Tools'
Rheinisch-Westfälische Technische Hochschule Aachen
Chair of Data Management and Exploration
Prof. Dr. T. Seidl
July 06, 2012

Contents
What's Weka?
Basics
    ARFF Data Set Format
    Classification
Essential Application Types
    Explorer
        Preprocessing
        Classification
        Cluster, Associate, Select Attributes, Visualization
    Experimenter, Knowledge Flow, Simple CLI
Review
Related Works
Summary
Bibliography

What's Weka?
• A workbench developed by Ian H. Witten, Mark A. Hall and Eibe Frank
• A workbench is a collection of different data mining tools
• Works with several single-relation data set formats
• Allows data sets to be processed and analysed with different algorithms for preprocessing, classification, clustering and visualization
• Offers a simple GUI with four different application types

ARFF Data Set Format
• ARFF stands for 'Attribute-Relation File Format'
• Stores data in terms of a relation
• Attributes can take different types of values: nominal, numeric, string, date or comparable ones
• Consists of a data field with the instances of the relation
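To make the ARFF slide concrete, here is a minimal sketch of an ARFF file and a reader for it. The sample data, the relation name and the function `parse_arff` are illustrative assumptions, not part of the slides or of Weka. The reader covers only the simple case: unquoted names and values, no sparse rows, and only `numeric` treated as a number.

```python
# Minimal ARFF reader: a sketch, not Weka's own loader.
# It handles only unquoted names and values, and no sparse rows.

SAMPLE_ARFF = """\
% Illustrative data, not taken from the slides
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Return (relation_name, attributes, instances) for a simple ARFF string."""
    relation = None
    attributes = []   # list of (name, type); a nominal type is a list of labels
    instances = []    # list of dicts mapping attribute name -> value
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                # Only "numeric" is converted here; everything else stays text.
                row[name] = float(value) if kind == "numeric" else value
            instances.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):           # nominal: {label, label, ...}
                kind = [label.strip() for label in kind.strip("{}").split(",")]
            else:
                kind = kind.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, instances


relation, attributes, instances = parse_arff(SAMPLE_ARFF)
print(relation, len(attributes), len(instances))
```

The header lines (`@relation`, `@attribute`) describe the relation and its attribute types, and every line after `@data` is one instance of that relation.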
Classification
• Is used to find a classifier (a model or rule set) that can predict the class attribute of an unknown data set

Classifier
• Relies on the values of the other attributes
• Should predict the class value as accurately as possible
• Works with training and test data sets obtained from different measurements and observations

Explorer: Preprocessing
• Generic term for all methods that process raw data
• Raw data must be of good quality to yield task-relevant data
• Most data sets are unusable as they are because they are incomplete, noisy or inconsistent

Explorer: Classification
• Weka can classify with several groups of classifiers and test options

Classifier groups
• Bayes
• Functions
• Trees
• ...

Listing 1: C4.5 pseudocode as described by Quinlan
1. Check for any base cases.
2. For each attribute a:
    1. Find the normalized information gain from splitting on a.
3. Let a_best be the attribute with the highest normalized information gain.
4. Create a decision node that splits on a_best.
5. Recurse on the sublists obtained by splitting on a_best, and add those nodes as children of node.
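Steps 2 and 3 of Listing 1 can be sketched as follows, reading "normalized information gain" as the gain ratio (information gain divided by split information). This is a simplified illustration for nominal attributes only, not Weka's J48 implementation. The function names and the toy rows are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, divided by split info."""
    labels = [row[target] for row in rows]
    # Group the class labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute values themselves.
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, attributes, target):
    """Pick the attribute with the highest gain ratio (a_best in the listing)."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))


# Illustrative toy data (hypothetical, nominal attributes only).
ROWS = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

print(best_attribute(ROWS, ["outlook", "windy"], "play"))
```

On this toy data, splitting on `outlook` yields two pure subsets, so it scores higher than `windy` and would become the root decision node. A full tree builder would then recurse on each subset as in step 5.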
FT-Algorithm
• Builds a functional tree
• Uses oblique splits and linear functions at the leaves

In comparison with the J48-Algorithm
• Higher rate of correctly classified instances
• Lower error rates
• Better choice in this comparison

Explorer: Cluster & Associate

Cluster
• Determines the predominant class in each cluster
• Compares the cluster assignments with the preassigned class

Associate
• Six algorithms to determine association rules between the attributes

Explorer: Select Attribute & Visualization

Select Attribute
• Finds the attributes that best predict a preassigned class

Visualization
• Visualizes a data set, not the results of a classification

Experimenter
• Allows algorithms to be run against each other
• Can be applied to different data sets
• Provides three panels: Setup, Run and Analyse

Knowledge Flow & Simple CLI

Knowledge Flow
• Offers better opportunities to work with large data sets
• Enables the user to design a data flow

CLI
• Gives access to options that are hidden in the Explorer
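Returning to the Cluster slide: the idea of picking a predominant class per cluster and comparing it with the preassigned class can be sketched as below. This is a simplified majority-vote version with hypothetical names and data. Weka's own classes-to-clusters evaluation may resolve the mapping between classes and clusters differently.

```python
from collections import Counter, defaultdict


def classes_to_clusters(cluster_ids, class_labels):
    """Map each cluster to its majority class and return the error rate."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster].append(label)
    # The most frequent class in a cluster is taken as that cluster's class.
    mapping = {c: Counter(labels).most_common(1)[0][0]
               for c, labels in members.items()}
    wrong = sum(1 for c, label in zip(cluster_ids, class_labels)
                if mapping[c] != label)
    return mapping, wrong / len(class_labels)


# Hypothetical cluster assignments and preassigned classes for six instances.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["L", "L", "R", "R", "R", "B"]
mapping, error = classes_to_clusters(clusters, classes)
print(mapping, error)
```

Here cluster 0 is labelled "L" and cluster 1 is labelled "R", so two of the six instances disagree with their cluster's class.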
Review

Pros
• Distributed under the GNU General Public License [GNU]
• Simple GUI that provides some useful features
• Portability

Cons
• Weka is not compatible with multi-relational data sets
• Lack of proper and adequate documentation

Related Works
• Freely available tools are in high demand
  http://www.kdnuggets.com/polls/2011/tools-analytics-datamining.html

RapidMiner
• Is the most commonly used data mining tool according to the KDnuggets poll
• Started at the Artificial Intelligence Unit of the University of Dortmund in 2001
• Implements Weka learning schemes and offers a variety of additional modeling methods

Summary
• Weka makes it easier to find a suitable algorithm out of a collection
• Users can implement their own algorithms
• Weka is becoming more comfortable to use
• Weka can be integrated on any platform that runs Java, which makes it portable

References & Bibliography

Generated to model psychological experiments reported by Siegler, R. S. (1976). Three Aspects of Cognitive Development. Cognitive Psychology, 8, 481-520.
http://archive.ics.uci.edu/ml/machine-learning-databases/balance-scale/

Data Mining Tool Usage.
http://www.kdnuggets.com/polls/2011/tools-analytics-datamining.html

Prof. Dr. Thomas Seidl: Slides from the Data Mining lecture, 2011.

Szugat, Martin: Im Datenrausch: Praktische Einführung in das Data Mining mit Weka 3.4.
http://bioweka.sourceforge.net/download/LE_1.05_75-79.pdf

Witten, Ian H. and Frank, Eibe and Hall, Mark A.: Data Mining: Practical Machine Learning Tools and Techniques. Third Edition. Morgan Kaufmann, 2011.

Frank, Eibe and Hall, Mark and Trigg, Len and Holmes, Geoffrey and Witten, Ian H.: Data mining in bioinformatics using Weka. Volume 20, Number 20, Pages 2479-2481, 2004.