Machine Learning with WEKA

Eibe Frank
Department of Computer Science, University of Waikato, New Zealand

• WEKA: A Machine Learning Toolkit
• The Explorer
  • Classification and Regression
  • Clustering
  • Association Rules
  • Attribute Selection
  • Data Visualization
• The Experimenter
• The Knowledge Flow GUI
• Conclusions

WEKA: the bird
Copyright: Martin Kramer ([email protected])

4/30/2017 University of Waikato

WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
  • Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  • Graphical user interfaces (incl. data visualization)
  • Environment for comparing learning algorithms

WEKA: versions
• There are several versions of WEKA:
  • WEKA 3.0: “book version” compatible with description in data mining book
  • WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)
  • WEKA 3.3: “development version” with lots of improvements
• This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)

WEKA only deals with “flat” files

    @relation heart-disease-simplified

    @attribute age numeric
    @attribute sex { female, male}
    @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
    @attribute cholesterol numeric
    @attribute exercise_induced_angina { no, yes}
    @attribute class { present, not_present}

    @data
    63,male,typ_angina,233,no,not_present
    67,male,asympt,286,yes,present
    67,male,asympt,229,yes,present
    38,female,non_anginal,?,no,not_present
    ...
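
To show what the “flat” file layout above means in practice, here is a minimal sketch of reading it in plain Python. It is not WEKA's own loader: the function name `parse_arff` is hypothetical, and the sketch assumes unquoted names, only numeric and nominal attributes, and dense comma-separated rows. The trailing `...` line from the slide is left out of the sample text.

```python
# Illustrative only: a tiny reader for the simplified ARFF sample on the slide.
ARFF_TEXT = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Parse a simple ARFF string into (relation, attributes, rows).

    attributes is a list of (name, kind) pairs, where kind is the string
    "numeric" or a list of allowed nominal values. Each row is a dict;
    a missing value ('?') becomes None.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row[name] = None          # missing value
                elif kind == "numeric":
                    row[name] = float(value)
                else:
                    row[name] = value         # nominal value kept as a string
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation, len(attributes), "attributes,", len(rows), "instances")
```

Every instance ends up as one fixed-length record with the same attributes, which is the sense in which the file is “flat”.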

Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  • Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …

Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  • Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• “Meta”-classifiers include:
  • Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
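
Bagging, the first meta-classifier named on the classifiers slide, trains several copies of a base learner on bootstrap resamples of the data and combines their predictions by voting. The sketch below illustrates that idea in plain Python; it is not WEKA code. The one-attribute decision stump used as the base learner, the toy data, and the names `train_stump` and `bagging` are all assumptions made for illustration.

```python
import random
from collections import Counter


def train_stump(rows):
    """Fit a threshold rule on (x, label) pairs and return a predict function."""
    overall = Counter(label for _, label in rows)
    xs = sorted({x for x, _ in rows})
    # Candidate thresholds: midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]
    best = None
    for t in candidates:
        left = Counter(label for x, label in rows if x <= t)
        right = Counter(label for x, label in rows if x > t)
        # Each side predicts its majority label (overall majority if empty).
        left_label = (left or overall).most_common(1)[0][0]
        right_label = (right or overall).most_common(1)[0][0]
        errors = sum(1 for x, label in rows
                     if (left_label if x <= t else right_label) != label)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label


def bagging(rows, n_models=25, seed=0):
    """Train n_models stumps on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(rows) examples with replacement.
        sample = [rng.choice(rows) for _ in rows]
        models.append(train_stump(sample))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Hypothetical toy data: one numeric attribute, two class labels.
DATA = [(1, "no"), (2, "no"), (3, "no"), (4, "no"),
        (6, "yes"), (7, "yes"), (8, "yes"), (9, "yes")]

predict = bagging(DATA)
print(predict(0), predict(10))
```

The meta-classifier only needs a way to train the base learner and a way to ask it for a prediction, so the stump could be swapped for any other learning scheme without changing `bagging`.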

Explorer: clustering data
• WEKA contains “clusterers” for finding groups of similar instances in a dataset
• Implemented schemes are:
  • k-Means, EM, Cobweb, X-means, FarthestFirst
• Clusters can be visualized and compared to “true” clusters (if given)
• Evaluation based on log-likelihood if clustering scheme produces a probability distribution

Explorer: finding associations
• WEKA contains an implementation of the Apriori algorithm for learning association rules
  • Works only with discrete data
• Can identify statistical dependencies between groups of attributes:
  • milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
• Apriori can compute all rules that have a given minimum support and exceed a given confidence

Explorer: attribute selection
• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
• Attribute selection methods contain two parts:
  • A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  • An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
• Very flexible: WEKA allows (almost) arbitrary combinations of these two

Explorer: data visualization
• Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
• WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  • To do: rotating 3-d visualizations (Xgobi-style)
• Color-coded class values
• “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
• “Zoom-in” function

Conclusion: try it yourself!
• WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
  • Also has a list of projects based on WEKA
• WEKA contributors:
  Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang
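
As a worked illustration of the association-rule slide, the sketch below computes the two quantities quoted there, support and confidence, for a rule with milk and butter on the left and bread and eggs on the right. It is plain Python, not WEKA's Apriori implementation. The transactions are invented toy data, the function names are hypothetical, and support is assumed to be an absolute count of matching instances, as the slide's figure of 2000 suggests.

```python
def support_count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent.

    support    = transactions containing antecedent and consequent together
    confidence = support / transactions containing the antecedent
    """
    both = support_count(transactions, set(antecedent) | set(consequent))
    ante = support_count(transactions, antecedent)
    confidence = both / ante if ante else 0.0
    return both, confidence


# Hypothetical market-basket data, invented for this example.
transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "bread"},
    {"butter", "eggs"},
]

support, confidence = rule_stats(
    transactions, {"milk", "butter"}, {"bread", "eggs"}
)
print(f"support={support}, confidence={confidence:.2f}")
```

Three of the five baskets contain both milk and butter, and two of those also contain bread and eggs, so the rule has support 2 and confidence 2/3. A rule miner in the style described on the slide would keep the rule only if both values cleared the chosen minimum thresholds.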