Machine Learning with WEKA

Eibe Frank
Department of Computer Science,
University of Waikato, New Zealand

• WEKA: A Machine Learning Toolkit
• The Explorer
  - Classification and Regression
  - Clustering
  - Association Rules
  - Attribute Selection
  - Data Visualization
• The Experimenter
• The Knowledge Flow GUI
• Conclusions
WEKA: the bird
Copyright: Martin Kramer ([email protected])
1/10/2008
Machine Learning for Data Mining
University of Waikato
WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
  - Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  - Graphical user interfaces (incl. data visualization)
  - Environment for comparing learning algorithms
WEKA: versions
• There are several versions of WEKA:
  - WEKA 3.0: “book version” compatible with description in data mining book
  - WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)
  - WEKA 3.3: “development version” with lots of improvements
• This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)
WEKA only deals with “flat” files
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
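The layout of the flat file above can be read with very little code. What follows is a minimal Python sketch, not WEKA's own ARFF loader: the function name parse_arff is hypothetical, and it assumes only the subset of the format shown on the slide (@relation, @attribute with numeric or nominal types, @data, and "?" for a missing value).

```python
def parse_arff(text):
    """Parse a small subset of ARFF. Returns (relation, attributes, rows).

    attributes is a list of (name, kind), where kind is the string "numeric"
    or a list of nominal labels. A "?" in the data becomes None.
    """
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            row = []
            for (name, kind), value in zip(attributes, line.split(",")):
                value = value.strip()
                if value == "?":
                    row.append(None)          # missing value
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)         # nominal label kept as a string
            rows.append(row)
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            name, _, spec = rest.strip().partition(" ")
            spec = spec.strip()
            if spec.startswith("{"):
                labels = [s.strip() for s in spec.strip("{}").split(",")]
                attributes.append((name, labels))
            else:
                attributes.append((name, spec.lower()))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


EXAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(EXAMPLE)
```

Each row comes back as a plain list in attribute order, so the missing cholesterol value of the fourth instance is simply None.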
Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  - Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
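To make two of the listed filter types concrete, here is a rough Python sketch of normalization and equal-width discretization on a single numeric attribute. It is an illustration only, not WEKA's filter code: the function names are hypothetical, and equal-width binning is an assumed choice among several possible discretization strategies.

```python
def normalize(values):
    """Rescale numeric values linearly to [0, 1]; None (missing) is kept."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    return [
        None if v is None else (0.0 if span == 0 else (v - lo) / span)
        for v in values
    ]


def discretize_equal_width(values, bins):
    """Map each numeric value to a bin index 0..bins-1 of equal width."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    result = []
    for v in values:
        if v is None:
            result.append(None)   # missing values stay missing
        elif width == 0:
            result.append(0)      # all values identical: a single bin
        else:
            # The maximum value would land in bin index `bins`, so clamp it.
            result.append(min(int((v - lo) / width), bins - 1))
    return result


# Cholesterol column of the heart-disease example, including a missing value.
cholesterol = [233, 286, 229, None]
scaled = normalize(cholesterol)
binned = discretize_equal_width(cholesterol, 3)
```

Both functions leave missing values untouched, so they can be chained in either order.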
Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  - Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• “Meta”-classifiers include:
  - Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
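As one small illustration of what an instance-based classifier does, the Python sketch below predicts a nominal class with a 1-nearest-neighbour rule. It is not taken from WEKA: the function name is hypothetical, Euclidean distance is an assumed choice, and the training pairs reuse the age and cholesterol values of the heart-disease example.

```python
import math


def nearest_neighbour_predict(train, query):
    """train: list of (feature_vector, label). Returns the label of the
    training instance closest to `query` in Euclidean distance."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    _, label = min(train, key=lambda pair: distance(pair[0], query))
    return label


# (age, cholesterol) -> class, from the three complete example instances.
train = [
    ((63.0, 233.0), "not_present"),
    ((67.0, 286.0), "present"),
    ((67.0, 229.0), "present"),
]
prediction = nearest_neighbour_predict(train, (65.0, 280.0))
```

In practice the attributes would be normalized first, since otherwise the attribute with the largest numeric range dominates the distance.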
Explorer: clustering data
• WEKA contains “clusterers” for finding groups of similar instances in a dataset
• Implemented schemes are:
  - k-Means, EM, Cobweb, X-means, FarthestFirst
• Clusters can be visualized and compared to “true” clusters (if given)
• Evaluation based on log-likelihood if clustering scheme produces a probability distribution
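For readers unfamiliar with the first scheme in the list, the following is a minimal Python sketch of the standard k-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is a textbook version under stated assumptions, not WEKA's clusterer: the function name, the fixed iteration count, and the explicit starting centroids are all choices made for the example.

```python
def k_means(points, centroids, iterations=10):
    """Run k-means on tuples of floats, starting from the given centroids.
    Returns (centroids, assignment), where assignment[i] is the cluster
    index of points[i]."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignment = [
            min(
                range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
            )
            for point in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, assignment = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
```

When “true” cluster labels are available, the returned assignment is what would be compared against them.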
Explorer: finding associations
• WEKA contains an implementation of the Apriori algorithm for learning association rules
  - Works only with discrete data
• Can identify statistical dependencies between groups of attributes:
  - milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
• Apriori can compute all rules that have a given minimum support and exceed a given confidence
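Support and confidence are easiest to see on a toy example. The Python sketch below does a level-wise, Apriori-style search for frequent itemsets and then derives rules from them. It is a simplified illustration, not WEKA's Apriori class: the function names and the five transactions are invented for the example, support is an absolute count, and the confidence test here is "at least" the threshold.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset (frozenset): support count} for every itemset that
    occurs in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and keep only those
        # whose k-subsets are all frequent (the Apriori pruning step).
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(sub) in level for sub in combinations(union, len(a))
            ):
                candidates.add(union)
        current = list(candidates)
    return frequent


def rules(frequent, min_confidence):
    """Return a list of (antecedent, consequent, support, confidence)."""
    out = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                # confidence = support(whole itemset) / support(antecedent)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    out.append((antecedent, itemset - antecedent, support, confidence))
    return out


# Invented market-basket data for illustration.
baskets = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread"},
]
freq = frequent_itemsets(baskets, min_support=2)
found = rules(freq, min_confidence=0.6)
```

On this data the rule milk, butter ⇒ bread, eggs has support 2 (two baskets contain all four items) and confidence 2/3 (three baskets contain milk and butter).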
Explorer: attribute selection
• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
• Attribute selection methods contain two parts:
  - A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  - An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
• Very flexible: WEKA allows (almost) arbitrary combinations of these two
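One of the simplest combinations is ranking as the search method with information gain as the evaluation method. The Python sketch below shows that pairing for nominal attributes only. It is not WEKA's attribute-selection code: the function names are hypothetical, and the rows are the nominal columns (sex, exercise_induced_angina, class) of the four heart-disease instances.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute_index, class_index=-1):
    """Class entropy minus the weighted class entropy after splitting the
    rows on the values of one nominal attribute."""
    labels = [row[class_index] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute_index], []).append(row[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, attribute_indices):
    """Ranking 'search': sort attributes by decreasing information gain."""
    return sorted(attribute_indices, key=lambda i: information_gain(rows, i), reverse=True)


rows = [
    ("male", "no", "not_present"),
    ("male", "yes", "present"),
    ("male", "yes", "present"),
    ("female", "no", "not_present"),
]
ranking = rank_attributes(rows, [0, 1])
```

Swapping in a different scoring function in place of information_gain changes the evaluation method without touching the ranking step, which mirrors the two-part design described above.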
Explorer: data visualization
• Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
• WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  - To do: rotating 3-d visualizations (Xgobi-style)
• Color-coded class values
• “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
• “Zoom-in” function
Performing experiments
• Experimenter makes it easy to compare the performance of different learning schemes
• For classification and regression problems
• Results can be written into file or database
• Evaluation options: cross-validation, learning curve, hold-out
• Can also iterate over different parameter settings
• Significance-testing built in!
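The cross-validation option can be sketched in a few lines of Python. This is a generic k-fold procedure, not the Experimenter's implementation: the function names, the seeded shuffle, and the majority-class baseline learner are assumptions made for the example, and it expects k to be no larger than the number of instances.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most one."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(data, k, train_fn, predict_fn, seed=0):
    """data: list of (features, label). Each fold is held out once while the
    model is trained on the rest. Returns the mean accuracy over the k folds."""
    accuracies = []
    for held_out in k_fold_indices(len(data), k, seed):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        test = [data[i] for i in held_out]
        model = train_fn(train)
        correct = sum(1 for x, y in test if predict_fn(model, x) == y)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k


def train_majority(train):
    """Baseline learner: remember the most frequent class label."""
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)


def predict_majority(model, features):
    """Baseline prediction: always answer with the remembered label."""
    return model


# Invented data: eight instances of class "a" and two of class "b".
data = [((i,), "a") for i in range(8)] + [((i,), "b") for i in range(8, 10)]
accuracy = cross_validate(data, 5, train_majority, predict_majority)
```

Comparing two learning schemes then amounts to calling cross_validate with the same seed and different train_fn/predict_fn pairs, so that both see identical folds.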
The Knowledge Flow GUI
• New graphical user interface for WEKA
• Java-Beans-based interface for setting up and running machine learning experiments
• Data sources, classifiers, etc. are beans and can be connected graphically
• Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
• Layouts can be saved and loaded again later
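The idea of data flowing through connected components can be imitated with chained Python generators. This is only an analogy for the “data source” -> “filter” -> “classifier” -> “evaluator” chain, not the Knowledge Flow's Java Beans API: every function name, the scaling factor, and the threshold rule are hypothetical stand-ins.

```python
def data_source():
    """Hypothetical source: yields ((age, cholesterol), class) instances."""
    yield from [
        ((63.0, 233.0), "not_present"),
        ((67.0, 286.0), "present"),
        ((67.0, 229.0), "present"),
    ]


def scale_filter(instances, factor):
    """Filter component: multiplies every feature by a constant factor."""
    for features, label in instances:
        yield tuple(v * factor for v in features), label


def threshold_classifier(instances, threshold):
    """Stand-in classifier: predicts 'present' when the first feature
    exceeds the threshold. Yields (predicted, actual) pairs."""
    for features, label in instances:
        predicted = "present" if features[0] > threshold else "not_present"
        yield predicted, label


def evaluator(predictions):
    """Evaluator component: fraction of predictions that match the truth."""
    pairs = list(predictions)
    return sum(1 for predicted, actual in pairs if predicted == actual) / len(pairs)


# Connect the components; instances flow through one at a time.
accuracy = evaluator(threshold_classifier(scale_filter(data_source(), 0.01), 0.65))
```

Because each stage only consumes the output of the previous one, a component can be swapped out without changing the others, which is the property the graphical layout relies on.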
Conclusion: try it yourself!
• WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
  - Also has a list of projects based on WEKA
• WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang