Download Document

Document related concepts
no text concepts found
Transcript

Machine Learning with
WEKA

WEKA: A Machine
Learning Toolkit
The Explorer
•
•
Eibe Frank
•
•
Department of Computer Science,
University of Waikato, New Zealand
•



Classification and
Regression
Clustering
Association Rules
Attribute Selection
Data Visualization
The Experimenter
The Knowledge
Flow GUI
Conclusions
WEKA: the bird
Copyright: Martin Kramer ([email protected])
5/22/2017
University of Waikato
2
WEKA: the software
•
•
•
Machine learning/data mining software written in
Java (distributed under the GNU Public License)
Complements “Data Mining” by Witten & Frank
Main features:
•
•
•
5/22/2017
Comprehensive set of data pre-processing tools,
learning algorithms and evaluation methods
Graphical user interfaces (incl. data visualization)
Environment for comparing learning algorithms
University of Waikato
3
WEKA only deals with “flat” files
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
5/22/2017
University of Waikato
4
WEKA only deals with “flat” files
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
5/22/2017
University of Waikato
5
5/22/2017
University of Waikato
6
Explorer: pre-processing the data
•
•
•
•
Data can be imported from a file in various
formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL
database (using JDBC)
Pre-processing tools in WEKA are called “filters”
WEKA contains filters for:
•
5/22/2017
Discretization, normalization, resampling, attribute
selection, transforming and combining attributes, …
University of Waikato
7
5/22/2017
University of Waikato
8
5/22/2017
University of Waikato
9
5/22/2017
University of Waikato
10
5/22/2017
University of Waikato
11
5/22/2017
University of Waikato
12
5/22/2017
University of Waikato
13
5/22/2017
University of Waikato
14
5/22/2017
University of Waikato
15
5/22/2017
University of Waikato
16
5/22/2017
University of Waikato
17
5/22/2017
University of Waikato
18
5/22/2017
University of Waikato
19
5/22/2017
University of Waikato
20
5/22/2017
University of Waikato
21
5/22/2017
University of Waikato
22
5/22/2017
University of Waikato
23
5/22/2017
University of Waikato
24
5/22/2017
University of Waikato
25
5/22/2017
University of Waikato
26
5/22/2017
University of Waikato
27
5/22/2017
University of Waikato
28
Explorer: building “classifiers”
•
•
Classifiers in WEKA are models for predicting
nominal or numeric quantities
Implemented learning schemes include:
•
•
Decision trees and lists, instance-based classifiers,
support vector machines, multi-layer perceptrons,
logistic regression, Bayes’ nets, …
“Meta”-classifiers include:
•
5/22/2017
Bagging, boosting, stacking, error-correcting output
codes, locally weighted learning, …
University of Waikato
29
5/22/2017
University of Waikato
30
5/22/2017
University of Waikato
31
5/22/2017
University of Waikato
32
5/22/2017
University of Waikato
33
5/22/2017
University of Waikato
34
5/22/2017
University of Waikato
35
5/22/2017
University of Waikato
36
5/22/2017
University of Waikato
37
5/22/2017
University of Waikato
38
5/22/2017
University of Waikato
39
5/22/2017
University of Waikato
40
5/22/2017
University of Waikato
41
5/22/2017
University of Waikato
42
5/22/2017
University of Waikato
43
5/22/2017
University of Waikato
44
5/22/2017
University of Waikato
45
5/22/2017
University of Waikato
46
5/22/2017
University of Waikato
47
5/22/2017
University of Waikato
48
5/22/2017
University of Waikato
49
5/22/2017
University of Waikato
50
5/22/2017
University of Waikato
51
5/22/2017
University of Waikato
52
Explorer: clustering data
•
•
WEKA contains “clusterers” for finding groups of
similar instances in a dataset
Implemented schemes are:
•
•
•
k-Means, EM, Cobweb, X-means, FarthestFirst
Clusters can be visualized and compared to “true”
clusters (if given)
Evaluation based on loglikelihood if clustering
scheme produces a probability distribution
5/22/2017
University of Waikato
53
5/22/2017
University of Waikato
54
5/22/2017
University of Waikato
55
5/22/2017
University of Waikato
56
5/22/2017
University of Waikato
57
5/22/2017
University of Waikato
58
5/22/2017
University of Waikato
59
5/22/2017
University of Waikato
60
5/22/2017
University of Waikato
61
5/22/2017
University of Waikato
62
5/22/2017
University of Waikato
63
5/22/2017
University of Waikato
64
5/22/2017
University of Waikato
65
5/22/2017
University of Waikato
66
5/22/2017
University of Waikato
67
5/22/2017
University of Waikato
68
Explorer: finding associations
•
WEKA contains an implementation of the Apriori
algorithm for learning association rules
•
•
Can identify statistical dependencies between
groups of attributes:
•
•
Works only with discrete data
milk, butter  bread, eggs (with confidence 0.9 and
support 2000)
Apriori can compute all rules that have a given
minimum support and exceed a given confidence
5/22/2017
University of Waikato
69
5/22/2017
University of Waikato
70
5/22/2017
University of Waikato
71
5/22/2017
University of Waikato
72
5/22/2017
University of Waikato
73
5/22/2017
University of Waikato
74
5/22/2017
University of Waikato
75
5/22/2017
University of Waikato
76
Conclusion: try it yourself!
WEKA is available at
http://www.cs.waikato.ac.nz/ml/weka
• Also has a list of projects based on WEKA
•
5/22/2017
University of Waikato
77
Related documents