11.11.2014
http://kt.ijs.si/petra_kralj/dmkd.html
Data Mining Tools
Hands-on Weka
2014/11/11
• Weka: http://www.cs.waikato.ac.nz/ml/weka/
• Orange: http://orange.biolab.si/
• Knime: http://www.knime.org/
• Taverna: http://www.taverna.org.uk/
• Rapid Miner: http://rapid-i.com/content/view/181/196/
• ClowdFlows: http://clowdflows.org/
Petra Kralj Novak
[email protected]
Weka (Waikato Environment for Knowledge Analysis)
• Collection of machine learning algorithms for data mining tasks
• The algorithms
– Can be applied directly to a dataset
– Can be called from Java code (library)
• Weka contains tools for
– Data pre-processing
– Classification
– Regression
– Clustering
– Association rules
– Visualization
• Weka is open source software issued under the GNU General Public License
Weka: Install
http://www.cs.waikato.ac.nz/ml/weka/
Download version 3.6
Exercise 1: ID3 in Weka
1. Build a decision tree with the ID3 algorithm on the lenses dataset and evaluate it on a separate test set
Weka: Run Explorer
Choose Explorer
Exercise 1: ID3 in Weka
• In the Weka data mining tool, induce a decision tree for the lenses dataset with the ID3 algorithm.
• Data:
– lensesTrain.arff
– lensesTest.arff
• Compare the outcome with the manually obtained results.
Load the data
lensesTrain.arff
Load the data - 2
The data are loaded
Choose “Classify”
Target variable
Choose algorithm
trees
Id3
Decision tree
lensesTest.arff
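For the comparison with manually obtained results, the attribute-selection step of ID3 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Weka's Id3 code: the function names are hypothetical, and the four rows below are made-up toy data, not the contents of lensesTrain.arff.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    before = entropy([row[target] for row in rows])
    after = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after


def best_attribute(rows, attributes, target):
    """ID3 splits each node on the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Toy data (hypothetical, NOT the real lenses dataset).
toy_rows = [
    {"tear_rate": "reduced", "astigmatic": "no", "lenses": "none"},
    {"tear_rate": "reduced", "astigmatic": "yes", "lenses": "none"},
    {"tear_rate": "normal", "astigmatic": "no", "lenses": "soft"},
    {"tear_rate": "normal", "astigmatic": "yes", "lenses": "hard"},
]
chosen = best_attribute(toy_rows, ["tear_rate", "astigmatic"], "lenses")
print(chosen)
```

In this toy data, splitting on tear_rate leaves one pure branch, so it has the higher gain and becomes the root. Repeating the same choice on each branch yields the full tree.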
Classification accuracy
Confusion matrix
Exercise 2: CAR dataset
• 1728 examples
• 6 attributes
– 6 nominal
– 0 numeric
• Nominal target variable
– 4 classes: unacc, acc, good, v-good
– Distribution of classes: unacc (70%), acc (22%), good (4%), v-good (4%)
• No missing values
Preparing the data for WEKA - 1
Data in a spreadsheet (e.g. MS Excel)
– Rows are examples
– Columns are attributes
– The last column is the target variable
Preparing the data for WEKA - 2
Save as “.csv”
– Careful with dots “.”, commas “,” and semicolons “;”!
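The warning about separators matters because spreadsheets in some locales export “;”-separated files with decimal commas. The sketch below shows one way to normalize such an export before loading it. It assumes the loader expects commas as field separators and dots as decimal marks; the function name and the sample rows are hypothetical.

```python
import csv
import io
import re

# A field such as "1,5" or "-0,75": digits, one comma, digits.
DECIMAL_COMMA = re.compile(r"^-?\d+,\d+$")


def normalize_csv(text):
    """Rewrite CSV text to use "," as field separator and "." as decimal mark."""
    # Guess the separator from the header row: ";" if it occurs more often than ",".
    header = text.splitlines()[0]
    delimiter = ";" if header.count(";") > header.count(",") else ","
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=",", lineterminator="\n")
    for row in reader:
        # Convert decimal commas to dots; leave all other values unchanged.
        writer.writerow(
            [v.replace(",", ".") if DECIMAL_COMMA.match(v) else v for v in row]
        )
    return out.getvalue()


# Hypothetical sample, as a spreadsheet with a ";" locale might export it.
sample = "price;doors;class\n1,5;2;unacc\n0,75;4;acc\n"
print(normalize_csv(sample))
```

A file that already uses commas passes through unchanged, so the function is safe to run on every export.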
Load the data
Car.csv
Choose algorithm J48
Target variable
Building and evaluating the tree
Classification accuracy
Confusion matrix: columns show “classified as”, rows show actual values
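How classification accuracy follows from the table of actual versus “classified as” values can be sketched as below. This is a minimal illustration with hypothetical function names and made-up labels, not output from the CAR experiment.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted ("classified as") classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def accuracy(matrix):
    """Share of examples on the diagonal, i.e. classified correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up labels for illustration only.
classes = ["unacc", "acc"]
actual = ["unacc", "unacc", "unacc", "acc", "acc"]
predicted = ["unacc", "unacc", "acc", "acc", "unacc"]

matrix = confusion_matrix(actual, predicted, classes)
print(matrix)
print(accuracy(matrix))
```

The off-diagonal cells show which classes get confused with each other, which the single accuracy number hides.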
Tree pruning
Parameters of the algorithm (right mouse click)
Set the minimal number of objects per leaf to 15
Reduced number of leaves and nodes
Easier to interpret
Lower classification accuracy
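Why a larger minimum leaf size shrinks the tree can be illustrated with a small check. The rule below is a C4.5-style rule that is assumed here, not taken from the slides: a candidate split is kept only if at least two of its branches hold the minimum number of examples. The function name and the branch sizes are hypothetical.

```python
def split_allowed(branch_sizes, min_objects):
    """Return True if at least two branches contain min_objects or more examples."""
    large_enough = sum(1 for size in branch_sizes if size >= min_objects)
    return large_enough >= 2


# Hypothetical candidate split producing branches with 20, 9 and 4 examples.
candidate = [20, 9, 4]
print(split_allowed(candidate, 2))   # small minimum: the split is kept
print(split_allowed(candidate, 15))  # minimum of 15: the node stays a leaf
```

With the higher threshold, more candidate splits are rejected, so the tree has fewer nodes and leaves. The price can be lower accuracy.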
Naïve Bayes classifier
Summary
• Weka
• ID3, separate test set
• Data preparation
• J48 (C4.5), cross validation, tree pruning
• Naïve Bayes