An Exercise in Machine Learning
http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/
Cornelia Caragea
Outline
• Machine Learning Software
• Preparing Data
• Building Classifiers
• Interpreting Results
Machine Learning Software
• Suites (general purpose)
  – WEKA (source: Java)
  – MLC++ (source: C++)
  – SAS
  – List from KDNuggets (various)
• Specific
  – Classification: C4.5, SVMlight
  – Association rule mining
  – Bayesian nets …
• Commercial vs. free
What does WEKA do?
• Implementations of state-of-the-art learning algorithms
• Main strength: classification
• Also regression, association rule, and clustering algorithms
• Extensible, so new learning schemes can be tried
• Large variety of handy tools (dataset transformation, filters, visualization, etc.)

WEKA resources
• API documentation, tutorials, source code
• WEKA mailing list
• Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
• Weka-related projects:
  – Weka-Parallel: parallel processing for Weka
  – RWeka: linking R and Weka
  – YALE: Yet Another Learning Environment
  – Many others…
Preparing Data
ARFF Data Format
• Header – describes the attribute types
• Data – the instances (examples), each a comma-separated list of values
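To make the two-part layout concrete, the minimal Java sketch below embeds a tiny ARFF file and splits it into the header (the `@attribute` declarations) and the data (the comma-separated rows after `@data`). The toy relation and its values are invented, and the parser is a deliberately simplified assumption: it ignores quoting, sparse rows, and missing values, all of which real ARFF files, and WEKA's own ARFF reader, handle.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified ARFF reader (illustrative only): collects attribute names from
// the header and the comma-separated instances after @data. No quoting,
// sparse-row, or missing-value support.
final class ArffSketch {
    final List<String> attributeNames = new ArrayList<>();
    final List<String[]> instances = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch arff = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) continue; // skip blanks and '%' comments
            String lower = line.toLowerCase();
            if (!inData && lower.startsWith("@attribute")) {
                // "@attribute outlook {sunny,overcast,rainy}" -> "outlook"
                String[] parts = line.split("\\s+");
                arff.attributeNames.add(parts[1]);
            } else if (lower.startsWith("@data")) {
                inData = true; // everything after this line is an instance
            } else if (inData) {
                arff.instances.add(line.split("\\s*,\\s*"));
            }
            // "@relation" lines are accepted but not stored
        }
        return arff;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "@relation weather-toy",
            "@attribute outlook {sunny,overcast,rainy}",
            "@attribute temperature numeric",
            "@attribute play {yes,no}",
            "@data",
            "sunny,85,no",
            "overcast,83,yes",
            "rainy,70,yes");
        ArffSketch arff = parse(sample);
        System.out.println(arff.attributeNames); // [outlook, temperature, play]
        System.out.println(arff.instances.size() + " instances"); // 3 instances
    }
}
```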

Launching WEKA

java -jar weka.jar
Load Dataset into WEKA
Data Filters
• Useful support for data preprocessing
• Removing or adding attributes, resampling the dataset, removing examples, etc.
• Creating stratified cross-validation folds of a dataset, so that class distributions are approximately retained within each fold
• Typically, data are split 2/3 for training and 1/3 for testing
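Stratification can be sketched in plain Java: group instance indices by class, shuffle within each class, then deal the indices round-robin across the k folds so each fold receives roughly the same class proportions. This is an illustrative assumption of one way to stratify, not WEKA's implementation; the labels are invented. With k = 3, holding out one fold gives the approximate 2/3 training, 1/3 testing split mentioned above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Assigns instance indices to k folds so that each class is spread
// as evenly as possible across the folds (stratification).
final class StratifiedFolds {
    static List<List<Integer>> assign(String[] labels, int k, long seed) {
        // Group instance indices by class label, preserving first-seen order.
        Map<String, List<Integer>> byClass = new LinkedHashMap<>();
        for (int i = 0; i < labels.length; i++) {
            byClass.computeIfAbsent(labels[i], c -> new ArrayList<>()).add(i);
        }
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        Random rng = new Random(seed);
        int next = 0; // keep dealing from where the previous class stopped
        for (List<Integer> indices : byClass.values()) {
            Collections.shuffle(indices, rng); // randomize order within the class
            for (int idx : indices) {
                folds.get(next).add(idx);
                next = (next + 1) % k;
            }
        }
        return folds;
    }

    public static void main(String[] args) {
        String[] labels = {"yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no"};
        List<List<Integer>> folds = assign(labels, 3, 42L);
        // Holding out one of three folds approximates a 2/3 train, 1/3 test split.
        List<Integer> test = folds.get(0);
        List<Integer> train = new ArrayList<>();
        for (int f = 1; f < folds.size(); f++) train.addAll(folds.get(f));
        System.out.println("train=" + train.size() + " test=" + test.size()); // train=6 test=3
    }
}
```

Here every fold ends up with two "yes" and one "no", matching the 2:1 class ratio of the whole dataset.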
Building Classifiers
A classifier model is a mapping from the dataset attributes to the class (target) attribute. Models differ in how they are created and in their form.
• Decision tree and Naïve Bayes classifiers
• Which one is the best?
  – No Free Lunch!
Building Classifiers
(1) weka.classifiers.rules.ZeroR
• Class for building and using a 0-R classifier
• Majority class classifier
• Predicts the mean (for a numeric class) or the mode (for a nominal class)
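ZeroR's rule fits in a few lines. The plain-Java sketch below (a hypothetical `ZeroRSketch` class, not WEKA's `weka.classifiers.rules.ZeroR`) ignores every input attribute and predicts the most frequent label for a nominal class or the training mean for a numeric class. Its accuracy serves as a baseline that any useful classifier should beat.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal 0-R baseline: ignores all attributes and predicts a constant.
final class ZeroRSketch {
    // Nominal class: return the mode (most frequent label).
    // Ties go to the label that reaches the top count first.
    static String mode(String[] classLabels) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String label : classLabels) {
            int c = counts.merge(label, 1, Integer::sum);
            if (c > bestCount) {
                best = label;
                bestCount = c;
            }
        }
        return best;
    }

    // Numeric class: return the mean of the training targets.
    static double mean(double[] targets) {
        double sum = 0.0;
        for (double t : targets) sum += t;
        return sum / targets.length;
    }

    public static void main(String[] args) {
        String[] play = {"yes", "no", "yes", "yes", "no"};
        System.out.println(mode(play));                            // yes
        System.out.println(mean(new double[] {10.0, 20.0, 30.0})); // 20.0
    }
}
```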
Exercise 1
http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/exercises/ex1.html
(2) weka.classifiers.bayes.NaiveBayes
• Class for building a Naive Bayes classifier
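Naive Bayes picks the class c that maximizes P(c) multiplied by the product of P(a_i | c) over the instance's attribute values, assuming the attributes are conditionally independent given the class. The sketch below is an assumption-laden illustration rather than WEKA's code: it handles nominal attributes only, uses add-one (Laplace) smoothing so an unseen value does not zero out a class, and sums log probabilities to avoid underflow. WEKA's NaiveBayes also handles numeric attributes (by default with a normal distribution), which this sketch omits.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Naive Bayes for nominal attributes with add-one (Laplace) smoothing.
final class NaiveBayesSketch {
    private final Map<String, Integer> classCounts = new HashMap<>();
    // Key "attributeIndex|value|class" -> count (assumes values contain no '|')
    private final Map<String, Integer> valueCounts = new HashMap<>();
    private final Map<Integer, Set<String>> distinctValues = new HashMap<>();
    private int total = 0;

    void train(String[][] rows, String[] labels) {
        for (int r = 0; r < rows.length; r++) {
            classCounts.merge(labels[r], 1, Integer::sum);
            total++;
            for (int a = 0; a < rows[r].length; a++) {
                valueCounts.merge(a + "|" + rows[r][a] + "|" + labels[r], 1, Integer::sum);
                distinctValues.computeIfAbsent(a, k -> new HashSet<>()).add(rows[r][a]);
            }
        }
    }

    String predict(String[] row) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        int numClasses = classCounts.size();
        for (Map.Entry<String, Integer> e : classCounts.entrySet()) {
            String c = e.getKey();
            int nc = e.getValue();
            // log P(c), smoothed
            double score = Math.log((nc + 1.0) / (total + numClasses));
            for (int a = 0; a < row.length; a++) {
                int count = valueCounts.getOrDefault(a + "|" + row[a] + "|" + c, 0);
                int v = distinctValues.getOrDefault(a, Set.of()).size();
                // log P(a_i = value | c) with add-one smoothing
                score += Math.log((count + 1.0) / (nc + v));
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[][] rows = {{"sunny", "hot"}, {"sunny", "hot"}, {"rainy", "cool"},
                           {"rainy", "cool"}, {"overcast", "cool"}};
        String[] labels = {"no", "no", "yes", "yes", "yes"};
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train(rows, labels);
        System.out.println(nb.predict(new String[] {"sunny", "hot"}));  // no
        System.out.println(nb.predict(new String[] {"rainy", "cool"})); // yes
    }
}
```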
(3) weka.classifiers.trees.J48
• Class for generating a pruned or unpruned C4.5 decision tree
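C4.5, and therefore J48, ranks candidate splits by the gain ratio: the information gain of splitting on an attribute divided by the split information (the entropy of the partition itself), which penalizes attributes with many values. The sketch below computes the gain ratio for one nominal attribute only. It is not a tree learner, and it omits numeric thresholds, missing values, and the other heuristics the real algorithm applies.

```java
import java.util.HashMap;
import java.util.Map;

// Gain ratio of splitting labelled instances on one nominal attribute.
final class GainRatioSketch {
    // Entropy (in bits) of a collection of counts summing to total.
    static double entropy(Iterable<Integer> counts, int total) {
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    static double gainRatio(String[] attributeValues, String[] labels) {
        int n = labels.length;
        Map<String, Integer> classCounts = new HashMap<>();
        Map<String, Integer> branchSizes = new HashMap<>();
        Map<String, Map<String, Integer>> branchClassCounts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            classCounts.merge(labels[i], 1, Integer::sum);
            branchSizes.merge(attributeValues[i], 1, Integer::sum);
            branchClassCounts.computeIfAbsent(attributeValues[i], v -> new HashMap<>())
                             .merge(labels[i], 1, Integer::sum);
        }
        double before = entropy(classCounts.values(), n);
        // Expected entropy after the split, weighted by branch size.
        double after = 0.0;
        for (Map.Entry<String, Integer> b : branchSizes.entrySet()) {
            int size = b.getValue();
            after += (double) size / n * entropy(branchClassCounts.get(b.getKey()).values(), size);
        }
        double gain = before - after;
        double splitInfo = entropy(branchSizes.values(), n); // entropy of the partition sizes
        return splitInfo == 0.0 ? 0.0 : gain / splitInfo;
    }

    public static void main(String[] args) {
        String[] outlook = {"sunny", "sunny", "overcast", "rainy", "rainy", "overcast"};
        String[] play    = {"no", "no", "yes", "yes", "no", "yes"};
        System.out.printf("gain ratio(outlook) = %.3f%n", gainRatio(outlook, play));
    }
}
```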
Test Options
• Percentage split (2/3 training; 1/3 testing)
• Cross-validation
  – Estimates the generalization error by resampling when data are limited; the per-fold error estimates are averaged.
  – Stratified
  – 10-fold
  – Leave-one-out (LOO)
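The cross-validation loop can be sketched generically: split the data into k folds, train on k − 1 of them, test on the held-out fold, and average the k error rates; setting k to the number of instances gives leave-one-out. To stay self-contained, the sketch below plugs in a majority-class (ZeroR-style) learner and uses unstratified, contiguous folds for brevity. The `Learner` interface and all names are hypothetical, not WEKA's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class CrossValidationSketch {
    // Hypothetical learner: fits on training indices, returns index -> predicted label.
    interface Learner {
        Function<Integer, String> fit(List<Integer> trainIndices, String[] labels);
    }

    // Majority-class learner, used only to keep the example self-contained.
    static final Learner MAJORITY = (train, labels) -> {
        Map<String, Integer> counts = new HashMap<>();
        for (int i : train) counts.merge(labels[i], 1, Integer::sum);
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        final String prediction = best;
        return idx -> prediction;
    };

    // k-fold cross-validation: returns the average of the k per-fold error rates.
    static double crossValidate(Learner learner, String[] labels, int k) {
        int n = labels.length;
        if (k < 2 || k > n) throw new IllegalArgumentException("need 2 <= k <= n");
        double errorSum = 0.0;
        for (int fold = 0; fold < k; fold++) {
            // Contiguous folds: instances [start, end) are held out for testing.
            int start = fold * n / k;
            int end = (fold + 1) * n / k;
            List<Integer> train = new ArrayList<>();
            for (int i = 0; i < n; i++) if (i < start || i >= end) train.add(i);
            Function<Integer, String> model = learner.fit(train, labels);
            int errors = 0;
            for (int i = start; i < end; i++) if (!model.apply(i).equals(labels[i])) errors++;
            errorSum += (double) errors / (end - start);
        }
        return errorSum / k;
    }

    public static void main(String[] args) {
        String[] labels = {"yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes"};
        System.out.println("5-fold error: " + crossValidate(MAJORITY, labels, 5));
        System.out.println("LOO error: " + crossValidate(MAJORITY, labels, labels.length));
    }
}
```

One known quirk shows up with this baseline: on a perfectly balanced two-class dataset, leave-one-out gives the majority learner a 100% error rate, because removing the test instance always makes the other class the majority.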

Understanding Output
Decision Tree Output (1)
Decision Tree Output (2)
Exercise 2
http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/exercises/ex2.html
Performance Measures
• Accuracy & error rate
• Confusion matrix (contingency table)
• True positive rate & false positive rate (ROC curve and the area under it)
• Precision, recall & F-measure
• Sensitivity & specificity
For more information on these, see uisp09-Evaluation.ppt.
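For a two-class problem, these measures all derive from the four cells of the confusion matrix: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The sketch below computes them from those counts, which are invented for the demo. Sensitivity is the same quantity as recall and the true positive rate, and specificity equals 1 minus the false positive rate. The area under the ROC curve needs ranked prediction scores rather than a single confusion matrix, so it is not computed here.

```java
// Binary-classification measures computed from confusion-matrix counts.
final class ConfusionMetrics {
    final int tp, fp, fn, tn;

    ConfusionMetrics(int tp, int fp, int fn, int tn) {
        this.tp = tp; this.fp = fp; this.fn = fn; this.tn = tn;
    }

    // Returns 0 when the denominator is 0 (a convention chosen for this sketch).
    private static double ratio(double num, double den) { return den == 0 ? 0.0 : num / den; }

    double accuracy()    { return ratio(tp + tn, tp + fp + fn + tn); }
    double errorRate()   { return 1.0 - accuracy(); }
    double precision()   { return ratio(tp, tp + fp); }
    double recall()      { return ratio(tp, tp + fn); } // = sensitivity = TP rate
    double fpRate()      { return ratio(fp, fp + tn); }
    double specificity() { return ratio(tn, tn + fp); } // = 1 - FP rate
    double fMeasure() {
        double p = precision(), r = recall();
        return ratio(2 * p * r, p + r); // harmonic mean of precision and recall
    }

    public static void main(String[] args) {
        ConfusionMetrics m = new ConfusionMetrics(40, 10, 5, 45);
        System.out.printf("accuracy=%.3f precision=%.3f recall=%.3f F=%.3f specificity=%.3f%n",
                m.accuracy(), m.precision(), m.recall(), m.fMeasure(), m.specificity());
    }
}
```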
Decision Tree Pruning
• Overcoming over-fitting
  – Pre-pruning and post-pruning
  – Reduced-error pruning
  – Subtree raising with different confidence factors
  – Comparing tree size and accuracy
Subtree Replacement
• Bottom-up: a subtree is considered for replacement only after all of its own subtrees have been considered
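Bottom-up subtree replacement can be sketched as a post-order traversal: prune the children first, then compare the estimated errors of the node collapsed into a single leaf against the summed estimated errors of its (already pruned) children, and collapse the node if the leaf is no worse. For the error estimate, this sketch assumes the pessimistic upper confidence bound described in Data Mining: Practical Machine Learning Tools and Techniques, with z ≈ 0.69 for a 25% confidence level (J48's default confidence factor is 0.25). J48's actual computation differs in detail, so treat this as an illustration, not J48's code. The `Node` class is hypothetical and stores only the training class counts that reach each node; the counts in the demo are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PruningSketch {
    static final double Z = 0.69; // z for a 25% confidence level (normal approximation)

    // Hypothetical tree node: class counts of the training instances reaching it.
    static final class Node {
        final int[] classCounts;
        final List<Node> children = new ArrayList<>();
        Node(int... classCounts) { this.classCounts = classCounts; }
        boolean isLeaf() { return children.isEmpty(); }
        int n() { return Arrays.stream(classCounts).sum(); }
        // Errors made if this node predicted its majority class.
        int leafErrors() { return n() - Arrays.stream(classCounts).max().orElse(0); }
    }

    // Pessimistic (upper-bound) error rate for `errors` mistakes out of n instances.
    static double pessimisticRate(int errors, int n) {
        if (n == 0) return 0.0;
        double f = (double) errors / n; // observed error rate
        double z2 = Z * Z;
        double num = f + z2 / (2.0 * n)
                + Z * Math.sqrt(f / n - f * f / n + z2 / (4.0 * n * n));
        return num / (1.0 + z2 / n);
    }

    // Estimated number of errors of the (possibly pruned) subtree rooted at node.
    static double estimatedErrors(Node node) {
        if (node.isLeaf()) return node.n() * pessimisticRate(node.leafErrors(), node.n());
        double sum = 0.0;
        for (Node child : node.children) sum += estimatedErrors(child);
        return sum;
    }

    // Bottom-up subtree replacement: children are pruned before their parent is considered.
    static void prune(Node node) {
        if (node.isLeaf()) return;
        for (Node child : node.children) prune(child);
        double asLeaf = node.n() * pessimisticRate(node.leafErrors(), node.n());
        if (asLeaf <= estimatedErrors(node)) node.children.clear(); // replace subtree by a leaf
    }

    public static void main(String[] args) {
        Node root = new Node(5, 9);
        root.children.add(new Node(2, 4));
        root.children.add(new Node(1, 1));
        root.children.add(new Node(2, 4));
        System.out.printf("subtree=%.2f leaf=%.2f%n",
                estimatedErrors(root), root.n() * pessimisticRate(root.leafErrors(), root.n()));
        prune(root);
        System.out.println("pruned to a leaf: " + root.isLeaf()); // true
    }
}
```

Because the estimate adds a penalty that shrinks as N grows, small, impure leaves look worse than their raw error counts suggest, which is what makes collapsing the split above attractive.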
Subtree Raising
• Deletes a node and redistributes its instances
• Slower than subtree replacement
Exercise 3
http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/exercises/ex3.html