Weka
Data Mining Software
Kai Adam
Proseminar 'Methods and Tools'
Rheinisch-Westfälische Technische Hochschule Aachen
Chair of Data Management and Data Exploration
Prof. Dr. T. Seidl
July 06, 2012
Contents
• What's Weka?
• Basics
  • ARFF Data Set Format
  • Classification
• Essential Application Types
  • Explorer
    • Preprocessing
    • Classification
    • Cluster, Associate, Select Attributes, Visualization
  • Experimenter, Knowledge Flow, Simple CLI
• Review
• Related Works
• Summary
• Bibliography
What's Weka?
• A workbench developed by Ian H. Witten, Mark A. Hall and Eibe Frank
• A workbench is a collection of different data mining tools
• Works with several data set formats, each holding a single relation
• Allows data sets to be processed and analysed with different algorithms for preprocessing, classification, clustering and visualization
• Offers a simple GUI with four different application types
ARFF Data Set Format
• ARFF stands for 'Attribute-Relation File Format'
• Stores data in terms of a single relation
• Attributes can take different types of values, e.g. nominal, numeric, string or date
• Consists of a header that declares the relation and its attributes, and a data section with the instances of the relation
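To illustrate the format, the sketch below embeds a tiny ARFF file and reads it with plain Python. The 'weather' relation and its values are made up for this example. It is a minimal sketch that assumes only nominal and numeric attributes, comma-separated data rows, `%` comments and `?` for missing values. It is not Weka's ARFF loader, and it ignores quoting, sparse rows and string, date or relational attributes.

```python
# Minimal ARFF reader sketch (assumption: nominal and numeric attributes only).
# The relation below is hypothetical example data.
ARFF_TEXT = """\
% A tiny example relation
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny, 85, no
overcast, 83, yes
rainy, ?, yes
"""

NUMERIC_TYPES = ("numeric", "real", "integer")


def parse_arff(text):
    """Return (relation name, [(attribute, type or value list)], [rows])."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # Nominal attribute: the list of allowed values.
                values = [v.strip() for v in spec.strip("{}").split(",")]
                attributes.append((name, values))
            else:
                attributes.append((name, spec.lower()))
        elif keyword == "@data":
            in_data = True
        elif in_data:
            row = []
            for (name, kind), cell in zip(attributes, line.split(",")):
                cell = cell.strip()
                if cell == "?":
                    row.append(None)  # '?' marks a missing value
                elif kind in NUMERIC_TYPES:
                    row.append(float(cell))
                elif isinstance(kind, list) and cell not in kind:
                    raise ValueError(f"{cell!r} is not a declared value of {name}")
                else:
                    row.append(cell)
            rows.append(row)
    return relation, attributes, rows
```

Calling `parse_arff(ARFF_TEXT)` returns the relation name `weather`, the three attribute declarations and three rows. The last row is `['rainy', None, 'yes']`.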
Classification
• Is used to find a classifier (model/rule set) that is able to predict the class attribute of instances in an unknown data set
Classifier
• Bases its prediction on the values of the other attributes
• Should predict the class value as accurately as possible
• Works with training and test data sets that are obtained from different measurements and observations
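The train/test idea above can be illustrated with one of the simplest classifiers, the 1R ("one rule") scheme, which Weka provides as `OneR`. It picks the single attribute whose value-to-majority-class rule makes the fewest errors on the training set. The sketch below is a simplified plain-Python version on hypothetical nominal data. Unlike Weka's `OneR`, it does not handle numeric attributes.

```python
from collections import Counter, defaultdict

# Hypothetical nominal training and test data: (attributes, class).
TRAIN = [
    ({"outlook": "sunny", "windy": "false"}, "no"),
    ({"outlook": "sunny", "windy": "true"}, "no"),
    ({"outlook": "overcast", "windy": "false"}, "yes"),
    ({"outlook": "overcast", "windy": "true"}, "yes"),
    ({"outlook": "rainy", "windy": "false"}, "yes"),
    ({"outlook": "rainy", "windy": "false"}, "yes"),
    ({"outlook": "rainy", "windy": "true"}, "no"),
]
TEST = [
    ({"outlook": "sunny", "windy": "false"}, "no"),
    ({"outlook": "overcast", "windy": "true"}, "yes"),
    ({"outlook": "rainy", "windy": "false"}, "yes"),
    ({"outlook": "rainy", "windy": "true"}, "no"),
]


def train_one_r(data):
    """Return (attribute, value->class rule, default class) with the fewest training errors."""
    best = None
    for attr in data[0][0]:
        counts = defaultdict(Counter)
        for features, label in data:
            counts[features[attr]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    default = Counter(label for _, label in data).most_common(1)[0][0]
    return best[0], best[1], default


def predict(model, features):
    attr, rule, default = model
    return rule.get(features[attr], default)  # unseen value: overall majority class


def accuracy(model, data):
    """Fraction of correctly classified instances."""
    return sum(predict(model, f) == label for f, label in data) / len(data)
```

On the four test instances, the rule based on `outlook` classifies three correctly (accuracy 0.75). The rainy, windy day is misclassified because the rule never looks at `windy`.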
Explorer: Preprocessing
• Preprocessing is a generic term for all methods that process raw data
• Raw data has to be of good quality to obtain task-relevant data
• Most data sets are of unusable quality because they are incomplete, noisy or inconsistent
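One common way to deal with incomplete data is to fill each missing value with the mean (numeric attributes) or the mode (nominal attributes) of that attribute. Weka's `ReplaceMissingValues` filter follows this idea. The sketch below is an independent plain-Python illustration on hypothetical rows, with `None` marking a missing value. It is not Weka's code.

```python
from collections import Counter
from statistics import mean


def replace_missing(rows):
    """Return new rows with None filled column-wise: mean if numeric, else mode."""
    fills = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        if present and all(isinstance(v, (int, float)) for v in present):
            fills.append(mean(present))  # numeric attribute: mean
        elif present:
            fills.append(Counter(present).most_common(1)[0][0])  # nominal: mode
        else:
            fills.append(None)  # column entirely missing: nothing to estimate
    return [[fills[i] if v is None else v for i, v in enumerate(row)]
            for row in rows]


# Hypothetical rows: (outlook, temperature)
ROWS = [["sunny", 85.0], ["rainy", None], [None, 65.0], ["rainy", 75.0]]
```

Here the missing temperature becomes 75.0 (the mean of 85, 65 and 75), and the missing outlook becomes `rainy` (the most frequent value). The input rows are left unchanged.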
Explorer: Classification
• Weka can classify with many different classifiers and several test options
Classifier groups
• Bayes
• Functions
• Trees
• ...
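One of the test options is k-fold cross-validation, which is the Explorer's default with 10 folds. The data is split into k parts; the model is trained on k-1 of them and tested on the remaining one, k times over. The sketch below shows only these mechanics, using a trivial majority-class learner (the idea behind Weka's `ZeroR`). The labels and helper names are hypothetical, and the folds are taken in order without shuffling.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def zero_r(train_labels):
    """Majority-class 'model': always predicts the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]


def cross_validate(labels, k):
    """Fraction of correctly classified instances over all k test folds."""
    correct = 0
    for test_idx in k_fold_indices(len(labels), k):
        test_set = set(test_idx)
        train = [lab for i, lab in enumerate(labels) if i not in test_set]
        prediction = zero_r(train)
        correct += sum(labels[i] == prediction for i in test_idx)
    return correct / len(labels)


LABELS = ["yes"] * 7 + ["no"] * 3
```

With `cross_validate(LABELS, 5)` the result is 0.7. Because the folds are neither shuffled nor stratified, the last fold contains only `no` instances, and the majority-class model gets all of them wrong. Weka randomizes the data and, for a nominal class, stratifies the folds to reduce this kind of bias.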
Listing 1: C4.5 pseudo code as described by Quinlan
1. Check for any base cases.
2. For each attribute a:
   1. Find the normalized information gain from splitting on a.
3. Let a_best be the attribute with the highest normalized information gain.
4. Create a decision node that splits on a_best.
5. Recurse on the sublists obtained by splitting on a_best, and add those nodes as children of node.
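The "normalized information gain" in step 2 is Quinlan's gain ratio: the information gain of a split divided by its split information, which is the entropy of the partition sizes. Weka's J48 is an implementation of C4.5. The sketch below computes the gain ratio for nominal attributes and follows steps 1 to 5 recursively. It is a simplified illustration on hypothetical data and omits C4.5's handling of numeric attributes, missing values and pruning.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain of splitting on attr divided by the split information."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    gain = entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, attrs):
    """Leaves are class labels; inner nodes are {'split': attr, 'children': {...}}."""
    # Step 1: base cases (pure node or no attributes left).
    if len(set(labels)) == 1 or not attrs:
        return majority(labels)
    # Steps 2-3: attribute with the highest normalized information gain.
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority(labels)  # no split improves the node
    # Step 4: decision node on best; step 5: recurse on each sublist.
    remaining = [a for a in attrs if a != best]
    children = {}
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining)
    return {"split": best, "children": children}


# Hypothetical training data.
ROWS = [
    {"outlook": "sunny", "windy": "false"},
    {"outlook": "sunny", "windy": "true"},
    {"outlook": "overcast", "windy": "false"},
    {"outlook": "overcast", "windy": "true"},
    {"outlook": "rainy", "windy": "false"},
    {"outlook": "rainy", "windy": "true"},
]
LABELS = ["no", "no", "yes", "yes", "yes", "no"]
```

Calling `build_tree(ROWS, LABELS, ["outlook", "windy"])` splits on `outlook` first, since its gain ratio is about 0.42 versus about 0.08 for `windy`. It then splits the `rainy` branch on `windy`.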
FT-Algorithm
• Builds a functional tree
• With oblique splits and linear functions at the leaves
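To make "oblique split" and "linear function at a leaf" concrete: an ordinary (J48-style) node tests a single attribute against a threshold. An oblique node tests a linear combination of several attributes, and a functional leaf outputs the value of a linear model instead of a fixed class. The weights below are hand-picked hypothetical numbers. This sketch only shows how such a tree is evaluated, not how the FT algorithm learns its splits and functions.

```python
def axis_parallel_split(x, attr, threshold):
    """Classic decision-tree test: one attribute against a threshold."""
    return x[attr] <= threshold


def oblique_split(x, weights, bias):
    """Oblique test on a linear combination of attributes: w . x + b <= 0."""
    return sum(w * v for w, v in zip(weights, x)) + bias <= 0


def linear_leaf(x, weights, bias):
    """Functional leaf: a score from a linear model instead of a fixed class."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def tiny_functional_tree(x):
    """Hypothetical one-split tree: oblique root, linear functions at both leaves.

    A positive leaf score is read as class 'A', otherwise 'B'.
    """
    if oblique_split(x, weights=(1.0, 1.0), bias=-10.0):  # x0 + x1 <= 10
        score = linear_leaf(x, weights=(0.5, -1.0), bias=1.0)
    else:
        score = linear_leaf(x, weights=(-0.2, 0.3), bias=0.5)
    return "A" if score > 0 else "B"
```

The boundary x0 + x1 <= 10 is a diagonal line. It cannot be expressed by a single axis-parallel test, which is one motivation for oblique splits.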
In comparison with the J48 algorithm (Weka's implementation of C4.5)
• Higher rate of correctly classified instances
• Lower error rates
• Therefore the better choice
Explorer: Cluster & Associate
Cluster
• Determines the majority class in each cluster
• Compares it with the preassigned classes of the instances
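The cluster evaluation described above assigns each cluster the class that occurs most often in it, then counts how many instances disagree with their preassigned class. It can be sketched as follows. The cluster assignments and class labels are hypothetical; in Weka they would come from a clusterer run in the "Classes to clusters evaluation" mode. This simple majority mapping is an illustration and not necessarily Weka's exact procedure.

```python
from collections import Counter, defaultdict


def classes_to_clusters(cluster_ids, classes):
    """Map each cluster to its majority class; return (mapping, error rate)."""
    members = defaultdict(list)
    for cid, cls in zip(cluster_ids, classes):
        members[cid].append(cls)
    mapping = {cid: Counter(cls_list).most_common(1)[0][0]
               for cid, cls_list in members.items()}
    wrong = sum(mapping[cid] != cls for cid, cls in zip(cluster_ids, classes))
    return mapping, wrong / len(classes)


# Hypothetical clusterer output and preassigned classes.
CLUSTER_IDS = [0, 0, 0, 1, 1, 1, 1, 2, 2]
CLASSES = ["a", "a", "b", "b", "b", "b", "a", "c", "c"]
```

Here clusters 0, 1 and 2 are mapped to `a`, `b` and `c`, and 2 of the 9 instances are counted as incorrectly clustered.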
Associate
• Offers six algorithms to determine association rules between the attributes
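Association rules are usually judged by two measures. Support is how often the items occur together; confidence is how often the consequent holds when the antecedent does. Apriori, one of Weka's associators, searches for rules that exceed minimum thresholds for both. The sketch below is a brute-force illustration on hypothetical transactions, not the Apriori algorithm itself. It enumerates all itemsets and reports rules with a single item as consequent.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute force: all itemsets whose support (fraction of transactions) >= min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = frozenset(candidate)
            support = sum(s <= t for t in transactions) / n
            if support >= min_support:
                result[s] = support
    return result


def rules(transactions, min_support, min_confidence):
    """Rules X -> y with one consequent item y: (sorted X, y, support, confidence)."""
    freq = frequent_itemsets(transactions, min_support)
    found = []
    for itemset, support in freq.items():
        if len(itemset) < 2:
            continue
        for y in itemset:
            antecedent = itemset - {y}
            # Subsets of frequent itemsets are frequent, so freq[antecedent] exists.
            confidence = support / freq[antecedent]
            if confidence >= min_confidence:
                found.append((sorted(antecedent), y, support, confidence))
    return sorted(found)


# Hypothetical market-basket transactions.
TRANSACTIONS = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
```

With a minimum support of 0.5 and a minimum confidence of 0.6, four rules remain. Among them, `butter -> bread` holds with confidence 1.0.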
Explorer: Select Attribute & Visualization
Select Attribute
• Aims to find the attributes that best match, i.e. depend most strongly on, a preassigned class
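One way to find attributes that depend strongly on the class is to rank them by their information gain with respect to the class. This is the idea behind combining Weka's `InfoGainAttributeEval` evaluator with the `Ranker` search method. The sketch below reproduces only that idea on hypothetical nominal data. It is not Weka's implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """H(class) - H(class | attribute) for one nominal attribute column."""
    n = len(labels)
    by_value = {}
    for v, y in zip(column, labels):
        by_value.setdefault(v, []).append(y)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in by_value.values())


def rank_attributes(data, labels):
    """Return (attribute, gain) pairs, highest gain first."""
    scores = {name: info_gain(col, labels) for name, col in data.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical attribute columns and class labels.
DATA = {
    "outlook": ["sunny", "sunny", "overcast", "overcast", "rainy", "rainy"],
    "windy": ["false", "true", "false", "true", "false", "true"],
    "humidity": ["high", "high", "normal", "normal", "normal", "high"],
}
LABELS = ["no", "no", "yes", "yes", "yes", "no"]
```

In this made-up data, `humidity` separates the classes perfectly (gain 1.0), so it is ranked ahead of `outlook` (about 0.67) and `windy` (about 0.08).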
Visualization
• Visualizes a data set, not the results of a classification
Experimenter
• Allows algorithms to be run against each other
• Can be applied to different data sets
• Provides three panels: Setup, Run and Analyse
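The Experimenter automates running several algorithms against each other on several data sets and collecting the results for analysis. This can be pictured as a grid of runs. In the sketch below, the two learners are simple stand-ins: a majority-class learner (compare Weka's `ZeroR`) and a 1-nearest-neighbour learner on one numeric attribute (compare Weka's `IBk`). The data sets are made up, and the score is accuracy on a fixed train/test split. Weka's Experimenter instead repeats runs and can apply significance tests in its Analyse panel.

```python
from collections import Counter


def zero_r(train):
    """Majority-class learner: ignores x and predicts the most frequent label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def one_nn(train):
    """1-nearest-neighbour learner on a single numeric attribute."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def run_experiment(learners, datasets):
    """Evaluate every learner on every data set: {(data set, learner): accuracy}."""
    results = {}
    for ds_name, (train, test) in datasets.items():
        for learner_name, learn in learners.items():
            model = learn(train)
            results[(ds_name, learner_name)] = (
                sum(model(x) == y for x, y in test) / len(test))
    return results


LEARNERS = {"ZeroR": zero_r, "1-NN": one_nn}
# Hypothetical data sets: (training instances, test instances).
DATASETS = {
    "separable": ([(1.0, "a"), (2.0, "a"), (3.0, "a"), (8.0, "b"), (9.0, "b")],
                  [(1.2, "a"), (8.6, "b"), (2.9, "a"), (7.0, "b")]),
    "noisy": ([(1.0, "a"), (2.0, "b"), (3.0, "a"), (4.0, "b"), (5.0, "a")],
              [(1.1, "a"), (2.1, "a"), (3.9, "a"), (4.8, "b")]),
}
```

The resulting table shows why comparing across data sets matters. 1-NN wins on the separable data (1.0 versus 0.5), while the majority-class learner wins on the noisy data (0.75 versus 0.25).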
Knowledge Flow & Simple CLI
Knowledge Flow
• Offers better support for working with large data sets
• Enables users to design a data flow
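The Knowledge Flow connects components into a data flow, and it can handle large data sets by passing instances through one at a time instead of loading everything into memory. This idea can be sketched with Python generators. The components below (a loader, a clipping filter and an incremental nearest-centroid model) are hypothetical stand-ins for graphical components, not Weka classes. `lines` can be any iterable of text lines, such as an open file.

```python
def loader(lines):
    """Source component: yields one parsed (value, label) instance at a time."""
    for line in lines:
        value, label = line.split(",")
        yield float(value), label.strip()


def clip_filter(instances, low, high):
    """Filter component: clips each value into [low, high], instance by instance."""
    for value, label in instances:
        yield min(max(value, low), high), label


class IncrementalCentroids:
    """Sink component: running mean of the value per class, updated per instance."""

    def __init__(self):
        self.count = {}
        self.mean = {}

    def update(self, value, label):
        n = self.count.get(label, 0) + 1
        m = self.mean.get(label, 0.0)
        self.count[label] = n
        self.mean[label] = m + (value - m) / n  # incremental mean update

    def predict(self, value):
        """Class whose running mean is closest to value."""
        return min(self.mean, key=lambda c: abs(self.mean[c] - value))


def run_flow(lines, low, high):
    """Connect loader -> filter -> model; no stage holds the whole data set."""
    model = IncrementalCentroids()
    for value, label in clip_filter(loader(lines), low, high):
        model.update(value, label)
    return model
```

For the lines `"1.0, a"`, `"3.0, a"`, `"10.0, b"` and `"50.0, b"` with clipping to [0, 20], the model ends up with class means 2.0 for `a` and 15.0 for `b`.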
Simple CLI
• Makes it possible to use different options that are hidden in the Explorer
Review
Pros
• Distributed under the GNU General Public License
• Simple GUI that provides some useful features
• Portability
Cons
• Weka is not compatible with multi-relational data sets
• Lack of proper and adequate documentation
Related Works
• Freely available tools are in high demand
RapidMiner
• Is the most commonly used data mining tool
• Started at the Artificial Intelligence Unit of the University of Dortmund in 2001
• Implements Weka's learning schemes and allows users to take advantage of varying modeling methods
Source: http://www.kdnuggets.com/polls/2011/tools-analytics-datamining.html
Summary
• Weka makes it easier to find an optimal algorithm from a collection of algorithms
• Users can implement their own algorithms
• Weka is becoming easier to handle
• Weka can be integrated into any platform that runs Java, which makes it portable
References & Bibliography
Balance Scale data set, generated to model psychological experiments reported by Siegler, R. S.: Three Aspects of Cognitive Development. Cognitive Psychology, 8, 481-520, 1976. http://archive.ics.uci.edu/ml/machine-learning-databases/balance-scale/
Data Mining Tool Usage (KDnuggets poll), 2011. http://www.kdnuggets.com/polls/2011/tools-analytics-datamining.html
Seidl, Thomas: Slides from the Data Mining lecture, 2011.
Szugat, Martin: Im Datenrausch: Praktische Einführung in das Data Mining mit Weka 3.4 [practical introduction to data mining with Weka 3.4; in German]. http://bioweka.sourceforge.net/download/LE_1.05_75-79.pdf
Witten, Ian H., Frank, Eibe and Hall, Mark A.: Data Mining: Practical Machine Learning Tools and Techniques. Third Edition. Morgan Kaufmann, 2011.
Frank, Eibe, Hall, Mark, Trigg, Len, Holmes, Geoffrey and Witten, Ian H.: Data mining in bioinformatics using Weka. Volume 20, Number 20, Pages 2479-2481, 2004.