Download WEKA A Data Mining Tool

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
WEKA
A Data Mining Tool
By Susan L. Miertschin
1
Data Mining
Task Types
Numerous Algorithms
 Classification
 C4.5 Decision Tree
 Clustering
 K-Means
K Means Clustering
 Discovering Association
Rules
 Discovering Sequential
Patterns – Sequence Analysis
 Regression
R
i
 Detecting Deviations from
Normal
2
http://www cs waikato ac nz/ml/weka/
http://www.cs.waikato.ac.nz/ml/weka/
WEKA can be freelyy downloaded byy visitingg the Web site
3
WEKA – Data Mining Software
 Developed by the Machine Learning Group, University of
Waikato , New Zealand
 Vision: Build state-of-the-art software for developing
machine learning (ML) techniques and apply them to realreal
world data-mining problems
 Developed
p in Java
J
4
WELA’s Collection of Machine Learning
Alg ith
Algorithms
 Algorithms for data mining tasks
 WEKA is open source software issued under the GNU
General Public License
 Tools
T l ffor:
 Data pre-processing
 Classification
 Regression
 Clustering
 Association rules
 Visualization
5
After Installing - Start WEKA
6
WEKA Main Interface
7
WEKA Sample Files
 C:\Program Files\weka\data
 WEKA formatted files (.arff)
 Open the contact-lenses file
8
Example – Contact Lens Data
How manyy
data
instances
are in the
file?
How many
attributes?
Numerical
attributes?
Categorical
attributes?
9
Example – Contact Lens Data
Can you
think of
problems
that might
be solved
with
ith thi
this
data?
10
Example – Contact Lens Data
If
supervised
learning
were to be
done,
which
would be
the output
attribute,
attribute
do you
think?
11
Example – Contact Lens Data
12
Example – Contact Lens Data
13
Example – Contact Lens Data
14
E
Example
l – Classify
Cl if - Contact
C t t Lens
L
Data
D t
15
E
Example
l – Classify
Cl if - Contact
C t t Lens
L
Data
D t
16
E
Example
l – Classify
Cl if - Contact
C t t Lens
L
Data
D t
Select the
rule
generator
named
PART from
th list
the
li t that
th t
shows up
after you
select
Choose
17
E
Example
l – Classify
Cl if - Contact
C t t Lens
L
Data
D t
18
E
Example
l – Classify
Cl if - Contact
C t t Lens
L
Data
D t
19
10 Fold Cross-Validation
10-Fold
Cross Validation
 Data is partitioned into 10 equally (or nearly equally) sized
segments or folds
 10 iterations of training and validation are completed
 In
I eachh iteration a different
d ff
ffold
ld off the
h data
d is held
h ld out ffor
validation, with the remaining 9 folds used for learning
20
http://www.public.asu.edu/~ltang9/papers/ency-cross-validation.pdf
E
Example
l – Classify
Cl if - Contact
C t t Lens
L
Data
D t
21
E
Example
l – Classify
Cl if - Contact
C t t Lens
L
Data
D t
IF tear
tear-prodprod
rate = reduced
THEN contactllenses = none
IF astigmatism
tig ti
= no THEN
co tact e ses
contact-lenses
= soft
22
E
Example
l – Classify
Cl if - Contact
C t t Lens
L
Data
D t
Coverage =
12
23
E
Example
l – Classify
Cl if - Contact
C t t Lens
L
Data
D t
IF tear
tear-prodprod
rate = reduced
THEN contactllenses = none
IF astigmatism
tig ti
= no THEN
co tact e ses
contact-lenses
= soft
24
E
Example
l – Classify
Cl if - Contact
C t t Lens
L
Data
D t
 Coverage = 6
 Misclassification = 1
 Accuracy = 5/6 = 83.3%
25
E
Example
l – Classify
Cl if - Contact
C t t Lens
L
Data
D t
26
WEKA
A Data Mining Tool
By Susan L. Miertschin
27
Related documents