Download Practical_2 - WordPress.com

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

Cluster analysis wikipedia , lookup

Transcript
Practical No-2
PRACTICAL-2
AIM: Introduction about Weka.
Weka:
 Waikato Environment for Knowledge Analysis
 It’s a data mining/machine learning tool developed by Department of
Computer Science, University of Waikato, New Zealand.
 Weka is also a bird found only on the islands of New Zealand.
Download and Install WEKA
Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html
 Support multiple platforms (written in java):
 Windows, Mac OS X and Linux
What can Weka do?
•
Weka is a collection of machine learning algorithms for data mining tasks. The
algorithms can either be applied directly to a dataset (using GUI) or called from your
own Java code (using Weka Java library).
•
Weka contains tools for data pre-processing, classification, regression, clustering,
association rules, and visualization. It is also well-suited for developing new machine
learning schemes.
Alpha College of Engineering & Technology
Page 8
Practical No-2
Main Features:
 49 data preprocessing tools
 76 classification/regression algorithms
 8 clustering algorithms
 3 algorithms for finding association rules
 15 attribute/subset evaluators + 10 search algorithms for feature selection
Main GUI:
Three graphical user interfaces

“The Explorer” (exploratory data analysis)
 Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
 Data can also be read from a URL or from an SQL database (using JDBC)
 Pre-processing tools in WEKA are called “filters”
 WEKA contains filters for:
 Discretization, normalization, resampling, attribute selection, transforming and
combining attributes, …

“The Experimenter” (experimental environment)
 “The KnowledgeFlow” (new process model inspired interface)
Weka’s Role in the Big Picture
Alpha College of Engineering & Technology
Page 9
Practical No-2
How to use Weka?
•
Weka Data File Format (Input)
•
Weka for Data Mining
•
Sample Output from Weka (Output)
Weka Data Format:
Weka permits the input data set to be in numerous file formats like CSV (comma
separted values: *.csv), Binary Serialized Instances (*.bsi) etc. However, the most preferred
and the most convenient input file format is the attribute relation file format (arff). So the first
step in Weka always is taking an input file and making sure that it is in ARFF.
Typically, here is how an ARFF data set looks like:
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
Alpha College of Engineering & Technology
Page 10
Practical No-2
Covert CSV to ARFF: There is a feature in Weka which helps you convert .CSV into arff on
the fly.
Following is an example CSV file
outlook,temperature,humidity,windy,play
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
Alpha College of Engineering & Technology
Page 11