HW Assignment 1
Due: June 19th 2009
Part I: Installing Weka
The purpose of this assignment is to install and run Weka, a widely used, free data mining toolbox written in Java. This homework will
walk you through the basic steps of installing and running the software, building classifiers, and labeling test cases. For this assignment, you will
need to download the TRAINING and TEST sets from the course website. Note: It is important that you install Weka properly and learn how to run
it, because we will use Weka for future hands-on assignments as well as for the data mining competition and course project.
Step 1: Installing Weka
Go to the Weka website, http://www.cs.waikato.ac.nz/ml/weka/, and download the software. On the left-hand side, click on the link
that says download. Select the link for the version of the software that matches your operating system and
whether or not you already have a Java VM installed on your machine (if you don’t know what a Java VM is, then you probably don’t have one).
The link will forward you to a mirror site where you can download the software. Save the self-extracting executable to
disk and then double-click on it to install Weka. Answer yes or next to the questions during the installation. Click yes to accept the
Java agreement if necessary. After you install the program, Weka should appear on your start menu under Programs (if you are using
Windows).
Step 2: Running Weka
From the start menu select Programs, then Weka, then Weka 3*.
You will see the Weka GUI Chooser. Select Explorer. The Weka Explorer will then launch.
Step 3: Load Training Set
You will find the training set, TRAIN.arff, on the course website. The training set includes the records you will use in your next
homework assignment.
The TRAINING set contains the following data:
In the Weka Explorer, click the button that says Open file. Open TRAIN.arff.
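If you are curious what Weka reads when it opens an .arff file, the sketch below may help. An ARFF file is plain text: a header with one @relation line and one @attribute line per column, followed by @data and comma-separated rows. The code uses only the Java standard library, and the sample relation and attribute names are made up for illustration; they are not the contents of TRAIN.arff.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of reading an ARFF header with the standard library only.
// The sample relation and attributes are hypothetical, not taken from TRAIN.arff.
class ArffHeaderSketch {

    // Returns the attribute names declared before the @data line.
    static List<String> attributeNames(String arffText) {
        List<String> names = new ArrayList<>();
        for (String raw : arffText.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and % comments
            }
            String lower = line.toLowerCase();
            if (lower.startsWith("@data")) {
                break; // the header ends where the data rows begin
            }
            if (lower.startsWith("@attribute")) {
                // Form: @attribute <name> <type>; quoted names are not handled here.
                String[] parts = line.split("\\s+", 3);
                if (parts.length >= 2) {
                    names.add(parts[1]);
                }
            }
        }
        return names;
    }

    // Counts the non-empty, non-comment rows after the @data line.
    static int countInstances(String arffText) {
        boolean inData = false;
        int count = 0;
        for (String raw : arffText.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue;
            }
            if (inData) {
                count++;
            } else if (line.toLowerCase().startsWith("@data")) {
                inData = true;
            }
        }
        return count;
    }

    // Hypothetical sample file contents.
    static final String SAMPLE = String.join("\n",
        "% hypothetical example file",
        "@relation example",
        "@attribute age numeric",
        "@attribute income numeric",
        "@attribute class {yes,no}",
        "@data",
        "34,52000,yes",
        "51,38000,no");

    public static void main(String[] args) {
        System.out.println("Attributes: " + attributeNames(SAMPLE));
        System.out.println("Instances:  " + countInstances(SAMPLE));
    }
}
```

Opening TRAIN.arff in a text editor should show the same three-part layout, with the course's own attributes in place of the made-up ones.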
Step 5: Constructing the Initial Decision Tree
Select the tab that says Classify. In the box that says classifier, you can choose a classifier. Click on the Choose button and you will be
presented with a hierarchy of methods. Pick weka, classifiers, trees, J48. Click on the text box in the classifier box (which now says J48 and some
cryptic options instead of ZeroR, which is the default classifier). In the popup, change the following settings: set minNumObj to 1 and unpruned to
True, and then click OK. (Note: The order in which the options appear might vary depending on which mirror site you choose. For example, we found
minNumObj is closer to the top of the GUI in some versions.)
You will find the test set, TEST.arff, on the course website. The TEST set includes the records you will use in future homework assignments.
The TEST set contains the data below. In the box that says test options, pick Supplied test set. Click on the Set button and select your
TEST.arff file.
Now press Start and watch Weka go!
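For reference, the same run can also be started without the Explorer, because Weka classifiers accept command-line options. The sketch below builds that command using only the Java standard library. In J48's option list, -U requests an unpruned tree and -M sets the minimum number of instances per leaf (minNumObj); -t and -T name the training and test files. The jar path "weka.jar" is an assumption, so point it at wherever Weka was installed on your machine. The command is only executed if that jar is found.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch: build the command line that mirrors the Explorer settings used in this step.
// The jar location passed in main ("weka.jar") is an assumption, not a fixed path.
class J48CommandSketch {

    static List<String> buildCommand(String wekaJar, String trainFile, String testFile,
                                     boolean unpruned, int minNumObj) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(wekaJar);
        cmd.add("weka.classifiers.trees.J48");
        cmd.add("-t");
        cmd.add(trainFile);                    // training set
        cmd.add("-T");
        cmd.add(testFile);                     // supplied test set
        if (unpruned) {
            cmd.add("-U");                     // unpruned = True
        }
        cmd.add("-M");
        cmd.add(String.valueOf(minNumObj));    // minNumObj
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = buildCommand("weka.jar", "TRAIN.arff", "TEST.arff", true, 1);
        System.out.println(String.join(" ", cmd));

        // Run the command only when the assumed jar is actually present.
        if (new File("weka.jar").exists()) {
            new ProcessBuilder(cmd).inheritIO().start().waitFor();
        }
    }
}
```

The assignment itself only requires the Explorer; this is just another way to see which options the popup is setting.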
Step 6: Results
You may have to scroll up and down in the classifier output box to see all the results.
Copy and paste the results in the classifier output window into a text editor and HAND IN (or email) with your assignment.
You will compare these results with a future homework assignment. Don’t worry that you don’t yet know how to interpret the output. In a
short time, you will. This exercise is only to get you started with WEKA.
In the result list box, on the bottom left, right-click on the item that says … trees.J48.
Select Visualize classifier errors from the list. Click Save and save the results as RESULTS.arff. This file will include your original
TEST set plus an extra column for the predicted classification.
Copy and paste the text in the RESULTS.arff file to the end of your assignment and HAND IN.
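Once RESULTS.arff is saved, a quick sanity check is to count the rows where the predicted column disagrees with the actual class column. The sketch below is one way to do that with the Java standard library. It assumes simple comma-separated rows with no quoted commas, and it takes the two column positions as parameters because the exact position and name of the predicted column can differ between Weka versions. The sample data and attribute names are hypothetical.

```java
// Sketch: count rows of an ARFF file where two chosen columns disagree.
// Column positions are supplied by the caller; the sample below is hypothetical.
class PredictionCheckSketch {

    static int countMismatches(String arffText, int predictedCol, int actualCol) {
        boolean inData = false;
        int mismatches = 0;
        for (String raw : arffText.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and % comments
            }
            if (!inData) {
                if (line.toLowerCase().startsWith("@data")) {
                    inData = true; // rows start on the next line
                }
                continue;
            }
            // Assumes no quoted values containing commas.
            String[] cells = line.split(",");
            if (!cells[predictedCol].trim().equals(cells[actualCol].trim())) {
                mismatches++;
            }
        }
        return mismatches;
    }

    // Hypothetical sample: column 1 is the predicted class, column 2 the actual class.
    static final String SAMPLE = String.join("\n",
        "@relation example_predicted",
        "@attribute age numeric",
        "@attribute predictedclass {yes,no}",
        "@attribute class {yes,no}",
        "@data",
        "34,yes,yes",
        "51,yes,no",
        "29,no,no");

    public static void main(String[] args) {
        System.out.println("Misclassified rows: " + countMismatches(SAMPLE, 1, 2));
    }
}
```

Check the @attribute lines at the top of your own RESULTS.arff to find the right column positions before relying on the count.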
So, for the first part of the assignment, you simply need to hand in (or email to me) a text document with the results output from Weka
along with the prediction results found in your RESULTS.arff file.
Part II: Classification/prediction problem ideas
List three prediction problem ideas for your class project based on publicly available data, Wharton Research Data Services (WRDS) data (see
shawndra.pbwiki.com), or data you have at your firm.
Append your three ideas to the text file with your answers to Part I.