CIS526: Homework 7
Assigned: November 19, 2007
Homework Policy
All assignments are INDIVIDUAL! You may discuss the problems with your colleagues, but you must solve the homework by yourself. Please acknowledge all sources you use in the homework (papers, code, or ideas from someone else). Assignments should be submitted in class on the day they are due. No credit is given for assignments submitted at a later time, unless you have a medical problem.
Reading Assignment (due Nov 26, 2007, in class)
Read the following two papers and write a ½-page report on each. Your report should cover:

- the motivation for the paper,
- the approach and methodology,
- the main experimental results and the conclusions drawn from them,
- your opinion on the strengths and weaknesses of the paper.

To get full credit, please refrain from copy-pasting sentences from the paper; summarize it in your own words.
PAPER 1 (Presented by James Joseph in class):
Z. Huang, H. Chen, C. J. Hsu, W.H. Chen, and S. Wu. Credit rating analysis with support vector machines and neural
networks: a market comparative study. Decision Support Systems, 37:543–558, 2004.
PAPER 2 (Presented by Jingting Zeng in class):
L. Yu and H. Liu. Feature selection for high-dimensional data: a fast correlation-based filter solution. Proceedings of the International Conference on Machine Learning, 856–863, 2003.
TOPIC 3 (Presented by Ping Zhang in class):
Decision trees (Chapter 14.4 of the textbook; you will also get handouts about decision trees in class on Nov 19).
The papers can be downloaded via Google (from any Temple-owned computer) or through the Temple University Library site.
Programming Assignment (due Nov 29, 2007, Thursday, at noon)
WEKA Machine Learning Software – experiments on Adult data set
Download and install WEKA 3: Data Mining Software in Java from http://www.cs.waikato.ac.nz/ml/weka/. Learn how to use the "Explorer" GUI – it is very user friendly and should not take long. Download the "Adult Database" from http://www.ics.uci.edu/~mlearn/MLSummary.html.
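WEKA's Explorer reads data in its ARFF format, so the comma-separated adult.txt will need converting. A minimal Python sketch of such a conversion is below; the attribute names and nominal values are illustrative placeholders, not the full 14-attribute Adult schema:

```python
# Sketch: convert comma-separated records to WEKA's ARFF format.
# The attributes below are placeholders; the real Adult data set has
# 14 attributes plus the income target.

def to_arff(relation, attributes, rows):
    """attributes: list of (name, 'numeric' or list of nominal values)."""
    lines = [f"@relation {relation}", ""]
    for name, typ in attributes:
        if typ == "numeric":
            lines.append(f"@attribute {name} numeric")
        else:  # nominal attribute: enumerate the allowed values
            lines.append("@attribute %s {%s}" % (name, ",".join(typ)))
    lines += ["", "@data"]
    lines += [",".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)

attrs = [("age", "numeric"),
         ("workclass", ["Private", "State-gov"]),
         ("income", ["<=50K", ">50K"])]
rows = [(39, "State-gov", "<=50K"), (50, "Private", ">50K")]
print(to_arff("adult", attrs, rows))
```

Save the printed text with a .arff extension and the Explorer will open it directly.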
a) Briefly explain the properties of the data set "adult.txt" (how much data there is, and what the attributes and the target mean). Assign each attribute to either nominal or numeric type.
b) Select the first 5000 data points from the data set (this will allow you to perform more experiments). Reformat the data to WEKA format. Run 5-fold cross-validation classification experiments using the following algorithms (you can leave the default parameters of each algorithm):
   a. ZeroR (trivial predictor)
   b. J48 (decision tree; you will learn about decision trees during your reading assignment)
   c. NaiveBayes
   d. IBk (k-nearest neighbor)
   e. MultilayerPerceptron (neural network)
   f. SMO (support vector machine)
   g. Bagging of 30 decision trees (meta-learning algorithm)
Based on the J48 tree result, discuss which attributes are important for classification and which are not, and comment on whether this agrees with your intuition. Report the accuracy of each algorithm. Rank the algorithms by their speed.
c) Try to improve the accuracy of each of the above algorithms by changing some of the default parameters or by doing more careful data preprocessing. Explain your choices and present the results. Hopefully, you will be able to improve the accuracy of every algorithm other than ZeroR.
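To see what the 5-fold cross-validation that WEKA performs is doing, here is a plain-Python sketch: the data is split into 5 disjoint folds, each fold serves once as the test set, and the 5 accuracies are averaged. The stand-in classifier is a ZeroR-style majority-vote rule, used here only to make the sketch runnable:

```python
# Sketch of k-fold cross-validation with a ZeroR-style baseline.
from collections import Counter

def cv_accuracy(X, y, train_and_predict, k=5):
    n = len(y)
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin split
    accs = []
    for test_idx in folds:
        test = set(test_idx)
        train_idx = [i for i in range(n) if i not in test]
        preds = train_and_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        accs.append(correct / len(test_idx))
    return sum(accs) / k  # mean accuracy over the k folds

def zero_r(train_X, train_y, test_X):
    # ZeroR: always predict the majority class of the training labels
    majority = Counter(train_y).most_common(1)[0][0]
    return [majority] * len(test_X)

X = list(range(10))
y = [0] * 7 + [1] * 3
print(cv_accuracy(X, y, zero_r, k=5))  # prints 0.7
```

WEKA does all of this for you when you select "Cross-validation, Folds: 5" in the Explorer; the sketch only illustrates the mechanics.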
WEKA Machine Learning Software – competition on data sets from HW 6
In Homework 6 you were asked to use the SVM software from Spider to construct SVM classifiers for two classification problems, apply the classifiers to the unlabeled data sets, and submit your best predictions. In this homework the task is the same, except that this time you should do it using the WEKA software. As you have seen, this software has many ML algorithms available, so you may be able to achieve even higher classification accuracy than in Homework 6.
Deliverables: a description of your experiments, your estimated accuracies using different WEKA classifiers, a file data1prediction.mat containing a 1000×1 vector of class predictions for the examples in data1test.mat, and a file data2prediction.mat containing a 2301×1 vector of class predictions for the examples in data2test.mat. Your score will depend on how high an accuracy you achieve. Students with the most accurate predictions will get extra credit.
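If you run your experiments in WEKA but need to produce the MATLAB-format deliverable, SciPy can write the .mat file. In the sketch below, the variable name stored inside the file and the all-ones labels are placeholders; substitute your actual predicted class labels:

```python
# Sketch: wrap a list of class predictions into a MATLAB .mat file.
# Assumes NumPy and SciPy are installed; the key "data1prediction"
# is a placeholder name, not one mandated by the assignment.
import numpy as np
from scipy.io import savemat

preds = np.ones((1000, 1), dtype=int)  # placeholder: your 1000 class labels
savemat("data1prediction.mat", {"data1prediction": preds})
```

Repeat with a 2301×1 vector for data2prediction.mat.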
Good luck!!!