CIS526: Homework 7
Assigned: November 19, 2007
Homework Policy
All assignments are INDIVIDUAL! You may discuss the problems with your colleagues, but you must solve the homework by yourself. Please acknowledge all sources you use in the homework (papers, code, or ideas from someone else). Assignments should be submitted in class on the day they are due. No credit is given for assignments submitted at a later time, unless you have a medical problem.
Reading Assignment (due Nov 26, 2007, in class)
Read the following two papers and write a ½-page report on each. Your report should cover:

- the motivation for the paper,
- the approach and methodology,
- the main experimental results and the conclusions drawn from them,
- your opinion on the strengths and weaknesses of the paper.

To get full credit, please refrain from copy-pasting sentences from the paper; summarize it in your own words.
PAPER 1 (Presented by James Joseph in class):
Z. Huang, H. Chen, C. J. Hsu, W.H. Chen, and S. Wu. Credit rating analysis with support vector machines and neural
networks: a market comparative study. Decision Support Systems, 37:543–558, 2004.
PAPER 2 (Presented by Jingting Zeng in class):
L. Yu and H. Liu. Feature selection for high-dimensional data: a fast correlation-based filter solution. Proceedings of the International Conference on Machine Learning, 856–863, 2003.
TOPIC 3 (Presented by Ping Zhang in class):
Decision trees (Chapter 14.4 of the textbook; you will also get handouts about decision trees in class on Nov 19).
The papers can be downloaded via Google (from any Temple-owned computer) or through the Temple University Library site.
Programming Assignment (due Nov 29, 2007, Thursday, at noon)
WEKA Machine Learning Software – experiments on Adult data set
Download and install WEKA 3: Data Mining Software in Java from http://www.cs.waikato.ac.nz/ml/weka/. Learn how to use the "Explorer" GUI – it is very user friendly and should not take long. Download the "Adult Database" from http://www.ics.uci.edu/~mlearn/MLSummary.html.
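WEKA's Explorer reads data in its ARFF format, so the comma-separated adult.txt will need converting. A minimal Python sketch of such a conversion is below; the attribute names and nominal values are illustrative placeholders, not the full 14-attribute Adult schema:

```python
# Sketch: convert comma-separated records to WEKA's ARFF format.
# The attributes below are placeholders; the real Adult data set has
# 14 attributes plus the income target.

def to_arff(relation, attributes, rows):
    """attributes: list of (name, 'numeric' or list of nominal values)."""
    lines = [f"@relation {relation}", ""]
    for name, typ in attributes:
        if typ == "numeric":
            lines.append(f"@attribute {name} numeric")
        else:  # nominal attribute: enumerate the allowed values
            lines.append("@attribute %s {%s}" % (name, ",".join(typ)))
    lines += ["", "@data"]
    lines += [",".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)

attrs = [("age", "numeric"),
         ("workclass", ["Private", "State-gov"]),
         ("income", ["<=50K", ">50K"])]
rows = [(39, "State-gov", "<=50K"), (50, "Private", ">50K")]
print(to_arff("adult", attrs, rows))
```

Save the printed text with a .arff extension and the Explorer will open it directly.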
a) Briefly explain the properties of the data set "adult.txt" (how much data there is, and what the attributes and the target mean). Assign each attribute to either nominal or numeric type.
b) Select the first 5000 data points from the data set (this will allow you to perform more experiments). Reformat the data to WEKA format. Run 5-fold cross-validation classification experiments using the following algorithms (you can leave the default parameters of each algorithm):
   a. ZeroR (trivial predictor)
   b. J48 (decision tree; you will learn about decision trees during your reading assignment)
   c. NaiveBayes
   d. IBk (k-nearest neighbor)
   e. MultilayerPerceptron (neural network)
   f. SMO (support vector machine)
   g. Bagging of 30 decision trees (meta-learning algorithm)
Based on the J48 tree result, discuss which attributes are important for classification and which are not, and comment on whether this agrees with your intuition. Report the accuracy of each algorithm. Rank the algorithms by their speed.
c) Try to improve the accuracy of each of the above algorithms by changing some of the default parameters or by doing more careful data preprocessing. Explain your choices and present the results. Hopefully, you will be able to improve the accuracy of every algorithm other than ZeroR.
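To see what the 5-fold cross-validation that WEKA performs is doing, here is a plain-Python sketch: the data is split into 5 disjoint folds, each fold serves once as the test set, and the 5 accuracies are averaged. The stand-in classifier is a ZeroR-style majority-vote rule, used here only to make the sketch runnable:

```python
# Sketch of k-fold cross-validation with a ZeroR-style baseline.
from collections import Counter

def cv_accuracy(X, y, train_and_predict, k=5):
    n = len(y)
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin split
    accs = []
    for test_idx in folds:
        test = set(test_idx)
        train_idx = [i for i in range(n) if i not in test]
        preds = train_and_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        accs.append(correct / len(test_idx))
    return sum(accs) / k  # mean accuracy over the k folds

def zero_r(train_X, train_y, test_X):
    # ZeroR: always predict the majority class of the training labels
    majority = Counter(train_y).most_common(1)[0][0]
    return [majority] * len(test_X)

X = list(range(10))
y = [0] * 7 + [1] * 3
print(cv_accuracy(X, y, zero_r, k=5))  # prints 0.7
```

WEKA does all of this for you when you select "Cross-validation, Folds: 5" in the Explorer; the sketch only illustrates the mechanics.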
WEKA Machine Learning Software – competition on data sets from HW 6
In Homework 6 you were asked to use the SVM software from Spider to construct SVM classifiers for two classification problems, apply the classifiers to the unlabeled data sets, and submit your best predictions. In this homework the task is the same, except that this time you should do it using the WEKA software. As you have seen, this software has many ML algorithms available, so you may be able to achieve even higher classification accuracy than in Homework 6.
Deliverables: a description of your experiments, your estimated accuracies using different WEKA classifiers, a file data1prediction.mat containing a 1000×1 vector of class predictions for the examples in data1test.mat, and a file data2prediction.mat containing a 2301×1 vector of class predictions for the examples in data2test.mat. Your score will depend on how high an accuracy you achieve. Students with the most accurate predictions will get extra credit.
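If you run your experiments in WEKA but need to produce the MATLAB-format deliverable, SciPy can write the .mat file. In the sketch below, the variable name stored inside the file and the all-ones labels are placeholders; substitute your actual predicted class labels:

```python
# Sketch: wrap a list of class predictions into a MATLAB .mat file.
# Assumes NumPy and SciPy are installed; the key "data1prediction"
# is a placeholder name, not one mandated by the assignment.
import numpy as np
from scipy.io import savemat

preds = np.ones((1000, 1), dtype=int)  # placeholder: your 1000 class labels
savemat("data1prediction.mat", {"data1prediction": preds})
```

Repeat with a 2301×1 vector for data2prediction.mat.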
Good luck!!!