Download Homework 5

CIS526: Homework 5
Assigned: Oct 19, 2004
Due: in class, Tue, Oct 26, 2004

Homework Policy

All assignments are INDIVIDUAL! You may discuss the problems with your colleagues, but you must solve the homework by yourself. Please acknowledge all sources you use in the homework (papers, code, or ideas from someone else). Assignments should be submitted in class on the day they are due. No credit is given for assignments submitted at a later time, unless you have a medical problem.

Problems

Problem 1:

a) Download and install WEKA 3: Data Mining Software in Java from http://www.cs.waikato.ac.nz/ml/weka/. Learn how to use the “Explorer GUI”; it is very user friendly and should not take long. Download “Adult Database” from http://www.ics.uci.edu/~mlearn/MLSummary.html. Briefly explain the properties of the data set “pima.txt” (how much data it contains and what the attributes and the target mean). Assign each attribute to either the nominal or the numeric type.

b) Select the first 5000 data points from the data set (this will allow you to perform more experiments). Reformat the data to WEKA format. Run 5-fold cross-validation classification experiments using the following algorithms (you can leave the default parameters of each algorithm):
   a. ZeroR (trivial predictor)
   b. J48 (decision tree)
   c. NaiveBayes
   d. IBk (k-nearest neighbor)
   e. MultilayerPerceptron (neural network)
   f. SMO (support vector machine)
   g. Bagging of 30 decision trees (meta learning algorithm)
Based on the J48 tree result, discuss which attributes are important for classification and which are not. Comment on whether this agrees with your intuition. Report the accuracy of each algorithm. Rank the algorithms by their speed.

c) Try to improve the accuracy of each of the above algorithms by changing some of the default parameters. Explain your choices and present the results. Hopefully, you will be able to improve the accuracy of each algorithm other than ZeroR.
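For the "Reformat the data to WEKA format" step, WEKA's native input is the ARFF text format: an @relation line, one @attribute line per column (numeric, or a list of nominal values in braces), and an @data section with one comma-separated row per data point. The following is only a minimal sketch of such a conversion, not a prescribed solution. The function names (to_arff, quote, is_number) are hypothetical, and the sketch assumes that the raw file is comma-separated, that "?" marks a missing value, and that you supply the attribute names yourself. Each column is treated as numeric when all of its non-missing values parse as numbers, and as nominal otherwise; check the inferred types against your own attribute assignment.

```python
import csv


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def quote(value):
    """Single-quote a name or nominal value that contains ARFF special characters."""
    if value != "?" and any(ch in value for ch in " ,{}%'\""):
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return value


def to_arff(lines, names, relation="adult", max_rows=5000):
    """Convert comma-separated lines to ARFF text, keeping the first max_rows rows.

    Assumptions (not from the assignment): fields are comma-separated,
    "?" marks a missing value, and `names` lists one name per column.
    """
    rows = []
    for record in csv.reader(lines, skipinitialspace=True):
        if len(record) != len(names):
            continue  # skip blank or malformed lines
        rows.append([field.strip() for field in record])
        if len(rows) == max_rows:
            break

    out = ["@relation " + quote(relation), ""]
    for col, name in enumerate(names):
        # Missing values are ignored when inferring the attribute type.
        values = [row[col] for row in rows if row[col] != "?"]
        if values and all(is_number(v) for v in values):
            out.append("@attribute %s numeric" % quote(name))
        else:
            labels = sorted(set(values))
            out.append("@attribute %s {%s}" % (quote(name), ",".join(quote(v) for v in labels)))
    out += ["", "@data"]
    for row in rows:
        out.append(",".join(quote(v) for v in row))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Tiny made-up sample in the assumed layout; replace with the lines of the real file.
    sample = [
        "39, State-gov, 77516, <=50K",
        "50, Self-emp-not-inc, 83311, <=50K",
        "38, ?, 215646, >50K",
    ]
    print(to_arff(sample, ["age", "workclass", "fnlwgt", "income"]))
```

To convert a real file, pass an open file object as the first argument (for example, to_arff(open(path), names), where path and names are yours to fill in) and write the returned string to a file with the .arff extension, which the Explorer GUI can open directly.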
Problem 2:

Write a one-page report (11pt font, 1-inch margins) describing the motivation, methodology, and experimental results of the following paper: Opitz, D., Maclin, R.: Popular ensemble methods: an empirical study. Journal of Artificial Intelligence Research, Vol. 11 (1999), 169-198. You can download it from http://www.jair.org/abstracts/opitz99a.html. Within the one-page limit, please also discuss what you consider to be the strongest or most important parts of the paper and what its drawbacks or problems are. The paper is a very nice example of high-quality empirical work in data mining. Pay special attention to the style of presentation, since you will be expected to emulate it in your course project report.

Good Luck!!
