Download KSE525 - Data Mining Lab

Data Mining and Knowledge Discovery (KSE525)
Assignment #4 (May 3, 2012)

1. [10 points] The effectiveness of the SVM depends on the selection of kernels.
(a) What is the kernel trick?
(b) Consider the quadratic kernel K(u, v) = (u • v + 1)^2. Show that this is a kernel, i.e., K(u, v) = Φ(u) • Φ(v) for some Φ. [Hint: I did the proof for K(u, v) = (u • v)^2 in class.]

2. [5 points] What is boosting? Why does boosting improve classification accuracy?

3. [10 points] Discuss the advantages and disadvantages of the four clustering methods: k-means, EM, BIRCH, and DBSCAN. You may find it helpful to fill out the table below.

            Advantages      Disadvantages
k-means
EM
BIRCH
DBSCAN

4. [10 points] The clustering results of DBSCAN are sensitive to the parameter values. Thus, determining the proper values of the parameters Eps and MinPts is very important. Propose a heuristic method for estimating good parameter values for DBSCAN. [Hint: Refer to the original paper published at KDD96.]

5. [15 points] LIBSVM is one of the most popular tools for the SVM. Let's practice using LIBSVM. Download the Wine data set available at the URL below. Then, arbitrarily divide wine.scale into a training set and a test set of approximately the same size. Run svm-train to build a classification model using the training set, and run svm-predict to test the accuracy of the model using the test set. Identify the misclassified objects in the test set and report the accuracy of the model. You need to mention which kernel is used (the default is the Gaussian kernel).

Wine data set: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#wine
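The bookkeeping around problem 5 (splitting wine.scale in half, then comparing the svm-predict output against the true labels) could be sketched in Python roughly as follows. This is only one possible approach, not a prescribed solution: the function names split_libsvm_lines and score_predictions and the file names wine.train, wine.test, wine.model, and wine.out are hypothetical choices made for illustration. It assumes each record is one line in LIBSVM format with the class label as the first token, and that svm-predict writes one predicted label per line.

```python
import random


def split_libsvm_lines(lines, seed=0):
    """Shuffle LIBSVM-format records and split them into two halves.

    Blank lines are dropped. The seed is an illustrative default so the
    "arbitrary" split can be reproduced.
    """
    records = [ln.rstrip("\n") for ln in lines if ln.strip()]
    rng = random.Random(seed)
    rng.shuffle(records)
    half = len(records) // 2
    return records[:half], records[half:]


def score_predictions(test_lines, predicted_labels):
    """Compare true labels (first token of each test line) with predictions.

    Returns (accuracy, misclassified), where misclassified is a list of
    (index_in_test_set, true_label, predicted_label) tuples.
    """
    true_labels = [ln.split()[0] for ln in test_lines]
    predicted = [p.strip() for p in predicted_labels]
    if len(true_labels) != len(predicted):
        raise ValueError("number of predictions does not match the test set")
    if not true_labels:
        raise ValueError("test set is empty")
    # Compare numerically so that "1" and "1.0" count as the same class.
    misclassified = [
        (i, t, p)
        for i, (t, p) in enumerate(zip(true_labels, predicted))
        if float(t) != float(p)
    ]
    accuracy = 1.0 - len(misclassified) / len(true_labels)
    return accuracy, misclassified


# Possible workflow (file names are hypothetical):
#   1. Read wine.scale, call split_libsvm_lines, and write the two halves
#      to wine.train and wine.test.
#   2. svm-train wine.train wine.model
#      (with no -t option, the default RBF/Gaussian kernel is used)
#   3. svm-predict wine.test wine.model wine.out
#   4. Read wine.test and wine.out and pass their lines to score_predictions.
```

svm-predict also prints the overall accuracy itself, so the value returned by score_predictions can serve as a cross-check while the misclassified list answers the "identify the misclassified objects" part.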
