Data Mining and Knowledge Discovery (KSE525)
Assignment #4 (May 3, 2012)
1. [10 points] The effectiveness of the SVM depends on the selection of kernels.
(a) What is the kernel trick?
(b) Consider the quadratic kernel K(u, v) = (u • v + 1)^2. Show that this is a kernel, i.e., K(u, v) = Φ(u) • Φ(v) for some Φ.
[Hint: I did the proof for K(u, v) = (u • v)^2 in class.]
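The case mentioned in the hint can be sanity-checked numerically. The sketch below assumes two-dimensional inputs and uses one common choice of feature map for K(u, v) = (u • v)^2, namely Φ(u) = (u1^2, √2·u1·u2, u2^2). The function names `dot`, `quad_kernel`, `phi`, and `max_gap` are illustrative only, and a numerical check is not a substitute for the proof.

```python
import math
import random


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def quad_kernel(u, v):
    """Homogeneous quadratic kernel from the hint: K(u, v) = (u . v)^2."""
    return dot(u, v) ** 2


def phi(u):
    """Assumed explicit feature map for 2-D inputs:
    phi(u) = (u1^2, sqrt(2) * u1 * u2, u2^2)."""
    u1, u2 = u
    return (u1 * u1, math.sqrt(2) * u1 * u2, u2 * u2)


def max_gap(trials=1000, seed=0):
    """Largest |K(u, v) - phi(u) . phi(v)| over random 2-D pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        u = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        v = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        worst = max(worst, abs(quad_kernel(u, v) - dot(phi(u), phi(v))))
    return worst
```

Calling `max_gap()` should return a value at the level of floating-point rounding error, which is consistent with the identity K(u, v) = Φ(u) • Φ(v) for this kernel.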
2. [5 points] What is boosting?
Why does boosting improve classification accuracy?
3. [10 points] Discuss the advantages and disadvantages of the four clustering methods: k-means, EM, BIRCH, and DBSCAN. It is recommended that you fill out the table below.

            Advantages          Disadvantages
k-means
EM
BIRCH
DBSCAN
4. [10 points] The clustering results of DBSCAN are sensitive to the parameter values. Thus, determining the proper values of the parameters Eps and MinPts is very important. Propose a heuristic method for estimating good parameter values for DBSCAN.
[Hint: Refer to the original paper published at KDD96.]
5. [15 points] LIBSVM is one of the most popular tools for the SVM. Let's practice using LIBSVM. Download the Wine data set available at the URL below. Then, arbitrarily divide wine.scale into a training set and a test set of approximately the same size. Run svm-train to build a classification model using the training set, and run svm-predict to test the accuracy of the model using the test set. Identify the misclassified objects in the test set and report the accuracy of the model. You need to mention which kernel is used (the default is the Gaussian kernel).

Wine data set: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#wine
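
The split step in question 5 could be done in many ways; the following is a minimal sketch, assuming wine.scale is in the usual LIBSVM text format (one object per line, class label first). The output file names wine.train and wine.test, the fixed random seed, and the helper names are assumptions made for illustration, not part of the assignment.

```python
import random


def split_libsvm_lines(lines, seed=0):
    """Shuffle the non-empty lines and cut them into two halves."""
    # Normalize line endings so every row ends with exactly one newline.
    rows = [ln.rstrip("\n") + "\n" for ln in lines if ln.strip()]
    random.Random(seed).shuffle(rows)
    half = len(rows) // 2
    return rows[:half], rows[half:]


def split_file(src, train_path, test_path, seed=0):
    """Write the two halves of src to train_path and test_path."""
    with open(src) as f:
        train, test = split_libsvm_lines(f.readlines(), seed)
    with open(train_path, "w") as f:
        f.writelines(train)
    with open(test_path, "w") as f:
        f.writelines(test)
    return len(train), len(test)


def misclassified(test_lines, predicted_lines):
    """0-based positions where the predicted label differs from the true one."""
    wrong = []
    for i, (row, pred) in enumerate(zip(test_lines, predicted_lines)):
        true_label = float(row.split()[0])  # label is the first token
        if float(pred.split()[0]) != true_label:
            wrong.append(i)
    return wrong


# Example (hypothetical file names):
# split_file("wine.scale", "wine.train", "wine.test")
```

With these hypothetical file names, `svm-train wine.train` followed by `svm-predict wine.test wine.train.model wine.out` would then produce the model and the predictions; the lines of wine.test and wine.out can be passed to `misclassified` to list the objects that were predicted incorrectly.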