October 24, 2005
Solution Sketches for Quiz/Exam 1
COSC 6367 Data Mining
1 a) No, the algorithm will not always find the minimal tree that classifies all training
examples correctly, because the decision tree learning algorithm is greedy rather than
optimal (as witnessed when solving problem 5 in homework 1). Moreover, if the dataset
contains inconsistent examples, it is impossible to obtain a decision tree that classifies
all examples correctly.
1 b)
Root test: A >= 0.25 or
Root test: A >= 0.3
1 c) Some decision tree tools prefer gain ratio over information gain because information
gain is biased toward attributes with many values; gain ratio divides the gain by the split
information, penalizing splits with many branches, as the sketch below illustrates.
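To make the bias concrete, here is a minimal sketch (not from the exam; the labels and splits are made up) comparing information gain and gain ratio for an ID-like attribute, which puts every example in its own branch, against a binary attribute that splits the data perfectly:

```python
# Minimal sketch (made-up data): gain ratio penalizes many-valued splits.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(labels, groups):
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)

def gain_ratio(labels, groups):
    n = len(labels)
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups)
    return info_gain(labels, groups) / split_info

labels = ["+", "+", "+", "-", "-", "-"]
id_split = [[l] for l in labels]          # ID-like attribute: one branch per example
binary_split = [labels[:3], labels[3:]]   # binary attribute: perfect 3/3 split

print(info_gain(labels, id_split), gain_ratio(labels, id_split))          # 1.0, ~0.39
print(info_gain(labels, binary_split), gain_ratio(labels, binary_split))  # 1.0, 1.0
```

Both splits have information gain 1.0, but gain ratio strongly prefers the binary split.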
1 d) Yes, the node B will be pruned because the error rate decreases after pruning. (The
computation of the error rate is required.)
2 a)
Def 1: A hypothesis h overfits a data set D if there exists another hypothesis h' such that
h has better classification accuracy than h' on D but worse classification accuracy than
h' on the unseen data set D'.
Def 2: The accuracy of a classifier keeps increasing on the training set while it decreases
on a validation set.
Potential reasons for overfitting:
- The training data is noisy.
- There is an insufficient number of examples.
- The training examples are badly selected (not representative).
2 b) 1st way:

$$\hat{\sigma}_t^2 = \frac{\sum_{j=1}^{k} (d_j - \bar{d})^2}{k(k-1)} = \frac{0.01^2 + (-0.01)^2 + 0^2 + 0^2}{4(3)} \approx 0.0000167$$

$$\hat{\sigma}_t \approx 0.004$$

$$d_t = \bar{d} \pm t_{\alpha/2,\,k-1}\,\hat{\sigma}_t, \quad \text{where } \alpha = 0.1$$

$$d_t = 0.04 \pm 2.353(0.004)$$

Since the interval $d_t$ does not contain 0, reject the null hypothesis.

2nd way:

$$t_{k-1} = \frac{\bar{d}\sqrt{k}}{\sqrt{\sum_{j=1}^{k} (d_j - \bar{d})^2\,/\,(k-1)}} = \frac{0.04 \times 2}{0.00816} \approx 9.8$$

$$t_{\alpha/2,\,k-1} = 2.353$$

$t_{k-1}$ falls in $(t_{\alpha/2,\,k-1}, \infty)$, so reject the null hypothesis.
3 a)

|                             | Learning model | Decision boundary | Need distance function? |
|-----------------------------|----------------|-------------------|--------------------------|
| K-nearest neighbor approach | Lazy learning  | Convex polygon    | Yes                      |
| Decision tree approach      | Eager learning | Rectangle         | No                       |
3 b) SVMs select the decision boundary that maximizes the margin. SVMs employ kernel
functions because mapping the data into a higher-dimensional space can make classes
linearly separable that are not linearly separable in the original space (a sketch follows).
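As an illustration (not part of the exam solution), here is a minimal sketch of the underlying idea: 1-D data that no single threshold can separate becomes linearly separable after the feature map phi(x) = (x, x^2), which is the kind of mapping a polynomial kernel computes implicitly:

```python
# Sketch (made-up 1-D data): not separable by a threshold on x,
# but separable by a horizontal line in the mapped (x, x^2) space.
points = [(-2, "+"), (-1, "-"), (0, "-"), (1, "-"), (2, "+")]

def phi(x):
    return (x, x * x)

# In the mapped space the line x^2 = 2.5 separates the classes perfectly:
for x, label in points:
    _, x2 = phi(x)
    predicted = "+" if x2 > 2.5 else "-"
    print(x, label, predicted)
```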
3 c) Yes, a 1-NN classifier will behave differently, because the example O lies on the
decision boundary. Removing O changes the Delaunay triangulation and therefore the
resulting decision boundary (a sketch follows).
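A minimal sketch with made-up coordinates, where removing a training example O near the boundary flips the 1-NN prediction for a query point:

```python
# Sketch (made-up data): removing a boundary example changes 1-NN behavior.
def nn_predict(train, q):
    return min(train, key=lambda p: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)[2]

train = [(0.0, 0.0, "A"), (2.0, 0.0, "B"), (1.0, 0.0, "A")]  # last example is O
query = (1.3, 0.0)

print(nn_predict(train, query))      # "A": O is the nearest neighbor
print(nn_predict(train[:2], query))  # "B": without O, the boundary moves
```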
4 a) Answers should mention that it helps in "classifying and grouping data", enables
finding "interesting hypotheses" worth exploring in further studies, helps find
interesting patterns in large amounts of data, ...
4 b) Develop a general understanding and background knowledge of the problem at hand;
get an initial assessment of the difficulty of the tasks to be solved and the suitability of
potential tools for solving them; obtain quantitative data about the problem to be solved; ...
5 a) Missing values are problematic for most data mining techniques because most
learners require complete feature vectors as input; examples with missing values cannot
be used directly to learn a model.
To preprocess data with missing values for a categorical attribute, a decision tree can be
trained on the examples whose value for that attribute is known, using the remaining
attributes as features; the tree is then applied to predict (fill in) the missing values.
(See Ch. 3, Data Preprocessing, slide 10.) A sketch follows.
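A minimal sketch of this preprocessing idea, assuming scikit-learn and made-up column names:

```python
# Sketch (hypothetical columns): impute a categorical attribute with a
# decision tree trained on the examples where the attribute is known.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "age":    [25, 40, 33, 58, 47, 29],
    "income": [30, 80, 55, 90, 75, 40],
    "group":  ["low", "high", "low", "high", None, None],  # has missing values
})

known = df[df["group"].notna()]
missing = df[df["group"].isna()]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(known[["age", "income"]], known["group"])

# Fill in the missing values with the tree's predictions.
df.loc[df["group"].isna(), "group"] = tree.predict(missing[["age", "income"]])
print(df)
```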
5 c) The purpose of attribute normalization is to cope with differences in scale: it
converts all attributes onto a common scale (a sketch of two common normalizations follows).
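A minimal sketch (made-up values) of two standard normalizations:

```python
# Sketch: two common ways to bring an attribute onto a common scale.
values = [2.0, 10.0, 4.0, 8.0, 6.0]

# Min-max normalization maps values into [0, 1].
lo, hi = min(values), max(values)
min_max = [(v - lo) / (hi - lo) for v in values]

# Z-score normalization gives mean 0 and unit standard deviation.
mean = sum(values) / len(values)
std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
z_scores = [(v - mean) / std for v in values]

print(min_max)
print(z_scores)
```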
5 d) In the preprocessing step, histograms let us visualize the distribution of the values
of a particular attribute (or set of attributes), giving an approximate view of the
attribute's density function. Histograms also support data reduction, since the bin
counts form a compact summary of the attribute (see the sketch below).
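A minimal sketch (made-up values) of an equal-width histogram, whose bin counts both approximate the density and serve as a reduced representation of the attribute:

```python
# Sketch: equal-width histogram as density estimate and data reduction.
from collections import Counter

values = [1.2, 1.9, 2.3, 2.8, 3.1, 3.3, 3.4, 4.0, 4.7, 8.9]
n_bins, lo, hi = 4, min(values), max(values)
width = (hi - lo) / n_bins

# Map each value to its bin index (clamping the maximum into the last bin).
bins = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
for b in range(n_bins):
    left = lo + b * width
    print(f"[{left:.2f}, {left + width:.2f}): {bins[b]}")
```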
5 e) 1-D, 2-A, 3-C, 4-E, 5-B