Intelligent Data Analysis and Probabilistic Inference
Data Mining Tutorial 3: Clustering and Association Rules
1.
i. Explain the operation of the k-means clustering algorithm using pseudo code.
ii. Given the following eight points, and assuming initial cluster centroids given by A, B, C,
and that a Euclidean distance function is used for measuring distance between points, use
k-means to show only the three clusters and calculate their new centroids after the second
round of execution.

ID   X    Y
A    2    10
B    2    5
C    8    4
D    5    8
E    7    5
F    6    4
G    1    2
H    4    9
2.
i. Explain the meaning of support and confidence in the context of association rule
discovery algorithms and explain how the a priori heuristic can be used to improve the
efficiency of such algorithms.
ii. Given the transactions described below, find all rules between single items that have
support >= 60%. For each rule report both support and confidence.
1: (Beer)
2: (Cola, Beer)
3: (Cola, Beer)
4: (Nuts, Beer)
5: (Nuts, Cola, Beer)
6: (Nuts, Cola, Beer)
7: (Crisps, Nuts, Cola)
8: (Crisps, Nuts, Cola, Beer)
9: (Crisps, Nuts, Cola, Beer)
10: (Crisps, Nuts, Cola, Beer)
[email protected], [email protected]
16th Dec 2003
3.
a. Explain how hierarchical clustering algorithms work. Make sure your answer describes what is
meant by a linkage method and how it is used.
b. Explain the advantages and disadvantages of hierarchical clustering compared to K-means
clustering.
4.
The following table shows the distance matrix between five genes:

      G1   G2   G3   G4   G5
G1    0
G2    9    0
G3    3    7    0
G4    6    5    9    0
G5    11   10   2    8    0

i. Based on a complete linkage method, show the distance matrix between the first formed
cluster and the other data points.
ii. Draw a dendrogram showing the full hierarchical clustering tree for the five points based on
complete linkage.
iii. Draw a dendrogram showing the full hierarchical clustering tree for the five points based on
single linkage.
Data Mining Tutorial 3: Answers
1.
Clusters after 1st iteration
Cluster1: A (2,10), D (5,8), H (4,9)
Cluster2: B (2,5), G (1,2)
Cluster3: C (8,4), E (7,5), F (6,4)
Centroids after 1st iteration
Cluster1: centroid: (3.67, 9)
Cluster2: centroid: (1.5, 3.5)
Cluster3: centroid: (7, 4.33)
Clusters after 2nd iteration
(no change)
Cluster1: A (2,10), D (5,8), H (4,9)
Cluster2: B (2,5), G (1,2)
Cluster3: C (8,4), E (7,5), F (6,4)
Centroids after 2nd iteration
(no change)
Cluster1: centroid: (3.67, 9)
Cluster2: centroid: (1.5, 3.5)
Cluster3: centroid: (7, 4.33)
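
The two rounds above can be reproduced with a short script. This is only a minimal sketch of one k-means round (assignment, then centroid update), not a general implementation: the function name `kmeans_round` is made up for illustration, and the code assumes that no cluster ever ends up empty, which holds for this data set.

```python
import math

# The eight points from question 1.
points = {
    "A": (2, 10), "B": (2, 5), "C": (8, 4), "D": (5, 8),
    "E": (7, 5), "F": (6, 4), "G": (1, 2), "H": (4, 9),
}

def kmeans_round(points, centroids):
    """One assignment step followed by one centroid update.

    Assumes every cluster receives at least one point.
    """
    clusters = [[] for _ in centroids]
    for name, p in points.items():
        # Assign the point to the centroid with the smallest Euclidean distance.
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(name)
    new_centroids = []
    for members in clusters:
        xs = [points[m][0] for m in members]
        ys = [points[m][1] for m in members]
        new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return clusters, new_centroids

# Initial centroids are the points A, B and C, as stated in the question.
centroids = [points["A"], points["B"], points["C"]]
for round_no in (1, 2):
    clusters, centroids = kmeans_round(points, centroids)
    rounded = [(round(x, 2), round(y, 2)) for x, y in centroids]
    print(f"round {round_no}: {clusters} {rounded}")
```

Both rounds should print the same clusters and centroids, which is the "no change" noted in the answer.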
2.
Initial Supports
Beer: Support=9/10
Cola: Support=8/10
Nuts: Support=7/10
Crisps: Support=4/10 (Drop Crisps)
Beer, Cola: Support=7/10
Beer, Nuts: Support=6/10
Cola, Nuts: Support=6/10
Beer->Cola (Support=70%, Confidence=7/9=77.8%)
Cola->Beer (Support=70%, Confidence=7/8=87.5%)
Beer->Nuts (Support=60%, Confidence=6/9=66.7%)
Nuts->Beer (Support=60%, Confidence=6/7=85.7%)
Cola->Nuts (Support=60%, Confidence=6/8=75%)
Nuts->Cola (Support=60%, Confidence=6/7=85.7%)
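
The supports and confidences above can be checked mechanically. The sketch below is an illustration only: it takes support of an itemset as the fraction of transactions containing it, and confidence of X->Y as support(X, Y) / support(X). The names `support`, `frequent_items` and `rules` are chosen here for the example. Pairs are built only from items that are frequent on their own, which is the a priori pruning step that drops Crisps.

```python
from itertools import combinations

# The ten transactions from question 2.
transactions = [
    {"Beer"},
    {"Cola", "Beer"},
    {"Cola", "Beer"},
    {"Nuts", "Beer"},
    {"Nuts", "Cola", "Beer"},
    {"Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola"},
    {"Crisps", "Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola", "Beer"},
]
MIN_SUPPORT = 0.6

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# A priori pruning: an infrequent item cannot be part of a frequent pair.
items = sorted(set().union(*transactions))
frequent_items = [i for i in items if support({i}) >= MIN_SUPPORT]

rules = {}  # (lhs, rhs) -> (support, confidence)
for a, b in combinations(frequent_items, 2):
    pair_support = support({a, b})
    if pair_support >= MIN_SUPPORT:
        rules[(a, b)] = (pair_support, pair_support / support({a}))
        rules[(b, a)] = (pair_support, pair_support / support({b}))

for (lhs, rhs), (s, c) in rules.items():
    print(f"{lhs}->{rhs}: support={s:.0%}, confidence={c:.1%}")
```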
4. The first cluster will be formed from G3 and G5 since they have the minimum distance.

      G35  G1   G2   G4
G35   0
G1    11   0
G2    10   9    0
G4    9    6    5    0

[Figures: dendrograms for single linkage and complete linkage]
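
The merge order and merge heights that each dendrogram encodes can be worked out from the distance matrix in question 4. The following is a naive sketch of agglomerative clustering written for this example; the function name `agglomerate` and its data layout are assumptions made here. Passing `max` gives complete linkage and passing `min` gives single linkage.

```python
from itertools import combinations

# Pairwise gene distances from question 4.
raw = {
    ("G1", "G2"): 9, ("G1", "G3"): 3, ("G1", "G4"): 6, ("G1", "G5"): 11,
    ("G2", "G3"): 7, ("G2", "G4"): 5, ("G2", "G5"): 10,
    ("G3", "G4"): 9, ("G3", "G5"): 2, ("G4", "G5"): 8,
}
dist = {frozenset(pair): d for pair, d in raw.items()}

def agglomerate(dist, linkage):
    """Return the merges as (cluster_a, cluster_b, height) tuples.

    linkage is max for complete linkage or min for single linkage.
    """
    clusters = [(p,) for p in sorted(set().union(*dist))]
    merges = []
    while len(clusters) > 1:
        def cluster_distance(pair):
            a, b = pair
            # Linkage over all point-to-point distances between the clusters.
            return linkage(dist[frozenset((x, y))] for x in a for y in b)
        # Merge the two clusters that are closest under the chosen linkage.
        a, b = min(combinations(clusters, 2), key=cluster_distance)
        merges.append((a, b, cluster_distance((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

complete = agglomerate(dist, max)
single = agglomerate(dist, min)
for name, merges in (("complete", complete), ("single", single)):
    print(name)
    for a, b, height in merges:
        print(f"  merge {a} + {b} at distance {height}")
```

Under both linkages the first merge is G3 with G5 at distance 2. After that the trees differ: with single linkage G1 joins G3 and G5 next, while with complete linkage G2 and G4 merge next.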