					Clustering II
Finite Mixtures
• Model data using a mixture of distributions
– Each distribution represents one cluster
– Each distribution gives probabilities of attribute values in that cluster
• Finite mixtures: finite number of clusters
• Individual distributions are usually normal
• Combine distributions using cluster weights
• Each normal distribution can be described in terms of μ (mean) and σ (standard deviation)
• For a single attribute with two clusters
– μA, σA for cluster A and μB, σB for cluster B
– The attribute values are obtained by combining values from cluster
A with a probability of PA and from cluster B with a probability PB
– Five parameters μA, σA, μB, σB and PA (because PA+PB=1) describe the
attribute value distribution
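Putting these together, the attribute's overall density is a weighted sum of the two normal densities (a standard mixture formula, implied but not written out on the slide):

f(x) = PA · N(x; μA, σA) + PB · N(x; μB, σB),   where PA + PB = 1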
EM Algorithm
• EM = Expectation – Maximization
– Generalizes k-means to a probabilistic setting
• Input: Collection of instances and number of clusters,
k
• Output: probabilities with which each instance
belongs to each of the k clusters
• Method:
– Start by guessing values for all the parameters of the k
clusters (similar to guessing centroids in k-means)
– Repeat
• E ‘Expectation’ Step: Calculate cluster probability for each
instance
• M ‘Maximization’ Step: Estimate distribution parameters from
cluster probabilities
• Store cluster probabilities as instance weights
• Estimate parameters from weighted instances
– Until we obtain distribution parameters that predict the input data well (i.e. the estimates converge)
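As a concrete illustration, here is a minimal Python sketch of this loop for the single-attribute, two-cluster case from the previous slide. It is a sketch under stated assumptions, not the method of any particular tool; names such as em_two_gaussians, mu_a and p_a are illustrative.

import math
import random

def normal_pdf(x, mu, sigma):
    # Density of N(mu, sigma) at x
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(data, iterations=50):
    # Start by guessing parameter values (analogous to guessing centroids in k-means)
    mu_a, mu_b = random.sample(data, 2)
    sigma_a = sigma_b = max((max(data) - min(data)) / 4.0, 1e-3)
    p_a = 0.5                                   # p_b = 1 - p_a
    for _ in range(iterations):
        # E step: cluster-A probability (weight) for each instance
        w = []
        for x in data:
            da = p_a * normal_pdf(x, mu_a, sigma_a)
            db = (1 - p_a) * normal_pdf(x, mu_b, sigma_b)
            w.append(da / (da + db))
        # M step: re-estimate mu, sigma and p from the weighted instances
        wa, wb = sum(w), len(data) - sum(w)
        mu_a = sum(wi * x for wi, x in zip(w, data)) / wa
        mu_b = sum((1 - wi) * x for wi, x in zip(w, data)) / wb
        sigma_a = max(math.sqrt(sum(wi * (x - mu_a) ** 2 for wi, x in zip(w, data)) / wa), 1e-3)
        sigma_b = max(math.sqrt(sum((1 - wi) * (x - mu_b) ** 2 for wi, x in zip(w, data)) / wb), 1e-3)
        p_a = wa / len(data)
    return mu_a, sigma_a, mu_b, sigma_b, p_a

data = [1.2, 0.8, 1.1, 5.0, 5.3, 4.8]           # made-up attribute values
print(em_two_gaussians(data))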
Incremental Clustering
(Cobweb/Classit)
• Input: Collection of instances
• Output: A hierarchy of clusters
• Method:
– Start with an empty root node of the tree
– Add instances one by one
– If any of the existing leaves is a good ‘host’ for the incoming instance, form a cluster with it
• A good host has high category utility (next slide)
– If required, restructure the tree
• Cobweb - for nominal attributes
• Classit – for numerical attributes
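A rough, simplified Python sketch of the host-selection step, ignoring tree restructuring: at a single level it compares adding the incoming instance to each existing leaf against starting a new leaf. The name best_host is illustrative, and a category utility helper such as the one sketched under “Category Utility” below is assumed to be passed in.

def best_host(leaves, instance, category_utility):
    # leaves: current leaf clusters, each a list of instances.
    # Baseline: category utility when the instance starts a new leaf of its own.
    best, best_cu = None, category_utility(leaves + [[instance]])
    for i, leaf in enumerate(leaves):
        # Partition obtained by adding the instance to this particular leaf.
        candidate = [l + [instance] if j == i else l for j, l in enumerate(leaves)]
        cu = category_utility(candidate)
        if cu > best_cu:
            best, best_cu = leaf, cu
    return best   # None means no existing leaf is a good host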
Category Utility
• Category Utility:
CU(C_1, C_2, …, C_k) = (1/k) ∑_l P[C_l] ∑_i ∑_j ( P[a_i = v_ij | C_l]² − P[a_i = v_ij]² )
• Computes the advantage in predicting the
values of attributes of instances in a cluster
– If knowing the cluster information of an instance
does not help in predicting the values of its
attributes, then the cluster isn’t worth forming
• The inner term, the difference of squared probabilities (P[a_i = v_ij | C_l]² − P[a_i = v_ij]²), measures this gain in predictability
• The denominator k averages this gain over the clusters, giving the information gained per cluster
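A minimal Python sketch of this formula for nominal attributes; the data representation (instances as dicts mapping attribute name to value) is an assumption made for illustration.

from collections import Counter

def category_utility(clusters):
    # clusters: a list of clusters, each a list of instances,
    # each instance a dict mapping attribute name -> nominal value.
    instances = [x for c in clusters for x in c]
    n = len(instances)
    attributes = list(instances[0].keys())
    # Sum over i, j of P[a_i = v_ij]^2 for the whole data set
    overall = {a: Counter(x[a] for x in instances) for a in attributes}
    base = sum((cnt / n) ** 2 for a in attributes for cnt in overall[a].values())
    cu = 0.0
    for c in clusters:
        p_c = len(c) / n
        within = {a: Counter(x[a] for x in c) for a in attributes}
        cond = sum((cnt / len(c)) ** 2 for a in attributes for cnt in within[a].values())
        cu += p_c * (cond - base)   # P[C_l] * sum_i sum_j (conditional^2 - prior^2)
    return cu / len(clusters)       # divide by k

e = {"Outlook": "rainy", "Windy": "false"}
f = {"Outlook": "rainy", "Windy": "true"}
print(category_utility([[e], [f]]))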
Weather Data with ID
ID  Outlook   Temperature  Humidity  Windy  Play
a   sunny     hot          high      false  no
b   sunny     hot          high      true   no
c   overcast  hot          high      false  yes
d   rainy     mild         high      false  yes
e   rainy     cool         normal    false  yes
f   rainy     cool         normal    true   no
g   overcast  cool         normal    true   yes
h   sunny     mild         high      false  no
i   sunny     cool         normal    false  yes
j   rainy     mild         normal    false  yes
k   sunny     mild         normal    true   yes
l   overcast  mild         high      true   yes
m   overcast  hot          normal    false  yes
n   rainy     mild         high      true   no
This is artificial data, so natural clusters cannot be found (in particular, the instances do not split into one cluster of yeses and one of nos).
Trace of Cobweb
[Tree snapshots 1–3 from the slide]
Steps 1–2: the instances a:no, b:no, c:yes, d:yes and e:yes are added one by one; no existing leaf is a good host for any of the first five instances, so each becomes its own leaf under the root.
Step 3: when f:no arrives, e is the best host; the CU of an e&f cluster is high because e and f are similar, so e and f are merged into a cluster.
Trace of Cobweb (Contd)
[Tree snapshots 4–5 from the slide]
Step 4: for g:yes, the e&f cluster is the best host at the root; within e&f there is no good host, so no new sub-cluster is formed and g is simply added to the e&f cluster (f and g are similar).
Step 5: for h:no, a is the best host at the root and d is the runner-up; before h is inserted, the runner-up is evaluated, and since the CU of a&d is high, d is merged with a to form a new cluster; within a&d there is no good host, so h is added to the a&d cluster.
Trace of Cobweb (Contd)
[Final tree from the slide, containing all 14 instances a–n as leaves under the clusters built above]
For large data sets, growing the tree down to individual instances can lead to overfitting. A similarity threshold called the cutoff is used to suppress growth.
Hierarchical Agglomerative
Clustering
• Input: Collection of instances
• Output: A hierarchy of clusters
• Method:
– Start with individual instances as clusters
– Repeat
• Merge the ‘closest’ two clusters
– Until only one cluster remains
• Ward’s method: Closeness or proximity between two
clusters is defined as the increase in squared error
that results when two clusters are merged
• Squared error measure used for only the local
decision of merging clusters
– No global optimization
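A minimal sketch of hierarchical agglomerative clustering with Ward's method, assuming SciPy is available; the feature vectors and the two-cluster cut are made up for illustration.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([[1.0, 2.0], [1.2, 1.9], [8.0, 8.1], [7.8, 8.3], [4.0, 4.2]])

# Each merge picks the pair of clusters whose fusion gives the smallest
# increase in total within-cluster squared error (Ward's criterion).
Z = linkage(data, method="ward")

# Cut the tree into two flat clusters; every merge was a purely local
# decision, with no global optimization.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)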
HCE
• A visual knowledge discovery tool for analysing and
understanding multi-dimensional (> 3D) data
• Offers multiple views of
– input data and clustered input data
– where views are coordinated
• Many other similar tools do a patchwork of statistics
and graphics
• HCE follows two fundamental statistical principles of
exploratory data analysis
– To examine each dimension first and then find relationships
among dimensions
– To try graphical displays first and then find numerical
summaries
GRID Principles
• GRID – graphics, ranking and interaction for
discovery
• Two principles
– Study 1D, study 2D and find features
– Ranking guides insight, statistics confirm
• These principles help users organize their knowledge
discovery process
• Because of GRID, HCE is more than R + Visualization
• GRID can be used to derive some scripts to organize
exploratory data analysis using R (or some such
statistics package)
Rank-by-Feature Framework
• A user interface framework based on the
GRID Principles
• The framework
– Uses interactive information visualization
techniques combined with
– statistical methods and data mining algorithms
– Enables users to examine input data in an orderly way
• HCE implements rank-by-feature framework
– This means
• HCE uses existing statistical and data mining methods to
analyse input data and
• Communicate those results using interactive information
visualization techniques
Multiple Views in HCE
• Dendrogram
• Colour Mosaic
• 1D histograms
• 2D scatterplots
• And more
Dendrogram Display
• Results of HAC are shown
visually using a dendrogram
• A dendrogram is a tree
– with data items at the
terminal (leaf) nodes
– Distance from the root node
represents similarity among
leaf nodes
• Two visual controls
– minimum similarity bar allows users to adjust the number of clusters
– Detail cut-off bar allows users to reduce clutter
[Illustration: example dendrogram with leaves A, B, C, D]
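For illustration, a dendrogram with a horizontal cut line can be drawn with SciPy and matplotlib; this is only a rough analogue of HCE's minimum similarity bar, and the data and threshold below are made up.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

data = np.array([[1.0, 2.0], [1.2, 1.9], [8.0, 8.1], [7.8, 8.3], [4.0, 4.2]])
Z = linkage(data, method="ward")

# Leaf labels at the terminal nodes; distance from the root reflects similarity.
dendrogram(Z, labels=["A", "B", "C", "D", "E"])
plt.axhline(3.0, linestyle="--")   # rough analogue of the minimum similarity bar
plt.show()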
Colour Mosaic
• Input data is shown using this view
• Is a colour coded visual display of tabular data
• Each cell in the table is painted in a colour that reflects the cell's value
• Two variations
– The layout of the mosaic is similar to the original table
– A transpose of the original layout
• HCE uses the transposed layout because data sets usually have more rows than columns
• A colour mapping control
[Illustration: a small table (cells 1–4) shown as a colour mosaic in the original layout and in the transposed layout]
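A rough analogue of the transposed colour mosaic can be sketched with matplotlib's imshow; the table below is random, made-up data, not HCE itself.

import numpy as np
import matplotlib.pyplot as plt

data = np.random.rand(100, 6)            # made-up table: 100 rows, 6 columns

# Transposed layout: attributes become rows, which suits tables that have
# many more rows than columns.
plt.imshow(data.T, aspect="auto", cmap="viridis")
plt.colorbar(label="cell value")         # simple stand-in for a colour mapping control
plt.show()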
1D Histogram Ordering
• This data view is part of the rank-by-feature
framework
• Data belonging to one column (variable) is
displayed as a histogram + box plot
– Histogram shows the scale and skewness
– Box plot shows the data distribution, center and
spread
• For the entire data set many such views are
possible
• By studying individual variables in detail
users can select the variables for other
visualizations
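As an illustration (not HCE itself), one such per-variable view and a simple ranking of variables can be sketched with pandas, SciPy and matplotlib; the data and the choice of skewness as the ranking criterion are assumptions.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import skew

df = pd.DataFrame(np.random.randn(200, 4), columns=["v1", "v2", "v3", "v4"])

# One view per variable: histogram (scale, skewness) above a box plot
# (centre and spread).
fig, (top, bottom) = plt.subplots(2, 1, sharex=True)
top.hist(df["v1"], bins=20)
bottom.boxplot(df["v1"], vert=False)
plt.show()

# Rank variables by a user-selected criterion, here absolute skewness.
ranking = df.apply(lambda col: abs(skew(col))).sort_values(ascending=False)
print(ranking)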
2D Scatter Plot Ordering
• This data view is again part of the rank-by-feature
framework
• Three categories of 2D presentations are possible
– Axes of the plot obtained from Principal Component Analysis
• Linear or non-linear combinations of original variables
– Axes of the plot obtained directly from the original variables
– Parallel coordinates
• HCE uses the second option of plotting pairs of
variables from the original variables
• Both 1D and 2D plots can be sorted according to user-selected criteria such as the number of outliers
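As an illustration (again not HCE itself), ranking scatter plots of pairs of original variables by a criterion such as absolute correlation can be sketched as follows; the DataFrame and the correlation criterion are assumptions.

from itertools import combinations
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(200, 4), columns=["v1", "v2", "v3", "v4"])

pairs = []
for x, y in combinations(df.columns, 2):
    r = df[x].corr(df[y])                 # Pearson correlation for this pair of original variables
    pairs.append((abs(r), x, y))

# Highest-ranked pair first; its scatter plot would be the first to inspect.
for score, x, y in sorted(pairs, reverse=True):
    print(f"{x} vs {y}: |r| = {score:.2f}")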