國立雲林科技大學
National Yunlin University of Science and Technology

Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Advisor: Dr. Hsu
Graduate student: Kuo-min Wang
Authors: Pengfei Xu, Chip-Hong Chang, Senior Member, IEEE, and Andrzej Paplinski, Member, IEEE
2005, Expert Systems with Applications
Outline

- Motivation
- Objective
- Introduction
- Structure of SOTT and Related Terminology
- Training Algorithm of SOTT
- Simulation Results
- Conclusions
- Personal Opinion
Motivation

The SOM has two important operations:
- Vector quantization
- Topology-preserving mapping

Disadvantages in its application to clustering problems:
1. The clustering result is sensitive to the number of partitions.
2. The clustering result of one partition provides knowledge at only one similarity level.
3. It favors "equally-sized compact spheroidal clusters".
4. Its computational complexity is high.
Objective

Propose an online self-organizing topological tree (SOTT) with faster learning:
- Computational complexity is O(log N) rather than O(N) as for the basic SOM.
- A hybrid clustering algorithm that fully exploits the online learning and multi-resolution characteristics of the SOTT is devised.
- A new linkage metric is proposed that can be updated online to accelerate the time-consuming agglomerative hierarchical clustering stage.
Introduction

The applications of the SOM:
- Obtaining an optimal set of codebook vectors that maximizes the rate-distortion performance.
- Clustering, which aims to segregate a chaotic mixture of patterns for the purpose of knowledge discovery and analysis.

Growing SOM [1]:
- Overcomes the first problem by growing the SOM to a suitable number of partitions through the insertion of new neurons.
Introduction (cont.)

Tree-Structured SOM [20]:
- Provides a hierarchical structure that reduces the computational complexity and alleviates the first two problems simultaneously.

GHSOM [26]:
- Proposed to grow a hierarchical SOM to solve the third problem.
Introduction (cont.)

Vector Quantization:
- Proposed by Y. Linde, A. Buzo, and R. M. Gray in 1980.
- The image is partitioned into a set of blocks of size n × n.
- Each block is then encoded using a pre-designed codebook, as in the sketch below.
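To make the encoding step concrete, here is a minimal vector-quantization sketch in Python; the random codebook and the 4 × 4 block size are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal VQ sketch: encode 4x4 image blocks with a pre-designed codebook
# by nearest-neighbour search. The random codebook is a stand-in for one
# trained with GLA or the SOTT.
rng = np.random.default_rng(0)
n = 4                                                  # block size (n x n)
codebook = rng.integers(0, 256, size=(64, n * n)).astype(float)

image = rng.integers(0, 256, size=(16, 16)).astype(float)
blocks = [image[r:r + n, c:c + n].ravel()
          for r in range(0, 16, n) for c in range(0, 16, n)]

# Each block is replaced by the index of its closest codeword.
indices = [int(np.argmin(np.linalg.norm(codebook - b, axis=1))) for b in blocks]
print(indices)
```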
Introduction (cont.)

Generalized Lloyd Algorithm (GLA) [23]:
- Uses clustering over a set of training block vectors to learn a codebook that can reconstruct the original image blocks.

Tree search vector quantizer (TSVQ) [4]:
- Speeds up the search for the nearest codeword.
- Requires more storage space.
- The image quality obtained with a tree-structured codebook is worse than that of a conventional VQ codebook.

Tree-structured SOM (TS-SOM) [21]:
- Organized layer by layer.
- All training data are fed into the system repeatedly at every layer, which taxes system resources heavily for large databases.
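The sketch below illustrates the tree-search idea behind TSVQ: the encoder descends a binary tree of codewords, so the search cost grows with the tree depth (O(log N)) rather than the codebook size, at the price of a greedy descent that may miss the true nearest codeword. The random codewords and the implicit-heap layout are assumptions for illustration.

```python
import numpy as np

# Tree-search VQ sketch: compare the input only to the two children at
# each level. Node k's children are 2k+1 and 2k+2 (implicit binary tree).
rng = np.random.default_rng(1)
depth, dim = 6, 16
tree = rng.normal(size=(2 ** (depth + 1) - 1, dim))   # illustrative codewords

def tsvq_encode(x, tree, depth):
    node = 0
    for _ in range(depth):
        left, right = 2 * node + 1, 2 * node + 2
        node = left if (np.linalg.norm(x - tree[left])
                        <= np.linalg.norm(x - tree[right])) else right
    return node                       # index of the chosen leaf codeword

x = rng.normal(size=dim)
print(tsvq_encode(x, tree, depth))
```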
Introduction (cont.)

We propose a new multi-resolution self-organizing topological tree (SOTT) to accelerate the search procedure. Two issues have to be addressed:
- The greedy tree search is globally suboptimal and can fail to find the real BMU.
  - A multi-path search is used to overcome this.
- Two kinds of neighborhood relationship co-exist in the network and must be maintained:
  - the inter-layer parent-child relationship, and
  - the intra-layer sibling relationship.
  - The winning path is used to overcome this.
Introduction (cont.)

- In the hybrid clustering scheme, a low-complexity partition clustering algorithm is first applied to reduce the large amount of data before the computationally expensive AHC stage (see the sketch below).
- A linkage metric is a proximity measure used to merge subsets rather than individual points in AHC.

[Diagram: the SOTT partition stage feeds the AHC algorithm]
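A minimal sketch of the two-stage idea, assuming the atomic clusters come out of the partition stage; plain single linkage stands in here for the paper's online-updatable linkage metric.

```python
import numpy as np
from itertools import combinations

# Hybrid-clustering sketch: atomic clusters (e.g. SOTT leaves) are merged
# agglomeratively, always joining the closest pair of clusters.
def single_linkage(A, B):
    return min(np.linalg.norm(a - b) for a in A for b in B)

def ahc(clusters, target):
    clusters = [list(c) for c in clusters]
    while len(clusters) > target:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: single_linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters.pop(j)     # merge cluster j into cluster i
    return clusters

atoms = [[np.array([0.0, 0.0])], [np.array([0.1, 0.0])],
         [np.array([5.0, 5.0])], [np.array([5.1, 5.0])]]
print(len(ahc(atoms, 2)))                  # 2 well-separated clusters remain
```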
Structure of SOTT and Related Terminology

- A static SOTT can be viewed as a multi-layer SOM with fixed depth and breadth, realizing a mapping $\Phi : \mathbb{R}^n \to W$.
- The input vector is $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ and the weight set is $W = \{ w_{i,j} \in \mathbb{R}^n \mid i = 1, 2, \ldots, L \text{ and } j = 1, 2, \ldots, N_i \}$, where $L$ is the number of layers and $N_i$ is the number of neurons at the $i$th layer.
- The $i$th layer has $N_i = B^{i-1} N_0$ neurons, where $B$ is the branching factor of the tree.
- Two kinds of relationship exist:
  - the intra-layer neighborhood, and
  - the inter-layer neighborhood.
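A minimal data-structure sketch of the static SOTT, assuming a fixed branching factor B and geometrically growing layer sizes as above; all names and sizes are illustrative, not the paper's implementation.

```python
import numpy as np

# Static SOTT skeleton: L layers, the ith layer (0-based) holding
# N0 * B**i weight vectors in R^n.
class SOTT:
    def __init__(self, n=16, L=3, B=4, N0=4, seed=0):
        rng = np.random.default_rng(seed)
        self.B = B
        # W[i][j] is the weight vector of neuron j on layer i.
        self.W = [rng.normal(size=(N0 * B ** i, n)) for i in range(L)]

    def parent(self, i, j):
        # Inter-layer relation: neuron j of layer i descends from j // B.
        return j // self.B

    def children(self, i, j):
        # The B children of neuron j on layer i live on layer i + 1.
        return range(j * self.B, (j + 1) * self.B)

sott = SOTT()
print([w.shape for w in sott.W])       # (4, 16), (16, 16), (64, 16)
```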
Structure of SOTT and Related Terminology (cont.)

- $G_i$ is the fully connected graph formed by the neurons and their interconnections at the $i$th layer.
- A neuron $u$ is said to be in the $k$-distance neighborhood of a neuron $v$ if there is a connected path from $u$ to $v$ and $\|u - v\| \le k$.
- $u$ is said to be a child of $v$ iff $v \in G_i$ and $u \in G_{i+1} \cap H_v$.
- The neurons of $G_{i+1} \cap H_v$ have the same parent neuron $v$ and are called siblings.
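The two relations can be expressed as simple predicates, assuming the fixed fan-out indexing from the SOTT sketch above and, as a simplification, a 1-D chain per layer so that the path length between neurons u and v is |u − v| (the paper's layers are general graphs).

```python
# Sibling and k-distance-neighbourhood predicates (illustrative assumptions).
B = 4                                  # branching factor

def parent(j):
    return j // B                      # inter-layer parent of neuron j

def are_siblings(u, v):
    return u != v and parent(u) == parent(v)

def in_k_neighbourhood(u, v, k):
    return abs(u - v) <= k             # ||u - v|| <= k on the 1-D chain

print(are_siblings(4, 5), in_k_neighbourhood(4, 6, 2))   # True True
```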
Training Algorithm of SOTT

The neuron weights are initialized uniformly over the grey-level range [0, 255]:
$$w_{ij} = \left[ (j - 1/2) \cdot 255 / N_i, \ldots, (j - 1/2) \cdot 255 / N_i \right]^T, \quad i = 1, 2, \ldots, L \text{ and } j = 1, 2, \ldots, N_i$$
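A direct sketch of this initialization; the layer sizes and vector dimension are illustrative.

```python
import numpy as np

# Neuron j on layer i starts at the constant vector
# ((j - 1/2) * 255 / N_i, ..., (j - 1/2) * 255 / N_i), spreading the N_i
# neurons uniformly along the grey-level diagonal of [0, 255]^n.
def init_weights(L, layer_sizes, n):
    W = []
    for i in range(L):
        Ni = layer_sizes[i]
        layer = np.array([[(j - 0.5) * 255.0 / Ni] * n
                          for j in range(1, Ni + 1)])
        W.append(layer)
    return W

W = init_weights(L=2, layer_sizes=[4, 16], n=3)
print(W[0])   # rows at 31.875, 95.625, 159.375, 223.125 across all dims
```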
Butterfly Permutation for Input Randomization

- Online learning causes the learning performance to be order-dependent when the training set contains a high degree of redundant information.
- Pei and Lo [25] used a block-based butterfly jumping sequence to subsample the pixels from each block, forming different training sweeps.
Butterfly Permutation for Input Randomization (cont.)

- A global butterfly permutation sequence is used to present the spatially correlated input data from a multidimensional coordinate system.
- The aim is to let the neurons learn the characteristics of the training source as early as possible, preventing the performance degradation caused by order-dependent learning.
- The butterfly permutation is defined by a mapping $\Psi : Z_I \to Z_J^n$ that takes an input order number $r \in Z_I$ to an $n$-dimensional coordinate $(x_1, x_2, \ldots, x_n) \in Z_J^n$, where $Z_J^n$ is a finite integer space bounded by $[0, 2^J - 1]^n$.
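The slide gives only the signature of the mapping, so the sketch below uses a bit-reversal realization as an assumption about how such a jumping sequence can be produced; it is not the paper's exact definition.

```python
# Illustrative butterfly-style permutation: map an order number r to
# coordinates in [0, 2**J - 1]^n so that consecutive r values jump across
# the coordinate space instead of scanning it in raster order.
def bit_reverse(r, bits):
    out = 0
    for _ in range(bits):
        out = (out << 1) | (r & 1)
        r >>= 1
    return out

def butterfly(r, n=2, J=3):
    # Split the bits of r across n coordinates, then bit-reverse each part.
    coords = []
    for d in range(n):
        part = (r >> (d * J)) & ((1 << J) - 1)
        coords.append(bit_reverse(part, J))
    return tuple(coords)

print([butterfly(r) for r in range(4)])   # (0,0), (4,0), (2,0), (6,0)
```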
Searching for the Winning Path

- The updating of the winning neuron and its neighborhood continues until a winning path has been identified for each input.
- To trace the winning path, we need to search for a single winning leaf.
- Two key parameters, λ and κ, are used to bias the competitiveness of some layers and emulate the positive effect of a multi-path search.
- The idea of the algorithm:
  1) Find the winning child neuron progressively on each layer, until a winning child at the leaf layer is found.
  2) The path to the win_leaf is then set as the winning path.
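The plain greedy descent looks as follows, reusing the fixed fan-out layout from the SOTT sketch earlier. The biasing with λ and κ and the multi-path variant are omitted, so this is a simplified approximation of the paper's search, not the exact algorithm.

```python
import numpy as np

# Greedy winning-path search sketch: pick the best-matching neuron on the
# root layer, then the best-matching child on each layer below, down to
# a winning leaf.
rng = np.random.default_rng(2)
n, B = 16, 4
W = [rng.normal(size=(4 * B ** i, n)) for i in range(3)]   # layer weights

def winning_path(x, W, B):
    path = [int(np.argmin(np.linalg.norm(W[0] - x, axis=1)))]
    for i in range(len(W) - 1):
        kids = range(path[-1] * B, (path[-1] + 1) * B)     # children indices
        d = [np.linalg.norm(W[i + 1][k] - x) for k in kids]
        path.append(path[-1] * B + int(np.argmin(d)))
    return path        # one winning neuron per layer; the last is win_leaf

print(winning_path(np.zeros(n), W, B))
```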
Searching for the Winning Path (cont.)

[Figure: illustration of the winning-path search; not captured in the transcript]
Updating of Winning Path Neurons and Their Neighborhoods

- The neighborhood width $\sigma(m)$ is a monotonically decreasing gain function of the sweep time $m$; the neighborhood taper has the form $\sigma(m) = \sigma(0)\,k^{m}$ with $0 < k < 1$.
Updating of Winning Path Neurons and Their Neighborhoods (cont.)

Maintaining both the intra-layer relationship and the inter-layer relationship correctly is important, so the following updating rules are imposed:
1. The initial neighborhood widths are proportional to the layer sizes: $\sigma_i(0) / N_i = \sigma_{i+1}(0) / N_{i+1}$.
2. The child neurons will only be updated if their parent neuron is also updated.
3. A neighborhood neuron will only be updated if it is sufficiently close to the winning neuron of its layer: $\|u - v_i\| \le \sigma_i(m)$.
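A sketch of one update step that honors rules 2 and 3; the constant gain α, the per-layer widths σ, and the 1-D chain distance standing in for the intra-layer metric are illustrative assumptions.

```python
import numpy as np

# One update step along a winning path: on each layer, the winner and its
# sufficiently close neighbours move toward the input x.
def update(x, W, path, alpha=0.1, sigma=(2.0, 1.5, 1.0)):
    for i, win in enumerate(path):
        for j in range(len(W[i])):
            if abs(j - win) <= sigma[i]:           # rule 3
                W[i][j] += alpha * (x - W[i][j])
        # Rule 2 holds implicitly: layer i + 1 is reached only through
        # path[i], which was just updated.

rng = np.random.default_rng(3)
W = [rng.normal(size=(4 * 4 ** i, 8)) for i in range(3)]
update(np.zeros(8), W, path=[1, 5, 21])
print(W[0][1][:3])   # moved slightly toward the input
```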
Convergence Criteria

If the average squared difference of the neuron weights $w_{Lj}$ at the leaf layer between two successive sweeps is less than 0.1, the training is terminated:
$$\frac{1}{B^{L-1}} \sum_{j=1}^{B^{L-1}} \| w_{Lj}(m) - w_{Lj}(m-1) \|^2 < 0.1$$
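A direct transcription of this stopping rule, assuming the leaf weights are stored as one matrix per sweep.

```python
import numpy as np

# Stop training when the average squared change of the leaf-layer weights
# between two sweeps falls below the threshold (0.1 on the slide).
def converged(W_leaf_now, W_leaf_prev, tol=0.1):
    diff = np.linalg.norm(W_leaf_now - W_leaf_prev, axis=1) ** 2
    return float(diff.mean()) < tol

prev = np.zeros((8, 16))
now = prev + 0.05
print(converged(now, prev))   # True: mean squared change is 0.04 < 0.1
```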
Hybrid Clustering Algorithm on SOTT

- The main idea behind the hybrid clustering is to combine the efficiency of partition clustering with the discriminative power of AHC.
- To merge clusters rather than individual points, the distance between individual points has to be generalized to the distance between clusters (sets of points), as in the sketch below.
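Three classical ways of generalizing a point-to-point distance to a cluster-to-cluster distance, shown for illustration; the paper's own linkage metric (the Bond metric on the next slide) is a different, online-updatable choice.

```python
import numpy as np

# Generalizing a point distance d(a, b) to distances between sets of points.
def d(a, b):
    return float(np.linalg.norm(a - b))

def single_link(A, B):    # distance of the closest cross pair
    return min(d(a, b) for a in A for b in B)

def complete_link(A, B):  # distance of the farthest cross pair
    return max(d(a, b) for a in A for b in B)

def average_link(A, B):   # mean over all cross pairs
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))

A = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
B = [np.array([3.0, 0.0])]
print(single_link(A, B), complete_link(A, B), average_link(A, B))  # 2.0 3.0 2.5
```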
Hybrid Clustering Algorithm on SOTT (cont.)

- A metric Bond($A_i$, $A_j$) is defined to assess the connectivity of two atomic clusters $A_i$ and $A_j$.
  [Defining formula not captured in the transcript]
- Its computational complexity is $O((k_i + k_j) \, k_i k_j)$.
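Since the slide's Bond formula did not survive extraction, the snippet below is only an illustrative connectivity measure with comparable pairwise cost (it counts the fraction of close cross-cluster point pairs); it is an assumption, not the paper's definition.

```python
import numpy as np

# Illustrative stand-in for a connectivity metric between atomic clusters:
# the fraction of cross-cluster point pairs closer than a threshold eps.
# NOT the paper's Bond(Ai, Aj) definition, which is not in the transcript.
def bond_like(Ai, Aj, eps=1.0):
    close = sum(np.linalg.norm(a - b) < eps for a in Ai for b in Aj)
    return close / (len(Ai) * len(Aj))

Ai = [np.array([0.0, 0.0]), np.array([0.5, 0.0])]
Aj = [np.array([0.8, 0.0]), np.array([4.0, 0.0])]
print(bond_like(Ai, Aj))   # 0.5: two of the four cross pairs are within eps
```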
Simulation Results

- Measure the performance of the proposed SOTT in VQ and compare it to the performance of the SOM, GLA [23], and TSVQ [4].
Simulation Results (cont.)

[Figures and tables: comparative results; not captured in the transcript]
Conclusions

The proposed SOTT hybrid clustering algorithm has been demonstrated to be:
- computationally efficient, with good scalability;
- able to overcome the clustering performance deficiencies of the k-means and SOM algorithms.

The experimental results show that the computational efficiency of the SOTT is much better than that of the basic SOM and other vector quantizers.
Personal Opinions

Advantage:
- Lower computational complexity than comparable methods.

Application:
- Pattern classification applications.

Drawback:
- The structure of the paper is not well organized, so it is not easy to understand.