CSE5230/DMS/2004/8
Data Mining - CSE5230
Self-Organizing Maps (SOMs)
Lecture Outline
• Motivation
  - unsupervised learning
  - the cortex
  - topographic feature maps
  - biological self-organizing maps
• Artificial self-organizing maps
• Kohonen's self-organizing network
  - learning algorithm
  - examples
• Data mining examples
  - text mining
  - customer understanding
Lecture Objectives
• By the end of this lecture you should be able to:
  - Explain the principal differences between MLPs and SOMs
  - Describe the properties of a topographic feature map, with particular attention to the notion of similarity in feature space being mapped to proximity in the SOM
  - Describe how Kohonen networks are trained
  - Give examples of how SOMs can be used in data mining
Motivation - 1
• The feed-forward back-propagation NNs discussed last week are an example of a supervised learning technique
• In supervised learning, the aim is to discover a relationship between the inputs and outputs of a system
• This relationship can be used for tasks such as prediction, estimation or classification
• A known training set of input/output pairs is used to train the network
Motivation - Unsupervised Learning
• Many data mining tasks are not suited to this approach
• Often the data mining task is to discover structure in the data set, without any prior knowledge of what is there
• This is an example of unsupervised learning (we have already seen the example of the K-means clustering algorithm)
• A class of neural networks called Self-Organizing Maps (SOMs) can be used for this task
The Cortex - 1
• SOM research was inspired by the observation of topologically correct sensory maps in the cortex (e.g. the retinotopic, somatotopic and tonotopic maps)
• In humans, the cortex consists of a layer of nerve tissue about 0.2 m² in area and 2-3 mm in thickness
• It is highly convoluted to save space, and forms the exterior of the brain - it's the folded, wrinkled stuff we see when we look at a brain
The Cortex - 2
Lateral (schematic) view of the human left-brain hemisphere. Various cortical areas devoted to specialized tasks can be distinguished. [RMS1992, p. 18]
Sensory Surfaces
• Most signals that the brain receives from the environment come from "sensory surfaces" covered with receptors:
  - skin (touch and temperature)
  - retina (vision)
  - cochlea [in the ear] (1-D sound sensor)
• It is usually found that the "wiring" of the nervous system exhibits topographic ordering:
  - signals from adjacent receptors tend to be conducted to adjacent neurons in the cortex
Topographic Feature Maps - 1
• This neighbourhood-preserving organization of the cortex is called a topographic feature map
  - For touch, maps of the body are found in the somatosensory cortex
  - In the primary visual cortex, neighbouring neurons tend to respond to stimulation of neighbouring regions of the retina
• As well as these simple maps, the brain also constructs topographic maps of abstract features:
  - In the auditory cortex of many higher brains, a tonotopic map is found, where the pitch of received sounds is mapped regularly
Topographic Feature Maps - 2
Map of part of the body surface in the somatosensory cortex of a monkey; direction map for sound signals in the so-called "optic tectum" of an owl. [RMS1992, p. 21]
Biological Self-Organizing Maps - 1
• The subject of SOMs arose from the question of how such topology-preserving mappings might arise in neural networks
• It is probable that in biological systems much of the organization of such maps is genetically determined, BUT:
• The brain is estimated to have ~10^13 synapses (connections), so it would be impossible to produce this organization by specifying each connection in detail - the genome does not contain that much information
Biological Self-Organizing Maps - 2
• A more likely scenario is that there are genetically specified mechanisms of structure formation that result in the creation of the desired connectivity
• These could operate before birth, or as part of later maturation, involving interaction with the environment
• There is much evidence for such changes:
  - the normal development of edge-detectors in the visual cortex of newborn kittens is suppressed in the absence of sufficient visual experience
  - the somatosensory maps of adult monkeys have been observed to adapt following the amputation of a finger
Biological Self-Organizing Maps - 3
Readaptation of the somatosensory map of the hand region of an adult nocturnal monkey due to the amputation of one finger. Several weeks after the amputation of the middle finger (3), the assigned region has disappeared and the adjacent regions have spread out. [RMS1992, p. 117]
Artificial Self-Organizing Maps - 1
• In the NN models we have seen so far, every neuron in a layer is connected to every neuron in the next layer of the network
• The location of a neuron in a layer plays no role in determining its connectivity or weights
• With SOMs, the ordering of neurons within a layer plays an important role:
  - How should the neurons organize their connectivity to optimize the spatial distribution of their responses within the layer?
Artificial Self-Organizing Maps - 2
• The purpose of this optimization is to achieve the mapping:
  similarity of features → proximity of excited neurons
• Such a mapping allows neurons with similar tasks to communicate over especially short connection paths - important for a massively parallel system
• Moreover, it results in the formation of topographic feature maps:
  - the most important similarity relationships among the input signals are converted into spatial relationships between responding neurons
Kohonen’s Self-Organizing Network - 1
• Kohonen [Koh1982] studied a system consisting of a two-dimensional layer of neurons, with the properties:
  - each neuron identified by its position vector r (i.e. its coordinates)
  - input signals to the layer represented by a feature vector x (usually normalized)
  - output of each neuron is a sigmoidal function of its total activation (as for MLPs last week):

    $y_r = f(\mathrm{net}_r) = \frac{1}{1 + e^{-\mathrm{net}_r}}$
Kohonen’s Self-Organizing Network - 2
• Each neuron r forms the weighted sum of the input signals. The external activation is:

  $\mathrm{net}_r^{\mathrm{external}} = \sum_{j=1}^{n} w_{rj} x_j$

  (the magnitudes of the weight vectors are usually normalized)
• In addition to the input connections, the neurons in the layer are connected to each other
  - the layer has internal feedback
• The weight from neuron r' to neuron r is labelled $g_{rr'}$
• These lateral inputs are superimposed on the external input signal (see the sketch below):

  $\mathrm{net}_r = \sum_j w_{rj} x_j + \sum_{r'} g_{rr'} y_{r'}$
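This combined activation can be illustrated with a short numpy sketch. It is not from the lecture: a minimal toy example assuming randomly chosen weight matrices W (input) and G (lateral), showing a single evaluation of the recurrence (in the full model y appears on both sides and must be iterated):

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_inputs = 5, 3

W = rng.random((n_neurons, n_inputs))            # input weights w_rj (hypothetical)
G = rng.standard_normal((n_neurons, n_neurons))  # lateral weights g_rr' (hypothetical)
x = rng.random(n_inputs)                         # input feature vector x
y = np.zeros(n_neurons)                          # current neuron outputs y_r'

# net_r = sum_j w_rj * x_j + sum_r' g_rr' * y_r'
net = W @ x + G @ y

# sigmoidal output y_r = 1 / (1 + exp(-net_r)); one step of the recurrence
y = 1.0 / (1.0 + np.exp(-net))
print(np.round(y, 3))
```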
Kohonen’s Self-Organizing Network - 3
• The output of neuron r is thus given by:

  $y_r = f\left( \sum_j w_{rj} x_j + \sum_{r'} g_{rr'} y_{r'} \right)$

• The neuron activities are the solutions of this system of non-linear equations
• The feedback due to the lateral connections $g_{rr'}$ is usually arranged so that it is excitatory at small distances and inhibitory at large distances. This is often called a "Mexican Hat" response (see the sketch below)
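The slides do not give a formula for this profile; one common choice consistent with the description is a difference of two Gaussians, a narrow excitatory one minus a wide inhibitory one. A minimal sketch under that assumption (all parameter values are made up):

```python
import numpy as np

def mexican_hat(d, sigma_exc=1.0, sigma_inh=3.0, a_exc=2.0, a_inh=1.0):
    """Difference-of-Gaussians lateral interaction g as a function of the
    distance d between two neurons: positive (excitatory) at small d,
    negative (inhibitory) at larger d, decaying towards zero."""
    return (a_exc * np.exp(-d**2 / (2 * sigma_exc**2))
            - a_inh * np.exp(-d**2 / (2 * sigma_inh**2)))

# positive near d = 0, dipping below zero further out
print(np.round(mexican_hat(np.linspace(0.0, 10.0, 6)), 3))
```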
Kohonen’s Self-Organizing Network - 4
Kohonen's model showing the excitation zone around the "winning" neuron [RMS1992, p. 64]
• The solution of such systems of non-linear equations is tedious and time-consuming. Kohonen avoided this by introducing a simplification.
Kohonen’s Self-Organizing Network - 5
• The response of the network is assumed to always be the same "shape":
  - the response is 1 at the location of the neuron r* receiving maximal external excitation, and decreases to 0 as one moves away from r*
• The excitation of neuron r is thus only a function of its distance from r*:

  $y_r = h(r - r^*) \equiv h_{rr^*}$

• The model then proposes a rule for changing the weights to each neuron so that a topologically ordered map is formed. The weight change is:

  $\Delta w_{rj} = \varepsilon\, h_{rr^*} (x_j - w_{rj})$
Kohonen’s Self-Organizing Network - 6
• Experiments have shown that the precise shape of the response is not critical
• A suitable function is thus simply chosen. The Gaussian is a suitable choice:

  $h_{rr^*} = e^{-\|r - r^*\|^2 / 2\sigma^2}$

• The parameter σ determines the length scale on which input stimuli cause changes in the map
• Maps usually learn the coarse structure first and then the fine structure. This is done by letting σ decrease over time
• The step size ε in the weight-change rule on the previous slide, which specifies the size of each change, usually also decreases over time (both schedules appear in the sketch below)
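As a hedged illustration, here is the Gaussian neighbourhood together with one possible decay schedule for σ and ε. The slides only say that both decrease over time; the exponential schedule and all constants below are assumptions:

```python
import numpy as np

def neighbourhood(r, r_star, sigma):
    """Gaussian neighbourhood h_rr* = exp(-||r - r*||^2 / (2 sigma^2)),
    where r and r_star are grid coordinates of a neuron and the winner."""
    d2 = np.sum((np.asarray(r, float) - np.asarray(r_star, float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def decayed(v0, v_end, t, t_max):
    """Exponential decay from v0 at t = 0 to v_end at t = t_max."""
    return v0 * (v_end / v0) ** (t / t_max)

# sigma shrinks from 5.0 to 0.5 (coarse structure first, then fine);
# the step size epsilon shrinks from 0.5 to 0.01
for t in (0, 5000, 10000):
    print(t, round(decayed(5.0, 0.5, t, 10000), 3),
          round(decayed(0.5, 0.01, t, 10000), 4))
```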
Learning Algorithm
0. Initialization: start with appropriate initial values for the weights w_r (usually just random)
1. Choice of stimulus: choose an input vector x at random from the data set
2. Response: determine the "winning" neuron r*, the one most strongly activated by x
3. Adaptation: carry out a "learning" step by modifying the weights:

   $w_r^{\mathrm{new}} = w_r^{\mathrm{old}} + \varepsilon\, h_{rr^*} (x - w_r^{\mathrm{old}})$

   (normalize the weights if required)
4. Continue with step 1 until the specified number of learning steps is completed (the sketch below implements these steps)
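A minimal numpy sketch of steps 0-4 for a two-dimensional map. This is an illustrative reconstruction, not the lecture's code: the winner is found by minimum Euclidean distance (equivalent to maximal activation when the vectors are normalized), the neighbourhood is the Gaussian from the previous slide, and the decay schedules and default parameter values are assumptions:

```python
import numpy as np

def train_som(data, grid_h=10, grid_w=10, n_steps=10000,
              eps0=0.5, eps_end=0.01, sigma0=5.0, sigma_end=0.5, seed=0):
    """Train a 2-D Kohonen map on data (shape: n_samples x n_features),
    following steps 0-4 above. Returns the grid of weight vectors."""
    rng = np.random.default_rng(seed)
    # Step 0: random initial weights, one weight vector per grid node
    weights = rng.random((grid_h, grid_w, data.shape[1]))
    # Grid coordinates r of every neuron, used by the neighbourhood h_rr*
    coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                                  indexing="ij"), axis=-1).astype(float)
    for t in range(n_steps):
        frac = t / n_steps
        eps = eps0 * (eps_end / eps0) ** frac          # decaying step size
        sigma = sigma0 * (sigma_end / sigma0) ** frac  # shrinking neighbourhood
        # Step 1: choose a stimulus x at random from the data set
        x = data[rng.integers(len(data))]
        # Step 2: the "winning" neuron r* is the one whose weight vector
        # is closest to x
        d2 = np.sum((weights - x) ** 2, axis=-1)
        r_star = np.unravel_index(np.argmin(d2), d2.shape)
        # Step 3: Gaussian neighbourhood h_rr* around the winner, then
        # w_new = w_old + eps * h * (x - w_old)
        grid_d2 = np.sum((coords - coords[r_star]) ** 2, axis=-1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += eps * h[..., None] * (x - weights)
        # Step 4: loop until the specified number of steps is completed
    return weights
```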
Examples - 1
SOM that has learnt data uniformly distributed on a square
SOM that has learnt data on a rotated square, where points are twice as likely to occur in a circle at the centre of the square (relationship to clustering)
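The first example above can be reproduced with the train_som sketch from the previous slide: points drawn uniformly from the unit square should pull the 10x10 grid of weight vectors into an even covering of the square:

```python
import numpy as np
# uses train_som from the sketch on the previous slide

rng = np.random.default_rng(42)
square = rng.random((5000, 2))   # points uniformly distributed on the unit square
weights = train_som(square, grid_h=10, grid_w=10, n_steps=20000)

# after training, the weight vectors span the square, and neighbouring
# neurons hold weight vectors for neighbouring regions
flat = weights.reshape(-1, 2)
print(flat.min(axis=0), flat.max(axis=0))
```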
Examples - 2
2-dimensional SOM that has learnt data uniformly distributed in a 3-dimensional cube
Examples - 3
1-dimensional SOM that has learnt data uniformly distributed in a 2-dimensional circle
Examples - 4
2-dimensional SOM that has learnt 2-dimensional data containing 3 clusters
The SOM for Data Mining
• The SOM is a good method for obtaining an initial understanding of a set of data about which the analyst does not have any opinion (e.g. there is no need to estimate the number of clusters in advance)
• The map can be used as an initial unbiased starting point for further analysis. Once clusters are selected from the map, they are analyzed to find out the reasons for such clustering (see the sketch below):
  - It may be possible to determine which attributes were responsible for the clusters
  - It may also be possible to identify some attributes which do not contribute to the clustering
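One simple way to look for the attributes behind the clusters, sketched here as a hypothetical helper (the labels array is assumed to come from selecting clusters on the trained map): compare each attribute's within-cluster mean against its overall mean. Attributes with large standardized deviations in some cluster are candidates for having driven it; attributes that barely deviate in any cluster likely do not contribute to the clustering.

```python
import numpy as np

def cluster_attribute_summary(data, labels):
    """Report, per cluster, how far each attribute's within-cluster mean
    deviates from its overall mean, in units of the attribute's overall
    standard deviation."""
    overall_mean = data.mean(axis=0)
    overall_std = data.std(axis=0) + 1e-12   # guard against zero variance
    for c in np.unique(labels):
        dev = (data[labels == c].mean(axis=0) - overall_mean) / overall_std
        print(f"cluster {c}: standardized attribute deviations {np.round(dev, 2)}")
```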
Example: Text Mining with a SOM - 1
• This example comes from the WEBSOM project in Finland: http://websom.hut.fi/websom/
• WEBSOM is a method for organizing miscellaneous text documents onto meaningful maps for exploration and search. WEBSOM automatically organizes the documents onto a two-dimensional grid so that related documents appear close to each other
Example: Text Mining with a SOM - 2
• This map was constructed using more than one million documents from 83 USENET newsgroups:
  - color denotes the density or the clustering tendency of the documents
  - light (yellow) areas are clusters and dark (red) areas are empty space between the clusters
  - this is a little difficult to read, but WEBSOM allows one to zoom in
Example: Text Mining with a SOM - 3
• Zoomed view of the WEBSOM map:
  - blues - rec.music.bluenote
  - books - rec.arts.books
  - classical - rec.music.classical
  - humor - rec.humor
  - lang.dylan - comp.lang.dylan
  - music - music
  - shostakovich - alt.fan.shostakovich
Example: Customer Understanding with a SOM - 1
• This example is from [YaZ2001], using KDD Cup 2000 data:
  - clickstream and purchase data from Gazelle.com, a retailer of legwear and legcare products
• On-line retailers are interested in understanding their customers, so that they can:
  - better organize the website
  - better target marketing
  - improve strategies for acquiring and retaining customers
• Gazelle.com was interested in analysing the differences between light (≤ $12) and heavy (> $12) spenders
Example: Customer Understanding with a SOM - 2
• Data set and feature selection
  - The data set has more than 1700 records, each with 426 features and a variable indicating light or heavy spending. Features include:
    » age (discrete)
    » income band (ordered), e.g. < $15,000, $15,000-$19,999, $20,000-$29,999, …
    » percentage of discounted items in purchase (continuous)
  - [YaZ2001] compared a variety of methods for generating a reduced feature set. These were adapted from criteria used in other DM techniques:
    » discriminant analysis, decision tree, naïve Bayes, Principal Components Analysis (PCA)
  - The different methods highlighted a variety of features, e.g.:
    » discount rate, average and total weight of items, minimum shipping order amount, geographic location, house value, vendor, main template views, etc.
Example: Customer Understanding with a SOM - 3
• [YaZ2001] selected the eight variables indicated by discriminant analysis
• Projection onto the first two components provided by PCA of these data did not show clear separation into two clusters (x: heavy spender, o: light spender)
• This could indicate the presence of a non-linear relationship
Example: Customer Understanding with a SOM - 4
• [YaZ2001] then applied a modified self-organizing map, called a Generative Topographic Mapping (GTM), to produce another 2-D visualization of the data
• Separation of the classes into seven clusters is now much better:
  - 1: heavy 88%, light 12%
  - 2: heavy 93%, light 7%
  - 3: heavy 100%
  - 4: light 100%
  - 5: light 94%, heavy 6%
  - 6: light 93%, heavy 7%
  - 7: light 97%, heavy 3%
Example: Customer Understanding with a SOM - 5
• Analysis of the features corresponding to these clusters reveals facts such as:
  - Cluster 4 (100% light) are those customers with more than 40% discounted items in their purchases
  - Clusters 1-3: those who heard about the company from friend/family are light spenders…
    » …but those who heard from a means other than news, e-mail, print ad, direct mail, or friend/family were heavy spenders
  - Clusters 6-7: people who frequently wear casual or athletic socks are light spenders
• Insights such as these could be used for managing marketing, and also pricing policies (e.g. discounts)
References
• [Koh1982] Teuvo Kohonen, "Self-organized formation of topologically correct feature maps", Biological Cybernetics, 43:59-69, 1982
• [RMS1992] Helge Ritter, Thomas Martinetz and Klaus Schulten, Neural Computation and Self-Organizing Maps: An Introduction, Addison-Wesley, 1992
• [YaZ2001] Jinsan Yang and Byoung-Tak Zhang, "Customer Data Mining and Visualization by Generative Topographic Mapping Methods", in Simeon J. Simoff, Monique Noirhomme-Fraiture and Michael H. Böhlen (eds.), Proceedings of the International Workshop on Visual Data Mining (VDM@ECML/PKDD2001), Freiburg, Germany, pp. 55-66, 4 September 2001