Guidelines for Building Self‐Organizing Maps
Jochen Wendel
Barbara Buttenfield
University of Colorado ‐ Boulder
Motivation
o SOMs are complex in nature
o There are no clear instructions on how to build a SOM correctly
o The literature varies in its recommendations for building SOMs
o Goal: establish guidelines for building SOMs
What is a SOM?
• Self-Organizing Maps, also called Kohonen maps (Kohonen 1982; C. von der Malsburg 1973)
• Artificial Neural Network
• Based on unsupervised learning
• Classification and Data Reduction (Dimensionality)
• GIScience, Data Mining, Biology, Computer Science, ...
How does the SOM algorithm work?
A SOM organizes itself through competition: its cells compete for the assignment of observations. With each placement, cells adjust their weights so that they become more similar to cells in their immediate vicinity that have already received assignments.
– Create a list of observations with values on a number of variables (dimensions)
– Compute the similarity among observations based on all variables
– Iteratively classify on the basis of similarity ‐‐ “similarity is distance”
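The idea that "similarity is distance" can be sketched in a few lines. The example below is an illustration only, assuming Euclidean distance as the similarity measure (the slides do not name a specific metric); the weight values and the observation are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(observation, weights):
    """Return the index of the cell whose weight vector is closest."""
    return min(range(len(weights)),
               key=lambda i: euclidean(observation, weights[i]))


# Hypothetical values: three cells and one binary observation
# described by four variables.
cell_weights = [
    [0.9, 0.1, 0.8, 0.2],
    [0.1, 0.9, 0.2, 0.7],
    [0.5, 0.5, 0.5, 0.5],
]
observation = [1, 0, 1, 0]

# The smallest distance marks the most similar cell.
bmu_index = best_matching_unit(observation, cell_weights)
```

Here the first cell wins because its weights are closest to the observation on every variable.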
How does the SOM algorithm work?
• Assign every cell a random weight
• Place an observation in the cell with the most similar weight, based on the input variables (the best matching unit, BMU)
• Update neighboring cells so that their weights become similar to the situated cell
• Place another observation
• Adjust the weights
• Repeat this process many times
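The loop described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a rectangular grid, a Gaussian neighborhood, and linearly decaying learning rate and radius, and it follows the common formulation in which the BMU and its neighbors are pulled toward the observation. The function name `train_som` and all parameter defaults are hypothetical.

```python
import math
import random


def train_som(data, rows, cols, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    if radius0 is None:
        radius0 = max(rows, cols) / 2.0  # assumed starting radius

    # Every cell starts with a random weight vector.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}

    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for obs in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks

            # Place the observation in the most similar cell (the BMU).
            bmu = min(weights, key=lambda cell: sum(
                (w - x) ** 2 for w, x in zip(weights[cell], obs)))

            # Pull the BMU and nearby cells toward the observation.
            for cell, w in weights.items():
                grid_d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                influence = math.exp(-grid_d2 / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * influence * (obs[k] - w[k])
            step += 1
    return weights


# Hypothetical binary observations with three variables each.
demo_data = [[0, 0, 0], [1, 1, 1], [0, 0, 1], [1, 1, 0]]
som = train_som(demo_data, rows=4, cols=4)
```

Fixing the seed makes a run repeatable, which helps when comparing maps of different sizes and shapes.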
SOM (typical MATLAB output)
• 16x16 total
• Talk about the variables
• What do you see?
• U-matrix
Case Study Data
• List of 100+ GIS commands compiled
• Attributes describe each command ("keywords")
• Commands focus on hydrology modeling
• Simple versus compound commands
• Binary matrix
Case Study Data
Attributes of the binary matrix (1 = the attribute applies to the command):
• Raster Only: 1 = task is raster data only
• Raster and Vector: 1 = operates on both
• Vector Only
• Data Management: 1 = task is a data management function (copy, delete, etc.)
• Simple: 1 = atomic command
• Compound
• Geometric: 1 = task modifies geometry
• Attribute
• Terrain: 1 = task deals with terrain
• Flow: 1 = task deals with flow
• Terrain and Flow
• Regional: 1 = task works on neighborhood
• Local: 1 = task works on each individual pixel
• Global
• CSR: 1 = task changes spatial relation
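A binary matrix of this kind can be built by turning each command's keywords into a row of 0s and 1s. The sketch below uses the attribute names from this slide; the three commands and their keyword assignments are hypothetical placeholders, not entries from the authors' dataset.

```python
# Attribute names as listed on the slide.
ATTRIBUTES = [
    "Raster Only", "Raster and Vector", "Vector Only", "Data Management",
    "Simple", "Compound", "Geometric", "Attribute", "Terrain", "Flow",
    "Terrain and Flow", "Regional", "Local", "Global", "CSR",
]


def encode(keywords):
    """Turn a set of attribute names into one binary row."""
    unknown = set(keywords) - set(ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return [1 if name in keywords else 0 for name in ATTRIBUTES]


# Hypothetical commands and keywords, for illustration only.
commands = {
    "slope": {"Raster Only", "Simple", "Terrain", "Regional"},
    "flow_accumulation": {"Raster Only", "Simple", "Flow", "Global"},
    "copy": {"Raster and Vector", "Data Management", "Simple"},
}

# One row per command, one column per attribute.
matrix = {name: encode(keywords) for name, keywords in commands.items()}
```

Each row then serves as one observation for the SOM, with one dimension per attribute.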
Steps in Building a SOM
• Initialization
• SOM size
• SOM shape
• Neighborhood geometry
• Training length
• Quantifying the error
• Edge effects
SOM Size
Qualitative Recommendations (Ultsch and Simon, 1990):
• What is your goal with the SOM?
• Small, Medium and Large SOMs
Quantitative Recommendations:
• Avoid empty cells
• Optimal size after Vesanto (2005)
SOM Size
n = 108 observations
msize = 5 * sqrt(n) = 5 * sqrt(108) ≈ 51.96 cells
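The arithmetic on this slide is easy to reproduce. The sketch below applies the rule msize = 5 * sqrt(n) as stated above; the helper `near_square_grid`, which rounds the target up to a square grid, is a hypothetical convenience and not part of the rule.

```python
import math


def heuristic_map_size(n_observations):
    """Target number of cells from the rule msize = 5 * sqrt(n)."""
    return 5 * math.sqrt(n_observations)


def near_square_grid(n_cells):
    """Hypothetical helper: smallest square grid with at least n_cells cells."""
    side = math.ceil(math.sqrt(n_cells))
    return side, side


msize = heuristic_map_size(108)   # about 51.96 cells
grid = near_square_grid(msize)    # (8, 8), i.e. 64 cells
```

For 108 observations the target of about 52 cells is closest to the smallest map size compared in this deck.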
[Figure: "Flow" shown on 8x8, 16x16, and 32x32 SOMs]
Optimal Size after Vesanto (2005)
• The optimal solution after Vesanto (2005) suggests a total SOM size of 165
SOM Shape
• Quantitative Recommendations (Kohonen 1995):
• Symmetrical vs. Non‐symmetrical shapes
• Hexagonal vs. Square Grid
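One practical difference between the two grid types is the number of immediate neighbors each cell has. The sketch below illustrates this under assumed conventions: unit spacing between adjacent cells, and odd rows shifted by half a cell in the hexagonal lattice. The function names are hypothetical.

```python
import math


def cell_position(row, col, lattice):
    """Planar coordinates of a cell, with unit spacing between neighbors."""
    if lattice == "hexagonal":
        # Odd rows are shifted half a cell; rows sit sqrt(3)/2 apart.
        return col + 0.5 * (row % 2), row * math.sqrt(3) / 2
    if lattice == "square":
        return float(col), float(row)
    raise ValueError(f"unknown lattice: {lattice}")


def immediate_neighbors(row, col, rows, cols, lattice):
    """Cells exactly one unit away from (row, col) on the given lattice."""
    x0, y0 = cell_position(row, col, lattice)
    found = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) == (row, col):
                continue
            x, y = cell_position(r, c, lattice)
            # Small tolerance absorbs floating-point rounding.
            if math.hypot(x - x0, y - y0) <= 1.0 + 1e-9:
                found.append((r, c))
    return found


hex_count = len(immediate_neighbors(2, 2, 5, 5, "hexagonal"))  # 6
square_count = len(immediate_neighbors(2, 2, 5, 5, "square"))  # 4
```

An interior cell has six equidistant neighbors on the hexagonal lattice and four on the square lattice, so each weight update reaches a different set of cells.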
SOM Shape
• Symmetrical: 8x8, 16x16, 32x32
• Non-symmetrical: 12x8, 24x16, 36x34
Quantifying the Error - U-matrix
• Unified Distance Matrix
• Doubles the number of cells
• Distance measurement between attributes
• Clusters can be seen as a metaphor for landscape
• Quantization error
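The distance idea behind the U-matrix can be sketched as follows. This is a simplified illustration that keeps one value per cell, the mean distance to its adjacent cells, and does not build the enlarged layout that also stores each between-cell distance. It assumes a rectangular grid with four-neighbor adjacency; the weights are hypothetical.

```python
import math


def u_heights(weights, rows, cols):
    """Mean weight-space distance from each cell to its adjacent cells."""
    heights = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            heights[r][c] = sum(dists) / len(dists) if dists else 0.0
    return heights


# Hypothetical 1x4 map: two cells near (0, 0) and two near (1, 1).
demo_weights = [[[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]]
heights = u_heights(demo_weights, rows=1, cols=4)
```

In landscape terms, low values are valleys inside a cluster and high values are ridges between clusters; here the ridge runs between the second and third cell.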
Quantifying the Error
qe = 0.468
qe = 0.1084
12x8
24x16
qe = 0.3689
qe = 0.0422
qe = 8.6026 x 10^-5
36x34
qe = 2.4736 x 10^-5
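A common definition of the quantization error is the mean distance between each observation and the weight vector of its best matching unit. The sketch below assumes that definition and Euclidean distance; the data and codebooks are hypothetical and only show why the value falls toward zero as the number of cells grows.

```python
import math


def quantization_error(data, codebook):
    """Mean distance from each observation to its closest weight vector."""
    return sum(min(math.dist(obs, w) for w in codebook)
               for obs in data) / len(data)


# Hypothetical observations with two binary variables.
data = [[0, 0], [0, 1], [1, 0], [1, 1]]

# One cell must represent everything: large error.
coarse = [[0.5, 0.5]]
# One cell per observation: the map memorizes the data.
memorized = [list(obs) for obs in data]

qe_coarse = quantization_error(data, coarse)
qe_memorized = quantization_error(data, memorized)  # exactly 0.0
```

When there are enough cells to give every distinct observation its own cell, the error reaches zero even though the map no longer generalizes.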
Guidelines (Size and Shape)
• Remove redundancy in the dataset (degrees of freedom)
• Vesanto (2005) helps determine the SOM size
• Non-symmetrical SOMs show fewer edge effects
• Quantization error is a helpful measure for detecting overfitting (values very close to zero indicate an overfitted model)
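One possible reading of "remove redundancy" in a binary matrix is to look for columns that carry no independent information, because they are either identical to or the exact complement of another column. The sketch below assumes that reading; the function name and the small matrix are hypothetical.

```python
def redundant_columns(matrix, names):
    """Find column pairs that are identical or exact complements."""
    columns = list(zip(*matrix))
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            same = columns[i] == columns[j]
            complement = all(a + b == 1
                             for a, b in zip(columns[i], columns[j]))
            if same or complement:
                kind = "identical" if same else "complementary"
                pairs.append((names[i], names[j], kind))
    return pairs


# Hypothetical rows: every command is either simple or compound,
# so the two columns mirror each other.
names = ["Simple", "Compound", "Terrain"]
rows = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]
found = redundant_columns(rows, names)
```

Dropping one column from each reported pair reduces the number of dimensions without losing information.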