Statistical Perturbation Theory for Spectral Clustering
Harrachov, 2007
A. Spence and Z. Stoyanov
Plan of the Talk
A. Clustering (Brief overview).
B. Deterministic Perturbation Theory.
C. Statistical Perturbation Theory.
Graph Clustering
[Figure: example graph on vertices 1 to 7]
Graph Clustering + Perturbation
[Figure: the graph on vertices 1 to 7, with one part marked "?"]
An Application
Gene Expression Data
Clustering
1. Do genes in the same cluster behave similarly?
2. Do genes in different clusters behave differently?
Issues:
• There are over 10 000 genes expressed in any one tissue;
• DNA arrays typically produce very noisy data.
Bi-partite Graphs
[Figure: bipartite graph, one vertex set labelled 1 to 4 and the other 1 to 3]
Matrix Form
A Real Data Matrix (Leukemia)
Spectral Clustering: General Idea
Discrete Optimisation Problem (NP-hard): exact but impractical
↓ Approximation
Real Optimisation Problem (tractable): heuristic but practical
Discrete Optimisation  SVD
Active
Inactive
Solution:
Inactive
Active
Singular
Value
Decomposition of Wscaled
Clustering Algorithm: Summary
[Figure: 2 × 2 block layout, ACTIVE blocks on the diagonal, INACTIVE blocks off the diagonal]
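The transcript keeps only the outline of the algorithm: relax the discrete problem, take an SVD of W_scaled, and read off ACTIVE/INACTIVE blocks. The code below is a minimal sketch of one standard version of that pipeline. It assumes the degree scaling W_scaled = D1^{-1/2} W D2^{-1/2} used in bipartite spectral co-clustering, with a two-way split by the sign of the second singular vectors. The talk's exact scaling and splitting rule are not recoverable here. The function name `cocluster_two_way` and the toy matrix are hypothetical.

```python
import numpy as np

def cocluster_two_way(W):
    """Two-way co-clustering of the rows and columns of a nonnegative matrix W.

    Assumed scaling (not confirmed by the talk): W_scaled = D1^{-1/2} W D2^{-1/2},
    where D1, D2 hold the row and column sums of W. Every row and column
    must have a nonzero sum.
    """
    W = np.asarray(W, dtype=float)
    d1 = W.sum(axis=1)                          # row degrees (e.g. genes)
    d2 = W.sum(axis=0)                          # column degrees (e.g. samples)
    W_scaled = W / np.sqrt(np.outer(d1, d2))
    U, s, Vt = np.linalg.svd(W_scaled)
    # The leading singular pair (value 1) is trivial; the second one splits the data.
    u2 = U[:, 1] / np.sqrt(d1)
    v2 = Vt[1, :] / np.sqrt(d2)
    # Label by sign: rows and columns with the same label form one co-cluster.
    return (u2 >= 0).astype(int), (v2 >= 0).astype(int)

# Hypothetical toy data: rows 0-1 are "active" on columns 0-1, rows 2-3 on columns 2-3.
W = np.array([[5.0, 4.0, 0.1, 0.2],
              [4.0, 5.0, 0.2, 0.1],
              [0.1, 0.2, 5.0, 4.0],
              [0.2, 0.1, 4.0, 5.0]])
rows, cols = cocluster_two_way(W)
print(rows, cols)
```

The leading singular pair of W_scaled has value 1 with singular vectors proportional to sqrt(d1) and sqrt(d2). It carries no cluster information, which is why the second pair is used. Rows or columns with zero sum must be removed first, because the scaling divides by their square roots.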
Literature
Types of Graph Matrices
How we Cluster
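The slides on graph matrices and on how the clustering is done survive only as titles. As a hedged illustration, the sketch below builds three matrices for a weighted adjacency matrix A:

- the degree matrix D;
- the Laplacian L = D - A;
- the normalised Laplacian I - D^{-1/2} A D^{-1/2}.

It then splits the vertices by the sign of the Fiedler vector, the eigenvector of the second-smallest eigenvalue of L. This is one common spectral bisection rule, assumed here rather than taken from the talk. The 7-vertex edge list is hypothetical, because the edges of the original graph figure were lost.

```python
import numpy as np

def graph_matrices(A):
    """Degree matrix D, Laplacian L = D - A, normalised Laplacian I - D^{-1/2} A D^{-1/2}."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    D = np.diag(d)
    L = D - A
    d_is = 1.0 / np.sqrt(d)                     # assumes no isolated vertices
    L_sym = np.eye(len(d)) - d_is[:, None] * A * d_is[None, :]
    return D, L, L_sym

def fiedler_bisection(A):
    """Label vertices by the sign of the Fiedler vector of L (one common rule)."""
    _, L, _ = graph_matrices(A)
    _, vecs = np.linalg.eigh(L)                 # eigenvalues in ascending order
    return (vecs[:, 1] >= 0).astype(int)

# Hypothetical edges on vertices 1..7: a triangle {1,2,3} joined to a 4-clique {4,5,6,7}.
edges = [(1, 2), (1, 3), (2, 3), (3, 4),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7)]
A = np.zeros((7, 7))
for a, b in edges:
    A[a - 1, b - 1] = A[b - 1, a - 1] = 1.0
D, L, L_sym = graph_matrices(A)
labels = fiedler_bisection(A)
print(labels)
```

The rows of L sum to zero, so the constant vector always belongs to eigenvalue 0. This is why the second eigenvector, not the first, carries the split.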
Leukemia Data
Clustered Leukemia Data
Inaccuracies in the Data
(Perturbation Theory)
Perturbation Theory
(Deterministic Noise)
Deterministic Perturbation
(Symmetric Matrix)
Linear Solve
Taylor Expansions
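Take a symmetric matrix A with a simple eigenpair (λ, v), where ‖v‖ = 1, perturbed as A + εE with E symmetric. The Taylor expansions are λ(ε) = λ + ελ'(0) + O(ε²) and v(ε) = v + εv'(0) + O(ε²). Their first-order terms satisfy λ'(0) = vᵀEv and (A - λI)v'(0) = -(E - λ'(0)I)v, with vᵀv'(0) = 0. Under these standard assumptions, the sketch below solves for v'(0) through a bordered linear system, which is nonsingular when λ is simple. It then checks that the Taylor remainders shrink like ε². The helper name `first_order_eigen` and the random test matrices are hypothetical.

```python
import numpy as np

def first_order_eigen(A, E, i):
    """lambda'(0) and v'(0) for the i-th eigenpair (ascending order) of A + eps*E.

    Assumes A and E are symmetric and lambda_i is simple.
    """
    lam, V = np.linalg.eigh(A)
    l, v = lam[i], V[:, i]
    dl = v @ E @ v                              # lambda'(0) = v^T E v
    n = A.shape[0]
    # Bordered system: (A - l I) v' + mu v = -(E - dl I) v and v^T v' = 0;
    # the extra unknown mu turns out to be zero.
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = A - l * np.eye(n)
    M[:n, n] = v
    M[n, :n] = v
    rhs = np.append(-(E @ v - dl * v), 0.0)
    dv = np.linalg.solve(M, rhs)[:n]
    return l, v, dl, dv

rng = np.random.default_rng(0)
n, i = 6, 5                                     # hypothetical size; largest eigenpair
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
C = rng.standard_normal((n, n))
E = (C + C.T) / 2
l, v, dl, dv = first_order_eigen(A, E, i)

errors = []
for eps in (1e-2, 1e-3):
    lam_eps, V_eps = np.linalg.eigh(A + eps * E)
    v_eps = V_eps[:, i] * np.sign(V_eps[:, i] @ v)   # remove the sign ambiguity
    err_l = abs(lam_eps[i] - (l + eps * dl))
    err_v = np.linalg.norm(v_eps - (v + eps * dv))
    errors.append((eps, err_l, err_v))
    print(eps, err_l, err_v)                    # both shrink roughly like eps**2
```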
Rectangular Case  Symmetric
Random Perturbations
(plan)
• The Model
• Issues with the Theory
• A Possible Solution via Simulations?
• Experiments
The Model
2
5
1
4
3
7
6
Difficulties with Random Matrix Theory (RMT)
Deterministic Perturbation → Stochastic Perturbation
(simple eigenvector)
Deterministic Perturbation → Stochastic Perturbation
(simple eigenvalues)
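Once E is random, λ'(0) = vᵀEv becomes a random variable. As a hedged example, assume a GOE-like noise model E = (G + Gᵀ)/√2, where G has independent N(0, 1) entries. The off-diagonal entries of E are then N(0, 1) and the diagonal entries N(0, 2). In that case λ'(0) is a linear combination of Gaussians and hence Gaussian, with mean 0 and variance 2(vᵀv)² = 2 for every unit vector v. The noise model and the random test matrix are assumptions, not the talk's model. The sketch checks the variance by sampling.

```python
import numpy as np

def goe_like(n, rng):
    """Assumed noise model: E = (G + G^T) / sqrt(2) with G_ij iid N(0, 1)."""
    G = rng.standard_normal((n, n))
    return (G + G.T) / np.sqrt(2)

rng = np.random.default_rng(2)
n = 8
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                               # hypothetical symmetric matrix
lam, V = np.linalg.eigh(A)
v = V[:, -1]                                    # unit eigenvector of the largest eigenvalue

# lambda'(0) = v^T E v for independent draws of E
dl = np.array([v @ goe_like(n, rng) @ v for _ in range(20000)])
print(dl.mean(), dl.var())                      # close to 0 and 2
```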
PP Plot: Test for Normality
(Largest eigenvalue of a Symmetric Matrix)
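A PP (probability-probability) plot compares the fitted normal CDF with the empirical CDF at each sorted sample. Points near the diagonal indicate approximate normality. The code below is a minimal NumPy-only sketch. It assumes a normal fitted by the sample mean and standard deviation, and plotting positions (k - 0.5)/N; other conventions exist. The maximum vertical distance from the diagonal serves as a rough one-number summary. The function names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def normal_cdf(z):
    """Standard normal CDF, Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def pp_points(samples):
    """(fitted normal CDF, empirical probability) pairs for a normal PP plot."""
    x = np.sort(np.asarray(samples, dtype=float))
    z = (x - x.mean()) / x.std(ddof=1)          # fit the normal by mean and std
    theoretical = normal_cdf(z)
    empirical = (np.arange(1, len(x) + 1) - 0.5) / len(x)
    return theoretical, empirical

rng = np.random.default_rng(3)
t, e = pp_points(rng.standard_normal(5000))
dev_normal = np.max(np.abs(t - e))              # small: points hug the diagonal
t2, e2 = pp_points(rng.exponential(size=5000))
dev_exp = np.max(np.abs(t2 - e2))               # larger: skewed data bends away
print(dev_normal, dev_exp)
```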
Simulated Random Perturbation
(Largest eigenvalue of a Symmetric Matrix)
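Simulation lets us compare the first-order prediction with the exact perturbed eigenvalue. The sketch assumes the GOE-like model E = (G + Gᵀ)/√2 and a hypothetical diagonal A with simple eigenvalues 1, ..., 8. It samples z = (λ_max(A + εE) - λ_max(A))/ε. For small ε, first-order theory predicts z ≈ vᵀEv ~ N(0, 2). The sample mean should therefore be near 0, up to an O(ε) bias from second-order terms, and the standard deviation near √2.

```python
import numpy as np

rng = np.random.default_rng(4)
n, eps, trials = 8, 1e-2, 5000
A = np.diag(np.arange(1.0, n + 1))              # hypothetical unperturbed matrix
lam_max = float(n)                              # its largest (simple) eigenvalue

def noise(rng):
    """Assumed GOE-like model: E = (G + G^T) / sqrt(2), G_ij iid N(0, 1)."""
    G = rng.standard_normal((n, n))
    return (G + G.T) / np.sqrt(2)

z = np.empty(trials)
for t in range(trials):
    z[t] = (np.linalg.eigvalsh(A + eps * noise(rng))[-1] - lam_max) / eps
print(z.mean(), z.std())                        # near 0 and sqrt(2) for small eps
```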
Deterministic Perturbation → Stochastic Perturbation
(simple eigenvectors)
Results for Laplacian Matrices
Functional of the Eigenvector
Results for hᵀv₂
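For a functional hᵀv of a simple eigenvector, the first-order term is hᵀv'(0), where v'(0) = Σ_{k≠i} v_k (v_kᵀEv_i)/(λ_i - λ_k). Assume the GOE-like model E = (G + Gᵀ)/√2, with G having independent N(0, 1) entries. Under that model the coefficients v_kᵀEv_i (k ≠ i) are independent N(0, 1). With h = e_j, the term e_jᵀv'(0) is then normal with variance Σ_{k≠i} v_k[j]²/(λ_i - λ_k)². The sketch checks this for the Fiedler vector (i = 1, second-smallest eigenvalue) of a hypothetical 5-vertex path-graph Laplacian. This noise model does not preserve the Laplacian's zero row sums, so the example is only an illustration.

```python
import numpy as np

def dv_spectral(lam, V, E, i):
    """v_i'(0) = sum over k != i of v_k (v_k^T E v_i) / (lam_i - lam_k)."""
    gaps = lam[i] - lam
    gaps[i] = np.inf                            # 1/inf = 0 drops the k = i term
    return V @ ((V.T @ E @ V[:, i]) / gaps)

n, i, j = 5, 1, 0                               # Fiedler vector, first vertex (hypothetical)
L = np.diag([1.0, 2, 2, 2, 1]) - np.eye(n, k=1) - np.eye(n, k=-1)   # path graph
lam, V = np.linalg.eigh(L)

gaps = lam[i] - lam
pred_var = sum(V[j, k] ** 2 / gaps[k] ** 2 for k in range(n) if k != i)

rng = np.random.default_rng(6)
samples = []
for _ in range(20000):
    G = rng.standard_normal((n, n))
    E = (G + G.T) / np.sqrt(2)                  # assumed GOE-like noise
    samples.append(dv_spectral(lam, V, E, i)[j])
sim_var = np.var(samples)
print(pred_var, sim_var)                        # agree to within sampling error
```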
PP Plot of hᵀv′(0): Test for Normality
(h = e_j)
Histogram of hᵀv′(0): Simulations
(h = e_j)
PP Plot of Simulated v[j]()
(Distribution close to Normal)
Histogram of Simulated v[j]()
(Distribution close to Normal)
Extension to the Rectangular Case
Probability of “Wrong Clustering”
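Assume vertices are assigned to clusters by the sign of the Fiedler vector v_i with i = 1. Vertex j is then misassigned when v[j](ε) changes sign. The estimate below uses only the first-order normal approximation v[j](ε) ≈ v[j] + ε e_jᵀv'(0). Here e_jᵀv'(0) ~ N(0, σ_j²), with σ_j² = Σ_{k≠i} v_k[j]²/(λ_i - λ_k)² under the assumed GOE-like noise E = (G + Gᵀ)/√2. The misassignment probability is then approximately Φ(-|v[j]|/(εσ_j)). The sketch evaluates this estimate for a hypothetical 6-vertex path graph. Because it is only first order, it should be checked against simulation when ε is not small. The function name is hypothetical.

```python
import math
import numpy as np

def wrong_cluster_probability(L, eps, i=1):
    """First-order estimate of P(sign of v_i[j] flips) for each vertex j.

    Assumes the perturbation L + eps*E with E = (G + G^T)/sqrt(2), G_ij iid N(0, 1),
    and a simple eigenvalue lam_i; i = 1 selects the Fiedler vector.
    """
    lam, V = np.linalg.eigh(L)
    gaps = lam[i] - lam
    gaps[i] = np.inf                                    # drops the k = i term
    sigma = np.sqrt((V ** 2 / gaps ** 2).sum(axis=1))   # std of e_j^T v_i'(0)
    ratio = np.abs(V[:, i]) / (eps * sigma)
    # Phi(-r) = erfc(r / sqrt(2)) / 2
    return np.array([0.5 * math.erfc(r / math.sqrt(2.0)) for r in ratio])

# Hypothetical 6-vertex path graph.
n = 6
L = np.diag([1.0, 2, 2, 2, 2, 1]) - np.eye(n, k=1) - np.eye(n, k=-1)
p = wrong_cluster_probability(L, eps=0.05)
p_small = wrong_cluster_probability(L, eps=0.02)
print(np.round(p, 4))                           # largest for the two middle vertices
```

Vertices near the cluster boundary, where |v[j]| is small, are the most likely to be misassigned.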
Issues with Numerics
Efficient Simulations
Solution via Simulations?
Solution via Simulations?
(Algorithm)
Comparing: Direct Calculation vs. Repeated Linear Solve
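The two routes to v'(0) can be compared directly. The spectral ("direct") formula needs every eigenpair of A. The bordered linear system needs only the eigenpair of interest, which matters for large matrices where only a few eigenpairs are computed. Its matrix also does not depend on E, so all Monte Carlo samples can share one factorisation by passing them as columns of a single right-hand side. The sketch below uses hypothetical sizes and the assumed GOE-like noise, and checks that the two routes agree to rounding error. Which method the talk favours is not recoverable from the transcript.

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials, i = 50, 200, 49                      # hypothetical sizes; largest eigenpair
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
lam, V = np.linalg.eigh(A)
l, v = lam[i], V[:, i]

Es = rng.standard_normal((trials, n, n))
Es = (Es + Es.transpose(0, 2, 1)) / np.sqrt(2)  # assumed GOE-like samples

# Direct calculation: spectral formula, needs all eigenpairs of A.
gaps = lam[i] - lam
gaps[i] = np.inf                                # drops the k = i term
coef = np.einsum('ka,tab,b->tk', V.T, Es, v)    # v_k^T E_t v for every sample t
dv_direct = (coef / gaps) @ V.T                 # row t is v'(0) for sample t

# Repeated linear solve: one bordered matrix, one solve with many right-hand sides.
M = np.zeros((n + 1, n + 1))
M[:n, :n] = A - l * np.eye(n)
M[:n, n] = v
M[n, :n] = v
Ev = Es @ v                                     # shape (trials, n): E_t v
dl = Ev @ v                                     # lambda'(0) for each sample
rhs = np.zeros((n + 1, trials))
rhs[:n] = -(Ev - dl[:, None] * v).T
dv_solve = np.linalg.solve(M, rhs)[:n].T

print(np.max(np.abs(dv_direct - dv_solve)))     # agreement to rounding error
```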