Statistical perturbation theory for spectral clustering
Harrachov, 2007
A. Spence and Z. Stoyanov

Plan of the Talk
A. Clustering (Brief overview).
B. Deterministic Perturbation Theory.
C. Statistical Perturbation Theory.

Graph Clustering
[Figure: graph on nodes 1 to 7]

Graph Clustering + Perturbation
[Figure: the same graph on nodes 1 to 7, with a "?" marking the perturbation]

An Application
Gene Expression Data
Clustering
1. Genes in same cluster behave similarly?
2. Genes in different clusters behave differently?
Issues:
• There are over 10 000 genes expressed in any one tissue;
• DNA arrays typically produce very noisy data.

Bi-partite Graphs
[Figure: bipartite graph with node labels 1 to 3 and 1 to 4]

Matrix Form

A Real Data Matrix (Leukemia)

Spectral Clustering: General Idea
• Discrete Optimisation Problem (NP - Hard): Exact - Impractical
• Approximation
• Real Optimisation Problem (Tractable): Heuristic - Practical

Discrete Optimisation

SVD
[Figure: solution split into blocks labelled Active / Inactive]

Singular Value Decomposition of W_scaled

Clustering Algorithm: Summary
[Figure: blocks labelled ACTIVE / INACTIVE]

Literature

Types of Graph Matrices

How we Cluster Leukemia Data

Clustered Leukemia Data

Inaccuracies in the Data (Perturbation Theory)

Perturbation Theory (Deterministic Noise)

Deterministic Perturbation (Symmetric Matrix)

Linear Solve

Taylor Expansions

Rectangular Case

Symmetric Random Perturbations (plan)
• The Model
• Issues with the Theory
• A Possible Solution via Simulations?
• Experiments

The Model
[Figure: graph on nodes 1 to 7]

Difficulties with Random Matrix Theory (RMT)

Deterministic Perturbation, Stochastic Perturbation (simple eigenvector)

Deterministic Perturbation, Stochastic Perturbation (simple eigenvalues)

PP Plot - Test for Normality (Largest eigenvalue of a Symmetric Matrix)

Simulated Random Perturbation (Largest eigenvalue of a Symmetric Matrix)

Deterministic Perturbation, Stochastic Perturbation (simple eigenvectors)

Results for Laplacian Matrices

Functional of the Eigenvector

Results for h^T v_2

PP Plot of h^T v'(0) - Test for Normality (h = e_j)

Histogram of h^T v'(0) - Simulations (h = e_j)

PP Plot of Simulated v[j]() (Distribution close to Normal)

Histogram of Simulated v[j]() (Distribution close to Normal)

Extension to the Rectangular Case

Probability of "Wrong Clustering"

Issues with Numerics

Efficient Simulations

Solution via Simulations?

Solution via Simulations? (Algorithm)

Comparing: Direct Calculation Vs. Repeated Linear Solve
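
The slide titles on the largest eigenvalue of a symmetric matrix under simulated random perturbation can be illustrated with a small NumPy sketch. It is not the authors' algorithm, and it does not reproduce the linear-solve method for eigenvectors named in the talk. It only compares a direct eigenvalue calculation of A + eps*E with the standard first-order estimate lam + eps * v^T E v, valid when lam is a simple eigenvalue with unit eigenvector v. The 7-node graph, the noise model E = (G + G^T)/2 with standard normal G, and all function names are assumptions made for illustration. Under this noise model v^T E v is Gaussian with unit variance, which is one way to see why the first-order term looks normal in simulations.

```python
import numpy as np


def two_cluster_adjacency():
    """Hypothetical 7-node graph: a triangle and a 4-clique joined by one edge."""
    A = np.zeros((7, 7))
    clusters = [(0, 1, 2), (3, 4, 5, 6)]
    for nodes in clusters:
        for i in nodes:
            for j in nodes:
                if i != j:
                    A[i, j] = 1.0
    A[2, 3] = A[3, 2] = 1.0  # single edge linking the two clusters
    return A


def random_symmetric(n, rng):
    """Symmetric Gaussian noise E = (G + G^T) / 2 (assumed noise model)."""
    G = rng.standard_normal((n, n))
    return (G + G.T) / 2.0


def simulate_largest_eigenvalue(A, eps, n_trials, seed=0):
    """Compare direct and first-order values of the largest eigenvalue of A + eps*E."""
    rng = np.random.default_rng(seed)
    w, V = np.linalg.eigh(A)          # eigenvalues in ascending order
    lam, v = w[-1], V[:, -1]          # largest eigenvalue, unit eigenvector
    direct = np.empty(n_trials)
    first_order = np.empty(n_trials)
    for k in range(n_trials):
        E = random_symmetric(A.shape[0], rng)
        # Direct calculation: a full eigenvalue solve for every sample.
        direct[k] = np.linalg.eigvalsh(A + eps * E)[-1]
        # First-order estimate: reuses lam and v, costs one quadratic form.
        first_order[k] = lam + eps * (v @ E @ v)
    return lam, direct, first_order


eps = 1e-3
A = two_cluster_adjacency()
lam, direct, first_order = simulate_largest_eigenvalue(A, eps, n_trials=2000)
print("max |direct - first order|:", np.max(np.abs(direct - first_order)))
print("std of first-order derivative term:", np.std((first_order - lam) / eps))
```

The gap between the two columns should shrink like eps squared, so the first-order estimate is only a sensible stand-in for the direct calculation when the noise is small relative to the gap between the largest eigenvalue and the rest of the spectrum.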