Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
An Introduction to Topological Data Analysis Elizabeth Munch Duke University :: Dept of Mathematics Tuesday, June 11, 2013 Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 1 / 43 “There is no branch of mathematics, however abstract, which may not some day be applied to phenomena of the real world.” -Nikolai Lobachevsky Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 2 / 43 Finding Structure in Data Sets 1.5 1.0 0.5 0.0 0.5 1.0 1.51.5 Elizabeth Munch (Duke) 1.0 0.5 0.0 CompTop-SUNYIT 0.5 1.0 1.5 Tuesday, June 11, 2013 3 / 43 Outline 1 Homology & Persistent Homology 2 Statistics 3 Behavioral Clustering and Future Work Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 4 / 43 Outline 1 Homology & Persistent Homology 2 Statistics 3 Behavioral Clustering and Future Work Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 5 / 43 Homology & Persistent Homology What is Homology? A topological invariant which assigns a sequence of vector spaces, Hk (X ), to a given topological space X . Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 6 / 43 Homology & Persistent Homology Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 6 / 43 Homology & Persistent Homology Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 6 / 43 Homology & Persistent Homology Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 6 / 43 Homology & Persistent Homology Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 6 / 43 Homology & Persistent Homology Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 6 / 43 Homology & Persistent Homology Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 6 / 43 Homology & Persistent Homology What is Persistent Homology? A way to watch how the homology of a filtration (sequence) of topological spaces changes so that we can understand something about the space. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 7 / 43 Homology & Persistent Homology What is Persistent Homology? A way to watch how the homology of a filtration (sequence) of topological spaces changes so that we can understand something about the space. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 7 / 43 Homology & Persistent Homology What is Persistent Homology? A way to watch how the homology of a filtration (sequence) of topological spaces changes so that we can understand something about the space. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 7 / 43 Homology & Persistent Homology What is Persistent Homology? A way to watch how the homology of a filtration (sequence) of topological spaces changes so that we can understand something about the space. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 7 / 43 Homology & Persistent Homology What is Persistent Homology? A way to watch how the homology of a filtration (sequence) of topological spaces changes so that we can understand something about the space. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 7 / 43 Homology & Persistent Homology What is Persistent Homology? A way to watch how the homology of a filtration (sequence) of topological spaces changes so that we can understand something about the space. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 7 / 43 Homology & Persistent Homology What is Persistent Homology? A way to watch how the homology of a filtration (sequence) of topological spaces changes so that we can understand something about the space. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 7 / 43 Homology & Persistent Homology What is Persistent Homology? A way to watch how the homology of a filtration (sequence) of topological spaces changes so that we can understand something about the space. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 7 / 43 Filtration Given topological space K and filtration K0 ⊆ K1 ⊆ K2 ⊆ · · · ⊆ Kn gives a sequence of maps on homology H2 (K0 ) Elizabeth Munch (Duke) / H2 (K1 ) / H2 (K2 ) CompTop-SUNYIT / ··· / H2 (Kn ) Tuesday, June 11, 2013 8 / 43 Birth and Death Birth A class γ ∈ Hp (Ki ) is born at Ki if it is not in the image of Death A class γ dies entering Kj if merges with an older class. Hp (Ki−1 ) −→ Hp (Ki ) Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 9 / 43 Simplicial Complex Simplices Elizabeth Munch (Duke) Simplicial Complex CompTop-SUNYIT Tuesday, June 11, 2013 10 / 43 Boundary Matrix Data Structure Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 11 / 43 Large Data Sets 1.5 1.0 0.5 0.0 0.5 1.0 1.51.5 Elizabeth Munch (Duke) 1.0 0.5 0.0 CompTop-SUNYIT 0.5 1.0 1.5 Tuesday, June 11, 2013 12 / 43 Large Data Sets Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 13 / 43 Large Data Sets Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 13 / 43 Large Data Sets Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 13 / 43 Rips complex vs Čech complex Čech Complex Rips Complex Set of points χ ⊂ Rn Set of points χ ⊂ Rn Rips complex R Čech complex C σ∈R ⇔ kvi − vj k ≤ r for all vi , vj ∈ σ σ∈C ⇔ T Br (Vi ) 6= ∅ Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 14 / 43 Persistence Algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7 Elizabeth Munch (Duke) 1 2 3 4 5 6 7 1 2 3 4 5 6 7 CompTop-SUNYIT 1 2 3 4 5 6 7 1 2 3 4 5 6 7 Tuesday, June 11, 2013 15 / 43 Persistence Algorithm R=D V =I for j = 1 to m: while ∃ j0 < j with low (j0 ) = low (j): add column j0 to column j 1 2 3 4 5 6 7 1 2 3 4 5 6 7 Elizabeth Munch (Duke) 1 2 3 4 5 6 7 1 2 3 4 5 6 7 CompTop-SUNYIT 1 2 3 4 5 6 7 1 2 3 4 5 6 7 Tuesday, June 11, 2013 15 / 43 Visualizing Birth and Death 7 6 5 4 3 2 1 1 2 3 4 5 6 7 1 2 3 4 5 6 7 Elizabeth Munch (Duke) 1 2 3 4 5 6 7 1 2 3 4 5 6 7 CompTop-SUNYIT Tuesday, June 11, 2013 16 / 43 Large Data Sets 1.5 2.5 1.0 2.0 0.5 1.5 0.0 1.0 0.5 0.5 1.0 1.51.5 1.0 0.5 0.0 Elizabeth Munch (Duke) 0.5 1.0 1.5 0.00.0 CompTop-SUNYIT 0.5 1.0 1.5 2.0 Tuesday, June 11, 2013 2.5 17 / 43 Wasserstein Distance on Dp c y d z b x a p-Wasserstein distance for diagrams Given diagrams X and Y , the distance between them is !1/p Wp [Lq ](X , Y ) = Elizabeth Munch (Duke) inf ϕ:X →Y X (kx − ϕ(x)kq ) p . x∈X CompTop-SUNYIT Tuesday, June 11, 2013 18 / 43 Stability Theorem (Cohen-Steiner, Edelsbrunner, Harer, Mileyko) Let X be a triangulable metric space that implies bounded degree k total persistence, for k ≥ 1, and let f , g : X → R be two tame Lipschitz functions. Then Wp (f , g ) ≤ C 1/p · kf − g k1−k/p ∞ for all p ≥ k, where C = CX max{Lip(f )k , Lip(g )k }. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 19 / 43 Properties of Dp Mileyko, Mukherjee, Harer Complete Separable Characterization of Compact Sets Turner, Mileyko, Mukherjee, Harer Non-negatively curved Alexandrov space Geodesics Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 20 / 43 1 Homology & Persistent Homology 2 Statistics 3 Behavioral Clustering and Future Work Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 21 / 43 Statistics Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 22 / 43 Statistics Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 22 / 43 Statistics How do we give a summary of the data? Will it play nicely with time varying persistence diagrams? Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 22 / 43 Fréchet mean Fréchet mean Given a probability space (Dp , B(Dp ), P), the quantity " Z VarP = inf X ∈Dp # Wp (X , Y )2 dP(Y ) < ∞ FP (X ) = Dp is the Fréchet variance of P. The set at which the value is obtained E(P) = {X |FP (X ) = VarP} is the Fréchet expectation, also called Fréchet mean. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 23 / 43 Properties of the Fréchet mean on Dp Theorem (Mileyko et al.) Let P be a probability measure on (Dp , B(Dp )) with a finite second moment. If P has compact support, then E(P) 6= ∅. Theorem (Mileyko et al.) Let P be a tight probability measure on (Dp , B(Dp )) with the rate of decay at infinity q > max{2, p}. Then E(P) 6= ∅. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 24 / 43 Goal Given diagrams X1 , · · · , XN , find a diagram Y which minimizes the Fréchet function. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 25 / 43 Example f b z c x y g h a Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 26 / 43 Example Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 26 / 43 Algorithm for Computation - Outline Algorithm - Kate Turner et al. 1 Persistence Diagrams X1 , · · · , XN 2 Pick one at random to start, Y = Xi 3 Repeat: I I I z CompTop-SUNYIT c x y Find best matching for Wasserstein distance Wp (Y , Xi ) Create matching G with a with a selection for each point yj ∈ Y with the points xi ∈ Xi for each i paired with yj Replace Y with meanX (G ) Elizabeth Munch (Duke) f b g h a Tuesday, June 11, 2013 27 / 43 Algorithm for Computation - Outline f b z c x y Theorem (Turner et al.) The algorithm terminates at a local minimum of the Fréchet function. g h a Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 27 / 43 The Problem with Pointwise Fréchet means Elizabeth Munch (Duke) 1 b a 2 CompTop-SUNYIT Tuesday, June 11, 2013 28 / 43 The Problem with Pointwise Fréchet means 1 x u a Elizabeth Munch (Duke) b v y 2 CompTop-SUNYIT Tuesday, June 11, 2013 28 / 43 The Solution Instead of a single diagram, return a distribution on diagrams! Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 29 / 43 The Solution Instead of a single diagram, return a distribution on diagrams! Main Idea: 1 Perturb the diagrams and compute the matching. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 29 / 43 The Solution Instead of a single diagram, return a distribution on diagrams! Main Idea: 1 Perturb the diagrams and compute the matching. 2 Do this repeatedly to get a distribution on matchings. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 29 / 43 The Solution Instead of a single diagram, return a distribution on diagrams! Main Idea: 1 Perturb the diagrams and compute the matching. 2 Do this repeatedly to get a distribution on matchings. 3 Associate the probability of getting a particular matching to the mean of the matching. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 29 / 43 The Solution Instead of a single diagram, return a distribution on diagrams! Main Idea: 1 Perturb the diagrams and compute the matching. 2 Do this repeatedly to get a distribution on matchings. 3 Associate the probability of getting a particular matching to the mean of the matching. Limit to SM,K ⊂ Dp : inside triangle of height M with at most K off-diagonal points. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 29 / 43 The Drawing Procedure - Make Xi0 Pick α Pick distribution ηx with mean at x and support contained in Bα (x). For each x ∈ Xi , make Xi0 by: 1 2 Draw point from ηx If contained in Bkx−∆k (x), add it to Xi0 . Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 30 / 43 The Drawing Procedure - Determine Matching f b z c x y g h a Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 31 / 43 The Drawing Procedure - Determine Matching f' b' x' z' g' y' h' Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 31 / 43 The Drawing Procedure - Determine Matching f' b' 0 dF 1 b0 2 ∆ 3 ∆ 4 ∆ x' z' d0 x0 y0 ∆ z0 0 d f0 g0 . h0 ∆ g' y' h' Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 31 / 43 The Drawing Procedure - Determine Matching f b z c x y d0 x0 y0 ∆ z0 0 d f0 g0 . h0 ∆ dF 1 b 2 ∆ 3 ∆ 4 ∆ 5 a 6 c d x y ∆ z ∆ ∆ d f g h . ∆ ∆ g h a Elizabeth Munch (Duke) 0 dF 1 b0 2 ∆ 3 ∆ 4 ∆ CompTop-SUNYIT Tuesday, June 11, 2013 ∆ 31 / 43 The Drawing Procedure - Create Distribution 1 b a 2 Definition µX = X P(G ) δmeanX (G ) G Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 32 / 43 The Drawing Procedure - Create Distribution 1 x u a Definition µX = X b v y 2 P(G ) δmeanX (G ) G Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 32 / 43 Continuity Theorem (Munch et al) The map SM,K × · · · × SM,K X1 , · · · , XN −→ 7−→ P(SM,K ) µX is continuous. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 33 / 43 Examples Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 34 / 43 1 Homology & Persistent Homology 2 Statistics 3 Behavioral Clustering and Future Work Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 35 / 43 How do we automate behavioral analysis? Image: Wikipedia Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 36 / 43 Behavior Vectors Main Idea: Quantify a particular behavior as a vector in RD and cluster these points. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 37 / 43 Behavior Vectors Main Idea: Example: Quantify a particular behavior as a vector in RD and cluster these points. Loop time vector: Put an entry into a list for each loop in a track, sort, and compare. Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 37 / 43 Behavior Vectors Main Idea: Example: Quantify a particular behavior as a vector in RD and cluster these points. Loop time vector: Put an entry into a list for each loop in a track, sort, and compare. Black: (6.52, 0, 0, 0) Purple: (6.63, 0, 0, 0) Orange: (5.01, 4.85, 0, 0) Red: (8.20, 6.46, 0, 0) Blue: (21.17, 17.60, 13.26, 7.54) Green: (15.40, 11.78, 8.28, 7.12) Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 37 / 43 Dendrograms Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 38 / 43 Experimental Results Experiment Tracks generated using open-source SUMO software. Additional tracks forced to make small circles, larger circles, and figure eights. Loops computed by: 1 2 Elizabeth Munch (Duke) checking for crossings, looking at time needed to complete loop. computing persistence, using lifetime of class as score. CompTop-SUNYIT Tuesday, June 11, 2013 39 / 43 Experimental Results - NonPersistence Random Small Loop Large Loop Figure 8 Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 40 / 43 Experimental Results - Persistence Random Small Loop Large Loop Figure 8 Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 41 / 43 Group Formation Image: Wikipedia Elizabeth Munch (Duke) CompTop-SUNYIT Tuesday, June 11, 2013 42 / 43 Acknowledgements Funding: DDR&E Honors: Jo Rae Wright Fellowship for Outstanding Women in Science, 2012-2013 Elizabeth Munch (Duke) Collaborators: John Harer Paul Bendich Kate Turner (UChicago) Sayan Mukherjee Jonathan Mattingly Ingrid Daubachies Robert Calderbank Peter Sang Chin (APL/JHU) CompTop-SUNYIT Tuesday, June 11, 2013 43 / 43