ICS 278: Data Mining
Lectures 12,13: Clustering Algorithms
Padhraic Smyth, Department of Information and Computer Science, University of California, Irvine

Clustering
• "automated detection of group structure in data"
  – Typically: partition N data points into K groups (clusters) such that the points in each group are more similar to each other than to points in other groups
  – descriptive technique (contrast with predictive)
  – for real-valued vectors, clusters can be thought of as clouds of points in p-dimensional space

References
• Chapter 9 (sections 9.3 through 9.6) in the text
• Algorithms for Clustering Data, A. K. Jain and R. C. Dubes, Prentice Hall, 1988 (a bit outdated, but has many useful ideas and references on clustering)
• Cluster Analysis (4th ed.), B. S. Everitt, S. Landau, and M. Leese, Arnold Publishers, 2001 (broad overview of clustering methods)
• How Many Clusters? Which Clustering Method? Answers via Model-Based Cluster Analysis, C. Fraley and A. E. Raftery, The Computer Journal, 1998 (good overview article on probabilistic model-based clustering, available from the class Web page)

Clustering
• Sometimes easy, sometimes impossible, and sometimes in between

Why is Clustering useful?
• "Discovery" of new knowledge from data
  – Contrast with supervised classification (where labels are known)
  – Long history in the sciences of categories, taxonomies, etc.
  – Can be very useful for summarizing large data sets
    • For large n and/or high dimensionality
• Applications of clustering
  – Clustering of documents produced by a search engine
  – Segmentation of customers for an e-commerce store
  – Discovery of new types of galaxies in astronomical data
  – Clustering of genes with similar expression profiles
  – Clustering of pixels in an image into regions of similar intensity
  – ... many more

General Issues in Clustering
• Clustering algorithm = Representation + Score + Optimization
• Cluster representation:
  – What types or "shapes" of clusters are we looking for? What defines a cluster?
• Score:
  – A clustering = assignment of n objects to K clusters
  – Score = quantitative criterion used to evaluate different clusterings
• Optimization and search:
  – Finding the optimal (minimal/maximal score) clustering is typically NP-hard
  – Greedy algorithms to optimize the score are widely used
• Other issues
  – Distance function D[x(i), x(j)] is a critical aspect of clustering, both for
    • distances between individual pairs of objects
    • distances of individual objects from clusters
  – How is K selected?
  – Different types of data
    • real-valued versus categorical
    • attribute-valued vectors vs. an n × n distance matrix
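To make the distance-function discussion above concrete, here is a minimal sketch (not from the lecture; assumes NumPy, and the function names are illustrative) of the two kinds of distance computation mentioned: distances between pairs of objects, and distances of objects from cluster centers.

```python
import numpy as np

def pairwise_distances(X):
    """n x n matrix of Euclidean distances D[x(i), x(j)] between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # ||x-y||^2 = ||x||^2 + ||y||^2 - 2 x.y
    return np.sqrt(np.maximum(d2, 0.0))                # clip tiny negatives from round-off

def point_to_center_distances(X, centers):
    """n x K matrix of distances from each object x(i) to each cluster center c_k."""
    diff = X[:, None, :] - centers[None, :, :]          # shape (n, K, p)
    return np.sqrt(np.sum(diff ** 2, axis=2))

# toy usage on random 2-d data
X = np.random.randn(100, 2)
D = pairwise_distances(X)                               # the n x n distance matrix
d_to_centers = point_to_center_distances(X, X[:3])      # distances to 3 arbitrary "centers"
```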
Different Types of Clustering Algorithms
• partition-based clustering
  – e.g., K-means
• probabilistic model-based clustering
  – e.g., mixture models
  [both of the above work with measurement data, e.g., feature vectors]
• hierarchical clustering
  – e.g., hierarchical agglomerative clustering
• graph-based clustering
  – e.g., min-cut algorithms
  [both of the above work with distance data, e.g., a distance matrix]

Partition-Based Clustering
• input: n data points X = {x(1) ... x(n)}
• output: C = {C1 ... CK} = specification of K clusters
  – implicit representation: each x(i) is assigned to a unique Cj (hard assignment)
  – explicit representation: each Cj is specified in some manner, e.g., as a mean or a region in input space
• Optimization algorithm
  – require that score[C] is minimized (or maximized)
    • e.g., sum of squares of within-cluster distances
  – exhaustive search is intractable
    • combinatorial optimization problem: assign n objects to K classes
    • large search space: the number of possible clusterings is approximately K^n / K!
  – so, use a greedy iterative method
    • will be subject to local optima

Score Functions for Partition-Based Clustering
• want compact clusters
  – minimize within-cluster distances wc(C)
• want different clusters far apart
  – maximize between-cluster distances bc(C)
• given a cluster partitioning C, find centers c1 ... cK
  – e.g., for vectors, use the centroid of the points in cluster Ci:  ci = (1/ni) Σ_{x ∈ Ci} x
• wc(C) = sum-of-squares within-cluster distance (minimize)
  – wc(C) = Σ_{i=1..K} wc(Ci) = Σ_{i=1..K} Σ_{x ∈ Ci} d(x, ci)²
• bc(C) = distance between clusters (maximize)
  – bc(C) = Σ_{i,j=1..K} d(ci, cj)²
• can define scores as combinations, i.e., score[C] = f[wc(C), bc(C)]
  – see the discussion on pp. 297-300 in the text

K-means Clustering
• basic idea:
  – Score = wc(C) = sum-of-squares within-cluster distance
  – start with randomly chosen cluster centers c1 ... cK
  – repeat until no cluster memberships change:
    • assign each point x to the cluster with the nearest center
      – i.e., find the smallest d(x, ci) over all c1 ... cK
    • recompute the cluster centers over the data assigned to them
      – ci = (1/ni) Σ_{x ∈ Ci} x
• the algorithm terminates (in a finite number of steps)
  – Score(C) decreases at each iteration in which memberships change
• converges to at least a local minimum of Score(C)
  – not necessarily the global minimum ...
  – different initial centers (seeds) can lead to different local minima

K-means Complexity
• time complexity = O(I e n K)  <<  exhaustive ~ K^n / K!
  – I = number of iterations (steps)
  – e = cost of a distance computation (e = p for Euclidean distance in p dimensions)
• Approximations/speed-up tricks for very large n
  – use the x(i)'s nearest to the means as cluster centers instead of the actual means
    • allows reuse of cached distances from the n × n distance matrix D (lowers the effective "e")
  – "condense": reduce n by replacing a group of points with a prototype
  – Additional references:
    • D. Pelleg and A. Moore, Accelerating Exact k-means Algorithms with Geometric Reasoning, ACM SIGKDD Conference, 1999.
    • C. Elkan, Using the Triangle Inequality to Accelerate k-means, Proceedings of the Twentieth International Conference on Machine Learning (ICML'03), 2003.
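Before the step-by-step illustration that follows, here is a minimal sketch of the K-means loop just described (assumes NumPy; the initialization by sampling data points and the function names are my own choices, not part of the lecture):

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Basic K-means: greedily reduce the within-cluster sum of squares wc(C)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    centers = X[rng.choice(n, size=K, replace=False)].copy()  # random seeds from the data
    labels = np.full(n, -1)
    for _ in range(max_iter):
        # assignment step: each point joins the cluster with the nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):                 # no membership changes -> stop
            break
        labels = new_labels
        # update step: each center moves to the centroid of the points it owns
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    score = d2[np.arange(n), labels].sum()                     # wc(C), the within-cluster SSE
    return labels, centers, score
```

Different seeds generally give different local minima, so in practice the loop is run several times and the solution with the lowest score is kept.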
K-means, step by step (example courtesy of Andrew Moore, CMU)
1. Ask the user how many clusters they'd like (e.g., K = 5)
2. Randomly guess K cluster center locations
3. Each datapoint finds out which center it is closest to (thus each center "owns" a set of datapoints)
4. Each center finds the centroid of the points it owns
5. New centers => new boundaries
6. Repeat until no change

Image segmentation with K-means
• K-means clustering of RGB (3-value) pixel color intensities, K = 11 segments (from David Forsyth, UC Berkeley)
• [Figures: the original image, and the image clustered on color]

Issues in K-means clustering
• Simple, but useful
  – tends to select compact "isotropic" cluster shapes
  – can be useful for initializing more complex methods
  – many algorithmic variations on the basic theme
    • e.g., in signal processing/data compression it is similar to vector quantization
• Choice of distance measure
  – Euclidean distance
  – weighted Euclidean distance
  – many others possible
• Selection of K
  – "scree diagram": plot SSE versus K and look for the "knee" of the curve
  – limitation: there may not be any clear K value

Finite Mixture Models
• p(x) = Σ_{k=1..K} p(x, ck) = Σ_{k=1..K} p(x | ck) p(ck) = Σ_{k=1..K} p(x | ck, θk) wk
  – p(x | ck, θk) = component model k with parameters θk; wk = weight of component k
• [Figures: two Gaussian components and the resulting mixture model density p(x); a narrow and a wide component model and the resulting mixture model density p(x)]
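As a concrete (and entirely illustrative) instance of the mixture density above, the following sketch evaluates a one-dimensional, two-component Gaussian mixture like the one plotted; the weights and component parameters are made up, and SciPy is assumed to be available.

```python
import numpy as np
from scipy.stats import norm

# illustrative parameters for two Gaussian components
weights = np.array([0.6, 0.4])     # mixing weights w_k (sum to 1)
means   = np.array([0.0, 5.0])     # component means mu_k
sds     = np.array([1.0, 1.5])     # component standard deviations sigma_k

def mixture_density(x):
    """p(x) = sum_k w_k * N(x; mu_k, sigma_k)."""
    x = np.atleast_1d(x)
    comps = np.array([w * norm.pdf(x, m, s) for w, m, s in zip(weights, means, sds)])
    return comps.sum(axis=0)

xs = np.linspace(-5, 10, 300)
px = mixture_density(xs)           # the mixture curve is the pointwise sum of the weighted components
```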
Interpretation of Mixtures
1. C has a direct (physical) interpretation
   – e.g., C = {age of fish}, C = {male, female}
2. C is a convenient hidden variable (i.e., the cluster variable)
   – focuses attention on subsets of the data, e.g., for visualization, clustering, etc.
   – C might have a physical/real interpretation, but not necessarily so

Probabilistic Clustering: Mixture Models
• assume a probabilistic model for each component cluster
• mixture model: f(x) = Σ_{k=1..K} wk fk(x; θk)
• the wk are K mixing weights
  – 0 ≤ wk ≤ 1 and Σ_{k=1..K} wk = 1
• where the K component densities fk(x; θk) can be:
  – Gaussian
  – Poisson
  – exponential
  – ...
• Note:
  – assumes a model for the data (advantages and disadvantages)
  – results in probabilistic membership: p(cluster k | x)

Gaussian Mixture Models (GMM)
• model for the k-th component is normal, N(μk, Σk)
  – often assume a diagonal covariance: Σjj = σj², Σij = 0 for i ≠ j
  – or sometimes even simpler: Σjj = σ², Σij = 0
• f(x) = Σ_{k=1..K} wk fk(x; θk) with θk = <μk, Σk> or <μk, σk>
• generative model:
  – randomly choose a component, selected with probability wk
  – generate x ~ N(μk, Σk)
  – note: μk and σk are both d-dimensional vectors

Learning Mixture Models from Data
• Score function = log-likelihood L(θ)
  – L(θ) = log p(X | θ) = log Σ_H p(X, H | θ)
  – H = hidden variables (the cluster memberships of each x)
  – L(θ) cannot be optimized directly
• EM procedure
  – general technique for maximizing the log-likelihood with missing data
  – for mixtures:
    • E-step: compute "memberships" p(k | x) = wk fk(x; θk) / f(x)
    • M-step: pick a new θ to maximize the expected data log-likelihood
  – iterate: guaranteed to climb to a (local) maximum of L(θ)

The E (Expectation) Step
• given the current K clusters and parameters, and the n data points, compute p(data point i is in group k)

The M (Maximization) Step
• compute new parameters θ for the K clusters, given the n data points and their memberships

Complexity of EM for mixtures
• with K models and n data points, the complexity per iteration scales as O(n K f(p))

Comments on Mixtures and EM Learning
• Complexity of each EM iteration
  – depends on the probabilistic model being used
    • e.g., for Gaussians, the E-step is O(nK), the M-step is O(nKp²)
  – sometimes the E- or M-step is not in closed form
    • => can require numerical optimization or sampling within each iteration
• Generalized EM (GEM): instead of maximizing the likelihood, just increase the likelihood
• EM can be thought of as hill-climbing, with direction and step size provided automatically
• K-means as a special case of EM
  – Gaussian mixtures with isotropic (diagonal, equal-variance) Σk's
  – approximate the E-step by choosing the most likely cluster (instead of using membership probabilities)
• Generalizations ...
  – mixtures of multinomials for text data
  – mixtures of Markov chains for Web sequences
  – and more; will be discussed later in the lectures on text and Web data
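A hedged sketch of the E- and M-steps for a Gaussian mixture with full covariances follows; the initialization, the small regularization constant, and all names are illustrative choices rather than anything prescribed in the lecture (assumes NumPy/SciPy).

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, K, n_iter=50, seed=0):
    """EM for a Gaussian mixture model; returns weights, means, covariances, log-likelihood."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # crude initialization: K random data points as means, shared covariance, equal weights
    mu = X[rng.choice(n, size=K, replace=False)].copy()
    cov = np.array([np.cov(X, rowvar=False) + 1e-6 * np.eye(p) for _ in range(K)])
    w = np.full(K, 1.0 / K)
    loglik = -np.inf
    for _ in range(n_iter):
        # E-step: membership probabilities p(k | x_i) = w_k f_k(x_i) / f(x_i)
        dens = np.column_stack([w[k] * multivariate_normal.pdf(X, mu[k], cov[k])
                                for k in range(K)])
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances from the memberships
        Nk = resp.sum(axis=0)
        w = Nk / n
        mu = (resp.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (resp[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(p)
        loglik = np.log(dens.sum(axis=1)).sum()   # guaranteed not to decrease across iterations
    return w, mu, cov, loglik
```

Taking the argmax of resp gives hard cluster labels; restricting to isotropic, equal covariances and hard assignments recovers K-means, as noted above.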
[Figures: red blood cell hemoglobin concentration vs. red blood cell volume for anemia patients and controls; the fitted Gaussian mixture shown at EM iterations 1, 3, 5, 10, 15, and 25; the same data with the true class labels; log-likelihood as a function of EM iteration]

Selecting K in mixture models
• cannot just choose the K that maximizes the likelihood
  – the likelihood L(θ) is always larger for larger K
• Model selection alternatives:
  – 1) penalize complexity
    • e.g., BIC = L(θ) – (d/2) log n, where d = number of parameters (Bayesian information criterion)
    • asymptotically correct under certain assumptions
    • often used in practice for mixture models, even though the assumptions behind the theory are not met
  – 2) Bayesian: compute the posteriors p(k | data)
    • p(k | data) requires computation of p(data | k), the marginal likelihood
    • can be tricky to compute for mixture models
    • recent work on Dirichlet process priors has made this more practical
  – 3) (cross-)validation
    • score different models by log p(Xtest | θ)
    • split the data into train and validation sets
    • works well on large data sets
    • can be noisy on small data sets (log L is sensitive to outliers)
• Note: all of these methods evaluate the quality of the clustering as a density estimator, rather than with any explicit notion of clustering

Example of BIC Score for Red-Blood Cell Data
• [Figure: BIC score versus number of components; the true number of classes (2) is selected by BIC]
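Here is a sketch of the penalized-complexity approach above, assuming scikit-learn is available; note that GaussianMixture.bic returns -2 log L + d log n, so it is minimized, which is equivalent (up to sign and a factor of 2) to maximizing the L(θ) - (d/2) log n score on the slide.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_k_by_bic(X, k_values=range(1, 8)):
    """Fit a GMM for each candidate K and return the K with the lowest BIC."""
    bics = {}
    for k in k_values:
        gmm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
        bics[k] = gmm.bic(X)          # -2 log L + d log n  (smaller is better)
    best_k = min(bics, key=bics.get)
    return best_k, bics

# toy usage: two well-separated blobs, for which BIC typically selects K = 2
X = np.vstack([np.random.randn(200, 2), np.random.randn(200, 2) + 5.0])
best_k, bics = select_k_by_bic(X)
```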
Hierarchical Clustering
• Representation: a tree of nested clusters
• Works from a distance matrix
  – advantage: the x's can be any type of object
  – disadvantage: computation
• two basic approaches:
  – merge points (agglomerative)
  – divide superclusters (divisive)
• visualize both via "dendrograms"
  – shows the nesting structure
  – merges or splits = tree nodes
• Applications
  – e.g., clustering of gene expression data
  – useful for seeing hierarchical structure, for relatively small data sets

Simple example of hierarchical clustering
• [Figure: a small set of points and the corresponding dendrogram]

Agglomerative Methods: Bottom-Up
• algorithm based on the distance between clusters:
  – for i = 1 to n, let Ci = {x(i)}, i.e., start with n singletons
  – while more than one cluster is left:
    • let Ci and Cj be the cluster pair with minimum distance dist[Ci, Cj]
    • merge them, via Ci = Ci ∪ Cj, and remove Cj
• time complexity = O(n²) to O(n³)
  – n iterations (start: n clusters; end: 1 cluster)
  – 1st iteration: O(n²) to find the nearest singleton pair
• space complexity = O(n²)
  – accesses all distances between the x(i)'s
• interpreting a large-n dendrogram is difficult anyway (like decision trees)
  – large-n idea: put partition-based clusters at the leaves

Distances Between Clusters
• single link / nearest-neighbor measure:
  – D(Ci, Cj) = min { d(x, y) | x ∈ Ci, y ∈ Cj }
  – can be outlier/noise sensitive
• complete link / furthest-neighbor measure:
  – D(Ci, Cj) = max { d(x, y) | x ∈ Ci, y ∈ Cj }
  – enforces more "compact" clusters
• intermediates between those extremes:
  – average link: D(Ci, Cj) = avg { d(x, y) | x ∈ Ci, y ∈ Cj }
  – centroid: D(Ci, Cj) = d(ci, cj), where ci, cj are the centroids
  – Ward's SSE measure (for vector data): merge the clusters that minimize the increase in the within-cluster sum of squared distances
  – note that the centroid and Ward methods require that a centroid (vector mean) can be defined
• Which to choose? Different methods may be used for exploratory purposes; it depends on the goals and the application

Dendrogram Using the Single-Link Method
• Old Faithful eruption duration vs. wait data
• notice how single-link tends to "chain"
• dendrogram y-axis = the crossbar's distance score

Dendrogram Using Ward's SSE Distance
• Old Faithful eruption duration vs. wait data
• more balanced than single-link
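The agglomerative methods and linkage criteria above are available in SciPy; the sketch below (illustrative data and names) builds single-link and Ward dendrograms and cuts each tree into two clusters.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.randn(50, 2)                 # stand-in for, e.g., the Old Faithful data

for method in ["single", "ward"]:          # nearest-neighbor vs. Ward's SSE criterion
    Z = linkage(X, method=method)          # (n-1) x 4 table of merges and their distances
    labels = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram into 2 clusters
    plt.figure()
    dendrogram(Z)                          # y-axis = distance score of each merge (crossbar)
    plt.title(f"method = {method}")
plt.show()
```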
Divisive Methods: Top-Down
• algorithm:
  – begin with a single cluster containing all the data
  – split into components; repeat until the clusters are single points
• two major types:
  – monothetic:
    • splits on one variable at a time, which restricts the search space
    • analogous to decision trees
  – polythetic:
    • splits on all variables at once; the many choices make this difficult
• less commonly used than agglomerative methods
  – generally more computationally intensive (more choices in the search space)

MATLAB Demo
• in-class MATLAB demo comparing different hierarchical algorithms and k-means on the same data sets

Spectral/Graph-based Clustering
• Idea: think of the distance matrix as a weighted graph, where
  – objects are nodes
  – edges exist between objects with high similarity
• finding a good grouping of the objects is somewhat equivalent to finding low-weight subgraphs
  – can be reduced to "min-cut" algorithms
  – related to the eigenstructure of the distance matrix
  – e.g., Shi and Malik, 1997, for image segmentation
  – (a code sketch of this idea appears below, after the cyclone example)

Clustering non-vector objects
• e.g., sequences, images, documents, etc.
  – can be of varying lengths or sizes
• Distance matrix approach
  – e.g., compute edit distances/transformations for pairs of sequences
  – apply clustering (e.g., hierarchical) based on the distance matrix
  – however ... does not scale well computationally
• "Vectorization"
  – represent each object as a vector
  – cluster the resulting vectors using a vector-space algorithm
  – however ... can lose (e.g., sequence) information by going to a vector space
• Probabilistic model-based clustering
  – treat as a mixture of stochastic models (e.g., Markov models)
  – can naturally handle variable lengths/sizes
  – will discuss the application to Web session clustering later in the quarter

Clustering of Tropical Cyclones
• [Figure: clustered cyclone trajectories, Western North Pacific, 1983-2002 (from Scott Gaffney's PhD thesis, 2004; uses mixture-of-regressions clustering)]
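To make the graph-based idea (and the distance-matrix route for non-vector objects) concrete, here is a hedged sketch that turns a pairwise distance matrix into a similarity graph and applies spectral clustering; scikit-learn is assumed, and the Gaussian-kernel conversion with a median-distance bandwidth is an illustrative choice, not the lecture's prescription.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_from_distances(D, K, sigma=None):
    """Spectral clustering of n objects given only their n x n distance matrix D."""
    if sigma is None:
        sigma = np.median(D[D > 0])                    # crude bandwidth heuristic
    A = np.exp(-(D ** 2) / (2.0 * sigma ** 2))         # edge weights: high similarity = heavy edge
    model = SpectralClustering(n_clusters=K, affinity="precomputed",
                               assign_labels="kmeans", random_state=0)
    return model.fit_predict(A)                        # graph partition ~ relaxed min-cut

# usage: D could come from edit distances between sequences, for example
X = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 4.0])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
labels = cluster_from_distances(D, K=2)
```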
K-Means Clustering (task: clustering)
• Representation: partition based on K centers
• Score function: within-cluster sum of squared errors
• Search/Optimization: iterative greedy search
• Data management: none specified
• Models, parameters: the K centers

Probabilistic Model-Based Clustering (task: clustering)
• Representation: mixture of probability components
• Score function: log-likelihood
• Search/Optimization: EM (iterative)
• Data management: none specified
• Models, parameters: probability model

Single-Link Hierarchical Clustering (task: clustering)
• Representation: tree of nested groupings
• Score function: no global score
• Search/Optimization: iterative merging of nearest neighbors
• Data management: none specified
• Models, parameters: dendrogram

Summary
• General comments:
  – Many different approaches and algorithms
  – What type of cluster structure are you looking for?
  – Computational complexity may be an issue for large n
  – Dimensionality is also an issue
  – Validation/selection of K is often an ill-posed problem
• Reading
  – Chapter 9: covers all of the clustering methods discussed here (except graph/spectral clustering)
  – See also the earlier reading references