Lloyd Algorithm K-Means Clustering

Gene Expression
• Susumu Ohno: whole genome duplications.
• The expression of genes can be measured over time.
• Identifying which genes are expressed at a given moment can help determine their function.

Grouping
• Grouping genes by derivative.
• Data must be clustered by derivative.

Clustering Problems
• Cluster d data points into k clusters, such that each point is closer to the points in its own cluster than to those in any other cluster.
• Real data is usually not that clearly organized.

Lloyd's Algorithm
• Assign points to clusters, minimizing the distance between each point and the center of its cluster.
• Set each cluster's center of gravity as its new center, and repeat until the centers do not change. The goal is to minimize the squared error distortion.

The Computational Problem
• Input: A matrix of points of dimension m and the desired number of clusters k.
• Output: The points organized into k clusters, minimizing distance from each cluster's center, and a visual representation of the data.

Pseudo-pseudocode
• Arbitrarily assign k centers.
• Assign each point to one of the k clusters, minimizing Euclidean distance from the center.
• Set each cluster's center of gravity as its new center.
• Repeat until the algorithm converges.

Plotting
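
The pseudo-pseudocode steps above can be sketched in plain Python roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in the project: the names `lloyd`, `center_of_gravity`, `squared_distance`, and `distortion` are hypothetical, as are the `initial_centers`, `max_iter`, and `seed` parameters. Points are assumed to be tuples of numbers, and the squared error distortion is assumed here to mean the mean squared distance from each point to its nearest center.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def center_of_gravity(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def lloyd(points, k, initial_centers=None, max_iter=100, seed=0):
    """Sketch of Lloyd's algorithm; returns (centers, clusters)."""
    if initial_centers is None:
        # Step 1: arbitrarily pick k of the input points as centers.
        centers = random.Random(seed).sample(points, k)
    else:
        centers = list(initial_centers)

    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)

        # Step 3: move each center to its cluster's center of gravity.
        # An empty cluster keeps its old center (a choice made for this sketch).
        new_centers = [
            center_of_gravity(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]

        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


def distortion(points, centers):
    """Mean squared distance from each point to its nearest center."""
    total = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return total / len(points)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, clusters = lloyd(data, k=2, initial_centers=[(0, 0), (10, 10)])
    print(centers)
    print(clusters)
    print(distortion(data, centers))
```

Because the result depends on the starting centers, the algorithm may settle on a local rather than a global minimum of the distortion; running it from several arbitrary starts and keeping the lowest-distortion result is a common workaround.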