Lloyd Algorithm
K-Means Clustering
Gene Expression
• Susumu Ohno: whole genome duplications
• The expression of genes can be measured over time.
• Identifying which genes are expressed at a given moment can help determine function.
Grouping
• Grouping genes by derivative.
• Data must be clustered by derivative.
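One way to read "derivative" here is the rate of change of a gene's expression level between consecutive time points. The slides do not say how it is computed, so the sketch below is an assumption: it uses simple finite differences, and the names `expression_derivative`, `levels`, and `times` are hypothetical.

```python
def expression_derivative(levels, times):
    """Approximate the rate of change between consecutive measurements.

    Assumption: `levels` holds one gene's expression values and `times`
    holds the matching measurement times, in increasing order.
    """
    return [
        (levels[i + 1] - levels[i]) / (times[i + 1] - times[i])
        for i in range(len(levels) - 1)
    ]


# Hypothetical profile: expression rises, then levels off.
profile = [1.0, 2.0, 4.0, 4.0]
hours = [0.0, 1.0, 2.0, 4.0]
slopes = expression_derivative(profile, hours)  # [1.0, 2.0, 0.0]
```

Each gene's list of slopes could then serve as the point that is passed to the clustering step.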
Clustering Problems
• Cluster d data points into k clusters, such that each point is closer to the points in its own cluster than to those of any other cluster.
• Data is usually not that clearly organized.
Lloyd’s Algorithm
• Assign points to clusters, minimizing the distance between each point and the center of its cluster.
• Assign each cluster's center of gravity as its new center.
• Repeat until the centers do not change. The goal is to minimize the squared error distortion.
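For reference, the squared error distortion is commonly defined as the mean squared distance from each point to its nearest center. The notation is an assumption and does not come from the slides: v_1, ..., v_n are the data points and X = {x_1, ..., x_k} is the set of centers.

```latex
d(v_i, X) = \min_{1 \le j \le k} \lVert v_i - x_j \rVert
\qquad
\operatorname{distortion}(V, X) = \frac{1}{n} \sum_{i=1}^{n} d(v_i, X)^2
```

Reassigning points to their nearest center and moving each center to its cluster's center of gravity can each only lower this value or leave it unchanged, which is why the iteration settles. It may settle at a local rather than a global minimum.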
The Computational Problem
• Input: A matrix of points of dimension m and the desired number of clusters k.
• Output: The points organized into k clusters, minimizing distance from each cluster's center, and a visual representation of the data.
Pseudo-pseudocode
• Arbitrarily assign k centers.
• Assign each point to one of the k clusters, minimizing Euclidean distance from its center.
• Assign each cluster's center of gravity as its new center.
• Repeat until the algorithm converges.
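The four steps above can be sketched in plain Python. This is a minimal illustration rather than the presenter's implementation. Several choices are assumptions: the initial centers are k randomly sampled input points, an empty cluster keeps its old center, and there is an iteration cap. The names `lloyd_kmeans`, `assign`, `center_of_gravity`, `seed`, and `max_iter` are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Put each point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)),
                      key=lambda i: squared_distance(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def center_of_gravity(members):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(members)
    return tuple(sum(coords) / n for coords in zip(*members))


def lloyd_kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: arbitrary centers (assumption: k distinct input points).
    centers = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Step 2: assign points to the nearest center.
        clusters = assign(points, centers)
        # Step 3: move each center to its cluster's center of gravity.
        # Assumption: an empty cluster keeps its previous center.
        new_centers = [center_of_gravity(m) if m else c
                       for m, c in zip(clusters, centers)]
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Hypothetical data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = lloyd_kmeans(data, k=2)
```

The result depends on the initial centers, so on less clearly separated data it is common to run the procedure several times with different seeds and keep the run with the lowest distortion.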
Plotting