Clustering / Unsupervised Learning
(Transcript of slides from Artificial Intelligence, Lecture 11.1, © D. Poole and A. Mackworth 2010.)
The target features are not given in the training examples.
The aim is to construct a natural classification that can be used to predict features of the data.
The examples are partitioned into clusters or classes.
Each class predicts feature values for the examples in the class:
In hard clustering, each example is placed definitively in a class.
In soft clustering, each example has a probability distribution over classes.
Each clustering has a prediction error on the examples.
The best clustering is the one that minimizes the error.
k-means algorithm
The k-means algorithm is used for hard clustering.
Inputs:
training examples
the number of classes, k
Outputs:
a prediction of a value for each feature for each class
an assignment of examples to classes
k-means algorithm formalized
E is the set of all examples.
The input features are X1, . . . , Xn.
val(e, Xj) is the value of feature Xj for example e.
There is a class for each integer i ∈ {1, . . . , k}.
The k-means algorithm outputs:
a function class : E → {1, . . . , k}, where class(e) = i means e is in class i.
a pval function, where pval(i, Xj) is the prediction for each example in class i for feature Xj.
The sum-of-squares error for class and pval is

$$\sum_{e\in E}\;\sum_{j=1}^{n}\bigl(\mathrm{pval}(\mathrm{class}(e),X_j)-\mathrm{val}(e,X_j)\bigr)^2.$$

Aim: find class and pval that minimize the sum-of-squares error.
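As a concrete reading of this objective, here is a minimal Python sketch (the slides give no code, so the language, the function name, and the tuple-of-numbers representation of examples are my own choices). `class_of` and `pval` stand in for the class and pval functions above.

```python
def sum_of_squares_error(examples, class_of, pval):
    """Sum over examples e and features j of
    (pval(class(e), X_j) - val(e, X_j))^2, as defined on the slide.

    examples: list of equal-length tuples of numbers
    class_of: function mapping an example to its class in {0, ..., k-1}
    pval:     function mapping (class, feature index) to a predicted value
    """
    return sum(
        (pval(class_of(e), j) - v) ** 2
        for e in examples
        for j, v in enumerate(e)
    )
```

For instance, with a single class that predicts the mean of each feature, `sum_of_squares_error(data, lambda e: 0, lambda i, j: means[j])` gives the error for k = 1.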
Minimizing the error
The sum-of-squares error for class and pval is

$$\sum_{e\in E}\;\sum_{j=1}^{n}\bigl(\mathrm{pval}(\mathrm{class}(e),X_j)-\mathrm{val}(e,X_j)\bigr)^2.$$
Given class, the pval that minimizes the sum-of-squares
error is the mean value for that class.
Given pval, each example can be assigned to the class
that minimizes the error for that example.
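The first claim is a one-line calculus check (this derivation is mine, not on the slides): for a fixed class i and feature Xj, the per-class error is a quadratic in the predicted value p, so setting its derivative to zero gives the mean:

$$\frac{d}{dp}\sum_{e:\,\mathrm{class}(e)=i}\bigl(p-\mathrm{val}(e,X_j)\bigr)^2
= 2\sum_{e:\,\mathrm{class}(e)=i}\bigl(p-\mathrm{val}(e,X_j)\bigr) = 0
\;\Longleftrightarrow\;
p=\frac{\sum_{e:\,\mathrm{class}(e)=i}\mathrm{val}(e,X_j)}{\bigl|\{e:\mathrm{class}(e)=i\}\bigr|}.$$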
k-means algorithm
Initially, randomly assign the examples to the classes.
Repeat the following two steps:
For each class i and feature Xj,

$$\mathrm{pval}(i,X_j)\leftarrow\frac{\sum_{e:\,\mathrm{class}(e)=i}\mathrm{val}(e,X_j)}{\bigl|\{e:\mathrm{class}(e)=i\}\bigr|},$$

For each example e, assign e to the class i that minimizes

$$\sum_{j=1}^{n}\bigl(\mathrm{pval}(i,X_j)-\mathrm{val}(e,X_j)\bigr)^2,$$
until the second step does not change the assignment of any
example.
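Putting the two steps together, here is a hedged Python sketch of this loop (the function name `kmeans` and the list-of-tuples representation are mine). The slides do not say what to do with a class that loses all its examples; reseeding it at a random example is one common choice, marked below as an assumption.

```python
import random

def kmeans(examples, k, seed=0):
    """Hard clustering by the k-means loop on this slide.
    examples: list of equal-length tuples of numbers."""
    rng = random.Random(seed)
    n = len(examples[0])
    # Initially, randomly assign the examples to the classes.
    assign = [rng.randrange(k) for _ in examples]
    while True:
        # For each class i and feature X_j, set pval(i, X_j) to the
        # mean of val(e, X_j) over the examples currently in class i.
        pval = []
        for i in range(k):
            members = [e for e, c in zip(examples, assign) if c == i]
            if not members:                       # not specified on the
                members = [rng.choice(examples)]  # slide: reseed empty class
            pval.append(tuple(sum(e[j] for e in members) / len(members)
                              for j in range(n)))
        # Assign each example e to the class i minimizing
        # sum_j (pval(i, X_j) - val(e, X_j))^2.
        new_assign = [
            min(range(k),
                key=lambda i: sum((pval[i][j] - e[j]) ** 2
                                  for j in range(n)))
            for e in examples
        ]
        # ...until the second step does not change any assignment.
        if new_assign == assign:
            return assign, pval
        assign = new_assign
```

The slides then walk through this loop on a small two-dimensional data set, shown in the figures below.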
Example Data
[Figure: scatter plot of the example data; both axes run from 0 to 10.]
Random Assignment to Classes
[Figure: the same data with a random initial assignment of examples to classes.]
Assign Each Example to Closest Mean
[Figure: each example assigned to the class with the closest mean.]
Reassign Each Example to Closest Mean
[Figure: the assignment after recomputing the class means and reassigning each example.]
Properties of k-means
An assignment of examples to classes is stable if running
both the M step and the E step does not change the
assignment.
This algorithm will eventually converge to a stable local
minimum.
Any permutation of the labels of a stable assignment is
also a stable assignment.
It is not guaranteed to converge to a global minimum.
It is sensitive to the relative scale of the dimensions.
Increasing k can always decrease error until k is the
number of different examples.
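The last property can be checked empirically with the `kmeans` sketch above (the data values here are made up for illustration). A single run may stop in a local minimum, so each k takes the best of a few random restarts.

```python
def error(examples, assign, pval):
    # The sum-of-squares error of a returned clustering, matching
    # the objective defined earlier.
    return sum((pval[c][j] - v) ** 2
               for e, c in zip(examples, assign)
               for j, v in enumerate(e))

data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.5), (9.0, 8.0), (5.0, 5.0)]

for k in range(1, len(data) + 1):
    best = min(error(data, *kmeans(data, k, seed=s)) for s in range(10))
    print(k, round(best, 2))   # best error shrinks as k grows, reaching
                               # 0 once k equals the number of examples
```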
EM Algorithm
Used for soft clustering: examples belong to classes probabilistically.
There is a k-valued random variable C.
Model: a naive Bayes network in which the class variable C is the parent of each feature X1, X2, X3, X4.
Data: a table of observed truth values for X1, . . . , X4 (rows such as ⟨t, f, t, t⟩); C is never observed.
Probabilities to learn: P(C), P(X1 | C), P(X2 | C), P(X3 | C), P(X4 | C).
EM Algorithm
The algorithm cycles between two steps:
M-step: from the augmented data table over X1, . . . , X4, C (each row carrying a count), compute the probability tables P(C), P(X1 | C), P(X2 | C), P(X3 | C), P(X4 | C).
E-step: from the current probability tables, recompute the expected counts in the augmented table.
[Figure: the augmented-count table and the probability tables, with the M-step mapping counts to probabilities and the E-step mapping probabilities back to counts.]
EM Algorithm Overview
Repeat the following two steps:
E-step: compute the expected number of data points for the unobserved variables, based on the current probability distribution.
M-step: infer the (maximum likelihood or maximum a posteriori) probabilities from the augmented data.
Start either with made-up data or made-up probabilities.
EM will converge to a local maximum.
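Here is a hedged Python sketch of this loop for the naive Bayes model on the earlier slides, with boolean features and made-up initial probabilities (the function names and the list-of-tuples representation are mine, not from the slides). The M-step uses pseudo-counts of 1, an option a later slide mentions, which also keeps the probabilities away from zero.

```python
import random

def likelihood(px_i, e):
    # P(X_1..X_n = e | C = i) under the naive Bayes independence.
    p = 1.0
    for p_true, v in zip(px_i, e):
        p *= p_true if v else 1.0 - p_true
    return p

def em(data, k, iters=100, seed=0):
    """Soft clustering: learn P(C = i) and P(X_j = true | C = i).
    data: list of boolean tuples, one per example."""
    rng = random.Random(seed)
    n = len(data[0])
    pc = [1.0 / k] * k                                   # P(C = i)
    px = [[rng.random() for _ in range(n)]               # P(X_j = t | C = i)
          for _ in range(k)]
    for _ in range(iters):
        # E-step: expected class membership P(C = i | e) for each tuple,
        # by Bayes' rule on the current probabilities.
        resp = []
        for e in data:
            joint = [pc[i] * likelihood(px[i], e) for i in range(k)]
            total = sum(joint)
            resp.append([w / total for w in joint])
        # M-step: probabilities from the augmented counts,
        # with pseudo-counts of 1 (Laplace smoothing).
        for i in range(k):
            ci = sum(r[i] for r in resp)
            pc[i] = (1.0 + ci) / (k + len(data))
            px[i] = [(1.0 + sum(r[i] for r, e in zip(resp, data) if e[j]))
                     / (2.0 + ci)
                     for j in range(n)]
    return pc, px
```

The next two slides show the same two steps on a concrete table of counts.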
Augmented Data — E step
Suppose k = 3, dom(C) = {1, 2, 3}, and
P(C = 1 | X1 = t, X2 = f, X3 = t, X4 = t) = 0.407
P(C = 2 | X1 = t, X2 = f, X3 = t, X4 = t) = 0.121
P(C = 3 | X1 = t, X2 = f, X3 = t, X4 = t) = 0.472.
Then a data row ⟨t, f, t, t⟩ with count 100 becomes three rows of the augmented table A[X1, . . . , X4, C]:

X1  X2  X3  X4  C  Count
t   f   t   t   1  40.7
t   f   t   t   2  12.1
t   f   t   t   3  47.2
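In code, the augmentation is just a proportional split of each row's count by P(C | e). Using the numbers on this slide (the variable names are mine):

```python
# P(C | X1=t, X2=f, X3=t, X4=t), copied from the slide.
p_c_given_e = {1: 0.407, 2: 0.121, 3: 0.472}
count = 100   # observed count of the tuple <t, f, t, t>

# One augmented row per class value, weighted by the posterior.
augmented = {c: count * p for c, p in p_c_given_e.items()}
print(augmented)   # approximately {1: 40.7, 2: 12.1, 3: 47.2}
```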
M step
[Figure: the augmented table over X1, . . . , X4, C with counts (e.g., ⟨t, f, t, t, 1⟩ with count 40.7) is used to fit the naive Bayes network with C as the parent of X1, . . . , X4.]

$$P(C = v_i) = \frac{\sum_{t \,\models\, C = v_i} \mathrm{Count}(t)}{\sum_{t} \mathrm{Count}(t)}$$

$$P(X_k = v_j \mid C = v_i) = \frac{\sum_{t \,\models\, C = v_i \,\land\, X_k = v_j} \mathrm{Count}(t)}{\sum_{t \,\models\, C = v_i} \mathrm{Count}(t)}$$

. . . perhaps including pseudo-counts.
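A sketch of these two formulas in Python, over augmented rows of the form (x1, . . . , xn, c, count); pseudo-counts are left out for clarity, though the slide notes they may be added. The row layout and function name are my own.

```python
def m_step(rows, k, n):
    """rows: list of (x_1, ..., x_n, c, count) with c in {1, ..., k}
    and each x_j boolean.  Returns P(C) and P(X_j = true | C)."""
    total = sum(row[-1] for row in rows)
    # Count(t) summed over tuples t with C = v_i.
    class_count = [sum(row[-1] for row in rows if row[-2] == i)
                   for i in range(1, k + 1)]
    # P(C = v_i) = class count / total count.
    p_c = [ci / total for ci in class_count]
    # P(X_j = true | C = v_i) = count with C = v_i and X_j = true,
    # divided by the count with C = v_i.
    p_x = [[sum(row[-1] for row in rows if row[-2] == i and row[j])
            / class_count[i - 1]
            for j in range(n)]
           for i in range(1, k + 1)]
    return p_c, p_x

# With the three augmented rows from the E-step slide:
rows = [(True, False, True, True, 1, 40.7),
        (True, False, True, True, 2, 12.1),
        (True, False, True, True, 3, 47.2)]
p_c, p_x = m_step(rows, k=3, n=4)
print(p_c)   # approximately [0.407, 0.121, 0.472]
```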