Machine Learning Methods for Human-Computer Interaction
Kerem Altun
Postdoctoral Fellow
Department of Computer Science
University of British Columbia
IEEE Haptics Symposium
March 4, 2012
Vancouver, B.C., Canada
Machine learning
(diagram: a taxonomy relating machine learning and pattern recognition; pattern recognition approaches include template matching, statistical pattern recognition, structural pattern recognition, and neural networks, and methods are grouped into supervised methods, such as regression, and unsupervised methods)
What is pattern recognition?
- the title even appears in the International Association for Pattern Recognition (IAPR) newsletter
- many definitions exist
- simply: the process of labeling observations (x) with predefined categories (w)
Various applications of PR
[Jain et al., 2000]
Supervised learning
(figure: three example images labeled "tufa")
Can you identify other "tufa"s here?
(lifted from lecture notes by Josh Tenenbaum)
Unsupervised learning
How many categories are there?
Which image belongs to which category?
(lifted from lecture notes by Josh Tenenbaum)
Pattern recognition in haptics/HCI
[Altun et al., 2010a]
- human activity recognition with body-worn inertial sensors (accelerometers and gyroscopes)
- daily activities: sitting, standing, walking, stairs, etc.
- sports activities: walking/running, cycling, rowing, basketball, etc.
Pattern recognition in haptics/HCI
[Altun et al., 2010a]
(figure: right-arm and left-arm accelerometer signals during walking and basketball)
Pattern recognition in haptics/HCI
[Flagg et al., 2012]
- touch gesture recognition on a conductive fur patch
Pattern recognition in haptics/HCI
[Flagg et al., 2012]
(figure: fur voltage Vfur (Volts) versus t (s) for three touch gestures: stroke, scratch, and light touch)
Other haptics/HCI applications?
Pattern recognition example
[Duda et al., 2000]
- an excellent example by Duda et al.
- classifying incoming fish on a conveyor belt using a camera image
- two classes: sea bass and salmon
Pattern recognition example
- how to classify? what kind of information can distinguish these two species?
  - length, width, weight, etc.
- suppose a fisherman tells us that salmon are usually shorter
  - so, let's use length as a feature
- what to do to classify?
  - capture image – find fish in the image – measure length – make decision
- how to make the decision?
  - how to find the threshold?
Pattern recognition example
[Duda et al., 2000]
Pattern recognition example
- on average, salmon are usually shorter, but is this a good feature?
- let's try classifying according to the lightness of the fish scales
Pattern recognition example
[Duda et al., 2000]
Pattern recognition example
- how to choose the threshold?
  - minimize the probability of error
- sometimes we should consider the costs of different errors
  - salmon is more expensive
  - customers who order salmon but get sea bass instead will be angry
  - customers who order sea bass but occasionally get salmon instead will not be unhappy
Pattern recognition example
- we don't have to use just one feature
- let's use lightness and width
  - each point is a feature vector
  - the 2-D plane is the feature space
  - the curve separating the two classes is the decision boundary
[Duda et al., 2000]
Pattern recognition example
- should we add as many features as we can?
  - do not use redundant features
  - consider the noise in the measurements
- moreover, avoid adding too many features
  - more features mean higher-dimensional feature vectors
  - it is difficult to work in high-dimensional spaces
  - this is called the curse of dimensionality; more on this later
Pattern recognition example
- how to choose the decision boundary? is this one better?
(figures: two candidate decision boundaries)
[Duda et al., 2000]
Probability theory review
- a chance experiment, e.g., tossing a 6-sided die
  - 1, 2, 3, 4, 5, 6 are the possible outcomes
  - the set of all outcomes, Ω = {1,2,3,4,5,6}, is the sample space
- any subset of the sample space is an event
  - the event that the outcome is odd: A = {1,3,5}
- each event is assigned a number called the probability of the event: P(A)
- the assigned probabilities can be selected freely, as long as the Kolmogorov axioms are not violated
Probability axioms
- for any event, P(A) ≥ 0
- for the sample space, P(Ω) = 1
- for disjoint events A1, A2, ..., P(A1 ∪ A2 ∪ ...) = P(A1) + P(A2) + ...
  - the third axiom also includes the countably infinite case
- die tossing: if all outcomes are equally likely, then for all i = 1...6, the probability of getting outcome i is 1/6
Conditional probability
- sometimes events occur and change the probabilities of other events
- example: ten coins in a bag
  - nine of them are fair coins, with heads (H) and tails (T)
  - one of them is fake, with both sides heads (H)
- I randomly draw one coin from the bag, but I don't show it to you
  - H0: the coin is fake, both sides H
  - H1: the coin is fair, one side H, the other side T
- which of these events would you bet on?
Conditional probability
- suppose I flip the coin five times, obtaining the outcome HHHHH (five heads in a row)
  - call this event F
- H0: the coin is fake, both sides H
- H1: the coin is fair, one side H, the other side T
- which of these events would you bet on now?
Conditional probability
- definition: the conditional probability of event A given that event B has occurred is
  P(A|B) = P(AB) / P(B)
  read as: "probability of A given B"
- P(AB) is the probability of events A and B occurring together
- Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
Conditional probability
H0: the coin is fake, both sides H
H1: the coin is fair, one side H, the other side T
F: obtaining five heads in a row (HHHHH)
- we know that F occurred; we want to find P(H0|F) and P(H1|F)
  - difficult to compute directly, so use Bayes' theorem:
  P(H0|F) = P(F|H0) P(H0) / P(F)
  - P(H0|F) is the posterior probability
  - P(F|H0) is the probability of observing F if H0 were true
  - P(H0) is the prior probability (before the observation F)
  - P(F) is the total probability of observing F:
  P(F) = P(F|H0) P(H0) + P(F|H1) P(H1)
- substituting the values (a numeric check in code follows below):
  P(F|H0) = 1, P(H0) = 1/10, P(F|H1) = 1/32, P(H1) = 9/10
  P(H0|F) = (1 × 1/10) / (1 × 1/10 + 1/32 × 9/10) = 32/41
- which event would you bet on now?
- this is very similar to a pattern recognition problem!
- we can put a label on the coin as "fake" based on our observations!
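The arithmetic above can be verified in a few lines of Python; this is a minimal sketch using only the values given on the slides:

# Bayes' theorem for the coin example
p_H0, p_H1 = 1/10, 9/10             # priors: fake coin, fair coin
p_F_H0, p_F_H1 = 1.0, (1/2)**5      # probability of HHHHH under each hypothesis
p_F = p_F_H0 * p_H0 + p_F_H1 * p_H1  # total probability of F
p_H0_F = p_F_H0 * p_H0 / p_F         # posterior P(H0|F)
print(p_H0_F)                        # 32/41, approximately 0.78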
Bayesian inference
w0: the coin belongs to the "fake" class
w1: the coin belongs to the "fair" class
x: observation
- decide wi if the posterior probability P(wi|x) is higher than the others
- this is called the MAP (maximum a posteriori) decision rule
Random variables
- we model the observations with random variables
- a random variable is a real number whose value depends on a chance experiment
- discrete random variable
  - the possible values form a discrete set
- continuous random variable
  - the possible values form a continuous set
Random variables
- a discrete random variable X is characterized by a probability mass function (pmf):
  p(x) = P(X = x)
- a pmf has two properties:
  p(x) ≥ 0, and the values of p(x) sum to 1 over all possible x
Random variables
- a continuous random variable X is characterized by a probability density function (pdf), denoted by p(x) for all possible values x
- probabilities are calculated for intervals:
  P(a ≤ X ≤ b) = ∫_a^b p(x) dx
Random variables
- a pdf also has two properties:
  p(x) ≥ 0 for all x, and ∫ p(x) dx = 1
Expectation
- definition:
  E[X] = Σ_x x p(x) for a discrete X, and E[X] = ∫ x p(x) dx for a continuous X
- the average of the possible values of X, weighted by probabilities
- also called the expected value, or the mean
Variance and standard deviation
- variance is the expected value of the deviation from the mean:
  Var[X] = E[(X − E[X])²]
- variance is always positive
  - or zero, which means X is not random
- standard deviation is the square root of the variance
Gaussian (normal) distribution
- possibly the most "natural" distribution
  - encountered frequently in nature
  - central limit theorem: the sum of many i.i.d. random variables is asymptotically Gaussian
- definition: the random variable with pdf
  p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²))
- two parameters: the mean μ and the variance σ²
Gaussian distribution
- it can be proved that fixed fractions of the probability mass lie within ±1σ, ±2σ, and ±3σ of the mean (about 68%, 95%, and 99.7%, respectively)
(figure lifted from http://assets.allbusiness.com)
Random vectors
- extension of the scalar case
  - pdf: p(x) for a random vector x
  - mean: μ = E[x]
  - covariance matrix: Σ = E[(x − μ)(x − μ)^T]
- the covariance matrix is always symmetric and positive semidefinite
Multivariate Gaussian distribution
- probability density function:
  p(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp(−(1/2)(x − μ)^T Σ^(−1) (x − μ))
- two parameters: the mean vector μ and the covariance matrix Σ
- compare with the univariate case:
  p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²))
Bivariate Gaussian exercise
The scatter plots show 100 independent samples drawn from zero-mean Gaussian distributions, with different covariance matrices. Match the covariance matrices with the scatter plots, by inspection only.
(figure: three scatter plots labeled a, b, and c, each on axes from −4 to 4; a code sketch for generating such samples follows below)
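For intuition, samples like those in the exercise can be generated with NumPy; this is a minimal sketch, with an illustrative covariance matrix rather than the ones from the slide:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])   # illustrative covariance matrix (not from the slide)
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=100)
plt.scatter(X[:, 0], X[:, 1])
plt.axis([-4, 4, -4, 4])
plt.show()   # the cloud elongates along the leading eigenvector of Sigma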
Bayesian decision theory
- Bayesian decision theory falls under the subjective interpretation of probability
- in the pattern recognition context, a prior belief about the class (category) of an observation is updated using the Bayes rule
Bayesian decision theory
- back to the fish example
- say we have two classes (states of nature), w1 and w2
- let P(w1) be the prior probability that the fish is a sea bass
- P(w2) is the prior probability that the fish is a salmon
Bayesian decision theory
- prior probabilities reflect our belief about which kind of fish to expect, before we observe it
- we can choose them according to the fishing location, the time of year, etc.
- if we don't have any prior knowledge, we can choose equal priors (or uniform priors)
Bayesian decision theory
- let x be the feature vector obtained from our observations
  - it can include features like lightness, weight, length, etc.
- calculate the posterior probabilities P(w1|x) and P(w2|x)
- how to calculate them?
Bayesian decision theory
- p(x|wi) is called the class-conditional probability density function (CCPDF)
  - the pdf of observation x if the true class were wi
- the CCPDF is usually not known
  - e.g., it is impossible to know the pdf of the length of all sea bass in the world
  - but it can be estimated; more on this later
- for now, assume that the CCPDF is known
  - just substitute the observation x in p(x|wi)
Bayesian decision theory
- MAP rule (also called the minimum-error rule):
  - decide w1 if P(w1|x) > P(w2|x); decide w2 otherwise
- do we really have to calculate p(x)? (no: it is common to both posteriors, so it does not affect the comparison)
Bayesian decision theory
- multiclass problems: decide wi if p(x|wi) P(wi) ≥ p(x|wj) P(wj) for all j
  - the maximum a posteriori (MAP) decision rule
- the MAP rule minimizes the error probability; it is the best performance that can be achieved (of course, if the CCPDFs are known)
- if the prior probabilities are equal: decide wi if p(x|wi) ≥ p(x|wj) for all j
  - the maximum likelihood (ML) decision rule
(a code sketch of both rules follows below)
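As a concrete illustration of the two rules, here is a minimal sketch for two classes with known Gaussian CCPDFs; the class parameters and priors are made-up values, not taken from the fish example:

import numpy as np
from scipy.stats import norm

priors = np.array([0.7, 0.3])         # P(w1), P(w2): assumed values
ccpdfs = [norm(loc=10.0, scale=2.0),  # p(x|w1): assumed
          norm(loc=7.0, scale=1.5)]   # p(x|w2): assumed

x = 8.0                                       # observed feature value
likelihoods = np.array([d.pdf(x) for d in ccpdfs])
map_class = np.argmax(likelihoods * priors)   # MAP: maximize p(x|wi) P(wi)
ml_class = np.argmax(likelihoods)             # ML: maximize p(x|wi) alone
print(map_class, ml_class)                    # here the two rules disagree: 0 1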
Exercise (single feature)
- for the class-conditional densities shown in the figure, find:
  - the maximum likelihood decision rule
[Duda et al., 2000]
Exercise (single feature)
- find:
  - the MAP decision rule, i.e., decide w1 if P(w1) p(x|w1) > P(w2) p(x|w2), and decide w2 otherwise
[Duda et al., 2000]
Discriminant functions
- we can generalize this
- let gi(x) be the discriminant function for the ith class
- decision rule: assign x to class i if gi(x) > gj(x) for all j ≠ i
- for the MAP rule: gi(x) = P(wi|x), or equivalently gi(x) = p(x|wi) P(wi)
Discriminant functions
- the discriminant functions divide the feature space into decision regions that are separated by decision boundaries
Discriminant functions for Gaussian densities
- consider a multiclass problem (c classes) with Gaussian CCPDFs
- discriminant functions: gi(x) = log p(x|wi) + log P(wi)
- it is easy to show analytically that the decision boundaries are hyperquadrics
  - conic sections, if the feature space is 2-D
  - hyperplanes (or lines for 2-D) if the covariance matrices are the same for all classes (degenerate case)
Examples
(figures: decision boundaries for Gaussian classes in 2-D and 3-D, for equal and spherical covariance matrices and for equal covariance matrices)
[Duda et al., 2000]
2-D example
- artificial data
(figure: scatter of two-class artificial data in the 2-D feature space)
[Jain et al., 2000]
Density estimation
- but CCPDFs are usually unknown
  - that's why we need training data
- density estimation is either
  - parametric: assume a class of densities (e.g., Gaussian) and find the parameters, or
  - non-parametric: estimate the pdf directly (and numerically) from the training data
Density estimation
- assume we have n samples of training vectors for a class
- we assume that these samples are independent and drawn from a certain probability distribution
  - this is called the generative approach
Parametric methods
- we will consider only the Gaussian case
  - underlying assumption: the samples are actually noise-corrupted versions of a single feature vector
- why Gaussian? three important properties:
  - completely specified by the mean and the variance
  - linear transformations of Gaussians remain Gaussian
  - central limit theorem: many phenomena encountered in reality are asymptotically Gaussian
Gaussian case
- assume the samples x_1, ..., x_n are drawn from a Gaussian distribution
- how to find the pdf?
  - finding the mean and covariance is sufficient
- sample mean: μ̂ = (1/n) Σ_k x_k
- sample covariance: Σ̂ = (1/n) Σ_k (x_k − μ̂)(x_k − μ̂)^T
(a code sketch follows below)
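In code, fitting a Gaussian to training vectors takes one line each for the mean and the covariance; a minimal sketch with NumPy (the data here is synthetic):

import numpy as np

X = np.random.default_rng(0).normal(size=(100, 2))   # n=100 training vectors, d=2
mu_hat = X.mean(axis=0)                              # sample mean
Sigma_hat = np.cov(X, rowvar=False, bias=True)       # sample covariance with the
                                                     # 1/n (maximum likelihood) scaling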
2-D example
- back to the 2-D example
- calculate the sample mean and covariance for each class, then apply the MAP rule
(figure: the decision boundary obtained with the true pdf versus the decision boundary obtained with the estimated pdf)
Haptics example
[Flagg et al., 2012]
(figure: fur voltage Vfur (Volts) versus t (s) for the stroke, scratch, and light touch gestures)
- which feature to use for discrimination?
Haptics example
[Flagg et al., 2012]
- 7 participants performed each gesture 10 times
  - 210 samples in total
- we should find distinguishing features
- let's use one feature at a time
  - we assume the feature value is normally distributed, and find its mean and covariance
Haptics example
- assume equal priors and apply the ML rule
(figure: estimated Gaussian densities of the minimum-value feature for stroke, scratch, and light touch)
Haptics example
- apply the ML rule
- decision boundaries? (decision thresholds for 1-D)
(figure: estimated Gaussian densities of the maximum-value feature for stroke, scratch, and light touch)
Haptics example
- let's plot the 2-D distribution (minimum value versus maximum value)
- clearly this isn't a "good" classifier for this problem
  - the Gaussian assumption is not valid
(figure: scatter of the two features for stroke, scratch, and light touch)
Activity recognition example
[Altun et al., 2010a]
- 4 participants (2 male, 2 female)
- activities: standing, ascending stairs, walking
- 720 samples in total
- sensor: accelerometer on the right leg
- let's use the same features
  - minimum and maximum values
Activity recognition example
(figure: estimated Gaussian densities of feature 1 (minimum value) and feature 2 (maximum value) for standing, stairs, and walking)
Activity recognition example
- the Gaussian assumption looks valid
- this is a "good" classifier for this problem
(figure: scatter of minimum value versus maximum value for standing, stairs, and walking)
Activity recognition example
(figure: the same scatter with the decision boundaries drawn)
Haptics example
- how to solve the problem?
  - either change the classifier, or change the features
Non-parametric methods
- let's estimate the CCPDF directly from the samples
- the simplest method to use is the histogram
  - partition the feature space into (equally-sized) bins
  - count the number of samples in each bin
- the estimate is p(x) ≈ k / (nV), where
  k: number of samples in the bin that includes x
  n: total number of samples
  V: volume of the bin
(a code sketch follows below)
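A minimal sketch of the histogram estimate in 1-D (the data and bin count are illustrative):

import numpy as np

x = np.random.default_rng(0).normal(size=500)   # n samples
counts, edges = np.histogram(x, bins=20)        # k for each bin
V = edges[1] - edges[0]                         # bin "volume" (its width in 1-D)
p_hat = counts / (len(x) * V)                   # p(x) ~ k / (nV) on each bin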
Non-parametric methods
- how to choose the bin size?
- the number of bins increases exponentially with the dimension of the feature space
- we can do better than that!
Non-parametric methods
- compare the following density estimates
- pdf estimates with six samples
(image from http://en.wikipedia.org/wiki/Parzen_Windows)
Kernel density estimation
- a density estimate can be obtained as
  p(x) = (1/n) Σ_{i=1..n} φ_i(x)
  where the functions φ_i are Gaussians centered at the samples x_i. More precisely,
  φ_i(x) = (1 / h_n^d) K((x − x_i) / h_n)
  K: Gaussian kernel
  h_n: width of the Gaussian
  d: dimension of the feature space
(a code sketch follows below)
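In practice a kernel density estimate can be computed with SciPy, which places a Gaussian kernel on every sample; a minimal sketch (the samples and bandwidth are illustrative):

import numpy as np
from scipy.stats import gaussian_kde

samples = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2])   # six 1-D samples
kde = gaussian_kde(samples, bw_method=0.5)   # bw_method scales the kernel width
xs = np.linspace(-5.0, 10.0, 200)
p_hat = kde(xs)                              # estimated pdf evaluated on a grid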
Kernel density estimation
- three different density estimates with different widths
  - if the width is large, the pdf estimate will be too smooth
  - if the width is small, the pdf estimate will be too spiked
  - as the width approaches zero, the estimate converges to a sum of Dirac delta functions
[Duda et al., 2000]
KDE for activity recognition data
(figure: kernel density estimates of the minimum-value and maximum-value features for standing, stairs, and walking)
KDE for activity recognition data
(figure: the minimum value versus maximum value plane for standing, stairs, and walking)
KDE for gesture recognition data
(figure: kernel density estimates of the minimum-value and maximum-value features for stroke, scratch, and light touch)
Other density estimation methods
- Gaussian mixture models
  - parametric
  - model the distribution as a sum of M Gaussians
  - optimization algorithm: expectation-maximization (EM)
- k-nearest neighbor estimation
  - non-parametric
  - variable width, fixed k
Another example
[Aksoy, 2011]
Measuring classifier performance
- how do we know our classifiers will work?
- how do we measure the performance, i.e., decide that one classifier is better than another?
  - correct recognition rate
  - confusion matrix
- ideally, we should have more data, independent of the training set, to test the classifiers
Confusion matrix
confusion matrix for an 8-class problem [Tunçel et al., 2009]
Measuring classifier performance
- use the training samples to test the classifiers?
  - this is possible, but not good practice
- 100% correct classification rate for this example!
  - because the classifier "memorized" the training samples instead of "learning" them
[Duda et al., 2000]
Cross validation
- having a separate test data set might not be possible in some cases
- we can use cross validation
  - use some of the data for training, and the remaining data for testing
- how to divide the data?
Cross validation methods
- repeated random sub-sampling
  - divide the data into two groups randomly (usually the training set is the larger one)
  - train and test, and record the correct classification rate
  - do this repeatedly and take the average
Cross validation methods
- K-fold cross validation (a code sketch follows below)
  - randomly divide the data into K sets
  - use K−1 sets for training and 1 set for testing
  - repeat K times, at each fold using a different set for testing
- leave-one-out cross validation
  - use one sample for testing and all the remaining samples for training
  - the same as K-fold cross validation, with K equal to the total number of samples
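A minimal sketch of K-fold cross validation in NumPy; train_and_test is a stand-in for whatever classifier is being evaluated and is assumed to return the correct classification rate on the test fold:

import numpy as np

def k_fold_rate(X, y, k, train_and_test, seed=0):
    # randomly divide the data into k sets
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    rates = []
    for fold in folds:
        test = np.zeros(len(X), dtype=bool)
        test[fold] = True
        # train on k-1 sets, test on the held-out set
        rates.append(train_and_test(X[~test], y[~test], X[test], y[test]))
    return np.mean(rates)   # average correct classification rate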
Haptics example
- assume equal priors and apply the ML rule with the minimum-value feature
(figure: estimated Gaussian densities of the minimum-value feature)
confusion matrix (rows: true class, columns: assigned class):
              stroke  scratch  light touch
stroke          53      16        1
scratch          2      66        2
light touch     35      28        7
correct recognition rate: 60.0%
- the decision region for light touch is too small!!
Haptics example
- apply the ML rule with the maximum-value feature
(figure: estimated Gaussian densities of the maximum-value feature)
confusion matrix (rows: true class, columns: assigned class):
              stroke  scratch  light touch
stroke          61       0        9
scratch         13      24       33
light touch     18      14       38
correct recognition rate: 58.5%
Haptics example
- the same classification with kernel density estimates of the features
(figures: kernel density estimates of the minimum-value and maximum-value features)
confusion matrix, minimum value (rows: true class, columns: assigned class):
              stroke  scratch  light touch
stroke          48      16        6
scratch          2      67        1
light touch     32      30        8
correct recognition rate: 58.8%
confusion matrix, maximum value:
              stroke  scratch  light touch
stroke          60       0       10
scratch          4      23       43
light touch      9      13       48
correct recognition rate: 62.4%
Activity recognition example
(figures: estimated densities of the minimum-value and maximum-value features for standing, stairs, and walking)
confusion matrix, minimum value (rows: true class, columns: assigned class):
             standing  stairs  walking
standing        239       1       0
stairs            5     171      64
walking           0     132     108
correct recognition rate: 71.9%
confusion matrix, maximum value:
             standing  stairs  walking
standing        232       8       0
stairs           41     146      53
walking           0      72     168
correct recognition rate: 75.8%
Activity recognition example
- using both features together:
confusion matrix (rows: true class, columns: assigned class):
             standing  stairs  walking
standing        239       1       0
stairs            0     209      31
walking           0      56     184
correct recognition rate: 87.8%
(figure: scatter of minimum value versus maximum value for standing, stairs, and walking)
Another cross-validation method
- used in HCI studies with multiple human subjects
- subject-based leave-one-out cross validation
  - number of subjects: S
  - leave one subject's data out, and train with the remaining data
  - repeat S times, each time testing with a different subject, then average
  - gives an estimate of the expected correct recognition rate when a new user is encountered
Activity recognition example
K-fold cross validation:
minimum value:
             standing  stairs  walking
standing        239       1       0
stairs            5     171      64
walking           0     132     108
correct recognition rate: 71.9%
maximum value:
             standing  stairs  walking
standing        232       8       0
stairs           41     146      53
walking           0      72     168
correct recognition rate: 75.8%
subject-based leave-one-out:
minimum value:
             standing  stairs  walking
standing        180      60       0
stairs           13     150      77
walking           1     125     114
correct recognition rate: 61.6%
maximum value:
             standing  stairs  walking
standing        134     106       0
stairs           42     135      63
walking           0      71     169
correct recognition rate: 60.8%
Activity recognition example
- both features together:
K-fold cross validation:
             standing  stairs  walking
standing        239       1       0
stairs            0     209      31
walking           0      56     184
correct recognition rate: 87.8%
subject-based leave-one-out:
             standing  stairs  walking
standing        206      34       0
stairs            0     182      58
walking           0      39     201
correct recognition rate: 81.8%
(figure: scatter of minimum value versus maximum value for standing, stairs, and walking)
Dimensionality reduction
[Duda et al., 2000]
- for most problems, a few features are not enough
- adding features sometimes helps
Dimensionality reduction
[Jain et al., 2000]
- should we add as many features as we can?
- what does this figure say?
Dimensionality reduction
- we should add features only up to a certain point
  - the more training samples we have, the farther away this point is
- more features = higher-dimensional spaces
  - in higher dimensions, we need more samples to estimate the parameters and the densities accurately
  - the number of necessary training samples grows exponentially with the dimension of the feature space
  - this is called the curse of dimensionality
Dimensionality reduction
- how many features to use?
  - rule of thumb: use at least ten times as many training samples as features
- which features to use?
  - difficult to know beforehand
  - one approach: consider many features and select among them
Pen input recognition
[Willems, 2010]
Touch gesture recognition
[Flagg et al., 2012]
Feature reduction and selection
- form a set of many features
  - some of them might be redundant
- feature reduction (sometimes called feature extraction)
  - form linear or nonlinear combinations of the features
  - the features in the reduced set usually don't have physical meaning
- feature selection
  - select the most discriminative features from the set
Feature reduction
- we will only consider Principal Component Analysis (PCA)
- PCA is an unsupervised method
  - we don't care about the class labels
  - consider the distribution of all the feature vectors in the d-dimensional feature space
- PCA is the projection onto a lower-dimensional space that "best represents the data"
  - get rid of unnecessary dimensions
Principal component analysis
- how to "best represent the data"?
  - find the direction(s) in which the variance of the data is the largest
(figure: a 2-D point cloud and its direction of largest variance)
Principal component analysis
- find the covariance matrix Σ
- spectral decomposition: Σ = V Λ V^T
  - eigenvalues: on the diagonal of Λ
  - eigenvectors: the columns of V
- the covariance matrix is symmetric and positive semidefinite, so the eigenvalues are nonnegative and the eigenvectors are orthogonal
Principal component analysis
- put the eigenvalues in decreasing order
- the corresponding eigenvectors show the principal directions, in which the variance of the data is largest
- say we want to have m features only
  - project onto the space spanned by the first m eigenvectors
(a code sketch follows below)
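A minimal sketch of these steps in NumPy (the data is synthetic and m is illustrative):

import numpy as np

X = np.random.default_rng(0).normal(size=(200, 5))   # 200 samples, d=5 features
Xc = X - X.mean(axis=0)                              # center the data
Sigma = np.cov(Xc, rowvar=False)                     # covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)             # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]                    # eigenvalues in decreasing order
m = 2
W = eigvecs[:, order[:m]]                            # first m principal directions
X_reduced = Xc @ W                                   # project to the m-D subspace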
Activity recognition example
[Altun et al., 2010a]
- five sensor units (wrists, legs, chest)
  - each unit has three accelerometers, three gyroscopes, and three magnetometers
  - 45 sensors in total
- computed 26 features from the sensor signals
  - mean, variance, min, max, Fourier transform, etc.
  - 45 × 26 = 1170 features
Activity recognition example
- compute the covariance matrix
- find the eigenvalues and eigenvectors
- plot the first 100 eigenvalues
- reduced the number of features to 30
Activity recognition example
Activity recognition example
what does the Bayesian decision making (BDM) result suggest?
Feature reduction
- ideally, this should be done for the training set only
  - estimate Σ from the training set, find the eigenvalues and eigenvectors, and form the projection
  - apply the projection to the test vectors
- for example, for K-fold cross validation this should be done K times
  - computationally expensive
Feature selection
- alternatively, we can select from our large feature set
- say we have d features and we want to reduce the number to m
  - optimal way: evaluate all (d choose m) possibilities and choose the best one
    - not feasible, except for small values of m and d
  - suboptimal methods: greedy search
Feature selection
- best individual features
  - evaluate all d features individually, and select the best m features
Feature selection
- sequential forward selection (a code sketch follows below)
  - start with the empty set
  - evaluate all features one by one, select the best one, and add it to the set
  - form pairs of features with this one and each of the remaining features, and add the best one to the set
  - form triplets of features with these two and each of the remaining features, and add the best one to the set
  - ...
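A minimal sketch of sequential forward selection; the score function is a stand-in for evaluating a classifier with a given feature subset (e.g., by cross validation):

def sequential_forward_selection(features, m, score):
    # greedily grow the selected set to size m
    selected, remaining = [], list(features)
    while len(selected) < m and remaining:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)      # add the best feature found at this step
        remaining.remove(best)
    return selected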
Feature selection
- sequential backward selection
  - start with the full feature set
  - evaluate by removing one feature at a time from the set, then remove the worst feature
  - continue the previous step with the current feature set
  - ...
Feature selection
- plus p – take away r selection
  - first enlarge the feature set by adding p features using sequential forward selection
  - then remove r features using sequential backward selection
Activity recognition example
[Altun et al., 2010b]
(figures: the first 5 features selected by sequential forward selection, and the first 5 features selected by PCA)
- SFS performs better than PCA for a few features; if 10-15 features are used, their performances become closer
- time-domain features and leg features are more discriminative
Activity recognition example
[Altun et al., 2010b]
Discriminative methods
- we talked about discriminant functions
  - for the MAP rule we used gi(x) = p(x|wi) P(wi)
- discriminative methods try to find gi(x) directly from the data
Linear discriminant functions
- consider a discriminant function that is a linear combination of the components of x:
  g(x) = w^T x + w0
- for the two-class case, there is a single decision boundary, g(x) = 0
Linear discriminant functions
- for the multiclass case, there are options:
  - c two-class problems, separating each class from all the others
  - considering the classes pairwise
Linear discriminant functions
(figures: distinguishing one class from the others, and considering the classes pairwise)
[Duda et al., 2000]
Linear discriminant functions
- or, use the original definition:
  - assign x to class i if gi(x) > gj(x) for all j ≠ i
[Duda et al., 2000]
Nearest mean classifier
- find the means of the training vectors for each class
- assign the class of the nearest mean to a test vector y
(a code sketch follows below)
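A minimal sketch of the nearest mean classifier in NumPy (the function and variable names are mine):

import numpy as np

def fit_means(X, y):
    # per-class means of the training vectors
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, means

def predict_nearest_mean(classes, means, Y):
    # Euclidean distance from each test vector to each class mean
    d = np.linalg.norm(Y[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]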
2-D example
- artificial data, with the estimated parameters
(figure: the decision boundary with the true pdf versus the decision boundary with the nearest mean classifier)
Activity recognition example
(figure: scatter of minimum value versus maximum value for standing, stairs, and walking)
k-nearest neighbor method
- for a test vector y:
  - find the k closest training vectors
  - let ki be the number of training vectors belonging to class i among these k vectors; assign the class with the largest ki
- simplest case: k = 1
  - just find the closest training vector and assign its class
- decision boundaries: a Voronoi tessellation of the space
(a code sketch follows below)
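A minimal sketch of the k-nearest neighbor rule in NumPy (names are mine; Euclidean distance):

import numpy as np

def knn_predict(X_train, y_train, y_vec, k=3):
    dists = np.linalg.norm(X_train - y_vec, axis=1)   # distances to all training vectors
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # majority vote among the k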
1-nearest neighbor
- decision regions: this is called a Voronoi tessellation
[Duda et al., 2000]
k-nearest neighbor
- a test sample (circle) among training samples of one class (squares) and another class (triangles)
- note how the decision is different for k=3 and k=5
(figure from http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm)
k-nearest neighbor
- no training is needed
- the computation time for testing is high
  - many techniques exist to reduce the computational load
- alternatives exist for computing the distance
  - Manhattan distance (L1 norm)
  - chessboard distance (L∞ norm)
Haptics example
- k-NN results with the two features
(figure: scatter of minimum value versus maximum value for stroke, scratch, and light touch)
K-fold cross validation (rows: true class, columns: assigned class):
              stroke  scratch  light touch
stroke          52       6       12
scratch          7      40       23
light touch     13      16       41
correct recognition rate: 63.3%
subject-based leave-one-out:
              stroke  scratch  light touch
stroke          50       6       14
scratch          7      41       22
light touch     14      23       33
correct recognition rate: 59.0%
Activity recognition example
- k-NN results with the two features
K-fold cross validation (rows: true class, columns: assigned class):
             standing  stairs  walking
standing        240       0       0
stairs            0     206      34
walking           0      38     202
correct recognition rate: 90.0%
subject-based leave-one-out:
             standing  stairs  walking
standing        240       0       0
stairs            0     202      38
walking           0      40     200
correct recognition rate: 89.2%
Activity recognition example
(figure: the decision boundaries for k = 3 on the minimum value versus maximum value plane, for standing, stairs, and walking)
Feature normalization
- especially when computing distances, the scales of the feature axes are important
  - features with large ranges may be weighted more
- feature normalization can be applied so that the ranges are similar
Feature normalization
- linear scaling:
  x' = (x − l) / (u − l)
  where l is the lowest value and u is the largest value of the feature x
- normalization to zero mean and unit variance:
  x' = (x − m) / s
  where m is the mean value and s is the standard deviation of the feature x
- other methods exist
(a code sketch follows below)
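Both normalizations as a minimal NumPy sketch (synthetic data); note that l, u, m, and s are computed per feature, column-wise:

import numpy as np

X = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(100, 4))
l, u = X.min(axis=0), X.max(axis=0)
X_linear = (X - l) / (u - l)          # linear scaling to [0, 1]
m, s = X.mean(axis=0), X.std(axis=0)
X_standard = (X - m) / s              # zero mean, unit variance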
Feature normalization
- ideally, the parameters l, u, m, and s should be estimated from the training set only, and then used on the test vectors
- for example, for K-fold cross validation this should be done K times
Discriminative methods
- another popular method is the binary decision tree
  - start from the root node
  - proceed in the tree by setting thresholds on the feature values
  - proceed by sequentially answering questions like "is feature j less than threshold value Tk?"
Activity recognition example
(figure: scatter of minimum value versus maximum value for standing, stairs, and walking)
Discriminative methods
[Aksoy, 2011]
- one very popular method is the support vector machine classifier
  - a linear classifier, applicable to linearly separable data
  - if the data is not linearly separable, it maps the data to a higher-dimensional space
    - usually a Hilbert space
Comparison for activity recognition
- 1170 features reduced to 30 by PCA
- 19 activities
- 8 participants
References
S. Aksoy, Pattern Recognition lecture notes, Bilkent University, Ankara, Turkey, 2011.
A. Moore, Statistical Data Mining tutorials (http://www.autonlab.org/tutorials).
J. Tenenbaum, The Cognitive Science of Intuitive Theories lecture notes, Massachusetts Institute of Technology, MA, USA, 2006. (accessed online: http://www.mit.edu/~jbt/9.iap/9.94.Tenenbaum.ppt)
R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2000.
A. K. Jain, R. P. W. Duin, J. Mao, "Statistical pattern recognition: a review," IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):4–37, January 2000.
A. R. Webb, Statistical Pattern Recognition, 2nd ed., John Wiley & Sons, West Sussex, England, 2002.
V. N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed., Springer-Verlag New York, Inc., 2000.
K. Altun, B. Barshan, O. Tunçel, (2010a) "Comparative study on classifying human activities with miniature inertial/magnetic sensors," Pattern Recognition, 43(10):3605–3620, October 2010.
K. Altun, B. Barshan, (2010b) "Human activity recognition using inertial/magnetic sensor units," in Human Behavior Understanding, Lecture Notes in Computer Science, A. A. Salah et al. (eds.), vol. 6219, pp. 38–51, Springer, Berlin, Heidelberg, August 2010.
A. Flagg, D. Tam, K. MacLean, R. Flagg, "Conductive fur sensing for a gesture-aware furry robot," Proceedings of the IEEE 2012 Haptics Symposium, March 4–7, 2012, Vancouver, B.C., Canada.
O. Tunçel, K. Altun, B. Barshan, "Classifying human leg motions with uniaxial piezoelectric gyroscopes," Sensors, 9(11):8508–8546, November 2009.
D. Willems, Interactive Maps – using the pen in human-computer interaction, PhD Thesis, Radboud University Nijmegen, Netherlands, 2010. (accessed online: http://www.donwillems.net/waaaa/InteractiveMaps_PhDThesis_DWillems.pdf)