ECSE 6610
Pattern Recognition
Professor Qiang Ji
Spring, 2011
Pattern Recognition Overview
[Diagram: pattern recognition pipeline.
Training: raw data → feature extraction → training features → train the classifier/regressor using known output values.
Testing: new raw data → feature extraction → features → learned classifier/regressor → output values.]
Feature extraction: extract the most discriminative features to concisely represent
the original data, typically involving dimensionality reduction
Training/Learning: learn a mapping function that maps input to output
Classification/regression: map the input to a discrete output value for classification
and to a continuous output value for regression.
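As a concrete illustration of this pipeline, the following sketch (assuming scikit-learn is available, with synthetic data standing in for the raw inputs) uses PCA as the feature-extraction/dimensionality-reduction step and logistic regression as the classifier:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 20))                         # raw training data
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)    # known output values
X_test = rng.normal(size=(10, 20))                           # raw test data

model = make_pipeline(PCA(n_components=5),    # feature extraction
                      LogisticRegression())   # classifier/regressor
model.fit(X_train, y_train)                   # training/learning
print(model.predict(X_test))                  # classification of new data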
Pattern Recognition Overview (cont’d)
• Supervised learning
  – Both input (features) and output (class labels) are provided
• Unsupervised learning: only the input is given
  – Clustering
  – Dimensionality reduction
  – Density estimation
• Semi-supervised learning: some inputs have output labels and others do not
Examples of Pattern
Recognition Applications
• Computer/Machine Vision
  – Object recognition, activity recognition, image segmentation, inspection
• Medical Imaging
  – Cell classification
• Optical Character Recognition
  – Machine- or hand-written character/digit recognition
• Brain Computer Interface
  – Classify human brain states from EEG signals
• Speech Recognition
  – Speaker recognition, speech understanding, language translation
• Robotics
  – Obstacle detection, scene understanding, navigation
Computer Vision Example:
Facial Expression Recognition
Machine Vision Example
Example: Handwritten Digit Recognition
Probability Calculus
U is the sample space.
X is a subset of U, i.e., an event.
P(X ∨ Y) = P(X) + P(Y) − P(X ∧ Y)
P(X ∨ Y) = P(X) + P(Y) when P(X ∧ Y) = 0, i.e., X and Y are mutually exclusive.
Probability Calculus (cont’d)
• Conditional independence
X ⊥ Y | Z  ⇔
P(X | Y, Z) = P(X | Z), or
P(X, Y | Z) = P(X | Z) P(Y | Z)
• The Chain Rule
Given three events A, B, C,
P(A, B, C) = P(A | B, C) P(B | C) P(C)
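The chain rule can be checked numerically on any small joint distribution. The sketch below (plain Python; the joint table is invented purely for illustration) factors P(A, B, C) as P(A | B, C) P(B | C) P(C):

import itertools

# joint[(a, b, c)] = P(A=a, B=b, C=c); made-up values that sum to 1
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.15, (0, 1, 1): 0.10,
    (1, 0, 0): 0.20, (1, 0, 1): 0.05, (1, 1, 0): 0.05, (1, 1, 1): 0.30,
}

def marginal(fixed):
    # Sum the joint over all entries that match the fixed coordinate values
    return sum(p for k, p in joint.items()
               if all(k[i] == v for i, v in fixed.items()))

for a, b, c in itertools.product([0, 1], repeat=3):
    p_c = marginal({2: c})                                    # P(C)
    p_b_given_c = marginal({1: b, 2: c}) / p_c                # P(B | C)
    p_a_given_bc = joint[(a, b, c)] / marginal({1: b, 2: c})  # P(A | B, C)
    assert abs(joint[(a, b, c)] - p_a_given_bc * p_b_given_c * p_c) < 1e-12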
The Rules of Probability
• Sum Rule: P(X) = Σ_Y P(X, Y)
• Product Rule: P(X, Y) = P(X | Y) P(Y)
Combining the sum and product rules yields
P(X) = Σ_Y P(X | Y) P(Y)
and the conditional sum rule
P(C | A) = Σ_B P(C | A, B) P(B | A)
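A quick numerical version of the combined rule P(X) = Σ_Y P(X | Y) P(Y), with a prior and conditional table invented for illustration:

p_y = {"y0": 0.3, "y1": 0.7}                   # P(Y)
p_x_given_y = {                                # P(X | Y)
    "y0": {"x0": 0.9, "x1": 0.1},
    "y1": {"x0": 0.2, "x1": 0.8},
}

# Marginalize out Y to obtain P(X)
p_x = {x: sum(p_x_given_y[y][x] * p_y[y] for y in p_y) for x in ("x0", "x1")}
print(p_x)   # approximately {'x0': 0.41, 'x1': 0.59}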
Bayes’ Theorem
posterior ∝ likelihood × prior
Bayes Rule
p(A | B) = p(A, B) / p(B) = p(B | A) p(A) / p(B)
p(Ai | E) = p(E | Ai) p(Ai) / p(E) = p(E | Ai) p(Ai) / Σi p(E | Ai) p(Ai)
[Diagram: the evidence E overlaps a partition of the sample space into A1, A2, ..., A6]
• Based on the definition of conditional probability
• p(Ai | E) is the posterior probability of Ai given the evidence E
• p(Ai) is the prior probability
• p(E | Ai) is the likelihood of the evidence given Ai
• p(E) is the probability of the evidence
Bayesian Rule (cont’d)
P(H | E1, E2) = P(H | E1) P(E2 | E1, H) / P(E2 | E1)
             = P(H | E1) P(E2 | E1, H) / ΣH P(H | E1) P(E2 | E1, H)
Assuming E1 and E2 are independent given H, the above equation may be written as
P(H | E1, E2) = P(H | E1) P(E2 | H) / ΣH P(H | E1) P(E2 | H)
where P(H | E1) is the prior and P(E2 | H) is the likelihood of H given E2.
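The same sequential update can be written as two applications of Bayes' rule, first folding in E1 and then E2 (assumed conditionally independent given H). All numbers below are made up for illustration:

p_h = {"h0": 0.6, "h1": 0.4}              # prior P(H)
p_e1_given_h = {"h0": 0.2, "h1": 0.7}     # P(E1 | H)
p_e2_given_h = {"h0": 0.5, "h1": 0.9}     # P(E2 | H)

def update(prior, likelihood):
    # One Bayes update: posterior(h) is proportional to prior(h) * likelihood(h)
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

p_h_given_e1 = update(p_h, p_e1_given_h)              # P(H | E1)
p_h_given_e1_e2 = update(p_h_given_e1, p_e2_given_h)  # P(H | E1, E2)
print(p_h_given_e1_e2)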
A Simple Example
Consider two related variables:
1. Drug (D) with values y or n
2. Test (T) with values +ve or –ve
And suppose we have the following probabilities:
P(D = y) = 0.001
P(T = +ve | D = y) = 0.8
P(T = +ve | D = n) = 0.01
These probabilities are sufficient to define a joint probability distribution.
Suppose an athlete tests positive. What is the probability that he has
taken the drug?
P(D = y | T = +ve) = P(T = +ve | D = y) P(D = y) / [P(T = +ve | D = y) P(D = y) + P(T = +ve | D = n) P(D = n)]
                   = (0.8 × 0.001) / (0.8 × 0.001 + 0.01 × 0.999)
                   ≈ 0.074
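The arithmetic of this example in a few lines of Python:

p_d = 0.001                  # P(D = y)
p_pos_given_d = 0.8          # P(T = +ve | D = y)
p_pos_given_not_d = 0.01     # P(T = +ve | D = n)

p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)  # P(T = +ve)
p_d_given_pos = p_pos_given_d * p_d / p_pos                  # P(D = y | T = +ve)
print(round(p_d_given_pos, 3))   # 0.074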
Expectation (or Mean)
• For a discrete RV X
E(X) = Σx x · p(x)
• For a continuous RV X
E(X) = ∫ x · p(x) dx
• Conditional Expectation
E(X | y) = ∫ x · p(x | y) dx
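A small sketch of both forms of the expectation, using an invented pmf for the discrete case and a grid approximation of the integral for the continuous case (a standard normal, whose true mean is 0):

import numpy as np

# Discrete: E(X) = sum over x of x * p(x)
xs = np.array([0, 1, 2, 3])
pmf = np.array([0.1, 0.2, 0.3, 0.4])
print((xs * pmf).sum())                 # 2.0

# Continuous: E(X) = integral of x * p(x) dx, approximated on a fine grid
grid = np.linspace(-10, 10, 100001)
pdf = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)
dx = grid[1] - grid[0]
print((grid * pdf * dx).sum())          # close to 0.0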
Expectations
Conditional Expectation (discrete): E(X | y) = Σx x · p(x | y)
Approximate Expectation (discrete and continuous): E(X) ≈ (1/N) Σn xn, where the xn are N samples drawn from p(x)
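The approximate expectation is just a sample average over draws from p(x); the same idea applies to expectations of functions of X. For example, with p a standard normal, the sample average of xn² converges to E(X²) = 1:

import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)   # x_n ~ p(x)
print(np.mean(samples**2))               # close to 1.0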
Variance
• The variance of a RV X
Var(X) = E[(X − E(X))²] = E(X²) − E²(X)
• Standard deviation
σX = √Var(X)
• Covariance of RVs X and Y
σ²XY = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X) E(Y)
• Chebyshev inequality
P(|X − E(X)| ≥ k·σX) ≤ 1/k²
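These identities can be checked on simulated data (the distributions below are arbitrary choices; the equalities hold only approximately for finite samples):

import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200_000)
y = 0.5 * x + rng.normal(size=x.size)      # deliberately correlated with x

var_def = np.mean((x - x.mean())**2)       # E[(X - E(X))^2]
var_alt = np.mean(x**2) - x.mean()**2      # E(X^2) - E^2(X)
cov_def = np.mean((x - x.mean()) * (y - y.mean()))
cov_alt = np.mean(x * y) - x.mean() * y.mean()
print(var_def, var_alt)                    # nearly equal
print(cov_def, cov_alt)                    # nearly equal

# Chebyshev: the fraction of samples at least k standard deviations from the
# mean is at most 1/k^2
k = 3.0
print(np.mean(np.abs(x - x.mean()) >= k * x.std()), "<=", 1 / k**2)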
Independence
• If X and Y are independent, then
E(X · Y) = E(X) E(Y)
Var(X + Y) = Var(X) + Var(Y)
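A quick check of both identities on independently generated samples:

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, size=500_000)
y = rng.normal(3, 2, size=x.size)          # generated independently of x

print(np.mean(x * y), np.mean(x) * np.mean(y))   # nearly equal
print(np.var(x + y), np.var(x) + np.var(y))      # nearly equal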
Probability Densities
p(x) is the density function, while P(x) is the cumulative distribution. P(x) is a
non-decreasing function.
Transformed Densities
The Gaussian Distribution
Gaussian Mean and Variance
The Multivariate Gaussian
μ: mean vector
Σ: covariance matrix
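A minimal sketch of evaluating the multivariate Gaussian density N(x | μ, Σ), with a made-up mean vector and covariance matrix:

import numpy as np

def mvn_pdf(x, mu, sigma):
    # N(x | mu, Sigma) = (2*pi)^(-d/2) |Sigma|^(-1/2) exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu))
    d = mu.size
    diff = x - mu
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(sigma) ** (-0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff))

mu = np.array([0.0, 1.0])
sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
print(mvn_pdf(np.array([0.5, 0.5]), mu, sigma))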
Minimum Misclassification Rate
Two types of mistakes:
False positive (type 1)
False negative (type 2)
The resulting probability of error is called the Bayes error. The minimum Bayes error is achieved by placing the decision boundary at x0, the point where the two classes' joint densities p(x, Ck) are equal.
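A sketch of this idea for two classes with made-up 1D Gaussian class-conditional densities: the Bayes-optimal rule assigns x to the class with the larger joint p(x, Ck) = p(x | Ck) p(Ck), the boundary x0 is where the two joints cross, and the Bayes error is the area under the smaller of the two joints:

import numpy as np

prior = np.array([0.5, 0.5])
means = np.array([-1.0, 2.0])
stds = np.array([1.0, 1.0])

def joint(x, k):
    # p(x, Ck) = p(Ck) * Gaussian density p(x | Ck)
    pdf = np.exp(-(x - means[k])**2 / (2 * stds[k]**2)) / (stds[k] * np.sqrt(2 * np.pi))
    return prior[k] * pdf

grid = np.linspace(-6.0, 8.0, 20001)
j0, j1 = joint(grid, 0), joint(grid, 1)
x0 = grid[np.argmax(j1 > j0)]                    # first point where class 1 wins
dx = grid[1] - grid[0]
bayes_error = np.sum(np.minimum(j0, j1)) * dx    # area under the smaller joint
print(x0, bayes_error)                           # x0 ≈ 0.5 for these equal priors and variances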
Generative vs Discriminative
Generative approach:
Model the class-conditional densities p(x | Ck) and priors p(Ck) (i.e., the joint p(x, Ck))
Use Bayes' theorem to obtain the posterior p(Ck | x)
Discriminative approach:
Model the posterior p(Ck | x) directly
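One common way to contrast the two approaches, assuming scikit-learn is available and using synthetic two-class data: GaussianNB is generative (it fits p(x | Ck) and p(Ck), then applies Bayes' theorem), while LogisticRegression is discriminative (it models p(Ck | x) directly):

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)),    # class 0 samples
               rng.normal(2, 1, (200, 2))])   # class 1 samples
y = np.array([0] * 200 + [1] * 200)

generative = GaussianNB().fit(X, y)
discriminative = LogisticRegression().fit(X, y)

x_new = np.array([[1.0, 1.0]])
print(generative.predict_proba(x_new))        # posterior via Bayes' theorem
print(discriminative.predict_proba(x_new))    # posterior modeled directly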