CS 59000 Statistical Machine Learning
Lecture 12
Yuan (Alan) Qi
Outline
• Review of Laplace approximation, BIC,
Bayesian logistic regression
• Kernel methods
• Kernel ridge regression
• Kernel construction
• Kernel principal component analysis
Laplace Approximation for Posterior
Gaussian approximation around mode:
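A minimal sketch of the standard form, assuming Bishop's notation (the slide's own equation is not preserved in this transcript):

  q(z) = \mathcal{N}(z \mid z_0, A^{-1}), \qquad A = -\nabla\nabla \ln p(z)\big|_{z=z_0}

where z_0 is a mode of p(z) and A is the negative Hessian of \ln p(z) at that mode.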
Evidence Approximation
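The evidence approximation that follows from the Laplace fit is presumably the standard one:

  \ln p(D) \simeq \ln p(D \mid \theta_{MAP}) + \ln p(\theta_{MAP}) + \tfrac{M}{2} \ln(2\pi) - \tfrac{1}{2} \ln |A|

with M the number of parameters and A the negative Hessian of the log joint at \theta_{MAP}.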
Bayesian Information Criterion
An approximation of the Laplace approximation (a more accurate evidence approximation is sometimes needed):
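In standard form (a sketch, not the slide's exact expression), BIC keeps only the leading terms:

  \ln p(D) \simeq \ln p(D \mid \theta_{MAP}) - \tfrac{M}{2} \ln N

where N is the number of data points; dropping the prior and Hessian terms is what makes it cruder than the full Laplace evidence.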
Bayesian Logistic Regression
Kernel Methods
Predictions are linear combinations of a kernel
function evaluated at training data points.
Kernel function <-> feature space mapping
Linear kernel:
Stationary kernels:
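In symbols (notation assumed, following the usual definitions):

  k(x, x') = \phi(x)^T \phi(x')      (kernel = inner product of feature mappings)
  k(x, x') = x^T x'                  (linear kernel: identity feature map)
  k(x, x') = k(x - x')               (stationary kernel: depends only on the difference, hence translation invariant)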
Fast Evaluation of Inner Product of Feature
Mappings by Kernel Functions
Computing the inner product in feature space requires computing six feature values and 3 x 3 = 9 multiplications.
The kernel function requires only 2 multiplications and a squaring.
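The example is presumably the standard two-dimensional one:

  k(x, z) = (x^T z)^2 = (x_1 z_1 + x_2 z_2)^2
          = x_1^2 z_1^2 + 2 x_1 z_1 x_2 z_2 + x_2^2 z_2^2
          = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)\,(z_1^2, \sqrt{2}\, z_1 z_2, z_2^2)^T = \phi(x)^T \phi(z)

so the direct route forms two 3-dimensional feature vectors (six feature values) before taking their inner product, while the kernel route only forms x^T z (2 multiplications) and squares it.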
Kernel Trick
1. Reformulate an algorithm so that the input vector enters only in the form of inner products.
2. Replace the input x by its feature mapping:
3. Replace the inner product by a kernel function:
Examples: Kernel PCA, Kernel Fisher
discriminant, Support Vector Machines
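The three steps amount to the substitution (a symbolic sketch, notation assumed):

  x^T x' \;\longrightarrow\; \phi(x)^T \phi(x') \;\longrightarrow\; k(x, x')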
Dual Representation for Ridge Regression
Dual variables:
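A sketch of the standard derivation (Bishop-style notation assumed): with the regularized sum-of-squares error

  J(w) = \tfrac{1}{2} \sum_{n=1}^{N} (w^T \phi(x_n) - t_n)^2 + \tfrac{\lambda}{2} w^T w,

setting \nabla_w J = 0 gives w = \Phi^T a, where the dual variables are a_n = -\tfrac{1}{\lambda}(w^T \phi(x_n) - t_n).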
Kernel Ridge Regression
Using kernel trick:
Now the cost function depends on input only
through the Gram matrix.
Kernel Ridge Regression
Equivalent cost function over dual variables:
Minimize over dual variables:
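In standard form (a sketch, notation assumed): substituting w = \Phi^T a and K = \Phi \Phi^T, with K_{nm} = k(x_n, x_m), gives

  J(a) = \tfrac{1}{2} a^T K K a - a^T K t + \tfrac{1}{2} t^T t + \tfrac{\lambda}{2} a^T K a,

whose minimizer is a = (K + \lambda I_N)^{-1} t, so the prediction for a new input is

  y(x) = k(x)^T (K + \lambda I_N)^{-1} t, \qquad k_n(x) = k(x_n, x).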
Constructing Kernel Functions
Example: Gaussian kernel
Consider Gaussian kernel:
Why is it a valid kernel?
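The standard argument (a sketch; the slide's own derivation is not preserved) writes the Gaussian kernel as

  k(x, x') = \exp(-\|x - x'\|^2 / 2\sigma^2)
           = \exp(-x^T x / 2\sigma^2)\, \exp(x^T x' / \sigma^2)\, \exp(-x'^T x' / 2\sigma^2).

The middle factor is the exponential of a valid kernel (a limit of polynomials in x^T x' with nonnegative coefficients), and multiplying a valid kernel by f(x) f(x') preserves validity; the corresponding feature space is infinite-dimensional.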
Generalization:
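The generalization is presumably the usual one: replace the Euclidean inner product by any valid nonlinear kernel \kappa(x, x'), giving

  k(x, x') = \exp\big( -(\kappa(x, x) - 2\kappa(x, x') + \kappa(x', x')) / 2\sigma^2 \big).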
Combining Generative & Discriminative Models by Kernels
Since each modeling approach has distinct
advantages, how can we combine them?
• Use generative models to construct kernels
• Use these kernels in discriminative
approaches
Measure Probability Similarity by Kernels
Simple inner product:
For mixture distribution:
For infinite mixture models:
For models with latent variables (e.g., Hidden Markov Models):
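The probability-product kernels these lines refer to are presumably the standard ones (notation assumed):

  k(x, x') = p(x)\, p(x')                                  (simple inner product)
  k(x, x') = \sum_i p(x \mid i)\, p(x' \mid i)\, p(i)      (mixture distribution)
  k(x, x') = \int p(x \mid z)\, p(x' \mid z)\, p(z)\, dz   (infinite/continuous mixture)
  k(X, X') = \sum_Z p(X \mid Z)\, p(X' \mid Z)\, p(Z)      (latent-variable models such as HMMs, over sequences X)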
Fisher Kernels
Fisher Score:
Fisher Information Matrix:
Fisher Kernel:
Sample Average:
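In standard form (a sketch, assuming the usual definitions):

  Fisher score:               g(\theta, x) = \nabla_\theta \ln p(x \mid \theta)
  Fisher information matrix:  F = E_x[\, g(\theta, x)\, g(\theta, x)^T \,]
  Fisher kernel:              k(x, x') = g(\theta, x)^T F^{-1} g(\theta, x')
  Sample average:             F \simeq \tfrac{1}{N} \sum_{n=1}^{N} g(\theta, x_n)\, g(\theta, x_n)^T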
Principal Component Analysis (PCA)
Assume:
We have:
Each eigenvector is normalized:
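The setup is presumably the standard one (notation assumed): the data are centred, \bar{x} = \tfrac{1}{N} \sum_n x_n = 0, the sample covariance is

  S = \tfrac{1}{N} \sum_{n=1}^{N} x_n x_n^T,

and PCA solves S u_i = \lambda_i u_i with each u_i a normalized eigenvector, u_i^T u_i = 1.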
Feature Mapping
Eigen-problem in feature space
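Assuming for the moment that the mapped data also have zero mean, \sum_n \phi(x_n) = 0, the feature-space covariance and eigen-problem are

  C = \tfrac{1}{N} \sum_{n=1}^{N} \phi(x_n) \phi(x_n)^T, \qquad C v_i = \lambda_i v_i.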
Dual Variables
Suppose the eigenvalue is positive (why can it not be smaller than 0?); then we have:
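A sketch of the standard step: for \lambda_i > 0, substituting the eigenvector equation gives

  v_i = \tfrac{1}{\lambda_i N} \sum_{n=1}^{N} \big( \phi(x_n)^T v_i \big)\, \phi(x_n),

so v_i lies in the span of the mapped data and can be written v_i = \sum_n a_{in} \phi(x_n).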
Eigen-problem in Feature Space (1)
Multiplying both sides by the mapped data vectors, we obtain:
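In standard form (notation assumed): substituting v_i = \sum_m a_{im} \phi(x_m) into C v_i = \lambda_i v_i and multiplying both sides by \phi(x_l)^T gives

  K^2 a_i = \lambda_i N K a_i,

which, for the non-zero eigenvalues of interest, can be reduced to K a_i = \lambda_i N a_i.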
Eigen-problem in Feature Space (2)
Normalization condition:
Projection coefficient:
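The standard expressions (a sketch, notation assumed):

  Normalization:  v_i^T v_i = a_i^T K a_i = \lambda_i N\, a_i^T a_i = 1
  Projection:     y_i(x) = \phi(x)^T v_i = \sum_{n=1}^{N} a_{in}\, k(x, x_n)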
General Case: Non-zero Mean
Kernel Matrix:
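The centred Gram matrix presumably takes the usual form:

  \tilde{K} = K - 1_N K - K 1_N + 1_N K 1_N,

where 1_N denotes the N \times N matrix whose elements are all 1/N.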
Kernel PCA on Synthetic Data
Contour plots of projection coefficients in feature space
Limitations of Kernel PCA
Discussion…
Limitations of Kernel PCA
If N is large, kernel PCA is computationally expensive, since K is N x N while S is only D x D.
It is not easy to form a low-rank approximation.