
Total Variation and Euler's Elastica for Supervised Learning
Tong Lin, Hanlin Xue, Ling Wang, Hongbin Zha
Contact: [email protected]
Key Laboratory of Machine Perception, School of EECS, Peking University, Beijing 100871, China

MOTIVATION

Supervised learning infers a function that maps inputs to desired outputs under the guidance of training data. The state-of-the-art algorithm is the SVM, built on large margins and the kernel trick. It has been observed that SVM is liable to overfitting, especially on small-sample data sets, sometimes reporting 100% accuracy. We argue that maximal margin should not be the sole criterion for supervised learning: the curvature and gradients of the output decision boundaries can play an important role in avoiding overfitting. Our method is inspired by the great success of total variation (TV) and Euler's elastica (EE) in image processing; we extend TV and EE to high-dimensional supervised learning settings.

PROBLEM DEFINITION

Given training data $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, the goal is to infer the underlying mapping $u : \mathbf{x}_i \mapsto y_i$, $i = 1, \dots, n$. A general framework for supervised learning is the regularization framework

    $\min_u \sum_{i=1}^{n} L(u(\mathbf{x}_i), y_i) + \lambda S(u)$,

where $L$ denotes the loss function and $S(u)$ is the regularization/smoothing term.

MODELS

Total variation measures the total quantity of change of a function and has been widely used in image processing, e.g. in the ROF denoising model. TV model:

    $J_{TV}[u] = \int (u - y)^2 \, d\mathbf{x} + \lambda \int |\nabla u| \, d\mathbf{x}$.

The proposed TV model can be seen as a special case of the following EE model. Euler's elastica model in image inpainting:

    $J_{EE}[I] = \int_{E} (I - I_0)^2 \, d\mathbf{x} + \int_{E} (a + b\kappa^2) \, |\nabla I| \, d\mathbf{x}$,

with the curvature formula $\kappa = \nabla \cdot \mathbf{n}$, $\mathbf{n} = \nabla I / |\nabla I|$ (the curvature of a level curve). Our EE model for supervised learning:

    $J_{EE}[u] = \int (u - y)^2 \, d\mathbf{x} + \lambda \int (a + b\kappa^2) \, |\nabla u| \, d\mathbf{x}$, $\quad \kappa = \nabla \cdot \frac{\nabla u}{|\nabla u|}$,

where $\kappa$ is the mean curvature of a hypersurface, here the decision boundary. The geometric intuition of TV/EE regularization: penalizing the gradients and the curvature of $u$ favors smoother decision boundaries.
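As a hedged illustration of what the TV and EE penalties measure (not code from the paper, which works in high dimensions via RBF approximations), the two energies can be evaluated for a 2-D field on a regular grid with finite differences; all names and the grid-based discretization here are our own assumptions:

```python
import numpy as np

def tv_ee_energies(u, y, lam=1.0, a=1.0, b=1.0, h=1.0, eps=1e-8):
    """Illustrative discrete TV and EE energies of a 2-D field u with target y.

    TV: sum (u - y)^2 + lam * sum |grad u|
    EE: sum (u - y)^2 + lam * sum (a + b * kappa^2) * |grad u|
    with kappa = div(grad u / |grad u|), all via central differences.
    """
    ux, uy = np.gradient(u, h)                   # central-difference gradient
    grad_norm = np.sqrt(ux**2 + uy**2) + eps     # |grad u|, eps avoids /0
    nx, ny = ux / grad_norm, uy / grad_norm      # unit normal field
    # curvature kappa = divergence of the unit normal
    kappa = np.gradient(nx, h, axis=0) + np.gradient(ny, h, axis=1)
    fidelity = np.sum((u - y)**2) * h**2
    tv = fidelity + lam * np.sum(grad_norm) * h**2
    ee = fidelity + lam * np.sum((a + b * kappa**2) * grad_norm) * h**2
    return tv, ee
```

For a flat field both energies are (up to the eps regularization) zero; for a plane, the TV term counts the constant gradient magnitude while the curvature term vanishes, matching the intuition that TV penalizes variation and EE additionally penalizes bending.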
This discrete model can be rewritten in the following continuous form with squared loss:

    $\min_u \int (u - y)^2 \, d\mathbf{x} + \lambda S(u)$,

with the two proposed smoothing terms

    $S_{TV}(u) = \int |\nabla u| \, d\mathbf{x}$, $\quad S_{EE}(u) = \int (a + b\kappa^2) \, |\nabla u| \, d\mathbf{x}$.

Our purpose is to introduce these two smoothing terms (TV and EE) for supervised learning tasks.

IMPLEMENTATION

Using the calculus of variations, we derive the Euler-Lagrange PDEs of the energy functionals. For the TV model:

    $\lambda (u - y) - \nabla \cdot \frac{\nabla u}{|\nabla u|} = 0$.

For the EE model, writing $\phi(\kappa) = a + b\kappa^2$:

    $\lambda (u - y) - \nabla \cdot \left[ \phi(\kappa) \frac{\nabla u}{|\nabla u|} - \frac{\nabla(\phi'(\kappa)\,|\nabla u|)}{|\nabla u|} + \frac{\nabla u \, \big( \nabla u \cdot \nabla(\phi'(\kappa)\,|\nabla u|) \big)}{|\nabla u|^3} \right] = 0$.

The PDEs are nonlinear and high dimensional, so we use radial basis function (RBF) approximations to find numerical solutions:

    $u(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x}) = \Phi \mathbf{w}$, $\quad \phi_i(\mathbf{x}) = \exp(-c \, \|\mathbf{x} - \mathbf{x}_i\|^2)$.

Two numerical methods are used to solve the PDEs:
(1) GD: gradient descent time marching;
(2) LagLE: lagged linear equation iteration.

FLOW CHART

Energy functional with TV/EE regularization terms → Euler-Lagrange PDE via the calculus of variations → RBF function approximations → two numerical methods (GD, LagLE) to find the PDE solutions.

EXPERIMENTAL RESULTS

The proposed TV/EE methods are compared with SVM, BPNN, and LR (Laplacian regularization) on benchmark data sets for binary classification, multi-class classification, and regression. [Figure: classification results of SVM and EE on synthetic data sets.]
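To make the RBF-plus-gradient-descent idea concrete, here is a minimal sketch, not the paper's PDE solver: it expands $u$ in Gaussian RBFs centered at the training points and runs plain gradient descent ("time marching") on a discretized objective, squared loss plus a TV-style penalty $\sum_j |\nabla u(\mathbf{x}_j)|$ evaluated only at the training points. Function names, the pointwise TV surrogate, and all hyperparameters are our own assumptions:

```python
import numpy as np

def rbf_features(X, centers, c=1.0):
    """Design matrix Phi[j, i] = exp(-c * ||X[j] - centers[i]||^2)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-c * d2)

def fit_tv_gd(X, y, c=1.0, lam=0.1, lr=0.005, steps=2000, eps=1e-6):
    """Gradient-descent sketch: minimize
        sum_j (u(x_j) - y_j)^2 + lam * sum_j |grad u(x_j)|
    with u(x) = sum_i w_i exp(-c ||x - x_i||^2), centers at the data.
    An illustrative stand-in for the paper's GD/LagLE PDE solvers."""
    n, d = X.shape
    Phi = rbf_features(X, X, c)                     # n x n design matrix
    diff = X[:, None, :] - X[None, :, :]            # x_j - x_i, shape (n, n, d)
    # gradient of basis i at point j: G[j, i, :] = -2c (x_j - x_i) Phi[j, i]
    G = -2.0 * c * diff * Phi[..., None]
    w = np.zeros(n)
    for _ in range(steps):
        r = Phi @ w - y                             # residuals u(x_j) - y_j
        gu = np.einsum('jid,i->jd', G, w)           # grad u at each x_j
        norm = np.sqrt((gu ** 2).sum(-1)) + eps     # |grad u(x_j)|
        # d/dw_i of sum_j |grad u(x_j)| = sum_j G[j, i, :] . gu[j] / norm[j]
        grad_w = 2.0 * Phi.T @ r \
            + lam * np.einsum('jid,jd->i', G, gu / norm[:, None])
        w -= lr * grad_w
    return w, (lambda Xq: rbf_features(Xq, X, c) @ w)
```

Usage: `w, u = fit_tv_gd(X, y, lam=0.01)` fits the weights and returns a callable predictor `u`. The learning rate must be small relative to the largest eigenvalue of $2\Phi^\top\Phi$ for the time marching to remain stable, which is also the practical constraint on the step size in the paper's GD scheme.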