
Total Variation and Euler's Elastica for Supervised Learning
Tong Lin, Hanlin Xue, Ling Wang, Hongbin Zha
Contact: [email protected]
Key Laboratory of Machine Perception, School of EECS, Peking University, Beijing 100871, China

MOTIVATION

Supervised learning infers a function that maps inputs to desired outputs under the guidance of training data. The state-of-the-art algorithm is the SVM, built on large margins and the kernel trick. It has been observed that SVM is liable to overfitting, especially on small-sample data sets, sometimes reporting 100% accuracy. We argue that maximal margin should not be the sole criterion for supervised learning: the curvature and gradients of the output decision boundaries can play an important role in avoiding overfitting. Our method is inspired by the great success of total variation (TV) and Euler's elastica (EE) in image processing; we extend TV and EE to high-dimensional supervised learning settings.

PROBLEM DEFINITION

Given training data $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, the goal is to infer the underlying mapping $u : \mathbf{x}_i \mapsto y_i$, $i = 1, \dots, n$. A general framework for supervised learning is the regularization framework

    $\min_u \sum_{i=1}^{n} L(u(\mathbf{x}_i), y_i) + \lambda S(u)$,

where $L$ denotes the loss function and $S(u)$ is the regularization/smoothing term.

MODELS

Total variation measures the total quantity of change of a function and has been widely used in image processing, e.g. in the ROF denoising model. TV model:

    $J_{TV}[u] = \int (u - y)^2 \, d\mathbf{x} + \lambda \int |\nabla u| \, d\mathbf{x}$.

The proposed TV model can be seen as a special case of the following EE model. Euler's elastica model in image inpainting:

    $J_{EE}[I] = \int_{E} (I - I_0)^2 \, d\mathbf{x} + \int_{E} (a + b\kappa^2) \, |\nabla I| \, d\mathbf{x}$,

with the curvature formula $\kappa = \nabla \cdot \mathbf{n}$, $\mathbf{n} = \nabla I / |\nabla I|$ (the curvature of a level curve). Our EE model for supervised learning:

    $J_{EE}[u] = \int (u - y)^2 \, d\mathbf{x} + \lambda \int (a + b\kappa^2) \, |\nabla u| \, d\mathbf{x}$, $\quad \kappa = \nabla \cdot \frac{\nabla u}{|\nabla u|}$,

where $\kappa$ is the mean curvature of a hypersurface, here the decision boundary. The geometric intuition of TV/EE regularization: penalizing the gradients and the curvature of $u$ favors smoother decision boundaries.
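As a hedged illustration of what the TV and EE penalties measure (not code from the paper, which works in high dimensions via RBF approximations), the two energies can be evaluated for a 2-D field on a regular grid with finite differences; all names and the grid-based discretization here are our own assumptions:

```python
import numpy as np

def tv_ee_energies(u, y, lam=1.0, a=1.0, b=1.0, h=1.0, eps=1e-8):
    """Illustrative discrete TV and EE energies of a 2-D field u with target y.

    TV: sum (u - y)^2 + lam * sum |grad u|
    EE: sum (u - y)^2 + lam * sum (a + b * kappa^2) * |grad u|
    with kappa = div(grad u / |grad u|), all via central differences.
    """
    ux, uy = np.gradient(u, h)                   # central-difference gradient
    grad_norm = np.sqrt(ux**2 + uy**2) + eps     # |grad u|, eps avoids /0
    nx, ny = ux / grad_norm, uy / grad_norm      # unit normal field
    # curvature kappa = divergence of the unit normal
    kappa = np.gradient(nx, h, axis=0) + np.gradient(ny, h, axis=1)
    fidelity = np.sum((u - y)**2) * h**2
    tv = fidelity + lam * np.sum(grad_norm) * h**2
    ee = fidelity + lam * np.sum((a + b * kappa**2) * grad_norm) * h**2
    return tv, ee
```

For a flat field both energies are (up to the eps regularization) zero; for a plane, the TV term counts the constant gradient magnitude while the curvature term vanishes, matching the intuition that TV penalizes variation and EE additionally penalizes bending.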
This discrete model can be rewritten in the following continuous form with squared loss:

    $\min_u \int (u - y)^2 \, d\mathbf{x} + \lambda S(u)$,

with the two proposed smoothing terms

    $S_{TV}(u) = \int |\nabla u| \, d\mathbf{x}$, $\quad S_{EE}(u) = \int (a + b\kappa^2) \, |\nabla u| \, d\mathbf{x}$.

Our purpose is to introduce these two smoothing terms (TV and EE) for supervised learning tasks.

IMPLEMENTATION

Using the calculus of variations, we derive the Euler-Lagrange PDEs of the energy functionals. For the TV model:

    $\lambda (u - y) - \nabla \cdot \frac{\nabla u}{|\nabla u|} = 0$.

For the EE model, writing $\phi(\kappa) = a + b\kappa^2$:

    $\lambda (u - y) - \nabla \cdot \left[ \phi(\kappa) \frac{\nabla u}{|\nabla u|} - \frac{\nabla(\phi'(\kappa)\,|\nabla u|)}{|\nabla u|} + \frac{\nabla u \, \big( \nabla u \cdot \nabla(\phi'(\kappa)\,|\nabla u|) \big)}{|\nabla u|^3} \right] = 0$.

The PDEs are nonlinear and high dimensional, so we use radial basis function (RBF) approximations to find numerical solutions:

    $u(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x}) = \Phi \mathbf{w}$, $\quad \phi_i(\mathbf{x}) = \exp(-c \, \|\mathbf{x} - \mathbf{x}_i\|^2)$.

Two numerical methods are used to solve the PDEs:
(1) GD: gradient descent time marching;
(2) LagLE: lagged linear equation iteration.

FLOW CHART

Energy functional with TV/EE regularization terms → Euler-Lagrange PDE via the calculus of variations → RBF function approximations → two numerical methods (GD, LagLE) to find the PDE solutions.

EXPERIMENTAL RESULTS

The proposed TV/EE methods are compared with SVM, BPNN, and LR (Laplacian regularization) on benchmark data sets for binary classification, multi-class classification, and regression. [Figure: classification results of SVM and EE on synthetic data sets.]
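To make the RBF-plus-gradient-descent idea concrete, here is a minimal sketch, not the paper's PDE solver: it expands $u$ in Gaussian RBFs centered at the training points and runs plain gradient descent ("time marching") on a discretized objective, squared loss plus a TV-style penalty $\sum_j |\nabla u(\mathbf{x}_j)|$ evaluated only at the training points. Function names, the pointwise TV surrogate, and all hyperparameters are our own assumptions:

```python
import numpy as np

def rbf_features(X, centers, c=1.0):
    """Design matrix Phi[j, i] = exp(-c * ||X[j] - centers[i]||^2)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-c * d2)

def fit_tv_gd(X, y, c=1.0, lam=0.1, lr=0.005, steps=2000, eps=1e-6):
    """Gradient-descent sketch: minimize
        sum_j (u(x_j) - y_j)^2 + lam * sum_j |grad u(x_j)|
    with u(x) = sum_i w_i exp(-c ||x - x_i||^2), centers at the data.
    An illustrative stand-in for the paper's GD/LagLE PDE solvers."""
    n, d = X.shape
    Phi = rbf_features(X, X, c)                     # n x n design matrix
    diff = X[:, None, :] - X[None, :, :]            # x_j - x_i, shape (n, n, d)
    # gradient of basis i at point j: G[j, i, :] = -2c (x_j - x_i) Phi[j, i]
    G = -2.0 * c * diff * Phi[..., None]
    w = np.zeros(n)
    for _ in range(steps):
        r = Phi @ w - y                             # residuals u(x_j) - y_j
        gu = np.einsum('jid,i->jd', G, w)           # grad u at each x_j
        norm = np.sqrt((gu ** 2).sum(-1)) + eps     # |grad u(x_j)|
        # d/dw_i of sum_j |grad u(x_j)| = sum_j G[j, i, :] . gu[j] / norm[j]
        grad_w = 2.0 * Phi.T @ r \
            + lam * np.einsum('jid,jd->i', G, gu / norm[:, None])
        w -= lr * grad_w
    return w, (lambda Xq: rbf_features(Xq, X, c) @ w)
```

Usage: `w, u = fit_tv_gd(X, y, lam=0.01)` fits the weights and returns a callable predictor `u`. The learning rate must be small relative to the largest eigenvalue of $2\Phi^\top\Phi$ for the time marching to remain stable, which is also the practical constraint on the step size in the paper's GD scheme.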