Advanced Pattern Recognition
Lecture 1, Spring 2007

[1] J. Shawe-Taylor, N. Cristianini, "Kernel Methods for Pattern Analysis", Cambridge University Press, 2004.
[2] B. Schölkopf, A. Smola, "Learning with Kernels", MIT Press, 2002.
[3] www.kernel-methods.net
[4] Journal papers, tutorials.

• Pattern Recognition is a field of Computational Intelligence in which a predefined form of an input signal is searched for, or similarities between signal forms are studied.
• The input signal can come from an electrical measurement or from, e.g., text documents.
• Computational Intelligence includes method groups such as Artificial Intelligence, Pattern Recognition, Fuzzy Logic, Genetic Algorithms, Neural Networks, etc.
• Application fields include image analysis, speech recognition, and medical signal and DNA sequence analysis.

Watanabe:

• Traditionally, Pattern Recognition (PR) is divided into statistical pattern recognition and structural pattern recognition.
• In statistical PR, signal statistics are needed for recognition.
• In structural (syntactical) PR, a pattern is described by a grammar for structural elements.

PR applications

Examples
• Handwriting recognition on a PDA.
• Structural PR: recognition of hanzi.
• Cluster analysis (area, length, etc.). How many clusters?
• Forest photo segmentation.

Example: Face detection
Example: Segmentation for multiple sclerosis
Salmon or sea bass? Overfitting problem. Non-linear decision boundary.

Planetary data: T is the period of revolution (years) and R is the radius of the orbit. The quantity $R^3/T^2$ remains the same.
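The invariance of $R^3/T^2$ can be checked numerically. The following is a minimal sketch, assuming rounded, commonly quoted orbital radii (in astronomical units) and periods (in years); these values and the names `PLANETS` and `kepler_ratio` are illustrative and do not come from the slides.

```python
# Approximate orbital radius R (astronomical units) and period T (years).
# Rounded, commonly quoted figures; an assumption, not data from the slides.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.539, 29.46),
}


def kepler_ratio(radius, period):
    """Return R^3 / T^2 for one planet."""
    return radius ** 3 / period ** 2


# One ratio per planet; the pattern is that they are all (nearly) equal.
ratios = {name: kepler_ratio(r, t) for name, (r, t) in PLANETS.items()}

for name, ratio in ratios.items():
    print(f"{name:8s} R^3/T^2 = {ratio:.3f}")
```

With these units the ratio comes out close to 1 for every planet, so one column of the table is enough to predict the other.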
Example: Planetary data (1)

$T^2 = \text{Const} \cdot R^3$, or $2 \log T = 3 \log R + C$

[Plot: log(T) against log(R)]

Example: Planetary data (2)

[Plots: the data in the (x, y) plane, and in the feature coordinates ($x^2/a^2$, $y^2/b^2$)]

The artificial planetary data lying on an ellipse in two dimensions, and the same data represented using the features $x^2$ and $y^2$, showing a linear relation:

$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$

We will define a pattern by a function (the pattern function) $f$ for which

$f(x) = 0$ (1)

For example, for the planetary data, in units where the constant $R^3/T^2$ equals 1,

$f(R,T) = R^3 - T^2 = 0$

This is zero in the ideal case; in practical cases $f(x)$ is not exactly zero:

$f(R,T) = R^3 - T^2 \approx 0$

Let's assume that we have a function $g(x)$ which predicts the output (i.e. the class to which a pattern belongs) for an input $x$. Let's also assume that we have a training set for which we know the correct output $y$ (i.e. the classification). The training set is formed of pairs $(x, y)$. The function $f$ can be defined as

$f(x,y) = L(g(x), y) = 0$ (2)

where $g$ is the prediction function and $L: Y \times Y \to \mathbb{R}^+$ is a loss function whose value is 0 when the predicted class $g(x)$ and the correct class $y$ are equal. In practical cases the relation $f(x,y) = 0$ is not exact, and we have to accept the approximation $f(x,y) \approx 0$.

Note 1. If (2) holds exactly for a training set, there is a risk of overfitting: we tune the recognition to that set, and it does not work well in general. In a statistical sense, what we require is $\mathbb{E}\,f(x) \approx 0$, where $\mathbb{E}$ denotes the expected value.

Definition A. A Pattern Analysis Algorithm takes as input a finite set of examples from the source of data to be analyzed. Its output is either an indication that no patterns were detectable in the data, or a positive pattern function $f$ that the algorithm asserts satisfies

$\mathbb{E}\,f(x) = 0$ (5)

A Pattern Analysis Algorithm should be
• Computationally efficient: it should be possible to use large data sets;
• Robust: it must be able to handle noisy data and identify approximate patterns;
• Statistically stable: the output of the algorithm should not depend on the particular data set used.

• It would be desirable to use linear functions.
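The appeal of linear functions can be illustrated with the ellipse example from this lecture: the data are non-linear in $(x, y)$ but satisfy a linear relation in the features $(x^2, y^2)$. The sketch below assumes hypothetical semi-axes `A = 3` and `B = 2` and noise-free samples; the helper names are illustrative only.

```python
import math

A, B = 3.0, 2.0  # hypothetical semi-axes of the ellipse (an assumption)


def ellipse_points(n):
    """Sample n noise-free points on x^2/a^2 + y^2/b^2 = 1."""
    angles = (2.0 * math.pi * k / n for k in range(n))
    return [(A * math.cos(t), B * math.sin(t)) for t in angles]


def features(x, y):
    """Map a point (x, y) to the features (x^2, y^2)."""
    return x * x, y * y


def pattern(x, y):
    """Pattern function, linear in the features and zero on the ellipse."""
    fx, fy = features(x, y)
    return fx / A ** 2 + fy / B ** 2 - 1.0


points = ellipse_points(12)
residuals = [pattern(x, y) for x, y in points]
max_residual = max(abs(r) for r in residuals)
print("largest |f(x, y)| on the ellipse:", max_residual)
```

For points on the ellipse the pattern function vanishes up to rounding error, while a point off the ellipse, such as `(A, B)`, gives a clearly non-zero value.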
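In practice, the data is often not separable by linear functions.
• In kernel methods we use linear functions in a feature space, accessed through kernels, without actually performing the transform to the feature space.

Example

$\phi: (x_1, x_2) \mapsto (z_1, z_2, z_3) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$

The inner product is

$\langle \phi(x), \phi(u) \rangle = \langle (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2), (u_1^2, \sqrt{2}\,u_1 u_2, u_2^2) \rangle = x_1^2 u_1^2 + 2 x_1 x_2 u_1 u_2 + x_2^2 u_2^2 = (x_1 u_1 + x_2 u_2)^2 = \langle x, u \rangle^2$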
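The identity above can be verified numerically. The following minimal sketch compares the explicit feature-space inner product with the kernel evaluated directly in the input space; the function names `phi`, `dot` and `kernel` and the test vectors are illustrative assumptions.

```python
import math


def phi(x):
    """Explicit feature map (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(a, b):
    """Ordinary inner product of two equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))


def kernel(x, u):
    """Kernel <x, u>^2, computed without forming phi(x) or phi(u)."""
    return dot(x, u) ** 2


x = (1.0, 2.0)   # arbitrary example inputs
u = (3.0, -1.0)

explicit = dot(phi(x), phi(u))  # inner product in the 3-D feature space
implicit = kernel(x, u)         # same quantity from the 2-D inputs
print(explicit, implicit)
```

The two numbers agree up to floating-point rounding, which is the point of the kernel approach: the feature-space inner product is obtained from the original two-dimensional inputs alone.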