PATTERN CLASSIFICATION
By: Dr. Rajeev Srivastava

PATTERN CLASSIFICATION deals with:
• Concept of classifiers
• Evaluation of classifiers
• Structural and syntactic recognition methods
• Clustering algorithms

INTRODUCTION
The process of comparing an unknown object with stored patterns in order to recognize it is called classification. It is the process of applying a label or pattern class to an unknown instance. Pattern classification is the study of how machines can observe the environment, learn to distinguish patterns of interest, and make reasonable decisions about the categories of those patterns.

PATTERN CLASSIFICATION DESIGN CYCLE
[Flow diagram: image acquisition → image preprocessing → extraction of features → feature data collection and preprocessing → learning → evaluation of results. If the results are satisfactory the design is complete; if not, the whole process is repeated.]

LEARNING
One of the important components of a pattern recognition system is its ability to learn from data. Learning means developing algorithms by acquiring knowledge from the given empirical data. The main learning approaches are:
1. Supervised learning
2. Unsupervised learning
3. Reinforced learning

SUPERVISED LEARNING
It needs explicit supervision of the system. A cost/label is provided for each pattern in a training set, based on which the system learns a concept for classifying the patterns. Once the system has learnt, test data is supplied to evaluate it.

UNSUPERVISED LEARNING
No explicit supervision is required; the system learns by itself, by trial and error. The instances form groups or clusters based on similarity measures. The goal of clustering is similar to that of classification, but clustering is performed when no domain model is available. The user has to provide the number of clusters desired.

REINFORCED LEARNING
Here the learning system produces binary decision outputs. The binary feedback of right or wrong is sent back to the input and is used to reinforce learning from the data. Learning continues until the system is right, given only the two binary assessments of right or wrong.

STAGES OF THE PATTERN RECOGNITION DESIGN CYCLE
Stages in the pattern recognition design cycle include:
1) Feature data collection and preprocessing.
2) Choosing the pattern recognition model.
3) Testing and evaluating the performance of the pattern recognition task.

FEATURE DATA COLLECTION AND PREPROCESSING
This is one of the most important phases in pattern recognition because the quality of the pattern recognition task depends on the quality of the input feature data. The procedures in this phase are related to:
1. Collection of training data.
2. Noise removal.
3. Identifying missing values.
4. Performing data transformations to normalize and condition the data.

TRAINING DATASET
The training dataset comprises vectors, patterns, cases, samples, or observations of an object. The collection of these data is called an image dataset or a feature dataset, and it is stored in a feature database. Characteristics of such datasets include high dimensionality and sparseness.

COMPRESSION
Data objects with a large number of bands increase the computational complexity of the image, and the sparseness of the dataset also causes problems such as poor quality. Compression can be applied in these cases to keep the objects at a reasonable size.
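To make the preprocessing step concrete, the following is a minimal sketch (in Python with NumPy, which the slides themselves do not use) of two of the operations listed above: filling missing feature values and normalizing the feature data. The array contents and the column-mean imputation strategy are illustrative assumptions, not part of the original slides.

import numpy as np

# Hypothetical feature dataset: rows are samples, columns are features.
# NaN marks a missing value.
X = np.array([[2.0, 150.0],
              [4.0, np.nan],
              [3.0, 170.0],
              [np.nan, 160.0]])

# 1. Identify and fill missing values (here: column-mean imputation).
col_means = np.nanmean(X, axis=0)
missing = np.isnan(X)
X[missing] = np.take(col_means, np.where(missing)[1])

# 2. Normalize each feature to zero mean and unit variance.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_norm)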
PROBLEMS IN FEATURE DATA COLLECTION
Some of the factors that may affect the quality and reliability of the results are noise, artefacts, bias, imprecision, and inaccuracy of the input data. Common data collection problems are the presence of outliers, missing and inconsistent values, and duplicate data. Qualities of good data for training a classifier are timeliness, relevance, and self-sufficiency.

PATTERN CLASSIFICATION MODELS
• Template matching approach
• Classification-based approach: statistical and syntactic
• Artificial neural network (ANN) approach

TEMPLATE MATCHING
Also known as matched filtering, this technique compares portions of images against one another. The target object to be identified is defined as a template. The template is then superimposed on and correlated with the image. The correlation is high if there is a perfect match between the template and the image, so the degree of match can be determined from the highest correlation value.

TEMPLATE MATCHING METHODS
The matching process moves the template to every possible position in a larger source image and computes a numerical index that indicates how well the template matches the image at that position. The correlation between the template and the image replaces the centre pixel of the mask in the resultant image. Matching is done on a pixel-by-pixel basis, and the maximum value indicates the best match.
[Diagram: template image and input image I(x,y) → correlation → output image O(x,y).]

TYPES OF TEMPLATE MATCHING
There are basically two types of template matching implementations:
1. Bi-level image template matching
2. Grey-level image template matching

BI-LEVEL IMAGE TEMPLATE MATCHING
The template is a small image, usually bi-level. The task is to find the template in the source image, with a yes/no decision at each position.

GREY-LEVEL IMAGE TEMPLATE MATCHING
When applying a template-matching scheme to a grey-level image, it is unreasonable to expect a perfect match of the grey levels. Instead of a yes/no match at each pixel, the difference in grey level should be used.

EUCLIDEAN DISTANCE
Let I be a grey-level image and g a grey-value template of size n × m. Then
d(I, g, r, c) = Σ_{i=1..n} Σ_{j=1..m} [ I(r + i, c + j) − g(i, j) ]²,
where (r, c) denotes the top-left corner of the template g.

CORRELATION
Correlation is a measure of the degree to which two variables agree, not necessarily in actual value but in general behaviour. The two variables here are the corresponding pixel values in the template and the source image. If f(x, y) is the given image and w(x, y) the template, then the correlation of the image with the template is given by:
C(x, y) = Σ_α Σ_β w(α, β) f(x + α, y + β)

GREY-LEVEL CORRELATION FORMULA
cor = Σ_{i=0..N−1} (xᵢ − x̄)(yᵢ − ȳ) / sqrt( Σ_{i=0..N−1} (xᵢ − x̄)² · Σ_{i=0..N−1} (yᵢ − ȳ)² )
where
• xᵢ are the grey levels of the template image and x̄ is the average grey level of the template,
• yᵢ are the grey levels of the source image section and ȳ is its average grey level,
• N is the number of pixels in the section (N = template image size = columns × rows).
The value cor lies between −1 and +1, with larger values representing a stronger relationship between the two images.

DISADVANTAGES OF TEMPLATE MATCHING
• No variation in scale or orientation is permitted.
• It involves heavy computation when used in higher dimensions, hence feature-based schemes are preferred.
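A minimal sketch of grey-level template matching using the normalized correlation formula above, written in Python with NumPy (not part of the original slides). The image and template arrays are illustrative, and no scale or orientation changes are handled, matching the limitation just noted.

import numpy as np

def normalized_correlation(section, template):
    # Grey-level correlation 'cor' between an image section and a template.
    x = template.astype(float).ravel()
    y = section.astype(float).ravel()
    x_c, y_c = x - x.mean(), y - y.mean()
    denom = np.sqrt((x_c**2).sum() * (y_c**2).sum())
    return 0.0 if denom == 0 else (x_c * y_c).sum() / denom

def match_template(image, template):
    # Slide the template over the image and return the position of maximum correlation.
    n, m = template.shape
    rows, cols = image.shape[0] - n + 1, image.shape[1] - m + 1
    scores = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            scores[r, c] = normalized_correlation(image[r:r+n, c:c+m], template)
    best = np.unravel_index(np.argmax(scores), scores.shape)
    return best, scores[best]

# Illustrative data: a random image containing an exact copy of the template.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64))
template = image[20:28, 30:38].copy()
print(match_template(image, template))   # expected best position: (20, 30)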
CLASSIFICATION
Classification is a supervised learning method. It involves two phases:
1. Training phase: the classifier first needs to be trained; that is, it must learn the complex relationship between the input image features using the training data.
2. Testing phase: after the learning process is over, the classifier is called a "learnt system" and produces a classification model; the classifier then assigns a label to each test instance, which may turn out to be correct or incorrect.

CLASSIFICATION SCHEME
[Block diagram: known object features → learning algorithm → classification model. Test features from an unknown image or object are passed to the classification model, which outputs a label.]

TRAINING PHASE
In this phase, the classifier algorithm is fed a large set of known data, called training data or labelled data. A dataset is required to train the classifier to classify input. Its attributes are called input features, attributes, or independent variables, and they should be numerous and representative in nature. Once the training phase is over, the data-driven classification model has been created.

TESTING PHASE
In this phase the constructed model is tested and evaluated with unknown test data. The model can be either:
1. Descriptive: it can explain its classification decisions, e.g. decision tree based classifiers.
2. Predictive: it cannot explain its decisions, e.g. neural network based classifiers.

TYPES OF CLASSIFIERS (BASED ON INPUT)
The difference between these classifiers lies only in the nature of the data. There are two types:
1. Pixel based: the input to the classifier is raw pixel data; the classifier takes images containing the pixels of the required regions.
2. Feature based: features of the image such as size, shape, location, and texture are extracted and then used for classification.

FACTORS AFFECTING THE PERFORMANCE OF A CLASSIFIER
The performance of a classifier generally depends on these factors:
• Nature of the data: a classification model depends on the availability of good-quality training data; another problem is that of missing data, which may be unintentional or deliberate.
• Nature of learning ("over-fitting of the model"): the learning process should not fit the training data more closely than necessary, as this leads to generalization error.

CLASSIFIER DESIGN
[Taxonomy: classification algorithms divide into statistical techniques, non-statistical techniques, and hybrid techniques. Statistical techniques are parametric (decision theoretic and probabilistic) or non-parametric; non-statistical techniques are syntactic or structural.]

STATISTICAL CLASSIFIERS
Statistical classifiers derive models from a given training dataset using statistical learning techniques. They are of two types:
1. Parametric classifiers
2. Non-parametric classifiers

PARAMETRIC CLASSIFIERS
These classifiers take a set of training data and construct a classification model. The parameters are estimated by assuming a probability distribution or density for each dataset, and statistical parameters such as the mean and variance are then found. Based on the techniques they use, they are of two types:
1. Decision theoretic techniques
2. Probabilistic techniques

DECISION THEORETIC METHODS
Often called discriminant function analysis. The idea is to classify the object by designing a decision boundary, or discriminating functions, that separate the feature-vector clusters in the feature space. The decision function is designed to give different responses to different classes, e.g. LDA (linear discriminant analysis).

LDA (LINEAR DISCRIMINANT ANALYSIS)
The idea here is to use decision functions to discriminate the input features.
Let x = (x₁, x₂, ..., xₙ)ᵀ represent an n-dimensional feature vector and let the number of classes be k. We design k decision functions d₁(x), d₂(x), ..., d_k(x). The instance is classified as class i and not class j if
dᵢ(x) > dⱼ(x) for i ≠ j, i, j = 1, 2, ..., k.
The decision boundary between classes i and j is then given by dᵢ(x) − dⱼ(x) = 0. Writing d_ij(x) = dᵢ(x) − dⱼ(x), the decision rule is: assign the instance to class i if d_ij(x) > 0 and to class j if d_ij(x) < 0.

PROBABILISTIC TECHNIQUES
These use probabilistic reasoning for classification and are based on two probability concepts: prior probability and conditional probability. One of the most popular classifiers of this kind is the Bayesian classifier.
Bayesian principle: the inverse probability P(i|x) can be found from P(x|i) and P(i) using Bayes' theorem:
P(i|x) = P(x|i) P(i) / P(x)

BAYESIAN CLASSIFIER
The Bayesian classifier requires three pieces of information:
• P(Cᵢ): the prior probability of class i.
• P(x|i): the conditional probability of observing x given class i; this can be calculated from the training data table.
• P(x): the sum of P(x|i) over the entire dataset. This is not class information, but serves as a normalization factor.
There are four types of Bayesian classifier based on the Bayesian principle:
1. Maximum likelihood classifier
2. Minimum distance classifier
3. Minimum risk classifier
4. Bayesian classifier for multiple features

BAYESIAN CLASSIFIER: ALGORITHM
The algorithm for the Bayesian classifier is:
1. Train the classifier with the training images or labelled feature data.
2. Compute the prior probability P(i), using intuition based on experts' opinion or using histogram-based estimation.
3. Compute P(i|x).
4. Find the maximum P(i|x) and assign the unknown instance to that class.

PROS AND CONS OF BAYESIAN CLASSIFIERS
Bayesian classifiers have advantages because:
1. They are easy to use.
2. They require only one scan of the training set.
3. They are not affected much by missing values.
4. They produce good results for datasets with simple relationships.
Their main disadvantage is that they cannot be used directly for continuous data.

MAXIMUM LIKELIHOOD CLASSIFIER
According to the Bayesian maximum likelihood classifier, the instance is assigned to the class i for which P(i|x) is maximum. If the instance has m independent attributes, then
P(x|i) = Π_{k=1..m} P(xₖ|i).
In other words, an instance with many attributes is assigned to class i and not to class j if P(i|x) > P(j|x). The resulting algorithm is called the maximum likelihood classifier; if the attributes are assumed independent, the same classifier is called the naive Bayesian classifier.

MINIMUM DISTANCE CLASSIFIER
When the training set contains many images it is easier to approximate P(x|i) by a function with fewer parameters. This approximation of the input data takes the form of a Gaussian distribution and is called a parametric approximation.

PARAMETRIC APPROXIMATION
The parametric approximation is given by
P(x|i) = (1 / (√(2π) σᵢ)) · exp( −(x − mᵢ)² / (2σᵢ²) ),
where mᵢ and σᵢ are the mean and the standard deviation of class i. When the data are multidimensional, the mean becomes a mean vector mᵢ and the variance becomes a covariance matrix Σᵢ, so that
P(i|x) ∝ P(i) · [ 1 / √((2π)^d det Σᵢ) ] · exp( −(1/2)(x − mᵢ)ᵀ Σᵢ⁻¹ (x − mᵢ) ).
The term (x − mᵢ)ᵀ Σᵢ⁻¹ (x − mᵢ) is called the MAHALANOBIS DISTANCE. The factor 1/√((2π)^d) can be ignored, as it is only a scaling factor that is the same for every class. Taking logarithms and simplifying, this expression yields the discriminant
log P(i|x) = log P(i) − (1/2) log det Σᵢ − (1/2)(x − mᵢ)ᵀ Σᵢ⁻¹ (x − mᵢ) + constant,
where the constant is the same for all classes.
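The Gaussian discriminant just derived is straightforward to turn into code. Below is a minimal sketch (Python/NumPy, not from the slides) of a parametric Bayesian classifier for one-dimensional features: it estimates the prior, mean, and variance of each class from labelled training data and assigns a test value to the class with the largest log discriminant. The variable names and toy data are illustrative assumptions.

import numpy as np

def train_gaussian_classifier(features, labels):
    # Estimate prior, mean and variance for each class (1-D parametric approximation).
    params = {}
    for c in np.unique(labels):
        x = features[labels == c]
        params[c] = (len(x) / len(features),  # prior P(i)
                     x.mean(),                # class mean m_i
                     x.var())                 # class variance sigma_i^2
    return params

def classify(x, params):
    # Assign x to the class with the largest log discriminant:
    # log P(i) - 1/2 log sigma_i^2 - (x - m_i)^2 / (2 sigma_i^2)
    def discriminant(p):
        prior, mean, var = p
        return np.log(prior) - 0.5 * np.log(var) - (x - mean) ** 2 / (2 * var)
    return max(params, key=lambda c: discriminant(params[c]))

# Illustrative training data: grey-level feature values with class labels 0 and 1.
features = np.array([12., 15., 14., 13., 40., 43., 41., 45.])
labels   = np.array([0,   0,   0,   0,   1,   1,   1,   1 ])
params = train_gaussian_classifier(features, labels)
print(classify(14.5, params), classify(42.0, params))   # expected: 0 1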
PARAMETRIC APPROXIMATION (CONT.)
Depending on the distance used, there are variations of the Bayesian distance classifier:
1. Mahalanobis distance
2. Euclidean distance
3. City block distance
The Mahalanobis distance is the most reliable, but it is computationally intensive compared to the others.

DECISION FUNCTIONS FOR MINIMUM DISTANCE CLASSIFIERS
The decision function for class i with mean vector mᵢ is
dᵢ(x) = xᵀmᵢ − (1/2) mᵢᵀmᵢ, for i = 1, 2, ..., k.
The approach is to assign the instance to the class whose mean vector is nearest to the unknown sample, where the mean vector of pattern class i is
mᵢ = (1/Nᵢ) Σ_{x ∈ ωᵢ} x, i = 1, 2, ..., k.
The Euclidean distance between the unknown instance x and the mean vector is
Dᵢ(x) = ‖x − mᵢ‖, where the norm is defined as ‖a‖ = (aᵀa)^{1/2}.
Similarly, for class j, dⱼ(x) = xᵀmⱼ − (1/2) mⱼᵀmⱼ. The decision boundary between classes i and j is dᵢ(x) − dⱼ(x) = 0, i.e.
xᵀmᵢ − (1/2) mᵢᵀmᵢ − xᵀmⱼ + (1/2) mⱼᵀmⱼ = 0.
1. For n = 2 the dividing function is a line.
2. For n = 3 it is a plane.
3. For n > 3 it is a hyperplane.

MINIMUM RISK CLASSIFIER
A cost function, called the loss function, is attached to the classification. In case of an error of misclassification, a penalty is assigned so that the risk can be minimized or avoided in future. The cost of a decision depends on the nature of the application in which the classifier is used. The estimated cost or loss is multiplied by the posterior probabilities when taking the final decision of assigning a label to the unknown instance. The decision rule can be designed as follows:
• If Loss(α₂|i) · P(i|x) > Loss(α₁|j) · P(j|x), assign the instance x to class i.
• If Loss(α₂|i) · P(i|x) < Loss(α₁|j) · P(j|x), assign the instance x to class j.
Here α₁ and α₂ are the costs of the decisions.

BAYESIAN CLASSIFIER FOR MULTIPLE FEATURES
Real-world problems involve objects having multiple attributes, so a set of features is used as a feature vector. For k classes,
P(i|x) = P(x|i) P(i) / Σ_{j=1..k} P(x|j) P(j),
with P(x) modelled as a Gaussian distribution:
P(x) = [ 1 / √((2π)^d det Σ) ] · exp( −(1/2)(x − m)ᵀ Σ⁻¹ (x − m) ).
If more features are involved, the mean becomes a mean vector and the variance becomes a covariance matrix Σ.

NON-PARAMETRIC STATISTICAL METHODS
In this method a representative of every class is selected, and classification is performed by assigning each tuple to the class to which it is most similar. Let the classes be {c₁, c₂, ..., cₙ} and let the training dataset D be {t₁, t₂, ..., tₙ}. The k-nearest-neighbours problem is to assign tᵢ to the class cⱼ such that the similarity measure sim(t, cⱼ) is greater than or equal to sim(t, cᵢ) for all i ≠ j. The similarity measure can be obtained using distance measures.
ALGORITHM (see the sketch below):
1. Choose the representative of each class; normally the centre or centroid of the class is chosen.
2. Compare the test tuple with the centre of each class.
3. Classify the test tuple into the appropriate class.
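A minimal sketch of the representative-based (nearest-centroid) algorithm just listed, in Python/NumPy; the slides give no code for it, and the two-feature toy data and Euclidean similarity measure are illustrative assumptions.

import numpy as np

def class_centroids(X, y):
    # Step 1: choose the centroid of each class as its representative.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid(x, centroids):
    # Steps 2-3: compare the test tuple with each centroid and pick the closest class.
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# Illustrative training data: two features per instance, two classes.
X = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
              [5.0, 5.2], [4.8, 5.1], [5.2, 4.9]])
y = np.array([0, 0, 0, 1, 1, 1])

centroids = class_centroids(X, y)
print(nearest_centroid(np.array([1.0, 1.0]), centroids))   # expected: 0
print(nearest_centroid(np.array([5.0, 5.0]), centroids))   # expected: 1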
REGRESSION METHODS
Regression is one of the methods used for numerical prediction. Regression analysis models the relationship between one or more independent variables (the input attributes) and a dependent variable (the result). For example, fitting a line to a set of points can be described as
Y = W₀ + W₁x,
where W₀ and W₁ are the regression coefficients (weights). The coefficients can be found using the method of least squares, which fits the line that minimizes the error between the actual data and the estimate. If D is the training set,
W₁ = Σ_{i=1..|D|} (xᵢ − x̄)(yᵢ − ȳ) / Σ_{i=1..|D|} (xᵢ − x̄)²,
W₀ = ȳ − W₁x̄,
where x̄ and ȳ are the mean values of the data x and y.

STRUCTURAL AND SYNTACTIC CLASSIFIER ALGORITHMS
Structural methods exploit the relationships that exist among the basic elements of the objects. They use techniques such as graphs to encode the objects, and the recognition problem becomes a matching problem. Syntactic methods (grammar-based or linguistic approaches) use strings or small sets of pattern primitives together with grammatical rules for recognizing the object.

SYNTACTIC CLASSIFIERS
The idea is to decompose the object in terms of basic primitives. The process of decomposing an object into a set of primitives is called parsing. The basic primitives can then be recombined into the original object using formal languages, to check whether the recognized pattern is obtained. Hence formal language theory plays an important role in syntactic classification.

STAGES OF A SYNTACTIC CLASSIFIER
1st phase, training: the syntactic classifier is given a training dataset of valid strings of known objects. The patterns are decomposed into basic primitives, and the grammar necessary for combining the primitives to reconstruct the original object is identified in this phase.
2nd phase, testing: unknown patterns are supplied to the grammar of the syntactic classification system. Each unknown pattern is decomposed into basic primitives and checked using a parser.

SHAPE MATCHING ALGORITHMS
Assume the shapes A and B have shape numbers in the form of strings of chain codes, and let the strings represent the shape characteristics of the object boundaries. Under this assumption, the shapes have a degree of similarity k if
Sⱼ(A) = Sⱼ(B) for j = 4, 6, 8, ..., k, and
Sⱼ(A) ≠ Sⱼ(B) for j = k + 2, k + 4, ...,
where j is the order of the shape number. The similarities are recorded in a matrix called the similarity matrix. Another way is to use a distance measure for shape matching; the distance is given as the reciprocal of the similarity measure,
D(A, B) = 1/k,
where D(A, B) is the distance between the two shapes A and B and k is the degree of similarity.

STRING MATCHING ALGORITHMS
Let there be two regions a and b, and assume that they are coded into two strings a = {a₁, a₂, ..., aₙ} and b = {b₁, b₂, ..., bₘ}. A match occurs at position k if aₖ = bₖ. Let α be the number of matches between the two strings. Then the following two measures can be defined:
1. The number of symbols that do not match:
β = max(|a|, |b|) − α,
where |a| and |b| are the lengths of the strings a and b. β = 0 only when the two strings are identical.
2. The degree of similarity:
R = α/β = α / ( max(|a|, |b|) − α ).
When the strings are identical, R = ∞; the value of R is large when there is a good match between the strings.

STRUCTURAL METHODS: RULE-BASED ALGORITHMS
Tree search is a popular approach that uses rules for classification. The simplest rules have the form IF (condition) THEN (conclusion); the IF part is called the antecedent or precondition, and the THEN part is called the rule consequent. Decision rules are generated using a technique called a covering algorithm, in which the best attribute, the one that minimizes the classification error on the training data, is chosen and used to generate a rule. (A small sketch of applying such IF-THEN rules follows.)
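As a toy illustration of the IF (condition) THEN (class) rules produced by a covering-style algorithm, here is a short Python sketch (not from the slides); the feature names, thresholds, and rules are invented purely for illustration.

# Each rule is (antecedent, consequent): a predicate over the feature dict and a class label.
rules = [
    (lambda f: f["area"] > 500 and f["elongation"] < 1.5, "circle-like"),
    (lambda f: f["area"] > 500 and f["elongation"] >= 1.5, "bar-like"),
    (lambda f: f["area"] <= 500, "small-object"),
]

def classify(features, rules, default="unknown"):
    # Apply the rules in order; the first antecedent that fires gives the class.
    for antecedent, consequent in rules:
        if antecedent(features):
            return consequent
    return default

print(classify({"area": 800, "elongation": 1.2}, rules))   # circle-like
print(classify({"area": 300, "elongation": 2.0}, rules))   # small-object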
RULE-BASED ALGORITHMS (CONT.)
1. In a decision tree, every node can have only two children.
2. The root is a specially designated node, and all the other internal nodes of the tree represent the rule conditions.
3. The leaves of the tree are the classes that are assigned to the instances.
4. The features of the unknown object or instance are taken, and their values are compared and validated against the conditions represented sequentially in the internal nodes of the tree.
Tracing the path from the root to the assigned class gives the conditions that led to the classification of that instance. For any tree classifier, the required feature is searched, and the search continues until the instance is assigned to a class. Some of the search algorithms used are:
• Top-down search
• DFS (depth-first search)
• BFS (breadth-first search)
• A* algorithms

GRAPH-BASED APPROACH
The graph-based approach is an extension of the tree-based approach. Initially an object is modelled as a graph; graph matching is then used to give the similarity measure between objects. Two graphs can be similar even if they are structurally different. If there is a complete match, the graphs are declared isomorphic; otherwise they are dissimilar.

EVALUATION OF CLASSIFIER ALGORITHMS
Some of the techniques used are:
1. Separate training and test sets: this is one of the simplest methods for testing a classifier. The dataset is separated into two sets; one is the training dataset and the other is the test dataset, which is used for testing the performance of the classifier.
2. k-fold cross validation: an improvement over the previous method. The dataset is divided into k subsets. Each time the classifier is tested, k − 1 subsets together form the training dataset and the remaining subset is the test dataset. The process is repeated for k trials; a typical value of k is 10 (see the sketch below).
3. Leave-one-out cross validation: also called N-fold or jackknife validation. Every instance is in turn treated as the test dataset; N classifiers are generated and each is used to classify its single held-out instance. This method is often unsuitable for large real-world problems because the computation is intensive.
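Before the summary table of these evaluation methods, here is a minimal sketch of k-fold cross-validation in Python (not part of the slides); the train_and_test callback, the trivial majority-class classifier, and the toy data are illustrative assumptions.

import numpy as np

def k_fold_cross_validation(X, y, k, train_and_test):
    # Split the data into k folds; each fold is used once as the test set.
    # Returns the average error rate across the k trials.
    indices = np.arange(len(X))
    np.random.shuffle(indices)
    folds = np.array_split(indices, k)
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        error = train_and_test(X[train_idx], y[train_idx], X[test_idx], y[test_idx])
        errors.append(error)
    return float(np.mean(errors))

def majority_classifier(X_tr, y_tr, X_te, y_te):
    # A trivial "classifier" used only to exercise the evaluation loop.
    majority = np.bincount(y_tr).argmax()
    return float(np.mean(y_te != majority))   # misclassification error on the fold

X = np.random.rand(100, 2)
y = np.random.randint(0, 2, size=100)
print(k_fold_cross_validation(X, y, k=10, train_and_test=majority_classifier))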
EVALUATION OF CLASSIFIER ALGORITHMS (CONT.)
Method and performance measure:
• Separate training and test sets. Predictive accuracy: C/N, where C is the number of instances correctly classified and N is the total number of instances.
• k-fold cross validation. Overall performance: the average misclassification error of the classifier across all k trials.
• Leave-one-out cross validation. Predictive accuracy: correctly classified samples / total number of instances.

METRICS OF QUALITATIVE QUANTIFICATION
• Classification time: time for constructing the model plus time for classifying the unknown instances.
• Robustness: immunity of the classifier to noise or missing data.
• Scalability: ability to handle large datasets.
• Goodness of fit: quality of the generated model, as described by the confusion matrix.
• True positive rate (TP rate), the sensitivity of the classifier: TP/P, where P = TP + FN.
• False positive rate (FP rate), related to the specificity of the classifier: FP/N, where N = FP + TN.
• False negative rate (FN rate), the probability of an erroneous result for positive instances: FN/P, where P = TP + FN.
• True negative rate (TN rate), the proportion of negative instances classified correctly: TN/N, where N = FP + TN.
• Positive predictive value (precision): TP/(TP + FP).
• Accuracy, the overall ability of the classifier to classify instances correctly: (TP + TN)/(TP + TN + FP + FN).
• Negative predictive value: TN/(TN + FN).
• Error rate, the probability of an object not being classified correctly: (FP + FN)/(TP + TN + FP + FN).

GRAPHICAL METHOD FOR PERFORMANCE EVALUATION
The receiver operating characteristic (ROC) graph is an effective tool for visualizing the performance of a classifier as well as for comparing the performance of many classifiers. It is a 2-D plot in which the x-axis is the FP rate and the y-axis is the TP rate. For any classifier, the FP and TP rates can be plotted as an (x, y) point in the graph; to compare two classifiers, their points are compared. The ROC curve is helpful in understanding the tuning process that leads to the best classification. The area under the curve indicates the accuracy of the model: if the area is one, the model is perfect. A classifier's performance can be crudely compared with the best possible classifier, represented by the point (0, 1), using the Euclidean distance
distance = sqrt( (FP rate)² + (1 − TP rate)² ).
This distance ranges from 0 (best classifier) to √2 (worst classifier).

UNSUPERVISED LEARNING: CLUSTERING
Clustering is a technique for partitioning a group of images or data into meaningful disjoint subgroups. Images that are similar to each other group themselves into a single cluster: all the images in a subgroup are similar to each other, and images across clusters are different. Clustering is an example of unsupervised learning, where there is no prior knowledge of the classes or clusters.

METHODS FOR FINDING THE SIMILARITY AND DISSIMILARITY OF IMAGES
Image clustering algorithms are based on the notion of similarity or dissimilarity between images. Proximity can be used to denote similarity and dissimilarity together. Similarity measures are indicated by distance functions.

DISTANCE MEASURES
A distance function characterizes how close one image is to another. For a distance function to be called a metric, it must satisfy:
1. D(i, j) ≥ 0 for all i and j
2. D(i, j) = 0 if i = j
3. D(i, j) = D(j, i) for all i and j
4. D(i, k) ≤ D(i, j) + D(j, k) for all i, j, and k (the triangle inequality)
The appropriate distance measure depends on the data type of the objects involved in the clustering process, as summarized in the table below.
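As a small illustration of the quantitative distance measures listed in the table that follows, here is a Python sketch (not from the slides) of the Euclidean, Manhattan-average, and Minkowski distances between two feature vectors; the example vectors are arbitrary.

import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan_average(a, b):
    # Average city-block distance: (1/N) * sum |a_k - b_k|
    return np.mean(np.abs(a - b))

def minkowski(a, b, q):
    # General form: (sum |a_k - b_k|^q)^(1/q); q = 2 gives Euclidean, q = 1 gives Manhattan.
    return np.sum(np.abs(a - b) ** q) ** (1.0 / q)

a = np.array([2.0, 5.0, 1.0])   # illustrative feature values (e.g. size, centroid, area)
b = np.array([3.0, 4.0, 2.5])
print(euclidean(a, b), manhattan_average(a, b), minkowski(a, b, q=3))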
DISTANCE MEASURES BY DATA TYPE
• Nominal (categorical) variables, e.g. identification numbers or label numbers:
D(x, y) = (n − m)/n, where n is the number of attributes and m is the number of matches between the attributes of x and y.
• Binary variables, i.e. variables indicating the occurrence or non-occurrence of an event:
D(x, y) = (n − m)/(n − s), where m is the number of matches between the attributes of x and y and s is the number of features absent in both images.
• Quantitative measures, e.g. size, centroid, area:
Euclidean distance: D(Oᵢ, Oⱼ) = sqrt( Σₖ (Oᵢₖ − Oⱼₖ)² )
Manhattan average distance: D(Oᵢ, Oⱼ) = (1/N) Σ_{k=1..n} |Oᵢₖ − Oⱼₖ|
Minkowski distance: D(x, y) = ( |x₁ − y₁|^q + |x₂ − y₂|^q + ... + |xₙ − yₙ|^q )^{1/q}
• Ordinal or ranked variables, e.g. grades such as {S, A, B} that carry an inherent order: ranks are first mapped to zᵢ = (rᵢ − 1)/(M − 1), where rᵢ is the rank and M is the maximum rank.
• Qualitative measures, e.g. shape numbers: number of matches.
• Interval and ratio variables, where the difference measure is meaningful: Minkowski distance, as above.

CLUSTERING ALGORITHMS
[Taxonomy: clustering algorithms divide into hierarchical clustering (agglomerative and divisive methods) and partitional methods.]

HIERARCHICAL CLUSTERING
Hierarchical methods produce a recursive partitioning of the set of objects, and the results are shown as a dendrogram. They are subdivided into agglomerative methods and divisive methods. The advantages of these methods are:
1. There is no need for a vector representation of each object.
2. The algorithms are simple and easy to understand and interpret.
3. They normally yield the correct number of clusters.
4. They are helpful in identifying outliers.

DENDROGRAM
[Figure: dendrogram for a grayscale image.]

AGGLOMERATIVE ALGORITHMS
These treat each individual object as a cluster; clusters are then merged with other clusters, and the process continues until a single cluster is ultimately obtained. Stages:
1. Create a separate cluster for every data instance.
2. Repeat until a single cluster is obtained:
   a. Determine the two most similar clusters using similarity measures.
   b. Merge the two clusters into a single cluster.
3. When no more merging is possible, the resulting clustering is final.
One popular algorithm is the single-linkage algorithm: it takes a single instance and merges it with the cluster to which it is closest, and this process is continued until no more merging is possible.

PARTITIONAL METHODS
These are greedy approaches that are applied iteratively to obtain a single level of partition. They produce locally optimal or suboptimal solutions. One popular algorithm is the k-means algorithm.

K-MEANS ALGORITHM
1. The user specifies the number of clusters initially.
2. The algorithm then generates the required number of random cluster centres, called the initial cluster centres.
3. Each point is assigned to the cluster whose centre is at minimum distance from it.
4. The centroid of each cluster is then recomputed, and the iteration continues until there is no change in the centroid values; otherwise new means are chosen and the process is repeated.
Example (MATLAB) of generating sample data for k-means cluster evaluation:
X = [randn(100,2)+ones(100,2); randn(100,2)-ones(100,2)];

CHARACTERISTICS OF A GOOD CLUSTERING ALGORITHM
• Efficiency of the clustering algorithm
• Ability to handle missing data in the dataset
• Ability to handle noisy and outlier data
• Ability to handle different attribute types
• Scale invariance
• Ability to obtain good clusters for all attribute values/methods
• Consistency
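A minimal sketch of the k-means procedure outlined above, written in Python/NumPy rather than MATLAB (the slides only show a MATLAB line that generates sample data); the two-cluster toy data mirrors that MATLAB example, and the fixed iteration cap is an illustrative assumption.

import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 1-2: choose k initial cluster centres at random from the data.
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: assign each point to the nearest centre.
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute the centroids; stop when they no longer change.
        # (For simplicity, empty clusters are not handled in this sketch.)
        new_centres = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres, labels

# Sample data analogous to the MATLAB line above: two Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(100, 2)) + 1.0,
               rng.normal(size=(100, 2)) - 1.0])
centres, labels = k_means(X, k=2)
print(centres)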
CLUSTER EFFICIENCY MEASURES
• Cluster cohesion: a measure of how similar the elements within a cluster are to each other.
• Cluster separation: a measure of how distinct a cluster is from the other clusters.

METRICS OF CLUSTER EVALUATION
• Purity: (1 / total number of elements) × (sum over all clusters of the majority-class count in each cluster).
• Precision and recall: measure the extent to which a class is present in a cluster.
• Similarity-based measures: computed from a contingency table whose entries A and B count the pairs on which the clustering and the reference grouping agree, and whose entries C and D count the pairs on which they disagree.
• Jaccard coefficient: A / (A + C + D).
• Rand coefficient: (A + B) / (A + B + C + D).
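To make the purity measure concrete, here is a small Python sketch (not part of the original slides); the cluster assignments and true class labels are invented for illustration.

from collections import Counter

def purity(cluster_labels, true_labels):
    # Purity = (1 / total elements) * sum over clusters of the majority-class count.
    clusters = {}
    for c, t in zip(cluster_labels, true_labels):
        clusters.setdefault(c, []).append(t)
    majority_total = sum(Counter(members).most_common(1)[0][1]
                         for members in clusters.values())
    return majority_total / len(true_labels)

# Illustrative example: 6 objects, 2 clusters, true classes 'a' and 'b'.
cluster_labels = [0, 0, 0, 1, 1, 1]
true_labels    = ['a', 'a', 'b', 'b', 'b', 'a']
print(purity(cluster_labels, true_labels))   # (2 + 2) / 6 = 0.666...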