INTRODUCTION

An image is a 2-D function f(x, y) that maps the spatial coordinates x and y to the intensity or gray value at that point in space. Each point (x, y) in an image is called a pixel. If the image is colored, the intensity at each point has three components: red, green, and blue. In a monochromatic image the three components are equal at every point. If both the domain and the range of the image function are discrete, the image is called a digital image.

Digital image processing is the field that deals with processing digital images with the help of a computer. It comprises procedures such as noise reduction, contrast enhancement, image sharpening and smoothing, segmentation, description of objects, and classification of objects. The quality of a satellite image depends heavily on natural phenomena such as atmospheric conditions, on illumination from the sun and from various artificial sources, and on conditions such as the position of the satellite when the image was taken.

Image classification is the process of making quantitative decisions from image data by grouping pixels or regions of the image into classes intended to represent different physical objects or types. The output of the classification process may be regarded as a thematic map rather than an image. Most classification techniques rely mainly on the radiometric data (pixel values) present in the image, with little or no reference to spatial variation.

Suppose we have an n-band image in which the pixel value in each band can take k different values. The number of possible coordinates in the n-dimensional pixel value space is k^n, a number that can very easily exceed a million. For example, four bands with 256 levels each already give 256^4, or about 4.3 billion, possible coordinates. However, it is very unlikely that the image represents a million or more different classes of data, or that we could make use of the information if it did.
What we require is some simplification of the data in the n-dimensional pixel value space, identifying a volume within this space as representing a single class of data. In our project we take up the first step of unsupervised classification, namely clustering of the image data, in which the entire image is analyzed without reference to any training data. The aim of the analysis is to identify distinguishable clusters of data in the n-dimensional pixel value space. The clusters can then be used in image classification.

Multi Spectral Classification:

There are three main types:
    Supervised Classification
    Unsupervised Classification
    Hybrid Classification

Supervised Classification: In this type of classification the image analyst supervises the pixel categorization process by specifying to the computer algorithm numerical descriptors of the various land cover types present in a scene.

Unsupervised Classification: In this type of classification the image data are first classified by aggregating them into the natural spectral groupings, or clusters, present in the scene. The image analyst then determines the land cover identity of these spectral groups by comparing the classified image data to ground reference data.

Hybrid Classification: This type of classification involves aspects of both supervised and unsupervised classification and is aimed at improving the accuracy or efficiency (or both) of the classification process.

Unsupervised Classification

This family of classifiers involves algorithms that examine the unknown pixels in an image and aggregate them into a number of classes based on the natural groupings or clusters present in the image values. The basic premise is that values within a given cover type should be close together in the measurement space, whereas data in different classes should be comparatively well separated. The classes that result from unsupervised classification are spectral classes.
Because they are based solely on natural groupings in the image values, the identity of the spectral classes is not known initially. The analyst must compare the classified data with some form of reference data to determine the identity and informational value of each spectral class. Thus, in the unsupervised approach we first determine spectrally separable classes and then define their informational utility.

There are numerous clustering algorithms that can be used to determine the natural spectral groupings present in a data set. In one common form of clustering, the program reads through the entire data set and builds clusters. A mean vector is associated with each cluster. A minimum distance to means classification algorithm is then applied on a pixel-by-pixel basis, assigning each pixel to one of the clusters initially created. We therefore create cluster structures to be used by the classifier. The first step in an unsupervised classification is to cluster the image data, and we implemented this basic clustering approach in Turbo C.

Clustering Techniques

There are many clustering algorithms available:

a) Clustering Method

b) Non-Hierarchical Clustering Methods:
    Nearest Centroid Sorting, fixed number of clusters
        Forgy’s Method and Jancey’s Variant
        MacQueen’s K-Means Method and variant
    Nearest Centroid Sorting, variable number of clusters
        MacQueen’s K-Means Method with coarsening and refining parameters
        Wishart’s variant on K-means
        ISODATA Method

c) Hierarchical Clustering Methods:
    The Central Agglomerative Procedure
    Stored Matrix Approach
    Stored Data Approach
    Sorted Matrix Approach
    Parks Clustering Program

Cluster Analysis

a) Need for Cluster Analysis Algorithm

Even though little or nothing about the category structure can be stated in advance, one frequently has at least some latent notions of the desirable and unacceptable features of a classification scheme.
In operational terms, the analyst is usually sufficiently informed about the problem to distinguish between good and bad category structures when confronted with them. The number of ways of sorting n observations into m groups is a Stirling number of the second kind:

$$S_n^{(m)} = \frac{1}{m!} \sum_{k=0}^{m} (-1)^{m-k} \binom{m}{k} k^n$$

It would take an inordinately long time to examine so many alternatives, and the ability to make meaningful distinctions between cases would diminish rapidly. The intent of a cluster analysis algorithm is generally to emulate human efficiency and find an acceptable solution while considering only a small number of the alternatives.

b) Uses of Cluster Analysis

Cluster analysis has been employed as an effective tool in scientific inquiry. One of its most useful roles is to generate hypotheses about category structures. An algorithm can assemble observations into groups that prior misconceptions and ignorance would otherwise preclude. The results of cluster analysis can contribute directly to the development of classification schemes. In a more theoretical vein, cluster analysis can be used to develop inductive generalizations.

c) Clustering Criteria

The term cluster is often left undefined and taken as a primitive notion, in much the same manner as “point” is treated in geometry. But when it comes to finding clusters in real image data, the term must carry a definite meaning. The choice of the clustering criterion is tantamount to defining a cluster. It may not be possible to say what a cluster is in abstract terms, but it can always be defined constructively by stating the criterion and the implementing algorithm. Many criteria for clustering have been proposed and used. In some problems there is a natural choice, while in others almost any criterion might qualify as a candidate.

Problem Statement

Given: Samples of multi-spectral satellite images from IRS satellites.
Problem: To identify distinguishable clusters of data in an n-dimensional pixel value image.

Result: The different clusters obtained.

Clustering Algorithm

The analyst may be required to supply four types of information:
    R, a radius in spectral space used to determine when a new cluster should be formed.
    C, a spectral space distance parameter used when merging clusters.
    N, the number of pixels to be evaluated between each merging of the clusters.
    Cmax, the maximum number of clusters to be identified by the algorithm.

The multispectral data set is evaluated sequentially, pixel by pixel, from left to right. First, we let the brightness values associated with the first pixel represent the mean data vector of a cluster. This is an n-dimensional mean data vector, with n being the number of bands used in the unsupervised classification.

Pixel 1 is taken as cluster 1 and pixel 2 as cluster 2, and the spectral distance D between cluster 1 and cluster 2 is calculated. If D is greater than R, cluster 2 remains a separate cluster. If D is less than R, the mean data vector of cluster 1 becomes the average of the first and second pixel brightness values, and the weight of cluster 1 becomes 2.

This cluster accumulation continues until the number of pixels evaluated is greater than N. At that point, the program stops evaluating individual pixels and examines the clusters obtained so far. It calculates the distance between each cluster and every other cluster. Any two clusters separated by a distance less than C are merged. The merged cluster has a mean vector equal to the weighted average of the two original mean vectors and a weight equal to the sum of the two individual weights. This process continues until no two clusters are separated by a distance of less than C. It is necessary to evaluate the locations of the clusters in this way and to combine some of them.
If the number of clusters formed is greater than Cmax, the algorithm does not form new clusters but uses the minimum distance to means algorithm to assign every pixel to one of the Cmax clusters. The analyst usually produces a display depicting the cluster to which each pixel was assigned.