Extraction of Thematic Information Through Image Classification

Data vs. Information
Data: a collection of numbers or facts that require further processing before they are meaningful.
Information: knowledge derived from raw data; something that is independently meaningful.

Information Classes vs. Spectral Classes
Information classes are the categories of interest to the users of the data, e.g., forest types, wetland categories, agricultural fields, urban land use. These classes form the information that can be derived from remote sensing data. They are not directly recorded on remote sensing images; we can derive them only indirectly, using the evidence contained in the pixel values recorded by each spectral band.
Spectral classes are groups of pixels that are uniform with respect to brightness in their spectral channels. Establishing the links between spectral classes and information classes is the primary goal of image classification.

Land-Use and Land-Cover Classification Schemes
Land cover refers to the type of material present on the landscape (e.g., water, sand, crops, forest, wetland, human-made materials such as asphalt). Land use refers to what people do on the land surface (e.g., agriculture, commerce, settlement).
Successfully obtaining land-use and/or land-cover information requires a classification scheme containing correct definitions or descriptions of classes organized according to logical criteria, i.e., a classification system.

Land-Use and Land-Cover Classification Systems
If a hard classification is to be performed, then the classes in the classification system should be:
• mutually exclusive,
• exhaustive, and
• hierarchical.
Each pixel location corresponds to exactly one type of land cover.
Mutually exclusive means that there is no taxonomic overlap (or fuzziness) between any classes (e.g., deciduous forest and evergreen forest are distinct classes). 
Exhaustive means that all land-cover classes present in the landscape are accounted for and none have been omitted.
Hierarchical means that sublevel classes (e.g., single-family residential, multiple-family residential) may be combined into a higher-level category (e.g., residential) that makes sense. This allows simplified thematic maps to be produced when required.

(Table: the four levels of the U.S. Geological Survey Land-Use/Land-Cover Classification System for use with remote sensor data, and the type of remotely sensed data typically used to provide the information at each level.)

International Geosphere-Biosphere Program (IGBP) Land-Cover Classification System
If a scientist is interested in inventorying land cover at the regional, national, or global scale, the modified IGBP Land-Cover Classification System may be appropriate. The land-cover parameter identifies 17 categories of land cover following the IGBP global vegetation database, which defines nine classes of natural vegetation, three classes of developed lands, two classes of mosaic lands, and three classes of nonvegetated lands (snow/ice, bare soil/rocks, water).
(Figure: MODIS Land Cover Data Product.)

Land Cover Classification System (LCCS)
The LCCS is a comprehensive, standardized classification system designed to meet specific user requirements and created for mapping exercises, independent of the scale or means used to map. Any land cover identified anywhere in the world can be readily accommodated.
(Figure: Landsat Working Data (2000), 30-m resolution.)
(Figure: Land Cover Classification System, adapted from the U.N. Land Cover Classification System.)

Observations About Classification Schemes
Geographical information is often imprecise. For example, there is usually a gradual transition at the interface of forests and rangeland, yet many of the aforementioned classification schemes insist on a hard boundary between the classes in this transition zone. 
The schemes should contain fuzzy definitions because the thematic information they describe is fuzzy. However, fuzzy classification schemes are not currently standardized; they are typically developed by individual researchers for site-specific projects and may not be transferable to other environments. Therefore, existing hard classification schemes continue to be widely employed, because they are scientifically based and because different individuals using the same classification system can compare results.

Multispectral Classification
Multispectral classification is the process of sorting pixels into a finite number of individual classes, or categories of data, based on their data file values. If a pixel satisfies a certain set of criteria, the pixel is assigned to the class that corresponds to those criteria.

Unsupervised Classification
In an unsupervised classification, the identities of the land-cover types to be specified as classes within a scene are not generally known a priori, because ground reference information is lacking or surface features within the scene are not well defined. The computer is instructed to group pixels with similar spectral characteristics into unique clusters according to some statistically determined criteria. The analyst then relabels and combines the spectral clusters into information classes.
Unsupervised classification, also called clustering, is a process of numerical operations that searches for natural groupings of the spectral properties of pixels, as examined in multispectral feature space. It requires only a minimal amount of initial input from the analyst. However, the analyst has the task of interpreting the classes created by the unsupervised training algorithm. 
The clustering process results in a classification map consisting of m spectral classes. The analyst then attempts a posteriori (after the fact) to assign or transform the spectral classes into the thematic information classes of interest (e.g., forest, agriculture). This may be difficult: some spectral clusters may be meaningless because they represent mixed classes of Earth surface materials. The analyst must understand the spectral characteristics of the landscape well enough to be able to label certain clusters as specific information classes.

ISODATA (Iterative Self-Organizing Data Analysis Technique)
ISODATA makes a large number of passes through the remote sensing dataset until specified results are obtained. It requires the analyst to specify:
1. the maximum number of clusters to be identified by the algorithm, and
2. the maximum percentage of pixels whose class values are allowed to remain unchanged between iterations. When this percentage is reached (e.g., 95%), the algorithm terminates.

ISODATA proceeds in two phases:
Phase 1: cluster building, using many passes through the dataset.
Phase 2: assignment of pixels to one of the clusters using minimum-distance-to-means classification logic.
a) Initial distribution of five hypothetical mean vectors, placed along the ±1σ standard deviation range in both bands.
b) In the first iteration, each candidate pixel is compared to each cluster mean and assigned to the cluster whose mean is closest in Euclidean distance.
c) During the second iteration, a new mean is calculated for each cluster based on the actual spectral locations of the pixels assigned to it, instead of the initial arbitrary placement. 
This involves analysis of several parameters to merge or split clusters. After the new cluster mean vectors are selected, every pixel in the scene is assigned to one of the new clusters.
d) This split–merge–assign process continues until there is little change in class assignment between iterations (the threshold is reached) or the maximum number of iterations is reached.

Example: classification of an image into 20 clusters.
a) Distribution of 20 ISODATA mean vectors after just one iteration using Landsat TM band 3 and band 4 data. Notice that the initial mean vectors are distributed along a diagonal in two-dimensional feature space according to the ±2σ standard deviation logic discussed.
b) Distribution of 20 ISODATA mean vectors after 20 iterations. The bulk of the important feature space (the gray background) is partitioned rather well after just 20 iterations.

Supervised Classification
In a supervised classification, the identity and location of the land-cover types (e.g., urban, agriculture, or wetland) are known a priori through a combination of fieldwork, interpretation of aerial photography, map analysis, and personal experience.
The analyst attempts to locate specific sites in the remotely sensed data that represent homogeneous examples of known land-cover types. These areas are commonly referred to as training sites, because the spectral characteristics of these known areas are used to train the classification algorithm for eventual land-cover mapping of the remainder of the image.

Two sensor characteristics matter here: spatial resolution (pixel size) and spectral resolution (bands). Differences between vegetation classes are often more distinct in the near-IR than in visible spectral bands.

For example, a pixel at a given location with brightness values 23, 42, 89, and 94 in spectral bands 1–4 is represented by the measurement vector
  Xc = [23, 42, 89, 94]^T.
Pixel-based classifiers operate on each pixel as a single set of spectral values considered in isolation from its neighbors. 
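The ISODATA build–assign loop described above can be sketched in code. This is a minimal illustration only: the split/merge heuristics of full ISODATA are omitted, the initial means are spread along the ±1σ diagonal of feature space as in the text, and the function name and parameter defaults are hypothetical.

```python
import numpy as np

def isodata_sketch(image, n_clusters=5, max_iter=20, unchanged_threshold=0.95):
    """Simplified ISODATA-style clustering (split/merge heuristics omitted).

    image: array of shape (rows, cols, bands).
    Returns a (rows, cols) array of spectral-cluster labels.
    """
    rows, cols, bands = image.shape
    # Flatten the image cube into per-pixel measurement vectors (one row each).
    X = image.reshape(-1, bands).astype(float)

    # Initial mean vectors spaced along the +/-1 sigma diagonal of feature space.
    lo = X.mean(axis=0) - X.std(axis=0)
    hi = X.mean(axis=0) + X.std(axis=0)
    means = np.linspace(lo, hi, n_clusters)          # shape (n_clusters, bands)

    labels = np.zeros(len(X), dtype=int)
    for it in range(max_iter):
        # Assign each pixel to the cluster whose mean is closest (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Terminate when the fraction of unchanged pixels reaches the threshold.
        if it > 0 and (new_labels == labels).mean() >= unchanged_threshold:
            labels = new_labels
            break
        labels = new_labels
        # Recompute each cluster mean from the pixels actually assigned to it.
        for c in range(n_clusters):
            if np.any(labels == c):
                means[c] = X[labels == c].mean(axis=0)
    return labels.reshape(rows, cols)
```

Note how the iteration mirrors the a)–d) steps: arbitrary initial means, minimum-distance assignment, mean recomputation, and a change-threshold stopping rule.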
In this example landscape, two regions within the image differ with respect to average brightness but also with respect to "texture" – the uniformity of pixels within a neighborhood.

Training is the process of defining the criteria by which patterns in image data are recognized for the purpose of classification.
Training sample (signature): a set of pixels selected to represent a potential class.
Supervised training: any method of generating signatures for classification in which the analyst is directly involved in the pattern recognition process.
Number of pixels for a training signature: the general rule is that if training data are being extracted from n bands, then >10n pixels of training data should be collected for each class. This is sufficient to compute the variance–covariance matrices required by some classification algorithms.

Training Site Selection and Statistics Extraction
The analyst may view the image on screen and select polygonal areas of interest (AOIs) (e.g., a stand of oak forest). Most image processing systems provide a "rubber band" tool that allows the analyst to delineate detailed AOIs.
Alternatively, the analyst may seed a specific location in the image using the cursor. The seed program begins at a single (x, y) location and evaluates neighboring pixel values in all bands of interest. Using criteria specified by the analyst, the seed algorithm expands outward like an amoeba as long as it finds pixels with spectral characteristics similar to the original seed pixel.

Each pixel in each training site associated with a particular class c is represented by a measurement vector Xc:
  Xc = [BVij1, BVij2, ..., BVijn]^T
where BVijk is the brightness value for the (i, j)th pixel in band k. 
The brightness values for each pixel in each band in each training class can then be analyzed statistically to yield a mean measurement vector Mc for each class:
  Mc = [μc1, μc2, ..., μcn]^T
where μck represents the mean value of the data obtained for class c in band k.

The raw measurement vectors can also be analyzed to yield the covariance matrix Vc for each class c, an n × n matrix whose entry Covckl is the covariance of class c between bands k and l. For brevity, the notation for the covariance matrix of class c (i.e., Vckl) is shortened to Vc; likewise, the covariance matrix of class d (Vdkl) is shortened to Vd.

Supervised Classification
Multivariate statistical parameters (means, standard deviations, covariance matrices, correlation matrices, etc.) are calculated for each training site. Every pixel, both within and outside the training sites, is then evaluated and assigned to the class of which it has the highest likelihood of being a member.

Selecting the Appropriate Classification Algorithm
Parametric classification algorithms assume that the observed measurement vectors Xc obtained for each class in each spectral band during the training phase are Gaussian, that is, normally distributed. Nonparametric classification algorithms make no such assumption.
Several widely adopted nonparametric classification algorithms:
• one-dimensional density slicing,
• parallelepiped,
• minimum distance,
• nearest-neighbor, and
• neural network and expert system analysis.
The most widely adopted parametric classification algorithm is maximum likelihood.

Parallelepiped Classification Algorithm
This widely used digital image classification decision rule is based on simple Boolean "and/or" logic. Training data in n spectral bands are used to perform the classification. 
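The training statistics described above, the mean measurement vector Mc and the covariance matrix Vc, can be computed directly from a set of training pixels with NumPy. The pixel values below are hypothetical, chosen only to illustrate the shapes involved.

```python
import numpy as np

# Hypothetical training pixels for one class c: each row is a measurement
# vector Xc = [BVij1, ..., BVijn] sampled from a training polygon (n = 4 bands).
training_pixels = np.array([
    [23, 42, 89, 94],
    [25, 44, 91, 96],
    [22, 41, 88, 93],
    [24, 43, 90, 95],
], dtype=float)

# Mean measurement vector Mc: the per-band mean of the training pixels.
Mc = training_pixels.mean(axis=0)          # -> [23.5, 42.5, 89.5, 94.5]

# Covariance matrix Vc (n x n); rowvar=False treats each column as a band.
Vc = np.cov(training_pixels, rowvar=False)  # shape (4, 4)
```

With only four training pixels this covariance estimate is poor, which is exactly why the >10n-pixels rule quoted earlier matters for classifiers that depend on Vc.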
Brightness values from each pixel of the multispectral imagery are used to produce an n-dimensional mean vector Mc = (μc1, μc2, μc3, ..., μcn), with μck being the mean value of the training data obtained for class c in band k out of m possible classes, as previously defined. σck is the standard deviation of the training data for class c in band k.

Using a one-standard-deviation threshold, the parallelepiped algorithm decides that BVijk is in class c if, and only if:
  μck − σck ≤ BVijk ≤ μck + σck
where c = 1, 2, 3, ..., m (number of classes) and k = 1, 2, 3, ..., n (number of bands). Therefore, if the low and high decision boundaries are defined as
  Lck = μck − σck and Hck = μck + σck,
the parallelepiped algorithm becomes
  Lck ≤ BVijk ≤ Hck.

Points a and b are pixels in the image to be classified. Pixel a has a brightness value of 40 in band 4 and 40 in band 5; pixel b has a brightness value of 10 in band 4 and 40 in band 5. The boxes represent the parallelepiped decision rule associated with a ±1σ classification. The vectors (arrows) represent the distances from a and b to the mean of each class in a minimum-distance-to-means classification.

Minimum Distance Rule (Spectral Distance)
The minimum-distance-to-means decision rule is computationally simple. It requires that the user provide the mean vector μck for each class in each band from the training data. To perform a minimum distance classification, a program must calculate the distance from each unknown pixel (BVijk) to each mean vector. It is possible to calculate this distance using Euclidean distance. This decision rule calculates the spectral distance between the measurement vector of the candidate pixel and the mean vector of each signature, and assigns the pixel to the closest class.

Maximum Likelihood Decision Rule (Bayesian Classifier)
The maximum likelihood decision rule assumes that the histograms of the bands of data have normal distributions. It is based on the probability that a pixel belongs to a particular class. 
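The parallelepiped and minimum-distance decision rules described above can be sketched in a few lines. The class means and standard deviations below are hypothetical, picked so that the two example pixels behave like pixels a and b in the text.

```python
import numpy as np

def parallelepiped(pixel, means, stds):
    """Return the indices of classes whose +/-1 sigma box contains the pixel."""
    low, high = means - stds, means + stds            # Lck and Hck per class/band
    inside = np.all((pixel >= low) & (pixel <= high), axis=1)
    return np.nonzero(inside)[0]

def minimum_distance(pixel, means):
    """Assign the pixel to the class with the closest mean vector (Euclidean)."""
    dists = np.linalg.norm(means - pixel, axis=1)
    return int(dists.argmin())

# Hypothetical class statistics in bands 4 and 5 (one row per class).
means = np.array([[10.0, 40.0],   # class 0
                  [40.0, 40.0]])  # class 1
stds = np.array([[5.0, 5.0],
                 [5.0, 5.0]])

pixel_a = np.array([40.0, 40.0])  # analogous to pixel a in the text
pixel_b = np.array([10.0, 40.0])  # analogous to pixel b in the text
```

Note that a pixel can fall inside zero or several parallelepipeds (the rule is not exhaustive or exclusive), whereas minimum distance always returns exactly one class.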
The basic equation assumes that these probabilities are equal for all classes. If the user has a priori knowledge that the probabilities are not equal for all classes, weighting factors can be employed to improve the classification.

The aforementioned classifiers were based primarily on identifying decision boundaries in feature space from training-class multispectral distance measurements. The maximum likelihood decision rule, in contrast, is based on probability. It assigns each pixel having pattern measurements or features X to the class i whose units are most probable, or likely, to have given rise to feature vector X. In other words, the probability of a pixel belonging to each of a predefined set of m classes is calculated, and the pixel is then assigned to the class for which the probability is highest. The maximum likelihood decision rule is one of the most widely used supervised classification algorithms.

The maximum likelihood procedure assumes that the training data statistics for each class in each band are normally distributed (Gaussian). Training data with bi- or multimodal histograms in a single band are not ideal; in such cases the individual modes probably represent unique classes that should be trained on individually and labeled as separate training classes. This should then produce unimodal, Gaussian training class statistics that fulfill the normal distribution requirement.

For example, consider the hypothetical histogram (data frequency distribution) of forest training data obtained in band k. We could choose to store the values contained in this histogram in the computer, but a more elegant solution is to approximate the distribution by a normal probability density function (curve), as shown superimposed on the histogram. 
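This "store only the mean and variance" idea can be illustrated directly: estimate the two parameters from the training histogram and evaluate the normal density at any brightness value. The forest brightness values below are hypothetical.

```python
import math

# Hypothetical forest training brightness values in band k.
forest_bv = [38, 40, 41, 39, 42, 40, 39, 41]

# Estimate the mean and variance of the training class.
n = len(forest_bv)
mu = sum(forest_bv) / n                       # estimated class mean
var = sum((x - mu) ** 2 for x in forest_bv) / n  # estimated class variance

def p_forest(x):
    """Estimated normal probability density p(x | forest)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
```

Only `mu` and `var` need to be stored; the full histogram can be discarded, and `p_forest` reproduces an (approximate) density for any brightness value, peaking at the class mean.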
Maximum Likelihood Decision Rule (Bayesian Classifier)
The estimated probability density function for class wi (e.g., forest) is computed using the equation:
  p(x | wi) = (1 / sqrt(2π σ̂i²)) · exp[ −(x − μ̂i)² / (2 σ̂i²) ]
where exp[ ] is e (the base of the natural logarithms) raised to the computed power, x is one of the brightness values on the x-axis, μ̂i is the estimated mean of all the values in the forest training class, and σ̂i² is the estimated variance of all the measurements in this class. Therefore, we need to store only the mean and variance of each training class (e.g., forest) to compute the probability density associated with any individual brightness value.

But what if our training data consist of multiple bands of remote sensor data for the classes of interest? In that case we compute an n-dimensional multivariate normal density function:
  p(X | wi) = (2π)^(−n/2) |Vi|^(−1/2) exp[ −½ (X − Mi)^T Vi⁻¹ (X − Mi) ]
where |Vi| is the determinant of the covariance matrix, Vi⁻¹ is its inverse, and (X − Mi)^T is the transpose of the vector (X − Mi). The mean vectors Mi and covariance matrices Vi for each class are estimated from the training data.

Maximum Likelihood Classification Without Prior Probability Information
Decide that the unknown measurement vector X is in class i if, and only if, pi > pj for all j ≠ i among the m possible classes, where
  pi = −½ ln|Vi| − ½ (X − Mi)^T Vi⁻¹ (X − Mi)
and Mi is the mean measurement vector for class i and Vi is the covariance matrix of class i. Therefore, to assign the measurement vector X of an unknown pixel to a class, the maximum likelihood decision rule computes the value pi for each class and assigns the pixel to the class that has the largest (maximum) value.

What happens when the probability density functions of two or more training classes overlap in feature space? For example, consider two hypothetical normally distributed probability density functions associated with forest and agriculture training data measured in bands 1 and 2. 
In this case, pixel X would be assigned to forest because the probability density of the unknown measurement vector X is greater for forest than for agriculture.

Multispectral Classification Methods
Multispectral classification may be performed using a variety of methods, including:
• algorithms based on parametric and nonparametric statistics that use ratio- and interval-scaled data, and nonmetric methods that can incorporate nominal-scale data;
• supervised or unsupervised classification logic;
• hard or soft (fuzzy) set classification logic, to create hard or fuzzy thematic output products;
• per-pixel or object-oriented classification logic, and hybrid approaches.

Parametric methods, such as maximum likelihood classification and unsupervised clustering, assume normally distributed remote sensor data and knowledge about the forms of the underlying class density functions. Nonparametric methods, such as the minimum distance classifier and fuzzy classifiers, may be applied to remote sensor data that are not normally distributed and without the assumption that the forms of the underlying densities are known. Nonmetric methods, such as rule-based decision tree classifiers, can operate on both real-valued data (e.g., reflectance values from 0 to 100%) and nominal-scale data (e.g., class 1 = forest; class 2 = agriculture).

Hard vs. Fuzzy Classification
Supervised and unsupervised classification algorithms typically use hard classification logic to produce a classification map consisting of hard, discrete categories (e.g., forest, agriculture). Most digital image classification has been based on processing the entire scene pixel by pixel, commonly referred to as per-pixel classification. It is also possible to use fuzzy set classification logic, which takes into account the heterogeneous and imprecise nature of the real world.

Object-oriented classification techniques allow the analyst to decompose the scene into many relatively homogeneous image objects (referred to as patches or segments) using a multiresolution image segmentation process. The various statistical characteristics of these homogeneous image objects are then subjected to traditional statistical or fuzzy logic classification. Object-oriented classification based on image segmentation is often used for the analysis of high-spatial-resolution imagery.
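The maximum likelihood decision rule described earlier (without prior probabilities) can be sketched as a short function: for each class it evaluates the discriminant pi = −½ ln|Vi| − ½ (X − Mi)ᵀ Vi⁻¹ (X − Mi) and picks the class with the largest value. The two-class training statistics below are hypothetical.

```python
import numpy as np

def max_likelihood_classify(X, class_means, class_covs):
    """Assign X to the class i with the largest discriminant
    pi = -0.5*ln|Vi| - 0.5*(X - Mi)^T Vi^-1 (X - Mi)   (equal priors assumed)."""
    scores = []
    for Mi, Vi in zip(class_means, class_covs):
        d = X - Mi
        pi = -0.5 * np.log(np.linalg.det(Vi)) - 0.5 * d @ np.linalg.inv(Vi) @ d
        scores.append(pi)
    return int(np.argmax(scores))

# Hypothetical training statistics for two classes in two bands.
M_forest = np.array([40.0, 60.0])
M_agric  = np.array([70.0, 30.0])
V_forest = np.array([[4.0, 1.0], [1.0, 4.0]])
V_agric  = np.array([[9.0, 2.0], [2.0, 9.0]])

# A pixel near the forest cluster is labeled class 0 (forest).
label = max_likelihood_classify(np.array([45.0, 55.0]),
                                [M_forest, M_agric],
                                [V_forest, V_agric])
```

Unequal prior probabilities, if known, could be incorporated by adding ln(p(wi)) to each discriminant, which is the weighting mentioned in the text.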