Extraction of thematic information through image classifications
Data vs. Information
Data: a collection of numbers or facts that require further processing before they
are meaningful.
Information: knowledge derived from raw data; something that is independently
meaningful.
Information Classes vs. Spectral Classes
Information classes are the categories of interest to the users of
the data, e.g., forest types, wetland categories, agricultural
fields, urban land use, … The classes form the information
that can be derived from remote sensing data. These classes
are not directly recorded on remote sensing images. We can
derive them only indirectly, using the evidence contained in
pixel values recorded by each spectral band of remote sensing
images.
Spectral classes are groups of pixels that are uniform with
respect to the brightness in their spectral channels.
Establishing the links between spectral classes and information classes
is the primary goal of image classification.
Land-use and Land-cover Classification Schemes
Land cover refers to the type of material present on the
landscape (e.g., water, sand, crops, forest, wetland, human-made
materials such as asphalt).
Land use refers to what people do on the land surface (e.g.,
agriculture, commerce, settlement).
Successfully obtaining land-use and/or land-cover information
requires a classification scheme containing correct definitions or
descriptions of classes organized according to logical criteria,
i.e., a classification system.
Land-use and Land-cover Classification Systems
If a hard classification is to be performed, then the classes in
the classification system should be:
• mutually exclusive,
• exhaustive, and
• hierarchical.
Each pixel location corresponds to a certain type of land cover.
Land-use and Land-cover Classification Systems
Mutually exclusive means that there is no taxonomic overlap (or
fuzziness) of any classes (i.e., deciduous forest and evergreen
forest are distinct classes).
Exhaustive means that all land-cover classes present in the
landscape are accounted for and none have been omitted.
Hierarchical means that sublevel classes (e.g., single-family
residential, multiple-family residential) may be hierarchically
combined into a higher-level category (e.g., residential) that
makes sense. This allows simplified thematic maps to be
produced when required.
(Figure: the four levels of the U.S. Geological Survey Land-Use/Land-Cover
Classification System for use with remote sensor data, and the type of
remotely sensed data typically used to provide the information.)
International Geosphere-Biosphere Program (IGBP) Land-Cover Classification System
If a scientist is interested in inventorying land cover at the
regional, national, and global scale, then the modified
International Geosphere-Biosphere Program Land-Cover
Classification System may be appropriate.
The land-cover parameter identifies 17 categories of land cover
following the IGBP global vegetation database, which defines
nine classes of natural vegetation, three classes of developed
lands, two classes of mosaic lands, and three classes of
nonvegetated lands (snow/ice, bare soil/rocks, water).
(Figure: MODIS Land Cover Data Product.)
The Land Cover Classification System (LCCS) is a comprehensive, standardized
classification system, designed to meet specific user requirements, and created
for mapping exercises, independent of the scale or means used to map.
Any land cover identified anywhere in the world can be readily accommodated.
(Figure: Landsat working data (2000), 30-m resolution.)
(Figure: Land Cover Classification System, adapted from the U.N. Land Cover
Classification System.)
Observations about Classification Schemes
Geographical information is often imprecise. For example, there
is usually a gradual transition at the interface of forests and
rangeland, yet many of the aforementioned classification
schemes insist on a hard boundary between the classes at this
transition zone. The schemes should contain fuzzy definitions
because the thematic information they contain is fuzzy.
Fuzzy classification schemes are not currently standardized. They
are typically developed by individual researchers for site-specific
projects. The fuzzy classification systems may not be transferable
to other environments.
Therefore, we tend to see the use of existing hard classification
schemes. They continue to be widely employed because they are
scientifically based and different individuals using the same
classification system can compare results.
Multispectral Classification
is the process of sorting pixels into a finite number of
individual classes, or categories of data, based on their data
file values. If a pixel satisfies a certain set of criteria, the
pixel is assigned to the class that corresponds to those criteria.
Unsupervised Classification
In an unsupervised classification, the identities of land-cover
types to be specified as classes within a scene are not
generally known a priori because ground reference
information is lacking or surface features within the scene are
not well defined.
The computer is instructed to group pixels with similar
spectral characteristics into unique clusters according to some
statistically determined criteria.
The analyst then re-labels and combines the spectral clusters
into information classes.
Unsupervised Classification
Unsupervised classification is a process of numerical
operations that search for natural groupings of the spectral
properties of pixels, as examined in multispectral feature
space. It is also called clustering.
It requires only a minimal amount of initial input from the
analyst. However, the analyst will have the task of
interpreting the classes that are created by the unsupervised
training algorithm.
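As a minimal sketch of this workflow (assuming the image is a NumPy array of shape (rows, cols, bands), and using scikit-learn's KMeans as a stand-in for the clustering step; names and parameter values are illustrative, not a prescribed method):

```python
import numpy as np
from sklearn.cluster import KMeans

def unsupervised_classify(image, n_clusters=10, random_state=0):
    """Cluster pixels by spectral similarity; returns a cluster-label map.

    image: array of shape (rows, cols, bands).
    The analyst must still relabel the resulting spectral clusters
    as information classes after inspecting them.
    """
    rows, cols, bands = image.shape
    pixels = image.reshape(-1, bands).astype(float)   # one row per pixel
    km = KMeans(n_clusters=n_clusters, random_state=random_state).fit(pixels)
    return km.labels_.reshape(rows, cols)             # m spectral classes
```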
Unsupervised Classification
The clustering process results in a classification map
consisting of m spectral classes. The analyst then attempts a
posteriori (after the fact) to assign or transform the spectral
classes into thematic information classes of interest (e.g.,
forest, agriculture).
This may be difficult. Some spectral clusters may be
meaningless because they represent mixed classes of Earth
surface materials. The analyst must understand the spectral
characteristics of the landscape well enough to be able to label
certain clusters as specific information classes.
ISODATA (Iterative Self-Organizing Data Analysis Technique)
Makes a large number of passes through the remote sensing
dataset until specified results are obtained. It requires the
analyst to specify:
1. the maximum number of clusters to be identified by the
algorithm.
2. the maximum percentage of pixels whose class values are
allowed to be unchanged between iterations. When this
number is reached (e.g., 95%), the algorithm terminates.
ISODATA (Iterative Self-Organizing Data Analysis Technique)
Phase 1: ISODATA Cluster Building using many passes through
the dataset.
Phase 2: Assignment of pixels to one of the clusters using
minimum distance to means classification logic.
a) ISODATA initial distribution of five hypothetical mean vectors, using ±1σ
standard deviations in both bands as beginning and ending points.
b) In the first iteration, each candidate pixel is compared to each cluster
mean and assigned to the cluster whose mean is closest in Euclidean distance.
c) During the second iteration, a new mean is calculated for each cluster
based on the actual spectral locations of the pixels assigned to each cluster,
instead of the initial arbitrary calculation. This involves analysis of
several parameters to merge or split clusters. After the new cluster mean
vectors are selected, every pixel in the scene is assigned to one of the new
clusters.
d) This split–merge–assign process continues until there is little change in
class assignment between iterations (the threshold is reached) or the maximum
number of iterations is reached.
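A simplified sketch of that iterate-until-stable loop in Python (a hedged illustration, not a full ISODATA implementation: the merge test is reduced to dropping empty clusters, the split test to a single standard-deviation check, and all parameter names are illustrative):

```python
import numpy as np

def isodata_sketch(pixels, k_init=5, max_clusters=20, max_iter=20,
                   unchanged_frac=0.95, split_std=15.0):
    """pixels: (n_pixels, n_bands) float array. Returns cluster labels."""
    # Seed the initial means along the feature-space diagonal
    # (cf. the +/- sigma seeding logic described above).
    means = np.linspace(pixels.min(axis=0), pixels.max(axis=0), k_init)
    labels = np.full(len(pixels), -1)

    for _ in range(max_iter):
        # Assignment phase: nearest mean in Euclidean distance.
        dists = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)

        # Terminate once enough pixels keep their class between iterations.
        if np.mean(new_labels == labels) >= unchanged_frac:
            return new_labels
        labels = new_labels

        # Update phase: recompute each mean from its members; split clusters
        # whose spread is too large, drop clusters that lost all members.
        new_means = []
        for i in range(len(means)):
            members = pixels[labels == i]
            if len(members) == 0:
                continue
            m, s = members.mean(axis=0), members.std(axis=0)
            if s.max() > split_std and len(new_means) < max_clusters - 1:
                shift = np.where(s == s.max(), s.max(), 0.0)
                new_means += [m - shift, m + shift]  # split along widest band
            else:
                new_means.append(m)
        means = np.array(new_means)

    return labels
```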
Example: classifying an image into 20 clusters.
a) Distribution of 20 ISODATA mean vectors after just one iteration using
Landsat TM band 3 and 4 data. Notice that the initial mean vectors are
distributed along a diagonal in two-dimensional feature space according to
the ±2σ standard deviation logic discussed.
b) Distribution of 20 ISODATA mean vectors after 20 iterations. The bulk of
the important feature space (the gray background) is partitioned rather well
after just 20 iterations.
Supervised Classification
In a supervised classification, the identity and location of the
land-cover types (e.g., urban, agriculture, or wetland) are
known a priori through a combination of fieldwork,
interpretation of aerial photography, map analysis, and
personal experience.
Supervised Classification
The analyst attempts to locate specific sites in the remotely
sensed data that represent homogeneous examples of known
land-cover types. These areas are commonly referred to as
training sites because the spectral characteristics of these
known areas are used to train the classification algorithm for
eventual land-cover mapping of the remainder of the image.
(Figure: spatial resolution (pixel size) and spectral resolution (bands).
Differences between vegetation classes are often more distinct in the
near-infrared than in visible spectral bands.)
(Figure: a measurement vector, Xc. At one pixel location, the brightness
values recorded in spectral bands 1 through 4 (here 23, 42, 89, and 94) are
stacked into the vector Xc = [23, 42, 89, 94]^T.)
Pixel-based classifiers operate on each pixel as a single set
of spectral values considered in isolation from its neighbors.
In this example landscape, two regions within the image differ
with respect to average brightness but also with respect to "texture" –
the uniformity of pixels within a neighborhood (summarized by the local
mean and standard deviation).
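As a hedged illustration of that idea, the local mean and standard deviation ("texture") of a single band can be computed with a moving window; this sketch uses SciPy's uniform filter, and the window size is an arbitrary choice:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_mean_std(band, size=3):
    """Moving-window mean and standard deviation for one band."""
    band = band.astype(float)
    mean = uniform_filter(band, size)
    sq_mean = uniform_filter(band ** 2, size)
    # Var = E[x^2] - E[x]^2; clip tiny negatives from floating-point error.
    std = np.sqrt(np.clip(sq_mean - mean ** 2, 0.0, None))
    return mean, std
```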
Training
is the process of defining the criteria by which patterns in
image data are recognized for the purpose of classification.
Training Sample (Signature)
A set of pixels selected to represent a potential class.
Supervised Training
Any method of generating signatures for classification in
which the analyst is directly involved in the pattern recognition
process.
Number of Pixels for a Training Signature
The general rule is that if training data are being extracted from
n bands, then >10n pixels of training data are collected for each
class (e.g., more than 60 pixels per class for six-band data). This is
sufficient to compute the variance–covariance matrices required by some
classification algorithms.
Training Site Selection and Statistics Extraction
The analyst may view the image on screen and select polygonal
areas of interest (AOI) (e.g., a stand of oak forest). Most image
processing systems use a “rubber band” tool that allows the
analyst to identify detailed AOIs.
The analyst may seed a specific location in the image space
using the cursor. The seed program begins at a single x, y
location and evaluates neighboring pixel values in all bands of
interest. Using criteria specified by the analyst, the seed
algorithm expands outward like an amoeba as long as it finds
pixels with spectral characteristics similar to the original seed
pixel.
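A minimal sketch of such a seed program (assumptions: a (rows, cols, bands) NumPy array, 4-connected neighbors, and Euclidean spectral distance to the original seed pixel as the analyst-specified criterion; operational tools use richer criteria):

```python
import numpy as np
from collections import deque

def grow_seed(image, seed_rc, threshold=10.0):
    """Grow a training region outward from one (row, col) seed pixel."""
    rows, cols, _ = image.shape
    seed_val = image[seed_rc].astype(float)
    region = np.zeros((rows, cols), dtype=bool)
    queue = deque([seed_rc])
    region[seed_rc] = True

    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not region[nr, nc]:
                # Accept neighbors spectrally similar to the original seed.
                if np.linalg.norm(image[nr, nc].astype(float) - seed_val) <= threshold:
                    region[nr, nc] = True
                    queue.append((nr, nc))
    return region   # boolean mask of the AOI
```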
Training Site Selection and Statistics Extraction
Each pixel in each training site associated with a particular class (c)
is represented by a measurement vector, Xc:

$$X_c = \begin{bmatrix} BV_{i,j,1} \\ BV_{i,j,2} \\ \vdots \\ BV_{i,j,n} \end{bmatrix}$$

where BV_{i,j,k} is the brightness value for the i,jth pixel in band k.
Training Site Selection and Statistics Extraction
The brightness values for each pixel in each band in each training
class can then be analyzed statistically to yield a mean measurement
vector, Mc, for each class:

$$M_c = \begin{bmatrix} \mu_{c1} \\ \mu_{c2} \\ \vdots \\ \mu_{cn} \end{bmatrix}$$

where μ_ck represents the mean value of the data obtained for class c
in band k.
Training Site Selection and Statistics Extraction
The raw measurement vectors can also be analyzed to yield the
covariance matrix for each class c:

$$V_c = V_{ckl} = \begin{bmatrix} \mathrm{Cov}_{c11} & \mathrm{Cov}_{c12} & \cdots & \mathrm{Cov}_{c1n} \\ \mathrm{Cov}_{c21} & \mathrm{Cov}_{c22} & \cdots & \mathrm{Cov}_{c2n} \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}_{cn1} & \mathrm{Cov}_{cn2} & \cdots & \mathrm{Cov}_{cnn} \end{bmatrix}$$

where Cov_ckl is the covariance of class c between bands k and l.
For brevity, the notation for the covariance matrix for class c (i.e.,
V_ckl) will be shortened to just Vc. The same will be true for the
covariance matrix of class d (i.e., V_dkl = Vd).
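In NumPy terms, Mc and Vc for one class might be computed as follows (a sketch; training_pixels is assumed to be an (n_pixels, n_bands) array gathered from that class's training sites):

```python
import numpy as np

def class_statistics(training_pixels):
    """Mean vector M_c and covariance matrix V_c for one training class."""
    X = np.asarray(training_pixels, dtype=float)   # shape (n_pixels, n_bands)
    M_c = X.mean(axis=0)                           # mean per band
    V_c = np.cov(X, rowvar=False)                  # n_bands x n_bands covariance
    return M_c, V_c
```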
Supervised Classification
Multivariate statistical parameters (means, standard deviations,
covariance matrices, correlation matrices, etc.) are calculated
for each training site.
Every pixel both within and outside the training sites is then
evaluated and assigned to the class of which it has the highest
likelihood of being a member.
Select the Appropriate Classification Algorithm
Parametric classification algorithms assume that the observed
measurement vectors Xc obtained for each class in each spectral
band during the training phase of the supervised classification
are Gaussian; that is, they are normally distributed.
Nonparametric classification algorithms make no such
assumption.
Several widely adopted nonparametric classification algorithms:
• one-dimensional density slicing,
• parallelepiped,
• minimum distance,
• nearest-neighbor, and
• neural network and expert system analysis.
The most widely adopted parametric classification algorithm:
• maximum likelihood.
Parallelepiped Classification Algorithm
This is a widely used digital image classification decision rule
based on simple Boolean “and/or” logic.
Training data in n spectral bands are used to perform the
classification.
Brightness values from each pixel of the multispectral imagery
are used to produce an n-dimensional mean vector, Mc = (μ_c1,
μ_c2, μ_c3, …, μ_cn), with μ_ck being the mean value of the training
data obtained for class c in band k out of m possible classes, as
previously defined. s_ck is the standard deviation of the training
data of class c in band k out of m possible classes.
Parallelepiped Classification Algorithm
Using a one-standard-deviation threshold, a parallelepiped algorithm
decides BV_ijk is in class c if, and only if:

$$\mu_{ck} - s_{ck} \le BV_{ijk} \le \mu_{ck} + s_{ck}$$

where
c = 1, 2, 3, …, m, number of classes, and
k = 1, 2, 3, …, n, number of bands.

Therefore, if the low and high decision boundaries are defined as

$$L_{ck} = \mu_{ck} - s_{ck} \quad \text{and} \quad H_{ck} = \mu_{ck} + s_{ck},$$

the parallelepiped algorithm becomes

$$L_{ck} \le BV_{ijk} \le H_{ck}.$$
Points a and b are pixels in the image to be classified. Pixel a has a
brightness value of 40 in band 4 and 40 in band 5. Pixel b has a
brightness value of 10 in band 4 and 40 in band 5.

The boxes represent the parallelepiped decision rule associated with a
±1σ classification. The vectors (arrows) represent the distance from a
and b to the mean of all classes in a minimum distance to means
classification algorithm.
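A sketch of the parallelepiped rule in NumPy (assuming per-class means and standard deviations of shape (n_classes, n_bands); resolving box overlap by taking the first matching class, and labeling unclassified pixels -1, are arbitrary choices for illustration):

```python
import numpy as np

def parallelepiped_classify(image, means, stds, n_sigma=1.0):
    """image: (rows, cols, bands); means, stds: (n_classes, bands)."""
    low = means - n_sigma * stds             # L_ck
    high = means + n_sigma * stds            # H_ck
    pixels = image.reshape(-1, image.shape[-1]).astype(float)

    # inside[p, c] is True when pixel p falls within every band range of class c.
    inside = np.all((pixels[:, None, :] >= low[None]) &
                    (pixels[:, None, :] <= high[None]), axis=2)

    labels = np.where(inside.any(axis=1), inside.argmax(axis=1), -1)
    return labels.reshape(image.shape[:2])
```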
Minimum Distance Rule (Spectral Distance)
The minimum distance to means
decision rule is computationally
simple. It requires that the user
provide the mean vectors for
each class in each band µck from
the training data. To perform a
minimum distance classification,
a program must calculate the
distance to each mean vector µck
from each unknown pixel (BVijk).
It is possible to calculate this
distance using Euclidean
distance.
This decision rule calculates the spectral distance between the
measurement vector of the candidate pixel and the mean vector for
each signature.
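A minimal minimum-distance-to-means sketch (Euclidean distance; means is assumed to be an (n_classes, n_bands) array of training means, and the names are illustrative):

```python
import numpy as np

def minimum_distance_classify(image, means):
    """Assign each pixel to the class with the nearest mean vector."""
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    # dists[p, c] = Euclidean distance from pixel p to class mean c.
    dists = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
    return dists.argmin(axis=1).reshape(image.shape[:2])
```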
Maximum Likelihood Decision Rule (Bayesian Classifier)
The maximum likelihood decision rule assumes that the histograms
of the bands of data have normal distributions. It is based on the
probability that a pixel belongs to a particular class. The basic
equation assumes that these probabilities are equal for all classes.
If the user has a priori knowledge that the probabilities are not equal
for all classes, weight factors can be employed to improve the
classification.
Maximum Likelihood Decision Rule (Bayesian Classifier)
The aforementioned classifiers were based primarily on identifying
decision boundaries in feature space based on training class
multispectral distance measurements. The maximum likelihood
decision rule is based on probability.
It assigns each pixel having pattern measurements or features X to
the class i whose units are most probable or likely to have given
rise to feature vector X.
In other words, the probability of a pixel belonging to each of a
predefined set of m classes is calculated, and the pixel is then
assigned to the class for which the probability is the highest.
The maximum likelihood decision rule is one of the most widely
used supervised classification algorithms.
Maximum Likelihood Decision Rule
The maximum likelihood procedure assumes that the training data
statistics for each class in each band are normally distributed
(Gaussian).
Training data with bi- or n-modal histograms in a single band are
not ideal. In such cases the individual modes probably represent
unique classes that should be trained upon individually and labeled
as separate training classes. This should then produce unimodal,
Gaussian training class statistics that fulfill the normal
distribution requirement.
For example, consider the hypothetical histogram (data frequency
distribution) of forest training data obtained in band k.
We could choose to store the values contained in this histogram
in the computer, but a more elegant solution is to approximate the
distribution by a normal probability density function (curve), as
shown superimposed on the histogram.
Maximum Likelihood Decision Rule (Bayesian Classifier)
The estimated probability density function for class w_i (e.g., forest)
is computed using the equation:

$$\hat{p}(x \mid w_i) = \frac{1}{(2\pi\hat{\sigma}_i^2)^{1/2}} \exp\left[-\frac{(x - \hat{\mu}_i)^2}{2\hat{\sigma}_i^2}\right]$$

where exp[ ] is e (the base of the natural logarithms) raised to the
computed power, x is one of the brightness values on the x-axis,
μ̂_i is the estimated mean of all the values in the forest training class,
and σ̂_i² is the estimated variance of all the measurements in this
class.

Therefore, we need to store only the mean and variance of each
training class (e.g., forest) to compute the probability function
associated with any of the individual brightness values in it.
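That economy is easy to demonstrate; a small sketch that evaluates the estimated density for any brightness value from just a stored mean and variance (the numbers in the usage line are made up for illustration):

```python
import numpy as np

def class_density(x, mean, var):
    """Estimated p(x | class) from the stored training mean and variance."""
    return np.exp(-((x - mean) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# e.g., hypothetical forest training data with mean 60 and variance 25 in band k:
p = class_density(58, mean=60.0, var=25.0)
```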
Maximum Likelihood Decision Rule (Bayesian Classifier)
But what if our training data consist of multiple bands of remote
sensor data for the classes of interest? In this case we compute an
n-dimensional multivariate normal density function:

$$\hat{p}(X \mid w_i) = \frac{1}{(2\pi)^{n/2}\,|V_i|^{1/2}} \exp\left[-\frac{1}{2}(X - M_i)^T V_i^{-1}(X - M_i)\right]$$

where |V_i| is the determinant of the covariance matrix, V_i^{-1} is its
inverse, and (X − M_i)^T is the transpose of the vector (X − M_i). The
mean vectors (M_i) and covariance matrices (V_i) for each class are
estimated from the training data.
Maximum Likelihood Classification Without Prior Probability
Information:
Decide unknown measurement vector X is in class i if, and only if,

$$p_i \ge p_j \quad \text{for all } i \text{ and } j \text{ out of } 1, 2, \ldots, m \text{ possible classes},$$

where

$$p_i = -\frac{1}{2}\ln|V_i| - \frac{1}{2}(X - M_i)^T V_i^{-1}(X - M_i),$$

M_i is the mean measurement vector for class i, and V_i is the
covariance matrix of class i for bands k through l.

Therefore, to assign the measurement vector X of an unknown
pixel to a class, the maximum likelihood decision rule computes
the value p_i for each class. Then it assigns the pixel to the class that
has the largest (or maximum) value.
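A sketch of this decision rule (assuming per-class means and covariance matrices estimated from training data as above; slogdet is used instead of det for numerical robustness, a common implementation choice rather than part of the formula):

```python
import numpy as np

def max_likelihood_classify(image, means, covs):
    """means: list of (bands,) vectors; covs: list of (bands, bands) matrices."""
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    scores = []
    for M_i, V_i in zip(means, covs):
        V_inv = np.linalg.inv(V_i)
        _, log_det = np.linalg.slogdet(V_i)
        diff = pixels - M_i
        # p_i = -1/2 ln|V_i| - 1/2 (X - M_i)^T V_i^{-1} (X - M_i)
        mahal = np.einsum('pb,bc,pc->p', diff, V_inv, diff)
        scores.append(-0.5 * log_det - 0.5 * mahal)
    # Assign each pixel to the class with the largest p_i.
    return np.argmax(np.stack(scores, axis=1), axis=1).reshape(image.shape[:2])
```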
What happens when the
probability density functions
of two or more training
classes overlap in feature
space?
For example, consider two
hypothetical normally
distributed probability
density functions associated
with forest and agriculture
training data measured in
bands 1 and 2. In this case,
pixel X would be assigned to
forest because the
probability density of
unknown measurement
vector X is greater for forest
than for agriculture.
Multispectral Classification
Multispectral classification may be performed using a variety of
methods, including:
• algorithms based on parametric and nonparametric statistics
that use ratio- and interval-scaled data and nonmetric
methods that can incorporate nominal scale data;
• the use of supervised or unsupervised classification logic;
• the use of hard or soft (fuzzy) set classification logic to create
hard or fuzzy thematic output products;
• the use of per-pixel or object-oriented classification logic, and
hybrid approaches.
Multispectral Classification
Parametric methods such as maximum likelihood
classification and unsupervised clustering assume normally
distributed remote sensor data and knowledge about the forms
of the underlying class density functions.
Nonparametric methods such as minimum distance classifier
and fuzzy classifiers may be applied to remote sensor data
that are not normally distributed and without the assumption
that the forms of the underlying densities are known.
Nonmetric methods such as rule-based decision tree classifiers
can operate on both real-valued data (e.g., reflectance values
from 0 to 100%) and nominal scaled data (e.g., class 1 = forest;
class 2 = agriculture).
Hard vs. Fuzzy Classification
Supervised and unsupervised classification algorithms
typically use hard classification logic to produce a
classification map that consists of hard, discrete categories
(e.g., forest, agriculture).
Most digital image classification has been based on processing the
entire scene pixel by pixel. This is commonly referred to as
per-pixel classification.
It is also possible to use fuzzy set classification logic, which
takes into account the heterogeneous and imprecise nature of
the real world.
Hard vs. Fuzzy Classification
Object-oriented classification techniques allow the analyst to
decompose the scene into many relatively homogeneous image
objects (referred to as patches or segments) using a multi-resolution
image segmentation process.
The various statistical characteristics of these homogeneous
image objects in the scene are then subjected to traditional
statistical or fuzzy logic classification.
Object-oriented classification based on image segmentation is
often used for the analysis of high-spatial-resolution imagery.
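A hedged sketch of that two-step pattern, with scikit-image's SLIC superpixels standing in for the multi-resolution segmenter (a different segmentation algorithm, so treat this purely as an illustration of segment-then-classify):

```python
import numpy as np
from skimage.segmentation import slic

def object_features(image, n_segments=500):
    """Segment the scene into image objects, then summarize each object."""
    segments = slic(image, n_segments=n_segments, compactness=10.0)
    features = []
    for seg_id in np.unique(segments):
        obj = image[segments == seg_id]          # pixels of one image object
        # Per-band mean and std of the object; these statistics are what a
        # traditional statistical or fuzzy classifier is then applied to.
        features.append(np.concatenate([obj.mean(axis=0), obj.std(axis=0)]))
    return segments, np.array(features)
```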