Initial Statistical Extraction and Image Quality Assessment

Dr. John R. Jensen
Department of Geography
University of South Carolina
Columbia, SC 29208

Jensen, 2003

Image Processing System Considerations

The analyst responsible for analyzing the digital remote sensor data must first assess its quality. This is normally performed by:

1. computing fundamental image statistics and evaluating them to see if there are any unusual anomalies in the image data that might be of concern, and
2. performing a subjective evaluation of the appearance of the remote sensor data.

Image Processing Mathematical Notation

The following notation will be used to describe the mathematical operations applied to the digital remote sensor data:

$i$ = a row (or line) in the imagery
$j$ = a column (or sample) in the imagery
$k$ = a band of imagery
$l$ = another band of imagery
$n$ = total number of picture elements (pixels) in an array
$BV_{ijk}$ = brightness value in row $i$, column $j$, of band $k$
$BV_{ik}$ = $i$th brightness value in band $k$
$BV_{il}$ = $i$th brightness value in band $l$
$min_k$ = minimum value of band $k$
$max_k$ = maximum value of band $k$
$range_k$ = range of actual brightness values in band $k$
$quant_k$ = quantization level of band $k$ (e.g., $2^8$ = 0 to 255; $2^{12}$ = 0 to 4095)
$\mu_k$ = mean of band $k$
$var_k$ = variance of band $k$
$s_k$ = standard deviation of band $k$
$skewness_k$ = skewness of a band $k$ distribution
$kurtosis_k$ = kurtosis of a band $k$ distribution
$cov_{kl}$ = covariance between pixel values in two bands, $k$ and $l$
$r_{kl}$ = correlation between pixel values in two bands, $k$ and $l$
$X_c$ = measurement vector for class $c$ composed of brightness values ($BV_{ijk}$) from row $i$, column $j$, and band $k$
$M_c$ = mean vector for class $c$
$M_d$ = mean vector for class $d$
$\mu_{ck}$ = mean value of the data in class $c$, band $k$
$s_{ck}$ = standard deviation of the data in class $c$, band $k$
$v_{ckl}$ = covariance matrix of class $c$ for bands $k$ through $l$; shown as $V_c$
$v_{dkl}$ = covariance matrix of class $d$ for bands $k$ through $l$; shown as $V_d$

Remote Sensing Sampling Theory

A population is an infinite or finite set of elements. An infinite population could be all possible images that might be acquired of the Earth in 2001. All Landsat 7 ETM+ images of Charleston, S.C., in 2001 form a finite population.

A sample is a subset of the elements taken from a population used to make inferences about certain characteristics of the population. For example, we might decide to analyze a June 1, 2001, Landsat image of Charleston. If observations with certain characteristics are systematically excluded from the sample either deliberately or inadvertently (such as selecting images obtained only in the spring of the year), it is a biased sample. Sampling error is the difference between the true value of a population characteristic and the value of that characteristic inferred from a sample.

• Large samples drawn randomly from natural populations usually produce a symmetrical frequency distribution. Most values are clustered around some central value, and the frequency of occurrence declines away from this central point. A graph of the distribution appears bell shaped and is called a normal distribution.
• Many statistical tests used in the analysis of remotely sensed data assume that the brightness values recorded in a scene are normally distributed. Unfortunately, remotely sensed data may not be normally distributed and the analyst must be careful to identify such conditions. In such instances, nonparametric statistical theory may be preferred.

[Figure: Common Symmetric and Skewed Distributions in Remotely Sensed Data]

• The histogram is a useful graphic representation of the information content of a remotely sensed image.
• It is instructive to review how a histogram of a single band of imagery, $k$, composed of $i$ rows and $j$ columns with a brightness value $BV_{ijk}$ at each pixel location is constructed.

[Figure: Histogram of a Single Band of Landsat Thematic Mapper Data of Charleston, SC]

[Figure: Histogram of Thermal Infrared Imagery of a Thermal Plume in the Savannah River]

Remote Sensing Metadata

Metadata is “data or information about data”. Most quality digital image processing systems read, collect, and store metadata about a particular image or sub-image. It is important that the image analyst have access to this metadata. In the most fundamental instance, metadata might include: the file name, date of last modification, level of quantization (e.g., 8-bit), number of rows and columns, number of bands, univariate statistics (minimum, maximum, mean, median, mode, standard deviation), perhaps some multivariate statistics, geo-referencing performed (if any), and pixel size.

Viewing Individual Pixels

Viewing individual pixel brightness values in a remotely sensed image is one of the most useful methods for assessing the quality and information content of the data. Virtually all digital image processing systems allow the analyst to:

1. use a mouse-controlled cursor (cross-hair) to identify a geographic location in the image (at a particular row and column or geographic x,y coordinate) and display its brightness value in n bands,
2. display the individual brightness values of an individual band in a matrix (raster) format.
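As a minimal sketch of the operations described above (querying a single pixel, displaying a band as a raster of brightness values, and tallying a histogram), the following Python uses a made-up 3 x 4 band. The array contents and the function names `pixel_value`, `raster_display`, and `histogram` are illustrative assumptions, not part of any particular image processing system.

```python
from collections import Counter

# Hypothetical single band k: 3 rows (i) by 4 columns (j) of 8-bit brightness values.
band_k = [
    [12, 12, 40, 41],
    [12, 40, 40, 200],
    [13, 41, 200, 200],
]


def pixel_value(band, row, col):
    """Return the brightness value at a 1-based (row, column) location,
    as a cursor query would."""
    return band[row - 1][col - 1]


def raster_display(band):
    """Format the band as a matrix of brightness values, one image row per line."""
    return "\n".join(" ".join(f"{bv:3d}" for bv in row) for row in band)


def histogram(band):
    """Count how many pixels take each brightness value."""
    return Counter(bv for row in band for bv in row)


hist = histogram(band_k)
print(raster_display(band_k))
print("BV at row 2, column 4:", pixel_value(band_k, 2, 4))
for bv in sorted(hist):
    print(bv, hist[bv])
```

Plotting the frequency against the brightness value gives the histogram. Here the tally is simply printed so the sketch needs nothing beyond the standard library.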
[Figure: Cursor and Raster Display of Brightness Values]

[Figure: Individual Pixel Display of Brightness Values]

[Figure: Raster Display of Brightness Values]

[Figure: Three-Dimensional Evaluation of Pixel Brightness Values within a Geographic Area]

Remote Sensing Univariate Statistics

The mean of a single band of imagery composed of $n$ brightness values ($BV_{ik}$) is computed using the formula:

$$\mu_k = \frac{\sum_{i=1}^{n} BV_{ik}}{n}$$

The sample mean, $\mu_k$, is an unbiased estimate of the population mean. For symmetrical distributions, the sample mean tends to be closer to the population mean than any other unbiased estimate (such as the median or mode). Unfortunately, the sample mean is a poor measure of central tendency when the set of observations is skewed or contains an extreme value.

Sample Hypothetical Dataset of Brightness Values

Pixel   Band 1 (green)   Band 2 (red)   Band 3 (near-infrared)   Band 4 (near-infrared)
(1,1)   130              57             180                      205
(1,2)   165              35             215                      255
(1,3)   100              25             135                      195
(1,4)   135              50             200                      220
(1,5)   145              65             205                      235

Univariate Statistics for the Hypothetical Sample Dataset

Statistic                    Band 1 (green)   Band 2 (red)   Band 3 (near-infrared)   Band 4 (near-infrared)
Mean ($\mu_k$)               135              46.40          187                      222
Variance ($var_k$)           562.50           264.80         1007.50                  570
Standard deviation ($s_k$)   23.71            16.27          31.74                    23.87
Minimum ($min_k$)            100              25             135                      195
Maximum ($max_k$)            165              65             215                      255
Range ($range_k$)            65               40             80                       60

Remote Sensing Univariate Statistics - Variance

The variance of a sample is the average squared deviation of all possible observations from the sample mean. The variance of a band of imagery, $var_k$, is computed using the equation:

$$var_k = \frac{\sum_{i=1}^{n} (BV_{ik} - \mu_k)^2}{n}$$

The numerator of the expression is the corrected sum of squares (SS). If the sample mean ($\mu_k$) were actually the population mean, this would be an accurate measurement of the variance.
Unfortunately, there is some underestimation because the sample mean was calculated in a manner that minimized the squared deviations about it. Therefore, the denominator of the variance equation is reduced to $n - 1$, producing a larger, unbiased estimate of the sample variance:

$$var_k = \frac{SS}{n - 1}$$

The standard deviation is the positive square root of the variance. The standard deviation of the pixel brightness values in a band of imagery, $s_k$, is computed as

$$s_k = \sqrt{var_k}$$

Skewness is a measure of the asymmetry of a histogram and is computed using the formula

$$skewness_k = \frac{\sum_{i=1}^{n} \left( \frac{BV_{ik} - \mu_k}{s_k} \right)^3}{n}$$

A histogram may be symmetric but have a peak that is very sharp or one that is subdued when compared with a perfectly normal distribution. A perfectly normal distribution (histogram) has zero kurtosis. The greater the positive kurtosis value, the sharper the peak in the distribution when compared with a normal histogram. Conversely, a negative kurtosis value suggests that the peak in the histogram is less sharp than that of a normal distribution.
Kurtosis is computed using the formula

$$kurtosis_k = \left[ \frac{1}{n} \sum_{i=1}^{n} \left( \frac{BV_{ik} - \mu_k}{s_k} \right)^4 \right] - 3$$

Remote Sensing Multivariate Statistics

The different remote-sensing-derived spectral measurements for each pixel often change together in some predictable fashion. If there is no relationship between the brightness value in one band and that of another for a given pixel, the values are mutually independent; that is, an increase or decrease in one band’s brightness value is not accompanied by a predictable change in another band’s brightness value. Because spectral measurements of individual pixels may not be independent, some measure of their mutual interaction is needed. This measure, called the covariance, is the joint variation of two variables about their common mean.

To calculate covariance, we first compute the corrected sum of products (SP) defined by the equation

$$SP_{kl} = \sum_{i=1}^{n} (BV_{ik} - \mu_k)(BV_{il} - \mu_l)$$

It is computationally more efficient to use the following formula to arrive at the same result:

$$SP_{kl} = \sum_{i=1}^{n} (BV_{ik} \, BV_{il}) - \frac{\sum_{i=1}^{n} BV_{ik} \sum_{i=1}^{n} BV_{il}}{n}$$

The first term of this expression is the uncorrected sum of products; subtracting the second term corrects it for the band means.

Just as simple variance was calculated by dividing the corrected sum of squares (SS) by $(n - 1)$, covariance is calculated by dividing SP by $(n - 1)$.
Therefore, the covariance between brightness values in bands $k$ and $l$, $cov_{kl}$, is equal to

$$cov_{kl} = \frac{SP_{kl}}{n - 1}$$

Format of a Variance-Covariance Matrix

         Band 1 (green)   Band 2 (red)   Band 3 (near-infrared)   Band 4 (near-infrared)
Band 1   $SS_1$           $cov_{1,2}$    $cov_{1,3}$              $cov_{1,4}$
Band 2   $cov_{2,1}$      $SS_2$         $cov_{2,3}$              $cov_{2,4}$
Band 3   $cov_{3,1}$      $cov_{3,2}$    $SS_3$                   $cov_{3,4}$
Band 4   $cov_{4,1}$      $cov_{4,2}$    $cov_{4,3}$              $SS_4$

Computation of Variance-Covariance Between Bands 1 and 2 of the Sample Data

      Band 1   (Band 1 x Band 2)   Band 2
      130      7,410               57
      165      5,775               35
      100      2,500               25
      135      6,750               50
      145      9,425               65
Sum   675      31,860              232

$$SP_{12} = 31{,}860 - \frac{(675)(232)}{5} = 540$$

$$cov_{12} = \frac{540}{4} = 135$$

Variance-Covariance Matrix of the Sample Data

         Band 1 (green)   Band 2 (red)   Band 3 (near-infrared)   Band 4 (near-infrared)
Band 1   562.50           -              -                        -
Band 2   135              264.80         -                        -
Band 3   718.75           275.25         1007.50                  -
Band 4   537.50           64             663.75                   570

To estimate the degree of interrelation between variables in a manner not influenced by measurement units, the correlation coefficient, $r$, is commonly used.
The correlation between two bands of remotely sensed data, $r_{kl}$, is the ratio of their covariance ($cov_{kl}$) to the product of their standard deviations ($s_k s_l$); thus:

$$r_{kl} = \frac{cov_{kl}}{s_k s_l}$$

Correlation Matrix for the Sample Data

         Band 1 (green)   Band 2 (red)   Band 3 (near-infrared)   Band 4 (near-infrared)
Band 1   -                -              -                        -
Band 2   0.35             -              -                        -
Band 3   0.95             0.53           -                        -
Band 4   0.94             0.16           0.87                     -

Univariate and Multivariate Statistics of Landsat TM Data of Charleston, SC

Band   Min   Max   Mean         Standard Deviation
1      51    242   65.163137    10.231356
2      17    115   25.797593    5.956048
3      14    131   23.958016    8.469890
4      5     105   26.550666    15.690054
5      0     193   32.014001    24.296417
6      0     128   15.103553    12.738188
7      102   124   110.734372   4.305065

Covariance Matrix

Band   Band 1       Band 2       Band 3       Band 4       Band 5       Band 6       Band 7
1      104.680654   58.797907    82.602381    69.603136    142.947000   94.488082    24.464596
2      58.797907    35.474507    48.644220    45.539546    90.661412    57.877406    14.812886
3      82.602381    48.644220    71.739034    76.954037    149.566052   91.234270    23.827418
4      69.603136    45.539546    76.954037    246.177785   342.523400   157.655947   46.815767
5      142.947000   90.661412    149.566052   342.523400   590.315858   294.019002   82.994241
6      94.488082    57.877406    91.234270    157.655947   294.019002   162.261439   44.674247
7      24.464596    14.812886    23.827418    46.815767    82.994241    44.674247    18.533586

Correlation Matrix

Band   Band 1     Band 2     Band 3     Band 4     Band 5     Band 6     Band 7
1      1.000000   0.964874   0.953195   0.433582   0.575042   0.724997   0.555425
2      0.964874   1.000000   0.964263   0.487311   0.626501   0.762857   0.577699
3      0.953195   0.964263   1.000000   0.579068   0.726797   0.845615   0.653461
4      0.433582   0.487311   0.579068   1.000000   0.898511   0.788821   0.693087
5      0.575042   0.626501   0.726797   0.898511   1.000000   0.950004   0.793462
6      0.724997   0.762857   0.845615   0.788821   0.950004   1.000000   0.814648
7      0.555425   0.577699   0.653461   0.693087   0.793462   0.814648   1.000000

[Figure: 3-Dimensional View of the Thermal Infrared Matrix of Data]

[Figure: Two-dimensional Feature Space Plot of TM Bands 3 and 4]
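The univariate and multivariate statistics of the hypothetical five-pixel sample dataset can be reproduced with a short script. What follows is a sketch under stated assumptions rather than a reference implementation: it uses only the Python standard library, the function names are illustrative, and the variance and standard deviation use the $n - 1$ denominator described in the slides. Skewness and kurtosis follow the slide formulas with $s_k$ computed from that same $n - 1$ variance, which is one reading of the notation.

```python
import math

# Hypothetical sample dataset from the slides: five pixels in four bands.
bands = {
    1: [130, 165, 100, 135, 145],  # green
    2: [57, 35, 25, 50, 65],       # red
    3: [180, 215, 135, 200, 205],  # near-infrared
    4: [205, 255, 195, 220, 235],  # near-infrared
}


def mean(bv):
    return sum(bv) / len(bv)


def variance(bv):
    """Corrected sum of squares (SS) divided by n - 1."""
    mu = mean(bv)
    ss = sum((x - mu) ** 2 for x in bv)
    return ss / (len(bv) - 1)


def std_dev(bv):
    return math.sqrt(variance(bv))


def skewness(bv):
    mu, s = mean(bv), std_dev(bv)
    return sum(((x - mu) / s) ** 3 for x in bv) / len(bv)


def kurtosis(bv):
    mu, s = mean(bv), std_dev(bv)
    return sum(((x - mu) / s) ** 4 for x in bv) / len(bv) - 3


def sum_of_products(bv_k, bv_l):
    """Computational form: sum(BV_ik * BV_il) - sum(BV_ik) * sum(BV_il) / n."""
    n = len(bv_k)
    return sum(a * b for a, b in zip(bv_k, bv_l)) - sum(bv_k) * sum(bv_l) / n


def covariance(bv_k, bv_l):
    return sum_of_products(bv_k, bv_l) / (len(bv_k) - 1)


def correlation(bv_k, bv_l):
    return covariance(bv_k, bv_l) / (std_dev(bv_k) * std_dev(bv_l))


for k, bv in bands.items():
    print(f"band {k}: mean={mean(bv):.2f} var={variance(bv):.2f} "
          f"s={std_dev(bv):.2f} range={max(bv) - min(bv)}")

print(f"SP_12  = {sum_of_products(bands[1], bands[2]):.2f}")
print(f"cov_12 = {covariance(bands[1], bands[2]):.2f}")
print(f"r_12   = {correlation(bands[1], bands[2]):.4f}")
```

For bands 1 and 2 the script gives $SP_{12} = 540$ and $cov_{12} = 135$, matching the worked computation, and $r_{12}$ comes out at about 0.35, matching the sample correlation matrix.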