DATA MINING RELATIONSHIPS AMONG URBAN SOCIOECONOMIC, LAND COVER, AND REMOTELY SENSED ECOLOGICAL DATA

Jeremy Mennis*, Carol Wessman, and Nancy Golubiewski**
*Department of Geography and **Department of Ecology and Evolutionary Biology, University of Colorado

Objective and Motivation

Analyzing socioeconomic-vegetation relations in the context of urban growth contributes to an understanding of the role of urban regions in carbon cycling and global environmental change. This project investigates the relationships among socioeconomic character, land use, and vegetation in residential land in the Front Range of Colorado, a rapidly urbanizing region.

Data: Sources and Preprocessing

• Vegetation: NDVI from July 27, 1999 Landsat 7 ETM+ image
• Land use: USGS (from aerial photography)
• Socioeconomic Status: 2000 U.S. Census
• Residential and Commercial Density: calculated by generating grids of the number of residential and commercial grid cells within 1 km of each cell, then calculating the tract mean
• Elevation: USGS
• Highways: ESRI

[Maps, with Boulder and Denver labeled: NDVI (image), Land use, and tract-level panels for NDVI, Education, Residential Density, Elevation, Population Density, Housing Year, Commercial Density, and Distance to Highway.]
Note that although colors are mapped to entire tracts, the data represent only the residential land within each tract.

Map legends:
  NDVI (image): -0.55 to 0.00, 0.01 to 0.20, 0.21 to 0.40, 0.40 to 0.78
  Land use: natural veg., commercial, residential, agriculture, water, other
  Population Density (people/m^2 in residential land): 322-2669, 2673-3348, 3389-4525, 4527-19810
  Housing Year (median year structures built): 1939-1957, 1958-1971, 1972-1979, 1980-1997
  Commercial Density (mean density of commercial land in residential land, cells): 10-167, 168-308, 309-520, 523-2191
  Distance to Highway (mean distance to limited access highways in residential land, m): 270-1551, 1556-3002, 3035-5716, 5732-26237

Results

Statistics. Correlations indicate that the variables with the strongest relationships with NDVI are Population Density (negative relationship), Commercial Density (negative relationship), and Residential Density (positive relationship). In a multivariate context, Housing Year exerts the most influence when the influence of the other explanatory variables is accounted for, although its zero-order correlation is much lower than those of all of the other explanatory variables.

Coefficients (Model 1; dependent variable: MNDVI)

                             Unstandardized           Std.                     Correlations                  Collinearity
Variable                     B           Std. Error   Beta    t        Sig.    Zero-order  Partial  Part     Tolerance  VIF
(Constant)                   2.457       .344                 7.141    .000
Education (PEDU)             8.860E-04   .000         .209    5.159    .000    .391        .253     .162     .601       1.665
Housing Year (MYR)           -1.32E-03   .000         -.331   -7.334   .000    -.128       -.348    -.230    .483       2.070
Population Den. (PDENRMU)    -8.20E-06   .000         -.293   -7.977   .000    -.513       -.375    -.250    .726       1.378
Elevation (MELEV)            1.470E-04   .000         .124    3.273    .001    .275        .164     .103     .688       1.453
Dist. to Highway (DLIMIT)    2.404E-06   .000         .225    6.405    .000    .263        .309     .201     .797       1.255
Commercial Den. (MCMUDEN)    -4.44E-05   .000         -.268   -5.784   .000    -.513       -.281    -.181    .459       2.178
Residential Den. (RESDEN)    2.297E-05   .000         .227    5.040    .000    .492        .247     .158     .482       2.073

Spatial Association Rule Mining. Results suggest that residential NDVI is lowest in older, socioeconomically disadvantaged neighborhoods near commercial centers. Residential NDVI is highest in older neighborhoods with higher socioeconomic status. Residential NDVI is also highest in areas of residential concentration but sparse population, i.e. planned developments with large lots. Note the role of low Housing Year in predicting both low and high residential NDVI, which explains its statistical results: a low zero-order correlation but a strong influence in the multivariate model.

Methods

Spatial data mining techniques are exploratory methods for detecting patterns in very large spatial databases. We use spatial association rule mining and spatial on-line analytical processing (OLAP), as well as mapping and statistics.
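Association rule mining ranks candidate rules A → B with metrics such as lift, the ratio of how often A and B occur together to how often they would co-occur if they were independent. The following is a minimal sketch of that calculation, assuming tracts have already been discretized into class labels. The records, field names, and the `lift` helper are hypothetical illustrations, not the project's data or the Magnum Opus implementation.

```python
from typing import Callable, Dict, List

Tract = Dict[str, str]  # hypothetical record: variable name -> class label


def lift(tracts: List[Tract],
         antecedent: Callable[[Tract], bool],
         consequent: Callable[[Tract], bool]) -> float:
    """Lift of the rule A -> B, computed as P(A and B) / (P(A) * P(B))."""
    n = len(tracts)
    n_a = sum(1 for t in tracts if antecedent(t))
    n_b = sum(1 for t in tracts if consequent(t))
    n_ab = sum(1 for t in tracts if antecedent(t) and consequent(t))
    if n == 0 or n_a == 0 or n_b == 0:
        raise ValueError("lift is undefined when A or B never occurs")
    return (n_ab / n) / ((n_a / n) * (n_b / n))


# Invented toy data (8 tracts), for illustration only.
toy_tracts = [
    {"housing_year": "low", "res_density": "low", "ndvi": "low"},
    {"housing_year": "low", "res_density": "low", "ndvi": "low"},
    {"housing_year": "low", "res_density": "high", "ndvi": "high"},
    {"housing_year": "high", "res_density": "low", "ndvi": "medium"},
    {"housing_year": "high", "res_density": "high", "ndvi": "medium"},
    {"housing_year": "high", "res_density": "high", "ndvi": "high"},
    {"housing_year": "high", "res_density": "low", "ndvi": "medium"},
    {"housing_year": "low", "res_density": "high", "ndvi": "medium"},
]

# Rule: if Housing Year is low and Residential Density is low then NDVI is low.
rule_lift = lift(
    toy_tracts,
    antecedent=lambda t: t["housing_year"] == "low" and t["res_density"] == "low",
    consequent=lambda t: t["ndvi"] == "low",
)
print(rule_lift)
```

A lift of 1 means the consequent occurs with the antecedent no more often than chance would suggest; values well above 1 mark a rule as potentially interesting.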
Spatial Association Rule Mining seeks to discover associations among transactions encoded in a spatial database. An association rule takes the form A → B, where A and B are sets of predicates and either A or B contains a spatial relationship. Interesting rules are found by using metrics such as lift, which indicates how much more often than expected B occurs when paired with A.
[Software: Magnum Opus Association Rule Mining Software]

Spatial On-Line Analytical Processing is an extension to the SQL GroupBy operation that exhaustively summarizes the value of a measurement variable contained in the fact table by all unique combinations of a set of categorical dimension variables contained in dimension tables. Here, we summarize NDVI by categorizations of the other variables, and export the results to GIS for mapping.
[Software: Microsoft SQL Server]

[Figure: Relational Star Schema. A fact table keyed by Tract_ID (1, 2, 3, …) is linked to dimension tables such as Education_D, Minority_D, and NDVI_D. Each dimension table maps the class codes 0, 1, 2, 3 to the Level_2 codes 0, 0, 1, 1.]

Sample of the Mined Rule Set

  If Housing Year is low and Residential Density is low then NDVI is low (Lift = 4.8)
  If %Minority is high and Residential Density is low then NDVI is low (Lift = 4.4)
  If Elevation is low and Income is low then NDVI is low (Lift = 4.1)
  If Education is low and Distance to Commercial is low then NDVI is low (Lift = 3.3)

  If Housing Year is low and % Minority is low then NDVI is high (Lift = 5.4)
  If Housing Year is low and Distance to Highway is high then NDVI is high (Lift = 5.0)
  If Population Den. is low and Residential Density is high then NDVI is high (Lift = 4.8)
  If Housing Value is high and Distance to Commercial is low then NDVI is high (Lift = 3.9)

Spatial On-Line Analytical Processing. The maps at right show one OLAP result, in which mean NDVI is calculated for the dimensions Residential Density and Housing Year. Each tract is categorized as belonging to a unique combination of the dimensions (e.g. low Residential Density and high Housing Year). The mean for all tracts within each category is then calculated.
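The summarization just described (assign each tract to a combination of dimension classes, then average NDVI within each combination) can be sketched in plain Python. This only illustrates the GroupBy idea; the project itself used Microsoft SQL Server. The field names, class codes, and NDVI values are invented, and the `cube` helper is one assumed reading of "exhaustively", in the spirit of a data-cube summary over every subset of dimensions.

```python
from collections import defaultdict
from itertools import combinations

# Invented fact-table rows: one per tract, with class codes for two
# dimensions and a measured NDVI. Field names are hypothetical.
fact_table = [
    {"tract_id": 1, "res_den_d": 0, "hous_yr_d": 3, "ndvi": 0.10},
    {"tract_id": 2, "res_den_d": 0, "hous_yr_d": 3, "ndvi": 0.14},
    {"tract_id": 3, "res_den_d": 3, "hous_yr_d": 0, "ndvi": 0.30},
    {"tract_id": 4, "res_den_d": 3, "hous_yr_d": 0, "ndvi": 0.26},
    {"tract_id": 5, "res_den_d": 3, "hous_yr_d": 3, "ndvi": 0.20},
]


def group_mean(rows, dims, measure):
    """Mean of `measure` for each unique combination of the `dims` codes."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        sums[key] += row[measure]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def cube(rows, dims, measure):
    """Group means for every subset of `dims`, including the grand mean."""
    return {
        subset: group_mean(rows, subset, measure)
        for r in range(len(dims) + 1)
        for subset in combinations(dims, r)
    }


dims = ("res_den_d", "hous_yr_d")
means = group_mean(fact_table, dims, "ndvi")

# Attach each category mean back to its tracts, ready to export for mapping.
tract_values = {
    row["tract_id"]: means[tuple(row[d] for d in dims)] for row in fact_table
}
print(means)
print(tract_values)
```

Keying the result by tract makes it straightforward to join the category means back to tract polygons in a GIS.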
Maps use the HSV color model to display the multidimensional data. Hue is mapped to Housing Year, where yellow, orange, red, and purple map from lowest (oldest) to highest (most recent). Saturation is mapped to Residential Density, where low (high) saturation represents low (high) Residential Density. Value maps to the NDVI value using a linear stretch between values of 105 and 255.

[Maps: "With Value Mapped to NDVI Data" (left) and "Without Value Mapped to NDVI Data" (right), each with a color cube whose axes are Hue (Hous. Yr.), Saturation (Res. Den.), and Value = NDVI (low to high).]

The map on the left shows the NDVI data mapped to tracts categorized by Residential Density and Housing Year. The map on the right maps the color value to the NDVI mean for the entire data set. Areas that are darker (lighter) in the map on the left have a relatively high (low) NDVI. Older, densely residential areas have high NDVI. Comparison of the color cubes shows that Residential Density distinguishes between high and low NDVI, but only between the areas of lowest Residential Density and the other classes. Likewise, Housing Year is important only in distinguishing the most recent residential development from other areas.

Map legends:
  NDVI (tract level; mean NDVI in residential land): -0.12 to 0.15, 0.16 to 0.19, 0.20 to 0.22, 0.23 to 0.33
  Education (% with a high school diploma): 37-77, 78-89, 90-95, 96-99
  Elevation (mean elev. in residential land, m): 1506-1610, 1611-1637, 1638-1668, 1669-1817
  Residential Density (mean density of residential land in residential land, cells): 66-1510, 1511-1969, 1973-2315, 2318-2927

Conclusions

This research demonstrates that vegetation greenness in residential areas is a function of the age and type of development as well as socioeconomic status.
Vegetation tends to be concentrated in older, densely residential developments that are far from commercial centers and highways and that contain primarily non-minority households with high educational attainment and income. Spatial data mining and visualization, in combination with multivariate statistics, have proven to be useful tools for identifying land cover, socioeconomic, and ecological relationships that are complex and non-linear. GIS serves a key function as data pre-processor and map display device. Future research will address the use of more sophisticated metrics of ecological character and the application of similar techniques to identify patterns and relationships in time series data.

Contact: Jeremy Mennis, Department of Geography, UCB 260, University of Colorado, Boulder, CO 80309, Phone: (303) 492-4794, Fax: (303) 492-7501, Email: [email protected]