DATA MINING RELATIONSHIPS AMONG URBAN SOCIOECONOMIC, LAND COVER, AND REMOTELY SENSED ECOLOGICAL DATA
Jeremy Mennis*, Carol Wessman, and Nancy Golubiewski**, *Department of Geography and **Department of Ecology and Evolutionary Biology, University of Colorado
Objective and Motivation
Analyzing socioeconomic-vegetation relations in the context of urban growth contributes to an understanding of the role of urban regions in carbon cycling and global environmental change. This project investigates the relationships among socioeconomic character, land use, and vegetation in residential land in the Front Range of Colorado, a rapidly urbanizing region.
Results
Statistics. Correlations indicate that the variables with the strongest relationships with NDVI are Population Density (negative relationship), Commercial Density (negative relationship), and Residential Density (positive relationship). In a multivariate context, Housing Year exerts the most influence when the influence of the other explanatory variables is accounted for, although its zero-order correlation is much lower than those of all of the other explanatory variables.
Explanatory variables in regression Model 1 (variable codes in parentheses): Education (PEDU), Housing Year (MYR), Population Den. (PDENRMU), Elevation (MELEV), Dist. to Highway (DLIMIT), Commercial Den. (MCMUDEN), Residential Den. (RESDEN).
[Maps: NDVI (image); Land Use]
Data: Sources and Preprocessing
• Vegetation: NDVI from July 27, 1999 Landsat 7 ETM+ image
• Land use: USGS (from aerial photography)
• Socioeconomic Status: 2000 U.S. Census
• Residential and Commercial Density: calculated by generating grids of the
number of residential and commercial grid cells within 1 km of each cell, then
calculating the tract mean
• Elevation: USGS
• Highways: ESRI
Note that although colors are mapped to entire tracts, the data represent only the
residential land within each tract.
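The density calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' GIS workflow: the toy grids, the class codes "R", "C", and "O", the function names, and the radius expressed in cells are all assumptions, and the sketch assumes that a cell counts itself and that the tract mean is taken over residential cells only (consistent with the note above).

```python
def focal_count(grid, target, radius_cells):
    """For every cell, count cells equal to `target` within a circular radius.

    Assumption: the cell itself is included in its own neighborhood.
    """
    rows, cols = len(grid), len(grid[0])
    r = radius_cells
    counts = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    in_grid = 0 <= ii < rows and 0 <= jj < cols
                    in_radius = di * di + dj * dj <= r * r
                    if in_grid and in_radius and grid[ii][jj] == target:
                        counts[i][j] += 1
    return counts


def tract_means(values, land_use, tracts, mask_class):
    """Mean of `values` per tract, using only cells whose land use is `mask_class`."""
    sums, ns = {}, {}
    for vrow, lrow, trow in zip(values, land_use, tracts):
        for v, lu, t in zip(vrow, lrow, trow):
            if lu == mask_class:
                sums[t] = sums.get(t, 0) + v
                ns[t] = ns.get(t, 0) + 1
    return {t: sums[t] / ns[t] for t in sums}


# Hypothetical 3x3 land use grid: R = residential, C = commercial, O = other.
land_use = [["R", "R", "C"],
            ["R", "O", "C"],
            ["R", "R", "R"]]
# Hypothetical tract IDs for the same cells.
tracts = [[1, 1, 2],
          [1, 1, 2],
          [1, 2, 2]]

res_counts = focal_count(land_use, "R", radius_cells=1)
com_counts = focal_count(land_use, "C", radius_cells=1)
res_density = tract_means(res_counts, land_use, tracts, "R")
com_density = tract_means(com_counts, land_use, tracts, "R")
```

On a real grid, `radius_cells` would be 1 km divided by the cell size, which the poster does not state.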
[Maps: NDVI (tract level); Education; Residential Density. City labels: Boulder, Denver.]
[Legend, Land use: natural veg., commercial, residential, agriculture, water, other]
[Legend, NDVI: -0.55 to 0.00, 0.01 to 0.20, 0.21 to 0.40, 0.40 to 0.78]
Coefficients (a)
B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient. Zero-order, Partial, and Part are correlations; Tolerance and VIF are collinearity statistics.

Variable                     B           Std. Error  Beta    t        Sig.   Zero-order  Partial  Part    Tolerance  VIF
(Constant)                   2.457       .344                7.141    .000
Education (PEDU)             8.860E-04   .000        .209    5.159    .000   .391        .253     .162    .601       1.665
Housing Year (MYR)           -1.32E-03   .000        -.331   -7.334   .000   -.128       -.348    -.230   .483       2.070
Population Den. (PDENRMU)    -8.20E-06   .000        -.293   -7.977   .000   -.513       -.375    -.250   .726       1.378
Elevation (MELEV)            1.470E-04   .000        .124    3.273    .001   .275        .164     .103    .688       1.453
Dist. to Highway (DLIMIT)    2.404E-06   .000        .225    6.405    .000   .263        .309     .201    .797       1.255
Commercial Den. (MCMUDEN)    -4.44E-05   .000        -.268   -5.784   .000   -.513       -.281    -.181   .459       2.178
Residential Den. (RESDEN)    2.297E-05   .000        .227    5.040    .000   .492        .247     .158    .482       2.073

a. Dependent Variable: MNDVI
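As a reading aid for the collinearity and correlation columns of the coefficients table, the sketch below checks two standard identities against the reported values: VIF = 1 / Tolerance, and Part (semipartial) correlation = Beta × sqrt(Tolerance). The numbers are copied from the table; the dictionary layout and variable names are only illustrative, and small gaps are expected because the table is rounded to three decimals.

```python
import math

# (Beta, Part, Tolerance, VIF) as reported in the coefficients table.
reported = {
    "PEDU":    (0.209, 0.162, 0.601, 1.665),
    "MYR":     (-0.331, -0.230, 0.483, 2.070),
    "PDENRMU": (-0.293, -0.250, 0.726, 1.378),
    "MELEV":   (0.124, 0.103, 0.688, 1.453),
    "DLIMIT":  (0.225, 0.201, 0.797, 1.255),
    "MCMUDEN": (-0.268, -0.181, 0.459, 2.178),
    "RESDEN":  (0.227, 0.158, 0.482, 2.073),
}

vif_gap = {}   # |1 / Tolerance - reported VIF|
part_gap = {}  # |Beta * sqrt(Tolerance) - reported Part|
for name, (beta, part, tol, vif) in reported.items():
    vif_gap[name] = abs(1.0 / tol - vif)
    part_gap[name] = abs(beta * math.sqrt(tol) - part)

max_vif_gap = max(vif_gap.values())
max_part_gap = max(part_gap.values())
```

Both gaps stay within rounding error, which supports the column alignment used when reading the table.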
Spatial Association Rule Mining. Results suggest that residential NDVI is lowest in older, socioeconomically disadvantaged neighborhoods near commercial centers. Residential NDVI is highest in older neighborhoods with higher socioeconomic status. Residential NDVI is also highest in areas of residential concentration but sparse population, i.e., planned developments with large lots. Note that low Housing Year helps predict both low and high residential NDVI, which may explain why its zero-order correlation with NDVI is weak even though it is the most influential variable in the regression.
[Maps: Elevation; Housing Year; Commercial Density; Distance to Highway. Legend classes:]
• Population density (people/m^2 in residential land): 322-2669, 2673-3348, 3389-4525, 4527-19810
• Commercial density (mean density of commercial land in residential land, cells): 10-167, 168-308, 309-520, 523-2191
• Housing year (median year structures built): 1939-1957, 1958-1971, 1972-1979, 1980-1997
• Distance to highway (mean distance to limited access highways in residential land, m): 270-1551, 1556-3002, 3035-5716, 5732-26237
Methods
Spatial data mining techniques are exploratory methods for detecting patterns in very large spatial databases. We use spatial association rule mining and spatial on-line
analytical processing (OLAP), as well as mapping and statistics.
Spatial Association Rule Mining seeks to discover associations among transactions encoded in a spatial database. An association rule takes the form A → B, where A and B are sets of predicates, and either A or B contains a spatial relationship. Interesting rules are found by using metrics such as lift, which indicates how much more often than expected B occurs when paired with A.
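Lift can be made concrete with a small calculation. The sketch below uses the common definition lift(A → B) = P(A and B) / (P(A) · P(B)); the eight toy "transactions" (one per tract) and the predicate strings are invented for illustration and are not the poster's data, and the spatial relationships mentioned above are reduced to plain categorical predicates.

```python
def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent over a list of predicate sets."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_b = sum(1 for t in transactions if consequent <= t)
    n_ab = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if n_a == 0 or n_b == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    # Observed joint frequency divided by the frequency expected under independence.
    return (n_ab / n) / ((n_a / n) * (n_b / n))


# Hypothetical tracts, each described by a set of predicates.
tract_predicates = [
    {"HousingYear=low", "ResDen=low", "NDVI=low"},
    {"HousingYear=low", "ResDen=low", "NDVI=low"},
    {"HousingYear=low", "ResDen=high", "NDVI=high"},
    {"HousingYear=high", "ResDen=low", "NDVI=high"},
    {"HousingYear=high", "ResDen=high", "NDVI=high"},
    {"HousingYear=high", "ResDen=high", "NDVI=high"},
    {"HousingYear=high", "ResDen=low", "NDVI=high"},
    {"HousingYear=low", "ResDen=high", "NDVI=high"},
]

rule_lift = lift(
    tract_predicates,
    antecedent={"HousingYear=low", "ResDen=low"},
    consequent={"NDVI=low"},
)
```

In this toy set, low NDVI occurs in 2 of 8 tracts overall but in every tract matching the antecedent, so the rule has a lift of 4.0.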
[Figure: Magnum Opus Association Rule Mining Software]
Spatial On-Line Analytical Processing is an extension to the SQL GroupBy operation that exhaustively summarizes the value of a measurement variable contained in the fact table by all unique combinations of a set of categorical dimension variables contained in dimension tables. Here, we summarize NDVI by categorizations of the other variables, and export the results to GIS for mapping.
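The exhaustive summarization described above can be sketched as a CUBE-style aggregation: the mean of the measure is computed for every combination of values of every subset of the dimensions. This is an illustrative stand-in written in plain Python, not the SQL Server implementation used in the project; the rows, the column names `ResDen_D` and `HousYr_D`, and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cube_mean(rows, dimensions, measure):
    """Mean of `measure` for every subset of `dimensions` and every value combination.

    Keys are (dims, values) tuples; the empty subset gives the grand mean.
    """
    result = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            groups = defaultdict(list)
            for row in rows:
                groups[tuple(row[d] for d in dims)].append(row[measure])
            for values, measurements in groups.items():
                result[(dims, values)] = sum(measurements) / len(measurements)
    return result


# Hypothetical fact rows: one per tract, with categorized dimensions and mean NDVI.
fact_rows = [
    {"Tract_ID": 1, "ResDen_D": "low", "HousYr_D": "low", "NDVI": 0.10},
    {"Tract_ID": 2, "ResDen_D": "low", "HousYr_D": "high", "NDVI": 0.20},
    {"Tract_ID": 3, "ResDen_D": "high", "HousYr_D": "low", "NDVI": 0.30},
    {"Tract_ID": 4, "ResDen_D": "high", "HousYr_D": "low", "NDVI": 0.26},
]

cube = cube_mean(fact_rows, ["ResDen_D", "HousYr_D"], "NDVI")
```

Each entry of `cube` corresponds to one cell that could be joined back to tracts and exported for mapping.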
[Figure: Microsoft SQL Server Relational Star Schema. A Fact Table keyed by Tract_ID (1, 2, 3, …) joins to Dimension Tables such as Education_D, Minority_D, and NDVI_D. Each Dimension Table maps the four classes 0, 1, 2, 3 to Level_2 values 0, 0, 1, 1.]
Sample of the Mined Rule Set

Rules predicting low NDVI:
• If Housing Year is low and Residential Density is low then NDVI is low (Lift = 4.8)
• If % Minority is high and Residential Density is low then NDVI is low (Lift = 4.4)
• If Elevation is low and Income is low then NDVI is low (Lift = 4.1)
• If Education is low and Distance to Commercial is low then NDVI is low (Lift = 3.3)

Rules predicting high NDVI:
• If Housing Year is low and % Minority is low then NDVI is high (Lift = 5.4)
• If Housing Year is low and Distance to Highway is high then NDVI is high (Lift = 5.0)
• If Population Den. is low and Residential Density is high then NDVI is high (Lift = 4.8)
• If Housing Value is high and Distance to Commercial is low then NDVI is high (Lift = 3.9)
Spatial On-Line Analytical Processing. The maps at right show one
OLAP result where mean NDVI is calculated for dimensions of
Residential Density and Housing Year. Each tract is categorized as
belonging to a unique combination of the dimensions (e.g. low
Residential Density and high Housing Year). The mean for all tracts
within each category is then calculated. Maps use the HSV color model
to display the multidimensional data. Hue is mapped to Housing Year
where yellow, orange, red, and purple map from lowest (oldest) to
highest (most recent). Saturation is mapped to Residential Density
where low (high) saturation represents low (high) Residential Density.
Value maps to the NDVI value using a linear stretch between values of
105 and 255.
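The HSV encoding described above can be sketched with Python's standard `colorsys` module. Several details are assumptions, not taken from the poster: the hue angles chosen for yellow, orange, red, and purple, the four saturation levels, the function name, and the direction of the value stretch (the `darker_is_higher` flag lets higher NDVI map to either the dark or the light end of the 105 to 255 range).

```python
import colorsys

# Assumed hue angles in degrees for Housing Year classes 0..3 (oldest to most recent).
HUE_DEGREES = [60, 30, 0, 280]  # yellow, orange, red, purple
# Assumed saturation levels for Residential Density classes 0..3 (low to high).
SATURATION = [0.25, 0.5, 0.75, 1.0]


def tract_color(housing_year_class, res_density_class, ndvi,
                ndvi_min, ndvi_max, darker_is_higher=True):
    """Return an (R, G, B) tuple in 0..255 for one tract category."""
    if ndvi_max <= ndvi_min:
        raise ValueError("ndvi_max must exceed ndvi_min")
    hue = HUE_DEGREES[housing_year_class] / 360.0
    sat = SATURATION[res_density_class]
    # Linear stretch of NDVI onto the 105..255 value range, clamped to the range.
    frac = min(1.0, max(0.0, (ndvi - ndvi_min) / (ndvi_max - ndvi_min)))
    value_8bit = 255 - frac * 150 if darker_is_higher else 105 + frac * 150
    r, g, b = colorsys.hsv_to_rgb(hue, sat, value_8bit / 255.0)
    return tuple(round(c * 255) for c in (r, g, b))


# Example: a 1970s-era (class 2), densely residential (class 3) category.
greenest = tract_color(2, 3, ndvi=0.33, ndvi_min=-0.12, ndvi_max=0.33)
barest = tract_color(2, 3, ndvi=-0.12, ndvi_min=-0.12, ndvi_max=0.33)
```

Holding value constant (for example, at the data set mean) while keeping hue and saturation gives the comparison map without NDVI encoded.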
[Maps: "With Value Mapped to NDVI Data" and "Without Value Mapped to NDVI Data". Color cube legend: Hue, Value = NDVI (Low to High).]
The map on the left shows the NDVI data mapped to tracts categorized by Residential Density and Housing Year. The map on the right maps the color value to the NDVI mean for the entire data set. Areas that are darker (lighter) in the map on the left have a relatively high (low) NDVI. Older, densely residential areas have high NDVI. Comparison of the color cubes shows that Residential Density distinguishes between high and low NDVI, but only between the areas of lowest Residential Density and the other classes. Likewise, Housing Year is important only in distinguishing the most recent residential development from other areas.
[Figure: star schema, continued. Example Fact Table rows list Education values 73, 58, 82 with Education_D classes 2, 1, 3. The PopDen_D Dimension Table maps classes 0, 1, 2, 3 to Level_2 values 0, 0, 1, 1.]
[Map panel: Population Density. Color cube axis labels: Hous. Yr., Res. Den., Saturation. Legend classes:]
• Elevation (mean elev. in residential land, m): 1506-1610, 1611-1637, 1638-1668, 1669-1817
• NDVI (mean NDVI in residential land): -0.12 to 0.15, 0.16 to 0.19, 0.20 to 0.22, 0.23 to 0.33
• Education (% with a high school diploma): 37-77, 78-89, 90-95, 96-99
• Residential density (mean density of residential land in residential land, cells): 66-1510, 1511-1969, 1973-2315, 2318-2927
Conclusions
This research demonstrates that vegetation greenness in residential areas is a function of the age and type of development as well as socioeconomic status. Vegetation tends to be concentrated in older, densely residential developments that are far from commercial centers and highways and that contain primarily non-minority households with high educational attainment and income.
Spatial data mining and visualization, in combination with multivariate statistics, have proven to be useful tools for identifying land cover, socioeconomic, and ecological relationships that are complex and non-linear. GIS serves a key function as data pre-processor and map display device.
Future research will address the use of more sophisticated metrics of ecological character and the application of similar techniques to identify patterns and relationships in time series data.
Contact: Jeremy Mennis, Department of Geography, UCB 260, University of Colorado, Boulder, CO 80309, Phone: (303) 492-4794, Fax: (303) 492-7501, Email: [email protected]