```Why is it there?
(How can a GIS analyze data?)
Getting Started, Chapter 6
Paula Messina
GIS is capable of data analysis

Attribute Data
• Describe with statistics
• Analyze with hypothesis testing

Spatial Data
• Describe with maps
• Analyze with spatial analysis

Describing one attribute
Flat File Database
          Attribute   Attribute   Attribute
Record    Value       Value       Value
Record    Value       Value       Value
Record    Value       Value       Value
Attribute Description



The extremes of an attribute are the
highest and lowest values, and the range
is the difference between them in the
units of the attribute.
A histogram is a two-dimensional plot of
attribute values grouped by magnitude
and the frequency of records in that
group, shown as a variable-length bar.
For a large number of records with
random errors in their measurement, the
histogram resembles a bell curve and is
Describing a classed raster grid
[Figure: histogram of a classed raster grid; axis ticks 5, 10, 15, 20; % (blue) = 19/48]
If the attributes are:

Numbers
 statistical description
 min, max, range
 variance
 standard deviation
Statistical description
• Range: max - min
• Central tendency: mode, median, mean
• Variation: variance, standard deviation

Statistical description



• Range: outliers
• Central tendency: mode, median, mean
• Variation: variance, standard deviation
Elevation (book example)
GPS Example Data: Elevation
Data Extreme   Date      Time       D  M  S       D  M  S       Elev
Minimum        6/14/95   10:47am    42 30 54.8    75 41 13.8    247
Maximum        6/15/95   10:47pm    42 31 03.3    75 41 20.0    610
Range          1 Day     12 hours   00 8.5        00 6.2        363
Mean


• Statistical average
• Sum of the values for one attribute divided by the number of records

\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
Variance
The total variance is the sum, over all records, of each value minus the
mean, multiplied by itself (that is, squared).
The standard deviation is the square root of the total variance divided
by the number of records less one.
Standard Deviation


• Average difference from the mean
• Sum of the mean subtracted from the value for each record, squared, divided by the number of records minus 1, square rooted.

st.dev. = \sqrt{ \frac{ \sum_{i=1}^{n} (X_i - \bar{X})^2 }{ n - 1 } }
GPS Example Data: Elevation
Standard Deviation


• Same units as the values of the records, in this case meters.
• Elevation is the mean (459.2 meters) plus or minus the expected error of 82.92 meters.
• Elevation is most likely to lie between 376.28 meters and 542.12 meters.
• These limits are called the error band or margin of error.
Standard Deviations and the Bell Curve
[Figure: bell curve marking one std. dev. below the mean (376.3), the mean (459.2), and one std. dev. above the mean (542.1)]
Testing Means (1)



• Mean elevation of 459.2 meters
• Standard deviation 82.92 meters
• What is the chance of a GPS reading of 484.5 meters?
  • 484.5 is 25.3 meters above the mean
  • 0.31 standard deviations (Z-score)
  • 0.1217 of the curve lies between the mean and this value
  • 0.3783 beyond it
Testing Means (2)
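[Figure: bell curve with the mean (459.2) and the reading (484.5) marked; 12.17 % of the area lies between them and 37.83 % lies beyond 484.5]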
Accuracy
• Determined by testing measurements against an independent source of higher fidelity and reliability.
• Must pay attention to units and significant digits.
• Not to be confused with precision!

The difference is the map
• GIS data description answers the question: Where?
• GIS data analysis answers the question: Why is it there?
• GIS data description is different from statistics because the results can be placed onto a map for visual analysis.

Spatial Statistical Description
• For coordinates, the means and standard deviations correspond to the mean center and the standard distance.
• A centroid is any point chosen to represent a higher-dimension geographic feature, of which the mean center is only one choice.

Spatial Statistical Description
• For coordinates, data extremes define the two corners of a bounding rectangle.
Geographic extremes



• Southernmost point in the continental United States.
• Range: e.g. elevation difference; map extent
• Depends on projection, datum etc.
Mean Center
[Figure: mean center located at (mean x, mean y)]
Centroid: mean center of a
feature
Mean center?
Comparing spatial means
Spatial Analysis
• Lower 48 United States
• 1996 data from the U.S. Census on gender
• Gender ratio = number of females per 100 males
• Range is 96.4 - 114.4
• What does the spatial distribution look like?

Gender Ratio by State: 1996
Searching for Spatial Pattern
• A linear relation is a predictable straight-line link between the values of a dependent and an independent variable (y = a + bx). It is a simple model of correlation.
• A linear relation can be tested for goodness of fit with least squares methods. The coefficient of determination, r-squared, is a measure of the degree of fit and of the amount of variance explained.
Simple linear relation
[Figure: scatter of observations with the best-fit regression line y = a + bx; vertical axis: dependent variable; horizontal axis: independent variable; the intercept is marked]
Testing the relation
gr = 117.46 + 0.138 long.
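Worked example (an illustration, not from the slides): reading gr as the
gender ratio and long. as longitude in decimal degrees, the fitted line can
be evaluated at sample longitudes. West longitudes are assumed to be
negative; with positive values the fitted ratios would sit well above the
observed maximum of 114.4. The two longitudes below are hypothetical round
values.

gr(-120) = 117.46 + 0.138 \times (-120) = 117.46 - 16.56 = 100.90
gr(-70)  = 117.46 + 0.138 \times (-70)  = 117.46 - 9.66  = 107.80

Under this reading the fitted ratio rises by about 1.4 females per 100
males for every 10 degrees of eastward movement.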
GIS and Spatial Analysis



• Geographic inquiry examines the relationships between geographic features collectively to help describe and understand the real-world phenomena that the map represents.
• Spatial analysis compares maps, investigates variation over space, and predicts future or unknown maps.
• Many GIS systems have to be coaxed to generate a full set of spatial statistics.
You can lie with...
Maps
Statistics
Correlation is not causation!
Terrain Analysis
Paula Messina
Introduction to Terrain Analysis
• What is terrain analysis?
• How are data points interpolated to a grid?
• How are topographic data sets produced from non-point data?
• How are derivative data sets (i.e., slope and aspect maps) produced by ArcView?
What is Terrain Analysis?

• Terrain Analysis: the study of ground-surface relief and pattern by numerical methods (a.k.a. geomorphometry).
• Geomorphology = qualitative
• Geomorphometry = quantitative
Interpolation to a Grid
[Figure: known elevation points 46, 58, 70, 86 and 97 around an unknown point (?)]
Assumptions:
• Elevations are continuously distributed
• The influence of one known point over an unknown point increases as distance between them decreases
Interpolation Using the Neighborhood Model
Inverse-Distance theory dictates:
• The value of X > 58
• The value of X < 97
• The value of X is closer to 58 than 97
[Figure: unknown point x among known elevation points 46, 58, 70, 86 and 97]
Neighborhood Interpolation Using Inverse Distance Weighting

Z_x = \frac{ \sum_{p=1}^{R} Z_p \, d_p^{-n} }{ \sum_{p=1}^{R} d_p^{-n} }

ArcGIS calls this IDW

Z_x = elevation at kernel (point x)
Z_p = elevation at known point p
d_p = distance from point x to point p
n = “friction of distance” value; usually between 1 and 6

When n = 2, the technique is called “inverse-squared distance weighting.”
[Figure: unknown point x among known elevation points 46, 58, 70, 86 and 97]
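Worked example (a sketch under assumed inputs, not taken from the slides):
the figure gives elevations but no distances, so the three distances below
are hypothetical. Take the R = 3 nearest known points with elevations 58,
70 and 97 at distances of 10, 20 and 40 units from x, and n = 2.

weights d_p^{-2}:  1/100 = 0.01,  1/400 = 0.0025,  1/1600 = 0.000625
numerator   = 58(0.01) + 70(0.0025) + 97(0.000625) = 0.58 + 0.175 + 0.060625 = 0.815625
denominator = 0.01 + 0.0025 + 0.000625 = 0.013125
Z_x = 0.815625 / 0.013125 \approx 62.1

The estimate lies between 58 and 97 and is closest to 58, the nearest
point, which is the behavior inverse-distance weighting is meant to produce.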
Types of “Neighborhoods” used with IDW

Nearest n Neighbors
• in this example, n = 3
• this method isn’t effective when there are clusters of points
• “nearest in octants” searches can help
[Figure: unknown point x among known elevation points 46, 58, 70, 86 and 97]

• points are selected only if they lie within that fixed
[Figure: unknown point x among known elevation points 46, 58, 70, 86 and 97]
Interpolation using the Spline Method
• The spline interpolator fits a minimum-curvature surface through input points.
• “Rubber sheet fit”
• The spline interpolator fits a mathematical function to a specified number of nearest points
Interpolation Using Kriging

• Based on regionalized variable theory
  • Drift, random correlated component, noise
• This method produces a statistically optimal surface, but it is very computationally intensive
• Kriging is used frequently in soil science and geology

Trend Interpolator

• Fits a mathematical function (a polynomial of specified order) to input points
  • Points may be chosen by nearest neighbor or radius searches
  --or--
  • All points may be used
• Uses a least-squares regression fit
• The surface produced does not necessarily pass through the points used
Not available as
in ArcGIS
• This is an excellent choice when data points are sparse
Which interpolator is better?

IDW
• Assumption: The variable being mapped decreases in influence with distance
power for a retail site analysis

Spline
• Assumption: The variable being mapped is a smooth, continuous surface; it is not particularly good for surfaces with large variability over small horizontal distances
  • Examples: terrain, water table heights, pollution concentration, etc.
The Finished Grid
[Figure: unknown point x among known elevation points 46, 58, 70, 86 and 97]
Grids are subject to the “layer cake effect”
46 48 50 52 46 46 44
48 50 54 56 46 56 54
50 52 60 64 68 80 80
56 58 65 74 86 84 80
66 69 73 80 90 88 86
70 75 78 86 94 94 80
72 76 80 84 90 89 84

The Messina “Eyeball” Interpolator was used
Point Data Collection in the Field
• It is critical to obtain data at the corners of the grid extent
• It is advisable to obtain the VIPs (Very Important Points) such as the highest and lowest elevations

Other Continuous Surface Sources

USGS DEMs
• produced directly from USGS Topographic Maps
• Elevations of an area are averaged within the grid cell (pixel)
• High and low points can never be saved as a grid cell value
• Various techniques (i.e. stereograms) were used to accomplish this process
• DEM spatial resolution: 30m (7.5 minute data), 1 arc-second (1 degree data), 10m*, 5m*
  *(limited coverage)
Other Continuous Surface Sources

Shuttle Missions:
• Shuttle Radar Topography Mission, 2/00
• SIR-C, 1994

Other Orbiters
• Magellan Mapping Mission of Venus, animation of the Venusian surface topography

• AirSAR/TopSAR
• GeoSAR: California mapping
How is Slope Computed?

Slope = \arctan \sqrt{ \left( \frac{dZ}{dX} \right)^2 + \left( \frac{dZ}{dY} \right)^2 }

Calculate the slope for the central pixel.

100  130  140
120  150  160
160  170  200

Grid cell = 100m x 100m
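Worked example (one possible approach; the slides do not say how dZ/dX and
dZ/dY are estimated, so simple central differences across the central
pixel's four edge neighbors are assumed here, and the slope is taken as the
arctangent of the square root of the summed squares). The grid is read row
by row as 100 130 140 / 120 150 160 / 160 170 200, with north assumed at
the top. The neighbors on either side of the central pixel are
2 x 100 m = 200 m apart.

dZ/dX = (160 - 120) / 200 = 0.2        (east minus west)
dZ/dY = (130 - 170) / 200 = -0.2       (north minus south)
Slope = \arctan \sqrt{ (0.2)^2 + (-0.2)^2 } = \arctan(0.283) \approx 15.8 degrees

A weighted 3 x 3 estimator that also uses the corner cells would give a
somewhat different value.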
How is Aspect Computed?

Aspect A’ = \arctan \left( - \frac{ dZ/dY }{ dZ/dX } \right)

• If dZ/dX is negative, add 90 to A’
• If dZ/dX is positive, and dZ/dY is negative: add 270 to A’
• If dZ/dX is positive, and dZ/dY is positive: subtract A’ from 270

Calculate the aspect for the central pixel.

100  130  140
120  150  160
160  170  200

Grid cell = 100m x 100m