Download ESRI Technology Update - City University of New York

GIS Analysis
Describing Attributes
Statistical Analysis
Spatial Description
Spatial Analysis
Searching for Spatial Relationships
GIS and Spatial Analysis
Introduction to GIS: Lecture #7 (GIS Analysis)
Dueker (1979)
• “a geographic information system is a special
case of information systems where the
database consists of observations on
spatially distributed features, activities or
events, which are definable in space as
points, lines, or areas. A geographic
information system manipulates data about
these points, lines, and areas to retrieve data
for ad hoc queries and analyses”.
GIS is capable of data analysis
• Attribute Data
– Describe with statistics
– Analyze with hypothesis testing
• Spatial Data
– Describe with maps
– Analyze with spatial analysis
Attribute Description
• The extremes of an attribute are the highest and
lowest values, and the range is the difference
between them in the units of the attribute.
• A histogram is a two-dimensional plot of attribute
values grouped by magnitude and the frequency
of records in that group, shown as a variable-length bar.
• For a large number of records with random errors
in their measurement, the histogram resembles a
bell curve and is symmetrical about the mean.
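The extremes, range, and histogram described above can be sketched in a few lines of Python. This is only an illustration: the readings, the bin width of 100, and the function name `histogram` are assumptions made for the example, not part of the lecture.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each bin of width bin_width."""
    counts = Counter()
    for v in values:
        # Lower edge of the bin that contains v.
        lower = (v // bin_width) * bin_width
        counts[lower] += 1
    return dict(sorted(counts.items()))

# Hypothetical elevation readings (illustrative values only).
elevations = [247, 310, 402, 455, 459, 470, 488, 512, 560, 610]
bin_width = 100

extremes = (min(elevations), max(elevations))  # lowest and highest values
value_range = extremes[1] - extremes[0]        # same units as the attribute
hist = histogram(elevations, bin_width)

# Text version of the plot: one variable-length bar per group.
for lower, count in hist.items():
    print(f"{lower:>4}-{lower + bin_width - 1:<4} {'#' * count}")
```

Each printed bar grows with the number of records in its group, which is all a histogram is.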
If the records are:
• Text
– Semantics of text e.g. “Hampton”
– word frequency e.g. “Creek”, “Kill”
– address matching
• Example: Display all places called “State Street”
If the records are:
• Classes
– histogram by class
– numbers in class
– contiguity description, e.g. average
neighbor (roads, commercial)
Describing a classed raster grid
[Figure: classed raster grid with per-class cell counts; P (blue) = 19/48]
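A proportion such as P (blue) = 19/48 is the number of cells in a class divided by the total number of cells. A minimal sketch, assuming a small hypothetical grid (the 3 x 4 grid and its class names are invented for illustration and are not the grid from the slide):

```python
from collections import Counter

# Hypothetical 3 x 4 classed raster; class names are illustrative only.
grid = [
    ["blue", "blue", "green", "green"],
    ["blue", "green", "green", "red"],
    ["blue", "blue", "red", "red"],
]

# Numbers in class: count the cells that carry each class label.
counts = Counter(cell for row in grid for cell in row)
total = sum(counts.values())

# Proportion of the grid occupied by each class.
proportions = {cls: n / total for cls, n in counts.items()}

for cls, n in counts.most_common():
    print(f"P ({cls}) = {n}/{total}")
```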
If the records are:
• Numbers
– statistical description
– min, max, range
– variance
– standard deviation
Statistical description
• Range : min, max, max-min
• Central tendency : mode, median (odd,
even), mean
• Variation : variance, standard deviation
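The range and central-tendency measures listed above are available in Python's standard `statistics` module. The sketch below uses two invented lists to show how the median differs for an odd and an even number of records; the data are assumptions for illustration.

```python
import statistics

values = [3, 5, 5, 8, 10]           # odd number of records (hypothetical)
even_values = [3, 5, 5, 8, 10, 12]  # even number of records (hypothetical)

# Range: min, max, max - min
lowest, highest = min(values), max(values)
value_range = highest - lowest

# Central tendency
mode = statistics.mode(values)                # most frequent value
median_odd = statistics.median(values)        # the middle value
median_even = statistics.median(even_values)  # mean of the two middle values
mean = statistics.mean(values)

print(lowest, highest, value_range, mode, median_odd, median_even, mean)
```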
Statistical description
• Range : outliers
• mode, median, mean
• Variation : variance, standard deviation
Elevation (book example)
GPS Example Data: Elevation
Table 6.2: Sample GPS Readings
Data Extreme   Date      Time       D M S         D M S         Elev
Minimum        6/14/95   10:47am    42 30 54.8    75 41 13.8    247
Maximum        6/15/95   10:47pm    42 31 03.3    75 41 20.0    610
Range          1 Day     12 hours   00 8.5        00 6.2        363
Mean
• Statistical average
• Sum of the values for
one attribute divided
by the number of
records
\bar{X} = \sum_{i=1}^{n} X_i / n
Computing the Mean
Sum of attribute values across all records, divided by
the number of records.
Add all attribute values down a column, then divide by the
number of records.
The mean is a representative value and, for measurements with
normally distributed error, converges on the true
reading.
A value lacking sufficient data for computation is
called a missing value. It is not included in the
sum or in n.
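The handling of missing values can be sketched as follows. Representing a missing value as `None` is an assumption made for this example, as are the function name and the sample column.

```python
def mean_ignoring_missing(values):
    """Mean of a column, leaving missing values out of both the sum and n."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no non-missing values to average")
    return sum(present) / len(present)

# Hypothetical attribute column; None marks a missing value.
column = [247.0, None, 455.0, 610.0, None, 524.0]

column_mean = mean_ignoring_missing(column)
print(column_mean)  # n is 4 here, not 6
```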
Variance
The total variance is the sum, over all records, of each
value with the mean subtracted and then multiplied by
itself (the squared difference from the mean).
The standard deviation is the square root of the total
variance divided by the number of records less one.
For two values, there is only one variance.
Standard Deviation
Average difference from the mean.
The sum of the squared differences between each record's value
and the mean, divided by the number of records minus one,
then square rooted:
st.dev. = \sqrt{ \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1) }
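The standard deviation defined above can be sketched directly from its description: subtract the mean from each value, square, sum, divide by n - 1, and take the square root. The function name and the sample readings are assumptions for illustration.

```python
import math

def sample_std_dev(values):
    """Square root of the total variance divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two records")
    mean = sum(values) / n
    # Total variance: sum of squared differences from the mean.
    total_variance = sum((x - mean) ** 2 for x in values)
    return math.sqrt(total_variance / (n - 1))

# Hypothetical readings (illustrative values only).
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
std_dev = sample_std_dev(readings)
print(round(std_dev, 4))
```

The result is in the same units as the readings themselves.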
GPS Example Data: Elevation
Standard deviation
Same units as the values of the records, in this case
meters.
Average amount readings differ from the average
Can be above or below the mean
Elevation is the mean (459.2 meters)
plus or minus the expected error of 82.92 meters
Elevation is most likely to lie between 376.28 meters
and 542.12 meters.
These limits are called the error band or margin of error.
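The Bell Curve
[Figure: bell curve centered on the mean, 459.2, with the reading 484.5 marked; 12.17 % of the area lies between the two and 37.83 % lies beyond 484.5.]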
Samples and populations
• A sample is a set of measurements taken
from a larger group or population.
• Sample means and variances can serve as
estimates for their populations.
• Easier to measure with samples, then draw
conclusions about entire population.
Testing Means
Mean elevation of 459.2 meters
standard deviation 82.92 meters
what is the chance of a GPS reading of 484.5
meters?
484.5 is 25.3 meters above the mean
0.31 standard deviations (Z-score)
0.1217 of the curve lies between the mean and this
value
0.3783 beyond it
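The arithmetic above can be checked with a short sketch that uses the error function from Python's standard library in place of a printed normal table. Rounding the Z-score to two decimals is an assumption that mimics a table lookup; the unrounded Z-score gives slightly different areas.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, std_dev, reading = 459.2, 82.92, 484.5

# Z-score: distance from the mean in standard deviations,
# rounded to two decimals as a table lookup would be.
z = round((reading - mean) / std_dev, 2)

between = normal_cdf(z) - 0.5  # area between the mean and the reading
beyond = 1.0 - normal_cdf(z)   # area beyond the reading

print(z, round(between, 4), round(beyond, 4))
```

With the rounded Z-score, the two areas agree with the 0.1217 and 0.3783 quoted above to four decimals.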
Hypothesis testing
• Set up NULL hypothesis (e.g. Values or
Means are the same) as H0
• Set up ALTERNATIVE hypothesis as H1
• Test hypothesis. Try to reject NULL.
• If the null hypothesis is rejected, the alternative is
accepted with a calculable level of
confidence.
Testing the Mean
• The mathematical form of the normal
distribution can be used to compute
probabilities associated with measurements
with known means and standard deviations.
• A test of means can establish whether two
samples from a population are different from
each other, or whether the different measures
they have are the result of random variation.
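The slides do not name a specific test of means. As one possible illustration, the sketch below computes Welch's t statistic for two samples; the choice of this statistic, the function name, and the sample values are all assumptions. A statistic near zero suggests the difference could be random variation; a large absolute value suggests the sample means really differ.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic: difference in means over its standard error."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error

# Hypothetical samples (illustrative values only).
sample_a = [1.0, 2.0, 3.0]
sample_b = [4.0, 5.0, 6.0]

t_statistic = welch_t(sample_a, sample_b)
print(round(t_statistic, 3))
```

Turning the statistic into a confidence level requires the t distribution, which is not covered here.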
Alternative attribute histograms
[Figure: example histogram shapes, labeled: skewed above mean; no error in measurement; all measurements equally right; random with some systematic error; decrease away from zero; bimodal (two averages); normal distribution; random distribution.]
Accuracy
Determined by testing measurements against
an independent source of higher fidelity
and reliability.
Must pay attention to units and significant
digits.
Can be expressed as a number using statistics
(e.g. expected error).
Accuracy measures imply accuracy users.
The difference is the map
• GIS data description answers the question:
Where?
• GIS data analysis answers the question:
Why is it there?
• GIS data description is different from
statistics because the results can be placed
onto a map for visual analysis.
Spatial Statistical Description
• For coordinates, the means and standard
deviations correspond to the mean center
and the standard distance
• A centroid is any point chosen to represent
a higher dimension geographic feature, of
which the mean center is only one choice.
• The standard distance for a set of point
spatial measurements is the expected
spatial error.
Spatial Statistical Description
• For coordinates, data extremes define the
two corners of a bounding rectangle.
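The mean center, standard distance, and bounding rectangle can be sketched for a set of (x, y) points as follows. Dividing by n in the standard distance is one common convention and is an assumption here, as are the function names and the sample points.

```python
import math

def mean_center(points):
    """Point whose coordinates are the mean x and the mean y."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def standard_distance(points):
    """Root mean squared distance of the points from their mean center."""
    cx, cy = mean_center(points)
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(squared / len(points))  # divisor n: an assumed convention

def bounding_rectangle(points):
    """Lower-left and upper-right corners defined by the data extremes."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys)), (max(xs), max(ys))

# Hypothetical point coordinates (illustrative values only).
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]

center = mean_center(points)
distance = standard_distance(points)
corners = bounding_rectangle(points)
print(center, distance, corners)
```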
Mean Center
[Figure: a set of points with the mean center located at (mean x, mean y).]
Centroid: mean center of a feature
Comparing spatial means
GIS and Spatial Analysis
Descriptions of geographic properties such as shape,
pattern, and distribution are often verbal.
Quantitative measures can be devised, although few are
computed by GIS.
GIS statistical computations are most often done using
retrieval options such as buffer and spread.
They are also done by manipulating attributes with arithmetic
commands (map algebra).
Statistics and features
• Points
– Bounding rectangle
– Standard distance
• Lines
– Number of points
– Length of line
– Length index
• Areas
– Area in square units
– Boundary length
– Number of holes
– Area/area of bounding rectangle
– Area of largest enclosed circle
Figure 6.8 Numbers that describe points, lines, and areas.
An example
• Lower 48 United States
• 1996 Data from the U.S. Census on gender
• Gender Ratio = # females per 100 males
• Range is 96.4 - 114.4
• What does the spatial distribution look like?
Gender Ratio by State: 1996
Searching for Spatial Pattern
A linear relationship is a predictable straight-line link
between the values of a dependent and an independent
variable (y = a + bx). It is a simple model of the relationship.
A linear relation can be tested for goodness of
fit with least squares methods. The
coefficient of determination r-squared is a
measure of the degree of fit, and the
amount of variance explained.
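A least squares fit of y = a + bx and its coefficient of determination can be sketched without any external library. The function name and the sample data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # gradient
    a = mean_y - b * mean_x  # intercept
    # r-squared: share of the variance in y explained by the line.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, 1.0 - ss_residual / ss_total

# Hypothetical observations (illustrative values only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 4.0]

a, b, r_squared = fit_line(xs, ys)
print(round(a, 3), round(b, 3), round(r_squared, 3))
```

An r-squared of 1 would mean every observation lies exactly on the line; lower values leave more variance unexplained.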
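Simple linear relationship
[Figure: scatter of observations with the best-fit regression line y = a + bx; the dependent variable is on the vertical axis and the independent variable on the horizontal axis; a is the intercept and b is the gradient.]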
Testing the relationship
gr = 117.46 + 0.138 long.
Patterns in Residual Mapping
Differences between observed values of the dependent
variable and those predicted by a model are called
residuals.
A GIS allows residuals to be mapped and examined for
spatial patterns.
A model helps explanation and prediction after the GIS
analysis.
A model should be simple, should explain what it represents,
and should be examined in the limits before use.
We should always examine the limits of the model’s
applicability (e.g. Does the regression apply to Europe?)
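Residuals can be sketched using the regression from the "Testing the relationship" slide, gr = 117.46 + 0.138 long. The sketch assumes that "long." is longitude in signed decimal degrees (west negative); the state names and observed ratios are invented for illustration and are not census values.

```python
def predicted_gender_ratio(longitude):
    """Model from the slides: gr = 117.46 + 0.138 * longitude."""
    return 117.46 + 0.138 * longitude

# Hypothetical observations: name -> (longitude, observed gender ratio).
observations = {
    "State A": (-75.0, 108.0),
    "State B": (-100.0, 102.5),
    "State C": (-120.0, 101.9),
}

# Residual = observed value minus the value predicted by the model.
residuals = {
    name: observed - predicted_gender_ratio(longitude)
    for name, (longitude, observed) in observations.items()
}

for name, residual in residuals.items():
    label = "above" if residual > 0 else "below"
    print(f"{name}: {residual:+.2f} ({label} the model)")
```

Joining such residuals back to the state polygons is what lets a GIS map them and reveal any spatial pattern.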
Mapping residuals from a model
Unexplained variance
• More variables?
• Different extent?
• More records?
• More spatial dimensions?
• More complexity?
• Another model?
• Another approach?
Example: Spatial Analysis
[Figure: map of point measurements of meters to water table (12, 14, 19, 10, 40, 12, 6, 25, 14, 11, 30) with one unknown location marked "?".]
resolution? extent? accuracy? precision? boundary effects? point spacing?
GIS and Spatial Analysis
Geographic inquiry examines the
relationships between geographic features
collectively to help describe and understand
the real-world phenomena that the map
represents.
Spatial analysis compares maps, investigates
variation over space, and predicts future or
unknown maps.
Many GIS systems have to be coaxed to
generate a full set of spatial statistics.
Analytic Tools and GIS
Tools for searching out spatial relationships and for
modeling are only lately being integrated into GIS.
Statistical and spatial analytical tools are also only now
being integrated into GIS, and many people use
separate software systems outside the GIS.
Real geographic phenomena are dynamic, but GISs have
been mostly static. Time-slice and animation methods
can help in visualizing and analyzing spatial trends.
GIS places real-world data into an organizational
framework that allows numerical description and
allows the analyst to model, analyze, and predict with
both the map and the attribute data.