EXPLORING SPATIAL PATTERNS IN
YOUR DATA
OBJECTIVES
Learn how to examine your data using the Geostatistical Analysis tools in ArcMap.
Learn how to use descriptive statistics in ArcMap and Geoda to analyze data.
Be able to identify Geostatistical Analysis tools that can be used for further analysis.

WHY EXPLORE YOUR DATA?

It allows you to better select an appropriate tool to
analyze your data.

If you skip exploring your data, you may miss key
information about it that may lead to incorrect
conclusions and decisions.
GEODA VS. ARCMAP

Geoda – free, open-source, simple software designed
specifically for statistical analysis

ArcMap – proprietary GIS software that can
perform statistical analysis along with hundreds of
other analyses
GEODA VS. ARCMAP
With ArcMap you can view several data layers at once. In Geoda, you view only one data layer.
Some tools are found in both programs, while some are found in only one.

EXPLORE THE LOCATION OF YOUR
DATA

Explore:
size of the study area
mean center
median center
direction in which the data are oriented


You will see where data are clustered relative to the
rest of the data.
MEAN CENTER
The geographic center for a set of features.
 Constructed from the average x and y values for
the input feature centroids (middle points, if input
features are polygons).
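
The averaging described above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample centroids are made up, and the sketch ignores options such as weighting that a GIS tool may offer.

```python
def mean_center(points):
    """Return the mean center (average x, average y) of a list of (x, y) points."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return mean_x, mean_y


# Hypothetical feature centroids, in map units.
centroids = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
print(mean_center(centroids))  # (2.0, 1.5)
```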

MEDIAN CENTER
Median Center is robust to outliers.
Uses an iterative algorithm to find the point that minimizes
the total travel distance from it to all other features in the dataset.
At each step (t) in the algorithm, a candidate
Median Center (Xt, Yt) is found and refined until it
represents the location that minimizes the Euclidean
distance (d) to all features (i) in the dataset.

DIRECTION DISTRIBUTION (STANDARD
DEVIATIONAL ELLIPSE)



Standard deviational ellipses summarize the spatial
characteristics of geographic features: central tendency,
dispersion, and directional trends.
The ellipse allows you to see if the distribution of
features is elongated and hence has a particular
orientation.
When the underlying spatial pattern of features is
concentrated in the center with fewer features toward
the periphery (a spatial normal distribution):
one standard deviation ellipse polygon will cover approximately 68 percent of the features
two standard deviations will contain approximately 95 percent of the features
three standard deviations will cover approximately 99 percent of the features
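
As a rough illustration of how such an ellipse can be derived, the sketch below takes the textbook route: it computes the covariance of the x and y coordinates and reads the axis lengths and orientation from it. ArcMap's exact scaling and its angle convention may differ, so treat the function name, the return format, and the counterclockwise-from-x-axis angle as assumptions.

```python
import math


def standard_deviational_ellipse(points, n_std=1.0):
    """Return center, semi-axis lengths, and orientation of a deviational ellipse.

    Covariance-based textbook form; a GIS package may apply extra corrections.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Eigenvalues of the 2x2 covariance matrix give the variance along each axis.
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    major_var = half_trace + root
    minor_var = max(half_trace - root, 0.0)
    # Orientation of the major axis, counterclockwise from the x-axis.
    angle_deg = math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))
    return {
        "center": (cx, cy),
        "major": n_std * math.sqrt(major_var),
        "minor": n_std * math.sqrt(minor_var),
        "angle_deg": angle_deg,
    }


# Made-up points stretched along the x-axis.
ellipse = standard_deviational_ellipse([(-2, 0), (2, 0), (0, -1), (0, 1)])
print(ellipse)
```

A major axis noticeably longer than the minor axis indicates an elongated distribution, and the angle gives its orientation.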

EXPLORE THE VALUES OF YOUR DATA
NORMAL DISTRIBUTION

Some analysis tools assume that your data are normally distributed. In a normal distribution:
Mean and median are similar
Data are symmetrical

DATA FREQUENCY USING HISTOGRAMS
DATA DISTRIBUTION USING A QQ PLOT
[Figure: example QQ plots labeled "A normally distributed dataset", "Many characteristics of a normal dataset", and "Not normal"]
A normal QQ plot shows the relationship of your data to a normal distribution line.
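
The points of a normal QQ plot can be computed by hand, which helps show what the plot compares. The sketch below pairs each sorted data value with the value a normal distribution (with the same mean and standard deviation) would be expected to have at that rank. The plotting position (i + 0.5) / n, the function name, and the sample values are assumptions; software packages use slightly different conventions.

```python
from statistics import NormalDist, mean, median, stdev


def normal_qq_points(values):
    """Return (theoretical quantile, observed value) pairs for a normal QQ plot."""
    ordered = sorted(values)
    n = len(ordered)
    reference = NormalDist(mean(ordered), stdev(ordered))
    # Plotting position (i + 0.5) / n is one common convention (an assumption here).
    return [(reference.inv_cdf((i + 0.5) / n), v) for i, v in enumerate(ordered)]


# Made-up, roughly symmetrical data values.
values = [2, 4, 4, 5, 5, 5, 6, 6, 8]
print("mean:", mean(values), "median:", median(values))
for theoretical, observed in normal_qq_points(values):
    print(round(theoretical, 2), observed)
```

If the data are close to normal, the observed values track the theoretical ones, so the pairs fall near a straight line.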
BOX PLOT
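Displays the median and interquartile range (IQ),
which spans the 25th to the 75th percentile
Hinge = multiple of the interquartile range (1.5 or 3) beyond which values are flagged as potential outliers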

MAPS

For examining data values and frequencies:
Quantile Map
Natural Breaks Map
Equal Interval Map

For finding outliers:
Percentile Map
Box Map
Standard Deviation Map

QUANTILE MAP

Displays the distribution of values in categories with
an equal number of observations in each category.
EQUAL INTERVAL MAP


Sets the value ranges in each category equal in size.
The entire range of data values is divided equally into
however many categories have been chosen.
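
The difference between the quantile and equal interval schemes is easiest to see on a small skewed dataset. The sketch below is illustrative only: the function names and sample values are made up, and real mapping software handles ties and class boundaries in its own way.

```python
def equal_interval_breaks(values, n_classes):
    """Upper class limits that split the full data range into equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * k for k in range(1, n_classes + 1)]


def quantile_breaks(values, n_classes):
    """Upper class limits that put (roughly) equal counts in each class."""
    ordered = sorted(values)
    n = len(ordered)
    # Index of the last observation in the k-th equal-count slice (ceiling division).
    return [ordered[(n * k + n_classes - 1) // n_classes - 1]
            for k in range(1, n_classes + 1)]


def class_counts(values, breaks):
    """Count observations per class; a value belongs to the first class whose limit it does not exceed."""
    counts = [0] * len(breaks)
    for v in values:
        for k, upper in enumerate(breaks):
            if v <= upper:
                counts[k] += 1
                break
    return counts


# Made-up values with one extreme observation.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
for name, breaks in [("equal interval", equal_interval_breaks(data, 3)),
                     ("quantile", quantile_breaks(data, 3))]:
    print(name, breaks, class_counts(data, breaks))
```

With the extreme value present, the equal interval scheme leaves the middle class empty, while the quantile scheme keeps three observations in every class.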
NATURAL BREAKS MAP

Seeks to reduce the variance within classes and
maximize the variance between classes.
OTHER EXPLORATORY METHODS
Scatter Plot (2 variables)
Parallel Coordinate Plot (a pattern of lines is drawn
that connects the coordinates of each observation
across the variables on parallel x-axes)

DETECT OUTLIERS
OUTLIERS

Outliers can reveal mistakes, unusual occurrences,
and shift points in data patterns (for example, a valley in a
mountain range).

You should use more than one method to find
outliers because some techniques will only highlight
data values near the two ends of your range.
PERCENTILE MAP
Groups ranked data into 6 categories
The lowest and highest 1% are potential outliers

BOX MAP
Groups data into 4 categories (quartiles), plus 2 outlier categories, one at each end
Data are outliers if they lie more than 1.5 or 3 times the IQ below the 25th percentile or above the 75th percentile.
Detects outliers with more certainty than a percentile map
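
The box map grouping can be sketched as follows. The quartiles here come from Python's `statistics.quantiles` with the "inclusive" method, which is an assumption: Geoda and ArcMap may compute quartiles slightly differently. The function names, class labels, and sample values are also made up for illustration.

```python
from statistics import quantiles


def box_map_fences(values, hinge=1.5):
    """Return (lower fence, Q1, median, Q3, upper fence) for a given hinge."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iq = q3 - q1  # interquartile range
    return q1 - hinge * iq, q1, q2, q3, q3 + hinge * iq


def box_map_class(value, fences):
    """Assign a value to one of the 6 box map categories."""
    lower, q1, q2, q3, upper = fences
    if value < lower:
        return "lower outlier"
    if value > upper:
        return "upper outlier"
    if value < q1:
        return "< 25%"
    if value < q2:
        return "25% - 50%"
    if value < q3:
        return "50% - 75%"
    return "> 75%"


# Made-up values with one extreme observation.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
fences = box_map_fences(data, hinge=1.5)
print(fences)
for v in data:
    print(v, box_map_class(v, fences))
```

Raising the hinge from 1.5 to 3 widens the fences, so only more extreme values are flagged.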

STANDARD DEVIATION MAP
Displays data 3 standard deviations above and
below the mean.
 As a parametric map, it is sensitive to outliers.

SEMIVARIOGRAM CLOUD
When points closer together have greater
differences in their values, this may indicate an
outlier in the data.
 The selected points may be outliers.

VORONOI MAP
The gray polygons may be outliers.

Cluster Voronoi maps show spatial outliers in your
data; simple Voronoi maps can pinpoint data values
that are many class breaks removed from
surrounding polygons.
HISTOGRAM

Values in the last bars to the left or right, if far
removed from the adjacent values, may indicate
outliers.
NORMAL QQ PLOT

Values at the tails of a normal QQ plot can also be
outliers. This can happen when the tail values do
not fall along the reference line.
BOXPLOT

Points outside the hinges (represented by the
black, horizontal lines) may be outliers.
EXPLORE SPATIAL RELATIONSHIPS IN
YOUR DATA
SPATIAL AUTOCORRELATION
Everything is related, but objects closer together are more related than objects farther apart.
Explore using a semivariogram graph or cloud
Can also be explored using Moran’s I and Getis-Ord G statistics
Height (sill) = variation between data values.
Range = distance between points at which the semivariogram flattens out.
As the distance between points increases, the height should increase, since points farther away from each other are not as related, so there should be more variation.
If a semivariogram is a horizontal line, there is no spatial autocorrelation.
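
To make the semivariogram less abstract, the sketch below builds a semivariogram cloud (one point per pair of samples) and then averages it into distance bins. It assumes the usual definition of semivariance for a pair, half the squared difference in values. The function names, the lag width, and the sample data are made up, and no model is fitted to the result.

```python
import math
from itertools import combinations


def semivariogram_cloud(samples):
    """Return (distance, semivariance) for every pair of (x, y, value) samples."""
    cloud = []
    for (x1, y1, v1), (x2, y2, v2) in combinations(samples, 2):
        distance = math.hypot(x2 - x1, y2 - y1)
        cloud.append((distance, 0.5 * (v1 - v2) ** 2))
    return cloud


def binned_semivariogram(cloud, lag_width):
    """Average the cloud within distance bins; returns (bin center, mean semivariance)."""
    bins = {}
    for distance, gamma in cloud:
        bins.setdefault(int(distance // lag_width), []).append(gamma)
    return [((k + 0.5) * lag_width, sum(g) / len(g)) for k, g in sorted(bins.items())]


# Made-up samples along a line whose values rise steadily with x.
samples = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (2.0, 0.0, 2.0), (3.0, 0.0, 3.0)]
cloud = semivariogram_cloud(samples)
print(binned_semivariogram(cloud, lag_width=1.0))
```

Here the averaged semivariance grows with distance, which is the pattern expected when nearby points are more alike than distant ones. A pair that is close together but has a very high semivariance stands out in the cloud as a possible outlier.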
VARIATION IN YOUR DATA

Many spatial statistics analysis techniques assume
your data are stationary, meaning the relationship
between two points and their values depends on
the distance between them, not their exact location.

Explore variation using a Voronoi map.

A Voronoi map is created by defining Thiessen
polygons around each point in your dataset.

Any location inside a polygon represents the area
closer to that data point than to any other data
point.

This allows you to explore the variation of each
sample point based on its relationship to
surrounding sample points.
A SIMPLE VORONOI MAP
Green = little local variation
Orange and Red = greater local variation

A simple Voronoi map shows the data value at each
location. The map is symbolized using a geometrical
interval classification. This will show the variation in data
values across your entire dataset.
TYPES OF VORONOI MAPS
Simple: The value assigned to a polygon is the value
recorded at the sample point within that polygon.
Mean: The value assigned to a polygon is the mean value that
is calculated from the polygon and its neighbors.
Mode: All polygons are categorized using five class intervals.
The value assigned to a polygon is the mode (most frequently
occurring class) of the polygon and its neighbors.
Cluster: All polygons are categorized using five class
intervals. If the class interval of a polygon is different from
each of its neighbors, the polygon is colored gray and put into
a sixth class to distinguish it from its neighbors.
Entropy: All polygons are categorized using five classes
based on a natural grouping of data values (smart quantiles).
The value assigned to a polygon is the entropy that is
calculated from the polygon and its neighbors.
Entropy = -Σ (p_i * log p_i), where p_i is the proportion of polygons (the polygon and its neighbors) assigned to class i.
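
The entropy value for one polygon can be worked out from the class labels of that polygon and its neighbors. The sketch below assumes a base-2 logarithm and uses made-up class labels; the function name is hypothetical.

```python
import math
from collections import Counter


def neighborhood_entropy(class_labels):
    """Entropy of the class labels of a polygon and its neighbors (base-2 log assumed)."""
    total = len(class_labels)
    counts = Counter(class_labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up class labels (1 to 5) for a polygon and its neighbors.
print(neighborhood_entropy([3, 3, 3, 3]))   # all in one class: no local variation
print(neighborhood_entropy([1, 1, 4, 4]))   # two classes, evenly split
print(neighborhood_entropy([1, 2, 3, 4]))   # every polygon in a different class
```

Low entropy means the polygon and its neighbors fall in the same class (little local variation); high entropy means the classes are mixed.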
EXPLORE TRENDS IN YOUR DATA
TREND ANALYSIS

You can use the trend analysis tool in ArcMap to
visually compare the trend lines with any patterns in
your data.

When exploring trends, your data locations are
mapped along the x- and y-axes. The values of
each data location are mapped as height (z-axis).

Trends are analyzed based on direction and on the
order of the line that fits the trend. The trend line is
a mathematical function, or polynomial, that
describes the variation in the data.
You can determine whether the order of the polynomial fits your data based on the shape created by the line.
A second-order polynomial will appear as an upward or a downward curve (known as a parabola).
These polynomials show a clear curve, indicating a second-order trend in the data.
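
One way to back up the visual judgment is to fit polynomials of different orders along one direction and compare how much variation each leaves unexplained. The sketch below does an ordinary least-squares fit in plain Python. It is not the ArcMap tool itself; the function names and the sample profile are made up.

```python
def fit_polynomial(xs, zs, order):
    """Least-squares coefficients [c0, c1, ...] of z = c0 + c1*x + c2*x^2 + ..."""
    size = order + 1
    # Augmented normal equations: sums of powers of x, plus the right-hand side.
    rows = [[sum(x ** (i + j) for x in xs) for j in range(size)]
            + [sum(z * x ** i for x, z in zip(xs, zs))]
            for i in range(size)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(size):
            if r != col:
                factor = rows[r][col]
                rows[r] = [rv - factor * cv for rv, cv in zip(rows[r], rows[col])]
    return [rows[i][size] for i in range(size)]


def sum_squared_residuals(xs, zs, coeffs):
    """Total squared difference between the data and the fitted polynomial."""
    return sum((z - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
               for x, z in zip(xs, zs))


# Made-up profile: values (z) plotted against position along one axis (x).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
zs = [5.0, 2.0, 1.0, 2.0, 5.0]
for order in (1, 2):
    coeffs = fit_polynomial(xs, zs, order)
    print(order, coeffs, sum_squared_residuals(xs, zs, coeffs))
```

For this U-shaped profile the first-order line leaves large residuals, while the second-order polynomial fits almost exactly, matching the parabola described above. The fit needs more distinct x positions than the polynomial order.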
SELECTING AN ANALYSIS TECHNIQUE

Each of the following techniques is a type of
interpolation. Interpolation creates surfaces based
on spatially continuous data.

Each surface uses the values and locations of your
points to create (or interpolate) the values for the
remaining points in the surface.
GEOSTATISTICAL INTERPOLATION

Creates surfaces using the relationships between
your data locations and their values.

Predicts values based on your existing data.

Assumptions:

Data is not clustered. (Simple kriging technique has a declustering option.)
Data is normally distributed. (Transformation options are available.)
Data is stationary (no local variation).
Data is autocorrelated.
Data has no local trends. (You can remove trends from data as part of the interpolation process.)
GLOBAL DETERMINISTIC INTERPOLATION

Creates surfaces using the existing values at each
location.

Uses your entire dataset to create your surface.

Assumptions:
Outliers have been removed from the data.
Global trends exist in the data.

LOCAL DETERMINISTIC INTERPOLATION

Uses several subsets, or neighborhoods, within an
entire dataset to create the different components of
the surface.

Assumption:

Data is normally distributed.
INVERSE DISTANCE WEIGHTED
INTERPOLATION (IDW)

A type of local deterministic interpolation.
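
A minimal sketch of the idea behind IDW: the predicted value at a location is a weighted average of nearby sample values, with weights that shrink as distance grows. The power of 2, the optional `neighbors` limit, the function name, and the sample data are assumptions for illustration, not ArcMap defaults.

```python
import math


def idw_predict(samples, x, y, power=2.0, neighbors=None):
    """Predict a value at (x, y) from (x, y, value) samples by inverse distance weighting."""
    by_distance = sorted((math.hypot(sx - x, sy - y), value) for sx, sy, value in samples)
    if neighbors is not None:
        # Local version: keep only the nearest samples.
        by_distance = by_distance[:neighbors]
    if by_distance[0][0] == 0.0:
        # The location coincides with a sample, so return its value directly.
        return by_distance[0][1]
    weights = [1.0 / d ** power for d, _ in by_distance]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, by_distance))
    return weighted_sum / sum(weights)


# Made-up samples: (x, y, value).
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (10.0, 10.0, 50.0)]
print(idw_predict(samples, 1.0, 0.0, neighbors=2))  # halfway between the two nearest samples
print(idw_predict(samples, 0.5, 0.0, neighbors=2))  # closer to the first sample
```

Because every prediction is an average of sample values, the surface never goes above the highest or below the lowest sample, and clustered samples can dominate the result.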

Assumptions:
Data is not clustered.
Data is autocorrelated.

OTHER SPATIAL STATISTICAL TESTS

Tests for spatial autocorrelation
Getis-Ord General G and Global Moran’s I (to determine overall clustering and dispersion of values)
Hot Spot Analysis (Getis-Ord Gi*) and Anselin’s Local Moran’s I (to determine specific clusters of high and low values)
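
As an illustration of the global test, the sketch below computes Global Moran's I from a list of values and a spatial weights matrix. The weights here are simple made-up adjacency values (1 for neighbors, 0 otherwise), and the sketch stops at the index itself: it does not compute the z-score or p-value that a GIS tool would report.

```python
def morans_i(values, weights):
    """Global Moran's I for values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    weight_total = sum(sum(row) for row in weights)
    cross_products = sum(weights[i][j] * deviations[i] * deviations[j]
                         for i in range(n) for j in range(n))
    squared_deviations = sum(d * d for d in deviations)
    return (n / weight_total) * (cross_products / squared_deviations)


# Four locations in a row; each is a neighbor of the locations next to it.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], adjacency))  # similar values side by side: positive
print(morans_i([1, 4, 1, 4], adjacency))  # alternating values: negative
```

Positive values suggest clustering of similar values, negative values suggest dispersion, and values near zero suggest no spatial pattern.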


Regression

Used to evaluate relationships between two or more
feature attributes. For example: are location, crime rates, racial makeup, and income related to housing values in a census
tract?