PROC ARKMAP: A Low Cost Method of Presenting Geographical
Data in an Educational Environment
Dennis W. King, University of Arkansas
Richard M. Smith, University of Arkansas
INTRODUCTION
The ARKMAP procedure is a means of creating county-level choropleth
maps of the State of Arkansas (Figure 1). It is in current use
primarily by researchers, graduate students and classroom students in
the College of Agriculture for presenting geographical data. There are
several reasons why ARKMAP is an attractive procedure: (1) The maps can
be produced using limited equipment (a CRT and a line printer); (2) It
is easy to use because of its limited command set and parallels to
standard SAS procedure coding; (3) Few map design decisions are
required on the part of the user. The selection of class intervals
(levels) and print symbols is built in and is consistent with the most
current cartographic research.
[Figure 1. ARKMAP county-level choropleth map of the State of Arkansas,
with a legend giving the pattern and class limits of each class.]
ARKMAP is written in PL/I and runs in a 1 MB virtual machine under
VM/CMS SAS 82.3. ARKMAP will process both numeric variables and
character variables of length 8 or less.
Current cartographic research indicates that the average map reader is
unable to reliably differentiate more than 5 to 7 gray tone map
symbols. This perceptual limitation is even more pronounced in the case
of map patterns produced with a computer line printer. For this reason,
ARKMAP limits the user to a maximum of four classes on any single map.
Line printer map patterns have been selected to maximize gray tone
contrast between symbols and are assigned automatically.
In order to use ARKMAP, a SAS data set must be created for input. Each
observation in the data set must be identified by a number which
represents a map unit area. ARKMAP uses standard FIPS code numbers to
identify counties. The numeric variable whose values are FIPS codes is
named in the ID statement.
The VARIABLES statement provides a list of variables in the SAS data
set to be processed by ARKMAP. ARKMAP produces a separate set of
summary statistics and a separate map for each variable in the
VARIABLES statement. Additionally, for each numeric variable it
calculates a value by county: the mean if the MEANS option is used, or
the sum if the option is omitted.
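As a rough illustration of this county-level reduction, the sketch
below is written in Python rather than in the procedure's own PL/I.
The function name `aggregate_by_county` and the `(fips, value)` input
layout are assumptions made for the example, not part of ARKMAP.

```python
from collections import defaultdict


def aggregate_by_county(observations, use_means=False):
    """Collapse (fips, value) pairs to a single value per county.

    use_means=False returns the county sum; use_means=True returns the
    county mean, mirroring the presence of the MEANS option.
    (Hypothetical helper; ARKMAP's internal code is not shown in the paper.)
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for fips, value in observations:
        totals[fips] += value
        counts[fips] += 1
    if use_means:
        return {fips: totals[fips] / counts[fips] for fips in totals}
    return dict(totals)


# Two observations for county 1 and one for county 3.
obs = [(1, 10.0), (1, 30.0), (3, 5.0)]
county_sums = aggregate_by_county(obs)
county_means = aggregate_by_county(obs, use_means=True)
```

Whichever value results (sum or mean), it is that single number per
county that would be passed on to the class interval step.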
Although a user may not create a map with more than four classes, it is
possible to make a map with fewer than four classes. This is done by
selecting either the Quantile or the Range Method and indicating the
number of classes with the GROUP parameter. The GROUP parameter is
ignored for all other grouping methods, which create only four class
maps.
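The interaction between the grouping method and the GROUP parameter
can be summarized in a few lines of Python. This is only a sketch of
the rule as stated in the text; the function name `number_of_classes`
is hypothetical, and rejecting a GROUP value outside 1 to 4 is an
assumption, since the paper does not say how such a value is handled.

```python
MAX_CLASSES = 4                       # limit stated in the paper
GROUP_METHODS = {"QUANTILE", "RANGE"}  # the only methods that honor GROUP


def number_of_classes(classint="OPTIMIZE", group=None):
    """Return how many classes a map would have for the given settings."""
    method = classint.upper()
    if method in GROUP_METHODS and group is not None:
        # Assumption: an out-of-range GROUP value is treated as an error.
        if not 1 <= group <= MAX_CLASSES:
            raise ValueError("GROUP must be between 1 and 4")
        return group
    # GROUP omitted, or a method that ignores it: always four classes.
    return MAX_CLASSES
```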
After numeric variables are processed, ARKMAP establishes class
intervals for the map according to the algorithm specified in the
CLASSINT parameter. The mapmaker may either accept the recommended
default method for selecting class intervals or select one of the
alternative methods.
The default Optimization Method (OPTIMIZE) is an adaptation of Fisher's
algorithm, which groups data values for maximum homogeneity. The result
of optimization grouping is a set of choropleth map classes having
minimum variation within classes and maximum variation between classes.
From a practical standpoint, this means that each pattern which appears
on the choropleth map will represent a group of data values which are
as alike as possible. The default method need not be coded to produce
class intervals and should be used in all but the most unusual
circumstances.
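To make the idea of grouping for maximum homogeneity concrete, the
following Python sketch splits sorted values into contiguous classes so
that the total within-class sum of squared deviations is as small as
possible. It uses a plain dynamic programme in the spirit of Fisher's
method; it is not the ARKMAP PL/I code, and the name `optimal_classes`
and the squared-deviation criterion are assumptions for illustration.

```python
def optimal_classes(values, k=4):
    """Split sorted values into k contiguous classes with minimum total
    within-class sum of squared deviations from the class means."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")

    # Prefix sums let us compute the squared deviation of any slice quickly.
    s1, s2 = [0.0], [0.0]
    for x in data:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting data[:j] into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    cut[c][j] = i

    # Walk the stored cut points backwards to recover the classes.
    classes = []
    j = n
    for c in range(k, 0, -1):
        i = cut[c][j]
        classes.append(data[i:j])
        j = i
    classes.reverse()
    return classes


sample = [1, 2, 3, 10, 11, 12, 50, 51, 100]
groups = optimal_classes(sample, k=4)
```

For the sample above the four classes fall at the visible clusters of
values, which is the behavior the text describes: values sharing a
pattern are as alike as possible.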
Complete versatility of class interval selection is achieved through
the use of character variables. In the Data Step the user may create
character variable values which correspond to any set of class interval
values he or she might choose. ARKMAP then assigns a choropleth map
symbol to each unique variable value. For example, one may wish to
divide acreage data into three specific groups as indicated in the
following code:

   DATA TWO; SET ONE;
   IF ACRE LT 100 THEN ACREAGE='LT 100  ';
   IF 100 LE ACRE LE 5220
      THEN ACREAGE='100-5220';
   IF ACRE GT 5220 THEN ACREAGE='GT 5220 ';
One may also wish simply to assign a label of some sort to a map unit
area:

   DATA TWO; SET ONE;
   IF FIPS LT 39 THEN REGION='NORTHERN';
   IF 39 LE FIPS LE 79 THEN REGION='SOUTHERN';
   IF FIPS GT 79 THEN REGION='WESTERN';

The Natural Breaks Method of grouping (NATBRKS) arranges the data in
ascending order, locates the three largest intervals between data
values, and then assigns class boundaries at these locations. The
Quantile Method (QUANTILE) assigns an equal number of data values to
each class interval. The Equal Interval Method (RANGE) determines class
intervals by sorting the data in ascending order, finding the range
between the smallest and largest values, and dividing the range into
equal intervals. The Standard Deviation Method (STDDEV) first
calculates the mean and standard deviation of the data set to be
grouped. Class boundaries are then established at a distance of one or
two standard deviations on either side of the mean.
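The Natural Breaks, Quantile, Equal Interval and Standard Deviation
methods are simple enough to sketch in a few lines each. The Python
below is an illustration only, not the ARKMAP implementation. All
function names are hypothetical. For the Standard Deviation method the
paper does not list the exact boundaries, so placing them at the mean
and at a chosen number of population standard deviations on either
side of it is an assumption.

```python
import statistics


def natural_breaks(values, classes=4):
    """Split sorted data at the (classes - 1) largest gaps between values."""
    data = sorted(values)
    # Index i stands for the gap between data[i] and data[i + 1].
    by_gap = sorted(range(len(data) - 1),
                    key=lambda i: data[i + 1] - data[i], reverse=True)
    cuts = sorted(i + 1 for i in by_gap[:classes - 1])
    edges = [0] + cuts + [len(data)]
    return [data[a:b] for a, b in zip(edges, edges[1:])]


def quantile_classes(values, classes=4):
    """Assign (as nearly as possible) equal counts of values to each class."""
    data = sorted(values)
    n = len(data)
    edges = [i * n // classes for i in range(classes + 1)]
    return [data[a:b] for a, b in zip(edges, edges[1:])]


def equal_interval_limits(values, classes=4):
    """Divide the range from smallest to largest value into equal parts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / classes
    return [lo + i * width for i in range(classes + 1)]


def std_dev_limits(values, distance=1):
    """Boundaries at the mean and `distance` standard deviations either side.

    Assumption: population standard deviation, three boundaries (four classes).
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [mean - distance * sd, mean, mean + distance * sd]


sample = [1, 2, 3, 10, 11, 12, 50, 51, 100]
nat = natural_breaks(sample)
qua = quantile_classes([1, 2, 3, 4, 5, 6, 7, 8])
rng = equal_interval_limits([0, 40, 100])
std = std_dev_limits([2, 4, 4, 4, 5, 5, 7, 9])
```

The first two functions return the grouped values, while the last two
return class limits, which is the more natural form for methods that
depend only on the range or on the mean and standard deviation.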
ARKMAP will produce up to a four class map, depending on the values of
the character variables. When a character variable is mapped, there
should be only one observation per county in the input data set. In the
case of multiple observations, only the last observation for each
county will be used. Also, a maximum of four classes may be defined by
the character variable values; further classes will be ignored. The
four class limit has been established because of map design
considerations. Counties for which there are no observations in the
data set are left blank.
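The rules for character variables (last observation per county wins, at
most four classes, unobserved counties left blank) can be expressed as
a short Python sketch. The name `assign_classes` is hypothetical. The
paper does not say which classes are kept when more than four appear,
so keeping the first four encountered, and leaving counties in the
ignored classes blank, are both assumptions.

```python
MAX_CLASSES = 4  # limit stated in the paper


def assign_classes(observations, all_counties):
    """Map each county to its class label, or None for a blank county.

    observations: iterable of (fips, label) pairs in data set order.
    all_counties: every FIPS code that appears on the map.
    """
    last_label = {}
    for fips, label in observations:
        last_label[fips] = label  # a later observation replaces an earlier one

    # Assumption: the first four distinct labels encountered are the ones kept.
    kept = []
    for label in last_label.values():
        if label not in kept and len(kept) < MAX_CLASSES:
            kept.append(label)

    result = {}
    for fips in all_counties:
        label = last_label.get(fips)
        result[fips] = label if label in kept else None
    return result


obs = [(1, "NORTHERN"), (41, "SOUTHERN"), (81, "WESTERN"), (1, "SOUTHERN")]
classes_by_county = assign_classes(obs, all_counties=[1, 41, 81, 83])
```

Here county 1 appears twice, so its second label is the one used, and
county 83 has no observation and therefore stays blank.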
The PROC ARKMAP statement

   PROC ARKMAP options-and-parameters;

The options and parameters that can appear are as follows:

DATA=data-set-name

The DATA parameter tells ARKMAP which SAS data set is to be analyzed.
If it is omitted, the last data set created will be used.

CLASSINT=OPTIMIZE
         NATBRKS
         QUANTILE
         RANGE
         STDDEV

The CLASSINT parameter tells ARKMAP which algorithm will be used to
generate the class intervals for the map. OPTIMIZE requests Fisher's
algorithm. NATBRKS requests that intervals be located at the largest
gaps in the data array. QUANTILE requests that the data be divided
equally among classes. RANGE requests that intervals be set by dividing
the range into equal parts. STDDEV requests that interval selection be
based on the mean and standard deviation of the data set. OPTIMIZE is
the default method. All of these methods are discussed in the
introduction.

GROUP=no. of intervals

The GROUP parameter is a means of specifying the number of groups
(classes) into which ARKMAP will divide the data set. GROUP is valid
only for the RANGE and QUANTILE methods as specified in the CLASSINT
parameter; it is ignored for all other methods. Omission of this
parameter will cause ARKMAP to create a map with four class intervals.

MEANS

The MEANS option indicates that the mean of the response data will be
calculated for each county. The means are then used for interval
generation. The MEANS option is ignored for character data.

PROCEDURE INFORMATION STATEMENTS

The BY statement

If a BY statement is used, the data set must be sorted by the variables
in the BY statement. ARKMAP will produce a new map and summary table
for each value of the BY statement variables.

The ID statement

The ID statement indicates the numeric variable in the data set which
identifies the county to which the observation is to be attributed. The
value of the ID variable must be a valid FIPS code for Arkansas. The ID
statement must be present for the procedure to run successfully.

The VARIABLES statement

The VARIABLES statement identifies the variables in the data set to be
processed by ARKMAP. A separate map will be created for each variable
in the VARIABLES statement. If the statement is omitted, ARKMAP
produces a map for each variable not listed in an ID or BY statement.
The VARIABLES statement may contain a mix of character and numeric
variables.

TREATMENT OF MISSING VALUES

If a value of the ID variable or the variable being processed is
missing, that observation is ignored in the calculations.

REFERENCES

Fisher, W.D. "On Grouping for Maximum Homogeneity", Journal of the
American Statistical Association. Vol. 53, pp. 789-798, 1958.

Hartigan, J.A. Clustering Algorithms. New York: John Wiley & Sons,
1975.

IBM CMS Command and Macro Reference. San Jose, California: IBM Corp.,
SC19-6209-1.

Jenks, G.F. Optimal Data Classification for Choropleth Maps. Lawrence,
Kansas: Occasional Paper No. 2, Department of Geography, University of
Kansas, 1977.

Jenks, G.F. and Caspall, F.C. "Error on Choroplethic Maps: Definition,
Measurement, Reduction", Annals of the Association of American
Geographers. Vol. 61, No. 2, pp. 177-184, 1976.

... Language Reference Manual. San Jose, California: IBM Corp.,
GC26-3977-0.

SAS Programmer's Guide. Cary, N.C.: SAS Institute, Inc., 1981.

Smith, R.M. "Improved Areal Symbols for Computer Line-printed Maps",
The American Cartographer. Vol. 7, No. 1, pp. 51-57.