MIDLANDS STATE UNIVERSITY
DEPARTMENT OF SURVEYING AND GEOMATICS
SVG210 GEOGRAPHIC INFORMATION SYSTEMS I
LECTURE NOTES
D NJIKE
FEBRUARY 2010
SVG210 GIS I NOTES
MODULE OUTLINE
1. INTRODUCTION
i. Definition of GIS
ii. Contributing Disciplines
iii. Areas of Applications
iv. History of GIS
2. FUNDAMENTALS OF GIS
i. GIS Components
ii. GIS Subsystems
iii. Characteristics of geographic features
3. GIS AS MODELS OF REALITY
i. Spatial objects
ii. Relationships of spatial objects
iii. Representation of spatial objects
4. SPATIAL DATA MODELS
i. Vector data model
ii. Raster data model
iii. Capabilities of Raster and Vector GIS
iv. The Raster/Vector data Debate
5. GIS DATA CAPTURE TECHNIQUES
i. Sources of data
ii. Modes of spatial data entry
iii. Vectorisation
iv. Rasterisation
6. FUTURE OF GIS
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
Definition of GIS
Several working definitions of GIS have been
proposed.
All definitions focus on:
• Data
• Users
• Hardware
• Software
• Methods/purpose
Let's look at what the words themselves mean.
a) Geography
The science which has for its object the description of the earth’s surface, treating of its
form and physical features, its natural and political divisions, the climate, productions,
populations etc. of the various countries.
It is frequently divided into mathematical, physical and political geography.
b) Data
The collected raw facts about data objects, the fundamental indivisible objects in an
application.
c) Information
The derived relationships between data sets/items; the implicit functional associations
between data in an application.
d) System
An organized or connected group of objects.
A set or assemblage of things connected, associated or interdependent, so as to form
a complex unity.
A whole composed of parts in orderly arrangement according to some scheme or plan.
A group of connected entities and activities which interact for a common purpose.
– e.g. a car is a system in which all the components operate together to provide
transportation
• an Information System is a set of processes, executed on raw data, to produce
information which will be useful in decision-making
– a chain of steps leads from observation and collection of data through analysis
– an information system must have a full range of functions to achieve its purpose, including
observation, measurement, description, explanation, forecasting, decision-making
• a Geographic Information System uses geographically referenced data as well as
non-spatial data and includes operations which support spatial analysis
– in GIS, the common purpose is decision-making, for managing use of land, resources,
transportation, retailing, oceans or any spatially distributed entities
– the connection between the elements of the system is geography, e.g. location, proximity,
spatial distribution
• in this context GIS can be seen as a system of hardware, software and procedures
designed to support the capture, management, manipulation, analysis, modeling
and display of spatially-referenced data for solving complex planning and
management problems
– although many other computer programs can use spatial data (e.g. AutoCAD
and statistics packages), GISs include the additional ability to perform spatial
operations
A Geographic Information System is a complex arrangement of associated or connected
things or objects, whose purpose is to communicate knowledge about features on
the surface of the earth. We can expand this definition to include features
above and below the surface of the earth.
Some other definitions
• An information system that is designed to work with data referenced by spatial or
geographic coordinates. In other words, a GIS is both a system with specific
working capabilities for spatially-referenced data, as well as a set of operations for
working [analysis] with the data. (Star and Estes, 1990)
• A working GIS integrates five key components: hardware, software, data, people
and methods (ESRI 1997)
• A system for capturing, storing, checking, integrating, manipulating, analyzing and
displaying data which are spatially referenced to the earth.
• Automated systems for the capture, storage, retrieval, analysis and display of
spatial data. (Clarke 1990)
• A system of hardware, software, and procedures designed to support the capture,
management, manipulation, analysis, modeling and display of spatially-referenced
data for solving complex planning and management problems (NCGIA lecture by
David Cowen 1989)
• An integrated package for the input, storage, analysis, and output of spatial
information ... analysis being the most significant (Gaile and Willmott 1989)
• GIS are simultaneously the telescope, the microscope, the computer, and the Xerox
machine of regional analysis and synthesis of spatial data (Abler, 1988)
OTHER RELATED TERMS
CAD – Computer Aided Design
– Non spatial data
– Cannot perform analysis
AUTOCAD – Automated CAD
– Mapping software
– Can handle spatial data
– Cannot perform analysis
CAM – Computer Aided Mapping
– Store maps in automated form
AM/FM – Automated Mapping/Facility Management
– Specialized systems to manage facilities like water pipes, electric lines, telephone etc.
CADASTRE – multi-purpose cadastres
– manage information on land parcels
Many of these packages can now perform many of the same operations as a GIS.
What distinguishes GIS from these is:
– the ability to integrate geo-referenced data
– ability to perform analysis like buffering
– ability to create topology
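Buffering, named above as a typical GIS analysis, can be illustrated with a minimal sketch. It assumes planar coordinates in metres and a linear feature stored as a list of vertices; the function names `point_segment_distance` and `within_buffer` are hypothetical and do not come from any particular GIS package.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both ends coincide.
        return math.hypot(px - ax, py - ay)
    # Parameter t locates the closest point on the segment (clamped to its ends).
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(p, polyline, distance):
    """True if p lies inside the buffer of the given width around the polyline."""
    return any(
        point_segment_distance(p, a, b) <= distance
        for a, b in zip(polyline, polyline[1:])
    )

# Hypothetical river centre line and two sample sites (metres).
river = [(0, 0), (100, 0), (100, 100)]
print(within_buffer((50, 30), river, 50))  # site 30 m from the river
print(within_buffer((40, 60), river, 50))  # site 60 m from the river
```

A production GIS would build the buffer as a polygon and overlay it on other layers; the distance test here is only the simplest equivalent query.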
Why is GIS important?
• "GIS technology is to geographical analysis what the microscope, the
telescope, and computers have been to other sciences.... (It) could
therefore be the catalyst needed to dissolve the regional-systematic and
human-physical dichotomies that have long plagued geography" and
other disciplines which use spatial information.
• GIS integrates spatial and other kinds of information within a single system
- it offers a consistent framework for analyzing geographical data
• by putting maps and other kinds of spatial information into digital form,
GIS allows us to manipulate and display geographical knowledge in new
and exciting ways
• GIS makes connections between activities based on geographic proximity
– looking at data geographically can often suggest new insights, explanations
– these connections are often unrecognized without GIS, but can be vital to
understanding and managing activities and resources
– e.g. we can link toxic waste records with school locations through geographic
proximity
• GIS allows access to administrative records - property ownership, tax files,
utility cables and pipes - via their geographical positions
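The idea of linking two otherwise unrelated record sets through geographic proximity can be sketched as follows. This is an illustration only: it assumes both record sets carry planar x,y coordinates in the same units, and the function name `link_by_proximity` and all sample records are hypothetical.

```python
import math

def link_by_proximity(schools, waste_sites, max_distance):
    """For each school, list the waste sites lying within max_distance of it."""
    links = {}
    for school, (sx, sy) in schools.items():
        nearby = [
            site
            for site, (wx, wy) in waste_sites.items()
            if math.hypot(sx - wx, sy - wy) <= max_distance
        ]
        links[school] = sorted(nearby)
    return links

# Hypothetical records: the only thing the two tables share is location.
schools = {"School A": (0, 0), "School B": (1000, 1000)}
waste_sites = {"Site W1": (300, 400), "Site W2": (5000, 5000)}
print(link_by_proximity(schools, waste_sites, 500))
```

Neither table contains a key that refers to the other; the join exists only because both are geographically referenced.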
1.2 CONTRIBUTING DISCIPLINES AND TECHNOLOGIES
• GIS is a convergence of technological fields and traditional disciplines
• GIS has been called an "enabling technology" because of the potential it offers for the wide variety
of disciplines which must deal with spatial data
• each related field provides some of the techniques which make up GIS
– many of these related fields emphasize data collection - GIS brings them together by emphasizing
integration, modeling and analysis
• as the integrating field, GIS often claims to be the science of spatial information
Geography
• broadly concerned with understanding the world and man's place in it
• long tradition in spatial analysis
• provides techniques for conducting spatial analysis and a spatial perspective on research
Cartography
• concerned with the display of spatial information
• currently the main source of input data for GIS is maps
• provides long tradition in the design of maps which is an important form of output from GIS
• computer cartography (also called "digital cartography", "automated cartography") provides
methods for digital representation and manipulation of cartographic features and methods of
visualization
Remote Sensing
• images from space and the air are a major source of geographical data
• remote sensing includes techniques for data acquisition and processing anywhere on the globe at
low cost, with consistent update potential
• many image analysis systems contain sophisticated analytical functions
• interpreted data from a remote sensing system can be merged with other data layers in a GIS
Photogrammetry
• using aerial photographs and techniques for making accurate measurements from them,
photogrammetry is the source of most data on topography (ground surface elevations) used for
input to GIS
Surveying
• provides high quality data on positions of land boundaries, buildings, etc.
Geodesy
• source of high accuracy positional control for GIS
Statistics
• many models built using GIS are statistical in nature, many statistical techniques used for analysis
• statistics is important in understanding issues of error and uncertainty in GIS data
Operations Research
• many applications of GIS require use of optimizing techniques for decision-making
Computer Science
• computer-aided design (CAD) provides software, techniques for data input, display
and visualization, representation, particularly in 3 dimensions
• advances in computer graphics provide hardware, software for handling and
displaying graphic objects, techniques of visualization
• database management systems (DBMS) contribute methods for representing data
in digital form, procedures for system design and handling large volumes of data,
particularly access and update
• artificial intelligence (AI) uses the computer to make choices based on available
data in a way that is seen to emulate human intelligence and decision-making
– computer can act as an "expert" in such functions as designing maps, generalizing
map features
– although GIS has yet to take full advantage of AI, AI already provides methods and techniques
for system design
Mathematics
• several branches of mathematics, especially geometry and graph theory, are used
in GIS system design and analysis of spatial data
Civil Engineering
• GIS has many applications in transportation, urban engineering
1.3 MAJOR AREAS OF PRACTICAL APPLICATION
Street network-based
• address matching - finding locations given street addresses
• vehicle routing and scheduling
• location analysis, site selection
• development of evacuation plans
Natural resource-based
• management of wild and scenic rivers, recreation resources, floodplains,
wetlands, agricultural lands, aquifers, forests, wildlife
• Environmental Impact Analysis (EIA)
• viewshed analysis
• hazardous or toxic facility siting
• groundwater modeling and contamination tracking
• wildlife habitat analysis, migration routes planning
Land parcel-based
• zoning, subdivision plan review
• land acquisition
• environmental impact statements
• water quality management
• maintenance of ownership
Facilities management
• locating underground pipes, cables
• balancing loads in electrical networks
• planning facility maintenance
• tracking energy use
1.4 HISTORY OF GIS
A. INTRODUCTION
• GIS has evolved out of a long tradition of map making
• development of GIS was influenced by:
– key groups, companies and individuals
– timely development of key concepts
• outside North America, significant developments occurred at the Experimental
Cartography Unit in the UK
– history of this group has been documented by Rhind (1988)
B. HISTORIC USE OF MULTIPLE THEME MAPS
• idea of portraying different layers of data on a series of base maps, and relating
things geographically, has been around much longer than computers
– maps of the Battle of Yorktown (American Revolution) drawn by the French Cartographer
Louis-Alexandre Berthier contained hinged overlays to show troop movements
– the mid-19th Century "Atlas to Accompany the Second report of the Irish Railway
Commissioners" showed population, traffic flow, geology and topography superimposed on
the same base map
– Dr. John Snow used a map showing the locations of death by cholera in central London in
September, 1854 to track the source of the outbreak to a contaminated well - an early
example of geographical analysis
C. EARLY COMPUTER ERA
• several factors caused a change in cartographic analysis:
– computer technology - improvements in hardware, esp. graphics
– development of theories of spatial processes in economic and social
geography, anthropology, regional science
– increasing social awareness, education levels and mobility, awareness of
environmental problems
• integrated transportation plans of 1950s and 60s in Detroit, Chicago
– required integration of transportation information - routes, destinations,
origins, time
– produced maps of traffic flow and volume
• University of Washington, Department of Geography, research on
advanced statistical methods, rudimentary computer programming,
computer cartography, most active 1958-61:
– Nystuen - fundamental spatial concepts - distance, orientation, connectivity
– Tobler - computer algorithms for map projections, computer cartography
– Bunge - theoretical geography - geometric basis for geography - points, lines
and areas
– Berry's Geographical Matrix of places by characteristics (attributes) - regional
studies by overlaying maps of different themes - systematic studies by detailed
evaluation of a single layer
D. CANADA GEOGRAPHIC INFORMATION SYSTEM (CGIS)
• Canada Geographic Information System is an example of one of the earliest GISs developed, started
in the mid 1960s
• is a large scale system still operating today
• its development provided many conceptual and technical contributions
Purpose
• to analyze the data collected by the Canada Land Inventory (CLI) and to produce statistics to be
used in developing land management plans for large areas of rural Canada
• the CLI created maps which:
– classify land using various themes: soil capability for agriculture, recreation capability,
capability for wildlife (ungulates), capability for wildlife (waterfowl), forestry capability, present
land use, shoreline
– were developed at map scales of 1:50,000
– use a simple rating scheme, 1 (best) to 7 (poorest), with detailed qualification codes, e.g. on
soils may indicate bedrock, shallow soil, alkaline conditions
• product of CLI was 7 primary map layers, each showing area objects with homogeneous attributes
– other map layers were developed subsequently, e.g. census reporting zones
• perception was that computers could perform analyses once the data had been input
Technological innovations
• CGIS required the development of new technology
– no previous experience in how to structure data internally
– no precedent for GIS operations of overlay, area measurement
– experimental scanner had to be built for map input
• very high costs of technical development
– cost-benefit studies done to justify the project were initially convincing
– major cost over-runs
– analysis behind schedule
• by 1970 project was in trouble
– failure to deliver promised tabulations, capabilities
• completion of database, product generation under way by mid 1970s
– main product was statistical summaries of the area with various combinations
of themes
– later enhancement allowed output of simple maps
• CGIS still highly regarded in late 1970s, early 1980s as center of
technological excellence despite aging of database
– attempts were made to adapt the system to new data
– new functionality added, especially networking capability and remote access
– however, this was too late to compete with the new vendor products of 1980s
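The kind of product described above, statistical summaries of area for combinations of themes, can be sketched in a few lines. This is a simplification under stated assumptions: CGIS overlaid polygon layers, whereas the sketch uses two small grids of equal size with a fixed cell area. The function name `area_by_theme_combination` and the sample values are hypothetical.

```python
from collections import Counter

def area_by_theme_combination(layer_a, layer_b, cell_area):
    """Overlay two equally sized grids and total the area of each value pair."""
    if len(layer_a) != len(layer_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(layer_a, layer_b)
    ):
        raise ValueError("layers must have the same number of rows and columns")
    counts = Counter()
    for row_a, row_b in zip(layer_a, layer_b):
        # Each cell contributes one (theme A value, theme B value) pair.
        counts.update(zip(row_a, row_b))
    return {combo: n * cell_area for combo, n in counts.items()}

# Hypothetical layers: agricultural capability class (1 = best, 7 = poorest)
# and present land use, on a 2 x 3 grid of 25-hectare cells.
capability = [[1, 1, 3],
              [2, 3, 3]]
land_use = [["crop", "crop", "forest"],
            ["crop", "forest", "forest"]]
for combo, hectares in sorted(area_by_theme_combination(capability, land_use, 25).items()):
    print(combo, hectares)
```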
Key innovative ideas in CGIS
• use of scanning for input of high density area objects
– maps had to be redrafted (scribed) for scanning
– note: scribing is as labor intensive as digitizing
• vectorization of scanned images
• geographical partitioning of data into "map sheets" or "tiles" but with edge matching across
tile boundaries
• partitioning of data into themes or layers
• use of absolute system of coordinates for entire country with precision adjustable to
resolution of data
– number of digits of precision can be set by the system manager and changed from layer
to layer
• internal representation of line objects as chains of incremental moves in 8 compass
directions rather than straight lines between points (Freeman chain code)
• coding of area object boundaries by arc, with pointers to left and right area objects
– first "topological" system with planar enforcement in each layer, relationships between
arcs and areas coded in the database
• separation of data into attribute and locational files
– "descriptor dataset" (DDS) and "image dataset" (IDS)
– concept of an attribute table
• implementation of functions for polygon overlay, measurement of area, user-defined circles
and polygons for query
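The chain-code representation of lines mentioned above can be sketched as follows. The direction numbering used here (0 = east, increasing counter-clockwise, with y increasing northwards) is one common convention and is an assumption; the notes do not state which numbering CGIS used. The names `encode_chain` and `decode_chain` are hypothetical.

```python
# Unit moves for codes 0..7: E, NE, N, NW, W, SW, S, SE (assumed numbering).
MOVES = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
CODE_OF_MOVE = {move: code for code, move in enumerate(MOVES)}

def encode_chain(cells):
    """Turn a non-empty list of adjacent grid cells into (start cell, list of codes)."""
    codes = []
    for (x0, y0), (x1, y1) in zip(cells, cells[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in CODE_OF_MOVE:
            raise ValueError(f"cells {(x0, y0)} and {(x1, y1)} are not neighbours")
        codes.append(CODE_OF_MOVE[step])
    return cells[0], codes

def decode_chain(start, codes):
    """Rebuild the list of grid cells from a start cell and its chain of codes."""
    x, y = start
    cells = [(x, y)]
    for code in codes:
        dx, dy = MOVES[code]
        x, y = x + dx, y + dy
        cells.append((x, y))
    return cells

line = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 3)]
start, codes = encode_chain(line)
print(start, codes)
print(decode_chain(start, codes) == line)
```

Each move needs only three bits, which is why a chain of incremental moves is more compact than a list of full coordinate pairs for dense scanned linework.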
Key individual
• Roger Tomlinson, now with Tomlinson Associates, Ottawa
E. HARVARD LABORATORY
• full name - Harvard Laboratory For Computer Graphics And Spatial Analysis
• Howard Fisher, moved from Chicago to establish a lab at Harvard, initially to
develop general-purpose mapping software - mid 1960s
• Harvard Lab for Computer Graphics and Spatial Analysis had major influence on
the development of GIS until early 1980s, still continues at smaller scale
• Harvard software was widely distributed and helped to build the application base
for GIS
• many pioneers of newer GIS "grew up" at the Harvard lab
The Harvard packages
• SYMAP
– developed as general-purpose mapping package beginning in 1964
– output exclusively on line printer
    • poor resolution, low quality
– limited functionality but simple to use
    • a way for the non-cartographer to make maps
– first real demonstration of ability of computers to make maps
– sparked enormous interest in a previously unheard-of technology
• CALFORM (late 1960s)
– SYMAP on a plotter
– user avoided double-coding of internal boundaries by inputting a table of point locations, plus
a set of polygons defined by sequences of point IDs
– more cosmetic than SYMAP - North arrows, better legends
• SYMVU (late 1960s)
– 3D perspective views of SYMAP output
– first new form of display of spatial data to come out of a computer
• GRID (late 1960s)
– raster cells could be displayed using the same output techniques as SYMAP
– later developed to allow multiple input layers of raster cells, beginnings of raster GIS
– used to implement the ideas of overlay from landscape architecture and McHarg
• POLYVRT (early 1970s)
– converted between various alternative ways of forming area objects:
    • SYMAP - every polygon separately, internal boundaries twice
    • CALFORM - table of point locations plus lists of IDs
    • DIME - see below
– motivated by need of computer mapping packages for flexible input, transfer of boundary files
between systems, growing supply of data in digital form, e.g. from Bureau of the Census
• ODYSSEY (mid 1970s)
– extended POLYVRT idea beyond format conversion to a comprehensive analysis package based
on vector data
– first robust, efficient algorithm for polygon overlay - included sliver removal
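The difference between the SYMAP and CALFORM ways of forming area objects, and the kind of conversion POLYVRT performed between them, can be sketched as below. This is not the POLYVRT algorithm itself: it is a minimal illustration that assumes shared vertices have exactly identical coordinates, and the function names `to_point_table` and `from_point_table` are hypothetical.

```python
def to_point_table(polygons):
    """SYMAP-style polygons (full coordinates each) -> point table + ID lists."""
    point_ids = {}   # coordinate pair -> index in the point table
    points = []      # the shared point table
    rings = {}       # polygon name -> sequence of point IDs
    for name, ring in polygons.items():
        ids = []
        for xy in ring:
            if xy not in point_ids:
                point_ids[xy] = len(points)
                points.append(xy)
            ids.append(point_ids[xy])
        rings[name] = ids
    return points, rings

def from_point_table(points, rings):
    """Point table + ID lists -> polygons with full coordinates."""
    return {name: [points[i] for i in ids] for name, ids in rings.items()}

# Two hypothetical unit squares sharing the edge from (1, 0) to (1, 1).
symap_style = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
points, rings = to_point_table(symap_style)
print(len(points), rings)
print(from_point_table(points, rings) == symap_style)
```

In the first form the two shared vertices are stored twice; in the second each location is stored once and referenced by ID, which is how the user avoided double-coding internal boundaries.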
Key individuals
• Howard Fisher - initiated Lab, development of SYMAP
• William Warntz - succeeded Fisher as Director until 1971, developed techniques, theories of spatial
analysis based on computer handling of spatial data
• Scott Morehouse - move to ESRI was key link between ODYSSEY and the development of ARC/INFO
• see Chrisman (1988) for additional information on the Lab and its key personnel
F. BUREAU OF THE CENSUS
• need for a method of assigning census returns to correct geographical location
– address matching to convert street addresses to geographic coordinates and census reporting zones
– with geographic coordinates, data could be aggregated to user-specified custom reporting zones
• need for a comprehensive approach to census geography
– reporting zones are hierarchically related
– e.g. enumeration districts nest within census tracts
• 1970 was the first geocoded census
• DIME files were the major component of the geocoding approach
DIME files
• precursor to TIGER, urban areas only
• coded street segments between intersections using
– IDs of right and left blocks
– IDs of from and to nodes (intersections)
– x,y coordinates
– address ranges on each side
• this is essentially the arc structure of CGIS and the internal structure (common denominator format) of POLYVRT
• DIME files were very widely distributed and used as the basis for numerous applications
• topological ideas of DIME were refined into TIGER model
– planar enforcement
– 0-, 1- and 2-cell terminology
• DIME, TIGER were influential in stimulating development work on products which rely on street network
databases
– automobile navigation systems
– driver guides to generate text driving instructions (e.g. auto rental agencies)
– garbage truck routing
– emergency vehicle dispatching
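A street segment record of the kind listed under DIME files, and the way such a record supports address matching, can be sketched as follows. The field names, the odd/even split between the two sides of the street, the assumption that house numbers increase from the from-node to the to-node, and the linear interpolation are all illustrative assumptions; they are not the actual DIME file layout.

```python
from dataclasses import dataclass

@dataclass
class StreetSegment:
    street: str
    from_node: int      # ID of the intersection where the segment starts
    to_node: int        # ID of the intersection where the segment ends
    from_xy: tuple      # x,y coordinates of the from-node
    to_xy: tuple        # x,y coordinates of the to-node
    left_block: str     # ID of the block on the left side
    right_block: str    # ID of the block on the right side
    left_range: tuple   # (low, high) house numbers on the left side
    right_range: tuple  # (low, high) house numbers on the right side

def match_address(segments, street, number):
    """Return ((x, y), block ID) for a house number, or None if no segment matches."""
    for seg in segments:
        if seg.street != street:
            continue
        sides = ((seg.left_block, seg.left_range), (seg.right_block, seg.right_range))
        for block, (low, high) in sides:
            # Assumed convention: one side carries odd numbers, the other even.
            if low <= number <= high and number % 2 == low % 2:
                t = 0.5 if high == low else (number - low) / (high - low)
                x = seg.from_xy[0] + t * (seg.to_xy[0] - seg.from_xy[0])
                y = seg.from_xy[1] + t * (seg.to_xy[1] - seg.from_xy[1])
                return (x, y), block
    return None

# Hypothetical segment of a hypothetical street.
segment = StreetSegment("Main St", 1, 2, (0.0, 0.0), (100.0, 0.0),
                        "B101", "B102", (1, 21), (2, 22))
print(match_address([segment], "Main St", 11))
print(match_address([segment], "Main St", 30))
```

The same record gives both a coordinate for the address and the reporting zone (block) it falls in, which are the two results address matching was needed for.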
Urban atlases
• beginning with the 1970 census
• production of "atlases" of computer-generated maps for selected census variables for
selected cities
• demonstrated the value of simple computer maps for marketing, retailing applications
– stimulated development of current range of PC-based statistical mapping packages
• based on use of digital boundary files produced by the Bureau
G. ESRI
• Jack Dangermond founded Environmental Systems Research Institute in 1969 based on
techniques, ideas being developed at Harvard Lab and elsewhere
• 1970s period of slow growth based on various raster and vector systems
• early 1980s release of ARC/INFO
– successful implementation of CGIS idea of separate attribute and locational information
– successful marriage of standard relational database management system (INFO) to
handle attribute tables with specialized software to handle objects stored as arcs (ARC) -
a basic design which has been copied in many other systems
– "toolbox", command-driven, product-oriented user interface
    • modular design allowed elaborate applications to be built on top of toolbox
• ARC/INFO was the first GIS to take advantage of new super-mini hardware
– GIS could now be supported by a platform which was affordable to many resource
management agencies
– emphasis on independence from specific platforms, operating systems
• initial successes in forestry applications, later diversification to many GIS markets
– expansion to $40 million company by 1988
CHAPTER 2
FUNDAMENTALS OF GIS
2.1 COMPONENTS OF GIS
A GIS can be divided into five components:
People, Data, Hardware, Software, and
Procedures. All of these components need to
be in balance for the system to be successful.
No one part can run without the others.
[Figure: Components of GIS]
People
People are the component that actually makes the GIS work. They include a
plethora of positions including GIS managers, database administrators, application
specialists, systems analysts, and programmers. They are responsible for
maintenance of the geographic database and provide technical support. People
also need to be educated to make decisions on what type of system to use. People
associated with a GIS can be categorized into: viewers, general users, and GIS
specialists.
– Viewers are the public at large whose only need is to browse a geographic
database for referential material. These constitute the largest class of users.
– General Users are people who use GIS to conduct business, perform
professional services, and make decisions. They include facility managers,
resource managers, planners, scientists, engineers, lawyers, business
entrepreneurs, etc.
– GIS specialists are the people who make the GIS work. They include GIS
managers, database administrators, application specialists, systems analysts,
and programmers. They are responsible for the maintenance of the
geographic database and the provision of technical support to the other two
classes of users. (Lo, 2002)
Hardware
Hardware consists of the technical equipment needed to run a GIS including a computer system with
enough power to run the software, enough memory to store large amounts of data, and input and
output devices such as scanners, digitizers, GPS data loggers, media disks, and printers. (Carver,
1998)
Software
There are many different GIS software packages available today. All packages must be capable of data
input, storage, management, transformation, analysis, and output, but the appearance, methods,
resources, and ease of use of the various systems may be very different. Today’s software packages
are capable of allowing both graphical and descriptive data to be stored in a single database, known
as the object-relational model. Before this innovation, the geo-relational model was used. In this
model, graphical and descriptive data sets were handled separately. The modern packages usually
come with a set of tools that can be customized to the user's needs (Lo, 2002).
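The contrast between the two storage models can be sketched with plain Python dictionaries. This is a conceptual illustration only, not how any particular package stores its data; all names and sample values are hypothetical.

```python
# Geo-relational model: graphical and descriptive data are held in separate
# stores and linked through a shared feature ID.
geometry_file = {101: [(0, 0), (4, 0), (4, 3), (0, 3)]}
attribute_table = {101: {"owner": "Owner A", "land_use": "residential"}}

def fetch_georelational(feature_id):
    """Assemble a feature by joining the two stores on the feature ID."""
    return {"geometry": geometry_file[feature_id], **attribute_table[feature_id]}

# Object-relational model: one table whose rows hold the geometry alongside
# the descriptive attributes.
parcels = {
    101: {
        "geometry": [(0, 0), (4, 0), (4, 3), (0, 3)],
        "owner": "Owner A",
        "land_use": "residential",
    }
}

def fetch_object_relational(feature_id):
    """Read the complete feature from a single store."""
    return parcels[feature_id]

print(fetch_georelational(101) == fetch_object_relational(101))
```

Both calls return the same feature; the difference lies in whether the system has to keep two stores consistent and join them on every request.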
Applications
The application of geographic information to real-world problem solving is the crux of any GIS. Whether the
application is simple data tracking and storage or complex multidimensional analysis, a GIS should
be designed with the potential applications in mind. This technology can be an important tool for
coastal resource managers. Applications could include water quality monitoring, mapping of coral
reefs or oyster beds, shoreline monitoring, land use planning, and hazard mitigation.
Data
Perhaps the most time consuming and costly aspect of initiating a GIS is
creating a database. There are several things to consider before acquiring
geographic data. It is crucial to check the quality of the data before
obtaining it. Errors in the data set can add many unpleasant and costly
hours to implementing a GIS and the results and conclusions of the GIS
analysis most likely will be wrong. Several guidelines to look at include:
• Lineage – This is a description of the source material from which the data
were derived, and the methods of derivation, including all transformations
involved in producing the final digital files. This should include all dates of
the source material and updates and changes made to it. (Guptill, 1995)
• Positional Accuracy – This is the closeness of an entity in an appropriate
coordinate system to that entity’s true position in the system. The
positional accuracy includes measures of the horizontal and vertical
accuracy of the features in the data set. (Guptill, 1995)
• Attribute Accuracy – An attribute is a fact about some location, set of
locations, or features on the surface of the earth. This information often
includes measurements of some sort, such as temperature or elevation or
a label of a place name. The source of error usually lies within the
collection of these facts. It is vital to the analysis aspects of a GIS that this
information be accurate.
• Logical Consistency - Deals with the logical rules of structure and attribute
rules for spatial data and describes the compatibility of a datum with
other data in a data set. There are several different mathematical theories
and models used to test logical consistency such as metric and incidence
tests, topological and order related tests. These consistency checks
should be run at different stages in the handling of spatial data. (Guptill,
1995)
• Completeness – This is a check to see if relevant data are missing with
regard to the features and the attributes. This could deal with either
omission errors or spatial rules such as minimum width or area that may
limit the information. (Guptill, 1995; Chrisman, 1999)
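Positional accuracy, as described above, is commonly summarised by comparing a sample of coordinates in the data set with their positions from a more accurate reference source. The sketch below uses the root-mean-square error (RMSE) of the horizontal offsets; the choice of RMSE is an assumption, since the notes do not name a specific measure, and the function name `horizontal_rmse` and the sample coordinates are hypothetical.

```python
import math

def horizontal_rmse(measured, reference):
    """Root-mean-square horizontal error between paired (x, y) positions."""
    if not measured or len(measured) != len(reference):
        raise ValueError("need two non-empty point lists of equal length")
    squared_errors = [
        (mx - rx) ** 2 + (my - ry) ** 2
        for (mx, my), (rx, ry) in zip(measured, reference)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical check points: positions in the data set vs. surveyed positions (metres).
measured = [(3, 4), (10, 5)]
reference = [(0, 0), (13, 9)]
print(horizontal_rmse(measured, reference))
```

Vertical accuracy can be assessed in the same way by replacing the horizontal offsets with elevation differences.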
The new concept
• For years the Components of GIS have been the fundamental guidelines
for geographers worldwide. Hardware, software, people, data, and
procedures have been drilled into students in all universities and served as
the basis for managers in planning and building a GIS. While other
occupations are aged, tested, and refined, the field of GIS is young and
growing. In just a short time, the world has come to recognize GIS as a tool
of unlimited potential. The new geographic tool has rejuvenated other
professions that were once set in their ways. As a result, there has been a
merger between GIS and other professions.
• With any new partnership comes change. In the case of GIS, it is necessary
to develop specialty knowledge in addition to the basic geographic
concepts. The GIS technology is evolving as well. Users are no longer
proficient in just one type of software. Instead, GIS professionals are able
to maneuver through several software packages and are fluent in at least
one programming language. There is a vast amount of new data and the
need for data management systems. Moreover, people who use GIS are
finding new ways to portray the data and analyses to the end user.
• It is important, as the profession evolves, to redefine the most basic of its
components. These components should describe or outline the
foundation from which the entire field is built.
THE PROPOSED SIX COMPONENTS OF GIS
I: Core Geographic Ideas
Currently, people are one of the components of GIS. But what is it about people that makes
them a component? It is the knowledge that people carry that makes them important. In
order to properly function as a GIS professional it is necessary to have at least a general
knowledge of geography and GIS. This knowledge is the nucleus of the new components.
II: Technology
Software and hardware were thought to be separate components. Instead, they should be
combined and placed in a group of technological necessities in today’s GIS. They are
accompanied by system administration, database administration, programming, and GPS
units.
• GIS Software may include products from ESRI, MapInfo, Intergraph, AutoCAD, or GRASS.
There are many other companies making GIS products as well. As GIS evolves, technicians
and analysts are finding themselves using software from more than one maker. Gone are the
days of being proficient in only one GIS application.
• As the needs of interoperability rise and the popularity of open source software strengthens,
so do the needs of GIS professionals to know programming and scripting languages. As more
people use the Internet to share data, there is also a call for additional knowledge in web
design programming. With the data comes the need for database administration and design
using Oracle and SQL. And to bring it all together, the GIS professional must be able to
perform basic hardware maintenance and have an understanding of networks and servers.
• Collecting new, unique and specialized data has become an integral part of GIS. To do this,
fieldwork is often required. GPS knowledge is essential in gathering new data along with
surveying concepts and other forms of data collection.
III: Data
This component should not change. Data includes any information that is spatial or tabular
that relates to geography and specialty fields. They may include parcels, crime statistics, or
tornado paths. The greatest thing is that these data could be anything and everything.
However, the quality and accuracy of data is important and should always be considered.
Metadata, or the data about data, has proven to be very important as more data becomes
available.
IV: Specialty Fields
The other half of the current component, people, is the knowledge professionals have
acquired in fields other than geography. More than ever, there is a need for GIS users to be
able to make specific queries and analyses that can only be conceptualized with a specialized
knowledge base. With this merger of GIS and specialty fields, comes the ability to further
understand the results from such queries and analyses. As a result, GIS Specialists are
learning specialized and focused fields while people in specialized fields are learning to use
GIS. For example, soon, it will be difficult for a professional who specialized in transportation
GIS to transfer to forestry GIS without first obtaining the specialized training.
V: Procedures / Methods
This component does not need much modification with the exception that it binds GIS with
the specialty fields. No longer can management focus solely on GIS, but must consider the
plans, models, and organization of GIS and the specialty field. Managers need to describe
new and unique ways to use the data and technology, and then present the results to the end
user.
VI: GeoVisualization
A new and very important proposed component of GIS is the
presentation of data to the end user. This representation of space
and time can take the form of maps, graphs, charts, animations,
and simulations. Visualizing aspects in 3D or using creative
symbology gives users new perspectives and can enable a higher
level of communication. The World Wide Web has created a forum
for interactive mapping that allows for the user to request desired
data without any influence from the author.
• GeoVisualization in GIS is synonymous with influence, and how data
is presented can impact decision making and planning. All data has
a story to tell and can be manipulated in unethical ways.
GeoVisualization, or communication, has become a vital component
of GIS because of its ethical and influential roles.
2.2 GIS INTERRELATED SUBSYSTEMS
Data Processing Subsystem
• data acquisition - from maps, images or field surveys
• data input - data must be input from source material to the digital database
• data storage - how often is it used, how should it be updated, is it confidential?
Data Analysis Subsystem
• retrieval and analysis - may be simple responses to queries, or complex statistical analyses of large
sets of data
• information output - how to display the results? as maps or tables? Or will the information be fed
into some other digital system?
Information Use Subsystem
• users may be researchers, planners, managers
• interaction needed between GIS group and users to plan analytical procedures and data structures
Management Subsystem
• organizational role - GIS section is often organized as a separate unit within a resource management
agency (cf. the Computer Center at many universities) offering spatial database and analysis
services
• staff - include System Manager, Database Manager, System Operator, System Analysts, Digitizer
Operators - a typical resource management agency GIS center might have a staff of 5-7
• procedures - extensive interaction is needed between the GIS group and the rest of the
organization if the system is to function effectively
2.3 CHARACTERISTICS OF GEOGRAPHIC FEATURES
INTRODUCTION
• the objects in a spatial database are representations of
real-world entities with associated descriptions
• the power of a GIS comes from its ability to look at
entities in their geographical context and examine
relationships between entities
• thus a GIS database is much more than a collection of
objects.
• objects in the real world can be represented as either:
– Points
– Lines
– Areas/polygons
• The choice depends largely on scale and purpose.
POINT FEATURES
• the simplest type of spatial object
• choice of entities which will be represented as points depends on
the scale of the map/study
– on a large scale map - building structures as point locations
– on a small scale map - cities as point locations
• the coordinates of each point can be stored as two additional
attributes
• information on a set of points can be viewed as an extended
attribute table
– each row is a point - all information about the point is contained in the
row
– each column is an attribute
– two of the columns are the coordinates
– northing and easting represent y and x coordinates
• each point is independent of every other point, represented as a
separate row in the database model
– Examples
• Boreholes, schools, cities
LINE FEATURES
• Features with one dimension: length
• Start at one point and end at a different point
– Examples
• Rivers, roads
AREA/POLYGON FEATURES
• Features with two dimensions: length and width
• Bounded by a series of connected line
segments
• Start and end on the same point
– Examples
• Countries, dams, schools, cities
CHAPTER 3
GIS AS MODELS OF REALITY
3.1 SPATIAL OBJECTS
Identification of Spatial Objects
Spatial objects are defined by 4 major
components:
• Spatial data
• Attribute data
• Relationships
• Time
Spatial Data
• Each feature has a location that must be specified in a unique way
• Data that defines the location
• Answers the question: where is it?
• Location recorded in terms of coordinates
• These are in the form of graphic primitives that are usually either
points, lines or areas/polygons
• This is the graphical representation of the data
• Normally defined by
– a map
– Coordinates
• All data sets that will be used together should have a common
coordinate system.
– enables transformations to be carried out
BOREHOLES IN MAGO FARM
Non-spatial data (Attribute data)
• The descriptive data about features
• Data that describes the features
• Answers the questions:
– What is it?
– When?
– How much?
• Attributes are normally stored in tables called attribute tables
• Attribute tables are linked to the spatial data
• In the attribute table:
– each object corresponds to a row of the table
– each characteristic/field or theme corresponds to a column of the table
– thus the table shows the thematic and some of the spatial modes
• Data stored in attribute tables can be numeric, alpha, alpha/numeric
• Can have multiple tables for different types of objects and themes
Attribute table
NAME    ID     DEPTH   CAPACITY   Y         X
RWIZI   BH01   60      8000       3566.89   7791.23
CHURU   BH02   80      10000      3669.67   8100.65
GOMO    BH03   95      7000       4900.98   7784.75
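As a minimal sketch (not part of the original notes), the attribute table above can be held in Python as a list of rows, one dictionary per borehole. The values are copied from the table; the helper name select and the depth threshold used in the query are illustrative assumptions.

```python
# Each row is one borehole (one spatial object); each key is an attribute (column).
# Y is the northing and X the easting, as in the table.
boreholes = [
    {"NAME": "RWIZI", "ID": "BH01", "DEPTH": 60, "CAPACITY": 8000, "Y": 3566.89, "X": 7791.23},
    {"NAME": "CHURU", "ID": "BH02", "DEPTH": 80, "CAPACITY": 10000, "Y": 3669.67, "X": 8100.65},
    {"NAME": "GOMO", "ID": "BH03", "DEPTH": 95, "CAPACITY": 7000, "Y": 4900.98, "X": 7784.75},
]


def select(rows, predicate):
    """Return the rows for which predicate(row) is true (a simple attribute query)."""
    return [row for row in rows if predicate(row)]


# Attribute query: which boreholes have DEPTH of at least 80?
deep = select(boreholes, lambda row: row["DEPTH"] >= 80)
deep_names = [row["NAME"] for row in deep]

# The spatial part of each object is just two more columns of the same row.
locations = {row["ID"]: (row["X"], row["Y"]) for row in boreholes}
```

Keeping the coordinates in the same row as the other attributes mirrors the "extended attribute table" view of point features described earlier.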
Time (temporal data)
• Spatial objects change with time
• It is important to indicate when the data was captured
• the temporal mode can be captured in several ways
– by specifying the interval of time over which an object
exists
– by capturing information at certain points in time
– by specifying the rates of movement of objects
• depending on how the temporal mode is captured, it
may be included in a single attribute table, or be
represented by series of attribute tables on the same
objects through time
Scales of measurement
• numerical values may be defined with respect to nominal, ordinal, interval, or ratio
scales of measurement
• it is important to recognize the scales of measurement used in GIS data as this
determines the kinds of mathematical operations that can be performed on the
data
• the different scales can be demonstrated using an example of a marathon race:
1. Nominal
• on a nominal scale, numbers merely establish identity
– e.g. a phone number signifies only the unique identity of the phone
• in the race, the numbers issued to racers which are used to identify individuals are
on a nominal scale
– these identity numbers do not indicate any order or relative value in terms of the race
outcome
2. Ordinal
• on an ordinal scale, numbers establish order only
– phone number 9618224 is not more of anything than 9618049, so phone numbers are not
ordinal
• in the race, the finishing places of each racer, i.e. 1st place, 2nd place, 3rd place,
are measured on an ordinal scale
– however, we do not know how much time difference there is between each racer
3. Interval
• on interval scales, the difference (interval) between numbers is meaningful, but the zero point of the
scale is arbitrary
– subtraction makes sense but division does not
– e.g. it makes sense to say that 20°C is 10 degrees warmer than 10°C, so Celsius temperature is an
interval scale, but 20°C is not twice as warm as 10°C
– e.g. it makes no sense to say that the phone number 9680244 is 62195 more than 9618049, so
phone numbers are not measurements on an interval scale
• in the race, the time of the day that each racer finished is measured on an interval scale
– if the racers finished at 9:10 GMT, 9:20 GMT and 9:25 GMT, then racer one finished 10 minutes
before racer 2 and the difference between racers 1 and 2 is twice that of the difference between
racers 2 and 3
– however, the racer finishing at 9:10 GMT did not finish twice as fast as the racer finishing at 18:20
GMT
4. Ratio
• on a ratio scale, measurement has an absolute zero and the difference between numbers is significant
– division makes sense
– e.g. it makes sense to say that a 50 kg person weighs half as much as a 100 kg person, so weight in kg
is on a ratio scale
– the zero point of weight is absolute but the zero point of the Celsius scale is not
• in our race, the first place finisher finished in a time of 2:30, the second in 2:40 and the 450th place
finisher took 5 hours
– the 450th finisher took twice as long as the first place finisher (5/2.5 = 2)
• note these distinctions, though important, are not always clearly defined
– is elevation interval or ratio? if the local base level is 750 feet, is a mountain at 2000 feet twice as
high as one at 1000 feet when viewed from the valley?
• many types of geographical data used in GIS applications are
nominal or ordinal
– values establish the order of classes, or their distinct identity, but
rarely intervals or ratios
• thus you cannot:
– multiply soil type 2 by soil type 3 and get soil type 6
– divide urban area by the rank of a city to get a meaningful number
– subtract suitability class 1 from suitability class 4 to get 3 of anything
• however, you can:
– divide population by area (both ratio scales) and get population
density
– subtract elevation at point a from elevation at point b and get
difference of elevation
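The rule that the scale of measurement decides which operations are meaningful can be sketched as a small lookup. This is only an illustration: the operation names, the function names and the sample figures are assumptions, not part of any GIS package.

```python
# Operations that are meaningful on each scale of measurement.
# Each scale allows everything the previous one allows, plus one more operation.
ALLOWED = {
    "nominal": {"equal"},
    "ordinal": {"equal", "compare"},
    "interval": {"equal", "compare", "subtract"},
    "ratio": {"equal", "compare", "subtract", "divide"},
}


def can_apply(operation, *scales):
    """True if the operation is meaningful for every operand's scale."""
    return all(operation in ALLOWED[scale] for scale in scales)


def population_density(population, area_km2):
    """Population and area are both ratio data, so division is meaningful."""
    if not can_apply("divide", "ratio", "ratio"):
        raise ValueError("division is not meaningful on these scales")
    return population / area_km2


density = population_density(5000, 25.0)            # hypothetical figures
soil_ok = can_apply("divide", "nominal", "nominal")  # soil type codes
rank_ok = can_apply("subtract", "ordinal", "ordinal")  # suitability classes
elev_ok = can_apply("subtract", "interval", "interval")  # elevations
```

A check of this kind could be run before an analysis step so that, for example, class codes are never averaged by mistake.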
3.2 RELATIONSHIPS OF SPATIAL OBJECTS
• Spatial objects are related to each other in different ways.
• There are millions of relationships between spatial objects
• The power of GIS is its ability to store relationships among objects
• This is a unique feature of GIS
• The relationship is called topology
• Relationships are important in analysis
– e.g. "is contained in" relationship between a point and an area is important in
relating objects to their surrounding environment
– e.g. "intersects" between two lines is important in analyzing routes through
networks
• relationships can exist between entities of the same type or of different
types
– e.g. for each shopping center, can find the nearest shopping center (same
type)
– e.g. for each customer, can find the nearest shopping center (different types)
• Examples of relationships
– ‘is within’
– ‘is nearest to’
– ‘is contained in’
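Two of these relationships can be sketched in a few lines of Python. The "is contained in" test below uses the common ray-casting method and "is nearest to" uses straight-line distance; both function names and all coordinates are hypothetical, and points lying exactly on a boundary are not treated specially.

```python
import math


def point_in_polygon(x, y, polygon):
    """'is contained in': ray-casting test.

    polygon is a list of (x, y) vertices in order; the first vertex is not
    repeated at the end.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def nearest(origin, candidates):
    """'is nearest to': name of the candidate closest to origin."""
    return min(candidates, key=lambda name: math.dist(origin, candidates[name]))


farm = [(0, 0), (10, 0), (10, 10), (0, 10)]           # hypothetical area object
centres = {"Centre A": (3, 4), "Centre B": (1, 1)}     # hypothetical point objects

borehole_inside = point_in_polygon(5, 5, farm)
borehole_outside = point_in_polygon(15, 5, farm)
closest = nearest((0, 0), centres)
```

A topological GIS stores such relationships instead of recomputing them each time, but the underlying tests are of this kind.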
3.3 REPRESENTATION OF SPATIAL OBJECTS
INTRODUCTION
• maps are the main source of data for GIS
• the traditions of cartography are fundamentally important to GIS
• GIS has roots in the analysis of information on maps, and
overcomes many of the limitations of manual analysis
• this unit is about cartography and its relationship to GIS - how does
GIS differ from cartography, particularly automated cartography,
which uses computers to make maps?
WHAT IS A MAP?
• according to the International Cartographic Association, a map is:
– a representation, normally to scale and on a flat medium, of a
selection of material or abstract features on, or in relation to, the
surface of the Earth
Cartographic abstraction
• production of a map requires:
– selection of the few features in the real world to include
– classification of selected features into groups (e.g. bridges, churches, railways)
– simplification of jagged lines like coastlines
– exaggeration of features to be included that are too small to show at the scale of the map
– symbolization to represent the different classes of features chosen
Types of maps
• in practice we normally think of two types of map:
• topographic map - a reference tool, showing the outlines of selected natural and
man-made features of the Earth
– often acts as a frame for other information
– "Topography" refers to the shape of the surface, represented by contours and/or shading, but
topographic maps also show roads and other prominent features
• thematic map - a tool to communicate geographical concepts such as the
distribution of population densities, climate, movement of goods, land use etc.
Thematic maps in GIS
• several types of thematic map are important in GIS:
• a choropleth map uses reporting zones such as counties or census tracts to show
data such as average incomes, percent female, or rates of mortality
– the boundaries of the zones are established independently of the data, and may be used to
report many different sets of data
• an area class map shows zones of constant attributes, such as vegetation, soil type,
or forest species
– the boundaries are different for each map as they are determined by the variation of the
attribute being mapped, e.g. breaks of soil type may occur independently of breaks of
vegetation
• an isopleth map shows an imaginary surface by means of lines joining points of
equal value, "isolines" (e.g. contours on a topographic map)
– used for phenomena which vary smoothly across the map, such as temperature, pressure,
rainfall or population density
Line maps versus photo maps
• an important distinction for GIS is between a line map and a photo map
• a line map shows features by conventional symbols or by boundaries
• a photo map is derived from a photographic image taken from the air
– features are interpreted by the eye as it views the map
– certain features may be identified by overprinting labels
– photomaps are relatively cheap to make but are rarely completely free of distortions
AUTOMATED AND COMPUTER-ASSISTED CARTOGRAPHY
Changeover to computer mapping
• personalities were critically important in the 1960s and early 1970s - individual interests
determined the direction and focus of research and development in computer
cartography (see Rhind, 1988)
• impetus for change began in two communities:
1. scientists wishing to make maps quickly to see the results of modeling, or to display
data from large archives already in digital form, e.g. census tables
– SYMAP was the first significant package for this purpose, released by the Harvard
Lab in 1967
2. cartographers seeking to reduce the cost and time of map production and editing
– hardware costs limited interest in this technology prior to 1980 to the major
mapping agencies
– the costs of computing have dropped dramatically, by an order of magnitude every
six years
– an early belief that the entire map-making process could be automated diminished
by 1975 because of difficulties of generalization and design
• has resurfaced in the context of Expert Systems where the computer chooses
the proper techniques based on characteristics of the data, scale, map
purpose, etc.
• today, far more maps are made by computer than by hand
– now few mapmakers are trained cartographers
• also, it is now clear that once created, digital data can serve purposes other than mapmaking, so it has additional value
Advantages of computer cartography
• lower cost for simple maps, faster production
• greater flexibility in output - easy scale or projection change - maps can be
tailored to user needs
• other uses for digital data
Disadvantages of computer cartography
• relatively few full-scale systems have been shown to be truly cost-effective
in practice, despite early promise
• high capital cost, though this is now much reduced
• computer methods do not ensure production of maps of high quality
– there is a perceived loss of regard for the "cartographic tradition" with the
consequent production of "cartojunk"
GIS and Computer Cartography
• computer cartography has a primary goal of producing maps
– systems have advanced tools for map layout, placement of labels, large
symbol and font libraries, interfaces for expensive, high quality output devices
• however, it is not an analytical tool
– therefore, unlike data for GIS, cartographic data does not need to be stored in
ways which allow, for example, analysis of relationships between different
themes such as population density and housing prices or the routing of flows
along connecting highway or river segments
GIS COMPARED TO MAPS
Data stores
• spatial data stored in digital format in a GIS allows for rapid access for traditional
as well as innovative purposes
• nature of maps creates difficulties when used as sources for digital data
– most GIS take no account of differences between datasets derived from maps at different
scales
– idiosyncrasies (e.g. generalization procedures) in maps become "locked in" to the data derived
from them
– such errors often become apparent only during later processing of digital data derived from
them
• however, maps still remain an excellent way of compiling spatial information, e.g.
field survey
– maps can be designed to be easy to convert to digital form, e.g. by the use of different colors
which have distinct signatures when scanned by electronic sensors
• as well, maps can be produced by GISs as cheap, high density stores of information
for the end user
– however, consistent, accurate retrieval of data from maps is difficult
– only limited amounts of data can be shown due to constraints of the paper medium
Data indexes
• this function can be performed much better by a good GIS due to the ability to
provide multiple and efficient cross-referencing and searching
Data analysis tools
• GIS is a powerful tool for map analysis
– traditional impediments to the accurate and rapid measurement of
area or to map overlay no longer exist
– many new techniques in spatial analysis are becoming available
Data display tools
• electronic display offers significant advantages over the paper map
– ability to browse across an area without interruption by map sheet
boundaries
– ability to zoom and change scale freely
– potential for the animation of time dependent data
– display in "3 dimensions" (perspective views), with "real-time"
rotation of viewing angle
– potential for continuous scales of intensity and the use of color and
shading independent of the constraints of the printing process, ability
to change colors as required for interpretation
• one of a kind, special purpose products are possible and
inexpensive
CHAPTER 4
SPATIAL DATA MODELS
4.1 INTRODUCTION
• Traditionally spatial data has been stored and presented in the form
of a map.
• Three basic types of spatial data models have evolved for storing
geographic data digitally.
• These are referred to as:
– Vector data model
– Raster data model
– Image.
• The following diagram reflects the two primary spatial data
encoding techniques.
• These are vector and raster.
• Image data utilizes techniques very similar to raster data, however
typically lacks the internal formats required for analysis and
modeling of the data.
• Images reflect pictures or photographs of the landscape.
4.2 VECTOR DATA MODEL
• Vector data model is in the form of the normal map.
• On a map objects are represented as either points, lines, or areas
• Vector storage implies the use of vectors (directional lines) to represent a
geographic feature.
• Vector data is characterized by the use of sequential points or vertices to
define a linear segment.
• Each vertex consists of an X coordinate and a Y coordinate.
• Vector lines are often referred to as arcs and consist of a string of vertices
terminated by a node.
• A node is defined as a vertex that starts or ends an arc segment.
• Point features are defined by one coordinate pair, a vertex.
• Polygonal features are defined by a set of closed coordinate pairs.
• In vector representation, the storage of the vertices for each feature is
important, as well as the connectivity between features, e.g. the sharing
of common vertices where features connect.
Vector data model
Vector models
There are different models to store and manage vector information.
Each of them has different advantages and disadvantages.
• list of coordinates, "spaghetti" (figure 5)
• vertex dictionary (figure 6)
• Dual Independent Map Encoding (DIME) (figure 7)
• arc / node (figure 8)
List of coordinates
• simple
• easy to manage
• no topology
• lots of duplication, hence need for large storage space
• very often used in CAC (computer assisted cartography)
ARC/NODE structure
ARC/NODE
File 1. Coordinates of nodes and vertices for all the arcs
ARC   F_node     Vertex                     T_node
1     3.2, 5.2   1, 5.2                     1, 3
2     1, 3       1.8, 2.6  2.8, 3  3.3, 4   3.2, 5.2
3     1, 2       3.5, 2  4.2, 2.7           5.2, 2.7
File 2. Arcs topology
ARC   F_node   T_node   R_poly     L_poly
1     1        2        External   A
2     2        1        A          External
3     3        4        External   External
File 3. Polygons topology
Polygon   Arcs
A         1, 2
File 4. Nodes topology
Node   Arcs
1      1, 2
2      1, 2
3      3
4      4
5      5
• Fundamental primitive is a point
• Points are identified as nodes
• Nodes are created where lines intersect
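To show how the arc/node files work together, the sketch below holds the arc topology of File 2 in a Python dictionary and derives two of the other listings from it. The function names are illustrative assumptions; only the three arcs of File 2 are used.

```python
# Arc topology copied from File 2:
# arc id -> (from node, to node, right polygon, left polygon)
arcs = {
    1: (1, 2, "External", "A"),
    2: (2, 1, "A", "External"),
    3: (3, 4, "External", "External"),
}


def arcs_of_polygon(arcs, polygon):
    """Arcs that have the polygon on their left or right side (its boundary)."""
    return sorted(
        arc for arc, (_, _, right, left) in arcs.items() if polygon in (right, left)
    )


def arcs_at_node(arcs, node):
    """Arcs that start or end at the node."""
    return sorted(
        arc for arc, (start, end, _, _) in arcs.items() if node in (start, end)
    )


boundary_of_a = arcs_of_polygon(arcs, "A")   # the entry for polygon A in File 3
arcs_meeting_at_1 = arcs_at_node(arcs, 1)    # the entry for node 1 in File 4
```

Because every arc records the polygons on either side of it, adjacency questions can be answered from the tables alone, without touching the coordinates in File 1.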
Advantages of vector data model
• Data can be represented at its original resolution and
form without generalization.
• Graphic output is usually more aesthetically pleasing
(traditional cartographic representation);
• Since most data, e.g. hard copy maps, is in vector form
no data conversion is required.
• Accurate geographic location of data is maintained.
• Allows for efficient encoding of topology, and as a
result more efficient operations that require
topological information, e.g. proximity, network
analysis.
Disadvantages of vector data
• The location of each vertex needs to be stored explicitly.
• For effective analysis, vector data must be converted into a
topological structure. This is often processing intensive and usually
requires extensive data cleaning. As well, topology is static, and any
updating or editing of the vector data requires re-building of the
topology.
• Algorithms for manipulative and analysis functions are complex and
may be processing intensive. Often, this inherently limits the
functionality for large data sets, e.g. a large number of features.
• Continuous data, such as elevation data, is not effectively
represented in vector form. Usually substantial data generalization
or interpolation is required for these data layers.
• Spatial analysis and filtering within polygons is impossible
4.3 THE RASTER DATA MODEL
• Raster data models incorporate the use of a grid-cell data structure where
the geographic area is divided into cells identified by row and column.
• This data structure is commonly called raster.
• Each grid cell is called a pixel (derived from the two words "picture element")
• While the term raster implies a regularly spaced grid other tessellated
data structures do exist
• The size of cells in a tessellated data structure is selected on the basis of
the data accuracy and the resolution needed by the user.
• There is no explicit coding of geographic coordinates required since that is
implicit in the layout of the cells.
• A raster data structure is in fact a matrix where any coordinate can be
quickly calculated if the origin point is known, and the size of the grid cells
is known.
• Since grid-cells can be handled as two-dimensional arrays in computer
encoding many analytical operations are easy to program. This makes
tessellated data structures a popular choice for many GIS software packages.
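The statement that any coordinate can be calculated from the origin and the cell size can be sketched as follows. The sketch assumes square cells, an origin at the bottom left corner of the grid, and rows counted from the top as in a matrix; the function names and the sample numbers are hypothetical.

```python
def cell_centre(row, col, origin_x, origin_y, cell_size, n_rows):
    """Coordinates of the centre of cell (row, col).

    Rows are counted from the top (row 0 is the top row); the origin is the
    bottom left corner of the whole grid.
    """
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y + (n_rows - row - 0.5) * cell_size
    return x, y


def cell_of(x, y, origin_x, origin_y, cell_size, n_rows):
    """Inverse lookup: the (row, col) of the cell containing point (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = n_rows - 1 - int((y - origin_y) // cell_size)
    return row, col


# Hypothetical grid: 4 rows, 10 m cells, origin at (1000, 2000).
top_left = cell_centre(0, 0, 1000.0, 2000.0, 10.0, 4)
bottom_row = cell_centre(3, 2, 1000.0, 2000.0, 10.0, 4)
back_again = cell_of(1025.0, 2005.0, 1000.0, 2000.0, 10.0, 4)
```

Only the origin, the cell size and the grid dimensions need to be stored; no coordinates are held for individual cells.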
• Topology is not a relevant concept with tessellated structures since
adjacency and connectivity are implicit in the location of a particular cell
in the data matrix.
• Several tessellated data structures exist, however only two are commonly
used in GISs.
• The most popular cell structure is the regularly spaced matrix or raster
structure. This data structure involves a division of spatial data into
regularly spaced cells.
• Each cell is of the same shape and size. Squares are most commonly
utilized.
• Geographic data is rarely distinguished by regularly spaced shapes
• The problem of determining the proper resolution for a particular data
layer can be a concern.
• If one selects too coarse a cell size then data may be overly generalized.
• If one selects too fine a cell size then too many cells may be created
resulting in a large data volume, slower processing times, and a more
cumbersome data set.
GIS MAP Structure - RASTER systems (Adapted from Berry)
Raster data model cont’d
• Most raster based GIS software requires that the raster cell contain
only a single discrete value.
• Accordingly, a data layer, e.g. forest inventory stands, may be
broken down into a series of raster maps, each representing an
attribute type, e.g. a species map, a height map, a density map, etc.
• These are often referred to as one attribute maps.
• Raster data storage provides the foundation for quantitative
analysis techniques.
• This is often referred to as raster or map algebra.
• The use of raster data structures allows for sophisticated
mathematical modelling processes while vector based systems are
often constrained by the capabilities and language of a relational
DBMS.
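A minimal sketch of map algebra, assuming two one attribute maps of the same forest area stored as lists of rows. The layer values, the species code 2 and the height threshold of 20 are invented for illustration; real raster packages provide their own operators for this.

```python
def local_op(layer_a, layer_b, func):
    """Apply func cell by cell to two rasters of identical shape."""
    return [
        [func(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


# Two hypothetical one attribute maps covering the same 2 x 3 grid.
species = [[1, 1, 2],
           [1, 2, 2]]
height = [[12, 18, 25],
          [9, 30, 22]]

# New layer: 1 where species 2 is taller than 20, otherwise 0.
tall_species_2 = local_op(species, height, lambda s, h: int(s == 2 and h > 20))

# Ordinary arithmetic works the same way, e.g. adding two layers.
total = local_op(height, height, lambda a, b: a + b)
```

Because each cell lines up with the same cell in every other layer, combining layers needs no geometric calculation at all.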
Advantages of Raster data model
• The geographic location of each cell is implied by its position in the
cell matrix. Accordingly, other than an origin point, e.g. bottom left
corner, no geographic coordinates are stored.
• Due to the nature of the data storage technique data analysis is
usually easy to program and quick to perform.
• The inherent nature of raster maps, e.g. one attribute maps, is
ideally suited for mathematical modeling and quantitative analysis.
• Discrete data, e.g. forestry stands, is accommodated equally well as
continuous data, e.g. elevation data, and facilitates the integrating
of the two data types.
• Grid-cell systems are very compatible with raster-based output
devices, e.g. electrostatic plotters, graphic terminals.
Disadvantages of Raster data model
• The cell size determines the resolution at which the data is
represented.
• It is especially difficult to adequately represent linear
features depending on the cell resolution. Accordingly,
network linkages are difficult to establish.
• Processing of associated attribute data may be
cumbersome if large amounts of data exist. Raster maps
inherently reflect only one attribute or characteristic for an
area.
• Since most input data is in vector form, data must undergo
vector-to-raster conversion. Besides increased processing
requirements this may introduce data integrity concerns
due to generalization and choice of inappropriate cell size.
• It is important to understand that the selection of a particular data
structure can provide advantages during the analysis stage.
• For example, the vector data model does not handle continuous
data, e.g. elevation, very well while the raster data model is more
ideally suited for this type of analysis.
• Conversely, the raster structure does not handle linear data
analysis, e.g. shortest path, very well while vector systems do.
• It is important for the user to understand that there are certain
advantages and disadvantages to each data model.
• The selection of a particular data model, vector or raster, is
dependent on the source and type of data, as well as the intended
use of the data.
• Certain analytical procedures require raster data while others are
better suited to vector data.
4.4 CAPABILITIES OF VECTOR AND RASTER GIS
CAPABILITIES OF VECTOR GIS
A. INTRODUCTION
• analysis functions with vector GIS are not quite the same as with raster
GIS
– more operations deal with objects
– measures such as area have to be calculated from coordinates of objects,
instead of counting cells
• some operations are more accurate
– estimates of area based on polygons more accurate than counts of pixels
– estimates of perimeter of polygon more accurate than counting pixel
boundaries on the edge of a zone
• some operations are slower
– e.g. overlaying layers, finding buffers
• some operations are faster
– e.g. finding path through road network
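The point that vector measures are calculated from the coordinates of objects can be illustrated with the shoelace formula for area and a sum of segment lengths for perimeter. This is a sketch with a made-up triangle; the function names are assumptions.

```python
import math


def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula.

    vertices is a list of (x, y) pairs in order; the first vertex is not
    repeated at the end.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(vertices):
    """Sum of the lengths of the boundary segments."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))


triangle = [(0, 0), (4, 0), (0, 3)]   # hypothetical parcel
area = polygon_area(triangle)
perimeter = polygon_perimeter(triangle)
```

In a raster the same area would be estimated by counting cells, so the answer would depend on the cell size; here it depends only on the stored vertices.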
B. SIMPLE DISPLAY AND QUERY
Display
• using points and "arcs" can display the locations of all objects
stored
• attributes and entity types can be displayed by varying colors, line
patterns and point symbols
• may only want to display a subset of the data
– e.g. want to display areas of urban landuse with some base map data
• select all political boundaries and highways, but only areas that had urban land
uses
• how would the user do this?
– e.g. one of the layers in a database is a "map" of land use, called USE
– area objects on this layer have several attributes
– one attribute, called CLASS, identifies the area's land use
– for urban land use, it has the value "U"
– need to extract boundaries for all areas that have CLASS="U"
Structured Query Language (SQL)
• different systems use different ways of formulating queries
• Structured Query Language (SQL) is used by many systems
• SQL operators:
– relational: >, <, =, >=, <=
– arithmetic: +, -, *, / (only on numeric fields)
– Boolean: and, or, not
Boolean operators
• used to combine conditions
– e.g. WHERE cumgrade > 3.0 AND grade = "A" (selects students satisfying both conditions
only)
• Boolean operators can have a spatial meaning in GIS as well
– e.g. when two maps are overlaid, areas (polygons) that are superimposed have the
"and" condition
• a spatial representation is used to illustrate Boolean operators in the study of logic, through
the use of diagrams called Venn diagrams
– thus GIS area overlay is a geographical instance of a Venn diagram
– "XOR" is the "exclusive or" - A xor B means A or B but not both
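The operators above can be tried out with Python's built-in sqlite3 module. The sketch assumes a small in-memory table standing in for the attributes of the USE layer; the table name use_layer, the area_ha column and all the rows are hypothetical. The second half shows AND, OR and XOR on sets of area identifiers, which is what the Venn diagram view amounts to.

```python
import sqlite3

# Hypothetical attribute table for the land use layer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE use_layer (id INTEGER, class TEXT, area_ha REAL)")
conn.executemany(
    "INSERT INTO use_layer VALUES (?, ?, ?)",
    [(1, "U", 120.0), (2, "A", 300.0), (3, "U", 40.0), (4, "F", 75.0)],
)

# Relational operators (=, >) combined with the Boolean operator AND.
rows = conn.execute(
    "SELECT id FROM use_layer WHERE class = 'U' AND area_ha > 50 ORDER BY id"
).fetchall()
urban_and_large = [row[0] for row in rows]
conn.close()

# The same conditions as sets of area ids.
urban = {1, 3}        # class = 'U'
large = {1, 2, 4}     # area_ha > 50
both = urban & large       # AND: satisfies both conditions
either = urban | large     # OR: satisfies at least one condition
only_one = urban ^ large   # XOR: satisfies one condition but not both
```

The query and the set intersection select the same area, which is why an overlay of two maps can be read as a Boolean "and".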
SQL extensions for spatial queries
• some systems allow specifically spatial queries to be handled under SQL e.g. WITHIN
operator
• SELECT <objects> WITHIN <specific area>
• the criteria for these spatial searches may include searching within the radius of a point,
within a bounding rectangle, or within an irregular polygon
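Two of these search criteria, a radius around a point and a bounding rectangle, can be sketched for point objects as below. The borehole coordinates are taken from the attribute table in section 3.1; the function names, the 400 unit radius and the rectangle limits are illustrative assumptions.

```python
import math

# (X, Y) coordinates of the three boreholes from the attribute table.
points = {
    "BH01": (7791.23, 3566.89),
    "BH02": (8100.65, 3669.67),
    "BH03": (7784.75, 4900.98),
}


def within_radius(points, centre, radius):
    """Names of the points no further than radius from centre."""
    return [name for name, xy in points.items() if math.dist(xy, centre) <= radius]


def within_rectangle(points, xmin, ymin, xmax, ymax):
    """Names of the points inside the bounding rectangle (edges included)."""
    return [
        name
        for name, (x, y) in points.items()
        if xmin <= x <= xmax and ymin <= y <= ymax
    ]


near_bh01 = within_radius(points, points["BH01"], 400.0)
western_strip = within_rectangle(points, 7700.0, 3500.0, 7800.0, 5000.0)
```

A search within an irregular polygon would replace the simple comparison with a point-in-polygon test.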
C. RECLASSIFY, DISSOLVE AND MERGE
• reclassify, dissolve and merge operations are used frequently in
working with area objects
– these are used to aggregate areas based on attributes
• consider a soils map:
– we wish to produce a map of major soil types from a layer that has
polygons based on much more finely defined classification scheme
Steps
• 1. reclassify areas by a single attribute or some combination
– e.g. reclassify soil areas by soil type only
• 2. dissolve boundaries between areas of same type
– by deleting the arc between two polygons if the relevant attributes are
the same in both polygons
• 3. merge polygons into large objects
– recode the sequence of line segments that connect to form the
boundary (i.e. rebuild topology)
– assign new ID #'s to each new object
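The three steps can be sketched on topology alone, without coordinates. The example below assumes three hypothetical soil polygons, a lookup from detailed to major soil types, and arcs recorded as pairs of neighbouring polygons; all names and values are invented for illustration.

```python
# Hypothetical detailed soil polygons and a lookup to major soil types.
detailed = {"P1": "sandy loam", "P2": "loamy sand", "P3": "heavy clay"}
major_type = {"sandy loam": "SAND", "loamy sand": "SAND", "heavy clay": "CLAY"}

# Hypothetical arcs: arc id -> (polygon on the left, polygon on the right).
arcs = {"a1": ("P1", "P2"), "a2": ("P2", "P3"), "a3": ("P1", "P3")}


def reclassify(polygons, lookup):
    """Step 1: replace each polygon's class by its major class."""
    return {pid: lookup[cls] for pid, cls in polygons.items()}


def dissolve(arcs, classes):
    """Step 2: keep only arcs that separate polygons of different classes."""
    return {a: lr for a, lr in arcs.items() if classes[lr[0]] != classes[lr[1]]}


def merge(polygons, removed_pairs):
    """Step 3: group polygons joined by a removed arc and assign new IDs."""
    parent = {p: p for p in polygons}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    for left, right in removed_pairs:
        parent[find(left)] = find(right)
    groups = {}
    for p in sorted(polygons):
        groups.setdefault(find(p), []).append(p)
    return {new_id: members
            for new_id, members in enumerate(sorted(groups.values()), start=1)}


classes = reclassify(detailed, major_type)
kept_arcs = dissolve(arcs, classes)
removed = [pair for arc, pair in arcs.items() if arc not in kept_arcs]
new_objects = merge(detailed, removed)
```

In a full system the kept arcs would then be re-chained into the boundaries of the new objects, which is the topology rebuild mentioned in step 3.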
D. TOPOLOGICAL OVERLAY
• suppose individual layers have planar enforcement (required in
many systems, not all)
• when two layers are combined ("overlaid", "superimposed") the
result must have planar enforcement as well
– a new intersection must be calculated and created wherever two lines
cross
– a line across an area object creates two new area objects
• topological overlay is the general name for overlay followed by
planar enforcement
• relationships are updated for the new, combined map
• result may be information about relationships (new attributes) for
the old (input) maps rather than the creation of new objects
– e.g. overlay map of school districts on census tracts
• result is map showing every school district/census tract combination
• for each combination, the database contains an area object
• however, concern may be with obtaining the number of overlapping census
tracts as a new attribute of each school district rather than with new objects
themselves
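The school district example can be sketched with axis-aligned rectangles standing in for polygons, which keeps the intersection step to a few lines. The district and tract extents are invented for this example; real polygon overlay needs a general line intersection routine.

```python
def intersect(a, b):
    """Intersection of two rectangles (xmin, ymin, xmax, ymax), or None if they do not overlap."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin < xmax and ymin < ymax:
        return (xmin, ymin, xmax, ymax)
    return None

# Hypothetical input layers, each covering the same 10 x 10 study area.
districts = {"D1": (0, 0, 5, 10), "D2": (5, 0, 10, 10)}
tracts = {"T1": (0, 0, 10, 4), "T2": (0, 4, 4, 10), "T3": (4, 4, 10, 10)}

# Overlay: one new area object per district/tract combination.
combos = {}
for d, d_rect in districts.items():
    for t, t_rect in tracts.items():
        piece = intersect(d_rect, t_rect)
        if piece is not None:
            combos[(d, t)] = piece

# Alternative result: a new attribute on the old objects instead of new objects.
tract_count = {d: sum(1 for (dd, _t) in combos if dd == d) for d in districts}
total_area = sum((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in combos.values())
print(tract_count, total_area)
```

The combined pieces tile the study area exactly, which is the planar enforcement property: the new objects neither overlap nor leave gaps.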
E. BUFFERING
• a buffer can be constructed around a point, line or area
– buffering creates a new area, enclosing the buffered object
• applications in transportation, forestry, resource management
– protected zone around lakes and streams
– zone of noise pollution around highways
– service zone around bus route (e.g. 300 m walking distance)
– groundwater pollution zone around waste site
• options available for raster, such as a "friction" layer, do not exist for
vector
• buffering is much more difficult in vector from the point of view of the
programmer
• sometimes, width of the buffer can be determined by an attribute of the
object
– e.g. buffering residential buildings away from a street network:
• three types of street (1, 2, 3 or major, secondary, tertiary) with the setbacks being 600
feet from a major street, 200 feet from a secondary street, and only 100 feet from a
tertiary street
• problems with buffer operations may occur when buffering very
convoluted lines or areas
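A buffer whose width depends on an attribute can be sketched without building the buffer polygon at all: a location lies inside the buffer if its distance to the street centreline is no more than the setback for that street's class. The setbacks are the ones from the example above; the street names and coordinates (in feet) are invented.

```python
import math

# Setback in feet by street type (1 = major, 2 = secondary, 3 = tertiary).
SETBACK_FT = {1: 600.0, 2: 200.0, 3: 100.0}

def dist_point_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Position of the closest point along the segment, clamped to its ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def in_buffer(p, street):
    """True if p falls inside the buffer whose width is set by the street's class."""
    width = SETBACK_FT[street["class"]]
    pts = street["line"]
    return any(dist_point_segment(p, pts[i], pts[i + 1]) <= width
               for i in range(len(pts) - 1))

# Hypothetical street network.
streets = [
    {"name": "Main", "class": 1, "line": [(0, 0), (2000, 0)]},
    {"name": "Side", "class": 3, "line": [(1000, 0), (1000, 1500)]},
]

def violates_setback(p):
    return any(in_buffer(p, s) for s in streets)

print(violates_setback((500, 500)), violates_setback((500, 700)), violates_setback((1050, 1000)))
```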
CAPABILITIES OF RASTER GIS
A. INTRODUCTION
• a raster GIS must have capabilities for:
– input of data
– various housekeeping functions
– operations on layers, such as recode, overlay and spread
– output of data and results
• the range of possible functions is enormous; current raster
GISs only scratch the surface
– because the range is so large, some have tried to organize
functions into a consistent scheme, but no scheme has been
widely accepted yet
– this section covers a selection of the most useful and common
• each raster GIS uses different names for the functions
B. DISPLAYING LAYERS
Basic display
• the simplest type of values to display are integers
– on a color display each integer value can be assigned a unique color
• if the values have a natural order we will want the sequence of colors to make
sense
– e.g. elevation is often shown on a map using the sequence blue-green-yellow-brown-white for
increasing elevation
• there should be a legend explaining the meaning of each color
• on a dot matrix printer shades of grey can be generated by varying the density of
dots
Other types of display
• it may be appropriate to display the data as a surface
• contours can be "threaded" through the pixels along lines of constant value
• the surface can be shown in an oblique, perspective view
– this can be done by drawing profiles across the raster with each profile offset and hidden lines
removed
– the surface might be colored using the values in a second layer (a second layer can be
"draped" over the surface defined by the first layer)
– the result can be very effective
– these operations are also computer-intensive because of the calculations necessary to
simulate perspective and remove hidden lines
C. LOCAL OPERATIONS
• produce a new layer from one or more input layers
• the value of each new pixel is defined by the values of the
same pixel on the input layer(s)
• note: arithmetic operations make no sense unless the
values have appropriate scales of measurement (see Unit 6)
– you cannot find the "average" of soils types 3 and 5, nor is soil 5
"greater than" soil 3
Recoding
• using only one input layer
• some systems allow a full range of mathematical operations
– e.g. newvalue = (2*oldvalue + 3)^2
Overlaying layers
• an overlay occurs when the output value depends on two
or more input layers
– many systems restrict overlay to two input layers only
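Both kinds of local operation can be sketched with a raster layer held as a list of rows. The function names recode and overlay, and the small layers, are assumptions made for this example; each output cell depends only on the same cell of the input layer(s).

```python
def recode(layer, fn):
    """Local operation on one layer: apply fn to every cell value."""
    return [[fn(v) for v in row] for row in layer]

def overlay(layer_a, layer_b, fn):
    """Local operation on two layers: combine values cell by cell."""
    return [[fn(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(layer_a, layer_b)]

# Recoding with arithmetic only makes sense for numeric (interval/ratio) values.
values = [[1, 2], [3, 4]]
squared = recode(values, lambda v: (2 * v + 3) ** 2)

# Recoding nominal values uses a lookup table, not arithmetic.
soils = [[3, 5], [5, 7]]
soil_group = {3: 1, 5: 1, 7: 2}          # hypothetical grouping of soil codes
grouped = recode(soils, soil_group.get)

# Overlay of two 0/1 layers with a Boolean "and".
forest = [[1, 0], [1, 1]]
steep = [[0, 0], [1, 1]]
steep_forest = overlay(forest, steep, lambda a, b: 1 if a and b else 0)
print(squared, grouped, steep_forest)
```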
D. OPERATIONS ON LOCAL NEIGHBORHOODS
• the value of a pixel on the new layer is determined by the local neighborhood of
the pixel on the old layer
Filtering
• a filter operates by moving a "window" across the entire raster
– e.g. many windows are 3x3 cells
• the new value for the cell at the middle of the window is a weighted average of
the values in the window
• by changing the weights we can produce two major effects:
– smoothing (a "low pass" filter, removes or reduces local detail)
– edge enhancement (a "high pass" filter, exaggerates local detail)
• weights should add to 1
• filters can be useful in enhancing detail on images for input to GIS, or smoothing
layers to expose general trends
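A 3x3 moving window can be sketched as follows. The two weight sets are common textbook choices rather than anything prescribed by these notes, and both add to 1; edge cells, where the window would fall off the raster, are simply copied unchanged, which is one of several possible edge treatments.

```python
def filter3x3(layer, weights):
    """Move a 3x3 window over the raster; each interior cell becomes a weighted sum."""
    rows, cols = len(layer), len(layer[0])
    out = [row[:] for row in layer]        # edge cells keep their original values
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = sum(weights[i][j] * layer[r - 1 + i][c - 1 + j]
                            for i in range(3) for j in range(3))
    return out

# Smoothing: every cell in the window gets equal weight (weights add to 1).
LOW_PASS = [[1 / 9] * 3 for _ in range(3)]
# Edge enhancement: centre emphasised, neighbours subtracted (weights still add to 1).
HIGH_PASS = [[-1 / 9, -1 / 9, -1 / 9],
             [-1 / 9, 17 / 9, -1 / 9],
             [-1 / 9, -1 / 9, -1 / 9]]

spike = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
smoothed = filter3x3(spike, LOW_PASS)
sharpened = filter3x3(spike, HIGH_PASS)
print(smoothed[1][1], sharpened[1][1])
```

The low pass filter spreads the isolated value of 9 down to about 1, while the high pass filter exaggerates it to about 17. Because the weights add to 1, a completely uniform layer passes through either filter unchanged.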
Slopes and aspects
• if the values in a layer are elevations, we can compute the steepness of slopes by
looking at the difference between a pixel's value and those of its adjacent
neighbors
• the direction of steepest slope, or the direction in which the surface is locally
"facing", is called its aspect
• slope and aspect are useful in analyzing vegetation patterns, computing energy
balances and modeling erosion or runoff
– aspect determines the direction of runoff; this can be used to sketch drainage paths for runoff
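Slope and aspect for one interior cell can be sketched with simple differences between its four adjacent neighbors. The conventions here are assumptions for the example: row 0 is the northern edge, columns run west to east, and aspect is reported as a compass bearing (degrees clockwise from north) of the downhill direction. Many packages use a weighted 3x3 window instead.

```python
import math

def slope_aspect(dem, r, c, cell):
    """Return (slope in degrees, aspect as a compass bearing or None if flat)."""
    dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell)   # rise toward the east
    dz_dy = (dem[r - 1][c] - dem[r + 1][c]) / (2 * cell)   # rise toward the north
    gradient = math.hypot(dz_dx, dz_dy)
    slope = math.degrees(math.atan(gradient))
    if gradient == 0:
        return slope, None                 # flat cell: aspect is undefined
    # The surface "faces" downhill, i.e. opposite to the gradient.
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360
    return slope, aspect

# Hypothetical 3x3 elevation grids with 10 m cells.
rises_east = [[0, 10, 20],
              [0, 10, 20],
              [0, 10, 20]]
rises_north = [[20, 20, 20],
               [10, 10, 10],
               [0, 0, 0]]
print(slope_aspect(rises_east, 1, 1, 10), slope_aspect(rises_north, 1, 1, 10))
```

A surface rising toward the east faces west (bearing 270), and one rising toward the north faces south (bearing 180); runoff would follow those bearings.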
E. OPERATIONS ON EXTENDED NEIGHBORHOODS
Distance
• calculate the distance of each cell from a cell or the nearest of several cells
– each pixel's value in the new layer is its distance from the given cell(s)
Buffer zones
• buffers around objects and features are very useful GIS capabilities
– e.g. build a logging buffer 500 m wide around all lakes and watercourses
• buffer operations can be visualized as spreading the object spatially by a given distance
• the result could be a layer with values: 1 if in the original selected object, 2 if in the buffer,
0 if outside both object and buffer
• applications include noise buffers around roads, safety buffers around hazardous facilities
• in many programs the buffer operation requires the user to first do a distance operation, then a
reclassification of the distance layer
• the rate of spreading may be modified by another layer representing "friction"
– e.g. the friction layer could represent varying cost of travel
– this will affect the width of the buffer - narrow in areas of high friction, etc.
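The two-step procedure (distance operation, then reclassification) can be sketched as a spreading operation over the grid. This is one possible approach, not the method of any particular package: it spreads through the four adjacent neighbors only, so with uniform friction it gives city-block rather than straight-line distance, and the cost of a step is taken as the mean friction of the two cells involved. All layer values are invented.

```python
import heapq

def cost_distance(sources, friction, cell=1.0):
    """Spread outward from the source cells; each step costs cell * mean friction."""
    rows, cols = len(friction), len(friction[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        dist[r][c] = 0.0
        heap.append((0.0, r, c))
    heapq.heapify(heap)
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue                       # stale entry, a shorter route was found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                step = cell * (friction[r][c] + friction[nr][nc]) / 2
                if d + step < dist[nr][nc]:
                    dist[nr][nc] = d + step
                    heapq.heappush(heap, (d + step, nr, nc))
    return dist

def buffer_from_distance(dist, width):
    """Reclassify: 1 = original object, 2 = inside buffer, 0 = outside."""
    return [[1 if d == 0 else 2 if d <= width else 0 for d in row] for row in dist]

source = [(0, 3)]
uniform = [[1, 1, 1, 1, 1, 1, 1]]
rough_east = [[1, 1, 1, 1, 3, 3, 3]]      # hypothetical friction layer

plain = buffer_from_distance(cost_distance(source, uniform), 2)
with_friction = buffer_from_distance(cost_distance(source, rough_east), 2)
print(plain, with_friction)
```

With uniform friction the buffer extends two cells either side of the object; with higher friction to the east it reaches only one cell on that side.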
Visible area or "viewshed"
• given a layer of elevations, and one or more viewpoints, compute the area visible from at least one
viewpoint
– useful for planning locations of unsightly facilities such as smokestacks, or surveillance facilities such
as fire towers, or transmission facilities
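A very simple line-of-sight test gives the flavour of a viewshed calculation. The sketch below is a simplification under stated assumptions: it samples the terrain at the nearest cell along a straight line between viewer and target, ignores earth curvature, and uses an invented eye_height parameter and a one-row elevation layer.

```python
def visible(dem, viewer, target, eye_height=0.0):
    """True if no cell between viewer and target rises above the sight line."""
    (r0, c0), (r1, c1) = viewer, target
    z0 = dem[r0][c0] + eye_height
    z1 = dem[r1][c1]
    steps = max(abs(r1 - r0), abs(c1 - c0))
    for k in range(1, steps):              # intermediate samples only
        t = k / steps
        r = round(r0 + t * (r1 - r0))
        c = round(c0 + t * (c1 - c0))
        sight_z = z0 + t * (z1 - z0)       # height of the sight line at this sample
        if dem[r][c] > sight_z:
            return False
    return True

def viewshed(dem, viewpoints, eye_height=0.0):
    """1 where a cell is visible from at least one viewpoint, else 0."""
    return [[1 if any(visible(dem, vp, (r, c), eye_height) for vp in viewpoints) else 0
             for c in range(len(dem[0]))]
            for r in range(len(dem))]

ridge = [[0, 0, 5, 0, 0]]                  # hypothetical profile with a ridge in the middle
from_west = viewshed(ridge, [(0, 0)], eye_height=1.0)
from_both = viewshed(ridge, [(0, 0), (0, 4)], eye_height=1.0)
print(from_west, from_both)
```

From the western viewpoint the ridge itself is visible but hides the two cells behind it; adding a second viewpoint on the eastern side makes every cell visible from at least one of them.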
F. OPERATIONS ON ZONES
RASTER – VECTOR DEBATE
• arguments about which was better have been
commonplace since the earliest systems were created
• raster databases are appealing
– simplicity of organization
– speed of many operations, e.g. overlay, buffers
– especially appealing to the remote sensing community who are
used to "pixel" processing
• on the other hand, there are many situations in which the
raster approach may appear to sacrifice too much detail
– cartographers were appalled by the crude outlines of parcels
that resulted in the "pinking shear" effect of diagonal
boundaries represented by grid cell edges
– surveyors were dismayed by the "inaccuracy" caused by the cells when
portraying linear features and points
– situations in which the raster approach sacrificed too much detail
• however, computing times for overlaying vector based information can be
excessive
– early polygon overlay routines were error-prone, expensive, slow
• today, there are situations in which it is clear that one approach is more
functional than the other
– e.g. using "friction" layer to control width of buffer is only feasible in raster
– e.g. viewshed algorithms to find area visible from a point are feasible with
elevation grids (raster DEMs), not with digitized contours
– e.g. land survey data can only be represented with precise lines
• an important current trend involves linking raster and vector systems,
displaying vector data overlying a raster base
– raster data may be from a GIS file (perhaps a remotely sensed image) or from
a plain scanned image file
• therefore, the question has evolved from "Which is best?" to "Under what
conditions is which best and how can we have flexibility to use the most
appropriate approaches on a case by case basis?"
Basic issues
• four issues are central to the discussion of raster versus
vector:
– coordinate precision
– speed of analytical processing
– mass storage requirements
– characteristics of phenomena
CHAPTER 5
GIS DATA CAPTURE TECHNIQUES
5.1 INTRODUCTION
• Since the input of attribute data is usually quite simple, the discussion of
data input techniques will be limited to spatial data only.
• need to have tools to transform spatial data of various types into digital
format
• data input is a major bottleneck in application of GIS technology
– costs of input often consume 80% or more of project costs
– data input is labor intensive, tedious, error-prone
– there is a danger that construction of the database may become an end in
itself and the project may not move on to analysis of the data collected
– essential to find ways to reduce costs, maximize accuracy
• need to automate the input process as much as possible, but:
– automated input often creates bigger editing problems later
– source documents (maps) may often have to be redrafted to meet rigid quality
requirements of automated input
• because of the costs involved, much research has gone into devising
better input methods - however, few reductions in cost have been realized
• sharing of digital data is one way around the input
bottleneck
– more and more spatial data is becoming available in digital
form
• data input to a GIS involves encoding both the
locational and attribute data
• the locational data is encoded as coordinates on a
particular cartesian coordinate system
– source maps may have different projections, scales
– several stages of data transformation may be needed to
bring all data to a common coordinate system
• attribute data is often obtained and stored in tables
• The choice of data input method is governed largely by
the application, the available budget, and the type and
the complexity of data being input.
Modes of data input
• keyboard entry for non-spatial attributes and occasionally locational
data
• manual locating devices
– user directly manipulates a device whose location is recognized by the
computer
– e.g. digitizing
• automated devices
– automatically extract spatial data from maps and photography
– e.g. scanning
• conversion directly from other digital sources
• voice input has been tried, particularly for controlling digitizer
operations
– not very successful - machine needs to be recalibrated for each
operator, after coffee breaks, etc.
5.2 DIGITIZING
• digitizers are the most common device for extracting spatial information from
maps and photographs
Hardware
• the position of an indicator as it is moved over the surface of the digitizing tablet is
detected by the computer and interpreted as pairs of x,y coordinates
– the indicator may be a pen-like stylus or a cursor (a small flat plate the size of a hockey puck
with a cross-hair)
• frequently, there are control buttons on the cursor which permit control of the
system without having to turn attention from the digitizing tablet to a computer
terminal
• contemporary tablets use a grid of wires embedded in the tablet to generate a
magnetic field which is detected by the cursor
– accuracies are typically better than 0.1 mm
– this is better than the accuracy with which the average operator can position the cursor
– functions for transforming coordinates are sometimes built into the tablet and used to process
data before it is sent to the host
The digitizing operation
• the map is affixed to a digitizing table
• three or more control points ("reference points", "tics", etc.) are digitized for each
map sheet
– these will be easily identified points (intersections of major streets, major peaks, points on
coastline)
– the coordinates of these points will be known in the coordinate system to be used in the final
database, e.g. lat/long, State Plane Coordinates, military grid
– the control points are used by the system to calculate the necessary mathematical
transformations to convert all coordinates to the final system
– the more control points, the better
• digitizing the map contents can be done in two different modes:
– in point mode, the operator identifies the points to be captured explicitly by pressing a button
– in stream mode points are captured at set time intervals (typically 10 per second) or on
movement of the cursor by a fixed amount
• advantages and disadvantages:
– in point mode the operator selects points subjectively
• two operators working in point mode will not code a line in the same way
– stream mode generates large numbers of points, many of which may be redundant
– stream mode is more demanding on the user while point mode requires some judgement
about how to represent the line
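The role of the control points can be illustrated with an affine transformation, which is one common choice for converting tablet coordinates to map coordinates (the notes do not say which transformation a given system uses). With exactly three control points the six coefficients can be solved directly; with more points a least-squares fit would normally be used instead. The control point coordinates below are invented.

```python
def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution

def fit_affine(tablet_pts, map_pts):
    """Return a function mapping tablet (x, y) to map (X, Y) from three control points."""
    m = [[x, y, 1.0] for x, y in tablet_pts]
    a, b, c = solve3(m, [p[0] for p in map_pts])   # X = a*x + b*y + c
    d, e, f = solve3(m, [p[1] for p in map_pts])   # Y = d*x + e*y + f
    return lambda x, y: (a * x + b * y + c, d * x + e * y + f)

# Hypothetical control points: tablet inches -> map grid metres.
tablet = [(0, 0), (10, 0), (0, 10)]
ground = [(1000, 2000), (1500, 2000), (1000, 2500)]
to_map = fit_affine(tablet, ground)
print(to_map(5, 5))
```

Three points fit the transformation exactly and leave no check on blunders, which is one reason why more control points are better.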
Problems with digitizing maps
• arise since most maps were not drafted for the purpose of digitizing
– paper maps are unstable: each time the map is removed from the digitizing
table, the reference points must be re-entered when the map is affixed to the
table again
– if the map has stretched or shrunk in the interim, the newly digitized points
will be slightly off in their location when compared to previously digitized
points
– errors occur on these maps, and these errors are entered into the GIS
database as well
– the level of error in the GIS database is directly related to the error level of the
source maps
• maps are meant to display information, and do not always accurately
record locational information
– for example, when a railroad, stream and road all go through a narrow
mountain pass, the pass may actually be depicted wider than its actual size to
allow for the three symbols to be drafted in the pass
• discrepancies across map sheet boundaries can cause discrepancies in the
total GIS database
– e.g. roads or streams that do not meet exactly when two map sheets are
placed next to each other
DIGITIZING ERRORS AND EDITING
• Overshoot
• Undershoot
• Pseudo node
• Open polygon
• Dangling arc
• Dangling node
• Gap
• Sliver
• No-node arc intersection
Editing errors from digitizing
• some errors can be corrected automatically
– small gaps at line junctions
– overshoots and sudden spikes in lines
• error rates depend on the complexity of the map and are high for small-scale, complex
maps
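Automatic correction of small gaps at line junctions is often done by snapping: an endpoint that falls within a tolerance of an existing node is moved onto that node. The sketch below is a minimal illustration of that idea, not the procedure of any particular package; the function name, the tolerance and the two digitized lines are assumptions.

```python
import math

def snap_endpoints(lines, tolerance):
    """Move each line endpoint onto an earlier endpoint lying within the tolerance."""
    nodes = []                             # endpoints accepted so far
    snapped = []
    for line in lines:
        new_line = list(line)
        for idx in (0, len(new_line) - 1):
            x, y = new_line[idx]
            for nx, ny in nodes:
                if math.hypot(x - nx, y - ny) <= tolerance:
                    new_line[idx] = (nx, ny)   # close the gap
                    break
            else:
                nodes.append((x, y))           # no node nearby: this is a new node
        snapped.append(new_line)
    return snapped

# Two hypothetical digitized lines that should meet at (10, 0) but miss slightly.
digitized = [[(0, 0), (10, 0)], [(10.05, 0.02), (10, 10)]]
cleaned = snap_endpoints(digitized, tolerance=0.1)
print(cleaned)
```

The tolerance has to be chosen with care: too small and gaps remain, too large and features that are genuinely separate are pulled together.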
Advantages of digitizing
• Manual digitizing has many advantages. These include:
– Low capital cost, e.g. digitizing tables are cheap;
– Low cost of labour;
– Flexibility and adaptability to different data types and
sources;
– Easily taught in a short amount of time - an easily
mastered skill
– Generally the quality of data is high;
– Digitizing devices are very reliable and most often offer a
greater precision than the data warrants; and
– Ability to easily register and update existing data.
5.3 AUTOMATIC SCANNING
• A variety of scanning devices exist for the automatic capture of spatial
data.
• All have the advantage of being able to capture spatial features from a
map rapidly.
• Most scanning devices have limitations with respect to the capture of
selected features, e.g. text and symbol recognition.
• Experience has shown that most scanned data requires a substantial
amount of manual editing to create a clean data layer.
• Given these basic constraints some other practical limitations of scanners
should be identified. These include :
– hard copy maps are often unable to be removed to where a scanning device is
available
– hard copy data may not be in a form that is viable for effective scanning, e.g.
maps are of poor quality, or are in poor condition;
– geographic features may be too few on a single map to make it practical, or
cost-justifiable, to scan;
– often on busy maps a scanner may be unable to distinguish the features to be
captured from the surrounding graphic information, e.g. dense contours with
labels;
– with raster scanning it is difficult to read unique labels (text) for a
geographic feature effectively; and
– scanning is much more expensive than manual digitizing, considering all the
cost/performance issues.
Requirements for scanning
• documents must be clean (no smudges or extra markings)
• lines should be at least 0.1 mm wide
• complex line work provides greater chance of error in
scanning
• text may be accidentally scanned as line features
• contour lines cannot be broken with text
• automatic feature recognition is not easy (two contour lines
vs. road symbols)
• special symbols (e.g. marsh symbols) must be recognized
and dealt with
• if good source documents are available, scanning can be an
efficient time saving mode of data input
CRITERIA FOR CHOOSING MODES OF INPUT
• the type of data source
– images favor scanning
– maps can be scanned or digitized
• the database model of the GIS
– scanning easier for raster, digitizing for vector
• the density of data
– dense linework makes for difficult digitizing
• expected applications of the GIS
implementation
5.4 RASTERIZATION AND VECTORIZATION
Rasterization of digitized data
• for some data, entry in vector form is more efficient, followed by
conversion to raster
• we might digitize the county boundary in vector form by
– mounting a map on a digitizing table
– capturing the locations of points along the boundary
– assuming that the points are connected by straight line segments
• this may produce an ASCII file of pairs of xy coordinates which must then
be processed by the GIS, or the output of the digitizer may go directly into
the GIS
• the vector representation of the boundary as points is then converted to a
raster by an operation known as vector-raster conversion
– the computer calculates which county each cell is in using the vector
representation of the boundary and outputs a raster
• digitizing the boundary is much less work than cell by cell entry
• most raster GIS have functions such as vector-raster conversion to support
vector entry
– many support digitizing and editing of vector data
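Vector-raster conversion can be sketched by testing the centre of every cell against each digitized boundary. This is a minimal illustration that assumes the boundary points are joined by straight segments, as above; the county outlines, the grid size and the function names are invented, and production systems use faster scan-line methods.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test against a boundary stored as a list of (x, y) points."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(polygons, rows, cols, cell=1.0, nodata=0):
    """Give each cell the code of the polygon that contains the cell centre."""
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            x = (c + 0.5) * cell
            y = (rows - r - 0.5) * cell    # row 0 is the top (northern) row
            value = nodata
            for code, boundary in polygons.items():
                if point_in_polygon(x, y, boundary):
                    value = code
                    break
            row.append(value)
        grid.append(row)
    return grid

# Two hypothetical counties digitized as boundary points.
counties = {
    1: [(0, 0), (2, 0), (2, 4), (0, 4)],
    2: [(2, 0), (4, 0), (4, 2), (2, 2)],
}
raster = rasterize(counties, rows=4, cols=4)
for row in raster:
    print(row)
```

Twelve boundary points stand in for sixteen cell values here, and the saving grows quickly as the cell size shrinks, which is why digitizing the boundary is much less work than cell by cell entry.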
Vectorization of scanned images
• for many purposes it is necessary to extract features and objects from a
scanned image
– e.g. a road on the input document will have produced characteristic values in
each of a band of pixels
– if the scanner has pixels of 25 microns = 0.025 mm, a line of width 0.5 mm will
create a band 20 pixels across
– the vectorized version of the line will be a series of coordinate points joined by
straight lines, representing the road as an object or feature instead of a
collection of contiguous pixels
• successful vectorization requires a clean line scanned from media free of
cluttering labels, coffee stains, dust etc.
– to create a sufficiently clean line, it is often necessary to redraft input
documents
• e.g. the Canada Geographic Information System redrafted each of its approximately
10,000 input documents
• since the scanner can be color sensitive, vectorizing may be aided by the
use of special inks for certain features
• although scanning is much less labor intensive, problems with
vectorization lead to costs which are often as high as manual digitizing
– two stages of error correction may be necessary:
• 1. edit the raster image prior to vectorization
• 2. edit the vectorized features
CHAPTER 6
FUTURE OF GIS
6.1 INTRODUCTION
• The development and application of geographic information
systems is vibrant and exciting.
• The term GIS remains one of the most popular buzz words in the
computer industry today. GIS is perceived as one of the emerging
technologies in the computer marketplace.
• Everybody wants a GIS.
• GIS is very much a multi-disciplinary tool for the management of
spatial data.
• It is inherently complex because of the need to integrate data from
a variety of sources.
• Functions must accommodate several application areas in a
detailed and efficient manner.
• A variety of important developments are occurring which will have
profound effects on the use of GIS.
Hardware
• Fast geoprocessing
• Parallel Processing
• Memory
• Workstations
• Networks
• Hardware for specialized processing functions
• Operating systems
• Peripheral devices
• Specialized workstations
Software
• Database management systems
• Relational DBMSs
• DBMS versus Fourth Generation Languages
• GIS system integration
• Display products
• Interfaces to other technologies
• User interfaces
New sources of Data
• Remote sensing
• Global Positioning Systems
• Error/uncertainty
• Data sharing
New Application areas of GIS technology
• Modeling and decision support
• Sciences and mathematics