Survey
MIDLANDS STATE UNIVERSITY
DEPARTMENT OF SURVEYING AND GEOMATICS
SVG210 GEOGRAPHIC INFORMATION SYSTEMS I
LECTURE NOTES
D NJIKE
FEBRUARY 2010
SVG210 GIS I NOTES

MODULE OUTLINE
1. INTRODUCTION
   i. Definition of GIS
   ii. Contributing Disciplines
   iii. Areas of Applications
   iv. History of GIS
2. FUNDAMENTALS OF GIS
   i. GIS Components
   ii. GIS Subsystems
   iii. Characteristics of geographic features
3. GIS AS MODELS OF REALITY
   i. Spatial objects
   ii. Relationships of spatial objects
   iii. Representation of spatial objects
4. SPATIAL DATA MODELS
   i. Vector data model
   ii. Raster data model
   iii. Capabilities of Raster and Vector GIS
   iv. The Raster/Vector data Debate
5. GIS DATA CAPTURE TECHNIQUES
   i. Sources of data
   ii. Modes of spatial data entry
   iii. Vectorisation
   iv. Rasterisation
6. FUTURE OF GIS

CHAPTER 1 INTRODUCTION

1.1 INTRODUCTION
Definition of GIS
Several working definitions of GIS have been proposed. All definitions focus on:
• Data
• Users
• Hardware
• Software
• Methods/purpose
Let's look at what the words themselves mean.

a) Geography
The science which has for its object the description of the earth's surface, treating of its form and physical features, its natural and political divisions, the climate, productions, populations etc. of the various countries. It is frequently divided into mathematical, physical and political geography.
b) Data
The collected raw facts about data objects, the fundamental indivisible object in an application.
c) Information
The derived relationship between data sets/items; the implicit functional association between data in an application.
d) System
An organized or connected group of objects. A set or assemblage of things connected, associated or interdependent, so as to form a complex unity. A whole composed of parts in orderly arrangement according to some scheme or plan.
A group of connected entities and activities which interact for a common purpose; for example, a car is a system in which all the components operate together to provide transportation.

• an Information System is a set of processes, executed on raw data, to produce information which will be useful in decision-making
  – a chain of steps leads from observation and collection of data through analysis
  – an information system must have a full range of functions to achieve its purpose, including observation, measurement, description, explanation, forecasting, decision-making
• a Geographic Information System uses geographically referenced data as well as non-spatial data and includes operations which support spatial analysis
  – in GIS, the common purpose is decision-making, for managing use of land, resources, transportation, retailing, oceans or any spatially distributed entities
  – the connection between the elements of the system is geography, e.g. location, proximity, spatial distribution
• in this context GIS can be seen as a system of hardware, software and procedures designed to support the capture, management, manipulation, analysis, modeling and display of spatially-referenced data for solving complex planning and management problems
  – although many other computer programs can use spatial data (e.g. AutoCAD and statistics packages), GISs include the additional ability to perform spatial operations

A Geographic Information System is a complex arrangement of associated or connected things or objects, whose purpose is to communicate knowledge about features on the surface of the earth. We can expand this definition to include features above and below the surface of the earth.

Some other definitions
• An information system that is designed to work with data referenced by spatial or geographic coordinates.
In other words, a GIS is both a system with specific working capabilities for spatially-referenced data, as well as a set of operations for working [analysis] with the data. (Star and Estes, 1990)
• A working GIS integrates five key components: hardware, software, data, people and methods (ESRI 1997)
• A system for capturing, storing, checking, integrating, manipulating, analyzing and displaying data which are spatially referenced to the earth.
• Automated systems for the capture, storage, retrieval, analysis and display of spatial data. (Clarke 1990)
• A system of hardware, software, and procedures designed to support the capture, management, manipulation, analysis, modeling and display of spatially-referenced data for solving complex planning and management problems (NCGIA lecture by David Cowen 1989)
• An integrated package for the input, storage, analysis, and output of spatial information ... analysis being the most significant (Gaile and Willmott 1989)
• GIS are simultaneously the telescope, the microscope, the computer, and the xerox machine of regional analysis and synthesis of spatial data (Abler, 1988)

OTHER RELATED TERMS
CAD – Computer Aided Design
  – Non-spatial data
  – Cannot perform analysis
AUTOCAD – Automated CAD
  – Mapping software
  – Can handle spatial data
  – Cannot perform analysis
CAM – Computer Aided Mapping
  – Stores maps in automated form
AM/FM – Automated Mapping/Facility Management
  – Specialized systems to manage facilities like water pipes, electric lines, telephone lines etc.
CADASTRE – multi-purpose cadastres
  – manage information on land parcels
Many packages can now perform many of the same operations as a GIS. What distinguishes GIS from these is:
  – the ability to integrate geo-referenced data
  – the ability to perform analysis like buffering
  – the ability to create topology

Why is GIS important?
• "GIS technology is to geographical analysis what the microscope, the telescope, and computers have been to other sciences....
(It) could therefore be the catalyst needed to dissolve the regional-systematic and human-physical dichotomies that have long plagued geography" and other disciplines which use spatial information.
• GIS integrates spatial and other kinds of information within a single system - it offers a consistent framework for analyzing geographical data
• by putting maps and other kinds of spatial information into digital form, GIS allows us to manipulate and display geographical knowledge in new and exciting ways
• GIS makes connections between activities based on geographic proximity
  – looking at data geographically can often suggest new insights, explanations
  – these connections are often unrecognized without GIS, but can be vital to understanding and managing activities and resources
  – e.g. we can link toxic waste records with school locations through geographic proximity
• GIS allows access to administrative records - property ownership, tax files, utility cables and pipes - via their geographical positions

1.2 CONTRIBUTING DISCIPLINES AND TECHNOLOGIES
• GIS is a convergence of technological fields and traditional disciplines
• GIS has been called an "enabling technology" because of the potential it offers for the wide variety of disciplines which must deal with spatial data
• each related field provides some of the techniques which make up GIS
  – many of these related fields emphasize data collection - GIS brings them together by emphasizing integration, modeling and analysis
• as the integrating field, GIS often claims to be the science of spatial information

Geography
• broadly concerned with understanding the world and man's place in it
• long tradition in spatial analysis
• provides techniques for conducting spatial analysis and a spatial perspective on research
Cartography
• concerned with the display of spatial information
• currently the main source of input data for GIS is maps
• provides long tradition in the design of maps which is an
important form of output from GIS
• computer cartography (also called "digital cartography", "automated cartography") provides methods for digital representation and manipulation of cartographic features and methods of visualization
Remote Sensing
• images from space and the air are a major source of geographical data
• remote sensing includes techniques for data acquisition and processing anywhere on the globe at low cost, with consistent update potential
• many image analysis systems contain sophisticated analytical functions
• interpreted data from a remote sensing system can be merged with other data layers in a GIS
Photogrammetry
• using aerial photographs and techniques for making accurate measurements from them, photogrammetry is the source of most data on topography (ground surface elevations) used for input to GIS
Surveying
• provides high quality data on positions of land boundaries, buildings, etc.
Geodesy
• source of high accuracy positional control for GIS
Statistics
• many models built using GIS are statistical in nature, and many statistical techniques are used for analysis
• statistics is important in understanding issues of error and uncertainty in GIS data
Operations Research
• many applications of GIS require use of optimizing techniques for decision-making
Computer Science
• computer-aided design (CAD) provides software, techniques for data input, display and visualization, representation, particularly in 3 dimensions
• advances in computer graphics provide hardware, software for handling and displaying graphic objects, techniques of visualization
• database management systems (DBMS) contribute methods for representing data in digital form, procedures for system design and handling large volumes of data, particularly access and update
• artificial intelligence (AI) uses the computer to make choices based on available data in a way that is seen to emulate human intelligence and decision-making
  – computer
can act as an "expert" in such functions as designing maps, generalizing map features
  – although GIS has yet to take full advantage of AI, AI already provides methods and techniques for system design
Mathematics
• several branches of mathematics, especially geometry and graph theory, are used in GIS system design and analysis of spatial data
Civil Engineering
• GIS has many applications in transportation, urban engineering

1.3 MAJOR AREAS OF PRACTICAL APPLICATION
Street network-based
• address matching - finding locations given street addresses
• vehicle routing and scheduling
• location analysis, site selection
• development of evacuation plans
Natural resource-based
• management of wild and scenic rivers, recreation resources, floodplains, wetlands, agricultural lands, aquifers, forests, wildlife
• Environmental Impact Analysis (EIA)
• Viewshed analysis
• hazardous or toxic facility siting
• groundwater modeling and contamination tracking
• wildlife habitat analysis, migration routes planning
Land parcel-based
• zoning, subdivision plan review
• land acquisition
• environmental impact statements
• water quality management
• maintenance of ownership
Facilities management
• locating underground pipes, cables
• balancing loads in electrical networks
• planning facility maintenance
• tracking energy use

1.4 HISTORY OF GIS
A. INTRODUCTION
• GIS has evolved out of a long tradition of map making
• development of GIS was influenced by:
  – key groups, companies and individuals
  – timely development of key concepts
• outside North America, significant developments occurred at the Experimental Cartography Unit in the UK
  – history of this group has been documented by Rhind (1988)

B.
HISTORIC USE OF MULTIPLE THEME MAPS
• idea of portraying different layers of data on a series of base maps, and relating things geographically, has been around much longer than computers
  – maps of the Battle of Yorktown (American Revolution) drawn by the French cartographer Louis-Alexandre Berthier contained hinged overlays to show troop movements
  – the mid-19th century "Atlas to Accompany the Second Report of the Irish Railway Commissioners" showed population, traffic flow, geology and topography superimposed on the same base map
  – Dr. John Snow used a map showing the locations of deaths by cholera in central London in September 1854 to track the source of the outbreak to a contaminated well - an early example of geographical analysis

C. EARLY COMPUTER ERA
• several factors caused a change in cartographic analysis:
  – computer technology - improvements in hardware, esp. graphics
  – development of theories of spatial processes in economic and social geography, anthropology, regional science
  – increasing social awareness, education levels and mobility, awareness of environmental problems
• integrated transportation plans of 1950s and 60s in Detroit, Chicago
  – required integration of transportation information - routes, destinations, origins, time
  – produced maps of traffic flow and volume
• University of Washington, Department of Geography, research on advanced statistical methods, rudimentary computer programming, computer cartography, most active 1958-61:
  – Nystuen - fundamental spatial concepts - distance, orientation, connectivity
  – Tobler - computer algorithms for map projections, computer cartography
  – Bunge - theoretical geography - geometric basis for geography - points, lines and areas
  – Berry's Geographical Matrix of places by characteristics (attributes) - regional studies by overlaying maps of different themes - systematic studies by detailed evaluation of a single layer

D.
CANADA GEOGRAPHIC INFORMATION SYSTEM (CGIS)
• Canada Geographic Information System is an example of one of the earliest GISs developed, started in the mid-1960s
• is a large scale system still operating today
• its development provided many conceptual and technical contributions
Purpose
• to analyze the data collected by the Canada Land Inventory (CLI) and to produce statistics to be used in developing land management plans for large areas of rural Canada
• the CLI created maps which:
  – classify land using various themes:
      soil capability for agriculture
      recreation capability
      capability for wildlife (ungulates)
      capability for wildlife (waterfowl)
      forestry capability
      present land use
      shoreline
  – were developed at map scales of 1:50,000
  – use a simple rating scheme, 1 (best) to 7 (poorest), with detailed qualification codes, e.g. on soils may indicate bedrock, shallow soil, alkaline conditions
• product of CLI was 7 primary map layers, each showing area objects with homogeneous attributes
  – other map layers were developed subsequently, e.g.
census reporting zones
• perception was that computers could perform analyses once the data had been input

Technological innovations
• CGIS required the development of new technology
  – no previous experience in how to structure data internally
  – no precedent for GIS operations of overlay, area measurement
  – experimental scanner had to be built for map input
• very high costs of technical development
  – cost-benefit studies done to justify the project were initially convincing
  – major cost over-runs
  – analysis behind schedule
• by 1970 project was in trouble
  – failure to deliver promised tabulations, capabilities
• completion of database, product generation under way by mid 1970s
  – main product was statistical summaries of the area with various combinations of themes
  – later enhancement allowed output of simple maps
• CGIS still highly regarded in late 1970s, early 1980s as center of technological excellence despite aging of database
  – attempts were made to adapt the system to new data
  – new functionality added, especially networking capability and remote access
  – however, this was too late to compete with the new vendor products of 1980s

Key innovative ideas in CGIS
• use of scanning for input of high density area objects
  – maps had to be redrafted (scribed) for scanning
  – note: scribing is as labor intensive as digitizing
• vectorization of scanned images
• geographical partitioning of data into "map sheets" or "tiles" but with edge matching across tile boundaries
• partitioning of data into themes or layers
• use of absolute system of coordinates for entire country with precision adjustable to resolution of data
  – number of digits of precision can be set by the system manager and changed from layer to layer
• internal representation of line objects as chains of incremental moves in 8 compass directions rather than straight lines between points (Freeman chain code)
• coding of area object boundaries by arc,
with pointers to left and right area objects
  – first "topological" system with planar enforcement in each layer, relationships between arcs and areas coded in the database
• separation of data into attribute and locational files
  – "descriptor dataset" (DDS) and "image dataset" (IDS)
  – concept of an attribute table
• implementation of functions for polygon overlay, measurement of area, user-defined circles and polygons for query

Key individual
• Roger Tomlinson, now with Tomlinson Associates, Ottawa

E. HARVARD LABORATORY
• full name - Harvard Laboratory For Computer Graphics And Spatial Analysis
• Howard Fisher moved from Chicago to establish a lab at Harvard, initially to develop general-purpose mapping software - mid 1960s
• Harvard Lab for Computer Graphics and Spatial Analysis had major influence on the development of GIS until early 1980s, still continues at smaller scale
• Harvard software was widely distributed and helped to build the application base for GIS
• many pioneers of newer GIS "grew up" at the Harvard lab

The Harvard packages
• SYMAP
  – developed as general-purpose mapping package beginning in 1964
  – output exclusively on line printer
      • poor resolution, low quality
  – limited functionality but simple to use
      • a way for the non-cartographer to make maps
  – first real demonstration of ability of computers to make maps
  – sparked enormous interest in a previously unheard-of technology
• CALFORM (late 1960s)
  – SYMAP on a plotter
  – user avoided double-coding of internal boundaries by inputting a table of point locations, plus a set of polygons defined by sequences of point IDs
  – more cosmetic than SYMAP - North arrows, better legends
• SYMVU (late 1960s)
  – 3D perspective views of SYMAP output
  – first new form of display of spatial data to come out of a computer
• GRID (late 1960s)
  – raster cells could be displayed using the same output techniques as SYMAP
  – later developed to allow multiple
input layers of raster cells, beginnings of raster GIS
  – used to implement the ideas of overlay from landscape architecture and McHarg
• POLYVRT (early 1970s)
  – converted between various alternative ways of forming area objects:
      SYMAP - every polygon separately, internal boundaries twice
      CALFORM - table of point locations plus lists of IDs
      DIME - see below
  – motivated by need of computer mapping packages for flexible input, transfer of boundary files between systems, growing supply of data in digital form, e.g. from Bureau of the Census
• ODYSSEY (mid 1970s)
  – extended POLYVRT idea beyond format conversion to a comprehensive analysis package based on vector data
  – first robust, efficient algorithm for polygon overlay - included sliver removal

Key individuals
• Howard Fisher - initiated Lab, development of SYMAP
• William Warntz - succeeded Fisher as Director until 1971, developed techniques, theories of spatial analysis based on computer handling of spatial data
• Scott Morehouse - move to ESRI was key link between ODYSSEY and the development of ARC/INFO
• see Chrisman (1988) for additional information on the Lab and its key personnel

F. BUREAU OF THE CENSUS
• need for a method of assigning census returns to correct geographical location
  – address matching to convert street addresses to geographic coordinates and census reporting zones
  – with geographic coordinates, data could be aggregated to user-specified custom reporting zones
• need for a comprehensive approach to census geography
  – reporting zones are hierarchically related
  – e.g.
enumeration districts nest within census tracts
• 1970 was the first geocoded census
• DIME files were the major component of the geocoding approach

DIME files
• precursor to TIGER, urban areas only
• coded street segments between intersections using
  – IDs of right and left blocks
  – IDs of from and to nodes (intersections)
  – x,y coordinates
  – address ranges on each side
• this is essentially the arc structure of CGIS and the internal structure (common denominator format) of POLYVRT
• DIME files were very widely distributed and used as the basis for numerous applications
• topological ideas of DIME were refined into TIGER model
  – planar enforcement
  – 0-, 1- and 2-cell terminology
• DIME, TIGER were influential in stimulating development work on products which rely on street network databases
  – automobile navigation systems
  – driver guides to generate text driving instructions (e.g. auto rental agencies)
  – garbage truck routing
  – emergency vehicle dispatching

Urban atlases
• beginning with the 1970 census
• production of "atlases" of computer-generated maps for selected census variables for selected cities
• demonstrated the value of simple computer maps for marketing, retailing applications
  – stimulated development of current range of PC-based statistical mapping packages
• based on use of digital boundary files produced by the Bureau

G.
ESRI
• Jack Dangermond founded Environmental Systems Research Institute in 1969 based on techniques, ideas being developed at Harvard Lab and elsewhere
• 1970s period of slow growth based on various raster and vector systems
• early 1980s release of ARC/INFO
  – successful implementation of CGIS idea of separate attribute and locational information
  – successful marriage of standard relational database management system (INFO) to handle attribute tables with specialized software to handle objects stored as arcs (ARC)
      • a basic design which has been copied in many other systems
  – "toolbox", command-driven, product-oriented user interface
      • modular design allowed elaborate applications to be built on top of toolbox
• ARC/INFO was the first GIS to take advantage of new super-mini hardware
  – GIS could now be supported by a platform which was affordable to many resource management agencies
  – emphasis on independence from specific platforms, operating systems
• initial successes in forestry applications, later diversification to many GIS markets
  – expansion to $40 million company by 1988

CHAPTER 2 FUNDAMENTALS OF GIS

2.1 COMPONENTS OF GIS
A GIS can be divided into five components: People, Data, Hardware, Software, and Procedures. All of these components need to be in balance for the system to be successful. No one part can run without the others.

[Figure: Components of GIS]

People
People are the component that actually makes the GIS work. They fill a wide range of positions, including GIS managers, database administrators, application specialists, systems analysts, and programmers. They are responsible for maintenance of the geographic database and provide technical support. People also need to be educated to make decisions on what type of system to use. People associated with a GIS can be categorized into: viewers, general users, and GIS specialists.
– Viewers are the public at large whose only need is to browse a geographic database for referential material. These constitute the largest class of users.
– General Users are people who use GIS to conduct business, perform professional services, and make decisions. They include facility managers, resource managers, planners, scientists, engineers, lawyers, business entrepreneurs, etc.
– GIS specialists are the people who make the GIS work. They include GIS managers, database administrators, application specialists, systems analysts, and programmers. They are responsible for the maintenance of the geographic database and the provision of technical support to the other two classes of users. (Lo, 2002)

Hardware
Hardware consists of the technical equipment needed to run a GIS, including a computer system with enough power to run the software, enough memory to store large amounts of data, and input and output devices such as scanners, digitizers, GPS data loggers, media disks, and printers. (Carver, 1998)

Software
There are many different GIS software packages available today. All packages must be capable of data input, storage, management, transformation, analysis, and output, but the appearance, methods, resources, and ease of use of the various systems may be very different. Today's software packages allow both graphical and descriptive data to be stored in a single database, an approach known as the object-relational model. Before this innovation, the geo-relational model was used. In this model, graphical and descriptive data sets were handled separately. The modern packages usually come with a set of tools that can be customized to the user's needs (Lo, 2002).

Applications
The application of geographic information to real world problem solving is the crux of any GIS.
Whether the application is simple data tracking and storage or complex multidimensional analysis, a GIS should be designed with the potential applications in mind. This technology can be an important tool for coastal resource managers. Applications could include water quality monitoring, mapping of coral reefs or oyster beds, shoreline monitoring, land use planning, and hazard mitigation.

Data
Perhaps the most time consuming and costly aspect of initiating a GIS is creating a database. There are several things to consider before acquiring geographic data. It is crucial to check the quality of the data before obtaining it. Errors in the data set can add many unpleasant and costly hours to implementing a GIS, and the results and conclusions of the GIS analysis will most likely be wrong. Several guidelines to look at include:
• Lineage – This is a description of the source material from which the data were derived, and the methods of derivation, including all transformations involved in producing the final digital files. This should include all dates of the source material and updates and changes made to it. (Guptill, 1995)
• Positional Accuracy – This is the closeness of an entity in an appropriate coordinate system to that entity's true position in the system. The positional accuracy includes measures of the horizontal and vertical accuracy of the features in the data set. (Guptill, 1995)
• Attribute Accuracy – An attribute is a fact about some location, set of locations, or features on the surface of the earth. This information often includes measurements of some sort, such as temperature or elevation, or a label such as a place name. The source of error usually lies within the collection of these facts. It is vital to the analysis aspects of a GIS that this information be accurate.
• Logical Consistency – Deals with the logical rules of structure and attribute rules for spatial data and describes the compatibility of a datum with other data in a data set. There are several different mathematical theories and models used to test logical consistency such as metric and incidence tests, topological and order related tests. These consistency checks should be run at different stages in the handling of spatial data. (Guptill, 1995)
• Completeness – This is a check to see if relevant data is missing with regards to the features and the attributes. This could deal with either omission errors or spatial rules such as minimum width or area that may limit the information. (Guptill, 1995) (Chrisman, 1999)

The new concept
• For years the Components of GIS have been the fundamental guidelines for geographers worldwide. Hardware, software, people, data, and procedures have been drilled into students in all universities and served as the basis for managers in planning and building a GIS. While other occupations are aged, tested, and refined, the field of GIS is young and growing. In just a short time, the world has come to recognize GIS as a tool of unlimited potential. The new geographic tool has rejuvenated other professions that were once set in their ways. As a result, there has been a merger between GIS and other professions.
• With any new partnership comes change. In the case of GIS, it is necessary to develop specialty knowledge in addition to the basic geographic concepts. The GIS technology is evolving as well. Users are no longer proficient in just one type of software. Instead, GIS professionals are able to maneuver through several software packages and are fluent in at least one programming language. There is a vast amount of new data and the need for data management systems. Moreover, people who use GIS are finding new ways to portray the data and analyses to the end user.
• It is important, as the profession evolves, to redefine the most basic of its components. These components should describe or outline the foundation from which the entire field is built.

THE PROPOSED SIX COMPONENTS OF GIS
I: Core Geographic Ideas
Currently, people are one of the components of GIS. But what is it about people that makes them a component? It is the knowledge that people carry that makes them important. In order to properly function as a GIS professional it is necessary to have at least a general knowledge of geography and GIS. This knowledge is the nucleus of the new components.

II: Technology
Software and hardware were thought to be separate components. Instead, they should be combined and placed in a group of technological necessities in today's GIS. They are accompanied by system administration, database administration, programming, and GPS units.
• GIS Software may include products from ESRI, MapInfo, Intergraph, AutoCAD, or GRASS. There are many other companies making GIS products as well. As GIS evolves, technicians and analysts are finding themselves using software from more than one maker. Gone are the days of being proficient in only one GIS application.
• As the needs of interoperability rise and the popularity of open source software strengthens, so do the needs of GIS professionals to know programming and scripting languages. As more people use the Internet to share data, there is also a call for additional knowledge in web design programming. With the data comes the need for database administration and design using Oracle and SQL. And to bring it all together, the GIS professional must be able to perform basic hardware maintenance and have an understanding of networks and servers.
• Collecting new, unique and specialized data has become an integral part of GIS. To do this, fieldwork is often required.
GPS knowledge is essential in gathering new data, along with surveying concepts and other forms of data collection.

III: Data
This component should not change. Data includes any information that is spatial or tabular that relates to geography and specialty fields. They may include parcels, crime statistics, or tornado paths. The greatest thing is that these data could be anything and everything. However, the quality and accuracy of data is important and should always be considered. Metadata, or the data about data, has proven to be very important as more data becomes available.

IV: Specialty Fields
The other half of the current component, people, is the knowledge professionals have acquired in fields other than geography. More than ever, there is a need for GIS users to be able to make specific queries and analyses that can only be conceptualized with a specialized knowledge base. With this merger of GIS and specialty fields comes the ability to further understand the results from such queries and analyses. As a result, GIS Specialists are learning specialized and focused fields while people in specialized fields are learning to use GIS. For example, it will soon be difficult for a professional who specialized in transportation GIS to transfer to forestry GIS without first obtaining the specialized training.

V: Procedures / Methods
This component does not need much modification, with the exception that it binds GIS with the specialty fields. No longer can management focus solely on GIS; it must consider the plans, models, and organization of both GIS and the specialty field. Managers need to describe new and unique ways to use the data and technology, and then present the results to the end user.

VI: GeoVisualization
A new and very important proposed component of GIS is the presentation of data to the end user.
This representation of space and time can take the form of maps, graphs, charts, animations, and simulations. Visualizing aspects in 3D or using creative symbology gives users new perspectives and can enable a higher level of communication. The World Wide Web has created a forum for interactive mapping that allows the user to request desired data without any influence from the author.
• GeoVisualization in GIS is synonymous with influence, and how data is presented can impact decision making and planning. All data has a story to tell and can be manipulated in unethical ways. GeoVisualization, or communication, has become a vital component of GIS because of its ethical and influential roles.

2.2 GIS INTERRELATED SUBSYSTEMS

Data Processing Subsystem
• data acquisition - from maps, images or field surveys
• data input - data must be input from source material to the digital database
• data storage - how often is it used, how should it be updated, is it confidential?

Data Analysis Subsystem
• retrieval and analysis - may be simple responses to queries, or complex statistical analyses of large sets of data
• information output - how to display the results? As maps or tables? Or will the information be fed into some other digital system?

Information Use Subsystem
• users may be researchers, planners, managers
• interaction is needed between the GIS group and users to plan analytical procedures and data structures

Management Subsystem
• organizational role - the GIS section is often organized as a separate unit within a resource management agency (cf.
the Computer Center at many universities) offering spatial database and analysis services
• staff - include System Manager, Database Manager, System Operator, System Analysts, Digitizer Operators
  – a typical resource management agency GIS center might have a staff of 5-7
• procedures - extensive interaction is needed between the GIS group and the rest of the organization if the system is to function effectively

2.3 CHARACTERISTICS OF GEOGRAPHIC FEATURES

INTRODUCTION
• the objects in a spatial database are representations of real-world entities with associated descriptions
• the power of a GIS comes from its ability to look at entities in their geographical context and examine relationships between entities
• thus a GIS database is much more than a collection of objects
• objects in the real world can be represented as either:
  – Points
  – Lines
  – Areas/polygons
• The choice depends largely on scale and purpose.

POINT FEATURES
• the simplest type of spatial object
• the choice of entities to be represented as points depends on the scale of the map/study
  – on a large scale map - building structures as point locations
  – on a small scale map - cities as point locations
• the coordinates of each point can be stored as two additional attributes
• information on a set of points can be viewed as an extended attribute table
  – each row is a point - all information about the point is contained in the row
  – each column is an attribute
  – two of the columns are the coordinates
  – northing and easting represent the y and x coordinates
• each point is independent of every other point, represented as a separate row in the database model
• Examples: boreholes, schools, cities

LINE FEATURES
• Features with length
• Start on a point and end on a different point
• One dimension
• Examples: rivers, roads

AREA/POLYGON FEATURES
• Features with two dimensions, length and
width
• Bounded by a series of connected line segments
• Start and end on the same point
• Examples: countries, dams, schools, cities

CHAPTER 3
GIS AS MODELS OF REALITY

3.1 SPATIAL OBJECTS

Identification of Spatial Objects
Spatial objects are defined by 4 major components:
• Spatial data
• Attribute data
• Relationships
• Time

Spatial Data
• Each feature has a location that must be specified in a unique way
• Data that defines the location
• Answers the question: where is it?
• Location is recorded in terms of coordinates
• These are in the form of graphic primitives that are usually either points, lines or areas/polygons
• This is the graphical representation of the data
• Normally defined by
  – a map
  – coordinates
• All data sets that will be used together should have a common coordinate system.
  – enables transformations to be carried out

[Figure: Boreholes in Mago Farm]

Non-spatial data (Attribute data)
• The descriptive data about features
• Data that describes the features
• Answers the questions:
  – What is it?
  – When?
  – How much?
• Attributes are normally stored in tables called attribute tables
• Attribute tables are linked to the spatial data
• In the attribute table:
  – each object corresponds to a row of the table
  – each characteristic/field or theme corresponds to a column of the table
  – thus the table shows the thematic and some of the spatial modes
• Data stored in attribute tables can be numeric, alphabetic, or alphanumeric
• Can have multiple tables for different types of objects and themes

Attribute table

NAME   ID    DEPTH  CAPACITY  Y        X
RWIZI  BH01  60     8000      3566.89  7791.23
CHURU  BH02  80     10000     3669.67  8100.65
GOMO   BH03  95     7000      4900.98  7784.75

Time (temporal data)
• Spatial objects change with time
• It is important to indicate when the data was captured
• the temporal mode can be captured in several ways
  – by specifying the interval of time over which an object exists
  – by capturing information at certain points in time
  – by specifying the rates of movement of objects
• depending on how the temporal mode is captured, it may be included in a single attribute table, or be represented by a series of attribute tables on the same objects through time

Scales of measurement
• numerical values may be defined with respect to nominal, ordinal, interval, or ratio scales of measurement
• it is important to recognize the scales of measurement used in GIS data, as this determines the kinds of mathematical operations that can be performed on the data
• the different scales can be demonstrated using an example of a marathon race:

1. Nominal
• on a nominal scale, numbers merely establish identity
  – e.g. a phone number signifies only the unique identity of the phone
• in the race, the numbers issued to racers, which are used to identify individuals, are on a nominal scale
  – these identity numbers do not indicate any order or relative value in terms of the race outcome

2.
Ordinal
• on an ordinal scale, numbers establish order only
  – phone number 9618224 is not more of anything than 9618049, so phone numbers are not ordinal
• in the race, the finishing places of each racer, i.e. 1st place, 2nd place, 3rd place, are measured on an ordinal scale
  – however, we do not know how much time difference there is between each racer

3. Interval
• on interval scales, the difference (interval) between numbers is meaningful, but the numbering scale does not start at an absolute 0
  – subtraction makes sense but division does not
  – e.g. it makes sense to say that 20°C is 10 degrees warmer than 10°C, so Celsius temperature is an interval scale, but 20°C is not twice as warm as 10°C
  – e.g. it makes no sense to say that the phone number 9680244 is 62195 more than 9618049, so phone numbers are not measurements on an interval scale
• in the race, the time of day at which each racer finished is measured on an interval scale
  – if the racers finished at 9:10 GMT, 9:20 GMT and 9:25 GMT, then racer 1 finished 10 minutes before racer 2, and the difference between racers 1 and 2 is twice the difference between racers 2 and 3
  – however, the racer finishing at 9:10 GMT did not finish twice as fast as the racer finishing at 18:20 GMT

4. Ratio
• on a ratio scale, measurement has an absolute zero and the difference between numbers is significant
  – division makes sense
  – e.g. it makes sense to say that a 50 kg person weighs half as much as a 100 kg person, so weight in kg is on a ratio scale
  – the zero point of weight is absolute but the zero point of the Celsius scale is not
• in our race, the first place finisher finished in a time of 2:30, the second in 2:40, and the 450th place finisher took 5 hours
  – the 450th finisher took twice as long as the first place finisher (5/2.5 = 2)
• note that these distinctions, though important, are not always clearly defined
  – is elevation interval or ratio?
if the local base level is 750 feet, is a mountain at 2000 feet twice as high as one at 1000 feet when viewed from the valley?
• many types of geographical data used in GIS applications are nominal or ordinal
  – values establish the order of classes, or their distinct identity, but rarely intervals or ratios
• thus you cannot:
  – multiply soil type 2 by soil type 3 and get soil type 6
  – divide urban area by the rank of a city to get a meaningful number
  – subtract suitability class 1 from suitability class 4 to get 3 of anything
• however, you can:
  – divide population by area (both ratio scales) and get population density
  – subtract elevation at point a from elevation at point b and get the difference in elevation

3.2 RELATIONSHIPS OF SPATIAL OBJECTS
• Spatial objects are related to each other in different ways.
• There are millions of relationships between spatial objects
• The power of GIS is its ability to store relationships among objects
• This is a unique feature of GIS
• The relationship is called topology
• Relationships are important in analysis
  – e.g. the "is contained in" relationship between a point and an area is important in relating objects to their surrounding environment
  – e.g. the "intersects" relationship between two lines is important in analyzing routes through networks
• relationships can exist between entities of the same type or of different types
  – e.g. for each shopping center, can find the nearest shopping center (same type)
  – e.g.
for each customer, can find the nearest shopping center (different types)
• Examples of relationships
  – ‘is within’
  – ‘is nearest to’
  – ‘is contained in’

3.3 REPRESENTATION OF SPATIAL OBJECTS

INTRODUCTION
• maps are the main source of data for GIS
• the traditions of cartography are fundamentally important to GIS
• GIS has roots in the analysis of information on maps, and overcomes many of the limitations of manual analysis
• this unit is about cartography and its relationship to GIS - how does GIS differ from cartography, particularly automated cartography, which uses computers to make maps?

WHAT IS A MAP?
• according to the International Cartographic Association, a map is:
  – a representation, normally to scale and on a flat medium, of a selection of material or abstract features on, or in relation to, the surface of the Earth

Cartographic abstraction
• production of a map requires:
  – selection of a few features from the real world
  – classification of selected features into groups (e.g. bridges, churches, railways)
  – simplification of jagged lines like coastlines
  – exaggeration of features to be included that are too small to show at the scale of the map
  – symbolization to represent the different classes of features chosen

Types of maps
• in practice we normally think of two types of map:
• topographic map - a reference tool, showing the outlines of selected natural and man-made features of the Earth
  – often acts as a frame for other information
  – "Topography" refers to the shape of the surface, represented by contours and/or shading, but topographic maps also show roads and other prominent features
• thematic map - a tool to communicate geographical concepts such as the distribution of population densities, climate, movement of goods, land use etc.

Thematic maps in GIS
• several types of thematic map are important in GIS:
• a choropleth map uses reporting zones such as counties or census tracts to show data such as average incomes, percent female, or rates of mortality
  – the boundaries of the zones are established independently of the data, and may be used to report many different sets of data
• an area class map shows zones of constant attributes, such as vegetation, soil type, or forest species
  – the boundaries are different for each map as they are determined by the variation of the attribute being mapped, e.g. breaks of soil type may occur independently of breaks of vegetation
• an isopleth map shows an imaginary surface by means of lines joining points of equal value, "isolines" (e.g. contours on a topographic map)
  – used for phenomena which vary smoothly across the map, such as temperature, pressure, rainfall or population density

Line maps versus photo maps
• an important distinction for GIS is between a line map and a photo map
• a line map shows features by conventional symbols or by boundaries
• a photo map is derived from a photographic image taken from the air
  – features are interpreted by the eye as it views the map
  – certain features may be identified by overprinting labels
  – photomaps are relatively cheap to make but are rarely completely free of distortions

AUTOMATED AND COMPUTER-ASSISTED CARTOGRAPHY

Changeover to computer mapping
• personalities were critically important in the 1960s and early 1970s - individual interests determined the direction and focus of research and development in computer cartography (see Rhind, 1988)
• impetus for change began in two communities:
1. scientists wishing to make maps quickly to see the results of modeling, or to display data from large archives already in digital form, e.g. census tables
  – SYMAP was the first significant package for this purpose, released by the Harvard Lab in 1967
2.
cartographers seeking to reduce the cost and time of map production and editing
  – hardware costs limited interest in this technology prior to 1980 to the major mapping agencies
  – the costs of computing have dropped dramatically, by an order of magnitude every six years
  – an early belief that the entire map-making process could be automated diminished by 1975 because of difficulties of generalization and design
  – this belief has resurfaced in the context of Expert Systems, where the computer chooses the proper techniques based on characteristics of the data, scale, map purpose, etc.
• today, far more maps are made by computer than by hand
  – now few mapmakers are trained cartographers
• also, it is now clear that once created, digital data can serve purposes other than mapmaking, so it has additional value

Advantages of computer cartography
• lower cost for simple maps, faster production
• greater flexibility in output - easy scale or projection change - maps can be tailored to user needs
• other uses for digital data

Disadvantages of computer cartography
• relatively few full-scale systems have been shown to be truly cost-effective in practice, despite early promise
• high capital cost, though this is now much reduced
• computer methods do not ensure production of maps of high quality
  – there is a perceived loss of regard for the "cartographic tradition", with the consequent production of "cartojunk"

GIS and Computer Cartography
• computer cartography has a primary goal of producing maps
  – systems have advanced tools for map layout, placement of labels, large symbol and font libraries, interfaces for expensive, high quality output devices
• however, it is not an analytical tool
  – therefore, unlike data for GIS, cartographic data does not need to be stored in ways which allow, for example, analysis of relationships between different themes such as population density and housing prices, or the routing of flows along connecting highway or river segments

GIS COMPARED TO MAPS

Data stores
• spatial data stored in digital format in a GIS allows for rapid access for traditional as well as innovative purposes
• the nature of maps creates difficulties when they are used as sources for digital data
  – most GIS take no account of differences between datasets derived from maps at different scales
  – idiosyncrasies (e.g. generalization procedures) in maps become "locked in" to the data derived from them
  – such errors often become apparent only during later processing of digital data derived from them
• however, maps still remain an excellent way of compiling spatial information, e.g. field survey
  – maps can be designed to be easy to convert to digital form, e.g. by the use of different colors which have distinct signatures when scanned by electronic sensors
• as well, maps can be produced by GISs as cheap, high density stores of information for the end user
  – however, consistent, accurate retrieval of data from maps is difficult
  – only limited amounts of data can be shown due to constraints of the paper medium

Data indexes
• this function (indexing data) can be performed much better by a good GIS due to the ability to provide multiple and efficient cross-referencing and searching

Data analysis tools
• GIS is a powerful tool for map analysis
  – traditional impediments to the accurate and rapid measurement of area or to map overlay no longer exist
  – many new techniques in spatial analysis are becoming available

Data display tools
• electronic display offers significant advantages over the paper map
  – ability to browse across an area without interruption by map sheet boundaries
  – ability to zoom and change scale freely
  – potential for the animation of time dependent data
  – display in "3 dimensions" (perspective views), with "real-time" rotation of viewing angle
  – potential for continuous scales of intensity and the use of color and shading independent of the constraints of the printing process, ability
to change colors as required for interpretation
• one of a kind, special purpose products are possible and inexpensive

CHAPTER 4
SPATIAL DATA MODELS

4.1 INTRODUCTION
• Traditionally spatial data has been stored and presented in the form of a map.
• Three basic types of spatial data models have evolved for storing geographic data digitally.
• These are referred to as:
  – Vector data model
  – Raster data model
  – Image
• The following diagram reflects the two primary spatial data encoding techniques.
• These are vector and raster.
• Image data utilizes techniques very similar to raster data, but typically lacks the internal formats required for analysis and modeling of the data.
• Images reflect pictures or photographs of the landscape.

4.2 VECTOR DATA MODEL
• The vector data model is in the form of the normal map.
• On a map, objects are represented as either points, lines, or areas
• Vector storage implies the use of vectors (directional lines) to represent a geographic feature.
• Vector data is characterized by the use of sequential points or vertices to define a linear segment.
• Each vertex consists of an X coordinate and a Y coordinate.
• Vector lines are often referred to as arcs and consist of a string of vertices terminated by a node.
• A node is defined as a vertex that starts or ends an arc segment.
• Point features are defined by one coordinate pair, a vertex.
• Polygonal features are defined by a set of closed coordinate pairs.
• In vector representation, the storage of the vertices for each feature is important, as well as the connectivity between features, e.g. the sharing of common vertices where features connect.

[Figure: Vector data model]

Vector models
There are different models to store and manage vector information. Each of them has different advantages and disadvantages.
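
The definitions in Section 4.2 can be sketched with plain coordinate lists. This is only an illustration and not the storage format of any particular GIS package: the names `Vertex`, `arc_nodes` and `is_closed` are made up for the example, the borehole coordinates are those of RWIZI in the attribute table of Section 3.1, and the road and farm coordinates are hypothetical.

```python
from typing import List, Tuple

Vertex = Tuple[float, float]  # one (x, y) coordinate pair


def arc_nodes(arc: List[Vertex]) -> Tuple[Vertex, Vertex]:
    """Return the two nodes of an arc: the vertices that start and end it."""
    if len(arc) < 2:
        raise ValueError("an arc needs at least two vertices")
    return arc[0], arc[-1]


def is_closed(ring: List[Vertex]) -> bool:
    """A polygon boundary is closed when it starts and ends on the same vertex."""
    return len(ring) >= 4 and ring[0] == ring[-1]


# Point feature: a single coordinate pair (x = easting, y = northing).
borehole: Vertex = (7791.23, 3566.89)

# Line feature (arc): a string of vertices terminated by a node at each end.
road: List[Vertex] = [(1.0, 3.0), (1.8, 2.6), (2.8, 3.0), (3.3, 4.0), (3.2, 5.2)]

# Polygon feature: a closed set of coordinate pairs (hypothetical farm boundary).
farm: List[Vertex] = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]

start_node, end_node = arc_nodes(road)
```

Storing every feature as its own coordinate list, as here, records no connectivity between features; shared vertices are simply repeated in each list.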
• list of coordinates "spaghetti" (figure 5)
• vertex dictionary (figure 6)
• Dual Independent Map Encoding (DIME) (figure 7)
• arc / node (figure 8)

List of coordinates
• simple
• easy to manage
• no topology
• lots of duplication, hence need for large storage space
• very often used in CAC (computer assisted cartography)

ARC/NODE structure

File 1. Coordinates of nodes and vertices for all the arcs

ARC  F_node    Vertex                      T_node
1    3.2, 5.2  1, 5.2                      1, 3
2    1, 3      1.8, 2.6  2.8, 3  3.3, 4    3.2, 5.2
3    1, 2      3.5, 2  4.2, 2.7            5.2, 2.7

File 2. Arcs topology

ARC  F_node  T_node  R_poly    L_poly
1    1       2       External  A
2    2       1       A         External
3    3       4       External  External

File 3. Polygons topology

Polygon  Arcs
A        1, 2

File 4. Nodes topology

Node  Arcs
1     1, 2
2     1, 2
3     3
4     4
5     5

• The fundamental primitive is a point
• Points are identified as nodes
• Nodes are created where lines intersect

Advantages of vector data model
• Data can be represented at its original resolution and form without generalization.
• Graphic output is usually more aesthetically pleasing (traditional cartographic representation).
• Since most data, e.g. hard copy maps, is in vector form, no data conversion is required.
• Accurate geographic location of data is maintained.
• Allows for efficient encoding of topology, and as a result more efficient operations that require topological information, e.g. proximity, network analysis.

Disadvantages of vector data
• The location of each vertex needs to be stored explicitly.
• For effective analysis, vector data must be converted into a topological structure. This is often processing intensive and usually requires extensive data cleaning. As well, topology is static, and any updating or editing of the vector data requires re-building of the topology.
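
As a rough illustration of what re-building topology involves, the sketch below derives a node table and a polygon table from the arc table alone. The arc records are copied from File 2 above; the function names and the dictionary layout are assumptions made for this example, not the file format of any GIS package.

```python
from collections import defaultdict

# Arc topology copied from File 2: arc -> (F_node, T_node, R_poly, L_poly)
ARCS = {
    1: (1, 2, "External", "A"),
    2: (2, 1, "A", "External"),
    3: (3, 4, "External", "External"),
}


def build_node_topology(arcs):
    """Map each node to the sorted list of arcs that start or end on it."""
    nodes = defaultdict(set)
    for arc_id, (f_node, t_node, _right, _left) in arcs.items():
        nodes[f_node].add(arc_id)
        nodes[t_node].add(arc_id)
    return {node: sorted(ids) for node, ids in sorted(nodes.items())}


def build_polygon_topology(arcs, outside="External"):
    """Map each polygon to the sorted list of arcs that bound it."""
    polygons = defaultdict(set)
    for arc_id, (_f_node, _t_node, right, left) in arcs.items():
        for poly in (right, left):
            if poly != outside:  # the outside area is not a stored polygon
                polygons[poly].add(arc_id)
    return {poly: sorted(ids) for poly, ids in sorted(polygons.items())}


node_topology = build_node_topology(ARCS)
polygon_topology = build_polygon_topology(ARCS)
```

For these three arcs the polygon table agrees with File 3, and the rows for nodes 1 to 3 agree with File 4. File 4 also refers to arcs 4 and 5, which File 2 does not list, so the remaining rows are not comparable. Both tables have to be derived again whenever an arc is added, removed or edited, which is the cost the last bullet describes.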
• Algorithms for manipulative and analysis functions are complex and may be processing intensive. Often, this inherently limits the functionality for large data sets, e.g. a large number of features.
• Continuous data, such as elevation data, is not effectively represented in vector form. Usually substantial data generalization or interpolation is required for these data layers.
• Spatial analysis and filtering within polygons is impossible.

THE RASTER DATA MODEL
• Raster data models incorporate the use of a grid-cell data structure where the geographic area is divided into cells identified by row and column.
• This data structure is commonly called raster.
• Each grid cell is called a pixel (derived from the two words "picture element").
• While the term raster implies a regularly spaced grid, other tessellated data structures do exist.
• The size of cells in a tessellated data structure is selected on the basis of the data accuracy and the resolution needed by the user.
• There is no explicit coding of geographic coordinates required, since that is implicit in the layout of the cells.
• A raster data structure is in fact a matrix where any coordinate can be quickly calculated if the origin point and the size of the grid cells are known.
• Since grid cells can be handled as two-dimensional arrays in computer encoding, many analytical operations are easy to program. This makes tessellated data structures a popular choice for many GIS software packages.
• Topology is not a relevant concept with tessellated structures, since adjacency and connectivity are implicit in the location of a particular cell in the data matrix.
• Several tessellated data structures exist, but only two are commonly used in GISs.
• The most popular cell structure is the regularly spaced matrix or raster structure. This data structure involves a division of spatial data into regularly spaced cells.
• Each cell is of the same shape and size.
Squares are most commonly utilized.
• Geographic data is rarely distinguished by regularly spaced shapes.
• The problem of determining the proper resolution for a particular data layer can be a concern.
• If one selects too coarse a cell size, then data may be overly generalized.
• If one selects too fine a cell size, then too many cells may be created, resulting in a large data volume, slower processing times, and a more cumbersome data set.

[Figure: GIS MAP Structure - RASTER systems (Adapted from Berry)]

Raster data model cont’d
• Most raster based GIS software requires that the raster cell contain only a single discrete value.
• Accordingly, a data layer, e.g. forest inventory stands, may be broken down into a series of raster maps, each representing an attribute type, e.g. a species map, a height map, a density map, etc.
• These are often referred to as one attribute maps.
• Raster data storage provides the foundation for quantitative analysis techniques.
• This is often referred to as raster or map algebra.
• The use of raster data structures allows for sophisticated mathematical modelling processes, while vector based systems are often constrained by the capabilities and language of a relational DBMS.

Advantages of Raster data model
• The geographic location of each cell is implied by its position in the cell matrix. Accordingly, other than an origin point, e.g. bottom left corner, no geographic coordinates are stored.
• Due to the nature of the data storage technique, data analysis is usually easy to program and quick to perform.
• The inherent nature of raster maps, e.g. one attribute maps, is ideally suited for mathematical modeling and quantitative analysis.
• Discrete data, e.g. forestry stands, is accommodated as well as continuous data, e.g. elevation data, which facilitates the integration of the two data types.
• Grid-cell systems are very compatible with raster-based output devices, e.g.
electrostatic plotters, graphic terminals.

Disadvantages of Raster data model
• The cell size determines the resolution at which the data is represented.
• It is especially difficult to adequately represent linear features, depending on the cell resolution. Accordingly, network linkages are difficult to establish.
• Processing of associated attribute data may be cumbersome if large amounts of data exist. Raster maps inherently reflect only one attribute or characteristic for an area.
• Since most input data is in vector form, data must undergo vector-to-raster conversion. Besides increased processing requirements, this may introduce data integrity concerns due to generalization and the choice of an inappropriate cell size.

• It is important to understand that the selection of a particular data structure can provide advantages during the analysis stage.
• For example, the vector data model does not handle continuous data, e.g. elevation, very well, while the raster data model is better suited to this type of analysis.
• Conversely, the raster structure does not handle linear data analysis, e.g. shortest path, very well, while vector systems do.
• It is important for the user to understand that there are certain advantages and disadvantages to each data model.
• The selection of a particular data model, vector or raster, is dependent on the source and type of data, as well as the intended use of the data.
• Certain analytical procedures require raster data while others are better suited to vector data.

4.4 CAPABILITIES OF VECTOR AND RASTER GIS

CAPABILITIES OF VECTOR GIS

A.
INTRODUCTION
• analysis functions with vector GIS are not quite the same as with raster GIS
  – more operations deal with objects
  – measures such as area have to be calculated from coordinates of objects, instead of counting cells
• some operations are more accurate
  – estimates of area based on polygons are more accurate than counts of pixels
  – estimates of the perimeter of a polygon are more accurate than counting pixel boundaries on the edge of a zone
• some operations are slower
  – e.g. overlaying layers, finding buffers
• some operations are faster
  – e.g. finding a path through a road network

B. SIMPLE DISPLAY AND QUERY

Display
• using points and "arcs", can display the locations of all objects stored
• attributes and entity types can be displayed by varying colors, line patterns and point symbols
• may only want to display a subset of the data
  – e.g. want to display areas of urban land use with some base map data
• select all political boundaries and highways, but only areas that have urban land uses
• how would the user do this?
  – e.g. one of the layers in a database is a "map" of land use, called USE
  – area objects on this layer have several attributes
  – one attribute, called CLASS, identifies the area's land use
  – for urban land use, it has the value "U"
  – need to extract boundaries for all areas that have CLASS="U"

Structured Query Language (SQL)
• different systems use different ways of formulating queries
• Structured Query Language (SQL) is used by many systems
• SQL operators:
  – relational: >, <, =, >=, <=
  – arithmetic: +, -, *, / (only on numeric fields)
  – Boolean: and, or, not

Boolean operators
• used to combine conditions
  – e.g. WHERE cumgrade > 3.0 AND grade = "A" (selects only students satisfying both conditions)
• Boolean operators can have a spatial meaning in GIS as well
  – e.g.
when two maps are overlaid, areas (polygons) that are superimposed have the "and" condition
• a spatial representation is used to illustrate Boolean operators in the study of logic, through the use of diagrams called Venn diagrams
  – thus GIS area overlay is a geographical instance of a Venn diagram
  – "XOR" is the "exclusive or" - A xor B means A or B but not both

SQL extensions for spatial queries
• some systems allow specifically spatial queries to be handled under SQL, e.g. the WITHIN operator
• SELECT <objects> WITHIN <specific area>
• the criteria for these spatial searches may include searching within the radius of a point, within a bounding rectangle, or within an irregular polygon

C. RECLASSIFY, DISSOLVE AND MERGE
• reclassify, dissolve and merge operations are used frequently in working with area objects
  – these are used to aggregate areas based on attributes
• consider a soils map:
  – we wish to produce a map of major soil types from a layer that has polygons based on a much more finely defined classification scheme

Steps
• 1. reclassify areas by a single attribute or some combination
  – e.g. reclassify soil areas by soil type only
• 2. dissolve boundaries between areas of the same type
  – by deleting the arc between two polygons if the relevant attributes are the same in both polygons
• 3. merge polygons into large objects
  – recode the sequence of line segments that connect to form the boundary (i.e. rebuild topology)
  – assign new ID #'s to each new object

D.
TOPOLOGICAL OVERLAY
• suppose individual layers have planar enforcement (required in many systems, not all)
• when two layers are combined ("overlaid", "superimposed") the result must have planar enforcement as well
  – a new intersection must be calculated and created wherever two lines cross
  – a line across an area object creates two new area objects
• topological overlay is the general name for overlay followed by planar enforcement
• relationships are updated for the new, combined map
• the result may be information about relationships (new attributes) for the old (input) maps rather than the creation of new objects
  – e.g. overlay a map of school districts on census tracts
• the result is a map showing every school district/census tract combination
• for each combination, the database contains an area object
• however, the concern may be with obtaining the number of overlapping census tracts as a new attribute of each school district rather than with the new objects themselves

E. BUFFERING
• a buffer can be constructed around a point, line or area
  – buffering creates a new area, enclosing the buffered object
• applications in transportation, forestry, resource management
  – protected zone around lakes and streams
  – zone of noise pollution around highways
  – service zone around bus route (e.g. 300 m walking distance)
  – groundwater pollution zone around waste site
• options available for raster, such as a "friction" layer, do not exist for vector
• buffering is much more difficult in vector from the point of view of the programmer
• sometimes, the width of the buffer can be determined by an attribute of the object
  – e.g.
    buffering residential buildings away from a street network:
    • three types of street (1, 2, 3 or major, secondary, tertiary) with the setbacks being 600 feet from a major street, 200 feet from a secondary street, and only 100 feet from a tertiary street
• problems with buffer operations may occur when buffering very convoluted lines or areas
CAPABILITIES OF RASTER GIS
A. INTRODUCTION
• a raster GIS must have capabilities for:
  – input of data
  – various housekeeping functions
  – operations on layers, like recode, overlay and spread
  – output of data and results
• the range of possible functions is enormous; current raster GISs only scratch the surface
  – because the range is so large, some have tried to organize functions into a consistent scheme, but no scheme has been widely accepted yet
  – this section covers a selection of the most useful and common functions
• each raster GIS uses different names for the functions
B. DISPLAYING LAYERS
Basic display
• the simplest type of values to display are integers
  – on a color display each integer value can be assigned a unique color
• if the values have a natural order we will want the sequence of colors to make sense
  – e.g.
    elevation is often shown on a map using the sequence blue-green-yellow-brown-white for increasing elevation
• there should be a legend explaining the meaning of each color
• on a dot matrix printer shades of grey can be generated by varying the density of dots
Other types of display
• it may be appropriate to display the data as a surface
• contours can be "threaded" through the pixels along lines of constant value
• the surface can be shown in an oblique, perspective view
  – this can be done by drawing profiles across the raster with each profile offset and hidden lines removed
  – the surface might be colored using the values in a second layer (a second layer can be "draped" over the surface defined by the first layer)
  – the result can be very effective
  – these operations are also computer-intensive because of the calculations necessary to simulate perspective and remove hidden lines
C. LOCAL OPERATIONS
• produce a new layer from one or more input layers
• the value of each new pixel is defined by the values of the same pixel on the input layer(s)
• note: arithmetic operations make no sense unless the values have appropriate scales of measurement (see Unit 6)
  – you cannot find the "average" of soil types 3 and 5, nor is soil 5 "greater than" soil 3
Recoding
• using only one input layer
• some systems allow a full range of mathematical operations
  – e.g. newvalue = (2*oldvalue + 3)^2
Overlaying layers
• an overlay occurs when the output value depends on two or more input layers
  – many systems restrict overlay to two input layers only
D. OPERATIONS ON LOCAL NEIGHBORHOODS
• the value of a pixel on the new layer is determined by the local neighborhood of the pixel on the old layer
Filtering
• a filter operates by moving a "window" across the entire raster
  – e.g.
    many windows are 3x3 cells
• the new value for the cell at the middle of the window is a weighted average of the values in the window
• by changing the weights we can produce two major effects:
  – smoothing (a "low pass" filter, removes or reduces local detail)
  – edge enhancement (a "high pass" filter, exaggerates local detail)
• weights should add to 1
• filters can be useful in enhancing detail on images for input to GIS, or smoothing layers to expose general trends
Slopes and aspects
• if the values in a layer are elevations, we can compute the steepness of slopes by looking at the difference between a pixel's value and those of its adjacent neighbors
• the direction of steepest slope, or the direction in which the surface is locally "facing", is called its aspect
• slope and aspect are useful in analyzing vegetation patterns, computing energy balances and modeling erosion or runoff
  – aspect determines the direction of runoff; this can be used to sketch drainage paths for runoff
E. OPERATIONS ON EXTENDED NEIGHBORHOODS
Distance
• calculate the distance of each cell from a cell or the nearest of several cells
  – each pixel's value in the new layer is its distance from the given cell(s)
Buffer zones
• buffers around objects and features are very useful GIS capabilities
  – e.g. build a logging buffer 500 m wide around all lakes and watercourses
• buffer operations can be visualized as spreading the object spatially by a given distance
• the result could be a layer with values:
  – 1 if in original selected object
  – 2 if in buffer
  – 0 if outside object and buffer
• applications include noise buffers around roads, safety buffers around hazardous facilities
• in many programs the buffer operation requires the user to first do a distance operation, then a reclassification of the distance layer
• the rate of spreading may be modified by another layer representing "friction"
  – e.g.
    the friction layer could represent varying cost of travel
  – this will affect the width of the buffer: narrow in areas of high friction, etc.
Visible area or "viewshed"
• given a layer of elevations, and one or more viewpoints, compute the area visible from at least one viewpoint
  – useful for planning locations of unsightly facilities such as smokestacks, or surveillance facilities such as fire towers, or transmission facilities
F. OPERATIONS ON ZONES
RASTER – VECTOR DEBATE
• arguments about which was better have been commonplace since the earliest systems were created
• raster databases are appealing
  – simplicity of organization
  – speed of many operations, e.g. overlay, buffers
  – especially appealing to the remote sensing community who are used to "pixel" processing
• on the other hand, there are many situations in which the raster approach may appear to sacrifice too much detail
  – cartographers were appalled by the crude outlines of parcels that resulted in the "pinking shear" effect of diagonal boundaries represented by grid cell edges
  – surveyors were dismayed by the "inaccuracy" caused by the cells when portraying linear features and points
  – situations in which the raster approach sacrificed too much detail
• however, computing times for overlaying vector based information can be excessive
  – early polygon overlay routines were error-prone, expensive, slow
• today, there are situations in which it is clear that one approach is more functional than the other
  – e.g. using "friction" layer to control width of buffer is only feasible in raster
  – e.g. viewshed algorithms to find area visible from a point are feasible with elevation grids (raster DEMs), not with digitized contours
  – e.g.
    land survey data can only be represented with precise lines
• an important current trend involves linking raster and vector systems, displaying vector data overlying a raster base
  – raster data may be from a GIS file (perhaps a remotely sensed image) or from a plain scanned image file
• therefore, the question has evolved from "Which is best?" to "Under what conditions is which best and how can we have flexibility to use the most appropriate approaches on a case by case basis?"
Basic issues
• four issues are central to discussions of raster versus vector:
  – coordinate precision
  – speed of analytical processing
  – mass storage requirements
  – characteristics of phenomena
CHAPTER 5 GIS DATA CAPTURE TECHNIQUES
5.1 INTRODUCTION
• Since the input of attribute data is usually quite simple, the discussion of data input techniques will be limited to spatial data only.
• need to have tools to transform spatial data of various types into digital format
• data input is a major bottleneck in application of GIS technology
  – costs of input often consume 80% or more of project costs
  – data input is labor intensive, tedious, error-prone
  – there is a danger that construction of the database may become an end in itself and the project may not move on to analysis of the data collected
  – essential to find ways to reduce costs, maximize accuracy
• need to automate the input process as much as possible, but:
  – automated input often creates bigger editing problems later
  – source documents (maps) may often have to be redrafted to meet rigid quality requirements of automated input
• because of the costs involved, much research has gone into devising better input methods; however, few reductions in cost have been realized
• sharing of digital data is one way around the input bottleneck
  – more and more spatial data is becoming available in digital form
• data input to a GIS involves encoding both the
locational and attribute data
• the locational data is encoded as coordinates on a particular Cartesian coordinate system
  – source maps may have different projections, scales
  – several stages of data transformation may be needed to bring all data to a common coordinate system
• attribute data is often obtained and stored in tables
• The choice of data input method is governed largely by the application, the available budget, and the type and the complexity of data being input.
Modes of data input
• keyboard entry for non-spatial attributes and occasionally locational data
• manual locating devices
  – user directly manipulates a device whose location is recognized by the computer
  – e.g. digitizing
• automated devices
  – automatically extract spatial data from maps and photography
  – e.g. scanning
• conversion directly from other digital sources
• voice input has been tried, particularly for controlling digitizer operations
  – not very successful: machine needs to be recalibrated for each operator, after coffee breaks, etc.
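The step of bringing source coordinates into a common coordinate system, noted earlier in this section, can be illustrated with a small sketch. The notes do not name a particular method; the example below assumes a six-parameter affine transformation (X = a*x + b*y + c, Y = d*x + e*y + f) fitted from exactly three points whose coordinates are known in both systems. The function names and the sample coordinates are hypothetical, and a production system would normally use more than three points with a least-squares fit.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("points are collinear; transformation is undefined")
    solution = []
    for col in range(3):
        # Replace one column of the matrix with the right-hand side.
        mc = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(m)]
        solution.append(det3(mc) / d)
    return solution


def fit_affine(source_pts, target_pts):
    """Fit X = a*x + b*y + c and Y = d*x + e*y + f from exactly three point pairs."""
    m = [[x, y, 1.0] for x, y in source_pts]
    a, b, c = solve3(m, [X for X, _ in target_pts])
    d, e, f = solve3(m, [Y for _, Y in target_pts])
    return (a, b, c, d, e, f)


def apply_affine(params, x, y):
    """Transform one source coordinate pair into the target system."""
    a, b, c, d, e, f = params
    return (a * x + b * y + c, d * x + e * y + f)


# Hypothetical sample values: source-sheet units mapped to a common grid.
sheet_pts = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
grid_pts = [(500000.0, 4000000.0), (501000.0, 4000000.0), (500000.0, 4001000.0)]

params = fit_affine(sheet_pts, grid_pts)
print(apply_affine(params, 5.0, 5.0))
```

An affine model handles shifts, scale differences, rotation and uniform skew between two planar systems; it does not by itself handle a change of map projection, which needs the projection equations as a separate stage.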
5.2 DIGITIZING
• digitizers are the most common device for extracting spatial information from maps and photographs
Hardware
• the position of an indicator as it is moved over the surface of the digitizing tablet is detected by the computer and interpreted as pairs of x,y coordinates
  – the indicator may be a pen-like stylus or a cursor (a small flat plate the size of a hockey puck with a cross-hair)
• frequently, there are control buttons on the cursor which permit control of the system without having to turn attention from the digitizing tablet to a computer terminal
• contemporary tablets use a grid of wires embedded in the tablet to generate a magnetic field which is detected by the cursor
  – accuracies are typically better than 0.1 mm
  – this is better than the accuracy with which the average operator can position the cursor
  – functions for transforming coordinates are sometimes built into the tablet and used to process data before it is sent to the host
The digitizing operation
• the map is affixed to a digitizing table
• three or more control points ("reference points", "tics", etc.) are digitized for each map sheet
  – these will be easily identified points (intersections of major streets, major peaks, points on coastline)
  – the coordinates of these points will be known in the coordinate system to be used in the final database, e.g.
    lat/long, State Plane Coordinates, military grid
  – the control points are used by the system to calculate the necessary mathematical transformations to convert all coordinates to the final system
  – the more control points, the better
• digitizing the map contents can be done in two different modes:
  – in point mode, the operator identifies the points to be captured explicitly by pressing a button
  – in stream mode, points are captured at set time intervals (typically 10 per second) or on movement of the cursor by a fixed amount
• advantages and disadvantages:
  – in point mode the operator selects points subjectively
    • two operators working in point mode will not code a line in the same way
  – stream mode generates large numbers of points, many of which may be redundant
  – stream mode is more demanding on the user while point mode requires some judgement about how to represent the line
Problems with digitizing maps
• arise since most maps were not drafted for the purpose of digitizing
  – paper maps are unstable: each time the map is removed from the digitizing table, the reference points must be re-entered when the map is affixed to the table again
  – if the map has stretched or shrunk in the interim, the newly digitized points will be slightly off in their location when compared to previously digitized points
  – errors occur on these maps, and these errors are entered into the GIS database as well
  – the level of error in the GIS database is directly related to the error level of the source maps
• maps are meant to display information, and do not always accurately record locational information
  – for example, when a railroad, stream and road all go through a narrow mountain pass, the pass may be depicted wider than its actual size to allow for the three symbols to be drafted in the pass
• discrepancies across map sheet boundaries can cause discrepancies in the total GIS database
  – e.g.
    roads or streams that do not meet exactly when two map sheets are placed next to each other
DIGITIZING ERRORS AND EDITING
• Overshoot
• Undershoot
• Pseudo node
• Open polygon
• Dangling arc
• Dangling node
• Gap
• Sliver
• No-node arc intersection
Editing errors from digitizing
• some errors can be corrected automatically
  – small gaps at line junctions
  – overshoots and sudden spikes in lines
• error rates depend on the complexity of the map, and are high for small scale, complex maps
Advantages of digitizing
• Manual digitizing has many advantages. These include:
  – Low capital cost, e.g. digitizing tables are cheap;
  – Low cost of labour;
  – Flexibility and adaptability to different data types and sources;
  – Easily taught in a short amount of time, an easily mastered skill;
  – Generally the quality of data is high;
  – Digitizing devices are very reliable and most often offer a greater precision than the data warrants; and
  – Ability to easily register and update existing data.
5.3 AUTOMATIC SCANNING
• A variety of scanning devices exist for the automatic capture of spatial data.
• All have the advantage of being able to capture spatial features from a map rapidly.
• Most scanning devices have limitations with respect to the capture of selected features, e.g. text and symbol recognition.
• Experience has shown that most scanned data requires a substantial amount of manual editing to create a clean data layer.
• Given these basic constraints some other practical limitations of scanners should be identified. These include:
  – hard copy maps are often unable to be removed to where a scanning device is available
  – hard copy data may not be in a form that is viable for effective scanning, e.g.
    maps are of poor quality, or are in poor condition;
  – geographic features may be too few on a single map to make it practical or cost-justifiable to scan;
  – often on busy maps a scanner may be unable to distinguish the features to be captured from the surrounding graphic information, e.g. dense contours with labels;
  – with raster scanning it is difficult to read unique labels (text) for a geographic feature effectively; and
  – scanning is much more expensive than manual digitizing, considering all the cost/performance issues.
Requirements for scanning
• documents must be clean (no smudges or extra markings)
• lines should be at least 0.1 mm wide
• complex line work provides greater chance of error in scanning
• text may be accidentally scanned as line features
• contour lines cannot be broken with text
• automatic feature recognition is not easy (two contour lines vs. road symbols)
• special symbols (e.g. marsh symbols) must be recognized and dealt with
• if good source documents are available, scanning can be an efficient time saving mode of data input
CRITERIA FOR CHOOSING MODES OF INPUT
• the type of data source
  – images favor scanning
  – maps can be scanned or digitized
• the database model of the GIS
  – scanning easier for raster, digitizing for vector
• the density of data
  – dense linework makes for difficult digitizing
• expected applications of the GIS implementation
5.4 RASTERIZATION AND VECTORIZATION
Rasterization of digitized data
• for some data, entry in vector form is more efficient, followed by conversion to raster
• we might digitize the county boundary in vector form by
  – mounting a map on a digitizing table
  – capturing the locations of points along the boundary
  – assuming that the points are connected by straight line segments
• this may produce an ASCII file of pairs of xy coordinates which must then be processed by the GIS, or the output of the digitizer may go directly
into the GIS
• the vector representation of the boundary as points is then converted to a raster by an operation known as vector-raster conversion
  – the computer calculates which county each cell is in using the vector representation of the boundary and outputs a raster
• digitizing the boundary is much less work than cell by cell entry
• most raster GIS have functions such as vector-raster conversion to support vector entry
  – many support digitizing and editing of vector data
Vectorization of scanned images
• for many purposes it is necessary to extract features and objects from a scanned image
  – e.g. a road on the input document will have produced characteristic values in each of a band of pixels
  – if the scanner has pixels of 25 microns = 0.025 mm, a line of width 0.5 mm will create a band 20 pixels across
  – the vectorized version of the line will be a series of coordinate points joined by straight lines, representing the road as an object or feature instead of a collection of contiguous pixels
• successful vectorization requires a clean line scanned from media free of cluttering labels, coffee stains, dust etc.
  – to create a sufficiently clean line, it is often necessary to redraft input documents
    • e.g. the Canada Geographic Information System redrafted each of its approximately 10,000 input documents
• since the scanner can be color sensitive, vectorizing may be aided by the use of special inks for certain features
• although scanning is much less labor intensive, problems with vectorization lead to costs which are often as high as manual digitizing
  – two stages of error correction may be necessary:
    1. edit the raster image prior to vectorization
    2. edit the vectorized features
CHAPTER 6 FUTURE OF GIS
6.1 INTRODUCTION
• The development and application of geographic information systems is vibrant and exciting.
• The term GIS remains one of the most popular buzzwords in the computer industry today. GIS is perceived as one of the emerging technologies in the computer marketplace.
• Everybody wants a GIS.
• GIS is very much a multi-disciplinary tool for the management of spatial data.
• It is inherently complex because of the need to integrate data from a variety of sources.
• Functions must accommodate several application areas in a detailed and efficient manner.
• A variety of important developments are occurring which will have profound effects on the use of GIS.
Hardware
• Fast geoprocessing
• Parallel Processing
• Memory
• Workstations
• Networks
• Hardware for specialized processing functions
• Operating systems
• Peripheral devices
• Specialized workstations
Software
• Database management systems
• Relational DBMSs
• DBMS versus Fourth Generation Languages
• GIS system integration
• Display products
• Interfaces to other technologies
• User interfaces
New sources of Data
• Remote sensing
• Global Positioning Systems
• Error/uncertainty
• Data sharing
New Application areas of GIS technology
• Modeling and decision support
• Sciences and mathematics