Introduction to Geographic Information Systems
Spring 2013 (INF 385T-28437)
Data Modeling, Database Design
INF385T(28437) – Spring 2013 – Lecture 11

You are here. Food is there. Well, now it's there.

Outline
• The design process
• Describing a system of viewpoints
• Developing use cases
• GIS design stages
• Conceptual design
• Normalization example
• Database diagrams
• Building the database
• Prototype to production

Best practices: iterative, incremental development
Stages in the development cycle:
• Ideas, Use Cases, Requirements
• Planning, Cost-Benefit, Risk Management
• Analysis, Design and Evaluation
• Development, Quality Control and Evaluation
• Deployment
• Maintenance, Evaluation
Note: This is only an illustration – your experience may differ!

Best practices: iterative approach
• Focus on current issues
• Short iterations better reflect the near-term internal and external environment
  ○ Resolve misunderstandings early
  ○ Resolve analysis, design and implementation disparities early
  ○ More accurate overall project status
  ○ Workload better distributed across the life cycle
• Medium- to long-term changes in the environment are factored in more easily through a spiral approach

Starting out
• Define the project
• Get to know the client
  ○ Stakeholders and decision makers
• Build the design team
  ○ Analyst
  ○ Subject matter experts
  ○ Users
• Inventory tasks, products, data
  ○ Identify model
  ○ Develop use cases
  ○ Inventory data

GIS Design Seminar
• Teach the client GIS concepts and design methods
• Introduce yourself and the design team
• Set realistic expectations for the GIS system
• Alleviate fears and concerns

Building a System… of Viewpoints
Viewpoints in the "Reference Model - Open Distributed Processing (RM-ODP)", ISO/IEC 10746, ranging from abstract/best practices to implementation/development:
• Enterprise Viewpoint: community objectives; business aspects: purpose, scope and policies. What for? Why? Who? When?
• Information Viewpoint: information sources and models. What is it about?
• Computational Viewpoint: types of services and protocols. How does each bit work?
• Engineering Viewpoint: solution types: distribution infrastructure. How do the components work together?
• Technology Viewpoint: implementation system: hardware, software, distribution. With what?

Use cases
• A use case is a description of a task you want the system to perform
  ○ Examples: add new water service, record parcel sale, notify owners
• Basis of all analysis and design
  ○ Start simple; expand with detail later
  ○ Analysis of use cases yields data, interfaces, applications
• Use cases can:
  ○ Capture existing work flows
  ○ Define new applications
  ○ Help understand alternative and pathological work flows
(Diagram labels: Use case, Data, GIS Database)

Use case diagrams
• Show the actor/use case relationships
  ○ System architecture
  ○ Data flows, coordination
• Develop with users
  ○ During meetings, interviews
  ○ Clean up and refine later
• Graphical notation is helpful, but the use case document is the most important artifact
(Example diagram within a system boundary. Actors: Emergency Call Center, GIS Analyst, Operations & Maintenance Staff. Use cases: Document locations of flooding, Geocode & map call list, Produce flood maps, Manage flood control structures.)

Example use case
Some context: for an emergency response application, assume a set of use cases focused on exchanging information that leads to creating, updating, and posting flood maps to interested agencies.
Each use case is documented according to a template, such as:
Use Case Name:
Description:
Actors:
Pre-conditions:
Post-conditions:
Flow of events:
  ○ business rules
  ○ user actions, responses
Exceptions:
Alternates:

Example use case
• Use case: Flooding Information & Response
• Description: The call center receives and documents citizen calls related to flooding during storm events. The information is geocoded, mapped, and provided to the Water District's Operations & Maintenance staff, who decide how to manage control structures to mitigate flooding.
• Actors: Emergency Response Call Center, GIS Analyst, Operations & Maintenance Staff.
• Pre-conditions: The database of critical water facilities has been created. The Emergency Call Center has been activated for a storm event. Water district staff are operating under emergency operating procedures.
• Post-conditions: Status change notices are sent to the relevant agencies registered to receive updates.

Use case primary scenario
1. A citizen places a call to the Emergency Call Center hotline.
2. Call center staff document the location and description of the flooding problem.
3. The GIS analyst receives and geocodes locations from the call center, producing a map of call locations.
4. Call reports are symbolized based on the flooding issue. Maps are produced and handed off to the operations staff.
5. Operations & Maintenance staff review maps of flooding incidents and make decisions for operating control structures, gates, and pumps.

Are we finished yet?
• What layers will you need based on the use case?
• Where will you get these layers?
• Are there any changes that can be made to the business process?

Using the use cases
• From the set of use cases developed, the functional requirements and interfaces can be fleshed out.
• From an understanding of the collaborators and stakeholders involved in the use cases, appropriate data sources and maintenance authorities can be determined.
• From a comparison of all the use cases, redundant information and tasks can be discovered and minimized.
• From an examination of potential alternate scenarios, pathological situations can be anticipated and mitigated.

BUT… beware the use case time sink
• You cannot completely and correctly document all the use cases for a reasonably complex system in your lifetime
• Keep it simple, and start prototyping as soon as you can; this will further inform the use cases and keep your project moving

GIS DESIGN

Designing the database
• Conceptual model: collect information (project, feature collection, business practices); identify desired themes and sources
• Logical model: map themes to GIS database elements (e.g., roads, rail, boundaries; networks, topology, relationships); define database entities and organization
• Physical model: complete data organization (tables), build the full schema, test and refine

GIS design practice
• Think about the GIS features represented by thematic layers, and about the integrity and behavior of those features:
  ○ Parcels are represented as polygons.
  ○ Parcels share geometry with boundaries.
  ○ Parcels do not overlap.
  ○ … etc.

Conceptual design
• Entities, general relationships, important attributes
• Sketches
• ER/UML conceptual diagrams
• Spreadsheets
• Often reconstructed from existing systems/datasets
• Very important for complex projects
• Very useful for communicating with domain experts/business people

Conceptual design
• Purpose and usage of GIS
• Data sources
  ○ coverages, shapefiles, CAD, etc.
  ○ compilation scale and accuracy
• Spatial representation
  ○ raster, vector, surface, address
• Attributes
  ○ required fields, types of measurement
• Relationships
  ○ network, topological, general

Conceptual design
It is important to understand what you want to achieve from the outset:
• Symbology and labels
  ○ what symbols at which scales
  ○ text presentation on the map
• Spatial reference
  ○ projection and datum
  ○ the largest area mapped
  ○ required detail and resolution
• Special design cases, for example:
  ○ condominium parcels
  ○ parcel annotation

Diagramming themes
• Classic layer diagrams
• Organize data into logical units
• Focus on common data elements to help determine:
  ○ Attributes
  ○ Associations
  ○ Spatial relationships
(Example layers: Water Use Application, Hydrology, Utilities, Boundaries, Roads)

Documenting themes
Layer: Parcels
• Map use: Parcels define land ownership and are used for taxation
• Data source: Compiled from land ownership transactions and cadastral records
• Representation: Polygons
• Spatial relationships: Parcel polygons do not overlap
• Map scale, accuracy, currency: 1:2400, +/- 5 ft, quarterly update
• Symbology and annotation: Labeled or annotated with house number and street name

Layer: Streets
• Map use: Define the street centerline network
• Data source: Public or commercial data products, or various government agencies
• Representation: Polylines
• Spatial relationships: Streets intersect only at endpoints and generally do not overlap
• Map scale, accuracy, currency: 1:12000, +/- 10 ft, semiannual update
• Symbology and annotation: Symbolized according to road classification, labeled with street name

Inventory existing data
Legacy data mapped to target data layers: Annotation, Boundaries, Lots, Parcels, PLSS Monument, PLSS Quarter, PLSS Section, PLSS Township

Inventory existing data
• Model the database schema from existing data
• Bridge existing data with current technology, for example by mapping legacy data to the target data layers in the GIS database (Annotation, Boundaries, Lots, Parcels, PLSS Monument, PLSS Quarter, PLSS Section, PLSS Township):
  ○ Boundaries hold survey attributes
  ○ Coverage parcel polygons only exist for regions

DATABASE DESIGN

Files, databases, and GIS
• Data files contain text or other data in arbitrary formats
• Data tables contain records with fields (attributes, data items) identified by a primary key
• A Relational Database Management System (RDBMS or just DBMS):
  ○ creates and maintains relationships between data tables
  ○ allows one or more users to create or edit data in the tables
  ○ allows users to sort, select, and retrieve information using QUERIES and REPORTS
• GIS adds a spatial dimension to databases by integrating location and geometric shape information with the tables

Relational database
A formal information model called "relational":
• Tables can have formal & ad hoc relationships, based on:
  ○ Rows and columns
  ○ Known column types
  ○ Relationships
  ○ SQL language and operators
Relational is based on a simple, generic model with many implementations (MS Access, IBM DB2, Oracle, MS SQL Server, and many others).

Data tables
Organized into columns, rows, and cells (like a spreadsheet):
• Columns = attributes = fields = data items
• Rows = records
• Cells = values

Defining columns
To define a column or attribute, you must specify the column name and type.
• All DBMSs support basic types:
  ○ Number (integer, float, decimal)
  ○ String (text)
  ○ Boolean (Yes/No)
  ○ Date
• Many DBMSs (SQL-99) and GIS systems support additional types (BLOB, XML, time series, …)

Primary key
The field or combination of fields that uniquely identifies each record within a table.
Note: A primary key is usually arbitrary and meaningless to users; its main purpose is to be unique.

Water use permit example
• A paper-based application form is required for a permit to withdraw surface or ground water

Sample data for water use permits

Use codes

Relational organization
• Tables should be organized according to the basic rules of relational design for the most efficient use.
• Normalization is a series of steps followed to obtain a database design that allows for efficient access and storage of data in a relational database.
• These steps reduce data redundancy and the chances of data becoming inconsistent.
• 3NF or BCNF are the usual standards for relational database design; however, performance and convenience may drive toward de-normalization.

Database normalization steps
• First Normal Form (1NF) eliminates repeating groups by putting each into a separate table and connecting them with a one-to-many relationship.
• Second Normal Form (2NF) eliminates functional dependencies on a partial key by putting those fields in a separate table from the fields that depend on the whole key.
• Third Normal Form (3NF) eliminates functional dependencies on non-key fields by putting them in a separate table. At this stage, all non-key fields are dependent on the key, the whole key, and nothing but the key.
• Boyce-Codd Normal Form (BCNF) is sometimes applied as a stronger form of 3NF, in which every determinant of a functional dependency within a relation must be a candidate key for that relation.
• Fourth Normal Form (4NF) separates independent multi-valued facts stored in one table into separate tables.
• Fifth Normal Form (5NF) breaks out data redundancy that is not covered by any of the previous normal forms.
Source: http://www.hyperdictionary.com/dictionary/database+normalisation

First normal form - NOT
• Do you see any groups of repeating columns?
• What's wrong with that?
• Can you think of a case where this is okay?
• How would you reorganize to fix this?

First normal form
(Diagram: main table A with its primary key; table B with foreign keys; lookup table C with its primary key.)
• The Use Code columns can be removed from the main table (A) and made into rows of a separate table (B), keyed by ActID (compare with the previous slide).
• (C) is a lookup table for use code descriptions.

Relationship cardinality
• With this design, one ActID can have any number of use codes, and any one use code can be associated with many ActIDs
  ○ This is called a Many-to-Many (M:M) relationship
  ○ This is much more space-efficient for data storage than a fixed set of repeating Use Code columns
• One record in the Use Code Descriptions table (C) can be associated with many records in the ActID-Use Codes table (B)
  ○ This is called a One-to-Many (1:M) relationship
• You may also have relationships with fixed cardinalities, such as 1:1, 1:2, 1:0..5, etc.
  ○ A lower bound of 0 (as in 1:0..5) generally means "nulls are allowed"

Second normal form - NOT
• Do you see any dependencies between non-key columns and a partial key?
  ○ If the primary key were compound and included an OwnerID, there could be such a dependency between Owner and OwnerID
• What's wrong with that?
• What would you do to fix this?

Second normal form
• Remove the non-key data to a separate table and link to it
• … and clean up the data while you're at it!
  ○ Spelling, abbreviations, punctuation
  ○ Firstname Lastname vs. Lastname, Firstname

Third normal form - NOT
• Do you see any functional dependencies among non-key fields in the table below?
  ○ Need we ask again: what's wrong with this?
• How would you reorganize to fix this?

Third normal form
• Remove the source description to a separate table, and join using the source code field
• This will reduce duplication of data (and errors)

Is that all there is to it?
Name   City       ST  PostalCode
John   Denver     CO  80031
Patty  Seattle    WA  98107
Smith  Vancouver  BC  V6C 1T2
• This table is NOT in Third Normal Form:
  ○ The City and ST fields depend on the PostalCode field, which is a non-key field
• To place this table in 3NF, a separate table would be created for the City and ST fields, and joined using the PostalCode field
  ○ But this is generally not done with addresses & postal codes… WHY?

Normalization tradeoffs
• When would you expect to normalize tables?
  ○ For primary data entry and updates: it is easier to set up and manage data integrity validation, such as for name and address subfields
  ○ To support more kinds of ad hoc queries
• When would you expect to denormalize?
  ○ For presentation of data to users
  ○ To reduce the number of table joins for faster performance, when queries are known and fixed
  ○ For better performance in web publishing
  ○ Database views are often used to flatten the relationship structure for read-only access

DATABASE DIAGRAMMING

Database diagramming: conceptual / logical overview
Owners 1 ---- * Applications
Applications * ---- * Use Codes
Use Codes * ---- 1 Use Code Descrips
• Database relationships and cardinality can be diagrammed for prototyping and documentation
• The "*" on an association link means "many"

Database diagramming: Entity-Relationship (E-R) or Unified Modeling Language (UML)
• Owners: *OwnerID, FirstName, LastName, Phone, StreetAddr, City, State, PostalCode
• Applications: *ApplicationID, OwnerID, ApplicationType, ProjectLocation, BusinessName, BusinessType, …
• Use Codes: *RowID, UseCode, ApplicationID
• Use Code Descrips: *UseCode, Description
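A minimal sketch, using Python's built-in sqlite3 module, of how the E-R diagram above might become a physical schema. Table and column names follow the diagram (with spaces removed, e.g. UseCodeDescrips); the column types, the sample owner, the use codes IRR and MUN, and the view name ApplicationsFlat are illustrative assumptions, not part of the lecture. The view shows the tradeoff discussed earlier: the base tables stay normalized for editing, while a read-only view flattens the joins for presentation.

```python
import sqlite3

# Schema transcribed from the E-R diagram; column types are assumptions.
SCHEMA = """
CREATE TABLE Owners (
    OwnerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Phone TEXT,
    StreetAddr TEXT, City TEXT, State TEXT, PostalCode TEXT
);
CREATE TABLE Applications (
    ApplicationID INTEGER PRIMARY KEY,
    OwnerID INTEGER REFERENCES Owners(OwnerID),      -- 1:M from Owners
    ApplicationType TEXT, ProjectLocation TEXT, BusinessName TEXT, BusinessType TEXT
);
CREATE TABLE UseCodeDescrips (                       -- lookup table (C)
    UseCode TEXT PRIMARY KEY,
    Description TEXT NOT NULL
);
CREATE TABLE UseCodes (                              -- one row per application/use code pair (B)
    RowID INTEGER PRIMARY KEY,
    UseCode TEXT NOT NULL REFERENCES UseCodeDescrips(UseCode),
    ApplicationID INTEGER NOT NULL REFERENCES Applications(ApplicationID)
);
-- Denormalized, read-only presentation of the normalized tables.
CREATE VIEW ApplicationsFlat AS
SELECT a.ApplicationID, o.LastName, u.UseCode, d.Description
FROM Applications AS a
JOIN Owners AS o          ON o.OwnerID = a.OwnerID
JOIN UseCodes AS u        ON u.ApplicationID = a.ApplicationID
JOIN UseCodeDescrips AS d ON d.UseCode = u.UseCode;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample rows: one owner, one application with two use codes.
conn.execute("INSERT INTO Owners (OwnerID, FirstName, LastName) VALUES (1, 'Pat', 'Smith')")
conn.execute("INSERT INTO Applications (ApplicationID, OwnerID) VALUES (100, 1)")
conn.executemany("INSERT INTO UseCodeDescrips VALUES (?, ?)",
                 [("IRR", "Irrigation"), ("MUN", "Municipal")])
conn.executemany("INSERT INTO UseCodes (UseCode, ApplicationID) VALUES (?, ?)",
                 [("IRR", 100), ("MUN", 100)])

rows = conn.execute("SELECT * FROM ApplicationsFlat ORDER BY UseCode").fetchall()
print(rows)  # [(100, 'Smith', 'IRR', 'Irrigation'), (100, 'Smith', 'MUN', 'Municipal')]
```

Because UseCodes holds one row per application/use-code pair, giving application 100 a third use code is a single INSERT with no schema change, which is the point of the 1NF step.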

Normalization tradeoff: referential integrity
• Suppose you removed a record from the Owner table
  ○ What should be done with the related records from the Applications table?
  ○ Would this be easier or harder to manage than with the de-normalized design on slide 31, "Sample data for water use permits"?
• The more tables are interconnected by relationships, the greater the need to support referential integrity within your applications
• A DBMS's default support for referential integrity may be very basic, such as placing Nulls in associated foreign key fields, and only when a relationship is declared

BUILDING THE DATABASE

Prototype, prototype, prototype…
• Critical for validating your data model and applications
• An easy way to discover project requirements
• Don't plan a lot of time for this, just do it!
• Prototype in the simplest environment to learn the most in the least time
  ○ Validate that thematic choices, schema & integrity rules support your requirements
  ○ Reduce data management overhead with a personal, single-user system

Database environments
• Production/Publishing
  ○ Read-only copies of databases
  ○ Used by the majority of users
  ○ Contains custom views of databases
• Development/Maintenance
  ○ Where compilation and editing occur
  ○ Normalized for greater integrity enforcement
  ○ May have multiple environments by data model (cadastral/land use, transportation, utilities, hydro…)
• Design/Test
  ○ Prototype validation, load testing
  ○ Isolate testing changes from the development environment, so as not to corrupt the development system

Large projects can seem like this…
(Cartoon: "1. Burning toast… smoke alarm… etc, etc… fill glass!")
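The referential-integrity question raised earlier (what should happen to Applications rows when an Owner is removed?) is a good thing to settle in a prototype. A minimal sketch with Python's sqlite3, assuming a cut-down two-table version of the Owners/Applications design; the helper names make_db and delete_owner are hypothetical. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and the ON DELETE action declared on the relationship decides the outcome: SET NULL corresponds to placing Nulls in the associated foreign key fields, CASCADE removes the dependent rows, and RESTRICT refuses the delete.

```python
import sqlite3

def make_db(on_delete: str) -> sqlite3.Connection:
    """Build a two-table prototype whose foreign key uses the given ON DELETE action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE Owners (OwnerID INTEGER PRIMARY KEY, LastName TEXT)")
    # on_delete comes from a fixed list below, never from user input.
    conn.execute(
        "CREATE TABLE Applications ("
        " ApplicationID INTEGER PRIMARY KEY,"
        f" OwnerID INTEGER REFERENCES Owners(OwnerID) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO Owners VALUES (1, 'Smith')")
    conn.executemany("INSERT INTO Applications VALUES (?, ?)", [(100, 1), (101, 1)])
    return conn

def delete_owner(conn: sqlite3.Connection, owner_id: int):
    """Try to delete an owner; return the remaining Applications rows, or a 'blocked' message."""
    try:
        conn.execute("DELETE FROM Owners WHERE OwnerID = ?", (owner_id,))
    except sqlite3.IntegrityError as exc:
        return f"blocked: {exc}"
    return conn.execute(
        "SELECT ApplicationID, OwnerID FROM Applications ORDER BY ApplicationID"
    ).fetchall()

for action in ("SET NULL", "CASCADE", "RESTRICT"):
    print(action, "->", delete_owner(make_db(action), 1))
```

With SET NULL the two applications survive as orphans with a Null OwnerID, with CASCADE they disappear, and with RESTRICT the delete is rejected, so the choice of action is a business rule worth confirming with users during prototyping.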

But they can be simplified with common data models
• Should have:
  ○ A simple structure with the most common elements across a set of applications in a user community
  ○ Minimal rules, custom behavior, or cross-dependencies
  ○ May include collections, sets, or systems of feature classes, e.g., networks, topologies, terrains
• Should lend themselves to:
  ○ Web distribution
  ○ Incremental, multi-user data updates
  ○ User-side fusion, densification, value-adding
http://support.esri.com/en/knowledgebase/techarticles/detail/40585

Summary
• The design process
• Describing a system of viewpoints
• Developing use cases
• GIS design stages
• Conceptual design
• Normalization example
• Database diagrams
• Building the database
• Prototype to production