Data Management Part 1
Vicki Drake
Earth Sciences Department
Santa Monica College
Data Management
• Computer-based storage and retrieval technology developed out of the basic need of
industries to function more effectively with accurate and timely information.
• Initial concepts of databases and database management systems developed along
with the information systems field during the 1960s and 1970s.
• A database is the stored information, and database management systems
organize and retrieve the stored information.
Data Management
• A Spatial Database is a collection of spatially referenced data that acts as a model
of reality consisting of selected phenomena deemed important enough to be
represented in a digital format
• The digital representation might be for some past, present or future time period
• The content, structure and use of a spatial database are unique, depending
on user demands and specifications
Data Management
• Spatial data needs for two different organizations may be the same, although their use
of the data may differ
– E.g., highway data from the different points of view of a natural resources
organization and a highway transportation organization
– E.g., wetlands data from the different points of view of an ecological
organization and a taxing authority
Data Management
• Spatial databases will contain phenomena/features important enough to collect
and represent for an individual organization’s needs.
• Identifying the phenomena/feature and then choosing an appropriate data
representation for them is part of a process called database design
Data Management
• The main objective in developing a database is to relate facts that were previously separate.
• Two approaches to database management:
– 1) File processing approach
– 2) Database management approach
Data Management – File Processing Approach
• File processing approach – the most common approach to using a database
– Data is stored in one or more computer files accessed by special database
software
– Each application program must directly access each data file it uses – creating
redundancy since instructions for access must be written into each application
program
• Data must be shared by different application programs and different users.
– Any modification made by users or programs creates control problems.
– A lack of central control can degrade the database
DATABASE MANAGEMENT APPROACH
• A DBMS comprises a set of programs that manipulate and maintain the database
© Vicki Drake
SMC – Intro to GIS
Fall 2000 Lectures
• The DBMS manages the sharing of data and maintains the integrity of the database itself by
acting as central control between the database and application programs.
– Application programs do not need specific instructions regarding storage or
organization of data, as access is through DBMS only
– DBMS can “package” data to be application program-specific
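The central-control idea can be sketched in Python, using the standard-library sqlite3 module as a stand-in DBMS. This is only an illustration under assumptions: the table name roads, its columns, and the sample rows are invented here and do not come from the lecture. The application function states what data it wants and never deals with how the data is stored.

```python
import sqlite3

# An in-memory SQLite database stands in for the DBMS (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE roads (road_id INTEGER PRIMARY KEY, name TEXT, lanes INTEGER)"
)
# Invented sample rows.
conn.executemany(
    "INSERT INTO roads (road_id, name, lanes) VALUES (?, ?, ?)",
    [(1, "Main St", 4), (2, "Ocean Ave", 2)],
)


def wide_roads(connection, min_lanes):
    """Application code: asks for data by name, with no knowledge of file layout."""
    rows = connection.execute(
        "SELECT name FROM roads WHERE lanes >= ? ORDER BY name", (min_lanes,)
    )
    return [name for (name,) in rows]


print(wide_roads(conn, 3))
```

If the storage organization changed, only the DBMS would need to know; wide_roads would stay the same.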
DBMS - ADVANTAGES
• Centralized control – Data quality and integrity maintained
• Data easily shared, but still controlled by DBMS
• Reduced redundancy as application programs do not need “built-in” database
organizational instructions
• Database searches and analysis are faster through the DBMS, using “user-friendly”
interfaces
• Multiple “views” of the data can be created
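As a rough sketch of multiple views, the Python example below again uses sqlite3 as a stand-in DBMS. The parcels table, its columns, and the two view names are hypothetical, chosen to echo the earlier taxing-authority and ecological-organization example. Both views read the same stored table, so the data is kept only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, land_use TEXT, "
    "assessed_value REAL, wetland INTEGER)"
)
# Invented sample rows; wetland is 1 for wetland parcels and 0 otherwise.
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?, ?)",
    [(1, "residential", 250000.0, 0), (2, "open space", 40000.0, 1)],
)

# Two views over one stored table, one per hypothetical user group.
conn.execute(
    "CREATE VIEW tax_view AS SELECT parcel_id, assessed_value FROM parcels"
)
conn.execute(
    "CREATE VIEW ecology_view AS "
    "SELECT parcel_id, land_use FROM parcels WHERE wetland = 1"
)

tax_rows = conn.execute("SELECT * FROM tax_view ORDER BY parcel_id").fetchall()
ecology_rows = conn.execute(
    "SELECT * FROM ecology_view ORDER BY parcel_id"
).fetchall()
print(tax_rows)
print(ecology_rows)
```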
DBMS - DISADVANTAGES
• Database system software and hardware can be expensive
– Represents additional acquisition and maintenance costs to projects
• A database system is more complex and therefore more susceptible to failure and data loss.
– Backup and recovery systems required
• Centralization of data and redundancy reduction run the risk of data corruption
– Backup and recovery systems may alleviate some risks
Data Management - Database Elements
• Elements of reality modeled in a GIS database have two main identities:
– Entity – the element in reality
– Object – the element as it is represented in the database: a “digital
representation of all or part of an entity”
• A third identity important in cartographic applications is the symbol used to depict
the entity/object as a feature on a map
Data Management - Definitions
• Database Model – a conceptual description of a database defining entity types and
associated attributes
• Layers – groupings of spatial objects, also called overlays, coverages and themes
• For a complete Glossary of GIS Terms:
http://www.urisa.org/glossary.htm
Data Management – Spatial Object Types
• 1st step to database development is the selection and definition of entity types to
be included
• 2nd step of database design is to choose an appropriate method of spatial
representation for each entity type
• The appropriate digital representation depends on the spatial object type; the
classification used here (from the National Standard for Digital Cartographic Databases) is
based on spatial dimensions
Data Management – Spatial Object Types
• Classification is based on the following definitions of spatial dimension
• 0-dimensional object types
– Point – specific geometric location
– Node – a topological junction or end point, may specify location
• 1-dimensional object types
– Line – a one-dimensional object
– Line segment – a directed line between two points
– Arc – a locus of points that forms a curve defined by a mathematical function
– Link – a connection between two nodes
– Directed link – a link with one direction specified
• 2-dimensional object types
– Area – a bounded continuous object which may or may not include its
boundary
– Interior area – an area not including its boundary
– Polygon – an area consisting of an interior area, one outer ring and zero or
more non-intersecting, non-nested inner rings
– Pixel – a picture element that is the smallest non-divisible element of an image
– Grid cell – an element of a regular or nearly regular tessellation of a surface;
differs from a pixel by relative size, as a pixel is relatively small compared to a grid
cell
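One way to picture this classification is as a small set of Python data types, each tagged with its spatial dimension. This is a sketch only: the class and field names are assumptions made for illustration and are not taken from the standard, and only one object type per dimension is shown.

```python
from dataclasses import dataclass
from typing import Tuple

Coordinate = Tuple[float, float]


@dataclass(frozen=True)
class Point:
    """0-dimensional: a specific geometric location."""
    location: Coordinate
    dimension: int = 0


@dataclass(frozen=True)
class LineSegment:
    """1-dimensional: a directed line between two points."""
    start: Point
    end: Point
    dimension: int = 1


@dataclass(frozen=True)
class Polygon:
    """2-dimensional: one outer ring and zero or more inner rings."""
    outer_ring: Tuple[Coordinate, ...]
    inner_rings: Tuple[Tuple[Coordinate, ...], ...] = ()
    dimension: int = 2


def group_by_dimension(objects):
    """Sort spatial objects into buckets keyed by their dimension (0, 1 or 2)."""
    groups = {0: [], 1: [], 2: []}
    for obj in objects:
        groups[obj.dimension].append(obj)
    return groups


# Invented sample objects.
well = Point((1.0, 2.0))
road = LineSegment(Point((0.0, 0.0)), Point((3.0, 4.0)))
lake = Polygon(outer_ring=((0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)))
groups = group_by_dimension([well, road, lake])
print({dim: len(items) for dim, items in groups.items()})
```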
Data Management – Database Structure
• A database is a collection of related information, or related objects (tables,
queries, etc.) stored in a single file.
– Tables – Contain the actual database information, arranged in tabular
(column/row) format
– One or more tables represent the core of any database and each table contains
information related to a particular subject.
– Queries – Questions asked about the information in a table, together with the results returned.
Data Management
• Tables are made up of two components: fields and records
• Fields – a category of information containing an item of data (attribute/non-spatial data)
– A field defines where a particular type of data can be found in the record
• Key – A field or a combination of fields that uniquely identifies each record in a
table
• Types of possible queries are determined by number and type of key fields
Data Management
• Records – collection of all field information for one table entity
• Records represent the information pertaining to a particular element or entity
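The relationship between fields, records and keys can be sketched in plain Python. The field names and sample values below are hypothetical. Each record is one tuple of field values, a field's position tells where its data sits in every record, and the key lookup refuses to treat a field as a key if two records share a value.

```python
# Fields: the categories of information each record holds (hypothetical names).
FIELDS = ("well_id", "depth_m", "owner")

# Records: one tuple of field values per entity (invented sample data).
records = [
    (101, 35.5, "City"),
    (102, 80.0, "County"),
]


def as_dict(record):
    """Pair each field name with the value stored at that field's position."""
    return dict(zip(FIELDS, record))


def find_by_key(rows, key_field, key_value):
    """Return the one record whose key field equals key_value, or None."""
    index = FIELDS.index(key_field)
    matches = [row for row in rows if row[index] == key_value]
    if len(matches) > 1:
        raise ValueError(
            f"{key_field} is not a key: {len(matches)} records share {key_value!r}"
        )
    return matches[0] if matches else None


print(as_dict(find_by_key(records, "well_id", 102)))
```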
Data Management – Data Models
• The conceptual organization of a database is termed the data model
– A style of describing and manipulating the data in a database
• Three classic data models are used to organize electronic databases, and a fourth is emerging:
– Hierarchical – data are organized by records in parent-child, one-to-many
relations
– Network – data are organized by records classified into record types, with
pointers linking associated records
– Relational – data are organized by records in tables without internal pointers;
records are related by matching values in key fields
– Object-oriented – a new and emerging model in which data are identified as individual
objects classified into object types according to the characteristics of the object
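To make the contrast concrete, the Python sketch below stores the same small set of facts in the three classic styles. It is a loose analogy, not how any real DBMS is implemented: nested dictionaries stand in for parent-child records, object references stand in for pointers, and lists of tuples stand in for relational tables. All names and values are invented.

```python
# Hierarchical: each parent record owns its child records (one-to-many).
hierarchical = {
    "county": "A",
    "parcels": [{"parcel": "P1"}, {"parcel": "P2"}],
}

# Network: records of different types linked by pointers
# (Python object references play the role of pointers here).
owner = {"name": "Owner X", "parcels": []}
p1 = {"parcel": "P1", "owners": [owner]}
owner["parcels"].append(p1)

# Relational: flat tables with no stored links; rows in different tables
# are related only by matching values in a shared field.
parcels = [("P1", "A"), ("P2", "A")]   # (parcel, county)
ownership = [("P1", "Owner X")]        # (parcel, owner)


def owners_of(parcel_id):
    """Find owners by matching the parcel value across the ownership table."""
    return [name for (parcel, name) in ownership if parcel == parcel_id]


print(owners_of("P1"))
```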
Data Management – Relational Database Model
• In the relational database model, there is no hierarchy of data fields within a record, and
every data field can be used as a key field
• Data are stored as collections of values in the form of tuples (record rows) grouped together in 2-dimensional tables (each table stored as a separate file)
• The table itself represents the relationships among all the attributes it contains and is
called a “relation”
• Relational Data Structure - the Table (aka: a Relation)
– A relation is a collection of tuples corresponding to rows of table
– A tuple is made up of attributes corresponding to columns of table
– Each relation has a unique identifier called the Primary Key: a column or
combination of columns that has no identical values in any two rows
• The Primary Key value of each row is unique
• Primary Keys are used to relate data in different tables
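The uniqueness rule for a Primary Key can be seen in a short Python sketch that uses sqlite3 as an example relational system. The wells table and its columns are hypothetical. SQLite rejects a second row with an existing key value by raising sqlite3.IntegrityError, which the helper turns into a True/False result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# well_id is declared as the Primary Key of this hypothetical relation.
conn.execute("CREATE TABLE wells (well_id INTEGER PRIMARY KEY, depth_m REAL)")
conn.execute("INSERT INTO wells VALUES (101, 35.5)")


def try_insert(connection, well_id, depth_m):
    """Return True if the row was stored, False if its key value already exists."""
    try:
        connection.execute(
            "INSERT INTO wells VALUES (?, ?)", (well_id, depth_m)
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(conn, 102, 80.0))  # new key value
print(try_insert(conn, 101, 10.0))  # repeats an existing key value
```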
Data Management – Relational Database Model
• Searches of related attributes stored in different tables can be done by linking two or more
tables using the common attribute (field)
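Linking two tables on a common attribute might look like the following Python sketch, again with sqlite3 standing in for a relational system. The table names, columns and rows are invented. The join condition matches the parcel_id values in one table against those in the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical tables that share the parcel_id attribute.
conn.executescript(
    """
    CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, land_use TEXT);
    CREATE TABLE owners (owner_name TEXT, parcel_id INTEGER);
    INSERT INTO parcels VALUES (1, 'residential'), (2, 'commercial');
    INSERT INTO owners VALUES ('Owner X', 1), ('Owner Y', 2), ('Owner Z', 2);
    """
)

# Link the tables wherever the common attribute holds the same value.
linked = conn.execute(
    "SELECT owners.owner_name, parcels.land_use "
    "FROM owners JOIN parcels ON owners.parcel_id = parcels.parcel_id "
    "ORDER BY owners.owner_name"
).fetchall()
print(linked)
```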
• Advantages of Relational Database Model over Hierarchical or Network Database
Model
– Relational is more flexible – processing not restricted by the way data values are set in
a table
– Hierarchical/Network – internal structure of data model determines processing
capabilities
– Organization of the Relational Model is simple to understand – easier communication of
ideas
Data Management – Relational Database Model
• Disadvantages of the Relational Database Model compared with Hierarchical or Network Models
– More difficult to implement
– Slower performance – the absence of “pointers” (codes that indicate the location of files, etc.)
means that data manipulation requires matching values in relational tables
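The performance point can be illustrated with a simplified Python sketch. It is an analogy under stated assumptions, not a benchmark: object references stand in for pointers, a list of tuples stands in for a relational table, and all names are invented. Following stored pointers reaches the related records directly, while the relational-style lookup has to compare a value in every row. Real relational systems typically reduce this cost with indexes, which the sketch leaves out.

```python
# Pointer-style access: the parent record holds direct references to its children.
county = {"name": "A", "parcels": []}
for parcel_id in ("P1", "P2"):
    county["parcels"].append({"parcel": parcel_id})


def parcels_by_pointer(parent):
    """Follow the stored references; no searching is needed."""
    return [child["parcel"] for child in parent["parcels"]]


# Relational-style access: no stored references, so rows are found by comparing values.
parcel_table = [("P1", "A"), ("P2", "A"), ("P3", "B")]   # (parcel, county)


def parcels_by_matching(table, county_name):
    """Scan every row and keep those whose county value matches."""
    comparisons = 0
    found = []
    for parcel, county_value in table:
        comparisons += 1
        if county_value == county_name:
            found.append(parcel)
    return found, comparisons


print(parcels_by_pointer(county))
print(parcels_by_matching(parcel_table, "A"))
```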