Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Oracle Database wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Concurrency control wikipedia , lookup
ContactPoint wikipedia , lookup
Clusterpoint wikipedia , lookup
Relational algebra wikipedia , lookup
Geog 480: Principles of GIS Guofeng Cao CyberInfrastructure and Geospatial Information Laboratory Department of Geography National Center for Supercomputing Applications (NCSA) University of Illinois at Urbana-Champaign What we have learned • The Nature of Geographic Data: • Tobler’s first law of geography (spatial dependence) • Spatial Heterogeneity • Fractal Behavior • GIS Functionalities(spatial analysis) o o o o Geometric, topological, and set-oriented analysis Field-based analysis Network analysis Overlay analysis • Data and Database o Spatial Data: Vector vs. Raster • Hardware Database Fundamentals What is a database? • A database is a collection of data organized in such a way that a computer can efficiently store and retrieve data o A repository of data that is logically related • A database is created and maintained using a generalpurpose piece of software called a database management system (DBMS) The database approach • Before databases, computers were primarily used to convert data between different formats o “The computer as a giant calculator” • Databases treat computers as useful repositories of data o “The computer as data repository” • Most applications (including GIS) require a balance of processing and storage Databases in a nutshell • In order to be effective, databases must offer the following functions: – – – – – Reliability Integrity Security User views User interface – – – – – Data independence Self-describing Concurrency Distributed capabilities High performance • All these functions are managed by the DBMS Nutty Nuggets #1 • We might write a program to organize the stock for the “Nutty Nuggets” restaurant • As time continues, this program will become more complex, offering more functions Stage 1 Stage 2 Nutty Nuggets #2 • Key problems with the previous approach are: o Loss of integrity o Loss of independence o Loss of security • Stage 3, the database, solves these problems Stage 3 Common database applications • Home/office database o Simple applications (e.g., Nutty Nuggets) • Commercial database o Store the information for businesses (e.g. customers, employees) • Engineering database o Used to store engineering designs (e.g. CAD) • Image and multimedia database o Store image, audio, video data • Geodatabase o Store a combination of spatial and non-spatial data Elements of a DBMS • Query language • Query compiler • Runtime database processor • Constraint enforcer • Stored data manager • System catalog/data dictionary Transaction management • A transaction is an atomic unit of interaction between user and database o o o o Insertion of data Modification of data Deletion of data Retrieval of data • Transaction management must support o Concurrency (multiple users accessing the same data at the same time) o Recovery management (retrieval of a valid database state following system failure) Concurrency: Lost update • Lost update can occur when atomic transactions are incorrectly interleaved Relational databases Database architectures • Most databases today are either: o Relational; or o Object-oriented (especially useful for spatial data) • Early database systems were based on the hierarchical model o Efficient storage, but limited expressiveness • The network model was used to overcome lack of expressiveness in hierarchical databases o But led to highly complex database system • The deductive model is an active research area today o Stores rules in addition to facts The relational model • A relational database is a collection of relations, often just called tables • Each relation has a set of attributes • The data in the relation is structured as a set of rows, often called tuples • Each tuple consists of data items for each attribute • Each cell in a tuple contains a single value • A relational database management system (RDBMS) is the software that manages a relational database Example relation Relation Attribute Tuple Data item Relations • A relation is basically a “table” • A relation scheme is the set of attribute names and the domain (data type) for each attribute name • A database scheme is a set of relation schemes • In a relation: o Each tuple contains as many values as there are attributes in the relation scheme o Each data item is drawn from the domain for its attribute o The order of tuples is not significant o Tuples in a relation are all distinct from each other • In most relational systems, data items are atomic o A relation that contains only atomic items is said to be in first normal form (1NF) • The degree of a relation is its number of columns • The cardinality of a relation is the number of tuples Relation scheme • A candidate key is an attribute or minimal set of attributes that will uniquely identify each tuple in a relation • One candidate key is usually chose as a primary key Operations on relations • Fundamental relational operators: o Union, intersection and difference: usual set operations, but require both operands have the same schema o Selection: picking certain rows o Projection: picking certain columns o Products and joins: compositions of relations • Together, these operations and the way they are combined is called relational algebra combined: o An algebra whose operands are relations or variables that represent relations • The relational model is said to be closed, because relational operators take one or more relations as input and return a relation Project operator • The project operator is unary o It outputs a new relation that has a subset of attributes o Identical tuples in the output relation are coalesced Relation Sells: bar Murphy’s Murphy’s Legend’s Legend’s beer Bud Miller Bud Miller price 2.50 2.75 2.50 3.00 Prices := PROJbeer,price(Sells): beer price Bud 2.50 Miller 2.75 Miller 3.00 Select operator • The select operator is unary o It outputs a new relation that has a subset of tuples o A condition specifies those tuples that are required Relation Sells: bar Murphy’s Murphy’s Legend’s Legend’s beer Bud Miller Bud Miller price 2.50 2.75 2.50 3.00 MurphyMenu := SELECTbar=“Murphy’s”(Sells): bar beer price Murphy’s Bud 2.50 Murphy’s Miller 2.75 Join operator • The join operator is binary o It outputs the combined relation where tuples agree on a specified attribute (natural join) Sells( bar, Murphy’s Murphy’s Legend’s Legend’s beer, Bud Miller Bud Coors price ) 2.50 2.75 2.50 3.00 Bars( bar, address) Murphy’s 604 Green St. Legend’s 522 Green St. BarInfo := Sells JOIN Bars Note Bars.name has become Bars.bar to make the natural join “work.” BarInfo(bar, Murphy’s Murphy’s Legend’s Legend’s beer, Bud Milller Bud Coors price, 2.50 2.75 2.50 3.00 address ) 604 Green 604 Green 522 Green 522 Green St. St. St. St. Join operator •Join is the most time-consuming of all relational operators to compute o In general, relational operators may not be arbitrarily reordered (left join, right join) o Query optimization aims to find an efficient way of processing queries, for example reordering to produce equivalent but more efficient queries Complex relational operator example Join relations SHOW and FILM using FILM_NAME and TITLE Select using CINEMA_ID=1 Project TITLE, DIRECTOR, CINEMA_ID, and SCREEN_NO For full database see book web site: http://worboys.duckham.org Relational databases and spatial data • Several issues prevent unmodified databases being useful for spatial data o Structure of spatial data does not naturally fit with tables o Performance is impaired by the need to perform multiple joins with spatial data o Indexes are non-spatial in a conventional relational database • An extensible RDBMS offers some solutions to these problems with o o o o user defined data types user-defined operations user-defined indexes and access methods active database functions (e.g., triggers) • End of this topic