* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download No Slide Title
Survey
Document related concepts
Microsoft Access wikipedia , lookup
Oracle Database wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Relational algebra wikipedia , lookup
Concurrency control wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Ingres (database) wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Versant Object Database wikipedia , lookup
ContactPoint wikipedia , lookup
Clusterpoint wikipedia , lookup
Transcript
Dr. Ahmad Bintouq Overview Why we need database Database Concept Types of database models Relational database DBMS • DBMS is a collection of data (database) and programs to access that data. The goal of DBMS is to store, retrieve, and display information • Key characteristics of DBMS are: performance, store large volume of database, share data (access), provide security (authorization), remove redundancy (normalization) and provide concurrent access (different users at the same time). What is a Database? • A database is a set of computer files that stores information in an organized, structured format • The information is organized in records and fields • Information in a database is related so questions can be asked such as: • List all of the courses that are 4000 level or higher • List the name and address for all people whose last names begin with "T" Database-continue • 4 basic types of computer database Structures for management of data: hierarchical, network, relational, and object oriented • Database Records and Fields • Record: a small group of related data items (the logical unit of a database) • Field: An individual item of data (contain information that describe records) Why we need database? Without database GIS is cartography (electronic map) No database No spatial analysis What is GIS? Hardware People GIS Methods Software Editing, analysis, production Information Database and GIS GIS database Attribute DBMS Text, Images Sound, movie Multimedia Spatial data location Data abstraction • Physical level: Describe how the data are actually store (word or bytes) • Conceptual level: Describe what data are actually stored in the database (Structure). it gives Schematic representation of phenomena Data abstraction-Continue • View level: Describe only a part of the entire database. Many users of the database may be concerned with a subset of information. The system may provide many views for the same database Data abstraction View Level Conceptual level Logical Level Different users for the same database Interdisciplinary - Centralization – Application A Economic Development Application B Permits Application C Application D Engineering Utility Billing Application Application E E Application F Work Work Emergency Management Response Management An Example Application G Application H Application I Tax Billing Leisure Facilities Enforcement External Source Payroll General Ledger Investments A near-real time Financial Representation of Businesses in Enterprise CORPORATE DATABASE (GIS) Drawings A near-real time Spatial Representation of Businesses in Enterprise User Interface Expert Interface Web Service User Interface Expert Interface Instance of schemes • The collection of database at a particular moment is called the instance of the database • The overall design of the database is called the database scheme Types of database models Data model is a collection of conceptual tools for describing data, data relationship, data semantics, and consistency constraints. There are mainly three types of models • Object-based logical models Are used to describe data at the conceptual and view level. Example of these the Entity-Relationship model and object-oriented model Data models- Continue • Record-based logical models Are used to describe data at the conceptual and view level. Example of these are: Network model, Hierarchical model, and relational model. • Physical data models Are used to describe data at the physical level (bytes and words). It is mainly deal with hardware. Entity-Relationship (E-R) model • It is based on simulation of the real world which consists of basic objects called entities and relationship among these objects • The overall logical data structure of a database can be expressed graphically by an E-R diagram. Which consists of rectangle (entity), ellipse (attribute), diamond(relationship), and lines. Street Number Balance SSN Name Date Customer 1 Custom Acct Account 3 2 3 Tables can represent the above relation E-R Diagram for customer CustomAcct relationship associate a customer with each account he has Aggregation Aggregation is an abstraction through which relationships are treated as higher-level entities i.e. express relationships among relationships Name Employee SSN Hours Numbe r Project Work Users 5 tables can represent the diagram Type Machinery E-R diagram with aggregation Generalization In E-R diagram generalization is depicted through a triangle labeled ISA (is a). The attribute of higher level entity are said to be inherited by lower level entity. e.g. both saving and checking account inherit the attributes of account Balance Account number Account 3 tables can represent the diagram Saving account Interest rate ISA Checking account Over draft E-R diagram with generalization E-R diagram Table • An Entity (E) with attributes a1..an can be represented by a table called E with n columns for each attribute. • Each row in this table corresponds to one entity of the entity set E Let D1 ==> set of all account number D2===> set of all balance Any row consists of 2 tuples (v1,v2) e.g. (255,3000) The set of all possible rows is the Cartesian product of D1 and D2 i.e. = D1 X D2 For a table with n columns the total number of rows = D1 X D2 X ..X Dn-1 X Dn Account_no 255 452 560 323 215 balance 3000 3222 34555 21000 456780 Mapping constraints • Mapping cardinalities express the number of entities to which another entity can be associated via a relationship • For a binary relationship set R between entity set A and B the mapping can be oneone (1-1), one-many(1-M), many-one (M1), and many-many(M-M) a 1 a 2 b1 b2 b3 1-M relation Object oriented Model The basic unit that an object-oriented (OODBMS) manages is the object. It is based on four basic concepts of abstraction: • Classification: Mapping of several objects (instances) to common class • Generalization: Group several classes which have the same properties in common (roads, railway)-transportation network Object oriented Model-Continue • Association: Relation between similar objects is considered a higher level set object • Aggregation: Objects which consist of several other objects (Composed objects) Object oriented Model-Continue • OO model uses objects rather than records to manage data • An object is a collection of data elements and operations that together are considered a single entity • An object has associated with it a set of variables that contain the data for the object, a set of messages to which the object respond, and a method which response to the message Object Oriented Model-Continue • Once the structure is setup, the details of it need not be user visible • This approach has the attraction that query is very natural • A geographic data handling systems employ this model are: TIGRIS, DAPLEX, and PROBE • It is application in GIS is recommended Object oriented Model-Continue • Objects are typed and the format and operations of an object instance are the same as some object prototype • Example of an object might be a lake: • List of border chain: C1, C2, C3, Cn • List of nodes: N1, N2, N3, Nn • Attribute: Depth, soil type Object oriented Model-Continue For example student can be a superclass. First and second year student may represented by a classes that are specialization of a student class variables and methods specific to first year students are associated with fist year student class. Variables and methods that apply both to first and second year students are associated with student class. The variables associated with each class may be: Student: Name, ID, address Student First year student: Subject Second year student:Practical course First year Second year Hierarchical Model • Based on Tree structure (child-parent) • No element can have more than one parent • Requires knowledge by the user of the actual storage scheme used by the DBMS • Examples of database are: System2000 and IMS • Not commonly applicable in GIS Network Model • Organized data in a network or plex structure (child-parent) • Children may have more than one parent • The query language is procedural • Examples of database are: DBMS-10, DMS1100, IDMS • Not commonly applicable in GIS Database tree structure (Hierarchical, Network) Root (Parent) A1 B2 B1 C1 Child C2 Child Parent C3 C4 Child C5 C6 Relational model • A relational database consists of a collection of tables, each of which is assigned a unique name • The relational models differs from network and hierarchical models in that it does not use pointers or links. Instead , the relational model relate records by the value they contain.This freedom from the use of pointers allows formal mathematical foundation to be defined • Examples of RDBMS are Oracle, Informix, and Sybase Reasons to use Relational Model • Independence of the physical data storage and logical database structure. Results in users do not need to understand the underlying physical layout of the data to access data from a logical structure, such as a table • Variable and easy access to all data. Results in access to data is not predefined as in hierarchical databases in which users must understand and navigate through the hierarchy to retrieve data • Flexible in database design. i.e complex objects are expressed as simple tables and relationships • Applying relational design methods reduces data redundancy (Normalization) and storage requirements Relational DBMS • Aspects of an RDBMS – Structures: Well defined objects – Operations: Clearly defined actions – Integrity Rules: Rules that control which operations are allowed on the data and structures of the database Relational DBMS • Components of a Relational Database – Table: collection of rows all containing the same columns – Row: Horizontal components of a table. Consists of values for each column. Each row is equivalent to a record – Column: Vertical component of a table. Each column in the record is often referred to as a field Relational DBMS • Relational Database Rules – Each column in a table must be unique – The order of the rows in a table is not meaningful – The order of the columns in a table is not meaningful – All data in a column must be the same type – Every table has a primary key, each column in the primary key must have a value Relational DBMS • Primary Key and Foreign Key – Relational database use primary keys and foreign keys to allow mapping of information from one table to another – A foreign key is column or group of columns in a table whose value matches those of the primary key of another table – Values in primary key column must be unique e.g. social security number (SSN) Relational DBMS • Primary Key and Foreign Key – Referential Integrity refers to the integrity of the reference from the primary key in one table to a foreign key in another table. Relationships between Tables • • • • One-to-One One-to-Many Many-to-One Many-to-Many Relational DBMS • Relational Database Example (1-1) Weather table city_name Washington Amsterdam Warsaw Tokyo Washington measurement_dt avg_temp 05-01-94 70 05-01-94 47 05-01-94 43 05-01-94 60 05-01-94 55 Foreign Key Primary Key Location table city_name Seattle Amsterdam Warsaw Tokyo Washington Primary Key country_name United States Neatherlands Poland Japan 05-01-94 One-to-One NAME Caroline Talbot Dorchester Wicomico Worcester Somerset AREA POP1990 331.8066 27035 238.2847 30549 534.1747 30236 383.3481 74339 457.6503 35028 269.267 23440 NAME Caroline Talbot Dorchester Wicomico Worcester Somerset NAME POP1997 Caroline 29424 Talbot 32565 Dorchester 29953 Wicomico 79716 Worcester 41885 Somerset 24251 AREA POP1990 POP1997 331.8066 27035 29424 238.2847 30549 32565 534.1747 30236 29953 383.3481 74339 79716 457.6503 35028 41885 269.267 23440 24251 Relational DBMS • Relational Database Example (1-M) Complexes table comp_name Kotraya H.Plaza Komtar Primary Key N.Shops 444 555 622 N.Banks 70 47 43 Owner table comp_name Kotraya Kotraya Kotraya Kotraya Kotraya Foreign Key shop_owner ALi Tan Lee Raman Nora One-to-Many Dept. Agri Agri Nat Sci Dept. Agri Nat Sci Agri Agri Agri Nat Sci Agri Agri Advisor Shorter Ruby Mollett Shorter Shorter Ruby Mollett Mollett Advisor Shorter Mollett Ruby Name Aaron Brian Devin Eric Julie Leigh Ronnie Ryan Advisor Shorter Ruby Mollett Shorter Shorter Ruby Mollett Mollett Name Aaron Brian Devin Eric Julie Leigh Ronnie Ryan Many-to-One SITE WV1 WV2 MD1 MD2 SITE WV1 WV2 MD1 MD2 AREA 331.8066 238.2847 534.1747 383.3481 Veg Type D1 C2 D1 S3 AREA 331.8066 238.2847 534.1747 383.3481 Veg Type Name C2 Forest Coniferous D1 Forest Deciduous S3 Wetland Swamp Dominant pH of Soil Scrub Pine 6.5 Maple 8 Skunk Cabbage 5 Veg Type Name Dominant pH of Soil D1 Forest Deciduous Maple 8 C2 Forest Coniferous Scrub Pine 6.5 D1 Forest Deciduous Maple 8 S3 Wetland Sw am p Skunk Cabbage 5 Data Definition Language (DDL) and Data Manipulation Language (DML) • DDL store files that contains data about data (metadata). For example storage of structure in data dictionary • DML enable users to access or manipulate data (retrieval, insertion, deletion). The part of DML that involves information retrieval is called a query language (QL) Types of DML • There are two types of DML, procedural and non procedural • Procedural DML: Require the user to specify what data is needed and how to get it • Non-procedural DML: Require the user to specify what data is needed without specify how to get it Query language (QL) • QL is the language in which a user requests information from the database. Example of QL are relational algebra (procedural) and tuple relational calculus (nonprocedural) • The most common query languages are Structured Query Language (SQL),Query By Example (QBE), and Quel • SQL has gain wide acceptance in commercial products Relational algebra (RA) • RA is a procedural query language. It consists of a set of operations that take one or two relations as input and produce a new relation as a result • The fundamental operations in RA are select(6), project(II), Cartesian product(X) , rename, union(U), and set difference(-) • Other operations include intersection, natural join, division, and assignment • To select those tuples (rows) of the AccCust relation where customer name is “John” it could be written as: 6name = “John” (AccCust) The results may be one or more records or street = “ spring” 6street = “Spring” (AccCust) • Comparisons can done using =, >, <, >=,etc. • Let E1 and E2 be relational algebra expressions. Then the following are all relational algebra expressions: • E1 U E2 union • E1 - E2 set difference(-) • E1 X E2 Cartesian product Structured Query Language (SQL) • SQL is the standard relational database language • SQL include commands not only restricted to query but to other functions such as defining relation, deleting relations, creating indices, and modifying relation scheme, access right, integrity, and transaction control • Basic structure of an SQL expression consists of three clauses: Select, from, and where • A typical SQL query has the form select A1, A2, ..., An from r1, r2, .., rm where P Each Ai represents an attribute and each ri a relation. P is a predicate (selection). The list of attributes A1..An can be replaced by (*) to select all attributes • The result of an SQL query is a relation (table) • The keyword “distinct” is used after select to force the elimination of duplicates. For example, find all customer names having a balance equal $ 6000 select distinct customer-name from AccCust where balance = 6000 • SQL supports union, intersect, and minus • SQL uses logical connectives and, or, and not and operators (+, -, *, /) on values from tuples e.g. select distinct customer-name from AccCust where balance between 600 and 700 Setting up User Accounts and Privileges • Typical Roles and Privileges User 1 User 2 User 3 User 4 Role 1 create table create view alter table select update Creating Tables – In a relational database, data is stored in tables – Users with connect privileges can create tables – The creator of the table becomes the owner – Users cannot access the tables unless permission is explicitly given – Users can modify the database by deletion i.e. remove selected tuples, insertion i.e. insert tuples, or update i.e change a value in a tuple without changing all values in the tuple Field types • • • • • INTEGERS e.g. number of population, age REAL (DECIMAL) e.g. income, salary CHARACTER e.g. names, description DATES e.g. date of flood IMAGES (Multimedia) e.g. image of flood (*.tif) • SOUND e.g. Sound of thunder (*.wav) • MOVIE e.g. recording film (*.avi, *.mov) • Granting Object Privileges – Only owners of a database object can grant object privileges to other users – Only users with DBA privileges can grant object privileges to roles Creating and Displaying Indexes • Indexes are optional structures that can be created on any column or set of columns in a table to speed up viewing and retrieval of data rows. • An index is a list of keywords with the location of the keyword information • Users can create indexes Index Concept – Index is conceptually similar to an index at the end of a book or cataloging books in a library – Database index is not a physical structure but a logical structure. The RDBMS, not the user uses the database index. The DBMS chooses the best index to use in the table. – Index can dramatically speed up data access but will require additional space and may slow down certain SQL operations Exporting and Importing Data • Export and Import utilities allows data to be moved in and out of the native database. For Example, converting database tables to other format such Comma Delimited Format (CDF), Space Delimited Format (SDF), *.dbf, and ASCII flat files (Text) Relational Database design In the database design considerations must be paid to: • Repetition of information (Normalization) • Loss of information • Number of fields and records required, some RDBMS supports limited number of fields and records (e.g. 255 fields) • Nature of database, some RDMBS supports only textual data, some supports multimedia (sound, image, etc.) Database quality Check accuracy of attribute database by Performing quality assurance / quality control (QA/QC) on the database by: • Check correctness of data • Check mis-typing errors Storage Capacity Terminology Symbol Term K Kilobyte M Megabyte G Gigabyte Bytes Actual Bytes 1000 1024 1,000,000 1,048,576 1,000,000,000 1,073,741,824 NCGIA Boolean expressions • AND • OR • NOT AND A AND B = T T T F F T F F Result T F F F Boolean Operators both expressions are true Elevation >= 100 AND Veg Type = conif Elevation Veg Type Rainfall 100 Mixed 1 250 conif 3 50 conif 2 225 decid 1 pH 7.0 6.5 4.0 8.0 OR A OR B = T T T F F T F F Result T T T F Boolean Operators at least one expression is true Rainfall > 2 OR Veg type = decid Elevation Veg Type Rainfall 100 Mixed 1 250 conif 3 50 conif 2 225 decid 1 pH 7.0 6.5 4.0 8.0 NOT Negates the Boolean value. Returns TRUE if the Boolean is FALSE Returns FALSE if the Boolean is TRUE Boolean Operators Excludes information NOT pH <= 6.5 Elevation Veg Type Rainfall 100 Mixed 1 250 conif 3 50 conif 2 225 decid 1 pH 7.0 6.5 4.0 8.0 Operators - equals - greater than - less than - not equal to Operators - greater than or equal to - less than or equal to - expressions enclosed in parentheses are evaluated first Query Syntax [] around field names _ in place of spaces in field names “ ” around string values () grouping and evaluation order * multiple character wildcard ? Single character wildcard Date format yyyymmdd Query Syntax Examples ([Roof_type] = "tile”) ([Owner_age] > 65 and [Income] < 20000 ) ([Type_use] = "res*" and [Const_date] < 1955) ([State_name] = “A*”) ([Name] = “?athy”) ([Date] = 19991027) Common Query Syntax Errors [Roof_type] = "tile”) ([Owner_age] > 65) and [Income] < 20000 ) ([Type_use] = res*) ([Const_date] < “1955”) Query Syntax Salisbury Hebron Princess Anne Willards Correct Query Syntax ([City] = “Salisbury” OR [City] = “Princess Anne” OR [City] = “Hebron” OR [City] = “Willards”) Incorrect Query Syntax ([City] = “Salisbury” OR “Princess Anne” OR “Hebron” OR “Willards”) References • Bailey, T. C. (1994). “A review of statistical spatial analysis in geographical information systems.” In: Fotheringham, A. S. and P. A. Rogerson (eds.). Spatial analysis and GIS. Taylor & Francis Ltd., London, UK. • Batini, C. S. and S. B. Navathe (1992). Conceptual database design. The Benjamin/ Cummings Publishing, California, USA. • Hoffer, J. A., J. S. Valacich and J. F. George (1996). Modern systems analysis and design. The Benjamin/Cummings Publishing, California, USA. • Michael, A. and C. Smith (1996). Teach yourself database programming with Visual Basic 4 in 21 days. Sams Publishing, Indiana, USA