* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Infrastructure
Microsoft Jet Database Engine wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Clusterpoint wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Relational algebra wikipedia , lookup
An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko 1 Presentation Overview Database functionality Relational Data Model SQL Web-Database Connectivity 2 Transaction A transaction is an exchange of: Information; Goods; Services; Currency/currencies; Transaction properties Atomicity: a transaction must be all or nothing. Consistency: a transaction takes the system from one consistent state to another. Isolation. Durability. 3 Databases and Information Economy Shift from computation to information; corporate computing; personal computing and the Internet; scientific computing. Growing importance of transaction-orientation and information retrieval. The database field concentrates on the efficient management of large amounts of persistent, reliable shared data. 4 Database versus File System Database management systems (DBMS) is a software that provides transaction support by implementing Data independence; Data access efficiency; Concurrency control ; Data integrity; Reliability; Security; Data distribution and heterogeneity. 5 Data Independence and Access Efficiency DBMS allows to avoid rewriting all access routines every time the data format changes or data is added/modified/deleted. insulate applications from data storage details. Logical independence: protection from changes in logical structure of data. Physical independence: protection from changes in physical structure of data. DBMS maintains data structures and implements algorithms allowing to avoid linear search indexing: search in O(log n); fast access even on complex data queries. 6 Concurrency Control and Data Integrity Interleaving actions of different applications boosts performance. DBMS insures semantically correct access to the same data by concurrent applications two programs accessing the same data at the same time can result in an inconsistent update; implement sharing in a controlled manner. Data semantics may require certain constraints to be satisfied. DBMS guarantees that application programs comply with the constraints when adding/modifying the data. 7 Reliability and Security DBMS provides techniques for recovery from software and hardware failures guarantee survival of the data across catastrophes. DBMS prevents unauthorized users from accessing/modifying data or denying service to other users. 8 Data Distribution and Heterogeneity Centralization is the enemy of scalability a vast number of modern applications are distributed. Data sharing in a distributed environment is a challenge. Heterogeneity applies to networks, hardware, operating systems, programming languages, data formats, etc. Distributed applications must mask the differences. Need distributed data management. 9 Categories of Data Models High-level or conceptual entities, attributes, relationships. Representational or implementation or logical relational, network hierarchical, objectoriented, object-relational. Physical or low-level data storage. 10 Levels of Abstraction in a Database Schema versus Instance schema = description of the data that captures data types, relationships, constraints on the data; l meta-data (data about data), knowledge, e.g., Employees(EmpName, EmpNo, Dept, Sal) is independent of any application program changes infrequently instance = set of records/tuples/rows for that schema, the actual data in the database at a given time time-varying e.g., <Jane, 201, Shoe, 1M>,<Susan, 302, Toy, 1M> 11 3-schema Architecture Physical level description of a database: how things are stored on disk: files, record structures, indices, data structures for disk blocks, methodology for dealing with too long records, etc. Conceptual level description of a database The description of application data (its schema) using one of the traditional data models. 12 3-Schema Architecture (cont'd) View-level description of a database What users of a particular application see their own customized schema, e.g., for payroll, for the ticket agent, for a simulation program. Multiple levels helps with data independence; helps with maintenance. Many views, single logical and physical schema. Levels of abstraction give data independence. 13 The Entity-Relational Model Entity: a distinguishable object. Entity set: a set of entities all of the same type. Attribute: a single property of an entity; simple vs composite; single-valued vs multi-valued; stored vs derived; null values. Domain: set of values permitted for that attribute. 14 The E-R Model (2) Relationship: an association between two or more entities. Relationship set: a set of relationships all of the same type There is no correct schema for a batch of data. Which schema is best depends on the application. Many basic data modelling choices depend on an understanding of the application. 15 Data Model Data model: notation for describing data, plus a set of operations used to manipulate that data. a set of primitives for defining the structure of a DB; a set of operations for specifying the retrievals and updates on a DB; relational, hierarchical, network, object-oriented. 16 The Relational Model (Codd 1970) The relational data model is the most important data model currently existing. Value-oriented, i.e., allows operations on relations whose results are relations, thus enables to combine operations. As opposed to object-oriented models, in which Operations cannot be applied to the result of other operations; The result of an operation may be a new data type, and operations may not be available for this type. 17 Definitions: Domain & Relation A domain is a set of atomic values. A relation is a finite subset of the cartesian product of a finite list of domains; relation is a set of tuples; order of tuples is irrelevant and no relation has 2 identical tuples; each tuple value is atomic no composite attributes; no multi-valued attributes. 18 Relational Model (cont’d) Everything is represented by relations – Formally: Given sets D1, D2, ....Dn (not necessarily distinct), a relation R D1 X D2 X ...X Dn – Di 's are the domains and n is the arity (degree) of R – elements of R are called tuples – number of tuples in R is the cardinality of R relational data model helps to view a relation as a table Observe the following properties: 19 Relational Model (cont’d) Everything is represented by relations; Given sets D1, D2, ....Dn (not necessarily distinct), a relation R D1 X D2 X ...X Dn; Di 's are the domains and n is the arity (degree) of R; elements of R are called tuples; number of tuples in R is the cardinality of R. Relational data model helps to view a relation as a table: each row represents a tuple (record); each column represents an attribute (field). Properties: no two rows are identical; the ordering of tuples is unimportant; the ordering of columns is important. 20 Keys Let R be a relation schema and K R. K is a superkey of R if it can uniquely identify any tuple in any r(R). There are no tuples t and t' such that t[K] = t'[K}. K is a candidate key if K is a minimal superkey. There is no K' K such that K' is also a superkey of r(R) A primary key is one of the candidate keys, remaining candidate keys are alternate keys; Every relation has a key. A key is a property of a relation schema, not a relation. 21 Integrity Constraints Relational database schema is a set of relation schemas and a set of integrity constraints Integrity constraint: condition that must be true for any instance of a database. Integrity constraints are expected to hold on every database instance of the schema Integrity constraints Structural: key constraint: uniqueness of keys; entity integrity constraint: no primary key value can be null; referential integrity constraint: reference from relation R to relation S must refer to an existing tuple of S Semantic. 22 Foreign Keys A set of attributes (FK) of R is a foreign key if the attributes in FK have the same domain as the primary key (PK) attributes of another relation S, and for each instance of R, the values of FK occur as a value of PK for some instance in S, or is null. In the relational model, the only way an entity can reference another entity is through the value of the primary key of the second entity. Foreign keys don't have to be unique or non-null, but if one component is null, then all components must be null. 23 E-R to Relations (I.e., Defining Relations) Done using DDL (Data Definition Language) Name whole database schema Declare domains for attributes Define relations: name attribute names and domains primary and other keys foreign keys 24 Translating from E-R Represent entity set E by a relation whose attributes are all the E-R attributes of E. Then each tuple represents one entity of E. To represent relation R between entity sets E1, …, Ek, create relation R with key attributes of E1, …, key attributes of Ek, as attributes (rename duplicates). Each tuple of the relation represents one combination of entities that are related to one another. You might have some redundant relations, which you can delete. 25 Schema Normalization Formal theory of database design based on grouping attributes in a particular way using attribute dependencies to achieve ‘good’ schemas 1NF, 2NF, 3NF, BCNF, 4NF, … Goal: don’t store redundant information can represent everything (otherwise, the schema is useless!) 26 Query and Update Languages DDL : data definition language used by DBA to define schemas, create views, create indices DML : data manipulation language used by sophisticated casual user to query data or update data 27 Relational Query Languages Query languages allow manipulation and retrieval of data from a database. Relational model supports simple, powerful query languages; strong formal foundation based on logic; allows for optimization. Two mathematical languages form the basis for relational languages (e.g., SQL) and for implementation: Relational Algebra: More operational, useful for representing execution plans; Relational Calculus: Lets users describe what they want, rather than how to compute it (non-operational, declarative). Basic operations: selection, projection, cross-product, set-difference, union, intersection, join, division 28 SQL SQL (Structured Query Language) is the query language for the System R developed at IBM San Jose [Astraham, Gray, Lindsay, Selinger ..] SQL is now the query language for IBM's DB2 and the de-facto standard on most commercial RDBMS. SQL is a comprehensive language providing statements for data definition, query and update. Hence it is both DDL and DML. 29 SQL (cont’d) SQL allows to create views, it can be embedded in a general-purpose programming language (C or PASCAL). SQL has one basic statement for retrieving data from the database: the SELECT statement SELECT <attribute list> FROM <table list> WHERE <condition> Standards: SQL or SQL1 (ANSI 1986); SQL2 or SQL-92 (ANSI 1992); SQL3 underway: extends SQL with OO and other concepts. 30 SQL Data Types Numeric Integers of various ranges: INTEGER (or INT), SMALLINT; Real numbers of various precision: FLOAT, REAL, DOUBLE PRECISION; Formatted numbers: DECIMAL(i,j) or DEC(i,j) or NUMERIC(i,j). Character Strings Fixed length n: CHAR(n) or CHARACTER(n); Variable length of maximum n: VARCHAR(n) or CHAR VARYING(n) (default n =1). Bit strings: Fixed length n: BIT(n); Varying length of maximum n: VARBIT(n) or BIT VARYING(n). 31 SQL Data Types (cont’d) Date & Time [SQL2]: DATE (10 positions): YYYY-MM-DD; TIME (8 positions): HH:MM:SS; TIMESTAMP:date, time with 6 fractions of seconds and optional time zone TIME(i) defines i decimal fractions of seconds (8+1+i positions): HH:MM:SS:ddd...d; TIME WITH TIME ZONE includes the displacement from standard universal time zone [+13:00 to -12:59] (6 additional positions): HH:MM:SS+/-HH:MM INTERVAL: Year/Month or Day/TIME 32 Data Definition Language DDL is used to define the (schema of) database to create a database schema; to create a domain; to create, drop. alter a table; to create, remove an index [defunct in SQL2]; to create or drop a view; to define integrity constraints; to define access privileges to users (Oracle: CONNECT, RESOURCE, DBA); to GRANT or REVOKE privileges ON/TO object/user SQL2 supports multiple schemas CREATE SCHEMA name AUTHORIZATION user; – CREATE SCHEMA EMPLOYEE AUTHORIZATION yesha; 33 SQL Schema EMP(Name,SSN,DNO,BirthPlace) DEPT(DName,DNO,MGRSSN) PROJECT(PName,PNO,PLocation,DNum) WORKSON(ESSN,PNO,Hours) CREATE SCHEMA 'COMPANY'; CREATE TABLE EMP ( EName name_dom NOT NULL, SSN CHAR(9) NOT NULL, DNO INTEGER NOT NULL, BirthPlace city_dom, PRIMARY KEY(SSN), FOREIGN KEY (DNO) REFERENCES DEPT (DNO) ); 34 Drop DROP command can be used to remove – a schema: DROP SCHEMA Company CASCADE; DROP SCHEMA Company RESTRICT – CASCADE option removes everything: tuples, tables, domains, ... RESTRICT option removes the schema if it has no elements in it a table: DROP TABLE EMP CASCADE; DROP SCHEMA EMP RESTRICT CASCADE option removes the table and all references to it RESTRICT option removes the table if it is not referenced 35 Alter The ALTER allows to: – – – – – alter the domain of an attribute ALTER TABLE Student ALTER GPA NUMBER(4,2); set or drop default value of an attribute ALTER TABLE Student ALTER GPA DROP DEFAULT; ALTER TABLE Student ALTER GPA SET DEFAULT 0.00; add a new attribute to a relation ALTER TABLE Student ALTER Admission DATE; drop an attribute (not in SQL1) ALTER TABLE Student DROP GPA [CASCADE/RESTRICT]; 36 Data Manipulation Language SELECT tuple queries aggregate queries INSERT DELETE UPDATE CREATE VIEW 37 Data Query: SELECT Used for retrieval. Used to specify subqueries for retrieval and for the other operations. Not the sigma or select operator of the relational algebra. The general form of a SELECT statement: SELECT <attribute list> FROM <table list> WHERE <condition> GROUP BY <attribute list> HAVING <condition> ORDER BY <attribute,{ASC/DESC} pair> 38 Relational Operators in SQL Projection: SELECT A,B FROM R Selection: SELECT * FROM R WHERE F The WHERE condition can be a Boolean combination of conditions. Product of two tables, A X B: SELECT FROM R., S. R, S 39 Data Update Examples: Create a new instance of an entity explicitly with all details: INSERT table VALUES (v1, v2, ..., vn) inserts a tuple (v1, v2, ..., vn) into table Unspecified columns get their default value if available,NULL if allowed (and no default is defined), (fails otherwise): INSERT table(c1, c2, cm) VALUES (v1, v2, ..., vm) inserts a tuple with m columns set to (v1, v2, ..., vm). 40 Data Update (cont’d) Examples: UPDATE table SET c1 = v1, ..., cm = vm [WHERE expr] Update all instances matching some local criterion: UPDATE table SET ? WHERE? Remove instances matching some local criterion: DELETE table WHERE ? Remove instances matching a non-local criterion DELETE table FROM table, table2 WHERE AND ? 41 SQL Views An SQL view is a table derived from other tables base (physical) tables—ultimately depend on these defining tables other views Views are usually virtual sometimes materialized 42 View Specification Views are specified with the paradigm CREATE VIEW view AS SELECT … The SELECT … part is the defining query In a view definition, we can define column names of view use aggregation (GROUP BY) 43 View Resolution Views are computed by a sort of macro expansion when needed—view resolution query modification: compute fresh view materialization: store always up to date: modifications to the defining tables are automatically reflected in the view 44 Updating Views A view tuple may have as its source a combination of base tuples, because of joins aggregations Therefore, updating a view is nontrivial to determine may potentially lead to more than one update on the defining relations is often not allowed 45 Updating Views (2) Restricted kinds of views may be updated. The subselect must have only one defining table columns including a candidate key of the table only column names, not expressions no joins, e.g., no correlative subqueries no DISTINCT keyword no aggregates 46 CHECK OPTION Specified for updatable views Checks whether an INSERT or UPDATE to a view would immediately cause the tuple to disappear from the view the new tuple should satisfy the WHERE clause of the view definition 47 SQL Constraints SQL allows declarative constraints such as CREATE ASSERTION assertion-name CHECK (enhanced subselect query) The DBMS checks if any ASSERTION is ever violated 48