Chap 2. The Relational Model of Data

Contents
- An Overview of Data Models
- Basics of the Relational Model
- Defining a Relation Schema in SQL (covered after Chapter 6, SQL)
- An Algebraic Query Language
- Constraints on Relations

An Overview of Data Models

Data model
- an abstract description of data; when the focus is on structure, an abstract description of the logical structure of data
- the description generally consists of structure and operations, subject to certain constraints
- structure of the data
  » a high-level description of the structure of the data, sometimes referred to as a conceptual (data) model
  » higher level than data structures in C or Java, such as arrays and structures
- operations on the data
  » usually a limited set of high-level operations in a DB data model
  » queries: operations that retrieve information
  » modifications: operations that change the database
- constraints on the data
  » a way to describe limitations on what the data can be
  (ex) “a movie has at most one title”
  (ex) “a day of the week is an integer between 1 and 7”

Various data models
- relational model
  » widely used in commercial database management systems
- semistructured-data model
  » includes XML and related standards
- other data models
  » object-oriented model: may be used for some special-purpose applications
  » object-relational model: object-oriented features are added to the relational model
  » hierarchical model, network model: used in earlier DBMSs

Comparison of modeling approaches
- semistructured models have more flexibility than relations
- the relational model is still preferred in DBMSs, for
  » efficiency of operations on large databases
  » ease of use, i.e., the productivity of programmers
- the relational model
  » provides a simple, limited approach to structuring data, yet is reasonably versatile, so anything can be modeled
  » provides a limited collection of operations on data, yet is very useful

Basics of the Relational Model
Relation
- a two-dimensional table
  » a set of tuples whose components have atomic values

The relation (or table) Movies:

  title                 year   length   genre
  Gone With the Wind    1939   231      drama
  Star Wars             1977   124      sciFi
  Wayne’s World         1992   95       comedy

- each row (tuple) represents a movie
- each column (attribute) represents a property of movies

Attributes
- names for the columns of the relation
  (ex) title, year, length, genre in relation Movies

Tuples
- rows of a relation
  (ex) (Star Wars, 1977, 124, sciFi)
- the relational model requires that each attribute value be atomic, i.e., record structures, sets, lists, etc. are not allowed

Domains
- an elementary type associated with each attribute of a relation
  (ex) The value for attribute title must be a string whose length is less than or equal to 30

Schema
  » a description of the data
- relation schema
  » the name of a relation and the set of attributes for that relation
  (ex) the schema for relation Movies: Movies(title, year, length, genre)
- relational database schema (or simply, database schema)
  » a set of schemas for the relations of a database

Relation instance
- a set of tuples for a given relation

Equivalent representations of a relation
- the order of tuples in a relation is irrelevant: a relation is a set of tuples, not a list of tuples
- the column order is also irrelevant

Another presentation of the relation Movies:

  year   genre    title                 length
  1977   sciFi    Star Wars             124
  1992   comedy   Wayne’s World         95
  1939   drama    Gone With the Wind    231

Key of a relation (a fundamental constraint)
- an attribute (or a set of attributes) of a relation such that no two tuples are allowed to have the same values in all the attributes of the key
  (ex) Declare that title and year form a key in Movies. Then two tuples such as
       (King Kong, 1980, . . . )
       (King Kong, 1980, . . . )
       are not allowed in the same instance.
- used for unique identification of a tuple
- notation for the key attribute(s): use underlines
  » e.g., Movies(title, year, length, genre), with title and year underlined
- a key constraint
  » is a statement about all possible instances of the relation, not about a single instance
- there can be several keys in a relation
  (ex) In a relation Students, the social-security number, the student ID, etc. can each serve as a key

An example database schema

  Movies (title:string, year:integer, length:integer, genre:string, studioName:string, producerC#:integer)
  MovieStar (name:string, address:string, gender:char, birthdate:date)
  StarsIn (movieTitle:string, movieYear:integer, starName:string)
  MovieExec (name:string, address:string, cert#:integer, netWorth:integer)
  Studio (name:string, address:string, presC#:integer)

MovieExec holds all movie executives, including the producers in Movies and the presidents in Studio.

The same schema without types:

  Movies(title, year, length, genre, studioName, producerC#)
  MovieStar(name, address, gender, birthdate)
  StarsIn(movieTitle, movieYear, starName)
  MovieExec(name, address, cert#, netWorth)
  Studio(name, address, presC#)

The DBMS allows us to see the data in this way. We do not need to know how the data are physically organized, for example:
- order of attributes
- delimiters between values
- length of strings
- existence of indexes, etc.

Defining a Relation Schema in SQL

SQL
  » sometimes pronounced “sequel”
- the principal language used to describe and manipulate relational databases
- Data Definition Language (DDL)
  » for declaring database schemas
- Data Manipulation Language (DML)
  » for querying and modifying the database

Note: SQL standards
- Sequel
  » from the System R project (mid-1970s); later renamed SQL
- SQL-86, SQL-89
- SQL-92 (also called SQL2)
- SQL-99 (also called SQL3)
  » object-oriented features, and some others
- SQL:2003
  » XML-related features
- SQL:2006
  » XQuery

Relations in SQL
- stored relations (or tables)
  » relations that exist in the database
- views
  » relations defined by a computation
  » not stored, but constructed in whole or in part when needed
- temporary tables
  » constructed by the SQL language processor during execution
  » thrown away and not stored

Data types
- INT (or INTEGER), SHORTINT (standard SQL spells the latter SMALLINT)
- FLOAT (or REAL), DOUBLE PRECISION, DECIMAL
  » DECIMAL(n,d): n decimal digits, with the decimal point assumed to be d positions from the right
    e.g., DECIMAL(6, 2): 0123.45
  » NUMERIC: almost a synonym for DECIMAL
- CHAR(n), VARCHAR(n)
  » character strings of fixed or varying length
- BIT(n), BIT VARYING(n)
  » bit strings of fixed or varying length
- BOOLEAN
  » TRUE, FALSE, UNKNOWN
- DATE and TIME
  » character strings of a special form

Table declarations: CREATE TABLE table-name
- CREATE TABLE
  » followed by the relation name and a parenthesized, comma-separated list of the attribute names and their types

    CREATE TABLE MovieStar (
        name      CHAR(30),
        address   VARCHAR(255),
        gender    CHAR(1),
        birthdate DATE
    );

Table deletions: DROP TABLE table-name

    DROP TABLE MovieStar;

Modifying relation schemas: ALTER TABLE table-name
- ALTER TABLE
  » ADD followed by an attribute name and its data type
  » DROP followed by an attribute name

    ALTER TABLE MovieStar ADD phone CHAR(16);

  Existing tuples have no value for the new attribute. The NULL value is used when a specific value is not given.
  NULL: unknown value (or undefined value)

    ALTER TABLE MovieStar DROP birthdate;

Default values
- keyword DEFAULT followed by an appropriate value

    gender CHAR(1) DEFAULT '?',
    birthdate DATE DEFAULT DATE '0000-00-00'

    ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted';

Declaring keys
  » declared in the CREATE TABLE statement
- PRIMARY KEY
  » NULL is not allowed in the attributes of the key
- UNIQUE
  » NULL is permitted

(Ex) Declaring keys

A single-attribute key can be declared with the attribute itself:

    CREATE TABLE MovieStar (
        name      CHAR(30) PRIMARY KEY,
        address   VARCHAR(255),
        gender    CHAR(1),
        birthdate DATE
    );

or, equivalently, as a separate element of the declaration:

    CREATE TABLE MovieStar (
        name      CHAR(30),
        address   VARCHAR(255),
        gender    CHAR(1),
        birthdate DATE,
        PRIMARY KEY (name)
    );

A key of more than one attribute must be declared as a separate element:

    CREATE TABLE Movies (
        title      CHAR(100),
        year       INT,
        length     INT,
        genre      CHAR(10),
        studioName CHAR(30),
        producerC# INT,
        PRIMARY KEY (title, year)
    );

When no PRIMARY KEY is declared, the relation is a bag, i.e., it may contain duplicate tuples.
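As a minimal sketch of the key declaration above, the following Python script declares Movies with PRIMARY KEY (title, year) and tries to insert two tuples with the same key. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in DBMS, which the slides do not prescribe. The attribute producerC# is renamed producerC here to avoid relying on how a particular DBMS treats # in identifiers, and the row values are made up for illustration.

```python
import sqlite3


def build_db():
    """Create an in-memory database holding the Movies relation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Movies (
            title      CHAR(100),
            year       INT,
            length     INT,
            genre      CHAR(10),
            studioName CHAR(30),
            producerC  INT,          -- producerC# in the slides (renamed, see lead-in)
            PRIMARY KEY (title, year)
        )
        """
    )
    return conn


def try_insert(conn, row):
    """Insert one tuple; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Movies VALUES (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_db()
# Illustrative rows only; lengths, studios, and certificate numbers are invented.
results = [
    try_insert(conn, ("King Kong", 1980, 100, "drama", "StudioA", 1)),
    # Same title, different year: the key (title, year) differs, so it is accepted.
    try_insert(conn, ("King Kong", 2005, 120, "drama", "StudioB", 2)),
    # Same title and year as the first row: violates the key.
    try_insert(conn, ("King Kong", 1980, 100, "drama", "StudioA", 1)),
]
row_count = conn.execute("SELECT COUNT(*) FROM Movies").fetchone()[0]
print(results, row_count)
```

The third insert is rejected because the key constraint applies to every instance of the relation, not only to the tuples present when the table was created.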
An Algebraic Query Language

Relational algebra
- a formal query language
  » constructs new relations from given relations; simple but powerful
- not used directly in commercial DBMSs, but
  » SQL incorporates the relational algebra at its center
  » an SQL query is often translated into relational algebra

Advantages of relational algebra
  » over conventional programming languages like C or Java
- ease of programming, though less powerful than C or Java
- optimized by the compiler
  » e.g., the compiler can choose the best available sorting algorithm for the relation to be sorted

Algebra in general
  » consists of operators and operands
  (ex) (x + y) * z
  (ex) ((x + 7) / (y - 3)) + x
  » operands in the relational algebra: relations

Operations of the relational algebra
- usual set operations
  » union, intersection, difference
- operations that remove parts of a relation
  » selection, projection
- operations that combine the tuples of two relations
  » Cartesian product, join
- renaming operations
  » change the names of the attributes or the name of the relation itself

Set operations: ∪, ∩, -
  » R, S: relations
- union: R ∪ S
- intersection: R ∩ S
- difference: R - S
Condition
  » R and S must have schemas with identical sets of attributes
  » the order of attributes in R and S must be the same

Projection: π
- π_{A1,A2,...,An}(R)
  » produces a relation that has only the attributes A1, A2, ..., An of R

  Movies
  title           year   length   genre    studioName   producerC#
  Star Wars       1977   124      sciFi    Fox          12345
  Galaxy Quest    1999   104      comedy   DreamWorks   67890
  Wayne’s World   1992   95       comedy   Paramount    99999

  π_{title, year, length}(Movies)
  title           year   length
  Star Wars       1977   124
  Galaxy Quest    1999   104
  Wayne’s World   1992   95

  π_{genre}(Movies)
  genre
  sciFi
  comedy

Selection: σ
  » produces a relation with a subset of the tuples of the operand relation
- σ_C(R)
  » the set of tuples of R that satisfy a condition C
  » C: a conditional expression whose operands are either constants or attributes of R

  σ_{length > 100}(Movies)
  title           year   length   genre    studioName   producerC#
  Star Wars       1977   124      sciFi    Fox          12345
  Galaxy Quest    1999   104      comedy   DreamWorks   67890

Cartesian Product: ×
- the set of pairs of tuples from R and S
  » first element of the pair: any tuple of R
  » second element of the pair: any tuple of S

  R             S
  A   B         B   C    D
  1   2         2   5    6
  3   4         4   7    8
                9   10   11

  R × S
  A   R.B   S.B   C    D
  1   2     2     5    6
  1   2     4     7    8
  1   2     9     10   11
  3   4     2     5    6
  3   4     4     7    8
  3   4     9     10   11

Natural Joins: ⋈
- the set of pairs of tuples from R and S
  » that agree on the common attributes of R and S
- duplicate columns are removed

(Ex) Natural join with common attribute B

  R             S
  A   B         B   C    D
  1   2         2   5    6
  3   4         4   7    8
                9   10   11    (dangling tuple)

  R ⋈ S
  A   B   C   D
  1   2   5   6
  3   4   7   8

  Dangling tuple: a tuple that fails to be joined. One of each pair of duplicated columns is removed.

(Ex) Natural join when there is more than one common attribute

  U                 V
  A   B   C         B   C   D
  1   2   3         2   3   4
  6   7   8         2   3   5
  9   7   8         7   8   10

  U ⋈ V
  A   B   C   D
  1   2   3   4
  1   2   3   5
  6   7   8   10
  9   7   8   10

Note: Natural join
Definition of natural joins
  R ⋈ S = π_L(σ_C(R × S)), where
  » L: the union of all the attributes in R and S
  » C: R.A1 = S.A1 AND R.A2 = S.A2 AND . . . AND R.An = S.An
    {A1, A2, . . . , An}: the set of common attributes of R and S
- If R and S have no common attributes,
  » R ⋈ S = R × S, because there is no selection condition
- σ, π: produce a part of a single relation
- ⋈: produces a subset of a Cartesian product of two relations

Theta-Joins: R ⋈_C S = σ_C(R × S)
- pair tuples from two relations on some condition
  1. take the product of R and S
  2. select from the product only those tuples that satisfy the condition C

  U                 V
  A   B   C         B   C   D
  1   2   3         2   3   4
  6   7   8         2   3   5
  9   7   8         7   8   10

  U ⋈_{A<D} V
  A   U.B   U.C   V.B   V.C   D
  1   2     3     2     3     4
  1   2     3     2     3     5
  1   2     3     7     8     10
  6   7     8     7     8     10
  9   7     8     7     8     10

  Duplicated columns are not eliminated.

Combining operations to form queries
  » construct complex expressions by applying operations to relations or to the results of other expressions
(Ex) Find the titles and years of movies made by the “Fox” studio that are at least 100 minutes long.

  π_{title, year}(σ_{length ≥ 100}(Movies) ∩ σ_{studioName = 'Fox'}(Movies))

Expression tree for a relational algebra expression
- leaf node: a relation
- nonleaf node: an operator
- evaluated bottom-up by applying the operator (at a nonleaf node) to its children

                   π_{title, year}
                          |
                          ∩
                  /               \
  σ_{length ≥ 100}                 σ_{studioName = 'Fox'}
          |                                 |
        Movies                            Movies

Note: Equivalent expression
Equivalent expressions
  » expressions that produce the same answer whenever they are given the same relations as operands
  (ex) π_{title, year}(σ_{length ≥ 100}(Movies) ∩ σ_{studioName = 'Fox'}(Movies))
       π_{title, year}(σ_{length ≥ 100 AND studioName = 'Fox'}(Movies))
Query optimizer
  » replaces one expression by an equivalent expression that is more efficiently evaluated

Renaming: ρ_{S(A1,A2,...,An)}(R)
  » only changes names
- the resulting relation has exactly the same tuples as R
- the resulting relation has name S and attributes A1, A2, ..., An

  R             S
  A   B         B   C    D
  1   2         2   5    6
  3   4         4   7    8
                9   10   11

  R × ρ_{S(X,C,D)}(S)
  A   B   X   C    D
  1   2   2   5    6
  1   2   4   7    8
  1   2   9   10   11
  3   4   2   5    6
  3   4   4   7    8
  3   4   9   10   11

Relationships among operations
- dependent operators
  » R ∩ S = R - (R - S)
  » R ⋈_C S = σ_C(R × S)
  » R ⋈ S = π_L(σ_C(R × S))
  [Figure: Venn diagram of R and S showing the regions R - S and R ∩ S]
- independent operators (or fundamental operators)
  » selection, projection, union, difference, Cartesian product, (renaming)
  » cannot be written in terms of the others

Linear notation for algebraic expressions
- use temporary relations together with a sequence of assignments
  (ex) π_{title, year}(σ_{length ≥ 100}(Movies) ∩ σ_{studioName = 'Fox'}(Movies))
  temporary relations: R, S, T, Answer

    R(t, y, l, g, s, p) := σ_{length ≥ 100}(Movies)
    S(t, y, l, g, s, p) := σ_{studioName = 'Fox'}(Movies)
    T(t, y, l, g, s, p) := R ∩ S
    Answer(title, year) := π_{t, y}(T)

  The last two assignments can be combined: Answer(title, year) := π_{t, y}(R ∩ S)
- three notations for the same query: a relational algebra expression, an expression tree, and a sequence of assignments to temporary relations

Constraints on Relations

Constraint: a restriction on the data, e.g., the possible values of attribute “gender”

Relational algebra as a constraint language
  » relational algebra can be used to express constraints, e.g., a key constraint
- two ways to express constraints (R, S: expressions of relational algebra)
  » R = ∅: “There are no tuples in the result of R”
  » R ⊆ S: “Every tuple in R must also be in S”
- these two ways are actually equivalent
  » R ⊆ S can be written R - S = ∅
  » R = ∅ can be written R ⊆ ∅

Referential integrity constraints
  » if a value v appears in attribute A of relation R, then v must appear in a particular attribute (say B) of relation S
- referential integrity constraint in relational algebra
  » π_A(R) ⊆ π_B(S), or
  » π_A(R) - π_B(S) = ∅
  [Figure: example Students and Departments tables. We expect that every department referenced in Students is in the Departments table.]

(Ex) Consider the following two relations:
  Movies (title, year, length, genre, studioName, producerC#)
  MovieExec (name, address, certificate#, netWorth)
- The producer of every movie has to appear in MovieExec:
  π_{producerC#}(Movies) ⊆ π_{certificate#}(MovieExec), or
  π_{producerC#}(Movies) - π_{certificate#}(MovieExec) = ∅

(Ex) A referential integrity constraint where the value involved is represented by more than one attribute.
  StarsIn(movieTitle, movieYear, starName)
  Movies(title, year, length, genre, studioName, producerC#)
- Any movie mentioned in StarsIn also appears in Movies.
  π_{movieTitle, movieYear}(StarsIn) ⊆ π_{title, year}(Movies)

Key constraints
  MovieStar(name, address, gender, birthdate)
- the name attribute is a key
  » no two tuples agree on the name component
  » if two tuples agree on name, then they must also agree on address
  » more precisely, these two tuples must be the same tuple and agree on all attributes
- let
    MS1 = ρ_{MS1(name, address, gender, birthdate)}(MovieStar)
    MS2 = ρ_{MS2(name, address, gender, birthdate)}(MovieStar)
- the constraint
    σ_{MS1.name = MS2.name AND MS1.address ≠ MS2.address}(MS1 × MS2) = ∅
  is not exactly a key constraint, but a functional dependency (name determines address)
- a correct key constraint uses the selection condition
    MS1.name = MS2.name AND (MS1.address ≠ MS2.address OR MS1.gender ≠ MS2.gender OR MS1.birthdate ≠ MS2.birthdate)

Additional constraints
(Ex) Values of the gender attribute of MovieStar must be 'F' or 'M' (a domain constraint)
  σ_{gender ≠ 'F' AND gender ≠ 'M'}(MovieStar) = ∅

(Ex) One must have a net worth of at least $10,000,000 to be the president of a movie studio.
  MovieExec(name, address, cert#, netWorth)
  Studio(name, address, presC#)
  We have assumed a referential integrity constraint from Studio.presC# to MovieExec.cert#.
  σ_{netWorth < 10000000}(Studio ⋈_{presC# = cert#} MovieExec) = ∅, or
  π_{presC#}(Studio) ⊆ π_{cert#}(σ_{netWorth ≥ 10000000}(MovieExec))
  This is neither a domain constraint nor a referential integrity constraint.
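The constraints above can be checked mechanically. The sketch below is only an illustration in plain Python, not part of the slides: relations are modeled as lists of dictionaries, select and project are hypothetical helper names standing in for σ and π, the attribute names drop the # suffix, and the executive names, net worths, and studio presidents are invented data. It evaluates the left-hand sides of two "= ∅" constraints and reports the offending values.

```python
def select(relation, predicate):
    """Selection (sigma): keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attrs):
    """Projection (pi): returns a set, so duplicate tuples are eliminated."""
    return {tuple(t[a] for a in attrs) for t in relation}


# Hypothetical instances; only the movie titles and certificate numbers echo the slides.
movie_exec = [
    {"name": "Exec A", "cert": 12345, "netWorth": 50_000_000},
    {"name": "Exec B", "cert": 67890, "netWorth": 5_000_000},
]
movies = [
    {"title": "Star Wars", "year": 1977, "producerC": 12345},
    {"title": "Galaxy Quest", "year": 1999, "producerC": 67890},
    {"title": "Wayne's World", "year": 1992, "producerC": 99999},
]
studio = [
    {"name": "Fox", "presC": 12345},
    {"name": "DreamWorks", "presC": 67890},
]

# Referential integrity: pi_producerC(Movies) - pi_cert(MovieExec) must be empty.
dangling_producers = project(movies, ["producerC"]) - project(movie_exec, ["cert"])

# Net worth: pi_presC(Studio) must be a subset of
# pi_cert(sigma_{netWorth >= 10000000}(MovieExec)).
rich_execs = project(
    select(movie_exec, lambda t: t["netWorth"] >= 10_000_000), ["cert"]
)
poor_presidents = project(studio, ["presC"]) - rich_execs

print(dangling_producers)  # {(99999,)}: this producer is missing from MovieExec
print(poor_presidents)     # {(67890,)}: this president's net worth is too low
```

Both checks use the "R - S = ∅" form, so a non-empty result lists exactly the values that violate the constraint.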