Chap 2. The Relational Model of Data
Contents

An Overview of Data Models

Basics of the Relational Model

Defining a Relation Schema in SQL
» after Chapter 6 (SQL)

An Algebraic Query Language

Constraints on Relations
An Overview of Data Models

Data model
» an abstract description of data
 when the focus is on structure: an abstract description of the logical structure of data
 the description generally consists of structure and operations, with certain constraints
– structure of the data
» high-level description of the structure of the data
 sometimes referred to as a conceptual (data) model
 higher level than data structures in C or Java, such as arrays and structures
– operations on the data
» usually a limited set of high-level operations in DB data model
 queries
 operations that retrieve information
 modifications
 operations that change the database
– constraints on the data
» a way to describe limitations on what the data can be
(ex) “a movie has at most one title”
“a day of the week is an integer between 1 and 7”

Various data models
– relational model
» widely used in all commercial database management systems
– semistructured-data model
» includes XML and related standards
– other data models
» object-oriented model
 may be used for some special purpose applications
» object-relational model
 O-O features are added to the relational model
» hierarchical model, network model: used in earlier DBMS

Comparison of modeling approaches
– semistructured models have more flexibility than relations
– the relational model is still preferred in DBMS’s
 efficiency of operations on large databases
 ease of use – the productivity of programmers
» provides a simple, limited approach to structuring data,
 yet is reasonably versatile, so anything can be modeled
» provides a limited collection of operations on data
 yet is very useful
Basics of the Relational Model

Relation
– a two-dimensional table
» set of tuples whose components have atomic values
The relation (or table) Movies

title               year  length  genre
Gone With the Wind  1939  231     drama
Star Wars           1977  124     sciFi
Wayne’s World       1992  95      comedy

» the column headers (title, year, length, genre) are the attributes
» each row is a tuple and represents a movie
» each column represents a property of movies

Attributes
– names for the columns of the relation
(ex) title, year, length, genre in relation Movies

Tuples
– rows of a relation
(ex) (Star Wars, 1977, 124, sciFi)

» the relational model requires that each attribute be atomic, i.e., record structures, sets, lists, etc. are not allowed

Domains
– an elementary type associated with each attribute of a relation
(ex) The value of the attribute title must be a string whose length is less than or equal to 30

Schema
» description of data itself
– relation schema
» name of a relation and the set of attributes for a relation
(ex) the schema for relation Movies
Movies (title, year, length, genre)
– relational database schema (or simply, database schema)
 a set of schemas for the relations of a database

Relation instance
– a set of tuples for a given relation

Equivalent representations of a relation
– the order of tuples in a relation is irrelevant
 a relation is a set of tuples, not a list of tuples
– the column order is also irrelevant
year  genre   title               length
1977  sciFi   Star Wars           124
1992  comedy  Wayne’s World       95
1939  drama   Gone With the Wind  231

Another presentation of the relation Movies
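The point that neither tuple order nor column order matters can be checked with a small sketch. The representation below (a relation as a set of rows, each row a set of attribute/value pairs) and the helper name `relation` are assumptions made for illustration, not something the slides prescribe.

```python
def relation(attributes, rows):
    """Build a relation as a set of tuples.

    Each tuple is stored as a frozenset of (attribute, value) pairs,
    so neither the row order nor the column order is recorded.
    """
    return frozenset(frozenset(zip(attributes, row)) for row in rows)


# The relation Movies as first presented.
movies_a = relation(
    ("title", "year", "length", "genre"),
    [
        ("Gone With the Wind", 1939, 231, "drama"),
        ("Star Wars", 1977, 124, "sciFi"),
        ("Wayne's World", 1992, 95, "comedy"),
    ],
)

# The same relation with rows and columns permuted.
movies_b = relation(
    ("year", "genre", "title", "length"),
    [
        (1977, "sciFi", "Star Wars", 124),
        (1992, "comedy", "Wayne's World", 95),
        (1939, "drama", "Gone With the Wind", 231),
    ],
)

print(movies_a == movies_b)  # True: both presentations denote the same relation
```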

Key of a relation
 a fundamental constraint
– an attribute (or a set of attributes) in a relation,
» where no two tuples are allowed to have the same values in all the attributes of the key
(ex) Declare that title and year form a key in Movies
» two different tuples of the form (King Kong, 1980, . . . ) are then not allowed
– used for unique identification of a tuple
– Notation for the key attribute(s)
 underline the key attributes
» e.g., Movies(title, year, length, genre), with title and year underlined
– Key constraint
» is about all possible instances of the relation
 not about a single instance
– There can be several keys in a relation
(ex) Suppose a relation Students
 the social-security number, student ID, etc. can serve as a key
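As a rough illustration of the key definition, the sketch below tests whether one relation instance satisfies a proposed key. The function name `satisfies_key` and the two King Kong rows are made up for this example. Because a key constraint concerns all possible instances, passing this check for a single instance does not by itself prove that the attributes form a key.

```python
def satisfies_key(attributes, rows, key):
    """Return True if no two tuples in this instance agree on all key attributes."""
    positions = [attributes.index(a) for a in key]
    seen = set()
    for row in rows:
        key_value = tuple(row[p] for p in positions)
        if key_value in seen:
            return False  # two tuples share the same key value
        seen.add(key_value)
    return True


ATTRS = ("title", "year", "length", "genre")

# Illustrative instance; the two "King Kong" rows are hypothetical data.
movies = {
    ("Star Wars", 1977, 124, "sciFi"),
    ("King Kong", 1980, 100, "adventure"),
    ("King Kong", 2005, 187, "adventure"),
}

title_only = satisfies_key(ATTRS, movies, ["title"])
title_year = satisfies_key(ATTRS, movies, ["title", "year"])
print(title_only, title_year)
```

In this instance title alone fails because two tuples share the title, while the pair (title, year) distinguishes every tuple.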

An example database schema
Movies (title:string, year:integer, length:integer, genre:string, studioName:string, producerC#:integer)
MovieStar (name:string, address:string, gender:char, birthdate:date)
StarsIn (movieTitle:string, movieYear:integer, starName:string)
MovieExec (name:string, address:string, cert#:integer, netWorth:integer)
» all movie executives, including producers in Movies and presidents in Studio
Studio (name:string, address:string, presC#:integer)

Movies(title, year, length, genre, studioName, producerC#)
MovieStar(name, address, gender, birthdate)
StarsIn(movieTitle, movieYear, starName)
MovieExec(name, address, cert#, netWorth)
Studio(name, address, presC#)
» the DBMS allows us to see the data in this way; we do not need to know how the data are physically organized:
- order of attributes
- delimiters between values
- length of strings
- existence of indexes, etc
Defining a Relation Schema in SQL

SQL
» sometimes pronounced “sequel”
– the principal language used to describe and manipulate
relational databases
– Data Definition Language (DDL)
» for declaring database schemas
– Data Manipulation Language (DML)
» for querying and modifying the database
Note: SQL standards

SQL standards
– Sequel
» System R project (mid-1970s); later renamed SQL
– SQL-86, SQL-89
– SQL-92 (also called SQL2)
– SQL-99 (also called SQL3)
» object-oriented features, and some others
– SQL:2003
» XML-related features
– SQL:2006
» XQuery

Relations in SQL
– stored relations (or tables)
» relations that exist in the database
– views
» relations defined by a computation
» not stored, but constructed in whole or in part, when needed
– temporary tables
» constructed by the SQL language processor during execution
» thrown away and not stored

Data types
– INT (or INTEGER), SHORTINT (SMALLINT in standard SQL)
– FLOAT (or REAL), DOUBLE PRECISION, DECIMAL
» DECIMAL(n,d)
 n decimal digits with the decimal point assumed to be d positions
from the right
 e.g., DECIMAL(6, 2): 0123.45
» NUMERIC: almost a synonym for DECIMAL
– CHAR(n), VARCHAR(n)
» character strings of fixed or varying length
– BIT(n), BIT VARYING(n)
» bit strings of fixed or varying length
– BOOLEAN
» TRUE, FALSE, UNKNOWN
– DATE and TIME
» character strings of a special form

Table declarations: CREATE TABLE table-name
– CREATE TABLE
» relation name and a parenthesized, comma-separated list of the attribute
names and their types
CREATE TABLE MovieStar (
name CHAR(30),
address VARCHAR(255),
gender CHAR(1),
birthdate DATE );
Table deletions: DROP TABLE table-name
DROP TABLE MovieStar;

Modifying relation schemas: ALTER TABLE table-name
– ALTER TABLE
» ADD followed by an attribute name and its data type
» DROP followed by an attribute name
ALTER TABLE MovieStar ADD phone CHAR(16);
» existing tuples do not have values for the new attribute
» NULL (an unknown or undefined value) is used when a specific value is not given
ALTER TABLE MovieStar DROP birthdate;

Default values
– keyword DEFAULT and appropriate value
gender CHAR(1) DEFAULT '?',
birthdate DATE DEFAULT DATE '0000-00-00'
ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted';
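The effect of adding an attribute with and without a DEFAULT can be tried out with a minimal sketch. It assumes SQLite through Python's standard sqlite3 module, whose dialect may differ from other DBMSs; the helper `fresh_movie_star` and the sample row are hypothetical.

```python
import sqlite3


def fresh_movie_star():
    """Create an in-memory MovieStar table holding one hypothetical tuple."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MovieStar ("
        " name CHAR(30), address VARCHAR(255), gender CHAR(1), birthdate DATE)"
    )
    conn.execute(
        "INSERT INTO MovieStar VALUES ('Some Star', '1 Example St.', 'F', '1970-01-01')"
    )
    return conn


# Without DEFAULT, the existing tuple gets NULL (None in Python) for phone.
db = fresh_movie_star()
db.execute("ALTER TABLE MovieStar ADD COLUMN phone CHAR(16)")
phone_without_default = db.execute("SELECT phone FROM MovieStar").fetchone()[0]

# With DEFAULT, the existing tuple gets the default value instead.
db = fresh_movie_star()
db.execute("ALTER TABLE MovieStar ADD COLUMN phone CHAR(16) DEFAULT 'unlisted'")
phone_with_default = db.execute("SELECT phone FROM MovieStar").fetchone()[0]

print(phone_without_default, phone_with_default)
```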

Declaring keys
» declare in the CREATE TABLE statement
– PRIMARY KEY
» NULL is not allowed in the attributes of a key
– UNIQUE
» NULL is permitted
(Ex) Declaring keys
CREATE TABLE MovieStar (
    name CHAR(30) PRIMARY KEY,
    address VARCHAR(255),
    gender CHAR(1),
    birthdate DATE );
» equivalently, the key can be declared as a separate element:
CREATE TABLE MovieStar (
    name CHAR(30),
    address VARCHAR(255),
    gender CHAR(1),
    birthdate DATE,
    PRIMARY KEY(name) );
» a key consisting of two attributes:
CREATE TABLE Movies (
    title CHAR(100),
    year INT,
    length INT,
    genre CHAR(10),
    studioName CHAR(30),
    producerC# INT,
    PRIMARY KEY(title, year) );
» when no PRIMARY KEY is declared, the relation is a bag (duplicate tuples are allowed)
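To see a declared key being enforced, here is a small sketch that again assumes SQLite via Python's sqlite3 module. The attribute producerC# is written in double quotes because SQLite does not accept # in an unquoted identifier; the table NoKey and the extra rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Movies (
        title CHAR(100),
        year INT,
        length INT,
        genre CHAR(10),
        studioName CHAR(30),
        "producerC#" INT,
        PRIMARY KEY (title, year)
    )
    """
)
conn.execute("INSERT INTO Movies VALUES ('Star Wars', 1977, 124, 'sciFi', 'Fox', 12345)")

# A second tuple with the same (title, year) violates the key and is rejected.
try:
    conn.execute("INSERT INTO Movies VALUES ('Star Wars', 1977, 999, 'other', 'Fox', 12345)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Without any key declaration the table behaves as a bag: duplicates are kept.
conn.execute("CREATE TABLE NoKey (title CHAR(100), year INT)")
conn.execute("INSERT INTO NoKey VALUES ('Star Wars', 1977)")
conn.execute("INSERT INTO NoKey VALUES ('Star Wars', 1977)")
bag_count = conn.execute("SELECT COUNT(*) FROM NoKey").fetchone()[0]

print(duplicate_rejected, bag_count)
```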
An Algebraic Query Language

Relational algebra
– a formal query language
» construct new relations from given relations
 simple but powerful
– not used directly in commercial DBMS, but
» SQL incorporates the relational algebra at its center
» SQL query is often translated into relational algebra

Advantages of relational algebra
» over conventional programming languages like C or Java
– ease of programming
 though less powerful than C or Java
– optimized by the compiler
» e.g., compiler can choose the best available sorting algorithm for the
relation to be sorted
Algebra in general
» consists of operators and operands
 e.g., (x + y) * z and ((x + 7) / (y – 3)) + x in arithmetic
» operands in the relational algebra: relations

Operations of the relational algebra
– usual set operations
» union, intersection, difference
– operations that remove parts of a relation
» selection, projection
– operations that combine the tuples of two relations
» Cartesian product, join
– renaming operations
» change the names of the attributes or the name of the relation itself

Set operations: ∪, ∩, –
» R, S: relations
– union: R ∪ S
– intersection: R ∩ S
– difference: R – S
Condition
» R and S must have schemas with identical sets of attributes
» the order of attributes in R and S must be the same
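When relations are modeled as Python sets of tuples, the three set operations map directly onto the built-in set operators. The relations R and S below are illustrative and assumed to share the schema (title, year) with attributes in the same order.

```python
# R and S have the same schema (title, year); the rows are illustrative.
R = {("Star Wars", 1977), ("Galaxy Quest", 1999)}
S = {("Galaxy Quest", 1999), ("Wayne's World", 1992)}

union = R | S          # tuples in R or S (or both)
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R but not in S

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```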

Projection: π
– π A1,A2,...,An(R)
» produce a relation that has only A1,A2,...,An attributes of R
Movies
title          year  length  genre   studioName  producerC#
Star Wars      1977  124     sciFi   Fox         12345
Galaxy Quest   1999  104     comedy  DreamWorks  67890
Wayne’s World  1992  95      comedy  Paramount   99999

π title, year, length (Movies)
title          year  length
Star Wars      1977  124
Galaxy Quest   1999  104
Wayne’s World  1992  95

π genre (Movies)
genre
sciFi
comedy

Selection: σ
» produces a relation with a subset of tuples of the operand relation
– σC(R)
» a set of tuples that satisfy a condition C
 C: conditional expression
» operands in C are either constants or attributes of R

σ length>100 (Movies)
title         year  length  genre   studioName  producerC#
Star Wars     1977  124     sciFi   Fox         12345
Galaxy Quest  1999  104     comedy  DreamWorks  67890
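Projection and selection can both be sketched in a few lines over a set-of-tuples representation. The helper names `project` and `select` and the use of a Python predicate for the condition C are assumptions of this sketch. Note that using a set for the result is what removes the duplicate genre value.

```python
ATTRS = ("title", "year", "length", "genre", "studioName", "producerC#")
MOVIES = {
    ("Star Wars", 1977, 124, "sciFi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", 99999),
}


def project(attrs, rows, keep):
    """pi_keep(rows): keep only the listed attributes; the set drops duplicates."""
    positions = [attrs.index(a) for a in keep]
    return {tuple(row[p] for p in positions) for row in rows}


def select(attrs, rows, condition):
    """sigma_condition(rows): keep the tuples for which the condition holds."""
    return {row for row in rows if condition(dict(zip(attrs, row)))}


genres = project(ATTRS, MOVIES, ["genre"])
long_movies = select(ATTRS, MOVIES, lambda t: t["length"] > 100)

print(sorted(genres))
print(sorted(project(ATTRS, long_movies, ["title"])))
```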

Cartesian Product: ×
– set of pairs of tuples from R and S
» first element of the pair: any tuple of R
» second element of the pair: any tuple of S
R
A  B
1  2
3  4

S
B  C   D
2  5   6
4  7   8
9  10  11

R × S
A  R.B  S.B  C   D
1  2    2    5   6
1  2    4    7   8
1  2    9    10  11
3  4    2    5   6
3  4    4    7   8
3  4    9    10  11

Natural Joins: ⋈
– set of pairs of tuples from R and S
» that agree in common attributes of R and S
» duplicate columns are removed
(Ex) Natural Join (common attribute: B)

R
A  B
1  2
3  4

S
B  C   D
2  5   6
4  7   8
9  10  11   (dangling tuple)

R ⋈ S
A  B  C  D
1  2  5  6
3  4  7  8

» one of each pair of duplicated columns is removed
» dangling tuple: a tuple that fails to be joined
(Ex) Natural Join: when there is more than one common attribute

U
A  B  C
1  2  3
6  7  8
9  7  8

V
B  C  D
2  3  4
2  3  5
7  8  10

U ⋈ V
A  B  C  D
1  2  3  4
1  2  3  5
6  7  8  10
9  7  8  10

» one of each pair of duplicated columns is removed
Note: Natural join

Definition of Natural Joins
R ⋈ S = πL [σC (R × S)], where
» L : union of all the attributes in R and S
» C : R.A1 = S.A1 AND R.A2 = S.A2 AND . . . AND R.An = S.An
 {A1, A2, . . . , An}: set of common attributes of R and S
– If R and S have no common attributes,
» R ⋈ S = R × S, because there is no selection condition
 σ, π : produce a subset of a single relation
 ⋈ : produce a subset of a Cartesian product of two relations
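The definition of the natural join as a projection of a selection over a Cartesian product can be followed step by step in a short sketch. The function name `natural_join` and the set-of-tuples representation are assumptions; the relations R, S, U, and V are the ones used in the examples.

```python
from itertools import product


def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """R natural-join S, computed as pi_L(sigma_C(R x S))."""
    common = [a for a in r_attrs if a in s_attrs]
    # L: all attributes of R, then the attributes of S that R does not have.
    out_attrs = list(r_attrs) + [a for a in s_attrs if a not in common]
    out_rows = set()
    for r, s in product(r_rows, s_rows):                 # R x S
        r_map, s_map = dict(zip(r_attrs, r)), dict(zip(s_attrs, s))
        if all(r_map[a] == s_map[a] for a in common):    # sigma_C
            merged = {**s_map, **r_map}
            out_rows.add(tuple(merged[a] for a in out_attrs))  # pi_L
    return out_attrs, out_rows


R = {(1, 2), (3, 4)}
S = {(2, 5, 6), (4, 7, 8), (9, 10, 11)}
rs_attrs, rs_rows = natural_join(("A", "B"), R, ("B", "C", "D"), S)

U = {(1, 2, 3), (6, 7, 8), (9, 7, 8)}
V = {(2, 3, 4), (2, 3, 5), (7, 8, 10)}
uv_attrs, uv_rows = natural_join(("A", "B", "C"), U, ("B", "C", "D"), V)

print(rs_attrs, sorted(rs_rows))
print(uv_attrs, sorted(uv_rows))
```

The dangling tuple (9, 10, 11) of S never satisfies the condition, so it contributes nothing to the result. When there are no common attributes, the condition is vacuously true and the function returns the full product.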

Theta-Joins: R ⋈C S = σC (R × S)
– pair tuples from two relations on some condition
1. take the product of R and S
2. select from the product only those tuples that satisfy the condition C

U
A  B  C
1  2  3
6  7  8
9  7  8

V
B  C  D
2  3  4
2  3  5
7  8  10

U ⋈ A<D V
A  U.B  U.C  V.B  V.C  D
1  2    3    2    3    4
1  2    3    2    3    5
1  2    3    7    8    10
6  7    8    7    8    10
9  7    8    7    8    10

» duplicated columns are not eliminated
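The two steps of the theta-join (take the product, then select) can be sketched as follows. The function name `theta_join` and the convention of qualifying shared attribute names with the relation name are assumptions of this sketch, chosen to mirror the column headers U.B, U.C, V.B, V.C in the example.

```python
from itertools import product


def theta_join(r_name, r_attrs, r_rows, s_name, s_attrs, s_rows, condition):
    """R theta-join S = sigma_C(R x S); shared attribute names are qualified."""
    shared = set(r_attrs) & set(s_attrs)

    def qualify(name, attrs):
        return [f"{name}.{a}" if a in shared else a for a in attrs]

    out_attrs = qualify(r_name, r_attrs) + qualify(s_name, s_attrs)
    out_rows = {
        r + s                                   # step 1: a tuple of the product
        for r, s in product(r_rows, s_rows)
        if condition(dict(zip(out_attrs, r + s)))  # step 2: keep it if C holds
    }
    return out_attrs, out_rows


U = {(1, 2, 3), (6, 7, 8), (9, 7, 8)}
V = {(2, 3, 4), (2, 3, 5), (7, 8, 10)}

attrs, rows = theta_join(
    "U", ("A", "B", "C"), U, "V", ("B", "C", "D"), V, lambda t: t["A"] < t["D"]
)
print(attrs)
print(sorted(rows))
```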

Combining operations to form queries
» construct complex expressions by applying operations to the results of other expressions (which are themselves relations)
(Ex) Find the titles and years of movies made by “Fox” studio that are at least 100 minutes long.
π title, year (σ length ≥ 100 (Movies) ∩ σ studioName = ‘Fox’ (Movies))

Expression tree for a relational algebra expression
 leaf node: a relation
 nonleaf node: an operator
» evaluated bottom-up by applying the operator (at a nonleaf node) to its children

              π title, year
                    |
                    ∩
             /             \
σ length ≥ 100         σ studioName = ‘Fox’
       |                       |
    Movies                  Movies
Note: Equivalent expressions
» expressions that produce the same answer whenever they are given the same relations as operands
(ex) π title, year (σ length ≥ 100 (Movies) ∩ σ studioName = ‘Fox’ (Movies))
     π title, year (σ length ≥ 100 AND studioName = ‘Fox’ (Movies))

Query optimizer
» replaces one expression by an equivalent expression that is more efficiently evaluated

Expression tree for the second expression:

π title, year
      |
σ length ≥ 100 AND studioName = ‘Fox’
      |
   Movies
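The equivalence of the two expressions can be spot-checked on one instance of Movies. This is only a sketch: agreeing on a single instance illustrates the equivalence but does not prove it. The helper names and the three sample tuples are assumptions carried over from the projection example.

```python
ATTRS = ("title", "year", "length", "genre", "studioName", "producerC#")
MOVIES = {
    ("Star Wars", 1977, 124, "sciFi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", 99999),
}


def select(rows, condition):
    """sigma: keep the tuples satisfying the condition."""
    return {row for row in rows if condition(dict(zip(ATTRS, row)))}


def project(rows, keep):
    """pi: keep only the listed attributes."""
    positions = [ATTRS.index(a) for a in keep]
    return {tuple(row[p] for p in positions) for row in rows}


def long_enough(t):
    return t["length"] >= 100


def made_by_fox(t):
    return t["studioName"] == "Fox"


# First form: intersect two selections, then project.
answer_1 = project(select(MOVIES, long_enough) & select(MOVIES, made_by_fox),
                   ["title", "year"])

# Second form: one selection with an AND condition, then project.
answer_2 = project(select(MOVIES, lambda t: long_enough(t) and made_by_fox(t)),
                   ["title", "year"])

print(answer_1 == answer_2, sorted(answer_1))
```

The second form scans Movies once instead of twice, which is the kind of saving a query optimizer looks for.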

Renaming: ρ S(A1,A2,...,An) (R)
» only the names change; the resulting relation has exactly the same tuples as R
– resulting relation has name S and attributes A1, A2, ..., An

R
A  B
1  2
3  4

S
B  C   D
2  5   6
4  7   8
9  10  11

R × ρ S(X,C,D) (S)
A  B  X  C   D
1  2  2  5   6
1  2  4  7   8
1  2  9  10  11
3  4  2  5   6
3  4  4  7   8
3  4  9  10  11

Relationships among operations
– dependent operators
» R ∩ S = R – (R – S)
» R ⋈C S = σC (R × S)
» R ⋈ S = πL (σC (R × S))
– independent operators (or fundamental operators)
» selection, projection, union, difference, Cartesian product, (renaming)
 cannot be written in terms of others
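The identity R ∩ S = R – (R – S) can be checked exhaustively over a small universe of tuples. The sketch below is an illustration rather than a proof; the three-tuple universe and the helper `all_subsets` are arbitrary choices made for this example.

```python
from itertools import chain, combinations


def all_subsets(universe):
    """Every subset of the universe, i.e. every relation instance over it."""
    items = list(universe)
    return [
        set(combo)
        for combo in chain.from_iterable(
            combinations(items, k) for k in range(len(items) + 1)
        )
    ]


universe = {(1, 2), (3, 4), (5, 6)}
instances = all_subsets(universe)

# Compare intersection with the difference-only formulation for every pair.
identity_holds = all(r & s == r - (r - s) for r in instances for s in instances)
print(len(instances), identity_holds)
```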

Linear notation for algebraic expressions
– use temporary relations together with a sequence of assignments
(ex) π title, year (σ length ≥ 100 (Movies) ∩ σ studioName = ‘Fox’ (Movies))
R (t, y, l, g, s, p) := σ length ≥ 100 (Movies)
S (t, y, l, g, s, p) := σ studioName = ‘Fox’ (Movies)
T (t, y, l, g, s, p) := R ∩ S
Answer (title, year) := π t, y (T)
» temporary relations: R, S, T, Answer
» the last two steps can be combined: Answer(title, year) := π t, y (R ∩ S)
 relational algebra expression
 expression tree
 sequence of assignments to temporary relations
Constraints on Relations
» a constraint is a restriction on the data, e.g., on the possible values of attribute “gender”

Relational algebra as a constraint language
» relational algebra can be used to express constraints
 e.g., key constraint
– two ways to express constraints
 R, S: expressions of relational algebra
» R = ∅ : “There are no tuples in the result of R”
» R ⊆ S : “Every tuple in R must also be in S”
– these two ways are actually equivalent
» R ⊆ S can be written R – S = ∅
» R = ∅ can be written R ⊆ ∅

Referential integrity constraints
» if a value v appears in attribute A of relation R, then v must appear in a
particular attribute (say B) in relation S
– referential integrity constraint in relational algebra
» πA(R) ⊆ πB(S), or
» πA(R) – πB(S) = ∅
(ex) [Figure: rows of a Students table (Smith, Jones, Stuart) refer to departments such as CS and BioChem; a department with no matching row in the Departments table is marked ???]
» we expect that every department is in the Departments table
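A referential integrity constraint written as πA(R) – πB(S) = ∅ can be checked by computing the left-hand side and testing for emptiness. In the sketch below, the schemas Students(name, dept) and Departments(name), the rows, and the helper names are hypothetical, loosely modeled on the Students/Departments illustration.

```python
def project(attrs, rows, keep):
    """pi: keep only the listed attributes."""
    positions = [attrs.index(a) for a in keep]
    return {tuple(row[p] for p in positions) for row in rows}


def dangling_references(r_attrs, r_rows, a, s_attrs, s_rows, b):
    """pi_A(R) - pi_B(S): values of R.A with no match in S.B."""
    return project(r_attrs, r_rows, [a]) - project(s_attrs, s_rows, [b])


# Hypothetical schemas and data.
STUDENT_ATTRS = ("name", "dept")
STUDENTS = {("Smith", "CS"), ("Jones", "CS"), ("Stuart", "BioChem")}
DEPT_ATTRS = ("name",)
DEPARTMENTS = {("CS",)}

violations = dangling_references(
    STUDENT_ATTRS, STUDENTS, "dept", DEPT_ATTRS, DEPARTMENTS, "name"
)
print(violations)  # the constraint holds exactly when this set is empty
```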
(Ex) Consider the following two relations:
Movies (title, year, length, genre, studioName, producerC#)
MovieExec (name, address, cert#, netWorth)
– The producer of every movie has to appear in MovieExec.
π producerC# (Movies) ⊆ π cert# (MovieExec), or
π producerC# (Movies) – π cert# (MovieExec) = ∅
(Ex) A referential integrity where the value involved is
represented by more than one attribute.
StarsIn(movieTitle, movieYear, starName)
Movies(title, year, length, genre, studioName, producerC#)
– Any movie mentioned in StarsIn also appears in Movies.
π movieTitle, movieYear (StarsIn) ⊆ π title, year (Movies)

Key constraints
MovieStar(name, address, gender, birthdate)
– name attribute is a key
» no two tuples agree on the name component
» if two tuples agree on name, then they must also agree on address
» these two tuples must be the same tuples and agree in all attributes
σ MS1.name = MS2.name AND MS1.address ≠ MS2.address (MS1 × MS2) = ∅
 MS1 = ρ MS1(name, address, gender, birthdate) (MovieStar)
 MS2 = ρ MS2(name, address, gender, birthdate) (MovieStar)
» this is not exactly a key constraint, but a functional dependency
» a correct key constraint uses the following condition in the selection:
MS1.name = MS2.name AND (MS1.address ≠ MS2.address OR MS1.gender ≠ MS2.gender OR MS1.birthdate ≠ MS2.birthdate)
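The key constraint on MovieStar can be tested by pairing every tuple with every other tuple and selecting the pairs that agree on name but differ elsewhere. In this sketch, MS1 and MS2 are simply two copies of the same set of tuples; the function name and the sample rows are hypothetical.

```python
from itertools import product

ATTRS = ("name", "address", "gender", "birthdate")


def key_violations(rows, key_positions):
    """Pairs from MS1 x MS2 that agree on the key but differ in some attribute."""
    return {
        (t1, t2)
        for t1, t2 in product(rows, rows)       # MS1 x MS2
        if all(t1[p] == t2[p] for p in key_positions)
        and t1 != t2                            # some attribute differs
    }


# Hypothetical instances of MovieStar.
good = {
    ("Star A", "1 Example St.", "F", "1970-01-01"),
    ("Star B", "2 Example St.", "M", "1980-02-02"),
}
bad = good | {("Star A", "9 Other Rd.", "F", "1970-01-01")}

name_position = [ATTRS.index("name")]
print(len(key_violations(good, name_position)))  # empty result: constraint holds
print(len(key_violations(bad, name_position)))   # non-empty result: violated
```

Each violating pair shows up twice, once in each order, because the product contains both (t1, t2) and (t2, t1).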

Additional constraints
(Ex) Values of gender attribute of MovieStar must be ‘F’ or ‘M’
σ gender ≠ ‘F’ AND gender ≠ ‘M’ (MovieStar) = ∅
» this is a domain constraint
(Ex) One must have a net worth of at least $10,000,000 to be the president of a movie studio.
MovieExec(name, address, cert#, netWorth)
Studio(name, address, presC#)
σ netWorth < 10000000 (Studio ⋈ presC#=cert# MovieExec) = ∅
or
π presC# (Studio) ⊆ π cert# (σ netWorth ≥ 10000000 (MovieExec))
» we have assumed a referential integrity constraint from Studio.presC# to MovieExec.cert#
» this is neither a domain constraint nor a referential integrity constraint
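Both formulations of the net-worth constraint can be evaluated side by side. The sketch below uses made-up Studio and MovieExec tuples in which every presC# appears as some cert#, so the assumed referential integrity constraint holds; the function names are hypothetical.

```python
LIMIT = 10_000_000

# Hypothetical data: Studio(name, address, presC#),
# MovieExec(name, address, cert#, netWorth).
STUDIO = {("Studio X", "addr 1", 101), ("Studio Y", "addr 2", 102)}
MOVIE_EXEC = {
    ("Exec A", "addr a", 101, 50_000_000),
    ("Exec B", "addr b", 102, 2_000_000),
}


def poor_presidents(studio, execs):
    """sigma netWorth < LIMIT (Studio theta-join[presC# = cert#] MovieExec)."""
    return {s + e for s in studio for e in execs if s[2] == e[2] and e[3] < LIMIT}


def presidents_are_rich(studio, execs):
    """pi presC#(Studio) is a subset of pi cert#(sigma netWorth >= LIMIT (MovieExec))."""
    rich_certs = {e[2] for e in execs if e[3] >= LIMIT}
    return {s[2] for s in studio} <= rich_certs


first_form_ok = poor_presidents(STUDIO, MOVIE_EXEC) == set()
second_form_ok = presidents_are_rich(STUDIO, MOVIE_EXEC)
print(first_form_ok, second_form_ok)  # both report a violation for Exec B
```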