Download CSE 331. Computer Organization

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
ECE 569
Database System Engineering
Fall 2004
Yanyong Zhang www.ece.rutgers.edu/~yyzhang
Course URL www.ece.rutgers.edu/~yyzhang/fall04
ECE569 Lecture 02.1
Fall 2004
Today’s topic


Relational data model

Different data models exist: relational, network,
hierarchical, object-oriented

Powerful, simple, declarative
Relational algebra
ECE569 Lecture 02.2
Fall 2004
Overview of Database Design

Conceptual design (ER model is used at this stage)

What are the entities and relationships in the enterprise?

What information about these entities and relationships
should we store in the database?

What are the integrity constraints or business rules that
hold

A database ‘schema’ in the ER model can be represented
pictorially

Can map an ER diagram into a relational schema
ECE569 Lecture 02.3
Fall 2004
Entity-Relationship model

To describe the conceptual scheme.

An entity is a thing that exists and is distinguishable.

Entity sets consist of a group of all “similar” entities.

Each entity has certain attributes.

A relationship among entity sets is an ordered list of entity
sets.

Relationship set: collection of similar relationships

An n-ary relationship set R relates n entity sets E1, …, En; each
relationship in R involves entities e1 E1, …, en En
- Same entity set could participate in different relationship sets, or in
different “roles” in same set
ECE569 Lecture 02.4
Fall 2004
Example of entity-relationship model
Date
Balance
Account
Made To
Billed
Patient
From
Address
ECE569 Lecture 02.5
Disease
Vital Sign
Time
Blood
Diagnosed
Treatment
Amount
Pulse
Name
Room#
Payment
Name
Fall 2004
Data Model

A data model is a mathematical formalism with two
parts:

A notation for describing data

A set of operations used to manipulate that data
ECE569 Lecture 02.6
Fall 2004
Why Study the Relational Model?

Most widely used model


“legacy systems” in older models


Vendors: IBM, informix, Microsoft,Oracle,Sybase,etc.
e.g., IBM’s IMS
Recent competitor: object-oriented model

ObjectStore, Versant, Ontos

A synthesis emerging: object-relational model
- Informix Universal server, UniSQL, O2, Oracle, DB2
ECE569 Lecture 02.7
Fall 2004
Relational Database: Definitions

Relational database: a set of relations

Relation: made up of 2 parts

Instance: a table, with rows and columns
- # Rows = cardinality, # columns = degree

Schema: specifies name of relation, plus name and type
of each column
- e.g., students(sid:string, name:string,
login:string,age:integer, gpa:real)

Can think of a relation as a set of rows or tuples (i.e., all
rows are distinct)
ECE569 Lecture 02.8
Fall 2004
Set-theoretic notion of a relation

A domain D is a set of values

colors = {red, blue, green}

age = set of positive integers less than 20

last_name = {a-zA-Z}+

A relation over domains D1,D2,…,Dn is a subset of
D1D2D3…Dn.

Example

R(colors, age)  {(red,1), (blue,1),(green,1),
(red,2), (blue,2),(green,2), …}
ECE569 Lecture 02.9
Fall 2004
Set-theoretic notion of a relation (cont)

We are only interested in finite relations.

The members of a relation are called tuples.

Each relation that is a subset of the product of k
domains is k-degree.

A relation can be represented as a table, where
each row is a tuple and each column corresponds
to one domain.
PERSON (NAME, AGE)
Name
Zhang
Merril
Stewart
ECE569 Lecture 02.10
Age
27
2
40
Fall 2004
Set-theoretic notion of a relation (cont)

Because a relation is a set of tuples, the order of
tuples in the table is insignificant (but the order of
domains does matter).
Name
Zhang
Merril
Stewart
Age
27
2
40

Name
Zhang
Stewart
Merril
Age
27
40
2

Age
27
2
40
ECE569 Lecture 02.11
Name
Zhang
Merill
Stewart
Fall 2004
An alternative definition – set of mappings

A relation schema R = {A1,A2,…,An} (no order imposed on
attributes)

A relation instance of R is a finite set of mappings {1, 2, …,
m} where
2: R  D
where D = null  dom(A1)  dom(A2)…  dom(An).

Each tuple i maps each attribute in tuple i to a value.

Note that attributes must be given names that are unique in a
relation schema.

Under this definition, all three instances above are
equivalent.
ECE569 Lecture 02.12
Fall 2004
Keys

A set S of attributes is a candidate key for R if

No semantically correct instance of R can include two
tuples    such that [s] = [s], and

No proper subset of S satisfies (a)

A set of attributes satisfying (a) but not necessarily
(b) is a superkey.

A relation may have multiple candidate keys. One
can be designated as the primary key.
ECE569 Lecture 02.13
Fall 2004
Example

Schema for a relation of degree four
patient(name,address,balance_due,room#)

An instance of patient with three tuples that
satisfies the key constraint (name is a candidate
key)
patient

Name
Address
Balance_due
Room #
Zhang
1964 highland dr
$5,230
518
Stewart
2342 K St.
$25
203
Brinkley
2342 K St.
$10,200
203
No other candidate key is possible (assuming that
this is a semantically correct instance of patient).
ECE569 Lecture 02.14
Fall 2004
Example – Healthcare DB
patient(name, address, balance_due, room#)
payments(name, amount, date)
vital_signs(name, pulse, blood, date, time)
diagnosis(patient_name, disease_name)
disease(disease_name, treatment)
treats(patient, doctor)
ECE569 Lecture 02.15
Fall 2004
Example instance of students relation
Sid
53666
53688
53650
ECE569 Lecture 02.16
Name
Jones
Smith
Smith
Login
jones@cse
smith@eecs
smith@math
Age
18
18
19
Gpa
3.4
3.2
3.8
Fall 2004
Relational Query Language

A major strength of the relational model: supports
simple, powerful querying of data

Queries can be written intuitively, and the DBMS is
responsible for efficient evaluation

Key: precise semantics for relational queries

Allows the optimizer to extensively re-order operations,
and still ensure that the answer does not change
ECE569 Lecture 02.17
Fall 2004
The SQL Query Language

Developed by IBM (system R) in the 1970s

Need for a standard since it is used by many
vendors

Standards

SQL-86

SQL-89 (minor revision)

SQL-92 (major revision)

SQL-99 (major extensions, current standard)
ECE569 Lecture 02.18
Fall 2004
The SQL Query Language

To find all 18 year old students, we can write
Sid
select *
from student S 53666
where s.age=18 53688

Name
Jones
Smith
Login
jones@cse
smith@eecs
Age Gpa
18 3.4
18 3.2
To find just names and logins, replace the first line:
select s.name, s.login
ECE569 Lecture 02.19
Fall 2004
Logical DB design

Representing entity-relationship diagrams

An entity set E can be represented by a relation whose
relation scheme consists of all the attributes of the entity
set.

Each tuple of the relation represents one entity in the
current instance of E.

A relationship R among E1, E2, …, Ek is represented by a
relation whose relation scheme consists of the attributes
in the keys for each of E1, E2, …, Ek.
ECE569 Lecture 02.20
Fall 2004
Relational Query language

Query language: Allow manipulation and retrieval
of data from a database

Relational model supports simple, powerful QLs:


Strong formal foundation based on logic

Allows for much optimization
Query language != programming language
ECE569 Lecture 02.21
Fall 2004
Formal Relational Query Languages

Two mathematical query languages form the basis
for “real” languages (e.g., SQL) and for
implementation:

Relational algebra: more operational, very useful for
representing execution plans

Relational calculus: let users describe what they want,
rather than how to compute it (Non-operational,
declarative)
ECE569 Lecture 02.22
Fall 2004
Relational Algebra

A set of five basic operations that map one or more
relations to a new relation.

Union: R  S  t  tR or tS 

The set of tuples in R or S or both

The degrees of R and S must match
a
b
c
A
B
C
D
E
F
G
H
a
d
c
a
b
c
d
e
d
e
b
e
a
b
a
e
d
a
c
b
b
c
a
a
d
b
b
b
c
b
b
d
e
b
a
d
b
b
b
b
R
ECE569 Lecture 02.23
S
T
RT
Fall 2004
Relational Algebra (cont)

Difference: R - S  t  tR and tS 

The set of tuples in R, but not in S

The degrees of R and S must match
A
B
C
D
E
F
G
H
a
d
c
a
b
c
d
e
d
e
b
e
a
b
a
e
d
a
c
b
b
c
a
a
d
b
b
b
c
b
b
R
ECE569 Lecture 02.24
S
R-T
T
Fall 2004
Relational Algebra (cont)

Cartesian Product: R X S 
 (a1, a2, …, ar+s) (a1, a2, …, ar) R and (ar+1, ar+2, …, ar+s)  S 

R x S is of degree r+s where r=degree(R) and s=degree(s)

a tuple t is in R x S if its first r components match those
of a tuple in R and its last s components from a tuple in S
R.A R.B R.C S.D S.E
A
B
C
D
E
F
G
H
A
B
C
D
E
a
b
c
d
e
d
e
b
a
b
c
d
e
a
e
d
a
c
b
b
c
a
a
d
b
b
b
c
b
b
a
b
c
b
c
a
d
c
d
e
a
d
c
b
c
e
a
b
d
e
e
a
b
b
c
R
S
T
RxS
ECE569 Lecture 02.25
Fall 2004
Relational Algebra (cont)

Projection: i1, i2, …, im (R)   (ai1, ai2, …, aim) (a1, a2, …, an) R

i1, i2, …, im (R) is of degree m

A tuple t in the result is obtained by removing and/or
rearranging the attributes of a tuple of R
A
B
C
D
E
F
G
H
G
F
a
b
c
d
e
d
e
b
e
d
a
e
d
a
c
b
b
c
a
a
d
b
b
b
c
b
b
b
a
b
d
R
ECE569 Lecture 02.26
S
T
2,1 (T)
Fall 2004
Relational Algebra (cont)

Selection: F(R)  t  t R and F(t)

F(R) is of the same degree as R

All tuples in R that satisfy predicate F are included in
result

F is a formula involving
- Constants or components of the tuple – component i is
denoted $I
- Comparison operators, <, =, >, , , 
- Logical operators and, or, not
F
G
H
d
e
b
a
a
d
b
b
b
c
b
b
T
ECE569 Lecture 02.27
F
G
H
d
e
b
a
b
b
d
b
b
$1=d or $2=$3 (T)
Fall 2004
Examples

Find the names of all patients with a balance of
more than $10,000.

Find the names of all patients that have a pulse
less than 50 or have been diagnosed with
hypertension.
ECE569 Lecture 02.28
Fall 2004
Examples (cont)

Find all patients whose treatment includes the
“application of leeches”

Print the names of all pairs of patients that
occupy the same hospital room and have been
diagnosed with the same disease.
ECE569 Lecture 02.29
Fall 2004
Examples (cont)

Print the name of the patient with the highest pulse
recorded today

Find all patients that had pulse measured today

Find the set of all ‘losers’, i.e., those whose pulse is less
than or equal to that of some other patient

Subtract (b) from (a) and project out patient name
ECE569 Lecture 02.30
Fall 2004
Examples (cont)

List all patients that suffer from at least one illness
that no other patient suffers from
ECE569 Lecture 02.31
Fall 2004
Additional Operations

Intersection: R  S  t  tR and tS 

The set of tuples in both R and S

The degrees of R and S must match

R  S = R – (R – S)
S
R
R–S
R – ( R – S)
ECE569 Lecture 02.32
Fall 2004
Additional Operations (cont)

Theta-join: R ij S  $i  $r+j(R  S) where r =
degree(R) and    <, =, >, , ,  


When  is ‘=‘ operation is called equijoin.
Find patient with highest pulse
ECE569 Lecture 02.33
Fall 2004
Additional Operations (cont)

Natural Join: R
S

An equijoin for all attributes that R and S have in
common.

The shared columns originating in S are projected out

R
S   i1, i2, …, im  R.A1 = S. A1 and … and R.Ak = S.Ak (R x S)
where i1, i2, … im is the list of all attributes of R x S, in
order, except the attributes S.A1, …, S.Ak.

Example: Names of patients being treated with leeches.
ECE569 Lecture 02.34
Fall 2004
Sample Queries

Consider the following set of relations
frequents (drinker, bar)
serves (bar, beer)
likes (drinker, beer)

List the drinkers that frequent at least one bar that
serves a beer they like

Construct the relation should_visit(drinker,bar)
consisting of all tuples <d, b> where bar b serves a
beer that drinker d likes.
ECE569 Lecture 02.35
Fall 2004
Sample Queries (cont)

List the drinkers that frequent only bars that serve
some beer they like. (Assume every drinker
frequents at least one bar.)

Print the drinkers that frequent no bar that serves a
beer that they like
ECE569 Lecture 02.36
Fall 2004
Questions on Healthcare DB

Retrieve all pairs of patients that shared a room at
the same time

List all patients that finished paying their bill
before one of their roomates began paying theirs.

Print the names of all pairs of patients that occupy
the same hospital room and have been diagnosed
with the same disease.
ECE569 Lecture 02.37
Fall 2004
Components of Data-Intensive Systems

Three separate types of functionalities

Data management

Application logic

Presentation

The system architecture determines whether these three
components reside on a single system (“tier”) or are
distributed across several tiers
ECE569 Lecture 02.38
Fall 2004
Single-tier architecture

All functionality combined into a single tier, usually
on a mainframe


Advantages:


Users access through dumb terminals
Easy maintenance and administration
Disadvantages:

Today, users expect graphical user interface

Centralized computation of all of them is too much for a
central system
ECE569 Lecture 02.39
Fall 2004
Client-Server Architecture


Thin client: work division

Client implements only the graphical user interface

Server implements business logic and data management
Thick client:

work division
- Client implements both the graphical user interface and the
business logic
- Server implements data management

Disadvantages:
- No central place to update the business logic
- Security issues: server needs to trust clients
- Scalability: does not scale more than several 100s of clients
ECE569 Lecture 02.40
Fall 2004
The three-tier architecture
Client program
(web browser)
Application Server
(Tomcat, Apache)
Database System
(DB2)
ECE569 Lecture 02.41
Fall 2004
Related documents