Managing external data
Part 1
Design of Databases
Gitte Christensen
Dyalog Ltd
Purpose
• To give you a crash course in data analysis and
databases
• After part 1 Design of Databases you will be able
to analyse and organise data based on a
requirement spec or use case.
• After part 2 Database programming you will be
able to use relational data in your APL applications
• After part 3 Database Implementation you will be
able to choose between different storage methods
based on structure and use of data and
performance considerations
Agenda
• The Relational Model
– Entity/Relationship model
– Convert E/R to table structure
– Relational Algebra
• Semistructured data
• Multidimensional data
Data Models
• A database models some portion of
the real world.
• A data model is the link between the user’s
view of the world and the bits stored in
the computer.
• We will concentrate on the Relational
Model
Data Models
• A data model is a collection of concepts for
describing data.
• A database schema is a description of a
particular collection of data, using a given
data model.
• The relational model of data is the most
widely used model today.
– Main concept: relation, basically a table
with rows and columns.
– Every relation has a schema, which
describes the columns, or fields.
Levels of Abstraction
• Views describe how
users see the data.
• Conceptual schema
defines logical structure
• Physical schema
describes the files and
indexes used.
• (sometimes called the
ANSI/SPARC model)
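[Figure: users work with View 1, View 2 and View 3; the views sit on top of the Conceptual Schema, which sits on top of the Physical Schema, which sits on top of the stored DB]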
Data Independence
• A Simple Idea:
Applications should be
insulated from how data
is structured and stored.
• Logical data independence:
Protection from changes in
logical structure of data.
• Physical data independence:
Protection from changes in
physical structure of data.
Entity-Relationship Model
Purpose of E/R Model
• The E/R model allows us to sketch
database designs.
– Kinds of data and how they connect.
– Not how data changes.
• Designs are pictures called entity-relationship diagrams.
• Later: convert E/R designs to relational
DB designs.
Entity Sets
• Entity = “thing” or object.
• Entity set = collection of similar entities.
– Similar to a class in object-oriented languages.
• Attribute = property of (the entities of) an
entity set.
– Attributes are simple values, e.g. integers or
character strings.
E/R Diagrams
• In an entity-relationship diagram:
– Entity set = rectangle.
– Attribute = oval, with a line to the rectangle
representing its entity set.
Example
[Diagram: rectangle Beers with two attribute ovals, name and manf]
• Entity set Beers has two attributes, name and
manf (manufacturer).
• Each Beers entity has values for these two
attributes, e.g. (Bud, Anheuser-Busch)
Relationships
• A relationship connects two or more entity
sets.
• It is represented by a diamond, with lines
to each of the entity sets involved.
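Example
[Diagram: entity sets Bars (name, addr, license), Beers (name, manf) and Drinkers (name, addr), connected by the relationships Sells (Bars and Beers), Likes (Drinkers and Beers) and Frequents (Drinkers and Bars)]
Note: license = beer, full, none
• Bars sell some beers.
• Drinkers like some beers.
• Drinkers frequent some bars.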
Relationship Set
• The current “value” of an entity set is the
set of entities that belong to it.
– Example: the set of all bars in our database.
• The “value” of a relationship is a set of lists
of currently related entities, one from each
of the related entity sets.
Example
• For the relationship Sells, we might have a
relationship set like:
Bar          Beer
Joe’s Bar    Bud
Joe’s Bar    Miller
Sue’s Bar    Bud
Sue’s Bar    Pete’s Ale
Sue’s Bar    Bud Lite
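As a rough illustration only, the relationship set for Sells can be held as a set of (bar, beer) pairs. The Python sketch below uses variable names that are assumptions made for this example; it shows that the "value" of a relationship is a set of lists with one entity from each related entity set.

```python
# Relationship set for Sells: each element pairs one bar with one beer.
sells = {
    ("Joe's Bar", "Bud"),
    ("Joe's Bar", "Miller"),
    ("Sue's Bar", "Bud"),
    ("Sue's Bar", "Pete's Ale"),
    ("Sue's Bar", "Bud Lite"),
}

# Bars and beers that currently appear in at least one Sells pair.
bars = {bar for bar, _ in sells}
beers = {beer for _, beer in sells}

# Beers sold at Sue's Bar, in alphabetical order.
sold_at_sues = sorted(beer for bar, beer in sells if bar == "Sue's Bar")
```

Because the pairs live in a set, listing the same (bar, beer) combination twice has no effect, which matches the idea of a relationship set.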
Case Movie Database
• We want to create a movie database
which will allow our users to find
information about movies
• Each movie has a title, a production year,
a length in minutes, whether it is color or b/w
and an owner, a studio
• We have addresses for the studios and the
actors
Draw a model of the Movies database using these symbols:
[Legend: EntityName in a rectangle = entity set; Relationship in a diamond = relationship; AttributeName in an oval = attribute]
Multiway Relationships
• Sometimes, we need a relationship that
connects more than two entity sets.
• Suppose that drinkers will only drink
certain beers at certain bars.
– Our three binary relationships Likes, Sells,
and Frequents do not allow us to make this
distinction.
– But a 3-way relationship would.
Example
[Diagram: three-way relationship Preferences connecting Bars (name, addr, license), Beers (name, manf) and Drinkers (name, addr)]
A Typical Relationship Set
Bar          Drinker    Beer
Joe’s Bar    Ann        Miller
Sue’s Bar    Ann        Bud
Sue’s Bar    Ann        Pete’s Ale
Joe’s Bar    Bob        Bud
Joe’s Bar    Bob        Miller
Joe’s Bar    Cal        Miller
Sue’s Bar    Cal        Bud Lite
Case Movie Database
• In each movie there are actors who are
contracted by the studios
• Add this relationship to your model
Many-Many Relationships
• Focus: binary relationships, such as Sells
between Bars and Beers.
• In a many-many relationship, an entity of
either set can be connected to many
entities of the other set.
– E.g., a bar sells many beers; a beer is sold by
many bars.
In Pictures: many-many
Many-One Relationships
• Some binary relationships are many-one
from one entity set to another.
• Each entity of the first set is connected to
at most one entity of the second set.
• But an entity of the second set can be
connected to zero, one, or many entities of
the first set.
In Pictures: many-one
Example
• Favorite, from Drinkers to Beers, is many-one.
• A drinker has at most one favorite beer.
• But a beer can be the favorite of any
number of drinkers, including zero.
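One way to make the many-one rule concrete is to test a relationship set for it. The following Python sketch is an illustration under assumptions: the function name and the sample pairs are invented for this example. A set of (drinker, beer) pairs is many-one from Drinkers to Beers if no drinker appears with two different beers.

```python
def is_many_one(pairs):
    """Return True if every left-hand entity is related to at most one right-hand entity."""
    seen = {}
    for left, right in pairs:
        # setdefault stores the first partner seen and returns it on later visits.
        if seen.setdefault(left, right) != right:
            return False
    return True

# Hypothetical relationship sets.
favorite = {("Ann", "Miller"), ("Bob", "Bud"), ("Cal", "Miller")}
likes = {("Ann", "Miller"), ("Ann", "Bud"), ("Bob", "Bud")}
```

Here favorite passes the check even though Miller is the favorite of two drinkers, while likes fails because Ann likes two beers.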
One-One Relationships
• In a one-one relationship, each entity of
either entity set is related to at most one
entity of the other set.
• Example: Relationship Best-seller between
entity sets Manfs (manufacturer) and Beers.
– A beer cannot be made by more than one
manufacturer, and no manufacturer can have
more than one best-seller (assume no ties).
In Pictures: one-one
Representing “Multiplicity”
• Show a many-one relationship by an arrow
entering the “one” side.
• Show a one-one relationship by arrows
entering both entity sets.
• Rounded arrow = “exactly one,” i.e., each
entity of the first set is related to exactly
one entity of the target set.
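Example
[Diagram: Drinkers and Beers connected by two relationships: Likes (many-many, no arrows) and Favorite (many-one, arrow entering Beers)]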
Example
• Consider Best-seller between Manfs and
Beers.
• Some beers are not the best-seller of any
manufacturer, so a rounded arrow to
Manfs would be inappropriate.
• But a beer manufacturer has to have a
best-seller.
In the E/R Diagram
[Diagram: Manfs and Beers connected by Best-seller, with a plain arrow entering Manfs and a rounded arrow entering Beers]
Case Movie Database
• Add arrows to your diagram so it reflects
the kind of relations between the entities
Attributes on Relationships
• Sometimes it is useful to attach an
attribute to a relationship.
• Think of this attribute as a property of
tuples in the relationship set.
Example
[Diagram: Bars and Beers connected by Sells; the attribute price is attached to the Sells diamond]
Price is a function of both the bar and the beer,
not of one alone.
Equivalent Diagrams Without
Attributes on Relationships
• Create an entity set representing values
of the attribute.
• Make that entity set participate in the
relationship.
Example
[Diagram: Sells connects Bars, Beers and a new entity set Prices (attribute price), with an arrow from Sells entering Prices]
Note convention: arrow
from multiway relationship
= “all other entity sets
together determine a
unique one of these.”
Roles
• Sometimes an entity set appears more
than once in a relationship.
• Label the edges between the relationship
and the entity set with names called roles.
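Example
[Diagram: relationship Married connects Drinkers to itself, with the two edges labeled husband and wife]
Relationship Set
Husband    Wife
Bob        Ann
Joe        Sue
…          …
Example
[Diagram: relationship Buddies connects Drinkers to itself, with the two edges labeled 1 and 2]
Relationship Set
Buddy1    Buddy2
Bob       Ann
Joe       Sue
Ann       Bob
Joe       Moe
…         …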
Case Movie Database
• The actors can be contracted either by the
studio producing the movie or by another
studio who rents the actor to the producing
studio
• We would like to record what the actor is
paid for appearing in a movie
• Update your model to reflect the new facts
Subclasses
• Subclass = special case = fewer entities =
more properties.
• Example: Ales are a kind of beer.
– Not every beer is an ale, but some are.
– Let us suppose that in addition to all the
properties (attributes and relationships) of
beers, ales also have the attribute color.
Subclasses in E/R Diagrams
• Assume subclasses form a tree.
– I.e., no multiple inheritance.
• Isa triangles indicate the subclass
relationship.
– Point to the superclass.
Example
[Diagram: Beers (name, manf) linked by an isa triangle to the subclass Ales (color)]
Case Movie Database
• For some movies like cartoons we have a
different kind of actor, voices.
• Design a subclass to reflect this fact
[Symbol: ISA triangle]
E/R Vs. Object-Oriented Subclasses
• In OO, objects are in one class only.
– Subclasses inherit from superclasses.
• In contrast, E/R entities have
representatives in all subclasses to which
they belong.
– Rule: if entity e is represented in a subclass,
then e is represented in the superclass.
Example
[Diagram: Beers (name, manf) with isa subclass Ales (color); the entity Pete’s Ale has a representative in both Beers and Ales]
Keys
• A key is a set of attributes for one entity
set such that no two entities in this set
agree on all the attributes of the key.
– It is allowed for two entities to agree on some,
but not all, of the key attributes.
• We must designate a key for every entity
set.
Keys in E/R Diagrams
• Underline the key attribute(s).
• In an Isa hierarchy, only the root entity set
has a key, and it must serve as the key for
all entities in the hierarchy.
Example: name is Key for Beers
[Diagram: Beers (name underlined, manf) with isa subclass Ales (color)]
Example: a Multi-attribute Key
[Diagram: Courses with attributes dept and number (underlined together as the key), hours and room]
• Note that hours and room could also serve as a
key, but we must select only one key.
Case Movie Database
• Add keys to your diagram
Weak Entity Sets
• Occasionally, entities of an entity set need
“help” to identify them uniquely.
• Entity set E is said to be weak if in order
to identify entities of E uniquely, we need
to follow one or more many-one
relationships from E and include the key
of the related entities from the connected
entity sets.
Example
• name is almost a key for football players, but
there might be two with the same name.
• number is certainly not a key, since players
on two teams could have the same number.
• But number, together with the team name
related to the player by Plays-on should be
unique.
In E/R Diagrams
[Diagram: weak entity set Players (name, number) connected through the relationship Plays-on to Teams (name)]
• Double diamond for supporting many-one relationship.
• Double rectangle for the weak entity set.
Weak Entity-Set Rules
• A weak entity set has one or more many-one
relationships to other (supporting) entity sets.
– Not every many-one relationship from a weak entity
set need be supporting.
• The key for a weak entity set is its own
underlined attributes and the keys for the
supporting entity sets.
– E.g., (player) number and (team) name is a key for
Players in the previous example.
Case Movie Database
• We would like to record which camera
crews shot a particular movie
• Camera crews are numbered within each
studio
• Add these facts to your diagram
Design Techniques
1. Avoid redundancy.
2. Limit the use of weak entity sets.
3. Don’t use an entity set when an attribute
will do.
Avoiding Redundancy
• Redundancy occurs when we say the
same thing in two or more different ways.
• Redundancy wastes space and (more
importantly) encourages inconsistency.
– The two instances of the same fact may
become inconsistent if we change one and
forget to change the other.
Example: Good
[Diagram: Beers (name) connected by ManfBy to Manfs (name, addr)]
This design gives the address of each
manufacturer exactly once.
Example: Bad
[Diagram: Beers (name, manf) connected by ManfBy to Manfs (name, addr)]
This design states the manufacturer of a beer
twice: as an attribute and as a related entity.
Example: Bad
[Diagram: Beers with attributes name, manf and manfAddr]
This design repeats the manufacturer’s address
once for each beer and loses the address if there
are temporarily no beers for a manufacturer.
Entity Sets Versus Attributes
• An entity set should satisfy at least one
of the following conditions:
– It is more than the name of something; it has
at least one nonkey attribute.
or
– It is the “many” in a many-one or many-many relationship.
Example: Good
[Diagram: Beers (name) connected by ManfBy to Manfs (name, addr)]
• Manfs deserves to be an entity set because of
the nonkey attribute addr.
• Beers deserves to be an entity set because it is
the “many” of the many-one relationship ManfBy.
Example: Good
[Diagram: Beers with attributes name and manf]
There is no need to make the manufacturer an
entity set, because we record nothing about
manufacturers besides their name.
Example: Bad
[Diagram: Beers (name) connected by ManfBy to Manfs (name)]
Since the manufacturer is nothing but a name,
and is not at the “many” end of any relationship,
it should not be an entity set.
Don’t Overuse Weak Entity Sets
• Beginning database designers often doubt
that anything could be a key by itself.
– They make all entity sets weak, supported by all
other entity sets to which they are linked.
• In reality, we usually create unique ID’s for
entity sets.
– Examples include social-security numbers,
automobile VIN’s etc.
When Do We Need Weak Entity Sets?
• The usual reason is that there is no global
authority capable of creating unique ID’s.
• Example: it is unlikely that there could be
an agreement to assign unique player
numbers across all football teams in the
world.
Break
How to translate ER Model
to Relational Model
Concepts
Relational Model is made up of tables
• A row of table = a relational instance/tuple
• A column of table = an attribute
• A table = a schema/relation
• Cardinality = number of rows
• Degree = number of columns
Example
SID     Name    Major    GPA
1234    John    CS       2.8
5678    Mary    EE       3.6
• Each column heading (SID, Name, Major, GPA) is an attribute
• Each row is a tuple/relational instance
• The whole table is a schema/relation
• Cardinality = 2, Degree = 4
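The two counts can be read straight off a table held in memory. This is only a minimal Python sketch; the variable names are assumptions, and the rows are the two students from the example.

```python
# Attribute names (the columns) and tuples (the rows) of the example relation.
attributes = ("SID", "Name", "Major", "GPA")
student = [
    (1234, "John", "CS", 2.8),
    (5678, "Mary", "EE", 3.6),
]

cardinality = len(student)   # number of rows (tuples)
degree = len(attributes)     # number of columns (attributes)
```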
From ER Model to Relational Model
So… how do we convert an ER diagram into a
table?? Simple!!
Basic Ideas:
• Build a table for each entity set
• Build a table for each relationship set if necessary (more
on this later)
• Make a column in the table for each attribute in the entity
set
• Indivisibility Rule and Ordering Rule
• Primary Key
Example – Strong Entity Set
[Diagram: Student (SID, Name, Major, GPA) connected by Advisor to Professor (SSN, Name, Dept)]
Student
SID     Name    Major    GPA
1234    John    CS       2.8
5678    Mary    EE       3.6
Professor
SSN     Name     Dept
9999    Smith    Math
8888    Lee      CS
Representation of Weak Entity Set
• A weak entity set cannot exist alone
• To build a table/schema for a weak entity set
– Construct a table with one column for each attribute in
the weak entity set
– Remember to include the discriminator
– Add one extra column on the right side of the
table and put in it the primary key of the strong entity
set (the entity set that the weak entity set
depends on)
– Primary key of the weak entity set = discriminator +
foreign key
Example – Weak Entity Set
[Diagram: Student (SID, Name, Major, GPA) connected by the identifying relationship owns to the weak entity set Children (Name, Age)]
Children
Age    Name    Parent_SID
10     Bart    1234
8      Lisa    5678
Primary key of Children is Parent_SID + Name
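The weak-entity rule (primary key = discriminator + foreign key) can be tried out with SQLite from Python's standard library. This is a sketch under assumptions, not the course's own implementation: the column types, the Student rows and the helper try_insert are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Student (
    SID   INTEGER PRIMARY KEY,
    Name  TEXT, Major TEXT, GPA REAL
);
CREATE TABLE Children (
    Name       TEXT,                             -- discriminator of the weak entity set
    Age        INTEGER,
    Parent_SID INTEGER REFERENCES Student(SID),  -- key of the strong entity set
    PRIMARY KEY (Parent_SID, Name)
);
INSERT INTO Student VALUES (1234, 'John', 'CS', 2.8), (5678, 'Mary', 'EE', 3.6);
INSERT INTO Children VALUES ('Bart', 10, 1234), ('Lisa', 8, 5678);
""")

def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO Children VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema a second child called Bart is accepted for a different parent, but not for the same parent, and a child whose Parent_SID matches no student is rejected.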
Representation of Relationship Set
-- This is a little more complicated --
• Unary/Binary Relationship set
– Depends on the cardinality and participation of the relationship
– Two possible approaches
• N-ary (multiple) Relationship set
– Primary Key Issue
• Identifying Relationship
– No relational model representation necessary
Representing Relationship Set
Unary/Binary Relationship
• For one-to-one relationship w/out total participation
– Build a table with two columns, one column for each
participating entity set’s primary key. Add successive
columns, one for each descriptive attribute of the
relationship set (if any).
• For one-to-one relationship with one entity set having
total participation
– Add one extra column on the right side of the
table of the entity set with total participation, and put in
it the primary key of the entity set without
complete participation, as given by the relationship.
Example – One-to-One Relationship Set
[Diagram: Student (SID, Name, Major, GPA) connected by the one-to-one relationship study (attribute Degree) to Major (ID Code)]
SID     Maj_ID_Co    S_Degree
9999    07           1234
8888    05           5678
Primary key can be either SID or Maj_ID_Co
Example – One-to-One Relationship Set
[Diagram: Student (SID, Name, Major, GPA) connected by the 1:1 relationship Have (attribute Condition) to Laptop (S/N #, Brand)]
SID     Name    Major      GPA     LP_S/N     Hav_Cond
9999    Bart    Economy    -4.0    123-456    Own
8888    Lisa    Physics    4.0     567-890    Loan
* Primary key can be either SID or LP_S/N
Representing Relationship Set
Unary/Binary Relationship
• For one-to-many relationship w/out total
participation
– Same thing as one-to-one
• For one-to-many/many-to-one relationship with
one entity set having total participation on “many”
side
– Add one extra column on the right side of
the table of the entity set on the “many” side, and
put in it the primary key of the entity set
on the “one” side, as given by the relationship.
Example – Many-to-One Relationship Set
[Diagram: Student (SID, Name, Major, GPA) connected by the N:1 relationship Advisor (attribute Semester) to Professor (SSN, Name, Dept)]
SID     Name    Major      GPA     Pro_SSN    Ad_Sem
9999    Bart    Economy    -4.0    123-456    Fall 2006
8888    Lisa    Physics    4.0     567-890    Fall 2005
* Primary key of this table is SID
Representing Relationship Set
Unary/Binary Relationship
• For many-to-many relationship
– Same thing as one-to-one relationship without
total participation.
– Primary key of this new schema is the union
of the foreign keys of both entity sets.
– No augmentation approach possible…
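For a many-to-many relationship the separate table is the only option, and it can be sketched with SQLite from Python's standard library. The example reuses the Bars/Sells/Beers design from the first half of the course; the column types, prices and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bars  (name TEXT PRIMARY KEY, addr TEXT);
CREATE TABLE Beers (name TEXT PRIMARY KEY, manf TEXT);
-- One table for the relationship set: both foreign keys plus the
-- descriptive attribute; the primary key is the union of the foreign keys.
CREATE TABLE Sells (
    bar   TEXT REFERENCES Bars(name),
    beer  TEXT REFERENCES Beers(name),
    price REAL,
    PRIMARY KEY (bar, beer)
);
INSERT INTO Bars  VALUES ('Joe''s Bar', NULL), ('Sue''s Bar', NULL);
INSERT INTO Beers VALUES ('Bud', 'Anheuser-Busch'), ('Miller', NULL);
INSERT INTO Sells VALUES ('Joe''s Bar', 'Bud', 2.50),
                         ('Joe''s Bar', 'Miller', 2.75),
                         ('Sue''s Bar', 'Bud', 2.50);
""")

# A beer is sold by many bars ...
bars_selling_bud = [row[0] for row in conn.execute(
    "SELECT bar FROM Sells WHERE beer = 'Bud' ORDER BY bar")]
# ... and a bar sells many beers.
beers_at_joes = [row[0] for row in conn.execute(
    "SELECT beer FROM Sells WHERE bar = ? ORDER BY beer", ("Joe's Bar",))]
```

Neither Bars nor Beers could carry the other's key as an extra column, because each row would need to hold several values; that is why the augmentation approach is not available here.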
Representing Relationship Set
N-ary Relationship
• Intuitively Simple
– Build a new table with as many columns as there are
attributes for the union of the primary keys of all
participating entity sets.
– Augment additional columns for descriptive attributes
of the relationship set (if necessary)
– The primary key of this table is the union of all
primary keys of entity sets that are on “many” side
– That is it, we are done.
Example – N-ary Relationship Set
[Diagram: the relationship “A relationship” (attribute D-Attribute) connects E-Set 1 (P-Key1), E-Set 2 (P-Key2), E-Set 3 (P-Key3) and Another Set (A-Key)]
P-Key1    P-Key2    P-Key3    A-Key    D-Attribute
9999      8888      7777      6666     Yes
1234      5678      9012      3456     No
* Primary key of this table is P-Key1 + P-Key2 + P-Key3
Representing Relationship Set
Identifying Relationship
• This is what you have to know
– You DON’T have to build a table/schema for the
identifying relationship set once you have built a
table/schema for the corresponding weak entity set
– Reason:
• A special case of one-to-many with total participation
• Reduce Redundancy
Representing Composite Attribute
• Relational Model Indivisibility Rule Applies
• One column for each component attribute
• NO column for the composite attribute itself
[Diagram: Professor with attributes SSN, Name and the composite attribute Address, made up of Street and City]
SSN     Name         Street        City
9999    Dr. Smith    50 1st St.    Fake City
8888    Dr. Lee      1 B St.       San Jose
Representing Multivalue Attribute
• For each multivalue attribute in an entity
set/relationship set
– Build a new relation schema with two columns
– One column for the primary keys of the entity
set/relationship set that has the multivalue attribute
– Another column for the values of the multivalue attribute. Each
cell of this column holds only one value, so each
value is represented as a unique tuple
– Primary key for this schema is the union of all
attributes
Example – Multivalue attribute
[Diagram: Student with attributes SID, Name, Major, GPA and the multivalue attribute Children]
Student
SID     Name     Major    GPA
1234    John     CS       2.8
5678    Homer    EE       3.6
Children
Stud_SID    Children
1234        Johnson
1234        Mary
5678        Bart
5678        Lisa
5678        Maggie
The primary key for this table is Stud_SID + Children, the union of all attributes
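The multivalue rule amounts to flattening a list into one tuple per value. The Python sketch below assumes a hypothetical in-memory form in which each student record carries a list of children; the names are taken from the example tables.

```python
# Hypothetical in-memory form: each student carries a list of children.
students = [
    {"SID": 1234, "Name": "John", "Children": ["Johnson", "Mary"]},
    {"SID": 5678, "Name": "Homer", "Children": ["Bart", "Lisa", "Maggie"]},
]

# Entity table: every attribute except the multivalue one.
student_rows = [(s["SID"], s["Name"]) for s in students]

# Separate relation: one (Stud_SID, Children) tuple per value.
children_rows = [(s["SID"], child) for s in students for child in s["Children"]]

# The whole tuple is the key, so no tuple may occur twice.
has_duplicates = len(children_rows) != len(set(children_rows))
```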
Representing Class Hierarchy
• Two general approaches depending on
disjointness and completeness
– For non-disjoint and/or non-complete class hierarchy:
• create a table for each super class entity set
according to normal entity set translation method.
• Create a table for each subclass entity set with a
column for each of the attributes of that entity set
plus one for each attribute of the primary key of
the super class entity set
• This primary key from super class entity set is also
used as the primary key for this new table
Example
[Diagram: Person (SSN, Name, Gender) with an ISA triangle to the subclass Student (SID, Status, Major, GPA)]
Person
SSN     Name     Gender
1234    Homer    Male
5678    Marge    Female
Student
SSN     SID     Status    Major    GPA
1234    9999    Full      CS       2.8
5678    8888    Part      EE       3.6
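With one table per class, the full set of properties of a student is recovered by joining on the shared key. The sketch below uses SQLite from Python's standard library; the column types and the extra person Ned (who is not a student) are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (SSN INTEGER PRIMARY KEY, Name TEXT, Gender TEXT);
-- The subclass table reuses the superclass key as its own primary key.
CREATE TABLE Student (
    SSN    INTEGER PRIMARY KEY REFERENCES Person(SSN),
    SID    INTEGER, Status TEXT, Major TEXT, GPA REAL
);
INSERT INTO Person VALUES (1234, 'Homer', 'Male'), (5678, 'Marge', 'Female'),
                          (9012, 'Ned', 'Male');
INSERT INTO Student VALUES (1234, 9999, 'Full', 'CS', 2.8),
                           (5678, 8888, 'Part', 'EE', 3.6);
""")

# Inherited attributes come from Person, subclass attributes from Student.
rows = conn.execute("""
    SELECT p.Name, s.Major
    FROM Person p JOIN Student s ON s.SSN = p.SSN
    ORDER BY p.SSN
""").fetchall()
```

Ned appears only in Person, so the hierarchy does not need to be complete: not every person has to be a student.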
Case Movie Database
• Convert your E/R diagram to relational
tables
Relational Algebra
Relational Algebra
Relational Algebra is :
• the formal description of how a relational
database operates
• the mathematics which underpins SQL
operations.
Operators in relational algebra are not necessarily
the same as SQL operators, even if they have the
same name.
Terminology
• Relation - a set of tuples.
• Tuple - a collection of attributes which describe
some real world entity.
• Attribute - a real world role played by a named
domain.
• Domain - a set of atomic values.
• Set - a mathematical definition for a collection of
objects which contains no duplicates.
Operators - Write
• INSERT - provides a list of attribute values for a new
tuple in a relation. This operator is the same as SQL.
• DELETE - provides a condition on the attributes of a
relation to determine which tuple(s) to remove from the
relation. This operator is the same as SQL.
• MODIFY - changes the values of one or more attributes
in one or more tuples of a relation, as identified by a
condition operating on the attributes of the relation. This
is equivalent to SQL UPDATE.
Operators - Retrieval
There are two groups of operations:
• Mathematical set theory based operations:
UNION, INTERSECTION, DIFFERENCE, and
CARTESIAN PRODUCT.
• Special database operations:
SELECT (not the same as SQL SELECT),
PROJECT, and JOIN.
Relational SELECT
SELECT is used to obtain a subset of the
tuples of a relation that satisfy a select
condition.
For example, find all employees born after
1st Jan 1950:
SELECT dob > ’01/JAN/1950’ (employee)
Relational PROJECT
The PROJECT operation is used to select a
subset of the attributes of a relation by
specifying the names of the required
attributes.
For example, to get a list of all employees’
surnames and employee numbers:
PROJECT surname,empno (employee)
SELECT and PROJECT
SELECT and PROJECT can be combined together.
For example, to get a list of employee numbers
for employees in department number 1:
PROJECT empno (SELECT depno = 1 (employee))
Mapping this back to SQL gives:
SELECT empno
FROM employee
WHERE depno = 1;
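The two operators can be sketched as small Python functions over a relation held as a list of dictionaries. The function names and the sample employee data are assumptions for illustration; only the combined expression at the end comes from the slide.

```python
def select(relation, predicate):
    """Relational SELECT: keep the tuples that satisfy the condition."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Relational PROJECT: keep only the named attributes, dropping duplicate tuples."""
    result = []
    for row in relation:
        reduced = {name: row[name] for name in attributes}
        if reduced not in result:
            result.append(reduced)
    return result

# Hypothetical sample data for the employee relation.
employee = [
    {"empno": 1, "surname": "Smith", "depno": 1},
    {"empno": 2, "surname": "Jones", "depno": 2},
    {"empno": 3, "surname": "Brown", "depno": 1},
]

# PROJECT empno (SELECT depno = 1 (employee))
empnos = project(select(employee, lambda row: row["depno"] == 1), ["empno"])
```

Note that project removes duplicate tuples because a relation is a set, whereas SQL SELECT keeps duplicates unless DISTINCT is given. This is one place where the relational operator and the SQL operator differ.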
Set Operations - semantics
Consider two relations R and S.
• UNION of R and S
the union of two relations is a relation that includes all
the tuples that are either in R or in S or in both R and S.
Duplicate tuples are eliminated.
• INTERSECTION of R and S
the intersection of R and S is a relation that includes all
tuples that are both in R and S.
• DIFFERENCE of R and S
the difference of R and S is the relation that contains all
the tuples that are in R but that are not in S.
SET Operations - requirements
For set operations to function correctly the
relations R and S must be union compatible.
Two relations are union compatible if
– they have the same number of attributes
– the domain of each attribute in column order is
the same in both R and S.
UNION Example
INTERSECTION Example
DIFFERENCE Example
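As a rough illustration of the three set operations and of the union-compatibility requirement, relations can be modelled as Python sets of tuples. The schemas, attribute names and rows below are made up for the example.

```python
def union_compatible(r_schema, s_schema):
    """Same number of attributes, and the same domain in each column position."""
    return len(r_schema) == len(s_schema) and all(
        r_dom == s_dom for (_, r_dom), (_, s_dom) in zip(r_schema, s_schema))

# Schemas as (attribute name, domain) pairs; names and data are made up.
r_schema = [("empno", int), ("surname", str)]
s_schema = [("staffno", int), ("name", str)]
t_schema = [("surname", str), ("empno", int)]   # same domains, wrong column order

R = {(1, "Smith"), (2, "Jones")}
S = {(2, "Jones"), (3, "Brown")}

compatible = union_compatible(r_schema, s_schema)
union = R | S          # tuples in R or S or both, duplicates eliminated
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R but not in S
```

The attribute names of R and S differ, but the relations are still union compatible because only the number of columns and the domains in column order matter.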
CARTESIAN PRODUCT
The Cartesian Product is also an operator
which works on two sets. It is sometimes
called the CROSS PRODUCT or CROSS
JOIN.
It combines the tuples of one relation with all
the tuples of the other relation.
CARTESIAN PRODUCT
Example
JOIN Operator
JOIN is used to combine related tuples from two relations:
• In its simplest form the JOIN operator is just the cross
product of the two relations.
• As the join becomes more complex, tuples are removed
within the cross product to make the result of the join
more meaningful.
• JOIN allows you to evaluate a join condition between the
attributes of the relations on which the join is undertaken.
The notation used is
R JOIN <join condition> S
JOIN Example
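A join can be sketched as a cross product followed by a filter, which is how the bullets above describe it. The Python functions and the sample relations are assumptions for illustration; attribute names are chosen so that they do not clash between the two relations.

```python
from itertools import product

def cartesian_product(r, s):
    """Pair every tuple of r with every tuple of s."""
    return [{**a, **b} for a, b in product(r, s)]

def join(r, s, condition):
    """R JOIN <condition> S: the cross product, minus the pairs that fail the condition."""
    return [row for row in cartesian_product(r, s) if condition(row)]

# Hypothetical relations.
employee = [
    {"empno": 1, "surname": "Smith", "depno": 1},
    {"empno": 2, "surname": "Jones", "depno": 2},
]
department = [
    {"dno": 1, "dname": "Sales"},
    {"dno": 2, "dname": "Research"},
    {"dno": 3, "dname": "Admin"},
]

everything = cartesian_product(employee, department)   # 2 x 3 combinations
works_in = join(employee, department, lambda row: row["depno"] == row["dno"])
```

A real database system would normally avoid building the full cross product; the sketch only mirrors the definition.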
Natural Join
Most often the JOIN involves an equality test, and is then
described as an equi-join. Such joins result in two
attributes in the resulting relation having exactly the same
value. A ‘natural join’ will remove the duplicate attribute(s).
– In most systems a natural join will require that the
attributes have the same name to identify the
attribute(s) to be used in the join. This may require a
renaming mechanism.
– If you do use natural joins make sure that the relations
do not have two attributes with the same name by
accident.
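A natural join can be sketched in Python as an equi-join on every attribute name the two relations share, keeping one copy of each shared attribute. The function and the sample data are assumptions for illustration.

```python
def natural_join(r, s):
    """Equi-join on all attributes the two relations share, keeping one copy of each."""
    result = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[name] == b[name] for name in common):
                result.append({**a, **b})
    return result

# Hypothetical relations that share exactly one attribute name, depno.
employee = [{"empno": 1, "surname": "Smith", "depno": 1},
            {"empno": 2, "surname": "Jones", "depno": 2}]
department = [{"depno": 1, "dname": "Sales"},
              {"depno": 3, "dname": "Admin"}]

joined = natural_join(employee, department)
```

If department also had an attribute called surname, the sketch would silently join on that too, which is the accident the second bullet warns about.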
OUTER JOINs
Notice that much of the data is lost when applying a join to
two relations. In some cases this lost data might hold useful
information. An outer join retains the information that would
have been lost from the tables, replacing missing data with
nulls.
There are three forms of the outer join, depending on which
data is to be kept.
• LEFT OUTER JOIN - keep data from the left-hand table
• RIGHT OUTER JOIN - keep data from the right-hand
table
• FULL OUTER JOIN - keep data from both tables
OUTER JOIN Example 1
OUTER JOIN Example 2
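The outer joins can be sketched in Python by padding unmatched tuples with None, which stands in for null. The helper below and its sample data are assumptions for illustration; a right outer join is obtained by swapping the two relations.

```python
def left_outer_join(r, s, condition, s_attributes):
    """Keep every tuple of r; pad with None (null) where no tuple of s matches."""
    result = []
    for a in r:
        matches = [{**a, **b} for b in s if condition(a, b)]
        if not matches:
            matches = [{**a, **{name: None for name in s_attributes}}]
        result.extend(matches)
    return result

# Hypothetical relations: Jones has no department, Research has no employees.
employee = [{"empno": 1, "surname": "Smith", "depno": 1},
            {"empno": 2, "surname": "Jones", "depno": None}]
department = [{"dno": 1, "dname": "Sales"},
              {"dno": 2, "dname": "Research"}]

def same_dept(e, d):
    return e["depno"] == d["dno"]

left = left_outer_join(employee, department, same_dept, ["dno", "dname"])
# A right outer join is the same operation with the two relations swapped.
right = left_outer_join(department, employee, lambda d, e: same_dept(e, d),
                        ["empno", "surname", "depno"])
```

An ordinary join of these two relations would return only the Smith/Sales tuple; the left outer join keeps Jones and the right outer join keeps Research.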
Semistructured data
[Figure: semistructured data graph with a Root node and edges labeled star, starIn, starOf, city and street; node labels include mh, cf and sw]
Multidimensional data
End of Part 1