Download 1.2 One-to-Many Relationships and Foreign Keys

Document related concepts

SQL wikipedia , lookup

Clusterpoint wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Object-relational impedance mismatch wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
SQL Unit 1
An Introduction to Relational
Databases
Kirk Scott
1
• 1.1 Entities, Tables, and Primary Keys
• 1.2 One-to-Many Relationships and Foreign
Keys
• 1.3 One-to-One and Many-to-Many
Relationships
• 1.4 An Introduction to Data Types
• 1.5 Nulls and Integrity
2
1.1 Entities
• 1.1.1 Definition of an Entity
• The term "entity" refers to any individual item
that can have information stored about it in a
database.
• This may be a person, a thing, or some sort of
abstraction that doesn't have a physical
existence.
3
1.1.2 Describing Entities by Means of
their Characteristics
• In order to have information about an entity
stored in a database, it has to be possible to
describe it using values which may be words
or phrases or numeric quantities.
4
1.1.3 Storing Information about
Entities in Tables
• Information about entities is stored in rectangular
tables.
• A table may contain information about more than
one entity of the same kind.
• If there are multiple rows in a table, then you are
storing information on that many different
instances of that kind of entity.
• The information about each individual entity is
contained in a single row in the table.
• The values which describe that entity are
contained in the columns of that row.
5
• Suppose you decide that people are entities
that you want to store information about.
• This is an illustration of the general idea:
SSN
name
dob
123-45-6789
Bob
1/1/01
…
6
1.1.4 Table Schema Notation
• Table schema notation can be used to specify the
name of a table and its fields without providing
sample data.
• Sometimes people are tempted to give tables
plural names.
• It turns out to be less confusing to talk about
tables and their contents if the table names are
singular.
• The name of the table should describe a single
instance of the kind of entity stored in a row of
the table.
7
• For example, if you want to store information
about people, you would give the table the
name "Person".
• Here is the table specification using this
naming convention and schema notation:
•
• Person(SSN, name, dob)
8
1.1.5 Parallel Sets of Terminology for
Relational Tables
• Table, Row, Column
• File, Record, Field
• Relation, Tuple, Attribute
9
• There are three parallel sets of terminology when
referring to the structure of data mentioned
above.
• One set has already been used above.
• Each row in the table contains information about
one entity.
• Each column contains a value describing a
particular characteristic of the entity.
• The characteristic described by a given column is
the same for all of the entities in the table.
10
• An older set of terminology, which is still used,
refers to files, records, and fields instead of
tables, rows, and columns.
• Sometimes the term flat file is used.
• This means that each record in the file has the
same number of fields.
11
• A more theoretical set of terminology refers to
relations, tuples, and attributes, respectively.
• The terms relation and tuple may be
somewhat obscure, but the term attribute is
very descriptive.
• Each column in the table contains a value
which describes an attribute of the entity in
question.
12
• In these notes all of these sets of terminology
may be used at one time or another, and they
may be mixed up, referring to attributes of
records, or rows of files, for example.
• You may also use the terminology
indiscriminately when answering questions.
• It's simply important that you know what the
different terms mean.
13
1.1.6 Deciding on Entities in a
Database
• The designer of a database has to determine
what entities will have information stored
about them.
• Entities are conceptual in nature.
• For any given situation, there is not
necessarily just one correct set of entities
which will describe the situation.
• Suppose you want to record information
about mothers and children, for example.
14
• You could regard both mothers and children as
instances of persons, and decide that the
underlying entity is a person.
• You could also decide that mothers and
children are distinct entities in your view of
the world.
• This second approach would have to be taken
if the attributes you store for mothers and
children are different.
15
1.1.7 Incorrect Design 1: Mixing
Types of Entities in Tables
• Suppose that you have decided to record
different attributes for mothers and children and
that they are conceptually different entities.
• Suppose also that you show the relationship
between mothers and children by putting the
records for children after their mother's record.
• It may be convenient or customary to show
information in this way, but this is not a correct
design for a table in a relational database:
16
motherA
child1
child2
motherB
child3
…
…
…
…
…
…
…
17
• This is not allowed under the relational model.
• The theory of relational databases is similar to
set theory in math.
• The rows in the tables are like elements in a
set.
• All of the elements have to be of the same
kind.
18
• If two kinds of entities are different, then they
belong in different tables, even if they have
the same number of attributes.
• Each table has to contain information about
one kind of entity only.
• One table can't contain records for two
different kinds of entities.
19
• It is also true that the order of the elements of
a set does not have any meaning.
• Likewise, the order of the rows of a table has
no meaning.
• It may be convenient for users to see the rows
sorted in some particular order, and it will be
possible to display information in this way.
• But intrinsically the order is immaterial.
20
• If you enter data in a particular order, that can
imply no relationship between rows;
• Also, if you happen to see the data displayed in a
particular order, you can infer no relationship
between the rows.
• It is not permissible to store relationships
between entities by means of their relative
positions in tables.
• It will turn out that relationships can only be
captured by means of attribute values.
21
1.1.8 Incorrect Design 2: Repeating
Fields or Multi-valued Fields
• It may also be convenient to show information
as outlined below, but this is also incorrect:
motherA
motherB
…
child1
child3
child2
22
• This is not allowed under the relational model.
• Different mothers may have different numbers
of children.
• This means that the number of attributes in a
row for a mother could vary.
• You may think that it would be possible to set
a maximum number of children per mother
and use that to set a fixed number of columns
per row.
23
• However, for any number you choose there
are two undesirable results:
• You may encounter a mother who has more
children than the maximum;
• and for all mothers who have less than the
maximum you have lots of wasted space for
information about children.
24
• This example emphasizes the idea of a flat file.
• The rows of a table can't be jagged.
• They all have to contain the same set of
attributes—the same number of attributes.
25
1.1.9 Incorrect Design 3:
Concatenating Related Records
• There is at least one more alternative design
which is incorrect.
• It is worth taking a look at because it illustrates a
different set of problems.
• Suppose for the purposes of illustration that each
mother had 4 attributes and each child had 2
attributes.
• Then let the table be designed so that each row
contained all of the information about one
mother and one of her children.
26
For example:
motherA …
motherA …
motherB …
…
…
…
…
…
…
child1
child2
child3
…
…
…
27
• The fundamental problem remains that
information about more than one different
kind of entity is being stored in one table.
• The practical problem with this design is
redundancy.
• If a mother has more than one child, then the
mother's data is repeated for as many children
as she has.
28
• This is bad for at least three reasons.
• 1. It wastes space.
• 2. If the mother's information ever has to be
updated, more than one record may have to
be updated, rather than just one.
29
• 3. And the most insidious problem of all is this:
• By recording the same information more than
one time, you open up the possibility that
different values will be recorded for the same
attribute in different places.
• If this happens, it is clear that at least one of the
entries is wrong, but it's impossible to tell which
one.
• This is known as a data integrity problem.
30
1.1.10 No Duplicate Records; Primary
Keys
• Relations are like sets in another way.
• Duplicate elements are not allowed in a set,
and duplicate rows are not allowed in tables.
• This makes perfect sense.
• What purpose would it serve to store the
information about a given entity more than
one time?
31
• Another way of saying that there can be no
duplicate records is that all of the records in a
file are unique.
• In other words, when taking the values in all
fields of each record into account, no two
records in the table contain exactly the same
set of values.
32
• It is also customary to have a single field which
uniquely identifies each record.
• That is to say, there are no duplicate values for
that field in the whole table.
• When recording information about people, their
social security number is a good example of this.
• No two people are supposed to have the same
social security number.
• If you know their number, you have positive
identification and you can look them up and find
out other information about them.
33
• It is customary, but not required, to have the
unique identifier be the first field in the table.
• This field is called the primary key of the table.
This picture, given previously, illustrates the
general idea:
SSN
123-45-6789
…
name
Bob
dob
1/1/01
34
• The schema notation for a table can be expanded to
show which field is the primary key. One alternative is
to underline the primary key field:
•
• Person(SSN, name, dob)
•
• Another alternative is to explicitly mark the primary
key field by following it with the abbreviation p.k.:
•
• Person(SSN p.k., name, dob)
35
1.2 One-to-Many Relationships and
Foreign Keys
36
1.2.1 All Tables in a Database are
Related
• In theory, you could have a database
consisting of one table, containing information
about just one type of entity.
• In practice, a database will store information
about more than one type of entity and will
consist of more than one table.
• Each table in a database has to be related in
some way to at least one of the other tables in
the database.
37
• Collectively, all of the tables have to be related
to each other.
• Informally, the first step in database design is
determining the entities and attributes
involved.
• The second step is determining the
relationships among the entities.
38
1.2.2 The Three Kinds of
Relationships
• There are three kinds of relationships that can
exist between entities:
• one-to-one (1-1),
• one-to-many (1-m),
• and many-to-many (m-n).
• Each of these kinds of relationships can be
captured in a relational database design.
39
1.2.3 The One-to-Many Relationship;
ER Notation
• It turns out that the 1-m relationship is the
most basic one.
• The mother-child relationship is of this type,
and it can be illustrated using an entityrelationship (ER) diagram
• This will be shown on the overhead following
the next one.
40
• In an entity-relationship diagram the tables
are represented by rectangles and the
relationship between the tables is represented
by a line between them.
• One end of the line is forked.
• This is known as a crow's foot, and it is this
end of the relationship which is "many":
41
Mother
Child
42
• It is also possible to include field names using
this notation.
• In this example mid stands for mother id and
kid stands for child id.
• These fields are the primary keys and they are
indicated with the notation p.k.
43
Mother
Child
mid p.k.
name
…
kid p.k.
name
…
44
1.2.4 Foreign Keys
• Capturing the relationship between two tables
depends on the use of what is called a foreign
key.
• A foreign key is a field in one table which happens
to be the primary key field of another table.
• Foreign key can be abbreviated f.k.
• The way to capture a 1-m relationship is to
embed the primary key of the "one" table as a
foreign key in the "many" table.
45
• Continuing to use ER notation, the example
above can be expanded to show the 1-m, or
primary key to foreign key relationship.
• The mid, the primary key of the mother table,
is embedded as a foreign key (with the same
name) in the child table:
46
Mother
Child
mid p.k.
name
…
kid p.k.
name
mid f.k.
…
47
• In some books the list of fields in the Child
table wouldn't explicitly show the mid field.
• They are relying on the crow's foot notation to
indicate that the p.k. of the Mother table
would be a f.k. in the Child table.
• It is redundant, but probably clearer, to show
all of the fields in each table explicitly.
48
• The foreign key field doesn't have to have the
same name as the corresponding primary key
field.
• As an illustration, in this example it would be
possible to have mid p.k. in the Mother table
and motherid f.k. in the Child table.
• It is the notations p.k. and f.k. which represent
the roles of the fields in the tables, not the
names of the fields themselves.
49
• Another way of illustrating this relationship is
as follows:
Mother
mid
Child
kid
mid
50
• In this representation the arrow graphically shows the
primary key field being embedded as a foreign key
field.
• In some books the arrow might be shown going in the
opposite direction.
• When done in that way it represents the fact that
values in the "many" table refer back to values in the
"one" table.
• The idea remains the same.
• To capture a 1-m relationship, the primary key of the
"one" table is embedded as a foreign key in the "many"
table.
51
1.2.5 A Concrete Example
• The presentation so far has basically been about
notation.
• The underlying idea may become clearer if a concrete
example is shown.
• Suppose that the Mother and Child tables consist of
these simple designs.
• Notice the use of the abbreviation f.k. again to identify
the foreign key in the design:
•
• Mother(mid, name)
• Child(kid, name, mid f.k.)
52
• The underlying assumption is that the mid
fields in both tables are of the same type and
have the same meaning, namely, they identify
one of the mothers.
• Suppose that the tables contain this
information:
53
mid
1
2
kid
a
b
c
Mother
name
Lily
Matilda
Child
name
Ned
Ann
June
mid
2
1
2
54
• The important observation is that the values
in the mid field in the Child table tell you that
Ned and June's mother is Matilda and Ann's
mother is Lily.
• It is the values of the mid fields in the two
tables that capture the relationship between
them.
55
1.2.6 A Relationship between a Table
and Itself
• Here is one last iteration on the idea of
entities and capturing relationships between
them.
• You could decide that the basic entity you're
dealing with is a person, but you would also
like to record who is whose mother among
person entities.
• Here is an example of a design that
accomplishes this:
56
pid
1
2
3
Person
name
Ned
Ann
June
mid
2
2
57
• In this example, pid is the primary key and mid is
the foreign key.
• mid refers to pid.
• In other words, the primary key of the table is
embedded as a foreign key (necessarily with a
different name) in the same table.
• What you gather from the data shown is that Ned
and June are Ann's children.
• Ann's mother doesn't appear in the table.
58
• There are not literally two copies of the
Person table, but the relationship between
the table and itself could be diagrammed in
this way:
Person
as
Mother
Person
as
Child
pid p.k.
name
mid f.k.
pid p.k.
name
mid f.k.
…
59
• Do not mistake this design for one of the
incorrect designs given earlier.
• The basic entity is person.
• Some people are mothers and some are children
and you are recording information about the
same set of attributes for both.
• The important point is that you can't tell who a
mother's children are or vice-versa by whether or
not their records are located next to each other.
60
• The relationship between records is captured
by the values stored in the fields.
• The one-to-many relationship is captured by
the fact that the primary key value of the
entity which plays the "one" role is embedded
as a foreign key value in the record of the
entity which plays the "many" role.
• In this case, both entities happen to exist in
the same table.
61
• Here is another diagram for this situation.
Person
pid p.k.
name
mid f.k.
Is the mother of
62
• This diagram is better than the previous one
because it doesn't suggest that there are two
copies of the Person table.
• On the other hand it is a bit more cryptic.
• This diagram illustrates the fact that when the
relationship is not obvious, then the line
indicating the relationship has to be given an
explanatory label.
• Either diagram is acceptable.
63
1.3 One-to-One and Many-to-Many
Relationships
64
1.3.1 Background
• The correct approach to implementing a 1-m
relationship was given in the previous section.
• This section will show how to implement 1-1
and m-n relationships.
• The idea behind the examples will be
biological mating patterns.
65
• The 1-m pattern will be repeated using cows as
an example.
• In a herd social structure, one bull will acquire a
harem of cows.
• Geese will be used to illustrate the 1-1 structure.
• Canada geese, for example, typically mate for life.
• They will only find a new partner if their old
partner dies.
66
• Finally, chimpanzees will be used to illustrate
the m-n structure.
• Throughout their lives, chimpanzees may
change partners and have children with
different partners.
• Their social structure is not based on what
humans call a nuclear family.
67
1.3.2 The One-to-Many Relationship
Again
• There is just the one way of implementing a 1m relationship.
• The primary key of the "one" table is
embedded as a foreign key in the "many"
table.
• The tables below illustrate this idea again.
68
bid
1
2
cid
a
b
c
Bull
name
Ferdinand
Durham
Cow
name
Elsie
Bossy
Daisy
bid
2
1
2
69
1.3.3 The One-to-One Relationship as
a Single Table
• With the 1-1 model, there is a choice to be
made.
• Suppose that the match-up of two entities is
truly permanent.
• In this case, it would be possible to re-analyze
the situation and determine that the matchup itself was a base entity, and make a single
table for that.
• For example:
70
pair id
a
b
c
Goose Pairs
goose name
Gaye
Gabriela
Gladys
gander name
Gary
Gus
Gil
71
1.3.4 The One-to-One Relationship as
Two Tables
• Most of the time it is impractical to assume
that the match-up is a base entity.
• The design alternative is to put the two kinds
of entities into two different tables and link
the tables with a primary key, foreign key pair.
• This is one alternative for handling the
situation:
72
goose id
a
b
c
gander id
1
2
3
Goose
goose name
Gaye
Gabriela
Gladys
gander id
1
2
3
Gander
gander name
Gary
Gus
Gil
73
• There are several things happening with this
example that bear some explanation.
• This technique for capturing a 1-1 relationship
allows for data to be included that would
make it 1-m.
• Every database design has underlying
assumptions.
• The assumption here is that the relationship is
1-1.
74
• It is up to the user to make sure that no data is entered
into the tables which would imply a 1-m relationship.
• There is a choice in modeling a 1-1 relationship in this
way:
• Which primary key should be embedded in the other
table as a foreign key?
• In this example, the primary key of the Gander table is
embedded as a foreign key in the Goose table.
• It would also be possible to embed the primary key of
the Goose table as a foreign key in the Gander table.
75
• It doesn't make a difference in this example, but
in real life, if there is any chance that there might
be exceptions or that in the future the
relationship could become 1-m, it is important to
embed the primary key of what would be the
"one" table as a foreign key in what would
become the "many" table.
• Another concern which will be explained further
in the future is minimizing the number of null
values in tables.
76
1.3.5 An Incorrect Design for the
One-to-One Relationship
• Beginning database designers sometimes
make this mistake:
• They think that if one embedded primary key
is a good thing, then two embedded primary
keys are a better thing.
• Going back to the notation for embedding
which uses arrows, this illustrates what they
try to do, which is wrong:
77
Gander
Ganderid, p.k.
Goose
Gooseid, f.k.
Gooseid, p.k.
Ganderid, f.k.
78
• The problem with this design is that it's
redundant.
• There is only one relationship between the
tables, but it is captured twice.
• First of all, this is wasteful.
79
• The second flaw in this arrangement is that it
opens the possibility of mistakenly storing
conflicting data.
• What if Gus's primary key is the foreign key
value in the record for Gabriela, but Gabriela's
primary key value is not the foreign key value
in the record for Gus?
80
• Another concern which will be explained in
greater detail later on is that it may become
impossible to enter data into the tables.
• This can result because additional constraints
can be added which specify that fields may
not be null and values may not be entered
into foreign key fields unless they exist in
primary key fields.
81
1.3.6 The Many-to-Many Relationship
with a "Table in the Middle"
• If two kinds of entities are in a m-n
relationship, in addition to the tables for the
two kinds of entities, a third table is needed.
• The m-n relationship is captured by two 1-m
relationships.
• Each of the base entity tables is in a 1-m
relationship with the "table in the middle".
• Using the ER modeling notation, this is how it
works:
82
Entity1
Table in the middle
Entity2
83
Here is a simple example using
data:
fcid
1
2
3
4
5
6
fcid
1
1
2
2
5
6
3
mcid
1
2
3
4
Female chimp
name
Carol
Alice
Sue
Jill
Ann
Barb
Pairing
mcid
1
2
1
2
3
3
4
Male chimp
name
Bob
Ted
John
Lou
84
1.3.7 Cardinalities, or the Number of
Entities on each End of a Relationship
• This example raises several points.
• Notice that some of the chimps do not
participate in a pairing at all, namely female 4.
• Some of the chimps participate in a 1-1
relationship, namely female 3 and male 4.
• Some of the chimps are in a 1-m relationship,
namely male 3 and females 5 and 6.
• Finally, males 1 and 2 and females 1 and 2 exhibit
an m-n relationship.
85
• When referring to relationships like m-n or 1m, unless other assumptions are explicitly
stated, it's customary to assume that the
relationship may involve 0 or more match-ups
on the "many" side of the relationship.
• It is also customary to assume that the
relationship may involve 0 or 1 match-ups on
the "one" side of the relationship.
86
1.3.8 The Primary Key of the Table in
the Middle, Concatenated Keys
• The second point raised by the foregoing example
has to do with the primary key of the table in the
middle.
• As given, the pair of embedded foreign keys in
the table in the middle is unique for each row.
• It is not necessary to invent a new primary key for
the table.
• When more than one field together make a
unique identifier and this is used as the primary
key, this combination is known as a concatenated
key field.
87
• Using the notation for table design given earlier,
this m-n design can be represented as shown in
the schemas on the next overhead.
• Notice that underlining is used to show that
something is part of the primary key, and at the
same time the abbreviation f.k. is used to show
that something is a foreign key.
• fcid and mcid together are the primary key field
of the table in the middle.
• At the same time, each separately is a foreign key
field:
88
• Female chimp(fcid, name)
• Pairing(fcid f.k., mcid f.k.)
• Male chimp(mcid, name)
• Or
• Female chimp(fcid p.k., name)
• Pairing(fcid p.k. f.k., mcid p.k. f.k.)
• Male chimp(mcid p.k., name)
89
1.3.9 A Separate Primary Key Field
for the Table in the Middle
• The third point raised by this example has to
do with adding a separate primary key field to
the table.
• This may be handy, and there is nothing wrong
with it.
• For example, the table in the middle might
now take this form:
90
pid
1
2
3
4
5
6
7
Pairing
fcid
1
1
2
2
5
6
3
mcid
1
2
1
2
3
3
4
91
• The design could then look like this:
•
• Pairing(pid, fcid f.k., mcid f.k.)
• Or
• Pairing(pid p.k., fcid f.k., mcid f.k.)
92
1.3.10 Recording Historical Data
• This leads to the fourth, and conceptually
most interesting point.
• In m-n relationships of this kind, the different
pairings may occur at different times, and this
may be important information to record.
• There may be different pairings of the same
partners at different times.
• It is also possible that the times of pairings of
different partners could overlap.
93
• This kind of information can be captured by
adding a date or time field to the table in the
middle, or possibly a beginning date or time
and ending date or time field.
• For example:
Pairing
fcid
mcid
beginning date
end date
1
1
1/1/01
1/6/01
1
2
2/2/02
2/7/02
1
1
3/3/03
3/8/03
...
94
• As shown above, with time or date fields
included, it is still reasonable to assume that
each row would be unique and that no
separate primary key field would be needed.
• Every field in the table is part of the key, and
the design would look like this:
•
• Pairing(fcid f.k., mcid f.k., beginning date, end
date)
95
• On the other hand, it might still be useful to
have a simple primary key, in which case the
design would look like this:
•
• Pairing(pid, fcid f.k., mcid f.k., beginning date,
end date)
96
• It is always possible that the table in the middle
would also have other fields containing useful
information concerning the match-up.
• For example, there might be a field which
contains yes or no values depending on whether
the pairing resulted in offspring.
• In that case, the table design would expand to
this:
•
• Pairing(pid, fcid f.k., mcid f.k., beginning date, end
date, offspring)
97
1.3.11 An Incorrect Design for the
Many-to-Many Relationship
• The last remark on implementation choices
for m-n relationships is the following:
• Beginning designers sometimes make the
same mistake here as with 1-1 relationships.
• They try to capture the relationship without
the table in the middle and they embed the
primary key of each of the base tables as a
foreign key in the other.
98
• Not only does this suffer from the same
general problems as when applied to 1-1
relationships;
• it is also a deadly mistake when trying to
represent m-n relationships.
• The incorrect design is shown below with the
embedded foreign key fields.
99
fcid
1
2
3
mcid
1
2
3
Female chimp
name mcid
Carol
Alice
Sue
Male chimp
name
fcid
Bob
Ted
John
100
• Suppose that the relationships among the
chimps are completely promiscuous.
• Each female is paired with each male and viceversa.
• There are 3 of each which means there are 9
possible different pairings.
101
• There are only 6 blank foreign key values
where these relationships could be recorded.
• It is simply impossible to record all of the
relationships with this faulty design.
• It is also undesirable that one concept, a
pairing between a male and a female, may be
recorded in two different ways.
102
1.3.12 A Table in a Many-to-Many
Relationship with Itself
• Recall that a one-to-many relationship
between mothers and children could be reanalyzed as a relationship among people,
which led to a one-to-many relationship
between the Person table and itself.
• Similarly, information about male and female
chimps can be stored in a single table.
• All that's required to keep track of gender is
the addition of a new field to the table.
103
• Then the relationships among chimps can be
captured as a table in the middle, which matches
one chimp record with another.
• The diagram below represents the idea as if there
were two copies of the Chimp table, labeled (1)
and (2).
• As before, keep in mind that there are not
actually two copies of the table.
• Incidentally, this design allows for same sex
relationships and relationships "with yourself".
104
Chimp
(1)
cid p.k.
name
gender
Chimp
(2)
cid1 p.k., f.k.
cid2 p.k., f.k.
beginning date
end date
cid p.k.
name
gender
105
• Whether or not such relationships are allowed
are questions which have to be documented
as part of the overall design separately from
the designs of the individual tables and their
primary key to foreign key relationships.
• If some kinds of relationships are not allowed,
such restrictions can only be enforced by data
integrity rules which are applied when data is
entered into the tables.
106
This is also a correct diagram for this
design:
Chimp
cid p.k.
name
gender
cid1 p.k., f.k.
cid2 p.k., f.k.
beginning date
end date
107
1.4 An Introduction to Data Types
108
1.4.1 What is a Data Type?
Illustrating with Text vs. Numeric
• When creating a table using SQL, it's
necessary to specify what kinds of data are
stored in the fields of the table.
• Each field can only contain one kind of data,
or data type.
• The idea of a data type can be explained using
two broad categories:
• Text data and numeric data.
109
• If a field contains non-numeric data, or if it
contains numeric digits, but it makes no sense
to apply mathematical operations to them,
then this field should be given a text data
type.
• This kind of data is sometimes called
alphanumeric because the field may contain
letters or digits.
110
• If a field in a table contains numeric digits, and
it would make sense to apply mathematical
operations to the stored values, like adding or
subtracting them, then this field should be
given a numeric data type.
111
• This section is not an exhaustive treatment of
all of the different types supported by SQL or
all of their characteristics.
• In particular, it does not cover the specific
details of any particular implementation of
SQL.
• It is intended to give a general idea about the
types.
112
• General information on what a complete
implementation of SQL should include can be
found by turning to the documentation of the
SQL standard.
• Specific information on the data types
supported by a specific implementation can
be found by turning to the documentation for
that implementation.
113
1.4.2 Other Data Types: Monetary
and Date
• In addition to numeric fields and text fields,
there are two other kinds of data type which
are useful even for simple databases.
• There is a monetary data type.
• Its values can be treated numerically, but it
has built-in formatting characteristics
associated with money.
114
• The monetary or currency data type is a
feature of MS Access.
• For example, OpenOffice Base and MySQL do
not have a currency data type.
• There is also a date data type.
• This has both numeric and textual
characteristics which make it useful for
showing dates and times.
115
1.4.3 Other Data Types and
Synonyms
• There are a fair number of different specific
data types defined in SQL.
• Not all of them will be covered here.
• In addition, there are a fair number of
synonyms for the various data types.
• The text, numeric, monetary, and date data
types are briefly introduced below, along with
their synonyms.
116
1.4.4 Text Data Types
• The text data type can be designated with the
keyword TEXT or the keyword CHAR.
• CHAR is the standard name of the type.
• TEXT is a synonym used in MS Access.
117
• When specifying a text field, it is necessary to
specify its length (also known as its width) by
putting a number in parentheses behind the
keyword.
• For example, if you want to allow up to 12
characters for a name field, the type
designation would be either TEXT(12) or
CHAR(12).
118
• There is an additional keyword, VARCHAR,
which works in a similar way.
• It is worth knowing about because it is
commonly used in place of CHAR.
• If a field is defined as TEXT(12) or CHAR(12),
then exactly 12 spaces are reserved in storage
to hold any value that might be entered.
119
• If a field is defined as VARCHAR(12), then a
value up to 12 characters long could be stored
in it.
• However, if a shorter value is stored, only that
amount of space is used.
• This difference is not apparent to the user, but
some database designers make use of
VARCHAR because of the space saving.
120
• In large databases, these considerations can be
important.
• It is also the case that although TEXT is in general
a synonym for CHAR, TEXT fields can be declared
without lengths.
• Depending on the system, they may be able to
store thousands or millions or more characters.
• The length of such a TEXT field will vary according
to the amount of data stored in it.
121
1.4.5 When is a Number Field Not a
Number?
• It is important to keep in mind what kinds of
fields might be declared as text fields.
• A person's name typically only includes letters
of the alphabet.
• A person's address might include a house
number as well as a street name—a mixture
of letters and numbers.
• Both of these fields should be text fields.
122
• An even more important example would be
things like id numbers, serial numbers, etc.
• The common names for these attributes include
the word "number", but they are not numbers.
• If there is never any purpose in applying
arithmetic operations to them, they should not
be defined as numeric fields.
• Depending on the source, serial numbers may
include letters, digits, symbols such as dashes and
so on.
123
• Social security numbers are the classic example.
• In their pure form, they are simply defined as
sequences of 9 digits (the dashes separating the 3
groups of digits are optional).
• Even though they always consist entirely of digits,
they are not numbers, and should not be defined
as numeric fields.
• Telephone numbers and zip codes also fall into
the same category.
124
1.4.6 Text Field Values Should Be
Enclosed in Quotation Marks
• There is one last consideration concerning text
fields.
• When writing SQL statements that refer to
specific text values, those values have to be in
quotation marks.
• So, for example, if I have a name field, a value in
that field would be represented as 'Fred'.
• If I had a social security number field, a value in
that field would be represented as '473608932'.
125
• Depending on the implementation of SQL,
single quotes or double quotes may be used
to enclose the values of text fields.
• It turns out that even in some systems that
support double quotes, there are situations
where only single quotes will do.
• In these notes, single quotes will be used
because they are the most likely to be correct
in all situations.
126
• It is also worth noting that you may run into
trouble when copying and pasting answers into
an SQL editor.
• The editor may not recognize curly quotes: “, ”, ‘,
’.
• If so, they should be replaced with straight
quotes: ", '.
• There are other uses for quotation marks in SQL
besides enclosing the values of fields.
• As you will find out later, one of those other uses
requires double quotes.
127
1.4.7 Numeric Data Types
• The general kind of data that has the most different
kinds of types in SQL is numeric data.
• There is a broad division between numbers that have
decimal places and those that don't, and within each
category there are several different types.
• Initially, it is possible to pass over the integral types.
• It is always possible to use a data type that allows
decimal places and simply enter whole number values
into it.
• Doing that makes life simpler because you don't have
to remember the different integral types.
128
• Among the numeric types that allow decimal
places there is a distinction based on how many
bytes are used internally to store the values.
• To the beginning user, these distinctions don't
make a difference.
• There are a large number of synonyms covering
all of the possibilities, but for practical purposes,
you can simply choose between the synonyms
NUMBER and NUMERIC.
129
• For those more familiar with the terminology
of programming languages, one of these
synonyms may be more memorable or
meaningful:
• REAL, FLOAT, SINGLE, DOUBLE.
• All of these types store values in binary form.
130
• There is also a numeric type named DECIMAL.
• This stores values in base 10.
• This kind of representation takes up more
space than the other numeric types.
• For those situations where genuine base 10
numbers are desired, this type is the correct
option.
131
• A field is declared decimal(m, n), where m
represents the total number of decimal digits and
n represents the number of digits to the right of
the decimal point.
• This data type is useful in systems like OpenOffice
Base and MySQL, which don’t have a monetary
data type.
• A declaration of decimal(m, 2), for example,
would be suitable for holding monetary values.
132
• In MS Access, on output, currency fields will
be formatted to show things like $ signs.
• Decimal fields will just be displayed as
numbers.
133
• When representing numeric values, you never
enclose them in quotes.
• If the type allows a decimal point, then one
can be used.
• If the number is a whole number, it is not
necessary to append .0 at the end.
• Groups of digits should never be separated by
commas or any other punctuation mark.
134
• No other symbol, like a $ sign, is allowed.
• Only the single decimal point is allowed.
• So, for example, a valid representation of a
numeric value might look like this:
1894364.562
135
• Here is a list of some of the integral types and
the number of bytes used to store them:
•
• TINYINT 1 byte
• SMALLINT
2 bytes
• INTEGER 4 bytes (range: -231 to 231-1)
136
• Here is a list of some of the non-integral types and the
number of bytes used to store them:
• REAL
4 bytes, floating point
• FLOAT
8 bytes, floating point
• DECIMAL 17 bytes, genuine decimal representation
without loss of precision due to binary conversion.
• When declaring a field of this type, it is possible to
specify how the digits are used.
• For example: fieldname
DECIMAL(total number
of digits, digits to the right of the decimal point).
137
1.4.8 Monetary Data Types
• The monetary data types are specialized
numeric types.
• In other words, it's possible to do arithmetic
with these kinds of fields because they really
do represent numbers, but these types also
have other characteristics.
138
• The keyword for such a field is MONEY or
CURRENCY.
• Such fields always carry 4 decimal places
internally for computation, but they always
display 2 decimal places by default.
• They also display the currency symbol, $ for
example, along with the numeric value.
139
• On the other hand, the same rules apply to
monetary types as numeric types:
• When representing or entering such data
values, the user cannot use quotation marks,
commas, or currency symbols.
• A valid value consists only of digits and up to
one decimal point.
140
• This is the specification for how many bytes
are used to hold a monetary value:
•
• MONEY 8 bytes, up to 15 digits to the left of
the decimal point, up to 4 digits to the right,
computations shown rounded to 2 digits to
the right.
141
1.4.9 Date and Time Data Types
• Both DATE and TIME data types exist.
• For the purposes of this introduction, only the
date data type will be considered.
• Date fields are hybrids between text and
numeric fields.
• They contain information that may be
represented in textual form, such as the name
of a month, reflecting their textual nature.
142
• Finding the differences between dates is also
meaningful, reflecting their numeric nature.
• The date types support special functions for
separating out components such as the names of
days or months.
• They also support many different kinds of
representations, ranging from digits and slashes
to fully spelled out names of days and months.
• Details will be covered later.
143
• Depending on the system, the synonyms for
the date types may include: DATE, TIME, and
DATETIME.
• The default representation of dates (in the
U.S.) is of this form: mm/dd/yy, where the
letters m, d, and y represent digits.
• Dashes can also be used instead of slashes.
144
• When the user is representing a date, it may be
enclosed in # signs.
• For example: #10/11/09#.
• If you are using a system other than MS Access,
like OpenOffice Base or MySQL, it will be
necessary to use single quotes instead of # signs.
• On systems that support the # sign, using it is not
a bad idea.
• This special punctuation serves as a reminder of
the two aspects of dates, both textual and
numeric.
145
1.4.10 Other Types
• In addition to the simpler data types, over
time SQL has expanded to cover kinds of data
which can't be classified as text or numbers.
• Some additional types include:
•
• BIT or BOOLEAN
• These may take up 1 byte, but they are
designed to store 1 of only 2 values, true or
false.
146
• BINARY, BLOB (binary large object), IMAGE,
OLE (object linking and embedding)
• These are essentially unlimited in size.
• They are designed to store values which are
not structured as simple database field types.
147
• For example, a binary file could contain
program code, it could contain a spreadsheet,
it could contain a JPEG, etc.
• The entire file becomes the value stored in
such a database field.
• Another way of stating this is that such fields
are a way of including non-database files in
database files.
148
1.4.11 Notation for Table Designs
Including Data Types
• Every field in a table has to have a welldefined type.
• The notation used previously to represent
table designs can be expanded to include the
types of the fields.
• A new Person table example is given on the
overhead following the next one.
149
• The table has been expanded to include fields
of various types.
• It is in a mother-child relationship with itself.
• The primary key of the table, SSN, is
embedded as the foreign key field motherSSN
in the same table
• And the fields have been given data types
150
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Person
(SSN
lastname
firstname
middle
motherSSN
dob
heightininches
weightinpounds
eyecolor
wagerate
streetaddress
city
state
zip
TEXT(9)
TEXT(12),
TEXT(12),
TEXT(1),
TEXT(9)
DATE,
NUMBER,
NUMBER,
TEXT(6)
MONEY,
TEXT(24),
TEXT(18),
TEXT(2),
TEXT(5))
primary key,
foreign key,
151
1.4.12 Primary and Foreign Key Field
Data Types
• Logically, corresponding primary and foreign
key fields represent the same thing, or mean
the same thing, and as a result, they should be
designed to have the same data type.
• In the case of text fields, you also expect them
to have the same width.
• If they appear in different tables, they may
have the same names.
152
• In the example given, they appear in the same
table.
• No two fields in the same table can have the
same name, so the foreign key field has to be
given a different name.
• The name it has been given is descriptive of
the role it is playing.
• That field contains the SSN of the mother of
the person in whose record it appears.
153
1.5 Nulls and Integrity
154
1.5.1 Definition of Null
• The term "null" refers to the idea that a
particular field in a particular record may not
have data in it.
• A relational database management system
allows this.
• Cases often arise in practice where the
information doesn't exist or isn't known.
155
1.5.2 What Null is Not
• When a database management system
supports null values in fields, it's important to
understand what this does not mean.
• It does not mean that the fields contain the
sequence of characters "null".
• It also does not mean that the field contains
invisible blanks.
• Blank spaces themselves are a form of
character.
156
• What NULL means is that there is absolutely
nothing in the field, and the database
management system is able to recognize fields
that are in that state.
157
1.5.3 The NOT NULL Constraint
• Although null values are supported in general,
it is also possible to specify that certain fields
can never be null.
• It is not possible for primary key fields to be
null
• It is also possible that for a certain database,
the designer requires that if a user enters
records into a given table, all records have to
have valid values for certain specified fields.
158
1.5.4 Data Integrity
• Data integrity is the requirement that all data
in all fields of a database be valid and correct.
• If databases didn't allow nulls and required
that every field always contained a value,
users would fill unknown fields with bogus
data.
• This would violate data integrity.
159
• One solution to the ambiguous meaning of
null values and the tendency of users to use
bogus data when nulls are not allowed is to
specify values like "not applicable" or "not
known" which are valid entries for given fields.
• This solution works for text fields.
• For numeric fields, null values are the only
option.
160
1.5.5 Entity Integrity
• The term integrity in database management
systems refers to the validity and consistency
of data entered into a database.
• The phrase "entity integrity" is the formal
expression of a requirement that was stated
informally earlier.
• Entity integrity puts the following requirement
on a correctly implemented database:
161
• Every table has to have a primary key field,
and no part of the primary key field can be
null for any record in the table.
• If all or part of a key were allowed to be null,
that would defeat the purpose that the
primary key field be the unique identifier for
every record in the table.
• To state the idea again, every key field value
has to be complete and unique.
162
1.5.6 Referential Integrity
• It is the primary key to foreign key
relationships that capture the relationships
between entities that have been separated
into different tables by the database design.
• It is critically important that the data
maintaining the relationships be valid and
consistent.
163
• The phrase "referential integrity" has the
following meaning:
• Every value that appears in a foreign key field
also has to appear as a value in the
corresponding primary key field.
• This can also be stated negatively:
• There can be no foreign key value that does
not have a corresponding primary key value.
164
• The meaning and importance of referential
integrity can be most easily explained with a
small example showing a violation of it.
• Consider these two tables:
165
mid
1
2
kid
a
b
c
Mother
name
Lily
Matilda
Child
name
Ned
Ann
June
mid
2
3
166
• Child b, Ann, is shown as having a mother with
mid equal to 3.
• There is no such mother in the Mother table.
• This is literally nonsense.
• There is no sense in which this can be correct
and this is what referential integrity forbids.
167
• The example also illustrates two other things,
which are related to "non-existent" values.
• These things are brought out in the following
two points.
168
1.5.7 Nulls in Foreign Key Fields
• In the example above, child c, June, does not
have a mother listed.
• In other words, the foreign key field is null.
• This does not violate referential integrity.
• As with null in any situation, it may mean that
the mother is not known.
169
• Nobody literally doesn't have a mother, but if
the woman table only records information on
living women, for example, then for an
orphan, the mother "wouldn't exist".
• It is unlikely that you would rename the table
"Child or Orphan"—but the idea is that the
null value is allowed and this in some sense
affects the meaning of what kinds of entities
are entered into the table.
170
1.5.8 Primary Key Entities without
Foreign Key Matches
• Also observe in the example that mother 1,
Lily, does not have any matching records in the
Child table.
• This does not violate referential integrity.
• It only suggests that the Mother table might
be better named the Woman table, because it
is reasonable to be recording information
about women and children where some
women do not have children.
171
1.5.9 Referential Integrity and Nulls
on Insertion
• Referential integrity leads to another
consideration.
• A database design with two tables in a
primary key to foreign key relationship
introduces interrelationship constraints.
• A fully-featured database management system
will enforce referential integrity constraints.
172
• Once turned on, this means that at data entry
time the primary key record has to go in first.
• Alternatively, if a foreign key record is entered
first, null has to be allowed for the value of
the foreign key field.
• In a table in a relationship with itself, the
enforcement of referential integrity and NOT
NULL on the foreign key field at the same time
will prevent data entry into the table.
173
1.5.10 Referential Integrity on
Updates and Deletions
• If a referential integrity constraint has been
created, it will also be active when the
contents of a table are to be changed by
updating values or deleting records.
• The constraint puts a restriction on what can
be entered into a foreign key field:
• The value has to exist in the primary key field.
174
• The constraint also puts restrictions on the
primary key field:
• It can't be changed if there are foreign key
records that depend on it.
• What can be done if primary key values are to
be deleted or updated?
175
• The default settings for this part of a
referential integrity constraint are summarized
in these two phrases:
• On delete, restrict; on update, cascade.
• Following this is a fuller explanation of what
these phrases mean in terms of the concrete
mother and child example:
176
• On delete, restrict:
• No mother record can be deleted if she has
corresponding child records in the other table.
• To allow the deletion of the mother would
lead to a referential integrity violation.
• To spell it out again:
• Deletion in the primary key table is restricted
if there are corresponding records in the
foreign key table.
177
• On update, cascade:
• If the primary key value of a mother record is
updated, if she has corresponding child
records in the other table, the foreign key
values in those records are automatically
updated to reflect the change.
• This problem arises less frequently because
once a primary key value is assigned to an
entity, it is rarely changed.
178
• Other referential integrity settings are also
syntactically valid.
• For example: On delete, cascade (if the
mother goes, death to the children);
• On update, restrict (no changes to the mother
once she has children).
179
The End
180