Understanding Database Models
Before covering relational databases in more detail, we'll briefly cover hierarchical
and network database management systems (DBMSs). Understanding their limitations will
help you understand the relational approach and how the approach attempts to address these
limitations.
Hierarchical Databases
One of the hierarchical DBMSs still in use today is an IBM product called IMS, which
stands for Information Management System. Its paradigm is to use a tree structure and a
series of links to navigate from record type (a table) to record type. Records (single rows)
include one or more fields (columns). Each tree must have a single root, or parent record
type. The relationship between the record types is the same as the directory structure on your
computer: parent to child, continuing onto lower and lower levels. The relationship is
maintained as a DBMS pointer structure from one record to another. That pointer is valid for
only one level of connectivity, and maintaining the order of the rows is required for the
pointer to work.
As an example of this type of system, consider Figure 2-1, which illustrates a class-scheduling system at a college.
Figure 2-1. Hierarchical class scheduling: record types connected in a tree structure
This type of data management system has several challenges. One is the direct result
of the restriction of being able to link only one parent to any child record type (such as
Class in Figure 2-1). Look at the necessary duplication of Class because of its need to relate
to both Campus and Course. A hierarchical DBMS (such as IMS) assumes the world can be
viewed as a series of unrelated, strictly applied hierarchies, with one parent having many
children within a tree structure. When this doesn't work (and there are many cases when you
don't have exclusive classification), you end up with duplication of data across different tree
structures. Workarounds for this usually require creating duplicate records or tables to satisfy
each different use of the same data, which can lead to data synchronization problems, since
the same records appear in numerous places within the database.
Another challenge with this type of design is the unnecessary duplication of records
within a single record type and hierarchy. Look at the relationship of Student to Course. A
student will often enroll for many courses, and the details of that student will be stored as a
child of every course for which the student is enrolled.
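The duplication problem can be illustrated with a minimal sketch. The nested dictionaries below are only a stand-in for a hierarchical store, not how IMS works internally, and the student name, phone numbers, and the second course are hypothetical values invented for the example.

```python
# Hypothetical stand-in for a hierarchical store: every Course record owns
# its own copy of each child Student record (all values are illustrative).
courses = {
    "Creative Writing 101": {
        "students": [{"name": "Ann Lee", "phone": "555-0100"}],
    },
    "Physics 201": {
        "students": [{"name": "Ann Lee", "phone": "555-0100"}],
    },
}


def update_phone(tree, course, student, new_phone):
    """Update one student's phone under a single course only."""
    for record in tree[course]["students"]:
        if record["name"] == student:
            record["phone"] = new_phone


update_phone(courses, "Creative Writing 101", "Ann Lee", "555-0199")

# The copy stored under the other course was not touched, so the two
# copies of the same student now disagree.
phones = {name: c["students"][0]["phone"] for name, c in courses.items()}
out_of_sync = len(set(phones.values())) > 1
```

Every update has to be repeated under each parent that holds a copy, which is the synchronization risk described above.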
Information support for control systems
Lesson 2 / Student
Page 1/5
One of the major limitations of this technology is that because of the technical
difficulty inherent in setting up and navigating complex hierarchies within the physical
database, the physical design is often implemented as a series of single-level parent-child
relationships (root-child, root-child2, and so on), regardless of whether this actually
represents the business rules of the data. The technological limitations with which the
database administrators (DBAs) and programmers must work bias the implementation to the
point where the hierarchies aren't implemented in the way the designers intended.
And unfortunately, because of the linkage structures connecting the tables, you can't
skip a level of relationship to find data. So, to get from Course to Teacher, you must perform
additional input/output (I/O) operations and walk the tree down through Class. You can
imagine that for a large database with complex hierarchies this would be a very long path.
Furthermore, because these relationship paths aren't easy to change and new ones aren't easy
to add, these databases become rather inflexible once the database is created.
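As a rough sketch of the navigation cost, the code below models the Course, Class, and Teacher record types as a small tree and counts how many records must be visited to reach a teacher. The `Record` class and the `find_descendants` helper are hypothetical names for this illustration, and each visit stands in for one I/O operation.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_type: str
    name: str
    children: list = field(default_factory=list)


def find_descendants(root, target_type):
    """Walk the tree one level at a time; return (matches, records visited)."""
    matches, visited = [], 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        if node.record_type == target_type:
            matches.append(node.name)
        stack.extend(node.children)
    return matches, visited


teacher = Record("Teacher", "Isaac Asimov")
klass = Record("Class", "Creative Writing 101, 12/01/2004", [teacher])
course = Record("Course", "Creative Writing 101", [klass])

# There is no direct Course-to-Teacher link, so Class is always visited too.
teachers, visited = find_descendants(course, "Teacher")
```

Here three records are visited to find one teacher; in a deep hierarchy with many children per level, the number of visits grows accordingly.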
However, in spite of these types of challenges, hierarchical DBMSs are still regularly
used. They were popular in the 1960s and 1970s, can still deliver high-performing systems, and are
prevalent in legacy systems still in use today.
Network Databases
One of the big network databases still fairly common in data management is IDMS,
which stands for Integrated Database Management System. Network databases were a
logical extension of hierarchical databases and resolved the problem hierarchical databases
had with child records having multiple parents.
The Conference on Data Systems Languages (CODASYL) in 1971 formally
introduced the network model. Under this system, data management is based on mathematical
"set" theory. A data set consists of an owner record type, a set name, and a member record
type. A member record type basically corresponds to a data element in a row. The member
record types can belong to various owner record types, thereby allowing for more than one
parent relationship for a record. The owner record type can also be a member or an owner in
another record type.
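A minimal sketch of the owner/member idea follows. The `NetworkStore` class and the set names `COURSE-CLASS` and `CAMPUS-CLASS` are assumptions made for this illustration and are not IDMS syntax; the point is only that one member record can be connected to more than one owner.

```python
from collections import defaultdict


class NetworkStore:
    """Toy model of named sets: set name -> owner record -> member records."""

    def __init__(self):
        self.sets = defaultdict(lambda: defaultdict(list))

    def connect(self, set_name, owner, member):
        """Attach a member record to an owner record within a named set."""
        self.sets[set_name][owner].append(member)

    def owners_of(self, member):
        """Return every (set name, owner) pair the member belongs to."""
        return sorted(
            (set_name, owner)
            for set_name, owners in self.sets.items()
            for owner, members in owners.items()
            if member in members
        )


store = NetworkStore()
klass = "Creative Writing 101, 12/01/2004"
store.connect("COURSE-CLASS", "Creative Writing 101", klass)
store.connect("CAMPUS-CLASS", "Anaheim Satellite", klass)

# The single Class record has two parents and is not duplicated.
parents = store.owners_of(klass)
```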
To address data sizing issues, data elements making up a record of a network design
can be "redefined" to mean different things. This means that if a record has been defined as
containing four data elements, such as A, B, C and D, under some circumstances the fields
within the record may be redefined to actually store the elements A, E, F, and G instead. This
is a flexible construct, which reduces the number of database structures and the amount of
disk space required, but it can make locating and identifying specific data elements
challenging. Also, one data element can be defined to "occur" more than once in a
record, so that a fixed (and usually restrictive) number of repetitions of a value or group
of values can exist within a single record. The record type can be a complex
structure even when looked at as a single construct and may hide some of the complexity of
the data through the flexibility of its definition.
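The effect of a redefinition can be approximated with Python's `struct` module. This is only an analogy under stated assumptions: the 16-byte record, the two layouts, and the field widths are hypothetical, chosen so that elements B, C, and D share their storage with elements E, F, and G.

```python
import struct

# Hypothetical 16-byte record: element A (8 bytes) followed by 8 shared bytes.
LAYOUT_ABCD = struct.Struct("<8s2sHi")  # A, B (2 chars), C (uint16), D (int32)
LAYOUT_AEFG = struct.Struct("<8s4sHH")  # A, E (4 chars), F and G (uint16 each)

# Write the record using the redefined layout A, E, F, G.
raw = LAYOUT_AEFG.pack(b"REC00001", b"EAST", 7, 9)
a, e, f, g = LAYOUT_AEFG.unpack(raw)

# Reading the same bytes with the original definition still "works", but
# B, C, and D now hold fragments of E, F, and G rather than real values.
a2, b, c, d = LAYOUT_ABCD.unpack(raw)
```

Nothing in the bytes themselves says which definition applies, which is why locating and identifying a specific data element depends on knowing the circumstances under which the record was written.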
From a high level, the database design is a simple network, with link and intersection
record types (called junction records by IDMS). The flexibility of this design provides a
network of relationships represented by several parent-to-child pairings. With each parent-child pair, one record type is recognized as the owner record type, and one or more record
types are recognized as member record types.
Revisiting the college database example from Figure 2-1, you could alter a network
database design from the previous hierarchical example to allow Campus to be directly linked
to Class through a second owner/member link, as shown in Figure 2-2.
Figure 2-2. Network class scheduling: record types connected in a modified tree structure and
allowing more than one parent
Network databases still have the limitations of pointer-type connections between the
tables. You still need to step through each node of the network to connect data records. In this
example, you still need to navigate from Campus to Teacher by way of Class. And the rows
still have to be maintained in the order the pointer is expecting so that they function properly.
Relational Databases
Relational databases such as Oracle, Microsoft SQL Server, and IBM DB2 are
different from both network and hierarchical database management systems in several ways.
In a relational database, data is organized into structures called tables, and the relations
between data elements are organized into structures called constraints. A table is a collection
of records, and each record in a table contains the same data elements, or fields. Relational
databases don't generally support multiple definitions of the fields or multiple occurrences
within a single record, which is in contrast to network and hierarchical databases. The high-level properties of RDBMS tables are as follows:
• The value in a data element is single and atomic (no data replicates within a
field, and data contained in a field doesn't require any interpretation).
• Each row is unique (no wholly duplicated records should exist within a set).
• Column values are of the same kind (a field's data doesn't have multiple
definitions or permit "redefines").
• The ordinal sequence of columns in a table isn't significant.
• The ordinal sequence of rows in a table isn't significant, eliminating the
problem of maintaining record pointers.
• Each column has a unique name within its owning table.
By connecting records through matching data contained in database fields rather than
the pointer constructs (unlike network databases), and by allowing for child records to have
multiple parents (unlike hierarchical databases), this type of design builds upon the strengths
of prior database systems.
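Two of the table properties listed above can be checked with a small sketch that uses Python's built-in `sqlite3` module. The table and column names are assumptions made for the example, and SQLite stands in for any relational DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE campus (
        campus_name  TEXT PRIMARY KEY,  -- the key keeps each row unique
        phone_number TEXT NOT NULL      -- one atomic value per field
    )
""")
conn.execute("INSERT INTO campus VALUES ('Anaheim Satellite', '714-663-7853')")

# A wholly duplicated row is rejected because of the primary key.
try:
    conn.execute("INSERT INTO campus VALUES ('Anaheim Satellite', '714-663-7853')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Columns are addressed by name, so their position in the table is irrelevant.
row = conn.execute("SELECT phone_number, campus_name FROM campus").fetchone()
```

Row order is likewise not part of the data: a query that needs a particular order has to ask for it explicitly with `ORDER BY`.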
You'll see in more detail how this is achieved in a moment, but let's consider how the
example would be represented in a relational database (see Figure 2-3).
Figure 2-3. A relational class scheduling—if all the business rules are the same
It looks the same as the network database model, doesn't it? The network database
management systems allow almost the same flexibility as a relational system in terms of
allowable relationships. The power of this database paradigm lies in a different area. If you
follow the rules of normalization, you'd probably end up with a model that looks more like
Figure 2-4.
Figure 2-4. A relational class scheduling—in Third Normal Form
So, why is this model so different from Figure 2-3? This model ensures that only one
set named Course is necessary, since it can be related to other courses in the event you need
to set up a Prerequisite. In a similar fashion, teachers and students are recognized as being
part of the single set Person, allowing a person, Isaac Asimov, to be both a Student and a
Teacher of a Class. Relational design emphasizes storing information in one and only one
place. Here, a single teacher record is reused for as many classes as necessary, rather than
duplicating the teacher information for each class. We also created two new small sets, Course
Subject and School Calendar, to group courses. These sets manage certain business rules in
the database itself, replacing the code domain structure used for subjects in the network
database and the data format restriction that limited Begin Date values in Class to dates.
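The single Person set and the self-related Course set can be sketched as follows. The schema is an assumption rather than a transcription of Figure 2-4: the `class_role` table, its column names, and the course "Creative Writing 201" are hypothetical and exist only to make the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE person (person_name TEXT PRIMARY KEY);
    CREATE TABLE course (
        course_name       TEXT PRIMARY KEY,
        -- a course relates to another course as its prerequisite
        prerequisite_name TEXT REFERENCES course (course_name)
    );
    CREATE TABLE class_role (
        person_name TEXT NOT NULL REFERENCES person (person_name),
        course_name TEXT NOT NULL REFERENCES course (course_name),
        role        TEXT NOT NULL CHECK (role IN ('Teacher', 'Student')),
        PRIMARY KEY (person_name, course_name, role)
    );
    INSERT INTO person VALUES ('Isaac Asimov');
    INSERT INTO course VALUES ('Creative Writing 101', NULL);
    INSERT INTO course VALUES ('Creative Writing 201', 'Creative Writing 101');
    INSERT INTO class_role VALUES ('Isaac Asimov', 'Creative Writing 101', 'Teacher');
    INSERT INTO class_role VALUES ('Isaac Asimov', 'Creative Writing 201', 'Student');
""")

# One person row serves both roles; nothing about the person is duplicated.
roles = [r[0] for r in conn.execute(
    "SELECT role FROM class_role WHERE person_name = ? ORDER BY role",
    ("Isaac Asimov",),
)]
person_rows = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```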
The other big gain is that accessing records can also be simpler in a relational
database. Instead of using record pointers to navigate between data sets, you can reuse a
portion of one record to link it to another. That portion is usually called the primary key on its
owning table, because of its identifying nature. It becomes a foreign key on the child table.
The process of propagating the key to the child table is called migrating. You can use a
foreign key to navigate back into the design, allowing you to skip over tables. Table 2-1,
Table 2-2, Table 2-3, and Table 2-4 show some possible data values. (PK marks a primary
key column, and FK marks a foreign key column.)
Table 2-1. Teacher
Teacher Name (PK) | Rating
Isaac Asimov | Most Excellent
Table 2-2. Campus
Campus Name (PK) | Campus Phone Number
Anaheim Satellite | 714-663-7853
Table 2-3. Class
Course Name (PK) | Campus Name (PK and FK) | Begin Date (PK and FK)
Creative Writing 101 | Anaheim Satellite | 12/01/2004
Table 2-4. Class Teacher
Teacher Name (PK and FK) | Course Name (PK and FK) | Campus Name (PK and FK) | Begin Date (PK and FK)
Isaac Asimov | Creative Writing 101 | Anaheim Satellite | 12/01/2004
If you want to know the phone number of the campus where Isaac Asimov is
teaching the Creative Writing 101 class, you can use the foreign key of Campus Name to
skip over Class and link directly with Campus. We've shortened the access path to the data by
avoiding the pointer design used in network database designs.
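The shortened access path can be sketched with Python's built-in `sqlite3` module, loading the sample values from Table 2-1 through Table 2-4. The snake_case table and column names, the composite foreign key, and the use of text for dates are assumptions made to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE teacher (teacher_name TEXT PRIMARY KEY, rating TEXT);
    CREATE TABLE campus (campus_name TEXT PRIMARY KEY, campus_phone_number TEXT);
    CREATE TABLE class (
        course_name TEXT,
        campus_name TEXT REFERENCES campus (campus_name),
        begin_date  TEXT,
        PRIMARY KEY (course_name, campus_name, begin_date)
    );
    CREATE TABLE class_teacher (
        teacher_name TEXT REFERENCES teacher (teacher_name),
        course_name  TEXT,
        campus_name  TEXT,
        begin_date   TEXT,
        PRIMARY KEY (teacher_name, course_name, campus_name, begin_date),
        FOREIGN KEY (course_name, campus_name, begin_date)
            REFERENCES class (course_name, campus_name, begin_date)
    );
    INSERT INTO teacher VALUES ('Isaac Asimov', 'Most Excellent');
    INSERT INTO campus VALUES ('Anaheim Satellite', '714-663-7853');
    INSERT INTO class VALUES
        ('Creative Writing 101', 'Anaheim Satellite', '12/01/2004');
    INSERT INTO class_teacher VALUES
        ('Isaac Asimov', 'Creative Writing 101', 'Anaheim Satellite', '12/01/2004');
""")

# Campus Name migrated all the way down to class_teacher, so the join goes
# straight to campus without touching the class table.
phone = conn.execute(
    """
    SELECT c.campus_phone_number
    FROM class_teacher AS ct
    JOIN campus AS c ON c.campus_name = ct.campus_name
    WHERE ct.teacher_name = ? AND ct.course_name = ?
    """,
    ("Isaac Asimov", "Creative Writing 101"),
).fetchone()[0]
```

The query never mentions `class`: the matching Campus Name values are enough to connect the two rows.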
The design characteristics of using natural data values to relate records between data
sets, and the philosophy of building simple, understandable List of Values (LOV) sets for
reuse as foreign keys, may be the most important characteristics contributing to the success of
relational databases. Their potential for simplicity and understandability can make them
nonthreatening and an easy technology to learn.
Notice the evolutionary nature of the database systems. Over time, each database
management system built on the strengths of existing systems while attempting to address
their restrictions.
Tasks
1. Which of the database types described above can be used to store data from gauges?
2. Draw a database schema for a control system that will store the collected data. The
database should contain gauge descriptions (name, type, position, ranges) and the data
collected in different time periods.