Chapter 1: Introduction
What is a database?
shared file containing integrated data
with controlled redundancy
Often implemented as a group of related
tables (examples on pages 4-5)
Relationships between tables often
implemented as other tables.
Making Distinctions
A collection of tables as in the previous
Database management system (DBMS):
Database plus tools to process requests,
enforce integrity constraints, provide
security, analyze usage, optimize access,
Database application.
Software that accesses data from a database.
2-dimensional table having the
following properties
Table entries are single valued or
This means that an entry does not
consist of a structure more complex
than strings, dates, etc.
For example, a column type cannot be
another table or relation.
entries in any one column are of the
same type
Each column, also called an attribute, has a distinct name.
order of columns & rows is unimportant
no two rows are identical
NOTE: The purist may disagree with
this definition, as he or she will see a
table as an implementation of a relation.
A relation is more of an abstract
concept but is usually implemented
using a table.
Some do use the terms interchangeably.
reduce redundancy and
e.g. most student information from the
tables on page 5 is not replicated.
One fact is in one place.
Data is shared
Security is applied centrally
language independent (COBOL, VB,
C++, Java, C#, etc.)
Multiple applications
Easier to maintain integrity
data independence
allow data access without knowledge of
its internal organization and structure
Design: How many tables?
This is an important decision,
based on a set of rules
known as normalization, which
we will cover later.
However, the figure on page 18
illustrates a simple example
related to an important issue.
SQL: Structured Query Language
Used to extract information from one
or more tables.
Can specify what you want without
specifying how to get it.
Example on p. 9
Can be standalone or embedded in
application software.
General format
Select stuff
From one or more tables
Where conditions
Ranges from nearly trivial to fairly
complex logic
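As a minimal sketch of the Select / From / Where pattern, the following uses Python's built-in sqlite3 module with an in-memory database. The Student table, its columns, and its rows are hypothetical and made up for illustration; they are not the tables from the text or the course databases.

```python
import sqlite3

# In-memory database; the Student table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (id INTEGER PRIMARY KEY, name TEXT, major TEXT, gpa REAL)"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [(1, "Ann", "CS", 3.6), (2, "Bob", "Math", 2.9), (3, "Cy", "CS", 3.1)],
)

# Select stuff / From one or more tables / Where conditions.
# The query states WHAT is wanted (CS majors with a GPA above 3.0),
# not HOW the DBMS should find the rows.
rows = conn.execute(
    "SELECT name, gpa FROM Student WHERE major = ? AND gpa > ? ORDER BY name",
    ("CS", 3.0),
).fetchall()

for name, gpa in rows:
    print(name, gpa)
```

Here the SQL is embedded in application software; the same statement could also be typed directly into a standalone query tool.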
Some Definitions
metadata (Sysfiles) - also data
Description of all tables in the database.
Example on p. 12.
More of a concern for a Database
Administrator (DBA).
Know what it is, but we will not focus on it.
Client/Server Environment
Database is stored on a server
Application software often runs on a
client using languages such as C#
and others.
Typically written by application
Stored Procedure.
Procedures stored on a server.
Typically written by DB people.
Can be invoked by client applications.
Can be used for common activities used
by multiple users.
Some DBAs may limit database access
except through certain stored procedures.
It gives them more control.
Trigger.
A special type of procedure that is
invoked automatically when a certain
action occurs.
Can be used to make sure needed data
elements are updated due to user
Example: A student adds a class and a
trigger is activated to update tuition &
The app that adds the student does not
know about the trigger, but the DBMS does.
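The student-adds-a-class example can be sketched with a trigger in SQLite, driven from Python's built-in sqlite3 module. This is an illustration under assumptions: the table names, the trigger name, and the flat 500-per-class tuition rule are all hypothetical, and SQL Server uses a different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (id INTEGER PRIMARY KEY, name TEXT, tuition REAL);
    CREATE TABLE Enrollment (student_id INTEGER, class_code TEXT);

    -- Hypothetical rule: each added class raises tuition by a flat 500.
    CREATE TRIGGER add_class_tuition AFTER INSERT ON Enrollment
    BEGIN
        UPDATE Student SET tuition = tuition + 500 WHERE id = NEW.student_id;
    END;

    INSERT INTO Student VALUES (1, 'Ann', 0);
""")

# The "application" only adds the class; it never mentions tuition.
conn.execute("INSERT INTO Enrollment VALUES (?, ?)", (1, "CS451"))

# The DBMS ran the trigger on its own as part of the insert.
tuition = conn.execute("SELECT tuition FROM Student WHERE id = 1").fetchone()[0]
print(tuition)
```

The insert statement contains nothing about tuition, which is the point: the update rule lives in the database, so every application that adds a class gets it.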
Building a database:
Some terms:
Entity-Relationship (E-R) Diagrams.
Entities are somewhat like the classes
you’ve designed in previous courses.
Relationships define how entities are
related to one another.
Together they must reflect the reality as
it is understood.
Design phase:
Design entities, relationships, and
constraints consistent with perceived reality.
Test phase:
Create tables, stored procedures,
triggers, forms, reports, etc. consistent
with the E-R diagram and test.
Implementation phase:
Put into production
Early database models:
Hierarchical model
IMS (Information Management System)
Developed largely by IBM
Required all data be organized as a hierarchy.
Tended to be awkward since not all
realities are hierarchical in nature
Network model (CODASYL: Conference on Data Systems Languages)
Data organized into complex graph
(network) data structures.
Application programs reflected the actual
data structures.
Changes in design potentially affect ALL
applications – costly!!
Relational Model (dominant form
Object Model (not a commercial
See table on p. 21 for a general
history – also the prose on p. 23.
We will NOT cover web-based
databases/services since we already
have a course for that.
Relational Model
Edgar Codd’s landmark 1970 paper in CACM
(Communications of the Association for
Computing Machinery): “A Relational Model
of Data for Large Shared Data Banks”.
Codd, a mathematician working for IBM in
San Jose CA, applied concepts of
relational algebra to the problem of a
“stored data bank”.
Paved the way for the development of the
relational database.
Mapping objects to relational
E.g. how does an object oriented
program access non-object-oriented
data in a relational database?
We will see how this works when we
discuss ADO.NET
Data Structures for databases:
Appendix D
We will not focus on complex data
structures but there are a few things you
must be aware of.
This appendix is online in a zipped file at
Contains concentric magnetic tracks
on each surface.
Each track is divided into sectors.
Disk head moves radially inward and
outward while the disk rotates.
Disks are SLOW and a potential bottleneck
Need to minimize disk head movement
(seek time) for optimal performance.
Rotational delays (time for sector to rotate
past the head) also a factor
File Organizations
Linked List:
Database records, disk sectors, or
clusters of sectors are maintained in a
linear linked list.
Simple but can be very slow, especially
for finding a record based on a key or
index value.
i.e. find an employee record given the
employee’s ID.
See pages D-3 and D-4 of the appendix.
list of field values that identify records
along with the location of that record.
List can be linear or some other
e.g. a textbook often has a linear index
at the end
Searching the index is a lot faster than
searching through all of the content.
However, for many millions of records a
linear index can still be inefficient.
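The idea of an index as a list of key values paired with record locations can be sketched in Python. The employee records are hypothetical, and an in-memory list stands in for the file on disk, with a record's position in the list playing the role of its location.

```python
# Hypothetical "file" of employee records; a record's location is its
# position in this list.
records = [
    {"emp_id": 417, "name": "Ann"},
    {"emp_id": 102, "name": "Bob"},
    {"emp_id": 305, "name": "Cy"},
]

# Linear index: (key value, location) pairs kept in key order.
index = sorted((rec["emp_id"], loc) for loc, rec in enumerate(records))


def find_by_scan(emp_id):
    """Search through all of the content, record by record."""
    for rec in records:
        if rec["emp_id"] == emp_id:
            return rec
    return None


def find_by_index(emp_id):
    """Search only the small index entries, then go straight to the record."""
    for key, loc in index:
        if key == emp_id:
            return records[loc]
        if key > emp_id:  # index is in key order, so we can stop early
            break
    return None


print(find_by_index(305))
```

Both functions still walk a list from the front; the index wins only because its entries are much smaller than full records. With many millions of entries, that linear walk is itself the problem.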
hierarchical arrangement of index
Provides quick access & order to data.
See pages D-5 and D-6 of the appendix.
Typically each level would correspond to
a sector or cluster.
Might have millions of records
accessible via only a few index layers.
Hash function
Index value (sometimes called a key) fed
into a hash function which specifies
where to store the entire record.
To locate a record, given its key, apply
the hash function to the key value and
the location is calculated.
R is a record.
R.k is the value of R’s key field.
H is a hash function that calculates a location.
H(R.k) is the location in the hash table where R is stored.
Time to find a record can be independent
of the number of records.
Assumes a good hash function and
sufficient space.
Each is nontrivial and is a subject of a
course in data structures or algorithms.
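A minimal sketch of hashed record storage follows. The table size, the modulo hash function, and the use of small buckets to hold records whose keys collide are all assumptions chosen for illustration; the notes do not say how collisions are handled.

```python
NUM_SLOTS = 7  # hypothetical table size


def H(key):
    """Toy hash function: map an integer key to a slot number."""
    return key % NUM_SLOTS


# Each slot holds a small list (bucket) so that records whose keys
# hash to the same slot can coexist.
table = [[] for _ in range(NUM_SLOTS)]


def store(record):
    """Place the record in the slot computed from its key field k."""
    table[H(record["k"])].append(record)


def locate(key):
    """Compute the slot from the key, then look only inside that slot."""
    for record in table[H(key)]:
        if record["k"] == key:
            return record
    return None


store({"k": 417, "name": "Ann"})
store({"k": 102, "name": "Bob"})  # collides with 417: both hash to slot 4
store({"k": 300, "name": "Cy"})

print(locate(102))
```

Lookup cost depends on how many records share a slot, not on the total number of records, which is why a good hash function and sufficient space matter.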
1 entry for each record
Useful if records are stored in random order
1 index entry for a group of records (say 1
per page)
Useful if records are maintained in order
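The one-entry-per-page arrangement (often called a sparse index) can be sketched as follows. The page contents and the page size are hypothetical, and Python's standard bisect module stands in for the index search.

```python
from bisect import bisect_right

# Records kept in key order and grouped into pages (hypothetical data).
pages = [
    [(101, "Ann"), (105, "Bob"), (110, "Cy")],
    [(120, "Di"), (125, "Ed"), (131, "Flo")],
    [(140, "Gus"), (150, "Hal")],
]

# One index entry per page: the first (lowest) key on that page.
sparse_index = [page[0][0] for page in pages]


def find(key):
    """Use the index to pick a page, then search only that page."""
    # The right page is the last one whose first key is <= key.
    page_no = bisect_right(sparse_index, key) - 1
    if page_no < 0:
        return None  # key is smaller than every key in the file
    for k, name in pages[page_no]:
        if k == key:
            return name
    return None


print(find(125))
```

This works only because the records are maintained in key order; with records in random order, the first key of a page says nothing about the rest of the page, so every record needs its own index entry.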
3 Levels (views) of a database.
physical storage
conceptual, sometimes the DBA view
a collection of Base Tables
Base table is a table with a direct
underlying storage structure.
Created from the E-R diagram
described by a data dictionary or
External, sometimes user view
Collection of tables defined for a
particular user using SQL.
They are called logical tables, virtual
tables, derived tables, or just views.
The data in a view is presented to the
user as a single table though it is
actually derived from one or more base
tables specified in its definition.
These logical tables do not exist in the
same sense as a base table – there is
NO direct underlying storage structure.
A view can simplify the user’s view of
the database and provide security by
hiding certain parts of a base table.
Example: a table consisting of students with
a given major or GPA value.
Example: the single table in Fig 1-20 (page 18)
could be a view derived by joining the
two other base tables in the same figure.
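A view of students with a given major can be sketched in SQLite through Python's built-in sqlite3 module. The Student base table, its rows, and the view name CSStudents are hypothetical and are not the tables from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Base table: has its own underlying storage.
    CREATE TABLE Student (id INTEGER PRIMARY KEY, name TEXT, major TEXT, gpa REAL);
    INSERT INTO Student VALUES (1, 'Ann', 'CS', 3.6);
    INSERT INTO Student VALUES (2, 'Bob', 'Math', 2.9);
    INSERT INTO Student VALUES (3, 'Cy', 'CS', 3.1);

    -- View: only the definition is stored, not a copy of the rows.
    -- It hides the gpa column and every non-CS student.
    CREATE VIEW CSStudents AS
        SELECT id, name FROM Student WHERE major = 'CS';
""")

before = conn.execute("SELECT name FROM CSStudents ORDER BY id").fetchall()

# A change to the base table shows up in the view automatically.
conn.execute("INSERT INTO Student VALUES (4, 'Di', 'CS', 3.9)")
after = conn.execute("SELECT name FROM CSStudents ORDER BY id").fetchall()

print(before, after)
```

The user queries CSStudents as if it were a single table, while the DBMS derives its rows from the base table each time.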
DBA (Database Administrator)
defines conceptual schema, internal
schema, user liaison, security and
integrity, backup/recovery, performance
Microsoft SQL Server 2008
Accessing SQL Server:
Start All Programs  Microsoft SQL
Server 2008 R2  SQL Server
Management Studio.
You may see a window indicating that
SQL Server Management Studio is
configuring for first-time use.
Just be patient and wait.
In the Connect to Server window, select
Database Engine for the Server Type
ICSD for the server name
Windows Authentication for Authentication
(These should all be defaults).
Then press the connect button.
In the Object Explorer pane (left side of
screen), expand the Databases folder.
If you don’t see an Object Explorer Pane,
select it from the View menu.
There are four databases that start with
“CS451”. You should have read-only access
to each one. I will use these during the
To see the tables in one of them, expand
the database folder and the subsequent
Tables folder that appear. Right click on
one of the table names and select Select
Top 1000 Rows.
Test this and let me know of any access problems.