Download Database - La Salle University

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

SQL wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Oracle Database wikipedia , lookup

IMDb wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Ingres (database) wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Concurrency control wikipedia , lookup

Functional Database Model wikipedia , lookup

Relational model wikipedia , lookup

Database wikipedia , lookup

ContactPoint wikipedia , lookup

Clusterpoint wikipedia , lookup

Database model wikipedia , lookup

Transcript
CSC 240 (Blum)
DATABASE
An introduction to database
concepts and vocabulary
1
CSC 240 (Blum)
UBIQUITY OF DATABASES





Mauchly and Eckert designed the ENIAC during World
War II to perform calculations (shell trajectories).
After the ENIAC was built, it was used to do
thermonuclear chain reaction calculations.
But when Mauchly and Eckert went into business, their
first customer was the census bureau.
And ever since computers have played an important role
in filing and record keeping.
Suffice it to say
 Databases are very important.
 Databases are all around us.
2
CSC 240 (Blum)
MAUCHLY & ECKERT AND EARLY
PROGRAMMERS OF THE ENIAC
3
CSC 240 (Blum)
WHAT DO WE WANT?

Desired Features of our database

Storage:

We want to store data efficiently, have it centralized (or at
least seemingly centralized).


Retrieval:


We want to have the data at our fingertips when we want it.
Querying:


Centralization (integration) are subject to bottlenecks and
single-point-of-failure issues.
We want to ask various questions about the data (and get
answers in a timely manner).
(These desires are to some extent in conflict.)
4
CSC 240 (Blum)
AUTOMATING THE PROCEDURE
We would like to have the computer perform
the tedious aspects of such tasks.
 An outdated approach would be to use a filebased system, that is, to have the data stored
in various (flat, simple text) files and write a
program that reads the files, parses the
information, does the required searching,
sorting, correlating, etc.

5
CSC 240 (Blum)
ENTITIES




Even in the file-based approach, one must identify
units of information that will be contained in a
single file. These are known as the entities.
An entity is somewhat similar to an object in
programming, it collects data that belongs together
in some immediate way.
Entities also separate the data into distinct units.
Database entities often reflect real objects/entities
(persons, buildings, courses, etc.)
6
CSC 240 (Blum)
FIELDS

The lower-level pieces of data gathered together to
form an entity are known as fields or attributes or
properties.



The Person entity might consist of fields like FirstName,
LastName, JobType, SocSecNum, etc.
Fields are analogous to properties of an object.
Fields have a type (Text, Number, Yes/No, Memo,
Date/Time, etc.) which indicate how the information is
to be stored and interpreted.
7
CSC 240 (Blum)
RELATIONSHIP

The various entities may be distinct, but
they are not completely disconnected.
 E.g.

a Customer places an order
An association between two entities is
known as a relationship.
 The
Customer-places-Order relationship was
realized in Access by using the Lookup Wizard
to ensure that the two tables had a common
field (CustomerID).
8
CSC 240 (Blum)
ER DIAGRAM

One can visualize the entities and their
relationship using an Entity-Relationship (ER)
diagram.
 The
entities are represented by rectangles.
 The relationships are represented by arrows
between the rectangles.
 The
arrow may include a verb to capture the nature of
the relationship (as well as other notations).
9
CSC 240 (Blum)
ER DIAGRAM EXAMPLE
Customer
CustomerID
CustomerFirstName
…
Places
Order
OrderID
ShippingCost
…
Is part of
Item
ItemID
ItemDescription
…
10
CSC 240 (Blum)
FILE-BASED SYSTEMS

In a file-based approach, there would be a
file corresponding to each entity.


(There may be more files than entities since some
relationships are realized through their own
tables/files.)
These files must be located, read, parsed.
The data is then used to initialize some
variables and/or objects which are then
analyzed (searched, sorted, etc.) by the
remainder of the program.
11
CSC 240 (Blum)
DETAILS, DETAILS

The programmer must have information about
the data files. For example:





where they are to be found
the order in which the fields occur
the length of the fields and/or the delimiter used
Changing the length of a field or adding a field
may require that all of the corresponding
programs be rewritten.
Such features of the file-based approach are
called program-data dependence.
12
CSC 240 (Blum)
AUTOMATING THE AUTOMATION

Since







Reading is reading
Parsing is parsing
Searching is searching
Sorting is sorting
Why have programmers continually repeating
these tedious tasks?
Automate and/or generalize the process.
These are some of the aspects of a database
management system (DBMS).
13
CSC 240 (Blum)
SPECIFIC INFO IN DATABASE
The generalized routines for reading, parsing,
searching, sorting etc. are in the DBMS.
 But information specific to a particular case
(number of fields, their type, size and so on) is
still required. This data is placed together with
the “actual” data in the database.

14
CSC 240 (Blum)
META-DATA

This data about the data is known as
meta-data.
 Meta:
a prefix meaning: after, along with or
beyond
The meta-data describes the actual data,
and so databases are sometimes called
self-describing.
 Related terms include: data dictionary,
system catalog and schema.

15
CSC 240 (Blum)
SELECT QUERY SHOWS DATA BUT DESCRIBE
QUERY SHOWS META-DATA
16
CSC 240 (Blum)
LAYERS
The inclusion of the meta-data (the selfdescribing aspect of a database) allows a
separation of the data from the processing,
providing program-data independence.
 Another way to say this is that there is a
separation between the database (specific)
and the database management system
(generic).

17
CSC 240 (Blum)
DATABASE/DBMS DISTINCTION
Database
Raw-data and meta-data
DBMS
User
Application
Users and applications interact with a database only through
the DBMS.
18
CSC 240 (Blum)
PROS OF DATABASE APPROACH







Control of data redundancy
Data consistency
More info from same data
Sharing of data
Improved data integrity
Improved security
Enforcement of standards






Economy of scale
Balancing of conflicting
requirements
Improved accessibility and
responsiveness
Improved maintenance
through data independence
Increased concurrency
Improved backup and
recovery services
19
CSC 240 (Blum)
PROS

Control of data redundancy and consistency





If the same data is entered more than once, it is said to
be redundant.
An obvious point is that this wastes space.
If the data is updated, it must be updated in several
places or the data will be inconsistent.
Relationships are realized through repeated data, but
one tries to use something like an ID# (a name might
change but an ID# does not have to).
(Redundancy reduction and query simplicity can be at
odds, sometimes one sacrifices redundancy in order to
make querying easier, e.g. in data mining. )
20
CSC 240 (Blum)
PROS (CONT.)

More information, sharing of data and
standardization
 Because
databases facilitate querying, they
can yield more information.
 A database approach often centralizes
(integrates) the records of different
departments, making more (raw) data and
information available to the users
 Integration often leads to standardization,
consistent naming schemes, consistent
report formats, etc.
21
CSC 240 (Blum)
PROS (CONT.)

Improved data integrity
An old computing axiom says garbage in, garbage
out (GIGO). If the raw data is bad, so too is the
resulting information.
 In the database approach, one can apply
constraints to help ensure that the data is reliable.

 Access’s
lookup table for the foreign keys is an example.
A foreign key is supposed to match an entry from another
table, the lookup table helps ensure that.

We also saw that we could “Enforce Referential Integrity.”
 We
also mentioned masks, which are another integrity
check.
22
CSC 240 (Blum)
PROS (CONT.)

Improved security
 Part
of the meta-data can be used to
authenticate users who are allowed to
access the data.
 Different
users may have different access
 Data is often not entered directly into a Table
using the DataSheet but by using Views and/or
Forms, which can hide sensitive data from certain
users.
23
CSC 240 (Blum)
PROS (CONT.)

Economy of scale



A benefit of an organization centralizing (integrating)
its record-keeping efforts is the money applied to
individual departments is pooled.
Not only is duplication of effort reduced or
eliminated, but so too is duplication of hardware and
software.
Balancing of conflicting requirements

Integration can lead to a resolution or at least a
balancing of different departments, which may have
conflicting goals.
24
CSC 240 (Blum)
PROS (CONT.)

Improved data accessibility and
responsiveness and increased productivity
Because nitty-gritty details (reading, parsing,
sorting, searching, etc.) are built into the DBMS, the
database staff work at a higher level closer to the
users, responding to their particular needs.
 Again with fewer details to attend to, more work can
be accomplished.

25
CSC 240 (Blum)
PROS (CONT.)

Improved maintenance through data
independence
 Change
of a field’s type or size or introduction of a
new field changes only the database and not the
DBMS.
 This layering yields independence which simplifies
maintenance. Changing the database does not
require changing the DBMS, which was not the
case in the file-based approach.
26
CSC 240 (Blum)
PROS (CONT.)

Increased concurrency
The DBMS can handle multi-users using and
even updating the database.
 There are built-in mechanisms to prevent two
users from changing the data in conflicting ways.


Improved backup and recovery services

Backing up and recovering the database may be
handled by the DBMS (that is, they are
integrated services) rather than externally.
27
CSC 240 (Blum)
CONS OF DATABASE APPROACH
Complexity
 Size
 Cost of DBMS
 Additional hardware costs
 Cost of conversion
 Performance
 Higher impact of failure

28
CSC 240 (Blum)
CONS

Complexity and Size



Because so many features have been integrated into
the DBMS’s, they have become complicated software
packages. One must understand these features to
utilize them properly.
Integrating information from various departments
makes the database more complicated. Good design
is crucial.
Integration of features into the DBMS and data into
the database means that both may become quite
large.
29
CSC 240 (Blum)
CONS (CONT.)

Cost OF DBMSs and the hardware
 Again
the size and complexity of the software
means that such packages are expensive.
 The larger, more complex software requires more
powerful hardware to run on.
 It also requires a knowledgeable, well-trained
(hopefully high paid) staff.
30
CSC 240 (Blum)
CONS (CONT.)

Conversion cost
 Legacy

Performance
 More

system
complexity may slow down some tasks.
Higher impact of failure.
 Integrating
(centralizing) the information can mean
that everything is lost at once.
31
CSC 240 (Blum)
THINGS IN THE DATABASE ENVIRONMENT

In addition to the data, there’s
 Hardware
that stores and manipulates
the data
 Software to
 Interface

with the hardware
(actually the operating system which
interfaces with the BIOS which interfaces
with the hardware)
 Provide
the data with structure
 Interface with the user and/or
applications
 People
32
CSC 240 (Blum)
HARDWARE

Could be:
A
single PC
 A mainframe and terminals
 A network of computers
33
CSC 240 (Blum)
A scenario
Database
Database Server
Network Server
Client
Client
Network Server
Client
Client
Network Server
Client
Client
34
CSC 240 (Blum)
CLIENT-SERVER



The client-server model is a way for transactions to take
place.
 The transaction is viewed as a service.
 The client requests the service.
 The server provides the service.
For example, to query a networked database
 A client would request the network server(s) to
connect it to the database server
 The database server queries the database
 The result is passed from database server to network
server to client.
The client-server terminology can be applied to both
software and hardware.
35
CSC 240 (Blum)
FRONT-END AND BACK-END



In large-scale client-server interaction, there may be
many intermediate client-server interactions (e.g. the
network servers become clients of the database
server).
The software and hardware near the beginning of the
transaction (initial request) is called front-end while
that near the ultimate providing of the service is
known as back-end.
In the analogy of getting a meal at a restaurant, the
waiter is front-end and the cook is back-end.
36
CSC 240 (Blum)
SOFTWARE
The bulk of the software is contained in the
database management system (DBMS). It
handles everything from storage and structure
to security and integrity.
 There may also be application software that
interfaces with the DBMS.
 The DBMS allows one to interface with the
database on a higher level.

37
CSC 240 (Blum)
PRESCRIPTIVE VS. DESCRIPTIVE

In a file-based approach, one’s program is a
step-by-step procedure explicitly determining
how a question will be answered


Read this file, parse it this way, create these objects,
search them this way
This approach is sometimes called prescriptive

Prescription originally meant a set of instructions for
preparing and/or taking a drug, only later did the
word become synonymous with the drug itself.
38
CSC 240 (Blum)
PRESCRIPTIVE VS. DESCRIPTIVE (CONT.)
In the database approach, most of the
nitty-gritty, step-by-step instructions are
hidden in the DBMS and the user need
only describe the data (the meta-data, the
self-describing database) and describe
what he or she wants from the data.
 This approach is sometimes called
descriptive.

39
CSC 240 (Blum)
LANGUAGE GENERATIONS

People talk about generations of programming languages or the level of
a language.
 A first generation language (1GL) is machine code, that is, a binary
representation of instructions (e.g. 11001101)
 A second generation language (2GL) is assembly language, that is,
mnemonics for machine code (e.g. STA 13)
 A third generation language (3GL) is a high-level language, which
includes most compiled languages, such as Fortran, C, BASIC, Java,
etc. (e.g. int a = 13)
 A fourth generation language (4GL) is used to develop database
applications. They are designed to be closer to natural language.
40
CSC 240 (Blum)
SQL
SQL (Structured Query Language),
pronounced S-Q-L or See-Quel, has
become the standard language for
relational databases.
 SQL is part third generation and part
fourth generation.

41
CSC 240 (Blum)
SQL IS THE SEQUEL TO SEQUEL
The original version was called SEQUEL and
was developed at IBM in the mid-70’s.
 However, Oracle Corporation was the first
company to use SQL in a commercial product in
1979.

42
CSC 240 (Blum)
WHAT’S IT MADE OF?

SQL has 3 components:
 Data
Definition Language (DDL)
 The
part that allows you to establish the structure
of the database
 Data
Manipulation Language (DML)
 The
part that allows you to enter data, update
data and ask questions of the data (queries)
 Data
Control Language (DCL)
 The
part that allows you to add security features
(e.g. user authentication), concurrency (multiuser) features, recovery features, etc.
43
CSC 240 (Blum)
THE PEOPLE: WHO’S INVOLVED WITH THIS
DATABASE ANYWAY?

Data Administrator (DA):
 Oversees
data resources.
 More of a hands-off role.
 Deals with other managers.
 Sets policies.
 Handles budgets.
 Plans for future.
44
CSC 240 (Blum)
WHO’S INVOLVED WITH THIS DATABASE
ANYWAY? (CONT.)

Database administrator (DBA):
 More
hands-on and more technical than the Data
Administrator (DA)
 Oversees hardware and software design,
implementation and maintenance
 Responsible for security and integrity
 Ensures users have appropriate accessibility
 Etc.
45
CSC 240 (Blum)
WHO’S INVOLVED WITH THIS DATABASE
ANYWAY? (CONT.)

Database Designer:

Logical database designer:
Identifies entities, fields, relationships
 Applies high-level constraints including the business rules



E.g. A Simpsons database might have a business rule that there must
be between 10 and 30 episodes in a complete season
Physical database designer:
Actually creates tables
 Implements constraints
 Introduces security measures
 Etc.

46
CSC 240 (Blum)
WHO’S INVOLVED WITH THIS DATABASE
ANYWAY? (CONT.)

Application Developer:
 After
the overall structure of the database is laid
out and implemented, the application developer
considers the more individual needs, such as what
software does the payroll department need
 Application may involve third-generation or fourth
generation languages or a combination
 E.g.
a Visual Basic program could use an SQL statement
to query a database
47
CSC 240 (Blum)
WHO’S INVOLVED WITH THIS DATABASE
ANYWAY? (CONT.)

End-Users:
 Naïve:
 Has
little to no database knowledge
 Uses applications that simplify interaction with the
database

Cashier scanning an item’s barcode
 Sophisticated:
 Knows
something to a lot about databases
 May use SQL to update or query database
48
CSC 240 (Blum)
REFERENCES
Database Systems Rob and Coronel
 Database Systems, Connolly and Begg
 SQL for Dummies, Taylor
 http://www.metacard.com/wp1a.html



http://www.oracle.com/glossary/index.html?axx.ht
ml
Concepts of Database Management, Pratt
and Adamski
49