Chapter 9:
Database Systems
Computer Science: An Overview
Tenth Edition
by
J. Glenn Brookshear
Modified for UMBC’s CMSC 100, Fall 2008,
by Marie desJardins
Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley
Chapter 9: Database Systems
• 9.1 Database Fundamentals
• 9.2 The Relational Model
• 9.3 Object-Oriented Databases
• 9.4 Maintaining Database Integrity
• 9.5 Traditional File Structures
• 9.6 Data Mining
• 9.7 Social Impact of Database Technology
Database
A collection of data that is multidimensional
in the sense that internal links between its
entries make the information accessible
from a variety of perspectives
Figure 9.1 A file versus a database
organization
Figure 9.2 The conceptual layers of a
database implementation
Schemas
• Schema: A description of the structure of
an entire database, used by database
software to maintain the database
• Subschema: A description of only that
portion of the database pertinent to a
particular user’s needs, used to prevent
sensitive data from being accessed by
unauthorized personnel
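The schema/subschema distinction can be sketched in Python. This is only an illustration: the attribute names, the two user roles, and the `view` helper are assumptions made up for the example, not part of any particular DBMS.

```python
# Hypothetical schema: every attribute stored for an employee.
schema = ("EmplId", "Name", "Address", "Salary")

# Hypothetical subschemas: the attributes each kind of user may see.
subschemas = {
    "payroll": ("EmplId", "Name", "Salary"),
    "mailroom": ("EmplId", "Name", "Address"),
}


def view(record, user):
    """Return only the part of the record covered by the user's subschema."""
    return {attr: record[attr] for attr in subschemas[user]}


record = {"EmplId": "E1", "Name": "Ann", "Address": "1 Main St", "Salary": 50000}
mailroom_view = view(record, "mailroom")  # contains no Salary entry
```

Because the mailroom subschema omits Salary, code that works only through `view` never sees that attribute.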
Database Management Systems
• Database Management System (DBMS): A
software layer that manipulates a database in
response to requests from applications
• Distributed Database: A database stored on
multiple machines
– DBMS will mask this organizational detail from its
users
• Data independence: The ability to change the
organization of a database without changing the
application software that uses it
Database Models
• Database model: A conceptual view of a
database
– Relational database model
– Object-oriented database model
Relational Database Model
• Relation: A rectangular table
– Attribute: A column in the table
– Tuple: A row in the table
Figure 9.3 A relation containing
employee information
Relational Design
• Avoid multiple concepts within one relation
– Can lead to redundant data
– Deleting a tuple could also delete necessary
but unrelated information
Improving a Relational Design
• Decomposition: Dividing the columns of a
relation into two or more relations,
duplicating those columns necessary to
maintain relationships
– Lossless or nonloss decomposition: A
“correct” decomposition that does not lose any
information
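A minimal Python sketch of a lossless decomposition. The rows and attribute names are made up for illustration and are not taken from the figures. One relation that mixes employee facts with job facts is split in two, the JobId column is duplicated to preserve the relationship, and rejoining on JobId reproduces the original rows.

```python
# Hypothetical relation mixing two concepts: employees and the jobs they hold.
# Each row is (EmplId, Name, JobId, JobTitle).
original = {
    ("E1", "Ann", "J1", "Secretary"),
    ("E2", "Bob", "J1", "Secretary"),
    ("E3", "Cy", "J2", "Floor manager"),
}

# Decompose: employee facts in one relation, job facts in another.
# JobId appears in both so the relationship can be rebuilt.
employee = {(empl_id, name, job_id) for empl_id, name, job_id, _ in original}
job = {(job_id, title) for _, _, job_id, title in original}

# Rejoin the two relations on JobId.
rejoined = {
    (empl_id, name, job_id, title)
    for empl_id, name, job_id in employee
    for other_job_id, title in job
    if job_id == other_job_id
}

lossless = rejoined == original  # True: no information was lost
```

The job title "Secretary" is stored twice in the original but only once in `job`, which is the redundancy the decomposition removes.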
Figure 9.4 A relation containing
redundancy
Figure 9.5 An employee database
consisting of three relations
Figure 9.6 Finding the departments in
which employee 23Y34 has worked
Figure 9.7 A relation and a proposed
decomposition
Relational Operations
• Select: Choose rows
• Project: Choose columns
• Join: Assemble information from two or
more relations
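The three operations can be sketched in Python with a relation represented as a list of dictionaries, one per tuple. The function names, the two sample relations, and the department values are assumptions for illustration; only the employee identifier 23Y34 comes from the slides.

```python
def select(relation, predicate):
    """SELECT: keep the rows (tuples) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """PROJECT: keep the named columns and drop duplicate rows."""
    result = []
    for row in relation:
        reduced = {attr: row[attr] for attr in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def join(left, right, condition):
    """JOIN: combine every pair of rows that satisfies the condition.

    Assumes the two relations use distinct attribute names; otherwise
    the right-hand value would overwrite the left-hand one.
    """
    return [{**l, **r} for l in left for r in right if condition(l, r)]


# Hypothetical relations.
employees = [
    {"EmplId": "23Y34", "Name": "Smith"},
    {"EmplId": "25X15", "Name": "Baker"},
]
assignments = [
    {"AssignEmplId": "23Y34", "Dept": "Sales"},
    {"AssignEmplId": "23Y34", "Dept": "Personnel"},
    {"AssignEmplId": "25X15", "Dept": "Sales"},
]

# Departments in which employee 23Y34 has worked: JOIN, then SELECT, then PROJECT.
joined = join(employees, assignments,
              lambda e, a: e["EmplId"] == a["AssignEmplId"])
chosen = select(joined, lambda row: row["EmplId"] == "23Y34")
depts = project(chosen, ["Dept"])
```

Each operation takes relations in and returns a relation, which is why the three calls can be chained.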
Figure 9.8 The SELECT operation
Figure 9.9 The PROJECT operation
Figure 9.10 The JOIN operation
Figure 9.11 Another example of the
JOIN operation
Figure 9.12 An application of the
JOIN operation
Maintaining Database Integrity
• Transaction: A sequence of operations that
must all happen together
– Example: transferring money between bank accounts
• Transaction log: A non-volatile record of each
transaction’s activities, built before the
transaction is allowed to execute
– Commit point: The point at which all of a transaction’s
steps have been recorded in the log
– Roll-back: The process of undoing a transaction
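A simplified Python sketch of the bank-transfer example. It makes several assumptions: an in-memory list stands in for the non-volatile log, each step is logged just before it is applied, and the failure is simulated by a flag. The account names and the `transfer` function are hypothetical.

```python
accounts = {"A": 100, "B": 50}  # hypothetical balances
log = []                        # stands in for non-volatile storage


def transfer(src, dst, amount, fail_midway=False):
    """Move amount from src to dst; both updates happen or neither does."""
    start = len(log)  # where this transaction's log entries begin
    try:
        for account, delta in ((src, -amount), (dst, +amount)):
            if fail_midway and len(log) > start:
                raise RuntimeError("simulated failure before the commit point")
            old = accounts[account]
            log.append((account, old, old + delta))  # record the step first...
            accounts[account] = old + delta          # ...then apply it
    except RuntimeError:
        # Roll-back: restore the logged old values in reverse order.
        for account, old, _new in reversed(log[start:]):
            accounts[account] = old
        log.append(("rollback",))
        return False
    # Commit point: every step of the transaction is in the log.
    log.append(("commit",))
    return True


ok = transfer("A", "B", 30)                        # succeeds
failed = transfer("A", "B", 10, fail_midway=True)  # undone by roll-back
```

After the second call the balances are the same as after the first, because the partial withdrawal from A was undone using the old value stored in the log.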
Maintaining Database Integrity (continued)
• Simultaneous access problems
– Incorrect summary problem
– Lost update problem
• Locking = preventing others from
accessing data being used by a
transaction
– Shared lock: used when reading data
– Exclusive lock: used when altering data
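A Python sketch of the lost update problem and of an exclusive lock that prevents it. The balance and the two updates are made-up values. The first half writes out one bad interleaving by hand so the result is deterministic. The second half uses `threading.Lock`, which models only the exclusive case; the standard library has no shared (reader) lock.

```python
import threading

# Lost update, written as an explicit interleaving of two transactions.
balance = 100
seen_by_t1 = balance       # T1 reads 100
seen_by_t2 = balance       # T2 reads 100 before T1 writes
balance = seen_by_t1 + 50  # T1 deposits 50
balance = seen_by_t2 - 30  # T2 withdraws 30, overwriting T1's deposit
lost_update_result = balance  # 70 instead of the correct 120

# The same two updates, each holding an exclusive lock while it reads and writes.
account = {"balance": 100}
account_lock = threading.Lock()


def update(delta):
    with account_lock:  # no other thread can read or write until release
        current = account["balance"]
        account["balance"] = current + delta


threads = [threading.Thread(target=update, args=(d,)) for d in (50, -30)]
for t in threads:
    t.start()
for t in threads:
    t.join()
locked_result = account["balance"]  # 120 regardless of thread order
```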
Sequential Files
• Sequential file: A file whose contents can
only be read in order
– Reader must be able to detect end-of-file
(EOF)
– Data can be stored in logical records, sorted
by a key field
• Greatly increases the speed of batch updates
Figure 9.14 The structure of a simple
employee file implemented as a text file
Figure 9.15 A procedure for merging
two sequential files
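The procedure in the figure is not reproduced in this transcript. The following Python function is a sketch of a standard two-way merge under these assumptions: each "file" is an iterable of (key, data) records already sorted by key, and a sentinel object plays the role of the end-of-file mark.

```python
def merge_files(file_a, file_b):
    """Merge two key-sorted sequences of (key, data) records into one."""
    EOF = object()  # sentinel returned once a file is exhausted
    reader_a, reader_b = iter(file_a), iter(file_b)
    rec_a, rec_b = next(reader_a, EOF), next(reader_b, EOF)
    output = []

    # While both files still have records, copy the one with the smaller key.
    while rec_a is not EOF and rec_b is not EOF:
        if rec_a[0] <= rec_b[0]:
            output.append(rec_a)
            rec_a = next(reader_a, EOF)
        else:
            output.append(rec_b)
            rec_b = next(reader_b, EOF)

    # One file has reached EOF; copy whatever remains of the other.
    while rec_a is not EOF:
        output.append(rec_a)
        rec_a = next(reader_a, EOF)
    while rec_b is not EOF:
        output.append(rec_b)
        rec_b = next(reader_b, EOF)
    return output


merged = merge_files([("A", 1), ("C", 2), ("E", 3)], [("B", 4), ("D", 5)])
merged_keys = [key for key, _ in merged]  # ['A', 'B', 'C', 'D', 'E']
```

Each input is read once from front to back, which is the only access pattern a sequential file allows.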
Figure 9.16 Applying the merge algorithm
(Letters are used to represent entire records. The particular letter
indicates the value of the record’s key field.)
Indexed Files
• Index: A list of key values and the location
of their associated records
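A small Python sketch of an index, assuming an in-memory byte stream in place of a disk file and made-up employee records. The index maps each key value to the byte offset at which its record starts, so a lookup can jump directly to the record instead of reading the file in order.

```python
import io

storage = io.BytesIO()  # stands in for the data file on disk
index = {}              # key value -> byte offset of its record

# Hypothetical records, written one per line.
for key, name in [("25X15", "Baker"), ("34Y70", "Clark"), ("23Y34", "Smith")]:
    index[key] = storage.tell()  # remember where this record starts
    storage.write(f"{key},{name}\n".encode("ascii"))


def fetch(key):
    """Return the record for key, or None if the key is not in the index."""
    offset = index.get(key)
    if offset is None:
        return None
    storage.seek(offset)  # jump straight to the record
    return storage.readline().decode("ascii").rstrip("\n")


found = fetch("34Y70")    # "34Y70,Clark"
missing = fetch("00000")  # None
```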
Figure 9.17 Opening an
indexed file
Hashing
• Each record has a key field
• The storage space is divided into buckets
• A hash function computes a bucket
number for each key value
• Each record is stored in the bucket
corresponding to the hash of its key
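A Python sketch of these four points. It assumes 41 buckets and one simple choice of hash function: the key's ASCII bytes are read as a single integer and divided by the number of buckets, with the remainder used as the bucket number. The helper names and the stored record are hypothetical.

```python
NUM_BUCKETS = 41
buckets = [[] for _ in range(NUM_BUCKETS)]  # the storage space, divided into buckets


def bucket_number(key):
    """Hash function: treat the key's ASCII bytes as one integer, take the remainder."""
    return int.from_bytes(key.encode("ascii"), "big") % NUM_BUCKETS


def store(key, record):
    """Place the record in the bucket chosen by the hash of its key."""
    buckets[bucket_number(key)].append((key, record))


def retrieve(key):
    """Search only the one bucket the key hashes to."""
    for stored_key, record in buckets[bucket_number(key)]:
        if stored_key == key:
            return record
    return None


store("25X3Z", "hypothetical employee record")
where = bucket_number("25X3Z")  # 3 under this hash function
```

Retrieval recomputes the same hash, so only one bucket has to be examined no matter how many records are stored.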
Figure 9.18 Hashing the key field
value 25X3Z to one of 41 buckets
Figure 9.19 The rudiments of a
hashing system
Collisions in Hashing
• Collision: The case of two keys hashing to the
same bucket
– Clustering problem: A poorly designed hash function
can distribute keys unevenly across the buckets
– Collisions also become a problem when there are too few
buckets: their probability rises sharply as the load factor
(the percentage of buckets filled) approaches 75%
– Solution: once the load factor is somewhere between 50% and
75%, increase the number of buckets and rehash all data
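A Python sketch of the grow-and-rehash remedy. The class name, the starting size of 8 buckets, the doubling policy, and the 50% threshold are all assumptions chosen for the example. The load factor is computed as the fraction of buckets holding at least one record, and colliding records share a bucket as a list.

```python
class HashFile:
    """Bucketed storage that adds buckets and rehashes when it gets too full."""

    def __init__(self, num_buckets=8, max_load=0.5):
        self.buckets = [[] for _ in range(num_buckets)]
        self.max_load = max_load

    def load_factor(self):
        """Fraction of buckets that hold at least one record."""
        filled = sum(1 for bucket in self.buckets if bucket)
        return filled / len(self.buckets)

    def _bucket_for(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, record):
        bucket = self._bucket_for(key)
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:  # replace an existing record
                bucket[i] = (key, record)
                return
        bucket.append((key, record))
        while self.load_factor() > self.max_load:
            self._grow()

    def _grow(self):
        """Double the number of buckets and rehash every stored record."""
        old_records = [pair for bucket in self.buckets for pair in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        for key, record in old_records:
            self._bucket_for(key).append((key, record))

    def get(self, key):
        for stored_key, record in self._bucket_for(key):
            if stored_key == key:
                return record
        return None


table = HashFile()
for k in range(20):  # integer keys keep the example deterministic
    table.insert(k, k * k)
```

Every record must be rehashed after growing because a key's bucket number depends on the total number of buckets.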
Data Mining
• Data Mining: The area of computer
science that deals with discovering
patterns in collections of data
• Data warehouse: A static data collection
to be mined
– Data cube: Data presented from many
perspectives to enable mining
Data Mining Strategies
• Class description
• Class discrimination
• Cluster analysis
• Association analysis
• Outlier analysis
• Sequential pattern analysis
Social Impact of Database
Technology
• Problems
– Massive amounts of personal data are being collected
• Often without knowledge or meaningful consent of affected
people
– Data merging produces new, more invasive
information
– Errors are widely disseminated and hard to correct
• Remedies
– Existing legal remedies often difficult to apply
– Negative publicity may be more effective