Physical Database Design
University of California, Berkeley
School of Information Management and Systems
SIMS 257: Database Management
9/26/2000
Review
• Normalization
• Denormalization
Normalization
• First normal form: functional dependency of nonkey attributes on the primary key; atomic values only
• Second normal form: full functional dependency of nonkey attributes on the primary key
• Third normal form: no transitive dependency between nonkey attributes
• Boyce-Codd and higher: all determinants are candidate keys; single multivalued dependency
Normalization
• Normalization is performed to reduce or
eliminate Insertion, Deletion or Update
anomalies.
• However, a completely normalized database
may not be the most efficient or effective
implementation.
• “Denormalization” is sometimes used to
improve efficiency.
Today
• Physical Database Design
• Access Methods
• Indexes
Based on McFadden, Modern Database Management,
and Atre, Database: Structured Techniques for Design,
Performance and Management
Database Design Process
[Figure: boxes for the Conceptual requirements of Applications 1-4, the Conceptual Model, the Logical Model, and the Internal Model, an External Model for each of Applications 1-4, and a Physical Design label.]
Physical Database Design
• Many physical database design decisions
are implicit in the technology adopted
– Also, organizations may have standards or an
“information architecture” that specifies
operating systems, DBMS, and data access
languages -- thus constraining the range of
possible physical implementations.
• We will be concerned with some of the
possible physical implementation issues
Physical Database Design
• The primary goal of physical database
design is data processing efficiency
• We will concentrate on choices often
available to optimize performance of
database services
• Physical Database Design requires
information gathered during earlier stages
of the design process
Physical Design Information
• Information needed for physical file and
database design includes:
– Normalized relations plus size estimates for them
– Definitions of each attribute
– Descriptions of where and when data are used
• entered, retrieved, deleted, updated, and how often
– Expectations and requirements for response time,
and data security, backup, recovery, retention and
integrity
– Descriptions of the technologies used to
implement the database
Physical Design Decisions
• There are several critical decisions that will
affect the integrity and performance of the
system.
– Storage Format
– Physical record composition
– Data arrangement
– Indexes
– Query optimization and performance tuning
Storage Format
• Choosing the storage format of each field
(attribute). The DBMS provides some set of
data types that can be used for the physical
storage of fields in the database
• Data Type (format) is chosen to minimize
storage space and maximize data integrity
Objectives of data type selection
• Minimize storage space
• Represent all possible values
• Improve data integrity
• Support all data manipulations
• The correct data type should, in minimal space,
represent every possible value (but eliminate
illegal values) for the associated attribute and
support the required data manipulations (e.g.
numerical or string operations)
Access Data Types
• Numeric (1, 2, 4, 8 bytes, fixed or float)
• Text (255 max)
• Memo (64000 max)
• Date/Time (8 bytes)
• Currency (8 bytes, 15 digits + 4 digits decimal)
• AutoNumber (4 bytes)
• Yes/No (1 bit)
• OLE (limited only by disk space)
• Hyperlinks (up to 64000 chars)
Access Numeric types
• Byte
– Stores numbers from 0 to 255 (no fractions). 1 byte
• Integer
– Stores numbers from –32,768 to 32,767 (no fractions). 2 bytes
• Long Integer (Default)
– Stores numbers from –2,147,483,648 to 2,147,483,647 (no fractions). 4 bytes
• Single
– Stores numbers from –3.402823E38 to –1.401298E–45 for negative values and from 1.401298E–45 to 3.402823E38 for positive values. 4 bytes
• Double
– Stores numbers from –1.79769313486231E308 to –4.94065645841247E–324 for negative values and from 4.94065645841247E–324 to 1.79769313486231E308 for positive values. Decimal precision 15. 8 bytes
• Replication ID
– Globally unique identifier (GUID). Decimal precision N/A. 16 bytes
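As a rough illustration of the "minimal space, every legal value" objective, the sketch below picks the smallest whole-number field size for a range of values, using the Byte, Integer, and Long Integer ranges listed above. The table layout and the function name `smallest_integer_type` are hypothetical, not part of Access.

```python
# Whole-number field sizes and ranges as listed on the slide.
# The tuple layout (name, minimum, maximum, bytes) is an illustrative assumption.
INTEGER_TYPES = [
    ("Byte", 0, 255, 1),
    ("Integer", -32_768, 32_767, 2),
    ("Long Integer", -2_147_483_648, 2_147_483_647, 4),
]

def smallest_integer_type(low, high):
    """Return (name, bytes) of the smallest type that covers low..high."""
    for name, type_min, type_max, size in INTEGER_TYPES:
        if type_min <= low and high <= type_max:
            return name, size
    raise ValueError(f"no whole-number type covers {low}..{high}")

print(smallest_integer_type(0, 120))        # e.g. an age in years
print(smallest_integer_type(-500, 20_000))  # e.g. a signed quantity
print(smallest_integer_type(0, 1_000_000))  # e.g. a large counter
```

Choosing Byte for an age field also rejects values such as 300 or -1, which is the integrity side of the same decision.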
Controlling Data Integrity
• Default values
• Range control
• Null value control
• Referential integrity
• Handling missing data
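A minimal sketch of how the first four controls might be declared, using SQLite through Python's `sqlite3` module rather than Access. The table and column names (`department`, `employee`, `status`, `age`) are invented for illustration, and the helper `rejected` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,                                   -- null value control
        status  TEXT NOT NULL DEFAULT 'active',                  -- default value
        age     INTEGER CHECK (age BETWEEN 16 AND 99),           -- range control
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)  -- referential integrity
    )
""")
conn.execute("INSERT INTO department VALUES (1, 'Library')")
conn.execute("INSERT INTO employee (emp_id, name, age, dept_id) VALUES (1, 'Adams', 30, 1)")

def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# The status column was omitted on insert, so the default applies.
default_status = conn.execute(
    "SELECT status FROM employee WHERE emp_id = 1").fetchone()[0]
print(default_status)
print(rejected("INSERT INTO employee (emp_id, name, age, dept_id) VALUES (2, 'Becker', 150, 1)"))
print(rejected("INSERT INTO employee (emp_id, name, age, dept_id) VALUES (3, 'Getta', 40, 99)"))
```

Declaring the rules in the schema means the DBMS applies them to every program that writes to the table, rather than relying on each application to check.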
Designing Physical Records
• A physical record is a group of fields stored
in adjacent memory locations and retrieved
together as a unit
• Fixed-length and variable-length fields
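One way to picture a fixed-length physical record is as a packed byte string whose position in the file can be computed from its record number. The layout below (a 4-byte id, a 20-byte name, an 8-byte balance) is an assumption made up for this sketch.

```python
import struct

# Hypothetical record layout: 4-byte integer id, 20-byte name, 8-byte float balance.
# "<" selects little-endian with no padding, so every record has the same size.
RECORD = struct.Struct("<i20sd")

def pack_record(rec_id, name, balance):
    # struct pads the name with NUL bytes (or truncates it) to exactly 20 bytes.
    return RECORD.pack(rec_id, name.encode("ascii"), balance)

def read_record(file_bytes, n):
    """Fetch the n-th record (0-based) by computing its byte offset."""
    offset = n * RECORD.size
    rec_id, raw_name, balance = RECORD.unpack_from(file_bytes, offset)
    return rec_id, raw_name.rstrip(b"\x00").decode("ascii"), balance

data = b"".join([
    pack_record(1, "Adams", 10.5),
    pack_record(2, "Becker", 0.0),
    pack_record(3, "Dumpling", 99.25),
])
print(RECORD.size)
print(read_record(data, 2))
```

With variable-length fields the offset of record n can no longer be computed by multiplication; the file needs length prefixes, delimiters, or an index to locate each record.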
Designing Physical Files/Internal Model
• Overview
• Terminology
• Access methods
Physical Design
• Internal Model/Physical Model
[Figure: layered diagram. User request → Interface 1 → DBMS (External Model; Internal Model Access Methods) → Interface 2 → Operating System Access Methods → Interface 3 → Data Base]
Physical Design
• Interface 1: User request to the DBMS. The user
presents a query, the DBMS determines which
physical DBs are needed to resolve the query
• Interface 2: The DBMS uses an internal model
access method to access the data stored in a logical
database.
• Interface 3: The internal model access methods
and OS access methods access the physical
records of the database.
Physical File Design
• A Physical file is a portion of secondary
storage (disk space) allocated for the
purpose of storing physical records
• Pointers - a field of data that can be used to
locate a related field or record of data
• Access Methods - An operating system
algorithm for storing and locating data in
secondary storage
• Pages - The amount of data read or written
in one disk input or output operation
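Because a page is the unit of disk I/O, the cost of scanning a file is better counted in pages than in records. The sketch below works that out under assumed figures (32-byte records, 4,096-byte pages, records never split across pages); the function name `pages_needed` is hypothetical.

```python
import math

def pages_needed(num_records, record_bytes, page_bytes):
    """Pages required when records are not split across pages."""
    records_per_page = page_bytes // record_bytes   # often called the blocking factor
    if records_per_page == 0:
        raise ValueError("record does not fit in one page")
    return math.ceil(num_records / records_per_page)

# Assumed figures: 10,000 records of 32 bytes on 4,096-byte pages.
full_scan_reads = pages_needed(10_000, 32, 4096)
print(full_scan_reads)
```

Under these assumptions a full scan costs 79 page reads, since 128 records fit on each page.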
Internal Model Access Methods
• Many types of access methods:
– Physical Sequential
– Indexed Sequential
– Indexed Random
– Inverted
– Direct
– Hashed
• Differences in
– Access Efficiency
– Storage Efficiency
Physical Sequential
• Key values of the physical records are in
logical sequence
• Main use is for “dump” and “restore”
• Access method may be used for storage as
well as retrieval
• Storage Efficiency is near 100%
• Access Efficiency is poor (unless fixed size
physical records)
Indexed Sequential
• Key values of the physical records are in logical
sequence
• Access method may be used for storage and
retrieval
• Index of key values is maintained with entries for
the highest key values per block(s)
• Access Efficiency depends on the levels of index,
storage allocated for index, number of database
records, and amount of overflow
• Storage Efficiency depends on size of index and
volatility of database
Indexed Sequential
Index (Actual Value → Address Block Number):
– Dumpling → 1
– Harty → 2
– Texaci → 3
– … → …
Data File:
– Block 1: Adams, Becker, Dumpling
– Block 2: Getta, Harty
– Block 3: Mobile, Sunoci, Texaci
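The lookup implied by this example can be sketched as a binary search over the highest key of each block, followed by a search inside the one block selected. This is a simplified illustration that ignores overflow areas; the names `INDEX_KEYS`, `INDEX_BLOCKS`, and `lookup` are hypothetical.

```python
from bisect import bisect_left

# Data file from the slide: each block holds records in key sequence.
BLOCKS = {
    1: ["Adams", "Becker", "Dumpling"],
    2: ["Getta", "Harty"],
    3: ["Mobile", "Sunoci", "Texaci"],
}
# Index: the highest key in each block, paired with that block's number.
INDEX_KEYS = ["Dumpling", "Harty", "Texaci"]
INDEX_BLOCKS = [1, 2, 3]

def lookup(key):
    """Return the block number holding key, or None if it is absent."""
    pos = bisect_left(INDEX_KEYS, key)   # first index entry whose key is >= key
    if pos == len(INDEX_KEYS):
        return None                      # key is beyond the last block
    block_no = INDEX_BLOCKS[pos]
    return block_no if key in BLOCKS[block_no] else None

print(lookup("Getta"))
print(lookup("Texaci"))
print(lookup("Carter"))
```

Only one data block is read per lookup, and because the blocks themselves are in key sequence the same file still supports sequential retrieval.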
Indexed Sequential: Two Levels
Top-level index (Key Value → Address): 385 → 7, 678 → 8, 805 → 9, …
Second-level index blocks (Key Value → Address):
– Block 7: 150 → 1, 385 → 2
– Block 8: 536 → 3, 678 → 4
– Block 9: 785 → 5, 805 → 6
Data blocks:
– Block 1: 001, 003, …, 150
– Block 2: 251, …, 385
– Block 3: 455, 480, …, 536
– Block 4: 605, 610, …, 678
– Block 5: 705, 710, …, 785
– Block 6: 791, …, 805
Indexed Random
• Key values of the physical records are not
necessarily in logical sequence
• Index may be stored and accessed with Indexed
Sequential Access Method
• The index has an entry for every database record,
with the index keys in ascending logical sequence.
The database records themselves are not
necessarily in ascending sequence.
• Access method may be used for storage and
retrieval
Indexed Random
Index (Actual Value → Address Block Number):
– Adams → 2
– Becker → 1
– Dumpling → 3
– Getta → 2
– Harty → 1
Data File:
– Block 1: Becker, Harty
– Block 2: Adams, Getta
– Block 3: Dumpling
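A small sketch of the same example: the data blocks stay in arrival order, and a sorted index with one entry per record supplies the key order. The function names `build_dense_index` and `in_key_order` are hypothetical.

```python
# Data file from the slide: records are stored in arrival order, not key order.
BLOCKS = {
    1: ["Becker", "Harty"],
    2: ["Adams", "Getta"],
    3: ["Dumpling"],
}

def build_dense_index(blocks):
    """Return (key, block number) pairs, one per record, sorted by key."""
    entries = [(key, block_no) for block_no, keys in blocks.items() for key in keys]
    return sorted(entries)

INDEX = build_dense_index(BLOCKS)

def in_key_order(index):
    """Blocks visited when records are retrieved in key sequence via the index."""
    return [block_no for _, block_no in index]

print(INDEX)
print(in_key_order(INDEX))
```

Reading in key order visits the blocks as 2, 1, 3, 2, 1, which shows why sequential retrieval through this kind of index costs more block reads than it does with indexed sequential storage.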
B-tree
[Figure: B-tree index. Root node: F | P | Z. Second-level nodes: B | D | F; H | L | P; R | S | Z. Data records shown: Aces, Boilers, Cars, Devils, Flyers, Hawkeyes, Hoosiers, Minors, Panthers, Seminoles.]
Inverted
• Key values of the physical records are not
necessarily in logical sequence
• Access Method is better used for retrieval
• An index for every field to be inverted may
be built
• Access efficiency depends on number of
database records, levels of index, and
storage allocated for index
Inverted
Index (Actual Value → Address Block Number):
– CH 145 → 1
– CS 201 → 2
– CS 623 → 3
– PH 345 → …
Address blocks (record addresses for each value):
– Block 1 (CH 145): 101, 103, 104
– Block 2 (CS 201): 102
– Block 3 (CS 623): 105, 106
Data File (Student name, Course Number):
– Adams, CH 145
– Becker, CS 201
– Dumpling, CH 145
– Getta, CH 145
– Harty, CS 623
– Mobile, CS 623
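Building an inverted index on the course number can be sketched as follows. The record addresses 101 to 106 assigned to the students are an assumption chosen to agree with the address lists in the example, and the names `invert` and `students_in` are hypothetical.

```python
from collections import defaultdict

# Student records keyed by record address. The addresses are assumed, chosen
# so that the resulting lists match those shown in the example.
RECORDS = {
    101: ("Adams", "CH 145"),
    102: ("Becker", "CS 201"),
    103: ("Dumpling", "CH 145"),
    104: ("Getta", "CH 145"),
    105: ("Harty", "CS 623"),
    106: ("Mobile", "CS 623"),
}

def invert(records, field):
    """Map each value of the chosen field to the addresses of records holding it."""
    index = defaultdict(list)
    for address in sorted(records):
        index[records[address][field]].append(address)
    return dict(index)

COURSE_INDEX = invert(RECORDS, field=1)   # invert on the course number field

def students_in(course):
    """Answer a query from the index without scanning the data file."""
    return [RECORDS[address][0] for address in COURSE_INDEX.get(course, [])]

print(COURSE_INDEX)
print(students_in("CS 623"))
```

Every inverted field needs its own index of this kind, and each must be updated whenever a record changes, which is why the method is better suited to retrieval than to storage.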
Direct
• Key values of the physical records are not
necessarily in logical sequence
• There is a one-to-one correspondence between a
record key and the physical address of the record
• May be used for storage and retrieval
• Access efficiency always 1
• Storage efficiency depends on density of keys
• No duplicate keys permitted
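A minimal sketch of direct access, assuming integer keys in a known range so that the key itself determines the slot. The key range starting at 1000, the ten-slot file, and the function names are all made up for illustration.

```python
# Hypothetical file of fixed slots; the record key determines the slot address.
FIRST_KEY = 1000          # lowest legal key (assumed)
SLOTS = [None] * 10       # room for keys 1000..1009

def address_of(key):
    """One-to-one mapping from record key to slot address."""
    address = key - FIRST_KEY
    if not 0 <= address < len(SLOTS):
        raise KeyError(key)
    return address

def store(key, record):
    address = address_of(key)
    if SLOTS[address] is not None:
        raise ValueError(f"duplicate key {key}")   # no duplicate keys permitted
    SLOTS[address] = record

def fetch(key):
    return SLOTS[address_of(key)]                  # always a single access

store(1002, "Adams")
store(1007, "Becker")
# Fraction of slots in use: storage efficiency follows the density of keys.
density = sum(slot is not None for slot in SLOTS) / len(SLOTS)
print(fetch(1007), density)
```

With only two of the ten possible keys in use, most of the allocated space is empty, which is the storage cost of sparse keys under this method.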
Hashing
• Key values of the physical records are not
necessarily in logical sequence
• Many key values may share the same physical
address (block)
• May be used for storage and retrieval
• Access efficiency depends on distribution of keys,
algorithm for key transformation and space
allocated
• Storage efficiency depends on distribution of keys
and algorithm used for key transformation
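The effect of key distribution can be seen in a small sketch. It assumes a division-remainder key transformation and four blocks, neither of which is specified in the slides; `block_for` and `load` are hypothetical names.

```python
# A minimal sketch of hashed access in which colliding keys share a block.
NUM_BLOCKS = 4            # space allocated (assumed)

def block_for(key):
    """Division-remainder key transformation (one common choice, assumed here)."""
    return key % NUM_BLOCKS

def load(keys):
    blocks = [[] for _ in range(NUM_BLOCKS)]
    for key in keys:
        blocks[block_for(key)].append(key)   # many keys may map to one block
    return blocks

spread = load([101, 102, 103, 104, 105, 106])
clustered = load([100, 104, 108, 112, 116, 120])

print(spread)
print(clustered)
```

The first key set spreads across all four blocks, while the second, whose keys are all multiples of four, lands entirely in block 0, leaving three blocks empty and one block that has to be searched at length.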
Comparative Access Methods
Factor | Sequential | Indexed | Hashed
Storage space | No wasted space | No wasted space for data but extra space for index | More space needed for addition and deletion of records after initial load
Sequential retrieval on primary key | Very fast | Moderately fast | Impractical
Random retrieval | Impractical | Moderately fast | Very fast
Multiple key retrieval | Possible but needs a full scan | Very fast with multiple indexes | Not possible
Deleting records | Can create wasted space | OK if dynamic | Very easy
Adding records | Requires rewriting file | OK if dynamic | Very easy
Updating records | Usually requires rewriting file | Easy but requires maintenance of indexes | Very easy