Storage and File Organization
File Organization
• Basics
  - A database is a collection of files.
  - A file is a collection of records.
  - A record (tuple) is a collection of fields (attributes).
  - Files are stored on disks, which read and write in units of blocks.
• Two important issues:
  1. Representation of each record
  2. Grouping/ordering of records and their storage in blocks
Database System Concepts
File Organization
• Goals and considerations:
  - Compactness
  - Overhead of insertion/deletion
  - Retrieval speed
    Sometimes it is preferable to bring more tuples than necessary into main memory (MM) and let the CPU filter out the unneeded ones.
Record Representation
• Fixed-length records
• Example:
  Account(acc-number char(10), branch-name char(20), balance real)
  - Each record is 38 bytes (10 + 20 + 8, with balance stored as an 8-byte real).
  - Store the records sequentially, one after the other.
  - Record 1 is at position 0, record 2 at position 38, record 3 at position 76, etc.
Compactness (350 bytes)
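
As a rough illustration of the 38-byte layout above, the sketch below packs Account records with Python's struct module. The format string assumes no padding between fields and an 8-byte float for balance; the helper names and the sample account values are made up for this example.

```python
import struct

# Assumed layout: char(10) + char(20) + 8-byte real; "<" disables alignment padding.
ACCOUNT = struct.Struct("<10s20sd")
RECORD_SIZE = ACCOUNT.size  # 10 + 20 + 8 bytes


def pack_account(acc_number: str, branch_name: str, balance: float) -> bytes:
    """Encode one Account record; struct pads short strings with NUL bytes."""
    return ACCOUNT.pack(acc_number.encode("ascii"), branch_name.encode("ascii"), balance)


def unpack_account(buf: bytes, index: int):
    """Decode the record stored in 0-based slot `index` of `buf`."""
    acc, branch, balance = ACCOUNT.unpack_from(buf, index * RECORD_SIZE)
    return acc.rstrip(b"\0").decode("ascii"), branch.rstrip(b"\0").decode("ascii"), balance


# Three illustrative records stored back to back.
file_image = b"".join([
    pack_account("A-102", "Perryridge", 400.0),
    pack_account("A-305", "Round Hill", 350.0),
    pack_account("A-215", "Mianus", 700.0),
])
offsets = [i * RECORD_SIZE for i in range(3)]
```

Because every record has the same size, the position of any record follows from its number alone, which is what makes access by record number cheap.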
Fixed-Length Records
Simple approach:
• Store record i starting from byte n × (i - 1), where n is the size of each record.
• Record access is simple, but records may cross block boundaries.
  - Modification: do not allow records to cross block boundaries.
• Insertion of record i: add it at the end.
• Deletion of record i: two alternatives:
  - Move records, using one of:
    1. Shift records i + 1, . . ., n to positions i, . . ., n - 1 (here n is the number of the last record)
    2. Move record n to position i
  - Do not move records, but link all free records on a free list
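
The two "move records" alternatives can be sketched on an in-memory list standing in for the file. This is only an illustration: the function names are hypothetical, and record numbers are 1-based as on the slide.

```python
def record_offset(i: int, n: int) -> int:
    """Byte offset of record i (1-based) when every record is n bytes long."""
    return n * (i - 1)


def delete_by_shifting(records: list, i: int) -> None:
    """Alternative 1: move records i+1..last down by one position."""
    for pos in range(i, len(records)):   # pos is the 0-based index of record pos+1
        records[pos - 1] = records[pos]
    records.pop()                        # the last slot is now a duplicate


def delete_by_moving_last(records: list, i: int) -> None:
    """Alternative 2: overwrite record i with the last record."""
    last = records.pop()
    if i - 1 < len(records):             # record i was not itself the last record
        records[i - 1] = last
```

Shifting keeps the remaining records in their original order but touches every record after position i; moving the last record touches a single record but changes the order.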
Free Lists
• 2nd approach: fixed-length records (FLR) with free lists
  - Store the address of the first deleted record in the file header.
  - Use this first record to store the address of the second deleted record, and so on.
  - These stored addresses can be thought of as pointers, since they "point" to the location of a record.
• Better handling of insertions and deletions
• Less compact
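
A minimal sketch of the free-list idea, assuming slot numbers play the role of addresses and an in-memory list stands in for the file. The class and attribute names are hypothetical.

```python
class FreeListFile:
    """Fixed-length record slots; deleted slots are chained from the file header."""

    NIL = -1  # marks the end of the free list

    def __init__(self):
        self.slots = []             # each slot holds a record, or the next free slot number
        self.free_head = self.NIL   # "file header": address of the first deleted record

    def insert(self, record) -> int:
        if self.free_head != self.NIL:
            slot = self.free_head
            self.free_head = self.slots[slot]   # the hole stored the next free address
            self.slots[slot] = record
        else:
            slot = len(self.slots)              # no hole available: grow the file
            self.slots.append(record)
        return slot

    def delete(self, slot: int) -> None:
        self.slots[slot] = self.free_head       # the hole points to the previous first hole
        self.free_head = slot
```

Insertions reuse the most recently freed slot first, so no records are moved on deletion; the price is that deleted slots stay allocated until reused.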
Variable-Length Records
• 3rd approach: variable-length records, which arise in database systems in several ways:
  - Storage of multiple record types in a file.
  - Record types that allow variable lengths for one or more fields.
  - Record types that allow repeating fields or multivalued attributes.
• Byte string representation
  - Attach an end-of-record (⊥) control character to the end of each record
  - Difficulty with deletion (leaves holes)
  - Difficulty with growth
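
The byte string representation might look like the sketch below. The choice of the ASCII record separator and unit separator bytes as end-of-record and field delimiters is an assumption made for this example, as are the function names and sample values.

```python
EOR = b"\x1e"   # assumed end-of-record control character
SEP = b"\x1f"   # assumed separator between fields inside a record


def append_record(data: bytearray, fields: list[str]) -> None:
    """Append one variable-length record followed by the end-of-record marker."""
    data += SEP.join(f.encode("utf-8") for f in fields) + EOR


def scan_records(data: bytes) -> list[list[str]]:
    """Recover all records; they can only be found by scanning for the markers."""
    records = []
    for chunk in data.split(EOR)[:-1]:   # the piece after the final marker is empty
        records.append([f.decode("utf-8") for f in chunk.split(SEP)])
    return records
```

Since record positions depend on the lengths of all earlier records, deleting one leaves a hole of arbitrary size, and a record that grows no longer fits where it was.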

[Figure: record layout with a field count (4) followed by R1, R2, R3]
Variable-Length Records: Slotted Page Structure
• 4th approach: slotted page for variable-length records (VLR-SP)
• Slotted page header contains:
  - number of record entries
  - end of free space in the block
  - location and size of each record
• Records are stored at the bottom of the page
• External tuple pointers point to the record pointers in the header:
  - rec-id = <page-id, slot#>
[Figure: page i with records Rid = (i,1), Rid = (i,2), ..., Rid = (i,N); the slot directory holds one entry per record, the number of slots (N), and a pointer to the start of free space]
Insertion:
1) Use the free-space pointer (FP) to find space and insert the record
2) Find an available pointer in the slot directory (or create a new one)
3) Adjust FP and the number of records
Deletion?
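
The insertion steps above, together with one possible answer to the deletion question, can be sketched as follows. This is a simplified model under stated assumptions: the slot directory is kept as a Python list instead of being stored inside the page bytes, its space is not counted against the page, and deletion compacts the page immediately. All names are hypothetical.

```python
class SlottedPage:
    """Records are packed at the end of the page; slot numbers stay stable."""

    def __init__(self, size: int = 64):
        self.data = bytearray(size)
        self.free_end = size   # free-space pointer (FP): records occupy data[free_end:]
        self.slots = []        # slot directory: (offset, length), or None for a free slot

    def insert(self, record: bytes) -> int:
        if len(record) > self.free_end:
            raise ValueError("page full")
        # 1) use FP to find space and copy the record in
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        entry = (self.free_end, len(record))
        # 2) reuse a free directory entry, or create a new one
        for slot, existing in enumerate(self.slots):
            if existing is None:
                self.slots[slot] = entry
                return slot
        self.slots.append(entry)
        return len(self.slots) - 1

    def read(self, slot: int) -> bytes:
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        offset, length = self.slots[slot]
        # Close the hole: shift the records stored before it toward the page end.
        self.data[self.free_end + length:offset + length] = self.data[self.free_end:offset]
        self.free_end += length
        self.slots[slot] = None
        # Only directory entries change, so external (page-id, slot#) pointers stay valid.
        self.slots = [
            (e[0] + length, e[1]) if e is not None and e[0] < offset else e
            for e in self.slots
        ]
```

Because outside references name a slot number and never a byte offset, records can be moved inside the page without updating anything outside it.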
Variable-Length Records (Cont.)
• Fixed-length representation:
  - reserved space
  - pointers
• 5th approach: Fixed Limit Records (for VLR)
  - Reserved space: fixed-length records of a known maximum length can be used; unused space in shorter records is filled with a null or end-of-record symbol.
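
A small sketch of the reserved-space method, assuming a maximum record length of 16 bytes and a NUL byte as the filler symbol (both values are chosen only for illustration):

```python
MAX_LEN = 16    # assumed known maximum record length
PAD = b"\0"     # assumed null filler for unused space


def to_fixed(record: bytes) -> bytes:
    """Pad a variable-length record up to the reserved maximum length."""
    if len(record) > MAX_LEN:
        raise ValueError("record exceeds the reserved maximum length")
    return record.ljust(MAX_LEN, PAD)


def from_fixed(slot: bytes) -> bytes:
    """Strip the filler to recover the original record."""
    return slot.rstrip(PAD)
```

Every record now costs MAX_LEN bytes, so the method is attractive mainly when most records are close to the maximum length.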
Pointer Method
• 6th approach: Pointer method
  - A variable-length record is represented by a list of fixed-length records, chained together via pointers.
  - Can be used even if the maximum record length is not known.
Pointer Method (Cont.)
• Disadvantage of the pointer structure: space is wasted in all records except the first in a chain.
• Solution: allow two kinds of blocks in the file:
  - Anchor block: contains the first records of chains
  - Overflow block: contains records other than the first records of chains
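
The anchor/overflow arrangement can be sketched with two lists standing in for the two kinds of block. This is a simplification: each fixed-length cell holds one string value, and the names and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    """One fixed-length record: a value plus a pointer to the next cell in the chain."""
    value: str
    next: Optional[int] = None   # index into `overflow`, or None at the end of the chain


anchor: list[Cell] = []      # first record of every chain
overflow: list[Cell] = []    # all later records of the chains


def store(values: list[str]) -> int:
    """Store a variable-length record as a chain; return its anchor position."""
    next_ptr = None
    for v in reversed(values[1:]):        # build the tail from back to front
        overflow.append(Cell(v, next_ptr))
        next_ptr = len(overflow) - 1
    anchor.append(Cell(values[0], next_ptr))
    return len(anchor) - 1


def load(pos: int) -> list[str]:
    """Follow the pointer chain starting at anchor position `pos`."""
    cell = anchor[pos]
    out = [cell.value]
    while cell.next is not None:
        cell = overflow[cell.next]
        out.append(cell.value)
    return out
```

Keeping the two kinds of record apart lets the overflow cells use a smaller layout, since they do not need the fields that appear only in the first record of a chain.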
Ordering and Grouping Records
Issue #1: In what order do we place records in a block?
1. Heap technique: place a record anywhere there is space.
2. Ordered technique: maintain an order on some attribute, so that binary search can be used when a selection is on this attribute.
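
To make the difference concrete, the sketch below compares an equality selection on a heap block with the same selection on an ordered block. Records are modelled as tuples whose first field is the ordering attribute; the function names and sample records are made up.

```python
def find_in_heap(block, key):
    """Heap placement: no order is kept, so every record must be examined."""
    return [r for r in block if r[0] == key]


def find_in_ordered(block, key):
    """Ordered placement: binary search for the first record with r[0] >= key."""
    lo, hi = 0, len(block)
    while lo < hi:
        mid = (lo + hi) // 2
        if block[mid][0] < key:
            lo = mid + 1
        else:
            hi = mid
    matches = []
    while lo < len(block) and block[lo][0] == key:   # collect duplicates, if any
        matches.append(block[lo])
        lo += 1
    return matches
```

The ordered lookup inspects about log2(n) records before it finds the first match, at the cost of keeping the block sorted on every insertion.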
Sequential File Organization
• Suitable for applications that require sequential processing of the entire file
• The records in the file are ordered by a search key
Sequential File Organization (Cont.)
• Deletion: use pointer chains
• Insertion: locate the position where the record is to be inserted
  - if there is free space, insert there
  - if there is no free space, insert the record in an overflow block
  - in either case, the pointer chain must be updated
• The file needs to be reorganized from time to time to restore sequential order
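
A sketch of these operations, assuming for simplicity that the main area never has free space, so every insertion goes to the overflow area. A list of keys stands in for the file, and the class and method names are hypothetical.

```python
class SequentialFile:
    """Records kept in search-key order through a pointer chain."""

    def __init__(self, sorted_keys):
        self._load(sorted_keys)

    def _load(self, sorted_keys):
        self.records = list(sorted_keys)   # physical order of the stored records
        n = len(self.records)
        self.next = [i + 1 if i + 1 < n else None for i in range(n)]
        self.head = 0 if n else None

    def insert(self, key):
        """Append physically at the end (overflow) and splice into the chain."""
        self.records.append(key)
        new = len(self.records) - 1
        prev, cur = None, self.head
        while cur is not None and self.records[cur] <= key:
            prev, cur = cur, self.next[cur]
        self.next.append(cur)
        if prev is None:
            self.head = new
        else:
            self.next[prev] = new

    def delete(self, key) -> bool:
        """Unlink the record from the chain; its space is not reclaimed here."""
        prev, cur = None, self.head
        while cur is not None and self.records[cur] != key:
            prev, cur = cur, self.next[cur]
        if cur is None:
            return False
        if prev is None:
            self.head = self.next[cur]
        else:
            self.next[prev] = self.next[cur]
        return True

    def scan(self):
        """Read the records in search-key order by following the chain."""
        out, cur = [], self.head
        while cur is not None:
            out.append(self.records[cur])
            cur = self.next[cur]
        return out

    def reorganize(self):
        """Rewrite the file so that physical order matches key order again."""
        self._load(self.scan())
```

After many insertions the chain order and the physical order drift apart, so a sequential scan jumps between blocks; reorganization restores the match.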
Clustering File Organization
• A simple file structure stores each relation in a separate file
• Several relations can instead be stored in one file using a clustering file organization
• E.g., clustering organization of customer and depositor:
SELECT d.account-number, d.customer-name
FROM depositor d, customer c
WHERE d.customer-name = c.customer-name
  - good for queries involving depositor ⋈ customer, and for queries involving a single customer and his accounts
  - bad for queries involving only customer
  - results in variable-size records
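
One way to picture a clustering of customer and depositor is a single file in which each customer record is followed directly by that customer's depositor records. The sketch below uses tagged tuples and made-up sample data; the function names are hypothetical.

```python
# Illustrative data: (customer-name, street, city) and (customer-name, account-number).
customers = [("Hayes", "Main", "Brooklyn"), ("Turner", "Putnam", "Stamford")]
depositors = [("Hayes", "A-102"), ("Hayes", "A-220"), ("Turner", "A-305"), ("Hayes", "A-503")]


def build_clustered_file(customers, depositors):
    """Place each customer record immediately before its depositor records."""
    file = []
    for cust in customers:
        file.append(("customer",) + cust)
        file.extend(("depositor",) + d for d in depositors if d[0] == cust[0])
    return file


def join_one_pass(file):
    """Join depositor with customer in a single sequential pass."""
    out, current = [], None
    for rec in file:
        if rec[0] == "customer":
            current = rec
        else:
            out.append((rec[2], current[1]))   # (account-number, customer-name)
    return out


def customers_only(file):
    """A customer-only query still has to step over every depositor record."""
    return [rec[1] for rec in file if rec[0] == "customer"]
```

The join needs no lookups because matching records are neighbours, while the customer-only scan reads the whole mixed file to return two names.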
File Organization
• Issue #2: In which blocks should records be placed?
  Many alternatives exist, each ideal for some situations and not so good in others:
  - Heap files: add records at the end of the file. Suitable when the typical access is a file scan retrieving all records.
  - Sorted files: keep the pages ordered. Best if records must be retrieved in some order, or if only a "range" of records is needed.
  - Hashed files: assign records to blocks according to their value for some attribute. Good for equality selections.
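
A hashed file can be sketched as a fixed number of blocks indexed by a hash of the chosen attribute. The block count, the use of CRC-32 as the hash function, and the function names are assumptions made for this example.

```python
import zlib

N_BLOCKS = 4   # assumed number of blocks (buckets)
blocks = [[] for _ in range(N_BLOCKS)]


def block_for(key: str) -> int:
    """Map the attribute value to a block number with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % N_BLOCKS


def insert(record: tuple) -> None:
    """Place the record in the block selected by its first field."""
    blocks[block_for(record[0])].append(record)


def lookup(key: str) -> list:
    """Equality selection: only the one block chosen by the hash is read."""
    return [r for r in blocks[block_for(key)] if r[0] == key]
```

The hash gives no help for range selections, because neighbouring key values are scattered across unrelated blocks.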
Data Dictionary Storage
The data dictionary (also called the system catalog) stores metadata, that is, data about data, such as:
• Information about relations
  - names of relations
  - names and types of attributes of each relation
  - names and definitions of views
  - integrity constraints
• User and accounting information, including passwords
• Statistical and descriptive data
  - number of tuples in each relation
• Physical file organization information
  - how the relation is stored (sequential/hash/…)
  - physical location of the relation
    - operating system file name, or
    - disk addresses of blocks containing records of the relation
• Information about indices
Data Dictionary Storage
• Stored as tables!
• E-R diagram?
  - Relations, attributes, domains
  - Each relation has a name and some attributes
  - Each attribute has a name, a length and a domain
  - Also views, integrity constraints, indices
  - User info (authorizations etc.)
  - Statistics
[Figure: E-R diagram in which relation (attribute: name) is connected by a 1:N relationship "has" to attribute (attributes: A-name, position), which is associated with a domain]
Data Dictionary Storage (Cont.)
• A possible catalog representation:
  Relation-metadata = (relation-name, number-of-attributes, storage-organization, location)
  Attribute-metadata = (attribute-name, relation-name, domain-type, position, length)
  User-metadata = (user-name, encrypted-password, group)
  Index-metadata = (index-name, relation-name, index-type, index-attributes)
  View-metadata = (view-name, definition)
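
Since the catalog is itself stored as tables, two of the schemas above can be tried out directly with Python's built-in sqlite3 module. The sketch replaces hyphens in the names with underscores, and the key constraints, the file name account.dat, and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE relation_metadata (
    relation_name        TEXT PRIMARY KEY,
    number_of_attributes INTEGER,
    storage_organization TEXT,
    location             TEXT
);
CREATE TABLE attribute_metadata (
    attribute_name TEXT,
    relation_name  TEXT REFERENCES relation_metadata(relation_name),
    domain_type    TEXT,
    position       INTEGER,
    length         INTEGER,
    PRIMARY KEY (attribute_name, relation_name)
);
""")

# Describe the Account relation used earlier (hypothetical file name).
conn.execute(
    "INSERT INTO relation_metadata VALUES ('account', 3, 'sequential', 'account.dat')"
)
conn.executemany(
    "INSERT INTO attribute_metadata VALUES (?, 'account', ?, ?, ?)",
    [("acc_number", "char", 1, 10), ("branch_name", "char", 2, 20), ("balance", "real", 3, 8)],
)


def record_length(relation: str) -> int:
    """Derive the fixed record length of a relation from the catalog."""
    row = conn.execute(
        "SELECT SUM(length) FROM attribute_metadata WHERE relation_name = ?",
        (relation,),
    ).fetchone()
    return row[0]


def attribute_names(relation: str) -> list:
    """List a relation's attributes in their stored position order."""
    rows = conn.execute(
        "SELECT attribute_name FROM attribute_metadata "
        "WHERE relation_name = ? ORDER BY position",
        (relation,),
    ).fetchall()
    return [r[0] for r in rows]
```

Queries against the catalog use ordinary SQL, which is the practical benefit of storing metadata as tables.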