Chapter 16-17
File Structures, Hashing, Indexing, and
Physical Database Design
Oracle SQL*Loader
• http://www.oracle.com/technetwork/database/enterpriseedition/sql-loader-overview-095816.html
Storage
• Primary storage (main memory)
– Can be operated on directly by the computer's CPU
– Small, fast
• Secondary storage
– http://en.wikipedia.org/wiki/Hard_disk
– Cannot be operated on directly by the computer's CPU
– Magnetic disks, optical disks, tapes, etc.
– Larger capacity, inexpensive, slower than main memory
Table 16.1 Types of Storage with Capacity, Access Time, Max
Bandwidth (Transfer Speed), and Commodity Cost
Table 16.2 Specifications of Typical High-End Enterprise Disks
from Seagate (a) Seagate Enterprise Performance 10 K HDD 1200 GB
Storage capacity units
• Kilobytes – 1000 bytes
• Megabytes – 1 million bytes
• Gigabytes (Gbytes) – 1 billion bytes
• Terabytes – 1000 gigabytes
Memory Hierarchies and Storage Devices
• Primary storage
– Cache (static RAM) – most expensive, fast, used by the CPU to speed up the execution of programs
http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?query=cache
– Main memory (dynamic RAM) – work area for the CPU
• Secondary storage (mass storage)
– CD-ROM
– Tapes
– Disks
• Main memory database: the entire database is stored in main memory
Figure 16.1
(a) A single-sided disk with read/write hardware. (b) A disk pack with read/write hardware.
Figure 16.2 Different sector organizations on disk. (a) Sectors
subtending a fixed angle. (b) Sectors maintaining a uniform
recording density.
Tracks
A track is the part of a disk that passes under one read/write head
while the head is stationary. The number of tracks on a
disk surface therefore corresponds to the number of
different radial positions of the head(s). The collection of
all tracks on all surfaces at a given radial position is known
as a cylinder, and each track is divided into sectors.
Cylinder
• The set of tracks on a multi-headed disk that
may be accessed without head movement.
That is, the collection of disk tracks which
are the same distance from the spindle about
which the disks rotate.
Sector
• A sector lies within one continuous range of rotational angle of the disk
Data transfer between main memory and disks (in blocks)
• Hardware address of a block
– Surface number
– Track number
– Block number
• Time required
– Seek time
– Rotational delay time (latency)
– Block transfer time
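The three time components above add up to the time needed to reach and transfer one block. A minimal Python sketch, assuming the average rotational delay is half of one revolution; the function names and the sample figures (4 ms seek, 10,000 rpm, 0.1 ms transfer) are illustrative assumptions, not values taken from Table 16.2:

```python
def avg_rotational_delay_ms(rpm: float) -> float:
    """Average rotational delay in ms, taken as half of one full revolution."""
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms in a minute
    return ms_per_revolution / 2.0


def block_access_time_ms(seek_ms: float, rpm: float, block_transfer_ms: float) -> float:
    """Seek time + average rotational delay + block transfer time, in ms."""
    return seek_ms + avg_rotational_delay_ms(rpm) + block_transfer_ms


if __name__ == "__main__":
    # Hypothetical drive parameters, chosen only to exercise the formula.
    print(block_access_time_ms(seek_ms=4.0, rpm=10_000, block_transfer_ms=0.1))
```

With these assumed numbers the rotational delay (3 ms) is comparable to the seek time, and the block transfer time is the smallest of the three components.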
Figure 16.3
Interleaved concurrency versus parallel execution.
Figure 16.4
Use of two buffers, A and B, for reading
from disk.
Figure 16.5 Three record storage formats. (a) A fixed-length
record with six fields and size of 71 bytes. (b) A record with two
variable-length fields and three fixed-length fields. (c) A variable-field
record with three types of separator characters.
Figure 16.6 Types of record organization. (a)
Unspanned. (b) Spanned.
Figure 16.7 Some blocks of an ordered (sequential) file
of EMPLOYEE records with Name as the ordering key field.
Table 16.3 Average Access Times for a File of
b Blocks under Basic File Organizations
File organization
• Heap file (unordered file)
– Places new records at the end of the file, in no particular order
• Sorted file (sequential file)
– Keeps the records ordered by the value of a particular field
• Hashed file
– Uses a hash function applied to a field (the hash key) to determine a record's placement on disk
• B-trees, B+-trees – use a tree structure
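To make the difference between a heap file and a sorted file concrete, here is a rough Python sketch of the kind of estimate Table 16.3 summarizes for a file of b blocks. It assumes the usual textbook approximations: a linear search reads about b/2 blocks on average when the record exists, and a binary search on the ordering field reads about log2 b blocks. The function names are illustrative.

```python
import math


def heap_file_avg_block_accesses(b: int) -> float:
    """Linear search of an unordered file: on average half the blocks are read."""
    return b / 2


def sorted_file_binary_search_accesses(b: int) -> int:
    """Binary search on the ordering field: about log2(b) block reads."""
    return max(1, math.ceil(math.log2(b)))


if __name__ == "__main__":
    for b in (1_024, 1_000_000):
        print(b, heap_file_avg_block_accesses(b), sorted_file_binary_search_accesses(b))
```

The gap widens quickly as b grows, which is the motivation for keeping a file ordered (or indexed) on a frequently searched field.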
Hashing techniques
• Static hashing – hash address space is fixed
• Extendible hashing
• Linear hashing
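As an illustration of static hashing, the sketch below keeps a fixed array of M positions and resolves collisions by chaining. It assumes integer keys and the common hash function h(K) = K mod M; the class name and the choice M = 7 are hypothetical, and the in-memory lists merely stand in for buckets and overflow chains.

```python
class ChainedHashTable:
    """Static hash table: the address space (M slots) is fixed at creation."""

    def __init__(self, m: int = 7):
        self.m = m
        self.slots = [[] for _ in range(m)]  # one chain per hash address

    def _address(self, key: int) -> int:
        # h(K) = K mod M
        return key % self.m

    def insert(self, key: int, record) -> None:
        chain = self.slots[self._address(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, record)  # overwrite a record with the same key
                return
        chain.append((key, record))  # a collision simply extends the chain

    def search(self, key: int):
        for existing_key, record in self.slots[self._address(key)]:
            if existing_key == key:
                return record
        return None
```

Because M never changes, chains grow as the file grows; extendible and linear hashing address this by letting the address space expand.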
Hashing algorithm
Hash Table (Wikipedia) http://en.wikipedia.org/wiki/Hash_table
Figure 16.8 Internal hashing data structures. (a) Array
of M positions for use in internal hashing. (b) Collision
resolution by chaining records.
Figure 16.9
Matching bucket numbers to disk
block addresses.
Figure 16.10 Handling overflow for
buckets by chaining.
Figure 16.11 Structure of the
extendible hashing scheme.
Figure 16.12 Structure of the
dynamic hashing scheme.
Figure 16.13 Striping of data across multiple disks. (a)
Bit-level striping across four disks. (b) Block-level striping
across four disks.
Figure 16.14 Some popular levels of RAID. (a) RAID
level 1: Mirroring of data on two disks. (b) RAID level 5:
Striping of data with distributed parity across four disks.
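The striping and parity ideas in Figures 16.13 and 16.14 can be sketched in a few lines of Python. The sketch assumes round-robin block-level striping (block i goes to disk i mod n) and parity computed as the byte-wise XOR of the data blocks in a stripe. The function names and the 4-byte blocks are illustrative, and the rotation of the parity block across disks in RAID level 5 is not modeled.

```python
from functools import reduce


def disk_for_block(block_number: int, num_disks: int) -> int:
    """Block-level striping: blocks are assigned to disks round-robin."""
    return block_number % num_disks


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (used as the parity block)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single lost block of a stripe from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC"]  # three data blocks of one stripe
    parity = xor_blocks(stripe)           # stored on the fourth disk
    print(rebuild_missing([stripe[0], stripe[2]], parity))  # reconstructs stripe[1]
```

XOR works here because any value XORed with itself cancels out, so combining the parity with the surviving blocks leaves exactly the missing block.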
• A search tree of order p is a tree such that each
node contains at most p - 1 search values and p
pointers in the order <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>,
where q ≤ p; each Pi is a pointer to a child node
(or a null pointer); and each Ki is a search value
from some ordered set of values.
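A search through such a tree can be sketched as follows. The sketch assumes the usual ordering constraints (the values in each node are sorted, and every value in the subtree under Pi lies between Ki-1 and Ki); the `Node` class and `search` function are illustrative names, with `None` standing in for a null pointer.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]                    # K1 < K2 < ... < Kq-1
    children: List[Optional["Node"]]   # P1 ... Pq, one more pointer than keys


def search(node: Optional[Node], x: int) -> bool:
    """Return True if x occurs somewhere in the search tree rooted at node."""
    while node is not None:
        i = bisect_left(node.keys, x)  # first position whose key is >= x
        if i < len(node.keys) and node.keys[i] == x:
            return True
        node = node.children[i]        # follow the pointer between the two neighboring keys
    return False
```

At each node the search either finds the value or follows exactly one pointer, so the number of nodes visited is bounded by the height of the tree.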
B-tree of order p
1. Each internal node in the B-tree is of the form
<P1, <K1, Pr1>, P2, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pq>
where q ≤ p. Each Pi is a tree pointer, that is, a pointer to another node in the B-tree. Each
Pri is a data pointer, that is, a pointer to the record whose search key field value is equal
to Ki (or to the data file block containing that record).
2. Within each node, K1 < K2 < ... < Kq-1.
3. For all search key field values X in the subtree pointed at by Pi (the ith subtree, see
Figure 06.10a), we have:
Ki-1 < X < Ki for 1 < i < q; X < Ki for i = 1; and Ki-1 < X for i = q.
4. Each node has at most p tree pointers.
5. Each node, except the root and leaf nodes, has at least ⌈p/2⌉ tree pointers. The root
node has at least two tree pointers unless it is the only node in the tree.
6. A node with q tree pointers, q ≤ p, has q - 1 search key field values (and hence has
q - 1 data pointers).
7. All leaf nodes are at the same level. Leaf nodes have the same structure as internal
nodes except that all of their tree pointers Pi are null.
EXAMPLE 5: Suppose that the search field of Example 4 is a nonordering key
field, and we construct a B-tree on this field. Assume that each node of the B-tree
is 69 percent full. Each node, on the average, will have p * 0.69 = 23 * 0.69 or
approximately 16 pointers and, hence, 15 search key field values. The average
fan-out fo = 16. We can start at the root and see how many values and pointers can
exist, on the average, at each subsequent level:
Root:      1 node          15 entries         16 pointers
Level 1:   16 nodes        240 entries        256 pointers
Level 2:   256 nodes       3,840 entries      4,096 pointers
Level 3:   4,096 nodes     61,440 entries     65,536 pointers
Level 4:   65,536 nodes    983,040 entries
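The per-level figures in Example 5 follow from repeated multiplication by the average fan-out. A short Python sketch that reproduces the arithmetic, assuming every node holds fan-out pointers and fan-out - 1 entries; the function name is illustrative:

```python
def btree_levels(fan_out: int, levels: int):
    """Return (level, nodes, entries, pointers) for the root and each level below it."""
    rows = []
    nodes = 1  # the root
    for level in range(levels):
        entries = nodes * (fan_out - 1)  # each node holds fan_out - 1 search values
        pointers = nodes * fan_out       # each node holds fan_out tree pointers
        rows.append((level, nodes, entries, pointers))
        nodes = pointers                 # every pointer leads to one node on the next level
    return rows


if __name__ == "__main__":
    fo = round(23 * 0.69)  # order p = 23 at 69 percent occupancy
    for level, nodes, entries, pointers in btree_levels(fo, 5):
        print(level, nodes, entries, pointers)
```

Summing the entries column over the levels in use gives the average number of search key values the tree can hold at that height.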
B+ Trees
The structure of the internal nodes of a B+-tree of order p is as follows:
1. Each internal node is of the form <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>
where q ≤ p and each Pi is a tree pointer.
2. Within each internal node, K1 < K2 < ... < Kq-1.
3. For all search field values X in the subtree pointed at by Pi, we have
Ki-1 < X ≤ Ki for 1 < i < q; X ≤ Ki for i = 1; and Ki-1 < X for i = q.
4. Each internal node has at most p tree pointers.
5. Each internal node, except the root, has at least ⌈p/2⌉ tree pointers.
The root node has at least two tree pointers if it is an internal node.
6. An internal node with q pointers, q ≤ p, has q - 1 search field values.
The structure of the leaf nodes of a B+-tree of
order p (Figure 14.11b) is as follows:
1. Each leaf node is of the form
<<K1, Pr1>, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pnext>
where q ≤ p, each Pri is a data pointer, and Pnext points to the next leaf
node of the B+-tree.
2. Within each leaf node, K1 < K2 < ... < Kq-1, q ≤ p.
3. Each Pri is a data pointer that points to the record whose search field
value is Ki or to a file block containing the record (or to a block of
record pointers that point to records whose search field value is Ki if
the search field is not a key).
4. Each leaf node has at least ⌈p/2⌉ values.
5. All leaf nodes are at the same level.
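Putting the internal-node and leaf-node structures together, a lookup descends through internal nodes to a single leaf, and a range query then follows the Pnext pointers. The Python sketch below assumes the convention that values less than or equal to a key go to the pointer on its left; the class names are illustrative, and strings stand in for the data pointers Pri.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]
    records: List[str]                  # stand-ins for the data pointers Pr1..Prq-1
    next_leaf: Optional["Leaf"] = None  # Pnext


@dataclass
class Internal:
    keys: List[int]
    children: List[Union["Internal", Leaf]]


def find_leaf(node, x: int) -> Leaf:
    """Descend to the leaf that would contain x."""
    while isinstance(node, Internal):
        # bisect_left picks the first key >= x, so values equal to a key go left
        node = node.children[bisect_left(node.keys, x)]
    return node


def search(root, x: int) -> Optional[str]:
    leaf = find_leaf(root, x)
    i = bisect_left(leaf.keys, x)
    if i < len(leaf.keys) and leaf.keys[i] == x:
        return leaf.records[i]
    return None


def range_scan(root, lo: int, hi: int) -> List[str]:
    """Collect records with lo <= key <= hi by walking the leaf chain."""
    out = []
    leaf = find_leaf(root, lo)
    while leaf is not None:
        for key, record in zip(leaf.keys, leaf.records):
            if key > hi:
                return out
            if key >= lo:
                out.append(record)
        leaf = leaf.next_leaf
    return out
```

Only the leaves hold data pointers, so every search ends at the leaf level, and the leaf chain makes ordered scans possible without revisiting internal nodes.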
EXAMPLE 7: Suppose that we construct a B+-tree on the field of Example 6. To calculate
the approximate number of entries of the B+-tree, we assume that each node is 69 percent
full. On the average, each internal node will have 34 * 0.69 or approximately 23 pointers,
and hence 22 values. Each leaf node, on the average, will hold 0.69 * pleaf = 0.69 * 31 or
approximately 21 data record pointers. A B+-tree will have the following average number of
entries at each level:
Root:      1 node           22 entries           23 pointers
Level 1:   23 nodes         506 entries          529 pointers
Level 2:   529 nodes        11,638 entries       12,167 pointers
Level 3:   12,167 nodes     255,507 entries      279,841 pointers
Level 4:   279,841 nodes    5,876,661 entries
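The entry counts in Example 7 can be reproduced with a short Python sketch. It assumes that internal levels multiply the node count by the average internal fan-out (23) and that only the leaf level holds data record pointers (21 per leaf on average); the function name and the `internal_levels` parameter are illustrative.

```python
def bplus_tree_levels(internal_fan_out: int, leaf_entries: int, internal_levels: int):
    """Return (label, nodes, entries, pointers) for each internal level, then the leaf level."""
    rows = []
    nodes = 1  # the root
    for level in range(internal_levels):
        entries = nodes * (internal_fan_out - 1)  # search values per internal level
        pointers = nodes * internal_fan_out       # tree pointers per internal level
        rows.append((f"level {level}", nodes, entries, pointers))
        nodes = pointers
    # Leaf level: each leaf holds leaf_entries data record pointers.
    rows.append(("leaf", nodes, nodes * leaf_entries, None))
    return rows


if __name__ == "__main__":
    fan_out = round(34 * 0.69)  # about 23 pointers per internal node
    per_leaf = round(0.69 * 31)  # about 21 data record pointers per leaf
    for row in bplus_tree_levels(fan_out, per_leaf, internal_levels=3):
        print(row)
```

With three internal levels the leaf level has 12,167 nodes and 255,507 data record pointers; with four internal levels it has 279,841 nodes and 5,876,661 pointers.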