SECONDARY STORAGE MANAGEMENT
SECTIONS 13.1 – 13.3
Sanuja Dabade & Eilbroun Benjamin
CS 257 – Dr. T.Y. Lin
Presentation Outline

• 13.1 The Memory Hierarchy
  – 13.1.1 The Memory Hierarchy
  – 13.1.2 Transfer of Data Between Levels
  – 13.1.3 Volatile and Nonvolatile Storage
  – 13.1.4 Virtual Memory
• 13.2 Disks
  – 13.2.1 Mechanics of Disks
  – 13.2.2 The Disk Controller
  – 13.2.3 Disk Access Characteristics
Presentation Outline (con’t)

• 13.3 Accelerating Access to Secondary Storage
  – 13.3.1 The I/O Model of Computation
  – 13.3.2 Organizing Data by Cylinders
  – 13.3.3 Using Multiple Disks
  – 13.3.4 Mirroring Disks
  – 13.3.5 Disk Scheduling and the Elevator Algorithm
  – 13.3.6 Prefetching and Large-Scale Buffering
13.1.1 Memory Hierarchy

• A computer system has several components for data storage, with different storage capacities.
• The cost per byte of storage also varies across these components.
• The devices with the smallest capacity offer the fastest access speed and have the highest cost per bit.
Memory Hierarchy Diagram

[Figure: the memory hierarchy, from smallest/fastest to largest/slowest — Cache, Main Memory, Disk, Tertiary Storage. Programs and main-memory DBMS's work in main memory; the file system and virtual memory are backed by disk.]
13.1.1 Memory Hierarchy

• Cache
  – Lowest level of the hierarchy
  – Data items in the cache are copies of certain locations of main memory
  – Sometimes values in the cache are changed, and the corresponding changes to main memory are delayed
  – The machine looks in the cache both for instructions and for the data those instructions operate on
  – Holds only a limited amount of data
13.1.1 Memory Hierarchy (con’t)

• In a single-processor computer there is no need to update the data in main memory immediately.
• In a multiprocessor system, data is written to main memory immediately; this policy is called write-through.
Main Memory

• Everything that happens in the computer, such as instruction execution and data manipulation, works on information that is resident in main memory.
• Main memories are random access: one can obtain any byte in the same amount of time.
Secondary Storage

• Used to store data and programs when they are not being processed.
• More permanent than main memory: data and programs are retained when the power is turned off.
• Examples: magnetic disks, hard disks.
Tertiary Storage

• Holds data volumes measured in terabytes.
• Used for databases much larger than what can be stored on disk.
13.1.2 Transfer of Data Between Levels

• Data moves between adjacent levels of the hierarchy.
• At the secondary and tertiary levels, accessing the desired data or finding the desired place to store data takes a lot of time.
• A disk is organized into blocks.
• Entire blocks are moved to and from a region of main memory called a buffer.
13.1.2 Transfer of Data Between Levels (cont’d)

• A key technique for speeding up database operations is to arrange the data so that when one piece of data on a block is needed, it is likely that other data on the same block will be needed at the same time.
• The same idea applies to the other levels of the hierarchy.
13.1.3 Volatile and Nonvolatile Storage

• A volatile device forgets the data stored on it when the power is turned off.
• A nonvolatile device holds its data even when the device is turned off.
• All secondary and tertiary storage devices are nonvolatile, while main memory is volatile.
13.1.4 Virtual Memory

• Typical software executes in a virtual-memory address space.
• The address space is typically 32 bits, i.e., 2^32 bytes or 4 GB.
• Transfers between memory and disk are in units of blocks.
13.2.1 Mechanics of Disks

• The use of secondary storage is one of the important characteristics of a DBMS.
• A disk consists of two moving pieces:
  1. the disk assembly
  2. the head assembly
• The disk assembly consists of one or more platters.
• Platters rotate around a central spindle.
• Bits are stored on the upper and lower surfaces of the platters.
13.2.1 Mechanics of Disks (con’t)

• A disk surface is organized into tracks.
• The tracks that are at a fixed radius from the center, across all surfaces, form one cylinder.
• Tracks are organized into sectors.
• Sectors are segments of a track's circle, separated by gaps.
13.2.2 The Disk Controller

• One or more disks are controlled by a disk controller.
• Disk controllers are capable of:
  – Controlling the mechanical actuator that moves the head assembly
  – Selecting a sector from among all those in the cylinder at which the heads are positioned
  – Transferring bits between the desired sector and main memory
  – Possibly buffering an entire track
13.2.3 Disk Access Characteristics

• Accessing (reading/writing) a block requires three steps:
  1. The disk controller positions the head assembly at the cylinder containing the track on which the block is located. This is the seek time.
  2. The disk controller waits while the first sector of the block moves under the head. This is the rotational latency.
  3. All the sectors and the gaps between them pass under the head, while the disk controller reads or writes data in these sectors. This is the transfer time.
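As a quick illustration of how these three components add up, here is a minimal Python sketch (not from the slides); the seek and rotation figures below are the ones quoted later for the Megatron 747 example, while the sector counts are made-up assumptions chosen only to make the arithmetic concrete.

  # Minimal sketch: block access time as seek + average rotational latency + transfer.
  def block_access_time_ms(seek_ms: float,
                           rotation_ms: float,
                           sectors_per_track: int,
                           sectors_per_block: int) -> float:
      rotational_latency = rotation_ms / 2              # on average, half a rotation
      transfer = rotation_ms * sectors_per_block / sectors_per_track
      return seek_ms + rotational_latency + transfer

  # 6.46 ms average seek and 8.33 ms per rotation are quoted below; the assumption
  # of 256 sectors per track and a 16-sector block yields roughly 11 ms per block.
  print(round(block_access_time_ms(6.46, 8.33, 256, 16), 2), "ms")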
13.3 Accelerating Access to Secondary Storage

• Several approaches exist for more efficiently accessing data in secondary storage:
  – Place blocks that are accessed together in the same cylinder.
  – Divide the data among multiple disks.
  – Mirror disks.
  – Use disk-scheduling algorithms.
  – Prefetch blocks into main memory.
• Scheduling latency: the added delay in accessing data caused by a disk-scheduling algorithm.
• Throughput: the number of disk accesses per second that the system can accommodate.
13.3.1 The I/O Model of Computation

• The number of block accesses (disk I/O’s) is a good approximation of the running time of an algorithm, and it should be minimized.
• Ex 13.3: We want an index on a relation R that identifies the block on which the desired tuple appears, but not where on the block it resides.
  – For the Megatron 747 (M747) example, it takes 11 ms to read a 16K block.
  – A standard microprocessor can execute millions of instructions in 11 ms, making any delay in searching the block for the desired tuple negligible.
13.3.2 Organizing Data by Cylinders

• If we read all blocks on a single track or cylinder consecutively, then we can neglect all but the first seek time and the first rotational latency.
• Ex 13.4: We request 1024 blocks of the M747.
  – If the data is randomly distributed, the average latency per block is 10.76 ms (by Ex 13.2), making the total latency about 11 s.
  – If all blocks are stored consecutively on one cylinder:
    6.46 ms (1 average seek) + 8.33 ms (time per rotation) × 16 (# rotations) ≈ 139 ms
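A small sketch of the Ex 13.4 arithmetic, using only the figures quoted above (10.76 ms average random latency, 6.46 ms average seek, 8.33 ms per rotation); it is just a restatement of the calculation, not code from the slides.

  # Restating the Ex 13.4 arithmetic with the figures quoted above.
  blocks = 1024

  # Random placement: every block pays the full average latency.
  random_ms = blocks * 10.76
  print(f"random placement: {random_ms / 1000:.1f} s")      # ~11 s

  # One cylinder: one average seek, then 16 full rotations to read all blocks.
  sequential_ms = 6.46 + 8.33 * 16
  print(f"one cylinder:     {sequential_ms:.0f} ms")         # ~139 ms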
13.3.3 Using Multiple Disks

• If we have n disks, read/write performance can increase by a factor of n.
• Striping: distributing a relation across the disks following this pattern:
  – Data on disk 1: R1, R1+n, R1+2n, …
  – Data on disk 2: R2, R2+n, R2+2n, …
  – …
  – Data on disk n: Rn, Rn+n, Rn+2n, …
• Ex 13.5: We request 1024 blocks with n = 4:
  6.46 ms (1 average seek) + 8.33 ms (time per rotation) × (16/4) (# rotations) ≈ 39.8 ms
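The block-to-disk mapping behind striping is simple modular arithmetic. The sketch below (not from the slides) shows one way to compute which disk, and which position on that disk, a given block of R lands on, assuming blocks and disks are numbered from 1.

  # Sketch of the striping layout described above: block i of relation R goes to
  # disk ((i - 1) % n) + 1, at position (i - 1) // n on that disk.
  def striped_location(block_no: int, n_disks: int) -> tuple[int, int]:
      disk = (block_no - 1) % n_disks + 1
      position = (block_no - 1) // n_disks
      return disk, position

  # With n = 4, blocks 1..8 map to disks 1, 2, 3, 4, 1, 2, 3, 4.
  for b in range(1, 9):
      print(b, striped_location(b, 4))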
13.3.4 Mirroring Disks

• Mirroring disks: having two or more disks hold identical copies of the data.
• Benefit 1: If n disks are mirrors of each other, the system can survive the crash of up to n-1 of the disks.
• Benefit 2: If we have n disks, read performance increases by a factor of n.
• Performance increases further by having the controller select, for each read, the disk whose head is closest to the desired data block.
13.3.5 Disk Scheduling and the Elevator Algorithm

• The disk controller runs this algorithm to select which of several pending requests to process first.
• Pseudocode:

  requests[]                          // array of all unprocessed data requests
  upon receiving a new data request:
      requests[].add(new request)
  while (requests[] is not empty):
      move head to next location
      if (head location is at data in requests[]):
          retrieve data
          remove data from requests[]
      if (head reaches end):
          reverse head direction
13.3.5 Disk Scheduling and the Elevator Algorithm (con’t)

• Events:
  – Head starting point
  – Request data at 8000
  – Request data at 24000
  – Request data at 56000
  – Get data at 8000
  – Request data at 16000
  – Get data at 24000
  – Request data at 64000
  – Get data at 56000
  – Request data at 40000
  – Get data at 64000
  – Get data at 40000
  – Get data at 16000

[Figure: head position (cylinders 8000–64000) plotted against current time.]

  data       time
  8000..      4.3
  24000..    13.6
  56000..    26.9
  64000..    34.2
  40000..    45.5
  16000..    56.8
13.3.5 Disk Scheduling and the Elevator Algorithm (con’t)

  Elevator Algorithm         FIFO Algorithm
  data       time            data       time
  8000..      4.3            8000..      4.3
  24000..    13.6            24000..    13.6
  56000..    26.9            56000..    26.9
  64000..    34.2            16000..    42.2
  40000..    45.5            64000..    59.5
  16000..    56.8            40000..    70.8
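To make the head's sweeping behavior concrete, here is a minimal Python sketch (not from the slides). It models only the order in which a static set of requests would be served; the timings in the tables above depend on the Megatron 747 seek model and on requests arriving while the head moves, neither of which this sketch reproduces.

  # Minimal sketch of the elevator scheduling order: the head sweeps in one
  # direction, serving pending requests it passes, then reverses direction.
  def elevator_order(start: int, requests: list[int], direction: int = 1) -> list[int]:
      pending = sorted(requests)
      served = []
      pos = start
      while pending:
          ahead = [c for c in pending if (c - pos) * direction >= 0]
          if not ahead:                  # nothing left in this direction: reverse
              direction = -direction
              continue
          nxt = min(ahead, key=lambda c: abs(c - pos))
          served.append(nxt)
          pending.remove(nxt)
          pos = nxt
      return served

  # Requests pending at the start of the example, head moving toward higher cylinders:
  print(elevator_order(8000, [8000, 24000, 56000]))   # [8000, 24000, 56000]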
13.3.6 Prefetching and Large-Scale Buffering

• If, at the application level, we can predict the order in which blocks will be requested, we can load them into main memory before they are needed.
DATA REPRESENTATION
Recovery from Disk Crashes – 13.4
Presented By: Deepti Bhardwaj
Roll No. 223_103
SJSU ID: 006521307
Contents

• 13.4.5 Recovery from Disk Crashes
• 13.4.6 Mirroring as a Redundancy Technique
• 13.4.7 Parity Blocks
• 13.4.8 An Improvement: RAID 5
• 13.4.9 Coping With Multiple Disk Crashes
Recovery from Disk Crashes: Ways to Recover Data

• The most serious mode of failure for disks is a “head crash”, in which data is permanently destroyed.
• To reduce the risk of data loss from disk crashes, there are a number of schemes known as RAID (Redundant Arrays of Independent Disks) schemes.
Continue: Recovery from Disk Crashes: Ways to Recover Data

• Each of the schemes starts with one or more disks that hold the data (the data disks) and adds one or more disks that hold information completely determined by the contents of the data disks; these are called redundant disks.
Mirroring as a Redundancy Technique

• The mirroring scheme is referred to as RAID level 1 protection against data loss.
• In this scheme we mirror each disk.
• One of the disks is called the data disk and the other the redundant disk.
• In this case the only way data can be lost is if there is a second disk crash while the first crash is being repaired.
Parity Blocks

• The RAID level 4 scheme uses only one redundant disk, no matter how many data disks there are.
• In the redundant disk, the ith block consists of parity checks for the ith blocks of all the data disks.
• That is, the jth bits of all the ith blocks, of both the data disks and the redundant disk, must have an even number of 1’s; the redundant disk’s bit is chosen to make this condition true.
Parity Blocks – Reading

• Reading a block from a data disk is the same as reading a block from any disk.
• Alternatively, we could read the corresponding block from each of the other disks and compute the block we want as their modulo-2 sum. For example:
  disk 2: 10101010
  disk 3: 00111000
  disk 4: 01100010
  If we take the modulo-2 sum of the bits in each column, we get
  disk 1: 11110000
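The column-wise modulo-2 (XOR) sum above is easy to check mechanically; the following sketch (not from the slides) reproduces the computation on the bit strings shown.

  # Column-wise modulo-2 sum (XOR) of bit strings, reproducing the example above.
  def parity(*blocks: str) -> str:
      return "".join(str(sum(int(b[i]) for b in blocks) % 2)
                     for i in range(len(blocks[0])))

  disk2, disk3, disk4 = "10101010", "00111000", "01100010"
  print(parity(disk2, disk3, disk4))   # -> 11110000, the contents of disk 1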
Parity Block – Writing

• When we write a new block on a data disk, we need to change the corresponding block of the redundant disk as well.
• One approach is to read the corresponding blocks of all the other data disks, compute the modulo-2 sum together with the new block, and write the result to the redundant disk.
• But this approach requires n-1 reads of data blocks, a write of the data block, and a write of the redundant disk block.
  Total = n+1 disk I/O’s
Continue: Parity Block – Writing

• A better approach requires only four disk I/O’s:
  1. Read the old value of the data block being changed.
  2. Read the corresponding block of the redundant disk.
  3. Write the new data block.
  4. Recalculate and write the block of the redundant disk.
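The recalculation in step 4 follows from the parity property: the new redundant block is the old redundant block XORed with the change between the old and new data blocks. Below is a hedged sketch (not from the slides) of the four-I/O update, with the disks modeled as a plain dictionary of bit strings; the disk names are illustrative.

  # Sketch of the four-I/O parity update described above; disk4 is the redundant disk.
  def xor(a: str, b: str) -> str:
      return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

  def write_block(disks: dict, data_disk: str, new_block: str, parity_disk: str = "disk4"):
      old_block = disks[data_disk]               # I/O 1: read old data block
      old_parity = disks[parity_disk]            # I/O 2: read old redundant block
      disks[data_disk] = new_block               # I/O 3: write new data block
      disks[parity_disk] = xor(old_parity, xor(old_block, new_block))   # I/O 4

  disks = {"disk1": "11110000", "disk2": "10101010",
           "disk3": "00111000", "disk4": "01100010"}   # disk4 holds the parity
  write_block(disks, "disk2", "11001100")
  print(disks["disk4"])   # new parity equals disk1 XOR disk2(new) XOR disk3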
Parity Blocks – Failure Recovery

• If any one of the data disks crashes, we can recover its contents by computing the modulo-2 sum of the remaining disks.
• Suppose that disk 2 fails. We need to recompute each block of the replacement disk. We are given the corresponding blocks of the first and third data disks and the redundant disk, so the situation looks like:
  disk 1: 11110000
  disk 2: ????????
  disk 3: 00111000
  disk 4: 01100010
• If we take the modulo-2 sum of each column, we deduce that the missing block of disk 2 is 10101010.
An Improvement: RAID 5

• RAID 4 is effective in preserving data unless there are two simultaneous disk crashes.
• Whatever scheme we use for updating the disks, we need to read and write the redundant disk’s block. If there are n data disks, then the number of disk writes to the redundant disk will be n times the average number of writes to any one data disk.
• However, we do not have to treat one disk as the redundant disk and the others as data disks. Rather, we could treat each disk as the redundant disk for some of the blocks. This improvement is often called RAID level 5.
Continue: An Improvement: RAID 5

• For instance, if there are n + 1 disks numbered 0 through n, we could treat the ith cylinder of disk j as redundant if j is the remainder when i is divided by n + 1.
• For example, let n = 3, so there are 4 disks. The first disk, numbered 0, is redundant for its cylinders numbered 4, 8, 12, and so on, because these are the numbers that leave remainder 0 when divided by 4.
• The disk numbered 1 is redundant for blocks numbered 1, 5, 9, and so on; disk 2 is redundant for blocks 2, 6, 10, …; and disk 3 is redundant for 3, 7, 11, ….
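The rotation rule above is just i mod (n + 1). A small sketch (not from the slides) of which disk holds the redundant copy of each cylinder:

  # Which disk is redundant for cylinder i, under the RAID 5 rotation above
  # (disks are numbered 0..n, so there are n + 1 disks in total).
  def redundant_disk(cylinder: int, n: int) -> int:
      return cylinder % (n + 1)

  # With n = 3 (4 disks): cylinders 0..11 map to disks 0, 1, 2, 3, 0, 1, 2, 3, ...
  print([redundant_disk(i, 3) for i in range(12)])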
Coping With Multiple Disk Crashes

• Error-correcting-code theory, in particular the Hamming code, leads to RAID level 6.
• With this strategy, two simultaneous crashes are correctable.
• For example, with seven disks, of which disks 1–4 hold data and disks 5–7 are redundant:
  – The bits of disk 5 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 3.
  – The bits of disk 6 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 4.
  – The bits of disk 7 are the modulo-2 sum of the corresponding bits of disks 1, 3, and 4.
Coping With Multiple Disk Crashes – Reading/Writing

• We may read data from any data disk normally.
• To write a block of some data disk, we compute the modulo-2 sum of the new and old versions of that block. These bits are then added, in a modulo-2 sum, to the corresponding blocks of all those redundant disks that have a 1 in the row in which the written disk also has a 1.
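Below is a minimal sketch (not from the slides) of that write rule, using the disk-5/6/7 pattern listed above. The redundancy pattern is expressed simply as a mapping from each redundant disk to the data disks it covers, and the block contents are made-up 8-bit values chosen so the parities hold initially.

  # Sketch of the RAID 6 write rule: PATTERN records which data disks each
  # redundant disk covers (disk 5 = 1,2,3; disk 6 = 1,2,4; disk 7 = 1,3,4).
  def xor(a: str, b: str) -> str:
      return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

  PATTERN = {5: (1, 2, 3), 6: (1, 2, 4), 7: (1, 3, 4)}

  def write_block(disks: dict, data_disk: int, new_block: str):
      change = xor(disks[data_disk], new_block)      # modulo-2 sum of old and new
      disks[data_disk] = new_block
      for r, covered in PATTERN.items():
          if data_disk in covered:                   # redundant disks covering it
              disks[r] = xor(disks[r], change)

  # Toy example with 8-bit blocks; the redundant disks start out consistent.
  disks = {1: "11110000", 2: "10101010", 3: "00111000", 4: "01000001",
           5: "01100010", 6: "00011011", 7: "10001001"}
  write_block(disks, 2, "00000000")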
UPDATES

• Stable storage: a technique for recovering from the disk failure known as media decay, in which, if we overwrite a file, the new data is not read back correctly.
• Need for RAID 5: RAID level 4 suffers from a bottleneck, because every update to a data disk also needs to read and write the single redundant disk.
13.6 REPRESENTING BLOCK AND RECORD ADDRESSES
Ramya Karri
CS257 Section 2
ID: 206
Introduction

• Addresses of blocks and records
  – In main memory:
    • The address of a block is the virtual-memory address of its first byte.
    • The address of a record within the block is the virtual-memory address of the first byte of the record.
  – In secondary memory: a sequence of bytes describes the location of the block in the overall system, e.g., the device ID for the disk, the cylinder number, etc.
Addresses in Client-Server Systems

• Addresses in the address space are represented in two ways:
  – Physical addresses: byte strings that determine the place within the secondary storage system where the record can be found.
  – Logical addresses: arbitrary strings of bytes of some fixed length.
• Physical address bits are used to indicate:
  – The host to which the storage is attached
  – An identifier for the disk
  – The number of the cylinder
Addresses in Client-Server Systems (contd..)

• A map table relates logical addresses to physical addresses.

[Figure: the map table, with a logical-address column mapped to a physical-address column.]
Logical and Structured Addresses

• Purpose of a logical address?
• It gives more flexibility when we:
  – Move a record around within the block
  – Move a record to another block
• It also gives us an option of deciding what to do when a record is deleted.

[Figure: a block with a header and an offset table at one end, records 1–4 at the other end, and unused space in between.]
Pointer Swizzling

• Having pointers is common in object-relational database systems.
• It is important to learn about the management of pointers.
• Every data item (block, record, etc.) has two addresses:
  – its database address: the address on disk, and
  – its memory address, if the item is currently in virtual memory.
Pointer Swizzling (Contd…)

• Translation table: maps database addresses to memory addresses.

[Figure: the translation table, with a Dbaddr column mapped to a Mem-addr column.]

• All addressable items in the database have entries in the map table, while only those items currently in main memory are mentioned in the translation table.
Pointer Swizzling (Contd…)

• A pointer consists of the following two fields:
  – A bit indicating the type of address
  – The database or memory address itself
Example 13.17

[Figure: Block 1 on disk and its copy in memory; a swizzled pointer refers to the memory copy of Block 1, while an unswizzled pointer still refers to Block 2 on disk.]
Example 13.7

• Block 1 has a record with pointers to a second record on the same block and to a record on another block (Block 2).
• If Block 1 is copied to memory:
  – The first pointer, which points within Block 1, can be swizzled so it points directly to the memory address of the target record.
  – Since Block 2 is not in memory, we cannot swizzle the second pointer.
Pointer Swizzling (Contd…)

• Three types of swizzling:
  – Automatic swizzling
    • As soon as a block is brought into memory, swizzle all relevant pointers.
  – Swizzling on demand
    • Only swizzle a pointer if and when it is actually followed.
  – No swizzling
    • Pointers are not swizzled; they are accessed using the database address.
Programmer Control of Swizzling

• Unswizzling
  – When a block is moved from memory back to disk, all pointers must go back to database (disk) addresses.
  – The translation table is used again for this.
  – It is important to have an efficient data structure for the translation table.
Pinned Records and Blocks

• A block in memory is said to be pinned if it cannot be written back to disk safely.
• If block B1 has a swizzled pointer to an item in block B2, then B2 is pinned.
• To unpin a block, we must unswizzle any pointers to it:
  – Keep in the translation table the places in memory holding swizzled pointers to that item.
  – Unswizzle those pointers (use the translation table to replace the memory addresses with database (disk) addresses).
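To make the translation-table mechanics of the last few slides concrete, here is a minimal Python sketch (not from the text). The class, its fields, and the address strings are all illustrative assumptions; a real DBMS works with byte strings and raw memory offsets rather than Python dictionaries.

  # Illustrative translation table supporting swizzle-on-demand and unswizzling.
  class TranslationTable:
      def __init__(self):
          self.db_to_mem = {}        # database address -> memory address
          self.swizzled_at = {}      # database address -> locations holding swizzled pointers

      def load(self, db_addr: str, mem_addr: str):
          self.db_to_mem[db_addr] = mem_addr

      def swizzle(self, pointer: dict, location: str):
          # Swizzle on demand: replace a database address with a memory address.
          db_addr = pointer["addr"]
          if pointer["kind"] == "db" and db_addr in self.db_to_mem:
              pointer["kind"], pointer["addr"] = "mem", self.db_to_mem[db_addr]
              self.swizzled_at.setdefault(db_addr, []).append(location)

      def unswizzle_all(self, db_addr: str, pointers: dict):
          # Before writing the target back to disk, restore database addresses.
          for loc in self.swizzled_at.pop(db_addr, []):
              pointers[loc]["kind"], pointers[loc]["addr"] = "db", db_addr

  # Usage: a pointer in block B1 to a record whose block has been loaded.
  tt = TranslationTable()
  tt.load("db:block2/rec1", "mem:0x7f00")
  p = {"kind": "db", "addr": "db:block2/rec1"}
  tt.swizzle(p, "B1/ptr0")
  tt.unswizzle_all("db:block2/rec1", {"B1/ptr0": p})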
CS 257: Database System Principles
Slides: Variable-Length Data and Records
By: Arunesh Joshi (107)
ID: 006538558
CS257_107_ch13_13.7
Agenda

• Records With Variable-Length Fields
• Records With Repeating Fields
• Variable-Format Records
• Records That Do Not Fit in a Block
• BLOBs
• Column Stores
Records With Variable-Length Fields

A simple but effective scheme is to put all fixed-length fields ahead of the variable-length fields. We then place in the record header:
1. The length of the record.
2. Pointers to (i.e., offsets of) the beginnings of all the variable-length fields. However, if the variable-length fields always appear in the same order, then the first of them needs no pointer; we know it immediately follows the fixed-length fields.
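A small sketch (not from the text) of this layout, packing a record with one fixed-length field and two variable-length fields; the field names, field sizes, and two-byte header entries are made-up choices for illustration only.

  import struct

  # Sketch of the layout above: the header holds the record length and the offset
  # of the second variable-length field; the first variable-length field needs no
  # offset because it immediately follows the fixed-length fields.
  def pack_record(gender: bytes, name: bytes, address: bytes) -> bytes:
      header_size = 4                      # 2 bytes length + 2 bytes offset of 'address'
      fixed = gender                       # all fixed-length fields come first
      addr_offset = header_size + len(fixed) + len(name)
      total_len = addr_offset + len(address)
      header = struct.pack("<HH", total_len, addr_offset)
      return header + fixed + name + address

  def unpack_address(record: bytes) -> bytes:
      total_len, addr_offset = struct.unpack_from("<HH", record, 0)
      return record[addr_offset:total_len]

  rec = pack_record(b"F", b"Clint Eastwood", b"Hollywood Blvd")
  print(unpack_address(rec))               # b'Hollywood Blvd'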
Records With Repeating Fields

• A similar situation occurs if a record contains a variable number of occurrences of a field F, but the field itself is of fixed length. It is sufficient to group all occurrences of field F together and put in the record header a pointer to the first.
• We can locate all the occurrences of field F as follows. Let the number of bytes devoted to one instance of field F be L. We then add to the offset for field F all integer multiples of L, starting at 0, then L, 2L, 3L, and so on.
• Eventually we reach the offset of the field following F, whereupon we stop.
• An alternative representation is to keep the record of fixed length and put the variable-length portion, be it fields of variable length or fields that repeat an indefinite number of times, on a separate block. In the record itself we keep:
  1. Pointers to the place where each repeating field begins, and
  2. Either how many repetitions there are, or where the repetitions end.
[Figure: storing variable-length fields separately from the record.]
Variable-Format Records

The simplest representation of variable-format records is a sequence of tagged fields, each of which consists of:
1. Information about the role of this field, such as:
   (a) the attribute or field name,
   (b) the type of the field, if it is not apparent from the field name and some readily available schema information, and
   (c) the length of the field, if it is not apparent from the type.
2. The value of the field.
There are at least two reasons why tagged fields make sense:
1. Information-integration applications. Sometimes a relation has been constructed from several earlier sources, and these sources have different kinds of information. For instance, our movie-star information may have come from several sources, one of which records birthdates, some give addresses, others do not, and so on. If there are not too many fields, we are probably best off leaving NULL those values we do not know.
2. Records with a very flexible schema. If many fields of a record can repeat and/or not appear at all, then even if we know the schema, tagged fields may be useful. For instance, medical records may contain information about many tests, but there are thousands of possible tests, and each patient has results for relatively few of them.
[Figure: a record with tagged fields.]
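A minimal sketch (not from the text) of tagged fields, where each field carries its name, a type code, and a length ahead of its value; the encoding format, type code, and example values are all assumptions made for illustration.

  # Illustrative tagged-field encoding: each field is self-describing, so records
  # need not share a fixed schema.
  def encode_tagged(fields: dict) -> bytes:
      out = b""
      for name, value in fields.items():
          data = str(value).encode()
          tag = f"{name}:S:{len(data)}:".encode()   # name, type code, length
          out += tag + data
      return out

  def decode_tagged(buf: bytes) -> dict:
      fields, i = {}, 0
      while i < len(buf):
          name, type_code, length, rest = buf[i:].split(b":", 3)
          n = int(length)
          fields[name.decode()] = rest[:n].decode()
          i += len(name) + len(type_code) + len(length) + 3 + n
      return fields

  rec = encode_tagged({"name": "Clint Eastwood", "birthdate": "1930-05-31"})
  print(decode_tagged(rec))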
Records That Do Not Fit in a Block

• These large values have a variable length, but even if the length is fixed for all values of the type, we need special techniques to represent them. In this section we consider a technique called “spanned records” that can be used to manage records that are larger than blocks.
• Spanned records are also useful in situations where records are smaller than blocks, but packing whole records into blocks wastes significant amounts of space.
• For both these reasons, it is sometimes desirable to allow records to be split across two or more blocks. The portion of a record that appears in one block is called a record fragment.
If records can be spanned, then every record and record fragment requires some extra header information:
1. Each record or fragment header must contain a bit telling whether or not it is a fragment.
2. If it is a fragment, then it needs bits telling whether it is the first or last fragment for its record.
3. If there is a next and/or previous fragment for the same record, then the fragment needs pointers to these other fragments.
[Figure: storing spanned records across blocks.]
BLOBs

• Binary Large OBjects (BLOBs).
• BLOBs can be images, movies, audio files, and other very large values that can be stored in files.
• Storing BLOBs:
  – Stored in several blocks.
  – It is preferable to store them consecutively on a cylinder, or across multiple disks, for efficient retrieval.
• Retrieving BLOBs:
  – A client retrieving a 2-hour movie may not want it all at the same time.
  – Retrieving a specific part of the large value requires an index structure to make it efficient (example: an index by seconds on a movie BLOB).
Column Stores

An alternative to storing tuples as records is to store each column as a record. Since an entire column of a relation may occupy far more than a single block, these records may span many blocks, much as long files do. If we keep the values in each column in the same order, then we can reconstruct the relation from the column records, as in the small example sketched below.
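A minimal sketch (not from the text) of the column-store idea, using a small made-up two-column relation; because both columns preserve the same row order, the tuples can be rebuilt by position.

  # Illustrative column-store layout: each column is kept as its own "record"
  # (here, a list), with values in the same row order across columns.
  rows = [("Clint Eastwood", 1930), ("Meryl Streep", 1949), ("Tom Hanks", 1956)]

  # Store each column separately.
  name_column = [name for name, _ in rows]
  year_column = [year for _, year in rows]

  # Because both columns preserve row order, the relation can be reconstructed.
  reconstructed = list(zip(name_column, year_column))
  assert reconstructed == rows
  print(reconstructed)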
RECORD MODIFICATION
Akshay Shenoy
Class ID: 108
Topic 13.8
Professor: T.Y. Lin
INTRODUCTION

• What is a record?
  A record is a single, implicitly structured data item in a database table. A record is also called a tuple.
• What is record modification?
  We say records are modified when a data-manipulation operation is performed on them.
STRUCTURE OF A RECORD

• Record structure for a PERSON table:
  CREATE TABLE PERSON (
      NAME CHAR(30),
      ADDRESS CHAR(256),
      GENDER CHAR(1),
      BIRTHDATE CHAR(10)
  );
TYPES OF RECORDS

• Fixed-length records:
  CREATE TABLE SJSUSTUDENT (
      STUDENT_ID INT(9) NOT NULL,
      PHONE_NO INT(10) NOT NULL
  );
• Variable-length records:
  CREATE TABLE SJSUSTUDENT (
      STUDENT_ID INT(9) NOT NULL,
      NAME CHAR(100),
      ADDRESS CHAR(100),
      PHONE_NO INT(10) NOT NULL
  );
RECORD MODIFICATION

• Modifications of records:
  – Insert
  – Update
  – Delete
• There are issues even with fixed-length records.
• There are more issues with variable-length records.
STRUCTURE OF A BLOCK & RECORDS

• Various records are clubbed together and stored together in blocks in memory.

[Figure: structure of a block.]
BLOCKS & RECORDS

• If records need not be in any particular order, then we just find a block with enough empty space.
• We keep track of all records/tuples in a relation/table using index structures and file-organization concepts.
INSERTING NEW RECORDS

• If records are not required to be in a particular order, just find a block with empty space and place the record in that block (e.g., heap files).
• What if the records are to be kept in a particular order (e.g., sorted by primary key)?
• Locate the appropriate block, check whether space is available in the block, and if so place the record in that block.
INSERTING NEW RECORDS (con’t)

• We may have to slide the records in the block to place the new record at the appropriate position in the block, and suitably edit the block header, as in the sketch below.
WHAT IF THE BLOCK IS FULL?

• We need to keep the record in a particular block, but the block is full. How do we deal with it?
• We find room outside the block.
• There are two approaches to finding room for the record:
  I. Find space on a nearby block.
  II. Create an overflow block.
APPROACHES TO FINDING ROOM FOR A RECORD

• Find space on a nearby block:
  – Block B1 has no space.
  – If space is available on block B2, move records of B1 to B2.
  – If there are external pointers to the records of B1 that were moved to B2, leave a forwarding address in the offset table of B1.
APPROACHES TO FINDING ROOM FOR A RECORD (con’t)

• Create an overflow block:
  – Each block B has in its header a pointer to an overflow block where additional records of B can be placed.
DELETION

• Try to reclaim the space made available by the deletion of a record.
• If an offset table is used to store information about the records of the block, then rearrange/slide the remaining records.
• If sliding of records is not possible, then maintain a space-available list to keep track of the space available in the block.
TOMBSTONE

• What about pointers to deleted records?
• A tombstone is placed in place of each deleted record.
• A tombstone is a bit placed at the first byte of the deleted record to indicate that the record was deleted (0: not deleted, 1: deleted).
• A tombstone is permanent.
UPDATING RECORDS

• For fixed-length records, there is no effect on the storage system.
• For variable-length records:
  – If the length increases, we may have to slide the records, as for insertion.
  – If the length decreases, we update the space-available list, recover the space, and/or eliminate overflow blocks, as for deletion.