SECONDARY STORAGE MANAGEMENT
SECTIONS 13.1 - 13.3
Sanuja Dabade & Eilbroun Benjamin
CS 257 - Dr. TY Lin

Presentation Outline
13.1 The Memory Hierarchy
  13.1.1 The Memory Hierarchy
  13.1.2 Transfer of Data Between Levels
  13.1.3 Volatile and Nonvolatile Storage
  13.1.4 Virtual Memory
13.2 Disks
  13.2.1 Mechanics of Disks
  13.2.2 The Disk Controller
  13.2.3 Disk Access Characteristics
13.3 Accelerating Access to Secondary Storage
  13.3.1 The I/O Model of Computation
  13.3.2 Organizing Data by Cylinders
  13.3.3 Using Multiple Disks
  13.3.4 Mirroring Disks
  13.3.5 Disk Scheduling and the Elevator Algorithm
  13.3.6 Prefetching and Large-Scale Buffering

13.1.1 Memory Hierarchy
• Several components for data storage are available, each with a different capacity.
• The cost per byte of storage also varies.
• The devices with the smallest capacity offer the fastest speed at the highest cost per bit.

Memory Hierarchy Diagram
[Figure: memory hierarchy diagram. Labels: Programs, DBMS; Main-Memory DBMS's; Tertiary Storage; Disk (as virtual memory, file system); Main Memory; Cache.]

13.1.1 Memory Hierarchy (cont'd)
Cache
• Lowest level of the hierarchy.
• Data items are copies of certain locations of main memory.
• Sometimes values in the cache are changed and the corresponding changes to main memory are delayed.
• The machine looks in the cache for instructions as well as for the data those instructions use.
• Holds a limited amount of data.
• On a single-processor computer there is no need to update main memory immediately.
• With multiple processors, changes are written to main memory immediately; this is called write-through.

Main Memory
• Everything that happens in the computer (instruction execution, data manipulation) works on information that is resident in main memory.
• Main memory is random access: any byte can be obtained in the same amount of time.

Secondary Storage
• Used to store data and programs when they are not being processed.
• More permanent than main memory, as data and programs are retained when the power is turned off.
• E.g. magnetic disks, hard disks.

Tertiary Storage
• Holds data volumes measured in terabytes.
• Used for databases much larger than what can be stored on disk.

13.1.2 Transfer of Data Between Levels
• Data moves between adjacent levels of the hierarchy.
• At the secondary and tertiary levels, accessing the desired data or finding the desired place to store data takes a lot of time.
• The disk is organized into blocks.
• Entire blocks are moved to and from a region of main memory called a buffer.
• A key technique for speeding up database operations is to arrange the data so that when one piece of data on a block is needed, other data on the same block is likely to be needed at the same time.
• The same idea applies to the other levels of the hierarchy.

13.1.3 Volatile and Nonvolatile Storage
• A volatile device forgets the data stored on it when the power goes off.
• A nonvolatile device keeps its data for long periods, even when the device is turned off.
• All secondary and tertiary devices are nonvolatile; main memory is volatile.

13.1.4 Virtual Memory
• Typical software executes in virtual memory.
• The address space is typically 32 bits, i.e. 2^32 bytes or 4 GB.
• Transfer between memory and disk is in terms of blocks.

13.2.1 Mechanics of Disks
• Use of secondary storage is one of the important characteristics of a DBMS.
• A disk consists of 2 moving pieces:
  1. disk assembly
  2. head assembly
• The disk assembly consists of 1 or more platters.
• Platters rotate around a central spindle.
• Bits are stored on the upper and lower surfaces of the platters.
• Each surface is organized into tracks.
• The tracks that are at a fixed radius from the center form one cylinder.
• Tracks are organized into sectors.
• Sectors are segments of the circle separated by gaps.

13.2.2 The Disk Controller
• One or more disks are controlled by a disk controller.
• Disk controllers are capable of:
  - Controlling the mechanical actuator that moves the head assembly.
  - Selecting a sector from among all those in the cylinder at which the heads are positioned.
  - Transferring bits between the desired sector and main memory.
  - Possibly buffering an entire track.

13.2.3 Disk Access Characteristics
Accessing (reading or writing) a block requires 3 steps:
  1. The disk controller positions the head assembly at the cylinder containing the track on which the block is located. This is the seek time.
  2. The disk controller waits while the first sector of the block moves under the head. This is the rotational latency.
  3. All the sectors and the gaps between them pass under the head while the disk controller reads or writes the data in these sectors. This is the transfer time.

13.3 Accelerating Access to Secondary Storage
Several approaches for accessing data in secondary storage more efficiently:
• Place blocks that are accessed together in the same cylinder.
• Divide the data among multiple disks.
• Mirror disks.
• Use disk-scheduling algorithms.
• Prefetch blocks into main memory.
Two measures matter here:
• Scheduling latency: the added delay in accessing data caused by a disk-scheduling algorithm.
• Throughput: the number of disk accesses per second that the system can accommodate.

13.3.1 The I/O Model of Computation
• The number of block accesses (disk I/Os) is a good approximation of the running time of an algorithm, and should be minimized.
• Ex 13.3: You want to have an index on R to identify the block on which the desired tuple appears, but not where on the block it resides.
  For the Megatron 747 (M747) example disk, it takes 11 ms to read a 16 KB block. A standard microprocessor can execute millions of instructions in 11 ms, so the time spent searching the block for the desired tuple is negligible.

13.3.2 Organizing Data by Cylinders
• If we read all blocks on a single track or cylinder consecutively, we can neglect all but the first seek time and the first rotational latency.
• Ex 13.4: We request 1024 blocks of the M747.
  - If the data is randomly distributed, the average latency per block is 10.76 ms (by Ex 13.2), making the total latency about 11 s.
  - If all blocks are stored consecutively on 1 cylinder:
      6.46 ms + 8.33 ms × 16 ≈ 139.7 ms
      (1 average seek) + (time per rotation) × (number of rotations)

13.3.3 Using Multiple Disks
• If we have n disks, read/write throughput can increase by up to a factor of n.
• Striping: distributing a relation across multiple disks following this pattern:
    Data on disk R1: R1, R1+n, R1+2n, …
    Data on disk R2: R2, R2+n, R2+2n, …
    …
    Data on disk Rn: Rn, Rn+n, Rn+2n, …
• Ex 13.5: We request 1024 blocks with n = 4.
      6.46 ms + 8.33 ms × (16/4) ≈ 39.8 ms
      (1 average seek) + (time per rotation) × (number of rotations per disk)

13.3.4 Mirroring Disks
• Mirroring disks: having 2 or more disks hold identical copies of the data.
• Benefit 1: If n disks are mirrors of each other, the system can survive the crash of n-1 disks.
• Benefit 2: If we have n mirrored disks, read throughput can increase by up to a factor of n.
• Performance increases further if, for each read, the controller selects the disk whose head is closest to the desired data block.

13.3.5 Disk Scheduling and the Elevator Algorithm
• The disk controller runs this algorithm to select which of several pending requests to process first.
Pseudocode:
    requests = []                      // all data requests not yet processed
    upon receiving a new data request:
        requests.add(new request)
    while requests is not empty:
        move head to next location
        if head location holds data in requests:
            retrieve data
            remove data from requests
        if head reaches end:
            reverse head direction

13.3.5 Disk Scheduling and the Elevator Algorithm (cont'd)
Events, in order:
  1. Head starting point
  2. Request data at 8000
  3. Request data at 24000
  4. Request data at 56000
  5. Get data at 8000
  6. Request data at 16000
  7. Get data at 24000
  8. Request data at 64000
  9. Get data at 56000
  10. Request data at 40000
  11. Get data at 64000
  12. Get data at 40000
  13. Get data at 16000
[Figure: head position (cylinders 8000 to 64000) plotted against current time.]
    data      time
    8000..     4.3
    24000..   13.6
    56000..   26.9
    64000..   34.2
    40000..   45.5
    16000..   56.8

13.3.5 Disk Scheduling and the Elevator Algorithm (cont'd)
    Elevator Algorithm      FIFO Algorithm
    data      time          data      time
    8000..     4.3          8000..     4.3
    24000..   13.6          24000..   13.6
    56000..   26.9          56000..   26.9
    64000..   34.2          16000..   42.2
    40000..   45.5          64000..   59.5
    16000..   56.8          40000..   70.8

13.3.6 Prefetching and Large-Scale Buffering
• If, at the application level, we can predict the order in which blocks will be requested, we can load them into main memory before they are needed.

DATA REPRESENTATION
Recovery from Disk Crashes - 13.4
Presented By: Deepti Bhardwaj
Roll No. 223_103
SJSU ID: 006521307

Contents
• 13.4.5 Recovery from Disk Crashes
• 13.4.6 Mirroring as a Redundancy Technique
• 13.4.7 Parity Blocks
• 13.4.8 An Improvement: RAID 5
• 13.4.9 Coping With Multiple Disk Crashes

Recovery from Disk Crashes: Ways to Recover Data
• The most serious mode of failure for disks is the "head crash", where data is permanently destroyed.
• To reduce the risk of data loss from disk crashes, there are a number of schemes known as RAID (Redundant Arrays of Independent Disks) schemes.
Recovery from Disk Crashes: Ways to Recover Data (cont'd)
• Each scheme starts with one or more disks that hold the data (the data disks) and adds one or more disks that hold information completely determined by the contents of the data disks (the redundant disks).

Mirroring as a Redundancy Technique
• The mirroring scheme is referred to as RAID level 1 protection against data loss.
• In this scheme we mirror each disk. One of the disks is called the data disk and the other the redundant disk.
• The only way data can be lost is if there is a second disk crash while the first crash is being repaired.

Parity Blocks
• The RAID level 4 scheme uses only one redundant disk, no matter how many data disks there are.
• In the redundant disk, the ith block consists of the parity checks for the ith blocks of all the data disks.
• That is, the jth bits of the ith blocks of all the disks, data and redundant together, must contain an even number of 1's; the bit on the redundant disk is chosen to make this condition true.

Parity Blocks - Reading
• Reading a block from a data disk is the same as reading a block from any disk.
• We could also read the corresponding block from each of the other disks and compute the block of the disk we want by taking the modulo-2 sum:
    disk 2: 10101010
    disk 3: 00111000
    disk 4: 01100010
  If we take the modulo-2 sum of the bits in each column, we get
    disk 1: 11110000

Parity Blocks - Writing
• When we write a new block of a data disk, we need to change the corresponding block of the redundant disk as well.
• One approach is to read the corresponding blocks of all the other data disks, compute the modulo-2 sum together with the new block, and write the result to the redundant disk.
• With n data disks, this approach requires n-1 reads of data blocks, one write of the data block and one write of the redundant block. Total = n+1 disk I/Os.

Parity Blocks - Writing (cont'd)
• A better approach requires only four disk I/Os:
  1. Read the old value of the data block being changed.
  2. Read the corresponding block of the redundant disk.
  3. Write the new data block.
  4. Recalculate and write the block of the redundant disk.
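The parity rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each block is a single byte (the 8-bit patterns from the slides), and the helper names `parity_block` and `update_parity` are hypothetical, not taken from any RAID implementation.

```python
from functools import reduce

def parity_block(blocks):
    """Bytewise modulo-2 sum (XOR) of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def update_parity(old_data, new_data, old_parity):
    """Parity after one data block changes, using only the old data block,
    the new data block and the old parity block (the four-I/O approach)."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

# One-byte blocks taken from the slides (disks 1-3 hold data, disk 4 is redundant).
disk1 = bytes([0b11110000])
disk2 = bytes([0b10101010])
disk3 = bytes([0b00111000])
disk4 = parity_block([disk1, disk2, disk3])
assert disk4 == bytes([0b01100010])

# Reading disk 1 indirectly: modulo-2 sum of the other three disks.
assert parity_block([disk2, disk3, disk4]) == disk1

# Writing a new block to disk 2 (the new value is an arbitrary example).
new_disk2 = bytes([0b11001100])
new_disk4 = update_parity(disk2, new_disk2, disk4)
assert new_disk4 == parity_block([disk1, new_disk2, disk3])
```

The last assertion checks that the shortcut gives the same redundant block as recomputing the parity from all data disks, which is why the other data disks never need to be read.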
Parity Blocks - Failure Recovery
• If any one of the data disks crashes, we just have to compute the modulo-2 sum of the remaining disks to recover it.
• Suppose that disk 2 fails. We need to recompute each block of the replacement disk.
• We are given the corresponding blocks of the first and third data disks and of the redundant disk, so the situation looks like:
    disk 1: 11110000
    disk 2: ????????
    disk 3: 00111000
    disk 4: 01100010
  If we take the modulo-2 sum of each column, we deduce that the missing block of disk 2 is 10101010.

An Improvement: RAID 5
• RAID 4 is effective in preserving data unless there are two simultaneous disk crashes.
• Whatever scheme we use for updating the disks, every write must read and write the redundant disk's block.
• If there are n data disks, the number of writes to the redundant disk will be n times the average number of writes to any one data disk.
• However, we do not have to treat one disk as the redundant disk and the others as data disks. Rather, we could treat each disk as the redundant disk for some of the blocks. This improvement is often called RAID level 5.

An Improvement: RAID 5 (cont'd)
• For instance, if there are n + 1 disks numbered 0 through n, we could treat the ith cylinder of disk j as redundant if j is the remainder when i is divided by n + 1.
• For example, let n = 3, so there are 4 disks. The first disk, numbered 0, is redundant for its cylinders numbered 4, 8, 12, and so on, because these are the numbers that leave remainder 0 when divided by 4.
• The disk numbered 1 is redundant for cylinders numbered 1, 5, 9, and so on; disk 2 is redundant for cylinders 2, 6, 10, ...; and disk 3 is redundant for cylinders 3, 7, 11, ....

Coping With Multiple Disk Crashes
• The theory of error-correcting codes, in particular the Hamming code, leads to RAID level 6.
• With this strategy, two simultaneous crashes are correctable.
• In the example, disks 1 through 4 are data disks and disks 5 through 7 are redundant disks.
• The bits of disk 5 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 3.
• The bits of disk 6 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 4.
• The bits of disk 7 are the modulo-2 sum of the corresponding bits of disks 1, 3, and 4.

Coping With Multiple Disk Crashes - Reading/Writing
• We may read data from any data disk normally.
• To write a block of some data disk, we compute the modulo-2 sum of the new and old versions of that block.
• These bits are then added, in a modulo-2 sum, to the corresponding blocks of all those redundant disks that have a 1 in a row in which the written disk also has a 1.

UPDATES
• Stable storage: a way to recover from the disk failure known as media decay, in which, if we overwrite a file, the new data is not read correctly.
• Why RAID 5 is needed: RAID level 4 suffers from a bottleneck, because every update of a data disk has to read and write the redundant disk.

13.6 REPRESENTING BLOCK AND RECORD ADDRESSES
Ramya Karri
CS257 Section 2
ID: 206

Introduction
Address of a block and of a record
• In main memory:
  - The address of a block is the virtual memory address of its first byte.
  - The address of a record within the block is the virtual memory address of the first byte of the record.
• In secondary memory:
  - A sequence of bytes describes the location of the block in the overall system: the device ID for the disk, the cylinder number, etc.

Addresses in Client-Server Systems
Addresses in the address space are represented in two ways:
• Physical addresses: byte strings that determine the place within the secondary storage system where the record can be found.
• Logical addresses: arbitrary strings of bytes of some fixed length.
Physical address bits are used to indicate:
• The host to which the storage is attached
• An identifier for the disk
• The number of the cylinder

Addresses in Client-Server Systems (cont'd)
• The map table relates logical addresses to physical addresses.
[Figure: map table with two columns, Logical Address and Physical Address.]

Logical and Structured Addresses
Purpose of a logical address?
• Gives more flexibility when we:
  - Move the record around within the block
  - Move the record to another block
• Gives us the option of deciding what to do when a record is deleted.
[Figure: block layout with a header, an offset table, unused space, and records 4, 3, 2, 1.]

Pointer Swizzling
• Pointers are common in object-relational database systems.
• It is important to understand how pointers are managed.
• Every data item (block, record, etc.) has two addresses:
  - database address: its address on the disk
  - memory address: its address in virtual memory, if the item is currently in memory

Pointer Swizzling (cont'd)
• Translation table: maps database addresses to memory addresses.
[Figure: translation table with two columns, Database Address (Dbaddr) and Memory Address (Mem-addr).]
• All addressable items in the database have entries in the map table, while only those items currently in memory are mentioned in the translation table.

Pointer Swizzling (cont'd)
• A pointer consists of the following two fields:
  - A bit indicating the type of address
  - The database or memory address
Example 13.17
[Figure: Block 1 on disk and its copy in memory, with one swizzled pointer (target inside Block 1) and one unswizzled pointer (target in Block 2, still on disk).]
• Block 1 has a record with pointers to a second record on the same block and to a record on another block.
• Suppose Block 1 is copied to memory:
  - The first pointer, which points within Block 1, can be swizzled so it points directly to the memory address of the target record.
  - Since Block 2 is not in memory, we cannot swizzle the second pointer.

Pointer Swizzling (cont'd)
Three types of swizzling:
• Automatic swizzling: as soon as a block is brought into memory, swizzle all relevant pointers.
• Swizzling on demand: only swizzle a pointer if and when it is actually followed.
• No swizzling: pointers are never swizzled; items are accessed using the database address.
Programmer Control of Swizzling

Unswizzling
• When a block is moved from memory back to disk, all pointers must go back to database (disk) addresses.
• Use the translation table again.
• It is important to have an efficient data structure for the translation table.

Pinned Records and Blocks
• A block in memory is said to be pinned if it cannot be written back to disk safely.
• If block B1 has a swizzled pointer to an item in block B2, then B2 is pinned.
• To unpin a block, we must unswizzle any pointers to it:
  - Keep in the translation table the places in memory holding swizzled pointers to that item.
  - Unswizzle those pointers (use the translation table to replace the memory addresses with database (disk) addresses).

CS 255: Database System Principles slides: Variable Length Data and Records
By: Arunesh Joshi (107)
Id: 006538558
Cs257_107_ch13_13.7

Agenda
• Records With Variable-Length Fields
• Records With Repeating Fields
• Variable-Format Records
• Records That Do Not Fit in a Block
• BLOBs
• Column Stores

Records With Variable-Length Fields
A simple but effective scheme is to put all fixed-length fields ahead of the variable-length fields. We then place in the record header:
  1. The length of the record.
  2. Pointers to (i.e., offsets of) the beginnings of all the variable-length fields. However, if the variable-length fields always appear in the same order, then the first of them needs no pointer; we know it immediately follows the fixed-length fields.

Records With Repeating Fields
A similar situation occurs if a record contains a variable number of occurrences of a field F, but the field itself is of fixed length. It is sufficient to group all occurrences of field F together and put in the record header a pointer to the first.
We can locate all the occurrences of field F as follows. Let the number of bytes devoted to one instance of field F be L. We then add to the offset for field F all integer multiples of L, starting at 0, then L, 2L, 3L, and so on. Eventually we reach the offset of the field following F, whereupon we stop.
An alternative representation is to keep the record of fixed length and put the variable-length portion, be it fields of variable length or fields that repeat an indefinite number of times, on a separate block. In the record itself we keep:
  1. Pointers to the place where each repeating field begins, and
  2. Either how many repetitions there are, or where the repetitions end.
[Figure: storing variable-length fields separately from the record.]

Variable-Format Records
The simplest representation of variable-format records is a sequence of tagged fields, each of which consists of:
  1. Information about the role of this field, such as:
     (a) The attribute or field name,
     (b) The type of the field, if it is not apparent from the field name and some readily available schema information, and
     (c) The length of the field, if it is not apparent from the type.
  2. The value of the field.
There are at least two reasons why tagged fields would make sense.
  1. Information-integration applications. Sometimes a relation has been constructed from several earlier sources, and these sources hold different kinds of information. For instance, our movie-star information may have come from several sources, one of which records birthdates, while some give addresses and others do not, and so on. If there are not too many fields, we are probably best off leaving NULL those values we do not know.
  2. Records with a very flexible schema. If many fields of a record can repeat and/or not appear at all, then even if we know the schema, tagged fields may be useful. For instance, medical records may contain information about many tests, but there are thousands of possible tests, and each patient has results for relatively few of them.
[Figure: a record with tagged fields.]

Records That Do Not Fit in a Block
These large values have a variable length, but even if the length is fixed for all values of the type, we need special techniques to represent them. In this section we consider a technique called "spanned records" that can be used to manage records that are larger than blocks.
Spanned records are also useful in situations where records are smaller than blocks, but packing whole records into blocks wastes significant amounts of space.
For both these reasons, it is sometimes desirable to allow records to be split across two or more blocks. The portion of a record that appears in one block is called a record fragment.
If records can be spanned, then every record and record fragment requires some extra header information:
  1. Each record or fragment header must contain a bit telling whether or not it is a fragment.
  2. If it is a fragment, then it needs bits telling whether it is the first or last fragment for its record.
  3. If there is a next and/or previous fragment for the same record, then the fragment needs pointers to these other fragments.
[Figure: storing spanned records across blocks.]

BLOBs
• Binary Large OBjectS = BLOBs.
• BLOBs can be images, movies, audio files and other very large values that can be stored in files.
• Storing BLOBs
  - Stored in several blocks.
  - Preferable to store them consecutively on a cylinder, or on multiple disks, for efficient retrieval.
• Retrieving BLOBs
  - A client retrieving a 2-hour movie may not want it all at the same time.
  - Retrieving a specific part of the large data efficiently requires an index structure. (Example: an index by seconds on a movie BLOB.)

Column Stores
An alternative to storing tuples as records is to store each column as a record. Since an entire column of a relation may occupy far more than a single block, these records may span many blocks, much as long files do. If we keep the values in each column in the same order, then we can reconstruct the relation from the column records.
Consider this relation:
[Figure: example relation.]

RECORD MODIFICATION
AKSHAY SHENOY
CLASS ID: 108
Topic 13.8
Professor: T.Y. Lin

INTRODUCTION
• What is a record? A record is a single, implicitly structured data item in a database table. A record is also called a tuple.
• What is record modification? We say a record is modified when a data manipulation operation is performed on it.
STRUCTURE OF A RECORD
• Record structure for a PERSON table:
    CREATE TABLE PERSON (
        NAME      CHAR(30),
        ADDRESS   CHAR(256),
        GENDER    CHAR(1),
        BIRTHDATE CHAR(10)
    );

TYPES OF RECORDS
• Fixed-length records:
    CREATE TABLE SJSUSTUDENT (
        STUDENT_ID INT(9)  NOT NULL,
        PHONE_NO   INT(10) NOT NULL
    );
• Variable-length records (VARCHAR is used here so that the field lengths actually vary):
    CREATE TABLE SJSUSTUDENT (
        STUDENT_ID INT(9)  NOT NULL,
        NAME       VARCHAR(100),
        ADDRESS    VARCHAR(100),
        PHONE_NO   INT(10) NOT NULL
    );

RECORD MODIFICATION
• Modification of a record means:
  - Insert
  - Update
  - Delete
• There are issues even with fixed-length records.
• There are more issues with variable-length records.

STRUCTURE OF A BLOCK & RECORDS
• Records are grouped together and stored in memory in blocks.
[Figure: structure of a block.]

BLOCKS & RECORDS
• If records need not be in any particular order, just find a block with enough empty space.
• We keep track of all records/tuples in a relation/table using index structures and file organization concepts.

INSERTING NEW RECORDS
• If records are not required to be in a particular order, just find a block with empty space and place the record in it, e.g. heap files.
• What if the records are to be kept in a particular order (e.g. sorted by primary key)?
• Locate the appropriate block and check whether space is available in it; if yes, place the record in the block.
• We may have to slide the records within the block to place the new record at the appropriate position, and edit the block header accordingly.

WHAT IF THE BLOCK IS FULL?
• We need to keep the record in a particular block, but the block is full. How do we deal with it?
• We find room outside the block.
• There are 2 approaches to finding room for the record:
  I. Find space on a nearby block
  II. Create an overflow block

APPROACHES TO FINDING ROOM FOR RECORD
• Find space on a nearby block:
  - Block B1 has no space.
  - If space is available on block B2, move records of B1 to B2.
  - If there are external pointers to records of B1 that moved to B2, leave a forwarding address in the offset table of B1.

APPROACHES TO FINDING ROOM FOR RECORD
• Create an overflow block:
  - Each block B has in its header a pointer to an overflow block, where additional records of B can be placed.

DELETION
• Try to reclaim the space that becomes available in the block after a record is deleted.
• If an offset table is used to store information about the records in the block, rearrange/slide the remaining records.
• If sliding the records is not possible, maintain a SPACE-AVAILABLE LIST to keep track of the space available in the block.

TOMBSTONE
• What about pointers to deleted records?
• A tombstone is placed in place of each deleted record.
• A tombstone is a bit placed at the first byte of the deleted record to indicate that the record was deleted (0 = not deleted, 1 = deleted).
• A tombstone is permanent.

UPDATING RECORDS
• For fixed-length records, an update has no effect on the storage system.
• For variable-length records:
  - If the length increases, proceed as for insertion: slide the records.
  - If the length decreases, proceed as for deletion: update the space-available list and recover the space or eliminate overflow blocks.
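The deletion, tombstone and update rules above can be illustrated with a small in-memory model. This is a minimal sketch under loud assumptions: the `Block` class, its method names and the 32-byte capacity are all hypothetical, the offset table is modelled as a Python list of slots rather than byte offsets, and the tombstone is a leading flag byte as in the slides (0 = not deleted, 1 = deleted).

```python
TOMBSTONE = 1  # flag value from the slides: 0 = not deleted, 1 = deleted

class Block:
    """Hypothetical block: slot i of the offset table holds record i."""

    def __init__(self, capacity):
        self.capacity = capacity   # bytes available for records
        self.slots = []            # stand-in for the offset table

    def free_space(self):
        return self.capacity - sum(len(record) for record in self.slots)

    def insert(self, payload):
        record = bytearray([0]) + payload   # leading flag byte: not deleted
        if len(record) > self.free_space():
            raise ValueError("block full: use a nearby or overflow block")
        self.slots.append(record)
        return len(self.slots) - 1          # slot number serves as the address

    def delete(self, slot):
        # Reclaim the record's space but keep a permanent one-byte tombstone,
        # so that old pointers to this slot can tell the record is gone.
        self.slots[slot] = bytearray([TOMBSTONE])

    def read(self, slot):
        record = self.slots[slot]
        return None if record[0] == TOMBSTONE else bytes(record[1:])

    def update(self, slot, payload):
        record = self.slots[slot]
        if record[0] == TOMBSTONE:
            raise KeyError("record was deleted")
        growth = len(payload) - (len(record) - 1)
        if growth > self.free_space():      # a longer value behaves like an insertion
            raise ValueError("block full: use a nearby or overflow block")
        self.slots[slot] = bytearray([0]) + payload

block = Block(capacity=32)
alice = block.insert(b"alice")
bob = block.insert(b"bob")
block.delete(alice)
block.update(bob, b"robert")
```

After these calls, `block.read(alice)` returns `None` because the slot holds a tombstone, while `block.read(bob)` returns the updated value. A real block would store byte offsets and slide records physically; the list of slots only stands in for that bookkeeping.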