CSE2132 Database Systems
Week 10 Lecture
Physical Database Design - File Structures
Physical DB Design 10.1

Data Structures - What will we cover?
• Underlying data structures
  – File organizations
  – Access modes
  – Binary trees
  – B+ trees
• Oracle data structures

Underlying Data Structures
• Data structures are the bricks and mortar that hold databases together.
• In the ANSI/SPARC architecture, data structures are defined at the internal model level and implemented in the physical data organization.
• Data structures are often hidden from the application programmer, since they are used primarily by the DBMS and the operating system.
• A good understanding and choice of data structures matters for machine performance, improves program design, and makes communication with DBMS specialists easier.

File Organization
• A file organization is a technique for physically arranging the records of a file on a secondary storage device.

File organizations
  Sequential
  Indexed
    Sequential (block index)
      Hardware-dependent (ISAM)
      Hardware-independent (VSAM)
    Non-sequential (full index)
  Direct
    Relative-addressed
    Hash-addressed

Record Access Modes
• Sequential Access
  In sequential access, processing starts at a designated point, usually the beginning of the file, and proceeds in a linear sequence through the file. Each record can be retrieved only by passing over all the records that physically precede it.
• Random Access
  In random access, a given record is accessed "out of the blue" without referencing other records in the file.

File Organization and Access Mode
• A file organization is established when the file is created, and is rarely changed. However, the record access mode can change each time the file is used.

  File organization    Sequential access    Random access
  Sequential           Yes                  No (impractical)
  Indexed Seq.         Yes                  Yes
  Direct-Relative      Yes                  Yes
  Direct-Hashed        No (impractical)     Yes

Indexed Sequential Architecture (Partial Index)
[Figure: an index set (many levels) sits above a sequence set, which leads to the actual data records; the records are grouped into control intervals within a control area.]

Direct - Relative Files
• Each record can be retrieved by specifying its relative record number. The relative record number is a number from 0 to n that gives the position of the record relative to the beginning of the file.
• This provides a method of direct file organization. Both sequential and direct access are supported, but a key allocation suitable for this method is not always easy or possible.

Direct - Hashed Files
• Hashed file organization suits applications that update and retrieve records in random mode and rarely need sequential access to the data records (e.g. reservation systems). It provides rapid access to individual records based on a key.
• The major disadvantage of hash organization is that sequential access is not convenient, because the records are not stored in primary key sequence. Highly concurrent environments doing random access are, however, well suited to hash organization.
• The basis of a hash file is an addressing algorithm which transforms the record identifier into a relative address.

Components of a Hashed File
[Figure: an identifier passes through a transformation to one of the buckets 1 to b in the primary storage area; each bucket holds slots 1 to s; a bucket overflow technique places records that do not fit into the overflow storage area.]

Hashed File Design
Load Factor (Fill Factor): The load factor is the percentage of the space allocated to the file that is taken up by the records in the file. A low load factor reduces the number of records that overflow their home addresses. It is common to use 50% to 80%, with a lower load factor for files that will grow.
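The addressing algorithm, buckets, slots, overflow area and load factor described above can be sketched in Python. This is a minimal illustration under stated assumptions, not any particular DBMS's implementation: the division-remainder hash, the single shared overflow list, and the names HashedFile and home_address are all invented for the example, and the load factor is measured against the primary storage area only.

```python
from typing import List, Optional, Tuple


def home_address(key: int, num_buckets: int) -> int:
    """Assumed addressing algorithm: division-remainder hashing.

    Transforms the record identifier into a relative bucket address.
    """
    return key % num_buckets


class HashedFile:
    """Toy hashed file: b buckets of s slots each, plus one overflow area."""

    def __init__(self, num_buckets: int, slots_per_bucket: int) -> None:
        self.num_buckets = num_buckets
        self.slots_per_bucket = slots_per_bucket
        self.primary: List[List[Tuple[int, str]]] = [[] for _ in range(num_buckets)]
        self.overflow: List[Tuple[int, str]] = []

    def insert(self, key: int, record: str) -> None:
        bucket = self.primary[home_address(key, self.num_buckets)]
        if len(bucket) < self.slots_per_bucket:
            bucket.append((key, record))          # fits in its home bucket
        else:
            self.overflow.append((key, record))   # home bucket full: overflow

    def find(self, key: int) -> Optional[str]:
        # Search the home bucket first, then fall back to the overflow area.
        for k, record in self.primary[home_address(key, self.num_buckets)]:
            if k == key:
                return record
        for k, record in self.overflow:
            if k == key:
                return record
        return None

    def load_factor(self) -> float:
        """Records stored divided by primary slots available (a fraction, not %)."""
        stored = sum(len(b) for b in self.primary) + len(self.overflow)
        return stored / (self.num_buckets * self.slots_per_bucket)


demo = HashedFile(num_buckets=5, slots_per_bucket=2)
for key, record in [(10, "A"), (15, "B"), (20, "C"), (7, "D")]:
    demo.insert(key, record)

# Keys 10, 15 and 20 all hash to bucket 0, so the third one overflows.
print(len(demo.overflow), demo.load_factor(), demo.find(20))
```

With more buckets for the same four records the load factor drops and the three colliding keys are less likely to share a home address, which is the trade-off the 50% to 80% guideline refers to.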
Bucket Capacity: Increasing the bucket capacity also reduces the number of overflows, and hence the average search length.
[Figure: average search length (y-axis, 1.0 to 1.5) against load factor in % (x-axis, 20 to 100), with one curve for each of b = 1, 2, 3 and 4, where b = records per bucket.]

Comparison of Organizations
[Figure: Sequential: records such as ASTEROIDS, BREAKOUT, COMBAT and ZAXXON are stored in key order from the start of the file. Indexed Sequential: an index on the key (entries such as H, P and Z) leads to the records, from ASTEROIDS through MEGAMANIA to ZAXXON.]

Comparison of Organizations (2)
[Figure: Direct - Relative: records such as CHESS, COMBAT, DEFENDER and ZAXXON are located by relative record number, 1 to n. Direct - Hashed: a hashing routine transforms the key into a relative record number, 1 to n; the records shown include PITFALL, BERSERK, ODYSSEY and DONKEY KONG.]

Binary Trees
• A non-linear data structure in which each element may have several "next" elements (branching).
• A binary tree has a maximum of two branches per element or node.
• A node consists of some data and a maximum of two pointers: a left pointer to the left branch and a right pointer to the right branch. If there is no left or right branch, a nil pointer is used.

A Diagram of a Binary Tree
[Figure: the basic binary tree record for PRODUCT consists of PRODUCT# (the primary key), LLINK (the less-than pointer), RLINK (the greater-than pointer) and the data. The tree is then built step by step: (1) initial tree, (2) insert 1000, (3) insert 1600, (4) insert 0350, (5) insert 2000, (6) insert 0975, (7) insert 0625.]

An Example of a Binary Tree
[Figure: a binary tree rooted at 1000 containing the keys 0100, 0350, 0625, 0975, 1000, 1250, 1425, 1600, 1775 and 2000.]
Task: Indicate the different traversals on this diagram.

B Trees
• The problem with binary trees is balance: the tree can easily deteriorate into a linked list, and the reduced search times are then lost. B trees overcome this problem.
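The balance problem can be seen with a short Python sketch. It inserts the six keys from the binary tree diagram twice, once in the order used there and once in ascending order. The Node class and the helper names insert, build, height and inorder are assumptions made for this illustration; None plays the role of the nil pointer.

```python
from typing import Iterable, List, Optional


class Node:
    """One tree record: a key plus left and right pointers (None acts as nil)."""

    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Insert key into the tree and return the root (new if the tree was empty)."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:                 # follow the less-than pointer
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:                              # follow the greater-than pointer
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def build(keys: Iterable[int]) -> Optional[Node]:
    root: Optional[Node] = None
    for key in keys:
        root = insert(root, key)
    return root


def height(root: Optional[Node]) -> int:
    """Number of levels in the tree; an empty tree has height 0."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def inorder(root: Optional[Node]) -> List[int]:
    """In-order traversal: left subtree, node, right subtree."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


slide_order = [1000, 1600, 350, 2000, 975, 625]
bushy = build(slide_order)            # insertion order from the diagram
chain = build(sorted(slide_order))    # same keys, ascending order

print(height(bushy), height(chain))   # prints: 4 6
```

With ascending input every key goes to the right of the previous one, so the tree has one node per level and a search degrades to a linear scan. Both trees still yield the keys in sorted order under an in-order traversal.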
B is usually taken to stand for Balanced: all the leaves are the same distance from the root. B trees guarantee a predictable efficiency.
• There are several varieties of B trees; most applications use the B+ tree. A B+ tree of degree m has the following properties:
  1. All leaves are at the same level, that is, the same depth from the root.
  2. A non-leaf node that has n branches will contain n-1 keys.

Example of a B Tree
[Figure: a B tree holding the keys 0350, 0625, 1000, 1250, 1277, 1282, 1291, 1425, 1600, 2000 and 2107.]
A B tree provides balance and quick direct access, but sequential processing can be slow. Because of this the B+ tree was introduced. In a B+ tree all key values occur in a leaf node so that sequential processing can be supported. This means that the leaf nodes have a different structure from the higher-level nodes, and some key values will occur twice in the tree.

B+ Tree Node Structure
A high-level node: P1 K1 P2 K2 ... P(n-1) K(n-1) Pn
  P1      pointer to subtree for keys < K1
  P2      pointer to subtree for keys >= K1 and < K2
  ...
  P(n-1)  pointer to subtree for keys >= K(n-2) and < K(n-1)
  Pn      pointer to subtree for keys >= K(n-1)
A leaf node (every key value appears in a leaf node): P1 K1 P2 K2 ... P(n-1) K(n-1) Pn
  P1      pointer to record (block) with key K1
  P2      pointer to record (block) with key K2
  ...
  P(n-1)  pointer to record (block) with key K(n-1)
  Pn      pointer to leaf with smallest key greater than K(n-1)

Example of a B+ Tree
[Figure: a B+ tree over the keys 0350, 0625, 1000, 1250, 1300, 1425, 1600 and 2000. The upper levels branch on < and >= comparisons (for example at 1250), and the leaf nodes point to the actual data records.]

Building a B+ Tree
Keys to insert, in order: 67, 89, 123, 18, 34, 87, 99, 104, 36, 55, 78, 9
[Figure: the first insertions, showing the root node, the leaf nodes and the data records. When a third key arrives at a node that already holds two, the 3 keys do not fit, so the node is split and the middle value is promoted.]

A Review of Trees
• Can permit rapid retrieval of data for both random and sequential processing.
• Can be used based on primary or secondary keys.
• Trees are special cases of networks; in networks, records from different files are joined without a strict hierarchy being observed.

Indexes in Oracle (1)
    CREATE [UNIQUE | BITMAP] INDEX index ON table (column [, column] ...);
• An index is a schema object that contains an entry for each value that appears in the indexed column(s) of the table or cluster and provides direct, fast access to rows.
• Indexes may be created on
  – one or more (up to 32) columns of a table, a partitioned table, or a cluster;
  – one or more scalar typed object attributes of a table or a cluster.
• It is preferable to declare a primary key when creating the table, as CREATE UNIQUE INDEX will fail if the indexed columns already contain duplicates.

Indexes in Oracle (2)
• An index is an ordered list of all the values that reside in a group of one or more columns at a given time. Such a list makes queries that test the values in those columns far more efficient. Indexes also take up storage space, and must be changed whenever the data is, so a cost-benefit analysis must be made in each case to determine whether and how indexes should be used.
• Oracle can use indexes to improve performance when:
  – searching for rows with specified index column values
  – accessing tables in index column order
• When you initially insert rows into a new table, it is generally faster to create the table, insert the rows, and then create the index. If you create the index before inserting the rows, Oracle must update the index for every row inserted.

Indexes in Oracle (3)
• Multiple Indexes Per Table
  Unlimited indexes can be created for a table provided that the combination of columns differs for each index. You can create more than one index using the same columns provided that you specify distinctly different combinations of the columns, that is, a different column order.
For example, the following statements specify valid combinations:
    CREATE INDEX emp_idx1 ON emp (ename, job);
    CREATE INDEX emp_idx2 ON emp (job, ename);
• Note that each index increases the processing time needed to maintain the table during updates to indexed data. Thus, updating a table with a single index takes less time than updating the same table with five indexes.

Indexes in Oracle (4) - Nulls
• Table rows in which all key columns are NULL are not indexed. Consider the following statement:
    SELECT ename FROM emp WHERE comm IS NULL;
  The above query does not use an index created on the COMM column, since the rows it is looking for have no entries in that index.

Indexes in Oracle (5) - Bitmap Index
• Bitmap indexes store the rowids associated with a key value as a bitmap. Each bit in the bitmap corresponds to a possible ROWID, and if the bit is set, it means that the row with the corresponding ROWID contains the key value. The internal representation of bitmaps is best suited for applications with low levels of concurrent transactions, such as data warehousing.
• Bitmap indexes are appropriate when there are few distinct values for the column that the index is created on. An example would be a flag column that holds either Y or N.
    CREATE BITMAP INDEX masterflagbitmap_ix ON film_copy(masterflag);
• The index holds one bitmap for each distinct value, with one bit for every row in the table:
    Y <11011001............>
    N <00100110............>

Clusters (1)
• A cluster is a schema object that contains one or more tables that all have one or more columns in common. Rows of one or more tables that share the same value in these common columns are physically stored together within the database.
• Clustering provides more control over the physical storage of rows within the database. Clustering can reduce both the time it takes to access clustered tables and the space needed to store the table.
After you create a cluster and add tables to it, the cluster is transparent. You can access clustered tables with SQL statements just as you can non-clustered tables.
• While clustering multiple tables improves the performance of joins, it is likely to reduce the performance of full table scans, INSERT statements, and UPDATE statements that modify cluster key values.

Clusters (2) - Creating an Indexed Cluster
• The rows of two related tables are interleaved in a single area called a cluster. The cluster key is the column or columns by which the tables are usually joined in a query.
    CREATE CLUSTER cluster (column datatype [, column datatype] ...);
  e.g.
    CREATE CLUSTER workerandskill (tempname varchar2(25));
  This sets aside a space. The column name is irrelevant, but the datatype must match that of Name in the table worker. Next, the tables to be included in the cluster are created.
    CREATE TABLE worker (
      Name Varchar2(25) not null,
      Age Number,
      Lodging Varchar2(15)
    ) CLUSTER workerandskill (Name);

Clusters (3) - Creating an Indexed Cluster
• Now a second table is added to the cluster.
    CREATE TABLE workerskill (
      Name Varchar2(25) not null,
      Skill Varchar2(25) not null,
      Ability Varchar2(15)
    ) CLUSTER workerandskill (Name);
• Prior to inserting rows into worker and workerskill you must create a cluster index.
    CREATE INDEX workerandskill_ix ON CLUSTER workerandskill;
  Note that no index columns are specified, since the index is automatically built on all the columns of the cluster key. For cluster indexes, all rows are indexed.

Example of a Cluster: Name is the Cluster Key
[Figure: for each Name value (ADAH TALBOT, ANDREW DYE, BART SARJEANT, DICK JONES, DONALD ROLLO, ELBERT TALBOT, JOHN PEARSON and others), the Age and Lodging columns from the WORKER table are stored together with the matching Skill and Ability rows from the WORKERSKILL table.]

Clusters (4) - Creating an Indexed Cluster
• Each cluster key value is stored only once. It is as if the cluster were a big table containing data drawn from both of the tables that make it up.
• You may want to use indexed clusters in the following cases:
  – Your queries retrieve rows over a range of cluster key values.
  – Your clustered tables may grow unpredictably.
• You cannot specify integrity constraints as part of the definition of a cluster key column. Instead, you can associate integrity constraints with the tables that belong to the cluster.

Clusters (5) - Creating a Hash Cluster
• In a hash cluster, Oracle stores together rows that have the same hash key value. The hash value for a row is the value returned by the cluster's hash function.
• When you create a hash cluster, you can either specify a hash function or use the Oracle internal hash function. Hash values are not actually stored in the cluster, although cluster key values are stored for every row in the cluster.
• You may want to use hash clusters in the following cases:
  – Your queries retrieve rows based on equality conditions involving all cluster key columns.
  – Your clustered tables are static, or you can determine the maximum number of rows and the maximum amount of space required by the cluster when you create the cluster.

Clusters (6) - Creating a Hash Cluster
• The following statement creates a hash cluster named PERSONNEL with the cluster key column DEPARTMENT_NUMBER.
    CREATE CLUSTER personnel ( department_number NUMBER ) HASHKEYS 500;
• The HASHKEYS clause creates the hash cluster, using an internal hash function, and specifies the number of hash values, which Oracle rounds up to the next prime number (503 in this case).
• Now create the tables, indicating the cluster in the CLUSTER clause.
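The HASHKEYS example above turns 500 into 503. The arithmetic can be checked with a short Python sketch, assuming the rounding goes upward to the next prime. The function names are hypothetical and the code only illustrates the calculation; it is not Oracle's internal routine.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small values used as HASHKEYS here."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def next_prime_at_or_above(n: int) -> int:
    """Smallest prime that is greater than or equal to n."""
    candidate = max(n, 2)
    while not is_prime(candidate):
        candidate += 1
    return candidate


print(next_prime_at_or_above(500))  # prints 503
```

Note that 499 is also prime and is closer to 500, so the value 503 only follows if the rounding is upward.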
