Tree Data Structures
Jim Skon

Indexing and Storage Levels
  Index structures can greatly speed access to file-based information.
  The index is processed in primary memory.

Single-Level Index
  One single index.
  Physical: the entire index is in primary memory at once.
  Logical: the index structure is a simple array.
  [Figure: index and file]

Two-Level (Multi-Level) Storage
  The index is partitioned into more than one part.
  Physical: only part of the index is in primary memory at any one time; the rest is in secondary memory.
  Logical: the index is a hierarchical structure.
  [Figure: multi-level index and file]

Tree Data Structures
  Tree: an acyclic connected graph.
  Binary tree: a finite set of nodes that is either empty or consists of a root and two disjoint binary trees, called the left and right subtrees.

Binary Trees
  Node levels:
    Root level = 1
    Level N = level of parent + 1
  Path length of a node x: the number of nodes traversed in proceeding from the root of the tree to node x (so a node at level i has path length i).
  Path length of a tree: the sum of the path lengths of all its nodes.
    PL = SUM(Ni * i) for all i, where Ni is the number of nodes at level i
  Average path length: the average path length to any node.
    PA = PL / n, where PL is the path length of the tree and n is the number of nodes

Binary Search Tree
  The keys are arranged in a binary tree as follows:
    All keys of the nodes in the left subtree of a node Ni precede the key in Ni.
    The key of a node Ni precedes all keys of the nodes in its right subtree.

Binary Search Tree: C++ Tree Management
  C++ has a pointer data type and facilities for controlling dynamic storage.
  A single binary tree node may be defined as follows:

    class node {
    public:
        int value;
        node *lchild;
        node *rchild;
    };

Binary Search Tree: Operations
  Search
  Sequential list
  Insert
  Remove

Binary Search Tree: Search(key, node)
  Returns a pointer to the node.
  1. If node is NULL, the search terminates unsuccessfully.
  2. If key = node->key, the search returns a pointer to node.
  3. If key < node->key, the left subtree of node is searched, i.e., Search(key, node->left).
  4.
     If key > node->key, the right subtree of node is searched, i.e., Search(key, node->right).

    node *search(int k, node *n) {
        // the key is held in the value member of class node
        if (n == NULL) return NULL;                           // empty subtree: not found
        else if (k == n->value) return n;                     // found
        else if (k < n->value) return search(k, n->lchild);   // continue in left subtree
        else return search(k, n->rchild);                     // continue in right subtree
    }

Binary Search Tree: Insert(key, data, node)
  Start with node = root.
  1. If node is NULL, create a new node holding key and data at this position; if the tree was empty, the new node becomes the root.
  2. If key = node->key, the insertion terminates unsuccessfully; the key is already in the tree.
  3. If key < node->key, insert into the left subtree of node: Insert(key, data, node->left).
  4. If key > node->key, insert into the right subtree of node: Insert(key, data, node->right).

Binary Search Tree: Sequential List (In Order)
  1. Visit the left subtree in order.
  2. Print the current node.
  3. Visit the right subtree in order.

Binary Search Tree: Deletions
  Four cases:

    Case    Left child    Right child
    1       null          null
    2       null          non-null
    3       non-null      null
    4       non-null      non-null

  Case one: left and right children null (leaf node).
    Just remove the node.
  Case two: left null and right non-null.
    Replace the node by its right subtree.
  Case three: left non-null and right null.
    Replace the node by its left subtree.
  Case four: left and right non-null.
    Option one: replace the node with its left subtree, and point the "rightmost" null link in the left subtree to the right subtree. This leads to long paths.
    Option two: replace the node with the node holding the largest key in the left subtree. This increases the path length less.

Binary Search Tree: Analysis
  Simple to implement.
  Running time depends on path length.
  Fast if "balanced".
  Degenerates to a linked list in the worst case.
  Questions:
    How can the tree be kept balanced?
    How expensive is the balancing operation?

Binary Search Tree: Balancing
  Balanced: the depths of the subtrees of each node differ by at most one.
  In principle, any data set can be placed in a balanced tree.
  Rebalancing can be a complex operation.
  Result of a balanced tree: all operations can take place in O(log2 n) time.

AVL Trees
  Developed by Adelson-Velskii and Landis.
  Also known as height-balanced trees.
  AVL tree features:
    For any node, the longest paths to leaves through each subtree differ by at most one.
    Insertions and deletions must perform balancing as needed to maintain this state.
    Searches do not change the tree, so no maintenance is needed.

AVL Trees: Insertion Cases
  Consider a node x with left and right subtrees l and r. A new node is inserted in l, causing the height of l to increase by one. hl and hr are the heights of the left and right subtrees, respectively, before the insertion.
  There are then three distinct cases to consider:
    hl = hr: the insertion causes hl > hr, but the balance criterion is not violated.
    hl < hr: the insertion causes hl = hr, and the balance is improved.
    hl > hr: the insertion violates the balance criterion, and the tree with node x as the root must be rebalanced.

AVL Trees: Insertion
  1. Follow the search path until it is verified that the key value is not already in the tree.
  2. Insert the new node in the tree.
  3. Retreat along the search path, checking the balance factor (hr - hl) at each node and rebalancing if necessary.

M-way Search Trees
  The performance of a search tree can be enhanced significantly by increasing the branching factor of the tree, that is, by increasing the maximum degree of each node.
  Definition: an m-way search tree is a tree in which each node has out-degree <= m.
  An m-way tree has the following properties:
    Each node of the tree has the structure

      n  P0 K0 P1 K1 P2 ... Pn-1 Kn-1 Pn

    where P0, P1, ..., Pn are pointers to the node's subtrees and K0, ..., Kn-1 are key values. Requiring each node to have out-degree <= m forces n <= m-1.
    The key values in a node are in ascending order: Ki < Ki+1 for i = 0, ..., n-2.
    All key values in nodes of the subtree pointed to by Pi are less than the key value Ki, for all i = 0, ..., n-1.
    All key values in nodes of the subtree pointed to by Pn are greater than the key value Kn-1.
    The subtrees pointed to by the Pi, i = 0, ..., n, are also m-way search trees.

B-Trees
  A B-tree of order m is an m-way search tree with the following properties:
    1. Each node of the tree, except for the root and the leaves, has at least CEIL[m/2] subtrees and no more than m subtrees.
    2. The root of the tree has at least two subtrees, unless it is itself a leaf.
    3. All leaves of the tree are on the same level.
  The first constraint ensures that each node of the tree is at least half full.
  The second constraint forces the tree to branch early.
  A B-tree is to an m-way search tree as an AVL tree is to a binary search tree.
  The worst-case search length in a B-tree of order m containing n keys is

    log_d((n + 1) / 2) + 1, where d = CEIL[m/2]

B-Trees: Sequential Access
  Sequential access can be done using in-order node processing.
  This requires each parent to be visited once for each child.

B-Trees: Insertion
  If a node fills, the node is split into two nodes.
  This may cause the next node up to fill as well.
  In the worst case, this splitting may percolate all the way up to the root node, causing the tree to grow one level in height.
  The probability that a key insertion will cause a split in a B-tree of order m is less than 1 in (m/2 - 1).

B-Trees: Deletion
  If a node becomes less than half full, two nodes may need to be merged.
  Merges may also percolate all the way up to the root node, causing the height to decrease by one.

B*-Trees
  Splitting can be costly.
  In B-trees, nodes can be only half full.
  B*-trees are an enhanced version.
  An m-order B*-tree has the following characteristics:
    Every node, excluding the root, has m or fewer children.
    Interior nodes have >= FLOOR[(2m - 2) / 3] + 1 children (at least two-thirds full).
    The root has between two and 2 * FLOOR[(2m - 2) / 3] + 1 children, inclusive.
    All terminal nodes are on the same level.
    A nonterminal node with k children contains exactly k - 1 key values.

B*-Trees: Redistribution
  Nodes are not split until both adjacent siblings are full.
  When a split does occur, keys are redistributed across all three nodes.
  The tree is not as deep as a B-tree, since it tends to be wider (nodes are at least 2/3 full).

B*-Trees: Insertion
  Many redistribution techniques are possible.
  If a node fills (overflow) but an adjacent node has room, combine this node with the adjacent node, redistributing keys (and changing the parent key as needed).
  If all siblings are full, combine the n full nodes, break them into n+1 nodes, and add one key to the parent level.
  Splitting must occur systematically to obey the two-thirds rule.
  On splitting, the number of records to split is always m * 2 + 1.

B*-Trees: Deletion
  Deletions may cause "underflow".
  Redistribute with siblings if possible (the node count remains the same).
  May be complex!

B+-Trees
  This is a variation on the B-tree, as follows:
    All key values reside in the leaf nodes.
    The leaves are linked together in order. This ordered list of nodes is called the "sequence set".
    The "interior" (non-leaf) nodes are called the "index part". It contains only key values and pointers (no data).
    The sequence set may actually be only a list of pointers to blocks of records, and therefore may not contain every key value.
  Insertion is much as in B-trees.
  Deletion is simpler, as index key entries do not need to be removed.
  The B+-tree is the data structure on which indexed sequential systems are based.
  The sequence set allows for efficient sequential access.

IBM VSAM
  VSAM: Virtual Storage Access Method.
  Based on B+-trees.
  Structure:
    Index set: interior nodes.
    Sequence set: terminal nodes (with pointers to data).
    One pointer per key in the index.
  Data storage is partitioned into control areas.
  Control areas are partitioned into control intervals.
  Control areas and control intervals are split as needed.
  Records in a control interval may be of variable size.
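
Sequence Set: A C++ Sketch
  The sequence set described for B+-trees and VSAM can be illustrated with a minimal sketch. It is not the layout of any real system: the names Leaf and scan_range are hypothetical, keys are plain ints, and the index part is left out, so the caller is assumed to have already located the starting leaf. The sketch only shows why linked, ordered leaves make sequential access cheap.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical leaf of a sequence set (an assumption for illustration):
// keys are ascending within a leaf, and every key in a leaf precedes
// every key in the leaf that `next` points to.
struct Leaf {
    std::vector<int> keys;
    Leaf *next;   // next leaf in key order, or NULL at the end of the list
};

// Collect all keys k with lo <= k <= hi, starting at the leaf `start`.
// In a full B+-tree the index part would be searched to find `start`;
// that step is omitted here.
std::vector<int> scan_range(const Leaf *start, int lo, int hi) {
    std::vector<int> out;
    for (const Leaf *leaf = start; leaf != NULL; leaf = leaf->next) {
        for (std::size_t i = 0; i < leaf->keys.size(); ++i) {
            int k = leaf->keys[i];
            if (k > hi) return out;        // ordered keys: nothing later can match
            if (k >= lo) out.push_back(k);
        }
    }
    return out;
}
```

  Once the first leaf is found, the scan follows only the next links and never returns to a parent node. By contrast, in-order processing of a plain B-tree visits each parent once for each child.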