Tree Data Structures
Jim Skon
Indexing and Storage Levels



Index structures can greatly speed access
to file based information
Index processed in primary memory
Single Level Index



One single index
Physical: Entire index in primary memory at
once
Logical: Index structure is a simple array
[Figure: Index, File]
Indexing and Storage Levels

Two-level (multi-level) storage



Index partitions into more than one part
Physical: Only part of the index is in primary memory
at one time; the rest is in secondary memory.
Logical: Index a hierarchical structure
[Figure: Multi-level index, File]
Tree Data Structures

Tree


An acyclic connected graph
Binary tree

a finite set of nodes that is either empty or
consists of a root and two disjoint binary trees
called the left and right subtrees.
Binary trees

Node levels



Root level = 1
Level N = level of parent + 1
Path length of a node x

number of nodes to be traversed in order to proceed
from the root of a tree to a node x.
Tree Data Structures

Binary tree

Path length of a tree




sum of the path lengths of all its nodes.
Ni is number of nodes at level i
PL = SUM(Ni * i) for all i
Average path length




average path length to any node.
PA = PL / n
PL is the path length of the tree
n is the number of nodes
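The two definitions above can be checked with a short sketch. It assumes a minimal node type with two child pointers; the names accumulate and averagePathLength are hypothetical and chosen only for this illustration. Each node contributes its level (root level = 1) to the path length.

```cpp
// Minimal node type assumed for this sketch.
struct node {
    int value;
    node *lchild;
    node *rchild;
};

// Adds the level of every node to pathLength and counts the nodes.
// The root is visited with level = 1, matching "Root level = 1".
void accumulate(const node *n, int level, long &pathLength, long &count) {
    if (n == nullptr) return;
    pathLength += level;
    count += 1;
    accumulate(n->lchild, level + 1, pathLength, count);
    accumulate(n->rchild, level + 1, pathLength, count);
}

// Average path length: path length of the tree divided by the node count.
// Returns 0 for an empty tree (an assumption made here to avoid dividing by zero).
double averagePathLength(const node *root) {
    long pl = 0, n = 0;
    accumulate(root, 1, pl, n);
    return n == 0 ? 0.0 : static_cast<double>(pl) / n;
}
```

For a root with two children the path length is 1 + 2 + 2 = 5 and the average is 5/3.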
Binary Search Tree

Arranged in a binary tree as follows:


All keys of the nodes in the left subtree of a node
Ni precede the key in Ni.
The key in a node Ni precedes all keys of the nodes in
the right subtree of Ni.
Binary Search Tree

C++ tree management:

C++ has a pointer data type and facilities for
controlling dynamic storage. A single binary tree
node may be defined as follows:
class node {
public:
int value;
node *lchild;
node *rchild;
};
Binary Search Tree

Operations




Search
Sequential list
Insert
Remove
Binary Search Tree

Search(key, node) - return pointer to node.
1. If node is NULL, then the search terminates
unsuccessfully.
2. If key = node->key, then the search returns
a pointer to node.
3. If key < node->key, then the left subtree of node is
searched, i.e., Search(key, node->left).
4. If key > node->key, then the right subtree of node
is searched, i.e., Search(key, node->right).
Binary Search Tree
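// Uses the node class defined earlier (members value, lchild, rchild).
node *search(int k, node *n) {
    if (n == NULL)
        return NULL;                    // empty subtree: key not found
    else if (k == n->value)
        return n;                       // key found in this node
    else if (k < n->value)
        return search(k, n->lchild);    // continue in the left subtree
    else
        return search(k, n->rchild);    // continue in the right subtree
}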
Binary Search Tree

Insert(key, data, node) (start with node = root)
1. If node is NULL, create a new node holding key and data and
link it in at this position (it becomes the root if the tree was empty).
2. If key = node->key, then the insertion terminates
unsuccessfully; the key is already in the tree.
3. If key < node->key, insert into the left subtree of
node: Insert(key, data, node->left)
4. If key > node->key, insert into the right subtree of
node: Insert(key, data, node->right)
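The insertion steps can be sketched in C++ roughly as follows. This is only an illustration: it stores a key and no separate data field, and the inserted flag is a hypothetical way of reporting the "key already in the tree" case.

```cpp
// Node type with the same shape as the class shown in these notes.
struct node {
    int value;
    node *lchild;
    node *rchild;
};

// Inserts key into the subtree rooted at n and returns the subtree root,
// which is a new node when n was null. inserted is set to false when the
// key is already present and the tree is left unchanged.
node *insert(node *n, int key, bool &inserted) {
    if (n == nullptr) {
        inserted = true;
        return new node{key, nullptr, nullptr};   // step 1: create the node
    }
    if (key == n->value) {
        inserted = false;                          // step 2: duplicate key
    } else if (key < n->value) {
        n->lchild = insert(n->lchild, key, inserted);   // step 3
    } else {
        n->rchild = insert(n->rchild, key, inserted);   // step 4
    }
    return n;
}
```

A caller would start with root = nullptr and write root = insert(root, key, ok) for each key. Returning the subtree root is one design choice among several; it avoids passing a pointer to a pointer.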
Binary Search Tree

Sequential List: in order
1. visit the left subtree in order
2. print the current node
3. visit the right subtree in order
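A minimal sketch of the in-order listing follows. Instead of printing, it appends each key to a vector so the order can be inspected; that substitution is an assumption made for the example.

```cpp
#include <vector>

// Node type with the same shape as the class shown in these notes.
struct node {
    int value;
    node *lchild;
    node *rchild;
};

// Visits the left subtree, then the current node, then the right subtree.
// For a binary search tree this yields the keys in ascending order.
void inorder(const node *n, std::vector<int> &out) {
    if (n == nullptr) return;
    inorder(n->lchild, out);
    out.push_back(n->value);
    inorder(n->rchild, out);
}
```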
Binary Search Tree

Deletions

4 cases
Case    Left child    Right child
1       = null        = null
2       = null        ≠ null
3       ≠ null        = null
4       ≠ null        ≠ null
Binary Search Tree

Deletions

Case one : left and right children null


leaf node
just remove node
Binary Search Tree

Deletions

Case two: left null and right non-null

Replace node by its right subtree
Binary Search Tree

Deletions

Case three: left non-null and right null

Replace node by its left subtree
Binary Search Tree

Deletions

Case four: left and right non-null

option one



replace node with its left subtree
attach the right subtree at the "rightmost" null pointer in the left subtree
leads to long paths
Binary Search Tree

Deletions

Case four: left and right non-null

option two


replace node with node with largest key in left subtree
less path length increase
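The four deletion cases, using option two for case four, might be sketched as below. The sketch assumes every node was allocated with new, and it copies the largest key of the left subtree into the node rather than relinking nodes; both are choices made for this illustration only.

```cpp
// Node type with the same shape as the class shown in these notes.
struct node {
    int value;
    node *lchild;
    node *rchild;
};

// Removes key from the subtree rooted at n and returns the new subtree root.
// The tree is unchanged if key is not present.
node *removeKey(node *n, int key) {
    if (n == nullptr) return nullptr;
    if (key < n->value) {
        n->lchild = removeKey(n->lchild, key);
    } else if (key > n->value) {
        n->rchild = removeKey(n->rchild, key);
    } else if (n->lchild == nullptr || n->rchild == nullptr) {
        // Cases one to three: replace the node by its only subtree,
        // or by null when it is a leaf.
        node *child = (n->lchild != nullptr) ? n->lchild : n->rchild;
        delete n;
        return child;
    } else {
        // Case four, option two: find the largest key in the left subtree,
        // copy it into this node, then remove it from the left subtree.
        node *pred = n->lchild;
        while (pred->rchild != nullptr) pred = pred->rchild;
        n->value = pred->value;
        n->lchild = removeKey(n->lchild, pred->value);
    }
    return n;
}
```

The node holding the largest key of the left subtree has no right child, so the recursive removal always ends in one of the simple cases.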
Binary Search Tree

Analysis





simple to implement
running time dependent on path length
fast if "balanced"
degenerates to "linked-list" in worst case.
Question


how can the tree be kept balanced?
how expensive is the balancing operation?
Binary Search Tree

Balancing




Balanced - the depths of the subtrees of each
node differ by at most one.
In principle any data set can be placed in a
balanced tree.
Rebalancing can be a complex operation.
Result of balanced tree

All operations can take place in O(log₂ n) time.
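The balance condition (subtree depths at each node differ by at most one) can be tested with a sketch like the following. The function names are hypothetical, and an empty subtree is assumed to have depth 0.

```cpp
#include <algorithm>
#include <cstdlib>

// Node type with the same shape as the class shown in these notes.
struct node {
    int value;
    node *lchild;
    node *rchild;
};

// Returns the depth of the subtree rooted at n (empty subtree = 0),
// or -1 if any node in it has subtrees whose depths differ by more than one.
int balancedHeight(const node *n) {
    if (n == nullptr) return 0;
    int hl = balancedHeight(n->lchild);
    if (hl < 0) return -1;
    int hr = balancedHeight(n->rchild);
    if (hr < 0) return -1;
    if (std::abs(hl - hr) > 1) return -1;
    return 1 + std::max(hl, hr);
}

bool isBalanced(const node *root) { return balancedHeight(root) >= 0; }
```

A chain of three nodes fails the check, while the same three keys arranged as a root with two children pass it.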
AVL Trees



Developed by Adelson-Velskii and Landis.
Also known as Height-Balanced Trees.
AVL tree features:



For any node the longest paths to leaves through
each subtree differ by at most one.
Insertions and deletions must perform balancing
as needed to maintain this state
Searches do not change tree, thus no
maintenance needed
AVL Trees


Three cases must be distinguished
during insertion.
Consider some node x
with left and right subtrees l and r
a new node is inserted in l, causing the height of l
to increase by one
hl and hr are the heights of the left and right
subtrees, respectively
AVL Trees

There are then three distinct cases to
consider
hl = hr and the insertion causes hl > hr, but the
balance criterion is not violated.
hl < hr and the insertion causes hl = hr, and the
balance is improved.
hl > hr and the insertion causes the balance
criterion to be violated, and the tree with node x as
the root must be rebalanced.
AVL Trees

Insertion



Following the search path until it is verified that
the key value is not already in the tree.
Inserting the new node in the tree.
Retreating along the search path and checking
the balance factor (hr-hl) at each node,
rebalancing if necessary.
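The notes do not say how rebalancing is carried out. The usual technique for AVL trees is rotation, and the sketch below shows it under stated assumptions: a hypothetical avlnode type that stores its own height, and a rebalance function that would be called on each node while retreating along the search path. The balance factor is hr - hl, as above.

```cpp
#include <algorithm>

// Hypothetical AVL node that caches the height of its subtree (leaf = 1).
struct avlnode {
    int key;
    int height;
    avlnode *left;
    avlnode *right;
};

int height(const avlnode *n) { return n ? n->height : 0; }

// Balance factor as defined in the notes: hr - hl.
int balanceFactor(const avlnode *n) { return height(n->right) - height(n->left); }

void update(avlnode *n) { n->height = 1 + std::max(height(n->left), height(n->right)); }

// Single right rotation: the left child becomes the root of this subtree.
avlnode *rotateRight(avlnode *x) {
    avlnode *l = x->left;
    x->left = l->right;
    l->right = x;
    update(x);
    update(l);
    return l;
}

// Single left rotation: the right child becomes the root of this subtree.
avlnode *rotateLeft(avlnode *x) {
    avlnode *r = x->right;
    x->right = r->left;
    r->left = x;
    update(x);
    update(r);
    return r;
}

// Restores the balance criterion at x and returns the new subtree root.
avlnode *rebalance(avlnode *x) {
    update(x);
    int bf = balanceFactor(x);
    if (bf < -1) {                        // left subtree too tall
        if (balanceFactor(x->left) > 0)   // left-right shape: rotate the child first
            x->left = rotateLeft(x->left);
        return rotateRight(x);
    }
    if (bf > 1) {                         // right subtree too tall
        if (balanceFactor(x->right) < 0)  // right-left shape: rotate the child first
            x->right = rotateRight(x->right);
        return rotateLeft(x);
    }
    return x;                             // already within the criterion
}
```

The caller is expected to store the returned pointer back into the parent link, since a rotation changes which node is the root of the subtree.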
M-way Search Trees



The performance of a search tree can be
enhanced significantly by increasing the
branching factor of the tree.
That is, the maximum degree of each node is
increased
Definition:

An m-way search tree is a tree in which each
node has out-degree <= m.
M-way Search Trees

An m-way tree has the following properties:

Each node of the tree has the structure:
n P0 K0 P1 K1 P2 ... Pn-1 Kn-1 Pn



where the P0,P1,...,Pn are pointers to the node's
subtrees
K0,...,Kn-1 are key values.
Requiring each node to have out-degree <= m forces n <= m-1.
M-way Search Trees

An m-way tree has the following properties (cont):




The key values in a node are in ascending order:
Ki < Ki+1 for i = 0,...,n-2.
All key values in nodes of the subtree pointed to by Pi are less
than the key value Ki for all i=0,...,n-1.
All key values in nodes of the subtree pointed to by Pn are
greater than the key value Kn-1.
The subtrees pointed to by the Pi, i=0,...,n are also m-way
search trees.
n P0 K0 P1 K1 P2 ... Pn-1 Kn-1 Pn
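A search over this node layout could look like the sketch below. The mnode type is hypothetical: keys holds K0..Kn-1 in ascending order and children holds P0..Pn, so children always has one more entry than keys, and a null pointer marks an empty subtree.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical m-way search tree node.
struct mnode {
    std::vector<int> keys;           // K0..Kn-1, ascending
    std::vector<mnode *> children;   // P0..Pn, one more entry than keys
};

// Returns the node containing key, or nullptr if key is not in the tree.
const mnode *search(const mnode *node, int key) {
    while (node != nullptr) {
        // Find the first key that is not smaller than the search key.
        std::size_t i = 0;
        while (i < node->keys.size() && key > node->keys[i]) ++i;
        if (i < node->keys.size() && node->keys[i] == key) return node;
        // Keys below Ki live under Pi; keys above Kn-1 live under Pn.
        node = node->children[i];
    }
    return nullptr;
}
```

The scan within a node is linear here for clarity; a binary search over keys would be a natural substitute for large m.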
B-Trees

A B-tree of order m is an m-way search tree
with the following properties:



Each node of the tree, except for the root and the
leaves, has at least m/2 subtrees (rounded up to a whole
number) and no more than m subtrees.
The root of the tree has at least two subtrees,
unless it is itself a leaf.
All leaves of the tree are on the same level.
B-Trees


The first constraint ensures that each node of
the tree is at least half full.
The second constraint forces the tree to
branch early.
B-Trees


A B-tree is to an m-way search tree as an AVL tree is to a
binary search tree.
The worst-case search length in a B-tree of order m
containing n keys is:
$\log_{m/2}\left(\frac{n-1}{2}\right) + 1$
B-Trees: Sequential access


Can be done using the inorder node
processing
This requires each parent to be visited once
for each child.
B-Trees: Insertion




If a node fills, the node is split into two nodes.
This may cause the next node up also to fill.
At worst case, this splitting may percolate all
the way up to the root node, causing the tree
to grow one level in height.
The probability that a key insertion will cause
a split in a B-tree of order m is less than 1 in
(m/2 - 1).
B-Trees: Deletion


If a node becomes less than half full, two
nodes may need to be merged.
Merges may also percolate all the way up to
the root node, causing a decrease in height
by one.
B*-Trees



Splitting can be costly
In B-trees nodes can be only half full
B*-Trees are an enhanced version
B*-Trees

An m-order B*-tree has the following
characteristics





Every node except the root has m or fewer children.
Interior nodes have >= FLOOR[(2m - 2) / 3] + 1
children (two-thirds full)
The root has between two and 2 * FLOOR[(2m - 2) / 3] + 1 children inclusive.
All terminal nodes are on the same level.
A nonterminal node with k children contains
exactly k - 1 key values.
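As a small arithmetic check of the interior-node bound, the sketch below evaluates FLOOR[(2m - 2) / 3] + 1 for a given order m. The function names are hypothetical; integer division is used for the floor, which is valid because 2m - 2 is non-negative for m >= 1.

```cpp
// Minimum number of children of an interior node in an order-m B*-tree,
// following the bound FLOOR[(2m - 2) / 3] + 1 from the notes.
int minInteriorChildren(int m) { return (2 * m - 2) / 3 + 1; }

// A nonterminal node with k children contains exactly k - 1 key values.
int keysForChildren(int k) { return k - 1; }
```

For m = 10 this gives 7 children, so an interior node holds at least 6 keys, compared with at least 5 children in a B-tree of the same order.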
B*-Trees

Redistribution



Nodes are not split until both adjacent siblings are full
When splitting, keys are redistributed across all three
nodes
Trees are not as deep as B-trees, since they tend to
be wider (nodes are at least 2/3 full).
B*-Trees

Insertion





Many redistribution techniques possible
If a node fills (overflow), but adjacent node has
room, combine this node with adjacent node,
redistributing keys (changing parent key as
needed)
If all siblings are full, combine n nodes, break them into
n+1 nodes, and add one to the parent level.
Splitting must occur systematically to obey 2/3’s
rule.
On splitting, the records to split always number m * 2 + 1
B*-Trees

Deletion



Deletions may cause “underflow”
Redistribute with siblings if possible. (node count
remains the same)
May be complex!
B+-trees

This is a variation on the B-tree as follows:

ALL key values reside in the leaf nodes.



The leaves are linked together in order.
This ordered list of nodes is called a "sequence
set".
The "interior" (non-leaf) nodes are called the
"index part".

This contains only key values and pointers (no
data).
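The benefit of linking the leaves can be shown with a range scan over the sequence set. The leaf type below is hypothetical, and the scan starts from the first leaf for simplicity; in a full B+-tree the index part would be used to find the starting leaf.

```cpp
#include <vector>

// Hypothetical sequence-set leaf: sorted keys plus a link to the next leaf.
struct leaf {
    std::vector<int> keys;   // ascending within the leaf
    leaf *next;              // next leaf in key order, nullptr at the end
};

// Collects every key k with lo <= k <= hi by walking the linked leaves.
std::vector<int> rangeScan(const leaf *first, int lo, int hi) {
    std::vector<int> out;
    for (const leaf *p = first; p != nullptr; p = p->next) {
        for (int k : p->keys) {
            if (k > hi) return out;        // keys only grow from here on
            if (k >= lo) out.push_back(k);
        }
    }
    return out;
}
```

No interior node is visited during the scan, which is the reason sequential access is cheaper here than an in-order traversal of a plain B-tree.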
B+-trees

(This is a variation on the B-tree as
follows:)



The sequence set may actually be only a list of
pointers to blocks of records, and therefore
may not contain every key value.
Addition is much as in B-trees.
Deletion is simpler, as index key entries do not
need to be removed.
B+-trees


A B+-tree is the data structure on which indexed
sequential systems are based.
The sequence set allows for efficient
sequential access.
IBM VSAM



VSAM: Virtual Storage Access Method
Based on B+-trees
Structure







Index set - interior nodes
Sequence set - terminal nodes (with pointers to data)
One pointer per key in index
Data storage partitioned into control areas
Control areas partitioned into control intervals
Control areas and control intervals split as needed
Records in a control interval may be variable size.