Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11. Midterm2 Average 84 >90 12 students B-Trees Organize data for fast queries Index for fast search For datasets of structured records, B-tree indexing is used Motivation (cont.) Assume that we use an AVL tree to store about 20 million records We end up with a very deep binary tree with lots of different disk accesses; log2 20,000,000 is about 24, so this takes about 0.2 seconds We know we can’t improve on the log n lower bound on search for a binary tree But, the solution is to use more branches and thus reduce the height of the tree! • As branching increases, depth decreases B-Trees 4 Constructing a B-tree 17 3 1 2 B-Trees 6 7 8 28 12 14 16 25 5 26 29 48 45 52 53 55 68 Definition of a B-tree A B-tree of order m is an m-way tree (i.e., a tree where each node may have up to m children) in which: 1. the number of keys in each non-leaf node is one less than the number of its children and these keys partition the keys in the children in the fashion of a search tree 2. all leaves are on the same level 3. all non-leaf nodes except the root have at least m / 2 children 4. the root is either a leaf node, or it has from two to m children 5. a leaf node contains no more than m – 1 keys The number m should always be odd B-Trees 6 Comparing Trees Binary trees • Can become unbalanced and lose their good time complexity (big O) • AVL trees are strict binary trees that overcome the balance problem • Heaps remain balanced but only prioritise (not order) the keys Multi-way trees • B-Trees can be m-way, they can have any (odd) number of children • One B-Tree, the 2-3 (or 3-way) B-Tree, approximates a permanently balanced binary tree, exchanging the AVL tree’s balancing operations for insertion and (more complex) deletion operations B-Trees 7 Chapter 8: Dynamic Programming Dynamic Programming is a general algorithm design technique Invented by American mathematician Richard Bellman in the 1950s to solve optimization problems “Programming” here means “planning” Main idea: • solve several smaller (overlapping) subproblems • record solutions in a table so that each subproblem is only solved once • final state of the table will be (or contain) solution Example: Fibonacci numbers Recall definition of Fibonacci numbers: f (0) = 0 f (1) = 1 f (n) = f (n-1) + f (n-2) Computing the nth Fibonacci number recursively (top-down): f(n) f(n-1) f(n-2) + + f(n-3) f(n-2) f(n-3) ... + f(n-4) Example: Fibonacci numbers Computing the nth Fibonacci number using bottom-up iteration: f(0) = 0 f(1) = 1 f(2) = 0+1 = 1 f(3) = 1+1 = 2 f(4) = 1+2 = 3 f(5) = 2+3 = 5 … f(n-2) = f(n-1) = f(n) = f(n-1) + f(n-2) Examples of Dynamic Programming Algorithms Computing binomial coefficients Optimal chain matrix multiplication Constructing an optimal binary search tree Warshall’s algorithm for transitive closure Floyd’s algorithms for all-pairs shortest paths Some instances of difficult discrete optimization problems: • travelling salesman • knapsack The problem: check if all pair have a valid path a b c d You can run depth first search from each vertex… But can u do better? Warshall’s Algorithm: Transitive Closure • Computes the transitive closure of a relation • (Alternatively: all paths in a directed graph) • Example of transitive closure: 3 3 1 1 2 4 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 0 2 4 0 1 0 1 0 1 0 1 1 1 0 1 0 1 0 1 Warshall’s Algorithm Main idea: a path exists between two vertices i, j, iff •there is an edge from i to j; or •there is a path from i to j going through vertex 1; or •there is a path from i to j going through vertex 1 and/or 2; or •there is a path from i to j going through vertex 1, 2, and/or 3; or •... •there is a path from i to j going through any of the other vertices Warshall’s Algorithm 3 3 3 1 1 1 4 2 0 1 0 0 R0 0 1 0 0 0 0 1 0 0 1 0 0 4 2 R1 0 0 1 0 0 0 0 1 1 1 0 0 0 1 0 0 3 3 1 1 4 2 R2 0 0 1 0 0 0 1 1 1 1 0 1 0 1 0 1 4 2 R3 0 0 1 0 0 0 1 1 1 1 0 1 0 1 0 1 4 2 R4 0 0 1 1 0 0 1 1 1 1 0 1 0 1 0 1 Warshall’s Algorithm In the kth stage determine if a path exists between two vertices i, j using just vertices among 1,…,k { R(k)[i,j] = R(k-1)[i,j] (path using just 1 ,…,k-1) or (R(k-1)[i,k] and R(k-1)[k,j]) (path from i to k and from k to i using just 1 ,…,k-1) k i kth stage j Pseudocode of Warshall’s Algorithm ALGORITHM Warshall ( A[1..n,1..n ]) R ( 0) A for k 1 to n do for i 1 to n do for j 1 to n do R ( k ) [i, j ] R ( k 1) [i, j ] or ( R ( k 1) [i, k ] and R ( k 1) [k , j ]) return R ( n ) Example a b c d Time efficiency? Intuitive understanding of Warshall algo Can the above Warshall algorithm find all possible paths? Does Warshall algorithm avoid duplicate updates? i=6 9 12 3 7 5=j 69 912 123 37 75=j Floyd’s Algorithm: All pairs shortest paths In a weighted graph, find shortest paths between every pair of vertices Same idea: construct solution through series of matrices D(0), D(1), … using an initial subset of the vertices as intermediaries. 2 2 1 7 6 3 3 1 4 1 2 3 4 1 2 3 4 0 3 2 0 7 0 1 6 0 1 2 3 4 1 2 3 4 0 10 3 4 2 0 5 6 7 7 0 1 6 16 9 0 Similar to Warshall’s Algorithm dij(k ) in D (k ) is equal to the length of shortest path among all paths from the ith vertex to jth vertex with each intermediate vertex, if any, numbered not higher than k (k-1) dij vj vi (k-1) (k-1) di k dk j vk dij( k ) min{ dij( k 1) , dik( k 1) dkj( k 1) } for k 1, dij( 0) wij Pseudocode of Floyd’s Algorithm The next matrix in sequence can be written over its predecessor ALGORITHM Floyd (W [1..n,1..n ]) D W for k 1 to n do for i 1 to n do for j 1 to n do D[i, j ] min{ D[i, j ], D[i, k ] D[k , j ]} return D Example 1 2 2 7 3 6 3 Time efficiency? 1 4 Principle of Optimality An optimal solution to any instance of a optimization problem is composed of optimal solution to its subinstance e.g. Shortest path between a and b… then any distance of two points S, T on the path from a to b must also be shortest…. Why? Otherwise, we can choose another path S’ and T’! Key issues in DP algorithm design Derive a recurrence relationship that expresses a solution to an instance of the problem in terms of solutions to its smaller subinstances Design Algo for Sequence Alignment Problem using DP ACTGGAGGCCA AGCTAGAGAAG Score= Sum(si) match: +3 Mismatch:-2 gap-X -1