Introduction to The Design and Analysis of Algorithms

Chapter One : Introduction

An algorithm is a sequence of nonambiguous instructions for solving a problem in a finite amount of time. An input to an algorithm specifies an instance of the problem the algorithm solves. Algorithms can be specified in a natural language or a pseudocode; they can also be implemented as computer programs.

Among several ways to classify algorithms, the two principal alternatives are: to group algorithms according to the types of problem they solve, or to group algorithms according to the underlying design techniques they are based upon. The important problem types are sorting, searching, string processing, graph problems, combinatorial problems, geometric problems, and numerical problems. Algorithm design techniques (or "strategies" or "paradigms") are general approaches to solving problems algorithmically, applicable to a variety of problems from different areas of computing.

Although designing an algorithm is undoubtedly a creative activity, one can identify a sequence of interrelated actions involved in such a process. They are summarized in the following figure. A good algorithm is usually the result of repeated efforts and rework.

The same problem can often be solved by several algorithms. For example, three algorithms were given for computing the greatest common divisor of two integers: Euclid's algorithm, the consecutive integer checking algorithm, and the middle-school algorithm (enhanced by the sieve of Eratosthenes for generating a list of primes).

Algorithms operate on data. This makes the issue of data structuring critical for efficient algorithmic problem solving. The most important elementary data structures are the array and the linked list. They are used for representing more abstract data structures such as the list, the stack, the queue, the graph (via its adjacency matrix or adjacency linked list), the binary tree, and the set.
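The first two GCD algorithms mentioned above can be sketched in Python as follows (the function names are ours; the middle-school algorithm, which needs a list of primes, is omitted):

```python
def gcd_euclid(m, n):
    # Euclid's algorithm: repeatedly replace (m, n) with (n, m mod n)
    # until the second number becomes 0; the first is then the gcd.
    while n != 0:
        m, n = n, m % n
    return m

def gcd_consecutive_check(m, n):
    # Consecutive integer checking: try t = min(m, n), t - 1, t - 2, ...
    # until t divides both numbers (assumes m and n are positive).
    t = min(m, n)
    while m % t != 0 or n % t != 0:
        t -= 1
    return t
```

Both return the same answer, but Euclid's algorithm takes far fewer iterations on large inputs, which is exactly the kind of efficiency difference the book's analysis framework is designed to capture.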
An abstract collection of objects with several operations that can be performed on them is called an abstract data type (ADT). The list, the stack, the queue, the priority queue, and the dictionary are important examples of abstract data types. Modern object-oriented languages support implementation of ADTs by means of classes.

Chapter Two : Fundamentals of the Analysis of Algorithm Efficiency

There are two kinds of algorithm efficiency: time efficiency and space efficiency. Time efficiency indicates how fast the algorithm runs; space efficiency deals with the extra space it requires.

An algorithm's time efficiency is principally measured as a function of its input size by counting the number of times its basic operation is executed. A basic operation is the operation that contributes most toward the running time. Typically, it is the most time-consuming operation in the algorithm's innermost loop.

For some algorithms, the running time may differ considerably for inputs of the same size, leading to worst-case efficiency, average-case efficiency, and best-case efficiency.

The established framework for analyzing an algorithm's time efficiency is primarily grounded in the order of growth of the algorithm's running time as its input size goes to infinity. The notations O, Ω, and Θ are used to indicate and compare the asymptotic orders of growth of functions expressing algorithm efficiencies. The efficiencies of a large number of algorithms fall into the following few classes: constant, logarithmic, linear, "n-log-n", quadratic, cubic, and exponential.

The main tool for analyzing the time efficiency of a nonrecursive algorithm is to set up a sum expressing the number of executions of its basic operation and ascertain the sum's order of growth. The main tool for analyzing the time efficiency of a recursive algorithm is to set up a recurrence relation expressing the number of executions of its basic operation and ascertain the order of growth of its solution.
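As a small illustration of the counting framework above, here is an instrumented sketch (the comparison counter is our addition, not part of the algorithm): finding the largest element of a list, where the comparison in the loop is the basic operation, executed exactly n - 1 times, so C(n) = n - 1, which is in Θ(n).

```python
def max_element(a):
    # Nonrecursive algorithm; basic operation: the comparison a[i] > largest.
    largest = a[0]
    comparisons = 0  # instrumentation only, to verify the count C(n) = n - 1
    for i in range(1, len(a)):
        comparisons += 1
        if a[i] > largest:
            largest = a[i]
    return largest, comparisons
```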
Succinctness of a recursive algorithm may mask its inefficiency. The Fibonacci numbers are an important sequence of integers in which every element is equal to the sum of its two immediate predecessors. There are several algorithms for computing the Fibonacci numbers with drastically different efficiencies.

Empirical analysis of an algorithm is performed by running a program implementing the algorithm on a sample of inputs and analyzing the data observed (the basic operation's count or physical running time). This often involves generating pseudorandom numbers. The applicability to any algorithm is the principal strength of this approach; the dependence of results on the particular computer and instance sample is its main weakness.

Algorithm visualization is the use of images to convey useful information about algorithms. The two principal variations of algorithm visualization are static algorithm visualization and dynamic algorithm visualization (also called algorithm animation).

Chapter Three : Brute Force

Brute force is a straightforward approach to solving a problem, usually directly based on the problem's statement and definitions of the concepts involved. The principal strengths of the brute-force approach are wide applicability and simplicity; its principal weakness is the subpar efficiency of most brute-force algorithms. A first application of the brute-force approach often results in an algorithm that can be improved with a modest amount of effort. The following noted algorithms can be considered examples of the brute-force approach: the definition-based algorithm for matrix multiplication, selection sort, sequential search, and the straightforward string-matching algorithm.

Exhaustive search is a brute-force approach to combinatorial problems. It suggests generating each and every combinatorial object of the problem, selecting those of them that satisfy the problem's constraints, and then finding a desired object.
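The drastically different Fibonacci efficiencies mentioned above can be seen by contrasting two sketches (assuming the usual convention F(0) = 0, F(1) = 1):

```python
def fib_recursive(n):
    # Direct translation of the defining recurrence F(n) = F(n-1) + F(n-2).
    # Succinct, but recomputes the same values over and over: its running
    # time grows exponentially in n.
    if n <= 1:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_iterative(n):
    # Bottom-up computation of the same sequence with only n additions.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

Both agree on every input, but the recursive version becomes unusable well before n = 100, while the iterative one remains instantaneous.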
The traveling salesman problem, the knapsack problem, and the assignment problem are typical examples of problems that can be solved, at least theoretically, by exhaustive-search algorithms. Exhaustive search is impractical for all but very small instances of the problems it can be applied to.

Chapter Four : Divide and Conquer

Divide-and-conquer is a general algorithm design technique that solves a problem's instance by dividing it into several smaller instances (ideally of equal size), solving each of them recursively, and then combining their solutions to get a solution to the original instance of the problem. Many efficient algorithms are based on this technique, although it can be both inapplicable and inferior to simpler algorithmic solutions.

The time efficiency T(n) of many divide-and-conquer algorithms satisfies the equation T(n) = aT(n/b) + f(n). The Master Theorem establishes the order of growth of this equation's solutions.

Mergesort is a divide-and-conquer sorting algorithm. It works by dividing an input array into two halves, sorting them recursively, and then merging the two sorted halves to get the original array sorted. The algorithm's time efficiency is in Θ(n log n) in all cases, with the number of key comparisons being very close to the theoretical minimum. Its principal drawback is a significant extra storage requirement.

Quicksort is a divide-and-conquer sorting algorithm that works by partitioning its input's elements according to their value relative to some preselected element. Quicksort is noted for its superior efficiency among n log n algorithms for sorting randomly ordered arrays, but also for its quadratic worst-case efficiency.

Binary search is an O(log n) algorithm for searching in sorted arrays. It is a typical example of an application of the divide-and-conquer technique because it needs to solve just one problem of half the size on each of its iterations.
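An iterative sketch of binary search as described above, assuming it returns the index of the key in the sorted list or -1 when the key is absent:

```python
def binary_search(a, key):
    # Compare key with the middle element and continue in the single
    # half that can still contain it; at most about log2(n) iterations.
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == key:
            return mid
        elif a[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # key not found
```

Note that only one recursive-style subproblem survives each step, which is why the technique degenerates here into a simple loop.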
The classic traversals of a binary tree – preorder, inorder, and postorder – and similar algorithms that require recursive processing of both the left and right subtrees can be considered examples of the divide-and-conquer technique. Their analysis is helped by replacing all the empty subtrees of a given tree with special external nodes.

There is a divide-and-conquer algorithm for multiplying two n-digit integers that requires about n^1.585 one-digit multiplications. Strassen's algorithm needs only seven multiplications to multiply 2-by-2 matrices but requires more additions than the definition-based algorithm. By exploiting the divide-and-conquer technique, this algorithm can multiply two n-by-n matrices with about n^2.807 multiplications. The divide-and-conquer technique can also be successfully applied to two important problems of computational geometry: the closest-pair problem and the convex-hull problem.

Chapter Five : Decrease and Conquer

Decrease-and-conquer is a general algorithm design technique based on exploiting a relationship between a solution to a given instance of a problem and a solution to a smaller instance of the same problem. Once such a relationship is established, it can be exploited either top down (recursively) or bottom up (without recursion). There are three major variations of decrease-and-conquer: decrease by a constant, most often by one (e.g., insertion sort); decrease by a constant factor, most often by a factor of two (e.g., binary search); and variable-size decrease (e.g., Euclid's algorithm).

Insertion sort is a direct application of the decrease (by one)-and-conquer technique to the sorting problem. It is a Θ(n^2) algorithm both in the worst and average cases, but it is about twice as fast on average as in the worst case. The algorithm's notable advantage is its good performance on almost-sorted arrays.

Depth-first search (DFS) and breadth-first search (BFS) are two principal graph traversal algorithms.
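Insertion sort, described above, can be sketched as follows (sorting in place; returning the list is a convenience of ours):

```python
def insertion_sort(a):
    # Decrease-by-one: assume a[0..i-1] is already sorted, then insert
    # a[i] into its proper place by scanning right to left, shifting
    # larger elements up by one position.
    for i in range(1, len(a)):
        v = a[i]
        j = i - 1
        while j >= 0 and a[j] > v:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = v
    return a
```

On an almost-sorted input the inner while loop exits almost immediately, which is the source of the algorithm's good behavior on such arrays.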
By representing a graph in the form of a depth-first or breadth-first search forest, they help in the investigation of many important properties of the graph. Both algorithms have the same time efficiency: Θ(|V|^2) for the adjacency matrix representation and Θ(|V| + |E|) for the adjacency linked list representation.

A digraph is a graph with directions on its edges. The topological sorting problem asks to list the vertices of a digraph in an order such that for every edge of the digraph, the vertex it starts at is listed before the vertex it points to. This problem has a solution if and only if the digraph is a dag (directed acyclic graph), i.e., it has no directed cycles. There are two algorithms for solving the topological sorting problem. The first one is based on depth-first search; the second is based on a direct implementation of the decrease-by-one technique.

The decrease-by-one technique is a natural approach to developing algorithms for generating elementary combinatorial objects. The most efficient class of such algorithms consists of minimal-change algorithms. However, the number of combinatorial objects grows so fast that even the best algorithms are of practical interest only for very small instances of such problems.

Identifying a fake coin with a balance scale, multiplication a la russe, and the Josephus problem are examples of problems that can be solved by decrease-by-a-constant-factor algorithms. Two other and more important examples are binary search and exponentiation by squaring.

For some algorithms based on the decrease-and-conquer technique, the size reduction varies from one iteration of the algorithm to another. Examples of such variable-size-decrease algorithms include Euclid's algorithm, the partition-based algorithm for the selection problem, interpolation search, and searching and insertion in a binary search tree.
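The DFS-based topological sorting algorithm mentioned above can be sketched as follows (the dict-of-successor-lists graph representation is our assumption; the input is assumed to be a dag):

```python
def topological_sort(graph):
    # Run a depth-first search and record each vertex when it is
    # "finished" (all its successors fully explored). The reverse of
    # the finish order is a topological order.
    visited, order = set(), []

    def dfs(v):
        visited.add(v)
        for w in graph.get(v, []):
            if w not in visited:
                dfs(w)
        order.append(v)  # v finished: every vertex it points to is done

    for v in graph:
        if v not in visited:
            dfs(v)
    return order[::-1]
```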
Chapter Six : Transform and Conquer

Transform-and-conquer is a group of techniques based on the idea of transformation to a problem that is easier to solve. There are three principal varieties of the transform-and-conquer strategy: instance simplification, representation change, and problem reduction.

Instance simplification is a technique of transforming an instance of a problem to an instance of the same problem with some special property that makes the problem easier to solve. List presorting, Gaussian elimination, and AVL trees are good examples of this technique.

Representation change implies changing one representation of a problem's instance into another representation of the same instance. Examples include the representation of a set by a 2-3 tree, heaps and heapsort, Horner's rule for polynomial evaluation, and two binary exponentiation algorithms.

Problem reduction calls for transforming a given problem to another problem that can be solved by a known algorithm. Among examples of applying this idea to algorithmic problem solving, reductions to linear programming and reductions to graph problems are especially important.

Some examples used to illustrate the transform-and-conquer techniques happen to be very important data structures and algorithms. They are: heaps and heapsort, AVL and 2-3 trees, Gaussian elimination, and Horner's rule.

A heap is an essentially complete binary tree with keys (one per node) satisfying the parental dominance requirement. Though defined as binary trees, heaps are normally implemented as arrays. Heaps are most important for the efficient implementation of priority queues; they also underlie heapsort.

Heapsort is a theoretically important sorting algorithm based on arranging the elements of an array in a heap and then successively removing the largest element from the remaining heap. The algorithm's running time is in Θ(n log n) both in the worst case and in the average case; in addition, it is in place.
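Heapsort's two stages – bottom-up heap construction, then repeated root removal – can be sketched as follows, using 0-based array indexing for the heap:

```python
def heapify(a, n, i):
    # Sift a[i] down within the heap a[0..n-1] to restore parental
    # dominance: each parent's key must be >= its children's keys.
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, n, largest)

def heapsort(a):
    n = len(a)
    # Stage 1: build a max-heap bottom up, starting from the last parent.
    for i in range(n // 2 - 1, -1, -1):
        heapify(a, n, i)
    # Stage 2: repeatedly swap the root (largest remaining key) with the
    # last element of the shrinking heap and restore the heap property.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        heapify(a, end, 0)
    return a
```

The swap in stage 2 is what makes the algorithm in place: the sorted suffix grows inside the same array that holds the heap.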
AVL trees are binary search trees that are always balanced to the extent possible for a binary tree. The balance is maintained by transformations of four types called rotations. All basic operations on AVL trees are in Θ(log n); this eliminates the bad worst-case efficiency of classic binary search trees.

2-3 trees achieve a perfect balance in a search tree by allowing a node to contain up to two ordered keys and have up to three children. This idea can be generalized to yield the very important B-trees.

Gaussian elimination – an algorithm for solving systems of linear equations – is a principal algorithm in linear algebra. It solves a system by transforming it to an equivalent system with an upper-triangular coefficient matrix, which is easy to solve by backward substitutions. Gaussian elimination requires about (1/3)n^3 multiplications.

Horner's rule is an optimal algorithm for polynomial evaluation without coefficient preprocessing. It requires only n multiplications and n additions. It also has a few useful by-products, such as the synthetic division algorithm.

Two binary exponentiation algorithms for computing a^n both exploit the binary representation of the exponent n, but they process it in opposite directions: left to right and right to left.

Linear programming concerns optimizing a linear function of several variables subject to constraints in the form of linear equations and linear inequalities. There are efficient algorithms capable of solving very large instances of this problem with many thousands of variables and constraints, provided the variables are not required to be integers. The latter, called integer linear programming problems, constitute a much more difficult class of problems.

Chapter Seven : Space and Time Tradeoffs

Space and time tradeoffs in algorithm design are a well-known issue for both theoreticians and practitioners of computing. As an algorithm design technique, trading space for time is much more prevalent than trading time for space.
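Horner's rule, mentioned above, can be sketched as follows (coefficients listed from the highest power down):

```python
def horner(coeffs, x):
    # Evaluate p(x) = coeffs[0]*x^n + coeffs[1]*x^(n-1) + ... + coeffs[n]
    # using the nesting p(x) = (...(coeffs[0]*x + coeffs[1])*x + ...)*x
    # + coeffs[n]: exactly n multiplications and n additions.
    p = coeffs[0]
    for c in coeffs[1:]:
        p = p * x + c
    return p
```

For example, evaluating 2x^4 - x^3 + 3x^2 + x - 5 at x = 3 takes four multiplications and four additions instead of the ten-plus operations of the naive power-by-power evaluation.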
Input enhancement is one of the two principal varieties of trading space for time in algorithm design. Its idea is to preprocess the problem's input, in whole or in part, and store the additional information obtained in order to accelerate solving the problem afterward. Sorting by distribution counting and several important algorithms for string matching are examples of algorithms based on this technique.

Distribution counting is a special method for sorting lists of elements from a small set of possible values.

Horspool's algorithm for string matching can be considered a simplified version of the Boyer-Moore algorithm. Both algorithms are based on the ideas of input enhancement and right-to-left comparisons of a pattern's characters. Both algorithms use the same bad-symbol shift table; the Boyer-Moore algorithm also uses a second table, called the good-suffix shift table.

Prestructuring – the second type of technique that exploits space-for-time tradeoffs – uses extra space to facilitate faster and/or more flexible access to the data. Hashing and B-trees are important examples of prestructuring.

Hashing is a very efficient approach to implementing dictionaries. It is based on the idea of mapping keys into a one-dimensional table. The size limitations of such a table make it necessary to employ a collision resolution mechanism. The two principal varieties of hashing are open hashing or separate chaining (with keys stored in linked lists outside of the hash table) and closed hashing or open addressing (with keys stored inside the table). Both enable searching, insertion, and deletion in Θ(1) time, on average.

The B-tree is a balanced search tree that generalizes the idea of the 2-3 tree by allowing multiple keys at the same node. Its principal application is for keeping index-like information about data stored on a disk.
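Horspool's algorithm, described above, can be sketched as follows (a simplified illustration; the bad-symbol table is stored as a dict, with absent characters implicitly getting a shift equal to the full pattern length):

```python
def shift_table(pattern):
    # Input enhancement: precompute, for each character c occurring among
    # the first m-1 pattern characters, the distance from its rightmost
    # such occurrence to the pattern's last position.
    m = len(pattern)
    table = {}
    for i in range(m - 1):
        table[pattern[i]] = m - 1 - i
    return table

def horspool(pattern, text):
    # Compare pattern and text right to left; on a mismatch (or after a
    # full match), shift by the table entry for the text character that
    # is aligned with the pattern's last position.
    m, n = len(pattern), len(text)
    table = shift_table(pattern)
    i = m - 1  # text index aligned with the pattern's last character
    while i < n:
        k = 0
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1  # leftmost occurrence found
        i += table.get(text[i], m)
    return -1
```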
By choosing the order of the tree appropriately, we can implement the operations of searching, insertion, and deletion with just a few disk accesses, even for extremely large files.

Chapter Eight : Dynamic Programming

Dynamic programming is a technique for solving problems with overlapping subproblems. Typically, these subproblems arise from a recurrence relating a solution to a given problem with solutions to its smaller subproblems of the same type. Dynamic programming suggests solving each smaller subproblem once and recording the results in a table from which a solution to the original problem can then be obtained.

Applicability of dynamic programming to an optimization problem requires the problem to satisfy the principle of optimality: an optimal solution to any of its instances must be made up of optimal solutions to its subinstances.

Computing a binomial coefficient via constructing Pascal's triangle can be viewed as an application of the dynamic programming technique to a nonoptimization problem.

Warshall's algorithm for finding the transitive closure and Floyd's algorithm for the all-pairs shortest-path problem are based on ideas that can be interpreted as applications of the dynamic programming technique.

Dynamic programming can be used for constructing an optimal binary search tree for a given set of keys and known probabilities of searching for them.

Solving a knapsack problem by a dynamic programming algorithm exemplifies an application of this technique to difficult problems of combinatorial optimization.

The memory function technique seeks to combine the strengths of the top-down and bottom-up approaches to solving problems with overlapping subproblems. It does this by solving, in the top-down fashion but only once, just the necessary subproblems of a given problem and recording their solutions in a table.
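The Pascal's-triangle computation of a binomial coefficient mentioned above can be sketched bottom up:

```python
def binomial(n, k):
    # Dynamic programming on the recurrence C(i, j) = C(i-1, j-1) + C(i-1, j)
    # with base cases C(i, 0) = C(i, i) = 1. Each subproblem is solved
    # once and recorded in the table, row by row.
    table = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(min(i, k) + 1):
            if j == 0 or j == i:
                table[i][j] = 1
            else:
                table[i][j] = table[i - 1][j - 1] + table[i - 1][j]
    return table[n][k]
```

The same recurrence evaluated top down without the table would recompute overlapping subproblems exponentially many times, which is exactly the inefficiency the table eliminates.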
Chapter Nine : Greedy Technique

The greedy technique suggests constructing a solution to an optimization problem through a sequence of steps, each expanding a partially constructed solution obtained so far, until a complete solution to the problem is reached. On each step, the choice made must be feasible, locally optimal, and irrevocable.

Prim's algorithm is a greedy algorithm for constructing a minimum spanning tree of a weighted connected graph. It works by attaching to a previously constructed subtree a vertex closest to the vertices already in the tree.

Kruskal's algorithm is another greedy algorithm for the minimum spanning tree problem. It constructs a minimum spanning tree by selecting edges in increasing order of their weights, provided that their inclusion does not create a cycle. Checking the latter condition efficiently requires an application of one of the so-called union-find algorithms.

Dijkstra's algorithm solves the single-source shortest-path problem of finding the shortest paths from a given vertex (the source) to all the other vertices of a weighted graph or digraph. It works like Prim's algorithm but compares path lengths rather than edge lengths. Dijkstra's algorithm always yields a correct solution for a graph with nonnegative weights.

A Huffman tree is a binary tree that minimizes the weighted path length from the root to the leaves containing a set of predefined weights. The most important application of Huffman trees is Huffman codes. A Huffman code is an optimal prefix-free variable-length encoding scheme that assigns bit strings to characters based on their frequencies in a given text. This is accomplished by a greedy construction of a binary tree whose leaves represent the alphabet characters and whose edges are labeled with 0's and 1's.
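The greedy Huffman-code construction described above can be sketched with a priority queue. This is a simplified version that merges per-character code tables instead of building an explicit tree; the integer counter in each heap entry is only a tiebreaker so equal weights never force Python to compare dicts.

```python
import heapq

def huffman_codes(freq):
    # Greedy step: repeatedly merge the two subtrees of smallest total
    # weight. Merging prepends '0' to every code on one side and '1' to
    # every code on the other, which labels the new tree's two edges.
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in c1.items()}
        merged.update({ch: "1" + code for ch, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, count, merged))
        count += 1
    return heap[0][2]  # map from character to its prefix-free bit string
```

Because each character sits at a leaf, no code can be a prefix of another, which is what makes decoding unambiguous.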