Introduction to The Design and Analysis of Algorithms
Chapter One : Introduction

An algorithm is a sequence of nonambiguous instructions for solving a
problem in a finite amount of time. An input to an algorithm specifies an
instance of the problem the algorithm solves.
Algorithms can be specified in a natural language or a pseudocode; they can
also be implemented as computer programs.
Among several ways to classify algorithms, the two principal alternatives are:
 to group algorithms according to the types of problems they solve;
 to group algorithms according to underlying design techniques they are
based upon.
The important problem types are sorting, searching, string processing, graph
problems, combinatorial problems, geometrical problems, and numerical
problems.
Algorithm design techniques (or "strategies" or "paradigms") are general
approaches to solving problems algorithmically, applicable to a variety of
problems from different areas of computing.
Although designing an algorithm is undoubtedly a creative activity, one can
identify a sequence of interrelated actions involved in such a process.
A good algorithm is usually a result of repeated efforts and rework.
The same problem can often be solved by several algorithms. For example,
three algorithms were given for computing the greatest common divisor of two
integers: Euclid's algorithm, the consecutive integer checking algorithm, and
the middle-school algorithm (enhanced by the sieve of Eratosthenes for
generating a list of primes).
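As an illustration, here is a minimal Python sketch of the first of these, Euclid's algorithm (the function name and sample values are ours):

```python
def gcd(m, n):
    """Euclid's algorithm: repeatedly replace (m, n) with (n, m mod n)
    until the second number becomes 0."""
    while n != 0:
        m, n = n, m % n
    return m

print(gcd(60, 24))  # 12
```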
Algorithms operate on data. This makes the issue of data structuring critical
for efficient algorithmic problem solving. The most important elementary data
structures are the array and the linked list. They are used for representing
more abstract data structures such as the list, the stack, the queue, the graph
(via its adjacency matrix or adjacency linked list), the binary tree, and the set.
An abstract collection of objects with several operations that can be performed
on them is called an abstract data type (ADT). The list, the stack, the queue,
the priority queue, and the dictionary are important examples of abstract data
types. Modern object-oriented languages support implementation of ADTs by
means of classes.
Chapter Two : Fundamentals of the Analysis of Algorithm Efficiency

There are two kinds of algorithm efficiency: time efficiency and space
efficiency. Time efficiency indicates how fast the algorithm runs; space
efficiency deals with the extra space it requires.
An algorithm's time efficiency is principally measured as a function of its
input size by counting the number of times its basic operation is executed. A
basic operation is the operation that contributes most toward running time.
Typically, it is the most time-consuming operation in the algorithm's
innermost loop.
For some algorithms, the running time may differ considerably for inputs of
the same size, leading to worst-case efficiency, average-case efficiency, and
best-case efficiency.
The established framework for analyzing an algorithm's time efficiency is
primarily grounded in the order of growth of the algorithm's running time as
its input size goes to infinity.
The notations Ο, Ω, and Θ are used to indicate and compare the asymptotic
orders of growth of functions expressing algorithm efficiencies.
The efficiencies of a large number of algorithms fall into the following few
classes: constant, logarithmic, linear, "n-log-n", quadratic, cubic, and
exponential.
The main tool for analyzing the time efficiency of a nonrecursive algorithm is
to set up a sum expressing the number of executions of its basic operation and
ascertain the sum's order of growth.
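For instance, the count for a simple scan that finds the largest element in a list can be made explicit. In this illustrative sketch (names and sample data ours), the counter adds one unit per loop repetition, so the sum evaluates to C(n) = n - 1:

```python
def max_with_count(a):
    """Return the largest element together with the number of times the
    basic operation (the comparison a[i] > maxval) was executed."""
    maxval, count = a[0], 0
    for i in range(1, len(a)):
        count += 1            # one execution of the basic operation
        if a[i] > maxval:
            maxval = a[i]
    return maxval, count

print(max_with_count([3, 9, 4, 1, 7]))  # (9, 4): n - 1 comparisons
```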
The main tool for analyzing the time efficiency of a recursive algorithm is to
set up a recurrence relation expressing the number of executions of its basic
operation and ascertain the order of growth of the recurrence's solution.
Succinctness of a recursive algorithm may mask its inefficiency.
The Fibonacci numbers are an important sequence of integers in which every
element is equal to the sum of its two immediate predecessors. There are
several algorithms for computing the Fibonacci numbers with drastically
different efficiencies.
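The contrast can be sketched in Python as follows (function names ours): the definition-based recursion takes exponential time because it recomputes the same values over and over, while a bottom-up loop takes linear time:

```python
def fib_recursive(n):
    """Direct recursion from the definition; exponential running time."""
    return n if n < 2 else fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_iterative(n):
    """Bottom-up computation; linear time and constant extra space."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_recursive(10), fib_iterative(10))  # 55 55
```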
Empirical analysis of an algorithm is performed by running a program
implementing the algorithm on a sample of inputs and analyzing the data
observed (the basic operation's count or physical running time). This often
involves generating pseudorandom numbers. The applicability to any algorithm
is the principal strength of this approach; the dependence of results on the
particular computer and instance sample is its main weakness.
Algorithm visualization is the use of images to convey useful information
about algorithms. The two principal variations of algorithm visualization are
static algorithm visualization and dynamic algorithm visualization (also called
algorithm animation).
Chapter Three : Brute Force

Brute force is a straightforward approach to solving a problem, usually
directly based on the problem's statement and definitions of the concepts
involved.
The principal strengths of the brute-force approach are wide applicability and
simplicity; its principal weakness is subpar efficiency of most brute-force
algorithms.
A first application of the brute-force approach often results in an algorithm
that can be improved with a modest amount of effort.
The following noted algorithms can be considered as examples of the brute-force approach:
 definition-based algorithm for matrix multiplication
 selection sort (see the sketch after this list)
 sequential search
 straightforward string matching algorithm.
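Here is a minimal Python sketch of selection sort (the function name and sample array are ours). Each pass scans the unsorted tail for its smallest element, so the algorithm makes Θ(n^2) comparisons on every input:

```python
def selection_sort(a):
    """Swap the smallest element of the unsorted part into place,
    one position per pass."""
    for i in range(len(a) - 1):
        smallest = i
        for j in range(i + 1, len(a)):
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a

print(selection_sort([89, 45, 68, 90, 29, 34, 17]))
```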
Exhaustive search is a brute-force approach to combinatorial problems. It
suggests generating each and every combinatorial object of the problem,
selecting those of them that satisfy the problem's constraints and then finding a
desired object.
The traveling salesman problem, the knapsack problem, and the assignment
problem are typical examples of problems that can be solved, at least
theoretically, by exhaustive-search algorithms.
Exhaustive search is impractical for all but very small instances of the
problems it can be applied to.
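A sketch of exhaustive search for the traveling salesman problem, assuming a hypothetical distance matrix (all names and data are ours); it examines all (n-1)! tours that start and end at city 0:

```python
from itertools import permutations

def tsp_brute_force(dist):
    """Generate every tour, compute its length, keep the best."""
    n = len(dist)
    best_tour, best_len = None, float("inf")
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        length = sum(dist[tour[i]][tour[i + 1]] for i in range(n))
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len

dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(tsp_brute_force(dist))  # an optimal tour and its length
```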
Chapter Four : Divide and Conquer


Divide-and-conquer is a general algorithm design technique that solves a
problem's instance by dividing it into several smaller instances (ideally of
equal size), solving each of them recursively, and then combining their
solutions to get a solution to the original instance of the problem. Many
efficient algorithms are based on this technique, although it can be both
inapplicable and inferior to simpler algorithmic solutions.
The time efficiency T(n) of many divide-and-conquer algorithms satisfies the
equation T(n) = aT(n/b) + f(n). The Master Theorem establishes the order of
growth of this equation's solutions.

Mergesort is a divide-and-conquer sorting algorithm. It works by dividing an
input array into two halves, sorting them recursively, and then merging the two
sorted halves to get the original array sorted. The algorithm's time efficiency is
in Θ(n log n) in all cases, with the number of key comparisons being very close
to the theoretical minimum. Its principal drawback is a significant extra storage
requirement.
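A compact Python sketch along these lines (names ours); it returns a new sorted list rather than sorting in place, which makes the extra-storage cost visible:

```python
def mergesort(a):
    """Split the list in half, sort each half recursively, then merge."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # merge two sorted halves
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]      # append the leftover tail

print(mergesort([8, 3, 2, 9, 7, 1, 5, 4]))
```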
Quicksort is a divide-and-conquer sorting algorithm that works by partitioning
its input's elements according to their value relative to some preselected
element. Quicksort is noted for its superior efficiency among n log n
algorithms for sorting randomly ordered arrays, but also for its quadratic
worst-case efficiency.
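One possible Python sketch (names ours); it uses the simple Lomuto partition with the last element as pivot, which is only one of several partitioning schemes (a textbook treatment may instead use Hoare's partition with the first element):

```python
def quicksort(a, lo=0, hi=None):
    """Partition the subarray around a pivot, then sort the two parts
    recursively; in place, but quadratic on already-sorted input."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    pivot, split = a[hi], lo              # Lomuto partition
    for k in range(lo, hi):
        if a[k] < pivot:
            a[k], a[split] = a[split], a[k]
            split += 1
    a[split], a[hi] = a[hi], a[split]     # pivot lands in final position
    quicksort(a, lo, split - 1)
    quicksort(a, split + 1, hi)
    return a

print(quicksort([5, 3, 1, 9, 8, 2, 4, 7]))
```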
Binary search is a Ο(log n) algorithm for searching in sorted arrays. It is an
atypical example of an application of the divide-and-conquer technique because
it needs to solve just one problem of half the size on each of its iterations.
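A minimal iterative sketch in Python (names and sample data ours):

```python
def binary_search(a, key):
    """Compare the key with the middle element of a sorted list and,
    if they differ, repeat on the half that can still contain the key."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == key:
            return mid
        if a[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # the key is not in the list

print(binary_search([3, 14, 27, 31, 39, 42, 55, 70], 31))  # 3
```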
The classic traversals of a binary tree – preorder, inorder, and postorder – and
similar algorithms that require recursive processing of both left and right
subtrees can be considered examples of the divide-and-conquer technique.
Their analysis is helped by replacing all the empty subtrees of a given tree
with special external nodes.
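A small Python sketch of the three traversals on a hypothetical six-node tree (the class and all names are ours):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def preorder(root):
    """Root, then left subtree, then right subtree."""
    return [] if root is None else (
        [root.key] + preorder(root.left) + preorder(root.right))

def inorder(root):
    """Left subtree, then root, then right subtree."""
    return [] if root is None else (
        inorder(root.left) + [root.key] + inorder(root.right))

def postorder(root):
    """Left subtree, then right subtree, then root."""
    return [] if root is None else (
        postorder(root.left) + postorder(root.right) + [root.key])

tree = Node('a', Node('b', Node('d'), Node('e')), Node('c', None, Node('f')))
print(preorder(tree))   # ['a', 'b', 'd', 'e', 'c', 'f']
print(inorder(tree))    # ['d', 'b', 'e', 'a', 'c', 'f']
print(postorder(tree))  # ['d', 'e', 'b', 'f', 'c', 'a']
```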
There is a divide-and-conquer algorithm for multiplying two n-digit integers
that requires about n^1.585 one-digit multiplications.
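This is the Karatsuba idea: replace the four half-size multiplications of the definition-based approach with three, giving n^log2(3) ≈ n^1.585 digit multiplications. A Python sketch (names ours, nonnegative integers assumed):

```python
def karatsuba(x, y):
    """Multiply x and y via three recursive multiplications of
    roughly half-size numbers."""
    if x < 10 or y < 10:
        return x * y                       # one-digit base case
    half = max(len(str(x)), len(str(y))) // 2
    base = 10 ** half
    x1, x0 = divmod(x, base)               # split each number in two
    y1, y0 = divmod(y, base)
    high = karatsuba(x1, y1)
    low = karatsuba(x0, y0)
    middle = karatsuba(x1 + x0, y1 + y0) - high - low
    return high * base * base + middle * base + low

print(karatsuba(2135, 4014))  # 8569890
```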
Strassen's algorithm needs only seven multiplications to multiply 2-by-2
matrices but requires more additions than the definition-based algorithm. By
exploiting the divide-and-conquer technique, this algorithm can multiply two
n-by-n matrices with about n^2.807 multiplications.
The divide-and-conquer technique can be successfully applied to two
important problems of computational geometry: the closest-pair problem and
the convex-hull problem.
Chapter Five : Decrease and Conquer

Decrease-and-conquer is a general algorithm design technique, based on
exploiting a relationship between a solution to a given instance of a problem
and a solution to a smaller instance of the same problem. Once such a
relationship is established, it can be exploited either top down (recursively) or
bottom up (without recursion).
There are three major variations of decrease-and-conquer:
 decrease by a constant, most often by one (e.g., insertion sort);
 decrease by a constant factor, most often by a factor of two (e.g., binary
search);
 variable size decrease (e.g., Euclid's algorithm).
Insertion sort is a direct application of the decrease-by-one technique to the
sorting problem. It is a Θ(n^2) algorithm both in the worst and average cases,
but it is about twice as fast on average as in the worst case. The algorithm's
notable advantage is good performance on almost-sorted arrays.
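A straightforward Python sketch (names and sample data ours):

```python
def insertion_sort(a):
    """Insert a[i] into its proper place among the already sorted
    elements a[0..i-1], scanning right to left."""
    for i in range(1, len(a)):
        v, j = a[i], i - 1
        while j >= 0 and a[j] > v:
            a[j + 1] = a[j]     # shift larger elements one slot right
            j -= 1
        a[j + 1] = v
    return a

print(insertion_sort([89, 45, 68, 90, 29, 34, 17]))
```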
Depth-first search (DFS) and breadth-first search (BFS) are two principal
graph traversal algorithms. By representing a graph in the form of a depth-first
or breadth-first search forest, they help in the investigation of many important
properties of the graph. Both algorithms have the same time efficiency: Θ(|V|^2)
for the adjacency matrix representation and Θ(|V| + |E|) for the adjacency
linked list representation.
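Minimal Python sketches of both traversals on an adjacency-list graph (all names and the sample graph are ours):

```python
from collections import deque

def dfs(adj, start, visited=None):
    """Depth-first traversal: go as deep as possible before backtracking."""
    if visited is None:
        visited = []
    visited.append(start)
    for neighbor in adj[start]:
        if neighbor not in visited:
            dfs(adj, neighbor, visited)
    return visited

def bfs(adj, start):
    """Breadth-first traversal: visit vertices in order of discovery,
    using a queue."""
    visited, queue = [start], deque([start])
    while queue:
        v = queue.popleft()
        for neighbor in adj[v]:
            if neighbor not in visited:
                visited.append(neighbor)
                queue.append(neighbor)
    return visited

adj = {'a': ['b', 'c'], 'b': ['a', 'd'], 'c': ['a', 'd'], 'd': ['b', 'c']}
print(dfs(adj, 'a'))  # ['a', 'b', 'd', 'c']
print(bfs(adj, 'a'))  # ['a', 'b', 'c', 'd']
```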
A digraph is a graph with directions on its edges. The topological sorting
problem asks to list vertices of a digraph in an order such that for every edge
of the digraph the vertex it starts at is listed before the vertex it points to. This
problem has a solution if and only if a digraph is a dag (directed acyclic
graph), i.e., it has no directed cycles.
There are two algorithms for solving the topological sorting problem. The first
one is based on depth-first search; the second is based on a direct
implementation of the decrease-by-one technique.
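A sketch of the first, DFS-based algorithm in Python (names and the sample dag are ours): listing the vertices in reverse order of their DFS finishing times solves the problem.

```python
def topological_sort(adj):
    """DFS-based topological sorting; assumes the digraph (given by
    adjacency lists) is a dag."""
    order, visited = [], set()

    def dfs(v):
        visited.add(v)
        for w in adj.get(v, []):
            if w not in visited:
                dfs(w)
        order.append(v)  # v is finished: all its successors are listed

    for v in adj:
        if v not in visited:
            dfs(v)
    return order[::-1]

# Five courses; each edge means "must be taken before".
adj = {'C1': ['C3'], 'C2': ['C3'], 'C3': ['C4', 'C5'], 'C4': [], 'C5': []}
print(topological_sort(adj))  # ['C2', 'C1', 'C3', 'C5', 'C4']
```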
The decrease-by-one technique is a natural approach to developing algorithms
for generating elementary combinatorial objects. The most efficient class of
such algorithms is minimal-change algorithms. However, the number of
combinatorial objects grows so fast that even the best algorithms are of
practical interest only for very small instances of such problems.
Identifying a fake coin with a balance scale, multiplication a la russe, and the
Josephus problem are examples of problems that can be solved by
decrease-by-a-constant-factor algorithms. Two other and more important
examples are binary search and exponentiation by squaring.
For some algorithms based on the decrease-and-conquer technique, the size
reduction varies from one iteration of the algorithm to another. Examples of
such variable-size-decrease algorithms include Euclid's algorithm, the
partition-based algorithm for the selection problem, interpolation search, and
searching and insertion in a binary search tree.
Chapter Six : Transform and Conquer

Transform-and-conquer is a group of techniques based on the idea of
transformation to a problem that is easier to solve.
There are three principal varieties of the transform-and-conquer strategy:
instance simplification, representation change, and problem reduction.
Instance simplification is a technique of transforming an instance of a problem
to an instance of the same problem with some special property that makes the
problem easier to solve. List presorting, Gaussian elimination, and AVL trees
are good examples of this technique.
Representation change implies changing one representation of a problem's
instance into another representation of the same instance. Examples include
representation of a set by a 2-3 tree, heaps and heapsort, Horner's rule for
polynomial evaluation, and two binary exponentiation algorithms.
Problem reduction calls for transforming a given problem to another problem
that can be solved by a known algorithm. Among examples of applying this
idea to algorithmic problem solving, reductions to linear programming and
reductions to graph problems are especially important.
Some examples used to illustrate the transform-and-conquer techniques
happen to be very important data structures and algorithms. They are: heaps
and heapsort, AVL and 2-3 trees, Gaussian elimination, and Horner's rule.

A heap is an essentially complete binary tree with keys (one per node)
satisfying the parental dominance requirement. Though defined as binary
trees, heaps are normally implemented as arrays. Heaps are most important for
the efficient implementation of priority queues; they also underlie heapsort.
Heapsort is a theoretically important sorting algorithm based on arranging
elements of an array in a heap and then successively removing the largest
element from the remaining heap. The algorithm's running time is in Θ(n log n)
both in the worst case and in the average case; in addition, it is in place.
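A Python sketch along these lines (names and sample data ours), with bottom-up heap construction followed by n - 1 root removals:

```python
def heapsort(a):
    """Build a max-heap bottom-up, then repeatedly swap the root
    (largest key) with the last element and sift the new root down."""
    n = len(a)

    def sift_down(i, size):
        while 2 * i + 1 < size:                   # while i has a child
            child = 2 * i + 1
            if child + 1 < size and a[child + 1] > a[child]:
                child += 1                        # pick the larger child
            if a[i] >= a[child]:
                break                             # parental dominance holds
            a[i], a[child] = a[child], a[i]
            i = child

    for i in range(n // 2 - 1, -1, -1):           # heapify all parents
        sift_down(i, n)
    for end in range(n - 1, 0, -1):               # n - 1 root removals
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a

print(heapsort([2, 9, 7, 6, 5, 8]))  # [2, 5, 6, 7, 8, 9]
```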
AVL trees are binary search trees that are always balanced to the extent
possible for a binary tree. The balance is maintained by transformations of
four types called rotations. All basic operations on AVL trees are in Θ(log n);
this eliminates the bad worst-case efficiency of classic binary search trees.
2-3 trees achieve a perfect balance in a search tree by allowing a node to
contain up to two ordered keys and have up to three children. This idea can be
generalized to yield very important B-trees.
Gaussian elimination – an algorithm for solving systems of linear equations –
is a principal algorithm in linear algebra. It solves a system by transforming it
to an equivalent system with an upper-triangular coefficient matrix, which is
easy to solve by backward substitutions. Gaussian elimination requires about
n^3/3 multiplications.
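A bare-bones Python sketch of forward elimination plus backward substitution (names and the sample system are ours); it omits partial pivoting, so it assumes no zero pivot is encountered:

```python
def gaussian_elimination(a, b):
    """Reduce ax = b to an upper-triangular system, then back-substitute."""
    n = len(b)
    for i in range(n):                       # eliminate below row i
        for j in range(i + 1, n):
            factor = a[j][i] / a[i][i]
            for k in range(i, n):
                a[j][k] -= factor * a[i][k]
            b[j] -= factor * b[i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):           # backward substitution
        s = sum(a[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

# Solve 2x + y - z = 8; -3x - y + 2z = -11; -2x + y + 2z = -3.
a = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
b = [8.0, -11.0, -3.0]
print(gaussian_elimination(a, b))  # [2.0, 3.0, -1.0]
```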
Horner's rule is an optimal algorithm for polynomial evaluation without
coefficient preprocessing. It requires only n multiplications and n additions. It
also has a few useful by-products such as the synthetic division algorithm.
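A Python sketch (names ours), evaluating a sample polynomial p(x) = 2x^4 - x^3 + 3x^2 + x - 5 at x = 3:

```python
def horner(coeffs, x):
    """Evaluate a polynomial given its coefficients from the highest
    power down, using only n multiplications and n additions."""
    p = 0
    for c in coeffs:
        p = p * x + c
    return p

print(horner([2, -1, 3, 1, -5], 3))  # 160
```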
Two binary exponentiation algorithms for computing a^n both exploit the binary
representation of the exponent n, but they process it in opposite directions:
left to right and right to left.
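Python sketches of both directions (function names ours):

```python
def power_left_to_right(a, n):
    """Scan the bits of n from the most significant down: square the
    accumulator, and multiply by a when the current bit is 1."""
    result = 1
    for bit in bin(n)[2:]:
        result *= result
        if bit == '1':
            result *= a
    return result

def power_right_to_left(a, n):
    """Scan the bits of n from the least significant up, accumulating
    the squares a, a^2, a^4, ... that correspond to 1-bits."""
    result, term = 1, a
    while n > 0:
        if n % 2 == 1:
            result *= term
        term *= term
        n //= 2
    return result

print(power_left_to_right(2, 13), power_right_to_left(2, 13))  # 8192 8192
```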
Linear programming concerns optimizing a linear function of several
variables subject to constraints in the form of linear equations and linear
inequalities. There are efficient algorithms capable of solving very large
instances of this problem with many thousands of variables and constraints,
provided the variables are not required to be integers. Problems with the latter
requirement, called integer linear programming problems, constitute a much
more difficult class of problems.
Chapter Seven : Space and Time Tradeoffs

Space and time tradeoffs in algorithm design are a well-known issue for both
theoreticians and practitioners of computing. As an algorithm design
technique, trading space for time is much more prevalent than trading time for
space.
Input enhancement is one of the two principal varieties of trading space for
time in algorithm design. Its idea is to preprocess the problem's input, in whole
or in part, and store the additional information obtained in order to accelerate
solving the problem afterward. Sorting by distribution counting and several
important algorithms for string matching are examples of algorithms based on
this technique.
Distribution counting is a special method for sorting lists of elements from a
small set of possible values.
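A Python sketch of distribution counting for values known to lie in lo..hi (names and sample data ours); the right-to-left copying pass keeps the sort stable:

```python
def distribution_counting_sort(a, lo, hi):
    """Count occurrences of each value, turn the counts into final
    positions via prefix sums, then copy elements into place."""
    counts = [0] * (hi - lo + 1)
    for v in a:
        counts[v - lo] += 1
    for i in range(1, len(counts)):          # prefix sums: end positions
        counts[i] += counts[i - 1]
    result = [None] * len(a)
    for v in reversed(a):                    # stable right-to-left pass
        counts[v - lo] -= 1
        result[counts[v - lo]] = v
    return result

print(distribution_counting_sort([13, 11, 12, 13, 12, 12], 11, 13))
# [11, 12, 12, 12, 13, 13]
```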

Horspool's algorithm for string matching can be considered a simplified
version of the Boyer-Moore algorithm. Both algorithms are based on the ideas
of input enhancement and right-to-left comparisons of a pattern's characters.
Both algorithms use the same bad-symbol shift table; the Boyer-Moore
algorithm also uses a second table, called the good-suffix shift table.
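A Python sketch of Horspool's algorithm (names and sample strings ours):

```python
def horspool(pattern, text):
    """Precompute how far the pattern may be shifted on a mismatch
    (the bad-symbol table), then slide the pattern along the text,
    comparing right to left. Returns the leftmost match index or -1."""
    m = len(pattern)
    # For each character among the first m-1 pattern positions, the
    # shift is the distance from its rightmost such occurrence to the end.
    shift = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    i = m - 1                       # text index aligned with pattern's end
    while i < len(text):
        k = 0
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1        # full match found
        i += shift.get(text[i], m)  # characters not in the table shift by m
    return -1

print(horspool("BARBER", "JIM_SAW_ME_IN_A_BARBERSHOP"))  # 16
```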
Prestructuring - the second type of technique that exploits space-for-time
tradeoffs - uses extra space to facilitate a faster and/or more flexible access to
the data. Hashing and B-trees are important examples of prestructuring.
Hashing is a very efficient approach to implementing dictionaries. It is based
on the idea of mapping keys into a one-dimensional table. The size limitations
of such a table make it necessary to employ a collision resolution mechanism.
The two principal varieties of hashing are open hashing or separate chaining
(with keys stored in linked lists outside of the hash table) and closed hashing
or open addressing (with keys stored inside the table). Both enable searching,
insertion, and deletion in Θ(1) time, on average.
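A toy sketch of open hashing in Python (the class name, table size, and sample keys are ours; Python's built-in hash stands in for a hash function):

```python
class ChainedHashTable:
    """Separate chaining: each cell holds a list of the keys that
    hash to it."""
    def __init__(self, size=11):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key):
        bucket = self._bucket(key)
        if key not in bucket:
            bucket.append(key)

    def search(self, key):
        return key in self._bucket(key)

table = ChainedHashTable()
for word in "A FOOL AND HIS MONEY ARE SOON PARTED".split():
    table.insert(word)
print(table.search("MONEY"), table.search("WISDOM"))  # True False
```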
The B-tree is a balanced search tree that generalizes the idea of the 2-3 tree by
allowing multiple keys at the same node. Its principal application is for
keeping index-like information about data stored on a disk. By choosing the
order of the tree appropriately, we can implement the operations of searching,
insertion, and deletion with just a few disk accesses even for extremely large
files.
Chapter Eight : Dynamic Programming

Dynamic programming is a technique for solving problems with overlapping
subproblems. Typically, these subproblems arise from a recurrence relating a
solution to a given problem with solutions to its smaller subproblems of the
same type. Dynamic programming suggests solving each smaller subproblem
once and recording the results in a table from which a solution to the original
problem can then be obtained.
Applicability of dynamic programming to an optimization problem requires
the problem to satisfy the principle of optimality: an optimal solution to any of
its instances must be made up of optimal solutions to its subinstances.
Computing a binomial coefficient via constructing the Pascal triangle can be
viewed as an application of the dynamic programming technique to a
nonoptimization problem.
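A Python sketch (names ours) that fills the table row by row:

```python
def binomial(n, k):
    """Compute C(n, k) from the Pascal triangle recurrence
    C(i, j) = C(i-1, j-1) + C(i-1, j), with C(i, 0) = C(i, i) = 1."""
    c = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(min(i, k) + 1):
            c[i][j] = 1 if j in (0, i) else c[i - 1][j - 1] + c[i - 1][j]
    return c[n][k]

print(binomial(6, 3))  # 20
```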
Warshall's algorithm for finding the transitive closure and Floyd's algorithm
for the all-pairs shortest-path problem are based on an idea that can be
interpreted as an application of the dynamic programming technique.
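A Python sketch of Floyd's algorithm on a weight matrix (names and the sample digraph are ours; INF marks a missing edge):

```python
INF = float("inf")

def floyd(w):
    """After iteration k, d[i][j] is the shortest path length from i
    to j using only intermediate vertices numbered below k + 1."""
    n = len(w)
    d = [row[:] for row in w]                # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

w = [[0, INF, 3, INF],
     [2, 0, INF, INF],
     [INF, 7, 0, 1],
     [6, INF, INF, 0]]
print(floyd(w))  # the all-pairs shortest-distance matrix
```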
Dynamic programming can be used for constructing an optimal binary search
tree for a given set of keys and known probabilities of searching for them.
Solving a knapsack problem by a dynamic programming algorithm exemplifies
an application of this technique to difficult problems of combinatorial
optimization.
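A bottom-up Python sketch (names ours) for a small instance with capacity 5:

```python
def knapsack(weights, values, capacity):
    """table[i][j] is the best value achievable with the first i items
    and knapsack capacity j."""
    n = len(weights)
    table = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, capacity + 1):
            table[i][j] = table[i - 1][j]            # skip item i
            if weights[i - 1] <= j:                  # or take item i
                take = values[i - 1] + table[i - 1][j - weights[i - 1]]
                table[i][j] = max(table[i][j], take)
    return table[n][capacity]

print(knapsack([2, 1, 3, 2], [12, 10, 20, 15], 5))  # 37
```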
The memory function technique seeks to combine strengths of the top-down
and bottom-up approaches to solving problems with overlapping subproblems.
It does this by solving, in the top-down fashion but only once, just the
necessary subproblems of a given problem and recording their solutions in a table.
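A Python sketch of the memory function approach applied to the same knapsack recurrence (names ours; functools.lru_cache plays the role of the table):

```python
from functools import lru_cache

def knapsack_memo(weights, values, capacity):
    """Top-down knapsack: only the subproblems actually needed are
    solved, and each is solved once."""
    @lru_cache(maxsize=None)
    def best(i, j):                      # first i items, capacity j
        if i == 0 or j == 0:
            return 0
        skip = best(i - 1, j)
        if weights[i - 1] > j:
            return skip
        return max(skip, values[i - 1] + best(i - 1, j - weights[i - 1]))
    return best(len(weights), capacity)

print(knapsack_memo([2, 1, 3, 2], [12, 10, 20, 15], 5))  # 37
```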
Chapter Nine : Greedy Technique

The greedy technique suggests constructing a solution to an optimization
problem through a sequence of steps, each expanding a partially constructed
solution obtained so far, until a complete solution to the problem is reached.
On each step, the choice made must be feasible, locally optimal, and
irrevocable.
Prim's algorithm is a greedy algorithm for constructing a minimum spanning
tree of a weighted connected graph. It works by attaching to a previously
constructed subtree a vertex closest to the vertices already in the tree.
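A Python sketch of Prim's algorithm with a priority queue of fringe edges (names and the sample graph are ours):

```python
import heapq

def prim(adj, start):
    """Grow the tree by repeatedly attaching the fringe vertex nearest
    to it. adj maps each vertex to (neighbor, weight) pairs; returns
    the list of tree edges."""
    in_tree, mst = {start}, []
    fringe = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(fringe)
    while fringe and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(fringe)
        if v in in_tree:
            continue                     # stale fringe entry
        in_tree.add(v)
        mst.append((u, v, w))
        for x, wx in adj[v]:
            if x not in in_tree:
                heapq.heappush(fringe, (wx, v, x))
    return mst

adj = {'a': [('b', 3), ('e', 6), ('f', 5)],
       'b': [('a', 3), ('c', 1), ('f', 4)],
       'c': [('b', 1), ('d', 6), ('f', 4)],
       'd': [('c', 6), ('e', 8), ('f', 5)],
       'e': [('a', 6), ('d', 8), ('f', 2)],
       'f': [('a', 5), ('b', 4), ('c', 4), ('d', 5), ('e', 2)]}
print(prim(adj, 'a'))  # five tree edges of total weight 15
```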
Kruskal's algorithm is another greedy algorithm for the minimum spanning
tree problem. It constructs a minimum spanning tree by selecting edges in
increasing order of their weights, provided that the inclusion does not create a
cycle. Checking the latter condition efficiently requires an application of one
of the so-called union-find algorithms.
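A Python sketch (names ours) on the same sample graph, with a rudimentary union-find:

```python
def kruskal(n, edges):
    """Scan edges in increasing weight order and accept an edge unless
    it would close a cycle, detected with union-find."""
    parent = list(range(n))

    def find(v):                         # root of v's component
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                     # different trees: accept edge
            parent[ru] = rv              # union the two components
            mst.append((u, v, w))
    return mst

# Vertices 0 (a) through 5 (f); each edge is (weight, u, v).
edges = [(3, 0, 1), (6, 0, 4), (5, 0, 5), (1, 1, 2), (4, 1, 5),
         (6, 2, 3), (4, 2, 5), (8, 3, 4), (5, 3, 5), (2, 4, 5)]
print(kruskal(6, edges))  # a minimum spanning tree of total weight 15
```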
Dijkstra's algorithm solves the single-source shortest-path problem of finding
shortest paths from a given vertex (the source) to all the other vertices of a
weighted graph or digraph. It works like Prim's algorithm but compares path
lengths rather than edge lengths. Dijkstra's algorithm always yields a correct
solution for a graph with nonnegative weights.
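A priority-queue-based Python sketch (names and the sample graph are ours):

```python
import heapq

def dijkstra(adj, source):
    """Repeatedly settle the unsettled vertex with the smallest known
    distance from the source. Assumes nonnegative edge weights."""
    dist = {source: 0}
    heap = [(0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue                     # stale heap entry
        settled.add(u)
        for v, w in adj[u]:
            if v not in settled and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

adj = {'a': [('b', 3), ('d', 7)],
       'b': [('a', 3), ('c', 4), ('d', 2)],
       'c': [('b', 4), ('d', 5), ('e', 6)],
       'd': [('a', 7), ('b', 2), ('c', 5), ('e', 4)],
       'e': [('c', 6), ('d', 4)]}
print(dijkstra(adj, 'a'))  # b: 3, c: 7, d: 5, e: 9
```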
A Huffman tree is a binary tree that minimizes the weighted path length from
the root to the leaves containing a set of predefined weights. The most
important application of Huffman trees is Huffman codes.
A Huffman code is an optimal prefix-free variable-length encoding scheme
that assigns bit strings to characters based on their frequencies in a given text.
This is accomplished by a greedy construction of a binary tree whose leaves
represent the alphabet characters and whose edges are labeled with 0's and 1's.
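A Python sketch of the construction (names and sample probabilities are ours); each heap entry carries a partial code table instead of an explicit tree, and a counter breaks ties between equal weights:

```python
import heapq

def huffman_codes(frequencies):
    """Greedy Huffman construction: repeatedly merge the two trees of
    smallest weight; merging prepends 0 (left child) or 1 (right)."""
    heap = [(freq, i, {char: ""})
            for i, (char, freq) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {c: "0" + code for c, code in left.items()}
        merged.update({c: "1" + code for c, code in right.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {'A': 0.35, 'B': 0.1, 'C': 0.2, 'D': 0.2, '_': 0.15}
print(huffman_codes(probs))
# {'C': '00', 'D': '01', 'B': '100', '_': '101', 'A': '11'}
```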