* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Parallel Computation
Sieve of Eratosthenes wikipedia , lookup
Genetic algorithm wikipedia , lookup
Probabilistic context-free grammar wikipedia , lookup
Fisher–Yates shuffle wikipedia , lookup
A New Kind of Science wikipedia , lookup
Sorting algorithm wikipedia , lookup
Simplex algorithm wikipedia , lookup
Irreducible complexity wikipedia , lookup
Expectation–maximization algorithm wikipedia , lookup
Algorithm characterizations wikipedia , lookup
Smith–Waterman algorithm wikipedia , lookup
Selection algorithm wikipedia , lookup
Multiplication algorithm wikipedia , lookup
Theoretical computer science wikipedia , lookup
Fast Fourier transform wikipedia , lookup
Computational complexity theory wikipedia , lookup
Factorization of polynomials over finite fields wikipedia , lookup
Complexity 21-1 Parallel Computation Complexity Andrei Bulatov Complexity 21-2 Multiprocessor Computation Obviously, several communicating processors often can solve problems faster than a single processor processor1 processor2 processor3 memory1 memory2 memory3 shared memory … processorn memoryn Complexity Assumptions • Exchanging messages takes no time (one step in the other model) • Writing to and reading from the memory takes no time and can be shared • We can use as many processors as we wish (for different instances of the same problem different number of processors can be used), but this number is polynomial in the size of the input - from the point of view of complexity theory, fixed number of processors makes no sense — it gives only constant speed up - we need to reveal “true” parallelism in algorithms - an algorithm with unlimited number of processors can be “scaled” to an algorithm with fixed number of processors 21-3 Complexity 21-4 Example A11 Given two matrices A A n1 compute their product C A B A1n B11 and B B Ann n1 n Recall that Cij Aik Bkj k 1 Straightforward computation of C can be done in O ( n 3 ) Can we do better with parallel computation? Obviously, yes: use a separate processor for each of the n 3 multiplications (denote it Pikj ); then use Pi1 j to compute the sum for Cij Totally, we need time O(n) B1n Bnn Complexity We have obtained polynomial speed-up This is not sufficient: changing the model of sequential computation it is possible to get some polynomial speed-up We seek for exponential drop in the time required, that is an algorithm that works in O(logn) or at least O (log k n ) 21-5 Complexity 21-6 Fast Algorithm for Matrix Multiplication Pi1 j Pi1 j Pi1 j Pi1 j Pi 2 j Pi 9 j Pi 5 j Pi 3 j Pi 3 j Pi 5 j Pi 4 j Pi 5 j Pi 6 j O(log n) steps Pi1 j Pi 7 j Pi 7 j Thus, C can be computed in O(logn) Pi 9 j Pi11 j Pi 8 j Pi 9 j Pi10 j Pi11 j Pi12 j Complexity 21-7 Complexity Measures We solved the Matrix Multiplication problem using n 3 processors in O(logn) steps These two numbers constitute the complexity measure for parallel algorithms: • number of processors • time complexity As we parallelized an algorithm of time complexity O ( n 3 ) the total amount of work done by a parallel algorithm cannot be less than n 3 n3 So the minimal number of processors required is O log n Complexity Brent’s Principle Our algorithm can be improved so that this theoretical bound is achieved n3 • compute n in log n “shifts” using processors at each shift log n 3 n • use the same processors to compute the first log log n log n parallel additions 3 • complete computation as before Since the total number of steps is at most 2 log n, this algorithm works in O(log n) Brent’s Principle If there is a parallelization, A, of a sequential algorithm that works in O(f(n)), such that the time complexity of A is g(n), then there is a parallelization, B, of the same algorithm f (n) such that B works in O(g(n)) and uses processors g (n) 21-8 Complexity Graph Reachability Reachability Instance: A finite directed graph G and vertices s and t Question: Is there a path from s to t? The natural algorithm for Reachability (breadth-first search) cannot be parallelized in an obvious way Even if many processors are arranged to process nodes from the stack, we cannot stop before a path is found that requires at least as many steps as the length of the path, that is up to n – 1 It is suspected that both searches in graphs are inherently sequential 21-9 Complexity 21-10 Adjacency Matrix 1 4 3 G 2 10 1 A(G ) 1 0 1 10 1 0 1 1 10 1 00 00 11 10 What is ( A(G )) 2 ? n A V Aik Akj 2 ij k 1 Aij2 1 if and only if there is a path from i to j of length 2 Therefore, to solve Reachability for all possible pairs of vertices we need to compute An1 This can be done in O (log 2 n ) parallel steps with O ( n 3 log 2 n ) total work