Download Parallel Computation

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Sieve of Eratosthenes wikipedia , lookup

Genetic algorithm wikipedia , lookup

Probabilistic context-free grammar wikipedia , lookup

Fisher–Yates shuffle wikipedia , lookup

A New Kind of Science wikipedia , lookup

Sorting algorithm wikipedia , lookup

Simplex algorithm wikipedia , lookup

Irreducible complexity wikipedia , lookup

Expectation–maximization algorithm wikipedia , lookup

Algorithm characterizations wikipedia , lookup

Smith–Waterman algorithm wikipedia , lookup

Selection algorithm wikipedia , lookup

Multiplication algorithm wikipedia , lookup

Theoretical computer science wikipedia , lookup

Algorithm wikipedia , lookup

Fast Fourier transform wikipedia , lookup

Computational complexity theory wikipedia , lookup

Factorization of polynomials over finite fields wikipedia , lookup

Time complexity wikipedia , lookup

Transcript
Complexity
21-1
Parallel Computation
Complexity
Andrei Bulatov
Complexity
21-2
Multiprocessor Computation
Obviously, several communicating processors often can solve problems
faster than a single processor
processor1
processor2
processor3
memory1
memory2
memory3
shared memory
…
processorn
memoryn
Complexity
Assumptions
• Exchanging messages takes no time (one step in the other model)
• Writing to and reading from the memory takes no time and can be shared
• We can use as many processors as we wish (for different instances of
the same problem different number of processors can be used), but this
number is polynomial in the size of the input
- from the point of view of complexity theory, fixed number of processors
makes no sense — it gives only constant speed up
- we need to reveal “true” parallelism in algorithms
- an algorithm with unlimited number of processors can be “scaled” to an
algorithm with fixed number of processors
21-3
Complexity
21-4
Example
 A11 


Given two matrices A   
A
 n1 
compute their product C  A  B
A1n 
 B11


  and B   
B
Ann 
 n1
n
Recall that Cij   Aik  Bkj
k 1
Straightforward computation of C can be done in O ( n 3 )
Can we do better with parallel computation?
Obviously, yes: use a separate processor for each of the n 3
multiplications (denote it Pikj );
then use Pi1 j to compute the sum for Cij
Totally, we need time O(n)



B1n 

 
Bnn 
Complexity
We have obtained polynomial speed-up
This is not sufficient: changing the model of sequential computation it is
possible to get some polynomial speed-up
We seek for exponential drop in the time required, that is an algorithm that
works in O(logn) or at least O (log k n )
21-5
Complexity
21-6
Fast Algorithm for Matrix Multiplication
Pi1 j
Pi1 j
Pi1 j
Pi1 j
Pi 2 j
Pi 9 j
Pi 5 j
Pi 3 j
Pi 3 j
Pi 5 j
Pi 4 j Pi 5 j
Pi 6 j
O(log n) steps
Pi1 j
Pi 7 j
Pi 7 j
Thus, C can be computed in O(logn)
Pi 9 j
Pi11 j
Pi 8 j Pi 9 j Pi10 j Pi11 j Pi12 j
Complexity
21-7
Complexity Measures
We solved the Matrix Multiplication problem using n 3 processors in O(logn)
steps
These two numbers constitute the complexity measure for parallel algorithms:
• number of processors
• time complexity
As we parallelized an algorithm of time complexity O ( n 3 ) the total
amount of work done by a parallel algorithm cannot be less than n 3
 n3 
So the minimal number of processors required is O 

 log n 
Complexity
Brent’s Principle
Our algorithm can be improved so that this theoretical bound is achieved
 n3 
• compute n in log n “shifts” using 
 processors at each shift
log
n


3
 n 
• use the same 
processors to compute the first log log n

 log n 
parallel additions
3
• complete computation as before
Since the total number of steps is at most 2 log n, this algorithm works in
O(log n)
Brent’s Principle
If there is a parallelization, A, of a sequential algorithm that
works in O(f(n)), such that the time complexity of A is
g(n), then there is a parallelization, B, of the same algorithm
f (n)
such that B works in O(g(n)) and uses
processors
g (n)
21-8
Complexity
Graph Reachability
Reachability
Instance: A finite directed graph G and vertices s and t
Question: Is there a path from s to t?
The natural algorithm for Reachability (breadth-first search)
cannot be parallelized in an obvious way
Even if many processors are arranged to process nodes from the stack,
we cannot stop before a path is found that requires at least as many
steps as the length of the path, that is up to n – 1
It is suspected that both searches in graphs are inherently sequential
21-9
Complexity
21-10
Adjacency Matrix
1
4
3
G
2
 10

1

A(G ) 
1

0
1
10
1
0
1
1
10
1
00

00
11

10
What is ( A(G )) 2 ?
n
A  V Aik  Akj
2
ij
k 1
Aij2  1 if and only if there is a path from i to j of length 2
Therefore, to solve Reachability for all possible pairs of vertices
we need to compute An1
This can be done in O (log 2 n ) parallel steps with O ( n 3 log 2 n ) total work