Parallel Programming
By J. H. Wang
May 2, 2017
Outline
• Introduction to Parallel Programming
• Parallel Algorithm Design
Motivation
• “Fast” isn’t fast enough
• Faster computers let you tackle larger computations
What’s Parallel Programming
• The use of a parallel computer to reduce the time needed to solve a
single computational problem
• A parallel computer is a multiple-processor system
• Multicomputers, centralized multiprocessors (SMPs)
• Programming in a language that allows you to explicitly indicate how
different portions of the computation may be executed concurrently
by different processors
• MPI: Message Passing Interface (see the sketch below)
• OpenMP: compiler directives for shared-memory multiprocessors (SMPs)
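A minimal MPI program, as a hedged sketch of the message-passing style (the hello-world computation is illustrative and not from the slides; it assumes an MPI installation, e.g. compile with mpicc and launch with mpirun -np 4):

/* Each process runs the same program; MPI tells it its rank. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);                  /* start the MPI runtime      */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's id          */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes  */
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();                          /* shut the runtime down      */
    return 0;
}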
Concurrency
• To identify operations that may be performed in parallel (concurrently)
• Data dependence graph
• Vertex u: task
• Edge u->v: task v is dependent on task u
• Data parallelism
• Independent tasks applying the same operation to different data elements
• Functional parallelism
• Independent tasks applying different operations to different data elements
• Pipelined computation
• Computation divided into stages
• Size considerations
An Example of Data Dependence Graph
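A small data dependence graph can be written out in code. The sketch below is an illustrative computation (an assumption, using OpenMP 4.0 task dependences): the first two tasks have no edge between them and may run concurrently (functional parallelism), while the third task has incoming edges from both and must wait.

/* Vertices: t1, t2, t3.  Edges: t1->t3 and t2->t3 (t3 reads a and b). */
#include <stdio.h>

int main(void) {
    int a = 0, b = 0, d = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a)    /* task t1: no incoming edges   */
        a = 2 + 3;

        #pragma omp task depend(out: b)    /* task t2: independent of t1   */
        b = 4 * 5;

        #pragma omp task depend(in: a, b)  /* task t3: waits for t1 and t2 */
        d = a + b;

        #pragma omp taskwait
    }
    printf("d = %d\n", d);                 /* prints 25 */
    return 0;
}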
Programming parallel computers
• Parallelizing compilers
• Sequential programs with compiler directives (see the sketch after this list)
• To extend a sequential programming language with parallel functions
• For the creation, synchronization, and communication of processes, e.g., MPI
• Adding a parallel programming layer
• Creation and synchronization of processes, partitioning of data
• Parallel language
• Or to add parallel constructs to existing languages
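As a sketch of the compiler-directive approach (assuming OpenMP; the loop is a made-up example), the program below is an ordinary sequential C program, and the single directive is the only parallel construct added:

#include <stdio.h>

#define N 1000000

int main(void) {
    static double x[N];

    for (int i = 0; i < N; i++)
        x[i] = i;

    /* the directive asserts that the iterations are independent, so the
       compiler and runtime may execute them concurrently on different threads */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        x[i] = 2.0 * x[i];

    printf("x[N-1] = %.1f\n", x[N - 1]);
    return 0;
}

Compiled without OpenMP support, the directive is ignored and the program runs sequentially, which is the appeal of this approach.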
Parallel Algorithm Design
• Task/Channel Model represents a parallel computation as a set of
tasks that interact by sending messages through channels
• Task: a program, its local memory, and a collection of I/O ports
• Channel: a message queue that connects one task's output port to another task's input port
• Sending is asynchronous; receiving is synchronous, i.e., the receiver blocks until the value arrives (see the sketch below)
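A hedged sketch of these channel semantics in MPI (the two-task layout and the value sent are assumptions): MPI_Isend returns without waiting for the receiver, while MPI_Recv blocks until the message arrives.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                       /* task u: writes to its output port */
        value = 42;
        MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* ... u keeps computing without waiting for v ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE); /* needed before reusing the buffer  */
    } else if (rank == 1) {                /* task v: reads from its input port */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("task v received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}

Run with at least two processes; the MPI_Wait before reusing the send buffer is an MPI requirement rather than part of the task/channel model itself.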
PCAM: a design methodology for parallel programs
• The four steps: Partitioning, Communication, Agglomeration, Mapping
Partitioning
• Dividing the computation and data into pieces
• Domain decomposition (see the sketch after this list)
• First divide the data into pieces, then determine how to associate
computations with the data
• Functional decomposition
• First divide the computation into pieces, then determine how to associate
data items with the computations
• E.g. pipelining
• To identify as many primitive tasks as possible
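A hedged sketch of domain decomposition on a toy problem (a sum of squares, not from the slides): the data is divided first, one primitive task per element, and the computation follows the data.

#include <stdio.h>

#define N 16   /* problem size; each element becomes one primitive task */

int main(void) {
    double a[N], partial[N];

    for (int i = 0; i < N; i++)
        a[i] = i;

    /* domain decomposition: the data is divided into N pieces, and the
       computation on each piece (squaring its element) is one primitive
       task; all N tasks are independent */
    for (int i = 0; i < N; i++)
        partial[i] = a[i] * a[i];

    /* later design steps (communication, agglomeration, mapping) decide how
       the partial results are combined and where each task runs */
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        sum += partial[i];

    printf("sum of squares = %.1f\n", sum);
    return 0;
}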
Checklist for partitioning
• There are at least an order of magnitude more primitive tasks than
processors
• Redundant computations and data storage are minimized
• Primitive tasks are roughly the same size
• The number of tasks is an increasing function of the problem size
Communication
• Local communication
• When a task needs values from a small number of other tasks, we create
channels from the tasks supplying data to the task consuming them
• Global communication (see the sketch after this list)
• When a significant number of primitive tasks must contribute data in order to
perform a computation
• Part of the overhead of a parallel algorithm
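A hedged sketch of a global communication step (assuming MPI and a toy partial-sum computation): every task contributes its value to a reduction that one task completes.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)rank;   /* each task's contribution (illustrative) */
    double global = 0.0;

    /* many tasks must contribute data so task 0 can perform the computation;
       this communication is overhead, not useful "work" */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum over %d tasks = %.1f\n", size, global);

    MPI_Finalize();
    return 0;
}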
Checklist for communication
• Communication operations are balanced among tasks
• Each task communicates with only a small number of neighbors
• Tasks can perform their communications concurrently
• Tasks can perform their computations concurrently
Agglomeration
• Grouping tasks into larger tasks in order to improve performance or
simplify programming
• Goals of agglomeration
• To lower communication overhead
• Increasing the locality of the parallel algorithm
• Another way to lower communication overhead is to combine groups of sending and receiving tasks, reducing the number of messages being sent (see the sketch after this list)
• To maintain the scalability of the design
• To reduce software engineering costs
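A hedged sketch of agglomerating communication (assuming MPI; the buffer size is arbitrary): one message carrying N values replaces N one-value messages, reducing the message count and the per-message startup cost.

#include <mpi.h>
#include <stdio.h>

#define N 1024

int main(int argc, char *argv[]) {
    int rank;
    double buf[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (int i = 0; i < N; i++)
            buf[i] = i;
        /* one message of N values, not N messages of one value each */
        MPI_Send(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received %d values in a single message\n", N);
    }

    MPI_Finalize();
    return 0;
}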
Checklist of Agglomeration
• The agglomeration has increased the locality of the parallel algorithm
• Replicated computations take less time than the communications they replace
• The amount of replicated data is small enough to allow the algorithm to
scale
• Agglomerated tasks have similar computational and communications costs
• The number of tasks is an increasing function of the problem size
• The number of tasks is as small as possible, yet at least as great as the
number of processors
• The tradeoff between agglomeration and the cost of modifications to
existing sequential code is reasonable
Mapping
• Assigning tasks to processors (see the sketch after this list)
• Goal: to maximize processor utilization and minimize interprocess
communication
• They are usually conflicting goals
• Finding an optimal solution is NP-hard
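A hedged sketch of a simple static mapping using a standard block decomposition (the task and processor counts are made up): processor i is assigned tasks floor(i*n/p) through floor((i+1)*n/p) - 1, so block sizes differ by at most one.

#include <stdio.h>

int main(void) {
    int n = 23, p = 4;   /* n tasks, p processors (illustrative values) */

    for (int i = 0; i < p; i++) {
        int first = (i * n) / p;            /* floor(i*n/p)         */
        int last  = ((i + 1) * n) / p - 1;  /* floor((i+1)*n/p) - 1 */
        printf("processor %d: tasks %d..%d (%d tasks)\n",
               i, first, last, last - first + 1);
    }
    return 0;
}

With these values the ratio of tasks to processors is under 10:1, so the checklist that follows would suggest using more tasks or a dynamic allocation.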
Checklist for mapping
• Designs based on one task per processor and multiple tasks per
processor have been considered
• Both static and dynamic allocation of tasks to processors have been
evaluated
• For dynamic allocation, the task allocator is not a bottleneck
• For static allocation, the ratio of tasks to processors is at least 10:1
References
• Ian Foster, Designing and Building Parallel Programs, available online
at: http://www.mcs.anl.gov/~itf/dbpp/