Introduction
• What are Parallel Algorithms?
• Why Parallel Algorithms?
• Evolution and Convergence of Parallel Algorithms
• Fundamental Design Issues

What is a Parallel Algorithm?
• An algorithm that uses multiple hardware and software components.
• Levels of parallelism
  – Circuits -- VLSI
  – Arithmetic functional units -- instruction level
  – Processors
  – Processes
  – Tasks (jobs)
  – Data
• Closely related to parallel architectures and computation models

What is a Parallel Architecture?
• A parallel computer is a collection of processing elements that cooperate to solve problems fast
• Issues:
  – Resource
    • how large a collection?
    • how powerful are the processing elements?
  – Communication and Synchronization
    • how do the elements cooperate and communicate?
    • how are data transmitted between processors?
    • what are the abstractions and primitives for cooperation?
  – Performance and Scalability
    • how does it all translate into performance?
    • how does it scale?

Inevitability of Parallel Computing
• Application demands: demand for computing cycles
  – Scientific computing: CFD, Biology, Chemistry, Physics, ...
  – General-purpose computing: Video, Graphics, CAD, Databases, TP...
• Technology Trends
  – Number of transistors on chip growing rapidly
  – Clock rates expected to go up only slowly
• Architecture Trends
  – Instruction-level parallelism valuable but limited
  – Coarser-level parallelism, as in MPs, the most viable approach
• Economics
• Current trends:
  – Today’s microprocessors have multiprocessor support
  – Servers and workstations becoming MP: Sun, SGI, DEC, COMPAQ!...
  – Tomorrow’s microprocessors are multiprocessors

Driving Forces of Parallel Architectures
• Application Trends
  – Applications provide wish lists (Grand Challenge Problems): Weather Simulation, Data Mining, N-body
  – Some problems require tera (10^12) to peta (10^15) FLOPS
• VLSI Technology Trends
  – Smaller minimum feature size (transistor size)
  – Increasing die size => transistor count doubles every 3 years
• Architectural Trends
  – Trends in computer performance: annual rate of 1.35 before 1985, 1.85 after 1986 (RISC)
• Programming Model Convergence
• Economics

Application Trends
• Transition to parallel computing has occurred for scientific and engineering computing
• In rapid progress in commercial computing
  – Database and transactions as well as financial
  – Usually smaller-scale, but large-scale systems also used
• Desktop also uses multithreaded programs, which are a lot like parallel programs
• Demand for improving throughput on sequential workloads
  – Greatest use of small-scale multiprocessors
• Solid application demand exists and will increase

General Technology Trends
• Microprocessor performance increases 50% - 100% per year
• Transistor count doubles every 3 years
• DRAM size quadruples every 3 years
• Huge investment per generation is carried by huge commodity market
[Figure: integer and FP performance, 1987-1992 (vertical axis 0-180), for Sun 4/260, MIPS M/120, MIPS M2000, IBM RS6000/540, HP 9000/750, DEC Alpha]
• Not that single-processor performance is plateauing, but that parallelism is a natural way to improve it.

Technology: A Closer Look
• Basic advance is decreasing feature size
  – Circuits become either faster or lower in power
• Die size is growing too
  – Clock rate improves roughly in proportion to the improvement in feature size
  – Number of transistors improves like the square of that improvement (or faster)
• Performance > 100x per decade; clock rate 10x, rest transistor count
• How to use more transistors?
  – Parallelism in processing
    • multiple operations per cycle reduces CPI
  – Locality in data access
    • avoids latency and reduces CPI
    • also improves processor utilization
  – Both need resources, so tradeoff
• Fundamental issue is resource distribution, as in uniprocessors
[Figure: processor (Proc) and cache ($) attached to an interconnect]

Economics
• Commodity microprocessors not only fast but CHEAP
• Development cost is tens of millions of dollars ($5-100 million typical)
• BUT, many more are sold compared to supercomputers
  – Crucial to take advantage of the investment, and use the commodity building block
  – Exotic parallel architectures no more than special-purpose
• Multiprocessors being pushed by software vendors (e.g. database) as well as hardware vendors
• Standardization by Intel makes small, bus-based SMPs commodity
• Desktop: few smaller processors versus one larger one?
  – Multiprocessor on a chip

Challenge of Parallel Processing
• Microprocessors and memory are cheap: connect bunches of them together and use them either asynchronously or synchronously.
• Example:
  – each board: 100 microprocessors of 10 MFLOPS each
  – each system: 100 boards
  – speed: 100 GFLOPS (100 boards x 100 processors x 10 MFLOPS)
• Reasons it is challenging:
  – Cache coherency
  – Extra design time for bus, interface
  – Compiler may not find parallelism easily
  – Communication and I/O complexity
  – Economics
  – Amdahl's Law
  – Writing parallel programs is not easy

Amdahl's Law
• Speedup S is bounded by

      S <= 1 / (f + (1 - f) / p)

  where p is the number of PEs and f is the serial fraction of the program.
• Example: if 10% of a program must run sequentially, the maximum speedup is 10
• To achieve S = 80 with p = 100, 99.75% of the code must be parallel
• How to overcome the law?
  – Increase problem size: scaled speedup.
  – Develop parallel algorithms that maximize parallelism and minimize sequential operations.

How to Solve Communication Overhead
• Increase communication bandwidth
• Decrease communication latency
• Hide communication latency
• Increase granularity
(Figure 8.4 of Hennessy's book)
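
Worked example for Amdahl's Law
The bound on the Amdahl's Law slide can be checked numerically. The following is a minimal Python sketch and is not part of the original slides; the function names amdahl_speedup and serial_fraction_for are hypothetical helpers introduced only for illustration. It evaluates the bound 1 / (f + (1 - f) / p) and inverts it to find the largest serial fraction that a target speedup allows.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Upper bound on speedup for serial fraction f on p processing elements."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)


def serial_fraction_for(target_speedup: float, p: int) -> float:
    """Largest serial fraction f that still allows target_speedup on p PEs.

    Obtained by solving S = 1 / (f + (1 - f) / p) for f:
        f = (p / S - 1) / (p - 1)
    """
    if p < 2 or not 1.0 <= target_speedup <= p:
        raise ValueError("need p >= 2 and 1 <= target_speedup <= p")
    return (p / target_speedup - 1.0) / (p - 1.0)


if __name__ == "__main__":
    # With 10% serial code the bound approaches 1 / 0.1 = 10 as p grows,
    # but never reaches it for any finite p.
    for p in (10, 100, 1000, 1_000_000):
        print(f"f=0.10, p={p:>7}: S <= {amdahl_speedup(0.10, p):.2f}")

    # Serial fraction allowed for a speedup of 80 on 100 PEs.
    f = serial_fraction_for(80, 100)
    print(f"S=80 on p=100 allows a serial fraction of at most {f:.4%}")
```

Running the script shows how slowly the bound approaches 10 when 10% of the work is serial, and that a speedup of 80 on 100 PEs leaves room for only about a quarter of a percent of serial work.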