Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Towards a Realistic Scheduling Model Oliver Sinnen, Leonel Sousa, Frode Eika Sandnes IEEE TPDS, Vol. 17, No. 3, pp. 263-275, 2006. Parallel processing is the oldest discipline in computer science – yet the general problem is far from solved Why is parallel processing difficult? • ”Jo flere kokker jo mere søl” – Partitioning and transforming problems – Load balancing – Inter-processor communication – Granularity – Architecture Implementing parallel systems • Manually – MPI – PVM – Linda • Automatically – Parallelising compilers (Fortran) – Static scheduling Taskgraph scheduling: Representing static computations Modelling computations A=B+C C Valid sequences: CBA, BCA Invalid sequences: ABC, ACB, CAB, BAC B A Data dependencies Another example A = (B-C)/D F = B+G D C B A G F Scheduling Static taskgraph scheduling techniques The scheduling process A c1 c3 D Allocation C c4 p2 A c2 B p1 B C c5 E D E Taskgraph Schedule Topological sorting • Topological sorting – to order the vertices of a graph such that the precedence constraints are not violated • All valid schedules represent a topological sort • Scheduling algorithms differ in how they topologically sort the graph The importance of abstraction • Abstraction is important to preserve generality • Too specific float sum = 0; for (int i=0;i<8;i++) { sum += a[i]; } • General and flexible float sum = sumArray(a); Communication Communication is a major bottleneck • Typically from 1:50 to 1:10,000 difference between computation and communication • Communication cost not very dependent on data size. • Interconnection network topology affect the overall time. Scheduling work prior to 1995 • Assumptions – Zero-interprocessor communication costs – Fully connected processor interconnection networks. Amounts of data transfer Public transport is a good thing? Data-size not is not major factor • Multiple single messages connect send connect send connect send • Single compound message connect send send send Interconnection topology Fully connected The ring To send something from here.. …to here Interprocessor communication • Zero vs non-zero communication overheads • Direct links vs connecting nodes P11 P12 P13 P1 P2 P3 P14 P4 Bus RAM Shared memory Bus-based multiprocessor P21 P22 P23 P24 P31 P32 P33 P34 P41 P42 P43 P44 Distributed memory Mesh multiprocessor Avoiding communication overheads Duplication 1 a 1 1 1 b 1 c duplication 1 a 1 1 b allocation p1 1 a 1 1 c allocation p2 p1 p2 t=1 a t=1 a a t=2 b t=2 b c t=3 c When considering communication overheads Classic communication model: Assumptions • Local communications have zero communication costs • Communication is conducted by subsystem. • Communication can be performed concurrently • The network is fully connected Implications • Network contention (not modelled) – Tasks compete for communication resources • Contention can be modelled: – Different types of edges – Switch verticies (in addition to processor verticies) Processor involvement in communication I Two-sided involvement (TCP/IP PC-cluster) Processor involvement in communication II One-sided involvement (Shared memory Cray T3E) Processor involvement in communication III Third party involvement (Dedicated DMA hardware Meiko CS-2) Problems • All classic scheduling models assume third-party involvement. • Very little hardware are equipped with dedicated hardware supporting third-party involvement. • Estimated finish-times for tasks are hugely inaccurate. • Scheduling algorithm are very suboptimal. Even more problems Results bobcat Sun E3500 3TE-900 The End