Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Single Assignment C: A High Productivity Language for High Performance Computing Clemens Grelck Workshop Trends in High-Performance Distributed Computing Amsterdam, Netherlands March 14, 2012 Clemens Grelck, Universiteit van Amsterdam Single Assignment C The HPˆ3 Triangle in High Performance Computing High Productivity High Performance High Portability Clemens Grelck, Universiteit van Amsterdam Single Assignment C HPC in the Age of Multi- and Many-Core Welcome to the Cluster Node Hardware Zoo!! Clemens Grelck, Universiteit van Amsterdam Single Assignment C HPC in the Age of Multi- and Many-Core Welcome to the Cluster Node Hardware Zoo!! Clemens Grelck, Universiteit van Amsterdam Single Assignment C HPC in the Age of Multi- and Many-Core Welcome to the Cluster Node Hardware Zoo!! Clemens Grelck, Universiteit van Amsterdam Single Assignment C HPC in the Age of Multi- and Many-Core Welcome to the Cluster Node Hardware Zoo!! Clemens Grelck, Universiteit van Amsterdam Single Assignment C HPC in the Age of Multi- and Many-Core Welcome to the Cluster Node Hardware Zoo!! Clemens Grelck, Universiteit van Amsterdam Single Assignment C HPC in the Age of Multi- and Many-Core Welcome to the Cluster Node Hardware Zoo!! Clemens Grelck, Universiteit van Amsterdam Single Assignment C HPC in the Age of Multi- and Many-Core Welcome to the Cluster Node Hardware Zoo!! Clemens Grelck, Universiteit van Amsterdam Single Assignment C HPC in the Age of Multi- and Many-Core Compute node hardware is a zoo: I Vastly different numbers of cores I Vastly different core architectures: general, restricted I Vastly different memory architectures Clemens Grelck, Universiteit van Amsterdam Single Assignment C HPC in the Age of Multi- and Many-Core Compute node hardware is a zoo: I Vastly different numbers of cores I Vastly different core architectures: general, restricted I Vastly different memory architectures Programming diverse hardware is uneconomic: I Various low-level programming models I Each requires different expert knowledge I Heterogeneous combinations of the above ? I Cumbersome, error-prone and inefficient Clemens Grelck, Universiteit van Amsterdam Single Assignment C An Alternative: Single Assignment C (SAC) Credo: abstraction, abstraction, abstraction I Program what to compute, not exactly how I Leave concrete organisation of execution to compiler and runtime system I Put expert knowledge into tools, not into applications I Architecture-agnostic programming for portability I Compile one source to diverse target hardware I Automatically manage resources: memory, cores, etc Clemens Grelck, Universiteit van Amsterdam Single Assignment C SAC — Design Space SAC Clemens Grelck, Universiteit van Amsterdam Single Assignment C SAC — Design Space High-level functional, data-parallel programming with vectors, matrices, arrays SAC Clemens Grelck, Universiteit van Amsterdam Single Assignment C SAC — Design Space High-level functional, data-parallel programming with vectors, matrices, arrays SAC Suitability to achieve high performance in sequential and parallel execution Clemens Grelck, Universiteit van Amsterdam Single Assignment C SAC — Design Space High-level functional, data-parallel programming with vectors, matrices, arrays SAC Easy to adopt for programmers with an imperative background Suitability to achieve high performance in sequential and parallel execution Clemens Grelck, Universiteit van Amsterdam Single Assignment C + FFT (rofu) FFT roots of unity Case Study: 1-Dimensional Complex FFT (NAS-FT) complex [.] FFT ( complex [.] v , complex [.] rofu ) { even = condense (2 , v ); odd = condense (2 , drop ( [1] , v )); even = FFT ( even , rofu ); odd = FFT ( odd , rofu ); * rofu = condense ( len ( rofu ) / len ( odd ) , rofu ); − left = even + odd * rofu ; right = even - odd * rofu ; return left ++ right ; } Clemens Grelck, Universiteit van Amsterdam Single Assignment C + FFT (rofu) FFT roots of unity Case Study: 1-Dimensional Complex FFT (NAS-FT) complex [.] FFT ( complex [.] v , complex [.] rofu ) { even = condense (2 , v ); odd = condense (2 , drop ( [1] , v )); even = FFT ( even , rofu ); odd = FFT ( odd , rofu ); * rofu = condense ( len ( rofu ) / len ( odd ) , rofu ); − left = even + odd * rofu ; right = even - odd * rofu ; return left ++ right ; } Now, what is functional array programming ? Clemens Grelck, Universiteit van Amsterdam Single Assignment C + FFT (rofu) FFT roots of unity Functional Array Programming complex [.] FFT ( complex [.] v , complex [.] rofu ) { even = condense (2 , v ); odd = condense (2 , drop ( [1] , v )); even = FFT ( even , rofu ); odd = FFT ( odd , rofu ); * rofu = condense ( len ( rofu ) / len ( odd ) , rofu ); − left = even + odd * rofu ; right = even - odd * rofu ; return left ++ right ; } Role of Functions: I Map argument values to result values I No side effects I Call-by-value parameter passing Clemens Grelck, Universiteit van Amsterdam Single Assignment C + FFT (rofu) FFT roots of unity Functional Array Programming complex [.] FFT ( complex [.] v , complex [.] rofu ) { even = condense (2 , v ); odd = condense (2 , drop ( [1] , v )); even = FFT ( even , rofu ); odd = FFT ( odd , rofu ); * rofu = condense ( len ( rofu ) / len ( odd ) , rofu ); − left = even + odd * rofu ; right = even - odd * rofu ; return left ++ right ; } Role of Variables: I Variables are placeholders for values I Variables do not denote memory locations I Automatic memory management Clemens Grelck, Universiteit van Amsterdam Single Assignment C + FFT (rofu) FFT roots of unity Functional Array Programming complex [.] FFT ( complex [.] v , complex [.] rofu ) { even = condense (2 , v ); odd = condense (2 , drop ( [1] , v )); even = FFT ( even , rofu ); odd = FFT ( odd , rofu ); * rofu = condense ( len ( rofu ) / len ( odd ) , rofu ); − left = even + odd * rofu ; right = even - odd * rofu ; return left ++ right ; } Execution Model: I Contextfree substitution of expressions I No side-effects Clemens Grelck, Universiteit van Amsterdam Single Assignment C + FFT (rofu) FFT roots of unity Functional Array Programming complex [.] FFT ( complex [.] v , complex [.] rofu ) { even = condense (2 , v ); odd = condense (2 , drop ( [1] , v )); even = FFT ( even , rofu ); odd = FFT ( odd , rofu ); * rofu = condense ( len ( rofu ) / len ( odd ) , rofu ); − left = even + odd * rofu ; right = even - odd * rofu ; return left ++ right ; } Control flow constructs: I Branches are syntactic sugar for conditional expressions I Loops are syntactic sugar for tail-end recursive functions I Data flow determines execution order Clemens Grelck, Universiteit van Amsterdam Single Assignment C + FFT (rofu) FFT roots of unity Functional Array Programming complex [.] FFT ( complex [.] v , complex [.] rofu ) { even = condense (2 , v ); odd = condense (2 , drop ( [1] , v )); even = FFT ( even , rofu ); odd = FFT ( odd , rofu ); * rofu = condense ( len ( rofu ) / len ( odd ) , rofu ); − left = even + odd * rofu ; right = even - odd * rofu ; return left ++ right ; } Nature of Arrays: I Pure values, mapping indices to (other) values I No state, no fixed memory representation I Maybe no memory manifestation at all Clemens Grelck, Universiteit van Amsterdam Single Assignment C Case Study: 3-Dimensional Complex FFT (NAS-FT) Algorithmic idea: FFT1d FFT1d FFT1d Implementation: complex [. ,. ,.] FFT ( complex [. ,. ,.] a , complex [.] rofu ) { b = { [. ,y , z ] -> FFT ( a [. ,y , z ] , rofu ) }; c = { [x ,. , z ] -> FFT ( b [x ,. , z ] , rofu ) }; d = { [x ,y ,.] -> FFT ( c [x ,y ,.] , rofu ) }; return d ; } typedef double [2] complex ; Clemens Grelck, Universiteit van Amsterdam Single Assignment C The Same in Fortran Clemens Grelck, Universiteit van Amsterdam Single Assignment C The Power of Abstraction I Programming by composition of building blocks: I I I I I Opportunities for compiler and runtime system: I I I I I Rapid prototyping High confidence in correctness Good readability of code Plenty of code reuse opportunities Target-independent optimisation Code generation for variety of target architectures Automatic parallelisation Automatic memory management Result: I I I Sufficient performance For a range of architectures Without extra effort Clemens Grelck, Universiteit van Amsterdam Single Assignment C Compilation Challenge SAC Functional Array Programming SAC2C Advanced Compiler Technology FPGAs Amsterdam MicroGrid Architecure Multi−Core Multi− Processors Systems on a Chip GPGPU Accelerators And achieve reasonable performance.... Goal: achieve 90% with 10% of the effort Clemens Grelck, Universiteit van Amsterdam Single Assignment C Clusters SAC as a Compiler Technology Project Some Figures: I SAC compiler + runtime library: I I I 300,000 lines of code, about 1000 files about 250 compiler passes + standard prelude + standard library I More than 15 years of research and development I More than 30 people involved over the years Involved Universities: I University of Kiel, Germany (1994–2005) I University of Toronto, Canada (since 2000) I University of Lübeck, Germany (2001–2008) I University of Hertfordshire, England (2004–2012) I University of Amsterdam, Netherlands (since 2008) I Heriot-Watt University, Scotland (since 2011) Clemens Grelck, Universiteit van Amsterdam Single Assignment C Conclusion High Productivity High Performance High Portability Single Assignment C: I reconcile productivity, portability and performance I one source for all architectures I functional array programming I exposure of fine-grained concurrency I automatically sequentialising compiler I automatic resource management www.sac-home.org Clemens Grelck, Universiteit van Amsterdam Single Assignment C