Stamatis Vassiliadis Symposium
The Future of Computing
A+A=A
Mateo Valero
Barcelona Supercomputing Center
To Stamatis, my loved friend

The way we all do research ... (as seen from HPCA 1999)
• Microarchitecture idea
• Applications: SPEC, PerfectClub, TPC-D, NAS, Splash …
• Compiler: production, public, custom, …
• Simulator: public, custom, …
• Results: how much we get from our idea

The Past Future ... (as seen from HPCA 1999)
Layers: Applications, Algorithms, Compiler, Architecture, Hardware
Absolutely obsessed with going to the limits of extracting available ILP on a single core

The Past Future Continued: Advanced ILP Techniques for Superscalar Processors
• Optimized Pipeline
• Cache
• Branch Predictors
• Instruction Collapsing
• Value Prediction
• Reuse
• Assisted/Subordinated Threads
• Trace Cache/Processor
• Control/Data Speculation
• Kilo-instruction Processors
• ………

Distant Parallelism: Non-numerical applications
• (In)Dependent threads: e.g. m88ksim
• Application speed-up: 2.65
[Figure: m88ksim thread decomposition into FETCH, EXE and TIMING; labels: fetch_next, PC guess, breakpoint?, Sbus2, Real_execution, check_issue, kill_time, cmmutime, statistics]

The “immediate” future: Number of cores doubled every 18 months
“It is better for Intel to get involved in this now so when we get to the point of having 10s and 100s of cores we will have the answers. There is a lot of architecture work to do to release the potential, and we will not bring these products to market until we have good solutions to the programming problem”
Justin Rattner, Intel CTO
MareNostrum
Most beautiful supercomputer (Fortune magazine, Sept. 2006)
#1 in Europe, #5 in the World
100's of TeraFlops with general purpose Linux supercluster of commodity PowerPC-based Blade Servers
“Now, the grains inside these machines more and more will be multi-core type devices, and so the idea of parallelization won't just be at the individual chip level, even inside that chip we need to explore new techniques like transactional memory that will allow us to get the full benefit of all those transistors and map that into higher and higher performance.”
Bill Gates, Supercomputing 05 keynote

Supercomputers will likely have millions of processing cores

The “far” future (e.g. 2017) and the big question!
How to solve the programming problem? a.k.a. How to program the beast?
• How to enable the power of the hundreds to millions of cores on a system?
• Computer Architects must adapt their thinking. From now on, parallel software requirements will directly drive systems design
• We need a multidisciplinary top-down approach to this, including
  • Applications
  • Algorithms
  • Debugging
  • Programming models
  • Programming languages
  • Compilers
  • Operating Systems
  • Runtime environment
  … as design drivers for future Architectures

The holistic view: A + A = A
How to solve the programming problem? a.k.a. How to program the beast?
• How to enable the power of the hundreds to millions of cores on a system?
• Computer Architects must adapt their thinking. From now on, many-core software requirements will directly drive processor design
• We need a multidisciplinary top-down approach to this, including
  • Applications
  • Algorithms
  • Debugging
  • Programming models
  • Programming languages
  • Compilers
  • Operating Systems
  • Runtime environment
  … as design drivers
Applirithms + Adhesive = Architecture

Far Future: Applications
• What will be the typical applications in 2017?
• Is Dwarfs and/versus RMS the right path to follow?
• Applications are ephemeral but the kernels are forever: the applications may change, the kernels stay the same.
• Will streaming applications require new architectures?
• Are we approaching special-purpose accelerators for specific applications?
M. Valero. Microsoft Workshop on Multicore, Seattle, June-2007

Far Future: Algorithms
• Bad news (for some folks): “Rethink and rewrite the algorithms”
• For manycores, the algorithms need to carefully consider:
  • The right level of parallelism
  • Load balancing
  • Communication-computation overlapping
  • Speculation (e.g. in message passing)
Source: Jack Dongarra

Top-Down CMP Design, an initial programmer wishlist
• Easy-to-express parallelism
  • Transactional Memory (TM): compared to locks, TM provides an easy-to-use mechanism for ensuring mutual exclusion
• Hide all kinds of non-uniformities from the programmer (heterogeneous cores, non-uniform memory access, …)
• Continue using standard tools
  • OpenMP: the industry standard for writing parallel programs on shared memory
  • TM and OpenMP combine ease with familiarity for programming multi-cores
    • BSC-UPC-Microsoft: IWOMP07, MEDEA07
    • Stanford: PACT07
• Dataflow model ideally suited to express parallelism
  • Cell Superscalar = Distant Parallelism + Data Flow + Out-of-Order Execution
• Supercomputers: MPI + (OpenMP/Cell Superscalar) + TM

Chip organization in 2017: many-core
• Will they be homogeneous or heterogeneous? Arrays of simple in-order cores, fewer complex out-of-order cores, or a mix of the two?
Consentry and Internet Security
• Is Simultaneous Multithreading just for servers?
[Figure: many-core chip organization with caches, on-die interconnects, an off-die interconnect and memory]
• How many cores will the processor of 2017 have?
• Should we push for further optimizing classical OoO implementations, or research how to put into practical use radical new approaches such as dataflow or asynchronous architectures?

Chip organization in 2017: memory and interconnection network
• How will the latency and bandwidth problems be addressed?
• 3D integration aware Computer Architecture: it is a great future idea. Will it always be a great future idea?
• What is the best many-core interconnect topology?
• How can we evaluate the importance of the interconnection network in the applications?
• What obstacles do parallel applications face when I/O doesn't scale well?

An overall picture of the Microsoft Many-core project
[Diagram spanning Applications, Programming model and Architecture]
• Programming models for future many-core architectures
• Transactional Memory: functional and imperative
• Architectural support to programming models
• Diagram labels: OpenMP+TM, STM, HTM, HW acceleration for Haskell, Many-core architecture, Power-aware

An overall picture of the IBM MareIncognito project
• Our 10-100 Petaflop research project for BSC (2010)
• Port/develop applications to reduce time-to-production once installed
• Programming models (MPI, OpenMP+TM, CellSs)
• Tools for application development and to support previous evaluations
• Evaluate node architecture (heavily multicored)
• Evaluate interconnect options
[Figure labels: Application development and tuning; Performance analysis and prediction tools; Model and prototype; Interconnect; Fine-grain programming models; Load balancing; Processor and node]

Supercomputing and e-Science Consolider program
• 5 Grand Challenge applications
• 22 groups
• 119 senior researchers
[Figure: application areas (Life Sciences, Earth Sciences, Astrophysics, Engineering, Material Sciences) linked to technology areas (Compilers and tuning of application kernels; Programming models and performance tuning tools; Architectures and hardware technologies). Legend: Strong interaction; Interaction to be created]

Education for multi-core
[Figure: “Multicore-based pacifier”; text: “I”, “programming multicores”]