CCS machine development plan for post-petascale computing and the Japanese Next-Generation Supercomputer project
Mitsuhisa Sato
CCS, University of Tsukuba
2010.2.22

Computing resources in CCS

PACS-CS (2006~)
• #node: 2560 nodes (Intel Xeon 2.8GHz, single core/node)
• Peak performance: 14.34 TF
• Memory: 5 TB
• Network: 250MB/s/link x 3 (3D-HXB by GbE)

FIRST (2007~)
• A special-purpose system for astrophysics simulation by hybrid computation of radiation and N-body.
• Each node is equipped with GRAPE-6, an accelerator specialized for N-body gravity calculation.
• 256 nodes
• Performance: cluster 3.5TFLOPS + GRAPE-6 35TFLOPS

T2K-tsukuba (2008~)
• Designed by T2K Open Supercomputer Alliance (U. Tokyo and Kyoto U)
• Network: DDR Infiniband x 4
• 648 nodes (quad Opteron, 4 sockets/node)
• 10000 cores
• Peak performance 95.4TF
• Total memory 20TB
• Total disk capacity 800TB
• 20th in the TOP500, June 2008
[Figure: node block diagram showing cores grouped with memory blocks, and IO interfaces.]

Full bi-sectional fat-tree network
[Figure: nodes connect through L1, L2, and L3 switches (SWs). Legend: node with 4 links; 24-port IB switch. Includes a detail view for one network unit; the network has 2 x 20 network units.]

Item              Count
Node                696
Level 3 switch      144
Level 2 switch      240
Level 1 switch      232
Total switch        616

Note: the total of 696 nodes includes 4 online spare nodes.

System installation and future plans
[Timeline figure, 2005 (H17) to 2013: CP-PACS, PACS-CS, FIRST, T2K, HA-PACS (planned), the next system to T2K, and NGS (10PF); front-end systems FCS-IV (planned) and FCS-V (FCS: front-end system), 2011-2013; VPP suspended.]

Issues for post-petascale systems (not exa?)
• Systems that enable strong scaling
• The current petascale systems are enabled by weak scaling
• We need more powerful nodes and networks
• More specialized architecture
• GPGPU is one solution
• We need a sharp science target
• Not all applications can use peak flops
• More difficult to program
• Needs support from the CS side
• Collaboration between computer science and computational science: CCS's mission
[Figure: performance (1GFlops = 10^9, 1TFlops = 10^12, 1PFlops = 10^15, 1EFlops = 10^18) plotted against #node (1 to 10^6), marking PACS-CS (14TF), NGS (> 10PF), the target of HA-PACS, an exaflops system, and the limitation of #node.]

HA-PACS: Highly Accelerated Parallel Advanced system for Computational Sciences (planned)
Objective: to investigate acceleration technologies for post-petascale computing and their software, algorithms, and computational science applications, and to demonstrate them by building a prototype system
• Design and deploy a GPGPU-based cluster system
• Research on programming models, languages, and environments for parallel systems with accelerators
• Design of algorithms and applications for parallel systems with accelerators
• Research on architectures for parallel systems with accelerators

Example configuration
• Node configuration: 8-core CPU x 2 + GPU x 2
• Network configuration: Infiniband QDR x 2 / node, full-bisection bandwidth fat-tree
• Peak performance: 2TFLOPS/node x 324 = 648TFLOPS
[Figure: 2-stage fat-tree (Infiniband QDR) of IB switches. Each node has two 8-core CPUs, two GPGPUs, and Infiniband QDR x 2 ports. Node groups are labelled 12 node, 18 node, 18 node, ...; 18 groups; total #node = 18 x 18 = 324.]

HA-PACS/NG powered by PEARL Link
PEARL: PCI-Express Adaptive and Reliable Link
• Use PCI-Express as a high-speed link
• Connect the CPU and devices, including GPGPUs, through a router chip, PEACH (PCI-Express Adaptive Communication Hub)
• Nodes (CPU, PCIe, GPGPU) remain connected to the IB switches by Infiniband QDR
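As a rough cross-check of the HA-PACS example configuration quoted earlier (18 groups of 18 nodes, 2TFLOPS per node), the node count and aggregate peak can be recomputed from the per-group and per-node figures. This is only a sketch: the constants are taken from the slide, while the function and variable names are hypothetical and do not come from any HA-PACS software.

```python
# Cross-check of the HA-PACS example configuration figures.
# Constants are the values quoted on the slide; all names are illustrative.

NODES_PER_GROUP = 18      # nodes per group, as quoted ("18 node")
GROUPS = 18               # number of groups, as quoted ("18 groups")
PEAK_PER_NODE_TFLOPS = 2  # quoted peak for 8-core CPU x 2 + GPU x 2


def total_nodes(nodes_per_group: int, groups: int) -> int:
    """Total node count when every group holds the same number of nodes."""
    return nodes_per_group * groups


def system_peak_tflops(nodes: int, peak_per_node_tflops: float) -> float:
    """Aggregate peak, assuming every node reaches the same per-node peak."""
    return nodes * peak_per_node_tflops


nodes = total_nodes(NODES_PER_GROUP, GROUPS)
peak = system_peak_tflops(nodes, PEAK_PER_NODE_TFLOPS)
print(f"{nodes} nodes, {peak} TFLOPS peak")
```

Both results agree with the slide: 18 x 18 = 324 nodes and 2TFLOPS/node x 324 = 648TFLOPS. The aggregate is a peak figure only; as the issues slide notes, not all applications can use peak flops.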
[Figure: HA-PACS/NG PEARL Link topology. Groups of 12 nodes joined by PEARL Links; GPUs and CPUs attach to PEACH chips. Labels: "To neighbor node", "To external PCI-e switch", "Direct Connection between GPUs".]

Strategic target computational sciences of HA-PACS
① Biophysics: high-performance QM/MM hybrid simulation for the mechanisms of high-efficiency enzymatic reactions and the electronic and 3D structures of biomacromolecules
• Speedup of QM is key for these simulations
② Astrophysics: full hydrodynamics and radiative-transfer simulation for the Universe and the formation of astronomical objects
• Full 6-dimensional simulation is required
③ Particle physics: full-lattice QCD simulation

The Japanese Next-Generation Supercomputer project

Background: Japanese government plan
The 3rd Science and Technology Basic Plan (FY2006-FY2010)
• “Next-generation super computing technology” was selected as one of the key technologies of national importance
• Development and installation of the advanced high-performance supercomputer system (10 petaflops) → the Next-Generation Supercomputer
• Development of application software
• Establishment of the “Advanced Computational Science and Technology Center” (tentative name)
The 4th Science and Technology Basic Plan (FY2011-FY2015) (now under discussion)
• Exaflops-class HPC technology
• New chip device, software, hardware…

After the election of the House of Representatives last summer,….
• In November of last year, the new governing party decided to freeze the development plan at the screening of government projects.
• In January of this year, the cabinet decided to resume the supercomputer project.

The System Overview of NGS
【Massively Parallel/Distributed Memory Supercomputer】
Ultra high-speed, highly reliable CPU
• Advanced 45nm process technology
• 8 cores/CPU, 128GFLOPS
• Error recovery (ECC, instruction retry, etc.)
High-performance, highly reliable network
• Direct interconnection network by multi-dimensional mesh/torus network
• Expandability and reliability
System software
• Linux OS
• Fortran, C, and MPI libraries
• Distributed parallel file system
[Figure: logical 3-dimensional torus network. Courtesy of FUJITSU.]

Configuration of Compute Nodes

Item                            Value
Number of nodes                 > 80k
Number of CPUs                  > 80k
Number of cores                 > 640k
Peak performance                > 10PFLOPS
Total memory capacity           > 1PB (16GB/node)
Peak bi-sectional bandwidth     > 30TB/s

• Network: multi-dimensional mesh/torus network
• Peak bandwidth: 5GB/s x 2 for each direction of the logical 3-dimensional torus network
[Figure: node. CPU: 128GFLOPS (8 cores; each core SIMD (4FMA), 16GFLOPS); L2$: 5MB; 64GB/s; MEM: 16GB; 5GB/s x 2 links along the x, y, and z axes of the logical 3-dimensional torus network for programming.]

The Next-Generation Supercomputer Project
○ Schedule (FY2006 to FY2012)
• System: conceptual design; detailed design; prototype and evaluation; production, installation, and adjustment; tuning and improvement; open to users
• Applications (Next-Generation Integrated Nanoscience Simulation; Next-Generation Integrated Life Simulation): development, production, and evaluation; verification
• Buildings (computer building; research building): design; construction

The categories of users of NGS
1. Strategic Use: MEXT selected 5 strategic fields from a national viewpoint.
• Field 1: Life science/Drug manufacture
• Field 2: New material/energy creation
• Field 3: Global change prediction for disaster prevention/mitigation
• Field 4: Mono-zukuri (Manufacturing technology)
• Field 5: The origin of matters and the universe
2. General Use: use that serves the needs of researchers in many science and technology fields, including industrial use and educational use.

Organization for NGS
The “Advanced Computational Science and Technology Center” (ACSTC) (tentative name) will be organized at NGS.
MEXT selects 5 core organizations that lead research activities in the 5 strategic fields.
ACSTC → Core research center
• Conducts advanced and basic R&D in computational science
• Leads cooperation among strategic fields
• Provides key knowledge to the 5 organizations in strategic fields and to other research organizations
5 core organizations → Research center in each field
• Conducts advanced R&D in each field
CCS was selected as a core organization for "Field 5: The origin of matters and the universe"
• Particle physics, astrophysics, nuclear physics
• Collaboration with KEK and National Observatory
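The NGS compute-node figures quoted earlier (more than 80k nodes, 8 cores of 16GFLOPS each, 16GB/node) can be checked against the stated system totals with a few lines of arithmetic. This is a hedged sketch: it treats "80k" as exactly 80,000 to obtain lower bounds, reads the matching node and CPU counts as one CPU per node, and assumes decimal units. The variable names are illustrative, not from the project.

```python
# Lower-bound cross-check of the NGS compute-node figures.
# Assumptions (not stated on the slides): "80k" is read as 80_000,
# one CPU per node, and decimal units
# (1 PFLOPS = 10**6 GFLOPS, 1 PB = 10**6 GB).

NODES_MIN = 80_000        # slide: number of nodes > 80k
CORES_PER_CPU = 8         # slide: 8 cores/CPU
GFLOPS_PER_CORE = 16      # slide: 16GFLOPS per core
MEM_GB_PER_NODE = 16      # slide: 16GB/node

# Per-CPU peak: 8 cores x 16 GFLOPS.
cpu_gflops = CORES_PER_CPU * GFLOPS_PER_CORE

# System-wide lower bounds derived from the node count.
cores_min = NODES_MIN * CORES_PER_CPU
peak_pflops_min = NODES_MIN * cpu_gflops / 1e6
memory_pb_min = NODES_MIN * MEM_GB_PER_NODE / 1e6

print(f"CPU peak: {cpu_gflops} GFLOPS")
print(f"Cores: > {cores_min}")
print(f"Peak: > {peak_pflops_min:.2f} PFLOPS")
print(f"Memory: > {memory_pb_min:.2f} PB")
```

Under these assumptions the bounds come out at 128GFLOPS per CPU, 640,000 cores, 10.24PFLOPS, and 1.28PB, which is consistent with the quoted "> 640k cores", "> 10PFLOPS", and "> 1PB".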