CCS machine development plan for post-petascale computing
and the Japanese next-generation supercomputer project
Mitsuhisa Sato
CCS, University of Tsukuba
2010.2.22
Computing resources in CCS

PACS-CS (2006~)
• #node: 2560 (Intel Xeon 2.8GHz, single core/node)
• peak performance: 14.34 TF
• memory: 5 TB
• network: 250MB/s/link x 3 (3D-HXB by GbE)

FIRST (2007~)
• A special-purpose system for astrophysics simulation by hybrid computation of radiation and N-body.
• Each node is equipped with GRAPE-6, an accelerator specialized for N-body gravity calculation.
• 256 nodes
• performance: cluster 3.5TFLOPS + GRAPE-6 35TFLOPS

T2K-tsukuba (2008~)
Designed by T2K Open Supercomputer Alliance (U. Tokyo and Kyoto U)
[Figure: node block diagram with cores, memory, and IO interfaces; Network (DDR Infiniband x 4)]
Spec:
• 648 nodes (quad Opteron, 4 sockets/node)
• 10000 cores
• Peak performance 95.4TF
• total memory 20TB
• total disk capacity 800TB
(20th in Top500, June 2008)
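The peak figures quoted for PACS-CS and T2K-tsukuba can be roughly cross-checked from the node counts. The sketch below is only a back-of-the-envelope check under assumptions that are not on the slide: 2 floating-point operations per cycle for the Xeon, and a 2.3 GHz clock with 4 operations per cycle for the Opteron. It also reads "quad Opteron, 4 sockets/node" as 4 quad-core sockets, that is 16 cores per node. The helper name `peak_tflops` is introduced here for illustration.

```python
def peak_tflops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak in TFLOPS: nodes x cores x GHz x flops/cycle / 1000."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle / 1000.0

# PACS-CS: 2560 single-core nodes at 2.8 GHz (from the slide);
# 2 flops/cycle is an assumption.
pacs_cs = peak_tflops(2560, 1, 2.8, 2)

# T2K-tsukuba: 648 nodes (from the slide), read as 4 sockets x 4 cores;
# the 2.3 GHz clock and 4 flops/cycle are assumptions.
t2k = peak_tflops(648, 16, 2.3, 4)

print(f"PACS-CS:     {pacs_cs:.2f} TF")
print(f"T2K-tsukuba: {t2k:.1f} TF")
```

Under these assumptions the results round to the 14.34 TF and 95.4 TF quoted above, and 648 x 16 = 10368 cores is consistent with the rounded "10000 cores".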
Full bi-sectional FAT-tree Network

[Figure: detail view of one network unit (x 20 network units), showing nodes and L1, L2, and L3 switches. Legend: number of nodes with 4 links; number of 24-port IB switches.]

Item             #
Node             696
Level 3 switch   144
Level 2 switch   240
Level 1 switch   232
Total switch     616

Note: the total of 696 nodes includes 4 online spare nodes.
System installation and future plans

[Figure: timeline of system installation, H17 (2005) through H24 (2012) and 2013. Systems and labels shown: CP-PACS; FCS-IV; PACS-CS; HA-PACS (planned); FCS-V (FCS: Front-end system); 2011-2013; VPP; suspended; T2K; FIRST; the next system to T2K; NGS (10PF).]
Issues for Post-peta scale systems (not exa?)

• System to enable strong scaling
  • The current petascale systems are enabled by weak scaling
  • We need more powerful nodes & network
• More specialized architecture
  • GPGPU is one solution
  • We need a sharp science target
  • Not all applications can use it
• More difficult to program
  • Needs support from the CS side
  • Collaboration between computer science and computational science
  → CCS's mission

[Figure: peak flops (1GFlops = 10^9, 1TFlops = 10^12, 1PFlops = 10^15, 1EFlops = 10^18) versus #node (1 to 10^6), marking PACS-CS (14TF), NGS (> 10PF), the target of HA-PACS, an Exaflops system, and the limitation of #node.]
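The contrast between strong and weak scaling on this slide can be illustrated with the standard Amdahl (fixed problem size) and Gustafson (problem size grows with node count) speedup formulas. This is only an illustrative sketch, not a model from the talk: the serial fraction of 0.1% is a hypothetical value chosen for the example.

```python
def amdahl_speedup(serial_fraction, n_nodes):
    """Strong scaling: fixed total problem size (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_nodes)

def gustafson_speedup(serial_fraction, n_nodes):
    """Weak scaling: work per node stays fixed (Gustafson's law)."""
    return serial_fraction + (1.0 - serial_fraction) * n_nodes

SERIAL_FRACTION = 0.001  # hypothetical value, not from the slides

# Sweep the same #node range as the figure: 1 to 10^6.
for exponent in range(7):
    n = 10 ** exponent
    strong = amdahl_speedup(SERIAL_FRACTION, n)
    weak = gustafson_speedup(SERIAL_FRACTION, n)
    print(f"{n:>8} nodes  strong: {strong:10.1f}  weak: {weak:12.1f}")
```

With any nonzero serial fraction the strong-scaling speedup saturates below 1/serial_fraction no matter how many nodes are added, which is one way to read the slide's call for more powerful nodes and networks rather than simply more nodes.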
HA-PACS: Highly Accelerated Parallel Advanced system for Computational Sciences (planned)

Objective: to investigate acceleration technologies for post-petascale computing together with their software, algorithms, and computational science applications, and to demonstrate them by building a prototype system
• Design and deploy a GPGPU-based cluster system
• Research on programming models, languages, and environments for parallel systems with accelerators
• Design of algorithms and applications for parallel systems with accelerators
• Research on architectures for parallel systems with accelerators
• Node configuration: 8-core CPU x 2 + GPU x 2
• Network configuration: Infiniband QDR x 2 / node, full-bisection B/W Fat-Tree
• Peak performance: 2TFLOPS/node x 324 = 648TFLOPS

[Figure: example configuration, a 2-stage Fat-Tree (Infiniband QDR) of IB switches. Each node has 8-core CPU x 2 and GPGPU x 2 with Infiniband QDR x 2 ports; groups are labelled "12 node" and "18 node"; 18 groups.]
Total #node = 18 x 18 = 324
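The node count and peak performance quoted for HA-PACS follow directly from the group structure. The sketch below reproduces that arithmetic from the slide's figures. The last quantity is an assumption that is not on the slide: it supposes one leaf switch per 18-node group and a single rail, in which case full bisection bandwidth needs as many uplinks as node-facing ports.

```python
# Figures taken from the slide.
NODES_PER_GROUP = 18
GROUPS = 18
PEAK_PER_NODE_TFLOPS = 2

total_nodes = NODES_PER_GROUP * GROUPS
total_peak_tflops = total_nodes * PEAK_PER_NODE_TFLOPS

# Assumption (not on the slide): one leaf switch per group, single rail.
# Full bisection then requires uplinks equal to the node-facing ports.
leaf_ports_needed = 2 * NODES_PER_GROUP

print(f"total nodes: {total_nodes}")
print(f"peak:        {total_peak_tflops} TFLOPS")
print(f"ports per leaf switch (assumed layout): {leaf_ports_needed}")
```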
HA-PACS/NG powered by PEARL Link

PEARL: PCI-Express Adaptive and Reliable Link
• Use PCI-Express as a high-speed link
• Connect CPUs and devices, including GPGPUs, through a router chip, PEACH (PCI-Express Adaptive Communication Hub)

[Figure: 12-node groups connected through Infiniband QDR IB switches and PEARL Links. Labels: CPU, PCIe, GPGPU, GPU, PEACH, "To neighbor node", "To external PCI-e switch", "Direct connection between GPUs".]
Strategic target computational sciences of HA-PACS
① Biophysics: high-performance QM/MM hybrid simulation for mechanisms of high-efficiency enzymatic reactions, and electronic and 3D structures of biomacromolecules
• Speedup of QM is key for these simulations
② Astrophysics: full hydrodynamics and radiative-transfer simulation for the Universe and the formation of astronomical objects
• Full 6-dimensional simulation is required
③ Particle physics: full-lattice QCD simulation
The Japanese next-generation
supercomputer project
Background: Japanese government plan

• The 3rd Science and Technology Basic Plan (FY2006-FY2010)
  “Next-generation supercomputing technology” is selected as one of the key technologies of national importance
  • Development and installation of the advanced high-performance supercomputer system (10 petaflops) → the Next-Generation Supercomputer
  • Development of application software
  • Establishment of “Advanced Computational Science and Technology Center” (tentative name)
• The 4th Science and Technology Basic Plan (FY2011-FY2015) (now under discussion)
  • Exaflops-class HPC technology
  • New chip device, software, hardware…
• After the election of the House of Representatives last summer…
  • In November of last year, the new governing party decided to freeze the development plan at the screening of government projects!
  • In January of this year, the cabinet decided to resume the supercomputer project.
The System Overview of NGS
【Massively Parallel/Distributed Memory Supercomputer】
• Ultra high-speed, highly reliable CPU
  • Advanced 45nm process technology
  • 8 cores/CPU, 128GFLOPS
  • Error recovery (ECC, instruction retry, etc.)
• High-performance, highly reliable network
  • Direct interconnection network by multi-dimensional mesh/torus network
  • Expandability and reliability
• System software
  • Linux OS
  • Fortran, C, and MPI libraries
  • Distributed parallel file system

[Figure: logical 3-dimensional torus network. Courtesy of FUJITSU]
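In a logical 3-dimensional torus, every node has six neighbours, one in each direction along x, y, and z, with wrap-around at the edges. The sketch below shows how such neighbour coordinates can be computed. It is a generic illustration of torus addressing, not Fujitsu's implementation; the function name `torus_neighbors` and the 4 x 4 x 4 shape are hypothetical.

```python
def torus_neighbors(coord, dims):
    """Return the six neighbours of `coord` on a 3-D torus of shape `dims`.

    The modulo makes each axis wrap around, which is what distinguishes
    a torus from a plain mesh.
    """
    x, y, z = coord
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),  # +x, -x
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),  # +y, -y
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),  # +z, -z
    ]

DIMS = (4, 4, 4)  # hypothetical shape, chosen only for the example

# A node on the boundary still has six distinct neighbours.
for neighbor in torus_neighbors((0, 0, 3), DIMS):
    print(neighbor)
```

Because of the wrap-around, a node at the edge of the logical grid has the same number of links as an interior node, so a program can treat all nodes uniformly.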
Configuration of Compute Nodes
• Number of nodes > 80k
• Number of CPUs > 80k
• Number of cores > 640k
• Peak performance > 10PFLOPS
• Total memory capacity > 1PB (16GB/node)
• Multi-dimensional mesh/torus network
  • Peak bandwidth: 5GB/s x 2 for each direction of the logical 3-dimensional torus network
  • Peak bi-sectional bandwidth: > 30TB/s

[Figure: one node. CPU: 128GFLOPS (8 cores), each core with SIMD(4FMA) at 16GFLOPS; L2$: 5MB; MEM: 16GB, 64GB/s; 5GB/s x 2 links along x, y, and z of the logical 3-dimensional torus network for programming.]
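The system-level lower bounds on this slide can be derived from the per-node figures. The sketch below reproduces that arithmetic using 80,000 nodes as the stated lower bound. One step is an interpretation rather than something the slide states: it reads "SIMD(4FMA)" as 4 fused multiply-adds per cycle at 2 floating-point operations each, which implies a clock frequency from the 16 GFLOPS per core.

```python
# Per-node figures from the slide.
CORES_PER_CPU = 8
GFLOPS_PER_CORE = 16
MEM_PER_NODE_GB = 16
NODES = 80_000  # lower bound: the slide says "> 80k"

# Interpretation (assumption): 4 FMAs per cycle, 2 flops per FMA.
FMA_PER_CYCLE = 4
FLOPS_PER_FMA = 2

cpu_gflops = CORES_PER_CPU * GFLOPS_PER_CORE
implied_clock_ghz = GFLOPS_PER_CORE / (FMA_PER_CYCLE * FLOPS_PER_FMA)

total_cores = NODES * CORES_PER_CPU
peak_pflops = NODES * cpu_gflops / 1e6           # GFLOPS -> PFLOPS
total_mem_pb = NODES * MEM_PER_NODE_GB / 1e6     # GB -> PB (decimal units)

print(f"CPU peak:      {cpu_gflops} GFLOPS")
print(f"implied clock: {implied_clock_ghz} GHz (under the 4FMA reading)")
print(f"cores:         {total_cores}")
print(f"system peak:   {peak_pflops} PFLOPS")
print(f"total memory:  {total_mem_pb} PB")
```

At exactly 80,000 nodes this gives 640,000 cores, 10.24 PFLOPS, and 1.28 PB, consistent with the "> 640k", "> 10PFLOPS", and "> 1PB" bounds above.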
The Next-Generation Supercomputer Project
○ Schedule (FY2006 to FY2012)
[Figure: Gantt chart; the phases of each row are listed in order]
• System: Conceptual design; Detailed design; Prototype and evaluation; Production, installation, and adjustment; Tuning and improvement; open to users
• Applications
  • Next-Generation Integrated Nanoscience Simulation: Development, production, and evaluation; Verification
  • Next-Generation Integrated Life Simulation: Development, production, and evaluation; Verification
• Buildings
  • Computer building: Design; Construction
  • Research building: Design; Construction
The categories of users of NGS
1. Strategic Use:
MEXT selected 5 strategic fields from a national viewpoint.
• Field 1: Life science/Drug manufacture
• Field 2: New material/energy creation
• Field 3: Global change prediction for disaster prevention/mitigation
• Field 4: Mono-zukuri (Manufacturing technology)
• Field 5: The origin of matter and the universe
2. General Use:
Use for the needs of researchers in many science and technology fields, including industrial and educational use
Organization for NGS
• “Advanced Computational Science and Technology Center” (ACSTC) (tentative name) will be organized at NGS.
• MEXT selects 5 core organizations that lead research activities in the 5 strategic fields
• ACSTC → Core research center
  • Conducts advanced and basic R&D in computational science
  • Leads cooperation among strategic fields
  • Provides key knowledge to the 5 organizations in the strategic fields and to other research organizations
• 5 core organizations → Research center in each field
  • Conducts advanced R&D in each field
• CCS was selected as a core organization for "Field 5: The origin of matter and the universe"
  • Particle physics, astrophysics, nuclear physics
  • Collaboration with KEK and National Observatory