Mellanox Connectivity Solutions for Scalable HPC
Highest Performing, Most Efficient End-to-End Connectivity for Servers and Storage
September 2010
Brandon Hathaway
Connectivity Solutions for Efficient Computing
Enterprise HPC
High-end HPC
HPC Clouds
Mellanox Interconnect Networking Solutions
ICs
Adapter Cards
Host/Fabric Software
Switches/Gateways
Cables
Leading Connectivity Solution Provider For Servers and Storage
Complete End-to-End Connectivity
Mellanox Host/Fabric Software Management
- Full Software Stack on all OSs and Platforms
- In-Box: Linux Kernel and Windows
Application Accelerations
- CORE-Direct
- RDMA
- GPUDirect
Networking Efficiency/Scalability
- QoS
- Adaptive Routing
- Congestion Management
Server and Storage High-Speed Connectivity
ICs
Adapters
Switches/Gateway
Cables
Mellanox InfiniBand in the TOP500
[Chart: Top500 InfiniBand Trends, number of InfiniBand clusters: 122 (Jun 08), 152 (Jun 09), 208 (Jun 10)]
Mellanox InfiniBand builds the most powerful clusters
• 5 of the Top10 (#2, #3, #6, #7, #10) and 64 of the Top100
InfiniBand represents 42% of the Top500
• All InfiniBand clusters use Mellanox solutions

Mellanox InfiniBand enables the highest utilization on the Top500
• Up to 96% system utilization
• 50% higher than the best Ethernet-based system
Highest Performance

Highest throughput
• 40Gb/s node to node
• Nearly 90M MPI messages per second
• Send/receive and RDMA operations with zero-copy (see the sketch below)

Lowest latency
• 1-1.3usec MPI end-to-end
• 0.9-1usec InfiniBand latency for RDMA operations
• 100ns switch latency at 100% load
• Lowest latency 648-port switch – 25% to 45% faster vs. other solutions

Lowest CPU overhead
• Full transport offload maximizes CPU availability for user applications
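A minimal verbs-level sketch of what zero-copy RDMA means in practice (illustrative only, not from the original slides): the application registers its own buffer and posts an RDMA write, and the HCA moves the payload directly from that buffer, with no intermediate copy and no CPU involvement in the data path. Queue pair setup and the exchange of the remote address and rkey are assumed to have happened already; post_rdma_write is a hypothetical helper name.

#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: post one zero-copy RDMA write. The HCA reads app_buf
 * directly via DMA; the CPU never touches the payload. QP setup and the
 * exchange of remote_addr/rkey are assumed to have been done elsewhere.
 * (Deregistration with ibv_dereg_mr is omitted for brevity.) */
int post_rdma_write(struct ibv_pd *pd, struct ibv_qp *qp,
                    void *app_buf, size_t len,
                    uint64_t remote_addr, uint32_t rkey)
{
    /* Pin and register the application buffer so the HCA can DMA from it. */
    struct ibv_mr *mr = ibv_reg_mr(pd, app_buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)app_buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };

    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;   /* one-sided, zero-copy */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;   /* completion reported on the CQ */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    return ibv_post_send(qp, &wr, &bad_wr);
}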
Advanced HPC Capabilities

CORE-Direct (Collectives Offload Resource Engine)
• Collective communications are used for system synchronization, data broadcast, or data gathering
• Collectives affect every node
• CORE-Direct offloads these collectives from the CPU to the HCA
• Eliminates system noise and jitter issues
• Increases the CPU cycles available for applications
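As an illustration of why collective offload matters (a hedged sketch, not Mellanox code): a collective such as an allreduce involves every rank, so a rank delayed by OS noise delays all of them, and a CPU busy driving the collective cannot compute. When the HCA progresses the operation, the CPU keeps working; the nonblocking MPI_Iallreduce used below (standardized later in MPI-3) is one way applications express that overlap.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    double local = 1.0, global = 0.0;
    MPI_Request req;

    /* Start the collective; with hardware offload (e.g. a CORE-Direct-capable
     * stack) the network progresses it without further CPU involvement. */
    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* ... application computation overlaps with the in-flight collective ... */

    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* complete before using 'global' */
    printf("sum = %f\n", global);

    MPI_Finalize();
    return 0;
}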
GPUDirect
• Enables the fastest GPU-to-GPU communications
• Eliminates CPU overhead and memory copies in system memory
• Reduces GPU-to-GPU communication time by 30%
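For context, the hedged sketch below (an assumption about the data path, not the slides' code) shows the staging copy through system memory that GPUDirect targets: without it, GPU data is first copied into a pinned host buffer before the InfiniBand stack can send it. GPUDirect lets the CUDA and InfiniBand drivers share that pinned memory (and later generations let the HCA access GPU memory directly), removing redundant copies and the CPU work they cost. send_gpu_buffer is a hypothetical helper.

#include <cuda_runtime.h>
#include <mpi.h>

/* Baseline path that GPUDirect shortens: stage device data in pinned host
 * memory, then hand it to MPI over InfiniBand. */
void send_gpu_buffer(const float *d_buf, size_t n, int dst)
{
    float *h_staging = NULL;

    /* Pinned (page-locked) host buffer; with GPUDirect the same pinned region
     * is shared by the CUDA and InfiniBand drivers instead of being copied
     * between separately pinned buffers. */
    cudaHostAlloc((void **)&h_staging, n * sizeof(float), cudaHostAllocDefault);

    /* Device-to-host staging copy that GPUDirect-style paths reduce or remove. */
    cudaMemcpy(h_staging, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);

    MPI_Send(h_staging, (int)n, MPI_FLOAT, dst, 0, MPI_COMM_WORLD);

    cudaFreeHost(h_staging);
}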
Congestion control
• Eliminates network congestion (hot spots) caused by many senders sending to a single receiver

Adaptive routing
• Eliminates network congestion caused by point-to-point communications sharing the same network path
InfiniBand Link Speed Roadmap
Per Lane & Rounded Per Link Bandwidth (Gb/s)
# of
Lanes per
direction
5G-IB
DDR
10G-IB
QDR
14G-IB-FDR
(14.025)
26G-IB-EDR
(25.78125)
12
60+60
120+120
168+168
300+300
8
40+40
80+80
112+112
200+200
4
20+20
40+40
56+56
100+100
1
5+5
10+10
14+14
25+25
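Reading the table: each per-link figure is lanes × per-lane rate, rounded. For example, a 4x FDR link runs 4 lanes at roughly 14 Gb/s each, about 56 Gb/s per direction (56+56 above), and a 4x EDR link runs 4 lanes at roughly 25 Gb/s, about 100 Gb/s per direction (100+100).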
[Roadmap chart: bandwidth per direction (Gb/s) versus year (2005-2014) against market demand, plotting x1, x4, x8, and x12 link widths for the DDR, QDR, FDR, and EDR generations (up to 300G-IB-EDR), with HDR and NDR generations following]
Thank You