Published on March 7, 2014
Interconnect Your Future With Mellanox High-Performance Computing March 2014
Mellanox Performance Advantage (Source: TopCrunch), 2014 Results
Higher performance with half the system size: the LS-DYNA Car2Car benchmark (runtime in seconds) compares Cray XC30/Aries at 2,000 and 4,000 cores against FDR InfiniBand (SGI) at 2,000 cores. LS-DYNA is an advanced multiphysics simulation (CAE) package developed by LSTC, used in the automotive, aerospace, military, manufacturing, and bioengineering industries. All platforms use the same Intel® Xeon® E5-2690 v2 @ 3.00GHz CPUs; the Cray platform is connected with the Cray Aries interconnect, the SGI platform with Mellanox FDR InfiniBand. InfiniBand delivers the highest system performance, efficiency, and scalability.
© 2014 Mellanox Technologies
Mellanox Performance Advantage (Source: HPC Advisory Council)
More than 2X performance: HOOMD-blue is a highly optimized, object-oriented many-particle dynamics application that performs general-purpose particle dynamics simulations, developed by the University of Michigan. InfiniBand delivers the highest system performance, efficiency, and scalability.
InfiniBand Leadership in TOP500 Petascale-Capable Systems
Mellanox InfiniBand is the interconnect of choice for Petascale computing, accelerating 48% of the sustained Petaflop systems (19 systems out of 40).
Mellanox InfiniBand Connected Petascale Systems
Connecting half of the world's Petascale systems; Mellanox connected Petascale system examples.
InfiniBand's Unsurpassed System Efficiency
Average efficiency across TOP500 systems, listed by interconnect:
• InfiniBand: 86%
• Cray: 80%
• 10GbE: 65%
• GigE: 44%
InfiniBand is the key element behind the highest system efficiency; Mellanox delivers efficiencies of more than 97% with InfiniBand.
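The efficiency figures above follow the standard TOP500 definition: achieved Linpack performance (Rmax) divided by theoretical peak (Rpeak). A minimal sketch of the calculation — the sample values below are illustrative, not actual TOP500 entries:

```python
def efficiency(rmax_tflops, rpeak_tflops):
    """TOP500-style system efficiency: achieved Rmax / theoretical Rpeak."""
    return rmax_tflops / rpeak_tflops

# Illustrative values only (not real TOP500 entries):
print(round(100 * efficiency(970.0, 1000.0), 1))  # 97.0 -> a >97%-efficient system
print(round(100 * efficiency(860.0, 1000.0), 1))  # 86.0 -> the InfiniBand average above
```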
Mellanox in the TOP500 Supercomputing List (Nov'13)
Mellanox FDR InfiniBand is the fastest interconnect solution on the TOP500:
• More than 12GB/s throughput, less than 0.7µs latency
• Used in 80 systems on the TOP500 list, a 1.8X increase from the Nov'12 list
• Connects the fastest InfiniBand-based supercomputers: TACC (#7), LRZ (#10)
• Enables the two most efficient systems in the TOP200
Mellanox InfiniBand is the fastest interconnect technology on the list:
• Enables the highest system utilization on the TOP500, with more than 97% system efficiency
• Enables the seven most highly utilized systems on the TOP500 list
Mellanox InfiniBand is the only Petascale-proven, standard interconnect solution:
• Connects 19 of the 40 Petaflop-capable systems on the list
• Connects 4X the number of Cray-based systems in the TOP100, 6.5X in the TOP500
Mellanox's end-to-end scalable solutions accelerate GPU-based systems:
• GPUDirect RDMA technology enables faster communications and higher performance
System Example: NASA Ames Research Center Pleiades
20K InfiniBand nodes with Mellanox end-to-end FDR and QDR InfiniBand. Supports a variety of scientific and engineering projects:
• Coupled atmosphere-ocean models
• Future space vehicle design
• Large-scale dark matter halos and galaxy evolution
• Asian monsoon water cycle high-resolution climate simulations
Leading Supplier of End-to-End Interconnect Solutions
Comprehensive end-to-end software accelerators and management: MXM (Mellanox Messaging Acceleration), FCA (Fabric Collectives Acceleration), and UFM (Unified Fabric Management). Storage and data: VSA (Storage Accelerator, iSCSI) and UDA (Unstructured Data Accelerator). Comprehensive end-to-end InfiniBand and Ethernet portfolio: ICs, adapter cards, switches/gateways, host/fabric software, metro/WAN, and cables/modules.
Converged Interconnect Solutions Deliver the Highest ROI for All Applications
Accelerating half of the world's Petascale systems (Mellanox connected Petascale system examples); InfiniBand enables the lowest application cost in the cloud; business success depends on Mellanox; dominant in storage interconnects.
Mellanox Solutions
Virtual Protocol Interconnect (VPI) Technology
VPI adapters and switches run the standard protocols of InfiniBand (10/20/40/56 Gb/s) and Ethernet (10/40/56 Gb/s) on the same wire, under a unified fabric manager and switch OS layer serving applications across storage, networking, clustering, and management, with acceleration engines. Switch configurations: 64 ports 10GbE; 36 ports 40/56GbE; 48 ports 10GbE plus 12 ports 40/56GbE; or 36 ports InfiniBand up to 56Gb/s, with up to 8 VPI subnets. Available as LOM, adapter card, or mezzanine card — from the data center to campus and metro connectivity.
Mellanox ScalableHPC: Communication Libraries to Accelerate Applications
Sitting under MPI, OpenSHMEM/PGAS, and Berkeley UPC:
• MXM: reliable messaging, hybrid transport mechanism, efficient memory registration, receive-side tag matching
• FCA: topology-aware collective optimization, hardware multicast, separate virtual fabric for collectives, CORE-Direct hardware offload, reduced collective latency
[Charts: barrier and reduce collective latency (µs) versus number of processes (PPN=8), with and without FCA — with FCA, latency stays nearly flat as the process count grows.]
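The latency gap the FCA charts show comes largely from the number of sequential software steps a host-based collective needs. As a rough illustrative model (my own simplification, not Mellanox data): a binomial-tree software barrier costs about ceil(log2(P)) point-to-point rounds up the tree and the same back down, so its latency grows with the process count, whereas a hardware-offloaded collective keeps the host off that critical path:

```python
import math

def tree_barrier_latency_us(num_procs, per_hop_us=1.5):
    """Rough latency of a binomial-tree software barrier:
    ceil(log2(P)) message rounds up the tree, then the same
    number of rounds back down. per_hop_us is an assumed
    point-to-point latency, purely illustrative."""
    rounds = math.ceil(math.log2(num_procs))
    return 2 * rounds * per_hop_us

for procs in (64, 512, 2048):
    print(procs, tree_barrier_latency_us(procs))  # 18.0, 27.0, 33.0 us
```

The model only illustrates the logarithmic growth of the software path; it does not capture system noise or congestion, which widen the gap further at scale.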
Mellanox Connect-IB: The World's Fastest Adapter
• The 7th generation of Mellanox interconnect adapters
• World's first 100Gb/s interconnect adapter (dual-port FDR 56Gb/s InfiniBand)
• Delivers 137 million messages per second, 4X higher than the competition
• World-leading scalable transport, with no dependency on system size
CORE-Direct Technology: Smart Offloads for MPI/SHMEM/PGAS/UPC Collective Operations
A US Department of Energy (DOE) funded project with ORNL and Mellanox. Adapter-based hardware offloading of collective operations, including floating-point capability on the adapter for data reductions. The CORE-Direct API is exposed through the Mellanox drivers. [Chart compares the ideal case, the system-noise case, CORE-Direct offload, and CORE-Direct asynchronous operation.]
GPUDirect RDMA for Highest GPU Performance
Without GPUDirect RDMA, GPU-to-GPU communication is staged through system memory, traversing the CPU and chipset on each side; with GPUDirect RDMA, the InfiniBand adapter reads and writes GPU memory directly. Result: 67% lower latency and a 5X increase in throughput (source: Prof. DK Panda).
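The benefit can be seen in a back-of-the-envelope cost model: the staged path serializes a GPU-to-host copy with the wire transfer and pays extra CPU overhead, while the direct path makes a single traversal. All numbers below are illustrative assumptions, not measured Mellanox figures:

```python
def staged_transfer_us(nbytes, gpu_host_gbps=50.0, wire_gbps=56.0, overhead_us=10.0):
    """Without GPUDirect RDMA: GPU -> host bounce buffer, then host -> wire,
    serialized, plus CPU/driver overhead. All parameters are assumed values."""
    copy_us = nbytes * 8 / (gpu_host_gbps * 1e3)
    send_us = nbytes * 8 / (wire_gbps * 1e3)
    return overhead_us + copy_us + send_us

def gpudirect_transfer_us(nbytes, wire_gbps=56.0, overhead_us=3.0):
    """With GPUDirect RDMA: the NIC DMAs directly from GPU memory,
    one traversal and lower per-message overhead."""
    return overhead_us + nbytes * 8 / (wire_gbps * 1e3)

mb = 1_000_000
print(round(staged_transfer_us(mb), 1), round(gpudirect_transfer_us(mb), 1))
```

Even this crude model shows the staged path costing roughly twice the direct one for large messages, and the fixed-overhead gap dominating for small ones.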
Remote GPU Access through rCUDA: GPU as a Service
On the client side, the application links against the rCUDA library in place of the local CUDA driver and runtime; its requests travel over the network interface to an rCUDA daemon on a GPU server, which executes them through that server's CUDA driver and runtime on the physical GPUs, exposing them to clients as virtual GPUs (vGPUs). rCUDA provides remote access from every node to any GPU in the system.
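In effect, this architecture turns each GPU call into a remote procedure call: a client-side library serializes the request, a server-side daemon executes it on a local device and returns the result. A toy sketch of that idea over a local socket — the protocol, the `vector_add` operation, and all names here are hypothetical illustrations, not the real rCUDA wire format:

```python
import json
import socket
import threading

def daemon(server_sock):
    """Toy 'rCUDA daemon': receives one serialized request and executes it
    locally (the list arithmetic stands in for a CUDA kernel launch)."""
    conn, _addr = server_sock.accept()
    request = json.loads(conn.recv(4096).decode())
    if request["op"] == "vector_add":
        result = [a + b for a, b in zip(request["x"], request["y"])]
    else:
        result = None
    conn.sendall(json.dumps(result).encode())
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
threading.Thread(target=daemon, args=(server,), daemon=True).start()

# Toy 'rCUDA client library': forwards the call instead of running it locally.
client = socket.socket()
client.connect(server.getsockname())
client.sendall(json.dumps({"op": "vector_add",
                           "x": [1, 2, 3], "y": [10, 20, 30]}).encode())
response = json.loads(client.recv(4096).decode())
client.close()
print(response)  # [11, 22, 33]
```

The real system naturally does far more (CUDA API interception, GPU memory management, InfiniBand transport), but the client/daemon split is the same.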
Campus and Metro RDMA Long-Reach Solutions
Examples: 4 MetroX TX6100 systems over 6 km; 2 MetroX TX6100 systems over 8 km; Connect-IB over 2–4 km, replacing Obsidian SDR.
"A common problem is the time cost of moving data between datacenters, which can slow computations and delay results. Mellanox's MetroX lets us unify systems across campus, and maintain the high-speed access our researchers need, regardless of the physical location of their work." — Mike Shuey, Purdue University
Variety of Clustering Topologies
• CLOS (fat tree): typically the best performance and lowest latency; a non-blocking network alleviates the bandwidth bottleneck closer to the root; the most common topology in supercomputers
• Mesh / 3D torus: a blocking network, good for applications with locality; supports dedicated sub-networks; simple expansion for future growth; storage connections are not limited to the cube edges
• Hypercube: supported by SGI
• Dragonfly+: connects "groups" together in a full graph, with flexible definition of the intra-group interconnect
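For the fat-tree case, the size of a non-blocking network follows directly from the switch radix: built from k-port switches, a two-level fat tree supports k²/2 hosts and a three-level one k³/4. This is the standard textbook formula, not a vendor-specific figure; with the 36-port InfiniBand switches mentioned earlier, the numbers work out as:

```python
def fat_tree_hosts(radix, levels=3):
    """Maximum hosts in a non-blocking fat tree of radix-port switches.
    Standard formulas: 2 levels -> k^2/2 hosts, 3 levels -> k^3/4 hosts."""
    if levels == 2:
        return radix * radix // 2
    if levels == 3:
        return radix ** 3 // 4
    raise ValueError("sketch handles 2 or 3 levels only")

print(fat_tree_hosts(36, levels=2))  # 648 hosts
print(fat_tree_hosts(36, levels=3))  # 11664 hosts
```

Going from two levels to three multiplies the host count by k/2 at the cost of one extra switch hop on the longest path, which is the trade-off behind the "lowest latency" claim for small fat trees.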
The Mellanox Advantage
• Connect-IB delivers superior performance: 100Gb/s, 0.7µs latency, 137 million messages/sec
• The ScalableHPC software library provides leading performance for MPI, OpenSHMEM/PGAS, and UPC
• Superior application offloads: RDMA, collectives, scalable transport (Dynamically Connected)
• Flexible topologies: fat tree, mesh, 3D torus, Dragonfly+
• Standards-based solution, open source support, large ecosystem, one solution for all applications
• Converged I/O: compute, storage, and management on a single fabric
• Long-term roadmap
Technology Roadmap: One-Generation Lead over the Competition
[Roadmap chart, 2000–2020: interconnect speeds grow from 20Gb/s (2005) through 40Gb/s and 56Gb/s (2010) to 100Gb/s (2015) and 200Gb/s (2020), spanning Terascale, Petascale, Mega Supercomputers, and Exascale. Mellanox-connected milestones include the Virginia Tech (Apple) system, 3rd on the first TOP500 shown (2003), and the Petascale "Roadrunner" system.]
The Only Provider of End-to-End 40/56Gb/s Solutions
Comprehensive end-to-end InfiniBand and Ethernet portfolio: ICs, adapter cards, switches/gateways, host/fabric software, metro/WAN, and cables/modules. From the data center to metro and WAN; x86, ARM, and Power based compute and storage platforms. The interconnect provider for 10Gb/s and beyond.
For more information: HPC@mellanox.com