Introduction to High Performance Cluster Computing
Courseware Module H.1.a
August 2008

What is HPC
• HPC = High Performance Computing
  • Includes Supercomputing
• HPCC = High Performance Cluster Computing
  • Note: these are NOT High Availability clusters
• HPTC = High Performance Technical Computing
The ultimate aim of HPC users is to max out the CPUs!

Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

Agenda
• Parallel Computing Concepts
• Clusters
• Cluster Usage

Concurrency and Parallel Computing
A central concept in computer science is concurrency:
• Concurrency: Computing in which multiple tasks are active at the same time.
There are many ways to use Concurrency:
• Concurrency is key to all modern Operating Systems as a way to hide latencies.
• Concurrency can be used together with redundancy to provide high availability.
• Parallel Computing uses concurrency to decrease program runtimes.
HPC systems are based on Parallel Computing

Hardware for Parallel Computing
Parallel computers are classified in terms of streams of data and streams of instructions:
• MIMD Computers: Multiple streams of instructions acting on multiple streams of data.
• SIMD Computers: A single stream of instructions acting on multiple streams of data.
Parallel Hardware comes in many forms:
• On chip: Instruction level parallelism (e.g. IPF)
• Multi-core: Multiple execution cores inside a single CPU
• Multiprocessor: Multiple processors inside a single computer.
• Multi-computer: networks of computers working together.

Hardware for Parallel Computing
Parallel Computers
• Single Instruction Multiple Data (SIMD)*
• Multiple Instruction Multiple Data (MIMD)
  • Shared Address Space
    • Symmetric Multiprocessor (SMP)
    • Non-uniform Memory Architecture (NUMA)
  • Disjoint Address Space
    • Massively Parallel Processor (MPP)
    • Cluster
    • Distributed Computing

HPC Platform Generations
• In the 1980’s, it was a vector SMP. Custom components throughout.
• In the 1990’s, it was a massively parallel computer. Commodity Off The Shelf (COTS) CPUs, everything else custom.
• … but today, it is a cluster. COTS components everywhere.
What is an HPC Cluster
A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource.
A typical cluster uses:
• Commodity off the shelf parts
• Low latency communication protocols

What is HPCC?
[Diagram of cluster components: Master Node, LAN/WAN, File Server / Gateway, Interconnect, Compute Nodes, Cluster Management Tools]

A Sample Cluster Design
[Diagram of a sample cluster. Labelled components: PowerConnect 2016 Cluster Switch, Rack switches, Compute Nodes, Master Node, Storage Node, Connection to storage, Disk Store (EMC), Control Node, Rack-mount LCD Panel/keyboard. Labelled networks:]
• External Network: Gigabit Ethernet (Fibre)
• Data Network: Gigabit Ethernet (copper)
• Control and Out-of-Band Network: 100BaseT Copper
Cluster Architecture View
[Layered diagram, top to bottom:]
Application:    Parallel Benchmarks (Perf, Ring, HINT, NAS, …); Real Applications
Middleware:     shmem, MPI, PVM
OS:             Linux, Other OSes
Protocol:       TCP/IP, VIA, Proprietary
Interconnect:   Ethernet, Quadrics, Infiniband, Myrinet
Hardware:       desktop, Workstation, Server 1P/2P, Server 4U +

Cluster Hardware
The Node
• A single element within the cluster
• Compute Node
  • Just computes, little else
  • Private IP address, no user access
• Master/Head/Front End Node
  • User login
  • Job scheduler
  • Public IP address, connects to external network
• Management/Administrator Node
  • Systems/cluster management functions
  • Secure administrator address
• I/O Node
  • Access to data
  • Generally internal to cluster or to data centre

Interconnect

Interconnect          Typical Latency (usec)   Typical Bandwidth (MB/s)
100 Mbps Ethernet     75                       8
1 Gbit/s Ethernet     60-90                    90
10 Gb/s Ethernet      12-20                    800
SCI*                  1.5-4                    200-600
Myricom Myrinet*      2.2-3                    250-1200
InfiniBand*           2-4                      900-1400
Quadrics QsNet*       3-5                      600-900

Agenda
• Parallel Computing Concepts
• Clusters
• Cluster Usage
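The latency and bandwidth columns of the Interconnect table can be combined into a rough estimate of how long one message takes to send. The sketch below assumes the common first-order model time = latency + size / bandwidth, which the slides do not state; the function name is illustrative, the numbers are midpoints of the table's ranges, and the model ignores contention and protocol overhead.

```python
# First-order message cost model (an assumption, not taken from the slides):
#   time = latency + message_size / bandwidth

def transfer_time_us(size_bytes, latency_us, bandwidth_mb_s):
    """Estimated time in microseconds to send one message."""
    # 1 MB/s equals 1 byte per microsecond (taking 1 MB = 10**6 bytes).
    return latency_us + size_bytes / bandwidth_mb_s

# (latency in usec, bandwidth in MB/s): midpoints of the table's ranges.
interconnects = {
    "1 Gbit/s Ethernet": (75.0, 90.0),
    "InfiniBand": (3.0, 1150.0),
}

for size in (64, 1_000_000):
    for name, (lat, bw) in interconnects.items():
        t = transfer_time_us(size, lat, bw)
        print(f"{size:>9} bytes over {name}: {t:.1f} usec")
```

Under this model, small messages are dominated by the latency term and large messages by the bandwidth term, which is why both columns matter when choosing an interconnect.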
Cluster Usage
• Performance Measurements
• Usage Model
• Application Classification
• Application Behaviour

The Mysterious FLOPS
1 GFlops = 1 billion floating point operations per second
Theoretical v Real GFlops
Xeon Processor
• 1 Core theoretical peak = 4 x Clock speed (double precision)
• Xeons have 128 bit SSE registers, which allow the processor to carry out 2 double precision floating point add and 2 multiply operations per clock cycle
• 2 computational cores per processor
• 2 processors per node (4 cores per node)
Sustained (Rmax) = ~35-80% of theoretical peak (interconnect dependent)
You’ll NEVER hit peak!

Other Measures of CPU Performance
SPEC (www.spec.org)
• Spec CPU2000/2006 Speed: single core performance indicator
• Spec CPU2000/2006 Rate: node performance indicator
• SpecFP: Floating Point performance
• SpecINT: Integer performance
Many other performance metrics may be required
• STREAM: memory bandwidth
• HPL: High Performance Linpack
• NPB: NASA suite of performance tests
• Pallas Parallel Benchmark: another suite
• IOZone: file system throughput
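The peak and sustained figures from "The Mysterious FLOPS" slide can be worked through numerically. The sketch below is illustrative only: the function names are invented here, the 3.0 GHz clock is an example value, and the 35-80% range is applied exactly as quoted on the slide.

```python
def peak_gflops_per_core(clock_ghz, flops_per_cycle=4):
    """Theoretical double-precision peak of one core (4 x clock speed)."""
    return flops_per_cycle * clock_ghz

def peak_gflops_per_node(clock_ghz, cores_per_cpu=2, cpus_per_node=2):
    """Theoretical peak of one node: 2 cores per processor, 2 processors per node."""
    return peak_gflops_per_core(clock_ghz) * cores_per_cpu * cpus_per_node

clock_ghz = 3.0  # example clock speed, chosen for illustration
node_peak = peak_gflops_per_node(clock_ghz)

# Sustained performance (Rmax) is quoted as roughly 35-80% of peak.
rmax_low, rmax_high = 0.35 * node_peak, 0.80 * node_peak

print(f"Peak per core: {peak_gflops_per_core(clock_ghz):.1f} GFLOPS")
print(f"Peak per node: {node_peak:.1f} GFLOPS")
print(f"Expected Rmax per node: {rmax_low:.1f} to {rmax_high:.1f} GFLOPS")
```

For a 3.0 GHz part this gives 12 GFLOPS per core and 48 GFLOPS per node, so a sustained result of roughly 17 to 38 GFLOPS per node would be in line with the slide's range.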
Technology Advancements in 5 Years

Codename    Release date     GHz   Number of cores   Peak FLOP per CPU cycle   Peak GFLOPS per CPU   Linpack on 256 Processors
Foster      September 2001   1.7   1                 2                         3.4                   288.9*
Woodcrest   June 2006        3.0   2                 4                         24                    4781**

Example:
* From November 2001 top500 supercomputer list (cluster of Dell Precision 530)
** Intel internal cluster built in 2006

Usage Model
[Diagram: a spectrum of cluster usage, from many serial jobs to one big parallel job]
• Many Serial Jobs (Capacity): Electronic Design, Monte Carlo, Design Optimisation, Parallel Search
  • Batch Usage
  • Load Balancing More Important
• Normal Mixed Usage: Many Users, Mixed size Parallel/Serial jobs, Ability to Partition and Allocate Jobs to Nodes for Best Performance
  • Job Scheduling very important
• One Big Parallel Job (Capability): Meteorology, Seismic Analysis, Fluid Dynamics, Molecular Chemistry
  • Appliance Usage
  • Interconnect More Important

Application and Usage Model
HPC clusters run parallel applications, and applications in parallel!
One single application that takes advantage of multiple computing platforms
• Fine-Grained Application
  • Uses many systems to run one application
  • Shares data heavily across systems
  • PDVR3D (Eigenvalues and Eigenstates of a matrix)
• Coarse-Grained Application
  • Uses many systems to run one application
  • Infrequent data sharing among systems
  • Casino (Monte-Carlo stochastic methods)
• Embarrassingly Parallel Application
  • An instance of the entire application runs on each node
  • Little or no data sharing among compute nodes
  • BLAST (pattern matching)
A shared memory machine will run all sorts of applications

Types of Applications
• Forward Modelling
• Inversion
• Signal Processing
• Searching/Comparing

Forward Modelling
• Solving linear equations
• Grid Based
• Parallelization by domain decomposition (split and distribute the data)
• Finite element/finite difference

Inversion
From measurements (F) compute models (M) representing properties (d) of the measured object(s).
Deterministic
• Matrix inversions
• Conjugate gradient
Stochastic
• Monte Carlo, Markov chain
• Genetic algorithms
Generally large amounts of shared memory
Parallelism through multiple runs with different models

Signal Processing/Quantum Mechanics
• Convolution model (stencil)
• Matrix computations (eigenvalues…)
• Conjugate gradient methods
• Normally not very demanding on latency and bandwidth
• Some algorithms are embarrassingly parallel
• Examples: seismic migration/processing, medical imaging, SETI@Home

Signal Processing Example
[Figure not recoverable]

Searching/Comparing
• Integer operations are more dominant than floating point
• IO intensive
• Pattern matching
• Embarrassingly parallel, very suitable for grid computing
• Examples: encryption/decryption, message interception, bioinformatics, data mining
• Examples: BLAST, HMMER
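The embarrassingly parallel pattern-matching workload described above can be sketched as follows. This is a toy illustration, not how BLAST or HMMER work: the sequences, the pattern, and the helper names are invented, and a thread pool stands in for the compute nodes that a cluster's job scheduler would normally provide. The point is that every chunk is searched independently, and results are combined only at the end.

```python
from concurrent.futures import ThreadPoolExecutor

def count_matches(chunk, pattern):
    """Count sequences in one chunk that contain the pattern (no shared state)."""
    return sum(1 for seq in chunk if pattern in seq)

def split(items, n_chunks):
    """Split a list into at most n_chunks contiguous pieces of similar size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]

def parallel_count(sequences, pattern, n_workers=4):
    """Search every chunk independently, then sum the per-chunk counts."""
    chunks = split(sequences, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker only sees its own chunk; workers never communicate.
        counts = pool.map(count_matches, chunks, [pattern] * len(chunks))
        return sum(counts)

# Hypothetical toy data standing in for a sequence database.
sequences = ["GATTACA", "CCGTTA", "TTACGG", "AAAA", "GGTTAC", "CATG"]
print(parallel_count(sequences, "TTA"))
```

Because the workers exchange no data while searching, the interconnect matters little for this class of workload, which matches the slide's remark that it suits grid computing.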
Application Classes
Applications
• FEA: Finite Element Analysis
  • The simulation of hard physical materials, e.g. metal, plastic
  • Crash test, product design, suitability for purpose
  • Examples: MSC Nastran, Ansys, LS-Dyna, Abaqus, ESI PAMCrash, Radioss
• CFD: Computational Fluid Dynamics
  • The simulation of soft physical materials, gases and fluids
  • Engine design, airflow, oil reservoir modelling
  • Examples: Fluent, Star-CD, CFX
• Geophysical Sciences
  • Seismic Imaging: taking echo traces and building a picture of the sub-earth geology
  • Reservoir Simulation: CFD specific to oil asset management
  • Examples: Omega, Landmark VIP and Pro/Max, Geoquest Eclipse

Application Classes
Applications
• Life Sciences
  • Understanding the living world: genome matching, protein folding, drug design, bio-informatics, organic chemistry
  • Examples: BLAST, Gaussian, other
• High Energy Physics
  • Understanding the atomic and sub-atomic world
  • Software from Fermi-Lab or CERN, or home-grown
• Financial Modelling
  • Meeting internal and external financial targets, particularly regarding investment positions
  • VaR (Value at Risk): assessing the impact of economic and political factors on the bank’s investment portfolio
  • Trader Risk Analysis: what is the risk on a trader’s position, or on that of a group of traders