GEC23 Experimenter Developer Roundtable
Experiences with Mid-Scale GENI Infrastructure
Paul Ruth, Anirban Mandal (Presenter)

What is mid-scale infrastructure?
• Bigger GENI racks (1000+ cores)
• NSFCloud: CloudLab
• NSFCloud: Chameleon

Why mid-scale infrastructure?
• GENI technologies are ready to be tried at scale
• Real benefits to real domain science

NSFCloud
• Not a cloud! It is a testbed for developing clouds
• CloudLab
  – Sites: Utah, Clemson, Wisconsin (plus APT)
  – API: GENI
  – Features: ARM, IB, Cisco UCS servers
  – ~5,000 cores (when complete), plus ~2,000 cores from APT
• Chameleon
  – Sites: TACC, Chicago
  – API: OpenStack
  – Features: IB, 100 Gbps network between sites
  – ~14,500 cores (when complete)

Experiences with NSFCloud
• Goal: deploy an ExoGENI rack in NSFCloud
• Why?
  – Target HPC domain science applications (IB); high performance is a first-order goal
  – Test/debug new ExoGENI features
  – Evaluate performance optimizations on identical hardware
  – Add support for new resource types (ARM, IB, etc.)

ExoGENI on NSFCloud
[Architecture diagrams: an ExoGENI rack deployed on top of the cloud software at an NSFCloud site, controlled by the ExoSM, AM, and Broker actors, and connected to network transit providers (I2, ESnet).]

ADCIRC on Mid-Scale Infrastructure
• Finite-element model
• Very high spatial resolution (~1.2M triangles)
• Efficient MPI implementation; scales to thousands of cores
• Typically uses 256-1024 cores for forecasting applications
• Used for coastal flooding simulations
  – FEMA flood insurance studies
  – Forecasting systems
  – Research applications

ADCIRC on Mid-Scale Infrastructure
• Run ADCIRC at scale (bare metal)
  – CloudLab: APT cluster (r320s) w/ IB; largest run: 512-core (64-node) MPI job
  – Chameleon: TACC Alamo (Early User Program); largest run: 160-core MPI job
• Run ADCIRC at scale (VM)
  – CloudLab: APT cluster (r320s) w/ IB using SR-IOV, under OpenStack; largest run: 256-core (32-node) MPI job
• VERY preliminary results (CloudLab, Hurricane Floyd, ensemble member 1):
  – Bare metal, 256-core MPI (IB): ~160 min
  – VM, 256-core MPI (IB w/ SR-IOV): ~170 min
  – Approx. 6% overhead due to virtualization (worked out in the note after the deck)

Thoughts for the GENI community
• CloudLab: feels like GENI (easier for this community to use)
• Chameleon: based on OpenStack
  – Would be easy to interface with ExoGENI to enable GENI APIs
  – Need to wait for more nodes

Functionality Requests
• Network stitching (CloudLab, Chameleon): not just GENI stitching; we need to enable ExoGENI stitching
• Image management is not transparent (CloudLab): I'd like to be able to view/delete images that I have created

Future Work
• ExoGENI
  – Install the full ExoGENI software stack to control an OpenStack cluster on APT
  – Create an ExoGENI handler that can interface with Chameleon OpenStack sites (see the illustrative sketch after the deck)
• Port existing work to other sites
  – Targeting Clemson, but it has QLogic IB cards that will require some time to set up
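Note on the preliminary results: the "approx. 6%" virtualization overhead quoted above is simply a back-of-the-envelope ratio of the two reported wall-clock times (in minutes), not an additional measurement:

    \[ \text{overhead} \approx \frac{T_{\mathrm{VM}} - T_{\mathrm{bare}}}{T_{\mathrm{bare}}} = \frac{170 - 160}{160} \approx 6.3\% \]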
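Note on the Chameleon handler: because Chameleon exposes a standard OpenStack API, an ExoGENI handler for it would essentially issue OpenStack compute calls. The sketch below is illustrative only (it is not the handler described in this talk); it uses the Python openstacksdk, and the cloud name "chameleon" plus the image, flavor, and network names are placeholders, not values taken from this presentation.

    # Illustrative sketch: how a handler might request a node from a
    # Chameleon-style OpenStack site via openstacksdk. All names are placeholders.
    import openstack

    # Credentials are assumed to be configured in clouds.yaml under "chameleon".
    conn = openstack.connect(cloud="chameleon")

    server = conn.create_server(
        name="exogeni-worker-0",   # hypothetical node name
        image="CC-CentOS7",        # placeholder image
        flavor="baremetal",        # placeholder flavor
        network="sharednet1",      # placeholder network
        wait=True,                 # block until the instance is ACTIVE
    )
    print(server.name, server.status)

    # Tearing the sliver down would map to the corresponding delete call:
    # conn.delete_server(server.id, wait=True)

A real Chameleon bare-metal launch also requires an advance reservation (lease), which is omitted here for brevity.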