GEC23 Experimenter Developer Roundtable
Experiences with Mid-Scale GENI Infrastructure
Paul Ruth, Anirban Mandal (Presenter)

• What is mid-scale infrastructure?
  – Bigger GENI racks (1000+ cores)
  – NSFCloud: CloudLab
  – NSFCloud: Chameleon
• Why mid-scale infrastructure?
  – GENI technologies are ready to be tried at scale
  – Real benefits to real domain science

NSFCloud
• Not a cloud! It is a testbed for developing clouds.
• CloudLab
  – Sites: Utah, Clemson, Wisconsin (plus APT)
  – API: GENI
  – Features: ARM, IB, Cisco UCS servers
  – ~5,000 cores (when complete), plus 2,000 cores from APT
• Chameleon
  – Sites: TACC, Chicago
  – API: OpenStack
  – Features: IB, 100 Gbps network between sites
  – ~14,500 cores (when complete)

Experiences with NSFCloud
• Goal: deploy an ExoGENI rack in NSFCloud
• Why?
  – Target HPC domain science applications (IB); high performance is a first-order goal
  – Test/debug new ExoGENI features
  – Evaluate performance optimizations on identical hardware
  – Add support for new resource types (ARM, IB, etc.)

ExoGENI on NSFCloud
[Architecture diagrams: an ExoGENI rack running on the cloud software of an NSFCloud site, the rack's aggregate manager (AM), the ExoSM broker, and network transit providers (I2, ESnet) connecting the rack to the rest of ExoGENI.]

ADCIRC on Mid-Scale Infrastructure
• Finite-element model
• Very high spatial resolution (~1.2M triangles)
• Efficient MPI implementation; scales to thousands of cores
• Typically uses 256-1024 cores for forecasting applications
• Used for coastal flooding simulations
  – FEMA flood insurance studies
  – Forecasting systems
  – Research applications

• Run ADCIRC at scale (bare metal) (a launch sketch appears after the Future Work section)
  – CloudLab: APT cluster (r320s) with IB; largest run was a 512-core (64-node) MPI job
  – Chameleon: TACC Alamo (Early User Program); largest run was a 160-core MPI job
• Run ADCIRC at scale (VM)
  – CloudLab: APT cluster (r320s) with IB using SR-IOV, under OpenStack; largest run was a 256-core (32-node) MPI job

VERY preliminary results (CloudLab, ensemble member 1, Hurricane Floyd):
• Bare metal, 256-core MPI (IB): ~160 minutes
• VM, 256-core MPI (IB with SR-IOV): ~170 minutes
• Approximately 6% overhead due to virtualization

Thoughts for the GENI community
• CloudLab
  – Feels like GENI (easier for this community to use)
• Chameleon
  – Based on OpenStack
  – Would be easy to interface with ExoGENI to enable GENI APIs
  – Need to wait for more nodes

Functionality Requests
• Network stitching (CloudLab, Chameleon)
  – Not just GENI stitching… we need to enable ExoGENI stitching
• Image management is not transparent (CloudLab)
  – I'd like to be able to view/delete images that I have created

Future Work
• ExoGENI
  – Install the full ExoGENI software stack to control an OpenStack cluster on APT
  – Create an ExoGENI handler that can interface with Chameleon OpenStack sites (an API sketch appears at the end of this document)
• Port existing work to other sites
  – Targeting Clemson, but it has QLogic IB cards that will require some time to set up
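
The "Run ADCIRC at scale" bullets above describe launching 160- to 512-core MPI jobs on CloudLab and Chameleon nodes. As a minimal, hypothetical sketch of what one such launch looks like, the Python wrapper below starts a 256-core run and reports its wall-clock time; the padcirc binary name, host file, and MPI flags are assumptions, not details taken from the slides.

```python
# Minimal, hypothetical sketch of launching one ADCIRC ensemble member as an
# MPI job on nodes that have already been provisioned on CloudLab/Chameleon.
# Assumptions (not from the slides): an MPI runtime is installed, the parallel
# ADCIRC executable is named "padcirc", and ./hosts lists the allocated nodes.
import subprocess
import time

N_CORES = 256        # e.g. 32 nodes x 8 cores, as in the SR-IOV VM runs
HOSTFILE = "hosts"   # placeholder host file naming the allocated nodes

cmd = [
    "mpirun",
    "-np", str(N_CORES),
    "--hostfile", HOSTFILE,
    "./padcirc",     # pre-built parallel ADCIRC binary (assumed name)
]
# Selecting the InfiniBand transport is MPI-implementation specific and is
# deliberately omitted here.

start = time.time()
subprocess.run(cmd, check=True)
print(f"ADCIRC run finished in {(time.time() - start) / 60:.1f} minutes")
```

The ~160 versus ~170 minute comparison in the slides, i.e. (170 - 160) / 160 ≈ 6%, would come from timing the same run on a bare-metal allocation and on SR-IOV-backed VMs.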
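
The Future Work item about an ExoGENI handler for Chameleon's OpenStack sites amounts to driving the OpenStack compute API from the handler. The sketch below uses the openstacksdk Python library to show roughly what that provisioning step could look like. It is an illustration under assumptions, not ExoGENI's actual handler code: the cloud entry, image, flavor, network, and keypair names are placeholders, and the advance reservation that Chameleon bare-metal instances normally require is omitted.

```python
# Rough sketch of what an ExoGENI handler for a Chameleon OpenStack site
# might do: authenticate, then create a compute instance and wait for it.
# The cloud entry, image, flavor, network, and key names are placeholders.
import openstack

conn = openstack.connect(cloud="chameleon")        # credentials from clouds.yaml

image = conn.image.find_image("CC-CentOS7")        # assumed image name
flavor = conn.compute.find_flavor("baremetal")     # assumed flavor name
network = conn.network.find_network("sharednet1")  # assumed network name

server = conn.compute.create_server(
    name="exogeni-worker-0",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
    key_name="exogeni-key",                        # assumed keypair name
)

# Block until the instance is ACTIVE; this is the point where a handler
# would report success or failure back to the aggregate manager (AM).
server = conn.compute.wait_for_server(server)
print(server.name, server.status)
```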