Utilising Amazon Web Services to provide urgent computing for climateprediction.net
Peter Uhe, F. E. L. Otto, M. Rashid, D. C. H. Wallom
Oxford e-Research Centre and Environmental Change Institute, University of Oxford, UK
[email protected]
Funded by Climate Central. Thanks to AWS Cloud Credits for Research.

Outline
• Introduction and motivation
• Options for provisioning resources for urgent computing
• Workflow
• Benchmarking
• Case study simulations
• Take home messages

Environmental Computing Workshop, Baltimore 2016

Introduction and motivation

Climateprediction.net
• Volunteer-computing climate modelling project using BOINC
• Runs large ensembles (1000s) of single-processor climate simulations (many-task computing)
• weather@home: a subproject of climateprediction.net running high-resolution (50 or 25 km) regional simulations
• Has been running for ~14 years, with many projects analysing different aspects of climate change

World Weather Attribution
• Current project to attribute individual severe weather events to climate change
• Aims to present results of the analysis in near real time (within days of the event occurring)
• Breaking down the full workflow keeps simulation time minimal (precomputed where possible)
• May require urgent last-minute computing in addition to simulations precomputed by volunteers
• Urgent requirements are difficult or impossible to guarantee using volunteer resources

Our Requirements
• Submit 1000s of simulations to be computed in as fast a time frame as possible
• On-demand resources
• Compute that can be rapidly scaled up and down
• Best value in simulations per dollar
• If using paid resources, emphasise to our volunteers that they are used for urgent cases, or when volunteer capacity is temporarily exceeded (rather than for the base computing load)
• Can't monopolise volunteer compute capacity (climateprediction.net has other projects)

Options for provisioning resources for Urgent Computing

Dedicated Hardware
• Hardware can be designed/optimised with the computing requirements in mind
• Full control of resources
• Requires over-provisioning to account for the maximum desired load
• Very expensive for infrequent computing activities

Supercomputer Time
• Many academic institutions have supercomputers available to researchers
• Pay for the amount of computer time requested/used
• Shared systems with job queues
• Difficult or impossible to assign higher priority to urgent tasks

Cloud Resources
• Available on demand for urgent work
• Highly scalable
• Only pay for the compute time used; no infrastructure costs required

Workflow

Amazon Web Services
• One of the largest Infrastructure-as-a-Service cloud providers
• On-demand and massively scalable resources
• Spot instances allow use of 'spare' compute capacity at much-reduced prices
• An API allows automated, script-based control of AWS resources

Deployment of Simulations
• BOINC (Berkeley Open Infrastructure for Network Computing) is the current method of distributing simulations to volunteers; the infrastructure is already in place
• Just requires spinning up VMs with the correct dependencies and running simulations through the BOINC client
• Option of cloud-only or mixed cloud-volunteer deployment
[Diagram: project scientist → web server, scheduler and database server, data server → climate models → results → research and papers]

Lifecycle of a Virtual Machine
1. Spin up the VM using the AWS API (a stock Ubuntu image, with required packages installed by a boot script)
2.
Connect to the CPDN BOINC project (registered as a specific user)
3. Wait for the BOINC client to download simulations to run in parallel (the number of simulations matching the number of CPUs)
4. Allow BOINC to run these simulations, preventing it from downloading more
5. Shut down when all simulations are complete
• Note: hosts are registered in the BOINC database, which will grow very large unless previous host entries are reused for new VMs

Managing AWS VMs Using Spot Fleets
• Spot instances use 'spare' compute resources for much less than on-demand rates
• Spot fleets simplify control of a large number of VMs (maintaining a set capacity of CPU units)
• Can be scaled up or down; new VMs are spun up when VMs are terminated
• Spot instances can be terminated at any time, but our workflow allows for some losses (we don't require every simulation)

Inputs to the Spot Fleet
• Relative performance of different VM types (benchmarks)
• Spot price of VM types in different data centres (Availability Zones) at launch time
• The VMs with the cheapest price per simulation are launched
• Additional check: VM types with volatile prices are discarded from consideration (too much risk of termination)
• Downside: the instantaneous price is used; a recent historical average would be preferable (more likely to get the cheapest price per simulation)

Benchmarking

Model set-up
• Stand-alone simulation (a C++ controller running two climate simulations, one global and one regional).
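The spot-fleet input logic above (rank candidates by price per simulation, discard volatile instance types) can be sketched as follows. This is a minimal illustration, not the project's actual code: the benchmark numbers, price histories, and the volatility threshold are hypothetical, and a real deployment would fetch prices via the AWS API rather than hard-code them.

```python
# Illustrative sketch of the spot-fleet input selection described above:
# rank (instance type, availability zone) candidates by dollars per simulation,
# discarding instance types whose recent price history is too volatile.
# All numbers below are hypothetical examples, not measured values.

from statistics import mean, stdev

# Benchmark-derived throughput: simulations completed per instance-hour
# (one simulation per vCPU, divided by the slowest simulation's runtime).
SIMS_PER_HOUR = {
    "c4.large": 2 / 102.0,
    "c4.xlarge": 4 / 103.0,
    "m3.large": 2 / 120.0,
}

def is_stable(price_history, max_cv=0.25):
    """Volatility check: reject types whose recent spot prices vary too much
    (coefficient of variation = stdev / mean above the threshold)."""
    if len(price_history) < 2:
        return False
    return stdev(price_history) / mean(price_history) <= max_cv

def cheapest_candidates(price_histories):
    """price_histories: {(instance_type, zone): [recent spot prices, newest last]}.
    Returns stable candidates sorted by cost per simulation (ascending)."""
    ranked = []
    for (itype, zone), history in price_histories.items():
        if itype not in SIMS_PER_HOUR or not is_stable(history):
            continue
        # Instantaneous price, as in the workflow above (a recent
        # historical average would be preferable).
        cost_per_sim = history[-1] / SIMS_PER_HOUR[itype]
        ranked.append((cost_per_sim, itype, zone))
    return sorted(ranked)

prices = {
    ("c4.large", "us-east-1a"): [0.020, 0.021, 0.020],
    ("c4.xlarge", "us-east-1a"): [0.050, 0.048, 0.052],
    ("m3.large", "us-east-1b"): [0.020, 0.080, 0.020],  # volatile: discarded
}
best = cheapest_candidates(prices)  # cheapest stable candidate first
```

With these example prices, the volatile m3.large zone is filtered out and c4.large ranks first on price per simulation, mirroring the case-study outcome.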
Not connected to BOINC
• Short (a 1-day simulation; 20-30 minutes of compute time)
• Run multiple copies of the simulation in parallel (to match the number of vCPUs)
• Run on as many different VM types as possible (same simulation in all cases)

Benchmark Results
[Figure: per-simulation benchmark times by VM type. Blue dots: time to complete each simulation. Red dots: slowest simulation on a particular VM (the length of time you pay for).]

Notes about benchmarking
• Smaller VMs give better performance (due to sharing the same physical hardware with lighter users)
• Benchmarks may depend on load and vary slightly between data centres
• Speed of computation is improved by using fewer vCPUs (as each vCPU is a hyperthread); however, throughput is better when all vCPUs are used

Case Study

Case Study Setup
• Model validation ensemble (check how the model performs in the region of choice)
• Simulations of the South America region at 50 km resolution
• 13-month simulations, 25 per year, starting each December 1985-2014
• 750 simulations in total

Case Study 1
• 318 instances spun up (gradually increased over a few hours)
• 2-4 simulations per instance
• 12 instances terminated due to a spike in the spot price
• 2 model crashes

                                 c4.large   c4.xlarge   m3.large
Fastest run time (hrs)           100.3      91.1        118.4
Slowest run time (hrs)           104.75     106.9       120.6
Mean run time (hrs)              101.6      102.5       119.6
Ratio to c4.large (case study)   1          1.008       1.18
Ratio to c4.large (benchmarks)   1          1.016       1.18

Case Study 2
• Continuation of the simulations started in Case Study 1
• Included the check for spot-price volatility
• Included logging of detailed billing information

                                 c3.large   c4.large   c4.xlarge   c4.2xlarge
Number of instances              3          357        23          4
Workunits completed              6          714        92          32
Ave time per workunit (hours)    111.978    102.23     102.8       105.7125
On-demand price per hour         $0.12      $0.12      $0.24       $0.48
On-demand cost per workunit      $6.72      $6.08      $6.12       $6.30
Spot price per hour              $0.02      $0.02      $0.05       $0.13
Spot cost per workunit           $1.09      $1.08      $1.22       $1.73

Data Uploads
• Case study: uploaded to Oxford and analysed in Oxford
• Option of uploading to S3/Glacier in AWS, with in-cloud analysis

Take Home Messages
• This proof of concept works!
• Fast computation for urgent results
• Cost effective using spot fleets
• Spot price volatility can kill off instances (a volatility check was implemented after the first case study)
• Use of the instantaneous spot price is not ideal; there is potential to better optimise price, but this requires a more sophisticated framework
• Including a large number of potential instance types could minimise costs; however, c4.large was cheapest most of the time

Thank You! Any Questions?
[email protected]
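As a closing sanity check on the Case Study 2 cost table: cost per workunit is approximately hours per workunit × hourly price ÷ workunits run concurrently (one per vCPU). The sketch below is illustrative arithmetic only, not the project's billing code; the assumption of one workunit per vCPU is inferred from the deck, and small gaps against the table (e.g. spot cost of $1.09 in the table) presumably reflect actual billed hours and price variation.

```python
# Reproduce the cost-per-workunit arithmetic from the Case Study 2 table.
# Assumption (illustrative): each instance runs one workunit per vCPU in parallel,
# so the hourly price is shared across that many concurrent workunits.

def cost_per_workunit(hours, price_per_hour, vcpus):
    """Average dollars per completed workunit for one instance type."""
    return hours * price_per_hour / vcpus

# c3.large row: 2 vCPUs, 111.978 h per workunit, $0.12/h on demand, $0.02/h spot.
on_demand = cost_per_workunit(111.978, 0.12, 2)  # ~= $6.72, matching the table
spot = cost_per_workunit(111.978, 0.02, 2)       # ~= $1.12 (table reports $1.09)
```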