Utilising Amazon Web Services
to provide urgent computing
for climateprediction.net
Peter Uhe, F. E. L. Otto, M. Rashid, D. C. H. Wallom
Oxford e-Research Centre and Environmental Change Institute,
University of Oxford, UK.
Funded by Climate Central
Thanks to AWS Cloud Credits for Research
• Introduction and motivation
• Options for provisioning resources for Urgent Computing
• Workflow
• Benchmarking
• Case Study Simulations
• Take Home Messages
Environmental Computing Workshop, Baltimore 2016
Introduction and motivation
Climateprediction.net
• Volunteer computing climate modelling project using BOINC
• Runs large ensembles (1000s) of single-processor climate simulations (many-task computing)
• weather@home: a subproject of climateprediction.net running high-resolution (50 or 25 km) regional simulations
• Has been running for ~14 years, with many projects analysing different aspects of climate change
World Weather Attribution
• Current project to attribute individual severe weather events to climate change
• Aims to present the results of the analysis in near real time (within days of the event occurring)
• Because the full workflow has several stages, the time available for simulation is minimal (simulations are precomputed if possible)
• May require urgent, last-minute computing in addition to the simulations precomputed by volunteers
• Urgent requirements are difficult or impossible to guarantee using volunteer resources
Our Requirements
• Be able to submit 1000s of simulations and have them computed as quickly as possible
• On-demand resources
• Compute that can be rapidly scaled up and down
• Best value (simulations per dollar)
• If using paid resources, engage our volunteers and emphasise that paid resources are used for urgent cases or when volunteer capacity is temporarily exceeded (rather than for the base computing load)
• Can’t monopolise volunteer compute capacity (climateprediction.net has other projects)
Options for provisioning resources for Urgent Computing
Dedicated Hardware
• Hardware can be designed/optimised with computing requirements in mind
• Full control of resources
• Requires over-provisioning to accommodate the maximum desired load
• Very expensive for infrequent computing activities
Supercomputer Time
• Many academic institutions have supercomputers available to researchers
• Pay for the amount of computer time requested/used
• These are shared systems with job queues
• Difficult/impossible to assign higher priority to urgent tasks
Cloud resources
• Available on demand for urgent work
• Highly scalable
• Only pay for the amount of compute time used; no infrastructure costs
Workflow
Amazon Web Services
• One of the largest Infrastructure as a Service (IaaS) cloud providers
• On-demand and massively scalable resources
• Spot Instances allow use of ‘spare’ compute capacity at much reduced prices
• API allows automated, script-based control of AWS resources
Deployment of Simulations
• BOINC (Berkeley Open Infrastructure for Network Computing)
• Current method of distributing simulations to volunteers; the infrastructure is already in place
• Just requires spinning up VMs with the correct dependencies and running simulations through the BOINC client
• Option of cloud-only or mixed cloud/volunteer deployment
[Figure: BOINC deployment diagram. Components: climate models, project scientist, web server, scheduler and database server, data server, results, research and papers.]
Lifecycle of Virtual Machine
1. Spin up the VM using the AWS API (stock Ubuntu image; required packages installed by a boot script)
2. Connect to the CPDN BOINC project (registered as a specific user)
3. Wait for the BOINC client to download simulations to run in parallel (number of simulations matching the number of CPUs)
4. Allow BOINC to run these simulations and prevent it from downloading more
5. Shut down when all simulations are complete
• Note: hosts are registered in the BOINC database, which will become very large unless previous host entries are reused for new VMs
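Steps 2-5 of the lifecycle amount to a small polling loop on each VM. The Python sketch below models only that decision logic; the `Action` names and `next_action` function are hypothetical. On a real VM each action would be carried out by the boot script through the BOINC client (for example, `boinccmd --project URL nomorework` stops further work fetch), which is not shown here.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical actions a VM boot script could take (lifecycle steps 2-5)."""
    ATTACH = "attach to the CPDN BOINC project"
    WAIT_FOR_WORK = "wait for BOINC to download more simulations"
    STOP_FETCHING = "stop BOINC from downloading more work"
    RUN = "let BOINC run the downloaded simulations"
    SHUTDOWN = "shut the VM down"


def next_action(attached: bool, fetching: bool, n_cpus: int,
                n_downloaded: int, n_unfinished: int) -> Action:
    """Decide the next lifecycle step from the VM's current state."""
    if not attached:
        return Action.ATTACH  # step 2
    if fetching:
        # Step 3: one simulation per CPU, so stop once that many are downloaded.
        if n_downloaded >= n_cpus:
            return Action.STOP_FETCHING  # step 4
        return Action.WAIT_FOR_WORK
    # Steps 4-5: run to completion, then terminate so the VM stops being billed.
    return Action.SHUTDOWN if n_unfinished == 0 else Action.RUN


if __name__ == "__main__":
    # Walk a hypothetical 2-vCPU VM through its lifecycle.
    states = [
        (False, True, 2, 0, 0),
        (True, True, 2, 1, 1),
        (True, True, 2, 2, 2),
        (True, False, 2, 2, 2),
        (True, False, 2, 2, 0),
    ]
    for state in states:
        print(next_action(*state).name)
```

Keeping the decision separate from the BOINC calls makes the shutdown condition easy to check before any instance is launched.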
Managing AWS VMs using spot fleets
• Spot instances use ‘spare’ compute resources at much lower than on-demand rates
• Spot fleets simplify the control of a large number of VMs (they maintain a set capacity of CPU units)
• Can be scaled up/down; new VMs are spun up when VMs are terminated
• Spot instances can be terminated; our workflow allows for some losses (we don’t require every simulation)
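A spot fleet is described by a single request configuration. The sketch below builds such a configuration as a plain Python dict using field names from EC2's RequestSpotFleet API (`IamFleetRole`, `TargetCapacity`, `Type`, `AllocationStrategy`, `LaunchSpecifications`, `WeightedCapacity`, `UserData`). Weighting each instance type by its vCPU count, so that "capacity" means simulation slots, is an assumption of this sketch; the AMI ID, role ARN and boot script are hypothetical placeholders.

```python
import base64


def spot_fleet_config(ami_id: str, fleet_role_arn: str, user_data: str,
                      instance_vcpus: dict, target_vcpus: int) -> dict:
    """Build a SpotFleetRequestConfig dict for EC2's RequestSpotFleet call.

    Capacity is counted in vCPUs: each instance type is weighted by its vCPU
    count, so the fleet maintains roughly `target_vcpus` simulation slots.
    """
    # Spot fleet launch specifications expect base64-encoded user data.
    encoded = base64.b64encode(user_data.encode("utf-8")).decode("ascii")
    launch_specs = [
        {
            "ImageId": ami_id,
            "InstanceType": instance_type,
            "WeightedCapacity": float(vcpus),
            "UserData": encoded,  # boot script that installs and starts BOINC
        }
        for instance_type, vcpus in instance_vcpus.items()
    ]
    return {
        "IamFleetRole": fleet_role_arn,
        "TargetCapacity": target_vcpus,
        # "maintain" asks AWS to replace instances that are interrupted.
        "Type": "maintain",
        "AllocationStrategy": "lowestPrice",
        "LaunchSpecifications": launch_specs,
    }
```

With credentials configured, the dict could be passed to `boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=config)`, and the fleet later scaled up or down with `modify_spot_fleet_request(SpotFleetRequestId=..., TargetCapacity=...)`. Note that `lowestPrice` only considers price; the selection described on the next slide also weighs benchmark speed.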
Inputs to Spot Fleet
• Relative performance of different VM types (benchmarks)
• Spot price of VM types in different data centres (Availability Zones) at time of launch
• VMs with the cheapest price per simulation are launched
• Additional check: VM types with volatile prices are discarded from consideration (too much risk of termination)
• Downside: the instantaneous price is used; a recent historical average would be preferable (more likely to achieve the cheapest price per simulation)
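The selection rule above combines benchmark speed, spot price and a volatility check. The sketch below is one way to express it, under stated assumptions: cost per simulation is taken as price × relative run time ÷ vCPUs (one simulation per vCPU), volatility is measured as the coefficient of variation of recent prices with a hypothetical 0.2 cut-off, and `use_average=True` switches to the historical-average price suggested as an improvement. Function and field names are illustrative.

```python
from statistics import mean, pstdev


def cost_per_simulation(price_per_hour, relative_runtime, vcpus):
    """One simulation per vCPU; slower VM types are billed for longer."""
    return price_per_hour * relative_runtime / vcpus


def choose_offer(offers, max_cv=0.2, use_average=False):
    """Pick the (instance type, zone) pool with the lowest cost per simulation.

    offers: dicts with 'type', 'zone', 'vcpus', 'relative_runtime' (from
    benchmarks, 1.0 = reference VM) and 'prices' (recent spot prices, oldest
    first). Pools whose coefficient of variation exceeds max_cv are skipped.
    Returns (cost, type, zone), or None if every pool is too volatile.
    """
    best = None
    for offer in offers:
        prices = offer["prices"]
        avg = mean(prices)
        if pstdev(prices) / avg > max_cv:
            continue  # volatile price: too much risk of termination
        price = avg if use_average else prices[-1]
        cost = cost_per_simulation(price, offer["relative_runtime"], offer["vcpus"])
        if best is None or cost < best[0]:
            best = (cost, offer["type"], offer["zone"])
    return best
```

With made-up price histories (and the benchmark ratios 1.18 for m3.large and 1.016 for c4.xlarge from Case Study 1), the instantaneous and averaged prices can pick different pools, and a cheap but volatile pool is rejected until `max_cv` is relaxed.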
Benchmarking
Model set-up
• Stand-alone simulation: a C++ controller running two climate simulations (global and regional); not connected to BOINC
• Short: 1-day simulation, 20-30 minutes of compute time
• Run multiple copies of the simulation in parallel (to match the number of vCPUs)
• Run on as many different VM types as possible (same simulation in all cases)
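Running one copy per vCPU and recording every copy's time, plus the slowest one, can be sketched as below. `run_simulation` is a hypothetical callable; on a real VM it might wrap `subprocess.run([...], check=True)` around the stand-alone model controller, whose command line is not given in the slides. Threads are sufficient here because each thread would mostly wait on an external process.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def benchmark(run_simulation, n_vcpus):
    """Run one copy of the simulation per vCPU at the same time.

    Returns each copy's wall-clock time (seconds) and the slowest time,
    which is how long the VM has to be paid for.
    """
    def timed(copy_index):
        start = time.perf_counter()
        run_simulation(copy_index)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=n_vcpus) as pool:
        times = list(pool.map(timed, range(n_vcpus)))
    return times, max(times)
```

For example, `benchmark(lambda i: time.sleep(0.01), 4)` returns four timings and their maximum.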
Benchmark Results
[Figure: benchmark results across VM types. Blue dots: time to complete each simulation. Red dots: slowest simulation on a particular VM (the length of time you pay for).]
Notes about benchmarking
• Smaller VMs give better performance (due to sharing the same physical hardware with lighter users)
• Benchmarks may depend on load and vary slightly between data centres
• Speed of computation is improved by utilising fewer vCPUs (as each vCPU is a hyperthread); however, throughput is better when utilising all vCPUs
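The speed versus throughput trade-off in the last note can be made concrete: for cost, what matters is simulations completed per instance-hour, i.e. the number of concurrent copies divided by the slowest copy's run time. The run times below are hypothetical, chosen only to illustrate the effect.

```python
def sims_per_instance_hour(n_copies: int, slowest_minutes: float) -> float:
    """Throughput of one VM: copies finished per hour of billed run time."""
    return n_copies * 60.0 / slowest_minutes


# Hypothetical 2-vCPU VM: each copy is slower when both vCPUs are busy...
one_copy = sims_per_instance_hour(1, 20.0)
two_copies = sims_per_instance_hour(2, 25.0)
# ...but the VM still completes more simulations per hour.
```

Here one copy gives 3.0 simulations per hour and two copies give 4.8, even though each individual simulation takes longer.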
Case Study
Case Study Setup
• Model validation ensemble (checks how the model performs in the region of choice)
• Simulations of the South America region at 50 km resolution
• 13-month simulations, 25 per year, starting each December from 1985 to 2014
• 750 simulations in total
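The ensemble size follows directly from the setup: December start dates from 1985 to 2014 inclusive give 30 start years, and 30 × 25 = 750. A short enumeration confirms the count; the day of month (1 December) and the member numbering are assumptions for illustration.

```python
from datetime import date

RUNS_PER_YEAR = 25
# One start date per December, 1985 through 2014 inclusive (day 1 assumed).
start_dates = [date(year, 12, 1) for year in range(1985, 2015)]

# Each workunit is identified here by (start date, ensemble member); the
# member numbering is illustrative only.
workunits = [(start, member) for start in start_dates
             for member in range(RUNS_PER_YEAR)]

print(len(start_dates), len(workunits))  # 30 750
```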
Case Study 1
• 318 instances spun up (gradually increased over a few hours)
• 2-4 simulations per instance
• 12 instances terminated due to a spike in the spot price
• 2 model crashes

Instance type | Fastest run time (hrs) | Slowest run time (hrs) | Mean run time (hrs) | Ratio to c4.large (case study) | Ratio to c4.large (benchmarks)
c4.large | 100.3 | 104.75 | 101.6 | 1 | 1
c4.xlarge | 91.1 | 106.9 | 102.5 | 1.008 | 1.016
m3.large | 118.4 | 120.6 | 119.6 | 1.18 | 1.18
Case Study 2
• Continuation of simulations started in case study 1
• Included check for volatility
• Included logging of detailed billing information

Instance type | Number of instances | Workunits completed | Avg. time per workunit (hours) | On-demand price per hour | On-demand cost per workunit | Spot price per instance-hour | Spot cost per workunit
c3.large | 3 | 6 | 111.978 | $0.12 | $6.72 | $0.02 | $1.09
c4.large | 357 | 714 | 102.23 | $0.12 | $6.08 | $0.02 | $1.08
c4.xlarge | 23 | 92 | 102.8 | $0.24 | $6.12 | $0.05 | $1.22
c4.2xlarge | 4 | 32 | 105.7125 | $0.48 | $6.30 | $0.13 | $1.73
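The cost columns appear consistent with price per hour × time per workunit ÷ workunits per instance. Workunits per instance (workunits completed ÷ number of instances) is 2, 2, 4 and 8 for the four types, matching each type's vCPU count, since all simulations on an instance run concurrently. A minimal sketch of this arithmetic, checked against the c3.large on-demand row; the hourly prices in the table are shown rounded to the cent, so recomputing other rows from the rounded prices gives values that differ by a few cents.

```python
def cost_per_workunit(price_per_hour: float, hours_per_workunit: float,
                      workunits_per_instance: int) -> float:
    """All simulations on an instance run concurrently, so each one carries
    an equal share of the instance's hourly bill."""
    return price_per_hour * hours_per_workunit / workunits_per_instance


# c3.large row: 6 workunits on 3 instances = 2 per instance (one per vCPU).
c3_on_demand = cost_per_workunit(0.12, 111.978, 6 // 3)
print(f"${c3_on_demand:.2f}")  # $6.72
```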
Data Uploads
• Case study: results uploaded to Oxford
  • Analysed in Oxford
• Option of uploading to S3/Glacier in AWS
  • In-cloud analysis
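The slides do not describe how an S3/Glacier upload would be implemented. One hedged sketch: plan the uploads as plain tuples, then hand them to boto3's `upload_file`, requesting the `GLACIER` storage class for long-term archiving. The bucket name, key prefix and file paths are hypothetical, and boto3 is imported only inside `upload` so the planning step runs without it.

```python
from pathlib import Path


def upload_plan(result_files, bucket, prefix, archive=False):
    """List (filename, bucket, key, extra_args) uploads for a batch of results.

    archive=True requests the GLACIER storage class for long-term storage.
    """
    extra_args = {"StorageClass": "GLACIER"} if archive else {}
    return [
        (str(path), bucket, f"{prefix}/{Path(path).name}", dict(extra_args))
        for path in result_files
    ]


def upload(plan):
    """Perform the uploads; needs boto3 and AWS credentials."""
    import boto3  # imported here so planning works without boto3 installed

    s3 = boto3.client("s3")
    for filename, bucket, key, extra_args in plan:
        s3.upload_file(filename, bucket, key, ExtraArgs=extra_args or None)
```

Keeping results in S3 would also allow the in-cloud analysis option, since the data would already sit next to the compute.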
Take Home Messages
• This proof of concept works!
• Fast computation for urgent results
• Cost-effective using spot fleets
• Spot price volatility can kill off instances (volatility check implemented after the first case study)
• Use of the instantaneous spot price is not ideal; there is potential to optimise price further, but this requires a more sophisticated framework
• Including a large number of potential instance types could minimise costs; however, c4.large was the cheapest most of the time
Thank You!
Any Questions?