Download CARDIO: Cost-Aware Replication for Data

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Inverse problem wikipedia , lookup

Generalized linear model wikipedia , lookup

Computer simulation wikipedia , lookup

History of numerical weather prediction wikipedia , lookup

Data assimilation wikipedia , lookup

Fault tolerance wikipedia , lookup

Transcript
CARDIO: Cost-Aware Replication
for Data-Intensive workflOws
Presented by Chen He
Motivation
• Is large scale
cluster reliable?
 5 average worker
deaths per MapReduce job
 At least 1 disk
failure in every run of
a 6- hour MapReduce
job on a 4000-node
cluster
Motivation
• How to prevent node failure from affecting
performance?
– Replication
• Capacity constraint
• Replication time, etc
– Regeneration through re-execution
• Delay program progress
• Cascaded re-execution
Motivation
COST
AVAILABILITY
All pictures adopted from the Internet
Outline
•
•
•
•
•
Problem Exploration
CARDIO Model
Hadoop CARDIO System
Evaluation
Discussion
Problem Exploration
• Performance Costs
– Replication cost (R)
– Regeneration cost (G)
– Reliability cost (Z)
– Execution cost (A)
– Total cost (T)
– Disk cost (Y)
T=A+Z
Z=R+G
Problem Exploration
• Experiment Environment
– Hadoop 0.20.2
– 25 VMs
– Workloads: Tagger->Join->Grep->RecordCounter
Problem Exploration Summary
• Replication Factor for MR Stages
Problem Exploration Summary
• Detailed Execution Time of 3 Cases
CARDIO Model
• Block Failure Model
– Output of stage i is Di
– Replication factor is xi
– Total block number is bi
– Single block failure probability is p
– Failure probability in stage i:
f ( xi )  1  (1  p xi )bi
CARDIO Model
• Cost Computation Model
– Total time of stage i: Ti  Ai  Ri  Gi 1
– Replication cost of stage i: Ri  xi Yi
– Expected regeneration time of stage i:
Gi  fi ( xi  1)Ti
n
n 1
– Reliability cost for all stages: Z   Ri   Gi
i 1
i 1
– Storage Constraint C of all stages:
Y 
n
xY
i 1
i
i
C
– Choose X  {x1 , x2 ,...xn } to minimize Z
CARDIO Model
• Dynamic Replication
– Replication number x may vary during the
program approaching
• Job is in Step k, the replication factor at this step is:
xi (k ), i  1, 2,...k
k  1, 2,..., n
CARDIO Model
• Model for Reliability
n
n
– Minimize Z   R (k )   G (k )
k 1
k 2
– Based on
X  {x1 (k ), x2 (k ),...xn (k )}
– In the condition of
Y (k ) 
k
 x (k )Y
i 1
i
i
C
CARDIO Model
• Resource Utilization Model
– Model Cost = resource utilized
– Resource type Q
• CPU, Network, Disk, and Storage resource, etc.
• Utilization of q resource in stage i: ui , q q  1, 2,...Q
• Normalize usage by i , q
ui , q
i , q  n
, q  1, 2,...Q
 u j ,q
j 1
• Relative costs weights: wq ,
q  1, 2,...Q
CARDIO Model
• Resource Utilization Model
– The cost for A is:
Q
Ai   wq i , q
q 1
– Total Cost:
n
Q
n
n 1
i 1
i 1
T  A  Z   wq i ,q  R 'i   G 'i
i 1 q 1
– Optimization target:
• Choose X  {x1 (k ), x2 (k ),...xn (k )} to minimize T
CARDIO Model
• Optimization Problem
– Job optimality (JO)
– Stage optimality (SO)
Hadoop CARDIO System
• CardioSense
– Obtain progress from JT periodically
– Be triggered by pre-configured threshold-value
– Collect resource usage statistics for running stages
– Rely on HMon on each worker node
• HMon based on Atop has low overhead
Hadoop CARDIO System
• CardioSolve
– Receive data from CardioSense
– Solve SO problem
– Decide the replication factors for current and
previous stages
Hadoop CARDIO System
• CardioAct
– Implement the command from CardioSolve
– Use HDFS API setReplication(file, replicaNumber)
Hadoop CARDIO System
Evaluation
• Several Important Parameters
– p is the failure rate 0.2 if not specified
–  is the time to replicate a data unit, 0.2 as well
– C iis the computation resource of stage i, it
follows uniform distribution U(1,Cmax),Cmax=100
in general.
– Di is the output of stage i, it is obtained from a
uniform distribution U(1, Dmax), Dmax varies
within the [1,Cmax].
– C is the storage constraint for the whole process.
Default value is 
Evaluation
• Effect of Dmax
Evaluation
• Effect of Failure rate p
Evaluation
• Effect of block size
Evaluation
• Effect of different resource constraints
++ means over-utilzed,
and this type of
resource is regarded as
expensive
P=0.08, C=204GB,
delta=0.6
S3 is CPU intensive
DSK has similar
performance pattern as
NET
CPU 0010, NET 0011,
DSKIO 0011,STG0011
Evaluation
S2 re-execute more
frequently due to the
failure injection. Because
it has large data output.
P=0.02, 0.08 and 0.1
1 , 3, 21
API reason
Discussion
• Problems
– Typos and misleading
symbols
– HDFS API setReplication()
• Any other ideas?