UC Berkeley
Using Statistical Machine Learning
in Cloud Computing
David Patterson, UC Berkeley
Reliable Adaptive Distributed Systems Lab
1
Image: John Curley http://www.flickr.com/photos/jay_que/1834540/
Datacenter is new “server”
• “Program” == Web search, email, map/GIS, …
• “Computer” == 1000’s computers, storage, network
• Warehouse-sized facilities and workloads
• New datacenter ideas (2007-2008): truck container (Sun),
floating (Google), datacenter-in-a-tent (Microsoft)
• How to enable innovation in new services without first
building & capitalizing a large company?
2
photos: Sun Microsystems & datacenterknowledge.com
Outline
• Cloud Computing
• RAD Lab
• Case Study 1: Director & SCADS
– Cloud storage and automatic management
• Case Study 2: Automatic Log Analysis
– Find anomalous behavior
• Case Study 3: Predicting Map Reduce
jobs
– Help with scheduling
• Summary
3
Cloud Computing is Hot
• 9/15/09 Federal CIO Vivek Kundra
embraces Cloud Computing
• “We’ve been building data center after
data center, acquiring application after
application, and frankly what it’s done is
drive up the cost of technology immensely
across the board. What we need is to find
a more innovative path in addressing
these problems.”
4
But...
What is cloud computing,
exactly?
5
“It’s nothing (new)”
“...we’ve redefined Cloud Computing to
include everything that we already do...
I don’t understand what we would do
differently ... other than change the
wording of some of our ads.”
Larry Ellison, CEO, Oracle (Wall Street
Journal, Sept. 26, 2008)
6
Above the Clouds:
A Berkeley View of Cloud Computing
abovetheclouds.cs.berkeley.edu
• 2/09 White paper by RAD Lab PI’s and students
– Clarify terminology around Cloud Computing
– Quantify comparison with conventional computing
– Identify Cloud Computing challenges & opportunities
• Why can we offer new perspective?
– Strong engagement with industry
– Users of cloud computing in our own research and
teaching in last 18 months
• Goal: stimulate discussion on what’s really new
– without resorting to weather analogies ad nauseam
7
Utility Computing Arrives
• Amazon Elastic Compute Cloud (EC2)
• “Compute unit” rental: $0.10-0.80/hr.
– 1 CU ≈ 1.0-1.2 GHz 2007 AMD Opteron/Xeon core
“Instances”:
Instance            Platform   Cores   Memory    Disk
Small  - $0.10/hr   32-bit     1       1.7 GB    160 GB
Large  - $0.40/hr   64-bit     4       7.5 GB    850 GB – 2 spindles
XLarge - $0.80/hr   64-bit     8       15.0 GB   1690 GB – 3 spindles
• No up-front cost, no contract, no minimum
• Billing rounded to nearest hour; pay-as-you-go
storage also available
• A new paradigm (!) for deploying services?
8
What is it? What’s new?
• Old idea: Software as a Service (SaaS)
– Basic idea predates MULTICS
– Software hosted in the infrastructure vs. installed on local
servers or desktops; dumb (but brawny) terminals
• New: pay-as-you-go utility computing
– Illusion of infinite resources on demand
– Fine-grained billing: release == don’t pay
– Earlier examples: Sun, Intel Computing Services—longer
commitment, more $$$/hour, no storage
– Public (utility) vs. private clouds
9
Why Now (not then)?
• “The Web Space Race”: Build-out of extremely
large datacenters (10,000’s of commodity PCs)
– Build-out driven by growth in demand (more users)
=> Infrastructure software: e.g., Google File System
=> Operational expertise: failover, DDoS, firewalls...
– Discovered economy of scale: 5-7x cheaper than
provisioning a medium-sized (100’s machines) facility
• More pervasive broadband Internet
• Commoditization of HW & SW
– Fast Virtualization
– Standardized software stacks
10
Classifying Clouds
• Instruction Set VM (Amazon EC2)
• Managed runtime VM (Microsoft Azure)
• Framework VM (Google AppEngine)
• Tradeoff: flexibility/portability vs. “built in” functionality
[Spectrum: lower-level, less managed (EC2) → higher-level, more managed (Azure, AppEngine)]
11
Cloud Economics 101
• Cloud Computing User: Static provisioning
for peak - wasteful, but necessary for SLA
[Figure: machines vs. time in a “statically provisioned” data center (flat capacity above varying demand; the gap is unused resources) vs. $ vs. time in a “virtual” data center in the cloud (capacity tracks demand)]
12
Cloud Economics 101
• Cloud Computing Provider: Could save energy
[Figure: machines vs. time in a “statically provisioned” data center vs. energy vs. time in a real data center in the cloud; capacity tracks demand, shrinking unused resources]
13
Risk of Under Utilization
• Underutilization results if “peak” predictions
are too optimistic
[Figure: resources vs. time for a static data center; flat capacity sits above varying demand, and the gap is unused resources]
14
Risks of Under Provisioning
[Figure: three resources vs. time (days) panels for a static data center; when demand peaks exceed provisioned capacity, the excess is first lost revenue and, over time, lost users]
15
New Scenarios Enabled by
“Risk Transfer” to Cloud
• “Cost associativity”: 1,000 CPUs for 1 hour costs the same
as 1 CPU for 1,000 hours (@ $0.10/hour)
– Washington Post converted Hillary Clinton’s travel
documents to post on WWW <1 day after released
– RAD Lab graduate students demonstrate improved
Hadoop (batch job) scheduler—on 1,000 servers
• Major enabler for SaaS startups
– Animoto traffic doubled every 12 hours for 3 days when
released as Facebook plug-in
– Scaled from 50 to >3500 servers
– ...then scaled back down
16
Cloud Computing &
Statistical Machine Learning?
• Before CC, only performance optimization
on mostly small scale systems
• CC ⇒ detailed cost-performance model
– Optimization more difficult with more metrics
• CC ⇒ everyone can use 1000+ servers
– Optimization more difficult at large scale
• Economics rewards scale up AND down
– Optimization more difficult if add/drop servers
• ⇒ Use SML as optimization difficulty increases
17
RAD Lab 5-year Mission
Enable 1 person to develop, deploy, operate
next-generation Internet application
• Key enabling technology: Statistical machine learning
– management, scaling, anomaly detection, performance
prediction, ...
• Highly interdisciplinary faculty & students
– PI’s: Fox/Katz/Patterson (systems/networks), Jordan (machine
learning), Stoica (networks & P2P), Joseph (systems/security),
Franklin (databases)
– 2 postdocs, ~30 PhD students, ~5 undergrads
18
RAD Lab Prototype:
System Architecture
[Architecture diagram: new apps, equipment, and global policies (e.g. SLAs) feed the Director, which acts through Drivers; Automatic Workload Evaluation (AWE) supplies offered load, resource utilization, etc.; Log Mining produces training data for performance & cost models; Chukwa & X-Trace provide monitoring; Web 2.0 apps run in a Ruby on Rails environment with web-service APIs on top of SCADS; each node runs Chukwa trace collection and local OS functions over a VM monitor]
19
Successes
1. Automatically add/drop servers to fit demand,
without violating Service Level Agreement (SLA)
2. Predict performance of complex software
system when demand is scaled up
3. Distill millions of lines of log messages into an
operator-friendly “decision tree” that pinpoints
“unusual” incidents/conditions
• Recurring theme: cutting-edge Statistical
Machine Learning (SML) works where simpler
methods have failed
20
Outline
• Cloud Computing
• RAD Lab
• Case Study 1: Director & SCADS
– Cloud storage and automatic management
• Case Study 2: Automatic Log Analysis
– Find anomalous behavior
• Case Study 3: Predicting Map Reduce
jobs
– Help with scheduling
• Summary
21
Automatic Management
of a Datacenter
• As datacenters grow, need to automatically
manage the applications and resources
– examples:
• deploy applications
• change configuration, add/remove virtual machines
• recover from failures
• Director:
– mechanism for executing datacenter actions
• Advisors:
– intelligence behind datacenter management
22
Director Framework
[Diagram: Advisors use workload and performance models plus monitoring data to issue actions to the Director; the Director, via Drivers, applies configuration changes to datacenter(s) of VMs]
23
Director Framework
• Director
– issues low-level/physical actions to the
DC/VMs
• request a VM, start/stop a service
– manage configuration of the datacenter
• list of applications, VMs, …
• Advisors
– update performance, utilization metrics
– use workload, performance models
– issue logical actions to the Director
• start an app, add 2 app servers
24
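To make the Advisor/Director split concrete, here is a minimal sketch in Python. It is not the RAD Lab code; every name in it (Advisor, Director, LogicalAction, the SLA rule) is an illustrative assumption.

# Minimal sketch of the Advisor/Director split described above.
# All names are illustrative, not the actual RAD Lab interfaces.
from dataclasses import dataclass

@dataclass
class LogicalAction:
    kind: str          # e.g. "add_app_servers", "remove_app_servers"
    app: str
    count: int

class SLAAdvisor:
    """Issues logical actions based on monitoring data and a performance model."""
    def __init__(self, latency_slo_ms: float):
        self.latency_slo_ms = latency_slo_ms

    def advise(self, app: str, p99_latency_ms: float, utilization: float):
        if p99_latency_ms > self.latency_slo_ms:
            return LogicalAction("add_app_servers", app, count=2)
        if utilization < 0.2:
            return LogicalAction("remove_app_servers", app, count=1)
        return None

class Director:
    """Turns logical actions into low-level VM requests and tracks configuration."""
    def __init__(self):
        self.config = {}   # app -> number of VMs

    def execute(self, action: LogicalAction):
        n = self.config.get(action.app, 0)
        if action.kind == "add_app_servers":
            # a real Director would call a cloud driver here to request VMs
            self.config[action.app] = n + action.count
        elif action.kind == "remove_app_servers":
            self.config[action.app] = max(0, n - action.count)

# One step of the control loop:
director = Director()
advisor = SLAAdvisor(latency_slo_ms=100)
action = advisor.advise("web-app", p99_latency_ms=180, utilization=0.9)
if action:
    director.execute(action)
print(director.config)   # {'web-app': 2}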
What About Storage?
• Easy to imagine how to scale up and scale
down computation
• Databases don’t scale down, and usually run
into limits when scaling up
• What would it mean to have datacenter
storage that could scale up and down as
well so as to save money for storage in
idle times?
25
DC Storage Motivation
• Most popular websites follow the same
pattern
– Rapidly developed on SQLServer /
PostgreSQL / MySQL / etc.
– Become popular and realize scaling
limitations
– Build large, complicated ad-hoc systems to
deal with scaling limitations as they arise
• Websites that can’t scale fast enough lose
customers
SCADS: Scalable, Consistency-Adjustable Data Storage
• Scale Independence - as the user base
grows:
– No changes to application
– Cost per user doesn’t increase
– Request latency doesn’t change
• Key Innovations
1. Performance-safe query language
2. Declarative performance/consistency tradeoffs
3. Automatic scale up and down using machine learning
Scale Independence Arch
• Developers provide performance-safe queries along with consistency requirements
• Use ML, workload information, and requirements to provision proactively via repartitioning keys and replicas
SCADS Performance Model
(on m1.small, all data in memory)
[Plot: latency vs. workload with 5% and 1% writes, showing median and 99th-percentile latency against the SLA threshold; panel: low workload, low put rate]
Outline
• Cloud Computing
• RAD Lab
• Case Study 1: Director & SCADS
– Cloud storage and automatic management
• Case Study 2: Automatic Log Analysis
– Find anomalous behavior
• Case Study 3: Predicting Map Reduce
jobs
– Help with scheduling
• Summary
31
Mining Console Logs for Large-Scale System Problem Detection
• Why console logs?
• Console logs are everywhere
– In almost every software system
– Hand-picked information by developers
– Expressive, convenient to use
• Low programmer overhead
• Especially useful in large scale Internet services
– Open source code + in-house development
– Continuously changing system
“Detecting Large-Scale System Problems by Mining Console Logs,”
Wei Xu, Ling Huang, Armando Fox, David Patterson, Michael Jordan,
Symposium on Operating Systems Principles (SOSP), Oct 2009
32
Console logs use awkward
languages
HODIE NATUS EST RADICI FRATER*
today unto the Root a brother is born.
* "that crazy Multics error message in Latin."
http://www.multicians.org/hodie-natus-est.html
33
Console logs are not
operator friendly
[Diagram: operators attack console logs with grep, Perl scripts, and search]
• Problem – Don’t know what to look for!
• Console logs are intended for a single developer
• Assumption: log writer == log reader
• Today many developers => massive textual logs
• Our goal - Discover the most interesting log
messages without any prior input
34
Console logs are hard for
machines too
[Pipeline: Parsing → Feature Creation → Machine Learning → Visualization]
• Problem
• Highly unstructured, looks like free text
• Text alone does not capture detailed program state
• Hard for operators to understand detection results
• Our contribution
• A general framework for processing console logs
• Efficient parsing and features
35
Key observations
• Parsing
– Console logs are inherently structured
• Determined by log printing statement
– Constant strings = markers of message structure
– Source code is usually available
• Features
• Common information reported in logs
• Identifiers and execution paths
• State variables
• Correlations among messages
• Many ways to group related log messages
• i.e. not just by time
36
Case studies
• Surveyed a number of software apps
– Linux kernel, OpenSSH, Apache, MySQL, Jetty
– Hadoop, Cassandra, Nutch …
– Parsing works on 20/22
• In this talk – Hadoop file system
– Per block operation logging (usually ignored)
• Experiment on EC2 cloud
– 203 nodes * 48 hours
– ~ 300 TB HDFS data (550,000 blocks)
– ~24 million lines of console logs
37
Step 1: Using source code analysis to help
log parsing
• Basic ideas
091012 03-00-00 INFO Creating file mydata
Log.info(“Creating file ” + filename);
Creating file (.*) [filename]
• Non-trivial in object oriented languages
– toString() method of objects
– Dynamic binding (sub-classing)
• Our Solution
– Analyze source code to abstract syntax tree (AST) level
– Pre-compute all possible message types
• Free text -> semistructured text
38
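A minimal sketch of the parsing idea on the “Creating file” example above, in Python. It assumes the message template has already been recovered from the log-printing statement; the real system derives such templates from the source code’s abstract syntax tree, which this sketch does not do.

import re

# Template recovered from the source statement Log.info("Creating file " + filename):
# the constant string is the marker of the message structure,
# the variable part becomes a named group.
TEMPLATES = [
    ("CreatingFile", re.compile(r"Creating file (?P<filename>\S+)")),
]

def parse(line: str):
    """Turn a free-text console log line into (message_type, variables)."""
    for msg_type, pattern in TEMPLATES:
        m = pattern.search(line)
        if m:
            return msg_type, m.groupdict()
    return None, {}

print(parse("091012 03-00-00 INFO Creating file mydata"))
# -> ('CreatingFile', {'filename': 'mydata'})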
Step 2: Feature Message count vector
• Identifiers are common in logs
– file names, object keys, user ids
• Identifiers can be discovered automatically
– Have many distinct values
– Appear in multiple message types
– Reported many times
• Grouping by identifiers
– Similar to execution paths (e.g. myfile: Creating → Write → Backup → Done)
• Numerical representation of these “paths”
– Similar to Bag of Words model in IR
39
Message count vector example
datanode_r16 | Receiving block blk_100 src: … dest:...
namenode_r10 | allocateBlock: blk_100
namenode_r10 | allocateBlock: blk_200
datanode_r16 | Receiving block blk_200 src: … dest:...
datanode_r14 | Receiving block blk_100 src: …dest:…
datanode_r16 | Received block blk_100 of size 49486737 from …
datanode_r14 | Received block blk_100 of size 49486737 from …
datanode_r16 | Error Receiving block blk_200 of size 49486737 from …
blk_100: [0 2 2 1 2 0 0 2 0 0 0 0 0 0 0 0]
blk_200: [0 1 0 1 2 0 0 2 0 1 0 0 0 0 0 0]
40
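A sketch of how such message count vectors could be built: group parsed messages by their block identifier and count each message type. The message types and grouping rule here are just the ones visible in the example above, and the substring matching is a simplification of parsing against full templates.

from collections import Counter, defaultdict
import re

MESSAGE_TYPES = ["allocateBlock", "Receiving block", "Received block", "Error Receiving block"]

log = [
    "datanode_r16 | Receiving block blk_100 src: ... dest: ...",
    "namenode_r10 | allocateBlock: blk_100",
    "namenode_r10 | allocateBlock: blk_200",
    "datanode_r16 | Receiving block blk_200 src: ... dest: ...",
    "datanode_r14 | Receiving block blk_100 src: ... dest: ...",
    "datanode_r16 | Received block blk_100 of size 49486737 from ...",
    "datanode_r14 | Received block blk_100 of size 49486737 from ...",
    "datanode_r16 | Error Receiving block blk_200 of size 49486737 from ...",
]

# One Counter of message types per block identifier.
counts = defaultdict(Counter)
for line in log:
    block = re.search(r"blk_\d+", line).group()
    # match the most specific message type first
    for msg_type in sorted(MESSAGE_TYPES, key=len, reverse=True):
        if msg_type in line:
            counts[block][msg_type] += 1
            break

# Fixed-order message count vector per block, ready for PCA.
for block, c in counts.items():
    print(block, [c[t] for t in MESSAGE_TYPES])
# blk_100 [1, 2, 2, 0]
# blk_200 [1, 1, 0, 1]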
Step 3: Mining
PCA detection
[0 2 2 1 2 0 0 2 0 1 0 0 0 0 0 0]
55,000 vectors,
one per block
• This example - detect abnormal vectors
• Dimensions highly correlated
– Rare correlations indicate abnormal executions
– PCA separates normal pattern from abnormal,
making anomalies easy to detect
• From text -> structured feature
– With details of program execution
– Can apply many other machine learning
algorithms
41
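A minimal sketch of PCA-based detection on such count vectors, using scikit-learn on toy data. The percentile threshold is a stand-in assumption; the paper uses a squared-prediction-error (Q-statistic) test.

import numpy as np
from sklearn.decomposition import PCA

# rows = message count vectors, one per block (toy data: last row is anomalous)
X = np.array([
    [1, 2, 2, 0],
    [2, 4, 4, 0],
    [3, 6, 6, 0],
    [4, 8, 8, 0],
    [1, 1, 0, 1],
])

# Fit the "normal" low-dimensional subspace.
pca = PCA(n_components=1)
pca.fit(X)

# Residual = part of each vector not explained by the normal subspace.
reconstructed = pca.inverse_transform(pca.transform(X))
spe = np.sum((X - reconstructed) ** 2, axis=1)   # squared prediction error

threshold = np.percentile(spe, 95)               # simplistic threshold choice
print(spe, spe > threshold)                      # only the last block stands out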
PCA detection results
Seq  Event Description                                                         Actual   Detected
1    Forgot to update namenode for deleted block                                 4297     4297
2    Write block exception then client give up                                   3225     3225
3    Failed at beginning, no block written                                       2950     2950
4    Over-replicate-immediately-deleted                                          2809     2788
5    Received block that does not belong to any file                             1240     1228
6    Redundant addStoredBlock request received                                    953      953
7    Trying to delete a block, but the block no longer exists on data node        724      650
8    Empty packet for block                                                       476      476
9    Exception in receiveBlock for block                                           89       89
10   PendingReplicationMonitor timed out                                           45       45
11   Other anomalies                                                              108      107
     Total (out of 550,000 blocks)                                              16916    16808

Seq  Description                                      False Positives
1    Normal background migration                                 1397
2    Multiple replica (for task / jobdesc files)                   349
     Total                                                        1746
Making results easy for operators to understand!
42
Explaining detection results with
decision tree
[Decision tree: internal nodes test counts of message types such as “writeBlock # received exception”, “# Starting thread to transfer block # to #”, “#: Got exception while serving # to #:#”, “addStoredBlock request received for # on # size # But it does not belong to any file”, “#Verification succeeded for #”, “Receiving block # src: # dest: #”, and “Unexpected error trying to delete block #. BlockInfo Not found in volumeMap”; leaves label a block’s vector as normal (0) or abnormal (1); see the sketch below]
43
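A sketch of this last step: fit a shallow decision tree on the count vectors, with the PCA anomaly labels as the target, so operators get readable rules instead of raw PCA scores. This uses scikit-learn on toy data; the feature names and counts are illustrative, not the real HDFS features.

from sklearn.tree import DecisionTreeClassifier, export_text
import numpy as np

# toy message count vectors and PCA-derived labels (1 = abnormal)
feature_names = ["writeBlock exception", "starting thread to transfer",
                 "Receiving block", "Received block"]
X = np.array([
    [0, 0, 2, 2],
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [1, 3, 2, 1],   # abnormal
    [2, 0, 1, 0],   # abnormal
])
y = np.array([0, 0, 0, 1, 1])

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=feature_names))   # operator-friendly rules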
Summary: Mining Console Logs for
Large-Scale System Problem Detection
[Pipeline: Extract (200 nodes, >24 million lines of logs) → Detect abnormal log segments → Visualize (a single decision tree to visualize system behavior)]
Contributions: how to parse logs, how to turn them into features ready for SML,
and how to make results accessible to operators; which SML algorithm is used matters less
44
Outline
• Cloud Computing
• RAD Lab
• Case Study 1: Director & SCADS
– Cloud storage and automatic management
• Case Study 2: Automatic Log Analysis
– Find anomalous behavior
• Case Study 3: Predicting Map Reduce
jobs
– Help with scheduling
• Summary
45
Using SML to Characterize,
Predict, Optimize Resources
Given a system (e.g. Web Service, multicore …),
its workload, and performance metrics:
• Answer “what-if” questions for
– Changes in workload (mix/rate/distribution)
– Changes in system configuration (hw/sw)
– Changes in both
• 3 Examples for Case Study 3:
– Parallel Database Performance Prediction
– Hadoop Job Performance Prediction
– Multi-core Performance Optimization (if time permits)
Problem Statement
[Diagram: Workload → SYSTEM → Behavior, captured by a Model]
• Find relationships between workload and performance
• How to generate the model?
– Manually constructed queueing model
– Automatically constructed
• Regression, genetic algorithms, correlation analysis,…
– This talk: Canonical Correlation Analysis
Finding Correlations
Kernel Canonical Correlation
Analysis (KCCA)
• 2003, Bach & Jordan at Berkeley
• Find dimensions of maximal correlation
between kernelized datasets
– “Kernel functions” impose system-specific definition
of similarity
• Dimensionality reduction + clustering effect
Advantages:
– Relative distance between datapoints (defines
neighborhoods)
– Interpolation (preserves neighborhoods across both
datasets)
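A minimal sketch of regularized kernel CCA in the generic textbook formulation (Gaussian kernels on both datasets, then a generalized eigenproblem over the kernel matrices). It is not the RAD Lab implementation: kernel centering is omitted, and the regularization constant, kernel width, and toy data are arbitrary assumptions.

import numpy as np
from scipy.linalg import eigh

def rbf_kernel(X, sigma=1.0):
    # Gaussian kernel matrix: K[i, j] = exp(-||x_i - x_j||^2 / sigma)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / sigma)

def kcca(Kx, Ky, reg=1e-3, n_components=2):
    """Find weights A, B so that the projections Kx @ A and Ky @ B are
    maximally correlated, via the block generalized eigenproblem
        [0      Kx Ky] [a]       [(Kx+reg I)^2        0      ] [a]
        [Ky Kx  0    ] [b] = rho [0             (Ky+reg I)^2 ] [b]
    """
    n = Kx.shape[0]
    I = np.eye(n)
    Z = np.zeros((n, n))
    LH = np.block([[Z, Kx @ Ky], [Ky @ Kx, Z]])
    Rx = (Kx + reg * I) @ (Kx + reg * I)
    Ry = (Ky + reg * I) @ (Ky + reg * I)
    RH = np.block([[Rx, Z], [Z, Ry]])
    vals, vecs = eigh(LH, RH)                 # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :n_components]     # most correlated directions first
    return top[:n], top[n:]                   # A, B

# Toy usage: two noisy views of the same 40 points.
rng = np.random.default_rng(0)
pts = rng.normal(size=(40, 3))
Kx = rbf_kernel(pts, sigma=3.0)
Ky = rbf_kernel(pts + 0.1 * rng.normal(size=pts.shape), sigma=3.0)
A, B = kcca(Kx, Ky)
proj_x, proj_y = Kx @ A, Ky @ B               # KCCA projections of each point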
3 Challenges in using SML
1. Mapping function from traces to feature vectors
• Workload -> multidimensional space
• System behavior -> multidimensional space
2. How to compare similarity of two feature
vectors
3. How to leverage results of SML
techniques to find actionable answer
Hadoop Job Performance Prediction
[Diagram: Jobs → Hadoop → Performance]
51
Motivating Scenarios
• Which jobs should I schedule together to avoid
resource contention in Hadoop?
– Should I run my job now or wait till later?
• What is the optimal number of nodes to run a job
on?
– How many nodes should my job run on to meet the
deadline?
• Given observed behavior of a job run at small scale,
how will the job behave when scaled up?
– Are the performance bottlenecks the same?
– Which aspects of the system perform non-linearly?
52
Hadoop Facts
• Map-reduce jobs have diverse
performance profiles
- Each job has a different behavioral signature
- Input parameters affect scaling slopes of jobs
• Map performance is linear wrt number of
nodes
• Reduce complexity is job specific
• Shuffle step is primary source of job
unpredictability
53
SML Goals
• Predict runtime/resource requirements for jobs
– Same jobs, different configs
– Same configs, different jobs
• Identify which jobs can be co-scheduled
– minimize resource contention
– maximize shared data
• Create Map-Reduce benchmark
– Representative workloads
– Varying scale factors
54
Data and Testbed:
Facebook
• Multi-user
environment
• Data set spans 6
months
• Know job names
but not job source
code
• ~600 node cluster
• Nodes 1-300
– 16GB memory
– 5 map slots
– 5 reduce slots
• Nodes 301-600
– 8GB memory
– 5 map slots
– 0 reduce slots
Recurring Extract Transform
Load (ETL) Jobs
• 66 unique jobs account for 99% of ETL jobs
[Plot: # of instances per JobID]
• Most repeated job: same job, many configs
[Plot: Total Time (0–14000) vs. Number of Maps (0–250)]
X (Job Descriptor Vector) → Hadoop → Y (Performance Vector)
• Job Descriptor Vector X: job config parameters (Number of Maps, Number of Reduces) and data characteristics (Map Input Bytes, Locality)
• Performance Vector Y: Map Time, Reduce Time, Total Time, Map Output Bytes
Predicting Job Performance
[Scatter plots: predicted vs. actual Total Time (ms), and predicted vs. actual Map Output Bytes]
Multi-Core Performance
Optimization (if time permits)
Application
Optimization
Configurations
Multi-core
Performance
A. Ganapathi, K. Datta, A. Fox, D. Patterson, "A Case for Machine
Learning to Optimize Multicore Performance", First USENIX
Workshop on Hot Topics in Parallelism (HotPar '09), Berkeley, CA,
March 30-31, 2009.
Problem
Diverse motifs (structured grids, dense linear algebra, sparse linear algebra)
+ diverse multicore architectures (Sun Niagara2, Intel/AMD x86, IBM Blue Gene)
+ compilers alone (xlc, icc, gcc; no code tuning)
= Poor Performance!
Solution: Auto-tuning
Identify motif-specific optimizations → generate code variants based on
these optimizations → search parameter space for best configuration
Benefits:
• Portable across diverse architectures
• Easily scalable
• Tunable for any relevant metric
• Minimizes programmer tuning time
• Proven track record (e.g. ATLAS, SPIRAL, FFTW, OSKI)
• Already showed speedups of 1.5x to 5.6x
Problem: searching the parameter space
Optimization            Parameters   Total Configs
Thread Count                 1              4
Domain Decomposition         4             36
Software Prefetching         2             18
Padding                      1             32
Inner Loop                   8            480
Total                       16           4x10^7
ML Goals
Goal: find best value for configuration parameters to
simultaneously optimize multiple performance metrics
• Require little architecture/application knowledge
• Provide actionable suggestions for improving performance
• Simultaneously optimize for multiple performance metrics (e.g. energy efficiency and cycle time)
• Use small set of samples to build model
Experimental Setup
• 256^3 regular grid; for a point (x,y,z) the stencil touches its nearest neighbors x±1, y±1, z±1
• Structured Grids (stencil codes)
– Used in iterative PDE solvers for image processing, diffusion, etc.
– For a given point, a stencil is a fixed subset of nearest neighbors
– A stencil code updates every point in a regular grid by “applying a stencil”
– Stencil codes are usually bandwidth-bound
• Kernels: 3D 7-point stencil and 3D 27-point stencil (see the sketch below)
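For concreteness, a naive (unoptimized) 3D 7-point stencil sweep in Python/NumPy; the grid size, coefficients, and single-sweep structure are arbitrary assumptions, and this is the kind of baseline the auto-tuned variants would improve on.

import numpy as np

def stencil_7pt(A):
    """One Jacobi-style sweep: each interior point is updated from itself
    and its six nearest neighbors (x±1, y±1, z±1)."""
    B = A.copy()
    B[1:-1, 1:-1, 1:-1] = (
        A[1:-1, 1:-1, 1:-1]
        + A[2:, 1:-1, 1:-1] + A[:-2, 1:-1, 1:-1]
        + A[1:-1, 2:, 1:-1] + A[1:-1, :-2, 1:-1]
        + A[1:-1, 1:-1, 2:] + A[1:-1, 1:-1, :-2]
    ) / 7.0
    return B

A = np.random.rand(64, 64, 64)   # smaller than the 256^3 grid in the experiments
A = stencil_7pt(A)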
Architectures
• AMD Barcelona
2 sockets x 4 cores/socket, 2.30 GHz, compiler: gcc
• Intel Clovertown
2 sockets x 4 cores/socket, 2.66 GHz, compiler: icc
Potential Optimizations
TY
+X
NX(unit stride)
CY
Thread 0
CZ
RXRY
RZ
CX
Domain Decomposition
Processing
…
A[i-1]
A[i]
A[i+1]
Low Memory
Bandwidth
Retrieving
from DRAM
…
A[i+dist-1]
A[i+dist]
A[i+dist+1]
Thread 1
Conflict
Misses
…
NY
TY
CZ
NZ
+Z
+Y
TX
Parallelization
and
Capacity Misses
TX
Software Prefetching
Thread n
Array Padding
for i=1 to x {
for i=1
to xto
{ y{
for j=1
for j=1
to
{ z{
for i=1
to
{ y to
forxk=1
for k=1
for j=1
to
y to
{ z{
A[i,j,k]=…
A[i,j,k]=…
for k=1
} to z {
}}
A[i,j,k]=…
}}}
}}
}
Poor Functional
Unit Usage and
Scheduling
Inner Loop Opts.
3 Challenges in using ML
1. Mapping function from traces to feature vectors
• Configuration parameters -> multidimensional space
• System behavior -> multidimensional space
2. How to compare similarity of two feature
vectors
3. How to leverage results of ML
techniques to find actionable answer
Configuration Parameters
Optimization Type          Parameter Name                    Parameter Range                    # Configurations
Thread Count               Number of Threads (NThreads)      {2^0…2^3}                          4
Domain Decomposition       Block Size CX                     {2^7…NX}                           36 (combined)
                           Block Size CY                     {2^1…NY}
                           Block Size CZ                     NZ
                           Chunk Size                        (range not shown)
Software Prefetching       Prefetching Type                  {register block, plane, pencil}    3
                           Prefetching Distance              {0, 2^5…2^9}                       6
Padding                    Padding Size                      {0…31}                             32
Inner Loop Optimizations   Register Block Size RX            {2^0…2^1}                          10 (combined)
                           Register Block Size RY            {2^0…2^1}
                           Register Block Size RZ            {2^0…2^3}
                           Statement Type                    {complete, individual}             2
                           Read From Type                    {array, variable}                  2
                           Pointer Type                      {fixed, moving}                    2
                           Neighbor Index Type               {register block, plane, pencil}    3
                           FMA-like Instructions             {yes, no}                          2

Feature Vector X (example):
Threads   Block Size CX   Block Size CY   Block Size CZ   Padding Size   Prefetch type   Prefetch distance   Statement Type
4         32              128             256             32             Plane           64                  individual
Performance Metrics
Counter          Description
PAPI_TOT_CYC     Cycles per thread per job
PAPI_L1_DCM      L1 data cache misses per thread
PAPI_L2_DCM      L2 data cache misses per thread
PAPI_TLB_DCM     TLB misses per thread
PAPI_CA_SHR      Accesses to shared cache lines
PAPI_CA_CLN      Accesses to clean cache lines
PAPI_CA_ITV      Cache interventions
Power meter      Watts consumed per second

Energy efficiency (derived) = (total cycles x # of flops) / (clk rate x # of watts)

Feature Vector Y (example):
Total Cycles   L1_DCM   L2_DCM   TLB_DCM   CA_SHR   CA_CLN   CA_ITV   Energy Efficiency
1.9E7          2.4E5    1.5E5    1.2E4     1.2E5    1.4E4    1.2E3    2.3E4
Kernel functions
Vector   Threads   Block Size CX   Block Size CY   Block Size CZ   Padding Size   Prefetch type   Prefetch distance   Statement Type
X1       4         32              128             256             32             Plane           64                  Individual
X2       2         64              128             256             32             Plane           64                  Individual
X3       2         64              128             256             32             Pencil          64                  complete
Are X1 and X2 more similar than X2 and X3?
Measure of similarity
Numeric Columns: Gaussian Kernel
K(xi, xj) = exp(-||xi - xj||^2 / σ)
Non-numeric Columns:
K(xi, xj) = {1 if xi = xj, 0 if xi ≠ xj}
Similarity(X1, X2) = average(K(X1i, X2i))
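A sketch of this mixed similarity measure: a Gaussian kernel on numeric columns, an exact-match (delta) kernel on categorical ones, averaged across columns. The scale parameter σ is an arbitrary assumption here; in practice it would be tuned per column.

import math

def column_kernel(a, b, sigma=1.0):
    # numeric columns: Gaussian kernel; non-numeric columns: delta kernel
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.exp(-((a - b) ** 2) / sigma)
    return 1.0 if a == b else 0.0

def similarity(x1, x2, sigma=1.0):
    return sum(column_kernel(a, b, sigma) for a, b in zip(x1, x2)) / len(x1)

X1 = [4, 32, 128, 256, 32, "Plane",  64, "individual"]
X2 = [2, 64, 128, 256, 32, "Plane",  64, "individual"]
X3 = [2, 64, 128, 256, 32, "Pencil", 64, "complete"]

print(similarity(X1, X2), similarity(X2, X3))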
Stencil Code
[Diagram: each of N stencil-code runs on a platform yields a Configuration Feature Vector X (compiler optimizations, parameters for memory blocking, software prefetching) and a Performance Feature Vector Y (performance counters, power meter readings); kernel functions turn the N vectors of each view into N x N kernel matrices KX and KY]
Kernel functions
• Numeric Columns: Gaussian Kernel
• Non-numeric Columns: K(xi, xj) = {1 if xi = xj, 0 if xi ≠ xj}
• Similarity(X1, X2) = average(K(X1i, X2i))
KCCA solves the generalized eigenproblem

  [ 0       KX KY ] [A]     [ KX KX    0     ] [A]
  [ KY KX   0     ] [B] = ρ [ 0        KY KY ] [B]

and projects the data onto the maximally correlated dimensions KX*A and KY*B
Finding Optimal Configurations
• Raw data space: configuration features X and performance metrics Y
• Project into the KCCA data space as KX*A and KY*B
• Find nearest neighbors of the desired point in the projected space
• Map the neighbors back to raw configurations (the inverse image problem; see the sketch below)
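A sketch of the lookup step just described: project a candidate configuration into the KCCA space, find its nearest training neighbors there, and return their raw configurations. It reuses the A matrix from the earlier KCCA sketch, and reduces the inverse-image problem to "reuse a neighbor's configuration"; all names are illustrative.

import numpy as np

def best_configs_for(new_kernel_row, Kx_train_proj, raw_configs, A, k=3):
    """new_kernel_row  : kernel values between the new config and each training config, shape (N,)
    Kx_train_proj      : projections of the training configs, Kx @ A, shape (N, d)
    raw_configs        : list of the N raw training configurations
    A                  : KCCA projection weights for the configuration view
    Returns the raw configurations of the k nearest neighbors in KCCA space."""
    new_proj = new_kernel_row @ A                        # project the new point
    dists = np.linalg.norm(Kx_train_proj - new_proj, axis=1)
    nearest = np.argsort(dists)[:k]                      # indices of closest training runs
    return [raw_configs[i] for i in nearest]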
Results
75
Auto-tuning Time
• Exhaustive Search:
– 4x107 configs x 0.08 s/trial x 5 trials > 180 days
• Our Technique:
– Training data: 1500 configs x 0.08 s/trial x 5 trials = 10 minutes
– Training time using KCCA: 90 minutes
– Genetic algorithm: 243 configs x 0.08 s/trial x 5 trials = 1.6 min
– Total Time < 2 hrs
76
Summary: SML Strengths
1. Train vs. write the logic (handle SW churn)
2. Learns how to handle and make policy
during transitions between steady states
(unlike queueing theory)
3. Coping with a complex cost function
(unlike simple control theory)
4. Finding trends, needles in data haystack
5. Fast enough to run online (today)
– Infeasible algorithms from 1960s fast in 2008
– Computers cheap enough OK to “just” monitor
77
Summary: SML
Accommodations
• System data usually text vs. numbers
– Need to turn text into features
– Need to find distance function for close vs. far
• SML is not perfect
– System must live with <100% accuracy, false
positives
• Will be asked “Yes, I see that SML works,
but why didn’t you try ‘simpler’ Algorithm X?”
– For all X
– Simpler = I know X, I don’t know SML?
78
Conclusion
• Cloud Computing will transform IT industry
– Pay-as-you-go utility computing leveraging
economies of scale of Cloud provider
– Anyone can create/scale next eBay, Twitter…
• Cloud Computing ⇒ better management:
cost-performance, scale, scale up/down
• SML: making sense from lots of data
• RAD Lab applying SML to new Cloud
Computing challenges: Director, SCADS,
Map Reduce Scheduling, Log Analysis
79
Acknowledgments
• Peter Bodik for Director/SCADS experiments
• Wei Xu for Log Analysis
• Archana Ganapathi for Map Reduce and
Multicore Performance Prediction
• 30 PhD students and 8 faculty of RAD Lab
• RAD Lab founding sponsors Google,
Microsoft, and Sun Microsystems and
affiliates Amazon Web Services, Cloudera,
Cisco, Facebook, HP, NetApp, SAP, VMware
80
Backup Slides
81
Evaluation Benchmarks
• “No Optimization”: running naïve code
• “Expert Optimized”: ordered the optimizations (Opt. #1 Parameters, Opt. #2 Parameters) and applied them consecutively; more information: Datta et al., “Stencil Computation Optimization and Auto-tuning on State-of-the-Art Multicore Architectures”, Supercomputing 2008
• “Random Raw Data”: best performing point in raw data
• “Genetic on Raw Data”: permute configs for top three best performing points in raw data
Cross Validation
Prediction Accuracy: 1 – [Σ(predicted_i – actual_i)^2 / Σ(actual_i – actual_mean)^2]
Test Set      Map Time   Reduce Time   Total Time   Map Output Bytes
mod4 (old)      0.60        0.80          0.70           0.50
mod3            0.84        0.85          0.86           0.94
mod2            0.81        0.76          0.83           0.94
mod1            0.91        0.77          0.88           0.97
mod0 (new)      0.96        0.89          0.97           0.91
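The prediction-accuracy metric above is the coefficient of determination (R^2); a one-line Python version, shown for completeness with made-up numbers:

import numpy as np

def prediction_accuracy(predicted, actual):
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    ss_res = np.sum((predicted - actual) ** 2)          # Σ(predicted_i – actual_i)^2
    ss_tot = np.sum((actual - actual.mean()) ** 2)      # Σ(actual_i – actual_mean)^2
    return 1.0 - ss_res / ss_tot

print(prediction_accuracy([2.0, 3.0, 4.1], [2.0, 3.0, 4.0]))  # close to 1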
Train with Sampled Subset
Prediction Accuracy: 1 – [Σ(predicted_i – actual_i)^2 / Σ(actual_i – actual_mean)^2]
Training Set   Map Time   Reduce Time   Total Time   Map Output Bytes
All old data     0.96        0.89          0.97           0.91
sample1          0.93        0.87          0.94           0.85
sample2          0.62        0.94          0.81           0.81
sample3          0.76        0.91          0.91           0.72
sample4          0.64        0.83          0.78           0.71
sample5          0.69        0.87          0.82           0.74
Microsoft’s Chicago
Modular Datacenter
85
The Million Server
Datacenter
• 24000 sq. m housing 400 containers
– Each container contains 2500 servers
– Integrated computing, networking, power,
cooling systems
• 300 MW supplied from two power
substations situated on opposite sides of
the datacenter
• Dual water-based cooling systems
circulate cold water to containers,
eliminating need for air-conditioned rooms
86
[SCADS performance model plots (backup): high workload / low put rate; high workload / high put rate; low workload / low put rate]
2020 IT Carbon Footprint
• 2007 worldwide IT carbon footprint: 2% = 830 m tons CO2, comparable to the global aviation industry
• Expected to grow to 4% by 2020
[Chart values: 820m, 360m, 260m tons CO2]
90
Energy & Cloud Computing?
• Cloud Computing saves Energy?
• Don’t buy machines for local use that are
often idle
• Better to ship bits as photons over fiber vs.
ship electrons over transmission lines to
spin disks, power processors locally
– Clouds use nearby (hydroelectric) power
– Leverage economies of scale of cooling, power
distribution
91
Energy & Cloud Computing?
• Techniques developed to stop using idle
servers to save money in Cloud Computing
can also be used to save power
– Up to Cloud Computing Provider to decide
what to do with idle resources
• New Requirement: Scale DOWN and up
– Who decides when to scale down in a
datacenter?
– How can Datacenter storage systems improve
energy?
92
Hybrid / Surge Computing
• Keep a local “private cloud” running same
protocols as public cloud
• When need more, “surge” onto public
cloud, and scale back when need fulfilled
• Saves energy (and capital expenditures)
by not buying and deploying power
distribution, cooling, machines that are
mostly idle
93