Transactions on Dependability and Security, Siewiorek

Industry Trends and Dependability Research
Ram Chillarege, Dan Siewiorek, Zbigniew Kalbarczyk
IEEE Transactions on Dependability and Security
© Siewiorek 2012

Moore's Law
• Doubling annually leads to a three-orders-of-magnitude increase in capacity every decade
• Capacity includes number of transistors, processor performance, bits of data, storage, and communications bandwidth

Implications of Moore's Law
• One year in the next decade, Intel will be able to produce a simple microprocessor for every ant on earth
• Every year or two, more computers are produced than in all of previous history
  – By 1970, only a few thousand computers had been produced
  – Today, tens of millions of computers are shipped annually

Sources, Users, Environment over Four Decades
(+ = the previous decade's entries, plus the additions listed)

Decade     | 1970                       | 1980              | 1990              | 2000
System     | Mainframe                  | Workstation       | PC                | Mobile
Faults     | Hardware                   | Hardware + SW, Net | + User           | + Env, Conn, Malicious
Complexity | Closed, Integration HW, SW | + Network         | + Open, 3rd party | + Proprietary, COTS
People     | 10 K                       | M                 | 10 M              | 100 M
Training   | B.S., 5000 hrs             | Basic, 500 hrs    | Basic, 50-100 hrs | Buy time, 0-10 hrs

Technology and Users Change
(figure)

Industry Trends Fueled by Moore's Law
1. Shifting Error Sources: hardware failures drop, new sources dominate
2. Explosive Complexity: systems more complex, users less tolerant
3. Global Volume: high level of integration, open systems, number

Trends, Artifacts & Processes

Trend                   | Artifact                                                              | Process
1) Shifting sources     | Monitoring; Failure data analysis                                     | Fault injection; Raise level of abstraction
2) Explosive complexity | Anomaly detection; Trend analysis                                     | Formal methods; ODC; Software reliability
3) Global volume        | Proactive management; Pervasive, cognitive; Adaptive model of normality | User interaction; No test case; Malicious, nonmalicious

Trend 1 - Shifting Sources
• Industry Trend
  – Failure rates drop in hardware and new sources dominate
• Artifact
  – Monitoring
  – Failure data analysis
• Process
  – Raise the level of abstraction
  – Fault injection

System Hardware Failure Rate Changes
(figure, copyright IBM)

Evolution of Hardware Platforms
• Mainframe: system MTTF over 30 years
  – Up to 25% of circuitry devoted to error detection and correction, especially for transient faults
  – Information redundancy (coding), spatial and temporal redundancy
  – Techniques evolve to accommodate integration changes (e.g. whole-chip failures)
• Microprocessor: MTTF over 1000 years
  – ECC on L2 cache; parity on registers and L1 cache
  – Rollback to known state

Emerging Failure Sources
• Software
  – Shift from functionality to dependability
  – High-end IBM servers see almost no cold restarts in an entire year
• Planned Outages
  – Massively multiplayer game company sued when its game was down for two hours
• Desktop software expectations
  – Entering stable stage, expectations growing
  – Newer sources of failure, including viruses and security holes

Evolution of Artifact: Experimental Dependability Research

Artifact                    | 1970's      | 1980's     | 1990's            | 2000's
Operational life monitoring | Crash dumps | Error logs | Natural workloads | Human-computer interaction errors
Fault injection             | Stuck-at    | Memory     | API               | Security

Operational Life Monitoring
• Evolution of monitoring and analysis
  – Summary statistics: mean time to crash
  – Distribution type and distribution parameter values
  – Trend and symptom analysis
• Significant findings
  – Transient faults are over an order of magnitude more frequent than permanent faults
  – Crash probability follows a Weibull distribution with a decreasing failure rate
  – Strong correlation between workload and failure rate
  – Spatial sorting followed by temporal heuristics predicted failures, on average, a week before catastrophic failure, with over 90% accuracy and with one-fourth the number of events required by statistical techniques

Fault Injection
• Evolution of Fault Injection
  – Pin level
  – Software-implemented fault injection (SWIFI)
    • Memory bit/word, API: raising the level of abstraction
  – Heavy-ion radiation, electromagnetic interference, laser
• Significant Findings
  – Fault injection experiment components: fault set, workload, fault location/timing, readout, evaluation
  – Automated API calls with unusual data cause hangs and process aborts up to 20% of the time in commercial operating systems

Trend 2 - Explosive Complexity
• Industry Trend
  – Growth in system complexity, users, and shrinking user tolerance to failures
• Artifact
  – Anomaly detection
  – Trend analysis
• Process
  – Formal methods & model checking
  – ODC
  – Software reliability
  – Testing
  – Standards

Aspects of Explosive Growth
• Growth and Complexity
  – Systems are synthesized by layering components from different vendors; component specifications usually do not exist
• Redefining Failure
  – Expectation and satisfaction of the customer
• Speed of Service
  – Single highest impact on the image of the company; remote diagnosis, failure prediction, and self-repair can help contain costs
• Constant Development
  – Much wider variability in the installed base due to software downloads
• Standards
  – Provide abstractions

Software Reliability
• Evolution
  – Field measurement
  – Assessment and feedback during the development process
  – Use of formal methods: model checking for distributed, collaborating processes
  – Map increasing fault model abstraction level into development process space
  – Orthogonal Defect Classification (ODC): defects categorized by an attribute-value set including Defect Type (Development) and Trigger (Testing)
• Significant Findings
  – MTBF increases by a factor of four during the first 12 months after release
  – Systematic approaches can reduce the cost of analysis by over an order of magnitude
  – ODC can decrease defects by almost two orders of magnitude, saving hundreds of millions of dollars

Software Failure Rates
(figure) Field failure rates for software, measured from service data for widely distributed software: (A) IBM product circa 1994 [34]; (B) Microsoft product circa 2004 [64]

Trend 3 - Global Volume
• Industry Trend
  – High level of integration and emerging open systems, a source of new dimensions in failures
• Artifact
  – Proactive management
  – Pervasive and cognitive computing
  – Adaptive model of normal behavior
• Process
  – User interaction
  – No test case
  – Tools to assess resilience to both malicious and nonmalicious errors

Aspects of Global Volume
• Form Factor and Mobility
  – Handheld mobile devices trigger market sizes in the tens to hundreds of millions
• Lower Training Threshold
  – Previously minor irritations for expert users become major problems for novices
• Dependability in Systems Management
  – Volume of systems means previously manual techniques have to be automated
• Market Maturity Drives Product Sophistication
  – Increasing user maturity demands more capability, performance, durability

Approaches to Manage Global Volume
• Adaptive Model of Normal Behavior
  – Reduces the number of events to examine by orders of magnitude, allowing operational personnel to focus on meaningful events
• User Interaction
  – User/operator error
• No Test Case
  – High degree of configurability leads to customization, so there is no typical system
• Pervasive Computing
  – Hundreds of embedded computers
• Cognitive Assistants That Learn
  – Sharing knowledge acquired through learning
• Proactive Management
  – Discover system parameters and set them appropriately

Conclusion
• Trend 1 (Shifting Error Sources) and Trend 2 (Explosive Complexity)
  – Substantial research exists
  – Opportunities to extend existing techniques to adapt to new assumptions
  – Opportunities in complexity and security
• Trend 3 (Global Volume)
  – Emerging area
  – Many research opportunities
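
Worked example: Moore's Law arithmetic. The Moore's Law slide states that annual doubling yields three orders of magnitude per decade. The short sketch below checks that arithmetic; the function name `growth_factor` and the two-year comparison are illustrative assumptions, not part of the original slides.

```python
import math


def growth_factor(years: float, doubling_period_years: float = 1.0) -> float:
    """Capacity multiplier after `years`, assuming one doubling per period."""
    return 2.0 ** (years / doubling_period_years)


# Annual doubling, as stated on the slide: 2**10 = 1024 per decade.
decade = growth_factor(10)
orders = math.log10(decade)
print(f"{decade:.0f}x per decade, about {orders:.2f} orders of magnitude")
# prints "1024x per decade, about 3.01 orders of magnitude"

# For comparison only (an assumption, not from the slides): doubling every
# two years gives 2**5 = 32x per decade.
print(f"{growth_factor(10, 2.0):.0f}x per decade with a two-year doubling period")
```

The result is sensitive to the doubling period, which is why the slide's "three orders of magnitude" figure holds specifically for annual doubling.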
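
Worked example: decreasing failure-rate Weibull. The Operational Life Monitoring slide reports that crashes follow a Weibull distribution with a decreasing failure rate. As a minimal sketch, the standard Weibull hazard function is h(t) = (k/λ)(t/λ)^(k-1), which decreases over time whenever the shape parameter k is below 1. The parameter values used here (k = 0.5, λ = 100 hours) are hypothetical and chosen only for illustration; the slides do not give fitted values.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate h(t) of a Weibull(shape, scale) distribution."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape, and scale must all be positive")
    return (shape / scale) * (t / scale) ** (shape - 1.0)


# Hypothetical parameters: shape < 1 gives a decreasing failure rate.
SHAPE, SCALE_HOURS = 0.5, 100.0

for hours in (1.0, 100.0, 10_000.0):
    rate = weibull_hazard(hours, SHAPE, SCALE_HOURS)
    print(f"t = {hours:>8.0f} h  hazard = {rate:.5f} failures/hour")
```

With a shape of exactly 1 the hazard is constant (the exponential case), and with a shape above 1 it increases, so the shape parameter alone distinguishes the three regimes.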
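
Worked example: a toy fault injection campaign. The Fault Injection slide lists the components of an experiment: fault set, workload, fault location/timing, readout, and evaluation. The sketch below maps each component onto a few lines of code, using a simulated memory with one even-parity bit per byte (parity protection is mentioned on the Evolution of Hardware Platforms slide). Everything here, including the names `protect`, `inject`, and `campaign`, is a hypothetical illustration and not a tool or method described in the slides.

```python
import random


def parity(byte: int) -> int:
    """Even-parity bit: 1 if the byte has an odd number of set bits."""
    return bin(byte).count("1") % 2


def protect(data: bytes) -> list[tuple[int, int]]:
    """Simulated memory: each byte stored alongside its parity bit."""
    return [(b, parity(b)) for b in data]


def inject(memory, location: int, bits) -> list[tuple[int, int]]:
    """Fault: flip the given bit positions of one byte; stored parity is untouched."""
    value, stored_parity = memory[location]
    for bit in bits:
        value ^= 1 << bit
    faulty = list(memory)
    faulty[location] = (value, stored_parity)
    return faulty


def workload_detects(memory) -> bool:
    """Workload plus readout: scan memory and report any parity mismatch."""
    return any(parity(value) != stored for value, stored in memory)


def campaign(data: bytes, flips_per_fault: int, trials: int, seed: int = 0) -> float:
    """Evaluation: fraction of injected faults that the parity check detects."""
    rng = random.Random(seed)                      # seeded for repeatability
    golden = protect(data)
    detected = 0
    for _ in range(trials):
        location = rng.randrange(len(golden))      # fault location
        bits = rng.sample(range(8), flips_per_fault)  # fault set: distinct bits
        if workload_detects(inject(golden, location, bits)):
            detected += 1
    return detected / trials


print("single-bit coverage:", campaign(b"dependability", 1, 200))
print("double-bit coverage:", campaign(b"dependability", 2, 200))
```

Single-bit flips always change a byte's parity, so they are always caught, while two flips in the same byte cancel out and escape detection. Real SWIFI campaigns target live memory or API parameters rather than a simulated array, but the division into fault set, location, workload, readout, and evaluation is the same.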