Download Reflections on Industry Trends and Experimental Research in

Transactions
on Dependability and Security
Siewiorek
Industry Trends and
Dependability Research
Ram Chillarege
Dan Siewiorek
Zbigniew Kalbarczyk
IEEE Transactions
on Dependability and Security
© Siewiorek 2012
Moore’s Law
• Doubling annually leads to a three-orders-of-magnitude
increase in capacity every decade
• Capacity includes number of transistors, processor
performance, bits of data, storage, communications
bandwidth
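The arithmetic behind the first bullet can be checked directly: ten annual doublings give a factor of 2^10 = 1024, roughly three orders of magnitude. A minimal Python sketch follows; the function name `growth_factor` is illustrative and does not come from the slides.

```python
import math

def growth_factor(years: float, doubling_period_years: float = 1.0) -> float:
    """Capacity multiplier after `years`, assuming one doubling per period."""
    return 2.0 ** (years / doubling_period_years)

decade = growth_factor(10)                 # ten annual doublings
orders_of_magnitude = math.log10(decade)   # how many powers of ten that is
print(decade, round(orders_of_magnitude, 2))  # prints: 1024.0 3.01
```

With a two-year doubling period, `growth_factor(10, 2.0)` gives only 32, so the "three orders per decade" figure depends on the annual-doubling assumption stated on the slide.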
Implications of Moore’s Law
• One year in the next decade Intel will be able to
produce a simple microprocessor for every ant on
earth
• Every year or two more computers are produced
than in all of previous history
– By 1970 only a few thousand computers had been produced
– Today tens of millions of computers are shipped annually
Sources, Users, Environment over four decades

Decade        1970             1980             1990                2000
System        Mainframe        Workstn.         PC                  Mobile
Faults        Hardware         Hardware +, SW   +, Net, User        +, Env, Conn, Malicious
Complexity/   Closed, HW, SW   +, Network       +, Open, 3rd Prty   +, Proprtry, COTS
Integration
People        10 K             M                10 M                100 M
Training      B.S.,            Basic,           Basic,              Buy time,
              5000 Hrs         500 Hrs          50-100 Hrs          0-10 Hrs
Technology and Users Change
Industry Trends Fueled by
Moore’s Law
1. Shifting Error Sources
Hardware failures drop, new sources dominate
2. Explosive Complexity
Systems more complex, users less tolerant
3. Global Volume
High level of integration, open systems, number
Trends, Artifacts & Processes

Trend                     Artifact                          Process
1) Shifting sources       Monitoring;                       Raise level of abstraction
                          Failure Data Analysis;
                          Fault Injection
2) Explosive complexity   Anomaly detection;                Formal Methods;
                          Trend analysis                    ODC;
                                                            Software Reliability
3) Global volume          Proactive management;             User interaction;
                          Pervasive, Cognitive;             No test case;
                          Adaptive model of normality       Malicious, nonmalicious
Trend 1 - Shifting Sources
• Industry Trend
– Failure rates drop in hardware and new sources dominate
• Artifact
– Monitoring
– Failure data analysis
– Fault injection
• Process
– Raise the level of abstraction
System Hardware Failure Rate Changes
Copyright IBM
Evolution of Hardware
Platforms
• Mainframe System MTTF over 30 years
– Up to 25% circuitry for error detection and correction,
especially for transient faults
– Information redundancy (coding), spatial and temporal
redundancy
– Evolve techniques to accommodate integration changes (e.g.
whole chip failures)
• Microprocessor MTTF over 1000 years
– ECC on L2 cache; parity on registers, L1 cache
– Rollback to known state
Emerging Failure Sources
• Software
– Shift from functionality to dependability
– High-end IBM servers have almost no cold restarts in an entire year
• Planned Outages
– Massive multi-player game company sued when game down
for two hours
• Desktop software expectations
– Entering stable stage, expectations growing
– Newer sources of failure including viruses and security holes
Evolution of Artifact Experimental Dependability Research

                              1970's        1980's       1990's              2000's
Operational life monitoring   Crash dumps   Error logs   Natural workloads   Human-computer interaction errors
Fault injection               Stuck-at      Memory       API                 Security
Operational Life Monitoring
• Evolution of monitoring and analysis
– Summary Statistics - mean time to crash
– Distribution type and distribution parameter values
– Trend and symptom analysis
• Significant findings
– Transient faults over an order of magnitude more frequent than
permanent faults
– Crash probability follows a Weibull distribution with a decreasing
failure rate
– Strong correlation between workload and failure rate
– Spatial sorting followed by temporal heuristics predicted failures on
average a week before catastrophic failure with over 90% accuracy,
using one-fourth the number of events needed by statistical techniques
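To make the Weibull finding concrete, the sketch below evaluates the Weibull hazard function h(t) = (k/λ)(t/λ)^(k-1), which decreases over time whenever the shape parameter k is below 1. The shape and scale values are illustrative assumptions, not figures from the slides.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate of a Weibull distribution at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

# Shape < 1 gives a decreasing failure rate; these values are illustrative only.
shape, scale = 0.5, 100.0
rates = [weibull_hazard(t, shape, scale) for t in (1.0, 10.0, 100.0, 1000.0)]

# Each later time point has a strictly lower hazard than the one before it.
decreasing = all(earlier > later for earlier, later in zip(rates, rates[1:]))
print(decreasing)
```

With `shape = 1.0` the same function returns a constant rate (the exponential case), which is the usual baseline a decreasing-failure-rate finding is compared against.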
Fault Injection
• Evolution of Fault Injection
– Pin level
– Software implemented fault injection (SWIFI)
• Memory bit/word, API - Raising Level of Abstraction
– Heavy Ion Radiation, Electromagnetic interference, Laser
• Significant Findings
– Fault injection experiment components: fault set, workload,
fault location/timing, readout, evaluation
– Automated API calls with unusual data cause hangs and
process aborts up to 20% of the time in commercial operating
systems
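The experiment components listed above can be illustrated with a small software-implemented fault injection sketch at the memory-word level. Everything here is an assumption made for illustration: the 9-bit word layout with one even-parity bit, the function names, and the outcome labels. Fault timing is not modeled; every fault is injected before the single read.

```python
from collections import Counter
from itertools import combinations

WORD_BITS = 9  # 8 data bits plus 1 even-parity bit (illustrative layout)

def encode(byte: int) -> int:
    """Store the parity of the data bits in bit 8 so the whole word has even parity."""
    parity = bin(byte).count("1") % 2
    return byte | (parity << 8)

def readout(word: int, golden: int) -> str:
    """Classify what the system observes when it reads the (possibly faulty) word."""
    if bin(word).count("1") % 2 != 0:
        return "detected"  # the parity check fires
    return "masked" if (word & 0xFF) == golden else "silent corruption"

def run_experiment(workload: bytes, flips_per_fault: int) -> Counter:
    """Inject every bit-flip fault of the given multiplicity at every location."""
    outcomes = Counter()
    for golden in workload:  # workload: the data words being exercised
        # Fault set and fault location: each combination of bit positions.
        for bits in combinations(range(WORD_BITS), flips_per_fault):
            faulty = encode(golden)
            for b in bits:
                faulty ^= 1 << b  # inject the bit flip
            outcomes[readout(faulty, golden)] += 1  # readout
    return outcomes  # tallies used for evaluation

print(run_experiment(b"IEEE", 1))  # prints: Counter({'detected': 36})
print(run_experiment(b"IEEE", 2))  # prints: Counter({'silent corruption': 144})
```

Under these assumptions single-bit faults are always detected, while double-bit faults always escape the parity check. Measuring the coverage of a detection mechanism in this way is the kind of result such an experiment is meant to produce.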
Trend 2 - Explosive Complexity
• Industry Trend
– Growth in system complexity, users, and shrinking user tolerance to
failures
• Artifact
– Anomaly detection
– Trend analysis
• Process
– Formal methods & model checking
– ODC
– Software reliability
– Testing
– Standards
Aspects of Explosive Growth
• Growth and Complexity
– Systems synthesized by layering components from different
vendors; component specifications usually do not exist
• Redefining Failure
– Expectation and satisfaction of the customer
• Speed of Service
– Single highest impact on image of the company; remote diagnosis,
failure prediction, and self-repair can help contain costs
• Constant Development
– Much wider variability in installed base due to software downloads
• Standards
– Provide abstractions
Software Reliability
• Evolution
– Field measurement
– Assessment and feedback during development process
– Use of formal methods - model checking for distributed, collaborating processes
– Map increasing fault model abstraction level into development process space
– Orthogonal Defect Classification (ODC) - defects categorized by attribute-value
set including Defect Type (Development) and Trigger (Testing)
• Significant Findings
– MTBF increases by a factor of four during the first 12 months after release
– Systematic approaches can reduce the cost of analysis by over an order of
magnitude
– ODC can decrease defects by almost two orders of magnitude, saving hundreds
of millions of dollars
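The attribute-value idea behind ODC can be sketched as a record with a Defect Type and a Trigger, tallied to show where defects cluster. The records and attribute values below are hypothetical samples chosen for illustration, not data from the slides or an official ODC value list.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Defect:
    defect_type: str  # attribute tied to the fix made in development
    trigger: str      # attribute describing what surfaced the defect in testing

# Hypothetical sample records; the attribute values are illustrative only.
defects = [
    Defect("assignment", "unit test"),
    Defect("interface", "system test"),
    Defect("assignment", "unit test"),
    Defect("timing", "stress workload"),
]

# Distribution over one attribute, and over the (type, trigger) pair.
by_type = Counter(d.defect_type for d in defects)
by_pair = Counter((d.defect_type, d.trigger) for d in defects)
print(by_type.most_common(1))  # prints: [('assignment', 2)]
```

Comparing such distributions across development phases is one way the classification could feed back into the process, under the assumption that the attributes are recorded consistently.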
Software Failure Rates
Field Failure Rates for Software, Measured from Service Data for Widely Distributed Software:
(A) IBM Product circa 1994 [34] (B) Microsoft Product circa 2004 [64]
Trend 3 - Global Volume
• Industry Trend
– High level of integration and emerging open systems, a source of
new dimensions in failures
• Artifact
– Proactive management
– Pervasive and cognitive computing
– Adaptive model of normal behavior
• Process
– User interaction
– No test case
– Tools to assess resilience to both malicious and nonmalicious errors
Aspects of Global Volume
• Form Factor and Mobility
– Handheld mobile devices trigger market sizes in the tens to hundreds of
millions
• Lower Training Threshold
– Previously minor irritations for expert users become major
problems for novices
• Dependability in Systems Management
– Volume of systems means previously manual techniques have to
be automated
• Market Maturity Drives Product Sophistication
– Increasing user maturity demands more capability, performance,
durability
Approaches to Manage
Global Volume
• Adaptive Model of Normal Behavior
– Reduces number of events to examine by orders of magnitude allowing
operational personnel to focus on meaningful events
• User Interaction
– User/operator error
• No Test Case
– High degree of configurability leads to customization so there is no typical system
• Pervasive Computing
– Hundreds of embedded computers
• Cognitive Assistants That Learn
– Sharing knowledge acquired through learning
• Proactive Management
– Discover system parameters and set them appropriately
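One way to picture an adaptive model of normal behavior is a running estimate of the mean and variance of a metric that flags only readings far outside the learned range. The sketch below uses an exponentially weighted mean and variance. The class name, parameters, thresholds, and sample readings are all illustrative assumptions, not the method used in the work the slides summarize.

```python
class AdaptiveNormalModel:
    """Exponentially weighted running mean/variance; flags values far from normal."""

    def __init__(self, alpha: float = 0.1, threshold: float = 3.0, warmup: int = 5):
        self.alpha, self.threshold, self.warmup = alpha, threshold, warmup
        self.mean, self.var, self.count = 0.0, 0.0, 0

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous, then adapt the model to it."""
        self.count += 1
        if self.count == 1:
            self.mean = value  # the first reading seeds the model
            return False
        deviation = value - self.mean
        std = self.var ** 0.5
        anomalous = (
            self.count > self.warmup
            and std > 0
            and abs(deviation) > self.threshold * std
        )
        # Update the estimates so the notion of "normal" tracks slow drift.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return anomalous

model = AdaptiveNormalModel()
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50, 10, 11]
flagged = [x for x in readings if model.observe(x)]
print(flagged)  # prints: [50]
```

Here twelve readings are reduced to one event for an operator to examine. After an outlier the variance estimate widens, so a real deployment would need to decide whether anomalous readings should update the model at all.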
Conclusion
• Trend 1 (Shifting Error Sources) and Trend 2
(Explosive Complexity)
– Substantial research exists
– Opportunities to extend existing techniques to adapt to new
assumptions
– Opportunities in complexity and security
• Trend 3 (Global Volume)
– Emerging area
– Many research opportunities