Checking Computation of Numerical Functions by the Use of Functional Equations
REC 2006 NSF Workshop on Reliable Engineering Computing
F. Vainstein and C. Jones

Presentation Summary
• Background
  - Fault tolerance
  - Computing
  - Numerical functions
• Theory
• Finding checking polynomials
  - The general method
  - A program developed by this research
  - Some examples
  - Considerations for deployment
• Future directions

Fault Tolerance
Grace in response to the unexpected
• Withstands failures
• Exhibits desirable behavior
  - Does not endanger life (military, transportation, medical)
  - Preserves scientific investment (space, supercomputing)
  - Meets consumer expectations

Fault Tolerance Can Be Critical
• Military: Global Hawk
• Science: Gravity Probe B
• Exploration: Mars Opportunity Rover
• Civilian: Airbus A380

Methods for Fault Tolerance
• Modular redundancy
  - Back up systems in the event the primary unit fails
• Replication with voting
  - Duplicate function blocks and compare; the "majority" wins
• Error-correcting codes
  - Reed-Solomon, parity checks, …
• Algorithm-based fault tolerance (ABFT)
  - Encodes data and augments the algorithm to detect errors

A Complex System: The Space Shuttle
• Total number of parts > 600,000
• Total weight = 4,500,000 pounds
• Cost to move one pound of cargo = $20,000
• Budget = $3.3 billion / year

Modular Redundancy: Space Shuttle
Space shuttle avionics, from Sklaroff, Redundancy Management Techniques for Space Shuttle Computers, IBM Journal of Research and Development, 1976.

Replication With Voting: Space Shuttle

Complex System: The Microprocessor
Intel Pentium 4 Prescott Core
• Number of transistors > 125 million
• Transistor size = 90 nm
• Pipeline = 31 stages
• Development budget = $4.2 billion / year

"Never in the history of mankind has it been possible to produce so many wrong answers so quickly."
Carl-Erik Froeberg

What Does a Microprocessor Do?
• ALU: The arithmetic logic unit performs math and logic functions.
• Math coprocessors were big business for Intel and others in the 1980s.
  Today, most processors incorporate a math coprocessor or emulator for numerical calculations.
• Move data from one memory location to another
• Make decisions and jump to a new set of instructions

IBM FPU Core
Scientific codes typically spend much of their time in common numerical subroutines - about 70% of a phase retrieval application, for example, is spent in the Fast Fourier Transform alone.
M. Turmon, Annual Report for FY 2001, Final Report, Algorithm-Based Fault Tolerance, NASA-JPL, Remote Exploration and Experimentation Project.
Image legend:
• Dark blue: Interface, Decode and Issue
• Pink: Pipe Management and Data Forwarding
• Yellow: Arithmetic Pipe
• Aqua: Load/Store Pipe

Numerical Functions
Numbers from numbers
• Absolute value; minimum; maximum; round to next integer; return the fractional part of a value
• Clip in a saturation fashion; wrapping for integers; mod; greatest common divisor; next positive power of 2
• Log; cosine; hyperbolic sine; arcsine; SINC function; Gaussian; degrees to radians
• Fast Fourier Transform (FFT); numerical differentiation; Kalman filtering; linear interpolation; root finding

Numerical Functions in Action: 1
IMAGE PROCESSING
The FIDO Mars Exploration Rover (MER) relies on detailed panoramic views in its operation for near real-time tasks:
• Determination of exact location
• Navigation
• Science target identification
• Mapping
WEATHER MODELING
Roe, K., et al., High Resolution Weather Monitoring for Improved Fire Management, 2001, Maui HPCC
• Real-time analysis of environmental information for prediction of fire behavior

Numerical Functions in Action: 2
NON-LINEAR CONTROL SYSTEMS
Brennan, S., Integrated Chassis Control for Vehicles, 2000
SCIENTIFIC SUPERCOMPUTING
U.
Landman, et al., Large-scale classical molecular dynamics, 2001, Georgia Tech

Background Summary
• Computing is at the heart of most modern systems
• Fault tolerance is a concern, especially for mission- and safety-critical systems
• The computation of numerical functions is a critical area of computing

Notable Work in Numerical Result Checking
M. Blum, R. Rubinfeld
  - Self-Testing/Correcting with Applications to Numerical Problems, 1990
M. Blum, H. Wasserman
  - Reflections on the Pentium Division Bug, 1995
  - Software Reliability Via Runtime Result Checking, 1997
• Promoted numerical checking
• A motivation for result checking
These works used functional equations, but no general method existed.

An Algebraic Method for Fault Tolerance
1991 - Feodor Vainstein, Georgia Tech
Error Detection and Correction in Numerical Computations by Algebraic Methods
• Developed a general theory for generating functional equations.
• Showed that many numerical functions have functional equations and that computations of such functions can be verified by checking polynomials, a novel technique based upon algebraic concepts such as the transcendence degree of field extensions.

Contribution of This Work: A Method for Practical Numerical Checking
• Developed a software method for finding checking polynomials.
• Treated the case of functions that are not polynomially checkable.
• User-friendly program for hardware/software engineering
• Design considerations

Polynomial Numerical Checking Example: 1
Polynomial Numerical Checking Example: 2
Polynomial Numerical Checking Example: 3
Polynomial Numerical Checking Example: 4

Algebra*: Fields
*S.
Lang, Algebra, Addison-Wesley, 1965

Algebra: Algebraically Dependent
Algebra: Transcendence Degree of Field Extension
Algebra: Algebraically Closed and Algebraic Closure
Algebra: Linear Independence

Theory: Polynomially Checkable
Theorem:
Theory: Example and Generality
Theory: Linearly Checkable

Theory: Other Cases
We also considered
• Functions over various fields
• Polynomially checkable (PC) and linearly checkable (LC) functions of several variables
• Partially polynomially checkable functions
The focus of the present work is a practical method for determining approximate checking polynomials for real-valued functions of a single variable, whether PC or non-PC.

Least Squares Estimation
The least squares estimation technique is used to estimate parameters and to fit data. Because some functions are not PC, the approach is generalized to find approximate checking polynomials for non-PC functions. There are other methods, but this one was chosen to
• Add robustness
• Develop a practical process
• Treat all polynomially checkable functions

Application of Least Squares Estimation: 1
The problem of finding a checking polynomial can be reduced to the following optimization problem.
Let
\[ \delta(\beta_0, \alpha_1, \ldots, \alpha_k) = \int_A^B \left[ f(x) + \alpha_1 f(x + a_1) + \cdots + \alpha_k f(x + a_k) + \beta_0 \right]^2 dx \]
and find the coefficients that minimize \delta.

Application of Least Squares Estimation: 2
Application of Least Squares Estimation: 3
Application of Least Squares Estimation: 4

Software Implementation of Least Squares Estimation: 1
Solve the matrix equation AX = B.

Software Implementation of Least Squares Estimation: 2
The coefficients of the checking polynomial are then in the vector X. Those values can be used to find the value of the delta function. The deviation shows how good the approximation is.

The Matlab Function
• Solves the least squares estimation problem
• Finds the delta function value for a range of k
• Returns the checking polynomial coefficients for the best (smallest error) delta function
• Plots the error over the function domain for the best delta
• Plots the deviation for a range of k
• Simulink with DSP Builder generates VHDL and deploys to an Altera FPGA (Xilinx is similar)

Function Input, Function Output
\[ \delta(\beta_0, \alpha_1, \ldots, \alpha_k) = \int_A^B \left[ f(x) + \alpha_1 f(x + a_1) + \cdots + \alpha_k f(x + a_k) + \beta_0 \right]^2 dx \]

Example: SINE Function Output
Example: SINE Function Plots
The sine function is linearly checkable (LC).

The Logarithm Function: Output
The Logarithm Function: Plots
The Logarithm Function: k = [1…40]

Checking Polynomials: Simple Functions
Checking Polynomials: Compound Functions

Why Matlab
Matlab (MATrix LABoratory)
• Matrix-oriented programming environment
• Code can compile to C/C++
• Built-in routines for data analysis and visualization
• GUI/Web publishing support
• A popular environment for technical computing
http://www.gtrep.gatech.edu/undergradlabs/labman/CheckingPolynomial

Deployment: Considerations
• Hardware or software
• Pipeline or parallel
• If a non-LC function returns a high-order checking polynomial
  - Break up the function domain
  - Generate a separate checking polynomial for each sub-interval

Simulink Design
[k,delta,alphas,betao,stepsize,A,B]=LSEFUNRUN('exp(x).*sin(x)',10^-4,0,3.1415,(1:2))
Outputs shown: k, delta, alphas, beta
• We show a Simulink example
• Extension of Matlab
• Modeling, simulating
• GUI environment
• Toolboxes for DSP, etc.
• Toolboxes for targeting FPGA devices

Simulink Implementation of Checking Algorithm
\[ f(x) = e^{x} \sin(x) \]

Space Complexity
For a ROM implementation that stores b-bit numbers and has m address lines.

Error Coverage
Error Coverage Example
This is the percentage of all errors covered.

Design Flow
1. Define the numerical function
2. Define the domain of the function, based on system bit size, accuracy of instrumentation, etc.
3. Use the LSE function to find the checking polynomial coefficients
4. Based on the LSE results, choose an appropriate number of shifted functions
5. Choose a hardware or software implementation, parallel or pipeline

Target Markets
Numerically intense, safety, or mission critical
• Supercomputing
• Moletronics and nanosystems
• Space or remote systems
• Control systems using COTS components

Example: NASA Seeks COTS Remote Supercomputing

Space Radiation
S. Kayali, Space Radiation Effects on Microelectronics, Radiation Effects Group, JPL, Section 514.

Traditional Fault Tolerant Devices are Costly in Terms of Design Space, Time, and Money
Perry COTS initiative
• Buy more commercial products
• Use industrial specifications
• Reduce costs
William J. Perry, Specifications and Standards - A New Way of Doing Business, Memorandum, 1994
Image: Radiation-Hardened Half-Micron CMOS 16K SRAM, Sandia National Laboratories

Moletronics, CMOL, and Nanodevices Will Require Minimizing Fault Tolerant Strategies
"Low yield and structural defects will be considerable (in moletronic devices). Hence, the target architecture has to be inherently fault-tolerant/configurable. If you want to compensate for the errors then you have to use error-correcting codes and fault-tolerant circuits."
V.
Roychowdhury, A Quest for Information, Frontiers in Nanocomputing Seminar, 2004
Image: Single molecular implementation of single-electron transistor, K. Likharev, Electronics Below 10nm, 2003

Demands for Numerical Fault Tolerant Computing
Drivers of numerical fault tolerance:
• Shrinking devices
• Remote, autonomous systems
• COTS (cost)

Numerical Checking Only Part of the Solution: Complex Systems Require Multiple Fault Tolerant Strategies

Conclusions and Future Directions
• Remaining tasks
  - Tame functional discontinuities
  - Deploy to hardware/software testbed
  - Investigate impact of single and multiple checking polynomial strategies
  - Investigate best interface strategies
• Develop Numerical Checking Toolbox
  - Functions of several variables
  - Partially polynomially checkable functions

Thank You!
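
Supplementary Sketch: Least Squares Checking Coefficients for Sine
The least squares step described in the slides can be illustrated with a small, self-contained Python sketch. It is not the authors' Matlab program (LSEFUNRUN); the function names below are hypothetical, and several choices are assumptions made only for illustration: the residual is taken as f(x) + alpha_1 f(x + a_1) + ... + alpha_k f(x + a_k) + beta_0, the shifts are equally spaced, and the integral over [A, B] is approximated with a midpoint rule. Under those assumptions the sketch builds the normal equations (a matrix equation of the form AX = B), solves them, and uses the resulting coefficients as a run-time check on sine values.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_checking_coefficients(f, lo, hi, shifts, samples=2000):
    """Hypothetical helper: return (beta0, alphas, delta).

    Minimizes a midpoint-rule approximation of the integral over [lo, hi] of
    [f(x) + sum_i alpha_i * f(x + shifts[i]) + beta0]^2 (assumed sign convention).
    """
    dx = (hi - lo) / samples
    xs = [lo + (i + 0.5) * dx for i in range(samples)]
    # Basis functions: the constant 1 (for beta0), then the shifted copies of f.
    basis = [[1.0] * samples] + [[f(x + a) for x in xs] for a in shifts]
    target = [f(x) for x in xs]
    n = len(basis)
    # Normal equations: gram * coeffs = rhs.
    gram = [[sum(p * q for p, q in zip(basis[i], basis[j])) * dx
             for j in range(n)] for i in range(n)]
    rhs = [-sum(p * t for p, t in zip(basis[i], target)) * dx for i in range(n)]
    coeffs = solve_linear(gram, rhs)
    residuals = [t + sum(c * basis[j][i] for j, c in enumerate(coeffs))
                 for i, t in enumerate(target)]
    delta = sum(r * r for r in residuals) * dx
    return coeffs[0], coeffs[1:], delta


def passes_check(f_values, beta0, alphas, tol):
    """f_values = [f(x), f(x + a_1), ..., f(x + a_k)] from the unit under test."""
    value = f_values[0] + sum(a * v for a, v in zip(alphas, f_values[1:])) + beta0
    return abs(value) <= tol


# Sine with two equally spaced shifts (h is an arbitrary illustrative step).
h = 0.5
beta0, alphas, delta = fit_checking_coefficients(math.sin, 0.0, 3.1415, [h, 2 * h])

good = [math.sin(1.0), math.sin(1.0 + h), math.sin(1.0 + 2 * h)]
bad = [good[0] + 1e-3, good[1], good[2]]  # inject an error into f(x)
ok_good = passes_check(good, beta0, alphas, 1e-6)
ok_bad = passes_check(bad, beta0, alphas, 1e-6)
```

For sine, the identity sin(x) + sin(x + 2h) = 2 cos(h) sin(x + h) means an exact linear check exists, so the fitted coefficients should come out close to alpha_1 = -2 cos(h), alpha_2 = 1, beta_0 = 0 with delta near zero, and the injected error should fail the check. For a function that is not linearly checkable, delta would stay above zero and the tolerance would have to be chosen with that deviation in mind.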