Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection
Joel Seely
Technical Marketing Manager
Military & Aerospace Business Unit
Copyright © 2004 Altera Corporation

Single Event Upset (SEU) Overview for SRAM-Based FPGAs

Definitions
- SEU: Single Event Upset
  - Unwanted Change in State of a Latch or a Memory Cell
- SER: Soft Error Rate
  - SEU Rate
- SEFI: Single Event Functional Interrupt
  - Functional Failure Caused by SEUs
  - Not All SEUs Are SEFIs
  - Generally Takes 5-10 SEUs to Cause a SEFI

Circuit Components of SRAM-Based FPGAs
- I/O Registers & I/O Configuration
  - No Issue, Very Robust Registers, < 1 FIT
- Logic Registers (LEs)
  - No Issues, Very Robust Registers, SEU Rate < Hard Error Rate
- User Memory
  - On-Chip Memories Are Typically "By 9" for Parity Checking
  - IP Available for ECC
- Configuration RAM (CRAM) for LUTs & Routing
  - Area of Focus

Upset of a CRAM Cell
[Figure: 6-transistor CRAM cell (Vcc, Vss, Add, Data In, Data Out, Clear) with voltage-vs.-time waveforms, and a plot of noise current for 10 fC collected charge: current (µA, 0-200) vs. time (ps, 0-200)]

SEU Induced Failure Rate*
Device   LE Count   SEU Rate (FIT)   SEFI Rate (FIT)   MTBF** (Years)
EP1C6    6K         250              60                1,900
EP1C20   20K        730              180               634
EP1S25   26K        1,950            400               285
EP1S80   79K        6,000            1,200             95
* Data at Sea Level
** MTBF: Mean Time Between Functional Interrupts

Number of CRAM Bit Upsets for Each Occurrence of Functional Upset
[Figure: normal-probability plots for the Altera EP1S25, neutron SER (WNR data) and alpha SER; x-axis: # of CRAM bit upsets for each event of functional upset (0-50); y-axis: standard deviation (-3 to 3) with cumulative percentiles (0.5% to 99.5%); medians of about 6 and 5 upsets]

Addressing System-Level Issues

SER Improvements/Mitigation
- Chip Design Enhancements
- New Materials & Process Enhancements
- Larger CRAM Structure
- Increase in Capacitance on Critical Node
- Smaller Process => Smaller Die => Lower SEU Probability
- Built-In Error Detection/Correction Circuitry

SER per SRAM Bit Trend
[Figure: SER per SRAM Mbit (100 to 1,000 FITs) vs. process technology and year, from 0.5 µm (1995) to 0.13 µm (2002), with a 90 nm projection]

System-Level Improvements (Mitigation)
- ECC for User Memory
- Use Detection/Correction Feature
- Triple Modular Redundancy (TMR)
- To Achieve Lower Error Rate & Less Downtime, Migrate to Structured ASIC

Soft Error Detection Methods: Configuration RAM Readout
- Read Out Full Bitstream
- Compare with Stored Bitstream
- Can Determine Where in the Configuration the Error Occurred
- Caveat: Security Issues with Reading Out the Bitstream
[Diagram: bitstream read out of the FPGA and compared with stored CRAM data by a microprocessor or CPLD: "Same or Different?"]

Soft Error Detection Methods: On-Chip SEU Detection
- Dedicated Comparison Circuitry, e.g., a CRC Engine Comparing a Stored CRC with One Calculated from Configuration RAM
- Detection Circuitry Running Continuously
- Error Detection Rate Varies with Hardware Implementation, Number of CRAM Bits & Input Clock Frequency
- Error Signal Available Internally or Externally
- Caveat: Cannot Determine Where in the Configuration the Error Occurred
[Diagram: inside the FPGA, the stored value and the computed value are compared (=) and the result is sent to the core]

On-Chip Detection Example: Dedicated CRC Circuit
- Configuration RAM Verification Capability
  - 32-Bit Cyclic Redundancy Code Check
  - Verified Against Internally Stored Value
- Runs in the Background Without Impacting Device Performance
- Close to Real-Time Detection
  - Variable Clock Frequency
  - Detection Time Depends on Number of CRAM Bits
- Multi-Event Detection
  - Up to 3 Bits for 32-Bit CRC
- Result Output to Either Core or Pin
- Use with Either Internal or External Hardware for Error Correction

Correction Methods
- FPGA Detection, System-Level Correction
  - Lower Total Cost
  - Downtime Is Limited & Manageable
  - Used in Non-Critical Applications
- Triple Modular Redundancy
  - Two Flavors: All On-Chip in FPGA, or Separate Chips & Voter
  - Correction Can Be Real-Time
  - Used in Critical Applications

Single System Detection & Correction
- Step One: Detect the Soft Error
  - 75% of Reported Errors Are "Don't Care" Errors
- Step Two: Alert the System
- Step Three: Fix the Error
  - In Some Cases, Re-Program the FPGA
  - In Some Cases, Reboot the Sub-System
  - In Some Cases, Reboot the System
- Need to Focus on System "Downtime"
  - Each System Has Unique Requirements
  - Re-Programming FPGA Takes < 250 ms
  - Rebooting Time Varies & Can Be Fast "by Design"

TMR Method 1
[Diagram: FPGA Hardware 1, FPGA Hardware 2 and FPGA Hardware 3 feeding a voting FPGA or CPLD]
- Identical Hardware in FPGAs
- Use Voter Implemented in FPGA or CPLD
- Utilize Either Hardware Output or CRC Error Pin
- Voter Also Used to Signal Reconfiguration on Difference or Error

TMR Method 2
[Diagram: Hardware 1, Hardware 2, Hardware 3 and a voting circuit inside a single FPGA]
- Multiple Instantiations of Hardware in Single FPGA
- For Low-Rate SEUs
- SEU Events May Occur Much More Frequently than Functional Errors (De-Rating)
- Voter Signals Reconfiguration of FPGA
- FPGA Must Be Reconfigured

De-Rating Methodology
- Only a Fraction of Configuration Bits Are Actually Programmed
  - e.g., Using Only Two Inputs of a 4-Input LUT Leaves 75% of the LUT as "Don't Care"
  - Only About 20% of Routing Is Used
  - Depends on Utilization & Application
- Some Un-Programmed Bits Still Matter
  - Flipping Them Could Change the Function of the Device
- Extensive Experimentation Shows That 1/8 to 1/3 of the Bits Matter

Structured ASIC: Ultimate SEU Protection
- PLD Architecture with ASIC Routing
[Diagram: FPGA design migrated to a structured ASIC]
- No Configuration Memory = Estimated SER Is Below the Hard Failure Rate for the Device

Summary
- SEU Is a Well-Understood Phenomenon
- Many Chip-Level Enhancements Mitigate SEUs
  - Process
  - Design
  - Manufacturing Techniques
- Easy Detection of SEU Events Is Key
- After Detection, Other Methods Must Be Employed to Deal with the Event
- Critical Nature of Application Determines Level of SEU Response
- Structured ASICs from FPGA Designs Offer a Much More Robust Solution Due to Removal of All CRAM
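The MTBF column of the SEU Induced Failure Rate table follows directly from the SEFI rate, because 1 FIT is one failure per 10^9 device-hours. Below is a minimal Python sketch of that conversion. It assumes a 365-day (8,760-hour) year, since the slides do not state which convention they used, and `fit_to_mtbf_years` is a hypothetical helper name.

```python
"""Convert a FIT rate (failures per 10^9 device-hours) into MTBF in years.

Assumption: 1 year = 8,760 hours (365 days); the slides do not state
which convention they used.
"""

HOURS_PER_YEAR = 24 * 365  # 8,760 h, assumed convention
FIT_HOURS = 1e9            # 1 FIT = 1 failure per 10^9 device-hours


def fit_to_mtbf_years(fit: float) -> float:
    """Mean time between failures, in years, for a given FIT rate."""
    if fit <= 0:
        raise ValueError("FIT rate must be positive")
    return FIT_HOURS / fit / HOURS_PER_YEAR


# SEFI rates (FIT) from the SEU Induced Failure Rate table (sea level).
SEFI_FIT = {"EP1C6": 60, "EP1C20": 180, "EP1S25": 400, "EP1S80": 1200}

if __name__ == "__main__":
    for device, fit in SEFI_FIT.items():
        print(f"{device}: {fit_to_mtbf_years(fit):,.0f} years")
```

With these inputs the sketch prints about 1,903, 634, 285 and 95 years. That matches the table's MTBF column to within rounding.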
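The two Soft Error Detection Methods differ in what they report. The on-chip CRC check says only whether the configuration changed, while a full configuration RAM readout compare can also say where the change occurred. Below is a minimal Python sketch of both, under these assumptions:

- The CRAM is modeled as a byte string.
- `zlib.crc32` (the IEEE 802.3 CRC-32) stands in for the device's 32-bit CRC engine, whose polynomial the slides do not give.
- `crc_check`, `readout_compare` and `flip_bit` are hypothetical helper names.

```python
"""Two soft-error detection strategies for a configuration-RAM (CRAM) image.

Sketch only: the CRAM is modeled as a Python bytes object, and zlib.crc32
(IEEE 802.3 CRC-32) stands in for the device's dedicated CRC engine,
whose actual polynomial is not given on the slides.
"""

import zlib


def crc_check(cram: bytes, stored_crc: int) -> bool:
    """On-chip style check: True if the computed CRC matches the stored one.

    Detects an upset but cannot say where in the CRAM it occurred.
    """
    return zlib.crc32(cram) == stored_crc


def readout_compare(cram: bytes, golden: bytes) -> list[int]:
    """Readout style check: return the bit offsets that differ from the
    stored (golden) bitstream, which locates each upset."""
    if len(cram) != len(golden):
        raise ValueError("images must be the same length")
    flipped = []
    for byte_index, (a, b) in enumerate(zip(cram, golden)):
        diff = a ^ b
        for bit in range(8):
            if diff & (1 << bit):
                flipped.append(byte_index * 8 + bit)
    return flipped


def flip_bit(image: bytes, bit_offset: int) -> bytes:
    """Return a copy of image with one bit inverted (simulated SEU)."""
    data = bytearray(image)
    data[bit_offset // 8] ^= 1 << (bit_offset % 8)
    return bytes(data)


if __name__ == "__main__":
    golden = bytes(range(256)) * 16          # hypothetical 4 KiB CRAM image
    stored_crc = zlib.crc32(golden)          # computed once at configuration
    upset = flip_bit(golden, 12345)          # simulate a single-event upset

    print(crc_check(golden, stored_crc))     # True: no error
    print(crc_check(upset, stored_crc))      # False: error detected
    print(readout_compare(upset, golden))    # [12345]: error located
```

The trade-off mirrors the slides. The CRC needs only 32 stored bits and never exposes the bitstream. The readout compare needs the full golden image, and reading it out raises the security caveat noted above.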
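In both TMR methods, the voter takes a 2-of-3 majority of the replica outputs and flags any disagreement so the system can trigger reconfiguration. Below is a minimal Python sketch of that logic. It assumes each replica's output is an integer word. `vote` and `VoteResult` are hypothetical names, not a vendor API, and a real voter would be built in FPGA or CPLD logic rather than software.

```python
"""Bitwise triple-modular-redundancy (TMR) voter with a mismatch flag.

Sketch under assumptions: each replica's output is an integer word, and
on any disagreement the voter requests reconfiguration, mirroring the
slides' "voter signals reconfiguration" step. vote() and VoteResult are
illustrative names, not a vendor API.
"""

from typing import NamedTuple


class VoteResult(NamedTuple):
    value: int               # per-bit 2-of-3 majority of the replica outputs
    reconfigure: bool        # True if any replica disagreed with the majority
    faulty: tuple[int, ...]  # indices (0-2) of replicas that disagreed


def vote(a: int, b: int, c: int) -> VoteResult:
    majority = (a & b) | (a & c) | (b & c)  # each bit set if set in >= 2 inputs
    faulty = tuple(i for i, v in enumerate((a, b, c)) if v != majority)
    return VoteResult(majority, bool(faulty), faulty)


if __name__ == "__main__":
    print(vote(0b1010, 0b1010, 0b1010))  # all agree: no reconfiguration
    print(vote(0b1010, 0b1110, 0b1010))  # replica 1 upset: outvoted, flagged
```

Because the majority is taken per bit, a single upset replica is outvoted and its index is still reported. This corresponds to the slides' use of the voter to signal reconfiguration on a difference or error.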
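The 75% "don't care" figure on the De-Rating Methodology slide can be checked by counting LUT entries. Below is a minimal Python sketch. It assumes the unused inputs of the 4-input LUT are held at a constant 0, so only entries with 0 on those inputs can ever be addressed. `addressable_lut_bits` is a hypothetical helper.

```python
"""Estimate the fraction of 4-input LUT configuration bits that matter
when only some inputs are used.

Assumption: unused LUT inputs are tied to a constant 0, so only the LUT
entries whose address has 0 on every unused input can ever be read.
"""

from itertools import product


def addressable_lut_bits(num_inputs: int, used_inputs: set[int]) -> list[int]:
    """Indices of LUT entries that can be selected when the inputs not in
    used_inputs are held at 0."""
    addressable = []
    for bits in product((0, 1), repeat=num_inputs):
        if all(bits[i] == 0 for i in range(num_inputs) if i not in used_inputs):
            # bits[0] is treated as the least significant address bit
            addressable.append(sum(b << i for i, b in enumerate(bits)))
    return sorted(addressable)


if __name__ == "__main__":
    used = addressable_lut_bits(4, {0, 1})
    dont_care = 1 - len(used) / 2**4
    print(used)                            # [0, 1, 2, 3]
    print(f"{dont_care:.0%} don't care")   # 75% don't care
```

Only 4 of the 16 entries are reachable. An upset in any of the other 12 bits therefore cannot change the LUT output while the unused inputs stay at 0. Routing bits that could reconnect those inputs are the "un-programmed bits that still matter" noted on the slide.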