Download A Survey of Fault Tolerant Methodologies for FPGA’s

A Survey of Fault Tolerant Methodologies for FPGA’s
Gökhan Kabukcu
2006703357

Outline
• Introduction to FPGAs
• Device-Level Fault Tolerance
  - Methods
• Configuration-Level Fault Tolerance
  - Methods
• Comparison of Methodologies
• Conclusion

Introduction (FPGA)
• A field programmable gate array is a semiconductor device containing programmable logic components and programmable interconnects
• Consists of regular arrays of programmable logic blocks (PLBs)
• Programmable routing matrix
• The configuration of an FPGA specifies:
  - The functionality of the FPGA
  - Which PLBs will be used
  - The functionality of the PLBs
  - Which wire segments will be used for connecting PLBs

Introduction (FPGA)
• PLBs are multi-input, multi-output circuits and allow:
  - Sequential designs
  - Combinational designs
• PLBs include:
  - Look-Up Tables (LUTs, or small ROMs)
  - Multiplexers
  - Flip-flops

Introduction (FPGA)
• Look-Up Tables (LUTs):
  - 4-input, 1-output units
  - Can be used as:
    - RAM
    - ROM
    - Shift register
    - Functional unit
  - Configured by a 16-bit “INIT” value

Introduction (FPGA)
• An example:
  - y = (x1 + x2) * x3 + x4
  - Create the truth table
  - Assign “INIT” to the LUT
  - Since there are 4 inputs and 1 output, 1 LUT is enough to represent the equation
• The LUT can be put into any PLB in the FPGA

x1 x2 x3 x4 | y
 0  0  0  0 | 0
 0  0  0  1 | 1
 0  0  1  0 | 0
 0  0  1  1 | 1
 0  1  0  0 | 0
 0  1  0  1 | 1
 0  1  1  0 | 1
 0  1  1  1 | 1
 1  0  0  0 | 0
 1  0  0  1 | 1
 1  0  1  0 | 1
 1  0  1  1 | 1
 1  1  0  0 | 0
 1  1  0  1 | 1
 1  1  1  0 | 1
 1  1  1  1 | 1

Introduction (FPGA)
• Another example:
  - y = (x1 + x2) * x3 + x4
  - z = y * x5
• Create the truth tables
• Assign “INIT”s to the LUTs
• Since there are 5 inputs and 1 output, 2 LUTs are needed to represent the equations
• The LUTs can be put into any PLBs in the FPGA
• A1 and A0 are “don’t care”s

x1 x2 x3 x4 | y
 0  0  0  0 | 0
 0  0  0  1 | 1
 0  0  1  0 | 0
 0  0  1  1 | 1
 0  1  0  0 | 0
 0  1  0  1 | 1
 0  1  1  0 | 1
 0  1  1  1 | 1
 1  0  0  0 | 0
 1  0  0  1 | 1
 1  0  1  0 | 1
 1  0  1  1 | 1
 1  1  0  0 | 0
 1  1  0  1 | 1
 1  1  1  0 | 1
 1  1  1  1 | 1

y x5 | z
0  0 | 0
0  1 | 0
1  0 | 0
1  1 | 1

Introduction (FPGA)
• An example of a full design on an FPGA

Fault Tolerance
• Device-Level Fault Tolerance
  - Attempts to deal with faults at the level of the FPGA hardware
  - Selects redundant HW to replace the faulty HW
  - Costs extra HW resources
• Configuration-Level Fault Tolerance
  - Tolerates faults at the level of the FPGA configuration
  - When a circuit is placed, fault-free resources are selected
  - The status of the resources is considered each time a circuit is placed and routed
  - Costs extra reconfiguration time

Device-Level FT Methods (1)
• Extra Rows
  - One extra spare row is added
  - Selection logic is added to bypass the defective row
  - Vertical wire segments are lengthened by one row
  - Faults in one row can be tolerated
  - More than 1 spare row is needed to tolerate faults in multiple rows

Device-Level FT Methods (2)
• Reconfiguration Network
• Four architectural changes:
  - Additional routing resources (bypass lines)
  - Reconfiguration memory to store the locations of faulty resources
  - On-chip circuitry for reconfiguration routing
  - An additional column of PLBs

Device-Level FT Methods (2)
• Reconfiguration Network
  - Test and identify faulty resources
  - Create a fault map
  - Load the map into the reconfiguration memory
  - The on-board router avoids faulty resources
• The network is constructed by shifting all PLBs in the fault-containing row towards the right
• The method can tolerate 1 fault in each row if there is one spare column

Device-Level FT Methods (3)
• Self-Repairing Architecture
  - Sub-arrays of PLBs
  - Routers between sub-arrays
  - Extra columns of PLBs
• PLBs constantly test themselves
• If a fault is detected:
  - The column of the affected PLB is shifted one position to the right
  - The inter-array routers are adjusted
• The area overhead of this method is significant
• With 1 spare column and N sub-arrays stacked vertically, the method can tolerate N faults at a time

Device-Level FT Methods (4)
• Block-Structured Architecture
  - Goal: tolerate larger and denser patterns of defects efficiently
  - Blocks of PLBs
• The FPGA is configured by a loading arm
• The block at the end of the loading arm is configured

Device-Level FT Methods (4)
• Block-Structured Architecture
  - A block is selected by the loading arm and tested
  - If the test is passed, the block is configured; otherwise it is designated as faulty
  - The loading arm configures blocks one by one
  - If the arm cannot extend any further along a path, it is retracted by one block
  - Fault tolerance is provided by redundant rows and/or columns
  - The area overhead is significant

Device-Level FT Methods (5)
• Fault Tolerant Segments/Grids
• Fault Tolerant Segments:
  - Adds one track of spare segments to each wiring channel
  - If a faulty segment is found, the segment is shifted to the spare
  - A single fault can be tolerated
• Fault Tolerant Grids:
  - An entire spare routing grid is added
  - No additional elements in the routing channel, so no extra time delay

Device-Level FT Methods (6)
• SRAM Shifting
  - Based on shifting the entire circuit on the FPGA
  - PLBs are allocated in one of 2 ways:
    - King Allocation: 8 PLBs share one spare; the circuit can move in 8 directions
    - Horse Allocation: 4 PLBs share one spare; the circuit can move in 4 directions
• Testing determines the faulty cells and feeds this information to the shifter circuitry on the FPGA
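
The loading-arm procedure of the Block-Structured Architecture (select a block, test it, configure it or mark it faulty, retract by one block when stuck) behaves like a depth-first walk with backtracking. The following is only a minimal sketch under assumptions that are not in the slides: the function name `load_blocks`, the (row, column) grid coordinates, the fixed neighbor order, and the use of a known `faulty` set in place of a real block test are all hypothetical.

```python
def load_blocks(rows, cols, faulty, start=(0, 0)):
    """Walk a rows x cols grid of blocks the way a loading arm might.

    `faulty` is a set of (row, col) blocks that would fail their test
    (an assumption standing in for a real hardware test).
    Returns (blocks configured in order, faulty blocks discovered).
    """
    if start in faulty:
        return [], {start}

    configured = [start]   # blocks configured so far, in order
    found_faulty = set()   # blocks that failed their test
    path = [start]         # current extent of the arm (a stack)
    seen = {start}         # blocks already tested

    while path:
        r, c = path[-1]    # block at the end of the arm
        extended = False
        # Hypothetical neighbor order: right, down, left, up.
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            nxt = (r + dr, c + dc)
            in_grid = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if not in_grid or nxt in seen:
                continue
            seen.add(nxt)
            if nxt in faulty:          # test failed: designate as faulty
                found_faulty.add(nxt)
                continue
            configured.append(nxt)     # test passed: configure the block
            path.append(nxt)           # extend the arm onto it
            extended = True
            break
        if not extended:
            path.pop()                 # retract the arm by one block

    return configured, found_faulty


configured, bad = load_blocks(3, 3, {(1, 1)})
```

In this sketch a block that is cut off from the start by faulty blocks is never reached, so it stays unconfigured.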
Device-Level FT Methods (6)
• SRAM Shifting
  - Additional spare PLBs surround the FPGA
  - Horse Allocation is used in the figure
  - The circuit is shifted up and right
• Advantages of the method:
  - No external reconfiguration algorithm is required
  - The timing of the circuit is almost fixed
  - Any single fault can be tolerated

Configuration-Level FT Methods (1)
• Pebble Shifting
  - Find an initial circuit configuration, then move pieces away from faulty units
  - Occupied PLBs are called pebbles
  - Pair pebbles on faulty cells with unique, unused cells such that the sum of weighted Manhattan distances is minimized
  - Start shifting pebbles
  - If a pebble finds an empty cell other than the intended cell, this empty cell becomes the destination
  - There is no limit to the number of faults that can be tolerated

Configuration-Level FT Methods (1)
• Pebble Shifting
• Example:
  - 1 and 6 are on faulty cells
  - Using a minimum-cost, maximum matching algorithm, the pairings are: 1->v11 and 6->v32
  - Element 1 is shifted to its new position
  - To move 6, we shift 3, 8 and 7
  - Now all elements are on non-faulty cells and the allocation is done

Configuration-Level FT Methods (2)
• Mini-Max Grid Matching
  - Uses a grid matching algorithm to match faulty logic to empty, non-faulty locations
  - Like Pebble Shifting, uses a minimum-cost, maximum matching algorithm
  - Minimizes the maximum distance between the pairings, since the circuit’s performance is set by the critical (longest) path
  - Can tolerate faults until there are no unused cells

Configuration-Level FT Methods (3)
• Node-Covering and Cover Segments
  - When a fault is discovered, nodes are shifted along the chain (row) towards the right
  - The last PLB of a chain is reserved as a spare
  - One fault in a row can be tolerated
  - Needs no reconfiguration if local routing configurations are present

Configuration-Level FT Methods (4)
• Tiling
  - Partition the FPGA into tiles
  - Precompiled configurations of the tiles are stored in memory
  - Each tile contains system function, some spare logic and interconnect resources
  - When a logic fault occurs in a tile, the configuration of the tile is replaced by a configuration that does not use the faulty resources
  - Many logic faults can be tolerated
  - Local interconnect faults can be tolerated, but global ones cannot

Configuration-Level FT Methods (5)
• Cluster-Based
  - Intracluster tolerance in a PLB
  - Basic Logic Elements (BLEs or LUTs)
  - For simple LUT faults, the preferred solution is to use another LUT in the PLB
  - Instead of changing the PLB, try to find a solution in the same PLB
  - In the example, T is faulty and the 4th PLB is used instead of the 2nd PLB

Configuration-Level FT Methods (6)
• Column-Based
  - Treats the design as a set of functional units; each unit is a column
  - Like Tiling, uses precompiled configurations, but at lower cost
  - At least one column should be spare
  - If there is a faulty cell in a column, the column is shifted toward the spare column
  - The method can tolerate m faulty columns, where m is the number of columns not occupied by system functions

Comparison of Methodologies (1)
• Device-Level (DL) methods need extra HW and have a higher area cost
• DL methods use one initial reconfiguration and incur no extra reconfiguration cost
• Configuration-Level (CL) methods need more than one reconfiguration, which sometimes results in a high time cost
• CL methods need no extra HW and incur no additional area cost

Comparison of Methodologies (2)
• DL methods are less flexible, and therefore less able to improve reliability
• CL methods usually tolerate more faults than DL methods
• The performance impact of fault tolerance is smaller for DL methods than for CL methods

Conclusion
• No single fault tolerance methodology is better than the others in all cases
• DL techniques have less impact on performance, but are not flexible
• CL methods tolerate more faults, but have more impact on performance
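
The pairing step of Pebble Shifting (matching each pebble on a faulty cell to a unique unused cell so that the summed weighted Manhattan distance is minimal) can be illustrated with a small brute-force search. This is a sketch only: the function name `pair_pebbles`, the (row, column) coordinates, and the default weight of 1 are assumptions, the cell positions below are made up and do not reproduce the slide's figure, and exhaustive search stands in for a proper minimum-cost matching algorithm.

```python
from itertools import permutations


def pair_pebbles(faulty_pebbles, free_cells, weights=None):
    """Pair displaced pebbles with free cells at minimum total cost.

    faulty_pebbles: dict mapping pebble name -> (row, col) of its faulty cell
    free_cells:     list of unused, non-faulty (row, col) cells
    weights:        optional dict of per-pebble weights (default 1)
    Returns (pairing dict, total cost), or (None, None) if there are
    fewer free cells than displaced pebbles.
    """
    weights = weights or {}
    best_pairing, best_cost = None, None
    # Try every way of giving each pebble its own distinct free cell.
    for targets in permutations(free_cells, len(faulty_pebbles)):
        cost = 0
        for (name, (r, c)), (tr, tc) in zip(faulty_pebbles.items(), targets):
            # Weighted Manhattan distance from the faulty cell to the target.
            cost += weights.get(name, 1) * (abs(r - tr) + abs(c - tc))
        if best_cost is None or cost < best_cost:
            best_pairing = dict(zip(faulty_pebbles, targets))
            best_cost = cost
    return best_pairing, best_cost


# Hypothetical layout: pebbles "1" and "6" sit on faulty cells.
pairing, cost = pair_pebbles(
    {"1": (0, 0), "6": (2, 1)},
    [(0, 1), (2, 2), (1, 2)],
)
```

The sketch covers only the pairing; the subsequent shifting of intermediate pebbles toward each destination is not modeled. Exhaustive search grows factorially with the number of displaced pebbles, so it is suitable for illustration only.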

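The LUT examples from the introduction can be checked by computing the 16-bit “INIT” value directly from the equations. The sketch below assumes a bit ordering that the slides do not state: bit `addr` of INIT holds the output for input address `addr`, with the first input as the most significant address bit. The helper name `lut_init` and the choice to feed y and x5 into the two upper inputs of the second LUT (leaving A1 and A0 as don't-cares) are also assumptions.

```python
def lut_init(func, n_inputs=4):
    """Build the INIT value of an n-input LUT from a Boolean function.

    Assumed convention: bit `addr` of the result is the LUT output for
    input address `addr`; the first argument of `func` is the most
    significant address bit.
    """
    init = 0
    for addr in range(2 ** n_inputs):
        # Split the address into individual input bits, MSB first.
        bits = [(addr >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        if func(*bits):
            init |= 1 << addr
    return init


def y(x1, x2, x3, x4):
    # y = (x1 + x2) * x3 + x4, with + as OR and * as AND
    return ((x1 | x2) & x3) | x4


def z(y_in, x5, a1, a0):
    # z = y * x5; the two remaining LUT inputs are don't-cares
    return y_in & x5


init_y = lut_init(y)
init_z = lut_init(z)
print(f"INIT(y) = 0x{init_y:04X}, INIT(z) = 0x{init_z:04X}")
```

Under these assumptions the first LUT's INIT reads off the y column of the truth table from the last row (bit 15) down to the first row (bit 0).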