A Survey of Fault Tolerant
Methodologies for FPGAs
Gökhan Kabukcu
2006703357
Outline
 Introduction to FPGAs
 Device-Level Fault Tolerance Methods
 Configuration-Level Fault Tolerance Methods
 Comparison of Methodologies
 Conclusion
Introduction (FPGA)
 A field programmable gate array (FPGA) is a semiconductor device containing programmable logic components and programmable interconnects
 Consists of regular arrays of programmable logic blocks (PLBs)
 Programmable routing matrix
 Configuration of an FPGA includes:
 The functionality of the FPGA
 Which PLBs will be used
 The functionality of the PLBs
 Which wire segments will be used for connecting PLBs
Introduction (FPGA)
 PLBs are multi-input, multi-output circuits and allow:
 Sequential Designs
 Combinational Designs
 PLBs include:
 Look Up Tables (LUTs or small ROMs)
 Multiplexers
 Flip-Flops
Introduction (FPGA)
 Look Up Tables (LUTs):
 4 input-1 output units
 Can be used as:
 RAM
 ROM
 Shift Register
 Functional Unit
 Configured by a 16-bit “INIT” value
Introduction (FPGA)
 An Example:
 y=(x1+x2)*x3+x4
 Create truth table
 Assign “INIT” to the LUT
 Since there are 4 inputs and 1 output, 1 LUT is enough to represent the equation
 The LUT can be put into any PLB in the FPGA
x1 x2 x3 x4 | y
 0  0  0  0 | 0
 0  0  0  1 | 1
 0  0  1  0 | 0
 0  0  1  1 | 1
 0  1  0  0 | 0
 0  1  0  1 | 1
 0  1  1  0 | 1
 0  1  1  1 | 1
 1  0  0  0 | 0
 1  0  0  1 | 1
 1  0  1  0 | 1
 1  0  1  1 | 1
 1  1  0  0 | 0
 1  1  0  1 | 1
 1  1  1  0 | 1
 1  1  1  1 | 1
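The "create truth table, assign INIT" step can be sketched in a few lines of Python. The bit ordering is an assumption that the slides do not state: bit i of INIT holds the output for the input row whose binary value is i, with x1 as the most significant address bit. The helper name `lut4_init` is hypothetical.

```python
def lut4_init(func):
    """Pack the truth table of a 4-input function into a 16-bit INIT word.

    Assumed ordering: bit i holds the output for input row i, where the row
    number is read as the binary value x1 x2 x3 x4 (x1 most significant).
    """
    init = 0
    for row in range(16):
        x1, x2, x3, x4 = (row >> 3) & 1, (row >> 2) & 1, (row >> 1) & 1, row & 1
        if func(x1, x2, x3, x4):
            init |= 1 << row
    return init


def y(x1, x2, x3, x4):
    # In the slide's equation '+' is logical OR and '*' is logical AND
    return ((x1 | x2) & x3) | x4


init_y = lut4_init(y)
print(f"INIT = 0x{init_y:04X}")
```

Under this ordering the truth table above packs to 0xEEEA. A different pin-to-address assignment would give a different word for the same function.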
Introduction (FPGA)
 Another Example:
 y=(x1+x2)*x3+x4
 z=y*x5
 Create truth tables
 Assign “INIT”s to LUTs
 Since there are 5 inputs and 1 output, 2 LUTs are needed to represent the equation
 The LUTs can be put into any PLBs in the FPGA
 In the second LUT, the unused inputs A1 and A0 are “don’t care”s
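Truth table for y (first LUT):
x1 x2 x3 x4 | y
 0  0  0  0 | 0
 0  0  0  1 | 1
 0  0  1  0 | 0
 0  0  1  1 | 1
 0  1  0  0 | 0
 0  1  0  1 | 1
 0  1  1  0 | 1
 0  1  1  1 | 1
 1  0  0  0 | 0
 1  0  0  1 | 1
 1  0  1  0 | 1
 1  0  1  1 | 1
 1  1  0  0 | 0
 1  1  0  1 | 1
 1  1  1  0 | 1
 1  1  1  1 | 1
Truth table for z (second LUT):
 y x5 | z
 0  0 | 0
 0  1 | 0
 1  0 | 0
 1  1 | 1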
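A minimal sketch of the two-LUT cascade, checked against the 5-input equation for all 32 input combinations. The INIT words and the pin assignment are assumptions (bit i of INIT is the output for address i; y drives A3 and x5 drives A2 of the second LUT), and `lut4_read` is a hypothetical helper, not a vendor primitive.

```python
from itertools import product


def lut4_read(init, a3, a2, a1, a0):
    """Return the output bit of a 4-input LUT for address (a3 a2 a1 a0)."""
    address = (a3 << 3) | (a2 << 2) | (a1 << 1) | a0
    return (init >> address) & 1


# Assumed INIT words (bit i = output for address i):
INIT_Y = 0xEEEA  # y = (x1 + x2) * x3 + x4, with x1 on A3 ... x4 on A0
INIT_Z = 0xF000  # z = y * x5, with y on A3 and x5 on A2; A1, A0 are don't-cares


def z_two_luts(x1, x2, x3, x4, x5):
    y = lut4_read(INIT_Y, x1, x2, x3, x4)
    # A1 and A0 are tied to 0 here; any value gives the same result
    return lut4_read(INIT_Z, y, x5, 0, 0)


# Compare the cascade with the equation evaluated directly
mismatches = [
    bits for bits in product((0, 1), repeat=5)
    if z_two_luts(*bits) != ((((bits[0] | bits[1]) & bits[2]) | bits[3]) & bits[4])
]
print("mismatches:", mismatches)
```

Because the second INIT word is 1 for every address with A3 = A2 = 1, the values on A1 and A0 never change the output, which is what "don't care" means here.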
Introduction (FPGA)
 An example of a full
design on an FPGA
Fault Tolerance
 Device-Level Fault Tolerance
 Attempts to deal with faults at the level of FPGA
hardware
 Select redundant HW, replace faulty one
 Solution with extra HW resources
 Configuration-Level Fault Tolerance
 Tolerates faults at the level of FPGA
configuration
 When a circuit is placed, fault-free resources are
selected
 Status of the resources is considered each time
a circuit is placed-and-routed
 Solution with extra reconfiguration time
Device-Level FT Methods(1)
 Extra Rows
 One extra spare row is added
 Selection logic is added to bypass the defective row
 Vertical wire segments are lengthened by one row
 Faults in one row can be tolerated
 More than 1 spare row is needed to tolerate faults in multiple rows
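The effect of the selection logic can be illustrated as an index mapping from logical rows (as seen by the design) to physical rows. This is only a software analogy of the hardware bypass; the function name and the array size are hypothetical.

```python
def physical_row(logical_row, defective_row=None):
    """Map a logical row to a physical row, skipping one defective row.

    Rows before the defect keep their index; rows at or after it are
    pushed one position toward the spare row at the end of the array.
    """
    if defective_row is None or logical_row < defective_row:
        return logical_row
    return logical_row + 1


LOGICAL_ROWS = 4  # hypothetical array: 4 usable rows + 1 spare = 5 physical rows
mapping = [physical_row(r, defective_row=2) for r in range(LOGICAL_ROWS)]
print(mapping)  # [0, 1, 3, 4]
```

With a single spare row the mapping can skip only one physical row, which matches the limit of one faulty row stated above.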
Device-Level FT Methods(2)
 Reconfiguration Network
 Four architectural
changes
 Additional routing
resources (bypass lines)
 Reconfiguration Memory to
store locations of faulty
resources
 On-chip circuitry for
reconfiguration routing
 Additional column of PLBs
Device-Level FT Methods(2)
 Reconfiguration Network
 Test and identify faulty resources
 Create fault map
 Load map into Reconfiguration
Memory
 On-board router avoids faulty
resources
 The network is constructed by
shifting all PLBs in the fault-containing row towards the right
 Method can tolerate 1 fault in
each row if there is one extra
spare column.
Device-Level FT Methods(3)
 Self-Repairing Architecture
 Sub-arrays of PLBs
 Routers between sub-arrays
 Extra columns of PLBs
 PLBs constantly test themselves
 If a fault is detected,
 Column of affected PLB is shifted one
position to the right
 The inter-array routers are adjusted
 Area overhead of this method is
significant
 If there is 1 spare column and N sub-arrays stacked vertically, the method can tolerate N faults at a time
Device-Level FT Methods(4)
 Block-Structured
Architecture
 Goal: tolerate larger and
denser patterns of
defects efficiently
 Blocks of PLBs
 FPGA is configured by a
loading arm.
 The block at the end of
loading arm is
configured
Device-Level FT Methods(4)
 Block-Structured Architecture
 A block is selected by the loading arm and tested
 If the test is passed, it is configured; otherwise it is designated as faulty
 Loading arm configures blocks one by one
 If the arm cannot extend any further in a path, it is retracted by one block
 Fault tolerance is provided by redundant rows and/or columns
 Area overhead is significant
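The extend/test/retract behaviour of the loading arm resembles a depth-first walk with backtracking. The sketch below is one possible reading, not the published algorithm: it assumes the arm moves between 4-connected neighbouring blocks, that already-configured blocks stay configured when the arm retracts, and that the start block and the neighbour order are free choices. All names are hypothetical.

```python
def load_blocks(rows, cols, faulty, start=(0, 0)):
    """Depth-first walk of a loading arm over a rows x cols grid of blocks.

    Returns (configured, marked_faulty): the blocks configured in visiting
    order and the set of blocks that failed their test.
    """
    configured, marked_faulty = [], set()
    if start in faulty:
        return configured, {start}
    configured.append(start)
    arm = [start]   # blocks currently spanned by the arm
    seen = {start}
    while arm:
        r, c = arm[-1]
        for nr, nc in ((r, c + 1), (r + 1, c), (r, c - 1), (r - 1, c)):
            if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in seen:
                continue
            seen.add((nr, nc))
            if (nr, nc) in faulty:        # test failed: designate as faulty
                marked_faulty.add((nr, nc))
                continue
            configured.append((nr, nc))   # test passed: configure the block
            arm.append((nr, nc))          # extend the arm
            break
        else:
            arm.pop()                     # cannot extend: retract by one block
    return configured, marked_faulty


done, bad = load_blocks(3, 3, faulty={(0, 1), (1, 1)})
print(len(done), sorted(bad))
```

In this 3 x 3 example the arm routes around the two faulty blocks and still reaches the other seven.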
Device-Level FT Methods(5)
 Fault Tolerant Segments/Grids
 Fault Tolerant Segments:
 Adds one spare track of segments to each wiring channel
 If a faulty segment is found, the segment is shifted to the spare
 Single fault can be tolerated
 Fault Tolerant Grids:
 An entire spare routing grid is added
 No additional elements in routing
channel, no extra time delay
Device-Level FT Methods(6)
 SRAM Shifting
 Based on shifting the entire circuit on the
FPGA
 PLBs are placed in one of 2 ways:
 King Allocation: 8 PLBs use one spare, circuit can move in 8 directions
 Horse Allocation: 4 PLBs use one spare, circuit can move in 4 directions
 Testing determines the faulty cells, feeds
information to the shifter circuitry on the
FPGA.
Device-Level FT Methods(6)
 SRAM Shifting
 Additional spare PLBs
surrounding the FPGA
 Horse Allocation used in the
figure
 The circuit is shifted up and right
 Advantages of the Method:
 No external reconfiguration
algorithm is required
 The timing of the circuit is
almost fixed
 Any single fault can be tolerated
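A minimal sketch of the shift selection, assuming the fault location is already known from testing. Only the 8 King Allocation directions are modelled; the slides do not list the 4 Horse Allocation directions, so another move set would have to be passed in explicitly. The grid model and all names are hypothetical.

```python
KING_MOVES = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def find_shift(used, faulty, rows, cols, moves=KING_MOVES):
    """Return the first (dr, dc) that moves every used PLB off the faulty cell.

    `used` is a set of (row, col) cells occupied by the circuit, `faulty` is a
    single (row, col) cell, and the grid (spares included) is rows x cols.
    Returns None if no single-step shift works.
    """
    if faulty not in used:
        return (0, 0)  # the fault does not touch the circuit
    for dr, dc in moves:
        shifted = {(r + dr, c + dc) for r, c in used}
        in_bounds = all(0 <= r < rows and 0 <= c < cols for r, c in shifted)
        if in_bounds and faulty not in shifted:
            return (dr, dc)
    return None


# Hypothetical 4 x 4 array: a 2 x 2 circuit surrounded by a ring of spare PLBs
circuit = {(1, 1), (1, 2), (2, 1), (2, 2)}
shift = find_shift(circuit, faulty=(1, 1), rows=4, cols=4)
print(shift)
```

Because the whole circuit moves rigidly, relative placement is preserved, which is consistent with the claim that timing stays almost fixed.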
Configuration-Level FT Methods(1)
 Pebble Shifting
 Find an initial circuit configuration, then
move pieces from faulty units
 Occupied PLBs are called pebbles
 Pair pebbles on faulty cells with unique,
unused cells such that sum of weighted
Manhattan distance is minimized
 Start shifting pebbles
 If a pebble finds an empty cell other than
the intended cell, this empty cell
becomes the destination
 No limit to the number of faults that can
be tolerated
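The pairing step can be sketched with a brute-force search over assignments. This is a stand-in for a proper minimum-cost matching algorithm and is only practical for a handful of faults. It assumes all weights are 1 (plain Manhattan distance), and the coordinates are invented for illustration.

```python
from itertools import permutations


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def min_sum_matching(faulty_pebbles, free_cells):
    """Pair each pebble on a faulty cell with a distinct free cell so that
    the total Manhattan distance is as small as possible (brute force).

    Returns (None, None) if there are fewer free cells than pebbles.
    """
    best_cost, best_pairs = None, None
    for targets in permutations(free_cells, len(faulty_pebbles)):
        cost = sum(manhattan(p, t) for p, t in zip(faulty_pebbles, targets))
        if best_cost is None or cost < best_cost:
            best_cost, best_pairs = cost, list(zip(faulty_pebbles, targets))
    return best_cost, best_pairs


# Two pebbles sit on faulty cells; three fault-free cells are unused
cost, pairs = min_sum_matching([(0, 0), (2, 2)], [(0, 1), (2, 3), (3, 0)])
print(cost, pairs)
```

The pairing only fixes the intended destinations. The shifting phase described above may still redirect a pebble to a closer empty cell it meets on the way.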
Configuration-Level FT Methods(1)
 Pebble Shifting
 Example:
 1 and 6 are on faulty cells
 Using a minimum-cost, maximum
matching algorithm, pairings are:
1->v11 and 6->v32
 Element 1 is shifted to its destination
 To move 6, we shift 3,8 and 7
 Now all elements are on non-faulty
cells and allocation is done
Configuration-Level FT Methods(2)
 Mini-Max Grid Matching
 Uses a grid matching algorithm to match faulty
logic to empty, non-faulty locations
 Like Pebble Shifting, uses minimum cost,
maximum matching algorithm
 Minimizes the maximum distance between the
pairings, since the circuit’s performance is set by
the critical (longest) path
 Can tolerate faults until there are no unused
cells
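The difference from a minimum-total-distance pairing shows up in a small brute-force sketch that minimizes the longest single move instead of the sum. The coordinates are invented, and the exhaustive search is an illustrative stand-in for a real grid matching algorithm.

```python
from itertools import permutations


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def minimax_matching(faulty_cells, free_cells):
    """Pair each faulty cell with a distinct free cell so that the longest
    single move is as short as possible (brute force)."""
    best_max, best_pairs = None, None
    for targets in permutations(free_cells, len(faulty_cells)):
        longest = max(manhattan(f, t) for f, t in zip(faulty_cells, targets))
        if best_max is None or longest < best_max:
            best_max, best_pairs = longest, list(zip(faulty_cells, targets))
    return best_max, best_pairs


faulty = [(0, 0), (2, 3)]
free = [(0, 1), (4, 0)]
longest, pairs = minimax_matching(faulty, free)
print(longest, pairs)
```

In this example the pairing with the smallest total distance (1 + 5 = 6) contains a move of length 5, while the minimax pairing accepts a larger total (4 + 4 = 8) to keep every move at length 4 or less.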
Configuration-Level FT Methods(3)
 Node-Covering and Cover
Segments
 When a fault is discovered,
nodes are shifted along the
chain (row) towards the right
 The last PLB of a chain is
reserved as a spare
 One fault in a row can be
tolerated
 Needs no reconfiguration if
local routing configurations
are present
Configuration-Level FT Methods(4)
 Tiling
 Partition FPGA into tiles
 Precompiled configurations of tiles are
stored in memory
 Each tile contains system function, some
spare logic and interconnect resources
 When a logic fault occurs in a tile, the
configuration of the tile is replaced by a
configuration that does not use the
faulty resources
 Many logic faults can be tolerated
 Local interconnect faults can be
tolerated, but global ones can’t be
tolerated
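Selecting a replacement configuration for a tile can be sketched as a lookup over the stored configurations. The tile contents, resource labels and configuration names below are hypothetical. A real flow would store bitstreams rather than sets of labels.

```python
def pick_tile_config(configs, faulty):
    """Return the name of the first precompiled configuration that uses
    none of the faulty resources, or None if every configuration is hit."""
    for name, resources in configs.items():
        if not (resources & faulty):
            return name
    return None


# Hypothetical tile with four logic resources; each configuration leaves
# a different one unused as the spare.
tile_configs = {
    "cfg_spare_D": {"A", "B", "C"},
    "cfg_spare_C": {"A", "B", "D"},
    "cfg_spare_B": {"A", "C", "D"},
    "cfg_spare_A": {"B", "C", "D"},
}
choice = pick_tile_config(tile_configs, faulty={"B"})
print(choice)
```

Since the alternatives are compiled ahead of time, recovery is a table lookup plus a partial reload of one tile rather than a full place-and-route run. With one spare per tile, as in this example, two faults in the same tile leave no usable configuration.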
Configuration-Level FT Methods(5)
 Cluster-Based
 Intracluster tolerance in a PLB
 Basic Logic Elements (BLEs or LUTs)
 For simple LUT faults, preferred
solution is to use another LUT in the
PLB
 Instead of changing PLB, try to find a
solution in the same PLB
 In example, T is faulty and 4th PLB is
used instead of 2nd PLB
Configuration-Level FT Methods(6)
 Column-Based
 Treats the design as a set of functional
units, each unit is a column
 Like Tiling, uses precompiled configurations, but at lower cost
 At least one column should be spare
 If there is a faulty cell in a column, the
column is shifted toward the spare
column
 Method can tolerate m faulty columns,
where m is the number of columns not
occupied by system functions
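The column shifting and the limit of m faulty columns can be sketched as an order-preserving assignment of functional units to fault-free columns. The function name and the example sizes are hypothetical.

```python
def place_units(num_units, num_columns, faulty_columns):
    """Assign functional units 0..num_units-1 to fault-free columns, in order.

    Returns a list where entry i is the column that holds unit i, or None
    if there are more faulty columns than spare ones.
    """
    good = [c for c in range(num_columns) if c not in faulty_columns]
    if len(good) < num_units:
        return None
    return good[:num_units]


# 4 functional units on 6 columns, so m = 2 columns are spare
print(place_units(4, 6, faulty_columns={1, 4}))     # [0, 2, 3, 5]
print(place_units(4, 6, faulty_columns={1, 3, 4}))  # None
```

Keeping the units in their original left-to-right order is a simplification that mirrors shifting toward the spare column.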
Comparison of Methodologies(1)
 Device Level (DL) Methods need extra HW and have more area cost
 DL Methods use one initial reconfiguration and have no extra reconfiguration cost
 Configuration Level (CL) Methods need more than one reconfiguration and sometimes result in high time cost
 CL Methods don’t need extra HW and have no additional area cost
Comparison of Methodologies(2)
 DL Methods are less flexible, therefore less
able to improve reliability
 CL Methods usually tolerate more faults than
DL Methods
 Performance impact of fault tolerance is less
for DL Methods than CL Methods
Conclusion
 No single Fault Tolerance methodology is better than the others in all cases.
 DL Techniques have less impact on performance, but are not flexible
 CL Methods tolerate more faults but have more impact on performance