Download Document

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
CprE / ComS 583
Reconfigurable Computing
Prof. Joseph Zambreno
Department of Electrical and Computer Engineering
Iowa State University
Lecture #8 – Reconfigurable Networking
Recap – Genetic Pattern Matching
• Comparing strings by edit distance
• Motivation: The Human Genome Project
• Do two genetic strings match?
• How are they related?
• When biologists characterize a new sequence,
they want to compare it to the (growing)
database of known sequences
• Abstraction:
• What is the cost of transforming s into t
• Given – costs for insertion, deletion, substitution
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.2
Alphabet and Costs
• Alphabet
• Letters in the string. For DNA, there are four:
• A (Adenine)
• C (Cytosine)
• T (Thymine)
• G (Guanine)
• Transformation Costs
• Insert:1, Delete:1, Substitute:2, match:0
• Type of comparison
• One target to many sources
• One target to one source
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.3
Substitution Example
Word
Move
Cost
baboon
Delete ‘o’
1
bab|on
Substitute ‘o’
2
bobon
Insert ‘u’
1
boubon
Insert ‘r’
1
bourbon
Match?
0
bourbon
September 13, 2007
Total cost: 5
CprE 583 – Reconfigurable Computing
Lect-08.4
Dynamic Programming Solution
• Source sequence: s1, s2, … sm
• Target sequence: t1, t2, … tn
• di,j = distance between subsequence s1, s2, …si
and subsequence t1, t2, …tj, where
d0,0 = 0
di,0 = di-1,0 + Delete(si)
d0,j = d0,j-1 + Insert(tj)
di-1,j + Delete(si)
di,j = min di,j-1 + Insert(tj)
di-1,j-1 + Substitute(si, tj)
• Distance(Source, Target) = dm,n
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.5
Dynamic Programming Example
b
a
b
o
o
n
September 13, 2007
0
1
2
3
4
5
6
b
1
0
1
2
2
3
4
o
2
1
2
2
2
2
3
u
3
2
3
3
3
3
4
r
4
3
4
4
3
4
5
b
5
3
4
4
4
5
5
CprE 583 – Reconfigurable Computing
o
6
4
5
5
4
5
6
n
7
5
6
6
5
6
5
Lect-08.6
Parallelism on the Anti-Diagonal
b
a
b
o
o
n
0
1
2
3
4
5
6
September 13, 2007
b
1
0
1
2
2
3
4
o
2
1
2
2
2
2
3
u
3
2
3
3
3
3
4
r
4
3
4
4
3
4
5
b
5
3
4
4
4
5
5
o
6
4
5
5
4
5
6
n
7
5
6
6
5
6
5
di-1,j-1
di-1,j
di,j-1
di,j
CprE 583 – Reconfigurable Computing
Lect-08.7
Bidirectional PE Example
If (SCin != 0) and (TCin != 0)
PEDist + Substitute(SCin, TCin)
PEDist  min TDin + Delete(SCin)
SDin + Insert(TCin)
else-if (SCin != 0)
PEDist  SDin
else-if (TCin != 0)
SCin
PEDist  TDin
SDin
Bidir
Bidir
endif
TCout
PE
PE
TDout
SCout  SCin
PEDist
PEDist
TCout  TCin
SDout  PEDist
TDout  PEDist
September 13, 2007
CprE 583 – Reconfigurable Computing
SCout
SDout
Bidir
PE
TCin
TDin
PEDist
Lect-08.8
Bidirectional Summary
• 16 CLBs/PE
• 384 PEs/Board
• 2,100 Million Cells/sec
• Requires 2*(m+n) PEs
• Uses only half the processors at any one time
• Must stream both source and target for each
comparison
• Makes comparison against large DB impractical
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.9
Genetic Search Performance
• Nearly linear scaling in cell updates per
second (CUPS)
• Need to reuse array for large patterns
λ
Area
0.60μ 500Mλ2x17x16
0.60μ
500Mλ2x16
0.60μ
420Mλ2x32
2.0μ
7.8Mλ2x34
Hardware
Splash 2 x16
Splash 2
Splash 1
P-NAC (34)
CUPS
43,000M
3,000M
370M
500M
CM-2 (64K)
150M
?
CM-5 (32)
SPARC 10
SPARC 1
33M
1.2M
0.87M
?
0.40μ
0.75μ
September 13, 2007
1.6GMλ2
273Mλ2
CprE 583 – Reconfigurable Computing
CUP/λ2s
0.32
0.38
0.028
1.9
0.00075
0.0032
Lect-08.10
Outline
• Recap – Pattern Matching on Splash-2
• The Field-Programmable Port Extender (FPX)
• FPX Architecture
• FPX Programming Model
• FPX Applications
• Pattern Matching
• Packet Classification
• Rule Processing
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.11
Application – Network Processing
• Networking applications well-suited for reconfigurable
hardware
• Target signatures change often
• Massive quantities of stream-based data
• Repetitive operations
• Connecting up to a realistic networking environment is
hard
• Washington University experimental setup one of the
best
• Shows importance of both memory and processing
capability
• Numerous experiments performed over the past five
years
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.12
Network Routing with the FPX
Line
Card
OC3/
OC12/
OC48
FPX
Fieldprogrammable
Port
Extender
IPP
OPP
IPP
OPP
Gigabit
Switch
Fabric
Line
Card
OC3/
OC12/
OC48
FPX
Fieldprogrammable
Port
Extender
IPP
OPP
IPP
OPP
IP Packets
IP Packets
•
•
•
•
FPX Modules distributed across each port of a switch
IP packets (over ATM) enter and depart line card
Packet fragments processed by modules
Advantages:
• New protocols implemented directly in silicon
• Easy to upgrade in the field
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.13
FPX Hardware Device
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.14
FPX Hardware in a WUGS-20 Switch
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.15
FPGA-based Router
(backside)
WashU / ARL
July, 2000
JL / MR
RAD
Program
SRAM
RAD
Reprogrammable
Application
Device
NID
Netw ork
Interface
Device
Virtex1000E fg680
OSC OSC
10 MHz
100MHz
JTAG
62.5 MHz
OSC
NID
SRAM
8Mbit ZBT
(backside)
SDRAM
EPROM
OC3 / OC12 / OC48 Linecard Connector
VRM: 1.8V Switcher
WUGS Switch Backplane Connector
FPX
SDRAM
SRAM
8Mbit ZBT
Reprog
RAD/NID Status
• FPX module contains two FPGAs
• NID – network interface device
• Performs data queuing
• RAD – reprogrammable application device
• Specialized control sequences
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.16
Reprogrammable Application Device
Module
Data
SDRAM
Data
SRAM
Module
Data
SDRAM
Data
SRAM
RAD
Network Interfaces to NID
• Spatial Re-use of FPGA Resources
• Modules implemented using FPGA logic
• Module logic can be individually reprogrammed
• Shared Access to off-chip resources
• Memory Interfaces to SRAM and SDRAM
• Common Datapath to send and receive data
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.17
Architecture of the FPX
Data
SDRAM
Data
SRAM
Module
Module
Data
SDRAM
Data
SRAM
RAD
VC
VC
RAD
Program
SRAM
VC
EC
• RAD
VC
NID
EC
Switch
LineCard
• Large Xilinx FPGA
• Attaches to SRAM and
SDRAM
• Reprogrammable over
network
• Provides two user-defined
Module Interfaces
• NID
• Provides Utopia
Interfaces between
switch & line card
• Forwards cells to RAD
• Programs RAD
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.18
Architecture of the FPX (cont.)
FPX Block Diagram
SDRAM
Flow
Buffer
SRAM
SRAM
Extensible
Modules
Route
Filter
SDRAM
Layered Protocol Wrappers
FPX Photo
Memory
RAD (FPGA)
PROM
Config
Program
Cache
NID (FPGA)
September 13, 2007
Switch
Network Interface
CprE 583 – Reconfigurable Computing
Lect-08.19
FPX SRAM
• Provide low latency for fast table-lookups
• Zero Bus Turnaround (ZBT) allows back-to-back
read / write operations every 10ns
• Dual, Independent Memories
• 36-bit wide bus
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.20
FPX SDRAM
• Dual, independent SDRAM memories
• 64-bit wide, 100 MHz
• 64Mb / Module : 128 Mb total [expandable]
• Burst-based transactions [1-8 word transfers]
• Latency of 14 cycles to Read/Write 8-word burst
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.21
Routing Traffic Flows
• Traffic flows routed among
• Switch
• Line Card
• RAD.Switch
• RAD.Linecard
• Functions
EC
• Check packets for errors
ccp
• Process commands
VC
NID
VC
ccp
EC
• Control, status, & reprogramming
VC
•
VC
VC
EC
Switch
LineCard
Implement per-flow forwarding
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.22
Typical Flow Configurations
RAD
Switch
RAD
LineCard
VC
RAD
Switch
VC
VC
VC
VC
ccp
EC
LineCard
(Bypass)
RAD
Switch
VC
VC
VC
ccp
EC
LineCard
Full RAD Processing
(Packet Routing and Reassembly)
September 13, 2007
LineCard
Switch
(IP Routing)
RAD
Switch
VC
VC
EC
Switch
(System Test)
VC
VC
ccp
EC
LineCard
Full Loopback Testing
RAD
LineCard
VC
VC
ccp
EC
LineCard
Ingress Processing
RAD
LineCard
VC
VC
EC
Switch
EC
(Per-flow Output Queueing)
RAD
Switch
VC
ccp
EC
Egress Processing
RAD
LineCard
VC
VC
EC
Switch
RAD
LineCard
VC
VC
ccp
EC
Default Flow Action
RAD
Switch
VC
VC
EC
Switch
RAD
LineCard
VC
EC
Switch
LineCard
Partial Loopback Testing
(Egress Flow Processing Test)
CprE 583 – Reconfigurable Computing
Lect-08.23
Reprogramming Logic
•
•
NID programs at boot from EPROM
Switch Controller writes RAD configuration memory to NID
• Configuration file for RAD arrives transmitted over network via
control cells
•
•
Switch Controller issues {Full/Partial} reconfigure command
NID reads RAD config memory to program RAD
• Performs complete or partial reprogramming of RAD
VRM
September 13, 2007
8Mbit ZBT
SRAM
SRAM
8Mbit ZBT
IPP
OPP LC
IPP
OPP LC
IPP
OPP LC
IPP Switch OPP LC
IPP
OPP LC
IPPElement OPP LC
IPP
OPP LC
IPP
OPP LC
WUGS Switch Backplane Connector
LC
LC
LC
LC
LC
LC
LC
LC
(backside)
RAD
RAD
NID
Program
EPROM
FIFO
NID
Reprogrammable
Application
Device
Virtex1000E fg680
(backside)
SDRAM
Network
Interface
Device
OSC
100MHz
OSC
62.5 MHz
PCB Trace Density
CprE 583 – Reconfigurable Computing
VRM
OC3 / OC12 / OC48 Linecard Connector
2.5V
(backside)
SDRAM
1.8V
(backside)
Lect-08.24
FPX Interfaces Provides
• Well defined Interface
• Utopia-like 32-bit fast data interface
• Flow control allows back-pressure
• Flow Routing
• Arbitrary permutations of packet flows through ports
• Dynamically Reprogrammable
• Other modules continue to operate even while new
module is being reprogrammed
• Memory Access
• Shared access to SRAM and SDRAM
• Request/Grant protocol
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.25
Pattern Matching using the FPX
• Use Hardware to detect a pattern in data
• Modify packet based on match
• Pipeline operation to maximize throughput
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.26
“Hello, World” Module Function
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.27
Logical Implementation
Append
“WORLD”
to payload
VCI Match
New
Cell
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.28
The Wrapper Concept
App
Wrapper
Wrapper
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.29
AAL5 Encapsulation
ATMHeader
0
•
Payload is packed in cells
•
Padding may be added
•
64 bit Trailer at end of cell
•
Trailer contains CRC-32
•
Last Cell indication bit
(last bit of PTI field)
Payload
0
Payload
1
Padding
AAL5 Trailer
September 13, 2007
options
length
CRC-32
CprE 583 – Reconfigurable Computing
Lect-08.30
HelloBob Module
SRAM
Interface
Input
UDP
Hello Bob
Output
Echo
UDP Processor
IP Processor
Frame Processor
Cell Processor
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.31
Results: Performance
• Operating Frequency: 119 MHz.
• 8.4ns critical path
• Well within the 10ns period RAD's clock.
• Targeted to RAD’s V1000E-FG680-7
• Maximum packet processing rate:
• 7.1 Million packets per second.
• (100 MHz)/(14 Clocks/Cell)
• Circuit handles back-to-back packets
• Slice utilization:
• 0.4% (49/12,288 slices)
• Less than one half of one percent of chip resources
• Search technique can be adapted for other types of
data matching and modification
• Regular expressions
• Parsing image content …
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.32
CAM-based Packet Matching
• Sample Packet:
•
•
•
•
•
•
Source Address = 128.252.5.5 (dotted.decimal)
Destination Address = 141.142.2.2 (dotted.decimal)
Source Port = 4096 (decimal)
Destination Port = 50 (decimal)
Protocol = TCP (6)
Payload = “Consolidate your loans. CALL NOW”
• Payload Lists = { General SPAM (0), Save Money SPAM (1) }
• Content Vector = “00000011” (binary) = x”03” (hex)
111
104
103
Content
= 03
September 13, 2007
72
Src IP (hex) =
80FC0505
71
40 39
Dest IP (hex) =
8D8E0202
Src
Port =
1000
CprE 583 – Reconfigurable Computing
87
Dest
Port =
0050
0
Proto
= 06
Lect-08.33
Sample Filter
•
•
•
•
•
•
Source Address = 128.252.0.0 / 16
Destination Address = 141.142.0.0 / 16
Source Port = Don’t Care
Destination Port = 50
Protocol = TCP (6)
Payload includes general SPAM (List 0)
Conten t=
01
Src IP value =
80FC0000
Dest IP (hex) =
8D8E0000
Src
Port =
0000
Dest
Port =
50
Conten t=
01
Src IP (hex) =
FFFF0000
Dest IP (hex) =
FFFF0000
Src
Port =
0000
Dest
Proto
Port =
= FF
FFFF
103
Content=
03
72
Src IP (hex) =
80FC0505
71
40 39
Dest IP (hex) =
8D8E0202
Src
Port =
1000
Proto
= 06
8
7 0
Dest
Port =
0050
Proto
= 06
Value
Mask:
1=care
0=don’t care
IP Packet
DROP the packet : It matches the filter
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.34
Packet Classifier with FlowID
16 bits
112 bits
Flow ID [1]
CAM MASK [1]
CAM VALUE [1]
Flow ID [2]
CAM MASK [2]
CAM VALUE [2]
16 bits
- - CAM Table - -
Flow ID
Flow ID [3]
CAM MASK [3]
CAM VALUE [3]
Resulting
Flow
Identifier
...
...
Flow ID [N]
...
CAM MASK [N]
CAM VALUE [N]
Bits in IP Header
Flow List
Priority Encoder
Mask Matchers
Value Comparators
September 13, 2007
Payload
Match Bits
Source Address
CprE 583 – Reconfigurable Computing
Source
Port
Destination Address
Protocol
Dest.
Port
Lect-08.35
Fast IP Lookup Algorithm
• Function
• Search for best matching prefix using
Trie algorithm
Prefix
*
01*
10*
110*
0001*
1011*
00110*
01011*
September 13, 2007
Next Hop
4
7
2
9
1
0
5
3
1
0
0
0
1
1
0
1
1
0
CprE 583 – Reconfigurable Computing
0
1
0
1
1
1
1
Lect-08.36
Hardware Implementation in the FPX
SRAM1
Extract
IP Headers
SRAM1 Interface
1
0 Remap VCIs
Request
Grant
for IP packets
IP Lookup
Engine
0
0
1
1
0
0
1
0
1
counter
1
On-Chip Cell Store
SRAM2
LC
September 13, 2007
Packet
Reassembler
0
RAD FPGA
1
1
1
1
Control
Cell
Processor
NID FPGA
CprE 583 – Reconfigurable Computing
SW
Lect-08.37
Pipelined FIPL Operations
Generate
Address
Latch ADDR
into SRAM
SRAM
D < M[A]
Latch Data
into FPGA
Compute
Time (cycles)
Space
(Parallel
lookup
units on
FPGA)
Time (cycles)
• Throughput : Optimized by interleaving memory accesses
• Operate 5 parallel lookups
• t_pipelined_lookup =
550ns / 5 = 110 ns
• Throughput = 9.1 Million packets / second
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.38
Other Modules Implemented
• IPv6 Tunneling Module
• Tunnels IPv6 over IPv4
• IPv4 CAM Filter
• 104 Bit header matching
• Statistics Module
• Event counter
• Fast IP Lookup (FIPL)
• Longest Prefix Match
• MAE-West at 10M
pkts/second
• Traffic Generator
• Per-flow mixing
• Packet Content Scanner
• Reg. Expression Search
• Video Recoder
• Motion JPEG
• Embedded Processor
• KCPSM
September 13, 2007
• Data Queueing
• Per-flow queue in
SDRAM
CprE 583 – Reconfigurable Computing
Lect-08.39
Summary
•
Field Programmable Port Extender (FPX)
• Network-accessible Hardware
• Reprogrammable Application Device
•
Module Deployment
• Modules implement fast processing on data flow
• Network allows Arbitrary Topologies of distributed systems
•
Project Website
• http://www.arl.wustl.edu/arl/projects/fpx/
September 13, 2007
CprE 583 – Reconfigurable Computing
Lect-08.40