CprE / ComS 583 Reconfigurable Computing
Prof. Joseph Zambreno
Department of Electrical and Computer Engineering
Iowa State University
Lecture #8 – Reconfigurable Networking
September 13, 2007

Recap – Genetic Pattern Matching
• Comparing strings by edit distance
• Motivation: the Human Genome Project
  • Do two genetic strings match?
  • How are they related?
  • When biologists characterize a new sequence, they want to compare it to the (growing) database of known sequences
• Abstraction:
  • What is the cost of transforming s into t?
  • Given: costs for insertion, deletion, and substitution

Alphabet and Costs
• Alphabet
  • The letters in the string. For DNA, there are four:
    • A (Adenine)
    • C (Cytosine)
    • T (Thymine)
    • G (Guanine)
• Transformation costs
  • Insert: 1, Delete: 1, Substitute: 2, Match: 0
• Type of comparison
  • One target to many sources
  • One target to one source

Substitution Example
Word      Move                      Cost
baboon    Delete 'o'                1
bab|on    Substitute 'a' with 'o'   2
bobon     Insert 'u'                1
boubon    Insert 'r'                1
bourbon   Match                     0
bourbon
Total cost: 5

Dynamic Programming Solution
• Source sequence: $s_1, s_2, \dots, s_m$
• Target sequence: $t_1, t_2, \dots, t_n$
• $d_{i,j}$ = distance between the subsequence $s_1 \dots s_i$ and the subsequence $t_1 \dots t_j$, where

$$
\begin{aligned}
d_{0,0} &= 0 \\
d_{i,0} &= d_{i-1,0} + \mathrm{Delete}(s_i) \\
d_{0,j} &= d_{0,j-1} + \mathrm{Insert}(t_j) \\
d_{i,j} &= \min\begin{cases}
d_{i-1,j} + \mathrm{Delete}(s_i) \\
d_{i,j-1} + \mathrm{Insert}(t_j) \\
d_{i-1,j-1} + \mathrm{Substitute}(s_i, t_j)
\end{cases}
\end{aligned}
$$

  with $\mathrm{Substitute}(s_i, t_j) = 0$ when $s_i = t_j$ (a match)
• Distance(Source, Target) = $d_{m,n}$

Dynamic Programming Example
Columns: source "baboon" (index $i$); rows: target "bourbon" (index $j$); each entry is $d_{i,j}$.

         -  b  a  b  o  o  n
    -    0  1  2  3  4  5  6
    b    1  0  1  2  3  4  5
    o    2  1  2  3  2  3  4
    u    3  2  3  4  3  4  5
    r    4  3  4  5  4  5  6
    b    5  4  5  4  5  6  7
    o    6  5  6  5  4  5  6
    n    7  6  7  6  5  6  5

The bottom-right entry, $d_{6,7} = 5$, equals the total cost of the substitution example.

Parallelism on the Anti-Diagonal
[Figure: the same table with one anti-diagonal highlighted; cell $d_{i,j}$ is computed from $d_{i-1,j-1}$, $d_{i-1,j}$, and $d_{i,j-1}$.]
• Because each cell depends only on those three neighbors, all cells on one anti-diagonal (constant $i + j$) are independent and can be computed in parallel

Bidirectional PE Example
[Figure: a chain of Bidir PEs, each with ports SCin/SCout, SDin/SDout, TCin/TCout, and TDin/TDout and an internal distance register PEDist.]
Each PE performs:

    if (SCin != 0) and (TCin != 0)
        PEDist ← min( PEDist + Substitute(SCin, TCin),
                      TDin + Delete(SCin),
                      SDin + Insert(TCin) )
    else-if (SCin != 0)
        PEDist ← SDin
    else-if (TCin != 0)
        PEDist ← TDin
    endif

Bidirectional Summary
• 16 CLBs/PE
• 384 PEs/board
• 2,100 million cells/sec
• Requires 2(m + n) PEs
• Uses only half the processors at any one time
• Must stream both source and target for each comparison
  • Makes comparison against a large database impractical

Genetic Search Performance
• Nearly linear scaling in cell updates per second (CUPS)
• Need to reuse the array for large patterns

Hardware       CUPS      λ       Area               CUP/λ²s
Splash 2 x16   43,000M   0.60μ   500Mλ² × 17 × 16   0.32
Splash 2       3,000M    0.60μ   500Mλ² × 16        0.38
Splash 1       370M      0.60μ   420Mλ² × 32        0.028
P-NAC (34)     500M      2.0μ    7.8Mλ² × 34        1.9
CM-2 (64K)     150M      ?
CM-5 (32)      33M       ?
SPARC 10       1.2M      0.40μ   1.6Gλ²             0.00075
SPARC 1        0.87M     0.75μ   273Mλ²             0.0032

Outline
• Recap – Pattern Matching on Splash-2
• The Field-Programmable Port Extender (FPX)
  • FPX Architecture
  • FPX Programming Model
  • FPX Applications
    • Pattern Matching
    • Packet Classification
    • Rule Processing

Application – Network Processing
• Networking applications are well suited to reconfigurable hardware:
  • Target signatures change often
  • Massive quantities of stream-based data
  • Repetitive operations
• Connecting to a realistic networking environment is hard
  • The Washington University experimental setup is one of the best
  • Shows the importance of both memory and processing capability
  • Numerous experiments performed over the past five years

Network Routing with the FPX
[Figure: line cards (OC3/OC12/OC48) attach through FPX (Field-programmable Port Extender) modules to the IPP/OPP ports of a gigabit switch fabric; IP packets enter and leave through the line cards.]
• FPX modules are distributed across each port of a switch
• IP packets (over ATM) enter and depart the line card
• Packet fragments are processed by modules
• Advantages:
  • New protocols implemented directly in silicon
  • Easy to upgrade in the field

FPX Hardware Device
[Figure: the FPX hardware device.]

FPX Hardware in a WUGS-20 Switch
[Figure: FPX hardware installed in a WUGS-20 switch.]
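Returning to the recap of genetic pattern matching: the dynamic-programming recurrence can be checked with a short sequential sketch. It assumes the lecture's costs (insert 1, delete 1, substitute 2, match 0); the function names are illustrative and are not part of the Splash-2 design.

```python
INSERT_COST = 1
DELETE_COST = 1
SUBSTITUTE_COST = 2  # charged only when the letters differ; a match costs 0


def substitute(a: str, b: str) -> int:
    return 0 if a == b else SUBSTITUTE_COST


def edit_distance_table(source: str, target: str) -> list[list[int]]:
    """d[i][j] = cost of transforming source[:i] into target[:j]."""
    m, n = len(source), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):  # d[i][0]: delete s_1 .. s_i
        d[i][0] = d[i - 1][0] + DELETE_COST
    for j in range(1, n + 1):  # d[0][j]: insert t_1 .. t_j
        d[0][j] = d[0][j - 1] + INSERT_COST
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + DELETE_COST,  # delete s_i
                d[i][j - 1] + INSERT_COST,  # insert t_j
                d[i - 1][j - 1] + substitute(source[i - 1], target[j - 1]),  # substitute or match
            )
    return d


def edit_distance(source: str, target: str) -> int:
    return edit_distance_table(source, target)[-1][-1]


print(edit_distance("baboon", "bourbon"))  # 5
```

Each entry of `d[1]` is one value of the "b" column in the example table, read top to bottom.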
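The anti-diagonal observation can also be checked in software. The sketch below fills the same table one anti-diagonal (wavefront) at a time and reports how many wavefronts there are and how many independent cells the widest one holds. It only illustrates the dependency structure; it is not a model of the Splash-2 PE logic, and the function name is hypothetical.

```python
INSERT_COST, DELETE_COST, SUBSTITUTE_COST = 1, 1, 2  # a match costs 0


def wavefront_edit_distance(source: str, target: str) -> tuple[int, int, int]:
    """Fill the edit-distance table one anti-diagonal at a time.

    Returns (distance, number_of_wavefronts, widest_wavefront).
    """
    m, n = len(source), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + DELETE_COST
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + INSERT_COST

    wavefronts = 0
    widest = 0
    for k in range(2, m + n + 1):  # interior cells satisfy i + j = k
        cells = [(i, k - i) for i in range(max(1, k - n), min(m, k - 1) + 1)]
        if not cells:
            continue
        # Each cell reads only wavefronts k-1 and k-2, never another cell of
        # wavefront k, so hardware could update all of `cells` in one step.
        for i, j in cells:
            sub = 0 if source[i - 1] == target[j - 1] else SUBSTITUTE_COST
            d[i][j] = min(d[i - 1][j] + DELETE_COST,
                          d[i][j - 1] + INSERT_COST,
                          d[i - 1][j - 1] + sub)
        wavefronts += 1
        widest = max(widest, len(cells))
    return d[m][n], wavefronts, widest


print(wavefront_edit_distance("baboon", "bourbon"))  # (5, 12, 6)
```

For two strings of lengths m and n (both nonzero), there are m + n - 1 wavefronts and the widest holds min(m, n) cells, which is the parallelism an anti-diagonal schedule can expose.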
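The CUP/λ²s column in the Genetic Search Performance table normalizes throughput by silicon area. The table does not state the formula, but its entries are consistent with dividing CUPS by the total area listed, as these two rows show:

$$
\frac{\text{CUPS}}{\text{Area}}:\qquad
\frac{43{,}000\,\text{M}}{500\,\text{M}\lambda^2 \times 17 \times 16} = \frac{43{,}000}{136{,}000} \approx 0.32,
\qquad
\frac{0.87\,\text{M}}{273\,\text{M}\lambda^2} \approx 0.0032
$$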
FPGA-based Router (backside)
[Figure: FPX board layout (WashU / ARL, July 2000): RAD (Reprogrammable Application Device, Virtex1000E fg680) and NID (Network Interface Device) FPGAs, 8 Mbit ZBT SRAMs, backside SDRAMs, RAD program SRAM, NID EPROM, 10 MHz, 62.5 MHz, and 100 MHz oscillators, 1.8 V switching voltage regulator, JTAG, the OC3/OC12/OC48 line card connector, and the WUGS switch backplane connector.]
• The FPX module contains two FPGAs:
  • NID – Network Interface Device
    • Performs data queuing
  • RAD – Reprogrammable Application Device
    • Specialized control sequences

Reprogrammable Application Device
[Figure: the RAD containing two modules, each with its own data SDRAM and data SRAM, and network interfaces to the NID.]
• Spatial reuse of FPGA resources
  • Modules are implemented using FPGA logic
  • Module logic can be reprogrammed individually
• Shared access to off-chip resources
  • Memory interfaces to SRAM and SDRAM
  • Common datapath to send and receive data

Architecture of the FPX
[Figure: the RAD (two modules with data SDRAM and SRAM), the RAD program SRAM, and the NID, connected through VC and EC blocks to the switch and the line card.]
• RAD
  • Large Xilinx FPGA
  • Attaches to SRAM and SDRAM
  • Reprogrammable over the network
  • Provides two user-defined module interfaces
• NID
  • Provides Utopia interfaces between the switch and the line card
  • Forwards cells to the RAD
  • Programs the RAD

Architecture of the FPX (cont.)
[Figure: FPX block diagram and photo. RAD (FPGA): extensible modules (route, filter, flow buffer) inside layered protocol wrappers, backed by SRAM and SDRAM. NID (FPGA): switch and network interfaces, configuration program cache, PROM, and memory.]

FPX SRAM
• Provides low latency for fast table lookups
• Zero Bus Turnaround (ZBT) allows back-to-back read/write operations every 10 ns
• Dual, independent memories
• 36-bit wide bus

FPX SDRAM
• Dual, independent SDRAM memories
• 64-bit wide, 100 MHz
• 64 Mb per module: 128 Mb total (expandable)
• Burst-based transactions (1–8 word transfers)
• Latency of 14 cycles to read/write an 8-word burst

Routing Traffic Flows
[Figure: the NID, with VC, EC, and ccp blocks linking the switch, the line card, and the RAD.]
• Traffic flows are routed among:
  • Switch
  • Line card
  • RAD.Switch
  • RAD.Linecard
• Functions
  • Check packets for errors
  • Process commands
    • Control, status, and reprogramming
  • Implement per-flow forwarding

Typical Flow Configurations
[Figure: example NID flow configurations between the switch, the line card, RAD.Switch, and RAD.LineCard, including the default flow action (bypass), ingress processing, egress processing, full RAD processing (packet routing and reassembly), full loopback testing, and partial loopback testing (egress flow processing test); other labels include IP routing, per-flow output queueing, and system test.]

Reprogramming Logic
• The NID programs itself at boot from EPROM
• The switch controller writes the RAD configuration memory through the NID
  • The configuration file for the RAD is transmitted over the network via control cells
• The switch controller issues a {Full/Partial} reconfigure command
• The NID reads the RAD configuration memory to program the RAD
  • Performs complete or partial reprogramming of the RAD
[Figure: FPX board in a WUGS switch: switch element with eight IPP/OPP port pairs and line cards (LC), the NID with its program EPROM and FIFO, the RAD program SRAM, 8 Mbit ZBT SRAMs, backside SDRAMs, 2.5 V and 1.8 V voltage regulators, 100 MHz and 62.5 MHz oscillators, and the line card and backplane connectors.]

FPX Interfaces
Provides:
• A well-defined interface
  • Utopia-like 32-bit fast data interface
  • Flow control allows back-pressure
• Flow routing
  • Arbitrary permutations of packet flows through ports
• Dynamic reprogramming
  • Other modules continue to operate while a new module is being reprogrammed
• Memory access
  • Shared access to SRAM and SDRAM
  • Request/grant protocol

Pattern Matching using the FPX
• Use hardware to detect a pattern in data
• Modify the packet based on the match
• Pipeline the operation to maximize throughput

"Hello, World" Module Function
[Figure: operation of the "Hello, World" module.]

Logical Implementation
[Figure: when an incoming cell's VCI matches, "WORLD" is appended to its payload and a new cell is sent.]

The Wrapper Concept
[Figure: the application (App) enclosed in nested wrappers.]

AAL5 Encapsulation
[Figure: an AAL5 frame carried in ATM cells: ATM header, Payload 0, Payload 1, padding, and the AAL5 trailer (options, length, CRC-32).]
• The payload is packed into cells
• Padding may be added
• A 64-bit trailer sits at the end of the last cell
• The trailer contains a CRC-32
• Last-cell indication bit (last bit of the PTI field)
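The "Logical Implementation" slide (match a VCI, append "WORLD" to the payload, emit a new cell) can be modeled behaviorally in a few lines. This is a sketch under stated assumptions: the watched VCI (5), the requirement that the payload begin with "HELLO", and overwriting the bytes right after the match (so the 48-byte cell payload never changes size) are all hypothetical choices, not details taken from the FPX module.

```python
from dataclasses import dataclass

CELL_PAYLOAD_BYTES = 48  # an ATM cell carries 48 bytes of payload
MATCH_VCI = 5            # hypothetical VCI watched by the module
PATTERN = b"HELLO"       # hypothetical trigger text at the start of the payload
SUFFIX = b"WORLD"


@dataclass(frozen=True)
class Cell:
    vci: int
    payload: bytes  # CELL_PAYLOAD_BYTES long


def hello_world(cell: Cell) -> Cell:
    """Emit a new cell with SUFFIX written right after PATTERN when the VCI matches."""
    if cell.vci != MATCH_VCI or not cell.payload.startswith(PATTERN):
        return cell  # all other traffic passes through unchanged
    start = len(PATTERN)
    end = start + len(SUFFIX)
    # Overwrite in place so the fixed cell size is preserved.
    new_payload = cell.payload[:start] + SUFFIX + cell.payload[end:]
    return Cell(cell.vci, new_payload)


cell = Cell(MATCH_VCI, b"HELLO".ljust(CELL_PAYLOAD_BYTES, b"\x00"))
print(hello_world(cell).payload[:10])  # b'HELLOWORLD'
```

Overwriting rather than shifting keeps every output cell the same size as its input, which suits a streaming datapath that handles one fixed-size cell at a time.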
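The AAL5 rules listed above (padding, a 64-bit trailer with length and CRC-32, a last-cell flag) can be illustrated by building the PDU in software. The sketch follows the standard AAL5 trailer layout (1-byte UU and 1-byte CPI, which make up the slide's "options", then a 2-byte length and a 4-byte CRC-32) rather than the FPX wrapper code. The function names are hypothetical, and `is_last` only stands in for setting the PTI bit in a real ATM header.

```python
CELL_PAYLOAD = 48  # payload bytes per ATM cell
TRAILER_LEN = 8    # 64-bit AAL5 trailer: UU (1), CPI (1), Length (2), CRC-32 (4)


def crc32_aal5(data: bytes) -> int:
    """Bitwise CRC-32: polynomial 0x04C11DB7, MSB first, initial value and
    final XOR 0xFFFFFFFF (the parameter set catalogued as CRC-32/BZIP2)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc ^ 0xFFFFFFFF


def aal5_encapsulate(payload: bytes, uu: int = 0, cpi: int = 0) -> list[tuple[bytes, bool]]:
    """Pad the payload, append the trailer, and cut the PDU into 48-byte cells.

    Returns (cell_payload, is_last) pairs; is_last stands for the last-cell
    indication bit carried in the PTI field of the ATM header.
    """
    # Pad so that payload + padding + trailer is a whole number of cells.
    pad_len = -(len(payload) + TRAILER_LEN) % CELL_PAYLOAD
    body = (payload + bytes(pad_len) + bytes([uu, cpi])
            + len(payload).to_bytes(2, "big"))        # length counts payload only
    pdu = body + crc32_aal5(body).to_bytes(4, "big")  # CRC covers everything before it
    cells = [pdu[k:k + CELL_PAYLOAD] for k in range(0, len(pdu), CELL_PAYLOAD)]
    return [(cell, idx == len(cells) - 1) for idx, cell in enumerate(cells)]


print(len(aal5_encapsulate(bytes(40))), len(aal5_encapsulate(bytes(41))))  # 1 2
```

A 40-byte payload plus the 8-byte trailer fills exactly one cell, while 41 bytes forces 47 bytes of padding and a second cell; only that second cell carries the trailer and the last-cell flag.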
HelloBob Module
[Figure: HelloBob module: a "Hello Bob" UDP echo application inside UDP, IP, frame, and cell processor wrappers, with an SRAM interface.]

Results: Performance
• Operating frequency: 119 MHz
  • 8.4 ns critical path
  • Well within the RAD's 10 ns clock period
  • Targeted to the RAD's V1000E-FG680-7
• Maximum packet processing rate:
  • 7.1 million packets per second (single-cell packets)
  • (100 MHz)/(14 clocks/cell)
  • The circuit handles back-to-back packets
• Slice utilization:
  • 0.4% (49/12,288 slices)
  • Less than one half of one percent of chip resources
• The search technique can be adapted to other types of data matching and modification:
  • Regular expressions
  • Parsing image content …

CAM-based Packet Matching
• Sample packet:
  • Source address = 128.252.5.5 (dotted decimal)
  • Destination address = 141.142.2.2 (dotted decimal)
  • Source port = 4096 (decimal)
  • Destination port = 50 (decimal)
  • Protocol = TCP (6)
  • Payload = "Consolidate your loans. CALL NOW"
• Payload lists = { General SPAM (0), Save Money SPAM (1) }
• Content vector = "00000011" (binary) = x"03" (hex)

112-bit header vector (hex):
  Content    bits 111–104   03
  Src IP     bits 103–72    80FC0505
  Dest IP    bits 71–40     8D8E0202
  Src Port   bits 39–24     1000
  Dest Port  bits 23–8      0032
  Proto      bits 7–0       06

Sample Filter
• Source address = 128.252.0.0/16
• Destination address = 141.142.0.0/16
• Source port = don't care
• Destination port = 50
• Protocol = TCP (6)
• Payload includes general SPAM (list 0)

             Content  Src IP    Dest IP   Src Port  Dest Port  Proto
  Value      01       80FC0000  8D8E0000  0000      0032       06
  Mask       01       FFFF0000  FFFF0000  0000      FFFF       FF
  IP packet  03       80FC0505  8D8E0202  1000      0032       06

All values are hex; mask bits: 1 = care, 0 = don't care.
Result: DROP the packet; it matches the filter.

Packet Classifier with FlowID
[Figure: a CAM table of N entries, each holding a 16-bit Flow ID, a 112-bit CAM mask, and a 112-bit CAM value. Mask matchers and value comparators test the payload match bits and the IP header fields (source address, destination address, source port, destination port, protocol) against every entry; a priority encoder over the resulting flow list outputs the 16-bit resulting flow identifier.]
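The value/mask test and the priority encoder can be modeled as a small ternary-CAM classifier: pack the fields into the same 112-bit vector, compare only the bits whose mask bit is 1, and pick the first matching entry. This is a software sketch; the Flow ID value (1) and the rule that earlier table entries have higher priority are assumptions, and the hardware compares all entries at once rather than in a loop.

```python
from typing import Optional

# (field, width in bits), most significant first: content is bits 111:104, proto is bits 7:0.
FIELDS = [("content", 8), ("src_ip", 32), ("dst_ip", 32),
          ("src_port", 16), ("dst_port", 16), ("proto", 8)]


def pack(**values: int) -> int:
    """Concatenate the named fields into one 112-bit integer; missing fields are 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def ipv4(dotted: str) -> int:
    """Convert dotted-decimal notation to a 32-bit integer."""
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def classify(header: int, table: list[tuple[int, int, int]]) -> Optional[int]:
    """Return the Flow ID of the first matching (flow_id, mask, value) entry.

    A mask bit of 1 means "care" and 0 means "don't care". Scanning in order
    stands in for the priority encoder.
    """
    for flow_id, mask, value in table:
        if (header & mask) == (value & mask):
            return flow_id
    return None


# Sample packet and filter from the slides.
packet = pack(content=0x03, src_ip=ipv4("128.252.5.5"), dst_ip=ipv4("141.142.2.2"),
              src_port=4096, dst_port=50, proto=6)
spam_filter_value = pack(content=0x01, src_ip=ipv4("128.252.0.0"),
                         dst_ip=ipv4("141.142.0.0"), dst_port=50, proto=6)
spam_filter_mask = pack(content=0x01, src_ip=0xFFFF0000, dst_ip=0xFFFF0000,
                        src_port=0x0000, dst_port=0xFFFF, proto=0xFF)
DROP_FLOW_ID = 1  # hypothetical Flow ID meaning "drop"
table = [(DROP_FLOW_ID, spam_filter_mask, spam_filter_value)]

print(classify(packet, table))  # 1: the packet matches the filter
```

Because the content byte is masked to bit 0 only, any payload flagged as general SPAM matches, whether or not the Save Money SPAM bit is also set.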
Fast IP Lookup Algorithm
• Function
  • Search for the best (longest) matching prefix using a trie algorithm

Prefix    Next Hop
*         4
01*       7
10*       2
110*      9
0001*     1
1011*     0
00110*    5
01011*    3

[Figure: the binary trie built from these prefixes.]

Hardware Implementation in the FPX
[Figure: FPX datapath for IP lookup. RAD FPGA: extract IP headers, IP lookup engine with an SRAM1 interface (request/grant), on-chip cell store, packet reassembler, VCI remapping for IP packets, and control cell processor. NID FPGA, with SRAM1, SRAM2, line card (LC), and switch (SW) connections.]

Pipelined FIPL Operations
[Figure: each lookup step runs generate address, latch address into SRAM, SRAM read D ← M[A], latch data into FPGA, and compute; several lookup units on the FPGA interleave these steps over time.]
• Throughput is optimized by interleaving memory accesses
  • Operate 5 parallel lookups
  • t_pipelined_lookup = 550 ns / 5 = 110 ns
  • Throughput = 9.1 million packets/second

Other Modules Implemented
• IPv6 Tunneling Module
  • Tunnels IPv6 over IPv4
• IPv4 CAM Filter
  • 104-bit header matching
• Statistics Module
  • Event counter
• Fast IP Lookup (FIPL)
  • Longest prefix match
  • MAE-West at 10M packets/second
• Traffic Generator
  • Per-flow mixing
• Packet Content Scanner
  • Regular expression search
• Video Recoder
  • Motion JPEG
• Embedded Processor
  • KCPSM
• Data Queueing
  • Per-flow queue in SDRAM

Summary
• Field Programmable Port Extender (FPX)
  • Network-accessible hardware
  • Reprogrammable Application Device
• Module deployment
  • Modules implement fast processing on data flows
  • The network allows arbitrary topologies of distributed systems
• Project website
  • http://www.arl.wustl.edu/arl/projects/fpx/
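The Fast IP Lookup prefix table can be exercised with a plain binary trie that consumes one address bit per level and remembers the last next hop it passed, which is the longest matching prefix. This is a minimal sketch: the slide says only "Trie algorithm", so FIPL's actual SRAM data structure may differ, and the class and function names here are illustrative. Addresses are written as bit strings for readability.

```python
from typing import Optional


class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.next_hop: Optional[int] = None


def build_trie(prefixes: dict[str, int]) -> TrieNode:
    """Insert each prefix (e.g. "01*") bit by bit and store its next hop at the end."""
    root = TrieNode()
    for prefix, hop in prefixes.items():
        node = root
        for bit in prefix.rstrip("*"):
            node = node.children.setdefault(bit, TrieNode())
        node.next_hop = hop
    return root


def longest_prefix_match(root: TrieNode, address_bits: str) -> Optional[int]:
    best = root.next_hop  # the default route "*", if present
    node = root
    for bit in address_bits:
        node = node.children.get(bit)
        if node is None:
            break  # no longer prefix exists below this point
        if node.next_hop is not None:
            best = node.next_hop  # a longer matching prefix overrides shorter ones
    return best


ROUTES = {"*": 4, "01*": 7, "10*": 2, "110*": 9, "0001*": 1,
          "1011*": 0, "00110*": 5, "01011*": 3}
root = build_trie(ROUTES)
print(longest_prefix_match(root, "01011000"))  # 3
```

An address starting 01011 resolves to next hop 3 even though 01* (next hop 7) also matches, because the longer prefix wins; an address starting 111 matches only the default route and gets next hop 4.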
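The pipelined FIPL throughput follows from dividing the latency of one lookup by the number of interleaved lookup units. This assumes, as the slide implies, that 550 ns is the latency of one complete lookup, that the five units keep the shared SRAM continuously busy, and that each packet needs one lookup:

$$
t_{\text{pipelined lookup}} = \frac{550\ \text{ns}}{5} = 110\ \text{ns},
\qquad
\text{throughput} = \frac{1}{110\ \text{ns}} \approx 9.09 \times 10^{6}\ \text{lookups/s} \approx 9.1\ \text{million packets/s}
$$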