TCP STREAM PROCESSING AT GIGABIT LINE RATES
TCP Processor HARDWARE CIRCUIT
David Vincent Schuehler
Dissertation Defense
Washington University in St. Louis
Department of Computer Science and Engineering
November 3, 2004

Outline
• Motivation and Background
• Architecture and Related Work
• Live Internet Traffic Processing
• Conclusion and Future Work
TCP Processor HARDWARE CIRCUIT David V. Schuehler 2

Motivation
• Inspect data moving through networks
• Enable application level data processing
• Secure networks
– Safeguard confidential data
• Detect and prevent intrusions
– Worms, viruses, spam, espionage
• Mitigate denial of service attacks
• Characterize and analyze network traffic
• Operate at multi-gigabit data rates

Transmission Control Protocol
• 86% to 90% of all Internet traffic uses TCP
– Web, email, file transfer, remote login, secure communications
• Provides virtual bit pipe between two end systems
– Retransmission services
– Data reordering services
– Flow control services
– Congestion avoidance services
[Figure: data packets (header and payload) moving through the network from source to destination; layout of a single packet: IP Hdr, TCP Hdr, Data Payload]

Internet
[Figure: Internet topology. Gateway routers (G) and core routers (C) connect a hand held computer, cell phone and cellular tower, satellite uplink, computers, a laptop, a government agency, a university, a corporation, a municipality, and an Internet service provider.]

Internet
[Figure: the same topology with SPAM, INTRUSION, and VIRUS threats marked]

Cost of Internet Attacks
Year   Economic Impact Worldwide (mi2g'04)   Representative Attacks (cost)
2003   $236 Billion                          Sobig.F ($2B), Blaster ($1.3B), Slammer ($1.2B)
2002   $118 Billion                          KLEZ ($9B), Bugbear ($950M)
2001   $36 Billion                           Nimda ($635M), Code Red ($2.62B), SirCam ($1.15B)
2000   $26 Billion                           Love Bug ($8.75B)
1999   $20 Billion                           Melissa ($1.10B)

Economic Damage Estimate
[Figure: economic damage estimate chart]

Design Requirements
• Architecture that is fast
– Hardware-based system
– High-performance (multi-gigabit networks)
– Per-flow context storage & retrieval
• Architecture that is scalable
– Performance improves with advances in technology
• In-line traffic processing model
• Implementation using reasonable resources
– FPGA implementation can be done in research lab
• Framework that is flexible
– Integrates with multiple applications
– Multi-device coordination of TCP stream processing

TCP-Processor Architecture
[Figure: block diagram. A Data Processing Circuit attaches to the TCP Processing Architecture, which contains the Input Buffer, TCP Processing Engine, Packet Routing, State Store Manager, Egress, and Stats blocks, plus Off-Chip Memory.]

TCP Processing Engine
[Figure: block diagram of the TCP Processing Engine: Input State Machine, Frame FIFO, Checksum Engine, Control & State FIFO, TCP State Processing, Flow Hash Computation, and Output State Machine, connected to the State Store Manager]

Challenges and Design Choices
• Performance
– Operate at multi-gigabit data rates
– Hardware-based design exploiting pipelining and parallelism
• Flow classification
– Open addressing hash with limited bucket sizes
• Context storage and retrieval
– Requires memory read and write for each packet
– 64-byte per-flow context; use burst read/write operations
• Reassembly of out-of-order packets
– Multiple processing modes (guaranteed and passive)
• TCP processing
– Flow monitoring instead of flow termination

Link Speeds and Packet Rates
Link Type          Data rate   40 byte    64 byte    500 byte   1500 byte
                               pkts/sec   pkts/sec   pkts/sec   pkts/sec
OC-3               155 Mbps    .48 M      .3 M       38 K       12 K
OC-12              622 Mbps    1.9 M      1.2 M      .16 M      52 K
GigE               1.0 Gbps    3.1 M      2.0 M      .25 M      83 K
OC-48              2.5 Gbps    7.8 M      4.8 M      .63 M      .21 M
OC-192 / 10 GigE   10 Gbps     31 M       20 M       2.5 M      .83 M
OC-768             40 Gbps     125 M      78 M       10 M       3.3 M
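
The packet rates in the Link Speeds and Packet Rates table follow from dividing the link data rate by the packet size in bits. The sketch below reproduces that arithmetic. The function name `packets_per_second` and the `LINKS` dictionary are illustrative only, and the calculation assumes back-to-back packets with no framing overhead or inter-packet gap, which is what the table values appear to assume.

```python
def packets_per_second(data_rate_bps: float, packet_bytes: int) -> float:
    """Upper bound on the packet rate for back-to-back packets.

    Assumption: framing overhead and inter-packet gaps are ignored.
    """
    return data_rate_bps / (packet_bytes * 8)


# Nominal link rates in bits per second, taken from the table.
LINKS = {
    "OC-3": 155e6,
    "OC-12": 622e6,
    "GigE": 1.0e9,
    "OC-48": 2.5e9,
    "OC-192 / 10 GigE": 10e9,
    "OC-768": 40e9,
}

if __name__ == "__main__":
    for name, rate in LINKS.items():
        rates = [packets_per_second(rate, size) for size in (40, 64, 500, 1500)]
        print(name, ["%.2f M" % (pps / 1e6) for pps in rates])
```

At 40-byte packets, the smallest size in the table, a circuit must sustain the highest packet rate for a given link, so the per-packet processing budget is tightest there.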

Systems with TCP Processors
• Load balancing systems
– Content (cookie) based request routing
– Delayed binding technique
– Limited to scanning start of flow
• TCP offload engines
– Move TCP protocol processing to NIC
– Targeting Gigabit NIC market
– Intel, NEC, Adaptec, Lucent, and others
• SSL Accelerators
– Offload encryption/decryption
– Protocol translation
• Intrusion Detection Systems
– Traffic Rates < 1Gbps
– Perform content scanning and some stream reassembly
[Figure: two message sequence diagrams. A load balancer sits between the end user and the web server, with SYN, SYN ACK, ACK, Request, and Response exchanged on each side. An SSL accelerator sits between the end user (encrypted side) and the web server (not encrypted side) with the same exchange.]

Related Work in TCP Processing
• Software-based TCP processing
– Ethereal, tcpdump, etc: require post processing
– Snort w/TCP option: larger virtual packets
– Cluster-based online monitoring system (Mao: WIDM'01)
– Bro: rule based processing (Paxson: Computer Networks'99)
– STAT/STATL: state based processing (Vigna: DISCEX'00)
– Intel: Xeon as packet processor (Regnier: HotI'03)
• Hardware-based TCP processing
– Georgia Tech: 1 flow/circuit (Necker: FCCM'02)
– University of Oslo: 1 flow/circuit (Li: FPL'03)
– Indiana University and Imperial College: Netflow statistics
– University of Tokyo: multi-flow stream scanning (Sugawara: FPL'04)
– Intel TCP processor: 8k connections, 9Gbps (Xu: HotChips'03)
• Network processors
– Intel IXP 1200, 2400, 2800, 2850
– Motorola PowerQUICC

Taxonomy of Packet Processors
[Figure: packet processors plotted by implementation (Software to Hardware) against Data Rate X Context Records. Labeled entries: TCP-Processor, Software based systems, Intel projects, Network Processors, Experimental TCP Processor, Snort w/TCP option, TCP Termination, BRO/STATL, Packet Capture, Other FPGA TCP Processors, Load Balancer, SSL Accelerator, IP Lookup, Packet Forwarding. One region is marked "Store little or no state".]

Multi-Device Coordination
• Encodes interface signals
• Regenerates waveforms on separate device
• Provides extensible format & self describing structure
[Figure: Device 1 holds the TCP Processing Circuit and an Encode block; a Transport carries the encoded signals to a Decode block feeding Data Processing Circuit 1 and Data Processing Circuit 2 on Device 2 and Device 3.]

Place & Route Results
• Including Protocol Wrappers & Encoder/Decoder
• Target Xilinx Virtex XCV2000E-8
• FPX Platform
• Number of BLOCKRAMs
– 95 out of 160 (59%)
• Number of SLICEs
– 7279 out of 19200 (37%)
• Maximum clock frequency: 85.565MHz
• Maximum data throughput: 2.7 Gbps
• Maximum packets per second: 2.9M packets/sec
– Min 29 clock cycles per packet (345 ns)
– Throughput limited by memory latency

Content Scanning
[Figure: a TCP circuit and a scan circuit, each on a Xilinx XCV2000E FPGA with PC100 SDRAM (D[64]) and ZBT SRAM (D[36]). Labeled blocks: TCP-Processor, TCP Encode, TCP Decode, Scan Circuit, CTL Proc, State Store Ctl, Cell Processor, IPWrapper, Frame Wrapper, Cell Wrapper, Control Interface, Network Traffic. Labeled operations: update state, query state.]
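
The packet rate and throughput on the Place & Route Results slide can be sanity-checked from the clock frequency and the minimum cycle count. The sketch below is only a back-of-the-envelope check: the clock frequency and the 29-cycle minimum come from the slide, while the 32-bit datapath width is an assumption (the slides do not state it) chosen because it is consistent with the quoted 2.7 Gbps. All names are illustrative.

```python
CLOCK_HZ = 85.565e6          # maximum clock frequency, from the slide
MIN_CYCLES_PER_PACKET = 29   # minimum clock cycles per packet, from the slide
DATAPATH_BITS = 32           # ASSUMPTION: datapath width is not given on the slides


def max_packets_per_second(clock_hz: float, cycles_per_packet: int) -> float:
    """Packet rate when every packet takes the minimum number of cycles."""
    return clock_hz / cycles_per_packet


def max_throughput_bps(clock_hz: float, datapath_bits: int) -> float:
    """Data rate when one datapath word is transferred on every clock cycle."""
    return clock_hz * datapath_bits


if __name__ == "__main__":
    pps = max_packets_per_second(CLOCK_HZ, MIN_CYCLES_PER_PACKET)
    bps = max_throughput_bps(CLOCK_HZ, DATAPATH_BITS)
    print("packets/sec: %.2f M" % (pps / 1e6))
    print("throughput:  %.2f Gbps" % (bps / 1e9))
    print("time/packet: %.0f ns" % (1e9 / pps))
```

Under these assumptions the results land close to the slide's 2.9M packets/sec and 2.7 Gbps. The per-packet floor of 29 cycles, rather than the datapath width, bounds the rate for small packets.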

Washington University Network
• 384 Mbps total Internet bandwidth
– 300 Mbps Internet
– 84 Mbps Internet2
• Approx 19,000 active end systems
• Approx 10,000 students
• Traffic analyzed for 5 week period
– Aug 20th to Sep 24th
– Over 1000 charts generated
• Selected highlights presented

Washington University Network
[Figure: campus network diagram showing the Internet / Internet 2 connection and the tap "To TCP Processor"]

Live Internet Traffic Analysis
[Figure: WashU Internet traffic enters a standalone FPX-in-a-Box built around a WUGS-20 switch with Ports 0 through 7. Labeled elements: two GigE Line Cards, TCP Processor, Scan Circuit, PortTracker Circuit, G-Link, Switch Ctrl, External Stats Monitor, one Unused port, and one Empty port.]

Data Collection
• Statistics are sent to StatsCollector application from hardware circuits
• StatsCollector spools raw data to disk files and retransmits stats
• An SNMP agent publishes the statistics in a standard format
• MRTG (Multi-Router Traffic Grapher) queries the SNMP agent and generates traffic charts
• A Perl script reads raw data files and calls gnuplot to generate charts
[Figure: real-time processing pipeline from StatsCollector through the SNMP Agent to the Multi-Router Traffic Grapher, and from raw data files to gnuplot, each producing charts of Pkts over Time]

Current Live Traffic
[Chart]

Collected Statistics
• TCP Statistics
– Configuration Information
– SSM New Connections, SSM End Connections, SSM Reused Connections, SSM Active Connections
– INB Input Words, INB Input Packets, INB Dropped Packets, INB Output Packets
– ENG TCP Packets, ENG SYN Packets, ENG FIN Packets, ENG RST Packets, ENG Zero Length Packets, ENG Retransmitted Packets, ENG Out-of-Sequence Pkts, ENG Bad Checksums
– RTR TCP Data Bytes, RTR Client Packets, RTR Bypass Packets
– EGR Client Packets In, EGR Bypass Packets In, EGR TCP Checksum Update, EGR Packets Out
• Protocol Statistics
– Cells In, Cells Dropped, Cells Bypass, Cells Out
– Frame Words In, Frame Packets In
– IP Packets Dropped, IP Packet Fragments, IP Packets In, IP Words In, IP Packets Bypass, IP Words Bypass, IP Bad Checksum
• Port Statistics
– FTP, SSH, Telnet, SMTP, TIM, Nameserv, Whois, Login, DNS, TFTP, Gopher, Finger, HTTP, POP, SFTP, SQL, NNTP, NetBIOS, SNMP, BGP, GACP, IRC, DLS, LDAP, HTTPS, DHCP, Lower, Upper
• Scan Statistics
– String 1, String 2, String 3, String 4

Typical Daily Traffic Pattern
[Chart annotations: Lowest activity; Highest activity]

IP and TCP Traffic Rates
[Chart annotation: >90% TCP packets]

Zero Length TCP Packets
[Chart annotation: 20-40% zero length pkts]

Fragmented IP Packets
[Chart annotation: .25% Fragmented]

Packet Sequencing
[Chart annotation: 3x-4x more retransmitted]

Packet Sequencing (cont)
[Chart annotations: 3%-4% Retransmitted; 1% Out of Seq]
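
The retransmitted and out-of-sequence counters on the Packet Sequencing slides depend on comparing each segment's sequence number with the next one expected for its flow. The slides do not describe how the TCP-Processor makes this comparison, so the sketch below shows one common approach as an assumption: 32-bit modular arithmetic in which a segment starting behind the expected sequence number counts as retransmitted and one starting ahead counts as out of sequence. The function name and return labels are hypothetical.

```python
SEQ_SPACE = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def classify_segment(expected_seq: int, seq: int) -> str:
    """Classify a segment relative to the next expected sequence number.

    Simplification (assumption): only the segment's starting sequence
    number is examined; partial overlaps are not treated specially.
    """
    # Distance from expected to seq, measured forward around the 32-bit ring.
    ahead = (seq - expected_seq) % SEQ_SPACE
    if ahead == 0:
        return "in-order"
    if ahead < SEQ_SPACE // 2:
        return "out-of-sequence"   # starts beyond the expected byte
    return "retransmitted"         # starts before the expected byte


if __name__ == "__main__":
    print(classify_segment(1000, 1000))
    print(classify_segment(1000, 2460))
    print(classify_segment(1000, 500))
    print(classify_segment(SEQ_SPACE - 6, 10))  # wraparound case
```

Taking the difference modulo 2^32 keeps the comparison correct when a long-lived flow wraps past the top of the sequence space.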

Worm/Virus Detection
• Search for digital signatures
• MyDoom (appeared 1/26/04)
– Spread via email attachment
– Opens back door via ports 3127-3198
– Contains SMTP engine to replicate itself
– Contains denial of service attack (25% operational)
– At peak, 1 in 12 emails contained virus
• Netsky (appeared 3/1/04)
– Spread via email attachment
– Scans drives C through Z looking for email addresses
– Contains SMTP engine to replicate itself

MyDoom Virus Detection
[Chart]

Netsky Virus Detection
[Chart]

Denial of Service Attack
• TCP SYN Attack
– 8 minutes in duration
– 71,000 TCP pkts/sec avg (34,000 normal)
– 40,000 TCP SYN pkts/sec avg (2,000 normal)
• IP attack (non TCP traffic)
– 3.5 minutes in duration
– 91,000 IP pkts/sec peak (36,000 normal)
– 57,000 Non-TCP pkts/sec peak (2,000 normal)

Attack Difficult to Detect
[Chart annotations: TCP: 10:25 to 10:34am; IP: 10:37 to 10:41am]

Both Attacks Visible
[Chart annotations: Non-TCP attack; TCP attack]

TCP SYN Attack
[Chart annotation: 20x increase in SYN packets]

Attack Directed at SSH
[Chart annotations: Port counter saturated; True spike at 2.4 M pkts]

Non-TCP Attack
[Chart annotation: 29x increase in non-TCP packets]

Flow Classification and Attacks
• State store contains 1 million records
• Record removed after TCP FIN or RST
• Stale records are not aged out
• 500,000 to 800,000 active records normal
• DoS attack can cause flow saturation
• Table quickly settles back to normal range

Active State Store Records
[Chart annotation: 400,000 new flows]
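
The behavior listed on the Flow Classification and Attacks slide (a fixed-size state store, records removed on FIN or RST, no aging, saturation under a flood of new flows) can be illustrated with a small software model. This is a sketch under stated assumptions, not the hardware design: it uses a bucketed hash table with a fixed number of slots per bucket as a stand-in for the "open addressing hash with limited bucket sizes" named earlier, and the class name `FlowTable`, the CRC32 hash, the slot count, and the per-flow context contents are all hypothetical.

```python
import zlib


class FlowTable:
    """Fixed-size flow state store with a bounded number of slots per bucket."""

    def __init__(self, num_buckets: int, slots_per_bucket: int = 4):
        self.num_buckets = num_buckets
        self.slots_per_bucket = slots_per_bucket  # hypothetical bucket size
        self.buckets = [[] for _ in range(num_buckets)]
        self.active = 0  # number of active flow records

    def _bucket(self, flow):
        # Hypothetical hash: CRC32 over the text form of the flow 4-tuple.
        return self.buckets[zlib.crc32(repr(flow).encode()) % self.num_buckets]

    def on_packet(self, flow, fin: bool = False, rst: bool = False) -> bool:
        """Update per-flow state; return False if the flow cannot be tracked."""
        bucket = self._bucket(flow)
        for index, (key, context) in enumerate(bucket):
            if key == flow:
                break
        else:
            if len(bucket) >= self.slots_per_bucket:
                return False  # bucket full: records are never aged out
            context = {"packets": 0}
            bucket.append((flow, context))
            index = len(bucket) - 1
            self.active += 1
        context["packets"] += 1
        if fin or rst:
            # Record is removed after a TCP FIN or RST.
            del bucket[index]
            self.active -= 1
        return True


if __name__ == "__main__":
    table = FlowTable(num_buckets=1, slots_per_bucket=2)
    flows = [("10.0.0.%d" % i, 1024 + i, "10.0.1.1", 22) for i in range(3)]
    print([table.on_packet(f) for f in flows], table.active)
```

Because nothing expires, a SYN flood of flows that never send FIN or RST fills the buckets, and later flows hashing to a full bucket go untracked until existing records are closed.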

Insights
• 20%-40% zero length packets
– Increase from 18% to 22% (Shalunov: Internet2'01)
– Implies larger amount of 1-way traffic
– Optimization skips processing of these packets
• 5% out of order packets
– Agrees with results from (Jaiswal: Infocom'03)
• Flow classification tables need to be larger
– Flow table ½ to ¾ full during normal processing
– 1M entry table saturated during attack
• Automated response systems required
– Short lived attacks difficult to address manually

Contributions
• Developed Architecture for TCP-Processor
– Hardware-based system
– High-performance (multi-gigabit networks)
– Per-flow context storage & retrieval
• Implemented TCP-Processor in Reprogrammable Hardware
– Operates at 85 MHz on Xilinx Virtex 2000E FPGA
– Maximum throughput of 2.7 Gbps
– Maximum 2.9M packets/sec
• Created inter-device protocol for TCP applications
– Multi-device coordination of TCP stream processing
– Interfaces with TCP-Processor
– Self-describing/extensible transport protocol
• Analyzed live Internet traffic
– Insight into Internet traffic profiles
• Supported academic and commercial endeavors

Future Work
• Packet defragmentation
• Flow classification
• Packet storage manager
• 10Gbps and 40Gbps data processing
• Histogram (packet size, packet type, etc)
• Event rate detection
• Traffic sampling and real-time analysis
• Application integration

Acknowledgments
• Advisor & committee
– John Lockwood (advisor)
– Chris Gill
– Ron Loui
– Ron Indeck
– Dave Schimmel
• Reuters (formerly Bridge)
– Scott Parsons
– Deb Grossman
– John Leighton
• Recommendations
• ARL faculty & staff
– Jon Turner
– Patrick Crowley
– Fred Kuhns
– John DeHart
• CSE faculty & staff
• ARL & FPX students
• NTS
• Reviewers
– Tanya Yatzeck
– James Hartley
• Family
– Jerry & Lois (parents)
– Chris & Kreslyn
– Nancy, Jeff & Nathan
– Steve Wiese
• Global Velocity
– Matthew Kulig
– Scott Parsons
– Don Bertier
– Andy Cox
– Chris Gray
• Friends

Questions