Deep Packet Inspection: Which Implementation Platform?
Sarang Dharmapurikar, Cisco

Implementation Platform
• Several choices, each with some pros and cons
  – ASICs
  – FPGA
  – Network Processors
  – Graphics Processors (nVidia)
  – Multiple-core, multi-threaded commodity processors
• Needs evaluation with respect to
  – Cost
  – Speed
  – Overall system performance (DPI is just a small piece of the puzzle)
  – Ease of use and upgrading
• A hardware-software co-design approach
  – Profile a DPI system and push some components into hardware if the overall speedup is worthwhile (Amdahl's law)

ASIC
• Examples: ClassiPi, NetLogic, Tarari, some Cisco ASICs
• Requires too much investment
  – NRE close to a million dollars!
• A long design cycle
  – Most of the time is consumed in verification
• Hard to upgrade
  – Algorithms evolve
  – It is hard to build a flexible enough ASIC
• Applications get locked to a platform
  – Migrating to a new platform requires a lot of software rewriting

FPGA
• Very flexible but expensive and power-consuming
  – Virtex-5 offers 330,000 lookup table units
  – 4MB of SRAM
• Latest Xilinx FPGAs contain multiple PowerPC cores
• Possible to design hybrid hw/sw systems
  – The components that assist DPI, such as TCP reassembly, normalization and flow classification, are done in hardware
• Several FPGA platforms for networking acceleration available today
  – NetFPGA
  – FPX
• Need to be careful in the DPI approach
  – Raw signature matching techniques that use FPGA logic resources for each signature won't scale

Network Processors
• Intel IXP2850
  – 16 micro-engines with
    • 2KB D$ and 8KB I$ and 16-entry CAM
  – An integrated XScale processor for control path
    • 32KB I$ and 32KB D$
  – 2 crypto units
  – 16KB shared scratch pad SRAM
• Cisco QuantumFlow processor
  – 40 packet processing engines (PPE), each @ 1.2 GHz
  – 4 threads per PPE
  – Dedicated hardware for queuing, buffering, IP lookup and classification

Commodity processors
• Really powerful server-class processors coming up
  – Intel's Nehalem
    • 8 cores
    • 2 threads per core
    • 32KB L1, 256KB L2, 10+MB of shared L3 cache
  – Sun's Niagara2
    • 8 cores
    • 8 threads per core!
    • 16KB I$ and 8KB D$ per core, 4MB shared L2 cache
    • Integrated cryptographic coprocessor units
• Need to think multi-core, multi-threaded
  – Think in terms of a complete system, not just pattern matching
  – Which core should do what?
• Need to design cache-friendly data structures

Conclusion
• While hardware can assist DPI systems, building proprietary hardware is not a good idea
• Let's understand the "actual" performance needs
  – Let's not be misguided by "marketing" needs
• Need to think of hardware-software co-design
  – Requires careful profiling of DPI systems to identify the components that can be pushed to hardware
• Need to design algorithms for multi-core, multi-threaded processors
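
Worked example: Amdahl's law for offload decisions
The co-design argument rests on Amdahl's law: if a component takes a fraction f of total per-packet time and hardware makes it s times faster, the overall speedup is 1 / ((1 - f) + f / s). The sketch below applies that formula to a made-up profile. The component names echo the slides, but the time fractions, the 10x hardware speedup and the function names are illustrative assumptions, not measurements from any real DPI system.

```python
def amdahl_speedup(fraction, component_speedup):
    """Overall speedup when `fraction` of the runtime is made
    `component_speedup` times faster (Amdahl's law)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if component_speedup <= 0:
        raise ValueError("component_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)


def rank_offload_candidates(profile, component_speedup):
    """Sort components by the overall speedup gained if that one
    component alone were pushed to hardware."""
    gains = [
        (name, amdahl_speedup(fraction, component_speedup))
        for name, fraction in profile.items()
    ]
    return sorted(gains, key=lambda item: item[1], reverse=True)


# Hypothetical profile: share of per-packet time spent in each component.
# These numbers are invented for illustration only.
HYPOTHETICAL_PROFILE = {
    "pattern_matching": 0.40,
    "tcp_reassembly": 0.20,
    "flow_classification": 0.15,
    "normalization": 0.10,
    "other": 0.15,
}

if __name__ == "__main__":
    # Assume (hypothetically) that hardware makes any one component 10x faster.
    for name, gain in rank_offload_candidates(HYPOTHETICAL_PROFILE, 10.0):
        print(f"{name:20s} overall speedup {gain:.2f}x")
```

Under these assumed numbers, a 10x faster pattern matcher yields only about a 1.56x system-level gain, which is the sense in which DPI is "just a small piece of the puzzle": the unaccelerated components bound the achievable speedup.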