Paper Review
A New Generation of DSP Architectures
Bryan Ackland and Paul D’Arcy
Lucent Technologies

Babak Noory
Professor Maitham Shams
97.575
March 18, 2002

Agenda
1. Look at the evolution of Digital Signal Processors
2. Review the emerging system requirements
3. Summarize recent advances in low power DSP techniques
4. Look at a number of new high performance architectures
5. Describe a bus based multi-core architecture for task level parallelism

Introduction
General Purpose Digital Signal Processors
Introduced in 1980
- High performance engines
- MAC speed advantage of 50:1 over the best micro-processors
Today
- Modest performance improvements
- Outperformed by micro-processors

DSP Evolution
Performance of DSPs vs. Microprocessors
[Figure: peak MAC performance (log scale, 1 to 1K) versus year (1980 to 2000) for DSPs (DSP-1, DSP-16, DSP-32C, DSP-1600, DSP-16210) and microprocessors (M68000, 80286, 80386, Pentium, Pentium MMX)]
And yet, DSPs generate over $3 billion for the semiconductor industry every year.

DSP Evolution
Power and Cost of DSPs vs. Microprocessors
[Figure: power (mW/MIP, log scale, 1 to 10K) versus year (1980 to 2000), with unit price: M68000 ($200), 80286 ($200), 80386 ($300), Pentium ($500); DSP-1 ($150), DSP-32C ($250), DSP-16A ($15), DSP-1600 (<$10)]
- Lower cost
- Higher MOP/mm2 and MOP/mW

Emerging Applications
Very Low Power Applications
- Portable Applications: functionalities such as video and web browsing added to cellular phones, PDAs, and multimedia laptops
- Average power becomes the main design constraint
High Performance Applications
- Embedded Applications: digital audio broadcast and smart phones
- PC based Applications: 3-D graphics and real-time video communications
- Infrastructure Applications: modem head-end and wireless basestations

Low Power Techniques 1.
Full Custom Datapath
- Circuit Topology
- Layout Topology
- Transistor Sizing
- Layout Parasitics

Layout    Drain Capacitance
Simple    45.6 fF
Finger    18.7 fF
Ring      10.8 fF

[Figure: transistor layouts with source (S) and drain (D) regions: a) Simple (width W), b) Finger (W/2), c) Ring (W/4). Courtesy [1]]

Low Power Techniques 2.
Clock Gating
- System Level Clock Gating: Limit data transition and clock dissipation to active sub-systems
- Local Clock Gating: Deactivate non-active elements in a sequential circuit

[Figure: a crystal oscillator drives the system clock, which is gated separately for CPU Section 1 (to boards 1-3), CPU Section 2 (to boards 4-6) and CPU Section 3 (to boards 7-9). Courtesy [4]]

Operation Mode          Power
Normal Mode (80 MHz)    120 mW
Standby (Halt)          21 mW
Slow Clock (16 kHz)     2.3 mW
StopClk                 30 µW

Low Power Techniques 3.
Minimizing Data Transitions
- Applicable to circuits where data transitions are well understood
- Difficult to estimate internal node activity for complex circuits

[Figure: two gate orderings that compute the same output Z from inputs A, B and C. In the first, A and B drive internal node x, which is then combined with C; in the second, B and C drive x, which is then combined with A. Courtesy [3]]

P(A=1) = 0.5
P(B=1) = 0.2
P(C=1) = 0.1
Activity at node x = 0.09 (x driven by A and B)
Activity at node x = 0.0196 (x driven by B and C)

Low Power Techniques 4.
Partitioned Memory Architecture
- Memories occupy a great deal of silicon area, but activity factors in these individual circuits are very low.
- Adopt hierarchical sub-banking
- Replace large memory blocks with several smaller blocks
- Make use of gated clocks to limit switching activity to active blocks

Low Power Techniques 5.
Technology & Voltage Scaling
- Adjusting supply voltages to meet performance requirements
- Mixed voltage & mixed threshold logic families
- Dynamic voltage scaling: Supply voltage and clock speed vary continuously according to processor load
- Supply “cut off”: High threshold transistors used to cut off the power when the chip goes into sleep mode

Emerging Applications (Revisited)
Very Low Power Applications
- Portable Applications: functionalities such as video and web browsing added to cellular phones, PDAs, and multimedia laptops
- Average power becomes the main design constraint
High Performance Applications
- Embedded Applications: digital audio broadcast and smart phones
- PC based Applications: 3-D graphics and real-time video communications
- Infrastructure Applications: modem head-end and wireless basestations

New Class of Architectures
Minor enhancements in combination with process improvement will not meet the requirements of emerging applications. The new architectures must provide:
- Performance ranging from hundreds of MOPS to tens of GOPS
  - Parallel architectures, many operations/clock
- Large memory and I/O bandwidth
  - Cache hierarchies
- Compiler driven programming environment
  - High-level programming languages
- Scalability

Media Processors

Processor           Architecture                   Clock     Performance   Memory             Programming
TI C80              4 64b DSP + 32b RISC           40 MHz    1.2 GOPS      DRAM, 400 MB/s     Compiler + Assembler
Chromatics MPACT    VLIW/SIMD, 4 ALUs              62 MHz    2.0 GOPS      RAMBUS, 500 MB/s   In-house
Philips Tri-Media   VLIW, 25 exec. units           100 MHz   4.0 GOPS      SDRAM, 400 MB/s    VLIW Compiler
IBM MFAST           VLIW/SIMD, 4by4 folded array   50 MHz    20 GOPS       SDRAM, 800 MB/s    Compiler + Assembler
Samsung MSP-1       32-way SIMD + 32b RISC         100 MHz   6.4 GOPS      SDRAM, 800 MB/s    Compiler + Assembler

- Very high performance
- Very fast memories
Yet all programs (save Tri-Media) have been cancelled

Media Processors
Reasons:
1. Programmability Issues
   - Required large quantities of assembly code
   - Explicit management of task level and instruction level parallelism
2.
Lack of Scalability
   - Single price/performance (except for C80)
3. Difficult Market
   - Multimedia applications on PC
   - Caught between high-performance ASICs and software solutions

Daytona MIMD Architecture
- Task Level Parallelism
  - Code and data
- Scalability
  - Bus support for N DSP cores
- Cache memory
- Simulation has shown that N can be in the range of 8 to 10 processors!

[Figure: external memory and I/O attach through a Memory & I/O Controller to the STBus, which connects the host and the DSP cores, each DSP behind its own cache]

Daytona DSP Core Architecture
- LIW Machine
- 32b SPARC + 64b SIMD
- Instruction level parallelism:
  - 64b instructions
  - 2 x 32b RISC operations
  - 32b RISC + 32b coprocessor extension
- DSP core programming in C

[Figure: DSP core block diagram: STBus, Bus Interface, 8kB Instruction and Data Cache, 32b SPARC RISC µP, 64b 8-way SIMD Vector Coprocessor]

Conclusions (1)
The DSP world is changing
Emerging applications, in combination with few backward compatibility issues, require new architectures that can maximize:
- Parallelism
- Scalability
- Programmability
- Generality
While other measures must be taken to minimize:
- Cost
- Time to Market

Conclusions (2)
The DSP world is changing
- What will separate DSPs from general purpose microprocessors in the future will simply be the cost factor.
- Advances in the programmable hardware field are also very promising, and could further change the DSP landscape in the future.

References
[1] A. P. Chandrakasan and R. W. Brodersen, “Low Power Digital CMOS Design,” Kluwer Academic Publishers: Norwell, 1995.
[2] K. D. Wagner, “Clock System Design,” IEEE Design & Test of Computers, pp. 9-27, October 1988.
[3] L. Wanhammar, “DSP Integrated Circuits,” Academic Press: London, 1999.
[4] K. Hwang, “Advanced Computer Architecture: Parallelism, Scalability, Programmability,” McGraw-Hill: New York, 1993.
[5] T. Kuroda and T. Sakurai, “Overview of Low-Power ULSI Circuit Techniques,” IEICE Transactions on Electronics, Vol. E78-C, No. 4, pp. 334-344, April 1995.
[6] C. Hamacher, Z. Vranesic and S. Zaky, “Computer Organization,” fifth edition, McGraw-Hill: New York, 2002.
[7] M. M. Mano, “Computer System Architecture,” McGraw-Hill: New York, 1993.
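
Backup: Node Activity Worked Example
The two activity figures quoted under "Minimizing Data Transitions" (0.09 and 0.0196) can be reproduced with a short calculation. The sketch below is an illustration only and is not taken from the paper. It assumes that the gates are two-input ANDs (the gate symbols did not survive in the figure), that A, B and C are statistically independent, and that the activity of a node is the probability of a 0-to-1 transition, P(0) x P(1). The function names p_and and activity are hypothetical.

```python
def p_and(*probs):
    """Probability that an AND gate outputs 1, assuming independent inputs."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def activity(p_one):
    """0-to-1 transition probability of a node with signal probability p_one.

    Assumes successive values are independent, so activity = P(0) * P(1).
    """
    return (1.0 - p_one) * p_one


# Input signal probabilities quoted on the slide.
P_A, P_B, P_C = 0.5, 0.2, 0.1

# Ordering 1 (assumed): x = A AND B, then Z = x AND C.
x_ab = activity(p_and(P_A, P_B))

# Ordering 2 (assumed): x = B AND C, then Z = A AND x.
x_bc = activity(p_and(P_B, P_C))

print(f"activity at x for (A, B) grouping: {x_ab:.4f}")
print(f"activity at x for (B, C) grouping: {x_bc:.4f}")
```

Under these assumptions the script prints 0.0900 for the (A, B) grouping and 0.0196 for the (B, C) grouping, matching the slide. Both orderings compute the same output Z, so combining the two least active inputs first lowers the activity of the internal node without changing the function.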