MPSoC Clock and Power
Olivier Franza, Intel
• Increased uncertainty with process scaling
  – Process, voltage, temperature variations, noise, coupling
• Affects design margin: over-design, power and performance loss
  – Increased power constraints
  – Increasing leakage, power (density, delivery) limitations
• More transistors mean:
  – Larger clock distribution networks
  – Higher capacitance (more load and parasitics)
• With each new technology:
  – Gate delay decreases ~25%
  – Wire delay increases ~100%
  – Cross-chip communication increases
  – Clock needs multiple cycles to cover die
© 조준동, Fall 2006

With technology scaling, the resistive component increases while the capacitance does not decrease.

Energy consumed by on-chip buses is one quarter of the total energy.

Interconnect Delays & Density
Hannu Tenhunen & Dr. Li-Rong Zheng, Royal Institute of Technology

Multiple Clocks due to Interconnect Limitation

At reduced performance, larger resource size

Multiple clock domains
• Low skew and jitter ALWAYS a must
• Clock modeling requires more accuracy
• Within-die variations, inductance, crosstalk, electromigration, self-heat, …
• Floor plan modularity
• Think adding/removing cores seamlessly!
• Hierarchical clock partitioning
• Reduce global clock and possibly relax its requirements
• Generate “locally”-used clock “locally”
• Implement clock domain deskewing techniques
• Bound clock problem into simple, reliable, efficient domains

DEC/Compaq Alpha
More complex core to improve performance, more complex clocks (?)
Source: DEC/Compaq
  – Gronowski et al., JSSC 1998
  – Xanthopoulos et al., ISSCC 2001
  – Barroso et al., ISCA 2000

Clock and Power Convergence
Intel® Itanium® Montecito
• Each core split into 3 clock domains on variable power supply
• Each domain controlled by Digital Frequency Divider (DFD) generating low-skew variable-frequency clocks; fed by central PLL and aligned through phase detectors
• Regional Voltage Detector (RVD): supply voltage monitor
• Second level clock buffer (SLCB): digitally controlled delay buffer for active deskewing
• Regional Active Deskew (RAD): phase comparators monitoring and adjusting delay difference between SLCBs
• Clock Vernier Device (CVD): digitally controlled delay buffer

On-Chip Interconnects: Circuits and Signaling
Wayne Burleson
• Using Vdd programmability
• High Vdd to devices on critical path
• Low Vdd to devices on non-critical paths
• VddOff for inactive paths
[Figure: A – Baseline Fabric; B – Fabric with Vdd Configurable Interconnect]
This work builds on a similar idea for FPGAs described in: Fei Li, Yan Lin and Lei He, “Vdd Programmability to Reduce FPGA Interconnect Power,” IEEE/ACM International Conference on Computer-Aided Design, Nov. 2004.

From Spaghetti Wires to NoC
Marcello Coppola, STMicroelectronics

Benchmarks, EE Times, 7/2005
• Xpipes, Bologna and Stanford: compared with AMBA AHB multilayer bus, 21% faster, but worse latency
• Wehn, Univ. of Kaiserslautern: LDPC decoder: 500 MHz vs. 64 MHz (fixed bus), but 30 W vs. 700 mW, twice the die size
• Arteris: better die size, comparable power consumption, 740 MHz (250 MHz)
• SonicsMX: power-efficient mobile handset with power management
• STNoC, Spidergon: topology with degree 2-3

NoC Applications
http://www.eit.uni-kl.de/wehn
• Turbo decoder, UMTS compliant, 100 Mbit: large flexibility with 14 parallel units, area = 16.84 mm² (14 mm² PUs, 2.8 mm² NoC)
• LDPC decoding: T. Theocharides, G. Link, N. Vijaykrishnan, M. J. Irwin, Int. Conference on VLSI Design 2005
  – 1024-bit block size, 1.2 Gb/s, R = 0.75
  – NoC: 5x5 2D mesh, dimension-order routing, large flexibility
  – 160 nm CMOS technology, 1.8 V, 500 MHz, 110 mm², ~30 W

Reliable design, G. De Micheli
1. Manufacturing imperfections: more likely to happen as lithography scales down
2. Approximations during design: uncertainty about details of design
3. Aging: oxide breakdown, electromigration
4. Environment-induced: soft errors (data corruption due to external radiation exposure), electromagnetic interference
5. Operating-mode induced: extremely low voltage supply

Dealing with variability
• Most variability problems that induce timing errors:
1. Power supply variation
2. Wire length estimation
3. Crosstalk
4. Soft errors

Adaptive low-power transmission scheme
Frédéric Worm, Patrick Thiran, Giovanni De Micheli, and Paolo Ienne. Self-calibrating Networks-on-Chip. In Proceedings of the IEEE International Symposium on Circuits and Systems, Kobe, Japan, May 2005.

Reduced Energy Consumption

Low-Power Network-on-Chip for High-Performance SoC Design
Lee, K.; Lee, S.-J.; Yoo, H.-J.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 14, Issue 2, Feb.
2006, Page(s): 148–160
Digital Object Identifier 10.1109/TVLSI.2005.863753
Se-Joong Lee; Kangmin Lee; Seong-Jun Song; Hoi-Jun Yoo
IEEE Transactions on Circuits and Systems II: Express Briefs [see also IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing], Volume 52, Issue 6, June 2005, Page(s): 308–312
Digital Object Identifier 10.1109/TCSII.2005.848972
Sungkyunkwan University, School of Information and Communication Engineering
© 조준동, Summer 2006

Contents
• Introduction
• NoC Architecture
  – Overall Architecture
  – Packet Routing Scheme
  – Chip-to-Chip Connectivity
  – Topology Selection
  – Physical Transfer Unit Size
  – Hierarchical Circuit/Packet Switching
  – Synchronization
  – NoC Protocol
• Low-Power Techniques
  – Low-Swing Signal
  – Mux-Tree Based Round-Robin Scheduler
  – Crossbar Partial Activation Technique
  – Low-Energy Coding On-Chip Serial Link
• Implementation and Measurement Result
• Conclusion

Introduction
• System-on-Chip (SoC)
  – More than a billion transistors are integrated on a single chip [1.1]
  – Wire delays have become more critical [1.2]
  – The synchronization problem
    • The clock frequencies increase
    • The feature sizes decrease
• NoC Solution
  – How to interconnect efficiently
• Focus
  – Performance and scalability
[Figure: Heterogeneous NoC architecture]

NoC Architecture
• Overall Architecture
  – Essential parts of the NoC
    • Network Interface (NI)
    • Up-Sampler (UPS)
    • Link wires
    • FIFO synchronizer with a queuing buffer (SYNC)
    • Switch
    • Down-Sampler (DNS)
    • Off-chip Gateway (OGW)
[Figure: Overall architecture of the on-chip network]

NoC Architecture
• Packet Routing Scheme
  – Routing process
    • A packet is transferred to a destination according to route information in the packet header.
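The routing process above can be illustrated with a small sketch. It assumes a source-routed header in which each switch reads one output-port index and then shifts that field out before forwarding. The 4-bit field width, the bit ordering, and the function names are hypothetical choices for illustration; the actual header format is the one shown in the slide figures.

```python
PORT_BITS = 4  # assumed width of one switch-port index field (hypothetical)
PORT_MASK = (1 << PORT_BITS) - 1


def make_header(ports):
    """Pack output-port indices into one integer, first hop in the low bits."""
    header = 0
    for hop, port in enumerate(ports):
        if not 0 <= port <= PORT_MASK:
            raise ValueError("port index does not fit in the field")
        header |= port << (hop * PORT_BITS)
    return header


def switch_step(header):
    """One switch: read the lowest field as the output port, then shift it out."""
    out_port = header & PORT_MASK
    modified_header = header >> PORT_BITS  # header modification for the next hop
    return out_port, modified_header


def route(header, hops):
    """Follow the header through `hops` switches and return the ports taken."""
    path = []
    for _ in range(hops):
        port, header = switch_step(header)
        path.append(port)
    return path


if __name__ == "__main__":
    hdr = make_header([2, 0, 3])
    print(route(hdr, 3))
```

Because every switch only inspects the lowest field, no switch needs a routing table in this sketch; the sender decides the whole path when it builds the header.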
[Figures: Switch port index; Header format; Header modification]

NoC Architecture
• Chip-to-Chip Connectivity
  – Off-chip Gateway (OGW)
    • OGWs provide chip-to-chip packet transactions
[Figure: Chip-to-chip connection using OGWs]

NoC Architecture
• Topology Selection
  – The first step of NoC architecture design
  – There are two basic topologies
    • Mesh topology
      – Widely used and studied for parallel computing architectures
    • Star topology
      – Not popularly used because of its limited scalability
      – Well suited to SoC design (PUs may be placed irregularly to minimize chip area)
  – Network area, with N = the number of PUs:
    $\text{Mesh area} = (5N - 4\sqrt{N})\,A_{SYNC} + (25N - 36\sqrt{N})\,A_{1SW} + 4(N - \sqrt{N})\,A_{LNK}$
    $\text{Star area} = N\,A_{SYNC} + N^{2}\,A_{1SW} + N\sqrt{N}\,A_{RT}$

NoC Architecture
• Topology Selection
[Figure: Energy consumption according to the number of PUs]

NoC Architecture
• Topology Selection
[Figure: Network area according to the number of PUs]
  – The network area cost includes the area of:
    • Switches
    • Multiplexers
    • Demultiplexers
    • Links

NoC Architecture
• Physical Transfer Unit (PHIT) Size
  – A packet is divided and transmitted through the core network.
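Dividing a packet into phits can be sketched as follows. The sketch takes the serialization ratio of this slide (packet size divided by phit size) as given; the 32-bit packet, the 8-bit phit, the least-significant-first ordering, and the function names are assumptions made only for illustration.

```python
def split_into_phits(packet, packet_size, phit_size):
    """Split a packet_size-bit packet into phits, least significant phit first."""
    if packet_size % phit_size != 0:
        raise ValueError("packet size must be a multiple of phit size")
    serr = packet_size // phit_size  # serialization ratio
    mask = (1 << phit_size) - 1
    return [(packet >> (k * phit_size)) & mask for k in range(serr)]


def reassemble(phits, phit_size):
    """Rebuild the packet from phits received least significant phit first."""
    packet = 0
    for k, phit in enumerate(phits):
        packet |= phit << (k * phit_size)
    return packet


if __name__ == "__main__":
    phits = split_into_phits(0xDEADBEEF, packet_size=32, phit_size=8)
    print([hex(p) for p in phits])
    print(hex(reassemble(phits, 8)))
```

A smaller phit needs fewer wires per link but more transfer cycles per packet, which is the trade-off the serialization ratio captures.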
Serialization ratio (SERR) = packet size / phit size
[Figure: Energy and area of OCN according to SERR]

NoC Architecture
• Hierarchical Circuit and Packet Switching
  – Local intracluster network → circuit switching
    • Circuit switching does not need packet buffers
    • Area and power consumption can be reduced
  – Global intercluster network → packet switching
    • The global intercluster traffic shares the bandwidth of the switch-to-switch link
    • Throughput of the shared, limited link matters more than latency

NoC Architecture
• Synchronization
  – Heterogeneous multiprocessing system (multiple timing references)
[Figure: Synchronization structure in the NoC]

NoC Architecture
• NoC Protocol
[Figures: Packet format; Burst READ/WRITE transactions]

Low-Power Techniques
• Low-Swing Signaling
  – The global link consumes more power than the local link does
  – Low-swing signaling can reduce its energy consumption significantly [2.13]
[Figure: Low-swing signaling and its transceiver circuits]

Low-Power Techniques
• Low-Swing Signaling
[Figures: (a) Energy consumption; (b) Energy and delay product]

Low-Power Techniques
• Mux-Tree Based Round-Robin Scheduler
[Figure: Mux-tree-based round-robin scheduler]

Low-Power Techniques
• Crossbar Partial Activation Technique (CPAT) [2.10]
[Figure: Schematic diagram of crossbar]

Low-Power Techniques
• Low-Energy Coding on On-Chip Serial Link
[Figure: Transitions in parallel with serial communications]

Low-Power Techniques
• Low-Energy Coding on On-Chip Serial Link
  – Serialized low-energy transmission (SILENT) technique [2.18]
    • To minimize the transmission energy on the serial wire by using the data correlation properties
  – The encoding algorithm is expressed as follows:
    $b^{(t)}[i] = B^{(t)}[i] \oplus B^{(t-1)}[i], \quad \text{for } i = 0 \sim (n-1)$
    $B^{(t)}[n-1:0]$: n-bit original data word at time t
    $b^{(t)}[n-1:0]$: n-bit encoded data word at time t
[Figure: Encoder/Decoder]

Low-Power Techniques
• Low-Energy Coding on On-Chip Serial Link
  – Serialized low-energy transmission (SILENT) technique [2.18]
[Figure: Average power consumption on serial communications]

Implementation and Measurement Results
• Implemented multimedia SoC
  – Proposed NoC architecture, protocol and low-power techniques
[Figure: Block diagram of a prototype SoC]

Implementation and Measurement Results
• Implemented multimedia SoC
  – Proposed NoC architecture, protocol and low-power techniques
[Figures: Measured packet signals; On-chip network power consumption]

Conclusion
• A low-power NoC is designed and implemented for high-performance SoC applications
• Heterogeneous IPs are interconnected in a hierarchical star topology
• Various power-efficient techniques were suggested and implemented

Reference
[1.1] R. Woo et al., “A 210 mW graphics LSI implementing full 3-D pipeline with 264 Mtexels/s texturing for mobile multimedia applications,” in ISSCC Tech. Dig., 2003, pp. 44–45.
[1.2] AMBA™ AXI Protocol Specification (2003). [Online]. Available: http://www.arm.com
[1.3] M. Sgroi et al., “Addressing the system-on-a-chip interconnection woes through communication-based design,” in Proc. Design Automation Conf., 2001, pp. 667–672.
[1.4] L. Benini et al., “Powering networks on chips,” in Proc. Int. Symp. System Synthesis, 2001, pp. 33–38.
[1.5] P. Guerrier et al., “A generic architecture for on-chip packet-switched interconnections,” in Proc. Design Automation Test Eur. Conf. Exhib., 2000, pp. 250–256.
[1.6] S. Kumar et al., “A network on chip architecture and design methodology,” in Proc. Ann. Symp. VLSI, 2002, pp. 117–124.
[1.7] W. J. Dally et al., “Route packets, not wires: On-chip interconnection networks,” in Proc. Design Automation Conf., 2001, pp. 684–689.
[1.8] H. Zhang et al., “A 1-V heterogeneous reconfigurable DSP IC for wireless baseband digital signal processing,” J. Solid-State Circuits, vol. 35, no. 11, pp. 1697–2000, Nov. 2000.
[1.9] M. Taylor et al., “A 16-issue multiple-program-counter microprocessor with point-to-point scalar operand network,” in ISSCC Tech. Dig., 2003, pp. 170–171.
[1.10] S. Lee et al., “An 800 MHz star-connected on-chip network for application to systems on a chip,” in ISSCC Tech. Dig., 2003, pp. 468–469.
[1.11] K. Lee et al., “A 51 mW 1.6 GHz on-chip network for low power heterogeneous SoC platform,” in ISSCC Tech. Dig., 2004, pp. 152–153.
[1.12] A. Laffely et al., “Adaptive system on a chip (ASOC): A backbone for power-aware signal processing cores,” in Proc. IEEE Int. Conf. Image Processing, Barcelona, Spain, Sep. 2003.
[1.13] E. Rijpkema et al., “Tradeoffs in the design of a router with both guaranteed and best-effort services for networks on chip,” in Proc. Design Automation Test in Eur., Mar. 2003, pp. 350–355.
[1.14] D. Bertozzi et al., “Xpipes: A network-on-chip architecture for gigascale system-on-chip,” IEEE Circuits Syst. Mag., vol. 4, no. 2, pp. 18–31, Feb. 2004.
[1.15] S. Kimura et al., “An on-chip high speed serial communication method based on independent ring oscillators,” in ISSCC Tech. Dig., 2003, pp. 390–391.
[1.16] H. J. Siegel et al., Interconnection Networks for Large-Scale Parallel Processing: Theory and Case Studies. New York: McGraw-Hill, 1990.
[1.17] K. Lee et al., “SILENT: Serialized low energy transmission coding for on-chip interconnection networks,” in Proc. Int. Conf. Computer-Aided Design, 2004, pp. 448–451.
[1.18] W. J. Dally et al., Digital Systems Engineering. Cambridge, U.K.: Cambridge Univ. Press, 1998, ch. 10.
[1.19] M. Cooperman et al., “CMOS gigabit-per-second switching,” J. Solid-State Circuits, vol. 28, no. 6, pp. 631–639, Jun. 1993.
[1.20] T. T. Ye et al., “Analysis of power consumption on switch fabrics in network routers,” in Proc. Design Automation Conf., 2002, pp. 524–529.
[1.21] K. Lee et al., “A distributed on-chip crossbar switch scheduler for on-chip networks,” in Proc. Custom Integrated Circuits Conf., May 2003, pp. 671–674.
[2.1] International Technology Roadmap for Semiconductors. [Online]. Available: http://public.itrs.net
[2.2] W. Dally et al., “Route packets, not wires: On-chip interconnection networks,” in Proc. Des. Autom. Conf., Jun. 2001, pp. 684–689.
[2.3] L. Benini et al., “Networks on chips: A new SoC paradigm,” IEEE Computer, vol. 36, no. 1, pp. 70–78, Jan. 2002.
[2.4] D. Bertozzi et al., “Xpipes: A network-on-chip architecture for gigascale system-on-chip,” IEEE Circuits Syst. Mag., vol. 4, no. 2, pp. 18–31, 2004.
[2.5] E. Rijpkema et al., “Trade offs in the design of a router with both guaranteed and best-effort services for networks on chip,” in Proc. Des., Autom. Test Europe Conf., Mar. 2003, pp. 350–355.
[2.6] V. Nollet et al., “Operating-system controlled network on chip,” in Proc. Des. Autom. Conf., Jun. 2004, pp. 256–259.
[2.7] J.-S. Kim et al., “On-chip network based embedded core testing,” in Proc. IEEE Int. SoC Conf., Sep. 2004, pp. 223–226.
[2.8] S.-J. Lee et al., “An 800 MHz star-connected on-chip network for application to systems on a chip,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2003, pp. 468–469.
[2.9] M. Taylor et al., “A 16-issue multiple-program-counter microprocessor with point-to-point scalar operand network,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2003, pp. 170–171.
[2.10] K. Lee et al., “A 51 mW 1.6 GHz on-chip network for low-power heterogeneous SoC platform,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2004, pp. 152–153.
[2.11] H. Wang et al., “A technology-aware and energy-oriented topology exploration for on-chip networks,” in Proc. Des., Autom. Test Europe Conf., Mar. 2005, pp. 1238–1243.
[2.12] BONE: Network-on-Chip Protocol. [Online]. Available: http://ssl.kaist.ac.kr/ocn
[2.13] R. Ho et al., “Efficient on-chip global interconnects,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2003, pp. 271–274.
[2.14] C. Svensson, “Optimum voltage swing on on-chip and off-chip interconnect,” IEEE J. Solid-State Circuits, vol. 36, no. 7, pp. 1108–1112, Jul. 2001.
[2.15] P. Gupta et al., “Design and implementing a fast crossbar scheduler,” IEEE Micro, vol. 19, no. 1, pp. 20–28, Jan./Feb. 1999.
[2.16] E. Shin et al., “Round-robin arbiter design and generation,” in Proc. IEEE Int. Symp. Syst. Synthesis, Oct. 2002, pp. 243–248.
[2.17] P. Landman et al., “Architectural power analysis: The dual bit type method,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 3, no. 2, pp. 173–187, Jun. 1995.
[2.18] K. Lee et al., “SILENT: Serialized low-energy transmission coding for on-chip interconnection networks,” in IEEE Int. Conf. Comput.-Aided Des. Dig. Tech. Papers, Nov. 2004, pp. 448–451.
[2.19] R. Woo et al., “A 210-mW graphics LSI implementing full 3-D pipeline with 264 Mtexels/s texturing for mobile multimedia applications,” IEEE J. Solid-State Circuits, vol. 39, no. 2, pp. 358–367, Feb. 2004.
[2.20] C. Kretzschmar et al., “Why transition coding for power minimization of on-chip buses does not work,” in Proc. Des. Autom. Test Europe Conf. (DATE), Feb. 2004, pp. 512–517.
[2.21] M. R. Stan et al., “Bus-invert coding for low-power I/O,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 3, no. 1, pp. 49–58, Mar. 1995.
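As a companion to the SILENT slide in the Low-Power Techniques section, the following sketch illustrates the encoding relation given there: each encoded bit is the XOR of the current and previous original bits. The decoder is simply the inverse of that relation. The assumption that the word before the first one is all zeros, the LSB-first serial order, and the transition counter used as a rough proxy for wire energy are illustrative choices, not details taken from the slides.

```python
def silent_encode(words, n):
    """Encode n-bit words as b(t) = B(t) XOR B(t-1), with B(-1) assumed to be 0."""
    mask = (1 << n) - 1
    prev = 0
    encoded = []
    for word in words:
        encoded.append((word ^ prev) & mask)
        prev = word
    return encoded


def silent_decode(codes, n):
    """Invert the encoding: B(t) = b(t) XOR B(t-1), with B(-1) assumed to be 0."""
    mask = (1 << n) - 1
    prev = 0
    decoded = []
    for code in codes:
        word = (code ^ prev) & mask
        decoded.append(word)
        prev = word
    return decoded


def serial_transitions(words, n):
    """Count level changes on one wire when words are sent LSB first (assumed order)."""
    level = 0  # assumed idle level of the wire
    count = 0
    for word in words:
        for i in range(n):
            bit = (word >> i) & 1
            if bit != level:
                count += 1
            level = bit
    return count


if __name__ == "__main__":
    data = [0x55, 0x55, 0x55, 0x54]  # strongly correlated consecutive words
    coded = silent_encode(data, 8)
    print(serial_transitions(data, 8), serial_transitions(coded, 8))
    print(silent_decode(coded, 8) == data)
```

When consecutive words are similar, the encoded words are mostly zeros, so the serial wire toggles far less often than it would with the raw data. For uncorrelated data this sketch gives no such benefit.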