Using System-on-a-Chip as a Vehicle for VLSI Design Education
Andrew Laffely and Wayne Burleson
Electrical and Computer Engineering, University of Massachusetts Amherst
{alaffely,burleson}@ecs.umass.edu
This material is based upon work supported by the National Science Foundation under Grant No. 9988238 and SRC Tasks 766 and 1075.
Burleson, UMASS

Challenges in VLSI Education
• Advancing Processing Technology
• Higher level design tools
• Realistic yet tractable design projects
• Preparation for jobs in semiconductor and other sectors
• Making best use of faculty/student time and university resources

ECE 559/659: VLSI Design Project (10 grads, 20 seniors)
Course Objectives:
• Learn design process for a complex VLSI in deep sub-micron CMOS
• Learn VLSI design skills and tools, including working in teams
• Learn about a particular application component and its VLSI implementation
• Learn to present formal design reviews using oral, written, graphical and web-based techniques

Key Aspects of the Course
• aSoC (home-grown SoC platform)
  - Provides a unifying framework to class
  - Allows for subdivision but inter-relation of projects
  - Interesting cutting edge architecture based on NSF- and SRC-funded research at UMASS and elsewhere
  - Covers many aspects of VLSI Design
  - Realistic constraints on area, timing, power and I/O
• Graduate and undergraduate teamwork
  - Graduate students provide leadership, motivation and experience
• Commercial tools and design flow
• Review-based evaluation
  - Oral and web-based reports for 4 different reviews: proposal, feasibility, implementation, integration

Adaptive System-on-a-Chip (aSoC)
• Tiled architecture with mesh interconnect
• Allows for heterogeneous cores
  - Differing sizes, clock rates, voltages
• Low-overhead core interface for:
  - Point to point communication pipeline
  - On-chip bus substitute for streaming applications
• Based on static scheduling
  - Fast and predictable
[Figure: aSoC tile array. Labels: Tile, mProc, Multiplier, FPGA, Core, ctrl, Communication Interface, North, South, East, West]

Communication Interface
• Core-ports
• Crossbar
• Controller
• Instruction memory
• Local frequency and voltage supply
• Custom design to maximize speed and reduce power
[Figure: communication interface block diagram. Labels: Core, Core-ports, Inputs and Outputs (North, South, East, West), Crossbar, Local Config., Decoder ("North to South & East"), Controller, PC, Instruction Memory, Local Frequency & Voltage]

Class Projects
• SoC Infrastructure (1,3)
  - Communication Interface
  - Interconnect (3)
  - Power Distribution
  - Clock System
  - Power Management
• Cores
  - Motion estimation for video encoding (2,3)
  - AES Cryptography (3)
  - Cache (2,3)
  - Huffman Coding
  - 3D Graphics (1,2,3)
  - Discrete Cosine Transform (2,3)
  - Smart Card (2,3)
(1) Used in PhD Dissertation
(2) Used in Masters Thesis
(3) Used in Publications

Design Flow
http://vsp2.ecs.umass.edu/vspg/658/TA_Tools/design_flow.html
• Architecture to Layout
  - Architecture: Block diagram of system and behavioral description
  - Logic: Gate level or schematic description
  - Circuit: Transistor sizing
  - Layout: Floorplanning, clock and power distribution
• Tools
  - VerilogXL: behavioral representation
  - VTVT: standard cell library
  - Synopsys: standard cell gate level netlist generation
  - Silicon Ensemble: standard cell netlist to layout
  - Cadence LayoutPlus: schematic and layout design
  - NCSU CDK: design and extraction rules
  - Cadence Layout vs. Schematic: layout verification
  - HSPICE: circuit simulator

aSoC Implementation and Integration
• 0.18 µm TSMC technology
• Full custom
[Figure: tile layout, 2500 λ by 3000 λ]

Advanced Signaling Techniques (building on SRC-funded work)
• Differential current sensing
• Booster Insertion
• Multi-level current signaling
• Phase coding

Circuit Level Simulation (HSPICE)
Evaluating Subsystems with realistic models
• Capacitance, resistance and inductance
• Process variations
• Process generations

Interconnect Characterization: Comparing delay and power of signaling techniques for different tile sizes at 250nm, 180nm, 130nm, 100n

Voltage Scaling Approach
• Core-ports
  - Single buffer for each stream to cross clock/voltage barrier between core and interface
  - Reading/Writing success rates indicate core utilization
    - Input blocked: Core too slow
    - Output blocked: Core too fast
• Controller
  - Interprets core-port success rates to adjust local clock and voltage
[Figure: core with buffered processing pipeline between Input Core-port and Output Core-port; "Blocked" signals feed the Clock and Supply Controller, which drives Local Vdd and Local Clock; core-ports connect to the Interconnect]

Vdd Selection Criteria
• As Vdd decreases delay increases exponentially
• Use curve to match available clock frequencies to voltages
• The voltage and frequency change reduces power by 79%, 96%, and 98.7%
• $P = \alpha C V_{dd}^{2} f$
[Figure: Normalized Core Critical Path Delay vs. Vdd. Y axis: Normalized Delay (0 to 12). X axis: Voltage (0.4 to 2). Marked points: Max Speed, 1/2 Speed, 1/4 Speed, 1/8 Speed; annotated voltages 0.73 and 1.16]

Clock Distribution
• Tiled architecture extends life of globally synchronous systems
• Precise H-tree implementation
• Load is small and equal at each branch
• Skew can be reduced by 70% with advanced deskew circuits [1]

64 tile aSoC    70nm          100nm         130nm         180nm
Chip Area       (9.24mm)^2    (13.3mm)^2    (17.2mm)^2    (23.8mm)^2
Frequency       5 GHz         2 GHz         1 GHz         0.5 GHz
Power           126 mW        240 mW        445 mW        784 mW
Mean Skew       41 ps         50 ps         92 ps         70.6 ps
Percent Skew    21%           10%           9%            4%

[1] S. Tan et al. "Clock Generation and Distribution for the First IA-64 Microprocessor" IEEE JSSC, Nov. 2000

Power Distribution
• Heterogeneous cores may require multiple power supply voltages
• Tile structure enables uniform interwoven grid
• Larger grid for higher current demands
  - Reduced resistance
  - Higher capacitance
[Figure: interwoven power grid. Rails: Gnd, Vl, Vml, Vmh, Vh]

64 tile aSoC        Vh        Vmh       Vml       Vl
Voltage             1.8V      1.16V     0.73V     0.6V
Current per Core    110mA     25mA      13mA      7mA
Total Power         12.1 W    1.86 W    607 mW    269 mW

Architecture Evaluation (Motion Estimation)
• Array-based architecture
  - Pipelined ME
• Parameterized search window size
  - Full search
  - Choose 16x16 or 8x8 windows
  - Reduce power
[Figure: motion estimation core. Labels: Memory, FIFOs, Address Generation Unit, Processing Element Array]

Modify Existing Designs
• Take existing Verilog code or hardware and improve or change functionality (e.g. add motion estimation algorithms, provide AES key-length flexibility)
• Evaluate changes in performance and overhead
[Figure: Old PE Layout and New PE Layout]

Conclusions
• Advancing Process Technology
  - Target 0.18 µm for affordable fab but also do scaling studies
• Higher level design tools
  - Combine synthesis and custom techniques
• Realistic yet tractable design projects
  - Re-use existing projects and provide unifying themes
• Preparation for jobs in semiconductor and other sectors
  - Focus on system design and appropriate levels of abstraction
  - Teach how to learn new tools
• Making best use of faculty/student time and university resources
  - Leverage research
  - Combine grad and undergrad
  - Re-use materials, tools
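
Worked example (Vdd Selection Criteria): the dynamic power relation on that slide, power proportional to Vdd squared times clock frequency, can be checked against the quoted savings with a few lines of Python. This is a minimal sketch, not the authors' calculation. It assumes the activity factor and capacitance stay constant, and it assumes that the 1.16 V, 0.73 V and 0.6 V rails from the Power Distribution slide pair with the 1/2, 1/4 and 1/8 speed points; the slides do not state that pairing explicitly. The names `relative_power` and `power_reduction_percent` are illustrative.

```python
# Relative dynamic power under combined voltage and frequency scaling,
# using P = a * C * Vdd^2 * f with a and C held constant (assumption).

V_MAX = 1.8  # volts; the highest rail listed on the Power Distribution slide

# (label, Vdd in volts, clock as a fraction of max speed).
# ASSUMPTION: this voltage-to-speed pairing is inferred, not stated on the slides.
OPERATING_POINTS = [
    ("1/2 speed", 1.16, 1 / 2),
    ("1/4 speed", 0.73, 1 / 4),
    ("1/8 speed", 0.60, 1 / 8),
]


def relative_power(vdd, f_frac, v_max=V_MAX):
    """Return P / P_max; the constants a and C cancel in the ratio."""
    return (vdd / v_max) ** 2 * f_frac


def power_reduction_percent(vdd, f_frac, v_max=V_MAX):
    """Return the percentage of power saved relative to full speed at v_max."""
    return 100.0 * (1.0 - relative_power(vdd, f_frac, v_max))


if __name__ == "__main__":
    for label, vdd, f_frac in OPERATING_POINTS:
        saved = power_reduction_percent(vdd, f_frac)
        print(f"{label}: Vdd = {vdd:.2f} V, power reduction = {saved:.1f}%")
```

Under these assumptions the three points come out at roughly 79%, 96% and 98.6%, close to the 79%, 96% and 98.7% quoted on the slide; the small gap at the lowest point suggests the plotted 1/8 speed voltage may differ slightly from 0.6 V.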
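
Sketch (Voltage Scaling Approach): the controller rule on that slide, input blocked means the core is too slow and output blocked means the core is too fast, can be illustrated as a small decision function. Everything specific here is hypothetical: the `LEVELS` table, the 0.25 threshold, the tie-breaking rule, and the function name `next_level` are assumptions for illustration, not the aSoC controller's actual design.

```python
# Illustrative clock/voltage level selection from core-port blocked rates.
# The level table, threshold and tie-breaking are hypothetical choices.

# (clock as a fraction of max speed, Vdd in volts), slowest first.
# ASSUMPTION: pairing inferred from the figures quoted on the slides.
LEVELS = [(1 / 8, 0.60), (1 / 4, 0.73), (1 / 2, 1.16), (1.0, 1.80)]


def next_level(level, input_blocked_rate, output_blocked_rate, threshold=0.25):
    """Return the next index into LEVELS.

    input_blocked_rate:  fraction of recent input core-port transfers that
                         were blocked (slide: "Core too slow").
    output_blocked_rate: fraction of recent output core-port transfers that
                         were blocked (slide: "Core too fast").
    """
    if input_blocked_rate > threshold and input_blocked_rate >= output_blocked_rate:
        # Core cannot keep up with arriving data: step clock and Vdd up.
        return min(level + 1, len(LEVELS) - 1)
    if output_blocked_rate > threshold:
        # Core produces data faster than it can be sent: step down to save power.
        return max(level - 1, 0)
    return level  # Rates are balanced: hold the current operating point.


if __name__ == "__main__":
    level = 1
    for rates in [(0.6, 0.0), (0.5, 0.1), (0.0, 0.7), (0.1, 0.1)]:
        level = next_level(level, *rates)
        freq, vdd = LEVELS[level]
        print(f"blocked in/out = {rates}: run at {freq:.3f} x f_max, {vdd:.2f} V")
```

Moving one level at a time keeps the sketch simple; a real controller would also need some averaging window for the rates, which is left out here.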
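
Sketch (Architecture Evaluation, Motion Estimation): the slide lists full search with a selectable 16x16 or 8x8 window as a way to reduce power. The following pure-Python sketch shows why a smaller window means less work. It assumes a sum-of-absolute-differences (SAD) cost, which the slides do not name, and it interprets "window" as the number of candidate displacements per axis. The function names, the default block size `n`, and the frame representation (lists of rows) are all illustrative assumptions, not the class project's implementation.

```python
# Full-search block matching with a parameterized search window.
# ASSUMPTIONS: SAD cost; window = candidate displacements per axis.


def sad(cur, ref, cy, cx, ry, rx, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (cy, cx) and the n x n block of `ref` at (ry, rx)."""
    return sum(
        abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
        for i in range(n)
        for j in range(n)
    )


def full_search(cur, ref, top, left, n=8, window=16):
    """Test every displacement in a window x window area around (top, left).

    Returns (best_dy, best_dx, best_sad, candidates_tested).
    """
    height, width = len(ref), len(ref[0])
    half = window // 2
    best = None
    tested = 0
    for dy in range(-half, half):
        for dx in range(-half, half):
            ry, rx = top + dy, left + dx
            if ry < 0 or rx < 0 or ry + n > height or rx + n > width:
                continue  # candidate block falls outside the reference frame
            tested += 1
            cost = sad(cur, ref, top, left, ry, rx, n)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best + (tested,)


if __name__ == "__main__":
    # Toy frames: a 4x4 patch that moved by (dy, dx) = (2, -1) between frames.
    ref = [[0] * 16 for _ in range(16)]
    cur = [[0] * 16 for _ in range(16)]
    for i in range(4):
        for j in range(4):
            ref[6 + i][5 + j] = 10 + 4 * i + j
            cur[4 + i][6 + j] = 10 + 4 * i + j
    for window in (8, 16):
        print(window, full_search(cur, ref, 4, 6, n=4, window=window))
```

The number of SAD evaluations grows with the square of the window size, so the smaller window trades search range for fewer operations, which is the power saving the slide points to.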