Implementation of a Simple 8-bit Microprocessor with Reversible Energy Recovery Logic

Seokkee Kim and Soo-Ik Chae
System Design Group, School of Electrical Engineering, Seoul National University
2005/05/05

Contents
• Introduction to nRERL
• 8-bit nRERL Microprocessor
• Phase Scheduling
• Reversibility Breaking
• Measurement Results
• Future Works

SDGroup, School of Electrical Engineering, SNU

Introduction to nRERL (1)
• nRERL is nMOS Reversible Energy Recovery Logic *)
– A fully adiabatic circuit using reversible logic
– Only nMOS switches are used, by exploiting bootstrapping
– Phase pipelining using 6-phase clocked power

[Figure: phase-pipelined chain of forward logic blocks F, G, H and their reverse blocks F⁻¹, G⁻¹, H⁻¹ between signals Xi and Xi+1, driven by clocked-power phases φi through φi+5]

*) J. Lim, D.-G. Kim, and S.-I. Chae, “nMOS reversible energy recovery logic for ultra-low-energy applications,” IEEE Journal of Solid-State Circuits, vol. 35, no. 6, pp. 865-875, June 2000.

Introduction to nRERL (2)
[Figure: nRERL stage schematic and timing diagram over T0 to T6. Switches: forward logic switch (MFL, MFLB), forward isolation switch (MFI, MFIB), reverse logic switch (MRL, MRLB), reverse isolation switch (MRI, MRIB), and a clamp. Signals: Xi, Xi+1, internal nodes n1 to n4, clocked-power phases φi to φi+3. Waveform levels shown: 0, Vdd-Vthb, Vdd]

8-bit nRERL Microprocessor (1)
• Issues
– Area vs. reversibility: How should reversibility be controlled so that the microprocessor fits in the limited silicon area?
– Pipelining vs. energy: How should the phase pipelining be scheduled to minimize the total energy consumption of the microprocessor?
– Energy vs. reversibility: How can reversibility be controlled without increasing the total energy consumption of the microprocessor?

8-bit nRERL Microprocessor (2)
• A subset of the DLX instruction set architecture
– No floating-point instructions
– 19 instructions
• 5 macro-blocks: IF, ID, EXE, MEM, WB
– Fully adiabatic circuit
• The 6-phase CPG is also integrated
– A shared off-chip inductor is used

[Figure: block diagram of the 8-bit adiabatic microprocessor: Controller, ALU, Program Counter (PC), Branch PC Generator, Register File (16w x 8b), RAM (128w x 8b), ROM (64w x 20b), and the 6-phase Clocked Power Generator with fREF, fOSC, and an off-chip connection; clocked power and data flow paths are marked]

Phase Scheduling (1)
[Figure: timing chart over T0 to T17 with clock phases 0 to 5 repeating, showing the Register File, ALU, Buffer, Memory, and Writeback data stages for three cases]
– CASE I: Cycle-based scheduling
– CASE II: Phase-based scheduling (best case)
– CASE III: Phase-based scheduling (worst case)

Phase Scheduling (2)
[Figure: phase schedule of the whole pipeline over T0 to T23, with clock phases 0 to 5 repeating four times]
– Stages: Instruction Fetch, Instruction Decoding / Register Fetch, Execution, Memory Access, Writeback, Overhead
– Blocks: ROM, Register File, ALU, RAM, PC register, Page register, Control, Eqcheck, Buffer, MUX
– Signals and operations: data path, control signal, write data, forward data, Memdata, external instruction forward, decoded instructions, branch, flush, PC increment, branch PC generation, write to register

Reversibility Breaking (1)
• SERC: Self-Energy Recovery Circuit
– Energy is recovered with the circuit's own data instead of using reversible logic
– Nonadiabatic loss exists ( $\tfrac{1}{2} C V_{thb}^{2}$ )

[Figure: SERC schematic and timing diagram over T2 to T8, with clocked-power phases φ0 to φ5, signals Data and Data*, and internal nodes n7 and n8; waveform levels shown: 0, Vthb, Vdd]

Reversibility Breaking (2)
• Infinite memory cannot be implemented on the limited silicon area
• SERC is used for unwrite and refresh operations.

[Figure: SERC in memory cell. Read port: bit[n]_out, wd[m]_rd_iso_f2, wd[m]_rd_f3. Write port: wd[m]_unwr_f4 (ref_f4), wd[m]_unwr_f5 (ref_f4)]

Measurement Results
• ANAM 0.18 µm (1P6M)
– Core: 2.62 x 2.03 mm²
– CPG: 1.0 x 0.6 mm²
– Vdd = 1.8 V, Vth0 = 0.35 V
– E = 8.5 pJ/cycle (P = 7.5 µW) @ Vdd = 1.8 V, f = 880 kHz
• E_cpg = 4.97 pJ/cycle (58.5%)
• E_core = 3.53 pJ/cycle (41.5%)

[Figure: chip photograph with labeled regions: microprocessor core, ROM & PC, ALU & Register file, Memory, Control, Bias generator, 6-phase Clocked Power Routing, CPG]

Hardware Complexity

Block                     # of transistors (portion of core)   Area (portion of core)
ROM (64w x 20b)           10,000 (13.3%)                        0.60 x 0.50 mm² (7.9%)
PC                        17,000 (22.6%)                        0.60 x 0.58 mm² (9.2%)
ALU                       5,200 (6.9%)                          0.50 x 0.60 mm² (7.9%)
Reg. file (16w x 8b)      7,600 (10.1%)                         0.36 x 0.50 mm² (4.8%)
Forward                   400 (0.5%)                            0.70 x 0.24 mm² (4.4%)
RAM (128w x 8b)           28,000 (37.2%)                        0.65 x 1.30 mm² (22.3%)
Control                   5,700 (7.6%)                          1.60 x 0.70 mm² (22.2%)
Phase aligning buffers    1,400 (1.9%)                          -
Microprocessor core       75,300 (100%)                         2.62 x 2.03 mm² (100%)
CPG                       2,700                                 1.00 x 0.60 mm²
Clock routing             -                                     0.4 x 7.0 mm²
Total chip                78,000                                4.0 x 4.0 mm²

Energy Partitions
• The CPG accounts for more than half of the total energy.
– More optimization is required for the CPG design.
• At the optimal condition, the adiabatic, leakage, and CPG rail-driver energy losses should be equal.

<nRERL microprocessor> E_total (8.5 pJ/cycle): E_cpg (58.5%), E_core (41.5%)

<Partitioned by functional blocks>
ALU8b & reg. file         6%
Control & others          6%
64x20b ROM                9%
128x8b RAM                21%
CPG (clk. driver)         58%

<Partitioned by energy components>
leakage                   20%
adiabatic                 21%
CPG, rail-driver          35%
SERC                      8%
CPG, controller           16%

Comparisons (1): CMOS vs. nRERL
• Minimum energy consumption

[Figure: stacked bar chart of energy loss per cycle [pJ/cycle], CMOS (52.0 pJ) vs. nRERL (8.5 pJ), partitioned into ALU8b & reg. file, 64x20b ROM, Control & others, 128x8b RAM, and CPG (clk. driver); segment labels 4.6%, 9.7%, 16.3%, 22.1%, 47.3%]

Summary

                          8-bit nRERL microprocessor            8-bit CMOS microprocessor
Hardware complexity
  # of transistors        78,000                                15,000
  Core area               2.62 x 2.03 mm²                       0.82 x 0.51 mm²
Operating region
  Supply voltage          1.8 V                                 0.8 V ~ 1.8 V
  Frequency               200 kHz ~ 10 MHz                      ~ 1 GHz
Minimum energy consumption (optimal condition)
                          8.5 pJ/cycle @ Vdd = 1.8 V,           52.0 pJ/cycle @ Vdd = 0.65 V,
                          Vbias = 1.5 V, f = 880 kHz            f = 200 kHz ~ 1 MHz

Future Works
• A more energy-efficient CPG design is required.
• More study on complexity reduction is required to implement more complex circuits. *)

*) Seokkee Kim and S.-I. Chae, “Complexity reduction in an adiabatic microprocessor using reversible logic,” to be published in Proc. International Symposium on Low Power Electronics and Design, Aug. 2005.
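
The energy figures quoted in the measurement results can be cross-checked with a little arithmetic. The sketch below is not part of the original slides; it only recombines the reported numbers (E_cpg, E_core, the 880 kHz operating frequency, and the 52.0 pJ/cycle CMOS figure). The function name `energy_budget` and the dictionary keys are made up for illustration.

```python
# Values quoted from the Measurement Results and Comparisons slides.
E_CPG_PJ = 4.97    # clocked power generator energy, pJ/cycle
E_CORE_PJ = 3.53   # microprocessor core energy, pJ/cycle
F_HZ = 880e3       # operating frequency, Hz
E_CMOS_PJ = 52.0   # minimum energy of the CMOS reference design, pJ/cycle


def energy_budget(e_cpg_pj, e_core_pj, f_hz):
    """Combine per-cycle energies into a total, shares, and average power.

    Hypothetical helper for illustration only; it assumes the total is
    simply the sum of the CPG and core contributions.
    """
    e_total_pj = e_cpg_pj + e_core_pj
    return {
        "e_total_pj": e_total_pj,
        "cpg_share": e_cpg_pj / e_total_pj,
        "core_share": e_core_pj / e_total_pj,
        # P = E * f, converted from pJ/cycle to joules and then to microwatts
        "power_uw": e_total_pj * 1e-12 * f_hz * 1e6,
    }


budget = energy_budget(E_CPG_PJ, E_CORE_PJ, F_HZ)
ratio = E_CMOS_PJ / budget["e_total_pj"]

print(f"E_total    = {budget['e_total_pj']:.2f} pJ/cycle")
print(f"CPG share  = {budget['cpg_share']:.1%}")
print(f"Core share = {budget['core_share']:.1%}")
print(f"Power      = {budget['power_uw']:.2f} uW")
print(f"CMOS/nRERL = {ratio:.1f}x")
```

With the reported inputs, the shares come out at roughly 58.5% and 41.5%, the average power at about 7.5 µW, and the CMOS design consumes about 6.1 times the energy per cycle of the nRERL design, all consistent with the slide values.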