Energy Efficient and High Speed On-Chip Ternary Bus

Chunjie Duan, Mitsubishi Electric Research Labs, Cambridge, MA, USA
Sunil P. Khatri, Texas A&M University, College Station, TX, USA
03/13/2008

Motivation
• Trends in VLSI design
  – Shrinking feature size: Deep SubMicron (DSM) and Very Deep SubMicron (VDSM) processes
  – Scaling down supply voltage
  – Increasing die size (e.g. SoC, NoC, CMP)
• Impacts
  – Smaller gate delay (high speed logic)
  – Lower switching power per gate
  – High complexity (>billion gates)
  – Increasing power consumption
  – Higher leakage current (standby power)
  – Reduced noise margin
  – Increasing interconnect delay
• Interconnect delay >> gate delay
• Global interconnect becomes the performance bottleneck

On-chip Bus Interconnects
• The impact of DSM / VDSM:
  – W↓, P↓
  – L↑, T↑ to avoid a quadratic increase in the resistance of the wire: $R = \rho L / (W T)$
• Inter-wire capacitance CI is much greater than substrate capacitance CL → crosstalk becomes dominant
  – λ = CI / CL > 10 for metal 4 in a 0.1 µm CMOS process
[Figure: wire cross-sections in an earlier process and in a DSM process, labeled with W, P, L, T, CI and CL]

Ternary Bus and Mapping
• Advantage of a ternary bus
  – Low voltage step: Vdd/2 instead of Vdd
• We propose a bit-to-bit binary-ternary mapping scheme
  – Each binary bit is mapped directly to a line on the ternary bus.
  – A binary 0 is mapped to the middle value on the ternary bus, i.e. 0b → 0t.
  – A binary 1 is mapped to either the high or the low value on the ternary bus, i.e. 1b → +t or 1b → -t.
• Disadvantage: lower bit density (1 bit/line vs 1.58 bit/line for a true ternary bus)
• Advantages: direct mapping and flexible polarity
  – Ternary to binary conversion for a true ternary bus is very slow and complex.
  – Flexible polarity results in low crosstalk, e.g., the ternary vectors +0+, -0-, +0- and -0+ all represent the same binary value 101.
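As a minimal sketch of the bit-to-bit mapping described above (not the authors' implementation), the Python snippet below decodes a ternary vector by magnitude only and enumerates the polarity choices available for a binary word. The function names `ternary_to_binary` and `ternary_encodings` are hypothetical, and the characters `+`, `0`, `-` are assumed to stand for the high, middle and low bus levels.

```python
import math
from itertools import product


def ternary_to_binary(tvec: str) -> str:
    """Decode by magnitude only: 0t -> 0b, +t or -t -> 1b."""
    return "".join("0" if t == "0" else "1" for t in tvec)


def ternary_encodings(bits: str) -> list:
    """All ternary vectors that carry `bits`; the polarity of each 1b is free."""
    choices = [("0",) if b == "0" else ("+", "-") for b in bits]
    return ["".join(c) for c in product(*choices)]


encodings = ternary_encodings("101")
print(encodings)                                  # four polarity variants of 101
print({ternary_to_binary(t) for t in encodings})  # all decode to the same word
print(round(math.log2(3), 2))                     # bits per line of a true ternary bus
```

A word with m ones has 2^m valid ternary encodings, which is the freedom the polarity coding exploits to lower crosstalk.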
• Each ternary value is represented by the polarity Pj and the magnitude Dj

Dj  Pj  Tj  Vout
0   X   0   V0
1   0   -   V-
1   1   +   V+
Ternary driver truth table

Crosstalk in a Multi-valued Bus
• Define the effective crosstalk as
  $X_{eff,j} = \left| 2\delta_j - \delta_{j,j-1} - \delta_{j,j+1} \right|$
  – where $\delta_{j,k} = \mathrm{sgn}(\Delta V_j)\,\Delta V_k / V_{step}$ is the normalized voltage change, $V_{step} = V_{dd} / (NOL - 1)$ and $\delta_j = |\Delta V_j| / V_{step}$. NOL is the number of logic levels.
• Delay can be approximated as
  $\tau_j = k\,C_L\,V_{step}\,(\delta_j + \lambda X_{eff,j})$
  – for λ >> 1, $\tau_j \approx k\,C_L\,V_{step}\,\lambda\,X_{eff,j}$
• Energy consumption is
  $E_{total} = \sum_{j=1}^{n} (\delta_j + \lambda X_{eff,j})\,C_L\,V_{step}^2$
  – when λ >> 1, $E_{total} \approx \lambda\,C_L \sum_{j=1}^{n} X_{eff,j}\,V_{step}^2$
• For a ternary bus, Vstep = Vdd/2, and we know
  – max(Xeff,j) = 8
  – min(Xeff,j) = 0
• Bus speed/power is highly data pattern dependent!

Table 1. Examples of Total Crosstalk
Vt-1  Vt   Xeff
000   +++  0
000   0++  1
000   0+-  5
+0+   0+0  4
+0+   0-0  0
-+0   +-0  6
+-+   -+-  8
+++   ---  0

A Low Power, High Speed 4X Ternary Bus
• Using direct bit-to-bit mapping
• Coding rules:
  – Rule #1: A direct - ↔ + transition is prohibited.
  – Rule #2: A 1b → 0b transition is mapped as -t → 0t or +t → 0t, depending only on the current polarity of the 1b.
  – Rule #3: For a 0b → 1b transition on bj, if bj-1 is transitioning, Pj is coded so that both lines transition in the same direction.
  – Rule #4: For a 0b → 1b transition on bj, if bj-1 is not transitioning and bj+1 is transitioning from 1 to 0, Pj is coded so that the jth and (j+1)th lines transition in the same direction.
  – Rule #5: For a 0b → 1b transition on bj, if there is no transition on either neighbor, Pj is coded so that {Pj = Pj-1 or Pj = Pj+1}, with Pj = Pj-1 having the higher priority.
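The effective-crosstalk definition and the effect of Rule #1 can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the authors' tool: levels are expressed in units of Vstep = Vdd/2, a missing neighbor at the edge of the bus is assumed to contribute no voltage change, and the names `x_eff` and `max_x_eff` are hypothetical.

```python
from itertools import product

LEVEL = {"-": -1, "0": 0, "+": 1}  # bus levels in units of Vstep = Vdd/2


def sgn(x: int) -> int:
    return (x > 0) - (x < 0)


def x_eff(prev: str, curr: str, j: int) -> int:
    """Effective crosstalk on line j for the bus transition prev -> curr."""
    dv = [LEVEL[c] - LEVEL[p] for p, c in zip(prev, curr)]
    # Assumption: a missing neighbor at the bus edge contributes no change.
    left = dv[j - 1] if j > 0 else 0
    right = dv[j + 1] if j + 1 < len(dv) else 0
    s = sgn(dv[j])
    # |2*delta_j - delta_{j,j-1} - delta_{j,j+1}| with delta_j = |dV_j|
    return abs(2 * abs(dv[j]) - s * left - s * right)


def max_x_eff(rule1: bool) -> int:
    """Brute-force the worst case on the middle line of a 3-line bus."""
    worst = 0
    for prev in product("-0+", repeat=3):
        for curr in product("-0+", repeat=3):
            # Rule #1: skip any pattern with a direct - <-> + swing on a line.
            if rule1 and any(abs(LEVEL[c] - LEVEL[p]) == 2
                             for p, c in zip(prev, curr)):
                continue
            worst = max(worst, x_eff("".join(prev), "".join(curr), 1))
    return worst


print(x_eff("+0+", "0+0", 1))   # 4, as in Table 1
print(x_eff("+-+", "-+-", 1))   # 8, as in Table 1
print(max_x_eff(rule1=False))   # 8 for an uncoded ternary bus
print(max_x_eff(rule1=True))    # 4 once direct - <-> + swings are excluded
```

The brute-force search only covers the middle line of a three-line window, which is sufficient here because the effective crosstalk of a line depends only on its two immediate neighbors.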
• The 1st rule guarantees max(Xeff,j) = 4, and therefore a 2X speed-up over a conventional binary bus
• The other rules are designed to lower the probability of high Xeff,j values occurring on the bus
• Identical encoder/decoder logic for each bit

Binary    Ternary   Xeff
11110111  ++-000-+
00110101  00--0+0+  01100121
11100011  ++-000-+  01220111
01010100  0+0+0+00  10112122
10101110  -0-0-+-0  00001021
01110001  0+-+000-  01212200
00000011  000000--  13431121
00011110  000+++-0  00110121
An example of 4X ternary sequences

An Even Faster 3X Ternary Bus
• Partition the bus into 5-bit groups
• Insert a shield wire between groups
• Apply the same rules as for the 4X bus
• It can be proven that such a configuration guarantees max(Xeff) = 3
  – Additional 33% speed-up over the 4X ternary bus
• At the cost of 20% additional wires
[Figure: 4X bus encoder and driver circuit; 3X bus encoder and driver circuit]

Circuit Implementations
• Encoder implemented based on the 5 rules
• Decoder is extremely simple (implemented with two 2-input gates)
• Ternary driver and receiver can be implemented in current or voltage mode
  – Current mode is more power hungry (static current)
  – Voltage mode requires a low impedance Vdd/2 supply
[Figure: (A) current mode driver and receiver; (B) voltage mode driver and receiver with shared V-ref]

Experimental Results
• The power saving comes from the redistribution of the Xeff
  – More transitions are pushed towards lower Xeff
• The average power saving is ~27%

Bus Size  Bus  0X     1X     2X     3X     4X    EF (x10^4)  Saving (%)
5         B    52821  81837  46056  20289  3792  25.0        34.5
          T    74712  99228  28101  2754   0     16.3
8         B    16924  26509  14432  6123   1540  7.99        28.2
          T    21792  31373  11104  1259   0     5.73
16        B    15541  25637  15437  7264   1641  8.49        27.2
          T    19843  31302  12685  1690   2     6.17
32        B    14852  25109  15949  7771   1823  8.76        27.5
          T    18976  31285  13550  1691   2     6.35
Crosstalk distribution and normalized energy consumption comparison (coded ternary vs. half-swing binary)
B: half-swing binary bus; T: coded ternary bus

Experimental Results
• The proposed 4X and 3X busses are advantageous over other bus coding schemes.
• EF: Normalized total energy
• PDP: power delay product

Bus type        4XT   3XT   SB    HB    RP     TT
EF (x10^4)      6.13  6.67  19.7  8.38  12.1   7.55
Delay           4x    3x    4x    4x    8x     8x
PDP (x10^5)     2.45  2.00  7.88  3.35  9.68   6.04
Pwr saving (%)  68.9  66.1  0     57.5  38.6   61.7
PDP gain (%)    68.9  74.6  0     57.5  -22.8  23.4
Bus Area        1     1.2   1.97  1     1      0.68
Bus performance comparison
4XT: ternary bus using 4X code; 3XT: ternary bus with 3X code; SB: binary bus with shielding; HB: half-swing binary bus; RP: ternary bus with random polarity; TT: true ternary bus

Experimental Results
[Figure: Eye diagrams for uncoded and coded busses (10 mm)]

Summary
• Crosstalk classification was extended to multi-valued buses.
• We proposed a direct bit-to-bit binary-ternary mapping scheme, which results in a simple CODEC design.
• We proposed a 4X coding scheme that allows us to double the speed of a conventional ternary bus and save energy.
• We proposed a coding scheme (3X coding) to attain an additional 33% speed gain at the cost of 20% area overhead.
• We designed and implemented the CODEC and ternary driver/receiver.
• Our experimental results show significant power saving (27%) and speed gain (2X or more) over other schemes.
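The derived rows of the bus performance comparison can be cross-checked from the EF and Delay rows. The Python sketch below does so under two assumptions that the slides do not state explicitly: PDP is taken as EF multiplied by the delay class, and the power-saving and PDP-gain percentages are taken relative to the shielded binary bus (SB), whose entries are 0. The helper names are hypothetical.

```python
# Rows copied from the bus performance comparison table.
EF = {"4XT": 6.13, "3XT": 6.67, "SB": 19.7, "HB": 8.38, "RP": 12.1, "TT": 7.55}  # x10^4
DELAY = {"4XT": 4, "3XT": 3, "SB": 4, "HB": 4, "RP": 8, "TT": 8}  # delay class (4x, 3x, ...)

BASELINE = "SB"  # assumption: savings and gains are relative to the shielded binary bus


def pdp(bus: str) -> float:
    """Power-delay product in units of 10^5 (EF is in units of 10^4)."""
    return EF[bus] * DELAY[bus] / 10


def power_saving(bus: str) -> float:
    """Energy saved relative to the baseline bus, in percent."""
    return 100 * (1 - EF[bus] / EF[BASELINE])


def pdp_gain(bus: str) -> float:
    """PDP improvement relative to the baseline bus, in percent."""
    return 100 * (1 - pdp(bus) / pdp(BASELINE))


for bus in EF:
    print(f"{bus}: PDP={pdp(bus):.2f} saving={power_saving(bus):.1f}% "
          f"gain={pdp_gain(bus):.1f}%")
```

Under these assumptions the computed values reproduce the tabulated PDP, power-saving and PDP-gain rows to the printed precision, including the negative PDP gain of the random-polarity bus.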