Leakage Power Minimization in Deep-Submicron CMOS Circuits
Enrico Macii
Politecnico di Torino
Dip. di Automatica e Informatica
10129 Torino, Italy

Outline
• Introduction.
• Design for low leakage:
  - Basics.
  - Existing approaches.
• Sleep Transistor Insertion:
  - Principle.
  - Automated STI.
• Methodology.
• Preliminary results.
• Extensions.
• Conclusions.

Electronic Technology Today: CMOS Convergence
• CMOS technology dominates in modern ICs.
[Figure: technologies used for watch chips, calculators, microprocessors, SRAM, DRAM, FLASH and server/mainframe ICs from the 1960s to the 2000s; PMOS, NMOS, bipolar ECL and BiCMOS give way to CMOS.]

Power Dissipation in CMOS Circuits
• Power dissipation of a CMOS gate: $P = P_{SW} + P_{SC} + P_{Lk}$
  where:
  • $P_{SW}$ = Switching (or dynamic) power.
  • $P_{SC}$ = Short-circuit power.
  • $P_{Lk}$ = Leakage (or stand-by) power.
• In older technologies (0.25um and above), $P_{Lk}$ was marginal w.r.t. switching power:
  - Switching power minimization was the primary objective.
• In deep sub-micron processes, $P_{Lk}$ becomes critical.

Leakage vs. Dynamic Power in Current CMOS Circuits
• Leakage power becomes comparable to dynamic power as technology scales.
  - Example: ASICs [source: STMicroelectronics].
    [Figure: power density (W/cm², 0 to 150) of leakage and dynamic power at the 250nm, 180nm, 130nm, 90nm and 65nm nodes.]
  - Example: Microprocessors [source: Intel].
    • Itanium 2: 180nm, 1.5V, 1.0GHz, 221MTx (core+cache).
    • Itanium 3: 130nm, 1.3V, 1.5GHz, 410MTx (core+cache).
    [Figure: share of total power (0% to 100%) due to leakage, I/O and dynamic power for Itanium 2 and Itanium 3.]

Power Dissipation Due to Leakage
• CMOS inverter:
  [Figure: CMOS inverter between $V_{DD}$ and ground (PMOS pull-up, NMOS pull-down, input $V_{IN}$, output $V_{OUT}$ driving load $C_L$), with the sub-threshold current $I_{sub}$ and the gate current $I_{gate}$ marked.]

Power Dissipation Due to Leakage (Cont.)
• Leakage power of a CMOS gate: $P_{Leakage} = I_L \cdot V_{DD}$
  where:
  • $V_{DD}$ = Supply voltage.
  • $I_L$ = Leakage current.
• Leakage current $I_L$ consists of two major contributions: $I_L = I_{sub} + I_{gate}$
  where:
  • $I_{sub}$ = Sub-threshold current, caused by the low threshold voltage.
  • $I_{gate}$ = Gate current, caused by the reduced thickness of the gate oxide.
  - $I_{sub}$ dominates and grows by 5X per generation.
  - $I_{gate}$ is less relevant, but grows much faster (500X per generation).
• Different solutions have been proposed for both sub-threshold and gate leakage.

Low-Leakage Design
• Leakage power minimization:
  - A design problem (and not just a technology/process problem).
• For memory macros:
  - Optimization based on ad-hoc solutions (cell optimization).
• For cell-based logic:
  - Optimization requires design automation.
  - Integration with existing tools (both at logic and physical level) is mandatory.

Existing Approaches to Low-Leakage Design
• Sub-threshold leakage:
  - Variable-threshold CMOS (VTCMOS).
  - Dual-threshold CMOS (DTCMOS).
  - Multi-threshold CMOS (MTCMOS).
  - Sleep transistor insertion (STI).
  - Multi-voltage CMOS (MVCMOS).
  - Body biasing: reverse (RBB) and forward (FBB).
  - State assignment.
• Gate leakage:
  - Boosted gate MOS (BGMOS).
  - P-type Domino.
  - Pin reordering and state assignment.

DTCMOS
• Compared with high-threshold cells, low-threshold cells are:
  - 15-20% faster.
  - 10x leakier.
• Libraries containing both high-$V_{Th}$ and low-$V_{Th}$ gates do exist.
• Idea:
  - Use low-$V_{Th}$ gates for critical paths and high-$V_{Th}$ cells for the rest.
• Approach:
  - Synthesize and map the design onto high-$V_{Th}$ cells only (minimum-leakage implementation).
  - Replace high-$V_{Th}$ cells on the critical path with low-$V_{Th}$ cells to meet timing constraints.
• The leakage power increase required to meet timing constraints may vary from 20% to 200%.

MTCMOS
• Idea: Use multi-threshold CMOS cells capable of operating at:
  - Low-$V_{Th}$, when in active mode;
  - High-$V_{Th}$, when in stand-by mode.
• Leakage power control is obtained thanks to two effects:
  - Transistor stacking.
  - Low sub-threshold leakage current of high-$V_{Th}$ transistors.

MTCMOS (Cont.)
• Principle of MTCMOS:
  - Insertion of high-$V_{Th}$ transistors in series with the pull-up and pull-down networks, in order to reduce the sub-threshold leakage current while maintaining high-speed operation in active mode.
  [Figure: low-$V_{Th}$ CMOS gate between a Virtual Vdd node, connected to Vdd through a sleep transistor driven by Sleep, and a Virtual GND node, connected to GND through a sleep transistor driven by Sleep'.]

MTCMOS (Cont.)
• Limitations of MTCMOS:
  - Impact on area:
    • Each cell includes two extra transistors for low-leakage stand-by operation ⇒ significant cost in terms of area.
    • The PMOS transistor is normally much larger than the NMOS (e.g., form factor ~30) ⇒ need for huge buffering circuitry.
  - Process modifications to support the implementation of high-$V_{Th}$ transistors.
  - Impact on performance:
    • Slow-down of power-gated logic cells when the circuit is active.
    • Re-activation delay for re-enabling a set of powered-down cells.

STI
• Idea: Modify the MTCMOS approach by:
  - Using the same sleep transistors to control blocks of higher complexity.
  - Avoiding the PMOS sleep transistor.
  [Figure: low-$V_{Th}$ CMOS block (N cells) connected directly to Vdd and, through a Virtual GND node, to GND via a single sleep transistor driven by Sleep'.]

STI (Cont.)
• Further modification:
  - Use low-$V_{Th}$ sleep transistors.
• Consequences:
  - All devices are fabricated using the same process.
  - Sub-threshold leakage power is reduced by the transistor stacking effect only (a smaller, but still significant, reduction).
• Example:
  [Figure: low-$V_{Th}$ gated logic (gates G1 to G7) sharing a virtual ground Vgnd, connected to ground through a leakage-control cell driven by SLEEP.]

Automated STI
• Issues:
  - Granularity of STI insertion:
    • Large CMOS blocks: size of sleep transistors and driving strength of sleep signals.
    • Small CMOS blocks: number of sleep transistors and size of control logic.
  - Design of sleep transistor cells:
    • Different sizes and driving strengths.
    • Must be compliant with the cells in the library.
    • Area and delay control.
  - Selection of the gates to which STI should be applied: requires layout information.
  - Generation of sleep signals:
    • Area, timing and power overhead.

Automated STI (Cont.)
• Post-layout STI for combinational circuits.
  - Assumptions:
    • All cells in the circuit can potentially be controlled by sleep transistors.
    • Only one control signal is used to drive all the sleep transistors, and it is available from some external module (e.g., a microprocessor).
  - Design and characterization of a library of sleep transistor cells.

Automated STI (Cont.)
• Post-layout STI for combinational circuits (Cont.).
  - STI is performed on a row-by-row basis.
  - Sleep transistors are added at the boundaries of each row and are connected to a common virtual ground.
  - Flow: Placed Row → Calculate Cluster of Gates → Size and Insert Sleep Transistor → Update Layout, driven by an Area Constraint, a Delay Constraint and a Sleep Transistor Library.

Automated STI (Cont.)
• Post-layout STI for combinational circuits (Cont.).
  - Controlling the area penalty:
    • Use part of the empty area between cells, according to the tolerated congestion overhead (compaction).
    • Resize the floorplan according to the tolerated area overhead.
    • For each row, consider the largest sleep transistor that can be inserted (free area + allowed overhead).
    [Figure: a placed row (cells G1 to G4) before and after compaction, showing the free area and the free area plus area overhead available for the sleep transistor.]

Automated STI (Cont.)
• Post-layout STI for combinational circuits (Cont.).
  - Controlling the delay penalty:
    • The maximum sustainable current of each sleep transistor is computed according to the tolerated slow-down in active mode.
    • The cell selection process explores each row gate by gate, starting from the cell with the longest timing path and going back towards the primary inputs.
    • The re-activation time penalty is traded off (or nullified) by excluding from power gating some of the cells whose arrival times are shorter than the re-activation delay of the sleep transistors.
    • For each row, the process stops when the current budget is exhausted or the re-activation time penalty is violated.

Automated STI (Cont.)
• Post-layout STI for combinational circuits (Cont.).
  - Experimental set-up:
    • Six benchmarks (from 1900 to 2600 standard cells).
    • Circuits synthesized onto a 0.13um CMOS technology library from STMicroelectronics.
    • Sleep transistor cells chosen so as to guarantee a total performance degradation below 5% in active mode.
    • Tolerated area overhead set to 5%.
  - Results:
    • Leakage power reductions around 80%.
    • Total power savings, accounting for cell dynamic and internal power, are around 19%.

Benchmark  PL_Orig  Pdyn+int_Orig  Ptot_Orig  PL_Opt  Pdyn+int_Opt  Ptot_Opt  ΔPL   ΔPdyn+int  ΔPtot
           [mW]     [mW]           [mW]       [mW]    [mW]          [mW]      [%]   [%]        [%]
Block1     0.11     0.29           0.40       0.02    0.32          0.34      78.9  -9.0       15.0
Block2     0.19     0.22           0.41       0.04    0.24          0.28      80.0  -10.1      31.0
Block3     0.16     0.31           0.47       0.04    0.33          0.37      74.6  -8.8       21.2
Block4     0.26     0.60           0.86       0.05    0.63          0.68      82.7  -5.0       18.6
Block5     0.12     0.29           0.41       0.03    0.32          0.35      78.9  -9.7       12.5
Block6     0.46     0.88           1.34       0.09    0.98          1.07      83.5  -12.4      20.1
Avg.                                                                          79.7  -9.6       18.9

Δ columns give the percentage reduction with respect to the original design; negative values denote an increase.

Automated STI (Cont.)
• Post-layout STI for combinational circuits (Cont.).
  - Results (cont.):
    • Area overhead around 2.5% and circuit delay increase of 5%.

Benchmark  Gates  Sleep Cells  Area_Orig [µm²]  Area_Opt [µm²]  Δ [%]
Block1     1852   14           64912            66794           2.9
Block2     1916   14           65210            66710           2.3
Block3     2215   22           65053            66550           2.3
Block4     2267   13           65412            66524           1.7
Block5     2302   26           65918            68159           3.4
Block6     2612   20           70298            71703           2.0

Automated STI (Cont.)
• Extensions to the post-layout STI approach:
  - Handle sequential circuits:
    • Problem of state retention in sleep mode.
    • Need to design low-leakage flip-flops (based on the concept of "Balloon Circuit").
  - Automatic extraction of logic conditions for sleep:
    • Can reuse idle conditions from clock gating.
    • Exploit the ODC-driven clock gating approach.
    • Minimum overhead (logic shared with the gated-clock circuitry).

Automated STI (Cont.)
• Preliminary results on the design of a low-leakage storage element (flip-flop with "Balloon circuit"):
  - Leakage current in stand-by mode:
    • Regular flip-flop: 116nA.
    • Low-leakage flip-flop: 10nA.
  - Marginal delay increase.

Conclusions
• Leakage accounts for around 5-10% of the power budget at 180nm; this grows to 20-25% at 130nm and to 35-50% at 90nm.
• Leakage power minimization must be addressed from the design standpoint, not just at the technology/process level.
• Several low-leakage design approaches have been introduced recently.
• STI is promising, although it requires significant methodology and tool infrastructure support.
• Preliminary results are currently under evaluation.
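The leakage relations on the "Power Dissipation Due to Leakage (Cont.)" slide, $P_{Leakage} = I_L \cdot V_{DD}$ with $I_L = I_{sub} + I_{gate}$, can be turned into a small calculator. The sketch below applies them to the two flip-flop stand-by currents reported on the preliminary-results slide (116nA and 10nA), treating each quoted figure as the total $I_L$. The supply value `V_DD = 1.2` is an assumption, since the slides do not state it; the relative reduction does not depend on it.

```python
# Leakage power of a CMOS gate: P_Leakage = I_L * V_DD, with I_L = I_sub + I_gate.
V_DD = 1.2  # ASSUMPTION: nominal supply [V] for a 0.13um process; not stated on the slides


def leakage_current(i_sub: float, i_gate: float) -> float:
    """Total leakage current I_L = I_sub + I_gate, in amperes."""
    return i_sub + i_gate


def leakage_power(i_l: float, v_dd: float = V_DD) -> float:
    """Leakage (stand-by) power P_Leakage = I_L * V_DD, in watts."""
    return i_l * v_dd


# Stand-by currents quoted on the slides, used directly as the total I_L.
p_regular = leakage_power(116e-9)  # regular flip-flop, 116 nA
p_low = leakage_power(10e-9)       # low-leakage ("Balloon") flip-flop, 10 nA

print(f"regular: {p_regular * 1e9:.1f} nW, low-leakage: {p_low * 1e9:.1f} nW")
# regular: 139.2 nW, low-leakage: 12.0 nW
print(f"stand-by leakage reduction: {1 - p_low / p_regular:.1%}")
# stand-by leakage reduction: 91.4%
```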
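The DTCMOS approach (map the whole design onto high-$V_{Th}$ cells, then swap critical-path cells to low-$V_{Th}$ until timing is met) can be sketched as a greedy loop over a toy timing graph. The cell model (`SPEEDUP = 0.82`, `LEAK_RATIO = 10.0`, loosely following the "15-20% faster, 10x leakier" figures), the netlist, and the choice of which cell to swap (largest delay saving per unit of added leakage) are illustrative assumptions, not the method of any particular tool. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# ASSUMPTIONS: illustrative cell model loosely based on the DTCMOS slide.
SPEEDUP = 0.82     # low-V_Th delay = 0.82 x high-V_Th delay (about 18% faster)
LEAK_RATIO = 10.0  # low-V_Th leakage = 10 x high-V_Th leakage


def arrival_times(fanin, delay):
    """Latest arrival time at each gate output; gates without fan-in start at 0."""
    at = {}
    for g in TopologicalSorter(fanin).static_order():
        at[g] = max((at[p] for p in fanin.get(g, ())), default=0.0) + delay[g]
    return at


def critical_path(fanin, at):
    """Trace back from the latest output through the latest-arriving fan-in."""
    g = max(at, key=at.get)
    path = [g]
    while fanin.get(g):
        g = max(fanin[g], key=at.get)
        path.append(g)
    return path


def assign_dual_vth(fanin, d_high, leak_high, t_max):
    """Start all-high-V_Th (minimum leakage); swap critical-path cells until timing is met."""
    low = set()
    while True:
        delay = {g: d * SPEEDUP if g in low else d for g, d in d_high.items()}
        at = arrival_times(fanin, delay)
        if max(at.values()) <= t_max:
            break
        candidates = [g for g in critical_path(fanin, at) if g not in low]
        if not candidates:
            raise ValueError("timing constraint cannot be met")
        # Heuristic (assumed): largest delay saving per unit of extra leakage.
        low.add(max(candidates, key=lambda g: d_high[g] * (1 - SPEEDUP)
                    / (leak_high[g] * (LEAK_RATIO - 1))))
    leakage = sum(l * LEAK_RATIO if g in low else l for g, l in leak_high.items())
    return low, leakage


# Hypothetical netlist: gate -> fan-in gates; delays in ns, leakage in nW.
fanin = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["b"]}
d_high = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 2.0, "e": 1.0}
leak_high = {g: 1.0 for g in fanin}

low, leakage = assign_dual_vth(fanin, d_high, leak_high, t_max=6.5)
print(low, leakage)  # {'c'} 14.0
```

In this toy case a single swap brings the critical path from 7.0ns down to about 6.46ns, at a leakage increase from 5.0 to 14.0 (180%).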
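The "Controlling the delay penalty" slide states that the maximum sustainable current of each sleep transistor follows from the tolerated slow-down in active mode, without giving the model. One common way to make this concrete, assumed here rather than taken from the slides, is a simplified alpha-power-law delay model: with the virtual ground raised to $V_x$, a gated cell's delay scales as $(V_{DD} - V_x - V_{Th})^{-\alpha}$, so a tolerated slow-down $s$ gives $V_x \le (V_{DD} - V_{Th})\,(1 - (1+s)^{-1/\alpha})$, and treating the sleep transistor as a linear resistor gives $I_{max} = V_x / R_{on}$. All numeric parameters below are hypothetical.

```python
# ASSUMPTION: simplified alpha-power-law delay model (Sakurai-Newton). With the
# virtual ground raised to V_x, a gated cell's delay scales as
# (V_DD - V_x - V_TH) ** (-ALPHA). All parameter values are hypothetical.
V_DD = 1.2    # supply voltage [V]
V_TH = 0.3    # threshold voltage of the gated low-V_Th cells [V]
ALPHA = 1.3   # velocity-saturation index


def max_virtual_ground(slowdown: float) -> float:
    """Largest V_x [V] such that delay(V_x) / delay(0) <= 1 + slowdown."""
    return (V_DD - V_TH) * (1.0 - (1.0 + slowdown) ** (-1.0 / ALPHA))


def max_sleep_current(slowdown: float, r_on: float) -> float:
    """Current [A] a sleep transistor of on-resistance r_on [ohm] can sink at that V_x."""
    return max_virtual_ground(slowdown) / r_on


v_x = max_virtual_ground(0.05)              # 5% tolerated slow-down, as in the experiments
i_max = max_sleep_current(0.05, r_on=50.0)  # hypothetical 50-ohm sleep transistor
print(f"V_x <= {v_x * 1e3:.1f} mV, I_max = {i_max * 1e3:.2f} mA")
```

A wider sleep transistor (lower $R_{on}$) raises $I_{max}$ at the cost of area, which is the quantity the area-penalty step bounds row by row.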
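The row-by-row STI flow (pick the largest sleep transistor that fits in the free area plus the allowed overhead, then explore the row gate by gate from the latest-arriving cell, stopping when the current budget is exhausted or the re-activation penalty would be violated) can be sketched as follows. The `Cell` record, the per-cell current figures, the library format `(area, i_max)` and the purely additive current budget are simplifying assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    arrival: float  # arrival time at the cell output in active mode [ns]
    current: float  # HYPOTHETICAL worst-case current into the virtual ground [mA]


def pick_sleep_transistor(free_area, area_overhead, library):
    """Return the highest-current sleep cell fitting in free area + allowed overhead.

    library: list of (area_um2, i_max_mA) tuples, assumed pre-characterized.
    """
    fitting = [st for st in library if st[0] <= free_area + area_overhead]
    return max(fitting, key=lambda st: st[1]) if fitting else None


def select_gated_cells(row, i_max, t_react):
    """Explore one placed row gate by gate, latest-arriving cells first."""
    gated, used = [], 0.0
    for cell in sorted(row, key=lambda c: c.arrival, reverse=True):
        if cell.arrival < t_react:
            break  # this and all remaining cells arrive before re-activation completes
        if used + cell.current > i_max:
            break  # current budget of the row's sleep transistor is exhausted
        gated.append(cell.name)
        used += cell.current
    return gated, used


# Hypothetical library and placed row (area in um^2, current in mA, time in ns).
library = [(20.0, 0.5), (40.0, 1.0), (80.0, 2.0)]
row = [Cell("G1", 0.4, 0.125), Cell("G2", 2.1, 0.25),
       Cell("G3", 1.8, 0.5), Cell("G4", 1.2, 0.375)]

st_area, st_imax = pick_sleep_transistor(free_area=30.0, area_overhead=15.0, library=library)
print(st_area, st_imax)                               # 40.0 1.0
print(select_gated_cells(row, st_imax, t_react=0.5))  # (['G2', 'G3'], 0.75)
print(select_gated_cells(row, 2.0, t_react=0.5))      # (['G2', 'G3', 'G4'], 1.125)
```

With the 1.0mA cell the current budget stops the exploration at G4; with a larger 2.0mA transistor the exploration instead stops at G1, whose arrival time is shorter than the assumed 0.5ns re-activation delay.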
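The Δ column of the area table is the relative growth $(Area_{Opt} - Area_{Orig}) / Area_{Orig}$. Recomputing it from the listed areas reproduces every tabulated value and gives a mean of about 2.4%, consistent with the "around 2.5%" figure and below the 5% tolerated area overhead of the experimental set-up.

```python
# Area figures from the post-layout STI results table: (Area_Orig, Area_Opt) in um^2.
AREA = {
    "Block1": (64912, 66794),
    "Block2": (65210, 66710),
    "Block3": (65053, 66550),
    "Block4": (65412, 66524),
    "Block5": (65918, 68159),
    "Block6": (70298, 71703),
}
TOLERATED_OVERHEAD = 5.0  # [%], tolerated area overhead used in the experiments

# Relative area growth per benchmark, in percent.
overhead = {b: 100.0 * (opt - orig) / orig for b, (orig, opt) in AREA.items()}
for block, pct in overhead.items():
    print(f"{block}: {pct:.1f}%")

mean = sum(overhead.values()) / len(overhead)
print(f"mean area overhead: {mean:.2f}%")  # mean area overhead: 2.43%
assert max(overhead.values()) <= TOLERATED_OVERHEAD
```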