Low Power Techniques
경종민 [email protected]

Contents
1. Introduction
2. Why low power?
3. Future Opportunities for Low-Power
4. How to reduce power

1. Introduction
1) Drivers for IC progress
[Figure: comparison of FPGA and full-custom design styles along the axes delay, power, size, reliability, flexibility (programmability), design TAT, and cost]
• Silicon is the winner, and among the many silicon technologies, CMOS is the winner.
• It will remain so for at least the next 25 years.
There is no show stopper in technology, e.g., in quantum mechanics/thermodynamics (minimum switching energy, power dissipation), electromagnetics (speed of light), materials, etc. The exception is the multi-billion-dollar investment cost!
Moore's law will keep being honored. Why?
1. No insurmountable obstacle exists.
2. People believe in it and behave accordingly.
• A huge opportunity exists, but only if we do well in exploiting
1) cross-breeding, co-utilization and co-development among interacting technologies
2) technology sharing over the network

2) Big picture: if power reduction is THE goal, you need to visit all areas to achieve it.
[Figure: design goals (speed, power, design time, fab. cost, programmability) against design levels (algorithm, architecture, logic, circuit, device, process, material, S/W)]
Analogy: vertical engineer vs. horizontal engineer. If you want to sell a graphics chip, you need to do anything that helps achieve it, from design and application to marketing, etc.
[Figure: product areas (P-core, wireless, graphics, giga-bit switch, MPEG, RAMBUS) against activities (marketing, application, legal affairs (IP), manufacturing, verification, design, testing, simulation, process tuning), contrasting the horizontal engineer with the vertical engineer]

2. Why low power?
1) Battery technology advances slowly: a 5-8x improvement in 200 years.
- 200 years ago: lead-acid battery, 25 Wh/kg
- now: lithium polymer battery, 200 Wh/kg
By comparison, semiconductor technology has improved $10^6$ times in 30 years (CPU speed) and 4 times every 3 years (memory density). There is still a wild, wild frontier stretching before us!
2) Heat dissipation: you don't want a big cooling tower for each IC!
3) Energy saving: minimize the amount of energy consumption and the recirculation period; otherwise our earth will be EXHAUSTED.
4) Convenience: too many wires around make a mess.

3. Future Opportunities for Low-Power
1) PDA (Personal Digital Assistant): telephone, pager, pen-based input, schedule keeper, audio/video entertainment, fax, video camera, data security with fingerprint and/or voice recognition, speech recognition, application S/W, teleconferencing…
[Figure: application server, base station (RF) and PDA, with function sharing for "low-power"ing the PDA]
2) Tablet (descendant of the current notebook)
3) Virtual Reality (VR) headset for games
- allows you to move around, but only if there is no wire
- delegates complex processing to a fixed server, while performing only video decompression locally
4) Military: no chance for wires, and no heavy batteries, as you are too busy.
- Information warfare:
1) a soldier locates an enemy tank using a laser rangefinder with GPS
2) the soldier sends a request (for an airstrike) to the control officers
3) an aircraft nearby gets the command
5) Pico-cell based home network
[Figure: external links (FTTH, satellite, xDSL, cable I/F) feeding a home network that covers home automation (temperature control, security), home cellular (PDA, cellular video-phone) and an A/V digital network (phone & TV, HDTV, VCR, game, camera, printer)]
Get all available services and allow all possible communications among home devices, but with no messy wires.
6) Medical uses: pacemaker (implanted), health monitor, hearing aids
7) GPS (for travellers/explorers, drivers (car, ship, boat), soldiers …)
8) RF ID (for identifying people, animals, cars…)
- passive type: resonant LC circuits
- active type (no battery; draws RF power from the RF field)
9) Smart cards: resident ID card, cash withdrawal, encryption, COS (card OS)

4. How to reduce Power
• By all means possible: algorithm, S/W, architectures, data representation, logic & circuits, place & route, clock, process, library, material
1) Algorithm: adjust the number of taps (N) in FIR filters by measuring the noise power.
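As a rough illustration of the tap-adjustment idea, the sketch below picks a filter length from a measured noise power and counts multiply-accumulate operations as a crude proxy for switching energy. The slide gives only the two lengths N=10 and N=6; the threshold value, the moving-average coefficients, and all function names are assumptions made for this example, as is the rule that the shorter filter is used when the measured noise is low.

```python
def estimate_noise_power(samples):
    """Mean-square value of a noise-only segment (hypothetical estimator)."""
    return sum(s * s for s in samples) / len(samples)


def choose_num_taps(noise_power, threshold=0.01, n_full=10, n_low=6):
    """Assumed policy: use the short (low-power) filter when noise is low."""
    return n_low if noise_power < threshold else n_full


def moving_average_taps(n):
    """Placeholder coefficients; a real design would redesign the taps for each N."""
    return [1.0 / n] * n


def fir(x, taps):
    """Direct-form FIR filter; returns (output, number of multiply-accumulates)."""
    y, macs = [], 0
    for i in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if i - k >= 0:
                acc += h * x[i - k]
                macs += 1
        y.append(acc)
    return y, macs


# Usage: a quiet channel selects N=6, which needs fewer MACs than N=10.
noise_segment = [0.05, -0.04, 0.06, -0.05]
n = choose_num_taps(estimate_noise_power(noise_segment))
signal = [1.0] * 32
_, macs_adaptive = fir(signal, moving_average_taps(n))
_, macs_full = fir(signal, moving_average_taps(10))
print(n, macs_adaptive, macs_full)
```

The MAC count is only a stand-in for power; the actual saving depends on the datapath and on how the noise measurement itself is implemented.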
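[Figure: FIR transfer functions for N=10 and N=6 (low power)]

2) Software: similar to reducing code size and improving execution speed
- instruction selection and ordering: the compiler's job, to minimize bus switching
- minimize memory space and accesses (reduce cache misses)
- codesign for low power
- slow down the clock
- halt the clock
- lower VDD
- shut down

3) Architectures
• Parallel architecture
[Figure: a single unit running at (VDD, f) vs. n parallel units running at (VDD/n, f/n) followed by a MUX]
- Switching power of the single unit: $P_1 = C V_{DD}^2 f$
- Switching power of the n parallel units: $P_2 = (1+\alpha)\, n C \left(\frac{V_{DD}}{n}\right)^2 \frac{f}{n} = \frac{1+\alpha}{n^2} P_1$, where $\alpha$ denotes the overhead of parallelization (such as the MUX).
- For the same speed: $t = \frac{C V_{DD}}{i} \sim \frac{C V_{DD}}{(V_{DD} - V_T)^2} \approx \frac{1}{V_{DD}}$, so $f \propto V_{DD}$. Each unit at $V_{DD}/n$ runs at $f/n$, and the n units together restore the original throughput.
- Sacrifice area for low power.
• Pipelining
[Figure: logic at (VDD, f) vs. the same logic split by latches into n stages at (VDD/n, f)]
- $P_1 = C V_{DD}^2 f$
- $P_2 = (1+\beta)\, C \left(\frac{V_{DD}}{n}\right)^2 f = \frac{1+\beta}{n^2} P_1$
i) If $V_{DD}$ becomes $V_{DD}/n$, the speed also becomes $1/n$.
ii) With n pipeline stages, the logic complexity of each stage becomes $1/n$, so the speed (throughput) becomes n times higher.
iii) Hence the overall speed is maintained. ($\beta$ is the pipelining overhead, e.g., mismatch among the stage delays.)
• Minimizing the switching power consumed on buses: $C V^2 f$
• Effective capacitance reduction: activity-driven bus placement
[Figure: buses ordered by decreasing activity against physical capacitance (distance from core to pads), giving the priority for placing each bus (route, layer)]
- SRAM data bus: mostly READ operations
- address bus: mostly sequential access, so activity is small
- display data: activity is large
• V (voltage swing) reduction
[Figure: low-V and high-V domains joined by interface (I/F) circuits, with large-C and small-C nodes marked]
- low-swing bus, e.g., GTL (Xerox), CTT (Mosaid), JTL (Jedec), LVTTL, LVCTT …
- charge-recycling bus: $\Delta V \approx 0.1\, V_{DD}$
• Bus-invert encoding
- send inverted data when the majority of the bits would switch, and de-invert at the receiver
[Figure: source data → EX-OR → data bus → EX-OR → received data, with the polarity signal generated by the polarity decision logic]
• f (frequency) lowering with a PLL: distribute the master clock at f/N and multiply it by N with a local PLL at each destination.

4) Data representation
• Gray code vs. binary for 2's (or 1's) complement numbers
- Number of toggles over a full n-bit count sequence: binary $B_n = 2(2^n - 1)$, Gray $G_n = 2^n$
- Ratio: $B_n / G_n = 2(2^n - 1)/2^n \approx 2$
• Signed magnitude vs. 2's complement
- signed magnitude: only the sign bit changes at a zero crossing
- 2's complement: full switching at a zero crossing

5) Logic
• Signal gating: mask unwanted switching activity so that it does not propagate forward and cause unnecessary power dissipation.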
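The effect of signal gating can be estimated by counting bit toggles at the input of a downstream block with and without a gate. The following is a minimal sketch under stated assumptions: the gate is modeled as a latch that holds its last value while the enable is low, and the names `toggles`, `gate`, and the sample data are hypothetical, not taken from the slides.

```python
def toggles(seq, width=8):
    """Total number of bit flips between consecutive words on a bus."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(seq, seq[1:]))


def gate(seq, enable, initial=0):
    """Latch-style gating: pass the word when enabled, otherwise hold the last value."""
    held, out = initial, []
    for word, en in zip(seq, enable):
        if en:
            held = word
        out.append(held)
    return out


# The downstream result is only needed in the first and last cycle.
data = [0x00, 0xFF, 0x00, 0xFF, 0x0F]
enable = [1, 0, 0, 0, 1]
gated = gate(data, enable)

print(toggles(data))   # 28 bit flips reach the block without gating
print(toggles(gated))  # 4 bit flips reach the block with gating
```

The count ignores the switching of the enable signal and of the latch itself, which is exactly the overhead that has to stay small for gating to pay off.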
• The additional power due to control signal generation should be small. The frequency of the control signal needs to be lower than the signal frequency.
• Logic encoding: binary vs. Gray code for counters
• State encoding
[Figure: two state-transition graphs, M1 and M2, over the same states with different code assignments (00, 01, 11) and transition probabilities 0.3, 0.4, 0.1, 0.1]
- E(M1) = expected number of bit switchings per transition = 2(0.3+0.4) + 1(0.1+0.1) = 1.6
- E(M2) = 1(0.3+0.4+0.1) + 2(0.1) = 1.0
- assign don't-cares to either 1 or 0 for low switching
• Precomputation logic
- saves power by masking uninfluential input signals to the combinational logic, using the precomputation logic g(x)
- i.e., for the output f(x), there may be some conditions under which f(x) is independent of some set of input signals latched in R2; loading of R2 can then be disabled according to g(x)
Example: binary comparator, f(A,B) = 1 if A > B, with $g = A_n \oplus B_n$ (when the most significant bits differ, the output is already determined).
• Systematic method to derive a precomputation function g(x), given f(x), R1 and R2
• Let $f(p_1, \ldots, p_m, x_1, \ldots, x_n)$ be a Boolean function, where $p_1, \ldots, p_m$ are the precomputed inputs corresponding to R1 and $x_1, \ldots, x_n$ are the gated inputs corresponding to R2.
• Let $f_{x_i}$ ($f_{\bar{x}_i}$) be the Boolean function obtained by setting $x_i = 1$ ($x_i = 0$) in f.
• Define the universal quantification of f with respect to $x_i$ as $U_{x_i} f = f_{x_i} \cdot f_{\bar{x}_i}$.
• Then $U_{x_i} f = 1$ implies $f = 1$ regardless of the value of $x_i$, because $U_{x_i} f = 1$ means $f_{x_i} = f_{\bar{x}_i} = 1$ in Shannon's decomposition of f with respect to $x_i$: $f = x_i \cdot f_{x_i} + \bar{x}_i \cdot f_{\bar{x}_i}$.
• Let $g_1 = U_{x_1} U_{x_2} \cdots U_{x_n} f$. Then $g_1 = 1$ implies that $f = 1$ regardless of the values of $x_1, \ldots, x_n$. I.e., $g_1 = 1$ is one of the conditions under which f is independent of the input values $x_1, \ldots, x_n$.
• Similarly, let $g_0 = U_{x_1} U_{x_2} \cdots U_{x_n} \bar{f}$. Then $g_0 = 1$ implies that $f = 0$ regardless of $x_1, \ldots, x_n$.
• Then $g = g_1 + g_0$ is the precomputation function. I.e., if $g = 1$, we can disable the loading of $x_1, \ldots, x_n$ into R2 because the output f is independent of the gated inputs.
• g, computed this way, may not be the unique precomputation function, but it contains the largest number of 1's in its truth table among all precomputation functions.
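The derivation of g can be checked by brute force on a small function. The sketch below enumerates truth tables instead of manipulating Boolean expressions symbolically, which is an assumption of this example and only practical for a handful of inputs. It uses a 2-bit comparator with the most significant bits as precomputed inputs; the function names are hypothetical.

```python
from itertools import product


def precompute_g(f, m, n):
    """Return g as a dict over the m precomputed inputs p.

    g[p] is True when f(p, x) is constant over all 2**n assignments of the
    gated inputs x, i.e. g = g1 + g0 with g1 = U_x f and g0 = U_x (not f).
    """
    g = {}
    for p in product((0, 1), repeat=m):
        values = [bool(f(p, x)) for x in product((0, 1), repeat=n)]
        g1 = all(values)      # f = 1 regardless of x
        g0 = not any(values)  # f = 0 regardless of x
        g[p] = g1 or g0
    return g


def comparator(p, x):
    """f(A, B) = 1 if A > B for 2-bit A, B; p holds the MSBs, x the LSBs."""
    a1, b1 = p
    a0, b0 = x
    return (2 * a1 + a0) > (2 * b1 + b0)


g = precompute_g(comparator, m=2, n=2)
for (a1, b1), value in sorted(g.items()):
    print(a1, b1, int(value))
```

For this comparator the resulting table equals the exclusive-OR of the two most significant bits, so the lower bits need to be loaded only when those bits are equal.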
• Examples
1) Precomputation architecture based on Shannon's decomposition: $f(x_1, \ldots, x_n) = x_i \cdot f_{x_i} + \bar{x}_i \cdot f_{\bar{x}_i}$
2) Latch-based precomputation architecture

6) Low Power Circuits
• Use static rather than dynamic logic to avoid unnecessary precharge.
• Low static power
- self reverse bias for reducing subthreshold current
[Figure: word line drivers supplied through a switch transistor Pc (width Wc) controlled by S (S=0: active, S=1: standby), with a plot of ln ID vs. VGS marking the active and standby operating points]
• Compromise between dynamic and leakage power dissipation
• Multi-VT (threshold): low VT for the speed-critical part, high VT for the power-critical part
- by back-gate bias: routing is difficult
- by an additional implant
• Adiabatic computing: power dissipation is due to the voltage drop across R, so reduce that drop by a gradual rise and fall of the inputs (multi-step clock waveform).
• Delay vs. power supply voltage (Td vs. VDD): $T_d \propto V_{DD}^{-1}$
• Power-delay product (energy) vs. delay for various circuits

7) Power reduction in clock network
• Why bother with the clock network?
- In a synchronous circuit, the clock is generally the highest-frequency signal.
- The clock typically drives a large load, as it has to reach many sequential elements.
- In the Alpha chip, power consumption in the clock network is 40% of the total.
• Clock gating:
- the most popular method for reducing the power of clock signals
- effective when some functional module (ALU, memory, FPU, etc.) is not required for some extended period
- the gated clock suffers an additional gate delay due to the gating function
• Reduced clock swing:
- conventional vs. half-swing clocking
- charge-sharing circuit for the half-swing clock:
When CLK is low, $V_H = \frac{C_1 + C_A}{C_1 + C_4 + C_A + C_B} V_{dd}$
When CLK is high, $V_H = \frac{C_2 + C_A}{C_2 + C_3 + C_A + C_B} V_{dd}$
$V_H \approx 0.5\, V_{dd}$ if $C_A = C_B \gg C_1, C_2, C_3, C_4$
- simple charge-sharing circuit
• Tri-state keeper circuit:
- a floating node whose potential lies somewhere between GND and VDD is noise-sensitive and can cause DC power dissipation in the gate it drives
- floating bus suppressor circuit
• Blocking gate
- gates whose input is connected to a floating node (floating because its driver is powered down) can experience a large short-circuit current
- use a blocking NAND gate
• Reduction of switching activity:
- guarded evaluation: add latches or blocking gates before a combinational logic block if its outputs are not used
- careful bus multiplexing for positively correlated data streams
- aggressive bus multiplexing for negatively correlated data streams

8) Process
• VDD reduction → reduce VT
• Reduce standby current → VT must not be too small (this conflicts with the point above)
• Reduce leakage current → junction profile, high subthreshold swing
• Reduce switching power → reduce parasitic C (the same goal as for high speed): retrograded channel, trench, sidewall spacer for S/D implant

9) Library
• small size, with various sizes available for transistor sizing
• delay balancing to reduce glitches
• long interconnect on a low-C layer
• small C (rather than large C) to reduce buffer size

10) Material
• low-ε inter-layer dielectric
• low-ρ material for interconnect: copper
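To make the half-swing charge-sharing relation from the clock-network section concrete, the sketch below evaluates V_H = (C1 + CA) / (C1 + C4 + CA + CB) · Vdd for CLK low and V_H = (C2 + CA) / (C2 + C3 + CA + CB) · Vdd for CLK high. The capacitor values are made-up numbers chosen only to show the limit CA = CB >> C1..C4, and the function names are hypothetical.

```python
def vh_clk_low(c1, c4, ca, cb, vdd):
    """Intermediate supply level while CLK is low."""
    return (c1 + ca) / (c1 + c4 + ca + cb) * vdd


def vh_clk_high(c2, c3, ca, cb, vdd):
    """Intermediate supply level while CLK is high."""
    return (c2 + ca) / (c2 + c3 + ca + cb) * vdd


# Assumed values: tank capacitors CA = CB = 1 nF, clock loads in the pF range.
VDD = 3.3
CA = CB = 1000e-12
C1, C2, C3, C4 = 1e-12, 2e-12, 3e-12, 4e-12

low = vh_clk_low(C1, C4, CA, CB, VDD)
high = vh_clk_high(C2, C3, CA, CB, VDD)
print(round(low / VDD, 4), round(high / VDD, 4))
```

With tank capacitors this much larger than the loads, both phases settle within a fraction of a percent of Vdd/2; shrinking CA and CB toward the load values makes the two levels drift apart.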