Survey							
                            
		                
		                * Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
CHAPTER 4 Optimizing Capacitance and Switching Activity to Reduce Dynamic Power SECTIONS 1-7 By Astha Chawla Introduction  C and A are intertwined  P = V2 X f x Ceffective.  ILP + Frequency increase => Power problem!!  Factors affecting A:   Complexity of the processor  Exploitation of parallelism  Bit-width of its structures etc.  Optimized at the architectural and microarchitectural level  Can be changed by run-time optimizations Factors affecting C:  Size of a processor’s structure  Organization to exploit locality  Manipulated at the circuit and process technology level  Determined at fixed design time Excess Switching Activity.  Idle-Unit switching activity:  Triggered by clock transitions in unused portions of hardware.  Idle –width switching activity :  Mismatch in the implemented and the actual width of processor structures.  Idle-capacity switching activity :  When a program does not use the provided hardware architectures in their entirety.  Parallel switching activity:  Activity expended in parallel for performance  Cacheable switching activity:  Repetitive switching activity, convert computing activity to cache lookups  Speculative switching activity:  Speculatively executing incorrect instructions is wasted activity  Value- dependent switching activity:  Power consumed depends on the actual data values. Capacitance  Does not change dynamically  Total capacitance = Capacitance of transistors + capacitance of wires.  Burd and Brodersen: CL = CW + Cfixed  Low power architectural techniques require partitioning:  Wire partitioning  Bit-line segmentation IDLE- UNIT SWITCHING ACTIVITY.  No effect on computation  Clock gating  Static logic: To eliminate switching, enough to prevent inputs from changing.  Dynamic logic: Power can be consumed even if the inputs to the circuit do not change   Precomputation: aims to derive a precomputation circuit for a logic block  multiplexed precomputation architecture.  F(x=0), F(x=1) Guarded evaluation: aims to shuts down part of the original circuit. Deterministic clock gating  Gating the clock to the processor structures when they are known to be idle  Power savings, improves EDP, without performance loss.  Clock gating examples:   IBM’s Power 5  Reduction in switching power > 25%  Implements fine-grain gating domain Intel’s Xscale processors  Implements three power- saving modes: Idle, Standby, Sleep  Cuts down power consumption by 30% Idle- Width switching activity: Core  Arises from a mismatch between the designed bit-width of a processor and the actual bitwidth needed in frequently occurring operations  Dynamically detects narrow- width (16 bit wide or less) operands.  Abundance in integer and multimedia applications  Approaches:    Value gating: disabling the unused width.  Disabling switching in unused parts of ALU if both operands are narrow.  Significant power savings Operation packing: Packs more than one narrow- width operation in the full width of hardware  Improves performance without significant power overhead.  Speculative operation packing. Significance compression: Compresses non-significant bits.  Byte serial pipeline. Idle- Width switching activity: Caches  Dynamic zero compression: accesses only significant bits   Only compresses zero bytes.- zero indicator bit Frequent value compression: dictionary loaded with the frequent values of a program.  Simple  Most efficient compression mechanism  Frequent value cache: cache line contains compressed and uncompressed words.  First array: holds 8 low-order bits.  Second array: holds remaining 24 high-order bits Packing compressed cache lines  Space freed by compression remains empty.  Increases cache utilization: indirect power savings.  Packing techniques:  Variable packing: packs variable number of cache lines into cache frames.    expensive Fixed packing: preset number of cache lines are packed  Reduced opportunities for compression  Compression cache:  Uses frequent value compression  Does not attempt to pack cache lines into frames  Frame holds either two compressed or one uncompressed line.  Significance compression cache: lines are compressed using sign compression Instruction compression. IDLE- CAPACITY SWITCHING ACTIVITY  Wasted activity related to out-of-order execution  Processor resources over provisioned to support high instruction throughput.  Power inefficiency of out-of-order processors:  Energy-per-instruction growth Ei ~ (IW)γ . Resource partitioning.  Cannot afford latency of very long wires.  Partitioned by placing buffers  Aimed at size vs speed trade-off.  Wire partitioning  Wire delay proportional to R x C .  Breaking wire into ‘k’ segments improves delay by k2  Total energy increases exponentially with k.  Replacing buffers with tristate devices. IDLE- CAPACITY SWITCHING ACTIVITY: INSTRUCTION QUEUE.  Resizable IQ, mix of CAM and SRAM  Readiness feedback control   Adjust IQ size based on the activity of its entries.  Decision making scheme has a safety mechanism. Occupancy feedback control  IQ, LSQ, ROB.  Occupancy of a structure is the appropriate feedback control metric.  Logical resizing without partitioning  IQ organized as a circular FIFO buffer.  Limiting the size logically by limiting the part that can be allocated to new entries  ILP- contribution feedback control  Instruction queue collapsing IDLE-CAPACITY SWITCHING ACTIVITY: CORE  Dynamically changing the width of an 8-issue processor to 6 or 4-issue.  6-issue processor: half of a cluster is disabled  4-issue processor: one whole cluster is disabled  Appropriate functional units are clock gated.  Decisions made at the end of the sampling window THANK YOU!