Architecture Support for
Disciplined Approximate
Programming
Esmaeilzadeh, Sampson, Ceze, and Burger
Presented by
John Kloosterman, Mick Wollman
Approximate computing
Replace guarantees with expectations
● 2 + 2 = {3,4,5}
● 2 + 2 = 4, most of the time
● 2 + 2 = 1,000,000,000
Why?
● 90/10 rule: can get 90% of the answer for
10% of the effort
● Effort can be either ALU operations or power
Applications of Approximation
e.g. raytracing
Result won't noticeably change if the ray
angle is 10% off, or wrong 10% of the time
sphere from http://www.piprime.fr/1143/official_asymptote_example-sphere/
Approximate results vs. semantics
Approximate operations have undefined
results but defined semantics
Undefined result
● 2+2 = 5 is OK
Defined semantics
● 2+2 throwing a divide-by-zero exception is
not OK
Disciplined Approximate Programming
Split program into approximate/exact portions
● exact ⇒ approximate OK
● approximate ⇒ exact only with annotation
Some computations must always be exact:
● address calculations
● control flow
Need ability to switch between exact and
approximate at instruction granularity
Example Approximate Kernel
approximate int sum;
for (int i = 0; i < 100; i++)
sum += *(array + i);
(*output) = sum;
Exact:
● loop counter
● address calculation
Approximate:
● accumulator
● store to approximate
memory
Approximate ISA Design
Have both approximate and exact:
● integer arithmetic
● FP arithmetic
● bit operations
● load/store instructions
Disciplined model: partition approximate/exact
● This paper uses the same HW for both,
compiler enforces data flow rules
Microarchitectural planes
● Instruction control plane ✕
○ Fetch
○ Decode
○ Instruction bookkeeping
○ Approximation would break semantics
● Data movement / processing plane ✓
○ Datapath (RF, $, LSQ, FUs, Bypass network)
○ Approximation will only affect results
Power Reduction Methods
● Global Voltage Reduction
○ Error checking + rollback to provide precision
○ Lower voltage ⇒ More rollbacks
● Dual Voltage, VH and VL
○ High voltage for IC plane
○ Either voltage for data plane
● Dual Voltage, VLH and VL
○ Lower IC plane voltage further
○ Rollback adds complexity, not examined
Errors vs. Voltage (Razor)
http://web.eecs.umich.edu/~taustin/papers/IEEEMICRO05-Razor.pdf
Important structures
● DV-SRAM
○ Each row is prefixed with a VH-driven precision bit
○ The in-row VH bit selects which power rail the row connects to
○ Precharge based on instruction/operand precision
Important structures cont’d.
● DV-Mux
○ Select between two different voltage-level signals
○ Controlled by precision bit
[Diagram: DV-Mux — two inputs, input[0] and input[1] (each 0-VH/L), a select line, one output (0-VH/L)]
Important structures cont’d.
● L2H and H2L shifters
[Diagram: L2H and H2L level shifters, each built from a DeMux/Mux pair steered by a select signal; L2H converts a 0-VH/L input to a 0-VH output, H2L does the reverse]
Microarch. Changes
● Opcode and source register operands carry
added precision bit
● RF precision set at register granularity
● Duplicate pipeline data registers
● Approximate shadow FUs
○ DV FUs possible but complicated
Microarch. Changes cont’d.
● Broadcast network carries precision bit
● Memory precision set at cache line
granularity
○ Precision set by fills and writes, left unchanged by
read hits
○ Can do a precise read from an approx. line, and
vice versa
○ NB: Tags, MSHR, etc. always precise
Overheads
● Precision bits
● Pipeline reg / FU duplication
● Shifters / multiplexers
Results: Energy Savings
Problems:
● Many computations for control flow/address generation,
not data
● Much of processor is control plane, not data plane
Best-case energy savings (best benchmark) for
50% voltage:
○ In-order: ~40%
○ OoO: ~15%
○ Difference due to size of control plane
Energy Savings
● 25% energy savings with 50% voltage
● Modeled using McPAT and Cacti
Results: Program % Approximate
● low opportunity in integer code
● high opportunity in FP code
Program Error Sensitivity
● How sensitive are applications to approximation?
○ Model approximate results by flipping bits
○ exact: 0000010 + 0000010 = 0000100
○ approximate with one bit flip could be
■ 0000101 = 5
■ 1000100 = 68
Questions?
Discussion questions
● Pro
○ Many kinds of useful compute-intensive operations
can be approximated (images, video, simulations)
○ SW approximation has given good results
○ Razor: undervolting can produce acceptable types of
errors
● Con
○ Can this hardware be built?
○ Scheduling is more expensive than the operations
themselves
○ Rounding error tolerant ≟ Approximation tolerant