Optimizing Stochastic Circuits for Accuracy-Energy Tradeoffs
Armin Alaghi³, Wei-Ting J. Chan¹, John P. Hayes³, Andrew B. Kahng¹,² and Jiajia Li¹
¹ECE and ²CSE Depts., UC San Diego
³EECS Dept., University of Michigan
0
Outline
• Background and Previous Work
• Problem Statement in SC Physical
Design
• Modeling Approach
• Optimization Approach
• Conclusions
1
Motivation: Low Power Challenge
• Low power design is a grand challenge
• Mobile devices must operate at extremely low power even as the
performance requirements of applications grow
• Voltage scaling has slowed down in recent years
• Possible solution: employ new design paradigms to overcome these
challenges and achieve the performance improvements
[Figure: Power [mW] (0 to 10,000) versus year (2011 to 2026). Series: Trend: Memory Static Power; Trend: Logic Static Power; Trend: Memory Dynamic Power; Trend: Logic Dynamic Power; Requirement: Dynamic plus Static Power. Annotations: 4W mobile platform power requirement; 1W SOC power requirement; slow performance improvement due to power limit + slow voltage scaling. Source: ITRS]
2
New Paradigm: Stochastic Computing (SC)
• Stochastic computing (SC) is a design paradigm that has gained
attention recently due to its low power and error tolerance
• Random bit streams are used to represent operands
• Complex arithmetic operations implemented by simple logic
circuits
[Figure: stochastic multiplier Z = X1 × X2. Input bit-streams of value 4/8 and 6/8 produce an output Z of value 3/8, since 3/8 = 4/8 × 6/8]
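The multiplication on this slide can be sketched in a few lines. This is a minimal illustration, assuming the unipolar format (a stream's value is its fraction of 1s) and a single AND gate as the multiplier. The 8-bit patterns and the helper names `to_stream`, `value`, and `sc_multiply` are hypothetical and are not taken from the slides.

```python
import random

def to_stream(p, n, rng):
    """Encode probability p as an n-bit random bit-stream (unipolar format)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def value(stream):
    """Value of a unipolar stream: the fraction of bits that are 1."""
    return sum(stream) / len(stream)

def sc_multiply(x1, x2):
    """Bitwise AND of two streams; multiplies their values if they are uncorrelated."""
    return [a & b for a, b in zip(x1, x2)]

# Hand-picked 8-bit streams (assumed patterns) with values 4/8 and 6/8.
x1_small = [0, 1, 1, 0, 1, 0, 1, 0]
x2_small = [1, 0, 1, 1, 1, 0, 1, 1]
z_small = sc_multiply(x1_small, x2_small)
print(value(x1_small), value(x2_small), value(z_small))

# Longer random streams: the result is only approximately 3/8.
rng = random.Random(0)
n = 4096
z_long = sc_multiply(to_stream(4 / 8, n, rng), to_stream(6 / 8, n, rng))
print(value(z_long))
```

With random streams the product is an estimate whose error shrinks as the stream gets longer, which is the precision and latency tradeoff the following slides discuss.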
3
Error Tolerance, Precision, and Accuracy
Number to represent: 5/16
Stochastic: 0010 0001 0101 0010
Binary: 0.0101
• Bit-stream length grows exponentially with precision
• Redundant representation provides error tolerance
• Inaccurate computation may occur [figure annotation: Correct = 3/8]
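As a quick check of the representation example above, the following sketch decodes both encodings of 5/16 and shows how the stream length scales with precision. It assumes the unipolar reading (value = number of 1s divided by length). The helper `stream_length` is a hypothetical name.

```python
from fractions import Fraction

# Stochastic encoding from the slide: 16 bits, value = (#ones) / length.
stream = "0010 0001 0101 0010".replace(" ", "")
sc_value = Fraction(stream.count("1"), len(stream))

# Binary fixed-point encoding from the slide: 0.0101
frac_bits = "0.0101".split(".")[1]
bin_value = Fraction(int(frac_bits, 2), 2 ** len(frac_bits))

def stream_length(precision_bits):
    """Bits needed so a stream can represent every multiple of 1/2^precision_bits."""
    return 2 ** precision_bits

print(sc_value, bin_value)                    # both encodings decode to the same fraction
print(stream_length(4), stream_length(8))     # stream length doubles per extra bit of precision
```

Flipping one bit of the 16-bit stream changes its value by only 1/16, whereas flipping the leading fractional bit of the binary word changes it by 1/2, which is one way to read the error-tolerance bullet.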
4
Area, Computation Efficiency, and Delay
SC: smaller area, longer computation latency,
and shorter critical path
[Figure: stochastic multiplier compared with a conventional binary multiplier, with the critical path marked]
5
Application Context of SC
• Stochastic representation is similar to analog “pulse-mode”
signals, as well as neural signals
• An SC circuit performs cheap pre-processing, which saves
resources
[Figure: low-cost preprocessing between two domains]
6
Summary of Advantages/Disadvantages
• Advantages
• Low-complexity circuits (allows massive parallelism)
• Error tolerance
• Robustness to voltage scaling (explored and improved in
this work)
• Disadvantages
• Long computation time
• Limited precision
• Expensive conversion circuits and storage elements
7
Challenges, Problems, and Our Contributions
Challenges of stochastic computing (SC) design:
• Current digital design flow does not comprehend the tradeoff
between accuracy and power in SC
• Physical implementation of SC circuits has not been well
explored
Problems:
• What is an efficient way to estimate error when exhaustive
simulation is not feasible?
• Given a synthesized SC circuit, what is the physical
implementation recipe?
Our contributions:
• We introduce the delay matching problem in SC
• We reduce the computation error by balancing delay paths
• We propose a Markov chain model for error estimation
9
Stochastic Computing: Scope of Study
• Design Metrics (metrics covered in this work)
• Energy
• Accuracy (new model is proposed in this work)
• Circuit area
• Design Parameters
• Computation latency (N)
• Frequency scaling (f)
• Voltage scaling (V)
• Netlist implementation (new optimization is proposed in this work)
10
Balance of Path Delay Matters
Three scenarios of signal transitions
(A) Ideal: stable states of logic values are captured
(B) Balanced delay: all the transitions arrive at the same time
(C) Unbalanced delay: extra errors occur due to glitches or
delayed transitions
[Figure: waveforms of inputs x0 and x1, output z, and the sample clock for the three scenarios. (A) Ideal: correct; (B) Balanced: correct; (C) Unbalanced: error]
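The three scenarios can be imitated with a toy cycle-level model. This is only a sketch under strong assumptions, not the authors' simulation setup: a two-input AND gate is sampled `t_sample` after each clock edge, and an input whose path delay exceeds `t_sample` still shows the previous cycle's bit. The streams, delays, and the function name `sampled_and` are hypothetical and chosen so the effect is visible.

```python
def sampled_and(x0, x1, d0, d1, t_sample):
    """Output bits of z = x0 AND x1 as captured by the sampling clock.

    d0, d1: path delays of the two inputs (same time unit as t_sample).
    An input that arrives after the sampling instant is seen one cycle late.
    """
    z = []
    for n in range(1, len(x0)):
        a = x0[n] if d0 <= t_sample else x0[n - 1]
        b = x1[n] if d1 <= t_sample else x1[n - 1]
        z.append(a & b)
    return z

x0 = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
x1 = [0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0]

ideal = sampled_and(x0, x1, d0=0, d1=0, t_sample=100)         # (A) both inputs settled
balanced = sampled_and(x0, x1, d0=120, d1=120, t_sample=100)  # (B) both equally late
unbalanced = sampled_and(x0, x1, d0=40, d1=120, t_sample=100) # (C) only x1 is late

for name, z in [("ideal", ideal), ("balanced", balanced), ("unbalanced", unbalanced)]:
    print(name, sum(z), "ones out of", len(z))
```

In the balanced case the captured stream is just a shifted copy of the ideal one, so its count of 1s (and therefore its stochastic value) is preserved. In the unbalanced case the gate combines bits from different cycles and the count changes.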
12
Markov Chain for Error Prediction
• A Markov chain (MC) has previously been used to model
sequential SC circuits
• We augment the states from the behavior model for delay-induced transition errors
• Transition probabilities are trained by a small set
of simulation results
• The stationary probability distribution is obtained by
solving the Markov chain
• C1, D1, G1 decide the output expected values
• Used for error estimation
[Figure: Markov chain state diagram. The previous SC behavior model contains only correct states; the added states capture errors induced by glitches and delayed transitions]
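The two steps named above (training transition probabilities from a short simulation, then solving for the stationary distribution) can be sketched as follows. This is an illustrative toy, not the authors' model: the three states, the observed trace, and the helper names `estimate_transitions` and `stationary` are assumptions, and power iteration is used here as one simple way to solve the chain.

```python
from collections import Counter

STATES = ["correct", "glitch", "delayed"]  # assumed state labels

def estimate_transitions(trace, states):
    """Row-stochastic matrix P[i][j] estimated from observed state transitions."""
    counts = Counter(zip(trace, trace[1:]))
    matrix = []
    for s in states:
        total = sum(counts[(s, t)] for t in states)
        if total == 0:
            # State never left in the trace: fall back to a self-loop.
            matrix.append([1.0 if t == s else 0.0 for t in states])
        else:
            matrix.append([counts[(s, t)] / total for t in states])
    return matrix

def stationary(matrix, iters=1000):
    """Stationary distribution by power iteration: repeat pi <- pi * P."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi

# A short, made-up state trace standing in for a small set of simulation results.
trace = ["correct", "correct", "glitch", "correct", "correct", "delayed",
         "correct", "correct", "correct", "glitch", "correct", "delayed",
         "delayed", "correct"]

P = estimate_transitions(trace, STATES)
pi = stationary(P)
error_probability = 1.0 - pi[STATES.index("correct")]
print(dict(zip(STATES, pi)), error_probability)
```

The long-run probability of being in an error state is read directly from the stationary distribution, so only enough simulation to estimate the transition probabilities is needed.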
13
Result: Markov Chain for Error Prediction
• Model is accurate for larger errors
• The model is less accurate when error is small
[Figure annotations: precise prediction for high error magnitude; on-going work: to improve the accuracy for small errors]
14
Outcome of Accuracy Model Study
Before our work:
• SC behavior model is based on pre-layout simulation
• SC behavior model did not consider the cell delay and wire delay contributed by
physical implementation
Our work:
• Augment the SC behavior model by considering delayed transitions and glitches
contributed by physical implementation
• Optimize the physical implementation by balancing the timing paths
[Figure: waveform scenarios labeled Correct, Correct, and Error, with the balanced-delay case highlighted]
15
Challenges of SC Physical Implementation
• Clock is fast to compensate for long computation latency
• Launch and capture flip-flops may be far apart in a huge array of
SC circuits
• Unbalanced paths due to circuit structures and variations
→ Previous analysis shows delay balance matters
• The timing is more critical when DVFS lowers the supply voltage
[Figure: a huge array of SC sub-circuits placed between an analog frontend circuit or random number generator and a converter to the binary number system. Inputs x0 and x1 reach output z through Path 1 (long) and Path 2 (short), over long physical distances. A faster clock compensates for long latency]
17
Post-P&R Optimization for SC Circuits
Problem statement:
• Given an SC circuit and a range of supply voltages, we seek an
implementation that minimizes error across the voltages
• Observation:
• Transition errors increase at lower voltages due to path delay mismatch
• Approach: ILP-based retiming after P&R by a commercial tool
• Optimization constraints:
• #Buffers / #wires inserted to compensate for shorter paths
• Bounded delay variation across voltages
• Buffer power penalty
• Objective: minimize path delay differences
• Improves accuracy
• Side note: Similar to multi-corner multi-mode (MCMM) CTS skew optimization:
Skew <-> Path delay differences
MCMM <-> Delays are evaluated at multiple supply voltages
Power penalty <-> #Buffer insertion
18
ILP Formulation for Buffer Insertion
Minimize $U$ ($U$: max normalized delay difference)
where
$U \ge \frac{D^{K}_{\max}}{D^{k}_{\max}} \cdot \left( D'^{k}_{\max} - D'^{k}_{\min} \right)$   (1)
• $D^{K}_{\max}$: max path delay at the highest voltage
• $D^{k}_{\max}$: path delay at $V_k$
• $D'^{k}_{\max}$: max delay at $V_k$ after optimization
• $D'^{k}_{\min}$: min delay at $V_k$ after optimization
Subject to
$D'^{k}_{i} = D^{k}_{i} + \sum_{1 \le i \le M,\, 1 \le j \le Q} c_{rj} \cdot d^{k}_{i}$   (2)
($D'_{i}$: optimized path delay; $D_{i}$: original delay)
$\sum_{1 \le j \le Q} c_{rj} \le 1$   (3)
($c_{rj}$: binary number denoting buffer insertion)
$\alpha \cdot D^{k}_{\max} \ge D'^{k}_{j}$   (4)
($\alpha$: empirical parameter)
$\beta \cdot G_{k} \ge \sum_{1 \le r \le R,\, 1 \le j \le Q} c_{rj} \cdot g_{j}$   (5)
($\beta$: empirical parameter; $g_{j}$: buffer leakage power; $G$: circuit leakage power)
(1) $U$ normalizes the delay mismatch across voltages, because the ranges of delays are different for each $V_k$
(2)(3) The inserted delay is decided by $c_{rj}$ (whether or not to insert a buffer on a net) and $d^{k}_{i}$ (cell delay at $V_k$)
(4) To exclude solutions with too many buffers inserted
(5) To limit the leakage power penalty
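To make the roles of the objective and constraints concrete, here is a tiny brute-force stand-in for the formulation above. It is a sketch under stated assumptions, not the authors' ILP: it enumerates choices instead of calling an ILP solver, allows at most one buffer per path, uses a single leakage budget `beta * G`, and all delays, buffer names, and parameter values are made up.

```python
from itertools import product

# D[i][k]: original delay of path i at voltage index k (last index = highest voltage).
D = [[150.0, 100.0],
     [90.0, 60.0]]

# Candidate buffers: name -> (added delay per voltage, leakage). None means no buffer.
BUFFERS = {
    None: ([0.0, 0.0], 0.0),
    "buf_a": ([30.0, 20.0], 1.0),
    "buf_b": ([55.0, 40.0], 2.0),
    "buf_c": ([75.0, 35.0], 3.0),
}

def solve(D, buffers, alpha, beta, G):
    """Return (U, choice) minimizing the max normalized delay difference."""
    n_volt = len(D[0])
    K = n_volt - 1
    d_max = [max(path[k] for path in D) for k in range(n_volt)]
    best = None
    for choice in product(buffers, repeat=len(D)):       # one option per path, as in (3)
        leak = sum(buffers[c][1] for c in choice)
        if leak > beta * G:                               # leakage budget, as in (5)
            continue
        new = [[D[i][k] + buffers[c][0][k] for k in range(n_volt)]
               for i, c in enumerate(choice)]             # optimized delays, as in (2)
        if any(new[i][k] > alpha * d_max[k]
               for i in range(len(D)) for k in range(n_volt)):
            continue                                      # delay bound, as in (4)
        U = max(d_max[K] / d_max[k]
                * (max(p[k] for p in new) - min(p[k] for p in new))
                for k in range(n_volt))                   # normalized spread, as in (1)
        if best is None or U < best[0]:
            best = (U, choice)
    return best

U_best, choice_best = solve(D, BUFFERS, alpha=1.0, beta=0.25, G=10.0)
print(U_best, choice_best)
```

In this toy instance the short path receives the buffer whose delay tracks the long path most closely across both voltages, which is the intent of evaluating the delay difference at every supply voltage rather than at one corner.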
19
Heuristics for Buffer Choices
• Heuristic 1: various buffer/wire types to compensate for delay between
voltages
• We provide buffer candidates with different delay sensitivity to voltage scaling
• We provide wire detour options to provide wider voltage sensitivity range
• Heuristic 2: pruning buffers in the candidates to speed up MILP
• Solutions are pruned within sub-regions in the tradeoff space by choosing cells
in the regions with lowest leakage
[Figure: buffer candidates in the tradeoff space without pruning and with pruning; wire detouring]
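Heuristic 2 can be sketched as a grid-based filter. This is an assumed interpretation, not the authors' exact procedure: the tradeoff space is taken to be (delay, voltage sensitivity), it is divided into rectangular sub-regions, and only the lowest-leakage cell in each sub-region is kept. The cell names, values, and bin sizes are hypothetical.

```python
def prune_candidates(cands, delay_bin, sens_bin):
    """Keep the lowest-leakage candidate in each (delay, sensitivity) sub-region."""
    best = {}
    for c in cands:
        key = (int(c["delay"] // delay_bin), int(c["sens"] // sens_bin))
        if key not in best or c["leak"] < best[key]["leak"]:
            best[key] = c
    return list(best.values())

candidates = [
    {"name": "b1", "delay": 12.0, "sens": 1.1, "leak": 3.0},
    {"name": "b2", "delay": 14.0, "sens": 1.2, "leak": 2.0},  # same region as b1, less leakage
    {"name": "b3", "delay": 25.0, "sens": 1.3, "leak": 4.0},
    {"name": "b4", "delay": 27.0, "sens": 1.9, "leak": 1.0},
]

kept = prune_candidates(candidates, delay_bin=10.0, sens_bin=0.5)
print(sorted(c["name"] for c in kept))
```

Fewer candidates per net means fewer binary variables in the MILP, while each region of the tradeoff space still has a representative.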
20
Result: Improved Accuracy by Balancing Paths
[Figure: path delays and average errors for three flows: STRAUSS (UMich) + conventional P&R (ICC); ReSC (UMN) + conventional P&R (ICC); ReSC (UMN) + proposed P&R optimization. Annotations: less inter-path delay skew; lower error]
21
Result: Improved Input Delay Window
• Safe timing window: timing margin between clock edge and input
delay
• Before optimization: small input delay variation will cause errors
• After: Safe timing window = half of the clock cycle
[Figure: original and optimized delay distributions, each with its safe window marked; clock period = 150ps]
22
Result: Improved Energy Cost by Balancing Paths
Improved accuracy = Less voltage scaling needed = Higher energy
efficiency
The conventional P&R flow (ICC) fails to meet the accuracy constraint when VDD
is low
Our proposed P&R optimization reduces delay mismatch at lower voltages
and leads to lower energy cost for the same accuracy
23
MC Model: Improved Simulation Runtime
• The proposed Markov chain model is verified on four
different SC application circuits
Circuit      #Cycles (Ex.)   #Cycles (MC)
GammaCorr    1024            10
PolySmall    256             10
Neuron       100             10
(Ex.: exhaustive simulation; MC: new Markov chain model)
[Figure legend: Green: new MC model; Blue: exhaustive simulation. Annotation: fewer simulation cycles]
24
Result: Gamma Correction
• Testcase: Gamma correction
• Both SC and conventional circuits are signed off at 1.0V
• SC still generates a recognizable image at 0.6V
• Energy saving of SC = 66%
25
Conclusions
• We identify the impact of delay-induced errors and
propose a Markov chain-based model for error
estimation
• We propose a new physical implementation approach
that improves the energy-accuracy tradeoff
• The experimental results show significant energy and
accuracy benefits over previous work
Future work
• Markov chain model improvement
• Comprehensive tradeoff recipe for performance,
accuracy, and energy
27
Thank you!
28