Interconnect Modeling for Improved
System-Level Design Optimization
Luca Carloni §
Andrew B. Kahng¶
Swamy Muddu ¶
Alessandro Pinto‡
Kambiz Samadi ¶
Puneet Sharma ¶
§ Columbia University
¶ University of California, San Diego
‡ University of California, Berkeley
January 22, 2008
Outline
Motivation
System-Level Communication Synthesis
Buffered Interconnect Model
Interconnect Optimization
Validation and Significance Assessment
Conclusions
Motivation
Focus of design process is shifting from “computation” to
“communication”
Device and interconnect performance scaling mismatches
cause breakdown of traditional across-chip communication
System-level designers require accurate, yet simple models
to bridge planning and implementation stages
Today’s system-level performance and power modeling suffers from:
Ad hoc selection of models
Poor balance between accuracy and simplicity
Poor definition of inputs
Lack of model extensibility across future technology nodes
Inability to explore different implementation styles
Our Goal: Develop accurate models that are easily usable by
system-level design early in the design cycle
Previous Interconnect Delay Models
Missing aspects required for accurate delay estimation at 90nm:
Do not consider input slew change, which impacts effective drive
resistance and consequently cell delay
Do not consider scattering, which impacts metal resistivity and
consequently metal resistance
Bakoglu90
No crosstalk impact, assumes driver on-resistance Rd, gate input
capacitance Cg vary linearly with device size, uses Elmore delay model
Pamunuwa03
Similar to Bakoglu90 but adds crosstalk impact
CongPan99 (IPEM)
Multiple delay models under certain optimization schemes
Use of second-order RC model for gate delay (e.g., Shao03)
Does not address gate loading during model construction
Other Limitations of Previous Work
Design style and buffering schemes
Design-level degrees of freedom: wire width, spacing,
shielding
Practical buffer sizing
Only delay is considered as the optimization objective, which is wrong
Analytic solutions have large buffer sizes (100X-400X) which are not
in any realistic cell library
Model inputs and technology capture
Do not have well-defined pathways to capture necessary
technology and device parameters
Collect inputs from ad hoc sources, which often leads to
misleading conclusions
Communication Synthesis for Network-on-Chip
Given
An input specification as a set of communication constraints
A library of communication components
An objective function (e.g., power, area, delay)
Find
A network-on-chip implementation as a composition of
library components that
Satisfies the specification
Minimizes the cost function
Communication Synthesis Infrastructure (COSI)
Based on the Platform-Based Design methodology
Takes specification and library descriptions in XML format
Produces a variety of outputs, including a cycle-accurate
SystemC implementation of the optimal network-on-chip
Constraint-Driven Communication Synthesis
[Figure: constraint-driven synthesis flow. Labels: Application, Point-to-Point Specification, Constraints Propagation, On-Chip Communication Library, Perf. / Cost Abstractions, Synthesis, Implementation, Synthesis Result]
Communication Synthesis Key Elements
Specification of input constraints
Set of IP cores: area and interface
End-to-end communication requirements between pairs of
IP cores: latency and throughput
Characterization of library of components
Interface types, max number of ports
Max capacities: bandwidth, latency, max distance
Performance and cost model
Component instantiation and parallel composition
Rename, set parameters of library components
Composition based on algebra on quantities (including type
compatibility)
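The key elements above can be pictured with a minimal Python sketch. It is not COSI's API or algorithm: the names `Flow`, `LinkType`, `feasible` and `cheapest_link` are hypothetical, and the sketch only picks the cheapest library component for one point-to-point requirement, ignoring routers, sharing and composition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Flow:
    """End-to-end requirement between a pair of IP cores (hypothetical type)."""
    src: str
    dst: str
    bandwidth: float     # required throughput
    max_latency: float   # latency bound
    distance: float      # physical distance between the two cores


@dataclass
class LinkType:
    """Library component characterized by its max capacities and a cost."""
    name: str
    max_bandwidth: float
    latency: float
    max_distance: float
    cost: float          # e.g., power dissipation


def feasible(flow: Flow, link: LinkType) -> bool:
    # A component is usable only if it meets every constraint of the flow.
    return (link.max_bandwidth >= flow.bandwidth
            and link.latency <= flow.max_latency
            and link.max_distance >= flow.distance)


def cheapest_link(flow: Flow, library: List[LinkType]) -> Optional[LinkType]:
    # Minimize the cost function over the feasible components; None if none fit.
    candidates = [link for link in library if feasible(flow, link)]
    return min(candidates, key=lambda link: link.cost) if candidates else None
```

A real synthesis run has to optimize over compositions of components rather than one flow at a time; the sketch only illustrates the split between constraints, library characterization and cost.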
Communication Synthesis Example
Synthesis of optimal network-on-chip
Return valid composition that meets input constraints and
Minimizes the objective function (e.g., power dissipation)
[Figure: Original Specification, Platform Instance 1, Platform Instance 2]
COSI: Communication Synthesis Infrastructure
COSI is a public-domain software package for NoC synthesis
http://embedded.eecs.berkeley.edu/cosi/
Proposed Model Features
Model inputs:
Tech. Characteristics
• # metal layers
• min. width, spacing, thickness
• dielectric thickness, constant
• device drive res, cap, leakage
Design Style
• width/spacing configs
• buffering scheme
• shielding
• signaling scheme
Bus Attributes
• length, # bits, layer, switching
Proposed Model outputs:
• Delay
• Area
• Leakage power
• Dynamic power
• Max. unclocked length, # pipelines, latency, throughput
Improved accuracy with respect to well-known models
Modeling of nanoscale-era effects: crosstalk, scattering, barrier
thickness, dependence of delay on slews, etc.
Single-digit percentage accuracy relative to gate-level analyses
Model Technology Inputs
Inputs for repeater delay calculation
Delay and slew values for a set of input slew and load capacitance
values (obtained from Liberty / Timing Library Formats (TLF) / SPICE)
Input capacitance for different repeater size (Liberty, Predictive
Technology Models (PTM))
Inputs for wire delay calculation
Wire dimensions (ITRS/PTM, LEF, ITF)
Inter-wire spacings for global and intermediate layers (ITRS/PTM, LEF,
ITF)
Inputs for power calculation
Input capacitance (Liberty, PTM)
Wire parasitics (computed in wire delay calculation)
Inputs for area calculation
Wire dimensions used above
Repeater area is available from Liberty; for future technologies,
ITRS A-factors or the proposed area models can be used
Buffered Interconnect Model
Buffered interconnect model for delay, power, and area
Constructed from: buffer (repeater) and wire delay models
Accounts for coupling capacitances, slew dependence and UDSM
effects (e.g., scattering-dependent wire resistance changes)
Calibrated against SPICE
Components:
Repeater delay model
Separate models for intrinsic delay, output slew, input capacitance
Wire delay model
Accounts for coupling capacitance impact on wire delay
Repeater power model
Accounts for sub-threshold and gate leakages
Repeater area model
Derived from existing cell layouts (can be extrapolated)
Wire area model
Derived from wire width and spacing (can be extrapolated)
Repeater Delay Model
Repeater delay can be decomposed into load-independent ($i$) and load-dependent ($r_d c_l$) components:
$d = i + r_d c_l$
$i(s_i) = \alpha_0 + \alpha_1 s_i + \alpha_2 s_i^2$
$s_i$ denotes input slew; $\alpha_0$, $\alpha_1$ and $\alpha_2$ are coefficients obtained by quadratic regression
Drive resistance is nearly linear in input slew; both the intercept
and slope vary with repeater size
$r_d = r_{d0} + r_{d1} s_i$
Output slew depends on load capacitance; the slope is independent of input
slew, while the intercept depends linearly on it
$s_o(c_l, s_i) = s_{o0} + s_{o1} s_i + s_{o2} c_l$
$s_o$ is the output slew, and $s_{o0}$, $s_{o1}$ and $s_{o2}$ are fitting coefficients from linear
regression
$c_i$ is the input capacitance, $w_p$ and $w_n$ are the PMOS and NMOS widths respectively,
and $\eta$ is a coefficient derived using linear regression with zero intercept
$c_i = \eta (w_p + w_n)$
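A minimal Python sketch of how the repeater equations above could be evaluated once the coefficients have been fitted. The class and function names are hypothetical, and any numeric coefficients passed in are the caller's own; nothing here is fitted data from the slides.

```python
from dataclasses import dataclass


@dataclass
class RepeaterCoeffs:
    """Regression coefficients for one repeater size (values supplied by the user)."""
    a0: float   # alpha_0: intrinsic-delay intercept
    a1: float   # alpha_1: linear slew term
    a2: float   # alpha_2: quadratic slew term
    rd0: float  # drive-resistance intercept
    rd1: float  # drive-resistance slope vs. input slew
    so0: float  # output-slew intercept
    so1: float  # output-slew slope vs. input slew
    so2: float  # output-slew slope vs. load capacitance


def intrinsic_delay(c: RepeaterCoeffs, s_i: float) -> float:
    # i(s_i) = alpha_0 + alpha_1 * s_i + alpha_2 * s_i^2
    return c.a0 + c.a1 * s_i + c.a2 * s_i ** 2


def drive_resistance(c: RepeaterCoeffs, s_i: float) -> float:
    # r_d = r_d0 + r_d1 * s_i
    return c.rd0 + c.rd1 * s_i


def repeater_delay(c: RepeaterCoeffs, s_i: float, c_l: float) -> float:
    # d = i + r_d * c_l
    return intrinsic_delay(c, s_i) + drive_resistance(c, s_i) * c_l


def output_slew(c: RepeaterCoeffs, s_i: float, c_l: float) -> float:
    # s_o = s_o0 + s_o1 * s_i + s_o2 * c_l
    return c.so0 + c.so1 * s_i + c.so2 * c_l


def input_capacitance(eta: float, w_p: float, w_n: float) -> float:
    # c_i = eta * (w_p + w_n), zero-intercept fit
    return eta * (w_p + w_n)
```

The output slew of one stage would feed the input slew of the next, so the per-stage functions can be chained along a buffered line; units only need to be mutually consistent.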
Wire Delay Model
For wire delay we use the model proposed by Pamunuwa et al. (cf.
TVLSI03), which accounts for crosstalk
$d_w = r_w (0.4 c_g + (\lambda_i c_c)/2 + 0.7 c_i)$
$d_w$, $r_w$, $c_g$, $c_c$, and $c_i$ respectively denote wire delay, wire resistance, ground
capacitance, coupling capacitance and input capacitance of the next-stage
repeater
$\lambda_i$ is a coefficient, based on SPICE simulation, that accounts for the switching patterns of
the neighboring wires
We enhance the quality of the wire delay model by considering two other
important factors that change wire resistance:
Scattering-aware resistivity (cf. Shi et al. ASPDAC06):
$\rho(w_w) = \rho_B + K_\rho / w_w$
$w_w$ is the wire width, $\rho_B = 2.202\ \mu\Omega\cdot\mathrm{cm}$, and $K_\rho = 1.030 \times 10^{-15}\ \Omega\cdot\mathrm{m}^2$
Interconnect barrier (cf. Mai et al. IEEE01):
$r_w = \rho l_w / ((t_m - t_b)(w_w - 2 t_b))$
$t_m$ and $t_b$ respectively are the metal and barrier thicknesses, $l_w$ is the length of the wire,
and $\rho$ is computed using the above equation
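The wire equations above can be sketched in Python as follows. The two constants are the values quoted on the slide, converted to SI units; the function names are hypothetical and all geometry arguments are assumed to be in metres.

```python
RHO_BULK = 2.202e-8   # rho_B: 2.202 micro-ohm*cm expressed in ohm*m
K_RHO = 1.030e-15     # K_rho in ohm*m^2


def resistivity(w_w: float) -> float:
    """Scattering-aware resistivity rho(w_w) = rho_B + K_rho / w_w (w_w in m)."""
    return RHO_BULK + K_RHO / w_w


def wire_resistance(l_w: float, w_w: float, t_m: float, t_b: float) -> float:
    """r_w = rho * l_w / ((t_m - t_b) * (w_w - 2 * t_b)); the barrier shrinks the conductor."""
    return resistivity(w_w) * l_w / ((t_m - t_b) * (w_w - 2.0 * t_b))


def wire_delay(r_w: float, c_g: float, c_c: float, c_i: float, lam: float) -> float:
    """d_w = r_w * (0.4 * c_g + lam * c_c / 2 + 0.7 * c_i)."""
    return r_w * (0.4 * c_g + 0.5 * lam * c_c + 0.7 * c_i)
```

Narrower wires raise the resistivity through the `K_RHO / w_w` term and lose a larger share of their cross-section to the barrier, so both corrections push resistance in the same direction.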
Repeater and Wire Delay Models
Intrinsic Delay Model – i(slewin)
Output Slew Model – o(slewin, CL)
Drive Resistance Model – r(slewin)
delay = i(slewin) + r(slewin) * CL
r(s) = f(size, slewin)
slewout = f(slewin,CL)
wire delay = Elmore
Model coefficients fit from data
extracted from Liberty/LEF/Tech. files
and other extrapolatable sources
(i.e., PTM and ITRS)
Repeater and Wire Power Models
Power is an important design objective and must be accounted for early in
the design flow
Today, leakage and dynamic power are primary forms of power dissipation
Leakage has two main components: (1) sub-threshold leakage, and (2)
gate-tunneling current
Both components depend linearly on device size
$p_s = (p_{sn} + p_{sp}) / 2$
$p_{sn} = k_{0n} + k_{1n} w_n$
$p_{sp} = k_{0p} + k_{1p} w_p$
Dynamic power can be calculated as:
$p_d = a \, c_l \, v_{dd}^2 \, f$
$c_l = c_i + c_g + c_c$
$p_d$, $a$, $c_l$, $v_{dd}$ and $f$ are dynamic power, activity factor, load capacitance, supply
voltage and frequency, respectively
Load capacitance is composed of the input capacitance of the next repeater (ci),
ground (cg) and coupling (cc) capacitances of the wire driven
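As a rough illustration, the leakage and dynamic power expressions above translate directly into Python. The function names are hypothetical and the fitting coefficients are arguments, since the slides do not list their values.

```python
def leakage_power(w_n: float, w_p: float,
                  k0n: float, k1n: float, k0p: float, k1p: float) -> float:
    """p_s = (p_sn + p_sp) / 2 with each term linear in device width."""
    p_sn = k0n + k1n * w_n   # NMOS term
    p_sp = k0p + k1p * w_p   # PMOS term
    return 0.5 * (p_sn + p_sp)


def dynamic_power(a: float, c_i: float, c_g: float, c_c: float,
                  vdd: float, f: float) -> float:
    """p_d = a * c_l * vdd^2 * f with c_l = c_i + c_g + c_c."""
    c_l = c_i + c_g + c_c
    return a * c_l * vdd ** 2 * f
```

With capacitances in farads, voltage in volts and frequency in hertz, `dynamic_power` returns watts; the leakage units follow whatever units the fitted coefficients carry.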
Repeater and Wire Area Models
For existing technologies, the area of a repeater can be
calculated as:
$a_r = \tau_0 + \tau_1 w_n$
$a_r$ denotes repeater area, $\tau_0$ and $\tau_1$ are coefficients obtained by linear
regression; $w_n$ and $w_p$ are the widths of NMOS and PMOS, respectively
For future technologies, feature size ($F$), contacted pitch ($CP$),
row height ($RH$), and row width ($RW$) can be used to estimate
the area:
$NF = (w_p + w_n + 2F) / RH$
$RW = NF \times (F + CP) + CP$
$a_r = RH \times RW$
Wiring area can be calculated as:
$a_w = n \times (w_w + s_w) + s_w$
$a_w$ denotes wire area, $n$ is the bit width of the bus, and $w_w$ and $s_w$ are
wire width and spacing
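The area expressions above can be sketched in Python as below. Function names are hypothetical. Note that the wiring expression, taken literally, has the dimension of a length (the footprint across the bus); multiplying it by the bus length to obtain an area is an assumption of this sketch and is left to the caller.

```python
def repeater_area_existing(w_n: float, tau0: float, tau1: float) -> float:
    """a_r = tau_0 + tau_1 * w_n, with tau_0 and tau_1 from linear regression."""
    return tau0 + tau1 * w_n


def repeater_area_future(w_p: float, w_n: float,
                         F: float, CP: float, RH: float) -> float:
    """Area estimate from feature size F, contacted pitch CP and row height RH."""
    NF = (w_p + w_n + 2.0 * F) / RH   # NF as defined on the slide
    RW = NF * (F + CP) + CP           # row width
    return RH * RW


def wire_area(n: int, w_w: float, s_w: float) -> float:
    """a_w = n * (w_w + s_w) + s_w for an n-bit bus."""
    return n * (w_w + s_w) + s_w
```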
Repeater Power and Area Models
Repeater area and power
models fit from simulation
data points
Area and leakage power are
linear over the range of
implementable repeater
sizes (larger repeater sizes
lead to higher leakage power)
Interconnect Optimization: Buffering
Conventional delay-optimal buffering leads to unrealistic buffer
sizes and high dynamic / leakage power, and is therefore suboptimal
[Figure: Pareto-optimal frontier of the power-delay tradeoff of a 5mm interconnect in 90nm / 65nm]
Our approach: iterative optimization of hybrid
objective (power + delay)
Search for optimal number and size of repeaters
Can be extended for other interconnect optimizations (e.g.,
wire sizing and driver sizing)
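A minimal sketch of a search over the number and size of repeaters under a weighted power + delay objective. This is not the authors' iterative procedure: it uses plain exhaustive search over a small discrete library, the delay and power expressions in `evaluate` are toy placeholders in arbitrary units, and the `weight` parameter that mixes the two terms is an assumption of the sketch.

```python
from itertools import product


def evaluate(n, size, length):
    """Return (delay, power) for a line split into n equal stages.

    All constants are placeholders in arbitrary units, not fitted values.
    """
    seg = length / n
    r_drv = 10.0 / size              # drive resistance shrinks with repeater size
    c_in = 1.0 * size                # input capacitance grows with repeater size
    r_w, c_w = 2.0 * seg, 3.0 * seg  # per-stage wire resistance and capacitance
    stage_delay = r_drv * (c_w + c_in) + r_w * (0.4 * c_w + 0.7 * c_in)
    delay = n * stage_delay
    power = n * (c_w + c_in) + 0.2 * n * size   # switched-capacitance + leakage proxies
    return delay, power


def optimize_buffering(length, sizes, max_n, weight):
    """Pick (n, size) minimizing weight * delay + (1 - weight) * power."""
    best = None
    for n, size in product(range(1, max_n + 1), sizes):
        delay, power = evaluate(n, size, length)
        cost = weight * delay + (1.0 - weight) * power
        if best is None or cost < best["cost"]:
            best = {"cost": cost, "n": n, "size": size,
                    "delay": delay, "power": power}
    return best


if __name__ == "__main__":
    for w in (0.0, 0.5, 1.0):
        print(w, optimize_buffering(5.0, [1, 2, 4, 8], 10, w))
```

Restricting `sizes` to cells that exist in a library is what keeps the result implementable; sweeping `weight` from 0 to 1 traces out a power-delay tradeoff for the toy model.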
Model Validation
Model comparison with results from physical implementation
{5mm wire} X {90nm, 65nm} X {wiring layers} X {design styles}
Model-predicted delays compared with delays from PrimeTime
Deviation of proposed model from PrimeTime delays < 15%
Impact on System-Level Design
Testcases
VPROC: video processor with 42 cores and 128-bit datawidth
dVOPD: dual video object plane decoder with 26 cores and 128-bit
datawidth
SOC | Tech. | Dynamic Power (mW), Original | Dynamic Power (mW), Proposed | Leakage Power (mW), Original | Leakage Power (mW), Proposed
VPROC | 90nm | 117.3 | 364.8 | 38.1 | 99.6
VPROC | 65nm | 51.1 | 179.9 | 69.9 | 86.7
dVOPD | 90nm | 63.4 | 88.0 | 14.2 | 32.5
dVOPD | 65nm | 27.3 | 73.2 | 25.7 | 33.2
Original model (Orig.) underestimates power compared to the Proposed Model (Prop.)
SOC | Tech. | Device Area (mm²), Original | Device Area (mm²), Proposed | Total Area (mm²), Original | Total Area (mm²), Proposed
VPROC | 90nm | 0.070 | 0.009 | 0.370 | 0.346
VPROC | 65nm | 0.036 | 0.007 | 0.217 | 0.223
dVOPD | 90nm | 0.026 | 0.003 | 0.141 | 0.162
dVOPD | 65nm | 0.013 | 0.003 | 0.082 | 0.085
SOC | Tech. | Avg # of hops, Original | Avg # of hops, Proposed | Max # of hops, Original | Max # of hops, Proposed
VPROC | 90nm | 3.09 | 3.01 | 4 | 5
VPROC | 65nm | 3.10 | 3.42 | 4 | 6
dVOPD | 90nm | 1.76 | 1.76 | 3 | 3
dVOPD | 65nm | 1.76 | 1.91 | 3 | 4
Original Model is very optimistic in delay (i.e., the synthesis result may actually be infeasible).
This could become more critical as technology scales and the chip size becomes larger than
the critical sequential length.
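To quantify how far the two models differ, the dynamic power figures reported above can be turned into proposed-to-original ratios. The short Python sketch below assumes each pair of values reads as (Original, Proposed) for one testcase and technology node; the names `DYNAMIC_MW` and `proposed_over_original` are hypothetical.

```python
# (original, proposed) dynamic power in mW, transcribed from the results above
DYNAMIC_MW = {
    ("VPROC", "90nm"): (117.3, 364.8),
    ("VPROC", "65nm"): (51.1, 179.9),
    ("dVOPD", "90nm"): (63.4, 88.0),
    ("dVOPD", "65nm"): (27.3, 73.2),
}


def proposed_over_original(table):
    """Ratio of proposed-model to original-model value for each entry."""
    return {key: proposed / original
            for key, (original, proposed) in table.items()}


if __name__ == "__main__":
    for (soc, node), ratio in proposed_over_original(DYNAMIC_MW).items():
        print(f"{soc} {node}: proposed/original = {ratio:.2f}")
```

Every ratio is above 1, which is the sense in which the original model underestimates dynamic power.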
Conclusions and Future Directions
Accurate models can drive effective system-level exploration
Inaccurate models can lead to misleading design targets
Reproducible methodology for extracting inputs to models
from reliable sources
More realistic buffering scheme, where power and area are
considered in addition to delay
Modeling of NoC components besides wires
Across future nanometer technologies (45nm and beyond)
At different levels of abstraction
protocol encapsulation (e.g., hand-shaking for AMBA bus allocation)
buses, pipelined rings (e.g. EIB in IBM Cell)
routers, network interfaces
FIFOs, queues, crossbar switches (where ORION left off)
from high-level analytical models to low-level executable models
Extending to other metrics
Reliability estimation (i.e., error probability of transmission over wires)