ARTEMIS: An Aging-Aware Runtime Application Mapping Framework for
3D NoC-based Chip Multiprocessors
Nishit Kapadia, Venkata Yaswanth Raparti, Sudeep Pasricha
Department of Electrical and Computer Engineering
Colorado State University, Fort Collins, CO, U.S.A.
[email protected], [email protected], [email protected]
I. INTRODUCTION
Bias Temperature Instability (BTI) is the dominant physical
phenomenon that degrades the maximum switching rate of transistors
under long periods of voltage stress. BTI causes gradual circuit
slowdown over the operational lifetime of an electronic chip. The
principal effect of such a circuit-aging mechanism is to increase
circuit-threshold voltage (VT), which results in higher circuit-delay.
From a system-level perspective, such VT-degradation causes
slowdown in critical paths of processor-cores and network-on-chip
(NoC) routers, thereby limiting overall system performance.
Additionally, electromigration (EM) in metal wires on the chip leads
to increased interconnect resistance over time. This phenomenon is
most dominant in power delivery network (PDN) wires that carry
larger unidirectional currents compared to signal wires. The increased
resistance of the power-grid results in higher IR-drops in the PDN,
which causes further circuit slowdown due to degradation of supply
voltage. These adverse effects of EM are expected to be particularly
severe in 3D chip-multiprocessors (CMPs), which have a limited
number of power-pins and higher current densities.
A commonly proposed solution to mitigate circuit-slowdown due
to aging is to raise the supply voltage; however, this increases
current densities in the PDN due to the increased power consumption on the
die, which in turn expedites PDN-aging. Moreover, raising the supply
voltage also increases VT-degradation, further increasing the rate of
circuit-aging. We thus note that mechanisms to enhance lifetime
reliability in 3D NoC-based CMPs must consider circuit-aging
together with PDN-aging. In this work, for the first time, we propose
a novel runtime framework (ARTEMIS) for intelligent dynamic
application-mapping and voltage-scaling to simultaneously manage
aging in circuits and the PDN, and enhance the performance and
lifetime of 3D NoC-based CMPs. We also propose a symmetric
aging-enabled routing (SAR) algorithm that balances the degree of
aging between NoC routers and cores, thereby increasing the
combined lifetime of both. Our framework also considers dark silicon
power constraints that are becoming a major design challenge in
scaled technologies, particularly for 3D stacked CMPs.
II. ARTEMIS FRAMEWORK: OVERVIEW
The key aspects of our proposed framework are illustrated in
Figure 1. We consider a scenario where applications arrive at runtime
to be executed on the 3D CMP platform. Knowledge of the chip-aging
profile is continuously utilized in the application-mapping and
dynamic voltage scaling (DVS) scheduling steps. Application-mapping
at runtime consists of mapping the application task-graph
onto a chosen rectangular- or cuboidal-shaped region of tiles (similar to
[6] and [7]) on the 3D CMP admissible for the application (from the
list {B1,…,Bn}), as well as performing routing path allocation of the
intra-application communication-flows on the 3D NoC. At any given
time, the scaling-down of Vdd via DVS-scheduling to save power and
to limit aging is constrained by the frequency-constraints of the
applications running on the 3D CMP, whereas scaling-up of Vdd is
constrained by the dark-silicon power-budget (DS-PB). Due to space
constraints, we do not discuss all the details of our aging-aware
application mapping and DVS or of our circuit- and PDN-aging analyses.
Sections II.A and II.B briefly describe the two main steps employed in
ARTEMIS.
Objective: Perform runtime application-mapping and DVS-scheduling
on a given 3D NoC-based CMP platform such that the total number of
applications executed over the lifetime of the chip is maximized,
while all application-specific minimum operating frequency- and
maximum runtime-constraints, as well as CMP platform-specific
dark-silicon power-budget (DS-PB) are satisfied.
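The two-sided Vdd constraint in this objective (a frequency floor from the applications, a power ceiling from the DS-PB) can be illustrated with a minimal sketch. It is not the ARTEMIS DVS scheduler: the function name `pick_vdd_level`, the per-level frequency and power tables, and all numbers are hypothetical, chosen only to show the selection logic.

```python
from typing import Optional, Sequence


def pick_vdd_level(
    levels: Sequence[float],
    freq_at_level: Sequence[float],
    power_at_level: Sequence[float],
    required_freq: float,
    power_budget: float,
) -> Optional[float]:
    """Return the lowest Vdd level that meets both constraints, else None.

    levels[i] is a supply voltage, freq_at_level[i] the frequency the
    (aged) cores can sustain at that voltage, and power_at_level[i] the
    resulting chip power. All three tables are assumed inputs.
    """
    # Try the lowest voltage first: it saves power and limits aging.
    for vdd, freq, power in sorted(zip(levels, freq_at_level, power_at_level)):
        meets_frequency = freq >= required_freq   # limits scaling down
        within_budget = power <= power_budget     # limits scaling up
        if meets_frequency and within_budget:
            return vdd
    return None


# Hypothetical tables: three Vdd levels (V), frequencies (GHz), power (W).
print(pick_vdd_level([0.8, 0.9, 1.0], [1.6, 2.0, 2.4], [40.0, 55.0, 75.0],
                     required_freq=1.8, power_budget=60.0))  # 0.9
```

When no level satisfies both constraints the function returns `None`, which corresponds to an application that cannot be admitted under the current aging state and power budget.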
Permission to make digital or hard copies of part or all of this work for
personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear
this notice and the full citation on the first page. Copyrights for third-party
components of this work must be honored. For all other uses, contact the
Owner/Author.
Copyright is held by the owner/author(s).
NOCS '15, September 28-30, 2015, Vancouver, BC, Canada
ACM 978-1-4503-3396-2/15/09.
http://dx.doi.org/10.1145/2786572.2786578
Figure 1. Overview of ARTEMIS runtime aging-aware application-mapping and DVS-scheduling framework
A. Circuit- and PDN-aging Aware Mapping Region-Selection and Voltage-Selection Heuristic
Our heuristic utilizes the runtime profiles of VT-degradation and
worst-case (WC) IR-drops of the 3D CMP (aging processes are
modeled based on [1] and [2] in our simulation framework). The
objective is to find the region on the 3D-mesh (with one of the
admissible shapes) so as to: (a) minimize leakage-power; (b)
minimize EM-induced degradation of PDN-paths supplying cores
with high WC-IR-drops; (c) satisfy the frequency-constraint of the
application on all cores within the region; (d) satisfy the DS-PB. To
this end, we define the following cost-function (Ψ) for joint
optimization of leakage-power and PDN-degradation:
$$\Psi = \sum_{k=1}^{DoP} \left\{ \alpha \left( \frac{\mathit{max\_V_T} - V_{T_k}}{\mathit{max\_V_T} - \mathit{nom\_V_T}} \right) + \beta \left( \frac{\mathit{WC\_IR\_drop}_k}{\mathit{max\_IR\_drop}} \right) \right\} \qquad (1)$$
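As a rough illustration, Eq. (1) can be evaluated for one candidate region as in the sketch below. The function name `psi`, the weights `alpha = beta = 0.5`, and the values used for `max_vt` and `max_ir_drop` are assumptions made for this example; only the nominal VT of 0.3 V comes from the text.

```python
from typing import Sequence


def psi(
    vt: Sequence[float],
    wc_ir_drop: Sequence[float],
    max_vt: float,
    nom_vt: float,
    max_ir_drop: float,
    alpha: float = 0.5,   # assumed weight of the leakage-related term
    beta: float = 0.5,    # assumed weight of the PDN-degradation term
) -> float:
    """Evaluate Eq. (1) over the cores of one candidate region.

    vt[k] and wc_ir_drop[k] are the current threshold voltage and
    worst-case IR-drop of core k; the sum runs over all cores supplied.
    """
    if len(vt) != len(wc_ir_drop):
        raise ValueError("vt and wc_ir_drop must have one entry per core")
    vt_span = max_vt - nom_vt
    total = 0.0
    for vt_k, drop_k in zip(vt, wc_ir_drop):
        # Close to 1 for a fresh (low-VT, leakier) core, 0 at max_vt.
        leakage_term = (max_vt - vt_k) / vt_span
        # Grows with the worst-case IR-drop seen by the core.
        pdn_term = drop_k / max_ir_drop
        total += alpha * leakage_term + beta * pdn_term
    return total


# Two cores: one fresh (VT = 0.30 V), one partly aged (VT = 0.35 V).
example = psi(vt=[0.30, 0.35], wc_ir_drop=[0.02, 0.05],
              max_vt=0.40, nom_vt=0.30, max_ir_drop=0.10)
print(round(example, 3))  # 1.1
```

In this example the fresh core contributes 0.6 and the aged core 0.5, so a lower Ψ favors regions whose cores already have a higher VT (less leakage) and a lower WC-IR-drop.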
The heuristic performs a simple exhaustive search over all tiles on
the 3D-mesh and over all admissible shapes {B1, …, Bn} for the
application. The VT-profile and WC-IR-drop profile inputs are used
for calculating the value of Ψ. The region with the least Ψ value that
satisfies the frequency-constraints and at the same time does not
violate the DS-PB is selected for mapping the application under
consideration. If no region on the 3D-mesh is found to satisfy the
frequency-constraints, we repeat the search for successively higher
Vdd-levels.
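The search just described can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation: the callbacks `tile_cost`, `tile_fmax`, and `region_power` are hypothetical stand-ins for the per-tile term of the cost function, the aging-dependent frequency model, and the power model, and tile occupancy by other applications is ignored for brevity.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Tile = Tuple[int, int, int]


def select_region(
    mesh_dims: Tile,
    shapes: Sequence[Tile],
    vdd_levels: Sequence[float],
    tile_cost: Callable[[Tile], float],
    tile_fmax: Callable[[Tile, float], float],
    region_power: Callable[[List[Tile], float], float],
    f_min: float,
    power_budget: float,
) -> Optional[Tuple[float, float, List[Tile]]]:
    """Return (cost, vdd, tiles) of the cheapest feasible region, or None."""
    size_x, size_y, size_z = mesh_dims
    # Start at the lowest Vdd; move up only if nothing is feasible.
    for vdd in sorted(vdd_levels):
        best = None
        for dx, dy, dz in shapes:
            # Every origin at which this cuboid fits inside the mesh.
            origins = product(range(size_x - dx + 1),
                              range(size_y - dy + 1),
                              range(size_z - dz + 1))
            for x0, y0, z0 in origins:
                tiles = [(x0 + i, y0 + j, z0 + k)
                         for i, j, k in product(range(dx), range(dy), range(dz))]
                # Every tile in the region must meet the frequency constraint.
                if any(tile_fmax(t, vdd) < f_min for t in tiles):
                    continue
                # The region must not violate the dark-silicon power budget.
                if region_power(tiles, vdd) > power_budget:
                    continue
                cost = sum(tile_cost(t) for t in tiles)
                if best is None or cost < best[0]:
                    best = (cost, vdd, tiles)
        if best is not None:
            return best
    return None


# Hypothetical 2x2x1 mesh with one heavily aged (slow) tile at (1, 1, 0).
costs = {(0, 0, 0): 4.0, (1, 0, 0): 1.0, (0, 1, 0): 2.0, (1, 1, 0): 0.5}
fmax = lambda t, vdd: (1.5 if t == (1, 1, 0) else 2.0) * vdd
power = lambda tiles, vdd: 10.0 * len(tiles) * vdd * vdd
print(select_region((2, 2, 1), [(1, 2, 1), (2, 1, 1)], [0.9, 1.0],
                    costs.get, fmax, power, f_min=1.7, power_budget=20.0))
```

Iterating over Vdd levels in the outer loop mirrors the last sentence of the description: a higher voltage is tried only after the whole mesh has been searched without success at the current one.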
B. Symmetric Aging-enabled Routing Path Allocation (SAR)
In this step, we map the communication-flows of the current
application on to the designated cuboidal region on the 3D NoC-based
CMP. We propose a symmetric aging-enabled and congestion-aware
routing scheme (SAR) to produce a balanced core-router aging profile
and extend the lifetime of the 3D NoC. The main objective of SAR is
to minimize the number of runtime scenarios where application-mapping
on a given cuboidal region is precluded due to aging in
routers. Note that an application can be mapped only if all tiles (each
tile has a compute-core and a NoC-router) within the region under
consideration satisfy the minimum application-frequency constraint.
Prior work on aging-enabled routing (such as [5]) considers the aging
in NoC-routers but does not consider the aging in compute-cores.
Such an approach could lead to a somewhat imbalanced aging within
tiles of the CMP, thus potentially preventing application mapping
onto desirable CMP regions due to excessive aging in NoC routers.
SAR on the other hand enables symmetric aging on individual tiles of
the 3D-CMP to extend the service life of NoC routers. Additionally,
SAR efficiently trades off aging with network-congestion in the NoC
by choosing routing paths to maximize NoC-lifetime while leveraging
the knowledge of maximum execution time constraints of
applications, i.e., the aging metric in the routing cost function is
prioritized by varying degrees, given the time-slack available for
application-completion.
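The slack-dependent trade-off between aging and congestion can be illustrated with the sketch below. The SAR cost function itself is not given in this text, so everything here is an assumption: the per-router cost is taken as a weighted sum of a normalized aging score and a normalized congestion score, the weight is set equal to the fraction of time-slack available, and a standard Dijkstra search picks the path. The names `mesh_neighbors`, `route`, `aging`, and `congestion` are hypothetical.

```python
import heapq
from typing import Dict, Iterator, List, Optional, Tuple

Node = Tuple[int, ...]


def mesh_neighbors(node: Node, dims: Tuple[int, ...]) -> Iterator[Node]:
    """Yield the mesh neighbors of a router (works for 2D or 3D meshes)."""
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = node[axis] + step
            if 0 <= coord < size:
                yield node[:axis] + (coord,) + node[axis + 1:]


def route(src: Node, dst: Node, dims: Tuple[int, ...],
          aging: Dict[Node, float], congestion: Dict[Node, float],
          slack_fraction: float) -> Optional[List[Node]]:
    """Least-cost path where aging is weighted by the available slack."""
    w = min(max(slack_fraction, 0.0), 1.0)  # more slack -> favor aging

    def cost(router: Node) -> float:
        return w * aging[router] + (1.0 - w) * congestion[router]

    dist = {src: cost(src)}
    prev: Dict[Node, Node] = {}
    heap = [(dist[src], src)]
    while heap:
        d, router = heapq.heappop(heap)
        if router == dst:
            break
        if d > dist[router]:
            continue  # stale heap entry
        for nxt in mesh_neighbors(router, dims):
            nd = d + cost(nxt)
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = router
                heapq.heappush(heap, (nd, nxt))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


# 2x2x1 mesh: router (1,0,0) is heavily aged, router (0,1,0) is congested.
aging = {(0, 0, 0): 0.0, (1, 0, 0): 0.9, (0, 1, 0): 0.1, (1, 1, 0): 0.0}
congestion = {(0, 0, 0): 0.0, (1, 0, 0): 0.1, (0, 1, 0): 0.9, (1, 1, 0): 0.0}
print(route((0, 0, 0), (1, 1, 0), (2, 2, 1), aging, congestion, 1.0))
print(route((0, 0, 0), (1, 1, 0), (2, 2, 1), aging, congestion, 0.0))
```

With full slack the path avoids the aged router and accepts congestion; with no slack it takes the uncongested route through the aged router. One way to reflect the symmetric-aging goal in such a sketch would be to define the aging score of a router relative to the aging of the core in the same tile, but that choice is also an assumption.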
III. EXPERIMENTAL RESULTS
Our experiments compare our ARTEMIS framework with two other
runtime mapping approaches derived from prior work: (i) traditional
worst-case guard-banding approach (WC-GB): region selection is
done based on the runtime area constrained mapping approach [6] that
attempts to fit the maximum number of applications on the chip. This
approach neither assumes DVS nor assumes runtime inputs from
aging-sensors to make mapping decisions and thus is not aging-aware;
(ii) wear-leveling approach with DVS (WL+DVS): region-selection
for application-mapping is always done based on the lowest average
VT-degradation in cores, as proposed in [3] and [4]; in addition, Vdd is
opportunistically reduced when possible and adaptively hiked with
aging to meet application performance constraints.
Figure 2(a) shows the total number of applications serviced over
the chip lifetime, across the three different types of
application-input-sequences. The error-bars in all our plotted results
represent the range of results across simulations with five different
randomly generated application-sequences (with individual applications
derived from the SPLASH-2 and PARSEC benchmarks). As expected, the
WL+DVS framework outperforms the WC-GB approach that does not perform
DVS. By intelligently selecting application-regions on the 3D-die
with its region-selection heuristic, our ARTEMIS framework achieves
a further reduction in leakage-power dissipation and stress on the
more highly degraded PDN-paths. The ARTEMIS framework produces a
9%–40% (25% average) improvement in the total number of
applications serviced over the next best framework, WL+DVS.
We also analyze the distribution of percentage worst-case IR-drops
(%WC-IR-drops) at the end of lifetime with different frameworks.
Figure 2(b) shows the maximum %WC-IR-drops obtained for
different frameworks at the end of chip lifetime. The aging-unaware
WC-GB framework maps applications such that some cores are more
heavily loaded than others, thus resulting in the shortest lifetimes with
high maximum WC-IR-drops. With our strategy to prioritize mapping
on cores with lower WC-IR-drops, the ARTEMIS framework produces
lower maximum %WC-IR-drop values (up to 9% lower)
compared to WL+DVS, despite ARTEMIS having a longer lifetime
and servicing a higher number of applications. Figure 2(c) shows the
variance in the WC-IR-drop distribution on the 3D chip obtained at
the end of lifetime with different frameworks. A smaller variance of
IR-drops with the ARTEMIS framework (up to 24% lower compared to
WL+DVS) signifies efficient management of PDN-aging that aids in
improving the longevity of the PDN.
Figure 2(d) shows the mean effective VT-degradation in compute-cores
at the end of lifetime. As discussed earlier, the VT-values of
circuit components increase with aging. Given the nominal-VT of
0.3V at the start of lifetime, observe in Figure 2(d) that the mean
VT-degradation values at the end of lifetime for the ARTEMIS framework are
significantly higher (by up to 30% for compute-intensive workloads)
compared to the WL+DVS framework. By restricting the EM-induced
PDN-degradation, ARTEMIS can extend the tolerable degree of
circuit-aging (VT-degradation) in compute-cores, while meeting the
same performance constraints. Thus the 3D CMP remains functional
for much higher VT-degradation with ARTEMIS compared to other
approaches, highlighting a key feature of our framework.
Figure 2. Results obtained at the end of lifetime with all frameworks
IV. CONCLUSION
We proposed an aging-aware application-mapping and DVS-scheduling
framework (ARTEMIS) that considers PDN-aging of 3D
NoC-based CMPs in addition to circuit-aging, both in the performance
and aging evaluation stages and in the lifetime optimization
methodology. Compared to a framework based on the best-known
prior works on aging-aware mapping techniques, ARTEMIS is able to
service 25% more applications (on average) over the chip lifetime.
ACKNOWLEDGEMENTS
This research is supported by grants from SRC, NSF (CCF-1252500, CCF-1302693), and AFOSR (FA9550-13-1-0110).
REFERENCES
[1] J. Velamala et al., “Statistical aging under dynamic voltage scaling: a logarithmic model approach,” IEEE CICC, Sept. 2012.
[2] V. Mishra, S. Sapatnekar et al., “The impact of electromigration in copper interconnects on power grid integrity,” DAC, May 2013.
[3] A. Tiwari, J. Torrellas, “Facelift: Hiding and slowing down aging in multicores,” MICRO, Nov. 2008.
[4] S. Feng et al., “Maestro: Orchestrating lifetime reliability in chip multiprocessors,” HIPEAC, 2010.
[5] K. Bhardwaj et al., “Towards graceful aging degradation in NoCs through an adaptive routing algorithm,” DAC, June 2012.
[6] M. Fattah et al., “Smart hill climbing for agile dynamic mapping in many-core systems,” Proc. DAC, June 2013.
[7] N. Kapadia, S. Pasricha, “VARSHA: Variation and Reliability-Aware Application Scheduling with Adaptive Parallelism in the Dark-Silicon Era,” Proc. DATE, Mar. 2015.