Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Variable-frequency drive wikipedia , lookup
Microprocessor wikipedia , lookup
Stray voltage wikipedia , lookup
Buck converter wikipedia , lookup
Opto-isolator wikipedia , lookup
Switched-mode power supply wikipedia , lookup
Alternating current wikipedia , lookup
Power MOSFET wikipedia , lookup
Voltage optimisation wikipedia , lookup
ARTEMIS: An Aging-Aware Runtime Application Mapping Framework for 3D NoC-based Chip Multiprocessors Nishit Kapadia, Venkata Yaswanth Raparti, Sudeep Pasricha Department of Electrical and Computer Engineering Colorado State University, Fort Collins, CO, U.S.A. [email protected], [email protected], [email protected] I. INTRODUCTION II. ARTEMIS FRAMEWORK: OVERVIEW Bias Temperature Instability (BTI) is the most dominant physical phenomenon that degrades the maximum switching rate of transistors under long periods of voltage stress. BTI causes gradual circuit slowdown over the operational lifetime of an electronic chip. The principal effect of such a circuit-aging mechanism is to increase circuit-threshold voltage (VT), which results in higher circuit-delay. From a system-level perspective, such VT-degradation causes slowdown in critical paths of processor-cores and network-on-chip (NoC) routers, thereby limiting overall system performance. Additionally, electromigration (EM) in metal wires on the chip leads to increased interconnect resistance over time. This phenomenon is most dominant in power delivery network (PDN) wires that carry larger unidirectional currents compared to signal wires. The increased resistance of the power-grid results in higher IR-drops in the PDN, which causes further circuit slowdown due to degradation of supply voltage. These adverse effects of EM are expected to be particularly severe in 3D chip-multiprocessors (CMPs) that possess limited number of power-pins and higher current densities. A commonly proposed solution to mitigate circuit-slowdown due to aging is to hike the supply voltage; however this increases currentdensities in the PDN due to the increased power consumption on the die, which in turn expedites PDN-aging. Moreover, hiking supply voltage also increases VT-degradation, further increasing the rate of circuit-aging. We thus note that mechanisms to enhance lifetime reliability in 3D NoC-based CMPs must consider circuit-aging together with PDN-aging. In this work, for the first time, we propose a novel runtime framework (ARTEMIS) for intelligent dynamic application-mapping and voltage-scaling to simultaneously manage aging in circuits and the PDN, and enhance the performance and lifetime of 3D NoC-based CMPs. We also propose a symmetrical aging-enabled routing (SAR) algorithm that balances the degree of aging between NoC routers and cores, thereby increasing the combined lifetime of both. Our framework also considers dark silicon power constraints that are becoming a major design challenge in scaled technologies, particularly for 3D stacked CMPs. The key aspects of our proposed framework are illustrated in Figure 1. We consider a scenario where applications arrive at runtime to be executed on the 3D CMP platform. The knowledge of the chipaging profile is continuously utilized in the application-mapping and dynamic voltage scaling (DVS) scheduling steps. The applicationmapping at runtime consists of mapping the application task-graph on to a chosen rectangular- or cuboidal-shaped region of tiles (similar to [6] and [7]) on the 3D CMP admissible for the application (from the list {B1,β¦,Bn}), as well as performing routing path allocation of the intra-application communication-flows on the 3D NoC. At any given time, the scaling-down of Vdd via DVS-scheduling to save power and to limit aging is constrained by the frequency-constraints of the applications running on the 3D CMP, whereas scaling-up of Vdd is constrained by the DS-PB. Due to space constraints, all the details of our aging-aware application mapping and DVS as well as our circuitand PDN-aging analyses are not discussed. Sections II.A and II.B briefly describe the two main steps employed in ARTEMIS. Objective: Perform runtime application-mapping and DVS-scheduling on a given 3D NoC-based CMP platform such that the total number of applications executed over the lifetime of the chip is maximized, while all application-specific minimum operating frequency- and maximum runtime-constraints, as well as CMP platform-specific dark-silicon power-budget (DS-PB) are satisfied. Our heuristic utilizes the runtime profiles of VT-degradation and worst-case (WC) IR-drops of the 3D CMP (aging processes are modeled based on [1] and [2] in our simulation framework). The objective is to find the region on the 3D-mesh (with one of the admissible shapes) so as to: (a) minimize leakage-power; (b) minimize EM-induced degradation of PDN-paths supplying to cores with high WC-IR-drops; (c) satisfy the frequency-constraint of the application by all cores within the region; (d) satisfy the DS-PB. To this end, we define the following cost-function (Ξ¨) for joint optimization of leakage-power and PDN-degradation: Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. Copyright is held by the owner/author(s). NOCS '15, September 28-30, 2015, Vancouver, BC, Canada ACM 978-1-4503-3396-2/15/09. http://dx.doi.org/10.1145/2786572.2786578 Figure 1. Overview of ARTEMIS runtime aging-aware applicationmapping and DVS-scheduling framework A. Circuit- and PDN-aging Aware Mapping Region-Selection and Voltage-Selection Heuristic Ξ¨ = βπ=π·ππ {πΌ. ( π=1 maxβ‘_ππ βπππ maxβ‘_ππ βπππ_ππ ) + π½( ππΆ_πΌπ _πππππ maxβ‘_πΌπ _ππππ )}β‘β‘β‘β‘β¦.. (1) The heuristic performs a simple exhaustive search over all tiles on the 3D-mesh and over all admissible shapes {B1, β¦, Bn} for the application. The VT-profile and WC-IR-drop profile inputs are used for calculating the value of Ξ¨. The region with the least Ξ¨ value that satisfies the frequency-constraints and at the same time does not violate the DS-PB, is selected for mapping the application under consideration. If no region on the 3D-mesh is found to satisfy the frequency-constraints, we repeat the search for successively higher Vdd-levels. B. Symmetric Aging-enabled Routing Path Allocation (SAR) In this step, we map the communication-flows of the current application on to the designated cuboidal region on the 3D NoC-based CMP. We propose a symmetric aging-enabled and congestion-aware routing scheme (SAR) to produce a balanced core-router aging profile and extend the lifetime of the 3D NoC. The main objective of SAR is to minimize the number of runtime scenarios where applicationmapping on a given cuboidal region is precluded due to aging in routers. Note that an application can be mapped only if all tiles (each tile has a compute-core and a NoC-router) within the region under consideration satisfy the minimum application-frequency constraint. Prior work on aging- enabled routing (such as [5]) considers the aging in NoC-routers but does not consider the aging in compute-cores. Such an approach could lead to a somewhat imbalanced aging within tiles of the CMP, thus potentially preventing application mapping onto desirable CMP regions due to excessive aging in NoC routers. SAR on the other hand enables symmetric aging on individual tiles of the 3D-CMP to extend the service life of NoC routers. Additionally, SAR efficiently trades-off aging with network-congestion in the NoC by choosing routing paths to maximize NoC-lifetime while leveraging the knowledge of maximum execution time constraints of applications, i.e., the aging metric in the routing cost function is prioritized by varying degrees, given the time-slack available for application-completion. compared to WL+DVS, despite ARTEMIS having a longer lifetime and servicing a higher number of applications. Figure 2(c) shows the variance in the WC-IR-drop-distribution on the 3D chip obtained at the end of lifetime with different frameworks. A smaller variance of IR-drops with ARTEMIS framework (up to 24% lower compared to WL+DVS) signifies efficient management of PDN-aging that aides in improving the longevity of the PDN. Figure 2(d), shows the mean effective VT-degradation in computecores at the end of lifetime. As discussed earlier, the VT-values of circuit components increase with aging. Given the nominal-VT of 0.3V at the start of lifetime, observe in figure 2(d) that the mean VTdegradation values at the end of lifetime for ARTEMIS framework are significantly higher (by up to 30% for compute-intensive workloads) compared to the WL+DVS framework. By restricting the EM-induced PDN-degradation, ARTEMIS can extend the tolerable degree of circuit-aging (VT-degradation) in compute-cores, while meeting the same performance constraints. Thus the 3D CMP remains functional for much higher VT-degradation with ARTEMIS compared to other approaches, highlighting a key feature of our framework. III. EXPERIMENTAL RESULTS Our experiments compare our ARTEMIS framework with two other runtime mapping approaches derived from prior work: (i) traditional worst-case guard-banding approach (WC-GB): region selection is done based on the runtime area constrained mapping approach [6] that attempts to fit the maximum number of applications on the chip. This approach neither assumes DVS nor assumes runtime inputs from aging-sensors to make mapping decisions and thus is not aging-aware; (ii) wear-leveling approach with DVS (WL+DVS): region-selection for application-mapping is always done based on the lowest average VT-degradation in cores, as proposed in [3] and [4]; in addition, Vdd is opportunistically reduced when possible and adaptively hiked with aging to meet application performance constraints. Figure 2(a) shows the total number of applications serviced over the chip lifetime, across the three different types of application-inputsequences. The error-bars in all our plotted results represent the range of results across simulations with five different randomly generated application-sequences (with individual applications derived from the SPLASH-2 and PARSEC benchmarks). As expected, the WL+DVS framework outperforms the WC-GB approach that does not perform DVS. By intelligently selecting application-regions on the 3D-die with its region-selection heuristic, our ARTEMIS framework achieves a further reduction in leakage-power dissipation and stress on the more highly degraded PDN-paths. The ARTEMIS framework produce 9%β40% (25% average) improvement in the total number of applications serviced over the next best framework, WL+DVS. We also analyze the distribution of percentage worst-case IR-drops (%WC-IR-drops) at the end of lifetime with different frameworks. Figure 2(b) shows the maximum %WC-IR-drops obtained for different frameworks at the end of chip lifetime. The aging-unaware WC-GB framework maps applications such that some cores are more heavily loaded than others, thus resulting in the shortest lifetimes with high maximum WC-IR-drops. With our strategy to prioritize mapping on cores with less WC-IR-drops, ARTEMIS framework produces lower maximum %WC-IR-drop-values (by up to 9% lower), Figure 2. Results obtained at the end of lifetime with all frameworks IV. CONCLUSION We proposed an aging-aware application-mapping and DVS scheduling framework (ARTEMIS) that considers PDN-aging of 3D NoC-based CMPs in addition to circuit-aging in both the performance and aging evaluation stages, and the lifetime optimization methodology. Compared to a framework based on the best known prior works on aging-aware mapping techniques, ARTEMIS is able to service 25% more applications (on average) over the chip lifetime. ACKNOWLEDGEMENTS This research is supported by grants from SRC, NSF (CCF1252500, CCF-1302693), and AFOSR (FA9550-13-1-0110). REFERENCES [1] [2] [3] [4] [5] [6] [7] J. Velamala et al., βStatistical aging under dynamic voltage scaling: a logarithmic model approach,β IEEE CICC, Sept. 2012. V. Mishra, S. Sapatnekar et al., βThe impact of electromigration in copper interconnects on power grid integrity,β DAC, May 2013. A. Tiwari, J. Torrellas, βFacelift: Hiding and slowing down aging in multicores,β MICRO Nov. 2008. S. Feng et al., βMaestro: Orchestrating lifetime reliability in chip multiprocessors,β HIPEAC 2010. K. Bhardwaj et al., βTowards graceful aging degradation in NoCs through an adaptive routing algorithm,β DAC June, 2012. M. Fattah et al., βSmart hill climbing for agile dynamic mapping in many-core systems,β Proc. DAC, June 2013. N. Kapadia, S. Pasricha, βVARSHA: Variation and Reliability-Aware Application Scheduling with Adaptive Parallelism in the Dark-Silicon Era,β Proc. DATE, Mar. 2015.