Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Energy Conservation and Adaptation in Mobile and Embedded Systems Faculty: Kang G. Shin Grad students: Babu Pillai Hai Huang Sabbatical leave from Real-Time Computing Laboratory University of Michigan http://www.eecs.umich.edu/~kgshin My Current Thrust Areas Wired and Wireless QoS Networking Internet QoS with DiffServ, MPLS, overlay networks Dependable real-time protocols Wireless QoS and security protocols Embedded Systems Model-based integration of application SW Modeling and integration of OS and network services Energy-aware real-time OS and applications Embedded systems networks Internet Servers and Adaptware Workload differentiation and protection Overload detection and avoidance DDoS attacks Outline Motivation Energy-efficient real-time OSs Real-time dynamic voltage scaling Energy-aware QoS Conclusions Motivation Handheld and mobile comp and comm devices are everywhere! Increasingly complex SW and faster HW demand more energy Rapid increases in HW complexity, speed, and power consumption, but battery technology is not keeping up Need to conserve energy, improve computational efficiency via OS on power-constrained systems Real-Time & Energy Constraints Many power-constrained embedded or mobile systems have real-time tasks Time-critical computations, typically periodic Need to provide guarantees for meeting deadlines Available stored energy fundamentally limits system runtime Need to use energy efficiently, allocate to the most critical or desirable computations, while meeting real-time requirements Real-Time Task Characteristics Typically well-defined task set Canonical real-time task, Ti: Is periodic, with period Pi Has worst-case execution time (WCET), Ci Has relative deadline, di typically equal to Pi Periodic model can accommodate aperiodic and sporadic tasks Schedulability of RT systems well-studied Energy-Efficient RTOS Reduce overhead services => lower computational overhead =>lower CPU power consumption Optimized IPC for periodic RT tasks Combined Static Dynamic (CSD) scheduling Protocol stack layer-bypassing Eliminate naming services Reported at ACM SOSP’99, NOSSDAV’00, USENIX’02 Exploit HW mechanisms, e.g., voltage scaling of CPU, power management of memory subsystem Memory Power Management Goal: reduce power dissipation for memory access Main memory consists of multiple devices, each with independently-controlled power states Switch devices not needed for current task to lowpower states Modify page allocation to reduce number of devices in use by each task 59-94% memory power reduction with RDRAM Will present at USENIX’03 RT-DVS Goal: reduce per-cycle CPU energy costs Reducing frequency permits lower voltage Lower voltage on CPU to obtain V2 savings per cycle Frequency change affects execution time, disrupts RT scheduling Have developed energy-conserving algs for DVS that preserve RT guarantees (ACM SOSP’01) CPU Operating Voltage Predominant device technology is CMOS E V 2 Maximum gate delays inversely related to voltage Can reduce unit computation energy by reducing frequency and voltage Dynamic Voltage Scaling (DVS) [Weiser+94] busy system increase frequency idle system reduce frequency Many algorithms for non-RT applications Software adjustable PLL, voltage regulator often available, but intended for other things XScale, SpeedStep, PowerNow!, Crusoe Our Approach Development of real-time DVS (RT-DVS) DVS algorithms that maintain RT guarantees Simple enough for online scheduling Work closely with existing RT sched algs. In-depth simulation Implementation in a real, working system Static Voltage Scaling Earliest-Deadline-First (EDF) Worst-case utilization: ui = Ci / Pi Frequency selection: ui f / fmax utilization 1.00 0.75 0.50 0.25 0.00 u3= 0.4 u2= 0.2 u1= 0.3 f = 0.9*fmax Simulation Parameterized simulation Synthetic random real-time task sets uniform distribution of short (1-10), medium (10100), long (100-1000ms) periods Vary computation time distributions Vary hardware specifications Compare different scheduling algorithms and theoretical bounds. Simulation Setup Input: task set, system parameters, sched alg Output: energy consumption of each alg System parameters: list of freqs and voltages actual fraction of WCET for each task idle level (idle vs. normal op energy consumed) Theoretical lower bounds: task execution thruput only w/o timing issues Simulation Results 8 tasks in a task set 100 random task sets Workload = total worstcase utilization 3 freq./volt. settings: 5V, 1.0*fmax 4V, 0.75*fmax 3V, 0.5*fmax 1 Normalized Energy Static 0.8 0.6 0.4 0.2 0 0.1 0.3 0.5 Workload 0.7 0.9 Dynamic Scaling utilization If each task uses less than WCET, we can use lower frequency during its invocation interval 1.00 0.75 0.50 0.25 0.00 u3= 0.2 f = 0.7*fmax u3= 0.4 (k-1)P3 kP3 u2= 0.2 f = 0.9*fmax u1= 0.3 time Cycle-Conserving EDF = fmax* ui at Ti release set ui = Ci / Pi at Ti finish set ui = actual execution time / Pi f / fmax fdesired 1.00 0.75 0.50 0.25 0.00 U=0.9 U=0.7 U=0.8 U=0.7 U=0.7 U=0.6 U=0.6 U=0.5 C1/3 C /2 2 C3/2 D1 u1 = 0.3 u2 = 0.2 u3 = 0.4 C1/3 D2 D3 time Task 1 Task 2 Task 3 Simulation Results 8 tasks in a set 70% WCET each invocation 3 freq./volt. settings: 5V, 1.0*fmax 4V, 0.75*fmax 3V, 0.5*fmax 1 Normalized Energy Static ccEDF 0.8 0.6 0.4 0.2 0 0.1 0.3 0.5 Workload 0.7 0.9 Proactive Techniques Tasks typically use much less than WCETs Proactively reduce frequencies Look ahead to meet future deadlines Consider all tasks together Look-Ahead EDF Minimize current frequency Trade current savings for potential future loss Plan to defer work beyond next immediate deadline Ensure future deadlines with “reservation” Task 1 Task 2 Task 3 D1 D2 D3 time D1 D2 D3 time D1 D2 D3 D1 time Simulation Results 8 tasks in a set 70% WCET each invocation 3 freq./volt. settings: 5V, 1.0*fmax 4V, 0.75*fmax 3V, 0.5*fmax 1 Normalized Energy Static ccEDF laEDF 0.8 0.6 0.4 0.2 0 0.1 0.3 0.5 Workload 0.7 0.9 More Simulation Results Number of tasks is not important Voltage and frequency settings greatly affect performance Look-ahead does not always perform best Algorithms can perform close to theoretical lower bound Implementation PC notebook computer AMD K6-2+ processor, 550 MHz PowerNow! Technology: frequency can be changed 200-550MHz in 50MHz increments voltage selection 1.4V or 2.0V empirical mapping between voltage and frequency switching overheads: 0.4ms (voltage), 41 micros (freq) Processing 20W, screen backlight 7W Software Architecture Real-time extension to Linux 2.2 Modular design, plug-in schedulers RT Task Set PowerNow module Real-Time module Linux 2.2 Kernel Hardware Scheduler module Measurements 30 25 20 EDF ccEDF laEDF Simulated Watts Use oscilloscope with current probe Synthetic RT workload w/ backlighting off Similar to simulation results (20-40% savings) Measured Watts 15 10 5 0 0.2 0.4 0.6 Workload 0.8 0.95 4 3.5 3 2.5 2 1.5 1 0.5 0 EDF ccEDF laEDF 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Workload Power Measurement Interesting Observations The very first invocation of a task may overrun its WCET due to ‘cold’ processor OS states Dynamic addition of a task may cause transient deadline misses, especially with more aggressive schemes Insert the task immediately, but release it after completion of current invocations of all existing tasks Related work Most are loosely-coupled with OS and based on avg processor utilization Many non-RT DVS papers (esp., UCB) Offline WCET analysis + online heuristic Computation time probability heuristics [Krishna+00], [Swaminathan+00] [Gruian01] Compiler-based, application-level DVS [Mosse+00] RT-DVS Discussion Designed and evaluated 5 DVS algorithms for realtime systems (ACM SOSP’01) Provide deadline guarantees while scaling frequency and voltage Simple enough to be used as online schedulers Excellent energy savings, comparable to non-RT DVS Implemented in real system on top of Linux Code available at: http://kabru.eecs.umich.edu/rtos/ But, we need a larger energy framework Limits of Low-level Techniques DVS & processor halt conserve energy only when extra capacity available No general guidelines on how to make best use of limited energy Cannot provide more energy and runtime to more critical or valuable tasks Need to adapt app workload to maximize system gains or utility of computation Example A remote surveillance device transmits compressed video and audio Solar-powered, but must run overnight 3 real-time tasks: Radio transmitter (critical): constant bit rate Video codec (degradable): -high quality (30 fps, 640x480) MPEG4 -low quality (10 fps, 160x120) MPEG1 Audio codec (noncritical): mp3, either ON or OFF Example, cont’d Adapt task set based on power consumption of tasks, available energy, hours until daylight, and relative value of the tasks, e.g., During daytime or high battery levels: radio, video at high quality, audio ON Low battery at night: radio, video low quality, audio ON Energy is critically low: radio, video low quality, audio OFF Dynamic adaptation needed in general, as battery levels and time until daylight are variable EQoS Need to maximize benefits gained from energy spent, but HOW? => Energy-aware Quality of Service (EQoS): Vary per-task QoS, which directly affects the task’s utility and energy consumption Select a set of task QoS levels to maximize total system utility over given runtime Cast selection into tractable, optimization problem EQoS Design EQoS design goals: Leverage low-level halt, DVS mechanisms Meet system runtime goal Maximize benefits of task execution Need methods of changing task QoS levels and specifying benefits and energy requirements How to change per-task QoS? Adopt RT fault-tolerance techniques: Period extension Imprecise computation Apply different algorithms or CODECs Omission Degraded service requires less energy For EQoS, need to specify set of QoS levels and their average required energy for each task Utility Abstract notion of value from executing tasks Need to specify utility for each QoS level of a task: Increasing Rewards for Increasing Service (IRIS) Performance Index (PI) for control applications Perceived quality metrics for multimedia Actual specification flexible to types of applications and systems designed EQoS Algorithms Given: Select a QoS level for each task to: tasks with QoS levels defined, energy required, and utility gained for each level remaining system energy desired runtime, or known time until recharge achieve desired runtime maximize total utility This can be formulated as a MCKP Each task as a category, and the set of its QoS levels as items in the category Knapsack size = power budget Item values and weights = utility rates and power consumption MCKP vs. EQoS Problem EQoS MC Knapsack Problem ... Task Category Item 1 Value Energy Weight Task Category Item 2 ... Value Energy Weight Task Category Item n Value Energy Weight Value Energy Weight ... Value Energy Weight ... Value Energy Weight Energy Weight Limit Runtime Constraint Optimal Algorithms NP-hard: all KP can be expressed as MCKP Exponential Search - O(mn) ... Branch-and-Bound (BB) Need fast bound computation Can use LMCKP as upper bound May still require exponential time Optimal Algorithms, cont’d Dynamic Programming (DP) time, O(mnk) Partial solutions for 1, 2, …, n tasks for all possible power budgets (energy/runtime) Pseudo-polynomial Heuristics Linear: Use LMCKP solution, as with BB bound Drop fractional part Greedy: Start with same approach as LMCKP Continue selecting smaller upgrades O(nm) overhead, without upgrades sorting Simulation Permits exploring a large space multidimensional task set space Simulate various hardware configurations, RT scheduling, DVS mechanisms Static RM, Static EDF, ccRM, ccEDF, laEDF Generated 1000 random task sets, each with 10 tasks, and each of which has up to 5 QoS levels QoS degradation models period extension, imprecise computation, algorithmic change Simulation Results EQoS algorithms achieve desired runtime DVS conserves extra energy, throws off estimated runtime Greedy Linear Max Min Opt 1.2 2 Normalized Utility Normalized Runtime 1 1.5 1 0.5 0.8 0.6 0.4 0.2 0 0.00E+00 2.00E+09 4.00E+09 6.00E+09 Initial Energy 8.00E+09 0 min linear greedy opti max Simulation Results - DVS DVS increases energy efficiency Throws off adaptation -- extends runtime 3 volt/freq: 5V, 1.0*fmax 4V, .75*fmax 3V, .35*fmax Linear 2.00E+09 4.00E+09 Max Min Opt 3.5 Normalized Runtime Greedy 3 2.5 2 1.5 1 0.5 0 0.00E+00 6.00E+09 Initial Energy 8.00E+09 Simulation Results, cont’d DVS compensation achieves desired runtime with higher utility Greedy Linear Max Min Opt 3 DVS 2 DVS-comp Normalized Utility Normalized Runtime 2.5 1.5 1 0.5 2 1.5 1 0.5 0 0.00E+00 2.00E+09 4.00E+09 6.00E+09 8.00E+09 0 min linear greedy opti max Initial Energy Adaptation w/ DVS Compensation Utility comparison between DVS compensation and w/o compensation Implementation Implemented on Linux 2.2 Periodic real-time support PowerNow! driver Real-time scheduler modules EQoS adaptation module Battery monitoring module Currently supports Athlon, Duron, K6-2 processors that implement AMD’s PowerNow! Technology Experiments Measurements on a Compaq Presario 1200Z Implement RT version of Lame MP3 encoder use quality parameter to vary QoS multiple concurrent instances Results follow trend observed in simulations Conclusions RT-DVS provides low-level CPU voltage control Maintains timing guarantees for RT tasks Significant energy savings, comparable to non-RT DVS EQoS provides task adaptation in energyconstrained real-time systems Provides guidelines to best utilize available energy among tasks Frame energy adaptation as a tractable problem Heuristics work nearly as well as optimal algorithms in practice