Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Resource management in embedded systems Tajana Šimunić Rosing HP Labs & Stanford University Embedded System Trends Increasingly complex systems Mixed hard & soft real-time requirements Coordination of subsystem & multi-system control Demand for dynamic & adaptive response Operation in unpredictably changing contexts Variable performance demands Management of resources (power, performance, availability, accessibility, throughput, security etc.) Demand for autonomy Human feasibility restrictions Faster hardware Fast processors and networks Integrated processing, common platforms FPGAs vs ASICs, DSPs; SoCs Source : NSF EU-US Workshop ‘01 Tajana Simunic Rosing Overview of Contributions Resource management in wireless embedded systems Utilize information already present in the system to deliver lower energy consumption with excellent quality of service Simultaneous increase in accessibility Power management of embedded HW Evolution of SoCs into NoCs Optimal power and performance management policy for NoCs Energy optimization of embedded software Methodology for lowering energy consumption in data-intensive embedded software algorithms using symbolic algebra techniques Tajana Simunic Rosing Resource Management of Heterogeneous Wireless Embedded Systems Networked Embedded Systems – Sensor Nodes Power consumption of sensor node subsystems Power (mW) 20 15 10 5 0 SENSORS CPU Energy breakdown for voice TX RX IDLE SLEEP Energy breakdown for MPEG video Encode Decode Receive RADIO Encode Transmit Decode Transmit Receive Lucent WaveLAN at 2 Mbps & SA-1100 CPU at 150 MPIS Source : Mobicom’01 SensorsTutorial Tajana Simunic Rosing 802.11b PM - Doze Mode - MAC layer PM is not sufficient due to continual network polling for data, increased RTT and broadcast traffic issues - Network layer management has no knowledge of application characteristics or of behavior of other clients in the environment 2 Heavy traffic conditions: the card is awake most of the time 1.8 1.6 Power(W) 1.4 awake state 1.2 1 0.8 0.6 0.4 2 0.2 1.8 987 929 871 813 755 697 639 581 523 465 407 349 291 233 175 117 59 1 0 1.6 tim e (10-4 s e c) 1.2 1 0.8 doze state 0.6 0.4 0.2 Medium traffic conditions: the card is awake half of the time 945 886 827 768 709 650 591 532 473 414 355 296 237 178 119 time (10-4 sec) network polling 2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 tim e (10-4 s e c) 973 919 865 811 757 703 649 595 541 487 433 379 325 271 217 163 109 1 0 55 Light traffic conditions: the card is sleeping most of the time Power (W) 1 0 60 Power (W) 1.4 Tajana Simunic Rosing Related Work in PM for Networks Homogeneous networks: 802.11e standard allows applications to specify their QoS needs Separate control and data channels for 802.11 based networks [Shi’02] • Decoupled low power control channel, device sends a wake-up call Protocols for sensor networks [Estrin’02] • Low duty cycle operations on radio Power efficient coordination in 802.11b networks [Chen’01] • Forwarding nodes remain active while neighboring nodes are in power save Heterogeneous networks Improved network and link layers for mobility • Mobile IP protocol for host mobility [Monarch CMU] • Contact networking [HPL & UIUC, MobiSys '03 ] Improved hand-off mechanism in overlay networks • Buffering of data at multiple base stations [Barwan UCB] Distributed file system for wireless [Coda,Odyssey CMU] • Caching of files at clients and duplication on accessible servers • Adjust quality of accessed data to match available resources Tajana Simunic Rosing Related work in PM Command Policy Data Heuristic techniques Power Manager Focus on power management of a single device (e.g. CPU, hard disk) Time-out and predictive models [Karlin’94, Hwang 97] DVS algorithms Stochastic methods Requests Active MDP [Benini 99, Qiu 99] • Memoryless distributions only • Discrepancy between predicted and measured savings TISMDP [Simunic 01] • Finite history of user occurrences • Decisions based on event occurrences • Large energy savings measured Queue 4 32 1 Idle min Client Power States Active Idle Sleep Costenergy( s, a) f ( s, a) s a s.t. f ( s, a) M ( s' , s, a) f ( s' , a) 0; a s ' s' a T ( s, a) f ( s, a) 1 s a Cost perf ( s, a) f ( s, a) Constr. s a Tajana Simunic Rosing System Resource Management Concentration on resource management aspects of diverse devices in a heterogeneous wireless network • maintain desired QoS • increase accessibility • maximize battery lifetime Implementation of policies to define • Which devices should communicate using which network interface • When should this communication take place • When should the communicating device be in a low power state Client RM Server RM Separation of policy and control “smart” clients communicate their needs to the resource manager stochastic modeling of communication patterns for independent clients policy takes into account client’s needs and determines control decisions WAN WLAN WPAN Tajana Simunic Rosing Comparison Server-driven Management State of the art DATA Server Client Client Server PM CONTROLS Server Continuously Streaming wireless always on • 802.11b • Doze interval regulation • Manual on/off • Bluetooth • Park,sniff,hold modes have to be specifically initiated by the server Server Adaptive Buffering wireless on only when data arrives • Control power state of all components • Use application knowledge – higher savings possible than just at MAC and network layers • Schedule/buffer with multiple clients Tajana Simunic Rosing Server RM Architecture Detail Client application Server application RTP/UDP/TCP APPLICATION LAYER Server RM TCP CLIENT RM OS LP Device Drivers Device RM Dynamic clock Speed setting CPU Idle/ Sleep Mode Tajana Simunic Rosing Server RM algorithm Server has the additional knowledge of multiple clients workload & traffic conditions Efficient transmission coordination in multiple clients environment Scheduling based on the link quality, client data consumption rates and power needs Estimation Server decides the RM policy Enable/disable various resouces (e.g. BT park) Adjust resource parameters (e.g. doze interval) Set duration of the sleep intervals Schedule on time wake-up (no delay) Server RM relies on the client RM Buffer size calculation Client RM controls the devices Utilizes Rate Monotonic and Earliest Deadline First algorithms to schedule communication with multiple clients Memory energy Communication energy & switch evaluation WNIC settings Tajana Simunic Rosing Estimation and buffer sizing process Maximum likelihood estimator keeps track of changes in WNIC throughput and data usage pattern of the applications m new ln( Pmax ) ( w c 1) ln new old t j old j k Size of buffer chosen to maximize sleep times Total buffer size: Buffer region actively involved in data transfer in steady state: Bactive Bsleep Bontrans Boff trans Buffer size required during Bswitch Tswitchu interface switch: B Bactive Bswitch Bcush Average sleep time: Tsleep Bsleep t u t u Tajana Simunic Rosing Energy consumed by WNIC and RAM EW NIC Pactive Ttransfer Tcushion PtransTtrans PsleepTsleep PswitchTswitch WNIC energy: Interface Switching ERAM Eactive Enon active RAM energy: 250 0.125 0.12 200 0.115 0.11 150 0.105 100 0.1 0.095 50 0.09 0 0 1 2 3 4 5 6 7 8 9 0.085 100 200 300 400 500 600 700 800 900 7 x 10 Average RAM power vs. buffer size Total energy: Average WNIC power vs. buffer size Etotal EWNIC ERAM Tajana Simunic Rosing Server RM Scheduling Algorithms Classical scheduling theory result: There is no dynamical scheduling algorithm for dynamically changing tasks on multiple processors that is provably optimal [Liu’73] Consequence: server RM scheduler for a multi-client system has to be heuristic Two different algorithms are compared: Earliest Deadline First: • Schedule clients with earliest deadline first • Optimal dynamic preemptive algorithm based on dynamic priorities for multiple tasks on a uniprocessor can achieve 100% utilization if: W CTE j Tj 1 j Rate Monotonic: • schedule clients with highest data consumption rate first (shortest period) • according to RM a set of periodic, independent task can be scheduled to meet their deadlines on a uniprocessor, if the sum of the utilization factors is given as: • Where: WCTE j Tj U (n) n(21/ n 1) j – WCTE is worst-case execution time of task j – Tj is the period of task j – U(n) is the utilization bound for n tasks Tajana Simunic Rosing Experimental setup Research prototype of HP’s hotspot server HP’s IPAQ 3970 with Bluetooth (CSR) and 802.11b (CISCO Aironet 350) Simulator for multiple client scenarios The applications used are MPEG4 video MP3 audio Email Telnet WWW DSR Results for: Server RM to single client Server RM to multiple clients Data consumption rate of applications kbps Tajana Simunic Rosing Server RM with MPEG4 Video Streaming WLAN Avg Power (mW) 1200 WLAN Avg Power (mW) No PM 802.11b PM RM 1100 1100 No PM 802.11b PM RM 900 1000 900 700 800 500 700 600 300 size= 50 size = 70 size = 80 size = 80 size = 70 size= 50 size=20 30 frame/sec to a mobile 15 frame/sec to a mobile 40% of power reduction 65% of power reduction larger number of concurrent clients with real time playback Tajana Simunic Rosing Effect of Traffic Conditions Power Consumption vs Traffic Conditions for Video Streaming Average Power (mW) 1000 800 600 SPM 802.11 PM 400 200 0 No Traffic Stanford (light) HP (Medium) Bologna (Heavy) Traffic Conditions Server RM saves 50% of power in heavy traffic conditions with respect to 802.11b PM The server RM performs better in medium/heavy traffic conditions 802.11b PM is useful for systems where broadcast traffic cannot be discarded Both implementations have real time video playback Tajana Simunic Rosing Server RM with MP3 Audio over 802.11b 180 160 140 120 100 80 60 40 20 Stream B Energy (J) 0 Stream A Energy (J) RM 802.11 PM 802.11 • Large energy savings with concurrent increase in number of clients supported – – • factor of 24 relative to current usage model for 802.11b factor of 3 with respect to 802.11 PM with no broadcast traffic Quality of service remains constant – playback is in real time Tajana Simunic Rosing Server RM for MP3 & E-mail via Bluetooth 0.18 0.16 0.14 0.12 0.1 0.08 0.06 0.04 2x concur r ent client s 0.02 1000x client s 0 M P3 B T E-mail B T On On • • M P3 B T Park Power (W) M P3 B T DS E-mail B T DS Large increase in availability of server to clients • Factor of 2 for MP3 streaming • Factor of 1000 for email Concurrent large savings in power • Factor of 2 for MP3 streaming and a factor of 28 for email Tajana Simunic Rosing WNIC switch Comparison of power consumption for an application trace consisting of MP3 audio, Email, Telnet, WWW and MPEG4 video. A factor of 3.0x improvement over solely using Bluetooth or 802.11b Tajana Simunic Rosing Server RM to Multiple Clients BLUETOOTH 802.11b MP3 MP3 Sch. Algo *BC: best client *WC: worst client Avg. power Data (W) trans Sch. Algo *BC: best client Avg. powe r *WC: worst client (MB) RM EDF BC (24.4kB/s) 0.0737 WC (19kB/s) 0.0422 41.5 BC (24.4kB/s) 0.0635 62 WC (19kB/s) 0.0499 49.5 BC(29.3kB/s) 0.0876 81.2 WC(14.7kB/s) 0.0039 2.6 EDF BC(29.3 kB/s) 0.0544 53.6 WC(14.7kB/s) 0.0366 31.3 BC(5.86 kB/s) 0.0179 15.8 WC(4.11kB/s) 0.0042 0.08 BC(5.86 kB/s) 0.0191 16.9 WC(4.11kB/s) 0.0105 6.7 RM EDF EDF RM EDF EDF 63.4 WC(19 kB/s) 0.2949 37.6 BC(24.4 kB/s) 0.3784 72.7 WC (19 kB/s) 0.2578 50.74 BC(29.3kB/s) 0.0696 75.6 WC(14.7kB/s) 0.4863 0 BC(29.3 kB/s) 0.4159 84.8 WC(14.7kB/s) 0.6201 4.33 BC(5.86 kB/s) 0.0205 14.9 WC(4.11kB/s) 1.0423 0.3 EDF BC(5.86 kB/s) 0.7785 17.9 WC(4.11kB/s) 1.0426 5.55 SENSO R NO DE BC(14.7kB/s) 0.0442 39.3 WC(12.5kB/s) 0.0367 32.7 BC(14.7kB/s) 0.0428 40.7 WC(12.5kB/s) 0.0365 34.4 Maximum power savings factor of 42 Average a factor of 4.6 compared to always on: 0.18 W 0.0594 WWW RM SENSO R NO DE RM BC(24.4 kB/s) MPEG4 WWW RM trans (MB) 68.7 MPEG4 RM (W) Data RM EDF BC(14.7kB/s) 0.0362 40 WC(12.5kB/s) 0.4583 11.2 BC(14.7kB/s) 0.4488 43.9 WC(12.5kB/s) 0.5768 27.3 Maximum power savings factor of 40 Average a factor of 2 compared to always on: 0.8W with 802.11b PM Compared Server RM Rate Monotonic scheduler with Earliest Deadline First in time period of 2hrs on two WNICs: Bluetooth (3 clients) and WLAN (30 clients) with multiple applications Clients with lower scheduling priority had an average 10% delay penalty Tajana Simunic Rosing Managing Power Consumption in Networks on Chips Related Work SOC interconnect standards [AMBA,CoreConnect,VSI,OCP] NOC architecture based on packet model Fat tree router topology [Guerrier00] Tiled architecture with flit-reservation flow control [Dally01] Correct-by-construction protocol stack – MESCAL tools [Sgroi01] Reduction of energy consumption in NOCs Maia processor has 21 satellite units; its configuration changes according to application needs – large energy savings [Wan00] Node and network-centric power management suggested [Benini02] Recently proposed power management systems Exclusively node-centric, with little or no outside information utilized Power management & dynamic voltage scaling occur separately Open loop control • policies designed once with no further optimization at run time Tajana Simunic Rosing Network on a Chip ARM Core PCM Buffer DMA Controller Flash MPEG Audio Core PM Audio Out Controller ARM Core DSP DMA Controller Speech Processing PM Audio In Controller RAM Flash RAM Router ARM Core DSP Flash DMA Controller MAC controller MPEG Video Core PM Display Controller Specifications Embedded CPU Communications PM RAM Baseband DSP RAM Flash EEPROM Audio Video Speech Comm. Active (mW) 700 1885 1500 1055 5140 Idle (mW) 216 235 1000 208 1659 Sleep (mW) 0.3 1.4 100 0.6 102.2 A-S-A (ms) 45.6 54.6 40 54.6 54.6 11 11 3 11 11 150 150 100 150 150 #DVS settings DVS switch (us) Radio Total Tajana Simunic Rosing Power Manager Implementation Local Power Manager Router Core Traffic Core Network PM Request Control Policy Core Function Estimator Core Power management Node-centric – fully contained in a local power manager Network-centric – network power management requests Local power manager implements closed-loop power management: Estimator • Observes incoming core traffic, core state & network PM requests • Estimates parameters used in recalculation of power management policy Controller • Sets core’s energy and performance states based on estimator input Tajana Simunic Rosing Node-centric PM Power management is based on Renewal Model Departure Active State foVo Idle State queue > 0 queue = 0 Renewal point Arrival Transition to Sleep State Arrival Transition to Active State queue > 0 No Arrival Arrival Component State User Queue > 0 Queue = 0 Device Active Transition Distribution Exponential Pareto Exponential Uniform Sleep State Exp 1 e et Pareto 1 b t a t tmin t t tmax Uniform tmaxtmin min 0 else Tajana Simunic Rosing Renewal Policy Optimization Basic assumptions: general distribution governs the first request arrival exponential distribution represents arrivals after the first arrival user, device and queue are stationary Optimize average performance under average power constraint randomized policy d ( j ) p( j ) min j T ( j ) p( j ) j s.t. ( Energy( j ) PconstrT ( j )) p( j ) 0 j p( j ) 1 j Globally optimal policy calculated in seconds using LP Tajana Simunic Rosing Closed-loop Renewal Policy Optimization Formulate dual of the Lagrangian Variables v,u & are the Lagrangian multipliers min v s.t. d ( j ) v( j )t ( j ) u ( j )[e( j ) t(j)Pconstr ] ( j ) 0 j Obtain a minimum crossing point of a set of lines specified by the following equation: v( j ) [e( j ) t(j)Pconstr ] d ( j) u( j ) t ( j) t ( j) Indexes of Lagrangian multipliers which form a solution, together with original constraints, are used to obtain the probabilities of transitioning into sleep state Real-time closed-loop control is possible Globally optimal policy calculated in milliseconds Tajana Simunic Rosing Node-centric estimation Estimation of exponential and Pareto distribution parameters Exponential Pareto Pareto 1 b t a Exp 1 e et calculate maximum likelihood ratio for all rate settings calculate interarrival (or interservice) time sums ( S tj ) evaluate natural log of maximum likelihood ratio, ln (Pmax ) n po int s new ln( Pmax ) k ln ( new old ) t j old j nchange if ratio is larger than the one obtained from the lookup table, assume that the rate has changed estimate parameters a & b using leastsquares method on the log of Pareto distribution 1 Prob(T>t) 0.1 0.01 Experimental Pareto Estimate 0.001 0.01 0.1 1 Idle time (t) in sec. 10 Tajana Simunic Rosing Controller implementation Consists of LFSR for generating probability & policy logic Controller on entry to idle state: obtains a random number RND & finds jh for which RND>p(jh) if no arrival during jh seconds, the core enters sleep state, otherwise it stays active Frequency and voltage are set so the average expected processing delay in the queue is kept constant: device Prdelay 60% user ( user device) 50% Error (%) Optimal Policy 40% 30% 20% Pow er Err 10% Penalty Err Idle time Probability (ms) to sleep jh p(jh) 0 0.00 10 0.00 20 0.12 30 0.43 40 0.75 50 0.87 60 0.91 70 1.00 0% 3 8 13 LFSR Bits FPGA synthesis LFSR Bits 5-15 LFSR Regs # LABs Max ns 1 4 Policy # LABs 2 Max ns 35 Synposys synthesis LFSR Regs #FFs % area 5 14% 9 14% 15 12% Policy #gates % area 193 86% 417 86% 855Tajana Simunic 87%Rosing Network centric PM Local Power Manager Router Control Policy Core Traffic Core Core Function Estimator Network PM Request Core Departure Active State foVo Idle State queue > 0 queue = 0 Arrival Arrival Transition to Active State Network queue > 0 request Arrival Renewal point Network request Transition to Sleep State No Arrival Sleep State Tajana Simunic Rosing Network centric PM implementation Estimator continues to have the same function as before Renewal model is expanded to include network requests Controller implementation changes: When all network cores release the local core, the probability of transition to sleep is 1.0 Source Node Network Idle Time (ms) 0 70 120 Any time Transition Probability 0 0.3 1 1 As soon as a request comes from a network core to the local core, the local core transitions to the active state with probability 1.0 Node-centric PM is still needed to implement DVS and PM in situations when early network requests are not available Tajana Simunic Rosing Network-centric results Specifications Audio Video Speech Comm. Active (mW) 700 1885 1500 1055 5140 Idle (mW) 216 235 1000 208 1659 Sleep (mW) 0.3 1.4 100 0.6 102.2 A-S-A (ms) 45.6 54.6 40 54.6 54.6 11 11 3 11 11 150 150 100 150 150 #DVS settings DVS switch (us) Total MPEG Audio Core Speech Processing MPEG Video Core Communications Power savings factor PM None Node Centric PM Type None DVS only DPM only DVS &DPM Network DVS &DPM Audio 1 1.4 2 2.9 Video 1 2 1.5 2.9 Comm. 1 1 3 3 S peech 1 3.9 2 5.8 Total 1 1.2 2.4 2.9 3.7 3.6 4.2 6.4 4.1 Network-centric DPM increases power savings from a factor of 2.9 to 4.1, while at the same time reducing performance penalty by more than 10% Tajana Simunic Rosing Complex Library Mapping for Embedded Software Using Symbolic Algebra Related Work Tree covering code generation [Aho] Retargetable compiler [Goossens, Paulin, Marwedel] Map to simple processor instruction Instruction mapping of ASIPs Power aware compiling Memory access optimization [Catthoor, Kandemir] Instruction reordering [Tiwari] Symbolic algebra for data-flow synthesis [ICCAD’01] Tajana Simunic Rosing Methodology Pre-optimized Embedded Library Algorithmic-level C Code Profiling Critical Code Polynomial Formulation Target Code Library Characterization Symbolic Library Mapping Optimized C Code using Embedded Library Tajana Simunic Rosing Library characterization Target library: Commercial library (e.g. Intel’s integrated performance primitives library) A set of in-house pre-optimized routines IEEE floating-point math library for Linux OS Each library element is labeled with: Type of inputs and outputs Performance and energy consumption obtained from cycle-accurate simulator Functionality & accuracy of its polynomial representation Library Element Perf (s) Factor float SubBandSyn 0.94813 1 fixed SubBandSyn 0.01026 92 IPP SubBandSyn 0.00198 479 float IMDCT 0.3872 1 fixed IMDCT IPP IMDCT 0.0144 0.0002 27 1898 xi n 1 2 k 0 2n y k cos( (2i 1 n )( 2k 1)) 2 Tajana Simunic Rosing Target code identification Algorithm Identify critical code segments with a profiler Formulate maximum size polynomials Higher likelihood of finding more complex library elements Achieved by transformation such as loop unrolling, constant and variable propagation, inlining… Polynomial representation for the critical code segments is calculated as follows: • Linear functions are extracted directly • Bit manipulations or Boolean functions use interpolation-based algorithms [Smith01] • Nonlinear functions are approximated with a Taylor or Chebyshev expansion whose accuracy is verified via cycleaccurate simulation Profiling Critical Code Polynomial Formulation Target Code Source Code Software Profile for ( i=0; i<30; i++) { x[i] = y[i] + 2 * x[i + 1]; z[i] -= x[i]; y[i] = x[i] + z[i]; } fun energy ----------------15% getD sort 10% init 2% ... LD R21, #30; ADD R21, R23,R27; ... ARM Instruction-level Simulator Profiler Processor Core Model L1 Cache Energy Consumption Processor & L1 Cache Energy Model Interconnect Energy Model L2 Cache Energy Model DC-DC Converter Energy Model Battery Memory Energy Model Tajana Simunic Rosing Mapping Algorithm Polynomial Representation of Critical Code THR Factor Expand Polynomial Rep. of Library Elements Horner Select Side Relation Set Add to Side Relation Set No Simplify Mapped? Yes Choose Best Solution Tajana Simunic Rosing Example Phase shift keying modulation Map a code segment of PSK to Library S := 1-.5*x0^2-x0*x1-.5*x1^2+.041667*x0^4+.166668*x0^3*x1 +.250002*x0^2*x1^2+.166668*x0*x1^3+.041667*x1^4; Butterfly IDCT PSK Cos Sin Mac Library Tajana Simunic Rosing Example Phase shift keying modulation Map a code segment of PSK to Library S := 1-.5*x0^2-x0*x1-.5*x1^2+.041667*x0^4+.166668*x0^3*x1 +.250002*x0^2*x1^2+.166668*x0*x1^3+.041667*x1^4; Butterfly IDCT › siderel := {y=x0+x1}; PSK › simplify(S, siderel, [x0,x1,y]); Cos 1.+.041667*y^4-.5*y^2 y := x0 + x1; s := cos(y); Sin Mac Library Tajana Simunic Rosing Experimental Setup PCMCIA power side 1 PCMCIA slot Processor power SA-1100 based embedded system with MP3 from ISO as an application side 2 System & component power measurements, timer for performance Tool implemented in C with calls to Maple V SDRAM slot Fine grain power measurements by data acquisition board Data acquisition board Tajana Simunic Rosing Execution time (s) Original Code r rd e III _ ec o _d fm an hu One frame decodes in 2.6 seconds Profiler results show three critical functions re o d. .. eo st er III _ Functions III _ III _ an tia lia br id III _ hy tL in v yn th an dS _m dc s es i .. e_ s. Su bB tiz qu an de III _ s 1.4000 1.2000 1.0000 0.8000 0.6000 0.4000 0.2000 0.0000 Generate as large as possible polynomials for the critical functions Tajana Simunic Rosing MP3 Final Results Code version Perf (s) Original 503.92 IPP SubBand 301.43 IPP SubBand & IMDCT 211.27 In-House (IH) Library 5.47 In-House + IPP SubBand 3.33 IH+IPP SubBand & IMDCT 1.43 IPP MP3 0.41 Factor Energy (mWhr) Factor 1.0 509.553 1.0 1.7 292.516 1.7 2.4 199.123 2.6 92.1 4.461 114.2 151.4 2.7955 182.3 352.4 1.1708 435.2 1240.8 0.3133 1626.4 Runs a factor of four faster than real-time Additional energy savings are possible by using frequency and voltage scaling Tajana Simunic Rosing Summary Server Resource Management Scheduling algorithms for multi-client heterogeneous wireless embedded systems Large energy savings with little or no cost in performance Embedded HW management of NOCs Node and network-centric approaches with closed-loop control Power savings of a factor of 4 with network-centric approach, while performance penalty reduced by more than 10% Embedded software library mapping methodology Symbolic algebra method maps to pre-optimized library elements Significant productivity improvement with large energy savings Tajana Simunic Rosing Next steps Server Resource Management Combine independent and managed clients Prediction techniques for changes in the channel Better scheduling • Knowledge of the environment for help with scheduling (e.g. location) • Seamless transition to WAN when needed • Heuristics to help EDF/RM algorithms handle conflicts Embedded HW management of NOCs Integrated resource management for cores and the interconnect topology Reliability as another aspect of RM Embedded software optimization Integration of optimizations with compilers Driver & OS optimization Hardware-driven software optimization • Given specific set of hardware components, apply appropriate optimizations automatically Tajana Simunic Rosing Thank you!!! Bluetooth Supports point-to-point and point-to-multipoint (piconet) synchronous and asynchronous connections Maximum throughput for asynchronous connections is 109-723 kbps Supported low power modes: Hold, Park, Sniff and Deep sleep (vendor specific) Tran.Time (m sec) OFF STANDBY INQUIRE (UNKNOWN ADDRESS) PAGE (KNOWN ADDRESS) Active Mode 0.18 Hold Mode 0.061 Hold entry 1.68 0.068 Hold exit 11.62 0.216 Park Mode TRANSMIT DATA (DH/DM PACKET) CONNECTED Master/Slave PARK HOLD DEEP SLEEP Power (W) 0.061 Park entry 2.16 0.077 Park exit 4.12 0.126 Sniff Mode SNIFF 0.061 Sniff entry 0.94 0.078 Sniff exit 7.36 0.194 Deep Sleep 270μ Tajana Simunic Rosing WaveLAN-802.11b Designed to work in adhoc as well as infrastructure mode System states: Active (Transmit/Receive), Doze (802.11b PM) & Off IEEE standard power management Traffic indication map (TIM) after every 100ms Doze mode activation if no data present Tran. Time (ms) Power (W) Transmit - 1.4 Receive - 0.93 Doze - 0.045 –0.93 Off to Doze State Doze entry Doze exit 0.1 1.4 1 1.6 Active to Off State Off entry Off exit 1 1.7 300 2.3 Tajana Simunic Rosing Server RM versus Client PM Server has the additional knowledge of multiple clients workload & traffic conditions Efficient transmission coordination in multiple clients environment Scheduling based on the link quality Server • Client PM can predict and schedule only according to its user’s and application requirements, no knowledge of the rest of the environment decides the RM policy Enable/disable various resouce characteristics (e.g. BT park mode) Adjust resource parameters (e.g. doze interval for 802.11b PM) Switch off wireless, set duration of the OFF intervals On time wake-up (no delay) Server RM relies on the client PM Client PM controls the devices Client PM Architecture OS PM Apps Kernel Device driver traffic Hardware Tajana Simunic Rosing Client PM API operating system Power Manager resource needs Applications Kernel workload Device driver CLK speed, power states Hardware Power management of devices Active, Idle & Sleep state control Access mechanisms Buffer management Client power management • Application driven • Frequency/voltage scaling • Idle and Sleep state implementation Tajana Simunic Rosing Video Streaming Example SERVER PM SHUT-OFF, DELAY, PM ENABLE/DISABLE, SCHEDULE COMM Client device CLIENT PM TCP DECODING RATE, SNR, BUFFER SIZE MEDIA SERVER VIDEO DATA RTP/UDP MPEG4 Decoder FEEDBACK RCTP Tajana Simunic Rosing Client Power Management 2.8 2.4 2 1.6 No Policy Power (W) DPM DPM+DCS@191MHz DCS = Dynamic Processor Clock Setting DPM = Dynamic Power Management of Processor and Devices real-time MP3 and Distributed Speech Recognition with 30% power reduction using Client PM Tajana Simunic Rosing Exponential estimation Maximum likelihood estimator versus exponential average and ideal 120 Exp. Average (gain=0.03) Exp. Average (gain=0.05) Maximum likelihood Ideal 100 Rate 80 60 40 20 0 0 100 200 300 400 500 600 700 800 900 1000 Sample Number Exp 1 e et Tajana Simunic Rosing Pareto estimation Estimate Value 1 0.1 Estimate (a) Estimate(b) 0.01 0 20 40 60 80 100 Sample Number Pareto 1 b t a Estimate errors are below 10% in steady state Tajana Simunic Rosing Symbolic Library Mapping Given: Polynomial rep. of critical code sections S A characterized library of complex elements A routine for accuracy and cost feedback Goal: Decompose S into available complex library Optimize power consumption and performance Use symbolic computer algebra routines & algorithms Higher level optimizations possible Example: factorization and simplification Output: library element(s) implementing S Tajana Simunic Rosing