Download slides - shiftleft.com

Reduced Energy Decoding of MPEG Streams
Malena Mesarina, HP Labs/UCLA CS Dept
Yoshio Turner, HP Labs
Multimedia Computing and Networking, Jan 2002

System Environment
• Portable client: limited battery life
• Multimedia server: ample compute and storage
• Application: streaming of stored media, with MPEG decoding performed by the client

Problem
• Tradeoff: client energy consumption increases with media stream quality
• The user should be able to choose the operating point that balances quality against battery life
• Goal: improve the energy/QoS tradeoff by reducing the energy consumption required for each level of media quality

Approach
• Idea: exploit the ample resources of servers to extend client battery life
• The client supports a discrete set of voltages and clock frequencies
  – Lower voltage ⇒ lower speed and lower energy consumption
  – Dynamic Voltage Scaling (DVS)
• The server pre-processes the stored media offline:
  – Computes the frame decoding order
  – Assigns a voltage/frequency to each frame
  – Transmits the schedule to the client for DVS execution

Contributions
• A new DVS scheduling algorithm that
  – Minimizes CPU energy consumption
  – Satisfies timing constraints
  – Satisfies buffering constraints
• Quantification of the energy-QoS tradeoff
• Evaluation of the impact of DVS and of client design parameters (processor speed, buffering) on the energy-QoS tradeoff

Decoding Hardware Organization
[Figure: an input FIFO feeds the decoder, which writes decoded frames to the audio and video display buffers. Decoding order: I0 P1 B2 B3 P2 ... While B3 is decoded, the reference buffers hold its past (I0) and future (P1) reference frames.]

Naïve Scheduling is Bad
Task          V_hi [time/energy]   V_lo [time/energy]   Deadline   Start time
a0 (audio)    2 / 12               8 / 4                14         0
a1 (audio)    7 / 6                11 / 3               24         0
v0 (video)    2 / 10               7 / 5                10         0
v1 (video)    3 / 5                7 / 4                20         10
Naive scheduling = EDF task order + greedy voltage assignment.
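
The naive policy can be replayed on the table above with a small Python sketch. It assumes our reading of the table (a time/energy pair per voltage level, start times treated as release times, non-preemptive execution on one processor) and interprets "greedy voltage assignment" as taking, in EDF order, the lowest-energy level that still meets each task's own deadline. The exhaustive search is only a reference optimum for this four-task toy (it is exponential and is not the paper's algorithm); `TASKS`, `naive`, `energy_of`, and `optimal` are hypothetical names.

```python
from itertools import permutations, product

# Toy instance from the table: name -> (release, deadline, {level: (time, energy)})
TASKS = {
    "a0": (0, 14, {"hi": (2, 12), "lo": (8, 4)}),
    "a1": (0, 24, {"hi": (7, 6), "lo": (11, 3)}),
    "v0": (0, 10, {"hi": (2, 10), "lo": (7, 5)}),
    "v1": (10, 20, {"hi": (3, 5), "lo": (7, 4)}),
}
STREAMS = [["a0", "a1"], ["v0", "v1"]]  # fixed decoding order within each stream


def naive():
    """EDF order; each task takes the cheapest level that still meets its deadline."""
    now, energy = 0, 0
    for name in sorted(TASKS, key=lambda n: TASKS[n][1]):
        release, deadline, levels = TASKS[name]
        start = max(now, release)  # idle until the task may start
        for duration, cost in sorted(levels.values(), key=lambda tc: tc[1]):
            if start + duration <= deadline:
                now, energy = start + duration, energy + cost
                break
        else:
            return None  # no voltage level meets the deadline
    return energy


def energy_of(order, levels):
    """Non-preemptive run of `order` at the given levels; None if a deadline is missed."""
    now, energy = 0, 0
    for name, level in zip(order, levels):
        release, deadline, costs = TASKS[name]
        duration, cost = costs[level]
        now = max(now, release) + duration
        if now > deadline:
            return None
        energy += cost
    return energy


def optimal():
    """Exhaustive search over stream-order-preserving interleavings and voltage levels."""
    best = None
    for order in permutations(TASKS):
        if any(order.index(s[0]) > order.index(s[1]) for s in STREAMS):
            continue
        for levels in product(("hi", "lo"), repeat=len(order)):
            e = energy_of(order, levels)
            if e is not None and (best is None or e < best):
                best = e
    return best


print(naive(), optimal())  # 27 22
```

The two results match the totals given for the NAIVE and NEW schedules of this example: greedy choices that are locally cheap (v0 at V_lo) force expensive V_hi runs later.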
NAIVE: v0 @ V_lo [0, 7], a0 @ V_hi [7, 9], idle [9, 10], v1 @ V_lo [10, 17], a1 @ V_hi [17, 24]
  Energy: 5 + 12 + 4 + 6 = 27
NEW: a0 @ V_lo [0, 8], v0 @ V_hi [8, 10], v1 @ V_hi [10, 13], a1 @ V_lo [13, 24]
  Energy: 4 + 10 + 5 + 3 = 22
(Time axis: 0 to 24)

DVS Scheduling Algorithm
• Goal: minimize energy consumption
  – For a uniprocessor client, find the voltage-frequency setting for each frame and the interleaved order in which frames are decoded
• Subject to the following constraints:
  – Frames within a stream are decoded in a fixed order
  – Frame decoding interdependence (I-, P-, and B-frames)
  – Display rates for video (33 fps) and audio (44 kHz)
  – Audio/video synchronization: 80 ms
  – Limited client display buffer capacity

DVS Scheduling Algorithm (continued)
• Approach: dynamic programming
  – Find the energy-optimal subschedule that completes the first i video and j audio tasks by time t, over the search space (i, j, t). Report the best result over all possible t for the full media.
  – The search space is reduced by exploiting knowledge of the constraints

Main Challenges
• Frame decoding interdependencies: B-frames depend on future P-frames
  – Decoding order differs from display order
  – Construct a mapping function from frame decode number to frame display number in order to compute correct deadlines
• Limited buffer capacity
  – The algorithm must have an overflow avoidance mechanism
• Multiple voltage levels and possible frame decoding orders
  – Intractable search space; pruning is necessary

Fixed Display Buffer Capacity
• Overflow prevention: translate buffer constraints into timing constraints
• Assign minimum decoding start times to tasks
[Figure: a full display buffer with head frame I0; B5 waits to be enqueued]
Suppose the display buffer is full (it contains previously decoded frames). The earliest time to enqueue B5 (its minimum start time) is when the head frame I0 leaves the buffer to be displayed. The head frame I0 is identified using the frame display order and the buffer capacity.

Key to Tractable Execution
• Limit the number of combinations of (i, j, t)
  – Limit the range of subschedule completion times t (time windows)
  – Limit the combinations of (i, j) by detecting "dead-end" subschedules
  ⇒ A small number of (i, j) pairs, each with a small time window

Limiting Completion Times: Time Window
• A window represents the possible completion times of i video and j audio tasks
• Lower bound, Tmin(i, j): the earliest time at which the last task in both streams can complete
• Upper bound, Tmax(i, j): the latest time at which the last task in both streams can complete
• Tmax - Tmin ≈ (1/frame_rate) × buffer_size

Time Window Example
[Figure: completion-time windows [tmin[i,j], tmax[i,j]] for the i-th video frame and [tmin[i+1,j], tmax[i+1,j]] for the (i+1)-th video frame, on a common time axis]

Only Some (i, j) Subschedules Lead to a Complete Schedule
[Figure: video and audio display buffers, with the frame currently in display marked in each]
• N = number of frames, B = buffer size, Ts = 1/frame rate
• Scheduling (i, j) = (10, 14) is possible, but scheduling (i, j) = (10, 15) is not, because the audio buffer overflows
• The feasible (i, j) pairs are therefore limited by the buffer size
• Algorithm complexity: O(N·B) (i, j) pairs × O(B·Ts) time window = O(N·B²·Ts)

Performance Evaluation: Energy vs. QoS Exploration
• Variability in frame execution times
  – Potential for energy reduction?
• Energy savings vs. picture quality
  – For what range of quality is DVS helpful?
  – How much improvement is there in that range?
• Impact of client design parameters on energy vs. QoS
  – How does processor speed change the tradeoff?
  – Will extra buffering ease schedulability? Reduce energy?
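
The dynamic-programming formulation above can be illustrated with a minimal Python sketch; it is not the authors' implementation. For each (i, j) it keeps a map from completion time t to the least energy of any subschedule ending at t, assumes integer task times, and models display-buffer limits only through a per-task minimum start time (`release`), following the buffer-to-timing translation described above. The time-window bounds and dead-end pruning are omitted, and `Task` and `dvs_schedule` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    release: int   # minimum start time (could encode a display-buffer limit)
    deadline: int  # latest allowed completion time
    levels: tuple  # ((time, energy), ...) for each voltage/frequency setting


def dvs_schedule(video, audio):
    """Minimum total energy to decode both streams, or None if no feasible schedule.

    best[(i, j)] maps a completion time t to the least energy of a subschedule
    that decodes the first i video and the first j audio tasks and ends at t.
    """
    best = {(0, 0): {0: 0}}
    for i in range(len(video) + 1):
        for j in range(len(audio) + 1):
            for t, energy in best.get((i, j), {}).items():
                # Extend by the next video task (i + 1, j) or the next audio task (i, j + 1).
                for stream, k, nxt in ((video, i, (i + 1, j)), (audio, j, (i, j + 1))):
                    if k == len(stream):
                        continue
                    task = stream[k]
                    start = max(t, task.release)
                    for duration, cost in task.levels:
                        finish = start + duration
                        if finish > task.deadline:
                            continue  # violates the timing constraint: prune
                        slot = best.setdefault(nxt, {})
                        if energy + cost < slot.get(finish, float("inf")):
                            slot[finish] = energy + cost
    final = best.get((len(video), len(audio)), {})
    return min(final.values()) if final else None


# Naive-scheduling example: levels are ((V_hi time, energy), (V_lo time, energy)).
video = [Task(0, 10, ((2, 10), (7, 5))), Task(10, 20, ((3, 5), (7, 4)))]
audio = [Task(0, 14, ((2, 12), (8, 4))), Task(0, 24, ((7, 6), (11, 3)))]
print(dvs_schedule(video, audio))  # 22
```

Processing (i, j) in lexicographic order works because every transition moves to (i + 1, j) or (i, j + 1), so each state is complete before it is expanded. On the naive-scheduling example the sketch returns 22, the energy of the NEW schedule; the number of stored t values per (i, j) is what the time windows above are meant to bound.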

Frame Execution Times
[Figure: per-frame decoding time (µs, 0 to 60,000) vs. frame number (0 to 1000)]

Energy-QoS Tradeoff: Fast Processor + Fixed Buffer Size
• Pentium 3 (1.9 V @ 500 MHz, 1.4 V @ 316 MHz)
• Display buffers: 2 for video, 2 for audio
• Scale factor = frame pixels / max frame pixels
[Figure, left: energy vs. scale factor (0 to 1) for DVS, fixed high voltage, and fixed low voltage. Right: energy improvement (%) vs. scale factor (0.7 to 1), annotated 47% and 19%.]
Energy improvement over the range of high-resolution scale factors, 0.6 to 1

Energy-QoS Tradeoff: Slow Processor + Variable Buffer Size
• Pentium 2 (1.7 V @ 300 MHz, 1.4 V @ 225 MHz)
• Variable buffering (video, audio): (1, 1), (3, 3), (6, 3)
• Increasing buffering does not improve energy significantly
• Extra buffers enable decoding of higher-QoS video

Summary and Conclusions
• The offline algorithm finds a low-energy schedule that respects:
  – Timing constraints (display rate, synchronization)
  – Limited memory at the client
• DVS significantly reduces energy consumption
• Increasing the buffer size
  – Has no significant impact on energy, but
  – Enables higher video quality

Future Work
• Online scheduling
  – The offline schedule represents a lower bound on energy
• Exploration of other tradeoff media parameters (frame rate, display brightness)
• Implementation with progressive coding schemes (JPEG2000)

Experimental Setup
• Fixed voltage/frequency processors: P3 and P2
• Computed time/energy per frame at a fixed voltage
• Extrapolated time/energy per frame at other operational core voltages
• Assumptions:
  – Frequency is inversely proportional to gate delay
  – Cycles/frame remain constant across frequencies
  – Power dissipation is constant for a given voltage setting

Extrapolation Example
• Given: Vhi, Fhi, Thi, Phi (measured at the high voltage), with gate delay τ(V) ∝ V/(V - Vt)²
$$F_{lo} = F_{hi}\,\frac{\tau_{hi}}{\tau_{lo}} = F_{hi}\,\frac{V_{hi}/(V_{hi}-V_t)^2}{V_{lo}/(V_{lo}-V_t)^2} \qquad (1)$$
$$T_{lo} = \frac{\text{cycles}}{F_{lo}} = \frac{F_{hi}\,T_{hi}}{F_{lo}} \qquad (2)$$
$$P_{lo} = P_{hi}\,\frac{F_{lo}\,V_{lo}^2}{F_{hi}\,V_{hi}^2}, \qquad E_{lo} = P_{lo}\,T_{lo} \qquad (3)$$

Related Work
• Problem we address
  – Real-time scheduling of non-preemptable tasks with precedence constraints
• Other real-time schedulers treat different cases:
  – [1] Liu and Layland, "Scheduling algorithms for multiprogramming in a hard-real-time environment"
  – [2] Yao et al., "A scheduling model for reduced CPU energy"
    • Preemptable tasks, no precedence constraints
  – [3] Hong et al., "Power optimization of variable voltage core-based systems"
    • Heuristics for non-preemptable tasks, but no precedence constraints

Frame Interdependence
• Map frame number i in decoding order to frame number d(i) in display order
  Decode order:  I0 P1 B2 B3 P4 B5 B6
  Display order: I0 B2 B3 P1 B5 B6 P4
$$d(i) = \begin{cases} i - 1 & \text{if frame } i \text{ is a B-frame} \\ i + m(i) & \text{if frame } i \text{ is a P- or I-frame} \end{cases}$$
  where m(i) is the number of B-frames that follow frame i in decoding order (e.g., m(1) = 2 B-frames after P1)
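
The extrapolation steps can be sketched in Python under the stated assumptions plus the gate-delay model τ ∝ V/(V - Vt)². The threshold voltage `v_t`, the frame time `t_hi`, and the power `p_hi` used below are illustrative values we assume, not figures from the paper, and `extrapolate` is a hypothetical name.

```python
import math


def extrapolate(v_hi, f_hi, t_hi, p_hi, v_lo, v_t):
    """Scale (frequency, time, power, energy) measured at v_hi down to v_lo.

    Assumptions: gate delay ~ V / (V - Vt)^2, frequency ~ 1 / gate delay,
    constant cycles per frame, and power ~ F * V^2.
    """
    def gate_delay(v):
        return v / (v - v_t) ** 2

    f_lo = f_hi * gate_delay(v_hi) / gate_delay(v_lo)   # eq. (1)
    t_lo = f_hi * t_hi / f_lo                           # eq. (2): cycles / F_lo
    p_lo = p_hi * (f_lo * v_lo ** 2) / (f_hi * v_hi ** 2)
    e_lo = p_lo * t_lo                                  # eq. (3)
    return f_lo, t_lo, p_lo, e_lo


# Illustrative inputs: P3 voltages and frequency from the slides; v_t, t_hi, p_hi are assumed.
f_lo, t_lo, p_lo, e_lo = extrapolate(v_hi=1.9, f_hi=500e6, t_hi=0.02, p_hi=10.0, v_lo=1.4, v_t=0.3)
print(f"{f_lo / 1e6:.0f} MHz, {t_lo * 1e3:.1f} ms/frame, {e_lo:.3f} J/frame")
# Energy per frame scales exactly as (V_lo / V_hi)^2, independent of v_t.
assert math.isclose(e_lo, 10.0 * 0.02 * (1.4 / 1.9) ** 2)
```

Because the cycle count is fixed and power scales as F·V², the frequency terms cancel in the energy: E_lo = E_hi·(V_lo/V_hi)². Under these assumptions the threshold voltage affects only the extrapolated decoding time (and hence schedulability), not the energy per frame.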

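The decode-to-display mapping d(i) can be sketched in Python as follows, assuming every run of B-frames immediately follows its future reference frame in decoding order (the I P B B P B B pattern on the Frame Interdependence slide). `display_positions` is a hypothetical helper; a mapping of this kind is what lets the scheduler attach display deadlines to frames that are listed in decoding order.

```python
def display_positions(types):
    """Map decode index i to display index d(i) for a decode-order string like "IPBBPBB".

    B-frame: d(i) = i - 1.  I/P-frame: d(i) = i + m(i), where m(i) counts the
    B-frames that immediately follow frame i in decoding order.
    """
    d = []
    for i, kind in enumerate(types):
        if kind == "B":
            d.append(i - 1)
        else:
            m = 0
            while i + 1 + m < len(types) and types[i + 1 + m] == "B":
                m += 1
            d.append(i + m)
    return d


decode_order = ["I0", "P1", "B2", "B3", "P4", "B5", "B6"]
d = display_positions("".join(name[0] for name in decode_order))
display_order = [None] * len(d)
for name, pos in zip(decode_order, d):
    display_order[pos] = name  # place each decoded frame at its display slot
print(d)              # [0, 3, 1, 2, 6, 4, 5]
print(display_order)  # ['I0', 'B2', 'B3', 'P1', 'B5', 'B6', 'P4']
```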