Reduced Energy Decoding of
MPEG Streams
Malena Mesarina, HP Labs/UCLA CS Dept
Yoshio Turner, HP Labs
Multimedia Computing and
Networking Jan 2002
1
System Environment
• Portable client – limited battery life
• Multimedia server – ample compute/storage
• Application – stored media streaming with
MPEG decoding performed by the client
2
Problem
• Tradeoff – client energy consumption
increases with media stream quality
• User should be able to choose the operating
point to balance quality and battery life
• Goal: improve the energy/QoS tradeoff by
reducing the energy consumption required for
each level of media quality
3
Approach
• Idea: exploit the ample resources of servers to
improve client battery life
• Client supports a discrete set of voltages and clock
frequencies
– Lower voltage means lower speed and lower energy consumption
– Dynamic Voltage Scaling – DVS
• Server pre-processes (offline) stored media
– Computes frame decoding order
– Assigns voltage/frequency per frame
– Transmits schedule to client for DVS execution
4
Contributions
• New DVS scheduling algorithm
– Minimizes CPU energy consumption
– Satisfies timing constraints
– Satisfies buffering constraints
• Quantification of the energy-QoS tradeoff
• Evaluation of the impact of DVS and client
design parameters (processor speed, buffering)
on the energy-QoS tradeoff
5
Decoding Hardware Organization
[Figure: the input FIFO feeds the decoder, which writes to the audio display buffer and the video display buffer. Decoding order: I0 P1 B2 B3 P2 ... While B3 is decoded, the reference buffers hold I0 (past) and P1 (future).]
6
Naïve Scheduling is Bad
Task parameters, given as [Time/Energy] for each voltage:

Audio        a0        a1
V_hi         2 / 12    7 / 6
V_lo         8 / 4     11 / 3
Deadline     14        24
Start time   0         0

Video        v0        v1
V_hi         2 / 10    3 / 5
V_lo         7 / 5     7 / 4
Deadline     10        20
Start time   0         10

Naive scheduling = EDF task order + greedy voltage assignment.
NAIVE: v0 (energy 5), a0 (energy 12), idle, v1 (energy 4), a1 (energy 6)
Energy: 5 + 12 + 4 + 6 = 27
NEW: a0 (energy 4), v0 (energy 10), v1 (energy 5), a1 (energy 3)
Energy: 4 + 10 + 5 + 3 = 22
[Figure: both schedules drawn on a timeline from 0 to 24.]
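The two schedules on this slide can be checked mechanically. The following is a minimal Python sketch, not the authors' implementation: the `TASKS` table transcribes the slide's [Time/Energy] values, while the `run_schedule` helper and its data layout are assumptions made for illustration. It replays a task order with a per-task voltage choice on one processor, enforces minimum start times and deadlines, and sums the energy.

```python
# Task parameters transcribed from the slide: (time, energy) per voltage.
TASKS = {
    "a0": {"hi": (2, 12), "lo": (8, 4), "deadline": 14, "start": 0},
    "a1": {"hi": (7, 6), "lo": (11, 3), "deadline": 24, "start": 0},
    "v0": {"hi": (2, 10), "lo": (7, 5), "deadline": 10, "start": 0},
    "v1": {"hi": (3, 5), "lo": (7, 4), "deadline": 20, "start": 10},
}


def run_schedule(schedule):
    """Replay (task, voltage) pairs back to back on one processor.

    Returns the total energy, or None if any task misses its deadline.
    """
    now = 0
    energy = 0
    for name, voltage in schedule:
        task = TASKS[name]
        duration, cost = task[voltage]
        # A task cannot begin before its minimum start time (idle if needed).
        now = max(now, task["start"]) + duration
        if now > task["deadline"]:
            return None
        energy += cost
    return energy


# Hypothetical encodings of the slide's two schedules.
NAIVE = [("v0", "lo"), ("a0", "hi"), ("v1", "lo"), ("a1", "hi")]
NEW = [("a0", "lo"), ("v0", "hi"), ("v1", "hi"), ("a1", "lo")]

print(run_schedule(NAIVE))  # 27
print(run_schedule(NEW))    # 22
```

In the naive schedule, running v0 at low voltage first forces a0 and a1 to high voltage to meet their deadlines. The new order leaves enough slack for both audio tasks to run at low voltage.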
7
DVS Scheduling Algorithm
• Goal: minimize energy consumption
– For a uni-processor client find voltage-frequency
settings per frame and interleaved order of decoding
frames
• Subject to the following constraints
– Frames within a stream are in a fixed decoding order
– Frame decode interdependence (I-, P-, B-frames)
– Display rates for video (33 fps) and audio (44 kHz)
– Audio/Video synchronization: 80 ms
– Limited client display buffer capacity
8
DVS Scheduling Algorithm
(continued)
• Approach: dynamic programming
– Find the energy optimal subschedule that completes
the first i video and j audio tasks by time t, over
search space (i,j,t). Report the best results over all
possible t for the full media.
– Search space is reduced by exploiting our
knowledge of the constraints
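The (i, j, t) search can be sketched on a toy instance. This is a simplified illustration under stated assumptions, not the authors' algorithm: it handles only deadlines and minimum start times (no buffering, synchronization, or time-window pruning), uses integer time, and the names `min_energy`, `VIDEO`, and `AUDIO` are hypothetical. The task values are the ones from the "Naïve Scheduling is Bad" slide.

```python
INF = float("inf")

# Toy streams in fixed decoding order; "modes" lists (time, energy) options.
VIDEO = [
    {"modes": [(2, 10), (7, 5)], "deadline": 10, "start": 0},
    {"modes": [(3, 5), (7, 4)], "deadline": 20, "start": 10},
]
AUDIO = [
    {"modes": [(2, 12), (8, 4)], "deadline": 14, "start": 0},
    {"modes": [(7, 6), (11, 3)], "deadline": 24, "start": 0},
]


def min_energy(video, audio):
    """Minimum energy to decode all tasks, or None if nothing is feasible."""
    # best[(i, j)][t]: least energy of a subschedule that finishes the first
    # i video and j audio tasks exactly at time t.
    best = {(0, 0): {0: 0}}
    for i in range(len(video) + 1):
        for j in range(len(audio) + 1):
            states = best.get((i, j), {})
            # Extend by the next video task or the next audio task.
            successors = []
            if i < len(video):
                successors.append(((i + 1, j), video[i]))
            if j < len(audio):
                successors.append(((i, j + 1), audio[j]))
            for key, task in successors:
                target = best.setdefault(key, {})
                for t, energy in states.items():
                    for duration, cost in task["modes"]:
                        finish = max(t, task["start"]) + duration
                        if finish > task["deadline"]:
                            continue  # dead end: deadline missed
                        if energy + cost < target.get(finish, INF):
                            target[finish] = energy + cost
    # Report the best result over all completion times t.
    final = best.get((len(video), len(audio)), {})
    return min(final.values()) if final else None


print(min_energy(VIDEO, AUDIO))  # 22
```

Keeping one energy value per (i, j, t) is enough because the feasibility of the remaining tasks depends only on how many tasks of each stream are done and when the processor becomes free.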
9
Main Challenges
• Frame decoding inter-dependencies: B-frames
depend on future P-frames
– Decoding order not equal to display order
– Construct a mapping function from frame decode
number to frame display number in order to compute
correct deadlines
• Limited buffer capacity
– Algorithm must have overflow avoidance mechanism
• Multiple voltage levels and possible frame
decoding orders
– Intractable search space, pruning necessary
10
Fixed Display Buffer Capacity
• Overflow prevention: Translate buffer
constraints to timing constraints
• Assign minimum decoding start times to tasks
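[Figure: display buffer holding previously decoded frames in display order (I0 at the head, then B2, B3, P1), with B5 waiting to be enqueued.]
Suppose the display buffer is full (it contains previously decoded frames).
The earliest time to enqueue B5 (its minimum start time) is when the head frame I0 leaves the buffer to be displayed.
The head frame I0 is identified using the frame display order and the buffer capacity.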
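One way to read this rule: a frame with display number d may be enqueued once the frame with display number d minus the buffer capacity has left the buffer. The sketch below encodes that reading under loud assumptions: frames are displayed at a fixed period starting at `first_display`, the rule is applied uniformly to every frame, and reference-buffer handling is ignored. The function name and parameters are hypothetical, not taken from the slides.

```python
def min_start_times(display_indices, capacity, period, first_display):
    """Minimum decode start time per frame, listed in decode order.

    display_indices[i] is the display position d(i) of decode position i.
    Assumed model: the frame at display position k leaves the display
    buffer at time first_display + k * period.
    """
    starts = []
    for d in display_indices:
        if d < capacity:
            # The buffer cannot be full yet, so there is no constraint.
            starts.append(0)
        else:
            # Wait until the head frame (display position d - capacity)
            # leaves the buffer to be displayed.
            starts.append(first_display + (d - capacity) * period)
    return starts


# Decode order I0 P1 B2 B3 P4 B5 B6 with a 4-frame buffer.
# Time units are arbitrary (assumed: period 40, first display at 100).
print(min_start_times([0, 3, 1, 2, 6, 4, 5], 4, 40, 100))
# [0, 0, 0, 0, 180, 100, 140]
```

With these assumed numbers, B5 (decode position 5, display position 4) gets minimum start time 100, which is the moment I0 leaves the buffer, matching the slide's example.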
11
Key to Tractable Execution
• Limit the number of combinations of (i, j, t)
– Limit the range of subschedule completion times t
(time windows)
– Limit the combinations of (i,j) by detecting
“dead-end” subschedules
• Result: a small number of (i,j) pairs, each with a small time
window
12
Limiting Completion Times:
Time Window
• A window represents possible completion
times of i video and j audio tasks.
• Lower Bound, Tmin(i,j): earliest time when
the last task in both streams can complete
• Upper Bound, Tmax(i,j): latest time when the
last task in both streams can complete
• Tmax – Tmin ~ (1/frame_rate) * buffersize
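As a rough worked example of the window size, take the 33 fps video display rate from the constraints slide and a display buffer of 2 frames (the buffer size used in the fast-processor evaluation). Pairing these two values here is only an illustration:

```latex
T_{\max} - T_{\min} \;\sim\; \frac{1}{\text{frame\_rate}} \times \text{buffersize}
  = \frac{1}{33\ \text{s}^{-1}} \times 2 \approx 60.6\ \text{ms}
```

So each (i,j) pair only needs completion times within a window of roughly two frame periods, rather than over the whole stream duration.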
13
Time Window Example
(i + 1)-th
video frame
i-th video
frame
tmin[i,j]
tmin[i+1,j]
tmax[i,j]
tmax[i+1,j]
Time
14
Only some (i,j) subschedules lead to
complete schedule
[Figure: video and audio display buffers, marking the video frame and the audio frame currently in display (labels shown: video 11, 10, 12, 13; audio 14, 10).]
N = #frames
B = buffer size
Ts = 1/frame rate
Scheduling (i,j) = (10,14) is POSSIBLE, but
scheduling (i,j) = (10,15) is NOT POSSIBLE because the AUDIO BUFFER
OVERFLOWS!
(i,j) is limited by the buffer size
Algorithm Complexity: O(N * B) × O(B * Ts) = O(N * B^2 * Ts)
15
Performance Evaluation:
Energy vs QoS Exploration
• Variability in frame execution times
– Potential for energy reduction?
• Energy savings vs picture quality
– For what range of quality is DVS helpful ?
– How much improvement is in that range ?
• Impact of client design parameters on energy vs
QoS
– How does processor speed change tradeoff ?
– Will extra buffering ease schedulability? Reduce energy?
16
Frame Execution Times
[Figure: per-frame execution time in us (y axis, 0 to 60000) versus frame number (x axis, 0 to 1000).]
17
Energy-QoS tradeoff: Fast
Processor + Fixed Buffer Size
• Pentium 3 (1.9V@500MHz, 1.4V@316 MHz)
• Display buffers: 2 for video, 2 for audio
• Scale factor = frame pixels/max frame pixels
[Figure, first plot: curves "dvs", "hi volt" and "lo volt" versus scale factor (0 to 1), y axis 0 to 17000. Second plot: values versus scale factor (0.7 to 1), y axis 0 to 50, annotated with 47% and 19%.]
Energy improvement over range of high
resolution scale factors, 0.6 to 1
18
Energy-QoS tradeoff: Slow
Processor + Variable Buffer Size
• Pentium 2: (1.7V@300MHz, 1.4V@225MHz)
• Variable buffering: (video, audio) = (1,1), (3,3), (6,3)
• Increasing buffering does not improve energy
significantly
• Extra buffers enable decoding of higher QoS video
19
Summary and Conclusions
• Offline algorithm finds a low energy schedule that
respects:
– Timing constraints (display rate, synchronization)
– Limited memory at client
• DVS significantly reduces energy consumption
• Increasing buffer size
– No impact on energy but
– Enables higher video quality
20
Future Work
• Online scheduling
– Offline schedule represents lower bound on energy
• Exploration of other tradeoff media parameters
(frame rate, display brightness)
• Implementation with progressive coding
schemes (JPEG2000)
21
Experimental Setup
• Fixed voltage/frequency processors: P3 and P2
• Computed time/energy per frame at fixed voltage
• Extrapolated time/energy per frame at other operational
core voltages
• Assumptions:
– Frequency is inversely proportional to gate delay
– Cycles/frame remains constant for different frequencies
– Power dissipation constant for a given voltage setting
22
Extrapolation Example
• Given: Vhi, Fhi, , Thi, Phi
– Flo = Fhi * τhi/τlo = Fhi * [Vhi/(Vhi - Vt)^2] / [Vlo/(Vlo - Vt)^2]   (1)
– Tlo = cycles/Flo = Fhi * Thi/Flo   (2)
– Plo = Phi * (Flo * Vlo^2)/(Fhi * Vhi^2)
– Elo = Plo * Tlo   (3)
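The extrapolation steps can be written as a short function. This is a sketch under assumptions: it takes the gate delay to be proportional to V/(V - Vt)^2, which is the usual reading of equation (1) but is partly reconstructed from a garbled slide, and the threshold voltage `v_t` and all numeric inputs below are made-up illustration values, not measurements from the talk.

```python
def extrapolate(v_hi, f_hi, t_hi, p_hi, v_lo, v_t):
    """Scale frequency, time, power, and energy measured at v_hi to v_lo.

    Assumes frequency is inversely proportional to gate delay and that
    cycles per frame do not change with frequency.
    """
    # Gate delay up to a constant factor (assumed model): V / (V - Vt)^2.
    tau_hi = v_hi / (v_hi - v_t) ** 2
    tau_lo = v_lo / (v_lo - v_t) ** 2
    f_lo = f_hi * tau_hi / tau_lo                          # eq. (1)
    t_lo = f_hi * t_hi / f_lo                              # eq. (2)
    p_lo = p_hi * (f_lo * v_lo ** 2) / (f_hi * v_hi ** 2)  # power scaling
    e_lo = p_lo * t_lo                                     # eq. (3)
    return f_lo, t_lo, p_lo, e_lo


# Illustrative inputs only: 2.0 V at 300 MHz, 10 ms per frame, 8 W,
# extrapolated to 1.5 V with an assumed threshold voltage of 1.0 V.
f_lo, t_lo, p_lo, e_lo = extrapolate(2.0, 300.0, 10.0, 8.0, 1.5, 1.0)
print(f_lo, t_lo, p_lo, e_lo)
```

With these inputs the frame takes three times longer at the low voltage but the energy per frame drops from 8 * 10 = 80 to 45 (in W·ms), which is the effect DVS exploits.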
23
Related Work
• Problem we address
– Real-time scheduling of non-preemptable tasks with
precedence constraints
• Other real-time schedulers treat different cases
– [1] Liu and Layland, “Scheduling algorithms for
multiprogramming in a hard-real-time environment”
– [2] Yao et al. “A scheduling model for reduced CPU
energy”
• No precedence constraints and preemptable tasks
– [3] Hong et al. “Power optimization of variable voltage
core-based systems”
• Heuristics for non-preemptable tasks but no precedence
constraints
24
Frame Interdependence
• Map frame number i in decoding order to
frame number d(i) in display order
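decode order (i = 0 to 6): I0 P1 B2 B3 P4 B5 B6
display order: I0 B2 B3 P1 B5 B6 P4
d(i) = i - 1 if frame i is a B frame
d(i) = i + m(i) if frame i is a P or I frame
m(i) = number of B frames that follow frame i in decode order, e.g. m(1) = 2 B frames after P1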
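The mapping d(i) can be sketched in a few lines of Python. The function name and the string encoding of frame types are assumptions for illustration, and m(i) is taken to be the number of B frames that immediately follow frame i in decode order.

```python
def display_indices(frame_types):
    """Map each decode position i to its display position d(i).

    frame_types is a string of 'I', 'P' and 'B' in decode order.
    """
    result = []
    for i, kind in enumerate(frame_types):
        if kind == "B":
            # A B frame is displayed one position earlier than decoded.
            result.append(i - 1)
        else:
            # m(i): count the B frames directly after this I or P frame.
            m = 0
            while i + 1 + m < len(frame_types) and frame_types[i + 1 + m] == "B":
                m += 1
            result.append(i + m)
    return result


# Decode order I0 P1 B2 B3 P4 B5 B6 from the slide.
print(display_indices("IPBBPBB"))  # [0, 3, 1, 2, 6, 4, 5]
```

Reading the output back: P1 (decode position 1) is displayed at position 3, after B2 and B3, which matches the display order I0 B2 B3 P1 on the slide.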
25