Download elg7186_15power - School of Electrical Engineering and

Document related concepts

Opto-isolator wikipedia , lookup

Public address system wikipedia , lookup

Islanding wikipedia , lookup

Buck converter wikipedia , lookup

Stray voltage wikipedia , lookup

Power engineering wikipedia , lookup

Surge protector wikipedia , lookup

Distributed generation wikipedia , lookup

Switched-mode power supply wikipedia , lookup

Distribution management system wikipedia , lookup

History of electric power transmission wikipedia , lookup

Fault tolerance wikipedia , lookup

Voltage optimisation wikipedia , lookup

Alternating current wikipedia , lookup

Mains electricity wikipedia , lookup

Immunity-aware programming wikipedia , lookup

Embedded system wikipedia , lookup

Transcript
Hardware/Software
Codesign
of Embedded Systems
Voicu Groza
School of Information Technology and Engineering
[email protected]
Power/Voltage
Management
Embedded Systems
• Power/Energy Aware Embedded Systems
• Dynamic Voltage Scheduling
• Dynamic Power Management
Surpassed hot
(kitchen) plate …?
Why not use it?
University of Ottawa, SITE, 2008
http://www.phys.ncku.edu.tw/~htsu/humor/fry_egg.html
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Processing units
Need for efficiency (power + energy):
Why worry about
energy and power?
„Power is considered as the most important constraint
in embedded systems“
[in: L. Eggermont (ed): Embedded Systems Roadmap 2002, STW]
Current smart phones can hardly be operated for more
than an hour, if data is being transmitted.
[from a report of the Financial Times, Germany, on an analysis by Credit Suisse First Boston;
http://www.ftd.de/tm/tk/9580232.html?nv=se]
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
The energy/flexibility conflict
- Intrinsic Power Efficiency Operations/Watt
[MOPS/mW]
Ambient Intelligence
10
DSP-ASIPs
1
µPs
0.1
poor design
techniques
0.01
1.0µ
0.5µ
0.25µ
0.13µ
Necessary to optimize HW/SW;
otherwise the prize for software flexibility
cannot be paid!
University of Ottawa, SITE, 2008
0.07µ
Technology
[H. de Man, Keynote, DATE‘02;
T. Claasen, ISSCC99]
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Power and energy are related
to each other
P
E   P dt
E'
E
t
In many cases, faster execution also means less energy,
but the opposite may be true if power has to be increased
to allow faster execution.
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Low Power vs. Low Energy
Consumption
Minimizing the power consumption is important for
the design of the power supply
the design of voltage regulators
the dimensioning of interconnect
short term cooling
Minimizing the energy consumption is important due to
restricted availability of energy (mobile systems)
limited battery capacities (only slowly improving)
very high costs of energy (solar panels, in space)
cooling
high costs
limited space
dependability
long lifetimes, low temperatures
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Application Specific Circuits (ASICS)
or Full Custom Circuits
Custom-designed circuits necessary
if ultimate speed or
energy efficiency is the goal and
large numbers can be sold.
Approach suffers from
long design times,
lack of flexibility
(changing standards) and
high costs
(e.g. Mill. $ mask costs).
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Mask cost for specialized HW
becomes very expensive
Trend
towards
implementation
in Software
[http://www.molecularimprints.com/Technology/
tech_articles/MII_COO_NIST_2001.PDF9]
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Power Consumption of a Gate
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Fundamentals of dynamic voltage
scaling (DVS)
Power consumption of CMOS
circuits (ignoring leakage):
P   CL Vdd2 f
Delay for CMOS circuits:
 : switching activity
Vdd
  k CL
Vdd  Vt 2
CL : load capacitanc e
Vt : threshhold voltage
Vdd : supply vol tage
(Vt  Vdd )
f : clock frequency
 Decreasing Vdd reduces P quadratically,
while the run-time of algorithms is only linearly increased
(ignoring the effects of the memory system).
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Potential for Energy
Optimization
2
P   CL Vdd
f
E   CL V
2
dd
f t   CL V
2
dd
# Cycles
Saving Energy under given Time Constraints:
– Reduce the supply voltage Vdd
– Reduce switching activity α
– Reduce the load capacitance CL
– Reduce the number of cycles #Cycles
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Processors
At the chip level, embedded chips include micro-controllers
and microprocessors. Micro-controllers are the true
workhorses of the embedded family. They are the original
’embedded chips’ and include those first employed as
controllers in elevators and thermostats [Ryan, 1995].
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Voltage Scaling and Power Management
Dynamic Voltage Scaling
Energy / Cycle [nJ]
Vdd
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Power density continues to get worse
Nuclear reactor
Prescott: 90 W/cm²,
90 nm [c‘t 4/2004]
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Need to consider CPU & System Power
Mobile PC
Thermal Design (TDP) System Power
Other
13%
Other
13%
600/500 MHz uP
37%
Power Supply
10%
600/500 MHz uP
13%
Power Supply
10%
Memory+Graphics
12%
HDD
9%
Mobile PC
Average System Power
LCD 10"
30%
Memory+Graphics
15%
LCD 10"
19%
HDD
19%
Note: Based on Actual Measurements
CPU Dominates Thermal
Design Power
[Courtesy: N. Dutt; Source: V. Tiwari]
University of Ottawa, SITE, 2008
Multiple Platform
Components Comprise
Average Power
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
New ideas can actually reduce
energy consumption
Pentium
Crusoe
Running the same multimedia application.
As published by Transmeta [www.transmeta.com]
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Dynamic power management (DPM)
Example: STRONGARM SA1100
RUN: operational
IDLE: a sw routine may
stop the CPU when not
in use, while monitoring
interrupts
SLEEP: Shutdown of onchip activity
400mW
RUN
10µs
160ms
10µs
90µs
IDLE Power fault SLEEP
signal
50mW
University of Ottawa, SITE, 2008
160µW
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Variable-voltage/frequency
example: INTEL Xscale
From Intel’s Web Site
OS should
schedule
distribution
of the
energy
budget.
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Key requirement #2:
Code-size efficiency
CISC machines: RISC machines designed for run-time-,
not for code-size-efficiency
Compression techniques: key idea
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Code-size efficiency
001 10
major
opcode
Rd
Constant
16-bit Thumb instr.
ADD Rd #constant
source=
minor
opcode destination
1110 001 01001
0 Rd
zero extended
0 Rd 0000 Constant
• Reduction to 65-70 % of original code size
• 130% of ARM performance with 8/16 bit memory
• 85% of ARM performance with 32-bit memory
Same approach for LSI TinyRisc, …
Requires support by compiler, assembler etc.
University of Ottawa, SITE, 2008
Dynamically
decoded at
run-time
Compression techniques (continued):
2nd instruction set, e.g. ARM Thumb instruction set:
[ARM, R. Gupta]
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Dictionary approach, two level control store
(indirect addressing of instructions)
“Dictionary-based coding schemes cover a wide range of
various coders and compressors.
Their common feature is that the methods use some kind of a
dictionary that contains parts of the input sequence which
frequently appear.
The encoded sequence in turn contains references to the
dictionary elements rather than containing these over and
over.”
[Á. Beszédes et al.: Survey of Code size Reduction Methods, Survey of Code-Size
Reduction Methods, ACM Computing Surveys, Vol. 35, Sept. 2003, pp 223-267]
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Key idea (for d bit instructions)
b
instruction
address
S
For each
instruction
address, S
a contains table
address of
instruction.
b « d bit
c≦
2b
table of used instructions
(“dictionary”)
d bit
CPU
University of Ottawa, SITE, 2008
Uncompressed storage of
a d-bit-wide instructions
requires axd bits.
In compressed code, each
instruction pattern is
stored only once.
small
Hopefully, axb+cxd < axd.
Called nanoprogramming
in the Motorola 68000.
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Key requirement #3: Run-time efficiency
- Domain-oriented
architectures n-1
Application: y[j] = i=0 x[j-i]*a[i]
i: 0i  n-1: yi[j] = yi-1[j] + x[j-i]*a[i]
Architecture: Example: Data path ADSP210x
P
a
D x
AX
Addressregisters
A0, A1, A2
..
i+1, j-i+1
Address
generation
unit (AGU)
x[j-i]
AY
MX
AF
+,-,..
AR
University of Ottawa, SITE, 2008
MY
a[i]
MF
* x[j-i]*a[i]
+,yi-1[j]
MR
Application
maps nicely
onto
architecture
MR:=0; A1:=1; A2:=n-2;
MX:=x[n-1]; MY:=a[0];
for ( j:=1 to n)
{MR:=MR+MX*MY;
MY:=a[A1]; MX:=x[A2];
A1++; A2--}
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Modulo addressing
Modulo addressing:
Am++  Am:=(Am+1) mod n
(implements ring or circular
buffer in memory)
sliding window
x
t1
n most
recent
values
..
x[t1-1]
x[t1]
x[t1-n+1]
x[t1-n+2]
..
..
x[t1-1]
x[t1]
x[t1+1]
x[t1-n+2]
..
Memory, t=t1
University of Ottawa, SITE, 2008
t
Memory, t2=t1+1
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Saturating arithmetic
 Returns largest/smallest number in case of over/underflows
 Example:
a
0111
b
+
1001
standard wrap around arithmetic
(1)0000
saturating arithmetic
1111
(a+b)/2: correct
1000
wrap around arithmetic
0000
saturating arithmetic + shifted
0111 „almost correct“
 Appropriate for DSP/multimedia applications:
• No timeliness of results if interrupts are generated for overflows
• Precise values less important
• Wrap around arithmetic would be worse.
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Fixed-point arithmetic
Shifting required after multiplications and divisions in
order to maintain binary point.
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Properties of fixed-point
arithmetic
Automatic scaling a key advantage for multiplications.
Example:
x= 0.5 x 0.125 + 0.25 x 0.125 = 0.0625 + 0.03125 = 0.09375
For iwl=1 and fwl=3 decimal digits, the less significant digits are
automatically chopped off: x = 0.093
Like a floating point system with numbers  [0..1),
with no stored exponent (bits used to increase precision).
Appropriate for DSP/multimedia applications
(well-known value ranges).
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Spatial vs. Dynamic
Supply Voltage Management
Slow
Module
1.3V 50MHz
Standard
Modules
1.8V
100MHz
Normal Mode
1.3 V
50MHz
Busy
Module
Busy Mode
3.3V 200MHz
3.3 V
200MHz
Analogy of biological blood systems:
• Different supply to different regions
Not• all
components
require
Required
High
pressure: High
pulse count and High
activity performance
same
performance.
may
change over time
• Low
pressure: Low pulse count and Low
activity
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Example: Processor with 3 voltages
Case a): Complete task ASAP
Task that needs to execute 109 cycles within 25 seconds.
Ea= 109 x 40 x 10-9
= 40 [J]
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Case b): Two voltages
Eb= 750 106 x 40 x 10-9
+ 250 106 x 10 x 10-9
= 32.5 [J]
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Case c): Optimal voltage
Ec = 109 x 25 x 10-9
= 25 [J]
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Observations
 A minimum energy consumption is achieved for the
ideal supply voltage of 4 Volts.
In the following: variable voltage processor =
processor that allows any supply voltage up to a certain
maximum.
It is expensive to support truly variable voltages, and
therefore, actual processors support only a few fixed
voltages.
Ishihara, Yasuura: “Voltage scheduling problem for dynamically
variable voltage processors”, Proc. of the 1998 International
Symposium on Low Power Electronics and Design (ISLPED’98)
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Generalization
Lemma [Ishihara, Yasuura]:
• If a variable voltage processor completes a task before the
deadline, then the energy consumption can be reduced.
• If a processor uses a single supply voltage V and completes
a task T just at its deadline, then V is the unique supply voltage
which minimizes the energy consumption of T.
• If a processor can only use a number of discrete voltage
levels, then a voltage schedule with at most two voltages
minimizes the energy consumption under any time constraint.
• If a processor can only use a number of discrete voltage
levels, then the two voltages which minimize the energy
consumption are the two immediate neighbors of the ideal
voltage Videal possible for a variable voltage processor.
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
The case of multiple tasks:
Assigning optimum voltages to a set of tasks
N : the number of tasks
ECj : the number of execution cycles of task j
L : the number of voltages of the target processor
Vi : the ith voltage, with 1  i  L
Fi : the clock frequency for supply voltage Vi
T : the global deadline at which all tasks must have been
completed
SCj : the average switching capacitance during the execution
of task j (SCi comprises the actual capacitance CL and the
switching activity )
Xi, j : the number of clock cycles task j is executed at voltage Vi
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Designing an IP model
Simplifying assumptions of the IP-model include the following:
• There is one target processor that can be operated at a
limited number of discrete voltages.
• The time for voltage and frequency switches is negligible.
• The worst case number of cycles for each task are known.
N
Minimize
L
E   SC j  xi , j Vi 2
j 1 i 1
Subject to j :
L
x
i 1
University of Ottawa, SITE, 2008
i, j
 EC j
L
and
N
xi , j
 F
i 1 j 1
i
T
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Experimental Results
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Voltage Scheduling Techniques
• Static Voltage Scheduling
• Extension: Deadline for each task
• Formulation as IP problem (SS)
• Decisions taken at compile time
• Dynamic Voltage Scheduling
• Decisions taken at run time
• 2 Variants:
• arrival times of tasks is known (SD)
• arrival times of tasks is unknown (DD)
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Dynamic Voltage Control
by Operating Systems
Voltage Control and Task Scheduling by Operating
System to minimize energy consumption
Okuma, Ishihara, and Yasuura: “Real-Time Task Scheduling for a
Variable Voltage Processor”, Proc. of the 1999 International
Symposium on System Synthesis (ISSS'99)
Target:
• single processor system
• Only OS can issue voltage control instructions
• Voltage can be changed anytime
• only one supply voltage is used at any time
• overhead for switching is negligible
• static determination of worst case execution cycles
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Problem for Operating Systems
arrival time
2.5V
Task1
deadline
5.0V
Task2
4.0V
Task3
What is the optimum supply voltage
assignment for each task in order to obtain
minimum energy consumption?
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
The proposed Policy
task
task
Time slot: T
Consider a time slot the task can use
without violating real-time constraints
of other tasks executed in the future
Once time slot is determined:
• The task is executed at a frequency of WCEC / T Hz
• The scheduler assigns start and end times of time slot
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Two Algorithms
Two possible situations:
• The arrival time of tasks is known:
SD Algorithm
Static ordering and Dynamic voltage assignment
• The arrival time of tasks is unknown
DD Algorithm
Dynamic ordering and Dynamic voltage assignment
CPU Time Allocation
Start Time Assignment
End Time Prediction
University of Ottawa, SITE, 2008
SD
off-line
on-line
off-line
DD
on-line
on-line
on-line
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
SD Algorithm (CPU Time Allocation)
• Arrival time of all tasks is known
• Deadline of all tasks is known
• WCEC of all tasks is known
CPU time can be allocated statically
CPU time is assigned to each task:
• assuming maximum supply voltage
• assuming WCEC
Task1
Task2
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
SD Algorithm (Start Time Assignment)
Current time
• In SD, it is possible to assign lower supply voltage to
Task2 using the free
time
Task1
Task2
WCEC
@
• In SS,
the scheduler
can’t use
the free time because it has
Vmax
statically
assigned voltage
Current time
Free time
Task1
Task2
Task2
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
DD Algorithm
Current time
When the task’s arrival time is unknown, its end time
Task1
can’t be predicted statically using the SD algorithm
Task2
 No predetermined CPU time, start or end times
Start Time Assignment:
• New task arrives – it either:
a) Preempts currently executing task
b) Starts right after currently executing task
 Starting time is determined
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
DD Algorithm (cont.)
Current time
Completion time assigned
at CPU time allocation
Task1
Task2
End Time Prediction:
Based on the currently executing task’s end time prediction,
add the new task’s WCEC time at maximum voltage
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
DD Algorithm (cont.)
Current time
Task1
Task1
Task2
Task2
 If the currently executing task finishes earlier, then new
task can start sooner and run slower at lower voltage
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Comparison: SD vs. DD
 SD Algorithm:
Task
End Time
Start Time
 DD Algorithm:
Task
Start Time
University of Ottawa, SITE, 2008
End Time
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Experimental Results: Energy
Normal: Processor runs at maximum supply voltage
SS:
Static Scheduling
SD:
Scheduling done by SD Algorithm
DD:
Scheduling done by DD Algorithm
Task
1
2
3
4
3
5
Energy
University of Ottawa, SITE, 2008
Normal
5.0V
5.0V
5.0V
5.0V
5.0V
5.0V
1615J
SS
4.0V
4.0V
5.0V
2.5V
2.5V
2.5V
702J
SD
4.0V
4.0V
4.0V
2.5V
2.5V
2.5V
685J
DD
5.0V
5.0V
5.0V
5.0V
4.0V
4.0V
1357J
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Dynamic power management
Dynamic Power management
tries to assign optimal
(DPM)
power saving states
Requires Hardware Support
Example: StrongARM SA1100
400mW
RUN: operational
IDLE: a sw routine may stop
the CPU when not in use,
while monitoring interrupts
SLEEP: Shutdown of on-chip
activity
University of Ottawa, SITE, 2008
RUN
10us
90us
160ms
10us
90us
IDLE
SLEEP
50mW
160uW
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
The opportunity:
Reduce power according to workload
device states
shut down
busy
working
power states
wake up
busy
idle
Tsd
sleeping
Twu
working
Tbs
Tsd: shutdown delay
Twu: wakeup delay
Tbs: time before shutdown
Tbw: time before wakeup
Desired: Shutdown only during long idle times
 Tradeoff between savings and overhead
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
The challenge
Questions:
• When to go to a power-saving state?
• Is an idle period long enough for
shutdown?
 Predicting the future
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Adaptive Stochastic Models
Sliding Window (SW): [Chung DATE 99]
B
I
………... B
B
I
I
B
I
I
B
B
B
time
• Interpolating pre-computed optimization tables to
determine power states
• Using sliding windows to adapt to non-stationarity
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Comparison of different
approaches
Algorithm
off-line
Semi-Markov
Sliding Window
Device-Specific Timeout
Learning Tree
Exponential Average
always on
P
0.33
0.40
0.43
0.44
0.46
0.50
0.95
Nsd
250
326
191
323
437
623
-
Nwd
0
76
28
64
217
427
-
P : average power
Nsd: number of shutdowns
Nwd : wrong shutdowns (actually waste energy)
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
What about multitasking?
 Coordinate multiple workload sources
user
requesters
program
program
program
operating system
device
University of Ottawa, SITE, 2008
power manager
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Requesters
Concurrent processes
– Created, executed, and terminated
– Have different device utilization
– Generate requests only when running
(occupy CPU)
Power manager is notified when processes change state
We use processes to represent requesters
requester = process
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Task Scheduling
Rearrange task execution to cluster
similar utilization and idle periods
t1
t2
t3
1
2
1
2
1
idle
t1
t2
t3
3
1
2
1
2
T
1
time
idle
2
idle
University of Ottawa, SITE, 2008
3
3
2
3
T: time quantum
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Power-aware OS
implementations
• Windows APM and ACPI
Device-centric, shutdown based
• Power-aware Linux
Good research platform (several partial
implementations, es. U. Delft, Compaq, etc.)
Quite high-overhead for low-end embedded systems
• Power-aware ECOS
Good research platform (HP-Unibo implementation)
Lower overhead than Linux, modular
• Micro OSes
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Application Aware DPM –
Example: Communication Power
NICs powered by portables reduce battery life
8 hours
2.5 hours
• In general:
Higher bit rates lead to higher power consumption
• 90% of power for listening to a radio channel
 Proper use of PHY layer services by MAC is critical!
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Off mode power savings
Server
Access Point
Buffering
Refill
Beacons
Request
Request
Client
Power
Doze mode
time
Off mode
Energy saving
time
Playback
Low water mark
University of Ottawa, SITE, 2008
Buffer full
Playing
Playback
LWM reached
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
LWM / Buffer characteristics
Where to put
the LWM?
How long
should the
buffer be?
•
•
•
Higher error probability
Exploits NIC off-state
Min. value to allow data acquisition
•
•
•
lower error probability
Incurs NIC off-state overhead
Max. value: Buffer_length–1 block
•
•
Depends on memory availability
The longer the buffer, the higher
the NIC off-state benefits
Buffering Strategies should be Power Aware!
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Comparison
•
•
Low length buffers incur off mode power overhead
Good power saving for high length buffers
University of Ottawa, SITE, 2008
VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Exploiting application
knowledge
Approximate processing [Chandrakasan98-01]
Tradeoff quality for energy (es. lossy compression)
Design algorithms for graceful degradation
Enforce power-efficiency in programming
Avoid repetitive polling [Intel98]
Use event-based activation (interrupts)
Localize computation whenever possible
Helps shutdown of peripherals
Helps shutdown of memories
University of Ottawa, SITE, 2008