[Figure: power after technology mapping (K1 = 3, K2 = 3). Series: SIS+LEVEL MAP, SIS+OURS+LEVEL MAP, and improvement ratio. Vertical axis: Power (mW), Ratio, from 0 to 80. Horizontal axis: Fanin (6, 10, 15, 20) and Height (h = 2 to h = 8).]
7. Circuit Level Design
Buffer Chain
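• Delay analysis of buffer chain
– Each stage is a times larger than the previous one: $(W/L)_{k+1} = a\,(W/L)_k$
– Total capacitance at the output of stage k: $C_k = a^k C_{in} + a^{k-1} C_p$
– Chain delay: $T_d = \sum_{k=1}^{n} t_d(k) = \sum_{k=1}^{n} a\,t_0 = n\,a\,t_0$
– With $C_L = a^n C_{in}$: $n = \frac{\ln(C_L/C_{in})}{\ln(a)}$, so $T_d = a\,t_0\,\frac{\ln(C_L/C_{in})}{\ln(a)}$
– Setting $\frac{\partial T_d}{\partial a} = 0$ gives $(a)_{optimum} = e \approx 2.72$ and $(n)_{optimum} = \ln(C_L/C_{in})$
– Power of stage k: $P_k = C_k V_{dd}^2 f = V_{dd}^2 f\,a^{k-1}\,(a\,C_{in} + C_p)$
– Power of the chain: $P_T = \sum_{k=1}^{n} P_k = V_{dd}^2 f\,(a\,C_{in} + C_p)\,\frac{a^n - 1}{a - 1}$
– Power efficiency: $Eff = \frac{P_n}{P_T} = \frac{a^{n-1}}{a^{n-1} + a^{n-2} + \cdots + 1} = \frac{a^{n-1}(a - 1)}{a^n - 1}$
• Delay analysis considering parasitic capacitance, Cp
– $a = 2 \sim 10$ (typical: $e$)
• Notation
– Ck, Pk: total capacitance and power at the output of the stage-k buffer
– PT: power consumption of the buffer chain
– Pn: power consumption of the load capacitance CL
– Eff: power efficiency, Pn/PT
[Figure: n-stage buffer chain. Stage 1 has size 1 and input capacitance C_in; stage i has size a^(i-1) and input capacitance a^(i-1) C_in; stage n has size a^(n-1) and drives C_L = a^n C_in.]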
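The sizing rule above can be tried numerically. The following is a minimal Python sketch, not part of the original slides: the function names `buffer_chain` and `power_efficiency` are hypothetical, the parasitic capacitance Cp is ignored when sizing, and the number of stages is rounded to an integer, after which the stage ratio is re-derived so that a^n = C_L / C_in still holds.

```python
import math


def buffer_chain(c_load, c_in, t0, a=math.e):
    """Size a tapered buffer chain for a target stage ratio a (Cp ignored)."""
    # n = ln(C_L / C_in) / ln(a), rounded to a whole number of stages
    n_ideal = math.log(c_load / c_in) / math.log(a)
    n = max(1, round(n_ideal))
    # Re-derive the stage ratio so that a**n equals C_L / C_in exactly
    a_actual = (c_load / c_in) ** (1.0 / n)
    delay = n * a_actual * t0  # T_d = n * a * t0
    return n, a_actual, delay


def power_efficiency(a, n):
    """Eff = P_n / P_T = a**(n-1) * (a - 1) / (a**n - 1)."""
    return a ** (n - 1) * (a - 1) / (a ** n - 1)


if __name__ == "__main__":
    # Illustrative values only: a load 1000 times the input capacitance
    n, a, td = buffer_chain(c_load=1000.0, c_in=1.0, t0=1.0)
    print(n, round(a, 3), round(td, 2), round(power_efficiency(a, n), 3))
```

With a load 1000 times the input capacitance, the sketch picks 7 stages with a stage ratio slightly below e. A larger ratio would lower the stage count and raise the power efficiency at some cost in delay.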
Slew Rate
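• Determining rise/fall time
– Mean short-circuit current over one period T:
$I_{mean} = \frac{2}{T}\left[\int_{t_1}^{t_2} I_{short}(t)\,dt + \int_{t_2}^{t_3} I_{short}(t)\,dt\right] = \frac{4}{T}\int_{t_1}^{t_2} I_{short}(t)\,dt = \frac{4}{T}\int_{t_1}^{t_2} \frac{\beta}{2}\,(V_{in}(t) - V_t)^2\,dt$
where $\beta_n = \beta_p = \beta$ and $V_{tn} = -V_{tp} = V_t$
– Short-circuit power:
$P_{SC} = I_{mean} \cdot V_{dd} = \frac{\beta}{12}\,(V_{dd} - 2V_t)^3\,\tau f$
where $t_r = t_f = \tau$ and $f = 1/T$
[Figure: input waveform Vin with period T, rise time tr and fall time tf, crossing Vtn at t1 and Vdd + Vtp at t3; the short-circuit current flows between t1 and t3, peaks at Imax at t2, and averages to Imean.]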
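As a check on the closed form, the integral for the mean short-circuit current can be evaluated numerically. This is a minimal sketch under stated assumptions: the input is taken to be a linear ramp V_in(t) = V_dd · t / τ (the slides only show the waveform), the inverter is symmetric and unloaded, and both function names are hypothetical.

```python
def short_circuit_power(beta, vdd, vt, tau, freq):
    """Closed form: P_SC = beta / 12 * (Vdd - 2*Vt)**3 * tau * f."""
    if vdd <= 2 * vt:
        return 0.0  # both transistors never conduct at the same time
    return beta / 12.0 * (vdd - 2 * vt) ** 3 * tau * freq


def short_circuit_power_numeric(beta, vdd, vt, tau, freq, steps=10_000):
    """Midpoint-rule evaluation of I_mean = 4/T * integral from t1 to t2."""
    t1 = vt / vdd * tau  # input ramp reaches Vt
    t2 = tau / 2.0       # input ramp reaches Vdd / 2 (current peak)
    dt = (t2 - t1) / steps
    charge = 0.0
    for k in range(steps):
        t = t1 + (k + 0.5) * dt
        vin = vdd * t / tau  # assumed linear input ramp
        charge += beta / 2.0 * (vin - vt) ** 2 * dt
    i_mean = 4.0 * freq * charge  # 4 / T with T = 1 / f
    return i_mean * vdd


if __name__ == "__main__":
    # Illustrative parameters, not taken from the slides
    args = dict(beta=1e-4, vdd=3.3, vt=0.7, tau=1e-9, freq=50e6)
    print(short_circuit_power(**args), short_circuit_power_numeric(**args))
```

For these illustrative values both functions give roughly 2.9 µW. The cubic dependence on (V_dd - 2V_t) shows why short-circuit power vanishes once the supply drops to twice the threshold voltage.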
Slew Rate(Cont’d)
•
Power consumption of Short circuit current in Oscillation Circuit
[Figure: oscillation circuit and its waveforms, showing input Vi and output Vo relative to Vdd.]
Pass Transistor Logic
• Reducing Area/Power
– Macro cells (a large part of the chip area): XOR/XNOR/MUX (primitive) → pass transistor logic
– Does not use a charge/discharge scheme → appropriate for low-power logic
• Pass transistor logic family
– CPL (Complementary Pass Transistor Logic)
– DPL (Dual Pass Transistor Logic)
– SRPL (Swing Restored Pass Transistor Logic)
• CPL
– Basic scheme
– Inverter buffering
[Figure: CPL circuits with inputs A, B and their complements producing complementary outputs AB; the buffered version adds output inverters and a p-MOS latch tied to Vdd.]
Pass Transistor Logic(Cont’d)
• DPL
– Pass transistor network + dual p-MOS
– Enables rail-to-rail swing
– Characteristics
  • Increased input capacitance (delay)
  • Increased driving ability, because two ON-paths exist
  • Equals CPL in input loading capacitance
• SRPL
– Pass transistor network + cross-coupled inverter
– Restores the logic level
– The inverter size must not be too big
[Figure: DPL circuit with inputs A, B and their complements and output AB; SRPL circuit built around an n-MOS CPL network.]
Dynamic Logic
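• Uses a precharge/evaluation scheme
• Family
– Domino logic
– NORA (NO RAce) logic
• Characteristics
– Decreased input loading capacitance
– Power consumption in the precharge clock
– Increased useless switching in the precharge period
• Basic architecture of Domino logic
[Figure: Domino gate with precharge transistor P1, an N logic block with inputs A and B, evaluation transistor N1, clock clk, input capacitance C_in and load CL; the clock waveform alternates between precharge and evaluation phases.]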
Input Pin Ordering
• Reorder the equivalent inputs to a transistor based on critical path delays and power consumption
• N-input primitive CMOS logic
– Symmetrical at the function level
– Asymmetrical at the transistor level
  • capacitance of output stage
  • body effect
• Scheme
– A signal that has many transitions must be placed far from the output
– If the switching frequency is hard to estimate, determine the pin ordering by considering the paths and the path delay balance from the primary inputs to the transistor inputs
• Example of N-input CMOS logic
[Figure: series transistor stack with inputs A, B, C, D, internal node capacitances C1, C2, C3 and load CL.]
• Experiment with a TI gate array: for a 4-input NAND gate in TI's BiCMOS gate array library (with a load of 13 inverters), the delay varies by 20% and the power dissipation by 10% between a good and a bad ordering
INPUT PIN Reordering
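[Figure: 4-input NAND gate with p-MOS transistors MPA, MPB, MPC, MPD connected to VDD, series n-MOS transistors MNA, MNB, MNC, MND, internal node capacitances CB, CC, CD and load CL; panels (a) to (d) show the input patterns of the four cases.]
Simulation result (tcycle = 50 ns, tf/tr = 1 ns):
– When A is the critical input: 38.4 µW
– When D is the critical input: 47.2 µW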
Sensitization
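• Definition
– Sensitization: an input signal whose transition forces a transition event at the output
– Sensitization vector: the values of the other inputs for which that signal is sensitized
$\frac{\partial Y}{\partial X_i} = f|_{X_i=0} \oplus f|_{X_i=1} = f(X_1,\dots,X_{i-1},0,X_{i+1},\dots,X_n) \oplus f(X_1,\dots,X_{i-1},1,X_{i+1},\dots,X_n)$
• Example
$Y = (X_1 + X_2) \cdot X_3$
$\frac{\partial Y}{\partial X_1} = f|_{X_1=0} \oplus f|_{X_1=1} = X_2 X_3 \oplus X_3 = \overline{X_2}\,X_3$
[Figure: gate network with inputs X1, X2, X3 and output Y.]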
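The Boolean difference can be enumerated by brute force for small functions. The sketch below is illustrative only: `sensitization_vectors` is a hypothetical helper, input indices are 0-based, and the example function assumes the slide's example reads Y = (X1 + X2) · X3.

```python
from itertools import product


def sensitization_vectors(f, n_inputs, i):
    """Return the assignments of the other inputs for which toggling
    input i (0-based) toggles the output, i.e. dY/dX_i = 1."""
    vectors = []
    for others in product((0, 1), repeat=n_inputs - 1):
        low = others[:i] + (0,) + others[i:]   # X_i = 0
        high = others[:i] + (1,) + others[i:]  # X_i = 1
        if f(*low) != f(*high):                # f|X_i=0 XOR f|X_i=1
            vectors.append(others)
    return vectors


def y(x1, x2, x3):
    # Assumed reading of the example: Y = (X1 + X2) . X3
    return (x1 | x2) & x3


if __name__ == "__main__":
    # Prints [(0, 1)]: X1 is sensitized only when X2 = 0 and X3 = 1
    print(sensitization_vectors(y, 3, 0))
```

The single vector returned for X1 matches the algebraic result: the difference equals 1 only when X2 is 0 and X3 is 1.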
Sensitization(Cont’d)
• Considering sensitization in combinational logic: removes unnecessary transitions in the combinational logic
• Considering sensitization in sequential logic: also reduces the power consumption in the flip-flops
[Figure: combinational logic blocks with inputs X1 ... Xn and output Y, connected to D flip-flops (D, Q) clocked by clk; the variants shown add an enable input E.]
TTL-Compatible
• TTL level signal → CMOS input
• Characteristic curve of CMOS inverter
[Figure: CMOS inverter with Vdd = 3.3 V driven by a TTL input, and its transfer curve Vo versus Vi marked at 1.4 V, VIL = 0.8 V and VIH = 2.0 V; currents IDTTL1 and IDTTL2 are indicated, with Ileak = avg(Id1, Id2).]
$P_{TTL} = N_{TTL} \cdot V_{dd} \cdot (I_{DTTL1} + I_{DTTL2})$
where $N_{TTL}$ is the number of TTL-compatible input pads
TTL Compatible(Cont’d)
• CMOS output signal → TTL input
– Because of the sink current IOL, the CMOS chip generates a large amount of heat
– Increased chip operating temperature
– Power consumption of the whole system
[Figure: output pad of one chip driving the input pad of another across the chip boundaries, with sink current IOL and output low voltage VOL.]
INPUT PIN Reordering
• To reduce the power dissipation, place the input with low transition density near the ground end.
(a) If MNA turns off, only CL needs to be charged.
(b) If MND turns off, all of CL, CB, CC and CD need to be charged.
(c) If the critical input is rising and placed near the output node, the initial charges of CB, CC and CD are zero, and the delay of discharging CL is less than in (d).
(d) If the critical input is rising and placed near the ground end, the charges of CB, CC and CD must discharge before the charge of CL discharges to zero.
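Low-Power Booth Multiplier Design
Sungkyunkwan University
School of Electrical and Computer Engineering
Jin Hyuk Kim, Jun Sung Lee, Jun Dong Cho
Modified Booth Multiplier
• Multibit recoding halves the number of partial products, which enables high-speed multiplication.
• Multiplicand: X, multiplier: Y
Recoded digit = $Y_{2i-1} + Y_{2i} - 2Y_{2i+1}$ (with $Y_{-1} = 0$)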
Y2i+1   Y2i   Y2i-1   Recoded digit   Operation on X
  0      0      0          0              0X
  0      0      1         +1             +1X
  0      1      0         +1             +1X
  0      1      1         +2             +2X
  1      0      0         -2             -2X
  1      0      1         -1             -1X
  1      1      0         -1             -1X
  1      1      1          0              0X
Recoded digit : Operation on X
 0 : Add 0 to the partial product
+1 : Add X to the partial product
+2 : Shift X left one position and add it to the partial product
-1 : Add the two's complement of X to the partial product
-2 : Take the two's complement of X, shift it left one position, and add it to the partial product
< Generation and operation of recoded digit >
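Modified Booth Multiplier - Example
• Example
          10010101 = X  (-107)
          01101001 = Y  (+105)
  ----------------
  1111111110010101      operation +1  (bits recoded: 010), with sign extension
  00000011010110        operation -2  (bits recoded: 100)
  000001101011          operation -1  (bits recoded: 101)
  1100101010            operation +2  (bits recoded: 011)
  ----------------
  1101010000011101 = P  (-11235)
Each row is shifted two bit positions to the left of the row above it, so its lowest positions are left blank.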
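The recoding rule and the worked example can be reproduced in a few lines of Python. This is a minimal behavioural sketch, not the multiplier hardware described in the slides; the function names are hypothetical, and the multiplier is assumed to be a two's-complement value with an even number of bits.

```python
def booth_recode(y, bits):
    """Radix-4 Booth digits of a two's-complement multiplier,
    least significant digit first."""
    if bits % 2:
        raise ValueError("bits must be even")
    y &= (1 << bits) - 1           # keep the two's-complement bit pattern
    digits = []
    prev = 0                       # Y(-1) = 0
    for pos in range(0, bits, 2):
        y_lo = (y >> pos) & 1          # Y(2i)
        y_hi = (y >> (pos + 1)) & 1    # Y(2i+1)
        digits.append(prev + y_lo - 2 * y_hi)
        prev = y_hi                # becomes Y(2i-1) of the next digit
    return digits


def booth_multiply(x, y, bits):
    """Sum the partial products digit * X * 4**k."""
    return sum(d * x * 4 ** k for k, d in enumerate(booth_recode(y, bits)))


if __name__ == "__main__":
    print(booth_recode(105, 8))          # [1, -2, -1, 2]
    print(booth_multiply(-107, 105, 8))  # -11235
```

The four digits match the operations +1, -2, -1, +2 of the example, and only four partial products are summed instead of eight.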
Wallace Tree - 4:2 Compressor
[Figure (a): dot diagram of the partial products of X7 ... X0 and Y7 ... Y0, reduced by 4:2 compressors in a 1st stage and a 2nd stage to two summands to be added. Legend: zero, bit jumping level, partial product, bit generated by compressor.]
[Figure (b): block diagram. Inputs Y (8 bits), X3, X2, X1, X0 (4 bits) and X7, X6, X5, X4 (4 bits); two sets of 4*8 partial product generators, each followed by 8 4-2 compressors (1st stage, block A and block B); 11 4-2 compressors (2nd stage, block C); 16-bit adder producing P15 ... P0.]
Multipliers - Area
• 16-bit Multiplier Area
Multiplier type   Area (mm²)   Gate count
Array             4.2          2,378
Wallace           8.1          2,544
Modified Booth    8.5          3,375
Multiplier - Power
• Average Power Dissipation (16-bit)
Multiplier type   Power (mW)   Logic transitions
Array             43.5         7,224
Wallace           32.0         3,793
Modified Booth    41.3         3,993
Multiplier - Delay
• Worst-Case Delay (16-bit)
Multiplier type   Delay (ns)   Gate delays
Array             92.6         50
Wallace           54.1         35
Modified Booth    45.4         32
Instruction Level Power Analysis
• Estimate the power dissipation of instruction sequences and the power dissipation of a program
• Eb: base cost of individual instructions; Es: circuit state change effects
$E_M = E_b + E_s$
$E_b = \sum_i B_i \cdot N_i$
$E_s = \sum_{i,j} O_{i,j} \cdot N_{i,j}$
• Notation
– EM: the overall energy cost of a program
– Bi: the base cost of a type i instruction
– Ni: the number of type i instructions
– Oi,j: the cost incurred when a type i instruction is followed by a type j instruction
– Ni,j: the number of occurrences in which a type i instruction is immediately followed by a type j instruction
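The energy model can be evaluated directly from an instruction trace. The following Python sketch is illustrative: the costs are read from the 4 by 4 multiplier results reported later in this deck, `COMP` is a shorthand introduced here for the 2's complement instruction, the circuit state overhead is assumed to be independent of the order of the two instructions, and the instruction sequence is invented.

```python
from collections import Counter

# Base cost B_i in pJ (4 by 4 multiplier results); COMP = 2's complement
BASE_COST = {"LOAD": 1.46, "ADD": 0.86, "COMP": 0.77, "SHIFT": 0.29}

# Circuit state overhead O_ij in pJ, assumed symmetric in (i, j)
OVERHEAD = {
    ("LOAD", "LOAD"): 0.18, ("LOAD", "ADD"): 1.20,
    ("LOAD", "COMP"): 1.08, ("LOAD", "SHIFT"): 0.73,
    ("ADD", "ADD"): 0.31, ("ADD", "COMP"): 0.49, ("ADD", "SHIFT"): 0.61,
    ("COMP", "COMP"): 0.27, ("COMP", "SHIFT"): 0.34,
    ("SHIFT", "SHIFT"): 0.15,
}


def overhead(a, b):
    """Look up O_ij regardless of the order of the pair."""
    return OVERHEAD[(a, b)] if (a, b) in OVERHEAD else OVERHEAD[(b, a)]


def program_energy(sequence):
    """Return (E_b, E_s, E_M) in pJ for a list of instruction names."""
    n_i = Counter(sequence)                      # N_i
    n_ij = Counter(zip(sequence, sequence[1:]))  # N_ij for adjacent pairs
    e_b = sum(BASE_COST[i] * n for i, n in n_i.items())
    e_s = sum(overhead(i, j) * n for (i, j), n in n_ij.items())
    return e_b, e_s, e_b + e_s


if __name__ == "__main__":
    # Hypothetical instruction trace
    print(program_energy(["LOAD", "ADD", "SHIFT", "ADD"]))
```

For this short trace the circuit state term is comparable to the base term, which is why reordering instructions can change the total energy.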
Instruction ordering
• Develop a technique of operand swapping
• Recoding weight: necessary operation cost of operands
$W_{total} = \sum_i W_i$
$W_i = W_b + W_{inter}$
• Notation
– Wtotal: total recoding weight of the input operand
– Wi: weight of individual recoded digit i in the Booth multiplier
– Wb: base weight of an instruction
– Winter: inter-operation weight of instructions
• Therefore, if an operand has the lower Wtotal, put it in the second input (multiplier).
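Operand swapping can be sketched by recoding both operands and comparing their total weights. Everything numeric below is a placeholder: the per-digit weights in `DIGIT_WEIGHT` are hypothetical stand-ins for W_b + W_inter, which the slides derive from measured instruction costs, and the function names are invented for this example.

```python
# Hypothetical weight W_i per recoded digit (placeholder values)
DIGIT_WEIGHT = {0: 0.0, 1: 1.0, 2: 1.5, -1: 2.0, -2: 2.5}


def booth_digits(value, bits):
    """Radix-4 Booth digits of a two's-complement value (bits must be even)."""
    value &= (1 << bits) - 1
    digits, prev = [], 0
    for pos in range(0, bits, 2):
        lo = (value >> pos) & 1
        hi = (value >> (pos + 1)) & 1
        digits.append(prev + lo - 2 * hi)
        prev = hi
    return digits


def recoding_weight(value, bits):
    """W_total = sum of W_i over the recoded digits."""
    return sum(DIGIT_WEIGHT[d] for d in booth_digits(value, bits))


def order_operands(a, b, bits):
    """Return (multiplicand, multiplier); the operand with the lower
    W_total is placed in the multiplier position."""
    if recoding_weight(a, bits) < recoding_weight(b, bits):
        return b, a
    return a, b


if __name__ == "__main__":
    print(order_operands(16, 105, 8))  # (105, 16)
```

With these placeholder weights, 16 recodes to a single nonzero digit and therefore ends up as the multiplier, while 105 becomes the multiplicand.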
RESULT
Base cost [pJ] and circuit state effects [pJ] when switched to the instruction named in the column

< 4 by 4 multiplier >
Instruction       Base cost   LOAD   ADD    2's complement   SHIFT
LOAD              1.46        0.18   1.20   1.08             0.73
ADD               0.86        -      0.31   0.49             0.61
2's complement    0.77        -      -      0.27             0.34
SHIFT             0.29        -      -      -                0.15

< 8 by 8 multiplier >
Instruction       Base cost   LOAD   ADD    2's complement   SHIFT
LOAD              3.25        0.40   2.67   2.38             1.63
ADD               1.91        -      0.58   1.11             1.44
2's complement    1.72        -      -      0.55             0.78
SHIFT             0.65        -      -      -                0.38

< 12 by 12 multiplier >
Instruction       Base cost   LOAD   ADD    2's complement   SHIFT
LOAD              4.81        0.59   3.96   3.57             2.40
ADD               2.83        -      1.02   1.63             2.12
2's complement    2.55        -      -      1.00             1.14
SHIFT             0.96        -      -      -                0.78
Conclusion
[Figure: two bar charts over 4-bit, 8-bit and 12-bit multipliers and their average. One shows the percentage of instances with circuit state effects; the other shows power [pJ] with circuit state effects not considered versus considered. Reductions annotated on the charts: 9.0%, 4.0% and 12.0%.]
8. Layout Level Design
Device Scaling of Factor of S
• Constant-scaled wires increase coupling capacitance by S and wire resistance by S
• Supply voltage by 1/S, threshold voltage by 1/S, current drive by 1/S
• Gate capacitance by 1/S, gate delay by 1/S
• Global interconnection delay, RC load + parasitics, by S
• Interconnect delay: 50-70% of the clock cycle
• Area: 1/S²
• Power dissipation by 1/S to 1/S² ($P = nCV_{dd}^2 f$, where nC is the sum of capacitance times number of transitions)
• SIA (Semiconductor Industry Association): in 2007, physical limitation of 0.1 µm
• 20 billion transistors, 10 square centimeters, 12- or 16-inch wafer
Delay Variations at Low-Voltage
• At high supply voltage, the delay increases with temperature (mobility decreases with temperature), while at very low supply voltages the delay decreases with temperature (VT decreases with temperature).
• At low supply voltages, the delay ratio between large and minimum transistor widths W increases by a factor of several.
• Clock trees are delay-balanced by wire snaking in order to avoid clock skew. In this case, at low supply voltages, slight VT variations can significantly modify the delay balancing.
Quarter Micron Challenge
• Computers/peripherals (SOC): 1996 ($50 Billion), 1999 ($70 Billion)
• Wiring dominates delay: wire R comparable to gate driver R; wire/wire coupling C > C to ground
• Push beyond 0.07 micron
• Quest for area (past), speed-speed (now), power-power-power (future)
• Accelerated increases of clock frequencies
• Signal integrity-based tools
• Design styles (chip + packages)
• System-level design (system partitioning)
• Synthesis with multiple constraints (power, area, timing)
• Partitioning/MCM
• Increasing speed limits complicate clock and power distribution
• Design bounded by wires, vias, via resistance, coupling
• Reverse scaling: adding area/spacing as needed: widening and thickening of wires, metal shielding and noise avoidance by adding metal
CLOCK POWER CONSUMPTION
• Clock power consumption is as large as the logic power. Because the clock signal carries the heaviest load and switches at high frequency, clock distribution is a major source of power dissipation.
• In a microprocessor, 18% of the total power is consumed by clocking.
• Clock distribution is designed as a hierarchical clock tree, according to the decomposition principle.
[Figure: power consumption per block in a typical microprocessor.]
Crosstalk
Dynamic Effects on Skew
• Capacitance coupling
• Supply voltage deviation (clock driver and receiver voltage difference)
• Capacitance deviation by circuit operation
• Global and local temperature
Solution for Clock Skew
• Layout issues: clocks routed first
• Must be aware of all sources of delay
• Increased spacing
• Wider wires
• Insert buffers
• Specialized clock nets need net matching
• Two approaches: single driver, H-tree driver
• Gated clocks: local clocks that are conditionally enabled so that the registers are only clocked during the write cycles. The clock is partitioned into different blocks and each block is clocked with its own clock.
• Gating the clocks to infrequently used blocks does not provide an acceptable level of power savings
• Divide the basic clock frequency to provide the lowest clock frequency needed to different parts of the circuit
• Clock distribution: large clock buffers waste power. Use smaller clock buffers with a well-balanced clock tree.
PowerPC Clocking Scheme
CLOCK DRIVERS IN THE DEC ALPHA 21164
DRIVER for PADS or LARGE CAPACITANCES
Off-chip power (drivers and pads) is increasing and is very difficult to reduce, because the pad and driver sizes cannot be decreased with new technologies.
Layout-Driven Resynthesis for Lower Power
Low Power Process
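• Dynamic Power Dissipation
$P_d = a \cdot C_L \cdot V_{dd}^2 \cdot f$
$I_{ds} = \frac{\beta}{2}\,(V_{gs} - V_t)^2$
$C_{gate} = C_{ox} \cdot (W \cdot L)$
$C_{in} = \sum_{j=1}^{m} (C_{gate})_j$
$C_{ov} = CGD0 \cdot W$
$C_{dj} = C_j \cdot AD + C_{jsw} \cdot PD$
$AD = W \cdot D, \quad PD = 2\,(W + D)$
[Figure: CMOS inverter (Vin, Vo, Vdd) annotated with overlap capacitances C_ovp, C_ovn and drain junction capacitances C_djp, C_djn; drain region of width W and length D with bottom junction capacitance C_jb and sidewall capacitance C_jsw.]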
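The capacitance expressions above translate directly into code. The sketch below is a plain restatement of those formulas in Python; the function names are hypothetical and the numeric values in the usage example are illustrative, not process data from the slides.

```python
def gate_cap(w, l, cox):
    """C_gate = C_ox * (W * L)."""
    return cox * w * l


def overlap_cap(w, cgd0):
    """C_ov = CGD0 * W."""
    return cgd0 * w


def drain_junction_cap(w, d, cj, cjsw):
    """C_dj = C_j * AD + C_jsw * PD with AD = W * D and PD = 2 * (W + D)."""
    area = w * d
    perimeter = 2.0 * (w + d)
    return cj * area + cjsw * perimeter


def dynamic_power(activity, c_load, vdd, freq):
    """P_d = a * C_L * Vdd**2 * f."""
    return activity * c_load * vdd ** 2 * freq


if __name__ == "__main__":
    # Illustrative values in SI units (m, F/m^2, F/m, F, V, Hz)
    c_dj = drain_junction_cap(w=2e-6, d=1e-6, cj=1e-3, cjsw=1e-10)
    print(c_dj)
    print(dynamic_power(activity=0.5, c_load=1e-13, vdd=3.3, freq=1e8))
```

Because the drain junction term grows with both area and perimeter, shrinking the drain length D in layout reduces C_dj and with it the load that enters the dynamic power.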
Crosstalk
• In deep-submicron layouts, some of the net lengths for connections between modules can be so long that they have a resistance comparable to the resistance of the driver.
• Each net in a mixed analog/digital circuit is identified according to its crosstalk sensitivity:
– 1. Noisy = high impedance signal that can disturb other signals, e.g., clock signals.
– 2. High-Sensitivity = high impedance analog nets; the most noise sensitive nets, such as the input nets to operational amplifiers.
– 3. Mid-Sensitivity = low/medium impedance analog nets.
– 4. Low-Sensitivity = digital nets that directly affect the analog part in some cells, such as control signals.
– 5. Non-Sensitivity = the most noise insensitive nets, such as pure digital nets.
• The crosstalk between two interconnection wires also depends on the frequencies (i.e., signal activities) of the signals traveling on the wires. Recently, deep-submicron designs require crosstalk-free channel routing.
Power Measure in Layout
• The average dynamic power consumed by a CMOS gate is given below, where C_l is the load capacitance at the output of the node, V_dd is the supply voltage, T_cycle is the global clock period, N is the number of transitions of the gate output per clock cycle, C_g is the load capacitance due to the input capacitance of the fanout gates, and C_w is the load capacitance due to the interconnection tree formed between the driver and its fanout gates.
• $P_{av} = 0.5\,\frac{V_{dd}^2}{T_{cycle}}\,C_l\,N = 0.5\,\frac{V_{dd}^2}{T_{cycle}}\,(C_g + C_w)\,N$
• Logic synthesis for low power attempts to minimize $\sum_i C_{gi} N_i$.
• Physical design for low power tries to minimize $\sum_i C_{wi} N_i$.
• Here C_wi consists of C_xi + C_si, where C_xi is the capacitance of net i due to its crosstalk, and C_si is the substrate capacitance of net i. For low power layout applications, power dissipation due to crosstalk is minimized by ensuring that wires carrying high activity signals are placed sufficiently far from the other wires. Similarly, power dissipation due to substrate capacitance is proportional to the wirelength and its signal activity.
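The power measure and the physical-design objective can be written as two small functions. This is a minimal sketch assuming the average power is 0.5 · (V_dd² / T_cycle) · (C_g + C_w) · N; the function names and all numeric values are illustrative and not taken from the slides.

```python
def average_power(vdd, t_cycle, c_gate, c_wire, n_transitions):
    """P_av = 0.5 * Vdd**2 / T_cycle * (C_g + C_w) * N."""
    return 0.5 * vdd ** 2 / t_cycle * (c_gate + c_wire) * n_transitions


def wire_cost(nets):
    """Physical-design objective: sum over nets of C_wi * N_i,
    with C_wi = C_xi + C_si (crosstalk plus substrate capacitance).
    Each net is a tuple (c_x, c_s, n_transitions)."""
    return sum((c_x + c_s) * n for c_x, c_s, n in nets)


if __name__ == "__main__":
    # One gate: 30 fF of fanout gate load, 20 fF of wire load
    print(average_power(vdd=3.3, t_cycle=20e-9, c_gate=30e-15,
                        c_wire=20e-15, n_transitions=0.5))
    # Two nets: a high-activity net with large crosstalk capacitance,
    # and a quiet net with small crosstalk capacitance
    print(wire_cost([(10e-15, 5e-15, 0.8), (2e-15, 5e-15, 0.1)]))
```

In the second call the high-activity net dominates the cost, which is the reason for spacing high-activity wires away from their neighbours.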
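Low-Power Layout Design Using Dual Supply Voltages
Sungkyunkwan University
School of Electrical and Computer Engineering
Jin Hyuk Kim, Jun Sung Lee, Jun Dong Cho
Contents
• Research objective
• Research background
• Clustered Voltage Scaling structure
• Row by Row Power Supply structure
• Mix-And-Match Power Supply structure
• Level Converter structure
• Mix-And-Match Power Supply design flow
• Experimental results
• Conclusion
Research Objective and Background
• Propose a dual-voltage layout technique that reduces the power consumption of combinational circuits
• Reduce the wiring and the number of tracks that increase when dual-voltage cells are used and cells of the same voltage are placed in one cell row
• Implement a Level Converter circuit that uses the minimum number of transistors
• Apply Clustered Voltage Scaling [Usami, ’95], which uses dual voltages while maintaining device performance
• The proposed Mix-And-Match Power Supply layout structure improves on the existing Row by Row Power Supply [Usami, ’97] layout structure and reduces power and area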
Clustered Voltage Scaling
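• Generates a low-power netlist
• Slack(S_i) = R_i - A_i
[Figure: netlist of gates G1 to G11 between flip-flops (F/F), each marked as a VDDL cell, a VDDH cell or a Level Converter (LC1, LC2). Slacks shown: S1 > 0, S2 < 0, S3 > 0, S4 > 0, S5 > 0, S6 > 0, S7 < 0, S8 < 0, S9 > 0, S10 < 0, S11 < 0.]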
Row by Row Power Supply Structure
[Figure: module built from standard cell rows; each row contains only VDDL cells or only VDDH cells and is supplied by the matching VDDL or VDDH rail together with VSS.]
Mix-And-Match Power Supply Structure
[Figure: module built from standard cell rows in which VDDL cells and VDDH cells are mixed within a row; both VDDL and VDDH rails and VSS run along the rows.]
Structure Comparison
[Figure: module power supply arrangement for the conventional circuit, RRPS (VDDL and VDDH) and MAMPS (VDDL and VDDH).]
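Level Converter Structure
• Number of transistors: 6 → 4
• Effective in terms of power and area
[Figure: conventional and proposed Level Converter circuits supplied by VDDH (and VDDL), with input IN swinging VSS/VDDL and output OUT swinging VSS/VDDH; labels Vth = 1.5 V and Vth = 2.0 V.]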
Mix-And-Match Power Supply
Design Flow
Single voltage netlist
Multiple voltage scaling
Netlist with multiple supply voltage
(OPUS)
Assign supply voltage to each cell
Physical placement
(Aquarius XO)
Routing
Synthesis timing, power and area
(PowerMill)
Experimental Results
[Figure: bar charts of total power (%) and total area (%) for the conventional circuit (100), RRPS and MAMPS; annotated differences: 47%, 10%, 15% and 2%.]
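Conclusion
• Compared with the single-voltage circuit, power is reduced by 49.4%, at the cost of a 5.6% area overhead
• Compared with the existing RRPS structure, area is reduced by 10% and power by 2%
• The proposed Level Converter reduces area by 30% and power by 35% compared with the existing Level Converter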
9. CAD tools
Low Power Design Tools
• Transistor Level Tools (5-10% of silicon)
– SPICE, PowerMill (Epic), ADM (Avanti/Anagram), Lsim Power Analyst (Mentor)
• Logic Level Tools (10-15%)
– Design Power and PowerGate (Synopsys), WattWatcher/Gate (Sente), PowerSim (System Sciences), POET (Viewlogic), and QuickPower (Mentor)
• Architectural (RTL) Level Tools (20-25%)
– WattWatcher/Architect (Sente): 20-25% accuracy
• Behavioral (spreadsheet) Level Tools (50-100%)
– Active area of academic research
[Figure: commercial synthesis systems and research synthesis systems. Legend: A - architectural synthesis, L - logic synthesis.]
Low-Power CAD sites
• Alternative System Concepts, Inc.: 7X power reduction through optimization; contact http://www.ee.princeton.edu and Jake Karrfalt at [email protected] or (603) 437-2234. Reduction of glitch and clock power; modeling and optimization of interconnect power; power optimization for data-dominated designs with limited control flow.
• Mentor Graphics QuickPower (http://www.mentorg.com):
– Hierarchical determination of the overall benefit of exchanging blocks for lower power
– Powering down or disabling blocks when not in use, by clock gating
– Choosing candidates for power-down and calculating the effect of the power-down logic
• Synopsys's Power Compiler: http://www.synopsys.com/products/power/power_ds
• Sente's WattWatcher/Architect: the first commercial tool operating at the architecture level (20-25% accuracy). http://www.powereda.com
• Behavioral tools: Hyper-LP (optimization), Explore (estimation) by J. Rabaey
Design Power(Synopsys)
• DesignPower(TM) provides a single, integrated environment for power analysis in multiple phases of the design process:
– Early, quick feedback at the HDL or gate level through probabilistic analysis.
– Improved accuracy through simulation-based analysis for gate level and library exploration.
• DesignPower estimates switching, internal cell and leakage power. It accepts user-defined probabilities, simulation toggle data or a combination of both as input. DesignPower propagates switching information through sequential devices, including flip-flops and latches.
• It supports sequential, hierarchical, gated-clock, and multiple-clock designs. For simulation toggle data, it links directly to Verilog and VHDL simulators, including Synopsys' VSS.