Low-Power Design Techniques for SoC
Jun-Dong Cho
SungKyunKwan University
VADA Lab.
1
Content
• Introduction
• SOC Design Trends
• System Level Low Power Design
• Architecture Level Low Power Design
• Conclusion
2
SOC Design Trends
• SoCs are expected to integrate more and more complex functions
– Web browsing, real-time video processing, speech recognition and synthesis
• Average operating power at or below 100 mW and standby power at or below 2 mW
• Performance must increase from 300 million operations per second (MOPS) today to 2500 MOPS in 2016
3
Achieving functionality while
maximizing battery life and minimizing size
GPS
Cochlear implant
Cellular phone
Noise
cancellation
headphones
Medical
watch
Hearing
aid
Digital still camera
Portable
audio
Digital radio
4
QoS vs. Power
• How accurate should I make my FDCT?
5
SOC Design Characteristics
• The new version of the ITRS predicts that Moore's law will continue on a two- to three-year cycle throughout this period (2001-2016)
• One of the key design challenges is to use the dramatically increasing transistor counts effectively, given certain power and productivity constraints
• "Bottom-up": based on system constraints
• "Top-down": based on design resource constraints
6
Energy-Flexibility Gap
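[Figure: energy efficiency (MOPS/mW, log scale from 0.1 to 1000) versus flexibility]
• Signal-processing ASIC: 200 MOPS/mW
• Reconfigurable architecture: 10-80 MOPS/mW
• Signal processors (ASIPs, DSPs): 3 MOPS/mW
• Embedded processor (ARM): 0.5 MOPS/mW
7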
Radio systems
• WiFi – 10-100Mbits/sec unlicensed band
– OFDM, M-ary coding
• 3G – .1-2 Mbits/sec wide area cellular
– CDMA, GMSK
• Bluetooth – .8 Mbit/sec cable replacement
– Frequency hop
• ZigBee – .02-.2 Mbits/sec low power, low cost
– QPSK
• UWB – Recently allowed by FCC
– Short pulses (no carrier), bi-phase or PPM
8
Data rate
[Figure: data rate (10 kbits/sec to 100 Mbit/sec, log scale) versus carrier frequency (0-6 GHz) for UWB, 802.11a, 802.11b, 802.11g, 3G, Bluetooth and ZigBee]
9
Cost (projections)
[Figure: projected cost ($0.10 to $1000, log scale) versus carrier frequency (0-6 GHz) for 3G, 802.11a, 802.11b/g, UWB, Bluetooth and ZigBee]
10
Power Dissipation
[Figure: power dissipation (1 mW to 10 W, log scale) versus carrier frequency (0-6 GHz) for 3G, 802.11b/g, 802.11a, Bluetooth, UWB and ZigBee]
11
Why Low-Power Devices?
• Practical reasons
(Reducing power requirements of high
throughput portable applications)
• Financial reasons
(Reducing packaging costs and achieving
memory savings)
• Technological reasons
(Excessive heat prevents the realization of
high density chips and limits their
functionalities)
12
Different Constraints for
Different Application Fields
• Portable devices: Battery life-time
• Telecom and military: Reliability
(reduced power decreases
electromigration, hence increases
reliability)
• High volume products: Unit cost
(reduced power decreases packaging cost)
13
Driving Forces for Low-Power:
Deep-Submicron Technology
ADVANTAGES
• Smaller geometries
• Higher clock frequencies
DISADVANTAGES
• Higher power consumption
• Lower reliability
14
Dynamic Power Consumption
• Average power consumed by a node that cycles in every period T (each period has a 0→1 or a 1→0 transition):
$$P_{\text{switching}} = \frac{E_{\text{cycle}}}{T} = C_0 V_{DD}^2 f_{CLK}$$
• Average power consumed by a node with partial activity (only a fraction $\alpha$ of the periods has a transition):
$$P_{\text{switching}} = \alpha\, C_0 V_{DD}^2 f_{CLK}$$
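As a minimal sketch of the two expressions above, the following Python function evaluates the switching power for a given activity factor. The function name and the sample values (10 pF, 3.3 V, 50 MHz) are illustrative assumptions, not figures from the slides.

```python
def switching_power(c0_farads, vdd_volts, f_clk_hz, alpha=1.0):
    """Average switching power alpha * C0 * VDD^2 * f_CLK, in watts."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * c0_farads * vdd_volts ** 2 * f_clk_hz


# Hypothetical node: 10 pF switched at 50 MHz from a 3.3 V supply.
full_activity = switching_power(10e-12, 3.3, 50e6)
quarter_activity = switching_power(10e-12, 3.3, 50e6, alpha=0.25)
# Halving VDD cuts the power by four because of the quadratic dependence.
half_supply = switching_power(10e-12, 1.65, 50e6)
```

The quadratic dependence on VDD is the reason the later slides concentrate on architectures that tolerate a lower supply voltage.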
15
Power Model
• Power dissipation in logic blocks consists of both dynamic (switching) and static (standby) components
16
Power Model
• Memory power is due
primarily to row/column
decoders and bit and
word line switching
activity
• Consider the power
dissipated when the
bitlines are switched by
approximately VDD
during write cycles
17
Chip Composition (Future)
• Low-power digital SOC designs of the future will be 90-95% memory and 5-10% logic, including overhead
• Future chips may be dominated by memory due to power and resource constraints
18
Three Factors Affecting Energy
– Reducing waste by hardware simplification: redundant hardware extraction, locality of reference, demand-driven / data-driven computation, application-specific processing, preservation of data correlations
– All-in-one approach (SOC): I/O pin and buffer reduction
– Voltage-reducible hardware
– 2-D pipelining (systolic arrays)
– Parallel processing
19
Low-Power Design Techniques…
• Voltage and process scaling
• Design methodologies
– Power-aware design flows and tools, trade area for
lower power
• Architecture Design
• Power down techniques
– Clock gating, dynamic power management
• Dynamic voltage scaling based on workload
• Power conscious RT/ logic synthesis
• Better cell library design and resizing
methods
– Cap. reduction, threshold control, transistor layout
20
SoC Design Flow
21
Power Analysis
• Fast and accurate analysis in the design
process
– Power budgeting
– Knowledge-based architectural and
implementation decisions
– Package selection
– Power hungry module identification
• Detailed and comprehensive analysis at the later stages
– Satisfaction of power budget and constraints
– Hot spots
22
Power Savings
23
Estimation Expectations
24
System Level Power Optimization
• Algorithm selection / algorithm
transformation
• Identification of hot spots
• Low Power data encoding
• Quality of Service vs. Power
• Low Power Memory mapping
• Resource Sharing / Allocation
25
Flow
• C/C++ Compilation
• Program Execution
• Building design representation
• Loading profiling data
• Setting constraints
• Power estimation
• Identification of Hot Spots
26
IBM’s PowerPC
• Optimum supply voltage through hardware parallelism, pipelining and parallel instruction execution
– Five instructions in parallel (IU, FPU, BPU, LSU, SRU), RISC
– The FPU is pipelined, so a multiply-add instruction can be issued every clock cycle
– Low-power 3.3-volt design
– The 603e provides four software-controllable power-saving modes
• Copper processor with SOI
• IBM's Blue Logic ASIC: the new design reduces power by a factor of 10
27
Silicon-on-Insulator
• How does SOI reduce capacitance?
– Junction capacitance is eliminated: an insulating layer (similar to glass) is placed between the impurities and the silicon substrate
– High performance, low power, low soft-error rate
28
Why Copper Processor?
• Motivation: Aluminum resists the flow
of electricity as wires are made thinner
and narrower.
• Performance: 40% speed-up
• Cost: 30% less expensive
• Power: Less power from batteries
• Chip Size: 60% smaller than Aluminum
chip
29
Factors Influencing Ceff
• Circuit function
• Circuit technology
• Input probabilities
• Circuit topology
30
Some Basic Definitions
• Signal probability of a signal g(t) is given by
$$P(g) = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} g(t)\,dt$$
• Signal activity of a logic signal g(t) is given by
$$A(g) = \lim_{T \to \infty} \frac{n_g(T)}{T}$$
where $n_g(T)$ is the number of transitions of g(t) in the time interval between –T/2 and T/2.
31
Factors Influencing Ceff:
Circuit Function
• Assume that there are M mutually independent signals g1, g2, ..., gM, each having a signal probability Pi and a signal activity Ai, for 1 ≤ i ≤ M.
• For static CMOS, the signal probability at the output of a gate is determined according to the probability of 1s (or 0s) in the logic description of the gate:
– Inverter: input P1, output 1-P1
– AND: inputs P1 and P2, output P1P2
– OR: inputs P1 and P2, output 1-(1-P1)(1-P2)
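The gate rules above compose directly when the inputs are independent. The following is a small sketch under that assumption; the function names and the example function f = (a AND b) OR (NOT c) are hypothetical and chosen only for illustration.

```python
def p_not(p1):
    """Output probability of an inverter: 1 - P1."""
    return 1.0 - p1


def p_and(p1, p2):
    """Output probability of an AND gate with independent inputs: P1 * P2."""
    return p1 * p2


def p_or(p1, p2):
    """Output probability of an OR gate with independent inputs."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)


# f = (a AND b) OR (NOT c), with three distinct, independent inputs
# that are each 1 half of the time.
p_a = p_b = p_c = 0.5
p_f = p_or(p_and(p_a, p_b), p_not(p_c))
```

The composition is only valid while the signals feeding each gate remain independent; reconvergent fan-out breaks that assumption.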
32
Factors Influencing Ceff:
Circuit Function (Static CMOS)
• Transistors connected to the
same input are turning on
and off simultaneously when
the input changes
• CL of a static CMOS gate is charged to VDD any time a 0→1 transition at the output node is required.
• CL of a static CMOS gate is discharged to ground any time a 1→0 transition at the output node is required.
NOR Gate
33
Factors Influencing Ceff:
Circuit Function (Static CMOS)
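• State transition diagram of the NOR gate
$$\alpha = (1 - p_Y)\,p_Y + p_Y\,(1 - p_Y) = \frac{3}{8}$$
(for a two-input NOR gate with independent, equiprobable inputs, $p_Y = 1/4$)
[Figure: state transition diagram of the output Y, with edges labelled $p_{Y'}$]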
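The 3/8 figure for the NOR gate can be checked numerically. This sketch assumes a two-input gate, independent inputs that are 1 with probability 0.5, and statistically independent consecutive cycles; the helper names are illustrative.

```python
from itertools import product


def nor(a, b):
    """Two-input NOR."""
    return 0 if (a or b) else 1


def output_probability(gate, input_probs):
    """Probability that the gate output is 1, for independent inputs."""
    total = 0.0
    for bits in product((0, 1), repeat=len(input_probs)):
        weight = 1.0
        for bit, p in zip(bits, input_probs):
            weight *= p if bit else 1.0 - p
        if gate(*bits):
            total += weight
    return total


def transition_probability(p_y):
    """Probability of a 0->1 or 1->0 output change between two cycles."""
    return (1.0 - p_y) * p_y + p_y * (1.0 - p_y)


p_y = output_probability(nor, [0.5, 0.5])   # only the input 00 gives Y = 1
alpha = transition_probability(p_y)
```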
34
Factors Influencing Ceff:
Circuit Function (Static CMOS)
• State transition diagram of the NOR gate
$$\alpha = p_{Y'}\,p_Y + p_Y\,p_{Y'} = \frac{1}{2}$$
35
Factors Influencing Ceff:
Input Probabilities (Static CMOS)
• Signal activity calculation: Boolean difference
$$\frac{\partial f}{\partial x_i} = f\big|_{x_i = 1} \oplus f\big|_{x_i = 0}$$
It signifies the condition under which output f is sensitized to input $x_i$.
If the primary inputs to function f are not spatially correlated, the signal activity at f is
$$A_f = \sum_{i=1}^{N} P\!\left(\frac{\partial f}{\partial x_i}\right) A_{x_i}$$
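A brute-force sketch of the activity formula above is given below. It enumerates the other inputs to find the probability that f is sensitized to each input, assuming spatially uncorrelated inputs as the slide does. The function names and the example numbers are assumptions made for illustration.

```python
from itertools import product


def boolean_difference_probability(f, probs, i):
    """P(df/dx_i): probability that toggling input i changes f."""
    others = [k for k in range(len(probs)) if k != i]
    total = 0.0
    for bits in product((0, 1), repeat=len(others)):
        x = [0] * len(probs)
        weight = 1.0
        for k, bit in zip(others, bits):
            x[k] = bit
            weight *= probs[k] if bit else 1.0 - probs[k]
        x[i] = 1
        f_high = f(*x)
        x[i] = 0
        f_low = f(*x)
        if f_high != f_low:          # f|x_i=1 XOR f|x_i=0
            total += weight
    return total


def signal_activity(f, probs, activities):
    """A_f = sum over i of P(df/dx_i) * A_xi."""
    return sum(
        boolean_difference_probability(f, probs, i) * activities[i]
        for i in range(len(probs))
    )


def and2(a, b):
    return a & b


def xor2(a, b):
    return a ^ b


# Hypothetical inputs: both are 1 half of the time, with activities 0.2 and 0.4.
activity_and = signal_activity(and2, [0.5, 0.5], [0.2, 0.4])
activity_xor = signal_activity(xor2, [0.5, 0.5], [0.2, 0.4])
```

For the AND gate each input is sensitized only when the other input is 1, so half of each input activity reaches the output; for XOR every input change propagates.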
36
Power Reduction Methods:
Architecture Driven Supply
Voltage Scaling
• Strategy:
1. Modify the architecture of the system so as to
make it faster.
2. Reduce VDD so as to restore the original speed.
Power consumption has decreased.
• The most common architectural changes rely on the
exploitation of parallelization and pipelining.
• Drawback:
The additional circuitry required to compensate the
speed degradation may dominate, and the power
consumption may increase.
• Consequence:
Parallelism and pipelining do not always pay off.
37
Parallel Architectures
Ppar=0.36Pref
38
Parallel-Pipelined Architectures
Ppar=0.2Pref
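The two ratios quoted above follow from P proportional to C·VDD²·f. The sketch below reproduces them with capacitance, voltage and frequency ratios that are assumptions, not values given on the slides: a 5 V reference, 2.15× capacitance at 2.9 V and half frequency for the parallel case, and 2.5× capacitance at 2.0 V and half frequency for the parallel-pipelined case.

```python
def relative_power(cap_ratio, vdd_ratio, freq_ratio):
    """P / P_ref for given ratios of switched capacitance, supply and clock."""
    return cap_ratio * vdd_ratio ** 2 * freq_ratio


V_REF = 5.0  # assumed reference supply voltage

# Assumed scaling factors (not stated on the slides).
parallel = relative_power(cap_ratio=2.15, vdd_ratio=2.9 / V_REF, freq_ratio=0.5)
parallel_pipelined = relative_power(cap_ratio=2.5, vdd_ratio=2.0 / V_REF, freq_ratio=0.5)
```

The extra capacitance of the duplicated hardware is more than offset by the quadratic saving from the lower supply voltage, which is the point of the strategy.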
39
Loop unrolling
• The technique of loop unrolling replicates the body of a loop some number of times (the unrolling factor u) and then iterates by step u instead of step 1. This transformation reduces the loop overhead, increases the instruction parallelism and improves register, data cache or TLB locality.
Original loop:
for i = 2 to N - 1
  A(i) = A(i) + A(i - 1) * A(i + 1)
Unrolled by a factor of 2:
for i = 2 to N - 2 step 2
  A(i) = A(i) + A(i - 1) * A(i + 1)
  A(i + 1) = A(i + 1) + A(i) * A(i + 2)
(When N - 2 is odd, one leftover iteration for i = N - 1 must follow the unrolled loop.)
• Loop overhead is cut in half because two original iterations are performed in each iteration of the unrolled loop.
• If array elements are assigned to registers, register locality is improved because A(i) and A(i + 1) are used twice in the loop body.
• Instruction parallelism is increased because the second assignment can be performed while the results of the first are being stored and the loop variables are being updated.
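The transformation can be checked by running both versions on the same data. The sketch below assumes the missing operator in the loop body is a multiplication, and it keeps index 0 unused so that the indices match the 1-based notation of the slide; the function names are illustrative.

```python
def rolled(values):
    """Original loop: for i = 2 to N-1."""
    a = list(values)              # a[0] is unused padding
    n = len(a) - 1
    for i in range(2, n):
        a[i] = a[i] + a[i - 1] * a[i + 1]
    return a


def unrolled_by_two(values):
    """Loop unrolled with u = 2, plus a cleanup iteration when needed."""
    a = list(values)
    n = len(a) - 1
    for i in range(2, n - 1, 2):  # i = 2, 4, ... while i <= N-2
        a[i] = a[i] + a[i - 1] * a[i + 1]
        a[i + 1] = a[i + 1] + a[i] * a[i + 2]
    if n >= 3 and (n - 2) % 2 == 1:
        i = n - 1                 # leftover iteration for an odd trip count
        a[i] = a[i] + a[i - 1] * a[i + 1]
    return a


sample = [0, 1, 2, 3, 4, 5, 6, 7]  # N = 7, odd trip count
same_result = rolled(sample) == unrolled_by_two(sample)
```

In Python the unrolled form gives no speed benefit by itself; the point is only that the two loops compute identical results, which is what a compiler or designer must preserve.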
40
Loop Unrolling (IIR filter example)
Two output samples are computed in parallel based on two input samples.
$$Y_{n-1} = X_{n-1} + A \cdot Y_{n-2}$$
$$Y_n = X_n + A \cdot Y_{n-1} = X_n + A \cdot (X_{n-1} + A \cdot Y_{n-2})$$
Neither the capacitance switched nor the voltage is altered. However, loop unrolling enables several other transformations (distributivity, constant propagation, and pipelining). After distributivity and constant propagation,
$$Y_{n-1} = X_{n-1} + A \cdot Y_{n-2}$$
$$Y_n = X_n + A \cdot X_{n-1} + A^2 \cdot Y_{n-2}$$
The transformation yields a critical path of 3, so the supply voltage can be lowered.
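A short sketch can confirm that the unrolled, distributed form produces the same outputs as the original first-order recursion. The function names and the zero initial state are assumptions; integers are used so the comparison is exact.

```python
def iir_direct(x, a, y_init=0):
    """Y[n] = X[n] + A * Y[n-1], one sample per iteration."""
    outputs = []
    previous = y_init
    for sample in x:
        previous = sample + a * previous
        outputs.append(previous)
    return outputs


def iir_unrolled(x, a, y_init=0):
    """Two samples per iteration, after distributivity and constant propagation."""
    a_squared = a * a             # A^2 is precomputed once
    outputs = []
    y_nm2 = y_init
    for n in range(1, len(x), 2):
        y_nm1 = x[n - 1] + a * y_nm2
        y_n = x[n] + a * x[n - 1] + a_squared * y_nm2
        outputs.extend([y_nm1, y_n])
        y_nm2 = y_n
    if len(x) % 2:                # odd-length input: one remaining sample
        outputs.append(x[-1] + a * y_nm2)
    return outputs


samples = [1, 2, 3, 4, 5]
matches = iir_direct(samples, 2) == iir_unrolled(samples, 2)
```

In the unrolled form both outputs depend only on Y[n-2], so the two computations no longer wait on each other, which is where the shorter critical path comes from.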
41
Loop Unrolling for Low Power
44
Encoding
• Bus-invert (BI) code
– Appropriate for random data patterns
– Redundant code (1 extra bus line)
– Reduce avg. transitions up to 25%
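Data word | Bus value | inv
0000 | 0000 | 0
1010 | 1010 | 0
0100 | 1011 | 1
1111 | 1111 | 0
1010 | 1010 | 0
0100 | 1011 | 1
1101 | 0010 | 1
0011 | 0011 | 0
[Figure: bus-invert encoder and decoder block diagram with a majority voter; signals X, D, Z and inv]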
R. J. Fletcher, “Integrated circuit having outputs configured for reduced state changes,” May 1987, U.S. Patent 4667337.
M. R. Stan and W. P. Burleson, “Bus-invert coding for low-power I/O,” IEEE Tr. on VLSI Systems, Mar. 1995, pp. 49-58.
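The bus-invert rule can be sketched in a few lines. This version counts the invert line in the Hamming distance and inverts when more than half of the data lines would toggle; the initial bus state of all zeros and the function names are assumptions made for this example.

```python
def bus_invert_encode(words, width):
    """Return a list of (bus_value, inv) pairs for the given data words."""
    mask = (1 << width) - 1
    bus, inv = 0, 0               # assumed initial state of the bus
    encoded = []
    for word in words:
        # Toggles if the word were sent unmodified (inv line returns to 0).
        distance = bin((bus ^ word) & mask).count("1") + inv
        if distance > width / 2:
            bus, inv = ~word & mask, 1
        else:
            bus, inv = word & mask, 0
        encoded.append((bus, inv))
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the data words from (bus_value, inv) pairs."""
    mask = (1 << width) - 1
    return [(~bus & mask) if inv else bus for bus, inv in encoded]


data = [0b0000, 0b1010, 0b0100, 0b1111, 0b1010, 0b0100, 0b1101, 0b0011]
coded = bus_invert_encode(data, width=4)
decoded = bus_invert_decode(coded, width=4)
```

With a 4-bit bus this reproduces the eight-word example on the slide, including the invert flags.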
45
Different Supply Voltages
for Different Units
• Partition the chip into multiple sub-units each
of which is designed to operate at a specific
supply voltage
[Figure: chip partitioned into sub-units labelled SLOW or FAST, each supplied at either 3 V or 5 V]
46
COFDM Modem Block Diagram for Eureka 147/KDMB
[Figure: modem block diagram. Blocks shown: Serial Data, Reed Solomon Encoder, Convolutional Interleaver, Scrambler, Convolutional Encoder, Time Interleaver, COFDM Modulator (FFT), Channel (Gaussian, Ricean, Rayleigh), COFDM Modulator (IFFT, Phase/Timing Lock, Frame Sync), Time Deinterleaver, Viterbi Decoder, Scrambler, Convolutional Deinterleaver, Reed Solomon Decoder, Serial Data, BERT (Bit-Error-Ratio Tester)]
47
Status of DMB Modem Chips in Korea and Abroad
Company | Product and key features
TI (USA) | DRE200: uses a general-purpose DSP for COFDM, audio and FEC decoding; 160 mW
ATMEL (Germany) | U2739M: COFDM demodulation on an Oak DSP, hardware audio/FEC decoding; 860 mW
Panasonic (Japan) | MN66720UC: SDSP for COFDM, MDSP for audio
Frontier Silicon (UK) | Chorus FS1010: special DSP for COFDM/audio; 100 mW
48
Status of Low-Power Technology Development
Developer | Application product | Features
IBM Austin, Low Power Computing Research | DPM (PowerPC 405LP), portable processor | Linux power management (90% power reduction)
DoD DARPA | Power Aware Communication | Power management, scheduling, OS
Philips, STMicroelectronics, Atmel | PCF50606: single-chip power management unit (for smart phones and wireless PDAs) | Programmed power management (70% power reduction)
Atrenta | SpyGlass CAD tool | Generates gated-clock structures in RTL-level HDL and SystemC
49
VADA Lab's Low-Power IPs
[Figure: block diagrams of the IPs listed below]
• Maximizing Memory Data Reuse for Lower Power Motion Estimation: 33% power reduction, 52 MHz, 2.1× area increase (SCI paper)
• Low-Power Equalizer for xDSL: 21% power reduction, SNR = 40 dB
• Next-generation low-power security processor chip for smart cards: ECC, Rijndael, DES, SHA
• Fast and Low Power Viterbi Search Engine using Inverse Hidden Markov Model: 68% power reduction, 71% speed improvement, 1.9× area increase (Samsung Humantech best paper award, 2002)
• Low-power design and co-design of a Double Dwell Searcher for IS-95 based CDMA: 67% power reduction, 41% area reduction
• OFDM-based high-speed wireless LAN platform: 20.7 MHz, 237,000 gates
• Highly flexible design of an OFDM transceiver for DVB-T (in development)
50
Other Examples of Low-Power Design Techniques
• Use of alternative number systems
• Scheduling/ordering
• Algorithm substitution
• Signal and statistical analysis
51
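Low-Power Technique by Number System Conversion – I.1
• Use of the Logarithmic Number System (LNS): a number A is represented by the pair $(S_A, L_A)$
• Log number system
– Applied to the FFT, the largest of the arithmetic modules
– The look-up table is the main variable in the overall size
– A number is split into a sign part and a magnitude part, and the base-2 logarithm of the magnitude is computed
– The converted log value is represented as a binary number whose range is limited to n bits
• LNS operations
– Multiplication: addition
– Addition and subtraction: addition, subtraction and a look-up table
• Accuracy
– No BER degradation when the fractional part has 2 or more bits
• Power consumption
– Experiments show up to about 60% lower power than a conventional butterfly FFT
– 7.8 mW → 3.1 mW
$$S_A = \begin{cases} 0, & \text{if } A \ge 0 \\ 1, & \text{if } A < 0 \end{cases}$$
$$L_A = \begin{cases} \log_2 |A|, & \text{if } |A| \ge \delta \\ \log_2 \delta, & \text{if } |A| < \delta \end{cases}$$
$$A = (1 - 2 S_A)\, 2^{L_A}$$
$$n = I + b + 1, \qquad \hat{L}_A = (l_{n-1} \ldots l_b\, l_{b-1} \ldots l_0)$$
$$\hat{L}_A = \begin{cases} \lfloor 2^b L_A + 0.5 \rfloor / 2^b, & \text{if } L_A \ge 0 \\ \lceil 2^b L_A - 0.5 \rceil / 2^b, & \text{if } L_A < 0 \end{cases}$$
Here $\delta$ denotes the small threshold below which the magnitude is clamped; the symbol itself is not legible in the source.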
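The sign/log representation and the "multiplication becomes addition" property can be sketched as follows. The rounding rule, the clamp value `min_magnitude` and all function names are assumptions made for this example, since several symbols in the slide equations did not survive extraction.

```python
import math


def lns_encode(value, frac_bits, min_magnitude=2.0 ** -16):
    """Return (sign, log2 magnitude) with the log rounded to frac_bits bits."""
    sign = 1 if value < 0 else 0
    magnitude = max(abs(value), min_magnitude)   # avoid log2(0)
    log_value = math.log2(magnitude)
    scale = 1 << frac_bits
    if log_value >= 0:
        quantized = math.floor(scale * log_value + 0.5) / scale
    else:
        quantized = math.ceil(scale * log_value - 0.5) / scale
    return sign, quantized


def lns_decode(sign, log_value):
    """A = (1 - 2*S_A) * 2**L_A."""
    return (1 - 2 * sign) * 2.0 ** log_value


def lns_multiply(x, y):
    """Multiply two LNS numbers: XOR the signs, add the logarithms."""
    return x[0] ^ y[0], x[1] + y[1]


product = lns_decode(*lns_multiply(lns_encode(4, 2), lns_encode(-8, 2)))
three = lns_encode(3, 2)   # log2(3) is rounded to the nearest multiple of 1/4
```

Addition and subtraction are the expensive operations in this representation, which is why the slide lists a look-up table for them.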
52
Low-Power Technique by Number System Conversion – I.2
53
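Low-Power Technique by Operation Reordering – I.1
• Coefficient ordering
– The order of operations is changed to reduce the power consumption of a radix-4 pipelined low-power FFT processor
– Switching activity on the fixed coefficient inputs of the complex multiplier is reduced
• New commutator architecture
– Uses additional dual-port RAM
– Power reduction of 23% and 9% for 16-point and 64-point FFTs, respectively
• The benefit decreases for larger FFTs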
54
Low-Power Technique by Operation Reordering – I.2
55
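Low Power by Algorithm Substitution – I.1
• Applied to a 64-point FFT
– The 64-point FFT equation is rewritten through an algorithmic transformation
– It is decomposed into a two-dimensional structure of two 8-point FFT stages
• Complex multiplication is implemented by shift-and-add.
• Power consumption
– Average dynamic power of 41 mW in an in-house 0.25 µm BiCMOS technology at 20 MHz with a 1.8 V supply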
56
Low Power by Algorithm Substitution – I.2
$$A(r) = \sum_{k=0}^{N-1} B(k)\, W_N^{rk}$$
$$A(s + 8t) = \sum_{l=0}^{7} \left[ W_{64}^{sl} \sum_{m=0}^{7} B(l + 8m)\, W_8^{sm} \right] W_8^{lt}$$
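The 8 × 8 decomposition can be verified against a direct 64-point DFT. The sketch below assumes the usual convention $W_N = e^{-j 2\pi / N}$, which the slide does not state, and uses plain sums rather than an optimized FFT; the function names are illustrative.

```python
import cmath


def twiddle(n, exponent):
    """W_n ** exponent with W_n = exp(-2j*pi/n) (assumed convention)."""
    return cmath.exp(-2j * cmath.pi * exponent / n)


def dft_direct(b):
    """A(r) = sum over k of B(k) * W_N^(r*k)."""
    n = len(b)
    return [sum(b[k] * twiddle(n, r * k) for k in range(n)) for r in range(n)]


def dft64_as_8x8(b):
    """64-point DFT written as two stages of 8-point DFTs."""
    if len(b) != 64:
        raise ValueError("expected 64 samples")
    # First stage: an 8-point DFT over m for every l, giving inner[l][s].
    inner = [
        [sum(b[l + 8 * m] * twiddle(8, s * m) for m in range(8)) for s in range(8)]
        for l in range(8)
    ]
    result = [0j] * 64
    for s in range(8):
        for t in range(8):
            # Twiddle by W_64^(s*l), then a second 8-point DFT over l.
            result[s + 8 * t] = sum(
                twiddle(64, s * l) * inner[l][s] * twiddle(8, l * t)
                for l in range(8)
            )
    return result


signal = [complex(k % 5, (3 * k) % 7) for k in range(64)]
reference = dft_direct(signal)
decomposed = dft64_as_8x8(signal)
max_error = max(abs(u - v) for u, v in zip(reference, decomposed))
```

Only the $W_{64}^{sl}$ factors require general complex multiplications; the 8-point stages use the small set of $W_8$ values, which suits a shift-and-add implementation.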
57
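Low Power by Signal and Statistical Analysis – I.1
• Share of power consumption
– About half of the total power is consumed in the complex multipliers.
• Analysis of the butterfly multiplications
– For coefficient multiplication
• In a generic stage, 0.25*M+3 of the M coefficients are 1
– Clock gating can be used for the cosine and sine values (1, 0)
• Frequency division duplex modems
– 4096 DMT with the ETSI-standard 4.3125 kHz tone spacing
• 41% of the carriers are upstream, 26% are downstream, and the remaining 30% are unused.
– 1024 DMT with the ETSI-standard 4.3125 kHz tone spacing
• The corresponding figures are 13%, 68% and 18%.
– 59~87% of the IFFT (up) inputs are zero and 31~74% of the FFT (down) inputs are zero.
– Clock gating is possible.
– It can be applied at the initial input stage.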
58
Clock Network Power Management
• The clock network accounts for 50% of the total power
• FIR (massively pipelined circuit):
– Video processing: edge detection
– Voice processing (data transmission like xDSL)
– Telephony: 50% (70%/30%) idle, since the two parties do not talk at the same time
• With every clock cycle, data are loaded into the working register banks, even if there are no data changes.
59
Wireless Interface Power-Saving
Ronny Krashinsky and Hari Balakrishnan
MIT Laboratory for Computer Science
• Sleep to save energy, periodically wake to check for pending
data
– PSM protocol: when to sleep and when to wake?
• A PSM-static protocol has a regular sleep/wake cycle
[Figure: power versus time, measured on an Enterasys Networks RoamAbout 802.11 NIC. PSM off: 750 mW. PSM on: 50 mW, time axis marked at 100 ms]
60
Ronny Krashinsky and
Hari Balakrishnan, MIT
[Figure: message timing (SYN, ACK, DATA) between mobile device, access point and server, with PSM off and with PSM on; with PSM on the device alternates between AWAKE and SLEEP along a time axis marked 0 ms, 100 ms, 200 ms]
61
The PSM-static Dilemma
Compromise between performance and energy
If PSM-static is too coarse-grained, it harms
performance by delaying network data
If PSM-static is too fine-grained, it wastes
energy by waking unnecessarily
Solution: dynamically adapt to network activity to
maintain performance while minimizing energy
– Stay awake to avoid delaying very fast RTTs
– Back off (listen to fewer beacons) while idle
62
Why Hardware for Motion Estimation?
• Most Computationally demanding part
of Video Encoding
• Example: CCIR 601 format
• 720 by 576 pixel
• 16 by 16 macro block (n = 16)
• 32 by 32 search area (p = 8)
• 25 Hz Frame rate (f frame = 25)
• 9 Giga Operations/Sec is needed for Full
Search Block Matching Algorithm.
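The 9 giga-operations figure can be reproduced from the parameters listed above. The sketch assumes (2p + 1)² candidate positions per macro block and three operations per pixel comparison (subtract, absolute value, accumulate); both are common conventions but are not stated on the slide.

```python
def full_search_ops_per_second(width, height, n, p, frame_rate, ops_per_pixel=3):
    """Operation count of full-search block matching per second."""
    blocks_per_frame = (width // n) * (height // n)
    candidates_per_block = (2 * p + 1) ** 2      # displacements from -p to +p
    ops_per_candidate = n * n * ops_per_pixel
    return blocks_per_frame * candidates_per_block * ops_per_candidate * frame_rate


# CCIR 601 parameters from the slide.
ops = full_search_ops_per_second(width=720, height=576, n=16, p=8, frame_rate=25)
giga_ops = ops / 1e9
```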
63
Why Reconfiguration in Motion Estimation?
• Adjusting the search area at frame rate according to the changing characteristics of video sequences
• Reducing power consumption by avoiding unnecessary computation
[Figure: Motion Vector Distributions]
64
Architecture for Motion Estimation
From P. Pirsch et al, VLSI Architectures for Video
Compression, Proc. Of IEEE, 1995
65
DIGLOG multiplier
$$C_{mult}(n) = 253\,n^2, \quad C_{add}(n) = 214\,n, \quad \text{where } n = \text{word length in bits}$$
$$A = 2^j + A_R, \quad B = 2^k + B_R$$
$$A \cdot B = (2^j + A_R)(2^k + B_R) = 2^{j+k} + 2^j B_R + 2^k A_R + A_R B_R$$
Iteration | Worst-case error | Prob. of error < 1%
1st | -25% | 10%
2nd | -6% | 70%
3rd | -1.6% | 99.8%
With an 8 by 8 multiplier, the exact result can be obtained in a maximum of seven iteration steps (worst case).
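The iteration behind the table can be sketched as follows: each step keeps the shift-only terms of the expansion and defers the residual product $A_R \cdot B_R$ to the next step. The function name and the way iterations are counted are assumptions; the slide may count steps differently.

```python
def diglog_multiply(a, b, iterations):
    """Approximate a * b for non-negative integers using shifts and adds only."""
    result = 0
    for _ in range(iterations):
        if a == 0 or b == 0:
            break                          # residual product is zero: exact
        j = a.bit_length() - 1             # position of the leading one of a
        k = b.bit_length() - 1             # position of the leading one of b
        a_r = a - (1 << j)
        b_r = b - (1 << k)
        # 2^(j+k) + 2^j * B_R + 2^k * A_R; the term A_R * B_R is deferred.
        result += (1 << (j + k)) + (b_r << j) + (a_r << k)
        a, b = a_r, b_r
    return result


one_step = diglog_multiply(200, 100, iterations=1)
three_steps = diglog_multiply(200, 100, iterations=3)
exact = diglog_multiply(200, 100, iterations=8)
```

Every step strips the leading one from both operands, so the approximation never overshoots and the error shrinks quickly, consistent with the -25%, -6%, -1.6% worst-case column.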
66
Low Power CDMA Searcher
RTL-level low-power design and implementation of the Searcher Engine in an MSM (Mobile Station Modem) chip for CDMA handsets. Operating frequency: 12.5 MHz.
Low-power design based on data flow graph rescheduling, precomputation, strength reduction and a synchronous accumulator reduced area and power by up to 67.68% and 41.35%, respectively.
• San Kim and Jun-Dong Cho, “Low Power CDMA Searcher”, CAD and VLSI Workshop, May 1999.
• Inki Hwang, San Kim and Jun-Dong Cho, “CDMA Searcher Co-Design”, ASIC Workshop, Sep. 1999.
67
CDMA Searcher
Figure 1. Detailed block diagram
68
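Searcher
• A device that takes the pilot channel transmitted by the base station as input and acquires initial synchronization in an IS-95 based DS/CDMA system
• Types of searcher
– Correlator-based schemes and matched-filter-based schemes
– This design uses serial search with a correlator and the Double Dwell scheme.
• Local (handset) PN code generator
– Built from 15 registers
– Generator polynomial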
69
Operation Flow
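1 The pilot channel transmitted by the base station is despread with the PN sequence generated in the handset.
2 The despread result is accumulated Nc times (the coherent accumulation count) and then passed through the energy calculation (squaring).
3 The energy value is compared with the first threshold; if it exceeds the threshold, noncoherent accumulation (Nn) is performed in the following stage.
4 Otherwise, the PN sequence is advanced by one chip and the steps above are repeated on the incoming signal.
5 The result of the noncoherent accumulation is compared with the second threshold.
6 If it exceeds the threshold, the search ends; otherwise the PN sequence is advanced by one chip and the steps above are repeated.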
70
Pre-computation
◈ A comparator example: Srinivas Devadas, 1994
◈ Precomputation for external idleness: M. Alidina, 1994
71
Low Power Comparator
72
Three Input ALU
( Ovadia Bat-Sheva, 1998 )
[Figure: the Two ALUs Structure and the Three Input ALU Structure, each with blocks MUL0, MUL1, P0, P1 and accumulators acc0, acc1; the first uses an ALU plus an ALU/ASU, the second a single 3IALU]
The three input ALU consumes much less power than an ALU
and an ASU
A drawback of using a 3I-ALU is the added complexity in
calculating the carry and overflow.
73
Applying Carry Save Adder and Pre-computation
[Figure: searcher data flow before and after the change. RX I/Q and TX I/Q samples are combined by XOR, then pass through the coherent accumulation stage (adders, replaced by CSAs in the second version), the energy calculation stage (squaring), max-value selection, comparison with θ1, the noncoherent accumulation stage, and comparison with θ2]
74
Rescheduled Data Flow Graph
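[Figure: rescheduled data flow graph. XOR of RX and TX I/Q samples, CSA-based coherent accumulation, absolute value, max-value selection, comparison with θ1, squaring (energy calculation), noncoherent accumulation, comparison with θ2]
• Coherent accumulation stage
– Uses a Carry Save Adder (or 3-input ALU)
• Threshold comparison
– Applies pre-computation
• Energy calculation stage
– Reorders the data flow to reduce the number of multiplications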
75
Image Compression
76
Link Adaptation Technique
Adaptive Modulation and Coding
[Figure: throughput versus C/I for QPSK R=1/4, 8PSK R=1/4, 16QAM R=1/4 and 16QAM R=1/2, showing the hull of AMC and the modulation/coding transition 8PSK->16QAM]
77