1
Technologies for Reducing Power
Trevor Mudge
Bredt Family Professor of Engineering
Computer Science and Engineering
The University of Michigan, Ann Arbor
SAMOS X, July 18th 2010
Technologies for Reducing Power
2
 Near threshold operation
 3D die stacking
 Replacing DRAM with Flash memory
ACAL – University of Michigan
2
Background
3
Moore's Law
 density of components doubles without increase in cost every 2 years
 ignoring NRE costs, etc.
 65➞45➞32➞22➞16➞11➞8 nm (= F nanometers)
 Intel has 32nm in production
Energy per clock cycle (see the sketch after this slide)
 E = C·Vdd² + Ileak·Vdd / f
 Vdd is the supply voltage, f is the frequency, C is the switched capacitance,
and Ileak is the leakage current
 Vth is the "threshold" voltage at which the gate switches
 e.g. Vth ≈ 300 mV and Vdd ≈ 1V
ACAL – University of Michigan
3
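To make the formula concrete, here is a minimal sketch that evaluates the energy-per-cycle expression above. The capacitance, leakage current, and frequency values are illustrative placeholders, not figures from the talk.

```python
# Sketch: energy per clock cycle, E = C*Vdd^2 + Ileak*Vdd/f.
# The device values below are illustrative placeholders, not measurements.

def energy_per_cycle(c_switched, vdd, i_leak, freq_hz):
    """Dynamic switching energy plus leakage energy integrated over one cycle."""
    dynamic = c_switched * vdd ** 2
    leakage = i_leak * vdd / freq_hz
    return dynamic + leakage

if __name__ == "__main__":
    C = 1e-9       # 1 nF of switched capacitance per cycle (placeholder)
    I_LEAK = 1e-3  # 1 mA of leakage current (placeholder)
    for vdd, f in [(1.0, 3e9), (0.5, 300e6)]:
        e = energy_per_cycle(C, vdd, I_LEAK, f)
        print(f"Vdd = {vdd:.1f} V, f = {f/1e6:.0f} MHz -> {e*1e12:.1f} pJ/cycle")
```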
The Good Old Days—Dennard Scaling
4
If s is the linear dimension scaling factor (s ≈ √2)
 Device dimension tox, L, W    1/s
 Voltage V                     1/s
 Current I                     1/s
 Capacitance εA/t              1/s
 Delay VC/I                    1/s
 Power VI                      1/s²
 Power density VI/A            1
ACAL – University of Michigan
4
Recent Trends
Circuit supply
voltages are no
longer scaling…
5
Therefore, power doesn’t
decrease at the same rate that
transistor count is increasing –
energy density is skyrocketing!
Stagnant
Shrinking
Dynamic dominates
CVdd2 IleakVdd
U

A
Af
CVdd2 IleakVdd
U

A
Af

A = gate area  scaling 1/s2
C = capacitance  scaling < 1/s
The emerging dilemma:

More and more gates can fit on a die,
but cooling constraints are restricting their use
ACAL – University of Michigan
5
Impact on Dennard scaling
6
If s is the linear dimension scaling factor ≈ √2
 Device dimension tox, L, W    1/s
 Voltage V                     1/s ➞ 1
 Current I                     1/s
 Capacitance εA/t              1/s
 Delay VC/I                    1/s ➞ 1
 Power VI                      1/s² ➞ 1/s
 Power density VI/A            1 ➞ s
ACAL – University of Michigan
6
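The two tables above can be checked mechanically. The sketch below is a simplification (not from the talk): it applies one classical Dennard step and one leakage-limited step with s = √2 and prints the multiplicative change in each quantity.

```python
import math

# Sketch: one generation of classical Dennard scaling vs. one generation of
# leakage-limited ("post-Dennard") scaling, following the two tables above.
s = math.sqrt(2)  # linear dimension scaling factor per generation

classical = {
    "device dimension": 1 / s,
    "voltage":          1 / s,
    "current":          1 / s,
    "capacitance":      1 / s,
    "delay":            1 / s,
    "power":            1 / s ** 2,
    "power density":    1.0,
}

post_dennard = dict(classical)
post_dennard.update({
    "voltage": 1.0,        # supply no longer scales
    "delay": 1.0,          # so gate delay no longer improves
    "power": 1 / s,        # only the capacitance term shrinks
    "power density": s,    # power per unit area grows every generation
})

for name, table in (("classical Dennard", classical),
                    ("post-Dennard", post_dennard)):
    print(name)
    for quantity, factor in table.items():
        print(f"  {quantity:16s} x{factor:.2f} per generation")
```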
Techniques for Reducing Power
7
 Near threshold operation—Vdd near Vth
 3D die stacking
 Replacing DRAM with Flash memory
ACAL – University of Michigan
7
Today: Super-Vth, High Performance, Power Constrained
8
(Figure: normalized power, energy, and performance (energy per operation and
log delay) vs. supply voltage from 0 through Vth to Vnom; today's super-Vth
designs, e.g. a 3+ GHz Core i7 at roughly 40 mW/MHz, sit near Vnom.)
Energy per operation is the key metric for efficiency. Goal: same
performance, low energy per operation.
ACAL – University of Michigan
8
Sub-Vth
9
(Figure: energy per operation and log delay vs. supply voltage; relative to
super-Vth operation, subthreshold designs are annotated with roughly 16X lower
energy per operation at 500 – 1000X longer delay.)
Operating in the sub-threshold region gives us huge power gains at the
expense of performance. OK for sensors!
ACAL – University of Michigan
9
Near-Threshold Computing (NTC)
(Figure: energy per operation and log delay vs. supply voltage, divided into
sub-Vth, NTC, and super-Vth regions; moving from Vnom into the NTC region is
annotated with ~6-8X lower energy at ~10X longer delay, and pushing on into
sub-Vth with a further ~2-3X energy saving at ~50-100X more delay.)
Near-Threshold Computing (NTC):
• 60-80X power reduction
• 6-8X energy reduction
• Invest a portion of the extra transistors from scaling to overcome barriers
ACAL – University of Michigan
10
Restoring performance
12
 Delay increases by 10x
 Computation requires N operations
 Break it into 10 parallel subtasks of N/10 operations each—execution time restored (sketch below)
 Total energy is still 8X less—operation count unchanged
 Power 80X less
 Predicated on being able to parallelize workloads
 Suitable for a subset of applications—as noted earlier
 Streams of independent tasks—a server
 Data parallel—signal/image processing
Important to have a solution for code that is difficult to
parallelize—single thread performance
ACAL – University of Michigan
12
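A back-of-envelope sketch of the argument on the slide above. The baseline time and energy per operation are placeholders; the per-core and aggregate power figures are straightforward arithmetic on the slide's 10x and 8x numbers.

```python
# Sketch of the parallelism argument: NTC makes each operation ~10x slower but
# ~8x cheaper in energy, so 10 cores working on N/10 operations each restore
# the execution time while keeping the energy saving. Baseline numbers are
# illustrative placeholders.

N = 1_000_000            # operations in the computation
t_op, e_op = 1e-9, 1e-9  # baseline: 1 ns and 1 nJ per operation (placeholders)

base_time = N * t_op
base_energy = N * e_op
base_power = base_energy / base_time

cores = 10
ntc_energy = N * (e_op / 8)                   # 8x less energy per operation
ntc_time = (N / cores) * (t_op * 10)          # 10x slower ops, 10-way parallel
core_power = (ntc_energy / cores) / ntc_time  # power drawn by one NTC core
total_power = ntc_energy / ntc_time           # power drawn by all 10 cores

print(f"execution time  : {ntc_time / base_time:.1f}x baseline (restored)")
print(f"total energy    : {base_energy / ntc_energy:.0f}x less")
print(f"power per core  : {base_power / core_power:.0f}x less")
print(f"aggregate power : {base_power / total_power:.0f}x less")
```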
Interesting consequences: SRAM
(Figure: normalized energy (dynamic and leakage components) and normalized
delay vs. VDD for logic, SRAM, and the total, over roughly 0.2 V to 1.2 V.)
 SRAM has a lower activity rate than logic
 VDD for minimum energy operation (VMIN) is higher
 Logic naturally operates at a lower VMIN than SRAM—and slower
ACAL – University of Michigan
13
NTC—Opportunities and Challenges
 Opportunities:
 New architectures
 Optimize processes to gain back some of the 10X delay
 3D Integration—fewer thermal restrictions
 Challenges:
 Low Voltage Memory
 New SRAM designs
 Robustness analysis at near-threshold
 Variation
 Razor and other in-situ delay monitoring techniques
 Adaptive body biasing
 Performance Loss
 Many-core designs to improve parallelism
 Core boosting to improve single thread performance
ACAL – University of Michigan
14
Proposed Parallel Architecture
15
(Diagram: clusters of cores behind 2nd-level memories. In cluster1 each core
and its cache/SRAM run at (f0, Vdd0, Vth0); in clustern, cores core1 … corek
run at (fcore, Vddcore, Vthcore) and share a cache/SRAM run at
(k*fcore, Vddmem, Vthmem) through level converters.)
1. R. Dreslinski, B. Zhai, T. Mudge, D. Blaauw, and D. Sylvester. An Energy Efficient Parallel Architecture Using Near Threshold Operation. 16th Int. Conf. on Parallel Architectures and Compilation Techniques (PACT), Romania, Sep. 2007, pp. 175-188.
2. B. Zhai, R. Dreslinski, D. Blaauw, T. Mudge, and D. Sylvester. Energy Efficient Near-threshold Chip Multi-processing. Int. Symp. on Low Power Electronics and Design (ISLPED), Aug. 2007, pp. 32-37.
ACAL – University of Michigan
15
Cluster Results
16
230 MHz equivalent performance, Cholesky benchmark
(Figure: normalized power for the single-CPU baseline, the NTC 4-core design,
and the clustered NTC design.)
 Baseline
 Single CPU @ 233 MHz with L1 and L2 caches
 NTC 4-Core
 One core per L1
 53% avg. savings over baseline
 Clustered NTC
 Multiple cores per L1
 3 cores/cluster, 2 clusters
 74% avg. savings over baseline
ACAL – University of Michigan
16
New NTC Architectures
17
(Diagram: a conventional organization with one private L1 per core on a
bus/switched network to the next level memory, versus a clustered organization
in which several cores share each L1.)
 Recall, SRAM is run at a higher VDD than cores with little energy penalty
 Caches operate faster than cores
 Can introduce clustered architectures (see the sketch after this slide)
 Multiple cores share an L1
 L1 operated fast enough to satisfy all core requests in 1 cycle
 Cores see the view of a private single-cycle L1
 Advantages (leading to lower power):
 Clustered sharing
 Less coherence/snoop traffic
 Drawbacks (increased power):
 Core conflicts evicting L1 data (more misses)
 Additional bus/interconnect from cores to L1 (not as tightly coupled)
ACAL – University of Michigan
17
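A minimal sketch of the timing idea behind these clusters. The round-robin arbitration is an assumption for illustration (the slide does not specify the arbiter): an L1 clocked at k times the core frequency has k fast cycles per core cycle, so it can serve one request from each of k cores within every core cycle.

```python
# Sketch: an L1 running at k times the core clock serves k cores round-robin,
# one access per fast cache cycle, so every core's request completes within
# one of its own (slow) cycles. The round-robin arbiter here is an assumption
# for illustration, not a detail taken from the slides.

def l1_schedule(num_cores, core_cycles):
    """Yield (core_cycle, cache_cycle, core_served) for each cache cycle."""
    for slow in range(core_cycles):
        for slot in range(num_cores):      # num_cores cache cycles per core cycle
            fast = slow * num_cores + slot
            yield slow, fast, slot         # core `slot` gets this cache cycle

if __name__ == "__main__":
    K = 4  # cache clock = 4x core clock, as in the 4-core clusters above
    for slow, fast, core in l1_schedule(num_cores=K, core_cycles=2):
        print(f"core cycle {slow}, cache cycle {fast}: serve core {core}")
```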
Digression—Chip Makers Response
(Two small plots: performance vs. frequency, and performance vs. number of cores.)
 Exchanged frequency for cores
 multi / many-cores
 Risky behavior
 "if we build it, they will come"
 predicated on the solution to a tough problem—parallelizing software
 Multi-cores have only been successful in
 throughput environments—servers
 heterogeneous environments—SoCs
 data parallel applications
 Parallel processing is application specific
 that's OK
 treat parallel machines as attached processors
 true in SoCs for some time—control plane / data plane separation
ACAL – University of Michigan
18
Measured thread level parallelism—TLP
19
Caveat: Desktop Applications
Evolution of Thread-Level Parallelism in Desktop Applications
G. Blake, R. Dreslinski, T. Mudge (University of Michigan), and K. Flautner (ARM), ISCA 2010, to appear.
ACAL – University of Michigan
19
Single thread performance: Boosting
20
Baseline
 Cache runs at 4x core frequency
 Pipelined cache
 4 cores @ 15 MHz (650 mV), cache @ 60 MHz (700 mV)
Better single thread performance (see the sketch after this slide)
 Boosting: turn some cores off, speed up the rest
 Cache frequency remains the same
 Cache un-pipelined
 Faster response time
 Same throughput
 Core sees larger cache, hiding longer DRAM latency
 1 core @ 60 MHz (850 mV), cache @ 60 MHz (1 V) (4x)
 Overclock: increase core voltage and frequency further
 Cache frequency must be increased
 Even faster response time
 Increased throughput
 1 core @ 120 MHz (1.3 V), cache @ 120 MHz (1.5 V) (8x)
ACAL – University of Michigan
20
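The boosting steps above are easy to tabulate. The sketch below lists the three configurations from this slide and each one's single-thread speedup over the 15 MHz baseline core, which appears to be what the 4x and 8x figure annotations indicate.

```python
# Sketch: the three cluster configurations from this slide and the
# single-thread speedup of each relative to the 15 MHz baseline core.
configs = [
    # (name, active cores, core MHz, core V, cache MHz, cache V)
    ("baseline",    4,  15, 0.650,  60, 0.700),
    ("boosted",     1,  60, 0.850,  60, 1.000),
    ("overclocked", 1, 120, 1.300, 120, 1.500),
]

base_core_mhz = configs[0][2]
for name, cores, f_core, v_core, f_cache, v_cache in configs:
    print(f"{name:12s}: {cores} core(s) @ {f_core} MHz / {v_core} V, "
          f"cache @ {f_cache} MHz / {v_cache} V, "
          f"single-thread speedup {f_core / base_core_mhz:.0f}x")
```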
Single Thread Performance
21
 Look at turning off cores and speeding up the remaining cores to gain faster
response time.
 Graph of cluster performance (not measured – intuition)
R. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge. Near Threshold Computing: Overcoming Performance Degradation from Aggressive Voltage Scaling. Workshop on
Energy-Efficient Design (WEED 2009), held at 36th Int. Symp. on Computer Architecture, Austin, TX, June, 2009.
ACAL – University of Michigan
21
Boosting Clusters—scaled to 22nm
22
Baseline
 Cache runs at 4x core frequency
 Pipelined cache
 4 cores @ 140 MHz, cache @ 60 MHz
Better single thread performance
 Turn some cores off, speed up the rest
 Cache frequency remains the same
 Cache un-pipelined
 Faster response time
 Same throughput
 Core sees larger cache, hiding longer DRAM latency
 1 core @ 600 MHz, cache @ 600 MHz
 Boost core voltage and frequency further
 Cache frequency must be increased
 Even faster response time
 Increased throughput
 1 core @ 1.2 GHz, cache @ 1.2 GHz
(The diagram carries the same 4x and 8x annotations as the previous boosting slide.)
ACAL – University of Michigan
22
Technologies for Reducing Power
24
 Near threshold operation
 3D die stacking
 Replacing DRAM with Flash memory
ACAL – University of Michigan
24
A Closer Look at Wafer-Level Stacking
25
Preparing a TSV—through silicon via
(Cross-section legend: Oxide; Silicon; Dielectric (SiO2/SiN); "Super-Contact";
Gate Poly; STI (Shallow Trench Isolation); W (Tungsten contact & via);
Al (M1 – M5); Cu (M6, Top Metal).)
Bob Patti, CTO Tezzaron Semiconductor
ACAL – University of Michigan
25
Next, stack second wafer & thin
26
FF: face-to-face
Bob Patti, CTO Tezzaron Semiconductor
ACAL – University of Michigan
26
Then stack a third wafer
27
3rd wafer
FB: face-to-back
2nd wafer
1st wafer: controller
Bob Patti, CTO Tezzaron Semiconductor
ACAL – University of Michigan
27
Finally, flip, thin and add pads
28
1st wafer: controller
2nd wafer
3rd wafer
This is the completed stack.
Bob Patti, CTO Tezzaron Semiconductor
ACAL – University of Michigan
28
Characteristics
29
 Very high bandwidth, low-latency, low-power buses possible
 10,000 vias / sq mm
 Electrical characteristics: ~1 fF and < 1 Ω per via
 No I/O pads for inter-stack connections—low power
 Consider a memory stack (see the sketch after this slide):
 DDR3 ~40 mW per pin
 1024 data pins → 40 W
 4096 data pins → 160 W
 die-on-wafer ~24 µW per pin
 Pros / cons
 3D interconnect failure < 0.1 ppm
 Heat—1 W/sq mm
 KGD (known good die) may be a problem—foundry
 Different processes can be combined
 DRAM / logic / analog / non-volatile memories
 e.g. DRAM—split sense amps and drivers from memory cells
ACAL – University of Michigan
29
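The pin-power arithmetic on this slide can be reproduced directly from the per-pin figures quoted above; the 40 W and 160 W bullets are just these products, rounded.

```python
# Sketch: I/O power of a wide memory interface using the per-pin figures on
# this slide (~40 mW per DDR3 pin vs. ~24 uW per die-on-wafer connection).
DDR3_W_PER_PIN = 40e-3
VIA_W_PER_PIN = 24e-6

for pins in (1024, 4096):
    ddr3_w = pins * DDR3_W_PER_PIN
    via_w = pins * VIA_W_PER_PIN
    print(f"{pins} data pins: DDR3 ~{ddr3_w:.0f} W vs. 3D vias ~{via_w*1e3:.0f} mW "
          f"(~{ddr3_w / via_w:.0f}x less)")
```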
Centip3De—3D NTC Project
30
(Layer stack: Logic A, Logic B [F2F bond], Logic B, Logic A, DRAM sense/logic
with bond routing, DRAM [F2F bond], DRAM.)
Centip3De Design
•130nm, 7-layer 3D-stacked chip
•128 ARM M3 cores
•1.92 GOPS @ 130 mW (see the sketch after this slide)
•Taped out: Q1 2010
ACAL – University of Michigan
30
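The headline efficiency number follows directly from the figures on this slide; a one-line sketch:

```python
# Sketch: Centip3De efficiency from the figures quoted on this slide.
gops, watts = 1.92, 0.130        # 1.92 GOPS at 130 mW
print(f"{gops} GOPS / {watts*1e3:.0f} mW = {gops / watts:.1f} GOPS/W")
```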
Stacking the Die
31
Cluster Configuration
•4 ARM M3 Cores @ 15 MHz (650 mV)
•1 kB Instruction Cache @ 60 MHz (700 mV)
•8 kB Data Cache @ 60 MHz (700 mV)
•Cores connect via 3D to caches on the other layer
System Configuration
•2-Wafer = 16 Clusters (64 Cores)
•4-Wafer = 32 Clusters (128 Cores)
•DDR3 Controller
Estimated Performance (Raytrace)
•1.6 GOPS (0.8 GOPS on 2-Wafer)
•110 mW (65 mW on 2-Wafer)
•14.8 GOPS/W
Fair Metric
•Centip3De achieves 24 GOPS/W without DRAM
ACAL – University of Michigan
31
Design Scaling and Power Breakdowns
32
NTC Centip3De System (130nm)
 1.9 GOPS (3.8 GOPS in Boost)
 Max 1 IPC per core
 128 cores
 15 MHz
 130 mW
 14.8 GOPS/W (5.5 in Boost)
Scaled to 22nm
 ~600 GOPS (~1k GOPS in Boost)
 Max 1 IPC per core
 4,608 cores
 140 MHz
 ~3 W
 ~200 GOPS/W
(Pie charts: NTC-mode and boosted-mode power breakdowns across cores,
I-caches, D-caches, and DRAM for the raytracing benchmark; segment values
include 2.9, 7.0, 28, 39, 42, 45, 67, and 336 mW.)
ACAL – University of Michigan
Technologies for Reducing Power
33
 Near threshold operation
 3D die stacking
 Replacing DRAM with Flash memory
ACAL – University of Michigan
33
34
FIN—Thanks
ACAL – University of Michigan
34
Background – NAND Flash overview
35
 Dual mode SLC/MLC Flash bank organization
 Single Level Cell (SLC)
 1 bit/cell
 10^5 erases per block
 25 μs read, 200 μs write
 Multi Level Cell (MLC)
 2 bits/cell
 10^4 erases per block
 50 μs read, 680 μs write
(Diagram: an SLC page is 2048 data bytes + 64 spare bytes = 2112 bytes; a pair
of MLC pages occupies 4224 bytes; 1 block = 64 SLC / 128 MLC pages.)
 Dual mode NAND Flash memory (see the sketch after this slide)
 Addressable read/write unit is the page
 Pages consist of 2048 bytes + 64 'spare' bytes
 Erases 64 SLC or 128 MLC pages at a time (a block)
 Technology – less than 60nm
ACAL – University of Michigan
35
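A small sketch that encodes the SLC/MLC parameters listed above and derives block sizes and the total data that can be written to a block before it wears out; the derived figures are simple arithmetic on the slide's numbers, not quoted values.

```python
# Sketch: SLC/MLC NAND parameters from this slide plus a few derived figures.
PAGE_DATA, PAGE_SPARE = 2048, 64              # bytes per page (data + spare)

modes = {
    "SLC": dict(pages_per_block=64,  erases_per_block=10**5,
                read_us=25, write_us=200),
    "MLC": dict(pages_per_block=128, erases_per_block=10**4,
                read_us=50, write_us=680),
}

for name, m in modes.items():
    block_bytes = m["pages_per_block"] * (PAGE_DATA + PAGE_SPARE)
    # Total data that can ever be written to one block before it wears out.
    lifetime_gb = m["erases_per_block"] * m["pages_per_block"] * PAGE_DATA / 2**30
    print(f"{name}: block = {block_bytes} bytes, "
          f"read {m['read_us']} us / write {m['write_us']} us, "
          f"~{lifetime_gb:,.0f} GB written per block before wear-out")
```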
Reducing Memory Power
36
         Area/bit   $/Gb   Active   Idle    Read      Write     Erase
         (μm²)             Power    Power   latency   latency   latency
DRAM     0.015      3      495 mW   15 mW   55 ns     55 ns     N/A
NAND     0.005      0.25   50 mW    6 μW    25 μs     200 μs    1.5 ms
PCM      0.068      ?              6 μW    55 ns     150 ns    N/A
NAND Flash cost assumes 2-bit-per-cell MLC. DRAM is a 2 Gbit DDR3-1333 x8 chip.
Flash power numbers are for a 2 Gbit SLC x8 chip. Area from ITRS Roadmap 2009.
 Flash is denser than DRAM
 Flash is cheaper than DRAM
 Flash is good for idle power optimization (see the sketch after this slide)
 1000× less power than DRAM
 DRAM still required for acceptable access latencies
 Flash not so good for a low access latency usage model
 Flash "wears out" – 10,000/100,000 write/erase cycles
ACAL – University of Michigan
36
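To see where the "1000× less power than DRAM" bullet comes from, here is a quick sketch dividing the DRAM and NAND power columns of the table; the idle ratio comes out in the low thousands, consistent with the slide's order-of-magnitude claim.

```python
# Sketch: power ratios from the DRAM and NAND rows of the table above.
DRAM_ACTIVE, DRAM_IDLE = 495e-3, 15e-3   # watts
NAND_ACTIVE, NAND_IDLE = 50e-3, 6e-6     # watts

print(f"idle power:   DRAM / NAND = {DRAM_IDLE / NAND_IDLE:,.0f}x")
print(f"active power: DRAM / NAND = {DRAM_ACTIVE / NAND_ACTIVE:.1f}x")
```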
A Case for Flash as Secondary Disk Cache
37
 Many server workloads use a large working set (100's of MBs to 10's of GB,
and even more)
 The large working set is cached in main memory to maintain high throughput
 A large portion of DRAM goes to the disk cache
 Many server applications are more read intensive than write intensive
 Flash memory consumes orders of magnitude less idle power than DRAM
 Use DRAM for recent and frequently accessed content, and use Flash for less
recently and infrequently accessed content
 Client requests follow a spatially and temporally Zipf-like distribution
(see the sketch after this slide)
 e.g. 90% of client requests are to 20% of files
ACAL – University of Michigan
37
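A small sketch of the request skew assumed here, using a pure Zipf distribution with exponent 1 over a hypothetical population of 10,000 files. The slide's 90%-of-requests-to-20%-of-files figure is an empirical characterization of real traces; this toy model lands in the same neighborhood rather than reproducing it exactly.

```python
# Sketch: how concentrated requests are under a Zipf(1) popularity model.
# The file count and exponent are illustrative, not taken from the talk.
N_FILES = 10_000
weights = [1 / rank for rank in range(1, N_FILES + 1)]   # Zipf, exponent 1
total = sum(weights)

top = int(0.20 * N_FILES)                                # hottest 20% of files
share = sum(weights[:top]) / total
print(f"Zipf(1), {N_FILES} files: top 20% of files get {share:.0%} of requests")
```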
A Case for Flash as Secondary Disk Cache
38
Specweb99
(Figure: network bandwidth in Mbps for MP4, MP8, and MP12 configurations as
the disk cache access latency to 80% of files is varied from 12 µs to 1600 µs.)
An access latency of 100's of microseconds can be tolerated.
T. Kgil and T. Mudge. FlashCache: A NAND Flash memory file cache for low power web servers. Proc. Conf. Compiler and Architecture Support for Embedded Systems (CASES'06), Seoul, S. Korea, Oct. 2006, pp. 103-112.
ACAL – University of Michigan
38
Overall Architecture
39
(Diagram: the baseline without FlashCache pairs the processors with 1 GB of
DRAM serving as generic main memory plus the primary disk cache, in front of
the HDD controller and hard disk drive. The FlashCache architecture uses
128 MB of DRAM as main memory and primary disk cache plus 1 GB of Flash, with
its Flash controller and DMA engine, as a secondary disk cache; the tables
used to manage the Flash memory (FCHT, FBST, FPST, FGST) live in the DRAM.)
ACAL – University of Michigan
39
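A highly simplified sketch of the access path implied by this diagram. The class and method names are illustrative assumptions and the control flow is only a caricature, not the FlashCache design from the paper; it just shows a small DRAM primary disk cache backed by a larger Flash secondary cache in front of the hard disk.

```python
# Sketch of the access path implied by the FlashCache architecture: a small
# DRAM primary disk cache backed by a larger NAND Flash secondary disk cache,
# with the hard disk underneath. Illustrative only; not the actual design.

class FlashCacheSketch:
    def __init__(self):
        self.dram = {}    # block -> data, primary disk cache (DRAM)
        self.flash = {}   # block -> data, secondary disk cache (NAND Flash)

    def read_block(self, block):
        if block in self.dram:                 # fast path: DRAM hit (~ns)
            return self.dram[block], "dram"
        if block in self.flash:                # Flash hit (~tens of us)
            data = self.flash[block]
            self.dram[block] = data            # promote to the DRAM cache
            return data, "flash"
        data = self._read_from_disk(block)     # disk access (~ms)
        self.flash[block] = data               # fill the secondary cache
        self.dram[block] = data
        return data, "disk"

    @staticmethod
    def _read_from_disk(block):
        return f"<contents of block {block}>"

if __name__ == "__main__":
    fc = FlashCacheSketch()
    print(fc.read_block(7)[1])   # first access comes from disk
    print(fc.read_block(7)[1])   # second access hits in DRAM
```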
Overall Network Performance - Mbps
40
(Figure: Specweb99 network bandwidth in Mbps for MP4, MP8, and MP12
configurations across memory configurations ranging from 32 MB DRAM + 1 GB
Flash up to 512 MB DRAM + 1 GB Flash, plus a 1 GB DRAM-only baseline.)
128 MB DRAM + 1 GB NAND Flash performs as well as 1 GB DRAM while requiring
only about 1/3 the die area (SLC Flash assumed).
ACAL – University of Michigan
40
Overall Main Memory Power
41
(Figure: overall main memory power in watts for SpecWeb99, broken into read,
write, and idle power: DDR2 1 GB active ≈ 2.5 W, DDR2 1 GB powerdown ≈ 1.6 W,
DDR2 128 MB + Flash 1 GB ≈ 0.6 W.)
Flash memory consumes much less idle power than DRAM.
ACAL – University of Michigan
41
Concluding Remarks on Flash-for-DRAM
42
 DRAM clockless refresh reduces idle power
 Flash density continues to grow
 Intel-Micron JV announced 25nm flash
 8 GB die, 167 sq mm, 2 bits per cell
 3 bits/cell is coming soon
 PCRAM appears to be an interesting future alternative
 I predict single-level storage using some form of NV memory, with disks
replacing tape for archival storage
ACAL – University of Michigan
42
Cluster Size and Boosting
43
Analysis:
•2-die stack
•Fixed die size
•Fixed amount of cache per core
•Raytrace algorithm
(Figures: GOPS and GOPS/W vs. cores per cluster, and boosted GOPS vs. cores
per cluster, for 1 to 5 cores per cluster.)
 4-core clusters are 27% more energy efficient than 1-core clusters
 Boosting the 4-core version achieves 81% more GOPS than 1-core (larger cache)
ACAL – University of Michigan
43
System Architecture
44
(Diagram: sixteen 4-core clusters split across Layer A and Layer B, each layer
attached to a layer hub; the hubs carry clocking, system control, system
communication, JTAG, and memory-forwarding links, connect to each other
through B-B interfaces, and connect down to the DRAM channels.)
ACAL – University of Michigan
44
Cluster Architecture
45
(Diagram: a cluster of four ARM M3 cores, each clocked at 15 MHz with
0/90/180/270 degree phase offsets generated by the cluster clock generator
from the system clock; a 1024 B 4-way cluster I-cache and an 8192 B 4-way
cluster D-cache run on a 60 MHz cache clock; JTAG chains in and out through
the cores; 32-bit AMBA-like buses connect the cores to the caches across the
3D integration between Layer 1 and Layer 2, and 128-bit AMBA-like buses
connect to DRAM; cluster MMIO, reset control, etc. tie into system
communication. A sketch of the phase-offset timing follows this slide.)
ACAL – University of Michigan
45
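A sketch of why the 0/90/180/270 degree phase offsets work: each 15 MHz core clock edge lines up with a different cycle of the 60 MHz cache clock, so the four cores' cache accesses naturally interleave. The mapping below only illustrates that timing relationship; it is not the actual clock-generation logic.

```python
# Sketch: four 15 MHz core clocks offset by 0/90/180/270 degrees against a
# 60 MHz cache clock. Each core's rising edge falls in a distinct cache cycle,
# so the shared caches can service one core per cache cycle.
CACHE_MHZ, CORE_MHZ = 60, 15
ratio = CACHE_MHZ // CORE_MHZ              # 4 cache cycles per core cycle

for core, phase_deg in enumerate((0, 90, 180, 270)):
    # A 90 degree offset of the core clock corresponds to one cache cycle.
    cache_slot = (phase_deg // 90) % ratio
    print(f"core {core}: phase {phase_deg:3d} deg -> cache cycle {cache_slot} "
          f"of every {ratio}")
```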
46
With power and cooling becoming an increasingly costly part of the operating
cost of a server, the old trend of striving for higher performance with little
regard for power is over. Emerging semiconductor process technologies,
multicore architectures, and new interconnect technology provide an avenue for
future servers to become low power, compact, and possibly mobile. In our talk
we examine three techniques for achieving low power: 1) near threshold
operation; 2) 3D die stacking; and 3) replacing DRAM with Flash memory.
ACAL – University of Michigan
46
Solutions?
48
 Reduce Vdd
 "Near Threshold Computing"
 Drawbacks
 slower, less reliable operation
 But
 parallelism
 suits some computations (more as time goes by)
 robustness techniques
 e.g. Razor—in situ monitoring
 Cool chips (again)
 interesting developments in microfluidics
 Devices that operate at lower Vdd without performance loss
ACAL – University of Michigan
48
Low-Voltage Robustness
49
(Figure: logic-rule bitcell area in μm² vs. VDD from 300 mV to 1000 mV for
6T and 8T cells, with and without VTH selection.)
 VDD scaling reduces SRAM robustness
 Maintain robustness through device sizing and VTH selection
 Robustness measured using importance sampling
 In the NTC range the 6T cell is smaller
ACAL – University of Michigan
49
SRAM Designs
50
 HD-SRAM (half-differential)
 Differential write
 Single-ended read
 Asymmetric sizing
 Read + write margin (in VTH): HD µ/σ = 12.1 / 1.16 vs. 6T µ/σ = 11.0 / 1.35
 Crosshairs
 Skew VDD in the column
 Skew GND in the row
 Target failing cells
 No bitcell changes
 Skew hurts some cells
(Figures: HD-SRAM bitcell schematic and a histogram of read + write margin
across chips; Crosshairs schematic and normalized row & column error rate vs.
VDD and GND skew from 0 to 80 mV, with 1.1 V and 0.7 V curves.)
ACAL – University of Michigan
50
Evolution of Subthreshold Designs
51
Subliminal 1 Design (2006)
 - 0.13 µm CMOS
 - Used to investigate the existence of Vmin
 - 2.60 µW/MHz
Subliminal 2 Design (2007)
 - 0.13 µm CMOS
 - Used to investigate process variation
 - 3.5 µW/MHz
Phoenix 1 Design (2008)
 - 0.18 µm CMOS
 - Used to investigate sleep current
 - 2.8 µW/MHz
Phoenix 2 Design (2010)
 - 0.18 µm CMOS
 - Commercial ARM M3 core
 - Used to investigate energy harvesting and power management
 - 37.4 µW/MHz
(Die photos with processor and memory blocks; dimensions range from roughly
100 µm to 700 µm on a side.)
Unpublished Results – ISSCC 2010 – Do not disclose
ACAL – University of Michigan
51