EE241 - Spring 2005
Advanced Digital Integrated Circuits
Lecture 20: Thermal design
Guest Lecturer: Prof. Mircea Stan
ECE Dept., University of Virginia
Thermal Design
Why should you care about thermals?
What do we mean by thermals?
How do you model thermals?
What can you do about thermals?
Temperature-aware circuit design
Thermal sensors
References:
- Intel® Technology Journal: http://developer.intel.com/technology/itj/
- IBM Journal of Research and Development: http://www.research.ibm.com/journal/rd/
- IEEE Transactions on Components and Packaging Technologies
- IEEE Transactions on VLSI Systems
- IEEE Journal of Solid-State Circuits
Why should you care about thermals?
Temperature affects:
Circuit performance
Circuit power (especially leakage)
System reliability
IC and system packaging cost
“Environment”
Circuit Performance vs. Temperature
Temperature => Transistor threshold and carrier mobility
$I_{DS} = \frac{W}{2L}\,\mu C_{ox}\,(V_{GS} - V_{Th})^{\alpha}$
Temperature => Performance?
Source: E Long, WR Daasch, R Madge, B Benware, “Detection of Temperature Sensitive Defects Using ZTC”
VLSI Test Symposium, 2004
Leakage vs. Temperature
[Figure: log IDS [log A] (axis from -9 to -3) versus VGS [V] (axis from 0 to 1.2)]
k = 1.38x10^-23 J/K
q = 1.6x10^-19 C
kT/q = 25.9mV at 27C (300K)
= 23.5mV at 0C (273K)
= 32mV at 100C (373K)
S = (kT/q) ln10 (1 + Cd/Ci)
Subthreshold slope S > ln10 · kT/q
$I_{ds} \propto \mu \frac{W}{L} \left(\frac{kT}{q}\right)^{2} e^{\frac{V_g - V_{Th}}{m\,kT/q}} \left(1 - e^{-\frac{V_{ds}}{kT/q}}\right)$
[Taur, Ning] EECS241 Lecture 3
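A quick numerical check of the thermal-voltage values and the subthreshold-slope formula above. This is a minimal sketch: the function names and the Cd/Ci = 0.3 value are illustrative assumptions, not taken from the lecture.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_k):
    """Return kT/q in volts for an absolute temperature in kelvin."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def subthreshold_slope(temp_k, cd_over_ci=0.0):
    """Return S = (kT/q) * ln10 * (1 + Cd/Ci) in volts per decade."""
    return thermal_voltage(temp_k) * math.log(10) * (1.0 + cd_over_ci)


if __name__ == "__main__":
    for temp_k in (273.0, 300.0, 373.0):
        vt_mv = thermal_voltage(temp_k) * 1e3
        # Cd/Ci = 0.3 is an assumed, illustrative capacitance ratio.
        s_mv = subthreshold_slope(temp_k, cd_over_ci=0.3) * 1e3
        print(f"T = {temp_k:.0f} K: kT/q = {vt_mv:.1f} mV, S = {s_mv:.1f} mV/decade")
```

Because S scales linearly with T, the same gate-voltage swing turns the transistor off less sharply when the die is hot, which is one reason leakage grows so quickly with temperature.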
Leakage Power
Fraction of leakage power increasing:
exponentially with each generation
exponentially dependent on temperature
[Figure: static power / dynamic power ratio (percentage, 0 to 70) versus temperature (298 K to 373 K) for the 180nm, 130nm, 100nm, 90nm, 80nm, and 70nm nodes; the ratio increases for newer technology nodes]
Source: Sankaranarayanan et al, University of Virginia
Reliability
The Arrhenius Equation: MTF = A · exp(Ea / (k · T))
MTF: mean time to failure at T
A: empirical constant
Ea: activation energy
k: Boltzmann’s constant
T: absolute temperature
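To get a feel for how strongly MTF depends on temperature, the ratio of two MTF values can be computed directly from the Arrhenius equation, since the empirical constant A cancels. This is a sketch only: the activation energy of 0.7 eV and the two temperatures are assumed example values, and real failure mechanisms each have their own Ea.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann's constant in eV/K


def mtf(temp_k, ea_ev, a=1.0):
    """Arrhenius mean time to failure: A * exp(Ea / (k * T))."""
    return a * math.exp(ea_ev / (K_BOLTZMANN_EV * temp_k))


def mtf_ratio(t_cool_k, t_hot_k, ea_ev):
    """How many times longer the part lasts at t_cool_k than at t_hot_k."""
    return mtf(t_cool_k, ea_ev) / mtf(t_hot_k, ea_ev)


if __name__ == "__main__":
    # Assumed example: Ea = 0.7 eV, running at 75 C (348 K) instead of 85 C (358 K).
    ratio = mtf_ratio(348.0, 358.0, ea_ev=0.7)
    print(f"MTF improves by a factor of {ratio:.2f} when running 10 K cooler")
```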
Failure mechanisms:
Die metalization (Corrosion, Electromigration, Contact spiking)
Oxide (charge trapping, oxide breakdown, hot electrons)
Device (ionic contamination, second breakdown, surface-charge)
Die attach (fracture, thermal breakdown, adhesion fatigue)
Interconnect (wirebond failure, flip-chip joint failure)
Package (cracking, whisker and dendritic growth, lid seal failure)
System Packaging Cost
Today…
Grid computing: power plants co-located near compute farms
IBM S/390:
refrigeration
Source: R. R. Schmidt, B. D. Notohardjono “High-end server low temperature cooling”
IBM Journal of R&D
IC Packaging Cost
IBM S/390 processor subassembly: complex!
C4: Controlled Collapse Chip Connection (flip-chip)
Source: R. R. Schmidt, B. D. Notohardjono “High-end server low temperature cooling”
IBM Journal of R&D
Desktop processor, simpler, but still…
Pentium 4, Itanium
Source: Intel web site
“Environment”
Environmental Protection Agency (EPA): computers account for 10% of
commercial electricity consumption
This includes peripherals, possibly also manufacturing
A DOE report suggested this percentage is much lower
No consensus, but it’s still a lot
Roughly equivalent power is spent on air conditioning (AC), which is only 30% efficient
CFCs used for refrigeration
Lap burn
Fan noise
Ultimate Effect: Thermal Runaway
Temperature => Leakage power => Temperature …
“Loop gain” > 1 => trouble!
Source: Tom’s Hardware Guide
http://www6.tomshardware.com/cpu/01q3/010917/heatvideo-01.html
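The temperature/leakage feedback loop can be sketched as a fixed-point iteration: temperature sets leakage, leakage adds power, power raises temperature. Everything numeric here is an assumption for illustration (an exponential leakage model that grows e-fold every 30 degrees, 10 W of leakage at 25 C, 40 W of dynamic power); only the structure of the loop comes from the slide.

```python
import math

T_E = 30.0  # assumed: leakage grows e-fold every 30 degrees C


def leakage_power(temp_c, p_leak_ref=10.0, t_ref=25.0):
    """Hypothetical exponential leakage model, in watts."""
    return p_leak_ref * math.exp((temp_c - t_ref) / T_E)


def loop_gain(temp_c, r_th):
    """Loop gain = Rth * dP_leak/dT evaluated at temp_c."""
    return r_th * leakage_power(temp_c) / T_E


def settle_temperature(p_dyn, r_th, t_amb=25.0, t_limit=200.0,
                       max_iter=1000, tol=1e-6):
    """Iterate T -> T_amb + Rth * (P_dyn + P_leak(T)).

    Returns (temperature, converged). converged is False when the
    temperature exceeds t_limit, which is treated as thermal runaway.
    """
    temp = t_amb
    for _ in range(max_iter):
        new_temp = t_amb + r_th * (p_dyn + leakage_power(temp))
        if new_temp > t_limit:
            return new_temp, False
        if abs(new_temp - temp) < tol:
            return new_temp, True
        temp = new_temp
    return temp, False


if __name__ == "__main__":
    for r_th in (0.5, 1.0):  # K/W, e.g. a good versus a poor package
        temp, ok = settle_temperature(p_dyn=40.0, r_th=r_th)
        status = "settles" if ok else "runs away"
        print(f"Rth = {r_th} K/W: {status} (last T = {temp:.1f} C)")
```

With the lower thermal resistance the loop gain at the operating point stays below 1 and the iteration settles; with the higher one no stable operating point exists under these assumed numbers.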
What do we mean by thermals?
Anything that has to do with heat/temperature
Heat is a form of energy transfer
Temperature relates energy to entropy (1/T = ∂S/∂E) and determines the direction of heat flow
Source: http://www.iun.edu/~cpanhd/C101webnotes/matter-and-energy/specificheat.html
Heat mechanisms
Heat Conduction: phonons, vibrations
Heat Convection: fluid molecules movement
Heat Radiation: photons, EM waves
Phase change: boiling, sublimation, condensation, etc.
Heat storage: specific heat
Refrigeration: move heat “backwards”
Many other mechanisms…
Conduction
“Similar” to electrical conduction (e.g. metals are good conductors)
Heat flow from high temperature to low temperature
Microscopic (vibration, adjacent molecules, electron transport)
Needs a material: typically solids (in fluids, the larger distance between molecules makes conduction weaker)
Typical example: thermal “slug”, spreader, heatsink
Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001
Convection
Macroscopic (bulk transport, mix of hot and cold, energy storage)
Need material (typically in fluids, liquid, gas)
Natural vs. forced (air or liquid)
Typical example: heatsink (fan), liquid cooling
Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001
Simplistic Thermal Model
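Most thermal transfers: R = k/A, with A the area and k a constant that depends on the mechanism (conduction: thickness/conductivity; convection: 1/heat transfer coefficient)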
Power density matters!
Ohm’s law for thermals (steady-state)
∆V = I · R -> ∆T = P · R
T_hot = P · Rth + T_amb
Ways to reduce T_hot:
- reduce P (power-aware)
- reduce Rth (packaging)
- reduce T_amb (move to Alaska?)
- maybe also take advantage of transients (Cth)
Simplistic Dynamic Model
Electrical-thermal duality
V ≅ temp (T)
I ≅ power (P)
R ≅ thermal resistance (Rth)
C ≅ thermal capacitance (Cth)
RC ≅ time constant
KCL differential eq.: I = C · dV/dt + V/R
difference eq.: ∆V = (I/C) · ∆t − (V/(RC)) · ∆t
thermal domain: ∆T = (P/C) · ∆t − (T/(RC)) · ∆t
(T = T_hot – T_amb)
One can compute stepwise changes in temperature
for any granularity at which one can get P, T, R, C
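The stepwise computation can be sketched as a forward-Euler loop over a power trace. Rearranging KCL (P = C · dT/dt + T/R) gives ∆T = (P/C) · ∆t − (T/(RC)) · ∆t, so the decay term enters with a minus sign. The function name and all numeric values below (Rth = 0.8 K/W, Cth = 100 J/K, 50 W, 45 C ambient) are assumptions chosen for illustration.

```python
def simulate_temperature(power_trace, r_th, c_th, dt, t_amb=45.0):
    """Forward-Euler integration of the single-RC thermal model.

    power_trace: power in watts for each time step
    r_th: thermal resistance in K/W
    c_th: thermal capacitance in J/K
    dt: time step in seconds (should be much smaller than r_th * c_th)
    Returns the list of T_hot values, one per step.
    """
    delta = 0.0  # T = T_hot - T_amb
    temps = []
    for power in power_trace:
        delta += (power / c_th) * dt - (delta / (r_th * c_th)) * dt
        temps.append(t_amb + delta)
    return temps


if __name__ == "__main__":
    # Assumed example: 50 W step applied for 800 s, time constant RC = 80 s.
    dt = 0.1
    temps = simulate_temperature([50.0] * 8000, r_th=0.8, c_th=100.0, dt=dt)
    print(f"after one time constant: {temps[799]:.1f} C")
    print(f"after ten time constants: {temps[-1]:.1f} C")
```

After many time constants the result approaches the steady-state value T_amb + P · Rth, which ties the dynamic model back to the steady-state one.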
IC with die, package, heatsink
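R = ∆T/Q (thermal), analogous to R = V/I (electrical)
Rja = Rjc + Rcs + Rsa = (Tj - Ta)/Q
Rsa = ((Tj - Ta)/Q) - Rjc - Rcs
(j: junction, c: case, s: heat sink, a: ambient; Q: dissipated power)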
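As a worked example of the series-resistance model, the required sink-to-ambient resistance follows from Rsa = (Tj − Ta)/Q − Rjc − Rcs. The numbers used here (100 C junction limit, 40 C ambient, 60 W, Rjc = 0.3 K/W, Rcs = 0.1 K/W) are assumed for illustration and do not describe any particular part.

```python
def required_sink_resistance(t_junction_max, t_ambient, power, r_jc, r_cs):
    """Largest sink-to-ambient resistance (K/W) that keeps Tj at or below the limit."""
    r_ja = (t_junction_max - t_ambient) / power
    return r_ja - r_jc - r_cs


def junction_temperature(t_ambient, power, r_jc, r_cs, r_sa):
    """Steady-state junction temperature for the series stack Rjc + Rcs + Rsa."""
    return t_ambient + power * (r_jc + r_cs + r_sa)


if __name__ == "__main__":
    r_sa = required_sink_resistance(100.0, 40.0, 60.0, r_jc=0.3, r_cs=0.1)
    print(f"required Rsa <= {r_sa:.2f} K/W")
    t_j = junction_temperature(40.0, 60.0, 0.3, 0.1, r_sa)
    print(f"junction temperature with that sink: {t_j:.1f} C")
```

A negative result would mean that no heat sink alone can meet the limit: the power, the ambient, or the package resistances have to change.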
Hot spots in Power4
Temperature “landscape”: space and time
How to estimate early in the design cycle?
Trends in Power Density
[Figure: power density (Watts/cm², log scale from 1 to 1000) versus process generation (1.5µ down to 0.07µ) for i386, i486, Pentium®, Pentium® Pro, Pentium® II, Pentium® III, and Pentium® 4, with reference levels: hot plate, nuclear reactor, rocket nozzle]
Source: “New Microarchitecture Challenges in the Coming Generations
of CMOS Process Technologies” – Fred Pollack, Intel Corp.
Thermals for low-power ICs
Different: little self-generated heat
But…
Cheaper packaging (higher Rth): challenge
More extreme ambient (freezing to hot)
Temporal thermal effects more important than spatial
How do you model thermals?
Source: S. Wunsche, C. Clauss, P. Schwarz, F. Winkler, “Electro-thermal circuit simulation using simulator coupling”
IEEE Transactions on VLSI Systems, Sep 1997
Why need to model thermals?
Power metrics are not an acceptable proxy for temperature
Chip-wide average will not capture hot spots
Localized average will not capture lateral coupling
Different units have different power densities
Power electronics: long time ago!
Source: R. Castello, P. Antognetti, “Integrated-circuit thermal modeling”, IEEE Journal of Solid-State Circuits, Jun 1978
Model (package)
“Vertical” heat flow
Model (die)
Block granularity (architecture)
Grid (circuits)
Also lateral flow
Spatial behavior - Hot Spots
Source: W. Huang, S. Ghosh, K. Sankaranarayanan, K. Skadron, and M. R. Stan. “Compact Thermal Modeling for Temperature-Aware Design.”
41st ACM/IEEE Design Automation Conference (DAC), June 2004
Time-Varying Behavior – Hot Spots
mesa
Tool validation: on-chip measurements
M. R. Stan, K. Skadron, M. Barcella, W. Huang, K. Sankaranarayanan, and S. Velusamy. “HotSpot: A Dynamic Compact Thermal Model at the Processor-Architecture Level.”
Microelectronics Journal: Circuits and Systems, Dec. 2003
Dynamic validation: measurements
Micred test chip, transient vs. HotSpot
What can you do about thermals?
Better estimates of performance, power, reliability
Optimize at design time (e.g. package co-design)
Adapt at run-time
The Role of a Thermal Model
Helps close the loop for accurate design estimations: static or dynamic
[Diagram blocks: Power Model, Thermal Model, Performance Model, Reliability Model]
Self-consistent leakage
Design flow: still work in progress!
Package co-design
For 200 traces (TPC-C, SPEC, Microsoft)
Thermal design point can be reduced to 75% of true “max power”
with minimal performance loss
Aggressive clock gating
Application variations
Underutilized resources
Source: Intel
Thermal Performance Graph
Source: Seri Lee, “How to select a heat sink”, Aavid Thermal Technologies
http://www.electronics-cooling.com/Resources/EC_Articles/JUN95/jun95_01.htm
Adapt at run-time
[Figure: temperature versus time. Levels: designed for cooling capacity w/out DTM; designed for cooling capacity w/ DTM (the gap is the system cost savings); DTM trigger level. Regions: DTM disabled, DTM/response engaged]
Source: David Brooks 2002
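A minimal sketch of run-time adaptation with dynamic thermal management (DTM): a trigger temperature engages a response, and a lower release temperature disengages it. The response used here (scaling power by a fixed throttle factor), the hysteresis band, and all numeric values are assumptions for illustration, built on a single-RC thermal model; they are not the mechanism of any particular processor.

```python
def run_with_dtm(power_trace, r_th, c_th, dt, t_amb,
                 t_trigger, t_release, throttle=0.5):
    """Simulate a single-RC thermal model with a threshold-based DTM response.

    While DTM is engaged, the requested power is multiplied by `throttle`.
    Returns (temperatures, engaged_flags), one entry per time step.
    """
    delta = 0.0          # T_hot - T_amb
    engaged = False
    temps, flags = [], []
    for power in power_trace:
        temp = t_amb + delta
        if temp >= t_trigger:
            engaged = True       # trigger level reached: engage the response
        elif temp <= t_release:
            engaged = False      # cooled down enough: release
        p_eff = power * throttle if engaged else power
        delta += (p_eff / c_th) * dt - (delta / (r_th * c_th)) * dt
        temps.append(t_amb + delta)
        flags.append(engaged)
    return temps, flags


if __name__ == "__main__":
    trace = [60.0] * 6000  # assumed sustained 60 W workload, 600 s at dt = 0.1 s
    common = dict(r_th=0.8, c_th=100.0, dt=0.1, t_amb=45.0,
                  t_trigger=85.0, t_release=83.0)
    no_dtm, _ = run_with_dtm(trace, throttle=1.0, **common)
    with_dtm, flags = run_with_dtm(trace, throttle=0.5, **common)
    print(f"peak without DTM: {max(no_dtm):.1f} C")
    print(f"peak with DTM:    {max(with_dtm):.1f} C")
    print(f"fraction of time throttled: {sum(flags) / len(flags):.2f}")
```

The gap between the two peaks is what allows the cooling solution to be designed for a level below the worst case, at the price of some throttled time.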
Temperature-Aware circuit design
Power: first-order design constraint
max power consumption: limits power delivery
sustained power dissipation: limits thermal design/packaging
average active power and idle power consumption: limit battery life, etc.
fallacy: instantaneous power is a proxy for temperature (power ≠ temperature)
Power-aware design:
maximize performance for given power
Low-power design:
minimize power for required performance
Temperature-aware design:
performance, power, reliability: function of T
T function of power density, ambient T
maximize performance for given thermal envelope
related to Power Density
Performance and Leakage
Temperature (Berkeley PTM 70nm CMOS):
Transistor threshold and mobility
Subthreshold leakage, gate leakage
Ion, Ioff, delay
Temperature-aware circuits
Robustness constraint: sets Ion/Ioff ratio
Robustness and reliability: Ion/Igate ratio
70nm CMOS, 1.2V, 110°C
Ion/Ioff ~ 1000
Ion/Igate ~ 10000
Idea: keep ratio constant with T
Trade leakage for performance
Ref: Ghoshal et al. “Refrigeration Technologies…”, ISSCC 2000
Garrett et al. “T3…”, ISCAS 2001
Adaptive Ion/Ix control
Ion/Ioff = B/A = constant, maintained through adaptive body biasing (ABB)
Temperature-aware circuits (TAC) patent (2004)
Resulting voltages
Wide range: -0.4V < Vbb < 0.4V; 1.2V < Vdd < 1.3V
Almost linear
Robust to inter-die parameter variations
Needs trimming for setpoint
Margin for intra-die parameter variations
Active cooling or natural thermal landscape
Resulting performance
25% extra performance (110°C to 0°C) – only NMOS
13% from low temperature alone
Temperature-Aware SRAM
[Figure: SRAM array organization: decoders, wordlines, pre-charge, bitlines, bit cells with cell access transistors (N1), sense amps; array dimensions set by the number of entries, the data width of entries, and the number of ports]
Worst-case bitline leakage limits performance
SRAM Read time
Same circuit, different application
6T SRAM memory: “reverse application” (heating)
70nm process (200mV threshold)
Zero biasing at low temperature
SRAM bit-line sensing
Differential sensing (100mV bitline difference)
128 cells per bit line
Faster read even if higher RBB, smaller Ion
Electro-thermal simulations
Source: Jia Tzer Hsu, L. Vu-Quoc, “A rational formulation of thermal circuit models for electrothermal simulation. I. Finite element method [power electronic systems]”
IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications
Also need electro-thermal models
Source: S. Wunsche, C. Clauss, P. Schwarz, F. Winkler, “Electro-thermal circuit simulation using simulator coupling”
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Sep 1997
SOI circuits
Source: Wei Jin, Weidong Liu, S.K.H. Fung, P.C.H. Chan, Chenming Hu, “SOI thermal impedance extraction methodology and its significance for circuit simulation”
IEEE Transactions on Electron Devices, Apr 2001
Refrigeration
“conventional” vs. thermo-electric (TEC)
Can get T < T_amb (Rth < 0!)
TEC: Peltier effect (can use for local cooling)
TEC electro-thermal model
Sensors needed for run-time
Thermocouples – voltage output
Junction between wires of different materials; voltage at terminals is proportional to Tref – Tjunction
Often used for external measurements
Thermal diodes – voltage output
Biased p-n junction; voltage drop for a known current is temperature-dependent
Biased resistors (thermistors) – voltage output
Voltage drop for a known current is temperature dependent
You can also think of this as varying R
Example: 1 K metal “snake”
BiCMOS, CMOS – voltage or current output
Rely on a reference voltage or current generated from a band-gap reference circuit, or use simple ring oscillators with no reference
Relative (just need to adapt) vs. Absolute sensors (need actual T)
May need a Reference – typically a Bandgap circuit
Typical Sensor Configuration
PTAT – Proportional to Absolute Temperature
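One common way to obtain a PTAT voltage is the difference between the forward voltages of two junctions run at different current densities, ∆V = (kT/q) · ln(N). The transcript does not say which circuit the slide shows, so treat this as an assumed illustration; the density ratio N = 8 is likewise an example value.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann's constant over electron charge, V/K


def delta_vbe(temp_k, density_ratio):
    """PTAT voltage (V) from two junctions with a given current-density ratio."""
    return K_OVER_Q * temp_k * math.log(density_ratio)


def temperature_from_delta_vbe(dv, density_ratio):
    """Invert the PTAT relation to recover absolute temperature in kelvin."""
    return dv / (K_OVER_Q * math.log(density_ratio))


if __name__ == "__main__":
    dv = delta_vbe(300.0, density_ratio=8)
    print(f"dV at 300 K, N = 8: {dv * 1e3:.2f} mV")
    print(f"recovered T: {temperature_from_delta_vbe(dv, 8):.1f} K")
```

The output depends only on absolute temperature and a ratio, not on the absolute junction parameters, which is what makes this kind of sensor attractive when transistor parameters are poorly controlled.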
Absolute Sensor
Delta Vgs Current Reference Generator and Delay Cell
Source: Syal, Lee, Ivanov, Altet, Online Testing Workshop, 2001
Sensors: Problems
Poor control of CMOS transistor parameters
Noisy environment
Cross talk
Ground noise
Power supply noise
These can be reduced by making the sensor larger
This increases power dissipation
But we may want many sensors
Calibration
Accuracy vs. Precision
Analogous to mean vs. stdev
Calibration deals with accuracy
The main issue is to reduce inter-die variations in offset
Typically requires per-part testing and configuration
Basic idea: measure offset, store it, then subtract this
from dynamic measurements
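The measure/store/subtract idea can be sketched in a few lines. The readings and the 25 C reference below are made-up example values, and the function names are hypothetical; in practice the offset would be stored per part (for example in fuses or non-volatile memory).

```python
import statistics


def measure_offset(readings, reference_temp):
    """Offset = mean sensor reading minus the known reference temperature."""
    return statistics.mean(readings) - reference_temp


def calibrated(raw_reading, offset):
    """Subtract the stored offset from a run-time reading."""
    return raw_reading - offset


if __name__ == "__main__":
    # Assumed test-time readings from one part held at a known 25.0 C.
    readings = [27.1, 26.9, 27.0, 27.2, 26.8]
    offset = measure_offset(readings, reference_temp=25.0)
    corrected = [calibrated(r, offset) for r in readings]
    print(f"stored offset: {offset:.2f} C")
    print(f"mean after calibration: {statistics.mean(corrected):.2f} C")
    print(f"stdev before: {statistics.stdev(readings):.3f}, "
          f"after: {statistics.stdev(corrected):.3f}")
```

The mean moves onto the reference (accuracy improves) while the spread is unchanged (precision is not affected), matching the mean vs. stdev analogy.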
Recap: Thermal Design
Why should you care about thermals?
What do we mean by thermals?
How do you model thermals?
What can you do about thermals?
Temperature-aware circuit design
Thermal sensors
Questions?