C. Heterogeneous Integration
Neil Goldsman and Bruce Jacob
1.0 Introduction
Heterogeneous integration (HI) has great potential for yielding significantly improved
circuit and system performance over standard approaches. With HI, new materials, devices
and geometries are possible. Using HI, we can lay out complex circuits in three-dimensional (3D)
geometry. This new capability opens avenues for novel device, circuit and system designs,
ranging from new implementations of analog and digital circuits to the development of
improved optical sensors. In addition to designing systems that take advantage of HI, we
will develop methodologies to ensure that these designs are not limited by
electromagnetic coupling or heat dissipation. Below we describe our proposed research
projects that investigate and take advantage of HI’s capabilities.
2.0 HI Interconnect Test Structure Evaluation
One of the key aspects of the HI project is to transform from the current paradigm, where
electronic circuits are composed of numerous 2-dimensional (2D) individual chips on printed
circuit boards (PCB), into circuits where all the functionality is contained in a single 3D
integrated structure. This new paradigm will have the advantage of not being constrained by
the limitations of chip-to-chip communication. In existing 2D systems, signals that pass
between chips are often forced to operate at lower frequencies because of the
capacitance and inductance of the input/output (I/O) structures on integrated
circuits. By moving to 3D structures, we avoid this I/O problem: instead of communicating
between chips, signals propagate inside the self-contained 3D integrated structure, which can
lead to significantly faster circuit operating speeds. In our background investigation, we have
developed test structures that have been sent out for fabrication. These structures are
designed to indicate the improvements gained by moving to 3D systems. In the coming years
we plan to continue characterizing the improvements that 3D structures yield.
Our first major task will be to perform measurements on the HI interconnect test
vehicle structures. Specifically, we must characterize the 3D metal vias that make
connections vertically between different sub-circuits on the 3D structure. In addition to the
vias, we also have to characterize inter-metal parasitics. That is, we will determine the
intrinsic resistance, capacitance and inductance of the 3D interconnect network. We will first
perform DC analyses to determine the resistance of the metal interconnect network. The
resistance measurements will be followed by AC network analyzer-type measurements to
determine the intrinsic reactances of the test structure. After analyzing the HI interconnect
test vehicle, we will carefully compare its DC, AC and reliability results with those of
similar circuits constructed in the traditional 2D topology.
3D integration also allows more passive circuit elements to be integrated into the
structure. Currently, all but the smallest inductors and capacitors must be
fabricated off-chip. However, 3D inductors of much higher value and, perhaps, lower loss
should be realizable by taking advantage of HI. We plan to continue our design of 3D
inductors and transformers, including an investigation of optimal layout. In addition, we plan to
investigate designing 3D capacitive networks directly on the HI
integrated system, along with the possibility of using high-k dielectrics between layers to
maximize capacitance without having to resort to off-chip designs.
To date, we have designed interconnect and contact-pad structures and had them fabricated.
We have measured the effect of contact pads on IC performance, as
well as the coupling between interconnects on ICs. We have also designed on-chip inductors
with various planar and 3D structures, and have sent these designs to MOSIS for fabrication.
From the results of these tests, we should be able to assess the advantages of 3D
inductor and transformer structures, especially for RF ICs. From a theoretical point of view,
we have developed two different codes for modeling electromagnetic effects. One is a
complete 3D time-domain Maxwell equations solver; the other is a frequency domain solver
that is particularly aimed at designing 3D on-chip inductors. The time domain solver is based
on the finite-difference alternating-direction-implicit (FD-ADI) method. The advantage of
this method is that it is unconditionally stable, so the time step is not forced down by the
fine spatial resolution needed to capture the dimensions found in ICs. We have a publication
forthcoming on this approach to analyzing EM coupling in ICs. We plan to extend these
calculations to more complicated 3D interconnect geometries. Our modeling in the frequency
domain has so far been aimed at calculating the frequency dependence of on-chip inductors.
Next we plan to include the unwanted capacitive parasitics of these on-chip 3D inductors. We
also plan to extract RLC equivalent circuits from the results of the distributed
frequency-domain calculations for 3D ICs.
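One simple way to turn a frequency-domain impedance sweep into a lumped equivalent circuit is sketched below. It assumes the inductor can be represented as a series R-L branch shunted by a single parasitic capacitance C. R and L are read from the lowest-frequency sample, where the capacitor is negligible. C is derived from the self-resonant frequency, where Im(Z) crosses zero. The model topology, the synthetic "measured" sweep, and the component values are all assumptions for illustration. Real on-chip inductor models usually need additional substrate elements.

```python
import math


def z_model(f_hz, r_ohm, l_h, c_f):
    """Impedance of a series R-L branch in parallel with a parasitic capacitance."""
    w = 2 * math.pi * f_hz
    z_rl = complex(r_ohm, w * l_h)
    z_c = 1 / complex(0.0, w * c_f)
    return z_rl * z_c / (z_rl + z_c)


def extract_rlc(freqs, z):
    """Estimate (R, L, C, f_sr) from an impedance sweep in ascending frequency.

    R and L come from the first sample (assumed far below self-resonance,
    where Z ~ R + jwL). C comes from the self-resonant frequency f_sr, found by
    linear interpolation where Im(Z) goes from positive to non-positive, using
    f_sr ~ 1 / (2*pi*sqrt(L*C)).
    """
    r_est = z[0].real
    l_est = z[0].imag / (2 * math.pi * freqs[0])
    for k in range(1, len(freqs)):
        x1, x2 = z[k - 1].imag, z[k].imag
        if x1 > 0 >= x2:
            f1, f2 = freqs[k - 1], freqs[k]
            f_sr = f1 + (f2 - f1) * x1 / (x1 - x2)
            c_est = 1 / ((2 * math.pi * f_sr) ** 2 * l_est)
            return r_est, l_est, c_est, f_sr
    raise ValueError("no self-resonance found in the sweep")


# Synthetic sweep, 10 MHz to 20 GHz, for an assumed 2 ohm / 5 nH / 100 fF inductor.
freqs = [1e7 * k for k in range(1, 2001)]
sweep = [z_model(f, 2.0, 5e-9, 100e-15) for f in freqs]
r, l, c, f_sr = extract_rlc(freqs, sweep)
print(f"R={r:.3f} ohm  L={l * 1e9:.3f} nH  C={c * 1e15:.1f} fF  f_sr={f_sr / 1e9:.2f} GHz")
```

The extracted R, L and C can then be written directly into a SPICE subcircuit. A finer sweep near resonance would tighten the estimate of C.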
3.0 Analog and Mixed Signal System Design
3.1 RF circuit improvements with HI
HI presents the possibility of integrating novel devices or circuit structures that exploit
geometrical possibilities beyond simple planar processing. As mentioned in
Section 2.0, 3D integration makes it possible to create higher-quality, higher-value
inductors and, by extension, transformers on-chip. These new structures may prove to be
extremely beneficial for analog integrated circuits, which rely heavily on passive elements.
Currently, with standard 2D fabrication methodologies, passive elements on ICs are limited
to small capacitors and inductors. With 3D integration, these limits can be removed and
much larger capacitors, inductors and even transformers may be attainable. With these new
circuit elements, significant strides could be made in analog and mixed signal circuit designs.
This should be especially evident in communication-type high-frequency circuits, which rely
heavily on impedance matching and tuning techniques that require high-Q passive
components. We plan to design and prototype these systems and demonstrate how HI can be
used to improve their performance.
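Two textbook relations show why high-Q passives matter for tuning. The resonant frequency of an LC tank is f0 = 1/(2π√(LC)). The quality factor of an inductor with series loss is Q = 2πf0L/R, and the tank's 3 dB bandwidth is roughly f0/Q. The sketch assumes the inductor's series resistance is the only loss, and the component values are hypothetical.

```python
import math


def resonant_frequency(l_h, c_f):
    """LC resonance f0 = 1 / (2*pi*sqrt(L*C)) in hertz."""
    return 1 / (2 * math.pi * math.sqrt(l_h * c_f))


def inductor_q(f_hz, l_h, r_series_ohm):
    """Quality factor of an inductor with series loss: Q = 2*pi*f*L / R."""
    return 2 * math.pi * f_hz * l_h / r_series_ohm


# Hypothetical 10 nH / 1 pF tank, comparing a lossy and a lower-loss inductor.
f0 = resonant_frequency(10e-9, 1e-12)
for r_loss in (5.0, 1.0):
    q = inductor_q(f0, 10e-9, r_loss)
    print(f"R={r_loss} ohm: f0={f0 / 1e9:.3f} GHz, Q={q:.0f}, bandwidth~{f0 / q / 1e6:.1f} MHz")
```

Cutting the assumed loss from 5 Ω to 1 Ω raises Q from 20 to 100 and narrows the tuned bandwidth by the same factor. This is the kind of selectivity that matching and tuning networks depend on.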
Receiver architectures differ in their methods of demodulation and in how they
address noise and interference; several will be investigated in the context
of HI. Points worth examining include whether HI can achieve better in-phase/quadrature
matching and noise suppression in receiver architectures. We will implement
receiver architectures at both the PCB and IC levels and use measurements to
identify problem points and viable improvements. Another obvious application is to
determine whether 3D integration can be used to increase attainable circuit speeds with a given
device process. Similarly, the increased geometrical freedom of 3D integration serves as a
starting point for exploring new layouts for A/D and D/A converters, which form the
backbone of many mixed signal systems. Finally, we will explore mixed signal applications
in which the 3D structure facilitates shielding that keeps digital noise from interfering
with low-noise analog circuits.
In our background investigation, we have designed, had fabricated and tested several
key RF subsystems, including voltage controlled oscillators (VCO) and phase-locked loops
(PLL). These subsystems are ubiquitous, fundamental blocks in RF communication ICs.
These chips were designed and fabricated using a 0.5-micron CMOS process. Our
experimental results indicated an extremely large amount of coupling between
analog and digital blocks on these ICs: the digital coupling reduced the signal-to-noise
ratio by as much as 37 dB. We plan to develop a model to predict this coupling, and
then investigate how moving to a 3D system will improve the signal-to-noise ratio.
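To put the 37 dB figure in perspective, decibels can be converted to linear ratios. A level difference in dB corresponds to a power ratio of 10^(dB/10). For voltages across equal impedances, it corresponds to an amplitude ratio of 10^(dB/20). The short helpers below just perform that arithmetic.

```python
def db_to_power_ratio(db):
    """Linear power ratio corresponding to a level difference in dB."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db):
    """Linear voltage (amplitude) ratio, assuming equal impedances."""
    return 10 ** (db / 20)


snr_loss_db = 37
print(f"{snr_loss_db} dB = {db_to_power_ratio(snr_loss_db):.0f}x in power, "
      f"{db_to_amplitude_ratio(snr_loss_db):.1f}x in amplitude")
```

A 37 dB loss therefore corresponds to roughly a 5000-fold reduction in the signal-to-noise power ratio, or about 71 times in amplitude.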
3.2 Optical sensor design utilizing HI
The geometrical flexibility of 3D integration can be exploited for constructing higher-fill-factor, possibly faster-scanning optical sensors with more of the fundamental signal
processing shifted closer to the light-sensitive components. We plan to employ CMOS active
pixel technology to develop the optical image sensor. We will begin with standard designs
that are realizable using the MOSIS facility. Such a design consists of a photodiode
connected to a simple CMOS amplifier circuit within a single pixel element. The amplified
photo-signal provides higher contrast relative to neighboring pixels. We
plan to go beyond standard designs to perform signal processing directly at the pixel level.
More specifically, we plan to map the pixel array directly onto a numerical grid where
computations, such as differentiation, can be performed in place. By performing the signal
processing directly at the pixel level, we will be able to pack more
computational power into a single integrated sensor system IC. At later stages, we also hope
to use device modeling to design our own photodiodes that are specifically optimized for
optical sensing in the HI environment.
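The sketch below illustrates differentiation performed directly on the pixel grid. A pixel is marked as an edge when the sum of the absolute forward differences to its right and lower neighbors exceeds a threshold. Each output depends only on a pixel and its immediate neighbors, which is the kind of local operation a per-pixel processor could perform in parallel. The |dx| + |dy| gradient approximation and the threshold are assumptions, not the edge-detection rule used in the actual design.

```python
def edge_map(img, threshold):
    """Return a 0/1 map of pixels whose |dx| + |dy| exceeds threshold.

    img is a list of equal-length rows of pixel intensities. dx and dy are
    forward differences to the right and lower neighbours; pixels on the
    last column/row use 0 for the missing difference.
    """
    rows, cols = len(img), len(img[0])
    edges = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            dx = img[i][j + 1] - img[i][j] if j + 1 < cols else 0
            dy = img[i + 1][j] - img[i][j] if i + 1 < rows else 0
            if abs(dx) + abs(dy) > threshold:
                edges[i][j] = 1
    return edges


# A 10 x 10 test image: dark left half, bright right half (a vertical edge).
image = [[0] * 5 + [100] * 5 for _ in range(10)]
for row in edge_map(image, threshold=50):
    print("".join("#" if e else "." for e in row))
```

Every row prints `....#.....`, marking the column where the intensity steps from dark to bright.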
In our background investigation, we began designing a focal plane array that can be
accessed in parallel by taking advantage of 3D implementation. Each pixel in the array is
directly connected to a processing unit. We have designed processing units for optical edge
detection at the pixel level for this parallel 3D implementation. The processor was designed
in the high-level hardware description language Verilog. The pixel processor was
simulated using Cadence, and synthesized using Cadence and Synopsys tools and a very
basic set of gates from the Mississippi State University design library. (A
synthesizer generates a gate-level netlist from a design description and a set of
libraries.) This design was then extended to compute edge detection on a 100-pixel sensor
array (10 pixels × 10 pixels), which was also synthesized using the above tools. We
plan to test and validate the design on a Xilinx field programmable gate array (FPGA) before
committing it to an ASIC. We have also designed a pipelined analog-to-digital
converter to transform the pixel voltage into a digital signal. Once it is tested, we plan to develop
the parallel pixel array, and then interface it to the aforementioned processor in a 3D
structure.
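The principle of a pipelined analog-to-digital converter can be illustrated with an idealized software model. Each stage resolves one bit by comparing its input with half the reference. It then passes on the amplified residue 2v − bit·Vref to the next stage. Because every stage works on a different sample in the same clock cycle, one finished code emerges per clock after an initial latency. The sketch assumes ideal 1-bit stages, inputs in [0, Vref), and no sample-and-hold, comparator-offset or gain errors. Practical designs often use 1.5-bit stages with digital error correction instead. The stage count and reference are hypothetical, not those of the converter described above.

```python
def adc_stage(v, vref):
    """One ideal 1-bit stage: compare with vref/2, then amplify the residue by 2.

    For 0 <= v < vref the residue 2*v - bit*vref again lies in [0, vref).
    """
    bit = 1 if v >= vref / 2 else 0
    return bit, 2 * v - bit * vref


def pipelined_adc(samples, n_stages, vref=1.0):
    """Clock samples through n_stages stages; one new sample enters per clock.

    regs[k] holds (residue, partial code) latched after stage k, or None for
    an empty slot. Every stage works on a different sample in the same clock,
    so after an initial latency of n_stages clocks one code emerges per clock.
    """
    regs = [None] * n_stages
    codes = []
    for x in list(samples) + [None] * n_stages:  # trailing bubbles flush the pipe
        if regs[-1] is not None:
            codes.append(regs[-1][1])  # the last stage holds a finished code
        inputs = [None if x is None else (x, 0)] + regs[:-1]
        new_regs = [None] * n_stages
        for k, item in enumerate(inputs):
            if item is not None:
                residue, code = item
                bit, next_residue = adc_stage(residue, vref)
                new_regs[k] = (next_residue, (code << 1) | bit)
        regs = new_regs
    return codes


print(pipelined_adc([0.3, 0.75, 0.0], n_stages=8))
```

With eight assumed stages this prints `[76, 192, 0]`, the 8-bit codes floor(v · 256) for each input.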
4.0 Prevention of Potential Limitations to HI
4.1 Thermal Dissipation
3D integration is likely to give rise to much higher densities of active and passive
devices. In addition, these devices will be less likely to be in physical contact with the
external environment. As a result, we have to concern ourselves with the heat generated
during the operation of 3D circuits. Because this heat may be more difficult to dissipate,
thermal issues could offset the performance gains that 3D circuits otherwise provide.
We plan to investigate thermal dissipation properties both experimentally and
theoretically. We plan to fabricate high-density 3D test vehicles that are composed of active
elements, and monitor their operation as a function of the power dissipated by the circuit. In
addition to electrical measurements at the terminals, we plan to take infrared data as well.
Our experimental analysis will be complemented by theoretical analysis. The theoretical
analysis at the device level will evaluate the heat produced and conducted during
device operation, and then calculate the resulting temperature profile inside a single device.
(These calculations will focus on MOSFETs.) This will be achieved by numerically solving a
lattice heat flow equation in conjunction with the semiconductor equations within a
MOSFET, requiring the development of new numerical methods for solving coupled systems
of differential equations. These calculations will be compared with experiments, and used to
predict maximum 3D packing densities in HI systems.
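The heat-flow half of that coupled problem can be illustrated in isolation. The sketch below solves the steady-state one-dimensional heat equation −k d²T/dx² = q. A uniform volumetric heat source q stands in for Joule heating, and both ends are held at a fixed temperature. It uses central differences and the tridiagonal (Thomas) algorithm. Uniform q, constant thermal conductivity, the 1D geometry and the numerical values are all simplifying assumptions. The proposed device-level model couples this equation to the semiconductor equations in more dimensions.

```python
def solve_heat_1d(n, length_m, k_w_mk, q_w_m3, t_edge_k):
    """Steady-state -k d2T/dx2 = q on [0, length] with T = t_edge at both ends.

    Central differences on n interior nodes give the tridiagonal system
    -T[i-1] + 2*T[i] - T[i+1] = q*h^2/k, solved with the Thomas algorithm.
    Returns the node positions and temperatures.
    """
    h = length_m / (n + 1)
    sub = [-1.0] * n
    diag = [2.0] * n
    sup = [-1.0] * n
    rhs = [q_w_m3 * h * h / k_w_mk] * n
    rhs[0] += t_edge_k   # known boundary temperatures move to the right-hand side
    rhs[-1] += t_edge_k
    # Forward elimination.
    for i in range(1, n):
        m = sub[i] / diag[i - 1]
        diag[i] -= m * sup[i - 1]
        rhs[i] -= m * rhs[i - 1]
    # Back substitution.
    temps = [0.0] * n
    temps[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        temps[i] = (rhs[i] - sup[i] * temps[i + 1]) / diag[i]
    xs = [(i + 1) * h for i in range(n)]
    return xs, temps


# Assumed values: 100 um span, k = 150 W/(m K) (roughly silicon), q = 1e12 W/m^3.
xs, temps = solve_heat_1d(49, 100e-6, 150.0, 1e12, 300.0)
print(f"peak temperature rise: {max(temps) - 300.0:.2f} K")
```

For this constant-source case the exact profile is the parabola T = T0 + q·x(L − x)/(2k). The central-difference solution reproduces it at the nodes, which makes it a convenient check before moving to the coupled, multidimensional problem.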
We plan to also predict heat dissipation on the circuit level. Using experimental and
numerical studies on a single device as well as entire circuits, we will develop SPICE models
that relate supply voltage levels and operating frequency to power dissipation and operating
temperatures. We will also develop register level models that similarly predict power
dissipation and operating temperatures given the array of IC components such as caches,
register files, ALUs, and control logic; their implementation parameters (size, width, ports,
etc.); and the expected application workload. These models can then be cast as
lumped algebraic device descriptions and incorporated into SPICE. The resulting SPICE
circuit simulations will then relate terminal characteristics to ambient temperature levels
within HI circuits. The models can also be incorporated into higher-level CAD tools for the
thermal optimization of IC designs.
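The sketch below shows the kind of lumped relation such models encode. It combines the standard CMOS switching-power estimate P = αCV²f with a single junction-to-ambient thermal resistance. The activity factor, switched capacitance, clock frequency and thermal resistances are hypothetical. The larger thermal resistance only illustrates that a stacked die with a poorer heat path runs hotter at the same power. Leakage power, which itself depends on temperature, is ignored.

```python
def switching_power(activity, c_switched_f, vdd_v, freq_hz):
    """Average CMOS dynamic power P = alpha * C * Vdd^2 * f, in watts."""
    return activity * c_switched_f * vdd_v ** 2 * freq_hz


def steady_temperature(t_ambient_c, r_th_k_per_w, power_w):
    """Lumped steady-state estimate T = T_ambient + R_th * P."""
    return t_ambient_c + r_th_k_per_w * power_w


p = switching_power(activity=0.1, c_switched_f=1e-9, vdd_v=2.5, freq_hz=200e6)
for label, r_th in (("planar (assumed 40 K/W)", 40.0), ("stacked (assumed 80 K/W)", 80.0)):
    print(f"{label}: P = {p:.3f} W, T = {steady_temperature(25.0, r_th, p):.1f} C")
```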
In our background investigation, we developed a modeling method that calculates the
temperature distribution in an integrated circuit directly from the details of Joule heating in a
semiconductor device. A paper describing the method has been accepted for conference
publication. The following figure shows results for the theoretical modeling, given uniform
transistor density across the IC and uniformly random distribution of switching activity
across the transistors.
Figure 4.1.1: Temperature Gradients Within an Integrated Circuit
We plan to extend our theoretical modeling work so that it accounts for more details of the
actual chip functional units, as well as the materials used in chip packaging and
interconnects. We then plan to extend the method for use in chips thinned and stacked for 3D
integration.
Our experimental work will extend the theoretical work by measuring the transistor
density for different classes of on-chip structures. Then we will measure the operating
temperature of these structures while the chip is running for a range of application
benchmarks. For example, the following figure shows an initial chip design that is currently
in fabrication through MOSIS. We connect a vector source (a linear feedback shift register)
to a variety of 32-bit adder implementations (carry lookahead, fast carry lookahead, Brent-Kung,
and ripple) and will characterize each implementation in terms of its transistor density
and operating temperature.
Figure 4.1.2: Experimental Test Circuit for Initial Temperature Study
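A software model of this test setup can help check expected adder outputs before silicon measurements. The sketch below drives 32-bit operands from a 16-bit Galois LFSR with tap mask 0xB400, a known maximal-length choice with period 65535. The operands feed two adder models: a ripple-carry chain of full adders, and a parallel-prefix carry-lookahead formulation in Kogge-Stone style. The LFSR width, the seed, and the way operands are assembled from two LFSR states are assumptions. The sketch models logic only. It says nothing about transistor density or temperature, and it does not reproduce the specific adder circuits on the test chip.

```python
MASK32 = 0xFFFFFFFF


def lfsr16_step(state):
    """One step of a 16-bit Galois LFSR with tap mask 0xB400."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state


def lfsr_operands(seed, count):
    """Yield `count` (a, b) pairs of 32-bit words, each built from two LFSR states."""
    state = seed
    for _ in range(count):
        words = []
        for _ in range(2):
            state = lfsr16_step(state)
            hi = state
            state = lfsr16_step(state)
            words.append((hi << 16) | state)
        yield words[0], words[1]


def ripple_carry_add(a, b, width=32):
    """Chain of full adders, least significant bit first. Returns (sum, carry_out)."""
    total, carry = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def prefix_carry_add(a, b, width=32):
    """Carry-lookahead via a parallel-prefix (Kogge-Stone style) combine.

    After log2(width) bitwise combine steps, bit i of big_g is the group
    generate of bits [0, i], i.e. the carry out of position i.
    """
    mask = (1 << width) - 1
    g, p = a & b, a ^ b
    big_g, big_p = g, p
    span = 1
    while span < width:
        big_g |= big_p & (big_g << span)  # uses big_p from the previous step
        big_p &= big_p << span
        span <<= 1
    carries_in = (big_g << 1) & mask
    return (p ^ carries_in) & mask, (big_g >> (width - 1)) & 1


for a, b in lfsr_operands(seed=0xACE1, count=1000):
    expected = ((a + b) & MASK32, (a + b) >> 32)
    assert ripple_carry_add(a, b) == expected
    assert prefix_carry_add(a, b) == expected
print("both adder models match integer addition on 1000 LFSR vectors")
```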
Later chips will serve to characterize register files, SRAM-based caches, flip-flops,
latches, busses, other ALU and datapath logic such as shifters, multipliers, etc., and various
types of control logic.
We plan to implement two basic types of experimental techniques. One will be based
on infrared imaging of chips to ascertain hot spots during operation. The other method will
be to embed thermal measuring devices directly into the chip (on-chip thermistors, etc.). The
experiments will help guide the theoretical model. Finally, once a very reliable numerical
model is established, we plan to provide design data as to how to best place the circuit
subsystems on the actual chip in an effort to minimize on-chip heating, in both 2D and 3D
systems. In addition, we will explore the use and exact positioning of thermal conduits and
contacts that can rapidly transfer the generated heat to the outside world.
4.2 Interconnect Cross-Coupling
3D HI systems are likely to have considerable potential for cross-talk between
conducting lines. Cross-talk arises from the capacitive network formed between different
metals in the circuit, an unwanted or parasitic capacitive web. This web of self-
and mutual capacitances induces voltages on metal lines that are connected to high
impedances. From a circuit point of view, this could result in spurious high voltage levels,
which would cause unwanted switching of circuit elements.
We have begun to write software to extract the capacitance matrix of an arbitrary
rectangular geometry in which metals of different sizes are scattered in an oxide region. This
software is based on the solution of the Poisson equation in 2 and 3 dimensions. Once we
have the capacitance matrix, it can be used with any voltage bias distribution to calculate the
coupling of electric fields and induced voltages between conductors. We plan to continue the
development of this software by relating it to specific circuit layout topologies. We plan to
apply the software to HI circuits to ensure that we produce 3D layouts that minimize the
possibilities for disruptive cross-coupling effects. We also plan to model the effect of
interlayer shielding, which may be helpful for reducing parasitic coupling, especially in
mixed signal circuits. Furthermore, the elements of the matrix provide the SPICE values for
the parasitic capacitors, so they can be used directly for circuit simulation. In
essence, using differential-equation-based modeling, we can extract lumped circuit
elements for efficient analysis and design of the new circuit topologies realized in HI.
This project is especially complementary to the interconnect test structure evaluation
described in Section 2.0.
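Once a capacitance matrix is available, the voltage induced on a floating (high-impedance) line follows from charge conservation. With the Maxwell capacitance matrix C, the conductor charges are Q = CV. A floating conductor f that carries no net charge therefore satisfies Σ_j C_fj V_j = 0, so V_f = −Σ_{j≠f} C_fj V_j / C_ff. The sketch below builds C from assumed ground and mutual capacitances for three parallel lines. It then computes the voltage coupled onto the middle line when one neighbor is driven high. All values are hypothetical, and only a single floating conductor is handled.

```python
def maxwell_matrix(c_ground, c_mutual):
    """Build the Maxwell capacitance matrix from per-line ground capacitances
    and a symmetric table of line-to-line (mutual) capacitances."""
    n = len(c_ground)
    cmat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        cmat[i][i] = c_ground[i] + sum(c_mutual[i][j] for j in range(n) if j != i)
        for j in range(n):
            if j != i:
                cmat[i][j] = -c_mutual[i][j]
    return cmat


def floating_voltage(cmat, voltages, f):
    """Voltage on floating conductor f (zero net charge), given the driven
    voltages of every other conductor (voltages[f] is ignored)."""
    coupled = sum(cmat[f][j] * voltages[j] for j in range(len(cmat)) if j != f)
    return -coupled / cmat[f][f]


# Hypothetical three parallel lines, values in fF: line 1 floats between 0 and 2.
c_ground = [10.0, 10.0, 10.0]
c_mutual = [[0.0, 20.0, 2.0],
            [20.0, 0.0, 20.0],
            [2.0, 20.0, 0.0]]
cmat = maxwell_matrix(c_ground, c_mutual)
v_victim = floating_voltage(cmat, [2.5, 0.0, 0.0], f=1)
print(f"voltage induced on the floating line: {v_victim:.2f} V")
```

Under these assumed values, a 2.5 V transition on one neighbor pulls the floating line to 1.0 V, a capacitive divider of 20 fF against the victim's 50 fF total.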
5.0 HI Integrated Circuit and Digital System Design
5.1 Development of EMI-Resistant Digital Systems Using HI
Today, due to the high switching speeds and low voltage levels used in high-performance microprocessors, modern designs must deal explicitly with signal integrity
issues. The trends of high-performance computing are such that tomorrow’s computers will
be even more susceptible to signal integrity problems: tomorrow’s designs will use lower
voltage levels, faster clock speeds, and smaller devices. Each of these trends contributes
independently to signal integrity problems:

• As voltage levels decrease, the noise levels required to corrupt a signal also
decrease.
• As clock periods decrease, the portion of a cycle during which a signal is in
transition (as opposed to being stable) becomes significant, and it has
been shown that signals are most susceptible to upset during transition.
• As device dimensions decrease, devices operate using fewer electrons and are
thus more easily upset.
Recently, we have created a physical prototype of an architecture that would
withstand arbitrary numbers of device upsets in the CPU. The prototype implements a
checkpoint/rollback scheme in hardware. The scheme stores fundamental CPU state
periodically to a “safe storage” chip implemented with larger, slower devices using larger
voltage swings. The scheme requires enormous off-chip bandwidth to avoid heavy
performance penalties, which is where HI comes in. Without HI, the scheme would only be
feasible with optical interconnects. The following figure illustrates the prototype’s physical
organization.
(Figure blocks: Debugging Interface; RISC-based CPU; Data and Control Signals; Safe Storage Area)
Figure 5.1.1. System-level View of the Fault-tolerant System
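The checkpoint/rollback idea can be illustrated in a few lines of software. In the sketch below, a hypothetical processor state is copied every `interval` instructions to a stand-in for the safe-storage chip. When an upset is flagged, the state is restored from the last checkpoint and execution resumes from there. The workload, the checkpoint interval, and the assumption that every upset is detected immediately are all illustrative. The sketch does not model the prototype's actual detection mechanism, state format, or bandwidth.

```python
class SafeStore:
    """Stand-in for the safe-storage chip: holds the last committed state."""

    def __init__(self):
        self.snapshot = None

    def commit(self, state):
        self.snapshot = dict(state)

    def restore(self):
        return dict(self.snapshot)


def execute_one(state):
    """Hypothetical one-instruction workload: acc += pc, then advance pc."""
    state["acc"] = (state["acc"] + state["pc"]) & 0xFFFFFFFF
    state["pc"] += 1


def run_with_rollback(n_instructions, interval, upset_at):
    """Run n_instructions, checkpointing every `interval` instructions.

    upset_at lists instruction counts at which an upset is assumed to be
    detected; each one triggers a single rollback to the last checkpoint.
    """
    state = {"pc": 0, "acc": 0}
    store = SafeStore()
    store.commit(state)
    pending = set(upset_at)
    rollbacks = 0
    while state["pc"] < n_instructions:
        execute_one(state)
        if state["pc"] in pending:
            pending.discard(state["pc"])
            state["acc"] ^= 0x40          # the upset corrupts the live state...
            state = store.restore()       # ...which is discarded on rollback
            rollbacks += 1
        elif state["pc"] % interval == 0:
            store.commit(state)           # trusted state: checkpoint it
    return state, rollbacks


final, rollbacks = run_with_rollback(100, interval=10, upset_at=[15, 57])
print(final, rollbacks)
```

Despite two upsets, the final state matches an upset-free run. The cost is re-executing the instructions since the last checkpoint, which is why checkpoint frequency, and hence off-chip bandwidth, matters.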
The following is a die photo of the Safe-Storage chip. The chip was fabricated by
MOSIS in AMI’s 0.5 µm technology and measures 2.2 mm per side (received from MOSIS
in fall/winter 2002).
Figure 5.1.2. Safe Store Die Photo
The CPU chip has recently been fabricated through MOSIS in TSMC’s 0.25 µm
process technology (received from MOSIS spring 2003). The physical design is shown in
the figure below.
Figure 5.1.3. Robust-CPU Physical Design
We propose to continue the work on the EMI-resistant architecture and fully
characterize its behavior. We will compare two implementations: one planar, one 3D. Each
requires a high-bandwidth connection between processor and memory; due to the physical
limitations of a planar design, a planar chip may not even be able to support the number of
pins that the fault-tolerance mechanism requires. Even assuming that one could build a
planar design with the required number of pins, compared to the 3D implementation, the
planar implementation will require either significant complexity in routing wires or a simpler
bus structure with lower performance. The two designs will be compared to quantify the
effects on die area and performance.
We will also compare the effectiveness of the approach to various shielding
technologies, including an application of HI to build a Faraday cage around a processor and
its accompanying cache subsystem.
5.2 Development of High-Performance Systems Using HI
This sub-task investigates novel integration and architecture techniques for building
high-performance systems. 3D integration can allow the integration of high-density caches
and memory interconnects into a CPU package.
We plan to build a prototype 3D interconnect between a microprocessor and its cache
subsystem. The space conservation potential of HI technology can be exploited to improve
the speed performance of circuits as well. To achieve this end, we are investigating a 3D
CPU+SRAM processor design. We will model a system that spans three dimensions, with
the system’s caches, and the CPU’s pipeline, register file, and other structures extending over
multiple chips connected via 3D integration. The goal is to ultimately compare the design
complexity, area, and performance of a 3D implementation (via MOSIS and integrated
through LPS) to a similar planar design built with traditional technologies (via MOSIS).
To illustrate, Figure 5.2.1 shows two implementations of a cache structure.
Figure 5.2.1. Planar and 3D Cache Implementations
On the left is a typical planar structure, composed of an array of cache blocks and a
set of sense amplifiers at the bottom that read values stored in the cache blocks. The access
time of the cache structure is proportional to the length plus the width of the cache’s physical
layout, which represents the longest distance that a signal must travel during a cache read or
write. As shown on the right, using 3D integration, the same amount of storage can fit into a
smaller physical volume and thus have a lower access time, or at the same access time one
can have a much larger cache.
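The length-plus-width argument can be made concrete with a simple proxy. Assume a cache of n equally sized blocks arranged as a square array in each layer. The longest planar signal path then scales with 2√(n/layers) block pitches, and the sketch adds a hypothetical fixed cost per vertical layer crossing. This is a geometric illustration only. Real access time also depends on decoder, wordline and sense-amplifier delays.

```python
import math


def access_path(n_blocks, layers=1, via_cost_per_layer=0.0):
    """Length + width (in block pitches) of one layer's square array of blocks,
    plus an assumed cost per vertical layer crossing."""
    side = math.sqrt(n_blocks / layers)
    return 2 * side + via_cost_per_layer * (layers - 1)


print(access_path(64))                                    # planar, 64 blocks
print(access_path(64, layers=4))                          # same capacity, 4 layers
print(access_path(64, layers=4, via_cost_per_layer=0.5))  # with assumed via cost
print(access_path(256, layers=4))                         # 4x capacity, 4 layers
```

Under these assumptions, spreading 64 blocks over four layers halves the path from 16 to 8 block pitches, or 9.5 with the assumed via cost. Alternatively, four layers can hold 256 blocks at the planar 64-block path length.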
5.3 Simultaneous Switching Noise (Ground Bounce), and HI Mitigation
Figure 5.3.1, taken from Toshiba, shows measurements of a high-speed DRAM
interface (FCRAM, or Fast Cycle DRAM), with measurements taken at the DRAM’s pins
(top) and measurements taken at the memory controller’s pins (bottom).
Figure 5.3.1. Ground Bounce Effects in Toshiba DRAM System
The figure illustrates quite clearly the effect of simultaneous switching noise, also
called ground bounce. The voltage levels of the I/O power and ground planes (VDDQ and VSSQ) are
plotted with the data pins (DQ0-15). One can see the effect on these planes when
multiple data pins switch in the same direction at the same time: the voltage levels of power
and ground (VDD and VSS) are pulled far from their nominal values when the I/O drivers
switch from high to low or low to high. This phenomenon can significantly alter the timing
of inter-chip communications, since it is the voltage level represented by the VDD and VSS
networks that is driven onto the bus and compared to a reference at the receiving side. If
either of the transmitting chip’s power or ground planes is pulled beyond the threshold voltage, the
receiver chip will be unable to correctly identify the transmitted value until the transmitter’s
planes settle.
Modern systems solve the problem by adding dedicated sets of power/ground pins for
high-speed I/O. Note that there are five sets of ground planes and 16 data I/O lines. In this
memory system, there is a power/ground pair for roughly every three to four data pins; there
is not one common set of ground planes for all devices. Yet, even with this small number of
devices attached to each dedicated set of power and ground networks, there is significant
switching noise and significant ground bounce effects.
We believe that many of these effects can be eliminated with heterogeneous
integration. Using HI techniques to integrate system structures such as CPUs, caches, and
DRAMs, one can significantly reduce the size of the output pads, eliminate bond wires and
package pins altogether, and make PCB traces unnecessary.
Therefore one can implement chip-to-chip communications with parasitics that are at least an
order of magnitude less significant than in traditional multi-chip implementations. We
propose to build test structures into the prototypes described in sections 5.1 and 5.2 that will
allow us to obtain direct measurements of these values for the systems under test.
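A first-order estimate of simultaneous switching noise treats the shared power/ground return as a single inductance L. That inductance carries the current of all N drivers that switch together, giving ΔV ≈ N·L·di/dt. The sketch compares an assumed package-level inductance with one ten times smaller, reflecting the order-of-magnitude reduction in parasitics anticipated above. The driver count follows the three to four data pins per power/ground pair noted earlier. The current swing, edge rate and inductances are hypothetical.

```python
def ground_bounce(n_drivers, l_return_h, delta_i_a, t_edge_s):
    """First-order SSN estimate: dV = N * L * di/dt with a linear current ramp."""
    return n_drivers * l_return_h * delta_i_a / t_edge_s


# Assumed: 4 drivers per power/ground pair, 20 mA swing per driver in 1 ns.
for label, l_ret in (("bond wire + pin (assumed 2 nH)", 2e-9),
                     ("HI vertical path (assumed 0.2 nH)", 0.2e-9)):
    print(f"{label}: {ground_bounce(4, l_ret, 20e-3, 1e-9) * 1e3:.0f} mV")
```

Because the estimate is linear in L, a tenfold reduction in return-path inductance cuts the bounce tenfold, here from 160 mV to 16 mV.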
6.0 Heterogeneous Integration of Embedded Systems-on-Chip
This section proposes a new direction of research for heterogeneous integration,
targeting embedded systems. Embedded systems integrate heterogeneous technologies into
the same system in a seamless whole. They typically incorporate a digital processor running
dedicated control or signal processing software; in addition, a typical embedded system has
any number of analog and/or MEMS devices that serve as sensors and actuators for the
system. The components can be either tightly coupled (e.g. all on the same circuit board) or
loosely coupled (e.g. in a controller area network, CAN), or some combination of the two.
Embedded systems are distinguished from other computing systems by several
characteristics:

• Their requirements
• Their expectations
• Their proliferation
Embedded systems have requirements that they do not share with “regular”
computing systems. For instance, they must have predictable, regular performance, and they
must be extremely cost-effective to build and maintain (saving several pennies in a design
can make the difference between success and failure of the product).
The expectations that we have of embedded systems are that they interact with the
real world in a safe, predictable manner, and that they interact in real-time. Their behavior
must be thoroughly verified because they are used in applications in which downtime and
failures are not allowed—embedded systems are typically found in applications in which
either producing an incorrect response at the right time or producing a correct response at the
wrong time can cause loss of life and/or property.
Embedded systems are far more numerous than other computing systems. Today,
embedded systems are by far the leading consumer of microprocessors: 99% of all
microprocessors are sold to manufacturers of embedded systems, whereas only 1% are
sold to OEMs of servers, desktops, and laptops.
Furthermore, the obvious trend in modern-day systems (including embedded systems)
is toward higher degrees of integration. Managing this integration will become the single
most important issue facing the embedded systems industry if it is not addressed soon.
To illustrate the problem, consider the modern VLSI design flow. It is an extremely
structured, regular flow, characterized by rigid design rules that, if followed, help to
guarantee that an implementation will work as specified. Thus, errors are confined to those
of specification (as opposed to errors of implementation or integration). The practical result
is that one can specify a design at a high level (e.g., register transfer level, RTL) and hand off
that specification to automated design tools that produce a working semiconductor part.
Because of the design rules and structured design methodology, one can verify that the part
will work before the part is fabricated; the part can be verified during the design phase, as
opposed to being verified after physical implementation.
We contrast this with the typical embedded-systems design flow: whereas the VLSI
design flow is characterized by having rigid design rules, the embedded-systems design flow
is characterized by having no real design rules. Most implementation and verification is
done by hand in an ad-hoc manner. Components are built to specification by hand and tested
in isolation. Final system verification is not performed until the entire system is built, once
all of the heterogeneous components are integrated. Verification cannot be performed during
the design phase; it must be performed after implementation.
Though this practice suffices at the current levels of integration and sophistication of
embedded applications, it will certainly not scale to tomorrow’s levels of integration wherein
digital components and analog components and MEMS components will all co-exist on the
same substrate. We propose investigating the issues involved in such integration with a view
toward guaranteeing reliability and predictability of performance. Clearly, the EMI-resistant
architecture of section 5.1 is within the scope of this work. The following sections describe
two more projects within the scope.
6.1 Integration of Digital, Analog, and MEMS Components
This sub-task focuses on the issues of integrating heterogeneous technologies on the
same substrate. For instance, modeling digital components as white-noise producers can
provide a worst-case estimate of the interaction between digital components and nearby
analog components. We will model this theoretically and verify the results experimentally.
For the primary deliverable, we propose to model and, if possible, build a system-on-chip
that integrates a low-power digital CPU core, an analog RF transmitter, and a MEMS power
supply. Dr. Reza Ghodssi can provide our group with the power supply, which he believes
will generate 1 mW, more than enough to run an ultra-low-power digital processor (e.g., the
MSP430 from Texas Instruments is a 16-bit RISC processor that runs on 250 µA).
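The power margin can be checked with P = V·I. The supply voltage is not stated above, so the sketch below assumes a few plausible values. It also ignores converter losses and the RF transmitter's own consumption.

```python
def power_w(supply_v, current_a):
    """Electrical power drawn by a load, P = V * I."""
    return supply_v * current_a


BUDGET_W = 1e-3    # power the MEMS supply is expected to deliver
I_CPU_A = 250e-6   # processor current quoted above
for vdd in (1.8, 2.2, 3.0):  # assumed supply voltages
    p = power_w(vdd, I_CPU_A)
    print(f"{vdd} V: {p * 1e3:.2f} mW, margin {(BUDGET_W - p) * 1e3:.2f} mW")
```

At each assumed voltage the processor draws between 0.45 mW and 0.75 mW, leaving some of the 1 mW budget for the remaining components.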
6.2 Development of High-Performance Control Systems
This sub-task focuses on the development of microprocessor architectures that are
better suited to the requirements of embedded systems than are the architectures in use today.
Existing architectures are suited for high performance systems; once the architectures
become obsolete in the high performance arena they still have life left: they are sold for
orders of magnitude less in the embedded systems arena. As a rule, these architectures do
not support the requirements of reliable, predictable, real-time performance but instead give
very good average performance with possibly large standard deviations. For instance, they
perform speculative operations of all sorts: they use caches, they use branch prediction, some
use dynamically scheduled execution. All of the operations cause the processor to behave in
ways that make static timing analysis of the system impossible; thus, these mechanisms are
inimical to embedded systems development. To better support the requirements of embedded
systems, architectures should provide hardware that is under software control at all times
(e.g., software-managed caches), and they should offer hardware assists to turn variable-length
execution into constant-length execution (e.g., on-chip task databases for simple
search and sort operations). We will model and build these architectures.
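A task-selection search illustrates the difference between variable-length and constant-length execution. Both functions below assume a hypothetical task table ordered from highest to lowest priority, and both return the first ready task. The early-exit scan takes a data-dependent number of steps. The fixed-length scan always visits every slot, so its step count is the same whatever the table contains, which is what makes its timing statically predictable. This is only a software model of the guarantee such a hardware assist would provide. It is not a description of the proposed architecture, and the step counter stands in for cycle count.

```python
def first_ready_early_exit(ready):
    """Variable-length scan of a priority-ordered table (highest first):
    stops at the first ready task. Returns (index or -1, steps taken)."""
    steps = 0
    for idx, is_ready in enumerate(ready):
        steps += 1
        if is_ready:
            return idx, steps
    return -1, steps


def first_ready_fixed_length(ready):
    """Fixed-length scan: visits every slot, remembering the first ready one,
    so the step count equals the table size regardless of its contents."""
    found, steps = -1, 0
    for idx, is_ready in enumerate(ready):
        steps += 1
        if is_ready and found < 0:
            found = idx
    return found, steps


for table in ([True] + [False] * 7, [False] * 7 + [True], [False] * 8):
    print(first_ready_early_exit(table), first_ready_fixed_length(table))
```

Both versions select the same task. Only the fixed-length version takes the same number of steps (eight) for every table, so its worst case equals its best case.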