C. Heterogeneous Integration

Neil Goldsman and Bruce Jacob

1.0 Introduction

Heterogeneous integration (HI) has great potential for yielding significantly better circuit and system performance than standard approaches. With HI, new materials, devices and geometries become possible, and complex circuits can be laid out in 3-dimensional (3D) geometry. This capability opens avenues for novel device, circuit and system designs, ranging from new implementations of analog and digital circuits to improved optical sensors. In addition to designing systems that take advantage of HI, we will develop methodologies to ensure that these designs are not limited by electromagnetic coupling and heat dissipation. Below we describe our proposed research projects that investigate and take advantage of HI’s capabilities.

2.0 HI Interconnect Test Structure Evaluation

One of the key aims of the HI project is to move from the current paradigm, in which electronic circuits are composed of numerous 2-dimensional (2D) individual chips on printed circuit boards (PCBs), to circuits in which all the functionality is contained in a single 3D integrated structure. The new paradigm is not constrained by the limitations of chip-to-chip communication. In existing 2D systems, signals that travel between chips are often forced to operate at lower frequencies because of the capacitance and inductance of the input/output (I/O) structures present on integrated circuits. In a 3D structure, signals instead propagate inside the self-contained integrated structure, which avoids the I/O problem and can lead to significantly faster circuit operating speeds.

In our background investigation, we have developed test structures that have been sent out for fabrication.
These structures are designed to indicate the improvements gained by moving to 3D systems. In the coming years we plan to continue characterizing the improvements that 3D structures yield.

Our first major task will be to perform measurements on the HI interconnect test vehicle structures. Specifically, we must characterize the 3D metal vias that make vertical connections between different sub-circuits in the 3D structure. In addition to the vias, we must characterize inter-metal parasitics; that is, we will determine the intrinsic resistance, capacitance and inductance of the 3D interconnect network. We will first perform DC analyses to determine the resistance of the metal interconnect network. The resistance measurements will be followed by AC network-analyzer measurements to determine the intrinsic reactances of the test structure. After analyzing the HI interconnect test vehicle, we will carefully compare its DC, AC and reliability results with those of similar circuits constructed in the traditional 2D topology.

3D integration also allows more passive circuit elements to be integrated into the structure. Currently, all but the smallest inductors and capacitors must be fabricated off-chip. However, 3D inductors of much higher value and, perhaps, lower loss should be realizable by taking advantage of HI. We plan to continue our design of 3D inductors and transformers and to investigate their optimal layout. In addition, we plan to investigate the possibility of designing 3D capacitive networks directly on the HI integrated system, along with the possibility of using high-K dielectrics between layers to maximize capacitance without resorting to off-chip designs.

In our accomplishments so far, we have designed and had fabricated interconnect and contact pad structures. We have measured the effect of contact pads on IC performance, as well as the coupling between interconnects on ICs.
We have also designed on-chip inductors with different planar and 3D structures, and have sent these designs to MOSIS for fabrication. From the results of these tests, we should be able to extrapolate the advantages of 3D inductor and transformer structures, especially for RF ICs.

From a theoretical point of view, we have developed two different codes for modeling electromagnetic effects. One is a complete 3D time-domain Maxwell equations solver; the other is a frequency-domain solver aimed particularly at designing 3D on-chip inductors. The time-domain solver is based on the finite-difference alternating-direction-implicit (FD-ADI) method. The advantage of this method is that it allows numerical solutions at the spatial resolutions typical of dimensions found in ICs. We have a publication forthcoming on this approach to analyzing EM coupling in ICs. We plan to extend these calculations to more complicated 3D interconnect geometries. Our frequency-domain modeling has so far been aimed at calculating the frequency dependence of on-chip inductors. Next we plan to include the unwanted capacitive parasitics of these on-chip 3D inductors. We also plan to extract RLC equivalent circuits from the results of the distributed frequency-domain calculations for 3D ICs.

3.0 Analog and Mixed Signal System Design

3.1 RF circuit improvements with HI

HI presents the possibility of integrating novel devices or circuit structures with expanded geometrical possibilities beyond simple planar processing. As mentioned in Section 2.0, 3D integration makes it possible to create better-quality and higher-value inductors and, by extension, transformers on-chip. These new structures may prove extremely beneficial for analog integrated circuits, which rely heavily on passive elements. Currently, with standard 2D fabrication methodologies, passive elements on ICs are limited to small capacitors and inductors.
With 3D integration, these limits can be removed, and much larger capacitors, inductors and even transformers may be attainable. With these new circuit elements, significant strides could be made in analog and mixed-signal circuit design. This should be especially evident in high-frequency communication circuits, which rely heavily on impedance matching and tuning techniques that require high-Q passive components. We plan to design and prototype these systems and demonstrate how HI can be used to improve their performance.

Receiver architectures differ in their methods of demodulation and in how they address noise and interference, and these differences will be investigated in the context of HI. Several points worth examining are the attainability of better in-phase/quadrature matching and of noise suppression in receiver architectures. We will investigate 2 receiver architectures, implemented at both the PCB and IC levels, and measure them to identify problem points and viable improvements. Another obvious application is to determine whether 3D integration can increase attainable circuit speeds for a given device process. Similarly, the increased geometrical freedom of 3D integration serves as a starting point for exploring new layouts for A/D and D/A converters, which form the backbone of many mixed-signal systems. Finally, we will explore mixed-signal applications in which the 3D structure facilitates shielding, so that digital noise does not interfere with low-noise analog circuits.

In our background investigation, we have designed, had fabricated and tested several key RF subsystems, including voltage-controlled oscillators (VCOs) and phase-locked loops (PLLs). These subsystems are ubiquitous, fundamental blocks in RF communication ICs. These chips were designed and fabricated using a 0.5-micron CMOS process.
Our experimental results indicated an extremely large amount of coupling between the analog and digital blocks on these ICs. We found that the digital coupling reduced the signal-to-noise ratio by as much as 37 dB. We plan to develop a model to predict this coupling, and then to investigate how moving to a 3D system will improve the signal-to-noise ratio.

3.2 Optical sensor design utilizing HI

The geometrical flexibility of 3D integration can be exploited to construct optical sensors with a higher fill factor and possibly faster scanning, with more of the fundamental signal processing shifted closer to the light-sensitive components. We plan to employ CMOS active pixel technology to develop the optical image sensor. We will begin with standard designs that are realizable using the MOSIS facility. Such a design consists of a photodiode connected to a simple CMOS amplifier circuit within a single pixel element. The amplified photo-signal thereby provides higher contrast relative to neighboring pixels. We plan to go beyond standard designs and perform signal processing directly at the pixel level. More specifically, we plan to map the pixel array directly onto a numerical grid on which computations, such as differentiation, can be performed directly. Performing the signal processing at the pixel level will allow us to pack more computational power into a single integrated sensor system IC. At later stages, using device modeling, we also hope to design our own photodiodes that are specifically optimized for optical sensing in the HI environment.

In our background investigation, we began designing a focal plane array that can be accessed in parallel by taking advantage of 3D implementation. Each pixel in the array is directly connected to a processing unit. We have designed processing units for optical edge detection at the pixel level for this parallel 3D implementation.
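
As a rough illustration of differentiation on a pixel grid, the following Python sketch computes a finite-difference gradient at every pixel of a small array and flags pixels where it exceeds a threshold. The function name, the difference stencil, the sum-of-absolute-values magnitude and the threshold are all assumptions made for illustration; the edge-detection operator actually used in the processing units is not specified here.

```python
def edge_map(image, threshold):
    """Flag pixels whose finite-difference gradient exceeds a threshold.

    image is a list of equal-length rows of numeric pixel values.
    Each pixel reads only its four immediate neighbours, mimicking a
    processing unit attached to every pixel. Neighbour indices are
    clamped at the borders, which gives one-sided differences there.
    Returns a grid of 0/1 flags with the same shape as image.
    """
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left = image[r][max(c - 1, 0)]
            right = image[r][min(c + 1, cols - 1)]
            up = image[max(r - 1, 0)][c]
            down = image[min(r + 1, rows - 1)][c]
            gx = right - left  # horizontal difference (unscaled)
            gy = down - up     # vertical difference (unscaled)
            # |gx| + |gy| is an assumed, hardware-friendly magnitude.
            if abs(gx) + abs(gy) > threshold:
                edges[r][c] = 1
    return edges


# Hypothetical 10 x 10 test frame: dark left half, bright right half.
frame = [[0] * 5 + [100] * 5 for _ in range(10)]
result = edge_map(frame, threshold=50)
```

For this test frame the sketch flags the two pixel columns on either side of the brightness step in every row and leaves the rest of the array unflagged.
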
The processor was designed using the high-level hardware description language Verilog. The pixel processor was simulated using Cadence, and synthesized using Cadence and Synopsys tools and a very basic set of gates available from the Mississippi State University design library. (A synthesizer generates a gate-level netlist of a design, given the design and a set of libraries.) This design was then extended to compute edge detection on a 100-pixel sensor array (10 pixels x 10 pixels), and the extended design was synthesized using the same tools. We plan to test and validate the design on a Xilinx field-programmable gate array (FPGA) before committing it to an ASIC. We have also designed a pipelined analog-to-digital converter to transform the pixel voltage into a digital signal. Once it is tested, we plan to develop the parallel pixel array and then interface it to the aforementioned processor in a 3D structure.

4.0 Prevention of Potential Limitations to HI

4.1 Thermal Dissipation

3D integration is likely to give rise to much higher densities of active and passive devices. In addition, these devices will be less likely to be in physical contact with the external environment. As a result, we must concern ourselves with the heat generated during the operation of 3D circuits. Because this heat may be more difficult to dissipate, thermal issues may arise that offset the performance gains of 3D circuits.

We plan to investigate thermal dissipation properties both experimentally and theoretically. We plan to fabricate high-density 3D test vehicles composed of active elements, and to monitor their operation as a function of the power dissipated by the circuit. In addition to electrical measurements at the terminals, we plan to take infrared data. Our experimental analysis will be complemented by theoretical analyses.
At the device level, the theoretical analysis will evaluate the heat produced and conducted during device operation, and then calculate the resulting temperature profile inside a single device. (These calculations will focus on MOSFETs.) This will be achieved by numerically solving a lattice heat flow equation in conjunction with the semiconductor equations within a MOSFET, which requires the development of new numerical methods for solving coupled systems of differential equations. These calculations will be compared with experiments and used to predict maximum 3D packing densities in HI systems.

We also plan to predict heat dissipation at the circuit level. Using experimental and numerical studies of single devices as well as entire circuits, we will develop SPICE models that relate supply voltage levels and operating frequency to power dissipation and operating temperatures. We will also develop register-level models that similarly predict power dissipation and operating temperatures, given the array of IC components (such as caches, register files, ALUs, and control logic), their implementation parameters (size, width, ports, etc.), and the expected application workload. These models can then be expressed as lumped algebraic device descriptions and incorporated into SPICE. The resulting SPICE circuit simulations will then relate terminal characteristics to ambient temperature levels within HI circuits. The models can also be incorporated into higher-level CAD tools for the thermal optimization of IC design.

In our background investigation, we developed a modeling method that calculates the temperature distribution in an integrated circuit directly from the details of Joule heating in a semiconductor device. A paper describing the method has been accepted for conference publication.
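
To give a feel for the kind of calculation involved, the following Python sketch solves a much simpler problem than the coupled device model described above: the steady-state 1D heat conduction equation, kappa * T'' + H = 0, with a uniform heat source and both ends held at ambient temperature. This is not the modeling method referred to in the text. The function name, the 1D geometry, the boundary conditions and all numerical values (including a thermal conductivity of roughly 150 W/(m K), a typical room-temperature figure for silicon) are assumptions chosen for illustration.

```python
def steady_state_temperature(n, length, kappa, heat, t_ambient):
    """Solve kappa * T'' + heat = 0 on [0, length], T = t_ambient at both ends.

    n is the number of interior grid points. The equation is discretized
    with second-order central differences and the resulting tridiagonal
    system is solved with the Thomas algorithm. Returns n + 2 temperatures,
    including the two boundary nodes.
    """
    h = length / (n + 1)
    # Discrete equation at node i: -T[i-1] + 2*T[i] - T[i+1] = heat*h^2/kappa
    rhs = [heat * h * h / kappa] * n
    rhs[0] += t_ambient    # known left boundary value moved to the right side
    rhs[-1] += t_ambient   # known right boundary value moved to the right side
    diag = [2.0] * n
    # Forward elimination; sub- and super-diagonal entries are all -1.
    for i in range(1, n):
        m = -1.0 / diag[i - 1]
        diag[i] -= m * (-1.0)
        rhs[i] -= m * rhs[i - 1]
    # Back substitution.
    temps = [0.0] * n
    temps[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        temps[i] = (rhs[i] + temps[i + 1]) / diag[i]
    return [t_ambient] + temps + [t_ambient]


# Hypothetical values: 1 mm slab, 1e9 W/m^3 uniform heating, 300 K ambient.
profile = steady_state_temperature(99, 1e-3, 150.0, 1e9, 300.0)
peak_rise = max(profile) - 300.0
```

With these assumed values the peak temperature rise occurs at the midpoint and is roughly 0.83 K, in agreement with the analytical result H * L^2 / (8 * kappa) for this simplified problem.
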
The following figure shows results of the theoretical modeling, given uniform transistor density across the IC and a uniformly random distribution of switching activity across the transistors.

Figure 4.1.1: Temperature Gradients Within an Integrated Circuit

We plan to extend our theoretical modeling work so that it accounts for more details of the actual chip functional units, as well as the materials used in chip packaging and interconnects. We then plan to extend the method for use in chips thinned and stacked for 3D integration.

Our experimental work will extend the theoretical work by measuring the transistor density for different classes of on-chip structures. We will then measure the operating temperature of these structures while the chip is running a range of application benchmarks. For example, the following figure shows an initial chip design that is currently in fabrication through MOSIS. We connect a vector source (a linear feedback shift register) to a variety of 32-bit adder implementations (carry lookahead, fast carry lookahead, Brent-Kung, and ripple) and will characterize each implementation in terms of its transistor density and operating temperature.

Figure 4.1.2: Experimental Test Circuit for Initial Temperature Study

Later chips will serve to characterize register files, SRAM-based caches, flip-flops, latches, busses, other ALU and datapath logic such as shifters and multipliers, and various types of control logic.

We plan to implement two basic types of experimental techniques. One will be based on infrared imaging of chips to locate hot spots during operation. The other will be to embed thermal measuring devices (on-chip thermistors, etc.) directly into the chip. The experiments will help guide the theoretical model.
Finally, once a very reliable numerical model is established, we plan to provide design data on how best to place the circuit subsystems on the chip to minimize on-chip heating, in both 2D and 3D systems. In addition, we will explore the use and exact positioning of thermal conduits and contacts that can rapidly transfer the generated heat to the outside world.

4.2 Interconnect Cross-Coupling

3D HI systems are likely to have considerable potential for cross-talk between conducting lines. Cross-talk arises from the capacitive network formed between different metals in the circuit, an unwanted, parasitic capacitive web. This web of self and mutual capacitances induces voltages on metal lines that are connected to high impedances. From a circuit point of view, this could result in spurious high voltage levels, which would cause unwanted switching of circuit elements.

We have begun to write software to extract the capacitance matrix of an arbitrary rectangular geometry in which metals of different sizes are scattered in an oxide region. This software is based on the solution of the Poisson equation in 2 and 3 dimensions. Once we have the capacitance matrix, it can be used with any voltage bias distribution to calculate the coupling of electric fields and the induced voltages between conductors. We plan to continue developing this software by relating it to specific circuit layout topologies, and to apply it to HI circuits to ensure that we produce 3D layouts that minimize the possibility of disruptive cross-coupling effects. We also plan to model the effect of interlayer shielding, which may help reduce parasitic coupling, especially in mixed-signal circuits. Furthermore, the elements of the matrix provide the SPICE values for the parasitic capacitors, so these values can be used directly for circuit simulation.
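
The two uses of the capacitance matrix mentioned above can be sketched in a few lines of Python. The sketch assumes the matrix is in Maxwell form (Q_i = sum over j of C[i][j] * V_j, with negative off-diagonal entries), which is one common convention; the function names and the 3-conductor matrix are hypothetical and are not output of the extraction software described here.

```python
def floating_voltage(cap_matrix, voltages, floating):
    """Voltage induced on a floating conductor carrying zero net charge.

    cap_matrix is in Maxwell form: Q_i = sum_j cap_matrix[i][j] * V_j.
    voltages holds the applied voltages; the entry at index `floating`
    is ignored. Setting Q_floating = 0 and solving for its voltage gives
    V_f = -sum_{j != f} C[f][j] * V_j / C[f][f].
    """
    row = cap_matrix[floating]
    coupled = sum(row[j] * voltages[j]
                  for j in range(len(row)) if j != floating)
    return -coupled / row[floating]


def spice_capacitors(cap_matrix):
    """Convert a Maxwell matrix into lumped two-terminal capacitors.

    Returns (to_ground, mutual): to_ground[i] is the capacitor from
    conductor i to the reference node (the row sum), and mutual[(i, j)]
    is the capacitor between conductors i and j (minus the off-diagonal).
    """
    n = len(cap_matrix)
    to_ground = [sum(cap_matrix[i]) for i in range(n)]
    mutual = {(i, j): -cap_matrix[i][j]
              for i in range(n) for j in range(i + 1, n)}
    return to_ground, mutual


# Hypothetical 3-conductor Maxwell matrix, values in femtofarads.
C = [[10.0, -4.0, -1.0],
     [-4.0, 12.0, -4.0],
     [-1.0, -4.0, 10.0]]

# Conductor 0 switches to 1.0 V, conductor 2 stays at 0 V, conductor 1 floats.
v_induced = floating_voltage(C, [1.0, 0.0, 0.0], floating=1)
to_ground, mutual = spice_capacitors(C)
```

For this assumed matrix the floating line is pulled to one third of the aggressor's swing, which is the kind of spurious level that could trigger unwanted switching. The lumped values returned by the second function are what would be written into a SPICE netlist as parasitic capacitors.
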
In essence, using differential-equation-based modeling, we can obtain extracted lumped circuit elements for efficient circuit analysis and design of the new circuit topologies realized in HI. This project is especially complementary to the interconnect test structure evaluation described in Section 2.0.

5.0 HI Integrated Circuit and Digital System Design

5.1 Development of EMI-Resistant Digital Systems Using HI

Today, because of the high switching speeds and low voltage levels used in high-performance microprocessors, modern designs must deal explicitly with signal integrity issues. The trends in high-performance computing are such that tomorrow’s computers will be even more susceptible to signal integrity problems: tomorrow’s designs will use lower voltage levels, faster clock speeds, and smaller devices. Each of these trends contributes independently to signal integrity problems. As voltage levels decrease, the noise levels required to corrupt a signal also decrease. As clock periods decrease, the portion of a cycle during which a signal is in transition (as opposed to being stable) becomes very significant, and it has been shown that signals are most susceptible to upset during transition. As device dimensions decrease, devices operate using fewer electrons and are thus more easily upset.

Recently, we have created a physical prototype of an architecture that would withstand arbitrary numbers of device upsets in the CPU. The prototype implements a checkpoint/rollback scheme in hardware. The scheme periodically stores fundamental CPU state to a “safe storage” chip implemented with larger, slower devices using larger voltage swings. The scheme requires enormous off-chip bandwidth to avoid heavy performance penalties, which is where HI comes in: without HI, the scheme would be feasible only with optical interconnects. The following figure illustrates the prototype’s physical organization.
[Figure labels: Debugging Interface; RISC-based CPU; Data and Control Signals; Safe Storage Area]

Figure 5.1.1. System-level View of the Fault-tolerant System

The following is a die photo of the Safe-Storage chip. The chip was fabricated by MOSIS in AMI’s 0.5 µm technology and measures 2.2 mm per side (received from MOSIS fall/winter 2002).

Figure 5.1.2. Safe Store Die Photo

The CPU chip has recently been fabricated through MOSIS in TSMC’s 0.25 µm process technology (received from MOSIS spring 2003). The physical design is shown in the figure below.

Figure 5.1.3. Robust-CPU Physical Design

We propose to continue the work on the EMI-resistant architecture and fully characterize its behavior. We will compare two implementations: one planar, one 3D. Each requires a high-bandwidth connection between processor and memory; because of the physical limitations of a planar design, a planar chip may not even be able to support the number of pins that the fault-tolerance mechanism requires. Even assuming that one could build a planar design with the required number of pins, the planar implementation will, compared with the 3D implementation, require either significant complexity in routing wires or a simpler bus structure with lower performance. The two designs will be compared to quantify the effects on die area and performance. We will also compare the effectiveness of the approach with that of various shielding technologies, including an application of HI to build a Faraday cage around a processor and its accompanying cache subsystem.

5.2 Development of High-Performance Systems Using HI

This sub-task investigates novel integration and architecture techniques for building high-performance systems. 3D integration can allow high-density caches and memory interconnects to be integrated into a CPU package. We plan to build a prototype 3D interconnect between a microprocessor and its cache subsystem.
The space-saving potential of HI technology can also be exploited to improve circuit speed. To this end, we are investigating a 3D CPU+SRAM processor design. We will model a system that spans three dimensions, with the system’s caches and the CPU’s pipeline, register file, and other structures extending over multiple chips connected via 3D integration. The ultimate goal is to compare the design complexity, area, and performance of a 3D implementation (fabricated via MOSIS and integrated through LPS) with those of a similar planar design built with traditional technologies (via MOSIS). To illustrate, Figure 5.2.1 shows two implementations of a cache structure.

Figure 5.2.1. Planar and 3D Cache Implementations

On the left is a typical planar structure, composed of an array of cache blocks and a set of sense amplifiers at the bottom that read values stored in the cache blocks. The access time of the cache structure is proportional to the length plus the width of the cache’s physical layout, which represents the longest distance that a signal must travel during a cache read or write. As shown on the right, with 3D integration the same amount of storage can fit into a smaller physical volume and thus have a lower access time; alternatively, at the same access time one can have a much larger cache.

5.3 Simultaneous Switching Noise (Ground Bounce), and HI Mitigation

Figure 5.3.1, taken from Toshiba, shows measurements of a high-speed DRAM interface (FCRAM, or Fast Cycle DRAM), taken at the DRAM’s pins (top) and at the memory controller’s pins (bottom).

Figure 5.3.1. Ground Bounce Effects in Toshiba DRAM System

The figure illustrates quite clearly the effect of simultaneous switching noise, also called ground bounce. The voltage levels of the power and ground planes (VDDQ and VSSQ) are plotted with the data pins (DQ0-15).
One can see the effect on the power and ground planes when multiple data pins switch in the same direction at the same time: the voltage levels of power and ground (VDD and VSS) are pulled far from their normal values when the I/O drivers switch from high to low or low to high. This phenomenon can significantly alter the timing of inter-chip communications, since it is the voltage level represented by the VDD and VSS networks that is driven onto the bus and compared to a reference at the receiving side. If either of the transmitting chip’s power or ground planes is pulled beyond the threshold voltage, the receiver chip will be unable to correctly identify the token until the transmitter’s planes settle.

Modern systems address the problem by adding dedicated sets of power/ground pins for high-speed I/O. Note that there are five sets of power/ground planes and 16 data I/O lines. In this memory system, there is a power/ground pair for roughly every three to four data pins; there is not one common set of ground planes for all devices. Yet, even with this small number of devices attached to each dedicated set of power and ground networks, there are significant switching noise and ground bounce effects.

We believe that many of these effects can be eliminated with heterogeneous integration. By using HI techniques to integrate system structures such as CPUs, caches, and DRAMs, one can significantly reduce the size of the output pads and eliminate the bond wires and package pins altogether, and PCB traces become unnecessary. One can therefore implement chip-to-chip communications with parasitics that are at least an order of magnitude less significant than in traditional multi-chip implementations. We propose to build test structures into the prototypes described in Sections 5.1 and 5.2 that will allow us to obtain direct measurements of these values for the systems under test.
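
A first-order feel for why smaller parasitics help can be had from the standard textbook estimate of simultaneous switching noise, V = N * L * dI/dt, where N drivers share a supply path of inductance L. The Python sketch below applies that estimate; it is a generic approximation, not a model taken from this proposal, and the function name and every numerical value (driver count, inductances, current step, edge time) are assumptions chosen only for illustration.

```python
def ground_bounce(n_drivers, inductance_h, delta_i_a, edge_time_s):
    """First-order simultaneous switching noise estimate.

    V = N * L * (dI / dt), with N drivers switching in the same direction
    through a shared supply inductance L, each changing its current by
    delta_i_a amperes over edge_time_s seconds.
    """
    return n_drivers * inductance_h * delta_i_a / edge_time_s


# Hypothetical case: 4 drivers per ground pin, each switching 10 mA in 1 ns.
# 5 nH stands in for a bond wire plus package pin; 0.5 nH stands in for an
# integrated connection with parasitics one order of magnitude smaller.
packaged = ground_bounce(4, 5e-9, 10e-3, 1e-9)
integrated = ground_bounce(4, 0.5e-9, 10e-3, 1e-9)
```

Under these assumptions the estimate gives about 0.2 V of bounce for the packaged case and about 0.02 V for the integrated case. Because the estimate is linear in L, an order-of-magnitude reduction in shared inductance translates directly into an order-of-magnitude reduction in bounce.
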
6.0 Heterogeneous Integration of Embedded Systems-on-Chip

This section proposes a new direction of research for heterogeneous integration, targeting embedded systems. Embedded systems integrate heterogeneous technologies into a seamless whole. They typically incorporate a digital processor running dedicated control or signal processing software; in addition, a typical embedded system has any number of analog and/or MEMS devices that serve as sensors and actuators for the system. The components can be tightly coupled (e.g., all on the same circuit board), loosely coupled (e.g., in a controller area network, CAN), or some combination of the two.

Embedded systems are distinguished from other computing systems by several characteristics: their requirements, their expectations, and their proliferation.

Embedded systems have requirements that they do not share with “regular” computing systems. For instance, they must have predictable, regular performance, and they must be extremely cost-effective to build and maintain (saving several pennies in a design can make the difference between success and failure of the product).

The expectations that we have of embedded systems are that they interact with the real world in a safe, predictable manner, and that they interact in real time. Their behavior must be thoroughly verified because they are used in applications in which downtime and failures are not allowed: embedded systems are typically found in applications in which either producing an incorrect response at the right time or producing a correct response at the wrong time can cause loss of life and/or property.

Embedded systems are far more numerous than other computing systems. Today, embedded systems are by far the leading consumer of microprocessors: 99% of all microprocessors are sold to manufacturers of embedded systems, whereas only 1% are sold to OEMs of servers, desktops, and laptops.
There are many more embedded systems in the world than there are desktops, laptops, and servers.

Furthermore, the obvious trend in modern-day systems (including embedded systems) is toward higher degrees of integration. This will become the single most important issue facing the embedded systems industry if it is not addressed soon. To illustrate the problem, consider the modern VLSI design flow. It is an extremely structured, regular flow, characterized by rigid design rules that, if followed, help to guarantee that an implementation will work as specified. Thus, errors are confined to those of specification (as opposed to errors of implementation or integration). The practical result is that one can specify a design at a high level (e.g., register transfer level, RTL) and hand off that specification to automated design tools that produce a working semiconductor part. Because of the design rules and structured design methodology, one can verify that the part will work before it is fabricated; the part can be verified during the design phase, as opposed to being verified after physical implementation.

We contrast this with the typical embedded-systems design flow: whereas the VLSI design flow is characterized by rigid design rules, the embedded-systems design flow is characterized by having no real design rules. Most implementation and verification is done by hand in an ad hoc manner. Components are built to specification by hand and tested in isolation. Final system verification is not performed until the entire system is built, that is, once all of the heterogeneous components are integrated. Verification cannot be performed during the design phase; it must be performed after implementation.
Though this practice suffices at the current levels of integration and sophistication of embedded applications, it will certainly not scale to tomorrow’s levels of integration, in which digital, analog, and MEMS components will all co-exist on the same substrate. We propose investigating the issues involved in such integration with a view toward guaranteeing reliability and predictability of performance. Clearly, the EMI-resistant architecture of Section 5.1 is within the scope of this work. The following sections describe two more projects within that scope.

6.1 Integration of Digital, Analog, and MEMS Components

This sub-task focuses on the issues of integrating heterogeneous technologies on the same substrate. For instance, modeling digital components as white-noise producers can provide a worst-case estimate of the interaction between digital components and nearby analog components. We will model this theoretically and verify the results experimentally. For the primary deliverable, we propose to model and, if possible, build a system-on-chip that integrates a low-power digital CPU core, an analog RF transmitter, and a MEMS power supply. Dr. Reza Ghodssi can provide our group with the power supply, which he believes will generate 1 mW, more than enough to run an ultra-low-power digital processor (e.g., the MSP430 from Texas Instruments is a 16-bit RISC processor that runs off 250 µA).

6.2 Development of High-Performance Control Systems

This sub-task focuses on the development of microprocessor architectures that are better suited to the requirements of embedded systems than are the architectures in use today. Existing architectures are suited for high-performance systems; once the architectures become obsolete in the high-performance arena they still have life left: they are sold for orders of magnitude less in the embedded systems arena.
As a rule, these architectures do not support the requirements of reliable, predictable, real-time performance but instead give very good average performance with possibly large standard deviations. For instance, they perform speculative operations of all sorts: they use caches, they use branch prediction, and some use dynamically scheduled execution. All of these operations cause the processor to behave in ways that make static timing analysis of the system impossible; thus, these mechanisms are inimical to embedded systems development. To better support the requirements of embedded systems, architectures should provide hardware that is under software control at all times (e.g., software-managed caches), and they should offer hardware assists that turn variable-length execution into constant-length execution (e.g., on-chip task databases for simple search and sort operations). We will model and build these architectures.