Reliability & Failure Analysis
Yeoh Lai Seng
Sr. Failure Analysis & Reliability Engineer
Fairchild Semiconductor (M) Sdn. Bhd.

Reliability

What is Reliability?
• A total of over 2,300,000 blocks of limestone and granite were used in the construction of the Great Pyramid, with the average block weighing 2.5 tons and none weighing less than 2 tons. The large blocks used in the ceiling of the King's Chamber weigh as much as 9 tons.
• The estimated total weight of the structure is 6.5 million tons!
• Construction date (estimated): 2589 B.C.
• Construction time (estimated): 20 years.
Questions: What is the reliability of the Great Pyramid (a) for 4000 years? (b) for 6000 years?

Semiconductor Reliability Engineering: An Industry Perspective
Reliability theory developed apart from the mainstream of probability and statistics. It was originally a tool to help nineteenth-century maritime and life insurance companies compute profitable rates to charge their customers. Today, the same reliability concepts are applied in the semiconductor industry.

"Semiconductor reliability engineering" refers to the development of technology, processes, and standards that ensure the reliability of semiconductor devices in application. It encompasses a broad set of engineering disciplines aimed at continuously improving the reliability of every device.

"Reliability" is defined as the ability of a device to conform to its electrical and visual/mechanical specifications over a specified period of time, under specified conditions, at a specified confidence level. Reliability engineering employs a wide variety of reliability tests to drive continuous improvement throughout the entire life cycle of the device: from design, to manufacturing, to usage, and even after failure.
Since it is usually more difficult to improve the reliability of a semiconductor device after it has been released, as much effort as possible must go into designing units that are inherently reliable. This concept is known as "Designing for Reliability" (DFR). It consists of following all known design rules for making a device reliable, not only electrically but also visually and mechanically. These design rules must be updated regularly to reflect the best known practices for maximum reliability. Building reliability into a device as early as the design phase is a must, especially now that semiconductor devices reach obsolescence more quickly than in previous years.

Once an integrated circuit has been designed and first silicon comes out, reliability tests are performed at wafer level to assess the reliability of the die. This is known as wafer-level reliability testing. Any reliability issue identified at this level must be corrected, since it will also show up at package level. Diligently following DFR greatly reduces the likelihood of wafer-level problems. If the new circuit passes wafer-level reliability testing, the wafer is assembled into its intended package, and the packaged device then undergoes package-level reliability testing.

Package-level reliability testing assesses the overall reliability of the device in packaged form. Sets of packaged samples are exposed to different stress conditions and then tested for any degradation caused by the stress. Since reliability stresses are often destructive, only a sample of the population is tested. The reliability of the rest of the population is therefore assessed statistically and probabilistically.
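Because only a sample is stressed, the sample size and the number of failures determine what can be claimed about the whole population. One common way to quantify this, not taken from these slides, is the binomial zero-failure ("success-run") relation C = 1 − R^n, which links the number of units n that pass a test with no failures to the reliability R demonstrated at confidence C. The Python sketch below assumes independent units drawn from a large population and zero observed failures; the function names are hypothetical.

```python
import math


def zero_failure_sample_size(reliability: float, confidence: float) -> int:
    """Smallest n that, with zero failures, demonstrates `reliability` at `confidence`.

    Success-run relation: C = 1 - R**n  =>  n = ln(1 - C) / ln(R), rounded up.
    Assumes independent units and a large parent population.
    """
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must lie in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


def demonstrated_reliability(n: int, confidence: float) -> float:
    """Lower confidence bound on reliability after n units pass with zero failures."""
    return (1.0 - confidence) ** (1.0 / n)


if __name__ == "__main__":
    for r, c in [(0.90, 0.90), (0.95, 0.90), (0.95, 0.95)]:
        print(f"R={r:.2f}, C={c:.2f}: n = {zero_failure_sample_size(r, c)}")
```

For example, 22 units passing with no failures demonstrate 90% reliability at 90% confidence. Once failures occur, this relation no longer applies and a binomial or chi-square treatment is needed instead.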
There are many industry-standard package-level reliability tests. The test is chosen according to the failure mechanism of interest, since different stresses accelerate different failure mechanisms. Nonetheless, most reliability tests use one or more of the following stress factors to accelerate failure: temperature, moisture (humidity), current, voltage, and pressure.

Before a new device is officially released for mass manufacturing, it must undergo full qualification. New device qualification is operationally the same as package-level reliability testing, except that it is systematized to generate official reliability data that justify mass manufacturing of the device. It usually requires several sample sets for different reliability tests.

Once the device has successfully completed qualification, it may be released for mass manufacturing and consumption. To ensure that no major process deviations occur on the manufacturing line, the reliability of manufactured units is monitored regularly. This activity, known as reliability monitoring, consists of pulling finished samples from the line and subjecting them to reliability testing. Valid reliability failures should undergo root cause analysis for reliability improvement.

In summary, an excellent semiconductor reliability engineering system has all of the following components:
1) design for reliability;
2) wafer-level reliability testing;
3) package-level reliability testing;
4) new device/process qualification; and
5) reliability monitoring.
[Figure: An Overview of Semiconductor Engineering Flows. Boxes: wafer design, wafer fab, wafer-level reliability (pass/fail), wafer-level failure analysis, packaging design, assembly process, package-level reliability (pass/fail), package-level failure analysis, product release.]

[Figure: Reliability Test Flow. Boxes include: reliability tests request, generate qualification plan, approved qualification plan (yes/no), assembly, test and collect ALR data, received materials, generate reliability test requests, start reliability tests, ATE test, reliability failure (yes/no), verified failure (yes/no), failure analysis, 8D analysis, record database, generate final reliability test report.]

Bathtub Curve
[Figure: bathtub curve]
1. An "infant mortality" early-life phase characterized by a decreasing failure rate (Phase 1). Failures during this period are not random in time; they result from substandard components with gross defects and from inadequate controls in the manufacturing process. Parts fail at a high but decreasing rate.
2. A "useful life" period in which electronics have a relatively constant failure rate caused by randomly occurring defects and stresses (Phase 2). This corresponds to a normal wear-and-tear period in which failures are caused by unexpected and sudden overstress conditions. Most reliability analyses of electronic systems are concerned with lowering the failure rate, which is constant, during this period.
3. A "wear-out" period in which the failure rate increases because critical parts wear out (Phase 3). As parts wear out, less stress is needed to cause failure and the overall system failure rate increases accordingly; failures no longer occur randomly in time.

Bathtub Curve: Example
An IC manufacturing company encountered this failure rate curve:
[Figure: failure rate vs. time]
A program was implemented immediately to significantly improve product quality and yield. What will the new bathtub curve look like?
[Figure: four candidate curves, (a) to (d)]

Case Study
Company X launched a new product.
During product qualification, the design engineer requested a reliability study. The table below shows the reliability data. The failure rate column is the number of failures in each interval divided by the interval length, in failures per hour.

Time (hours) | Sample size on stress | Number of failures | Cumulative failures | Failure rate (failures/h)
24 | 50 | 23 | 23 | 0.9583
48 | 27 | 8 | 31 | 0.3333
72 | 19 | 3 | 34 | 0.1250
96 | 16 | 0 | 34 | 0
168 | 16 | 1 | 35 | 0.0139
240 | 15 | 0 | 35 | 0
360 | 15 | 2 | 37 | 0.0167
480 | 13 | 3 | 40 | 0.0250
600 | 10 | 1 | 41 | 0.0083
720 | 9 | 0 | 41 | 0
840 | 9 | 0 | 41 | 0
960 | 9 | 0 | 41 | 0
1008 | 9 | 0 | 41 | 0

[Figure: failure rate vs. time (hours), 0 to 1200 hours]

High infant mortality is evident in the bathtub curve: the early-life failure rate is high but declines rapidly over time. This is undesirable because it means a high failure rate at the early application stage at the end customer. High infant mortality correlates with yield loss due to extrinsic defects. Why?

Failure rate:
$$\lambda = \frac{\text{total number of failures within a population}}{\text{total time expended by that population}}$$
$$\lambda = \frac{41}{23(24) + 8(48) + 3(72) + 1(168) + 2(360) + 3(480) + 1(600) + 9(1008)} = \frac{41}{13152} = 0.003117 \text{ failures/hour}$$
This equals 3,117,000 FIT, i.e., 3,117,000 failures per $10^9$ device-hours.

Mean Time Between Failures (MTBF):
$$\text{MTBF} = \frac{1}{\lambda} = \frac{1}{0.003117} = 320.8 \text{ hours} = 13.37 \text{ days}$$
Note: 1 FIT (failure unit) = one failure in $10^9$ device-hours.

Time period, t (1 month = 720 h) | Reliability, R(t) = exp(−λt) | Failure, F(t) = 1 − R(t)
1 month | 0.106008075 | 0.893991924
2 months | 0.011237712 | 0.988762287
3 months | 0.001191288 | 0.998808711
4 months | 0.000126286 | 0.999873713
5 months | 0.000013387 | 0.999986612
6 months | 0.000001419 | 0.999998580
7 months | 0.000000150 | 0.999999849
8 months | 0.000000016 | 0.999999984
9 months | 0.000000002 | 0.999999998
10 months | 0 | 1

To grasp the quantitative degree of risk to which the products are exposed, we need to study the tabulated reliability as well as the failure rate over time. The reliability of the devices is already low after the first month of operation: merely 0.11.
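The case-study arithmetic can be reproduced with a short Python script. It is a sketch that follows the slide's conventions: the failure rate column is failures per hour of each read-out interval, each failed unit is credited with device-hours up to the read-out at which it was found, and the 9 survivors run to 1008 hours. One month is taken as 720 hours, an assumption inferred from the R(t) table rather than stated on the slides. Because the script does not round λ to 0.003117 before exponentiating, the last digits of R(t) differ slightly from the table.

```python
import math

# (read-out time in hours, units on stress during the interval, failures found at read-out)
# transcribed from the Company X case-study table
TABLE = [
    (24, 50, 23), (48, 27, 8), (72, 19, 3), (96, 16, 0), (168, 16, 1),
    (240, 15, 0), (360, 15, 2), (480, 13, 3), (600, 10, 1), (720, 9, 0),
    (840, 9, 0), (960, 9, 0), (1008, 9, 0),
]
HOURS_PER_MONTH = 720  # assumption: 30-day month, inferred from the R(t) table


def interval_failure_rates(table):
    """Failures per hour in each interval: failures / interval length."""
    rates, previous = [], 0
    for readout, _on_stress, failures in table:
        rates.append(failures / (readout - previous))
        previous = readout
    return rates


def overall_failure_rate(table):
    """lambda = total failures / total device-hours (slide convention)."""
    total_failures = sum(f for _, _, f in table)
    # each failed unit accumulates hours up to the read-out where it was found
    device_hours = sum(readout * f for readout, _, f in table)
    # units still running at the last read-out accumulate the full test time
    last_readout, last_on_stress, last_failures = table[-1]
    device_hours += (last_on_stress - last_failures) * last_readout
    return total_failures / device_hours


if __name__ == "__main__":
    for (readout, _, _), rate in zip(TABLE, interval_failure_rates(TABLE)):
        print(f"{readout:5d} h: {rate:.4f} failures/h")

    lam = overall_failure_rate(TABLE)
    print(f"lambda = {lam:.6f} failures/h = {lam * 1e9:,.0f} FIT")
    print(f"MTBF = {1 / lam:.1f} h = {1 / lam / 24:.2f} days")

    for month in range(1, 11):
        r = math.exp(-lam * month * HOURS_PER_MONTH)  # R(t) = exp(-lambda t)
        print(f"{month:2d} month(s): R = {r:.9f}, F = {1 - r:.9f}")
```

Run as a script, it reports λ ≈ 0.003117 failures/hour (about 3.1 million FIT) and an MTBF of about 320.8 hours, and it confirms that the reliability after one month of operation is only about 0.11.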
A one-month reliability of 0.11 corresponds to a cumulative failure probability F(t) of 0.89. The devices' reliability degrades drastically within a few months and falls to effectively zero by the 10th month of operation, by which point the devices have reached the end of their life.
[Figure: reliability function R(t) and failure function F(t) versus time period t (months), 1 to 10 months]

Types of Accelerating Tests

1. Preconditioning (PRECON)
The preconditioning stress sequence evaluates the capability of semiconductor devices to withstand the stresses imposed by a user's printed circuit board assembly operation. A properly designed device (i.e., die and package combination) should survive this sequence with no measurable change in electrical performance. Furthermore, preconditioning of properly designed devices should not produce latent defects that degrade reliability during life or environmental stress tests. Changes in electrical characteristics, and both observable and latent physical damage, during this sequence result principally from mechanical and thermal stresses and from ingress of flux and cleaning agents. Effects include die and package cracks, fractured wire bonds, package and lead frame delamination, and corrosion of die metallization.

Step | Stress | Conditions
1 | Initial electrical test | Room temperature
2 | External visual inspection | 40X magnification
3 | Temperature cycling (optional step) | 5 cycles at –40 °C (max) to +60 °C (min)
4 | Bake out | 24 h (min) at 125 °C
5 | Moisture soak | Per MSL rating
6 | Reflow | 3 cycles per referenced profile
7 | Flux application | 10 s immersion in water-soluble flux at room temperature
8 | Cleaning | Multiple DI water rinses
9 | Dry | Room temperature
10 | Final electrical test | Room temperature
2. Operating Life (SOPL/DOPL)
The operating life test demonstrates the quality and reliability of devices subjected to specified conditions over an extended period. Either a static or a dynamic condition may be used, depending on the circuit type and the wafer fabrication technology. The test conditions (bias conditions, loads, clock inputs, etc.) are selected to represent worst-case conditions for the device. Unless otherwise specified in the detailed test procedure, the test is run at an ambient temperature of +125 °C or above. Many device types are routinely run at +150 °C ambient. Ambient temperatures above +170 °C are generally considered impractical due to the physical limitations of circuit boards, sockets, device lead finishes, molding compound glass transition temperatures, etc.
Stress Conditions: 125 °C to 150 °C, biased

3. Power Cycle (PRCL)
The power cycle test determines the effects on solid-state devices of thousands of power-on/power-off operations, such as would be encountered in an automobile or a TV set. The repeated heating and cooling caused by on/off cycles can lead to fatigue cracks and other thermal and/or electrical degradation in the die attach system of devices that generate significant internal heating under maximum load (e.g., voltage regulators or high-current drivers). The test forces junction temperature excursions at a typical rate of about 30 cycles per hour.
Stress Conditions: ΔTj = 100 °C, 2-minute cycle

4. High Temperature Gate Bias Test (HTGB)
The HTGB test biases the gate or other oxides of the device samples. The devices are normally operated in a static mode at, or near, the maximum-rated oxide breakdown voltage. The bias conditions should be chosen to bias the maximum number of gates in the device. The HTGB test is typically used for power devices.
Stress Conditions: 150 °C Tj, biased
5. High Temperature Reverse Bias Test (HTRB)
The HTRB test reverse-biases the major power-handling junctions of the device samples. The devices are characteristically operated in a static mode at, or near, maximum-rated breakdown voltage and/or current levels. The bias conditions should be chosen to bias the maximum number of solid-state junctions in the device. The HTRB test is typically applied to power devices.
Stress Conditions: 150 °C Tj, biased

6. Temperature Humidity Bias Test (THBT)
The steady-state temperature-humidity-bias life test evaluates the reliability of non-hermetic packaged devices operating in humid environments. It employs severe conditions of temperature, humidity, and bias that accelerate the penetration of moisture through the external protective material (encapsulant or seal) or along the interface between the external protective material and the metallic conductors passing through it. When moisture reaches the die surface, the applied potential forms an electrolytic cell that corrodes the aluminum, affecting DC parameters through its conduction, and eventually causes catastrophic failure by opening the metal. The presence of contaminants such as chlorine greatly accelerates the reaction, as does excessive phosphorus in the PSG layers (passivation, dielectric or field oxide).
Stress Conditions: 85 %RH, 85 °C

7. Highly Accelerated Stress Test (HAST)
The HAST test evaluates the moisture resistance of non-hermetic packaged devices operating in high-humidity environments. Bias is applied so as to minimize current draw, using alternating potentials wherever possible. The test approximates a highly accelerated version of the THBT test.
These severe conditions of pressure, humidity, and temperature, together with bias, accelerate the penetration of moisture through the external protective material (encapsulant or seal) or along the interface between the external protective material and the metallic conductors passing through it. When moisture reaches the die surface, the applied potential forms an electrolytic cell that corrodes the metallization, affecting DC parameters and eventually causing catastrophic failure by opening the metal. The presence of contaminants such as chlorine greatly accelerates the reaction, as does excessive phosphorus in the PSG layers (passivation, dielectric or field oxide). HAST should be avoided when stressing assembly packages whose mold compound and die attach materials have a Tg below 130 °C, since uncharacteristic failures may result.
Stress Conditions: 130 °C, 85 %RH, 19.5 psig

8. Autoclave (ACLV)
The autoclave (or pressure cooker) test evaluates the moisture resistance of non-hermetic packaged devices. No bias is applied during this test. It employs severe conditions of pressure, humidity, and temperature, not typical of actual operating environments, that accelerate the penetration of moisture through the external protective material (encapsulant or seal) or along the interface between the external protective material and the metallic conductors passing through it. When moisture reaches the die surface, reactive agents cause leakage paths on the die surface and corrode the die metallization, affecting DC parameters and eventually causing catastrophic failure. This method also activates other die-related failure mechanisms, including mobile ionic contamination and various temperature- and moisture-related phenomena. The autoclave test is destructive and produces increasing failure rates when applied repeatedly.
It is useful for short-term comparative evaluations such as lot acceptance, process monitors, and qualifications, but it yields no absolute information, since the acceleration factors relating the test conditions to the operating environment are not well established.

In addition, the autoclave test can produce spurious failures, not representative of device reliability, due to excessive chamber contaminants. This condition is usually evidenced by severe external package degradation, including corroded device terminals/leads, the formation of conducting matter between the terminals, or both. The autoclave test is therefore not suitable for measuring external package quality or reliability. Specific device types and technologies may be particularly sensitive to package degradation. The ACLV test should not be included in the qualification of laminate- or tape-based packages (e.g., FR4, polyimide tape, or equivalent). However, during the development stages of a laminate- or tape-based package, ACLV can be used to understand inherent weaknesses in the package. Caution must be taken when interpreting results, because failures may result from exceeding the capabilities of the package, producing unrealistic material failures.
Stress Conditions: 100 %RH, 121 °C, 2 atm

9. Temperature Cycle (TMCL)
The temperature cycle test determines the resistance of devices to alternating exposures at high and low temperature extremes. Permanent changes in electrical characteristics and physical damage produced during temperature cycling result principally from mechanical stress caused by thermal expansion and contraction. Effects include cracking and delamination of packages, cracking or cratering of the die, cracking of passivation, delamination of metallization, and various other changes in electrical characteristics resulting from thermo-mechanically induced damage.
Stress Conditions: –40 °C to +125 °C or –65 °C to +150 °C

10. Board Level Temperature Cycle (BTMCL)
The BTMCL test provides fatigue-related wear-out information on the solder joints that attach devices to circuit boards. Daisy-chain test devices are mounted to circuit boards and cycled between temperature extremes, typically 0 °C to +100 °C. During stress, the solder joint resistance is continuously monitored, and a unit is considered failed when 5 cumulative incidences of elevated resistance (> 1000 ohms) are detected. Ideally, testing should continue until 63% of the test sample has failed cumulatively.
Stress Conditions: 0 °C to +100 °C, 2 cycles/hour

11. High Temperature Storage Life (HTSL)
The high temperature storage test (also called the stabilization bake test) determines the effects of storing devices at elevated temperature without electrical stress. Devices under test are stored continuously in a chamber with circulating air heated to +150 °C. At the end of the specified stress period, the devices are removed from the chamber, allowed to cool, and electrically tested. Interim measurements are made if specified in the detailed test procedure.
Stress Conditions: 150 °C

12. Moisture Sensitivity (MOIS)
The purpose of this stress is to classify the sensitivity of non-hermetic solid-state surface mount devices (SMDs) to moisture-induced stress so that they can be properly packaged, stored, and handled to avoid subsequent thermal/mechanical damage during assembly solder reflow attachment and/or repair operations.
Step | Stress | Conditions
1 | Initial electrical test | Per data sheet
2 | External visual inspection | 40X magnification
3 | CSAM inspection | Classify and measure initial delamination levels
4 | Bake out | 24 h (min) at 125 °C
5 | Moisture soak | Per target MSL rating
6 | Reflow | 3 cycles per referenced profile
7 | External visual inspection | 40X magnification
8 | Final electrical test | Per data sheet
9 | CSAM inspection | Classify and measure final delamination levels
10 | Final electrical test | Room temperature

Case Study
Imagine that you are a reliability engineer in a memory IC manufacturing firm. Your company's design team has just come out with a new package in which solder bumps are replaced with silver-filled adhesive and copper wire is used instead of gold wire. Propose appropriate accelerated tests for qualifying the new package.

Reliability Modeling for Failure Mechanisms
Failure mechanism reliability modeling (also called reliability modeling, acceleration modeling, or simply modeling) is the mathematical representation of a failure mechanism, as a set of algebraic or differential equations, from the perspective of its reliability implications. The term failure mechanism refers to the actual physical phenomenon behind a failure. Modeling is a means of determining and understanding the variables or factors that bring out and accelerate a failure mechanism. Being able to model a mechanism and quantify how it is affected by environmental factors allows a reliability engineer to develop appropriate reliability tests for estimating field failure rates and predicting when failures will begin to occur. Models are usually expressed in terms of the time to failure (tf) or the acceleration factor (AF).
Failure mechanism | Ea (eV)
Package crack | 1.0
Lifted/broken ball/wedge/neck | 0.45 – 0.75
Bond intermetallic failure (bromine induced) | 1.0
Bond intermetallic failure (chlorine induced) | 0.8
Die attach failure | 0.3
Delamination between mold compound and lead frame | 0.3
Lifted welds at lead frame | 0.45
Metal corrosion (halide induced) | 0.7
Metal corrosion (electrolytic) | 0.3 – 0.6
Device leakage (surface inversion) | 1.0
Dielectric breakdown | 0.35
Hot carrier trapping in oxide | -0.06
Electromigration in Al | 0.5
Contact electromigration, Si in Al | 0.9
Contact electromigration (Al at sidewall) | 0.8 – 1.4
Contact metal migration through barrier layer | 1.8
Au-Al intermetallic growth | 1.0

1. Thermal Effects (Arrhenius Equation)
$$TTF = A_0 \exp\left(\frac{E_a}{kT}\right)$$
TTF: time to failure; A0: arbitrary scale factor; Ea: activation energy (eV); k: Boltzmann's constant (8.62 × 10⁻⁵ eV/K); T: temperature (K).
Acceleration factor (ratio of TTF values, use/stress):
$$AF = \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{stress}}\right)\right]$$

2. Electromigration (Black's Equation)
$$TTF = A\,J^{-N} \exp\left(\frac{E_a}{kT}\right)$$
TTF: time to failure; A: prefactor with a complex dependence on grain size, line structure and geometry, test conditions, current density, thermal history, etc.; J: current density; N: empirically determined factor ranging from 1 to 14 (the industry-accepted value today is N = 2 for current densities in the range of 1 to 2 × 10⁶ A/cm²); Ea: activation energy (eV); k: Boltzmann's constant (8.62 × 10⁻⁵ eV/K); T: temperature (K).
Acceleration factor (ratio of TTF values, use/stress):
$$AF = \left(\frac{J_{use}}{J_{stress}}\right)^{-N} \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{stress}}\right)\right]$$

3. Temperature-Humidity Mechanisms (Peck's Model)
$$TTF = A_0\,RH^{-N} \exp\left(\frac{E_a}{kT}\right)$$
TTF: time to failure; A0: arbitrary scale factor; RH: relative humidity (%); N: an experimentally determined constant; Ea: activation energy (eV); k: Boltzmann's constant (8.62 × 10⁻⁵ eV/K); T: temperature (K).
Acceleration factor (ratio of TTF values, use/stress):
$$AF = \left(\frac{RH_{use}}{RH_{stress}}\right)^{-N} \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{stress}}\right)\right]$$
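The three acceleration-factor expressions above translate directly into code. The Python sketch below uses the slide's value of Boltzmann's constant and takes temperatures in °C, converting to kelvin internally (an implementation choice, not from the slides). The example numbers (55 °C use vs. 125 °C stress; 30 °C/60 %RH use vs. the 85 °C/85 %RH THB condition; N = 3; Ea = 0.7 eV from the halide-induced metal corrosion row of the table) are illustrative assumptions, not qualified values for any product.

```python
import math

K_BOLTZMANN_EV = 8.62e-5  # eV/K, as used on the slides


def _kelvin(t_celsius: float) -> float:
    return t_celsius + 273.15


def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """AF = TTF_use / TTF_stress = exp[(Ea/k)(1/T_use - 1/T_stress)]."""
    inv_diff = 1.0 / _kelvin(t_use_c) - 1.0 / _kelvin(t_stress_c)
    return math.exp(ea_ev / K_BOLTZMANN_EV * inv_diff)


def black_af(j_use: float, j_stress: float, n: float,
             ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Black's equation: AF = (J_use/J_stress)^(-N) * Arrhenius term."""
    return (j_use / j_stress) ** (-n) * arrhenius_af(ea_ev, t_use_c, t_stress_c)


def peck_af(rh_use: float, rh_stress: float, n: float,
            ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Peck's model: AF = (RH_use/RH_stress)^(-N) * Arrhenius term."""
    return (rh_use / rh_stress) ** (-n) * arrhenius_af(ea_ev, t_use_c, t_stress_c)


if __name__ == "__main__":
    af_t = arrhenius_af(0.7, 55.0, 125.0)
    print(f"Arrhenius AF (0.7 eV, 55 C use vs 125 C stress): {af_t:.1f}")
    # equivalent use hours represented by 1000 stress hours
    print(f"1000 stress hours ~ {1000 * af_t:,.0f} use hours")
    af_h = peck_af(60.0, 85.0, 3, 0.7, 30.0, 85.0)
    print(f"Peck AF (30 C/60 %RH use vs 85 C/85 %RH stress): {af_h:.0f}")
```

With these assumptions the Arrhenius factor comes out at about 78, so 1000 hours at 125 °C stand in for roughly 78,000 hours at 55 °C. Because Ea sits in the exponent, the result is very sensitive to the chosen activation energy.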
4. Temperature and Voltage Mechanisms (Eyring Equation)
$$TTF = A_0\,V^{-N} \exp\left(\frac{E_a}{kT}\right)$$
TTF: time to failure; A0: arbitrary scale factor; V: voltage (V); N: an experimentally determined constant; Ea: activation energy (eV); k: Boltzmann's constant (8.62 × 10⁻⁵ eV/K); T: temperature (K).
Acceleration factor (ratio of TTF values, use/stress):
$$AF = \left(\frac{V_{use}}{V_{stress}}\right)^{-N} \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{stress}}\right)\right]$$

5. Creep Mechanisms
$$TTF = B_0\,(T_0 - T)^{-n} \exp\left(\frac{E_a}{kT}\right)$$
TTF: time to failure; B0: process-dependent constant; T0: stress-free temperature for the metal (approximately the metal deposition temperature for aluminum); T: temperature (K); n: 2 to 3 (n is usually about 5 if creep is involved; thus implies T < Tm/2); Ea: activation energy (eV); k: Boltzmann's constant (8.62 × 10⁻⁵ eV/K).
Acceleration factor (ratio of TTF values, use/stress):
$$AF = \left(\frac{T_0 - T_{stress}}{T_0 - T_{use}}\right)^{n} \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{stress}}\right)\right]$$

6. Thermo-Mechanical Mechanisms (Coffin-Manson Equation)
$$N_f = C_0\,(\Delta T)^{-n}$$
Nf: number of cycles to failure; C0: a material-dependent constant; ΔT: entire temperature range of the cycle for the device; n: an experimentally determined constant.
Acceleration factor (ratio of TTF values, use/stress):
$$AF = \left(\frac{\Delta T_{use}}{\Delta T_{stress}}\right)^{-n}$$

[Figures: Reliability Modeling: Example 1; Reliability Modeling: Example 2]

Failure Analysis

Semiconductor Failure Analysis
Semiconductor failure analysis (FA) is the process of determining how or why a semiconductor device has failed, often performed as a series of steps known as FA techniques. Device failure is defined as any nonconformance of the device to its electrical and/or visual/mechanical specifications. Failure analysis is necessary to understand what caused the failure and how it can be prevented in the future. Electrical failure can be either functional or parametric. Functional failure refers to the inability of a device to perform its intended function.
Parametric failure refers to the inability of a device to meet the electrical specification for a measurable characteristic (such as leakage current) that does not directly pertain to functionality. Thus, a parametric failure may be present even if the device still performs its intended function. For example, a DAC that converts digital data into the correct analog voltage but draws excessive supply current has a parametric failure, whereas one that does not convert data at all has a functional failure. A device is said to fail catastrophically if it grossly fails all parametric and functional test blocks.

Failure analysis starts with failure verification. Validating the failure of a sample before FA conserves valuable FA resources. Failure verification is also done to characterize the failure mode; good characterization of the failure mode makes the FA efficient and accurate. After failure verification, the analyst subjects the sample to various FA techniques step by step, collecting attributes and other observations along the way. Nondestructive FA techniques are performed before destructive ones. The results of the various techniques must also be consistent or corroborative; any inconsistency must be resolved before proceeding to the next step. In general, the results of the various FA techniques collectively point to the real failure site. The FA process is finished once there is enough information to draw a conclusion about the location of the failure site and the cause or mechanism of failure.
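Characterizing the failure mode during failure verification starts with the functional/parametric/catastrophic distinction defined above. The Python sketch below encodes those definitions for a set of pass/fail test-block results; the `BlockResult` record, its fields, and the returned labels are hypothetical and are not taken from any particular tester or FA tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockResult:
    """Hypothetical record of one test block from failure verification."""
    name: str
    kind: str     # "functional" or "parametric"
    passed: bool


def classify_failure_mode(results: List[BlockResult]) -> str:
    """Apply the functional / parametric / catastrophic definitions."""
    failing = [r for r in results if not r.passed]
    if not failing:
        return "not verified"  # failure did not reproduce during verification
    kinds = {r.kind for r in results}
    if len(failing) == len(results) and kinds == {"functional", "parametric"}:
        return "catastrophic"  # failing all parametric and functional blocks
    if any(r.kind == "functional" for r in failing):
        return "functional"    # cannot perform its intended function
    return "parametric"        # still functional, but out of an electrical spec


if __name__ == "__main__":
    # the DAC example: converts correctly but draws excessive supply current
    dac = [BlockResult("conversion", "functional", True),
           BlockResult("supply_current", "parametric", False)]
    print(classify_failure_mode(dac))
```

The DAC example is classified as `parametric`. A real verification flow would also record how far each failing parameter deviates from its limit, since that is what the failure mode description captures.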
FA Terminology
• Failure Mode: a description of how a device is failing, usually in terms of how far it deviates from the specification it fails, e.g., excessive supply current, excessive offset voltage, excessive bias current.
• Failure Mechanism: the physical phenomenon behind the failure of a device, e.g., metal corrosion, electrostatic discharge, electrical overstress.
• Root Cause: the first event or condition that triggered, directly or indirectly, the occurrence of the failure, e.g., improper equipment grounding that resulted in ESD damage, or a system problem that caused an incorrect mask set to be used.
The objective of a failure analyst conducting FA is to determine the failure mechanism that led to the failure mode of the device. Once the failure mechanism has been determined, the process owner or expert can work with the failure analyst to determine the root cause of the problem. The process owner must always address the root cause of the failure mechanism, not just the intermediate causes that occurred after the root cause had already happened.
Basic FA Capability
FA equipment | Purpose | Estimated cost (USD)
Curve tracer and break-out boxes | I/V curve tracing | 10K – 20K
Digital multimeters | Bench testing | 5K – 10K
Power supplies/signal generators | Bench testing | 5K – 10K
Oscilloscope | Bench testing | 10K – 20K
Low-power microscope | Optical inspection | 5K – 10K
High-power microscope | Optical inspection | 10K – 50K
Fume hood and glassware | Manual decapsulation | 5K – 15K
Vise and assorted hand tools | Mechanical decapsulation | <1K

Good FA Capability
FA equipment | Purpose | Estimated cost (USD)
Microprobing station | Probing / thermography | 40K – 100K
Laser cutter | Circuit isolation | 20K – 40K
Microsectioning equipment | Cross-sectioning | 40K – 80K
High-resolution X-ray system | X-ray radiography | 100K – 200K
Automatic decapsulation system | Decapsulation | 20K – 30K
Reactive ion etcher | Die delayering | 40K – 60K
SEM/EDX system | SEM / EDX analysis | 300K – 400K

Advanced FA Capability
FA equipment | Purpose | Estimated cost (USD)
Light emission microscope | Light emission microscopy | 150K – 200K
Scanning acoustic microscope | Acoustic microscopy | 200K – 250K
Focused ion beam machine | FIB analysis | 500K – 700K
FTIR spectrometer | FTIR analysis | 60K – 80K
TEM system | TEM inspection | 400K – 500K
Auger spectrometer | Auger analysis | ~1M

FA Flow (Level I Analysis)
1. Receipt of failed devices
2. Failure verification: curve trace, parametric test, functional bench test
3. External visual inspection
4. X-ray analysis
5. External package cleaning and retest
6. Bake recoverability test
7. SAT
8. Decapsulation
9. Internal visual inspection / SEM / EDX analysis
10. Data review: if the failure mechanism is determined, generate the final report; if it is undetermined, proceed to Level II analysis.

[Figure: FA flow from Level I analysis. Internal visual inspection (optical microscope/SEM) separates visible and non-visible failure modes; techniques shown include electrical microprobing, TIVA/LIVA analysis, liquid crystal analysis, data review, SEM/EDX analysis, cross-sectioning, deprocessing, and internal visual inspection (optical microscope/SEM), leading to failure mechanism determined and final report.]

[Figure: Structural diagram of an IC package]

Common Failure Mechanisms in Microelectronics Devices

Ball Bond Lifting
Detachment of the ball bond from the silicon chip; also refers to non-sticking of the ball bond to the bond pad.
Common causes: contamination on the bond pad, incorrect wire bond parameter settings, instability of the die during bonding, bond pad corrosion, excessive bond pad probing, Kirkendall voiding, excessive thermal stress resulting in excessive intermetallic formation, bond pad metallization/barrier metallization lifting, cratering.

Wedge Bond Lifting
Detachment (or non-sticking) of the wedge bond from the silicon chip, bonding post, or lead finger.
Common causes: contamination on the bond pad or lead finger, incorrect parameter settings, instability of the die or lead frame during bonding, bond pad or lead finger corrosion, excessive bond pad probing.

Ball Bond Neck Break
Breakage of the wire at the neck of the Au ball bond.
Common causes: incorrect wire bond parameter settings, incorrect wire looping, die-to-package delamination, excessive wire sweeping during mold.

Wedge Bond Heel Break
Breakage of the wire at the heel of the Al wedge bond.
Common causes: incorrect wire bond parameter settings, incorrect wire looping, lead finger-to-package delamination, excessive wire sweeping during mold.

Midspan Wire Break
Breakage along the span of the wire.
Common causes: wire nicks or damage, wire corrosion, tight wire looping, excessive wire sweeping, electrical overstress.

Bond-to-Metal Shorting
Electrical shorting between the bond and a metal line on the die.
Common causes: incorrect wire bond parameter settings, incorrect bond placement, insufficient bond pad-to-metal distance.

Bond-to-Bond Shorting
Electrical shorting between two bonds.
Common Causes: incorrect wire bond parameter settings, incorrect bond placement, and insufficient bond pad-to-bond pad distance.

Wire-to-Wire Shorting: Electrical shorting between two wires.
Common Causes: incorrect wire looping, excessive wire sweep, and insufficient wire-to-wire distance.

Cratering: Silicon damage under the bond pad; in the worst case, a chunk of silicon is completely detached from the active circuit.
Common Causes: incorrect wire bond parameter settings and excessive bond pad probing.

Die Chip-outs: Die chipping is a failure mechanism in which one or more parts of the die break away from the die itself.

Die Cracking: Die cracking is the occurrence of fractures in or on any part of the die.

Die Lifting: Die lifting is the separation or detachment of the die from its die pad or die cavity.

Die Scratches: A die scratch is abrasion or laceration damage on the surface of the die.

Package Cracking: Ceramic/plastic package cracking is the occurrence of fractures anywhere in or on the package.

Plastic Package Delamination: Plastic package delamination refers to the disbonding between a surface of the plastic package and that of another material. It may therefore occur at the interface between the plastic and the lead frame, die, die paddle, or die attach material.

Corrosion in Die and Package: Corrosion is the degradation of metals as a result of electrochemical activity.

Lead Corrosion: Lead corrosion is often due to inadequate lead finish, contaminants on the leads, and exposure of the leads to excessive moisture. It can be accelerated by higher temperatures and by electrical bias on the leads.

Lead Frame Corrosion: Similar to lead corrosion, but occurring on the lead frames as received from the material supplier.
Wire Corrosion: Bond wire corrosion can occur; gross cases can lead to wire breakage or even total disintegration of the wire. It is most commonly encountered in aluminum wires that have been contaminated by chlorine.

Contact Migration: Contact migration refers to the diffusion of the metal atoms of a contact (usually Al or an alloy thereof) into the Si substrate.

Electromigration: Electromigration refers to the gradual displacement, or mass transport, of the metal atoms of a conductor as a result of current flowing through that conductor. It can form voids in the metal line, which may cause open circuits, and hillocks where metal accumulates, which may cause short circuits.

Dielectric Breakdown: Dielectric breakdown refers to the destruction of a dielectric layer, usually as a result of an excessive potential difference (voltage) across it. It usually manifests as a short or leakage at the point of breakdown.

Time-Dependent Dielectric Breakdown: Early-life and time-dependent dielectric breakdowns are primarily due to weak spots within the oxide layers arising from poor processing or uneven growth. These weak spots, or dielectric defects, may be caused by mobile sodium ions in the oxide and by impurities trapped on the Si surface prior to oxidation.

Electrical Overstress (EOS): EOS refers to the destruction of a circuit by excessive voltage, current, or power. EOS damage is usually very obvious: metal lines are discolored, burnt, or melted.

Electrostatic Discharge (ESD): ESD can occur when a high electrostatic field develops between two objects in close proximity. An ESD event can damage a device in many ways, e.g. conductor fusing, metal-resistor severing, and junction damage.

Ionic Contamination: Mobile ionic contamination refers to the presence of mobile ions such as Na+, Cl-, and K+ in the device structures of an integrated circuit.
These mobile ions can come from the environment, humans, wafer processing materials, and packaging materials. Mobile ionic contamination is commonly observed in the gate oxide of a MOS transistor. There the ions can accumulate and cause a charge build-up that shifts the threshold voltage of the transistor. Inversion channels may also form in MOS transistors. In bipolar devices, mobile ions can affect carrier concentrations, changing the beta of the transistor. Because mobile ions respond to temperature and voltage, failures due to mobile ionic contamination can be accelerated by burn-in. Such failures can also recover when the device is subjected to an unbiased bake, which redistributes the ions by promoting their random movement. Thus, a device that fails after burn-in but recovers after an unbiased bake is most likely a mobile ionic contamination failure.

Ionic Contamination (figure)

Future Challenge for Failure Analysis

Evolution of IC Packages
Package body sizes, from largest to smallest: 27.90 x 9.14 mm, 9.90 x 3.90 mm, 5.50 x 4.40 mm, 2.20 x 1.35 mm, 1.45 x 1.00 mm, Future?

As a microelectronics engineering student, what can you become after graduating?
1. Process Engineer
2. Test Engineer
3. Product Engineer
4. Circuit Design Engineer
5. Package Development Engineer
6. R&D Engineer
7. Reliability Engineer
8. Failure Analysis Engineer
9. Others

Career as a Reliability Engineer
- Create qualification plans for new packages and processes, and reliability monitoring plans for continuous improvement.
- Define internal stress/test procedures to support new/emerging technologies, and develop/implement the required procedural specifications.
- Regularly update and maintain the reliability specifications.
- Provide technical support and training for the reliability specialists.
- Acquire new equipment to increase the reliability capability.
- Develop the necessary FMDRC (Failure Mechanism Driven Reliability Characterization) techniques to characterize/model new technology. Work with the modeling group to develop analytical models for potential failure mechanisms.

Career as a Failure Analysis Engineer
- Define failure analysis procedures to support new/emerging technologies, and develop/implement the required procedural specifications.
- Provide technical guidance and compile the analysis specifications.
- Regularly update and maintain the failure analysis specifications.
- Provide technical support and training for the failure analysis specialists.
- Acquire new equipment to increase the failure analysis capability.
- Handle plant engineering evaluations, new qualifications, quality issues, and customer returns.
- Provide precise and comprehensive failure analysis reports to requestors and assist them in identifying the root cause.

Thank You