Chapter 2 Semiconductor Device Reliability Verification

Semiconductor Quality and Reliability Handbook

2.1 Fundamental Knowledge on Semiconductor Reliability
  2.1.1 Measures for Representing Reliability
  2.1.2 Distributions Used in Reliability Analysis
  2.1.3 Semiconductor Device Failure Pattern
    2.1.3.1 Semiconductor Device Failure Regions
    2.1.3.2 Early Failures
    2.1.3.3 Random Failures
    2.1.3.4 Wear-out Failures
2.2 Semiconductor Reliability Verification
  2.2.1 Basic Approach Toward Reliability Verification
    2.2.1.1 Reliability Verification in the Development Stage
    2.2.1.2 Reliability Verification in the Prototype Stage
    2.2.1.3 Reliability Verification in the Mass Production Stage
  2.2.2 Reliability in the Development and Design Stages
    2.2.2.1 Time-Dependent Dielectric Breakdown (TDDB)
    2.2.2.2 Hot Carrier (HCI)
    2.2.2.3 Negative Bias Temperature Instability (NBTI)
    2.2.2.4 Soft Error
    2.2.2.5 Electromigration
    2.2.2.6 Stress Migration
2.3 Acceleration Model
  2.3.1 Acceleration Models for Environmental Stress
  2.3.2 Acceleration Models for Operating Stress

2.1 Fundamental Knowledge on Semiconductor Reliability

As equipment becomes more systematized, functional and capable, the social impact and damage caused by its failures grow, and equipment is increasingly required to be highly reliable. This in turn demands even higher reliability of the individual components that make up the equipment. A single piece of equipment uses large quantities of semiconductors, and these often handle its main functions, so their reliability is extremely important.
Semiconductors themselves are also becoming more miniaturized and highly integrated, with larger-scale circuit configurations. In addition, as semiconductor functions and performance advance and evolve into system LSIs, ensuring semiconductor reliability has become vital. The reliability measures, distribution functions, time-dependent failure-rate trends and failure regions needed to discuss semiconductor reliability are described below.

2.1.1 Measures for Representing Reliability

JIS Z 8115 (Reliability Terminology) defines reliability as "The property of an item which enables it to fulfill its required functions for the prescribed period under the given conditions." Reliability therefore includes the concept of time, and reliability measures are functions of time.

(1) Reliability Function (Reliability): R(t)
Reliability is the probability that an item functions correctly, without failure, until time t. When n samples are used under the same conditions and r(t) is the number of failures that have occurred by time t, the reliability R(t) is given by the following equation.

$$R(t) = \frac{n - r(t)}{n}$$ (Eq. 2.1.1)

(2) Failure Distribution Function (Unreliability): F(t)
This is the probability that failure occurs by time t, given by the following equation.

$$F(t) = \frac{r(t)}{n}$$ (Eq. 2.1.2)

Unreliability F(t) and reliability R(t) satisfy the following relationship.

$$R(t) + F(t) = 1$$ (Eq. 2.1.3)

As shown in Fig. 2-1, R(t) decreases from 1 over time, while F(t) conversely increases from 0 toward 1. The distribution functions described hereafter are used as the failure distribution functions of semiconductor devices.

[Fig. 2-1 Relationship between F(t) and R(t)]

(3) Failure Density Function: f(t)
This is the probability of failure per unit time at time t, that is, the probability density of the failure time.

$$f(t) = \frac{dF(t)}{dt} = -\frac{dR(t)}{dt}$$ (Eq. 2.1.4)

(4) Failure Rate Function: λ(t)
This is the probability that a sample that has not failed by time t fails in the next unit of time.

$$\lambda(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{R(t)}$$ (Eq. 2.1.5)

The failure rate function is also called the instantaneous failure rate and is calculated from the failure distribution function F(t) using Eqs. 2.1.4 and 2.1.5. For semiconductor devices, the unit generally used is the FIT (Failure In Time): the number of failures per billion (10^9) device-hours of total operating time. When F(t) of the product in question is not known, the average failure rate given by the following equation is used instead.

$$\text{Average failure rate} \equiv \frac{\text{Total number of failures during the period}}{\text{Total operating time during the period}}$$ (Eq. 2.1.6)

[Supplement]
In addition to the failure rate defined above, the early failure region described later sometimes uses the cumulative failure rate after a set equipped with the semiconductor device has operated in the market for a specified time. Unless the customer requests otherwise, the Sony Semiconductor Business Unit likewise uses the cumulative failure rate after one year as the early failure rate.
After the early failure region, most semiconductor devices do not reach wear-out failure (genuine failure) in the actual operating environment, and the failure rate shows the constant value of the random failure region. This value is the same as that obtained by Eq. 2.1.6, so the average failure rate can be regarded as essentially the failure rate after the early failure region.

(5) Mean Time To Failure: MTTF
The Mean Time To Failure (MTTF) of an item that is not subject to repair or maintenance, such as a semiconductor device, is given by the following equation.

$$\mathrm{MTTF} = \int_0^{\infty} t\, f(t)\, dt$$ (Eq. 2.1.7)

2.1.2 Distributions Used in Reliability Analysis

Typical distribution functions used to analyze reliability data of semiconductor devices are described below.

(1) Normal distribution
The normal distribution is a typical continuous distribution used for quality control. In reliability analysis, it is often applied to wear-out life, where failures concentrate around a certain time. The probability density function f(t) and distribution function F(t) are given by the following equations.

$$f(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(t-\mu)^2}{2\sigma^2}\right]$$ (Eq. 2.1.8)

$$F(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{t} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] dx$$ (Eq. 2.1.9)

This distribution is determined by the mean μ and the standard deviation σ (variance σ²). As shown in Fig. 2-2, the normal distribution has a symmetrical bell shape centered on μ, and the probability that t lies within ±σ, ±2σ and ±3σ of μ is approximately 68.3%, 95.4% and 99.7%, respectively.

[Fig. 2-2 Normal Distribution]

(2) Exponential distribution
The exponential distribution represents the life distribution (failure distribution function) in the random failure region, where the failure rate λ is constant over time. Its probability density function f(t) and distribution function F(t) are given by the following equations. It corresponds to the Weibull distribution, described later, with shape parameter m = 1.

$$f(t) = \lambda e^{-\lambda t}$$ (Eq. 2.1.10)

$$F(t) = 1 - e^{-\lambda t}$$ (Eq. 2.1.11)

[Fig. 2-3 Exponential Distribution]

As the following equation shows, the MTTF equals t0, the inverse of the failure rate λ.

$$\mathrm{MTTF} = t_0 = \frac{1}{\lambda}$$ (Eq. 2.1.12)

(3) Logarithmic normal distribution
In the logarithmic normal distribution, ln t, the logarithm of the life t, follows the normal distribution described above.
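Because ln t is normally distributed, F(t) of the logarithmic normal distribution is simply the normal distribution function of Eq. 2.1.9 evaluated at ln t. The following is a minimal sketch of that relationship using only Python's standard `math.erf`; it also checks the ±σ, ±2σ and ±3σ coverage of the normal distribution. The function names are illustrative and are not taken from any particular library.

```python
import math


def normal_cdf(t: float, mu: float, sigma: float) -> float:
    """Normal distribution function F(t) (Eq. 2.1.9), evaluated via erf."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))


def coverage(k: float) -> float:
    """Probability that t falls within mu +/- k*sigma (same for any mu, sigma)."""
    return normal_cdf(k, 0.0, 1.0) - normal_cdf(-k, 0.0, 1.0)


def lognormal_cdf(t: float, mu: float, sigma: float) -> float:
    """Log-normal F(t) (Eq. 2.1.14): ln t is normal with mean mu, std dev sigma."""
    if t <= 0.0:
        return 0.0  # lives are positive, so F(t) = 0 for t <= 0
    return normal_cdf(math.log(t), mu, sigma)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within +/-{k} sigma: {100.0 * coverage(k):.2f} %")
    # exp(mu) is the median life: half of the population has failed by then.
    print(f"F(exp(mu)) = {lognormal_cdf(math.exp(10.0), 10.0, 0.5):.3f}")
```

Run as a script, this prints 68.27 %, 95.45 % and 99.73 % for the three coverage intervals and 0.500 for F at the median life exp(μ).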
The probability density function f(t) and distribution function F(t) are given by the following equations.

$$f(t) = \frac{1}{\sigma t\sqrt{2\pi}} \exp\left[-\frac{(\ln t-\mu)^2}{2\sigma^2}\right]$$ (Eq. 2.1.13)

$$F(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_{0}^{t} \frac{1}{x} \exp\left[-\frac{(\ln x-\mu)^2}{2\sigma^2}\right] dx$$ (Eq. 2.1.14)

[Fig. 2-4 Logarithmic Normal Distribution]

In semiconductor device reliability, electromigration life is generally known to follow a logarithmic normal distribution.

(4) Weibull distribution
The Weibull distribution is a weakest-link model proposed by W. Weibull (Sweden) in 1939 as a distribution of mechanical breaking strength. J. H. K. Kao applied it in 1955 to analyze the life of vacuum tubes, and since then it has often been used to model life distributions in semiconductor device reliability analysis. The probability density function f(t) and distribution function F(t) are given by the following equations.

$$f(t) = \frac{m}{\eta}\left(\frac{t-\gamma}{\eta}\right)^{m-1} \exp\left[-\left(\frac{t-\gamma}{\eta}\right)^{m}\right]$$ (Eq. 2.1.15)

$$F(t) = 1 - \exp\left[-\left(\frac{t-\gamma}{\eta}\right)^{m}\right]$$ (Eq. 2.1.16)

[Fig. 2-5 Weibull Distribution]

Here, m is the shape parameter, η the scale parameter (characteristic life) and γ the location parameter. Setting t0 = η^m, the failure rate λ(t) is given by the following equation.

$$\lambda(t) = \frac{m}{\eta}\left(\frac{t-\gamma}{\eta}\right)^{m-1} = \frac{m\,(t-\gamma)^{m-1}}{t_0}$$ (Eq. 2.1.17)

The value of the shape parameter m indicates the failure pattern as follows.
0 < m < 1: Early failure (DFR: decreasing failure rate) pattern, in which the failure rate decreases over time
m = 1: Random failure (CFR: constant failure rate) pattern, in which the failure rate is constant (identical to the exponential distribution)
m > 1: Wear-out failure (IFR: increasing failure rate) pattern, in which the failure rate increases over time

2.1.3 Semiconductor Device Failure Pattern

2.1.3.1 Semiconductor Device Failure Regions

As with general electronic equipment, semiconductor device failures fall into three regions: early, random and wear-out. Plotted against time, the failure rate forms a curve called the bathtub curve, shown in Fig. 2-6. This curve is the sum of the early failure rate, which decreases steadily over time, the random failure rate, which is constant, and the wear-out failure rate, which increases steadily over time. For semiconductor devices, however, the random failure rate is thought to consist only of a small contribution from soft errors, described later. The failure rate in the random failure region (the height of the bottom of the bathtub) is therefore dominated by the sum of the early failure rate as it converges toward a constant value and the wear-out failure rate as it begins to rise.

[Fig. 2-6 Time-Dependent Change in Semiconductor Device Failure Rate (failure rate vs. operating time from product shipment, showing the early, random and wear-out failure regions and the useful life)]

2.1.3.2 Early Failures

The failure rate in the early failure period is called the early failure rate (EFR); it decreases monotonically over time. The vast majority of semiconductor device early failures are caused by defects built into devices, mainly in the wafer process. The most common causes of these defects are dust adhering to wafers during the wafer process and crystal defects in the gate oxide film or the silicon substrate.
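The bathtub curve of 2.1.3.1 can be illustrated by adding three Weibull failure rates (Eq. 2.1.17 with γ = 0): one with m < 1 for early failures, one with m = 1 for random failures and one with m > 1 for wear-out failures. The Python sketch below does this; the (m, η) pairs are hypothetical values chosen only to make the shape visible, not data for any real device.

```python
PER_HOUR_TO_FIT = 1e9  # failures per hour -> FIT (failures per 1e9 device-hours)


def weibull_failure_rate(t: float, m: float, eta: float, gamma: float = 0.0) -> float:
    """Instantaneous failure rate lambda(t) of Eq. 2.1.17, in failures per hour."""
    if t <= gamma:
        raise ValueError("t must exceed the location parameter gamma")
    return (m / eta) * ((t - gamma) / eta) ** (m - 1.0)


def failure_pattern(m: float) -> str:
    """Failure pattern indicated by the Weibull shape parameter m."""
    if m < 1.0:
        return "DFR (early failure)"
    if m == 1.0:
        return "CFR (random failure)"
    return "IFR (wear-out failure)"


# Hypothetical (m, eta [h]) pairs for the three components of the bathtub curve.
EARLY = (0.5, 1e11)
RANDOM = (1.0, 1e9)  # constant 1e-9 per hour, i.e. 1 FIT
WEAR_OUT = (4.0, 1e6)


def bathtub_fit(t: float) -> float:
    """Total failure rate at t hours, as the sum of the three components, in FIT."""
    total = sum(weibull_failure_rate(t, m, eta) for m, eta in (EARLY, RANDOM, WEAR_OUT))
    return total * PER_HOUR_TO_FIT


if __name__ == "__main__":
    for m, _ in (EARLY, RANDOM, WEAR_OUT):
        print(m, failure_pattern(m))
    for hours in (10.0, 1e4, 1e5, 3e5):
        print(f"{hours:>9.0f} h: {bathtub_fit(hours):8.1f} FIT")
```

With these illustrative parameters the total falls from about 500 FIT at 10 h to about 10 FIT at 100,000 h and then rises again as the m = 4 term takes over, which is the bathtub shape of Fig. 2-6.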
Most devices containing defects rooted in the manufacturing process fail within the manufacturing process and are rejected in the final sorting process. However, some devices with relatively minor defects may not yet have failed at final test and may be shipped as passing products. Such inherently defective devices often fail once stress (voltage, temperature, etc.) is applied for a relatively short period, producing a high failure rate within a short time, either in the customer's mounting process or shortly after shipment. Because these devices fail and drop out of the population over time, the rate at which early failures occur decreases.

This property, a failure rate that decreases over time, allows screening known as "burn-in," in which stress is applied for a short time before shipping to eliminate devices containing initial defects. A product group from which devices with initial defects have been removed to a sufficient degree by burn-in not only shows an improved early failure rate in the market, but also maintains high quality over a long period, as long as the products do not enter the wear-out failure region. An overview of burn-in is given below.

(1) Derivation of the failure distribution function of the early failure period
To determine burn-in conditions that reliably remove devices prone to early failure, the failure distribution function of the early failure period must first be obtained. To obtain it, highly accelerated life tests are run for a short time on a sample large enough to be certain to contain devices with initial defects (normally several thousand to ten thousand pieces).
The resulting failure time data is plotted on Weibull probability paper, and the failure distribution function is estimated from the regression line. Fig. 2-7 shows an example. The linear regression gives the shape parameter m and the characteristic life η of the Weibull distribution in the following equation.

$$F(t) = 1 - \exp\left[-\left(\frac{t}{\eta}\right)^{m}\right]$$ (Eq. 2.1.18)

This method of obtaining the failure distribution function is called a burn-in study.

[Fig. 2-7 Weibull Plot of the Burn-in Study]

Note) Weibull probability paper is scaled so that failure times following a Weibull distribution plot as a straight line.

(2) Determining the burn-in conditions
The screening (burn-in) conditions required to reduce the early failure rate after shipment (Note 1) to the target value can be determined from the failure distribution function F(t) obtained in the burn-in study. Let t0 be the burn-in time and K the acceleration factor of the burn-in conditions relative to the market environment. The cumulative early failure rate eliminated by burn-in is then F(K·t0), and the new cumulative early failure rate F_BI(t) up to time t after burn-in is given by the following formula.

$$F_{\mathrm{BI}}(t) = F(K \cdot t_0 + t) - F(K \cdot t_0)$$ (Eq. 2.1.19)

This relationship is shown graphically in Fig. 2-8. The burn-in conditions are selected as the combination of acceleration conditions and time that reduces this value to the target early failure rate or lower. Initial defects that cause early failures are normally most frequent in the early stages of process development and then decrease through process improvements and process mastery. Because the early failure rate decreases in proportion to these initial defects, the burn-in time is reviewed as appropriate as the process improves.
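The burn-in study and Eq. 2.1.19 can be sketched numerically as follows. The handbook does not say how plotting positions on Weibull paper are assigned, so this sketch assumes Bernard's median-rank approximation, F_i ≈ (i − 0.3)/(n + 0.4), with n the total number of devices tested (the survivors are treated as still running when the test ends), and an ordinary least-squares line of ln(−ln(1 − F)) against ln t. It also assumes the failure times have already been converted to market-equivalent hours, so that the fitted F(t) can be used directly in Eq. 2.1.19. All function names and numbers are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def median_ranks(n_tested: int, n_failed: int) -> List[float]:
    """Bernard's approximation of the cumulative failure rate at the i-th failure."""
    return [(i - 0.3) / (n_tested + 0.4) for i in range(1, n_failed + 1)]


def fit_weibull(fail_times: List[float], n_tested: int) -> Tuple[float, float]:
    """Estimate (m, eta) of Eq. 2.1.18 by least squares on Weibull-paper axes.

    On Weibull paper, y = ln(-ln(1 - F)) = m*ln(t) - m*ln(eta), a straight line.
    """
    times = sorted(fail_times)
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - f)) for f in median_ranks(n_tested, len(times))]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx                      # slope = shape parameter
    eta = math.exp(x_bar - y_bar / m)  # from intercept = -m*ln(eta)
    return m, eta


def weibull_cdf(t: float, m: float, eta: float) -> float:
    return 1.0 - math.exp(-((t / eta) ** m))


def post_burn_in_efr(t: float, t0: float, k: float, m: float, eta: float) -> float:
    """Eq. 2.1.19: cumulative early failure rate up to t after burn-in t0 at factor k."""
    return weibull_cdf(k * t0 + t, m, eta) - weibull_cdf(k * t0, m, eta)


def min_burn_in_hours(target: float, t: float, k: float, m: float, eta: float,
                      max_hours: int = 1000) -> Optional[int]:
    """Shortest whole-hour burn-in meeting the target, or None within max_hours.

    Linear scan, so the first hit is the shortest. Burn-in only lowers
    Eq. 2.1.19 while the density f(t) is decreasing (e.g. m <= 1).
    """
    for t0 in range(max_hours + 1):
        if post_burn_in_efr(t, t0, k, m, eta) <= target:
            return t0
    return None


if __name__ == "__main__":
    # Hypothetical study: 5,000 devices, 12 failures generated from m = 0.4, eta = 1e12 h.
    n = 5000
    times = [1e12 * (-math.log(1.0 - f)) ** 2.5 for f in median_ranks(n, 12)]
    m, eta = fit_weibull(times, n)
    one_year = 8760.0
    k = 100.0  # hypothetical acceleration factor of the burn-in conditions
    efr_no_bi = post_burn_in_efr(one_year, 0, k, m, eta)
    t0 = min_burn_in_hours(efr_no_bi / 2, one_year, k, m, eta)
    print(f"m = {m:.2f}, eta = {eta:.3g} h")
    print(f"1-year EFR without burn-in: {efr_no_bi * 1e6:.0f} ppm")
    print(f"burn-in needed to halve it: {t0} h")
```

Because the synthetic failure times sit exactly on the median ranks, the fit recovers m = 0.4 and η = 10^12 h; with real data the points scatter around the line, and the resulting burn-in time is only as good as the assumed acceleration factor K.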
[Fig. 2-8 Early Failure Screening by Burn-in (failure probability density f(t) vs. time; the area up to K·t0 represents the early failures eliminated by burn-in, and the area from shipment to time t represents the cumulative early failure rate after burn-in)]

Note 1) The early failure rate in this section is not the instantaneous failure rate but the cumulative failure rate over the specified period. See the [Supplement] under "2.1.1 Measures for Representing Reliability."

2.1.3.3 Random Failures

Once devices containing initial defects have been eliminated to a sufficient degree, the early failure rate becomes extremely small and the failure rate follows a gently declining curve over time. In this state the failure distribution is close to an exponential distribution, and this is called the random failure period. The semiconductor device failure rate in this period is extremely small compared with the early failure rate immediately after shipment and is normally negligible.

In terms of failure mechanisms, very few semiconductor device failures can be clearly defined as random failures. However, memory soft errors caused by α rays and other high-energy particles are sometimes classified as randomly occurring failure mechanisms.

When semiconductor device failure rates are predicted, failures that occur sporadically long after the start of operation, and failures whose cause could not be determined, are sometimes treated as random failures. Most of these failures, however, are thought to come from devices with relatively minor initial defects (dust or crystal defects) that fail only after a long time, and they should essentially be placed on the decaying tail of the early failure rate curve. A failure rate of this kind cannot be estimated from tests with small sample sizes, such as reliability tests.
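Why a small test cannot resolve such a failure rate follows directly from Eq. 2.1.6: a single failure among N devices tested for T hours at acceleration factor AF already corresponds to 10^9/(N·T·AF) FIT. The sketch below computes this; the sample size, test time and acceleration factor are hypothetical, and AF is assumed to convert test hours into market-equivalent hours.

```python
import math

PER_HOUR_TO_FIT = 1e9  # 1 FIT = 1 failure per 1e9 device-hours


def average_failure_rate_fit(failures: int, devices: int, test_hours: float,
                             accel_factor: float = 1.0) -> float:
    """Eq. 2.1.6 expressed in FIT, using market-equivalent device-hours."""
    device_hours = devices * test_hours * accel_factor
    return failures / device_hours * PER_HOUR_TO_FIT


def devices_for_one_failure_resolution(target_fit: float, test_hours: float,
                                       accel_factor: float = 1.0) -> int:
    """Devices needed so that a single failure corresponds to target_fit or less."""
    return math.ceil(PER_HOUR_TO_FIT / (target_fit * test_hours * accel_factor))


if __name__ == "__main__":
    # Hypothetical test: 77 devices, 1000 h, acceleration factor 100.
    print(f"{average_failure_rate_fit(1, 77, 1000.0, 100.0):.1f} FIT per failure")
    print(devices_for_one_failure_resolution(1.0, 1000.0, 100.0), "devices to resolve 1 FIT")
```

With these numbers one failure already corresponds to about 130 FIT, and resolving 1 FIT with a single failure would take 10,000 devices, which is why very small random failure rates cannot be read from reliability tests of this size. A test with zero failures gives 0 from Eq. 2.1.6, which only says that the rate is below the test's resolution.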
There are also phenomena, such as ESD breakdown, overvoltage (surge) breakdown, that is, electrical overstress (EOS), and latch-up, that occur at random depending on the conditions of use. However, all of these are produced by stress exceeding the device's absolute maximum ratings, so they are classified as breakdowns rather than failures and are not included in the random failure rate.

2.1.3.4 Wear-out Failures

Wear-out failures are rooted in the durability of the materials that make up semiconductor devices and of elements such as transistors, metal lines and oxide films, and they determine the device life (useful years). In the wear-out failure region, the failure rate increases with time until ultimately all devices fail or develop characteristic defects.

The main wear-out failure mechanisms of semiconductor devices are as follows.
• Electromigration
• Hot carrier-induced characteristics fluctuation
• Time-dependent dielectric breakdown (TDDB)
• Laser diode luminance degradation

Semiconductor device life is defined as the time (or stress) at which the cumulative failure rate for the wear-out failure mode reaches a prescribed value, and it can be estimated from reliability test results and test element group (TEG) evaluation. Device life is often determined by the reliability of each element comprising the device (metal lines, oxide film, interlayer film, transistors, etc.), and these are evaluated with a TEG for each element in the process development stage. The TEG results are incorporated into the design rules as allowable stress limits (electric field strength, current density, etc.) to suppress wear-out failures at the product stage and ensure long-term reliability. As a result, semiconductor devices show almost no wear-out failures within the time (stress) range of product-stage reliability tests.
(1) Life estimation method
Semiconductor device life is obtained from the wear-out failure data of TEG evaluations and reliability tests as follows. First, the time-dependent cumulative failure rate is fitted by linear regression on Weibull or logarithmic normal probability paper. The life is then obtained from the time (or stress) at which the reference cumulative failure rate is reached, converted to the market environment with the acceleration factor of the accelerated test conditions (Fig. 2-9).

[Fig. 2-9 Failure Rate Prediction Method Using Weibull Probability Plotting Paper (cumulative failure rate F(t), 0.1% to 99.9%, vs. time, 10 h to 100,000 h; the accelerated-test failure line is shifted by the acceleration factor to give the predicted market-environment failure rate)]

2.2 Semiconductor Reliability Verification

2.2.1 Basic Approach Toward Reliability Verification

The Sony Semiconductor Business Unit performs reliability verification that takes semiconductor device failure modes (see Fig. 2-10) into account at each stage from process development through mass production.

[Fig. 2-10 Semiconductor Device Failure Rate Curve (failure rate vs. operating time: early failure mode (extrinsic failures) for a new process and after burn-in, and wear-out failure mode (intrinsic failures))]

2.2.1.1 Reliability Verification in the Development Stage

The time to wear-out (intrinsic) failure of semiconductor devices, that is, their life, is determined by the failure mechanisms of the process elements described in 2.2.2. In the process development stage, reliability is evaluated with test element groups (TEG) suited to verifying these failure mechanisms, to confirm that the prescribed reliability is satisfied.
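Applied to TEG data of this kind, the life estimation method of 2.1.3.4 (1) amounts to inverting the fitted distribution at the reference cumulative failure rate and scaling by the acceleration factor. The Python sketch below covers both Weibull and logarithmic normal fits; it assumes the acceleration factor is a simple time multiplier (market time = AF × test time), and the parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def weibull_time_at(f_ref: float, m: float, eta: float) -> float:
    """Time at which the Weibull F(t) of Eq. 2.1.16 (gamma = 0) reaches f_ref."""
    return eta * (-math.log(1.0 - f_ref)) ** (1.0 / m)


def lognormal_time_at(f_ref: float, mu: float, sigma: float) -> float:
    """Time at which the log-normal F(t) of Eq. 2.1.14 reaches f_ref."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(f_ref))


def market_life(test_time: float, accel_factor: float) -> float:
    """Shift the accelerated-test result to market conditions (cf. Fig. 2-9)."""
    return test_time * accel_factor


if __name__ == "__main__":
    # Hypothetical accelerated-test fit: m = 3, eta = 5000 h; reference F = 0.1 %.
    t_test = weibull_time_at(0.001, 3.0, 5000.0)
    life_h = market_life(t_test, 200.0)  # hypothetical acceleration factor
    print(f"test: {t_test:.0f} h -> market: {life_h:.0f} h ({life_h / 8760:.1f} years)")
```

In this example the 0.1% point is reached after roughly 500 h of accelerated testing, which an acceleration factor of 200 maps to roughly 100,000 h (about 11 years) in the market.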
2.2.1.2 Reliability Verification in the Prototype Stage

(1) Reliability verification for wear-out failures (intrinsic failures)
Small quantities of prototypes are evaluated over long times to verify that wear-out failures do not occur in the assumed operating environments and operating periods (see Table 2-1).

(2) Reliability verification for early failures (extrinsic failures)
Semiconductor devices tend to have a high failure rate at the start of operation, which then decreases steadily over time. This is because a certain percentage of devices contain manufacturing defects, such as dust, that cause them to fail. The tendency is more pronounced for new processes, so burn-in studies are performed when production is introduced to verify the early failure rate. When the prescribed failure rate is not satisfied, burn-in and other screening methods are used to remove devices with manufacturing defects. The Sony Semiconductor Business Unit continuously works to stabilize and improve its processes, striving to reduce the number of devices with manufacturing defects so that the prescribed early failure rates can be met without burn-in.

2.2.1.3 Reliability Verification in the Mass Production Stage

Mass production items are sampled* and their reliability is periodically evaluated at the product level, as in item (1) of 2.2.1.2, to confirm that the wear-out reliability built in at the development stage is maintained throughout mass production.

* Samples are taken from each product family, taking into account combinations of wafer process, assembly process, factory and other factors.

Table 2-1 shows typical LSI product reliability test items used by the Sony Semiconductor Business Unit.
Table 2-1 Typical Sony LSI Product Reliability Test Items

| Name of test | Code | Test conditions |
|---|---|---|
| High Temperature Operating Life | HTOL | Tj ≥ 125°C, Vop_max, 1000 h |
| Low Temperature Operating Life | LTOL | Ta = −55°C, Vop_max, 1000 h |
| Temperature Humidity Bias | THB | Ta = 85°C, 85%RH, Vop_max, On/Off, 1000 h |
| High Temperature Storage | HTS | Ta = 150°C, 1000 h |
| Temperature Cycling | TC | Ts = −65 to 125°C, 700 cycles; Ts = −40 to 125°C, 850 cycles; Ts = −65 to 150°C, 500 cycles |
| Moisture Sensitivity Level | MSL | Level 3 (standard rank) (J-STD-020) |
| Electrostatic Discharge, Human Body Model (HBM) | ESD HBM | C = 100 pF, R = 1500 Ω (JS-001-2014) |
| Electrostatic Discharge, Charged Device Model (CDM) | ESD CDM | Charged device model (JESD22-C101) |
| Latch-Up, Trigger Pulse Current Injection Method | LU I-Test | Trigger pulse current injection method (JESD78) |
| Latch-Up, Supply Overvoltage Method | LU V-Test | Power supply overvoltage method; Ta = 25°C, 125°C (JESD78) |
| Burn-In Study (Early Life Failure Rate) | BIS (ELFR) | Tj ≥ 125°C, Vop_max |

2.2.2 Reliability in the Development and Design Stages

Semiconductor devices have failure mechanisms unique to semiconductors, and resolving these in the process development stage is essential to securing reliability. Stable product reliability is secured by verifying the required reliability when each process element is developed and reflecting the results in the design rules. Table 2-2 shows typical failure mechanisms that can cause problems in the process development stage.

As processes become more miniaturized, higher internal electric fields, current densities, metal line stress and other factors increase the stress applied to transistors and metal lines. At the same time, faster circuit speeds and increased parasitic impedance (metal line resistance, parasitic capacitance) reduce operating margins, so securing reliability against transistor characteristics fluctuation becomes a major issue.
Typical semiconductor device failure mechanisms that can cause problems in the process development and design stages are described below.

Table 2-2 Typical Failure Mechanisms in the Process Development Stage

| Process element | Failure mechanism | Failure mode and cause |
|---|---|---|
| Gate dielectric film | Time-dependent dielectric breakdown (TDDB) | Dielectric breakdown of the gate dielectric film. Bias applied to the gate electrode for a long time produces defects in the gate dielectric film, increasing the micro leak current and leading to dielectric breakdown. |
| Transistor | Hot carrier (HCI) | Transistor characteristics fluctuation due to hot carriers trapped in the gate dielectric film. Electrons accelerated by high electric fields cause impact ionization, and the resulting high-energy electrons and holes are trapped in the oxide film, causing the transistor characteristics to fluctuate. |
| Transistor | NBTI (slow trap) | PMOS transistor characteristics fluctuation due to a negative gate bias applied at high temperature (NBT). Also called the slow trap phenomenon: applying bias at high temperature increases the interface states and positive fixed charge, causing the transistor characteristics to fluctuate. |
| Memory device | Soft error | Memory data errors caused by high-energy cosmic ray particles (neutrons, protons, etc.), α rays, etc. A temporary data error phenomenon that occurs mainly in DRAM and SRAM. |
| Memory device | Retention/disturb | Non-volatile memory data loss. Long-term storage or operating environment stress (read/write electric field, temperature, stress) causes the charge trapped in a Flash memory to disappear, inverting the data. |
| Metal lines | Electromigration | Increased metal line resistance and disconnection due to voids forming in metal lines. Momentum transfer from electrons colliding with metal atoms moves the atoms, creating voids. |
| Metal lines | Stress migration | Creep driven by metal line stress causes voids to form and grow in metal lines and connections (via holes), resulting in open defects. In copper lines, vacancies (atom holes) induced by metal line stress drive the creep, causing voids to form and grow. |
| Low-k interlayer films | TDDB between metal lines | Short circuit due to dielectric breakdown between copper lines, occurring mainly via the CMP interface of an interlayer dielectric film that uses low-k materials. |

2.2.2.1 Time-Dependent Dielectric Breakdown (TDDB)

The gate dielectric film of a MOS FET has a failure mechanism whereby an electric field below the dielectric withstand voltage, applied for a long time, causes the film to deteriorate and eventually break down. This breakdown of the dielectric film over time is called time-dependent dielectric breakdown (TDDB). The TDDB life of the gate dielectric film is one of the most important factors determining the long-term reliability of MOS-type semiconductor devices. TDDB life is said to set the limit for thinning the gate dielectric film, and in system LSIs the gate dielectric film thickness is sometimes determined by the TDDB life at the logic circuit supply voltage.

(1) Gate dielectric film life distribution
Time-dependent dielectric breakdown can generally be divided into an initial breakdown area rooted in defects and a genuine life area. Fig. 2-11 shows TDDB measurement data of a gate oxide film (SiO2) plotted on a Weibull distribution. The initial breakdown and genuine life areas can be distinguished by their different shape parameters (slopes) in the Weibull plot.
Dielectric film that falls in the initial breakdown area, with a short TDDB life, is oxide film containing defects that may fail in a short time in the market, so it is important to suppress the defect occurrence rate in order to lower the early failure rate. In contrast, the genuine breakdown area represents the intrinsic life of gate dielectric film without major defects and is a necessary index for assuring long-term reliability. The genuine life at the actual operating voltage can be predicted with an electric field acceleration model from the results of TDDB tests accelerated by high electric field stress. Depending on the film thickness and film type, the E model (τ ∝ exp(−γE), where γ is the field acceleration factor), the power-law model (τ ∝ E^(−n)) or other models are used (see Fig. 2-12).

(2) Gate dielectric film breakdown mechanism
Gate dielectric film contains many micro defects and impurities introduced in the wafer process, and micro leak currents flow through these defects even when the applied electric field (supply voltage) is below the genuine withstand voltage. Over time these leak currents generate new defects in the dielectric film, and the accumulation of defects leads to dielectric breakdown. The percolation model is a typical failure model for TDDB breakdown of thin gate dielectric films. In this model, dielectric breakdown occurs when defects initially present in the gate dielectric film and new defects generated by the tunnel current flowing under the applied field form a continuous chain across the film thickness (see Fig. 2-13). As the gate dielectric film becomes thinner, fewer defects are needed to form such a continuous path, so the variance in TDDB life increases. In addition, data written in Flash memories can be lost through micro leak currents even before breakdown (retention failure).
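Extrapolating a TDDB life measured at a test field E_TEST to the actual field E_FIELD (Fig. 2-12) can be sketched for the two models above, written relative to the measured point: τ(E) = τ_TEST·exp[γ(E_TEST − E)] for the E model and τ(E) = τ_TEST·(E_TEST/E)^n for the power-law model. The values of γ, n, the fields and the measured life below are hypothetical; in practice the model parameters are fitted from TDDB data taken at several stress fields. The example shows how the choice of model alone can change the predicted life.

```python
import math


def life_e_model(tau_test: float, e_test: float, e_field: float, gamma: float) -> float:
    """E model: tau is proportional to exp(-gamma * E)."""
    return tau_test * math.exp(gamma * (e_test - e_field))


def life_power_law(tau_test: float, e_test: float, e_field: float, n: float) -> float:
    """Power-law model: tau is proportional to E**(-n)."""
    return tau_test * (e_test / e_field) ** n


if __name__ == "__main__":
    # Hypothetical: characteristic life of 10 h at 12 MV/cm, operating field 5 MV/cm.
    tau_test, e_test, e_field = 10.0, 12.0, 5.0
    print(f"E model (gamma = 2 cm/MV): {life_e_model(tau_test, e_test, e_field, 2.0):.3g} h")
    print(f"power law (n = 20):        {life_power_law(tau_test, e_test, e_field, 20.0):.3g} h")
```

With these illustrative parameters the two predictions differ by more than an order of magnitude, which is why the acceleration model must be chosen to match the film thickness and film type.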
[Fig. 2-11 TDDB Data Distribution (Weibull): oxide film that includes defects (early failure area); genuine life distribution]
[Fig. 2-12 Electric Field Acceleration Model and Life Prediction: EFIELD = actual electric field, ETEST = test electric field]
[Fig. 2-13 Gate Dielectric Film Breakdown Model (Percolation Model): (a) initial stage, (b) defect generated by micro leak current, (c) breakdown occurs]

2.2.2.2 Hot carrier (HCI)
Hot carrier degradation is a failure mechanism in which carriers gain high energy, mainly by acceleration in the electric field inside the MOS FET, and become trapped in the gate dielectric film. The trapped charge shifts the transistor characteristics and causes circuit malfunctions. In a typical operating environment, the largest transistor degradation is caused by drain avalanche hot carrier (DAHC) injection. DAHC injection occurs when electrons flowing along an NMOS FET channel are accelerated by the high electric field near the drain. The same mechanism of injecting charge into the dielectric film is also used deliberately to write and erase data in non-volatile memories.

(1) Drain Avalanche Hot Carrier (DAHC) injection
Electrons flowing in an NMOS FET channel are accelerated by the high electric field near the drain and undergo impact ionization, generating electron-hole pairs. Of the electron and the hole, the carrier with the higher energy (the hot carrier) is injected into and trapped by the gate dielectric film. This shifts the transistor characteristics, for example through a threshold voltage shift or a drop in drain current. This is called drain avalanche hot carrier (DAHC) injection. (See Fig. 2-14.) In an NMOS FET, the dominant DAHC injection mode is electron injection, and the maximum degradation occurs when the gate voltage is approximately VDS/2.
Consequently, in a CMOS circuit, hot electron injection occurs each time the signal switches (H→L/L→H), so degradation progresses as the circuit operates. The problem can be avoided by choosing, at the circuit design stage, operating conditions (voltage, duty) under which hot carriers are not easily generated. Reliability can be further increased by giving circuits the required operating margin. Device countermeasures are also taken, such as adopting a lightly doped drain (LDD) structure, which suppresses hot carrier generation by reducing the electric field near the drain.

[Fig. 2-14 DAHC Mechanism: gate, source, drain; electron and hole]

2.2.2.3 Negative Bias Temperature Instability (NBTI)
PMOS FET negative bias temperature instability (NBTI) is the phenomenon in which transistor characteristics shift when a negative gate bias is applied to a PMOS FET. It is one of the transistor degradation mechanisms known as slow traps. In the latest MOS processes, the use of surface-channel PMOS FETs increases this degradation, making NBTI a transistor reliability problem on a par with hot carriers.

(1) NBTI deterioration mechanisms
When a negative bias is applied to a PMOS FET, holes at the Si surface are trapped at the Si-H bonds of the Si-SiO2 interface. Hydrogen (H) then dissociates from the Si-H bond, generating an interface state. The dissociated hydrogen diffuses into and is trapped within the gate dielectric film, generating a positive fixed charge that promotes degradation of the transistor characteristics.

$$\mathrm{Si{\equiv}Si{-}H} + h^{+} \rightarrow \mathrm{Si{\equiv}Si}^{\bullet +} + \mathrm{H}$$
$$\mathrm{H} + \mathrm{H} \rightarrow \mathrm{H_2}$$

The interface states generated between the Si and the gate dielectric film trap positive charge when the PMOS FET operates and become positively charged. This creates positive fixed charge in the dielectric film, which shifts the transistor threshold voltage (Vth) and reduces the drain current.
One characteristic of NBTI is that degradation occurs whenever negative bias is applied to the gate, regardless of transistor switching. It therefore proceeds even in circuits that are not operating. On the other hand, the shifted characteristics recover rapidly when the negative bias stress is removed. The amount of shift under operating conditions is known to be largely independent of the operating frequency. With respect to process conditions, the amount of NBTI degradation is closely related to the concentrations and profiles of impurities (N, H, B, etc.) in the gate dielectric film. Degradation increases in particular for gate dielectric films with high nitrogen (N) content (SiON, SiN).

This problem can be avoided by design countermeasures, such as providing sufficient circuit operating margin to allow for transistor degradation and reducing the electric field applied to the gate dielectric film. Device countermeasures are also taken, such as forming the gate dielectric film so that interface states and fixed charges are not easily generated.

[Fig. 2-15 NBTI Failure Mechanisms: Si-SiO2 interface terminated by hydrogen (H) with negative bias applied; hole trapping by the tunnel phenomenon; dissociation of hydrogen (H) and generation of an interface state; diffusion into the oxide film and generation of a positive fixed charge]

2.2.2.4 Soft Error
When α rays, or high-energy neutrons generated by cosmic rays, penetrate memory elements and other semiconductor devices, large numbers of electron-hole pairs are generated in the silicon crystal. These charges can invert the data stored at memory nodes, causing memory data errors known as soft errors. A soft error temporarily inverts memory or logic circuit data, and the error can be recovered by rewriting the data.
This phenomenon was previously a problem mainly for DRAM, but it is now also considered a reliability issue for SRAM.

(1) Principle of soft error generation by α rays
The quartz materials used in the sealing resin of semiconductor packages contain trace amounts of radioactive elements (uranium: 238U; thorium: 232Th). In addition, the lead bumps used in flip-chip packages sometimes contain polonium (210Po). When high-energy α rays emitted by these radioactive elements penetrate the silicon substrate, electron (e−) and hole (h+) pairs are generated along the α-ray path in the silicon. The electric field causes the electrons generated inside the depletion layers to drift and collect in the n-type diffusion area, which lowers the potential of the memory node capacitance. (See Fig. 2-16.)

Fig. 2-17 shows the soft error mechanism in an SRAM memory cell. When the potential of the High-side memory node falls below the threshold voltage of the driver transistor, both inverters forming the flip-flop turn off at the same time. This makes the flip-flop unstable and causes a malfunction. In general, when the word line is selected, the High-side memory node potential (Vh) drops to Vcc − Vth, where Vth is the word transistor threshold voltage. When the word line is deselected, the High-side memory node is recharged through the memory cell load, and its potential returns to Vcc. The shorter the recovery time from Vcc − Vth to Vcc, the more resistant the SRAM is to soft errors. In other words, resistance increases with the current supply capability of the memory cell load.

Countermeasures against soft errors caused by α rays include forming a protective film on the chip surface to absorb α rays. Countermeasures are also taken to reduce α-ray emission, such as using high-purity package materials with reduced radioactive element content.

[Fig. 2-16 Generation of Electron and Hole Pairs by α Rays]
[Fig. 2-17 Soft Error in the SRAM Cell]

(2) Soft errors due to cosmic rays
High-energy cosmic rays collide with atoms in the atmosphere, generating high-energy protons and neutrons. When these high-energy neutrons pass through silicon, they collide with silicon atoms and produce secondary ions through spallation reactions. Electron-hole pairs generated along the tracks of these particles can cause soft errors. The flux of high-energy cosmic-ray neutrons reaching the ground varies with geographic conditions. It increases at high altitudes, where atmospheric shielding is weaker, so the soft error rate increases there. This can pose serious reliability problems in applications such as aircraft and satellites.

The causes of cosmic-ray soft errors are difficult to suppress, so this is known as a failure mode that occurs with a certain probability. One countermeasure for SRAM is to implement error correcting code (ECC) so that data affected by soft errors is corrected. Device structures that are resistant to soft errors, such as SOI structures, are also sometimes used.

2.2.2.5 Electromigration
Electromigration is a failure mechanism in which electrons flowing through metal (Al, Cu) lines collide with the metal atoms and make them migrate. The migration forms voids in the metal lines, leading to increased line resistance and disconnection. Electromigration is a key failure mechanism that determines the long-term reliability of metal lines.

(1) Aluminum electromigration
The aluminum (Al) thin films used for metal lines are formed by sputtering, and the aluminum atoms are deposited in a polycrystalline (grain) structure. (See Fig. 2-18.)
When the current density exceeds a certain level, electromigration occurs: the metal atoms are physically moved by the force of collisions between electrons and metal atoms. Metal atoms near grain boundaries have weak bonding energy and move easily. In metal lines with uneven grain sizes, electromigration therefore occurs along the grain boundaries, where voids form and grow, leading to disconnection. (See Figs. 2-19 and 2-20.)

Process countermeasures include adding trace amounts of copper to aluminum to slow the migration of aluminum atoms. Another is covering the top and bottom of the metal lines with Ti, W or other metal alloy layers (cap layers) to suppress aluminum atom movement. Circuit design countermeasures are also taken, such as keeping the current density in metal lines at or below a specified value.

[Fig. 2-18 Aluminum Grain Structure]
[Fig. 2-19 Electromigration Mechanism: electron flow, grain boundary diffusion along Al grain boundaries, Al shortage (void) and Al accumulation]
[Fig. 2-20 Photo of Electromigration: interlayer dielectric film]

(2) Copper electromigration
Copper lines are formed by an embedded metal line (damascene) process using electroplating. Copper has a higher melting point and a higher activation energy than aluminum, and its electromigration resistance is tens to hundreds of times higher. However, the miniaturization of metal lines in the latest processes is increasing current densities, so electromigration resistance is becoming an important reliability issue for copper as well. The electromigration resistance of copper is known to be strongly affected by the crystal grain size and orientation. It is also strongly affected by the adhesion at the interface between the copper and the barrier metal.
Copper lines are surrounded by barrier metal, and their top surface is planarized during processing and then covered by a cap layer. If adhesion between the copper and this cap layer is poor, copper atoms at that interface move easily and migration results. It is therefore important that the process incorporate countermeasures to increase adhesion at the copper/cap layer interface. Circuit design countermeasures are also taken, such as keeping the current density in metal lines at or below a specified value.

2.2.2.6 Stress Migration
Stress migration is a failure mechanism in which stress applied to metal lines causes the metal atoms to creep. This forms voids in the metal lines, leading to increased line resistance and disconnection. Stress arises in the metal lines (Al, Cu) of an LSI from the difference between the heat treatment temperature in the manufacturing process and the operating environment temperature. Driven by this stress, vacancies in the metal lines can creep and converge at a single location, forming a void. Stress migration thus results from the interaction between metal line stress and metal atom creep. The creep rate of metal atoms increases at high temperatures, but the stress acting on the metal lines decreases. The temperature at which stress migration occurs is therefore known to have a peak.

(1) Aluminum stress migration
Aluminum lines contain many vacancies and weakly bonded aluminum atoms at the grain boundaries of the polycrystalline structure. When tensile stress is applied to the lines, these atoms and vacancies at the grain boundaries creep and form voids. Aluminum voids produced by tensile stress mainly form and grow along crystal grain boundaries, and can lead to increased metal line resistance and disconnection defects. (See Fig. 2-21.)
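The temperature peak arises from the product of two terms: a thermally activated creep rate, and a thermal stress that shrinks as the temperature approaches the stress-free (processing) temperature. The sketch below assumes the McPherson-Dunn form, rate ∝ (T0 − T)^N · exp(−Ea/kT). This form is commonly used for stress-induced voiding, but it is not given in this handbook. The values of T0, Ea and N are hypothetical and chosen only for illustration. Setting the derivative of the logarithm of the rate to zero gives Ea(T0 − T) = NkT², and the positive root of this equation is the peak temperature.

```python
import math

K_B = 8.617e-5  # Boltzmann's constant [eV/K]


def voiding_rate(t, t0, ea, n):
    """Relative stress-migration rate at temperature t [K] (McPherson-Dunn form, assumed).

    (t0 - t)**n : thermal stress, vanishing at the stress-free temperature t0
    exp(-ea/kT) : thermally activated creep of atoms/vacancies
    """
    if t >= t0:
        return 0.0  # no thermally induced tensile stress at or above t0
    return (t0 - t) ** n * math.exp(-ea / (K_B * t))


def peak_temperature(t0, ea, n):
    """Positive root of n*k*T**2 + ea*T - ea*t0 = 0, where the rate peaks."""
    a = n * K_B
    return (-ea + math.sqrt(ea * ea + 4.0 * a * ea * t0)) / (2.0 * a)


# Hypothetical parameters: stress-free temperature 300 °C, Ea = 0.6 eV, N = 2
T0, EA, N = 573.15, 0.6, 2.0
tp = peak_temperature(T0, EA, N)
print(f"peak at about {tp - 273.15:.0f} °C")
for celsius in range(100, 301, 50):
    t = celsius + 273.15
    print(celsius, f"{voiding_rate(t, T0, EA, N):.3e}")
```

The sketch only shows why a peak must exist. The peak temperatures quoted for real aluminum and copper lines are experimental observations.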
Aluminum stress migration is generally said to peak in occurrence at around 150 to 200°C. It can become a long-term reliability problem for devices used for long periods in high-temperature environments. As a design countermeasure, patterns are designed to avoid applying excessive stress to metal lines. Process countermeasures include a metal line structure that sandwiches the aluminum between upper and lower cap layers (Ti, W, etc.) to prevent stress migration. Other countermeasures, such as an interlayer film structure that reduces stress and an optimized heat treatment process, are used to reduce the residual stress in metal lines.

[Fig. 2-21 Disconnection Defect due to Aluminum Stress Migration]

(2) Copper stress migration
For copper, the main reliability issue is the stress induced voiding (SIV) mode, in which voids form in the via holes that connect upper and lower lines. When a wide line and a narrow line are connected by a single via hole, the tensile stress on the wide-line side concentrates at the via hole. This causes vacancies in the copper to creep and migrate to the via hole, forming a void. (See Fig. 2-22.) Stress migration at copper via holes is known to peak in occurrence at around 200°C. However, this failure depends largely on the stress generated in the high-temperature annealing process after copper line formation. It therefore occurs within a short time and is a factor in early failures.

A design-stage countermeasure is to use multiple via holes where wide and narrow lines are connected. Suppose stress concentrates on one via hole and creates a void. The stress applied to the other via holes is then reduced, so voids are unlikely to form there, and open defects between the metal lines are prevented.
Process countermeasures are also taken, such as reducing the stress in the copper and selecting process conditions that reduce the number of vacancies in the copper.

[Fig. 2-22 Void Caused by Stress Migration in a Copper Wiring Via Hole 1)]

<References>
1) R. Kanamura et al.: Symp. on VLSI Tech., p. 107, 2003

2.3 Acceleration Model
In general, failures of components, including semiconductor devices, result from reactions at the atomic or molecular level. They can be described by Eyring's absolute reaction rate theory (hereafter, the "Eyring model"). Within the absolute temperature range T relevant to reliability, the Eyring model expresses the lifetime L as the following separated-variable equation. The equation uses the activation energy Ea shown in Fig. 2-23, a non-temperature stress S that induces failure, and Boltzmann's constant k (8.617×10⁻⁵ eV/K):

$$L = A \cdot S^{-n} \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.1)}$$

where A and n are constants. The environmental and operating stress acceleration models used for semiconductor devices are outlined below.

2.3.1 Acceleration Models for Environmental Stress
(1) Temperature acceleration model
The factor exp(Ea/kT) on the right side of Eq. 2.3.1 is also called the Arrhenius model, because it has the same form as the equation derived empirically by Arrhenius in the 19th century. Ea is the activation energy, expressed in electron volts (eV). The activation energy is the energy barrier that must be overcome for a chemical or physical reaction to proceed. Failure mechanisms based on the same chemical or physical reaction therefore necessarily have the same activation energy.

$$L = A \cdot \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.2)}$$

(2) Humidity acceleration model
Humidity acceleration models use either the absolute vapor pressure VP or the relative humidity RH as the humidity stress. Typical models are described below.
① Absolute vapor pressure model
This model expresses the combined temperature and humidity stress through the absolute vapor pressure VP, and it is known empirically to fit well. Because VP itself depends on temperature, the model cannot be written in the separated-variable Eyring form.

$$L = A \cdot V_P^{-n} \qquad \text{(Eq. 2.3.3)}$$

② Relative humidity model
Because VP depends on temperature, this model instead uses the absolute temperature T and the relative humidity RH. This gives a separated-variable equation consistent with the Eyring model, corresponding to S = RH in Eq. 2.3.1.

$$L = A \cdot (RH)^{-n} \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.4)}$$

③ Lycoudes model
Models that multiply temperature, relative humidity and voltage terms are also used. A typical example is the model reported by N. Lycoudes:

$$\mathrm{MTTF} = A \cdot \exp\left(\frac{E_a}{kT}\right) \cdot \exp\left(\frac{B}{RH}\right) \cdot V^{-1} \qquad \text{(Eq. 2.3.5)}$$

where V is the voltage and B is a constant.

(3) Temperature difference acceleration model
This model applies to failures caused by repeated thermal stress produced by temperature differences. With ΔT as the temperature difference, the number of cycles to failure N is obtained by substituting S = ΔT into Eq. 2.3.1:

$$N = A \cdot \Delta T^{-\alpha} \qquad \text{(Eq. 2.3.6)}$$

[Supplement] For low-cycle fatigue, the thermal fatigue life (cycle life) Nf of materials follows the Coffin-Manson model below. Here ΔεP is the plastic strain amplitude, and α and C are material constants:

$$\Delta\varepsilon_P \cdot N_f^{\alpha} = C \qquad \text{(Eq. 2.3.7)}$$

In the low-cycle fatigue range, failure due to repeated thermal stress therefore follows the Coffin-Manson model, and the temperature difference acceleration model is considered a form of it.
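Acceleration factors follow directly from these models as the ratio of lives under use and test conditions, because the constant A cancels. The sketch below applies Eq. 2.3.4 (relative humidity with the Arrhenius term) and Eq. 2.3.6 (temperature difference). The parameter values in the example are hypothetical placeholders, not recommended values. In practice, Ea, n and α must be determined for the specific failure mechanism.

```python
import math

K_B = 8.617e-5  # Boltzmann's constant [eV/K]


def arrhenius_af(ea, t_use_c, t_test_c):
    """L_use / L_test from Eq. 2.3.2 (temperatures in °C)."""
    t_use = t_use_c + 273.15
    t_test = t_test_c + 273.15
    return math.exp(ea / K_B * (1.0 / t_use - 1.0 / t_test))


def humidity_af(ea, n, t_use_c, rh_use, t_test_c, rh_test):
    """L_use / L_test from the relative humidity model, Eq. 2.3.4."""
    return (rh_test / rh_use) ** n * arrhenius_af(ea, t_use_c, t_test_c)


def temp_cycle_af(alpha, dt_use, dt_test):
    """N_use / N_test from the temperature difference model, Eq. 2.3.6."""
    return (dt_test / dt_use) ** alpha


# Hypothetical example: 85 °C/85 %RH test vs. 30 °C/60 %RH use,
# and a -65/150 °C cycle test vs. a 0/60 °C field cycle.
print(f"humidity AF:   {humidity_af(0.8, 3.0, 30.0, 60.0, 85.0, 85.0):.0f}")
print(f"temp-cycle AF: {temp_cycle_af(2.0, 60.0, 215.0):.1f}")
```

Multiplying the test duration (or test cycle count) by the AF gives the equivalent duration (or cycle count) under use conditions, provided the assumed model and parameters hold.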
Semiconductor chip failures can be broadly described by the temperature difference acceleration model. However, the Coffin-Manson model must be considered for mounting failures that involve package factors, such as the thermal fatigue life of solder joints. The following modified Coffin-Manson model, proposed by Norris et al., accounts for the effects of the temperature cycling frequency and the maximum temperature:

$$N_f = C \cdot f^{m} \cdot \Delta\varepsilon_P^{-n} \cdot \exp\left(\frac{Q}{kT_{MAX}}\right) \qquad \text{(Eq. 2.3.8)}$$

where Nf is the fatigue life, C is a material constant, m and n are exponents, f is the cycling frequency, ΔεP is the plastic strain amplitude, Q is the activation energy, k is Boltzmann's constant and TMAX is the maximum temperature.

[Fig. 2-23 Activation Energy: normal state, activated state and degraded state; activation energy]

2.3.2 Acceleration Models for Operating Stress
The operating stresses that determine semiconductor device life include voltage, current, electric field strength and current density. They differ according to the failure mechanism, as described in section 2.2.2. The acceleration models for the main failure mechanisms are described below. Life in these models also depends on temperature, so each is expressed as an Eyring model combining the operating stress and the temperature stress.

(1) Time-dependent dielectric breakdown (TDDB) acceleration models
The time to failure (TTF) due to TDDB depends on the gate oxide film thickness. The Eox model is said to be appropriate for gate oxide films 5 nm thick or more. The Vg model suits films thicker than 2 nm and thinner than 5 nm, and the power-law model suits films 2 nm thick or less.

① Eox model
$$\mathrm{TTF} = A \cdot \exp(-\gamma_{EOX} \cdot E_{ox}) \cdot \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.9)}$$

② Vg model
$$\mathrm{TTF} = A \cdot \exp(-\gamma_{Vg} \cdot V_g) \cdot \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.10)}$$

③ Power-law model
$$\mathrm{TTF} = A \cdot V_g^{-n} \cdot \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.11)}$$

where γEOX is the field acceleration factor, γVg and n are voltage acceleration factors, Eox is the stress electric field applied to the gate and Vg is the stress voltage applied to the gate.

(2) Hot carrier (HCI) acceleration models
The life of devices due to hot carriers is described by two models: the substrate current model, expressed in terms of the substrate current, and the 1/Vds model, expressed in terms of the drain voltage. For devices of the 0.25 µm and 0.15 µm process generations and newer, factors other than the substrate current have a greater effect, so the 1/Vds model is becoming the main one.

① Substrate current model
$$\mathrm{TTF} = A \cdot I_{sub}^{-m} \cdot \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.12)}$$

② 1/Vds model
$$\mathrm{TTF} = A \cdot \exp\left(\frac{B}{V_{ds}}\right) \cdot \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.13)}$$

where m is an exponent that depends on the substrate current, B is a factor that depends on the voltage, Isub is the maximum substrate current during stress and Vds is the drain voltage during stress.

(3) Negative Bias Temperature Instability (NBTI) acceleration models
The life of devices due to NBTI is often expressed by the following equations:

$$\mathrm{TTF} = A \cdot \exp(-\gamma \cdot E_{ox}) \cdot \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.14)}$$
$$\mathrm{TTF} = A \cdot E_{ox}^{-\gamma} \cdot \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.15)}$$
$$\mathrm{TTF} = A \cdot V_g^{-n} \cdot \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.16)}$$

where γ is the field acceleration factor, n is the voltage acceleration factor, Eox is the stress electric field applied to the gate oxide film and Vg is the stress voltage applied to the gate oxide film.

(4) Electromigration (EM) acceleration model
In general, the life of devices due to EM is explained theoretically by Huntington's equation:

$$\frac{\partial C}{\partial t} = D\,\nabla \cdot \left\{ \nabla C - \frac{eZ^{*}}{kT}\,E\,C \right\} \qquad \text{(Eq. 2.3.17)}$$

where C is the atomic concentration, D is the diffusion coefficient, Z* is the effective valence, E is the electric field, e is the elementary charge, k is Boltzmann's constant and T is the absolute temperature.
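The same ratio-of-lives approach applies to the operating-stress models in this section. The sketch below computes acceleration factors for three models, each multiplied by the Arrhenius term: the TDDB Vg model (Eq. 2.3.10), the power-law model (Eq. 2.3.11) and the hot carrier 1/Vds model (Eq. 2.3.13). It assumes that γVg, n and B are positive, so that life falls as the stress rises. The helper names and all numerical values are hypothetical.

```python
import math

K_B = 8.617e-5  # Boltzmann's constant [eV/K]


def arrhenius(ea, t_use_c, t_test_c):
    """Temperature part of TTF_use / TTF_test (temperatures in °C)."""
    return math.exp(ea / K_B * (1.0 / (t_use_c + 273.15) - 1.0 / (t_test_c + 273.15)))


def af_vg_model(gamma_vg, vg_use, vg_test, ea, t_use_c, t_test_c):
    """Eq. 2.3.10: TTF = A*exp(-gamma_vg*Vg)*exp(Ea/kT)."""
    return math.exp(gamma_vg * (vg_test - vg_use)) * arrhenius(ea, t_use_c, t_test_c)


def af_power_law(n, vg_use, vg_test, ea, t_use_c, t_test_c):
    """Eq. 2.3.11: TTF = A*Vg**(-n)*exp(Ea/kT)."""
    return (vg_test / vg_use) ** n * arrhenius(ea, t_use_c, t_test_c)


def af_hci_1_over_vds(b, vds_use, vds_test, ea, t_use_c, t_test_c):
    """Eq. 2.3.13: TTF = A*exp(B/Vds)*exp(Ea/kT)."""
    return math.exp(b * (1.0 / vds_use - 1.0 / vds_test)) * arrhenius(ea, t_use_c, t_test_c)


# Hypothetical stress test at 2.0 V / 125 °C, use at 1.2 V / 85 °C
print(f"TDDB Vg model AF:  {af_vg_model(8.0, 1.2, 2.0, 0.6, 85.0, 125.0):.3e}")
print(f"TDDB power-law AF: {af_power_law(40.0, 1.2, 2.0, 0.6, 85.0, 125.0):.3e}")
print(f"HCI 1/Vds AF:      {af_hci_1_over_vds(30.0, 1.2, 2.0, 0.1, 85.0, 125.0):.3e}")
```

Each factor converts a time to failure measured under stress into an estimated time to failure at use conditions. The estimate is only as good as the model and the fitted parameters.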
To calculate the actual life of devices due to EM (TTF), the empirically derived Black's equation is widely used:

$$\mathrm{TTF} = A \cdot j^{-n} \cdot \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.18)}$$

where T is the absolute temperature, j is the current density, Ea is the activation energy, A is a constant of proportionality, n is the current density exponent and k is Boltzmann's constant.

<References>
1) JEITA EDR-4704A: Application guide of the accelerated life test for semiconductor devices
2) JEITA EDR-4707: Report on Failure Mechanism of LSI and reliability test method
3) JEITA ETR-7024: Research Report on Effect of Voids on Reliability of Lead-Free Solder Joints and Standard of Evaluation Criteria
4) N. J. Flood: Reliability aspects of plastic encapsulated integrated circuit, IRPS (1972)
5) D. S. Peck: Temperature-humidity acceleration of metal-electronics failure in semiconductor devices, IRPS (1973)
6) N. Lycoudes: The reliability of plastic microcircuits in moist environments, Solid State Technology (1978)
7) T. Grasser: Hot Carrier Degradation in Semiconductor Devices
8) Comparison of NMOS and PMOS hot carrier effects, IEEE Transactions on Electron Devices (1997)
9) H. B. Huntington: Diffusion in Solids, Academic Press (1975)
