Chapter 2 Semiconductor Device Reliability Verification
Semiconductor Quality and Reliability Handbook
2.1 Fundamental Knowledge on Semiconductor Reliability
2.1.1 Measures for Representing Reliability
2.1.2 Distributions Used in Reliability Analysis
2.1.3 Semiconductor Device Failure Pattern
2.1.3.1 Semiconductor Device Failure Regions
2.1.3.2 Early Failures
2.1.3.3 Random Failures
2.1.3.4 Wear-out Failures
2.2 Semiconductor Reliability Verification
2.2.1 Basic Approach Toward Reliability Verification
2.2.1.1 Reliability Verification in the Development Stage
2.2.1.2 Reliability Verification in the Prototype Stage
2.2.1.3 Reliability Verification in the Mass Production Stage
2.2.2 Reliability in the Development and Design Stages
2.2.2.1 Time-Dependent Dielectric Breakdown (TDDB)
2.2.2.2 Hot carrier (HCI)
2.2.2.3 Negative Bias Temperature Instability (NBTI)
2.2.2.4 Soft Error
2.2.2.5 Electromigration
2.2.2.6 Stress Migration
2.3 Acceleration Model
2.3.1 Acceleration Models for Environmental Stress
2.3.2 Acceleration Models for Operating Stress
2.1 Fundamental Knowledge on Semiconductor Reliability
With recent advances in the systematization, functions and performance of equipment, the social impact and
damages produced by failures are increasing, and high reliability has come to be demanded of equipment. This
means that even higher reliability is demanded of the individual components that comprise equipment.
Large quantities of semiconductors are used in a single piece of equipment, and these semiconductors often
handle the main functions of that equipment, so high reliability is extremely important. Semiconductors themselves
are also becoming more miniaturized and highly integrated, with larger-scale circuit configurations. In addition, as
semiconductor functions and performance advance and evolve into system LSIs, ensuring semiconductor reliability
has become a vital matter.
The reliability measures, distribution functions, trends in failure rates over time, and failure regions needed to
discuss semiconductor reliability are described below.
2.1.1 Measures for Representing Reliability
JIS Z 8115 (Reliability Terminology) defines reliability as “The property of an item which enables it to fulfill its
required functions for the prescribed period under the given conditions.” Therefore, reliability includes the concept
of time, and reliability measures are functions of time.
(1) Reliability Function (Reliability): R(t)
Reliability indicates the probability of functioning correctly without failure until time t.
When n samples are used under the same conditions, if the number of failures occurring until time t has
elapsed is expressed as r(t), then the reliability R(t) is expressed by the following equation.
$$R(t) = \frac{n - r(t)}{n}$$ ・・・・Eq. 2.1.1
(2) Failure Distribution Function (Unreliability): F(t)
This indicates the probability of failure occurring until time t, and is expressed by the following equation.
$$F(t) = \frac{r(t)}{n}$$ ・・・・Eq. 2.1.2
In addition, the following relationship is established between unreliability F(t) and reliability R(t).
$$R(t) + F(t) = 1$$ ・・・・Eq. 2.1.3
As shown in Fig. 2-1, R(t) decreases from 1 over time, while conversely F(t) increases from 0 toward 1 over
time. Note that the distribution functions described hereafter are used as the failure distribution functions of
semiconductor devices.
Fig. 2-1 Relationship between F(t) and R(t)
(3) Failure Density Function: f(t)
This represents the probability of failure occurring per unit time when time t has elapsed.
$$f(t) = \frac{dF(t)}{dt} = -\frac{dR(t)}{dt}$$ ・・・・Eq. 2.1.4
(4) Failure Rate Function: λ(t)
This represents the probability of failure occurring in the next unit time for samples that have not yet failed
when time t has elapsed.
$$\lambda(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{R(t)}$$ ・・・・Eq. 2.1.5
The failure rate function is also called the instantaneous failure rate, and is calculated from the failure
distribution function F(t) using Equations 2.1.4 and 2.1.5. Failure In Time (FIT: number of failures per billion
(10⁹) total operating hours) is generally used as the unit for semiconductor devices.
Note that when the F(t) of the subject product is not known, the average failure rate obtained by the following
equation is used.
Average failure rate ≡ Total number of failures during the period / Total operating time during the period
・・・Eq. 2.1.6
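As a rough illustration of Eq. 2.1.1, Eq. 2.1.2 and Eq. 2.1.6, the following Python sketch computes R(t), F(t) and the average failure rate in FIT from test counts. The function names and the sample figures (1,000 devices, 1,000 hours, 3 failures) are assumptions chosen for illustration and are not values from this handbook.

```python
def empirical_reliability(n, r_t):
    """Return (R, F) at time t for n samples with r_t failures (Eq. 2.1.1, Eq. 2.1.2)."""
    if n <= 0 or not 0 <= r_t <= n:
        raise ValueError("need n > 0 and 0 <= r_t <= n")
    unreliability = r_t / n
    return 1.0 - unreliability, unreliability


def average_failure_rate_fit(failures, device_hours):
    """Average failure rate (Eq. 2.1.6) expressed in FIT (failures per 1e9 device-hours)."""
    if device_hours <= 0:
        raise ValueError("device_hours must be positive")
    return failures / device_hours * 1e9


# Hypothetical example: 1,000 devices operated for 1,000 h with 3 failures.
# Total operating time is approximated as n * t, ignoring hours lost by failed units.
R, F = empirical_reliability(1000, 3)
fit = average_failure_rate_fit(3, 1000 * 1000)
print(R, F, fit)
```

Because FIT is scaled to 10⁹ device-hours, even a few failures in a test of this size correspond to a failure rate in the thousands of FIT, which is one reason large sample quantities or acceleration are needed to demonstrate low failure rates.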
[Supplement]
In addition to the failure rate defined above, the cumulative failure rate after a set equipped with the
semiconductor device has operated for the specified time in the market is sometimes used in the early failure
region described hereafter. Unless otherwise requested by the customer, the Sony Semiconductor Business
Unit also uses the cumulative failure rate after one year as the early failure rate.
In addition, after the early failure region, most semiconductor devices do not reach wear-out failure (genuine
failure) in the actual operating environment, and the failure rate exhibits the constant value of the random
failure region. This value becomes the same as that obtained by Equation 2.1.6, so the average failure rate can
be said to essentially be the failure rate after the early failure region.
(5) Mean Time To Failure: MTTF
The Mean Time To Failure (MTTF) of an item such as a semiconductor device that is not subject to repair or
maintenance is expressed by the following equation.
$$\mathrm{MTTF} = \int_{0}^{\infty} t\, f(t)\, dt$$ ・・・Eq. 2.1.7
2.1.2 Distributions Used in Reliability Analysis
Typical distribution functions used to analyze reliability data of semiconductor devices are described below.
(1) Normal distribution
The normal distribution is a typical continuous distribution used for quality control. It is said that in
reliability analysis, the normal distribution is often applied to wear-out life where failures concentrate around a
certain time.
The probability density function f(t) and distribution function F(t) are expressed by the following equations.
$$f(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(t-\mu)^2}{2\sigma^2}\right] \quad (-\infty < t < \infty)$$ ・・・・Eq. 2.1.8
$$F(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{t} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] dx \quad (-\infty < t < \infty)$$ ・・・・Eq. 2.1.9
This distribution is given by the mean parameter μ and the standard deviation parameter σ (variance σ²).
As shown in Fig. 2-2 below, the normal distribution has a symmetrical bell shape centering on μ, and the
probability of the value t being contained within the range of ±σ, ±2σ and ±3σ to both sides of μ is 68.27%,
95.45% and 99.73%, respectively.
Fig. 2-2 Normal Distribution
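The coverage probabilities for ±σ, ±2σ and ±3σ can be checked numerically from Eq. 2.1.9. The sketch below uses the error function from the Python standard library; the function names are illustrative assumptions.

```python
import math


def normal_cdf(t, mu, sigma):
    """F(t) of Eq. 2.1.9, evaluated with the error function."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))


def coverage(k, mu=0.0, sigma=1.0):
    """Probability that t lies within mu - k*sigma and mu + k*sigma."""
    return normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)


for k in (1, 2, 3):
    print(k, round(100.0 * coverage(k), 2))
```

The result does not depend on the particular μ and σ, since the interval is expressed in multiples of σ around μ.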
(2) Exponential distribution
The exponential distribution represents the life distribution (failure distribution function) in the random
failure region where the failure rate λ is constant over time, and the probability density function f(t) and
distribution function F(t) are expressed by the following equations. This distribution corresponds to the case
when the shape parameter m = 1 in the Weibull distribution described hereafter.
$$f(t) = \lambda e^{-\lambda t}$$ ・・・・Eq. 2.1.10
$$F(t) = 1 - e^{-\lambda t}$$ ・・・・Eq. 2.1.11
Fig. 2-3 Exponential Distribution
Note that as shown in the following equation, the MTTF is given from t0, which is the inverse of the failure
rate λ.
$$\frac{1}{\lambda} = t_0 = \mathrm{MTTF}$$ ・・・・Eq. 2.1.12
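As a minimal sketch of how a constant failure rate translates into MTTF and cumulative failures, the following Python code uses the exponential distribution with F(t) = 1 − exp(−λt) and R(t) = exp(−λt). The 100 FIT failure rate and the ten-year period are hypothetical figures, and the function names are assumptions.

```python
import math


def exp_unreliability(t, lam):
    """F(t) = 1 - exp(-lam * t) for a constant failure rate lam (per hour)."""
    return 1.0 - math.exp(-lam * t)


def exp_reliability(t, lam):
    """R(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


def mttf_from_fit(fit):
    """MTTF in hours, i.e. 1 / lam with lam = fit * 1e-9 per hour (Eq. 2.1.12)."""
    return 1.0 / (fit * 1e-9)


lam = 100 * 1e-9            # hypothetical failure rate of 100 FIT, in failures per hour
mttf = mttf_from_fit(100)   # about 1e7 hours
ten_years = 10 * 365 * 24   # operating hours in ten years of continuous use
print(mttf, exp_unreliability(ten_years, lam))
```

Note that at t = MTTF the reliability is exp(−1), about 37%, so the MTTF is not the time by which most devices are still expected to be working.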
(3) Logarithmic normal distribution
The logarithmic normal distribution is a distribution function where ln t, which is the logarithm of the life
time t, follows the above-mentioned normal distribution.
The probability density function f(t) and distribution function F(t) are expressed by the following equations.
$$f(t) = \frac{1}{\sqrt{2\pi}\,\sigma t} \exp\left[-\frac{1}{2}\left(\frac{\ln t - \mu}{\sigma}\right)^{2}\right] \quad (0 < t < \infty)$$ ・・・・Eq. 2.1.13
$$F(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{0}^{t} \frac{1}{x} \exp\left[-\frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^{2}\right] dx$$ ・・・・Eq. 2.1.14
Fig. 2-4 Logarithmic Normal Distribution
In semiconductor device reliability, the electromigration life is generally known to follow a logarithmic
normal distribution.
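Since ln t is normally distributed, the logarithmic normal distribution function of Eq. 2.1.14 can be evaluated with the same error function used for the normal distribution. The sketch below assumes a hypothetical median life of 1,000 hours and σ = 0.5; these figures and the function names are illustrative only.

```python
import math


def lognormal_cdf(t, mu, sigma):
    """F(t) of Eq. 2.1.14: ln t is normal with mean mu and standard deviation sigma."""
    if t <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def lognormal_median(mu):
    """Median life t50, the time at which F(t) = 0.5."""
    return math.exp(mu)


mu = math.log(1000.0)   # hypothetical: median life of 1,000 h
sigma = 0.5             # hypothetical spread of ln t
print(lognormal_median(mu), lognormal_cdf(100.0, mu, sigma))
```

In this parameterization μ and σ describe ln t rather than t itself, so the median life is exp(μ) and a smaller σ means failures concentrate more tightly around it.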
(4) Weibull distribution
The Weibull distribution is a weakest link model proposed by W. Weibull (Sweden) in 1939 as a mechanical
breakdown strength distribution. This model was applied by J. H. K. Kao in 1955 to analyze the life of vacuum
tubes, and has often been used since then to model life distributions in analysis of semiconductor device
reliability.
The probability density function f(t) and distribution function F(t) are expressed by the following equations.
$$f(t) = \frac{m}{\eta}\left(\frac{t-\gamma}{\eta}\right)^{m-1} \exp\left[-\left(\frac{t-\gamma}{\eta}\right)^{m}\right]$$ ・・・・Eq. 2.1.15
$$F(t) = 1 - \exp\left[-\left(\frac{t-\gamma}{\eta}\right)^{m}\right]$$ ・・・・Eq. 2.1.16
Fig. 2-5 Weibull Distribution
Here, m is called the shape parameter, η the scale parameter (characteristic life), and γ the location
parameter.
In addition, assuming $t_0 = \eta^{m}$, the failure rate λ(t) is expressed by the following equation.
$$\lambda(t) = \frac{m}{\eta}\left(\frac{t-\gamma}{\eta}\right)^{m-1} = \frac{m}{t_0}\,(t-\gamma)^{m-1}$$ ・・・・Eq. 2.1.17
The following information concerning the failure pattern can be obtained from the value of the shape
parameter m.
0 < m < 1: Early failure (DFR) pattern where the failure rate decreases over time
m = 1: Random failure (CFR) pattern where the failure rate is constant (matches with the exponential
distribution)
m > 1: Wear-out failure (IFR) pattern where the failure rate increases over time
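The relationship between m and the failure pattern can be illustrated with a short Python sketch of the Weibull failure rate of Eq. 2.1.17. The function names and the parameter values used in the example are assumptions for illustration.

```python
def weibull_failure_rate(t, m, eta, gamma=0.0):
    """lambda(t) = (m / eta) * ((t - gamma) / eta) ** (m - 1), valid for t > gamma."""
    if t <= gamma:
        raise ValueError("t must exceed the location parameter gamma")
    return (m / eta) * ((t - gamma) / eta) ** (m - 1)


def failure_pattern(m):
    """Classify the failure pattern from the Weibull shape parameter m."""
    if m <= 0:
        raise ValueError("m must be positive")
    if m < 1:
        return "early failure (DFR)"
    if m == 1:
        return "random failure (CFR)"
    return "wear-out failure (IFR)"


# Hypothetical characteristic life of 1,000 h, evaluated at 10 h and 100 h.
for m in (0.5, 1.0, 2.0):
    early = weibull_failure_rate(10.0, m, 1000.0)
    late = weibull_failure_rate(100.0, m, 1000.0)
    print(m, failure_pattern(m), early, late)
```

For m = 1 the exponent m − 1 is zero, so the failure rate reduces to the constant 1/η, which is the exponential distribution with λ = 1/η.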
2.1.3 Semiconductor Device Failure Pattern
2.1.3.1 Semiconductor Device Failure Regions
Like general electronic equipment, semiconductor device failure regions are classified into the three types of
early, random and wear-out failure regions, and the time-dependent trend in the failure rate creates a curve called a
bathtub curve as shown in Fig. 2-6.
This curve is the sum of the early failure rate which decreases steadily over time, the random failure rate which
exhibits a constant value, and the wear-out failure rate which increases steadily over time. However, in the case of
semiconductor devices, the random failure rate is thought to consist only of a small contribution from soft errors as
described hereafter, and the failure rate in the random failure region (the height of the bottom of the bathtub) can be
said to be dominated by the sum of the failure rates of the region where the early failure rate converges towards a
constant value and the region where the wear-out failure rate begins to rise.
Fig. 2-6 Time-Dependent Change in Semiconductor Device Failure Rate
(Vertical axis: failure rate. Horizontal axis: operating time, divided into the early failure region, random failure region and wear-out failure region. Curves: early failure rate, random failure rate, wear-out failure rate. Annotations: product shipped; life (useful years).)
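The bathtub curve can be sketched as the sum of a decreasing early failure rate, a constant random failure rate and an increasing wear-out failure rate. In the Python sketch below, the early and wear-out components are modeled as Weibull failure rates with m < 1 and m > 1; this modeling choice and all parameter values are assumptions made for illustration, not data from this handbook.

```python
def weibull_rate(t, m, eta):
    """Weibull failure rate with location parameter 0."""
    return (m / eta) * (t / eta) ** (m - 1)


def bathtub_rate(t, early=(0.5, 1e7), random_rate=1e-9, wearout=(4.0, 3e5)):
    """Total failure rate (per hour) as the sum of the three components.

    early and wearout are (m, eta) pairs; all defaults are hypothetical.
    """
    return weibull_rate(t, *early) + random_rate + weibull_rate(t, *wearout)


for hours in (10.0, 1e2, 1e3, 1e4, 1e5, 1e6):
    print(int(hours), bathtub_rate(hours))
```

With these hypothetical parameters the total rate falls over the first ten thousand hours and rises again once the wear-out term dominates, reproducing the qualitative shape of the curve.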
2.1.3.2 Early Failures
The failure rate in the early failure period is called the early failure rate (EFR), and the failure rate
monotonically decreases over time. The vast majority of semiconductor device early failures are caused by defects
built into devices mainly in the wafer process. The most common causes of these defects are dust adhering to
wafers in the wafer process and crystal defects in the gate oxide film or the silicon substrate, etc. Most devices
containing defects rooted in the manufacturing process fail within the manufacturing process and are eliminated as
defective in the final sorting process. However, a certain percentage of devices with relatively insignificant defects
may not have failed when making the final measurements and may be shipped as passing products. These types of
devices that are inherently defective from the start often fail when stress (voltage, temperature, etc.) is applied for a
relatively short period, and exhibit a high failure rate in a short time within the customer’s mounting process or in
the initial stages after being shipped as products. However, these inherently defective devices fail and are
eliminated over time, so the rate at which early failures occur decreases.
This property of semiconductor devices where the failure rate decreases over time can be used to perform
screening known as “burn-in,” where stress is applied for a short time in the stage before shipping to eliminate
devices containing initial defects. Product groups from which devices with inherent initial defects have been
removed to a certain degree by burn-in not only improve the early failure rate in the market, but also make it
possible to maintain high quality over a long period as long as these products do not enter the wear-out failure
region.
An overview of burn-in is described below.
(1) Derivation of failure distribution function of early failure period
In order to determine the burn-in conditions for reliably removing devices with inherent early failures, it is
necessary to obtain the failure distribution function of the early failure period.
To obtain this function, highly accelerated life tests are performed in a short time using a sample quantity on
a scale that is certain to contain devices with inherent initial defects (normally several thousand to ten thousand
pieces). The obtained failure time data is then plotted on Weibull probability paper and the failure distribution
function is estimated from the resulting regression line.
Fig. 2-7 shows an example of this process. The shape parameter m and the characteristic life η that determine
the Weibull distribution in the following equation can be obtained from the linear regression.
$$F(t) = 1 - \exp\left[-\left(\frac{t}{\eta}\right)^{m}\right]$$ ・・・・Eq. 2.1.18
This method of obtaining the failure distribution function is called a burn-in study.
Fig. 2-7 Weibull Plot of the Burn-in Study
Note) Weibull probability paper is scaled to display linear regression of failure times that follow a Weibull
distribution.
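Plotting on Weibull probability paper is equivalent to a straight-line fit of ln(−ln(1 − F)) against ln t, because Eq. 2.1.18 can be rewritten as ln(−ln(1 − F(t))) = m·ln t − m·ln η. The following Python sketch performs that least-squares fit. The plotting position i/(n + 1) for the i-th failure, the function name and the sample data are assumptions for illustration; an actual burn-in study may use a different estimator.

```python
import math


def fit_weibull(failure_times, n_samples):
    """Fit a line on Weibull axes and return (m, eta).

    failure_times: observed failure times in hours; units without failure are censored.
    F at the i-th ordered failure is estimated as i / (n_samples + 1) (an assumption).
    """
    times = sorted(failure_times)
    if len(times) < 2 or times[0] <= 0 or times[0] == times[-1]:
        raise ValueError("need at least two distinct positive failure times")
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - (i + 1) / (n_samples + 1))) for i in range(len(times))]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx                          # slope of the regression line
    eta = math.exp(x_mean - y_mean / m)    # from intercept = -m * ln(eta)
    return m, eta


# Hypothetical burn-in study: 5 failures observed among 5,000 samples.
m, eta = fit_weibull([2.0, 10.0, 45.0, 160.0, 400.0], 5000)
print(m, eta)
```

A fitted m below 1 corresponds to the decreasing failure rate expected in the early failure period.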
(2) Determining the burn-in conditions
The screening (burn-in) conditions required to reduce the early failure rate after shipment (Note 1) to the
target value can be determined using the failure distribution function F(t) obtained from the burn-in study.
Labeling the burn-in time as t0 and the acceleration factor between the burn-in conditions and the market
environment as K, the cumulative early failure rate that can be eliminated by burn-in is given as F(K·t0), and
the new cumulative early failure rate up to time t after burn-in, written here as F_BI(t) to distinguish it from the
distribution F obtained in the burn-in study, can be obtained by the following formula.
$$F_{\mathrm{BI}}(t) = F(K \cdot t_0 + t) - F(K \cdot t_0)$$ ・・・・Eq. 2.1.19
This relationship can be expressed in graph form as shown in Fig. 2-8.
The burn-in conditions are selected according to the combination of the acceleration conditions and time that
will reduce this value to the target early failure rate or lower. Normally, initial defects that are the cause of
early failures occur at the highest rate in the initial stages of process development, and then decrease thereafter
due to process improvements and process mastery. The early failure rate decreases in proportion to these initial
defects, so the burn-in time is reviewed as appropriate in accordance with process improvements.
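The selection of burn-in conditions from Eq. 2.1.19 can be sketched as a simple search over burn-in times. The Python code below assumes that the early failure distribution is a Weibull distribution with location parameter 0 and that time is expressed in market-equivalent hours. The Weibull parameters, the acceleration factor K = 100, the one-year period and the target value are all hypothetical, as are the function names.

```python
import math


def weibull_cdf(t, m, eta):
    """F(t) = 1 - exp(-(t / eta) ** m) for t > 0, else 0."""
    return 1.0 - math.exp(-((t / eta) ** m)) if t > 0 else 0.0


def post_burn_in_failure(t, burn_in_hours, k, m, eta):
    """Eq. 2.1.19: F(K*t0 + t) - F(K*t0), the cumulative early failure rate after burn-in."""
    shift = k * burn_in_hours
    return weibull_cdf(shift + t, m, eta) - weibull_cdf(shift, m, eta)


def min_burn_in_hours(target, t, k, m, eta, step=1.0, max_hours=1000.0):
    """Smallest burn-in time (a multiple of step) that meets the target, or None."""
    hours = 0.0
    while hours <= max_hours:
        if post_burn_in_failure(t, hours, k, m, eta) <= target:
            return hours
        hours += step
    return None


# Hypothetical inputs: m = 0.3, eta = 1e12 h, K = 100, one year = 8,760 h, target 0.2%.
m, eta, k, year = 0.3, 1e12, 100.0, 8760.0
print(post_burn_in_failure(year, 0.0, k, m, eta))    # without burn-in
print(post_burn_in_failure(year, 10.0, k, m, eta))   # after 10 h of burn-in
print(min_burn_in_hours(0.002, year, k, m, eta))
```

Because the early failure rate decreases over time (m < 1), lengthening the burn-in lowers the remaining cumulative early failure rate, but with diminishing returns; this is why the burn-in time is a trade-off to be reviewed as the process improves.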
Fig. 2-8 Early Failure Screening by Burn-in
(Curve: failure probability density function f(t) against time. Labeled regions: early failures eliminated by screening during burn-in, up to K·t0; cumulative early failure rate F(t) over time t after shipment.)
Note 1) The early failure rate described in this section is not the instantaneous failure rate but the
cumulative failure rate over the specified period. See the [Supplement] under “2.1.1 Measures for
Representing Reliability.”
2.1.3.3 Random Failures
When devices containing initial defects have been eliminated to a certain degree, the early failure rate becomes
extremely small, and the failure rate exhibits a gradually declining curve over time. In this state, the failure
distribution is close to an exponential distribution, and this is called the random failure period. The semiconductor
device failure rate during this period is an extremely small value compared to the early failure rate immediately
after shipment, and is normally a level that can be ignored for the most part. Viewed in terms of failure
mechanisms, there are extremely few semiconductor device failures that can be clearly defined as random failures.
However, memory soft errors and other phenomena caused by α rays and other high-energy particles are
sometimes classified as randomly occurring failure mechanisms.
When predicting semiconductor device failure rates, failures occurring sporadically after a certain long time has
passed since the start of operation and failures for which the failure cause could not be determined are treated as
random failures in some cases. However, most of these failures are thought to be devices containing relatively
insignificant initial defects (dust or crystal defects) that fail after a long time, and should essentially be positioned
on the early failure rate attenuation curve. This type of failure rate cannot be estimated from the results of tests
performed with few samples such as reliability tests. There are also phenomena such as ESD breakdown,
overvoltage (surge) breakdown (EOS) and latch-up that occur at random according to the conditions of use.
However, these phenomena are all produced by the application of excessive stress over the device absolute
maximum ratings, so these are classified as breakdowns instead of failures, and are not included in the random
failure rate.
2.1.3.4 Wear-out Failures
Wear-out failures are failures rooted in the durability of the materials comprising semiconductor devices and the
transistors, metal lines, oxide films and other elements, and are an index for determining the device life (useful
years). In the wear-out failure region, the failure rate increases with time until ultimately all devices fail or suffer
characteristic defects.
The main wear-out failure mechanisms for semiconductor devices are as follows.
• Electromigration
• Hot carrier-induced characteristics fluctuation
• Time-dependent dielectric breakdown (TDDB)
• Laser diode luminance degradation
Semiconductor device life is defined as the time (or stress) at which the cumulative failure rate for the wear-out
failure mode reaches the prescribed value, and can be estimated using the results of reliability tests and test element
group (TEG) evaluation.
Semiconductor device life is often determined by the reliability of each element (metal lines, oxide film,
interlayer film, transistor, etc.) comprising the device, and these reliabilities are evaluated using TEG for each
element in the process development stage. These TEG evaluation results are incorporated into design rules in the
form of allowable stress limits (electric field strength, current density, etc.) to suppress wear-out failures in the
product stage and ensure long-term reliability. As a result, semiconductor devices experience almost no wear-out
failures within the reliability test time (stress) range in the product stage.
(1) Life estimation method
Semiconductor device life can be obtained as follows based on the wear-out failure data generated by TEG
evaluation and reliability tests. First, linear regression is performed for the time-dependent cumulative failure
rate using a Weibull probability distribution or logarithmic normal probability distribution, then the life is
obtained from the time (or stress) at which the reference cumulative failure rate is reached and the acceleration
factor of the accelerated test conditions (Fig. 2-9).
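The life estimation step can be sketched as follows for the Weibull case: the fitted distribution gives the test time at which the reference cumulative failure rate is reached, and multiplying by the acceleration factor converts it to the market environment. The function names, the reference value of 0.1% and the other numbers are hypothetical.

```python
import math


def weibull_time_to_fraction(p, m, eta):
    """Time at which F(t) reaches p: t_p = eta * (-ln(1 - p)) ** (1 / m)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be between 0 and 1")
    return eta * (-math.log(1.0 - p)) ** (1.0 / m)


def market_life(p, m, eta_test, acceleration_factor):
    """Life in the market environment for reference cumulative failure rate p."""
    return acceleration_factor * weibull_time_to_fraction(p, m, eta_test)


# Hypothetical fit from an accelerated test: m = 2.5, eta = 800 h, acceleration factor 500.
print(weibull_time_to_fraction(0.001, 2.5, 800.0))
print(market_life(0.001, 2.5, 800.0, 500.0))
```

This treats the acceleration factor as a pure scaling of the time axis, which corresponds to shifting the regression line horizontally on the probability plot without changing its slope.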
Fig. 2-9 Failure Rate Prediction Method Using Weibull Probability Plotting Paper
(Vertical axis: F(t) (%), 0.1 to 99.9. Horizontal axis: time (h), 10 to 100,000. Lines: acceleration test failure rate and predicted market environment failure rate, separated along the time axis by the acceleration factor.)
2.2 Semiconductor Reliability Verification
2.2.1 Basic Approach To Reliability Verification
The Sony Semiconductor Business Unit performs reliability verification that takes into account semiconductor
device failure modes (see Fig. 2-10) in each stage from process development through mass production.
Fig. 2-10 Semiconductor Device Failure Rate Curve
(Vertical axis: failure rate. Horizontal axis: operating time. Labels: early failure mode (extrinsic failures), wear-out failure mode (intrinsic failures), new process, after burn-in.)
2.2.1.1 Reliability Verification in the Development Stage
The failure time due to wear-out failure (intrinsic failure) of semiconductor devices, that is to say the life, is
determined by the failure mechanisms of the process elements described in 2.2.2.
Reliability is evaluated in the process development stage using test element groups (TEG) suitable for verifying
these failure mechanisms to confirm that the prescribed reliability is satisfied.
2.2.1.2 Reliability Verification in the Prototype Stage
(1) Reliability verification for wear-out failures (Intrinsic failures)
Reliability is evaluated over long times using small quantities of prototypes to verify that wear-out failures
do not occur in the assumed operating environments and operating periods. (See Table 2-1.)
(2) Reliability verification for early failures (Extrinsic failures)
Semiconductor devices tend to have a high failure rate at the start of operation, and this failure rate tends to
decrease steadily over time. This is because a certain percentage of semiconductor devices have inherent
manufacturing defects such as dust, causing these devices to fail. This tendency is more noticeable for new
processes, so burn-in studies are performed when a process is introduced into production to verify the early failure rate.
When the prescribed failure rate is not satisfied, burn-in and other screening methods are used to remove
semiconductor devices with inherent manufacturing defects.
The Sony Semiconductor Business Unit continuously executes activities to stabilize and improve processes,
and strives to reduce the number of semiconductor devices with inherent manufacturing defects so that
prescribed early failure rates can be satisfied without the need to perform burn-in.
2.2.1.3 Reliability Verification in the Mass Production Stage
Mass production items are sampled* and reliability is periodically evaluated at the product level corresponding
to (1) above to confirm that the wear-out failure reliability level built in at the development stage is continuously
maintained from mass production onward.
* Samples are taken from each product family in consideration of combinations of wafer process, assembly
process, factory, and other factors.
Table 2-1 shows typical LSI product reliability test items used by the Sony Semiconductor Business Unit.
Table 2-1 Typical Sony LSI Product Reliability Test Items
Name of test | Code | Test conditions
High Temperature Operating Life | HTOL | Tj ≥ 125°C, Vop_max, 1000 h
Low Temperature Operating Life | LTOL | Ta = -55°C, Vop_max, 1000 h
Temperature Humidity Bias | THB | Ta = 85°C, 85% RH, Vop_max On/Off, 1000 h
High Temperature Storage | HTS | Ta = 150°C, 1000 h
Temperature Cycling | TC | Ts = -65 to 125°C, 700 cycles; Ts = -40 to 125°C, 850 cycles; Ts = -65 to 150°C, 500 cycles
Moisture Sensitivity Level | MSL | Level 3 (standard rank) (J-STD-020)
Electrostatic Discharge Human Body Model (HBM) | ESD HBM | C = 100 pF, R = 1500 Ω (JS-001-2014)
Electrostatic Discharge Charged Device Model (CDM) | ESD CDM | Charged Device Model (JESD22-C101)
Latch-Up Trigger Pulse Current Injection Method | LU I-Test | Trigger pulse current injection method (JESD78)
Latch-Up Supply Overvoltage Method | LU V-Test | Power supply overvoltage method; Ta = 25°C, 125°C (JESD78)
Burn-In Study (Early Life Failure Rate) | BIS (ELFR) | Tj ≥ 125°C, Vop_max
2.2.2 Reliability in the Development and Design Stages
Semiconductor devices have failure mechanisms unique to semiconductors, and resolving these problems in the
process development stage is an important element for securing reliability. Stable product reliability can be
secured by verifying the required reliability when developing each process element and reflecting these results in
the design rules.
Table 2-2 shows typical failure mechanisms that can pose problems in the process development stage. As
processes become more miniaturized, higher internal electric fields, current densities, metal line stress and other
factors increase the stress applied to transistors and metal lines. On the other hand, faster circuit speeds and
increased parasitic impedance (metal line resistance, parasitic capacitance) reduce operating margins, which is a
major issue in securing reliability with respect to transistor characteristics fluctuation.
Typical semiconductor device failure mechanisms that can pose problems in the process development and
design stages are described below.
Table 2-2 Typical Failure Mechanisms in the Process Development Stage
Process element | Failure mechanism | Failure mode and cause
Gate dielectric film | Time-dependent dielectric breakdown (TDDB) | Dielectric breakdown of the gate dielectric film. This is the phenomenon where bias applied to a gate electrode for a long time produces defects in the gate dielectric film, increasing the micro leak current and leading to dielectric breakdown.
Transistor | Hot carrier (HCI) | Transistor characteristics fluctuation due to trapping of hot carriers in the gate dielectric film. This is the phenomenon where high-energy electrons and holes generated by impact ionization of electrons accelerated by high electric fields are trapped in the oxide film, causing the transistor characteristics to fluctuate.
Transistor | NBTI (slow trap) | PMOS transistor characteristics fluctuation due to application of a gate negative bias (NBT). This is also called the slow trap phenomenon, and is the phenomenon where application of a bias at high temperatures increases the interface state and positive fixed charge, causing the transistor characteristics to fluctuate.
Memory device | Soft error | Memory data rewrite error due to high-energy cosmic ray particles (neutron rays, proton rays, etc.), α rays, etc. This is a temporary data error phenomenon that occurs mainly in DRAM and SRAM.
Memory device | Retention/disturb | Non-volatile memory data loss. This is the phenomenon where long-term storage or operating environment stress (read/write electric field, temperature, stress) causes the trapped charge in a Flash memory to disappear, inverting the data.
Metal lines | Electromigration | Increased metal line resistance and disconnection due to voids forming in metal lines. This is the phenomenon where physical impacts between electrons and metal atoms cause the metal atoms to move, creating voids.
Metal lines | Stress migration | The metal creep phenomenon due to metal line stress causes voids to form and grow in metal lines and connection (via hole) portions, resulting in open defects. In copper lines, this is the phenomenon where vacancies (atom holes) in copper lines due to metal line stress induce the creep phenomenon, causing voids to form and grow.
Low-k interlayer films | TDDB between metal lines | Short-circuit due to dielectric breakdown between copper lines. This phenomenon mainly consists of dielectric breakdown via the CMP interface of an interlayer dielectric film that uses low-k materials, resulting in a short-circuit between metal lines.
2.2.2.1 Time-dependent Dielectric Breakdown (TDDB)
MOS FET gate dielectric film has a failure mechanism whereby applying an electric field for a long time, even one at or
below the dielectric withstand voltage, causes the dielectric film to deteriorate and eventually break down. This
breakdown of the dielectric film over time is called time-dependent dielectric breakdown (TDDB). The TDDB life
of gate dielectric film is one of the most important failure mechanisms determining the long-term reliability of a
MOS-type semiconductor device. The TDDB life is said to be the factor that determines the limit for reducing the
gate dielectric film thickness, and the gate dielectric film thickness in system LSI is also sometimes determined by
the TDDB life in accordance with the logic circuit supply voltage.
(1) Gate dielectric film life distribution
Time-dependent dielectric film breakdown phenomena can generally be divided into an initial breakdown
area rooted in defects and a genuine life area. Fig. 2-11 shows the TDDB measurement data of a gate oxide
film (SiO2) plotted using a Weibull distribution function. The initial breakdown and genuine life areas can be
separated according to differences in the shape parameter (graph slope) of the Weibull distribution function.
Dielectric film distributed in the initial breakdown area with a short TDDB life is oxide film that includes
defects that may fail in a short time in the market, so it is important to suppress the defect occurrence rate to
lower the early failure rate.
In contrast to this, the genuine life area indicates the natural life of gate dielectric film that does not
include major defects, and is a necessary index for assuring long-term reliability. The genuine life at the actual
operating voltage can be predicted using an electric field acceleration model from the evaluation results of
TDDB accelerated by high electric field stress conditions. The electric field acceleration model uses the E-model
(τ ∝ exp(−βE), where β is a field acceleration coefficient), the power-law model (τ ∝ E^(−n)) and other models
according to the film thickness and film type.
(See Fig. 2-12.)
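The extrapolation from test fields to the actual operating field can be sketched for the two models named above. The Python code below assumes an E-model in which ln τ is linear in E and a power-law model in which ln τ is linear in ln E, each determined from two test points. The function names, the two-point fit and the sample values are assumptions for illustration; in practice the coefficients are obtained by regression over many measurements.

```python
import math


def extrapolate_e_model(e1, tau1, e2, tau2, e_use):
    """E-model: tau proportional to exp(-beta * E). Returns the predicted tau at e_use."""
    beta = (math.log(tau1) - math.log(tau2)) / (e2 - e1)   # field acceleration coefficient
    return tau1 * math.exp(beta * (e1 - e_use))


def extrapolate_power_law(e1, tau1, e2, tau2, e_use):
    """Power-law model: tau proportional to E ** (-n). Returns the predicted tau at e_use."""
    n = (math.log(tau1) - math.log(tau2)) / (math.log(e2) - math.log(e1))
    return tau1 * (e1 / e_use) ** n


# Hypothetical test results: life of 100 h at 10 MV/cm and 1 h at 12 MV/cm,
# extrapolated to an operating field of 5 MV/cm.
print(extrapolate_e_model(10.0, 100.0, 12.0, 1.0, 5.0))
print(extrapolate_power_law(10.0, 100.0, 12.0, 1.0, 5.0))
```

When both models are fitted to the same two test points and extrapolated to a lower field, the E-model gives the shorter predicted life of the two, so the choice of model has a large effect on the predicted life at the operating voltage.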
(2) Gate dielectric film breakdown mechanism
Gate dielectric film contains a large number of micro defects and impurities that occur in the wafer process,
and micro leak currents flow via these defects even in the state where the applied electric field (supply voltage)
is less than the genuine withstand voltage. These leak currents generate new defects in the dielectric film over
time, and the accumulation of these defects leads to dielectric film breakdown.
The percolation model is a typical failure mechanism for TDDB breakdown of thin gate dielectric film. In
this failure model, when defects initially present in the gate dielectric film and new defects generated by tunnel
current flowing due to the application of electric fields are continuous in the thickness direction, this leads to
dielectric breakdown. (See Fig. 2-13.)
As the gate dielectric film becomes thinner, fewer defects are needed to form the continuous defect path required for
dielectric breakdown, so the TDDB life variance increases. In addition, data written in Flash memories can also
be lost (the retention phenomenon) due to micro leak currents prior to breakdown.
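The effect of film thickness on the spread of breakdown times can be illustrated with a toy Monte Carlo version of the percolation model. This is a simplified sketch under stated assumptions, not the model used to produce Fig. 2-13: the film is divided into independent columns, each cell acquires a defect at an exponentially distributed time, and breakdown occurs when every cell in any one column is defective. The column count, trial count and function names are hypothetical.

```python
import random
import statistics

def breakdown_time(layers, columns, rng):
    """Time at which some column first has a defect in every layer.

    Each cell acquires a defect at an independent exponential time
    (unit generation rate). A column forms a continuous path through
    the film once its slowest cell is defective, and the film breaks
    down at the earliest such column.
    """
    return min(
        max(rng.expovariate(1.0) for _ in range(layers))
        for _ in range(columns)
    )

def relative_spread(layers, columns=100, trials=500, seed=0):
    """Standard deviation of the breakdown time divided by its mean."""
    rng = random.Random(seed)
    times = [breakdown_time(layers, columns, rng) for _ in range(trials)]
    return statistics.stdev(times) / statistics.mean(times)

spread_thin = relative_spread(layers=1)   # one defect spans the film
spread_thick = relative_spread(layers=3)  # three stacked defects needed
print(f"relative spread, thin: {spread_thin:.2f}, thick: {spread_thick:.2f}")
```

In this toy model the thinner film shows the larger spread relative to its mean life, which corresponds to a shallower slope on a Weibull plot.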
Fig. 2-11 TDDB Data Distribution (Weibull) (plot labels: genuine life distribution; oxide film that includes defects (early failure area))
Fig. 2-12 Electric Field Acceleration Model and Life Prediction (EFIELD: actual electric field; ETEST: test electric field)
Fig. 2-13 Gate Dielectric Film Breakdown Model (Percolation Model) (panels: (a) initial stage; (b) defect generated by micro leak current; (c) breakdown occurs)
2.2.2.2 Hot carrier (HCI)
Hot carrier is a failure mechanism where a charge (carrier) that has attained high energy mainly due to
acceleration by the electric field inside the MOS FET becomes trapped in the gate dielectric film, causing the
transistor characteristics to fluctuate and resulting in a circuit operation error. In a general operating environment,
the greatest transistor deterioration is caused by Drain Avalanche Hot Carrier (DAHC) injection, which occurs
when electrons flowing along an NMOS FET channel are accelerated by the high electric field near the drain. On the
other hand, the hot carrier mechanism that injects a charge to the dielectric film is also used to write and erase data
in a non-volatile memory.
(1) Drain Avalanche Hot Carrier (DAHC) injection
Electrons flowing in an NMOS FET channel are accelerated by the high electric field near the drain and undergo
impact ionization, generating electron-hole pairs. Of the electron or the hole, the carrier with the higher energy
(hot carrier) is injected to and trapped by the gate dielectric film, causing the transistor characteristics to
fluctuate (threshold value fluctuation, drop in drain current, etc.). This is called Drain Avalanche Hot Carrier
(DAHC) injection. (See Fig. 2-14.)
The dominant DAHC injection mode in an NMOS FET is electron injection, and the maximum
deterioration occurs under the condition where the gate voltage is approximately 1/2 • VDS. This means that in a
CMOS circuit, hot electron injection occurs when the signal is inverted (H→L/L→H), so deterioration
progresses as the circuit is operated.
This problem can be avoided by selecting operating conditions (voltage, duty) in the circuit design stage
under which hot carriers are not easily generated, and reliability can also be increased by providing circuits
with the required operating margin. Device countermeasures are also taken, such as adopting a device structure
(LDD structure) that suppresses hot carrier generation by reducing the electric field around drains.
Fig. 2-14 DAHC Mechanism (labels: electron, hole, gate, source, drain)
2.2.2.3 Negative Bias Temperature Instability (NBTI)
Negative bias temperature instability (NBTI) is the phenomenon where transistor characteristics
fluctuate when a negative gate bias is applied to a PMOS FET. This is one of the transistor deterioration
mechanisms known as slow trap. In the latest MOS processes, the use of surface channel-type PMOS FETs causes deterioration to increase, making NBTI a transistor reliability problem on a level with hot carriers.
(1) NBTI deterioration mechanisms
When a negative bias is applied to a PMOS FET, the holes on the Si surface are trapped by the Si-H bond of
the Si-SiO2 interface, and the hydrogen (H) is disassociated from the Si-H bond and generates an interface state.
The hydrogen disassociated from the Si bond diffuses and is trapped within the gate dielectric film, generating
a positive fixed charge that promotes deterioration of the transistor characteristics.
Si ≡ Si-H + hole → Si ≡ Si-・+ + H
H + H → H2
The interface state generated at the interface between the Si and the gate dielectric film traps the positive
charge when the PMOS FET operates, and becomes positively charged. This generates a positive fixed charge
in the dielectric film, and causes the transistor threshold voltage (Vth) to fluctuate and the drain current to drop.
One characteristic of NBTI is that when negative bias is applied to a gate, deterioration occurs regardless of
transistor operation, so deterioration proceeds even in circuits that are not operating. On the other hand, there is
also the phenomenon that fluctuating characteristics recover rapidly when negative bias stress is not applied,
and the amount of fluctuation in the operating state is known to be largely independent of the operating
frequency. In the process conditions, the amount of NBTI deterioration is closely related to the concentrations
and profile of the impurities (N, H, B, etc.) in the gate dielectric film, and the amount of deterioration increases
in particular for gate dielectric films (SiON, SiN) with high nitrogen (N) contents.
This problem can be avoided by design countermeasures such as providing sufficient margin for circuit
operation on account of transistor deterioration, and by reducing the electric fields applied to gate dielectric
film. Device countermeasures are also taken such as forming the gate dielectric film so that interface states and
fixed charges are not easily generated.
Fig. 2-15 NBTI Failure Mechanisms (diagram steps: Si-SiO2 interface terminated by hydrogen (H); negative bias applied, hole trapping by the tunnel phenomenon; disassociation of hydrogen (H) and generation of an interface state; diffusion to within the oxide film and generation of a positive fixed charge)
2.2.2.4 Soft Error
When α rays and high-energy neutron rays generated from cosmic rays, etc. penetrate memory elements and
other semiconductor devices, large quantities of electron-hole pairs are generated within the silicon crystals. These
charges invert the memory nodes, resulting in memory data errors known as the soft error phenomenon. The soft
error phenomenon temporarily inverts the memory and logic circuit data, and these errors can be recovered by
rewriting the data. This phenomenon was previously a problem for DRAM, but is currently also considered a
problem for SRAM reliability.
(1) Principle of soft error generation by α rays
The quartz materials used in the sealing resin packages of semiconductors contain trace amounts of
radioactive elements (uranium: 238U; thorium: 232Th). In addition, the lead bumps used in flip chips sometimes
contain polonium (210Po). When the high-energy α rays emitted by these radioactive elements penetrate the
silicon substrate, electron (e-) and hole (h+) pairs are generated along the α ray path inside the silicon. The
electric field causes electrons generated inside the depletion layers to migrate and collect in the n
diffusion area, which causes the potential of the memory node capacitance to drop. (See Fig. 2-16.)
Fig. 2-17 shows the soft error mechanisms in the SRAM memory cell. When the High side memory node
potential falls below the driver transistor threshold value, the two inverters forming a Flip-Flop both turn off at
the same time, making the Flip-Flop unstable and causing misoperation. Generally when the word line is
selected, the High side memory node potential (Vh) drops to Vcc - Vth (word transistor threshold value). When
the word line is not selected, the High side memory node is charged by the memory cell load and the potential
returns to Vcc. The faster this recovery time from Vcc - Vth to Vcc, that is to say the greater the current supply
capacity of the memory cell load, the more resistant the SRAM is to soft errors.
Countermeasures for soft errors caused by α rays include forming a protective film on the chip surface to
absorb α rays. In addition, countermeasures are also taken to reduce α ray emission levels such as by using
highly pure package materials with reduced levels of radioactive element contents.
Fig. 2-16 Generation of Electron and Hole Pairs by α Rays
Fig. 2-17 Soft Error in the SRAM Cell
(2) Soft errors due to cosmic rays
High-energy cosmic rays collide in the atmosphere with the atoms that comprise the atmosphere, generating
high-energy protons and neutrons. When these high-energy neutrons pass through silicon, electron-hole pairs
are generated along their range, and the neutrons collide with silicon atoms to generate secondary ions by
spallation reactions, which can cause soft errors. The quantity of high-energy neutrons generated by cosmic rays
that reaches the ground is known to increase in high-elevation regions due to differences in geographical
conditions and lower atmospheric shielding effects, and this causes the soft error occurrence rate to increase.
This can pose serious reliability problems in applications such as aircraft and satellites.
It is difficult to suppress factors causing soft errors due to cosmic rays, so this is known as a failure mode
that occurs at a certain probability. One countermeasure method for SRAM is to implement error correcting code
(ECC) so that data experiencing soft errors is corrected. In addition, device structures such as SOI structures
that are resistant to the effects of soft errors are also sometimes used.
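The ECC countermeasure can be illustrated with a Hamming(7,4) code, which corrects any single inverted bit in a 7-bit codeword. This is a minimal sketch for illustration only; the handbook does not state which code is used, and memory products typically use wider codes. The function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions 1..7 hold p1, p2, d1, p3, d2, d3, d4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(codeword):
    """Correct up to one inverted bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

stored = hamming74_encode([1, 0, 1, 1])
upset = list(stored)
upset[4] ^= 1  # a soft error inverts one stored bit
recovered = hamming74_decode(upset)
print(recovered)  # prints [1, 0, 1, 1]
```

A code of this kind corrects one inverted bit per codeword, so two upsets in the same codeword are not recovered by this sketch.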
2.2.2.5 Electromigration
Electromigration is a failure mechanism where electrons flowing through metal (Al, Cu) lines collide physically
with the metal atoms, causing the metal atoms to migrate and form voids in the metal lines which lead to increased
metal line resistance and disconnection. Electromigration is a key failure mechanism that determines the long-term
reliability of metal lines.
(1) Aluminum electromigration
The thin films used in aluminum (Al) lines are formed by sputtering, and the aluminum atoms accumulate in
a polycrystalline (grain) structure. (See Fig. 2-18.) When current of a certain density or more flows through
these metal lines, electromigration occurs: the metal atoms are physically moved by the force of
collisions between the electrons and the metal atoms. The metal atoms around the grain boundaries have
weak bonding energy and move easily, so electromigration occurring at the grain boundaries of metal lines
with uneven grain sizes causes voids to form and grow along the grain boundaries, leading to disconnection.
(See Figs. 2-19 and 2-20.)
Process countermeasures include adding trace amounts of copper to aluminum to slow aluminum atom
migration, and covering the top and bottom of metal lines with Ti, W or
other metal alloys (cap layer) to suppress aluminum atom movement. Circuit design countermeasures are also
taken such as keeping the current density that flows in metal lines to a certain value or less.
Fig. 2-18 Aluminum Grain Structure
Fig. 2-19 Electromigration Mechanism (labels: Al accumulation, Al shortage (void), Al grain boundary, grain boundary diffusion, electron)
Fig. 2-20 Photo of Electromigration (label: interlayer dielectric film)
(2) Copper electromigration
Copper lines are formed by an embedded metal line (damascene) process that uses electroplating. Copper has
a higher melting point and activation energy than aluminum, and exhibits reliability with respect to
electromigration that is several tens to several hundreds of times higher than that of aluminum. However, the
miniaturization of metal lines in the latest processes is increasing the current density, so resistance to
electromigration is becoming an important issue for reliability.
The electromigration resistance of copper is known to be greatly affected by the crystal grain size and
alignment, and the adhesion at the interface between the copper and the barrier metal. Particularly in copper
lines, which have a structure surrounded by barrier metal, when the adhesion drops between the copper and the cap
layer on the top surface where smoothing is performed, the copper at the interface moves easily, resulting in
migration. Therefore, it is important that the process incorporate countermeasures to increase the adhesion at
the interface between the copper and the cap layer. Circuit design countermeasures are also taken such as
keeping the current density that flows in metal lines to a certain value or less.
2.2.2.6 Stress Migration
Stress migration is a failure mechanism where stress applied to metal lines causes the metal atoms to creep,
forming voids in metal lines which lead to increased metal line resistance and disconnection. Stress is generated
in the metal lines (Al, Cu) used in LSI due to temperature differences between the heat treatment process in the
manufacturing process and the operating environment temperature. Due to this stress, vacancies in the metal
lines can creep and converge in a single location, forming a void.
Stress migration occurs due to the interaction between the metal line stress and the metal atom creep
phenomenon. Whereas the metal atom creep speed increases at high temperatures, the stress acting on the metal
lines decreases at high temperatures, so stress migration is known to occur most readily at an intermediate
peak temperature.
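The temperature peak can be illustrated numerically by multiplying a stress term that falls with temperature by a thermally activated creep term that rises with temperature. The functional form used here, (T0 - T)^N・exp(-Ea/kT), is an assumption borrowed from the general stress migration literature and does not appear in this handbook; the stress-free temperature T0, the exponent N and Ea are illustrative values only.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant [eV/K]

def stress_migration_rate(temp_c, stress_free_temp_c=250.0,
                          stress_exponent=2.5, ea_ev=0.6):
    """Relative void growth rate: (stress term) x (creep term).

    All default parameters are illustrative assumptions.
    """
    temp_k = temp_c + 273.15
    # Thermal stress shrinks as the line approaches its stress-free temperature.
    stress_term = max(stress_free_temp_c - temp_c, 0.0) ** stress_exponent
    # Creep (diffusion) speeds up with temperature in an Arrhenius manner.
    creep_term = math.exp(-ea_ev / (K_BOLTZMANN_EV * temp_k))
    return stress_term * creep_term

temps_c = range(25, 251)
peak_temp_c = max(temps_c, key=stress_migration_rate)
print(f"rate peaks near {peak_temp_c} degC for these assumed parameters")
```

With other parameter choices the peak moves, which is consistent with the different peak temperatures quoted for aluminum and copper below.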
(1) Aluminum stress migration
Aluminum lines have many vacancies and aluminum atoms with weak bonding force at the grain boundaries
of the polycrystalline structure, so when tensile stress is applied to metal lines, these aluminum atoms and
vacancies at the grain boundaries creep and form voids. Aluminum voids produced by tensile stress mainly
form and grow along the crystal grain boundaries, and can lead to increased metal line resistance and
disconnection defects. (See Fig. 2-21.)
Aluminum stress migration is generally said to have an occurrence rate peak around 150 to 200°C, and can
become a problem for long-term reliability in devices that are used for long times in high-temperature
environments.
As a design countermeasure, patterns are designed to avoid applying excessive stress to metal lines. Process
countermeasures include using a metal line structure that layers the aluminum between upper and lower layers
of a cap layer (Ti, W, etc.) to prevent stress migration. In addition, countermeasures such as using an interlayer
film structure that reduces stress and optimizing the heat treatment process are also taken to reduce the residual
metal line stress.
Fig. 2-21 Disconnection Defect due to Aluminum Stress Migration
(2) Copper stress migration
Regarding copper stress migration, the stress induced voiding (SIV) mode that produces voids in via holes
that connect upper and lower lines is a problem for reliability. When wide lines and narrow lines are
connected by a single via hole, the tensile stress on the wide line side concentrates in the via hole, causing the
vacancies in the copper to creep and migrate to the via hole and form a void. (See Fig. 2-22.) Stress migration
at copper via holes is known to have an occurrence temperature peak around 200°C. However, this failure is
largely dependent on the stress generated in the high-temperature annealing process after copper line
formation, so it occurs in a short time and is an early failure factor.
A countermeasure method in the design stage is to use multiple via holes in areas where wide lines and
narrow lines are connected. When metal lines are connected by multiple via holes, even if stress concentrates
on a single via hole and creates a void, the stress applied to other via holes is reduced so voids do not easily
occur at those other via holes, enabling prevention of open defects between metal lines. Process
countermeasures are also taken such as reducing the copper stress and selecting process conditions that reduce
the vacancies in copper.
Fig. 2-22 Void Caused by Stress Migration in a Copper Wiring Via Hole 1)
<References>
1) R. Kanamura et al.: Symp. on VLSI Tech., p. 107, 2003
2.3 Acceleration Model
In general, failure of components including semiconductor devices occurs due to some reaction at the atomic or
molecular level, and can be described by the Eyring absolute reaction theory (hereafter, “Eyring model”).
This Eyring model expresses the lifetime L in the absolute temperature T range of interest for
reliability by the following separation of variables-type equation, using the activation energy Ea shown in Fig. 2-23,
the non-temperature stress S that is a factor inducing failure, and Boltzmann's constant k (8.617×10^-5 eV/K).
$$L = A \cdot S^{-n} \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.1)}$$
“A” and “n” in the above equation are constants.
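Taking the ratio of Equation 2.3.1 at the use condition to the same equation at the test condition cancels the constant A and gives an acceleration factor. The symbol AF and the subscripts "use" and "test" are notation introduced here for illustration and are not used elsewhere in this chapter.

$$AF = \frac{L_{use}}{L_{test}} = \left(\frac{S_{test}}{S_{use}}\right)^{n} \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{test}}\right)\right]$$

This form assumes that the same failure mechanism, and therefore the same n and Ea, applies at both conditions.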
Outlines of the environmental and operating stress acceleration models used for semiconductor devices are
described below.
2.3.1 Acceleration Models for Environmental Stress
(1) Temperature acceleration model
exp(Ea/kT) on the right side of Equation 2.3.1 is also called the Arrhenius model since it is the same as the
equation derived empirically by Arrhenius in the 19th century.
Ea is the activation energy, in units of eV. The activation energy is the energy required for
chemical and physical reactions to proceed. If the chemical and physical reactions that make up two failure mechanisms are the same,
their activation energies are necessarily equal.
$$L = A \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.2)}$$
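As a short worked example of Equation 2.3.2, the following sketch computes the ratio of life at a use temperature to life at a test temperature. The activation energy of 0.7 eV and the two temperatures are assumed values for illustration; the function name is hypothetical.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant [eV/K]

def arrhenius_acceleration(ea_ev, use_temp_c, test_temp_c):
    """Ratio of life at the use temperature to life at the test temperature."""
    use_k = use_temp_c + 273.15
    test_k = test_temp_c + 273.15
    return math.exp(ea_ev / K_BOLTZMANN_EV * (1.0 / use_k - 1.0 / test_k))

# Assumed values: Ea = 0.7 eV, 55 degC in use, 125 degC in test.
af = arrhenius_acceleration(0.7, use_temp_c=55.0, test_temp_c=125.0)
print(f"acceleration factor = {af:.1f}")
```

Because Ea appears in the exponent, a small error in the assumed activation energy changes the result considerably, so Ea should be taken from data for the failure mechanism in question.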
(2) Humidity acceleration model
Humidity acceleration models use the absolute vapor pressure Vp or the relative humidity RH as the
humidity stress.
Typical models are described below.
① Absolute vapor pressure model
This model expresses temperature stress and humidity stress together using the absolute vapor pressure Vp, and
is known empirically to fit observed data. Because Vp depends on the temperature, the Eyring model cannot be used.
$$L = V_p^{-n} \qquad \text{(Eq. 2.3.3)}$$
② Relative humidity model
Because Vp depends on the temperature, this model instead uses the absolute temperature T and the relative
humidity RH in a separation of variables-type equation that conforms to the Eyring model. It
corresponds to the case when S = RH in Equation 2.3.1.
$$L = A \cdot (RH)^{-n} \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.4)}$$
③ Lycoudes model
Models that multiply functions of temperature, relative humidity and voltage are also available.
As a typical model, the Lycoudes model reported by N. Lycoudes is shown below.
$$MTTF = A \exp\left(\frac{E_a}{kT}\right) \exp\left(\frac{B}{RH}\right) \cdot V^{-1} \qquad \text{(Eq. 2.3.5)}$$
“V” and “B” in the above equation are the voltage and a constant, respectively.
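As an illustration of how the relative humidity model (Eq. 2.3.4) is applied, the following sketch computes the ratio of life at a use condition to life at a temperature-humidity test condition. The values of Ea and n and both conditions are assumptions chosen for illustration, and the function name is hypothetical.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant [eV/K]

def humidity_acceleration(ea_ev, n, use_temp_c, use_rh, test_temp_c, test_rh):
    """Life ratio (use / test) under L = A * RH**(-n) * exp(Ea / kT)."""
    use_k = use_temp_c + 273.15
    test_k = test_temp_c + 273.15
    thermal = math.exp(ea_ev / K_BOLTZMANN_EV * (1.0 / use_k - 1.0 / test_k))
    humidity = (test_rh / use_rh) ** n
    return humidity * thermal

# Assumed values: Ea = 0.8 eV, n = 3, use at 30 degC / 60 %RH,
# test at 85 degC / 85 %RH.
af = humidity_acceleration(0.8, 3.0, use_temp_c=30.0, use_rh=60.0,
                           test_temp_c=85.0, test_rh=85.0)
print(f"acceleration factor = {af:.0f}")
```

The humidity and temperature terms multiply, so the separate contribution of each stress can be read off by holding the other fixed.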
(3) Temperature difference acceleration model
This model is applied to failures caused by the repeated application of stress (thermal stress) produced by
temperature differences. Labeling the temperature difference as ΔT, the number of cycles to failure N is expressed using
the following equation, obtained by substituting S = ΔT into Equation 2.3.1.
$$N = A \cdot \Delta T^{-\alpha} \qquad \text{(Eq. 2.3.6)}$$
[Supplement]
In the case of low cycle fatigue, the cycle life Nf for failures due to thermal fatigue of materials conforms to the
Coffin-Manson model described by the following equation, where Δεp is the plastic strain amplitude.
$$\Delta\varepsilon_p \cdot N_f^{\alpha} = C \qquad \text{(Eq. 2.3.7)}$$
“α” and “C” in the above equation are material constants.
In the case of low cycle fatigue, failure due to repeated thermal stress conforms to the Coffin-Manson model,
and the temperature difference acceleration model is thought to be a form of that model. Semiconductor chip
failure can be broadly described using the temperature difference acceleration model, but the Coffin-Manson
model must be taken into account for mounting failures including package factors, such as the thermal fatigue
life of soldered portions. The following is a variation of the Coffin-Manson model, suggested by Norris and others, that incorporates the effects of the
temperature cycling frequency and maximum temperature.
$$N_f = C \cdot f^{m} \cdot \Delta\varepsilon_p^{-n} \exp\left(\frac{Q}{kT_{MAX}}\right) \qquad \text{(Eq. 2.3.8)}$$
In the above equation, “Nf” is the fatigue life, “C” is a material constant, “m” and “n” are exponents, “f” is the
cycling frequency, “ΔεP” is the plastic strain amplitude, “Q” is the activation energy, “k” is Boltzmann's
constant and “TMAX” is the maximum temperature.
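The ratio of fatigue life in the field to fatigue life in a temperature cycling test follows from Eq. 2.3.8 by dividing the equation at the use condition by the equation at the test condition. The sketch below assumes that the plastic strain amplitude is proportional to the temperature difference ΔT, which is a simplification not stated in this handbook. The exponents, Q and all conditions are illustrative assumptions.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant [eV/K]

def cycling_acceleration(dt_use, dt_test, f_use, f_test,
                         tmax_use_c, tmax_test_c, m, n, q_ev):
    """Ratio Nf(use) / Nf(test) from Eq. 2.3.8.

    Assumes plastic strain amplitude is proportional to the temperature
    difference, so the strain ratio becomes dt_test / dt_use.
    """
    strain = (dt_test / dt_use) ** n
    frequency = (f_use / f_test) ** m
    thermal = math.exp(q_ev / K_BOLTZMANN_EV
                       * (1.0 / (tmax_use_c + 273.15)
                          - 1.0 / (tmax_test_c + 273.15)))
    return strain * frequency * thermal

# Assumed values: field cycles of 40 degC swing, 2 cycles/day, 60 degC peak;
# test cycles of 165 degC swing, 48 cycles/day, 125 degC peak.
af = cycling_acceleration(dt_use=40.0, dt_test=165.0, f_use=2.0, f_test=48.0,
                          tmax_use_c=60.0, tmax_test_c=125.0,
                          m=1.0 / 3.0, n=2.0, q_ev=0.12)
print(f"acceleration factor = {af:.0f}")
```

When the cycling frequency and maximum temperature are the same at both conditions, the result reduces to the temperature difference acceleration model of Eq. 2.3.6.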
Fig. 2-23 Activation Energy (labels: normal state, activated state, degraded state, activation energy)
2.3.2 Acceleration Models for Operating Stress
Operating stresses that determine semiconductor device life include voltage, current, electric field strength,
current density, etc., and differ according to the failure mechanism as described in section 2.2.2. The main failure
mechanism acceleration models are described below.
Note that life in these models also depends on the temperature, so it is expressed by an Eyring model of the
operating stress and temperature stress.
(1) Time-dependent dielectric breakdown (TDDB) acceleration models
The life of devices (TTF) due to TDDB depends on the gate oxide film thickness. The Eox model is said to
be appropriate for devices with a gate oxide film thickness of 5 nm or more, the Vg model for more
than 2 nm and less than 5 nm, and the Power-law model for 2 nm or less.
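① Eox model
$$TTF = A \exp(-\gamma_{EOX} \cdot E_{ox}) \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.9)}$$
② Vg model
$$TTF = A \exp(-\gamma_{Vg} \cdot V_g) \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.10)}$$
③ Power-law model
$$TTF = A \cdot V_g^{n} \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.11)}$$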
In the above equations; “γEOX” is the field intensity acceleration factor, “γVg” and “n” are voltage acceleration
factors, “Eox” is the stress electric field applied to the gate and “Vg” is the stress voltage applied to the gate.
(2) Hot carrier (HCI) acceleration models
The life of devices due to hot carriers is expressed by the substrate current model, which uses the substrate
current, and the 1/Vds model, which uses the drain voltage. For the 0.25 µm and 0.15 µm process
generations and newer devices, effects other than the substrate current are greater, so the 1/Vds model is
becoming the main one.
① Substrate current model
$$TTF = A \cdot I_{sub}^{-m} \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.12)}$$
② 1/Vds model
$$TTF = A \exp\left(\frac{B}{V_{ds}}\right) \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.13)}$$
In the above equations; “m” is the factor depending on the substrate current, “B” is the factor depending on the
voltage, “Isub” is the maximum substrate current while stress is being applied and “Vds” is the drain voltage while
stress is being applied.
(3) Negative Bias Temperature Instability (NBTI) acceleration models
The life of devices due to NBTI is often indicated by the following equations:
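$$TTF = A \exp(\gamma \cdot E_{ox}) \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.14)}$$
$$TTF = A \cdot E_{ox}^{\gamma} \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.15)}$$
$$TTF = A \cdot V_g^{n} \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.16)}$$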
In the above equations; “γ” is the field intensity acceleration factor, “n” is the voltage acceleration factor, “Eox” is
the stress electric field applied to the gate oxide film and “Vg” is the stress voltage applied to the gate oxide film.
(4) Electromigration (EM) acceleration model
In general, the life of devices due to EM is explained theoretically by Huntington's equation.
$$\frac{\partial C}{\partial t} = D \nabla \left\{ \nabla C - \frac{eZ^{*}}{kT} E \cdot C \right\} \qquad \text{(Eq. 2.3.17)}$$
In the above equation, “C” is the atomic concentration, “D” is the diffusion coefficient, “Z*” is the effective valence,
“E” is the electric field, “e” is the electronic charge, “k” is Boltzmann's constant and “T” is the absolute
temperature.
To calculate the actual life of devices due to EM (TTF), Black's equation, which was derived empirically, is
widely used.
In the following equation, “T” is the absolute temperature, “j” is the current density, “Ea” is the activation energy,
“A” is the constant of proportionality, “n” is the current density exponent and “k” is Boltzmann's constant.
$$TTF = A \cdot j^{-n} \exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.18)}$$
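Black's equation can also be solved for the current density, which is how a design limit on current density can be derived from a target life. The following is a minimal sketch under assumed values: n = 2, Ea = 0.7 eV, the stress test result and the target life are all hypothetical, and the function names are introduced here for illustration.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant [eV/K]

def black_ttf(a, j, n, ea_ev, temp_k):
    """Eq. 2.3.18: TTF = A * j**(-n) * exp(Ea / kT)."""
    return a * j ** (-n) * math.exp(ea_ev / (K_BOLTZMANN_EV * temp_k))

def max_current_density(target_ttf, a, n, ea_ev, temp_k):
    """Largest current density whose Black's-equation life meets target_ttf."""
    thermal = math.exp(ea_ev / (K_BOLTZMANN_EV * temp_k))
    return (a * thermal / target_ttf) ** (1.0 / n)

n, ea_ev = 2.0, 0.7  # assumed exponent and activation energy

# Hypothetical stress test: 100 h life at 2e6 A/cm^2 and 250 degC.
test_temp_k = 250.0 + 273.15
a = 100.0 * (2.0e6 ** n) / math.exp(ea_ev / (K_BOLTZMANN_EV * test_temp_k))

# Hypothetical target: 10 years (87,600 h) at 105 degC.
target_hours = 87600.0
use_temp_k = 105.0 + 273.15
j_max = max_current_density(target_hours, a, n, ea_ev, use_temp_k)
print(f"allowable current density = {j_max:.3e} A/cm^2")
```

A limit obtained this way describes the median life of a test structure, so a practical design rule would apply further margin for the failure distribution.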
<References>
1) JEITA EDR-4704A: Application guide of the accelerated life test for semiconductor devices
2) JEITA EDR-4707: Report on Failure Mechanism of LSI and reliability test method
3) JEITA ETR-7024: Research Report on Effect of Voids on Reliability of Lead-Free Solder Joints and Standard of Evaluation Criteria
4) N. J. Flood: Reliability aspects of plastic encapsulated integrated circuit, IRPS (1972)
5) D. S. Peck: Temperature-humidity acceleration of metal-electronics failure in semiconductor devices, IRPS (1973)
6) N. Lycoudes: The reliability of plastic microcircuit in moist environments, Solid State Technology (1978)
7) T. Gasser: Hot Carrier Degradation in Semiconductor Device
8) Comparison of NMOS and PMOS hot carrier effects, IEEE transaction on electron devices (1997)
9) H. B. Huntington: Diffusion in Solids, Academic Press (1975)