Backtesting Value-at-Risk based on Tail Losses
Woon K. Wong
Tamkang University
2007/1/6
Abstract
Value-at-Risk (VaR) has been sanctioned for market risk capital determination and is
now the standard for risk measurement. However, there is potential tail risk in the use
of VaR since the risk measure disregards valuable information conveyed by sizes of
losses beyond the VaR level. While the Capital Accord stipulates a backtesting
procedure for VaR that counts the number of exceptions, this paper proposes to use
the saddlepoint technique to backtest VaR by summing the tail losses. Monte Carlo
simulations show that the technique is very accurate and powerful even for small
samples. Application of the proposed backtest to S&P 500 index returns finds
substantial downside tail risk in the stock market, and shows that risk models which
account for jumps, skewness and fat tails fail to capture the tail risk during the
1987 stock market crash.
Key words: Value-at-Risk, exception, tail risk, backtesting, risk management and
saddlepoint technique
JEL Classification: G32
1. Introduction
Value-at-Risk (VaR) was sanctioned in 1996 by the Basle Committee and major
central banking authorities for market risk capital determination through the
so-called internal models.1 Since then, VaR has become the standard for measuring risk and is
now widely used by banks and other financial entities such as hedge funds for risk
reporting; see, for example, Jorion (2001). One reason for the preference for VaR
over other risk measures is that its backtest is easy to implement. VaR can be
defined as the maximum possible loss of a portfolio over a specified period at a
given confidence level. Backtesting VaR basically involves counting the number of
exceptions, that is, tail losses exceeding VaR. For example, both Kupiec's (1995)
likelihood ratio test and Christoffersen's (1998) conditional coverage test use the
empirical failure rate, which is easily calculated as the number of exceptions
divided by the sample size.
Unfortunately, the above backtests of VaR disregard valuable information conveyed
by the sizes of exceptions. During turbulent market environments such as the fall of
1998, when many banks suffered huge financial losses, such frequency-based
backtests of VaR can be deceptive about the underlying risk. For instance, Berkowitz and
O'Brien (2002) study the trading profits and losses of six large multinational banks
from January 1998 to March 2000. Using the above frequency-based backtests, they find
that the banks' internal VaR models provide sufficient coverage for market risk.
Nevertheless, losses were huge when breaches of VaR occurred during the volatile
period of fall 1998. That is, there can be substantial tail risk in the use of VaR that is
overlooked by the risk model.2
In this paper, I propose to backtest VaR models based on the sizes of tail losses.
Specifically, I calculate the tail risk (TR) statistic, which is defined as the sum of all
exceptions in excess of VaR divided by the sample size. The main contribution of this
paper is to use the saddlepoint, or small sample asymptotic, technique to calculate the
p-value of the TR test statistic accurately. Monte Carlo simulations show that the
1 Basle Committee on Banking Supervision (1996a, Section B.4) stipulates that banks may use internal
models to calculate the market risk capital requirement, which is defined as the higher of (i) the
previous day's 10-day VaR and (ii) the average of the 10-day VaRs over the preceding 60 business days,
multiplied by a multiplication factor of at least 3.
2 See also Leippold (2004) and Yamai and Yoshiba (2005) for further discussion of the concept of tail
risk.
proposed backtest yields accurate test sizes and is very powerful against excessive tail
risk even for small samples. Application of the backtest to the S&P 500 index returns
from 1986 to 2000 reveals that risk models which account for jumps, skewness and
fat tails fail to capture the tail risk during the 1987 stock market crash.
Tail risk as defined above is related to VaR and expected shortfall (ES)3 in the
following manner:

\alpha^{-1} TR = ES - VaR,

where 1 - \alpha is the confidence level at which VaR is calculated. Thus TR refers to the
risk in the part of the tail that is disregarded by VaR. The proposed backtest of VaR based
on tail losses would help a risk manager assess whether the tail risk of her VaR
model is consistent with the tail of the underlying distribution model. Though various
backtests of ES have been proposed, notably the censored Gaussian approach of
Berkowitz (2001) and the functional delta method of Kerkhof and Melenberg (2004),
it is useful to consider the backtest of the tail risk of VaR proposed in this paper for the
following reasons. VaR has now become the standard for risk measurement and its
wide usage is unlikely to change. Since the Basle Committee stipulates that VaR
be backtested by counting the number of exceptions, it is meaningful to carry out the
proposed backtest based on the sum of the sizes of exceptions.
Indeed, the proposed backtest has many advantages that are complementary to
VaR being an officially sanctioned risk measure. First, the sizes of tail losses convey
valuable economic information for risk management. Though the current backtesting
procedure counts only the number of VaR exceptions, the Basle Committee points out
that banks should document and explain all such exceptions if their number
exceeds four.4 This is in order to determine an appropriate supervisory
response to a backtesting result that falls in the yellow zone but is possibly due to pure
bad luck. The backtest proposed in this paper considers the sizes of exceptions and thus
provides additional valuable information that facilitates an appropriate supervisory
response.
Second, it is recognized by the Basle Committee on Banking Supervision (1996b,
p. 5) that frequency-based tests are limited in their power to distinguish an
accurate model from an inaccurate one, and this limitation has been a prominent
3 Expected shortfall is defined as the average of losses that exceed VaR; see Artzner et al. (1997, 1999).
4 When the number of exceptions exceeds four, the backtesting result is classified by the Basle
Committee as in the yellow zone and the multiplication factor for the market risk capital requirement is
raised; see Section 2 for further details.
consideration in the design of the current backtesting framework. However, as a formal
statistical technique for a size-based backtest was not available, the Basle Committee
rightly settled on the frequency-of-exceptions method, which is simple and straightforward
to implement. Though size-based backtests of ES have recently been proposed by
Berkowitz (2001) and Kerkhof and Melenberg (2004), they rely on large samples for
convergence to the required limiting distribution. The backtest proposed in this paper
uses the small sample asymptotic technique to calculate the p-value of the proposed
size-based test statistic, TR. It is very accurate and considerably more powerful than its
frequency counterpart. This is especially important because backtests are usually
required to be carried out on small samples.
Third, it is also recognized that a single risk measure cannot possibly capture
all risk information, and thus the Basle Committee requires VaR to be
complemented by other risk analyses such as stress testing. While stress testing offers
rigorous and comprehensive scenario studies, by its very nature it lacks uniformity
and can sometimes even be subjective.5 In this context, the proposed backtest of the tail
risk of VaR is an excellent complement to VaR and its accompanying stress tests,
since the proposed backtest statistically tests the sizes of tail losses for possible
inadequacy of risk models.
Fourth, from 2006 the new Basle Accord requires major banks to incorporate
market, credit, and operational risks in an integrated risk framework; see,
for example, Basle Committee on Banking Supervision (2004) and Rosenberg and
Schuermann (2006). Operational risk is defined as "the risk of loss resulting
from inadequate or failed internal processes, people, and systems, or from external
events,"6 which includes catastrophes such as 9/11. Operational losses are thus
characterized by infrequent but large losses, and their distribution is likely non-elliptical.
It is therefore important for risk managers to carry out backtests of the tail risk of VaR. To
illustrate the possible link between tail risk and non-subadditivity, this paper provides an
empirical example in which the 99% VaR of the S&P 500 daily returns in 1987 fails to
satisfy the subadditivity property while all three risk models considered in this paper
fail the backtest of the tail risk of VaR.
Finally, as is pointed out by Basak and Shapiro (2001), evidence abounds in
5 For example, in addition to certain specified stress scenarios to which bank portfolios are required to
be subjected, banks are also expected to develop their own stress tests for the scenarios they identify as
most adverse for the portfolios concerned.
6 See Basle Committee on Banking Supervision (2004, p. 137).
practice that VaR estimates are increasingly used as a tool to manage and control risk.
As a result, instead of using VaR estimates as summary statistics for decision making,
economic agents strive to maintain the VaR of their market exposures at a
prespecified level. Such risk management practice could result in larger losses during
adverse down markets, because a VaR risk manager often chooses a larger
exposure to risky assets than a normal risk manager; see also Yamai and Yoshiba
(2005) for further illustrations on the subject. It is hoped that the availability of a
simple and effective backtest of VaR based on tail losses will encourage risk
managers to test for the tail risk of VaR, and thus remedy this possible shortcoming of
VaR.
This paper is organized as follows. Section 2 explains why VaR continues to
enjoy its popularity in risk management notwithstanding its theoretical shortcomings.
Section 3 employs small sample asymptotic technique to calculate the p-value of the
proposed backtest whereas Section 4 carries out the Monte Carlo simulation studies.
Section 5 fits three popular risk models to the S&P 500 daily returns followed by
backtests of tail risks of VaR. Finally, Section 6 concludes.
2. Why does VaR continue to be successful?
In this section, I explain why, in spite of certain theoretical shortcomings, VaR
continues to be successful as a risk measure. Theoretical as well as pragmatic reasons for
the continued acceptance of VaR are presented below. The implication is that VaR as a
standard risk measure is unlikely to be replaced. It is therefore useful to have a
size-based backtest of VaR that is complementary to the "count" rules adopted by the
Basle Committee. Specifically, the Capital Accord stipulates counting the number of
exceptions to backtest VaR, whereas this paper proposes summing the sizes of
exceptions to backtest the tail risk of VaR.
VaR's success can be attributed to the following reasons. From a theoretical
viewpoint, Danielsson et al. (2006) show that, provided the threshold is sufficiently
far in the tail, ES and VaR produce similar and consistent rankings of risky
projects. Therefore, VaR is a sufficient risk measure if the confidence level is set
sufficiently high. This is in line with the Basle Committee setting the confidence level
for market risk at 99%, whereas for credit and operational risks VaR is
calculated at the 99.9% level. Moreover, due to diversification, VaR at the level of the
whole bank is often found to be adequate for regulatory capital determination. As long
as there is no single position that dominates the portfolio, the returns of the bank's
trading and credit portfolios are usually close to those of a normal or, if fat-tailed, a
Student t distribution. Both the normal and Student t distributions belong to the class
of elliptical distributions, for which linear correlation is a sufficient dependence
measure. More importantly, for a portfolio with elliptically distributed returns, VaR is
subadditive and thus a coherent risk measure; see, for example, Embrechts et al. (1999)
and Szegö (2002).
In practice, the risk management process is about communicating effectively the various
aspects of risk and reducing complex risk factors to simple, understandable terms
on which decisions can be based. For instance, a trader on a government bond desk
may use a term such as "duration" to describe his risk exposure; an equity trader will
talk about her "beta"; and an option trader is most worried about his "delta".
Therefore, an important role of VaR is to make these risks transparent by condensing
the entire distribution of the returns of a bank's trading portfolio into a single number
that people from different backgrounds, such as board members and investors, can
easily understand as a measure of risk. This is one reason why VaR was selected by the
Basle Committee as the basis for calculating regulatory capital for market risk.
Readers are referred to Lotz and Stahl (2005) for further details on the use of VaR in
practice.
Another reason for the Basle Committee’s preference for VaR is that it can be
backtested statistically. When a risk measure is calculated, it is crucial for regulators
as well as risk managers to check whether the underlying risk is captured properly.
Without the feasibility of backtesting, there is no basis for penalties and rewards, and
regulation is not possible. For this reason, the Basle Committee has specified a
“backtesting” procedure that simply counts the number of VaR breaches using the
most recent twelve months of data. If the number of exceptions over this period is less
than or equal to four (the green zone), the multiplication factor for the market risk
capital determination remains at the minimum value of three; if there are more than
four exceptions (the yellow zone), the multiplication factor is increased, up to a
maximum of four when there are ten or more exceptions (the red zone).7 In the
regulatory context, such a simple yet formal statistically based backtest allows for
uniformity across agencies, countries, and time. Moreover, it is hoped that such
backtesting offers incentives for banks to continually improve their internal
risk measurement models, and thus raises the standard of risk management. Indeed, a
breach of VaR is easily identified, and various techniques have been developed for
backtesting the risk measure. Some of the better known backtests of VaR
proposed by academics include Kupiec's (1995) likelihood ratio test of the failure rate,
Christoffersen's (1998) test of conditional coverage, Christoffersen and Diebold's
(2000) test of serial correlation of VaR breaches, Dowd's (2001, 2006) tests of VaR
using order statistics, and the testing and comparison of VaRs using the GMM method
by Christoffersen et al. (2001).
In summary, VaR has been well studied by academics and is widely used by
practitioners. It is unlikely that the Basle Committee will replace VaR with another
risk measure. It is therefore useful to have a size-based backtest of VaR that is
complementary to the use of VaR for risk management. While the backtesting
procedure stipulated by Basle II counts the number of exceptions, this paper proposes
to backtest VaR by summing the tail losses. This complementary role of the proposed
backtest provides additional valuable information for more powerful backtesting,
presents a statistical analysis of the nature of the tail risk of VaR, and facilitates
appropriate supervisory action.
3. Backtesting tail losses using the saddlepoint technique
This section derives the small sample asymptotic technique used to backtest the tail
risk of VaR. I first present the test statistic and briefly discuss its relation to ES.
Next, I provide some basic statistical properties of the test statistic, which are then used
to derive the Lugannani and Rice (1980) formula for calculating the p-value of
the test statistic.
3.1 Tail risk of VaR
7 See Basle Committee on Banking Supervision (1996b, Table 2) for details of the increases in the
multiplication factor.
Let R be the random profit or loss of a portfolio over a holding period. For the
sake of simplicity but without loss of generality, let F be the absolutely continuous
cumulative distribution function (CDF) of R. Then, given a sample of T returns
{r_1, ..., r_T}, the tail risk of VaR can be calculated as

TR = \frac{1}{T} \sum_{t=1}^{T} (r_t - q) \, I(r_t < q),   (1)

where I(\cdot) is an indicator function and, for 0 < \alpha < 1, q = F^{-1}(\alpha) is the VaR at the
(1 - \alpha) confidence level. Basle II sets \alpha = 0.01 for market risk. Throughout this paper, a
negative number represents a loss, and thus VaR as defined above is less than zero.
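As a concrete illustration, equation (1) can be computed directly from a return sample. The sketch below is mine, not the author's code (the function and variable names are hypothetical), and it assumes a standard normal model so that q = \Phi^{-1}(0.01):

```python
from statistics import NormalDist

def tail_risk(returns, q):
    """TR of equation (1): sum of (r_t - q) over exceptions, divided by T."""
    T = len(returns)
    return sum(r - q for r in returns if r < q) / T

q = NormalDist().inv_cdf(0.01)           # 99% VaR of a standard normal, about -2.3264
sample = [-3.1, 0.4, -2.5, 1.2, -0.3]    # hypothetical daily returns
print(tail_risk(sample, q))              # only -3.1 and -2.5 breach the VaR level
```

Because losses are negative numbers, TR is never positive: it is zero when no return breaches the VaR and grows more negative as tail losses accumulate.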
The population counterpart of (1) is given by \int_{-\infty}^{q} (r - q) f(r) \, dr, where f(r) denotes
the probability density function (PDF) of R. It is straightforward to see that

\alpha^{-1} TR = \alpha^{-1} \int_{-\infty}^{q} (r - q) f(r) \, dr = ES - VaR.   (2)
Therefore, \alpha^{-1} TR may be referred to as the part of ES that corresponds to the
portion of the tail disregarded by VaR. Nevertheless, there is a subtle difference between
them in backtesting. It can be seen from (1) that TR is obtained by summing all the
tail losses in excess of VaR and dividing the sum by the total number of observations
in the sample. So unlike ES, which is the expected loss when VaR is breached, TR is
proportional to the sum of all tail losses accumulated over the entire history. This may
seem a less desirable property if one is interested in the average size of loss
when VaR is exceeded. However, TR is used in this paper specifically for backtesting.
It tells a risk manager the aggregate tail losses a trading account may incur over a
period. In this regard, the proposed backtest of the tail risk of VaR is similar to a
frequency-based backtest of VaR: while the latter counts the number of VaR
breaches, the former sums all the tail losses. Therefore, backtesting the tail risk of
VaR can be regarded as a direct and complementary counterpart to the Basle
backtesting procedure for VaR.
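Identity (2) can be checked numerically. The following Monte Carlo sketch is my own (not from the paper): it simulates standard normal returns and compares \alpha^{-1} TR with the closed-form ES - VaR of the standard normal, which is about -0.3389.

```python
import random
from statistics import NormalDist

nd = NormalDist()
alpha = 0.01
q = nd.inv_cdf(alpha)            # VaR of a standard normal, about -2.3264
es = -nd.pdf(q) / alpha          # ES of a standard normal, about -2.6652

random.seed(42)                  # arbitrary seed for reproducibility
T = 200_000
# min(r - q, 0) equals (r - q) * I(r < q), the summand of equation (1)
tr = sum(min(random.gauss(0.0, 1.0) - q, 0.0) for _ in range(T)) / T

print(tr / alpha, es - q)        # both should be close to -0.3389
```

The closed form -\phi(q)/\alpha for the ES of a standard normal is a textbook result; the Monte Carlo estimate converges to it at the usual 1/\sqrt{T} rate.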
3.2 The TR random variable and its moments
Due to the workings of the central limit theorem, the daily returns of a bank
trading account can often be fitted with a normal distribution. For example, Berkowitz
and O'Brien (2002) find that an ARMA-GARCH model with normally distributed
innovations fits the profits and losses of the trading accounts of six large U.S.
commercial banks well, in the sense that the risk model passes the backtests of
Kupiec (1995) and Christoffersen (1998).8 Nevertheless, it is well known that
even at the aggregate level stock market indices are skewed and leptokurtic. It is thus
sensible to carry out further backtesting for the tail risk of VaR. Since normality is widely
assumed in practice and a non-normal density can always be transformed into a
normal one,9 the saddlepoint technique in this paper is derived by assuming normality
for the underlying returns. Nevertheless, it must be emphasized that the small sample
asymptotic technique can also be derived under a non-normal null hypothesis.
Now, let r be the portfolio return with a standard normal distribution, and denote
its CDF and PDF by \Phi and \phi respectively. From now onwards, unless stated
otherwise, the quantile q refers to the \alpha-quantile of a standard normal distribution,
that is, q = \Phi^{-1}(\alpha). Next define a random variable X with realization x such that

x = \begin{cases} r - q & \text{if } r < q, \\ 0 & \text{if } r \geq q. \end{cases}   (3)

Note that x \in (-\infty, 0] and the expectation of X is the theoretical value of the
corresponding TR. The PDF of X can be defined in terms of \phi as

f(x) = \begin{cases} \phi(x + q) & \text{if } x < 0, \\ 1 - \alpha & \text{if } x = 0, \end{cases}

and the corresponding CDF by

F(x) = \begin{cases} \Phi(x + q) & \text{if } x < 0, \\ 1 & \text{if } x = 0. \end{cases}
Now consider the following results on the moments of f(x), which shall be useful for
the rest of this section.
PROPOSITION 1: The moment generating function of f(x) is

M(t) = \exp(-qt + t^2/2) \, \Phi(q - t) + 1 - \alpha,   (4)

and its derivatives with respect to t are given by
8 Passing the above frequency-based backtests, however, does not mean that the tail risk is fully
captured by the normal distribution, since it is acknowledged by the Basle Committee that
frequency-based tests are limited in power.
9 If the null hypothesis assumes a non-normal density, it can be transformed into a normal one;
Berkowitz (2001) and Kerkhof and Melenberg (2004) likewise transform the non-normal null density
into a normal one.
M (t )  exp( qt  t 2 /2) (t  q) (q  t )   (q  t ) ,
M (t )  (t  q) M (t )  M (t )  (1   ),
(5)
M ( n ) (t )  (t  q) M ( n1) (t )  (n  1) M ( n2) (t ),
where n  3 .
PROOF: Consider the moment generating function written as a Stieltjes integral:

M(t) = E(e^{tX}) = \int e^{tx} \, dF = \int_{-\infty}^{0} e^{tx} f(x) \, dx + 1 - \alpha,   (6)

which, after substituting \phi(x + q) for f(x) in the integral term on the right hand
side of (6), collecting the exponential terms, and simplifying, gives rise to (4). For the
derivatives of the moment generating function, first denote the differential operator
d/dt by D_t. Now using the standard results

D_t \Phi(t) = \phi(t),
D_t \phi(t) = -t \phi(t),
D_t \exp(-qt + t^2/2) = (t - q) \exp(-qt + t^2/2),

the results in (5) can be verified by straightforward algebra.
Q.E.D.
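The closed form (4) is easy to sanity-check against a direct numerical evaluation of E(e^{tX}). The following sketch is mine (the truncation of the integral at -12 is an ad hoc choice, safe because \phi(x + q) is negligible there):

```python
import math
from statistics import NormalDist

nd = NormalDist()
alpha = 0.01
q = nd.inv_cdf(alpha)

def M_closed(t):
    # Equation (4): M(t) = exp(-qt + t^2/2) * Phi(q - t) + 1 - alpha
    return math.exp(-q * t + t * t / 2) * nd.cdf(q - t) + 1 - alpha

def M_numeric(t, lo=-12.0, n=100_000):
    # Midpoint rule for the integral of e^{tx} * phi(x + q) over x in (lo, 0),
    # plus the point mass 1 - alpha at x = 0; lo = -12 is an assumed cutoff.
    h = -lo / n
    s = sum(math.exp(t * (lo + (i + 0.5) * h)) * nd.pdf(lo + (i + 0.5) * h + q)
            for i in range(n))
    return s * h + 1 - alpha

for t in (0.0, 0.5, -0.5):
    print(M_closed(t), M_numeric(t))   # the two columns should agree closely
```

At t = 0 both expressions reduce to \Phi(q) + 1 - \alpha = 1, as any moment generating function must.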
As a corollary of Proposition 1, the following mean and variance of X can be obtained
easily.
COROLLARY 1: The mean of X, which is the expectation of TR, is given by

\mu_X = E(X) = -\alpha q - \frac{e^{-q^2/2}}{\sqrt{2\pi}},   (7)

and its variance is

\sigma_X^2 = \operatorname{var}(X) = \alpha (1 + q^2) + \frac{q \, e^{-q^2/2}}{\sqrt{2\pi}} - \mu_X^2.   (8)

PROOF: Setting t = 0 in M'(t) and M''(t) from (5) gives E(X) and E(X^2)
respectively. The variance of X follows immediately since \operatorname{var}(X) =
E(X^2) - [E(X)]^2.
Q.E.D.
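As a numerical check of (7) and (8), the closed-form moments can be compared with sample moments of simulated data under the standard normal null. This is my own sketch, not part of the paper:

```python
import random
from statistics import NormalDist

nd = NormalDist()
alpha = 0.01
q = nd.inv_cdf(alpha)

mu_X = -alpha * q - nd.pdf(q)                            # equation (7)
var_X = alpha * (1 + q * q) + q * nd.pdf(q) - mu_X ** 2  # equation (8)

random.seed(7)                                           # arbitrary seed
xs = [min(random.gauss(0.0, 1.0) - q, 0.0) for _ in range(200_000)]
m = sum(xs) / len(xs)
v = sum((x - m) ** 2 for x in xs) / len(xs)

print(mu_X, m)     # both about -0.0034
print(var_X, v)    # both about 0.0021
```

Note that \alpha^{-1} \mu_X recovers ES - VaR = -0.3389, consistent with identity (2) and with the null value used for hypothesis testing later in the paper.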
The asymptotic distribution of the sample mean of X can be obtained using the
above results. Given a sample {x_t} of size T, its sample mean \bar{x} tends to normality
as T tends to infinity. That is,

\sqrt{T} \, \frac{\bar{x} - \mu_X}{\sigma_X} \xrightarrow{d} N(0, 1),

where the population mean \mu_X and variance \sigma_X^2 are respectively given by (7) and
(8) in Corollary 1. Since extreme losses are used to provide a basis for regulatory
capital determination, \alpha is set by the Basle Committee at 0.01 for market risk capital.
Hence, the above asymptotic result is hardly valid for the sample sizes one would
encounter in practice. For this reason, this paper resorts to the Lugannani and Rice
(1980) formula, derived by the saddlepoint technique, in order to provide a more
accurate method for calculating the p-value for a statistical backtest of tail risk.
3.3 The Lugannani and Rice formula
By using the small sample asymptotic technique, Lugannani and Rice (1980)
provide an elegant formula that can accurately calculate the tail probability (one minus
the CDF) of the sample mean of an independently and identically distributed (IID)
random sample. To gain an intuitive understanding of the method, one can start with the
inversion formula when \bar{x} \neq 0:10

f_{\bar{X}}(\bar{x}) = \frac{n}{2\pi} \int_{-\infty}^{\infty} e^{n[K(it) - it\bar{x}]} \, dt,

where f_{\bar{X}}(\bar{x}) denotes the PDF of the sample mean and K(t) = \ln M(t) is the
cumulant generating function of f(x). Then the tail probability required for
statistical testing can be written as

P(\bar{X} \geq \bar{x}) = \int_{\bar{x}}^{0} f_{\bar{X}}(u) \, du = \frac{1}{2\pi i} \int_{s - i\infty}^{s + i\infty} e^{n[K(t) - t\bar{x}]} \, t^{-1} \, dt,   (9)

where s is chosen to be the saddlepoint \tau satisfying

K'(\tau) = \bar{x}.   (10)
The role of the saddlepoint is to make it possible for a transformation to be applied to (9),
so that one can avoid approximating K(t) - t\bar{x} in the exponential, where the function
would blow up. To obtain the Lugannani and Rice (1980) formula, this paper
follows the approach of Daniels (1987) by using the transformation

\tfrac{1}{2} W(t)^2 - \hat{W} W(t) = K(t) - t\bar{x},

which describes the local behavior of K(t) - t\bar{x} over a region containing both t = \tau
and t = 0 when \tau is small. Note that in the above transformation,

\hat{W} = \operatorname{sgn}(\tau) \sqrt{2(\tau \bar{x} - K(\tau))}.

10 When \bar{x} = 0, the probability mass at zero is simply (1 - \alpha)^T and P(\bar{X} > \bar{x}) = 0.
The tail probability formula may then be written as

P(\bar{X} \geq \bar{x}) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} e^{n[\frac{1}{2} W^2 - \hat{W} W]} \, \frac{1}{t} \frac{dt}{dW} \, dW.
For small t and \bar{x} \neq \mu_X,

W \sim \frac{K'(\tau) - K'(0)}{\hat{W}} \, t = \frac{\bar{x} - \mu_X}{\hat{W}} \, t,
and the tail probability may then be conveniently rearranged as

P(\bar{X} \geq \bar{x}) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} e^{n[\frac{1}{2} W^2 - \hat{W} W]} \, W^{-1} \, dW
  + \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} e^{n[\frac{1}{2} W^2 - \hat{W} W]} \left[ \frac{1}{t} \frac{dt}{dW} - W^{-1} \right] dW.
Note that the first term on the right hand side of the above equation corresponds to the
singularity in (9) and can be shown to equal 1 - \Phi(n^{1/2} \hat{W}). As for the second
term, Daniels (1987) shows, after some algebra and an application of Watson's Lemma,11
that it can be approximated by

\phi(n^{1/2} \hat{W}) \left[ (n^{1/2} \hat{W})^{-1} - \left( \tau (n K''(\tau))^{1/2} \right)^{-1} \right],

with an error term of order O(n^{-3/2}). This is the Lugannani and Rice (1980) formula,
which may be expressed more elegantly as follows, without further proof:
PROPOSITION 2: Given an IID standard normal random sample {r_t} of size T,
construct {x_t} according to (3). Let \tau be the saddlepoint such that (10) is satisfied,
and define \eta = \tau \sqrt{T K''(\tau)} and \omega = \operatorname{sgn}(\tau) \sqrt{2T(\tau \bar{x} - K(\tau))}, where \operatorname{sgn}(s)
equals zero when s = 0 and takes the sign of s when s \neq 0. Then the tail probability of
exceeding the sample mean \bar{x} \neq \mu_X is given by

P(\bar{X} \geq \bar{x}) = 1 - \Phi(\omega) + \phi(\omega) \left[ \frac{1}{\eta} - \frac{1}{\omega} \right] + O(T^{-3/2}).   (11)
< Insert Figure 1: Confidence bounds of TR >
11 See, for example, Watson (1948) and Daniels (1954).
Figure 1 depicts the 99% confidence bounds of TR using the saddlepoint and
normal asymptotic distributions. Since TR is calculated based on the 99% VaR, the
saddlepoint lower bound is not available for sample sizes T \leq 500 (0.99^{500} > 0.005).
It is evident from Table I that the saddlepoint technique yields accurate test sizes.
The large deviation between the saddlepoint and normal limiting distributions below
therefore reveals that the latter approximates the true distribution of TR poorly when
the sample size is small.
In order to obtain the tail probability, I first calculate the saddlepoint \tau by
solving for t in the equation

K'(t) = \frac{M'(t)}{M(t)} = \frac{(t - q) \Phi(q - t) - \phi(q - t)}{\Phi(q - t) + (1 - \alpha) \exp(qt - t^2/2)} = \bar{x},

so that the required moment derivatives given in (5) can be evaluated at t = \tau. The
moment derivatives are then used to obtain their cumulant counterparts, which in turn
are used to calculate \eta and \omega as defined in Proposition 2, and thus the required
probability. The above tail probability can be calculated only when \bar{x} \neq \mu_X. Though
the case of \bar{x} = \mu_X is hardly encountered in practice, the corresponding tail
probability is provided below for completeness:
P(\bar{X} \geq \mu_X) = \frac{1}{2} - \frac{K^{(3)}(0)}{6 \sqrt{2\pi n} \, [K''(0)]^{3/2}} + O(n^{-3/2}).

Finally, it is noted that the Lugannani and Rice formula given by (11) is applicable
only when the sample mean \bar{x} < 0. When \bar{x} = 0, it is obvious that P(\bar{X} \leq 0) = 1.
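The whole procedure just described can be sketched end to end. The code below is my own illustration, not the author's implementation: it solves the saddlepoint equation by bisection over an ad hoc bracket, then evaluates the Lugannani and Rice formula for a left-tail p-value under the standard normal null.

```python
import math
from statistics import NormalDist

nd = NormalDist()
ALPHA = 0.01
Q = nd.inv_cdf(ALPHA)                     # 99% VaR of a standard normal

def M(t):
    # Moment generating function, equation (4).
    return math.exp(-Q * t + t * t / 2) * nd.cdf(Q - t) + 1 - ALPHA

def Kp(t):
    # K'(t) = M'(t)/M(t), in the overflow-friendly form used in the text.
    num = (t - Q) * nd.cdf(Q - t) - nd.pdf(Q - t)
    den = nd.cdf(Q - t) + (1 - ALPHA) * math.exp(Q * t - t * t / 2)
    return num / den

def Kpp(t):
    # K''(t) = M''/M - (M'/M)^2, with M' and M'' from the recursion (5).
    m = M(t)
    m1 = math.exp(-Q * t + t * t / 2) * ((t - Q) * nd.cdf(Q - t) - nd.pdf(Q - t))
    m2 = (t - Q) * m1 + m - (1 - ALPHA)
    return m2 / m - (m1 / m) ** 2

def saddlepoint_pvalue(x_bar, T):
    """Left-tail p-value P(Xbar <= x_bar) for a moderate x_bar < 0."""
    lo, hi = -8.0, 8.0                    # ad hoc bracket for the saddlepoint
    for _ in range(200):                  # bisection on K'(tau) = x_bar
        mid = (lo + hi) / 2
        if Kp(mid) < x_bar:
            lo = mid
        else:
            hi = mid
    tau = (lo + hi) / 2
    K = math.log(M(tau))
    omega = math.copysign(math.sqrt(max(2 * T * (tau * x_bar - K), 0.0)), tau)
    eta = tau * math.sqrt(T * Kpp(tau))
    # Lugannani and Rice upper-tail probability, then complement for the left tail
    upper = 1 - nd.cdf(omega) + nd.pdf(omega) * (1 / eta - 1 / omega)
    return 1 - upper

print(saddlepoint_pvalue(-0.01, 250))     # a small p-value flags excessive tail risk
```

Useful sanity checks on such an implementation are monotonicity properties: a more negative observed TR, or a larger sample with the same TR, should both yield a smaller left-tail p-value.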
Hypothesis testing
For hypothesis testing, it is more intuitive to use the standardized statistic

\tilde{z} = \alpha^{-1} TR,   (12)

where TR is given by (1). While the likelihood ratio backtests of Kupiec (1995) and
Berkowitz (2001) are two-tailed tests, the proposed backtest based on (12) can be
either one- or two-tailed. For a two-tailed backtest, the null and alternative hypotheses
are

H_0: \tilde{z} = \tilde{z}_0 versus H_1: \tilde{z} \neq \tilde{z}_0,

where \tilde{z}_0 = -0.3389 under the standard normal null hypothesis.12 To carry out a
12 For a standard normal distribution, VaR = -2.3264 and ES = -2.6652, so that \tilde{z}_0 = ES - VaR =
-0.3389. This result can also be obtained by evaluating (7) and scaling by \alpha^{-1}.
one-tailed regulatory backtest to check whether a risk model provides sufficient tail
risk coverage, the null and alternative hypotheses become

H_0: \tilde{z} \geq \tilde{z}_0 versus H_1: \tilde{z} < \tilde{z}_0.

Since portfolio losses are represented by negative numbers, if the theoretical TR
predicted by the risk model is more negative (i.e., larger tail risk) than the empirical
TR, then the risk model is said to "capture" the tail risk. On the other hand, if the
alternative hypothesis is accepted, then the risk model is described as failing to
capture the tail risk of VaR.
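The null value quoted above is easy to reproduce. The following sketch of mine computes it under the standard normal null:

```python
from statistics import NormalDist

nd = NormalDist()
alpha = 0.01
q = nd.inv_cdf(alpha)        # VaR, about -2.3264
es = -nd.pdf(q) / alpha      # ES, about -2.6652
z0 = es - q                  # the two-tailed null value, about -0.3389
print(round(q, 4), round(es, 4), round(z0, 4))
```

An observed \tilde{z} markedly below z0 then signals tail losses in excess of what the normal risk model predicts.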
4. Monte Carlo simulations
In this section, Monte Carlo simulation experiments are carried out to investigate
the empirical test sizes and the powers of the proposed backtests when  is set at
0.01 for market risk measurement. To carry out the simulation study, IID samples of
various sizes are generated and the corresponding p-values of the observed
standardized statistic given by (12) are calculated using the Lugannani and Rice
formula. For each sample size, rejections are made if the calculated p-values are less
than the specified significance levels ranging from 0.5% to 5%. The observed
empirical sizes are then obtained by dividing the numbers of rejections by the total
numbers of simulations, which are 100,000. For each empirical test size, likelihood
ratio test of the empirical failure rate is applied.
4.1 Empirical test sizes
When the IID samples are generated from standard normal distribution, the
Monte Carlo rejection rates give us the empirical test sizes, which are reported in first
two panels of Table I below. The top panel provides the results for the right tail of the
distribution. Since the Lugannani and Rice formula is only valid for x  0 , the CDF
is set at the maximum value of 1 when the sample mean equals to zero. For ease of
reference, the first column of Panel A lists the theoretical probabilities when x  0 .
For example, when T = 250, the corresponding probability is 0.99 250  0.0811.
Accuracy of empirical test sizes are tested along the same line as the Kupiec’s
likelihood ratio test. Notation * indicates significance at 1% level. The last four
-0.3389. This result can also be obtained by evaluating (7).
13
columns of Panel A reveals that as long as the theoretical test size is smaller than the
corresponding probability value when x  0 , the empirical test sizes are very close to
the theoretical ones.
<Insert Table I: Empirical test sizes>
The right tail of the distribution can be used to check whether an observed TR is
over-predicted by a risk model. However, in terms of the regulatory capital
requirement for buffering huge losses, one needs to examine the left tail of the
distribution.13 Panel B shows the empirical test sizes at the left tail of the distribution,
and the accuracy of the Lugannani and Rice formula is again demonstrated.
As a benchmark for comparison, the last panel of the table provides the empirical
test sizes of the LR test by Kupiec (1995) for unconditional coverage. Since the LR
test statistic attains the Chi-squared limiting distribution only when the sample size is
large, most of the empirical test sizes are quite far off when T is less than 1,000. This is
basically due to the fact that for market risk, \alpha is set at the low level of 0.01.
4.2 Empirical test powers
Student t distributions with 4, 6 and 8 degrees of freedom (df) are used for the
study of the powers of the Bernoulli, Kupiec and the proposed size-based backtests. It
is well known that the tail of Student t distribution behaves similarly to that of the
heavy-tailed case of a generalized Pareto distribution (GPD).14 Given a sample of
size T with b losses exceeding the magnitude of VaR, the p-value of the one-tailed
Bernoulli test is given by 1  P ( B  b) where B is distributed as Binomial (T ,  ) .
While the Kupiec’s LR test is a two-tailed test, the backtest of tail risk of VaR can be
either one- or two-tailed (see Section 3.2 above). Only rejection rates at 5%
significance level of above backtests are presented in Table II below. Panel A
tabulates the empirical test powers of the two-tailed Kupiec and the size-based
proposed backtests for various sample sizes ranging from 125 to 4,000 15 ; the
corresponding one-sided test powers are given in Panel B. As expected, both the
Kupiec and Bernoulli tests show some powers only when T is more than 1,000. For
13
For a two-sided test based on tail risk of VaR, both the right and left tails of the test statistic’s
distribution are required. This is in contrast to the LR tests of Kupiec (1995) and Berkowitz (2001)
where only right tail of the Chi-squared distribution is required for a two-tailed test.
14
Under the extreme value theorem, the maxima of random samples tend in limit to a GPD; see
Embrechts et al. (1997).
15
In the application of the proposed backtest of tail risk of VaR to real data in Section 5, the sample
size is 3,791.
14
the proposed backtest of tail risk of VaR, equivalent powers are achieved in the case
of df = 4 when sample sizes are as small as 125.
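The one-tailed Bernoulli test described above can be sketched directly from the binomial CDF (illustrative Python; `scipy.stats.binom` supplies the distribution):

```python
from scipy.stats import binom

def bernoulli_pvalue(b, T, alpha=0.01):
    """One-tailed Bernoulli backtest: p-value of observing b or more
    exceptions out of T days when the true exception rate is alpha."""
    return 1 - binom.cdf(b - 1, T, alpha)  # P(B >= b)

# Basle-style example: 250 trading days at the 99% VaR level.
p5 = bernoulli_pvalue(5, 250)   # 5 exceptions
p7 = bernoulli_pvalue(7, 250)   # 7 exceptions
print(p5, p7)
```

With 250 days, 5 exceptions are not significant at the 5% level while 7 exceptions are, which is consistent with the stars reported for the Bernoulli column in Table VI.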
<Insert Table II: Test powers>
5. Application to S&P 500 daily returns
In this section three widely studied risk models are used to forecast one-step
ahead S&P 500 daily returns from 1-Jan-1986 to 31-Dec-2000; see Figure 3 below.
3,000 observations prior to 1-Jan-1986 are first used to estimate the risk models and
then produce ten one-step-ahead VaR forecasts. The estimation window is rolled
forward ten trading days and a new set of model parameters are obtained, which will
in turn be used to forecast one-step ahead for the next ten VaRs. In this manner, the
process is continued until the last day of the sample, 31-Dec-2000, is reached. The
chosen risk models are capable of modeling the stylized dynamics of stock returns
such as the time varying conditional variances, skewed and fat-tailed distribution, and
jumps. The risk forecasts are backtested with the frequency-based Kupiec and
Bernoulli tests, as well as the proposed backtest based on tail risk of VaR. Two-sided
Kupiec and the proposed size-based VaR backtests are applied to the 15 years of VaR
forecasts. Along the same line as the Basle rules on the backtest for market risk capital,
one-sided Bernoulli and the proposed backtests are also carried out on yearly basis.
The results of backtests suggest there are substantial downside tail risks present in the
stock markets.
< Insert Figure 2: S&P 500 daily returns >
5.1 Risk models
Before considering the three VaR models in detail, the basic risk model is first
introduced here. Let $r_t$ be the portfolio return on day t and $\mu_t$ be the corresponding
time-varying daily expected return. Then the basic risk model is
$$r_t = \mu_t + \varepsilon_t = \mu + a\,r_{t-1} + \varepsilon_t,$$
$$\varepsilon_t = \varepsilon_{1,t} + \varepsilon_{2,t},$$
where the innovation process $\varepsilon_t$ has two components which are independent of each
other, namely the diffusion component $\varepsilon_{1,t}$ and the jump component $\varepsilon_{2,t}$. The
diffusion component can be further modeled by the time-varying conditional volatility
$\sigma_t$ and an IID innovation process $z_t$ as $\varepsilon_{1,t} = \sigma_t z_t$. For the jump component,
$\varepsilon_{2,t} = J_t - E[J_t \mid \Phi_{t-1}]$, where $J_t$ is a jump process, the details of which are described
below.
N-GARCH
The first risk model, GARCH with normal innovations (henceforth N-GARCH),
has been used by Berkowitz and O'Brien (2002) to forecast the profits and losses of the
trading accounts of six major commercial banks. Berkowitz and O'Brien find that the
standard GARCH model outperforms the banks' structural VaR models in that the
former achieves comparable risk coverage with less regulatory capital.16 For
N-GARCH, the innovation process $\varepsilon_t$ is fully described by the diffusion component
with the following further specification:
$$\sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2,$$
$$z_t \sim N(0, 1).$$
The advantage of N-GARCH is that it models effectively the time-varying conditional
variances that are present in financial asset returns.
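A minimal one-step N-GARCH VaR forecaster, assuming the AR(1)-GARCH(1,1) specification above; the parameter values below are illustrative placeholders, not the estimates reported in Table III:

```python
import math

def ngarch_var_forecast(r_t, sigma2_t, eps_t, params, z_alpha=-2.326):
    """One-step-ahead 99% downside VaR for the N-GARCH model.

    r_t, eps_t : today's return and innovation
    sigma2_t   : today's conditional variance
    params     : (mu, a, omega, alpha, beta) of
                 r_t = mu + a r_{t-1} + eps_t,
                 sigma^2_t = omega + alpha eps^2_{t-1} + beta sigma^2_{t-1}
    z_alpha    : left 1% quantile of N(0,1), about -2.326
    """
    mu, a, omega, alpha, beta = params
    mu_next = mu + a * r_t                                # conditional mean
    sigma2_next = omega + alpha * eps_t ** 2 + beta * sigma2_t
    return mu_next + z_alpha * math.sqrt(sigma2_next)     # downside VaR

# Illustrative values only (percent returns):
var_99 = ngarch_var_forecast(r_t=-1.2, sigma2_t=1.1, eps_t=-1.3,
                             params=(0.06, 0.04, 0.02, 0.09, 0.90))
print(var_99)
```

The negative output is the VaR for a long position; replacing `z_alpha` with +2.326 gives the short-position counterpart used later in Section 5.2.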
SKST-APARCH
Empirical findings show that the N-GARCH residuals still exhibit fatness in the
tails, and that financial asset returns are often skewed; see, for example, Mittnik et al.
(2000). For this reason, the Asymmetric Power ARCH model with skewed Student
innovations (hereafter SKST-APARCH) is employed by Giot and Laurent (2003) to
forecast VaRs for long and short trading positions.17 Other literature applying
SKST-APARCH to risk modeling includes Giot and Laurent (2004), Giot (2005) and
Tu and Wong (2005). Similar to N-GARCH, only the diffusion component is required,
and its specification is provided below.

16 Berkowitz and O'Brien (2002) attribute the poor performance of the banks' structural VaR models to
(a) the approximations used to reduce computational burdens and overcome estimation difficulties,
since large banks' trading portfolios are large and complex; (b) time variation in the variances and
covariances of market risks not being properly captured; and (c) conservative modeling practices and
regulatory constraints (prior to the revised Capital Accord II), such as the simple summation of
subportfolio VaRs to produce the bank-level global VaR.
17 Another candidate model that can fit fat tails and skewness well is the Normal Inverse Gaussian
distribution; see, for example, Eberlein et al. (1998).
 t     (  t 1   n t 1 )   t1 ,
z t ~ skst (0,1, ,  ) .
The IID innovation z t has standardized (zero mean and unit variance) skewed
Student distribution with PDF
m
 2s
    1 g[ ( sz  m)  ] if z   s ,
f ( z  , )  
2s
m

g[ 1 ( sz  m)  ] if z   ,
1
  
s
where g ( ) is the standardized Student density. While  is the degree of freedom
parameter that describes fatness at the tails,  is the skewed parameter which is less
(greater) than 1 if the distribution is skewed to the left (right). The other two
parameters, m and s2 are respectively the mean and variance of the otherwise
non-standardized skewed Student:
m
((  1) / 2)   2
 ( / 2)
(   1 ) ,
s 2  ( 2   2  1)  m 2 .
GARJI
Finally, the jump-diffusion model (GARJI) of Maheu and McCurdy (2004) is the
last model considered in this paper. The GARJI model has been found by Chiu et al.
(2005) to be successful in capturing the jump dynamics of stock markets and producing
accurate VaR forecasts. Maheu and McCurdy use GARJI to model the conditional
variance of returns as a combination of jump and smoothly changing components.
Large fluctuations in price due to the impact of news, as well as smoother changes in
price resulting from normal trading, can be captured effectively by this model. In
particular, the smoother diffusion component is $\varepsilon_{1,t} = \sigma_t z_t$ where $z_t \sim N(0,1)$ and the
time-varying variance is given by
$$\sigma_t^2 = \omega + g(\Lambda, \Phi_{t-1})\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2,$$
where $\Lambda = (\alpha, \alpha_j, \alpha_n, \alpha_{nj})$ is the parameter vector. The coefficient function
$g(\Lambda, \Phi_{t-1})$ enables both jump and normal innovations to feed back asymmetrically
into the conditional variance, and is thus able to produce better volatility forecasts,
especially after large moves such as the October 1987 crash. Specifically, it is written
as
$$g(\Lambda, \Phi_{t-1}) = \exp\big\{\alpha + \alpha_j E[n_{t-1} \mid \Phi_{t-1}] + I(\varepsilon_{t-1})\big(\alpha_n + \alpha_{nj} E[n_{t-1} \mid \Phi_{t-1}]\big)\big\},$$
where $n_{t-1}$ is a Poisson process counting the number of jumps at time $t-1$ and
$I(\varepsilon_{t-1})$ equals one if $\varepsilon_{t-1} < 0$, zero otherwise. The Poisson process $n_t$ comes from
the second jump component with time-varying conditional jump intensity $\lambda_t$. That is,
$$P(n_t = j \mid \Phi_{t-1}) = \frac{\exp(-\lambda_t)\,\lambda_t^{\,j}}{j!},$$
where
$$\lambda_t = v + \rho\,\lambda_{t-1} + \gamma\,\xi_{t-1},$$
and18
$$\xi_{t-1} = E[n_{t-1} \mid \Phi_{t-1}] - \lambda_{t-1} = \sum_{j=0}^{\infty} j\,P(n_{t-1} = j \mid \Phi_{t-1}) - \lambda_{t-1}.$$
Here $\xi_{t-1}$ is the ex post measured shock, or jump intensity residual, relative to $\lambda_{t-1}$.
Next, the jump process is defined as the sum of IID normal jump sizes, that is,
$J_t = \sum_{k=1}^{n_t} Y_{t,k}$, where the jump size
$$Y_{t,k} \sim \text{IID } N(\theta, V).$$
So the jump component mentioned above is obtained as
$$\varepsilon_{2,t} = J_t - E[J_t \mid \Phi_{t-1}] = \sum_{k=1}^{n_t} Y_{t,k} - \theta\,\lambda_t.$$
Readers are referred to Chan and Maheu (2002) for a more flexible specification of
the jump dynamics. Finally, it is remarked that the conditional variance of GARJI
consists of the smoother diffusion component as well as the jump component:
$$\text{var}(r_t \mid \Phi_{t-1}) = \text{var}(\varepsilon_{1,t} \mid \Phi_{t-1}) + \text{var}(\varepsilon_{2,t} \mid \Phi_{t-1}) = \sigma_t^2 + (\theta^2 + V)\,\lambda_t.$$
All three risk models are estimated using the maximum likelihood method, and Table III
reports the relevant parameter estimates.
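The jump-intensity recursion and the total conditional variance of GARJI can be sketched as follows; the parameter values are illustrative placeholders, not the Table III estimates:

```python
def garji_intensity_step(lam_prev, n_expected_prev, params):
    """One step of the GARJI conditional jump intensity:
    lambda_t = v + rho * lambda_{t-1} + gamma * xi_{t-1},
    where xi_{t-1} = E[n_{t-1} | Phi_{t-1}] - lambda_{t-1}
    is the jump intensity residual."""
    v, rho, gamma = params
    xi_prev = n_expected_prev - lam_prev   # intensity residual
    return v + rho * lam_prev + gamma * xi_prev

def garji_conditional_variance(sigma2_t, lam_t, theta, V):
    """var(r_t | Phi_{t-1}) = sigma_t^2 + (theta^2 + V) * lambda_t,
    i.e. diffusion variance plus jump-component variance."""
    return sigma2_t + (theta ** 2 + V) * lam_t

# Illustrative values only:
lam = garji_intensity_step(lam_prev=0.10, n_expected_prev=0.14,
                           params=(0.02, 0.95, 0.60))
total_var = garji_conditional_variance(sigma2_t=1.0, lam_t=lam,
                                       theta=-0.05, V=0.74)
print(lam, total_var)
```

A persistent intensity (rho near one) lets clusters of jumps raise both the expected jump count and, through the (theta^2 + V) term, the total conditional variance.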
<Insert Table III: Risk models>
5.2 Calculation of VaR
Both the normal and Student distributions belong to the class of elliptical
18
Similar to Maheu and McCurdy (2004), the infinite summation is truncated at j = 20 in the
estimation of the GARJI model.
18
distributions and the corresponding VaR can be expressed as VaR    t  Z   t ,
where Z is the left -quantile. Above VaR expression is for the downside risk of
holding a long portfolio. For the upside risk of a short position, the corresponding
VaR is VaR 1   t  Z1  t . For standard normal distribution, Z  Z1 
 2.326 when  is set at 1% for market risk measurement. For the standardized
skewed Student distribution,19
skst , ,  m
Z   skst , , 
s
,
(13)
where if st , is the quantile function of the unit variance Student density at %,

skst , ,



 1 st  , 2 (1   2 )

2
1
   st  , 2 (1   )

if   (1   2 ) 1 ,
if   (1   2 ) 1 .
(14)
There is no simple formula for the VaR of the jump-diffusion model, GARJI.
Nevertheless, we can make use of the CDF function20
P(rt  R |  t 1 )

   f (rt | nt  j ,  t 1 )drt  P(nt  j |  t 1 )
j 0
q

(15)

 R      j  
t
t
  P(n  j |  ),
  
t
t 1
2


j 0
 t  jV


so that the VaRs for the long and short positions can be obtained respectively from the
facts that P(rt  VaR  |  t 1 )   and P(rt  VaR 1 |  t 1 )  1   .
< Insert Table IV: VaR basic statistics >
Table IV provides some basic statistics of the VaRs forecasted by the three risk
models. For the in-sample forecast VaR, it can be seen that both GARJI and
SKST-APARCH have the larger downside VaRs, which is consistent with the
skewness present in the S&P 500 daily returns. Another interesting observation is that
N-GARCH has the largest maximum VaRs on both tails of the distribution, whereas
the sum of its average downside and upside VaRs is only slightly smaller than that of
GARJI. For the out-of-sample forecast VaRs, it is noted that the difference from its
in-sample counterparts is quite large for GARJI, an indication of possible model risk
due to large number of parameters to be estimated in GARJI.
Z 1 .
19
Note that (13) and (14) are also applicable to
20
As in the estimation of GARJI, the infinite summation is truncated at j = 20.
19
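A sketch of the skewed Student quantile in (13)-(14), assuming the Lambert-Laurent parameterization used by Giot and Laurent (2003); `scipy.stats.t` supplies the Student quantile, which is rescaled to unit variance:

```python
from math import gamma, sqrt, pi
from scipy.stats import t as student_t

def skst_quantile(alpha, xi, nu):
    """Left alpha-quantile of the standardized (zero-mean, unit-variance)
    skewed Student distribution, following (13)-(14)."""
    # mean and variance of the non-standardized skewed Student
    m = gamma((nu - 1) / 2) * sqrt(nu - 2) / (sqrt(pi) * gamma(nu / 2)) * (xi - 1 / xi)
    s = sqrt(xi ** 2 + 1 / xi ** 2 - 1 - m ** 2)

    def st_q(p):  # quantile of the unit-variance Student density
        return student_t.ppf(p, nu) * sqrt((nu - 2) / nu)

    if alpha < 1 / (1 + xi ** 2):
        q = (1 / xi) * st_q(alpha / 2 * (1 + xi ** 2))
    else:
        q = -xi * st_q((1 - alpha) / 2 * (1 + xi ** -2))
    return (q - m) / s

# xi < 1 skews the density to the left, deepening the downside quantile:
q_skew = skst_quantile(0.01, xi=0.9, nu=6)   # left-skewed case
q_sym = skst_quantile(0.01, xi=1.0, nu=6)    # reduces to symmetric Student
print(q_skew, q_sym)
```

At xi = 1 the formula collapses to the symmetric unit-variance Student quantile (about -2.57 at the 1% level for 6 df), while xi < 1 produces a more negative downside quantile, as the skewness discussion above implies.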
5.3 Backtests
While qualitative and graphical analyses such as those by Diebold et al. (1998)
reveal valuable information for diagnosing how a risk model may fail, it is also
important to consider a formal statistical backtest that offers a more objective analysis.
To carry out the size-based TR backtest, the skewed Student distribution must be
transformed into a standard normal one using $\Phi^{-1}\big(F_{skst}(z \mid \xi, \upsilon)\big)$, where the
standardized skewed Student CDF is
$$F_{skst}(z \mid \xi, \upsilon) = \begin{cases} \dfrac{2}{1 + \xi^2}\,G[\xi(sz + m) \mid \upsilon] & \text{if } z < -\dfrac{m}{s}, \\[1mm] 1 - \dfrac{2}{1 + \xi^{-2}}\,\big(1 - G[\xi^{-1}(sz + m) \mid \upsilon]\big) & \text{if } z \ge -\dfrac{m}{s}. \end{cases} \qquad (16)$$
In (16) above, $G(\cdot \mid \upsilon)$ is the unit-variance Student CDF. For the jump-diffusion
process, one simply obtains the required normal deviates by applying the standard
normal inverse function to the probability value from (15) for each observed $r_t$.
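The probability integral transform described above can be sketched as follows, assuming (16) with the same Lambert-Laurent parameterization as before (illustrative, not the paper's code):

```python
from math import gamma, sqrt, pi
from scipy.stats import t as student_t, norm

def skst_cdf(z, xi, nu):
    """CDF (16) of the standardized skewed Student distribution."""
    # mean and variance of the non-standardized skewed Student
    m = gamma((nu - 1) / 2) * sqrt(nu - 2) / (sqrt(pi) * gamma(nu / 2)) * (xi - 1 / xi)
    s = sqrt(xi ** 2 + 1 / xi ** 2 - 1 - m ** 2)

    def G(x):  # unit-variance Student CDF
        return student_t.cdf(x * sqrt(nu / (nu - 2)), nu)

    if z < -m / s:
        return 2 / (1 + xi ** 2) * G(xi * (s * z + m))
    return 1 - 2 / (1 + xi ** -2) * (1 - G((s * z + m) / xi))

# Transform a skewed Student deviate into a standard normal deviate:
z_normal = norm.ppf(skst_cdf(-1.5, xi=0.9, nu=6))
print(z_normal)
```

The two branches meet continuously at z = -m/s, and applying the standard normal inverse to the resulting probabilities yields the normal deviates required by the TR backtest.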
<Insert Table V: Backtests of full sample>
Table V reports both frequency- and size-based two-tailed backtests for the full
forecast period, a total of 3,791 observations. First consider the downside risk of
holding a long position in the S&P 500. For the frequency-based backtest, one finds that
the VaRs of N-GARCH fail the Kupiec test, but the SKST-APARCH and GARJI
models are able to produce VaRs that provide correct coverage of the S&P 500 daily
fluctuations. In the case of downside tail risk, however, it is found that all risk models
fail the proposed backtest, except for the in-sample SKST-APARCH case. In
particular, since a significantly negative $\tilde{z}$ statistic means that the predicted TR is too
small when compared with the empirical TR, one can conclude that the risk models
fail to capture the downside tail risk of VaR in the S&P 500. Also, the fact that the
in-sample risk forecasts of SKST-APARCH pass the proposed backtest is consistent
with the estimation results in Table III, where the model achieves the largest likelihood
value. When one holds a short position in the S&P 500, he or she faces upside risk. In
this case, a large positive (negative) $\tilde{z}$ value means that the empirical TR is
significantly larger (smaller) than the TR predicted by the risk model. Though the
N-GARCH risk model passes all upside backtests, its downside risk forecasts fail the
TR backtest miserably, with large negative $\tilde{z}$ values.21 Both SKST-APARCH and
GARJI are found to provide excessive coverage of the upside tail risk, an indication
of failure in modeling the skewness of the distribution. The exception lies in GARJI,
whose in-sample TR forecast fails to capture the empirical tail risk. To sum up the
above analysis, substantial skewness and downside tail risks of VaR are found to be
present in S&P 500 daily returns, and these are difficult to model and forecast.
<Insert Table VI: Backtests of tail risk-year on year>
Next, I apply the proposed TR backtest to the S&P 500 daily returns along the lines
of the Basle rules, so that the test is one-sided and is carried out on a year-by-year basis.
It is remarked here that for a year's daily returns, a sample size of around 250 is too
small for a two-sided test, since in this case the probability of zero exceptions is 0.081,
larger than the significance level of 0.05. For this reason, Table VI reports only the
one-tailed Bernoulli and the proposed size-based VaR backtests. While the Bernoulli test
rejects N-GARCH for 3 out of the 15 years, the proposed backtest detects significant tail
risks unpredicted by the model in 10 years of the full sample period. Basically, the
years of substantial tail risk took place during the few years surrounding the October
1987 crash and from 1996 to 2000, when the stock market circuit breaker was
triggered for the first time in 1997 and LTCM nearly collapsed in the following
year.22 While it is reassuring that both SKST-APARCH and GARJI pass the backtest
of tail risk of VaR during the second period, these two risk models nevertheless fail to
capture the tail risk of VaR in 1987 and the year before. Indeed, the outcome remains
the same for the crash year even if the in-sample forecasts are used, illustrating the
challenge one faces in modeling a market fall of over twenty percent. It is
worthwhile to note that there was no circuit breaker mechanism in 1987, when the
stock market experienced its largest fall in a single day of trading. Questions therefore
naturally arise: would such trading halt rules prevent the 1987 crash from ever
happening again, and is the above finding that the post-1996 stock market tail risks can
be captured by the risk models attributable to the circuit breakers? Answering such
questions, however, is beyond the scope of this paper, but it certainly merits further
research.

21 Note that for a large sample, the TR test statistic is close to normal under the null hypothesis. So
-18.91 and -20.41 are very large negative numbers, as $\tilde{z}$ is in this case close to standard normal.
22 Prior to LTCM's downfall, we witnessed in 1998 the devaluation of the Russian ruble followed
by the Russian debt default.
Tail risk and subadditivity
From 2006 onwards, the new Basle Accord requires major banks to
incorporate operational risk in an integrated risk framework. Since operational losses
are infrequent but large in magnitude, the loss distribution is unlikely to be elliptical,
and it is thus important for risk managers to backtest the tail risk of VaR.
Since the S&P 500 daily returns in 1987 are found to have substantial downside tail
risk, the subadditivity of VaR is explored here. In particular, consider the 1987 daily
returns of the S&P 500, the Nikkei 225 and an equal-weighted portfolio of the two stock
indexes.23 For the sake of illustration, the exchange rate risk is assumed to be
perfectly hedged away, so that all three sets of returns are directly comparable. Using
order statistics, the 99% VaRs in that year are calculated as -6.57, -4.65 and -11.74
percent respectively. Since the sum of the component VaRs, -11.22, is less negative
than the portfolio VaR (the reader is reminded that a loss is represented by a negative
number), the subadditivity of VaR breaks down.

As can be seen above, all three risk models fail the backtest of tail risk of VaR in
1987. As non-subadditivity is often caused by substantial tail risk or unusually large
losses, the proposed backtest can in practice offer a preliminary check on whether the
VaR of a given sample fails the subadditive property. Finally, it is remarked that
the corresponding ES values are also calculated and are found to be subadditive.24
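The order-statistics VaR used in the illustration above can be sketched as follows, on synthetic heavy-tailed data rather than the actual 1987 index returns; the order-statistic convention chosen here is one simple variant:

```python
import numpy as np

def var_99_order_stat(returns):
    """99% VaR from order statistics: the 1% empirical quantile
    (a negative number represents a loss)."""
    x = np.sort(np.asarray(returns, dtype=float))
    k = max(int(np.ceil(0.01 * len(x))) - 1, 0)   # index of 1% order statistic
    return x[k]

rng = np.random.default_rng(0)
# Two heavy-tailed return series and their combined portfolio
a = rng.standard_t(df=3, size=250)
b = rng.standard_t(df=3, size=250)
port = a + b

var_a, var_b, var_p = (var_99_order_stat(s) for s in (a, b, port))
# Subadditivity holds when VaR(a + b) >= VaR(a) + VaR(b)
# (all quantities negative); with heavy tails it can break down.
print(var_a, var_b, var_p)
```

Comparing `var_p` with `var_a + var_b` reproduces the check in the text: whenever the portfolio VaR is more negative than the sum of the component VaRs, subadditivity fails for that sample.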
6. Conclusion
Despite the fact that VaR has certain undesirable theoretical properties, I briefly
outline in this paper the reasons for its adoption by the Basle Committee for market
risk capital determination and its subsequent establishment as the standard risk
measure in the financial industry. I then propose a simple backtest of VaR
based on the sizes of tail losses. The associated test statistic TR has good economic
intuition, is easy to calculate, and allows the backtest to be formulated as either a one-
or two-tailed test. The saddlepoint, or small sample asymptotic, technique is used to obtain
the p-value of the test statistic. Since it makes use of the information conveyed by the sizes of
exceptions, the proposed backtest has accurate test size and is more powerful than
frequency-based backtests even for small samples. This is important because the
regulatory backtest is required to be carried out on the most recent 250 observations,
and samples available for backtesting are often quite small in practice.

23 It is assumed that the exchange rate risk is perfectly hedged away.
24 The ES of the S&P 500, Nikkei 225 and the equal-weighted portfolio are -15.77, -10.60 and -19.16
respectively.
The proposed backtest of tail risk of VaR is applied to the S&P 500 daily returns
from 1986 to 2000. Substantial downside tail risk is found to be present in the stock
market returns, and risk models that take into account jumps, skewed and fat-tailed
distributions fail to capture the downside tail risks, especially in 1987. In particular,
the VaRs of the S&P 500 and Nikkei 225 in the crash year fail to satisfy the subadditive
property. Such empirical evidence of non-subadditivity of VaR has important
implications for major banks. From 2006 onwards, the new Capital Accord requires
major banks to incorporate operational risk in an integrated framework. As
operational risk is characterized by large and infrequent losses, the tail risk of VaR is
likely to be substantial and the associated problem of non-subadditivity cannot be
ignored. Thus, in measuring a bank's integrated risk, it is important to formally
backtest the tail risk of VaR, so that a risk manager has more information
about the nature of the risk (reflected by the tails of the underlying distribution) his or her
bank is facing.

In this paper, the saddlepoint technique for backtesting VaR based on tail losses
is derived under the standard normal null hypothesis. However, it must be emphasized
that the small sample asymptotic technique is also applicable to non-normal
distributions. An alternative approach considered in this paper is to transform
the non-normal distribution into a normal one. In this case, even for VaR obtained
using a non-parametric method such as order statistics, the proposed backtest may still
be applied by transforming the sample distribution into a normal one.25 Given the
potential advantage of backtesting VaR based on the sizes of tail losses, such effort is
certainly worthwhile.
Finally, VaR is now ubiquitous in risk management and its wide usage is difficult
to change. Therefore, it is useful to backtest the tail risk of VaR in a manner that is
complementary to the backtesting procedure stipulated by the Capital Accord.
Specifically, the Basle rules count the number of exceptions, whereas the proposed
backtest sums the sizes of tail losses. Such a complementary role not only provides
valuable information for more powerful backtesting, but also presents a formal
statistical analysis of the nature of the tail risk of VaR and facilitates appropriate
supervisory action, if any.

25 There are many ways to carry out such a transformation. For example, the sample distribution can first
be estimated using kernel smoothing, and then transformed into a normal distribution.
References
Artzner, Philippe, Freddy Delbaen, Jean M. Eber, and David Heath, 1997, Thinking
coherently, Risk 10(11), 68 – 71.
Artzner, Philippe, Freddy Delbaen, Jean M. Eber, and David Heath, 1999, Coherent
measures of risk, Mathematical Finance 9(3), 203 – 228.
Basle Committee on Banking Supervision, 1996a, Amendment to the Capital Accord
to Incorporate Market Risk.
Basle Committee on Banking Supervision, 1996b, Supervisory Framework for the use
of “Backtesting” in conjunction with the Internal Models Approach to Market
Risk Capital Requirements.
Basle Committee on Banking Supervision, 2004, International convergence of capital
measurement and capital standards: a revised framework.
Basak, Suleyman, and Alexander Shapiro, 2001, Value-at-Risk based risk
management: Optimal policies and asset prices, Review of Financial Studies 14,
371 – 405.
Berkowitz, Jeremy, 2001, Testing density forecasts, with applications to risk
management, Journal of Business and Economic Statistics 19, 465 – 474.
Berkowitz, Jeremy, and James O’Brien, 2002, How accurate are Value-at-Risk models
at commercial banks? Journal of Finance 57, 1093 – 1111.
Chan, W. H. and J. M. Maheu, 2002, Conditional jump dynamics in stock market
returns, Journal of Business & Economic Statistics 20, 377 – 389.
Chiu, Chien L., Lee, Ming C., and Jui C. Hung, 2005, Estimation of Value-at-Risk
under jump dynamics and asymmetric information, Applied Financial
Economics 15, 1095 – 1106.
Christoffersen, Peter F., 1998, Evaluating interval forecasts, International Economic
Review 39, 841 – 862.
Christoffersen, Peter F., and Francis X. Diebold, 2000, How relevant is volatility
forecasting for financial risk management, Review of Economics and Statistics
82, 12 – 22.
Christoffersen, Peter F., Jinyong Hahn, and Atsushi Inoue, 2001, Testing and
comparing Value-at-Risk measures, Journal of Empirical Finance 8, 325 – 342.
Daniels, H. E., 1954, Saddlepoint Approximations in Statistics, Annals of
Mathematical Statistics 25, 631 – 650.
Daniels, H. E., 1987, Tail Probability Approximations, International Statistical
Review 55, 37 – 48.
Danielsson, Jon, Bjorn N. Jorgensen, Mandira Sarma, and Casper G. De Vries, 2006,
Comparing downside risk measures for heavy tailed distributions, Economics
Letters 92, 202 – 208.
Diebold, F. X., T. A. Gunther, and A. S. Tay, 1998, Evaluating density forecasts,
International Economic Review 39, 863 – 883.
Dowd, Kevin, 2001, Estimating VaR with order statistics, Journal of Derivatives 8(3),
23 – 31.
Dowd, Kevin, 2006, Using Order Statistics to Estimate Confidence Intervals for
Probabilistic Risk Measures, Journal of Derivatives, winter 2006.
Eberlein, Ernst, Ulrich Keller, and Karsten Prause, 1998, New insights into smile,
mispricing and value at risk: the hyperbolic model, Journal of Business 71, 371 –
406.
Embrechts, Paul, Claudia Kluppelberg, and Thomas Mikosch, 1997, Modelling
Extremal Events, Springer, Berlin Heidelberg.
Embrechts, Paul, Alexander McNeil, and Daniel Straumann, 1999, Correlation and
dependence in risk management: Properties and pitfalls, Risk 12, 69 – 71.
Giot, Pierre, 2005, Implied Volatilities Indexes and Daily Value at Risk Models.
Journal of Derivatives 12(4), 54 – 64.
Giot, Pierre, and Sebastien Laurent, 2003, Value-at-Risk for long and short trading
positions, Journal of Applied Econometrics 18, 641 – 664.
Giot, Pierre, and Sebastien Laurent, 2004. Modelling daily Value-at-Risk using
realized volatility and ARCH type models. Journal of Empirical Finance 11,
379 – 398.
Jorion, Philippe, 2001, Value at Risk, 2nd Edition, McGraw Hill, New York.
Kerkhof, Jeroen and Bertrand Melenberg, 2004, Backtesting for risk-based regulatory
capital, Journal of Banking and Finance 28, 1845 – 1865.
Kupiec, Paul H., 1995, Techniques for verifying the accuracy of risk measurement
models, Journal of Derivatives 3, 73 – 84.
Leippold, Markus, 2004, Don’t rely on VaR, Euromoney, November, London.
Lotz, Christopher, and Gerhard Stahl, 2005, Why value at risk holds its own? In:
Euromoney, February 2005, London.
Lugannani, R., and S. O. Rice, 1980, Saddlepoint approximation for the distribution
of the sum of independent random variables, Advanced Applied Probability 12,
475 – 490.
Maheu, John M., and Thomas H. McCurdy, 2004, News Arrival, Jump Dynamics, and
Volatility Components for Individual Stock Returns, The Journal of
Finance 59 (2), 755 – 793.
Mittnik, S., Paolella, M.S., Rachev, S., 2000. Diagnosing and treating the fat tails in
financial returns data. Journal of Empirical Finance 7, 389 – 416.
Rosenberg, Joshua V., and Til Schuermann, 2006, A general approach to integrated
risk management with skewed, fat-tailed risks, Journal of Financial Economics
79, 569 – 614.
Szegö, Giorgio, 2002, Measures of risk, Journal of Banking and Finance 26, 1253 –
1272.
Tu, Anthony H., and Woon K. Wong, 2005, Value-at-Risk for Long and Short
Positions of Asian Stock Markets. 2005 International Conference on Business
and Finance Proceedings, Tamkang University, Taiwan, R.O.C.
Watson, G. N., 1948, Theory of Bessel Functions, Cambridge University Press.
Yamai, Yasuhiro, and Toshinao Yoshiba, 2005, Value-at-risk versus expected shortfall:
A practical perspective, Journal of Banking and Finance 29, 997 – 1015.
Table I: Empirical test sizes of the backtest of VaR based on tail losses
100,000 simulations of various sample sizes are generated to study the empirical test sizes of
the backtest of VaR based on tail losses and of Kupiec's (1995) likelihood ratio backtest of
failure rate. The former can be used for both one- and two-tailed tests, and the first two panels
below list the empirical test sizes on the two tails of the distribution. The latter can only be
used for a two-tailed test, and Panel C lists the corresponding empirical test sizes. All empirical
test sizes are tested using Kupiec's (1995) likelihood ratio test. The notation * indicates
significance at the 1% level.

Panel A: Right-tail empirical test sizes of the backtest of tail risk of VaR
                          Theoretical test size (%)
N      P(x=0) (%)    0.5       1.0       2.5       5.0
125    28.47         28.59*    28.59*    28.59*    28.59*
250    8.11          8.28*     8.28*     8.28*     8.28*
500    0.66          0.67*     1.08*     2.60      5.16
750    0.05          0.54      1.04      2.58      5.07
1000   0.00          0.50      1.02      2.47      4.98
2000   0.00          0.51      1.01      2.47      5.00
4000   0.00          0.47      0.97      2.51      4.99

Panel B: Left-tail empirical test sizes of the backtest of tail risk of VaR
N                    0.5       1.0       2.5       5.0
125                  0.48      0.99      2.44      4.98
250                  0.53      1.01      2.48      5.03
500                  0.52      1.02      2.53      5.05
750                  0.49      1.00      2.56      5.05
1000                 0.51      1.02      2.46      4.92
2000                 0.50      1.01      2.55      4.94
4000                 0.47      0.95      2.44      4.91

Panel C: Kupiec's empirical test sizes
N                    0.5       1.0       2.5       5.0
125                  0.17*     0.17*     0.86*     3.83*
250                  0.10*     0.43*     9.61*     9.61*
500                  0.88*     1.19*     1.99*     7.20*
750                  0.66*     0.89*     2.95*     4.07*
1000                 0.42*     1.32*     2.52      5.54*
2000                 0.46      0.97      2.35*     4.19*
4000                 0.54      0.93      2.33*     5.67*
Table II: Test powers
Powers of various backtests under the standard normal null hypothesis are investigated based
on 10,000 simulations of Student t random deviates with 4, 6 and 8 degrees of freedom (df).
TR, Kupiec and Bern denote respectively the proposed backtest of tail risk of VaR, the Kupiec
likelihood ratio test of failure rate and the Bernoulli backtest. Panel A reports the empirical
test powers of the two-tailed TR and Kupiec backtests, whereas Panel B presents those of the
one-tailed TR and Bernoulli backtests.

Panel A: Two-tailed tests
        df = 4            df = 6            df = 8
N       TR      Kupiec    TR      Kupiec    TR      Kupiec
125     0.363   0.125     0.260   0.114     0.196   0.098
250     0.531   0.111     0.390   0.104     0.287   0.093
500     0.746   0.228     0.581   0.203     0.436   0.164
750     0.866   0.250     0.715   0.215     0.555   0.164
1000    0.930   0.345     0.805   0.301     0.650   0.235
2000    0.995   0.533     0.966   0.461     0.873   0.348
4000    1.000   0.851     0.999   0.782     0.988   0.639

Panel B: One-tailed tests
        df = 4            df = 6            df = 8
N       TR      Bern      TR      Bern      TR      Bern
125     0.423   0.125     0.325   0.114     0.257   0.098
250     0.599   0.180     0.463   0.161     0.369   0.134
500     0.796   0.223     0.651   0.197     0.521   0.156
750     0.898   0.347     0.778   0.306     0.638   0.244
1000    0.950   0.443     0.856   0.389     0.723   0.312
2000    0.997   0.610     0.978   0.538     0.917   0.420
4000    1.000   0.881     1.000   0.822     0.993   0.688
Table III: Risk models
Three risk models, namely N-GARCH, SKST-APARCH and GARJI, are estimated using
3,791 S&P 500 daily returns from 1986 to 2000. SKST-APARCH is found to have the largest
likelihood. Q2(20) is the Ljung-Box portmanteau test for serial correlation in the squared
standardized residuals with 20 lags. p-values of the Ljung-Box tests are shown in brackets.

             N-GARCH            SKST-APARCH        GARJI
Parameter    Coeff.   t-stat    Coeff.   t-stat    Coeff.   t-stat
mu           0.064    4.29      0.042    3.15      0.035    2.74
a            0.040    2.35      0.015    0.95      0.029    1.70
omega        0.016    1.23      0.012    2.59      0.000    0.57
alpha        0.093    1.86      0.063    5.96      -0.383   -9.54
beta         0.897    16.79     0.940    77.5      0.977    218.7
alpha_n                         0.595    5.80      0.003    0.09
delta                           1.018    7.98
upsilon                         5.569    9.81
ln(xi)                          -0.054   -2.59
alpha_j                                            -0.319   -1.29
alpha_nj                                           0.300    1.22
v                                                  0.017    4.13
rho                                                0.960    127.8
gamma                                              0.639    6.87
theta                                              -0.667   -7.47
V                                                  0.743    8.27

Likelihood   -5014.7            -4783.5            -4797.9
Q2(20)       9.12 [0.982]       11.88 [0.920]      7.34 [0.995]
Table IV: VaR basic statistics
Basic statistics are reported for the in-sample and out-of-sample VaR forecasts of the three
risk models in Section 5. Max refers to the largest VaR in magnitude. A large difference
between the in-sample and out-of-sample VaR forecasts of GARJI is observed.

Panel A: In-sample VaR statistics
              Downside risk                       Upside risk
         N-GARCH  SKST-APARCH  GARJI         N-GARCH  SKST-APARCH  GARJI
Average  -2.20    -2.48        -2.53         2.33     2.39         2.02
Max      -17.76   -12.97       -10.35        17.23    11.87        8.33

Panel B: Out-of-sample VaR statistics
              Downside risk                       Upside risk
         N-GARCH  SKST-APARCH  GARJI         N-GARCH  SKST-APARCH  GARJI
Average  -2.19    -2.40        -2.79         2.30     2.43         2.64
Max      -16.58   -15.35       -47.59        15.32    14.36        42.46
Table V: Backtests of full sample
Two-tailed backtests are carried out on the full-sample VaR forecasts from 1986 to 2000, a
total of 3,791 observations. Panels A and B report respectively the results of the backtests for
the in-sample and out-of-sample VaR forecasts. The empirical failure rate and the TR statistic
are reported for the Kupiec backtest and the proposed backtest of tail risk of VaR respectively.
The corresponding p-values are shown in brackets. The theoretical value of TR is -0.33887
(0.33887) for the downside (upside) risk.

Panel A: In-sample
                       Downside risk                        Upside risk
             N-GARCH    SKST-APARCH  GARJI        N-GARCH    SKST-APARCH  GARJI
Kupiec (%)   1.77       0.98         0.87         0.82       0.55         1.50
             [< 0.001]  [0.881]      [0.412]      [0.244]    [0.003]      [0.004]
TR           -1.748     -0.443       -0.564       0.343      0.132        0.576
             [< 0.001]  [0.175]      [0.003]      [0.919]    [< 0.001]    [0.005]

Panel B: Out-of-sample
                       Downside risk                        Upside risk
             N-GARCH    SKST-APARCH  GARJI        N-GARCH    SKST-APARCH  GARJI
Kupiec (%)   1.71       0.87         0.95         1.11       0.63         0.66
             [< 0.001]  [0.412]      [0.753]      [0.512]    [0.015]      [0.025]
TR           -1.860     -0.642       -0.545       0.431      0.145        0.180
             [< 0.001]  [< 0.001]    [0.012]      [0.223]    [0.002]      [0.018]

< Table VI: Out-of-sample backtests appears on the last page >
Figure 1: Confidence bounds of TR
The figure below depicts the 99% confidence bounds of TR using the saddlepoint and normal
asymptotic distributions. Since TR is calculated based on 99%VaR, the saddlepoint lower
bound is not available for sample size T  500 (0.99500 > 0.005). It can be seen from Table I
that the saddlepoint technique yields accurate test sizes. Large deviation between the
saddlepoint and normal limiting distribution below shows that the latter approximates poorly
the true distribution of TR when the sample size is small.
[Figure: 99% confidence bounds of TR plotted against sample size T (100 to 1000); the four
series shown are the saddlepoint lower and upper bounds and the normal lower and upper
bounds.]
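The cutoff stated in the caption is easy to check directly: with 99% VaR, the probability of observing no exception at all in T days is 0.99^T, and a quantile at the 0.5% level cannot fall below the no-exception point while this mass exceeds 0.005. A quick arithmetic sketch (the TR distribution itself is not reproduced here):

```python
import math

# Probability of zero exceptions in T days when each day has a 1%
# exception probability (99% VaR): P(no exception) = 0.99**T.
for T in (100, 300, 500, 600, 1000):
    print(T, 0.99 ** T)

# This point mass exceeds 0.005 up to T = log(0.005)/log(0.99) days.
T_star = math.log(0.005) / math.log(0.99)
print(round(T_star))  # about 527
```

In particular 0.99^500 ≈ 0.0066 > 0.005 while 0.99^600 ≈ 0.0024 < 0.005, consistent with the saddlepoint lower bound first appearing between T = 500 and T = 600 in the figure.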
Figure 2: S&P 500 daily returns
S&P 500 daily returns from 2-Jan-1986 to 30-Dec-2000 are depicted below. The lower bound
is the 99% VaR for holding a long position in S&P 500, whereas the upper bound is the
counterpart VaR for holding a short position.
[Figure: time-series plot of S&P 500 daily returns (%) from 1986 to 2000, shown together
with the Long-VaR (lower) and Short-VaR (upper) bounds.]
TABLE VI: Out of Sample Backtests
Tests are based on 99% VaR. Bern refers to the one-tailed Bernoulli backtest; the reported frequency is the number of exceptions. TR refers to the
one-tailed backtest of VaR based on tail losses; under the null hypothesis the expected value of the TR statistic is -0.33887 (0.33887) for the
downside (upside) risk. * and ** denote significance at the 5% and 1% levels respectively.
                            Downside risk                                       Upside risk
             N-GARCH       SKST-APARCH       GARJI            N-GARCH       SKST-APARCH       GARJI
Year   N    Bern    TR     Bern    TR     Bern    TR        Bern    TR     Bern    TR     Bern    TR
1986  253    5   -3.05**    5   -2.03**    4   -1.43**       4    0.36      1    0.16      1    0.08
1987  253    7*  -5.97**    5   -2.04**    3   -1.20*        3    0.55      2    0.27      1    0.03
1988  253    4   -1.91**    3   -0.76      3   -1.04         5    1.18*     5    0.42      2    0.53
1989  252    4   -3.41**    2   -0.89      1   -1.13*        2    0.47      1    0.16      1    0.16
1990  253    5   -0.73      3   -0.20      1   -0.08         1    0.33      1    0.12      1    0.17
1991  253    3   -1.49**    2   -0.57      1   -0.57         4    1.06*     4    0.49      3    0.31
1992  254    1   -0.10      1   -0.02      0    0.00         0    0.00      0    0.00      0    0.00
1993  253    2   -0.68      2   -0.31      2   -0.23         1    0.01      1    0.01      2    0.14
1994  252    4   -0.83      2   -0.31      1   -0.25         0    0.00      0    0.00      0    0.00
1995  252    2   -0.33      2   -0.17      2   -0.11         1    0.14      1    0.13      4    0.21
1996  254    7*  -1.91**    3   -0.50      4   -0.49         0    0.00      0    0.00      0    0.00
1997  253    5   -2.71**    4   -0.57      3   -0.64         3    0.38      3    0.16      3    0.25
1998  252    7*  -2.33**    5   -0.57      4   -0.46         3    0.19      0    0.00      3    0.34
1999  252    4   -1.27**    1   -0.37      1   -0.01         2    0.25      1    0.02      1    0.03
2000  252    5   -1.18*     2   -0.31      3   -0.54         7*   1.56**    4    0.24      3    0.46
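The starred exception counts can be checked against a one-tailed binomial test: under a correct 99% VaR, the number of exceptions in N days is Binomial(N, 0.01). A minimal sketch using the exact binomial tail (the paper's own critical-value convention is not restated here, so this is only an illustrative check):

```python
from math import comb

def binom_tail_pvalue(n, x, p=0.01):
    """P(X >= x) for X ~ Binomial(n, p): the one-tailed test of
    observing too many VaR exceptions."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

# 7 exceptions in 253 trading days (e.g. N-GARCH downside in 1987):
# significant at the 5% level but not at 1%, matching the single star.
p7 = binom_tail_pvalue(253, 7)
# 5 exceptions in 253 days (e.g. 1986) is not significant at 5%.
p5 = binom_tail_pvalue(253, 5)
print(p7, p5)
```

Note how 1986 (5 exceptions, not flagged by the Bernoulli count) nevertheless shows TR = -3.05**: the tail-loss backtest detects risk that exception counting alone misses, which is the paper's central point.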