Journal of Banking & Finance 32 (2008) 1404–1415
Backtesting trading risk of commercial banks using expected shortfall☆

Woon K. Wong*,1

Investment Management Research Unit, Cardiff Business School, Aberconway Building, Colum Drive, Cardiff CF10 3EU, United Kingdom

Received 8 January 2007; accepted 19 November 2007; available online 19 February 2008
doi:10.1016/j.jbankfin.2007.11.012

☆ This paper was reviewed and accepted while Prof. Giorgio Szego was the Managing Editor of The Journal of Banking and Finance and by the past Editorial Board.
* Tel.: +44 29 20875079; fax: +44 29 20874419. E-mail address: wongwk3@cardiff.ac.uk.
1 This paper was partially completed when the author was with the Department of Banking and Finance at Tamkang University, Taiwan.
Abstract
This paper uses the saddlepoint technique to backtest the trading risk of commercial banks using expected shortfall. It is found that four out of six US commercial banks have excessive trading risks. Monte Carlo simulation studies show that the proposed backtest is very accurate and powerful even for small test samples. More importantly, risk managers can carry out the proposed backtest based on any number of exceptions, so that incorrect risk models can be promptly detected before any further huge losses are realized.
© 2007 Elsevier B.V. All rights reserved.
JEL classification: G32
Keywords: Value-at-Risk; Expected shortfall; Backtesting; Saddlepoint technique
1. Introduction
Since Value-at-Risk (VaR) was sanctioned by the Basle Committee in 1996 for market risk capital requirements through the so-called internal models, VaR has become
the standard measure for financial market risk. Major commercial banks and other financial entities such as hedge
funds disclose VaR on a regular basis. Also, risk management consulting firms and software systems have been set
up to meet the demand generated by the VaR-based risk
management. Despite the popularity of VaR, many commercial banks and other financial institutions suffered
severe trading losses during the fall of 1998. This prompted
a report by the BIS Committee on the Global Financial
System in 1999, in which it is remarked that "last autumn's events were in the 'tails' of distributions and that VaR models were useless for measuring and monitoring market risk."
Such risk in the use of VaR is termed "tail risk" in the literature; see Yamai and Yoshiba (2005). Given the
role of commercial banks as principal dealers in the ever
growing over-the-counter derivatives markets, their trading
accounts have grown rapidly and become progressively
more complex. Though most of them are hedged, during
times of extreme market movements, correlations between
various derivatives and their hedged counterparts may
break down, resulting in the failure of VaR models in measuring and monitoring market risk; see for example Jorion
(2000). On the other hand, Longin and Solnik (2001) and
Campbell et al. (2002) provide evidence of increased correlation in bear markets, with reduced diversification and
heavier capital losses. Thus it is important to use a risk
measure that takes into account the extreme losses beyond
VaR, and expected shortfall (ES) proposed by Artzner
et al. (1997, 1999) is an alternative risk measure that can
remedy this shortcoming of VaR.
Essentially, VaR is a quantile measure which, at a given confidence level, describes the loss that can occur over a given period due to exposure to market risk. ES is the expected loss conditional on the loss being above the VaR
level. Despite the fact that ES is a coherent risk measure
whereas VaR is not subadditive,2 the former is absent in
Basel II. One main reason for this is that the backtest of
ES is harder than that of VaR; see Kerkhof and Melenberg (2004, footnote 8). For instance, the test statistics of existing ES backtests, namely the censored Gaussian approach of Berkowitz (2001) and the functional delta approach of Kerkhof and Melenberg (2004), rely on large samples for convergence to their limiting distributions. This is undesirable because test samples tend to be small in practice, and regulation requires backtests to be carried out on the past 250
observations. In this paper, I propose to use the saddlepoint, or small sample asymptotic, technique to compute under the null hypothesis the required p-value based on the sample expected shortfall. The advantage of the saddlepoint technique is that the required p-value can be calculated accurately for any given number of exceptions3 under the null hypothesis. This is of great significance because the Capital Accord stipulates VaR at the 99% level, which implies that exceptions rarely occur in practice. A Monte Carlo simulation study confirms that the proposed technique is not only very accurate but also very powerful, even for small samples.
Current VaR backtesting as stipulated by the Basle Committee focuses on checking the number of violations of the VaR level. While it is advantageous for such a backtesting procedure to be applicable to any distribution, it ignores valuable information conveyed by the sizes of losses exceeding the VaR level, and thus lacks statistical test power. Indeed, the report by the Basle Committee on Banking Supervision (1996b, page 5) acknowledges that "tests of this type are limited in their power to distinguish an accurate model from an inaccurate model... This limitation has been a prominent consideration in the design of the framework presented here, and should also be prominent among the considerations of national supervisors in interpreting the results of a bank's backtesting program."
Therefore, various methods of ES backtesting, notably by Berkowitz (2001) and Kerkhof and Melenberg (2004), have been proposed. Though parametric assumptions for the null distributions are made in these ES backtests, considerable power is gained.4 The proposed backtest of ES using the saddlepoint technique, however, has several advantages over these two existing approaches. First, as mentioned above, both existing ES backtests rely
2 A coherent risk measure is subadditive (the sum of individual risks cannot be smaller than the risk of the sum), homogeneous (larger positions are associated with greater risk), monotonous (systematically lower returns imply greater risk), and translation invariant (investing in a risk-free asset will lower the risk of the portfolio). See, for example, Artzner et al. (1997, 1999) and Szego (2002) for more details.
3 In this paper, an exception always means an exception or exceedance of VaR. For brevity, a similar presumption is also made for 'violation' and 'breach' of VaR.
4 For a distribution-free test, readers are referred to Section 4.4.3 in McNeil et al. (2005).
on asymptotic test statistics that might be inaccurate when the sample size is small. If the Basle Committee were to employ the backtest of ES to determine the risk capital requirement, such lack of accuracy might be unacceptable because banks may be penalized based on incorrect inference. Second, not only does the Monte Carlo simulation study show that the saddlepoint backtest has the correct size, it is also more powerful than the two existing ES backtests. Finally, and perhaps most important of all from a practical risk management viewpoint, the proposed saddlepoint technique allows one to compute the required p-value conditional on the number of exceptions instead of the size of the full sample. This means that it is possible to detect the failure of a risk model based on just one or two exceptions, before any more data are observed. This is extremely useful since prompt action is often required in order to avert extreme financial losses due to market risk.
The proposed backtest of ES is used to test for non-normal or excessive trading risk at the six commercial banks studied by Berkowitz and O'Brien (2002). For the sample banks, internal VaR is calculated using a structural model which takes into consideration all positions in the trading portfolio. Since the saddlepoint technique is conditional on the number of realized exceptions but not on the sample size, ES backtests based on the banks' internal VaR models are carried out for the full sample period from January 1998 to March 2000 as well as for the three-month sub-sample period from August 1998 to October 1998 (when the financial markets were highly volatile). In both cases, four out of six commercial banks' trading accounts are found to have sample ES that is significantly larger than a normal null hypothesis would suggest. This serves as a warning for risk modeling based on the normal distribution, which is quite common in practice; see for example Lotz and Stahl (2005).
This paper is organized as follows: Section 2 introduces the sample ES statistic for backtesting; the saddlepoint technique used to calculate the p-value of the proposed backtest is derived in Section 3. Section 4 presents the Monte Carlo simulation results, whereas Section 5 applies the proposed backtest to the trading accounts of six large banks in the US. Section 6 briefly illustrates how the proposed ES backtesting could help risk management to be more responsive. Finally, Section 7 concludes.
2. Backtesting expected shortfall for tail risk
In this section, some preliminary definitions of VaR and ES are first provided, followed by the definition of the sample ES statistic to be used in backtesting. The difficulties of existing methods for backtesting ES and the motivation for the use of the small sample asymptotic technique are then explained.
2.1. Some preliminary definitions
Let R be the random profit or loss of a portfolio over a holding period. For simplicity, R is assumed to have a cumulative distribution function (CDF) that is absolutely continuous. Then for 0 < α < 1, VaR is defined as the maximum potential loss realized by R over the holding horizon at the (1 − α) confidence level. If F(r) is the CDF of R, then a formal definition of VaR can be written as

$$\mathrm{VaR} = -q(\alpha) = -F^{-1}(\alpha).$$

That is, VaR may be regarded as the negative of the left α-quantile of the distribution, which will simply be denoted as q from now onwards. Let f(r) denote the probability density function (PDF) of R. ES at the (1 − α) confidence level is then defined by

$$\mathrm{ES} = -E(R \mid R < q) = -\alpha^{-1}\int_{-\infty}^{q} r f(r)\,dr.$$

ES is appealing in that it sums all values of r, weighted by f(r), from minus infinity to q, thus taking into account the sizes of losses beyond the VaR level. Economically, it tells us the average loss when VaR is violated. The idea of ES for risk management first appears in Rappoport (1993) in the financial industry. It was formally proposed by Artzner et al. (1997) as a coherent risk measure.
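To fix ideas numerically, both measures under a standard normal P&L distribution can be computed in a few lines. The sketch below is my own illustration (not code from the paper), assuming the standard normal null used later in the text; the closed form φ(q)/α for ES is established in Corollary 1:

```python
# VaR and ES of a standard normal profit-or-loss distribution at the
# (1 - alpha) confidence level.
from scipy.stats import norm

alpha = 0.01                 # tail probability set by the Basle Committee
q = norm.ppf(alpha)          # left alpha-quantile, q = F^{-1}(alpha) = -2.3263
var = -q                     # VaR = -q(alpha), about 2.3263
es = norm.pdf(q) / alpha     # ES = phi(q)/alpha, about 2.6652
```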
2.2. Sample statistic for backtesting expected shortfall
Now I shall consider the sample ES statistic which will
be used for backtesting. Given a sample R1, . . . , RT with
n > 0 exceptions, I use the simple average as an estimator
of the true ES:
$$\mathrm{ES}_n = -\frac{1}{n}\sum_{i=1}^{n} R_{(i)}, \qquad (1)$$

where for i = 1, . . . , T − 1, R(i) is the order statistic such that R(i) ≤ R(i+1), and the subscript n denotes the fact that the sample expected shortfall is based on n observations. Given a sample of size T, one may set n = [Tα]5 if ES is to be estimated ex post using a fixed number of exceedances. However, such a definition is inconsistent with the current VaR backtesting procedure stipulated by the Basle Committee, in which an exception is realized if the loss is larger than the associated VaR forecast. Since VaR is already in common usage as a measure of risk, it would be sensible for backtesting purposes to measure ES as the mean of exceptions, in accordance with the Capital Accord.

5 [x] means the largest integer less than or equal to x.
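Under this convention, the estimator (1) is simply the negated mean of whatever losses breach the VaR forecasts. A small sketch (my own illustration; the array names are assumptions, not the paper's code):

```python
import numpy as np

def sample_es(returns, var_forecasts):
    """Sample ES of Eq. (1): the negative mean of the n returns that
    fall below minus the corresponding VaR forecast."""
    exceptions = returns[returns < -var_forecasts]
    n = exceptions.size
    return n, (-exceptions.mean() if n > 0 else np.nan)

rng = np.random.default_rng(0)
r = rng.standard_normal(250)                    # one year of daily returns
n, es_n = sample_es(r, np.full(250, 2.3263))    # constant 99% normal VaR
```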
Therefore n in (1) refers to the number of VaR violations, which varies from sample to sample. Moreover, n is a small number, since the Basle Committee sets α = 0.01 as the smaller day-to-day fluctuations are of less concern. Backtesting ES is thus rendered difficult by the fact that a breach of VaR is rare in practice. Though the functional delta approach6 advocated by Kerkhof and Melenberg (2004) somewhat improves the convergence speed of (1) to its limiting distribution, backtests of risk models are often required to be carried out on small samples. This would be undesirable if the Basle Committee were to use the backtest of ES to determine the risk-based capital requirement: raising the multiplication factor based on an inaccurate p-value would not be fair to the banks.
A similar small sample problem also occurs in the case of the censored Gaussian likelihood ratio approach7 suggested by Berkowitz (2001). Moreover, this approach is a two-sided test of the joint hypothesis that the mean and variance conform to those of a standard normal distribution. Since the null hypothesis could be rejected on the grounds that the observed tail is too thin, or that there is too much capital to cover the risk, this two-tailed test is not suitable for regulatory backtesting. This is because financial fluctuations are by their nature very complicated, and it is likely that a true and perfect model can never be found. While a risk manager may not know the perfect model, she can always allow for possible inadequacy in her risk model by marking up the risk forecast. Such practice is evident from Berkowitz and O'Brien (2002), who observe that banks' VaR forecasts are conservative. This is also in line with the one-tailed definition and the corresponding backtesting procedure of VaR in the Capital Accord; see Basle Committee on Banking Supervision (1996a).
Unlike the functional delta and censored Gaussian approaches, the proposed saddlepoint technique makes use of a small sample asymptotic distribution that works reasonably well even for n = 1.8 The key idea is that under fairly general conditions, a full expansion of the true density of (1) can be derived by means of steepest descent, hence the name saddlepoint; see Daniels (1954). The Monte Carlo simulation study in Section 4 shows that for any realized number of exceptions n > 0, the saddlepoint technique can accurately calculate the p-value of the backtest of ES. This has a very important implication: in backtesting for possible breakdown of a risk model, a decision can be formally based on as few as one or two exceptions.
6 The ES estimator given by (1) may be regarded as a "functional" of the "statistic" which refers to the given sample distribution. In the same way that Taylor's expansion can be used to establish asymptotic normality of a differentiable function of statistics that are asymptotically normal, the functional delta approach can be used to establish the asymptotic normality of (1), since the sample distribution converges to the true distribution as the sample size tends to infinity; see Shao (1998, chapter 5).
7 In this approach, all losses that are smaller than VaR are set equal to VaR, thus giving rise to a censored distribution.
8 When n = 1, a precise CDF can actually be obtained simply by using the conditional distribution function; see the first paragraph of Section 4.1 for the case of the normal distribution.
3. Saddlepoint technique
Since the Basle Committee sets the chance of extreme loss at 1% for market risk, the number of exceptions in a year is small and the usual asymptotic result approximates the true distribution poorly. This section adopts the saddlepoint technique to derive the small sample asymptotic distribution for the sample ES statistic given by (1) under a standard normal null hypothesis. Monte Carlo simulations in Section 4 show that the technique performs very well and that the proposed backtest of ES has good power in detecting non-normal trading risk or incorrect risk models.
3.1. The ES random variable and its moments
Let R be the portfolio return, which has a standard normal distribution with CDF and PDF denoted by Φ and φ, respectively. From now onwards, unless stated otherwise, q refers to the α-quantile of a standard normal distribution; that is, q = Φ^{-1}(α). Define a random variable X such that

$$P(X \le x) = P(R \le x \mid R < q) = P(R \le x)/P(R < q), \qquad (2)$$

where x < q. Thus X refers to the portfolio return conditional on R < q, and the theoretical expected shortfall of R at confidence level 1 − α is given by −E(X). Given a sample X1, . . . , Xn of exceedances above VaR, the sample expected shortfall can be written as

$$\mathrm{ES}_n = -\bar{X} = -\frac{1}{n}\sum_{i=1}^{n} X_i.$$

The PDF of X is given by

$$f(x) = \alpha^{-1}\phi(x),$$

and the corresponding CDF by

$$F(x) = \alpha^{-1}\Phi(x),$$

where x ∈ (−∞, q). Now consider the following results on the moments of f(x), which shall be useful for the rest of this section.
Proposition 1. Let X be a continuous random variable with density f(x) = α^{-1}φ(x), x ∈ (−∞, q). The moment generating function of X is then given by

$$M(t) = \alpha^{-1}\exp(t^2/2)\,\Phi(q - t), \qquad (3)$$

and its derivatives with respect to t are given by

$$M'(t) = t\,M(t) - \exp(qt)\,\alpha^{-1}\phi(q),$$
$$M''(t) = t\,M'(t) + M(t) - \exp(qt)\,q\,\alpha^{-1}\phi(q),$$
$$M^{(m)}(t) = t\,M^{(m-1)}(t) + (m-1)\,M^{(m-2)}(t) - \exp(qt)\,q^{m-1}\alpha^{-1}\phi(q), \qquad (4)$$

where m ≥ 3.

Proof. See Appendix. □

As a corollary of Proposition 1, the mean and variance of X can be obtained easily.

Corollary 1. The mean of X, which is the negative of ES, is given by

$$\mu_X = E(X) = -\frac{\phi(q)}{\alpha} = -\frac{e^{-q^2/2}}{\alpha\sqrt{2\pi}}, \qquad (5)$$

and its variance is

$$\sigma_X^2 = \mathrm{var}(X) = 1 - \frac{q\,\phi(q)}{\alpha} - \mu_X^2 = 1 - \frac{q\,e^{-q^2/2}}{\alpha\sqrt{2\pi}} - \mu_X^2. \qquad (6)$$

Proof. See Appendix. □
Though $\sqrt{n}\,\sigma_X^{-1}(\bar{X} - \mu_X)$ is asymptotically standard normal, the asymptotic result is hardly valid for sample sizes one would encounter in practice. In particular, α is set by the Basle Committee at 0.01, and n equals about 2 or 3 for a year's data. For this reason, I resort to the Lugannani and Rice (1980) formula derived by the saddlepoint technique in order to provide an accurate method to compute under the null hypothesis the required p-value for the backtest of ES.
3.2. The Lugannani and Rice formula
Using the saddlepoint technique, Lugannani and Rice (1980) provide an elegant formula that can accurately calculate the tail probability (or CDF) of the sample mean of an independently and identically distributed (IID) random sample X1, . . . , Xn from a continuous distribution F with density f. To gain an intuitive understanding of the method, consider the inversion formula

$$f_{\bar{X}}(\bar{x}) = \frac{n}{2\pi}\int_{-\infty}^{\infty} e^{n[K(it) - it\bar{x}]}\,dt,$$

where $f_{\bar{X}}(\bar{x})$ denotes the PDF of the sample mean and K(t) = ln M(t) is the cumulant generating function of f(x). The tail probability can then be written as

$$P(\bar{X} > \bar{x}) = \int_{\bar{x}}^{q} f_{\bar{X}}(u)\,du = \frac{1}{2\pi i}\int_{s-i\infty}^{s+i\infty} e^{n[K(t) - t\bar{x}]}\,t^{-1}\,dt, \qquad (7)$$

where s is chosen to be the saddlepoint ω satisfying

$$K'(\omega) = \bar{x}. \qquad (8)$$

By using an appropriate transformation, the elegant Lugannani and Rice formula can be obtained; it is given below without proof.9
Proposition 2. Given an IID standard normal random sample R1, . . . , RT with n exceptions, construct X1, . . . , Xn according to (2). Let ω be the saddlepoint such that (8) is satisfied, and define

$$\eta = \omega\sqrt{nK''(\omega)}, \qquad \zeta = \mathrm{sgn}(\omega)\sqrt{2n(\omega\bar{x} - K(\omega))},$$

where sgn(s) equals zero when s = 0, and takes the value −1 (1) when s < 0 (s > 0). Then the tail probability at the sample mean x̄ ≠ μX^10 is given by

$$P(\bar{X} \le \bar{x}) = \begin{cases} \Phi(\zeta) - \phi(\zeta)\left(\dfrac{1}{\eta} - \dfrac{1}{\zeta}\right) + O(n^{-3/2}) & \text{for } \bar{x} < q, \\[4pt] 1 & \text{for } \bar{x} \ge q. \end{cases} \qquad (9)$$

9 Readers are referred to Daniels (1987) for a proof of the Lugannani and Rice formula.
10 Though the case of x̄ = μX is hardly encountered in practice, the corresponding tail probability is provided for completeness:
$$P(\bar{X} \le \mu_X) = \frac{1}{2} + \frac{K^{(3)}(0)}{6\sqrt{2\pi n\,[K''(0)]^{3}}} + O(n^{-3/2}).$$
[Fig. 1. Saddlepoint approximations. The graphs are the CDFs calculated by the saddlepoint technique for sample sizes n = 1, 2 and 5. As a benchmark for comparison, the standard normal CDF is also depicted. It can be seen that as the sample size gets large, the CDF of the normalized test statistic tends to that of a standard normal.]
To obtain the tail probability, the saddlepoint ω is first obtained by solving for t in the following equation:11

$$K'(t) = \frac{M'(t)}{M(t)} = t - \exp(qt - t^2/2)\,\frac{\phi(q)}{\Phi(q - t)} = \bar{x}. \qquad (10)$$

Next, (3) and (4) are used to calculate M(ω) and its first two derivatives at t = ω, which in turn are used to evaluate K(ω) and K''(ω). It is then straightforward to calculate η and ζ defined in Proposition 2, and hence the required probability.

11 The saddlepoint ω is unique; see Daniels (1987).
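Putting (8)-(10) together gives a compact recipe for the p-value. The sketch below is my own implementation of that recipe (helper names are mine; K'' is obtained by a finite difference for brevity, and K' is written in log form for numerical stability):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

alpha = 0.01
q = norm.ppf(alpha)

def K(t):                 # cumulant generating function ln M(t), from Eq. (3)
    return t**2 / 2 + norm.logcdf(q - t) - np.log(alpha)

def K1(t):                # K'(t) of Eq. (10)
    return t - norm.pdf(q) * np.exp(q * t - t**2 / 2 - norm.logcdf(q - t))

def K2(t, h=1e-5):        # K''(t) by central difference
    return (K1(t + h) - K1(t - h)) / (2 * h)

def saddlepoint_pvalue(xbar, n):
    """P(Xbar <= xbar) under the standard normal null, Eq. (9); assumes
    xbar != mu_X (that case has its own expansion, footnote 10)."""
    if xbar >= q:
        return 1.0
    w = brentq(lambda t: K1(t) - xbar, -50.0, 50.0)   # saddlepoint, Eq. (8)
    eta = w * np.sqrt(n * K2(w))
    zeta = np.sign(w) * np.sqrt(2 * n * (w * xbar - K(w)))
    return norm.cdf(zeta) - norm.pdf(zeta) * (1 / eta - 1 / zeta)

# Cross-check against Table 4: Bank 2 has n = 6 and sample ES 3.067,
# i.e. xbar = -3.067; the p-value comes out at about 0.005.
print(saddlepoint_pvalue(-3.067, 6))
```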
Fig. 1 depicts the theoretical CDF of the standardized statistic $z = \sqrt{n}\,\sigma_X^{-1}(\bar{x} - \mu_X)$ calculated using the Lugannani and Rice formula for various n. Since the formula is valid only when the sample mean is less than q, the CDF is assigned a value of 1 when $z > \sqrt{n}\,\sigma_X^{-1}(q - \mu_X)$. As predicted by the central limit theorem, the finite sample CDF approaches the standard normal CDF when n gets large.
3.3. Hypothesis testing
Given n realized exceptions, one can compute the sample expected shortfall ESn = −X̄ and carry out a one-tailed backtest to check whether the sample ES is too large. The null and alternative hypotheses can be written as

$$H_0: \mathrm{ES}_n = \mathrm{ES}_0 \quad \text{versus} \quad H_1: \mathrm{ES}_n > \mathrm{ES}_0,$$

where ES0 denotes the theoretical expected shortfall under the null hypothesis. Under the standard normal null hypothesis with α = 0.01, ES0 = −μX = 2.6652. The p-value of the hypothesis test is simply given by the Lugannani and Rice formula as

$$p\text{-value} = P(\bar{X} \le \bar{x}). \qquad (11)$$

The null hypothesis is rejected if ESn is significantly larger than 2.6652, the value of ES0 under the standard normal assumption. In this case, the risk model may be described as failing to "capture" the risk, or simply there is excessive risk in the empirical distribution. On the other hand, if the null hypothesis is accepted, one may say that the risk model provides sufficient risk coverage.12 Finally, it is trivial that when there is no exception, that is n = 0, the null hypothesis is accepted.
Strictly speaking, the above hypothesis formulation tests whether a given sample of exceedances above VaR is consistent with the tail of the distribution under the null hypothesis, which in our case refers to the standard normal distribution. It cannot be overemphasized that it is possible to have non-normal distributions with values of expected shortfall less than those of the standard normal distribution.
12 Strictly speaking, if the null hypothesis is accepted, it means that there is no evidence suggesting insufficient risk coverage.
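Concretely, the decision rule is a one-tailed comparison of the saddlepoint p-value with the chosen significance level. A toy sketch (my own illustration, reusing the hypothetical saddlepoint_pvalue helper from the previous snippet; not code from the paper):

```python
# Reject H0 (sufficient risk coverage) when the one-tailed p-value of the
# observed mean of exceptions falls below the chosen significance level.
# Per the paper, n = 0 exceptions means H0 is trivially accepted.
def backtest_es(exceptions, level=0.05):
    n = len(exceptions)
    if n == 0:
        return False                      # no exception: accept H0
    xbar = sum(exceptions) / n            # mean of the exceptions (negative)
    return saddlepoint_pvalue(xbar, n) < level
```

(For n = 1 the paper uses the exact conditional CDF α^{-1}Φ(x) instead; see Section 4.1.)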
3.4. Non-normal null hypothesis
In this paper, the backtest of ES is carried out under the null hypothesis that the underlying distribution is normal. This is because normality is widely assumed in practice13 and, if the null hypothesis is a non-normal distribution, one can always follow the suggestion of Berkowitz (2001) and Kerkhof and Melenberg (2004) to transform the distribution into a normal one. The idea is that if the observed returns are fatter-tailed than the model presumes, the transformed data will also be fatter-tailed than the standard normal hypothesis. It must be emphasized, however, that the saddlepoint technique can also be applied directly to a non-normal null hypothesis. Consider the case of a random variable X with absolutely continuous PDF f(x) and CDF F(x). Then the moment generating function is given by
$$M(t) = \alpha^{-1}\Big[\alpha\exp(qt) - t\int_{-\infty}^{q} F(x)\exp(tx)\,dx\Big],$$

and its first two derivatives are

$$M'(t) = \alpha^{-1}\Big[\alpha q\exp(qt) - \int_{-\infty}^{q} F(x)\exp(tx)(1 + xt)\,dx\Big],$$

$$M''(t) = \alpha^{-1}\Big[\alpha q^{2}\exp(qt) - \int_{-\infty}^{q} F(x)\exp(tx)(2 + xt)x\,dx\Big].$$
Once the saddlepoint value is obtained using (8), the derivatives of the moment generating function can be evaluated. It is then straightforward to compute the required p-value using the Lugannani and Rice formula described in Section 3.2.

13 Due to the working of the central limit theorem, trading account returns at the bank level are often approximated by a normal distribution in practice; see Lotz and Stahl (2005).
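As a sketch of how this looks numerically (my own quadrature-based reading of the integrals above, assuming F is the CDF of the underlying return and q its α-quantile; the null must have a sufficiently light left tail for the integrals to exist at the negative t values the saddlepoint requires):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import logistic

alpha = 0.01
F = logistic.cdf       # an example non-normal null with exponential tails
q = logistic.ppf(alpha)

def M(t):              # MGF of the tail variable, after integration by parts
    integral, _ = quad(lambda x: F(x) * np.exp(t * x), -np.inf, q)
    return np.exp(q * t) - t * integral / alpha

def M1(t):             # first derivative
    integral, _ = quad(lambda x: F(x) * np.exp(t * x) * (1 + x * t), -np.inf, q)
    return q * np.exp(q * t) - integral / alpha
```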
4. Monte Carlo simulations
In this section, Monte Carlo simulation experiments are carried out to investigate the empirical test sizes and powers of various backtests. Only VaR and ES at the 99% confidence level are considered. IID samples of various sizes under the null and various alternative hypotheses are generated, and the corresponding p-values are calculated. 100,000 simulations are carried out to calculate the empirical test sizes, and Bernoulli likelihood ratio tests are applied to check if they differ significantly from the theoretical test sizes. For powers of the tests, only rejection rates at the 5% significance level based on 10,000 simulations are reported. It is found that the proposed backtest using the saddlepoint technique yields very accurate test sizes and has the most power in rejecting false models.
4.1. Empirical test sizes of ES backtest using saddlepoint
technique
Here I first investigate the empirical test sizes of the ES backtest using the saddlepoint technique for numbers of exceptions n = 1, 2, 5 and 10 under the standard normal null hypothesis. For each simulation, a rejection is made if the p-value calculated using (9) is less than the various significance levels ranging from 0.5% to 5% on both sides of the distribution.14 Panel A of Table 1 depicts the rejection rates for both tails based on 100,000 simulations. Bernoulli likelihood ratio tests are carried out, and a superscript 'a' indicates that the empirical test size differs significantly at the 1% level from its theoretical value. It can be seen that only the empirical test sizes on the far right tail show incorrect rejection rates, when n = 1. This is nevertheless unimportant for two reasons. First, backtesting for the purpose of risk management concerns only the left tail. Second, for n = 1, one can easily obtain a precise CDF by using the conditional distribution function α^{-1}Φ(x) under the null hypothesis. From now onwards, when n = 1, the corresponding p-value and critical value will be calculated using the conditional distribution function.
Since the number of exceedances varies from sample to sample in practice, Panel B lists the empirical test sizes for standard normal samples of sizes ranging from 125 to 1000. Only those samples with at least one exception are considered. For example, for a sample of size T = 250, the first simulation yields 3 exceptions and the required p-value is calculated using (9) with n = 3. The second simulation yields 2 exceptions and the required p-value is obtained with n = 2. If the third simulation does not produce any exception, no p-value is calculated and the simulation is abandoned. When 100,000 simulations are completed, the empirical test size is obtained by dividing the number of rejections by the number of nonzero-exception simulations.15 The accuracy of the saddlepoint technique is again demonstrated by the results in Panel B.
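A rough sketch of this Panel B experiment (simplified, with fewer trials than the paper's 100,000; it reuses the hypothetical saddlepoint_pvalue helper from Section 3):

```python
import numpy as np

rng = np.random.default_rng(1)
T, level, trials = 250, 0.05, 10_000
var99 = 2.3263                        # known 99% VaR under the N(0,1) null
rejections = used = 0

for _ in range(trials):
    r = rng.standard_normal(T)
    exc = r[r < -var99]               # exceptions of this simulated sample
    if exc.size == 0:
        continue                      # zero-exception samples are abandoned
    used += 1
    rejections += saddlepoint_pvalue(exc.mean(), exc.size) < level

print(rejections / used)              # close to 0.05 (cf. Table 1, Panel B)
```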
Finally, Panel C considers the case when the number of exceptions is not known (for example, a risk report that discloses only the mean violation but not the number of exceptions). In this case, the p-value can be calculated as

$$P(\bar{X} \le \bar{x}) = B(0\mid T) + \sum_{n=1}^{\infty} P(\bar{X} \le \bar{x}\mid n)\,B(n\mid T),$$

where n is the number of exceptions and B(·|T) is the Binomial probability density function with 0.01 probability of success and T trials. As can be observed from Panel C, the empirical test sizes differ from their true values only at the right tail when T is small. This is because for sample sizes T = 125, 250, 500 and 1000, B(0|T) = 0.285, 0.081, 0.007 and 4.32 × 10^{-5}, respectively. Hence as long as the theoretical test size is larger than the corresponding probability value when n = 0, the empirical test sizes are very close to the theoretical ones.
14 Though only the left tail of the distribution is required for the backtesting of risk models, the right tail is also included to illustrate the good approximation achieved by the saddlepoint technique.
15 For the 99% confidence level (α = 0.01), the probability of getting a zero-exception sample is 0.99^T, where T is the sample size.
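The Panel C mixture is equally mechanical; a sketch (my own, truncating the infinite sum once the Binomial mass is negligible):

```python
from scipy.stats import binom

def pvalue_unknown_n(xbar, T, alpha=0.01, n_max=50):
    """Panel C p-value when only the mean violation xbar is reported:
    B(0|T) plus the conditional p-values weighted by Binomial(T, alpha)."""
    p = binom.pmf(0, T, alpha)
    for n in range(1, n_max + 1):
        p += saddlepoint_pvalue(xbar, n) * binom.pmf(n, T, alpha)
    return p
```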
Table 1
Empirical test sizes

                Left tail significance levels          Right tail significance levels
                0.005   0.01    0.025   0.05           0.005   0.01    0.025   0.05

Panel A: given a fixed number of VaR violations, n
n = 1           0.50    0.99    2.46    4.86           2.83a   3.00a   3.26a   4.84
n = 2           0.51    1.03    2.47    4.97           0.53    1.00    2.46    4.99
n = 5           0.54    1.01    2.50    4.99           0.51    1.02    2.55    5.03
n = 10          0.49    1.01    2.51    5.02           0.51    1.03    2.56    5.08

Panel B: given the realized number of VaR violations, n
T = 125         0.50    1.01    2.44    4.98           0.48    1.02    2.48    4.91
T = 250         0.54    1.07    2.54    4.98           0.49    1.02    2.52    4.96
T = 500         0.50    1.02    2.52    5.03           0.47    0.95    2.48    4.96
T = 1000        0.52    1.01    2.51    5.04           0.52    0.98    2.47    4.91

Panel C: given a fixed number of observations T, n unknown
T = 125         0.51    1.01    2.44    4.84           29.5a   29.5a   29.5a   29.5a
T = 250         0.51    1.00    2.50    5.03           8.84a   8.84a   8.84a   8.85a
T = 500         0.54    1.06    2.58    5.01           0.76a   1.00    2.50    4.98
T = 1000        0.49    1.02    2.52    5.02           0.49    0.99    2.44    4.92

This table reports the empirical test sizes of expected shortfall backtests based on the Lugannani and Rice (1980) formula using 100,000 Monte Carlo simulations. The expected shortfalls are calculated based on 99% VaR. Panel A reports the empirical test sizes given a fixed number of violations n = 1, 2, 5 and 10. For various sample sizes T, Panel B presents the observed test sizes when the realized number of violations is known (but varies from sample to sample), whereas Panel C provides the empirical rejection rates when the number of exceptions is not known.
a Empirical test size differs significantly from its theoretical size at 1% level.
4.2. Empirical test sizes and powers of other backtests

Empirical test sizes of other backtests are investigated first. This is followed by a study of their test powers when the samples are distributed as IID Student t or normal inverse Gaussian (NIG), or follow a GARCH process. ES-SDP, ES-Delta and ES-Censor refer to the backtests of ES using the saddlepoint, functional delta and censored Gaussian methods, respectively. These are size-based backtests, whereas the Bernoulli16 and Kupiec (see Kupiec (1995)) tests are frequency-based backtests. As above, a superscript 'a' denotes an empirical test size that differs significantly from the theoretical level at the 1% level.
In Panel A of Table 2, standard normal samples are simulated 100,000 times and rejection is based on the 5% significance level. It can be seen that, except for ES-SDP, all other backtests show poor convergence to the true test size. While the reason for the poor convergence of the Bernoulli backtest lies with the discreteness of its test statistic (which counts the number of exceptions), all other backtests rely on asymptotic statistics that converge to the limiting distribution poorly when the sample size is small. Such poor approximation is accentuated by the fact that α is set at 0.01 by Basle II, since extreme risk is the primary concern.
For the study of test powers, 10,000 simulations are used, and the rejection rates reported in Panels B, C and D of Table 2 are all based on the 5% significance level. In practice, financial time series are found to exhibit excess kurtosis, longer left tails and time-varying conditional variances. I use three non-normal processes to replicate this behavior. First, Panel B considers the case of the Student t distribution with 5 degrees of freedom, which is well known for its heavy tails but is symmetric. Next, in Panel C, the IID NIG samples have excess kurtosis and are skewed to the left. The parameters of the NIG distribution are α = 0.8, β = −0.05, δ = α(1 − ρ²)^{3/2} and μ = −ρδ(1 − ρ²)^{-1/2}, where ρ = β/α.17 Finally, in Panel D, the time-varying conditional variance of financial returns is simulated by a GARCH(1,1) process Rt = σtZt with volatility equation given by

$$\sigma_t^2 = 0.05 + 0.15R_{t-1}^2 + 0.8\sigma_{t-1}^2,$$

and Zt distributed as IID N(0, 1). Generally speaking, the ES-based backtests are more powerful in detecting incorrect risk models, and amongst the ES backtests the saddlepoint technique is the most powerful. Though Kupiec's likelihood ratio test is the most powerful in detecting non-normal tail risk when the underlying process is GARCH(1,1) at T = 250, that test has a large empirical test size of 9.61%. Finally, it is observed that, in comparison to the other two non-normal alternatives, the powers against the GARCH process are relatively low.
16 Given a sample of size T with b losses exceeding the magnitude of VaR, the p-value of the one-tailed Bernoulli test is given by P(B ≥ b) = 1 − P(B < b), where B is distributed as Binomial(T, α). Some articles use an alternative p-value definition given by 1 − P(B ≤ b). However, this definition is not adopted by this paper because, say for B ∼ Binomial(250, 0.01) and b = 5, the p-value given by 1 − P(B ≤ 5) = 0.0412 is significant at the 5% level even though P(B = 5) = 0.0666.
17 Let g(z) = [δ² + (z − μ)²]^{1/2}; then the density of NIG(α, β, δ, μ) is given by f_NIG(z) = (πg(z))^{-1}αδ exp{δ(α² − β²)^{1/2} + β(z − μ)} K₁(αg(z)), where K₁(x) is the modified Bessel function of the third kind. See, for example, Eberlein et al. (1998).
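For reference, the Panel D data-generating process can be sketched as follows (parameters as stated in the text; my own code, not the paper's):

```python
import numpy as np

def simulate_garch(T, omega=0.05, a=0.15, b=0.8, rng=None):
    """Normal GARCH(1,1) path R_t = sigma_t * Z_t; these parameters imply
    a unit unconditional variance, omega / (1 - a - b) = 1."""
    rng = rng or np.random.default_rng()
    r = np.empty(T)
    var = omega / (1 - a - b)          # start from the unconditional variance
    for t in range(T):
        r[t] = np.sqrt(var) * rng.standard_normal()
        var = omega + a * r[t]**2 + b * var
    return r
```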
Table 2
Empirical test sizes and powers of other backtests

T          ES-SDP   ES-Delta   ES-Censor   Bernoulli   Kupiec

Panel A: standard normal (test size in %)
125        4.98     5.16       3.07a       3.72a       3.83a
250        4.98     6.47a      3.71a       4.14a       9.61a
500        5.03     3.95a      5.44a       3.10a       7.20a
1000       5.04     4.16a      5.32a       4.74a       5.54a

Panel B: Student t with 5 degrees of freedom (%)
125        34.1     21.8       24.9        12.2        12.2
250        52.4     34.8       41.1        17.5        6.6
500        73.0     56.3       63.4        22.6        20.9
1000       91.7     81.1       86.8        43.6        33.2

Panel C: standardized NIG with α = 0.8, β = −0.05 (%)
125        48.1     29.2       39.6        23.1        23.1
250        70.0     47.8       62.2        37.4        19.3
500        89.3     74.4       85.9        52.7        50.5
1000       98.7     94.0       98.2        83.1        75.7

Panel D: normal GARCH with ω = 0.05, a = 0.15, b = 0.8 (%)
125        15.3     6.1        15.4        15.8        15.8
250        25.0     12.8       22.8        20.3        28.4
500        38.3     26.4       38.3        23.8        31.1
1000       58.1     44.2       54.8        34.9        35.8

This table reports the empirical test powers at the 5% significance level of various backtests based on 99% VaR. In Panel A, standard normal samples are simulated 100,000 times and the empirical test sizes are reported. Panels B, C and D provide the size-adjusted test powers against non-normal distributions based on 10,000 simulations. ES-SDP, ES-Delta and ES-Censor refer to the ES backtests using the saddlepoint, functional delta and censored normal methods, respectively.
a Empirical test size differs significantly from its theoretical value at 1% level.

Table 3
Estimation risk, test sizes and powers

P          ES-SDP   ES-Delta   ES-Censor   Bernoulli   Kupiec

Panel A: standard normal (test size in %)
125        6.96a    4.10a      3.83a       4.37a       5.90a
250        5.87a    3.63a      3.69a       4.20a       7.57a
500        5.43a    3.34a      3.72a       4.45a       8.82a
1000       5.15     3.18a      3.70a       4.19a       9.25a

Panel B: Student t with 5 degrees of freedom (%)
125        57.7     41.7       49.0        21.4        9.4
250        55.9     38.7       45.1        18.9        7.6
500        54.7     38.2       43.2        19.1        7.5
1000       54.7     37.8       42.4        18.0        7.0

Panel C: standardized NIG with α = 0.8, β = −0.05 (%)
125        75.3     54.7       73.1        45.1        26.0
250        73.3     52.8       67.9        41.3        22.4
500        71.9     50.9       65.4        39.5        20.6
1000       70.8     49.0       63.3        38.4        20.1

Panel D: normal GARCH with ω = 0.05, a = 0.15, b = 0.8 (%)
125        28.6     14.5       26.6        25.7        18.0
250        31.0     14.9       27.3        26.1        22.3
500        31.9     14.9       27.1        25.1        27.1
1000       30.8     14.3       24.4        22.5        28.1

This table reports the empirical test sizes and powers at the 5% significance level of various backtests based on test samples of 250 observations. These tests take into account estimation risk. Specifically, the sample mean and standard deviation computed from an estimation sample of size P are used to standardize the testing sample under the standard normal null hypothesis. In Panel A, standard normal samples are simulated 100,000 times and the empirical test sizes are reported. Panels B, C and D provide the size-adjusted test powers against non-normal distributions based on 10,000 simulations. ES-SDP, ES-Delta and ES-Censor refer to the ES backtests using the saddlepoint, functional delta and censored normal methods, respectively.
a Empirical test size differs significantly from its theoretical value at 1% level.
4.3. Estimation risk
The Monte Carlo simulation studies so far assume that the parameters of the forecasted distributions are free from estimation error. In practice, this is of course not true. Therefore, the effects of estimation risk on the test sizes and powers of various backtests are studied here. Specifically, the sample mean and standard deviation computed from an estimation sample of size P are used to standardize the testing sample under the standard normal null hypothesis. Without loss of generality, only a test sample of size 250 is considered. For the estimation, sample sizes of 125, 250, 500 and 1000 are used to compute the standard normal parameters, mean and standard deviation, on a rolling window basis. For example, the estimation sample R1, . . . , RP is used to compute the sample mean and standard deviation, denoted as μ̂P and σ̂P, respectively. The one-step-ahead testing observation RP+1 is standardized as ẐP+1 = σ̂P^{-1}(RP+1 − μ̂P). The next standardized testing observation ẐP+2 is obtained in the same way using the rolling sample R2, . . . , RP+1. This process is repeated until the required full testing sample is constructed. Table 3 provides the empirical test sizes and powers at the 5% significance level of various backtests based on 99% VaR. It is obvious from the results in Panel A that a small estimation sample results in an over-sized distortion for the proposed saddlepoint technique and an under-sized effect for the other two ES backtesting methods. When the estimation sample increases to 1000, the saddlepoint technique gives the correct size, but the under-sized problems are marginally more severe for the other two ES backtests. Size-corrected test powers are presented in the rest of the table. Overall, test powers decrease with the estimation sample, suggesting that small sample estimation risk results in higher test powers even after size-adjustments (see Table 4).
5. Application of ES backtesting for non-normal trading risk
As an example of applying the saddlepoint backtest, I apply in this section the ES backtest to the profit and loss (P&L) of trading accounts at the six large banks studied by Berkowitz and O'Brien (2002). Given the proprietary nature of the banks' P&L and VaR data, such data are not easily available. However, the proposed backtest is easy to implement
Table 4
Backtests of bank VaRs

           No. of observationsa   No. of exceptionsa   Mean violationa   p-Value

Panel A: full sample from January 1998 to March 2000
Bank 1     569                    3                    4.446             <0.0001
Bank 2     581                    6                    3.067             0.0050
Bank 3     585                    3                    5.506             <0.0001
Bank 4     573                    0                    NA                NA
Bank 5     746                    1                    3.101             0.0964
Bank 6     586                    3                    8.166             <0.0001

Panel B: sub-sample from August 1998 to October 1998
Bank 1     63                     3                    4.446             <0.0001
Bank 2     64                     5                    3.188             0.0018
Bank 3     65                     3                    5.506             <0.0001
Bank 4     63                     0                    NA                NA
Bank 5     65                     1                    3.101             0.0964
Bank 6     65                     2                    10.316            <0.0001

This table applies the proposed backtest of ES to the daily profit and loss data of six large commercial banks in the US. The full sample period (Panel A) covers January 1998 through March 2000, whereas the sub-sample (Panel B) refers to the financially turbulent period from August to October 1998. The daily data are standardized to have unit variance for confidentiality. Mean violation refers to the average of the losses that exceed the VaR calculated by the bank's structural model. Except for the case of n = 1, the reported p-values are obtained according to (9). For both full and sub-samples, the results of the backtests show that four out of six banks have non-normal or excessive trading risk.
a Data source is Berkowitz and O'Brien (2002). Mean violations are represented by positive values here.
as it only requires the number of exceptions and the mean violation of the banks' trading books, which are reported by Berkowitz and O'Brien. Using frequency-based backtests, they find that both the banks' internal VaR models and a reduced-form normal GARCH18 model provide sufficient coverage of trading risks. It is nevertheless valuable for risk managers to investigate statistically whether the exceptions exhibit non-normal or excessive tail risk. Indeed, the proposed backtests reveal that four out of six banks have excessive trading risk.

18 The normal GARCH model used by Berkowitz and O'Brien (2002) is an ARMA(2)-GARCH(1,1) model with IID standard normal innovations.
5.1. Banks’ trading P&L and internal VaR
Some description of the banks' trading P&L and the associated VaR is provided here; readers are referred to Berkowitz and O'Brien (2002) for more detailed information. All six banks are large multinational banks that meet the Basle "large trader" criteria, and their trading activities include trading in interest rate, foreign exchange, equity, commodity and derivative instruments. To manage market risks, large scale structural models are developed which measure and aggregate trading risks in current positions at a highly detailed level. Given their roles as principal dealers in the ever growing over-the-counter derivatives markets, their trading accounts have grown rapidly and become progressively more complex. Thus the structural models can have exposures to several hundred thousand market risk factors, with individual positions numbering in the millions.19 Therefore, approximations are employed to reduce computational burdens and overcome estimation hurdles in order to turn out daily VaRs for trading risk. It is required by regulation that the daily VaR estimates are maintained by the banks for the purpose of forecast evaluation or "backtesting". Because of their proprietary nature, there has been little empirical study of the P&L of trading firms and the associated VaR forecasts.

19 See, for example, page 71 of the 2004 Annual Report of JPMorgan Chase & Co.
5.2. Backtest of bank’s internal VaR model using expected
shortfall
Some assumptions are first made regarding the banks' trading P&L. Let Rt denote the bank's trading return at day t, and assume that Rt ∼ N(μt, σ²). The bank's trading returns are thus allowed to have a time-varying expected mean, due to daily changes in the composition of the portfolio, but the variance is assumed to be constant. Though μt and σ are not known, the reported mean violation is calculated in excess of VaR, and the banks' P&L is standardized to have unit variance in order to protect confidentiality. Hence, I can easily obtain the sample ES as ESn = x̄e + z0.01, where x̄e is the mean violation in excess of VaR given by Berkowitz and O'Brien (2002) and z0.01 = 2.326 is the absolute value of the 1% normal quantile.

Table 4 lists for the six large US banks the number of observations, the number of exceptions, the mean violation (or the sample ES) and the corresponding p-value calculated using (9). Panel A provides the backtest results for the full sample period from January 1998 to March 2000. To illustrate how the p-value depends on the number of exceptions, consider for instance the case of Bank 2, which has six losses exceeding its internal VaR. According to (9) with n = 6, the corresponding p-value is evaluated as 0.0050. Thus I infer at the 1% significance level that there is excessive trading risk at Bank 2.20 As another example, the mean violation of Bank 5 is 3.101 based on one exception. Since in this case the corresponding p-value is 0.0964, the null hypothesis is accepted and the risk model is said to "cover" the risk as measured by ES, the average of tail losses. As can be seen from the backtest results in Panel A, four out of six banks exhibit excessive or fatter-than-normal market risk at the 1% significance level. This is consistent with the observation by Berkowitz and O'Brien that the banks' internal VaRs are conservative; but when exceptions occur, they tend to exceed the VaR forecasts drastically.
20 Alternatively, the sample ES of 3.067 is said to exceed, at the 1 percent significance level, the theoretical ES, which equals 2.6652 under the null hypothesis.
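To make the n = 1 case concrete, the conditional distribution function of Section 4.1 reproduces Bank 5's p-value directly (the inputs are from Table 4; the arithmetic below is mine):

$$p = P(X \le -3.101) = \frac{\Phi(-3.101)}{\Phi(-2.3263)} = \frac{0.000964}{0.01} \approx 0.0964.$$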
Berkowitz and O'Brien also find that most of the exceptions took place between August and October of 1998, a turbulent period for financial markets.21 They provide the relevant data on mean violations and numbers of exceptions, but do not carry out backtests on them for this period. Backtests that rely on large sample asymptotic statistics cannot be applied here because the sample is too small for reliable inference. The advantage of the small sample asymptotic technique becomes obvious in this situation: regardless of the length of observations, the proposed backtest of ES can be applied as long as there are exceptions. It can be seen from the backtest results presented in Panel B that banks that fail the backtest over the full sample also fail the test here. I therefore infer that the excessive trading risks come from this period of extreme market movements.

21 Between August and October of 1998, there were the devaluation of the Russian ruble, the Russian debt default and the near-collapse of LTCM.
6. More responsive risk management decision
In this section, I provide a simple illustration of how the proposed ES backtesting can make risk management decisions more responsive. In the study by Berkowitz and O'Brien (2002), a normal GARCH(1,1) model is used as an alternative to the banks' VaR models. Since the time series model makes no attempt to account for changes in portfolio composition, the bank's structural model should in principle deliver superior VaR forecasts. Berkowitz and O'Brien find, however, that the normal GARCH model permits comparable risk coverage with lower VaRs, that is, less regulatory capital. They attribute this to the ability of the time series model to capture trend and time-varying volatility in a bank's P&L. Since time-varying volatility as specified by a normal GARCH also generates fat tails, it is interesting to see how effective the model is in accounting for the non-normal trading risk in the equity markets. Specifically, I fit the following AR(1)-GARCH(1,1) risk model to S&P 500 daily index returns:
$$R_t = \mu_t + \epsilon_t = \mu + \rho R_{t-1} + \sigma_t Z_t,$$
$$\sigma_t^2 = \omega + a\epsilon_{t-1}^2 + b\sigma_{t-1}^2,$$

where Rt is the S&P 500 return at day t and Zt is an IID standard normal process. Conditional on information available at t − 1, the one-day-ahead VaR is forecasted as VaRt = 2.33σ̂t − μ̂t. For n > 0 realized exceptions up to time t, the sample expected shortfall is computed as

$$\mathrm{ES}_n = -\frac{1}{n}\sum_{i=1}^{n} \hat{Z}_i,$$

where the standardized exception is given by Ẑt = σ̂t^{-1}(Rt − μ̂t). Since the AR(1)-GARCH(1,1) risk model assumes an IID standard normal innovation process Zt, the null hypothesis will hold if the risk model fully describes the S&P 500 return dynamics.
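A sketch of the resulting forecasting recursion (my own code; the parameter values would come from the rolling estimation described in footnote 22 and are placeholders here):

```python
import numpy as np

def standardized_exceptions(r, mu, rho, omega, a, b):
    """One-step-ahead AR(1)-GARCH(1,1) VaR forecasts; returns the
    standardized exceptions Z_t = (R_t - mu_t) / sigma_t for the ES test."""
    var = omega / (1 - a - b)              # initial conditional variance
    exc = []
    for t in range(1, len(r)):
        mean_t = mu + rho * r[t - 1]       # conditional mean mu_t
        sigma_t = np.sqrt(var)
        var99 = 2.33 * sigma_t - mean_t    # VaR_t = 2.33 sigma_t - mu_t
        if r[t] < -var99:                  # exception: standardize it
            exc.append((r[t] - mean_t) / sigma_t)
        var = omega + a * (r[t] - mean_t)**2 + b * var
    return np.array(exc)
```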
To see how the saddlepoint ES backtesting can make risk management decisions responsive, suppose for the sake of illustration that we start the risk forecast from 1 August 1998.22
[Fig. 2. Prompt decision based on backtest of ES. This figure depicts the sample ES or mean violation since 1 August 1998, together with the associated 5% and 1% critical values. The five exceptions occur on 25 August 1998, 27 August 1998, 25 November 1998, 19 March 1999 and 16 July 1999. To illustrate how the proposed ES backtest can be a basis for prompt decision, it is supposed that the exception on 25 August 1998 is the first to occur in 12 months. The first exception lies between the 5% and 1% critical values. When the second exception occurs, the mean violation falls below the 1% critical value and remains so for the rest of the year. In this example, the risk model is rejected at the 5% significance level when the first exception occurs; the rejection is at the 1% significance level from the second exception onwards.]
As it turns out, there are 5 exceptions in the following twelve months ending on 31 July 1999. Fig. 2 depicts the series of mean violations (or sample ES), with the first exception realized on 25 August 1998. For the first exception, n = 1 and the sample ES lies below the 5% critical value and is thus significant at the 5% level. When the second exception occurs, n = 2 and the mean violation becomes significant at the 1% level, an even stronger message that the normal GARCH model should be rejected. Therefore, the proposed ES backtesting facilitates a prompt risk management response because accurate critical values can be calculated for any number of exceptions.

22 3,000 observations prior to 1 August 1998 are first used to estimate the risk model and ten one-step-ahead VaRs are forecasted. The estimation window is then rolled forward ten trading days and a new set of model parameters is obtained, which is in turn used to forecast one-step-ahead VaR for the next ten trading days. The process is continued until 31 July 1999 is reached.
7. Conclusion
Berkowitz and O'Brien (2002) find that banks' internal VaR models tend to be conservative, but the losses are huge when exceptions occur. This motivates backtesting banks' trading risk using expected shortfall (ES). The backtest of ES, however, is difficult to implement since the existing censored Gaussian and functional delta methods rely on asymptotic test statistics that converge poorly for small samples. The proposed saddlepoint backtest employs a small sample asymptotic technique, which Monte Carlo simulation shows is highly accurate and powerful even when the sample size is small. Conditional on the reported number of exceptions and mean violation for the period from January
1998 to March 2000, I find that four out of the six banks studied by Berkowitz and O'Brien have excessive market risk. Moreover, since the proposed backtest depends on the number of exceptions but not on the sample size, the saddlepoint backtest can also be applied to the three-month sub-sample period from August to October 1998, when financial markets were extremely volatile. It is found that banks that have significant non-normal market risk in the full sample also fail the sub-sample backtests, suggesting that most of the detected tail risks can be attributed to the volatile financial markets during this period.
The proposed saddlepoint technique can compute under the null hypothesis the required p-value accurately, conditional on any number of exceptions. This has two important implications. First, when there are huge losses, even though there are just one or two of them, the risk model can be evaluated on a formal statistical basis. This enables prompt action to be taken by risk managers or regulators if necessary. Second, the proposed backtest can be used alongside the existing VaR measure. Since VaR has been sanctioned by the Basle Committee and various central banking authorities, it is now the standard risk measure. Hence it is useful for backtesting purposes to calculate ES based on the losses that exceed the forecasted VaR. This means the number of exceptions will vary from sample to sample, which poses no problem for the proposed backtest using the small sample asymptotic technique.
In short, the risk management process is about communicating various aspects of risk effectively and condensing complex risk factors into simple, understandable terms. To this end, VaR has demonstrated itself to be a successful risk measure. However, being a quantile statistic, VaR ignores valuable information contained in the sizes of losses beyond the VaR level. By averaging the sizes of exceptions, ES communicates the degree of non-normal or tail risk in a simple way to investors, risk managers and regulators. The shortcoming that holds back ES is its backtesting, especially when the sample size is small. This
paper overcomes this shortcoming by using the saddlepoint technique to compute under the null hypothesis
the required p-value of the test statistic. As the Basle Committee on Banking Supervision (1996b, p. 2) recognizes that "the techniques for risk measurement and backtesting are still evolving, and the Committee is committed to incorporating important new developments in these areas into its framework," it is hoped that the proposed ES backtesting will help raise the standard of future risk management.
Acknowledgment
The author would like to thank participants at The 5th NTU International Conference on Economics, Finance and Accounting – Financial Institutions and Capital Market, Taipei, Taiwan, May 23–24, 2007, for their suggestions and comments.
Appendix
Proof of Proposition 1. Consider the moment generating function

$$M(t) = E(e^{tX}) = \int_{-\infty}^{q} e^{tx} f(x)\,dx,$$

which, after substituting α^{-1}φ(x) for f(x) in the integral term, collecting the exponential terms and simplifying, gives rise to (3). For the derivatives of the moment generating function, first denote the differential operator d/dt by Dt. Using the standard results that DtΦ(t) = φ(t) and Dt exp(t²/2) = t exp(t²/2), the first derivative of the moment generating function can be written as

$$M'(t) = t\,M(t) - \exp(t^2/2)\,\alpha^{-1}\phi(q - t) = t\,M(t) - \exp(qt)\,\alpha^{-1}\phi(q).$$

Second and higher order derivatives of the moment generating function given by (4) follow straight away from differentiating the above result. □

Proof of Corollary 1. Letting t = 0 in M'(t) and M''(t) from (4) gives E(X) and E(X²), respectively. It is then easy to obtain the variance of X since var(X) = E(X²) − [E(X)]². □

References
Artzner, P., Delbaen, F., Eber, J.M., Heath, D., 1997. Thinking
coherently. Risk 10 (11), 68–71.
Artzner, P., Delbaen, F., Eber, J.M., Heath, D., 1999. Coherent measures
of risk. Mathematical Finance 9 (3), 203–228.
Basle Committee on Banking Supervision, 1996a. Amendment to the
Capital Accord to incorporate market risk.
Basle Committee on Banking Supervision, 1996b. Supervisory Framework
for the use of ‘‘Backtesting” in conjunction with the Internal Models
Approach to Market Risk Capital Requirements.
Berkowitz, J., 2001. Testing density forecasts, with applications to risk
management. Journal of Business and Economic Statistics 19, 465–
474.
Berkowitz, J., O’Brien, J., 2002. How accurate are Value-at-Risk models
at commercial banks? Journal of Finance 57, 1093–1111.
BIS Committee on the Global Financial System, 1999. A Review of
Financial Market Events in Autumn 1998, October.
Campbell, R., Koedijk, K., Kofman, P., 2002. Increased correlation in
bear markets. Financial Analysts Journal 58, 87–94.
Daniels, H.E., 1954. Saddlepoint approximations in statistics. Annals of
Mathematical Statistics 25, 631–650.
Daniels, H.E., 1987. Tail probability approximations. International
Statistical Review 55, 37–48.
Eberlein, E., Keller, U., Prause, K., 1998. New insights into smile,
mispricing and value at risk: the hyperbolic model. Journal of Business
71, 371–406.
Jorion, P., 2000. Risk management lessons from Long-Term Capital
Management. European Financial Management 6, 277–300.
Kerkhof, J., Melenberg, B., 2004. Backtesting for risk-based regulatory
capital. Journal of Banking and Finance 28, 1845–1865.
Kupiec, P.H., 1995. Techniques for verifying the accuracy of risk
measurement models. Journal of Derivatives 3, 73–84.
Longin, F., Solnik, B., 2001. Extreme correlation of international equity
markets. Journal of Finance 56, 649–676.
Lotz, C., Stahl, G., 2005. Why value at risk holds its own? Euromoney,
February 2005, London.
Lugannani, R., Rice, S.O., 1980. Saddlepoint approximation for the distribution of the sum of independent random variables. Advances in Applied Probability 12, 475–490.
McNeil, A., Frey, R., Embrechts, P., 2005. Quantitative Risk Management:
Concepts, Techniques and Tools. Princeton University Press, Princeton.
Rappoport, P., 1993. A new approach: Average shortfall. J.P. Morgan
Fixed Income Research Technical Document.
Shao, J., 1998. Mathematical Statistics. Springer-Verlag, New York.
Szego, G., 2002. Measures of risk. Journal of Banking and Finance 26,
1253–1272.
Yamai, Y., Yoshiba, T., 2005. Value-at-risk versus expected shortfall: A practical perspective. Journal of Banking and Finance 29, 997–1015.