Journal of Banking & Finance 32 (2008) 1404–1415
www.elsevier.com/locate/jbf

Backtesting trading risk of commercial banks using expected shortfall

Woon K. Wong
Investment Management Research Unit, Cardiff Business School, Aberconway Building, Colum Drive, Cardiff CF10 3EU, United Kingdom

Received 8 January 2007; accepted 19 November 2007; available online 19 February 2008
[This paper was partially completed while the author was with the Department of Banking and Finance at Tamkang University, Taiwan.]

Abstract

This paper uses the saddlepoint technique to backtest the trading risk of commercial banks using expected shortfall. It is found that four out of six US commercial banks have excessive trading risks. Monte Carlo simulation studies show that the proposed backtest is very accurate and powerful even for small test samples. More importantly, risk managers can carry out the proposed backtest based on any number of exceptions, so that incorrect risk models can be detected promptly, before further huge losses are realized.

JEL classification: G32

Keywords: Value-at-Risk; Expected shortfall; Backtesting; Saddlepoint technique

1. Introduction

Since Value-at-Risk (VaR) was sanctioned by the Basle Committee in 1996 for market risk capital requirements through the so-called internal models, it has become the standard measure of financial market risk. Major commercial banks and other financial entities such as hedge funds disclose VaR on a regular basis, and risk management consulting firms and software systems have been set up to meet the demand generated by VaR-based risk management.

Despite the popularity of VaR, many commercial banks and other financial institutions suffered severe trading losses during the fall of 1998. This prompted a report by the BIS Committee on the Global Financial System in 1999, which remarked that "last autumn's events were in the 'tails' of distributions and that VaR models were useless for measuring and monitoring market risk." Such risk in the use of VaR is termed "tail risk" in the literature; see Yamai and Yoshiba (2005).

Given the role of commercial banks as principal dealers in the ever-growing over-the-counter derivatives markets, their trading accounts have grown rapidly and become progressively more complex. Though most positions are hedged, during times of extreme market movements the correlations between various derivatives and their hedged counterparts may break down, resulting in the failure of VaR models to measure and monitor market risk; see for example Jorion (2000). Moreover, Longin and Solnik (2001) and Campbell et al. (2002) provide evidence of increased correlation in bear markets, with reduced diversification and heavier capital losses. It is therefore important to use a risk measure that takes into account the extreme losses beyond VaR, and expected shortfall (ES), proposed by Artzner et al. (1997, 1999), is an alternative risk measure that remedies this shortcoming of VaR.
Essentially, VaR is a quantile measure which, at a given confidence level, describes the loss that can occur over a given period due to exposure to market risk. ES is the expected loss conditional on the loss being above the VaR level. Despite the fact that ES is a coherent risk measure whereas VaR is not subadditive [2], the former is absent in Basel II. One main reason for this is that ES is harder to backtest than VaR; see Kerkhof and Melenberg (2004, footnote 8). For instance, the test statistics of the existing ES backtests, namely the censored Gaussian approach of Berkowitz (2001) and the functional delta approach of Kerkhof and Melenberg (2004), rely on large samples for convergence to their limiting distributions. This is undesirable because test samples tend to be small in practice, and regulation requires the backtest to be carried out on the past 250 observations.

In this paper, I propose to use the saddlepoint, or small sample asymptotic, technique to compute, under the null hypothesis, the required p-value of the sample expected shortfall. The advantage of the saddlepoint technique is that the required p-value can be calculated accurately for any given number of exceptions [3] under the null hypothesis. This is of great significance because the Capital Accord stipulates VaR at the 99% level, which implies that exceptions rarely occur in practice. Monte Carlo simulation confirms that the proposed technique is not only very accurate but also very powerful even for small samples.

Current VaR backtesting as stipulated by the Basle Committee focuses on checking the number of violations of the VaR level. While such a backtesting procedure has the advantage of being applicable to any distribution, it ignores valuable information conveyed by the sizes of the losses exceeding the VaR level, and thus lacks statistical power. Indeed, the report by the Basle Committee on Banking Supervision (1996b, page 5) acknowledges that "tests of this type are limited in their power to distinguish an accurate model from an inaccurate model...This limitation has been a prominent consideration in the design of the framework presented here, and should also be prominent among the considerations of national supervisors in interpreting the results of a bank's backtesting program." Therefore, various methods of ES backtesting, notably by Berkowitz (2001) and Kerkhof and Melenberg (2004), have been proposed. Though these ES backtests make parametric assumptions about the null distribution, considerable power is gained [4].

The proposed saddlepoint backtest of ES, however, has several advantages over these two existing approaches. First, as mentioned above, both existing ES backtests rely on asymptotic test statistics that might be inaccurate when the sample size is small.

[2] A coherent risk measure is subadditive (the sum of individual risks cannot be smaller than the risk of the sum), homogeneous (larger positions are associated with greater risk), monotonic (systematically lower returns imply greater risk), and translation invariant (investing in the risk-free asset lowers the risk of the portfolio). See, for example, Artzner et al. (1997, 1999) and Szego (2002) for more details.
[3] In this paper, an exception always means an exception or exceedance of VaR. For brevity, the same presumption is made for a 'violation' and a 'breach' of VaR.
[4] For a distribution-free test, readers are referred to Section 4.4.3 in McNeil et al. (2005).
If the Basle Committee were to employ the backtest of ES to determine risk capital requirements, such a lack of accuracy might be unacceptable, because banks could be penalized on the basis of incorrect inference. Second, not only does the Monte Carlo simulation study show that the saddlepoint backtest has correct size, it is also more powerful than the two existing ES backtests. Finally, and perhaps most importantly from a practical risk management viewpoint, the proposed saddlepoint technique allows one to compute the required p-value conditional on the number of exceptions rather than on the size of the full sample. This means that it is possible to detect the failure of a risk model based on just one or two exceptions, before any more data are observed. This is extremely useful, since prompt action is often required in order to avert extreme financial losses due to market risk.

The proposed backtest of ES is used to test for non-normal or excessive trading risk at the six commercial banks studied by Berkowitz and O'Brien (2002). For the sample banks, internal VaR is calculated using a structural model which takes into consideration all positions in the trading portfolio. Since the saddlepoint technique is conditional on the number of realized exceptions and not on the sample size, ES backtests based on the banks' internal VaR models are carried out for the full sample period from January 1998 to March 2000, as well as for the three-month sub-sample period from August 1998 to October 1998 (when the financial markets were highly volatile). In both cases, four out of the six commercial banks' trading accounts are found to have sample ES that is significantly larger than a normal null hypothesis would suggest. This serves as a warning for risk modelling based on the normal distribution, which is quite common in practice; see for example Lotz and Stahl (2005).

This paper is organized as follows: Section 2 introduces the sample ES statistic for backtesting; the saddlepoint technique for calculating the p-value of the proposed backtest is derived in Section 3; Section 4 presents the Monte Carlo simulation results, whereas Section 5 applies the proposed backtest to the trading accounts of six large banks in the US; Section 6 briefly illustrates how the proposed ES backtesting could make risk management more responsive; finally, Section 7 concludes.

2. Backtesting expected shortfall for tail risk

In this section, some preliminary definitions of VaR and ES are first provided, followed by the definition of the sample ES statistic to be used in backtesting. The difficulties of existing methods for backtesting ES and the motivation for the use of the small sample asymptotic technique are then explained.

2.1. Some preliminary definitions

Let R be the random profit or loss of a portfolio over a holding period. For simplicity, R is assumed to have an absolutely continuous cumulative distribution function (CDF). Then, for 0 < a < 1, VaR is defined as the maximum potential loss realized by R over the holding horizon at the (1 − a) confidence level. If F(r) is the CDF of R, a formal definition of VaR can be written as

VaR = −q(a) = −F⁻¹(a).

That is, VaR may be regarded as the negative of the left a-quantile of the distribution, which will simply be denoted q from now onwards. Let f(r) denote the probability density function (PDF) of R.
ES at the (1 − a) confidence level is then defined by

ES = −E(R | R < q) = −a⁻¹ ∫_{−∞}^{q} r f(r) dr.

ES is appealing in that it sums all values of r, weighted by f(r), from minus infinity to q, and thus takes into account the sizes of the losses beyond the VaR level. Economically, it tells us the average loss when VaR is violated. The idea of ES for risk management first appeared in the financial industry in Rappoport (1993); it was formally proposed by Artzner et al. (1997) as a coherent risk measure.

2.2. Sample statistic for backtesting expected shortfall

Now consider the sample ES statistic which will be used for backtesting. Given a sample R_1, ..., R_T with n > 0 exceptions, I use the simple average as an estimator of the true ES:

ES_n = −(1/n) Σ_{i=1}^{n} R_(i),    (1)

where, for i = 1, ..., T − 1, R_(i) is the order statistic such that R_(i) ≤ R_(i+1), and the subscript n denotes the fact that the sample expected shortfall is based on n observations. Given a sample of size T, one may set n = [Ta] [5] if ES is to be estimated ex post using a fixed number of exceedances. However, such a definition is inconsistent with the current VaR backtesting procedure stipulated by the Basle Committee, in which an exception is realized if the loss is larger than the associated VaR forecast. Since VaR is already in common usage as a risk measure, it is sensible for backtesting purposes to measure ES as the mean of the exceptions, in accordance with the Capital Accord. Therefore, n in (1) refers to the number of VaR violations, which varies from sample to sample. Moreover, n is a small number, since the Basle Committee sets a = 0.01 because smaller day-to-day fluctuations are of less concern. The backtest of ES is thus rendered difficult by the fact that a breach of VaR is rare in practice.

[5] [x] denotes the largest integer less than or equal to x.

Though the functional delta approach [6] advocated by Kerkhof and Melenberg (2004) somewhat improves the speed of convergence of (1) to its limiting distribution, backtests of risk models are often required to be carried out on small samples. This would be undesirable if the Basle Committee were to use the backtest of ES to determine the risk-based capital requirement: raising the multiplication factor on the basis of an inaccurate p-value would not be fair to the banks.

A similar small sample problem occurs with the censored Gaussian likelihood ratio approach [7] suggested by Berkowitz (2001). Moreover, this approach is a two-sided test of the joint hypothesis that the mean and variance conform to those of a standard normal distribution. Since the null hypothesis could be rejected on the grounds that the observed tail is too thin, or that there is too much capital to cover the risk, the test (being two-tailed) is not suitable for regulatory backtesting. Financial fluctuations are by their nature very complicated, and it is likely that a true and perfect model can never be found. While a risk manager may not know the perfect model, she can always allow for possible inadequacy in her risk model by marking up the risk forecast. Such practice is evident from Berkowitz and O'Brien (2002), who observe that banks' VaR forecasts are conservative. This is also in line with the one-tailed definition and the corresponding backtesting procedure of VaR in the Capital Accord; see Basle Committee on Banking Supervision (1996a).
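To make the estimator concrete, the following is a minimal Python sketch of the sample ES in (1) under the Capital Accord's exception definition, where an exception is a loss exceeding the day's VaR forecast. It assumes NumPy, and the function name sample_es is illustrative rather than anything defined in the paper:

```python
import numpy as np

def sample_es(pnl, var_forecasts):
    """Sample ES of (1) under the Capital Accord definition: the average loss
    on the days when the loss exceeds that day's VaR forecast.

    pnl           : daily profit and loss (losses are negative)
    var_forecasts : positive one-day VaR numbers of the same length
    Returns (ES_n, n); ES_n is None when there is no exception."""
    pnl = np.asarray(pnl, dtype=float)
    var = np.asarray(var_forecasts, dtype=float)
    exceptions = pnl[pnl < -var]          # exceptions: losses beyond VaR
    n = int(exceptions.size)
    if n == 0:
        return None, 0
    return -float(exceptions.mean()), n   # negate: ES is quoted as a positive loss
```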
Unlike the functional delta and censored Gaussian approaches, the proposed saddlepoint technique makes use of a small sample asymptotic distribution that works reasonably well even for n = 1 [8]. The key idea is that, under fairly general conditions, a full expansion of the true density of (1) can be derived by the method of steepest descent, whence the name saddlepoint; see Daniels (1954). The Monte Carlo simulation study in Section 4 shows that for any realized number of exceptions n > 0, the saddlepoint technique can accurately calculate the p-value of the backtest of ES. This has a very important implication: in backtesting for a possible breakdown of a risk model, a decision can be formally based on as little as one or two exceptions.

[6] The ES estimator given by (1) may be regarded as a "functional" of the "statistic" which is the sample distribution. In the same way that Taylor's expansion can be used to establish the asymptotic normality of a differentiable function of statistics that are asymptotically normal, the functional delta approach can be used to establish the asymptotic normality of (1), since the sample distribution converges to the true distribution as the sample size tends to infinity; see Shao (1998, chapter 5).
[7] In this approach, all losses that are smaller than VaR are set equal to VaR, giving rise to a censored distribution.
[8] When n = 1, a precise CDF can be obtained simply by using the conditional distribution function; see the first paragraph of Section 4.1 for the case of the normal distribution.

3. Saddlepoint technique

Since the Basle Committee sets the chance of extreme loss at 1% for market risk, the number of exceptions in a year is small and the usual asymptotic result approximates the true distribution poorly. This section adopts the saddlepoint technique to derive the small sample asymptotic distribution of the sample ES statistic given by (1) under a standard normal null hypothesis. Monte Carlo simulations in Section 4 show that the technique performs very well and that the proposed backtest of ES has good power in detecting non-normal trading risk or incorrect risk models.

3.1. The ES random variable and its moments

Let R be the portfolio return, which has a standard normal distribution with CDF and PDF denoted by Φ and φ, respectively. From now onwards, unless stated otherwise, q refers to the a-quantile of the standard normal distribution, that is, q = Φ⁻¹(a). Define a random variable X such that

P(X ≤ x) = P(R ≤ x | R < q) = P(R ≤ x) / P(R < q),    (2)

where x < q. Thus X is the portfolio return conditional on R < q, and the theoretical expected shortfall of R at confidence level 1 − a is given by −E(X). Given a sample X_1, ..., X_n of exceedances of VaR, the sample expected shortfall can be written as

ES_n = −X̄ = −(1/n) Σ_{i=1}^{n} X_i.

The PDF of X is given by f(x) = a⁻¹ φ(x), and the corresponding CDF by F(x) = a⁻¹ Φ(x), where x ∈ (−∞, q). Now consider the following results on the moments of f(x), which will be useful for the rest of this section.

Proposition 1. Let X be a continuous random variable with density f(x) = a⁻¹ φ(x), x ∈ (−∞, q). The moment generating function of X is then given by

M(t) = a⁻¹ exp(t²/2) Φ(q − t),    (3)

and its derivatives with respect to t are given by

M′(t) = t M(t) − exp(qt) a⁻¹ φ(q),
M″(t) = t M′(t) + M(t) − exp(qt) q a⁻¹ φ(q),    (4)
M⁽ᵐ⁾(t) = t M⁽ᵐ⁻¹⁾(t) + (m − 1) M⁽ᵐ⁻²⁾(t) − exp(qt) qᵐ⁻¹ a⁻¹ φ(q),

where m ≥ 3.

Proof.
See Appendix. □

As a corollary of Proposition 1, the mean and variance of X are obtained easily.

Corollary 1. The mean of X, which is the negative of ES, is given by

μ_X = E(X) = −φ(q)/a = −e^{−q²/2} / (a√(2π)),    (5)

and its variance is

σ²_X = var(X) = 1 − q φ(q)/a − μ²_X = 1 − q e^{−q²/2} / (a√(2π)) − μ²_X.    (6)

Proof. See Appendix. □

Though √n σ_X⁻¹(X̄ − μ_X) is asymptotically standard normal, the asymptotic result is hardly valid for sample sizes one would encounter in practice. In particular, a is set by the Basle Committee at 0.01, so n is about 2 or 3 for a year's data. For this reason, I resort to the Lugannani and Rice (1980) formula derived by the saddlepoint technique in order to compute accurately, under the null hypothesis, the required p-value for the backtest of ES.

3.2. The Lugannani and Rice formula

Using the saddlepoint technique, Lugannani and Rice (1980) provide an elegant formula that accurately calculates the tail probability (or CDF) of the sample mean of an independently and identically distributed (IID) random sample X_1, ..., X_n from a continuous distribution F with density f. For an intuitive understanding of the method, consider the inversion formula

f_X̄(x̄) = (n/2π) ∫_{−∞}^{∞} exp{n[K(it) − it x̄]} dt,

where f_X̄(x̄) denotes the PDF of the sample mean and K(t) = ln M(t) is the cumulant generating function of f(x). The tail probability can then be written as

P(X̄ > x̄) = ∫_{x̄}^{q} f_X̄(u) du = (1/2πi) ∫_{s−i∞}^{s+i∞} exp{n[K(t) − t x̄]} t⁻¹ dt,    (7)

where s is chosen to be the saddlepoint ω satisfying

K′(ω) = x̄.    (8)

By an appropriate transformation, the elegant Lugannani and Rice formula is obtained, which is given below without proof [9].

Proposition 2. Given an IID standard normal random sample R_1, ..., R_T with n exceptions, construct X_1, ..., X_n according to (2). Let ω be the saddlepoint satisfying (8), and define η = ω√(n K″(ω)) and ζ = sgn(ω)√(2n(ω x̄ − K(ω))), where sgn(s) equals zero when s = 0, and takes the value −1 (+1) when s < 0 (s > 0). Then the probability that the sample mean is less than or equal to x̄ ≠ μ_X [10] is given by

P(X̄ ≤ x̄) = Φ(ζ) − φ(ζ) (1/η − 1/ζ) + O(n^{−3/2})   for x̄ < q,
P(X̄ ≤ x̄) = 1   for x̄ ≥ q.    (9)

[Fig. 1. Saddlepoint approximations. The figure plots the CDFs calculated by the saddlepoint technique for sample sizes n = 1, 2 and 5, with the standard normal CDF as a benchmark; as the sample size gets large, the CDF of the normalized test statistic tends to that of a standard normal.]

To obtain the tail probability, the saddlepoint ω is first found by solving for t in the equation [11]

K′(t) = M′(t)/M(t) = t − φ(q − t)/Φ(q − t) = x̄.    (10)

Next, (3) and (4) are used to calculate M(ω) and its first two derivatives at t = ω, which in turn are used to evaluate K(ω) and K″(ω). It is then straightforward to calculate η and ζ as defined in Proposition 2, and hence the required probability.

Fig. 1 depicts the theoretical CDF of the standardized statistic z = n^{1/2} σ_X⁻¹(x̄ − μ_X) calculated using the Lugannani and Rice formula for various n. Since the formula is valid only when the sample mean is less than q, the CDF is assigned the value 1 when z > n^{1/2} σ_X⁻¹(q − μ_X).

[9] Readers are referred to Daniels (1987) for a proof of the Lugannani and Rice formula.
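The whole procedure can be sketched in a few lines of Python under the standard normal null. The sketch below assumes SciPy; es_pvalue and the helper names are illustrative, not the author's code. It implements the cumulant generating function from (3), solves (10) for the saddlepoint, evaluates the Lugannani and Rice formula (9), and falls back on the exact conditional CDF a⁻¹Φ(x̄) when n = 1:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import log_ndtr
from scipy.stats import norm

A = 0.01                  # tail probability a of the 99% VaR
Q = norm.ppf(A)           # left a-quantile q of the standard normal, ~ -2.3263
MU_X = -norm.pdf(Q) / A   # E(X) from (5), ~ -2.6652, so ES0 = 2.6652

def _mills(u):
    """phi(u)/Phi(u), evaluated in log space so large arguments don't underflow."""
    return np.exp(norm.logpdf(u) - log_ndtr(u))

def K(t):
    """Cumulant generating function K(t) = ln M(t), with M(t) from (3)."""
    return 0.5 * t * t + log_ndtr(Q - t) - np.log(A)

def K1(t):
    """K'(t) = t - phi(q - t)/Phi(q - t), the left-hand side of (10)."""
    return t - _mills(Q - t)

def K2(t):
    """K''(t), obtained by differentiating (10)."""
    lam = _mills(Q - t)
    return 1.0 - (Q - t) * lam - lam * lam

def es_pvalue(es_n, n):
    """p-value (11) = P(Xbar <= xbar) under the standard normal null, for a
    sample ES quoted as a positive loss and computed from n > 0 exceptions."""
    xbar = -es_n                  # Section 3.1 convention: ES_n = -Xbar
    if xbar >= Q:                 # mean exception not beyond the a-quantile
        return 1.0
    if n == 1:                    # exact conditional CDF for a single exception
        return norm.cdf(xbar) / A
    if np.isclose(xbar, MU_X):    # footnote 10 case: leading term is 1/2
        return 0.5
    # (10) has a unique root (footnote 11); the bracket below covers any
    # realistic mean violation
    w = brentq(lambda t: K1(t) - xbar, -40.0, 1e6)
    eta = w * np.sqrt(n * K2(w))
    zeta = np.sign(w) * np.sqrt(2.0 * n * (w * xbar - K(w)))
    # Lugannani and Rice formula (9)
    return norm.cdf(zeta) - norm.pdf(zeta) * (1.0 / eta - 1.0 / zeta)
```

As a check against the paper's own numbers, es_pvalue(3.101, 1) gives 0.0964 and es_pvalue(3.067, 6) gives approximately 0.0050, the Bank 5 and Bank 2 p-values reported in Table 4.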
As predicted by the central limit theorem, the finite sample CDF approaches the standard normal CDF as n gets large.

[10] Though the case x̄ = μ_X is hardly encountered in practice, the corresponding tail probability is provided for completeness:

P(X̄ ≤ μ_X) = 1/2 + K⁽³⁾(0) / (6√(2πn [K″(0)]³)) + O(n^{−3/2}).

[11] The saddlepoint ω is unique; see Daniels (1987).

3.3. Hypothesis testing

Given n realized exceptions, one can compute the sample expected shortfall ES_n = −X̄ and carry out a one-tailed backtest to check whether the sample ES is too large. The null and alternative hypotheses can be written as

H₀: ES_n = ES₀   versus   H₁: ES_n > ES₀,

where ES₀ denotes the theoretical expected shortfall under the null hypothesis. Under the standard normal null hypothesis with a = 0.01, ES₀ = −μ_X = 2.6652. The p-value of the hypothesis test is simply given by the Lugannani and Rice formula as

p-value = P(X̄ ≤ x̄).    (11)

The null hypothesis is rejected if ES_n is significantly larger than 2.6652, the value of ES₀ under the standard normal assumption. In this case, the risk model may be described as failing to "capture" the risk, or simply there is excessive risk in the empirical distribution. On the other hand, if the null hypothesis is accepted, one may say that the risk model provides sufficient risk coverage [12]. Finally, it is trivial that when there is no exception, that is n = 0, the null hypothesis is accepted.

Strictly speaking, the above hypothesis formulation tests whether a given sample of exceedances of VaR is consistent with the tail of the distribution under the null hypothesis, which in our case is the standard normal distribution. It cannot be overemphasized that it is possible to have non-normal distributions whose expected shortfalls are less than that of the standard normal distribution.

[12] Strictly speaking, if the null hypothesis is accepted, it means there is no evidence suggesting insufficient risk coverage.

3.4. Non-normal null hypothesis

In this paper, the backtest of ES is carried out under the null hypothesis that the underlying distribution is normal. This is because normality is widely assumed in practice [13] and, if the null hypothesis is a non-normal distribution, one can always follow the suggestion of Berkowitz (2001) and Kerkhof and Melenberg (2004) and transform the distribution into a normal one. The idea is that if the observed returns are fatter-tailed than the model presumes, the transformed data will also be fatter-tailed than the standard normal hypothesis.

It must be emphasized, however, that the saddlepoint technique can also be applied directly to a non-normal null hypothesis. Consider a null return distribution with absolutely continuous PDF f(x) and CDF F(x), so that F(q) = a, and define the exceedance variable X by (2). Then the moment generating function of X is given by

M(t) = e^{qt} − a⁻¹ t ∫_{−∞}^{q} F(x) e^{tx} dx,

and its first two derivatives are

M′(t) = q e^{qt} − a⁻¹ ∫_{−∞}^{q} F(x) e^{tx} (1 + xt) dx,
M″(t) = q² e^{qt} − a⁻¹ ∫_{−∞}^{q} F(x) e^{tx} (2 + xt) x dx.

Once the saddlepoint is obtained from (8), these derivatives can be evaluated, and it is then straightforward to compute the required p-value using the Lugannani and Rice formula described in Section 3.2.

[13] Due to the working of the central limit theorem, trading account returns at the bank level are often approximated by a normal distribution in practice; see Lotz and Stahl (2005).
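As an illustration of the non-normal case, the moment generating function and its derivatives above can be evaluated by numerical quadrature. The sketch below (illustrative names, assuming SciPy) does this for an arbitrary continuous null and sanity-checks the result against the closed form (3) for the normal null. Note, as a caveat not discussed in the paper, that the integrals only converge at the negative t of a left-tail saddlepoint if the null's lower tail is light enough, which is one reason the normality transformation is convenient in practice:

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

A = 0.01

def mgf_numeric(F, q, t):
    """M(t), M'(t) and M''(t) of the exceedance variable for a continuous
    null with CDF F and a-quantile q, via the integral formulas above.
    The integrals diverge at negative t for power-law lower tails
    (e.g. a raw Student t), so those nulls should first be transformed."""
    I = lambda g: integrate.quad(g, -np.inf, q)[0]
    m0 = np.exp(q * t) - (t / A) * I(lambda x: F(x) * np.exp(t * x))
    m1 = q * np.exp(q * t) - (1 / A) * I(lambda x: F(x) * np.exp(t * x) * (1 + x * t))
    m2 = q * q * np.exp(q * t) - (1 / A) * I(lambda x: F(x) * np.exp(t * x) * (2 + x * t) * x)
    return m0, m1, m2

# sanity check against the closed form (3) for the standard normal null
q = norm.ppf(A)
m0, _, _ = mgf_numeric(norm.cdf, q, -0.5)
assert np.isclose(m0, np.exp(0.125) * norm.cdf(q + 0.5) / A)
```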
4. Monte Carlo simulations

In this section, Monte Carlo simulation experiments are carried out to investigate the empirical test sizes and powers of various backtests. Only the 99% confidence level VaR and ES are considered. IID samples of various sizes under the null and various alternative hypotheses are generated, and the corresponding p-values are calculated. 100,000 simulations are used to calculate the empirical test sizes, and Bernoulli likelihood ratio tests are applied to check whether they differ significantly from the theoretical test sizes. For powers of tests, only rejection rates at the 5% significance level based on 10,000 simulations are reported. It is found that the proposed backtest using the saddlepoint technique yields very accurate test sizes and has the most power in rejecting false models.

4.1. Empirical test sizes of ES backtest using saddlepoint technique

Here I first investigate the empirical test sizes of ES backtesting using the saddlepoint technique for numbers of exceptions n = 1, 2, 5 and 10 under the standard normal null hypothesis. For each simulation, rejection is made if the p-value calculated using (9) is less than the various significance levels, ranging from 0.5% to 5%, on both sides of the distribution [14]. Panel A of Table 1 reports the rejection rates for both tails based on 100,000 simulations. Bernoulli likelihood ratio tests are carried out, and a superscript 'a' indicates that the empirical test size differs significantly at the 1% level from its theoretical value. It can be seen that only the empirical test sizes in the far right tail show incorrect rejection rates, and only when n = 1. This is nevertheless unimportant for two reasons. First, backtesting for the purpose of risk management concerns only the left tail. Second, for n = 1, one can easily obtain a precise CDF by using the conditional distribution function a⁻¹Φ(x) under the null hypothesis. From now onwards, when n = 1, the corresponding p-value and critical value are calculated using the conditional distribution function.

Since the number of exceedances varies from sample to sample in practice, Panel B lists the empirical test sizes for standard normal samples of sizes ranging from 125 to 1000. Only samples with at least one exception are considered. For example, for samples of size T = 250, if the first simulation yields 3 exceptions, the required p-value is calculated using (9) with n = 3; if the second simulation yields 2 exceptions, the required p-value is obtained with n = 2; if the third simulation does not produce any exception, no p-value is calculated and the simulation is abandoned. When 100,000 simulations are completed, the empirical test size is obtained by dividing the number of rejections by the number of nonzero-exception simulations [15]. The accuracy of the saddlepoint technique is again demonstrated by the results in Panel B.

Finally, Panel C considers the case where the number of exceptions is not known (for example, a risk report that discloses only the mean violation but not the number of exceptions). In this case, the p-value can be calculated as

P(X̄ ≤ x̄) = B(0 | T) + Σ_{n=1}^{∞} P(X̄ ≤ x̄ | n) B(n | T),

where n is the number of exceptions and B(· | T) is the Binomial probability mass function with probability of success 0.01 and T trials. As can be observed from Panel C, the empirical test sizes differ from their true values only at the right tail when T is small. This is because for sample sizes T = 125, 250, 500 and 1000, B(0 | T) = 0.285, 0.081, 0.007 and 4.32 × 10⁻⁵, respectively. Hence, as long as the theoretical test size is larger than the corresponding zero-exception probability, the empirical test sizes are very close to the theoretical ones.

[14] Though only the left tail of the distribution is required for the backtesting of a risk model, the right tail is also included to illustrate the good approximation achieved by the saddlepoint technique.
[15] For the 99% confidence level (a = 0.01), the probability of getting a zero-exception sample is 0.99^T, where T is the sample size.
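The Panel C mixture is a short extension of the earlier es_pvalue sketch (again illustrative, reusing A and es_pvalue from above):

```python
from scipy.stats import binom

def es_pvalue_unknown_n(es_n, T, n_max=50):
    """Panel C variant: p-value when only the mean violation is disclosed,
    mixing es_pvalue over the Binomial(T, a) distribution of the count."""
    p = binom.pmf(0, T, A)           # zero-exception term B(0|T)
    for n in range(1, n_max + 1):    # the Binomial tail beyond n_max is negligible
        p += es_pvalue(es_n, n) * binom.pmf(n, T, A)
    return p
```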
Table 1
Empirical test sizes

          Left tail significance levels        Right tail significance levels
          0.005   0.01    0.025   0.05         0.005   0.01    0.025   0.05

Panel A: given a fixed number of VaR violations n
n = 1     0.50    0.99    2.46    4.86         2.83a   3.00a   3.26a   4.84
n = 2     0.51    1.03    2.47    4.97         0.53    1.00    2.46    4.99
n = 5     0.54    1.01    2.50    4.99         0.51    1.02    2.55    5.03
n = 10    0.49    1.01    2.51    5.02         0.51    1.03    2.56    5.08

Panel B: given the realized number of VaR violations n
T = 125   0.50    1.01    2.44    4.98         0.48    1.02    2.48    4.91
T = 250   0.54    1.07    2.54    4.98         0.49    1.02    2.52    4.96
T = 500   0.50    1.02    2.52    5.03         0.47    0.95    2.48    4.96
T = 1000  0.52    1.01    2.51    5.04         0.52    0.98    2.47    4.91

Panel C: given a fixed number of observations T, n unknown
T = 125   0.51    1.01    2.44    4.84         29.5a   29.5a   29.5a   29.5a
T = 250   0.51    1.00    2.50    5.03         8.84a   8.84a   8.84a   8.85a
T = 500   0.54    1.06    2.58    5.01         0.76a   1.00    2.50    4.98
T = 1000  0.49    1.02    2.52    5.02         0.49    0.99    2.44    4.92

This table reports the empirical test sizes of expected shortfall backtests based on the Lugannani and Rice (1980) formula using 100,000 Monte Carlo simulations. The expected shortfalls are calculated based on 99% VaR. Panel A reports the empirical test sizes given a fixed number of violations n = 1, 2, 5 and 10. For various sample sizes T, Panel B presents the observed test sizes when the realized number of violations is known (but varies from sample to sample), whereas Panel C provides the empirical rejection rates when the number of exceptions is not known.
a Empirical test size differs significantly from its theoretical size at the 1% level.

4.2. Empirical test sizes and powers of other backtests

The empirical test sizes of the other backtests are investigated first. This is followed by a study of their test powers when the samples are distributed as IID Student t or normal inverse Gaussian (NIG), or follow a GARCH process. ES-SDP, ES-Delta and ES-Censor refer to the backtests of ES using the saddlepoint, functional delta and censored Gaussian methods, respectively. These are size-based backtests, whereas the Bernoulli [16] and Kupiec (see Kupiec (1995)) tests are frequency-based backtests. As above, a superscript 'a' denotes an empirical test size that differs significantly from its theoretical level at the 1% level.

In Panel A of Table 2, standard normal samples are simulated 100,000 times and rejection is based on the 5% significance level. It can be seen that, except for ES-SDP, all the backtests show poor convergence to the true test size. While the reason for the poor convergence of the Bernoulli backtest lies in the discreteness of its test statistic (which counts the number of exceptions), all the other backtests rely on asymptotic statistics that converge poorly to the limiting distribution when the sample size is small. Such poor approximation is accentuated by the fact that a is set at 0.01 by Basel II, since extreme risk is the primary concern.
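For concreteness, the following sketch (reusing es_pvalue from the Section 3 sketch, left tail only, and reduced to 10,000 replications to keep the run short, versus 100,000 in the paper) reproduces the Panel B style size experiment for the saddlepoint backtest at T = 250:

```python
import numpy as np
from scipy.stats import norm

# Size experiment in the spirit of Table 1, Panel B: simulate standard normal
# samples of size T, backtest with the saddlepoint p-value, and record the
# rejection rate among samples with at least one exception.
rng = np.random.default_rng(0)
T, level, n_sims = 250, 0.05, 10_000
q = norm.ppf(0.01)

rejections = used = 0
for _ in range(n_sims):
    r = rng.standard_normal(T)
    exceptions = r[r < q]       # losses beyond the 99% VaR of the null
    if exceptions.size == 0:
        continue                # zero-exception samples are discarded
    used += 1
    if es_pvalue(-exceptions.mean(), exceptions.size) < level:
        rejections += 1

print(f"empirical size: {rejections / used:.4f} (nominal {level})")
```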
For the study of test powers, 10,000 simulations are used, and the rejection rates reported in Panels B, C and D of Table 2 are all based on the 5% significance level. In practice, financial time series are found to exhibit excess kurtosis, longer left tails and time-varying conditional variances. I use three non-normal processes to replicate this behaviour. First, Panel B considers the Student t distribution with 5 degrees of freedom, which is well known for its heavy tails but is symmetric. Next, in Panel C, the IID NIG samples have excess kurtosis and are skewed to the left. The parameters of the NIG distribution are α = 0.8, β = −0.05, δ = α(1 − ρ²)^{3/2} and μ = −ρδ(1 − ρ²)^{−1/2}, where ρ = β/α [17]. Finally, in Panel D, the time-varying conditional variance of financial returns is simulated by a GARCH(1,1) process R_t = σ_t Z_t with volatility equation

σ²_t = 0.05 + 0.15 R²_{t−1} + 0.8 σ²_{t−1},

and Z_t distributed as IID N(0, 1).

Generally speaking, the ES-based backtests are more powerful in detecting incorrect risk models, and amongst the ES backtests the saddlepoint technique is the most powerful. Though Kupiec's likelihood ratio test is the most powerful in detecting non-normal tail risk when the underlying process is GARCH(1,1) at T = 250, that test has a large empirical test size of 9.61%. Finally, it is observed that, in comparison to the other two non-normal alternatives, the powers against the GARCH process are relatively low.

[16] Given a sample of size T with b losses exceeding the magnitude of VaR, the p-value of the one-tailed Bernoulli test is given by P(B ≥ b) = 1 − P(B < b), where B is distributed as Binomial(T, a). Some articles use the alternative p-value definition 1 − P(B ≤ b). This definition is not adopted here because, say for B ~ Binomial(250, 0.01) and b = 5, the p-value 1 − P(B ≤ 5) = 0.0412 appears significant at the 5% level even though the probability of exactly five exceptions, P(B = 5) = 0.0666, is itself larger than 5%.
[17] Let g(z) = [δ² + (z − μ)²]^{1/2}; then the density of NIG(α, β, δ, μ) is given by f_NIG(z) = (π g(z))⁻¹ αδ exp{δ(α² − β²)^{1/2} + β(z − μ)} K₁(α g(z)), where K₁(x) is the modified Bessel function of the third kind. See, for example, Eberlein et al. (1998).
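The three alternative data-generating processes can be sketched as follows. This assumes SciPy's norminvgauss parameterization (shape parameters αδ and βδ with loc = μ and scale = δ); the burn-in length for the GARCH path is an arbitrary choice, and the function name is illustrative:

```python
import numpy as np
from scipy.stats import norminvgauss

rng = np.random.default_rng(1)

def simulate_alternative(kind, T):
    """One test sample of length T from the alternatives in Panels B-D."""
    if kind == "t5":              # Student t with 5 df, scaled to unit variance
        return rng.standard_t(5, T) / np.sqrt(5.0 / 3.0)
    if kind == "nig":             # standardized NIG of Panel C
        alpha, beta = 0.8, -0.05  # beta < 0 gives the left skew
        rho = beta / alpha
        delta = alpha * (1.0 - rho**2) ** 1.5
        mu = -rho * delta / np.sqrt(1.0 - rho**2)
        return norminvgauss.rvs(alpha * delta, beta * delta, loc=mu,
                                scale=delta, size=T, random_state=rng)
    if kind == "garch":           # normal GARCH(1,1) of Panel D
        omega, a1, b1 = 0.05, 0.15, 0.8
        burn = 500                # burn-in so the start-up value fades
        r = np.empty(T + burn)
        var, prev = omega / (1.0 - a1 - b1), 0.0  # unconditional variance = 1
        for t in range(T + burn):
            var = omega + a1 * prev**2 + b1 * var
            prev = r[t] = np.sqrt(var) * rng.standard_normal()
        return r[burn:]
    raise ValueError(kind)
```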
Table 2
Empirical test sizes and powers of other backtests

T         ES-SDP   ES-Delta  ES-Censor  Bernoulli  Kupiec

Panel A: standard normal (test size in %)
125       4.98     5.16      3.07a      3.72a      3.83a
250       4.98     6.47a     3.71a      4.14a      9.61a
500       5.03     3.95a     5.44a      3.10a      7.20a
1000      5.04     4.16a     5.32a      4.74a      5.54a

Panel B: Student t with 5 degrees of freedom (%)
125       34.1     21.8      24.9       12.2       12.2
250       52.4     34.8      41.1       17.5       06.6
500       73.0     56.3      63.4       22.6       20.9
1000      91.7     81.1      86.8       43.6       33.2

Panel C: standardized NIG with α = 0.8, β = −0.05 (%)
125       48.1     29.2      39.6       23.1       23.1
250       70.0     47.8      62.2       37.4       19.3
500       89.3     74.4      85.9       52.7       50.5
1000      98.7     94.0      98.2       83.1       75.7

Panel D: normal GARCH with ω = 0.05, a = 0.15, b = 0.8 (%)
125       15.3     06.1      15.4       15.8       15.8
250       25.0     12.8      22.8       20.3       28.4
500       38.3     26.4      38.3       23.8       31.1
1000      58.1     44.2      54.8       34.9       35.8

This table reports the empirical test powers at the 5% significance level of various backtests based on 99% VaR. In Panel A, standard normal samples are simulated 100,000 times and the empirical test sizes are reported. Panels B, C and D provide the size-adjusted test powers against non-normal distributions based on 10,000 simulations. ES-SDP, ES-Delta and ES-Censor refer to the ES backtests using the saddlepoint, functional delta and censored normal methods, respectively.
a Empirical test size differs significantly from its theoretical value at the 1% level.

4.3. Estimation risk

The Monte Carlo simulation studies so far assume that the parameters of the forecasted distributions are free from estimation error. In practice, this is of course not true. Therefore, the effects of estimation risk on the test sizes and powers of the various backtests are studied here. Specifically, the sample mean and standard deviation computed from an estimation sample of size P are used to standardize the testing sample under the standard normal null hypothesis. Without loss of generality, only a test sample size of 250 is considered. For the estimation, sample sizes of 125, 250, 500 and 1000 are used to compute the standard normal parameters, mean and standard deviation, on a rolling window basis. For example, the estimation sample R_1, ..., R_P is used to compute the sample mean and standard deviation, denoted μ̂_P and σ̂_P, respectively. The one-step-ahead testing observation R_{P+1} is standardized as Ẑ_{P+1} = σ̂_P⁻¹(R_{P+1} − μ̂_P). The next standardized testing observation Ẑ_{P+2} is obtained in the same way using the rolling sample R_2, ..., R_{P+1}. This process is repeated until the required full testing sample is constructed.
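The rolling standardization can be sketched as follows (illustrative helper, assuming NumPy):

```python
import numpy as np

def rolling_standardize(returns, P):
    """Standardize each observation by the mean and standard deviation of the
    previous P observations (the rolling-window scheme of Section 4.3)."""
    r = np.asarray(returns, dtype=float)
    z = np.empty(r.size - P)
    for i in range(P, r.size):
        window = r[i - P:i]     # estimation sample of the P preceding returns
        z[i - P] = (r[i] - window.mean()) / window.std(ddof=1)
    return z
```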
Table 3 provides the empirical test sizes and powers at the 5% significance level of various backtests based on 99% VaR, taking estimation risk into account.

Table 3
Estimation risk, test sizes and powers

P         ES-SDP   ES-Delta  ES-Censor  Bernoulli  Kupiec

Panel A: standard normal (test size in %)
125       6.96a    4.10a     3.83a      4.37a      5.90a
250       5.87a    3.63a     3.69a      4.20a      7.57a
500       5.43a    3.34a     3.72a      4.45a      8.82a
1000      5.15     3.18a     3.70a      4.19a      9.25a

Panel B: Student t with 5 degrees of freedom (%)
125       57.7     41.7      49.0       21.4       09.4
250       55.9     38.7      45.1       18.9       07.6
500       54.7     38.2      43.2       19.1       07.5
1000      54.7     37.8      42.4       18.0       07.0

Panel C: standardized NIG with α = 0.8, β = −0.05 (%)
125       75.3     54.7      73.1       45.1       26.0
250       73.3     52.8      67.9       41.3       22.4
500       71.9     50.9      65.4       39.5       20.6
1000      70.8     49.0      63.3       38.4       20.1

Panel D: normal GARCH with ω = 0.05, a = 0.15, b = 0.8 (%)
125       28.6     14.5      26.6       25.7       18.0
250       31.0     14.9      27.3       26.1       22.3
500       31.9     14.9      27.1       25.1       27.1
1000      30.8     14.3      24.4       22.5       28.1

The empirical test sizes and powers at the 5% significance level of various backtests based on test samples of 250 observations are reported above. These tests take estimation risk into account. Specifically, the sample mean and standard deviation computed from an estimation sample of size P are used to standardize the testing sample under the standard normal null hypothesis. In Panel A, standard normal samples are simulated 100,000 times and the empirical test sizes are reported. Panels B, C and D provide the size-adjusted test powers against non-normal distributions based on 10,000 simulations. ES-SDP, ES-Delta and ES-Censor refer to the ES backtests using the saddlepoint, functional delta and censored normal methods, respectively.
a Empirical test size differs significantly from its theoretical value at the 1% level.

It is obvious from the results in Panel A that a small estimation sample produces an over-sized distortion in the proposed saddlepoint technique and an under-sized effect in the other two ES backtesting methods. When the estimation sample increases to 1000, the saddlepoint technique gives the correct size, while the under-sized problems become marginally more severe for the other two ES backtests. Size-corrected test powers are presented in the rest of the table. Overall, test powers decrease with the estimation sample size, suggesting that small sample estimation risk results in higher test powers even after size adjustments.

5. Application of ES backtesting for non-normal trading risk

As an example of applying the saddlepoint backtest, I apply in this section the ES backtest to the profit and loss (P&L) of the trading accounts at the six large banks studied by Berkowitz and O'Brien (2002). Given the proprietary nature of the banks' P&L and VaR data, such data are not easily available. However, the proposed backtest is easy to implement, as it only requires the number of exceptions and the mean violation of the banks' trading books, which are reported by Berkowitz and O'Brien.

Table 4
Backtests of bank VaRs

          No. of observationsa   No. of exceptionsa   Mean violationa   p-Value

Panel A: full sample from January 1998 to March 2000
Bank 1    569                    3                    4.446             <0.0001
Bank 2    581                    6                    3.067             0.0050
Bank 3    585                    3                    5.506             <0.0001
Bank 4    573                    0                    NA                NA
Bank 5    746                    1                    3.101             0.0964
Bank 6    586                    3                    8.166             <0.0001

Panel B: sub-sample from August 1998 to October 1998
Bank 1    63                     3                    4.446             <0.0001
Bank 2    64                     5                    3.188             0.0018
Bank 3    65                     3                    5.506             <0.0001
Bank 4    63                     0                    NA                NA
Bank 5    65                     1                    3.101             0.0964
Bank 6    65                     2                    10.316            <0.0001

This table applies the proposed backtest of ES to the daily profit and loss data of six large commercial banks in the US. The full sample period (Panel A) covers January 1998 through March 2000, whereas the sub-sample (Panel B) refers to the financially turbulent period from August to October 1998. The daily data are standardized to have unit variance for confidentiality. Mean violation refers to the average of the losses that exceed the VaR calculated by the bank's structural model. Except for the case of n = 1, the reported p-values are obtained according to (9). For both the full and sub-samples, the backtest results show that four out of six banks have non-normal or excessive trading risk.
a Data source: Berkowitz and O'Brien (2002). Mean violations are represented by positive values here.
Using frequency-based backtests, they find that both the banks' internal VaR models and the reduced-form normal GARCH [18] model provide sufficient coverage of trading risks. It is nevertheless valuable for risk managers to investigate statistically whether the exceptions exhibit non-normal or excessive tail risk. Indeed, the proposed backtests reveal that four out of six banks have excessive trading risk.

[18] The normal GARCH model used by Berkowitz and O'Brien (2002) is an ARMA(2)-GARCH(1,1) model with IID standard normal innovations.

5.1. Banks' trading P&L and internal VaR

Some descriptions of the banks' trading P&L and the associated VaR are provided here; readers are referred to Berkowitz and O'Brien (2002) for more detailed information. All six banks are large multinational banks that meet the Basle "large trader" criteria, and their trading activities include trading in interest rate, foreign exchange, equity, commodity and derivative instruments. To manage market risks, large-scale structural models are developed which measure and aggregate the trading risks in current positions at a highly detailed level. Given their roles as principal dealers in the ever-growing over-the-counter derivatives markets, the banks' trading accounts have grown rapidly and become progressively more complex. The structural models can thus have exposures to several hundred thousand market risk factors, with individual positions numbering in the millions [19]. Therefore, approximations are employed to reduce computational burdens and overcome estimation hurdles in order to turn out daily VaRs for trading risk. Regulation requires that the daily VaR estimates be maintained by the banks for the purpose of forecast evaluation or "backtesting". Because of their proprietary nature, there has been little empirical study of the P&L of trading firms and the associated VaR forecasts.

[19] See, for example, page 71 of the 2004 Annual Report of JPMorgan Chase & Co.

5.2. Backtest of banks' internal VaR models using expected shortfall

Some assumptions are first made regarding the banks' trading P&L. Let R_t denote a bank's trading return on day t, and assume that R_t ~ N(μ_t, σ²). The bank's trading return is thus allowed to have a time-varying expected mean, due to daily changes in the composition of the portfolio, but its variance is assumed to be constant. Though μ_t and σ are not known, the reported mean violation is calculated in excess of VaR, and the banks' P&L series are standardized to have unit variance in order to protect confidentiality. Hence, the sample ES is easily obtained as ES_n = x_e + z_{0.01}, where x_e is the mean violation in excess of VaR given by Berkowitz and O'Brien (2002) and z_{0.01} = 2.326 is the absolute value of the 1% normal quantile.

Table 4 lists, for the six large US banks, the number of observations, the number of exceptions, the mean violation (or sample ES) and the corresponding p-value calculated using (9). Panel A provides the backtest results for the full sample period from January 1998 to March 2000. To illustrate how the p-value depends on the number of exceptions, consider for instance the case of Bank 2, which has six losses exceeding its internal VaR. According to (9) with n = 6, the corresponding p-value is evaluated as 0.0050. Thus I infer at the 1% significance level that there is excessive trading risk at Bank 2 [20]. As another example, the mean violation of Bank 5 is 3.101, based on one exception. Since the corresponding p-value is 0.0964, the null hypothesis is accepted and the risk model is said to "cover" the risk as measured by ES, the average of tail losses.

[20] Alternatively, the sample ES of 3.067 is said to exceed, at the 1% significance level, the theoretical ES, which equals 2.6652 under the null hypothesis.
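These p-values can be reproduced from the published summary statistics alone using the es_pvalue sketch of Section 3 (the table's mean violation is already the sample ES, that is, the average size of losses beyond VaR; Bank 4 is omitted since it has no exception):

```python
# Reproduce the Table 4 (Panel A) p-values from the reported statistics
banks = {"Bank 1": (4.446, 3), "Bank 2": (3.067, 6), "Bank 3": (5.506, 3),
         "Bank 5": (3.101, 1), "Bank 6": (8.166, 3)}
for name, (es_n, n) in banks.items():
    print(f"{name}: n = {n}, ES_n = {es_n:.3f}, p = {es_pvalue(es_n, n):.4f}")
# Bank 2 gives p ~ 0.0050, and Bank 5 (a single exception) p ~ 0.0964
```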
As can be seen from the backtest results in Panel A, four out of six banks exhibit excessive, or fatter-than-normal, market risks at the 1% significance level. This is consistent with the observation by Berkowitz and O'Brien that the banks' internal VaRs are conservative, but that when exceptions occur, they tend to exceed the VaR forecasts drastically.

Berkowitz and O'Brien also find that most of the exceptions took place between August and October of 1998, a turbulent period for financial markets [21]. They provide the relevant data on mean violations and numbers of exceptions, but do not carry out backtests for this period. Backtests that rely on large sample asymptotic statistics cannot be applied here because the sample is too small for reliable inference. The advantage of the small sample asymptotic technique becomes obvious in this situation: regardless of the length of the observation period, the proposed backtest of ES can be applied as long as there are exceptions. It can be seen from the backtest results presented in Panel B that the banks that fail the backtest over the full sample also fail the test here. I therefore infer that the excessive trading risks come from this period of extreme market movements.

6. More responsive risk management decision

In this section, I provide a simple illustration of how the proposed ES backtesting can make risk management decisions more responsive. In the study by Berkowitz and O'Brien (2002), a normal GARCH(1,1) model is used as an alternative to the banks' VaR models. Since the time series model makes no attempt to account for changes in portfolio composition, the bank's structural model should in principle deliver superior VaR forecasts. Berkowitz and O'Brien find, however, that the normal GARCH model permits comparable risk coverage with lower VaRs, that is, less regulatory capital. They attribute this to the ability of the time series model to capture trend and time-varying volatility in a bank's P&L.

Since time-varying volatility as specified by a normal GARCH model also generates fat tails, it is interesting to see how effective the model is in accounting for non-normal trading risk in the equity markets. Specifically, I fit the following AR(1)-GARCH(1,1) risk model to S&P 500 daily index returns:

R_t = μ_t + ε_t = μ + ρR_{t−1} + σ_t Z_t,
σ²_t = ω + aε²_{t−1} + bσ²_{t−1},

where R_t is the S&P 500 return on day t and Z_t is an IID standard normal process. Conditional on the information available at t − 1, the one-day-ahead VaR is forecast as VaR_t = 2.33σ̂_t − μ̂_t. For n > 0 realized exceptions up to time t, the sample expected shortfall is computed as

ES_n = −(1/n) Σ_{i=1}^{n} Ẑ_{t_i},

where the standardized exceptions are given by Ẑ_{t_i} = σ̂_{t_i}⁻¹(R_{t_i} − μ̂_{t_i}). Since the AR(1)-GARCH(1,1) risk model assumes an IID standard normal innovation process Z_t, the null hypothesis will hold if the risk model fully describes the S&P 500 return dynamics.
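Because the p-value is conditional only on the number of exceptions observed so far, the backtest can be re-run every time a new exception arrives. A minimal sketch of this sequential use, reusing es_pvalue from the Section 3 sketch (names illustrative):

```python
import numpy as np

def sequential_es_backtest(std_exceptions):
    """Re-run the ES backtest each time a new exception arrives (the idea
    behind Fig. 2). `std_exceptions` holds the standardized exceptions
    Z_t = (R_t - mu_t)/sigma_t, all below -2.33, in order of occurrence."""
    for n in range(1, len(std_exceptions) + 1):
        es_n = -np.mean(std_exceptions[:n])     # running sample ES
        p = es_pvalue(es_n, n)
        verdict = ("reject at 1%" if p < 0.01 else
                   "reject at 5%" if p < 0.05 else "no rejection")
        print(f"exception {n}: ES_n = {es_n:.3f}, p = {p:.4f} -> {verdict}")
```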
To see how the saddlepoint ES backtest can make risk management decisions more responsive, suppose for the sake of illustration that we start the risk forecast from 1 August 1998 [22]. As it turns out, there are 5 exceptions in the following twelve months ending on 31 July 1999. Fig. 2 depicts the series of mean violations (or sample ES), with the first exception realized on 25 August 1998. For the first exception, n = 1 and the sample ES lies beyond the 5% critical value and is thus significant at the 5% level. When the second exception occurs, n = 2 and the mean violation becomes significant at the 1% level, an even stronger message that the normal GARCH model should be rejected. Therefore, the proposed ES backtesting facilitates prompt risk management responses, because accurate critical values can be calculated for any number of exceptions.

[Fig. 2. Prompt decision based on backtest of ES. The figure plots the mean violation (sample ES) at each of the five exception dates (25 August 1998, 27 August 1998, 25 November 1998, 19 March 1999, 16 July 1999), together with the associated 5% and 1% critical values. To illustrate how the proposed ES backtest can be a basis for prompt decisions, it is supposed that the exception on 25 August 1998 is the first to occur in twelve months. The first exception lies between the 5% and 1% critical values. When the second exception occurs, the mean violation becomes significant at the 1% critical value and remains so for the rest of the year. In this example, the risk model is rejected at the 5% significance level when the first exception occurs; the rejection is at the 1% significance level from the second exception onwards.]

[21] Between August and October of 1998, there were the devaluation of the Russian rouble, the Russian debt default and the near-collapse of LTCM.
[22] 3,000 observations prior to 1 August 1998 are first used to estimate the risk model, and ten one-step-ahead VaRs are forecast. The estimation window is then rolled forward ten trading days and a new set of model parameters is obtained, which is in turn used to forecast the one-step-ahead VaRs for the next ten trading days. The process is continued until 31 July 1999 is reached.

7. Conclusion

Berkowitz and O'Brien (2002) find that banks' internal VaR models tend to be conservative, but that losses are huge when exceptions occur. This motivates backtesting banks' trading risk using expected shortfall (ES). The backtest of ES, however, is difficult to implement, since the existing censored Gaussian and functional delta methods rely on asymptotic test statistics that converge poorly for small samples. The proposed saddlepoint backtest employs a small sample asymptotic technique, which Monte Carlo simulation shows to be highly accurate and powerful even when the sample size is small. Conditional on the reported number of exceptions and mean violation for the period from January 1998 to March 2000, I find that four out of the six banks studied by Berkowitz and O'Brien have excessive market risk. Moreover, since the proposed backtest depends on the number of exceptions but not on the sample size, the saddlepoint backtest can also be applied to the three-month sub-sample period from August to October 1998, when financial markets were extremely volatile.
It is found that the banks that have significant non-normal market risk in the full sample also fail the sub-sample backtests, suggesting that most of the detected tail risk can be attributed to the volatile financial markets during this period.

The proposed saddlepoint technique can accurately compute the required p-value under the null hypothesis, conditional on any number of exceptions. This has two important implications. First, when there are huge losses, even if there are just one or two of them, the risk model can be evaluated on a formal statistical basis. This enables prompt action to be taken by risk managers or regulators if necessary. Second, the proposed backtest can be used alongside the existing VaR measure. Since VaR has been sanctioned by the Basle Committee and various central banking authorities, it is now the standard risk measure. Hence it is useful for backtesting purposes to calculate ES based on the losses that exceed the forecasted VaR. This means the number of exceptions will vary from sample to sample, which poses no problem for the proposed backtest using the small sample asymptotic technique.

In short, the risk management process is about communicating various aspects of risk effectively and condensing complex risk factors into simple, understandable terms. To this end, VaR has proved to be a successful risk measure. However, being a quantile statistic, VaR ignores valuable information contained in the sizes of the losses beyond the VaR level. By averaging the sizes of exceptions, ES communicates the degree of non-normal or tail risk in a simple way to investors, risk managers and regulators. The shortcoming that holds back ES is its backtesting, especially when the sample size is small. This paper overcomes this shortcoming by using the saddlepoint technique to compute, under the null hypothesis, the required p-value of the test statistic. As the Basle Committee on Banking Supervision (1996b, p. 2) recognizes that "the techniques for risk measurement and backtesting are still evolving, and the Committee is committed to incorporating important new developments in these areas into its framework," it is hoped that the proposed ES backtesting will help raise the standard of future risk management.

Acknowledgment

The author would like to thank participants at The 5th NTU International Conference on Economics, Finance and Accounting – Financial Institutions and Capital Market, Taipei, Taiwan, May 23–24, 2007, for their suggestions and comments.

Appendix

Proof of Proposition 1. Consider the moment generating function

M(t) = E(e^{tX}) = ∫_{−∞}^{q} e^{tx} f(x) dx,

which, after substituting a⁻¹φ(x) for f(x) in the integral, collecting the exponential terms and simplifying, gives rise to (3). For the derivatives of the moment generating function, denote the differential operator d/dt by D_t. Using the standard results D_t Φ(t) = φ(t) and D_t exp(t²/2) = t exp(t²/2), the first derivative of the moment generating function can be written as

M′(t) = t M(t) − exp(t²/2) a⁻¹ φ(q − t) = t M(t) − exp(qt) a⁻¹ φ(q).

The second and higher order derivatives given by (4) follow straight away from differentiating this result. □

Proof of Corollary 1. Setting t = 0 in M′(t) and M″(t) from (4) gives E(X) and E(X²), respectively. The variance of X then follows from var(X) = E(X²) − [E(X)]². □

References

Artzner, P., Delbaen, F., Eber, J.M., Heath, D., 1997. Thinking coherently. Risk 10 (11), 68–71.
Artzner, P., Delbaen, F., Eber, J.M., Heath, D., 1999. Coherent measures of risk. Mathematical Finance 9 (3), 203–228.
Basle Committee on Banking Supervision, 1996a. Amendment to the Capital Accord to incorporate market risk.
Basle Committee on Banking Supervision, 1996b. Supervisory Framework for the use of "Backtesting" in conjunction with the Internal Models Approach to Market Risk Capital Requirements.
Berkowitz, J., 2001. Testing density forecasts, with applications to risk management. Journal of Business and Economic Statistics 19, 465–474.
Berkowitz, J., O'Brien, J., 2002. How accurate are Value-at-Risk models at commercial banks? Journal of Finance 57, 1093–1111.
BIS Committee on the Global Financial System, 1999. A Review of Financial Market Events in Autumn 1998, October.
Campbell, R., Koedijk, K., Kofman, P., 2002. Increased correlation in bear markets. Financial Analysts Journal 58, 87–94.
Daniels, H.E., 1954. Saddlepoint approximations in statistics. Annals of Mathematical Statistics 25, 631–650.
Daniels, H.E., 1987. Tail probability approximations. International Statistical Review 55, 37–48.
Eberlein, E., Keller, U., Prause, K., 1998. New insights into smile, mispricing and value at risk: The hyperbolic model. Journal of Business 71, 371–406.
Jorion, P., 2000. Risk management lessons from Long-Term Capital Management. European Financial Management 6, 277–300.
Kerkhof, J., Melenberg, B., 2004. Backtesting for risk-based regulatory capital. Journal of Banking and Finance 28, 1845–1865.
Kupiec, P.H., 1995. Techniques for verifying the accuracy of risk measurement models. Journal of Derivatives 3, 73–84.
Longin, F., Solnik, B., 2001. Extreme correlation of international equity markets. Journal of Finance 56, 649–676.
Lotz, C., Stahl, G., 2005. Why value at risk holds its own? Euromoney, February 2005, London.
Lugannani, R., Rice, S.O., 1980. Saddle point approximation for the distribution of the sum of independent random variables. Advances in Applied Probability 12, 475–490.
McNeil, A., Frey, R., Embrechts, P., 2005. Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press, Princeton.
Rappoport, P., 1993. A new approach: Average shortfall. J.P. Morgan Fixed Income Research Technical Document.
Shao, J., 1998. Mathematical Statistics. Springer-Verlag, New York.
Szego, G., 2002. Measures of risk. Journal of Banking and Finance 26, 1253–1272.
Yamai, Y., Yoshiba, T., 2005. Value-at-risk versus expected shortfall: A practical perspective. Journal of Banking and Finance 29, 997–1101.