* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Backtesting Value-at-Risk based on Tail Losses Woon K. Wong
Survey
Document related concepts
Transcript
Backtesting Value-at-Risk based on Tail Losses Woon K. Wong Tamkang University 2007/1/6 Abstract Value-at-Risk (VaR) has been sanctioned for market risk capital determination and is now the standard for risk measurement. However, there is potential tail risk in the use of VaR since the risk measure disregards valuable information conveyed by sizes of losses beyond the VaR level. While the Capital Accord stipulates a backtesting procedure of VaR by counting the number of exceptions, this paper proposes to use saddlepoint technique to backtest VaR by summing the tail losses. Monte Carlo simulations show that the technique is very accurate and powerful even for small samples. Application of the proposed backtest to S&P 500 index returns finds substantial downside tail risks in the stock markets, and that risk models which account for jumps, skewed and fat-tailed distributions fail to capture the tail risk during the 1987 stock market crash. Key words: Value-at-Risk, exception, tail risk, backtesting, risk management and saddlepoint technique JEL Classification: G32 0 1. Introduction Value-at-Risk (VaR) was sanctioned in 1996 by the Basle Committee and major central banking authorities for market risk capital determination through the so called internal models.1 Since then, VaR has become the standard for measuring risk and is now widely used by banks and other financial entities such as hedge funds for risk reporting; see, for example, Jorion (2001). One reason for the preference of VaR over other risk measures can be attributed to the fact that the backtest of VaR is easy to implement. VaR can be defined as the maximum possible loss of a portfolio over a specified period at a given confidence level. Backtest of VaR basically involves counting the number of exceptions or tail losses exceeding VaR. For example, both the Kupiec’s (1995) likelihood ratio test and the Christoffersen’s (1998) conditional coverage test use empirical failure rate which can be easily calculated as the number of exceptions by the sample size. Unfortunately, above backtests of VaR disregards valuable information conveyed by the sizes of exceptions. During turbulent market environments such as the fall of 1998 when many banks suffered huge financial losses, above frequency-based backtests of VaR can be deceptive of the underlying risk. For instance, Berkowitz and O’Brien (2002) study the trading profits and losses of six large multinational banks from January 1998 to March 2000. Using above frequency-based backtests, they find the banks’ internal VaR models provide sufficient coverage for market risk. Nevertheless, losses were huge when breaches of VaR occurred during the volatile period of fall 1998. That is, there can be substantial tail risk in the use of VaR that is overlooked by the risk model.2 In this paper, I propose to backtest VaR model based on sizes of tail losses. Specifically, I calculate the tail risk (TR) statistic which is defined as the sum of all exceptions in excess of VaR divided by the sample size. The main contribution of this paper is to use the saddlepoint or small sample asymptotic technique to calculate the p-value of the TR test statistic accurately. Monte Carlo simulations show that the 1 Basle Committee on Banking Supervision (1996a, Section B.4) stipulates that banks may use internal models to calculate the market risk capital requirement, which is defined as the higher of (i) its previous day’s 10-day VaR and (ii) an average of the 10-day VaRs over the preceding 60 business days, multiplied by a multiplication factor of at least 3. 2 See also Leippold (2004) and Yamai and Yoshiba (2005) regarding further description on the concept of tail risk. 1 proposed backtest yields accurate test size and is very powerful against excessive tail risk even for small samples. Application of the backtest to the S&P 500 index returns from 1986 to 2000 reveals that risk models which account for jumps, skewed and fat-tailed distributions fail to capture the tail risk during the 1987 stock market crash. Tail risk as defined above is related to VaR and expected shortfall (ES)3 in the following manner: 1TR ES VaR , where 1 is the confidence level at which VaR is calculated. Thus TR refers to the risk at the part of tail that is disregarded by VaR. The proposed backtest of VaR based on tail losses would help a risk manager to assess whether the tail risk of her VaR model is consistent with the tail of the underlying distribution model. Though various backtests of ES have been proposed, noticeably the censored Gaussian approach of Berkowitz (2001) and the functional delta method of Kerkohf and Melenberg (2004), it is useful to consider the backtest of tail risk of VaR proposed in this paper for the following reasons. VaR has now become the standard for risk measurement and its wide usage is difficult to be changed. Since the Basle Committee stipulates that VaR be backtested by counting the number of exceptions, it is meaningful to carry out the proposed backtest based on the sum of all sizes of exceptions. Indeed, the proposed backtest has many advantages that are complimentary to VaR being an officially sanctioned risk measure. First, sizes of tail losses convey valuable economic information for risk management. Though current backtesting procedure counts only the number of VaR exceptions, the Basle Committee points out that banks should document and explain all such exceptions if the number of exceptions exceeds four.4 This is in order to determine an appropriate supervisory response to a backtesting result that is in the yellow zone but possibly due to pure bad luck. The backtest proposed in this paper considers sizes of exceptions and thus can provide additional valuable information that will facilitate an appropriate supervisory action to be made. Second, it is recognized by the Basle Committee on Baking Supervision (1996b, p. 5) that the frequency-based tests are limited in their power to distinguish an accurate model from an inaccurate one, and such limitation has been a prominent 3 Expected shortfall is defined as the average of losses that exceed VaR; see Artzner (1997, 1999). When the number of exceptions exceeds four, the backtesting result is classified by the Basle Committee as in the yellow zone and the multiplication factor for market risk capital requirement will be raised; see Section 2 for further details. 4 2 consideration in the design of the current backtesting framework. However, as formal statistical technique for a size-based backtest was not available, the Basle Committee rightly decided the frequency-of-exceptions method that is simple and straightforward to implement. Though size-based backtests of ES have recently been proposed by Berkowitz (2001) and Kerkhof and Melenberg (2004), they rely on large sample to converge to the required limiting distribution. The backtest proposed in this paper uses small sample asymptotic technique to calculate the p-value of the proposed size-based test statistic, TR. It is very accurate and a lot more powerful than its frequency counterpart. This is especially important because backtests are usually required to be carried out on small samples. Third, it is also envisaged that all risk information cannot be possibly captured by a single risk measure, and thus the Basle Committee requires VaR to be complemented by other risk analyses such as stress testing. While stress testing offers rigorous and comprehensive scenarios study, by its very nature, it lacks uniformity and sometimes can even be subjective.5 In this context, the proposed backtest of tail risk of VaR is an excellent complimentary tool to VaR and its accompanied stress test since the proposed backtest statistically tests the sizes of tail losses for possible inadequacy of risk models. Forth, as from 2006, the new Basle Accord requires major banks to incorporate the market, credit as well as the operational risks in an integrated risk framework; see, for examples, Basle Committee on Banking Supervision (2004) and Rosenberg and Schuermann (2006). Operational risk is being defined as “the risk of loss resulting from inadequate or failed internal processes, people, and systems, or from external events” 6 which includes catastrophes such as 9/11. So operational losses are characterized by infrequent but large losses and its distribution is likely non-elliptical. Thus it is important for risk managers to carry out backtests of tail risks of VaR. To illustrate the possible link of tail risk and non-subadditivity, this paper provides an empirical example that the 99%VaR of the S&P 500 daily returns in 1987 fail to satisfy the subadditive property while all three risk models considered in this paper fail the backtest of tail risk of VaR. Finally, as is pointed out by Basak and Shapiro (2001), evidence abounds in 5 For example, in addition to certain specified stress scenarios the bank portfolios are required to subject to, banks are also expected to develop their own stress tests which they identify as most adverse for the portfolios concerned. 6 See Basle Committee on Banking Supervision (2004, p. 137). 3 practice that VaR estimates are increasingly used as a tool to manage and control risk. As a result, instead of using VaR estimates as summary statistics for decision making, economic agents struggle to maintain the VaR of their market exposures at a prespecified level. Such risk management practice could result in larger losses during adverse down markets due to the fact that a VaR risk manager often chooses a larger exposure to risky assets than a normal risk manager; see also Yamai and Yoshiba (2005) for further illustrations on the subject. It is hoped that the availability of a simple and effective backtest of VaR based on tail losses would encourage risk managers to test for tail risk of VaR, and thus remedy this possible shortcoming of VaR. This paper is organized as follows. Section 2 explains why VaR continues to enjoy its popularity in risk management notwithstanding its theoretical shortcomings. Section 3 employs small sample asymptotic technique to calculate the p-value of the proposed backtest whereas Section 4 carries out the Monte Carlo simulation studies. Section 5 fits three popular risk models to the S&P 500 daily returns followed by backtests of tail risks of VaR. Finally, Section 6 concludes. 2. Why VaR continues its success? In this section, I explain why in spite of certain theoretical shortcomings, VaR continues its success as a risk measure. Theoretical as well as pragmatic reasons for the continual acceptance of VaR are presented below. The implication is that VaR as a standard risk measure is unlikely to be changed. It is therefore useful to have a size-based backtest of VaR that is complimentary to the “count” rules adopted by the Basle Committee. Specifically, the Capital Accord stipulates to count the number of exceptions to backtest VaR whereas this paper proposes to sum the sizes of exceptions to backtest the tail risk of VaR. The reasons for the VaR’s success can be attributed to the following reasons. From a theoretical viewpoint, Danielsson et al. (2006) show provided it is sufficiently far in the tail that both ES and VaR have similar and consistent ranking of risky projects. Therefore, VaR is a sufficient risk measure if the confidence level is set sufficiently large. This is in line with the Basle Committee setting the confidence level 4 for the market risk at 99% whereas for the credit and operational risks, VaR is calculated at 99.9% level. Moreover, due to diversification, VaR at the level of the whole bank is often found to be adequate for regulatory capital determination. As long as there is no single position that dominates the portfolio, the returns of the bank trading and credit portfolios are usually close to that of a normal or, if it is fat-tailed, a Student t distribution. Both the normal and Student t distributions belong to the class of elliptical distributions for which linear correlation is a sufficient dependence measure. More importantly, for a portfolio with elliptically distributed returns, VaR is subadditive and thus a coherent risk measure; see for example Embrechts et al. (1999) and Szegö (2002). In practice, risk management process is about communicating effectively various aspects of risks and reducing complex risk factors into simple, understandable terms on which decisions can be based. For instance, a trader from a government bond desk may use term such as “duration” to describe his risk exposure; an equity trader will talk about her “beta”; and an option trader is most worried about his “delta”. Therefore, an important role of VaR is to make these risks transparent by condensing the entire distribution of the returns of a bank trading portfolio into a single number that people from different backgrounds such as board members and investors can easily understand as a measure of risk. This is one reason why VaR is selected by the Basle Committee as the basis for calculating regulatory capital for market risk. Readers are referred to Lotz and Stahl (2005) for further details on the use of VaR in practice. Another reason for the Basle Committee’s preference for VaR is that it can be backtested statistically. When a risk measure is calculated, it is crucial for regulators as well as risk managers to check whether the underlying risk is captured properly. Without the feasibility of backtesting, there is no basis for penalties and rewards, and regulation is not possible. For this reason, the Basle Committee has specified a “backtesting” procedure that simply counts the number of VaR breaches using the most recent twelve months of data. If the number of exceptions over this period is less than or equals to 4 (the green zone), the multiplication factor for the market risk capital determination remains at the minimum value of three; if there are more than four exceptions (the yellow zone), the multiplication factor will be increased up to a 5 maximum of four when there are ten or more exceptions (the red zone)7. In the regulatory context, such simple and yet formal statistical based backtest allows for uniformity across agencies, countries, and time. Moreover, it is hoped that such backtesting would offer incentives for the banks to continually improve their internal risk measurement models, and thus raise the standard of risk management. Indeed, a breach of VaR is easily identified, and various techniques have been developed for the backtesting of the risk measure. Some of the more well known backtests of VaR proposed by academics include Kupiec’s (1995) likelihood ratio test of failure rate, Christoffersen’s (1998) test of conditional coverage, Christoffersen and Diebold’s (2000) test of serial correlation of VaR breaches, Dowd’s (2001, 2006) test of VaR using order statistics, and the testing and comparing of VaRs using GMM method by Christoffersen et al. (2001). In summary, VaR has been well studied by academics and widely used by practitioners. It is unlikely that the Basle Committee would replace VaR with another risk measure. Therefore, it is useful to have a size-based backtest of VaR that is complimentary to the use of VaR for risk management. While the backtesting procedure stipulated by Basle II counts the number of exceptions, this paper proposes to backtest VaR by summing the tail losses. Such complimentary role of the proposed backtest provides additional valuable information for more powerful backtesting, presents a statistical analysis on nature of tail risk of VaR, and facilitates appropriate supervisory action to be made. 3. Backtesting Tail losses using the saddlepoint technique This section derives the small sample asymptotic technique to backtest the tail risk of VaR. I first present the test statistic and briefly discuss its relation with ES. Next, I provide some basic statistic properties of the test statistic, which are then used to derive the Lugannani and Rice (1980) formula in order to calculate the p-value of the test statistic. 3.1 Tail risk of VaR 7 See Basle Committee for Banking Supervision (1996b, Table 2) for details of increases in the multiplication factor. 6 Let R be the random profit or loss of a portfolio over a holding period. For the sake of simplicity but without loss of generality, let F be the absolutely continuous cumulative distribution function (CDF) of R. Then given a sample of T returns {r1 ,, rT } , the tail risk of VaR can be calculated as TR 1 T (rt q) I (rt q) , T t 1 (1) where I () is an indicator function and for 0 < < 1, q F 1 ( ) is VaR at (1-) confidence level. Basle II sets = 0.01 for market risk. Throughout this paper, a negative number represents a loss and thus VaR as defined above is less than zero. The population counterpart of (1) is given by q (r q) f (r )dr where f(r) denotes the probability density function (PDF) of R. It is straight forward that q 1TR 1 (r q) f (r )dr ES VaR . (2) Therefore, 1TR may be referred to as the part of ES that corresponds to the portion of tail disregarded by VaR. Nevertheless, there is a subtle difference between them in backtesting. It can be seen from (1) that TR is obtained by summing all the tail losses in excess of VaR and dividing the sum by the total number of observations in the sample. So unlike ES, which is the expected loss when VaR is breached, TR is proportional to the sum of all tail losses accumulated over the entire history. This may seem to be a less desirable property if one is interested in the average size of loss when VaR is exceeded. However, TR is used in this paper specifically for backtesting. It tells a risk manager the aggregate tail losses a trading account may incur over a period. In this regard, the proposed backtest of tail risk of VaR is similar to a frequency-based backtest of VaR in that while the latter counts the number VaR breaches, the former sums all the tail losses. Therefore, backtesting the tail risk of VaR can be regarded as a direct and complementary counterpart to that of the Basle’s backtesting procedure for VaR. 3.2 The TR random variable and its moments Due to the workings of the central limit theorem, the daily returns of a bank trading account are often be fitted with a normal distribution. For example, Berkowitz and O’Brien (2002) find that an ARMA-GARCH model with normally distributed innovations fits well for the profits and losses of the trading accounts of six large 7 commercial banks in U.S. banks in the sense that the risk model passes the backtests by Kupiec (1995) and Christoffersen (1998).8 Nevertheless, it is well known that even at aggregate level the stock market indices are skewed and leptokurtic. It is thus sensible to carry out further backtesting for tail risk of VaR. Since normality is widely assumed in practice and a non-normal density can always be transformed into a normal one,9 the saddlepoint technique in this paper is derived by assuming normality for the underlying returns. Nevertheless, it must be emphasized that the small sample asymptotic technique can also be derived under non-normal null hypothesis. Now, let r be the portfolio return with a standard normal distribution and denote its CDF and PDF by and respectively. From now onwards, unless stated otherwise, the quantile q refers to the -quantile of a standard normal distribution, that is q 1 ( ) . Next define a random variable X with realization x such that r q if r q, x if r q. 0 (3) Note that x ( , 0] and the expectation of X is the theoretical value of the corresponding TR. The PDF of X can be defined in terms of as ( x q) if x 0, f ( x) if x 0, 1 and the corresponding CDF by ( x q) if x 0, F ( x) if x 0. 1 Now consider the following results on the moments of f (x) that shall be useful for the rest of this section. PROPOSITION 1: The moment generating function of f (x) is M (t ) exp( qt t 2 /2) (q t ) 1 , (4) and its derivatives with respect to t are given by 8 Passing the above frequency-based backtests, however, does not mean the tail risk is fully captured by the normal distribution since it is acknowledged by the Basle Committee that frequency-based tests are limited in powers. 9 If the null hypothesis assumes a non-normal density, Berkowitz (2001) and Kerkhof and Melenberg (2004) also transform the non-normal null density into a normal one. 8 M (t ) exp( qt t 2 /2) (t q) (q t ) (q t ) , M (t ) (t q) M (t ) M (t ) (1 ), (5) M ( n ) (t ) (t q) M ( n1) (t ) (n 1) M ( n2) (t ), where n 3 . PROOF: Consider the moment generating function using the Stieltjes integral 0 0 M (t ) E(e tx ) e tx dF e tx f ( x)dx 1 , (6) which, after substituting ( x q) for f (x) in the integral term on the right hand side of (6), collect the exponential terms and simplify, will give rise to (4). For the derivatives of the moment generating function, first denote the differential operator d dt by Dt . Now using the standard results that Dt (t ) (t ), Dt (t ) t (t ), Dt exp( qt t 2 /2) (t q) exp( qt t 2 /2), results in (5) can be verified by straight forward algebra. Q.E.D. As a corollary of Proposition 1, the following mean and variance of X can be obtained easily. COROLLARY 1: The mean of X, which is expectation of TR, is given by e q /2 2 X E( X ) q 2 , (7) and its variance is qe q /2 2 var( X ) q 2 X 2 2 X2 . (8) PROOF: Letting t = 0 in M (t ) and M (t ) from (4) will give rise to E(X) and E(X2) respectively. It is straight forward to obtain of variance of X since var( X ) E( X 2 ) [E( X )]2 . Q.E.D. The asymptotical distribution of the sample mean of X can be obtained using 9 above results. Given a sample {xt } of size T, its sample mean x tends to normality as T tends to infinity. That is, x X T X d N (0 ,1) , where the population mean X and variance X are respectively given by (7) and (8) in Corollary 1. Since extreme losses are used to provide as a basis for regulatory capital determination, is set by the Basle Committee at 0.01 for market risk capital. Hence, above asymptotic result is hardly valid for sample sizes one would encounter in practice. For this reasons, this paper resorts to the Lugannani and Rice (1980) formula derived by saddlepoint technique in order to provide a more accurate method to calculate the p-value for statistical backtest of tail risk. 3.3 The Lugannani and Rice formula By using the small sample asymptotic technique, Lugannani and Rice (1980) provide an elegant formula that can accurately calculate the tail probability (or 1-CDF) of the sample mean of an independently and identically distributed (IID) random sample. To have an intuitive understanding of the method, one can start with the inversion formula when x 0 :10 f X (x) n 2 e n[ K (it ) itx ]dt , where f X (x ) denotes the PDF of the sample mean and K (t ) ln M (t ) is the cumulant generating function of f (x) . Then the tail probability we required for statistical testing can be written as 0 P( X x ) f X (u )du x 1 2 i s i s i e n[ K (t )tx ] t 1 dt (9) where s is chosen to be the saddlepoint satisfying K ( ) x . (10) The role of the saddlepoint is to make possible a transformation be applied to (9), so that one can avoid approximating K (t ) tx in the exponential where the function would blow up. So to obtain the Lugannani and Rice (1980) formula, this paper follows the approach of Daniels (1987) by using the following transformation 10 When x = 0, the probability density is simply (1 )T and P( X x ) 0. 10 21W 2 (t ) W (t ) W K (t ) t x which describes the local behavior of K (t ) t x over a region containing both t and t 0 when is small. Note that in the above transformation, W sgn( ) 2( x K ( )) . The tail probability formula may then be written as P( X x ) c i n[ 2 1 W 2 W W ] 1 c i e t dt dW . dW For small t and x X , W~ K ( ) K (0) x X t t W W and the tail probability may then be conveniently rearranged as P( X x ) c i n[ 2 1 W 2 W W ] c i e W 1dW c i n[ 2 1 W 2 W W ] c i e t 1 dt / dW W 1 dW . Note that the first term on the right hand side of the above equation corresponds to the singularity in (9) and can be shown to be equal to 1 (n1/2W ) . As for the second term, Daniels (1987) shows after some algebra and application of Watson’s Lemma,11 it can be approximated by (n1/2W ) [(n1/2W )1 ( (nK ( ))1/2 )1 ] with error term of order O(n 3/2 ) . This is the Lugannani and Rice (1980) formula which may be more elegantly expressed as below without any further proof: PROPOSITION 2: Given an IID standard normal random sample {rt } of size T, construct {xt } according to (3). Let be the saddlepoint such that (10) is satisfied, and define nK ( ) and sgn( ) 2n( x K ( )) , where sgn(s) equals to zero when s = 0, or takes the same sign of s when s 0. Then the tail probability of exceeding the sample mean x X is given by 1 1 P( X x ) 1 ( ) ( ) O(T 3/2 ) . < Insert Figure 1: Confidence bounds of TR > 11 See for examples Watson (1948) and Daniels (1954). 11 (11) Figure 1 depicts the 99% confidence bounds of TR using the saddlepoint and normal asymptotic distributions. Since TR is calculated based on 99%VaR, the saddlepoint lower bound is not available for sample size T 500 (0.99500 > 0.005). It is evident from Table I that the saddlepoint technique yields accurate test sizes. Large deviation between the saddlepoint and normal limiting distribution below therefore reveals that the latter approximate poorly the true distribution of TR when the sample size is small. In order to obtain the tail probability, I first calculate the saddlepoint by solving for t in the following equation K (t ) M (t ) (t q) (q t ) (q t ) x, M (t ) (q t ) (1 ) exp( qt t 2 /2) (12) so that the required moment derivatives given in (5) can be evaluated at t . Then moment derivatives are used to obtain the cumulant counterparts, which in turn are used to calculate and as defined in Proposition 2, and thus the required probability. Above tail probability can be calculated only when x X . Though the case of x X is hardly encountered in practice, the corresponding tail probability is provided below for completeness: P( X X ) 1 K (3) (0) O(n 3/2 ) . 3 2 6 2 n [ K (0)] Finally, it is noted that the Lugannani and Rice formula given by (11) is applicable only when the sample mean x 0 . When x 0, it is obvious that P( X 0) 1. Hypothesis testing For hypothesis testing, it is more intuitive to use the standardized statistic ~ z 1TR , (12) where TR is given by (1). While the likelihood ratio backtests of Kupiec (1995) and Berkowitz (2001) are two-tailed tests, the proposed backtest based on (12) can be either one- or two-tailed. For a two-tailed backtest, the null and alternative hypotheses are H0 : ~ z ~ z0 versus H1 : ~ z ~ z0 , z 0 0.3389 under the standard normal null hypothesis.12 To carry out a where ~ 12 For a standard normal distribution, VaR = -2.3264, ES = -2.6652 and therefore TR = ES – VaR = 12 one-tailed regulatory backtest to check whether a risk model provides sufficient tail risk coverage, the null and alternative hypotheses become H0 : ~ z ~ z0 versus H 1 : ~ z ~ z0 . Since portfolio losses are represented by negative numbers, if the theoretical TR predicted by the risk model is more negative (i.e. larger tail risk) than the empirical TR, then the risk model is said to “capture” the tail risk. On the other hand, if the alternative hypothesis is accepted, then the risk model is described as failing to capture the tail risk of VaR. 4. Monte Carlo simulations In this section, Monte Carlo simulation experiments are carried out to investigate the empirical test sizes and the powers of the proposed backtests when is set at 0.01 for market risk measurement. To carry out the simulation study, IID samples of various sizes are generated and the corresponding p-values of the observed standardized statistic given by (12) are calculated using the Lugannani and Rice formula. For each sample size, rejections are made if the calculated p-values are less than the specified significance levels ranging from 0.5% to 5%. The observed empirical sizes are then obtained by dividing the numbers of rejections by the total numbers of simulations, which are 100,000. For each empirical test size, likelihood ratio test of the empirical failure rate is applied. 4.1 Empirical test sizes When the IID samples are generated from standard normal distribution, the Monte Carlo rejection rates give us the empirical test sizes, which are reported in first two panels of Table I below. The top panel provides the results for the right tail of the distribution. Since the Lugannani and Rice formula is only valid for x 0 , the CDF is set at the maximum value of 1 when the sample mean equals to zero. For ease of reference, the first column of Panel A lists the theoretical probabilities when x 0 . For example, when T = 250, the corresponding probability is 0.99 250 0.0811. Accuracy of empirical test sizes are tested along the same line as the Kupiec’s likelihood ratio test. Notation * indicates significance at 1% level. The last four -0.3389. This result can also be obtained by evaluating (7). 13 columns of Panel A reveals that as long as the theoretical test size is smaller than the corresponding probability value when x 0 , the empirical test sizes are very close to the theoretical ones. <Insert Table I: Empirical test sizes> Right tail of the distribution can be used to check whether an observed TR is over predicted by a risk model. However, in terms of the regulatory capital requirement for buffeting huge losses, one would need to examine the left tail of the distribution.13 Panel B shows the empirical test sizes at the left tail of the distribution and the accuracy of the Lugannani and Rice formula is demonstrated. As a benchmark for comparison, the last panel of the table provides the empirical test size of the LR test by Kupiec (1995) for unconditional coverage. Since the LR ratio test statistic achieves the Chi-squared distribution only when the sample size is large, most of the empirical test sizes are quite far off when T is less than 1000. This is basically due to the fact that for market risk, is set at a low level of 0.01. 4.2 Empirical test powers Student t distributions with 4, 6 and 8 degrees of freedom (df) are used for the study of the powers of the Bernoulli, Kupiec and the proposed size-based backtests. It is well known that the tail of Student t distribution behaves similarly to that of the heavy-tailed case of a generalized Pareto distribution (GPD).14 Given a sample of size T with b losses exceeding the magnitude of VaR, the p-value of the one-tailed Bernoulli test is given by 1 P ( B b) where B is distributed as Binomial (T , ) . While the Kupiec’s LR test is a two-tailed test, the backtest of tail risk of VaR can be either one- or two-tailed (see Section 3.2 above). Only rejection rates at 5% significance level of above backtests are presented in Table II below. Panel A tabulates the empirical test powers of the two-tailed Kupiec and the size-based proposed backtests for various sample sizes ranging from 125 to 4,000 15 ; the corresponding one-sided test powers are given in Panel B. As expected, both the Kupiec and Bernoulli tests show some powers only when T is more than 1,000. For 13 For a two-sided test based on tail risk of VaR, both the right and left tails of the test statistic’s distribution are required. This is in contrast to the LR tests of Kupiec (1995) and Berkowitz (2001) where only right tail of the Chi-squared distribution is required for a two-tailed test. 14 Under the extreme value theorem, the maxima of random samples tend in limit to a GPD; see Embrechts et al. (1997). 15 In the application of the proposed backtest of tail risk of VaR to real data in Section 5, the sample size is 3,791. 14 the proposed backtest of tail risk of VaR, equivalent powers are achieved in the case of df = 4 when sample sizes are as small as 125. <Insert Table II: Test powers> 5. Application to S&P 500 daily returns In this section three widely studied risk models are used to forecast one-step ahead S&P 500 daily returns from 1-Jan-1986 to 31-Dec-2000; see Figure 3 below. 3,000 observations prior to 1-Jan-1986 are first used to estimate the risk models and then produce ten one-step-ahead VaR forecasts. The estimation window is rolled forward ten trading days and a new set of model parameters are obtained, which will in turn be used to forecast one-step ahead for the next ten VaRs. In this manner, the process is continued until the last day of the sample, 31-Dec-2000, is reached. The chosen risk models are capable of modeling the stylized dynamics of stock returns such as the time varying conditional variances, skewed and fat-tailed distribution, and jumps. The risk forecasts are backtested with the frequency-based Kupiec and Bernoulli tests, as well as the proposed backtest based on tail risk of VaR. Two-sided Kupiec and the proposed size-based VaR backtests are applied to the 15 years of VaR forecasts. Along the same line as the Basle rules on the backtest for market risk capital, one-sided Bernoulli and the proposed backtests are also carried out on yearly basis. The results of backtests suggest there are substantial downside tail risks present in the stock markets. < Insert Figure 2: S&P 500 daily returns > 5.1 Risk models Before considering the three VaR models in detail, the basic risk model is first introduced here. Let rt be the portfolio return on day t and t be the corresponding time varying daily expected return. Then the basic risk model is rt t t a rt 1 t , t 1,t 2,t , where the innovation process t has two components which are independent of each 15 other, namely the diffusion component 1,t and the jump component 2 ,t . The diffusion component can be further modeled by the time varying conditional variance t and an IID innovation process zt as 1,t t z t . For the jump component, 2 ,t J t E[ J t | t 1 ] where J t is a jump process, the detail of which will be described below. N-GARCH The first risk model, GARCH with normal innovations (henceforth N-GARCH), has been used by Berkowitz and O’Brien (2002) to forecast the profits and losses of trading accounts of six major commercial banks. Berkowitz and O’Brien find that the standard GARCH model outperforms the banks’ structural VaR models in that the former achieves comparable risk coverage with less regulatory capital. 16 For N-GARCH, the innovation process t is fully described by the diffusion component with the following further specification: t t21 t 1 , zt ~ N (0,1). The advantage of N-GARCH is able to model effectively the time-varying conditional variances that are present in the financial asset returns. SKST-APARCH Empirical findings show that the N-GARCH residuals still exhibit fatness in the tails, and that financial asset returns are often skewed; see, for example, Mittnik et al. (2000). For this reason, the Asymmetric Power ARCH model with skewed Student innovations (hereafter SKST-APARCH) is employed by Giot and Laurent (2003) to forecast VaRs for long and short trading positions.17 Other literature of applying SKST-APARCH for risk modeling includes Giot and Laurent (2004), Giot (2005) and Tu and Wong (2005). Similar to N-GARCH, only the diffusion component is required, 16 Berkowitz and O’Brien (2002) attribute the poor performance of the banks’ structural VaR models to (a) since large banks’ trading portfolios are large and complex, approximations are used to reduce computational burdens and overcome estimation difficulties; (b) time variation in the variances and covariances of market risks are not properly captured; and (c) conservative modeling practices and regulatory constraints (prior to the revised Capital Accord II) such as the simple summation of subportfolio VaRs to produce the bank-level global VaR. 17 Another candidate model that can fit fat-tails and skewness well is the Normal Inverse Gaussian distribution; see, for example, Eberlein et al. (1998). 16 and its specification is provided below. t ( t 1 n t 1 ) t1 , z t ~ skst (0,1, , ) . The IID innovation z t has standardized (zero mean and unit variance) skewed Student distribution with PDF m 2s 1 g[ ( sz m) ] if z s , f ( z , ) 2s m g[ 1 ( sz m) ] if z , 1 s where g ( ) is the standardized Student density. While is the degree of freedom parameter that describes fatness at the tails, is the skewed parameter which is less (greater) than 1 if the distribution is skewed to the left (right). The other two parameters, m and s2 are respectively the mean and variance of the otherwise non-standardized skewed Student: m (( 1) / 2) 2 ( / 2) ( 1 ) , s 2 ( 2 2 1) m 2 . GARJI Finally, the jump-diffusion model (GARJI) of Maheu and McCurdy (2004) is the last model considered in this paper. The GARJI model has been found by Chiu et al. (2005) successful in capturing the jump dynamics of stock markets and producing accurate VaR forecasts. Maheu and McCurdy use GARJI to model the conditional variance of returns as a combination of jumps and smoothly changing components. Large fluctuations in price due to the impact of news, as well as smoother changes in price as a result of normal trading can be captured effectively by this model. In particular, the smoother diffusion component 1,t t zt where zt ~ N (0,1) and the time varying variance is given by t2 g (, t 1 ) t21 t21 , where ( , j , n , nj ) is the parameter vector. The coefficient function g (, t 1 ) enables both jump and normal innovations to be feedback asymmetrically into the conditional variance, and thus able to produce better volatility forecasts 17 especially after large moves such as the 1987 October crash. Specifically, it is written as g (, t 1 ) exp j E[nt 1 | t 1 ] I ( t 1 )( n nj E[nt 1 | t 1 ]) , where nt 1 is a Poisson process counting the number of jumps at time t 1 and I ( t 1 ) equals one if t 1 0 , zero otherwise. The Poisson process nt comes from the second jump component with time varying conditional jump intensity t . That is P(nt j | t 1 ) exp( t ) tj , j! where t v t 1 t 1 , and t 1 E[nt 1 | t 1 ] t 1 jP(nt 1 j | t 1 ) t 1 .18 j 0 Here t 1 is the ex post measured shock or jump intensity residual to t 1 . Next the jump process is defined as the sum of IID normal jump sizes, that is J t kt1Yt ,k , n where the jump size Yt ,k ~ IID N ( ,V ) . So the jump component mentioned above is obtained as 2,t J t E[ J t | t 1 ] k 1Yt ,k t . nt Readers are referred to Chan and Maheu (2002) for a more flexible specification on the jump dynamics. Finally, it is remarked that the conditional variance of GARJI consists of the smoother diffusion component as well as the jump component: var( rt | t 1 ) var( 1,t | t 1 ) var( 2,t | t 1 ) t2 ( 2 V ) t . All three risk models are estimated using maximum likelihood method and Table III reports the relevant estimated parameters. <Insert Table III: Risk models> 5.2 Calculation of VaR Both the normal and Student distributions belong to the class of elliptical 18 Similar to Maheu and McCurdy (2004), the infinite summation is truncated at j = 20 in the estimation of the GARJI model. 18 distributions and the corresponding VaR can be expressed as VaR t Z t , where Z is the left -quantile. Above VaR expression is for the downside risk of holding a long portfolio. For the upside risk of a short position, the corresponding VaR is VaR 1 t Z1 t . For standard normal distribution, Z Z1 2.326 when is set at 1% for market risk measurement. For the standardized skewed Student distribution,19 skst , , m Z skst , , s , (13) where if st , is the quantile function of the unit variance Student density at %, skst , , 1 st , 2 (1 2 ) 2 1 st , 2 (1 ) if (1 2 ) 1 , if (1 2 ) 1 . (14) There is no simple formula for the VaR of the jump-diffusion model, GARJI. Nevertheless, we can make use of the CDF function20 P(rt R | t 1 ) f (rt | nt j , t 1 )drt P(nt j | t 1 ) j 0 q (15) R j t t P(n j | ), t t 1 2 j 0 t jV so that the VaRs for the long and short positions can be obtained respectively from the facts that P(rt VaR | t 1 ) and P(rt VaR 1 | t 1 ) 1 . < Insert Table IV: VaR basic statistics > Table IV provides some basic statistics of the VaRs forecasted by the three risk models. For the in-sample forecast VaR, it can be seen that both GARJI and SKST-APARCH have the larger downside VaRs, which is consistent with the skewness present in the S&P 500 daily returns. Another interesting observation is that N-GARCH has the largest maximum VaRs on both tails of the distribution, whereas the sum of its average downside and upside VaRs is only slightly smaller than that of GARJI. For the out-of-sample forecast VaRs, it is noted that the difference from its in-sample counterparts is quite large for GARJI, an indication of possible model risk due to large number of parameters to be estimated in GARJI. Z 1 . 19 Note that (13) and (14) are also applicable to 20 As in the estimation of GARJI, the infinite summation is truncated at j = 20. 19 5.3 Backtests While qualitative and graphical analyses such as those by Diebold et al. (1998) reveal valuable information for diagnosing how a risk model may fail, it is also important to consider a formal statistical backtest that offers a more objective analysis. To carry out the size-based TR backtest, it is required to transform the skewed Student distribution into a standard normal one using 1 ( Fskst ( z , )) , where the standardized skewed Student CDF is m 2 if z , 1 2 G[ ( sz m) ] s Fskst ( z , ) 2 m 1 G[ 1 ( sz m) ] if z . 1 2 s (16) In (16) above, G ( | ) is the unit variance Student CDF. For the jump-diffusion process, one simply obtain the required normal deviates by applying the standard normal inverse function to the probability value from (15) for each observed rt . <Insert Table V: Backtests of full sample> Table V reports both frequency- and size-based two-tailed backtests for the full forecast period, a total of 3791 observations. First consider the downside risk of holding a long position in S&P 500. For the frequency-based backtest, one finds that the VaRs of N-GARCH fail the Kupiec test but the SKST-APARCH and GARJI models are able to produce VaRs that provide correct coverage of S&P 500 daily fluctuations. In the case of downside tail risk, however, it is found that all risk models fail the proposed backtests, except for the in-sample SKST-APARCH case. In particular, as a negative significant ~z statistic means that the predicted TR is too small when compared with the empirical TR, one can conclude that the risk models fail to capture downside tail risk of VaR in S&P 500. Also, the fact that the in-sample risk forecasts of SKST-APARCH pass the proposed backtest is consistent with the estimation result in Table III where the model achieves the largest likelihood value. When one holds a short position in S&P 500, he or she faces upside risks. In this case, a large positive (negative) ~z value means that the empirical TR is significantly larger (smaller) than the TR predicted by the risk model. Though the N-GARCH risk model passes all backtests, its downside risk forecasts fail the TR backtest miserably 20 with large negative ~z values.21 Both SKST-APARCH and GARJI are found to provide excessive coverage of the tail risks, an indication of failure in modeling the skewness of the distribution. The exception lies in the GARJI where its in-sample TR forecast fails to capture the empirical tail risk. To sum up above analysis, substantial skewness and downside tail risks of VaR are found to present in S&P 500 daily returns, which are difficult to model and forecast. <Insert Table VI: Backtests of tail risk-year on year> Next I apply the proposed TR backtest to the S&P 500 daily returns along the line of Basle rules so that the test is one-sided and is carried out on year-to-year basis. It is remarked here that for a year’s daily returns, a sample size of around 250 is too small for a two-sided test since in this case the probability of zero exception is 0.081, larger than the significance level of 0.05. For this reason, Table VI reports only the onetailed Bernoulli and the proposed size-based VaR backtests. While the Bernoulli test rejects N-GARCH for 3 out of 15 years, the proposed backtest detects significant tail risks unpredicted by the model for 10 years of the full sample period. Basically, the years of substantial tail risks took place during the few years surrounding the 1987 October crash and from 1996 to 2000, when the stock market circuit breaker was triggered for the first time in 1997 and the LTCM nearly collapsed in the following year.22 While it is reassuring that both SKST-APARCH and GARJI pass the backtest of tail risk of VaR during the second period, these two risk models nevertheless fail to capture the tail risk of VaR in 1987 and the year before. Indeed, the outcome remains the same for the crash year even if the in-sample forecasts are used, illustrating the challenge one faces in modeling the market fall of over twenty percent. It is worthwhile to note that there was no circuit breakers mechanism in 1987 when the stock market experienced its largest fall in a single day of trading. Therefore, questions naturally arise: whether such trading halt rules would prevent the 1987 crash ever to happen again or the above finding that the post 1996 stock market tail risks can be captured by the risk models is attributable to the circuit breakers? To answer such questions, however, is beyond the scope of this paper but certainly merits further research. 21 Note that for large sample, the TR test statistic is close to normal under the null hypothesis. So z is in this case close to standard normal. -18.91 and -20.41 are very large negative numbers as ~ 22 Prior to the LTCM’s downfall, we witnessed in 1998 the devaluation of the Russian ruble followed by Russian debt default. 21 Tail risk and subadditivity As from 2006 onwards, the new Basle Accord requires major banks to incorporate the operational risk in an integrated risk framework. Operational losses are infrequent but large in magnitude, it is likely that its distribution is not elliptical and thus it is important for risk managers to carry out backtest of tail risk of VaR. Since the S&P 500 daily returns in 1987 are found to have substantial downside tail risk, the subadditivity of VaR is explored here. In particular, consider the 1987 daily returns of S&P 500, Nikkei 225 and an equal weighted portfolio of the two stock indexes. 23 For the sake of illustration, the exchange rate risk is assumed to be perfectly hedged away, so that all three sets of returns are directly comparable. Using order statistics, 99% VaR in the year are calculated as -6.57, -4.65 and -11.74 percentages respectively. Since the sum of component VaRs, -11.22, is less negative than the portfolio VaR (the reader is reminded that a loss is represented by a negative number), the subadditivity of VaR breaks down. As can be seen above, all three risk models fail the backtest of tail risk of VaR in 1987. As non-subadditivity is often caused by substantial tail risk or unusually large losses, the proposed backtest could in practice offer preliminary check on whether the VaR of a given sample would fail the subadditive property. Finally, it is remarked that the corresponding ES values are also calculated and are found to be subadditive.24 6. Conclusion Despite the fact that VaR has certain undesirable theoretical properties, I briefly outline in this paper the reasons for its adoption by the Basle Committee for market risk capital determination and its subsequent establishment as the standard risk measure in financial industry. I then proceed to propose a simple backtest of VaR based on sizes of tail losses. The associated test statistic TR has good economic intuition, is easy to calculate and allows the backtest to be formulated as either a oneor two-tailed test. Saddlepoint or small sample asymptotic technique is used to obtain the p-value of the test statistic. Since it makes use of information conveyed by sizes of 23 It is assumed that the exchange rate risk is perfectly hedged away. The ES of S&P 500, Nikkei 225 and the equal weighted portfolio are -15.77, -10.60 and -19.16 respectively. 24 22 exceptions, the proposed backtest has accurate test size and is more powerful than frequency-based backtests even for small samples. This is important because regulatory backtest is required to be carried out on most recent 250 observations and that samples available for backtesting are often quite small in practice. The proposed backtest of tail risk of VaR is applied to the S&P 500 daily returns from 1986 to 2000. Substantial downside tail risk is found to present in the stock market returns, and risk models that take into account of jumps, skewed and fat-tailed distributions fail to capture the downside tail risks, especially in 1987. In particular, the VaR of S&P 500 and Nikkei 225 in the crash year fail to satisfy the subadditive property. Such empirical evidence of non-subadditivity of VaR has important implication for major banks. From 2006 onwards, the new Capital Accord requires major banks to incorporate the operational risk in an integrated framework. As operational risk is characterized by large and infrequent losses, tail risk of VaR is likely to be substantial and the associated problem of non-subadditivity may not be ignored. Thus in measuring a bank’s integrated risk, it is important to formally backtest the tail risk of VaR, so that a risk manager would have more information about the nature of risk (reflected by the tails of the underlying distribution) his or her bank is facing. In this paper, the saddlepoint technique for backtesting VaR based on tail losses is derived under the standard normal null hypothesis. However, it must be emphasized that the small sample asymptotic technique is also applicable to non-normal distribution. An alternative approach that is considered in this paper is to transform the non-normal distribution into a normal one. In this case, even for VaR obtained using non-parametric method such as order statistics, the proposed backtest may still be applied by transforming the sample distribution into a normal one.25 Given the potential advantage of backtesting VaR based on the sizes of tail losses, such effort is certainly worthwhile. Finally, VaR is now ubiquitous in risk management and its wide usage is difficult to be changed. Therefore, it is useful to backtest tail risk of VaR in a manner that is complimentary to the backtesting procedure stipulated by the Capital Accord. Specifically, the Basle rules count the number of exceptions whereas the proposed backtest of VaR sums the sizes of tail losses. Such complimentary role provides not 25 There are many ways to carry out such transformation. For example, the sample distribution can first be estimated by using the kernel smoothing, and then transformed into a normal distribution. 23 only valuable information for more powerful backtesting, but also presents a formal statistical analysis on the nature of tail risk of VaR and facilitates, if any, appropriate supervisory action to be made. Reference Artzner, Philippe, Freddy Delbaen, Jean M. Eber, and David Heath, 1997, Thinking coherently, Risk 10(11), 68 – 71. Artzner, Philippe, Freddy Delbaen, Jean M. Eber, and David Heath, 1999, Coherent measures of risk, Mathematical Finance 9(3), 203 – 228. Basle Committee on Banking Supervision, 1996a, Amendment to the Capital Accord to Incorporate Market Risk. Basle Committee on Banking Supervision, 1996b, Supervisory Framework for the use of “Backtesting” in conjunction with the Internal Models Approach to Market Risk Capital Requirements. Basle Committee on Banking Supervision, 2004, International convergence of capital measurement and capital standards: a revised framework. Basak, Suleyman, and Alexander Shapiro, 2001, Value-at-Risk based risk management: Optimal policies and asset prices, Review of Financial Studies 14, 371 – 405. Berkowitz, Jeremy, 2001, Testing density forecasts, with applications to risk management, Journal of Business and Economic Statistics 19, 465 – 474. Berkowitz, Jeremy, and James O’Brien, 2002, How accurate are Value-at-Risk models at commercial banks? Journal of Finance 57, 1093 – 1111. Chan, W. H. and J. M. Maheu, 2002, Conditional jump dynamics in stock market returns, Journal of Business & Economic Statistics 20, 377 – 389. Chiu, Chien L., Lee, Ming C., and Jui C. Hung, 2005, Estimation of Value-at-Risk under jump dynamics and asymmetric information, Applied Financial Economics 15, 1095 – 1106. Christoffersen, Peter F., 1998, Evaluating interval forecasts, International Economic Review 39, 841 – 862. Christoffersen, Peter F., and Francis X. Diebold, 2000, How relevant is volatility forecasting for financial risk management, Review of Economics and Stiatistics 24 82, 12 – 22. Christoffersen, Peter F., Jinyong Hahn, and Atsushi Inoue, 2001, Testing and comparing Value-at-Risk measures, Journal of Empirical Finance 8, 325 – 342. Daniels, H. E., 1954, Saddlepoint Approximations in Statistics, Annals of Mathematical Statistics 25, 631 – 650. Daniels, H. E., 1987, Tail Probability Approximations, International Statistical Review 55, 37 – 48. Danielsson, Jon, Bjorn N. Jorgensen, Mandira Sarma, and Casper G. De Vries, 2006, Comparing downside risk measures for heavy tailed distributions, Economics Letters 92, 202 – 208. Diebold, F. X., T. A. Gunther, and A. S. Tay, 1998, Evaluating density forecasts, International Economic Review 39, 863 – 883. Dowd, Kevin, 2001, Estimating VaR with order statistics, Journal of Derivatives 8(3), 23 – 31. Dowd, Kevin, 2006, Using Order Statistics to Estimate Confidence Intervals for Probabilistic Risk Measures, Journal of Derivatives, winter 2006. Eberlein, Ernst, Ulrich Keller, and Karsten Prause, 1998, New insights into smile, mispricing and value at risk: the hyperbolic model, Journal of Business 71, 371 – 406. Embrechts, Paul, Claudia Kluppelberg, and Thomas Mikosch, 1997, Modelling Extremal Events, Springer, Berlin Heidelberg. Embrechts, Paul, Alexander McNeil, and Daniel Straumann, 1999, Correlation and dependence in risk management: Properties and pitfalls, Risk 12, 69 – 71. Giot, Pierre, 2005, Implied Volatilities Indexes and Daily Value at Risk Models. Journal of Derivatives 12(4), 54 – 64. Giot, Pierre, and Sebastien Laurent, 2003, Value-at-Risk for long and short trading positions, Journal of Applied Econometrics 18, 641 – 664. Giot, Pierre, and Sebastien Laurent, 2004. Modelling daily Value-at-Risk using realized volatility and ARCH type models. Journal of Empirical Finance 11, 379 – 398. Jorion, Philippe, 2001, Value at Risk, 2nd Edition, McGraw Hill, New York. Kerkhof, Jeroen and Bertrand Melenberg, 2004, Backtesting for risk-based regulatory capital, Journal of Banking and Finance 28, 1845 – 1865. Kupiec, Paul H., 1995, Techniques for verifying the accuracy of risk measurement 25 models, Journal of Derivatives 3, 73 – 84. Leippold, Markus, 2004, Don’t rely on VaR, Euromoney, November, London. Lotz, Christopher, and Gerhard Stahl, 2005, Why value at risk holds its own? In: Euromoney, February 2005, London. Lugannani, R., and S. O. Rice, 1980, Saddlepoint approximation for the distribution of the sum of independent random variables, Advanced Applied Probability 12, 475 – 490. Maheu, John M., and Thomas H. McCurdy, 2004, News Arrival, Jump Dynamics, and Volatility Components for Individual Stock Returns, The Journal of Finance 59 (2), 755 – 793. Mittnik, S., Paolella, M.S., Rachev, S., 2000. Diagnosing and treating the fat tails in financial returns data. Journal of Empirical Finance 7, 389 – 416. Rosenberg, Joshua V., and Til Schuermann, 2006, A general approach to integrated risk management with skewed, fat-tailed risks, Journal of Financial Economics 79, 569 – 614. Szegö, Giorgio, 2002, Measures of risk, Journal of Banking and Finance 26, 1253 – 1272. Tu, Anthony H., and Woon K. Wong, 2005, Value-at-Risk for Long and Short Positions of Asian Stock Markets. 2005 International Conference on Business and Finance Proceedings, Tamkang University, Taiwan, R.O.C. Watson, G. N., 1948, Theory of Bessel Functions, Cambridge University Press. Yamai, Yasuhiro, and Toshinao Yoshiba, 2005, Value-at-risk versus expected shortfall: A practical perspective, Journal of Banking and Finance 29, 997 – 101. 26 Table I: Empirical test size of backtest of VaR based on tail losses 100,000 simulations of various sample sizes are generated to study the empirical test sizes of backtest of VaR based on tail losses and the Kupiec’s (1995) likelihood ratio backtest of failure rate. The former can be used for both one- and two-tailed tests, and the first two panels below list the empirical test sizes on the two tails of the distribution. The latter can only be used for two-tailed test and Panel C lists the corresponding empirical test sizes. All empirical test sizes are tested using the Kupiec’s (1995) likelihood ratio test. Notation * indicates significance at 1% level. N (%) P ( x 0) 0.5 Theoretical test size (%) 1.0 2.5 Panel A: Right-tail empirical test sizes of backtest of tail risk of VaR 125 28.47 28.59* 28.59* 28.59* 250 500 750 1000 2000 4000 8.11 0.66 0.05 0.00 0.00 0.00 8.28* 5.16 5.07 4.98 5.00 4.99 Panel B: Left-tail empirical test sizes of backtest of tail risk of VaR 125 0.48 0.99 2.44 250 0.53 1.01 2.48 4.98 5.03 Panel C: Kupiec’s empirical test sizes 125 250 500 750 1000 2000 4000 8.28* 1.08* 1.04 1.02 1.01 0.97 28.59* 8.28* 2.60 2.58 2.47 2.47 2.51 500 750 1000 2000 4000 8.28* 0.67* 0.54 0.50 0.51 0.47 5.0 0.52 0.49 0.51 0.50 0.47 1.02 1.00 1.02 1.01 0.95 2.53 2.56 2.46 2.55 2.44 5.05 5.05 4.92 4.94 4.91 0.17* 0.10* 0.86* 0.17* 0.42* 1.19* 0.88* 9.61* 1.99* 3.83* 9.61* 7.20* 0.66* 0.43* 0.46 0.54 0.93 1.32* 0.97 0.89* 2.95* 2.35* 2.33* 2.52 4.07* 5.54* 4.19* 5.67* 27 Table II: Test powers Powers of various backtests under the standard normal null hypothesis are investigated based on 10,000 simulations of Student t random deviates with 4, 6 and 8 degrees of freedom (df). TR, Kupiec and Bern denote respectively the proposed backtest of tail risk of VaR, the Kupiec likelihood ratio test of failure rate and the Bernoulli backtest. Panel A reports the empirical test powers of two-tailed TR and Kupiec backtests whereas Panel B presents that of one-tailed TR and Bernoulli backtests. Panel A: Two-tailed tests df = 4 TR Kupiec N 125 250 500 750 1000 2000 4000 df = 6 TR Kupiec df = 8 TR Kupiec 0.363 0.125 0.531 0.111 0.260 0.114 0.390 0.104 0.196 0.098 0.287 0.093 0.746 0.866 0.930 0.995 1.000 0.581 0.715 0.805 0.966 0.999 0.436 0.555 0.650 0.873 0.988 0.228 0.250 0.345 0.533 0.851 0.203 0.215 0.301 0.461 0.782 0.164 0.164 0.235 0.348 0.639 Panel B: One-tailed tests N 125 250 500 750 1000 2000 4000 df = 4 TR Bern 0.423 0.599 0.796 0.898 0.950 0.997 1.000 df = 6 TR Bern 0.125 0.180 0.223 0.347 0.443 0.610 0.881 0.325 0.463 0.651 0.778 0.856 0.978 1.000 28 0.114 0.161 0.197 0.306 0.389 0.538 0.822 df = 8 TR Bern 0.257 0.369 0.521 0.638 0.723 0.917 0.993 0.098 0.134 0.156 0.244 0.312 0.420 0.688 Table III: Risk Models Three risk models, namely N-GARCH, SKST-APARCH and GARJI are estimated using 3,791 S&P 500 daily returns from 1986 to 2000. SKST-APARCH is found to have the largest likelihood. Q2(20) is Ljung-Box portmanteau test for serial correlation in the squared standardized residuals with 20 lags. p-values of Ljung-Box tests are shown in brackets. Parameter N-GARACH Coeff. t-stat SKST-APARCH Coeff. t-stat GARJI Coeff. t-stat 0.064 4.29 0.042 3.15 0.035 2.74 A 0.040 2.35 0.015 0.95 0.029 1.70 0.016 1.23 0.012 2.59 0.000 0.57 0.093 1.86 0.063 5.96 -0.383 -9.54 0.897 16.79 0.940 77.5 0.977 218.7 n 0.595 5.80 0.003 0.09 1.018 7.98 5.569 9.81 -0.054 -2.59 j -0.319 -1.29 nj 0.300 1.22 0.017 4.13 0.960 127.8 0.639 6.87 -0.667 -7.47 V 0.743 8.27 Likelihood Q2(20) -5014.7 9.12 [0.982] -4783.5 11.88 [0.920] 29 -4797.9 7.34 [0.995] Table IV: VaR basic statistics Basic statistics are reported for the in-sample and out-of-sample VaR forecasts by the three risk models in Section 5. Max refers to the largest VaR in magnitude. Large difference between in-sample and out-of-sample VaR forecasts of GARJI is observed. Panel A: In-sample VaR statistics Downside risk Upside risk SKST- SKST- N-GARCH APARCH GARJI N-GARCH APARCH GARJI Average -2.20 -2.48 -2.53 2.33 2.39 2.02 Max -17.76 -12.97 -10.35 17.23 11.87 8.33 Panel B: Out-of-sample VaR statistics Downside risk Upside risk SKST- SKST- N-GARCH APARCH GARJI N-GARCH APARCH GARJI Average -2.19 -2.40 -2.79 2.30 2.43 2.64 Max -16.58 -15.35 -47.59 15.32 14.36 42.46 30 Table V: Backtests of full sample Two-tailed backtests are carried out on full sample VaR forecasts from 1986 to 2000, a total of 3791 observations. Panels A and B report respectively the results of backtests for the in-sample and out-of-sample VaR forecasts. Empirical failure rate and TR statistic are reported for the Kupiec backtest and the proposed backtest of tail risk of VaR respectively. The corresponding p-values are shown in brackets. The theoretical value of TR is -0.33887 (0.33887) for the downside (upside) risk. Downside risk Upside risk Panel A: In-sample Kupiec (%) TR SKSTN-GARCH APARCH GARJI SKSTN-GARCH APARCH 1.77 [< 0.001] -1.748 0.98 [0.881] -0.443 0.87 [0.412] -0.564 0.82 [0.244] 0.343 [< 0.001] [0.175] [0.003] [0.919] 0.55 [0.003] 0.132 GARJI 1.50 [0.004] 0.576 [< 0.001] [0.005] Panel B: Out-of-sample SKSTN-GARCH APARCH GARJI Kupiec (%) 1.71 [< 0.001] 0.87 [0.412] 0.95 [0.753] 0.63 [0.015] 0.66 [0.025] TR -1.860 -0.642 [< 0.001] [< 0.001] -0.545 [0.012] 0.431 [0.223] 0.145 [0.002] 0.180 [0.018] 1.11 [0.512] SKSTN-GARCH APARCH < TABLE VI: Out of Sample Backtests on last page > 31 GARJI Figure 1: Confidence bounds of TR The figure below depicts the 99% confidence bounds of TR using the saddlepoint and normal asymptotic distributions. Since TR is calculated based on 99%VaR, the saddlepoint lower bound is not available for sample size T 500 (0.99500 > 0.005). It can be seen from Table I that the saddlepoint technique yields accurate test sizes. Large deviation between the saddlepoint and normal limiting distribution below shows that the latter approximates poorly the true distribution of TR when the sample size is small. 99% Confidence Bounds of TR 1 0.5 0 -0.5 -1 -1.5 -2 -2.5 100 200 300 400 500 600 700 800 900 1000 Sample size T Saddlepoint Lower Bound Saddlepoint Upper Bound Normal Lower Bound Normal Upper Bound 32 Figure 2: S&P 500 daily returns S&P 500 daily returns from 2-Jan-1986 to 30-Dec-2000 are depicted below. The lower bound is the 99% VaR for holding a long position in S&P 500, whereas the upper bound is the counterpart VaR for holding a short position. 15 10 5 0 S&p 500 -5 Long-VaR Short-VaR -10 -15 -20 33 2000 1998 1996 1994 1992 1990 1988 1986 -25 TABLE VI: Out of Sample Backtests Tests are based on 99% VaR. Bern refers to the one-tailed Bernoulli backtest and the reported frequency is the number of exceptions. TR refers to the one-tailed backtest of VaR based on tail losses and the reported TR statistic equals -0.33887 (0.33887) under the null hypothesis for downside (upside) risk. Notations * and ** are significance levels at 5% and 1% respectively. Downside risk N-GARCH Upside risk SKST-APARCH GARJI N-GARCH SKST-APARCH GARJI Year N Bern TR Bern TR Bern TR Bern TR Bern TR Bern TR 1986 1987 1988 1989 1990 1991 253 253 253 252 253 253 5 7* 4 4 5 3 -3.05** -5.97** -1.91** -3.41** -0.73 -1.49** 5 5 3 2 3 2 -2.03** -2.04** -0.76 -0.89 -0.20 -0.57 4 3 3 1 1 1 -1.43** -1.20* -1.04 -1.13* -0.08 -0.57 4 3 5 2 1 4 0.36 0.55 1.18* 0.47 0.33 1.06* 1 2 5 1 1 4 0.16 0.27 0.42 0.16 0.12 0.49 1 1 2 1 1 3 0.08 0.03 0.53 0.16 0.17 0.31 1992 1993 1994 1995 1996 1997 1998 254 253 252 252 254 253 252 1 2 4 2 7* 5 7* -0.10 -0.68 -0.83 -0.33 -1.91** -2.71** -2.33** 1 2 2 2 3 4 5 -0.02 -0.31 -0.31 -0.17 -0.50 -0.57 -0.57 0 2 1 2 4 3 4 0.00 -0.23 -0.25 -0.11 -0.49 -0.64 -0.46 0 1 0 1 0 3 3 0.00 0.01 0.00 0.14 0.00 0.38 0.19 0 1 0 1 0 3 0 0.00 0.01 0.00 0.13 0.00 0.16 0.00 0 2 0 4 0 3 3 0.00 0.14 0.00 0.21 0.00 0.25 0.34 1999 2000 252 252 4 5 -1.27** -1.18* 1 2 -0.37 -0.31 1 3 -0.01 -0.54 2 7* 0.25 1.56** 1 4 0.02 0.24 1 3 0.03 0.46 34