
Spurious Mutual-Fund Performance

Richard J. Sweeney *
McDonough School of Business, Georgetown University
37th and “O” Streets, NW, Washington, D.C., 20057
1-202-687-3742, fax 1-202-687-7639, -4031
e-mail: [email protected]

November 22, 2005

Abstract: The portfolio manager, without superior insight, changes the risky-portfolio weight in inverse proportion to the portfolio’s excess return, giving the weight a unit root. The manager is evaluated by a quadratic regression of the portfolio’s rate of return on the market. From simulations, the manager may appear to beat the market; in a base case, the probability is approximately 0.50 that the squared-market coefficient is significantly positive at the 10% level in a one-tail test. With observable portfolio weights, conventional unit-root tests can detect the spurious results. With unobservable weights, cusum-squared tests and recursive parameter estimates may detect spurious results for the squared-market coefficient.

* For helpful comments, thanks are due in particular to Wayne Ferson and Doria Moyun Xu, and also to Boo Sjöö.

1. Introduction

Thousands of mutual funds are available, and information on fund performance is available to the investor from many thousands of sources, including financial advisors, firms that track fund performance, and financial newspapers and magazines. The time and energy spent on choosing mutual funds makes most sense if some funds show superior, risk-adjusted performance, and this superior performance can be predicted out of sample with some degree of success. In practice, predicting superior mutual fund performance often comes down to whether past superior performance persists.

Studies of superior performance are conventionally divided into two groups. One group focuses on managers’ ability at stock selectivity. Here the fund buys and holds stocks until decisions are revised; the goal of stock selectivity is to choose stocks that are likely to experience positive abnormal returns during the holding period. Another group of studies focuses on managers’ ability to time movements in returns on different asset classes, with the aim of moving back and forth across the classes. The aim is to hold larger-than-average portfolio weights in classes likely to have abnormally high returns, and smaller-than-average weights in classes likely to have abnormally low returns. A general approach to evaluating the manager’s performance in moving back and forth across several asset classes is style analysis, following Sharpe (1992).

In a specialized case of movements across asset classes, the manager times movements across stocks and cash. This paper focuses on this special case. A number of approaches are used to evaluate managers’ performance in this case. One is the Treynor-Mazuy (1966) approach: the fund’s rate of return is regressed on the market’s rate of return and on the market return squared; superior performance is associated with a positive slope on the market-squared term.
In interpreting a positive slope on the market-squared term, the assumption is that the manager with superior insight increases (decreases) the beta risk of her portfolio when the expected market excess return is above (below) average. Lehmann and Modest (1987) and Comer (2003) use a multi-factor extension of the Treynor-Mazuy model. The present paper focuses on the Treynor-Mazuy approach. Another approach is Merton and Henriksson’s (1981), where the risky portfolio’s weight is compared with subsequent market returns using contingency tables. Cumby and Modest (1987) extend the Merton-Henriksson approach to use regression; Graham and Harvey (1996) and Grinblatt and Titman (1989, 1994) use related approaches.

Classic papers find no significant market timing across the risky portfolio and cash (Treynor and Mazuy 1966, Henriksson 1984), but this may arise from lack of power in the tests. Later papers, using daily rather than monthly data, find significant market-timing ability for an important number of funds (Chance and Hemler 2001, Bollen and Busse 2001a); the authors suggest monthly data may provide too little power. Relatedly, using simulation, Goetzmann, Ingersoll and Ivkovic (2000) show that if the manager’s ability shows up in daily data, it may be obscured in monthly data. Note that for daily data, Bollen and Busse (2001b) conclude that detected superior timing ability shows significant persistence.

The phenomenon of observable superior timing ability in daily data, but not in monthly data, may be an example of the "interim trading" problem (Ferson and Khang 2002, Chen, Ferson and Peters 2005). Suppose the evaluator examines the rate of return on the managed portfolio at the end of each period of J days (for monthly evaluation, J is approximately 22 trading days). Then the evaluator misses much of the information in the interim trading over the J days. On the one hand, interim trading problems may lead to spurious findings of no superior performance.
On the other hand, researchers have long recognized that, for a number of reasons, a fund may show superior performance that is spurious. Jagannathan and Korajczyk (1986) show that tests of market-timing ability may find spurious superiority for a buy-and-hold portfolio if that portfolio consists of stocks that have a larger option component than the benchmark market portfolio used in the test; in reaction to this possibility, authors have adjusted benchmarks to avoid spurious superiority. Further, it is now common practice to generalize the Treynor-Mazuy regression by also including factors that capture anomalies such as size, earnings-price, book-market, momentum, the dividend yield and interest rates, to avoid giving a fund credit for exploiting these well-known anomalies. The researcher must also take care to address thin-trading problems that may infect the fund’s stocks and the benchmark portfolio. And the researcher may want to account for skewness in fund or benchmark portfolio returns, and for whether co-skewness between the two is priced (Harvey and Siddique 2000).

Use of daily data to avoid interim trading problems may be subject to other problems. In particular, this paper addresses how behavior motivated by the "tournament" nature of compensation for mutual fund managers may lead to performance that spuriously appears to be superior. Funds that perform relatively well tend to attract larger inflows of new money (Sirri and Tufano 1998), and fund advisers' compensation depends in part on the amount of funds under management. Brown, Harlow and Starks (1996) provide statistical evidence that funds that are doing relatively poorly in the midst of a calendar year tend to increase their risk, and hence expected return, relative to others during the remainder of the year in an attempt to improve their relative rankings.

This paper discusses a new, previously unrecognized source of superior performance that is spurious.
This spurious performance is related to the Saint Petersburg or Bernoulli paradox, and is one version of interim trading problems. The spurious performance can be thought of as arising from an ability-less manager trying nevertheless to win in the mutual-fund tournament. Using simulation to investigate the Treynor-Mazuy model run over a trading year on daily data, this paper shows that the expected slope on the market-squared term may be positive even if the manager has no timing ability; it depends on how the ability-less manager alters the weight on the risky portfolio.

(a) Suppose the ability-less manager randomly chooses each day whether to make the weight on the fund’s risky portfolio higher or lower than its average by a given amount, say to make the portfolio’s beta 0.90 or 1.10 around an average of 1.00. Then the expected value of the slope on the market-squared term is zero.

(b) If the ability-less manager makes long-lasting changes in the fund’s weight on its risky portfolio, and these weight changes are inversely proportional to the risky portfolio’s latest excess rate of return, then the estimated value of the market-squared slope is highly likely to be positive. In some cases discussed below, there is a 0.50 probability of rejecting the null of no superior performance at the 10% significance level in a one-tail test; further, if the fund shows superior performance this year, it has a 0.50 probability of showing significant performance next year.

Intuitively, the manager starts out with much of her portfolio exposed to market risk. If there is a run of days on which the excess return on the market is above average, the manager reduces the fraction of her portfolio at risk: she locks in the abnormal returns that she made by chance. If there is a further run of days with exceptional market performance, she reduces her risk exposure even more.
If, however, there is a run of days with below-average market excess returns, she increases the fraction of her portfolio exposed to market risk; this behavior is analogous to the Bernoulli bettor who doubles up his bet if he loses (Brown et al. 2004), and can be thought of as a "portfolio insurance" strategy where the manager hedges a portion of her profits. On the one hand, the ability-less manager is motivated by tournament incentives. On the other hand, if there were no interim trading on the days that constitute the year over which performance is evaluated, there would be no bias.[1]

Brown, Harlow and Starks (1996) argue that tournament incentives induce some of the substantial variation over time that many funds show empirically in the proportions of their portfolios exposed to risk. So far no one has investigated whether these variations arise from the behavior considered in this project, but this is a possibility that performance evaluators should not overlook.

The researcher must condition tests for market-timing ability on the properties of the process that generates the fund’s weight changes. If the time series of these weights is observable, the persistence in the weight is often obvious in a graph of the weight against time, and standard unit-root tests are often adequate for detecting the persistence. If the time series of weights is unobservable, then the researcher can test for parameter instability in the Treynor-Mazuy test equation, in hopes of detecting time variation in the risky portfolio’s weight. An approach with more power is to use a modified Treynor-Mazuy test based on Ferson and Khang (2002), where conditioning variables are used to avoid interim trading biases. Ferson and Khang condition on past portfolio weights, and this approach works for the problem this paper discusses.
Alternatively, the Ferson-Khang approach also works where the conditioning variable is past rates of return on the market; hence observable portfolio weights are not required.

This paper uses simulation, as do Goetzmann, Ingersoll and Ivkovic (2000) and Daniel (2002). Roll (1978), Lehmann and Modest (1987) and Grinblatt and Titman (1994) show that estimated performance is sensitive to the choice of benchmark; by using simulation, this study avoids problems that arise from benchmark misspecification. It might appear that simulation ensures that no covariates need be included as in a “conditional” Treynor-Mazuy analysis (Ferson and Schadt 1996, Ferson and Khang 2002); on the contrary, conditioning on past weights or past market rates of return is necessary to avoid biases in estimates of the Treynor-Mazuy performance measure.

[1] If the daily data show autocorrelation, modifying the Treynor-Mazuy regression to condition on the autocorrelation would eliminate bias.

2. The Model

In the Treynor-Mazuy approach, the manager’s performance is evaluated with the OLS regression

(1) $R_{P,t} = a + b R_{M,t} + c (R_{M,t})^2 + v_t$,

where $R_{P,t}$ is the excess rate of return on the portfolio, $R_{M,t}$ the excess rate of return on the market, and $v_t$ is a residual. Superior performance is inferred from c > 0, negative or perverse performance from c < 0.

Because of the tournament nature of manager compensation (Brown, Harlow and Starks 1996) and even job tenure, the manager has strong incentives to search for techniques to enhance her measured performance. While it is well known that a manager with no skill may spuriously enhance performance, this paper investigates a particular technique that has not been discussed before and uses simulation to show that a manager with no skill may nevertheless have a significantly positive market-squared term in (1).

Suppose the manager’s overall portfolio is made up of her risky portfolio and her risk-free portfolio.
The excess rate of return is zero on her risk-free portfolio. The excess rate of return on her overall portfolio and on her risky portfolio is

(2) $R_{P,t+1} = w_t R_{M,t+1} + u_{t+1}$,

where $w_t$ is the weight the manager chooses at the end of t for her risky portfolio for period t+1.[2] The risky portfolio has a beta of $w_t$ on the excess rate of return on the market, $R_{M,t}$, the only risk factor facing the manager. The risky portfolio’s mean-zero idiosyncratic error $u_t$ has no serial correlation and is uncorrelated with the market’s excess rate of return: $E u_t = 0$, $E u_t R_{M,t+j} = 0$ for all t, j, and $E u_t u_{t+j} = \sigma_u^2$ for $j = 0$ and $0$ for $j \neq 0$.

[2] This specification assumes that when $w_t = 0$, the portfolio is still subject to idiosyncratic risk, $R_{P,t+1} = u_{t+1}$. Another possible specification is $R_{P,t+1} = w_t (R_{M,t+1} + u_{t+1})$, where $w_t = 0$ implies $R_{P,t+1} = 0$. The discussion in Section 5 shows that the estimated parameters in (3) below, and the t-value for the market-squared term, are complicated sets of functionals where the effects of $u_t$ go to zero in probability. In other words, the differences in specification are unimportant. Simulations (Table 8.E below) show that, for T = 250, if the variance of the idiosyncratic term is cut in half, the $t_c$ mean and standard deviation rise to 2.11193 from 2.00339, and to 2.04518 from 1.92122. The ĉ mean and standard deviation are essentially unchanged, as are the b̂ mean and standard deviation.

The manager has no market-timing ability, but simply changes $w_t$ in inverse proportion to the portfolio’s latest excess return. She adjusts $w_t$ as $\Delta w_t = w_t - w_{t-1} = -\lambda (R_{M,t} - E R_M)$, giving

(3) $w_t = w_{t-1} - \lambda (R_{M,t} - E R_M) = w_0 - \lambda \sum_{j=1}^{t} (R_{M,j} - E R_M)$,

where $\lambda \ge 1$ (save for one experiment below), and, for convenience, the expected excess rate of return on the market is taken as time constant, $E R_{M,t} = E R_M$. Thus, $w_t$ is a unit-root process.[4] Note that if the manager sets $\lambda = 0$, then she rebalances every period to set $w_t = w_0$. Alternatively, if the manager follows a buy-and-hold strategy, the weight evolves as $w_t = w_{t-1} (1 + R_{M,t} + r_{f,t}) / (1 + R_{P,t} + r_{f,t})$, where $r_{f,t}$ is the risk-free rate; performance in the buy-and-hold case is discussed below. "You want the measured return used in (1) now to compound the results over J periods, J being a parameter of the simulator."

[4] Because $w_t$ is a unit-root process, the manager’s current portfolio beta depends on how well her portfolio has performed in the past, as measured by $(R_{M,j-1} - E R_M)$. Past performance, say j periods ago, $(R_{M,t-j} - E R_M)$, has an effect larger than or equal to unity, $\lambda \ge 1$, but the weight on performance in period t-j does not change as time passes: good performance leads to permanent reductions in the portfolio’s future betas, bad performance to permanent increases.

More realistically, the manager modifies (3) to set upper and lower bounds on $w_t$, denoted $w_{\max}$ and $w_{\min}$. Funds typically promise they will hold no more than a certain fraction, or less than another fraction, of assets in the risky portfolio; in the absence of bounds, $|w_t|$ may be substantially larger than the fund’s prospectus explicitly or implicitly promises. Thus, the manager is likely to use

(3’) $w_t = \begin{cases} w_{\max} & \text{for } w_0 - \lambda \sum_{j=1}^{t} (R_{M,j} - E R_M) \ge w_{\max}, \\ w_0 - \lambda \sum_{j=1}^{t} (R_{M,j} - E R_M) & \text{for } w_{\max} > w_0 - \lambda \sum_{j=1}^{t} (R_{M,j} - E R_M) > w_{\min}, \\ w_{\min} & \text{for } w_0 - \lambda \sum_{j=1}^{t} (R_{M,j} - E R_M) \le w_{\min}. \end{cases}$

The weight, and hence the portfolio beta, has a unit root within the bounds $w_{\max}$, $w_{\min}$. The bounds are reflecting barriers: if $w_t$ reaches a bound, eventually it is likely to move back within the bounds.[5] (3) is analytically simpler, but (3’) is much more important practically. The data generating process (DGP) is (2) and either (3) or (3’).
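As a minimal sketch of the DGP in (2) and (3’) and of the test regression (1), the Python code below simulates one trading year of daily data and returns ĉ and its t-value. The function name `simulate_tc`, the argument names and the use of NumPy are illustrative assumptions, not the paper’s own code; `lam` stands for the adjustment coefficient in (3), and the default return parameters are the base-case values given in Section 3. The bounds are applied by clipping the cumulative sum in (3), which is one literal reading of (3’).

```python
import numpy as np

def simulate_tc(lam=10.0, w0=1.0, w_min=0.0, w_max=2.0, n_days=250,
                mu=0.00024, sd_m=0.012649, sd_u=0.00421637, rng=None):
    """Simulate one year from (2) and (3'); return (c_hat, t_c) from regression (1).

    Pass w_min=None and w_max=None for the unbounded rule (3).
    """
    rng = np.random.default_rng() if rng is None else rng
    r_m = rng.normal(mu, sd_m, n_days + 1)   # market excess returns, days 0..n_days
    u = rng.normal(0.0, sd_u, n_days + 1)    # idiosyncratic errors

    # (3): w_t = w0 - lam * sum_{j=1..t} (R_M,j - E R_M), with w_0 = w0 on day 0
    dev = r_m - mu
    dev[0] = 0.0
    w = w0 - lam * np.cumsum(dev)
    if w_min is not None or w_max is not None:
        w = np.clip(w, w_min, w_max)         # (3'): hold the weight inside the bounds

    # (2): R_P,t+1 = w_t * R_M,t+1 + u_t+1
    x = r_m[1:]
    r_p = w[:-1] * x + u[1:]

    # (1): OLS of R_P on a constant, R_M and R_M squared
    X = np.column_stack([np.ones(n_days), x, x ** 2])
    beta, *_ = np.linalg.lstsq(X, r_p, rcond=None)
    resid = r_p - X @ beta
    s2 = resid @ resid / (n_days - 3)        # residual variance with 3 regressors
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[2], beta[2] / np.sqrt(cov[2, 2])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    draws = np.array([simulate_tc(rng=rng) for _ in range(400)])
    print("mean t_c:", draws[:, 1].mean())
    print("share of t_c > 0:", (draws[:, 1] > 0).mean())
```

The manager in this sketch uses no information about future market returns, so any tendency for the t-value to be positive comes only from the weight rule. Results vary with the random draws and the number of replications, so the printed figures should be read as rough checks rather than as reproductions of the paper’s tables.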
On the one hand, from (3) and (3’), the manager does not adjust her weight when she thinks the market is likely to be above average, as in the discussion typically behind the Treynor-Mazuy model in (1). The manager has no insight regarding future $R_{M,t}$: her best guess is $E_{t-1} (R_{M,t} - E R_M) = 0$, and the true value of c is zero. On the other hand, the distribution of the estimate of c in (1) is non-standard, because the time-varying weight $w_t$ has a unit root, over its entire range in (3), or over $w_{\max} \ge w_t \ge w_{\min}$ in (3’).

A number of authors propose generalizing the Treynor-Mazuy regression to include other variables that may help explain the rate of return on the risky portfolio, for example, an interest rate, the dividend yield, the earnings-price ratio etc. (see Ferson and Schadt 1996, Ferson and Warther 1996, Chance and Hemler 2001, Daniel 2002). If such covariates are omitted, and if the market-squared term is correlated with these omitted variables, then the market-squared term may show spurious significance.[6] Because such covariates do not enter the DGP for the simulations this paper studies, there is no need to include them in the Treynor-Mazuy regressions; they are irrelevant variables.

[5] If $w_t$ hits the lower bound of 0.00, for example, it will become positive if $[w_0 - \lambda \sum_{j=1}^{t} (R_{M,j} - E R_M)]$ becomes positive. The longer the time horizon, the larger the probability this will occur. Thus, J is an interesting parameter.

[6] Or lack of significance, depending on the signs of the omitted covariates’ effects on the risky portfolio and the covariates’ correlations with the market-squared term.

The ability-less manager's behavior can be thought of as arising from tournament incentives, as discussed by Brown, Harlow and Starks (1996). In one view, the ability-less manager follows a contrarian strategy as opposed to a momentum strategy; in light of an above-average excess return on the market, she reduces her risky holdings.
Note, however, that the data show neither momentum nor reversal (negative momentum); as far as the ability-less manager can see, the excess return on the market contains no structure; after an above-average excess return on the market the manager does not reduce her risky position because she thinks a below-average excess return is likely. In another view, the ability-less manager may be thought of as using portfolio insurance. Note, however, that the manager claims to outperform the market, and the fund’s prospectus does not state that the manager uses portfolio insurance. On the one hand, the manager is following a mechanical strategy that, if it were revealed, an investor desiring portfolio insurance could follow for free (or at very low cost, by hedging with derivatives), and thus the investor would not pay the manager for her results. On the other hand, it is not at all clear that an investor would desire to increase the proportion of his portfolio at risk in the aftermath of below-average excess returns on the market.[7]

[7] The portfolio insurance involved in the manager’s behavior makes sense for an investor who desires to reduce the share of his portfolio at risk as his wealth increases, and increase the share at risk as his wealth falls. Even an investor with this type of risk aversion would not choose an arbitrary adjustment parameter $\lambda$, but instead $\lambda$ would be implied by parameters in his utility function.

3. Simulation Results

Section 5 shows that OLS estimates from the Treynor-Mazuy regression (1) of ĉ and its t-value, $t_c$, have distributions that depend on functionals in complicated ways best studied by simulation. In the simulations below, daily data are assumed, with 251 days per trading year, or 250 days after the initial day. For the simulations, the DGP is (2) and either (3) or (3’). The excess rate of return on the market is $R_{M,t} \sim N(0.00024, 0.012649^2)$, so that the annual excess rate of return has a mean and standard deviation of 6% and 20%. The idiosyncratic term is $u_t \sim N(0.0, 0.00421637^2)$; its variance $\sigma_u^2$ is chosen to be 10 percent of $\sigma_{RM}^2 + \sigma_u^2$; Section 5 presents results for variation in $\sigma_{RM}^2$ and $\sigma_u^2$. For each case, 10,000 replications are used.

The simulations give some very general results. First, if the bounds $w_{\max} = 2.00$, $w_{\min} = 0.00$ are imposed, and a value of $\lambda$ of say 5.0 or larger is used,[8] then $t_c$ is likely to be positive; the probability that $t_c$ is significant at the 10% level or better in a one-tail test is approximately 50%. Second, for $\lambda$ values of say 5.0 or larger, removing the bounds $w_{\max}$, $w_{\min}$ increases the likelihood that $t_c$ is positive; the probability that $t_c$ is significantly positive at the 10% level or better in a one-tail test rises to more than 60%. In the absence of weight bounds, however, the manager is likely on an important number of occasions to choose weights much larger or smaller than the fund’s prospectus promises investors.

[8] It might appear that $\lambda = 5.0$ is large, but relative to the returns parameters it is not. The expected excess rate of return on the market (or on the portfolio with $w_t = 1$) is 0.00024 per day; the expected change in $w_t$ is $\lambda$ x 0.00024, which for $\lambda = 5.0$ is 0.0012, or 0.12 of 1% relative to the initial value $w_0 = 1.00$. The standard deviation in percent per day is 20%/250 = 0.08%; thus, for $\lambda = 5.0$, $\lambda$ x 0.08% relative to $w_0 = 1$ is 0.40%.

3.A: Effects of Bounds on Variations in the Risky Portfolio’s Weight. Unless otherwise noted, all cases start with $w_0 = 1$. Cases 1 and 2 use $\lambda = 10.00$. Case 1 sets the bounds $w_{\max} = 2.00$, $w_{\min} = 0.00$, and Case 2 shows the effects of not imposing bounds.

Case 1. Table 1 shows the average $t_c$ is 1.29;[9] the $t_c$ distribution shows little skewness but highly significant kurtosis.[10] From Table 2, in 49.74% of the runs $t_c$ is significantly positive at the 10% level in a one-tail test, with only 21.47 percent of the $t_c$ negative. Table 2 also shows the true sizes for simulations relative to nominal sizes in one-tail tests of the null that c = 0. The b̂ have a mean of 0.999, quite close to unity and to $w_0 = 1$; the maximum and minimum are 2.018 and -0.013, close to $w_{\max} = 2.00$, $w_{\min} = 0.00$. The b̂ show little skewness but negative kurtosis, which likely comes from the truncation of the distribution by $w_{\max} = 2.00$, $w_{\min} = 0.00$ (see Case 2 results).

[9] The notes to Table 1 contain critical values for Cases 1-4. They are sensitive to weight bounds and the values of $\lambda$.

[10] The text and an appendix discuss in some detail the distributions of the estimated coefficients and their relations to each other. This is because, as Section 5 shows, the estimated coefficients and the $t_c$ are each a complicated set of functionals where the properties of the distributions must be found from simulation. Thus, the details presented for the various cases and the further simulation results in Section 5 characterize the distributions of the estimated parameters and $t_c$.

Measuring Stock-Selectivity and Timing Ability. A common interpretation is that ĉ reflects timing ability, â measures stock-selection ability, and â + ĉ $s_{Rm}^2$ measures total ability, where $s_{Rm}^2$ is the sample variance of the market’s excess rate of return (Aragon 2005, Daniel 2002, Glosten and Jagannathan 1994). "A better version is Aragon (2005) using option values. See Glosten and Jagannathan, Journal of Empirical Finance (1994)." Many researchers find that â and ĉ are negatively correlated and tend to offset each other’s effects on timing ability, often giving total timing ability of approximately zero. From Table 1, the average â > 0, the average ĉ > 0, and hence they do not tend to offset.
Using the means from Table 1 and the variance of $R_{M,t}$ in the DGP, â + ĉ $s_{Rm}^2$ = 3.31008 x 10^-6 + 2.41933 (0.012649^2) = 0.0003904. The square root multiplied by 100 gives 1.9758189%. Because of the DGP for this paper’s simulated portfolio returns, there is no stock selectivity; consistent with this, â is small and has little effect on â + ĉ $s_{Rm}^2$. Note, however, that the correlation between â and ĉ is -0.75999, highly significant (appendix). These results should be compared to the case below where $w_t$ changes randomly. Results are similar in Cases 2-4, save that the mean â is -1.42 x 10^-5 in Case 3.

Case 2. In the absence of weight bounds, the average $t_c$ is 1.84, versus 1.29 in Case 1. From Table 2, in 62.93% of the runs the $t_c$ is significantly positive at the 10% level in a one-tail test, versus 49.74% in Case 1; further, the likelihood of finding a negative $t_c$ decreases, from 21.47% to 15.56%. The $t_c$ distribution is significantly skewed to the left (insignificant right skewness in Case 1), with significant kurtosis, as in Case 1. The mean ĉ is 4.98096, more than twice as large as the 2.41933 in Case 1. The b̂ have a mean of 0.991, close to unity and to $w_0 = 1$; the maximum and minimum are 5.01458 and -3.13936; in contrast, Case 1 has 2.018 and -0.013, close to $w_{\max} = 2.00$, $w_{\min} = 0.00$. The b̂ show little skewness; their negative, insignificant kurtosis of -0.072029 is substantially closer to zero than the significant -1.35557 in Case 1.

Comparison to Random Changes in Weight. For comparison, consider the case where $w_t$ fluctuates randomly, $w_t = w_0 + Z_t$, where $Z_t \sim N(0.0, 0.012649^2)$, with zero mean but the same standard deviation as the return on the market, and $w_{\max} = 2.00$, $w_{\min} = 0.00$.
Then, in the Treynor-Mazuy test regression (1):

              tc            ĉ
Mean          0.0016144     0.0035558
Std Dev       1.00391       1.22519
Skewness      0.024892      0.037214
Kurtosis      0.042075      0.19095
Minimum      -3.72617      -4.68373
Maximum       4.46232       5.01194

$t_c$ is essentially zero, with a standard deviation of approximately unity; skewness and kurtosis are positive but insignificant.[12], [13], [14]

[12] In this case (and in general save Case 4), the maximum value of $t_c$ and ĉ tends to exceed the absolute value of the minimum value because the expected value of the excess rate of return on the market is positive and substantial.

[13] The mean â is -3.22051 x 10^-6 and the mean ĉ is 0.0035558, so that â + ĉ $s_{Rm}^2$ = -3.22051 x 10^-6 + 0.0035558 (0.00016) = -3.22051 x 10^-6 + 0.57 x 10^-6 = -2.65 x 10^-6 < 0. The correlation between â and ĉ is -0.57587.

[14] The mean b̂ is unity, with a standard deviation of 0.021540, significant positive skewness but insignificant kurtosis, and minimum and maximum values of 0.93025 and 1.10267.

Comparison to a Buy-and-Hold Strategy. Alternatively, the manager may follow a buy-and-hold strategy, where the weight evolves as $w_t = w_{t-1} (1 + R_{M,t} + r_{f,t}) / (1 + R_{P,t} + r_{f,t})$, and $r_{f,t}$ is the risk-free rate. With a buy-and-hold strategy, there are no weight bounds. Assuming that $r_{f,t}$ is 2%/year, in the Treynor-Mazuy test regression (1):

              tc            ĉ
Mean          0.0069658     0.0081278
Std Dev       1.02693       1.25677
Skewness      0.024478     -0.0099342
Kurtosis     -0.0061471    -0.035826
Minimum      -4.02336      -4.88929
Maximum       3.67080       4.53971

$t_c$ is essentially zero, with a standard deviation of approximately unity; skewness is positive and kurtosis is negative, but both are insignificant.[15], [16]

3.B: Effects of the Adjustment Coefficient $\lambda$. If the coefficient $\lambda$ is reduced to 5.00, as in Case 3, the distribution of $t_c$ shows a relatively small leftward shift. If $\lambda$ is reduced further to 1.00, as in Case 4, the effect on the $t_c$ distribution is severe. In both cases the weight bounds are $w_{\max} = 2.00$, $w_{\min} = 0.00$, with $w_0 = 1.00$.

Case 3.
The average $t_c$ is 1.28; in 50.27% of the runs, $t_c$ is significantly positive at the 10% level in a one-tail test, close to the 49.74% in Case 1, where $\lambda = 10.00$. 21.26 percent of the t-values are negative, versus 21.47 percent in Case 1. The $t_c$ distribution is skewed left, with positive kurtosis. The b̂ have a mean of 0.99644, with a range from 0.017253 to 1.97032, just within the weight bounds. b̂ has insignificant left skewness with significant negative kurtosis.

Case 4. The average $t_c$ is 0.40, compared to 1.29 in Case 1; in 20.30% of the runs $t_c$ is significantly positive at the 10% level in a one-tail test, substantially less than the (approximate) 50% in Cases 1 and 3, where $\lambda = 10.00$ and 5.00. The b̂ have a mean of 1.00027, close to unity and $w_0 = 1$; the maximum and minimum are 1.43768 and 0.55016, well within $w_{\max} = 2.00$, $w_{\min} = 0.00$. The b̂ show little skewness or kurtosis.

Comparison of Results Across Values of $\lambda$. Table 1 allows a comparison of Cases 1, 3 and 4, with $w_{\max} = 2.00$, $w_{\min} = 0.00$ and $w_0 = 1$. As $\lambda$ goes from 10, to 5, to 1, the $t_c$ empirical distributions approach the N(0, 1). As $\lambda$ goes from 10, to 5, to 1, the ĉ empirical distributions show decreases in the mean and standard deviation; the mean goes to zero as $\lambda$ decreases towards zero. Across $\lambda$, the mean of the b̂ empirical distributions fluctuates around 1.00; the standard deviation decreases as $\lambda$ decreases towards zero.[17]

[15] The mean â is -6.30805 x 10^-6 and the mean ĉ is 0.0081278, so that â + ĉ $s_{Rm}^2$ = -6.30805 x 10^-6 + 0.0081278 (0.00016) = -6.30805 x 10^-6 + 1.3 x 10^-6 = -5.01 x 10^-6 < 0. The correlation between â and ĉ is -0.60139.

3.C: Mean Reversion in the Weight. A more general weight-adjustment rule allows mean reversion in $w_t$ to its long-run value at the adjustment speed $\theta_1 > 0$, with $w_t$ a unit-root process in the limit as $\theta_1$ goes to zero. The rule is

(4) $w_t = \theta_0 \theta_1 + (1 - \theta_1) w_{t-1} - \lambda (R_{M,t-1} - E R_M)$.
The long-run value of $w_t$ is $w^* = (\theta_0 \theta_1) / \theta_1 = \theta_0$; thus, if the long-run weight is $w^* = 1$, then $\theta_0 = 1$, and the intercept in (4) is $\theta_0 \theta_1 = \theta_1$. Similar to above, if bounds of $w_{\max}$, $w_{\min}$ are imposed, then writing $\tilde{w}_t = \theta_0 \theta_1 + (1 - \theta_1) w_{t-1} - \lambda (R_{M,t-1} - E R_M)$ for the right-hand side of (4),

(4’) $w_t = \begin{cases} w_{\max} & \text{for } \tilde{w}_t \ge w_{\max}, \\ \tilde{w}_t & \text{for } w_{\max} > \tilde{w}_t > w_{\min}, \\ w_{\min} & \text{for } \tilde{w}_t \le w_{\min}. \end{cases}$

Note that (4) and (4’) nest (3) and (3’): as $\theta_1 \to 0$, model (4) goes to (3) and (4’) goes to (3’). The simulations below use $w_{\max} = 2.00$, $w_{\min} = 0.00$, $\lambda = 10$ and $w^* = 1$ (comparable to $w_0 = 1$ above).

In Tables 3.A and 3.B, simulations of (4’) show that if the adjustment speed is 0.001, or 0.1%/day, then results are similar to the case where $\theta_1 = 0$ in model (3’). Intuitively, if the weight is near integrated (NI), the results are similar to the results when the weight is I(1). For sufficiently high adjustment speeds, however, the results show that the manager has an edge, but only a slight one.

[16] The mean b̂ is 1.00368, with a standard deviation of 0.044241, positive but insignificant skewness and kurtosis, and minimum and maximum values of 0.83631 and 1.20671.

[17] In cases where $w_{\max} = 1.50$, $w_{\min} = 0.50$, the mean $t_c$ is relatively insensitive to $\lambda$. As $\lambda$ goes from 10.0 to 15.0 to 20.0, the $t_c$ mean (standard deviation) goes from 0.84237 (1.43307) to 0.80034 (1.47881) to 0.81188 (1.51970). For $w_{\max} = 2.50$, $w_{\min} = -1.50$ with $\lambda = 10.0$, the mean $t_c$ is 1.66000 (1.76345). In contrast, with $w_{\max} = 2.00$, $w_{\min} = 0.00$, $\lambda = 10.0$, the mean $t_c$ is 1.28681 (1.65014).

For the adjustment speed $\theta_1$, the gap between any $w_t$ and $w^*$ decreases over N periods by the fraction $[1 - (1 - \theta_1)^N]$. For $\theta_1 = 0.001$ over the course of a year of 250 trading days, the gap decreases by $[1 - (0.999)^{250}] = 1 - 0.7787 = 0.2213$, or by 22.13%. For $\theta_1 = 0.01$, the gap decreases by 91.89%; for $\theta_1 = 0.100$, the gap is virtually zero.
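The gap arithmetic above can be checked with a short Python sketch. The function names `gap_closed` and `next_weight` and their argument names are illustrative assumptions; `theta1` stands for the adjustment speed in (4), `lam` for the adjustment coefficient, and `w_star` for the long-run weight. The second function applies one step of the bounded rule (4’), so that a year with every market return equal to its mean isolates the pure mean-reversion effect.

```python
def gap_closed(theta1, n_days=250):
    """Fraction of the gap between w_t and w* closed after n_days: 1 - (1 - theta1)^N."""
    return 1.0 - (1.0 - theta1) ** n_days

def next_weight(w_prev, r_m_prev, theta1, w_star=1.0, lam=10.0,
                mu=0.00024, w_min=0.0, w_max=2.0):
    """One step of rule (4'), with intercept w_star * theta1 and bounds w_min, w_max."""
    raw = w_star * theta1 + (1.0 - theta1) * w_prev - lam * (r_m_prev - mu)
    return min(max(raw, w_min), w_max)

if __name__ == "__main__":
    for theta1 in (0.001, 0.010, 0.100):
        print(theta1, round(gap_closed(theta1), 4))

    # Start 0.5 above the long-run weight and feed in market returns equal to their mean,
    # so the only force acting on the weight is mean reversion.
    w = 1.5
    for _ in range(250):
        w = next_weight(w, 0.00024, theta1=0.001)
    print(round(1.0 - (w - 1.0) / 0.5, 4))
```

For adjustment speeds of 0.001 and 0.010 the first function gives 0.2213 and 0.9189, matching the 22.13% and 91.89% in the text, and the step-by-step recursion closes the same fraction of the gap.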
Simulations for adjustment speeds of 0.001, 0.010 and 0.100 can be compared to Case 1, where the adjustment speed is 0.00 (see Tables 1, 3.A and 3.B). As the adjustment speed rises from nil to 0.1%/day to 1.0%/day to 10.0%/day, the mean tc falls from 1.287 to 1.194 to 0.850 to 0.172. The gap between the maximum and the minimum values of tc is roughly 11.5 to 12.5 across the adjustment speeds. Increases in the adjustment speed can be thought of as shifting the tc distribution to the left. A comparison of the distributions of the â, b̂, ĉ is in Table 3.B. For b̂, increases in the adjustment speed leave the mean virtually unchanged, but reduce the spread of the distribution. For ĉ, increases in the adjustment speed reduce the mean and also reduce the spread of the distribution. For â, increases in the adjustment speed have no clear effect on the mean, but reduce the spread of the distribution.

4. Implications for Evaluating Performance

Subsections 4.A-4.C illustrate difficulties in evaluating performance when fund managers use weight-adjustments such as (3), (3’), (4) or (4’). Section 5 discusses statistical methods of detecting the ability-less manager's strategy.

4.A: Evaluating a Fund’s Performance over Time. Consider a single manager who follows the strategy in (3’), sets the weight bounds at max = 2.00 and min = 0.00, and sets the proportionality coefficient at 10.00 (Case 1). In any single year, from Table 2 the probability that her ĉ is significant at the 10% level is 0.4974, or approximately 0.50. Given that she has ĉ > 0 significant this year at the 10 percent level, the conditional probability of ĉ > 0 significant next year is again 0.50. The unconditional probability that she will have ĉ significant at the 10 percent level in two successive years is thus 0.25 = 0.5 x 0.5, and a run of three years has a probability of 0.125; if the test had its nominal size, one would instead expect probabilities of 0.10, 0.01 and 0.001. The probability that tc > 0 over one, two and three years in a row is 0.7853, 0.6167 and 0.4843, as opposed to the 0.5, 0.25 and 0.125 expected if tc were distributed symmetrically around zero.
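The run probabilities in subsection 4.A are products of single-year probabilities, which treats successive years as independent draws. A minimal sketch of that arithmetic follows; the function name `run_probability` and the dictionary labels are illustrative choices, not the paper's.

```python
def run_probability(p_single_year, n_years):
    """Probability of the same outcome in n_years successive years,
    assuming the years are independent with equal probability."""
    return p_single_year ** n_years


# Single-year probabilities quoted in the text (Table 2, Case 1) and
# the benchmarks they are compared with.
single_year = {
    "c-hat significant at 10% (Case 1, rounded)": 0.50,
    "nominal 10% size": 0.10,
    "t_c > 0 (Case 1)": 0.7853,
    "t_c > 0 if symmetric around zero": 0.50,
}

if __name__ == "__main__":
    for label, p in single_year.items():
        runs = [round(run_probability(p, k), 4) for k in (1, 2, 3)]
        print(f"{label}: {runs}")
```

Using the unrounded 0.4974 instead of 0.50 changes the two- and three-year figures only slightly, so the rounding in the text is harmless.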
4.B: Evaluating a Cross Section of Funds in a Given Year. Consider any given year. A manager who sets max = 2.00, min = 0.00,  = 10.00 (Case 1) is likely to do well: from Table 2, the probability that tc > 0 is 0.7853, and the probability is 0.497 that tc is significantly positive at the 10 percent level. But she has a significant chance of doing poorly; the probability of tc < 0 is 0.2147. Consider a family of mutual funds, where one fund sets  = 10.00 and another fund sets  = -10.00. The fund with  = 10.00 is likely to do well in many but not all years. In the years when this fund does poorly, the fund with  = -10.00 is likely to do well. The converse also holds: When the fund with  = 10.00 does well, the fund with  = -10.00 is likely to do poorly. The family of funds may argue that it provides the investor with nice diversification: In good years, the investor will do very well with the part of his portfolio in the fund with  = 10.00, and in the years when this part of the portfolio does poorly, the part of his portfolio in the fund with  = -10.00 is likely to take up the slack and do well. The fact that the fund with  = -10.00 does poorly in many years is just the cost the investor must pay for the insurance or diversification that the fund provides. 4.C: Effects of  : Cross-Section, Time-Series Behavior of Funds’ Performance. If a number of funds use the approach in (2) and (3) or (3’), an evaluator may find it difficult to detect the similarity. Consider two funds with the same risky portfolio and same idiosyncratic error, with max = 2.00, min = 0.00. One sets  = 10.00, the other  = 5.00, as in Cases 1 and 3. From Table 2, in a given year each has approximately a 0.50 probability that ĉ > 0 and significant at the 10% 16 level, and a 0.41 chance that ĉ > 0 significant at the 5% level.18 Generally, the two funds show substantial but not perfect correlation in their â , ĉ , b̂ , tc and paths of wt. 
Table 4 shows a regression of the funds’ t-values, where tc,i,=10 and tc,i,=5 are the t-values for the ith replication with  = 10.0,  = 5.0, and in each replication the same data are used for  = 10.00,  = 5.00. The R2 is 0.82612, and hence the correlation is 0.90891. Making the funds less alike reduces the correlations, for example, if each fund’s risky portfolio is the same, but their idiosyncratic errors are uncorrelated. Similarly, one fund may use max = 2.00, min = 0.00, but the other max, = 2.10, min = 0.10, or max, = 1.90, min = -0.10, etc. Suppose that 20 funds in a 100-fund panel follow (3’), choose values of , max, min similar to the ranges above, but with some heterogeneity in these parameters, the risky portfolios and idiosyncratic risks. Each year, the researcher is likely to find that very roughly 20 funds do well, but with fluctuations across years. He is likely to find that funds that do well in a given year have related, but hardly the same, values of ĉ . Over a two-year period, some funds do well in both years, but some that do well in the first year do not in the second. The researcher is likely to conclude that the subset of 20 funds does not as a whole follow the behavior in (3’). 5. Adjusting the Treynor-Mazuy Regression for the Manager's Strategy This section shows that, by using Ferson and Khang's (F-K 2002) modification of the Treynor-Mazuy model, the evaluator can control for the information-less manager's attempts to game evaluation. In simulations of the F-K modification of the T-M model, the (RM,t)2 variable is centered on zero and its t-value is approximately N(0, 1). F-K modify the T-M test regression to allow for conditioning on publicly available information and also for special insights the manager may have. For present purposes, the manager's special insight is ignored: by assumption the 18 For smaller significance levels, the probabilities diverge, with larger probabilities for the  = 10.00 fund. 
17 manager considered has none (Ferson and Schadt 1996 discuss cases where only conditioning is considered). Suppose that the publicly available information is Zt. In this case, the F-K test regression is (1') RP,t = a + b1 RM,t + b2 (RM,t Zt-1) + c (RM,t)2 + ut, where ut is a residual and Zt-1 is publicly available information at the end of period t-1 (start of period t).19 For present purposes, use the publicly available past daily rates of return on the market as the conditioning information. Form the variable Z*t-1 = t-1j=1 (RM,j - ERM,j) Recall that wt-1 = 1 -  t-1j=1 (RM,j - ERM,j); thus, wt-1 = 1 -  Z*t-1 or Z*t-1 = (1 - wt-1) / . For the F-K test regression (1), Tables 5 and 6 show simulation results,20 Table 5 for the case where there are no upper and lower bounds on wt, Table 6 for the case where the upper and lower bonds are 2.0 and 1.0. In both Tables 5 and 6, the mean value of ĉ is close to zero (0.0026700, -0.0048807). Similarly, the t-values are center on zero and are approximately N(0, 1): The mean t-values tc and their standard deviations are (-0.0019133, 1.00771) and (-0.0045449, 0.99392); in both cases skewness and excess kurtosis of tc are close to zero, (0.0095965, 0.024926) and (-0.016973, -0.021319). The test regression in (1) does not produce biased estimates of the intercept a, and thus does not produce spurious estimates of selectivity. In Tables 5 and 6, the mean â is very small and positive (0.33352x10-6, 1.87593x10-6), and both are small relative to their standard deviations (0.13591x10-2, 0.32703x10-3). The correlation between â and ĉ is -0.56766 for Table 5 and 0.56875 for Table 6.21 19 Zt-1 may be a vector of variables, with b2 a conformable vector of coefficients. In the DGP used for Tables 5 and 6, the manager makes no attempt to adjust wt back to the initial w0; mean reversion is discussed below. 21 In a regression of the â on ĉ , the slopes (t-values) are -0.148405x10-3 (-68.9458) and -0.150900x10-3 (-69.1407). 
Across the two tables, the LM heteroscedasticity test, the Jarque-Bera test and Ramsey's RESET2 test raise one red flag: for the regression for Table X, the Jarque-Bera test statistic is significant at the 0.009 level. 20 18 Conditioning When the Weight Shows Mean Reversion. Section 3 discussed and showed results for the case where the weight wt reverts at the speed 1 to the long-run weight w0 in the absence of further surprises to the market rate of return. In this mean-reversion case, including the conditioning variable Z*t-1 = t-1j=1 (RM,j - ERM,j) in the Treynor-Mazuy test regression leads to downward bias from zero in ĉ , as the following shows for the adjustment speed 1 = 0.100: tc ĉ Mean -0.13566 -0.18228 Std Dev 1.21900 1.59907 Skewness -0.094529 -0.10472 Kurtosis 0.041400 0.081435 Minimum -4.82933 -7.58638 Maximum 4.19592 5.84840 The bias in ĉ arises because the conditioning variable Z*t-1 = t-1j=1 (RM,j - ERM,j) is misspecified. The appropriate conditioning variable is Z**t-1 = t-1j=1 (1 - 1)j-1 (RM,j - ERM,j), as the following shows for the adjustment speed 1 = 0.100: tc ĉ Mean 0.00059898 0.0016860 Std Dev 1.01065 1.24554 Skewness -0.063544 -0.062425 Kurtosis 0.024090 0.11412 Minimum -3.89421 -4.95423 Maximum 3.51783 4.80219 This suggests that when conditioning on Z*t changes a positive value of ĉ to a negative value, the researcher may search across values of (1 - 1) to find the value that sets ĉ = 0. 6. Distribution of Coefficient Estimates and t-values The distributions of the â , b̂ , ĉ and tc are non-standard and difficult to characterize analytically even when weight bounds max, min are not imposed. An appendix (available from the author) sketches the result that asymptotically the â , b̂ , ĉ and tc are complicated functionals with non-standard distributions; this arises because the weight wt contains a unit root. The characteristics of the empirical distributions must be found from simulations, as above. 
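Since the distributions must be found by simulation, a compact Monte Carlo sketch of the unbounded case may help fix ideas. It is not the author's code: the parameter values are taken from the notes to Tables 5 and 6 (market return N(0.00024, 0.012649^2), idiosyncratic error with standard deviation 0.00421637, w0 = 1, T = 250), and it is an assumption that the other tables use the same values. The names `coef`, `simulate` and `tm_regression` are hypothetical, and the replication count is kept small for speed.

```python
import numpy as np


def tm_regression(rp, rm):
    """OLS of rp on [1, rm, rm**2] (the Treynor-Mazuy test regression).

    Returns the coefficient vector (a, b, c) and the conventional
    t-value of c, the coefficient on the squared market return.
    """
    x = np.column_stack([np.ones_like(rm), rm, rm ** 2])
    beta, *_ = np.linalg.lstsq(x, rp, rcond=None)
    resid = rp - x @ beta
    sigma2 = resid @ resid / (len(rp) - x.shape[1])
    cov = sigma2 * np.linalg.inv(x.T @ x)
    return beta, beta[2] / np.sqrt(cov[2, 2])


def simulate(n_reps=400, t_obs=250, coef=10.0, w0=1.0, mu=0.00024,
             sd_rm=0.012649, sd_u=0.00421637, seed=0):
    """Simulate the no-ability manager with a unit-root weight (no bounds)."""
    rng = np.random.default_rng(seed)
    c_hat = np.empty(n_reps)
    t_c = np.empty(n_reps)
    for i in range(n_reps):
        rm = rng.normal(mu, sd_rm, t_obs)
        u = rng.normal(0.0, sd_u, t_obs)
        # Cumulated past market surprises, known at the start of each period.
        z_prev = np.concatenate(([0.0], np.cumsum(rm - mu)[:-1]))
        w_prev = w0 - coef * z_prev       # weight set before period t
        rp = w_prev * rm + u              # portfolio excess return
        beta, t_val = tm_regression(rp, rm)
        c_hat[i], t_c[i] = beta[2], t_val
    return c_hat, t_c


if __name__ == "__main__":
    c_hat, t_c = simulate()
    print("mean c-hat:", c_hat.mean())
    print("mean t_c:", t_c.mean())
    print("share of t_c above 1.281:", (t_c > 1.281).mean())
```

A run of this sketch can be compared with the T = 250 row of Table 7.A, which reports a mean ĉ near 5 and a mean tc near 1.8; with only 400 replications the agreement will be rough.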
These results are supplemented by further simulations discussed below. Empirical Distributions of the Estimates: Further Characterization. Consider how increasing the sample period, from 125 to 750, affects empirical distributions in Table 7.A, where, 19 max, min are not imposed, w0 = 1.00 and  = 10. First, the mean of b̂ is very close to unity, across the T, or is very close to w0 = 1; the standard deviation of the b̂ rises, however, as T increases. Second, the mean of ĉ is very close to 5.0 across the T, and the standard deviation of the ĉ appears to be independent of T. Third, the mean and standard deviation of the tc rise as T increases. Effects of Changes in Parameters. Consider changes in , w0, 2Rmt, 2ut and E RM. In Table 7.B w0 changes from 1.00 to 0.00, all else constant. The mean of b̂ changes to -0.0063421 from 0.98530, but the standard deviation changes little; essentially b̂ is centered at w0 and is otherwise invariant to w0. The value of w0 has no systematic effects on the distributions of ĉ and tc. In Table 7.C,  changes from 10 to -10. Essentially, the means of ĉ and tc change signs, but otherwise their distributions are unchanged.  has little effect on b̂ ’s distribution. In Table 7.D, the standard deviations of both errors are cut in half. The tc mean falls to 1.65744 from 2.00339, the standard deviation falls to 1.74683 from 1.92122. The ĉ ’s mean and standard deviation are essentially unaffected. b̂ ’s mean is unaffected, but its standard deviation is approximately cut in half. In Table 7.E, the standard deviation of the error term is cut in half. The tc mean and standard deviation rise to 2.11193 from 2.00339, and to 2.04518 from 1.92122. The ĉ mean and standard deviation are essentially unchanged, as are the b̂ mean and standard deviation. Finally, in Table 7.F the mean rate of return on the market is cut in half. The tc mean and standard deviation are essentially unchanged, as are those for the b̂ and ĉ . 6. 
Conclusions

Investors spend a good deal of time and trouble trying to discover mutual funds that show superior, risk-adjusted performance, particularly superior performance that persists. One set of studies of superior performance focuses on market timing, or the fund’s ability to predict relative rates of return on major classes of assets, often stocks versus cash. Classic papers on mutual-fund market timing (Treynor and Mazuy 1966, Henriksson 1984) do not find significant market-timing ability. Later papers, using daily rather than monthly data, find evidence of significant market-timing ability for an important number of firms in their studies (Chance and Hemler 1999, Bollen and Busse 2001a); the authors suggest that monthly data may provide too little power. Bollen and Busse (2001b) examine whether market-timing ability persists and conclude that superior ability shows significant persistence.

This paper studies the market-timing results that occur under the null that the fund has no superior ability. The Treynor and Mazuy (1966) and Henriksson (1984) tests rely on the assumption that their test statistic has an expected value of zero under the null of no ability; for example, Treynor and Mazuy regress the fund’s rate of return on the market’s rate of return and the square of the market’s rate of return, and test whether the slope coefficient on the market-squared term is different from zero. This paper shows that the expected slope on the market-squared term need not be zero under the null of no ability; it all depends on how the fund manager with no ability alters the weight on the risky portfolio. In particular, suppose the fund manager with no ability makes long-lasting changes in the fund’s weight on its risky portfolio, and these weight changes are inversely proportional to the risky portfolio’s latest excess rate of return.
Then, the expected value of the slope on the market-squared term is positive, and the size in standard t-tests is much larger than the nominal size. For example, in some cases discussed above, there is a 0.50 probability of rejecting the null at the 10% significance level in a one-tail test.

Intuitively, the manager follows a type of portfolio insurance or a contrarian strategy. She starts with much of her portfolio exposed to market risk. If there is a run of days on which the excess return on the market is above average, she reduces the fraction of her portfolio at risk: she locks in the abnormal returns that she made by chance. If there is a further run of days with exceptional market performance, she reduces her risk exposure even more, locking in more above-average performance. If, however, there is a run of days with poorer than average market excess returns, she increases the fraction of her portfolio exposed to market risk, similar to the Bernoulli bettor who doubles up his bet if he loses.

The researcher testing for market-timing ability must condition the test on the properties of the process generating the fund’s weight changes. If the weight time series is observable, the persistence of changes in the weight is often obvious in a graph of the weight against time, and standard unit-root tests often detect the persistence. If the time series of weights is unobservable, then the researcher can test for parameter instability in the test equation, for example, the Treynor-Mazuy test equation, in hopes of detecting the time variation of the weight on the risky portfolio.

It has long been recognized that market-timing tests may find spurious superiority. Jagannathan and Korajczyk (1986) show spurious superiority may arise for a buy-and-hold portfolio if that portfolio consists of stocks with a larger option component than the test’s benchmark market portfolio.
Further, it is now common practice to generalize the Treynor-Mazuy regression by also including factors that capture anomalies such as size, earnings-price, book-market, momentum, the dividend yield and interest rates, to avoid giving a fund credit for exploiting these well-known anomalies; similarly, the researcher must take care to adjust for thin trading and skewness. The spurious superiority this paper discusses is yet another type for which the researcher must adjust when evaluating market-timing ability.

Table 1. Distributions of t-values for Alternative Simulations

Case 1. max = 2.00, min = 0.00, and proportionality coefficient = 10.00.
            tc            a              b             c
Mean        1.28681       3.31008D-06    0.99928       2.41933
Std Dev     1.65014       0.00054312     0.61080       3.35767
Skewness    0.022696      -0.030868      0.0059841     -0.18119**
Kurtosis    0.13183**     0.57610**      -1.35557**    0.76334**
Minimum     -4.76387      -0.0025459     -0.013030     -15.28305
Maximum     7.73448       0.0022637      2.01838       16.42690

Case 2. max, min not imposed; proportionality coefficient = 10.00.
            tc            a              b             c
Mean        1.84454       2.41351D-06    0.99110       4.98096
Std Dev     1.84587       0.00083726     1.16039       5.77890
Skewness    -0.095040**   -0.026803      -0.021487     -0.0053432
Kurtosis    0.20646**     1.65670**      -0.072029     2.58344**
Minimum     -6.43022      -0.0051384     -3.13936      -31.80902
Maximum     9.01090       0.0047966      5.01458       45.53257

Case 3. max = 2.00, min = 0.00, w0 = 1.00 and proportionality coefficient = 5.00.
            tc            a              b             c
Mean        1.27974       -1.42E-05      0.99644       2.03787
Std Dev     1.47637       0.00045149     0.47277       2.47125
Skewness    -0.088788**   -0.04150*      0.010136      -0.26540**
Kurtosis    0.15217**     0.27588**      -0.96459**    0.67296**
Minimum     -5.17598      -0.002146      0.017253      -10.24065
Maximum     6.90988       0.001835       1.97032       13.10835

Case 4. max = 2.00, min = 0.00, w0 = 1.00 and proportionality coefficient = 1.00.
            tc            a              b             c
Mean        0.39973       0.63578D-06    1.00027       0.49382
Std Dev     1.06979       0.00033603     0.11852       1.34263
Skewness    -0.0055157    -0.024536      0.012371      -0.016438
Kurtosis    0.062620      -0.054174      0.028159      0.24172**
Minimum     -4.91819      -0.0012661     0.55016       -6.69911
Maximum     4.73011       0.0011688      1.43768       6.25706

Notes to Table 1. The excess rate of return is zero on the risk-free portfolio. The risky portfolio has a beta of unity on the excess rate of return on the market, RM,t.
The risky portfolio’s idiosyncratic error ut: E ut = 0, E ut RM,t+j for all t,j, and E ut ut+j = (2u, 0) for j (= 0,  0). The excess rate of return on the risky portfolio R R,t is (1) RP,t+1 = wt RM,t+1 + ut+1, where wt is the weight on the risky portfolio at the start of period t. The wt changes in inverse proportion to the portfolio’s latest excess rate of return, wt = wt - wt-1 = -  (RM,t-1 - E RM), giving (2) wt = wt-1 -  (RM,t-1 - E RM) = w0 -  tj=1 (RM,j-1 - E RM), where 0 <  >/< 1. This basic weight-adjustment rule may impose bounds on wt, for example,  max, min, = [w0 -  tj=1 (RM,j-1 - E RM)] for max > [w0 -  tj=1 (RM,j-1 - E RM)] > min (2’) wt = max for [w0 -  tj=1 (RM,j-1 - E RM)]  max = min for [w0 -  tj=1 (RM,j-1 - E RM)]  min The data generating process (DGP) is (1), (2) or (1), (2’). Performance is evaluated by the OLS regression (3) RP,t = a + b RM,t + c (RM,t)2 + vt, where vt is a residual. Superior performance is inferred from c > 0, negative or perverse performance from c < 0. For a normal distribution, the skewness measure and the (excess) kurtosis measure both have a mean of zero, and the 5% critical values are 0.04801 and 0.096022 (with 10% critical values of 0.040417 and 0.080835). 0.5% 1.0 2.5 5.0 10 Critical Values from Simulations, One-Tail Test Case 1 Case 2 Case 3 Case 4 5.640368 6.614648 5.006590 3.174528 5.234447 6.089043 4.671458 2.861250 4.571965 5.373463 4.128554 2.511393 4.008997 4.845478 3.683410 2.181221 3.393424 4.156200 3.176023 1.776886 23 N(0, 1) 2.575 2.327 1.960 1.645 1.281 Table 2. Size vs. Nominal Size, One-Tail Test a Nominal Size/Size .10 .05 .025 .01 .005 %<0 Case .4974 .6293 .5027 .2030 .6546 .4095 .5495 .4071 .1235 .5808 .3370 .4801 .3195 .0755 .5170 .2855 .3733 .1887 .0211 .4365 .2163 .3501 .1080 .0056 .3855 21.47 15.56 21.26 38.05 14.86 a 1 2 3 4 T = 500 See Table 1 for definitions of the Cases. The case of “T = 500” is in Table 5.A. Table 3.A. 
Effects on tc of Speed of Mean Reversion Per Day a ( max = 2.00,  min = 0.00;  = 10.00; T = 250; w*= 1) Speed 0.000 0.001 0.010 0.100 Mean 1.28681 1.19446 0.84955 0.17157 Std Dev 1.65014 1.50843 1.45632 1.23238 Skewness 0.022696 -0.13725** -0.11347** -0.071538 Kurtosis 0.13183** 0.21318** 0.14660** 0.18058** Minimum -4.76387 -4.92524 -5.15372 -6.54233 Maximum 7.73448 7.62329 6.45519 5.11234 Table 3.B. Effects on Estimated Parameter Values of Speed of Mean Reversion Per Day a ( max = 2.00,  min = 0.00;  = 10.00; T = 250; w*= 1) Speed 0.000 a b c Mean 3.31008D-06 0.99928 2.41933 Std Dev 0.00054312 0.61080 3.35767 Skewness -0.030868 0.0059841 -0.18119** Kurtosis 0.57610** -1.35557** 0.76334** Minimum Maximum -0.0025459 0.0022637 -0.013030 2.01838 -15.28305 16.42690 0.001 a b c 5.43056D-06 0.00044892 1.00446 0.44930 1.86973 2.52215 -0.014435 -0.0053798 -0.36698** 0.25460** -0.0023699 0.0019048 -0.86549** -0.00048421 1.97779 0.79280** -10.15006 13.06612 0.010 a b c 4.80644D-06 0.00042504 0.99859 0.26928 1.23097 2.29967 -0.062477** 0.12423** -0.0019480 0.0016125 -0.028983 -0.17015** 0.072844 1.90248 -0.39669** 0.67856** -9.81583 9.60677 0.100 a b c -1.74960D-06 0.00035358 0.99979 0.046362 0.21281 1.60208 -0.029180 -0.0069935 -0.10743** 0.036446 -0.0014021 0.0013055 0.026703 0.84126 1.19202 0.12375** -6.82606 5.50546 Notes to Tables 3.A. and 3.B. These simulations are comparable to Case 1. The long-run weight on the risky portfolio is w* = 1. The maximum and minimum weights are  max and min. The excess rate of return on the risky portfolio RP,t is (1) RP,t+1 = wt RM,t+1 + ut+1, The weight wt is a mean-reverting process, and wt tends to its long-run value at the adjustment speed 1 > 0. Then, (4) wt = (0 1) + (1 - 1) wt-1 -  (RM,t-1 - E RM). The long-run value of wt is w* = (0 1) / 1 = 0; thus, if the long-run weight is w* = 1, then 0 = 1, and the intercept in (4) is (0 1) = 1. 
Similar to above, bounds of max, min may be imposed to give the model = max for  max  [(0 1) + (1 - 1) wt-1 -  (RM,t-1 - E RM)] (4’) wt = (0 1) + (1 - 1) wt-1 -  (RM,t-1 - E RM) for  max > [(0 1) + (1 - 1) wt-1 -  (RM,t-1 - E RM)] > min = min for  min  [(0 1) + (1 - 1) wt-1 -  (RM,t-1 - E RM)] 24 Table 4. Cross-Section Time-Series Experiment with  tc,i,=10 = c0 + c1 tc,i,=5 + ei Variable Coefficient Std. Error t-statistic P-value c0 c1 0.042846 1.00202 0.895239E-02 0.459746E-02 4.78600 217.950 [.000] [.000] Notes: Two funds have the same risky portfolio with the same idiosyncratic error. Both set max = 2.00 and min = 0.00; one sets  = 10.00, the other  = 5.00: thus, the first fund mimics Case 1, the second Case 3. In each of 10,000 replications (i=1,…, 10,000) the same data are used for  = 10.00 and for  = 5.00. The correlation between the values of their tc in the quadratic regressions that test for performance is 0.90891. 25 Table 5. Ferson-Khang Modified Treynor-Mazuy Regression Results: No upper or lower bounds. 1 Rp,t = a + b1 RM,t + b2 [RM,t Z*t-1] + c (RM,t)2 + ut a b1 b2 c Mean .33352D-06 0.99998 1.00042 0.0026700 Std Dev .00032731 0.074047 0.072304 1.25199 Minimum -0.0013591 0.69593 -1.42588 -5.29629 Maximum 0.0012945 1.30710 -0.51490 5.96179 Skewness -0.021930 0.037009 0.0043412 -0.00049195 Kurtosis 0.067087 0.12667 2.09700 0.17958 Correlation Matrix a 1.00000 0.0088125 -0.012917 -0.56766 a b1 b2 c b1 b2 c 1.0000 -0.0070292 -0.016333 1.0000 0.14681 1.00000 Univariate statistics c tb2 1 Mean -0.0019133 -17.71386 Std Dev 1.00771 7.55050 Minimum -3.92133 -63.85422 Maximum 3.61945 -3.01581 Skewness 0.0095965 -1.21853 Kurtosis 0.024926 2.20190 The Ferson-Khang test regression is Rp,t = a + b1 RM,t + b2 [RM,t Z*t-1] + c (RM,t)2 + ut, t = 1, 250. 
The data generating process is RM,t  N(0.00024 , 0.0126492), w0 = 1,  = 5.0, t  N(0.0, 0.004216372); in RM,t + t, RM,t contributes 90 % of the variance Z*0 = 0.0000 w1,1 = w0 -  (RM,1 - 0.00024); w1,t = w1,t-1 -  (RM,t - 0.00024); Z*t = Z*t-1 + (RM,t - 0.00024); Rp,t = w1,t-1 RM,t + t. 26 Table 6. Ferson-Khang Modified Treynor-Mazuy Regression Results: Upper and lower bounds 2.00 and 0.00. 1 Rp,t = a + b1 RM,t + b2 [RM,t Z*t-1] + c (RM,t)2 + ut a b1 b2 c Mean 1.87593D-06 0.99936 -0.99923 -0.0048807 Std Dev 0.00032703 0.073438 0.070714 1.23257 Minimum -0.0014869 0.68537 -1.40863 -4.82050 Maximum 0.0011915 1.28859 -0.66231 4.71443 Skewness Kurtosis -0.0096071 -0.062566 0.013934 0.065946 0.0053747 1.70749 -0.0049155 0.11685 Correlation Matrix a 1.0000 -0.0029515 0.012940 -0.56875 a b1 b2 c b1 b2 c 1.00000 0.023010 -0.0029576 1.0000 0.13542 1.00000 Univariate statistics tc tb2 1 Mean -0.0045449 -17.57704 Std Dev 0.99392 7.44291 Minimum -3.67930 -62.30221 Maximum 3.66282 -4.08806 Skewness -0.016973 -1.18674 Kurtosis -0.021319 1.80547 The Ferson-Khang test regression is Rp,t = a + b1 RM,t + b2 [RM,t Z*t-1] + c (RM,t)2 + ut, t = 1, 250. The data generating process is RM,t  N(0.00024 , 0.0126492), w0 = 1,  = 5.0, t  N(0.0, 0.004216372); in RM,t + t, RM,t contributes 90% of the variance) up = 2.00, Z*0 = 0.0000, w1,1 = w0 -  (RM,1 - 0.00024); w1,t = w1,t-1 -  (RM,t - 0.00024); Z*t = Z*t-1 + (RM,t - 0.00024); D = 1 for w1,t < up and w1,t > low; D = 0 otherwise D1 = 1 for w1,t < low; D1 = 0 otherwise D2 = 1 for w1,t > up; D2 = 0 otherwise wt = D w1,t + D1 low + D2 up Rp,t = wt-1 RM,t + t. 27 low = 0.00 Table 7. A. 
Increases in T: Effects on Parameter Estimates and t-values a T 750 b c Mean 1.00208 4.97189 Std Dev 2.00370 5.75132 Skewness Kurtosis -0.00436530 0.017556 -0.00063187 2.73920** 500 b c 0.98530 5.02874 1.63275 5.65357 -0.0051030 0.041299 0.13025** 2.75065** 250 b c 0.99110 4.98096 1.16039 5.77890 -0.021487 -0.0053432 -0.072029 2.58344** 125 b c 0.99748 4.98402 0.83597 5.81352 -0.0102660 0.059600 -0.01711300 2.32064** T 750 500 250 125 tc Mean 2.03066 2.00339 1.84454 1.63481 Std Dev 1.98851 1.92122 1.84587 1.68027 Skewness Kurtosis -0.073599** 0.070424** -0.072320** 0.048996 -0.095040** 0.20646** -0.058848** 0.32544** Table 7. B. Effects of w0 = 0.00 b 500 a b c tc Mean -3.33236E-06 -0.0063421 5.03948 1.99033 Std Dev Skewness 0.00082905 0.096422 1.66439 0.0068818 5.84538 -0.10406 1.94735 -0.12633 a Kurtosis 2.12736 -0.029558 3.09065 0.22587 Notes for Table 7. The simulation results are designed to complete the characterization of the parameter estimates and tc begun in preceding tables. In all cases, max, min are not imposed. b The base-case is w0 = 1. The simulations here are for w0 = 0.00. 28 Table 7. C. Effects of  = -10.00 c 500 a b c tc Mean 2.10199E-06 1.02184 -5.01169 -1.99248 Std Dev 0.00082041 1.64334 5.82084 1.93682 Skewness -0.071728 -0.010790 0.095436 0.14857 Kurtosis 2.11383 -0.015547 2.89610 0.14187 Table 7. D. Standard Deviations Cut in Half d a b c tc Mean 8.54990D-07 1.00127 4.96676 1.65744 Std Dev 0.00022860 0.81695 6.05468 1.74683 Skewness -0.0027387 0.062091 -0.068413 -0.18403** Kurtosis 1.79665** 0.11748** 2.76009** 0.27025** Table 7. E. Standard Deviation of Idiosyncratic Risk Is Cut in Half e a b c tc Mean 4.80079E-06 1.02992 5.00364 2.11193 Std Dev 0.00079290 1.65494 5.74001 2.04518 Skewness -0.060433 0.039930 0.045356 -0.027181 Kurtosis 2.21465** 0.086343 2.95568** 0.15630** Table 7. F. 
Mean of the Rate of Return on the Market Is Cut in Half f a b c tc Mean -8.44441D-06 1.03061 5.07078 2.00268 Std Dev 0.00081381 1.64822 5.81818 1.94167 Skewness -0.10092 -0.043592 0.086718 -0.073600 Kurtosis 2.08108 0.017950 2.91858 0.13878 The base-case is  = 10.00. In these simulations,  = - 10.00. In these simulations, the variances of both the rate of return on the market and the idiosyncratic error term are cut in half relative to the base-case. e In these simulations, the variance of the idiosyncratic error term is cut in half relative to the base-case. f In these simulations, the expected rate of return on the market is cut in half relative to the base-case. c d 29 Appendix: Details of Simulation Results In Case 1, the ĉ have a positive mean, are highly leptokurtic and are left skewed. The â have a positive mean and are highly leptokurtic. The correlation between â and ĉ is 0.75999, highly significant. In a regression of â on ĉ , â i = 0 + 1 ĉ i + ei where ei is a residual, the estimated slope is negative, small but highly significant; the slope and t-value are -.122931E-03 and -116.920. The estimate of 0 and its t-value t0 are .300720E-03 and 69.1134, or the intercept has a small but strong positive bias. Note that the regression’s residuals are non-normal and show heteroskedasticity. Comparable regressions in Cases 2 and 3 also have non-normal and heteroskedastic residuals, but this is not true for Case 4, with  = 1.00, The b̂ show little correlation with the other coefficients: â and b̂ have a correlation of 0.013417, b̂ and ĉ of -0.0098278, both insignificant. In Case 2, the mean of ĉ is 4.98096, more than twice as large as the 2.41933 in Case 1. The ĉ have little skewness, and substantial positive kurtosis. The â have a positive mean, little skewness, and substantial positive kurtosis. The correlation between â and ĉ is -0.81265, highly significant (-0.75999 in Case 1). 
The b̂ show little correlation with the â and ĉ : â and b̂ have a correlation (probability) of -0.022141 (0.027), b̂ and ĉ of 0.011995 (0.230). In Case 3, the ĉ have a positive mean, with significantly negative skewness and positive kurtosis. â has a negative mean; â has borderline significant negative skewness, and significant positive kurtosis. The correlation between â and ĉ is -0.73525 (compared to -0.75999 in Case 1). The b̂ show little correlation with the other coefficients. In Case 4, the ĉ have a positive mean, with little skewness and positive kurtosis. The â have a positive mean, with little skewness. The correlation between â and ĉ is -0.59782, compared to -0.75999 in Case 1. The b̂ show little correlation with the other coefficients: â and b̂ have a correlation (t-value) of 0.014447 (1.44469), b̂ and ĉ of -0.020131 (-2.01330). Table 8.A, shows the effects of T. As T doubles from 125 to 250 to 500, T1/2 goes up by a factor of 1.4142136. The standard deviation of the b rises by approximately 1.414. As T goes from 500 to 750, T1/2 rises by a factor of 1.2247449, and b’s standard deviation rises by approximately 1.225. Comparing the cases where T = 500 and T = 250, the critical values are approximately 0.30 larger than those for Case 2. For the nominal 10% size, the T=500 case has a larger size than the T = 250 case, but by less than 0.0250, or 0.6546 versus 0.6288; the difference between the two cases grows to 0.0632 for the 1.0% significance level. 30 References Aragon (2005). Brown, S., and W. Goetzmann, “Performance persistence,” Journal of Finance 50 (1995), 679698. Brown, S., W. Goetzmann, Robert Ibbotson, and S. Ross, 1992, “Survivorship bias in performance studies,” Review of Financial Studies 4 (1992), 553-580. Busse, J., “Volatility timing in mutual funds: Evidence from daily returns,” Review of Financial Studies 12 (1999), 1009-1041. Carhart, M., “On persistence in mutual fund performance,” Journal of Finance 52 (1997), 57-82. 
Chance, Don M., and Michael L. Hemler, “The performance of professional market timers: Daily evidence from executed strategies,” Journal of Financial Economics 62 (2001), 377-411. Comer, George, “Hybrid Mutual Funds and Market Timing Performance,” Journal of Business (2003). Cumby, Robert E., and David M. Modest, “Testing for market timing ability: a framework for forecast valuation,” Journal of Financial Economics 19 (1987), 169-189. Daniel, Kent, Mark Grinblatt, Sheridan Titman, and Russ Wermers, “Measuring mutual fund performance with characteristic-based benchmarks,” Journal of Finance 52 (1997), 1035-1058. Daniel, Naven D., 2002. Do Specification Errors Affect Inferences on Portfolio Performance? Evidence from Monte Carlo Simulations. Working paper, Georgia State University, Atlanta, GA. Fama, Eugenc, and Kenneth French, “Common risk factors in the returns on stocks and bonds,” Journal of Financial Economics 33 (1993), 3-56. Ferson, Wayne, and Rudi W. Schadt, “Measuring fund strategy and performance in changing conditions,” Journal of Finance 51 (1996), 425-462. Ferson, Wayne, and Vincent A. Warther, “Evaluating fund performance in a dynamic market,” Financial Analysts Journal 52 (1996), 20-28. Glosten and Jaganathan, Journal of Empirical Finance (1994). Goetzmann, W., and R. Ibbotson, “Do winners repeat? Patterns in mutual fund performance,” Journal of Portfolio Management 20 (1994), 9-18. Goetzmann, W., J. Ingersoll, and Z. Ivkovic, “Monthly measurement of daily timers,” Journal of Financial and Quantitative Analysis 35 (2000) 257-290. 31 Graham, John R., and Campbell R. Harvey, “Market timing and volatility implied in investment newsletters’ asset allocation recommendations,” Journal of Financial Economics 42 (1996), 397421. Grinblatt, Mark, and Sheridan Titman, “Mutual fund performance: An analysis of quarterly portfolio holdings,” Journal of Business 62 (1989), 394-415. 
Grinblatt, Mark, and Sheridan Titman, “Performance measurement without benchmarks: An examination of mutual fund returns,” Journal of Business 66 (1993), 47-68.
Grinblatt, Mark, and Sheridan Titman, “A study of monthly mutual fund returns and performance evaluation techniques,” Journal of Financial and Quantitative Analysis 29 (1994), 419-444.
Harvey, Campbell, and Akhtar Siddique, “Time-Varying Conditional Skewness and the Market Risk Premium,” Research in Banking and Finance I (2000), 25-58.
Hendricks, D., J. Patel, and Richard Zeckhauser, “Hot hands in mutual funds: Short-run persistence of performance, 1974-1988,” Journal of Finance 48 (1993), 93-130.
Henriksson, Roy D., “Market timing and mutual fund performance: An empirical investigation,” Journal of Business 57 (1984), 73-97.
Henriksson, Roy D., and Robert C. Merton, “On market timing and investment performance. II. Statistical procedures for evaluating forecasting skills,” Journal of Business 54 (1981), 513-533.
Jagannathan, Ravi, and Robert A. Korajczyk, “Assessing the market timing performance of managed portfolios,” Journal of Business 59 (1986), 217-235.
Lee, Cheng-few, and Shafiqur Rahman, “Market Timing, Selectivity, and Mutual Fund Performance: An Empirical Investigation,” Journal of Business 63 (1990), 261-278.
Lehmann, Bruce N., and David M. Modest, “Mutual Fund Performance Evaluation: A Comparison of Benchmarks and Benchmark Comparisons,” Journal of Finance 42 (1987), 233-265.
Roll, Richard, “Ambiguity When Performance Is Measured by the Security Market Line,” Journal of Finance 33 (1978), 1051-1069.
Sharpe, William F., “Asset Allocation: Management Style and Performance Measurement,” Journal of Portfolio Management (1992), 7-19.
Treynor, Jack L., and Kay K. Mazuy, “Can mutual funds outguess the market?” Harvard Business Review 44 (1966), 131-136.
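
Note on the square-root-of-T factors quoted in the discussion of Table 8.A. The text reports that the square root of T rises by 1.4142136 when T doubles and by 1.2247449 when T goes from 500 to 750. The short Python sketch below simply recomputes those two factors from the sample sizes named in the text; the function name `sqrt_t_ratio` and the `ratios` dictionary are illustrative choices, not part of the paper, and the sketch does not reproduce the simulations themselves.

```python
import math

# Sample sizes discussed for Table 8.A (taken from the text).
sample_sizes = [125, 250, 500, 750]


def sqrt_t_ratio(t_old, t_new):
    """Factor by which T**0.5 grows when the sample size moves from t_old to t_new."""
    return math.sqrt(t_new / t_old)


# Ratio for each consecutive pair of sample sizes (hypothetical helper structure).
ratios = {
    (t_old, t_new): sqrt_t_ratio(t_old, t_new)
    for t_old, t_new in zip(sample_sizes, sample_sizes[1:])
}

for (t_old, t_new), factor in ratios.items():
    print(f"T: {t_old} -> {t_new}, sqrt(T) factor = {factor:.7f}")
```

If the standard deviation of the slope estimates grows in proportion to the square root of T, as the text reports, these factors are the benchmarks against which the approximate 1.414 and 1.225 increases in Table 8.A can be compared.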
