November 22, 2005
Spurious Mutual-Fund Performance
Richard J. Sweeney *
McDonough School of Business
Georgetown University
37th and “O” Streets, NW
Washington, D.C., 20057
1-202-687-3742, fax 1-202-687-7639, -4031
e-mail: [email protected]
Abstract: The portfolio manager, without superior insight, changes the risky-portfolio weight in
inverse proportion to the portfolio’s excess return, giving the weight a unit root. The manager is
evaluated by a quadratic regression of the portfolio’s rate of return on the market. From
simulations, the manager may appear to beat the market; in a base-case, the probability is
approximately 0.50 that the squared-market coefficient is significantly positive at the 10% level
in a one-tail test. With observable portfolio weights, conventional unit-root tests can detect the
spurious results. With unobservable weights, cusum-squared tests and recursive parameter
estimates may detect spurious results for the squared-market coefficient.
* For helpful comments, thanks are due in particular to Wayne Ferson and Doria Moyun Xu, and
also to Boo Sjöö.
1. Introduction
Thousands of mutual funds are available, and information on fund performance is available
to the investor from many thousands of sources, including financial advisors, firms that track fund
performance, and financial newspapers and magazines. The time and energy spent on choosing
mutual funds makes most sense if some funds show superior, risk-adjusted performance, and this
superior performance can be predicted out of sample with some degree of success. In practice,
predicting superior mutual fund performance often comes to whether past superior performance
persists.
Studies of superior performance are conventionally divided into two groups. One group
focuses on managers’ ability at stock selectivity. Here the fund buys and holds stocks until
decisions are revised; the goal of stock selectivity is to choose stocks that are likely to experience
positive abnormal returns during the holding period. Another group of studies focuses on
managers’ ability to time movements in returns on different asset classes, with the aim of moving
back and forth across the classes. The aim is to hold larger-than-average portfolio weights in
classes likely to have abnormally high returns, smaller-than-average weights in classes likely to
have abnormally low returns. A general approach to evaluating the manager’s performance
moving back and forth across several asset classes is style analysis, following Sharpe (1992). In a
specialized case of movements across asset classes, the manager times movements across stocks
and cash. This paper focuses on this special case.
A number of approaches are used to evaluate managers’ performance in this case. One is
the Treynor-Mazuy (1966) approach: the fund’s rate of return is regressed on the market’s rate of
return and on the market-return squared; superior performance is associated with a positive slope
on the market-squared term. In interpreting a positive slope on the market-squared term, the
assumption is that the manager with superior insight increases (decreases) the beta risk of her
portfolio when the expected market excess return is above (below) average. Lehmann and Modest (1987)
and Comer (2003) use a multi-factor extension of the Treynor-Mazuy model. The present paper
focuses on the Treynor-Mazuy approach.
Another approach is Merton and Henriksson’s (1981), where the risky portfolio’s weight is
compared with subsequent market returns using contingency tables. Cumby and Modest (1987)
extend the Merton-Henriksson approach to use regression; Graham and Harvey (1996) and
Grinblatt and Titman (1989, 1994) use related approaches.
Classic papers find no significant market timing across the risky portfolio and cash
(Treynor and Mazuy 1966, Henriksson 1984), but this may arise from lack of power in the
tests. Later papers, using daily rather than monthly data, find significant market timing ability for
an important number of funds (Chance and Hemler 2001, Bollen and Busse 2001a); the authors
suggest monthly data may provide too little power. Relatedly, using simulation, Goetzmann,
Ingersoll and Ivkovic (2000) show that if the manager’s ability shows up in daily data, it may be
obscured in monthly data. Note that for daily data, Bollen and Busse (2001b) conclude that
detected superior timing ability shows significant persistence.
The phenomenon of observable superior timing ability in daily data, but not in monthly
data, may be an example of the "interim trading" problem (Ferson and Khang 2002, Chen, Ferson
and Peters 2005). Suppose the evaluator examines the rate of return on the managed portfolio at
the end of each period of J days (J is approximately 22 trading days per month). Then, the
evaluator misses much of the information in the interim trading over the J days.
On the one hand, interim trading problems may lead to spurious results of no superior
trading. On the other hand, researchers have long recognized that, for a number of reasons, a fund
may show superior performance that is spurious. Jagannathan and Korajczyk (1986) show that tests
of market-timing ability may find spurious superiority for a buy-and-hold portfolio if that portfolio
consists of stocks that have a larger option component than the benchmark market portfolio used
in the test; in reaction to this possibility, authors have adjusted benchmarks to avoid spurious
superiority. Further, it is now common practice to generalize the Treynor-Mazuy regression by
also including factors that capture anomalies such as size, earnings-price, book-market,
momentum, the dividend yield and interest rates, to avoid giving a fund credit for exploiting these
well-known anomalies. The researcher must also take care to address thin-trading problems that
may infect the fund’s stocks and the benchmark portfolio. And the researcher may want to account
for skewness in fund or benchmark portfolio returns, and whether co-skewness between the two is
priced (Harvey and Siddique 2000).
Use of daily data to avoid interim trading problems may be subject to other problems. In
particular, this paper addresses how behavior motivated by the "tournament" nature of
compensation for mutual fund managers may lead to performance that spuriously appears to be
superior. Funds that perform relatively well tend to attract more new funds (Sirri and Tufano
1998), and fund advisers' compensation depends in part on the amount of funds under
management. Brown, Harlow and Starks (1996) provide statistical evidence that funds that are
doing relatively poorly in the midst of a calendar year tend to increase their risk and hence
expected return relative to others during the remainder of the year in an attempt to increase their
relative rankings.
This paper discusses a new, previously unrecognized source of superior performance that
is spurious. This spurious performance is related to the Saint Petersburg or Bernoulli paradox, and
is one version of interim trading problems. The spurious performance can be thought of as arising
from an ability-less manager trying nevertheless to win in the mutual-fund tournament. Using
simulation to investigate the Treynor-Mazuy model run over a trading year on daily data, this
paper shows that the expected slope on the market-squared term may be positive even if the
manager has no timing ability—it depends on how the ability-less manager alters the weight on
the risky portfolio. (a) Suppose the ability-less manager randomly chooses each day whether to
make the weight on the fund’s risky portfolio higher or lower than its average by a given amount,
say make the portfolio’s beta 0.90 or 1.10 around an average of 1.00. Then, the expected value of
the slope on the market-squared term is zero. (b) If the ability-less manager makes long-lasting
changes in the fund’s weight on its risky portfolio, and these weight-changes are inversely
proportional to the risky portfolio’s latest excess rate of return, then the estimated value of the
market-squared slope is highly likely to be positive. In some cases discussed below, there is a 0.50
probability of rejecting the null of no superior performance at the 10% significance level in a one-tail test; further, if the fund shows superior performance this year, it has a 0.50 probability of
showing significant performance next year. Intuitively, the manager starts out with much of her
portfolio exposed to market risk. If there is a run of days on which the excess return on the market
is above average, the manager reduces the fraction of her portfolio at risk—she locks in the
abnormal returns that she made by chance. If there is a further run of days with exceptional market
performance, she reduces her risk-exposure even more. If, however, there is a run of days with
below-average market excess returns, she increases the fraction of her portfolio exposed to market
risk; this behavior is analogous to the Bernoulli bettor who doubles up his bet if he loses (Brown
et al. 2004), and can be thought of as a "portfolio insurance" strategy where the manager hedges a
portion of her profits. On the one hand, the ability-less manager is motivated by tournament
incentives. On the other hand, if there were no interim trading on the days that constitute the year
over which performance is evaluated, there would be no bias.[1]
Brown, Harlow and Starks (1996) argue that tournament incentives induce some of the
substantial variation over time that many funds show empirically in the proportions of their
portfolios exposed to risk. So far no one has investigated whether these variations arise from the
behavior considered in this project, but this is a possibility that performance evaluators should not
overlook. The researcher must condition tests for market-timing ability on the properties of the
process that generates the fund’s weight changes. If the time series of these weights is observable,
the persistence in the weight is often obvious in a graph of the weight against time, and standard
unit root tests are often adequate for detecting the persistence. If the time series of weights is
unobservable, then the researcher can test for parameter instability in the Treynor-Mazuy test
equation, in hopes of detecting time variation in the risky portfolio’s weight. An approach with
more power is to use a modified Treynor-Mazuy test based on Ferson and Khang (2002), where
conditioning variables are used to avoid interim trading biases. Ferson and Khang condition on
past portfolio weights, and this approach works for the problem this paper discusses.
Alternatively, the Ferson-Khang approach also works where the conditioning variable is past rates
of return on the market; hence observable portfolio weights are not required.
This paper uses simulation, as do Goetzmann, Ingersoll and Ivkovic (2000) and Daniel
(2002). Roll (1978), Lehmann and Modest (1987) and Grinblatt and Titman (1994) show that
estimated performance is sensitive to choice of benchmark; by using simulation, this study avoids
problems that arise from benchmark misspecification. It might appear that simulation ensures that
no covariates need be included as in a “conditional” Treynor-Mazuy analysis (Ferson and Schadt
1996, Ferson and Khang 2002); on the contrary, conditioning on past weights or past market rates
of return is necessary to avoid biases in estimates of the Treynor-Mazuy performance measure.
[1] If the daily data show autocorrelation, modifying the Treynor-Mazuy regression to condition on the
autocorrelation would eliminate bias.
2. The Model
In the Treynor-Mazuy approach, the manager’s performance is evaluated with the OLS
regression
(1)  $R_{P,t} = a + b\,R_{M,t} + c\,R_{M,t}^{2} + v_t$,
where $R_{P,t}$ is the excess rate of return on the portfolio, $R_{M,t}$ the excess rate of return on the market,
and $v_t$ is a residual. Superior performance is inferred from $c > 0$, negative or perverse performance
from $c < 0$.
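As an illustration of regression (1), the following is a minimal sketch in plain Python, not the code used for the paper's simulations. The function name `treynor_mazuy` and the Gram-Schmidt route to the OLS estimates are choices made here for compactness; the demonstration data use a constant beta of 0.9 (an arbitrary illustrative value), so the true $c$ is zero.

```python
import random

def treynor_mazuy(rp, rm):
    """OLS of rp on [1, rm, rm^2]; returns (a, b, c, t_c).

    Gram-Schmidt: x1 is demeaned rm, x2 is rm^2 with its mean and its
    projection on x1 removed, so the constant, x1 and x2 are orthogonal.
    """
    n = len(rm)
    m1 = sum(rm) / n                      # mean of market excess return
    m2 = sum(x * x for x in rm) / n       # mean of squared market return
    my = sum(rp) / n                      # mean of portfolio excess return
    x1 = [x - m1 for x in rm]
    s11 = sum(v * v for v in x1)
    kappa = sum((x * x - m2) * v for x, v in zip(rm, x1)) / s11
    x2 = [x * x - m2 - kappa * v for x, v in zip(rm, x1)]
    s22 = sum(v * v for v in x2)
    b1 = sum(v * y for v, y in zip(x1, rp)) / s11
    c = sum(v * y for v, y in zip(x2, rp)) / s22
    resid = [y - my - b1 * v1 - c * v2 for y, v1, v2 in zip(rp, x1, x2)]
    s2 = sum(e * e for e in resid) / (n - 3)   # residual variance, 3 parameters
    t_c = c / (s2 / s22) ** 0.5                # conventional OLS t-value of c
    b = b1 - c * kappa                         # back out the slope on rm itself
    a = my - b * m1 - c * m2                   # and the intercept
    return a, b, c, t_c

# Demonstration with a constant beta (no timing, true c = 0).
rng = random.Random(0)
rm = [rng.gauss(0.00024, 0.012649) for _ in range(250)]
rp = [0.9 * x + rng.gauss(0.0, 0.00421637) for x in rm]
a, b, c, t_c = treynor_mazuy(rp, rm)
print(f"a={a:.6f} b={b:.3f} c={c:.3f} t_c={t_c:.2f}")
```

Orthogonalizing the regressors avoids inverting a poorly scaled 3x3 matrix when daily returns are of order 0.01 and their squares of order 0.0001.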
Because of the tournament nature of manager compensation (Brown, Harlow and Starks
1996) and even job tenure, the manager has strong incentives to search for techniques to enhance
her measured performance. While it is well known that a manager with no skill may spuriously
enhance performance, this paper investigates a particular technique that has not been discussed
before and uses simulation to show that a manager with no skill may nevertheless have a
significantly positive market-squared term in (1). Suppose the manager’s overall portfolio is made
up of her risky portfolio and her risk-free portfolio. The excess rate of return is zero on her risk-free portfolio. The excess rate of return on her overall portfolio and on her risky portfolio is
(2)  $R_{P,t+1} = w_t R_{M,t+1} + u_{t+1}$,
where $w_t$ is the weight the manager chooses at the end of $t$ for her risky portfolio for period $t+1$.[2]
The risky portfolio has a beta of $w_t$ on the excess rate of return on the market, $R_{M,t}$, the only risk
factor facing the manager. The risky portfolio’s mean-zero idiosyncratic error $u_t$ has no serial
correlation and is uncorrelated with the market’s excess rate of return: $E\,u_t = 0$, $E\,u_t R_{M,t+j} = 0$ for all
$t, j$, and $E\,u_t u_{t+j} = \sigma_u^2$ for $j = 0$ and $0$ for $j \neq 0$.
The manager has no market-timing ability, but simply changes $w_t$ in inverse proportion to
the portfolio’s latest excess return. She adjusts $w_t$ as $\Delta w_t = w_t - w_{t-1} = -\lambda\,(R_{M,t} - E R_M)$, giving
(3)  $w_t = w_{t-1} - \lambda\,(R_{M,t} - E R_M) = w_0 - \lambda \sum_{j=1}^{t} (R_{M,j} - E R_M)$,
where the adjustment coefficient, written $\lambda$ here, satisfies $\lambda \ge 1$ (save for one experiment below), and, for
convenience, the expected excess rate of return on the market is taken as time constant, $E R_{M,t} = E R_M$.
Thus, $w_t$ is a unit-root process.[4]
Note that if the manager sets $\lambda = 0$, then she rebalances every period to set $w_t = w_0$. Alternatively, if
the manager follows a buy-and-hold strategy, the weight evolves as $w_t = w_{t-1}\,(1 + R_{M,t} + r_{f,t}) / (1 + R_{P,t} + r_{f,t})$,
where $r_{f,t}$ is the risk-free rate; performance in the buy-and-hold case is discussed below.
"You want the measured return used in (1) now to compound the results over J periods, J
being a parameter of the simulator."
More realistically, the manager modifies (3) to set upper and lower bounds on $w_t$, written here as $w_{\max}$ and
$w_{\min}$. Funds typically promise they will hold no more than a certain fraction, or less than another
fraction, of assets in the risky portfolio; in the absence of bounds, $|w_t|$ may be substantially larger
than the fund’s prospectus explicitly or implicitly promises. Thus, the manager is likely to use
(3’)  $w_t = w_{\max}$  for $[w_0 - \lambda \sum_{j=1}^{t} (R_{M,j} - E R_M)] \ge w_{\max}$,
      $w_t = w_0 - \lambda \sum_{j=1}^{t} (R_{M,j} - E R_M)$  for $w_{\max} > [w_0 - \lambda \sum_{j=1}^{t} (R_{M,j} - E R_M)] > w_{\min}$,
      $w_t = w_{\min}$  for $[w_0 - \lambda \sum_{j=1}^{t} (R_{M,j} - E R_M)] \le w_{\min}$.
The weight and hence portfolio beta has a unit root within the bounds $w_{\max}$, $w_{\min}$. The $w_{\max}$, $w_{\min}$
are reflecting barriers: if $w_t$ reaches a bound, eventually it is likely to move back within the
bounds.[5] (3) is analytically simpler, but (3’) is much more important practically.
[2] This specification assumes that when $w_t = 0$, the portfolio is still subject to idiosyncratic risk, $R_{P,t+1} = u_{t+1}$. Another
possible specification is $R_{P,t+1} = w_t (R_{M,t+1} + u_{t+1})$, where $w_t = 0$ implies $R_{P,t+1} = 0$. The discussion in Section 5 shows
that the estimated parameters in (3) below, and the t-value for the market-squared term, are complicated sets of
functionals where the effects of $u_t$ go to zero in probability. In other words, the differences in specification are
unimportant. Simulations (Table 8.E below) show that, for T=250, if the variance of the idiosyncratic term is cut in
half, the $t_c$ mean and standard deviation rise to 2.11193 from 2.00339, and to 2.04518 from 1.92122. The ĉ mean and
standard deviation are essentially unchanged, as are the b̂ mean and standard deviation.
[4] Because $w_t$ is a unit-root process, the manager’s current portfolio beta depends on how well her portfolio has
performed in the past, as measured by $(R_{M,j-1} - E R_M)$. Past performance, say $j$ periods ago, $(R_{M,t-j} - E R_M)$, has an
effect larger than or equal to unity, $\lambda \ge 1$, but the weight on performance in period $t-j$ does not change as time passes: good
performance leads to permanent reductions in the portfolio’s future betas, bad performance to permanent increases.
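The weight rule (3’) and the return equation (2) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the paper's own code: the function names are hypothetical, the default parameter values are taken from the base case described later in the text, and the bounds are applied to the unbounded cumulative sum exactly as (3’) is written, so the weight re-enters the interior only when that sum does.

```python
def weight_path(rm, lam=10.0, w0=1.0, mean_rm=0.00024, w_min=0.0, w_max=2.0):
    """Weights w_1..w_T from rule (3'): w0 minus lam times cumulative surprise, clipped."""
    path, latent = [], w0
    for r in rm:
        latent -= lam * (r - mean_rm)                 # unbounded rule (3)
        path.append(min(w_max, max(w_min, latent)))   # bounds of (3')
    return path

def portfolio_returns(rm, u, w0=1.0, **kw):
    """Equation (2): R_P,t+1 = w_t * R_M,t+1 + u_t+1, with w_0 applied on day 1."""
    w = [w0] + weight_path(rm, w0=w0, **kw)[:-1]      # weight chosen the day before
    return [wt * r + e for wt, r, e in zip(w, rm, u)]

# One good day (+1% surprise) cuts the weight by lam * 0.01 = 0.10;
# an equally bad day restores it; a very large gain pins it at the lower bound.
print(weight_path([0.01024, -0.00976, 0.5]))
```

Setting `w_min=float("-inf")` and `w_max=float("inf")` recovers the unbounded rule (3).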
The data generating process (DGP) is (2) and either (3) or (3’). On the one hand, from (3)
and (3’), the manager does not adjust her weight when she thinks the market is likely to be above
average, as in the discussion typically behind the Treynor-Mazuy model in (1). The manager has
no insight regarding future $R_{M,t}$: her best guess is $E_{t-1}(R_{M,t} - E R_M) = 0$, and the true value of $c$ is
zero. On the other hand, the distribution of the estimate of $c$ in (1) is non-standard, because the
time-varying weight $w_t$ has a unit root, over its entire range in (3), or over $w_{\max} \ge w_t \ge w_{\min}$ in (3’).
A number of authors propose generalizing the Treynor-Mazuy regression to include other
variables that may help explain the rate of return on the risky portfolio, for example, an interest
rate, the dividend yield, the earnings-price ratio etc. (see Ferson and Schadt 1996, Ferson and
Warther 1996, Chance and Hemler 2001, Daniel 2002). If such covariates are omitted, and if the
market-squared term is correlated with these omitted variables, then the market-squared term may
show spurious significance.[6] Because such covariates do not enter the DGP for the simulations this
paper studies, there is no need to include them in the Treynor-Mazuy regressions—they are
irrelevant variables.
The ability-less manager's behavior can be thought of as arising from tournament
incentives, as discussed by Brown, Harlow and Starks (1996). In one view, the ability-less
manager follows a contrarian strategy as opposed to a momentum strategy; in light of an above-average
excess return on the market, she reduces her risky holdings. Note, however, that the data
show neither momentum nor reversal (negative momentum); as far as the ability-less manager
can see, the excess return on the market contains no structure; after an above-average excess return
on the market the manager does not reduce her risky position because she thinks a below-average
excess return is likely.
[5] If $w_t$ hits the lower bound of 0.00, for example, it will become positive if $[w_0 - \lambda \sum_{j=1}^{t} (R_{M,j} - E R_M)]$ becomes
positive. The longer the time horizon, the larger the probability this will occur. Thus, J is an interesting parameter.
[6] Or lack of significance, depending on the signs of the omitted covariates’ effects on the risky portfolio and the
covariates’ correlations with the market-squared term.
In another view, the ability-less manager may be thought of as using portfolio insurance.
Note, however, that the manager claims to outperform the market, and the fund’s prospectus does
not state the manager uses portfolio insurance. On the one hand, the manager is following a
mechanical strategy that, if it were revealed, an investor desiring portfolio insurance could follow
for free (or at very low cost, by hedging with derivatives), and thus the investor would not pay the
manager for her results. On the other hand, it is not at all clear that an investor would desire to
increase the proportion of his portfolio at risk in the aftermath of below average excess returns on
the market.[7]
3. Simulation Results
Section 5 shows that OLS estimates from the Treynor-Mazuy regression (1) of ĉ and its t-value, $t_c$,
have distributions that depend on functionals in complicated ways best studied by
simulation. In the simulations below, daily data are assumed, with 251 days per trading year, or
250 days after the initial day. For the simulations, the DGP is (2) and either (3) or (3’). The excess
rate of return on the market is $R_{M,t} \sim N(0.00024,\ 0.012649^2)$, so that the annual excess rate of return
has a mean and standard deviation of 6% and 20%. The idiosyncratic term is $u_t \sim N(0.0,\ 0.00421637^2)$;
its variance $\sigma_u^2$ is chosen to be 10 percent of $\sigma_{R_M}^2 + \sigma_u^2$; Section 5 presents results
for variation in $\sigma_{R_M}^2$ and $\sigma_u^2$. For each case, 10,000 replications are used.
[7] The portfolio insurance involved in the manager’s behavior makes sense for an investor who desires to reduce the
share of his portfolio at risk as his wealth increases, and increase the share at risk as his wealth falls. Even an investor
with this type of risk aversion would not choose an arbitrary adjustment parameter $\lambda$, but instead $\lambda$ would be implied
by parameters in his utility function.
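The simulation design just described can be sketched as a small Monte Carlo. This is a hedged illustration, not a replication of the paper's tables: the function names are hypothetical, the critical value 1.285 is the approximate nominal 10% one-tail value for a t-distribution with 247 degrees of freedom, and the default of 1,000 replications (rather than 10,000) is chosen only to keep the pure-Python run short. Exact figures depend on the seed and the number of replications.

```python
import random

def t_stat_c(rp, rm):
    """t-value of c in the OLS regression of rp on [1, rm, rm^2]."""
    n = len(rm)
    m1, m2, my = sum(rm) / n, sum(x * x for x in rm) / n, sum(rp) / n
    x1 = [x - m1 for x in rm]
    s11 = sum(v * v for v in x1)
    kappa = sum((x * x - m2) * v for x, v in zip(rm, x1)) / s11
    x2 = [x * x - m2 - kappa * v for x, v in zip(rm, x1)]   # rm^2 orthogonal to [1, rm]
    s22 = sum(v * v for v in x2)
    b1 = sum(v * y for v, y in zip(x1, rp)) / s11
    c = sum(v * y for v, y in zip(x2, rp)) / s22
    sse = sum((y - my - b1 * v1 - c * v2) ** 2 for y, v1, v2 in zip(rp, x1, x2))
    return c / (sse / (n - 3) / s22) ** 0.5

def one_year(rng, lam=10.0, w0=1.0, w_min=0.0, w_max=2.0, days=250,
             mu=0.00024, sd_m=0.012649, sd_u=0.00421637):
    """Simulate one trading year from (2) and (3') and return t_c."""
    latent, w, rp, rm = w0, w0, [], []
    for _ in range(days):
        r = rng.gauss(mu, sd_m)
        rm.append(r)
        rp.append(w * r + rng.gauss(0.0, sd_u))   # (2): weight was set the day before
        latent -= lam * (r - mu)                  # (3): inverse response to the surprise
        w = min(w_max, max(w_min, latent))        # (3'): impose the bounds
    return t_stat_c(rp, rm)

def share_significant(reps=1000, crit=1.285, seed=0, **kw):
    """Share of simulated years with t_c above crit, and the mean t_c."""
    rng = random.Random(seed)
    t_vals = [one_year(rng, **kw) for _ in range(reps)]
    return sum(t > crit for t in t_vals) / reps, sum(t_vals) / reps

share, mean_t = share_significant()
print(f"share with t_c > 1.285: {share:.3f}, mean t_c: {mean_t:.2f}")
```

Calling `share_significant(lam=0.0)` gives the constant-weight benchmark, for which the rejection share should be close to the nominal 10%.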
The simulations give some very general results. First, if the bounds $w_{\max} = 2.00$, $w_{\min} = 0.00$
are imposed, and a value of $\lambda$ of, say, 5.0 or larger is used,[8] then $t_c$ is likely to be positive; the
probability that $t_c$ is significant at the 10% level or better in a one-tail test is approximately 50%.
Second, for $\lambda$ values of, say, 5.0 or larger, removing the bounds $w_{\max}$, $w_{\min}$ increases the likelihood
that $t_c$ is positive; the probability that $t_c$ is significantly positive at the 10% level or better in a one-tail
test rises to more than 60%. In the absence of weight-bounds, however, the manager is likely
on an important number of occasions to choose weights much larger or smaller than the fund’s
prospectus promises investors.
3.A: Effects of Bounds on Variations in the Risky Portfolio’s Weight. Unless otherwise noted, all
cases start with $w_0 = 1$. Cases 1 and 2 use $\lambda = 10.00$. Case 1 sets the bounds $w_{\max} = 2.00$, $w_{\min} = 0.00$,
and Case 2 shows the effects of not imposing bounds.
Case 1. Table 1 shows the average $t_c$ is 1.29;[9] the $t_c$ distribution shows little skewness but
highly significant kurtosis.[10] From Table 2, in 49.74% of the runs $t_c$ is significantly positive at the
10% level in a one-tail test, with only 21.47 percent of the $t_c$ negative. Table 2 also shows the true
sizes for simulations relative to nominal sizes in one-tail tests of the null that $c = 0$.
The b̂ have a mean of 0.999, quite close to unity and to $w_0 = 1$; the maximum and minimum
are 2.018 and -0.013, close to the $w_{\max} = 2.00$, $w_{\min} = 0.00$. The b̂ show little skewness but negative
kurtosis, which likely comes from the truncation of the distribution by $w_{\max} = 2.00$, $w_{\min} = 0.00$
(see Case 2 results).
[8] It might appear that $\lambda = 5.0$ is large, but relative to returns parameters, it is not. The expected excess rate of return on
the market (or portfolio with $w_t = 1$) is 0.00024 per day; the expected change in $w_t$ is $\lambda \times 0.00024$, which for $\lambda = 5.0$ is
0.0012, and relative to the initial value of $w_0 = 1.00$ is 0.0012, or 0.12 of 1%. The standard deviation in percent per
day is 20%/250 = 0.08%; thus, for $\lambda = 5.0$, $\lambda \times 0.08\%$ relative to $w_0 = 1$ is 0.40%.
[9] The notes to Table 1 contain critical values for Cases 1-4. They are sensitive to weight bounds and the values of $\lambda$.
[10] The text and an appendix discuss in some detail the distributions of the estimated coefficients and their relations to
each other. This is because, as Section 5 shows, the estimated coefficients and the $t_c$ are each a complicated set of
functionals where the properties of the distributions must be found from simulation. Thus, the details presented for the
various cases and the further simulation results in Section 5 characterize the distributions of the estimated parameters
and $t_c$.
Measuring Stock-Selectivity and Timing Ability. A common interpretation is that ĉ reflects
timing ability, â measures stock-selection ability, and $\hat{a} + \hat{c}\,s_{R_M}^2$ measures total ability, where $s_{R_M}^2$ is
the sample variance of the market’s excess rate of return (Aragon 2005, Daniel 2002, Glosten and
Jagannathan 1994).
"A better version is Aragon (2005) using option values. See Glosten and Jagannathan, Journal
of Empirical Finance (1994)."
Many researchers find that â and ĉ are negatively correlated and tend to offset each other’s effects
on timing ability, often giving total timing ability of approximately zero. From Table 1, the
average â > 0, the average ĉ > 0, and hence they do not tend to offset. Using the means from Table
1 and the variance of $R_{M,t}$ in the DGP, $\hat{a} + \hat{c}\,s_{R_M}^2 = 3.31008 \times 10^{-6} + 2.41933\,(0.012649^2) = 0.0003904$.
The square root multiplied by 100 gives 1.9758189%. Because of the DGP for this
paper’s simulated portfolio returns, there is no stock selectivity; consistent with this, â is small and
has little effect on $\hat{a} + \hat{c}\,s_{R_M}^2$. Note, however, that the correlation between â and ĉ is -0.75999, highly
significant (appendix). These results should be compared to the case below where $w_t$ changes
randomly. Results are similar in Cases 2-4, save the mean â is $-1.42 \times 10^{-5}$ in Case 3.
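The total-ability arithmetic for Case 1 can be checked directly. The short sketch below only re-computes the figures quoted in the text from the Table 1 means; the variable names are illustrative.

```python
a_hat = 3.31008e-06        # mean intercept, Case 1 (Table 1)
c_hat = 2.41933            # mean market-squared slope, Case 1 (Table 1)
var_rm = 0.012649 ** 2     # daily variance of the market excess return in the DGP

total = a_hat + c_hat * var_rm            # a-hat + c-hat * s^2
timing_share = c_hat * var_rm / total     # part of the total due to the timing term

print(f"total = {total:.7f}")
print(f"100 * sqrt(total) = {100 * total ** 0.5:.4f}")
print(f"share from timing term = {timing_share:.4f}")
```

Almost all of the measured total comes from the market-squared term, consistent with the statement that â has little effect.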
Case 2. In the absence of weight-bounds, the average $t_c$ is 1.84, versus 1.29 in Case 1.
From Table 2, in 62.93% of the runs the $t_c$ is significantly positive at the 10% level in a one-tail
test, versus 49.74% in Case 1; further, the likelihood of finding a negative $t_c$ decreases, from
21.47% to 15.56%. The $t_c$ distribution is significantly skewed to the left (insignificant, right
skewness in Case 1), with significant kurtosis, as in Case 1. The mean ĉ is 4.98096, more than
twice as large as the 2.41933 in Case 1.
The b̂ have a mean of 0.991, close to unity and to $w_0 = 1$; the maximum and minimum are
5.01458 and -3.13936; in contrast, Case 1 has 2.018 and -0.013, close to $w_{\max} = 2.00$, $w_{\min} = 0.00$.
The b̂ show little skewness; their negative, insignificant kurtosis of -0.072029 is substantially
closer to zero than the significant -1.35557 in Case 1.
Comparison to Random Changes in Weight. For comparison, consider the case where $w_t$
fluctuates randomly, $w_t = w_0 + Z_t$, where $Z_t \sim N(0.0,\ 0.012649^2)$, with zero mean but the same
standard deviation as the return on the market, and $w_{\max} = 2.00$, $w_{\min} = 0.00$. Then, in the Treynor-Mazuy
test regression (1):

              $t_c$         ĉ
Mean        0.0016144    0.0035558
Std Dev     1.00391      1.22519
Skewness    0.024892     0.037214
Kurtosis    0.042075     0.19095
Minimum    -3.72617     -4.68373
Maximum     4.46232      5.01194

The mean $t_c$ is essentially zero, with a standard deviation of approximately unity; skewness and kurtosis are
positive but insignificant.[12], [13], [14]
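What separates the random-weight case from rule (3’) is persistence in the weight. A rough way to see this is to compare the lag-1 autocorrelation of the two weight series. The sketch below is illustrative only: the function names are hypothetical, and the parameter defaults are the base-case values from the text.

```python
import random

def lag1_autocorr(x):
    """Sample first-order autocorrelation of a series."""
    n = len(x)
    m = sum(x) / n
    den = sum((v - m) ** 2 for v in x)
    num = sum((x[i] - m) * (x[i - 1] - m) for i in range(1, n))
    return num / den

def random_weights(rng, days=250, w0=1.0, sd=0.012649, w_min=0.0, w_max=2.0):
    """w_t = w0 + Z_t with i.i.d. normal Z_t, clipped to the bounds."""
    return [min(w_max, max(w_min, w0 + rng.gauss(0.0, sd))) for _ in range(days)]

def unit_root_weights(rng, days=250, w0=1.0, lam=10.0, mu=0.00024,
                      sd=0.012649, w_min=0.0, w_max=2.0):
    """Rule (3'): cumulative inverse response to market surprises, clipped."""
    path, latent = [], w0
    for _ in range(days):
        latent -= lam * (rng.gauss(mu, sd) - mu)
        path.append(min(w_max, max(w_min, latent)))
    return path

rng = random.Random(0)
print("random weights   :", round(lag1_autocorr(random_weights(rng)), 3))
print("unit-root weights:", round(lag1_autocorr(unit_root_weights(rng)), 3))
```

The random weights should show autocorrelation near zero, while the weights from (3’) should be highly persistent; it is this persistence, not the variability of the weight as such, that goes with the non-standard distribution of $t_c$.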
Comparison to a Buy-and-Hold Strategy. Alternatively, the manager may follow a buy-and-hold
strategy, where the weight evolves as $w_t = w_{t-1}\,(1 + R_{M,t} + r_{f,t}) / (1 + R_{P,t} + r_{f,t})$, and $r_{f,t}$ is
the risk-free rate. With a buy-and-hold strategy, there are no weight bounds. Assuming that $r_{f,t}$ is
2%/year, in the Treynor-Mazuy test regression (1):

              $t_c$         ĉ
Mean        0.0069658    0.0081278
Std Dev     1.02693      1.25677
Skewness    0.024478    -0.0099342
Kurtosis   -0.0061471   -0.035826
Minimum    -4.02336     -4.88929
Maximum     3.67080      4.53971

The mean $t_c$ is essentially zero, with a standard deviation of approximately unity; skewness is positive and
kurtosis is negative, but both are insignificant.[15], [16]
[12] In this case (and in general save Case 4), the maximum value of $t_c$ and ĉ tends to exceed the absolute value of the
minimum value because the expected value of the excess rate of return on the market is positive and substantial.
[13] The mean â is $-3.22051 \times 10^{-6}$ and the mean ĉ is 0.0035558, so that $\hat{a} + \hat{c}\,s_{R_M}^2 = -3.22051 \times 10^{-6} + 0.0035558\,(0.00016)
= -3.22051 \times 10^{-6} + 0.57 \times 10^{-6} = -2.65 \times 10^{-6} < 0$. The correlation between â and ĉ is -0.57587.
[14] The mean b̂ is unity, with a standard deviation of 0.021540, significant positive skewness but insignificant kurtosis,
and minimum and maximum values of 0.93025 and 1.10267.
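The buy-and-hold weight recursion can be written out as a short sketch. The function name is hypothetical, and converting the 2%/year risk-free rate to a daily rate by dividing by 250 is an assumption made here; the text states only the annual figure.

```python
def buy_and_hold_weights(rm, u, w0=1.0, rf=0.02 / 250):
    """Risky-portfolio weight under buy-and-hold.

    rm: daily market excess returns; u: daily idiosyncratic terms;
    rf: daily risk-free rate (assumed here to be 2% / 250).
    """
    w, path = w0, []
    for r, e in zip(rm, u):
        rp = w * r + e                          # equation (2) with yesterday's weight
        w = w * (1 + r + rf) / (1 + rp + rf)    # weight drifts with relative returns
        path.append(w)
    return path

# Starting half in the risky portfolio: a 10% market gain raises the risky
# share, a 10% loss lowers it, with no trading by the manager.
print(buy_and_hold_weights([0.10], [0.0], w0=0.5, rf=0.0))
print(buy_and_hold_weights([-0.10], [0.0], w0=0.5, rf=0.0))
```

Unlike rule (3), the weight here moves in the same direction as the market's latest return.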
3.B: Effects of the Adjustment Coefficient $\lambda$. If the coefficient $\lambda$ is reduced to 5.00, as in Case 3,
the distribution of $t_c$ shows a relatively small leftward shift. If $\lambda$ is reduced further to 1.00, as in
Case 4, the effect on the $t_c$ distribution is severe. In both cases the weight bounds are $w_{\max} = 2.00$,
$w_{\min} = 0.00$, with $w_0 = 1.00$.
Case 3. The average $t_c$ is 1.28; in 50.27% of the runs, $t_c$ is significantly positive at the 10%
level in a one-tail test, close to the 49.74% in Case 1, where $\lambda = 10.00$. 21.26 percent of the t-values
are negative, versus 21.47 percent in Case 1. The $t_c$ distribution is skewed left, with positive
kurtosis. The b̂ have a mean of 0.99644, with a range of 0.017253 to 1.97032, just within the weight
bounds. b̂ has insignificant left-skewness with significant negative kurtosis.
Case 4. The average $t_c$ is 0.40, compared to 1.29 in Case 1; in 20.30% of the runs $t_c$ is
significantly positive at the 10% level in a one-tail test, substantially less than the (approximate)
50% in Cases 1 and 3, where $\lambda = 10.00, 5.00$. The b̂ have a mean of 1.00027, close to unity and $w_0 = 1$;
the maximum and minimum are 1.43768 and 0.55016, well within $w_{\max} = 2.00$, $w_{\min} = 0.00$.
The b̂ show little skewness or kurtosis.
Comparison of Results Across Values of $\lambda$. Table 1 allows a comparison of Cases 1, 3 and
4, with $w_{\max} = 2.00$, $w_{\min} = 0.00$ and $w_0 = 1$. As $\lambda$ goes from 10, to 5, to 1, the $t_c$ empirical
distributions approach the N(0, 1). As $\lambda$ goes from 10, to 5, to 1, the ĉ empirical distributions show
decreases in the mean and standard deviation; the mean goes to zero as $\lambda$ decreases towards zero.
Across $\lambda$, the mean of the b̂ empirical distributions fluctuates around 1.00; the standard deviation
decreases as $\lambda$ decreases towards zero.[17]
[15] The mean â is $-6.30805 \times 10^{-6}$ and the mean ĉ is 0.0081278, so that $\hat{a} + \hat{c}\,s_{R_M}^2 = -6.30805 \times 10^{-6} + 0.0081278\,(0.00016)
= -6.30805 \times 10^{-6} + 1.3 \times 10^{-6} = -5.01 \times 10^{-6} < 0$. The correlation between â and ĉ is -0.60139.
3.C: Mean Reversion in the Weight. A more general weight-adjustment rule allows mean
reversion in $w_t$ to its long-run value at the adjustment speed, written $\theta_1 > 0$ here, with $w_t$ a unit-root process in
the limit. The rule is
(4)  $w_t = \theta_0 \theta_1 + (1 - \theta_1)\,w_{t-1} - \lambda\,(R_{M,t-1} - E R_M)$.
The long-run value of $w_t$ is $w^* = (\theta_0 \theta_1) / \theta_1 = \theta_0$; thus, if the long-run weight is $w^* = 1$, then $\theta_0 = 1$,
and the intercept in (4) is $\theta_0 \theta_1 = \theta_1$. Similar to above, if bounds of $w_{\max}$, $w_{\min}$ are imposed,
(4’)  $w_t = w_{\max}$  for $[\theta_0 \theta_1 + (1 - \theta_1)\,w_{t-1} - \lambda\,(R_{M,t-1} - E R_M)] \ge w_{\max}$,
      $w_t = \theta_0 \theta_1 + (1 - \theta_1)\,w_{t-1} - \lambda\,(R_{M,t-1} - E R_M)$  for $w_{\max} > [\theta_0 \theta_1 + (1 - \theta_1)\,w_{t-1} - \lambda\,(R_{M,t-1} - E R_M)] > w_{\min}$,
      $w_t = w_{\min}$  for $[\theta_0 \theta_1 + (1 - \theta_1)\,w_{t-1} - \lambda\,(R_{M,t-1} - E R_M)] \le w_{\min}$.
Note that (4) and (4’) nest (3) and (3’): as $\theta_1 \to 0$, model (4) goes to (3) and (4’) goes to (3’). The
simulations below use $w_{\max} = 2.00$, $w_{\min} = 0.00$, $\lambda = 10$ and $w^* = 1$ (comparable to $w_0 = 1$ above).
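The mean-reverting rule with bounds can be sketched as a one-step recursion. This is an illustration under assumptions, not the paper's code: the names `theta0` and `theta1` stand for the long-run weight and the adjustment speed, the function name is hypothetical, and the market surprise before the first day is taken to be zero because the rule uses the previous day's return.

```python
def mean_reverting_weights(rm, theta0=1.0, theta1=0.01, lam=10.0, w0=1.0,
                           mean_rm=0.00024, w_min=0.0, w_max=2.0):
    """Weights from rule (4'): partial adjustment toward theta0, minus lam times
    the previous day's market surprise, clipped to [w_min, w_max]."""
    path, w, prev_r = [], w0, mean_rm   # assumption: no surprise before day 1
    for r in rm:
        target = theta0 * theta1 + (1 - theta1) * w - lam * (prev_r - mean_rm)
        w = min(w_max, max(w_min, target))
        path.append(w)
        prev_r = r                      # today's return drives tomorrow's weight
    return path

# With no market surprises, a weight that starts at 1.8 decays toward w* = 1.
calm = mean_reverting_weights([0.00024] * 250, w0=1.8)
print(round(calm[0], 4), round(calm[-1], 4))
```

With `theta1=0.0` the recursion reduces to the unit-root rule as long as the bounds do not bind.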
In Tables 3.A and 3.B, simulations of (4’) show that if the adjustment speed is 0.001, or
0.1%/day, then results are similar to the case where γ1 = 0 in model (3’). Intuitively, if the weight
is near integrated (NI), the results are similar to the results when the weight is I(1). For sufficiently
high adjustment speeds, however, the results show that the manager has an edge, but only a slight
one. For the adjustment speed γ1, the gap between any wt and w* decreases over N periods by the
fraction [1 - (1 - γ1)^N]. For γ1 = 0.001 over the course of a year of 250 trading days, the gap
decreases by [1 - (0.999)^250] = 1 - 0.7787 = 0.2213, or by 22.13%. For γ1 = 0.01, the gap decreases
by 91.89%; for γ1 = 0.100, the gap is virtually eliminated.
16 The mean b is 1.00368, with a standard deviation of 0.044241, positive but insignificant skewness and kurtosis, and
minimum and maximum values of 0.83631 and 1.20671.
17 In cases where max = 1.50, min = 0.50, the mean tc is relatively insensitive to λ. As λ goes from 10.0 to 15.0 to 20.0,
the tc mean (standard deviation) goes from 0.84237 (1.43307) to 0.80034 (1.47881) to 0.81188 (1.51970). For max =
2.50, min = -1.50 with λ = 10.0, the mean tc is 1.66000 (1.76345). In contrast, with max = 2.00, min = 0.00, λ = 10.0,
the mean tc is 1.28681 (1.65014).
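The gap-closing arithmetic above is easy to check numerically. The helper below is a minimal sketch; the function name is hypothetical.

```python
def gap_closed(speed, periods):
    """Fraction of the gap between the weight and its long-run value
    that is closed after `periods` days at daily adjustment speed `speed`."""
    return 1.0 - (1.0 - speed) ** periods


if __name__ == "__main__":
    for speed in (0.001, 0.010, 0.100):
        print(speed, round(gap_closed(speed, 250), 4))
```

For a 250-day year this gives about 0.2213 at a speed of 0.001 and about 0.9189 at 0.010, in line with the figures in the text.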
Simulations for 0.001, 0.010 and 0.100 can be compared to Case 1, γ1 = 0.00 (see Tables
1, 3.A and 3.B). As the adjustment speed rises from nil to 0.1%/day to 1.0%/day to 10.0%/day, the
mean tc falls from 1.287 to 1.194 to 0.850 to 0.172. The gap between the maximum and the minimum
values is roughly 11.5 to 12.5 across the γ1. Increases in the adjustment speed can be thought of as
shifting the tc distribution to the left.
A comparison of the distributions of the â , b̂ , ĉ is in Table 3.B. For b̂ , increases in the
adjustment speed leave the mean virtually unchanged, but reduce the spread of the distribution.
For ĉ , increases in the adjustment speed reduce the mean and also reduce the spread of the
distribution. For â , increases in the adjustment speed have no clear effect on the mean, but reduce
the spread of the distribution.
4. Implications for Evaluating Performance
Subsections 4.A-4.C illustrate difficulties in evaluating performance when fund managers
use weight-adjustments such as (3), (3’), (4) or (4’). Section 5 discusses statistical methods of
detecting the ability-less manager's strategy.
4.A: Evaluating a Fund’s Performance over Time. Consider a single manager who follows the
strategy in (3’) and sets max = 2.00, min = 0.00 and λ = 10.00 (Case 1). In any single
year, from Table 2 the probability that her ĉ is significantly positive at the 10% level is 0.4974 ≈ 0.50.
Treating years as independent, given that she has ĉ > 0 significant this year at the 10 percent level,
the conditional probability of ĉ > 0 significant next year is also 0.50. The unconditional probability
that she will have ĉ significant at the 10 percent level in two successive years is thus 0.25 = 0.5 x 0.5,
and a run of three years has a probability of 0.125; under the nominal, symmetric distribution of tc,
one would instead expect probabilities of 0.10, 0.01, and 0.001. The probability that tc > 0 over one,
two and three years in a row is 0.7853, 0.6167 and 0.4843, as opposed to the expected 0.5, 0.25 and 0.125.
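The run probabilities follow from multiplying the single-year probability by itself, which assumes that years are independent. A minimal sketch (the function name is hypothetical):

```python
def run_probability(p_single_year, years):
    """Probability of the same outcome in `years` consecutive years,
    assuming independence across years."""
    return p_single_year ** years


if __name__ == "__main__":
    # P(tc > 0) = 0.7853 in a single year for Case 1 (Table 2).
    for k in (1, 2, 3):
        print(k, round(run_probability(0.7853, k), 4))
    # Nominal benchmark for a one-tail test at the 10% level.
    for k in (1, 2, 3):
        print(k, round(run_probability(0.10, k), 4))
```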
4.B: Evaluating a Cross Section of Funds in a Given Year. Consider any given year. A manager
who sets max = 2.00, min = 0.00, λ = 10.00 (Case 1) is likely to do well: from Table 2, the
probability that tc > 0 is 0.7853, and the probability is 0.497 that tc is significantly positive at the
10 percent level. But she has a significant chance of doing poorly; the probability of tc < 0 is
0.2147. Consider a family of mutual funds, where one fund sets λ = 10.00 and another fund sets λ
= -10.00. The fund with λ = 10.00 is likely to do well in many but not all years. In the years when
this fund does poorly, the fund with λ = -10.00 is likely to do well. The converse also holds: When
the fund with λ = 10.00 does well, the fund with λ = -10.00 is likely to do poorly. The family of
funds may argue that it provides the investor with nice diversification: In good years, the investor
will do very well with the part of his portfolio in the fund with λ = 10.00, and in the years when
this part of the portfolio does poorly, the part of his portfolio in the fund with λ = -10.00 is likely
to take up the slack and do well. The fact that the fund with λ = -10.00 does poorly in many years
is just the cost the investor must pay for the insurance or diversification that the fund provides.
4.C: Effects of λ: Cross-Section, Time-Series Behavior of Funds’ Performance. If a number of
funds use the approach in (2) and (3) or (3’), an evaluator may find it difficult to detect the
similarity. Consider two funds with the same risky portfolio and same idiosyncratic error, with
max = 2.00, min = 0.00. One sets λ = 10.00, the other λ = 5.00, as in Cases 1 and 3. From Table 2,
in a given year each has approximately a 0.50 probability that ĉ > 0 and significant at the 10%
level, and a 0.41 chance that ĉ > 0 significant at the 5% level.18 Generally, the two funds show
substantial but not perfect correlation in their â , ĉ , b̂ , tc and paths of wt. Table 4 shows a
regression of the funds’ t-values, where tc,i,λ=10 and tc,i,λ=5 are the t-values for the ith replication with
λ = 10.0 and λ = 5.0, and in each replication the same data are used for both values of λ. The R2 is
0.82612, and hence the correlation is 0.90891.
Making the funds less alike reduces the correlations, for example, if each fund’s risky
portfolio is the same, but their idiosyncratic errors are uncorrelated. Similarly, one fund may use
max = 2.00, min = 0.00, but the other max, = 2.10, min = 0.10, or max, = 1.90, min = -0.10, etc.
Suppose that 20 funds in a 100-fund panel follow (3’), choose values of , max, min
similar to the ranges above, but with some heterogeneity in these parameters, the risky portfolios
and idiosyncratic risks. Each year, the researcher is likely to find that very roughly 20 funds do
well, but with fluctuations across years. He is likely to find that funds that do well in a given year
have related, but hardly the same, values of ĉ . Over a two-year period, some funds do well in both
years, but some that do well in the first year do not in the second. The researcher is likely to
conclude that the subset of 20 funds does not as a whole follow the behavior in (3’).
18 For smaller significance levels, the probabilities diverge, with larger probabilities for the λ = 10.00 fund.
5. Adjusting the Treynor-Mazuy Regression for the Manager's Strategy
This section shows that, by using Ferson and Khang's (F-K 2002) modification of the
Treynor-Mazuy model, the evaluator can control for the information-less manager's attempts to
game evaluation. In simulations of the F-K modification of the T-M model, the coefficient on the
(RM,t)2 variable is centered on zero and its t-value is approximately N(0, 1). F-K modify the T-M
test regression to allow for conditioning on publicly available information and also for special
insights the manager may have. For present purposes, the manager's special insight is ignored: by
assumption the manager considered has none (Ferson and Schadt 1996 discuss cases where only
conditioning is considered). Suppose that the publicly available information is Zt. In this case, the
F-K test regression is
(1')
$$R_{P,t} = a + b_1 R_{M,t} + b_2\, (R_{M,t} Z_{t-1}) + c\, (R_{M,t})^2 + u_t,$$
where ut is a residual and Zt-1 is publicly available information at the end of period t-1 (start of
period t).19 For present purposes, use the publicly available past daily rates of return on the market
as the conditioning information. Form the variable $Z^*_{t-1} = \sum_{j=1}^{t-1} (R_{M,j} - E R_{M,j})$.
Recall that $w_{t-1} = 1 - \lambda \sum_{j=1}^{t-1} (R_{M,j} - E R_{M,j})$; thus, $w_{t-1} = 1 - \lambda Z^*_{t-1}$,
or $Z^*_{t-1} = (1 - w_{t-1}) / \lambda$.
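The construction of the conditioning variable can be sketched in a few lines. The function names and the short return series below are hypothetical; the point is only that the cumulative sum of market surprises reproduces the unbounded weight path up to the scale factor λ (called `lam` in the code).

```python
from itertools import accumulate


def conditioning_variable(market_returns, expected_return):
    """Z*: running sum of market surprises (return minus its expectation)."""
    surprises = [r - expected_return for r in market_returns]
    return list(accumulate(surprises))


def implied_weights(z_star, lam, w0=1.0):
    """Unbounded weight path implied by Z*: w = w0 - lam * Z*."""
    return [w0 - lam * z for z in z_star]


if __name__ == "__main__":
    returns = [0.01, -0.02, 0.03]          # made-up daily market returns
    z = conditioning_variable(returns, 0.0)
    print(z)
    print(implied_weights(z, lam=10.0))
```

Because Z* is built only from past market returns, the evaluator can form it without observing the fund's weights.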
For the F-K test regression (1'), Tables 5 and 6 show simulation results,20 Table 5 for the
case where there are no upper and lower bounds on wt, Table 6 for the case where the upper and
lower bounds are 2.0 and 0.0. In both Tables 5 and 6, the mean value of ĉ is close to zero
(0.0026700, -0.0048807). Similarly, the t-values are centered on zero and are approximately N(0, 1):
the mean t-values tc and their standard deviations are (-0.0019133, 1.00771) and (-0.0045449,
0.99392); in both cases skewness and excess kurtosis of tc are close to zero, (0.0095965,
0.024926) and (-0.016973, -0.021319).
The test regression in (1') does not produce biased estimates of the intercept a, and thus
does not produce spurious estimates of selectivity. In Tables 5 and 6, the mean â is very small and
positive (0.33352 x 10^-6, 1.87593 x 10^-6), and both are small relative to their standard deviations
(0.32731 x 10^-3, 0.32703 x 10^-3). The correlation between â and ĉ is -0.56766 for Table 5 and
-0.56875 for Table 6.21
19 Zt-1 may be a vector of variables, with b2 a conformable vector of coefficients.
20 In the DGP used for Tables 5 and 6, the manager makes no attempt to adjust wt back to the initial w0; mean
reversion is discussed below.
21 In a regression of the â on ĉ , the slopes (t-values) are -0.148405 x 10^-3 (-68.9458) and -0.150900 x 10^-3 (-69.1407).
Across the two tables, the LM heteroscedasticity test, the Jarque-Bera test and Ramsey's RESET2 test raise one red
flag: for the regression for Table X, the Jarque-Bera test statistic is significant at the 0.009 level.
Conditioning When the Weight Shows Mean Reversion. Section 3 discussed and showed
results for the case where the weight wt reverts at the speed γ1 to the long-run weight w0 in the
absence of further surprises to the market rate of return. In this mean-reversion case, including the
conditioning variable $Z^*_{t-1} = \sum_{j=1}^{t-1} (R_{M,j} - E R_{M,j})$ in the Treynor-Mazuy test regression
leads to downward bias from zero in ĉ , as the following shows for the adjustment speed γ1 = 0.100:
            tc            ĉ
Mean        -0.13566      -0.18228
Std Dev     1.21900       1.59907
Skewness    -0.094529     -0.10472
Kurtosis    0.041400      0.081435
Minimum     -4.82933      -7.58638
Maximum     4.19592       5.84840
The bias in ĉ arises because the conditioning variable $Z^*_{t-1} = \sum_{j=1}^{t-1} (R_{M,j} - E R_{M,j})$ is
misspecified. The appropriate conditioning variable is
$Z^{**}_{t-1} = \sum_{j=1}^{t-1} (1 - \gamma_1)^{j-1} (R_{M,j} - E R_{M,j})$, as the following
shows for the adjustment speed γ1 = 0.100:
            tc            ĉ
Mean        0.00059898    0.0016860
Std Dev     1.01065       1.24554
Skewness    -0.063544     -0.062425
Kurtosis    0.024090      0.11412
Minimum     -3.89421      -4.95423
Maximum     3.51783       4.80219
This suggests that when conditioning on Z*t changes a positive value of ĉ to a negative value, the
researcher may search across values of (1 - γ1) to find the value that sets ĉ = 0.
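A geometrically decayed conditioning variable can be built recursively. The sketch below rests on an assumption: the decay is applied to older surprises, which is what the recursion in (4) implies when no bounds bind. The function names and the sample surprises are hypothetical. Under that assumption the decayed sum tracks the mean-reverting weight exactly, so the evaluator could try a grid of decay factors.

```python
def decayed_conditioning_variable(surprises, gamma1):
    """Running sum of surprises in which each past surprise is
    discounted by (1 - gamma1) per elapsed period (assumed timing)."""
    z, path = 0.0, []
    for s in surprises:
        z = (1.0 - gamma1) * z + s
        path.append(z)
    return path


def mean_reverting_weights(surprises, lam, gamma0, gamma1, w_start):
    """Unbounded weight path from rule (4), fed the same surprises."""
    w, path = w_start, []
    for s in surprises:
        w = gamma0 * gamma1 + (1.0 - gamma1) * w - lam * s
        path.append(w)
    return path


if __name__ == "__main__":
    shocks = [0.01, -0.02, 0.005, 0.0]     # made-up market surprises
    z = decayed_conditioning_variable(shocks, gamma1=0.1)
    w = mean_reverting_weights(shocks, lam=10.0, gamma0=1.0,
                               gamma1=0.1, w_start=1.0)
    # When the weight starts at its long-run value, w = gamma0 - lam * z.
    for zi, wi in zip(z, w):
        print(round(wi, 6), round(1.0 - 10.0 * zi, 6))
```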
6. Distribution of Coefficient Estimates and t-values
The distributions of the â , b̂ , ĉ and tc are non-standard and difficult to characterize
analytically even when weight bounds max, min are not imposed. An appendix (available from the
author) sketches the result that asymptotically the â , b̂ , ĉ and tc are complicated functionals with
non-standard distributions; this arises because the weight wt contains a unit root. The
characteristics of the empirical distributions must be found from simulations, as above. These
results are supplemented by further simulations discussed below.
Empirical Distributions of the Estimates: Further Characterization. Consider how
increasing the sample period T, from 125 to 750, affects empirical distributions in Table 7.A, where
max, min are not imposed, w0 = 1.00 and λ = 10. First, the mean of b̂ is very close to unity, that is,
to w0 = 1, across the T; the standard deviation of the b̂ rises, however, as T increases.
Second, the mean of ĉ is very close to 5.0 across the T, and the standard deviation of the ĉ appears
to be independent of T. Third, the mean and standard deviation of the tc rise as T increases.
Effects of Changes in Parameters. Consider changes in λ, w0, σ²(RM,t), σ²(ut) and E RM. In
Table 7.B w0 changes from 1.00 to 0.00, all else constant. The mean of b̂ changes to -0.0063421
from 0.98530, but the standard deviation changes little; essentially b̂ is centered at w0 and is
otherwise invariant to w0. The value of w0 has no systematic effects on the distributions of ĉ and tc.
In Table 7.C, λ changes from 10 to -10. Essentially, the means of ĉ and tc change signs, but
otherwise their distributions are unchanged. λ has little effect on b̂ ’s distribution.
In Table 7.D, the standard deviations of both the market rate of return and the idiosyncratic error
are cut in half. The tc mean falls to 1.65744 from 2.00339, and the standard deviation falls to 1.74683
from 1.92122. The ĉ ’s mean and standard deviation are essentially unaffected. b̂ ’s mean is unaffected,
but its standard deviation is approximately cut in half. In Table 7.E, only the standard deviation of
the idiosyncratic error term is cut in half. The
tc mean and standard deviation rise to 2.11193 from 2.00339, and to 2.04518 from 1.92122. The ĉ
mean and standard deviation are essentially unchanged, as are the b̂ mean and standard deviation.
Finally, in Table 7.F the mean rate of return on the market is cut in half. The tc mean and
standard deviation are essentially unchanged, as are those for the b̂ and ĉ .
7. Conclusions
Investors spend a good deal of time and trouble trying to discover mutual funds that show
superior, risk-adjusted performance, particularly superior mutual fund performance that persists.
One set of studies of superior performance focuses on market timing, or the fund’s ability to
predict relative rates of return on major classes of assets, often stocks versus cash. Classic papers
on mutual-fund market timing (Treynor and Mazuy 1966, Henriksson 1984) do not find
significant market-timing ability. Later papers, using daily rather than monthly data, find evidence
of significant market timing ability for an important number of firms in their studies (Chance and
Hemler 1999, Bollen and Busse 2001a); the authors suggest that monthly data may provide too
little power (see also Farnsworth et al., Journal of Business 2002). Bollen
and Busse (2001b) examine whether market timing ability persists and conclude that superior
ability shows significant persistence.
This paper studies the market-timing results that occur under the null that the fund has no
superior ability. The Treynor and Mazuy (1966) and Henriksson (1984) tests rely on the
assumption that their test statistic has an expected value of zero under the null of no ability; for
example, Treynor and Mazuy regress the fund’s rate of return on the market’s rate of return and
the square of the market’s rate of return, and test whether the slope coefficient on the
market-squared term is different from zero. This paper shows that the expected slope on the
market-squared term need not be zero under the null of no ability; it all depends on how the fund
manager with no ability alters the weight on the risky portfolio. In particular, suppose the fund
manager with no ability makes long-lasting changes in the fund’s weight on its risky portfolio, and
these weight changes are inversely proportional to the risky portfolio’s latest excess rate of return.
Then, the expected value of the slope on the market-squared term is positive, and the size in
standard t-tests is much larger than the nominal size. For example, in some cases discussed above,
there is a 0.50 probability of rejecting the null at the 10% significance level in a one-tail test.
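The mechanism can be illustrated with a small Monte Carlo sketch. It is not the paper's own code: the function names are hypothetical, the return parameters (mean 0.00024, market standard deviation 0.012649, idiosyncratic standard deviation 0.00421637) are borrowed from the notes to Tables 5 and 6, and the setup assumes no weight bounds with an adjustment coefficient of 10, roughly in the spirit of Case 2. The quadratic regression is solved through the normal equations to keep the example free of external libraries.

```python
import random


def ols_quadratic(x, y):
    """OLS of y on [1, x, x^2]; returns [a, b, c]."""
    n = len(x)
    cols = [[1.0] * n, list(x), [v * v for v in x]]
    # Augmented normal equations [X'X | X'y].
    m = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols]
         + [sum(p * q for p, q in zip(ci, y))] for ci in cols]
    # Gaussian elimination with partial pivoting.
    for k in range(3):
        piv = max(range(k, 3), key=lambda r: abs(m[r][k]))
        m[k], m[piv] = m[piv], m[k]
        for r in range(k + 1, 3):
            f = m[r][k] / m[k][k]
            for c in range(k, 4):
                m[r][c] -= f * m[k][c]
    beta = [0.0, 0.0, 0.0]
    for k in (2, 1, 0):
        s = m[k][3] - sum(m[k][c] * beta[c] for c in range(k + 1, 3))
        beta[k] = s / m[k][k]
    return beta


def simulate_c_hat(rng, T=250, lam=10.0, w0=1.0, mu=0.00024,
                   sd_m=0.012649, sd_u=0.00421637):
    """One replication: ability-less manager, unbounded unit-root weight."""
    w, rm, rp = w0, [], []
    for _ in range(T):
        r = rng.gauss(mu, sd_m)
        rm.append(r)
        rp.append(w * r + rng.gauss(0.0, sd_u))  # weight was set before r
        w -= lam * (r - mu)                      # lean against the surprise
    return ols_quadratic(rm, rp)[2]


def mean_c_hat(reps=300, seed=0):
    rng = random.Random(seed)
    return sum(simulate_c_hat(rng) for _ in range(reps)) / reps


if __name__ == "__main__":
    print(mean_c_hat())
```

Although the manager uses no information about future returns, the average estimated coefficient on the squared market return comes out clearly positive, which is the spurious timing result described above.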
Intuitively, the manager follows a type of portfolio insurance or a contrarian strategy. She
starts with much of her portfolio exposed to market risk. If there is a run of days on which the
excess return on the market is above average, she reduces the fraction of her portfolio at risk; she
locks in the abnormal returns that she made by chance. If there is a further run of days with
exceptional market performance, she reduces her risk-exposure even more, locking in more
above-average performance. If, however, there is a run of days with poorer than average market excess
returns, she increases the fraction of her portfolio exposed to market risk, similar to the Bernoulli
bettor who doubles up his bet if he loses.
The researcher testing for market-timing ability must condition the test on the properties of
the process generating the fund’s weight changes. If the weight time series is observable, the
persistence of changes in the weight is often obvious in a graph of the weight against time, and
standard unit root tests often detect the persistence. If the time series of weights is unobservable,
then the researcher can test for parameter instability in the test equation, for example, the
Treynor-Mazuy test equation, in hopes of detecting the time variation of the weight on the risky portfolio.
It has long been recognized that market-timing tests may find spurious superiority.
Jagannathan and Korajczyk (1986) show spurious superiority may arise for a buy-and-hold
portfolio if that portfolio consists of stocks with a larger option component than the test’s
benchmark market portfolio. Further, it is now common practice to generalize the Treynor-Mazuy
regression by also including factors that capture anomalies such as size, earnings-price,
book-market, momentum, the dividend yield and interest rates, to avoid giving a fund credit for
exploiting these well-known anomalies; and similarly the researcher must take care to adjust for
thin trading and skewness. The spurious superiority this paper discusses is yet another type for
which the researcher must adjust when evaluating market-timing ability.
Table 1. Distributions of t-values for Alternative Simulations
Case 1. max = 2.00, min = 0.00, and λ = 10.00.
            tc            a              b             c
Mean        1.28681       3.31008D-06    0.99928       2.41933
Std Dev     1.65014       0.00054312     0.61080       3.35767
Skewness    0.022696      -0.030868      0.0059841     -0.18119**
Kurtosis    0.13183**     0.57610**      -1.35557**    0.76334**
Minimum     -4.76387      -0.0025459     -0.013030     -15.28305
Maximum     7.73448       0.0022637      2.01838       16.42690
Case 2. max, min not imposed; λ = 10.00.
            tc            a              b             c
Mean        1.84454       2.41351D-06    0.99110       4.98096
Std Dev     1.84587       0.00083726     1.16039       5.77890
Skewness    -0.095040**   -0.026803      -0.021487     -0.0053432
Kurtosis    0.20646**     1.65670**      -0.072029     2.58344**
Minimum     -6.43022      -0.0051384     -3.13936      -31.80902
Maximum     9.01090       0.0047966      5.01458       45.53257
Case 3. max = 2.00, min = 0.00, w0 = 1.00 and λ = 5.00.
            tc            a              b             c
Mean        1.27974       -1.42E-05      0.99644       2.03787
Std Dev     1.47637       0.00045149     0.47277       2.47125
Skewness    -0.088788**   -0.04150*      0.010136      -0.26540**
Kurtosis    0.15217**     0.27588**      -0.96459**    0.67296**
Minimum     -5.17598      -0.002146      0.017253      -10.24065
Maximum     6.90988       0.001835       1.97032       13.10835
Case 4. max = 2.00, min = 0.00, w0 = 1.00 and λ = 1.00.
            tc            a              b             c
Mean        0.39973       .63578D-06     1.00027       0.49382
Std Dev     1.06979       0.00033603     0.11852       1.34263
Skewness    -0.0055157    -0.024536      0.012371      -0.016438
Kurtosis    0.062620      -0.054174      0.028159      0.24172**
Minimum     -4.91819      -0.0012661     0.55016       -6.69911
Maximum     4.73011       0.0011688      1.43768       6.25706
Notes to Table 1. The excess rate of return is zero on the risk-free portfolio. The risky portfolio has a beta of unity
on the excess rate of return on the market, RM,t. The risky portfolio’s idiosyncratic error ut satisfies E ut = 0,
E ut RM,t+j = 0 for all t, j, and E ut ut+j = (σ²u, 0) for j (= 0, ≠ 0). The excess rate of return on the risky portfolio RP,t is
(1)
$$R_{P,t+1} = w_t R_{M,t+1} + u_{t+1},$$
where wt is the weight on the risky portfolio at the start of period t. The wt changes in inverse proportion to the
portfolio’s latest excess rate of return, Δwt = wt - wt-1 = -λ (RM,t-1 - E RM), giving
(2)
$$w_t = w_{t-1} - \lambda\, (R_{M,t-1} - E R_M) = w_0 - \lambda \sum_{j=1}^{t} (R_{M,j-1} - E R_M),$$
where λ > 0 may be greater or less than 1. This basic weight-adjustment rule may impose bounds on wt, for example, max, min,
(2’)
$$w_t = \begin{cases}
w_0 - \lambda \sum_{j=1}^{t} (R_{M,j-1} - E R_M) & \text{for } \text{max} > [w_0 - \lambda \sum_{j=1}^{t} (R_{M,j-1} - E R_M)] > \text{min} \\
\text{max} & \text{for } [w_0 - \lambda \sum_{j=1}^{t} (R_{M,j-1} - E R_M)] \ge \text{max} \\
\text{min} & \text{for } [w_0 - \lambda \sum_{j=1}^{t} (R_{M,j-1} - E R_M)] \le \text{min}
\end{cases}$$
The data generating process (DGP) is (1), (2) or (1), (2’). Performance is evaluated by the OLS regression
(3)
$$R_{P,t} = a + b\, R_{M,t} + c\, (R_{M,t})^2 + v_t,$$
where vt is a residual. Superior performance is inferred from c > 0, negative or perverse performance from c < 0.
For a normal distribution, the skewness measure and the (excess) kurtosis measure both have a mean of
zero, and the 5% critical values are 0.04801 and 0.096022 (with 10% critical values of 0.040417 and 0.080835).
Critical Values from Simulations, One-Tail Test
Level     Case 1      Case 2      Case 3      Case 4      N(0, 1)
0.5%      5.640368    6.614648    5.006590    3.174528    2.575
1.0       5.234447    6.089043    4.671458    2.861250    2.327
2.5       4.571965    5.373463    4.128554    2.511393    1.960
5.0       4.008997    4.845478    3.683410    2.181221    1.645
10        3.393424    4.156200    3.176023    1.776886    1.281
Table 2. Size vs. Nominal Size, One-Tail Test a
Nominal Size    .10      .05      .025     .01      .005     % < 0
Case 1          .4974    .4095    .3370    .2855    .2163    21.47
Case 2          .6293    .5495    .4801    .3733    .3501    15.56
Case 3          .5027    .4071    .3195    .1887    .1080    21.26
Case 4          .2030    .1235    .0755    .0211    .0056    38.05
T = 500         .6546    .5808    .5170    .4365    .3855    14.86
a See Table 1 for definitions of the Cases. The case of “T = 500” is in Table 5.A.
Table 3.A. Effects on tc of Speed of Mean Reversion Per Day a
(max = 2.00, min = 0.00; λ = 10.00; T = 250; w* = 1)
Speed       0.000        0.001         0.010         0.100
Mean        1.28681      1.19446       0.84955       0.17157
Std Dev     1.65014      1.50843       1.45632       1.23238
Skewness    0.022696     -0.13725**    -0.11347**    -0.071538
Kurtosis    0.13183**    0.21318**     0.14660**     0.18058**
Minimum     -4.76387     -4.92524      -5.15372      -6.54233
Maximum     7.73448      7.62329       6.45519       5.11234
Table 3.B. Effects on Estimated Parameter Values of Speed of Mean Reversion Per Day a
(max = 2.00, min = 0.00; λ = 10.00; T = 250; w* = 1)
Speed        Mean            Std Dev       Skewness      Kurtosis      Minimum        Maximum
0.000   a    3.31008D-06     0.00054312    -0.030868     0.57610**     -0.0025459     0.0022637
        b    0.99928         0.61080       0.0059841     -1.35557**    -0.013030      2.01838
        c    2.41933         3.35767       -0.18119**    0.76334**     -15.28305      16.42690
0.001   a    5.43056D-06     0.00044892    -0.014435     0.25460**     -0.0023699     0.0019048
        b    1.00446         0.44930       -0.0053798    -0.86549**    -0.00048421    1.97779
        c    1.86973         2.52215       -0.36698**    0.79280**     -10.15006      13.06612
0.010   a    4.80644D-06     0.00042504    -0.062477**   0.12423**     -0.0019480     0.0016125
        b    0.99859         0.26928       -0.028983     -0.17015**    0.072844       1.90248
        c    1.23097         2.29967       -0.39669**    0.67856**     -9.81583       9.60677
0.100   a    -1.74960D-06    0.00035358    -0.029180     0.036446      -0.0014021     0.0013055
        b    0.99979         0.046362      -0.0069935    0.026703      0.84126        1.19202
        c    0.21281         1.60208       -0.10743**    0.12375**     -6.82606       5.50546
Notes to Tables 3.A. and 3.B. These simulations are comparable to Case 1. The long-run weight on the risky
portfolio is w* = 1. The maximum and minimum weights are max and min. The excess rate of return on the risky
portfolio RP,t is
(1)
$$R_{P,t+1} = w_t R_{M,t+1} + u_{t+1}.$$
The weight wt is a mean-reverting process, and wt tends to its long-run value at the adjustment speed γ1 > 0. Then,
(4)
$$w_t = \gamma_0 \gamma_1 + (1 - \gamma_1)\, w_{t-1} - \lambda\, (R_{M,t-1} - E R_M).$$
The long-run value of wt is w* = (γ0 γ1) / γ1 = γ0; thus, if the long-run weight is w* = 1, then γ0 = 1, and the
intercept in (4) is (γ0 γ1) = γ1. Similar to above, bounds of max, min may be imposed to give the model
(4’)
$$w_t = \begin{cases}
\text{max} & \text{for } \text{max} \le [\gamma_0 \gamma_1 + (1 - \gamma_1)\, w_{t-1} - \lambda\, (R_{M,t-1} - E R_M)] \\
\gamma_0 \gamma_1 + (1 - \gamma_1)\, w_{t-1} - \lambda\, (R_{M,t-1} - E R_M) & \text{for } \text{max} > [\gamma_0 \gamma_1 + (1 - \gamma_1)\, w_{t-1} - \lambda\, (R_{M,t-1} - E R_M)] > \text{min} \\
\text{min} & \text{for } \text{min} \ge [\gamma_0 \gamma_1 + (1 - \gamma_1)\, w_{t-1} - \lambda\, (R_{M,t-1} - E R_M)]
\end{cases}$$
Table 4. Cross-Section Time-Series Experiment with λ
tc,i,λ=10 = c0 + c1 tc,i,λ=5 + ei
Variable    Coefficient    Std. Error      t-statistic    P-value
c0          0.042846       0.895239E-02    4.78600        [.000]
c1          1.00202        0.459746E-02    217.950        [.000]
Notes: Two funds have the same risky portfolio with the same idiosyncratic error. Both set max = 2.00 and min =
0.00; one sets λ = 10.00, the other λ = 5.00: thus, the first fund mimics Case 1, the second Case 3. In each of 10,000
replications (i=1,…, 10,000) the same data are used for λ = 10.00 and for λ = 5.00. The correlation between the
values of their tc in the quadratic regressions that test for performance is 0.90891.
Table 5. Ferson-Khang Modified Treynor-Mazuy Regression Results:
No upper or lower bounds. 1
Rp,t = a + b1 RM,t + b2 [RM,t Z*t-1] + c (RM,t)2 + ut
            a             b1          b2           c
Mean        .33352D-06    0.99998     -1.00042     0.0026700
Std Dev     .00032731     0.074047    0.072304     1.25199
Minimum     -0.0013591    0.69593     -1.42588     -5.29629
Maximum     0.0012945     1.30710     -0.51490     5.96179
Skewness    -0.021930     0.037009    0.0043412    -0.00049195
Kurtosis    0.067087      0.12667     2.09700      0.17958
Correlation Matrix
            a            b1            b2          c
a           1.00000
b1          0.0088125    1.0000
b2          -0.012917    -0.0070292    1.0000
c           -0.56766     -0.016333     0.14681     1.00000
Univariate statistics
            tc            tb2
Mean        -0.0019133    -17.71386
Std Dev     1.00771       7.55050
Minimum     -3.92133      -63.85422
Maximum     3.61945       -3.01581
Skewness    0.0095965     -1.21853
Kurtosis    0.024926      2.20190
1 The Ferson-Khang test regression is
Rp,t = a + b1 RM,t + b2 [RM,t Z*t-1] + c (RM,t)2 + ut,
t = 1, 250.
The data generating process is
RM,t ~ N(0.00024, 0.012649²),
w0 = 1,
λ = 5.0,
εt ~ N(0.0, 0.00421637²); in RM,t + εt, RM,t contributes 90% of the variance,
Z*0 = 0.0000,
w1,1 = w0 - λ (RM,1 - 0.00024);
w1,t = w1,t-1 - λ (RM,t - 0.00024);
Z*t = Z*t-1 + (RM,t - 0.00024);
Rp,t = w1,t-1 RM,t + εt.
Table 6. Ferson-Khang Modified Treynor-Mazuy Regression Results:
Upper and lower bounds 2.00 and 0.00. 1
Rp,t = a + b1 RM,t + b2 [RM,t Z*t-1] + c (RM,t)2 + ut
            a             b1           b2           c
Mean        1.87593D-06   0.99936      -0.99923     -0.0048807
Std Dev     0.00032703    0.073438     0.070714     1.23257
Minimum     -0.0014869    0.68537      -1.40863     -4.82050
Maximum     0.0011915     1.28859      -0.66231     4.71443
Skewness    -0.0096071    0.013934     0.0053747    -0.0049155
Kurtosis    -0.062566     0.065946     1.70749      0.11685
Correlation Matrix
            a             b1            b2          c
a           1.0000
b1          -0.0029515    1.00000
b2          0.012940      0.023010      1.0000
c           -0.56875      -0.0029576    0.13542     1.00000
Univariate statistics
            tc            tb2
Mean        -0.0045449    -17.57704
Std Dev     0.99392       7.44291
Minimum     -3.67930      -62.30221
Maximum     3.66282       -4.08806
Skewness    -0.016973     -1.18674
Kurtosis    -0.021319     1.80547
1 The Ferson-Khang test regression is
Rp,t = a + b1 RM,t + b2 [RM,t Z*t-1] + c (RM,t)2 + ut,
t = 1, 250.
The data generating process is
RM,t ~ N(0.00024, 0.012649²),
w0 = 1,
λ = 5.0,
εt ~ N(0.0, 0.00421637²); in RM,t + εt, RM,t contributes 90% of the variance,
up = 2.00,
low = 0.00,
Z*0 = 0.0000,
w1,1 = w0 - λ (RM,1 - 0.00024);
w1,t = w1,t-1 - λ (RM,t - 0.00024);
Z*t = Z*t-1 + (RM,t - 0.00024);
D = 1 for w1,t < up and w1,t > low; D = 0 otherwise
D1 = 1 for w1,t < low; D1 = 0 otherwise
D2 = 1 for w1,t > up; D2 = 0 otherwise
wt = D w1,t + D1 low + D2 up
Rp,t = wt-1 RM,t + εt.
Table 7. A. Increases in T: Effects on Parameter Estimates and t-values a
T
750
b
c
Mean
1.00208
4.97189
Std Dev
2.00370
5.75132
Skewness
Kurtosis
-0.00436530 0.017556
-0.00063187 2.73920**
500
b
c
0.98530
5.02874
1.63275
5.65357
-0.0051030
0.041299
0.13025**
2.75065**
250
b
c
0.99110
4.98096
1.16039
5.77890
-0.021487
-0.0053432
-0.072029
2.58344**
125
b
c
0.99748
4.98402
0.83597
5.81352
-0.0102660 0.059600
-0.01711300 2.32064**
T
750
500
250
125
tc
Mean
2.03066
2.00339
1.84454
1.63481
Std Dev
1.98851
1.92122
1.84587
1.68027
Skewness
Kurtosis
-0.073599** 0.070424**
-0.072320** 0.048996
-0.095040** 0.20646**
-0.058848** 0.32544**
Table 7. B. Effects of w0 = 0.00 b
500
a
b
c
tc
Mean
-3.33236E-06
-0.0063421
5.03948
1.99033
Std Dev
Skewness
0.00082905 0.096422
1.66439
0.0068818
5.84538
-0.10406
1.94735
-0.12633
a
Kurtosis
2.12736
-0.029558
3.09065
0.22587
Notes for Table 7. The simulation results are designed to complete the characterization of the parameter estimates
and tc begun in preceding tables. In all cases, max, min are not imposed.
b
The base-case is w0 = 1. The simulations here are for w0 = 0.00.
Table 7. C. Effects of λ = -10.00 c
(T = 500)
            a              b            c            tc
Mean        2.10199E-06    1.02184      -5.01169     -1.99248
Std Dev     0.00082041     1.64334      5.82084      1.93682
Skewness    -0.071728      -0.010790    0.095436     0.14857
Kurtosis    2.11383        -0.015547    2.89610      0.14187
Table 7. D. Standard Deviations Cut in Half d
            a              b            c            tc
Mean        8.54990D-07    1.00127      4.96676      1.65744
Std Dev     0.00022860     0.81695      6.05468      1.74683
Skewness    -0.0027387     0.062091     -0.068413    -0.18403**
Kurtosis    1.79665**      0.11748**    2.76009**    0.27025**
Table 7. E. Standard Deviation of Idiosyncratic Risk Is Cut in Half e
            a              b            c            tc
Mean        4.80079E-06    1.02992      5.00364      2.11193
Std Dev     0.00079290     1.65494      5.74001      2.04518
Skewness    -0.060433      0.039930     0.045356     -0.027181
Kurtosis    2.21465**      0.086343     2.95568**    0.15630**
Table 7. F. Mean of the Rate of Return on the Market Is Cut in Half f
            a               b            c           tc
Mean        -8.44441D-06    1.03061      5.07078     2.00268
Std Dev     0.00081381      1.64822      5.81818     1.94167
Skewness    -0.10092        -0.043592    0.086718    -0.073600
Kurtosis    2.08108         0.017950     2.91858     0.13878
c The base-case is λ = 10.00. In these simulations, λ = -10.00.
d In these simulations, the standard deviations of both the rate of return on the market and the idiosyncratic error
term are cut in half relative to the base-case.
e In these simulations, the standard deviation of the idiosyncratic error term is cut in half relative to the base-case.
f In these simulations, the expected rate of return on the market is cut in half relative to the base-case.
Appendix: Details of Simulation Results
In Case 1, the ĉ have a positive mean, are highly leptokurtic and are left skewed.
The â have a positive mean and are highly leptokurtic. The correlation between â and ĉ is -0.75999, highly
significant. In a regression of â on ĉ ,
â i = δ0 + δ1 ĉ i + ei
where ei is a residual, the estimated slope is negative, small but highly significant; the slope and
t-value are -.122931E-03 and -116.920. The estimate of δ0 and its t-value are .300720E-03
and 69.1134, so the intercept has a small but strong positive bias. Note that the regression’s
residuals are non-normal and show heteroskedasticity. Comparable regressions in Cases 2 and 3
also have non-normal and heteroskedastic residuals, but this is not true for Case 4, with λ = 1.00.
The b̂ show little correlation with the other coefficients: â and b̂ have a correlation of
0.013417, b̂ and ĉ of -0.0098278, both insignificant.
In Case 2, the mean of ĉ is 4.98096, more than twice as large as the 2.41933 in Case 1.
The ĉ have little skewness, and substantial positive kurtosis. The â have a positive mean, little
skewness, and substantial positive kurtosis. The correlation between â and ĉ is -0.81265, highly
significant (-0.75999 in Case 1). The b̂ show little correlation with the â and ĉ : â and b̂ have a
correlation (probability) of -0.022141 (0.027), b̂ and ĉ of 0.011995 (0.230).
In Case 3, the ĉ have a positive mean, with significantly negative skewness and positive
kurtosis. â has a negative mean; â has borderline significant negative skewness, and significant
positive kurtosis. The correlation between â and ĉ is -0.73525 (compared to -0.75999 in Case 1).
The b̂ show little correlation with the other coefficients.
In Case 4, the ĉ have a positive mean, with little skewness and positive kurtosis.
The â have a positive mean, with little skewness. The correlation between â and ĉ is -0.59782,
compared to -0.75999 in Case 1. The b̂ show little correlation with the other coefficients:
â and b̂ have a correlation (t-value) of 0.014447 (1.44469), b̂ and ĉ of -0.020131 (-2.01330).
Table 7.A shows the effects of T. As T doubles from 125 to 250 to 500, T^(1/2) goes up by a
factor of 1.4142136 at each step. The standard deviation of the b̂ rises by approximately 1.414. As T goes
from 500 to 750, T^(1/2) rises by a factor of 1.2247449, and b̂’s standard deviation rises by
approximately 1.225. Comparing the cases where T = 500 and T = 250, the critical values are
approximately 0.30 larger than those for Case 2. For the nominal 10% size, the T = 500 case has a
larger size than the T = 250 case, but by less than 0.0250, or 0.6546 versus 0.6288; the difference
between the two cases grows to 0.0632 for the 1.0% significance level.
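The square-root-of-T scaling can be checked directly against the standard deviations of b̂ reported in Table 7.A. The snippet below is a small sketch; the function name is hypothetical and the dictionary simply copies the tabulated values.

```python
import math

# Standard deviation of b-hat by sample length T, copied from Table 7.A.
SD_B = {125: 0.83597, 250: 1.16039, 500: 1.63275, 750: 2.00370}


def scaling_check(sd_by_t):
    """For consecutive sample lengths, pair the observed ratio of
    standard deviations with the ratio of square roots of T."""
    ts = sorted(sd_by_t)
    return [(t0, t1, sd_by_t[t1] / sd_by_t[t0], math.sqrt(t1 / t0))
            for t0, t1 in zip(ts, ts[1:])]


if __name__ == "__main__":
    for t0, t1, observed, root_t in scaling_check(SD_B):
        print(t0, t1, round(observed, 3), round(root_t, 3))
```

The observed ratios stay within a few hundredths of the square-root ratios, consistent with a weight that behaves like a random walk over the sample.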
References
Aragon (2005).
Brown, S., and W. Goetzmann, “Performance persistence,” Journal of Finance 50 (1995), 679-698.
Brown, S., W. Goetzmann, Roger Ibbotson, and S. Ross, “Survivorship bias in
performance studies,” Review of Financial Studies 4 (1992), 553-580.
Busse, J., “Volatility timing in mutual funds: Evidence from daily returns,” Review of Financial
Studies 12 (1999), 1009-1041.
Carhart, M., “On persistence in mutual fund performance,” Journal of Finance 52 (1997), 57-82.
Chance, Don M., and Michael L. Hemler, “The performance of professional market timers: Daily
evidence from executed strategies,” Journal of Financial Economics 62 (2001), 377-411.
Comer, George, “Hybrid Mutual Funds and Market Timing Performance,” Journal of Business
(2003).
Cumby, Robert E., and David M. Modest, “Testing for market timing ability: a framework for
forecast valuation,” Journal of Financial Economics 19 (1987), 169-189.
Daniel, Kent, Mark Grinblatt, Sheridan Titman, and Russ Wermers, “Measuring mutual fund
performance with characteristic-based benchmarks,” Journal of Finance 52 (1997), 1035-1058.
Daniel, Naveen D., 2002. Do Specification Errors Affect Inferences on Portfolio Performance?
Evidence from Monte Carlo Simulations. Working paper, Georgia State University, Atlanta, GA.
Fama, Eugene, and Kenneth French, “Common risk factors in the returns on stocks and bonds,”
Journal of Financial Economics 33 (1993), 3-56.
Ferson, Wayne, and Rudi W. Schadt, “Measuring fund strategy and performance in changing
economic conditions,” Journal of Finance 51 (1996), 425-462.
Ferson, Wayne, and Vincent A. Warther, “Evaluating fund performance in a dynamic market,”
Financial Analysts Journal 52 (1996), 20-28.
Glosten and Jagannathan, Journal of Empirical Finance (1994).
Goetzmann, W., and R. Ibbotson, “Do winners repeat? Patterns in mutual fund performance,”
Journal of Portfolio Management 20 (1994), 9-18.
Goetzmann, W., J. Ingersoll, and Z. Ivkovic, “Monthly measurement of daily timers,” Journal of
Financial and Quantitative Analysis 35 (2000), 257-290.
Graham, John R., and Campbell R. Harvey, “Market timing and volatility implied in investment
newsletters’ asset allocation recommendations,” Journal of Financial Economics 42 (1996), 397-421.
Grinblatt, Mark, and Sheridan Titman, “Mutual fund performance: An analysis of quarterly
portfolio holdings,” Journal of Business 62 (1989), 394-415.
Grinblatt, Mark, and Sheridan Titman, “Performance measurement without benchmarks: An
examination of mutual fund returns,” Journal of Business 66 (1993), 47-68.
Grinblatt, Mark, and Sheridan Titman, “A study of monthly mutual fund returns and
performance evaluation techniques,” Journal of Financial and Quantitative Analysis 29 (1994),
419-444.
Harvey, Campbell, and Akhtar Siddique, “Time-Varying Conditional Skewness and the Market
Risk Premium,” Research in Banking and Finance I (2000), 25-58.
Hendricks, D., J. Patel, and Richard Zeckhauser, “Hot hands in mutual funds: Short-run
persistence of performance, 1974-1988,” Journal of Finance 48 (1993), 93-130.
Henriksson, Roy D., “Market timing and mutual fund performance: An empirical investigation,”
Journal of Business 57 (1984), 73-97.
Henriksson, Roy D., and Robert C. Merton, “On market timing and investment performance. II.
Statistical procedures for evaluating forecasting skills,” Journal of Business 54 (1981), 513-533.
Jagannathan, Ravi, and Robert A. Korajczyk, “Assessing the market timing performance of
managed portfolios,” Journal of Business 59 (1986), 217-235.
Lee, Cheng-few, and Shafiqur Rahman, “Market Timing, Selectivity, and Mutual Fund
Performance: An Empirical Investigation,” Journal of Business 63 (1990), 261-278.
Lehmann, Bruce N., and David M. Modest, “Mutual Fund Performance Evaluation: A
Comparison of Benchmarks and Benchmark Comparisons,” Journal of Finance 42 (1987), 233-265.
Roll, Richard, “Ambiguity When Performance Is Measured by the Security Market Line,”
Journal of Finance 33 (1978), 1051-1069.
Sharpe, William F., “Asset Allocation: Management Style and Performance Measurement,”
Journal of Portfolio Management (1992), 7-19.
Treynor, Jack L., and Kay K. Mazuy, “Can mutual funds outguess the market?” Harvard
Business Review 44 (1966), 131-136.