Ranking Crop Yield Models Using Out-of-Sample Likelihood Functions
Authors:
Bailey Norwood, Assistant Professor, Department of
Agricultural Economics, Oklahoma State University.
Matthew C. Roberts, Assistant Professor, Department of
Agricultural, Environmental, and Development Economics,
The Ohio State University.
Jayson Lusk, Associate Professor, Department of
Agricultural Economics, Purdue University.
Contact:
Bailey Norwood
426 Agricultural Hall
Stillwater, Oklahoma 74078
405-744-9820
Email: [email protected]
Final Submission: American Journal of Agricultural Economics
Date: December 18, 2003
Accurate knowledge of crop yield behavior is critical for devising farm
management tools, farm policy and crop insurance. Knowledge of the likelihood and
severity of yield shortfalls, for example, is necessary for government programs to respond
with the appropriate policy. However, crop yields can be extremely variable from year to
year, perhaps much more than the output of non-agricultural firms. Although
understanding the stochastic nature of crop yields is important, characterizing yield
distributions can be quite difficult.
How are crop yields distributed? The literature is replete with candidate yield
distributions, few of which can be excluded on theoretical grounds. This has led
agricultural economists to seek empirical evidence as to which models are best. Previous
studies have relied almost exclusively on hypothesis tests for model discrimination.
While hypothesis tests are useful in many settings, relying only on in-sample fit does not
provide information on how well models extrapolate outside the range of data from
which they were estimated.
Yield distributions are most commonly used for forecasting. Historical yield
distributions are used to set crop insurance premiums based upon the assumption that the
following year’s realization is drawn from the same distribution. Farm policy studies
often make similar assumptions. Because of the importance of forecasting crop yields,
yield distributions should be chosen at least partially on out-of-sample performance.
Although there exists a rich body of literature describing how to evaluate point forecasts,
less work has focused on how to evaluate the predictive ability of an entire distribution.
The purpose of this study is to develop a method for evaluating crop yield models
by determining how well they describe the distribution of out-of-sample yields. We show
that predicted yield densities can be ranked by their log-likelihood function values
observed at out-of-sample observations. This method of model evaluation is then applied
to six popular yield models recently developed in the literature. The six models are pitted
against one another in a forecasting contest. We find that a model with a kernel smoother
applied to percent deviations from the mean outperforms the competing models in
describing out-of-sample yields.
Evaluating Predicted Yield Densities
Ranking yield models by their out-of-sample performance is worthwhile for two reasons.
First, sample sizes are typically low, making it easy to over-fit models. Second, yield
models are frequently used for making probability statements about future yields, and as
such, it seems natural to rank models by their ability to describe out-of-sample yields.
The most common method of ranking models by out-of-sample performance is prediction
error. Prediction error measures the distance between predicted yield and actual yield for
a series of out-of-sample yields. Common prediction error statistics are the out-of-sample root-mean-squared error; the average absolute out-of-sample error; and the Ashley,
Granger, and Schmalensee test (Brandt and Bessler; Kastens and Brester; Norwood and
Schroeder).
Although useful in many instances, relying solely on prediction error for model
selection leaves much to be desired. Prediction error alone does not account for how well
a model captures variance, skewness, kurtosis and probabilities in general. This issue is
of particular interest in characterizing yield distributions because insurance policies, for
example, pay out when actual yields fall below a particular threshold, which is often quite
different from the mean yield level. If the purpose of yield distributions is to generate
probability statements over the entire distribution, we must consider more than prediction
error. In particular, we contend that a model should perform well in describing
forecasted probability statements relative to observed yields. Such a feat can be
accomplished by analyzing a model’s likelihood function value observed at out-of-sample observations.
The Out-of-Sample-Log-Likelihood Function (OSLLF) Approach
Many model selection criteria are available for comparing models’ likelihood functions.
Virtually all model selection criteria are based on the Kullback-Leibler Information
Criterion (KLIC), which measures the distance between a hypothesized likelihood
function and the true likelihood function. The KLIC was originally derived from Bayes’
Rule as follows. Let g(Y|θg) and f(Y|θf) be the likelihood functions of two competing
models, where θf and θg are parameter vectors and Y = [yt=1, ... yt=T]'. Using Pr{.} to
denote probabilities, according to Bayes’ Rule the probability that g(.) is the correct
model given the observed data is
(1) Pr{g(.) | Y} = Pr{Y | g(.)}Pr{g(.)} / Pr{Y}.
By definition, Pr{Y | g(.)} = g(Y | θ g ) . Similarly, the probability that f(.) is correct, given
Y, is
(2) Pr{f(.) | Y} = Pr{Y | f(.)}Pr{f(.)} / Pr{Y}.
If we assume that, prior to observing Y, each model has an equal probability of being
correct, i.e. Pr{f (.)} = Pr{g(.)} , the odds ratio for the competing models is
(3) Pr{f(.) | Y} / Pr{g(.) | Y} = Pr{Y | f(.)} / Pr{Y | g(.)} = f(Y | θf) / g(Y | θg).
Model f(.) would be chosen over model g(.) if the value of (3) is greater than one.
In order for (3) to be statistically valid, the likelihood functions must share the variable of
integration. For example, if h(Y | θ1), Y > 0, is a normal density, then h(ln(Y) | θ2)Y^{-1} is a
log-normal density. The odds ratio for the two densities must then be stated as
h(Y | θ1) / [h(ln(Y) | θ2)Y^{-1}].
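This change-of-variables requirement is easy to verify numerically. The Python sketch below, with arbitrary illustrative parameter values of our own, shows that a normal density evaluated at ln(Y) does not integrate to one over Y until the Jacobian Y^{-1} is attached:

```python
import math
import numpy as np

def normal_pdf(x, mu, sigma):
    # Density of a normal random variable with mean mu and std dev sigma.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Grid over positive yields (all parameter values are illustrative).
y = np.linspace(1e-6, 50.0, 200001)
dy = y[1] - y[0]

dens_normal = normal_pdf(y, 25.0, 3.0)                    # h(Y | theta1)
dens_no_jac = normal_pdf(np.log(y), math.log(25.0), 0.2)  # h(ln Y | theta2), no Jacobian
dens_lognormal = dens_no_jac / y                          # h(ln Y | theta2) * Y^-1

total_normal = (dens_normal * dy).sum()
total_no_jac = (dens_no_jac * dy).sum()
total_lognormal = (dens_lognormal * dy).sum()
```

Only densities stated in the same variable of integration can be compared in (3); without the Y^{-1} term the second "density" accumulates far more than unit mass.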
Notice that the greater the value of (3), the more information the model f(Y|θf)
contains relative to g(Y|θg). Kullback and Leibler convert the odds ratio from a relative
measure of model performance to an absolute measure by assuming g(Y|θg) is the true
data generating process. After taking the logarithm and the expectation of (3) with
respect to the true process, the information content of f(Y|θf) can now be written as
(4) I(f(.) | Y) = ∫_{-∞}^{∞} [ln g(Y | θg) - ln f(Y | θf)] g(Y | θg) dY.
By this measure, the best model is that which minimizes (4). Since g(Y|θg) is the
true data generating process, its expected log-likelihood will always be at least as large as
that of f(Y|θf), and the model with the
largest expected likelihood function, shown in (5), minimizes the KLIC (Kullback and
Leibler).
(5) I(f(.) | Y) = ∫_{-∞}^{∞} ln[f(Y | θf)] g(Y | θg) dY.
Because the true process g(.) is unknown, (5) cannot be directly calculated. If the
observations y1, ..., yT in Y are independently and identically distributed, we can estimate
(5) by estimating the parameter vector θf and calculating the log-likelihood function as
(6) Î(f(.) | Y) = Σ_{t=1}^{T} ln[f(yt | θ̂f,t)]
where θ̂f,t is the parameter estimate used to calculate f(.) at observation t. At first glance,
it might appear that θf could be estimated by maximum likelihood using observations t =
1, ..., T. However, such an approach would lead to an upward biased estimator of the true
KLIC in (5), where this bias is increasing with the number of parameters in θf (Akaike;
Sawa).
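The upward bias of the in-sample log-likelihood is simple to reproduce by simulation. The sketch below, which uses a plain normal model with made-up parameter values rather than crop data, fits by maximum likelihood on one sample and then evaluates the fitted model on both that sample and an independent one:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def norm_loglik(x, mu, sigma):
    # Log-likelihood of x under a normal(mu, sigma) model.
    return np.sum(-0.5 * np.log(2 * math.pi * sigma**2)
                  - (x - mu) ** 2 / (2 * sigma**2))

in_sample, out_sample = [], []
for _ in range(500):
    train = rng.normal(10.0, 2.0, size=20)   # sample used for estimation
    fresh = rng.normal(10.0, 2.0, size=20)   # independent sample, same process
    mu_hat, sig_hat = train.mean(), train.std()  # normal MLEs
    in_sample.append(norm_loglik(train, mu_hat, sig_hat))
    out_sample.append(norm_loglik(fresh, mu_hat, sig_hat))

# Positive on average: the in-sample value overstates predictive fit.
bias = np.mean(in_sample) - np.mean(out_sample)
```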
An alternative approach is to estimate the KLIC by calculating the log-likelihood
function using out-of-sample observations.
(7) Î(f(.) | Y) = Σ_{t=1}^{T} ln[f(yt | θ̂f,-t)]
In (7), θ̂f,-t is the parameter vector estimated without the tth observation. Thus, we refer
to (7) as an out-of-sample-log-likelihood function (OSLLF). The OSLLF value can be
calculated several ways. The estimate θ̂f,-t could be calculated using cross-validation,
where θ̂f,-t is estimated using every observation except t. This is often referred to as
“leave one out at a time forecasting.” Alternatively, we could partition the observations
into groups where each group is iteratively omitted and θ̂f,-t is estimated. Then, the
omitted group of observations can be used to calculate the OSLLF. This procedure is
referred to as grouped-cross-validation. Grouped-cross-validation where the number of
partitions equals the number of observations is simply cross-validation. The greater the
number of partitions, the more parsimonious will be the chosen model (Shao). The
OSLLF is an asymptotically unbiased estimator of the KLIC with the proper number of
partitions (Stone 1977a and 1977b; Shao; Zhang 1992 and 1993).
Because the optimal number of partitions is a function of the true data generating
process, it is impossible to identify exactly (Zhang 1993). Without knowledge of the true
data generating process, it is always questionable which model selection criterion is best.
This fact is true of all model selection criteria, not just the OSLLF criterion. However,
some insight is provided through simulation studies. Zhang suggests using grouped-cross-validation with no fewer than five partitions. In subsequent analysis, we carry
out grouped-cross-validation with at least five partitions, where each partition contains at
least five observations.
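As a concrete illustration of the procedure (not the paper's own code), the sketch below computes a grouped-cross-validation OSLLF for a simple candidate model, a homoskedastic normal with a linear time trend, applied to synthetic yields:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def osllf(y, n_groups=6):
    """Grouped-cross-validation OSLLF for a normal linear-trend model."""
    n = len(y)
    t = np.arange(n, dtype=float)
    total = 0.0
    for held_out in np.array_split(np.arange(n), n_groups):
        keep = np.setdiff1d(np.arange(n), held_out)
        # Fit trend and error variance without the held-out group.
        b1, b0 = np.polyfit(t[keep], y[keep], 1)
        sig2 = (y[keep] - (b0 + b1 * t[keep])).var()
        # Log density of each held-out yield under the fitted model.
        mu = b0 + b1 * t[held_out]
        total += np.sum(-0.5 * np.log(2 * math.pi * sig2)
                        - (y[held_out] - mu) ** 2 / (2 * sig2))
    return total

# Thirty years of synthetic yields with a rising mean.
y = 100 + 1.5 * np.arange(30) + rng.normal(0, 5, 30)
score = osllf(y)
```

Competing yield models would each be scored this way on the same partitions, and the model with the largest OSLLF preferred.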
In the statistical literature, the term validation usually refers to log-likelihood
functions, whereas in the agricultural economics literature validation often refers to
prediction error. To prevent confusion, we use validation to refer to cross- and grouped-cross-validation, and refer to the calculation of log-likelihood functions using validation
as the Out-of-Sample-Log-Likelihood function (OSLLF) approach.
The OSLLF criterion has the asymptotic property that it chooses the model with
the highest information content, as measured by the Kullback-Leibler Information
Criterion. While such asymptotic properties are desirable, due to the nature of existing yield
samples, we are also interested in the small sample properties of the OSLLF criterion. To
explore the small sample properties of the OSLLF criterion, Norwood, Lusk, and Roberts
and Norwood, Ferrier, and Lusk performed simulation studies to provide a comparison
between the OSLLF and other model selection criteria in small samples. The simulations
were designed to mimic crop yield distributions. The results showed that the OSLLF
criterion picked the true yield distribution with a higher frequency than other popular
methods such as the Akaike Information Criterion, out-of-sample-root-mean squared
error, Chi-Square statistic, Kolmogorov-Smirnov statistic, and the Anderson-Darling
statistic.
Interpolative and Extrapolative Forecasts
Yield models are sometimes used for describing the past behavior of yields, but are more
frequently used for predicting future yields. A common feature of most yield models is
that the yield distribution is described as dependent on a time trend. Thus, forecasting
future yields requires extrapolation outside the data. Econometricians usually caution
against such extrapolations, but extrapolation is the major reason yield distributions are
estimated. For this reason, we make no apologies for ranking models outside the range of
data used to estimate them. In fact, we embrace this approach by making extrapolation
an important component of model selection.
Validation of yield models that possess time trends requires interpolation and
extrapolation. Suppose that our sample contained the years 1970-1999, and validation is
performed by partitioning the data into six groups of five observations each. When
validation is used to forecast yields from 1975-1994, such forecasts are interpolative
since the time trend value at each forecast is within the range used for estimation.
Forecasts of yields in the periods 1970-1974 and 1995-1999 are extrapolative forecasts
since the time trend values are outside the range used for estimation. We believe that
interpolative and extrapolative forecasts provide different kinds of information, and it is
the extrapolative forecasts that make the OSLLF criterion especially appealing for yield
densities.
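The split between the two kinds of forecasts is mechanical; a short sketch using the hypothetical 1970-1999 sample and six groups of five years confirms which years fall in each category:

```python
import numpy as np

years = np.arange(1970, 2000)        # thirty years, 1970-1999
groups = np.array_split(years, 6)    # six validation groups of five years

interp, extrap = [], []
for g in groups:
    rest = np.setdiff1d(years, g)    # estimation sample when g is held out
    for yr in g:
        # Interpolative only if the trend value lies inside the range
        # of trend values used for estimation.
        if rest.min() <= yr <= rest.max():
            interp.append(int(yr))
        else:
            extrap.append(int(yr))
```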
In large samples, there is no difference between validation and other likelihood-based criteria such as the Akaike, Bayesian, Final Prediction, and Shibata criteria (Stone
1977a; Zhang). However, these other criteria utilize only in-sample observations where
no extrapolations are made. Obtaining information on the ability of yield models to
provide accurate probability statements outside the range of data is crucial for selecting
models best suited for prescribing risk management strategies.
Candidate Yield Models
It has long been recognized that yields exhibit complex behavior. Mean yield may or
may not be increasing over time. The same can be said for yield variance. For U.S. corn
yields, the mean and variance are increasing over time. However, Miller, Kahl, and
Rathwell find that South Carolina and Georgia peach yield distributions exhibit a
constant mean and variance over this same time period. In addition to the complex
possibilities of the behavior of mean and variance throughout time, yields may be
negatively skewed, positively skewed or symmetric (Day) and may even exhibit
bipolarity (Goodwin and Ker). In response, agricultural economists have developed a
myriad of flexible models that allow the shape of the distribution to be driven by
available data. For the purposes of this analysis, six yield models published between
1980 and 2000 were drawn from the American Journal of Agricultural Economics. These
models are discussed below.
GAMMA Model
The first model is a frontier model developed by Gallagher, in which the maximum
attainable yield is described by YM = γ0 + γ1t where t is an annual time trend. Deviations
from this yield frontier are
(8) εt = γ0 + γ1t - Yt.
Notice that the error εt is always positive, making it suitable for a gamma distribution. If
we substitute (8) as the random variable in the gamma probability density function, the
yield pdf is
(9) f(Yt) = [β^α Γ(α)]^{-1} [γ0 + γ1t - Yt]^{α-1} exp{-β^{-1}[γ0 + γ1t - Yt]},
where the Greek symbols are parameters to be estimated. The gamma distribution is flexible
in that it can be nearly symmetric or right-skewed, which in this application would make yield
left-skewed. To account for heteroskedasticity, Gallagher used OLS residuals to
estimate error variance as a function of a time trend. He then created a time-specific
index for yield variance, denoted VSt, and entered this index into the pdf as
(10) f(Yt) = [β^α Γ(α)]^{-1} [γ0 + γ1t - Yt]^{α-1} VSt^{1-α} exp{-β^{-1} VSt^{-1} [γ0 + γ1t - Yt]}.
For this study, this model is modified in two ways. First, the time trend for the
yield frontier is replaced with a single value for maximum yield. This maximum yield
(YM) is the largest observed yield (in- and out-of-sample) multiplied by 1.2, which is
unlikely to be exceeded within forecast horizons of less than five years. This
modification is made because maximum likelihood estimates tend to place the maximum
attainable yield close to the largest in-sample observation. This almost inevitably results
in a likelihood of zero for some observation during the validation procedure. Second, to
allow the yield distribution to evolve over time, the parameters α and β are made
conditional on a time trend
(11) α̃ = α1 + α2t, β̃ = β1 + β2t.
This specification allows heteroskedasticity, so no weighting of the observations by VSt
is needed. The yield model analyzed in this study, referred to as GAMMA, is
(12) f(Yt) = [β̃^α̃ Γ(α̃)]^{-1} [YM - Yt]^{α̃-1} exp{-β̃^{-1}[YM - Yt]}.
This distribution can exhibit a negative skew but not a positive skew. To allow a positive
skew, another version of the model is created where Yt is substituted for YM - Yt. The
version with the highest [in-sample] likelihood function is selected for each estimation.
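A direct transcription of the modified model into code makes the pieces concrete. The sketch below evaluates the GAMMA log density of (12) at parameter values we invented for illustration; only the 20% ceiling rule comes from the text:

```python
import math
import numpy as np

def gamma_yield_logpdf(y, t, a1, a2, b1, b2, y_max):
    """Log density of the GAMMA model: the shortfall y_max - y is gamma
    distributed with shape and scale that drift linearly in the trend t."""
    alpha = a1 + a2 * t          # time-varying shape, alpha-tilde in (11)
    beta = b1 + b2 * t           # time-varying scale, beta-tilde in (11)
    eps = y_max - y              # positive shortfall from the yield ceiling
    if eps <= 0 or alpha <= 0 or beta <= 0:
        return -np.inf           # outside the support
    return (-(alpha * math.log(beta) + math.lgamma(alpha))
            + (alpha - 1) * math.log(eps) - eps / beta)

# Ceiling set 20% above the largest observed yield, as in the modified model.
y_max = 1.2 * 160.0
ll = gamma_yield_logpdf(140.0, t=10, a1=2.0, a2=0.05, b1=8.0, b2=0.1, y_max=y_max)
```

The hard ceiling is what motivates the 20% buffer: any validation observation above y_max would receive a likelihood of zero, i.e., a log-likelihood of negative infinity.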
BETA
Nelson and Preckel also utilized the concept of a maximum attainable yield, which they
estimated by maximum likelihood. Deviations of yield from its maximum value were
then modeled as a conditional beta distribution. This model is referred to as the BETA
model. In its original form, the BETA model conditioned the two beta parameters on
agricultural inputs, such as nitrogen use and soil characteristics. To facilitate comparison
with the other five models, these parameters are instead conditioned on a time trend.
This specification naturally allows heteroskedasticity and a time-dependent mean yield.
If YM denotes the yield ceiling, the pdf for the BETA model can be written as
(13) f(Yt) = Yt^{α-1} (YM - Yt)^{δ-1} / [B(α, δ)(YM)^{α+δ-1}],
where B(.) is the beta function. Nelson and Preckel originally specified α and δ as Cobb-Douglas functions of farm inputs. We replaced the farm inputs with a time trend, but
found the Cobb-Douglas form made convergence very difficult. Therefore, we specified
α and δ as linear time trends, eliminating the convergence difficulties.
(14) α = θ1 + ω1t, δ = θ2 + ω2t.
As with GAMMA, YM in the BETA model is set 20% above the highest historical yield.
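The BETA density in (13)-(14) can be checked for internal coherence by confirming that it integrates to one over (0, YM). The sketch below uses placeholder coefficients of our own choosing:

```python
import math

def beta_yield_pdf(y, t, th1, om1, th2, om2, y_max):
    """Density of the BETA model, eq. (13), with alpha and delta linear
    in the time trend as in eq. (14).  Coefficients are illustrative."""
    alpha = th1 + om1 * t
    delta = th2 + om2 * t
    if not 0 < y < y_max or alpha <= 0 or delta <= 0:
        return 0.0
    # ln B(alpha, delta) via log-gamma functions.
    log_b = math.lgamma(alpha) + math.lgamma(delta) - math.lgamma(alpha + delta)
    log_pdf = ((alpha - 1) * math.log(y) + (delta - 1) * math.log(y_max - y)
               - log_b - (alpha + delta - 1) * math.log(y_max))
    return math.exp(log_pdf)

# Midpoint-rule integral over (0, y_max) should be close to one.
y_max = 1.2 * 160.0
n = 20000
step = y_max / n
total = sum(beta_yield_pdf((k + 0.5) * step, 10, 4.0, 0.05, 2.0, 0.01, y_max) * step
            for k in range(n))
```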
STOCHIHS
Moss and Shonkwiler modeled mean yield as a linear time trend, but allowed the
parameters of this trend to be random according to a Kalman Filter. The Kalman Filter
allows maximum likelihood estimation to determine the degree of randomness in the
parameters. The equations governing mean yield are described as
(15) Yt = µt + εt, µt = µt-1 + βt-1 + ηt, βt = βt-1 + ζt.
Notice that this specification allows the intercept and/or the slope of the linear time trend
to vary across time. Estimation of (15) returns values of µt=0 and βt=0 and the variances of
ηt and ζt, denoted ση² and σζ², respectively. Deviations of yield from its time trend, εt, are
assumed to follow an inverse hyperbolic sine transformation of normality, achieved by
constructing a new variable vt which equals
(16) vt = ln{θεt + (θ²εt² + 1)^{1/2}} / θ - δ.
The distribution of vt is assumed normal with a zero mean. Maximum likelihood
estimation provides estimates of θ, δ, µ0, β0, σv², ση², and σζ², where σv² is the
variance of vt. While vt is assumed normal, actual yields may exhibit non-normality by
the selection of θ (for skewness) and δ (for kurtosis). Owing to the use of the Kalman
filter, the likelihood function is rather complex. For the sake of brevity, the likelihood
function is not reproduced here, but is available in the original article.
This model is referred to as STOCHIHS where the STOCH label designates yields
to be a function of stochastic parameters and the IHS label indicates the use of an inverse
hyperbolic sine function. The STOCHIHS model imposes homoskedasticity to maintain
tractability in the Kalman filter. This model is difficult to employ in validation because
the parameters µt and βt are dependent on µt-1 and βt-1. When using validation where
groups of observations are removed, the values µt-1 and βt-1 may not be known. Because
the expected value of ηt and ζt is zero, we replace µt-1 and βt-1 with µt-6 and βt-6 for that
observation. All other observations remain unchanged.
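To see how the transformation in (16) generates skewness from a normal vt, the sketch below inverts it: draw normal vt, map back to yield deviations, and inspect the skew. The parameter values are arbitrary illustrations, not estimates from the model:

```python
import numpy as np

def ihs(eps, theta, delta):
    # Eq. (16): v = ln(theta*eps + sqrt(theta^2*eps^2 + 1)) / theta - delta.
    return np.log(theta * eps + np.sqrt(theta**2 * eps**2 + 1)) / theta - delta

def ihs_inverse(v, theta, delta):
    # Map a normal v back to a (possibly skewed) yield deviation.
    return np.sinh(theta * (v + delta)) / theta

rng = np.random.default_rng(2)
v = rng.normal(0.0, 1.0, 100000)             # v is assumed normal
eps = ihs_inverse(v, theta=0.5, delta=1.0)   # implied yield deviations

# With delta != 0 the sinh map stretches one tail more than the other.
skew = np.mean((eps - eps.mean()) ** 3) / np.std(eps) ** 3
```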
MULTIHS
Ramirez modified the STOCHIHS model to allow heteroskedasticity by replacing the
stochastic trend with a fixed-parameter trend, in addition to several other creative
reparameterizations. The new specification also allowed correlations between yields of
different crops. This model’s name is MULTIHS, as it is multivariate and utilizes an
inverse hyperbolic sine distribution for the errors. Mean yields for crop i are described as
(17) Yt,i = β 0,i + β1,i t + ε t,i .
A new variable vt,i is created which equals
ln θ i (ε t,i ) + θ i2 ε 2t,i + 1
(18) v t,i = 
θ
1/2

,
just as in STOCHIHS.
The distributions of vt,i are assumed normal with separate means µi and a
covariance matrix Σ. For P different crops, and letting w = [vt,1 vt,2 ... vt,P]', the likelihood function is written as
(19) LF = ∏_{t=1}^{T} (2π)^{-P/2} |Σ|^{-1/2} exp{-0.5(w - µ)'Σ^{-1}(w - µ)} ∏_{i=1}^{P} (1 + [θi(Yt,i - β0,i - β1,it)]²)^{-1/2}.
Ramirez then reparameterizes the model to allow heteroskedasticity where the variances
differ across crop and time. In most models the variance differed each decade. In our
analysis, validation is conducted by removing five observations at a time. Thus, we allowed
the variance to differ each decade so that there are a minimum of five observations with
which to detect variance changes.
The reparameterization proposed by Ramirez results in a likelihood function that
is very difficult to interpret. So that the reader may obtain some intuition into the model,
we provide only the likelihood function before heteroskedasticity is imposed. Like
STOCHIHS, MULTIHS has flexible third and fourth moments, while maintaining
normality as a special case. The major differences between the models are that mean
yields are stochastic in STOCHIHS and deterministic in MULTIHS, whereas MULTIHS
allows heteroskedasticity as well as correlations between crop yields in a given year,
while STOCHIHS does not.
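The pre-reparameterization likelihood in (19) can be evaluated directly. The sketch below does so for synthetic data on three crops; every parameter value is an illustrative stand-in, and no estimation is performed:

```python
import math
import numpy as np

def multihs_loglik(Y, theta, beta0, beta1, mu, sigma):
    """Log of eq. (19): multivariate normal in the IHS-transformed
    deviations, plus the Jacobian of the transform.  Y is T x P."""
    T, P = Y.shape
    t = np.arange(T, dtype=float)[:, None]
    eps = Y - (beta0 + beta1 * t)            # trend deviations, T x P
    v = np.arcsinh(theta * eps) / theta      # IHS transform, eq. (18)
    dev = v - mu
    sigma_inv = np.linalg.inv(sigma)
    _, logdet = np.linalg.slogdet(sigma)
    # Multivariate normal kernel, summed over the T years...
    quad = np.einsum('tp,pq,tq->t', dev, sigma_inv, dev)
    ll_norm = -0.5 * T * (P * math.log(2 * math.pi) + logdet) - 0.5 * quad.sum()
    # ...plus the log Jacobian terms (1 + [theta*eps]^2)^(-1/2).
    ll_jac = -0.5 * np.log1p((theta * eps) ** 2).sum()
    return ll_norm + ll_jac

rng = np.random.default_rng(3)
T, P = 40, 3
theta = np.array([0.1, 0.1, 0.1])
beta0 = np.array([80.0, 30.0, 35.0])
beta1 = np.array([1.0, 0.4, 0.3])
Y = beta0 + beta1 * np.arange(T)[:, None] + rng.normal(0.0, 3.0, (T, P))
ll = multihs_loglik(Y, theta, beta0, beta1, np.zeros(P), 9.0 * np.eye(P))
```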
SEMIPAR
A simpler but still flexible model was offered by Goodwin and Ker. This model, denoted
SEMIPAR for semiparametric, portrays percent deviations of yield from its mean with a
nonparametric kernel smoother. In the original article, mean yields were modeled as an
ARIMA process. However, in correspondence one of the authors recommended using
either a quadratic or linear trend. Therefore, a quadratic trend is used unless the quadratic
term is insignificant, in which case a linear trend is employed. Once the trend is
estimated, each residual εt = Yt - β̂0 - β̂1t - β̂2t² is divided by the forecast for that time
period, vt = εt / (β̂0 + β̂1t + β̂2t²).
The distribution of vt need not be addressed until the OSLLF values are
calculated, as no parametric fitting of vt takes place. To illustrate, suppose we employed
validation on thirty observations where the data were partitioned into six groups of five
observations each. The first iteration drops the first five observations and estimates
Yt = β0 + β1t + β2t² via OLS for t = 6, ..., 30. If β2 is found to be insignificant, Yt = β0 + β1t is
estimated using OLS. This provides a vector of in-sample vt's, which are calculated as
vt,in = (Yt - β̂0 - β̂1t) / (β̂0 + β̂1t) for t = 6, ..., 30. The “in” subscript is used to denote an
in-sample observation. Now, consider the OSLLF value for a single out-of-sample
observation at t = 1. First, vt=1,out is calculated as vt=1,out = (Y1 - β̂0 - β̂1(1)) / (β̂0 + β̂1(1)). The
OSLLF value for v1 is then calculated using the nonparametric kernel smoother:
 1 30  v1,out − v k,in
K
(20) OSLLF1 = ln
 25h ∑
h
k =6



1

 β̂ 0 + β̂1
(
)




In (20), K(.) is the standard normal pdf and h is the bandwidth, chosen using the
Silverman rule of thumb described in Goodwin and Ker. The last term in (20) is the
derivative of v1,out with respect to yield (the Jacobian), which ensures the pdf integrates over
yield. The final OSLLF value is the sum of (20) over all validations.
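One SEMIPAR OSLLF evaluation can be sketched compactly. The code below uses synthetic yields of our own and assumes the textbook Silverman constant of 1.06 for the bandwidth; the paper's exact bandwidth rule follows Goodwin and Ker and may differ:

```python
import math
import numpy as np

rng = np.random.default_rng(4)

def semipar_osllf_point(y_out, t_out, y_in, t_in):
    """OSLLF contribution of one held-out yield, as in eq. (20): linear
    trend, proportional deviations, normal kernel smoother, Jacobian."""
    b1, b0 = np.polyfit(t_in, y_in, 1)
    fit_in = b0 + b1 * t_in
    v_in = (y_in - fit_in) / fit_in                  # proportional deviations
    # Rule-of-thumb bandwidth for a normal kernel (Silverman constant 1.06).
    h = 1.06 * v_in.std() * len(v_in) ** (-1 / 5)
    fit_out = b0 + b1 * t_out
    v_out = (y_out - fit_out) / fit_out
    kern = np.exp(-0.5 * ((v_out - v_in) / h) ** 2) / math.sqrt(2 * math.pi)
    density_v = kern.sum() / (len(v_in) * h)
    # Jacobian 1/fit_out converts the density from v units to yield units.
    return math.log(density_v / fit_out)

t = np.arange(30, dtype=float)
y = 100 + 1.5 * t + rng.normal(0, 5, 30)
# Hold out t = 0 and score it against a model fit on t = 5, ..., 29.
score = semipar_osllf_point(y[0], t[0], y[5:], t[5:])
```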
NORMAL
A recent article by Just and Weninger suggests that previous findings of skewed yield
distributions may be the result of inappropriate detrending and failure to properly model
heteroskedasticity. When using flexible polynomial trends for mean yield and yield
variance, the authors find that normality is difficult to reject. This last model is referred
to as the NORMAL model. Following Just and Weninger, a cubic polynomial is used for
expected yield, which can be reduced to a quadratic or linear trend if supported by t-tests.
Heteroskedasticity is accounted for by regressing the absolute value of ordinary least
squares residuals against a quadratic trend. Using t-tests, the variance equation may be
reduced to a linear trend or a constant.
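A rough sketch of the NORMAL model's two-step fit follows, using simulated data of our own and keeping the full polynomial orders rather than carrying out the t-test reductions described above:

```python
import math
import numpy as np

rng = np.random.default_rng(5)

def fit_normal_model(y):
    """Cubic trend for mean yield; quadratic trend fit to scaled absolute
    OLS residuals as a stand-in for the standard deviation equation."""
    t = np.arange(len(y), dtype=float)
    mean_coef = np.polyfit(t, y, 3)              # cubic mean trend
    resid = y - np.polyval(mean_coef, t)
    # Under normality E|resid| = sigma * sqrt(2/pi), hence the rescaling.
    sd_coef = np.polyfit(t, np.abs(resid) * math.sqrt(math.pi / 2), 2)
    return mean_coef, sd_coef

def normal_loglik(y, mean_coef, sd_coef):
    t = np.arange(len(y), dtype=float)
    mu = np.polyval(mean_coef, t)
    sd = np.polyval(sd_coef, t)
    return np.sum(-0.5 * np.log(2 * math.pi * sd**2) - (y - mu) ** 2 / (2 * sd**2))

t = np.arange(30)
y = 100 + 1.5 * t + rng.normal(0.0, 2 + 0.1 * t)  # variance grows over time
mc, sc = fit_normal_model(y)
ll = normal_loglik(y, mc, sc)
```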
A Forecasting Contest
The six yield models described above are pitted against one another in a forecasting
contest to determine which model tends to be ranked highest across various regions and
crops. The first contest utilizes cornbelt corn, soybean, and wheat yields from 1950-1989, available in the appendix of Ramirez. Because the three yields are likely correlated
and the MULTIHS model is multivariate, this contest provides insight into the usefulness
of modeling correlations among crops. All six models are estimated with the Ramirez
data. The five univariate models, GAMMA, BETA, STOCHIHS, SEMIPAR and
NORMAL, are used to obtain OSLLF values for each crop, and are then summed across
the three crops to produce a multivariate OSLLF value. Then the MULTIHS model is
used to estimate corn, soybeans, and wheat yields jointly to obtain another multivariate
OSLLF value. If MULTIHS is ranked higher than the five univariate models, this is evidence that
accounting for correlations across crops may improve the accuracy of forecasted
probability statements.
Table 1 shows that the semiparametric model, SEMIPAR, is ranked highest. The
low ranking of the MULTIHS model was the result of a few large outliers; the model
predicted a low variance for the years 1950-1954 and was penalized because yields were
low during this period. If the model rankings were repeated using only OSLLF values
from 1956-1989, MULTIHS would be ranked highest.1 Notice also that the median
OSLLF value for MULTIHS is considerably higher than the other models, suggesting
multivariate distributions can improve yield forecasts.
The next forecasting contest utilizes county-level yields from 1962-1992 available
from the National Agricultural Statistics Service. Univariate yield distributions were
estimated for corn, wheat and soybeans. For each crop, the thirty largest counties in
terms of harvested acreage were selected and an additional thirty counties were chosen at
random. This provides a total of 180 crop/county combinations, each containing 30
observations. The results in the second column of table 2 (Comparison I) show that, once
again, SEMIPAR was ranked highest, followed closely by GAMMA. Surprisingly, the
NORMAL model was not chosen for any crop/county combination.
The forecasting performance of each model is relative, as columns three through
five (Comparisons II-IV) demonstrate. When considering all six models, NORMAL is
never chosen, but when compared only to SEMIPAR it is chosen almost half of the time.
Moreover, when only SEMIPAR and GAMMA are compared, SEMIPAR is second best.
To see why some models fared better than others, figure 1 shows the predicted yield
density for 1969 and 1989 wheat in Texas County, Oklahoma. Note how dramatically a
model’s density can change across periods. In 1969, the GAMMA density has very fat
tails, while in 1989 the tails are much smaller, and SEMIPAR changes from near bimodality to a much more uniform shape.
Figure 1 also provides a clear picture why GAMMA and SEMIPAR forecasted
well and models like MULTIHS and NORMAL did not. In 1969, it is clear the GAMMA
model is more consistent with the realized yield, while in 1989 SEMIPAR dominates. In
both instances the distributions exhibit fatter tails than the lower-ranking models. The
tails of MULTIHS and NORMAL predict an almost zero chance of observing the
yields that actually occurred in 1969 and 1989. It is interesting to note that BETA did not
perform well partly because the out-of-sample forecasts of its parameters (α and δ) were
sometimes negative, making the OSLLF function indeterminate. Also interesting is the
fact that the densities of STOCHIHS and MULTIHS frequently diverge, even though their
formulations are similar. These results reemphasize the point made by Just and Weninger
that changes in the way in which mean yield and heteroskedasticity are modeled can
drastically change the predicted yield density.
What features of a model make for better yield descriptions? To address this, we
attempt to determine what makes SEMIPAR perform better than NORMAL. Both use
similar functions for mean yield, but NORMAL accounts for heteroskedasticity by
assuming that yield variance follows a polynomial time trend whereas SEMIPAR assumes
that proportional deviations are homoskedastic. Also, NORMAL assumes normality
while SEMIPAR makes no distributional assumption. To determine the role of the
normality assumption, we replace the heteroskedastic normal distribution in NORMAL
with a kernel smoother. Deviations from mean yield were modeled with the same smoother
as SEMIPAR, except that the kernel is applied to nominal deviations instead of the
proportional deviations used in SEMIPAR.2 Column six (Comparison V) of table 2 shows that this new
model outperforms NORMAL, increasing its ranking from 0% to 13%. SEMIPAR still
dominates, suggesting that both the nonparametric distribution and its treatment of
heteroskedasticity play a role in SEMIPAR's high ranking.
Estimating a constant variance normal distribution also results in better
performance than NORMAL (see Comparison VI of table 2), suggesting that the
polynomial estimate of variance does not extend well to out-of-sample yields. Overall, it
appears that SEMIPAR performs well for two reasons: proportional deviations from mean
yields exhibit less distributional variation, and the semi-parametric form is highly
flexible.
The rankings in table 2 contain both interpolative and extrapolative forecasts.
Yield forecasts for 1967-1987 were interpolative because the estimation
sample always contained data before 1967 and after 1987. Conversely, forecasts for yields
before 1967 and after 1987 are extrapolative forecasts, requiring the use of time trend
values outside the range used in model estimation. Researchers in crop insurance and
farm management may be more interested in the extrapolative forecasts than interpolative
forecasts. Some readers may even contend that the true definition of a forecast is an
extrapolative forecast that uses past observations to predict the future. For this reason,
model rankings were also conducted separately for interpolation and extrapolation
forecasts and are shown in table 3. Rankings for the interpolative forecasts used the years
1967 through 1987, while the extrapolative forecast rankings used only the last five years
(1988-1992). Again, SEMIPAR is chosen most frequently across the 180 crop/county
combinations, regardless of whether the forecast is interpolative or extrapolative. The
dominance of SEMIPAR is less pronounced in the extrapolative forecasts, where it is followed
closely by GAMMA and STOCHIHS.
Overall, the results support the use of the SEMIPAR model for yield predictions.
This model tends to be ranked highest by the OSLLF criterion in both interpolation and
extrapolation forecasts. Although GAMMA is ranked higher when only SEMIPAR and
GAMMA are compared, SEMIPAR is preferred. The reason is that the performance of
GAMMA is sensitive to the maximum yield value chosen.3 When conducting real
forecasts, it is difficult to identify the maximum yield needed to implement the GAMMA
model. Maximum likelihood estimates are of little help because they place the upper
bound near the maximum observed yield, which is a poor estimate when extrapolating
yields. The forecasting contest conducted in this study set the upper bound with ex ante
knowledge it would not be exceeded. In real-time forecasting such conveniences are not
possible.
Concluding Remarks
Recently, there has been considerable debate over the appropriate model to characterize
yield distributions. In this study, we proposed a new method for comparing crop yield
models by their out-of-sample log-likelihood function values. We believe this approach
has several advantages over more conventional methods of yield model selection.
Namely, our approach evaluates models by their out-of-sample performance, which
mirrors the way most practitioners utilize such models, and our approach is able to
characterize the ability of models to describe the entire distribution of yields, not just the
mean.
Applying our model selection criterion to six popular yield models appearing in
the literature, we find that models assuming normality were consistently outperformed by
competing models. This finding does not imply that normality should be rejected
outright. Rather, it may imply that more consideration should be given to the
formulation of mean yield and yield variance, as suggested by Just and Weninger.
Although Just and Weninger also suggest combining normality with flexible polynomial
time trends for yield variance, we find that a homoskedastic normal model forecasts
better. Overall, results of the model
comparisons indicate that a semi-parametric model proposed by Goodwin and Ker tends
to outperform others.
It is interesting that the best forecasting model in this analysis is also the model
that involves the least optimization. Although the Goodwin and Ker model minimizes
squared-error for the mean equation, no maximum likelihood optimization is performed
on the yield density itself. Unless yields tend to be bimodal, it would seem that the use
of flexible parametric models, such as those evaluated in this study, would provide higher
in-sample log-likelihood function values. However, as Kastens and Brester note (page 310),
“reducing in-sample fit may often enhance out-of-sample fit.” Ideally, a balance should
be struck between in-sample and out-of-sample fit. Most studies concentrate solely on
the former, while we focus solely on the latter. Further research might focus on how to
combine the two approaches for a more complete measure of model performance.
FOOTNOTES
1. The average OSLLF values for 1956-1989 are: GAMMA = -11.04, BETA = -13.66,
STOCHIHS = -8.96, MULTIHS = -6.63, SEMIPAR = -8.91, NORMAL = -10.63.
2. If Yt is the realized yield and Ŷt is the mean yield given by a polynomial time
trend, the OSLLF value for the last forecast (t = 30) is
$$\mathrm{OSLLF}_{t=30} = \ln\left[\frac{1}{25h}\sum_{k=1}^{25} K\!\left(\frac{(Y_{30,\mathrm{out}} - \hat{Y}_{30,\mathrm{out}}) - (Y_{k,\mathrm{in}} - \hat{Y}_{k,\mathrm{in}})}{h}\right)\right],$$
where K(·) is the kernel function and h is the bandwidth.
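This footnote formula can be sketched in code. The Gaussian kernel below is our assumption (the paper does not specify the kernel here), and the function name is ours.

```python
import math

# Minimal sketch of the footnote's formula, assuming a Gaussian kernel: the
# out-of-sample log-likelihood of the last forecast is the log of a kernel
# density estimate, built from the 25 in-sample residuals, evaluated at the
# out-of-sample residual.
def osllf_last_forecast(y_out, yhat_out, y_in, yhat_in, h):
    resid_out = y_out - yhat_out                             # out-of-sample residual
    resid_in = [y - yhat for y, yhat in zip(y_in, yhat_in)]  # in-sample residuals
    total = sum(math.exp(-0.5 * ((resid_out - r) / h) ** 2) / math.sqrt(2 * math.pi)
                for r in resid_in)                           # sum of K((e_out - e_k)/h)
    return math.log(total / (len(resid_in) * h))
```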
3.
In the original Gallagher article, the maximum attainable yield for GAMMA was
modeled as a time trend which can extrapolate into the future. In earlier drafts of this
paper this time trend was used for maximum yield, but GAMMA was rarely chosen
because yields often exceeded the maximum yield, causing the OSLLF value to hit
negative infinity. We thank Wade Brorsen and an anonymous reviewer for pointing out
that validation almost forces this to occur, making it an unfair comparison.
REFERENCES
Akaike, H. “Information Theory and an Extension of the Maximum Likelihood
Principle.” Proceedings of the 2nd International Symposium on Information
Theory, ed. B. N. Petrov and F. Csaki, pp. 267-81. Budapest: Akademiai Kiado, 1972.
Brandt, J. A. and D. A. Bessler. “Price Forecasting and Evaluation: An
Application in Agriculture.” Journal of Forecasting 2(1983):237-48.
Day, R. “Probability Distributions of Field Crop Yields.” Journal of Farm
Economics 47(1965):713-41.
Gallagher, P. “U.S. Soybean Yields: Estimation and Forecasting With Nonsymmetric
Disturbances.” American Journal of Agricultural Economics 69(1987):796-803.
Goodwin, B. and A. P. Ker. “Nonparametric Estimation of Crop Yield
Distributions: Implications for Rating Group-Risk Crop Insurance Contracts.”
American Journal of Agricultural Economics 80(1998):139-53.
Just, R. E. and Q. Weninger. “Are Crop Yields Normally Distributed?”
American Journal of Agricultural Economics 81(1999):287-304.
Kastens, T. L., and G. W. Brester. “Model Selection and Forecasting Ability of Theory-
Constrained Food Demand Systems.” American Journal of Agricultural
Economics 78(1996):301-12.
Kullback, S. and R. A. Leibler. “On Information and Sufficiency.” Annals of
Mathematical Statistics 22(1951):79-96.
Miller, S. E., K. H. Kahl, and P. J. Rathwell. “Revenue Insurance for
Georgia and South Carolina Peaches.” Journal of Agricultural and Applied
Economics 32(2000):123-32.
Moss, C. B. and J.S. Shonkwiler. “Estimating Yield Distributions With a Stochastic
Trend and Nonnormal Errors.” American Journal of Agricultural Economics
75(1993):1056-62.
Nelson, C. H. and P. V. Preckel. “The Conditional Beta Distribution As a Stochastic
Production Function.” American Journal of Agricultural Economics
71(1989):370-77.
Norwood, B., J. Lusk, and M. C. Roberts. “A Comparison of Crop Yield
Distribution Selection Criteria.” Presented at the 2002 Agricultural Economics
Southern Meetings in Orlando, Florida. February 2-6, 2002.
Norwood, B., P. Ferrier, and J. Lusk. “Model Selection Using Likelihood
Functions and Out-of-Sample Performance.” Proceedings of the NCR-134
Conference of Applied Commodity Price Analysis, Forecasting, and Market Risk
Management, 2001.
Norwood, B. and T. C. Schroeder. “Usefulness of Placement Weight Data in
Forecasting Fed Cattle Prices.” Journal of Agricultural and Applied Economics
32(2000):63-72.
Ramirez, O. A. “Estimation and Use of a Multivariate Parametric Model for
Simulating Heteroskedastic, Correlated, Nonnormal Random Variables: The
Case of Corn Belt Corn, Soybean, and Wheat Yields.” American Journal of
Agricultural Economics 79(1997):191-205.
Sawa, T. “Information Criteria for Discriminating Among Alternative
Regression Models.” Econometrica 46(1978):1273-91.
Shao, J. “Linear Model Selection by Cross-Validation.” Journal of the American
Statistical Association 88:422(1993):486-494.
Stone, M. “An Asymptotic Equivalence of Choice of Model by Cross-Validation and
Akaike’s Criterion.” Journal of the Royal Statistical Society. Series B
(Methodological) 39:1(1977a):44-47.
_______. “Asymptotics For and Against Cross-Validation.” Biometrika 64:1
(1977b):29-35.
Zhang, P. “On the Distributional Properties of Model Selection Criteria.” Journal of
the American Statistical Association 87:419(1992):732-737.
_______. “Model Selection Via Multifold Cross Validation.” The Annals of
Statistics 21:1(1993):299-313.
Table 1. Model Ranking Results for Corn, Soybean and Wheat Yields in the Cornbelt:
OSLLF Summary Statistics for Multivariate Estimation

            Out-of-Sample Log-     Average    Median    Minimum     Maximum
            Likelihood Function    OSLLF      OSLLF     OSLLF       OSLLF
Model       (OSLLF)                Value^a    Value     Value       Value
GAMMA            -433              -10.82      -9.82      -18.46      -9.07
BETA             -527              -13.18      -9.54      -58.52      -7.57
STOCHIHS         -363               -9.08      -8.67      -14.43      -6.69
MULTIHS         -3622              -90.56      -6.40    -1716.50      -4.10
SEMIPAR          -356               -8.88      -8.70      -14.72      -7.39
NORMAL           -396               -9.91      -8.24      -38.60      -7.02

Note: These are multivariate models, meaning the OSLLF values correspond to the
probability of the corn, soybean and wheat yields being realized simultaneously. The
data used here can be found in the Appendix of Ramirez.
^a There were forty total forecasts for each model.
Table 2. Model Rankings for 180 Crop and County Combinations

                          Percent of Times Model Is Ranked Highest^a According to
                          Mean OSLLF Value Across Different Candidate Model Comparisons
Model                        I        II       III      IV       V        VI
GAMMA                       29%      54%      ----     85%      26%      26%
BETA                        12%      ----     ----     ----      8%       8%
STOCHIHS                    17%      ----     ----     ----     13%      14%
MULTIHS                      5%      ----     ----     ----      4%       5%
SEMIPAR                     38%      46%      59%      ----     36%      36%
NORMAL                       0%      ----     41%      15%      ----     ----
HOMOSKEDASTIC
  KERNEL SMOOTHER           ----^b   ----     ----     ----     13%      ----
HOMOSKEDASTIC NORMAL        ----     ----     ----     ----     ----     11%

^a The model with the largest OSLLF value is ranked highest.
^b Model was not included in the ranking contest.
Table 3. Model Rankings for 180 Crop and County Combinations,
Separated by Interpolation and Extrapolation Forecasts

                          Percent of Times Model Is Ranked Highest^a According to
                          Mean OSLLF Value
                          Interpolative Forecasts^b    Extrapolative Forecasts^c
Model                        I        II       III        I        II       III
GAMMA                       13%      11%      11%        24%      24%      26%
BETA                        21%      19%      19%         6%       5%       5%
STOCHIHS                     3%       3%       3%        25%      18%      23%
MULTIHS                      3%       4%       4%        12%      11%      11%
SEMIPAR                     55%      54%      54%        27%      27%      27%
NORMAL                       5%      ----     ----        6%      ----     ----
HOMOSKEDASTIC
  KERNEL SMOOTHER           ----^d    8%      ----       ----     16%      ----
HOMOSKEDASTIC NORMAL        ----     ----      9%        ----     ----      8%

^a The model with the largest OSLLF value is ranked highest.
^b Forecasts of yields from 1967 through 1987.
^c Forecasts of yields from 1988 through 1992.
^d Model was not included in the ranking contest.
Figure 1. Forecasted distributions for wheat yields in Texas County, Oklahoma.
[Two panels of forecasted densities, each showing the GAMMA, BETA, STOCHIHS,
MULTIHS, SEMIPAR, and NORMAL models: 1969 yields (actual yield was 26.5) and
1989 yields (actual yield was 24.1 bushels).]