Ranking Crop Yield Models Using Out-of-Sample Likelihood Functions
Authors:
Bailey Norwood, Assistant Professor, Department of
Agricultural Economics, Oklahoma State University.
Matthew C. Roberts, Assistant Professor, Department of
Agricultural, Environmental, and Development Economics,
The Ohio State University.
Jayson Lusk, Associate Professor, Department of
Agricultural Economics, Purdue University.
Contact:
Bailey Norwood
426 Agricultural Hall
Stillwater, Oklahoma 74078
405-744-9820
Email: [email protected]
Final Submission: American Journal of Agricultural Economics
Date: December 18, 2003
Accurate knowledge of crop yield behavior is critical for devising farm
management tools, farm policy and crop insurance. Knowledge of the likelihood and
severity of yield shortfalls, for example, is necessary for government programs to respond
with the appropriate policy. However, crop yields can be extremely variable from year to
year, perhaps much more than the output of non-agricultural firms. Although
understanding the stochastic nature of crop yields is important, characterizing yield
distributions can be quite difficult.
How are crop yields distributed? The literature is replete with candidate yield
distributions, few of which can be excluded on theoretical grounds. This has led
agricultural economists to seek empirical evidence as to which models are best. Previous
studies have relied almost exclusively on hypothesis tests for model discrimination.
While hypothesis tests are useful in many settings, relying only on in-sample fit does not
provide information on how well models extrapolate outside the range of data from
which they were estimated.
Yield distributions are most commonly used for forecasting. Historical yield
distributions are used to set crop insurance premiums based upon the assumption that the
following year’s realization is drawn from the same distribution. Farm policy studies
often make similar assumptions. Because of the importance of forecasting crop yields,
yield distributions should be chosen at least partially on out-of-sample performance.
Although there exists a rich body of literature describing how to evaluate point forecasts,
less work has focused on how to evaluate the predictive ability of an entire distribution.
The purpose of this study is to develop a method for evaluating crop yield models
by determining how well they describe the distribution of out-of-sample yields. We show
that predicted yield densities can be ranked by their log-likelihood function values
observed at out-of-sample observations. This method of model evaluation is then applied
to six popular yield models recently developed in the literature. The six models are pitted
against one another in a forecasting contest. We find that a model with a kernel smoother
applied to percent deviations from the mean outperforms the competing models in
describing out-of-sample yields.
Evaluating Predicted Yield Densities
Ranking yield models by their out-of-sample performance is worthwhile for two reasons.
First, sample sizes are typically low, making it easy to over-fit models. Second, yield
models are frequently used for making probability statements about future yields, and as
such, it seems natural to rank models by their ability to describe out-of-sample yields.
The most common method of ranking models by out-of-sample performance is prediction
error. Prediction error measures the distance between predicted yield and actual yield for
a series of out-of-sample yields. Common prediction error statistics are the out-of-sample root-mean-squared error; the average absolute out-of-sample error; and the Ashley,
Granger, and Schmalensee test (Brandt and Bessler; Kastens and Brester; Norwood and
Schroeder).
Although useful in many instances, relying solely on prediction error for model
selection leaves much to be desired. Prediction error alone does not account for how well
a model captures variance, skewness, kurtosis and probabilities in general. This issue is
of particular interest in characterizing yield distributions because insurance policies, for
example, pay out when actual yields fall below a particular threshold, which is often quite
different from the mean yield level. If the purpose of yield distributions is to generate
probability statements over the entire distribution, we must consider more than prediction
error. In particular, we contend that a model should perform well in describing
forecasted probability statements relative to observed yields. Such a feat can be
accomplished by analyzing a model’s likelihood function value observed at out-of-sample observations.
The Out-of-Sample-Log-Likelihood Function (OSLLF) Approach
Many model selection criteria are available for comparing models’ likelihood functions.
Virtually all model selection criteria are based on the Kullback-Leibler Information
Criterion (KLIC), which measures the distance between a hypothesized likelihood
function and the true likelihood function. The KLIC was originally derived from Bayes’
Rule as follows. Let g(Y|θg) and f(Y|θf) be the likelihood functions of two competing
models, where θf and θg are parameter vectors and Y = [yt=1, ... yt=T]'. Using Pr{.} to
denote probabilities, according to Bayes’ Rule the probability that g(.) is the correct
model given the observed data is
(1) Pr{g(.) | Y} = Pr{Y | g(.)}Pr{g(.)} / Pr{Y}.
By definition, Pr{Y | g(.)} = g(Y | θ g ) . Similarly, the probability that f(.) is correct, given
Y, is
(2) Pr{f(.) | Y} = Pr{Y | f(.)}Pr{f(.)} / Pr{Y}.
If we assume that, prior to observing Y, each model has an equal probability of being
correct, i.e. Pr{f (.)} = Pr{g(.)} , the odds ratio for the competing models is
(3) Pr{f(.) | Y} / Pr{g(.) | Y} = Pr{Y | f(.)} / Pr{Y | g(.)} = f(Y | θf) / g(Y | θg).
Model f(.) would be chosen over model g(.) if the value of (3) is greater than one.
In order for (3) to be statistically valid, the likelihood functions must share the variable of
integration. For example, if h(Y | θ1), Y > 0, is a normal density, then h(ln(Y) | θ2)Y^{-1} is a
log-normal density. The odds ratio for the two densities must then be stated as
h(Y | θ1) / [h(ln(Y) | θ2)Y^{-1}].
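This change-of-variables requirement is easy to verify numerically. The Python sketch below, with arbitrary illustrative parameter values of our own, shows that a normal density evaluated at ln(Y) does not integrate to one over Y until the Jacobian Y^{-1} is attached:

```python
import math
import numpy as np

def normal_pdf(x, mu, sigma):
    # Density of a normal random variable with mean mu and std dev sigma.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Grid over positive yields (all parameter values are illustrative).
y = np.linspace(1e-6, 50.0, 200001)
dy = y[1] - y[0]

dens_normal = normal_pdf(y, 25.0, 3.0)                    # h(Y | theta1)
dens_no_jac = normal_pdf(np.log(y), math.log(25.0), 0.2)  # h(ln Y | theta2), no Jacobian
dens_lognormal = dens_no_jac / y                          # h(ln Y | theta2) * Y^-1

total_normal = (dens_normal * dy).sum()
total_no_jac = (dens_no_jac * dy).sum()
total_lognormal = (dens_lognormal * dy).sum()
```

Only densities stated in the same variable of integration can be compared in (3); without the Y^{-1} term the second "density" accumulates far more than unit mass.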
Notice that the greater the value of (3), the more information the model f(Y|θf)
contains relative to g(Y|θg). Kullback and Leibler convert the odds ratio from a relative
measure of model performance to an absolute measure by assuming g(Y|θg) is the true
data generating process. After taking the logarithm and the expectation of (3) with
respect to the true process, the information content of f(Y|θf) can now be written as
(4) I(f(.) | Y) = ∫_{-∞}^{∞} [ln g(Y | θg) - ln f(Y | θf)] g(Y | θg) dY.
By this measure, the best model is that which minimizes (4). Since g(Y|θg) is the
true data generating process, its expected log-likelihood will always be at least as large as
that of f(Y|θf), and the model with the
largest expected likelihood function, shown in (5), minimizes the KLIC (Kullback and
Leibler).
(5) I(f(.) | Y) = ∫_{-∞}^{∞} ln[f(Y | θf)] g(Y | θg) dY.
Because the true process g(.) is unknown, (5) cannot be directly calculated. If the
observations y1, ..., yT in Y are independently and identically distributed, we can estimate
(5) by estimating the parameter vector θf and calculating the log-likelihood function as
(6) Î(f(.) | Y) = Σ_{t=1}^{T} ln[f(yt | θ̂f,t)]
where θ̂f,t is the parameter estimate used to calculate f(.) at observation t. At first glance,
it might appear that θf could be estimated by maximum likelihood using observations t =
1, ..., T. However, such an approach would lead to an upward biased estimator of the true
KLIC in (5), where this bias is increasing with the number of parameters in θf (Akaike;
Sawa).
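The upward bias of the in-sample log-likelihood is simple to reproduce by simulation. The sketch below, which uses a plain normal model with made-up parameter values rather than crop data, fits by maximum likelihood on one sample and then evaluates the fitted model on both that sample and an independent one:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def norm_loglik(x, mu, sigma):
    # Log-likelihood of x under a normal(mu, sigma) model.
    return np.sum(-0.5 * np.log(2 * math.pi * sigma**2)
                  - (x - mu) ** 2 / (2 * sigma**2))

in_sample, out_sample = [], []
for _ in range(500):
    train = rng.normal(10.0, 2.0, size=20)   # sample used for estimation
    fresh = rng.normal(10.0, 2.0, size=20)   # independent sample, same process
    mu_hat, sig_hat = train.mean(), train.std()  # normal MLEs
    in_sample.append(norm_loglik(train, mu_hat, sig_hat))
    out_sample.append(norm_loglik(fresh, mu_hat, sig_hat))

# Positive on average: the in-sample value overstates predictive fit.
bias = np.mean(in_sample) - np.mean(out_sample)
```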
An alternative approach is to estimate the KLIC by calculating the log-likelihood
function using out-of-sample observations.
(7) Î(f(.) | Y) = Σ_{t=1}^{T} ln[f(yt | θ̂f,-t)]
In (7), θ̂f,-t is the parameter vector estimated without the tth observation. Thus, we refer
to (7) as an out-of-sample-log-likelihood function (OSLLF). The OSLLF value can be
calculated several ways. The estimate θ̂f,-t could be calculated using cross-validation,
where θ̂f,-t is estimated using every observation except t. This is often referred to as
“leave one out at a time forecasting.” Alternatively, we could partition the observations
into groups where each group is iteratively omitted and θ̂f,-t is estimated. Then, the
omitted group of observations can be used to calculate the OSLLF. This procedure is
referred to as grouped-cross-validation. Grouped-cross-validation where the number of
partitions equals the number of observations is simply cross-validation. The greater the
number of partitions, the more parsimonious will be the chosen model (Shao). The
OSLLF is an asymptotically unbiased estimator of the KLIC with the proper number of
partitions (Stone 1977a and 1977b; Shao; Zhang 1992 and 1993).
Because the optimal number of partitions is a function of the true data generating
process, it is impossible to identify exactly (Zhang 1993). Without knowledge of the true
data generating process, it is always questionable which model selection criterion is best.
This fact is true of all model selection criteria, not just the OSLLF criterion. However,
some insight is provided through simulation studies. Zhang suggests using grouped-cross-validation with no fewer than five partitions. In subsequent analysis, we carry
out grouped-cross-validation with at least five partitions, where each partition contains at
least five observations.
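As a concrete illustration of the procedure (not the paper's own code), the sketch below computes a grouped-cross-validation OSLLF for a simple candidate model, a homoskedastic normal with a linear time trend, applied to synthetic yields:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def osllf(y, n_groups=6):
    """Grouped-cross-validation OSLLF for a normal linear-trend model."""
    n = len(y)
    t = np.arange(n, dtype=float)
    total = 0.0
    for held_out in np.array_split(np.arange(n), n_groups):
        keep = np.setdiff1d(np.arange(n), held_out)
        # Fit trend and error variance without the held-out group.
        b1, b0 = np.polyfit(t[keep], y[keep], 1)
        sig2 = (y[keep] - (b0 + b1 * t[keep])).var()
        # Log density of each held-out yield under the fitted model.
        mu = b0 + b1 * t[held_out]
        total += np.sum(-0.5 * np.log(2 * math.pi * sig2)
                        - (y[held_out] - mu) ** 2 / (2 * sig2))
    return total

# Thirty years of synthetic yields with a rising mean.
y = 100 + 1.5 * np.arange(30) + rng.normal(0, 5, 30)
score = osllf(y)
```

Competing yield models would each be scored this way on the same partitions, and the model with the largest OSLLF preferred.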
In the statistical literature, the term validation usually refers to log-likelihood
functions, whereas in the agricultural economics literature validation often refers to
prediction error. To prevent confusion, we use validation to refer to cross- and grouped-cross-validation, and refer to the calculation of log-likelihood functions using validation
as the Out-of-Sample-Log-Likelihood function (OSLLF) approach.
The OSLLF criterion has the asymptotic property that it chooses the model with
the highest information content, as measured by the Kullback-Leibler Information
Criterion. While such asymptotic properties are desirable, due to the nature of existing yield
samples, we are also interested in the small sample properties of the OSLLF criterion. To
explore the small sample properties of the OSLLF criterion, Norwood, Lusk, and Roberts
and Norwood, Ferrier, and Lusk performed simulation studies to provide a comparison
between the OSLLF and other model selection criteria in small samples. The simulations
were designed to mimic crop yield distributions. The results showed that the OSLLF
criterion picked the true yield distribution with a higher frequency than other popular
methods such as the Akaike Information Criterion, out-of-sample-root-mean squared
error, Chi-Square statistic, Kolmogorov-Smirnov statistic, and the Anderson-Darling
statistic.
Interpolative and Extrapolative Forecasts
Yield models are sometimes used for describing the past behavior of yields, but are more
frequently used for predicting future yields. A common feature of most yield models is
that the yield distribution is described as dependent on a time trend. Thus, forecasting
future yields requires extrapolation outside the data. Econometricians usually caution
against such extrapolations, but extrapolation is the major reason yield distributions are
estimated. For this reason, we make no apologies for ranking models outside the range of
data used to estimate them. In fact, we embrace this approach by making extrapolation
an important component of model selection.
Validation of yield models that possess time trends requires interpolation and
extrapolation. Suppose that our sample contained the years 1970-1999, and validation is
performed by partitioning the data into six groups of five observations each. When
validation is used to forecast yields from 1975-1994, such forecasts are interpolative
since the time trend value at each forecast is within the range used for estimation.
Forecasts of yields in the periods 1970-1974 and 1995-1999 are extrapolative forecasts
since the time trend values are outside the range used for estimation. We believe that
interpolative and extrapolative forecasts provide different kinds of information, and it is
the extrapolative forecasts that make the OSLLF criterion especially appealing for yield
densities.
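The split between the two kinds of forecasts is mechanical; a short sketch using the hypothetical 1970-1999 sample and six groups of five years confirms which years fall in each category:

```python
import numpy as np

years = np.arange(1970, 2000)        # thirty years, 1970-1999
groups = np.array_split(years, 6)    # six validation groups of five years

interp, extrap = [], []
for g in groups:
    rest = np.setdiff1d(years, g)    # estimation sample when g is held out
    for yr in g:
        # Interpolative only if the trend value lies inside the range
        # of trend values used for estimation.
        if rest.min() <= yr <= rest.max():
            interp.append(int(yr))
        else:
            extrap.append(int(yr))
```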
In large samples, there is no difference between validation and other likelihood-based criteria such as the Akaike, Bayesian, Final Prediction, and Shibata criteria (Stone
1977a; Zhang). However, these other criteria utilize only in-sample observations where
no extrapolations are made. Obtaining information on the ability of yield models to
provide accurate probability statements outside the range of data is crucial for selecting
models best suited for prescribing risk management strategies.
Candidate Yield Models
It has long been recognized that yields exhibit complex behavior. Mean yield may or
may not be increasing over time. The same can be said for yield variance. For U.S. corn
yields, the mean and variance are increasing over time. However, Miller, Kahl, and
Rathwell find that South Carolina and Georgia peach yield distributions exhibit a
constant mean and variance over this same time period. In addition to the complex
possibilities of the behavior of mean and variance throughout time, yields may be
negatively skewed, positively skewed or symmetric (Day) and may even exhibit
bipolarity (Goodwin and Ker). In response, agricultural economists have developed a
myriad of flexible models that allow the shape of the distribution to be driven by
available data. For the purposes of this analysis, six yield models published between
1980 and 2000 were drawn from the American Journal of Agricultural Economics. These
models are discussed below.
GAMMA Model
The first model is a frontier model developed by Gallagher, in which the maximum
attainable yield is described by YM = γ0 + γ1t where t is an annual time trend. Deviations
from this yield frontier are
(8) εt = γ0 + γ1t - Yt.
Notice that the error εt is always positive, making it suitable for a gamma distribution. If
we substitute (8) as the random variable in the gamma probability density function, the
yield pdf is
(9) f(Yt) = [β^α Γ(α)]^{-1} [γ0 + γ1t - Yt]^{α-1} exp{-β^{-1}[γ0 + γ1t - Yt]},
where the Greek symbols are parameters to be estimated. The gamma distribution is flexible
in that it can be nearly symmetric or right-skewed, which in this application would make yield
left-skewed. To account for heteroskedasticity, Gallagher used OLS residuals to
estimate error variance as a function of a time trend. He then created a time-specific
index for yield variance, denoted VSt, and entered this index into the pdf as
(10) f(Yt) = [β^α Γ(α)]^{-1} [γ0 + γ1t - Yt]^{α-1} VSt^{1-α} exp{-β^{-1} VSt^{-1} [γ0 + γ1t - Yt]}.
For this study, this model is modified in two ways. First, the time trend for the
yield frontier is replaced with a single value for maximum yield. This maximum yield
(YM) is the largest observed yield (in- and out-of-sample) multiplied by 1.2, which is
unlikely to be exceeded within forecast horizons of less than five years. This
modification is made because maximum likelihood estimates tend to place the maximum
attainable yield close to the largest in-sample observation. This almost inevitably results
in a likelihood of zero for some observation during the validation procedure. Second, to
allow the yield distribution to evolve over time, the parameters α and β are made
conditional on a time trend
(11) α̃ = α1 + α2t, β̃ = β1 + β2t.
This specification allows heteroskedasticity, so no weighting of the observations by VSt
is needed. The yield model analyzed in this study, referred to as GAMMA, is
(12) f(Yt) = [β̃^α̃ Γ(α̃)]^{-1} [YM - Yt]^{α̃-1} exp{-β̃^{-1}[YM - Yt]}.
This distribution can exhibit a negative skew but not a positive skew. To allow a positive
skew, another version of the model is created where Yt is substituted for YM - Yt. The
version with the highest [in-sample] likelihood function is selected for each estimation.
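A direct transcription of the modified model into code makes the pieces concrete. The sketch below evaluates the GAMMA log density of (12) at parameter values we invented for illustration; only the 20% ceiling rule comes from the text:

```python
import math
import numpy as np

def gamma_yield_logpdf(y, t, a1, a2, b1, b2, y_max):
    """Log density of the GAMMA model: the shortfall y_max - y is gamma
    distributed with shape and scale that drift linearly in the trend t."""
    alpha = a1 + a2 * t          # time-varying shape, alpha-tilde in (11)
    beta = b1 + b2 * t           # time-varying scale, beta-tilde in (11)
    eps = y_max - y              # positive shortfall from the yield ceiling
    if eps <= 0 or alpha <= 0 or beta <= 0:
        return -np.inf           # outside the support
    return (-(alpha * math.log(beta) + math.lgamma(alpha))
            + (alpha - 1) * math.log(eps) - eps / beta)

# Ceiling set 20% above the largest observed yield, as in the modified model.
y_max = 1.2 * 160.0
ll = gamma_yield_logpdf(140.0, t=10, a1=2.0, a2=0.05, b1=8.0, b2=0.1, y_max=y_max)
```

The hard ceiling is what motivates the 20% buffer: any validation observation above y_max would receive a likelihood of zero, i.e., a log-likelihood of negative infinity.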
BETA
Nelson and Preckel also utilized the concept of a maximum attainable yield, which they
estimated by maximum likelihood. Deviations of yield from its maximum value were
then modeled as a conditional beta distribution. This model is referred to as the BETA
model. In its original form, the BETA model conditioned the two beta parameters on
agricultural inputs, such as nitrogen use and soil characteristics. To facilitate comparison
with the other five models, these parameters are instead conditioned on a time trend.
This specification naturally allows heteroskedasticity and a time-dependent mean yield.
If YM denotes the yield ceiling, the pdf for the BETA model can be written as
(13) f(Yt) = Yt^{α-1} (YM - Yt)^{δ-1} / [B(α, δ)(YM)^{α+δ-1}],
where B(.) is the beta function. Nelson and Preckel originally specified α and δ as Cobb-Douglas functions of farm inputs. We replaced the farm inputs with a time trend, but
found the Cobb-Douglas form made convergence very difficult. Therefore, we specified
α and δ as linear time trends, eliminating the convergence difficulties.
(14) α = θ1 + ω1t, δ = θ2 + ω2t.
As with GAMMA, YM in the BETA model is set 20% above the highest historical yield.
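The BETA density in (13)-(14) can be checked for internal coherence by confirming that it integrates to one over (0, YM). The sketch below uses placeholder coefficients of our own choosing:

```python
import math

def beta_yield_pdf(y, t, th1, om1, th2, om2, y_max):
    """Density of the BETA model, eq. (13), with alpha and delta linear
    in the time trend as in eq. (14).  Coefficients are illustrative."""
    alpha = th1 + om1 * t
    delta = th2 + om2 * t
    if not 0 < y < y_max or alpha <= 0 or delta <= 0:
        return 0.0
    # ln B(alpha, delta) via log-gamma functions.
    log_b = math.lgamma(alpha) + math.lgamma(delta) - math.lgamma(alpha + delta)
    log_pdf = ((alpha - 1) * math.log(y) + (delta - 1) * math.log(y_max - y)
               - log_b - (alpha + delta - 1) * math.log(y_max))
    return math.exp(log_pdf)

# Midpoint-rule integral over (0, y_max) should be close to one.
y_max = 1.2 * 160.0
n = 20000
step = y_max / n
total = sum(beta_yield_pdf((k + 0.5) * step, 10, 4.0, 0.05, 2.0, 0.01, y_max) * step
            for k in range(n))
```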
STOCHIHS
Moss and Shonkwiler modeled mean yield as a linear time trend, but allowed the
parameters of this trend to be random according to a Kalman Filter. The Kalman Filter
allows maximum likelihood estimation to determine the degree of randomness in the
parameters. The equations governing mean yield are described as
(15) Yt = µt + εt, µt = µt-1 + βt-1 + ηt, βt = βt-1 + ζt.
Notice that this specification allows the intercept and/or the slope of the linear time trend
to vary across time. Estimation of (15) returns values of µt=0 and βt=0 and the variances of
ηt and ζt, denoted ση² and σζ², respectively. Deviations of yield from its time trend, εt, are
assumed to follow an inverse hyperbolic sine transformation of normality, achieved by
constructing a new variable vt which equals
(16) vt = ln{θεt + (θ²εt² + 1)^{1/2}} / θ - δ.
The distribution of vt is assumed normal with a zero mean. Maximum likelihood
estimation provides estimates of θ, δ, µ0, β0, σv², ση², and σζ², where σv² is the
variance of vt. While vt is assumed normal, actual yields may exhibit non-normality by
the selection of θ (for skewness) and δ (for kurtosis). Owing to the use of the Kalman
filter, the likelihood function is rather complex. For the sake of brevity, the likelihood
function is not reproduced here, but is available in the original article.
This model is referred to as STOCHIHS where the STOCH label designates yields
to be a function of stochastic parameters and the IHS label indicates the use of an inverse
hyperbolic sine function. The STOCHIHS model imposes homoskedasticity to maintain
tractability in the Kalman filter. This model is difficult to employ in validation because
the parameters µt and βt are dependent on µt-1 and βt-1. When using validation where
groups of observations are removed, the values µt-1 and βt-1 may not be known. Because
the expected value of ηt and ζt is zero, we replace µt-1 and βt-1 with µt-6 and βt-6 for that
observation. All other observations remain unchanged.
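To see how the transformation in (16) generates skewness from a normal vt, the sketch below inverts it: draw normal vt, map back to yield deviations, and inspect the skew. The parameter values are arbitrary illustrations, not estimates from the model:

```python
import numpy as np

def ihs(eps, theta, delta):
    # Eq. (16): v = ln(theta*eps + sqrt(theta^2*eps^2 + 1)) / theta - delta.
    return np.log(theta * eps + np.sqrt(theta**2 * eps**2 + 1)) / theta - delta

def ihs_inverse(v, theta, delta):
    # Map a normal v back to a (possibly skewed) yield deviation.
    return np.sinh(theta * (v + delta)) / theta

rng = np.random.default_rng(2)
v = rng.normal(0.0, 1.0, 100000)             # v is assumed normal
eps = ihs_inverse(v, theta=0.5, delta=1.0)   # implied yield deviations

# With delta != 0 the sinh map stretches one tail more than the other.
skew = np.mean((eps - eps.mean()) ** 3) / np.std(eps) ** 3
```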
MULTIHS
Ramirez modified the STOCHIHS model to allow heteroskedasticity by replacing the
stochastic trend with a fixed-parameter trend, in addition to several other creative
reparameterizations. The new specification also allowed correlations between yields of
different crops. This model’s name is MULTIHS, as it is multivariate and utilizes an
inverse hyperbolic sine distribution for the errors. Mean yields for crop i are described as
(17) Yt,i = β 0,i + β1,i t + ε t,i .
A new variable vt,i is created which equals
ln θ i (ε t,i ) + θ i2 ε 2t,i + 1
(18) v t,i = 
θ
1/2

,
just as in STOCHIHS.
The distributions of vt,i are assumed normal with separate means µi and a
covariance matrix Σ. For P different crops, and letting w = [vt,1 vt,2 ... vt,P]', the likelihood function is written as
(19) LF = ∏_{t=1}^{T} (2π)^{-P/2} |Σ|^{-1/2} exp{-0.5(w - µ)'Σ^{-1}(w - µ)} ∏_{i=1}^{P} (1 + [θi(Yt,i - β0,i - β1,it)]²)^{-1/2}.
Ramirez then reparameterizes the model to allow heteroskedasticity where the variances
differ across crop and time. In most models the variance differed each decade. In our
analysis, validation is conducted by removing five observations at a time. Thus, we allowed
the variance to differ each decade so that there are a minimum of five observations with
which to detect variance changes.
The reparameterization proposed by Ramirez results in a likelihood function that
is very difficult to interpret. So that the reader may obtain some intuition into the model,
we provide only the likelihood function before heteroskedasticity is imposed. Like
STOCHIHS, MULTIHS has flexible third and fourth moments, while maintaining
normality as a special case. The major differences between the models are that mean
yields are stochastic in STOCHIHS and deterministic in MULTIHS, whereas MULTIHS
allows heteroskedasticity as well as correlations between crop yields in a given year,
while STOCHIHS does not.
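The pre-reparameterization likelihood in (19) can be evaluated directly. The sketch below does so for synthetic data on three crops; every parameter value is an illustrative stand-in, and no estimation is performed:

```python
import math
import numpy as np

def multihs_loglik(Y, theta, beta0, beta1, mu, sigma):
    """Log of eq. (19): multivariate normal in the IHS-transformed
    deviations, plus the Jacobian of the transform.  Y is T x P."""
    T, P = Y.shape
    t = np.arange(T, dtype=float)[:, None]
    eps = Y - (beta0 + beta1 * t)            # trend deviations, T x P
    v = np.arcsinh(theta * eps) / theta      # IHS transform, eq. (18)
    dev = v - mu
    sigma_inv = np.linalg.inv(sigma)
    _, logdet = np.linalg.slogdet(sigma)
    # Multivariate normal kernel, summed over the T years...
    quad = np.einsum('tp,pq,tq->t', dev, sigma_inv, dev)
    ll_norm = -0.5 * T * (P * math.log(2 * math.pi) + logdet) - 0.5 * quad.sum()
    # ...plus the log Jacobian terms (1 + [theta*eps]^2)^(-1/2).
    ll_jac = -0.5 * np.log1p((theta * eps) ** 2).sum()
    return ll_norm + ll_jac

rng = np.random.default_rng(3)
T, P = 40, 3
theta = np.array([0.1, 0.1, 0.1])
beta0 = np.array([80.0, 30.0, 35.0])
beta1 = np.array([1.0, 0.4, 0.3])
Y = beta0 + beta1 * np.arange(T)[:, None] + rng.normal(0.0, 3.0, (T, P))
ll = multihs_loglik(Y, theta, beta0, beta1, np.zeros(P), 9.0 * np.eye(P))
```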
SEMIPAR
A simpler but still flexible model was offered by Goodwin and Ker. This model, denoted
SEMIPAR for semiparametric, portrays percent deviations of yield from its mean with a
nonparametric kernel smoother. In the original article, mean yields were modeled as an
ARIMA process. However, in correspondence one of the authors recommended using
either a quadratic or linear trend. Therefore, a quadratic trend is used unless the quadratic
term is insignificant, in which case a linear trend is employed. Once the trend is
estimated, each residual εt = Yt - β̂0 - β̂1t - β̂2t² is divided by the forecast for that time
period, vt = εt / (β̂0 + β̂1t + β̂2t²).
The distribution of vt need not be addressed until the OSLLF values are
calculated, as no parametric fitting of vt takes place. To illustrate, suppose we employed
validation on thirty observations where the data were partitioned into six groups of five
observations each. The first iteration drops the first five observations and estimates
Yt = β0 + β1t + β2t² via OLS for t = 6, ..., 30. If β2 is found to be insignificant, Yt = β0 + β1t is
estimated using OLS. This provides a vector of in-sample vt's, which are calculated as
vt,in = (Yt - β̂0 - β̂1t) / (β̂0 + β̂1t) for t = 6, ..., 30. The “in” subscript is used to denote an
in-sample observation. Now, consider the OSLLF value for a single out-of-sample
observation at t = 1. First, vt=1,out is calculated as vt=1,out = (Y1 - β̂0 - β̂1(1)) / (β̂0 + β̂1(1)). The
OSLLF value for v1 is then calculated using the nonparametric kernel smoother:
 1 30  v1,out − v k,in
K
(20) OSLLF1 = ln
 25h ∑
h
k =6



1

 β̂ 0 + β̂1
(
)




In (20), K(.) is the standard normal pdf and h is the bandwidth, chosen using the
Silverman rule of thumb described in Goodwin and Ker. The last term in (20) is the
derivative of v1,out with respect to yield (the Jacobian), which ensures the pdf integrates over
yield. The final OSLLF value is the sum of (20) over all validations.
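One SEMIPAR OSLLF evaluation can be sketched compactly. The code below uses synthetic yields of our own and assumes the textbook Silverman constant of 1.06 for the bandwidth; the paper's exact bandwidth rule follows Goodwin and Ker and may differ:

```python
import math
import numpy as np

rng = np.random.default_rng(4)

def semipar_osllf_point(y_out, t_out, y_in, t_in):
    """OSLLF contribution of one held-out yield, as in eq. (20): linear
    trend, proportional deviations, normal kernel smoother, Jacobian."""
    b1, b0 = np.polyfit(t_in, y_in, 1)
    fit_in = b0 + b1 * t_in
    v_in = (y_in - fit_in) / fit_in                  # proportional deviations
    # Rule-of-thumb bandwidth for a normal kernel (Silverman constant 1.06).
    h = 1.06 * v_in.std() * len(v_in) ** (-1 / 5)
    fit_out = b0 + b1 * t_out
    v_out = (y_out - fit_out) / fit_out
    kern = np.exp(-0.5 * ((v_out - v_in) / h) ** 2) / math.sqrt(2 * math.pi)
    density_v = kern.sum() / (len(v_in) * h)
    # Jacobian 1/fit_out converts the density from v units to yield units.
    return math.log(density_v / fit_out)

t = np.arange(30, dtype=float)
y = 100 + 1.5 * t + rng.normal(0, 5, 30)
# Hold out t = 0 and score it against a model fit on t = 5, ..., 29.
score = semipar_osllf_point(y[0], t[0], y[5:], t[5:])
```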
NORMAL
A recent article by Just and Weninger suggests that previous findings of skewed yield
distributions may be the result of inappropriate detrending and failure to properly model
heteroskedasticity. When using flexible polynomial trends for mean yield and yield
variance, the authors find that normality is difficult to reject. This last model is referred
to as the NORMAL model. Following Just and Weninger, a cubic polynomial is used for
expected yield, which can be reduced to a quadratic or linear trend if supported by t-tests.
Heteroskedasticity is accounted for by regressing the absolute value of ordinary least
squares residuals against a quadratic trend. Using t-tests, the variance equation may be
reduced to a linear trend or a constant.
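A rough sketch of the NORMAL model's two-step fit follows, using simulated data of our own and keeping the full polynomial orders rather than carrying out the t-test reductions described above:

```python
import math
import numpy as np

rng = np.random.default_rng(5)

def fit_normal_model(y):
    """Cubic trend for mean yield; quadratic trend fit to scaled absolute
    OLS residuals as a stand-in for the standard deviation equation."""
    t = np.arange(len(y), dtype=float)
    mean_coef = np.polyfit(t, y, 3)              # cubic mean trend
    resid = y - np.polyval(mean_coef, t)
    # Under normality E|resid| = sigma * sqrt(2/pi), hence the rescaling.
    sd_coef = np.polyfit(t, np.abs(resid) * math.sqrt(math.pi / 2), 2)
    return mean_coef, sd_coef

def normal_loglik(y, mean_coef, sd_coef):
    t = np.arange(len(y), dtype=float)
    mu = np.polyval(mean_coef, t)
    sd = np.polyval(sd_coef, t)
    return np.sum(-0.5 * np.log(2 * math.pi * sd**2) - (y - mu) ** 2 / (2 * sd**2))

t = np.arange(30)
y = 100 + 1.5 * t + rng.normal(0.0, 2 + 0.1 * t)  # variance grows over time
mc, sc = fit_normal_model(y)
ll = normal_loglik(y, mc, sc)
```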
A Forecasting Contest
The six yield models described above are pitted against one another in a forecasting
contest to determine which model tends to be ranked highest across various regions and
crops. The first contest utilizes cornbelt corn, soybean, and wheat yields from 1950-1989, available in the appendix of Ramirez. Because the three yields are likely correlated
and the MULTIHS model is multivariate, this contest provides insight into the usefulness
of modeling correlations among crops. All six models are estimated with the Ramirez
data. The five univariate models, GAMMA, BETA, STOCHIHS, SEMIPAR and
NORMAL, are used to obtain OSLLF values for each crop, and are then summed across
the three crops to produce a multivariate OSLLF value. Then the MULTIHS model is
used to estimate corn, soybeans, and wheat yields jointly to obtain another multivariate
OSLLF value. If MULTIHS is ranked higher than the five univariate models, this is evidence that
accounting for correlations across crops may improve the accuracy of forecasted
probability statements.
Table 1 shows that the semiparametric model, SEMIPAR, is ranked highest. The
low ranking of the MULTIHS model was the result of a few large outliers; the model
predicted a low variance for the years 1950-1954 and was penalized because yields were
low during this period. If the model rankings were repeated using only OSLLF values
from 1956-1989, MULTIHS would be ranked highest.1 Notice also that the median
OSLLF value for MULTIHS is considerably higher than the other models, suggesting
multivariate distributions can improve yield forecasts.
The next forecasting contest utilizes county-level yields from 1962-1992 available
from the National Agricultural Statistics Service. Univariate yield distributions were
estimated for corn, wheat and soybeans. For each crop, the thirty largest counties in
terms of harvested acreage were selected and an additional thirty counties were chosen at
random. This provides a total of 180 crop/county combinations, each containing 30
observations. The results in the second column of table 2 (Comparison I) show that, once
again, SEMIPAR was ranked highest, followed closely by GAMMA. Surprisingly, the
NORMAL model was not chosen for any crop/county combination.
The forecasting performance of each model is relative, as columns three through
five (Comparisons II-IV) demonstrate. When considering all six models, NORMAL is
never chosen, but when compared only to SEMIPAR it is chosen almost half of the time.
Moreover, when only SEMIPAR and GAMMA are compared, SEMIPAR is second best.
To see why some models fared better than others, figure 1 shows the predicted yield
density for 1969 and 1989 wheat in Texas County, Oklahoma. Note how dramatically a
model’s density can change across periods. In 1969, the GAMMA density has very fat
tails, while in 1989 the tails are much smaller, and SEMIPAR changes from near bimodality to a much more uniform shape.
Figure 1 also provides a clear picture why GAMMA and SEMIPAR forecasted
well and models like MULTIHS and NORMAL did not. In 1969, it is clear the GAMMA
model is more consistent with the realized yield, while in 1989 SEMIPAR dominates. In
both instances the distributions exhibit fatter tails than the lower-ranking models. The
tails of MULTIHS and NORMAL predict an almost zero chance of observing the
yields that actually occurred in 1969 and 1989. It is interesting to note that BETA did not
perform well partly because the out-of-sample forecasts of its parameters (α and δ) were
sometimes negative, making the OSLLF function indeterminate. Also interesting is the
fact that the densities of STOCHIHS and MULTIHS frequently diverge, even though their
formulations are similar. These results reemphasize the point made by Just and Weninger
that changes in the way in which mean yield and heteroskedasticity are modeled can
drastically change the predicted yield density.
What features of a model make for better yield descriptions? To address this, we
attempt to determine what makes SEMIPAR perform better than NORMAL. Both use
similar functions for mean yield, but NORMAL accounts for heteroskedasticity by
assuming that yield variance follows a polynomial time trend whereas SEMIPAR assumes
that proportional deviations are homoskedastic. Also, NORMAL assumes normality
while SEMIPAR makes no distributional assumption. To determine the role of the
normality assumption, we replace the heteroskedastic normal distribution in NORMAL
with a kernel smoother. Deviations from mean yield were modeled with the same smoother
as SEMIPAR, except that the kernel is applied to nominal deviations instead of the
proportional deviations used in SEMIPAR.2 Column six (Comparison V) of table 2 shows that this new
model outperforms NORMAL, increasing its ranking from 0% to 13%. SEMIPAR still
dominates, suggesting that both the nonparametric distribution and its treatment of
heteroskedasticity play a role in SEMIPAR's high ranking.
Estimating a constant variance normal distribution also results in better
performance than NORMAL (see Comparison VI of table 2), suggesting that the
polynomial estimate of variance does not extend well to out-of-sample yields. Overall, it
appears that SEMIPAR performs well for two reasons: proportional deviations from mean
yields exhibit less distributional variation, and the semi-parametric form is highly
flexible.
The rankings in table 2 contain both interpolative and extrapolative forecasts.
Yield forecasts for 1967-1987 were interpolative because the estimation
sample always contained data before 1967 and after 1987. Conversely, forecasts for yields
before 1967 and after 1987 are extrapolative forecasts, requiring the use of time trend
values outside the range used in model estimation. Researchers in crop insurance and
farm management may be more interested in the extrapolative forecasts than interpolative
forecasts. Some readers may even contend that the true definition of a forecast is an
extrapolative forecast that uses past observations to predict the future. For this reason,
model rankings were also conducted separately for interpolation and extrapolation
forecasts and are shown in table 3. Rankings for the interpolative forecasts used the years
1967 through 1987, while the extrapolative forecast rankings used only the last five years
(1988-1992). Again, SEMIPAR is chosen most frequently across the 180 crop/county
combinations, regardless of whether the forecast is interpolative or extrapolative. The
dominance of SEMIPAR is less pronounced in the extrapolative forecasts, where it is followed
closely by GAMMA and STOCHIHS.
Overall, the results support the use of the SEMIPAR model for yield predictions.
This model tends to be ranked highest by the OSLLF criterion in both interpolation and
extrapolation forecasts. Although GAMMA is ranked higher when only SEMIPAR and
GAMMA are compared, SEMIPAR is preferred. The reason is that the performance of
GAMMA is sensitive to the maximum yield value chosen.3 When conducting real
forecasts, it is difficult to identify the maximum yield needed to implement the GAMMA
model. Maximum likelihood estimates are of little help because they place the upper
bound near the maximum observed yield, which is a poor estimate when extrapolating
yields. The forecasting contest conducted in this study set the upper bound with ex ante
knowledge it would not be exceeded. In real-time forecasting such conveniences are not
possible.
Concluding Remarks
Recently, there has been considerable debate over the appropriate model to characterize
yield distributions. In this study, we proposed a new method for comparing crop yield
models by their out-of-sample log-likelihood function values. We believe this approach
has several advantages over more conventional methods of yield model selection.
Namely, our approach evaluates models by their out-of-sample performance, which
mirrors the way most practitioners utilize such models, and our approach is able to
characterize the ability of models to describe the entire distribution of yields, not just the
mean.
Applying our model selection criterion to six popular yield models appearing in
the literature, we find that models assuming normality were consistently outperformed by
competing models. This finding does not imply that normality should be rejected
outright. Rather, it may imply that more consideration should be given to the
formulation of mean yield and yield variance, as suggested by Just and Weninger.
Although Just and Weninger also suggest combining normality with flexible polynomial
time trends for yield variance, we find that a homoskedastic normal model forecasts
better. Overall, results of the model
comparisons indicate that a semi-parametric model proposed by Goodwin and Ker tends
to outperform others.
It is interesting that the best forecasting model in this analysis is also the model
that involves the least optimization. Although the Goodwin and Ker model minimizes
squared-error for the mean equation, no maximum likelihood optimization is performed
on the yield density itself. Unless yields tend to be bimodal, it would seem that the use
of flexible parametric models, such as those evaluated in this study, would provide higher
in-sample log-likelihood function values. However, as Kastens and Brester note (page 310),
“reducing in-sample fit may often enhance out-of-sample fit.” Ideally, a balance should
be struck between in-sample and out-of-sample fit. Most studies concentrate solely on
the former, while we focus solely on the latter. Further research might focus on how to
combine the two approaches for a more complete measure of model performance.
FOOTNOTES
1. The average OSLLF values for 1956-1989 are: GAMMA = -11.04, BETA = -13.66,
STOCHIHS = -8.96, MULTIHS = -6.63, SEMIPAR = -8.91, NORMAL = -10.63.
2. If Yt is the realized yield and Ŷt is the mean yield given by a polynomial time
trend, the OSLLF value for the last forecast (t = 30) is
$$\mathrm{OSLLF}_{t=30} = \ln\left[\frac{1}{25h}\sum_{k=1}^{25} K\!\left(\frac{(Y_{30,\mathrm{out}} - \hat{Y}_{30,\mathrm{out}}) - (Y_{k,\mathrm{in}} - \hat{Y}_{k,\mathrm{in}})}{h}\right)\right],$$
where K(·) is the kernel function and h is the bandwidth.
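This footnote formula can be sketched in code. The Gaussian kernel below is our assumption (the paper does not specify the kernel here), and the function name is ours.

```python
import math

# Minimal sketch of the footnote's formula, assuming a Gaussian kernel: the
# out-of-sample log-likelihood of the last forecast is the log of a kernel
# density estimate, built from the 25 in-sample residuals, evaluated at the
# out-of-sample residual.
def osllf_last_forecast(y_out, yhat_out, y_in, yhat_in, h):
    resid_out = y_out - yhat_out                             # out-of-sample residual
    resid_in = [y - yhat for y, yhat in zip(y_in, yhat_in)]  # in-sample residuals
    total = sum(math.exp(-0.5 * ((resid_out - r) / h) ** 2) / math.sqrt(2 * math.pi)
                for r in resid_in)                           # sum of K((e_out - e_k)/h)
    return math.log(total / (len(resid_in) * h))
```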
3.
In the original Gallagher article, the maximum attainable yield for GAMMA was
modeled as a time trend which can extrapolate into the future. In earlier drafts of this
paper this time trend was used for maximum yield, but GAMMA was rarely chosen
because yields often exceeded the maximum yield, causing the OSLLF value to hit
negative infinity. We thank Wade Brorsen and an anonymous reviewer for pointing out
that validation almost forces this to occur, making it an unfair comparison.
REFERENCES
Akaike, H. “Information Theory and an Extension of the Maximum Likelihood
Principle.” Proceedings of the 2nd International Symposium on Information
Theory, ed. B. N. Petrov and F. Csaki, pp. 267-81. Budapest: Akademiai Kiado, 1972.
Brandt, J. A. and D. A. Bessler. “Price Forecasting and Evaluation: An
Application in Agriculture.” Journal of Forecasting 2(1983):237-48.
Day, R. “Probability Distributions of Field Crop Yields.” Journal of Farm
Economics 47(1965):713-41.
Gallagher, P. “U.S. Soybean Yields: Estimation and Forecasting With Nonsymmetric
Disturbances.” American Journal of Agricultural Economics 69(1987):796-803.
Goodwin, B. and A. P. Ker. “Nonparametric Estimation of Crop Yield
Distributions: Implications for Rating Group-Risk Crop Insurance Contracts.”
American Journal of Agricultural Economics 80(1998):139-53.
Just, R. E. and Q. Weninger. “Are Crop Yields Normally Distributed?”
American Journal of Agricultural Economics 81(1999):287-304.
Kastens, T. L., and G. W. Brester. “Model Selection and Forecasting Ability of Theory-
Constrained Food Demand Systems.” American Journal of Agricultural
Economics 78(1996):301-12.
Kullback, S. and R. A. Leibler. “On Information and Sufficiency.” Annals of
Mathematical Statistics 22(1951):79-96.
Miller, S. E., K. H. Kahl, and P. J. Rathwell. “Revenue Insurance for
Georgia and South Carolina Peaches.” Journal of Agricultural and Applied
Economics 32(2000):123-32.
Moss, C. B. and J.S. Shonkwiler. “Estimating Yield Distributions With a Stochastic
Trend and Nonnormal Errors.” American Journal of Agricultural Economics
75(1993):1056-62.
Nelson, C. H. and P. V. Preckel. “The Conditional Beta Distribution As a Stochastic
Production Function.” American Journal of Agricultural Economics
71(1989):370-77.
Norwood, B., J. Lusk, and M. C. Roberts. “A Comparison of Crop Yield
Distribution Selection Criteria.” Presented at the 2002 Agricultural Economics
Southern Meetings in Orlando, Florida. February 2-6, 2002.
Norwood, B., P. Ferrier, and J. Lusk. “Model Selection Using Likelihood
Functions and Out-of-Sample Performance.” Proceedings of the NCR-134
Conference of Applied Commodity Price Analysis, Forecasting, and Market Risk
Management, 2001.
Norwood, B. and T. C. Schroeder. “Usefulness of Placement Weight Data in
Forecasting Fed Cattle Prices.” Journal of Agricultural and Applied Economics
32(2000):63-72.
Ramirez, O. A. “Estimation and Use of a Multivariate Parametric Model for
Simulating Heteroskedastic, Correlated, Nonnormal Random Variables: The
Case of Corn Belt Corn, Soybean, and Wheat Yields.” American Journal of
Agricultural Economics 79(1997):191-205.
Sawa, T. “Information Criteria for Discriminating Among Alternative
Regression Models.” Econometrica 46(1978):1273-91.
Shao, J. “Linear Model Selection by Cross-Validation.” Journal of the American
Statistical Association 88:422(1993):486-494.
Stone, M. “An Asymptotic Equivalence of Choice of Model by Cross-Validation and
Akaike’s Criterion.” Journal of the Royal Statistical Society. Series B
(Methodological) 39:1(1977a):44-47.
_______. “Asymptotics For and Against Cross-Validation.” Biometrika 64:1
(1977b):29-35.
Zhang, P. “On the Distributional Properties of Model Selection Criteria.” Journal of
the American Statistical Association 87:419(1992):732-737.
_______. “Model Selection Via Multifold Cross Validation.” The Annals of
Statistics 21:1(1993):299-313.
Table 1. Model Ranking Results for Corn, Soybean and Wheat Yields in the Cornbelt:
OSLLF Summary Statistics for Multivariate Estimation

            Out-of-Sample Log-     Average    Median    Minimum     Maximum
            Likelihood Function    OSLLF      OSLLF     OSLLF       OSLLF
Model       (OSLLF)                Value^a    Value     Value       Value
GAMMA            -433              -10.82      -9.82      -18.46      -9.07
BETA             -527              -13.18      -9.54      -58.52      -7.57
STOCHIHS         -363               -9.08      -8.67      -14.43      -6.69
MULTIHS         -3622              -90.56      -6.40    -1716.50      -4.10
SEMIPAR          -356               -8.88      -8.70      -14.72      -7.39
NORMAL           -396               -9.91      -8.24      -38.60      -7.02

Note: These are multivariate models, meaning the OSLLF values correspond to the
probability of the corn, soybean and wheat yields being realized simultaneously. The
data used here can be found in the Appendix of Ramirez.
^a There were forty total forecasts for each model.
Table 2. Model Rankings for 180 Crop and County Combinations

                          Percent of Times Model Is Ranked Highest^a According to
                          Mean OSLLF Value Across Different Candidate Model Comparisons
Model                        I        II       III      IV       V        VI
GAMMA                       29%      54%      ----     85%      26%      26%
BETA                        12%      ----     ----     ----      8%       8%
STOCHIHS                    17%      ----     ----     ----     13%      14%
MULTIHS                      5%      ----     ----     ----      4%       5%
SEMIPAR                     38%      46%      59%      ----     36%      36%
NORMAL                       0%      ----     41%      15%      ----     ----
HOMOSKEDASTIC
  KERNEL SMOOTHER           ----^b   ----     ----     ----     13%      ----
HOMOSKEDASTIC NORMAL        ----     ----     ----     ----     ----     11%

^a The model with the largest OSLLF value is ranked highest.
^b Model was not included in the ranking contest.
Table 3. Model Rankings for 180 Crop and County Combinations,
Separated by Interpolation and Extrapolation Forecasts

                          Percent of Times Model Is Ranked Highest^a According to
                          Mean OSLLF Value
                          Interpolative Forecasts^b    Extrapolative Forecasts^c
Model                        I        II       III        I        II       III
GAMMA                       13%      11%      11%        24%      24%      26%
BETA                        21%      19%      19%         6%       5%       5%
STOCHIHS                     3%       3%       3%        25%      18%      23%
MULTIHS                      3%       4%       4%        12%      11%      11%
SEMIPAR                     55%      54%      54%        27%      27%      27%
NORMAL                       5%      ----     ----        6%      ----     ----
HOMOSKEDASTIC
  KERNEL SMOOTHER           ----^d    8%      ----       ----     16%      ----
HOMOSKEDASTIC NORMAL        ----     ----      9%        ----     ----      8%

^a The model with the largest OSLLF value is ranked highest.
^b Forecasts of yields from 1967 through 1987.
^c Forecasts of yields from 1988 through 1992.
^d Model was not included in the ranking contest.
Figure 1. Forecasted distributions for wheat yields in Texas County, Oklahoma.
[Two panels of forecasted densities, each showing the GAMMA, BETA, STOCHIHS,
MULTIHS, SEMIPAR, and NORMAL models: 1969 yields (actual yield was 26.5) and
1989 yields (actual yield was 24.1 bushels).]