Supplementary Section
Detailed Methodology
The adopted generalised least squares regression (GLSR) method is summarised below from
Madsen et al. (2002) and Haddad et al. (2011). Consider $\hat{y}_i$ to be an estimate of the
annual maximum series (AMS) parameter at station $i$. The following linear relationship is
considered:
$$\hat{y}_i = \beta_0 + \sum_{k=1}^{p} \beta_k X_{ik} + \varepsilon_i + \delta_i \qquad \text{(A1)}$$
where $X_{ik}$ are the predictor variables (climatic and physiographic characteristics),
$\beta_k$ are the regression coefficients, $\varepsilon_i$ is the random sampling error
associated with $\hat{y}_i$, and $\delta_i$ is the residual model error. To evaluate
equation A1, the covariance structure of the
sampling error must be known. The sampling error variances of the AMS parameters
can be estimated by Monte Carlo simulations (Madsen et al., 2002). Estimates can be
derived for the sampling error variances (diagonal of error covariance matrix) by
substituting the sample estimates for the population parameters. Note, however, that
to solve the GLSR equations the error covariance estimator should be independent, or
nearly so, of the AMS parameter estimate $\hat{y}_i$ (Stedinger and Tasker, 1985).
Following an approach similar to that outlined by Madsen et al. (2002), estimates of
the sampling error variances that are nearly independent of the three AMS parameters
were obtained.
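As an illustration of how equation A1 is typically solved once the error covariance is known, the following is a minimal sketch of the standard GLS estimator, $\hat{\beta} = (X^T \Lambda^{-1} X)^{-1} X^T \Lambda^{-1} \hat{y}$. It assumes the full error covariance matrix `Lambda` (model error plus sampling error) has already been assembled and is supplied by the caller; estimating the model error variance is not shown. The function name `gls_fit` and all data values are hypothetical and are not taken from the study.

```python
import numpy as np

def gls_fit(X, y, Lambda):
    """GLS estimate of the regression coefficients for a known error covariance.

    X      : (n, p+1) design matrix whose first column is ones (intercept)
    y      : (n,) estimated AMS parameter at each site
    Lambda : (n, n) error covariance matrix, assumed known here
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Lambda = np.asarray(Lambda, dtype=float)
    Li_X = np.linalg.solve(Lambda, X)   # Lambda^{-1} X without forming the inverse
    Li_y = np.linalg.solve(Lambda, y)   # Lambda^{-1} y
    XtLiX = X.T @ Li_X                  # X^T Lambda^{-1} X
    beta = np.linalg.solve(XtLiX, X.T @ Li_y)
    cov_beta = np.linalg.inv(XtLiX)     # covariance of the estimated coefficients
    return beta, cov_beta

# Hypothetical example: four sites, one predictor, unequal error variances.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
Lambda = np.diag([0.04, 0.09, 0.04, 0.16])
beta, cov_beta = gls_fit(X, y, Lambda)
print(beta)
```

Because the example data lie exactly on a line, the weighting does not change the fitted coefficients; with noisy data, sites with larger error variance would receive less weight.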
For the estimation of inter-site correlation for the various parameters, we considered
concurrent records of annual maximum rainfall series across all the sites within a
selected region. The inter-site correlation between the sample mean values, $\rho_{ij}$, is
equal to the correlation coefficient between concurrent rainfall events at sites $i$ and $j$.
The correlation between higher-order sample moments depends on the order of the
moment (Stedinger, 1983; Madsen et al., 2002). For the L-CV and L-SK estimates,
the inter-site correlation coefficient is approximated by $\rho_{ij}^{2}$ and $\rho_{ij}^{3}$,
respectively. The estimated cross correlation coefficients have reasonably large
sampling uncertainties associated with them, especially if the concurrent record length
is small. Relatively better estimates of cross correlation can be found when the sample
cross correlation coefficients are smoothed by relating them to the distance between
sites. In this study, the following exponential correlation function was used:
$$\rho_{ij} = \theta^{\left[ \frac{d_{ij}}{\alpha d_{ij} + 1} \right]} \qquad \text{(A2)}$$
where $d_{ij}$ is the distance between sites $i$ and $j$, and $\alpha$ and $\theta$ are
parameters estimated from the data for each duration required.
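The distance-based smoothing can be sketched as follows, assuming the correlation function has the form $\rho_{ij} = \theta^{d_{ij}/(\alpha d_{ij} + 1)}$ and that higher-order moments use the second and third powers of the smoothed correlation, as described above. The function names, the distances and the values of `alpha` and `theta` are hypothetical; in practice the two parameters would be fitted to the sample cross correlations for each duration.

```python
def smoothed_correlation(d, alpha, theta):
    """Smoothed inter-site correlation for a separation distance d (d >= 0)."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    return theta ** (d / (alpha * d + 1.0))

def correlation_matrix(distances, alpha, theta, order=1):
    """Correlation matrix from a symmetric distance matrix.

    order=1 corresponds to the sample means; order=2 and order=3 give the
    approximations used for the L-CV and L-SK estimates, respectively.
    """
    n = len(distances)
    return [
        [
            1.0 if i == j
            else smoothed_correlation(distances[i][j], alpha, theta) ** order
            for j in range(n)
        ]
        for i in range(n)
    ]

# Hypothetical distances (km) between three sites and illustrative parameters.
distances = [[0.0, 10.0, 50.0], [10.0, 0.0, 40.0], [50.0, 40.0, 0.0]]
rho_mean = correlation_matrix(distances, alpha=0.01, theta=0.98)
rho_lcv = correlation_matrix(distances, alpha=0.01, theta=0.98, order=2)
```

With `theta` between 0 and 1, the correlation equals 1 at zero distance and decays towards `theta ** (1 / alpha)` as the distance grows.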
The Bayesian Information Criterion (BIC) and the Akaike Information Criterion
(AIC) were used to evaluate the goodness-of-fit of the candidate distributions. The
AIC has generally been used in hydrological applications to select the flood frequency
model (Laio et al., 2009). However, in this study the second-order variant of the AIC,
called AICc, was used (equation A3); it corrects for the bias that arises with small
sample sizes. As reported, AICc should be used when $n/P < 40$ to avoid bias
(Calenda et al., 2009).
$$\mathrm{AIC_c} = -2\,\ell(Y) + 2P \left( \frac{n}{n - P - 1} \right) \qquad \text{(A3)}$$
where $\ell(Y)$ is the maximised log-likelihood function, $n$ is the sample size, and $P$ is
the number of parameters of the probability distribution fitted to the available sample.
In practice, after the computation of the AICc for all of the operating models, one
selects the model with the minimum AICc value. The BIC is very similar to the AIC,
but is developed in a Bayesian framework:
$$\mathrm{BIC} = -2\,\ell(Y) + \ln(n)\,P \qquad \text{(A4)}$$
The BIC penalizes more heavily for small sample sizes and for models with high
values of $P$. Since $\ell(Y)$ depends on the sample, the candidate models can be compared
using AIC and BIC only if they are fitted to the same sample. In this study, the competing
distributions were fitted to the same samples. Other goodness-of-fit tests (such as the
Chi-square and Kolmogorov-Smirnov tests) could have been adopted (e.g. Rahman et al.,
2013), but the AICc and BIC were deemed adequate for this study.
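The two criteria can be computed directly from the maximised log-likelihood. The sketch below assumes the forms AICc = -2 l(Y) + 2P n/(n - P - 1) and BIC = -2 l(Y) + ln(n) P; the helper names, the distribution names and the log-likelihood values are hypothetical and serve only to show the selection step.

```python
import math

def aicc(loglik, n, P):
    """Second-order AIC for a maximised log-likelihood, sample size n, P parameters."""
    if n - P - 1 <= 0:
        raise ValueError("AICc requires n > P + 1")
    return -2.0 * loglik + 2.0 * P * n / (n - P - 1)

def bic(loglik, n, P):
    """Bayesian Information Criterion."""
    return -2.0 * loglik + math.log(n) * P

def rank_models(fits, n):
    """fits maps a distribution name to (maximised log-likelihood, P).

    All candidates must have been fitted to the same sample of size n.
    """
    scores = {name: (aicc(ll, n, P), bic(ll, n, P)) for name, (ll, P) in fits.items()}
    best_aicc = min(scores, key=lambda name: scores[name][0])
    best_bic = min(scores, key=lambda name: scores[name][1])
    return scores, best_aicc, best_bic

# Hypothetical fits of two candidate distributions to one sample of 40 values.
fits = {"GEV": (-152.3, 3), "Gumbel": (-153.1, 2)}
scores, best_aicc, best_bic = rank_models(fits, n=40)
```

In this made-up case the small gain in log-likelihood from the third parameter does not offset the penalty, so both criteria prefer the two-parameter candidate.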
A number of goodness-of-fit measures and statistical diagnostics were used to assess
the regression equations. A pseudo coefficient of determination ($R^2_{GLS}$) (Reis et al.,
2005), defined in equation A5, was used because the traditional coefficient of
determination makes little sense with the GLSR, as it neglects the sampling variability
portion of the total error. Outlier statistics and various diagnostic plots were used to
identify outlier sites.
The $R^2_{GLS}$ is given by:
$$R^2_{GLS} = \frac{n[\hat{\sigma}_\delta^2(0) - \hat{\sigma}_\delta^2(k)]}{n\,\hat{\sigma}_\delta^2(0)} = 1 - \frac{\hat{\sigma}_\delta^2(k)}{\hat{\sigma}_\delta^2(0)} \qquad \text{(A5)}$$
where $\hat{\sigma}_\delta^2(k)$ and $\hat{\sigma}_\delta^2(0)$ are the model error variances when $k$ and no predictor
variables are used, respectively. In this case, $R^2_{GLS}$ measures the improvement of a
GLSR model with $k$ predictor variables against the estimated error variance for a
model without predictor variables.
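A short numerical sketch of the pseudo coefficient of determination follows, assuming it reduces to one minus the ratio of the model error variance with k predictors to the model error variance with no predictors. The function name and the variance values are hypothetical.

```python
def pseudo_r2_gls(model_var_k, model_var_0):
    """Pseudo R^2 for GLS regression from two model error variances.

    model_var_k : model error variance with k predictor variables
    model_var_0 : model error variance with no predictor variables
    """
    if model_var_0 <= 0.0:
        raise ValueError("model_var_0 must be positive")
    return 1.0 - model_var_k / model_var_0

# Hypothetical variances: the predictors remove three quarters of the model error.
r2 = pseudo_r2_gls(model_var_k=0.02, model_var_0=0.08)
```

A value near 1 indicates that the predictors explain most of the model error variance; a value of 0 indicates no improvement over the constant model.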
The standardised residual ($r_{si}$) is the residual $r_i$ divided by the square root of its
variance and was calculated as:
$$r_{si} = \frac{r_i}{[\lambda_i - x_i (X^T \Lambda^{-1} X)^{-1} x_i^T]^{0.5}} \qquad \text{(A6)}$$
where $\lambda_i$ is the $i$th diagonal element of $\Lambda$, $\Lambda$ is the error covariance matrix, and
$x_i$ is the vector of catchment characteristics at site $i$.
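The standardised residuals can be computed for all sites at once. The sketch below assumes the residual variance at site i is the i-th diagonal element of the error covariance matrix minus x_i (X^T Lambda^{-1} X)^{-1} x_i^T, with x_i taken as row i of the design matrix. The function name and the data are hypothetical, and the coefficients are passed in rather than fitted here.

```python
import numpy as np

def standardised_residuals(X, y, beta, Lambda):
    """Residuals divided by the square root of their variance under GLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    Lambda = np.asarray(Lambda, dtype=float)
    r = y - X @ beta                                    # raw residuals r_i
    A = np.linalg.inv(X.T @ np.linalg.solve(Lambda, X)) # (X^T Lambda^{-1} X)^{-1}
    quad = np.einsum("ij,jk,ik->i", X, A, X)            # x_i A x_i^T for every site
    var_r = np.diag(Lambda) - quad                      # variance of each residual
    return r / np.sqrt(var_r)

# Hypothetical example with an identity error covariance.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.1, 0.9, 2.1, 2.9])
beta = np.array([0.0, 1.0])
rs = standardised_residuals(X, y, beta, np.eye(4))
```

Sites whose standardised residual is large in absolute value are candidates for closer inspection as outliers.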
To assess the adequacy of the estimated design rainfall, the Z score (equation A7) was
plotted as a quantile-quantile (QQ) plot, which allows easier interpretation, to check
whether the underlying assumption of normality of the residuals was satisfied.
$$Z_{score} = \frac{\log_{10} y_i - \log_{10} \hat{y}_i}{\sqrt{\sigma^2_{y_i} + \hat{\sigma}^2_{\hat{y}_i}}} \qquad \text{(A7)}$$
Here the numerator is the difference between the at-site rainfall quantile and the regional
rainfall quantile (estimated from the developed regression equation), and the
denominator is the square root of the sum of the variances of the at-site ($\sigma^2_{y_i}$) and
regional ($\hat{\sigma}^2_{\hat{y}_i}$) rainfall quantiles in logarithmic space.
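The Z score and the points of a normal QQ plot can be sketched as below. The variances are assumed to be expressed in log10 space, and the plotting position (i + 0.5)/n used for the theoretical quantiles is an assumption of this sketch, since the study does not state which one was used. The function names and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def z_score(y_site, y_regional, var_site, var_regional):
    """Z score from at-site and regional quantiles and their log-space variances."""
    diff = math.log10(y_site) - math.log10(y_regional)
    return diff / math.sqrt(var_site + var_regional)

def normal_qq_points(z_values):
    """Pairs of (theoretical standard normal quantile, ordered Z score)."""
    n = len(z_values)
    std_normal = NormalDist()
    ordered = sorted(z_values)
    return [(std_normal.inv_cdf((i + 0.5) / n), z) for i, z in enumerate(ordered)]

# Hypothetical at-site and regional quantiles (mm) with log-space variances.
sites = [(52.0, 48.0, 0.002, 0.003), (40.0, 45.0, 0.004, 0.003), (61.0, 60.0, 0.002, 0.002)]
z_values = [z_score(*site) for site in sites]
qq = normal_qq_points(z_values)
```

If the normality assumption holds, the points in `qq` should fall close to the 1:1 line when plotted.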
References
Calenda G, Mancini CP, Volpi E. 2009. Selection of the probabilistic model of
extreme floods: The case of the River Tiber in Rome. Journal of Hydrology 27: 1-11.
Haddad K, Rahman A, Green J. 2011. Design Rainfall Estimation in Australia: A
Case Study using L moments and Generalized Least Squares Regression. Stochastic
Environmental Research & Risk Assessment 25(6): 815-825.
Laio F, Di Baldassarre G, Montanari A. 2009. Model selection techniques for the
frequency analysis of hydrological extremes. Water Resources Research 45:
W07416. doi:10.1029/2007WR006666.
Madsen H, Mikkelsen PS, Rosbjerg D, Harremoes P. 2002. Regional estimation of
rainfall intensity duration curves using generalised least squares regression of partial
duration series statistics. Water Resources Research 38(11): 1-11.
Rahman SA, Rahman A, Zaman M, Haddad K, Ashan A, Imteaz MA. 2013. A Study
on Selection of Probability Distributions for At-site Flood Frequency Analysis in
Australia. Natural Hazards 69: 1803-1813.
Reis DS Jr, Stedinger JR, Martins ES. 2005. Bayesian generalised least squares
regression with application to log Pearson type 3 regional skew estimation. Water
Resources Research 41: W10419. doi:10.1029/2004WR003445.
Stedinger JR. 1983. Estimating a regional flood frequency distribution. Water
Resources Research 19(2): 503-510.
Stedinger JR, Tasker GD. 1985. Regional hydrologic analysis, 1. Ordinary, weighted,
and generalised least squares compared. Water Resources Research 21(9): 1421-1432.