Download Statistical Downscaling of Daily Temperature in Central Europe

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Data assimilation wikipedia , lookup

Choice modelling wikipedia , lookup

Least squares wikipedia , lookup

Regression analysis wikipedia , lookup

Linear regression wikipedia , lookup

Coefficient of determination wikipedia , lookup

Transcript
1 JULY 2002
HUTH
1731
Statistical Downscaling of Daily Temperature in Central Europe
RADAN HUTH
Institute of Atmospheric Physics, Prague, Czech Republic
(Manuscript received 11 June 2001, in final form 19 November 2001)
ABSTRACT
Statistical downscaling methods and potential large-scale predictors are intercompared for winter daily mean
temperature in a network of stations in central and western Europe. The methods comprise (i) canonical correlation
analysis (CCA), (ii) singular value decomposition analysis, (iii) multiple linear regression (MLR) of predictor
principal components (PCs) with stepwise screening, (iv) MLR of predictor PCs without screening (i.e., all PCs
are forced to enter the regression model), and (v) MLR of gridpoint values with stepwise screening (pointwise
regression). The potential predictors include two circulation variables (sea level pressure and 500-hPa heights)
and two temperature variables (850-hPa temperature and 1000–500-hPa thickness). The methods are evaluated
according to the accuracy of specification (in terms of rmse and variance explained), their temporal structure
(characterized by lag-1 autocorrelations), and their spatial structure (characterized by spatial correlations and
objectively defined divisions into homogeneous regions).
The most accurate specification and best approximation of the temporal structure are achieved by the pointwise
regression; the spatial structure is best captured by CCA. The best choice of predictors appears to be a pair of
one circulation and one temperature predictor. Of the two ways of reproducing the original variance, inflation
yields more realistic both temporal and spatial variability than randomization. The size of the domain on which
predictors are defined plays a rather negligible role.
1. Introduction
The performance of general circulation models
(GCMs) at the local scale is often poor, which is particularly true for surface variables (Giorgi et al. 2001).
The local-scale surface variables are, however, essential
for assessing climate change impacts. On the other hand,
GCM simulations of large-scale upper-air fields are generally considered quite reliable. The methodologies to
bridge the gap between what is successfully simulated
by GCMs and what is needed in climate impact research
are generally referred to as downscaling. Among the
approaches designed for this purpose, statistical downscaling methods appear to be those most widely used.
Statistical downscaling consists of seeking statistical
relationships between the variables simulated well by
GCMs (predictors) and the surface climate variables
(predictands). Several statistical methods are applicable
to description of these relationships. Multiple linear regression (MLR), based either directly on gridpoint data
or on principal components (PCs) of predictor fields,
and canonical correlation analysis (CCA) have been
used most frequently. Recently, nonlinear methods have
emerged, including multivariate splines (Corte-Real et
al. 1995) and neural networks (e.g., Crane and Hewitson
Corresponding author address: Radan Huth, Institute of Atmospheric Physics, Bočnı́ II 1401, 141 31 Praha 4, Czech Republic.
E-mail: [email protected]
q 2002 American Meteorological Society
1998; Wilby et al. 1998; Trigo and Palutikof 1999; Cavazos 2000).
Many recent studies have used large-scale circulation
(geopotential height or sea level pressure) as the only
predictor in specifying a variety of variables, including
temperature (Hewitson 1994; Schubert and HendersonSellers 1997), precipitation (Noguer 1994; Corte-Real
et al. 1995; Saunders and Byrne 1996; Busuioc et al.
1999), and sea level (Heyen et al. 1996). However, the
assumption that changes in surface climate elements due
to the enhanced greenhouse effect can be derived from
changes in circulation only does not appear to be realistic because observed changes in circulation may not
be a dominant agent in long-term surface climate trends
(Widmann and Schär 1997; Hanssen-Bauer and Førland
1998; Huth 2001). The downscaling procedure based
on circulation variables only, if applied to a future climate, may lead to an entirely unrealistic result of virtually no temperature change, although a direct GCM
output indicates a warming by several degrees (Schubert
1998). More recent studies avoid that questionable assumption and include among predictors other variables
such as free atmospheric temperature, or equivalently
thickness (Kaas and Frich 1995; Cavazos 1997), and
moisture variables (Crane and Hewitson 1998; Wilby et
al. 1998).
Since downscaling approaches are numerous, a comparison of methods and potential predictors appears to
1732
JOURNAL OF CLIMATE
VOLUME 15
TABLE 1. List of stations used in the study. Altitudes are in meters above mean sea level. The stations are marked in Fig. 1 with the
numbers in the left column. The international country codes are used: CZ—Czech Republic, SK—Slovakia, DE—Germany, CH—Switzerland,
AT—Austria, BE—Belgium.
No.
Station name
Country
Alt
No.
Station name
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Hradec Králové
Velké Meziřı́čı́
Holešov
Milešovka
Teplice
Husinec
Hurbanovo
Sliač
Oravská Lesná
Štrbské Pleso
Poprad
Košice
Norderney
Hamburg
Greifswald
Kleve
Hameln
Potsdam
Cottsbus
Erfurt
CZ
278
452
224
833
225
536
115
313
780
1360
695
230
11
13
2
46
66
81
69
316
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
Gießen
Saarbrücken
Würzburg
Nürnberg
Stuttgart
München
Neuchâtel
Basel
Zürich
Säntis
Davos
Reutte
Sonnblick
Feurkogel
Klagenfurt
Wien
Koksijde
Deurne
Saint Hubert
SK
DE
be a useful exercise. In the first study of this kind related
to daily temperature, Winkler et al. (1997) concentrated
on a sensitivity of downscaling output to a definition of
seasons and the form of the transfer function, but relied
on circulation predictors only (sea level pressure and
500-hPa heights) and a single downscaling method
(multiple linear regression), the evaluation being based
on explained variance. Huth (1999, hereafter H99) compared the accuracy of specification of three linear downscaling methods and several large-scale circulation and
temperature predictors in terms of root-mean-square error (rmse). However, the accuracy of specification is not
the only criterion according to which the methods can
be rated. Also important, but so far considered sporadically in downscaling studies, are the temporal structure
of downscaled time series and the spatial structure of
downscaled fields. The former has been investigated
recently by Huth et al. (2001) from the point of view
of prolonged extreme events (heat and cold waves),
while the latter has been considered by Solman and
Nuñez (1999) in terms of spatial correlations for monthly mean temperature in South America and by Easterling
(1999) who examined the dependence of correlation between stations on their distance.
Since the predictors account only for a part of variance of the predictand, the variance of downscaled time
series should be artificially enhanced in order to be reproduced properly. For this purpose, the inflation procedure, introduced by Karl et al. (1990), has been used
most commonly. The inflation consists of increasing the
temperature anomaly on each day by the same factor.
However, the inflation implicitly assumes that all local
variability originates from large-scale variability, which
is not true. An alternative is to represent processes unresolved by the large-scale predictor(s) by adding to the
Country
CH
AT
BE
Alt
186
319
268
314
373
515
487
317
569
2498
1590
870
3105
1618
447
202
5
10
556
downscaled series a noise with some predefined properties (randomization; von Storch 1999).
This paper is a follow-up of the H99 study. Its aims
can be summarized in a few items: (i) to compare the
performance of several linear downscaling methods and
several sets of large-scale predictors in terms of the
accuracy of specification, temporal structure, and spatial
structure; (ii) to compare the performance of the two
ways of reproduction of variance (inflation vs randomization); and (iii) to estimate the dependence of the
downscaling output on the size of the domains on which
the predictors and predictands are defined. The downscaling is performed for daily mean temperature in winter at a network of stations in central Europe.
2. Data
The study is carried out for eight winter seasons (December–February) from 1982–83 to 1989–90. Daily
mean temperature data are collected at 39 stations in
central and parts of western Europe (Austria, Belgium,
Czech Republic, Germany, Slovakia, and Switzerland).
The stations are located in northern midlatitudes in various orographic conditions, ranging from the sea coast
to highlands, mountain valleys, and summits. The stations are listed in Table 1 together with their elevations;
their locations are mapped in Fig. 1. (All the maps displayed indicate degrees of east longitude and degrees
of north latitude.) The relative shortness of the dataset
is given by the availability of data. We are aware of the
limitations it poses, for example, concerning the stationarity of the predictor–predictand relationships.
The potential large-scale predictors include 500-hPa
heights (HGT), sea level pressure (SLP), 850-hPa temperature (TEM), and 1000–500-hPa thickness (THI) de-
1 JULY 2002
HUTH
FIG. 1. Location of stations used in the study. For their names refer
to Table 1. Degrees of east longitude on x axis and degrees of north
latitude on y axis (for this and other map figures).
fined on a 58 3 108 grid extending from 358 to 708N
and from 458W to 458E, which covers most of Europe
and adjacent parts of the Atlantic Ocean and consists
of 80 grid points. The source of large-scale data are the
National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) reanalyses (Kalnay et al. 1996). We decided to increase
the grid step relative to the original resolution of NCEP–
NCAR products (2.58 3 2.58) to reduce the amount of
data as well as the redundancy in them because all the
predictors exhibit high spatial correlations.
3. Downscaling methods
Three linear methods of statistical downscaling are
evaluated in this study: (i) CCA prefiltered by principal
component analysis (PCA), (ii) singular value decomposition (SVD) analysis, and (iii) MLR. The CCA and
SVD methods are described and discussed in detail in
Bretherton et al. (1992). They seek pairs of patterns,
one from predictors and one from predictands, such that
they share maximum correlation (CCA) or covariance
(SVD). The higher-order (second, third, etc.) pairs
(modes) maximize correlation/covariance not captured
by the preceding pairs. To make CCA feasible, its predictors are usually prefiltered using PCA (Barnett and
Preisendorfer 1987; Barnston and Ropelewski 1992).
We consider three different MLR models: (i) stepwise
screening of PCs of the predictor field(s) (hereafter referred to as ‘‘stepwise regression’’), (ii) MLR of predictor’s PCs without screening, that is, all PCs being
forced to enter the model (‘‘full regression’’), and (iii)
stepwise screening of gridded values (‘‘pointwise regression’’). The MLR approach was developed and successfully applied several decades ago to the specification
of surface temperature from midtropospheric circulation, mainly for the purpose of weather prediction, by
Klein (1962). In stepwise screening, each potential pre-
1733
dictor (PC or gridpoint value) is evaluated for its individual significance level before including it in the regression equation, and, after the addition, each variable
within the equation is evaluated for its significance as
part of the model. A variable is included (retained) in
the equation if the corresponding significance level is
lower than 10% (5%).
For each method, several variants of the number of
PCs and the number of modes have been examined. In
all models, both predictors and predictands are used in
the form of normalized anomalies, that is, the means
over the analyzed period were subtracted first from the
values at each grid point/station, which were then divided by standard deviations.
Two methods of reproducing variance of temperature
series are compared. The inflation consists in increasing
the anomaly on each day by the ratio of standard deviations of the observed and downscaled series, which
is identical to dividing by the correlation between the
two series (Karl et al. 1990). This has been the most
common way of handling the unexplained variance so
far. An alternative to it is to represent the processes
unresolved by the large-scale predictor by adding noise
to the downscaled series (von Storch 1999). The observed series can then be considered as a sum of the
downscaled series and the noise. In this study we add
white noise, generated at each station separately, to the
output from downscaling. Because white noise is by
definition uncorrelated with the downscaled time series,
its variance may be defined as the difference of variances of the downscaled and observed series. The same
approach has been adopted by Wilby et al. (1999) and
Zorita and von Storch (1999). We take the inflation as
a standard approach, so wherever the method of enhancing the variance is not specifically stated, the inflation is used.
4. Accuracy of specification
a. Methodology
The accuracy of downscaled values has been quantified in terms of (i) the rmse of the downscaled values
relative to the observed, (ii) mean absolute error, and
(iii) percentage of variance explained (or, equivalently,
correlation coefficient). The rating of methods and predictors is identical for the three measures of correspondence in most cases, so we use in the presentation of
results just one of them. To obtain one number for each
method to characterize its general performance, the values of the correspondence measures are averaged over
the stations.
The downscaling methods are evaluated within the
cross-validation framework, which allows an unbiased
estimate of potential ‘‘predictability.’’ The cross validation consists in omitting one case in turn, building
the statistical model on the remaining dataset, and applying the statistical model to the omitted case (Mi-
1734
JOURNAL OF CLIMATE
FIG. 2. Percentage of variance explained, averaged over 39 stations,
for different predictors and their combinations. All for pointwise
MLR.
chaelsen 1987). Since time series of daily temperature
exhibit considerable autocorrelation, the above procedure would lead to a positive bias in downscaling skill.
To eliminate autocorrelation, one season is held out at
a time instead of a single day, and the statistical model
built on the remaining seven seasons is verified on the
omitted period. All the statistical models are thus built
eight times.
VOLUME 15
FIG. 3. Percentage of variance explained, maximum among the four
combined predictors. The most successful combination of predictors
is indicated by symbols: star (HGT1TEM), cross (HGT1THI), circle
(SLP1TEM), square (SLP1THI).
plained ranges from 90.6% at Feurkogel, Austria, to
only 37.4% at Klagenfurt, Austria. The highest shares
of variance are explained at the elevated, especially
mountain summit stations (Feurkogel and Sonnblick in
Austria, Säntis and Davos in Switzerland, Štrbské Pleso
in Slovakia, and Milešovka in the Czech Republic). The
elevation dependence of the percentage of explained
variance is underlined by its Spearman rank correlation
with the elevation, which is equal to 0.350 (significant
at the 5% level).
b. Rating of predictors
In H99, temperature predictors (850-hPa temperature
and 1000–500-hPa thickness) have been shown, in
terms of rmse, to be superior to the circulation predictors
(sea level pressure and 500-hPa heights); and the combined predictors (one temperature and one circulation)
to be superior to any single predictor. This holds not
only for the pointwise MLR method as shown in H99,
but also for any method yielding a reasonably good
specification. Figure 2 illustrates these facts in terms of
variance explained for the pointwise regression. In addition, it shows that all four combined predictors yield
a level of performance very close to one another and
explain about two-thirds of the daily temperature variance. The combination of SLP and 850-hPa temperature
appears to be superior, but by a rather narrow margin.
It explains 68.4% of the variance on average, the mean
correlation coefficient being 0.825 and mean rmse
2.848C.
The geographical distribution of the percentage of
variance explained can be seen in Fig. 3. It shows at
each station the combination of predictors for which the
highest share of variance is explained, and the corresponding value. Three combinations of predictors,
SLP1TEM, HGT1TEM, and HGT1THI, are each superior to the others at 10–13 stations. The variance ex-
c. Rating of methods
The downscaling methods are intercompared for the
combination of predictors HGT1TEM, with variance
inflation employed. The results for other combinations
of predictors are qualitatively the same.
First of all we discuss the number of PCs to enter the
MLR and CCA methods. It is our intention to compare
results of downscaling based on different numbers of
PCs, so we do not confine ourselves to identifying only
a single, hopefully the most appropriate number. In selecting the suitable numbers of PCs we followed the
rule of O’Lenic and Livezey (1988): the PCs should be
cut just behind a section with a small slope (shelf ) on
an eigenvalue versus PC-number diagram. Such a diagram, with a percentage of variance explained given in
a logarithmic scale on the ordinate, is shown in Fig. 4
for the set of predictors (500-hPa heights and 850-hPa
temperatures at grid points) and predictands (daily mean
temperature at stations). In both diagrams, eight curves
are displayed, one for each cross-validated sample. The
numbers of PCs for which breaks between the shelves
and steeper sections are (more or less) evident for the
majority of cross-validated samples (3, 5, 7, 9, 11, and
15 PCs for the predictor, and 4, 7, 9, and 15 PCs for
the predictand) are indicated by vertical lines. These
1 JULY 2002
1735
HUTH
TABLE 2. Performance (in terms of areally averaged rmse, in 8C)
of stepwise and full regression for different numbers of PCs of predictors. The last row shows results for the pointwise regression. The
range of the number of variables selected in the stepwise regression
models is in the last column.
FIG. 4. Percentage of variance explained (y axis) by 20 leading
PCs for (top) the predictor (HGT1TEM) and (bottom) predictand
(Tsfc, daily mean temperature at stations). The numbers of PCs selected for the further analysis are indicated by vertical lines.
numbers of PCs are selected to enter the statistical models.
Results for the MLR methods are displayed in Table
2. The accuracy of the downscaled values improves with
the increasing number of PCs, but after PC 9 the improvements become negligible. However, even higherorder PCs may be relevant for regression models: for
example, PC 15 (which may have been considered the
least important) enters the regression equations as the
fourth–sixth most informative predictor at the majority
of stations. Note that more than a half of PCs is selected
by the stepwise procedure as informative predictors in
the regression equations (last column in Table 2). Differences between the stepwise and full regression are
negligible, which implies that the addition of predictors
carrying virtually no information does not deteriorate
the accuracy.
The pointwise regression clearly outperforms the regressions based on PCs: it improves the rmse by more
No. of
PCs
Rmse
stepwise
regression
Rmse
full
regression
No. of variables
in stepwise
regression
3
5
7
9
11
15
Pointwise
4.00
3.84
3.69
3.55
3.53
3.52
2.88
4.00
3.83
3.69
3.55
3.52
3.50
—
1–3
3–5
4–7
5–9
6–11
8–14
10–29
than 0.68C relative to the most successful full regression
model for 15 PCs. The saturated level of accuracy, to
which the stepwise and full regression appear to converge with the increasing number of PCs, is far below
the performance of the pointwise regression. The difference in performance of the pointwise and PC-based
regressions does not seem to be explainable by different
numbers of parameters to be estimated. The area-averaged rmse for the pointwise model with the number
of variables constrained to 12 is 2.888C, which is identical to the unconstrained pointwise regression. This
above opinion is further supported by the results of
Klein and Walsh (1983) who found the superiority of
the pointwise regression to the PC-based regression with
both models constrained to the same number of variables. The most likely reason for this superiority was
suggested by Klein and Walsh (1983): Whereas the
pointwise regression selects the grid points that maximize the explained variance of surface temperature, the
PCs are designed to maximize the height variance, and
necessarily contain some information irrelevant to the
temperature variability.
In the CCA method, the addition of more canonical
modes enhances the accuracy first, but after a certain
number of modes, the accuracy starts to decline slightly.
An example is shown in Table 3 where the mean rmse
is presented for CCA with seven PCs of predictors and
seven PCs of predictands: the accuracy is at maximum
for four modes. The dependence of the accuracy on the
number of PCs entering CCA is documented in Table
4. The higher the number of PCs, the more accurate the
TABLE 3. Areally averaged rmse (in 8C) for different numbers of
modes in CCA with 7 PCs of predictor and 7 PCs of predictand.
No. of
modes
Rmse
1
2
3
4
5
6
4.09
4.04
3.89
3.85
3.90
3.97
1736
JOURNAL OF CLIMATE
VOLUME 15
TABLE 4. Areally averaged rmse for CCA with different combinations of the numbers of predictor and predictand PCs. Results are
only shown for the number of canonical modes (indicated in parenthesis) that yields the lowest rmse. The omitted values were not
calculated.
No. of
predictor
PCs
4
7
9
15
3
5
7
9
11
15
4.12 (3)
3.98 (4)
3.89 (4)
3.70 (3)
3.65 (4)
—
4.06 (2)
3.96 (4)
3.85 (4)
3.64 (3)
3.62 (4)
—
4.04 (3)
3.92 (3)
3.78 (4)
3.62 (3)
3.63 (5)
—
—
—
—
—
—
3.55 (5)
No. of predictand PCs
downscaled values. The accuracy is affected more by
the predictor PCs than predictand PCs. The optimum
number of canonical modes for each combination of the
numbers of predictor and predictand PCs varies from
two to five (numbers in parentheses in Table 4). The
most accurate approximation, achieved for 15 predictor
PCs, 15 predictand PCs, and 5 canonical modes (rmse
of 3.558C) is, however, far from the accuracy of the
pointwise regression (rmse of 2.888C).
The SVD method appears to be the least successful
one. The lowest rmse, 3.648C, is achieved for the solution with one SVD mode; it is worse than for the
superior variant of any other method. With an increasing
number of modes, the error increases rather rapidly,
reaching rmse of 5.228C for four modes.
If the white noise addition is employed to reproduce
the variance instead of inflation, the rating of methods
remains unchanged. The only difference is a considerably weaker agreement of downscaled values with observations, which is nevertheless an expected result. For
example, the areally averaged correlation for the pointwise regression of HGT and TEM drops from 0.821 to
0.718, and rmse increases from 2.888 to 3.628C.
d. Sensitivity to the domain size: Predictors
To examine the effect the size of the domain from
which predictors are selected has on the accuracy of
downscaled values, the calculations are repeated for predictors defined over a smaller area on a denser grid. For
that purpose we use the grid with a 58 3 58 spacing,
extending from 408 to 608N and from 158W to 258E.
The effect on the results comes mainly from the changed
spatial extent of the domain, although the grid density
FIG. 5. Difference in percentage of variance explained between
small and large predictor domain for pointwise regression and HGT
and TEM predictors. Positive values (crosses as station symbols)
indicate that downscaling from the small domain is more accurate.
Negative values (squares as station symbols) indicate that downscaling is less accurate.
is different as well. The latter effect is, however, of little
importance because of high correlation among grid
points. The results are shown in Table 5 for three methods. The number of predictor PCs for a small domain,
5, was selected so that it accounts for approximately the
same share of variance as 11 PCs for a large domain.
The domain size does not affect areally averaged results of the pointwise regression. Although distant grid
points are selected in regression equations quite commonly, they are successfully replaced with closer grid
points that appear to carry almost identical information.
The methods utilizing PCs as predictors behave in a
different way: the approximation is better if predictors
are taken from a smaller area. This is because in the
PCs on a larger domain, due to their larger spatial extent,
the relevant information is more likely to be diluted with
noise from grid points (frequently remote) irrelevant to
temperature at a particular station. The implicit inclusion of such grid points in the regression model cannot
be avoided when PCs are regressed, but they are ignored
in building the pointwise regression model.
Although the areally averaged error of the pointwise
regression method is almost identical for both domains,
nonnegligible differences exist at individual stations
(Fig. 5). In terms of the percentage of variance explained, the difference varies from 25.8% at Holešov,
TABLE 5. Comparison of rmse (in 8C) between large and small predictor domains for pointwise regression, full regression, and CCA, all
for HGT and TEM as predictors. For CCA, the number of predictor/predictand PCs is shown.
Pointwise regression
No. of PCs
No. of modes
Rmse (8C)
Full regression
CCA
Large
Small
Large
Small
Large
Small
—
—
2.88
—
—
2.89
11
—
3.52
5
—
3.34
11/9
5
3.63
5/9
4
3.46
1 JULY 2002
1737
HUTH
large and small networks: Hradec Králové belongs to
the central Czech stations, the other three stations being
in Belgium. The smaller predictand domain results in a
slightly better approximation than the large one, even
if one PC only is involved; however, the performance
of CCA still remains far behind pointwise regression.
The inclusion of information on the west–east temperature contrast, contained in the second PC, has a very
small effect on the accuracy of downscaled values.
FIG. 6. Station locations in dense networks over small domains:
Belgium and central Czech Republic. Stations indicated by stars are
common with the large network.
Czech Republic, to 14.3% at Teplice, Czech Republic
(the difference is positive for small domain yielding
more accurate results). We can observe a tendency for
the small domain to give better results at more elevated
stations, whereas in lowlands, the large domain is preferable, although there are notable exceptions. For example, at lowland stations in eastern Germany and at
Teplice (lowland station in northwest Czech Republic),
the small domain is to be preferred.
e. Sensitivity to the domain size: Predictands
Unlike multiple regression, which calculates the
downscaled values at individual stations independently,
the CCA method (as well as SVD, which is not discussed here because of its poorer performance) treats
the predictand variable as a field. Therefore, its performance may depend on the size of the domain on which
the predictand is defined and on the density of stations.
To examine this, the calculations are performed for 2
dense networks of stations over small domains: 9 stations in the center of the Czech Republic, and 10 stations
in Belgium; their location is shown in Fig. 6.
The most important difference from the large domain
is in the number of PCs necessary to explain the temperature variance. Whereas the first PC over the large
domain explains about three-fourths of the total variance, over the small domains it explains 96%–97%. It
is loaded highly positively everywhere, while the second PC (accounting for about 1% of total variance in
the central Czech Republic and about 1.7% in Belgium)
carries the information on a west–east temperature contrast. Table 6 compares the performance between the
predictand domains for the stations included both in the
5. Temporal and spatial structure
Both temporal and spatial structures were examined
for the time series obtained from statistical models built
on the whole analyzed period, that is, without cross
validation. The temporal structure of downscaled temperatures is characterized by their persistence (lag-1 autocorrelations); the spatial structure is examined by
means of correlation maps and by dividing the domain
into regions using principal component analysis.
a. Persistence
First let us examine the area-averaged persistence,
shown in Table 7 together with deviations from its observed magnitudes. Among all the methods considered,
the pointwise regression with inflation yields the areaaveraged persistence closest to the observations. If the
white noise addition is used to reproduce variance, the
day-to-day variability is enhanced considerably, resulting in unrealistically low lag-1 autocorrelations. All other methods overestimate the persistence. The inclusion
of higher-order PCs of predictors, which tend to describe
processes on smaller spatial and temporal scales, leads
to a slight decrease in the area-averaged persistence:
this holds both for the full regression and CCA. The
effect of including higher-order PCs of predictands and
higher-order canonical modes in the CCA method is not
so straightforward: some PCs and modes are conducive
to a decrease of the overall persistence, others to its
increase. The pointwise regression with inflation is best
also in approximating the spatial distribution of persistence (not shown).
b. Interstation correlations
Thirty-nine sets of correlation maps could have been
produced in order to verify the ability of downscaling
TABLE 6. Comparison of percentage of explained variance for CCA between the large and small predictand domains for selected stations.
HGT and TEM are used as predictors. Methods are described by the number of predictor/predictand PCs and the number of canonical modes.
Pointwise regression is shown for reference.
Domain
Large
Small
Small
—
Method
CCA, 11/7 PCs, 4 modes
CCA, 11/1 PCs, 1 mode
CCA, 11/2 PCs, 1 mode
Pointwise regression
Hradec Králové
(CZ)
47.4
51.8
51.7
65.4
Koksijde
(BE)
Deurne
(BE)
St. Hubert
(BE)
50.9
53.7
53.5
66.9
52.8
56.5
56.7
67.0
62.2
62.9
63.7
74.6
1738
JOURNAL OF CLIMATE
TABLE 7. Lag-1 autocorrelations (31000) averaged over all stations
for different downscaling methods (persistence) and their difference
from observations (bias).
Method
Persistence
Bias
Observed
Pointwise regression, inflation
Pointwise regression, white noise
Full regression, 3 PCs
Full regression, 5 PCs
Full regression, 7 PCs
Full regression, 11 PCs
CCA, 3/9 PCs, 3 modes
CCA, 5/9 PCs, 3 modes
CCA, 7/9 PCs, 4 modes
CCA, 11/9 PCs, 4 modes
CCA, 11/4 PCs, 4 modes
CCA, 11/7 PCs, 4 modes
CCA, 11/9 PCs, 4 modes
CCA, 11/9 PCs, 1 mode
CCA, 11/9 PCs, 2 modes
CCA, 11/9 PCs, 3 modes
CCA, 11/9 PCs, 4 modes
CCA, 11/9 PCs, 5 modes
CCA, 11/9 PCs, 6 modes
CCA, 11/9 PCs, 7 modes
863
855
656
939
939
922
909
889
889
884
883
899
905
883
885
875
888
883
904
900
898
—
28
2207
176
176
159
146
126
126
121
120
136
142
120
122
112
125
120
141
137
135
methods to reproduce the spatial structure of daily temperature fields. Here we present the maps of correlations
with Norderney, the station located at the northwest corner of the domain, for observations and six downscaling
methods (Fig. 7). Correlation maps based on other stations lead to analogous results.
In observations, correlations tend to gradually decrease with an increasing distance. Some irregularities
occur in mountainous areas where close lowland (valley) and elevated (mountain) stations exhibit considerable differences in their link with the distant station
of Norderney. Three such pairs of stations are listed in
Table 8: Zürich and Säntis in Switzerland, Poprad and
Štrbské Pleso in Slovakia, and Teplice and Milešovka
in the Czech Republic. In the first two pairs, stations
with lower elevation are correlated more strongly with
Norderney. In contrast, in the Czech pair, the lowland
Teplice station is more decoupled from surrounding stations because of its isolated position in a relatively deep
valley with frequent temperature inversions, which inhibit surface conditions from being influenced by the
free atmosphere.
The basic geographical features in the field of spatial
correlations are reproduced by all the downscaling
methods: the decrease with increasing distance, low values on the eastern and southern flanks of the domain
(stations Košice, Slovakia, and Klagenfurt, Austria) and
at Austrian and Swiss mountain stations (Feurkogel,
Sonnblick, Säntis), and relatively high values at lowland
and valley stations in the Alpine region (Basel, Zürich,
Davos, Reutte). If only one predictor PC or canonical
mode is used in the statistical model, all stations undergo
identical temporal variations, and correlations between
all station pairs are equal to 1. Adding more PCs and
VOLUME 15
canonical modes to the downscaling equations results
in lowering the magnitude of correlations. However, the
decline of correlations with distance is too flat in the
neighborhood of the base station because of a dominance of one PC/mode and little effect of other PCs/
modes, as can be seen in Fig. 7 for both the full regression and CCA. This flatness of the correlation field
is extreme for the full regression of 3 PCs and still
strongly pronounced for the full regression of 11 PCs.
All the methods that employ the variance inflation
overestimate spatial correlations at all stations. This is
in accord with the findings of Easterling (1999) for
monthly values. The overestimation is most severe for
the full regression of 3 PCs: even rather remote stations
of Teplice and Milešovka correlate with Norderney by
as much as 0.999 (see Table 8). Spatial correlations are
reproduced most closely by CCA and the correspondence is better for more modes involved. CCA with
seven modes outperforms the pointwise regression everywhere except for the seven stations nearest to Norderney where the regression is marginally better. The
differences between the paired lowland/mountain stations are reproduced by all methods to a satisfactory
degree for the Swiss and Slovak stations (Table 8). For
the Czech pair, only CCA with seven modes captures
its different behavior.
The pointwise regression with white noise underestimates spatial correlations at the majority of stations.
The underestimation is largest at sites close to the base
station. A slight overestimation only appears at four
stations with the largest share of variance explained,
that is, with the smallest amounts of added white noise.
It is only these four stations (Säntis, Davos, Sonnblick,
and Feurkogel) where the overestimation of correlations
produced by the pointwise regression is not outweighed
by the added white noise (which is uncorrelated among
stations).
c. Regionalization
One of the common tasks in climatological research
is to divide a research domain into regions within which
the analyzed climate element varies similarly in time
(this is frequently referred to as regionalization). In this
section we examine to what extent the divisions of stations based on the time course of daily temperature agree
between observations and the downscaling methods. We
employ the rotated principal component analysis as a
regionalization tool (Richman and Lamb 1985; Gong
and Richman 1995). In this method, each PC is identified with one region and the stations are assigned to
that region for whose PC they have the highest loading.
PCA is based on the correlation matrix and the ‘‘Direct
Oblimin’’ method of oblique rotation was used.
Results of regionalization are shown in Fig. 8. Four
regions are identified in observations. The largest region
encompasses the lowlands and plains in the northwest
up to Bavaria (the southeastern part of Germany) and
1 JULY 2002
HUTH
1739
FIG. 7. Spatial correlations with the station of Norderney for observations and six downscaling
methods. In the map for observations, Norderney is localized with a square, and the stations,
referred to in Table 8, have gray stars.
the western part of the Czech Republic. The second and
third region involve more continental stations in a more
hilly terrain in the east and southeast (Slovakia, east of
the Czech Republic, eastern Austria), and in the south
(southern Germany, Switzerland). A separate group (although not geographically coherent) is composed of the
five most elevated stations: Säntis and Davos in Swit-
zerland, Sonnblick and Feurkogel in Austria, and
Štrbské Pleso in Slovakia.
The full regression of three predictor PCs does not
provide any classification at all: the same PC dominates
at all stations; that is, the stations are assigned to a single
region covering the whole domain (not shown). The full
regression of 11 predictor PCs produces 3 regions: the
1740
JOURNAL OF CLIMATE
VOLUME 15
TABLE 8. Correlations (31000) with Norderney (Germany) for pairs of close mountain/lowland stations in observations and selected
downscaling methods.
Observed
Pointwise regression, inflation
Pointwise regression, white
noise
Full regression, 3 PCs
Full regression, 11 PCs
CCA, 1 mode
CCA, 11/9 PCs, 4 modes
CCA, 11/9 PCs, 7 modes
Zürich
(CH)
Säntis
(CH)
Poprad
(SK)
Štrbské Pleso
(SK)
Teplice
(CZ)
Milešovka
(CZ)
704
829
494
631
657
804
543
650
728
916
827
929
653
978
896
1000
752
736
521
889
756
1000
543
529
589
922
848
1000
751
744
542
880
710
1000
586
570
633
999
971
1000
949
772
727
999
963
1000
924
916
elevated stations are correctly grouped together, but the
other 2 regions do not correspond to observations; the
large one (denoted by circles) involves all remaining
stations but 3 in the southeast and east (denoted by
crosses). The large region is oversized because of the
overestimation of correlations between stations across
the whole domain, and especially in its northwestern
part, as shown in Fig. 7. The pointwise regression yields
the same regionalization regardless of what method of
variance reproduction is used (with an exception of sta-
FIG. 8. Assignment of stations to regions (denoted by crosses, circles, triangles, and stars) for
observations and selected downscaling methods.
1 JULY 2002
HUTH
tion Holešov, which is assigned to the eastern region
after inflation but to the large northwestern region after
white noise addition). Only three regions are produced
by that method, the large one (denoted by circles) joining the observed large northwest region, the southern
region, and a part of the eastern region. The Štrbské
Pleso station is incorrectly assigned to the eastern region, not among the elevated stations. The best regionalization is achieved by CCA, resulting in four regions
for both four and seven canonical modes. If four modes
are retained in CCA, the border between the northwestern and eastern regions is placed too far east, and
Štrbské Pleso is misassigned. CCA with seven modes
produces regionalization that is identical with observations.
6. Discussion and conclusions
The main conclusions of this paper can be summarized in the following items. They apply to central and
western Europe in winter; the potential for their validity
to be extended to other areas should be proved. Nevertheless, it is likely that in general they are transferable
at least to northern midlatitude continental areas.
1) Large-scale free-atmosphere temperature variables
are more informative predictors of local surface daily
mean temperature than large-scale circulation fields.
The best results are achieved if a pair of predictors,
one temperature and one circulation based, is used.
2) In the accuracy of specification (regardless of whether expressed in terms of rmse, mean absolute error,
or correlation coefficient), the stepwise regression of
gridded data (referred to as ‘‘pointwise regression’’)
is by far the best. It outperforms the regression of
predictors’ principal components (both with and
without stepwise screening), canonical correlation
analysis, as well as singular value decomposition.
Worth mentioning is the absence of overfitting in
most statistical models after higher-order PCs, frequently suspected for being little important and containing noise rather than relevant signal, are added.
3) The magnitude and spatial distribution of lag-1 autocorrelations are best approximated by the pointwise linear regression. All the other methods overestimate the persistence. The situation may be even
worse for summer temperatures, for which the pointwise linear regression itself produces time series that
are too persistent (Huth et al. 2001).
4) The spatial structure of the surface temperature field
is characterized in two ways: by correlation maps
and by dividing the analyzed area into regions within
which temperature covaries similarly. In both respects, CCA appears to be the best method. Nonetheless, all the downscaling methods (including
CCA) overestimate the spatial correlations.
5) The effect of the size of the domain on which predictors are defined is only marginal. A denser net-
1741
work of stations on a smaller domain tends to result
in a more accurate specification if the CCA method
is applied. The density of stations and the size of
predictand’s domain have naturally no effect if the
pointwise regression is used.
6) A comparison of two approaches to reproducing the
original variance in downscaled series indicates that
the variance inflation performs better than the addition of white noise. The latter results in a lower
correspondence between downscaled and observed
values, a strong underestimation of persistence, and
to a general underestimation of spatial correlations.
The final aim of downscaling studies is a construction
of site-specific time series for a future, changed climate,
with use of outputs from general circulation models
(GCMs). From this point of view, the predictors selected
for downscaling must fulfill more demands than only
to provide the best approximation of the observed predictands after the downscaling is applied to the observed
predictors. There are at least two such additional demands. First, the predictors should be well simulated by
the GCM; if they are not, we can have no confidence
in surface variables derived from them (e.g., Palutikof
et al. 1997). Second, the variables selected as predictors
must be predictors not only of natural variability within
the ranges of the present climate, but also of climate
change. This is impossible to prove on the observed
data; instead, one should resort to a GCM output and
compare the GCM-simulated climate change with that
derived from downscaling (Busuioc et al. 1999; Murphy
2000). The agreement between the two does not necessarily mean that the downscaling relationship will
hold under the changed climate and that the selected
variable(s) is (are) a good predictor of climate change
(because of the imperfection of GCM itself ); it rather
hints that the selected predictor(s) may not be a bad
choice.
The study underlines the necessity of validating more
characteristics of downscaled variables than only the
degree to which they fit observations. We show that
different downscaling methods may be most satisfactory
for different characteristics. Therefore, the selection of
the downscaling method should be suited to the aims
of the study. If, for example, the time structure is of
primary importance (including prolonged extreme
events such as cold or heat waves), which is frequently
the case of impact studies, for example, on agriculture
(Dubrovský et al. 2000), the pointwise regression method should be selected. If the aim is in regionalization,
it would perhaps be better to choose the CCA method,
although its performance in terms of correspondence
measures is not as good as that of the pointwise regression.
Acknowledgments. The study was supported by the
Grant Agency of the Czech Republic under Project 205/
99/1561 and by the Grant Academy of the Czech Acad-
1742
JOURNAL OF CLIMATE
emy of Sciences under Project A3042903. Thanks are
due to I. Auer, M. Beniston, A. Berger, A. Kästner, M.
Lapin, E. Nieplová, R. Schweitzer, C. Tricot, and M.
Wolek for the provision of the data and an assistance
in acquiring them. The NCEP–NCAR reanalyses were
obtained from the University of Colorado, Boulder, Colorado. The datasets were preprocessed by N. Klimperová.
REFERENCES
Barnett, T. P., and R. Preisendorfer, 1987: Origins and levels of monthly and seasonal forecast skill for United States surface air temperatures determined by canonical correlation analysis. Mon.
Wea. Rev., 115, 1825–1850.
Barnston, A. G., and C. F. Ropelewski, 1992: Prediction of ENSO
episodes using canonical correlation analysis. J. Climate, 5,
1316–1345.
Bretherton, C. S., C. Smith, and J. M. Wallace, 1992: An intercomparison of methods for finding coupled patterns in climate data.
J. Climate, 5, 541–560.
Busuioc, A., H. von Storch, and R. Schnur, 1999: Verification of
GCM-generated regional seasonal precipitation for current climate and of statistical downscaling estimates under changing
climate conditions. J. Climate, 12, 258–272.
Cavazos, T., 1997: Downscaling large-scale circulation to local winter
rainfall in north-eastern Mexico. Int. J. Climatol., 17, 1069–
1082.
——, 2000: Using self-organizing maps to investigate extreme climate events: An application to wintertime precipitation in the
Balkans. J. Climate, 13, 1718–1732.
Corte-Real, J., X. Zhang, and X. Wang, 1995: Downscaling GCM
information to regional scales: A non-parametric multivariate
regression approach. Climate Dyn., 11, 413–424.
Crane, R. G., and B. C. Hewitson, 1998: Doubled CO 2 precipitation
changes for the Susquehanna basin: Down-scaling from the
GENESIS general circulation model. Int. J. Climatol., 18, 65–
76.
Dubrovský, M., Z. Žalud, and M. Št’astná, 2000: Sensitivity of CERES-Maize yields to statistical structure of daily weather series.
Climatic Change, 46, 447–472.
Easterling, D. R., 1999: Development of regional climate scenarios
using a downscaling approach. Climatic Change, 41, 615–634.
Giorgi, F., and Coauthors, 2001: Regional climate information—Evaluation and projections. Climate Change 2001: The Scientific Basis, J. T. Houghton et al., Eds., Cambridge University Press, 583–
638.
Gong, X., and M. B. Richman, 1995: On the application of cluster
analysis to growing season precipitation data in North America
east of the Rockies. J. Climate, 8, 897–931.
Hanssen-Bauer, I., and E. J. Førland, 1998: Long-term trends in precipitation and temperature in the Norwegian Arctic: Can they
be explained by changes in atmospheric circulation patterns?
Climate Res., 10, 143–153.
Hewitson, B., 1994: Regional climates in the GISS general circulation
model: Surface air temperature. J. Climate, 7, 283–303.
Heyen, H., E. Zorita, and H. von Storch, 1996: Statistical downscaling
of monthly mean North-Atlantic air-pressure to sea level anomalies in the Baltic Sea. Tellus, 48A, 312–323.
Huth, R., 1999: Statistical downscaling in central Europe: Evaluation
of methods and potential predictors. Climate Res., 13, 91–101.
——, 2001: Disaggregating climatic trends by classification of circulation patterns. Int. J. Climatol., 21, 135–153.
——, J. Kyselý, and M. Dubrovský, 2001: Time structure of observed,
GCM-simulated, downscaled, and stochastically generated daily
temperature series. J. Climate, 14, 4047–4061.
Kaas, E., and P. Frich, 1995: Diurnal temperature range and cloud
VOLUME 15
cover in the Nordic countries: Observed trends and estimates
for the future. Atmos. Res., 37, 211–228.
Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–471.
Karl, T. R., W. C. Wang, M. E. Schlesinger, R. W. Knight, and D.
Portman, 1990: A method of relating general circulation model
simulated climate to the observed local climate. Part I: Seasonal
statistics. J. Climate, 3, 1053–1079.
Klein, W. H., 1962: Specification of monthly mean surface temperatures from 700 mb heights. J. Appl. Meteor., 1, 154–156.
——, and J. E. Walsh, 1983: A comparison of pointwise screening
and empirical orthogonal functions in specifying monthly surface temperature from 700 mb data. Mon. Wea. Rev., 111, 669–
673.
Michaelsen, J., 1987: Cross-validation in statistical climate forecast
models. J. Climate Appl. Meteor., 26, 1589–1600.
Murphy, J., 2000: Predictions of climate change over Europe using
statistical and dynamical downscaling techniques. Int. J. Climatol., 20, 489–501.
Noguer, M., 1994: Using statistical techniques to deduce local climate
distributions. An application for model validation. Meteor. Appl.,
1, 277–287.
O’Lenic, E. A., and R. E. Livezey, 1988: Practical considerations in
the use of rotated principal component analysis (RPCA) in diagnostic studies of upper-air height fields. Mon. Wea. Rev., 116,
1682–1689.
Palutikof, J. P., J. A. Winkler, C. M. Goodess, and J. A. Andresen,
1997: The simulation of daily temperature time series from GCM
output. Part I: Comparison of model data with observations. J.
Climate, 10, 2497–2513.
Richman, M. B., and P. J. Lamb, 1985: Climatic patterns analysis of
three- and seven-day summer rainfall in the central United States:
Some methodological considerations and a regionalization. J.
Climate Appl. Meteor., 24, 1325–1343.
Saunders, I. R., and J. M. Byrne, 1996: Generating regional precipitation from observed and GCM synoptic-scale pressure fields,
southern Alberta, Canada. Climate Res., 6, 237–249.
Schubert, S., 1998: Downscaling local extreme temperature changes
in south-eastern Australia from the CSIRO Mark2 GCM. Int. J.
Climatol., 18, 1419–1438.
——, and A. Henderson-Sellers, 1997: A statistical model to downscale local daily temperature extremes from synoptic-scale atmospheric circulation patterns in the Australian region. Climate
Dyn., 13, 223–234.
Solman, S. A., and M. N. Nuñez, 1999: Local estimates of global
climate change: A statistical downscaling approach. Int. J. Climatol., 19, 835–861.
Trigo, R. M., and J. P. Palutikof, 1999: Simulation of daily temperatures for climate change scenarios over Portugal: A neural network model approach. Climate Res., 13, 45–59.
von Storch, H., 1999: On the use of ‘‘inflation’’ in statistical downscaling. J. Climate, 12, 3505–3506.
Widmann, M., and C. Schär, 1997: A principal component and longterm trend analysis of daily precipitation in Switzerland. Int. J.
Climatol., 17, 1333–1356.
Wilby, R. L., T. M. L. Wigley, D. Conway, P. D. Jones, B. C. Hewitson, J. Main, and D. S. Wilks, 1998: Statistical downscaling
of general circulation model output: A comparison of methods.
Water Resour. Res., 34, 2995–3008.
——, L. E. Hay, and G. H. Leavesley, 1999: A comparison of downscaled and raw GCM output: Implications for climate change
scenarios in the San Juan River basin, Colorado. J. Hydrol., 225,
67–91.
Winkler, J. A., J. P. Palutikof, J. A. Andresen, and C. M. Goodess,
1997: The simulation of daily temperature time series from GCM
output. Part II: Sensitivity analysis of an empirical transfer function methodology. J. Climate, 10, 2514–2532.
Zorita, E., and H. von Storch, 1999: The analog method as a simple
statistical downscaling technique: Comparison with more complicated methods. J. Climate, 12, 2474–2489.