Supporting Materials (Text) for “The Hindcast Skill of the CMIP
Ensembles for the Surface Air Temperature Trend”
Koichi Sakaguchi1, Xubin Zeng1, and Michael A. Brunke1
Department of Atmospheric Sciences, University of Arizona, Physics-Atmospheric Sciences
Bldg, 1118 E. 4th St., Tucson, AZ, 85721, USA
July 10, 2012
for Journal of Geophysical Research - Atmospheres
Table S1. Modeling centers and references for the models used in this study.

CMIP3
  Center^a                                   Model            Reference
  NOAA/GFDL                                  CM2.0            Delworth et al., 2006
  NASA/GISS                                  E-R              Schmidt et al., 2006
  MIUB/KMA                                   ECHO-G           Roeckner et al., 1996; Wolff et al., 1997
  CCSR (University of Tokyo)/NIES/JAMSTEC    MIROC3.2 medres  K-1 Model Developers, 2004
  MOHC                                       HadGEM1          Johns et al., 2006; Martin et al., 2006
  MRI                                        CGCM2.3.2        Yukimoto et al., 2006a; 2006b
  NCAR                                       CCSM3            Collins et al., 2006

CMIP5
  Center^a                                   Model            Reference
  NOAA/GFDL                                  CM3              Donner et al., 2011; Griffies et al., 2011
  NASA/GISS                                  E2-R             http://data.giss.nasa.gov/modelE/ar5/
  MPI                                        ESM-LR           Roeckner et al., 2003; Bathiany et al., 2010
  AORI (University of Tokyo)/NIES/JAMSTEC    MIROC-ESM        Watanabe et al., 2011
  MOHC                                       HadGEM2-ES       Collins et al., 2011; Martin et al., 2011
  MRI                                        CGCM3            Yukimoto et al., 2011
  NCAR                                       CCSM4            Gent et al., 2011

^a NOAA: National Oceanic and Atmospheric Administration, GFDL: Geophysical Fluid Dynamics Laboratory, NASA: National Aeronautics and Space Administration, GISS: Goddard Institute for Space Studies, MIUB: Meteorological Institute of the University of Bonn, KMA: Korea Meteorological Administration, CCSR: Center for Climate System Research, NIES: National Institute for Environmental Studies, JAMSTEC: Japan Agency for Marine-Earth Science and Technology, MOHC: Met Office Hadley Centre, MRI: Meteorological Research Institute, NCAR: National Center for Atmospheric Research, AORI: Atmosphere and Ocean Research Institute
Text S2. Uncertainty Analysis
S2.1. Uncertainty in Performance Statistics
The performance statistics (e.g., RMSE, correlation) are calculated from the running trend time series at each grid point and each spatiotemporal scale. Their sampling uncertainty is assessed using a variance corrected for serial correlation, which reflects the inter-dependence of the overlapping moving windows. For example, the lag-1 autocorrelation of the trend series ranges from 0.78 (10-year running trend) to 0.98 (50-year running trend).
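The running-trend construction and its serial correlation can be illustrated with a short sketch. The OLS-slope-per-window definition and the synthetic AR(1) series below are assumptions for illustration only; the 0.78 and 0.98 values quoted above come from the actual observed trend series.

```python
import numpy as np

def running_trend(series, window):
    """OLS slope of a linear fit in each overlapping window of length `window`."""
    x = np.arange(window)
    return np.array([np.polyfit(x, series[i:i + window], 1)[0]
                     for i in range(len(series) - window + 1)])

def lag1_autocorr(y):
    """Sample lag-1 autocorrelation."""
    y = y - y.mean()
    return float(np.dot(y[:-1], y[1:]) / np.dot(y, y))

# Synthetic stand-in for a 150-year temperature anomaly series (AR(1))
rng = np.random.default_rng(0)
t = np.empty(150)
t[0] = rng.standard_normal()
for i in range(1, 150):
    t[i] = 0.6 * t[i - 1] + rng.standard_normal()

trends10 = running_trend(t, 10)   # 10-year running trends
trends50 = running_trend(t, 50)   # 50-year running trends
# Longer windows overlap more, so their trend series is more autocorrelated
print(lag1_autocorr(trends10), lag1_autocorr(trends50))
```

With real data, the anomaly series at each grid point would replace t.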
For correlation, we followed the effective sample size of order two as reviewed and derived in Bretherton et al. [1999] [their equation (30)]. Based on this estimated effective sample size, Fisher's Z transformation is used to construct the confidence interval and the statistical significance test for the correlation. For RMSE, the same effective-sample-size equation is used to estimate the variance of the sample mean of the squared errors as a function of the autocorrelation of the squared-error time series.
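A minimal sketch of these two steps, assuming the n(1 - r1 r2)/(1 + r1 r2) form of the effective sample size from Bretherton et al. [1999] and a standard Fisher-z interval; the sample size, autocorrelations, and correlation value fed in are illustrative.

```python
import numpy as np

def effective_n(n, r1_x, r1_y):
    """Effective sample size for the correlation of two serially correlated
    series: n_eff = n * (1 - r1_x*r1_y) / (1 + r1_x*r1_y)
    (cf. Bretherton et al. 1999)."""
    return n * (1 - r1_x * r1_y) / (1 + r1_x * r1_y)

def fisher_ci(r, n_eff, z_crit=1.96):
    """95% confidence interval for a correlation via Fisher's z-transform,
    using the effective rather than nominal sample size."""
    z = np.arctanh(r)
    se = 1.0 / np.sqrt(n_eff - 3)
    return np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)

# e.g. 100 windows of 10-year running trends with lag-1 autocorrelation 0.78
n_eff = effective_n(100, 0.78, 0.78)
lo, hi = fisher_ci(0.6, n_eff)
print(n_eff, lo, hi)   # the interval widens sharply as autocorrelation grows
```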
The variance of the sampling distribution of the Brier score (BS) is derived by Bradley et al.
[2008] (their equation (19)) and further modified by Wilks [2010] to incorporate the effect of
serial correlation:
n'/n = [1 - (1 - μx){b(1 - BS)r1}²] / [1 + (1 - μx){b(1 - BS)r1}²]        (S1)
where n is the number of samples, n' is the effective sample size, μx is the climatological probability of the event occurrence (in our case the mean probability of a positive trend over all the available time windows for a given grid point), BS is the Brier score, r1 is the lag-1 autocorrelation of the predicted probability time series, and b is a parameter set to 0.8 in this study, although ideally it would vary with the spatiotemporal scale and the associated ensemble characteristics. The variance is only weakly sensitive to b: a change of 0.1 in b changes the square root of the sample variance (i.e., the standard error of BS) by about 0.001.
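A direct transcription of the effective-sample-size ratio in equation (S1) as we read it; the placement of the squared term is an assumption from the typeset formula and should be checked against Wilks [2010].

```python
def effective_size_ratio(mu_x, bs, r1, b=0.8):
    """n'/n from our reading of equation (S1): a (1 - a)/(1 + a) deflation
    with a = (1 - mu_x) * (b * (1 - bs) * r1)**2.
    NOTE: the exponent placement is an assumption, not confirmed notation."""
    a = (1 - mu_x) * (b * (1 - bs) * r1) ** 2
    return (1 - a) / (1 + a)

# With no serial correlation (r1 = 0) there is no deflation:
print(effective_size_ratio(0.5, 0.1, 0.0))   # -> 1.0
# Deflation grows with the lag-1 autocorrelation of the forecast probabilities:
print(effective_size_ratio(0.5, 0.1, 0.9))
```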
For the rank histogram, the chi-square test for uniformity (i.e., reliable prediction) is found to be sensitive to the noise added to reflect observational uncertainty [Anderson, 1996; Candille and Talagrand, 2008], particularly at the smaller scales in our analysis. We therefore took a Monte Carlo approach, generating 1000 realizations of the rank histogram by adding different random noise to the ensemble members. The random noise for a given grid point is based on the difference between the running trend time series from HadCRUT3 and NCDC, assuming a normal distribution with zero mean (since the mean difference between the two observational datasets is substantially smaller than that between the model simulations and HadCRUT3) and with the variance of the difference of the running trend time series. This 'perturbed ensemble' approach is also used to estimate the effect of observational errors on RMSE, r, and BS, as described in the main text (Section 2).
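The perturbed-ensemble rank-histogram procedure can be sketched as follows. The ensemble size, number of windows, and noise standard deviation are hypothetical; in the study the noise variance comes from the HadCRUT3-NCDC trend differences.

```python
import numpy as np

rng = np.random.default_rng(42)

def rank_histogram(ensemble, obs):
    """Counts of the observation's rank among the ensemble members.
    ensemble: (n_times, n_members); obs: (n_times,)."""
    n_members = ensemble.shape[1]
    ranks = (ensemble < obs[:, None]).sum(axis=1)   # rank in 0..n_members
    return np.bincount(ranks, minlength=n_members + 1)

# Hypothetical setup: 5 members, 200 time windows, observational-error sigma
n_times, n_members, obs_sigma = 200, 5, 0.3
truth = rng.standard_normal(n_times)
ensemble = truth[:, None] + rng.standard_normal((n_times, n_members))
obs = truth + obs_sigma * rng.standard_normal(n_times)

# 1000 realizations of the histogram, each with fresh noise added to members
hists = np.array([
    rank_histogram(ensemble + obs_sigma * rng.standard_normal(ensemble.shape),
                   obs)
    for _ in range(1000)
])
print(hists.mean(axis=0))  # roughly flat for a reliable ensemble
```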
The critical values of the standard chi-square test assume independent samples. To take the effect of serial correlation into account, Wilks [2004] introduced additive corrections to the critical values for a given significance level and lag-1 autocorrelation (his Table 2). Since the additive corrections are given only for particular lag-1 autocorrelations (0.4, 0.5, ..., 0.9), we interpolate them to other autocorrelation values by cubic spline interpolation (as provided by MATLAB). We did not interpolate the corrections at each grid point; instead we used typical lag-1 autocorrelations for the five temporal scales in our analysis: 0.78, 0.92, 0.96, 0.97, and 0.98. This simplified correction appears reasonable, since most of the time the chi-square statistics from the Monte Carlo distribution do not exceed the critical values even without the corrections.
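The spline step might look like this, with SciPy's `CubicSpline` standing in for MATLAB's `spline`. The tabulated correction values below are placeholders, not the numbers from Wilks [2004] Table 2, which would be substituted in practice; note also that autocorrelations above 0.9 fall outside the tabulated range, so those evaluations are extrapolations.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Lag-1 autocorrelations at which the additive corrections are tabulated
rho_tab = np.array([0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
# PLACEHOLDER correction values -- substitute Wilks [2004] Table 2 here
corr_tab = np.array([0.5, 0.9, 1.5, 2.4, 3.8, 6.0])

spline = CubicSpline(rho_tab, corr_tab)

# Typical lag-1 autocorrelations of the five temporal scales in the text;
# 0.92-0.98 lie beyond the tabulated range (extrapolation)
for rho in (0.78, 0.92, 0.96, 0.97, 0.98):
    print(rho, float(spline(rho)))
```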
S2.2. Field Significance
It is desirable to test (i) whether the number of grid points (or the area) at which a performance statistic beats a given null hypothesis (such as zero correlation) at a given local significance level exceeds the number that could be obtained by chance, as estimated from a binomial distribution, and (ii) whether that count is affected by the spatial correlation of the surface air temperature anomaly [Livezey and Chen, 1983]. For the 15°x15° and smaller spatial scales, the spatial correlation of the surface air temperature anomaly is not negligible [Hansen et al., 1999], and we take it into account with a Monte Carlo approach. We resampled the HadCRUT3 data 1000 times using the moving blocks bootstrap, which preserves the spatial and temporal correlations in the data (Wilks [1997]; although in our case the lag-1 autocorrelations of the resampled data are usually lower than those of the original observations by ~0.1). This gives the distribution of the number of locally significant grid points that can be obtained by chance under similar spatial and temporal correlations. For the 5°x5°, 50-year spatiotemporal scale we were unable to produce bootstrap samples because of the higher fraction of missing data and the strong autocorrelation of the trend time series.
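A single moving-blocks-bootstrap replicate (Wilks [1997]) can be sketched as below; the block length and the AR(1) stand-in series are illustrative choices. For a gridded field, resampling the same time-blocks jointly across all grid points also preserves the spatial correlation.

```python
import numpy as np

def moving_blocks_bootstrap(data, block_len, rng):
    """One bootstrap replicate: overlapping blocks of length `block_len`
    are drawn with replacement and concatenated, preserving serial
    correlation within blocks (Wilks 1997)."""
    n = len(data)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    blocks = [data[s:s + block_len] for s in starts]
    return np.concatenate(blocks)[:n]

rng = np.random.default_rng(1)
# AR(1) series standing in for an observed running-trend time series
x = np.empty(200)
x[0] = rng.standard_normal()
for i in range(1, 200):
    x[i] = 0.8 * x[i - 1] + rng.standard_normal()

resampled = moving_blocks_bootstrap(x, block_len=20, rng=rng)
print(len(resampled))  # -> 200
```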
The null hypotheses for each statistic at the grid level are:
1) RMSE is equal to or greater than the variability of the observed running trend.
2) Correlation is equal to or smaller than zero.
3) The Brier score is equal to or smaller than that obtained from a climatological probability of a positive trend. This is tested by checking whether the Brier skill score (BSS) is equal to or smaller than zero,
BSS = (BS - BSclim) / (0 - BSclim) = 1 - BS/BSclim        (S2)
where BSclim is the BS obtained by using a constant climatological probability of the event [Wilks, 2010]. The sample variance for BSS was derived by Bradley et al. [2008] separately from that for BS, and a correction based on the autocorrelation of the probabilistic forecasts was suggested by Wilks [2010]. In general, the sample variance of BSS is less affected by the autocorrelation than that of BS.
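The BS and BSS computations, using BSS = 1 - BS/BSclim with BSclim obtained from the constant climatological probability of the event; the probabilities and outcomes below are a toy example.

```python
import numpy as np

def brier_score(p, o):
    """Mean squared difference between forecast probabilities p and
    binary outcomes o (1 = positive trend occurred)."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    return np.mean((p - o) ** 2)

def brier_skill_score(p, o):
    """BSS = 1 - BS / BSclim, with BSclim from the constant
    climatological probability of the event."""
    o = np.asarray(o, float)
    bs_clim = brier_score(np.full_like(o, o.mean()), o)
    return 1.0 - brier_score(p, o) / bs_clim

o = np.array([1, 1, 0, 1, 0, 1, 1, 0])                  # observed events
p_good = np.array([0.9, 0.8, 0.2, 0.7, 0.1, 0.9, 0.8, 0.3])
print(brier_skill_score(p_good, o))                     # positive: beats climatology
print(brier_skill_score(np.full(8, o.mean()), o))       # -> 0.0 by construction
```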
To summarize the results, field significance (at the α = 0.05 level for both local and field significance) was not obtained for the first null hypothesis, concerning RMSE, at any of the spatial scales explored here (15°x15° and smaller) for any of the temporal scales. For the correlation between the ensemble means and the HadCRUT3 trends, the CMIP5 ensemble obtained field significance for the 10-year trend at all three spatial scales, while the CMIP3 ensemble did not obtain field significance at any scale (again considering all the temporal scales and the spatial scales of 15°x15° and smaller). For BS being better than climatology, both ensembles obtain field significance at all scales except the 10-year temporal scale, at which the CMIP3 ensemble fails at all three spatial scales.
In the main text (Section 3.1) we used a simpler approach to summarize the model performance at each scale: we compared the 75th percentile across grid points for RMSE and BS (i.e., 75% of the available grid points have a smaller RMSE or BS) and the 25th percentile for correlation (i.e., 75% of the available grid points have a higher correlation) to reference values: unity for RMSE after normalization by the standard deviation of the observed running trend, 0.7 for correlation (corresponding to r² ~ 0.5), and 0.25 for BS (corresponding to the score of a random 50-50 guess). This approach is simpler and easily reproducible. Although it appears to discard the sampling uncertainty at the grid-point level, under- and overestimations of the performance statistics can largely cancel out, since a large number of grid points (75% of the available ones) is involved in summarizing the performance.
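The percentile-summary check can be written compactly; the per-grid statistics below are hypothetical random values, while the reference values 1, 0.7, and 0.25 are those quoted above.

```python
import numpy as np

def summarize(stat_per_grid, kind):
    """Compare a cross-grid percentile to the reference values in the text:
    normalized RMSE < 1 and BS < 0.25 at the 75th percentile,
    correlation > 0.7 at the 25th percentile."""
    if kind == "corr":
        return bool(np.nanpercentile(stat_per_grid, 25) > 0.7)
    ref = {"nrmse": 1.0, "bs": 0.25}[kind]
    return bool(np.nanpercentile(stat_per_grid, 75) < ref)

rng = np.random.default_rng(3)
corr = rng.uniform(0.5, 1.0, size=500)    # hypothetical per-grid correlations
nrmse = rng.uniform(0.4, 0.9, size=500)   # hypothetical normalized RMSEs
print(summarize(corr, "corr"), summarize(nrmse, "nrmse"))
```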
S2.3. Uncertainties Associated with Model Selection
We performed sensitivity tests to infer how sensitive the main results of this study are to the selection of models. First, we formed 21 different five-model ensembles (the maximum possible number of combinations of five out of seven) from seven CMIP5 models (two of the seven differ from the ensemble used for the results shown in this study). Among the 21 ensemble means, the RMSEs are visually indistinguishable, the spread in correlation is less than 0.1, and the spread in BS is less than 0.03, except for the 50-year trend at large spatial scales, where the spread can be as large as 0.1. Therefore, the deterministic performance of the ensemble mean does not appear very sensitive to model selection, but for probabilistic predictions at larger spatiotemporal scales (with higher sampling uncertainty) the model selection may have some impact. The sensitivities of the probabilistic predictions are further explored with five additional ensembles, and the results are included in Section 4 of the main text.
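Forming all C(7,5) = 21 five-model ensemble means and their RMSE spread takes only a few lines with `itertools.combinations`; the model output and observations below are synthetic stand-ins for the real trend series.

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical trend predictions from seven models over 120 time windows
models = {f"model{i}": rng.standard_normal(120) + 0.1 * i for i in range(7)}
obs = rng.standard_normal(120)

rmses = []
for combo in itertools.combinations(models, 5):   # C(7,5) = 21 ensembles
    ens_mean = np.mean([models[name] for name in combo], axis=0)
    rmses.append(float(np.sqrt(np.mean((ens_mean - obs) ** 2))))

# Spread of RMSE across the 21 five-model ensemble means
print(len(rmses), max(rmses) - min(rmses))
```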
References
Anderson, J. L. (1996), A method for producing and evaluating probabilistic forecasts from
ensemble model integrations, J. Clim., 9, 1518-1540.
Bathiany, S., M. Claussen, V. Brovkin, T. Raddatz, and V. Gayler (2010), Combined
biogeophysical and biogeochemical effects of large-scale forest cover changes in the MPI
earth system model, Biogeosciences, 7, 1383-1399, doi:10.5194/bg-7-1383-2010.
Bradley, A. A., S. S. Schwartz, and T. Hoshino (2008), Sampling uncertainty and confidence
intervals for the Brier Score and Brier Skill Score, Wea. Forecasting, 23(5), 992-1006.
Bretherton, C. S., M. Widmann, V. P. Dymnikov, J. M. Wallace, and I. Bladé (1999), The
effective number of spatial degrees of freedom of a time-varying field, J. Clim., 12, 1990-2009.
Candille, G., and Talagrand, O. (2008), Impact of observational error on the validation of
ensemble prediction systems, Q. J. R. Meteorol. Soc., 134, 959-971.
Collins, W. D., et al. (2006), The Community Climate System Model version 3 (CCSM3), J.
Clim.,19(11), 2122-2143.
Collins, W. J., et al. (2011), Development and evaluation of an Earth-System model –
HadGEM2, Geosci. Model Dev., 4, 1051-1075, doi:10.5194/gmd-4-1051-2011.
Delworth, T. L., et al. (2006), GFDL’s CM2 global coupled climate models. Part I: Formulation
and simulation characteristics, J. Clim., 19(5), 643-674.
Donner, L. J., et al. (2011), The dynamical core, physical parameterizations, and basic simulation
characteristics of the atmospheric component AM3 of the GFDL global coupled model
CM3, J. Clim., 24(13), 3484-3519.
Gent, P. R., et al. (2011), The Community Climate System Model version 4, J. Clim., 24, 4973-4991.
Griffies, S. M., et al. (2011), The GFDL CM3 coupled climate model: Characteristics of the
ocean and sea ice simulations, J. Clim., 24(13), 3520-3544.
Hansen, J., R. Ruedy, J. Glascoe, and M. Sato (1999), GISS analysis of surface temperature
change, J. Geophys. Res., 104(D24), 30,997-31,022
Johns, T. C., et al. (2006), The new Hadley Centre climate model (HadGEM1): Evaluation of
coupled simulations, J. Clim., 19, 1327-1353.
Jones, C. D., et al. (2011), The HadGEM2-ES implementation of CMIP5 centennial simulations,
Geosci. Model Dev., 4, 543-570, doi:10.5194/gmd-4-543-2011.
K-1 Model Developers (2004), K-1 Coupled Model (MIROC) Description, edited by H. Hasumi
and S. Emori, Center for Climate System Research, University of Tokyo, Tokyo, Japan.
Livezey, R. E., and W. Y. Chen (1983), Statistical field significance and its determination by
Monte Carlo techniques, Mon. Wea. Rev., 111, 46-59.
Martin, G. M., M. A. Ringer, V. D. Pope, A. Jones, C. Dearden, and T. J. Hinton (2006), The
physical properties of the atmosphere in the new Hadley Centre Global Environmental
Model, HadGEM1. Part I: Model description and global climatology, J. Clim., 19, 12741301.
Martin, G. M., et al. (2011), The HadGEM2 family of Met Office Unified Model climate
configurations, Geosci. Model Dev., 4, 723-757, doi:10.5194/gmd-4-723-2011.
Roeckner, E., et al. (1996), The atmospheric general circulation model ECHAM4: Model
description and simulation of present-day climate, MPI report No.218, Max-Planck-Institut
für Meteorologie, Hamburg, Germany.
Roeckner, E., et al. (2003), The atmospheric general circulation model ECHAM5. Part I: Model
description, MPI report No.349, Max-Planck-Institut für Meteorologie, Hamburg, Germany.
Schmidt, G. A., et al. (2006), Present-day atmospheric simulations using GISS ModelE:
Comparison to in situ, satellite, and reanalysis data, J. Clim., 19(2), 153-192.
Watanabe, S., et al. (2011), MIROC-ESM 2010: model description and basic results of
CMIP5-20c3m experiments, Geosci. Model Dev., 4, 845-872, doi:10.5194/gmd-4-845-2011.
Wilks, D. S. (1997), Resampling hypothesis tests for autocorrelated fields, J. Clim., 10(1), 65-82.
Wilks, D. S. (2004), The minimum spanning tree histogram as a verification tool for
multidimensional ensemble forecasts, Mon. Wea. Rev., 132, 1329-1340.
Wilks, D. S. (2010), Sampling distributions of the Brier score and Brier skill score under serial
dependence, Q. J. R. Meteorol. Soc., 136, 2109-2118.
Wolff, J.-O., E. Maier-Reimer, and S. Lebutke (1997), The Hamburg Ocean Primitive Equation
Model, DKRZ Technical Report No. 13, Deutsches KlimaRechenZentrum, Hamburg,
Germany.
Yukimoto, S., A. Noda, T. Uchiyama, S. Kusunoki, and A. Kitoh (2006a), Climate changes of
the twentieth through twenty-first centuries simulated by the MRI-CGCM2.3, Pap. Meteor.
Geophys., 56, 9-24.
Yukimoto, S., et al. (2006b), Present-day climate and climate sensitivity in the Meteorological
Research Institute Coupled GCM version 2.3 (MRI-CGCM2.3), J. Meteor. Soc. Japan, 84(2),
333-363.
Yukimoto, S., et al. (2011), Meteorological Research Institute-Earth System Model version 1
(MRI-ESM1), Technical Report of the Meteorological Research Institute, No. 64,
Meteorological Research Institute, Tsukuba, Ibaraki, Japan.
Figure Captions
Figure S1. Time series of the global mean running linear trend from the CMIP3 and CMIP5 models and two observational datasets. The top two panels show 10-year trends, and the bottom two panels show 50-year trends. The left and right columns show the CMIP3 and CMIP5 simulations, respectively.
Figure S2. Spatial distribution of RMSE against the HadCRUT3 SAT trend. The x-axis shows the number of years in the linear trend, grouped into eight different spatial scales (labeled at the top of each panel with the same notation as Fig. 1). The edges of the boxes represent the 25th and 75th percentiles of the statistics from all grid points (black: NCDC, green: CMIP5-EM). The medians are shown by lines as indicated in the legend. The dashed lines in the global-average subpanels show the 95% confidence intervals.