A Stochastic Nonparametric Framework
for Ensemble Hydrologic Forecast and
Downscaling
Balaji Rajagopalan
Department of Civil and Environmental Engineering
University of Colorado
Boulder, CO
RSMAS/U Miami – Spring 2003
Acknowledgements
Upmanu Lall
(Lamont-Doherty Earth Observatory)
Columbia University, NY
James Prairie, Katrina Grantz, Somkiat
Apipattanavis
Subhrendu Gangopadhyay, Martyn Clark
CIRES/University of Colorado, Boulder, CO
David Yates
NCAR/University of Colorado, Boulder, CO
A Water Resources Management Perspective
Time Horizon: Inter-decadal (Climate) to Hours (Weather)
Decision Analysis: Risk + Values
• Facility Planning
– Reservoir, Treatment Plant Size
• Policy + Regulatory Framework
– Flood Frequency, Water Rights, 7Q10 flow
• Operational Analysis
– Reservoir Operation, Flood/Drought Preparation
• Emergency Management
– Flood Warning, Drought Response
Data: Historical, Paleo, Scale, Models
Ensemble Forecast (or Scenarios
generation)
• Scenarios (synthetic sequences) of hydroclimate are simulated
for various decision making situations
Reservoir operations (USBR/Riverware)
Erosion Prediction (USDA/WEPP)
Reservoir sizing (Flood frequency)
• Given [Yt] t = 1,2,…,N hydroclimate time series (e.g. daily
weather variables, streamflow, etc.)
Parametric models are fit
(probability density functions – Gamma, Exponential etc.)
Time series Models
(Auto Regressive Models)
The Problem
• Ensemble Forecast/Stochastic Simulation
/Scenarios generation – all of them are
conditional probability density function
problems
f  yt y , y ,..., y  
t 1
t 2
t p
f ( yt , yt 1 , yt  2 ,..., yt  p )
 f ( y , y , y ,..., y ) dy
• Estimate conditional PDF and simulate
(Monte Carlo, or Bootstrap)
t
t 1
t 2
t p
t
All India Monthly Rainfall
Parametric Models
• Periodic Auto Regressive model (PAR)
– Linear lag(1) model
y, =   + 1   y   – 1 –  – 1  +   ,
– Stochastic Analysis, Modeling, and Simulation (SAMS)
(Salas, 1992)
• Data must fit a Gaussian distribution
• Expected to preserve
– mean, standard deviation, lag(1) correlation
– skew dependent on transformation
– Gaussian probability density function
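A minimal sketch of simulating from a lag-1 periodic autoregressive model, assuming Gaussian data. It is written in standardized form, which matches the linear lag(1) model when the lag coefficient for a season equals the seasonal lag-1 correlation times the ratio of the standard deviations of the current and previous seasons. The function name `simulate_par1` and its arguments are hypothetical and are not taken from SAMS.

```python
import random


def simulate_par1(mu, sigma, rho, n_years, seed=0):
    """Simulate a periodic AR(1) series (illustrative sketch).

    mu[s], sigma[s]: mean and standard deviation of season s.
    rho[s]: lag-1 correlation between season s and the season before it.
    Returns a list of n_years lists, one value per season.
    """
    rng = random.Random(seed)
    n_seasons = len(mu)
    z_prev = 0.0  # standardized value of the previous season
    years = []
    for _ in range(n_years):
        year = []
        for s in range(n_seasons):
            eps = rng.gauss(0.0, 1.0)
            # Standardized AR(1) step with a season-dependent correlation;
            # the noise scaling keeps the variance of z equal to one.
            z = rho[s] * z_prev + (1.0 - rho[s] ** 2) ** 0.5 * eps
            year.append(mu[s] + sigma[s] * z)
            z_prev = z
        years.append(year)
    return years
```

By construction the simulated series can reproduce only the seasonal means, standard deviations, and lag-1 correlations, which is the limitation the nonparametric methods are meant to address.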
Parametric Models - Drawbacks
• Model selection / parameter estimation issues
Select a model (PDFs or Time series models)
from candidate models
Estimate parameters
• Limited ability to reproduce nonlinearity and non-Gaussian features.
All the parametric probability distributions are
‘unimodal’
All the parametric time series models are ‘linear’
Parametric Models - Drawbacks
• Models are fit on the entire data set
Outliers can inordinately influence parameter
estimation
(e.g. a few outliers can influence the mean,
variance)
In a Mean Squared Error sense the models are
optimal, but locally they can be very poor.
Not flexible
• Not Portable across sites
Nonparametric Methods
• Any functional (probability density, regression etc.)
estimator is nonparametric if:
It is “local” – estimate at a point depends only on
a few neighbors around it.
(effect of outliers is removed)
No prior assumption of the underlying functional
form – data driven
Nonparametric Methods
• Kernel Estimators
(properties well studied)
• Splines
• Multivariate Adaptive Regression Splines (MARS)
• K-Nearest Neighbor (K-NN) Bootstrap Estimators
• Locally Weighted Polynomials (K-NN Polynomials)
K-NN Philosophy
• Find K-nearest neighbors to the desired point x
• Resample the K historical neighbors (with high
probability to the nearest neighbor and low
probability to the farthest) → Ensembles
• Weighted average of the neighbors → Mean Forecast
• Fit a polynomial to the neighbors – Weighted Least
Squares
– Use the fit to estimate the function at the desired point x
(i.e. local regression)
• Number of neighbors K and the order of polynomial
p are obtained using GCV (Generalized Cross
Validation); K = N and p = 1 → linear modeling
framework.
K-Nearest Neighbor Estimators
A k-nearest neighbor density estimate:
$$f_{NN}(x) = \frac{k/n}{V_k(x)} = \frac{k/n}{c_d\, r_k^d(x)}$$
$$f_{GNN}(x) = \frac{1}{r_k^d(x)\, n} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{r_k(x)}\right)$$
A conditional k-nearest neighbor density estimate:
$$f_{GNN}(x \mid D) = \frac{f(x, D)}{f(D)} = \frac{(r_k(x, D)\, n)^{-1} \sum_{i=1}^{n} K\!\left(\frac{(x, D) - (x_i, D_i)}{r_k(x, D)}\right)}{(r_k(D)\, n)^{-1} \sum_{i=1}^{n} K\!\left(\frac{D - D_i}{r_k(D)}\right)}$$
f(.) is continuous on R^d, locally Lipschitz of order p
$k(n) = O(n^{2p/(d+2p)})$
A k-nearest neighbor (modified Nadaraya-Watson) conditional mean estimate:
$$K(u) \ge 0, \quad \int u K(u)\, du = 0, \quad \int u^2 K(u)\, du < \infty$$
$$\hat{m}(x \mid D = D^*) = \frac{\sum_{i=1}^{n} x_i\, K\!\left(\frac{D^* - D_i}{r_k(D^*)}\right)}{\sum_{i=1}^{n} K\!\left(\frac{D^* - D_i}{r_k(D^*)}\right)}$$
Classical Bootstrap (Efron):
Given x_1, x_2, ..., x_n are i.i.d. random variables with a cdf F(x)
Construct the empirical cdf
$$\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \le x)$$
Draw a random sample with replacement of size n from $\hat{F}(x)$
Moving Block Bootstrap (Kunsch, Hall, Liu & Singh) :
Resample independent blocks of length b<n, and paste them together to form a series of
length n
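A short sketch of the two resampling schemes just described, using only the Python standard library. The function names are illustrative, and overlapping blocks with uniformly chosen start positions are one common variant of the moving block bootstrap, assumed here.

```python
import random


def classical_bootstrap(x, rng):
    """Draw len(x) values from x with replacement (i.i.d. resampling)."""
    return [rng.choice(x) for _ in x]


def moving_block_bootstrap(x, b, rng):
    """Paste together randomly chosen blocks of length b until n values exist."""
    n = len(x)
    out = []
    while len(out) < n:
        start = rng.randrange(n - b + 1)  # block start, chosen uniformly
        out.extend(x[start:start + b])
    return out[:n]  # trim the last block if it overshoots
```

The classical version destroys serial dependence entirely, while the block version keeps it within each block but breaks it at the joins.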
k-Nearest Neighbor Conditional Bootstrap (Lall and Sharma, 1996)
Construct the Conditional Empirical Distribution Function:
$$\hat{F}(x \mid D^*) = \sum_{i=1}^{n} I(x_i \le x)\, I(D_i \in B_{r_k}(D^*))\, K(i) / k$$
Draw a random sample with replacement from $\hat{F}(x \mid D^*)$
Logistic Map Example
A time series from the model
$$x_{t+1} = 1 - 4(x_t - 0.5)^2$$
[Figure: time series of x_t against time (0 to 125) with a 4-state Markov Chain discretization (States 1 to 4), and x_{t+1} against x_t showing k-nearest neighborhoods A and B for x_t = x*_A and x*_B respectively, with neighbors D_1, D_2, D_3 marked.]
Define the composition of the "feature vector" D_t of dimension d.
(1) Dependence on two prior values of the same time series.
$D_t : (x_{t-1}, x_{t-2})$ ; d = 2
(2) Dependence on multiple time scales (e.g., monthly+annual)
$D_t : (x_{t-\tau_1}, x_{t-2\tau_1}, \ldots, x_{t-M_1\tau_1};\ x_{t-\tau_2}, x_{t-2\tau_2}, \ldots, x_{t-M_2\tau_2})$ ; d = M_1 + M_2
(3) Dependence on multiple variables and time scales
$D_t : (x1_{t-\tau_1}, \ldots, x1_{t-M_1\tau_1};\ x2_t, x2_{t-\tau_2}, \ldots, x2_{t-M_2\tau_2})$ ; d = M_1 + M_2 + 1
Identify the k nearest neighbors of D_t in the data D_1 ... D_n
Define the kernel function (derived by taking expected values of distances to each of the k nearest neighbors, assuming the number of observations of D in a neighborhood B_r(D*) of D*, with r → 0 as n → ∞, is locally Poisson with rate λ(D*))
$$K(j) = \frac{1/j}{\sum_{i=1}^{k} 1/i}, \quad j = 1, \ldots, k$$
for the jth nearest neighbor
Selection of k: GCV, FPE, Mutual Information, or rule of thumb ($k = n^{0.5}$)
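The steps above can be sketched for the simplest case, a lag-1 feature vector D_t = x_{t-1}. This is a hedged illustration rather than the published implementation: the function names are hypothetical, distance is the absolute difference, and k defaults to the square-root rule of thumb.

```python
import random


def knn_weights(k):
    """Kernel K(j) = (1/j) / sum_{i=1..k} (1/i) for the j-th nearest neighbor."""
    inv = [1.0 / j for j in range(1, k + 1)]
    total = sum(inv)
    return [w / total for w in inv]


def knn_bootstrap_series(x, n_steps, k=None, seed=0):
    """Simulate n_steps values by resampling successors of nearest neighbors."""
    rng = random.Random(seed)
    n = len(x)
    if k is None:
        k = max(1, round(n ** 0.5))  # rule of thumb k = sqrt(n)
    k = min(k, n - 1)  # only points with a successor are candidates
    weights = knn_weights(k)
    current = x[rng.randrange(n)]
    sim = []
    for _ in range(n_steps):
        # Rank historical states (those with a successor) by distance.
        order = sorted(range(n - 1), key=lambda i: abs(x[i] - current))
        neighbors = order[:k]
        # Pick one neighbor, favoring the nearest, and take its successor.
        chosen = rng.choices(neighbors, weights=weights, k=1)[0]
        current = x[chosen + 1]
        sim.append(current)
    return sim
```

Every simulated value is an observed value, so this basic form cannot generate magnitudes that were never recorded.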
Applications to date….
• Monthly Streamflow Simulation
• Space and time disaggregation of monthly to daily streamflow
• Monte Carlo Sampling of Spatial Random Fields
• Probabilistic Sampling of Soil Stratigraphy from Cores
• Hurricane Track Simulation
• Multivariate, Daily Weather Simulation
• Downscaling of Climate Models
• Ensemble Forecasting of Hydroclimatic Time Series
• Biological and Economic Time Series
• Exploration of Properties of Dynamical Systems
• Extension to Nearest Neighbor Block Bootstrapping -Yao and Tong
K-NN Local Polynomial
K-NN Algorithm: $k = \sqrt{N}$, e.g. $\sqrt{90} \approx 9$
[Figure: y_t* against y_{t-1}]
Residual Resampling
$y_t = y_t^* + e_t^*$
[Figure: y_t* and e_t* against y_{t-1}]
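A minimal sketch of the local polynomial plus residual resampling idea, with simplifying assumptions stated up front: a single predictor, a local straight line (p = 1) fitted by unweighted least squares within the neighborhood, and residuals resampled from the k neighbors with weights proportional to 1/j. All function names are hypothetical.

```python
import random


def local_linear_fit(xs, ys, x0, k):
    """Fit a line to the k nearest neighbors of x0; return (estimate, indices)."""
    idx = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    xm = sum(xs[i] for i in idx) / k
    ym = sum(ys[i] for i in idx) / k
    sxx = sum((xs[i] - xm) ** 2 for i in idx)
    sxy = sum((xs[i] - xm) * (ys[i] - ym) for i in idx)
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return ym + slope * (x0 - xm), idx


def knn_residual_ensemble(xs, ys, x0, k, n_members, seed=0):
    """Mean forecast from the local fit plus resampled neighbor residuals."""
    rng = random.Random(seed)
    # Residual of the local fit at every historical point.
    resid = [ys[i] - local_linear_fit(xs, ys, xs[i], k)[0]
             for i in range(len(xs))]
    mean_fc, idx = local_linear_fit(xs, ys, x0, k)
    # Nearest neighbor is most likely to donate its residual.
    weights = [1.0 / j for j in range(1, k + 1)]
    picks = rng.choices(idx, weights=weights, k=n_members)
    return mean_fc, [mean_fc + resid[i] for i in picks]
```

Adding residuals to the fitted mean lets the ensemble take values not seen in the record, while the spread still reflects the local scatter around the fit.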
Applications
K-NN Bootstrap
• Weather Generation – Erosion Prediction
• Precipitation/Temperature Downscaling
Local-Polynomial + K-NN residual bootstrap
• Ensemble Streamflow forecasting
Truckee-Carson basin, NV
• Ensemble forecast from categorical probabilistic
forecast
Local Polynomial
• Flood Frequency Analysis
• Salinity Modeling in the Upper Colorado Basin
Is a 2 State Markov Chain adequate?
Can the lag-0 and lag-1 dependence across variables be easily preserved?
Are multi-scale statistics preserved by the Daily Model?
Our current implementation uses moving window seasons
[Figure: January-March Daily Weather, Salt Lake City - Wet Days]
[Figure: January-March Daily Weather, Salt Lake City - Dry Days]
[Figure: wet spell statistics; panels: mean wet spell length, fraction of wet days, standard deviation of wet spell length, longest wet spell length]
[Figure: dry spell statistics; panels: mean dry spell length, fraction of dry days, standard deviation of dry spell length, longest dry spell length]
[Figure: k-nn daily simulations of precipitation, performance in terms of aggregated statistics; panels: mean seasonal precipitation, variance of seasonal precipitation, annual mean, annual variance]
[Figure: lag 0 cross correlation for selected daily variables, MAR-1 simulations and k-nn simulations; panels: SRAD and TMX, SRAD and TMN, SRAD and DPT, TMX and TMN, TMX and DPT, TMX and P, TMN and DPT, TMN and P, DPT and P]
Mean Annual Erosion in Kg/Sq. m.
Location   CLIGEN   BOOTCLIM   BOOTCLIM-scramble
Idaho      0.6      0.1        0.5
Oregon     5.2      1.4        3.2
Arizona    0.6      0.3        0.6
Impact of Improper Dependence Structure on Erosion Estimated
from Physical Model (WEPP) using Simulated Weather
Differences in CLIGEN/BOOTCLIM-scramble vs BOOTCLIM
are due to inability vs ability to preserve cross-correlations
between temperature and precipitation (and hence rain/snow)
Region
Figure 1. Map depicting the 21-state area of interest in this study. The numbers indicate stations grouped by region. The two dark filled squares in
the east are Stations 114198 and 112140 in Region 4, and the two dark squares in the west are Stations 52281 and 52662 in Region 7.
Temperature Mean and Standard Deviations
Precipitation Statistics
Lag Correlations
Spatial Correlations
Downscaling Concept
Horizontal resolution: ~200 km
[scale mis-match]
Area of interest: ~500 to 2000 km²
• Purpose: Downscale global-scale atmospheric forecasts to local scales in river basins (e.g., individual stations).
Downscaling Approach
• Identify outputs from the global-scale Numerical Weather Prediction (NWP) model that are related to precipitation and temperature in the basins of interest
– Geo-potential height, wind, humidity at five pressure levels etc.
– Various surface flux variables
– Computed variables such as vorticity advection, stability indices, etc.
– Variables lagged to account for temporal phase errors in atmospheric forecasts.
• Use NWP outputs in a statistical model to estimate precipitation and temperature for the basins
– Multiple linear regression
– K-nn
– NWS bias-correction methodology
– Local polynomial regression
– Canonical Correlation Analysis
– Artificial Neural Networks
Multiple Linear Regression (MLR) Approach
• Multiple linear regression with forward selection
Y = a0 + a1X1 + a2X2 + a3X3 + . . . + anXn + e
• Use cross-validation procedures for variable selection; typically fewer than 8 variables are selected for a given equation
• A separate equation is developed for each station, each forecast lead time, and each month.
• Stochastic modeling of the residuals in the regression equation is done to provide ensemble time series
• The ensemble members are subsequently shuffled to reconstruct the observed spatio-temporal covariability
• Regression coefficients are estimated from the period of the NCEP 1998 MRF hindcast (1979-2001)
K-nn Approach - Methodology
• Get all the NCEP MRF output variables within a 14-day window (7 days lag + 7 days lead) centered on the current day
• Perform EOF analysis of the climate variables and retain the first few leading PCs that capture most of the variance
• ~6 PCs capture about 90% of the variance
• The space of the leading PCs becomes the "feature vector" space
• Project the forecast climate variables of the current day onto the PC space, i.e. the "feature vector"
• Select the "nearest" neighbor to the "feature vector" in the PC space, and hence a day from the historical record.
BASINS
[Figure: map of the four study basins: Cle Elum, East Fork of the Carson, Animas, and Alapaha. Three are labeled Snowmelt Dominated and one Rainfall Dominated; the labeled basin areas are 526 km², 922 km², 1792 km², and 3626 km².]
Results
• RPSS – precipitation and maximum temperature, MLR and KNN
Ranked Probability Skill Score (RPSS) = 1 – RPS_f / RPS_c, where RPS_f and RPS_c are the Ranked Probability Scores of the forecast and of climatology
• Spatial autocorrelation – precip, max temp, MLR and KNN
• Lag-1 autocorrelation – precip, max temp, MLR and KNN
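For reference, a small sketch of how the skill score could be computed for categorical probability forecasts. It assumes the usual definition of the Ranked Probability Score as the sum of squared differences between cumulative forecast and cumulative observed probabilities, without normalizing by the number of categories (the ratio is unaffected). The function names are illustrative.

```python
def rps(forecast_probs, observed_category):
    """Ranked Probability Score of one forecast (0 is a perfect score)."""
    cum_forecast = 0.0
    cum_observed = 0.0
    score = 0.0
    for category, prob in enumerate(forecast_probs):
        cum_forecast += prob
        if category == observed_category:
            cum_observed = 1.0  # stays at 1 for all higher categories
        score += (cum_forecast - cum_observed) ** 2
    return score


def rpss(forecasts, climatology, observations):
    """Skill relative to climatology: 1 is perfect, 0 is no improvement."""
    rps_forecast = sum(rps(f, o) for f, o in zip(forecasts, observations))
    rps_climatology = sum(rps(climatology, o) for o in observations)
    return 1.0 - rps_forecast / rps_climatology
```

Negative values indicate a forecast that performs worse than always issuing the climatological probabilities.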
MLR Approach – RPSS, PRCP-Jan
Knn Approach – RPSS, PRCP-Jan
MLR Approach – RPSS, PRCP-July
Knn Approach – RPSS, PRCP-July
MLR Approach – RPSS, TMAX-Jan
Knn Approach – RPSS, TMAX-Jan
MLR Approach – RPSS, TMAX-Jul
Knn Approach – RPSS, TMAX-Jul
MLR – Spatial Cor., Unshuffled CO4734-CO1609
Knn Approach – Spatial Cor., CO4734-CO1609
MLR – Spatial Cor., Unshuffled GA0140-GA2266
Knn Approach – Spatial Cor., GA0140-GA2266
MLR – Lag-1, Unshuffled CO7017
MLR – Lag-1, Shuffled CO7017
Knn Approach – Lag-1, CO7017
Knn Approach – Reliability Diagrams (1-day forecast), CO7017
Conclusion: Comparison of MLR and KNN
• The K-NN method exhibits comparable or better skill than MLR in downscaling daily precipitation/temperature
• The K-NN provides a flexible and parsimonious framework for downscaling.
• The K-NN approach can be improved to better capture the temporal dependence and also to generate sequences not seen in history.
Hydrologic Forecasting
• Conditional Statistics of Future State, given Current State
• Current State: $D_t : (x_t, x_{t-\tau}, x_{t-2\tau}, \ldots, x_{t-d_1\tau}, y_t, y_{t-\tau}, y_{t-2\tau}, \ldots, y_{t-d_2\tau})$
• Future State: $x_{t+T}$
• Forecast: $g(x_{t+T}) = f(D_t)$
– where g(.) is a function of the future state, e.g., mean or pdf
– and f(.) is a mapping of the dynamics represented by D_t to g(.)
– Challenges
• Composition of D_t
• Identify g(.) given D_t and model structure
– For nonlinear f(.), nonparametric function estimation methods are used
• K-nearest neighbor
• Local Regression
• Regression Splines
• Neural Networks
Ensemble Forecast of Spring Streamflows on the Truckee
and Carson Rivers
Study Area
[Figure: map of the Truckee and Carson River basins in California and Nevada. Labeled features: Lake Tahoe, Pyramid Lake, Winnemucca Lake (dry), Carson Lake, Stillwater NWR; Truckee River, Carson River, Truckee Canal; Stampede, Independence, Donner, Martis, Boca, Prosser, Lahontan; Derby Dam and the Newlands Project; Tahoe City, Truckee, Farad, Reno/Sparks, Nixon, Fernley, Fallon, Carson City, Ft Churchill.]
Motivation
• USBR needs good seasonal forecasts on Truckee and Carson Rivers
• Forecasts determine how storage targets will be met on Lahontan Reservoir to supply the Newlands Project
[Figure: Truckee Canal]
Outline of Approach
• Climate Diagnostics
To identify large scale features correlated to Spring flow in the
Truckee and Carson Rivers
• Ensemble Forecast
Stochastic Models conditioned on climate indicators (Parametric and
Nonparametric)
• Application
Demonstrate utility of improved forecast to water management
Data
– 1949-1999 monthly averages
• Streamflow at Ft. Churchill and Farad
• Precipitation (regional)
• Geopotential Height 500mb (regional)
• Sea Surface Temperature (regional)
Annual Cycle of Flows
Fall Climate Correlations
Carson Spring Flow
500 mb Geopotential Height
Sea Surface Temperature
Winter Climate Correlations
Carson Spring Flow
500 mb Geopotential Height
Sea Surface Temperature
Winter Climate Correlations
Truckee Spring Flow
500 mb Geopotential Height
Sea Surface Temperature
Climate Composites
High-Low Flow
Sea Surface Temperature
Vector Winds
Precipitation Correlation
Geopotential Height Correlation
SST Correlation
Flow - NINO3 / Geopotential Height
Relationship
Regression Fit
Linear Fit
Local Fit 
Precip Fit 
The Forecasting Model
• Forecast Spring Runoff in Truckee and Carson Rivers
using Winter Precipitation and Climate Data Indices
(Geopotential height index and SST index).
• Linear Regression:
– Can capture only linear relationships
– Cannot generate ensembles
– Symmetric uncertainty bands
• Modified K-NN Method:
– Uses Local Polynomial for the mean forecast
– Bootstraps the residuals for the ensemble
Wet Years: 1994-1999
[Figure: forecasts for each year 1994-1999; panels: Precipitation; Precipitation and Climate]
• Overprediction w/o Climate (1995, 1996)
– Might release water for flood control, then be stuck in spring with not enough water
• Underprediction w/o Climate (1998)
Dry Years: 1987-1992
[Figure: forecasts for each year 1987-1992; panels: Precipitation; Precipitation and Climate]
• Overprediction w/o Climate (1998, 1991)
– Might not implement necessary drought precautions in sufficient time
Fall Prediction w/ Climate
[Figure: fall forecasts; panels: Wet Years (1994-1999), Dry Years (1987-1992)]
• Fall Climate forecast captures whether season will be above or below average
• Results comparable to winter forecast w/o climate
Simple Water Balance
$S_t = S_{t-1} + I_t - R_t$
• S_{t-1} is the storage at time t-1, I_t is the inflow at time t, and R_t is the release at time t.
• Method to test the utility of the model
• Pass Ensemble forecasts (scenarios) for I_t
• Gives water managers a quick look at how much storage they will have available at the end of the season, to evaluate decision strategies
For this demonstration,
• Assume S_{t-1} = 0, R_t = 1/2 (average historical inflow)
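The demonstration can be sketched in a few lines: each ensemble member of the inflow forecast is passed through the storage equation with the stated assumptions (zero initial storage, release equal to half the average historical inflow). The function name and the example numbers in the usage note are hypothetical.

```python
def end_of_season_storage(inflow_ensemble, historical_inflows,
                          initial_storage=0.0):
    """Apply S_t = S_{t-1} + I_t - R_t to every ensemble member."""
    # Release is half of the average historical inflow, as assumed above.
    release = 0.5 * sum(historical_inflows) / len(historical_inflows)
    return [initial_storage + inflow - release for inflow in inflow_ensemble]
```

For example, `end_of_season_storage([150.0, 90.0], [100.0, 200.0, 300.0])` gives one storage value per member; a negative value means that member's inflow falls short of the assumed release.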
Water Balance
[Figure: 1995 storage; 1995 K-NN ensemble PDF compared with the historical PDF]
Future Work
• Stochastic Model for
Timing of the Runoff
Disaggregate Spring flows to monthly flows.
• Statistical Physical Model
Couple PRMS with stochastic weather generator
(conditioned on climate info.)
• Test the utility of these approaches to water
management using the USBR operations model
in RiverWare
Region / Data
6 rainfall stations
- Nakhon Sawan, Suphan Buri, Lop Buri, Kanchana Buri, Bangkok, and Don Muang
3 streamflow stations (Chao Phraya basin)
- Nakhon Sawan, Chai Nat, Ang-Thong
5 temperature stations
- Nakhon Sawan, Lop Buri, Kanchana Buri, Bangkok, Don Muang
Large Scale Climate Variables
NCEP-NCAR Re-analysis data (http://www.cdc.noaa.gov)
Composite Maps of High rainfall
Pre 1980
Post 1980
Composite Maps of Low rainfall
Pre 1980
Post 1980
Example Forecast for 1997
Conditional Probabilities from historical data (Categories are at Quantiles)

Flow    La Nina   Neu     El Nino
Low     0.000     0.320   0.385
Neu     0.538     0.440   0.538
High    0.462     0.240   0.077

Categorical ENSO forecast

La Nina   Neu   El Nino
0.2       0.2   0.6

Conditional flow probabilities using Total Probability Theorem

Low   Neu    High
0.3   0.52   0.19
Ensemble Forecast from Categorical
Probabilistic forecasts
• If the categorical probabilistic forecasts are P1,
P2 and P3 then
– Choose a category with the above probabilities
– Randomly select an historical observation from the
chosen category
– Repeat this a number of times to generate ensemble
forecasts
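A hedged sketch of both steps: combining the conditional flow probabilities with a categorical ENSO forecast through the Total Probability Theorem, then drawing an ensemble by picking a category and a historical observation within it. The probabilities are the ones in the 1997 example; the function names and the historical flows used in the usage note are hypothetical.

```python
import random

# P(flow category | ENSO state), from the 1997 example's historical table.
COND = {
    "La Nina": {"Low": 0.000, "Neu": 0.538, "High": 0.462},
    "Neu": {"Low": 0.320, "Neu": 0.440, "High": 0.240},
    "El Nino": {"Low": 0.385, "Neu": 0.538, "High": 0.077},
}


def flow_probabilities(enso_forecast, cond=COND):
    """P(flow) = sum over ENSO states of P(flow | state) * P(state)."""
    probs = {"Low": 0.0, "Neu": 0.0, "High": 0.0}
    for state, p_state in enso_forecast.items():
        for flow_cat, p_flow in cond[state].items():
            probs[flow_cat] += p_flow * p_state
    return probs


def categorical_ensemble(category_probs, history_by_category, n_members,
                         seed=0):
    """Choose a category by its probability, then a random historical value."""
    rng = random.Random(seed)
    cats = list(category_probs)
    weights = [category_probs[c] for c in cats]
    members = []
    for _ in range(n_members):
        cat = rng.choices(cats, weights=weights, k=1)[0]
        members.append(rng.choice(history_by_category[cat]))
    return members
```

With the ENSO forecast `{"La Nina": 0.2, "Neu": 0.2, "El Nino": 0.6}`, `flow_probabilities` gives roughly 0.30, 0.52 and 0.19 for Low, Neu and High flow. Passing those probabilities to `categorical_ensemble` together with historical flows grouped by category yields the ensemble members.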
Ensemble Forecast of Thailand
Streamflows – 1997
Initial Study Area: 6 reservoirs in the Jaguaribe-Metropolitano Hydrosystem
[Figure: map of the hydrosystem with Fortaleza and Oros Reservoir labeled. Legend: reservoirs (Reservatório), demands (Demanda), pass-through nodes (Nó de Passagem), links, canals, rivers, reservoir polygons, basin boundary, and population classes.]
Jaguaribe: 80% irrigation, 20% municipal; mainly in August to November
Metropolitan: 80% municipal, 20% irrigation; uniform distribution over the year
[Figure: Oros annual flow forecast by year, 1993-2000, showing forecast percentiles (Per10%, Per25%, Per50%, Per75%, Per90%), marginal percentiles (Marginal 10%, 25%, 50%, 75%, 90%), and observations (Obs).]
Oros Annual Flow Forecast from previous July
– model fit 1914-1991, k=30, Correlation(Median, Obs) = 0.91
Summary
• Nonparametric techniques (the K-NN framework in particular) provide a flexible alternative to parametric methods for ensemble forecasting and downscaling
• Easy to implement, with a parsimonious extension to multivariate situations. Water managers can utilize the improved forecasts in operations and seasonal planning
• No prior assumption about the functional form is needed. Can capture nonlinear/non-Gaussian features readily.