A Stochastic Nonparametric Framework for Ensemble Hydrologic Forecast and Downscaling

Balaji Rajagopalan
Department of Civil and Environmental Engg.
University of Colorado, Boulder, CO
RSMAS/U Miami – Spring 2003

Acknowledgements
Upmanu Lall (Lamont-Doherty Earth Observatory)
Columbia University, NY
James Prairie, Katrina Grantz, Somkiat Apipattanavis
Subhrendu Gangopadhyay, Martyn Clark
CIRES/University of Colorado, Boulder, CO
David Yates
NCAR/University of Colorado, Boulder, CO

A Water Resources Management Perspective
Decision Analysis: Risk + Values
Time horizon: from Inter-decadal (Climate) down to Hours (Weather)
• Facility Planning
  – Reservoir, Treatment Plant Size
• Policy + Regulatory Framework
  – Flood Frequency, Water Rights, 7Q10 flow
• Operational Analysis
  – Reservoir Operation, Flood/Drought Preparation
• Emergency Management
  – Flood Warning, Drought Response
Data: Historical, Paleo, Scale, Models

Ensemble Forecast (or Scenarios generation)
• Scenarios (synthetic sequences) of hydroclimate are simulated for various decision making situations
  – Reservoir operations (USBR/Riverware)
  – Erosion Prediction (USDA/WEPP)
  – Reservoir sizing (Flood frequency)
• Given $[Y_t]$, $t = 1, 2, \ldots, N$, a hydroclimate time series (e.g. daily weather variables, streamflow, etc.)
  – Parametric models are fit (probability density functions – Gamma, Exponential etc.)
  – Time series models (Auto Regressive models)

The Problem
• Ensemble Forecast / Stochastic Simulation / Scenarios generation – all of them are conditional probability density function problems
$$f(y_t \mid y_{t-1}, y_{t-2}, \ldots, y_{t-p}) = \frac{f(y_t, y_{t-1}, y_{t-2}, \ldots, y_{t-p})}{\int f(y_t, y_{t-1}, y_{t-2}, \ldots, y_{t-p})\, dy_t}$$
• Estimate the conditional PDF and simulate (Monte Carlo, or Bootstrap)

[Figure: All India Monthly Rainfall]

Parametric Models
• Periodic Auto Regressive model (PAR)
  – Linear lag(1) model (year $\nu$, season $\tau$):
$$y_{\nu,\tau} = \mu_\tau + \phi_{1,\tau}\,(y_{\nu,\tau-1} - \mu_{\tau-1}) + \varepsilon_{\nu,\tau}$$
  – Stochastic Analysis, Modeling, and Simulation (SAMS) (Salas, 1992)
• Data must fit a Gaussian distribution
• Expected to preserve
  – mean, standard deviation, lag(1) correlation
  – skew, dependent on the transformation
  – Gaussian probability density function

Parametric Models - Drawbacks
• Model selection / parameter estimation issues
  – Select a model (PDFs or time series models) from candidate models
  – Estimate parameters
• Limited ability to reproduce nonlinearity and non-Gaussian features
  – All the parametric probability distributions are ‘unimodal’
  – All the parametric time series models are ‘linear’

Parametric Models - Drawbacks
• Models are fit on the entire data set
  – Outliers can inordinately influence parameter estimation (e.g. a few outliers can influence the mean, variance)
  – In the Mean Squared Error sense the models are optimal, but locally they can be very poor
  – Not flexible
• Not portable across sites

Nonparametric Methods
• Any functional (probability density, regression etc.) estimator is nonparametric if:
  – It is “local” – the estimate at a point depends only on a few neighbors around it
    (effect of outliers is removed)
  – No prior assumption of the underlying functional form – data driven

Nonparametric Methods
• Kernel Estimators (properties well studied)
• Splines
• Multivariate Adaptive Regression Splines (MARS)
• K-Nearest Neighbor (K-NN) Bootstrap Estimators
• Locally Weighted Polynomials (K-NN Polynomials)

K-NN Philosophy
• Find the K nearest neighbors to the desired point x
• Resample the K historical neighbors (with high probability to the nearest neighbor and low probability to the farthest) → Ensembles
• Weighted average of the neighbors → Mean Forecast
• Fit a polynomial to the neighbors – Weighted Least Squares
  – Use the fit to estimate the function at the desired point x (i.e. local regression)
• Number of neighbors K and the order of polynomial p are obtained using GCV (Generalized Cross Validation)
  – K = N and p = 1 recovers the linear modeling framework

K-Nearest Neighbor Estimators
A k-nearest neighbor density estimate, where $r_k(x)$ is the distance from x to its k-th nearest neighbor:
$$\hat f_{NN}(x) = \frac{k/n}{V_k(x)} = \frac{k/n}{c_d\, r_k^d(x)}$$
A generalized k-nearest neighbor density estimate:
$$\hat f_{GNN}(x) = \frac{1}{r_k^d(x)\, n} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{r_k(x)}\right)$$
A conditional k-nearest neighbor density estimate:
$$\hat f_{GNN}(x \mid D) = \frac{\hat f(x, D)}{\hat f(D)} = \frac{(r_k(x, D)\, n)^{-1} \sum_{i=1}^{n} K\!\left(\frac{(x, D) - (x_i, D_i)}{r_k(x, D)}\right)}{(r_k(D)\, n)^{-1} \sum_{i=1}^{n} K\!\left(\frac{D - D_i}{r_k(D)}\right)}$$
f(.) is continuous on $R^d$, locally Lipschitz of order p; $k(n) = O(n^{2p/(d+2p)})$

A k-nearest neighbor (modified Nadaraya-Watson) conditional mean estimate, with kernel conditions $K(u) \ge 0$, $\int u K(u)\, du = 0$, $\int u^2 K(u)\, du < \infty$:
$$\hat m(x \mid D = D^*) = \frac{\sum_{i=1}^{n} x_i\, K\!\left(\frac{D^* - D_i}{r_k(D^*)}\right)}{\sum_{i=1}^{n} K\!\left(\frac{D^* - D_i}{r_k(D^*)}\right)}$$

Classical Bootstrap (Efron): Given $x_1, x_2, \ldots, x_n$ are i.i.d. random variables with a cdf F(x)
  – Construct the empirical cdf $\hat F(x) = \sum_{i=1}^{n} I(x_i \le x) / n$
  – Draw a random sample with replacement of size n from $\hat F(x)$

Moving Block Bootstrap (Kunsch, Hall, Liu & Singh): Resample independent blocks of length b < n, and paste them together to form a series of length n

k-Nearest Neighbor Conditional Bootstrap (Lall and Sharma, 1996)
  – Construct the Conditional Empirical Distribution Function:
$$\hat F(x \mid D^*) = \sum_{i=1}^{n} I(x_i \le x)\, I(D_i \in B_{r_k}(D^*))\, K(i) / k$$
  – Draw a random sample with replacement from $\hat F(x \mid D^*)$

[Figure: Logistic Map Example. A time series from the model $x_{t+1} = 1 - 4(x_t - 0.5)^2$; plot of $x_{t+1}$ against $x_t$ showing the k-nearest neighborhoods A and B for $x_t = x^*_A$ and $x^*_B$ respectively; 4-state Markov Chain discretization.]

Define the composition of the "feature vector" $D_t$ of dimension d.
(1) Dependence on two prior values of the same time series.
    $D_t : (x_{t-1}, x_{t-2})$; d = 2
(2) Dependence on multiple time scales (e.g., monthly + annual)
    $D_t : (x_{t-\tau_1}, x_{t-2\tau_1}, \ldots, x_{t-M_1\tau_1};\ x_{t-\tau_2}, x_{t-2\tau_2}, \ldots, x_{t-M_2\tau_2})$; $d = M_1 + M_2$
(3) Dependence on multiple variables and time scales
    $D_t : (x1_{t-\tau_1}, \ldots, x1_{t-M_1\tau_1};\ x2_t, x2_{t-\tau_2}, \ldots, x2_{t-M_2\tau_2})$; $d = M_1 + M_2 + 1$

Identify the k nearest neighbors of $D_t$ in the data $D_1 \ldots D_n$

Define the kernel function (derived by taking expected values of distances to each of the k nearest neighbors, assuming the number of observations of D in a neighborhood $B_r(D^*)$ of $D^*$, $r \to 0$ as $n \to \infty$, is locally Poisson, with rate $\lambda(D^*)$). For the jth nearest neighbor:
$$K(j) = \frac{1/j}{\sum_{i=1}^{k} 1/i}, \quad j = 1 \ldots k$$

Selection of k: GCV, FPE, Mutual Information, or rule of thumb ($k = n^{0.5}$)

Applications to date…
• Monthly Streamflow Simulation
• Space and time disaggregation of monthly to daily streamflow
• Monte Carlo Sampling of Spatial Random Fields
• Probabilistic Sampling of Soil Stratigraphy from Cores
• Hurricane Track Simulation
• Multivariate, Daily Weather Simulation
• Downscaling of Climate Models
• Ensemble Forecasting of Hydroclimatic Time Series
• Biological and Economic Time Series
• Exploration of Properties of Dynamical Systems
• Extension to Nearest Neighbor Block Bootstrapping - Yao and Tong

K-NN Local Polynomial
[Figure: K-NN Algorithm. Local fit $y_t^*$ plotted against $y_{t-1}$, with $k = \sqrt{N} = \sqrt{90} \approx 9$ neighbors.]
[Figure: Residual Resampling. $y_t = y_t^* + e_t^*$, where $y_t^*$ is the local fit at $y_{t-1}$ and $e_t^*$ is a resampled residual.]

Applications
K-NN Bootstrap
• Weather Generation – Erosion Prediction
• Precipitation/Temperature Downscaling
Local-Polynomial + K-NN residual bootstrap
• Ensemble Streamflow forecasting, Truckee-Carson basin, NV
• Ensemble forecast from categorical probabilistic forecast
Local Polynomial
• Flood Frequency Analysis
• Salinity Modeling in the Upper Colorado Basin

• Is a 2 State Markov Chain adequate?
• Can the lag-0 and lag-1 dependence across variables be easily preserved?
• Are multi-scale statistics preserved by the Daily Model?
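The K-NN conditional bootstrap outlined earlier (feature vector, k nearest neighbors, the 1/j resampling kernel, and the k = n^0.5 rule of thumb) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a lag-1 feature vector D_t = (x_{t-1}) and absolute distance, and the function names `knn_weights` and `knn_bootstrap_simulate` are hypothetical.

```python
import numpy as np


def knn_weights(k):
    """Kernel K(j) = (1/j) / sum_{i=1..k}(1/i) for the j-th nearest neighbor."""
    inv_rank = 1.0 / np.arange(1, k + 1)
    return inv_rank / inv_rank.sum()


def knn_bootstrap_simulate(x, n_steps, k=None, rng=None):
    """Simulate a series by K-NN conditional resampling.

    Assumption (not from the source): the feature vector is D_t = (x_{t-1}),
    and the successor of the resampled neighbor becomes the next value.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    states, successors = x[:-1], x[1:]
    n = len(states)
    if k is None:
        k = max(1, int(round(np.sqrt(n))))  # rule of thumb k = n^0.5
    weights = knn_weights(k)

    sim = np.empty(n_steps)
    current = x[rng.integers(len(x))]  # start from a random historical value
    for t in range(n_steps):
        dist = np.abs(states - current)
        neighbors = np.argsort(dist, kind="stable")[:k]  # nearest first
        pick = neighbors[rng.choice(k, p=weights)]       # favor the closest
        current = successors[pick]
        sim[t] = current
    return sim


# Usage: build a series from the logistic map x_{t+1} = 1 - 4 (x_t - 0.5)^2
# and generate a 10-member ensemble of 50-step simulations.
x = np.empty(200)
x[0] = 0.3
for t in range(199):
    x[t + 1] = 1.0 - 4.0 * (x[t] - 0.5) ** 2
ensemble = np.array([
    knn_bootstrap_simulate(x, 50, rng=np.random.default_rng(seed))
    for seed in range(10)
])
```

Because values are resampled from the record, every simulated value is a historical observation; this is the limitation that the residual-resampling variant (local fit plus bootstrapped residual) is meant to relax.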
Our current implementation uses moving window seasons

[Figures: January-March Daily Weather, Salt Lake City - Wet Days; January-March Daily Weather, Salt Lake City - Dry Days]
[Figure panels: mean wet spell length; fraction of wet days; standard deviation of wet spell length; longest wet spell length; mean dry spell length; fraction of dry days; standard deviation of dry spell length; longest dry spell length]
[Figure: k-nn daily Simulations of Precipitation - Performance in terms of aggregated statistics. Panels: Mean seasonal precipitation; Variance of seasonal precipitation; Annual Mean; Annual Variance]
[Figure: lag 0 cross correlation, for selected daily variables, MAR-1 simulations and k-nn simulations. Variable pairs: SRAD and TMX; SRAD and TMN; SRAD and DPT; TMX and TMN; TMX and DPT; TMX and P; TMN and DPT; TMN and P; DPT and P]

Mean Annual Erosion in Kg/Sq. m.
Location   CLIGEN   BOOTCLIM   BOOTCLIM-scramble
Idaho      0.6      0.1        0.5
Oregon     5.2      1.4        3.2
Arizona    0.6      0.3        0.6

Impact of Improper Dependence Structure on Erosion Estimated from Physical Model (WEPP) using Simulated Weather
Differences in CLIGEN/BOOTCLIM-scramble vs BOOTCLIM are due to inability vs ability to preserve cross-correlations between temperature and precipitation (and hence rain/snow)

Figure 1. Map depicting the 21-state area of interest in this study. The numbers indicate stations grouped by region. The two dark filled squares in the east are Stations 114198 and 112140 in Region 4, and the two dark squares in the west are Stations 52281 and 52662 in Region 7.
[Figure panels: Temperature Mean and Standard Deviations; Precipitation Statistics; Lag Correlations; Spatial Correlations]

Downscaling Concept
• Horizontal resolution ~200 km vs. area of interest ~500 to 2000 km2 [scale mis-match]
• Purpose: Downscale global-scale atmospheric forecasts to local scales in river basins (e.g., individual stations).

Downscaling Approach
• Identify outputs from the global-scale Numerical Weather Prediction (NWP) model that are related to precipitation and temperature in the basins of interest
  – Geo-potential height, wind, humidity at five pressure levels etc.
  – Various surface flux variables
  – Computed variables such as vorticity advection, stability indices, etc.
  – Variables lagged to account for temporal phase errors in atmospheric forecasts.
• Use NWP outputs in a statistical model to estimate precipitation and temperature for the basins
  – Multiple linear regression
  – K-nn
  – NWS bias-correction methodology
  – Local polynomial regression
  – Canonical Correlation Analysis
  – Artificial Neural Networks

Multiple Linear Regression (MLR) Approach
• Multiple linear regression with forward selection
  $Y = a_0 + a_1 X_1 + a_2 X_2 + a_3 X_3 + \ldots + a_n X_n + e$
• Use cross-validation procedures for variable selection – typically less than 8 variables are selected for a given equation
• A separate equation is developed for each station, each forecast lead time, and each month.
• Stochastic modeling of the residuals in the regression equation is done to provide ensemble time series
• The ensemble members are subsequently shuffled to reconstruct the observed spatio-temporal covariability
• Regression coefficients are estimated from the period of the NCEP 1998 MRF hindcast (1979-2001)

K-nn Approach - Methodology
• Get all the NCEP MRF output variables within a 14 day window (7 days, lag + lead) centered on the current day
• Perform EOF analysis of the climate variables and retain the first few leading PCs, which capture most of the variance
  – ~6 PCs capture about 90% of the variance
• The space of the leading PCs becomes the “feature vector” space
• Project the forecast climate variables of the current day onto the PC space, i.e. the “feature vector”
• Select the “nearest” neighbor to the “feature vector” in the PC space – hence, a day from the historical record.

[Figure: BASINS. Cle Elum, East Fork of the Carson and Animas (snowmelt dominated); Alapaha (rainfall dominated). Basin areas shown on the map: 526 km2, 922 km2, 1792 km2, 3626 km2.]

Results
• RPSS – precipitation and maximum temperature, MLR and KNN
  Ranked Probability Skill Score: $RPSS = 1 - RPS_f / RPS_c$, where $RPS_f$ is the ranked probability score of the forecast and $RPS_c$ that of climatology
• Spatial autocorrelation – precip, max temp, MLR and KNN
• Lag-1 autocorrelation – precip, max temp, MLR and KNN

[Figures: RPSS, MLR Approach and Knn Approach: PRCP-Jan; PRCP-July; TMAX-Jan; TMAX-Jul]
[Figures: Spatial correlation, MLR (unshuffled) and Knn Approach: CO4734-CO1609; GA0140-GA2266]
[Figures: Lag-1, CO7017: MLR unshuffled; MLR shuffled; Knn Approach]
[Figure: Knn Approach – Reliability Diagrams (1 day forecast), CO7017]

Conclusion: Comparison of MLR and KNN
• The K-NN method exhibits
comparable to better skill than MLR in downscaling daily precipitation/temperature
• The K-NN provides a flexible and parsimonious framework for downscaling.
• The K-NN approach can be improved to better capture the temporal dependence and also to generate sequences not seen in history.

Hydrologic Forecasting
• Conditional Statistics of Future State, given Current State
• Current State: $D_t : (x_t, x_{t-\tau}, x_{t-2\tau}, \ldots, x_{t-d_1\tau},\ y_t, y_{t-\tau}, y_{t-2\tau}, \ldots, y_{t-d_2\tau})$
• Future State: $x_{t+T}$
• Forecast: $g(x_{t+T}) = f(D_t)$
  – where g(.) is a function of the future state, e.g., mean or pdf
  – and f(.) is a mapping of the dynamics represented by $D_t$ to g(.)
  – Challenges
    • Composition of $D_t$
    • Identify g(.) given $D_t$ and model structure
  – For nonlinear f(.), nonparametric function estimation methods are used
    • K-nearest neighbor
    • Local Regression
    • Regression Splines
    • Neural Networks

Ensemble Forecast of Spring Streamflows on the Truckee and Carson Rivers

Study Area
[Map of the study area (Nevada/California). Labels: Truckee River, Carson River, Truckee Canal, Pyramid Lake, Winnemucca Lake (dry), Lake Tahoe, Lahontan, Carson Lake, Stillwater NWR, Newlands Project, Derby Dam, Stampede, Independence, Donner, Martis, Boca, Prosser, Farad, Ft Churchill, Nixon, Reno/Sparks, Fernley, Carson City, Tahoe City, Truckee, Fallon]

Motivation
• USBR needs good seasonal forecasts on Truckee and Carson Rivers
• Forecasts determine how storage targets will be met on Lahontan Reservoir to supply the Newlands Project
[Figure: Truckee Canal]

Outline of Approach
• Climate Diagnostics
  To identify large scale features correlated to Spring flow in the Truckee and Carson Rivers
• Ensemble Forecast
  Stochastic Models conditioned on climate indicators (Parametric and Nonparametric)
• Application
  Demonstrate utility of improved forecast to water management

Data – 1949-1999 monthly averages
• Streamflow at Ft.
Churchill and Farad
• Precipitation (regional)
• Geopotential Height 500mb (regional)
• Sea Surface Temperature (regional)

[Figure: Annual Cycle of Flows]
[Figure: Fall Climate Correlations, Carson Spring Flow. Panels: 500 mb Geopotential Height; Sea Surface Temperature]
[Figure: Winter Climate Correlations, Carson Spring Flow. Panels: 500 mb Geopotential Height; Sea Surface Temperature]
[Figure: Winter Climate Correlations, Truckee Spring Flow. Panels: 500 mb Geopotential Height; Sea Surface Temperature]
[Figure: Climate Composites, High-Low Flow. Panels: Sea Surface Temperature; Vector Winds]
[Figure panels: Precipitation Correlation; Geopotential Height Correlation; SST Correlation]
[Figure: Flow - NINO3 / Geopotential Height Relationship]
[Figure: Regression Fit. Linear Fit; Local Fit; Precip Fit]

The Forecasting Model
• Forecast Spring Runoff in Truckee and Carson Rivers using Winter Precipitation and Climate Data Indices (Geopotential height index and SST index).
• Linear Regression:
  – can capture only linear relationships
  – cannot generate ensembles
  – symmetric uncertainty bands
• Modified K-NN Method:
  – Uses Local Polynomial for the mean forecast
  – Bootstraps the residuals for the ensemble

[Figure: Wet Years: 1994-1999. Panels: Precipitation; Precipitation and Climate]
• Overprediction w/o Climate (1995, 1996)
  – Might release water for flood control – stuck in spring with not enough water
• Underprediction w/o Climate (1998)

[Figure: Dry Years: 1987-1992. Panels: Precipitation; Precipitation and Climate]
• Overprediction w/o Climate (1998, 1991)
  – Might not implement necessary drought precautions in sufficient time

[Figure: Fall Prediction w/ Climate. Panels: Wet Years (1994-1999); Dry Years (1987-1992)]
• Fall Climate forecast captures whether season will be above or below average
• Results comparable to winter forecast
w/o climate

Simple Water Balance
$S_t = S_{t-1} + I_t - R_t$
• $S_{t-1}$ is the storage at time ‘t-1’, $I_t$ is the inflow at time ‘t’ and $R_t$ is the release at time ‘t’.
• Method to test the utility of the model
• Pass Ensemble forecasts (scenarios) for $I_t$
• Gives water managers a quick look at how much storage they will have available at the end of the season – to evaluate decision strategies
For this demonstration,
• Assume $S_{t-1} = 0$, $R_t = \frac{1}{2}$ (average historical inflow)

[Figure: Water Balance 1995. 1995 Storage: K-NN Ensemble PDF; Historical PDF]

Future Work
• Stochastic Model for Timing of the Runoff
  Disaggregate Spring flows to monthly flows.
• Statistical Physical Model
  Couple PRMS with stochastic weather generator (conditioned on climate info.)
• Test the utility of these approaches to water management using the USBR operations model in RiverWare

Region / Data
• 6 rainfall stations - Nakhon Sawan, Suphan Buri, Lop Buri, Kanchana Buri, Bangkok, and Don Muang
• 3 streamflow stations (Chao Phraya basin) - Nakhon Sawan, Chai Nat, Ang-Thong
• 5 temperature stations - Nakhon Sawan, Lop Buri, Kanchana Buri, Bangkok, Don Muang
• Large Scale Climate Variables: NCEP-NCAR Re-analysis data (http://www.cdc.noaa.gov)

[Figure: Composite Maps of High rainfall. Panels: Pre 1980; Post 1980]
[Figure: Composite Maps of Low rainfall. Panels: Pre 1980; Post 1980]

Example Forecast for 1997
Conditional Probabilities from historical data (Categories are at Quantiles)
Flow    La Nina   Neu     El Nino
Low     0.000     0.320   0.385
Neu     0.538     0.440   0.538
High    0.462     0.240   0.077

Categorical ENSO forecast
La Nina   Neu   El Nino
0.2       0.2   0.6

Conditional flow probabilities using Total Probability Theorem
Low   Neu    High
0.3   0.52   0.19

Ensemble Forecast from Categorical Probabilistic forecasts
• If the categorical probabilistic forecasts are P1, P2 and P3 then
  – Choose a category with the above probabilities
  – Randomly select a historical observation from the chosen category
  – Repeat this a number of times to generate ensemble forecasts

[Figure: Ensemble Forecast of Thailand Streamflows – 1997]

Initial Study Area:
6 reservoirs in the Jaguaribe-Metropolitano Hydrosystem
[Map of the Jaguaribe-Metropolitano system, showing Fortaleza and the Oros Reservoir. Legend layers: reservoirs, demands, pass-through nodes, links, canals, rivers, basin, population]

Demands
• Jaguaribe: 80% irrigation, 20% municipal; mainly in August to November
• Metropolitan: 80% municipal, 20% irrigation; uniform distribution over the year

[Figure: Oros Annual Flow Forecast from previous July, 1993-2000; model fit 1914-1991, k=30. Series: forecast percentiles (10%, 25%, 50%, 75%, 90%), marginal percentiles (10%, 25%, 50%, 75%, 90%) and observed. Correlation (Median vs. Obs) = 0.91]

Summary
• Nonparametric techniques (the K-NN framework in particular) provide a flexible alternative to Parametric methods for Ensemble forecasting/Downscaling
• Easy to implement, with a parsimonious extension to multivariate situations. Water managers can utilize the improved forecasts in operations and seasonal planning
• No prior assumption about the functional form is needed. Can capture nonlinear/non-Gaussian features readily.
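The procedure for turning categorical probabilistic forecasts into an ensemble, used in the Thailand example, can be sketched as follows. This is a hedged illustration, not the authors' code: the probability values are those read from the Example Forecast for 1997, the category boundaries are assumed to be terciles of the historical record (the slides only say "quantiles"), the historical flows are synthetic, and the function name `categorical_ensemble` is hypothetical.

```python
import numpy as np

# P(flow category | ENSO state) from the 1997 example.
# Rows: flow Low, Neu, High. Columns: La Nina, Neu, El Nino.
cond = np.array([
    [0.000, 0.320, 0.385],
    [0.538, 0.440, 0.538],
    [0.462, 0.240, 0.077],
])
enso_forecast = np.array([0.2, 0.2, 0.6])  # P(La Nina), P(Neu), P(El Nino)

# Total Probability Theorem: P(flow) = sum over ENSO states of P(flow | e) P(e)
flow_probs = cond @ enso_forecast  # roughly (0.30, 0.52, 0.19)


def categorical_ensemble(hist_flows, probs, n_members, rng=None):
    """Pick a category with the forecast probabilities, then resample a
    historical observation from that category; repeat n_members times."""
    rng = np.random.default_rng() if rng is None else rng
    hist_flows = np.asarray(hist_flows, dtype=float)
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()  # guard against rounding in tabulated values
    # Assumption: categories are terciles of the historical record.
    lower, upper = np.quantile(hist_flows, [1.0 / 3.0, 2.0 / 3.0])
    category = np.digitize(hist_flows, [lower, upper])  # 0=Low, 1=Neu, 2=High
    members = np.empty(n_members)
    for m in range(n_members):
        chosen = rng.choice(3, p=probs)
        members[m] = rng.choice(hist_flows[category == chosen])
    return members


# Usage with synthetic (hypothetical) historical flows
rng = np.random.default_rng(0)
hist = rng.gamma(shape=2.0, scale=100.0, size=60)
ens = categorical_ensemble(hist, flow_probs, 500, rng)
```

With these forecast probabilities the ensemble draws about half its members from the middle category and relatively few from the high-flow category, which is how the categorical ENSO forecast shifts the ensemble away from the unconditional historical distribution.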