Download Linear Regression Models - Civil, Environmental and Architectural

Bayesian Hierarchical Modeling of Hydroclimate Problems

Balaji Rajagopalan
Department of Civil, Environmental and Architectural Engineering and Cooperative Institute for Research in Environmental Sciences (CIRES)
University of Colorado, Boulder, CO, USA
Bayes by the Bay Conference, Pondicherry, January 7, 2013

Co-authors & Collaborators
• Upmanu Lall and Naresh Devineni – Columbia University, NY
• Hyun-Han Kwon – Chonbuk National University, South Korea
• Carlos Lima – Universidade de Brasília, Brazil
• Pablo Mendoza, James McCreight & Will Kleiber – University of Colorado, Boulder, CO
• Richard Katz – NCAR, Boulder, CO
• NSF, NOAA, US Bureau of Reclamation and Korean Science Foundation

Outline
• Bayesian Hierarchical Modeling
  • Introduction from GLM
• Hydroclimate Applications
  • BHM
  • Contrast with near-Bayesian models currently in vogue
• Stochastic Rainfall Generator
  • BHM (Lima and Lall, 2009, WRR)
  • Latent Gaussian Process Model (Kleiber et al., 2012, WRR)
• Riverflow Forecasting (Kwon et al., 2009, Hydrologic Sciences)
  • Seasonal Flow
  • Flow extremes
• Paleo Reconstruction of Climate (Devineni and Lall, 2012, J. Climate)

Linear Regression Models

Suppose the model relating the regressors to the response is
\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i, \qquad i = 1, \ldots, n \]
In matrix notation this model can be written as
\[ y = X\beta + \epsilon \]
where $y$ is the $n \times 1$ vector of responses, $X$ is the $n \times (k+1)$ matrix of regressors (with a leading column of ones), $\beta$ is the $(k+1) \times 1$ vector of coefficients and $\epsilon$ is the $n \times 1$ vector of errors.

We wish to find the vector of least squares estimators $\hat{\beta}$ that minimizes
\[ L = \sum_{i=1}^{n} \epsilon_i^2 = (y - X\beta)'(y - X\beta) \]
The resulting least squares estimate is
\[ \hat{\beta} = (X'X)^{-1} X'y \]

12-1 Multiple Linear Regression Models
12-1.4 Properties of the Least Squares Estimators

Unbiased estimators:
\[ E(\hat{\beta}) = \beta \]
Covariance matrix:
\[ C = (X'X)^{-1} \]
Individual variances and covariances:
\[ V(\hat{\beta}_j) = \sigma^2 C_{jj}, \qquad \mathrm{cov}(\hat{\beta}_i, \hat{\beta}_j) = \sigma^2 C_{ij}, \quad i \neq j \]
In general,
\[ \mathrm{cov}(\hat{\beta}) = \sigma^2 (X'X)^{-1} = \sigma^2 C \]

Generalized Linear Model (GLM): Bayesian Perspective
• Linear regression is not appropriate
  • when the dependent variable y is not Normal
  • when transformations of y to Normal are not possible
  • in several situations (rainfall occurrence; number of wet/dry days; etc.)
• Hence, GLM
  • Linear model is fitted to a 'suitably' transformed variable of y
  • Linear model is fitted to the 'parameters' of the assumed distribution of y

Generalized Linear Model (GLM): Bayesian Perspective
• Likelihood: exponential family PDF and its parameters
• The commonly used distributions are members of this family: Normal, Exponential, Gamma, Binomial, Poisson, etc.
• Noninformative prior on β
• Assuming a Normal distribution for Y, the link g(.) is the identity → Linear Regression
• Log and logit: canonical link functions
• Inverse Chi-Square

Generalized Linear Model (GLM): Bayesian Perspective, Summary
• GLM is hierarchical
  • Specific distribution
  • Link function
• With a simple step, i.e., providing priors and computing the likelihood/posterior → BHM
• Assuming a Normal distribution of the dependent variable and uninformative priors
  • BHM collapses to a standard Linear Regression Model
• Thus BHM is a generalized framework
• Uncertainty in the model parameters and model structure is automatically obtained.

Generalized Linear Model (GLM) Example: Bayesian Hierarchical Model
• Hard to sample from the posterior: use MCMC

Stochastic Weather Generators
• Precipitation Occurrence, Rain Onset Day (Lima and Lall, 2009)
• Precipitation Occurrence and Amounts (Kleiber et al., 2012)

• Users are most interested in sectoral/process outcomes (streamflows, crop yields, risk of disease X, etc.)
• Need for a robust spatial weather generator

[Figure: historical data → synthetic series (conditional on climate information) → process model → frequency distribution of outcomes]

Need for Downscaling
• Seasonal climate forecasts and future climate model projections often have coarse scales:
  • Spatial: regional
  • Temporal: seasonal, monthly
• Process models (hydrologic models, ecological models, crop growth models) often require daily weather data for a given location
• There is a scale mismatch!
• Stochastic Weather Generators can help bridge this scale gap.

Precipitation Occurrence
• 504 stations in Brazil (latitude & longitude shown in figure)
• Lima and Lall (WRR, 2009)
• Modeling of rainfall occurrence (0 = dry, 1 = rain, P = 0.254 mm threshold) using a probabilistic model (logistic regression)

Modeling Occurrence at a Site
\[ y_{st}(n) \sim \mathrm{Bernoulli}\left(p_{st}(n)\right), \qquad \mathrm{logit}\left(p_{st}(n)\right) = a_{st} + b_{st} \sin(\omega n) + c_{st} \cos(\omega n) \]
• $y_{st}(n)$ is a non-homogeneous Bernoulli random variable for station s, day n and year t, being either 1 for a wet state or 0 for a dry state.
• $p_{st}(n)$ is the rainfall probability for station s and day n of year t.
• The seasonal cycle is modeled through Fourier harmonics.

Results from Site #3
• Outlier?

Bayesian Hierarchical Model (BHM)
But rainfall occurrence is correlated in space. How to model?
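Before turning to the spatial question, the site-level occurrence model described above can be sketched in Python. This is a minimal illustration, not the implementation of Lima and Lall (2009): it assumes a single annual harmonic with angular frequency ω = 2π/365, and the coefficients `a`, `b`, `c` and all function names are hypothetical.

```python
import math
import random

# Assumed annual angular frequency (one cycle per 365-day year).
OMEGA = 2.0 * math.pi / 365.0


def inv_logit(x):
    """Inverse of the logit link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def rain_probability(day, a, b, c):
    """Wet-day probability on a given day for one station-year (single harmonic)."""
    return inv_logit(a + b * math.sin(OMEGA * day) + c * math.cos(OMEGA * day))


def simulate_occurrence(a, b, c, n_days=365, seed=0):
    """Draw a 0/1 (dry/wet) series from the non-homogeneous Bernoulli model."""
    rng = random.Random(seed)
    return [1 if rng.random() < rain_probability(n, a, b, c) else 0
            for n in range(1, n_days + 1)]


# Hypothetical coefficients for one station and year (illustrative values only).
a, b, c = -1.0, 0.8, -1.2
probs = [rain_probability(n, a, b, c) for n in range(1, 366)]
wet = simulate_occurrence(a, b, c)
peak_day = max(range(1, 366), key=lambda n: rain_probability(n, a, b, c))
```

Because the maximum of b·sin(θ) + c·cos(θ) is sqrt(b² + c²), the largest value in `probs` is close to `inv_logit(a + math.hypot(b, c))`, and `peak_day` is the day on which it occurs.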
• Partial BHM
  • Shrinks parameters towards a common mean and reduces uncertainty, since more information is used to estimate the model parameters
  • Parameter uncertainties are fully accounted for during simulations

Bayesian Hierarchical Model (BHM)
• Likelihood function
• Posterior distribution: Bayes theorem
• MCMC to obtain the posterior distribution

Results for Station #3: Yearly Probability of Rainfall

Results for Station #3: Average Probability of Rainfall
\[ A_s = \frac{1}{T} \sum_{t=1}^{T} a_{st}, \qquad B_s = \frac{1}{T} \sum_{t=1}^{T} b_{st}, \qquad C_s = \frac{1}{T} \sum_{t=1}^{T} c_{st} \]
\[ P_s(n) = \mathrm{logit}^{-1}\left(A_s + B_s \sin(\omega n) + C_s \cos(\omega n)\right) \]

Day of Max Probability of Rainfall
• Clusters on average day of max probability
\[ P_s^{\max} = \mathrm{logit}^{-1}\left(A_s + \sqrt{B_s^2 + C_s^2}\right) \]

Max Probability of Rainfall
• Max probability of rainfall is correlated with climate variables (ENSO, etc.)
• Characterize rainfall 'onset'
• Prediction of 'onset'
• Lima and Lall (2009, WRR)

Space-time Precipitation Generator
Latent Gaussian Process (Kleiber et al., 2012, WRR)

Latent Gaussian Process
• Fit a GLM for precipitation occurrence and amounts at each location independently
  • Occurrence → logistic regression-based
  • Amounts → Gamma distribution with a link function
• Spatial process to smooth the GLM coefficients in space
• Almost Bayesian Hierarchical Modeling
• Alpha, gamma: shape and scale parameters of the Gamma distribution

Occurrence Model: Latent Gaussian Process
• Parameter estimation → MLE, two-step (GLM + latent Gaussian process)
• Kleiber et al. (2012)

Max and Min Temperature Models
• Conditioned on the precipitation model, using a latent Gaussian process
• Kleiber et al. (2013, Annals of Applied Statistics, in press)

Seasonal Average and Maximum Streamflow Forecasting (Kwon et al., 2009, Hydrologic Sciences)

Streamflow Forecasting at Three Gorges Dam
Identify predictors
• Correlate seasonal streamflow with large-scale climate variables from preceding seasons
  • JJA flow with MAM climate
• Select regions of strong correlation (Grantz et al., 2005) as predictors
• Yichang hydrological station (YHS)

[Figure panels: a) SST vs mean JJA flow (1970-2001); b) Snow vs mean JJA flow (1970-2001); c) SST vs peak flow (1970-2001); d) Snow vs peak flow (1970-2001)]

Climate predictor | Zone selected (latitude) | Zone selected (longitude) | Correlation with JJA seasonal flow | Correlation with annual peak flow
SST1 | -10°N~10°N | 150°E~180°E | -0.27 ‡ | -0.28 ‡
SST2 | -20°N~0° | 75°E~110°E | 0.51 † | 0.20 †
SST3 | 10°N~30°N | 130°E~150°E | 0.38 † | 0.45 †
Snow | -10°N~0°N | 200°E~230°E | 0.42 † | 0.42 †
†: Significant at 95% confidence; ‡: Significant at 90% confidence

BHM for Seasonal Streamflow
• Model
  • Data showed mild nonlinearity → quadratic terms in the model
  • Half-Cauchy prior with parameter 25 → "mildly informative" (Gelman, 2006, Bayesian Analysis)
• MCMC is used to obtain the posterior distributions

Streamflow Forecasting at Three Gorges Dam: Seasonal (JJA) Flow
[Figure: posterior histograms of tau, Beta1, Beta2, Beta3, Beta4 and Beta5]

Description | Node | Mean | Standard Dev. | 2.50% | Median | 97.50%
Intercept | Beta1 | 2.273 | 0.074 | 2.129 | 2.273 | 2.420
SST1 | Beta2 | -0.111 | 0.050 | -0.209 | -0.111 | -0.011
SST1² | Beta3 | 0.130 | 0.048 | 0.035 | 0.130 | 0.224
SST2 | Beta4 | 0.276 | 0.051 | 0.176 | 0.276 | 0.377
Snow | Beta5 | 0.083 | 0.025 | 0.034 | 0.083 | 0.132

Performance measure, seasonal (JJA) | Value
R | 0.802
CoE | 0.643
IoA | 0.886
Bias | 0.001
RMSE | 0.231

• Predictors 2, 3, 4 and 5 show tighter bounds
• Uncertainty in the predictors (i.e., the model) is obtained and propagated into the forecasts
• You can use PCA, stepwise selection, etc. to reduce the number of predictors (this can be crude)

Maximum Seasonal Streamflow
Extreme Value Analysis: Floods (Kwon et al., 2010, Hydrologic Sciences)

[Figure: American River at Fair Oaks, annual maximum flood; y-axis: annual max flow (0 to 180,000); x-axis: year (1900 to 2000); 100-yr flood estimated from 21- and 51-yr moving windows]

Floods
• The time-varying (nonstationary) nature of hydrologic (flood) frequency (a few examples)
  • Climate variability and climate change
  • Climate mechanisms that lead to changes in flood statistics
• Adaptation strategy
  • 'Adaptive' flood risk estimation
  • Nonstationary flood frequency estimation
  • Seasonal to inter-annual forecasts & climate change
  • Improved infrastructure management
• Summary / climate questions and issues related to hydrologic extremes

[Figure: flood mean and flood variance given DJF NINO3 and PDO (axes: NINO3, PDO); derived using weighted local regression with 30 neighbors]
Correlations of Log(Q): vs DJF NINO3 -0.34; vs DJF PDO -0.32 (Jain & Lall, 2000)

Atmospheric River Generates Flooding
• Russian River, CA flood event of 18 February 2004 (slide from Paul Neiman's talk)
• Russian River flooding in Monte Rio, California, 18 February 2004 (photo courtesy of David Kingsmill)
• GPS IWV data from near CZD: 14-20 Feb 2004
• 10" of rain at CZD in ~48 hours
[Figure: IWV maps (cm, inches) showing the atmospheric river; labeled locations: CZD, Cloverdale, Bodega Bay]

Flood Estimation Under Nonstationarity
• Significant interannual/interdecadal variability of floods
  • Stationarity assumptions (i.i.d.) are invalid
• Large-scale climate features in the ocean-atmosphere-land system orchestrate floods at all time scales
• Need tools that can capture the nonstationarity
  • Incorporate large-scale climate information
• Year-to-year time scale (climate variability)
  • Flood mitigation planning, reservoir operations
• Interdecadal time scale (climate variability and change)
  • Facility design, planning and management

Generalized extreme value (GEV) can be used to characterize the extreme flow distribution (Katz et al., 2002)
\[ G(z) = \exp\left\{ -\left[ 1 + \xi \left( \frac{z - \mu}{\sigma} \right) \right]^{-1/\xi} \right\} \]
3 model parameters
• Location parameter: $\mu$ (where the distribution is centered)
• Scale parameter: $\sigma > 0$ (spread of the distribution)
• Shape parameter: $\xi$ (behavior of the distribution tail)
• Types: Gumbel, Fréchet, Weibull
• Tail behavior: Exponential (light, shape = 0), Pareto (heavy, shape > 0) and Beta (bounded, shape < 0)

"Unconditional" GEV (Coles 2001)
• Incorporate covariates into the GEV parameters to account for nonstationarity
• Could apply to any parameter, but location is most intuitive:
\[ \mu = \beta_0 + \beta_1 x \]
• GLM framework
• Hierarchical Bayesian Modeling → natural and attractive alternative
• GEV fit using the extRemes toolkit in R (Gilleland and Katz, 2011)
  http://www.isse.ucar.edu/extremevalues/extreme.html (Gilleland and Katz 2005)

BHM for Seasonal Maximum Flow
• Model
  • Data showed mild nonlinearity → quadratic terms in the model
  • Half-Cauchy prior with parameter 25 → "mildly informative" (Gelman, 2006, Bayesian Analysis)
• MCMC is used to obtain the posterior distributions

Streamflow Forecasting at Three Gorges Dam: Annual Peak Flow
[Figure: posterior histograms of tau, Beta1, Beta2, Beta3 and Beta4]
• Predictors 3 and 5 show tighter bounds

Description | Node | Mean | Standard Dev. | 2.50% | Median | 97.50%
Intercept | Beta1 | 4.174 | 0.195 | 3.791 | 4.171 | 4.548
SST1² | Beta2 | 0.198 | 0.119 | -0.055 | 0.203 | 0.423
SST3 | Beta3 | 0.699 | 0.148 | 0.410 | 0.706 | 0.986
SST3² | Beta4 | -0.089 | 0.079 | -0.264 | -0.085 | 0.053
Snow² | Beta5 | 0.302 | 0.098 | 0.091 | 0.310 | 0.473

Performance measure, annual peak flow | Value
R | 0.729
CoE | 0.531
IoA | 0.828
Bias | -0.001
RMSE | 0.602

Nonstationary Flood Risk at Three Gorges Dam
[Figure: annual maximum flow with the dynamic 50-year flood from the BHM and the stationary 50-year flood; y-axis: annual max flow (0 to 180,000); x-axis: year (1900 to 2000)]

Conditional (Nonstationary) Extremes in Water Quality (Towler et al., 2009, WRR)

Case study location: PWB (Towler et al., 2009)
"Forest to Faucet"
- Rain
- Runoff
- Storage (2 reservoirs)
- Chemical disinfection (Cl2, NH3)
- Distribution
- No physical filtration ("unfiltered")

Case study location: PWB
• Precipitation events → high flows → exceedances (SWTR criterion: turbidity < 5 NTU) → back-up groundwater source (pumping $$)

GEV Model
Variable | Uncond | CondT | CondR | CondRT | CondR+T
μ | β0 | β0+β1T | β0+β1R | β0+β1(RT) | β0+β1R+β2T
β0 (se) | 1924 (120) | 1930 (1000) | 1739 (410) | 611.4 (150) | 1911 (880)
β1 (se) | - | -0.8914 (27) | 61.08 (32) | 3.716 (0.36) | 141.2 (14)
β2 (se) | - | - | - | - | -36.45 (24)
σ (se) | 1245 (84) | 1220 (81) | 1246 (160) | 923.7 (69) | 968.5 (74)
ξ (se) | -0.02246 (0.065) | -0.01286 (0.065) | -0.06180 (0.084) | 0.07009 (0.082) | 0.01619 (0.075)
llh | -1289 | -1289 | -1274 | -1250 | -1250
K | 1 | 2 | 2 | 2 | 3
AIC | 2580 | 2582 | 2552 | 2504 | 2506
M0* | - | Uncond | Uncond | Uncond | CondR
D | - | 0 | 30 | 78 | 48
Sig** | - | No (0.635) | Yes (0.000) | Yes (0.000) | Yes (0.000)
ρ*** | - | - | 0.5516 | 0.5989 | 0.5918
* Nested model to which the model is compared in the likelihood ratio test
** Significance is tested at the α = 0.05 level; ( ) indicates the p-value.
*** Correlation between the cross-validated z90 estimates and the observed maximum values

[Figure: maximum streamflow (cfs, 0 to 8000) vs year (1970 to 2000)]
• Conditional quantiles correspond well to the observed record
• Uses concurrent climate, but could also be used with a seasonal forecast

[Figure: GEV PDF of maximum streamflow (cfs, 0 to 8000)]
• The GEV distribution can be compared for specific historic times

• P and T climate change projections from IPCC AR4 are readily available
  • About 12 km resolution (1/8° grid cells)
  • http://gdo-dcp.ucllnl.org/downscaled_cmip3_projections/#Welcome
• Bias correct P & T to historic data for the PWB watershed area

Results indicate increasing maximum streamflow anomalies
[Figure: maximum streamflow anomaly (%) vs year (1950 to 2100); observed, 16 GCM models, GCM model average]

Streamflow quantiles shift higher under CC projections
[Figure: observed vs 16 GCM models]

Likelihood of Turbidity Spike
\[ P(E) = \sum_{S} P(E \mid S)\, P(S) \]
• Conditional P(E): probability of a turbidity spike given a certain maximum flow (Ang and Tang 2007)
[Figure: conditional P(E) vs maximum flow (cfs)]

Likelihood of a turbidity spike increases under CC projections
[Figure: observed vs 16 GCM models]

P(E) percentile | 1950-2007 | 2070-2099
95th (top whisker) | 13 | 28
75th (box top) | 6.3 | 11
50th (box middle) | 4.2 | 5.9

Small shifts in risk can result in high expected loss
[Figure: percent increase in expected loss relative to the 1950-2007 period, for the 50th, 75th and 95th percentiles over 2010-2039, 2040-2069 and 2070-2099; bar labels range from 10 to 115]
• Expected loss can be high, especially for the risk averse

Summary
• Bayesian Hierarchical Modeling
  • Powerful tool for all functional (regression) estimation problems (which is most of forecasting/simulation)
  • Provides model and parameter uncertainties
  • Obviates the need for discarding covariates
  • Enables incorporation of expert opinions
  • Enables modeling a rich variety of variable types
    • Continuous, skewed, bounded, categorical, discrete, etc.
    • And distributions (Binomial, Poisson, Gamma, GEV)
• Generalized Framework
  • Traditional linear models are a subset

Paleo Hydrology Reconstruction
Devineni and Lall, 2012, J. Climate (accepted)

Motivation: Paleo Hydrology, Colorado River Example
[Figure: Colorado River demand and supply; total Colorado River use (9-year moving average) and natural flow at Lees Ferry (9-year moving average); y-axis: annual flow (MAF, 0 to 20); x-axis: calendar year (1914 to 2006); UC and LC CRSS stream gauges]

Streamflow and Tree Ring Data
[Map: study region in New York with the Schoharie, Cannonsville, Pepacton, Neversink and Rondout reservoirs, stream gauges y1 to y5 and the tree-ring sites]

Streamflow and Tree Ring Data
Average summer (JJA) flows as predictand

Reservoir System | Feed Creek | Stream Gauge | Data Record | # of years | Drainage Area (mi²)
Schoharie | Schoharie | 1350000 | 1903 – 1999 | 97 | 237
Neversink | Neversink | 1435000 | 1937 – 1999 | 63 | 67
Rondout | Rondout | 1365000 | 1937 – 1999 | 63 | 38
Cannonsville | West Branch Delaware River | 1423000 | 1950 – 1999 | 50 | 332
Pepacton | East Branch Delaware River | 1413500 | 1937 – 1999 | 63 | 163

Annual tree ring growth index (chronology) as predictor: 246 years of common data

Abbreviation | Site | Species | Number of Trees | Number of Series | Data Record
MHH | Mohonk, NY | Humpty Dumpty Hemlock | 43 | 25 | 1754 - 1999
MLQ | Mohonk, NY | Long, QUSP | 20 | 34 | 1754 - 1999
MRH | Mohonk, NY | Rock Rift Hemlock | 18 | 25 | 1754 - 1999
MSB | Mohonk, NY | Sweet Birch, BELE | 17 | 27 | 1754 - 1999
MPP | Mohonk, NY | Pitch Pine | 23 | 45 | 1754 - 1999
MoCO | Montplace, NY | Chestnut Oak, QUPR | 21 | 34 | 1754 - 1999
MoTP | Montplace, NY | Tulip Poplar, LITU | 20 | 32 | 1754 - 1999
MiCO | Middleburgh, NY | Chestnut Oak, QUPR | 23 | 42 | 1754 - 1999

Summer Flow = f(tree rings) + error
• 246-year chronology (Xt), 1754 to 1999 (8 tree ring chronologies)
• Variable-length streamflow record (Yt), starting in 1903, 1937 or 1950 and ending in 1999 (5 sites)

Preliminary Data Analysis: Bayesian Hypothesis
(correlation: tree chronology vs average summer seasonal flow)
• Station-tree correlations are similar! Pooling?

Hypothesis
• No shrinkage of regression coefficients (no pooling) → traditional regression
• Shrinkage of regression coefficients across sites (partial pooling) → hierarchical model
[Map: reservoirs, stream gauges y1 to y5 and tree-ring sites]

Bayesian Hierarchical Models
• Streamflow: log-normal distribution
• Regression coefficients (β) of the hierarchical model: multivariate normal distribution

Partial Pooling: Hierarchical Model
• Shrinkage on the coefficients to incorporate the predictive ability of each tree chronology on multiple stations
\[ \log(y_t^i) \sim N\left(\mu_t^i, \sigma_i^2\right) \]
\[ \mu_t^i = \beta_0^i + \sum_{j \in \text{trees}} \beta_j^i x_t^j \]
\[ \beta_j^i \sim \mathrm{MVN}\left(\mu_{\beta_j}, \Sigma_{\beta_j}\right) \]
\[ \mu_{\beta_j} \sim N(0, 0.0001), \qquad \beta_0^i \sim N(0, 0.0001), \qquad \sigma_i \sim \mathrm{unif}(0, 100) \]
\[ p(\theta \mid \text{data}) = \frac{p(\theta)\, p(\text{data} \mid \theta)}{p(\text{data})} \]
Here i indexes the site, t the year and j the tree stand.

Key ideas:
1. Streamflow at each site comes from a pdf
2. Parameters of each pdf are informed by each tree
3. Common multivariate distribution of parameters across trees
4. Noninformative prior for the parameters of the multivariate distribution
5. MCMC for parameter estimation

[Diagram: hierarchical model graph linking site i, year t and tree stand j]

Delaware River Reconstruction and Performance
Models developed
• Hierarchical Bayesian Regression (partial pooling)
• Linear Regression (no pooling)
Model simulations
• WinBUGS: Bayesian Inference Using Gibbs Sampler
• 7500 simulations with 3 chains and convergence tests
Cross-validated performance metrics
• Reduction of Error (RE), Coefficient of Efficiency (CE)

Delaware River Reconstruction and Performance
[Figure: posterior PDF (model level 1)]

Delaware River Reconstruction and Performance
[Figure: regression coefficients (model level 2), no pooling vs partial pooling]

Delaware River Reconstruction: Cross-Validated Performance
[Figure: Cannonsville and Pepacton]

Paleo Hydrology Reconstruction: Traditional Methods
• Linear/nonlinear regression
• PCA of tree rings
• Regression on leading PCs

Objective 1: Tree-ring Reconstructions
• LCBR
• Naturalize streamflow
• 9 nodes in CRSS
  • 5 are well correlated with precipitation (>0.5)
    • Referred to as "good nodes" (blue)
  • 4 are not correlated (<0.1)
    • Referred to as "noise nodes" (yellow)

Tree-Ring Reconstruction Approaches
• Multiple Linear Regression
  • Individual chronologies are added in a stepwise fashion
• Principal Component Linear Regression
  • Eliminates multicollinearity
  • Parsimonious model, since the majority of the variance is represented in fewer variables
• K-nearest neighbor nonparametric approach
  • No assumption of distribution
  • Captures nonlinearities
  • Removes undue influence of outliers

New Approach
• Cluster analysis on the tree-ring chronologies to find distinct, coherent climate signals.
  • K-means clustering approach
  • Increases the amount of climate signal that can be extracted
• Perform PCA on each cluster and provide the leading PCs from each cluster as potential predictors
  • Signal that may have been washed out during PCA on the entire pool of predictors is preserved

Regression Methods
• Present two regression methods to add to the tree-ring reconstruction repertoire
  • Local polynomial regression
  • Extreme Value Analysis (EVA)

Method 1: Local Polynomial Regression
• Find the K nearest neighbors and fit a polynomial to the neighborhood
• Polynomials are fitted in the GLM framework, where Y can be of any distribution in the exponential family (normal, gamma, binomial, etc.)
\[ G(E(Y)) = f(X) + \epsilon \]
  • G(.) = link function
  • X = set of predictors/independent variables
  • E(Y) is the expected value of the response/dependent variable
  • $\epsilon$ is the error, assumed to be normally distributed
• Improvement over K-NN resampling
  • Values beyond those found in the historical record can be generated
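The local polynomial regression idea (find the K nearest neighbors, then fit a polynomial to that neighborhood) can be sketched as follows. This is a simplified illustration rather than the method used in the reconstructions: it assumes a single predictor, a degree-1 polynomial and a Normal response with identity link (the simplest GLM case). The function name and the data are hypothetical.

```python
def local_linear_predict(x0, xs, ys, k):
    """Predict y at x0 from a straight line fitted to the k nearest neighbors of x0."""
    # Indices of the k observations whose predictor values are closest to x0.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    xn = [xs[i] for i in nearest]
    yn = [ys[i] for i in nearest]
    m = len(xn)
    mean_x = sum(xn) / m
    mean_y = sum(yn) / m
    # Ordinary least squares for a line, using only the neighborhood.
    sxx = sum((x - mean_x) ** 2 for x in xn)
    if sxx == 0.0:
        # All neighbors share one x value: fall back to the local mean.
        return mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xn, yn)) / sxx
    return mean_y + slope * (x0 - mean_x)


# Hypothetical data lying exactly on y = 2x + 1, with x from 0 to 9.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]

inside = local_linear_predict(4.5, xs, ys, k=4)    # within the observed range
outside = local_linear_predict(12.0, xs, ys, k=4)  # beyond the largest observed x
```

The second call evaluates the local fit at x = 12, beyond the largest observed x of 9. Unlike K-NN resampling, which can only return values already in the record, the fitted line can produce values outside the observed range.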
