Download Predictive Methods and Statistical Modeling of Crash Data II

Fall 2015 Statistical Models For Crash Data Modeling Process Determine Modeling Objectives •Definition (Intersections, Pedestrians, etc.) •Data availability •Unit Scales (Crashes/year; Severity; etc.) Establish Appropriate Process •Sampling Models •Observational Models •Process/System State Models •Parameter Models (Bayesian Models Only) Statistical Models For Crash Data Modeling Process Determine Inferential Goals •Point estimate (Value + Standard Error) •Distribution (Bayesian Models) •Percentiles (2.5%, 85%, etc.; Bayesian Models) Select Computation Techniques •Frequentist (MLE) •Bayesian (via simulation) •Empirical Bayes Evaluate Models •Goodness-of-Fit •Prediction •Confidence Intervals Statistical Models For Crash Data Data and Methodological Issues Associated with Crash-Frequency Data Data/Methodological Issue Overdispersion Associated Problems Can violate some the basic count-data modeling assumptions of some modeling approaches Underdispersion As with overdispersion, can violate some the basic count-data modeling assumptions of some modeling approaches Time-varying explanatory variables Averaging of variables over studied time intervals ignores potentially important variations within time intervals – which can result in erroneous parameter estimates Correlation over time and space causes losses in estimation efficiency Causes an excess number of observations where zero crashes are observed which can cause errors in parameter estimates Temporal and spatial correlation Low sample mean and small sample size Injury severity and crash type correlation Correlation between severities and crash types causes losses in estimation efficiency when separate severity-count models are estimated Under reporting Under reporting can distort model predictions and lead to erroneous inferences with regard to the influence of explanatory variables Omitted variables bias If significant variables are omitted from the model, parameter estimates will be biased and possibly erroneous inferences with regard to the influence of explanatory variables will result Endogenous variables If endogenous variables are included without appropriate statistical corrections parameter estimates will be biased and erroneous inferences with regard to the influence of explanatory variables may be drawn Functional form If incorrect functional for is used, the result will be biased parameter estimates and possibly erroneous inferences with regard to the influence of explanatory variables If parameters are estimated as fixed when they actually vary across observations, the result will be biased parameter estimates and possibly erroneous inferences with regard to the influence of explanatory variables Fixed parameters Statistical Models For Crash Data Summary of Existing Models for Analyzing Crash-Frequency Data Model Type Poisson Advantages Most basic model; easy to estimate Negative binomial/Poissongamma Easy to estimate can account for overdispersion Poisson-lognormal More flexible than the Poisson-gamma to handle over-dispersion Zero-inflated Poisson and negative binomial Handles datasets that have a large number of zero-crash observations Conway-Maxwell-Poisson Gamma Generalized estimating equation models Generalized additive models Disadvantages Cannot handle over- and under-dispersion; negatively influenced by the low sample mean and small sample size bias Cannot handle under-dispersion; can be adversely influenced by the low sample mean and small sample size bias Cannot handle under-dispersion; can be adversely influenced by the low sample mean and small sample size bias (less than the Poisson-gamma); cannot estimate a varying dispersion parameter Can create theoretical inconsistencies; zeroinflated negative binomial can be adversely influenced by the low sample mean and small sample size bias Can handle under- and over-dispersion or Could be negatively influenced by the low combination of both using a variable dispersion sample mean and small sample size bias; no (scaling) parameter multivariate extensions available to date Can handle under-dispersed data Truncated distribution (full gamma function); independence of data (incomplete gamma function) Can handle temporal correlation May need to determine or evaluate the type of temporal correlation a priori; results sensitive to missing values More flexible than the traditional generalized Relatively complex to implement; may not estimating equation models; allows non-linear be easily transferable to other datasets variable interactions Statistical Models For Crash Data Summary of Existing Models for Analyzing Crash-Frequency Data Model Type Advantages Random-effects models Handles temporal and spatial correlation Negative multinomial Can account for overdispersion and serial correlation; panel count data. Random-parameters models More flexible than the traditional fixed parameter models in accounting for unobserved heterogeneity Can model different crash types simultaneously; more flexible functional form than the generalized estimating equation models (can use non-linear functions) Bivariate/multivariate models Finite mixture/Markov Switching Duration models Can be used for analyzing sources of dispersion in the data By considering the time between crashes (as opposed to crash frequency directly), allows for a very in-depth analysis of data and duration effects Hierarchical/Multilevel Models Can handle temporal, spatial and other correlations among groups of observations Neural Network, Bayesian Neural Network, and support vector machine Disadvantages May not be easily transferable to other datasets Cannot handle under-dispersion; can be adversely influenced by the low sample mean and small sample size bias Complex estimation process; may not be easily transferable to other datasets Complex estimation process; requires formulation of correlation matrix Complex estimation process; may not be easily transferable to other datasets Requires more detailed data than traditional crash frequency models; timevarying explanatory variables are difficult to handle May not be easily transferable to other datasets; correlation results can be difficult to interpret Non parametric approach does not require an Complex estimation process; may not be assumption about distribution of data; flexible transferable to other datasets; work as functional form; usually provides better black-boxes; may not have interpretable statistical fit than traditional parametric parameters models Review of Multivariate Linear Models Ordinary Least Square Method: This is an estimation technique that is used for estimating unknown coefficients. It consists of solving p = k + 1 simultaneously linear equations and by minimizing the sum of square errors. Let yi   0  1 xi1   2 xi 2    k xik   i k yi   0    j xij   i j 1 Note: E(ε) = 0 and var(ε) = σ2 Review of Multivariate Linear Models The least square function S is given by n S   2 i 1    S    yi  b0   b j xij     i 1  j 1   n k 2 The S function is to be minimized with respect to β1, β2, …, βk. The least square estimators, say b0, b1, …, bk, must satisfy S |b0 ,b1,  0 S |b0 ,b1,  j k    ,bk  2  yi  b0    j xij    0 i 1  i 1   n  k    ( yi  b0   b j xij )  xij  0 ,bk  2    i 1  j 1    n j = 1, 2, …, k Review of Multivariate Linear Models It is easier to solve the equations by using a matrix format. The equations can be written the following way: y  Xβ   where  y1  1 x11 x12    1 x x y 21 22 2    X y       1 xn1 xn 2  yn  x1k   1   1       2   x2 k  2    ε β           xnk  n   n  Review of Multivariate Linear Models Need to find the least square estimator b that minimizes n S (  )    i2     (y  Xβ)(y  Xβ) i 1 It can be shown that S(β) can be expressed this way S ( )  yy  2βXy  βXXβ The least square estimator* must satisfy S |b  2Xy  2XXb  0  which simplifies to XXb  Xy b  (XX) Xy 1 * b is called the ordinary least squares estimator of β. Review of Multivariate Linear Models Maximum Likelihood Method: The likelihood function is found from the joint probability distribution of the observations. Given the assumption that the distribution of errors is normally distributed and the variance σ2 is constant, the likelihood function is the following (normal distribution) (y, β,  )   1 2 (2 ) Same model as before: 2 e n/2 1 2 Y  Xβ 2 ( y  Xβ )( y  Xβ ) Review of Multivariate Linear Models The maximum likelihood estimators are the values of the parameters β and σ2 that maximize the likelihood function. Maximizing the likelihood is equivalent to maximizing the log-likelihood, ln( ). The log-likelihood is: n n 1 2 ln[ (y, β,  )]   ln(2 )  ln( )  2 ( y  Xβ)( y  Xβ) 2 2 2 2 The derivative of the log-likelihood function is called the score function. Taking the derivatives with respect to the coefficients β and equating to zero yields  ln( ) 1   2 (2bXy  bXXβ)  0 β 2 1 2 2 X(y  Xb)  0 b =  XX Xy Review of Multivariate Linear Models Taking the partial derivative with respect to  2 gives  ln( ) 1 1   2  4 (y  Xb)(y  Xb)  0 2  2 2 Which is 1   (y  Xb)(y  Xb) n 2 Generalized Linear Models In the previous overheads, it was obvious how the normal distribution played an important role in estimating the coefficients and inferences of probabilistic models. Unfortunately, there are many practical situations where the normal assumption is not valid. Count data, binary response (0 or 1) or other continuous variables with positive and high-skewed distribution cannot be modeled with a normally distributed errors. The generalized linear model (GLM) was developed to allow fitting regression models for univariate response data that follows a very general distribution called exponential family. This family includes the normal, binomial, negative binomial, geometric, gamma, etc. Statistical Models For Crash Data Poisson-gamma Model (NB) The crash count (or any count) follows a Poisson distribution: e  i  i  f ( yi | i )  yi ! The mean of yi, conditional on μi, is Poisson with the conditional mean and variance given by f ( yi | i )    0 e  i    i     1e d  i i yi !    i i Statistical Models For Crash Data Poisson-gamma Model (NB) The PDF of the Poisson-gamma regression for yi is  ( yi   )     ui  f ( yi )      ( yi  1)( )    ui     i  yi The mean and variance are given by  E ( yi )  ui Var ( yi )  ui   2 i The mean function is given by or Var ( yi )  ui  i2 E ( yi )  ui  exp(xi β) Statistical Models For Crash Data Poisson-gamma Model Example – Crash Data at 3-legged signalized intersections: Functional form:   e  0  1 Fmaj   2 Fmaj Functional form needed to model crash data:     0 Fmaj Fmin 1 2 Where,   Expected number of crashes Fmaj  Major traffic flow Fmin  Minor traffic flow Need to take the natural log of the flow variables Statistical Models For Crash Data Poisson-gamma Model The GENMOD Procedure Model Information Data Set WORK.C Distribution Negative Binomial Link Function Log Dependent Variable Total Total  e 10.113 0.740 maj F 0.505 min Number of Observations Read Number of Observations Used F   4.05E  05  F 0.740 maj 255 255 Criteria For Assessing Goodness Of Fit 0.505 min F Criterion DF Value Value/DF Deviance 252 288.8580 1.1463 Scaled Deviance 252 288.8580 1.1463 Pearson Chi-Square 252 312.6975 1.2409 Scaled Pearson X2 252 312.6975 1.2409 Log Likelihood 836.0686 Full Log Likelihood -606.7989 AIC (smaller is better) 1221.5978 AICC (smaller is better) 1221.7578 BIC (smaller is better) 1235.7628 Algorithm converged. Var ( y)    0.313 2 Analysis Of Maximum Likelihood Parameter Estimates Standard Wald 95% Confidence Wald Parameter DF Estimate Error Limits Chi-Square Pr > ChiSq Intercept 1 -10.0648 1.3659 -12.7420 -7.3876 54.29 <.0001 logf_maj 1 0.7517 0.1320 0.4929 1.0105 32.41 <.0001 logf_min 1 0.4837 0.0562 0.3735 0.5939 74.01 <.0001 Dispersion 1 0.3153 0.0519 0.2135 0.4170 NOTE: The negative binomial dispersion parameter was estimated by maximum likelihood. Statistical Models For Crash Data Statistical fit (Goodness of fit) There are various methods for estimating the statistical fit of models. The methods cane be divided into two categories: Likelihood Statistics • Log-Likelihood • Deviance • Pearson Chi-Square • Akaike’s Information Criterion (AIC) • Bayesian Information Criterion (BIC) Model Errors • Mean Absolute Deviance • Mean Squared Prediction Errors Statistical Models For Crash Data Log-likelihood Poisson: n ln L       yi ln  i   i  ln  yi ! i 1 NB:    1i  ln L  ,      yi ln  1 1   i i 1    n Where: i  expxiβ   1   ln 1     ln  y    ln  y  1  ln    i   i     i     Statistical Models For Crash Data Log-likelihood Example – Crash Data at 3-legged signalized intersections: Poisson: -685.34 NB: -606.80 Statistical Models For Crash Data Statistical fit (Goodness of fit) The deviance statistic is defined as twice the difference between the maximum log-likelihood achievable (y=μ) and the log-likelihood of the fitted model: D(y | μˆ )  2{ (y )  (μˆ )} When competitive models are compared, the model with the lowest deviance offers the best statistical fit. A note of caution: this is only valid when the dispersion parameter Φ is the same for each competitive model. Statistical Models For Crash Data Statistical fit (Goodness of fit) The deviance statistic for the Poisson model is the following:   yi DP  2  yi ln  i 1   ˆ i  n     ( yi  ˆ i )    The deviance statistic for the Poisson-gamma model is the following:   yi  2  yi ln  i 1   ˆ i  n DNB 1   yi     1   ( yi   ) ln  1     ˆ i     Statistical Models For Crash Data Statistical fit (Goodness of fit) The deviance statistic for the Poisson model is the following: DP  644.4 The deviance statistic for the Poisson-gamma model is the following: DNB  288.9 Statistical Models For Crash Data Statistical fit (Goodness of fit) AIC and BIC penalize the fit when additional variables are added to the model. AIC: AIC  2ln L  2P BIC: BIC  2 ln L  P  ln(n) P = estimated coefficients + 1 n = number of observations Statistical Models For Crash Data AIC and BIC AIC and BIC penalize the fit when additional variables are added to the model. AIC: AICP  2  685.3  2  3  1,376.7 AICNB  2  606.8  2  4  1, 221.6 BIC: BICP  2  685.3  3  ln(255)  1,387.2 BICNB  2  606.8  4  ln(255)  1, 235.8 Statistical Models For Crash Data Statistical fit (Model Errors) Mean Absolute Deviation (MAD) This criterion has been proposed by Oh et al. (2003) to evaluate the fit of models. The Mean Absolute Deviance (MAD) calculates the absolute difference between the estimated and observed values 1 n MAD   ˆ i  yi n i 1 Mean Squared Prediction Error (MSPE) The Man Squared Prediction Error (MSPE) is a traditional indicator of error and calculates the difference between the estimated and observed values squared. 1 n 2 MPSE    ˆ i  yi  n i 1  Recent Models for Over-dispersion: ◦ Poisson-lognormal  Poisson mean follows a lognormal distribution ◦ Poisson-Weibull  Poisson mean follows a Weibull distribution ◦ Random-Parameters (investigation of the variance) ◦ Negative Binomial-Lindley (highly dispersed data)  Overcome problems with zero-inflated models. ◦ Generalized Sichel (highly dispersed data) ◦ Generalized Waring (highly dispersed data – investigation of variance) ◦ Finite mixture (Poisson and Poisson-gamma – investigation of variance and structure of data) ◦ Bayesian Model Averaging (automatically compare different models) ◦ See AA&P and Safety Science for info on some of these models.  Recent Models for Under-dispersion: ◦ Not very common; usually with low sample mean and often based on model output (conditional on the mean). ◦ All the models below can be also used for over-dispersion ◦ Gamma time-dependent  Observations not independent. ◦ Conway-Maxwell-Poisson  Has become increasingly popular ◦ Double-Poisson  Work published ◦ Hyper-Poisson  Work published    Crash data have often the characteristics that the mean μ can be very low (below 1.0) Create problems with goodness-of-fit and prediction Read papers by ◦ Wood, G.R. (2004) Generalised Linear Models and Goodness of Fit Testing. Accident Analysis & Prevention, Vol. 34, pp. 417-427. ◦ Lord, D. (2006) Modeling Motor Vehicle Crashes using Poisson-gamma Models: Examining the Effects of Low Sample Mean Values and Small Sample Size on the Estimation of the Fixed Dispersion Parameter. Accident Analysis & Prevention, Vol. 38, No. 4, pp. 751-766. Statistical Models For Crash Data Low Mean Issue Statistical Models For Crash Data Time Trend Effects 2.5 Mean (crashes per year) 2 1.5 1 0.5 0 0 1 2 3 4 Year 5 6 7 Statistical Models For Crash Data Time Trend Effects Goal: capture changes that vary from year to year directly into the model. The model structure is given by the following: yit  0,it   j 1  ji x j p Time Trend captured with the intercept (i.e., one intercept for each year) Characteristic: each year is defined as a different observation. Issues: Since each site is observed at a different point in time, a temporal serial correlation exits and affects the statistical inferences of statistical models. Therefore, you need to account for this correlation into the model. Modeling approach: Generalized Estimating Equations (GEE); Random-Effects models, etc.      The Bayes method approaches the analysis of data differently than the classical method (frequentist) Subjective judgment more easily incorporated with the observed data and models Treat unknown coefficients of regression models as random variables Data analysis less limited by the number of observations (can be supplemented with subjective judgment) Computationally intensive (no longer an issue)   The Bayes method makes inferences from data using probability models for quantities that are observed and for quantities one is interested to learn about Bayesian data analysis can be divided into three steps: ◦ Setting up a full probability model: provide a joint probability distribution for all observable and unobservable quantities ◦ Conditioning on observed data: calculating and interpreting the appropriate posterior distribution (conditional probability distribution) ◦ Evaluating the fit of the model and implication of the posterior distribution  Emphasis placed on interval estimation (confidence interval) rather than hypothesis testing     For the EB method, a different weight is assigned to the prior distribution and standard estimate respectively In safety analyses, the weights are estimated with the assumption that the mean () for each site follows a Gamma distribution The EB estimates has been found to outperform other estimates, such as the MLE The EB framework is presented on next overhead Formulation: ˆˆ  ˆ  (1   ) y where  1 ˆ 1  Mean of a Poisson-gamma regression  1  Dispersion parameter of NB regression Using the same example shown earlier: F1 = 24,164; F2 = 3,392; y=10 The values are estimated as follows 0.816 0.3732 ˆ u  5.5E  5  24,164  2,560 ˆ  3.9 Crashes per year 1   0.39 3.90 1 2.46 ˆˆ  0.39  3.9  (1  0.39)10  7.63 Crashes per year Crashes per Year Observed value 10 EB estimate 7.63 MLE estimate 3.9 1 t 2 Year

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Predictive Methods and Statistical Modeling of Crash Data II