Daniel Emerson - SaferRoads Conference

USING DATA MINING TO PREDICT ROAD CRASH COUNT WITH A FOCUS ON SKID RESISTANCE VALUES

Authors
Daniel Emerson, Richi Nayak: QUT
Justin Weligamage: QDTMR

Presenter
Daniel Emerson
Computer Science Discipline
Queensland University of Technology (QUT)

Project Details
• The work for this presentation was conducted as part of a larger skid resistance and crash analysis under the CIEAM I and CIEAM II projects, from 2009 to 2011, at QUT.
• Project initiators & organizers: Justin Weligamage, Richi Nayak.
• Data mining supervisor: Richi Nayak.
• Data preparation, data mining & DM strategist: Daniel Emerson.
• Road engineering advisor: Nappadol Piyatrapoomi.

Motivation (why the work was done)
• Applied data mining as a new approach for analysing Queensland road and crash data.
• Had found a relationship between the crash risk of roads and their attributes, with skid resistance being significant (roads having crashes).
• Sought a higher-resolution measure of road crash risk through the crash count method.
• Application of crash count data mining models in decision support systems, to identify potential roads for investigation and treatment.

Introduction
This paper presents a data mining case study in which predictive data mining is applied to model the relationship between skid resistance, other road attributes and crashes, with the purpose of:
• developing models (algorithms) on sample data,
• applying the models to other data to predict high-risk roads.

Data and Data Preprocessing
• Several data sources were obtained from QDTMR for the four-year period 2004 to 2007, including:
– annual 1 km (or less) road segment snapshots with a list of road variables: road surface texture depth test readings; seal type and seal age; roadway features; traffic flow; features such as intersections; and many others,
– dated skid resistance values at 100 metre (or less) intervals, representing F60 skid resistance tests,
– crash instances, crash details and their road location.

Examination of road segment crash count
• Meeting our need for a more precise crash measure: crashes per 1 km per year.

Crash count characteristics
[Figure: Scatterplot of 2004, 2005, 2006, 2007 vs Year Crash Count. Axes: Year Crash Count, Crash Instance Count; 1 yr time scale.]
• Road segment crash count showed stability from year to year, indicating its value in crash risk analysis.

Clusters: crash count ranges (4yr)
• Road segment data mining clusters based on road properties showed characteristic crash counts, thus relating road crash proneness to road properties.

Method: Applying predictive data mining
Reasons:
• To demonstrate that road segment crash count can be modelled, thus establishing a relationship between crash count and roadway features.
• To use the rules obtained from the model output in the analytical process, to further the understanding of how roadway features contribute to crash count.
• To later apply successful models in decision support.

Method: Applying predictive data mining … using a subset of quality data
• Select the target variable to be predicted (crash count).
• Select the input variables (road segment attributes).
• Select a modelling method (regression tree algorithm).
• Run a range of models with varying configurations (regression tree).
• Evaluate and understand the results.
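The modelling steps above can be sketched in a few lines of code. The project itself used SAS and WEKA; what follows is only a minimal, standard-library illustration of a regression tree, not the project's implementation. The field names (friction, aadt, crash_count), the synthetic data and the rule used to generate it are all hypothetical and are not project results.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, target, features, min_leaf):
    """Find the (feature, threshold) pair that most reduces total SSE."""
    best, best_score = None, sse([r[target] for r in rows])
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [r[target] for r in rows if r[f] < t]
            right = [r[target] for r in rows if r[f] >= t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # keep every leaf reasonably populated
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow(rows, target, features, min_leaf=20, max_depth=4, depth=0):
    """Grow a regression tree; each leaf stores N and AVE (mean target)."""
    split = best_split(rows, target, features, min_leaf) if depth < max_depth else None
    if split is None:
        ys = [r[target] for r in rows]
        return {"n": len(ys), "ave": sum(ys) / len(ys)}
    f, t = split
    lo = [r for r in rows if r[f] < t]
    hi = [r for r in rows if r[f] >= t]
    return {"feature": f, "threshold": t,
            "left": grow(lo, target, features, min_leaf, max_depth, depth + 1),
            "right": grow(hi, target, features, min_leaf, max_depth, depth + 1)}

def predict(tree, row):
    """Follow the splits down to a leaf and return its average."""
    while "ave" not in tree:
        branch = "left" if row[tree["feature"]] < tree["threshold"] else "right"
        tree = tree[branch]
    return tree["ave"]

def leaves(tree):
    """Yield every leaf node of the tree."""
    if "ave" in tree:
        yield tree
    else:
        yield from leaves(tree["left"])
        yield from leaves(tree["right"])

def r_squared(actual, predicted):
    """Coefficient of determination between actual and predicted values."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical synthetic road segments: NOT project data.
random.seed(0)
rows = []
for _ in range(400):
    friction = random.uniform(0.2, 0.7)   # stand-in for F60 skid resistance
    aadt = random.uniform(500, 10000)     # stand-in for traffic volume
    base = 1 + aadt / 2000 + (3 if friction < 0.4 else 0)  # invented rule
    rows.append({"friction": friction, "aadt": aadt,
                 "crash_count": max(0, round(base + random.gauss(0, 0.5)))})

# Target variable, input variables, model, evaluation.
tree = grow(rows, "crash_count", ["friction", "aadt"])
actual = [r["crash_count"] for r in rows]
predicted = [predict(tree, r) for r in rows]
r2 = r_squared(actual, predicted)
print(f"leaves: {sum(1 for _ in leaves(tree))}, R-squared: {r2:.2f}")
```

Each leaf corresponds to one IF ... THEN rule, with the leaf's n and ave playing the role of the group size and average crash count in a rule listing. Varying min_leaf and max_depth is one way to produce a range of model configurations for comparison.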
Model variables
Target variable
• Road segment crash count
Road attribute input variables (in order of significance)
• AVG_FRICTION_AT_60_Ikm (F60 skid resistance)
• AADT (traffic rates)
• traffic_percent_heavy
• lane_count
• Texture Depth
• roughness_average
• rutting_average
• seal_age
• seal_type
• CRASH_SPEED_LIMIT
• CWAY_TYPE (single, double)
• CRAS_DIVIDED_ROAD
• ROAD_TYPE (highway, urban arterial etc.)
• Roadway Feature (roundabouts, bridges, intersections etc.)
These road segment attributes were relevant to predicting road segment crash count and became model input variables.

Model results
Model   Leaves & rules   Correlation (R-squared)
1       143              0.93
2       159              0.93
3       161              0.93
4       163              0.92
5       119              0.91
6       88               0.86
• All models show a high correlation between actual crash count and predicted crash count.

Charts of actual value vs. predicted value
[Figure: Charts of actual value vs. predicted value. Axes: actual value, predicted value.]
• Comparing models with 143 leaves and 83 leaves

A sample output rule
Sample Rule 1.
IF AVG_FRICTION_AT_60 < 0.4095
AND CRASH_SPEED_LIMIT IS ONE OF: 90 100 110
AND 3987 <= AADT < 6105
AND CWAY_TYPE EQUALS SINGLE
THEN
NODE : 48
N    : 315      (number of road segments in the group)
AVE  : 4.04444  (average crashes for the group)
SD   : 2.5357   (standard deviation of the predicted crash values)

Conclusion
• Road segment crash count can be successfully modelled with road attributes using data mining.
• A strong relationship exists between road crash count and road attributes.
• Skid resistance plays an important role in determining the crash characteristics of the road segment.
• The models may be of sufficient quality to use in decision support.
• While the models are specific to Queensland roads, the method can be trialled and evaluated elsewhere.

Future Work
• Work with road asset domain experts to analyse the rules, draw conclusions and improve the models.
• Apply models for analysis of data subsets, such as crashes with severe human outcomes.
• Apply the models to the whole-of-network dataset with the goal of identifying road segments that are skid resistance sensitive, i.e. where surface intervention to improve skid resistance will result in reduced crash risk.

Acknowledgement
This study is part of an ongoing investigation into road crashes, supported by CIEAM (CRC Asset Management), QDTMR and the Faculty of Science and Technology, QUT.
Data mining tools used include:
• SAS (Statistical Analysis Software)
• WEKA (Data Mining Software)

Thanks and Questions

Project Publications
[1] Nayak, R., Piyatrapoomi, N. and Weligamage, J. (2009). Application of text mining in analysing road crashes for road asset management. Proceedings of the Third World Congress on Engineering Asset Management, WCEAM 2009 (Athens, Greece, 28-30 September 2009).
[2] Nayak, R., Emerson, D., Weligamage, J. and Piyatrapoomi, N. (2010). Using Data Mining on Road Asset Management Data in Analysing Road Crashes. Proceedings of the 16th Annual TMR Engineering & Technology Forum (Brisbane, July 20, 2010).
[3] Emerson, D., Nayak, R., Weligamage, J. and Piyatrapoomi, N. (2011). Identifying differences in wet and dry road crashes using data mining. Proceedings of the Fifth World Congress on Engineering Asset Management, WCEAM 2010 (Brisbane, October 26, 2010).
[4] Nayak, R., Emerson, D., Weligamage, J. and Piyatrapoomi, N. (2011). Road Crash Proneness Prediction using Data Mining. Proceedings of the EDBT 2011 (Uppsala, Sweden, 2011).
