
APPLICATION OF DATA MINING TECHNIQUES FOR ANALYZING ROAD TRAFFIC INCIDENTS: A CASE STUDY OF LIBYA'S ROAD TRAFFIC

Sayed Mujahed Hussain 1, Faraj A. El-Mouadib 2
1 Department of Computer Science, Faculty of Arts and Science Ajdabiya, University of Benghazi, Ajdabiya, Libya
2 Department of Computer Science, Faculty of Information Technology, University of Benghazi, Benghazi, Libya

Abstract: This paper presents a proposed approach to traffic management that applies Data Mining (DM) functionalities in combination with an intelligent agent-based system. A case study was conducted to illustrate the efficiency of Cluster Analysis, one of the Data Mining functionalities, in combination with the intelligent agent-based system. Traffic in both developed and developing countries is regulated by traffic lights, and most of these lights are based on a fixed-cycle protocol. This paper examines the causes of accidents from a DM perspective and proposes a way to avoid them. The results of this research show that Data Mining functionalities, combined with various techniques, hold high potential for an Intelligent Transportation System that controls traffic and helps avoid possible accidents.

Keywords: Data Mining, Cluster Analysis, Intelligent Transportation System, Agent Based System, Traffic Management.

1. INTRODUCTION
With the rapid development of Information Technology in the area of databases and Data Mining, people can access huge amounts of information of interest to them. Business, governmental and scientific databases are growing exponentially, which makes analyzing, interpreting and making maximum use of their data difficult and resource-demanding. Data Mining is therefore considered a useful tool for sifting useful knowledge, such as hidden patterns, out of databases. Data are now increasing at an alarming rate, fed by sources such as satellites, radar, cameras, sensors and other scientific instruments. Understanding, interpreting and using the extracted knowledge is therefore an important issue for making real-time decisions. To make an effective decision, data from various sources must first be gathered and organized properly. However, merely gathering data is not sufficient to make use of it. The widening gap between data and information calls for the systematic development of data mining tools that will turn data tombs into "golden nuggets" of knowledge [1].

The tasks of data mining can be classified into two categories: descriptive data mining and predictive data mining. Some of the most common DM techniques are decision trees, production rules and neural networks. Before the emergence of DM, the only analysis tool available was simple statistical manipulation, which had little power to present the data of interest to a particular user. Traffic control is one of the areas where DM functionalities are being used effectively to reduce the death rate from road accidents.

Road traffic injuries are predictable and preventable. Good data are needed, however, to understand how road safety interventions and technology can be transferred successfully from the developed countries where they have proven effective [2]. According to the World Health Organization, road traffic injuries caused an estimated 1.24 million deaths worldwide in 2010, slightly down from 1.26 million in 2000 [3]. Road accidents have earned India a dubious distinction: with over 130,000 deaths annually, the country has overtaken China and now has the worst road traffic accident rate in the world [4].

Statistics show a very clear increase in road deaths in Libya during the past few years [5]. Figure 2 shows the number of deaths caused by road accidents in Libya from 1990 to 2010. This research study focuses on the causes of road traffic accidents in Libya, where 50,000 people died in road accidents during the forty years from 1969 to 2009. This unpleasant fact was revealed in the Global Status Report on Road Safety released in 2008 by the World Health Organization. Road traffic accidents in Libya depend on changeable factors. Certain parts of the day, specific days and the weather conditions of a particular day all contribute to road accidents in Libya, alongside many other factors.

The intelligent traffic control system is one of the most modern centralized traffic control systems in the world. Its central computer analyzes the traffic volume of the city and records the data for further study. It then implements a timing plan for smooth traffic flow without delays or traffic jams at intersections and crossroads. With this system, free traffic flow is ensured with a minimum of manpower. Conventional traffic light systems operate on a timing mechanism that changes the lights after a given interval. An Intelligent Traffic Light system, by contrast, senses the presence or absence of vehicles and reacts accordingly. The idea behind this research is to exploit techniques readily available in Data Mining to help minimize the waiting time of vehicles at signalized intersections. This will reduce the mental stress of vehicle users, reduce pollution, save precious fuel and, most importantly, help avoid accidents.

2. ROLE OF DATA MINING IN TRAFFIC INCIDENTS
Data mining requires the identification of a problem, the collection of data that can lead to better understanding, and computer models that provide statistical or other means of analysis [6]. Data Mining is task oriented, so the first stage is to gather data from many different sources. The second task is to identify the data related to the problem description. Next, the variables must be chosen carefully so that they are independent of each other, that is, so that they do not contain overlapping information. Figure 1 depicts the common process of a knowledge mining task, which is usually applied in every field where Data Mining is used.

Figure 1. Steps of the knowledge mining process (taken from ref. [7]).

Road traffic accidents are predictable and preventable. Good data are an important consideration. So is an understanding of how road safety interventions and technology can be transferred successfully from other disciplines where they have been shown to be effective.

3. PROFILE OF THE PROBLEM
In developed countries, road traffic death rates have decreased since the 1960s because of successful interventions such as seat belt laws [2]. In developing countries, however, traffic fatalities have increased because proper technology has not been used adequately. The increasing number of vehicles is contributing significantly to the rise in road traffic deaths in Libya. An estimated 2375 people died and 14025 were injured in Road Traffic Accidents (RTA) in Libya in 2010 [8]. Figure 2 depicts the yearly death toll in Libya.

Figure 2. Deaths caused by road accidents in Libya from 1990 to 2010 (series: No. of Accidents, No. of Killed, No. of Injured; vertical axis: percentage, 0% to 100%).

Table 2. Statistics of Road Traffic Accidents in Libya (1990-2010)
Year: 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
No. of Accidents: 7847, 7749, 8423, 9009, 8400, 8419, 8437, 9279, 9393, 9370, 10667, 10895, 12017, 12154, 11643, 11898, 11982, 13165, 13352
No. of Killed: 1312, 1279, 1435, 1440, 998, 1071, 927, 1119, 1224, 1204, 1504, 1598, 1751, 1744, 1785, 1800, 1866, 2138, 2332, 2138, 2375
No. of Injured: 6665, 6342, 6323, 7852, 7432, 7703, 7750, 8076, 8343, 8394, 9617, 10033, 11058, 10502, 10746, 11541, 12164, 13497, 13725, 14025
Source: General People's Committee General Security (2010).

4. DATA COLLECTION
The most important step in the mining task is collecting the relevant data and the required information. Data for this study were collected from many different sources and online repositories. In Libya, major roads have improved in terms of length, quality and linkage between cities [8]. The important agents in traffic accidents are the driver (the human element), the road and the vehicle. Traffic department records show many different causes of traffic accidents inside the country. These include high speed (speeding), lack of attention, improper stopping, driving under the influence of drugs, improper turning, violation of traffic laws, ignoring right of way and using mobile phones while driving, among other reasons. Table 1 shows the major causes of accidents and the percentage of road accident deaths attributed to each.

Table 1. Reasons for traffic accidents.
Cause                               Percentage (%)
Carelessness                        22.5
Close following or tailgating       17.0
Over speeding                       15.0
Disregarding traffic priority       14.6
Using incorrect lane                12.4
Bad turn                            10.5
Incorrect reversing                  5.3
Disallowing pedestrian priority      1.2
Wrong overtaking                     0.8
Wrong turn                           0.7
Source: GPCGS 2011, ref. [8]

4.1. DATA SELECTION
The following data sets were used in our study:
Accident data: the complete information related to the accidents. This covers the severity of each accident (fatal or non-fatal), the location of the accident and details about the road conditions. It also covers temporal data such as year, month, day, hour and date.
Agents data: all the agents involved in the accidents, such as persons, drivers, vehicles and other such objects.
Traffic density: a considerable factor as a possible and potential risk for traffic accidents. The traffic density information is recorded against the road number, starting segment, ending segment, starting distance from the starting segment and ending distance from the ending segment.

4.2. DATA PREPROCESSING
To be useful for data mining purposes, the databases need to undergo preprocessing in the form of data cleaning and data transformation [9]. All the attributes are grouped into a single data matrix, and inconsistencies are removed from the data. Missing values are normally handled during the mining process. In this project, however, missing values carry a special meaning. A missing value in the traffic lights attribute, for example, indicates that there were no traffic lights at the accident location. These values were therefore not treated as ordinary missing data: all missing values in the traffic lights attribute were replaced by zeros.

4.3. DATA TRANSFORMATION
In data transformation, the data are transformed or consolidated into forms appropriate for mining [1]. This may involve activities such as smoothing, aggregation, generalization and normalization. The given data set contains many different types of attributes, which must be transformed and scaled before the actual mining methods can be employed effectively.

5. CLUSTER ANALYSIS
One of the DM functionalities is to search for new and interesting hypotheses rather than to confirm existing ones. This paper studies the feasibility and utility of DM techniques in the context of road traffic incidents. We use one of the Data Mining functionalities, clustering, for traffic incident analysis. Cluster Analysis is a family of mathematical and statistical techniques that divides data into groups with similar characteristics [10]. We observed that clustering the casualties into groups with equal accident frequencies improves the understanding of traffic accidents. In this method, a specific formula has been used to calculate the numbers of severely injured, slightly injured and killed casualties. Cluster-specific attributes are used to characterize the clusters, which supports the enhancement of road safety. A cluster can be defined, for example, as "a set of entities which are alike, and entities from different clusters are not alike" [7].

5.1. SIMPLE K-MEANS CLUSTERING
A cluster is a number of similar objects collected or grouped together. A clustering is a type of classification imposed on a finite set of objects [11]. In this study we apply the K-Means clustering approach for our analysis. The K-Means algorithm was chosen specifically because of its effectiveness in finding clusters in data. The algorithm proceeds as follows.
Step 1: Ask the user for the number of clusters k into which the data set should be partitioned.
Step 2: Randomly assign k records to be the initial cluster center locations.
Step 3: For each record, find the nearest cluster center. Each cluster center thus "owns" a subset of the records, so the centers represent a partition of the data set into k clusters, C1, C2, ..., Ck.
Step 4: For each of the k clusters, find the cluster centroid, and update the location of the cluster center to the new value of the centroid.
Step 5: Repeat Steps 3 and 4 until convergence or termination.

After performing all these operations and applying the K-Means clustering algorithm to the data, we obtain the cluster output shown in Figure 3. The initial cluster assignment depends on the concept of "nearest", which is calculated with the well-known Euclidean distance formula.
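The preprocessing, transformation and K-Means steps described in Sections 4.2, 4.3 and 5.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' WEKA configuration. The helper names (`fill_missing_with_zero`, `min_max_normalize`, `euclidean`, `k_means`) are hypothetical. Clustering on year and number killed is also an assumption; the killed column is used because it lists one value for each year from 1990 to 2010. The choice k = 4 is an assumption matching the four clusters in Figure 4, and the fixed random seed only makes the run repeatable.

```python
import math
import random

# Assumed input: yearly deaths (No. of Killed), 1990-2010, from the statistics table.
YEARS = list(range(1990, 2011))
KILLED = [1312, 1279, 1435, 1440, 998, 1071, 927, 1119, 1224, 1204, 1504,
          1598, 1751, 1744, 1785, 1800, 1866, 2138, 2332, 2138, 2375]


def fill_missing_with_zero(values):
    """Section 4.2: a missing traffic-lights value means 'no traffic lights',
    so it is encoded as 0 instead of being imputed."""
    return [0 if v is None else v for v in values]


def min_max_normalize(column):
    """Section 4.3: rescale one numeric attribute to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def euclidean(a, b):
    """Euclidean distance, used to decide which cluster center is 'nearest'."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, max_iter=100, seed=0):
    """Plain K-Means following Steps 1 to 5 of Section 5.1 (k is Step 1)."""
    rng = random.Random(seed)
    # Step 2: pick k records at random as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Step 3: assign every record to its nearest center.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centers[c]))
                      for p in points]
        # Step 5: stop once the assignments no longer change (convergence).
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each center to the centroid of the records it owns.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    # Hypothetical traffic-lights attribute with missing entries.
    print(fill_missing_with_zero([1, None, 2, None]))
    features = list(zip(min_max_normalize(YEARS), min_max_normalize(KILLED)))
    labels, centers = k_means(features, k=4)
    for year, killed, label in zip(YEARS, KILLED, labels):
        print(year, killed, "cluster", label)
```

Normalizing both attributes first keeps the year axis (a range of 20) from being swamped by the death counts (a range of about 1,450) in the Euclidean distance. The printed cluster labels depend on the random initial centers, and they are not claimed to reproduce Figure 3 or Figure 4.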
6. RESULT AND ANALYSIS
This section gives the details of the experiment related to the analysis. The experiment was performed on a laptop computer with an Intel Core i5 CPU at 2.40 GHz, 4 GB of memory and the Windows 7 Professional 64-bit operating system. The accident-related data were taken from the General People's Committee General Security (2009), Libya. The yearly numbers of accidents, killed and injured are listed in the table of Statistics of Road Traffic Accidents in Libya (1990-2010). To obtain the results of the analysis, we used the open source Data Mining software WEKA, a well-known tool for performing almost all Data Mining related tasks.

Figure 3. Cluster output after applying the K-Means clustering algorithm.

6.1. VISUALIZATION
The clustering results are presented using WEKA and dimension reduction techniques, which highlight the most characteristic attribute values of each cluster. The resulting clusters are shown in Figure 4.

Figure 4. Cluster visualization showing the number of death casualties in a particular year (legend: Cluster 0, Cluster 1, Cluster 2, Cluster 3).

7. FUTURE PREDICTION OF CASUALTIES
Figure 5 shows the expected number of future casualties on Libyan roads if proper precautions are not taken in time and the current traffic trend continues.

Figure 5. Future prediction of casualties.

This result was generated by Weka's prediction tool with parameter 5, the number of units to forecast from 2010 onwards.

8. CONCLUSION
The objective of this work was to test Weka's Data Mining capabilities, namely cluster analysis using K-Means for finding clusters of interest, and future prediction. Using a Data Mining tool to generate clusters for traffic incidents automatically directly helps the Transportation department and the local municipality develop proper plans to avoid future casualties from road traffic accidents. In this study, the linear regression technique was applied to predict future casualties from road accidents. The results suggest that proper road safety precautions should be taken.

References
[1] Jiawei Han, M. Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, USA, 2012, pages 1-42.
[2] Heidi Worley, "Road Traffic Accidents Increase Dramatically Worldwide", http://www.prb.org/Publications/Articles.aspx?search=Road+Traffic+Accidents+increase+dramatically
[3] Wikipedia, "List of countries by traffic-related death rate", http://en.wikipedia.org/wiki/List_of_countries_by_traffic-related_death_rate
[4] Murali Krishnan, "India has the highest number of road accidents in the world", http://www.dw.de/india-has-the-highest-number-of-road-accidents-in-the-world/a-5519345
[5] Younis Al-Fenadi, "Meteorological information and road safety in Libya".
[6] David L. Olson, Dursun Delen, "Advanced Data Mining Techniques", Springer, 2008, pages 1-30.
[7] Sami Äyrämö, Pasi Pirtala, Janne Kauttonen, Kashif Naveed, Tommi Kärkkäinen, "Mining Road Traffic Accidents", ISBN 9789513937522, ISSN 1456-4378.
[8] A. Ismail, H.A.M. Yahia, "Causes and Effects of Road Traffic Accidents in Tripoli - Libya", ISBN 978-602-8605-08-3.
[9] Daniel T. Larose, "Discovering Knowledge in Data", John Wiley & Sons, Inc., New Jersey, 2005, pages 27-40.
[10] Olivia Parr Rud, "Data Mining Cookbook", John Wiley & Sons, Inc., New York, 2001, pages 183-206.
[11] A.K. Jain, Richard C. Dubes, "Algorithms for Clustering Data", Prentice Hall, New Jersey, pages 55-140.

Author Biographies
1 Sayed Mujahed Hussain, Ph.D. (CS), has been working as a lecturer for the past fourteen years, teaching computer science and information technology. He has taught at graduate and postgraduate levels at various national and international universities, such as Alandalus University of Science and Technology, Yemen, and Musanna College of Technology, Oman. He has guided several graduate and postgraduate students in their final-year projects. His areas of interest include DBMS, Data Mining, Data Structures, OOP and Operating Systems. He is currently working as a lecturer in the Department of Computer Science, Faculty of Arts and Science Ajdabiya, University of Benghazi, Ajdabiya, Libya.
2 Faraj A. El-Mouadib, Ph.D. (CS), is working as a Professor in the Department of Computer Science, Faculty of Information Technology, University of Benghazi, Benghazi, Libya. He has chaired various scientific committees and has been involved in evaluating and reviewing conference papers. He has been teaching at graduate and postgraduate levels and has guided several students in their bachelor's and master's projects and research theses. His areas of interest include DBMS, Data Mining, Artificial Intelligence, Cluster Analysis and Data Warehousing.
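The conclusion states that linear regression was used for the five-step forecast of Section 7. A minimal ordinary-least-squares sketch of that idea follows. It is not WEKA's forecasting tool. Fitting yearly deaths against year alone is an assumption made for illustration, using the killed column of the Libyan road accident statistics for 1990 to 2010. The function names `fit_line` and `forecast` are hypothetical.

```python
# Assumed input: yearly deaths (No. of Killed), 1990-2010, from the statistics table.
YEARS = list(range(1990, 2011))
KILLED = [1312, 1279, 1435, 1440, 998, 1071, 927, 1119, 1224, 1204, 1504,
          1598, 1751, 1744, 1785, 1800, 1866, 2138, 2332, 2138, 2375]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Closed form: slope = Sxy / Sxx, intercept = mean_y - slope * mean_x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast(xs, ys, steps):
    """Extend the fitted straight line `steps` years past the last observed year."""
    intercept, slope = fit_line(xs, ys)
    last = xs[-1]
    return [(last + h, intercept + slope * (last + h)) for h in range(1, steps + 1)]


if __name__ == "__main__":
    # steps=5 mirrors the "number of units to forecast" parameter mentioned in Section 7.
    for year, deaths in forecast(YEARS, KILLED, steps=5):
        print(year, round(deaths))
```

Because a single straight line is fitted, the five forecasts for 2011 to 2015 simply continue the 1990 to 2010 trend. They are not the values plotted in Figure 5.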
