APPLICATION OF DATA MINING TECHNIQUES
FOR ANALYZING ROAD TRAFFIC INCIDENTS
A CASE STUDY OF LIBYA’S ROAD TRAFFIC
Sayed Mujahed Hussain (1), Faraj A. El-Mouadib (2)
(1) Department of Computer Science, Faculty of Arts and Science Ajdabiya, University of Benghazi, Ajdabiya, Libya
[email protected]
(2) Department of Computer Science, Faculty of Information Technology, University of Benghazi, Benghazi, Libya
[email protected]
Descriptive data mining and predictive data mining. Some of
the most common DM techniques are decision trees,
production rules and neural networks. Before the
emergence of DM, the only analysis tool available was
simple statistical manipulation, which had little power to
present the data of interest to a particular user.
Traffic control is one of the areas where DM
functionalities are being used effectively to reduce the
death rate from road accidents. Road traffic injuries are
predictable and preventable, but good data are important to
understand the ways in which road safety interventions and
technology can be successfully transferred from developed
countries, where they have proven effective [2]. According to
the World Health Organization, road traffic injuries caused
an estimated 1.24 million deaths worldwide in 2010,
slightly down from 1.26 million in 2000 [3]. Road accidents
have earned India a dubious distinction: with over 130,000
deaths annually, the country has overtaken China and now
has the worst road traffic accident rate worldwide [4].
Statistics show a clear increase in road deaths in
Libya during the past few years [5]. Figure 2 shows the
number of deaths caused by road accidents in Libya from
1995 to 2008.
This study focuses on the causes of road traffic accidents
in Libya. In Libya, 50,000 people died in road accidents
during the forty years from 1969 to 2009. This unpleasant
fact was revealed in the Global Status Report on Road Safety
released in 2008 by the World Health Organization. Road
traffic accidents in Libya vary with circumstances:
particular parts of the day, specific days and the weather
conditions on a given day contribute, alongside many other
factors, to causing road accidents. An intelligent system is
one of the most modern kinds of centralized traffic control
system in the world. The job of its central computer is to
analyze the traffic volume of the city, record the data for
further study and implement a timing plan for smooth traffic
flow without delays or traffic jams at intersections and
crossroads. Such a system ensures a free flow of traffic
with minimum deployment of manpower. Conventional traffic
light systems operate on a timing mechanism that changes the
lights after a given interval, whereas an Intelligent
Traffic Light system senses the presence or absence of
vehicles and reacts accordingly. The idea behind this
research is to exploit the techniques readily available in
Data Mining to help minimize the waiting time of vehicles at
signal intersections, which will reduce the mental stress of
vehicle
Abstract: This paper presents a proposed approach to
traffic management that combines Data Mining (DM)
functionalities with an intelligent agent based system. A
case study was conducted to illustrate the efficiency of
Cluster Analysis, one of the Data Mining functionalities, in
combination with the Intelligent Agent Based System. Traffic
in all developed and developing countries is regulated by
traffic lights, most of which are based on a fixed cycle
protocol. In this paper the causes of accidents and a
proposal to avoid them are presented from a DM perspective.
The results of this research show that the use of Data
Mining functionalities combined with various techniques
holds high potential to provide an Intelligent
Transportation System that controls traffic and avoids
possible accidents.
Keywords: Data Mining, Cluster Analysis, Intelligent
Transportation System, Agent Based System, Traffic
Management.
1. INTRODUCTION
With the rapid development of Information Technology in
the areas of Databases and Data Mining, people can access
huge amounts of information of interest to them. Analyzing,
interpreting and making maximum use of the data is difficult
and resource demanding because of the exponential growth of
many business, governmental and scientific databases.
Data Mining is therefore considered a useful tool to address
the need for sifting useful knowledge, such as hidden
patterns, from databases. Data are now increasing at an
alarming rate, fed by sources such as satellites, radar,
cameras, sensors and other scientific instruments.
Understanding, interpreting and making use of the extracted
knowledge is an important issue when making certain
real-time decisions. To make an effective decision, data
from various sources are first gathered and organized in a
proper way. However, the mere gathering of data is not
sufficient to make use of it. The widening gap between data
and information calls for a systematic development of data
mining tools that will turn data tombs into “golden nuggets”
of knowledge [1]. The tasks of data mining can be
classified in two categories:
users, reduce pollution, save precious fuel and, most
importantly, avoid accidents.
online repositories. In Libya, major roads have improved in
terms of length, quality and links between cities [8]. The
important agents in traffic accidents are the driver (the
human element), the road and the vehicle.
The traffic department records show many different causes
of traffic accidents inside the country, such as high speed
(speeding), lack of attention, improper stopping, driving
under the influence of drugs, improper turning, violation of
traffic laws, ignoring right of way, using mobile phones
while driving and other reasons. Table 1 shows the major
causes of accidents with their percentage contribution to
deaths by road accidents.
2. ROLE OF DATA MINING IN TRAFFIC INCIDENCE
Data mining requires identification of a problem, along with
collection of data that can lead to better understanding, and
computer models to provide statistical or other means of
analysis [6]. Data Mining is task oriented, so the first
stage is to gather the data from many different sources. The
second task is to identify the data related to the problem
description. The next point that requires care is to choose
variables that are independent of each other, in the sense
that they do not contain overlapping information.
Figure 1 depicts the common knowledge mining process, which
usually applies to all fields where Data Mining is to be
used.
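One simple way to check the independence requirement described above is to compute pairwise Pearson correlations between candidate variables and flag strongly correlated pairs. The following is a minimal sketch only: the function name, the variable names, the sample values and the 0.9 threshold are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def redundant_pairs(columns, threshold=0.9):
    """Return pairs of variable names whose absolute Pearson
    correlation exceeds the (assumed) threshold."""
    names = list(columns)
    data = np.array([columns[n] for n in names], dtype=float)
    corr = np.corrcoef(data)  # each row of `data` is one variable
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j]))
    return pairs

# Hypothetical sample: the two speed columns carry overlapping information.
sample = {
    "speed_kmh": [60, 80, 100, 120, 90],
    "speed_mph": [37.3, 49.7, 62.1, 74.6, 55.9],
    "hour_of_day": [8, 23, 14, 2, 17],
}
overlapping = redundant_pairs(sample)
```

A pair reported by such a check would be a candidate for dropping one of its two variables before mining.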
Table 1. Reasons for traffic accidents.
Cause                              Percentage (%)
Carelessness                       22.5
Close following or tailgating      17.0
Over speeding                      15.0
Disregarding traffic priority      14.6
Using incorrect lane               12.4
Bad turn                           10.5
Incorrect reversing                 5.3
Disallowing pedestrian priority     1.2
Wrong overtaking                    0.8
Wrong turn                          0.7
Source: GPCGS 2011, ref. [8]
Figure 1. Steps of the knowledge mining process.
4.1. DATA SELECTION
The following data sets were used for this study.
Accident data: complete information related to the
accidents, such as severity (fatal or non-fatal), location
of the accident, details about the road conditions, and
temporal data such as year, month, day, hour and date.
Agents data: the agents involved in the accident, such as
persons, drivers, vehicles and other such objects.
Traffic density: a considerable factor as a potential risk
for traffic accidents. The traffic density information is
recorded against the road number, starting segment, ending
segment, starting distance from the starting segment and
ending distance from the ending segment.
Road traffic accidents are predictable and preventable.
Good data are an important consideration, as is an
understanding of the ways in which road safety interventions
and technology can be successfully transferred from
different disciplines where they have been shown to be
effective.
3. PROFILE OF THE PROBLEM
In developed countries, road traffic death rates have
decreased since the 1960s because of successful interventions
such as seat belt safety laws [2]. In developing countries,
however, traffic fatalities have increased because of
inadequate use of proper technology. The increasing number
of vehicles contributes significantly to the rising road
traffic deaths in Libya. An estimated 2375 people died and
14025 were injured in Road Traffic Accidents (RTA) in Libya
in 2010 [8]. Figure 2 depicts the year-wise deaths in Libya.
4.2. DATA PREPROCESSING
To be useful for data mining purposes, the databases need to
undergo preprocessing, in the form of data cleaning and data
transformation [9]. All the attributes are grouped into a
single data matrix and inconsistencies in the data are
removed. Missing values are normally treated during
preprocessing, but in this project a missing value can carry
a specific meaning: a missing value in the traffic lights
attribute indicates that there were no traffic lights at the
accident location. Such values were therefore not imputed;
instead, all missing values in the traffic lights attribute
were replaced by zeros.
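The zero-replacement rule described above can be sketched as follows. This is a minimal illustration, not the authors' actual preprocessing code: the record layout, the attribute name `traffic_lights` and the sample records are assumptions made for the example.

```python
def fill_missing_with_zero(records, attribute):
    """Return copies of the records in which a missing or None
    value of `attribute` is replaced by 0."""
    cleaned = []
    for record in records:
        row = dict(record)  # copy so the input records stay unchanged
        if row.get(attribute) is None:
            row[attribute] = 0
        cleaned.append(row)
    return cleaned

# Hypothetical accident records; the attribute name is an assumption.
accidents = [
    {"id": 1, "traffic_lights": 1},
    {"id": 2, "traffic_lights": None},
    {"id": 3},
]
cleaned = fill_missing_with_zero(accidents, "traffic_lights")
```

Here 0 encodes "no traffic lights at the accident location", so the value is meaningful rather than an arbitrary filler.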
[Figure 2 chart: No. of Accidents, No. of Killed and No. of Injured by year, 1990 to 2010; vertical axis 0% to 100%]
4.3. DATA TRANSFORMATION
In data transformation, the data are transformed or
consolidated into forms appropriate for mining [1]. It may
involve many activities such as smoothing, aggregation,
generalization, normalization, etc. The given data set
contains many different types of attributes. These must be
transformed and scaled before the actual mining methods can
be employed effectively.
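As an illustration of the scaling step mentioned above, the sketch below applies min-max normalization, which maps a numeric attribute onto the range 0 to 1. The paper does not state which normalization was used, so the choice of min-max scaling and the function name are assumptions; the three sample values are taken from the No. of Killed column of the accident statistics.

```python
def min_max_scale(values):
    """Scale numeric values linearly so the minimum maps to 0.0
    and the maximum maps to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

killed_sample = [998, 1312, 2375]
scaled = min_max_scale(killed_sample)
```

Scaling of this kind keeps attributes with large ranges from dominating distance-based methods such as K-Means.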
Figure 2. Deaths caused by road accidents from 1990 to 2010.
4. DATA COLLECTION
The most important step in the mining task is the collection
of the relevant data and the required information. Data for
this study was collected from many different sources and
Table 1 percentages (%), in row order: 22.5, 17.0, 15.0, 14.6, 12.4, 10.5, 5.3, 1.2, 0.8, 0.7.
Footnote 1: Figure 1 is taken from ref. [7].
Table 1. Statistics of Road Traffic Accidents in Libya (1990-2010)
Columns: Year, No. of Accidents, No. of Killed, No. of Injured
5. CLUSTER ANALYSIS
One of the DM functionalities is to search for new and
interesting hypotheses rather than confirming existing ones.
In this paper the feasibility and utility of DM techniques
in the context of road traffic incidence are studied. We use
one of the Data Mining functionalities, clustering, for
traffic incidence analysis. Cluster Analysis is a family of
mathematical and statistical techniques that divides data
into groups with similar characteristics [10]. We observed
that clustering the casualties into groups with equal
accident frequencies enhances the understanding of traffic
accidents. In this method a specific formula is used to
calculate the number of severely injured, the number of
slightly injured and the number of deaths due to accidents.
Cluster-specific attributes are used to characterize the
clusters that are relevant to the enhancement of road
safety. A cluster can be defined, for example, as “a set of
entities which are alike, and entities from different
clusters are not alike” [7].
Year column: 1990 through 2010, one row per year (21 rows).
5.1. SIMPLE K-MEANS CLUSTERING
A cluster is comprised of a number of similar objects
collected or grouped together. A clustering is a type of
classification imposed on a finite set of objects [11]. In
this study we apply the K-Means clustering approach. The
K-Means clustering algorithm is used specifically because of
its effectiveness in finding clusters in data. The algorithm
proceeds as follows.
Step 1: Ask the user how many clusters k the data set
should be partitioned into.
Step 2: Randomly assign k records to be the initial
cluster center locations.
Step 3: For each record, find the nearest cluster center.
Each cluster center “owns” a subset of the records,
thereby representing a partition of the data set. We
therefore have k clusters, C1, C2, ..., Ck.
Step 4: For each of the k clusters, find the cluster
centroid and update the location of each cluster
center to the new value of the centroid.
Step 5: Repeat steps 3 and 4 until convergence or
termination.
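The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the WEKA SimpleKMeans implementation used in the study: the function names, the fixed random seed, the iteration cap and the six sample points are all hypothetical.

```python
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def k_means(records, k, max_iter=100, seed=0):
    """Cluster `records` (a list of numeric tuples) into k groups.
    Returns the final centers and the cluster index of each record."""
    rng = random.Random(seed)
    centers = rng.sample(records, k)  # Step 2: k random records as centers
    assignment = None
    for _ in range(max_iter):
        # Step 3: assign every record to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(r, centers[c]))
            for r in records
        ]
        if new_assignment == assignment:
            break  # Step 5: converged, assignments no longer change
        assignment = new_assignment
        # Step 4: move each center to the centroid of its members.
        for c in range(k):
            members = [r for r, a in zip(records, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment

# Hypothetical two-dimensional points forming two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, labels = k_means(points, k=2)
```

Because the initial centers are chosen at random, different seeds can give different results on less clearly separated data; running the algorithm several times is a common precaution.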
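No. of Accidents (19 values, in extracted order): 7847, 7749, 8423, 9009, 8400, 8419, 8437, 9279, 9393, 9370, 10667, 10895, 12017, 12154, 11643, 11898, 11982, 13165, 13352
No. of Killed (21 values, 1990 to 2010): 1312, 1279, 1435, 1440, 998, 1071, 927, 1119, 1224, 1204, 1504, 1598, 1751, 1744, 1785, 1800, 1866, 2138, 2332, 2138, 2375
No. of Injured (20 values, in extracted order): 6665, 6342, 6323, 7852, 7432, 7703, 7750, 8076, 8343, 8394, 9617, 10033, 11058, 10502, 10746, 11541, 12164, 13497, 13725, 14025
Source: General People’s Committee General Security (2010).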
After performing all the operations and applying the
K-Means clustering algorithm to the data, we get the cluster
output shown in Figure 3.
The initial cluster creation depends on the concept of
“nearest”, which is calculated with the well-known Euclidean
distance formula.
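For reference, the standard Euclidean distance between two records with n numeric attributes can be written as below. The symbols x, y and n are notation introduced here for illustration and do not appear in the original text.

```latex
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
% Worked example with x = (1, 2) and y = (4, 6):
% d(x, y) = \sqrt{(1 - 4)^2 + (2 - 6)^2} = \sqrt{9 + 16} = 5
```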
6. RESULT AND ANALYSIS
This section gives the details of the experiment. The
experiment was performed on a laptop computer with an Intel
Core i5 CPU at 2.40 GHz, 4 GB of memory and the Windows 7
Professional 64-bit operating system. For the analysis we
took the accident data from the source: General People’s
Committee General Security (2009), Libya. The data on the
number of accidents, number of killed and number of injured
are given in Table 1. To obtain the results we used the open
source Data Mining software WEKA, a well-known package for
performing almost all Data Mining related tasks.
Figure 3. Cluster output after applying the K-Means
clustering algorithm.
6.1. VISUALIZATION
The clustering results are presented using WEKA and
dimension reduction techniques, introducing the most
[2] Heidi Worley, “Road Traffic Accidents Increase
Dramatically Worldwide”,
[http://www.prb.org/Publications/Articles.aspx?search=Road+Traffic+Accidents+increase+dramatically]
[3] Wikipedia, “List of countries by traffic-related death
rate”,
[http://en.wikipedia.org/wiki/List_of_countries_by_trafficrelated_death_rate]
[4] Murali Krishnan, “India has the highest number of road
accidents in the world”, [http://www.dw.de/india-hasthe-highest-number-of-road-accidents-in-the-world/a5519345]
[5] Younis Al-Fenadi, “Meteorological information and road
safety in Libya”.
[6] David L. Olson, Dursun Delen, “Advanced Data
Mining Techniques”, Springer, 2008, pages 1-30.
[7] Sami Äyrämö, Pasi Pirtala, Janne Kauttonen, Kashif
Naveed, Tommi Kärkkäinen, “Mining Road Traffic
Accidents”, ISBN 9789513937522, ISSN 14564378.
[8] A. Ismail, H.A.M. Yahia, “Causes and Effects of
Road Traffic Accidents in Tripoli – Libya”,
ISBN 978-602-8605-08-3.
[9] Daniel T. Larose, “Discovering Knowledge in Data”,
John Wiley & Sons, Inc., New Jersey, 2005,
pages 27-40.
[10] Olivia Parr Rud, “Data Mining Cookbook”, John
characteristic attribute values for each cluster. The
resulting clusters are shown in Figure 4.
[Figure 4 legend: Cluster 0, Cluster 1, Cluster 2, Cluster 3]
Figure 4. Cluster visualization showing the number of deaths
in a particular year.
7. FUTURE PREDICTION OF CASUALTIES
Figure 5 shows the predicted number of future casualties on
Libyan roads if proper precautions are not taken in time and
the current traffic trend continues.
Wiley & Sons, Inc., New York, 2001,
pages 183-206.
[11] A.K. Jain, Richard C. Dubes, “Algorithms for
Clustering Data”, Prentice Hall, New Jersey,
pages 55-140.
Author Biographies
(1) Sayed Mujahed Hussain, Ph.D. (CS)
Has worked as a lecturer in computer science and information
technology for the past fourteen years. Has taught at
graduate and postgraduate levels at various national and
international institutions, such as Alanadlus University of
Science and Technology, Yemen, and Musanna College of
Technology, Oman. Has guided several graduate and
postgraduate students in their final year projects. Areas of
interest include DBMS, Data Mining, Data Structure, OOPS and
OS.
Currently working as a lecturer in the Department of
Computer Science, Faculty of Arts and Science Ajdabiya,
University of Benghazi, Ajdabiya, Libya.
Figure 5. Future prediction of casualties.
The result was generated by Weka’s Prediction tool with the
parameter 5 as the number of units to forecast from 2010
onwards.
8. CONCLUSION
The objective of this work was to experiment with Weka’s
Data Mining capabilities, namely cluster analysis using
K-Means to find clusters of interest, and future prediction.
Using a Data Mining tool to generate clusters of traffic
incidents automatically can directly help the Transportation
department and the local municipality to develop a proper
plan to avoid future casualties due to road traffic
accidents. In this study the linear regression technique was
applied to predict future casualties due to road accidents.
The results indicate that proper road safety precautions
should be taken.
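The kind of linear-regression forecast described above can be illustrated with an ordinary least-squares trend line. This is a sketch under stated assumptions and is not Weka's forecasting tool, so its numbers need not match Figure 5: the function name is hypothetical, and the `killed` list is the No. of Killed column as extracted from the accident statistics table, assumed to align in order with the years 1990 to 2010.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

years = list(range(1990, 2011))
# No. of Killed per year, 1990 to 2010 (assumed alignment, see lead-in).
killed = [1312, 1279, 1435, 1440, 998, 1071, 927, 1119, 1224, 1204, 1504,
          1598, 1751, 1744, 1785, 1800, 1866, 2138, 2332, 2138, 2375]

slope, intercept = fit_line(years, killed)
# Forecast 5 units (years) beyond 2010, mirroring the parameter 5 in the text.
forecast = {year: slope * year + intercept for year in range(2011, 2016)}
```

A straight-line trend is the simplest possible model; it ignores the dip in the mid-1990s and should be read only as a rough extrapolation.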
(2) Faraj A. El-Mouadib, Ph.D. (CS)
Working as a Professor in the Department of Computer
Science, Faculty of Information Technology, University of
Benghazi, Benghazi, Libya. Has chaired various scientific
committees and has been involved in the evaluation and
reviewing of conference papers. Has taught at graduate and
postgraduate levels and has guided several students in
their bachelor and master's projects and research theses.
Areas of interest include DBMS, Data Mining, Artificial
Intelligence, Cluster Analysis and Data Warehousing.
References
[1] Jiawei Han, M. Kamber, “Data Mining: Concepts and
Techniques”, Morgan Kaufmann Publishers, USA, 2012,
pages 1-42.