THE SAS APPLICATION FOR ROAD SAFETY ANALYSIS
Marzena Nowakowska, Laboratory of Computer Science, Technical University
of Kielce, Poland
1. INTRODUCTION
In Poland, details about each road accident are recorded by the police
in the standard format called Road Accident Card. The Card consists of a
number of pieces of information that identify a variety of accident
features such as location, kind of accident, number of victims, number
and types of vehicles/other road users involved. This information is
collected and stored in computer accident database files for use by
local, regional or national authorities. Each record in the computer
database file has the same structure as the Road Accident Card. On the
basis of this information, research on road traffic safety can be
conducted by applying modern computational techniques.
Developing a road safety appraisal method is a complex task, as it
involves an approach to multifactor accident events and the recognition
of the interaction between different road, vehicle and road user
aspects. One of the most important elements of the analysis is the
identification of blackspots. A blackspot is a section of a road with
high accident risk. Once blackspots have been identified, a traffic
engineer can concentrate on the technical details of combining suitable
remedial countermeasures to improve traffic safety.
Statistical Analysis System® has turned out to be very useful in various
research on road traffic safety. This paper describes a SAS software
application that enables the analysis and processing of accident
databases in order to investigate accident occurrence patterns, evaluate
the level of road safety and identify blackspot sections of a road. The
application works in the Windows environment. The SAS system was chosen
for the following reasons:
- it can run on a variety of engines,
- there are no data size constraints, except for hardware configuration,
- it offers a very wide range of statistical procedures as well as
graphical tools,
- data step and macro programming languages greatly ease database
projection and selection,
- SAS/EIS®, SAS/FSP® and SAS/AF® software enable us to provide a
flexible, interactive and user-friendly interface for the analysis.
Practical application facilities, which quickly allow researchers to
explore a number of safety combination features, significantly enhance
the quality of analysis.
2. THE MODEL OF INFERENCE
The accident data are subject to stochastic variation as a consequence
of the random nature of accidents, and they are scarce. Taking these
facts into account, and consequently the uncertainty of statistical
inference, Bayesian theory has been adopted for the research. This kind
of statistical approach is used in safety analysis [2, 3, 6].
The idea of uncertainty brings together the randomness of an event X and
the doubt as to the value of the parameters of the model that describes
this randomness. Using the Bayesian approach it is possible to define
the probability distribution of such a variable X, taking into account
that the parameters of its distribution are also random variables. The
new distribution h(x), known as a compound distribution, can be written
in the following form:

$$h(x) = \int_{\Lambda} g(x \mid \lambda)\, k(\lambda)\, d\lambda \qquad (1)$$

where g(x|λ) is the conditional distribution of X given the parameter
λ, k(λ) is the distribution of λ, and the integral is taken over the
range Λ of λ. The distribution (1) is also called a Bayesian
distribution.
The random variables considered here, which can stand for X, are the
following accident events that occur along a certain section of a road:
number of accidents, number of victims and number of involved vehicles.
The number of accident events is independent between different sections
and follows a Poisson distribution with an annual intensity parameter λ.
The value of λ follows a gamma distribution that characterises all road
sections. The combination of the Poisson distribution for X given the
intensity λ and a gamma distribution for the parameter λ is a negative
binomial distribution h(x) [4]. It describes the distribution of the
number of events over the road sections. The parameters of this
distribution depend on the parameters of the gamma distribution. Their
evaluation is straightforward, as the Poisson and gamma distributions
are natural conjugates [5].
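As a worked illustration of the compound distribution (1), assume the gamma distribution of λ is parametrised by a shape α and a rate β. These symbols and this parametrisation are assumptions made here for illustration; they do not appear in the original text. Under that assumption the integral can be evaluated in closed form:

```latex
% Assumed parametrisation: gamma prior with shape \alpha and rate \beta
k(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha-1} e^{-\beta\lambda},
\qquad
g(x \mid \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}

h(x) = \int_{0}^{\infty} g(x \mid \lambda)\, k(\lambda)\, d\lambda
     = \frac{\beta^{\alpha}}{x!\, \Gamma(\alpha)}
       \int_{0}^{\infty} \lambda^{x+\alpha-1} e^{-(1+\beta)\lambda}\, d\lambda
     = \frac{\Gamma(x+\alpha)}{x!\, \Gamma(\alpha)}
       \left(\frac{\beta}{1+\beta}\right)^{\alpha}
       \left(\frac{1}{1+\beta}\right)^{x}
```

The last expression is a negative binomial probability function whose two parameters are determined entirely by α and β.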
If the intensity λ has a prior distribution k(λ) and the probability of
x events given λ is described by the conditional distribution g(x|λ),
then the conditional (posterior) distribution k*(λ|x) is given by:

$$k^{*}(\lambda \mid x) = \frac{g(x \mid \lambda)\, k(\lambda)}{\int_{\Lambda} g(x \mid \tau)\, k(\tau)\, d\tau} \qquad (2)$$

The posterior distribution k*(λ|x), obtained with the modifying factor
g(x|λ) defined for each individual road section, is used in defining
the blackspot classification criterion.
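To make the conjugacy concrete, the following sketch assumes a gamma prior with shape α and rate β (symbols introduced here for illustration, not taken from the original) and a single observation period of unit length in which x events were recorded. Under those assumptions the posterior stays in the gamma family:

```latex
% Assumptions: gamma prior (shape \alpha, rate \beta), Poisson count x over one unit period
k^{*}(\lambda \mid x)
  \;\propto\; g(x \mid \lambda)\, k(\lambda)
  \;\propto\; \lambda^{x} e^{-\lambda} \cdot \lambda^{\alpha-1} e^{-\beta\lambda}
  \;=\; \lambda^{\alpha+x-1} e^{-(\beta+1)\lambda}

k^{*}(\lambda \mid x)
  = \frac{(\beta+1)^{\alpha+x}}{\Gamma(\alpha+x)}\,
    \lambda^{\alpha+x-1} e^{-(\beta+1)\lambda}
```

In other words, recording x events on a section shifts the shape from α to α + x and the rate from β to β + 1, so no numerical integration is needed to obtain the posterior.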
For the sections of a road, two sources of information can be used to
describe the proposed accident-proneness model. The first one, h(x), is
the distribution (1) that describes the randomness of the number of
events over these sections. The second, h*(y|x), is the distribution of
the number of events Y that can occur at those road sections where x
events have been recorded. The function h*(y|x) is defined by putting
the posterior probability function (2) in place of k(λ) in formula
(1).
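The two distributions can be sketched numerically as follows. This is a minimal illustration, not the author's SAS code: it assumes a gamma prior with shape `alpha` and rate `beta` and one unit observation period, so that both h(x) and h*(y|x) are negative binomial. The function and parameter names are hypothetical.

```python
import math


def neg_binomial_pmf(count, shape, rate):
    """P(count) for a Poisson count whose mean is Gamma(shape, rate) distributed.

    Assumption: 'rate' is the gamma rate parameter and the Poisson count
    refers to one unit observation period.
    """
    log_p = (
        math.lgamma(count + shape)
        - math.lgamma(count + 1)
        - math.lgamma(shape)
        + shape * math.log(rate / (1.0 + rate))
        - count * math.log(1.0 + rate)
    )
    return math.exp(log_p)


def prior_pmf(x, alpha, beta):
    """h(x): distribution of the event number over all road sections."""
    return neg_binomial_pmf(x, alpha, beta)


def posterior_predictive_pmf(y, x, alpha, beta):
    """h*(y|x): distribution of Y on a section where x events were recorded.

    The gamma posterior has shape alpha + x and rate beta + 1.
    """
    return neg_binomial_pmf(y, alpha + x, beta + 1.0)
```

With these definitions, a section with many recorded events has a posterior predictive distribution shifted towards higher counts than the prior h(x), which is the effect the classification criterion relies on.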
The level of safety varies between different roads and road sections.
Therefore, to describe the change in accident occurrence for each road
section, the value of the cumulative probability of h*(y|x), calculated
at the median of the h(x) distribution, has been proposed. The median is
considered as a threshold value that divides the range of possible event
numbers along a section into two parts so that the prior probabilities
are 0.5 for each part. The posterior probabilities of h*(y|x) were used
in the blackspot identification process.
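The criterion can be sketched in a few lines. This is a hedged illustration under assumptions that are not in the original: a gamma prior with shape `alpha` and rate `beta`, one unit observation period, and the median taken as the smallest count whose cumulative prior probability reaches 0.5. All names are hypothetical.

```python
import math


def nb_pmf(count, shape, rate):
    """Negative binomial pmf from a Poisson-gamma mixture (assumed parametrisation)."""
    return math.exp(
        math.lgamma(count + shape)
        - math.lgamma(count + 1)
        - math.lgamma(shape)
        + shape * math.log(rate / (1.0 + rate))
        - count * math.log(1.0 + rate)
    )


def nb_cdf(count, shape, rate):
    """Cumulative probability P(value <= count)."""
    return sum(nb_pmf(i, shape, rate) for i in range(count + 1))


def prior_median(alpha, beta):
    """Smallest m with P(X <= m) >= 0.5 under the prior distribution h(x)."""
    m = 0
    cumulative = nb_pmf(0, alpha, beta)
    while cumulative < 0.5:
        m += 1
        cumulative += nb_pmf(m, alpha, beta)
    return m


def exceedance_probability(x, alpha, beta):
    """Posterior probability that Y exceeds the prior median, given x recorded events."""
    m = prior_median(alpha, beta)
    return 1.0 - nb_cdf(m, alpha + x, beta + 1.0)
```

For example, `exceedance_probability(12, 2.0, 0.5)` is larger than `exceedance_probability(0, 2.0, 0.5)`: the more events recorded on a section, the more posterior probability mass lies above the prior median.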
3. THE PARTS OF DATA PROCESSING
The analysis is based on data collected from rural roads of two
neighbouring provinces over the last 3-year period. Only accidents with
human victims are on record; others, such as vehicle damage collisions,
are not included. A package of programs has been developed in which the
SAS language, SAS Elementary Statistics Procedures, SAS/GRAPH software
and SAS/STAT software were used extensively to carry out the essential
tasks. These are:
- regrouping and transforming the basic data into the form required by
the application,
- selecting road sections with an "accident problem",
- evaluating the required distribution parameters and then calculating
chosen characteristics of road accidents,
- identifying high-risk accident sections for further in-depth
engineering analysis,
- presenting final results.
Pic. 1. The main menu of the application for road safety analysis.
In the majority of cases the output of one or more programs is the input
to another program; the SAS macro language was an excellent tool for
enabling a smooth flow of information and for joining sequences of
program tasks. The SAS software application for road safety analysis
makes this package of programs available through an interactive user
interface. To build the application, SAS/AF, SAS/FSP, SAS/EIS and SAS
Screen Control Language were used. The first screen of the application
is presented in picture 1.
The main parts of the application are: START-UP, ROAD SECTIONS selection
and BLACKSPOTS identification. They were designed using the BUILD
procedure and the features of FRAME entries, which considerably
simplified the SCL programs for these parts. At every stage of running
the application the HELP option is offered to clarify the current
problem.
3.1. THE START-UP PART
To perform the analysis, road sections with an "accident problem" should
be determined. In the present context these are sections where accidents
are grouped; they have a specified allowable maximum length and a
specified allowable minimum number of accidents along a section.
However, according to the suggestions of a local road safety authority,
these values should fall within certain intervals. The interval for the
number of accidents is [4, ∞) and the interval for the section length is
[0, 4] km. The default values are 4 and 4.0 respectively. The user can
easily redefine these defaults; see picture 2.
Pic. 2. Starting-up the calculations for the road safety analysis.
In the START-UP part, all SAS language programs that take the longest
time to prepare the data for the other parts of the application are run.
Thus, the most time-consuming algorithms are processed once for the
given initial defaults. When the OK push button is selected, the set of
information about all roads in the area is investigated, resulting in
new SAS data sets. Each of them contains the accident data for an
individual road. Thus the number of new data sets is equal to the number
of roads in the area under investigation. These data sets become the
substantial source of information on the basis of which sections with an
accident problem are then determined. This is the most important part of
the whole application. Thanks to the iterative statements of Screen
Control Language, a group of important macros can be called for each
road. Here the index variable is a road identification number taken from
a suitable SAS data set using the utilities of the Widget classes. The
crucial point in the macro calculations is the CLUSTER procedure from
SAS/STAT software; the variable used in the clustering algorithm is the
road kilometre position at which an accident occurred. The macros also
enable the selection and projection of SAS data sets as well as the
calculation of new variables to obtain the accident characteristics for
a road section.
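The idea of grouping accidents by their position along a road can be illustrated with a short sketch. Note that this is not the CLUSTER procedure used in the application: it is a simpler greedy sweep substituted here for illustration, and the function name, parameter names and output keys are all hypothetical. It keeps only groups that respect the maximum section length and the minimum number of accidents described above.

```python
def find_problem_sections(kilometre_positions, max_length_km=4.0, min_accidents=4):
    """Group accident positions along one road into candidate sections.

    Greedy stand-in for a clustering step: positions are swept in
    ascending order and a new group starts whenever the next accident
    would stretch the current group beyond max_length_km.
    """
    groups = []
    current = []
    for km in sorted(kilometre_positions):
        if current and km - current[0] > max_length_km:
            groups.append(current)
            current = []
        current.append(km)
    if current:
        groups.append(current)

    # Keep only groups with enough accidents to count as an "accident problem".
    return [
        {"start_km": g[0], "end_km": g[-1], "accidents": len(g)}
        for g in groups
        if len(g) >= min_accidents
    ]


accident_positions = [1.0, 1.5, 2.2, 3.0, 9.0, 20.1, 20.4, 21.0, 22.5, 23.9]
sections = find_problem_sections(accident_positions)
```

Here the isolated accident at kilometre 9.0 forms a group of its own and is dropped, while the two denser stretches are retained as sections. A hierarchical clustering method may split the road differently, so the output of this sketch should not be read as what the application would produce.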
3.2. THE SELECTION OF ROAD SECTIONS WITH ACCIDENT PROBLEM
In the second part of the application the user chooses a road for
investigation. As a result, a SAS data set created in the START-UP part
is opened. It contains the information about the sections of the chosen
road that have an accident problem. From these data, statistical
accident characteristics are calculated and then presented in the form
shown in picture 3.
Pic. 3. The information of road sections with accident problem.
It is also possible to view detailed information about every road
section, such as the starting and ending kilometre positions, the total
number of killed and injured, and the number of involved vehicles
together with their type classification. The SAS procedures FSVIEW and
FSBROWSE made it possible to present the results either in full-screen
mode or in one-record-at-a-time mode with a carefully redesigned
display. Every time the user chooses a road, a new sequence of
calculations is carried out.
3.3. BLACKSPOTS IDENTIFICATION
The traffic volume along a road affects the number of sections with an
accident problem; there were several times more sections on an
interregional road than on a local one. In the BLACKSPOTS identification
part, roads with at least 5 sections are analysed. For roads where
sufficient data are not available, a broader grouping is used by putting
their sections together into one SAS data set.
To determine whether an accident problem section should be identified as
a blackspot, not only the number of accidents but also the numbers of
victims and involved vehicles were taken into account. Hence several
random variables are investigated in the analysis. The probabilities
that these variables exceed their respective prior median values can be
a significant issue in the inference process. These probabilities form a
vector that is used in the classification process, which is again
carried out by cluster analysis. The nearest centroid sorting method
implemented by the FASTCLUS procedure was applied. To obtain a binary
grouping (below or above the median), two initial cluster seeds are
specified; these are the vectors with the minimum and maximum lengths
respectively. A classified section can be indicated as hazardous (a
blackspot) if it belongs to the cluster that has the "heavier" seed
vector. It should be emphasised that the sections in the other cluster
can in no way be treated as safe or free of accident risk.
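The two-seed classification can be sketched as a plain nearest-centroid iteration. This is an illustrative approximation written for this text, not a reproduction of the FASTCLUS procedure, whose seed handling and options differ; the function name and the example vectors are hypothetical. Each vector holds the exceedance probabilities for one section (for instance accidents, victims, vehicles).

```python
import math


def two_seed_nearest_centroid(vectors, max_iter=100):
    """Split probability vectors into two clusters, seeded by the shortest
    and the longest vector (Euclidean length).

    Returns one label per vector: 1 for the cluster grown from the
    "heavier" seed (blackspot candidates), 0 for the other cluster.
    """
    def length(v):
        return math.sqrt(sum(c * c for c in v))

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    centroids = [list(min(vectors, key=length)), list(max(vectors, key=length))]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            0 if squared_distance(v, centroids[0]) <= squared_distance(v, centroids[1]) else 1
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for cluster in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == cluster]
            if members:
                centroids[cluster] = [sum(col) / len(members) for col in zip(*members)]
    return labels


section_vectors = [
    (0.10, 0.20, 0.10),
    (0.20, 0.10, 0.30),
    (0.90, 0.80, 0.95),
    (0.85, 0.90, 0.70),
    (0.15, 0.25, 0.20),
]
blackspot_flags = two_seed_nearest_centroid(section_vectors)
```

In this made-up example the third and fourth sections fall into the cluster grown from the heavier seed. As the text stresses, a label of 0 only means "not flagged", not "safe".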
Pic. 4. The detailed information about blackspot diagnosis.
As in the ROAD SECTIONS part, the user chooses a road for which
blackspots are to be identified. As a result, the total number of
blackspots is displayed. In addition, for each road section, detailed
information about the classification vector values and the blackspot
diagnosis can be presented on request. An example of the one-at-a-time
presentation is shown in picture 4.
4. REMARKS AND CONCLUSIONS
The presented SAS software application readily gives the required
results, provided the initial information is satisfactory. On the one
hand, each blackspot section can be considered individually in an
in-depth safety analysis, while on the other hand the sections can be
investigated together to reveal accident similarities. Initial results
obtained for an experimental database have shown that the application is
relevant and useful. New information can easily be added to the
databases, and the output is updated accordingly.
At every stage of building the application, in order to obtain suitable
subsets of the basic SAS data set, many of the possibilities offered by
the SAS system were exploited: data set options, filtering operations
(IF, WHERE), BY-group processing, combining SAS data sets (MERGE,
APPEND). Algorithms fundamental to the analysis could be developed
thanks to the following procedures: MEANS, CLUSTER, FASTCLUS, TREE,
CAPABILITY, FREQ. The organising and utility procedures SORT, DATASETS,
CONTENTS considerably eased and sped up the work. The macro language
made it possible to run iterative calculations on the same data with
different parameter values at each iteration step and to organise
communication between programs.
The SAS system is a powerful tool in research work. What is more, it
inspires further investigation. Thus the application is designed to be
open and amenable to expansion, becoming a handy tool for a traffic
engineer working on road safety.
5. ACKNOWLEDGEMENTS
The SAS software application was built in the course of work on the
author's doctoral thesis. The doctoral degree is being pursued under the
direction of Prof. Marian Tracz, Chair of Highway & Traffic Engineering,
Cracow University of Technology, Poland. The author would like to thank
him very much for his valuable suggestions and remarks, especially
those concerning the model of inference.
6. REFERENCES
1. Benjamin J. R., Cornell A. C., "Probability, Statistics and Decision
for Civil Engineers", McGraw-Hill, Inc., 1970.
2. Heydecker B. G., Wu J., "Using the Information in Road Accident
Records", 19th Summer Annual Meeting of PTRS, 1991, Sussex, England.
3. Hauer E., "On the Estimation of Expected Number of Accidents",
Accident Analysis and Prevention, Vol. 18, No. 1, 1-12.
4. Johnson L. N., Kotz S., "Discrete Distributions", Wiley & Sons, Inc.
5. Maritz J. S., "Empirical Bayes Methods", Methuen and Co Ltd, London,
1970.
6. Mountain L., Fawaz B., "The Accuracy of Estimates of Expected
Accident Frequencies Obtained Using an Empirical Bayes Approach",
Traffic Engineering and Control, May 1991.
7. SAS is a registered trademark of SAS Institute, Cary, NC, USA.
8. SAS/GRAPH, SAS/EIS, SAS/FSP, SAS/AF and SAS/STAT are registered
trademarks of SAS Institute Inc., Cary, NC, USA.
9. SAS Guide to Macro Processing, Release 6.03 Edition, SAS Institute
Inc., Cary, 1991.
10. SAS Language: Reference, Version 6 First Edition, SAS Institute
Inc., Cary, 1991.
11. SAS Procedures Guide, Release 6.03 Edition, SAS Institute Inc.,
Cary, 1991.
12. SAS Screen Control Language, Version 6 First Edition, SAS Institute
Inc., Cary, 1991.