UNIVERSITÀ DEGLI STUDI DI TRIESTE
Administrative Seat of the Doctoral Programme
UNIVERSITÀ DEGLI STUDI DI PADOVA
Partner Institution
XXIV CYCLE OF THE DOCTORAL PROGRAMME IN
CIVIL AND ENVIRONMENTAL ENGINEERING
Curriculum: Infrastructures, Structures and Transport Systems
TRANSPORTATION DATA ANALYSIS.
ADVANCES IN DATA MINING AND
UNCERTAINTY TREATMENT
(Scientific-Disciplinary Sector ICAR/05)
DOCTORAL CANDIDATE
GREGORIO GECCHELE
DIRECTOR OF THE DOCTORAL SCHOOL
PROF. IGINIO MARSON
TUTORS
PROF. ROMEO VESCOVI
PROF. RICCARDO ROSSI
SUPERVISORS
PROF. RICCARDO ROSSI
PROF. MASSIMILIANO GASTALDI
Academic Year 2011/2012
Contents

1 Introduction

I Data Mining

2 Data Mining Concepts
2.1 Introduction
2.2 Classification
2.2.1 Techniques Based on Measures of Distance
2.2.2 Decision Trees
2.2.3 Artificial Neural Networks
2.3 Clustering
2.3.1 Distances Between Clusters
2.3.2 Hierarchical Methods
2.3.3 Partitioning Methods
2.3.4 Model-based Clustering
2.3.5 Methods for Large Databases
2.3.6 Fuzzy Clustering
2.4 Association Rules
2.4.1 Basic Algorithm
2.4.2 Apriori Algorithm

3 Data Mining in Transportation Engineering
3.1 Knowledge Discovery Process
3.2 Pavement Management Systems
3.3 Accident Analysis
3.4 Traffic Forecasting
3.5 Other Studies

II Road Traffic Monitoring

4 Traffic Monitoring Guide
4.1 Data Collection Design
4.1.1 Permanent Traffic Counts
4.1.2 Short Period Traffic Counts
4.2 TMG Factor Approach
4.2.1 Basic Issues of TMG Factor Approach
4.3 Details About Factor Approach
4.3.1 Types of Factor
4.3.2 Road Groups Identification
4.3.3 ATR Sample Dimension
4.4 Alternatives to Factor Approach
4.5 Specificities of Truck Vehicles

5 Review of Traffic Monitoring Guide
5.1 Introduction
5.2 Bias and Precision in MDT Estimation
5.3 Grouping of Road Segments
5.3.1 Clustering Techniques
5.4 Assignment of Road Segments
5.4.1 Assignment Using Traffic Patterns
5.4.2 Multiple Linear Regression
5.4.3 Artificial Neural Networks
5.5 Final Considerations

6 Proposed Approach
6.1 Grouping Step
6.2 Assignment Step
6.3 Measures of Uncertainty in Assignment
6.4 AADT Estimation

III Case Study Analysis

7 Case Study
7.1 Data Source
7.2 Data Treatment
7.3 Model Implementation
7.3.1 Establishing Road Groups Using Fuzzy C-Means
7.3.2 Developing the Artificial Neural Networks
7.3.3 Calculation of AADT
7.4 Results and Discussion
7.4.1 Examination of the Estimated AADTs
7.4.2 Comparison with Other Models

8 Conclusions and Further Developments

9 Bibliography

A Case Study Data
List of Figures

2.1 KDD Process
2.2 CRISP-DM Framework
2.3 Schematic of ART1 Network
2.4 DBSCAN Distances
2.5 OPTICS. Illustration of the Cluster Ordering
2.6 Apriori. Net of {A, B, C, D}
2.7 Apriori. Subsets of {A, C, D}

5.1 Example of Kohonen Neural Network
5.2 Scheme of Assignment Strategies
5.3 Effect of AE on AADT Estimation Errors
5.4 Cumulative Percentage of Testing Points
5.5 Illustration of Semivariogram
5.6 Comparison of Percentage Prediction Errors
5.7 ANN Model for AADT Estimation
5.8 AADT Estimation Errors Using Various Models
5.9 AADT Estimation Errors Using Neural Network Models
5.10 AADT Estimation Errors Using Factor Approach
5.11 Comparison of AADT Estimation Errors

6.1 Scheme of the Proposed Approach

7.1 Reciprocals of the Seasonal Adjustment Factors for the Road Groups
7.2 Percentage of 48hr SPTCs With Values of Discord Lower Than Different Thresholds for 14 AVC Sites
7.3 Mean Absolute Error for 14 ATR Sites Based on Different SPTC Durations and Discord Values
7.4 Mean Absolute Error for 14 ATR Sites Based on Different SPTC Durations
7.5 Percentage of Samples with Non-specificity Lower Than Different Thresholds for 14 AVC Sites

A.1 Road Section Data Summary (SITRA Monitoring Program)
A.2 Province of Venice AVC Sites (year 2012)
A.3 Case Study AVC Sites
A.4 Road Groups Identified with FCM Algorithm
List of Tables

5.1 Suggested Frequency of Counts of Different Durations
5.2 Effect of Road Type on AADT Estimation Errors from 48hr SPTCs
5.3 Effect of Count Duration on AADT Estimation Errors

6.1 Example of clearly belonging and "I don't know" AVCs

7.1 Length Classes Adopted by SITRA Monitoring Program
7.2 Speed Classes Adopted by SITRA Monitoring Program
7.3 SPTCs Datasets Used for Simulations
7.4 Membership Grades of the AVCs to Different Road Groups
7.5 Characteristics of ANNs Used for the Assignment
7.6 Mean Absolute Error (MAE) of the Road Groups for Different Combinations of SPTC and Time Periods
7.7 Standard Deviation of Absolute Error (SDAE) of the Road Groups for Different Combinations of SPTC and Time Periods
7.8 Comparison of MAE of the Proposed Model with Previous Models (Sharma et al., and LDA model) Using 48hr SPTCs Taken on Weekdays

A.1 AVC Sites Sample Size
A.2 Average Seasonal Adjustment Factors for Road Groups
A.3 Average Reciprocals of the Seasonal Adjustment Factors for Road Groups
A.4 MAE of AVC Sites. "Total" Sample
A.5 SDAE of AVC Sites. "Total" Sample
A.6 Maximum Error of AVC Sites. "Total" Sample
A.7 Number of Samples of AVC Sites. "Total" Sample
A.8 MAE of AVC Sites. "Weekdays" Sample
A.9 SDAE of AVC Sites. "Weekdays" Sample
A.10 Maximum Error of AVC Sites. "Weekdays" Sample
A.11 Number of Samples of AVC Sites. "Weekdays" Sample
A.12 MAE of AVC Sites. "Week-ends" Sample
A.13 SDAE of AVC Sites. "Week-ends" Sample
A.14 Maximum Error of AVC Sites. "Week-ends" Sample
Chapter 1
Introduction
In the study of transportation systems, the collection and use of correct information representing the state of the system is central to the development of reliable and proper analyses. Unfortunately, in many application fields information is generally obtained from limited, scarce and low-quality data, whose use produces results affected by high uncertainty and, in some cases, low validity.
Technological advances in several fields, including information technology, electronics and telecommunications, are making it easier and less expensive to collect the large amounts of data which can be used in transportation analyses. These data include traditional information gathered in transportation studies (e.g. traffic volumes on a given road section) and new kinds of data not directly connected to transportation needs (e.g. Bluetooth and GPS data from mobile phones).
However, in many cases this large amount of data cannot be directly applied to transportation problems. The data are generally low-quality and non-homogeneous, and need time-consuming verification and validation processes before they can be used. Data Mining techniques can represent an effective solution for treating data in these contexts, since they are designed to manage large amounts of data, producing results whose quality increases as the amount of data increases.
Based on these facts, this thesis analyses the capabilities offered by the implementation of Data Mining techniques in the transportation field, developing a new approach for the estimation of Annual Average Daily Traffic (AADT) from traffic monitoring data.
In the first part of the thesis the most well-established Data Mining techniques are reviewed, identifying application contexts in the transportation field for which they can represent useful analysis tools. Chapter 2 introduces the basic concepts of Data Mining and presents a review of the most commonly applied techniques. Classification, Clustering and Association Rules are presented, giving some details about the main characteristics of
well-established algorithms. In Chapter 3 a literature review concerning Data Mining applications in the transportation field is presented; a deeper analysis is devoted to research topics which have extensively applied Data Mining techniques.
The second part of the thesis focuses on a deep critical review of the U.S. Federal Highway Administration (FHWA) traffic monitoring approach for the estimation of AADT, which represents the main applicative topic of this research. In Chapter 4 the FHWA factor approach is presented in its original form, while Chapter 5 reports a detailed summary of the modifications proposed in recent years. Building on this review, a new approach is proposed in Chapter 6, based on the use of Data Mining techniques (fuzzy clustering and Artificial Neural Networks) and measures of uncertainty from Dempster-Shafer Theory.
The third part of the thesis (Chapter 7) presents the validation study
of the proposed approach, reporting the results obtained in the case study
context and discussing the main findings.
Finally, conclusions and further developments of the research are reported in Chapter 8.
Part I
Data Mining
Chapter 2
Data Mining Concepts
2.1 Introduction
Data Mining (DM) is a general concept used to cover a wide range of models, techniques and methods, often extremely different from one another. Many authors have tried to define this concept, providing definitions such as:
Data mining is the analysis of (often large) observational data
sets to find unsuspected relationships and to summarize the data in
novel ways that are both understandable and useful to the data owner.
(Hand, Mannila, and Smyth 2001)
Data mining is an interdisciplinary field bringing together techniques from machine learning, pattern recognition, statistics, databases,
and visualization to address the issue of information extraction from
large data bases. (Simoudis 1998)
Data mining is the exploration and the analysis of large quantities
of data in order to discover meaningful patterns and rules. (Berry and
Linoff 2004)
Following the framework developed by Fayyad et al. (1996), which incorporates the basic ideas of these definitions, DM can be considered one step of the Knowledge Discovery in Databases (KDD) process (Figure 2.1). Data obtained from a certain source are selected, pre-processed and transformed in order to be elaborated by Data Mining techniques. This pre-processing is particularly important since in many cases the data under analysis are:
• secondary data (i.e. data stored for reasons different from the analysis);
• observational data (i.e. data not obtained from a precise experimental design);
• large amounts of data (i.e. data for which it is difficult to define adequate research hypotheses).
Figure 2.1: KDD Process. Source: Fayyad et al. 1996
The results obtained by DM need to be interpreted and evaluated in order to reach a better comprehension of the problem under analysis. This point is particularly important for understanding the capabilities and the limits of DM. The application of DM techniques does not guarantee the solution of a problem, but gives some useful indications that the decision-maker must interpret to solve his/her problem. The more precise the problem definition, the simpler the choice of the most effective technique (or set of techniques) and the better the final result of the process.
This concept becomes clearer considering the CRISP-DM framework (Chapman et al. 2000), which provides a non-proprietary and freely available standard process for fitting data mining into the general problem-solving strategy of a business or a research unit.
As can be observed in figure 2.2, CRISP-DM is an iterative, adaptive
process, consisting of 6 Phases:
1. Business/Research understanding Phase.
(a) Enunciate the project objectives and requirements clearly in terms
of the business or the research unit as a whole
(b) Translate these goals and restrictions into the formulation of a
data mining problem definition
(c) Prepare a preliminary strategy for achieving these objectives
2. Data understanding Phase.
(a) Collect the data
(b) Use exploratory data analysis to familiarize yourself with the
data and discover initial insights
Figure 2.2: CRISP-DM. Source: Chapman et al. 2000
(c) Evaluate the quality of the data
(d) If desired, select interesting subsets that may contain actionable
patterns
3. Data preparation Phase.
(a) Prepare from the initial raw data the final data set that is to be
used for all subsequent phases. This phase is very labor intensive
(b) Select the cases and variables you want to analyze and that are
appropriate for your analysis
(c) Perform transformations on certain variables, if needed
(d) Clean the raw data so that is ready for the modeling tools
4. Modeling Phase.
(a) Select and apply appropriate modeling techniques
(b) Calibrate model settings to optimize results
(c) Remember that often, several different techniques may be used
for the same data mining problem
(d) If necessary, loop back to the data preparation phase to bring
the form of data into line with the specific requirements of a
particular data mining technique
5. Evaluation Phase.
(a) Evaluate the one or more models delivered in the modeling phase
for quality and effectiveness before deploying them for use in the
field
(b) Determine whether the model in fact achieves the objectives set forth in the first phase
(c) Establish whether some important facet of the business or research problem has not been accounted for sufficiently
(d) Come to a decision regarding use of the data mining results
6. Deployment Phase.
(a) Make use of the model created: model creation does not signify
the completion of a project
(b) Example of a simple deployment: Generate a report
(c) Example of a more complex deployment: Implement a parallel
data mining process in another department
(d) For businesses, the customer often carries out the deployment
based on your model
In practice the analyst can choose the technique to adopt from a large number of alternatives. Traditionally, DM techniques have been divided into categories related to the objective one wants to achieve from the analysis (Berry and Linoff 2004):
Classification. The main objective is assigning the data in pre-defined
classes, usually described by qualitative variables;
Estimation. The main objective is producing estimates of a quantitative
variable, generally continuous;
Prediction. The main objective is producing estimates of future values that
can be assumed by a quantitative variable of interest;
Clustering. The main objective is subdividing observations (data) into groups not defined in advance (clusters), maximizing the similarity among observations belonging to the same group and minimizing the similarity with observations in other groups;
Affinity Grouping (Association). The main objective is defining rules which describe existing patterns in the data, connecting the variables of interest to one another;
Profiling. The main objective is providing a description of the observations.
Another common classification of DM techniques distinguishes between supervised and unsupervised techniques. In the first case the dataset under analysis has a clearly defined solution that the algorithm has to learn and reproduce on new data (e.g. classification tasks); in the second case the dataset does not have a pre-defined solution and the DM techniques try to identify patterns/relationships among the data (e.g. Clustering or Association Rules).
In any case, the analyst's choice will consider techniques which come from two different fields: machine learning and statistics. In this thesis more attention has been given to machine learning approaches, since they are more typical Data Mining techniques and are less explored research tools compared to statistical techniques.
For these reasons, in the following sections the most important DM techniques are introduced and described, following in particular the book by Witten and Frank (2005). More details are given for the categories commonly employed in transportation systems analysis, which have been considered and applied in this thesis: Classification, Clustering and Association Rules techniques.
2.2 Classification
Classification techniques are probably the most commonly applied DM
techniques and have as a main objective the insertion of observations in
pre-defined classes.
Given a dataset of elements (observations) D = {t_1, t_2, ..., t_n} and a set of classes C = {C_1, C_2, ..., C_m}, a classification problem consists of defining a map f : D → C for which each t_i is assigned to exactly one class. A class C_j contains the elements mapped to it by f, that is: C_j = {t_i | f(t_i) = C_j, 1 ≤ i ≤ n, t_i ∈ D}.
Classes must be pre-defined, non overlapping and such that they partition completely the dataset. Generally a classification problem is solved in
two steps, following a supervised learning approach:
1. Training step. A classification model is defined based on classified data.
2. Application step. Elements belonging to the dataset D are classified by
the model developed in the training step.
It must be observed that a large number of techniques employed to solve classification problems can also be applied to estimation or prediction problems.
In these cases, models do not refer to pre-defined classes, but produce as
output a quantitative response variable, generally continuous.
Four broad categories of techniques are generally employed to solve classification problems, based on the approach they adopt: techniques based on measures of distance, Decision Trees, Artificial Neural Networks and statistical approaches.
Excluding statistical approaches, the most important techniques belonging to each category are presented in the following sections.
2.2.1 Techniques Based on Measures of Distance
The concept of distance (and similarity) can be successfully applied to classification problems, considering that observations in the same class should be more similar to one another than to observations in other classes. The main difficulty in applying this approach is choosing similarity measures adequate to the variables adopted.
Formally, it is correct to distinguish between measures of similarity and measures of distance, depending on the type of variables adopted. In the case of qualitative variables, similarity among elements in the dataset must be used.
The similarity sim(t_i, t_j) between two observations t_i and t_j in a dataset D is defined as a map from D × D to the interval [0, 1], such that sim(t_i, t_j) ∈ [0, 1].
Similarity measures have some properties:
1. Non negativity: ∀ti , t j ∈ D, sim(ti , t j ) ≥ 0;
2. Normalization: ∀ti ∈ D, sim(ti , ti ) = 1;
3. Symmetry: ∀ti , t j ∈ D, sim(ti , t j ) = sim(t j , ti );
In practice, qualitative variables are re-codified using binary (dummy) variables, and similarity indices for these variables are used. Considering two observations codified using p binary variables, absolute frequencies are calculated for four situations:
• CP = co-presences. Number of variables for which both observations
have value 1;
• CA = co-absences. Number of variables for which both observations
have value 0;
• AP (and PA) = absences-presences (and presences-absences). Number
of variables for which the first (second) observation has value 1 and
the second (first) has value 0.
Different indices of similarity have been proposed, combining the aforementioned four values in different ways:
• Russel and Rao’s index of similarity.
sim(ti , t j ) =
20
CP
p
(2.1)
• Jaccard’s index of similarity.
CP
CP + PA + AP
sim(ti , t j ) =
(2.2)
If there is a complete dissimilarity between observations (CA = p), the
index is undetermined.
• Sokal e Michener’s index of similarity. (simple matching coefficient)
sim(ti , t j ) =
CP + CA
p
(2.3)
If quantitative variables are adopted, measures of distance are calculated. A measure of distance satisfies the following properties:
1. Non negativity: ∀ti , t j ∈ D, dis(ti , t j ) ≥ 0;
2. Identity: ∀ti , t j ∈ D, dis(ti , t j ) = 0 ⇔ ti = t j ;
3. Symmetry: ∀ti , t j ∈ D, dis(ti , t j ) = dis(t j , ti );
4. Triangular inequality: ∀ti , t j , tk ∈ D, dis(ti , t j ) ≤ dis(ti , tk ) + dis(t j , tk );
In a k-dimensional space a large number of measures can be used; two measures commonly adopted are:
1. Euclidean
\[
\mathrm{dis}(t_i, t_j) = \sqrt{\sum_{h=1}^{k} (t_{ih} - t_{jh})^2} \tag{2.4}
\]
2. Manhattan
\[
\mathrm{dis}(t_i, t_j) = \sum_{h=1}^{k} |t_{ih} - t_{jh}| \tag{2.5}
\]
In particular, the Euclidean distance is the most commonly adopted measure of distance, even if it can be highly influenced by the presence of extreme values. These effects are usually due to variables measured on different scales; normalization (or standardization) can be sufficient to reduce or solve this problem.
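Both distances, together with the z-score standardization just mentioned, can be sketched as follows; the data values are invented:

```python
import math

def euclidean(t_i, t_j):
    # eq. (2.4)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_i, t_j)))

def manhattan(t_i, t_j):
    # eq. (2.5)
    return sum(abs(a - b) for a, b in zip(t_i, t_j))

def standardize(rows):
    """Z-score each column so that variables measured on different
    scales contribute comparably to the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
           for c, m in zip(cols, means)]
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, means, sds)] for row in rows]

print(euclidean([3.0, 4.0], [0.0, 0.0]))  # 5.0
print(manhattan([3.0, 4.0], [0.0, 0.0]))  # 7.0
```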
Two extremely simple techniques based on measures of similarity or distance are presented below: the simplified approach and K Nearest Neighbour.
Simplified Approach
This approach has been derived from the Information Retrieval (IR) field. It assumes that each observation t_i in the dataset is defined as a vector of numerical values {t_{i1}, t_{i2}, ..., t_{ik}} and that each class C_j is defined as a vector of numerical values {C_{j1}, C_{j2}, ..., C_{jk}}. Each observation is simply assigned to the class for which the measure of similarity is largest. The vector representative of each class is generally calculated as the center of the region into which the training set observations are subdivided.
K Nearest Neighbour
K Nearest Neighbour (KNN) is a very simple algorithm commonly adopted for classification. When a new observation is presented to the algorithm, it calculates the distance between this observation and each element in the training set. Then only the K nearest elements (the "nearest neighbours") are considered, and the new observation is assigned to the class which contains the largest number of them. Due to its simplicity, the KNN algorithm is extremely sensitive to the choice of the value of K. A simple rule of thumb suggests choosing K = √T, where T is the number of elements belonging to the training set.
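The KNN rule described above can be sketched as follows; the toy training set and the choice of the Euclidean distance are illustrative assumptions:

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=None):
    """train: list of (features, label) pairs; the new observation is
    assigned to the majority class among its k nearest elements."""
    if k is None:
        # rule of thumb: K = sqrt(T), T = size of the training set
        k = max(1, round(math.sqrt(len(train))))
    dist = lambda x, y: math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    neighbours = sorted(train, key=lambda t: dist(t[0], new_point))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

train = [([0, 0], "A"), ([0, 1], "A"), ([1, 0], "A"),
         ([5, 5], "B"), ([5, 6], "B"), ([6, 5], "B")]
print(knn_classify(train, [0.5, 0.5]))  # A
```

With T = 6 training elements the rule of thumb gives K = round(√6) = 2, and the two nearest neighbours of the query point both belong to class "A".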
2.2.2 Decision Trees
Decision tree algorithms are classification techniques which divide the instances on the basis of a hierarchy of the attribute space, first considering the most important attribute (the root of the tree) and progressively using the other attributes until a certain class is assigned (the leaves of the tree).
Decision trees can be considered non-parametric predictive models, since they do not make specific hypotheses on the probability distribution of the dependent variable. This fact generally requires more computational resources and can produce some dependence on the observed data, limiting the generalization of the results to other datasets (the overfitting problem).
Formally the problem can be expressed as follows. Let D = {t_1, ..., t_n} be a set of observations t_i = {t_{i1}, t_{i2}, ..., t_{ik}} defined by the attributes {A_1, ..., A_h}, and let C = {C_1, ..., C_m} be a set of classes; a decision tree (DT) is a tree associated to D with the following properties:
• each node in the tree is identified by an attribute A_i;
• each branch is identified by a predicate applied to the attribute associated to the parent node;
• each leaf node is identified with a class C_j.
Decision trees can be divided into two main types:
1. Classification trees, if the output is the assignment of an instance to one
of the predetermined classes;
2. Regression trees, if the output is the estimate of a quantitative variable.
From the applicative point of view the differences are minimal, since changes are needed only in the definition of the measurements and in the interpretation of the results, not in the algorithmic structure.
For each output variable y_i, a regression tree produces an estimated value ŷ_i equal to the mean of the response variable in the leaf m to which observation i belongs, that is:
\[
\hat{y}_i = \frac{\sum_{l=1}^{n_m} y_{lm}}{n_m} \tag{2.6}
\]
In the case of a classification tree, the estimated values are the probabilities π_i of belonging to a certain group. For binary classification (i.e. classification with only two classes) the estimated probability of success is:
\[
\pi_i = \frac{\sum_{l=1}^{n_m} y_{lm}}{n_m} \tag{2.7}
\]
where the observations y_{lm} can assume the values 0 or 1, and the probability of belonging to a group is the observed proportion of successes in the group m.
In both cases ŷ_i and π_i are constant for all the elements of the group.
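The two leaf estimates are formally the same computation, as a short sketch makes clear; the leaf values are invented:

```python
def regression_leaf_estimate(y_leaf):
    """y-hat: mean of the response variable in the leaf, eq. (2.6)."""
    return sum(y_leaf) / len(y_leaf)

def binary_leaf_probability(y_leaf):
    """pi: observed proportion of successes (1s) in the leaf, eq. (2.7)."""
    return sum(y_leaf) / len(y_leaf)

print(regression_leaf_estimate([2.0, 4.0, 6.0]))  # 4.0
print(binary_leaf_probability([1, 0, 1, 1]))      # 0.75
```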
From each leaf of the tree a rule can be derived: usually the one corresponding to the class with the majority of instances is chosen (majority rule). In this way each path in the tree represents a classification rule which assigns the instances to the classes.
The learning algorithm has a top-down recursive structure. At each level the best splitting attribute is chosen as the one which induces the best segmentation of the instances, and the algorithm creates as many branches as the number of predicates of the splitting attribute (splitting predicates). In the case of binary trees the branches are two. The algorithm recursively analyses the remaining attributes, until a certain stopping criterion is reached which determines the definitive structure of the tree.
The main differences between decision trees regard:
1. the criterion function for the choice of the best splitting attribute;
2. the stopping criterion for the creation of the branches.
Criterion Functions
Considering the criterion function, at each step of the procedure (at each level of the tree structure) an index Φ(t) is used, which measures the diversity between the values of the response variable in the child groups generated by the splitting (s = 2 for binary trees) and those in the parent group t.
Let t_r, r = 1, ..., s be the child groups generated by the splitting and p_r the proportion of observations in t allocated to each child node, with Σ_r p_r = 1; the criterion function is generally expressed as:
\[
\Phi(s, t) = I(t) - \sum_{r=1}^{s} I(t_r) \, p_r \tag{2.8}
\]
where I(t) is an impurity function.
For regression trees the output variable is quantitative, therefore the use of the variance is a logical choice. More precisely, for a regression tree the impurity of a node t_r can be defined as:
\[
I_V(t_r) = \frac{\sum_{l=1}^{n_{t_r}} (y_{l t_r} - \hat{y}_{t_r})^2}{n_{t_r}} \tag{2.9}
\]
where ŷ_{t_r} is the mean estimated value for the node t_r, which has n_{t_r} instances.
For classification trees the most common impurity measures adopted
are:
1. Misclassification impurity
\[
I_M(t_r) = \frac{\sum_{l=1}^{n_{t_r}} 1(y_{l t_r} \neq y_k)}{n_{t_r}} = 1 - \pi_k \tag{2.10}
\]
where y_k is the class with estimated probability π_k; the notation 1(·) indicates the indicator function, which assumes the value 1 if y_{l t_r} ≠ y_k and 0 otherwise.
2. Gini impurity
\[
I_G(t_r) = 1 - \sum_{i=1}^{m} \pi_i^2 \tag{2.11}
\]
where π_i are the estimated probabilities of the m classes in the node t_r.
3. Entropy impurity
\[
I_E(t_r) = - \sum_{i=1}^{m} \pi_i \log \pi_i \tag{2.12}
\]
The use of the impurity functions can be extended in order to evaluate a decision tree globally. Let N(T) be the number of leaves of a tree T; the total impurity of the tree can be calculated as:
\[
I_T = \sum_{m=1}^{N(T)} I(t_m) \, p_m \tag{2.13}
\]
where p_m are the proportions of the instances in the final classification.
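The three impurity measures (2.10)-(2.12) can be sketched directly from the estimated class probabilities of a node; the probabilities below are invented, and base-2 logarithms are assumed for the entropy:

```python
import math

def misclassification(probs):
    return 1.0 - max(probs)                   # eq. (2.10): 1 - pi_k

def gini(probs):
    return 1.0 - sum(p * p for p in probs)    # eq. (2.11)

def entropy(probs):
    # eq. (2.12); log base 2 is assumed here, as is common in ID3
    return -sum(p * math.log2(p) for p in probs if p > 0)

node = [0.5, 0.25, 0.25]  # estimated class probabilities in a node
print(misclassification(node))  # 0.5
print(gini(node))               # 0.625
print(entropy(node))            # 1.5
```

All three measures are zero for a pure node (one class with probability 1) and maximal when the classes are equiprobable.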
Stopping Criteria
Theoretically the stopping criteria would activate when all the instances of the training set are correctly classified. In practice, however, it is better to cut the last branches added to the tree (pruning), to prevent the creation of excessively long trees and over-fitting problems. This means that the tree must give a classification that is parsimonious and discriminatory at the same time. The first property leads to decision trees with a small number of leaves, with decision rules easy to interpret. Conversely, the second property leads to a large number of leaves, extremely different from one another.
Other Relevant Factors
Other factors are relevant for a correct definition of a decision tree. They
include:
• An accurate Exploratory Data Analysis process, which excludes outlier data and limits the number of classes;
• An adequate number of observations included in the training set;
• A balanced structure of the tree, which has the same length for each
path from the root node to the leaves;
• The pruning process, which improves classification performance by removing sub-trees resulting from over-fitting phenomena.
Before introducing the more common algorithms adopted for building
decision trees, it can be useful to summarize their main advantages and limits.
The main advantages of decision trees are the ease of use
and the efficiency, the availability of rules that facilitate the interpretation
of the results, the capability of handling a large number of attributes and
the computational scalability.
The main limits are the difficulty in using continuous or missing
data, a tendency to over-fitting (which can be counterbalanced by the pruning
technique), the fact that correlations among the attributes are ignored, and
the division of the solution space into rectangular regions, which is not suitable
for all classification problems.
ID3
ID3
The ID3 algorithm (Quinlan 1986) is based on Information Theory principles. The rationale of the technique is minimizing the expected number of
comparisons by choosing splitting attributes that give the maximum increment
of information.
Considering that decision trees divide the search space into rectangular regions, the splitting attribute which produces the maximum increment
of information is the one which divides the search space into two subspaces
with similar dimensions. In fact this attribute has attribute values with the
same amount of information.
Therefore, at each level of the tree, ID3 calculates the Entropy impurity
associated with each attribute and chooses the attribute which produces the
largest increment in the criterion function Φ(t), called information gain.
Consequently, information gain is calculated as the difference between the
entropy before and after the splitting, that is:
$$\Phi(s,t) = Gain(s,t) = I_E(t) - \sum_{r=1}^{s} I_E(t_r) \, p_r \qquad (2.14)$$
where:
• t_r, r = 1, …, s are the child branches generated by the splitting;
• p_r is the proportion of observations in t allocated to each child branch, with $\sum_r p_r = 1$.
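A minimal sketch of the information gain of equation (2.14), reusing the entropy impurity of equation (2.12) (function names are illustrative, not from any library):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy impurity I_E of a set of class labels (Eq. 2.12)."""
    n = len(labels)
    probs = [c / n for c in Counter(labels).values()]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(parent_labels, child_partitions):
    """Gain(s, t) = I_E(t) - sum_r I_E(t_r) * p_r (Eq. 2.14)."""
    n = len(parent_labels)
    weighted_child_entropy = sum(
        len(child) / n * entropy(child) for child in child_partitions
    )
    return entropy(parent_labels) - weighted_child_entropy

# Splitting a mixed node into two pure children gives the maximum gain
parent = ["yes", "yes", "no", "no"]
children = [["yes", "yes"], ["no", "no"]]
print(information_gain(parent, children))  # 1.0 bit: both children are pure
```

ID3 would evaluate this quantity for every candidate attribute and split on the one with the largest gain.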
C4.5 and C5.0
The C4.5 algorithm (Quinlan 1993) represents an improvement of the ID3
algorithm from different points of view. The main one is the substitution of
the Information Gain with the Gain Ratio as criterion function, defined as:
$$\Phi(s,t) = GainRatio(s,t) = \frac{Gain(s,t)}{-\sum_{r=1}^{s} p_r \log p_r} \qquad (2.15)$$
which is more stable and less influenced by the number of values
of each attribute. Other improvements are summarized in the following
paragraphs.
Numerical Attributes C4.5 algorithm produces binary trees, restricting
the possibilities to a two-way split at each node of the tree. The gain ratio is
calculated for each possible breakpoint in the domain of each attribute.
The main issue with using binary trees with numerical variables is
that successive splits may continue to give new information, creating
trees that are complex and particularly difficult to understand, because the tests on
a single numerical attribute are not located together.
To solve this problem it is possible to test against several constants at
a single node of the tree or, in a simpler but less powerful solution, to
prediscretize the attributes.
Pruning A clear distinction can be made between two different types of
pruning strategies:
1. Prepruning (Forward pruning). This strategy involves choosing
when to stop the development of the sub-trees during the tree-building process.
2. Postpruning (Backward pruning). In this case the pruning process
is performed after the tree has been built.
Even if the first strategy seems more attractive, since it can limit
the development of sub-trees, the second one preserves, in some
cases, synergies between the attributes that the first one can eliminate in
the building phase. Analysing the postpruning strategy in more depth (since it is
the most implemented in learning algorithms), two different operations can
be considered:
1. Subtree Replacement
2. Subtree Raising
At each node a learning scheme can decide whether to adopt one of the two techniques, both of them, or none.
Subtree Replacement refers to the operation of taking some sub-trees
and replacing them with single leaves. This operation certainly decreases the
accuracy on the training set, but can have the opposite effect on the test set.
When implemented, this procedure starts from the leaves and goes back to
the root node.
Subtree Raising refers to the operation of replacing an internal node
with one of the nodes below it. It is a more complex and time-consuming
procedure and it is not always necessary to implement it; for this reason it
is usually restricted to the subtrees of the most popular branch.
A relevant point is how to decide when to perform the two operations.
Estimating Errors To decide this it is necessary to estimate the error rate that
would be expected at a particular node given an independently chosen
dataset, both for leaves and internal nodes. The C4.5 algorithm uses the training
set to do this, but it is more statistically sound to use an independent dataset
(different from the training and the test sets), performing the so-called
reduced-error pruning.
The C4.5 algorithm analyses what happens at a certain node of the tree,
considering that the majority class could be used to represent the node.
Of the total number of instances N, a certain number of errors E, represented by the
minority classes, is made.
At this point it is assumed that the true probability of error at the node
is q and that the N instances are generated by a Bernoulli process with
parameter q, of which E represent the errors. Since the values of E and
N are measured on the training data, and not on an independent dataset,
a pessimistic estimate of the error is made, using the upper bound of the
confidence interval.
This means that, given a confidence level c (the default value used by
C4.5 is 25%), a confidence limit z is such that:
$$\Pr\left[ \frac{f - q}{\sqrt{q(1-q)/N}} > z \right] = c \qquad (2.16)$$
where N is the number of instances, f = E/N is the observed error rate,
and q is the true error rate. This upper confidence limit can be used as a
pessimistic estimate of the error rate e at the node considered:
$$e = \frac{f + \dfrac{z^2}{2N} + z \sqrt{\dfrac{f}{N} - \dfrac{f^2}{N} + \dfrac{z^2}{4N^2}}}{1 + \dfrac{z^2}{N}} \qquad (2.17)$$
Fixing the value of the confidence level c to 25% gives a value of z = 0.69,
even if a higher level can be chosen.
In practice the error is calculated at each node, considering the combined error estimate for the children and the estimated error for the parent
node. If the error of the parent is less than the error of the children nodes,
they are pruned away.
The estimated errors obtained with this calculation must be considered
with particular attention, since they are based on particularly strong assumptions; however the method seems to work reasonably well in practice.
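Equation (2.17) can be translated directly into code (a toy illustration of the calculation, not C4.5's actual implementation):

```python
import math

def pessimistic_error(f, N, z=0.69):
    """Upper confidence bound on the node error rate (Eq. 2.17).

    f: observed error rate E/N on the training data
    N: number of instances reaching the node
    z: confidence limit (0.69 corresponds to C4.5's default c = 25%)
    """
    numerator = f + z**2 / (2 * N) + z * math.sqrt(
        f / N - f**2 / N + z**2 / (4 * N**2)
    )
    return numerator / (1 + z**2 / N)

# The pessimistic estimate always exceeds the observed training error rate
f, N = 5 / 14, 14
print(round(pessimistic_error(f, N), 3))  # noticeably larger than f = 0.357
```

The gap between e and f shrinks as N grows, which matches the intuition that error rates measured on many instances need less pessimistic correction.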
Rules The final decision tree can be used to create a set of rules describing
the classification of the instances; the use of the estimated error allows the
selection of the smallest set of rules, even if this process can require long
computational efforts.
The C5.0 algorithm is a commercial version of C4.5, implemented in many
DM packages and modified in order to be used efficiently on datasets with large
amounts of data. One of the main improvements is the implementation of
boosting, which creates multiple training datasets; specific classification trees are created using these subsets and merged together in a final
tree with the boosting operation.
CART
The Classification and Regression Tree (CART) is a technique that produces
binary trees (regression or classification ones) on the basis of the entropy impurity I_E. Differently from ID3, at each node it creates only two branches,
selecting the best split according to the criterion function:
$$\Phi(s,t) = 2 \, p_L \, p_R \sum_{j=1}^{m} \left| P(C_j | t_L) - P(C_j | t_R) \right| \qquad (2.18)$$
The function is calculated with reference to the node t for each possible split s:
1. L and R represent the two children created with the split;
2. p_L and p_R are the probabilities that the instances of the training set are
on the left or the right part of the tree; they are estimated as the ratio
between the instances of each child branch and the total number of
instances of the training set;
3. P(C_j|t_L) and P(C_j|t_R) are the probabilities that an instance belongs to the
class C_j and to the left or right child branch; they are estimated as
the ratio between the instances of the class C_j in each child branch and the
number of instances in the parent node.
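The CART splitting criterion of equation (2.18), with the probabilities estimated from class counts as described in the list above, can be sketched as (an illustrative toy function):

```python
from collections import Counter

def cart_split_goodness(left_labels, right_labels):
    """Phi(s, t) = 2 * pL * pR * sum_j |P(Cj|tL) - P(Cj|tR)| (Eq. 2.18)."""
    n_left, n_right = len(left_labels), len(right_labels)
    n = n_left + n_right
    p_left, p_right = n_left / n, n_right / n
    left_counts, right_counts = Counter(left_labels), Counter(right_labels)
    classes = set(left_counts) | set(right_counts)
    diff = sum(
        abs(left_counts[c] / n_left - right_counts[c] / n_right)
        for c in classes
    )
    return 2 * p_left * p_right * diff

# A split that perfectly separates the two classes scores the maximum
print(cart_split_goodness(["A", "A"], ["B", "B"]))  # 2 * 0.5 * 0.5 * 2 = 1.0
```

The criterion rewards splits that are both balanced (through 2·pL·pR) and class-discriminating (through the absolute differences).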
Missing values are not considered in the learning phase, which goes on iteratively until splitting no longer improves the performance of the tree.
The stopping criterion of the splitting process is related to the global
performance index of the tree and to the pruning strategy. Being T0 the
biggest tree and T a general smaller tree, the pruning process determines
an optimal tree starting from T0 , minimizing the loss function:
$$C_\alpha(T) = I(T) + \alpha N(T) \qquad (2.19)$$
where, for a given tree T, I(T) is the total impurity function, N(T) is
the number of leaves in the tree and α is a constant value which linearly
penalizes the complexity of the tree.
The pruning process should be operated in combination with an accurate
treatment of available data, distinguishing between the training set, adopted
for building the tree, and the testing set, adopted for a correct evaluation
of the model, including the calculation of the loss function and the pruning process. In this sense cross-validation, which separates one set of observations
(learning sample) from another completely independent set of observations
(testing sample), can be an effective solution.
CHAID
The CHAID (Chi-squared Automatic Interaction Detector) tree classification method was originally proposed by Kass (1980). CHAID is a recursive
partitioning method that builds non-binary trees, based on an algorithm
particularly well suited for the analysis of larger datasets, which solves both
regression-type and classification-type problems.
The basic algorithm differs for classification and regression problems. In the first case, when the dependent variable is categorical, it relies
on the Chi-square test to determine the best next split at each step, while
for regression-type problems F-tests are computed.
Specifically, the algorithm proceeds as follows:
Preparing predictors. The first step is to create categorical predictors out
of any continuous predictors by dividing the respective continuous
distributions into a number of categories with an approximately equal
number of observations (prediscretization). For categorical predictors,
the categories (classes) are "naturally" defined.
Merging categories. The next step is to cycle through the predictors to
determine for each predictor the pair of predictor categories that is
least significantly different with respect to the dependent variable,
computing a Chi-square test for classification problems and F tests for
regression problems.
If the respective test for a given pair of predictor categories is not
statistically significant as defined by an alpha-to-merge value, then it
will merge the respective predictor categories and repeat this step (i.e.,
find the next pair of categories, which now may include previously
merged categories).
If the statistical significance for the respective pair of predictor categories is significant (less than the respective alpha-to-merge value),
then (optionally) it will compute a Bonferroni adjusted p-value for the
set of categories for the respective predictor.
Selecting the split variable. The next step is to choose the predictor variable that will yield the most significant split, i.e. the one having the smallest
adjusted p-value; if the smallest (Bonferroni) adjusted p-value for any
predictor is greater than some alpha-to-split value, then no further
splits will be performed, and the respective node is a terminal node.
Continue this process until no further splits can be performed (given the
alpha-to-merge and alpha-to-split values).
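The merging step of the algorithm can be sketched as follows (a simplified illustration: the chi-square statistic is computed directly, while the alpha-to-merge significance test and the Bonferroni adjustment are omitted; all names are ours):

```python
from itertools import combinations

def chi2_statistic(counts_a, counts_b):
    """Pearson chi-square statistic for a 2-row contingency table.

    counts_a, counts_b: dicts mapping class label -> count for two
    predictor categories.
    """
    classes = set(counts_a) | set(counts_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    total = total_a + total_b
    stat = 0.0
    for c in classes:
        col = counts_a.get(c, 0) + counts_b.get(c, 0)
        for observed, row_total in ((counts_a.get(c, 0), total_a),
                                    (counts_b.get(c, 0), total_b)):
            expected = row_total * col / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def merge_most_similar(category_counts):
    """One CHAID merging step: fuse the pair of categories whose class
    distributions are least significantly different (smallest chi-square)."""
    (a, b), _ = min(
        ((pair, chi2_statistic(category_counts[pair[0]],
                               category_counts[pair[1]]))
         for pair in combinations(category_counts, 2)),
        key=lambda item: item[1],
    )
    merged = dict(category_counts)
    fused = {c: merged[a].get(c, 0) + merged[b].get(c, 0)
             for c in set(merged[a]) | set(merged[b])}
    del merged[a], merged[b]
    merged[f"{a}+{b}"] = fused
    return merged

# "low" and "mid" have identical class distributions, so they merge first
counts = {"low": {"yes": 5, "no": 5},
          "mid": {"yes": 5, "no": 5},
          "high": {"yes": 9, "no": 1}}
print(sorted(merge_most_similar(counts)))  # ['high', 'low+mid']
```

In the full algorithm this step would be repeated, with a significance test deciding when to stop merging.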
A general issue with CHAID is that the final trees can become very large,
diminishing the ease of interpretation that is characteristic of decision tree methods.
2.2.3 Artificial Neural Networks
Artificial Neural Networks (ANNs) are highly connected systems of
basic computational elements, created with the objective of imitating the neurophysiology of the human brain.
A basic neural network is formed by a set of computational elements
(called nodes, neurons or units), connected to each other by weighted
connections. Neurons are organised in layers and each node is connected
with neurons belonging to the previous or following layer. Each node operates
independently of the others and its activity is determined by the input
values received simultaneously from the neurons belonging to the previous
layer. The neuron is activated if the input signals exceed a pre-fixed threshold
(bias) and produces a (generally unique) output signal.
Being j a generic neuron with bias θ_j, it receives n input signals x =
(x_{1j}, x_{2j}, …, x_{nj}) from the neurons of the previous layer, with associated
weights w = (w_{1j}, w_{2j}, …, w_{nj}). The neuron elaborates the input signals x using
a combination function and the result (potential P_j) is transformed by a
transfer function, producing the final output y_j.
Generally the combination function is a linear combination of the input
signals x and the bias θ_j, which can be represented as a further input with
signal x_{0j} = 1 and weight w_{0j} = −θ_j:
$$P_j = \sum_{i=0}^{n} x_{ij} \, w_{ij} \qquad (2.20)$$
The output y_j of the j-th neuron is given by the application of a transfer
function to the potential P_j:
$$y_j = f(\mathbf{x}, \mathbf{w}) = f(P_j) = f\left( \sum_{i=0}^{n} x_{ij} \, w_{ij} \right) \qquad (2.21)$$
The functional form of transfer function f (P j ) is defined during the ANN
model specification. The most common transfer functions used are:
1. Linear transfer function
$$f(P_j) = \beta P_j + \alpha \qquad (2.22)$$
where α and β are constant values.
2. Step-wise transfer function
$$f(P_j) = \begin{cases} \alpha & P_j > \theta_j \\ \beta & P_j \leq \theta_j \end{cases} \qquad (2.23)$$
When α = 1, β = 0 and θ_j = 0, f(P_j) is the sign function, which takes the value 0 if the input is negative and +1 if it is positive.
3. Sigmoidal transfer function
$$f(P_j) = \frac{1}{1 + e^{-\alpha P_j}} \qquad (2.24)$$
where α is a positive parameter.
4. Hyperbolic tangent transfer function
$$f(P_j) = \frac{1 - e^{-\alpha P_j}}{1 + e^{-\alpha P_j}} \qquad (2.25)$$
5. Gaussian transfer function
$$f(P_j) = e^{-P_j^2 / V} \qquad (2.26)$$
where V is the variance of the function.
6. Softmax transfer function
$$softmax(v_j) = \frac{e^{v_j}}{\sum_{h=1}^{g} e^{v_h}} \qquad j = 1, \ldots, g \qquad (2.27)$$
Each ANN has its own structure, which organizes the neurons in three
types of layers: input layer, output layer and hidden layers. The input layer
neurons accept input from the external environment, and generally each
input neuron represents an explicative variable. The output layer neurons
send data produced by the neural network to the external environment.
Hidden layer neurons connect the input layer with the output layer, without any
relationship with the external environment. One or more hidden layers can be
introduced in the neural network structure. Their main role is elaborating
the information obtained from the input layer.
Neural networks can be classified on the basis of four characteristics of
their architecture (or topology):
Differentiation between input and output layers. ANNs can have separate input and output layers (the majority of cases) or layers
which work at the same time as input and output layers;
Number of layers.
Direction followed by the information flow in the network.
• Feed-forward networks: the information is elaborated from one
layer to the next one, following one specific direction.
• Feed-back networks: the information can be elaborated from one
layer to the other connected layers, in any direction.
Type of connections.
• Fully connected networks: each neuron in a layer is connected
only to all the neurons in the next layer;
• Fully interconnected networks: each neuron in a layer is connected to all the neurons in the other layers.
The choice of the architecture is made considering the objective of the
analysis to be made with the ANN and data characteristics. As an example,
some commonly applied neural networks are:
• Auto-associative Neural Networks. The architecture has one layer
of fully interconnected neurons which behave like input and output
layers;
• Single Layer Perceptrons (SLP). These networks have a feed-forward
architecture, with n input neurons x_1, x_2, …, x_n and p output neurons
y_1, y_2, …, y_p, fully connected.
• Multi-Layer Perceptrons (MLP). These networks have a feed-forward
architecture with n input neurons, p output neurons and h_i neurons
in hidden layer i. The number of hidden layers i can vary depending
on the needs.
A final parameter to classify ANN is given by the type of training process
they follow:
• in case of supervised learning the values of the explanatory variables and the
dependent variable are given to the ANN for each observation in the
training dataset. The objective of the ANN is modifying its structure
(i.e. its weights) so that the sum of the distances between observed and
estimated values of the response variables is minimized. The ANN
obtained at the end of the learning can be applied for classification
and estimation tasks;
• in case of unsupervised learning only the values of the explanatory variables are given to the ANN for each observation in the training dataset. The
objective of the ANN is modifying its structure (i.e. its weights) so that
the observations are clustered in an effective way. The ANN obtained
at the end of the learning can be applied for clustering tasks.
In the following sections major details will be given about Multi-Layer
Perceptrons (MLP), which are the ANNs most applied for classification
and estimation tasks and Adaptive Resonance Theory Neural Networks,
as representative of another type of Neural Networks. Details concerning
unsupervised neural networks are given in section 2.3.3, since they can be
applied in clustering problems.
Multi-Layer Perceptrons
Multi-Layer Perceptrons (MLP) are feed-forward architectures with fully
connected neurons, organised in one input layer, one output layer and one
or more hidden layers. The simplest MLP is composed of one input layer
with n neurons x_1, x_2, …, x_n, one output layer with p neurons y_1, y_2, …, y_p
and h neurons in the hidden layer.
The layers are connected to each other by two sets of weights: the
first set is composed of weights w_{ik} (i = 1, …, n; k = 1, …, h), which connect
the neurons in the input layer with the neurons in the hidden layer; the
second one of weights z_{kj} (k = 1, …, h; j = 1, …, p), which connect the
neurons in the hidden layer with the neurons in the output layer.
Each node in the hidden layer elaborates the information from the input
layer, producing the output:
$$h_k = f(\mathbf{x}, \mathbf{w}_k) \qquad (2.28)$$
The output from the hidden layer neurons is fed to the output layer neurons,
which produce the final output:
$$y_j = g(\mathbf{h}, \mathbf{z}_j) \qquad (2.29)$$
Therefore the final output produced by the j-th neuron of the output
layer is given by:
$$y_j = g_j\left( \sum_k h_k \, z_{kj} \right) = g_j\left( \sum_k z_{kj} \, f_k\left( \sum_i x_i \, w_{ik} \right) \right) \qquad (2.30)$$
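The forward pass of equation (2.30) can be sketched as follows, using a sigmoid transfer function in both layers (a toy illustration, not a complete MLP implementation):

```python
import math

def sigmoid(p):
    return 1.0 / (1.0 + math.exp(-p))

def mlp_forward(x, w, z):
    """Forward pass of a one-hidden-layer MLP (Eq. 2.30).

    x: input vector (n values)
    w: input-to-hidden weights, indexed w[i][k]
    z: hidden-to-output weights, indexed z[k][j]
    """
    n_hidden = len(w[0])
    n_output = len(z[0])
    # Hidden layer: h_k = f(sum_i x_i * w_ik)   (Eq. 2.28)
    h = [sigmoid(sum(x[i] * w[i][k] for i in range(len(x))))
         for k in range(n_hidden)]
    # Output layer: y_j = g(sum_k h_k * z_kj)   (Eq. 2.29)
    return [sigmoid(sum(h[k] * z[k][j] for k in range(n_hidden)))
            for j in range(n_output)]

# 2 inputs, 2 hidden neurons, 1 output neuron
x = [1.0, 0.5]
w = [[0.2, -0.4], [0.7, 0.1]]   # w[i][k]
z = [[0.5], [-0.3]]             # z[k][j]
print(mlp_forward(x, w, z))     # a single output value in (0, 1)
```

Bias terms are omitted here for brevity; as noted above, they can be absorbed as an extra constant input per layer.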
Some aspects are important for the correct design of a MLP:
• Coding of variables. Quantitative variables are usually described by a
single neuron. For categorical variables one neuron is needed for each
modality of the variable (as with dummy variables);
• Variable transformation. In some cases it can be useful to transform the
original explanatory variables. In particular standardization can be a
good choice when variables are measured using different scales;
• Choice of the architecture. This choice is relevant for the quality of the
final results; however it is difficult to define specific guidelines. The
analysis of performances calculated with techniques such as cross-validation can give useful indications for comparing alternative architectures;
• Learning process. The adaptation of weights in the training phase
must be carefully analysed. In particular attention should be given to
two aspects:
– The choice of the error function between observed values and
values determined by the MLP;
– The choice of the optimization algorithm.
Choice of the error function Given a training set of observations D =
{(x_1, t_1), …, (x_n, t_n)}, the definition of the error function is based on the principle
of maximum likelihood, which leads to the minimization of the function:
$$E(\mathbf{w}) = -\sum_{i=1}^{n} \log\left[ p(t_i | x_i; \mathbf{w}) \right] \qquad (2.31)$$
where p(ti |xi ; w) is the distribution of response variable, conditioned by
values of input variables and by weights of the neural network.
If the MLP is adopted for the estimation of a continuous response variable (regression case), each component t_{i,k} of the response vector t_i is defined
by:
$$t_{i,k} = y_{i,k} + e_{i,k} \qquad k = 1, \ldots, q \qquad (2.32)$$
where y_{i,k} = y(x_i, w) is the k-th component of the output vector y_i and e_{i,k}
is a random error component. Random errors are assumed to be normally
distributed, therefore the error function is:
$$E(\mathbf{w}) = \sum_{i=1}^{n} \sum_{k=1}^{q} \left( t_{i,k} - y_{i,k} \right)^2 \qquad (2.33)$$
that is similar to a least-squares function.
Otherwise, if the MLP is adopted for the estimation of a categorical response variable (classification case), the output is the estimated probability
that an observation belongs to the different classes.
Each class is represented by a neuron, and the activation of the neuron
is the conditional probability P(C_k|x), where C_k is the k-th class and x is the
input vector. Therefore the output value represents the estimated probability
that the i-th observation belongs to the k-th class C_k.
The error function becomes:
$$E(\mathbf{w}) = -\sum_{i=1}^{n} \sum_{k=1}^{q} \left[ t_{i,k} \log(y_{i,k}) + (1 - t_{i,k}) \log(1 - y_{i,k}) \right] \qquad (2.34)$$
Choice of the optimization algorithm The value of the error function E(w) is
highly non-linear with respect to the weights, therefore different minima
which satisfy ∇E = 0 could exist. The search for the optimum point w* is
done iteratively, with algorithms that from an initial estimate w^(0) generate
a series of solutions w^(s), s = 1, 2, …, which converges to the value ŵ. The
algorithms follow some basic steps:
1. identify a search direction d^(s);
2. choose the value α^(s) and set w^(s+1) = w^(s) + α^(s) d^(s) = w^(s) + ∆w^(s);
3. if the convergence (or stopping) criterion is verified, set ŵ = w^(s+1);
otherwise set s = s + 1 and go back to step 1.
These algorithms must be used carefully, since different elements can
influence the convergence to the optimal solution, including:
• the choice of initial weights w(0) ;
• the choice of convergence criterion, which can be set as a function
of the number of iterations, the computing time or the value of error
function;
• the learning rule, that is the way the increment of weights ∆w(s) is
calculated.
In particular the choice of the learning rule is important for the convergence of the algorithm. Different learning rules have been proposed and are
commonly used; here only the most important are reported.
Consider the j-th neuron of the neural network, which produces
the output y_j and is connected to input neurons x_{ij} by weights w_{ij}. Some learning
rules are:
• Hebb's rule (Hebb 1949)
$$\Delta w_{ij} = c \, x_{ij} \, y_j \qquad (2.35)$$
where c is the learning rate. As a rule of thumb it can be assumed that
c = 1/N, where N is the number of elements in the training dataset.
• Delta rule (Widrow and Hoff 1960)
$$\Delta w_{ij} = c \, x_{ij} \left( d_j - y_j \right) \qquad (2.36)$$
where d_j is the value assumed by the output in the training dataset
and c is again the learning rate.
• Generalized Delta rule
$$\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}} \qquad (2.37)$$
where η is a learning parameter, usually included in the interval [0, 1],
and ∂E/∂w_{ij} is the gradient of the error with respect to the weight w_{ij}.
ij
The Generalized Delta rule is applied in the backpropagation algorithm, which is probably the most applied algorithm for the optimization
of neural networks. The structure of this algorithm is similar to the general
one; the name is due to the fact that the weights are updated at
each iteration 'going back' from the output to the input layer. The analytical
complexity of the formulation depends on different factors, such as the type
of error function, the type of transfer function and the type of neurons to
update.
Consider the simple MLP structure introduced at the beginning of
this section and the notation adopted there. In addition, assume that the neurons have a
continuous non-linear transfer function Φ (e.g. sigmoidal) and that t_j^μ is the correct
output for input pattern μ. The backpropagation algorithm follows these
steps:
• Given an input pattern x^μ, calculate the output of the hidden and output neurons
$$h_k^\mu = \Phi\left( \sum_{i=0}^{n} w_{ik} \, x_i^\mu \right) \qquad (2.38)$$
$$y_j^\mu = \Phi\left( \sum_{k=0}^{h} z_{kj} \, h_k^\mu \right) \qquad (2.39)$$
• Calculate the error function for the training set
$$E_w = \frac{1}{2} \sum_\mu \sum_j \left( t_j^\mu - y_j^\mu \right)^2 \qquad (2.40)$$
that can be expanded in the form
2




∑
∑
1 ∑ ∑  µ
µ


t − Φ 
z
Φ
w
x
Ew =

k
j
ik
i



 j
2 µ
j
(2.41)
i
k
• Apply the generalized delta rule to calculate the variation of the weights z_{kj}
$$\Delta z_{kj} = -\eta \frac{\partial E}{\partial z_{kj}} = \eta \sum_\mu \left( t_j^\mu - y_j^\mu \right) \Phi'\left( P_j^\mu \right) h_k^\mu \qquad (2.42)$$
where Φ′ is the derivative of Φ and P_j^μ is the activation potential of neuron j for pattern μ. If δ_j^μ = (t_j^μ − y_j^μ) Φ′(P_j^μ) is introduced, the
formulation can be simplified as
$$\Delta z_{kj} = \eta \sum_\mu \delta_j^\mu \, h_k^\mu \qquad (2.43)$$
• Calculate the variation ∆w_{ik} for the weights w_{ik}
$$\Delta w_{ik} = -\eta \frac{\partial E}{\partial w_{ik}} = -\eta \sum_\mu \frac{\partial E}{\partial h_k^\mu} \frac{\partial h_k^\mu}{\partial w_{ik}} \qquad (2.44)$$
that can be expanded in the form
$$\Delta w_{ik} = \eta \sum_\mu \sum_j \left( t_j^\mu - y_j^\mu \right) \Phi'\left( P_j^\mu \right) z_{kj} \, \Phi'\left( P_k^\mu \right) x_i^\mu \qquad (2.45)$$
or
$$\Delta w_{ik} = \eta \sum_\mu \sum_j \delta_j^\mu \, z_{kj} \, \Phi'\left( P_k^\mu \right) x_i^\mu \qquad (2.46)$$
The algorithm can be applied following two alternative modes.
• batch (offline) approach. The changes to the weight values are applied after all the observations in the training dataset have been evaluated
and a global error has been calculated.
• incremental (online) approach. The error is calculated for each observation in the training dataset and the weights are modified consequently. This approach is generally preferred, since it allows the algorithm to examine a wider
number of possible solutions.
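The incremental (online) mode of the backpropagation steps above can be sketched for the simplest one-hidden-layer MLP with sigmoid transfer functions (a compact toy implementation; a constant bias input is added for convenience and all names are ours):

```python
import math
import random

def sigmoid(p):
    return 1.0 / (1.0 + math.exp(-p))

def backprop_online(data, n_hidden=2, eta=0.5, epochs=3000, seed=0):
    """Online backpropagation, one hidden layer, a single output neuron.

    Implements the update rules of Eqs. (2.42)-(2.46); the output-layer
    delta is propagated back to the input-to-hidden weights.
    """
    rng = random.Random(seed)
    n_in = len(data[0][0]) + 1  # +1 for the constant bias input
    w = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in)]
    z = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    for _ in range(epochs):
        for x, t in data:
            xb = list(x) + [1.0]
            # Forward pass (Eqs. 2.38-2.39)
            h = [sigmoid(sum(xb[i] * w[i][k] for i in range(n_in)))
                 for k in range(n_hidden)]
            y = sigmoid(sum(h[k] * z[k] for k in range(n_hidden)))
            # Output delta: (t - y) * Phi'(P); Phi' = y(1 - y) for the sigmoid
            delta_out = (t - y) * y * (1 - y)
            # Hidden-to-output update (Eq. 2.43)
            for k in range(n_hidden):
                z[k] += eta * delta_out * h[k]
            # Input-to-hidden update (Eq. 2.46)
            for k in range(n_hidden):
                delta_h = delta_out * z[k] * h[k] * (1 - h[k])
                for i in range(n_in):
                    w[i][k] += eta * delta_h * xb[i]
    return w, z

def predict(x, w, z):
    xb = list(x) + [1.0]
    h = [sigmoid(sum(xb[i] * w[i][k] for i in range(len(xb))))
         for k in range(len(z))]
    return sigmoid(sum(hk * zk for hk, zk in zip(h, z)))

# Train on the (linearly separable) OR function
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, z = backprop_online(data)
print([round(predict(x, w, z)) for x, _ in data])  # the four OR outputs
```

Each training pattern triggers an immediate weight update, which is exactly the incremental mode described above; accumulating the deltas over all patterns before updating would give the batch mode instead.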
In some applications the error function E_w is particularly complex. The
main effect is that finding the optimal solution can be difficult (a local
minimum is found) or the algorithm can be very slow. Some changes to the
original algorithm have been proposed, including:
• the adoption of a modified version of the generalized delta rule:
$$\Delta w_{ij}(t+1) = -\eta \frac{\partial E}{\partial w_{ij}} + \alpha \, \Delta w_{ij}(t) \qquad (2.47)$$
where the modifications to the weights at time (iteration) t+1 are based not only
on the gradient, but also on the latest modification of the weights,
multiplied by a coefficient α (the momentum), which tends to prevent
possible oscillations of the algorithm around the solution;
• the introduction of adaptive parameters (η and α), which vary their
values during the learning process to accelerate the convergence to
the solution or to improve the search of the solution;
• specific controls in the choice of initial weights;
• addition of 'noise' during the learning process to avoid being trapped in
local minima.
Adaptive Resonance Theory
Adaptive Resonance Theory (ART) Neural Networks have one layer of
processing units, which are fully connected to an input buffer by two types of
weights (Figure 2.3). This structure is interesting since:
1. ART implements an analogic classification that is considered more
"natural" compared with cluster analysis, which classifies patterns based
on numerical computational results;
2. ART is able to overcome the so-called stability-plasticity dilemma:
when an arbitrary pattern comes into the network, the previously
stored memory is not damaged; instead, a new class is automatically
set up for it;
3. ART processes on-line training, so that no target patterns are
needed.
Analysing Figure 2.3 in detail:
• n = number of inputs of the network;
• xi = i-th component of the input vector (0 or 1);
Figure 2.3: Schematic of ART1 Network. Source: Faghri and Hua 1995
• y_j = j-th output;
• w_{ij} = the weight for the connection from the j-th output to the i-th input;
• w*_{ji} = the weight for the connection from the i-th input to the j-th output;
• ρ = the so-called 'vigilance parameter', a constant having values between 0 and 1;
• k = index that denotes the winning output element (the largest value among the outputs).
The relationship between the two kinds of weight vectors is given by the
equation:
$$w_{ji}^* = \frac{w_{ij}}{1 + \sum_{k=1}^{n} w_{kj}} \qquad (2.48)$$
Initially all weights w_{ij} are set to 1 and the w*_{ji} are equal to 1/(1 + n).
The ART1 algorithm develops using these steps:
1. Compute the outputs using the formula
$$X_j = \sum_{k=1}^{n} w_{kj} \, x_k \qquad (2.49)$$
2. Determine the largest output with a 'winner takes all' strategy, and
let the winner be X_k;
3. Rate the input pattern match with the following formula:
$$r = \frac{\sum_{i=1}^{n} w_{ik} \, x_i}{\sum_{i=1}^{n} x_i} \qquad (2.50)$$
4. If r < ρ, set X_k = 0 and go to step 2.
5. If r > ρ, set w_{ik} = 0 for all i such that x_i = 0; then recompute the w*_{ji}
for all i, if any weights have been changed.
ART1 stores vectors and checks the committed processing units in
order, according to how well the stored vectors [w*_{j1}, …, w*_{jn}]
match the input pattern. If none of the committed processing units matches well
enough, an uncommitted unit is chosen. The network sets up certain
categories for the input vectors and classifies each input pattern into the proper
category. If the input pattern does not match any of those categories, the
network creates a new category for it.
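The ART1 procedure described above can be sketched as follows (a simplified illustration for binary patterns: a single prototype vector per category stands in for the two weight sets, and fast learning is assumed):

```python
def art1_classify(patterns, rho=0.6):
    """Cluster binary patterns with a simplified ART1-style procedure.

    rho: vigilance parameter in (0, 1); higher values create more categories.
    Returns the category prototypes and each pattern's category index.
    """
    prototypes = []   # stored binary weight vectors, one per category
    assignments = []
    for x in patterns:
        # Score committed categories by overlap with the input (cf. Eq. 2.49)
        candidates = sorted(
            range(len(prototypes)),
            key=lambda j: -sum(w * xi for w, xi in zip(prototypes[j], x)),
        )
        chosen = None
        for j in candidates:
            overlap = sum(w * xi for w, xi in zip(prototypes[j], x))
            # Vigilance test (cf. Eq. 2.50): match ratio against the input
            if overlap / max(sum(x), 1) >= rho:
                chosen = j
                # Fast learning: keep only features shared with the input
                prototypes[j] = [w * xi for w, xi in zip(prototypes[j], x)]
                break
        if chosen is None:  # no category matches well enough: create one
            prototypes.append(list(x))
            chosen = len(prototypes) - 1
        assignments.append(chosen)
    return prototypes, assignments

patterns = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
protos, assign = art1_classify(patterns)
print(assign)  # [0, 0, 1]: the two similar patterns share a category
```

Raising rho above the overlap ratio of the first two patterns (2/3) would instead place each of them in its own category, illustrating how vigilance controls the granularity of the classification.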
2.3 Clustering
Clustering analysis techniques are the most applied DM descriptive techniques. The main objective of this kind of analysis is subdividing observations (data) into groups not defined a priori (clusters), maximizing
the similarity among observations belonging to the same group (internal
cohesion) and minimizing the similarity with observations in other groups
(external separation).
Formally the problem can be defined as follows:
Given a dataset D = {t_1, t_2, …, t_n} of elements (observations), a
measure of similarity sim(t_i, t_l) defined between two observations t_i, t_l ∈ D,
and an integer value K, the clustering problem consists of defining a map f : D →
{1, …, K} where every t_i is assigned to a cluster K_j, 1 ≤ j ≤ K, such that, given a
cluster K_j, ∀ t_{jl}, t_{jm} ∈ K_j and t_i ∉ K_j, sim(t_{jl}, t_{jm}) > sim(t_{jl}, t_i).
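As a concrete illustration of the map f, consider a nearest-centroid assignment, taking similarity as the negative of the Euclidean distance (the centroids are assumed given, e.g. by a previous iteration of a partitioning method):

```python
import math

def assign_to_clusters(observations, centroids):
    """Map f: D -> {1, ..., K} assigning each observation to the most
    similar cluster (here: the smallest Euclidean distance to a centroid)."""
    def distance(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return [
        min(range(len(centroids)), key=lambda j: distance(t, centroids[j])) + 1
        for t in observations
    ]

D = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.0)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
print(assign_to_clusters(D, centroids))  # [1, 1, 2, 2]
```

Observations near each centroid end up in the same cluster, satisfying the internal-cohesion/external-separation condition of the definition above.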
From the definition of the clustering problem, some relevant points can
be highlighted:
• Clustering analysis follows an unsupervised approach, that is, a reference condition or situation does not exist, or is not available to the
analyst;
• Differently from the classes in classification problems, the meaning of each
cluster is not known a priori, therefore the analyst must interpret the
results and define them;
• The optimal number K of groups to identify is not known a priori,
therefore there is not a unique solution;
• A satisfactory solution is highly influenced by the inclusion (exclusion) of non-significant (significant) variables; Exploratory Data Analysis is a relevant preliminary step to consider in order to avoid these
situations;
• The presence of outlier data could negatively affect the identification
of significant clusters; however in some cases the identification of
outliers is the main objective of cluster analysis and outliers represent
relevant data.
Clustering methods can be subdivided in different ways, but traditionally they have been divided into two broad categories, hierarchical and non-hierarchical methods.
• Hierarchical methods produce a hierarchy of partitions. At one extreme each
object is assigned to its own cluster (n clusters for n observations) and
at the other extreme all the observations are included in the same group
(1 cluster). Hierarchical methods are called agglomerative if at each step
the two most similar clusters are merged in a new cluster, or divisive
if at each step one cluster is split into new clusters;
• Partitioning methods create only one partition of data, placing each
observation in one of the k clusters.
More recently other types of clustering have been developed and are
nowadays applied in different contexts, in particular:
• Model-based clustering (Fraley and Raftery 2002), which assumes that
each cluster could be represented by a density function belonging to
a certain parametric family (e.g. the multivariate normal) and that the
associated parameters could be estimated from observations;
Furthermore, new algorithms have been developed to deal with specific needs (Dunham 2003), including the analysis of non-quantitative data (qualitative data, images, . . . ) and the real-time treatment of large amounts of data (streaming data).
Clustering models can also be classified in other ways, considering other
characteristics they present:
• Connectivity models: models based on distance connectivity (hierarchical clustering);
• Distribution models: models which use statistical distributions, such as multivariate normal distributions, to model clusters (model-based clustering);
• Centroid models: models which represent each cluster by a single mean vector (many partitioning models);
• Density models: models which define clusters as connected dense regions in the data space (models applied to large databases, such as DBSCAN and OPTICS).
Another important distinction that is useful to introduce here is between
hard clustering and soft clustering.
• In hard clustering, data is divided into distinct clusters, where each
data element belongs to exactly one cluster.
• In soft clustering (also referred to as fuzzy clustering), data elements
can belong to more than one cluster, and associated with each element is a set of membership levels. These indicate the strength of the
association between that data element and a particular cluster.
In the following sections some details will be given about the most important clustering methods for quantitative data commonly applied in practice: hierarchical, partitioning and model-based methods. Considering the increasing attention given to large databases, basic indications concerning algorithms for this type of dataset are also presented. For all these clustering methods hard clustering situations will be considered.
Finally Fuzzy C-means Clustering will be introduced, as a representative
of soft clustering approach.
2.3.1 Distances Between Clusters
The most common measures of similarity and distance have been presented when introducing classification techniques (2.2.1). The same measures are applied in clustering analysis, but some details are given in this section, since the distance between clusters can be evaluated in different ways, given a pre-fixed measure of distance.
Given a cluster Km with n observations tm1, tm2, . . . , tmn one can define:

$$\text{Centroid} = C_m = \frac{\sum_{i=1}^{n} t_{mi}}{n} \quad (2.51)$$

$$\text{Radius} = R_m = \sqrt{\frac{\sum_{i=1}^{n} (t_{mi} - C_m)^2}{n}} \quad (2.52)$$

$$\text{Diameter} = D_m = \sqrt{\frac{\sum_{i=1}^{n} \sum_{j=1}^{n} (t_{mi} - t_{mj})^2}{n(n-1)}} \quad (2.53)$$
The centroid Cm can be used as a representative point of a cluster, even if it may not belong to it; alternatively one can use the medoid Mm, which is an observation of the cluster positioned in proximity to its center.
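As a minimal illustration, the three summary measures of Eqs. 2.51–2.53 can be computed with NumPy; the function name and the toy cluster are hypothetical:

```python
import numpy as np

def cluster_summary(points):
    """Centroid, radius and diameter of a cluster (Eqs. 2.51-2.53)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    centroid = points.mean(axis=0)                           # Eq. 2.51
    radius = np.sqrt(((points - centroid) ** 2).sum() / n)   # Eq. 2.52
    # Squared distances between all couples of observations
    diffs = points[:, None, :] - points[None, :, :]
    sq = (diffs ** 2).sum(axis=-1)
    diameter = np.sqrt(sq.sum() / (n * (n - 1)))             # Eq. 2.53
    return centroid, radius, diameter
```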
Let Ki and Kj be two clusters and dis(til, tjm) the distance between two observations til ∈ Ki and tjm ∈ Kj; the distance between Ki and Kj can be calculated in different ways:
• Single link method
$$dis(K_i, K_j) = \min[dis(t_{il}, t_{jm})] \quad \forall t_{il} \in K_i,\ t_{il} \notin K_j \text{ and } \forall t_{jm} \in K_j,\ t_{jm} \notin K_i \quad (2.54)$$
• Complete link method
$$dis(K_i, K_j) = \max[dis(t_{il}, t_{jm})] \quad \forall t_{il} \in K_i,\ t_{il} \notin K_j \text{ and } \forall t_{jm} \in K_j,\ t_{jm} \notin K_i \quad (2.55)$$
• Average link method
$$dis(K_i, K_j) = \mathrm{mean}[dis(t_{il}, t_{jm})] \quad \forall t_{il} \in K_i,\ t_{il} \notin K_j \text{ and } \forall t_{jm} \in K_j,\ t_{jm} \notin K_i \quad (2.56)$$
• Centroid method
$$dis(K_i, K_j) = dis(C_i, C_j) \quad (2.57)$$
where Ci and C j are the centroids of Ki and K j , respectively.
• Medoid method
$$dis(K_i, K_j) = dis(M_i, M_j) \quad (2.58)$$
where Mi and Mj are the medoids of Ki and Kj, respectively.
• Ward method
Ward’s minimum variance method (Ward 1963) minimizes an objective function to produce clusters with the maximum internal cohesion
and the maximum external separation.
The Total Deviance (T) of the p variables is divided into two parts: “Within Groups Deviance” (W) and “Between Groups Deviance” (B):
$$T = W + B \quad (2.59)$$
In practice, having n observations grouped in k clusters, the deviance
terms of equation 2.59 are calculated by:
– Total Deviance (T), where x_is is the value of variable s taken by observation i and x̄_s is the overall mean of variable s:
$$T = \sum_{s=1}^{p} \sum_{i=1}^{n} (x_{is} - \bar{x}_s)^2 \quad (2.60)$$
– Within Groups Deviance (W):
$$W = \sum_{k=1}^{g} W_k \quad (2.61)$$
where W_k is the deviance of the p variables in the k-th cluster, which has n_k observations and centroid x̄_k = [x̄_{1k}, . . . , x̄_{pk}], defined as:
$$W_k = \sum_{s=1}^{p} \sum_{i=1}^{n_k} (x_{is} - \bar{x}_{sk})^2 \quad (2.62)$$
– Between Groups Deviance (B):
$$B = \sum_{s=1}^{p} \sum_{k=1}^{g} n_k (\bar{x}_{sk} - \bar{x}_s)^2 \quad (2.63)$$
At each step Ward’s method aggregates the groups which produce the minimum increment of the Within Groups Deviance W and consequently, T being fixed, the smallest reduction of the Between Groups Deviance B.
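As a sketch, the linkage distances (Eqs. 2.54–2.56) and the deviance decomposition used by Ward's method (Eqs. 2.59–2.63) can be written in NumPy; the function names and the toy data are illustrative:

```python
import numpy as np

def _cross_distances(Ki, Kj):
    """Euclidean distances between every til in Ki and every tjm in Kj."""
    Ki, Kj = np.asarray(Ki, float), np.asarray(Kj, float)
    return np.sqrt(((Ki[:, None, :] - Kj[None, :, :]) ** 2).sum(-1))

def single_link(Ki, Kj):     # Eq. 2.54
    return _cross_distances(Ki, Kj).min()

def complete_link(Ki, Kj):   # Eq. 2.55
    return _cross_distances(Ki, Kj).max()

def average_link(Ki, Kj):    # Eq. 2.56
    return _cross_distances(Ki, Kj).mean()

def deviance_decomposition(X, labels):
    """Total, within- and between-group deviance (Eqs. 2.59-2.63)."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    grand_mean = X.mean(axis=0)
    T = ((X - grand_mean) ** 2).sum()                  # Eq. 2.60
    W = B = 0.0
    for k in np.unique(labels):
        Xk = X[labels == k]
        ck = Xk.mean(axis=0)                           # cluster centroid
        W += ((Xk - ck) ** 2).sum()                    # Eqs. 2.61-2.62
        B += len(Xk) * ((ck - grand_mean) ** 2).sum()  # Eq. 2.63
    return T, W, B
```

On any dataset and grouping, the returned values satisfy T = W + B, which is the identity Ward's method exploits.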
2.3.2 Hierarchical Methods
Hierarchical clustering methods produce a series of partitions of the data, associated with successive levels of grouping, following an ordering that can be graphically represented by a tree structure.
At the root of the tree all the observations belong to the same cluster, while at the top level (leaves of the tree) each observation belongs to a separate cluster. Between these two extreme situations, n − 2 levels exist, and each of them corresponds to one partition of the n observations into k groups. As already written, hierarchical methods are called agglomerative if the clusters are merged going from the top to the bottom of the tree, or divisive if the clusters are split in a bottom-up way. Since in practical applications divisive methods are scarcely applied, in this review only agglomerative methods are described.
Agglomerative Methods
Given a dataset D with n observations {t1, t2, . . . , tn}, let A be the n × n Distance Matrix, which contains in each cell the distance between couples of observations, A[i, j] = dis(ti, tj), and let DE = {d, k, K} be the tree structure, where d is the threshold distance, k is the number of clusters created and K is the set of clusters.
The algorithm, common to the different agglomerative hierarchical methods, has some basic steps:
1. Creation of the tree structure DE = {d, k, K} with one cluster for each observation;
2. Determination, at each step, of the couple of clusters separated by the minimum distance d. The clusters identified are merged in a new cluster of the upper level;
3. The Distance Matrix A and the threshold distance d are updated to consider the new cluster added to the tree structure;
4. The process terminates when all the observations belong to the same cluster.
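Assuming SciPy is available, the steps above can be sketched with its hierarchical clustering routines; the simulated data and the choice of Ward linkage are illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated hypothetical groups of observations
rng = np.random.default_rng(0)
D = np.vstack([rng.normal(0, 0.2, (10, 2)), rng.normal(5, 0.2, (10, 2))])

# Steps 1-4: build the full merge tree; each row of Z records one merge
# (the two clusters joined, their distance d and the new cluster's size)
Z = linkage(D, method="ward")

# Cut the tree at k = 2 clusters
labels = fcluster(Z, t=2, criterion="maxclust")
```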
Differences among agglomerative methods regard the choice of the method used to compute the distances between clusters (2.3.1), which can differentiate the final segmentation of the dataset.
However, the main difficulty associated with hierarchical methods is the choice of the most correct number of clusters k, which guarantees the maximum internal cohesion and the minimum external separation. To solve this problem, some performance indices have been defined which can help the analyst determine the optimal number of clusters. Usually the combined adoption of more than one index is the most reasonable choice.
The most important criteria commonly adopted are:
• Pseudo F Statistic (Calinski and Harabasz 1974; Milligan and Cooper 1985). This statistic analyses the hierarchy at each level, and a peak value of PSF reveals the optimal number of clusters. It is calculated by:
$$PSF = \frac{(T - P_G)/(G - 1)}{P_G/(n - G)} \quad (2.64)$$
where:
– T = Σ_{i=1}^{n} ||x_i − x̄||²;
– P_G = Σ W_k calculated over the G groups at the G-th level of the hierarchy;
– G = the number of clusters at a given level of the hierarchy;
– n = the number of observations;
– x_i = the i-th observation;
– x̄ = the sample mean vector;
– W_k = Σ_{i ∈ C_k} ||x_i − x̄_k||²;
– x̄_k = the mean vector for group C_k.
• Davies–Bouldin Index (Davies and Bouldin 1979). This index assigns the best score (minimum value) to the structure that produces groups with high similarity within a group and low similarity between groups:
$$DB = \frac{1}{n} \sum_{i=1}^{n} \max_{j \neq i} \left( \frac{\sigma_i + \sigma_j}{d(c_i, c_j)} \right) \quad (2.65)$$
where c_x is the centroid of group x, σ_x is the average distance of all elements in group x to centroid c_x, d(c_i, c_j) is the distance between centroids c_i and c_j, and n is the number of groups.
• Dunn Index (Dunn 1974). This index aims to identify sets of clusters that are compact, with a small variance between members of the cluster, and well separated, where the means of different clusters are sufficiently far apart, as compared to the within-cluster variance. The Dunn index is defined as the ratio between the minimal inter-cluster distance and the maximal intra-cluster distance; therefore, for a given assignment of clusters, a higher Dunn index indicates better clustering. It can be calculated by:
$$DI_m = \min_{1 \le i \le m} \left\{ \min_{1 \le j \le m,\, j \neq i} \left\{ \frac{\delta(C_i, C_j)}{\max_{1 \le k \le m} \Delta_k} \right\} \right\} \quad (2.66)$$
where:
– δ(C_i, C_j) is the inter-cluster distance between clusters C_i and C_j;
– Δ_k measures the intra-cluster distance of cluster k.
• Silhouette measure (Rousseeuw 1987). The silhouette width S(i) takes values in the interval [−1, +1]. Observations with large values of S(i) are very well clustered, a small S(i) (around 0) means that the observation lies between two clusters, and observations with a negative S(i) are probably placed in the wrong cluster. The overall average silhouette width for the entire dataset is simply the average of S(i) over all items.
For each observation i, the silhouette width S(i) is defined as:
$$S(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))} \quad (2.67)$$
where:
– a(i) = the average dissimilarity between i and all other points of the cluster to which i belongs. If i is the only element in its cluster, S(i) = 0;
– b(i) = min_C d(i, C), where d(i, C) is the average dissimilarity of i to all observations of another cluster C. b(i) represents the dissimilarity between i and its ”neighbour” cluster, that is, the nearest one to which it does not belong.
• Goodman and Kruskal’s index G2 (Goodman and Kruskal 1954). This index considers all possible quadruples (q, r, s, t) of input elements and determines whether each quadruple is concordant or discordant, based on the relative distances d(x, y) between the samples x and y.
A quadruple is called concordant if one of the following two conditions is true:
– d(q, r) < d(s, t), with q and r in the same cluster, and s and t in different clusters;
– d(q, r) > d(s, t), with q and r in different clusters, and s and t in the same cluster.
Conversely, a quadruple is called discordant if one of the following two conditions is true:
– d(q, r) < d(s, t), with q and r in different clusters, and s and t in the same cluster;
– d(q, r) > d(s, t), with q and r in the same cluster, and s and t in different clusters.
A good clustering is one with many concordant and few discordant quadruples. Let Sc and Sd denote the number of concordant and discordant quadruples, respectively; then the Goodman–Kruskal index is defined as:
$$GK = \frac{S_c - S_d}{S_c + S_d} \quad (2.68)$$
Large values of GK indicate a good clustering.
• Decomposition of Deviance Indices. The implementation of Ward’s method is based on the decomposition of the Total Deviance of the p variables (T = W + B), such that a good clustering of the data is characterised by a small Within Groups Deviance W and a large Between Groups Deviance B.
Synthetic indices can be calculated from the values of T, W and B.
– R² index. This index takes values in the interval [0, 1]: the closer R² is to 1, the better is the quality of the clusters identified (W ≈ 0 and B ≈ T). The main shortcoming of this index is that the best value of R² is obtained when each observation has its own cluster: in that case the maximum homogeneity and the maximum separation are obtained at the same time, but this result is not useful for applications.
$$R^2 = 1 - \frac{W}{T} = \frac{B}{T} \quad (2.69)$$
– Root-Mean-Square Standard Deviation (RMSSTD). This index derives from the R² index and considers only the portion of deviance within groups that is added at each step. At the h-th step (h = 2, . . . , n − 2) the index is defined by:
$$RMSSTD = \sqrt{\frac{W_h}{p(n_h - 1)}} \quad (2.70)$$
where W_h is the deviance within the cluster created at the h-th step, n_h is the number of observations in the cluster and p is the number of variables considered. If two very dissimilar clusters are merged, RMSSTD shows a strong increment, which suggests stopping the procedure at the previous step.
– Semi-Partial R² (SPRSQ). This index also evaluates the local contribution given by the h-th step of the agglomerative algorithm. It is defined by:
$$SPRSQ = \frac{W_h - W_r - W_s}{T} \quad (2.71)$$
where h is the group obtained at the h-th step from the merging of clusters r and s, T is the total deviance of the observations, and W_h, W_r and W_s are the deviances within groups h, r and s, respectively. SPRSQ measures the increment of the deviance within groups obtained from the merging of clusters r and s; as observed for RMSSTD, a relevant increment between two successive steps can be associated with the merging of two dissimilar clusters, and the procedure can be stopped at the previous step.
2.3.3 Partitioning Methods
Differently from hierarchical methods, partitioning methods create only one partition of the data, placing each of the n observations in one of the k clusters, where k is chosen by the analyst.
Partitioning methods produce this partition satisfying criteria of optimality, generally expressed by the maximization of an objective function. This approach is generally more efficient and more robust than that of hierarchical methods. In fact, many methods do not need to store the distance matrix, as happens for hierarchical methods, with relevant computational advantages. Therefore they can be better implemented in large databases.
However, the large number of possible solutions leads to constrained results, which often correspond to local maxima. In addition, the main difficulty of these methods is related to the choice of the value k by the analyst. Since this is often difficult, the algorithms are applied varying the value of k and evaluating the results on the basis of the indices introduced for agglomerative clustering methods.
Algorithms based on squared error
Algorithms based on squared error determine the partition of the data minimizing the squared error.
Given a cluster K_i, with observations t_{i1}, t_{i2}, . . . , t_{im} and centroid C_{K_i}, the squared error is defined by:
$$se_{K_i} = \sum_{j=1}^{m} ||t_{ij} - C_{K_i}||^2 \quad (2.72)$$
Considering a set of clusters K = {K_1, K_2, . . . , K_k}, the squared error for K is defined by:
$$se_K = \sum_{j=1}^{k} se_{K_j} \quad (2.73)$$
At the beginning of the procedure each observation is randomly assigned to a cluster. Then, at each step, each observation ti is assigned to the cluster whose centroid is the closest to the observation. The centroids of the new clusters are re-calculated and the squared error is computed considering the new partition of the data. The procedure is stopped when the decrement between successive squared errors is lower than a pre-fixed threshold.
K-means algorithm
K-means algorithm is probably the most famous partitioning method.
The algorithm chooses initial seeds as initial values for the K-means,
which are representative of the centroids of the clusters in the p-dimensional
space. Seeds must be sufficiently dispersed in the variable space to guarantee an adequate convergence of the algorithm. Specific sub-algorithms,
which impose a minimum distance among seeds, have been developed to
accomplish this task.
Once the initial seeds have been selected, the iterative structure of the
algorithm begins:
• Assignment of the observation to the closest mean;
• Calculation of the mean for each cluster.
The algorithm ends when the maximum number of iterations is reached
or when a certain convergence criterion (such as a minimum value of the
squared error) is satisfied.
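The iterative structure above can be sketched in NumPy as follows; the toy data, the function name and the plain random seeding are illustrative (production implementations use smarter seeding such as k-means++):

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Plain k-means: assign each point to the closest mean, then
    recompute the mean of each cluster, until convergence."""
    X = np.asarray(X, float)
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]  # initial seeds
    for _ in range(n_iter):
        d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)                         # assignment step
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else means[j] for j in range(k)])
        if np.allclose(new, means):                       # convergence criterion
            break
        means = new
    return labels, means
```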
The K-means method is widely adopted as a clustering method, but suffers from some shortcomings, in particular a poor computational scalability, the necessity of fixing a priori the number of clusters, and a search prone to local minima.
Different modifications of K-means have been proposed in recent years. Among the most recent developments:
• X-means (Pelleg and Moore 2000) which allows the identification of the
optimal number of clusters using the Bayesian Information Criteria
(BIC);
• K-modes (Chaturvedi, Green, and Caroll 2001), which is a nonparametric approach to derive clusters from categorical data, following an
approach similar to K-means;
• K-means++ (Arthur and Vassilvitskii 2007), which was designed to
choose the seeds for the k-means trying to avoid the sometimes poor
clusterings found by the standard k-means algorithm;
PAM Algorithm
The Partitioning Around Medoids (PAM, also called K-medoids) algorithm is a clustering method which adopts medoids instead of centroids, obtaining a relevant advantage in the treatment of missing data.
Initially k observations belonging to D are randomly chosen as medoids
and the others are associated to the cluster with the closest medoid (Build
Step). At each iteration the non-medoid observations are analysed, testing
if they can become new medoids, improving the quality of the partition,
that is minimizing the sum of the dissimilarities of the observations to their
closest medoid (Swap Step).
Considering a cluster K_i represented by medoid t_i, the algorithm evaluates whether any other observation t_h of the cluster can replace t_i, becoming the new medoid. C_{jih} is the change of cost for the observation t_j associated with the change of medoid from t_i to t_h. Repeating this process for all the observations of cluster K_i, the total change of cost is equal to the change of the sum of distances of the observations to their medoids.
As a consequence of the medoid change, four different conditions can occur:
1. t_j ∈ K_i, but ∃ another medoid t_m such that dis(t_j, t_m) ≤ dis(t_j, t_h);
2. t_j ∈ K_i, and dis(t_j, t_h) ≤ dis(t_j, t_m) for every other medoid t_m;
3. t_j ∈ K_m, K_m ≠ K_i, and dis(t_j, t_m) ≤ dis(t_j, t_h);
4. t_j ∈ K_m, K_m ≠ K_i, but dis(t_j, t_h) ≤ dis(t_j, t_m).
Therefore the total cost associated with the medoid change becomes:
$$TC_{ih} = \sum_{j=1}^{k} C_{jih} \quad (2.74)$$
Compared to K-means, the main improvement provided by PAM is a more robust structure; however its use is not suggested for large datasets, because it is highly penalized by its complexity.
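A naive sketch of the Build and Swap steps, using greedy first-improvement swaps on illustrative data (a real PAM implementation evaluates the swap costs C_jih more efficiently):

```python
import numpy as np

def pam(X, k, seed=0):
    """Naive PAM: random Build step, then greedy medoid swaps while the
    sum of distances of observations to their closest medoid decreases."""
    X = np.asarray(X, float)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    rng = np.random.default_rng(seed)
    medoids = list(rng.choice(len(X), size=k, replace=False))  # Build step
    cost = lambda med: d[:, med].min(axis=1).sum()
    improved = True
    while improved:                                            # Swap step
        improved = False
        for m in list(medoids):
            for h in range(len(X)):
                if h in medoids:
                    continue
                trial = [h if x == m else x for x in medoids]
                if cost(trial) < cost(medoids):
                    medoids, improved = trial, True
    labels = d[:, medoids].argmin(axis=1)
    return labels, medoids
```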
Self-Organizing Neural Networks
Artificial Neural Networks can be used to solve clustering problems adopting an unsupervised learning process. In this case ANNs are called Self-Organizing Neural Networks, since the only parameters are the weights of the neural network, which self-organizes to detect significant groupings in the data. Unsupervised learning can be competitive or non-competitive.
In non-competitive learning the weight of the connection between two nodes of the network is proportional to the values of both nodes. The Hebb rule is used to update the values of the weights. Given the j-th neuron of the neural network, connected to input neurons x_ij with weights w_ij, the Hebb rule is defined by:
$$\Delta w_{ij} = c \, x_{ij} \, y_j \quad (2.75)$$
where y_j is the output of the j-th neuron and c is the learning rate.
In competitive learning neurons compete with each other and the winner can update its weights. Usually the network has two layers (input and output); in the input layer there are p neurons, representative of the p explanatory variables which describe the observations, connected with the neurons of the output layer.
When an observation is fed to the neural network, each node in the output layer gives an output value, based on the values of the connection weights. The neuron whose weights are the most similar to the input values is the winner. Following the "Winner Takes All" rule, the output is set to 1 for the winner and to 0 for the other neurons, and the weights are updated.
At the end of the learning process some relations are detected between
observations and output nodes. These relations mean that some clusters
have been identified in the dataset: the values of the weights of nodes
grouped in a cluster are the mean values of the observations included in
this cluster.
The most famous neural networks which adopt competitive learning are Self-Organizing Maps (SOM), or Kohonen Networks.
Self-Organizing Maps
Self-Organizing Maps (SOM) are Artificial Neural Networks based on unsupervised competitive learning. They are also known as Self-Organizing Feature Maps (SOFM) or Kohonen Networks, from the name of the mathematician who first proposed them (Kohonen 1982).
Kohonen Networks map each p-dimensional observation into a 1- or 2-dimensional space. In the latter case the output space is represented by a grid of output neurons (competitive layer), which guarantees the spatial correlation of clusters in the output space. This contiguity of similar clusters is due to the fact that the update is done for the winner neuron and for a group of neurons in its neighbourhood. Using this approach, at the end of the learning process spatial partitions of neurons are obtained, which graphically represent the presence of clusters.
The algorithm is composed of some steps:
1. The weights w_ij between the i-th input neuron and the j-th output neuron at iteration t are defined as w_ij(t), 0 ≤ i ≤ n − 1, where n is the number of inputs. Initial values of the weights are randomly chosen in the interval [0, 1] and the value N_j(0) is set, where N_j(t) is the number of neurons in the neighbourhood of the j-th neuron at iteration t.
2. Observation X = x_0(t), x_1(t), . . . , x_{n−1}(t) is fed to the neural network, where x_i(t) is the i-th input.
3. Distances d_j between the input and each output neuron j are calculated. If the Euclidean distance is chosen:
$$d_j^2 = \sum_{i=1}^{n} (x_i(t) - w_{ij}(t))^2 \quad (2.76)$$
4. The neuron which has the minimum distance value is selected and called j*.
5. The weights of node j* and of the nodes included in the neighbourhood defined by N_{j*}(t) are updated. The new weights are calculated by:
$$w_{ij}(t + 1) = w_{ij}(t) + \eta(t)(x_i(t) - w_{ij}(t)) \quad \text{for } j = j^* \text{ and } 0 \le i \le n - 1 \quad (2.77)$$
where η(t), 0 ≤ η(t) ≤ 1, is the learning rate, which decreases with t. In this manner the adaptation of the weights is progressively slowed down. In a similar manner the dimension of N_{j*}(t) decreases, stabilizing the learning process.
6. The algorithm goes back to step 2.
The learning rate η is initially set to a value greater than 0.5 and decreases during the learning process, which usually needs 100 to 1000 iterations. Kohonen suggested the adoption of a linear decrease as a function of the number of iterations. SOMs are effective and useful clustering techniques, in particular in those cases when it is important to maintain the spatial order of the input and output vectors.
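A minimal sketch of a one-dimensional Kohonen map along the lines of the steps above; the neighbourhood radius and the linearly decreasing learning rate are illustrative choices:

```python
import numpy as np

def train_som(X, n_out=5, n_iter=500, seed=0):
    """Minimal 1-D Kohonen map: winner-takes-all update with a shrinking
    neighbourhood and a linearly decreasing learning rate."""
    X = np.asarray(X, float)
    rng = np.random.default_rng(seed)
    W = rng.random((n_out, X.shape[1]))              # initial weights in [0, 1]
    for t in range(n_iter):
        eta = 0.5 * (1 - t / n_iter)                 # learning rate schedule
        radius = int(round((n_out // 2) * (1 - t / n_iter)))
        x = X[rng.integers(len(X))]                  # feed one observation
        j_star = ((W - x) ** 2).sum(axis=1).argmin() # winner neuron
        lo, hi = max(0, j_star - radius), min(n_out, j_star + radius + 1)
        W[lo:hi] += eta * (x - W[lo:hi])             # update winner + neighbours
    return W
```

Because each update is a convex combination of the old weight and the input, the weights stay inside the range of the inputs and the initial values.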
2.3.4 Model-based Clustering
Model-based clustering takes a completely different approach compared to non-parametric methods and is particularly attractive for its capability of determining the optimal number of groups.
Model-based clustering assumes that each cluster can be represented by a density function belonging to a certain parametric family (e.g. the multivariate normal) and that the associated parameters can be estimated from the observations (Fraley and Raftery 2002). The probability density function for the k-th group can be written as:
$$f(x_i \mid \mu_k, \Sigma_k) = \frac{\exp\left(-\frac{1}{2}(x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k)\right)}{(2\pi)^{p/2} \, |\Sigma_k|^{1/2}} \quad (2.78)$$
where:
• µk is the p dimensional mean vector;
• Σk is the p × p covariance matrix;
• p is the number of attributes used for the clustering.
Therefore each cluster forms an ellipsoid centered at its mean μ_k, with geometric characteristics determined by the covariance matrix Σ_k, which can be decomposed as:
$$\Sigma_k = \lambda_k D_k A_k D_k^T \quad (2.79)$$
where:
• λ_k is the first eigenvalue of Σ_k, which specifies the volume of the k-th cluster;
• D_k is the orthogonal matrix of eigenvectors, which determines the orientation of the k-th cluster;
• A_k = [α_{1k}, . . . , α_{pk}], which determines the shape of the k-th cluster in the p-dimensional space.
Dk , λk and Ak can be considered as separate and independent parameters
and different models can be built and analyzed (Banfield and Raftery 1993).
Identifiers can be used to describe the geometric characteristics of the model
(volume, orientation and shape): E for equal, V for variable and I for Identity.
For example, a model classified as VEI means that the clusters are assumed to have variable volume, equal shape and orientation parallel to the coordinate axes.
The identification of clusters can be done following two alternative approaches: the classification likelihood approach and the mixture likelihood
approach.
In the classification likelihood approach, the following likelihood function is
maximized, with the objective of identifying parameters θ and class labels
γ = (γ1 , . . . , γn )T adopted for the classification:
$$L_C(\theta_1, \ldots, \theta_G; \gamma_1, \ldots, \gamma_n \mid x) = \prod_{i=1}^{n} f_{\gamma_i}(x_i \mid \theta_{\gamma_i}) \quad (2.80)$$
In practice exact maximization is impractical because of the combinatorial aspects introduced by the presence of the labels (Fraley and Raftery 2002); therefore model-based hierarchical methods are commonly implemented as an alternative choice. Adopting a hierarchical approach, pairs of clusters which produce the largest increase in maximum likelihood are merged; the sub-optimal solution found is generally a good approximation of the optimal grouping obtained with exact maximization.
In the mixture likelihood approach, the objective is identifying the parameters θ and τ which maximize the likelihood function:
$$L_M(\theta_1, \ldots, \theta_G; \tau_1, \ldots, \tau_G \mid x) = \prod_{i=1}^{n} \sum_{k=1}^{G} \tau_k f_k(x_i \mid \theta_k) \quad (2.81)$$
where τ_k is the probability that an element belongs to the k-th cluster, and meets the following constraints:
$$\tau_k \geq 0 \quad (2.82)$$
$$\sum_{k=1}^{G} \tau_k = 1 \quad (2.83)$$
To obtain the maximum-likelihood estimate, an equivalent log-likelihood function is derived and the Expectation-Maximization (EM) algorithm is adopted (Fraley and Raftery 1998). The Bayesian Information Criterion (BIC) is finally applied to find the maximum mixture likelihood and consequently the optimal number of clusters:
$$BIC = 2L - r \log n \quad (2.84)$$
where:
• L is the log-likelihood of the model;
• r is the total number of parameters to be estimated in the model;
• n is the number of elements.
BIC can represent a valid measure of quality since a term is added to the log-likelihood to penalize the complexity of the model; otherwise the fit of the model naturally increases as more terms are added to the model.
The main difference between classification and mixture likelihood approaches is that in the former each element is assigned to a unique cluster,
while in the latter each object is assigned with a certain probability to each
cluster.
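As a sketch of the mixture likelihood approach, a one-dimensional two-component EM iteration with the BIC of Eq. 2.84 can be written in NumPy; the explicit initial means and the parameter count r = 3K − 1 (mixture proportions, means and variances) are assumptions of this 1-D simplification:

```python
import numpy as np

def em_gmm_1d(x, mu_init, n_iter=200):
    """EM for a 1-D Gaussian mixture; returns proportions, means,
    variances and the BIC of Eq. 2.84."""
    x = np.asarray(x, float)
    mu = np.asarray(mu_init, float)
    K = len(mu)
    tau = np.full(K, 1.0 / K)
    var = np.full(K, x.var())
    for _ in range(n_iter):
        # E step: posterior membership probabilities (terms of Eq. 2.81)
        dens = (np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
                / np.sqrt(2.0 * np.pi * var))
        resp = tau * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M step: re-estimate tau_k, mu_k and the variances
        Nk = resp.sum(axis=0)
        tau, mu = Nk / len(x), (resp * x[:, None]).sum(axis=0) / Nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    dens = (np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
            / np.sqrt(2.0 * np.pi * var))
    loglik = np.log((tau * dens).sum(axis=1)).sum()
    r = 3 * K - 1                       # free parameters of this 1-D model
    return tau, mu, var, 2.0 * loglik - r * np.log(len(x))
```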
2.3.5 Methods for Large Databases
The clustering methods introduced so far represent a selection of the most widely applied traditional techniques, which show their limits when applied to large datasets, in particular dynamic datasets. This particular type of data, which is becoming quite common, requires that:
1. data must be read only one time;
2. the algorithm is online, i.e. the ’best’ solution of the algorithm is found
when it is executed;
3. the algorithm can be easily suspended, stopped and restarted;
4. results must be updated incrementally, when data are added to or subtracted from the database;
5. memory resources are limited;
6. algorithm can make a scan of the database;
7. each observation is processed only one time.
BIRCH Algorithm
The BIRCH algorithm (Balanced Iterative Reducing and Clustering Using Hierarchies) (Zhang, Ramakrishnan, and Livny 1996) is an incremental hierarchical algorithm which can be successfully used on large databases, since it needs limited memory resources and reads the data only one time.
Its structure is based on the concepts of clustering feature (CF) and CF Tree:
• A clustering feature (CF) is the triplet (N, $\vec{LS}$, SS), where N is the number of elements in a cluster, $\vec{LS}$ is the sum of the elements in the cluster and SS is the sum of the squares of the elements in the cluster;
• A CF Tree is a balanced tree with branching factor B, which is the maximum number of children that can be generated by a node. Each node contains a triplet CF for each of its children. If the node is a leaf, it is representative of a cluster and has a CF for each sub-cluster, which cannot have a diameter larger than the threshold T.
Therefore the CF Tree is built by adding observations while respecting the maximum diameter T allowed for each leaf, the maximum number of children B that can be generated by a node, and the memory limits. The diameter is calculated as the mean of the distances between all the couples of observations which belong to the cluster. A larger value of T produces a smaller tree, which is a good clustering solution in case of limited memory resources.
Meanwhile the clustering features, associated with each node of the tree, summarise the characteristics of the clusters, speeding up the update of the tree and reducing the access to the data to a single pass.
The building of the CF Tree is a dynamic and incremental process. With the limits of the tree defined by the parameters B and T, each observation is considered and its distance from the centroids of the clusters is determined using the information available from the clustering features. If the parameters B and T are respected, the new observation is added to the closest cluster, the clustering feature of that cluster is updated and the same is done for the clustering features along the path from the cluster to the root of the tree.
If the conditions are violated, the new observation is added to the node as a new cluster and the interested clustering features are calculated and updated.
The BIRCH algorithm is very efficient if the threshold value T has been correctly identified; otherwise the tree must be rebuilt. Furthermore, BIRCH is suited to the case of spherical clusters, since it relies strongly on the maximum diameter T for the definition of the boundaries of clusters.
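The incremental use of clustering features can be sketched as follows; this flat, single-level version (threshold T only, no branching factor B and no tree) is a strong simplification of the real CF Tree:

```python
import numpy as np

class CF:
    """Clustering feature (N, LS, SS) with incremental updates."""
    def __init__(self, x):
        x = np.asarray(x, float)
        self.N, self.LS, self.SS = 1, x.copy(), (x ** 2).sum()

    def centroid(self):
        return self.LS / self.N

    def diameter_if_added(self, x):
        # Mean pairwise squared distance after adding x, computed from
        # (N, LS, SS) alone -- no access to the past observations needed.
        N, LS, SS = self.N + 1, self.LS + x, self.SS + (x ** 2).sum()
        return np.sqrt(2 * (N * SS - (LS ** 2).sum()) / (N * (N - 1)))

    def add(self, x):
        self.N += 1
        self.LS += x
        self.SS += (x ** 2).sum()

def cf_cluster(X, T):
    """Add each point to the closest CF whose diameter stays below the
    threshold T; otherwise open a new CF."""
    feats = []
    for x in np.asarray(X, float):
        feats.sort(key=lambda f: np.linalg.norm(f.centroid() - x))
        if feats and feats[0].diameter_if_added(x) <= T:
            feats[0].add(x)
        else:
            feats.append(CF(x))
    return feats
```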
DBSCAN Algorithm
The DBSCAN algorithm (Density-Based Spatial Clustering of Applications with Noise) (Ester et al. 1996) is a partitioning algorithm based on a measure of density, which is particularly interesting for the possibility of identifying clusters of arbitrary shape.
The algorithm is guided by the parameter MinPts, which defines the minimum number of elements in a cluster, and Eps, which is the maximum distance between two distinct elements in the same cluster. Some preliminary definitions must be given for a correct comprehension of the algorithm:
• The Eps-neighbourhood of an element p is the set of elements that are within a circle of radius Eps centered at p;
• If the Eps-neighbourhood of p contains a minimum number of elements MinPts, then p is a core point;
• Given the values MinPts and Eps, the element p is ”directly density-reachable” from q if:
1. dis(p, q) ≤ Eps
2. |{r | dis(r, q) ≤ Eps}| ≥ MinPts
i.e. if p is within the Eps-neighbourhood of q and q is a core point.
• Given the values MinPts and Eps, the element p is ”density-reachable” from q if there exists a chain of elements p_1, . . . , p_n, with p_1 = q and p_n = p, such that p_{i+1} is directly density-reachable from p_i, with 1 ≤ i < n.
• Given the values MinPts and Eps, the element p is ”density-connected” to q if there exists an element o such that both p and q are density-reachable from o.
Figure 2.4 reports some examples of these concepts, where the circles have radius equal to Eps and MinPts = 3:
• the elements r and s are ”density-connected” to each other by o;
• the element q is ”density-reachable” from p.
Figure 2.4: DBSCAN Distances. Source: Ester et al. 1996
Using these concepts, the density of a cluster is the criterion which determines whether an element belongs to a cluster: each cluster has a central set of directly density-reachable elements, very close to each other (the core points), surrounded by a series of other elements on the border of the cluster, sufficiently close to the central points (border points). Finally, elements not belonging to any cluster are defined as ’noise’ and considered outliers.
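The procedure implied by these definitions can be sketched in a few lines of Python. This is an illustrative minimal implementation, not the original one of Ester et al. (1996); the sample data and parameter values in the usage below are assumptions chosen for the example.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN sketch: labels >= 0 are cluster ids, -1 marks noise."""
    n = len(X)
    # Eps-neighborhood of each element: indices within distance eps (p included)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neigh = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = [None] * n            # None = not yet visited
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neigh[i]) < min_pts:
            labels[i] = -1         # not a core point: provisionally noise
            continue
        labels[i] = cluster        # i is a core point: start a new cluster
        seeds = list(neigh[i])
        while seeds:               # expand the cluster via density-reachability
            j = seeds.pop()
            if labels[j] == -1:    # former noise becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neigh[j]) >= min_pts:   # j is a core point too: keep expanding
                seeds.extend(neigh[j])
        cluster += 1
    return labels
```

Applied to two tight groups of points plus one isolated point, the sketch returns two cluster labels and marks the isolated point as noise (-1).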
OPTICS Algorithm
The DBSCAN algorithm is a powerful method to detect clusters of arbitrary shape. However, the quality of the results is influenced by the choice of correct values for the parameters MinPts and Eps. This is a common issue for many clustering methods, but it becomes more relevant in the case of multidimensional data, especially if the data distributions are distorted with respect to some dimensions.
The OPTICS algorithm (Ordering Points to Identify the Clustering Structure) (Ankerst et al. 1999) has been developed to solve this problem, giving as a result an ordering of clusters that can be automatically analysed. This ordering is equivalent to the clustering obtained from the DBSCAN algorithm over a wide range of parameter values.
In fact, for a constant value of MinPts, the clusters with a higher density (i.e. with smaller values of Eps) are completely included in density-connected sets with lower density. Executing the DBSCAN algorithm with a progressive variation of the parameter Eps, it is possible to obtain an ordering of clusters starting from the ones with higher density (Figure 2.5).
Therefore two values are calculated for each of the elements analysed: the core distance and the reachability distance:
• The core distance of an element p is the smallest Eps′ which makes p a core object. If p is not a core object, this distance is undefined.
• The reachability distance of an element q with respect to another element p is the maximum between the core distance of p and the Euclidean distance between p and q. If p is not a core object, this distance is undefined.
Analysing this pair of values, associated with each element of the dataset, it is possible to establish alternative clustering solutions, evaluating the influence of the choice of the distance value Eps′ .
Figure 2.5: OPTICS. Illustration of the Cluster Ordering. Source: Ankerst et al. 1999
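The two distances can be computed directly from their definitions; a small illustrative sketch (Euclidean distance assumed, sample data invented for the example):

```python
import numpy as np

def core_distance(X, p, min_pts, eps=np.inf):
    """Smallest Eps' that makes X[p] a core point; None (undefined) if
    X[p] is not a core point within the working radius eps."""
    d = np.sort(np.linalg.norm(X - X[p], axis=1))  # d[0] = 0 (p itself)
    cd = d[min_pts - 1]        # distance to the MinPts-th nearest element
    return cd if cd <= eps else None

def reachability_distance(X, q, p, min_pts, eps=np.inf):
    """max(core distance of p, distance(p, q)); None if p is not a core point."""
    cd = core_distance(X, p, min_pts, eps)
    if cd is None:
        return None
    return max(cd, float(np.linalg.norm(X[q] - X[p])))
```

Note that, consistently with the Eps-neighborhood definition above, the element p itself counts among its MinPts nearest elements.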
2.3.6 Fuzzy Clustering
Differently from hard clustering, fuzzy clustering allows data elements to belong to more than one cluster, assigning elements to each cluster with membership levels. One of the most widely used fuzzy clustering algorithms is the Fuzzy C-Means (FCM) algorithm, developed by Dunn in 1973 (Dunn 1973) and improved by Bezdek (Bezdek 1981; Bezdek, Ehrlich, and Full 1984).
The FCM algorithm attempts to partition a finite set of N elements
X = {x1 , . . . , xN } into C fuzzy clusters minimizing an objective function, as
done by the k-means algorithm. The standard objective function is:
J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^2, \qquad 1 \le m < \infty \qquad (2.85)
where:
• m is the fuzzifier, which can assume any real number greater than 1;
• uij is the degree of membership of element xi to the j-th cluster;
• xi is the i-th element of the dataset, measured on d variables;
• c j is the d-dimension center of the j-th cluster;
• || ∗ || is any norm expressing the similarity between any measured data
and the center.
The fuzzifier m determines the level of cluster fuzziness: a large m results in smaller memberships uij and hence fuzzier clusters. As m approaches 1, the memberships uij converge to 0 or 1, which represents a crisp partitioning. In the absence of experimentation or domain knowledge, m is commonly set to 2.
Fuzzy partitioning is carried out through an iterative optimization of the objective function, with the memberships uij and the cluster centers c j updated by:
u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{2/(m-1)}} \qquad (2.86)

c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}} \qquad (2.87)
The structure of FCM algorithm is very similar to the k-means algorithm:
• Choose the number c of clusters to be determined;
• Assign randomly to each point coefficients for being in the clusters;
• Repeat the following steps until the algorithm has converged:
– Compute the centroid for each cluster, using formula 2.87.
– For each point, compute the membership to the clusters, using
formula 2.86.
This convergence of the algorithm is controlled by

\max_{ij} \left| u_{ij}^{(k+1)} - u_{ij}^{(k)} \right| < \epsilon \qquad (2.88)

where ϵ is a termination criterion between 0 and 1 and k indexes the iteration steps.
The final outputs of the algorithm are a list of c cluster centres C = {c1 , . . . , cc } and a partition matrix U = [uij ], uij ∈ [0, 1], i = 1, . . . , N, j = 1, . . . , c. The algorithm minimizes intra-cluster variance as well, but it has the same problems observed for the k-means algorithm: in particular, the minimum found is a local minimum, and the results depend on the initial choice of weights. To reduce these limitations the algorithm should be run several times, retaining stable results.
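The iteration described above (formulas 2.86-2.88) can be sketched compactly with NumPy; the random initialization, tolerance and sample data below are assumptions for the example, not prescriptions from the text.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, tol=1e-5, max_iter=300, seed=0):
    """Illustrative FCM: returns the cluster centers and the partition matrix U."""
    rng = np.random.default_rng(seed)
    n = len(X)
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)      # random memberships, rows sum to 1
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]          # formula 2.87
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)              # guard against zero distances
        # formula 2.86: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
        if np.max(np.abs(U_new - U)) < tol:                     # criterion 2.88
            U = U_new
            break
        U = U_new
    return centers, U
```

On two well-separated groups of points, the hard assignment obtained by taking the largest membership of each row of U recovers the two groups.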
2.4 Association Rules
Association rules represent widely applied Data Mining methods. However, in this section only the basic concepts are given, introducing the well-known Apriori algorithm, since these methods have not been implemented in the following sections of this thesis.
Differently from classification and clustering methods, which are global approaches, association rules are local methods. This means that the algorithm concentrates mainly on a part of the complete dataset, in terms of both records and variables of interest, instead of considering the complete dataset (records and variables of interest) as done by global approaches.
The main objective of association rules is to identify relationships among the ”items”, grouping them based on common ”purchase behavior”.
The generic term ”items” is used since the type of elements included in the database depends on the type of analysis considered. As an example, for a market basket analysis they are the purchases made by clients in a supermarket, for a web clickstream analysis they are the website pages visited, and for an accident analysis they are the accidents that happened in a given road network.
The term ”purchase behavior” refers to how data have been collected.
Generally data are organised in transactional databases, where each record
includes all the purchases made by a client in each transaction (i.e. the list
of products bought by the client or the pages visited in the website).
The information acquired at each transaction may change depending on the objective of the analysis. In the simplest cases only the types of products are analysed; in more detailed analyses quantity, cost and other elements of interest can be added.
In the following paragraphs the basic concepts of association rules are given considering the simple case where only the types of products are analysed. Each product is described by a binary variable, which takes the value 1 if the product was purchased and 0 otherwise.
Given a set of items I = {I1 , I2 , . . . , Im } and a transactional database D = {t1 , t2 , . . . , tn }, where ti = {Ii1 , Ii2 , . . . , Iik } with Iij ∈ I, an association rule is the implication X ⇒ Y, where X, Y ⊂ I are itemsets for which X ∩ Y = ∅.
This means that the association rule X ⇒ Y describes a significant relationship between two sets of items of interest (itemsets) from transactions collected in a database. The fact that a significant relationship X ⇒ Y exists does not mean that a causal relationship exists between X and Y: association rules only state that the two terms of the rule occur together in the database. Other types of analyses (such as the analysis of the temporal sequences of purchases) must be adopted to identify causal relationships among items.
Since an association rule is a local model, it selects only two itemsets among the total number of items in the database: the itemset X, which is the premise of the rule, and the itemset Y, which is the consequence of the rule.
To identify significant relationships and define significant association rules, some indices of significance are adopted. Given an association rule X ⇒ Y, three indices can be defined: support, confidence and lift.
The support s of the association rule X ⇒ Y is the percentage of transactions in the database which contain X ∪ Y, that is, the probability that events X and Y occur simultaneously in the dataset:

s = \mathrm{support}\{X \Rightarrow Y\} = \frac{N_{X \cup Y}}{N} = P(X \cap Y) \qquad (2.89)
where N_{X∪Y} and N are the number of transactions which contain X ∪ Y and the total number of transactions, respectively.
The confidence α of the association rule X ⇒ Y is the ratio between the number of transactions which contain X ∪ Y and the number of transactions which contain X, that is, the conditional probability of event Y given that event X has occurred:

\alpha = \mathrm{confidence}\{X \Rightarrow Y\} = \frac{N_{X \cup Y}}{N_X} = \frac{\mathrm{support}\{X \Rightarrow Y\}}{\mathrm{support}\{X\}} \qquad (2.90)

= \frac{P(X \cap Y)}{P(X)} \qquad (2.91)

where N_{X∪Y} is the number of transactions which contain X ∪ Y and N_X is the number of transactions which contain X.
The lift l of the association rule X ⇒ Y is the ratio between the probability that Y will occur when X occurs and the general probability that Y will occur, that is:

\mathrm{lift}\{X \Rightarrow Y\} = \frac{\mathrm{confidence}\{X \Rightarrow Y\}}{\mathrm{support}\{Y\}} = \frac{\mathrm{support}\{X \Rightarrow Y\}}{\mathrm{support}\{X\}\,\mathrm{support}\{Y\}} \qquad (2.92)

= \frac{P(X \cap Y)}{P(X) \cdot P(Y)} \qquad (2.93)
Lift values are important to define the strength of the relationships
among itemsets:
• Lift > 1: There exists a positive association between event X and event
Y of the rule. In practice, if Lift = 2, it is twice as likely that event Y
will occur when event X occurs than the likelihood that event Y will
occur.
• Lift = 1: There is no association between occurrence of events X and
Y. In practice, if Lift = 1, it is neither more likely nor more unlikely that
event Y will occur when event X occurs, than the likelihood that event
Y will occur. In these cases, X and Y are considered independent.
• Lift < 1: There exists a negative association between event X and event Y of the rule. In practice, if Lift < 1 it is less likely that event Y will occur upon occurrence of event X than in general. If Lift = 0, then event Y will never occur simultaneously with event X (X and Y are mutually exclusive events).
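The three indices translate directly into code; a minimal sketch over a toy transactional database (the item names and transactions are invented for the example):

```python
def support(db, itemset):
    """Fraction of transactions containing every item in itemset (formula 2.89)."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)

def confidence(db, X, Y):
    """support{X => Y} / support{X} (formula 2.90)."""
    return support(db, set(X) | set(Y)) / support(db, X)

def lift(db, X, Y):
    """confidence{X => Y} / support{Y} (formula 2.92)."""
    return confidence(db, X, Y) / support(db, Y)

# A toy database of four transactions
db = [{"bread", "milk"}, {"bread", "butter"},
      {"milk"}, {"bread", "milk", "butter"}]
```

In this toy database the rule {bread} ⇒ {milk} has support 0.5, confidence 2/3 and lift 8/9 < 1, i.e. a slight negative association between the two items.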
2.4.1 Basic Algorithm
To identify significant rules, the database must be screened in an efficient way by the algorithm. Traditional algorithms are based on some basic concepts:
• A frequent itemset l is an itemset which occurs a minimum number of times in the dataset, that is, it has a minimum support s;
• L is the complete set of frequent itemsets in a database.
Traditional algorithms are based on two steps:
Step 1. Identification of frequent itemsets in the database;
Step 2. Generation of association rules based on the frequent itemsets.
The definition of the set L is necessary for the definition of association rules, since for each association rule X ⇒ Y it must hold that X ∪ Y ∈ L. The search can become time-consuming if specific algorithms are not implemented since, given m items in the dataset, the total number of itemsets is equal to 2^m .
Traditional algorithms differ in the method they implement for the identification of ”candidate” itemsets c, which are grouped in the ”candidate set” C.
2.4.2 Apriori Algorithm
The Apriori algorithm is probably the most applied algorithm for the definition of association rules. Its main characteristic is the use of the property that frequent itemsets are ”downward closed” to optimize the search in the database.
Consider the dataset in Figure 2.6, which includes the elements {A, B, C, D} with their subsets and the links representing the relationships among items.
Figure 2.6: APriori. Net of {A, B, C, D}
Figure 2.7: APriori. Subsets of {A, C, D}
If ABCD is a frequent itemset, all the sets in the path from node ABCD to the top of the net are frequent itemsets, since itemsets are downward closed. In Figure 2.7 the paths which refer to set ACD are highlighted; they include the nodes {AC, AD, CD, A, C, D}.
The procedure followed by the algorithm is:
Step 1. Scan the entire database, generate candidate itemsets of dimension
i and determine how many of them (Ci ) are frequent;
Step 2. Only frequent itemsets Li are used to generate candidates of the
next level.
At the first level the generation of candidate itemsets is done considering all the pairs in the database, while at the next levels the Apriori-Gen sub-algorithm is adopted. The Apriori-Gen sub-algorithm combines pairs of frequent itemsets Li which have i − 1 common items, generating Ci+1 new candidate itemsets. To determine the frequency of the Ci+1 itemsets a new scan of the database is needed.
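The two-step procedure, with the Apriori-Gen join and the downward-closure pruning, can be sketched as follows; this is a compact illustrative version, with min_support and the example transactions chosen as assumptions:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return the list of frequent-itemset levels L1, L2, ... (as sets of frozensets)."""
    n = len(transactions)

    def frequent(candidates):
        # Scan the database and keep the candidates with enough support
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c for c, k in counts.items() if k / n >= min_support}

    items = {frozenset([i]) for t in transactions for i in t}
    level = frequent(items)            # L1: frequent single items
    L, k = [], 1
    while level:
        L.append(level)
        k += 1
        # Apriori-Gen join: pairs of frequent (k-1)-itemsets sharing k-2 items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates with any infrequent (k-1)-subset (downward closure)
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = frequent(candidates)   # new database scan
    return L
```

Note that each level requires one scan of the database, and the pruning step discards candidates before that scan, which is where the downward-closure property saves work.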
Chapter 3
Data Mining in Transportation
Engineering
Data Mining techniques have been adopted with increasing attention in different research fields, including transportation engineering. This chapter summarizes some of the main applications reported in the literature in recent years (Barai 2003). The objective is not to produce an exhaustive review (which is not feasible in practice, given the number of techniques and applications to be considered), but to give an idea of the research topics which could benefit from these innovative tools.
It must be noted that other transportation engineering topics not included in this review could potentially take advantage of these techniques. Whenever basic objectives must be achieved (e.g. classification, clustering, estimation), DM techniques can be an efficient set of tools for the analyst.
Moreover the analysis of the wide literature concerning traffic monitoring is reported separately in Chapter 5.
3.1 Knowledge Discovery Process
As observed in Section 2.4, DM can be considered part of a more general process of knowledge discovery. Some papers have dealt with the implementation of this process for transportation applications.
In their paper, Zhaohui et al. (2003) identified the importance and the necessity of applying knowledge management (KM) in Intelligent Transportation Systems (ITS). The paper discussed the possible target market and the application ranges of KM in ITS, together with its main contents and primary methods, finally proposing a typical model of KM in ITS.
Haluzová (2008) described the application of data mining methods (KDD framework) to the database of the DORIS transportation information system, used by the Prague Public Transit Company. Descriptive statistical methods and association rules were successfully applied to create knowledge about the behaviour of objects within the information system.
Similarly, Maciejewski and Lipnicki (2008) applied data exploration and data mining techniques to the analysis of monitoring databases of a large transport system. The analyses focused on discovering relationships between key metrics of a transport system, such as availability/usage profiles of the fleet, and various factors on which they apparently depend, such as age. Based on real data from the transport system of the Polish Post, data exploration, data mining and efficient data summarization/visualization tools were found able to identify bottlenecks, atypical patterns or even fraud-related events in the mail transport/mail delivery process.
Yang and Tong (2008) implemented the KDD framework in an Information-Service-Oriented Experimental Transportation System (IETS) in China, with the aim of deducing useful traffic information from real-time traffic data. The example given by the authors showed that this method can be useful to discover unknown information in traffic data.
Zhang, Huang, and Zong (2010) applied the KDD (Knowledge Discovery in Databases) framework to decision making for railway operation safety in China. The application of the KDD framework to accident treatment proved significant in assisting decision making for train running safety.
Another application in the railway context was proposed by MELO007, who presented the so-called MPDF-DM methodology for the prediction of railroad demand. This methodology integrated existing methodologies (including, for example, CRISP-DM) and was successfully applied to a customer transport request database in Brazil.
Recently, Rahman, Desa, and Wibowo (2011) reviewed data mining (DM) applications in logistics, considering their benefits and real-world expectations. The authors noted that very little is known about the usefulness of applying data mining in transport-related research. Since frameworks for carrying out knowledge discovery and data mining (e.g. KDD or CRISP-DM) have been revised over the years to meet business expectations, the authors proposed a framework to be tested within the context of the transportation industry.
3.2 Pavement Management Systems
Since the 1960s, Pavement Management Systems (PMS) have been designed and implemented by many transportation agencies. The main objective of these systems is supporting the efficient management of the infrastructure, in terms of planning, design, construction, maintenance, evaluation, and rehabilitation of pavements. With the rapid advance of information technology, many investigators have successfully integrated Geographic Information Systems (GIS) into PMS for storing, retrieving, analysing, and reporting the information needed to support pavement-related decision making (G-PMS). A G-PMS is enhanced with features and functionality to perform pavement management operations, create maps of pavement condition, provide cost analysis for the recommended maintenance strategies, and support long-term pavement budget programming.
With the increasing amount of pavement data collected, updated and exchanged due to deteriorating road conditions, increasing traffic loading, and shrinking funds, DM and the KDD process have been implemented as further tools for decision-makers.
Soibelman and Kim (2000) applied Knowledge Discovery in Databases (KDD) and Data Mining (DM) as tools to identify novel patterns in construction fields. Leu, Chen, and Chang (2001) investigated the applicability of data mining to the prediction of tunnel support stability using an artificial neural network (ANN) algorithm. The case data of a railway tunnel were used to establish the model. The validation performed showed that the ANN outperformed discriminant analysis and the multiple non-linear regression method in predicting tunnel support stability status. Attoh-Okine (Attoh-Okine 2002; Attoh-Okine 1997) presented applications of Rough Set Theory (RST) to enhance decision support for pavement rehabilitation and maintenance. Sarasua and Jia (1995) explored an integration of GIS technology with knowledge discovery and expert systems for pavement management.
More recently, Zhou et al. (2010) explored the applicability of data mining and knowledge discovery (DMKD) in combination with Geographical Information Systems (GIS) technology to pavement management, in order to better decide maintenance strategies, set rehabilitation priorities, and make investment decisions. Different data mining techniques, including decision trees and association rules, were used to find pertinent information hidden within a pavement database covering four counties within the State of North Carolina. The knowledge discovery process identified seven rules, which guided the creation of maps (using GIS tools) of several pavement rehabilitation strategies. Analysing the results obtained for the pilot experiment in the project, the authors concluded that:
• the use of the DMKD method for the decision of road maintenance and
rehabilitation can greatly increase the speed of decision-making, thus
largely saving time and money, and shortening the project period;
• the DMKD technology can make consistent decisions about road
maintenance and rehabilitation if the road conditions are similar, i.e.,
interference from human factors is less significant;
• integration of the DMKD and GIS technologies provides a Pavement Management System with the capability to graphically display treatment decisions.
3.3 Accident Analysis
Analysis of accidents was found to be one of the topics with the largest number of DM applications. Analyses have been developed for different transportation modes (e.g. for aviation incidents see Gürbüz, Özbakir, and Yapici (2009)), but the majority of studies have considered road accidents. Different research areas of interest can be identified, including accident detection (where prediction methods such as ANNs have been adopted (Hu et al. 2003)) and the identification of black spots and risk factors.
In this review more details are given about the identification and quantification of risk factors, a rich research area that can be helpful to understand future developments of DM in transportation studies. Analysing the large number of papers using DM techniques, two types of approaches can be identified, depending on how the spatial characteristics of accidents are treated.
In the first case traditional DM techniques (Section 2) are applied, considering the spatial element as a variable like the others. In the second case DM techniques are extended, integrating some important concepts from Geographical Information Systems (GIS) and the geographical sciences.
Some applications of the traditional approaches are the following. Kalyoncuoglu and Tigdemir (2004) adopted an artificial neural network (ANN) approach to simulate the effects of driver characteristics (i.e. gender, age, education, driving years, kms driven per year) on traffic accidents (percentage of involvement in traffic accidents). The flexible and assumption-free ANN approach produced predictions considered highly satisfactory by the authors.
Chang (2005) compared a negative binomial regression model and an ANN model to analyse vehicle accident frequency on National Freeway 1 in Taiwan. The results reported in the paper demonstrate that the ANN is a consistent alternative method for analyzing freeway accident frequency.
Analysing another accident dataset, the same author (Chang and Chen 2005) found that Classification and Regression Trees (CART) are also a good alternative method for analyzing freeway accident frequencies and establishing empirical relationships between traffic accidents, highway geometric variables, traffic characteristics and environmental factors.
In 2006, Pande and Abdel-Aty (2006) used a data mining process to relate surrogate measures of traffic conditions (data from freeway loop detectors) with the occurrence of rear-end crashes on freeways. Freeway traffic surveillance data collected through underground loop detectors are an ”observational” database maintained for various ITS (Intelligent Transportation Systems) applications such as travel time prediction. The results highlighted that a classification tree model with the chi-square test as splitting criterion was better than any of the individual or combined neural network models tested. The decision tree also provided simple interpretable rules to classify the data in a real-time application.
Finally, the same authors (Pande and Abdel-Aty 2009) applied association rules analysis (market basket analysis) to detect interdependence among crash characteristics, using non-intersection crash data from the state of Florida for the year 2004. Based on the association rules discovered from the analysis, they found significant correlation of accidents with some conditions (e.g. lack of illumination or rainy conditions), consistent with the understanding of crash characteristics. The authors stated that the potential of this technique might be realized in the form of a decision support tool for traffic safety administrators.
In the more innovative approach, specific attention is given to the concept of spatial autocorrelation (Getis 2008), which explains the correlations among variables in a georeferenced space. This means that for certain datasets the correlations among variables can vary depending on the spatial localization. Without giving further details, it can be observed that some studies (Flahaut 2004; Flahaut et al. 2003; Khan, Qin, and Noyce 2008) have demonstrated that spatial autocorrelation can be a relevant element when analysing the spatial characteristics of events (such as accidents), with relevant changes in the final results of the analyses.
DM techniques are evolving in the sense of integrating all these aspects in common complex frameworks (Geurts, Thomas, and Wets 2005). In the future the adoption of these multidisciplinary approaches, such as Spatial Data Mining (Ladner, Petry, and Cobb 2003; Miller and Han 2009) and Spatio-Temporal Data Mining (Cheng and Wang 2008; Mennis and Liu 2005), will probably represent a relevant topic for transportation engineering.
3.4 Traffic forecasting
In recent years Intelligent Transportation Systems (ITS) have been implemented widely throughout the world, giving access to large amounts of data. In this context DM techniques are appropriate tools to extract useful traffic patterns, in particular when ”real-time” data are available. This fact has moved traffic operating systems from passive to proactive control and management through the adoption of traffic forecasting data. In the review of traffic-flow forecasting models by Han and Song (2003), DM techniques represent one of the basic classes of models currently adopted in this field.
In particular, ANNs have been commonly used for this problem since the 1990s, given that an appropriate neural network structure can approximate a given real-valued, continuous multivariate function to any desired degree of accuracy (Hornik 1989). Different structures have been applied, including multi-layer perceptrons, radial basis functions and hybrid methods (Dougherty and Cobbett 1997; Van der Voort, Dougherty, and Watson 1996).
However, other Data Mining techniques have also been implemented with interesting results. Gong and Liu (2003) proposed an algorithm based on association rules mining and association analysis.
Wang et al. (2006) presented a dynamic traffic prediction model which converts traffic flow data into traffic status. In this paper clustering analysis and classification analysis were used to develop the model, and the classification model can be used to predict traffic status in real time. The experiments showed that the prediction model can be used efficiently in dynamic traffic prediction for urban traffic flow guidance.
In their paper, Gong and Lu (2008) proposed a Data Mining based Traffic Direction Control Algorithm (DMTDCA) to adjust the traffic direction of Direction-Changeable Lanes (DCLs) in a tunnel in Shanghai. The current traffic flow and the short-term forecasted traffic flow at the two tunnel entrances were analysed, and the direction change was decided automatically and timely. Field tests showed an increase in average traffic capacity and a decrease in average queue length.
Interesting applications of DM traffic forecasting have also been developed for air traffic flows and railway passenger flows.
Cheng, Cui, and Cheng (2003) employed a combination of neural networks and statistical analysis of historical data to forecast the inter-regional air traffic flow in China. Models for different prediction conditions were derived from the analysis of a large collection of radar data. The accuracy of the predictions was found satisfactory.
In the railway case, Xu, Qin, and Huang (2004) proposed an approach to forecast railway passenger flow based on spatio-temporal data mining. The approach first forecasts the time sequence of the target object using statistical principles, then figures out the spatial influence of neighbouring objects using a neural network, and finally combines the two forecasting results using linear regression. Compared with previous approaches, which did not consider the spatial influence, this approach resulted in higher forecasting accuracy.
3.5 Other Studies
In this section, other interesting applications of Data Mining in the transportation field are reported.
Jiang and Huang (2009) addressed the problem of calibrating the speed-density relationship parameters used by mesoscopic traffic simulators with data mining techniques. The authors combined k-means with agglomerative hierarchical clustering, reducing the early-stage errors inherent in agglomerative hierarchical clustering and thus improving clustering performance. The results of the case study in Beijing showed that the performance of the new algorithm was better than that of previous solutions.
Chen et al. (2004) applied DM techniques to online vehicle tracking system data, which generally comprise an enormous quantity of records. Vehicle behaviors can be mined to understand the status of every vehicle or driver (e.g. deviations from routes, driving against traffic regulations) and to generate alerts for abnormal conditions. The implementation of these techniques could reduce operating costs, give greater flexibility in dispatching vehicles, and therefore provide competitive advantages for the transportation industry.
Hayashi et al. (2005) presented a method for detecting driver drowsiness, focusing on the analysis of individual differences in biological signals (pulse wave) and performance data (steering data). The biological signals of different drivers were analysed by neural networks, which successfully adapted to the differences observed among drivers. The correct detection of driver drowsiness is needed for the realization of a safer traffic environment, contributing to the prevention of traffic accidents caused by human error while drowsy.
Part II
Road Traffic Monitoring
Chapter 4
Traffic Monitoring Guide
Based on the literature review of Data Mining applications (Chapter 3), road traffic monitoring was found to be a topic of interest for the application of these techniques.
Road traffic monitoring represents a relevant aspect of highway planning activities for many transportation agencies. Since data collection activities produce relevant management costs, both for equipment and personnel, monitoring programs need to be designed carefully to obtain the maximum quality of results.
While in Italy there is little guidance on how monitoring programs must be implemented in practice (Sistemi di Monitoraggio del Traffico. Linee guida per la progettazione), in the U.S.A. the Federal Highway Administration (FHWA) provides guidelines for the implementation of statewide data collection programs by way of the Traffic Monitoring Guide (TMG) (FHWA 2001).
The TMG describes the analytical logic followed in a monitoring program and provides the information highway agencies need to optimize their frameworks. In practice each State or local highway agency has its own traffic counting needs, priorities, budgets, and geographic and organizational constraints. However, the general framework proposed by the TMG can be applied in different contexts by selecting different equipment for data collection, using different collection plans for obtaining traffic data, and emphasizing different data reporting outputs.
Given this completeness, the TMG represents a reference for many other countries (Italy included); therefore in this thesis it has been analysed in depth, identifying the main issues still unsolved (Chapter 5).
4.1 Data Collection Design
Collecting data and predicting traffic patterns for both short-term and long-term planning are basic responsibilities of transportation agencies. One of the most important traffic parameters is the Annual Average Daily Traffic (AADT), which should be known for each section in the road network. AADT is defined as the bi-directional traffic count representing an average 24-hour day in a year for a given road section. It is essential information for pavement design, fuel-tax revenue projection, and highway planning.
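By this definition, AADT is the mean of the daily bi-directional volumes over the year; a trivial sketch (the volume figures are invented for illustration):

```python
def aadt(daily_volumes):
    """Annual Average Daily Traffic: the mean of the daily
    bi-directional volumes recorded at a road section over a year."""
    return sum(daily_volumes) / len(daily_volumes)

# A synthetic year: 261 weekdays at 10,000 veh/day, 104 weekend days at 6,000
year = [10_000] * 261 + [6_000] * 104
```

The practical problem addressed in the remainder of the chapter is that the full vector of daily volumes is rarely available, so AADT must be estimated from partial counts.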
Although it would be ideal to monitor traffic volume for the entire year, data collection incurs costs and manpower; hence, AADT is often estimated from monitoring activities with a minimum data collection effort. To provide accurate estimates of AADT, a traffic monitoring program must measure and account for the variability of traffic patterns as they occur over time and space. The variability in traffic conditions is related to different time scales, including time of day, day of week, and season (month) of the year, and to different spatial levels, from the direction of traffic in the same section to geographical variations over a certain area, such as a Province or a Region.
Moreover differences in traffic variation also exist by type of vehicle,
since truck volumes vary over time and space differently than automobile volumes (Hallenbeck et al. 1997). For this reason the latest version of
TMG (FHWA 2001) suggests to differentiate the data collection using three
or four simple categories, obtained aggregating existing FHWA classes,
such as passenger vehicles (motorcycles, cars, and light trucks), single-unit
trucks (including buses), single-trailer combination trucks (tractor-trailers),
and multi-trailer combination trucks.
To monitor traffic flows in a network, the basic program design recommended by TMG consists of two types of counts:
1. Permanent Traffic Counts (PTCs)
2. Short Period Traffic Counts (SPTCs)
which are combined using the Factor approach.
4.1.1 Permanent Traffic Counts
Permanent Traffic Counts are counts taken on a road section all year long using continuous traffic monitoring equipment. PTCs
provide knowledge of time-of-day, day-of-week, and seasonal travel patterns and very precise measurements of changes in travel volumes and
characteristics at a limited number of locations.
The most common continuous data collection equipment includes Automatic Traffic Recorders (ATRs) and Automatic Continuous Vehicle Classifiers (often abbreviated AVC or CVC). AVCs should be preferred to ATRs, since they can provide information about the volumes of different vehicle classes, while ATRs provide only total volume values. This difference is relevant
for many traffic analyses, which are more dependent on truck volumes than
they are on total traffic volumes, such as pavement design. Both types of
counters generally adopt inductive loop technology because it allows for reliable, long-lasting installation of the vehicle detector. In the case of AVCs, dual-loop systems are installed and estimates of the total length of the vehicles crossing the loops can be obtained.
Other equipment often included in traffic monitoring programs includes continuously operating weigh-in-motion (WIM) scales, placed to monitor vehicle weights, and volume and speed monitoring stations, which provide facility performance data to traffic management systems. However, they are
not described here since they are less important in the implementation of
the monitoring program and less common in the Italian application context.
4.1.2 Short Period Traffic Counts
Short Period Traffic Counts are taken for short periods (from 24 hours to
some weeks) using portable traffic counters based on various technologies
(e.g. infrared, microwave, radar). SPTCs provide the geographic coverage
needed to understand traffic characteristics on individual roadways. Because permanent counters are expensive to install, operate, and maintain,
SPTCs must be taken on the road network to provide accurate measurements of traffic conditions.
The TMG recommends that SPTCs data collection consists of a periodic
comprehensive coverage program over the entire system on a 6-year cycle
(reduced to a 3-year cycle on the main roads of the network). However
roadway sections that are of major interest (locations of pavement design
projects, corridors for major investment studies, etc.) are counted more often
(every year or every two or three years), while other roadway sections with
little activity or with stable traffic volumes may be uncounted for many
years.
Depending on the number and duration, SPTCs taken on a road
section can be further classified as:
• Seasonal Traffic Counts (STCs) or Coverage Counts, when the duration is from 24 hours to some weeks and the number is from 2 to 12
times in a year;
• Short Period Traffic Counts (SPTCs) in the strict sense, when the duration is less than a week and they are taken only once a year.
In the following parts of this thesis this distinction between STCs and
SPTCs will be maintained.
4.2 TMG Factor Approach
TMG factor approach (also known as FHWA or traditional factor approach) indicates how SPTCs and PTCs are combined to estimate average
traffic conditions in a given section of the road network.
PTCs give detailed information about traffic conditions all the year long,
but they are obtained only in a limited number of road sections. Conversely
SPTCs can measure the traffic conditions in different road sections, but only
for the limited amount of time during which the counts are taken.
TMG factor approach assumes that temporal characteristics affect all
roads in the network and since continuous temporal data (PTCs) exist at
several points to describe the temporal variation, it is possible to transfer
this knowledge by developing a factoring mechanism.
The factoring process defines a set of roads as a ’road group’ and all
roads within that group are assumed to behave similarly. Then a sample of
locations on roads from within that group is taken and data are collected
(PTCs). The mean condition for that sample is computed and that mean value is used as the best measure of how all roads in the group behave.
Adjustments factors (seasonal adjustment factors) are calculated for each
road group to account for temporal variability (time-of-day, day-of-week
and seasonal) of the traffic stream. Roads monitored with SPTCs are assigned to one of the 'road groups', and the road group seasonal adjustment factors are
used to correct data obtained from SPTCs and estimate annual average
conditions.
The procedure proposed in the TMG can be summarized in four steps:
Step 1: Define groups of roads which are assumed to behave similarly.
A sample of locations on roads from within each group is taken and
PTCs are collected, generally using an AVC;
Step 2: Compute the mean conditions for the sample of each road group
and assume those mean values as the seasonal adjustment factors for
the road group;
Step 3: Assign the road section in question, that is monitored with a SPTC,
to one of the groups defined in step 1;
Step 4: Apply the appropriate seasonal adjustment factor to the SPTC of
the road group to produce the AADT estimates for the road section in
question.
Assuming that a State has decided to consider weekly and monthly
variations of traffic volumes and has identified the road groups, the factor
approach leads to the following computation (further details about the
creation of road groups and the type of factors will be given in section 4.3).
The seasonal adjustment factor for an AVC site k for the i-th day of the
week of the j-th month is calculated by:
$$ f_{ijk} = \frac{AADT_k}{ADT_{ijk}} \qquad (4.1) $$
where $AADT_k$ is the AADT for the k-th AVC site and $ADT_{ijk}$ is the average daily traffic recorded on the i-th day of the week of the j-th month at the k-th AVC site.
Since AVC sites grouped together are supposed to have similar traffic
patterns, the seasonal adjustment factors that correspond to (i, j) combinations are calculated for each road group. If n AVC sites are in road group c,
the seasonal adjustment factor for the i-th day of week of the j-th month is
calculated by:
$$ f_{ijc} = \frac{1}{n} \sum_{k=1}^{n} f_{ijk} = \frac{1}{n} \sum_{k=1}^{n} \frac{AADT_k}{ADT_{ijk}} \qquad (4.2) $$
where $AADT_k$ is the AADT for the k-th AVC site in group c and $ADT_{ijk}$ is the average daily traffic recorded on the i-th day of the week of the j-th month at the same AVC site.
Once a road section is assigned to a group c, AADT can be estimated by multiplying the daily traffic count $DT_{ij}$, obtained for the i-th day of the week of the j-th month, by the corresponding seasonal adjustment factor $f_{ijc}$:

$$ AADT_{Estimate} = DT_{ij} \cdot f_{ijc} \qquad (4.3) $$
$DT_{ij}$ is the 24-hour volume obtained from the SPTC; if the SPTC covers more than 24 hours, then $DT_{ij}$ is the average of the 24-hour volumes over the duration of the SPTC.
To calculate the AADT and the ADT needed in equations 4.1 and 4.2,
TMG suggests using the AASHTO method (AASHTO 1992), which is more
accurate than the simple average of all daily counts in case of missing data.
Therefore the AADT (and similarly the ADT) for the k-th AVC site in the
group is calculated by:
$$ AADT_k = \frac{1}{12} \sum_{j=1}^{12} \left[ \frac{1}{7} \sum_{i=1}^{7} \left( \frac{1}{n} \sum_{l=1}^{n} DT_{ijlk} \right) \right] \qquad (4.4) $$
where $DT_{ijlk}$ is the daily traffic count taken on the i-th day of the week of the j-th month at the k-th AVC site. The index l represents the occurrence of the i-th day of the week in the j-th month, and n is the number of times the i-th day of the week occurs during month j (usually between 1 and 5, depending on the calendar and the number of missing days).
For many years the TMG factor approach was applied using total volume values, mainly because of the technological limits of vehicle monitoring equipment. Since substantial differences were observed between truck and passenger car travel patterns, the TMG now suggests applying the factoring approach separately for different vehicle classes.
In the following sections the analysis will consider the case of a single vehicle class (which is equivalent to considering total volume). Further details
about specific characteristics of truck vehicle class will be given in section
4.5.
4.2.1 Basic Issues of TMG Factor Approach
TMG highlights some relevant issues that must be considered when
applying the factor approach, since they are the source of many of the
errors in annual traffic estimates. Different techniques used to create and
apply traffic correction factors allow the user to control the errors associated
with any given step of the procedure. Unfortunately, none of the available
techniques can control for all of the limitations.
Variability and similarity. It is difficult to define groups of roads that are
similar with respect to traffic variation, and the more mathematically
alike the factoring groups created from the data, the more difficult it is
to define the attributes that determine which roads belong to a given
group.
Definition of road groups. The appropriate definition of a road group
changes depending on the characteristic being measured.
Sample selection. It is difficult to select a representative sample of roads
from which to collect data for calculating the mean values used as
factors.
Incomplete datasets. The datasets used to compute adjustment factors are
not complete.
Variability and similarity This issue relates to the fact that it is quite easy
to define groups of roads with a high level of precision based on variability.
However, the groups that can be easily defined based on variability usually
do not have clear characteristics to identify the group. Therefore, the creation
of factor groups usually involves balancing the need to easily define a group
of roads against the desire to ensure that all roads within a given group have
similar travel patterns.
This same trade-off occurs in the type and magnitude of errors in the
factoring process. For groups that are easy to define but include wider
ranges of travel patterns within the group, errors occur because the mean
factor computed for the group may not be a good estimate of the ”correct”
factor for a specific road segment. For groups that have very ”tight” factors
but for which it is difficult to define the roads that fit, the error occurs in
defining to which factor group a specific road segment belongs.
Definition of road groups This fact can be easily observed considering
that trucks have different travel patterns than passenger cars. Factor groups
that work extremely well for computing and applying total volume adjustments (dominated by car volume patterns) often do not work well for truck
volume adjustments.
In general, the ’best factor groups’ are those that can be readily defined
and at the same time contain similar traffic patterns.
Sample selection The best alternative for selecting these sites is to first
define the factor group and then perform a random selection of data collection sites. Normally, neither of these events takes place. Consequently, the
”mean” value computed is often not the ”true” mean value for the group,
in particular if one ”unusual” location is included within a group.
Data collection points are usually not ”perfect” for two reasons.
• Permanent data collection site locations are often selected for a number of reasons, only one of which is factor computation. Some of these
reasons are:
– the need for data from a specific site;
– the desire to track trends over time at sites that have been historically monitored;
– the need for specific physical conditions (the availability of power
or communications lines, the need for smooth, flat pavement).
• Because factor groups are often determined on the basis of data from
existing data collection sites, actual site locations often exist before
the grouping process and cost considerations tend to prevent their
being moved. Thus, the data drive the grouping process, rather than
the grouping process driving the selection of data collection points.
Incomplete datasets No data collection device is perfect, therefore within
any given road network, a certain number of ATR or AVC devices will fail
each year, with outages lasting from a few hours to several months. In some cases, so few data
are available that the site may not be included in the factor computation at
all.
A number of procedures, most notably the AASHTO process for computing AADT (AASHTO 1992), have been designed to limit the effects of
missing data. However, because of the holes in the data, errors are introduced into the factors being computed and, in general, the more data that
are missing, the more error that may be associated with the mean factors
applied to any given location.
The best way to decrease the chance of these errors occurring is to
monitor and repair permanent data collection equipment as quickly as
possible. It is also helpful to maintain more than the minimum number of
counters within any given factor group, so that if one counter experiences
a long data outage, the data from that counter can be removed from the
computation process without adversely affecting the factor computation.
4.3 Details About Factor Approach
To be applied in practice, the basic structure of TMG factor approach
needs to be further explained, giving some details about:
• the type of factors that can be used;
• the definition of road groups;
• the number of road sections to be monitored.
4.3.1 Types of Factor
Different procedures have been developed for calculating and applying
factors. The study by Cambridge Systematics Corporation (1994) showed that
different factoring techniques can result in reasonably similar levels of accuracy in the estimate of average annual conditions (Average Percentage of
Errors between 7.0% and 7.6%).
The key point is that the factoring technique must account for all types
of variation present in the data. This means that the level of aggregation
that exists in each factor and the definition of ’seasonal’ can change case by
case. In some cases, day-of-week and seasonal adjustments are combined in
a single factor; in other cases, these two components are treated as separate
factors.
For seasonal adjustments, some techniques use monthly factors, whereas
others use weekly factors. Both of these techniques can be successful. Seasonality does not necessarily vary smoothly from month to month. Consequently, some States find that weekly factors work better than monthly
adjustment factors. However, others find that the monthly factors provide
equally good annual adjustments and require considerably less effort to
compute and apply.
Similarly for day-of-week factors, some States use day-of-week adjustments for each day. Others combine some weekdays (traditionally Tuesday
to Thursday or Monday to Thursday). Both techniques can produce acceptable results if they are applied appropriately, that is if they can be
representative of traffic pattern variations.
4.3.2 Road Groups Identification
The TMG suggests three alternative techniques for computing factor
groups from existing PTCs data. Therefore the first step is to compute the
adjustment factors that will be used in the group selection process for each
site for which data are available. The analyst should pay particular attention
to the quality of the data produced by each counting device.
As explained later strengths and weaknesses exist for each alternative.
In many cases the combination of approaches could be better than following
any one technique exclusively. The three techniques here introduced are:
1. cluster analysis;
2. geographic/functional assignment of roads to groups;
3. same road factor application.
Cluster Analysis
With the term ”cluster analysis” the TMG refers to the application of
Ward’s Minimum Variance method for hierarchical clustering (see section
2.3.2), which determines which AVCs or ATRs are most similar based on
the computed factors.
The analyst's decision about where to stop the clustering should be made in one of two ways:
• The first way is to analyse the mathematical distance between the
clusters formed, as usually done in clustering analysis;
• The second approach is to choose a predetermined number of groups
to be formed.
In practice both options require that the analyst is able to define the
group of roads a given cluster of continuous counters actually represents.
This definition (which is spatial and functional) is necessary to assign arbitrary roadway segments to the newly created factor groups, since the
analysts must understand the rules for assigning short counts to the factor
groups.
This task is extremely difficult, therefore the cluster process is often
modified by the use of secondary procedures (a combination of statistical
analysis and analyst knowledge and expertise) to develop the final factor
groups.
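As a sketch of this step, Ward's method as implemented in SciPy can be applied to a matrix of seasonal adjustment factors, one row per continuous counter; the 4 × 3 matrix below is invented, and the analysis is stopped at a predetermined number of groups.

```python
# Sketch of Ward's minimum variance clustering of AVC sites on their
# seasonal adjustment factors (invented values).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Rows: AVC sites; columns: selected (day-of-week, month) adjustment factors.
factors = np.array([
    [0.80, 0.85, 0.95],   # site A
    [0.82, 0.86, 0.94],   # site B, similar to A
    [1.20, 1.30, 1.10],   # site C
    [1.22, 1.28, 1.12],   # site D, similar to C
])

Z = linkage(factors, method="ward")              # Ward's minimum variance
groups = fcluster(Z, t=2, criterion="maxclust")  # stop at 2 predefined groups
print(groups)  # sites A and B share a label, as do C and D
```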
Geographic/Functional Classification of Roads Factor Groups
This approach is based on the available knowledge about traffic patterns
of the roads in the network. The analyst allocates roads into alternative
factor groups on the basis of available knowledge, which is usually obtained
from a combination of existing data summaries and professional experience
with traffic patterns.
First, factor groups are selected based on a combination of functional
roadway classification and geographic location. The use of functional classes
for group selection makes it easy to assign individual road sections to factor groups and also allows the creation of factor groups that are intuitively
logical. Sub-State geographical differences could be considered when different levels and types of economic activity exist and differences in traffic
characteristics can occur.
Once the initial factor groups have been identified, PTCs data are examined for each group. For each factor and each factor group, the mean
factor for the group and the standard deviation of that factor are computed.
The standard deviation tells the analyst the size of the expected error of the
average group factor. Since it is assumed that the continuous counters for
which data are available are representative of a random sample of roads
of that group, the errors should be roughly normally distributed about the
factor group mean. If the standard deviation is too high (i.e., the error associated with factors computed for that group of roads is too large), the
definition of roads included in that group may have to be changed. This can
mean the creation of new factor groups (for example, splitting some roads
based on geographical characteristics), or the redefinition of those groups.
Changing factor group definitions effectively moves continuous counters
from one factor group to another and allows the variation within a given
group to be decreased.
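This variability check can be sketched as follows; the group composition, the factor values and the 10% CoV threshold are invented for illustration.

```python
# Sketch of the within-group variability check: flag factor groups whose
# coefficient of variation exceeds an assumed threshold (invented data).
from statistics import mean, stdev

groups = {
    "rural_interstate": [0.81, 0.83, 0.80, 0.82],
    "recreational":     [0.60, 1.10, 0.85, 1.40],   # wide spread of factors
}

THRESHOLD = 0.10  # assumed acceptable coefficient of variation
for name, f in groups.items():
    cov = stdev(f) / mean(f)
    if cov > THRESHOLD:
        print(f"{name}: CoV {cov:.2f} too high, redefine the group")
```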
Performing this analysis, particular attention must be devoted to the
most important factors, which represent the most significant period of the
year for those roads. For example, if the majority of traffic counting takes place
from the middle of Spring to the middle of Fall, the factor group variation in
January and December is less important (because these factors may never
be used) than the variation in the key May through September time period,
when most short duration traffic counts are taken. Similarly attention is
needed to the presence of ”outlier” sections, which are continuous counters
that do not really fit within the basic pattern that is assumed to exist. For
example, if a counter does not fit within a factor group, having a plot of that
counter’s data will allow the analyst to determine whether the data for that
site contain potential errors that could affect the grouping process, indicate
the need to create a recreational factor group, or provide the insight to place
the stations in another group.
Same Road Application of Factors
The third approach proposed for the creation of factor groups can be
applied only if a dense network of continuous counters exist. This process
assigns the factor from a single continuous counter to all road segments
within an ”influence area” of that counter site. The boundary of that influence zone is defined as a road junction that causes the nature of the traffic
volume to change significantly. This approach is of particular interest since
it avoids the application of a mean value that does not accurately describe
traffic variation on that given road section and the problem of associating a
specific road section with a vaguely defined factor group.
Difficulties in the application of this technique occur when the short
duration count is not near the continuous counter and traffic patterns at
the count location may be different than those found at the continuous
counter. Application of factors from individual locations in this fashion
creates considerable potential for bias in the factoring process.
Combining Techniques As noted at the beginning of this subsection,
most States develop and apply factors by using some combination of the
above techniques. For example, on road sections where continuous counters
exist nearby, factors from specific counters can be applied to short duration
counts on those roads. For all other road sections, group factors can be
computed and applied. Factor groups can be initially identified by starting
with the cluster analysis process followed by the use of common sense and
professional judgement. In this way minor adjustments can be made to the
cluster results in order to define the final factor groups in such a way that
they can be easily identified for factor application.
4.3.3 ATR Sample Dimension
FHWA has fixed the precision levels required for AADT estimates from different road groups, considering the geographical context, the functional level of roads, and the approximate expected values of AADT. Appendix C of the HPMS field manual (FHWA 2005) presents these levels, which have a confidence level in the interval 80–90% and a level of error in the range 5–10%.
HPMS standards for the precision of AADT estimates consider road
groups corresponding to specific functional classes. In contrast, the TMG focuses more on differences among roads in terms of temporal variations of traffic patterns, which have been shown to be unrelated to functional classification, particularly for truck movements (Hallenbeck et al. 1997). In this sense the TMG recommends achieving a 10% level of error with a 95% confidence interval for each road group, excluding recreational road groups, which are characterised by much more complex traffic patterns.
Assuming that the permanent count stations in a given seasonal group are randomly selected, the level of precision of the AADT estimate for road group i and seasonal factor k can be calculated as:
$$ d_{ik} = t_{\alpha/2,\,n-1} \frac{CoV_{ik}}{\sqrt{n_{ik}}} \qquad (4.5) $$
where:
• d = precision as a percentage to the mean factor for road group i and
a seasonal factor k;
• CoV = coefficient of variation of the seasonal factor k in road group i;
• n = required number of ATRs to be used in road group i for seasonal
factor k;
• $t_{\alpha/2,\,n-1}$ = value of Student's t statistic at the 100(1 − α) percent confidence level with n − 1 degrees of freedom.
From equation 4.5 the number of ATRs needed to estimate AADT for
road group i and a seasonal factor k can be calculated by:
$$ n = \frac{t^2_{(1-\alpha/2),\,n-1}\, CoV^2 / d^2}{1 + \frac{1}{N}\left(\frac{t^2_{(1-\alpha/2),\,n-1}\, CoV^2}{d^2} - 1\right)} \qquad (4.6) $$
where:
• d = precision as a percentage to the mean factor for road group i and
a seasonal factor k;
• CoV = coefficient of variation of the seasonal factor k in road group i;
• n = required number of ATRs to be used in road group i for seasonal
factor k;
• $t_{(1-\alpha/2),\,n-1}$ = value of Student's t statistic at the 100(1 − α) percent confidence level with n − 1 degrees of freedom;
• N = number of road sections belonging to road group i.
The Student's t statistic can be replaced by the Z statistic when the sample size is larger than 30 sections.
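Since n appears on both sides of equation 4.6 through the degrees of freedom of the t statistic, the computation can be sketched as a small fixed-point iteration; SciPy's t quantile stands in for a tabulated value, and the CoV, precision and group size are invented.

```python
# Sketch of the ATR sample-size computation of Eq. 4.6 (invented inputs).
from scipy import stats

def required_atrs(cov, d, N, alpha=0.05, n_trial=10):
    """Iterate Eq. 4.6, updating the t quantile with the current n."""
    for _ in range(20):
        t = stats.t.ppf(1 - alpha / 2, n_trial - 1)
        base = (t ** 2) * (cov ** 2) / (d ** 2)
        n = base / (1 + (base - 1) / N)
        n_trial = max(2, round(n))
    return n_trial

# e.g. CoV = 0.15, 10% precision, 95% confidence, N = 50 sections in the group
print(required_atrs(cov=0.15, d=0.10, N=50))
```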
If counts are routinely taken over a nine-month period, the one month
with the most variable monthly adjustment factor (among those nine months)
should be used to determine the variability of the adjustment factors and
should thus be used to determine the total sample size desired. In that way,
factors computed for any other month have higher precision. For most factor groups, at least six continuous counters should be included within each
factor group. This is an initial estimation based on AADT factor groups. If
it is assumed that some counters will fail each year because of equipment,
communications, or other problems, a margin of safety may be achieved by
adding additional counters.
4.4 Alternatives to Factor Approach
An alternative to factoring exists. This technique is not commonly used,
but it is appropriate where factor groups are not readily known and the
annual traffic estimate must be very accurate. Previous work showed that for
volume counts by vehicle classification, it was possible to achieve accurate
annual estimates by taking 4 week-long STCs per year at the same location
(Hallenbeck and O’Brien 1994). This approach provides sufficient data to
overcome the primary sources of variation in the data collection process:
• Taking week-long counts removes the day-of-week variation;
• Counting at the same location four times at equally spaced intervals
removes the majority of seasonal bias.
4.5 Specificities of Truck Vehicles
As already discussed, the traditional factor approach can be applied to truck traffic patterns and, in general, to multiple vehicle classes. However, the characteristics that need to be accounted for can be very different, and in practice some points must be considered:
• Functional class of roadway has been shown to have a very inconsistent relationship to truck travel patterns (Hallenbeck et al. 1997). Local
truck traffic can be generated by a single facility such as a factory, or
by a wider activity such as agriculture or commercial and industrial
centers. These ’point’ or ’area’ truck trip generators create specific seasonal and day-of-week patterns much like recreational activity creates
specific passenger car patterns.
• Geographic stratification and functional classification can be used to
create truck volume factor groups that capture the temporal patterns
and are reasonably easy to apply. Also Clustering analysis can be
appropriate to identify the natural patterns of variation and to place
the continuous counters in the road groups.
• Independently of the approach adopted for the identification of road
groups, the information on variability must be reviewed to determine
whether the roads grouped together actually have similar truck travel
patterns. In practice no designed group will be optimal for all purposes or apply perfectly to all sites. At the same time, by changing
the road groups, it may be possible to classify roads so that all roads
have similar travel patterns for one vehicle class, but for another class
patterns become highly variable. At some point, the analyst will need
to determine the proper balance between the precision of the group
factors developed for these two classes, or they will have to accept the
fact that different factor groups are needed for different vehicle classes.
Then each road may end up in multiple factor groups depending on
what vehicle classification volume is being factored. Use of multiple
groups may result in a more accurate factor process but will certainly
result in a more complicated and confusing procedure.
• If very precise adjustment factors are desired, it is possible that the
factor process will require different factor groups for each vehicle
class. In such a case, each class's volumes may need to be adjusted using a specific factor process, and the independently obtained volume estimates need to be added to produce the total AADT estimate.
Chapter 5
Review of Traffic Monitoring
Guide
Given its importance in the U.S., the FHWA factor approach has been
analysed and revised by many researchers over the years. In this section the main findings of these efforts are reported, with the aim of identifying the state of the art on this specific topic.
5.1 Introduction
The estimation of AADT based on factor approach and the combined
use of limited observation of traffic flow and road groups has been practiced
for nearly 40 years. Bodle (1967) classified the sources of possible errors of
the factor approach into three categories:
1. Error due to the day-to-day variations in traffic volumes;
2. Error in grouping of road segments and the use of wrong adjustment
factors;
3. Error in assigning the road segment where SPTC is obtained to the
road group.
The fact that traffic volumes fluctuate constantly presents a problem
when estimating AADT. This is an issue common to any estimation
problem in the transportation field. The FHWA factor approach tries to
reduce this effect, describing temporal traffic variations with seasonal adjustment factors for different period of the year, road groups and vehicle
classes.
The other sources of possible errors identified by Bodle deal more with
the correct implementation of FHWA factor approach in practice. Before
discussing them, some details about the relative importance of these sources of error are given in the next section.
5.2 Bias and Precision in MDT Estimation
Following the indications given by Davis (1997), Annual Average Daily
Traffic (AADT) can be considered an estimate of the Mean Daily Traffic
(MDT), which has to be considered as the expected daily traffic volume on
a ”typical day”.
Consider the simple case of determining the value of MDT using a 24h SPTC, denoted by z. The value of z can vary from day to day both systematically, due to fairly predictable seasonal and day-of-week trends, and randomly, due to the unpredictable decisions of drivers.
FHWA factor approach can be represented by a multiplicative model:
$$ E[z] = M_i W_j z_0 \qquad (5.1) $$
where:
• z = the daily traffic count made during month i, i = 1, . . . , 12, and day-of-week j, j = 1, . . . , 7;
• E[·] = the expected value of the random variable inside the square brackets;
• z0 = the Mean Daily Traffic;
• Mi = the adjustment factor for month i;
• Wj = the adjustment factor for day-of-week j.
In statistics, an estimator of a parameter is said to be unbiased if the expected value of that estimator equals the parameter's true value; the difference between these two values is called the bias of the estimator. No such agreement exists for precision; one common viewpoint refers to the variance of an estimator about its expected value.
Equation 5.1 indicates that a single-day count will be an unbiased estimator of the MDT z0 only if the product $M_i W_j$ of the monthly and day-of-week factors equals 1. Otherwise, if the adjustment factors $M_i$ and $W_j$ are already known, an unbiased estimator of z0 can be written as:
$$ E\left[\frac{z}{M_i W_j}\right] = z_0 \qquad (5.2) $$
In other terms, Equation 5.2 states that the factor approach reduces bias only insofar as the group adjustment factors calculated from continuous recorders represent the sample site's true ones well. From this point of view, the correct assignment of the sample site to the correct group (i.e. the use of the correct adjustment factors) is crucial for obtaining reliable estimates.
With ẑ an estimator of z0, the percentage error of estimation (PE) can be defined as:

$$ PE = \frac{\hat{z} - z_0}{z_0} \times 100 \qquad (5.3) $$
Since ẑ is a random outcome that depends on the sample of traffic counts,
some samples will give large values of PE, while others will give small ones. The
measure of the theoretical tendency of an estimator ẑ to produce estimates
close to z_0 is the root mean square percentage error (RMSPE), defined as:
RMSPE = √( E[PE²] )    (5.4)
Letting z̄ = E[ẑ] denote the expected value of estimator ẑ, the RMSPE of
ẑ can be calculated as:

RMSPE = √( E[PE²] )    (5.5)
      = √( Var[PE] + (E[PE])² )    (5.6)
      = √( E[(PE − E[PE])²] + (E[PE])² )    (5.7)
Since

E[PE] = E[ (ẑ − z_0)/z_0 × 100 ] = (z̄ − z_0)/z_0 × 100    (5.8)
therefore

RMSPE = √( E[ ((ẑ − z_0)/z_0 − (z̄ − z_0)/z_0)² ] + ((z̄ − z_0)/z_0)² ) × 100    (5.9)

      = √( E[ ((ẑ − z̄)/z_0)² ] + ((z̄ − z_0)/z_0)² ) × 100    (5.10)

      = √( E[(ẑ − z̄)²] + (z̄ − z_0)² ) / z_0 × 100    (5.11)

Finally:

RMSPE = √( E[(ẑ − z̄)²] + (z̄ − z_0)² ) / z_0 × 100    (5.12)
The result obtained highlights that the RMSPE can be divided into two
distinct terms, each of them contributing to the estimator's mean distance
from z_0:
1. The variance of the estimator ẑ (the left-hand term under the radical
sign);
2. The square of the bias of ẑ (the right-hand term under the radical sign);
If the estimator ẑ is unbiased, meaning that z̄ = z_0, the bias term in
Equation 5.12 equals zero and the RMSPE is equal to the coefficient of variation
CoV of ẑ. It is important to notice that variance and bias have two distinct
origins:
• The variance of ẑ depends on the random day-to-day variability in
traffic counts and usually decreases as the sample size increases;
• The bias of ẑ is largely unrelated to sample size; it derives instead from an incorrect
accounting of seasonal and day-of-week trend variability.
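The distinct origins of variance and bias can be illustrated with a small simulation, a sketch assuming a simple multiplicative model with invented figures: averaging more sample days shrinks the random term, while a wrongly assigned adjustment factor leaves a persistent bias.

```python
import random

random.seed(0)
z0 = 10000                      # true MDT (hypothetical)
m_true, m_wrong = 1.20, 1.00    # true vs. incorrectly assigned factor

def sample_count():
    # daily count: systematic factor times random day-to-day noise
    return z0 * m_true * random.gauss(1.0, 0.10)

def mean_pe(factor, n_days):
    # percent error (Eq. 5.3) of the factored mean of n_days counts
    est = sum(sample_count() / factor for _ in range(n_days)) / n_days
    return (est - z0) / z0 * 100

# With the correct factor the error shrinks as days are averaged;
# with the wrong factor a systematic ~20% bias remains regardless of n.
print(abs(mean_pe(m_true, 1000)) < 2)   # True: variance averaged out
print(mean_pe(m_wrong, 1000) > 15)      # True: bias persists
```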
This analysis of variance and bias is important for its practical consequences in AADT estimation:
• The use of SPTCs is expected to provide an accurate estimate of AADT
(within 15% of the true value) only if the seasonal adjustment factors are close to the site's true adjustment factors;
• The assignment of a section monitored with SPTCs to the incorrect
road group can lead to a tripling of the estimation error (Davis 1996).
It is therefore necessary to consider the importance of grouping road
segments and assigning them to the correct road group. In the following
sections each aspect is considered in detail and some of the most important
improvements proposed in recent years are introduced.
5.3 Grouping of Road Segments
As introduced in section 4.3.2, the TMG suggests three ways to establish road groups based on the data obtained at AVC sites: geographical/functional classification, ”same road” application of the factors, and
clustering analysis. The choice of the ”best” method depends on the availability of data and on the analyst's knowledge of the roadway network.
Since functional classification has been proved not to be effective in
many situations (in particular for truck vehicles (Hallenbeck et al. 1997))
and ”same road” application of the factors requires a dense network of AVC
sites with high costs, the most popular approach is clustering analysis.
However, the application of clustering analysis can have some drawbacks:
1. The road groups formed may not have both a clear
mathematical definition and a linguistic definition in terms of geographical/functional characteristics (FHWA 2001);
2. It is often difficult to establish an ”optimal” number of groups,
whether using a functional or a cluster-based definition of road groups (FHWA
2001);
3. The clusters may not be the same over the years; that is, the group
to which an ATR site belongs can change over time (Faghri, Glaubitz,
and Parameswaran 1996; Ritchie 1986).
Different authors have tried to improve the accuracy of the grouping step
by adopting different methods, including Genetic Algorithms (GAs), Artificial Neural Networks (ANNs), Regression Analysis and a large number of
clustering techniques.
Genetic Algorithms (GAs) are machine learning techniques, which reproduce the processes of evolution in nature. The origin of GAs is attributed
to Holland’s work (Adaptation in Natural and Artificial Systems) on cellular
automata and there has been significant interest in GAs (Buckles and Petry
1994) in different fields, including job shop scheduling, training neural nets,
image feature extraction, and image feature identification.
Lingras (2001) applied GAs to the definition of road groups from 264
monthly traffic patterns recorded between 1987 and 1991 on Alberta highways and compared the results with the traditional hierarchical grouping approach (Sharma and Werner 1981). Groupings obtained from both the hierarchical and the GA approaches were analyzed for different numbers
of groups, considering the values of the within-group errors as a quality
measure. The reported results suggest that for a smaller number of
groups GAs are able to provide a better classification scheme, while
for a larger number of groups hierarchical grouping may be a more
suitable choice. In particular, the hierarchical grouping error was found to
be as high as 40% above the GA error for the three-group classification,
but when the number of groups was larger than 14, hierarchical grouping
outperformed GAs.
ANNs have been implemented by many authors to define road groups
of permanent counters (AVCs).
Lingras (1995) adopted an unsupervised learning model to deal with
the traffic pattern classification. In his study he implemented a Kohonen
network (see section 2.3.3) to group patterns represented by k-dimensional
vectors into m groups (Figure 5.1).
In competitive learning neural networks, the output neurons compete
with each other: the winning output neuron has an output of 1, while the rest of
the output neurons have outputs of 0. This means that a given pattern is
classified into exactly one of the mutually exclusive classes (hard clustering). Lingras compared the Kohonen Neural Network approach with the
hierarchical grouping approach to classify seasonal traffic patterns from 72
permanent traffic counters in Alberta.
Figure 5.1: Example of Kohonen Neural Network. Source: Lingras 1995
The comparative analysis was based on the similarity of grouping obtained with the two approaches. The results highlighted that:
• Road Groups determined were very similar and the Kohonen Neural
Network approach could be used to approximate the hierarchical
grouping technique;
• The classification was better with hierarchical grouping when a small
number of groups was considered, and with Kohonen Neural Networks
when more input factors were used.
• Kohonen Networks could be successfully applied when further data
need to be added or when incomplete patterns (for example due to
equipment failures) were available.
• The use of Kohonen Networks could be extended to cope with other
types of inputs, such as day-to-day and hour-to-hour variations, offering greater flexibility of use compared to the hierarchical approach.
The results proposed were interesting, but no details were given by the
author on the final estimate of AADT.
Faghri and Hua (1995) similarly applied Artificial Neural Networks to
determine seasonal factors and estimate AADT, comparing the results
obtained with existing approaches. In this case Adaptive Resonance Theory (ART) was proposed by the authors (see section 2.2.3). Data from 29
ATR sites were provided by the Delaware Department of Transportation and,
based on these data, 12 monthly adjustment factors were calculated for each
ATR site. ART1 results were compared with those obtained using clustering analysis (Ward's method, the average linkage method, and the centroid
method) and regression analysis.
The linear model adopted in the case study was:
fsm = α0m + α1m X1 + α2m X2 + α3m X3
(5.13)
where X1 = 1 if rural and 0 if urban, X2 = 1 if recreational and 0
otherwise, X3 = 1 if recreational and arterial, 0 otherwise.
The comparison was based on the deviation of the estimated seasonal
factors sf obtained from clustering analysis (clus), regression (regr) and ART1
(ART1) from the actual ones (act), calculated for each ATR i and month j:

err_clus(j) = Σ_i [ sf_clus(i, j) − sf_act(i, j) ]²    (5.14)

err_regr(j) = Σ_i [ sf_regr(i, j) − sf_act(i, j) ]²    (5.15)

err_ART1(j) = Σ_i [ sf_ART1(i, j) − sf_act(i, j) ]²    (5.16)
The average errors for the different methods were calculated by:

avgerr_clus = (1/12) Σ_j err_clus(j)    (5.17)

avgerr_regr = (1/12) Σ_j err_regr(j)    (5.18)

avgerr_ART1 = (1/12) Σ_j err_ART1(j)    (5.19)
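A sketch of the error measures of Equations 5.14–5.19 (the seasonal factors below are invented; a real comparison would use the estimated and actual factors of the 29 ATR sites):

```python
# Squared-deviation errors of estimated seasonal factors (Eqs. 5.14-5.19).
# sf_est[(i, j)] and sf_act[(i, j)] hold the factor of ATR site i for
# month j; all numbers below are invented for illustration.

def err(sf_est, sf_act, month, sites):
    # Eqs. 5.14-5.16: sum of squared deviations over ATR sites, one month
    return sum((sf_est[(i, month)] - sf_act[(i, month)]) ** 2 for i in sites)

def avg_err(sf_est, sf_act, sites, months=range(1, 13)):
    # Eqs. 5.17-5.19: average of the monthly errors over the 12 months
    return sum(err(sf_est, sf_act, j, sites) for j in months) / 12

sites = [1, 2]
sf_act = {(i, j): 1.0 for i in sites for j in range(1, 13)}
sf_est = {(i, j): 1.1 for i in sites for j in range(1, 13)}

# each site deviates by 0.1 every month -> err(j) = 0.02, average = 0.02
print(round(avg_err(sf_est, sf_act, sites), 4))  # 0.02
```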
ART1 was found to produce substantial improvements compared to the
results of clustering and regression analysis. In fact the average error obtained by ART1 was an 81% improvement on the regression analysis results
and an 85% improvement on the cluster analysis. Based on these results the authors
indicated that the neural network approach was significantly better for estimating the seasonal factors and provided more accurate results. However,
as observed for Lingras (1995), no details were given by the authors on the
final estimates of AADT.
5.3.1 Clustering techniques
As written in section 4.3.2, the TMG suggests the implementation of Ward's
agglomerative hierarchical method, which minimizes the increase in total
within-cluster sum of squared error.
Probably this choice was based on the study of Sharma and Werner
(1981), who applied Ward's method to group 45 permanent traffic counters (PTCs) in Alberta, Canada, based on monthly adjustment factors. They
combined Ward's method with Scheffé's S-method of multiple comparisons of group means to determine the optimal number of groups. The
approach was successfully implemented in other contexts (Sharma and Allipuram 1993; Sharma et al. 1986) and became a reference method for other
researchers (e.g. Faghri and Hua 1995; Lingras 1995).
However, other clustering techniques have been applied to road grouping of ATR or AVC sites.
Flaherty (1993) used the hierarchical clustering method and the k-means
method to analyze the monthly factor data collected over a 5-year period
from 28 PTCs in Arizona. The comparative analysis highlighted that the functional classification of grouped roads was not relevant for the similarity of
monthly factor patterns; conversely, geography and topography were more
important for grouping similar roads.
Schneider and Tsapakis (2009) and Tsapakis et al. (2011) combined geographical classification and the k-means method to identify road groups. The Ohio highway
network considered in the analysis was divided into areas with different geographical characteristics. For each area the k-means method was implemented,
analysing the effect of different vehicle classes (automobiles and trucks) and
direction of flow at each site.
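The grouping step these studies share can be sketched with a toy k-means implementation (stdlib only; the monthly-factor vectors are invented and truncated to four months for brevity, and a real study would use a library implementation with proper initialisation):

```python
import math

def kmeans(points, k, iters=20):
    """Tiny k-means on equal-length factor vectors (naive initialisation)."""
    centers = points[:k]
    for _ in range(iters):
        # assign each site to its nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            c = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[c].append(p)
        # recompute centers as the component-wise mean of each group
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return groups

# Invented monthly-factor vectors: two flat "commuter-like" patterns and
# two summer-peaked "recreational-like" patterns (4 months shown).
sites = [
    [1.0, 1.0, 1.0, 1.0],
    [1.1, 1.0, 0.9, 1.0],
    [0.6, 0.8, 1.6, 1.8],
    [0.5, 0.9, 1.7, 1.9],
]
groups = kmeans(sites, k=2)
print([len(g) for g in groups])  # [2, 2]
```

After a couple of iterations the flat patterns and the peaked patterns separate into two groups of two sites each, mirroring the commuter/recreational split reported in these studies.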
Recently some authors have compared the results obtained by applying different clustering techniques to the same dataset, with the aim of identifying
the ”best” method to be applied for road grouping.
Li et al. (2003) compared eight agglomerative clustering methods based
on data collected in Florida. Data from 21 ATRs were used to calculate
monthly adjustment factors to be used as attributes for the clustering analysis. As done by Faghri and Hua (1995), the quality of clusters was tested
analysing the average pooled variance of cluster groups and the temporal
stability of results. The study found that the average linkage, centroid, and single linkage methods were more robust to outliers than the other methods.
McQuitty's method performed better than the other methods at grouping
ATR sites after outliers were eliminated. However, the compositions of seasonal groups were not stable over time and the authors suggested that other
variables should be included in the grouping to cope with the change in the
spatial clustering.
Zhao, Li, and Chow (2004) extended the previous research by applying
model-based clustering analysis to monthly adjustment factors from 129
ATR sites located in Florida rural areas. To evaluate the quality of clustering,
the analysis process included:
1. The implementation of model-based strategy for clustering 2 to 100
groups using monthly adjustment factors as input variables;
2. The same implementation, adding the coordinates of the ATR sites
to the input variable dataset.
Also in this case the quality of results was evaluated analysing the homogeneity of road groups. An ATR site was considered correctly classified if its
adjustment factors did not exceed the threshold defined as 10% of the mean
adjustment factors obtained for the road group. The results showed that
model-based clustering methods, such as the EEV model, could produce
classifications with negligible grouping error (2.08%) when adjustment factors data were used in the analysis. However ATR sites belonging to the
same road group were over-dispersed in space. By incorporating coordinates of the ATR sites in the clustering it was found that the EEI model was
the best one since it produced the least grouping error.
Finally, Gecchele et al. (2011) compared the implementation of hierarchical, partitioning and model-based clustering techniques for the creation
of road groups based on adjustment factors. 54 AVC sites located on the
rural road network of the Province of Venice in 2005 were analysed, considering directional traffic variability and a 2-class scheme which differentiates
between passenger vehicles (PV) and truck vehicles (TV) with reference
to a 5 m length threshold. Seasonal adjustment factors were calculated for
both classes and used together as input to the clustering techniques.
Differently from previous studies, the quality of the road groups obtained
with the different methods was determined analysing the accuracy of AADT
estimates. Following a procedure adopted by other authors (Sharma et al.
1999, 2000), 24hr sample counts were generated from the dataset and, for
each classification made by the clustering methods, were used as follows:
1. An AVC site was removed from the road group to which it belonged
to create sample counts;
2. The AVC site removed was called ’sample AVC site’ and the road
group from which it was taken ’home group’;
3. Seasonal adjustment factors were calculated for the home group, excluding the sample AVC site;
4. The factors thus created were used to estimate AADT from samples
generated by the sample AVC site;
5. The process was repeated using each section at a time as a sample
AVC site, generating a large number of samples and AADT estimates.
For each AADT estimate the absolute percent error was calculated by:

Δ = |AADT_Estimate − AADT_Actual| / AADT_Actual × 100    (5.20)
The analysis of the errors obtained with the various methods was developed considering the Mean of the Absolute percent error (MAE) for
different tests: by vehicle type (passenger and truck vehicles), by different
day-types and by different periods of the year. MAE was calculated by:
MAE = (1/n) Σ_{i=1}^{n} ( |AADT_i,Estimate − AADT_i,Actual| / AADT_i,Actual ) × 100    (5.21)
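The error measures of Equations 5.20 and 5.21 can be sketched as follows (the estimate/actual AADT pairs are invented placeholders for the leave-one-out estimates described above):

```python
def abs_pct_error(aadt_est, aadt_act):
    # Eq. 5.20: absolute percent error of one AADT estimate
    return abs(aadt_est - aadt_act) / aadt_act * 100

def mae(pairs):
    # Eq. 5.21: mean absolute percent error over n (estimate, actual) pairs
    return sum(abs_pct_error(e, a) for e, a in pairs) / len(pairs)

# Invented pairs, e.g. produced by removing each AVC site in turn from
# its home group and factoring its 24hr sample counts:
pairs = [(10500, 10000), (9200, 10000), (19000, 20000)]
print(round(mae(pairs), 2))  # 6.0
```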
The results were that:
• Clustering methods identified a common basic structure of AVC groups
with functional and geographical significance. Model-based clustering methods showed slightly better performance compared to the
other methods and had a robust mathematical structure;
• Error patterns showed worse results in AADT estimation in some
specific cases:
– Analyzing day-type, Saturdays and Sundays had higher errors
than Weekdays;
– Analyzing the period of the year, summer period showed higher
errors than winter period;
– Analyzing the vehicle type, the errors were higher for truck
vehicles than passenger vehicles.
Since a large difference was observed between the AADT estimation errors
for passenger and truck vehicles, the authors developed a similar analysis
creating separate road groups for passenger and truck vehicles, with the
aim of better reproducing the temporal traffic patterns of both vehicle classes
(Rossi, Gastaldi, and Gecchele 2011).
In this case AADT estimates for passenger and truck vehicles were calculated separately using 48-hr sample counts generated from
the main dataset. Mean Absolute Errors (MAEs) were calculated for passenger and truck vehicles, considering the day of the week (weekday or
weekend) and the period of the year in which the short counts were taken.
The results were analysed using the Friedman Test (a non-parametric alternative to the one-way ANOVA with repeated measures), and the most
relevant findings were that:
• MAE patterns were similar to the results obtained by previous studies,
considering when SPTCs are taken and vehicle classes;
• The use of different seasonal factors for each vehicle class affected the
number and the characteristics of the road groups identified. The use
of road groups which consider the specificity of traffic patterns for
passenger and truck vehicles has a positive effect on the accuracy of
AADT estimates;
• Clustering methods showed common error patterns and gave comparable results in terms of accuracy of AADT estimates.
5.4 Assignment of Road Segments
As introduced in section 5.2, the assignment of a road segment monitored with SPTCs to the correct road group represents the most critical
aspect of the FHWA factor approach. In particular, it was found that two-week
counts taken in different months represent the minimum seasonal count needed to
guarantee the correct assignment (Davis and Guan 1996; Davis 1997).
To minimize the risk of large errors, the TMG suggests the use of weekly
counts repeated in different periods of the year (STCs) in order to capture
the seasonal variability of the monitored road sections. However there are
no further specifications about the assignment process.
This lack of information has led to the growth of a large number of
different methods, in some cases alternative to the factor approach, based
on different types of data. Following the scheme of Figure 5.2, they have
been reviewed and the major findings are reported in the next sections.
Figure 5.2: Scheme of Assignment Strategies
5.4.1 Assignment Using Traffic Patterns
The traffic patterns of short counts can be considered the most
important information for the assignment of a road section to a road group.
Different methods have been proposed to accomplish this task, given a
certain number of SPTCs or STCs at the specific road section.
One of the first proposals was made by Sharma and Allipuram (1993),
who developed an Index of Assignment Effectiveness to correctly assign
STCs to road groups. In their paper they analysed the use of Seasonal
Traffic Counts (STCs) and developed a method for the assignment of STCs
to road groups, rationalizing the length and frequency of STCs for various
types of road facilities. Finally, they suggested STC schedules for traffic
monitoring. Their dataset included 61 PTC sites in Alberta (Canada) monitored during 1989. Based on monthly adjustment factors, seven road groups
were identified using the approach of Sharma and Werner (1981).
The study considered the appropriateness of a schedule in terms of its
ability to correctly identify the pattern of seasonal variation at the location
of the traffic count. Different STC schedules S(L, F) were defined, where L is
the length of a count in weeks and F the frequency of counts in a year.
The data from permanent counters allowed the generation of sample STC
data: 12 STC samples were built for each schedule S(L, F). The method of
assignment proposed had two parts, the first concerning the PTC and the
second one the STC.
The first part of the assignment considers the PTCs and can be subdivided
into the following steps:
• Choice of a Schedule S(L, F);
• Selection of a sample PTC and generation of sample STCs according
to the schedule chosen [12 samples];
• Computation of an array of potential errors associated with the assignment of STCs to each road group:

MSES_i = (1/12) Σ_{j=1}^{12} ( f_sj − f_ij )²    (5.22)

where MSES_i is the Mean Squared Error associated with the assignment of the STC to the i-th road group, f_sj is the monthly traffic factor of the
STC for month j and f_ij is the average monthly factor of road group i
for month j.
• Scaling of performances, giving 100% effectiveness for the correct
assignment (to the group the sample PTC belonged to) and 0% for the
worst one. A linear scale was used for intermediate values:

AE_i = (max MSE − MSE_i) / (max MSE − min MSE) × 100    (5.23)

where AE_i is the effectiveness of the assignment to road group i.
Each PTC was chosen in turn as a sample PTC, STCs were generated and the Assignment Effectiveness values for the road groups were calculated.
However, when STCs are taken the actual values of the monthly factors f_sj
are not available, due to the limited amount of data. Therefore a specific
analysis must be conducted:
• Computation of the Sample Average Daily Traffic (SADT) at the sample site, combining the sample values monitored:

SADT = (Total volume counted during schedule S(L, F)) / (Total number of days counted during schedule S(L, F))    (5.24)
• Computation of the Average Daily Traffic ADT_k and the seasonal
traffic factors f_sk for the k-th count of the schedule S(L, F), using the
relationships:

ADT_k = (Total volume during the k-th visit) / (Number of days counted in the visit)    (5.25)

f_sk = ADT_k / SADT    (5.26)
• Use of the average monthly factors of the road groups f_ik to describe the
seasonal traffic variation in the months in which the STC counts were
undertaken, where i represents a road group and k a season (month).
• Comparison of the sample f_sk values with the f_ik values for each road
group, using as error relationship:

MSES_i = (1/F) Σ_{k=1}^{F} ( f_sk − f_ik )²    (5.27)

where MSES_i is the Mean Squared Error associated with the assignment of the SPTC to the i-th road group and F is the frequency or number
of counts in the schedule S(L, F). The corresponding AE value can
be found considering the STC as belonging to the road group with the
minimum error of assignment.
• Computation of the Index of Assignment Effectiveness (IAE), taking
repeated samples of a given schedule S(L, F) and repeating steps
1 to 4. The resulting AE values can be used to compute the required
IAE by:

IAE = (1/N) Σ_{i=1}^{I} n_i (AE_i)    (5.28)

where n_i is the number of times the sample site is assigned to road
group i, AE_i is the AE value for road group i, I is the total number of
road groups and N is the total number of samples of a given S(L, F)
taken at the sample site.
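The assignment-effectiveness computation (Equations 5.22, 5.23 and 5.27) can be sketched as follows, with invented group factor patterns truncated to four months:

```python
def mses(f_sample, f_group):
    # Eq. 5.22 / 5.27: mean squared error between the sample's seasonal
    # factors and a road group's average factors
    return sum((s - g) ** 2 for s, g in zip(f_sample, f_group)) / len(f_sample)

def assignment_effectiveness(f_sample, groups):
    # Eq. 5.23: scale the MSEs linearly, 100% = best group, 0% = worst
    errors = {name: mses(f_sample, f) for name, f in groups.items()}
    lo, hi = min(errors.values()), max(errors.values())
    return {name: (hi - e) / (hi - lo) * 100 for name, e in errors.items()}

# Hypothetical group factor patterns (4 seasons shown instead of 12 months)
groups = {
    "commuter":     [1.0, 1.0, 1.0, 1.0],
    "recreational": [0.5, 0.9, 1.7, 1.9],
}
ae = assignment_effectiveness([1.05, 0.95, 1.0, 1.0], groups)
print(ae["commuter"], ae["recreational"])  # 100.0 0.0
```

The IAE of Equation 5.28 is then the sample-weighted mean of the AE values obtained over repeated samples of the same schedule.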
The Sample Average Daily Traffic (SADT) was also used as a measure
of schedule evaluation. In fact, the closer the SADT is to the actual AADT value,
the more the chosen schedule is capable of describing the temporal variations
of traffic patterns. The absolute value of the percent difference (PDA) between
the SADT and the actual AADT was used to evaluate the quality of schedules:

PDA = |SADT − AADT| / AADT × 100    (5.29)
The authors produced tables with the IAE and PDA values obtained for different schedule combinations. The results highlighted that an
increase in the frequency or duration of STCs increased the IAE and reduced the
PDA. These trends differed among the road groups, as a consequence
of the specific characteristics of the traffic patterns observed in each road group.
Given these findings, the choice made by a highway authority would
be the result of a trade-off between the accuracy of results and the cost
of alternative schedules. The authors suggested adopting values of 95% or
higher for the IAE and 2% or lower for the PDA. Using these values, which
ensure a fairly accurate assignment, the authors determined the minimum
requirements for schedules S(L, F) for different road groups.
Table 5.1: Suggested Frequency of Counts of Different Durations. Source: Sharma and
Allipuram 1993

Type of route    L=1  L=2  L=3  L=4
Commuter          4    3    3    3
Average rural     6    4    3    3
Recreational      -    6    6    5
For rural highways in Alberta, the AADT volume error margins (precision at a 95% confidence level) found in that study were:
1. ±8.00% for two noncontinuous month counts during a year;
2. ±5.60% for three noncontinuous month counts during a year;
3. ±3.61% for four noncontinuous month counts during a year.
Sharma, Gulati, and Rizak (1996) extended the findings of the previous
paper, considering the importance of the accuracy of AADT estimates, based on
data from 63 ATR sites located on Minnesota rural roads. In Minnesota,
48hr SPTCs were adopted as standard sample counts, starting at 12.00 noon
on Monday, Tuesday or Wednesday from May to September. The authors
defined road groups (ATRGs) using Sharma and Werner (1981)'s approach,
based on 15 AF factors, which accounted for the starting day and the month
of SPTCs (3 days by 5 months), that is:
AF(n, d, m) = AADT / (Average n-hour volume_{d,m}) × n/24    (5.30)

where AF(n, d, m) is the adjustment factor (AF) for n (e.g., 48) hours of
counting starting at noon on day d (e.g., Monday) in month m (e.g., July)
and AADT is the average annual daily traffic at the ATR site.
The authors analysed the effects of different SPTC durations (n =
24, 48, 72 hours), extracting from each ATR site 6 counts/month for each
duration. In total, 30 × 63 = 1890 SPTCs were generated for each duration.
The estimate of AADT (EAADT) was calculated by:

EAADT = (24 × SV(n, d, m) / n) × AF(n, d, m)    (5.31)
where SV(n, d, m) was the sample volume.
The estimation error E for a sample was calculated by:

E = (EAADT − AADT) / AADT × 100    (5.32)
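Equations 5.30–5.32 can be sketched numerically (all volumes below are invented):

```python
def adjustment_factor(aadt, avg_nh_volume, n):
    # Eq. 5.30: AF(n, d, m) for an n-hour count window on a given
    # start-day/month combination
    return aadt / avg_nh_volume * n / 24

def eaadt(sample_volume, af, n):
    # Eq. 5.31: expand an n-hour sample volume to an AADT estimate
    return 24 * sample_volume / n * af

def error_pct(est, actual):
    # Eq. 5.32: signed percent estimation error
    return (est - actual) / actual * 100

# Invented numbers: a site with AADT 12000 whose average 48h volume
# for that start-day/month combination is 26000 vehicles.
af = adjustment_factor(12000, 26000, 48)
est = eaadt(25000, af, 48)       # one particular 48h sample count
print(round(est), round(error_pct(est, 12000), 2))  # 11538 -3.85
```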
The errors were analysed for the various ATRGs, checking for the normality
of E. In the case study, histograms of the error distribution developed for the various
ATRGs, and for all the ATRGs combined, showed that the variable E
followed a normal distribution and that the mean estimation errors
for all groups were statistically equal to zero at a confidence level of 95%.
Assuming equality of variances for errors at the sample sites, the following relationship could be written:

S_e² = 1/(N − 1) Σ_{i=1}^{n_i} Σ_{k=1}^{n_s} (E_ik − Ē_ik)²    (5.33)

where:
• S2e is the standard deviation of errors resulting from SPTCs from an
ATRG;
• Eik is the error at the i-th sample site for the k-th sample;
• Ēik is the mean value of Eik ;
• ni is the total number of sample sites included in the group;
• ns is the number of samples taken at the sample site i;
• N is the total number of samples resulting from the multiplication of
ni and ns .
The normal distribution of errors with mean 0 and standard deviation
equal to S_e allowed computation of the confidence interval:

±Z_{α/2} × S_e    (5.34)

where α is the confidence coefficient and Z_{α/2} is the standard normal
statistic corresponding to α. Using a confidence level equal to 95%, the
absolute value of the 'lower bound' and the 'upper bound' was denoted
as:

PB95 = |±Z_{0.025}| × S_e = 1.96 S_e    (5.35)
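A sketch of Equations 5.33 and 5.35 with invented per-site percent errors:

```python
import math

def pooled_std(errors_by_site):
    # Eq. 5.33: pooled standard deviation of estimation errors,
    # computed around each site's own mean error
    n_total = sum(len(e) for e in errors_by_site)
    ss = sum(
        (x - sum(e) / len(e)) ** 2
        for e in errors_by_site for x in e
    )
    return math.sqrt(ss / (n_total - 1))

def pb95(se):
    # Eq. 5.35: 95% error bound assuming normally distributed errors
    return 1.96 * se

# Invented percent errors for two sample sites of one ATRG
errors = [[-4.0, 2.0, 5.0, -3.0], [6.0, -2.0, -1.0, -3.0]]
se = pooled_std(errors)
print(round(pb95(se), 1))  # 7.6
```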
The authors first investigated the AADT estimation errors in the case of
correct assignment of SPTCs to the ATR group to which they actually belong
(Tables 5.2 and 5.3).
Table 5.2: Effect of Road Type on AADT Estimation Errors from 48hr SPTCs. Source:
Sharma, Gulati, and Rizak 1996

ATR Group  Road Class             Number of  Mean Error  Standard deviation  Error limit
                                  Samples    [%]         Se [%]              PB95 [%]
All ATRs   All rural highways     1890       -0.21       7.39                14.48
ATRG 1     Regional commuter 1    660        -0.17       5.39                10.56
ATRG 2     Regional commuter 2    510        0.07        5.78                11.33
ATRG 3     Average rural          300        -0.77       7.41                14.52
ATRG 4     Regional/recreational  270        -0.16       12.19               23.89
ATRG 5     Tourist/recreational   150        -0.74       8.20                16.07
However, improper assignment of sample sites to ATR groups resulted in
the use of incorrect adjustment factors and possibly large estimation errors.
The standard deviation (Se) of the errors resulting from the 24hr, 48hr, and
72hr SPTCs was calculated for different AE categories: ≤ 50%, 51−55%, 56−
60%, ..., 91−95%, > 95%. Figure 5.3 shows curves plotted between the Se
values for the counts of various durations and the AE categories.
One can observe that:
Table 5.3: Effect of Count Duration on AADT Estimation Errors. Source: Sharma, Gulati,
and Rizak 1996

ATR Group  Error limit PB95      Error limit PB95      Error limit PB95
           for 24hr counts [%]   for 48hr counts [%]   for 72hr counts [%]
All ATRs   16.50                 14.48                 13.13
ATRG 1     12.89                 10.76                 10.76
ATRG 2     15.23                 11.66                 11.66
ATRG 3     17.01                 14.62                 14.62
ATRG 4     27.15                 23.89                 19.36
ATRG 5     16.50                 16.07                 14.56
• the relationships between the estimation errors Se and assignment
effectiveness AE values indicate that the degree of correctness of sample site assignment to an ATR group has a large influence on the
AADT estimation errors. Small decreases in assignment effectiveness
can result in large increases in the magnitude of AADT errors;
• the AADT estimation errors are more sensitive to the correctness of
sample site assignment to a proper ATR group than to the duration
of counts investigated. A 24hr SPTC at an appropriately assigned site
would result in better AADT estimates than a 72hr SPTC at the same
site when it has been incorrectly assigned.
Davis and Guan (1996) changed the approach to the problem completely,
employing Bayes' theorem to assign a given road section to the road
group with the highest posterior probability, defined by:

Prob[site ∈ G_k | z_1, . . . , z_N] = f(z_1, . . . , z_N | G_k) α_k / Σ_{i=1}^{n} f(z_1, . . . , z_N | G_i) α_i    (5.36)
where:
• f (z1 , . . . , zN | Gk ) = a likelihood function measuring the probability of
obtaining the count sample had the site actually belonged to a given
road group Gk ;
• z1 , . . . , zN = a sequence of N daily traffic counts at a SPTC site;
• G1 , . . . , Gn = a total of n different road groups;
• αk = probability that the given site belongs to Gk (prior classification
probability).
The prior classification probability was set equal to 1/n, to represent
complete uncertainty regarding which road group a SPTC site belonged to. The
following linear regression model was used as the likelihood function in the
posterior classification probability:
Figure 5.3: Effect of AE on AADT Estimation Errors. Source: Sharma, Gulati, and Rizak
1996
y_t = μ + Σ_{i=1}^{12} Δ_{t,i} m_{k,i} + Σ_{j=1}^{7} δ_{t,j} w_{k,j} + ϵ_t    (5.37)
where:
• yt = natural logarithm of the SPTC zt taken of day t;
• µ = expected log traffic count on a typical day;
• ∆t,i = 1 if the count zt was made during month i, i = 1, . . . , 12 and 0
otherwise;
• mk,i = correction term for month i, characteristic of road group k;
• δt, j = 1 if the count zt was made during day j, j = 1, . . . , 7 and 0
otherwise;
• wk,j = correction term for day-of-week j, characteristic of road group
k;
• ϵt is the random error.
It was further assumed that the random errors ϵ_1, . . . , ϵ_N were
distributed with mean value 0 and covariance matrix σ²V, where σ² was
the common unconditional variance of y_t and V is an N × N matrix of correlation coefficients such that the element V_{s,t} in row s and column t is the
correlation coefficient of y_s and y_t.
The model, validated using data from ATR sites in Minnesota, was
shown to be able to produce mean daily traffic estimates (AADT estimates)
that were within ±20% of actual values based on 14 well-selected sampling
days of the year. The main advantage introduced by this method was not an
improvement in precision, but the reduced importance of subjective
judgements in the process of attributing SPTC sites to road groups.
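The assignment rule of Equation 5.36 can be sketched as follows. This is a simplification: it assumes equal priors and independent log-normal daily counts, whereas Davis and Guan's model allows correlated errors; all group patterns and counts are invented.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior(counts, groups, sigma=0.1):
    """Eq. 5.36 with equal priors alpha_k = 1/n (they cancel) and
    independent normal log-counts as the likelihood."""
    like = {}
    for name, mu in groups.items():
        # mu[t]: expected log count on day t under group 'name'
        like[name] = math.prod(
            normal_pdf(math.log(z), m, sigma) for z, m in zip(counts, mu)
        )
    total = sum(like.values())
    return {name: l / total for name, l in like.items()}

# Invented expected log-counts for two road groups over three days
groups = {
    "commuter":     [9.2, 9.2, 9.1],
    "recreational": [9.6, 9.7, 9.8],
}
counts = [9900, 10100, 9000]   # observed daily counts (invented)
post = posterior(counts, groups)
print(max(post, key=post.get))  # commuter
```

The site is assigned to the group with the highest posterior probability; with counts this close to the commuter pattern the posterior is essentially 1 for that group.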
More recently, Linear Discriminant Analysis was tested to determine the group assignment
of SPTCs (24hr), with good results (Schneider and Tsapakis 2009; Tsapakis et al. 2011). In their latest paper Tsapakis et al.
(2011) applied Linear Discriminant Analysis (LDA) to assign SPTCs to road
groups, using continuous traffic volume data from 51 ATR
sites obtained in the state of Ohio during 2005 and 2006. The ATRs were
grouped together by using a combination of geographical classification and
cluster analysis. The k-means algorithm was used to further subdivide the
geographical factor groups by using the 12 monthly factors of each site as
input to the cluster analysis. Seasonal adjustment factors were developed considering the total traffic volume at the site as well as the individual volume
in each direction of travel. The data set was used both as a training set to
estimate seasonal adjustment factors and create factor groupings, and as a
test set from which sample SPTCs were generated. Specifically, every daily
count produced from each ATR was used as a sample 24hr count to be
assigned to one of the previously defined factor groupings.
Two traffic parameters were selected to develop Discriminant Analysis
(DA) assignment models:
• The set of 24 time-of-day factors estimated for weekdays:
Fhh = ADT / HVhh , HVhh ≠ 0 (5.38)

where HVhh is the hourly volume that corresponds to the hh-th hour of a
day and hh = 1, 2, . . . , 24;
• the ADT, which represents the total traffic volume per day on a roadway section.
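As a quick numeric illustration of Equation 5.38, the sketch below computes the time-of-day factors from a hypothetical set of hourly volumes (all values are invented for illustration):

```python
import numpy as np

# Hypothetical 24 hourly volumes (veh/h) for one day at a count site.
hourly_volumes = np.array([
    120, 80, 60, 55, 70, 150, 420, 610, 540, 480, 450, 470,
    500, 480, 460, 520, 640, 700, 560, 420, 330, 260, 200, 150
])

# ADT: total daily traffic volume at the site.
adt = hourly_volumes.sum()

# Time-of-day factors F_hh = ADT / HV_hh (Equation 5.38),
# defined only where the hourly volume is non-zero.
time_of_day_factors = adt / hourly_volumes
```

Each factor expands its hourly volume back to the daily total, i.e. HVhh × Fhh = ADT for every hour hh.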
The methodology of the study relied on the assumption that the two
variables may be used interchangeably or in combination to identify sites
with similar traffic behavior, since together they capture both the variability and the magnitude of the daily traffic at a specific location. In fact,
two stations may exhibit similar traffic patterns but carry significantly different traffic volumes, and vice versa.
Given the set of independent variables, LDA attempted to find linear
combinations of those variables that best separate the groups of cases. These
combinations are called discriminant functions and have the form shown
in Equation 5.39:
dik = b0k + b1k xi1 + . . . + bpk xip
(5.39)
where:
• dik is the value of the kth discriminant function for the ith case (standardized score);
• p is the number of predictors;
• b jk is the value of the jth coefficient of the kth function;
• xij is the value of the ith case of the jth predictor.
The estimation of the discriminant functions is made using the means of
the p predictors and the pooled within-group variance-covariance matrix W,
which is generated by dividing each element of the cross-products matrix
by the within-group degrees of freedom:
Dc = W−1 × Mc
(5.40)
dc0 = −0.5 × Dc × Mc
(5.41)
where:
• Dc = (dc1 , . . . , dcp );
• W−1 is the inverse of the within-group variance-covariance matrix;
• Mc = (Xc1 , . . . , Xcp ), means for function c on the p variables;
• dc0 is the constant for function c.
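A minimal numpy sketch of this classification rule (Equations 5.39–5.41) on synthetic data follows; it assumes equal group priors, and all values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: p = 3 traffic variables observed at sites
# belonging to two road groups (values invented for illustration).
group_a = rng.normal(loc=[1.0, 2.0, 0.5], scale=0.2, size=(30, 3))
group_b = rng.normal(loc=[2.0, 1.0, 1.5], scale=0.2, size=(30, 3))
X = np.vstack([group_a, group_b])
labels = np.array([0] * 30 + [1] * 30)

# Pooled within-group variance-covariance matrix W: cross-products of
# group-centered data divided by the within-group degrees of freedom.
centered = np.vstack([g - g.mean(axis=0) for g in (group_a, group_b)])
W = centered.T @ centered / (len(X) - 2)

def discriminant_scores(x):
    """Evaluate one discriminant function per group: coefficients
    D_c = W^-1 M_c and constant d_c0 = -0.5 D_c . M_c
    (Equations 5.40-5.41), plugged into Equation 5.39."""
    scores = []
    for g in (group_a, group_b):
        m = g.mean(axis=0)            # M_c: group means of the p predictors
        d = np.linalg.solve(W, m)     # D_c = W^-1 x M_c
        d0 = -0.5 * d @ m             # d_c0
        scores.append(d0 + d @ x)     # d_ik = b_0k + sum_j b_jk x_ij
    return np.array(scores)

# Assign each case to the group whose discriminant score is highest.
predicted = np.array([np.argmax(discriminant_scores(x)) for x in X])
accuracy = (predicted == labels).mean()
```

With well-separated synthetic groups the rule recovers the group labels almost perfectly; on real traffic data the separation is of course far less clean.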
Two full discriminant models formed the basis for the model development. The first model (Equation 5.42) included both the ADT and the
time-of-day factors, while the second full model (Equation 5.43) considered
the hourly factors only. The general form of the two full models was:
D1c = dc,0 + dc,1 ADT + dc,2 F1 + dc,3 F2 + . . . + dc,25 F24
(5.42)
D2c = dc,0 + dc,1 F1 + dc,2 F2 + . . . + dc,24 F24
(5.43)
where:
• Dc = standardized score of discriminant function c;
• Fhh = hourly factor that corresponds to hh-th hour of the day, and
• dc = discriminant function coefficient.
Twelve assignment models were specified, using different variable-selection
methods and algorithms, and tested on a large dataset, which consisted
of 35,100 test SPTCs for each DA model. The statistical evaluation of the
traditional method (based on the functional classification of road sections) and
of the 12 DA models was based on three statistical measures:
• the absolute error of the AADT estimate:

AEv,dd = | AADTv,Actual − AADTv,dd,Estimated | / AADTv,Actual × 100 (5.44)
• the mean absolute error (MAE):

MAE = (1/w) ∑_{s=1}^{w} ( | AADTv,Actual − AADTv,dd,Estimated | / AADTv,Actual × 100 ) (5.45)
• the standard deviation of the absolute error (SDAE):

SDAE = √( ∑_{s=1}^{w} (AEv,dd − MAE)² / (w − 1) ) (5.46)

where:
• AE = absolute error;
• w = number of test SPTCs;
• s = SPTC index (1, . . . , w);
• v = ATR index;
• dd = day of year;
• AADTv,Actual = actual AADT;
• AADTv,dd,Estimated = estimated AADT.
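The three measures of Equations 5.44–5.46 can be computed directly; the AADT values below are invented for illustration:

```python
import numpy as np

# Hypothetical actual and estimated AADTs for w = 5 test SPTCs.
aadt_actual = np.array([12000.0, 8500.0, 15200.0, 9800.0, 11100.0])
aadt_estimated = np.array([11400.0, 9100.0, 14000.0, 10300.0, 11000.0])

# Absolute percent error of each estimate (Equation 5.44).
ae = np.abs(aadt_actual - aadt_estimated) / aadt_actual * 100

# Mean absolute error over the w test counts (Equation 5.45).
mae = ae.mean()

# Standard deviation of the absolute error (Equation 5.46);
# ddof=1 gives the w - 1 denominator.
sdae = ae.std(ddof=1)
```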
The analysis of the results, conducted using the ANOVA test, revealed
that:
• the best-performing directional volume-based model, which employs
the Rao’s V algorithm, produced a mean absolute error (MAE) of 4.2%,
which could be compared with errors reported in previous studies;
• an average decline in the MAE by 58% and in the SDAE by 70% was
estimated over the traditional roadway functional classification;
• when direction-specific factors were used instead of total volume-based seasonal adjustment factors, the improvement was approximately
41% in the average MAE and 39% in the average SDAE.
5.4.2 Multiple Linear Regression
The use of Multiple Linear Regression to estimate AADT has been intensively investigated in recent years. Interesting results have been obtained
in terms of the identification of factors (regressors) which affect the AADT
values on a certain road section; however, the accuracy of the AADT
estimates still appears limited.
Multiple Linear Regression considers the model:

AADTi = β0 + β1 Xi1 + . . . + βm Xim + ϵi (5.47)
where:
• AADTi = the value of the dependent variable for the i-th observation,
i = 1, . . . , n.
• Xij = the value of the j-th independent variable in the i-th observation,
j = 1, . . . , m;
• β0 = constant term;
• β j = regression coefficient for the j-th independent variable;
• ϵi = error term;
• n = number of observations;
• m = number of independent variables.
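A small synthetic sketch of fitting the model in Equation 5.47 by ordinary least squares (all data are invented; real applications use roadway and demographic predictors):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic example with n = 60 observations and m = 2 predictors
# (illustrative stand-ins for roadway/demographic variables).
n = 60
X = np.column_stack([np.ones(n),             # constant term (beta_0)
                     rng.uniform(0, 1, n),   # X_i1
                     rng.uniform(0, 1, n)])  # X_i2
true_beta = np.array([5000.0, 3000.0, -1500.0])
aadt = X @ true_beta + rng.normal(0.0, 100.0, n)   # Equation 5.47

# Ordinary least-squares estimate of the regression coefficients.
beta_hat, *_ = np.linalg.lstsq(X, aadt, rcond=None)

# Coefficient of determination R^2 of the fitted model.
residuals = aadt - X @ beta_hat
r2 = 1.0 - residuals.var() / aadt.var()
```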
Mohamad et al. (1998) estimated AADT on low-volume county roads
with a multiple linear regression model which incorporated quantitative
and qualitative predictor variables, including relevant demographic variables. Field traffic data collected from 40 counties in Indiana were used for
the calibration of the model, and additional field traffic data collected from
8 counties for its validation.
The final model adopted had four predictor variables:
log10 AADT = 4.82 + 0.82X1 + 0.84X2 + 0.24 log(X4 ) − 0.46 log(X10 ) (5.48)
where:
• X1 = type of road, 1 if urban and 0 if rural;
• X2 = type of access, 1 easy access or close to state highway and 0
otherwise;
• X4 = county population;
• X10 = total arterial mileage of a county.
The model was considered acceptable, since it had R2 = 0.77 and MSE =
0.1606, and the predictors were significant (from t-statistic values) with
signs as expected. The final model was applied to the validation
set, composed of data from additional Indiana counties. The percentage
difference between the observed and predicted values of AADT ranged
from 1.56% to 34.18%, with an average difference of 16.78%.
Xia et al. (1999) applied the Multiple Linear Regression model to estimate AADT for nonstate roads in Broward County, Florida, an urban
area with a population of about 1 million. The dataset adopted in the analysis was
particularly large, with 450 count stations randomly divided into calibration (90% = 399) and validation (10% = 44) subsamples. The model was
developed considering a large number of candidate explanatory variables, determined
from the elaboration of roadway, socio-economic and accessibility data. In
particular, GIS tools were used to identify the influence of socio-economic
and accessibility characteristics on the monitoring stations. Buffers with different values of radius were considered to define land-use characteristics
around each counting station.
The final model was defined through a detailed variable selection process, which excluded outliers and avoided multicollinearity among predictors. The variables selected were: the number of
lanes on a roadway, the functional classification of the roadway, the land use
type, the automobile ownership, the accessibility to nonstate roads and the
service employment.
The model was found acceptable (R2 = 0.6061), with significant t-statistic
values for the variables and signs coherent with expectations, except for the variable "Service employment". The validation of the model,
applied to 40 data points, also led to acceptable results. The percent difference
between observed and predicted AADT values ranged from 1.31% to 57%,
with an average difference of 22.7%. The model also underestimated the
AADT for about 5% of the entire test data set. Moreover, from the analysis
of the error distribution and the cumulative percent of testing points, 50% of the
test points had an error smaller than 20%, whereas 85% of the test points
had an error smaller than 40%.
In a later paper, Zhao and Chung (2001) applied the same approach to
a larger dataset that included all the AADTs for state roads, a new road
function classification system, and a more extensive analysis of land-use
and accessibility variables.
Four different models were calibrated: the coefficients of determination (R2)
were acceptable and the signs of the coefficients were as expected. 82 different data points (9.1% of the total dataset) were used to examine the models’
predictive capability, comparing the AADTs estimated by the
models to the actual AADTs observed. The testing results included the mean
square errors (MSEs) and the total percentage error for the entire testing data set, which was less than 3% for all models. More interesting were the
analysis of the cumulative percent errors and the analysis of the maximum
error for each model. Comparison of the models showed that Model 1 had
the best performance in terms of prediction errors: about 37% of the testing
points had an error smaller than 10% and about 73% of the testing points
had an error smaller than 30%.
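The cumulative error analysis used in these studies can be reproduced with a few lines; the error values below are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical absolute percent errors for 500 testing points.
errors = np.abs(rng.normal(0.0, 20.0, size=500))

def share_below(errors, threshold):
    """Percentage of testing points with an error below the threshold."""
    return (errors < threshold).mean() * 100

# Cumulative percent of testing points under each error threshold.
cumulative = {t: share_below(errors, t) for t in (10, 20, 30, 40)}
```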
Considering percentage error, the authors stated that the models in their
current forms may not be adequate to meet the needs of engineering design
or the calibration of travel demand models. They may be used for tasks that
do not require a high level of accuracy at individual sites, such as estimating
systemwide vehicle miles traveled.
In order to better estimate AADT, Zhao and Park (2004) applied the
Geographically Weighted Regression (GWR) methods, which allow model
parameters to be estimated locally instead of globally, as in the case of
ordinary linear regression (OLR). The study, which used the same data
and model variables as Zhao and Chung (2001), investigated the spatial
variation of the parameter estimates, the local R2 from the GWR model and
the AADT estimation.
The basic idea of GWR is to allow different relationships between the dependent and independent variables at different points in space: the locally
linear regression parameters at a point are affected more by observations
near that point than by observations further away. The mathematical formalization of the model is similar to that of OLR models:
yi = βi0 + ∑_{k=1}^{p} βik xik + ϵi (5.49)
where:
• yi = dependent variable at location i (i = 1, . . . , n) where n is the
number of observations;
• xik = independent variable of the k-th parameter at location i;
• βik = estimated k-th parameter at location i;
• ϵi = error term at location i;
• p = number of parameters.
The model parameters βik , (k = 0, . . . , p) are estimated for each observation of yi and xik , (k = 0, . . . , p). Therefore a total of n × (p + 1) parameters are
estimated for n observations, whereas the number of parameters estimated
for OLR model is p + 1. Error terms ϵi are assumed to be independent and
identically distributed (i.i.d.) with zero means and constant variance σ2 .
Parameters at a location i are estimated using the weighted least-squares
approach, which uses weights to allow the influence of observations from
different locations to vary according to their distance to location i. Weights
can be determined using different weighting functions. In the study two
popular weighting functions were tested: the bi-square function and the
Gaussian function.
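The two weighting functions and the locally weighted fit can be sketched as follows; this is a minimal illustration with invented data, not the specification used in the study:

```python
import numpy as np

def gaussian_weight(d, bandwidth):
    # Gaussian kernel: weight decays smoothly with distance d.
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def bisquare_weight(d, bandwidth):
    # Bi-square kernel: weight falls to exactly zero beyond the bandwidth.
    return np.where(d < bandwidth, (1 - (d / bandwidth) ** 2) ** 2, 0.0)

def gwr_at_location(X, y, distances, bandwidth, kernel=gaussian_weight):
    """Weighted least-squares fit at one location (a GWR building block):
    beta_i = (X' W X)^-1 X' W y, with W diagonal in the kernel weights."""
    w = kernel(distances, bandwidth)
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)

# Illustrative data: intercept + one predictor, 50 observations.
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(50), rng.uniform(0, 1, 50)])
y = X @ np.array([2.0, 4.0]) + rng.normal(0, 0.05, 50)
distances = rng.uniform(0, 10, 50)   # distances to the regression point
beta_local = gwr_at_location(X, y, distances, bandwidth=5.0)
```

Repeating `gwr_at_location` at every observation point yields the n × (p + 1) locally varying parameters described above.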
The results produced with the validation dataset (82 count stations)
showed that both GWR models tested outperformed the OLR model in the
accuracy of AADT estimates (Figure 5.4). In particular, 63.41% of the
testing points had an error of less than 20% and 89% had an error smaller
than 32% for the GWRBi model. More detailed analyses of AADT estimates
at specific sites were not provided; however, the results still appeared
unsatisfactory compared to the factor approach.
More recently Wang and Kockelman (2009) proposed the use of Kriging-based methods for mining network and count data over time and space.
Using Texas highway count data (AADT estimates for 27738 sites from 1999
to 2005), the method forecasted future AADT values at locations where no
traffic detectors were present.
The method of Kriging, first developed by Matheron (1963), relies on
the notion that unobserved factors are autocorrelated over space, and the
levels of autocorrelation decline with distance. The values to be predicted
may depend on several observable causal factors (e.g., number of lanes,
posted speed limit, and facility type) which create a ”trend” estimate µ(s).
In general, spatial variables can be defined as follows:
115
Figure 5.4: Cumulative Percentage of Testing Points. Source: Zhao and Park 2004
Zi (s) = µi (s) + ϵi (s)
(5.50)
where:
• Zi (s) is the variable of interest (in the specific case the actual AADT);
• s is the location of site i, determined by coordinates (x, y);
• µi (s) is the deterministic trend;
• ϵi (s) is the random error component.
Based on the characteristics of Zi (s) three types of Kriging exist:
• Ordinary Kriging, if µ(s) is constant across locations (or explanatory
information is lacking);
• Simple Kriging, if the trend is known but varies across locations;
• Universal Kriging, if trends depend on explanatory variables and unknown regression coefficients.
For all types of Kriging weak stationarity is assumed, therefore the
correlation between Z(s) and Z(s+h) does not depend on the actual locations,
but only on the distance h between the two sites, and the variance of Z(s +
h) − Z(s) = 2γ(h) for any s and h. This fact can be expressed by the formula:
γ(h) = (1/2) var[Z(s + h) − Z(s)] (5.51)
where var[Z(s + h) − Z(s)] is the variance (over all sites) between counts
taken at sites s and s + h.
In Universal Kriging µ(s) can be described using any deterministic function, such as the linear function µ(s) = Xβ, where X contains explanatory
variables (e.g. number of lanes and facility type). In contrast, ϵi (s) reflects
unobserved variation (e.g. local land use patterns and transit service levels).
To estimate the random component ϵi (s), an appropriate ”curve”
or semivariogram model best fitting the relationship γ(h) for a given dataset
is first chosen. There are several commonly used models, including exponential,
spherical and Gaussian models. These models all rely on three parameters
that describe their shape while quantifying the level of spatial autocorrelation in the data:
• c0 , the ”nugget effect”, which reflects discontinuity at the variogram’s
origin, caused by factors such as sampling error and short-scale
variability;
• a is the ”range”, a scale factor which determines the threshold distance
at which γ(h) stabilizes;
• c0 + c1 is the maximum γ(h) value, called the ”sill”, with c1 referred to
as the ”partial sill”.
Figure 5.5 illustrates these parameters.
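The two ingredients described above — a parametric semivariogram model and a method-of-moments empirical estimate of γ(h) from Equation 5.51 — can be sketched on synthetic site data; note that the exact exponential parameterization varies between packages:

```python
import numpy as np

def exponential_semivariogram(h, nugget, partial_sill, a):
    """One common exponential model: gamma(h) = c0 + c1 * (1 - exp(-h / a)).
    (Some packages use exp(-3h/a) so the sill is reached near h = a.)"""
    return nugget + partial_sill * (1.0 - np.exp(-h / a))

def empirical_semivariogram(values, coords, bins):
    """Method-of-moments estimate of gamma(h): half the average squared
    difference of values over site pairs in each distance bin."""
    n = len(values)
    dists, sqdiffs = [], []
    for i in range(n):
        for j in range(i + 1, n):
            dists.append(np.linalg.norm(coords[i] - coords[j]))
            sqdiffs.append((values[i] - values[j]) ** 2)
    dists, sqdiffs = np.array(dists), np.array(sqdiffs)
    gamma = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (dists >= lo) & (dists < hi)
        gamma.append(0.5 * sqdiffs[mask].mean() if mask.any() else np.nan)
    return np.array(gamma)

# Illustrative use with synthetic site coordinates and values.
rng = np.random.default_rng(4)
coords = rng.uniform(0, 10, size=(40, 2))
values = coords[:, 0] + rng.normal(0, 0.1, 40)   # spatially trended values
gamma_hat = empirical_semivariogram(values, coords,
                                    bins=np.linspace(0, 5, 6))
```

Fitting the model parameters (c0, c1, a) to `gamma_hat` is then an ordinary curve-fitting problem.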
While in Ordinary Kriging it is simple to estimate the three shape
parameters, in Universal Kriging the vector of parameters β needs to be
estimated simultaneously or iteratively (together with c0 , c1 and a). This can
be done using a series of feasible generalized least squares (FGLS) regressions,
or by using restricted maximum likelihood estimation (REML) under the
assumption that errors follow a normal distribution.
The real challenge is the calculation of distance: currently, all available
packages use Euclidean distances, which can be easily derived based on
locations of the sites. The computational burden can increase dramatically
if non-Euclidean distances are used and sample size is large.
The authors applied Kriging methods to their large dataset, spatially interpolating traffic counts on segments of the same functional class (interstate
highways and principal arterials). To ensure computational tractability they
relied on Euclidean distances and on an exponential semivariogram specification, chosen for its better fit.
Figure 5.5: Illustration of Semivariogram. Source: Wang and Kockelman 2009
The error ratios obtained exhibited no clear spatial trends; however, spatial
interpolation using ordinary Kriging methods suggested that traffic
volumes on different classes of roadways had rather different patterns of
spatial autocorrelation.
Furthermore, model validation suggested that, given the limited number
of influential factors considered, the spatial interpolation method yielded
fairly reliable estimation results, especially for road sections with moderate
to high traffic volumes (the overall AADT-weighted median prediction error
was 31% across all Texas network sites). Compared to previous studies,
where the AADT of a site was estimated by assigning the AADT of its nearest
sampling site (the standard technique for no-information AADT estimation),
the authors concluded that the Kriging method is capable of yielding more
reliable estimates than simply relying on the count at a location’s nearest
count site.
In a more recent paper, Selby and Kockelman (2011) applied Universal
Kriging to the Texas network, adding further site attributes, including
functional classification, lane numbers and speed limits, to predict AADT
at count sites. Compared to the previous paper, the authors considered more
detailed data treatment options (i.e. AADT data transformation, network
distances and Euclidean distances) and estimation methodologies. Universal Kriging was found to reduce errors (in practically and statistically significant ways) over non-spatial regression techniques (between 16% and 79%
in the case study, depending on the data set and model specification used).
Dividing the roadways into groups by class improved the prediction of
either method: lower errors were found on the urban roadways subset of
data points and, in particular, on the interstate subset, while at some sites
errors remained quite high, particularly in less dense areas and on small roads
near major highways (Figure 5.6).
Figure 5.6: Comparison of Percentage Prediction Errors (Urban roads). Source: Selby and
Kockelman 2011
Concluding this review of methods which used land-use attributes in
the AADT estimation process, Li, Zhao, and Chow (2006) considered this
type of information to obtain the assignment of SPTCs to the road groups
using a fuzzy decision tree. The decision tree had the advantage of being
conceptually easy to understand and did not require extensive data to
operate. However, its implementation in three counties in Florida was not
satisfactory. The authors attributed the results to the fact that the four land use
variables did not completely explain the traffic variations, that the sample
size was limited, and that ATR sites might not have reflected all representative
land use patterns.
5.4.3 Artificial Neural Networks
The use of Artificial Neural Networks was introduced in section 5.3 as
a tool to create road groups. A different use of ANNs was proposed by
a group of researchers from the University of Regina (Canada) to directly
estimate AADT from SPTCs, avoiding the creation of road groups and, as a
main consequence, the risk of incorrect assignment.
The use of ANNs to estimate AADT from SPTCs was first introduced
by Lingras and Adamo (1996), but a more detailed description of the ANN
was provided by Sharma et al. (1999). In this paper
the authors presented the ANN as a way to estimate AADT from 48hr
SPTCs, comparing the results obtained with the traditional factor approach
(FACT model).
Figure 5.7: ANN Model for AADT Estimation. Source: Sharma et al. 1999
The ANN structure adopted had a multilayered, feedforward, backpropagation design (Figure 5.7), and neurons adopted the sigmoid transfer
function. Data from 63 ATRs in Minnesota were used to test the quality
of this approach. Since Minnesota’s routine short-term counts are of 48hr
duration starting at 12:00 p.m. on Monday, Tuesday, or Wednesday from
April to October, the analysis was limited to three 48hr counts per week for
each ATR during the counting period of 7 months, April to October.
The traditional factor approach was implemented using Sharma and Werner’s
(1981) approach for ATR grouping. On the basis of 21 composite expansion factors (3 days of week × 7 months of counting), 5 road groups were
identified: ATRG1 and ATRG2 included commuter roads, roads of ATRG3
were average rural routes and ATRG4 and ATRG5 included recreational
roads, with more traffic in summer and winter.
To evaluate the quality of AADT estimates, 48hr SPTCs were generated
from the dataset and were used as follows (see section 5.3.1):
1. An ATR was removed from the road group to which it belonged to
create sample counts;
2. The ATR removed was called ”sample ATR” and the group from
which it was taken ”home group”; Seasonal adjustment factors were
calculated for the home group, excluding the sample ATR;
3. The adjustment factors thus created were used to estimate AADT from
samples generated by the sample ATR;
4. The process was repeated using each section in turn as the sample ATR,
generating a large number of samples and AADT estimates.
The AADT estimated from a sample count was calculated by:

Estimated AADT = (48hr sample volume / 2) × adjustment factor (5.52)

and the AADT estimation errors (∆) were calculated by:

∆ = (AADTEstimate − AADTActual) / AADTActual × 100 (5.53)
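The four-step leave-one-out scheme above, together with the error definition of Equation 5.53, can be sketched as follows; the group adjustment factor used here is a simplified stand-in for the full set of seasonal factors, and all traffic values are simulated:

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative setup: one road group with 6 ATRs; for each ATR a "true"
# AADT and a set of 48hr sample average daily volumes (SADT values).
aadt = rng.uniform(4000, 9000, size=6)
# Sample counts drawn around each site's AADT with seasonal variation.
sadt_samples = aadt[:, None] * rng.uniform(0.8, 1.2, size=(6, 10))

errors = []
for i in range(6):                       # step 1: remove one ATR at a time
    home = np.delete(np.arange(6), i)    # step 2: remaining sites = home group
    # Group adjustment factor: mean ratio AADT / sample volume over the
    # home group, excluding the sample ATR.
    factor = np.mean(aadt[home, None] / sadt_samples[home])
    # Step 3: estimate the sample ATR's AADT from its own sample counts.
    est = sadt_samples[i] * factor
    # Step 4: record percent estimation errors (Equation 5.53).
    errors.extend((est - aadt[i]) / aadt[i] * 100)

errors = np.array(errors)
```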
The Neural Network approach was tested in different conditions:
• ANN1 (Model without classification): A single artificial neural network model was developed for all ATR sites without consideration of
the day or month of counting. 48 neurons were included in the input
layer, 24 in the hidden layer and 1 in the output layer.
• ANN2 (Model with classification): A separate neural network model
was developed for each ATRG developed for the traditional method,
taking into account also the effect of the day and month of counting.
The structure of the ANN was similar to the ANN1 case.
For the training of both types of ANN, the input data were 48 input
factors, defined as:

Input factor = Hourly volume / SADT (5.54)

where SADT was the total 48hr sample volume divided by 2. Since
supervised learning was adopted, the neural network output was also given
in the learning phase, using the actual AADT factor, defined as:

Actual AADT factor = 0.25 × AADT / SADT (5.55)
In the testing phase the trained ANNs were fed with input factors for
testing sample counts and the ANNs gave output factors to be used in
AADT estimation as follows:
Estimated AADT = 4 × (SADT × Output factor) (5.56)
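The factor arithmetic of Equations 5.54–5.56 can be checked numerically; with the exact target output factor, Equation 5.56 recovers the actual AADT (the volumes below are invented):

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical 48 hourly volumes from a 48hr short count.
hourly_volumes = rng.uniform(50, 700, size=48)

# SADT: total 48hr sample volume divided by 2.
sadt = hourly_volumes.sum() / 2

# Input factors fed to the network (Equation 5.54).
input_factors = hourly_volumes / sadt

# During training the desired output is the actual AADT factor
# (Equation 5.55); a hypothetical "true" AADT stands in here.
actual_aadt = 9500.0
target_output_factor = 0.25 * actual_aadt / sadt

# In testing, the network's output factor converts SADT back to an
# AADT estimate (Equation 5.56).
estimated_aadt = 4 * sadt * target_output_factor
```

Note that the input factors always sum to 2, since the 48 hourly volumes sum to twice the SADT.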
The results obtained with traditional and ANN approaches were compared considering frequency and cumulative frequency distributions and
average errors for each of the models (Figure 5.8). The neural network
model ANN1 developed from unclassified data resulted in higher AADT
estimation errors than those of the other two models. However the authors
made a couple of relevant observations.
Figure 5.8: AADT Estimation Errors Using Various Models. Source: Sharma et al. 1999
First, they considered that factor group classification improved results
slightly when this kind of information was added to the neural network
structure. Second, the experimental sample sites generated for this model
were correctly assigned to the various factor groups. In practice, in the
absence of appropriate duration and frequency of seasonal counts covering
all potential sites on the road network, it is unlikely that all short-period
(e.g., 48hr) count sites would be assigned 100% correctly to one of the
available factor groups. It is therefore reasonable that, in actual practice, the
FACT model would not be able to provide comparable estimates of AADT,
as shown in Table 1.
Therefore the authors conducted a preliminary investigation of the accuracy of two 48hr counts taken in two different months as compared with
a single 48hr count, developing the ANN1 model structure.
The first type of model was the ANN(1,48) model: in this case different
models were developed for various months as done for ANN1 models. The
second type of model used two 48hr counts and it was called ANN(2,48).
It had 96 neurons in the input layer, 24 neurons in the hidden layer and
a single neuron in the output layer. The network was fed with 96 hourly
volumes in the form of input factors and with the desired output factor. No
factors reflecting daily or monthly volume characteristics were provided as
input to the model.
As can be observed from the results reported in Figure 5.8 the use of
two 48hr counts taken in different months produced a notable reduction
in the errors. The authors suggested that this fact could be related to the
ANN’s capability to distinguish seasonal variation in traffic and produce
better AADT estimates by classifying patterns internally.
The authors applied the ANN approach to another case study,
focusing their attention on the estimation of AADT on low-volume roads
(Sharma et al. 2000). Data from 55 ATR sites located on the provincial highway network of Alberta, Canada, were considered for the investigation.
ATR sites were representative of different functional classes and trip purposes and were divided into three groups, based on the observed AADT
values: Group 1 (AADT < 500), Group 2 (501 < AADT < 750), Group 3
(751 < AADT < 1000).
Traditional factor approach (Sharma and Werner 1981) was implemented
on the basis of 15 composite adjustment factors (3 days of week × 5 months
of counting, May to September). 5 road groups were identified: ATRG1
(21 sites), ATRG2 (26 sites), ATRG3 (3 sites), ATRG4 (2 sites) and ATRG5 (3
sites). A subjective classification was also made by the traffic monitoring branch
of Alberta Infrastructure, based on limited sample counts taken in the past
and other factors, including geographic location and land-use characteristics. The grouping obtained consisted of four groups of low-volume roads:
agriculture/resource (40 sites), resource (10 sites), tourist/resource (3 sites),
tourist (2 sites).
The quality of AADT estimates obtained with the traditional factor approach was analysed considering three road classification schemes: the hierarchical grouping by Sharma and Werner (1981), the subjective classification
and a scheme in which all ATR sites belonged to the same group. 48hr
SPTCs were generated from the dataset and used as in Sharma
et al. (1999), calculating the AADT estimation error ∆ for each sample
count (Equation 5.53).

Figure 5.9: AADT Estimation Errors Using Neural Network Models. Source: Sharma et al. 1999
The ANN approach used in this paper adopted a structure similar to
that of the previous paper (Sharma et al. 1999). In this case the number of neurons in the input
layer was equal to the total number of hourly volumes included in the
sample counting problem: 96 for two 48hr counts, 48 for one 48hr count.
The hidden layer had a number of neurons equal to half of the number in the
input layer, and one neuron was included in the output layer. The learning
process of the ANN was the same as that used by Sharma et al. (1999).
The results obtained using the different approaches were reported in
Figures 5.10 and 5.11.
The traditional factor approach based on hierarchical grouping produced
more accurate AADT estimates compared to the other classification schemes.
However, the ANN approach was a valuable option if
ANN(2,48) models were adopted. These ANNs produced AADT estimates
comparable with factor approach results, with the advantage of not being
influenced by the assignment process of SPTCs to road groups, as stated by
Sharma et al. (1999). In addition, the study highlighted that the effect
of volume on AADT estimation errors was negligible, since there were no
appreciable differences among percent error values for the various groups
of low-volume roads.

Figure 5.10: AADT Estimation Errors Using Factor Approach. Source: Sharma et al. 2000
These results were confirmed by the authors in a third study (Sharma
et al. 2001), which analysed the implementation of similar ANN models,
characterised by different numbers (1, 2 and 3) and durations (24, 48 and 72
hours) of short counts. The observed results, in terms of percent error of
AADT estimates, highlighted in particular that:
• From a traffic monitoring point of view, if the AADT observed were
less than 1000 vehicles they could be considered low-volume roads,
without further AADT distinctions;
• The use of two 48hr short counts could be a reasonable choice, since
there was a little gain in the accuracy of estimates when three 48hr
counts were used instead of two.
• The ANNs using two 48hr short counts produced better estimates
than those using 24hr counts, but there were no further advantages
in using two 72hr counts.

Figure 5.11: Comparison of AADT Estimation Errors. Source: Sharma et al. 2000
5.5 Final Considerations
This review concerning AADT estimation based on the FHWA factor approach has led to the identification of two relevant issues, not well addressed by past literature:
• With reference to the grouping of road segments, it is difficult to
identify the correct number and characteristics of road groups for
a given road network. An AVC site could belong to more than one
road group, and the groups cannot be easily defined in language (e.g.,
commuter road, recreational road);
• Considering the assignment problem of a given road section to a
road group, it was assumed that each road segment belongs to one
group only, but it is apparent that a road section could exhibit the
characteristics of more than one group.
Based on these considerations, a new approach has been developed and
proposed to solve these issues, introducing specific theoretical frameworks.
Chapter 6
Proposed Approach
Based on the review of the TMG factor approach, two points emerged as critical
and not well addressed by previous research:
• the identification of the correct number and characteristics of road
groups for a given road network;
• the treatment of situations in which a particular road segment belongs
to more than one group, and how to measure the degree of similarity
to each group.
The proposed approach (Gecchele et al. 2012) allows the analyst to deal
with the situation when a road segment appears to belong to more than
one group and to provide the degree of belonging to each group, while
preserving the framework of the FHWA procedure. Fuzzy set theory is
introduced to represent the vagueness of boundaries between individual
road groups. To deal with the difficulties of identifying the group that
matches the given road section, the measures of uncertainty (non-specificity
and discord) are introduced. These measures help to interpret the quality
of the estimates in an objective manner and also indicate the need for
additional data collection.
As shown in Figure 6.1, the proposed approach has four steps:
Step 1: Group the AVC sites using the fuzzy C-means algorithm on the
basis of the seasonal adjustment factors of individual AVC (See section 1 of
Figure 6.1) ;
Step 2: Assign the road segment for which SPTC is available to one or
more of predefined road groups, using neural networks (See section 2 of
Figure 6.1);
Step 3: Calculate the measures of uncertainty associated with the assignment to specific road groups (See section 3 of Figure 6.1);
Step 4: Estimate AADT as the weighted average of SPTC daily volumes, adjusted by the seasonal adjustment factors of the assigned road group(s)
(See section 4 of Figure 6.1).
Figure 6.1: Scheme of the Proposed Approach
More details about these steps are given in the sections below.
6.1 Grouping Step
When AVC is given for site i, the TMG’s clustering approach assigns it to
only one of the groups, assuming that the boundaries between road groups
are well defined. This is not always the case in reality. The boundaries of
the groups are often fuzzy; thus, AVC site i could belong to more than
one group with different degrees, between 0 and 1, where 0 indicates no
belonging to a group, and 1 represents complete belonging to a group (Klir
and Yuan 1995).
The Fuzzy C-means (FCM) (Bezdek 1981; Bezdek, Ehrlich, and Full
1984) is a well-established algorithm to implement clustering when the
boundaries of groups are fuzzy (see section 2.3.6). Given the number of
groups C, the algorithm provides the degree to which an AVC site belongs to
each group, using the seasonal adjustment factors as input variables.
Maintaining the structure of the TMG factor approach, the seasonal adjustment factor for AVC site k for the i-th day of the week of the j-th month is calculated by:
fijk = AADTk / ADTijk (6.1)

where AADTk is the AADT for the k-th AVC site and ADTijk is the average daily traffic recorded on the i-th day of the week of the j-th month at the k-th AVC site.
FCM requires that the number of groups (C) be specified in advance;
however, the appropriate number is not known a priori. To solve this issue,
the following procedure can be adopted:
• The value C is set to 2, that is the minimum number of groups (clusters)
that can be identified;
• The FCM algorithm is run several times, changing the starting point and verifying the stability of the results. The final output is the set of membership grades uij of each AVC site i to each road group j;
• These steps are repeated by incrementing the value of C up to a maximum value Cmax, which depends on the road network dimension and complexity;
• The optimal number of road groups C∗ is chosen by comparing the
values of some performance indices adopted for hierarchical clustering (see section 2.3.2), such as Dunn Index, Silhouette measure or
Pseudo F Statistic.
• The set of membership grades corresponding to the optimal number
of clusters C∗ is assumed to be the final output.
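The procedure above can be sketched with a minimal NumPy implementation of the FCM update (fuzzifier m = 2); this is an illustrative sketch, not the implementation used in the study, and the 18-dimensional "seasonal factor" vectors below are synthetic:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Minimal FCM update loop; returns (centers, membership matrix U)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m                                    # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                      # guard exact hits
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)      # new membership grades
    return centers, U

# two well-separated bundles of 18 synthetic seasonal factors per site
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.8, 0.02, (10, 18)),
               rng.normal(1.2, 0.02, (10, 18))])
_, U = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)
print(labels.tolist())  # the two bundles receive two distinct labels
```

In practice the loop would be wrapped in the C = 2 .. Cmax scan described above, keeping the U that maximizes the chosen validity index.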
Once the optimal number of groups C∗ is fixed, the set of membership
grades must be interpreted to identify the characteristics of the clustering
obtained.
This analysis allows one to distinguish between "well defined" AVCs, which clearly belong to a road group, and "I don't know" cases, where the AVC could belong to different road groups. This distinction is based on the membership grades uij of each AVC site i to each road group j, following these rules (Hanesh, Sholger, and Dekkers 1984):
1. The highest membership grade uij is identified and denoted uijmax;
2. The ratios of all the membership grades uij to the highest one are calculated, that is:

ruijk = uij / uijmax (6.2)
where j is the index of the road group considered and k is the index of the road group with the highest membership grade. ruijk assumes the value 1 for the road group with the highest membership grade, and decreasing values as the membership grade of the road group decreases.
3. If uijmax > 0.5 and the second highest ruijk < 0.75, then the AVC is considered as clearly belonging to the road group; otherwise the AVC is considered an "I don't know" case;
4. An AVC classified as an "I don't know" case could belong to the group with the highest membership grade or to the groups with ruijk > 0.75.
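The rules above can be written as a small function; the thresholds (0.5 and 0.75) are those stated in the text, while the function name and return convention are illustrative:

```python
def classify_avc(u, ratio_threshold=0.75, max_threshold=0.5):
    """Apply the membership-grade rules: returns ('clear', [group]) or
    ('unknown', candidate groups); groups are numbered from 1."""
    u = list(u)
    k = max(range(len(u)), key=lambda j: u[j])      # highest-grade group
    ratios = [uj / u[k] for uj in u]                # Eq. 6.2
    second = sorted(ratios, reverse=True)[1]        # second highest ratio
    if u[k] > max_threshold and second < ratio_threshold:
        return "clear", [k + 1]
    # "I don't know": highest group plus every group with ratio > 0.75
    return "unknown", sorted({k + 1} | {j + 1 for j, r in enumerate(ratios)
                                        if r > ratio_threshold and j != k})

print(classify_avc([0.7, 0.2, 0.1, 0.0, 0.0]))    # ('clear', [1])
print(classify_avc([0.05, 0.0, 0.35, 0.3, 0.3]))  # ('unknown', [3, 4, 5])
```

The two calls reproduce the AVC 1 and AVC 2 cases discussed next.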
To better explain how the procedure is applied, consider the different results obtained for a clearly belonging AVC (AVC 1) and for an "I don't know" AVC (AVC 2), whose membership grades to five road groups are reported in Table 6.1.
Table 6.1: Example of clearly belonging and "I don't know" AVCs

             Group 1  Group 2  Group 3  Group 4  Group 5  Highest
AVC 1        0.7      0.2      0.1      0.0      0.0      0.7
AVC 2        0.05     0.0      0.35     0.3      0.3      0.35
AVC 1 Ratio  1.0      0.29     0.14     0.0      0.0      -
AVC 2 Ratio  0.14     0.0      1.0      0.85     0.85     -
• AVC 1 case: The highest membership grade uijmax is higher than 0.5
(u11 = 0.7 for Group 1). At the same time the ratio between the second
highest value of membership grade (u12 = 0.2 for Group 2) and the
highest one (u11 = 0.7 for Group 1) is lower than 0.75 (ru121 = 0.29).
The result is that AVC 1 clearly belongs to Group 1.
• AVC 2 case: The highest membership grade is lower than 0.5 (u23 =
0.35 for Group 3). At the same time the ratio between the second
highest value of the membership grade (u24 = u25 = 0.3 for Group
4 and 5) and the highest one (u23 = 0.35 for Group 3) is higher than
0.75 (ru243 = ru253 = 0.85). Therefore AVC 2 is considered an ”I don’t
know” case. Since the membership grade to Group 3 is the highest one
(u23 = 0.35) and only for Group 4 and 5 the ratio is higher than 0.75
(ru243 = ru253 = 0.85) it is assumed that AVC 2 could belong to ”Group
3 or 4 or 5”.
As done in the TMG factor approach (section 4.2), the seasonal adjustment factors that correspond to (i, j) combinations are calculated for the C∗
road groups. If n AVC sites clearly belong to road group c, the seasonal
adjustment factor for the i-th day of week of the j-th month is calculated by:
fijc = (1/n) Σk=1..n AADTk/ADTijk = (1/n) Σk=1..n fijk (6.3)
where AADTk is the AADT for the k-th AVC site clearly belonging to
group c and ADTijk is the average daily traffic recorded in the i-th day of
week of the j-th month in the same AVC site.
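Equations 6.1 and 6.3 amount to a simple average of per-site factors; a sketch with hypothetical AADT and ADT values:

```python
def seasonal_factor(aadt, adt_ij):
    """f_ijk = AADT_k / ADT_ijk (Eq. 6.1)."""
    return aadt / adt_ij

def group_factor(sites):
    """f_ijc: average of f_ijk over the n AVCs clearly in group c (Eq. 6.3).
    `sites` is a list of (AADT_k, ADT_ijk) pairs."""
    return sum(seasonal_factor(a, d) for a, d in sites) / len(sites)

# two hypothetical AVCs in the same group
print(round(group_factor([(9000, 10000), (11000, 10000)]), 2))  # 1.0
```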
6.2 Assignment Step
Artificial Neural Networks (ANNs) are an effective technique to assign
an input to an output when the causal relation is not well-understood.
In this study, a multi-layered, feed-forward, back-propagation design is
constructed, adopting the sigmoid transfer function for the neurons of the
network. The ANN has three layers: one input layer, one output layer and
one hidden layer, characterised as follows:
• the input layer includes a node describing the period of the year in which the counts are taken (generally the month) and a node for each hourly factor taken from the SPTC. The hourly factor, hl, is defined as:

hl = HTl / DT (6.4)

where l = 0, . . . , 23 is the starting time for the hourly count, HTl is the hourly traffic volume for hour l and DT is the daily traffic for the specific SPTC. Since the number of hourly factors can vary depending on the duration of SPTCs (e.g. 24hr, 48hr), the number of neurons in the input layer varies accordingly (i.e. 24 or 48).
• the output layer has a node for each road group and nodes for the elements of the power set, such as Groups (1 or 2), Groups (2 or 3 or 5), including Groups (1, . . . , C∗), which is the case of total ignorance or "I don't know at all".
• the hidden layer has a variable number of nodes depending on the type of SPTCs considered. A rule of thumb that can be adopted as a first approximation is h = (attributes + groups)/2, where h is the number of nodes in the hidden layer, attributes is the number of input variables and groups is the number of groups.
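The input assembly (Equation 6.4) and the rule-of-thumb sizing can be sketched as follows; the hidden-layer sizes actually used in an application may differ, since the text presents the rule only as a first approximation:

```python
def hourly_factors(hourly_volumes):
    """h_l = HT_l / DT (Eq. 6.4) for one 24hr block of an SPTC."""
    dt = sum(hourly_volumes)
    return [v / dt for v in hourly_volumes]

def ann_layout(n_hourly, n_output):
    """Layer sizes: one month node plus the hourly factors in input;
    rule-of-thumb hidden size h = (attributes + groups) / 2."""
    n_input = 1 + n_hourly
    n_hidden = (n_input + n_output) // 2
    return n_input, n_hidden, n_output

volumes = [100] * 24                          # flat hypothetical 24hr count
print(round(hourly_factors(volumes)[0], 6))   # 0.041667 (i.e. 1/24)
print(ann_layout(24, 11))                     # (25, 18, 11)
```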
6.3 Measures of Uncertainty in Assignment
A new aspect of this research is to investigate the uncertainty associated
with assigning a road section to road groups. This section develops two
measures of uncertainty for this purpose.
One case is when one cannot say either this group or that group because
the traffic pattern fluctuation is not consistent enough to specify to just
one group. The other case is when traffic patterns are conflicting so that a
certain pattern at a particular time points to one group and at other times
points to another group. The uncertainty related to the former is called Non-Specificity, and that of the latter is called Discord. These two measures are
developed in the Dempster-Shafer theory. While details of this theory can
be found in (Klir and Wierman 1999), the expressions for these measures
are provided here.
Consider that the output nodes of the neural network represent all possible combinations of road groups; that is, not only road groups 1 to C∗, but also all the elements of the power set of the road groups, e.g., (1 or 2), (2 or 4), (1 or 2 or 3), etc., a total of 2^n. Consider also that the weight associated with each final node is the degree to which the traffic pattern supports the individual subsets of road groups.
Let m(x) denote the weight associated with each final node, where x is one road group or a set of road groups, and Σ m(x) = 1.
• When m(A) = 1, it is certain that the road section in question belongs to road group A.
• When m(A or B) = 1, the road section in question belongs to either A or B, but which of the two is uncertain.
• When m(X) = 1, where X is the set of all road groups, the road section in question can belong to any of the groups; in other words, the situation of "I don't know".
Given this basic probability assignment m(x), the measure of non-specificity, N(m), and the measure of conflict, D(m), are developed.
N(m) provides the measure of uncertainty that the available traffic pattern has no specific information about which road group the road section belongs to, or Non-Specificity. It can be calculated by:
N(m) = ΣA∈F m(A) · log2 |A| (6.5)

where |A| is the number of road groups in subset A.
The value of N(m) is within [0, log2 | X |]. X is the universal set (all road
groups) and | X | is the number of these groups. The minimum of N(m)
is obtained when m(A) = 1, or the probability of belonging to a particular
road group is one. The maximum of N(m) corresponds to the case of ”not
able to assign to any specific group”.
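Representing m as a mapping from subsets of road groups to masses, Equation 6.5 is a one-liner; the dictionary-of-frozensets encoding is an assumption of this sketch:

```python
from math import log2

def nonspecificity(m):
    """N(m) = sum over focal sets A of m(A) * log2|A| (Eq. 6.5).
    `m` maps frozensets of road groups to masses summing to 1."""
    return sum(mass * log2(len(A)) for A, mass in m.items())

# full certainty -> 0; total ignorance over 8 groups -> log2(8) = 3
print(nonspecificity({frozenset({1}): 1.0}))           # 0.0
print(nonspecificity({frozenset(range(1, 9)): 1.0}))   # 3.0
```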
D(m) provides the measure of uncertainty that the available traffic pattern contains conflicting information; the section can belong to one group and also to another, the information of Discord. It can be calculated by:
D(m) = − ΣA∈F m(A) · log2 [ ΣB∈F m(B) · |A ∩ B| / |B| ] (6.6)
where |B| is the number of road groups in subset B and |A ∩ B| is the number of road groups in the intersection of subsets A and B. These measures are used to characterize the traffic pattern data (SPTC) collected at the road section which is to be classified into one or more of the predefined road groups.
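Equation 6.6 can be sketched with the same encoding of m as a mapping from subsets of road groups to masses (an assumption of this sketch):

```python
from math import log2

def discord(m):
    """D(m) of Eq. 6.6; `m` maps frozensets of road groups to masses."""
    total = 0.0
    for A, mA in m.items():
        inner = sum(mB * len(A & B) / len(B) for B, mB in m.items())
        total += mA * log2(inner)
    return -total

# a single focal set carries no conflict
print(discord({frozenset({1}): 1.0}) == 0.0)                 # True
# two disjoint singletons with equal mass -> one bit of discord
print(discord({frozenset({1}): 0.5, frozenset({2}): 0.5}))   # 1.0
```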
6.4 AADT Estimation
The final step is to estimate AADT for the road section in question,
where SPTC are available.
The degree that a road section belongs to each group, which is found in
the output of the neural network, is used to calculate AADT.
For example, if the degree of belonging to road group 1 and (1 or 2) are
m(1) = 0.4 and m(1, 2) = 0.6, respectively, the final weights adopted for the
estimation are calculated as w(1) = 0.4 + 0.6/2 = 0.7 and w(2) = 0.6/2 = 0.3.
Therefore, the AADT estimate for a given SPTC is calculated by:

AADT = w(1) · SPTC · fij1 + w(2) · SPTC · fij2 (6.7)

where fij1 and fij2 are found in Equation 6.3.
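The weight-splitting rule used in the example (mass on a set of groups divided equally among its members, as in the Dempster-Shafer pignistic transform) and Equation 6.7 generalized to any number of groups can be sketched as:

```python
def split_weights(m, groups):
    """Spread each mass equally over its member groups, as in the text's
    example: m(1)=0.4, m(1 or 2)=0.6 -> w(1)=0.7, w(2)=0.3."""
    w = {g: 0.0 for g in groups}
    for A, mass in m.items():
        for g in A:
            w[g] += mass / len(A)
    return w

def estimate_aadt(sptc_daily, w, f):
    """AADT = sum over groups g of w(g) * SPTC * f_ijg (Eq. 6.7)."""
    return sum(w[g] * sptc_daily * f[g] for g in w)

w = split_weights({frozenset({1}): 0.4, frozenset({1, 2}): 0.6}, [1, 2])
print({g: round(v, 2) for g, v in w.items()})          # {1: 0.7, 2: 0.3}
print(round(estimate_aadt(10000, w, {1: 1.2, 2: 0.8}), 1))  # 10800.0
```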
In the case of d-day SPTCs, the estimation of AADT is repeated d times using the 24hr volume data. The final AADT estimate is the average of these AADT estimates. For example, for a 3-day (72hr) SPTC the final AADT estimate is:

AADTFinal = (AADT1 + AADT2 + AADT3) / 3 (6.8)

where each AADTi is the AADT estimated using the 24hr volume data for one of the d monitoring days.
Part III
Case Study Analysis
Chapter 7
Case Study
The approach presented in Chapter 6 has been implemented in a real
world situation, testing its validity using traffic data. Different combinations of SPTCs were tested, evaluating the accuracy of the AADT estimates obtained in each condition.
Given the road network and the AVC sites, the Fuzzy C-means algorithm
was used to create road groups. SPTCs were extracted from the AVC sites,
and AADTs were estimated from the SPTCs, based on the assignment to
road groups given by the Artificial Neural Network. The estimated AADT
values were compared with the actual AADTs of the AVC sites.
Results have been interpreted considering measures of uncertainty (discord and non-specificity) and compared with those obtained by two approaches proposed in previous studies.
7.1 Data Source
The case study traffic data were obtained from the SITRA (TRAnsportation Information System or "Sistema Informativo TRAsporti") monitoring
database of the Province of Venice.
Since 2000 the local Transportation Office has been responsible for the monitoring activities on the rural road network of the Province of Venice. The Department of Structural and Transportation Engineering of the University of Padova (Della Lucia 2000) collaborated on the design of the original program and is still involved in data management and validation processes.
The design of the system was inspired by the procedure proposed by the Traffic Monitoring Guide. Automatic Vehicle Classifiers (AVCs) were installed in a small number of road sections to permanently collect traffic
data (PTCs). Road groups were identified based on similarity of seasonal
adjustment factors calculated from PTCs. SPTCs (48hr to 2 weeks) and
STCs (2 to 12 weeks) were periodically taken in the other road sections, following a detailed monitoring program. These road sections were assigned
to road groups and AADT could be estimated by adjusting SPTCs with the corresponding seasonal adjustment factors.
Variations of traffic patterns in different periods of the year are accounted for using 18 seasonal adjustment factors. These seasonal adjustment factors are
calculated based on the combinations of:
• 3 day-types (Weekdays, Saturdays, Sundays);
• 6 two-month periods (January-February, March-April, May-June, July-August, September-October, November-December).
Monitoring road sections are generally equipped with dual-loop systems.
Since the network links are mainly two-lane roads, two AVCs were installed
in each site, one for each direction of traffic flow. AVCs collect hourly traffic
volumes for different vehicle and speed classes (Tables 7.1,7.2). Hourly data
are aggregated to obtain some relevant traffic parameters for each road
section, including:
• Annual Average Daily Traffic (AADT);
• Annual Average Daytime Traffic;
• Seasonal Adjustment Factors;
• 30th highest annual hourly volume;
• Peak hour factors;
• Traffic growth trends.
Table 7.1: Length Classes Adopted by SITRA Monitoring Program

Class  Length [m]
I      L < 5.0
II     5.0 < L < 7.5
III    7.5 < L < 10.0
IV     10.0 < L < 12.5
V      12.5 < L < 16.5
VI     16.5 < L < 18.0
VII    L ≥ 18

Levels of aggregation: level 0 uses one code per class (LU01 to LU07); level 1 aggregates the classes into codes LU11, LU12, LU13 and LU14; level 2 aggregates them into codes LU21, LU22 and LU23.
Periodically the Transportation Office issues reports with traffic data summaries for each road section, which can be used by practitioners and other public agencies (e.g. Figure A.1 reported in the Appendix). More detailed data can also be obtained from the website of the monitoring program (http://trasporti.provincia.venezia.it/pianif_trasp/osservat.html).
Table 7.2: Speed Classes Adopted by SITRA Monitoring Program

Class  Speed [km/h]
I      V < 30
II     30 < V < 50
III    50 < V < 70
IV     70 < V < 90
V      90 < V < 110
VI     110 < V < 130
VII    V ≥ 130

Levels of aggregation: level 0 uses one code per class (LU01 to LU07); level 1 aggregates the classes into codes LU11, LU12 and LU13.
Since 2000 the number of AVCs has increased, reaching a good coverage of the different types of road in the network. In 2012 the number of working AVC sites is 39, that is, 78 AVCs which monitor directional traffic volumes (see Figure A.2 reported in the Appendix).
The traffic data for the study are the volumes obtained for the year
2005 at 42 AVCs (see Figure A.3 reported in the Appendix). The remaining
AVCs have been excluded from the analysis since they were affected by considerable amounts of missing data in some periods of the year due to vandalism and equipment failures.
Some assumptions have been made in the analysis:
• Data from monitored road sections have been analysed separately for
each direction, based on the findings of Tsapakis et al. 2011;
• Estimation of AADT has been done for passenger vehicles only. Passenger vehicle data were separated from truck data, with reference to a 5 m length threshold. This choice was made following the indications given by the FHWA concerning the specificities of truck traffic patterns, as reported in section 4.5.
7.2 Data Treatment
The total amount of available data at the AVC sites was 12,695 days of
counts, which corresponded to 304,680 hours of counts. Hourly volumes
of each AVC were sampled to form SPTCs of different durations (24 to 72
hours). Different SPTC combinations have been tested, to simulate alternative SPTC campaigns performed on the road network. For each combination
a specific dataset has been created, as reported in Table 7.3.
Data included in each dataset were divided into 2 groups using the
stratified holdout method (Witten and Frank 2005):
Table 7.3: SPTCs Datasets Used for Simulations

Number  Duration  Day-type   Starting Days
1       24hr      Weekdays   Mondays to Fridays
2       24hr      Saturdays  Saturdays
3       24hr      Sundays    Sundays
4       48hr      Weekdays   Mondays to Thursdays
5       48hr      Week-ends  Saturdays
6       72hr      Weekdays   Mondays to Wednesdays
7       72hr      Week-ends  Fridays
1. Training dataset (70% of samples), used for the learning process of the
ANN adopted for the Assignment of SPTCs;
2. Testing dataset (30% of samples), used to evaluate the accuracy of the
AADT estimates.
Therefore SPTCs included in the testing dataset were used as the input to
the ANN which has been developed from the training dataset.
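A stratified 70/30 holdout of this kind can be sketched as follows (stratifying by a generic key; the study stratifies the SPTC samples, and the data below are hypothetical):

```python
import random

def stratified_holdout(samples, key, train_frac=0.7, seed=0):
    """Split samples into train/test, preserving the share of each
    stratum (e.g. the AVC site or road group given by `key`)."""
    rng = random.Random(seed)
    strata = {}
    for s in samples:
        strata.setdefault(key(s), []).append(s)
    train, test = [], []
    for group in strata.values():
        rng.shuffle(group)
        cut = round(train_frac * len(group))
        train += group[:cut]
        test += group[cut:]
    return train, test

# 10 samples per hypothetical AVC site -> a 7/3 split within every site
samples = [(site, i) for site in "ABC" for i in range(10)]
train, test = stratified_holdout(samples, key=lambda s: s[0])
print(len(train), len(test))  # 21 9
```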
Analysing the number of samples included for each AVC (Table A.1
reported in the Appendix), it must be observed that:
• AVCs have different sample dimensions. AVCs with fewer data have been included in the analysis only if all the seasonal adjustment factors
could be calculated. This means that missing data did not concentrate
in specific periods of the year but were distributed throughout the
year;
• Counts taken during Italian holidays (e.g. Christmas, Easter, Celebration of the Italian Republic) have been excluded from the analysis. In fact these holidays can occur on weekdays, but usually they show traffic
patterns more similar to week-ends. Their exclusion from the analysis
preserved the integrity and the quality of seasonal adjustment factors.
7.3 Model Implementation
Three tasks were conducted for the implementation of the proposed
model: identifying road groups, developing and executing the artificial
neural networks, and calculating AADT.
7.3.1 Establishing Road Groups Using Fuzzy C-Means
PTCs data from the 42 AVCs were used to establish the road groups (see
section 6.1). Seasonal adjustment factors fijk were calculated for each AVC
k, following Eq. 6.1. Then the Fuzzy C-means algorithm was applied using
the adjustment factors as inputs. The algorithm was tested by changing the value of C from 2 to 20, running the algorithm ten times for each value of C, changing the starting point and verifying the stability of the results. The
best number of groups was chosen by comparing the values of four indices:
• the Dunn Index;
• the Silhouette measure;
• the Pseudo F Statistic;
• Goodman and Kruskal’s index G2.
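As one example of these indices, a common form of the Dunn index (minimum between-cluster distance over maximum cluster diameter, computed here on crisp labels) can be sketched as:

```python
import numpy as np

def dunn_index(X, labels):
    """Dunn index: min inter-cluster distance / max cluster diameter.
    Higher values indicate compact, well-separated clusters."""
    clusters = [X[labels == k] for k in np.unique(labels)]
    diam = max(np.linalg.norm(a - b) for c in clusters
               for a in c for b in c)
    sep = min(np.linalg.norm(a - b) for i, ci in enumerate(clusters)
              for cj in clusters[i + 1:] for a in ci for b in cj)
    return sep / diam

# two tight, well-separated toy clusters
X = np.array([[0.0], [0.1], [1.0], [1.1]])
labels = np.array([0, 0, 1, 1])
print(round(dunn_index(X, labels), 1))  # 9.0
```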
Based on these criteria, the best number of groups C∗ was found to be
8 for the case study. For each AVC k the membership grades to each road
group uij (Table 7.4) were analysed with the procedure presented in section
6.1 and its road group membership was determined.
Table 7.4: Membership Grades of the AVCs to Different Road Groups

AVC  Group1  Group2  Group3  Group4  Group5  Group6  Group7  Group8  Road Group
1    0.01    0.01    0.01    0.03    0.91    0.02    0.00    0.01    5
2    0.01    0.01    0.01    0.04    0.05    0.79    0.01    0.08    6
3    0.05    0.02    0.04    0.78    0.04    0.05    0.01    0.01    4
4    0.03    0.01    0.02    0.87    0.03    0.02    0.01    0.01    4
5    0.19    0.17    0.60    0.02    0.01    0.01    0.00    0.00    3
6    0.07    0.03    0.90    0.00    0.00    0.00    0.00    0.00    3
7    0.02    0.01    0.01    0.06    0.81    0.05    0.01    0.03    5
8    0.01    0.01    0.01    0.02    0.02    0.91    0.00    0.02    6
9    0.61    0.09    0.26    0.02    0.01    0.01    0.00    0.00    1
10   0.62    0.08    0.26    0.02    0.01    0.01    0.00    0.00    1
11   0.66    0.06    0.25    0.02    0.01    0.00    0.00    0.00    1
12   0.68    0.07    0.21    0.02    0.01    0.01    0.00    0.00    1
13   0.85    0.04    0.10    0.01    0.00    0.00    0.00    0.00    1
14   0.33    0.09    0.26    0.25    0.04    0.02    0.00    0.01    (1, 3, 4)
15   0.32    0.09    0.25    0.26    0.03    0.03    0.01    0.01    (1, 3, 4)
16   0.47    0.09    0.40    0.03    0.01    0.00    0.00    0.00    (1, 3)
17   0.40    0.10    0.45    0.03    0.01    0.01    0.00    0.00    (1, 3)
18   0.32    0.10    0.26    0.25    0.03    0.03    0.00    0.01    (1, 3, 4)
19   0.08    0.74    0.14    0.02    0.01    0.01    0.00    0.00    2
20   0.08    0.76    0.13    0.02    0.01    0.00    0.00    0.00    2
21   0.12    0.08    0.79    0.01    0.00    0.00    0.00    0.00    3
22   0.14    0.13    0.69    0.02    0.01    0.01    0.00    0.00    3
23   0.08    0.77    0.12    0.02    0.01    0.00    0.00    0.00    2
24   0.10    0.72    0.13    0.02    0.01    0.01    0.00    0.01    2
25   0.13    0.27    0.56    0.02    0.01    0.01    0.00    0.00    3
26   0.14    0.22    0.60    0.02    0.01    0.01    0.00    0.00    3
27   0.08    0.05    0.86    0.01    0.00    0.00    0.00    0.00    3
28   0.12    0.16    0.71    0.01    0.00    0.00    0.00    0.00    3
29   0.26    0.16    0.54    0.03    0.01    0.00    0.00    0.00    3
30   0.15    0.06    0.76    0.01    0.01    0.01    0.00    0.00    3
31   0.36    0.28    0.28    0.05    0.01    0.01    0.00    0.01    (1, 2, 3)
32   0.27    0.27    0.36    0.05    0.02    0.02    0.00    0.01    (1, 2, 3)
33   0.01    0.01    0.01    0.04    0.89    0.03    0.00    0.01    5
34   0.01    0.01    0.01    0.04    0.04    0.86    0.00    0.03    6
35   0.09    0.05    0.07    0.63    0.09    0.05    0.00    0.02    4
36   0.00    0.00    0.00    0.04    0.00    0.00    0.89    0.07    7
37   0.01    0.01    0.01    0.02    0.02    0.05    0.00    0.88    8
38   0.06    0.78    0.15    0.01    0.00    0.00    0.00    0.00    2
39   0.00    0.00    0.00    0.00    0.01    0.03    0.90    0.06    7
40   0.01    0.00    0.01    0.01    0.02    0.00    0.05    0.90    8
41   0.78    0.05    0.14    0.02    0.01    0.00    0.00    0.00    1
42   0.80    0.04    0.13    0.02    0.01    0.00    0.00    0.00    1

AVCs "clearly belonging" to a group are marked in bold
The seasonal adjustment factors that correspond to (i, j) combinations were calculated for the C∗ road groups with Equation 6.3. Reciprocals of the seasonal adjustment factors rfijc were defined by:

rfijc = 1 / fijc (7.1)
and they could represent the characteristics of fluctuations better than
the seasonal adjustment factor, fijc . Average seasonal adjustment factors of
the road groups and their reciprocals are reported in Tables A.2 and A.3 in
the Appendix. In Figure 7.1 the average reciprocals of the seasonal adjustment factors rfijc for different days and periods of the year are plotted for each road group.
Figure 7.1: Reciprocals of the Seasonal Adjustment Factors for the Road Groups
Analysing the plot some clearly distinguishable traffic patterns can be
observed:
• Groups 1 (7 AVCs), 2 (5 AVCs), and 3 (10 AVCs) could be characterized
as commuter road groups. These groups show stable traffic patterns,
with seasonal adjustment factors close to one, in particular for weekdays. Weekly traffic patterns occur in a similar manner during the year,
but some differences exist among groups:
– Group 1 passenger car volumes increase from weekdays to Saturdays and decrease from Saturdays to Sundays;
144
– Group 2 shows a pattern similar to Group 1, but the decrease observed during Sundays is more marked, in particular in the summer period (seasonal adjustment factors SunMay/Jun and SunJul/Aug);
– Group 3 pattern is characterised by a decrease of traffic volumes
during week-ends, in particular on Sundays.
• Groups 5 (3 AVCs), 6 (3 AVCs), 7 (2 AVCs) and 8 (2 AVCs) could be
characterized as recreational road groups. Traffic patterns are characterised by a strong variability during the year: very small volumes in
winter period and high peaks in summer time, with higher variations
observed for Groups 7 and 8.
A deeper analysis of the composition of the groups shows that the
AVCs of the same site belong to different groups: Group 5 - Group 6 or
Group 7 - Group 8. This fact is due to different traffic patterns observed
in each direction during summer week-ends (May/Jun and Jul/Aug):
Groups 5 and 7 have peaks in traffic volumes during Saturdays, while
Groups 6 and 8 have their peaks during Sundays.
These differences can be explained considering the holiday trips made
by people during week-ends: the vacationers reach the holiday resorts
on Saturdays and go back home on Sundays, driving in the opposite
direction.
• Group 4 includes 3 AVCs with intermediate characteristics.
• Seven AVCs were classified as "I don't know" cases. They belonged to Group "1 or 2 or 3" (2 AVCs), Group "1 or 3" (2 AVCs) and Group "1 or 3 or 4" (3 AVCs); that is, they had traffic patterns similar to commuter roads.
Furthermore, the spatial distribution of road groups was analysed, to evaluate whether the differences among road groups could also be interpreted based on the knowledge of land-use characteristics.
As can be observed in Figure A.4 reported in the Appendix, AVCs
were grouped following clear spatial patterns which confirmed previous
observations:
• AVCs belonging to commuter road groups (1, 2, 3) are located in the
inland parts of the province, where tourist activities are limited and
traffic patterns are supposed to be quite stable during the year;
• AVCs belonging to recreational road groups (5, 6, 7, 8) are located in
the coastal part of the Province of Venice (groups 7 and 8) or to roads
which give access to the tourist facilities (groups 5 and 6). This means
that in summer period these roads are supposed to be characterised
by very high passenger car volumes compared to winter period;
• AVCs belonging to road group 4 are representative of intermediate
characteristics between commuter and recreational roads. Their location in the road network (between the inland and the coastal line)
confirms these characteristics;
• AVCs classified as "I don't know" cases, which are characterised by commuter-type patterns, are in the inland parts of the Province.
7.3.2 Developing the Artificial Neural Networks
Multi-layered, feed-forward artificial neural networks (ANNs) were developed in order to assign the SPTCs to the road groups (see section 6.2).
Different structures of ANN were adopted, corresponding to the different SPTC combinations analysed, as reported in Table 7.5. Applying the proposed approach to the case study, the number of output nodes was reduced to 11, since in the training dataset 8 road groups and 3 "I don't know" situations were found. Moreover, different ANNs were trained for each dataset, maintaining the structure corresponding to the specific duration of the SPTCs used. That is, 24hr SPTCs taken on weekdays were used to train one network, while 24hr SPTCs taken on Saturdays were used for another network with the same structure.
Table 7.5: Characteristics of ANNs Used for the Assignment

Datasets  SPTCs Duration  Input Nodes  Hidden Nodes  Output Nodes
1, 2, 3   24hr            25           30            11
4, 5      48hr            49           60            11
6, 7      72hr            73           84            11
Some further details about the training process, repeated in the different
training datasets, are:
• Learning cycles = 25,000;
• Momentum α = 0.2;
• Learning rate η = 0.3.
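A minimal NumPy sketch of back-propagation with the listed momentum (α = 0.2) and learning rate (η = 0.3), run on a toy, linearly separable assignment task; this is not the network used in the study, and far fewer cycles than 25,000 suffice for the toy problem:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, hidden, cycles=2000, eta=0.3, alpha=0.2, seed=0):
    """Feed-forward net trained by full-batch back-propagation with
    momentum; returns a prediction function."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, Y.shape[1])); b2 = np.zeros(Y.shape[1])
    vW1 = np.zeros_like(W1); vb1 = np.zeros_like(b1)
    vW2 = np.zeros_like(W2); vb2 = np.zeros_like(b2)
    for _ in range(cycles):
        H = sigmoid(X @ W1 + b1)             # forward pass
        O = sigmoid(H @ W2 + b2)
        dO = (O - Y) * O * (1 - O)           # output delta (MSE loss)
        dH = (dO @ W2.T) * H * (1 - H)       # hidden delta
        for v, g in ((vW2, H.T @ dO), (vb2, dO.sum(0)),
                     (vW1, X.T @ dH), (vb1, dH.sum(0))):
            v *= alpha                       # momentum term
            v -= eta * g / len(X)            # gradient step
        W2 += vW2; b2 += vb2; W1 += vW1; b1 += vb1
    return lambda Xq: sigmoid(sigmoid(Xq @ W1 + b1) @ W2 + b2)

# toy task: 2 input "factors", 2 output "groups" decided by the first input
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
net = train(X, Y, hidden=4)
print((net(X).argmax(1) == Y.argmax(1)).all())
```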
7.3.3 Calculation of AADT
In this process each SPTC in the test dataset is assigned by the corresponding ANN, which provides the probabilities of belonging to each road group. The SPTC volume is used to estimate the AADT with Equation 6.7 for 24hr SPTCs, and additionally with Equation 6.8 in the case of 48hr and 72hr SPTCs.
7.4 Results and Discussion
7.4.1 Examination of the Estimated AADTs
The actual AADT for an AVC and the estimated AADT from SPTC for
the same site were compared. The percent absolute estimation error was
used as a measure of goodness of estimate:
∆ = |AADTEstimate − AADTActual| / AADTActual × 100 (7.2)
The Mean Absolute Error (MAE) and the Standard Deviation of Absolute Error (SDAE) (Equations 5.45 and 5.46) were used to analyse the
accuracy of the AADT estimates made with the proposed approach.
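Equation 7.2 and the two summary statistics can be sketched as follows (the population standard deviation is used here for SDAE; whether the study uses the population or the sample form is not stated in this excerpt):

```python
from statistics import mean, pstdev

def pct_abs_error(est, actual):
    """Delta = |AADT_est - AADT_actual| / AADT_actual * 100 (Eq. 7.2)."""
    return abs(est - actual) / actual * 100

def mae_sdae(estimates, actuals):
    """Mean and (population) standard deviation of the absolute errors."""
    errs = [pct_abs_error(e, a) for e, a in zip(estimates, actuals)]
    return mean(errs), pstdev(errs)

# three hypothetical estimates against a true AADT of 10,000
m, s = mae_sdae([10500, 9500, 11000], [10000, 10000, 10000])
print(round(m, 2), round(s, 2))  # 6.67 2.36
```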
Tables 7.6 and 7.7 summarize MAEs and SDAEs for the AVCs for different tests: by group, by different durations of SPTCs and by different
day-types (weekdays, weekends, weekdays + weekends). Differences between the MAE and SDAE of road groups in the various conditions were
tested using a T-test at a 95% confidence level.
Table 7.6: Mean Absolute Error (MAE) of the Road Groups for Different Combination of SPTC and Time Periods

              Total                               Weekdays         Weekends
SPTC  Group   Perfect  ANN     D<20%   %Samples  Perfect  ANN     Perfect  ANN
24    1       5.57     8.28    7.20    79.27     4.96     7.03    7.41     11.15
24    2       6.26     9.05    7.75    78.21     5.90     7.71    7.82     12.44
24    3       5.99     7.11    6.52    85.07     5.73     6.59    6.96     8.43
24    4       8.46     10.68   10.19   83.65     8.06     9.08    11.25    14.72
24    5       11.24    16.29   15.39   68.54     10.74    15.50   13.01    17.80
24    6       9.28     15.55   15.32   73.28     8.92     12.64   10.59    21.12
24    7       16.16    20.23   19.11   83.51     14.85    17.32   19.28    27.29
24    8       14.88    19.15   16.67   81.22     13.55    15.35   19.73    28.54
24    Total   7.98     10.88   9.96    81.09     7.47     9.45    9.17     14.21
48    1       4.54     5.49    4.90    86.06     4.32     5.14    5.41     6.60
48    2       5.53     6.14    5.55    94.47     5.50     5.98    5.67     6.81
48    3       5.20     5.43    5.24    94.79     5.14     5.34    5.43     5.80
48    4       7.27     7.79    7.38    91.79     7.28     7.50    7.25     8.98
48    5       9.16     10.64   9.79    82.89     8.87     10.33   10.01    11.49
48    6       7.42     9.04    7.86    83.21     7.26     8.38    7.91     11.04
48    7       14.49    16.33   15.64   87.20     13.62    15.53   17.88    19.36
48    8       13.43    14.79   13.92   86.14     12.67    13.48   16.47    19.91
48    Total   6.83     7.66    7.11    90.63     6.63     7.31    7.56     8.89
72    1       4.22     4.76    4.24    85.41     3.97     4.49    5.01     5.55
72    2       5.33     5.50    5.31    96.66     5.36     5.56    5.21     5.33
72    3       4.91     4.96    4.87    97.20     4.80     4.79    5.22     5.46
72    4       6.72     6.86    6.73    92.39     6.88     6.90    6.24     6.72
72    5       8.60     9.44    9.12    83.89     8.13     8.90    9.83     10.89
72    6       7.22     8.07    7.70    86.11     7.04     8.00    7.71     8.32
72    7       13.72    14.29   14.19   88.55     12.69    13.28   16.67    17.02
72    8       12.90    13.83   12.87   88.28     12.18    13.02   15.06    16.16
72    Total   6.46     6.85    6.56    92.12     6.24     6.60    7.09     7.55
The MAEs and the SDAEs have been calculated for different cases:
Perfect case. The performances refer to the case of assigning all the samples coming from an AVC to the correct group. This is the "perfect" case since in practice it is unlikely that 100% of SPTCs are assigned to the correct group. However it has been introduced as a benchmark for the proposed approach;
Table 7.7: Standard Deviation of Absolute Error (SDAE) of the Road Groups for Different Combination of SPTC and Time Periods

              Total                               Weekdays         Weekends
SPTC  Group   Perfect  ANN     D<20%   %Samples  Perfect  ANN     Perfect  ANN
24    1       5.13     10.48   8.97    79.27     4.41     8.98    5.96     12.03
24    2       5.50     13.78   12.63   78.21     4.99     11.23   6.66     16.81
24    3       5.26     9.04    8.29    85.07     4.82     8.14    5.55     10.48
24    4       7.14     11.26   11.17   83.65     6.34     8.39    9.16     15.27
24    5       9.76     13.81   14.51   68.54     9.81     12.37   9.68     15.04
24    6       8.55     20.24   19.66   73.28     8.68     10.73   7.67     30.59
24    7       15.55    19.82   19.78   83.51     14.91    17.47   15.41    22.50
24    8       15.42    22.32   19.87   81.22     15.06    16.62   16.07    30.50
24    Total   7.27     12.94   12.12   81.09     6.81     10.37   7.93     16.05
48    1       4.12     6.50    5.64    86.06     3.94     6.23    4.51     7.03
48    2       4.52     7.60    4.90    94.47     4.30     6.22    5.20     10.57
48    3       4.53     5.03    4.74    94.79     4.39     4.94    4.92     5.28
48    4       5.54     6.61    5.97    91.79     5.20     5.45    6.55     9.34
48    5       7.75     8.93    8.29    82.89     7.40     8.85    8.21     8.68
48    6       6.45     8.96    7.52    83.21     6.55     8.17    6.04     10.74
48    7       13.31    15.14   15.14   87.20     12.66    14.79   15.14    16.15
48    8       13.55    15.45   15.30   86.14     12.92    13.92   15.54    19.61
48    Total   5.99     7.67    6.79    90.63     5.76     7.11    6.59     8.93
72    1       3.92     4.90    4.00    85.41     3.76     4.80    4.23     7.03
72    2       4.20     4.67    4.13    96.66     4.00     4.56    4.59     10.57
72    3       4.32     4.43    4.23    97.20     4.18     4.17    4.59     5.28
72    4       4.99     5.23    5.04    92.39     4.53     4.58    6.20     9.34
72    5       7.22     7.90    7.74    83.89     6.64     7.34    8.37     8.68
72    6       5.94     6.76    6.62    86.11     6.11     6.61    5.53     10.74
72    7       12.36    12.95   12.89   88.55     10.82    11.53   15.72    16.15
72    8       12.70    13.54   12.54   88.28     11.55    12.42   15.59    19.61
72    Total   5.61     6.13    5.72    92.12     5.28     5.76    6.32     6.87
ANN. These data represent the values obtained applying the proposed approach;
D < 20%. In this case only the samples with a discord lower than 20% of the maximum value have been used to calculate the statistics. The percentage of samples passing this threshold is indicated as %Samples.
The following observations can be made based on the values of MAE
and SDAE reported in Tables 7.6 and 7.7:
• Recreational roads (Groups 5, 6, 7, 8) have larger values of MAE and
SDAE compared to those of the commuter roads (Groups 1, 2, 3) in
any condition tested, because of the effect of the higher variability in
traffic patterns;
• SPTCs taken on weekdays give more precise AADT estimates compared to the ones taken during weekends. Differences between Perfect and ANN assignments are significant for 24hr SPTCs during weekdays and weekends for Groups 1, 2, 3, 6, 7 and 8. Adopting 48hr and 72hr SPTCs the difference is significant only for Group 1;
• the increase in the duration of the SPTCs has a positive influence on the AADT estimate. The MAE and the SDAE decrease both in the case of perfect assignment and in the case of ANN assignment. Except for the Group 1 sites, the differences between the results obtained in the two cases for 48hr and 72hr SPTCs are not statistically significant;
• the use of SPTCs assigned to a road group with low values of discord (< 20% of the maximum) gives better AADT estimates compared with the use of all the SPTCs available. The differences are statistically significant for 24hr and 48hr SPTCs except for Groups 7 and 8, while there is no significant difference using 72hr SPTCs. The percentage of samples with low values of discord increases as the duration of SPTCs increases for the majority of AVCs.
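The discord-based selection can be sketched as a simple filter; the maximum-discord value used here (log2 of the number of groups, i.e. 3 bits for 8 groups) and the sample data are assumptions of this sketch:

```python
def mae_below_discord(samples, frac=0.20, d_max=3.0):
    """Keep SPTCs whose discord is below `frac` of the maximum discord
    `d_max`, then report their MAE and the share of samples retained."""
    kept = [err for err, d in samples if d < frac * d_max]
    mae = sum(kept) / len(kept)
    return mae, len(kept) / len(samples)

# hypothetical (percent error, discord) pairs for four SPTCs
samples = [(5.0, 0.2), (6.0, 0.4), (15.0, 2.5), (20.0, 2.8)]
print(mae_below_discord(samples))  # (5.5, 0.5)
```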
To better understand the effect of the discord measure on AADT estimates, further analyses were conducted on the available data. Since the results presented similar patterns for the different day-types considered, only the "Total" case sample data are reported here for a subset of the AVCs under analysis. However the results obtained for each AVC in the different conditions tested are reported in the Appendix.
Figure 7.2 reports the percentage of 48hr SPTCs with values of discord
lower than different thresholds for 14 AVCs. Figure 7.3 shows the corresponding MAEs obtained from the selected SPTCs for the same AVCs. One
can observe that:
• The number of SPTCs with low values of discord decreases as the chosen threshold value of discord decreases;
• The use of SPTCs with low values of discord produces lower MAEs. This effect increases as the chosen threshold value decreases;
• The number of SPTCs with low values of discord and the corresponding reduction in the MAEs can vary depending on the AVC considered;
• The use of a discord threshold equal to 20% of the maximum seems to be a good compromise between the possibility of reducing the MAE obtained from SPTCs and the need to use as many SPTCs as possible.
The effect on MAEs of selecting SPTCs with low values of discord (with the threshold set to 20% of the maximum) was analysed for different durations at the same 14 AVCs (Figure 7.4). As can be observed:
• The use of SPTCs with low values of discord gives more accurate
AADT estimates than the use of all SPTCs also for 24hr and 72hr
SPTCs;
• The reductions in MAE are larger for 24hr SPTCs than for 72hr SPTCs. Furthermore, increasing the duration of SPTCs produces a larger reduction than selecting SPTCs based on the discord measure;
Figure 7.2: Percentage of 48hr SPTCs With Values of Discord Lower Than Different
Thresholds for 14 AVC Sites
• The percentage of SPTCs with discord below 20% of the maximum varies among the AVCs in the observed network; that is, the difficulty of assignment to a particular road group differs among the AVCs.
The results presented here suggest that a traffic pattern consistently pointing to a particular road group is important for obtaining accurate AADT estimates. When the traffic pattern observed from a given SPTC clearly points to a road group, the AADT accuracy increases. The discord measure can be used to quantify the consistency of the assignment. However, the results highlight that the quality of the assignment can change among the AVC sites.
AADT estimates were also analysed in terms of the non-specificity measure. For the same 14 AVCs, Figure 7.5 presents the percentage of SPTCs of different durations with low non-specificity values (< 15% and < 25% of the maximum). One can observe that:
• Small differences in the percentage of SPTCs are observed for different
threshold values of non-specificity;
• AVCs 16, 17, 31 and 32 were classified as “I don’t know” cases and showed a low percentage of samples with low values of non-specificity compared to other AVCs;
Figure 7.3: Mean Absolute Error for 14 ATR Sites Based on Different SPTC Durations and Discord Values

• An increase in the duration of counts reduces the non-specificity values of SPTCs mainly for sections clearly belonging to a group, as for AVC sites 33 and 34;
• AVCs clearly belonging to a road group and ”I don’t know” cases
show different trends as the duration of SPTCs increases:
– For AVCs clearly belonging to a road group, if the duration increases, the number of SPTCs with low values of non-specificity
increases as an effect of a better classification done by the ANN;
– For AVCs which could belong to more than one road group, if the duration increases, the number of SPTCs with low values of non-specificity decreases. This can be explained by considering that these AVCs cannot be clearly assigned to a single road group, that is, they can belong to more than one group. As the duration increases they are more often assigned to more than one group, so the assignment is more “correct” and the number of SPTCs with low values of non-specificity decreases.
These findings suggest that the non-specificity measure is useful for highlighting another aspect of the uncertainty in AADT estimates, namely the identification of SPTCs characterized by an uncertain assignment.

Figure 7.4: Mean Absolute Error for 14 ATR Sites Based on Different SPTC Durations

Based on these results, the practical application of the proposed method should pay attention to the following:
• SPTCs, preferably 48hr-long, should be taken during weekdays to
have sufficient information for a correct assignment of the road section
to road groups;
• Discord and non-specificity are important measures to evaluate the quality of estimates and improve their interpretability. Low values of discord are related to accurate AADT estimates, and high values of non-specificity indicate uncertain assignments of SPTCs. SPTCs with high discord or high non-specificity suggest the need for additional data collection.
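To make the two measures concrete, the following sketch implements the non-specificity and discord definitions from Dempster-Shafer theory as given by Klir and Wierman (1999). The basic probability assignment over sets of road groups is a made-up illustration, not thesis data.

```python
from math import log2

def non_specificity(m):
    """Non-specificity N(m) = sum over focal sets A of m(A) * log2|A|.
    High values mean the evidence points to large sets of road groups,
    i.e. the assignment of the SPTC is ambiguous."""
    return sum(mass * log2(len(A)) for A, mass in m.items())

def discord(m):
    """Discord D(m) = -sum_A m(A) * log2( sum_B m(B) * |A & B| / |B| ).
    High values mean the pieces of evidence conflict with each other."""
    total = 0.0
    for A, mass_a in m.items():
        inner = sum(mass_b * len(A & B) / len(B) for B, mass_b in m.items())
        total += mass_a * log2(inner)
    return -total

# Illustrative basic probability assignment over road groups {1, 2, 3}:
# focal elements are frozensets and the masses sum to one.
m = {frozenset({1}): 0.6, frozenset({1, 2}): 0.3, frozenset({1, 2, 3}): 0.1}
print(non_specificity(m))  # 0.6*log2(1) + 0.3*log2(2) + 0.1*log2(3) ≈ 0.458
print(discord(m))          # small positive value: the evidence is nearly consistent
```

A single focal element on one group gives zero for both measures; spreading mass over larger sets raises non-specificity, while mass on conflicting (poorly overlapping) sets raises discord.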
7.4.2 Comparison with Other Models
The accuracy of the AADT estimates presented in the previous section, in terms of MAE and SDAE, was found to be close to that of previous studies (Cambridge Systematics Corporation 1994; Sharma et al. 1999). However, a detailed comparison was not possible because large differences in counts and network characteristics exist among the studies.
As an example, Tsapakis et al. (2011) recently obtained MAEs ranging from 4.2% to 4.4% from 24hr SPTCs assigned to road groups. However, the authors gave no details about the number of road groups considered or the distribution of errors across days of the week or road groups.
Figure 7.5: Percentage of Samples with Non-specificity Lower Than Different Thresholds
for 14 AVC Sites
Similarly, Sharma et al. (1999), using 48hr SPTCs taken on weekdays from April to October and assuming a perfect assignment, obtained MAEs of 4.41% and 4.74% for commuter road groups, 7.26% and 8.67% for recreational road groups, and 7.05% for the average road group. In this case it does not seem correct to assume that the network of the Province of Venice has the same characteristics as the rural road network of Minnesota; consequently, the AADT estimation accuracies cannot be directly compared.
For these reasons the results obtained with the proposed model were compared with those obtained by two approaches proposed in previous studies, using the same dataset. 48hr SPTC data taken on weekdays were used, since they were found to be the best solution for AADT estimation from SPTCs.
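As background for the comparison, the core of the FHWA factor approach can be sketched in a few lines: the SPTC is averaged down to a daily volume and scaled by the seasonal adjustment factor of the road group the section has been assigned to. The factor is defined here as the ratio of AADT to the average daily traffic of the counting period, and all the numbers are illustrative assumptions.

```python
def aadt_from_sptc(hourly_volumes, seasonal_factor):
    """FHWA factor approach in one step: average the short count down to a
    daily volume, then scale it by the seasonal adjustment factor of the
    assigned road group (factor = AADT / average daily traffic of the
    counting period)."""
    days = len(hourly_volumes) / 24.0      # e.g. a 48hr SPTC covers 2 days
    daily_volume = sum(hourly_volumes) / days
    return daily_volume * seasonal_factor

# A 48hr SPTC with a constant 500 veh/h (illustrative), assigned to a road
# group whose seasonal adjustment factor for the counting period is 0.95.
print(aadt_from_sptc([500] * 48, 0.95))  # 12000 veh/day * 0.95 = 11400.0
```

The accuracy of the estimate therefore depends entirely on assigning the section to the right road group, which is where the compared approaches differ.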
The two models tested were the method proposed by Sharma et al. (1999) (see Section 5.4.3 for details; here called Sharma et al.), an ANN model with hourly factors as inputs and no definition of road groups, and the approach proposed by Tsapakis et al. (2011), based on the use of Linear Discriminant Analysis (LDA) (see Section 5.4.1 for details).
The MAEs of these approaches are presented in Table 7.8, according to the road groups from which the SPTCs are extracted. The Sharma et al. approach does not need the road grouping step, but the MAEs obtained from the different AVCs are grouped to facilitate the comparison of results. Moreover, since the proposed approach uses single SPTCs, the Sharma et al. approach has been adapted to this specific case, considering only one 48hr SPTC at a time.
Table 7.8: Comparison of MAE of the Proposed Model with Previous Models

SPTC [hrs]   Group   Proposed   Sharma et al.   LDA
48           1       5.1        8.7 **          7.6 *
48           2       6.0        8.8 *           8.0 **
48           3       5.3        7.2 **          7.6 **
48           4       7.5        9.2             9.6
48           5       10.3       16.1 **         13.2 **
48           6       8.4        14.1 **         14.4
48           7       15.5       15.8            18.8
48           8       13.5       15.5            18.3 **

** = statistically significant at a 95% confidence level (Paired T-Test)
* = statistically significant at a 90% confidence level
Table 7.8 shows that the AADT estimates produced by the proposed approach have smaller MAEs than those of Sharma et al. and LDA. The differences are significant for Groups 1, 2 and 3 (commuter roads) and Group 5 (recreational roads); the difference is statistically significant also for Groups 6 and 8. Moreover, these results highlight that:
• the use of procedures which follow the FHWA factor approach (identification of road groups and assignment of SPTCs) can give more accurate AADT estimates than a direct estimation of AADT, as done by Sharma et al.’s model;
• the ANN model performs better than the LDA model. This result might be due to the presence of non-linear relationships between short-term traffic patterns and seasonal traffic patterns, which the ANN model can better represent.
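The significance flags in Table 7.8 rest on paired comparisons of the absolute errors produced by two models on the same SPTC samples. A minimal sketch of the underlying Paired T-Test statistic, with illustrative error vectors rather than the thesis data, is:

```python
from math import sqrt

def paired_t_statistic(errors_a, errors_b):
    """t statistic of a Paired T-Test on two matched series of absolute
    AADT estimation errors (same SPTC samples, two competing models)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / sqrt(var / n)

# Illustrative absolute errors (%), not the thesis data.
errors_proposed = [5.2, 4.8, 6.1, 5.5, 4.9, 5.8, 6.3, 5.0]
errors_lda = [7.9, 7.1, 8.4, 7.6, 8.8, 7.2, 9.1, 7.7]

t = paired_t_statistic(errors_proposed, errors_lda)
# |t| is compared with the critical value of Student's t with n-1 degrees
# of freedom at the 95% (or 90%) confidence level.
```

In practice the same test is available as scipy.stats.ttest_rel, which also returns the two-sided p-value.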
Chapter 8
Conclusions and Further
Developments
In recent years technological evolution has been increasing the availability of data that can be used in transportation studies. However, the quality and the amount of these data often represent a relevant issue when accurate analyses need to be performed. Since Data Mining techniques have proved to be useful tools in these situations, this thesis has analysed the capabilities offered by the implementation of Data Mining techniques in transportation systems analysis.
The first part of the thesis has presented a review of well-established Data Mining techniques, focusing on classification, clustering and association techniques, which are the most applied in practice. Their use has become quite common also in transportation studies, where they have been applied in different contexts.
The second part of the thesis has analysed in detail the FHWA factor monitoring approach for the estimation of Annual Average Daily Traffic (AADT). The literature review has identified the definition of road groups and the assignment of sections monitored with short counts (SPTCs) as the most relevant research issues. A new approach has been proposed that estimates AADT from limited traffic counts while preserving the basic structure of the current FHWA procedure. In this approach fuzzy set theory is used to represent the vagueness of the boundaries between individual road groups. The measures of uncertainty (non-specificity and discord) from Dempster-Shafer theory are also introduced to deal with the difficulty of identifying the group that matches a given road section. These measures help to increase the interpretability of the results, giving the degree of uncertainty in assigning a given road section to a specific road group. This information is particularly important in practice when a given road section can match the patterns of more than one road group.
The approach has been implemented using data from 42 Automatic Vehicle Classifiers located on the rural road network of the Province of Venice.
The results obtained in the estimation of AADT for passenger vehicles, also
compared with those obtained by two approaches proposed in previous
studies, highlighted that:
• the accuracy of the estimates, in terms of Mean Absolute Error and Standard Deviation of Absolute Error, was found to be better than in previous studies;
• errors are smaller for commuter roads than for recreational roads, given the different variability of traffic patterns;
• sample counts taken on weekdays give more accurate AADT estimates than those taken during weekends;
• an increase in the duration of the SPTCs results in a decrease in the errors of the AADT estimates, particularly when the counting duration increases from 24 to 48 hours.
Concerning the use of uncertainty measures, it has been observed that:
• the measure of discord is useful to indicate the quality of the estimates made from SPTCs; in particular, a low value of discord is related to an accurate AADT estimate. Conversely, the non-specificity measure indicates uncertainty in the assignment of the SPTCs to the road groups. When SPTCs have high discord or high non-specificity, additional data collection is needed.
These findings suggest that, when applying the proposed method, SPTCs should be measured during weekdays, preferably for 48 hours. At the same time, discord and non-specificity are important measures for evaluating the quality of the estimates, and they can be used to identify the need for additional data collection.
Given the positive results obtained in the experimental phase of the research, the design of a software tool to be used in the near future in real-world applications has begun. In the future, this work could be further extended to the following topics:
• Examine whether the proposed method can be applied to estimate AADT for freight vehicle volumes, because the volume fluctuations of freight vehicles are different from those of passenger cars;
• Examine the influence of socio-economic and land-use characteristics
of the environment of the road section when identifying the road
group and assigning the SPTCs.
Chapter 9
Bibliography
Data Mining Techniques
Ankerst, M. et al. (1999). “OPTICS: Ordering Points To Identify the Clustering Structure”. In: ACM SIGMOD International Conference on Management
of Data. ACM Press, pp. 49–60.
Arthur, D. and S. Vassilvitskii (2007). “k-means++: the Advantages of Careful Seeding”. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pp. 1027–1035.
Banfield, J. and A. Raftery (1993). “Model-Based Gaussian and Non-Gaussian
Clustering”. In: Biometrics 49.3, pp. 803–821.
Berry, M.J.A. and G. Linoff (2004). Data Mining Techniques for Marketing, Sales and Customer Support. New York: John Wiley & Sons, Inc.
Bezdek, J.C. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press.
Bezdek, J.C., R. Ehrlich, and W. Full (1984). “FCM: the Fuzzy C-Means Clustering Algorithm”. In: Computers & Geosciences 10.2-3, pp. 191–203.
Buckles, B.P. and F.E. Petry (1994). Genetic Algorithms. Los Alamitos, California: IEEE Computer Society Press.
Calinski, T. and J. Harabasz (1974). “A Dendrite Method for Cluster Analysis”. In: Communications in Statistics 3.1, pp. 1–27.
Chapman, P. et al. (2000). CRISP-DM Step-by-Step Data Mining Guide. http://www.crisp_dm.org.
Chaturvedi, A., P.E. Green, and J.D. Carroll (2001). “K-modes Clustering”. In: Journal of Classification 18.1, pp. 35–55.
Davies, D. and D. Bouldin (1979). “A Cluster Separation Measure”. In: IEEE
Transactions on Pattern Analysis and Machine Intelligence 1.4, pp. 224–227.
Dunham, M.H. (2003). Data Mining. Introductory and Advanced Topics. Upper
Saddle River, New Jersey: Prentice Hall.
Dunn, J.C. (1973). “A Fuzzy Relative of the ISODATA Process and Its Use in
Detecting Compact Well-Separated Clusters”. In: Journal of Cybernetics
3, pp. 32–57.
— (1974). “Well Separated Clusters and Optimal Fuzzy Partitions”. In:
Journal of Cybernetics 4.1, pp. 95–104.
Ester, M. et al. (1996). “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise”. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). Ed. by E. Simoudis, J. Han, and U.M. Fayyad. AAAI Press, pp. 226–231.
Fayyad, U. et al. (1996). Advances in Knowledge Discovery and Data Mining.
pp.1-34. AAAI Press/The MIT Press.
Fraley, C. and A.E. Raftery (1998). “How Many Clusters? Which Clustering
Method? Answers Via Model Based Cluster Analysis”. In: The Computer
Journal 41.8, pp. 578–588.
— (2002). “Model-based Clustering, Discriminant Analysis and Density Estimation”. In: Journal of the American Statistical Association 97.458, pp. 611–
631.
Goodman, L. and W. Kruskal (1954). “Measures of Association for Cross Classifications”. In: Journal of the American Statistical Association 49, pp. 732–764.
Hand, D.J., H. Mannila, and P. Smyth (2001). Principles of Data Mining. Cambridge, MA: MIT Press.
Hanesch, M., R. Scholger, and M.J. Dekkers (2001). “The Application of Fuzzy C-Means Cluster Analysis and Non-Linear Mapping to a Soil Data Set for the Detection of Polluted Sites”. In: Physics and Chemistry of the Earth 26.11-12, pp. 885–891.
Hebb, D.O. (1949). The Organisation of Behavior. New York: Wiley and Sons.
Holland, J.H. (1975). Adaptation in Natural and Artificial Systems. Ann Arbor, Michigan: University of Michigan Press.
Kass, G.V. (1980). “An Exploratory Technique for Investigating Large Quantities of Categorical Data”. In: Applied Statistics 29.2, pp. 119–127.
Kohonen, T. (1982). “Self-Organized Formation of Topologically Correct
Feature Maps”. In: Biological Cybernetics 43, pp. 59–69.
Milligan, G. and M. Cooper (1985). “An Examination of Procedures of Determining the Number of Cluster in a Data Set”. In: Psychometrika 50.2,
pp. 159–179.
Pelleg, D. and A.W. Moore (2000). “X-means: Extending K-means with Efficient Estimation of the Number of Clusters”. In: Seventeenth International
Conference on Machine Learning, pp. 727–734.
Quinlan, J.R. (1986). “Induction of Decision Trees”. In: Machine Learning,
pp. 81–106.
— (1993). C4.5: Programs for Machine Learning. San Mateo: Morgan Kaufmann.
Rousseeuw, P.J. (1987). “Silhouettes: A Graphical Aid to the Interpretation
and Validation of Cluster Analysis”. In: Journal of Computational and
Applied Mathematics 20.1, pp. 53–65.
Simoudis, E. (1998). “Discovering Data Mining: From Concept to Implementation”. In: ed. by P. Cabena et al. Upper Saddle River, NJ: Prentice
Hall.
Ward, J. (1963). “Hierarchical Grouping to Optimize an Objective Function”.
In: Journal of the American Statistical Association 58.301, pp. 236–244.
Widrow, B. and M.E. Hoff (1960). “Adaptive Switching Circuits”. In: IRE
WESCON Convention Record. Vol. IV, pp. 96–104.
Witten, I.H. and E. Frank (2005). Data Mining: Practical Machine Learning Tools and Techniques. Second Edition. San Francisco, CA: Morgan Kaufmann Publishers.
Zhang, T., R. Ramakrishnan, and M. Livny (1996). “BIRCH: An Efficient Data Clustering Method for Very Large Databases”. In: SIGMOD ’96 Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data. ACM Press, pp. 103–114.
Transportation Applications of Data Mining
Attoh-Okine, N. (2002). “Combining Use of Rough Set and Artificial Neural Networks in Doweled-Pavement-Performance Modeling. A Hybrid
Approach”. In: Journal of Transportation Engineering 128.3, pp. 270–275.
Attoh-Okine, N.O. (1997). “Rough Set Application to Data Mining Principles
in Pavement Management Database”. In: Journal of Computing in Civil
Engineering 11.4, pp. 231–237.
Barai, S.K. (2003). “Data Mining Application in Transportation Engineering”. In: Transport 18.5, pp. 216–223.
Chang, L.-Y. (2005). “Analysis of Freeway Accident Frequencies: Negative
Binomial Regression Versus Artificial Neural Network”. In: Safety Science 43, pp. 541–557.
Chang, L.-Y. and W.-C. Chen (2005). “Data Mining of Tree-based Models to
Analyze Freeway Accident Frequency”. In: Journal of Safety Research 36,
pp. 365–375.
Chen, C.-Y. et al. (2004). “Using Data Mining Techniques on Fleet Management System”. In: 24th Annual ESRI International User Conference. Ed. by
ESRI.
Cheng, T., D. Cui, and P. Cheng (2003). “Data Mining for Air Traffic Flow
Forecasting: A Hybrid Model of Neural-Network and Statistical Analysis”. In: 2003 IEEE International Conference on Intelligent Transportation
Systems. Ed. by IEEE. Shanghai, China, pp. 211–215.
Cheng, T. and J. Wang (2008). “Integrated Spatio-temporal Data Mining for
Forest Fire Prediction”. In: Transactions in GIS 5.12, pp. 591–611.
Dougherty, M. and M. Cobbett (1997). “Short-term Inter-urban Traffic Forecasts Using Neural Networks”. In: International Journal of Forecasting 13,
pp. 21–31.
Flahaut, B. (2004). “Impact of Infrastructure and Local Environment on Road
Unsafety. Logistic Modeling with Spatial Autocorrelation”. In: Accident
Analysis and Prevention 36, pp. 1055–1066.
Flahaut, B. et al. (2003). “The Local Spatial Autocorrelation and the Kernel
Method for Identifying Black Zones. A Comparative Approach”. In:
Accident Analysis and Prevention 35, pp. 991–1004.
Getis, A. (2008). “A History of the Concept of Spatial Autocorrelation: A
Geographer’s Perspective”. In: Geographical Analysis 40, pp. 297–309.
Geurts, K., I. Thomas, and G. Wets (2005). “Understanding Spatial Concentrations of Road Accidents using Frequent Item Sets”. In: Accident
Analysis and Prevention 37, pp. 787–799.
Gong, X. and X. Liu (2003). “A Data Mining Based Algorithm for Traffic
Network Flow Forecasting”. In: KIMAS 2003. Ed. by IEEE. Boston, MA,
USA, pp. 243–248.
Gong, X. and Y. Lu (2008). “Data Mining Based Research on Urban Tide Traffic Problem”. In: 11th International IEEE Conference on Intelligent Transportation Systems. Ed. by IEEE. Beijing, China.
Gürbüz, F., L. Özbakir, and H. Yapici (2009). “Classification Rule Discovery for the Aviation Incidents Resulted in Fatality”. In: Knowledge-Based
Systems 22.8, pp. 622–632.
Haluzová, P. (2008). “Effective Data Mining for a Transportation Information
System”. In: Acta Polytechnica 48.1. Czech Technical University Publishing House, pp. 24–29.
Han, C. and S. Song (2003). “A Review of Some Main Models for Traffic
Flow Forecasting”. In: 2003 IEEE International Conference on Intelligent
Transportation Systems. Ed. by IEEE. Shanghai, China, pp. 216–219.
Hayashi, K. et al. (2005). “Individualized Drowsiness Detection During
Driving by Pulse Wave Analysis with Neural Network”. In: IEEE Intelligent Transportation Systems. Ed. by IEEE, pp. 901–906.
Hornik, K. (1989). “Multilayer Feedforward Networks Are Universal Approximators”. In: Neural Networks 2, pp. 359–366.
Hu, M. et al. (2003). “Development of the Real-time Evaluation and Decision
Support System for Incident Management”. In: 2003 IEEE International
Conference on Intelligent Transportation Systems. Shanghai, China: IEEE,
pp. 426–431.
Jiang, Z. and Y.-X. Huang (2009). “Parametric Calibration of Speed-Density
Relationships in Mesoscopic Traffic Simulator with Data Mining”. In:
Information Sciences 179, pp. 2002–2013.
Kalyoncuoglu, S. and M. Tigdemir (2004). “An Alternative Approach for
Modelling and Simulation of Traffic Data: Artificial Neural Networks”.
In: Simulation Modelling Practice and Theory 12, pp. 351–362.
Khan, G., X. Qin, and D. Noyce (2008). “Spatial Analysis of Weather Crash
Patterns”. In: Journal of Transportation Engineering 134.5, pp. 191–202.
Ladner, R., F. Petry, and M. Cobb (2003). “Fuzzy Set Approaches to Spatial Data Mining of Association Rules”. In: Transactions in GIS 1.7, pp. 123–138.
Leu, S.-S., C.-N. Chen, and S.-L. Chang (2001). “Data Mining for Tunnel
Support Stability: Neural Network Approach”. In: Automation in Construction 10.4, pp. 429–441.
Maciejewski, H. and T. Lipnicki (2008). “Data Exploration Methods for
Transport System Dependability Analysis”. In: Third International Conference on Dependability of Computer Systems DepCoS-RELCOMEX 2008.
IEEE, pp. 398–405.
Mennis, J. and J. Liu (2005). “Mining Association Rules in Spatio-Temporal
Data: An Analysis of Urban Socioeconomic and Land Cover Change”.
In: Transactions in GIS 1.9, pp. 5–17.
Miller, H. and J. Han (2009). Geographic Data Mining and Knowledge Discovery. Second Edition. Chapman Hall/CRC Data Mining and Knowledge
Discovery Series.
Pande, A. and M. Abdel-Aty (2006). “Assessment of Freeway Traffic Parameters Leading to Lane-change Related Collisions”. In: Accident Analysis
and Prevention 38, pp. 936–948.
— (2009). “Market Basket Analysis of Crash Data From Large Jurisdictions and its Potential as a Decision Support Tool”. In: Safety Science 47,
pp. 145–154.
Rahman, F.A., M.I. Desa, and A. Wibowo (2011). “A Review of KDD-Data
Mining Framework and its Application in Logistics and Transportation”. In: 7th International Conference on Networked Computing and Advanced Information Management (NCM), 2011, pp. 175 –180.
Sarasua, W.A. and X. Jia (1995). “Framework for Integrating GIS-T with
KBES: a Pavement Management System Example”. In: Transportation
Research Record: Journal of the Transportation Research Board 1497. Transportation Research Board of the National Academies, Washington, D.C.,
pp. 153–163.
Soibelman, L. and H. Kim (2000). “Generating Construction Knowledge
with Knowledge Discovery in Databases”. In: Computing in Civil and
Building Engineering 2, pp. 906–913.
Van der Voort, M., M.S. Dougherty, and S.M. Watson (1996). “Combining
Kohonen Maps with ARIMA Time Series Models to Forecast Traffic
Flow”. In: Transportation Research C 4.5, pp. 307–318.
Wang, Y. et al. (2006). “Dynamic Traffic Prediction Based on Traffic Flow
Mining”. In: 6th World Congress on Intelligent Control and Automation. Ed.
by IEEE. Dalian, China, pp. 6078–6081.
Xu, W., Y. Qin, and H. Huang (2004). “A New Method of Railway Passenger Flow Forecasting Based on Spatio-Temporal Data Mining”. In: 7th
International IEEE Conference on Intelligent Transportation Systems. Ed. by
IEEE. Washington, D.C., USA, pp. 402–405.
Yang, X. and M. Tong (2008). “Knowledge Discovery in Information-ServiceOriented Experimental Transportation System”. In: 2008 International
Conference on Intelligent Computation Technology and Automation. IEEE,
pp. 787–790.
Zhang, C., Y. Huang, and G. Zong (2010). “Study on the Application of
Knowledge Discovery in Data Bases to the Decision Making of Railway
Traffic Safety in China”. In: International Conference on Management and
Service Science (MASS), 2010, pp. 1 –10.
Zhaohui, W. et al. (2003). “Application of Knowledge Management in Intelligent Transportation Systems”. In: 2003 IEEE International Conference on
Intelligent Transportation Systems. Shanghai, China: IEEE, pp. 1730–1734.
Zhou, G. et al. (2010). “Integration of GIS and Data Mining Technology to
Enhance the Pavement Management Decision-Making”. In: Journal of
Transportation Engineering 136.4, pp. 332–341.
Road Traffic Monitoring
AASHTO, Joint Task Force on Traffic Monitoring Standards (1992). AASHTO Guidelines for Traffic Data Programs. Tech. rep. AASHTO.
Bodle, R.R. (1967). “Evaluation of Rural Coverage Count Duration for Estimating Annual Average Daily Traffic”. In: Highway Research Record,
HRB, National Research Council 199. Washington, D.C., pp. 67–77.
Cambridge Systematics Corporation (1994). Use of Data from Continuous
Monitoring Sites. Tech. rep. Prepared for FHWA, Volume I and II, Documentation, FHWA.
Davis, G.A. (1996). Estimation Theory Approaches to Monitoring and Updating
Average Daily Traffic. Tech. rep. Rep.No. 97-05. Center of Transportation
Studies, University of Minnesota, Minneapolis.
— (1997). “Accuracy of Estimates of Mean Daily Traffic: A Review”. In:
Transportation Research Record: Journal of the Transportation Research Board
1593. Transportation Research Board of the National Academies, Washington, D.C., pp. 12–16.
Davis, G.A. and Y. Guan (1996). “Bayesian Assignment of Coverage Count
Locations to Factor Groups and Estimation of Mean Daily Traffic”. In:
Transportation Research Record: Journal of the Transportation Research Board
1542. Transportation Research Board of the National Academies, Washington, D.C., pp. 30–37.
Della Lucia, L. (2000). Campagna di monitoraggio del traffico sulla rete di interesse regionale 1999-2000. Tech. rep. In Italian. Padova: Dipartimento
Costruzioni e Trasporti.
Faghri, A., M. Glaubitz, and J. Parameswaran (1996). “Development of
Integrated Traffic Monitoring System for Delaware”. In: Transportation
Research Record: Journal of the Transportation Research Board 1536. Transportation Research Board of the National Academies, Washington, D.C.,
pp. 40–44.
Faghri, A. and J. Hua (1995). “Roadway Seasonal Classification Using Neural Networks”. In: Journal of Computing in Civil Engineering 9.3, pp. 209–
215.
FHWA (2001). Traffic Monitoring Guide. Tech. rep. U.S. Department of Transportation.
— (2005). HPMS Field Manual.
Flaherty, J. (1993). “Cluster Analysis of Arizona Automatic Traffic Record
Data”. In: Transportation Research Record: Journal of the Transportation
Research Board 1410. Transportation Research Board of the National
Academies, Washington, D.C., pp. 93–99.
Gecchele, G. et al. (2011). “Data Mining Methods for Traffic Monitoring
Data Analysis: a Case Study”. In: Procedia Social and Behavioral Sciences
20, pp. 455–464.
Gecchele, G. et al. (2012). “Advances in Uncertainty Treatment in the FHWA
Procedure for Estimating Annual Average Daily Traffic Volume”. In:
Transportation Research Record: Journal of the Transportation Research Board.
Transportation Research Board of the National Academies, Washington,
D.C. Accepted for Publication.
Hallenbeck, M. et al. (1997). Vehicle Volume Distributions by Classification. Tech. rep. FHWA-PL-97-025. FHWA.
Hallenbeck, M. and A. O’Brien (1994). Truck Flows and Loads for Pavement
Management. Tech. rep. FHWA.
Klir, G.J. and M.J. Wierman (1999). Uncertainty-Based Information. Elements
of Generalized Information Theory. 2nd edition. Heidelberg, Germany:
Physica-Verlag.
Klir, G.J. and B. Yuan (1995). Fuzzy sets and fuzzy logic. Theory and applications.
Upper Saddle River, New Jersey, USA: Prentice Hall PTR.
Li, M.-T., F. Zhao, and L.F. Chow (2006). “Assignment of Seasonal Factor
Categories to Urban Coverage Count Stations Using a Fuzzy Decision
Tree”. In: Journal of Transportation Engineering 132.8, pp. 654–662.
Li, M.-T. et al. (2003). “Evaluation of Agglomerative Hierarchical Clustering
Methods”. In: Presented at the 82th Annual Meeting of the Transportation Research Board. Transportation Research Board of the National Academies,
Washington, D.C.
Lingras, P. (1995). “Hierarchical Grouping Versus Kohonen Neural Networks”. In: Journal of Transportation Engineering 121.4, pp. 364–368.
— (2001). “Statistical and Genetic Algorithms Classification of Highways”.
In: Journal of Transportation Engineering 127.3, pp. 237–243.
Lingras, P. and M. Adamo (1996). “Average and Peak Traffic Volumes: Neural Nets, Regression, Factor Approaches”. In: Journal of Computing in
Civil Engineering 10.4, pp. 300–306.
Matheron, G. (1963). “Principles of Geostatistics”. In: Economic Geology 58,
pp. 1246–1266.
Ministero dei Lavori Pubblici. Sistemi di Monitoraggio del Traffico. Linee guida
per la progettazione. Ispettorato Generale per la Circolazione e la Sicurezza
Stradale.
Mohamad, D. et al. (1998). “Annual Average Daily Traffic Prediction Model
for County Roads”. In: Transportation Research Record: Journal of the Transportation Research Board 1617. Transportation Research Board of the National Academies, Washington, D.C., pp. 69–77.
Ritchie, S. (1986). “Statistical Approach to Statewide Traffic Counting”. In:
Transportation Research Record: Journal of the Transportation Research Board
1090. Transportation Research Board of the National Academies, Washington, D.C., pp. 14–21.
Rossi, R., M. Gastaldi, and G. Gecchele (2011). “FHWA Traffic Monitoring
Approach. Innovative procedures in a study case”. In: SIDT Scientific
Seminar. Venice.
Schneider, W.H. and I. Tsapakis (2009). Review of Traffic Monitoring Factor
Groupings and the Determination of Seasonal Adjustment Factors for Cars
and Trucks. Tech. rep. FHWA/OH-2009/10A. Project performed in cooperation with the Ohio Department of Transportation and the Federal
Highway Administration. Ohio Department of Transportation.
Selby, B. and K.M. Kockelman (2011). “Spatial Prediction of AADT at Unmeasured Locations by Universal Kriging”. In: Presented at the 90th Annual Meeting of the Transportation Research Board. Transportation Research
Board of the National Academies, Washington, D.C.
Sharma, S.C. and R. Allipuram (1993). “Duration and Frequency of Seasonal
Traffic Counts”. In: Journal of Transportation Engineering 119.3, pp. 344–
359.
Sharma, S.C., B.M. Gulati, and S.N. Rizak (1996). “Statewide Traffic Volume
Studies and Precision of AADT Estimates”. In: Journal of Transportation
Engineering 122.6, pp. 430–439.
Sharma, S.C. and A. Werner (1981). “Improved Method of Grouping Provincewide
Permanent Traffic Counters”. In: Transportation Research Record: Journal
of the Transportation Research Board 815. Transportation Research Board
of the National Academies, Washington, D.C., pp. 12–18.
Sharma, S.C. et al. (1986). “Road Classification According to Driver Population”. In: Transportation Research Record: Journal of the Transportation
Research Board 1090. Transportation Research Board of the National
Academies, Washington, D.C., pp. 61–69.
Sharma, S.C. et al. (1999). “Neural Networks as Alternative to Traditional
Factor Approach to Annual Average Daily Traffic Estimation from Traffic Counts”. In: Transportation Research Record: Journal of the Transportation Research Board 1660. Transportation Research Board of the National
Academies, Washington, D.C., pp. 24–31.
Sharma, S.C. et al. (2000). “Estimation of Annual Average Daily Traffic on
Low-Volume Roads”. In: Transportation Research Record: Journal of the
Transportation Research Board 1719. Transportation Research Board of the
National Academies, Washington, D.C., pp. 103–111.
Sharma, S.C. et al. (2001). “Application of Neural Networks to Estimate
AADT on Low-Volume Roads”. In: Journal of Transportation Engineering
127.5, pp. 426–432.
Tsapakis, I. et al. (2011). “Discriminant Analysis for Assigning Short-Term
Counts to Seasonal Adjustment Factor Groupings”. In: Transportation
Research Record: Journal of the Transportation Research Board 2256. Transportation Research Board of the National Academies, Washington, D.C.,
pp. 112–119.
Wang, X. and K.M. Kockelman (2009). “Forecasting Network Data: Spatial Interpolation of Traffic Counts Using Texas Data”. In: Transportation
Research Record: Journal of the Transportation Research Board 2105. Transportation Research Board of the National Academies, Washington, D.C.,
pp. 100–108.
Xia, Q. et al. (1999). “Estimation of Annual Average Daily Traffic for Nonstate Roads in a Florida County”. In: Transportation Research Record: Journal of the Transportation Research Board 1660. Transportation Research
Board of the National Academies, Washington, D.C., pp. 32–40.
Zhao, F. and S. Chung (2001). “Contributing Factors of Annual Average
Daily Traffic in a Florida County: Exploration with Geographic Information System and Regression Models”. In: Transportation Research Record:
Journal of the Transportation Research Board 1769. Transportation Research
Board of the National Academies, Washington, D.C., pp. 113–122.
Zhao, F., M.-T. Li, and L.F. Chow (2004). Alternatives for Estimating Seasonal
factors on Rural and Urban Roads in Florida. Tech. rep. Final Report. Research Office. Florida Department of Transportation.
Zhao, F. and N. Park (2004). "Using Geographically Weighted Regression
Models to Estimate Annual Average Daily Traffic". In: Transportation
Research Record: Journal of the Transportation Research Board 1879. Transportation Research Board of the National Academies, Washington, D.C.,
pp. 99–107.
Appendix A
Case Study Data
Figure A.1: Road Section Data Summary (SITRA Monitoring Program)

Figure A.2: Province of Venice AVC Sites (year 2012)

Figure A.3: Case Study AVC Sites
Table A.1: AVC Sites Sample Size

AVC  Road Section   Dir   24hr SPTCs          48hr SPTCs          72hr SPTCs
                          Tot  Wday  Wend     Tot  Wday  Wend     Tot  Wday  Wend
 1   ANASS014h0168   A    150   99    51       97   73    24       67   49    18
 2   ANASS014h0168   B    150   99    51       97   73    24       67   49    18
 3   ANASS309h0800   A    349  250    99      241  194    47      187  142    45
 4   ANASS309h0800   B    349  250    99      241  194    47      187  142    45
 5   VNTSR011h4005   A    342  247    95      232  189    43      179  137    42
 6   VNTSR011h4005   B    342  247    95      232  189    43      179  137    42
 7   VNTSR089h0142   A    115   76    39       74   55    19       50   36    14
 8   VNTSR089h0142   B    113   74    39       73   54    19       50   36    14
 9   VNTSR245h0107   A    343  246    97      237  191    46      185  140    45
10   VNTSR245h0107   B    343  246    97      237  191    46      185  140    45
11   VNTSR515h0239   A    345  246    99      236  189    47      183  137    46
12   VNTSR515h0239   B    345  246    99      236  189    47      183  137    46
13   VNTSR516h0413   B    348  251    97      240  195    45      187  143    44
14   xVESP012h0126   A    343  244    99      232  185    47      178  133    45
15   xVESP012h0126   B    342  243    99      229  182    47      174  129    45
16   xVESP013h0075   A    335  240    95      230  185    45      178  134    44
17   xVESP013h0075   B    332  237    95      227  182    45      175  131    44
18   xVESP018h0025   B    334  239    95      230  185    45      179  135    44
19   xVESP026h0021   A    349  250    99      241  194    47      188  142    46
20   xVESP026h0021   B    349  250    99      241  194    47      188  142    46
21   xVESP027h0017   A    343  247    96      235  191    44      183  140    43
22   xVESP027h0017   B    344  248    96      236  192    44      183  140    43
23   xVESP032h0092   A    350  251    99      242  195    47      189  143    46
24   xVESP032h0092   B    350  251    99      242  195    47      189  143    46
25   xVESP035h0065   A    344  248    96      235  191    44      182  139    43
26   xVESP035h0065   B    344  248    96      235  191    44      182  139    43
27   xVESP036h0045   A    293  210    83      198  159    39      152  114    38
28   xVESP036h0045   B    293  210    83      198  159    39      152  114    38
29   xVESP039h0034   A    339  242    97      232  186    46      180  135    45
30   xVESP039h0034   B    338  241    97      232  186    46      180  135    45
31   xVESP040h0102   A    244  172    72      159  126    33      118   86    32
32   xVESP040h0102   B    244  172    72      159  126    33      118   86    32
33   xVESP041h0045   A    145   95    50       92   68    24       63   45    18
34   xVESP041h0045   B    145   95    50       92   68    24       63   45    18
35   xVESP054h0048   B    305  218    87      212  171    41      165  127    38
36   xVESP074h0117   A    338  239    99      229  182    47      178  131    47
37   xVESP074h0117   B    343  244    99      234  187    47      185  138    47
38   xVESP081h0022   B    332  237    95      227  182    45      175  132    43
39   xVESP090h0033   A    341  242    99      232  185    47      180  134    46
40   xVESP090h0033   B    344  245    99      235  188    47      182  137    45
41   xVESP093h0024   A    270  175    95      180  136    44      129  101    28
42   xVESP093h0024   B    288  193    95      191  147    44      141  107    34

"Tot" = Total; "Wday" = Weekdays; "Wend" = Week-ends.
Figure A.4: Road Groups Identified with FCM Algorithm
Legend (Fuzzy C-Means grouping): Road Group 1 to Road Group 8; mixed-membership groups (1, 3, 4), (1, 3) and (1, 2).
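The road groups shown in Figure A.4 come from the fuzzy C-means (FCM) algorithm applied to the AVC sites. As a minimal, illustrative sketch of the technique (not the implementation, feature set, or parameter choices used in the case study; the fuzzifier m = 2 and the tolerance below are assumed values), the standard FCM membership and centroid updates can be iterated with plain numpy:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy C-means: returns (centers, U), where U[i, k] is the
    membership of sample i in cluster k and each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)            # random initial fuzzy partition
    for _ in range(max_iter):
        Um = U ** m                              # fuzzified memberships
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # squared distance from every sample to every cluster center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                  # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:        # converged
            U = U_new
            break
        U = U_new
    return centers, U
```

Sites with a dominant membership value fall into one road group; sites whose membership is split between groups correspond to the mixed entries such as "Road Group (1, 3)" in the legend.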
Table A.2: Average Seasonal Adjustment Factors for Road Groups

Road Group     Jan/Feb          Mar/Apr          May/Jun          Jul/Aug          Sep/Oct          Nov/Dec
(AVC sites)    Wday Sat  Sun    Wday Sat  Sun    Wday Sat  Sun    Wday Sat  Sun    Wday Sat  Sun    Wday Sat  Sun
Group 1 (7)    1.04 0.96 1.18   0.97 0.90 1.05   0.97 0.93 1.09   1.06 0.97 1.22   0.97 0.88 1.03   0.98 0.95 0.98
Group 2 (5)    0.90 0.88 1.36   0.86 0.84 1.29   0.88 0.89 1.43   1.01 1.00 1.69   0.86 0.84 1.30   0.85 0.85 1.19
Group 3 (10)   0.94 0.99 1.21   0.87 0.92 1.10   0.87 0.93 1.12   0.98 1.07 1.34   0.90 0.94 1.12   0.89 0.97 1.14
Group 4 (3)    1.34 1.14 1.23   1.19 0.99 1.02   1.07 0.78 0.82   0.90 0.66 0.79   1.20 0.96 1.04   1.28 1.16 1.20
Group 5 (3)    1.59 1.46 1.36   1.38 1.14 1.04   0.94 0.56 0.73   0.80 0.53 0.68   1.37 1.11 1.25   1.49 1.51 1.41
Group 6 (3)    1.55 1.42 1.31   1.37 1.28 0.99   1.00 0.76 0.52   0.84 0.70 0.53   1.36 1.16 1.16   1.41 1.47 1.34
Group 7 (2)    2.14 2.08 1.85   1.55 1.32 1.15   0.94 0.52 0.58   0.66 0.44 0.53   1.40 1.13 1.17   2.05 2.12 2.32
Group 8 (2)    2.13 2.09 1.83   1.56 1.46 1.13   0.97 0.69 0.47   0.66 0.53 0.44   1.39 1.18 0.99   2.05 2.16 2.29

"Jan/Feb" = January/February; "Mar/Apr" = March/April; "May/Jun" = May/June; "Jul/Aug" = July/August; "Sep/Oct" = September/October; "Nov/Dec" = November/December. "Wday" = Weekdays; "Sat" = Saturdays; "Sun" = Sundays.
Table A.3: Average Reciprocals of the Seasonal Adjustment Factors for Road Groups

Road Group     Jan/Feb          Mar/Apr          May/Jun          Jul/Aug          Sep/Oct          Nov/Dec
(AVC sites)    Wday Sat  Sun    Wday Sat  Sun    Wday Sat  Sun    Wday Sat  Sun    Wday Sat  Sun    Wday Sat  Sun
Group 1 (7)    0.96 1.04 0.85   1.03 1.11 0.95   1.03 1.07 0.92   0.94 1.03 0.82   1.03 1.14 0.97   1.02 1.05 1.02
Group 2 (5)    1.11 1.13 0.74   1.16 1.19 0.77   1.14 1.13 0.70   0.99 1.00 0.59   1.16 1.18 0.77   1.18 1.17 0.84
Group 3 (10)   1.07 1.01 0.82   1.15 1.08 0.91   1.15 1.07 0.89   1.02 0.94 0.75   1.11 1.07 0.89   1.12 1.03 0.88
Group 4 (3)    0.75 0.87 0.81   0.84 1.01 0.98   0.94 1.29 1.22   1.11 1.52 1.27   0.83 1.04 0.96   0.78 0.86 0.83
Group 5 (3)    0.63 0.69 0.73   0.72 0.88 0.96   1.07 1.80 1.37   1.24 1.90 1.47   0.73 0.90 0.80   0.67 0.66 0.71
Group 6 (3)    0.65 0.71 0.76   0.73 0.78 1.01   1.00 1.31 1.93   1.19 1.42 1.88   0.74 0.86 0.86   0.71 0.68 0.75
Group 7 (2)    0.47 0.48 0.54   0.65 0.76 0.87   1.07 1.91 1.71   1.52 2.26 1.87   0.71 0.88 0.85   0.49 0.47 0.43
Group 8 (2)    0.47 0.48 0.55   0.64 0.68 0.89   1.03 1.45 2.15   1.52 1.89 2.26   0.72 0.85 1.01   0.49 0.46 0.44

"Jan/Feb" = January/February; "Mar/Apr" = March/April; "May/Jun" = May/June; "Jul/Aug" = July/August; "Sep/Oct" = September/October; "Nov/Dec" = November/December. "Wday" = Weekdays; "Sat" = Saturdays; "Sun" = Sundays.
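Tables A.2 and A.3 feed the factor approach to AADT estimation: a short period traffic count (SPTC) is expanded to an annual average by the seasonal adjustment factor of the road group it is assigned to. A minimal sketch, assuming the usual convention that the factor is the ratio AADT / period average daily traffic (so the estimate is count times factor) and using the Group 1 weekday factors from Table A.2; the 12 500 veh/day count is an invented example value:

```python
# Weekday seasonal adjustment factors for Road Group 1 (from Table A.2).
GROUP1_WDAY = {"Jan/Feb": 1.04, "Mar/Apr": 0.97, "May/Jun": 0.97,
               "Jul/Aug": 1.06, "Sep/Oct": 0.97, "Nov/Dec": 0.98}

def estimate_aadt(daily_volume, period, factors=GROUP1_WDAY):
    """Expand a 24hr short count into an AADT estimate with a seasonal factor."""
    return daily_volume * factors[period]

# A weekday 24hr count taken in July/August on a Group 1 road:
aadt = estimate_aadt(12500, "Jul/Aug")   # 12500 scaled by the 1.06 factor
```

Table A.3 lists the reciprocals of these factors, which is the form needed when going the other way, from an AADT to an expected period daily volume.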
Table A.4: MAE of AVC Sites. "Total" Sample
Values are listed for AVC sites 1-42, in order.
24hr SPTCs, Perfect: 11.10, 9.81, 8.43, 9.14, 5.93, 5.28, 10.51, 8.08, 5.34, 5.06, 5.70, 6.49, 4.80, 6.33, 6.49, 4.05, 4.23, 4.83, 7.61, 5.61, 5.13, 7.00, 5.73, 6.29, 7.25, 7.55, 4.42, 5.49, 6.39, 5.49, 7.93, 10.20, 12.10, 9.94, 7.81, 15.93, 15.02, 6.05, 16.39, 14.73, 5.94, 5.70
24hr SPTCs, ANN: 15.72, 15.15, 11.89, 10.18, 7.42, 6.25, 15.10, 11.02, 6.24, 7.52, 7.05, 8.13, 6.27, 9.87, 10.34, 6.98, 21.48, 10.08, 11.19, 9.25, 7.07, 8.11, 8.13, 9.39, 9.38, 8.40, 5.27, 6.22, 7.13, 5.88, 10.44, 15.03, 18.07, 20.47, 9.98, 21.36, 19.79, 7.28, 19.10, 18.52, 10.08, 12.69
24hr SPTCs, D<20%: 14.82, 13.93, 11.09, 9.83, 6.84, 5.61, 14.65, 10.29, 5.59, 7.29, 6.30, 7.00, 5.67, 9.34, 9.67, 5.83, 18.95, 8.61, 9.03, 7.26, 6.18, 7.39, 7.42, 7.92, 8.60, 8.04, 4.75, 5.65, 6.75, 5.42, 9.98, 14.08, 16.69, 21.74, 9.65, 19.81, 15.76, 7.13, 18.42, 17.57, 8.33, 10.23
48hr SPTCs, Perfect: 8.85, 7.60, 7.60, 8.24, 4.79, 4.49, 9.05, 6.74, 4.35, 4.20, 4.89, 5.47, 3.82, 4.42, 5.10, 3.34, 3.39, 3.55, 7.32, 4.66, 4.27, 6.54, 5.32, 5.34, 6.82, 7.33, 3.55, 4.28, 5.34, 4.58, 6.59, 9.41, 9.58, 7.93, 5.97, 14.18, 13.43, 5.04, 14.79, 13.44, 4.61, 4.47
48hr SPTCs, ANN: 10.05, 9.60, 8.01, 8.72, 4.85, 4.44, 10.06, 6.89, 4.60, 4.46, 5.24, 6.08, 4.20, 7.57, 6.10, 4.90, 18.77, 6.08, 8.22, 5.40, 5.42, 6.74, 5.39, 6.32, 7.34, 7.37, 3.87, 4.41, 5.35, 4.55, 9.47, 12.15, 11.82, 10.64, 6.64, 16.54, 15.46, 5.37, 16.12, 14.12, 7.17, 6.70
48hr SPTCs, D<20%: 9.23, 7.57, 7.56, 8.52, 4.68, 4.41, 9.54, 6.57, 4.35, 4.32, 4.79, 5.60, 3.85, 7.49, 5.40, 4.77, 15.46, 5.78, 7.44, 4.67, 4.80, 6.48, 5.22, 5.20, 7.38, 7.39, 3.56, 4.24, 5.26, 4.24, 8.74, 11.52, 10.59, 9.43, 6.06, 15.80, 14.33, 5.25, 15.48, 13.51, 5.52, 5.88
72hr SPTCs, Perfect: 8.71, 7.20, 7.04, 7.55, 4.42, 4.19, 8.24, 6.93, 4.09, 4.01, 4.59, 5.19, 3.57, 4.19, 4.92, 3.09, 3.18, 3.29, 7.21, 4.37, 4.02, 6.44, 5.00, 5.00, 6.45, 6.89, 3.32, 4.08, 5.05, 4.23, 6.19, 8.84, 8.85, 7.53, 5.57, 13.66, 12.97, 5.04, 13.79, 12.83, 4.18, 3.88
72hr SPTCs, ANN: 9.19, 7.82, 7.01, 7.68, 4.36, 4.18, 8.39, 6.84, 4.26, 4.41, 4.81, 5.38, 3.70, 6.28, 5.22, 4.03, 17.84, 5.35, 7.37, 4.59, 4.20, 6.53, 5.06, 5.34, 6.49, 6.92, 3.35, 4.09, 5.23, 4.24, 8.66, 10.86, 10.74, 9.54, 5.89, 14.55, 14.37, 5.15, 14.04, 13.29, 4.77, 5.98
72hr SPTCs, D<20%: 8.88, 7.42, 7.00, 7.66, 4.11, 4.13, 8.35, 6.58, 3.97, 4.09, 4.45, 4.99, 3.62, 6.15, 4.86, 4.01, 14.16, 4.77, 7.32, 4.31, 3.92, 6.61, 4.86, 4.98, 6.42, 6.97, 3.27, 4.01, 5.22, 4.01, 8.36, 10.93, 10.15, 9.10, 5.53, 14.27, 12.73, 5.10, 14.11, 13.00, 4.33, 4.22
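Tables A.4 through A.14 summarise, site by site, the mean (MAE), standard deviation (SDAE) and maximum of the absolute AADT estimation errors, together with the number of samples. A minimal sketch of how such statistics can be computed from paired true and estimated AADT values; expressing the error as an absolute percentage of the true AADT and using the sample standard deviation are assumptions here, since the exact definitions are given in the thesis body:

```python
import numpy as np

def absolute_error_stats(aadt_true, aadt_est):
    """Return (MAE, SDAE, max error) of the absolute percentage errors."""
    true = np.asarray(aadt_true, dtype=float)
    est = np.asarray(aadt_est, dtype=float)
    err = 100.0 * np.abs(est - true) / true   # absolute error, percent of true AADT
    return err.mean(), err.std(ddof=1), err.max()

# Invented example: two estimates, each off by 10% of the true value.
mae, sdae, max_err = absolute_error_stats([10000, 20000], [11000, 18000])
```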
Table A.5: SDAE of AVC Sites. "Total" Sample
Values are listed for AVC sites 1-42, in order.
24hr SPTCs, Perfect: 10.83, 10.26, 6.67, 6.87, 5.47, 4.00, 8.34, 6.87, 4.39, 4.41, 5.71, 6.31, 4.16, 6.08, 5.41, 3.73, 3.77, 4.41, 5.67, 5.77, 4.93, 4.95, 5.17, 5.16, 5.51, 6.46, 4.30, 4.89, 6.36, 5.74, 6.00, 9.16, 10.12, 8.52, 7.88, 17.25, 16.89, 5.72, 13.86, 13.96, 5.34, 5.59
24hr SPTCs, ANN: 17.09, 22.16, 14.37, 7.53, 12.63, 9.04, 11.28, 14.31, 6.29, 9.71, 7.70, 9.01, 7.74, 9.89, 11.66, 10.06, 26.10, 12.62, 16.65, 13.83, 12.06, 6.73, 15.12, 14.96, 11.59, 7.76, 6.91, 7.75, 9.37, 6.53, 10.15, 21.35, 13.07, 24.24, 11.88, 23.43, 24.02, 8.34, 16.22, 20.63, 13.52, 19.38
24hr SPTCs, D<20%: 16.91, 19.61, 13.95, 7.12, 11.35, 9.10, 12.52, 14.80, 5.07, 9.80, 7.07, 6.99, 7.34, 9.89, 11.55, 8.10, 25.01, 8.88, 12.26, 11.48, 11.75, 5.22, 15.62, 14.49, 9.97, 7.51, 6.09, 6.85, 8.97, 6.05, 9.64, 20.96, 14.10, 24.57, 12.43, 23.42, 19.30, 9.30, 16.13, 20.44, 9.95, 16.60
48hr SPTCs, Perfect: 8.04, 7.49, 5.44, 5.77, 4.25, 3.35, 6.91, 5.22, 3.87, 3.60, 5.53, 5.11, 3.15, 3.95, 4.20, 3.00, 3.03, 2.92, 4.49, 4.50, 4.39, 4.41, 4.53, 4.55, 5.19, 6.18, 3.57, 3.95, 5.25, 4.79, 4.94, 7.88, 8.30, 6.64, 5.41, 15.13, 15.00, 4.54, 11.49, 12.09, 3.96, 3.65
48hr SPTCs, ANN: 8.80, 11.06, 5.91, 6.09, 4.85, 3.22, 7.62, 6.05, 4.39, 4.23, 6.00, 6.88, 4.18, 10.39, 7.40, 6.43, 23.16, 6.76, 9.70, 6.57, 7.73, 4.49, 4.76, 11.10, 5.87, 6.11, 4.07, 4.17, 5.06, 4.73, 9.84, 10.43, 10.36, 9.77, 7.83, 17.48, 18.31, 5.86, 12.81, 12.59, 12.33, 7.50
48hr SPTCs, D<20%: 8.27, 8.40, 5.11, 6.03, 4.71, 3.21, 7.29, 5.05, 3.98, 3.82, 5.49, 5.84, 3.55, 10.56, 6.03, 6.38, 22.43, 6.57, 4.66, 5.00, 6.71, 4.26, 4.44, 4.43, 5.70, 6.18, 3.51, 3.85, 5.00, 4.27, 8.11, 9.40, 9.30, 9.12, 6.79, 17.48, 17.80, 5.98, 12.80, 12.79, 9.77, 7.03
72hr SPTCs, Perfect: 7.54, 7.55, 4.85, 5.19, 4.03, 3.22, 6.75, 4.71, 3.66, 3.47, 5.35, 4.91, 2.92, 3.85, 4.07, 2.82, 2.87, 2.56, 4.07, 4.54, 4.25, 4.21, 4.08, 4.09, 4.91, 6.02, 3.26, 3.62, 5.11, 4.58, 4.48, 7.02, 7.36, 5.55, 4.95, 14.13, 14.20, 4.24, 10.60, 11.21, 3.92, 3.20
72hr SPTCs, ANN: 8.28, 8.92, 4.66, 5.36, 3.94, 3.26, 6.88, 4.82, 3.97, 3.93, 5.80, 5.62, 3.40, 6.34, 5.80, 3.50, 26.37, 4.84, 4.22, 4.63, 4.76, 4.20, 4.28, 5.95, 4.91, 5.97, 3.18, 3.65, 5.42, 4.99, 6.78, 8.72, 8.54, 6.52, 5.66, 14.22, 15.68, 4.27, 11.67, 11.39, 4.55, 7.06
72hr SPTCs, D<20%: 8.00, 9.07, 4.53, 5.24, 3.72, 3.20, 7.18, 4.72, 3.51, 3.71, 4.99, 4.75, 3.30, 6.31, 5.68, 3.49, 25.22, 3.42, 4.21, 4.25, 4.00, 4.19, 3.71, 4.15, 4.75, 6.02, 3.07, 3.53, 5.44, 4.40, 6.21, 8.79, 8.03, 6.07, 5.35, 14.26, 13.67, 4.31, 11.52, 11.42, 4.23, 3.54
Table A.6: Maximum Error of AVC Sites. "Total" Sample
Values are listed for AVC sites 1-42, in order.
24hr SPTCs, Perfect: 56.87, 53.48, 59.60, 63.07, 49.94, 22.80, 35.12, 39.70, 26.34, 33.07, 40.85, 43.32, 25.52, 33.23, 27.45, 21.17, 20.67, 33.52, 40.53, 36.43, 29.39, 28.23, 38.39, 36.59, 29.14, 29.85, 25.98, 31.46, 45.25, 34.60, 49.78, 46.77, 47.40, 39.30, 49.20, 122.72, 123.88, 36.27, 104.35, 113.57, 31.42, 40.11
24hr SPTCs, ANN: 96.63, 174.61, 135.40, 42.12, 133.35, 133.57, 40.65, 124.02, 50.78, 91.73, 49.10, 65.13, 71.30, 104.43, 116.09, 88.43, 122.63, 115.85, 141.96, 131.18, 125.90, 46.34, 168.74, 185.26, 122.09, 63.48, 55.94, 87.65, 113.43, 42.00, 69.70, 234.44, 65.94, 180.31, 94.21, 162.81, 226.25, 104.15, 100.47, 185.75, 94.21, 142.96
24hr SPTCs, D<20%: 96.63, 129.00, 135.40, 42.12, 133.35, 133.57, 40.65, 124.02, 37.64, 91.73, 46.64, 55.07, 71.30, 104.43, 116.09, 63.50, 122.63, 68.52, 135.66, 131.18, 125.90, 30.11, 168.74, 185.26, 122.09, 63.48, 55.94, 87.65, 113.43, 42.00, 69.70, 234.44, 65.94, 180.31, 94.21, 162.81, 153.48, 104.15, 100.47, 185.75, 81.56, 115.03
48hr SPTCs, Perfect: 35.75, 35.91, 40.35, 38.62, 33.30, 15.49, 29.54, 21.62, 24.42, 23.17, 40.74, 25.37, 17.98, 22.53, 19.66, 20.77, 17.27, 15.88, 22.75, 24.25, 27.23, 26.71, 28.58, 27.44, 26.43, 26.79, 17.13, 19.98, 28.38, 23.45, 32.21, 34.93, 38.79, 26.56, 35.16, 104.08, 97.10, 25.03, 89.19, 77.65, 26.77, 18.08
48hr SPTCs, ANN: 35.86, 52.24, 40.06, 38.55, 38.71, 15.37, 33.49, 35.50, 24.42, 26.54, 48.18, 50.97, 33.52, 140.25, 51.19, 62.90, 113.37, 74.76, 126.50, 53.57, 54.49, 27.20, 28.05, 154.36, 40.56, 26.79, 28.74, 24.25, 28.27, 26.57, 77.26, 63.78, 49.87, 44.27, 70.78, 105.47, 128.73, 54.86, 89.17, 77.50, 102.01, 41.36
48hr SPTCs, D<20%: 35.81, 45.81, 40.06, 38.55, 38.71, 15.37, 29.62, 21.62, 24.42, 23.34, 48.18, 34.71, 21.67, 140.25, 37.20, 62.90, 113.37, 74.76, 24.72, 47.39, 54.49, 26.71, 26.24, 29.19, 40.56, 26.79, 16.97, 19.78, 28.27, 23.05, 75.51, 42.38, 38.71, 44.27, 70.78, 105.47, 128.73, 54.86, 89.17, 77.50, 102.01, 41.36
72hr SPTCs, Perfect: 31.17, 32.06, 31.30, 37.58, 27.33, 14.57, 25.54, 18.82, 22.93, 22.41, 39.53, 24.40, 17.34, 17.52, 17.73, 18.35, 16.12, 15.29, 21.24, 22.47, 27.20, 26.95, 27.30, 26.29, 23.88, 24.99, 16.38, 16.72, 24.70, 21.36, 20.25, 30.07, 33.04, 23.99, 29.29, 107.53, 97.25, 24.14, 81.33, 77.00, 23.54, 17.19
72hr SPTCs, ANN: 33.27, 44.77, 30.08, 37.44, 27.13, 15.76, 25.62, 18.68, 22.94, 22.44, 39.55, 30.26, 21.38, 63.71, 42.97, 18.02, 113.54, 36.92, 21.82, 26.56, 33.38, 26.95, 27.40, 60.01, 22.27, 25.03, 16.11, 16.72, 34.56, 35.95, 39.77, 33.11, 32.25, 30.83, 29.26, 107.53, 96.63, 24.27, 81.31, 76.92, 22.97, 37.68
72hr SPTCs, D<20%: 31.38, 44.77, 30.08, 37.44, 27.13, 15.76, 25.62, 18.68, 22.94, 22.44, 39.55, 24.54, 21.38, 63.71, 42.97, 18.02, 113.54, 16.35, 21.82, 19.68, 26.47, 26.95, 25.09, 28.04, 22.27, 25.03, 16.11, 16.72, 34.56, 21.60, 29.09, 33.11, 31.88, 26.22, 29.26, 107.53, 96.63, 24.27, 81.31, 76.92, 22.97, 13.95
Table A.7: Number of Samples of AVC Sites. "Total" Sample
Values are listed for AVC sites 1-42, in order.
24hr SPTCs, Perfect: 150, 150, 349, 349, 342, 342, 115, 113, 343, 343, 345, 345, 348, 343, 342, 335, 332, 334, 349, 349, 343, 344, 350, 350, 344, 344, 293, 293, 339, 338, 244, 244, 145, 145, 305, 338, 343, 332, 341, 344, 270, 288
24hr SPTCs, ANN: 150, 150, 349, 349, 342, 342, 115, 113, 343, 343, 345, 345, 348, 343, 342, 335, 332, 334, 349, 349, 343, 344, 350, 350, 344, 344, 293, 293, 339, 338, 244, 244, 145, 145, 305, 338, 343, 332, 341, 344, 270, 288
24hr SPTCs, D<20%: 111, 111, 295, 294, 283, 297, 72, 88, 296, 227, 284, 283, 296, 269, 266, 295, 236, 299, 292, 264, 297, 294, 295, 274, 277, 281, 253, 274, 280, 290, 210, 219, 98, 100, 250, 292, 273, 228, 275, 285, 225, 198
48hr SPTCs, Perfect: 97, 97, 241, 241, 232, 232, 74, 73, 237, 237, 236, 236, 240, 232, 229, 230, 227, 230, 241, 241, 235, 236, 242, 242, 235, 235, 198, 198, 232, 232, 159, 159, 92, 92, 212, 229, 234, 227, 232, 235, 180, 191
48hr SPTCs, ANN: 97, 97, 241, 241, 232, 232, 74, 73, 237, 237, 236, 236, 240, 231, 229, 230, 225, 229, 241, 241, 233, 236, 242, 242, 235, 235, 198, 198, 231, 230, 159, 159, 92, 92, 208, 221, 228, 227, 228, 234, 141, 155
48hr SPTCs, D<20%: 82, 85, 223, 224, 223, 227, 64, 64, 222, 209, 219, 220, 222, 218, 208, 223, 143, 218, 232, 226, 218, 222, 235, 225, 223, 225, 187, 188, 219, 215, 149, 137, 72, 69, 190, 207, 204, 209, 195, 200, 126, 122
72hr SPTCs, Perfect: 67, 67, 187, 187, 179, 179, 50, 50, 185, 185, 183, 183, 187, 178, 174, 178, 175, 179, 188, 188, 183, 183, 189, 189, 182, 182, 152, 152, 180, 180, 118, 118, 63, 63, 165, 178, 185, 175, 180, 182, 129, 141
72hr SPTCs, ANN: 67, 67, 187, 187, 179, 179, 50, 50, 185, 185, 183, 183, 187, 177, 174, 178, 173, 178, 188, 188, 180, 183, 189, 189, 182, 182, 152, 152, 178, 178, 118, 118, 63, 63, 159, 169, 175, 175, 176, 181, 83, 94
72hr SPTCs, D<20%: 58, 59, 173, 178, 172, 178, 44, 48, 180, 165, 173, 175, 179, 171, 158, 173, 125, 171, 185, 177, 175, 179, 184, 183, 174, 177, 148, 151, 177, 172, 114, 113, 49, 48, 147, 160, 162, 169, 157, 162, 74, 73
Table A.8: MAE of AVC Sites. "Weekdays" Sample
Values are listed for AVC sites 1-42, in order.
24hr SPTCs, Perfect: 10.50, 9.21, 8.25, 9.59, 5.64, 5.22, 11.14, 8.79, 4.67, 4.40, 5.33, 6.35, 4.26, 5.08, 5.86, 3.63, 3.95, 4.62, 8.18, 5.16, 4.59, 7.05, 5.20, 5.43, 7.32, 7.74, 3.85, 5.23, 5.75, 4.88, 7.38, 10.28, 10.57, 8.75, 6.35, 14.87, 13.95, 5.52, 14.83, 13.16, 5.08, 4.65
24hr SPTCs, ANN: 12.95, 10.87, 9.75, 10.31, 6.70, 6.13, 17.33, 9.35, 5.58, 6.06, 6.32, 7.40, 5.24, 8.75, 9.34, 6.30, 26.67, 8.27, 10.32, 9.04, 6.30, 7.97, 5.56, 6.75, 8.61, 8.20, 4.66, 6.13, 6.26, 4.97, 9.60, 14.69, 16.23, 17.70, 7.18, 16.93, 15.75, 6.87, 17.71, 14.96, 10.39, 8.25
24hr SPTCs, D<20%: 11.86, 9.35, 9.29, 9.97, 6.20, 5.75, 17.70, 8.68, 4.93, 5.87, 5.84, 6.56, 5.16, 7.94, 8.41, 5.01, 24.25, 7.13, 9.19, 6.87, 5.33, 7.43, 5.02, 5.94, 8.16, 7.77, 4.39, 5.35, 5.63, 4.53, 9.25, 14.10, 14.22, 19.10, 6.72, 16.26, 12.48, 6.63, 16.44, 14.24, 8.09, 7.25
48hr SPTCs, Perfect: 8.59, 7.18, 7.73, 8.64, 4.64, 4.45, 9.57, 7.12, 3.93, 3.90, 4.98, 5.43, 3.56, 3.99, 4.71, 3.08, 3.19, 3.48, 7.85, 4.38, 4.12, 6.78, 5.13, 5.17, 7.07, 7.61, 3.25, 4.26, 4.96, 4.24, 6.51, 10.08, 8.44, 7.49, 5.46, 13.59, 12.87, 4.97, 13.65, 12.47, 4.25, 4.15
48hr SPTCs, ANN: 10.12, 8.10, 7.88, 9.11, 4.70, 4.41, 10.53, 6.88, 4.23, 4.08, 5.29, 6.01, 3.91, 6.71, 4.75, 4.58, 21.76, 6.24, 8.69, 5.35, 5.08, 6.84, 5.09, 5.66, 7.51, 7.70, 3.63, 4.38, 5.04, 4.14, 8.65, 12.41, 10.35, 10.14, 5.51, 15.83, 14.14, 5.10, 15.22, 12.82, 6.62, 5.86
48hr SPTCs, D<20%: 9.26, 7.49, 7.75, 8.84, 4.51, 4.37, 10.11, 7.01, 3.90, 4.05, 4.85, 5.64, 3.66, 6.54, 4.57, 4.46, 19.04, 5.86, 8.00, 4.55, 4.63, 6.79, 4.99, 5.20, 7.64, 7.69, 3.28, 4.16, 4.95, 3.83, 8.42, 11.72, 9.45, 9.64, 5.31, 15.24, 13.19, 5.01, 14.56, 12.17, 5.76, 5.20
72hr SPTCs, Perfect: 7.96, 6.68, 7.27, 8.03, 4.13, 4.01, 8.43, 7.03, 3.68, 3.69, 4.59, 4.82, 3.27, 3.48, 4.48, 2.77, 2.84, 3.13, 7.94, 4.07, 3.92, 6.74, 5.00, 5.02, 6.73, 7.30, 2.97, 3.79, 4.56, 3.89, 6.12, 9.98, 8.00, 7.39, 5.34, 12.67, 12.31, 4.79, 12.72, 12.04, 3.99, 3.73
72hr SPTCs, ANN: 7.81, 6.82, 7.27, 8.20, 3.98, 3.97, 8.73, 6.95, 3.78, 4.11, 4.54, 4.97, 3.46, 5.62, 3.94, 3.54, 19.03, 5.40, 8.11, 4.46, 3.84, 6.83, 4.98, 5.35, 6.69, 7.38, 3.00, 3.81, 4.73, 3.68, 8.17, 11.45, 10.16, 10.21, 5.23, 13.41, 13.45, 4.89, 13.16, 12.58, 4.51, 6.07
72hr SPTCs, D<20%: 7.87, 6.22, 7.25, 8.05, 3.70, 3.90, 8.45, 6.59, 3.45, 3.76, 4.35, 4.64, 3.39, 5.48, 3.71, 3.50, 15.25, 5.04, 8.05, 4.11, 3.70, 6.91, 4.70, 4.83, 6.69, 7.40, 2.89, 3.70, 4.73, 3.69, 8.21, 11.58, 8.58, 9.20, 5.09, 13.26, 12.27, 4.84, 13.04, 12.16, 4.12, 4.00
Table A.9: SDAE of AVC Sites. "Weekdays" Sample
Values are listed for AVC sites 1-42, in order.
24hr SPTCs, Perfect: 10.89, 10.68, 6.16, 6.62, 4.42, 3.90, 9.32, 7.66, 3.97, 3.74, 5.75, 5.66, 3.40, 4.74, 5.05, 3.36, 3.50, 4.63, 5.46, 5.15, 4.68, 4.65, 4.67, 4.68, 5.49, 6.48, 3.88, 4.76, 5.24, 4.72, 4.90, 9.58, 9.20, 7.69, 6.25, 16.50, 16.54, 5.00, 13.33, 13.57, 4.38, 3.97
24hr SPTCs, ANN: 13.20, 11.73, 9.70, 6.98, 11.32, 10.05, 11.70, 7.98, 6.30, 7.62, 7.68, 7.51, 6.55, 7.08, 9.00, 10.47, 28.93, 9.34, 13.88, 15.20, 10.90, 6.18, 6.83, 11.51, 8.10, 7.18, 6.57, 8.46, 7.38, 5.30, 8.30, 22.54, 12.20, 12.49, 8.49, 18.44, 17.54, 8.71, 16.51, 15.70, 15.48, 11.70
24hr SPTCs, D<20%: 13.54, 11.64, 9.45, 6.62, 9.02, 10.11, 13.30, 7.56, 4.62, 6.40, 7.10, 5.74, 6.60, 6.36, 8.08, 8.28, 28.76, 7.77, 10.73, 12.22, 9.75, 4.80, 4.88, 9.41, 6.19, 6.61, 6.27, 7.27, 5.55, 4.67, 7.88, 22.64, 12.83, 13.21, 8.43, 18.37, 12.72, 9.83, 16.24, 15.30, 10.47, 9.78
48hr SPTCs, Perfect: 7.70, 7.69, 5.29, 5.55, 3.81, 3.45, 7.45, 5.56, 3.60, 3.41, 5.79, 5.17, 2.67, 3.65, 3.96, 2.67, 2.76, 2.89, 4.24, 4.25, 4.46, 4.38, 4.32, 4.34, 5.19, 6.27, 3.43, 3.86, 4.59, 4.47, 4.58, 8.30, 7.06, 6.40, 4.77, 14.39, 14.32, 4.36, 10.94, 11.51, 3.51, 3.44
48hr SPTCs, ANN: 9.14, 9.56, 5.28, 5.92, 4.52, 3.32, 8.09, 5.41, 4.27, 3.86, 6.28, 7.04, 3.55, 4.52, 4.98, 6.75, 24.85, 7.26, 9.61, 6.96, 7.67, 4.32, 4.27, 5.74, 5.92, 6.31, 4.06, 4.16, 4.63, 4.45, 8.25, 11.13, 9.32, 9.54, 5.14, 16.87, 16.13, 4.50, 12.70, 11.72, 11.75, 6.86
48hr SPTCs, D<20%: 8.60, 8.90, 5.21, 5.82, 4.30, 3.30, 7.66, 5.31, 3.65, 3.67, 5.65, 6.05, 3.29, 4.24, 4.85, 6.69, 25.58, 7.04, 4.41, 5.16, 7.07, 4.36, 3.96, 4.42, 5.94, 6.34, 3.37, 3.74, 4.60, 3.90, 8.06, 9.85, 8.97, 9.85, 4.73, 16.97, 15.20, 4.49, 12.64, 11.71, 11.48, 6.68
72hr SPTCs, Perfect: 6.86, 7.46, 4.51, 4.64, 3.50, 3.28, 7.00, 5.10, 3.39, 3.28, 5.82, 4.86, 2.48, 3.20, 3.62, 2.38, 2.51, 2.51, 3.85, 4.03, 4.28, 4.12, 4.07, 4.09, 4.93, 6.14, 3.17, 3.58, 4.37, 4.39, 4.36, 7.37, 6.06, 5.77, 4.45, 12.53, 12.82, 3.99, 9.10, 10.28, 3.36, 3.10
72hr SPTCs, ANN: 6.96, 7.60, 4.37, 4.76, 3.29, 3.33, 7.02, 5.24, 3.67, 3.83, 5.90, 5.44, 3.26, 3.58, 3.36, 2.99, 27.59, 4.10, 4.03, 4.47, 4.24, 4.12, 4.11, 6.20, 4.84, 6.17, 3.08, 3.62, 4.84, 4.12, 5.76, 9.17, 8.04, 6.98, 4.62, 12.31, 14.56, 3.97, 10.75, 10.29, 3.63, 7.86
72hr SPTCs, D<20%: 7.06, 7.53, 4.34, 4.62, 2.93, 3.25, 7.18, 5.14, 2.99, 3.51, 5.25, 4.61, 3.22, 3.37, 3.13, 2.94, 26.75, 3.48, 4.02, 3.90, 3.91, 4.10, 3.25, 3.76, 4.71, 6.20, 2.87, 3.44, 4.86, 4.13, 5.78, 9.29, 6.71, 6.43, 4.61, 12.53, 12.68, 3.98, 10.39, 10.32, 3.54, 3.19
Table A.10: Maximum Error of AVC Sites. "Weekdays" Sample
Values are listed for AVC sites 1-42, in order.
24hr SPTCs, Perfect: 56.87, 53.48, 59.60, 63.07, 26.83, 22.80, 35.12, 39.70, 26.34, 23.43, 40.85, 27.96, 19.88, 24.96, 24.67, 20.94, 20.67, 33.52, 40.53, 36.43, 29.39, 28.23, 34.96, 36.59, 29.14, 29.85, 25.73, 31.46, 28.11, 24.76, 26.19, 46.77, 36.51, 32.68, 49.20, 114.42, 123.88, 36.27, 104.35, 113.57, 26.75, 16.04
24hr SPTCs, ANN: 62.96, 65.90, 108.31, 38.12, 121.70, 133.57, 40.65, 39.56, 50.78, 76.03, 49.10, 54.98, 71.30, 37.01, 52.67, 88.43, 122.63, 66.67, 135.66, 131.18, 125.90, 46.34, 51.82, 117.08, 87.47, 53.67, 55.94, 87.65, 69.70, 35.26, 57.21, 234.44, 43.86, 45.16, 94.21, 114.40, 121.33, 104.15, 100.47, 112.98, 94.21, 105.33
24hr SPTCs, D<20%: 62.96, 65.90, 108.31, 35.95, 112.97, 133.57, 40.65, 39.56, 37.64, 44.81, 46.64, 31.94, 71.30, 37.01, 52.67, 63.50, 122.63, 57.44, 135.66, 131.18, 125.90, 29.01, 42.46, 117.08, 31.48, 28.59, 55.94, 87.65, 29.17, 35.26, 57.21, 234.44, 43.86, 45.16, 94.21, 114.40, 70.14, 104.15, 100.47, 112.98, 81.56, 53.91
48hr SPTCs, Perfect: 35.75, 35.91, 40.35, 32.98, 21.21, 15.49, 29.54, 21.62, 24.42, 23.17, 40.74, 25.37, 13.94, 17.92, 19.38, 13.73, 13.16, 15.88, 22.75, 22.57, 27.23, 26.71, 28.58, 27.44, 26.43, 26.79, 17.13, 19.98, 23.48, 22.87, 21.73, 34.93, 30.48, 26.11, 26.03, 102.58, 89.93, 25.03, 89.19, 74.53, 16.69, 17.86
48hr SPTCs, ANN: 35.86, 45.81, 40.06, 32.98, 38.71, 15.37, 33.49, 21.62, 24.42, 23.34, 48.18, 50.97, 21.67, 23.73, 37.20, 62.90, 113.37, 74.76, 126.50, 53.57, 54.49, 26.71, 28.05, 51.34, 40.56, 26.79, 28.74, 24.25, 23.44, 26.57, 75.51, 63.78, 38.71, 44.27, 33.17, 105.47, 89.91, 25.03, 89.17, 74.53, 102.01, 41.36
48hr SPTCs, D<20%: 35.81, 45.81, 40.06, 32.98, 38.71, 15.37, 29.62, 21.62, 24.42, 23.34, 48.18, 34.71, 21.67, 23.73, 37.20, 62.90, 113.37, 74.76, 24.72, 47.39, 54.49, 26.71, 26.24, 29.19, 40.56, 26.79, 16.97, 19.78, 23.44, 23.05, 75.51, 42.38, 38.71, 44.27, 28.25, 105.47, 89.91, 25.03, 89.17, 74.53, 102.01, 41.36
72hr SPTCs, Perfect: 28.01, 32.06, 26.78, 19.26, 16.29, 14.57, 25.54, 18.82, 22.93, 22.41, 39.53, 24.40, 11.82, 13.77, 15.70, 13.08, 11.98, 15.29, 21.24, 19.00, 27.20, 26.95, 27.30, 26.29, 23.88, 24.99, 16.38, 16.72, 21.36, 21.36, 15.29, 30.07, 26.87, 23.99, 20.59, 64.85, 75.94, 24.14, 38.71, 49.63, 15.11, 17.19
72hr SPTCs, ANN: 28.09, 41.03, 26.74, 19.08, 15.90, 15.76, 25.62, 18.68, 22.94, 22.44, 39.55, 30.26, 21.38, 22.11, 17.32, 14.97, 113.54, 22.77, 21.82, 26.56, 26.47, 26.95, 27.40, 60.01, 21.16, 25.03, 16.11, 16.72, 34.56, 21.60, 28.55, 33.11, 32.25, 30.83, 20.59, 64.25, 87.04, 24.27, 50.33, 49.11, 15.21, 37.68
72hr SPTCs, D<20%: 28.09, 41.03, 26.74, 19.08, 15.04, 15.76, 25.62, 18.68, 22.94, 22.44, 39.55, 24.54, 21.38, 22.11, 17.32, 14.97, 113.54, 16.35, 21.82, 19.68, 26.47, 26.95, 17.05, 24.14, 18.72, 25.03, 16.11, 16.72, 34.56, 21.60, 28.55, 33.11, 27.20, 26.22, 20.59, 64.25, 75.81, 24.27, 50.33, 49.11, 15.21, 13.95
Table A.11: Number of Samples of AVC Sites. "Weekdays" Sample
Values are listed for AVC sites 1-42, in order.
24hr SPTCs, Perfect: 99, 99, 250, 250, 247, 247, 76, 74, 246, 246, 246, 246, 251, 244, 243, 240, 237, 239, 250, 250, 247, 248, 251, 251, 248, 248, 210, 210, 242, 241, 172, 172, 95, 95, 218, 239, 244, 237, 242, 245, 175, 193
24hr SPTCs, ANN: 99, 99, 250, 250, 247, 247, 76, 74, 246, 246, 246, 246, 251, 244, 243, 240, 237, 239, 250, 250, 247, 248, 251, 251, 248, 248, 210, 210, 242, 241, 172, 172, 95, 95, 218, 239, 244, 237, 242, 245, 175, 193
24hr SPTCs, D<20%: 75, 72, 225, 229, 214, 230, 46, 66, 223, 164, 221, 217, 236, 191, 194, 213, 157, 214, 231, 197, 228, 224, 230, 214, 207, 212, 184, 198, 210, 217, 161, 160, 62, 68, 186, 222, 208, 166, 192, 219, 152, 140
48hr SPTCs, Perfect: 73, 73, 194, 194, 189, 189, 55, 54, 191, 191, 189, 189, 195, 185, 182, 185, 182, 185, 194, 194, 191, 192, 195, 195, 191, 191, 159, 159, 186, 186, 126, 126, 68, 68, 171, 182, 187, 182, 185, 188, 136, 147
48hr SPTCs, ANN: 73, 73, 194, 194, 189, 189, 55, 54, 191, 191, 189, 189, 195, 184, 182, 185, 180, 184, 194, 194, 189, 192, 195, 195, 191, 191, 159, 159, 185, 184, 126, 126, 68, 68, 167, 174, 181, 182, 181, 187, 97, 111
48hr SPTCs, D<20%: 65, 68, 188, 182, 184, 185, 48, 49, 183, 174, 180, 180, 186, 174, 171, 181, 101, 175, 189, 181, 178, 185, 192, 183, 183, 186, 150, 150, 180, 175, 121, 112, 54, 53, 155, 165, 167, 172, 156, 170, 88, 90
72hr SPTCs, Perfect: 49, 49, 142, 142, 137, 137, 36, 36, 140, 140, 137, 137, 143, 133, 129, 134, 131, 135, 142, 142, 140, 140, 143, 143, 139, 139, 114, 114, 135, 135, 86, 86, 45, 45, 127, 131, 138, 132, 134, 137, 101, 107
72hr SPTCs, ANN: 49, 49, 142, 142, 137, 137, 36, 36, 140, 140, 137, 137, 143, 132, 129, 134, 129, 134, 142, 142, 137, 140, 143, 143, 139, 139, 114, 114, 133, 133, 86, 86, 45, 45, 121, 122, 128, 132, 130, 136, 55, 60
72hr SPTCs, D<20%: 45, 43, 135, 140, 131, 136, 33, 34, 136, 125, 132, 132, 140, 128, 124, 130, 98, 129, 139, 134, 134, 137, 139, 139, 133, 136, 112, 113, 132, 132, 85, 81, 33, 35, 113, 115, 122, 129, 118, 125, 47, 47
Table A.12: MAE of AVC Sites. "Week-ends" Sample
Values are listed for AVC sites 1-42, in order.
24hr SPTCs, Perfect: 12.25, 10.98, 8.89, 7.98, 6.69, 5.44, 9.28, 6.73, 7.03, 6.73, 6.62, 6.82, 6.18, 9.39, 8.04, 5.12, 4.94, 5.36, 6.16, 6.73, 6.51, 6.88, 7.05, 8.47, 7.06, 7.07, 5.88, 6.16, 7.98, 7.00, 9.22, 10.00, 15.02, 12.20, 11.48, 18.50, 17.66, 7.40, 20.22, 18.62, 7.53, 7.84
24hr SPTCs, ANN: 21.09, 23.45, 17.31, 9.87, 9.30, 6.55, 10.76, 14.19, 7.92, 11.22, 8.86, 9.94, 8.93, 12.62, 12.82, 8.69, 8.53, 14.64, 13.39, 9.79, 9.05, 8.48, 14.66, 16.08, 11.36, 8.93, 6.81, 6.46, 9.28, 8.13, 12.43, 15.84, 21.56, 25.72, 16.98, 32.07, 29.74, 8.29, 22.51, 27.33, 9.50, 21.69
24hr SPTCs, D<20%: 20.98, 22.39, 16.87, 9.33, 8.82, 5.10, 9.27, 15.11, 7.62, 10.98, 7.89, 8.43, 7.66, 12.76, 13.06, 7.96, 8.41, 12.33, 8.42, 8.39, 8.98, 7.27, 15.89, 14.97, 9.90, 8.87, 5.71, 6.44, 10.14, 8.06, 12.35, 14.03, 20.94, 27.37, 18.17, 31.07, 26.26, 8.46, 22.99, 28.62, 8.82, 17.42
48hr SPTCs, Perfect: 9.64, 8.90, 7.08, 6.60, 5.47, 4.65, 7.55, 5.65, 6.07, 5.45, 4.54, 5.60, 4.97, 6.11, 6.59, 4.40, 4.22, 3.84, 5.10, 5.81, 4.92, 5.48, 6.07, 6.01, 5.75, 6.13, 4.78, 4.36, 6.85, 5.94, 6.89, 6.86, 12.84, 9.18, 8.07, 16.47, 15.66, 5.34, 19.30, 17.29, 5.73, 5.52
48hr SPTCs, ANN: 9.81, 14.17, 8.55, 7.13, 5.50, 4.57, 8.67, 6.90, 6.15, 6.06, 5.04, 6.33, 5.46, 10.92, 11.32, 6.19, 6.80, 5.43, 6.31, 5.61, 6.86, 6.27, 6.60, 9.02, 6.63, 5.95, 4.88, 4.52, 6.60, 6.19, 12.58, 11.14, 15.99, 12.05, 11.27, 19.14, 20.53, 6.49, 19.58, 19.28, 8.37, 8.80
48hr SPTCs, D<20%: 9.12, 7.87, 6.57, 7.13, 5.48, 4.58, 7.85, 5.15, 6.50, 5.63, 4.53, 5.45, 4.83, 11.27, 9.27, 6.10, 6.86, 5.46, 4.97, 5.17, 5.57, 4.94, 6.24, 5.21, 6.19, 5.97, 4.70, 4.55, 6.70, 5.99, 10.12, 10.61, 13.99, 8.73, 9.38, 18.02, 19.48, 6.37, 19.16, 21.09, 4.97, 7.78
72hr SPTCs, Perfect: 10.76, 8.60, 6.33, 6.05, 5.35, 4.80, 7.74, 6.66, 5.37, 5.01, 4.60, 6.32, 4.56, 6.30, 6.17, 4.06, 4.20, 3.78, 4.98, 5.30, 4.37, 5.46, 5.03, 4.94, 5.53, 5.58, 4.36, 4.94, 6.51, 5.26, 6.40, 5.77, 10.99, 7.88, 6.32, 16.41, 14.91, 5.80, 16.92, 15.21, 4.86, 4.35
72hr SPTCs, ANN: 12.95, 10.53, 6.17, 6.02, 5.59, 4.88, 7.53, 6.56, 5.76, 5.35, 5.61, 6.59, 4.47, 8.21, 8.90, 5.49, 14.33, 5.18, 5.11, 5.00, 5.36, 5.56, 5.29, 5.29, 5.84, 5.45, 4.39, 4.93, 6.68, 5.88, 9.97, 9.28, 12.19, 7.86, 7.97, 17.53, 16.88, 5.95, 16.50, 15.43, 5.27, 5.82
72hr SPTCs, D<20%: 12.38, 10.64, 6.12, 6.22, 5.42, 4.88, 8.05, 6.56, 5.58, 5.10, 4.80, 6.07, 4.43, 8.17, 9.06, 5.55, 10.19, 3.92, 5.11, 4.92, 4.65, 5.64, 5.35, 5.43, 5.56, 5.55, 4.46, 4.93, 6.68, 5.05, 8.79, 9.28, 13.38, 8.82, 6.99, 16.85, 14.12, 5.92, 17.37, 15.86, 4.70, 4.62
Table A.13: SDAE of AVC Sites. "Week-ends" Sample
Values are listed for AVC sites 1-42, in order.
24hr SPTCs, Perfect: 10.71, 9.37, 7.83, 7.37, 7.51, 4.26, 5.92, 4.84, 4.92, 5.46, 5.52, 7.70, 5.47, 7.75, 5.95, 4.39, 4.29, 3.76, 5.97, 7.01, 5.29, 5.67, 6.08, 5.68, 5.61, 6.40, 4.96, 5.18, 8.37, 7.52, 7.93, 8.11, 11.19, 9.59, 10.08, 18.78, 17.52, 7.06, 14.44, 14.20, 6.49, 7.52
24hr SPTCs, ANN: 22.01, 32.98, 21.29, 8.80, 15.44, 5.68, 9.07, 21.58, 6.00, 12.97, 7.49, 11.82, 9.73, 14.37, 16.25, 8.77, 7.84, 17.72, 22.09, 9.57, 14.50, 8.00, 25.20, 19.91, 17.56, 9.12, 7.54, 5.58, 12.89, 8.50, 13.44, 18.29, 14.04, 37.20, 15.74, 29.97, 33.32, 7.27, 15.02, 27.67, 8.91, 27.29
24hr SPTCs, D<20%: 21.29, 27.32, 22.26, 8.70, 16.58, 4.09, 8.93, 26.43, 5.83, 14.93, 6.77, 9.98, 9.55, 14.96, 17.44, 7.21, 7.84, 10.35, 16.96, 8.93, 16.54, 6.41, 30.69, 24.24, 16.75, 9.79, 5.51, 5.58, 14.72, 8.49, 13.77, 15.70, 15.33, 38.77, 17.41, 32.61, 30.22, 7.64, 14.99, 29.64, 8.81, 25.39
48hr SPTCs, Perfect: 9.15, 6.83, 6.04, 6.42, 5.80, 2.90, 4.87, 4.01, 4.46, 4.12, 4.33, 4.94, 4.54, 4.63, 4.78, 3.96, 3.87, 3.05, 4.84, 5.29, 4.06, 4.45, 5.30, 5.34, 5.14, 5.63, 3.90, 4.35, 7.19, 5.78, 6.18, 5.42, 10.61, 7.27, 7.20, 17.69, 17.43, 5.24, 12.58, 13.65, 4.99, 4.16
48hr SPTCs, ANN: 7.84, 13.96, 8.04, 6.55, 6.08, 2.82, 6.06, 7.78, 4.60, 5.25, 4.75, 6.28, 6.10, 21.08, 11.78, 4.71, 6.07, 4.14, 9.94, 4.71, 7.90, 5.19, 6.30, 22.30, 5.66, 5.00, 3.98, 4.27, 6.42, 5.46, 14.10, 7.20, 12.13, 10.47, 13.44, 19.54, 24.58, 9.59, 12.76, 14.64, 13.59, 8.65
48hr SPTCs, D<20%: 7.13, 6.19, 4.46, 6.75, 6.29, 2.85, 5.93, 3.91, 4.75, 4.31, 4.74, 4.86, 4.61, 21.73, 8.90, 4.63, 6.26, 4.22, 4.94, 4.29, 4.77, 3.38, 6.07, 4.51, 4.32, 5.19, 3.89, 4.32, 6.40, 5.30, 8.31, 7.13, 9.70, 6.31, 11.86, 19.43, 26.25, 10.45, 12.95, 15.88, 3.47, 7.73
72hr SPTCs, Perfect: 9.05, 7.82, 5.80, 6.46, 5.38, 2.97, 6.27, 3.68, 4.17, 3.90, 3.63, 4.96, 3.91, 4.79, 4.98, 3.72, 3.61, 2.69, 3.96, 5.79, 4.16, 4.41, 4.14, 4.12, 4.79, 5.47, 3.32, 3.64, 6.72, 5.00, 4.86, 4.85, 9.78, 5.09, 6.34, 17.71, 17.66, 4.92, 13.73, 13.51, 5.52, 3.51
72hr SPTCs, ANN: 10.43, 11.64, 5.42, 6.74, 5.43, 2.96, 6.69, 3.68, 4.53, 4.13, 5.47, 6.02, 3.76, 10.84, 8.98, 4.46, 22.31, 6.67, 4.03, 5.13, 6.04, 4.36, 4.84, 5.15, 5.12, 5.10, 3.30, 3.66, 6.71, 6.76, 8.96, 7.26, 9.77, 4.98, 7.84, 18.08, 18.33, 5.04, 13.79, 14.13, 6.01, 5.46
72hr SPTCs, D<20%: 10.20, 12.02, 5.10, 6.97, 5.38, 2.96, 7.52, 3.68, 4.43, 4.14, 4.06, 5.06, 3.49, 11.01, 9.68, 4.50, 18.55, 3.12, 4.03, 5.21, 4.25, 4.38, 4.88, 5.21, 4.83, 5.20, 3.38, 3.66, 6.71, 5.10, 7.43, 7.26, 9.70, 5.20, 7.17, 17.86, 16.44, 5.23, 14.08, 14.33, 5.29, 4.14
Table A.14: Maximum Error of AVC Sites. "Week-ends" Sample
Values are listed for AVC sites 1-42, in order.
24hr SPTCs, Perfect: 51.37, 50.58, 34.56, 42.50, 49.94, 19.95, 23.89, 16.95, 24.31, 33.07, 28.66, 43.32, 25.52, 33.23, 27.45, 21.17, 19.09, 15.74, 28.97, 33.56, 25.37, 27.93, 38.39, 28.85, 26.37, 27.02, 25.98, 22.50, 45.25, 34.60, 49.78, 36.00, 47.40, 39.30, 44.22, 122.72, 104.91, 29.87, 80.54, 86.36, 31.42, 40.11
24hr SPTCs, ANN: 96.63, 174.61, 135.40, 42.12, 133.35, 30.41, 39.64, 124.02, 36.36, 91.73, 37.62, 65.13, 61.25, 104.43, 116.09, 54.41, 47.78, 115.85, 141.96, 46.70, 118.36, 43.06, 168.74, 185.26, 122.09, 63.48, 47.93, 24.14, 113.43, 42.00, 69.70, 103.75, 65.94, 180.31, 82.57, 162.81, 226.25, 30.28, 80.40, 185.75, 48.00, 142.96
24hr SPTCs, D<20%: 96.63, 129.00, 135.40, 42.12, 133.35, 17.67, 39.64, 124.02, 36.36, 91.73, 28.67, 55.07, 61.25, 104.43, 116.09, 46.15, 47.78, 68.52, 115.20, 41.61, 118.36, 30.11, 168.74, 185.26, 122.09, 63.48, 34.30, 24.14, 113.43, 42.00, 69.70, 103.75, 65.94, 180.31, 82.57, 162.81, 153.48, 30.28, 80.40, 185.75, 48.00, 115.03
48hr SPTCs, Perfect: 33.21, 26.74, 32.69, 38.62, 33.30, 12.34, 15.54, 14.09, 21.07, 21.89, 27.25, 19.95, 17.98, 22.53, 19.66, 20.77, 17.27, 12.12, 19.83, 24.25, 20.82, 23.51, 21.67, 21.87, 25.21, 26.29, 15.77, 16.70, 28.38, 23.45, 32.21, 22.53, 38.79, 26.56, 35.16, 104.08, 97.10, 23.59, 69.82, 77.65, 26.77, 18.08
48hr SPTCs, ANN: 31.07, 52.24, 34.70, 38.55, 31.60, 12.50, 19.71, 35.50, 21.15, 26.54, 27.20, 30.34, 33.52, 140.25, 51.19, 18.89, 24.98, 17.73, 64.34, 17.01, 44.00, 27.20, 24.65, 154.36, 31.35, 19.24, 15.65, 15.93, 28.27, 20.04, 77.26, 25.37, 49.87, 41.92, 70.78, 103.89, 128.73, 54.86, 69.82, 77.50, 86.48, 33.39
48hr SPTCs, D<20%: 19.40, 19.78, 21.73, 38.55, 31.60, 12.50, 19.09, 14.05, 21.15, 21.75, 27.20, 22.12, 17.87, 140.25, 35.12, 18.89, 24.98, 17.73, 20.69, 17.01, 20.80, 16.31, 24.65, 18.47, 15.78, 19.24, 15.65, 15.93, 28.27, 18.44, 31.94, 24.47, 37.03, 17.66, 70.78, 103.89, 128.73, 54.86, 69.82, 77.50, 13.18, 29.83
72hr SPTCs, Perfect: 31.17, 30.37, 31.30, 37.58, 27.33, 11.53, 19.07, 12.40, 20.70, 17.32, 24.99, 21.00, 17.34, 17.52, 17.73, 18.35, 16.12, 9.85, 18.78, 22.47, 19.46, 20.92, 20.17, 16.90, 22.23, 22.85, 12.80, 16.06, 24.70, 18.81, 20.25, 20.57, 33.04, 21.78, 29.29, 107.53, 97.25, 20.50, 81.33, 77.00, 23.54, 12.01
72hr SPTCs, ANN: 33.27, 44.77, 30.08, 37.44, 27.13, 11.42, 21.18, 12.48, 20.99, 17.57, 26.44, 29.21, 15.43, 63.71, 42.97, 18.02, 99.71, 36.92, 18.97, 17.34, 33.38, 20.99, 25.09, 28.04, 22.27, 21.10, 12.81, 16.03, 24.69, 35.95, 39.77, 24.42, 31.88, 21.86, 29.26, 107.53, 96.63, 21.36, 81.31, 76.92, 22.97, 24.38
72hr SPTCs, D<20%: 31.38, 44.77, 30.08, 37.44, 27.13, 11.42, 21.18, 12.48, 20.99, 17.57, 26.44, 22.03, 13.46, 63.71, 42.97, 18.02, 99.71, 14.31, 18.97, 17.34, 19.46, 20.99, 25.09, 28.04, 22.27, 21.10, 12.81, 16.03, 24.69, 18.78, 29.09, 24.42, 31.88, 21.86, 29.26, 107.53, 96.63, 21.36, 81.31, 76.92, 22.97, 12.00
Acknowledgements

I wish to thank the "Fondazione Ing. Aldo Gini" of Padova and the Faculty Board of the Doctoral School in Civil and Environmental Engineering of the University of Trieste for the valuable financial support that made my research period at Virginia Tech possible.

I thank Prof. Romeo Vescovi and Prof. Riccardo Rossi for offering me this opportunity for professional and personal growth and for the helpfulness they have shown me over these years.

I also thank Prof. Massimiliano Gastaldi for the support, the availability and the stimulating exchange of ideas he has always been able to offer me.

Special thanks go to Prof. Shinya Kikuchi, my "tutor" during my stay at Virginia Tech, for his ability to stimulate the continuous search for new solutions and for the wonderful welcome he gave me.

Thanks to my family for the support, help and affection they have given me over these years, thanks to which I was able to get through even the most difficult moments.

Finally, a special thought goes to Giulia, my greatest joy.