Knowledge Discovery from Transportation Network Data
Paper Review
Jiang, W., Vaidya, J., Balaporia, Z., Clifton, C., and Banich, B. Knowledge Discovery from Transportation Network Data. In ICDE, 2005.
Outline
● Background.
● Experiments.
-- Structurally Similar Routes
-- Temporally Repeated Routes
● Experiment results.
● Conventional techniques.
● New challenges.
A natural application area for Data Mining
● Transportation and logistics are an important sector of the economy.
--Transportation consumes 60% of oil worldwide
● Data mining has led to significant gains in other areas.
● Computer use is widespread in transportation and logistics.
--Inventory management, parcel tracking, and even on-truck location sensors
Existing Applications
Data Mining
● Mining the transactional characteristics of freight and events.
-- e.g., classification on safety/accident records might find that trucks are prone to accidents at 7:00 AM on east-west roads.
-- Does not use the geometry of the network.
Network Structure
● Optimization
-- Finds a solution (minimizes cost)
Transportation Networks
● Graph problems
● Graph mining
-- e.g., finding frequent sub-graphs
Algorithms
* WARMR
* AGM
* SUBDUE
* FSG
Dataset
● Six months of origin-destination (OD) data from a large third-party logistics company: 98,292 transactions.
● Represented as a directed graph by mapping locations to vertices.
● Each transaction is then represented as an edge between its OD pair.
● The edges are labeled with the other attributes of the transaction: pickup date, delivery date, distance, hours, weight, and mode. Numeric attributes are discretized with a binning strategy.
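A minimal sketch of this mapping, assuming a hypothetical `Shipment` record that carries only a few of the attributes and made-up distance bin boundaries (the study's actual bins are not listed on the slide). Each location becomes a vertex id and each shipment becomes one labeled directed edge, so repeated shipments on the same lane produce parallel edges.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Hypothetical distance bin boundaries; the study's real bins are not given.
DISTANCE_BINS = [100, 500, 1000]


@dataclass
class Shipment:
    """Hypothetical OD record with a subset of the transaction attributes."""
    origin: str
    destination: str
    pickup_date: str
    distance: float
    mode: str


def bin_label(value, boundaries, prefix):
    """Discretize a numeric value: below the first boundary -> prefix + '0', etc."""
    return f"{prefix}{bisect_right(boundaries, value)}"


def build_od_graph(shipments):
    """Map locations to vertex ids and each shipment to one labeled directed edge."""
    vertex_id = {}
    edges = []
    for s in shipments:
        u = vertex_id.setdefault(s.origin, len(vertex_id))
        v = vertex_id.setdefault(s.destination, len(vertex_id))
        label = (s.pickup_date, bin_label(s.distance, DISTANCE_BINS, "D"), s.mode)
        edges.append((u, v, label))  # parallel edges are kept (multigraph)
    return vertex_id, edges


shipments = [
    Shipment("A", "B", "2004-01-05", 283.0, "truck"),
    Shipment("B", "A", "2004-01-06", 283.0, "truck"),
    Shipment("A", "C", "2004-01-05", 1003.0, "rail"),
]
vertex_id, edges = build_od_graph(shipments)
print(vertex_id)  # {'A': 0, 'B': 1, 'C': 2}
print(edges[2])   # (0, 2, ('2004-01-05', 'D3', 'rail'))
```

Keeping the edge list rather than a simple adjacency map preserves every transaction, which matters when patterns are later counted by frequency.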
Mining Interests
● Structurally Similar Routes
--Identify structurally similar patterns that occur in many locations.
Methods
* SUBDUE
* FSG
● Temporally Repeated Routes
--Find patterns of routes repeated in time, rather than space.
Method
* FSG
Structurally Similar Routes
● We assign all vertices the same label, so patterns can match regardless of location.
● Three variants for edge labels: weight, distance, and time.
-- OD_TD: TOTAL-DISTANCE
-- OD_GW: GROSS-WEIGHT
-- OD_TH: MOVE-TRANSIT-HOURS
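The three graph variants can be sketched as one relabeling step, assuming each edge already carries binned labels under hypothetical attribute keys (the slide only gives the field names TOTAL-DISTANCE, GROSS-WEIGHT, and MOVE-TRANSIT-HOURS). Every vertex gets the same label, and each variant keeps exactly one attribute as the edge label.

```python
# Hypothetical attribute keys for the three edge-label variants on the slide.
VARIANTS = {
    "OD_TD": "total_distance",      # TOTAL-DISTANCE
    "OD_GW": "gross_weight",        # GROSS-WEIGHT
    "OD_TH": "move_transit_hours",  # MOVE-TRANSIT-HOURS
}


def make_variant(edges, variant):
    """Build one graph variant: every vertex gets label 'v', edges get one attribute.

    `edges` is a list of (origin, destination, attrs) tuples, where attrs maps
    attribute keys to already-binned label strings.
    """
    attr = VARIANTS[variant]
    vertex_labels = {}
    labeled_edges = []
    for u, v, attrs in edges:
        vertex_labels[u] = "v"  # vertex ids keep the structure; labels are uniform
        vertex_labels[v] = "v"
        labeled_edges.append((u, v, attrs[attr]))
    return vertex_labels, labeled_edges


edges = [
    ("A", "B", {"total_distance": "D1", "gross_weight": "W2", "move_transit_hours": "H0"}),
    ("B", "C", {"total_distance": "D0", "gross_weight": "W0", "move_transit_hours": "H1"}),
]
labels, gw_edges = make_variant(edges, "OD_GW")
print(gw_edges)  # [('A', 'B', 'W2'), ('B', 'C', 'W0')]
```

Because vertex labels carry no location information, a miner can only distinguish patterns by their shape and edge labels, which is what "structurally similar" requires.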
Experiments with SUBDUE (MDL principle)
SUBDUE: a substructure discovery system
Results:
● Took about 3.25 hours to find the best 3 patterns (beam size 4) in a graph of 100 vertices and 561 edges.
● Would need an estimated 6 months on the complete graph.
● Results were trivial.
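SUBDUE ranks a candidate substructure S by how much it compresses the input graph G. In the SUBDUE literature the score is, roughly, the ratio below, where DL(·) is a description length in bits and G | S is G with each instance of S collapsed to a single vertex. The exact encoding used in this study is not stated on the slide, and the numbers in the second line are made up purely to show the arithmetic.

```latex
\mathrm{value}(S) = \frac{DL(G)}{DL(S) + DL(G \mid S)}

% Hypothetical example: DL(G) = 10000, DL(S) = 400, DL(G | S) = 7600 bits
\mathrm{value}(S) = \frac{10000}{400 + 7600} = \frac{10000}{8000} = 1.25
```

A value above 1 means that describing S plus the compressed graph is cheaper than describing G directly; the beam keeps only the highest-valued candidates for further expansion.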
● Significant traffic from node 2 to node 4 via node 3, but little return traffic, so trucks likely travel back empty (deadheading).
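A simple complementary check for candidate deadheading lanes is to compare shipment counts in each direction of a vertex pair. This is only a heuristic sketch, not the graph-mining method used in the study; the function name and the `min_ratio` threshold are hypothetical.

```python
from collections import Counter


def imbalanced_lanes(od_pairs, min_ratio=3.0):
    """Flag lanes whose forward volume is at least `min_ratio` times the return volume.

    `od_pairs` is an iterable of (origin, destination), one entry per shipment.
    A lane with no return traffic still needs at least `min_ratio` shipments.
    Returns {(origin, destination): (forward_count, return_count)}.
    """
    counts = Counter(od_pairs)
    flagged = {}
    for (u, v), forward in counts.items():
        backward = counts.get((v, u), 0)
        if forward >= min_ratio * max(backward, 1):
            flagged[(u, v)] = (forward, backward)
    return flagged


# Toy data echoing the slide: heavy 2 -> 3 -> 4 traffic, little coming back.
od_pairs = [(2, 3)] * 5 + [(3, 4)] * 5 + [(4, 3)]
print(imbalanced_lanes(od_pairs))  # {(2, 3): (5, 0), (3, 4): (5, 1)}
```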
Experiments with FSG
● FSG mines patterns across a set of graph transactions.
● To use it on a single large graph, the graph is divided into multiple distinct sub-graphs, and each sub-graph is treated as a separate transaction.
✔ Breadth-first partitioning
✔ Depth-first partitioning
✔ Both may result in patterns being broken across partitions.
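One plausible way to build breadth-first or depth-first partitions is to walk the graph in BFS or DFS order and cut the resulting edge sequence into fixed-size chunks, each becoming one FSG transaction. The slides do not give the exact procedure or say whether partition sizes count edges or vertices; this sketch assumes edges, and the function names are hypothetical.

```python
from collections import deque


def traversal_edge_order(edges, breadth_first=True):
    """Order edges by a BFS or DFS over the graph, ignoring edge direction.

    `edges` is a list of (u, v, label) triples. Each edge is emitted once,
    when the traversal first visits one of its endpoints.
    """
    incident = {}
    for i, (u, v, _) in enumerate(edges):
        incident.setdefault(u, []).append(i)
        incident.setdefault(v, []).append(i)

    visited, emitted, order = set(), set(), []
    for start in incident:  # restart in every connected component
        if start in visited:
            continue
        frontier = deque([start])
        while frontier:
            # Queue (FIFO) gives breadth-first order, stack (LIFO) depth-first.
            x = frontier.popleft() if breadth_first else frontier.pop()
            if x in visited:
                continue
            visited.add(x)
            for i in incident[x]:
                if i in emitted:
                    continue
                emitted.add(i)
                order.append(edges[i])
                u, v, _ = edges[i]
                other = v if u == x else u
                if other not in visited:
                    frontier.append(other)
    return order


def partition(edges, size, breadth_first=True):
    """Cut the traversal order into graph transactions of at most `size` edges."""
    order = traversal_edge_order(edges, breadth_first)
    return [order[i:i + size] for i in range(0, len(order), size)]


edges = [("A", "B", "x"), ("A", "C", "x"), ("B", "D", "x"), ("C", "E", "x")]
bfs_parts = partition(edges, 2, breadth_first=True)
dfs_parts = partition(edges, 2, breadth_first=False)
```

On this toy graph the two strategies group edges differently: BFS pairs B-D with C-E in the second transaction, while DFS reaches C-E before B-D. Any pattern whose edges land in different chunks cannot be found, which is the breakage the slide warns about.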
Results
● Partition sizes: 400, 800, 1200, and 1600.
● Depth-first partitioning: 200 frequent patterns were found with a minimum support of 120.
● Breadth-first partitioning: 667 frequent patterns were found with a minimum support of 240.
● Runtime and memory problems occurred with lower supports on the breadth-first partitions.
● FSG is not an appropriate tool for mining recurring patterns in a single large graph.
Temporally Repeated Routes
● FSG
● Exploits the temporal nature of the transportation graph.
● Partition each graph into a set of graph transactions based on date.
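The date-based partitioning can be sketched as follows, assuming each shipment is reduced to a hypothetical `(date, origin, destination)` tuple and that vertices keep their location identity, since the goal is routes repeated in time rather than in space. Each date becomes one graph transaction, and a lane's support is the number of dates on which it appears. Only single-edge patterns are counted here; FSG itself also grows larger subgraph patterns from these.

```python
from collections import defaultdict


def transactions_by_date(shipments):
    """Group (date, origin, destination) records into one set of lanes per date."""
    by_date = defaultdict(set)
    for date, origin, destination in shipments:
        by_date[date].add((origin, destination))
    return dict(by_date)


def repeated_lanes(shipments, min_support):
    """Return lanes present in at least `min_support` daily transactions."""
    support = defaultdict(int)
    for lanes in transactions_by_date(shipments).values():
        for lane in lanes:  # a lane counts once per date, however many shipments
            support[lane] += 1
    return {lane: n for lane, n in support.items() if n >= min_support}


shipments = [
    ("2004-01-05", "A", "B"),
    ("2004-01-05", "A", "B"),
    ("2004-01-06", "A", "B"),
    ("2004-01-06", "B", "C"),
    ("2004-01-07", "A", "B"),
]
print(repeated_lanes(shipments, min_support=3))  # {('A', 'B'): 3}
```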
Results
● Unable to run FSG on the entire data set due to insufficient memory/swap space.
● Most patterns found were small. (The following is the biggest one.)
Patterns Discovered by Using Conventional Mining Algorithms
● Mapped the dataset into a standard “transactional” representation.
● Used traditional data mining approaches.
● Used Weka for association rule mining, instance (tuple) classification, and cluster analysis on the transportation data.
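Weka reads data in its ARFF format, so one way to produce the “transactional” representation is to flatten each shipment into a row of nominal and numeric attributes. The sketch below only writes ARFF text; the relation and attribute names are hypothetical, and values are assumed to contain no spaces or commas (otherwise they would need quoting).

```python
def to_arff(relation, attributes, rows):
    """Render rows as Weka ARFF text.

    `attributes` is a list of (name, values) pairs, where `values` is a list
    of nominal values or the string "numeric" for a numeric attribute.
    """
    lines = [f"@relation {relation}", ""]
    for name, values in attributes:
        spec = values if isinstance(values, str) else "{" + ",".join(values) + "}"
        lines.append(f"@attribute {name} {spec}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


arff = to_arff(
    "shipments",
    [("distance_bin", ["D0", "D1", "D2"]), ("mode", ["truck", "rail"]), ("weight", "numeric")],
    [("D1", "truck", 1200), ("D2", "rail", 5400)],
)
print(arff)
```

Flattening this way is what makes association rules, classifiers, and clusterers applicable, and it is also where the structural information noted on the next slide is lost: each row describes one shipment, not its place in the network.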
Evaluations of Conventional Algorithms
● Traditional data mining techniques have produced interesting and meaningful results that summarize our data.
● Further experimentation is required to explore the potential and limitations of these techniques on temporal transportation network data.
● These techniques lose some insights from the structural characteristics of the data.
Challenges for Data Mining Research
● Handling the temporal aspects of graphs (dynamic graphs).
● Incorporating the notion of events into a graph.
● Expanding graph mining techniques beyond data similar to molecular structures.
● Determining what makes a graph pattern interesting.