Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Spatial DBMS and Intelligent Transportation System Shashi Shekhar Intelligent Transportation Institute and Computer Science Department University of Minnesota [email protected] (612) 624-8307 http://www.cs.umn.edu/~shekhar http://www.cs.umn.edu/research/shashi-group/ Biography Highlights 7/01-now : Professor, Dept. of CS, U. of MN 12/89-6/01 : Asst./Asso. Prof. of CS, U of MN Ph.D. (CS), M.B.A., U of California, Berkeley (1989) Member: CTS(since 1990),Army Center, CURA Author: “A Tour of Spatial Database” (Prentice Hall, 2002) and 100+ papers in Journals, Conferences Editor: Geo-Information(2002-onwards), IEEE Transactions on Knowledge and Data Eng.(96-00) Program chair: ACM Intl Conf. on GIS (1996) Tech. Advisor: UNDP(1997-98), ESRI(1995), MNDOT GuideStar(1993-95 on Genesis Travlink) Grants: FHWA, MNDOT, NASA, ARMY, NSF, ... Supervised 7+ Ph.D Thesis (placed at Oracle, IBM TJ Watson Research Center etc.), 30+ MS. Thesis Research Interests Knowledge and Data Engineering Spatial Database Management Spatial Data Mining(SDM) and Visualization Geographic Information System Application Domains : Transportation, Climatology, Defence Computations Spatial Data Mining, SDBMS Historical Examples London Cholera (1854) Dental health in Colorado Current Examples Environmental justice Crime mapping - hot spots (NIJ) Cancer clusters (CDC) Habitat location prediction (Ecology) Site selection, assest tracking, spatial outliers ITS Database Systems Highway Based Sensor Drivers ITS Database Home, office Shopping mall Information center, PCS Traffic Reports Road Maps City Maps Construction Schedule Business Directory Transportation Planners, Policy Maker SDBMS & SDM in ITS Operational Tactical Routing, Guidance, Navigation for travelers and Commuters Asset tracking in APTS, CVO for security, and customer service Emergency services Ramp meter control (freeway operation) Incident management Event planning (maintenance, sports connection) Infrastructure security - patrol routes Snow cleaning routes and schedules Impact analysis (e.g. Mall of America) Strategic Travel demand forecasting for capacity planning Public transportation route selection Policy decision(e.g. HOV lanes, ramp meter study) Research: Driving Simulation and Safety SDBMS and SDM in ITS Transportation Manager Traffic Engineering Where are the congestion (in time and space)? Which of these recurrent congestion? Which loop detection are not working properly? How congestion start and spread? Traveler, Commuter How the freeway system performed yesterday? Which locations are worst performers? What is the travel time on a route? Will I make to destination in time for a meeting? Where are the incident and events? Planner and Research How much can information technique to reduce congestion? What is an appropriate ramp meter strategy given specific evolution of congestion phenomenon? Transportation Projects Traffic Database System Traffic Data Visualization Spatial Outlier Detection Roadmap storage and Routing Algorithms Road Map Accuracy Assessment Other: Driving Simulation In-vehicle headup display evaluation Project: Traffic Database System Sponsor and time-period: MNDOT, 1998-1999 Students: Xinhong Tan, Anuradha Thota Contributions to Transportation Domain Reduce response of queries from hours to minutes Performance tuning (table design, index selection) Contributions to Computer Science GUI design for extracting relevant summaries Evaluate technologies with large dataset Map of Station in Mpls Gui Design http://www.cs.umn.edu/research/shashi-group/TMC/html/gui.html Flow of Data From TMC TMC Server FTP link Storage at University of Minnesota Binary Data made available for researchers ASCII FTP link FTP link PC FTP link Convert binary to 5min data Conversion programs Existing Table Fivemin Detector ReadDate Time Dayofweek Volume Occupancy Validity Speed Table Designs Current Fivemin Proposed-1 Fivemn_day Proposed-2 Detector ReadDate Time Volume occupancy ReadDate Detector Vol_Occ_Validty Five_min Detector Time_id Volume DateTime Time_id ReadDate MN/Dot Five_min Detector ReadDate Hour Day_week time Binary Five_min Detector ReadDate Time occupancy occupancy validity speed Day_week validity Time Volume Vol_5_ Occl_5_ Validity_5 15mn 1hr min min _ min validity Benchmark Queries 1. Get 5-min Volume, occupancy for detector ID = 10 on Oct. 1st, 1997 from 7am to 8am 2. Get 5-min volume, Occupancy for detector ‘5’ on Aug1 1997. 3. Get 5-min volume, Occupancy for detector ‘5’ on Aug1 1997 from 6.30am to 7.30am. 4. Get average 5-min volume, occupancy, for Monday in Aug1997 between 8.00 - 8.05,8.05-8.10 …… 9.00 5. Get maximum volume, Occupancy for detector ‘5’ on Aug1 1997 from 6am to 7am 6. Get the average of AM rushhour hourly volume for a set of stations on highway I35W-NB with milepoint between 0.0 and 4.0 from Oct. 1st, 1997 to Oct. 5th , 1997 Conclusion Examples of the Query Example1: Query description: Get 5-min Volume, occupancy for detector ID = 10 on Oct. 1st, 1997 from 7am to 8am SQL statement: SELECT readdate, time, xtan.fivemin.detector, occupancy, volume FROM xtan.fivemin, xtan.datetime WHERE ReadDate = to_date('01-OCT-97', 'DD-MON-YYYY') AND time BETWEEN '0705' AND '0800' AND xtan.fivemin.Detector = '10' AND xtan.fivemin. Examples of the Query Query result 1: Examples of the Query Example2: Query description: Get the average of AM rushhour hourly volume for a set of stations on highway I35W-NB with milepoint between 0.0 and 4.0 from Oct. 1st, 1997 to Oct. 5th , 1997 SQL statement: SELECT hour, xtan.v_stat_hour.station, avg(volume) FROM tan.v_stat_hour, xtan.statrdwy WHERE ReadDate BETWEEN to_date('01-OCT-97','DD-MONYYYY') AND to_date('05-OCT-97','DD-MON-YYYY') AND hour BETWEEN '06' AND '09' AND statrdwy.route = 'I35W-I' AND statrdwy.mp >= 0.0 AND statrdwy.mp <= 4.0 AND xtan.v_stat_hour.station = statrdwy.station GROUP BY xtan.v_stat_hour.station, hour Examples of the Query Query result 2: Conclusions MN/Dot model and Proposed-II(Normalized) are the two recommended models for the final structure Proposed-II Conversion effort MN/Dot Little modification on existing loading process Needs new loading program Future Compatibility Same format remains Number of columns increases Derived data Effort needed for derived data Query More flexible Fifteenmin & hourly data exist, station data needs to be derived Less flexible Project: Traffic Data Visualization Sponsor and time-period: USDOT/ITS Inst., 2000-2001 Students: Alan Liu, CT Lu Contributions to Transportation Domain Allow intuitive browsing of loop detector data Highlight patterns in data for further study Contributions to Computer Science Mapcube - Organize visualization using a dimension lattice Visual data mining, e.g. for clustering Motivation for Traffic Visualization Transportation Manager Traffic Engineering Where are the congestion (in time and space)? Which of these recurrent congestion? Which loop detection are not working properly? How congestion start and spread? Traveler, Commuter How the freeway system performed yesterday? Which locations are worst performers? What is the travel time on a route? Will I make to destination in time for a meeting? Where are the incident and events? Planner and Research How much can information technique to reduce congestion? What is an appropriate ramp meter strategy given specific evolution of congestion phenomenon? Dimensions Available • • • • TTD : Time of Day TDW : Day of Week TMY : Month of Year S : Station, Highway, All Stations Others • Scale, Weather, Seasons, Event types, … Mapcube : Which Subset of Dimensions ? TTDTDWTMYS TTDTDWS TTDTDW TDW S STTD TTD TDW S Next Project Singleton Subset : TTD Configuration: X-axis: time of day; Y-axis: Volume For station sid 138, sid 139, sid 140, on 1/12/1997 Trends: Station sid 139: rush hour all day long Station sid 139 is an S-outlier Singleton Subset: TDW Configuration: X axis: Day of week; Y axis: Avg. volume. For stations 4, 8, 577 Avg. volume for Jan 1997 Trends: Friday is the busiest day of week Tuesday is the second busiest day of week Singleton Subset: S Configuration: X-axis: I-35W South; Y-axis: Avg. traffic volume Avg. traffic volume for January 1997 Trends?: High avg. traffic volume from Franklin Ave to Nicollet Ave Two outliers: 35W/26S(sid 576) and 35W/TH55S(sid 585) Dimension Pair: TTD-TDW Configuration: Trends: X-axis: time of date; Y-axis: day of Week f(x,y): Avg. volume over all stations for Jan 1997, except Jan 1, 1997 Evening rush hour broader than morning rush hour Rush hour starts early on Friday. Wednesday - narrower evening rush hour Dimension Pair: S-TTD Configuration: X-axis: Time of Day Y-axis: Highway f(x,y): Avg. volume over all stations for 1/15, 1997 Trends: 3-Cluster • North section:Evening rush hour • Downtown area: All day rush hour • South section:Morning rush hour S-Outliers • station ranked 9th • Time: 2:35pm Missing Data Dimension Pair: TDW-S Configuration: Trends: X-axis: stations; Y-axis: day of week f(x,y): Avg. volume over all stations for Jan-Mar 1997 Busiest segment of I-35 SW is b/w Downtown MPLS & I-62 Saturday has more traffic than Sunday Outliers – highway branch Post Processing of cluster patterns Clustering Based Classification: Class 1: Stations with Morning Rush Hour Class 2: Stations Evening Rush Hour Class 3: Stations with Morning + Evening Rush Hour Triplet: TTDTDWS: Compare Traffic Videos Configuration: Traffic volume on Jan 9 (Th) and 10 (F), 1997 Trends: Evening rush hour starts earlier on Friday Congested segments: I-35W (downtown Mpls – I-62); I-94 (Mpls – St. Paul); I-494 ( intersection I-35W) Size 4 Subset: TTDTDWTMYS(Album) Configuration: Trends: Outer: X-axis (month of year); Y-axis (highway) Inner: X-axis (time of day); Y-axis (day of week) Morning rush hour: I-94 East longer than I-35 W North Evening rush hour: I-35W North longer than I-94 East Evening rush hour on I-94 East: Jan longer than Feb Project: Spatial Outlier Detection Sponsor and time-period: USDOT/ITS Inst. (2000-2002) Students: C T Lu, Pusheng Zhang Contributions to Transportation Domain Filter/reduce data for manual browsing Identify days with spatial outliers Identify sensors with anamolous behaviour Contributions to Computer Science Unified definition of spatial outliers using algebraic aggregates Spatial outlier detection algorithm = scan spatial join Algorithms for Spatial outlier detection Spatial outlier A data point that is extreme relative to it neighbors Given A spatial graph G={V,E} A neighbor relationship (K neighbors) An attribute function f: V -> R Test T for spatial outliers Find O = {vi | vi V, vi is a spatial outlier} Objective Correctness, Computational efficiency Constraints Computation cost dominated by I/O op. Test T is an algebraic aggregate function Spatial outlier detection Example Outlier Detection Test 1. Choice of Spatial Statistic S(x) = [f(x)–E y N(x)(f(y))] Theorem: S(x) is normally distributed if f(x) is normally distributed 2. Test for Outlier Detection | (S(x) - s) / s | > Hypothesis I/O cost = f( clustering efficiency ) f(x) Spatial outlier and its neighbors S(x) Spatial outlier detection Results 1. CCAM achieves higher clustering efficiency (CE) 2. CCAM has lower I/O cost 3. Higher CE leads to lower I/O cost 4. Page size improves CE for all methods Cell-Tree CE value I/O cost CCAM Z-order Project: Roadmap storage and Routing Algorithms Sponsor and time-period: FHWA/MNDOT, 1993-1997 Students: Prof. Du-Ren Liu, Dr. Mark Coyle, Contributions to Transportation Domain Andrew Fetterer, Ashim Kohli, Brajesh Goyal CRR = measure of storage methods for roadmaps In-vehicle navigation devices, routing servers on web Contributions to Computer Science CCAM - Better storage method for roadmaps Hierarchical routing - optimal routes even when map-size > memory size Road Map Storage - Problem Statement Given roadmaps Find efficient data-structure to store roadmap on disk blocks Goal - Minimize I/O-cost of operations Find(), Insert(), Delete(), Create() Get-A-Successor(), Get-Successors() Constraint Roadmaps larger than main memories Mpls map partitioning 1 Another way that we may partition the street network for Minneapolis among disk blocks for improving performance of network computations. Mpls map partitioning:CCAM This is one way that we may partition the street network for Minneapolis among disk blocks for improving performance of network computations. Road Map Storage Insight: I/O cost of network operations is minimized by maximizing CRR = Pr. ( road-intersection nodes connected by a road-segment edge are together in a disk page) WCRR = weighted CRR (edges have weights) Commercial database support geometric storage methods even though CRR is a graph property Measurements of CRR Shortest Path Problem Route computation Find a rout from current location to destination Criteria: Shortest travel distance or smallest travel time Useful for Travel during rush hour Travel in an unfamiliar area Travel to an unfamiliar destination Problem definition Given Graph G=(N,E,C) Each edge (u,v) in E has a cost C(u,v) Path from source to destination is a sequence of nodes Cost of path=C(vi-1,vi) A path cost estimation is a function f(u,v) that computes estimated cost of an optional path between the two nodes Smallest Paths Blue: Smallest travel time path between two points. It follows a freeway (I-94) which is faster but not shorter in distance. Red: Shortest distance path between the same two points Routing around incidents Algorithm for Single pair Path Computation Road Map Size<<Main Memory Size Iterative Algorithm Dijkstra’s Algorithm A* algorithm A* with euclidean distance heuristic A* with manhattan distance heuristic Road Map Size >> Main Memory Size Traditional algorithm run into difficulties! Hierarchical Algorithm Motivation for Hierarchical Algorithms Road Map Size >> Main Memory Size Traditional algorithms yield sub-optimal path Heuristics - bounding box (source, destination) or Freeway first then sideroads Example: Microsoft Expedia route(Tampa FL to Miami, FL via Canada) Need an algorithm to give optimal route A piece of roadmap in memory at a time Intuition - travelling from island to island Hierarchical Routing : Step 1 Step 1: Choose Boundary Node Pair Minimize COST(S,Ba)+COST(Ba,Bd)+COST(Bd,D) Determining Cost May Be Non-Travial Hierarchical Routing : Step 2 Step 2: Examine Alternative Boundary Paths Between Chosen Pair (Ba,Bd) Hierarchical Routing : Step 2 result Step 2 Result: Shortest Boundary Path Hierarchical Routing : Step 3 Step 3: Expand Boundary Path: (Ba1,Bd) -> Ba1 Bda2 Ba3 Bda4…Bd Boundary Edge (Bij,Bj) ->fragment path (Bi1,N1N2N3…….Nk,Bj) Project: Road Map Accuracy Assessment Sponsor and time-period: 10+ State DOTs, 2001-2003 Co-investigators: Prof. Max Donath, Dr. Pi-Ming Chen Students: Weili Wu, Hui Xiong, Zhihong Yao Contributions to Transportation Domain Defining map accuracy for navigable roadmaps Site selection for evaluating GPS and roadmap accuracy Contributions to Computer Science Definition of Co-location patterns with linear features Efficient algorithms for finding those Motivation: Identify road given GPS GPS accuracy and roadmap accuracy Garmin error circle USA topo Road Map Accuracy Evaluation of digital road map databases Goals road user charge system needs: accuracy, coverage Recommend a cost-effective approach Develop the content and quality requirements Rationale Each GIS dataset can contain various errors From different sources E.g. Map Scale, Area Cover, Density of Observations Failure to control and manage error Limit or invalidate GIS applications Map analysis questions Site Selection: Which road segments are vulnerable for mis-classification given GPS accuracy? Feasibility Issue: What fraction of highway miles are vulnerable? Problem Definition Given: A digital roadmap and a Gold standard Find: Spatial Accuracy of the given GIS dataset Objective: Fair, reliable Constrains: Gold-standard accuracy is better than GIS dataset accuracy Framework to test positional accuracy Compare with a reference of higher accuracy source Use internal evidence find a larger scale map use the Global Positioning System (GPS) use raw survey data Indications of inaccuracy: Unclosed polygons, lines which overshoot or undershoot junctions A measure of positional accuracy: The sizes of gaps, overshoots and undershoots Compute accuracy from knowledge of the errors By different sources, e.g 1 mm in source document 0.5 mm in map registration for digitizing 0.2 mm in digitizing Approach 1 : Visual Overlay of GPS Tracks Vs. Road Maps Tiger-based Map USGS Digital Map 2: National Map Accuracy Standard Pr. [ distance( P on map, real P) < D ] > 0.9 Tiger file in Windham County, VT (50025) Limitations of Related Work, Our Approach Natl. map accuracy standard Based on land survey of a sample of points Not aware of GPS accuracy Mixes lateral error and longitudal error Our Approach Lateral vs. longitudal positional accuracy Road classification accuracy Attribute accuracy Positional Accuracy Lateral accuracy Perpendicular (RMS) distance from GPS reading to center line of road in road map. Longitudinal accuracy Definition: horizontal distance from GPS reading to corresponding Geodetic point. Comment: Lateral error is more important when closest road is paralled Longitudinal error is important for other case Road Classification Accuracy Probability of correctly classifying road for a given GPS Fraction of miles of roads correctly classified at given confidence level (e.g. 90%) Attribute Accuracy & Completeness Interesting Attributes: Definition of Attribute Accuracy: Pr[Value of an attribute for given road segment is correct] Definition of Completeness: Economic attributes - administration zone(s), congestion zones Route attribute - name, type, time restrictions Route segment - direction, type (e.g. bridge), restrictions Routing attributes - intersections, turn restrictions Pr[a road’s segment is in digital map] Pr[attribute value is not defined for a road segment] Scope: Small sample Core Activities 1. 2. 3. 4. 5. 6. 7. Acquire digital road maps Select test sites Gather gold standard data for test site GPS tracks, Surveys, etc. Complete subsets of road maps for test sites Compute accuracy measures Statistical analysis Visualization Map Acquisition Etak/Tele Atlas map data for 7 counties of metropolitan Twincities Site Selection Red : another road within digen distance threshold (e.g. 30m) Blue: no other road withindistance threshold Site Selection - Zoom in Around Hwy 100, 169,7 in SW metro Comparing GPS tracks and maps Overlay of GPS tracks and digital road map (Hwy 7) Comparing GPS tracks and maps Overlay of GPS tracks and digital road map (Hwy 7) Other Challenges 1. 2. 3. Center-line representation of roads Two-dimensional maps Multi-level roads Altitude issues Map matching Conclusions Spatial databases, data mining and visualization Are useful for many ITS problems We have only scratched the surface so far Many new exciting opportunities ATMS : visualize freeway operations for operations, and planning, communicate impact of policies on freeway operations to public and lawmakers, new insights into congestion patterns, APTS : track buses for customer service, sercurity; communicate impact of APTS in reducing congestion. ATIS : understand traffic behaviour for route and transportation mode selection Motivation for Traffic Visualization Transportation Manager Traffic Engineering Where are the congestion (in time and space)? Which of these recurrent congestion? Which loop detection are not working properly? How congestion start and spread? Traveler, Commuter How the freeway system performed yesterday? Which locations are worst performers? What is the travel time on a route? Will I make to destination in time for a meeting? Where are the incident and events? Planner and Research How much can information technique to reduce congestion? What is an appropriate ramp meter strategy given specific evolution of congestion phenomenon?