Applications of Spatial Data Mining & Visualization - Case Studies
Introduction
• Meteorological Data and Demographics Data hold
important information that can help in several
application contexts
• Several data mining applications possible on these
data sets
• In the department we have research projects working
on these data
– RoadSafe – Summarizing large spatio-temporal weather
prediction data
– Atlas.txt – Summarizing UK 2001 Census data
• Both these projects present summaries to users in
natural language (English) and other modes
• Real World applications contain data mining as one of
the modules or tasks in the project
– Not as the end product in itself
2
Road Ice Forecasts - RoadSafe
• Road Ice Forecasts:
– Are required by local councils for winter road maintenance
operations
– Are driven by computer simulation models that predict weather
conditions for thousands of points on a road network
– Output of model is a huge spatio-temporal data set (up to 33 MB for
some councils)
– Form part of a road forecasting service delivered to Road Engineers
via an online Road Weather Information System (RWIS)
• RWIS allows model data to be communicated in various modalities, e.g.
text, tables, graphs and maps
3
• Model output is a large spatio-temporal data set (on the order of
megabytes)
• Road network split into routes, 9 meteorological parameters (e.g. Road
Surface Temperature) measured at each point on a route
• Sampled at 20-minute intervals over a 24-hour period
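The sampling scheme above fixes how many values the model produces per point. A minimal sketch of that arithmetic follows; the variable and function names are illustrative assumptions, and only the three constants come from the slides.

```python
# Constants taken from the slides; all names are illustrative.
MINUTES_PER_SAMPLE = 20   # one sample every 20 minutes
FORECAST_HOURS = 24       # 24-hour forecast period
NUM_PARAMETERS = 9        # meteorological parameters per point

samples_per_hour = 60 // MINUTES_PER_SAMPLE        # 3 samples per hour
time_steps = samples_per_hour * FORECAST_HOURS     # 72 time steps
values_per_point = time_steps * NUM_PARAMETERS     # 648 values per point


def total_values(num_points: int) -> int:
    """Number of forecast values for a network with num_points points."""
    return num_points * values_per_point


print(values_per_point)  # 648
```

Seen another way, each (time step, parameter) pair is one map of the whole network, so a forecast run consists of 648 maps.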
4
5
24 Hour Forecast for Kirklees

All Routes (worst/best)
Route       Min RST    Time <= 0°C  Ice     Hoar Frost  Snow   Fog      MaxGusts  Rain   TS
All Routes  -1.1/1.4   21:00/NA     Yes/No  No/No       No/No  Yes/Yes  15/13     No/No  No

Wind (mph): Light south to south-easterlies for the duration of the forecast period. Winds may
become more moderate late morning on higher ground, but remaining southerly.

Weather: A mainly cloudy night, with foggy patches across much of the forecast area.
Higher ground above the low cloud level could see temperatures drop below
freezing during the late evening, with most western parts of the forecast area
dropping below freezing by the morning. Urban areas are expected to remain
marginal throughout the night.

All routes summary (worst/best)
Route       Min RST    Time <= 0°C  Ice     Hoar Frost  Snow   Fog      MaxGusts  Rain   TS
1           0.4/1.8    NA/NA        No/No   No/No       No/No  Yes/Yes  13/11     No/No  No
2           0.7/2.0    NA/NA        No/No   No/No       No/No  Yes/Yes  13/10     No/No  No
3           0.5/1.8    NA/NA        No/No   No/No       No/No  Yes/Yes  13/9      No/No  No
4           0.4/1.8    NA/NA        No/No   No/No       No/No  Yes/Yes  13/12     No/No  No
5           0.7/1.9    NA/NA        No/No   No/No       No/No  Yes/Yes  13/9      No/No  No
6           0.7/2.1    NA/NA        No/No   No/No       No/No  Yes/Yes  13/11     No/No  No
7           0.9/1.8    NA/NA        No/No   No/No       No/No  Yes/Yes  13/9      No/No  No
8           0.8/2.1    NA/NA        No/No   No/No       No/No  Yes/Yes  13/9      No/No  No
9           1.4/2.1    NA/NA        No/No   No/No       No/No  Yes/Yes  13/9      No/No  No
10          0.8/1.9    NA/NA        No/No   No/No       No/No  Yes/Yes  13/9      No/No  No
11          0.3/1.8    NA/NA        No/No   No/No       No/No  Yes/Yes  13/11     No/No  No
12          -0.8/1.5   22:40/NA     Yes/No  No/No       No/No  Yes/Yes  15/11     No/No  No
6
Problem
• Input: Spatio-temporal weather prediction data
(shown on slide 4)
• Output: Summary of input data (shown on slide 6)
• Task: ?
– There is no well-defined data mining task (classification or
clustering or a new task)
– Clusters of similar weather spatially and temporally can be
one kind of summary
– Classification of routes can be another kind of summary
– Both used in the final system
• Challenges
– Complex spatio-temporal data set
– Spatio-temporal analysis methods are still maturing
– Even visualization of the entire data is hard
7
8
Overview of Data Analysis
• Two main challenges:
– Analysing the input data along the temporal dimension
– Analysing the input data along the spatial dimension
• Ideally analysis should be performed on both dimensions
simultaneously
• Solution inspired by Video Processing
– The input data set is seen as a video containing 3*24*9=648 frames
(maps)
• 3 key elements:
0. Pre-processing – geo-characterization – merging required data
with other relevant themes
1. Low level processing
– Global Trends – Temporal segmentation
– Local Events – Spatial Segmentation (Classification and Clustering)
2. Event detection and indexing
3. Keyframe extraction
– Extracted keyframes form the summary
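As a rough illustration of the event detection and indexing step, the sketch below finds the first time step at which the minimum road surface temperature (RST) reaches 0 or below, which is the kind of value reported in the "Time <= 0c" column of the forecast table. The function name, the input layout and the sample values are assumptions made for illustration; this is not the RoadSafe implementation.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def first_subzero_time(
    start: datetime,
    min_rst: Sequence[float],
    step_minutes: int = 20,
) -> Optional[datetime]:
    """Return the time of the first sample with RST <= 0, or None.

    min_rst holds one minimum-RST value per time step, starting at
    `start` and spaced `step_minutes` apart (20 minutes in the slides).
    """
    for index, value in enumerate(min_rst):
        if value <= 0.0:
            return start + timedelta(minutes=index * step_minutes)
    return None


# Hypothetical series: the temperature falls and crosses zero at the 4th sample.
start = datetime(2007, 1, 13, 12, 0)
series = [2.0, 1.2, 0.4, -0.1, -0.5]
print(first_subzero_time(start, series))  # 2007-01-13 13:00:00
```

A series that never reaches zero returns None, which corresponds to the "NA" entries in the table.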
9
Preprocessing
• Geographic Characterisation assigns properties to each
data point based on frames of reference for the region
• Frames of reference used for spatial clustering
10
Spatial Reference Frames
• Spatial descriptions should be meteorologically correct (not
necessarily most geographically accurate)
• Forecasters consider how geography influences weather
conditions in their descriptions (meteorological inferences)
– “exposed locations may have gales at times”
• Dominant geographical features within regions also affect
the reference strategy
– Kirklees (land locked): 1. Altitude, 2. Direction, 3. Population
– Hampshire: 1. Coastal Proximity, 2. Altitude, 3. Direction, 4. Population
11
Spatial Segmentation
• Each of the 648 frames (maps) is analysed to
compute spatial segmentations (clusters)
• Because weather parameters are continuous, they are
first discretized
• E.g. for road surface temperature (map shown on the
next slide)
– OK => {>4}
– Marginal => {<=4 & >1}
– Critical => {<=1 & >0}
– Subzero => {<=0}
• Density based clustering used for performing spatial
segmentation
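The discretization of road surface temperature listed above can be sketched as a simple threshold function. The band names and thresholds are taken from the slide; the function name is an assumption, and the thresholds are assumed to be in degrees Celsius.

```python
def rst_band(temperature: float) -> str:
    """Map a road surface temperature to its discrete band.

    Thresholds follow the slide: OK (>4), Marginal (<=4 and >1),
    Critical (<=1 and >0), Subzero (<=0). Units assumed to be Celsius.
    """
    if temperature > 4:
        return "OK"
    if temperature > 1:
        return "Marginal"
    if temperature > 0:
        return "Critical"
    return "Subzero"


print([rst_band(t) for t in (5.2, 4.0, 0.4, -1.1)])
# ['OK', 'Marginal', 'Critical', 'Subzero']
```

Note that each boundary value belongs to the colder band, so exactly 4 is Marginal and exactly 0 is Subzero.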
12
Discretization of weather parameters
13
Cluster Densities
Proportion of subzero points

Frame of Reference   07:20   07:40   08:00   08:20   08:40
Altitude
  0m                 0.0     0.0     0.0     0.0     0.0
  100m               0.0     0.0     0.0     0.0     0.0
  200m               0.0     0.0     0.0     0.0     0.0
  300m               0.0     0.0     0.0     0.0     0.0
  400m               0.041   0.041   0.12    0.125   0.166
  500m               0.5     1.0     1.0     1.0     1.0
Direction
  Central            0.0     0.0     0.0     0.0     0.0
  Northeast          0.0     0.0     0.0     0.0     0.0
  Northwest          0.0     0.0     0.0     0.0     0.0
  Southeast          0.0     0.0     0.0     0.0     0.0
  Southwest          0.014   0.021   0.035   0.0354  0.042
Urban/Rural
  Rural              0.002   0.003   0.005   0.006   0.007
  Urban              0.0     0.0     0.0     0.0     0.0
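The densities above are proportions of subzero points within each category of a frame of reference, at one time step. A minimal sketch of that calculation follows, assuming each data point has already been tagged with its category during geographic characterisation. The function name, the input layout and the sample points are hypothetical, and the sketch covers only the proportion calculation, not the density based clustering itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def subzero_proportions(points: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Proportion of points with RST <= 0 in each category.

    points: (category, rst) pairs for a single frame, where category is
    a value from one frame of reference (e.g. an altitude band).
    """
    totals: Dict[str, int] = defaultdict(int)
    subzero: Dict[str, int] = defaultdict(int)
    for category, rst in points:
        totals[category] += 1
        if rst <= 0:
            subzero[category] += 1
    return {category: subzero[category] / totals[category] for category in totals}


# Hypothetical frame with four points in two altitude bands.
frame = [("500m", -0.3), ("500m", 0.2), ("400m", 0.8), ("400m", 1.1)]
print(subzero_proportions(frame))  # {'500m': 0.5, '400m': 0.0}
```

Running this once per frame and per frame of reference would give one column of a table like the one above.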
14
[Figure: Min. Road Surface Temperature, All Data Points, 13/01/2007. Time axis in hourly ticks from 12:00 through 11:00; temperature axis from -1 to 7 with bands Ok, Marginal and Critical; annotations: Trend (RST Decreasing) and Event (RST <= 0).]
15
Atlas.txt
• Is an ongoing research project
– Produces textual summaries of geo-referenced
statistics
– for visually impaired users
• The focus of the project is more on
visualization of spatial data by visually
impaired (VI) users
– Spatial data is essentially geometric and it is not
clear how visually impaired users model geometric
information
– In the absence of vision, is it possible to model
geometric information based on tactile and audio
inputs?
• If possible, what is the nature of these mental models of
geometries?
16
Input
[Figure: map of % Unemployment in Aberdeen; legend classes <2.2, <3.5, <4.8, <6.1]
17
Output
• No gold standard models of spatial information
suitable to VI users available
• So several alternative summaries of spatial
information need to be tested on real users
• One possible example textual summary:
“Some wards in the east and central parts (3,5,6,9) of
the city have high percentage of unemployed people
aged 16-74 above 03.51%”
• Are the textual summaries adequate on their own?
• Do they need to be supplemented by tactile or sonic
maps?
– Tactile maps
http://homepages.phonecoop.coop/vamos/work/intact/
– Sonic Maps http://www.cs.umd.edu/hcil/audiomap/
18
Problem
• Input: 2001 UK census data
• Output: Summary of input data
• Task: Spatial segmentation + Spatial
visualization for VI users
– Unlike RoadSafe, the data mining task is well
defined
– What is less well defined, though, is the task of
visualization of the summary by VI users
– Shape (geometry) and topology of segments need
to be accessible to visually impaired users
19
Space and Visual Impairment
• Atlas.txt is an ongoing research project
– more open questions than useful answers
• VI users need to perform two tasks for
modeling spatial data
– Scanning space for information
• Several scanning strategies possible
• E.g. left-right vs. top-down
– Coding spatial information using a suitable
reference frame
• Once again several coding strategies available
• E.g. body-centric (egocentric) vs. external
• VI users are trapped in a vicious circle while
finding efficient scanning and coding
strategies
20
Strategic Disadvantage for VI users
• Scanning strategy determines the quality of spatial
information acquisition
– But better scanning strategy possible only with knowledge of
spatial information
• Sighted users take a quick look at an image, which helps them to
scan the image a lot more efficiently
• VI users do not have the luxury of a quick glance!
• Coding strategy determines the quality of mental
representation
– Mental models coded on body centric reference frame less
useful for complicated spatial analysis
– External reference frames help to code better quality
mental models
– VI users need improved scanning strategies for acquiring
suitable external reference frames
– Because VI users are at a disadvantage in finding a quality
scanning strategy, they are also at a disadvantage in finding a
quality coding strategy
21
Solution Options
• VI users clearly need external help in finding suitable
external reference frames
• Atlas.txt solution
– Identify several reference frames and present summary
coded in each of these
– VI users may be familiar with some spatial layouts
• E.g. telephone key pad and clock face
– Use several of these to code summary information
“Some wards in the east and central parts (3,5,6,9) of the city
have high percentage of unemployed people aged 16-74 above
03.51%”
– E.g. ‘east and central parts’ can also be expressed by
(3,5,6,9), each number referring to a location on the
telephone keypad layout
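One way to picture the keypad coding is as a lookup from keypad digits to map regions. The sketch below assumes a standard 3x3 telephone keypad (1 at top left, 9 at bottom right) laid over the map with north at the top; that mapping and all names are assumptions for illustration, since the slides only state that each number refers to a location on the keypad layout.

```python
# Assumed mapping: keypad rows 1-2-3, 4-5-6, 7-8-9 laid over a
# north-up map, so 1 is the northwest corner and 9 the southeast corner.
KEYPAD_TO_REGION = {
    1: "northwest", 2: "north",   3: "northeast",
    4: "west",      5: "central", 6: "east",
    7: "southwest", 8: "south",   9: "southeast",
}


def describe_keys(keys):
    """Translate keypad digits into region names, preserving order."""
    return [KEYPAD_TO_REGION[key] for key in keys]


print(describe_keys((3, 5, 6, 9)))
# ['northeast', 'central', 'east', 'southeast']
```

Under this assumed mapping, the keys (3,5,6,9) cover the eastern column and the centre of the grid, which is consistent with the phrase ‘east and central parts’ in the example summary.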
22