Integration and processing of environmental data
ADMIRE technology
Ondrej Habala
CRISIS seminar, 18.10.2011
ITMS 26240220060
Goals
• Accelerate access to, and increase the benefits from, data exploitation;
• Deliver consistent and easy-to-use technology for extracting information and knowledge;
• Cope with the complexity, distribution, change and heterogeneity of services, data and processes through an abstract view of data mining and integration; and
• Provide power to users and developers of data mining and integration processes.
ADMIRE Architecture:
Separation of Concerns
ADMIRE Architecture
ADMIRE’s High-Level Architecture
ADMIRE Gateways
USMT
DISPEL – Data Intensive Systems Process-Engineering Language
• Targets data-intensive distributed systems
• Acts as the connection point between complex application requests and complex enactment systems
– Benefit: method development, engineering and evolution of supported practices can take place independently in each world
• Describes enactment requests for streaming-data workflow processes
• “Process-engineering time” – the process is transformed and optimized in preparation for the enactment period
DISPEL: Simple Example
String sql1 = "SELECT * FROM some_table";
String sql2 = "SELECT * FROM table2";
String resource = "128.18.128.255";
SQLQuery query = new SQLQuery;
// Creating streams of literals
|- sql1, sql2 -| => query.expression;
|- resource -| => query.resource;
Tee tee = new Tee;
// Creating connections
query.result => tee.connectInput;
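The example wires a stream of two literal SQL expressions into an SQLQuery PE and feeds its results to a Tee. As a rough illustration of those dataflow semantics only (this is Python, not DISPEL, and not how the ADMIRE gateway enacts a request), the sketch below models streams as iterators. The in-memory SQLite database and its table contents are hypothetical stand-ins for the remote data resource.

```python
import itertools
import sqlite3
from typing import Iterable, Iterator


def sql_query(conn: sqlite3.Connection, expressions: Iterable[str]) -> Iterator[tuple]:
    """Run each SQL expression arriving on the input stream and emit every result row."""
    for expr in expressions:
        yield from conn.execute(expr)


# Hypothetical stand-in for the data resource named in the DISPEL example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER, value REAL)")
conn.execute("CREATE TABLE table2 (id INTEGER, value REAL)")
conn.executemany("INSERT INTO some_table VALUES (?, ?)", [(1, 0.5), (2, 1.5)])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(3, 2.5)])

# Stream of literals: both queries flow into the query element one after the other.
expressions = iter(["SELECT * FROM some_table", "SELECT * FROM table2"])
result = sql_query(conn, expressions)

# Tee: duplicate the result stream so that two downstream consumers can read it.
branch_a, branch_b = itertools.tee(result, 2)
rows_a = list(branch_a)
rows_b = list(branch_b)
print(rows_a)
```

Because the stream is lazy, no query runs until a consumer pulls rows from one of the tee branches.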
DISPEL – real use
APPLICATION STUDIES
DEPLOYMENT OF ADMIRE TECHNOLOGY IN THE ENVIRONMENTAL DOMAIN
18.10.2011
Flood Application
Data sets used in hydrological scenarios
Dataset | Domain | Description | Volume | Temporal coverage | Spatial coverage
HUSAV | Hydrology | Data from two probes, containing water saturation of soil | 10s of MB | 1998-2007 | Two distinct points
MARS | Meteorology | Historical meteorological data (temperature, rainfall, etc.) for Slovakia | 100s of MB | 1975-2007 | Slovakia (grid 50x50 km)
SVP | Hydrology | Data from waterworks in western Slovakia (mainly river Váh) – outflows, water levels, temperature, rainfall | 100s of MB | 1998-2007 | 15 distinct waterworks
DAISY | Pedology | Various pedological parameters for one probe in southern Slovakia | 10s of MB | 1961-2000 | One point
WOFOST | Pedology | Crop data (with attached soil and meteorological data) for Slovakia, year 2006 | 10s of MB | 2006 | Slovakia (grid)
SHMU_CURR | Meteorology | On-line database of meteorological data – copied from SHMI web; including radar imagery | 10s of GB+ | 2008- | Slovakia (about 100 distinct probes)
SHMU_HIST | Meteorology | Historical meteorological data from SHMI probes | 100s of MB | 1998-2007 | Slovakia (more than 100 distinct probes)
SHMU_GRIB | Meteorology | Historical temperatures and rainfall amounts in a gridded binary format | 100s of GB | 1998-2007 | Slovakia (grid, various sizes)
RADAR | Meteorology | Weather radar imagery | 100s of GB | 2005-2008 | Slovakia
SHMU_HYDRO | Hydrology | Historical data from hydrological measurement stations | 10s of MB | 1998-2007 | Orava and upper Váh
FSKD 2010, Yantai, China, August 10-12
Orava scenario
• Legend
– Green area – Orava (part of northern Slovakia)
– Blue – Orava reservoir and local rivers
– Red dots – hydrological measurement stations
• Notes
– We are interested only in hydrological stations below the Orava reservoir
– In our tests we use hydrological station 5830 (Tvrdosin)
ORAVA – data mining concept
• Predictors – rainfall amount (reservoir and station), air temperature (reservoir and station), reservoir discharge, reservoir temperature
• Targets – water level and temperature at a station below the reservoir
Time | Water temp Orava (E) | Rainfall Orava (R) | Air temp Orava (A) | Air temp Station (B) | Rainfall Station (S) | Outflow Orava (D) | Water level Station (X) | Water temp Station (Y)
T-4 | E-4 | R-4 | A-4 | B-4 | S-4 | D-4 | X-4 | Y-4
T-3 | E-3 | R-3 | A-3 | B-3 | S-3 | D-3 | X-3 | Y-3
T-2 | E-2 | R-2 | A-2 | B-2 | S-2 | D-2 | X-2 | Y-2
T-1 | E-1 | R-1 | A-1 | B-1 | S-1 | D-1 | X-1 | Y-1
T | E | R | A | B | S | D | X | Y
T+1 |  | R+1 | A+1 | B+1 | S+1 | D+1 | X+1 | Y+1
T+2 |  | R+2 | A+2 | B+2 | S+2 | D+2 | X+2 | Y+2
T+3 |  | R+3 | A+3 | B+3 | S+3 | D+3 | X+3 | Y+3
T+4 |  | R+4 | A+4 | B+4 | S+4 | D+4 | X+4 | Y+4
T+5 |  | R+5 | A+5 | B+5 | S+5 | D+5 | X+5 | Y+5
T+6 |  | R+6 | A+6 | B+6 | S+6 | D+6 | X+6 | Y+6
• Future rainfall and air temperatures (R, A, B, S) are predicted by a meteo model
• Future outflow (D) is given in a schedule
• Future water level and water temperature at the station (X, Y) are the targets of data mining
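The layout above amounts to a sliding window over the time series: each training example pairs predictor values from recent time steps with target values some steps ahead. A minimal sketch, assuming time-ordered records stored as dictionaries. The helper name, column keys and toy data are hypothetical. Future values from the meteo model or the outflow schedule could be appended to the feature list in the same way.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]


def build_examples(
    records: Sequence[Record],
    predictors: Sequence[str],
    targets: Sequence[str],
    history: int = 5,
    horizon: int = 1,
) -> List[Tuple[List[float], List[float]]]:
    """Turn a time-ordered series into (features, targets) pairs.

    Features are the predictor values at times T-(history-1) .. T (oldest first);
    targets are the target values at time T+horizon.
    """
    examples = []
    for t in range(history - 1, len(records) - horizon):
        features = [
            records[t - lag][name]
            for lag in range(history - 1, -1, -1)
            for name in predictors
        ]
        future = [records[t + horizon][name] for name in targets]
        examples.append((features, future))
    return examples


# Hypothetical toy series: 8 time steps with one predictor and one target.
records = [{"rain": float(i), "level": 10.0 + i} for i in range(8)]
examples = build_examples(records, predictors=["rain"], targets=["level"], history=5, horizon=1)
```

With 8 records, a 5-step history (T-4 to T) and a 1-step horizon, only time steps 4 to 6 can serve as T, so three examples are produced.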
ORAVA – data integration
• Integration of data from
– GRIB files
– reservoirs
• Inputs
– time period of the experiment
– reservoir ID
– list of hydro stations
– geo coordinates
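Because the inputs include geographic coordinates, the integration has to relate each station or reservoir to cells of the gridded GRIB data. A minimal sketch of one way to do that: pick the nearest grid point by great-circle (haversine) distance. It assumes the grid has already been decoded into (lat, lon) pairs, since GRIB decoding itself is not shown. The grid and station coordinates are made up for illustration.

```python
import math
from typing import List, Tuple


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_grid_point(grid: List[Tuple[float, float]], lat: float, lon: float) -> int:
    """Index of the grid point closest to (lat, lon)."""
    return min(range(len(grid)), key=lambda i: haversine_km(grid[i][0], grid[i][1], lat, lon))


# Hypothetical 2x2 grid and station coordinates, for illustration only.
grid = [(49.0, 19.0), (49.0, 19.5), (49.5, 19.0), (49.5, 19.5)]
station_cell = nearest_grid_point(grid, 49.33, 19.55)
```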
ORAVA – data sets
Dataset | Domain | Description | Volume | Temporal coverage | Spatial coverage
SVP | Hydrology | Data from waterworks in western Slovakia (mainly river Váh) – outflows, water levels, temperature, rainfall | 100s of MB | 1998-2007 | 15 distinct waterworks
SHMU_CURR | Meteorology | On-line database of meteorological data – copied from SHMI web; including radar imagery | 10s of GB+ | 2008- | Slovakia (about 100 distinct probes)
SHMU_HIST | Meteorology | Historical meteorological data from SHMI probes | 100s of MB | 1998-2007 | Slovakia (more than 100 distinct probes)
SHMU_GRIB | Meteorology | Historical temperatures and rainfall amounts in a gridded binary format | 100s of GB | 1998-2007 | Slovakia (grid, various sizes)
SHMU_HYDRO | Hydrology | Historical data from hydrological measurement stations | 10s of MB | 1998-2007 | Orava and upper Váh river
ORAVA Scenario
[Figure: sample tables of integrated raw data and integrated, preprocessed data. Columns: Time [hours]; Water_temp, Air_temp, Rainfall and Outflow at Orava; Rainfall, Air_temp, Flow/Height and Water_temp at the station.]
Orava Scenario
Water temperature prediction
Properties \ Model | Linear regression | Multilayer perceptron
Correlation coefficient | 0.9639 | 0.9821
Mean absolute error | 1.1791 | 0.7748
Root mean squared error | 1.4607 | 1.0386
Relative absolute error | 23.8739 % | 15.6884 %
Root relative squared error | 26.609 % | 18.9195 %
Total number of instances | 8760 | 8760
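The measures in these tables are standard regression error statistics, of the kind Weka reports after an evaluation. A minimal sketch of their usual definitions. One assumption to note: the relative errors here compare against a predictor that always outputs the mean of the supplied actual values, whereas Weka takes that mean from the training data, so results can differ slightly.

```python
import math
from typing import Dict, Sequence


def regression_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Correlation, MAE, RMSE and relative errors (in percent) of predictions vs. actual values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    errors = [p - a for a, p in zip(actual, predicted)]
    abs_err = sum(abs(e) for e in errors)
    sq_err = sum(e * e for e in errors)
    # Errors of the trivial predictor that always outputs the mean of the actual values.
    abs_dev = sum(abs(a - mean_a) for a in actual)
    sq_dev = sum((a - mean_a) ** 2 for a in actual)
    # Pearson correlation between actual and predicted values.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "correlation": cov / math.sqrt(sq_dev * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100.0 * abs_err / abs_dev,
        "rrse_percent": 100.0 * math.sqrt(sq_err / sq_dev),
    }


# Made-up values, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

A relative absolute error of 15.6884 %, for example, means the model's total absolute error is about 16 % of that of the always-predict-the-mean baseline.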
Orava Scenario
Water level prediction
Properties \ Model | Multilayer perceptron
Correlation coefficient | 0.9816
Mean absolute error | 0.4105
Root mean squared error | 0.9673
Relative absolute error | 30.5869 %
Root relative squared error | 19.2384 %
Total number of instances | 8735
Orava Scenario
Data integration workflow
Orava Scenario
Training workflow
Orava Scenario
Prediction workflow
Implementation Notes
• We needed to write custom activities for certain data-extraction tasks
• Data integration was the most complex part of the scenario in terms of workflow design
• Once all the PEs were in place, data integration was quite easy to write and modify in DISPEL
– We used a composite PE to extract different types of quantities from meteorological GRIB files
ADMIRE Architecture:
Separation of Concerns
Orava Scenario
Portal
Radar Scenario
Very short-term rainfall prediction
from weather radar data
Radar Scenario
Description
• Very short-term rainfall prediction from weather radar data
[Figure: movement of areas with higher air moisture content, and thus also higher precipitation potential]
[Figure: network of synoptic stations in Slovakia]
• 27 stations in Slovakia
• Data from the years 2007 and 2008
• Available variables, with values for each hour: rainfall, humidity, radar reflectivity, atmospheric pressure and temperature
Radar Scenario
Main predictors and target variables
Time | Wind | Radar reflectivity | Rainfall
T-2 | W-2 | D-2 | F-2
T-1 | W-1 | D-1 | F-1
T | W | D | F
T+1 | W+1 | D+1 | F+1
T+2 | W+2 | D+2 | F+2
• Overview of the main predictors and target variables in the Radar scenario.
• The green cells are predicted by a meteo model, the blue cells by a model based on motion vectors, and the yellow cells are the final target of data mining.
Radar Scenario
Attributes of model
• Isotonic regression model
• 10-fold cross-validation
Numerical characteristic | Value
Correlation coefficient | 0.4593
Mean absolute error | 0.1105
Root mean squared error | 0.5490
Total number of instances | 89 746
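Isotonic regression fits a monotone (here non-decreasing) step function of a single predictor. A minimal sketch of the classic pool-adjacent-violators algorithm, applied, for example, to rainfall as a function of radar reflectivity. The slides do not say which implementation was used, so this only illustrates the technique. The data are made up.

```python
from typing import List, Optional, Sequence


def pava(y: Sequence[float], weights: Optional[Sequence[float]] = None) -> List[float]:
    """Pool-adjacent-violators: least-squares non-decreasing fit to y (already sorted by x)."""
    if weights is None:
        weights = [1.0] * len(y)
    # Each block holds [mean value, total weight, number of points].
    blocks: List[List[float]] = []
    for value, w in zip(y, weights):
        blocks.append([float(value), float(w), 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(v1 * w1 + v2 * w2) / total, total, n1 + n2])
    fitted: List[float] = []
    for value, _, count in blocks:
        fitted.extend([value] * int(count))
    return fitted


def isotonic_fit(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Fit y as a non-decreasing function of x; return fitted values in the input order."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    fitted_sorted = pava([y[i] for i in order])
    result = [0.0] * len(x)
    for rank, i in enumerate(order):
        result[i] = fitted_sorted[rank]
    return result


# Made-up reflectivity (predictor) and rainfall (target) values.
reflectivity = [40.0, 10.0, 30.0, 0.0, 20.0]
rainfall = [0.8, 0.4, 1.0, 0.0, 0.2]
fitted = isotonic_fit(reflectivity, rainfall)
```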
• Hydro-meteorological performance
Attribute \ Threshold | 0.3 mm | 0.6 mm
Probability of detection | 0.6387 | 0.5622
Miss rate | 0.0185 | 0.0158
Hanssen-Kuipers true skill score | 0.5987 | 0.5383
Proportion correct | 0.9443 | 0.9618
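The categorical scores come from a 2x2 contingency table of forecast versus observed rainfall exceeding a threshold (0.3 mm or 0.6 mm). A minimal sketch using the standard definitions: probability of detection (POD), probability of false detection (POFD), the Hanssen-Kuipers score as their difference, and proportion correct. The slide does not define its miss rate, so that score is omitted. Treating a value equal to the threshold as an event is an assumption.

```python
from typing import Dict, Sequence


def categorical_scores(
    observed: Sequence[float], forecast: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Contingency-table scores for 'rainfall >= threshold' events."""
    hits = false_alarms = misses = correct_negatives = 0
    for obs, fc in zip(observed, forecast):
        obs_event = obs >= threshold  # assumption: reaching the threshold counts as an event
        fc_event = fc >= threshold
        if obs_event and fc_event:
            hits += 1
        elif fc_event:
            false_alarms += 1
        elif obs_event:
            misses += 1
        else:
            correct_negatives += 1
    n = hits + false_alarms + misses + correct_negatives
    # Assumes at least one observed event and one observed non-event (else division by zero).
    pod = hits / (hits + misses)
    pofd = false_alarms / (false_alarms + correct_negatives)
    return {
        "pod": pod,
        "pofd": pofd,
        "hanssen_kuipers": pod - pofd,
        "proportion_correct": (hits + correct_negatives) / n,
    }


# Made-up hourly rainfall amounts in mm, for illustration only.
scores = categorical_scores([0.0, 0.5, 1.2, 0.0, 0.8, 0.1], [0.0, 0.4, 0.9, 0.5, 0.1, 0.0], 0.3)
```

Proportion correct is dominated by the many dry hours, which is why it stays above 0.94 in the table while POD is much lower.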
RADAR model
• Other tested models
– Neural networks, SMOreg, linear regression, ...
– Reached correlation coefficients between 0.35 and 0.42
– Validation: 10-fold cross-validation
• Problems in model creation:
– The process is significantly stochastic
– Some input variables/parameters (humidity) in turn depend on the output (rainfall)
– The meteorological process is very sensitive
– The reflectivity matrix represents the quantity of water in the atmosphere, not the exact rainfall rate in a specified area, unlike data from synoptic stations
Radar Scenario
[Figure: training and forecast workflows. Main elements: SelectRangeFiles and SelectRadarFiles (Start, Step, End), ObtainFromFTP, ReadRawRadarData, RadarDataTimeSynchronization, RadarDataSpaceSynchronization, SQL queries for precipitation and rainfall data, TupleAritmeticProject, GenericTupleTransform, TupleSimpleMerge, BuildClassifier (algorithm class, class index), Serialiser, Deserializer (load model), Classify, TupleToCSV (header) and DeliverToFTP (host, result file name).]
Radar Scenario
Motion vector computation
[Figure: workflow with three Read From File PEs (file name and resource inputs), ImageMotionVector, Radar Image Motion, three RadarImageVisualization PEs (file name input) and three DeliverToFTP PEs (host input).]
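The slides do not describe how ImageMotionVector estimates motion. One common approach for radar nowcasting is block matching: try small displacements and keep the one that best aligns two consecutive reflectivity images. It is used here as a swapped-in illustration, not as the ADMIRE PE. A minimal sketch on small grids of lists, using the mean absolute difference over the overlapping area and a search radius of 2 cells (both assumptions).

```python
from typing import List, Tuple

Grid = List[List[float]]


def estimate_shift(previous: Grid, current: Grid, max_shift: int = 2) -> Tuple[int, int]:
    """Return the (dy, dx) that best explains current[y][x] ~ previous[y - dy][x - dx]."""
    rows, cols = len(previous), len(previous[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cost, count = 0.0, 0
            for y in range(rows):
                for x in range(cols):
                    py, px = y - dy, x - dx
                    if 0 <= py < rows and 0 <= px < cols:  # compare only the overlap
                        cost += abs(current[y][x] - previous[py][px])
                        count += 1
            # Normalise by the overlap size, since the overlap shrinks for larger shifts.
            if count and cost / count < best_cost:
                best, best_cost = (dy, dx), cost / count
    return best


# Made-up 6x6 reflectivity fields: a small echo moves 1 row down and 2 columns right.
previous = [[0.0] * 6 for _ in range(6)]
current = [[0.0] * 6 for _ in range(6)]
previous[1][1], previous[1][2] = 5.0, 3.0
current[2][3], current[2][4] = 5.0, 3.0
shift = estimate_shift(previous, current)
```

Extrapolating the echo along the estimated vector would then give the motion-based reflectivity forecast for the next time steps.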
SVP Scenario
Forecast of reservoir inflow based on
temperature, precipitation and snow
cover
SVP Scenario
Structure of data
Time | Air temperature | Rainfall | Snow_prev | Snow | Inflow_prev | Inflow
t-1 | E(t-1) | R(t-1) |  | S(t-1) |  | F(t-1)
t | E(t) | R(t) | P(t) | S(t) | I(t) | F(t)
t+1 | E(t+1) | R(t+1) | P(t+1) | S(t+1) | I(t+1) | F(t+1)
t+2 | E(t+2) | R(t+2) | P(t+2) | S(t+2) | I(t+2) | F(t+2)
t+3 | E(t+3) | R(t+3) | P(t+3) | S(t+3) | I(t+3) | F(t+3)
t+4 | E(t+4) | R(t+4) | P(t+4) | S(t+4) | I(t+4) | F(t+4)
Two steps of prediction:
1. Copy the previous values of snow quantity and inflow volume: $P(t) = S(t-1)$, $I(t) = F(t-1)$.
2. Apply the trained models (the snow model first, then the inflow model): $S(t) = f(P(t), R(t), E(t))$, $F(t) = h(I(t), S(t), E(t), R(t))$.
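Applied over several days, the two steps form a recursive forecast: the predicted snow and inflow for one day become the previous values for the next. A minimal sketch, where snow_model and inflow_model are hypothetical placeholders for the trained f and h. The training workflow builds a linear regression model for snow and a decision tree model for inflow. The toy models in the usage lines are made up.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical model signatures: S(t) = f(P(t), R(t), E(t)) and F(t) = h(I(t), S(t), E(t), R(t)).
SnowModel = Callable[[float, float, float], float]
InflowModel = Callable[[float, float, float, float], float]


def recursive_forecast(
    snow_prev: float,
    inflow_prev: float,
    temperature: Sequence[float],
    rainfall: Sequence[float],
    snow_model: SnowModel,
    inflow_model: InflowModel,
) -> List[Dict[str, float]]:
    """Forecast snow and inflow step by step from forecast temperature E and rainfall R."""
    results = []
    for e, r in zip(temperature, rainfall):
        # Step 1: copy previous values, P(t) = S(t-1) and I(t) = F(t-1).
        p, i = snow_prev, inflow_prev
        # Step 2: apply the snow model first, then the inflow model.
        s = snow_model(p, r, e)
        f = inflow_model(i, s, e, r)
        results.append({"snow": s, "inflow": f})
        snow_prev, inflow_prev = s, f
    return results


# Toy stand-ins for the trained models (made up, for illustration only).
def toy_snow(p: float, r: float, e: float) -> float:
    return max(0.0, p + r) if e < 0 else max(0.0, p - 2.0)


def toy_inflow(i: float, s: float, e: float, r: float) -> float:
    return 0.5 * i + r + (1.0 if e > 0 else 0.0)


forecast = recursive_forecast(10.0, 4.0, [-1.0, 3.0], [2.0, 1.0], toy_snow, toy_inflow)
```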
SVP Scenario
Models & Attributes
• 10-fold cross-validation, 8760 records; models for inflow prediction
Properties \ Model | Perceptron neural network | Gaussian process | Linear regression | Decision tree M5P
Correlation coefficient | 0.8810 | 0.8469 | 0.8079 | 0.8899
Mean absolute error | 7.0577 | 6.9821 | 8.3816 | 5.2562
Root mean squared error | 14.1005 | 15.4974 | 17.0586 | 13.1983
Relative absolute error | 40.5821% | 40.1472% | 48.1942% | 30.2231%
Root relative squared error | 48.6547% | 53.4747% | 58.8616% | 45.5415%
• N-fold cross-validation, 8760 records; decision tree model M5P
Properties \ N-fold | N = 10 | N = 20 | N = 25 | N = 50 | N = 100
Correlation coefficient | 0.8899 | 0.8933 | 0.8855 | 0.8937 | 0.8934
Mean absolute error | 5.2562 | 5.1253 | 5.2484 | 5.0973 | 5.0908
Root mean squared error | 13.1983 | 13.0090 | 13.4454 | 12.9807 | 13.0033
Relative absolute error | 30.2231% | 29.4869% | 30.2017% | 29.3317% | 29.2915%
Root relative squared error | 45.5415% | 44.9218% | 46.4373% | 44.8306% | 44.9086%
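N-fold cross-validation splits the records into N disjoint folds and uses each fold once for testing while training on the rest. A minimal sketch of the splitting step. The random shuffle with a fixed seed is an assumption, and model training and scoring are left to the caller.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 1) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k folds covering all n records exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n is not divisible by k.
    fold_sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 8760 records split into 25 folds, as in one column of the table above.
folds = list(k_fold_indices(8760, 25))
```

Each (train, test) pair would then be used to fit a model and evaluate it on the held-out fold. With larger N, each model is trained on a larger share of the data, which is one reason the results change only slightly between N = 20 and N = 100.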
SVP Scenario
Data integration workflow
• Three SQL queries, each with Query and Resource inputs: inflow into reservoir; temperature and rainfall at reservoir; quantity of snow
• Daily aggregation
• Two tuple merges
• Final projection (TupleAritmeticProject), with Expression and result column names as inputs
• Eliminate summer seasons (GenericTupleTransform)
• Transform to WRS (TupleToWRS)
• Output: integrated data
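The integration workflow aggregates query results to daily values, merges them on the date and removes summer seasons. A minimal sketch of those three steps on plain Python tuples. The choice of aggregate per series, the inner join on dates and treating June to August as summer are assumptions, not details from the slides.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Callable, Dict, Iterable, List, Tuple

SUMMER_MONTHS = {6, 7, 8}  # assumption: June to August count as the summer season


def daily_aggregate(
    rows: Iterable[Tuple[datetime, float]], how: Callable[[List[float]], float]
) -> Dict[date, float]:
    """Group timestamped values by calendar day and reduce each group with `how`."""
    by_day: Dict[date, List[float]] = defaultdict(list)
    for timestamp, value in rows:
        by_day[timestamp.date()].append(value)
    return {day: how(values) for day, values in by_day.items()}


def merge_on_day(*series: Dict[date, float]) -> List[tuple]:
    """Inner join on the date: keep only days present in every series."""
    common = set(series[0]).intersection(*series[1:])
    return [(day, *(s[day] for s in series)) for day in sorted(common)]


def drop_summer(rows: List[tuple]) -> List[tuple]:
    """Remove rows whose date falls in a summer month."""
    return [row for row in rows if row[0].month not in SUMMER_MONTHS]


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Hypothetical query results: inflow, rainfall at the reservoir, snow quantity.
inflow = [(datetime(2005, 1, 1, 0), 2.0), (datetime(2005, 1, 1, 12), 4.0),
          (datetime(2005, 1, 2, 0), 3.0), (datetime(2005, 7, 1, 0), 1.0)]
rain = [(datetime(2005, 1, 1, 6), 1.0), (datetime(2005, 1, 1, 18), 0.5),
        (datetime(2005, 7, 1, 6), 5.0)]
snow = [(datetime(2005, 1, 1, 0), 30.0), (datetime(2005, 7, 1, 0), 0.0)]

merged = merge_on_day(daily_aggregate(inflow, mean), daily_aggregate(rain, sum),
                      daily_aggregate(snow, max))
integrated = drop_summer(merged)
```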
SVP Scenario
Model training workflow
• Input: integrated data
• Data correction: linear trend filter for the snow column (snow index input), then deletion of invalid rows
• Snow model: Preprocessing 1 → Build classifier (linear regression model, class index input) → Serializer → store snow model to repository (model name input)
• Inflow model: Preprocessing 2 → Build classifier (decision tree model, class index input) → Serializer → store inflow model to repository (model name input)
SVP Scenario
Forecast workflow
ADMIRE Tools
• Registry client GUI
• Process designer
• SKSA (Semantic Knowledge Sharing Assistant)
• Gateway Process Manager
• DMI Model Visualizer
Registry client GUI
• Read-only access to the ADMIRE Registry
– list PEs and view their properties
– search and sort PEs
• Write access to the Registry is done via DISPEL documents
Process Designer
• Manage your DMI project (files, directories – project structure)
• Select elements from the Registry
• View the properties of your chosen elements
• View the canonical (DISPEL) representation of your DMI process in real time
• Edit your DMI process graphically
Semantic Knowledge Sharing Assistant
Provides access to existing users' knowledge, sorting and selecting it automatically according to the user's current working context
• Context the user works in
– e.g. several reservoirs, one settlement
• Knowledge that may be useful in this context
– previously entered by other users
Gateway Process Manager
• Keep track of running processes
– stop/pause/cancel a process
– view the process' source DISPEL
• Access a process' results (if available) in several ways – raw or visualized
DMI Model Visualizer
For data mining experts
• Visualization of data mining models
– read a Weka classifier object
– produce a PMML description of the model
– show the PMML as a graphical tree
Custom Application Portal
for end-users (domain experts)
Thank you for your attention