Integration and processing of environmental data: ADMIRE technology
Ondrej Habala
CRISIS seminar, 18.10.2011
ITMS 26240220060

Goals
• Accelerate access to data and increase the benefits from its exploitation;
• Deliver consistent, easy-to-use technology for extracting information and knowledge;
• Cope with the complexity, distribution, change and heterogeneity of services, data and processes through an abstract view of data mining and integration; and
• Provide power to users and developers of data mining and integration processes.

[Figure: ADMIRE Architecture: Separation of Concerns]

[Figure: ADMIRE Architecture]

[Figure: ADMIRE's High-Level Architecture]

[Figure: ADMIRE Gateways (USMT)]

DISPEL – Data Intensive Systems Process-Engineering Language
• Designed for data-intensive distributed systems
• Acts as the interface between complex application requests and complex enactment systems
  – Benefit: method development, engineering and the evolution of supported practices can proceed independently in each world
• Describes enactment requests for streaming-data workflow processes
• "Process-engineering time": the process is transformed and optimized in preparation for enactment

DISPEL: Simple Example
    String sql1 = "SELECT * FROM some_table";
    String sql2 = "SELECT * FROM table2";
    String resource = "128.18.128.255";
    SQLQuery query = new SQLQuery;
    // Creating streams of literals (|- ... -|) and connecting them to the query's inputs
    |- sql1, sql2 -| => query.expression;
    |- resource -| => query.resource;
    Tee tee = new Tee;
    // Creating connections: the query's result stream feeds the Tee
    query.result => tee.connectInput;

[Figure: DISPEL – real use]

APPLICATION STUDIES: DEPLOYING ADMIRE TECHNOLOGY IN THE ENVIRONMENTAL DOMAIN
18.10.2011

Flood Application
Data sets used in hydrological scenarios

| Dataset | Domain | Description | Volume | Temporal coverage | Spatial coverage |
|---|---|---|---|---|---|
| HUSAV | Hydrology | Data from two probes, containing water saturation of soil | 10s of MB | 1998-2007 | Two distinct points |
| MARS | Meteorology | Historical meteorological data (temperature, rainfall, etc.) for Slovakia | 100s of MB | 1975-2007 | Slovakia (grid 50x50 km) |
| SVP | Hydrology | Data from waterworks in western Slovakia (mainly river Váh): outflows, water levels, temperature, rainfall | 100s of MB | 1998-2007 | 15 distinct waterworks |
| DAISY | Pedology | Various pedological parameters for one probe in southern Slovakia | 10s of MB | 1961-2000 | One point |
| WOFOST | Pedology | Crop data (with attached soil and meteorological data) for Slovakia, year 2006 | 10s of MB | 2006 | Slovakia (grid) |
| SHMU_CURR | Meteorology | On-line database of meteorological data, copied from the SHMI web, including radar imagery | 10s of GB+ | 2008 onward | Slovakia (about 100 distinct probes) |
| SHMU_HIST | Meteorology | Historical meteorological data from SHMI probes | 100s of MB | 1998-2007 | Slovakia (more than 100 distinct probes) |
| SHMU_GRIB | Meteorology | Historical temperatures and rainfall amounts in a gridded binary format | 100s of GB | 1998-2007 | Slovakia (grid, various sizes) |
| RADAR | Meteorology | Weather radar imagery | 100s of GB | 2005-2008 | Slovakia |
| SHMU_HYDRO | Hydrology | Historical data from hydrological measurement stations | 10s of MB | 1998-2007 | Orava and upper Váh |

Orava scenario
• Legend
  – Green area: Orava (part of northern Slovakia)
  – Blue: Orava reservoir and local rivers
  – Red dots: hydrological measurement stations
• Notes
  – We are interested only in hydrological stations below the Orava reservoir
  – In our tests we use hydrological station 5830 (Tvrdosin)

ORAVA – data mining concept
• Predictors
  – rainfall amount (reservoir and station), air temperature (reservoir and station), reservoir discharge, reservoir water temperature
• Targets
  – water level and water temperature at a station below the reservoir

| Time | Water temp (Orava) | Rainfall (Orava) | Air temp (Orava) | Air temp (Station) | Rainfall (Station) | Outflow (Orava) | Water level (Station) | Water temp (Station) |
|---|---|---|---|---|---|---|---|---|
| T-4 | E-4 | R-4 | A-4 | B-4 | S-4 | D-4 | X-4 | Y-4 |
| T-3 | E-3 | R-3 | A-3 | B-3 | S-3 | D-3 | X-3 | Y-3 |
| T-2 | E-2 | R-2 | A-2 | B-2 | S-2 | D-2 | X-2 | Y-2 |
| T-1 | E-1 | R-1 | A-1 | B-1 | S-1 | D-1 | X-1 | Y-1 |
| T | E | R | A | B | S | D | X | Y |
| T+1 | | R+1 | A+1 | B+1 | S+1 | D+1 | X+1 | Y+1 |
| T+2 | | R+2 | A+2 | B+2 | S+2 | D+2 | X+2 | Y+2 |
| T+3 | | R+3 | A+3 | B+3 | S+3 | D+3 | X+3 | Y+3 |
| T+4 | | R+4 | A+4 | B+4 | S+4 | D+4 | X+4 | Y+4 |
| T+5 | | R+5 | A+5 | B+5 | S+5 | D+5 | X+5 | Y+5 |
| T+6 | | R+6 | A+6 | B+6 | S+6 | D+6 | X+6 | Y+6 |

For T+1 to T+6, X and Y (the station's water level and water temperature) are the targets of data mining; the remaining future values are either given in a schedule or predicted by a meteo model.

ORAVA – data integration
• Integration of data from
  – GRIB files
  – reservoirs
• Inputs
  – time period of the experiment
  – reservoir ID
  – list of hydro stations
  – geo coordinates

ORAVA – data sets
SVP, SHMU_CURR, SHMU_HIST, SHMU_GRIB and SHMU_HYDRO (descriptions, volumes and coverage as in the Flood Application data set table above).

ORAVA Scenario: Integrated and preprocessed data
Integrated raw data (rows are consecutive hours; Water_temp Orava is sampled every 24 hours)

| Water_temp Orava | Air_temp Orava | Rainfall Orava | Outflow Orava | Rainfall Station | Air_temp Station | Flow/Height Station | Water_temp Station |
|---|---|---|---|---|---|---|---|
| 1 | -4 | | 30 | -5.55E-20 | 269.0278 | 28 | 0.7 |
| | -4 | | 30 | -5.55E-20 | 269.0476 | 28.62 | 0.7 |
| | -5 | | 30 | -4.24E-20 | 269.5059 | 28.62 | 0.7 |
| | -5 | | 30 | -8.47E-20 | 270.2394 | 28.62 | 0.7 |
| | -5 | | 30 | -8.47E-20 | 270.8507 | 28 | 0.7 |
| | -3 | | 50 | -8.47E-20 | 271.2792 | 28 | 0.7 |
| | -3 | | 50 | -8.47E-20 | 271.9238 | 28 | 0.8 |

Integrated preprocessed data

| Water_temp Orava | Air_temp Orava | Rainfall Orava | Outflow Orava | Rainfall Station | Air_temp Station | Flow/Height Station | Water_temp Station |
|---|---|---|---|---|---|---|---|
| 1.000000 | -4.0 | 0.0 | 30.0 | 0.0 | -3.12223 | 28.00 | 0.7 |
| 1.000000 | -4.0 | 0.0 | 30.0 | 0.0 | -3.10240 | 28.62 | 0.7 |
| 0.995833 | -5.0 | 0.0 | 30.0 | 0.0 | -2.64408 | 28.62 | 0.7 |
| 0.991667 | -5.0 | 0.0 | 30.0 | 0.0 | -1.91062 | 28.62 | 0.7 |
| 0.987500 | -5.0 | 0.0 | 30.0 | 0.0 | -1.29926 | 28.00 | 0.7 |
| 0.983333 | -3.0 | 0.0 | 50.0 | 0.0 | -0.87076 | 28.00 | 0.7 |
| 0.979167 | -3.0 | 0.0 | 50.0 | 0.0 | -0.22617 | 28.00 | 0.8 |

Orava Scenario: Water temperature prediction

| Property \ Model | Linear regression | Multilayer perceptron |
|---|---|---|
| Correlation coefficient | 0.9639 | 0.9821 |
| Mean absolute error | 1.1791 | 0.7748 |
| Root mean squared error | 1.4607 | 1.0386 |
| Relative absolute error | 23.8739 % | 15.6884 % |
| Root relative squared error | 26.609 % | 18.9195 % |
| Total number of instances | 8760 | 8760 |

Orava Scenario: Water level prediction

| Property \ Model | Multilayer perceptron |
|---|---|
| Correlation coefficient | 0.9816 |
| Mean absolute error | 0.4105 |
| Root mean squared error | 0.9673 |
| Relative absolute error | 30.5869 % |
| Root relative squared error | 19.2384 % |
| Total number of instances | 8735 |

[Figure: Orava Scenario data integration workflow]

[Figure: Orava Scenario training workflow]

[Figure: Orava Scenario prediction workflow]

Implementation Notes
• We needed to write custom activities for certain data extraction tasks
• Data integration was the most complex part of the scenario in terms of workflow design
• Once all the PEs were in place, data integration was quite easy to write and modify in DISPEL
  – We used a composite PE to extract different types of quantities from meteorological GRIB files

[Figure: Orava Scenario portal]

Radar Scenario
Very short-term rainfall prediction from weather radar data

Radar Scenario: Description
• Very short-term rainfall prediction from weather radar data
[Figure: Movement of areas with higher air moisture content, and thus also higher precipitation potential]
[Figure: Network of synoptic stations in Slovakia]
• 27 stations in Slovakia
• Data from the years 2007 and 2008
• Available variables: rainfall, humidity, radar reflectivity, atmospheric pressure and temperature values for each hour

Radar Scenario: Main predictors and target variables

| Time | Wind | Radar reflectivity | Rainfall (Orava) |
|---|---|---|---|
| T-2 | W-2 | D-2 | F-2 |
| T-1 | W-1 | D-1 | F-1 |
| T | W | D | F |
| T+1 | W+1 | D+1 | F+1 |
| T+2 | W+2 | D+2 | F+2 |

On the slide, green cells are predicted by the meteorological model, blue cells come from a model based on motion vectors, and yellow cells are the final target of data mining.

Radar Scenario: Model attributes
• Isotonic regression model
• 10-fold cross-validation

| Numerical characteristic | Value |
|---|---|
| Correlation coefficient | 0.4593 |
| Mean absolute error | 0.1105 |
| Root mean squared error | 0.5490 |
| Total number of instances | 89 746 |

• Hydro-meteorological performance

| Attribute \ Threshold | 0.3 mm | 0.6 mm |
|---|---|---|
| Probability of detection | 0.6387 | 0.5622 |
| Miss rate | 0.0185 | 0.0158 |
| Hanssen-Kuipers true skill score | 0.5987 | 0.5383 |
| Proportion correct | 0.9443 | 0.9618 |

RADAR model
• Other tested models
  – neural networks, SMOreg, linear regression, ...
  – reached correlation coefficients between 0.35 and 0.42
  – validation: 10-fold cross-validation
• Problems in model creation:
  – the process is significantly stochastic
  – some input variables (humidity) depend backwards on the output (rainfall)
  – the meteorological process is very sensitive
  – the radar reflectivity matrix represents the quantity of water in the atmosphere, not the exact rainfall rate in a specified area, as opposed to data from synoptic stations

[Figure: Radar Scenario workflow with training and forecast branches (PEs include SelectRangeFiles, ObtainFromFTP, ReadRawRadarData, RadarDataTimeSynchronization, RadarDataSpaceSynchronization, SQLQuery, TupleAritmeticProject, GenericTupleTransform, BuildClassifier, Classify, Serializer, Deserializer, TupleSimpleMerge, TupleToCSV, DeliverToFTP)]

[Figure: Radar Scenario motion vector computation (PEs: ReadFromFile, ImageMotionVector, RadarImageMotion, RadarImageVisualization, DeliverToFTP)]

SVP Scenario
Forecast of reservoir inflow based on temperature, precipitation and snow cover

SVP Scenario: Structure of data

| Time | Air temperature | Rainfall (Orava) | Snow_prev | Snow | Inflow_prev | Inflow |
|---|---|---|---|---|---|---|
| t-1 | E(t-1) | R(t-1) | | S(t-1) | | F(t-1) |
| t | E(t) | R(t) | P(t) | S(t) | I(t) | F(t) |
| t+1 | E(t+1) | R(t+1) | P(t+1) | S(t+1) | I(t+1) | F(t+1) |
| t+2 | E(t+2) | R(t+2) | P(t+2) | S(t+2) | I(t+2) | F(t+2) |
| t+3 | E(t+3) | R(t+3) | P(t+3) | S(t+3) | I(t+3) | F(t+3) |
| t+4 | E(t+4) | R(t+4) | P(t+4) | S(t+4) | I(t+4) | F(t+4) |

Two steps of prediction:
1. Copy the previous values of snow quantity and inflow volume: P(t) = S(t-1), I(t) = F(t-1).
2. Apply the trained models (the snow model first, then the inflow model): S(t) = f(P(t), R(t), E(t)), F(t) = h(I(t), S(t), E(t), R(t)).
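The two prediction steps amount to a recursive (rolling) forecast: each step's snow and inflow estimates become the next step's "previous" values. A minimal Python sketch, assuming the trained snow model f and inflow model h are available as plain callables; the function name `forecast`, the toy models and their coefficients are hypothetical illustrations, not the trained linear regression or M5P models from the scenario.

```python
from typing import Callable, List, Tuple

# f(P, R, E) -> S and h(I, S, E, R) -> F, matching the slide's notation.
SnowModel = Callable[[float, float, float], float]
InflowModel = Callable[[float, float, float, float], float]


def forecast(snow_prev: float, inflow_prev: float,
             rainfall: List[float], air_temp: List[float],
             f: SnowModel, h: InflowModel) -> List[Tuple[float, float]]:
    """Roll the two-step prediction forward over the forecast horizon.

    snow_prev, inflow_prev: observed S(t-1) and F(t-1).
    rainfall, air_temp: R and E for t, t+1, ... (e.g. from a meteo model);
    zip() stops at the shorter of the two lists.
    Returns [(S(t), F(t)), (S(t+1), F(t+1)), ...].
    """
    results = []
    s_last, f_last = snow_prev, inflow_prev
    for r, e in zip(rainfall, air_temp):
        # Step 1: copy previous values, P(t) = S(t-1) and I(t) = F(t-1).
        p, i = s_last, f_last
        # Step 2: snow model first, then the inflow model, which uses the new S(t).
        s = f(p, r, e)
        inflow = h(i, s, e, r)
        results.append((s, inflow))
        s_last, f_last = s, inflow
    return results


# Hypothetical toy models, for illustration only.
def toy_snow(p: float, r: float, e: float) -> float:
    # Below 0 °C rainfall adds to the snow; otherwise snow melts by 0.5 * E.
    return max(0.0, p + r) if e < 0 else max(0.0, p - 0.5 * e)


def toy_inflow(i: float, s: float, e: float, r: float) -> float:
    # Damped previous inflow plus rainfall, plus melt water while snow is present.
    melt = 0.5 * e if e > 0 and s > 0 else 0.0
    return 0.8 * i + 0.1 * r + melt


if __name__ == "__main__":
    steps = forecast(10.0, 20.0, [2.0, 0.0], [-1.0, 4.0], toy_snow, toy_inflow)
    for k, (s, q) in enumerate(steps):
        print(f"t+{k}: snow={s:.2f} inflow={q:.2f}")
```

The ordering inside the loop mirrors the slide: the snow model runs before the inflow model because h takes the freshly predicted S(t) as an input.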
SVP Scenario: Models & Attributes
• 10-fold cross-validation, 8760 records; models for inflow prediction

| Property \ Model | Perceptron neural network | Gaussian process | Linear regression | Decision tree M5P |
|---|---|---|---|---|
| Correlation coefficient | 0.8810 | 0.8469 | 0.8079 | 0.8899 |
| Mean absolute error | 7.0577 | 6.9821 | 8.3816 | 5.2562 |
| Root mean squared error | 14.1005 | 15.4974 | 17.0586 | 13.1983 |
| Relative absolute error | 40.5821% | 40.1472% | 48.1942% | 30.2231% |
| Root relative squared error | 48.6547% | 53.4747% | 58.8616% | 45.5415% |

• N-fold cross-validation, 8760 records; decision tree model M5P

| Property \ N-fold | N = 10 | N = 20 | N = 25 | N = 50 | N = 100 |
|---|---|---|---|---|---|
| Correlation coefficient | 0.8899 | 0.8933 | 0.8855 | 0.8937 | 0.8934 |
| Mean absolute error | 5.2562 | 5.1253 | 5.2484 | 5.0973 | 5.0908 |
| Root mean squared error | 13.1983 | 13.0090 | 13.4454 | 12.9807 | 13.0033 |
| Relative absolute error | 30.2231% | 29.4869% | 30.2017% | 29.3317% | 29.2915% |
| Root relative squared error | 45.5415% | 44.9218% | 46.4373% | 44.8306% | 44.9086% |

[Figure: SVP Scenario data integration workflow (components: SQL queries for inflow into the reservoir, temperature and rainfall at the reservoir, and quantity of snow; daily aggregation; tuple merges; final projection (TupleAritmeticProject) with result column names; elimination of summer seasons (GenericTupleTransform); transformation to WRS (TupleToWRS); integrated data)]

[Figure: SVP Scenario model training workflow (components: integrated data; data correction with a linear trend filter for the snow column (snow index); deletion of invalid rows; Preprocessing 1 and 2; Build classifier with a linear regression model for snow and a decision tree model for inflow; class index; Serializer; storing the snow and inflow models in the repository under a model name)]

[Figure: SVP Scenario forecast workflow]

ADMIRE Tools
• Registry client GUI
• Process Designer
• SKSA (Semantic Knowledge Sharing Assistant)
• Gateway Process Manager
• DMI Model Visualizer

Registry client GUI
• Read-only access to the ADMIRE Registry
  – list PEs and view their properties
  – search and sort PEs
• Write access to the Registry is done via DISPEL documents

Process Designer
• Manage your DMI project (files and directories of the project structure)
• Select elements from the Registry
• View the properties of your chosen elements
• View the canonical (DISPEL) representation of your DMI process in real time
• Edit your DMI process graphically

Semantic Knowledge Sharing Assistant
Provides access to existing users' knowledge, sorting and selecting it automatically according to the user's current working context.
• Context the user works in
  – e.g. several reservoirs and one settlement
• Knowledge that may be useful in this context
  – previously entered by other users

Gateway Process Manager
• Keep track of running processes
  – stop/pause/cancel a process
  – view the process's DISPEL source
• Access the process's results (if available) in several ways
  – raw or visualized

DMI Model Visualizer
For data mining experts
• Visualization of data mining models
  – read a Weka classifier object
  – produce a PMML description of the model
  – show the PMML as a graphical tree
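The visualizer's last step, showing PMML as a tree, can be approximated in text form. The sketch below is a minimal, hypothetical example that walks a PMML `TreeModel` with Python's standard `xml.etree.ElementTree` and prints one indented line per `Node`. The sample document, its `snow` field and split value are invented for illustration and trimmed (a complete PMML file also needs `DataDictionary` and `MiningSchema`); it does not reproduce the ADMIRE tool or Weka's PMML export.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical, trimmed PMML 4.1 TreeModel (DataDictionary and MiningSchema omitted).
PMML_SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <TreeModel functionName="regression">
    <Node score="12.0">
      <True/>
      <Node score="5.0">
        <SimplePredicate field="snow" operator="lessOrEqual" value="3.5"/>
      </Node>
      <Node score="20.0">
        <SimplePredicate field="snow" operator="greaterThan" value="3.5"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

# PMML SimplePredicate operators mapped to readable symbols.
OPS = {"equal": "==", "notEqual": "!=", "lessThan": "<",
       "lessOrEqual": "<=", "greaterThan": ">", "greaterOrEqual": ">="}


def local(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def condition(node: ET.Element) -> str:
    """Describe the predicate that guards a Node."""
    for child in node:
        kind = local(child.tag)
        if kind == "SimplePredicate":
            op = child.get("operator", "?")
            return f"{child.get('field')} {OPS.get(op, op)} {child.get('value')}"
        if kind == "True":
            return "root"
    return "(other predicate)"


def render(node: ET.Element, depth: int = 0,
           lines: Optional[List[str]] = None) -> List[str]:
    """Depth-first walk that emits one indented line per Node."""
    if lines is None:
        lines = []
    lines.append("  " * depth + f"{condition(node)} -> score={node.get('score')}")
    for child in node:
        if local(child.tag) == "Node":
            render(child, depth + 1, lines)
    return lines


def pmml_tree_to_text(xml_text: str) -> str:
    """Find the first TreeModel and render its top-level Node as text."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local(elem.tag) == "TreeModel":
            for child in elem:
                if local(child.tag) == "Node":
                    return "\n".join(render(child))
    raise ValueError("no TreeModel/Node found")


if __name__ == "__main__":
    print(pmml_tree_to_text(PMML_SAMPLE))
```

On the sample it prints `root -> score=12.0` followed by the two indented split lines (`snow <= 3.5` and `snow > 3.5`); a graphical viewer would draw the same parent/child structure as boxes and edges.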
Custom Application Portal for end users (domain experts)

Thank you for your attention