Download PeterBajcsy_SP2Learn_v2 - PRAGMA Cloud/Grid Operation

A FRAMEWORK FOR
GEOSPATIAL MODELING FROM
SPARSE FIELD MEASUREMENTS
USING IMAGE PROCESSING
AND MACHINE LEARNING
Peter Bajcsy (1), Chulyun Kim (1), Jihua Wang (2) and Yu-Feng Lin (2)
(1) National Center for Supercomputing Applications (NCSA)
(2) Illinois State Water Survey (ISWS)
University of Illinois at Urbana-Champaign (UIUC)

Outline
- Introduction
- Problems Addressed by Spatial Pattern To Learn (SP2Learn)
- SP2Learn Architecture and Functionality Overview
- Running SP2Learn
- Summary

Introduction

General Problem
- Compute a set of geo-spatially dense, accurate predictions of variables
  - given a set of direct, geo-spatially sparse point measurements and
  - auxiliary variables with implicit relationships with respect to the predicted variable
- Motivation:
  - minimize the cost of taking direct point measurements,
  - maximize the accuracy of predictions, and
  - automate the discovery of relationships among direct field measurements and indirect variables

Formulation
- Input: sets of geo-spatially sparse variables {Vi{pij}}, dense auxiliary variables, and a priori tacit knowledge of experts
- Output: geo-spatially dense (raster) variables {Ok}
- Unknown: the selection of methods, the workflow of operations/methods, the parameters of methods, the relationships of auxiliary variables w.r.t. Ok, and a quantitative metric of output goodness
[Figure: point measurements p1j and p2j of variables V1 and V2 pass through interpolations and mathematical models and, together with auxiliary variables and tacit knowledge, yield the output O1]

Applied Problem
Recharge and Discharge Rate Prediction
[Figure: bedrock elevation and water table elevation, with discharged and recharged areas]

Interdisciplinary Objectives
- Ground Water (Hydrologic Science) View:
  - Evaluation of Alternative Conceptual (implicit relationships) and Mathematical Models (explicit relationships)
  - Accurate Prediction of Groundwater Recharge and Discharge Rates from a Limited Number of Field Measurements
- Computer Science View:
  - Computer-Assisted Learning to Assess Alternative Conceptual and Mathematical Models
  - Optimization of Prediction Models From a Set of Geo-Spatially Sparse Point Measurements
[Figure: the two views connected by a DIALOG]

State-of-the-Art Results
- Limited Spatial Resolution and Accuracy
[Figure: R/D maps for Min. Grid: 805 m x 805 m and Uniform Grid: 80 m x 80 m, annotated with recharge zone, discharge zone, and noisy pattern or weak R/D]

Existing Software for Groundwater and Surface Water Modeling
- MODFLOW is a three-dimensional finite-difference ground-water model
  - http://water.usgs.gov/nrp/gwsoftware/modflow2005/modflow2005.html - freeware (2005)
- PEST is software for model calibration, parameter estimation and predictive uncertainty analysis
  - http://www.sspa.com/pest/ - freeware (2007); University of Queensland, Australia
- Precipitation-Runoff Modeling System (PRMS) is a deterministic, distributed-parameter modeling system developed to evaluate the impacts of various combinations of precipitation, climate, and land use on streamflow, sediment yields, and general basin hydrology
  - http://water.usgs.gov/software/prms.html - freeware (1996); USGS
- Deep Percolation Model (DPM) facilitates estimation of ground-water recharge under a large range of climatic, landscape, and land-use and land-cover conditions
  - http://pubs.usgs.gov/sir/2006/5318/; USGS

Related Work
- Singh A. et al., “Expert-Driven ‘Perceptive’ Models for Reducing User Fatigue in an Interactive Hydrologic Model Calibration Framework”
[Figure: Conductivity (K) and Hydraulic heads (H) for the hypothetical aquifer]

Motivation
- Ground Water (Hydrologic) Science:
  - Currently, no single method can estimate R/D rates and patterns for all practical applications.
  - Therefore, cross-analyzing results from various estimation methods and related field information is likely to be superior to using only a single estimation method.
- Computer Science:
  - It is currently impossible
    (a) to replace an expert who has a lot of tacit domain knowledge with computer algorithms, or
    (b) for an expert to learn new I/O relationships from a plethora of possible variables and an extremely large space of processing methods and their parameters
  - Thus, assisting experts to discover, evaluate and validate new relationships in an iterative way will likely enable
    (a) better understanding of the underlying phenomena, and
    (b) more automated and cost-efficient predictions

Problems Addressed by Spatial Pattern To Learn

Our Approach
- Data-Driven Analyses to Test Alternative Models, and to Search the Space of Processing Operations and Their Parameters
  - Interpolation methods
  - Mathematical models
  - Image processing algorithms
  - Machine learning algorithms
  - Scalability of algorithms with large size data
- Computer-Assisted Comparisons and Evaluations of Multiple Models and Sub-Optimal Solutions
  - Model/Solution Representation
  - Closed Loop (Iterative) Workflows
  - Human Computer Interfaces
- Overall Approach: An Exploration Framework for a Class of Alternative Models/Hypotheses and Optimal Solutions

SP2Learn Problem Formulation
- Given a set of geo-spatially sparse field measurements and auxiliary variables, derive an accurate, spatially dense R/D rate map by
  (a) using a physics-based model,
  (b) incorporating boundary conditions, and
  (c) exploring auxiliary variables that represent prior knowledge about R/D patterns but are missing in the physics-based model

Challenges
- (1) How to recognize a ‘meaningful’ pattern in the predicted map?
- (2) How to quantify the goodness of the pattern?
Approach:
 (1a) Recognize patterns by applying multiple image enhancement and segmentation techniques to R/D rate predictions
 (1b) Introduce a relationship between the R/D pattern and auxiliary (a priori reference) information
 (2a) Define goodness w.r.t. reference information using the expert’s selection of ‘meaningful’ relationships
 (2b) Define goodness w.r.t. reference information using the complexity of machine learning

Using Physics-Based Model
R/D Rate Prediction
[Figure: field measurements (+) of water table elevation, bedrock elevation and hydraulic conductivity for a grid cell with incoming and outgoing water; resulting map of discharged and recharged areas]
$Q = K \cdot A \cdot \frac{dh}{dL}$
Ground water flux (Q) = hydraulic conductivity (K) * cell area (A) * gradient of water table elevation (head) over cell distance (dh/dL)

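The flux relation on this slide can be evaluated cell by cell. The following is a minimal sketch under stated assumptions, not SP2Learn's implementation: the function name `darcy_flux`, the units and the sample values are hypothetical and chosen only for illustration.

```python
def darcy_flux(k, area, dh, dl):
    """Ground water flux Q = K * A * dh/dL for one grid cell.

    k    -- hydraulic conductivity (assumed unit: m/day)
    area -- cell area (assumed unit: m^2)
    dh   -- difference in water table elevation (head) across the cell (m)
    dl   -- cell distance over which dh is measured (m)
    """
    if dl == 0:
        raise ValueError("cell distance dL must be non-zero")
    return k * area * dh / dl


# Hypothetical values: K = 2 m/day, A = 100 m^2, head difference 0.5 m over 80 m.
q = darcy_flux(2.0, 100.0, 0.5, 80.0)
print(q)
```

The sign convention follows the slide, which writes the flux as a plain product of conductivity, area and head gradient.
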
Incorporating Spatial Boundary Conditions
- BC: R/D rate prediction could have smooth transitions, and recharge & discharge regions (contiguous pixels) should be clearly delineated
- Approach: Apply Image Restoration and De-noising Techniques
  - Moving average based low pass filter
  - TVL (Total Variation regularized L1-norm function) based filter
  - Morphological operation based filter
  - Using multiple techniques multiple times
[Figure: filtered R/D map with discharged and recharged regions]

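To illustrate the first technique in the list, here is a minimal sketch of a moving average low pass filter on a small raster stored as a list of lists. The function name, the border handling (windows clipped at the map edge) and the sample map are assumptions made for this example; the slides do not state how SP2Learn treats borders.

```python
def moving_average(grid, rows=3, cols=3):
    """Low pass filter: replace each pixel by the mean of its rows x cols window.

    Odd kernel sizes are assumed. Windows are clipped at the map border,
    so edge pixels average over fewer values.
    """
    height, width = len(grid), len(grid[0])
    half_r, half_c = rows // 2, cols // 2
    out = []
    for r in range(height):
        out_row = []
        for c in range(width):
            window = [
                grid[i][j]
                for i in range(max(0, r - half_r), min(height, r + half_r + 1))
                for j in range(max(0, c - half_c), min(width, c + half_c + 1))
            ]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


# Hypothetical 3x3 map with a single noisy spike in the center.
noisy = [
    [0.0, 0.0, 0.0],
    [0.0, 9.0, 0.0],
    [0.0, 0.0, 0.0],
]
smooth = moving_average(noisy)
print(smooth[1][1])  # center pixel: mean of all nine values
```

Calling the function repeatedly on its own output corresponds to the "multiple techniques multiple times" item above.
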
Exploring Auxiliary Variables Driving R/D Patterns
Prior Tacit Knowledge about R/D and Auxiliary Variables
- Soil Type: P(R or D area | Soil = Clay) ~ low
- Slope: P(R or D area | slope = high) ~ low
- Proximity to River: P(R or D area | River is close) ~ high
[Figure: R/D maps labeled "moving average" and "normalization+TVL"]

From Auxiliary Variables To Knowledge and Accurate R/D
[Figure: workflow with the steps Load R/D Map, Load Variables, Integrate Maps, Define ROI, Create Decision Tree, Apply Rules]

SP2Learn Output
- A set of rules that define relationships between the predicted (R/D rate) variable and auxiliary variables
- Modified (more accurate) predictions according to the user-selected rules defining relationships between predicted and auxiliary variables
- Sensitivity analysis results with respect to
  - Methods (interpolations, image enhancement, …)
  - Models
  - Parameters

Example Results
<RULE ID=138 NUM_OF_CASES=3975 SUPPORT=32.65%>
<IF>Elevation is not in {330-344} AND
Soil type is in {Rm=Roscommon muck} AND
Proximity to water body is not {near_water} AND
Slope is in {0-0.9} </IF>
<THEN>R/D rate is -0.004,-0.002</THEN>
[Figure: map panels showing the region of interest (ROI)]

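As an illustration of how the IF part of the example rule could be checked against one pixel, here is a minimal sketch. The dictionary keys, the reading of `{330-344}` and `{0-0.9}` as closed intervals, and the value `"far"` are assumptions for this example; they are not taken from SP2Learn.

```python
def rule_138_matches(pixel):
    """Check the IF part of the example rule against one pixel's attributes."""
    return (
        not (330 <= pixel["elevation"] <= 344)            # Elevation is not in {330-344}
        and pixel["soil_type"] == "Rm"                    # Soil type is Roscommon muck
        and pixel["proximity_to_water"] != "near_water"   # not near a water body
        and 0 <= pixel["slope"] <= 0.9                    # Slope is in {0-0.9}
    )


# Two hypothetical pixels that differ only in elevation.
pixels = [
    {"elevation": 350, "soil_type": "Rm", "proximity_to_water": "far", "slope": 0.4},
    {"elevation": 335, "soil_type": "Rm", "proximity_to_water": "far", "slope": 0.4},
]
print([rule_138_matches(p) for p in pixels])
```

Pixels for which the function returns True would receive the R/D rate stated in the THEN part of the rule.
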
SP2Learn Architecture and Functionality

Underlying SP2Learn Technology

SP2Learn Functionality Overview
[Figure: SP2Learn steps: Load Raster, Integration, Create Mask, Attribute Selection, Rules, Apply Rule]

SP2Learn Workflow

On-Line Help

Software and Test Data Download
- Download web page of Image Spatial Data Analysis group at NCSA: http://isda.ncsa.uiuc.edu/download/

Running SP2Learn

Input Data to SP2Learn
- Raster files (maps)
  - Predicted R/D rate models
  - Auxiliary variables
- For mask creation
  - Tables with geo-points
  - Vector files with boundaries
  - Raster files of categorical or continuous variables

Image Processing
- Filtering Methods
  - Low pass (moving average) filters
  - Morphological filters
  - TVL1 (Total Variation regularized L1 function)
  - Using multiple techniques multiple times
- Parameters
  - Kernel size (row dimension, column dimension)

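The morphological filters listed above can be sketched for a binary map as follows. This is an illustrative sketch only: it assumes a boolean raster (for example, recharge versus not recharge), a rectangular kernel given as (rows, cols) with odd sizes, and windows clipped at the border. The function names are hypothetical and the slides do not describe SP2Learn's own implementation.

```python
def _scan(mask, rows, cols, combine):
    """Apply `combine` (all or any) over each pixel's rows x cols neighborhood."""
    height, width = len(mask), len(mask[0])
    hr, hc = rows // 2, cols // 2
    return [
        [
            combine(
                mask[i][j]
                for i in range(max(0, r - hr), min(height, r + hr + 1))
                for j in range(max(0, c - hc), min(width, c + hc + 1))
            )
            for c in range(width)
        ]
        for r in range(height)
    ]


def erode(mask, rows=3, cols=3):
    """A pixel stays True only if its whole neighborhood is True."""
    return _scan(mask, rows, cols, all)


def dilate(mask, rows=3, cols=3):
    """A pixel becomes True if any pixel in its neighborhood is True."""
    return _scan(mask, rows, cols, any)


def opening(mask, rows=3, cols=3):
    """Erosion then dilation: removes specks smaller than the kernel."""
    return dilate(erode(mask, rows, cols), rows, cols)


def closing(mask, rows=3, cols=3):
    """Dilation then erosion: fills holes smaller than the kernel."""
    return erode(dilate(mask, rows, cols), rows, cols)


# Hypothetical 5x5 maps: one isolated pixel, and one single-pixel hole.
speck = [[False] * 5 for _ in range(5)]
speck[2][2] = True
hole = [[True] * 5 for _ in range(5)]
hole[2][2] = False

print(any(any(row) for row in opening(speck)))  # is any pixel left after opening?
print(all(all(row) for row in closing(hole)))   # is the map solid after closing?
```

A larger kernel size removes larger specks and fills larger holes, which is why the kernel size is the main parameter to tune.
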
Example Input Maps
[Figure: example input maps after a low pass filter, morphological closing and morphological opening, each shown with Kernel = (10,10) and Kernel = (5,5)]

Example Auxiliary Maps
- Slope
- DEM
- Soil
- River Stream

Loading Files
- Load R/D rate models (maps)
- Load auxiliary maps to explore alternative models
  - Proximity to water
  - Soil type
  - Slope
  - …

Mosaic Maps
- Large spatial coverage – a set of tiles
- Out-of-core representation

Viewing Images
- Right mouse click
  - Image information
  - Zoom
- Check boxes
  - Pseudo-color
  - Auto-fit images

Registration
- Integration of all maps (raster images) to a common projection and spatial resolution
[Figure: maps before “Convert” and after “Convert”]

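Bringing maps to a common spatial resolution can be sketched with nearest-neighbor resampling. This is a simplified illustration, not the "Convert" operation itself: it assumes the maps already share the same projection and geographic extent, and the function name and sample raster are hypothetical.

```python
def resample_nearest(grid, new_rows, new_cols):
    """Resample a raster to new_rows x new_cols by nearest-neighbor lookup.

    Assumes source and target cover the same extent; reprojection is not handled.
    """
    rows, cols = len(grid), len(grid[0])
    return [
        [grid[r * rows // new_rows][c * cols // new_cols] for c in range(new_cols)]
        for r in range(new_rows)
    ]


# Hypothetical 2x2 categorical map brought to a 4x4 grid.
coarse = [[1, 2],
          [3, 4]]
fine = resample_nearest(coarse, 4, 4)
for row in fine:
    print(row)
```

Nearest-neighbor lookup never invents new values, which makes it a reasonable choice for categorical maps such as soil type; continuous maps may be better served by an interpolating method.
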
Create Mask
[Figure: Create Mask window with three labeled regions (A, B, C): Mask Parameters, Visualization Panel and Mask Operations]

Mask Creation Options in SP2Learn

User Defined Mask Creation
- Set Parameter: User defined
- Mouse click-and-drag selection of region
- Click Paint and Show
- Click Apply

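The effect of a click-and-drag selection can be sketched as a rectangular boolean mask applied to a raster. The function names, the inclusive row and column bounds, and the use of `None` for pixels outside the region are assumptions for this example.

```python
def rectangle_mask(rows, cols, top, left, bottom, right):
    """Boolean mask that is True inside the inclusive rectangle, False elsewhere."""
    return [
        [top <= r <= bottom and left <= c <= right for c in range(cols)]
        for r in range(rows)
    ]


def apply_mask(grid, mask, fill=None):
    """Keep values inside the mask; replace all other values with `fill`."""
    return [
        [value if inside else fill for value, inside in zip(grid_row, mask_row)]
        for grid_row, mask_row in zip(grid, mask)
    ]


# Hypothetical 3x3 map; the selected region is the top-left 2x2 block.
grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
mask = rectangle_mask(3, 3, top=0, left=0, bottom=1, right=1)
print(apply_mask(grid, mask))
```
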
Label Editor
- Assign categorical labels to colors

Attribute Selection
- Output: Predicted Variable
- Input: Auxiliary Variables
- Check-boxes
  - Show Table
  - Prune Tree

Decision Tree Based Modeling
- Tree structure can be represented as a set of rules
[Figure: example decision tree with the tests "Soil Type is {sand}?" and "Distance from river ≤ 100 ft?"; yes/no branches lead to leaves labeled Discharge (Case A..), Recharge (Case E..) and Discharge (Case J..)]

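The statement that a tree can be represented as a set of rules can be sketched as a walk from the root to every leaf. The tree below is hypothetical: it reuses the two tests named on the slide, but the assignment of leaves to branches is an assumption, and the tuple encoding and function name are invented for this example.

```python
def tree_to_rules(node, conditions=()):
    """Walk a tree depth-first and emit one (conditions, conclusion) pair per leaf."""
    if isinstance(node, str):            # leaf: the class label
        return [(list(conditions), node)]
    test, yes_branch, no_branch = node   # internal node: (test, yes, no)
    return (
        tree_to_rules(yes_branch, conditions + (test,))
        + tree_to_rules(no_branch, conditions + ("NOT " + test,))
    )


# Hypothetical tree; the branch layout is assumed, not read from the figure.
tree = (
    "Soil Type is {sand}",
    ("Distance from river <= 100 ft", "Recharge", "Discharge"),
    "Discharge",
)
rules = tree_to_rules(tree)
for conds, label in rules:
    print("IF " + " AND ".join(conds) + " THEN " + label)
```

Each root-to-leaf path becomes one rule whose conditions are the tests along the path and whose conclusion is the leaf class.
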
Rules from Decision Tree
- Num: Node number in the decision tree.
- Support (%): Among all cases satisfying the conditions, the percentage of cases that have the rule's class (conclusion).
- # of cases: The number of cases satisfying the conditions.
- Class: Conclusion of the rule.
- Conditions: Conditions of the rule.
- MDL Score: MDL (minimum description length) score of the decision tree. The lower the score, the better the tree.

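The "# of cases" and "Support (%)" columns can be computed directly from the definitions on this slide. The sketch below uses a hypothetical case list and function name, and it follows the slide's definition of support (agreement among the cases that satisfy the conditions).

```python
def rule_stats(cases, condition, conclusion):
    """Return (# of cases, support %) for one rule.

    cases      -- list of (attributes, class_label) pairs
    condition  -- function(attributes) -> bool, the rule's IF part
    conclusion -- class label in the rule's THEN part
    """
    matching = [label for attrs, label in cases if condition(attrs)]
    if not matching:
        return 0, 0.0
    agreeing = sum(1 for label in matching if label == conclusion)
    return len(matching), 100.0 * agreeing / len(matching)


# Hypothetical cases: three sand pixels (two recharge, one discharge), one clay pixel.
cases = [
    ({"soil": "sand"}, "Recharge"),
    ({"soil": "sand"}, "Recharge"),
    ({"soil": "sand"}, "Discharge"),
    ({"soil": "clay"}, "Discharge"),
]
n, support = rule_stats(cases, lambda a: a["soil"] == "sand", "Recharge")
print(n, round(support, 2))
```
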
Show Decision Tree
- Show Tree Option

Export Rules
- XML format
- Export Rules Option

Apply Rules
- Visualization of
  - Modified output variable
  - Changed pixels
  - Magnitude of changes (differences)

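The three visualized quantities can be derived by comparing the map before and after the rules are applied. The following is a minimal sketch with hypothetical names and sample values; the tolerance parameter `tol` is an assumption added so that tiny numerical differences can be ignored.

```python
def compare_maps(original, modified, tol=0.0):
    """Return (changed mask, difference map, number of changed pixels)."""
    diff = [
        [m - o for o, m in zip(orig_row, mod_row)]
        for orig_row, mod_row in zip(original, modified)
    ]
    changed = [[abs(d) > tol for d in row] for row in diff]
    count = sum(sum(row) for row in changed)
    return changed, diff, count


# Hypothetical 2x2 R/D rate maps before and after applying rules.
original = [[0.5, -0.25], [0.0, 1.0]]
modified = [[0.5, -0.75], [0.0, 0.5]]
changed, diff, count = compare_maps(original, modified)
print(count)
print(diff)
```

Here `changed` marks the changed pixels, `diff` holds the signed magnitude of each change, and `modified` is the modified output variable itself.
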
Summary
- Novel Frameworks and Methodologies for Exploratory Data-Driven Modeling and Scientific Discoveries
- Problems addressed in the prototype SP2Learn solution:
  - Prediction accuracy improvement by a combination of mathematical models and data-driven (knowledge based) models, supervised and unsupervised iterative model optimization
  - Better Data Utilization!

Extra Information
- A stack of informatics and cyber-infrastructure software is open source
- Other software of potential interest:
  - GeoLearn is an exploratory framework for extracting information and knowledge from remote sensing imagery
  - CyberIntegrator supports creation of exploratory workflows, reuse of workflows, remote server execution, data and process provenance tracking and analysis, and streaming data
  - Image Provenance to Learn (IP2Learn) supports decision processes based on visual inspection of images
  - Load Estimation (work in progress) supports optimal sampling of sediment loads using several sediment-discharge rating curves, bias correction factors and Monte Carlo simulations to predict confidence limits
- Download web page of Image Spatial Data Analysis group at NCSA: http://isda.ncsa.uiuc.edu/download/

Acknowledgement
- Funding Agencies:
  - NASA, NARA, NSF, NIH, NAVY, DARPA, ONR, NCSA Industrial Partners, NCSA Internal, COM UIUC, State of Illinois
- Full Time Employees:
  - Peter Bajcsy, Rob Kooper, Sang-Chul Lee, Luigi Marini
- Students:
  - Shadi Ashnai, Melvin Casares, Miles Johnson, Chulyun Kim, Qi Li, Tim Nee, Arlex Torres, Ryo Kondo, Henrik Lomotan, James Rapp
- Collaborators:
  - College of Applied Health Sciences UIUC, Kinesiology Dept. UIUC, CEE UIUC, CS UIUC, GISLIS UIUC
  - UIC, UC Berkeley, Univ. of Texas at Austin, Univ. of Iowa
  - ISWS, NARA, Nielsen, State Farm
  - Instituto Tecnológico de Costa Rica, UNESCO-IHE Netherlands

Thank you!
- Questions:
  - Peter Bajcsy [email protected]
- Need More Details
  - Publications: http://isda.ncsa.uiuc.edu

Backup