A FRAMEWORK FOR GEOSPATIAL MODELING FROM SPARSE FIELD MEASUREMENTS USING IMAGE PROCESSING AND MACHINE LEARNING
Peter Bajcsy (1), Chulyun Kim (1), Jihua Wang (2) and Yu-Feng Lin (2)
(1) National Center for Supercomputing Applications (NCSA)
(2) Illinois State Water Survey (ISWS)
University of Illinois at Urbana-Champaign (UIUC)

Outline
  Introduction
  Problems Addressed by Spatial Pattern To Learn (SP2Learn)
  SP2Learn Architecture and Functionality Overview
  Running SP2Learn
  Summary

Introduction

General Problem
Compute a set of geo-spatially dense, accurate predictions of variables, given a set of direct, geo-spatially sparse point measurements and auxiliary variables that have implicit relationships with the predicted variable.
Motivation:
  minimize the cost of taking direct point measurements
  maximize the accuracy of predictions
  automate the discovery of relationships among direct field measurements and indirect variables

Formulation
Input: sets of geo-spatially sparse variables {V_i{p_ij}} & dense auxiliary variables & a priori tacit knowledge of experts
Output: geo-spatially dense (raster) {O_k}
Unknown: selection of methods & workflow of operations/methods & parameters of methods & relationships of auxiliary variables w.r.t. O_k & quantitative metric of output goodness
[Figure labels: p_1j, p_2j; V_1 & V_2; Interpolations; Mathematical models; Auxiliary Variables & Tacit Knowledge; O_1]

Applied Problem
Recharge and Discharge (R/D) Rate Prediction
[Figure labels: Bedrock elevation; Water table elevation; Discharged; Recharged]

Interdisciplinary Objectives
Ground Water (Hydrologic Science) View:
  Evaluation of Alternative Conceptual (implicit relationships) and Mathematical Models (explicit relationships)
  Accurate Prediction of Groundwater Recharge and Discharge Rates from Limited Number of Field Measurements
DIALOG
Computer Science View:
  Computer-Assisted Learning to Assess Alternative Conceptual and Mathematical Models
  Optimization of Prediction Models From a Set of Geo-Spatially Sparse Point Measurements

State-of-the-Art Results: Limited Spatial Resolution and Accuracy
[Figure labels: Min. grid: 805 m x 805 m; Uniform grid: 80 m x 80 m; Recharge zone; Discharge zone; Noisy pattern or weak R/D; Discharge; Recharge]

Existing Software for Groundwater and Surface Water Modeling
MODFLOW: a three-dimensional finite-difference ground-water model. http://water.usgs.gov/nrp/gwsoftware/modflow2005/modflow2005.html - freeware (2005)
PEST: software for model calibration, parameter estimation and predictive uncertainty analysis. http://www.sspa.com/pest/ - freeware (2007); University of Queensland, Australia
Precipitation-Runoff Modeling System (PRMS): a deterministic, distributed-parameter modeling system developed to evaluate the impacts of various combinations of precipitation, climate, and land use on streamflow, sediment yields, and general basin hydrology. http://water.usgs.gov/software/prms.html - freeware (1996); USGS
Deep Percolation Model (DPM): facilitates estimation of ground-water recharge under a large range of climatic, landscape, and land-use and land-cover conditions. http://pubs.usgs.gov/sir/2006/5318/; USGS

Related Work
Singh A. et al. “Expert-Driven ‘Perceptive’ Models for Reducing User Fatigue in an Interactive Hydrologic Model Calibration Framework”
[Figure caption: Conductivity (K) and Hydraulic heads (H) for the hypothetical aquifer]

Motivation
Ground Water (Hydrologic) Science: Currently, no single method can estimate R/D rates and patterns for all practical applications. Therefore, cross-analyzing results from various estimation methods and related field information is likely to be superior to using only a single estimation method.
Computer Science: It is currently impossible (a) to replace an expert who holds a great deal of tacit domain knowledge with computer algorithms, or (b) for an expert to learn new I/O relationships from a plethora of possible variables and an extremely large space of processing methods and their parameters. Thus, assisting experts in discovering, evaluating and validating new relationships iteratively will likely enable (a) better understanding of the underlying phenomena, and (b) more automated and cost-efficient predictions.

Problems Addressed by Spatial Pattern To Learn

Our Approach
Data-Driven Analyses to Test Alternative Models, and to Search the Space of Processing Operations and Their Parameters
Computer-Assisted Comparisons and Evaluations of Multiple Models and Sub-Optimal Solutions
  Interpolation methods
  Mathematical models
  Image processing algorithms
  Machine learning algorithms
  Scalability of algorithms with large size data
  Model/Solution Representation
  Closed Loop (Iterative) Workflows
  Human Computer Interfaces
Overall Approach: An Exploration Framework for a Class of Alternative Models/Hypotheses and Optimal Solutions

SP2Learn Problem Formulation
Given a set of geo-spatially sparse field measurements and auxiliary variables, derive an accurate, spatially dense R/D rate map by
  (a) using a physics-based model
  (b) incorporating boundary conditions
  (c) exploring auxiliary variables that represent prior knowledge about R/D patterns but are missing from the physics-based model

Challenges
(1) How to Recognize a ‘Meaningful’ Pattern in the Predicted Map?
(2) How to Quantify the Goodness of the Pattern?
Approach:
  (1a) Recognize patterns by utilizing multiple image enhancement and segmentation techniques applied to R/D rate predictions
  (1b) Introduce relationship between R/D pattern and auxiliary (a priori reference) information
  (2a) Define goodness w.r.t. reference information using expert’s selection of ‘meaningful’ relationships
  (2b) Define goodness w.r.t. reference information using complexity of machine learning

Using Physics-Based Model
R/D Rate Prediction
Field measurements: water table elevation + bedrock elevation + hydraulic conductivity
$Q = K \cdot A \cdot \frac{dh}{dL}$
Ground water flux = hydraulic conductivity * cell area * gradient of water table elevation (head) over cell distance
[Figure labels: Incoming water; Outgoing water; Discharged; Recharged]

Incorporating Spatial Boundary Conditions
BC: R/D rate prediction could have smooth transitions, and recharge & discharge regions (contiguous pixels) should be clearly delineated
Approach: Apply Image Restoration and De-noising Techniques
  Moving average based low pass filter
  TVL (Total Variation regularized L1-norm function) based filter
  Morphological operation based filter
  Using multiple techniques multiple times
[Figure labels: Discharged; Recharged]

Exploring Auxiliary Variables Driving R/D Patterns
Prior Tacit Knowledge about R/D and Auxiliary Variables
  Soil Type: P(R or D area | Soil = Clay) ~ low
  Slope: P(R or D area | Slope = high) ~ low
  Proximity to River: P(R or D area | River is close) ~ high
[Figure labels: moving average; normalization+TVL]

From Auxiliary Variables To Knowledge and Accurate R/D
[Figure labels: Load R/D Map; Load Variables; Integrate Maps; Apply Rules; Create Decision Tree; Define ROI]

SP2Learn Output
A set of rules that define relationships between the predicted (R/D rate) variable and auxiliary variables
Modified (more accurate) predictions according to the user-selected rules defining relationships of predicted and auxiliary variables
Sensitivity analysis results with respect to
  Methods (interpolations, image enhancement, …)
  Models
  Parameters

ROI Example Results
<RULE ID=138 NUM_OF_CASES=3975 SUPPORT=32.65%>
<IF>Elevation is not in {330-344} AND Soil type is in {Rm=Roscommon muck} AND Proximity to water body is not {near_water} AND Slope is in {0-0.9} </IF>
<THEN>R/D rate is -0.004,-0.002</THEN>

SP2Learn Architecture and Functionality

Underlying SP2Learn Technology

SP2Learn Functionality Overview
  Load Raster Step
  Rules Step
  Attribute Selection Step
  Integration Step
  Create Mask Step
  Apply Rule Step

SP2Learn Workflow

On-Line Help

Software and Test Data Download
Download web page of the Image Spatial Data Analysis group at NCSA: http://isda.ncsa.uiuc.edu/download/

Running SP2Learn

Input Data to SP2Learn
Raster files (maps)
  Predicted R/D rate models
  Auxiliary variables
For mask creation
  Tables with geo-points
  Vector files with boundaries
  Raster files of categorical or continuous variables

Image Processing
Filtering Methods
  Low pass (moving average) filters
  Morphological filters
  TVL1 (Total Variation regularized L1 function)
  Using multiple techniques multiple times
Parameters
  Kernel size (row dimension, column dimension)

Example
[Figure panels: Input Maps; Low Pass Filter, Kernel = (10,10) and Kernel = (5,5); Morphological Closing, Kernel = (10,10) and Kernel = (5,5); Morphological Opening, Kernel = (10,10) and Kernel = (5,5)]

Example Auxiliary Maps
[Figure panels: Slope; DEM; Soil; River; Stream]

Loading Files
Load R/D rate models (maps)
Load auxiliary maps to explore alternative models
  Proximity to water
  Soil type
  Slope
  …

Mosaic Maps
Large spatial coverage – a set of tiles
Out-of-core representation

Viewing Images
Right mouse click
  Image information
  Zoom
Check boxes
  Pseudo-color
  Auto-fit images

Registration
Integration of all maps (raster images) to a common projection and spatial resolution
[Figure panels: Before “Convert”; After “Convert”]

Create Mask
[Figure panels labeled A, B, C: Mask Parameters; Visualization Panel; Mask Operations]

Mask Creation Options in SP2Learn

User Defined Mask Creation
Set Parameter: User defined
Mouse click-and-drag selection of region
Click Paint and Show
Click Apply

Label Editor
Assign categorical labels to colors

Attribute Selection
Output: Predicted Variable
Input: Auxiliary Variables
Check-boxes
Show Table
Prune Tree

Decision Tree Based Modeling
Tree structure can be represented as a set of rules
[Figure: decision tree with test nodes “Soil Type is {sand}?” and “Distance from river ≤ 100 ft?”, yes/no branches, and leaves “Discharge (Case A..)”, “Recharge (Case E..)”, “Discharge (Case J..)”]

Rules from Decision Tree
Num: Node number in a decision tree
Support (%): Among all cases satisfying the conditions, the percentage of cases having the same class (conclusion)
# of cases: The number of cases satisfying the conditions
Class: Conclusion of a rule
Conditions: Conditions of a rule
MDL Score: MDL score of a decision tree. The lower the score, the better the tree

Show Decision Tree
Show Tree Option

Export Rules
XML format
Export Rules Option

Apply Rules
Visualization of
  Modified output variable
  Changed pixels
  Magnitude of changes (differences)

Summary
Novel Frameworks and Methodologies for Exploratory Data-Driven Modeling and Scientific Discoveries
Problems addressed in the prototype SP2Learn solution:
  Prediction accuracy improvement by a combination of mathematical models and data-driven (knowledge based) models, supervised and unsupervised iterative model optimization
Better Data Utilization!
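The rule statistics defined on the "Rules from Decision Tree" slide (number of cases and support) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SP2Learn's implementation: the function name `rule_statistics`, the field names, the toy cases, and the example rule are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Case = Dict[str, object]


def rule_statistics(
    cases: List[Case],
    condition: Callable[[Case], bool],
    predicted_class: str,
    class_key: str = "rd_class",  # hypothetical field holding the class label
) -> Tuple[int, float]:
    """Return (# of cases, support %) for one rule.

    # of cases: how many cases satisfy the rule's conditions.
    Support:    among those cases, the percentage whose class equals
                the rule's conclusion.
    """
    matching = [case for case in cases if condition(case)]
    num_cases = len(matching)
    if num_cases == 0:
        return 0, 0.0
    agreeing = sum(1 for case in matching if case[class_key] == predicted_class)
    return num_cases, 100.0 * agreeing / num_cases


# Hypothetical toy data; values are made up for illustration only.
cases = [
    {"soil": "sand", "river_distance_ft": 50, "rd_class": "recharge"},
    {"soil": "sand", "river_distance_ft": 80, "rd_class": "recharge"},
    {"soil": "sand", "river_distance_ft": 90, "rd_class": "discharge"},
    {"soil": "sand", "river_distance_ft": 400, "rd_class": "discharge"},
    {"soil": "clay", "river_distance_ft": 30, "rd_class": "discharge"},
]


def sand_near_river(case: Case) -> bool:
    """Hypothetical rule conditions: sandy soil within 100 ft of a river."""
    return case["soil"] == "sand" and case["river_distance_ft"] <= 100


num_cases, support = rule_statistics(cases, sand_near_river, "recharge")
print(f"# of cases = {num_cases}, support = {support:.2f}%")
```

A rule with many cases but low support covers a large area while agreeing poorly with the data, which is one reason both numbers are reported side by side for each rule.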
Extra Information
A stack of informatics and cyber-infrastructure software is open source
Other software of potential interest:
  GeoLearn: an exploratory framework for extracting information and knowledge from remote sensing imagery
  CyberIntegrator: supports creation of exploratory workflows, reuse of workflows, remote server execution, data and process provenance tracking and analysis, and streaming data
  Image Provenance to Learn (IP2Learn): supports decision processes based on visual inspection of images
  Load Estimation (work in progress): supports optimal sampling of sediment loads using several sediment-discharge rating curves, bias correction factors and Monte Carlo simulations to predict confidence limits
Download web page of the Image Spatial Data Analysis group at NCSA: http://isda.ncsa.uiuc.edu/download/

Acknowledgement
Funding Agencies: NASA, NARA, NSF, NIH, NAVY, DARPA, ONR, NCSA Industrial Partners, NCSA Internal, COM UIUC, State of Illinois
Full Time Employees: Peter Bajcsy, Rob Kooper, Sang-Chul Lee, Luigi Marini
Students: Shadi Ashnai, Melvin Casares, Miles Johnson, Chulyun Kim, Qi Li, Tim Nee, Arlex Torres, Ryo Kondo, Henrik Lomotan, James Rapp
Collaborators:
  College of Applied Health Sciences UIUC, Kinesiology Dept. UIUC, CEE UIUC, CS UIUC, GISLIS UIUC
  UIC, UC Berkeley, Univ. of Texas at Austin, Univ. of Iowa
  ISWS, NARA, Nielsen, State Farm
  Instituto Tecnológico de Costa Rica, UNESCO-IHE Netherlands

Thank you!
Questions: Peter Bajcsy [email protected]
Need more details? Publications: http://isda.ncsa.uiuc.edu

Backup
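As supplementary material, the flux relation from the "Using Physics-Based Model" slide (ground water flux = hydraulic conductivity * cell area * head gradient over cell distance) can be worked through numerically. The sketch below is only an illustration: the function name, the argument names, and the numeric values are assumptions, and the units are left to the caller to keep consistent.

```python
def darcy_flux(
    hydraulic_conductivity: float,
    cell_area: float,
    head_difference: float,
    cell_distance: float,
) -> float:
    """Ground water flux Q = K * A * (dh / dL), as stated on the slide.

    hydraulic_conductivity: K
    cell_area:              A
    head_difference:        dh, change in water table elevation (head)
    cell_distance:          dL, distance over which dh is measured
    """
    if cell_distance <= 0:
        raise ValueError("cell_distance must be positive")
    gradient = head_difference / cell_distance
    return hydraulic_conductivity * cell_area * gradient


# Hypothetical values chosen only to show the arithmetic.
flux = darcy_flux(
    hydraulic_conductivity=2.0,
    cell_area=100.0,
    head_difference=0.5,
    cell_distance=80.0,
)
print(f"Q = {flux:.4f}")
```

Evaluating this relation for the water entering and leaving each grid cell is what ties the sparse field measurements (water table elevation, bedrock elevation, hydraulic conductivity) to a per-cell R/D rate; the sign convention for recharge versus discharge is not given in the slides and would have to be fixed by the modeler.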