Vision for the 21st Century Information Environment in Ecology (Ecoinformatics)
Deana Pennington, University of New Mexico, LTER Network Office
Shawn Bowers, UCSD, San Diego Supercomputer Center

Data Types
• Field data: small, complex formats, heterogeneous. Ecological Metadata Language (EML)
• Imagery: massive, simple formats, continuous spatial. Spatial Data Workbench (small NPACI project)
• Ground sensors: massive, simple formats, continuous temporal. Wireless Sensor Workshop
• GIS (if georeferenced): moderately large, complex formats
• SEEK: large ITR project
• NEON Observatories: question-driven data collection

Analytical Domains
• Information Acquisition, Archival & Retrieval
• Data Preprocessing & Product Creation
• Integrated Data Analysis & Synthesis
• Inference From Pattern
Information Technologies: hardware and networks; semantic mediation; electronic notebooks; data mining; processing pipelines; remote sensing; exploratory spatial data analysis; high-throughput processing; wireless sensors; metadata; pattern matching; expert systems; databases and query; visualization; web design; grid technologies.
Projects: EML, Spatial Data Workbench, Wireless Sensors, SEEK Workflows.
Computational Models: genetic algorithms, cellular automata, adaptive agents, et al.

Characteristics of Ecological Data
[Figure: data volume per dataset (low to high) plotted against complexity/metadata requirements (low to high) for satellite images, wireless sensors, GIS, weather stations, business data, primary productivity, gene sequences, biodiversity surveys, population data, and soil cores, with SEEK's position marked. Modified from B. Michener.]

Field Data: Semantics
The same kind of observation recorded in different layouts:

Species codes, plot area, and raw counts
Date        Site  Species  Area  Count
10/1/1993   N654  PIRU     2     26
10/3/1994   N654  PIRU     2     29
10/1/1993   N654  BEPA     1     3

One density column per species
Date        Site  picrub  betpap
31Oct1993   1     13.5    1.6
14Nov1994   1     8.4     1.8

Integrated (full species names, density)
Date        Site  Species            Density
10/1/1993   N654  Picea rubens       13
10/3/1994   N654  Picea rubens       14.5
10/1/1993   N654  Betula papyrifera  3
10/31/1993  1     Picea rubens       13.5
10/31/1993  1     Betula papyrifera  1.6
11/14/1994  1     Picea rubens       8.4
11/14/1994  1     Betula papyrifera  1.8

Modified from B. Michener, 2003

Remotely Sensed & Ground Data
Remotely sensed:
• Satellite: Landsat since 1972 (multispectral), Ikonos (hyperspatial), Hyperion (hyperspectral)
• Airborne: air photos (historical reconnaissance), radar, thermal, ADAR (multispectral), AVIRIS (hyperspectral)
Ground data:
• Field data
• Automated sensors
• Wireless sensors
Remotely sensed images capture information over continuous space, which can then be compared through time to derive events. Wireless sensors capture information over continuous time, which can then be compared through space to derive spatial patterns.
[Figure: a target imaged at t=1 and t=2 reveals an event; the same event A appears in the time series recorded at several sensor locations.]

History Repeats Itself…
“…use of remotely sensed data…lagged for many years. The reasons for this have little to do with the sophistication of remote sensing technology. Rather it has to do more with the ability to store, manage, access and use the massive data produced by satellites, radar facilities and other remote sensing instruments. Without advanced information processing, it would take decades to compile and analyze the incredible amounts of information that [is] produced by many of these instruments.”
-Dr. Rita Colwell, Director NSF, 1998

Environmental Cyberinfrastructure Needs for Distributed Sensor Networks: a Report from a NSF Sponsored Workshop (2003)
Topics: sensors; deployed sensor networks; metadata; security and error resiliency; cyberinfrastructure for sensor networks; analysis and visualization; education; outreach; collaboration and partnering.

Incorporating IT Analytical Advances into Ecology
[Figure: the four analytical domains (Information Acquisition, Archival & Retrieval; Data Preprocessing & Product Creation; Integrated Data Analysis & Synthesis; Inference From Pattern) connected by Grid Technologies and by Knowledge Representation, Semantics and Ontologies.]

The Semantic Web
Extend the current web with “knowledge” and “meaning” for:
• Better searching (that is, better answers to current searches)
• Automated software tools that process web information (comparison shopping, making appointments, and so on)
Proposes a new form of web content, which uses ontologies and knowledge representation techniques.

The Semantic Web [Sci. Am., May ‘01, Berners-Lee]
“Mom needs to see a specialist for a series of physical therapy sessions – can you take her?”
A Semantic-Web agent finds a physical therapist for Mom using my schedule:
• get openings
• get physician prescription
• get possible providers and availability
• get locations
• return a provider available within 10 miles of the location

Semantic Web Architecture (RDF)
The Resource Description Framework (RDF) is a language to:
• Define standard ontologies
• Annotate web pages with Semantic-Web content
Ultimately, tools … to exploit the semantic markup: web crawlers, search engines, personal agents.

RDF / RDF Schema
Ontology classes: Insurance Provider, Physician, Physical Therapist, Medical Facility, Location.
Relationships: Insurance Provider covers Physician; Physician worksAt Medical Facility; Medical Facility locatedAt Location.
An RDF Schema (or OWL) ontology:
• Serves as a common set of terms (a vocabulary) with relationships and constraints
• Can be published as Web content using RDF (for others to use)
With RDF, a web page can be annotated using the ontology: BlueCross covers Dr. Hartman (a Physical Therapist); Dr. Hartman worksAt University Hospital; University Hospital locatedAt 555 Univ. Drive …
A query such as “Which Physical Therapists workAt a Facility within Location X?” can then be answered, because the annotations provide access to the meaningful, or semantic, content of the web page.

SEEK and the Semantic Web
We want to build technology using Semantic-Web standards to explore the use of semantics to help scientists deal with heterogeneity:
• Define standard ecological ontologies
• Automate dataset and analytic-step discovery, exchange, and integration
• Help researchers construct and reuse scientific workflows, for example, for ecological modeling

SEEK EcoGrid
1. Question of interest
2. Query EcoGrid for workflows (ontologies)
3. Query EcoGrid for data (ontologies & semantic mediation)
4. SRB optimizes and runs analysis
5. Get results…archive to EcoGrid
Working Groups: 1. EcoGrid; 2. Semantic mediation & KR; 3. Analysis & Modeling; 4. Taxon; 5. BEAM; 6. EOT
[Figure: EcoGrid nodes linked by a 60 Gigabits/second pipeline; resources (data & computational) managed by the Storage Resource Broker (SRB); EcoGrid analytical services and data services (including analytical libraries).]
Node Registry:
• Web service: XML standards, SOAP/WSDL protocols
• Data: REQUIRES standard metadata (EML and others)
• Workflows: standard workflow metadata?
Matt Jones, 2003

SEEK Components: Overview of Architecture
[Figure: SEEK architecture. AM, the Analysis and Modeling System: analytical pipelines (AP) built from analysis steps (example "AP 0", invasive species over time, chaining steps ASx, ASy, ASz, ASr over data sets TS1 and TS2), a library of analysis steps, pipelines and results, and an execution environment (SAS, MATLAB, etc.). SM, the Semantic Mediation System: logic rules, a semantic mediation engine, parameter ontologies and taxonomies, and ECO2 / ECO2-CL for parameters with semantics and data binding. EG, the EcoGrid: WSDL/UDDI interfaces, query processing, and raw data sets (SRB, KNB, species collections, TaxOn, etc.) wrapped for integration with EML, Darwin Core, etc.]

Benefits to Users
Scientists:
• Access to high-end computing technologies
• Better integration of all relevant data
• Workflow standardization and analysis
• Time and resource efficiency
• Reusable analytical steps & workflows
Students:
• Improved access to the knowledge base
Environmental managers:
• Accessibility to the current scientific approach
Policy makers:
• Timely input to decision making
Additional benefits:
• Formal documentation of methods (output in report format)
• Reproducibility of methods
• Visual creation and communication of methods
• Versioning
• Automated data typing and transformation

SEEK: ENM Workflows
[Figure: ecological niche modeling (ENM) workflow. Species presence & absence points are retrieved by EcoGrid query and split into training and test samples; environmental layers are retrieved, physically transformed, and integrated; GARP rule sets are calculated and validated, yielding model quality parameters; a native-range prediction map is generated; user-selected prediction maps are scaled; metadata are generated and results archived to EcoGrid.]

Analytical Pipelines
Sloan Digital Sky Survey: Mapping the Universe
“The raw data…are fed through data analysis software pipelines…to extract about 400 attributes for each celestial object…These pipelines embody much of mankind’s knowledge of astronomy.” Szalay et al., 2001

Species Distribution Pipeline
[Figure: the ENM workflow recast as a species distribution pipeline fed by other pipelines: an acoustic signal processing pipeline (species presence & absence points), an image processing pipeline (remotely sensed data such as land cover class), and an interpolation pipeline (ground sensor data such as climate).]

Analytical Pipelines: SDW
[Figure: Spatial Data Workbench pipeline. Remotely sensed imagery in HPSS @ SDSC, managed by SRB/MCAT; steps shown include radiometric corrections, georegistration, band indices, data transformation, band selection, supervised and unsupervised classification, segmentation, land cover (patch) metrics, and climate/land cover integration; inputs include site field observations (ground truth) and climate; outputs include maps, integrated graphics, and exploratory analysis of vegetation patterns, vegetation dynamics, and model parameterization.]

Biomedical Informatics Research Network
[Figure: brain image pipeline. Grey value images are registered to a brain atlas template; distance transforms and prototypes feed statistical classification, producing segmented images. Kikinis et al., 2001; T. Kapur et al., 1998; Tina Kapur, 1999; Surgical Planning Laboratory, 2001.]

Society for Industrial and Applied Mathematics (SIAM) Conference on Imaging Science, 2004
Conference themes:
• Image acquisition
• Image reconstruction and restoration
• Image storage, compression, and retrieval
• Image coding and transmission
• PDEs in image filtering and processing
• Image registration and warping
• Image modeling and analysis
• Statistical aspects of imaging
• Wavelets and multiscale analysis
• Multidimensional imaging sciences
• Inverse problems in imaging sciences
• Mathematics of visualization
• Biomedical imaging
• Applications
“By their very nature, these challenges cut across the disciplines of physics, engineering, mathematics, biology, medicine, and statistics.”
Why not ecology and environmental science?
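Returning to the Field Data: Semantics slide: the integrated table can be derived mechanically once the meaning of each column is known. The following is a minimal Python sketch under stated assumptions. Density is taken to be count / area for the coded layout, which reproduces 13, 14.5, and 3. The code-to-name dictionary `SPECIES_CODES` and the helper functions are hypothetical. SEEK's goal is to derive such mappings from ontologies rather than hard-coding them.

```python
from datetime import datetime

# Hypothetical lookup from the codes used in each layout to full species names.
SPECIES_CODES = {
    "PIRU": "Picea rubens", "picrub": "Picea rubens",
    "BEPA": "Betula papyrifera", "betpap": "Betula papyrifera",
}

# Layout with species codes, plot area, and raw counts (values from the slide).
coded_rows = [
    ("10/1/1993", "N654", "PIRU", 2, 26),
    ("10/3/1994", "N654", "PIRU", 2, 29),
    ("10/1/1993", "N654", "BEPA", 1, 3),
]

# Wide layout: one row per date, one density column per species (values from the slide).
wide_rows = [
    {"Date": "31Oct1993", "Site": "1", "picrub": 13.5, "betpap": 1.6},
    {"Date": "14Nov1994", "Site": "1", "picrub": 8.4, "betpap": 1.8},
]


def from_coded(rows):
    """Map (date, site, code, area, count) to (date, site, name, density = count / area)."""
    return [(date, site, SPECIES_CODES[code], count / area)
            for date, site, code, area, count in rows]


def from_wide(rows):
    """Melt each wide row into one (date, site, name, density) record per species column."""
    records = []
    for row in rows:
        # Rewrite dates such as 31Oct1993 as M/D/YYYY to match the other layout.
        when = datetime.strptime(row["Date"], "%d%b%Y")
        date = f"{when.month}/{when.day}/{when.year}"
        for column, density in row.items():
            if column in ("Date", "Site"):
                continue
            records.append((date, row["Site"], SPECIES_CODES[column], density))
    return records


integrated = from_coded(coded_rows) + from_wide(wide_rows)
for record in integrated:
    print(record)
```

The three pieces of "semantics" that had to be supplied by hand are the units (count vs. density), the species vocabulary, and the date format. These are exactly the kinds of knowledge the talk proposes to capture in metadata and ontologies.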
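The RDF / RDF Schema slides describe annotating a web page with ontology terms so that an agent can answer "Which Physical Therapists workAt a Facility within Location X?". The sketch below mimics that with plain subject-predicate-object tuples. It is not RDF syntax and not an RDF library; in practice the query would run against an RDF store, for example in SPARQL. Several details are assumptions:
• The `type` predicate and the identifiers `DrHartman` and `UniversityHospital` are hypothetical.
• "Within Location X" is simplified to an exact location match.

```python
# Instance annotations as (subject, predicate, object) triples, echoing the slide.
# Assumption: Dr. Hartman is typed as a PhysicalTherapist via a "type" predicate.
TRIPLES = {
    ("DrHartman", "type", "PhysicalTherapist"),
    ("BlueCross", "covers", "DrHartman"),
    ("DrHartman", "worksAt", "UniversityHospital"),
    ("UniversityHospital", "type", "MedicalFacility"),
    ("UniversityHospital", "locatedAt", "555 Univ. Drive"),
}


def subjects(triples, predicate, obj):
    """All subjects s such that (s, predicate, obj) is asserted."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def therapists_at(triples, location, insurer=None):
    """Physical therapists working at a facility located at `location`,
    optionally restricted to those covered by `insurer`."""
    facilities = subjects(triples, "locatedAt", location)
    found = set()
    for person in subjects(triples, "type", "PhysicalTherapist"):
        if not objects(triples, person, "worksAt") & facilities:
            continue  # works nowhere at this location
        if insurer is not None and person not in objects(triples, insurer, "covers"):
            continue  # not covered by the requested insurer
        found.add(person)
    return found


print(therapists_at(TRIPLES, "555 Univ. Drive", insurer="BlueCross"))  # {'DrHartman'}
```

The query only works because every page uses the same predicate names (`covers`, `worksAt`, `locatedAt`). That shared vocabulary is the role the slides assign to a published ontology.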
Ontologies
[Figure: Astrophysics, Ecology (landscape ecology, land managers, soil science, etc.), Biomedical, Digital Film, and many other domain ontologies, all connected to generic image/signal ontologies.]

Landscape Ecology Example (modified from Camara et al., 2001)
Generic image ontologies:
• Structural ontologies
• Method ontologies: pixel calculation, classification, segmentation, atmospheric correction
• Physical ontologies: TM, EMR, 7 bands, HDF, place/date, calibrations
Domain ontologies: patch metrics, land cover class, patch ID

So far…
• Grid Technology: EcoGrid vs. the semantic web
• Analytical pipelines/workflows
  - Sensors: generic vs. domain specific
  - Reuse of actors/workflows
  - Workflow metadata and reporting
• Ontologies/semantic mediation
  - Query EcoGrid for workflows
  - Query EcoGrid for data to fit the selected workflow(s)
  - Integration of heterogeneous data types

Exploratory Data Analysis
• Data mining: finding interesting patterns
• Visualization: showing interesting patterns

NDVI at Sevilleta
[Figure: NDVI record for 1989-2002 from TM, AVHRR, and MODIS imagery.]
• AVHRR: 1 x 1 km pixels; 14 years * 26 images/year * 1,824 pixels = 663,936 data points
• TM: 30 x 30 m pixels; 14 years * 2 images/year * 65,260 pixels = 1,827,280 data points
  - if 20 images/year: 18,272,800 data points
  - if 30 years at 20 images/year: 39,156,000 data points

Spatiotemporal Analysis & Vis: Drought Effects
[Figure: maps for 1999, 2000, 2001, and 2002 in two-week periods (July 16-29, July 30-Aug 12, Aug 13-26, Aug 27-Sep 9, Sep 10-23); legend: sum, 0 to 6.]

Spatiotemporal Analysis & Vis: 1989-2002 Drought Effects
Number of cells with significantly low productivity compared with the historic range (5%), April 23 - October 8.
[Figure: counts by year (1989-2002) and period (S = Spring, F = Summer/Fall) for North and South groups; y-axis "Sum of count", 0 to 160.]

Linking and Brushing
Visualization: investigating cancer incidence and risk factors. From GeoVista Studio, Penn State University.

Hyperspectral Imagery
• AVIRIS hyperspectral data cube: 224 bands
• > 50 gigabytes of raw data per acquisition

Hyperspectral Example
[Figure: true-color and false-color views of a 300 x 300 pixel scene (6 km across) showing pavement, riparian, clouds, agriculture, river, and arid upland.]
300 pixels * 300 pixels * 224 bands = 20,160,000 data points
[Figure: training and testing sample locations for the limited and full sets, with label errors marked. Land cover classes: clouds, river, riparian, arid upland, semi-arid upland, pavement, agriculture, barren.]
Limited set: 192 training pixels, 7 of them mislabeled, out of 90,000 total pixels
• low percentage of training pixels (about 0.2%)
• errors in the training set

Supervised Classifiers
[Figure: two-band feature space (Band 1 vs. Band 2) with Class 1 and Class 2, showing a pixel to be classified relative to the class means (Euclidean distance), probability contours, and a support vector machine hyperplane.]

Classification results
Classifier                      Limited sample set   Full sample set
Maximum Likelihood (ML)         89.4%                96.4%
Naïve Bayesian Network (NBN)    83.3%                90.9%
Support Vector Machine (SVM)    77.2%                72.9%
Minimum Distance (MD)           69.4%                88.4%
Classes mapped: clouds, river, riparian, agriculture, arid upland, barren, pavement, plus semi-arid upland in the full sample set.

Data Mining Challenges
Biomedical data:
• Large sample sets
• Few correlates (dozens)
• Hard classes
Ecologic data:
• Paucity of accurate reference data
• Spatial autocorrelation
• Large number of potential correlates
• Fuzzy classes
• Uncertainty

Basic Research Need
• Spatiotemporal analysis & visualization techniques that explicitly deal with these challenges
• An EcoGrid archive of ground truth data and the ontologies that will allow us to semantically mediate the classes

Where do we start?
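The data-point counts on the NDVI at Sevilleta and Hyperspectral Example slides are simple products of their factors. This short Python check uses only the factors quoted on those slides. The helper name `data_points` is hypothetical.

```python
from math import prod


def data_points(*factors):
    """Data-point count as the product of its factors (years, images/year, pixels, bands)."""
    return prod(factors)


# Factors quoted on the NDVI at Sevilleta and Hyperspectral Example slides.
examples = {
    "AVHRR, 14 yr x 26 images/yr x 1,824 px": data_points(14, 26, 1824),
    "TM, 14 yr x 2 images/yr x 65,260 px": data_points(14, 2, 65260),
    "TM, 14 yr x 20 images/yr x 65,260 px": data_points(14, 20, 65260),
    "TM, 30 yr x 20 images/yr x 65,260 px": data_points(30, 20, 65260),
    "AVIRIS, 300 x 300 px x 224 bands": data_points(300, 300, 224),
}
for label, count in examples.items():
    print(f"{label}: {count:,}")
```

A single 300 x 300 pixel hyperspectral scene (20,160,000 values) already outnumbers ten years of twice-yearly TM coverage of Sevilleta.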
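The drought-effects slides count cells with "significantly low productivity compared with historic range (5%)". The slides do not say how that test was computed. The sketch below is one possible reading, not the authors' method. It flags a cell when its current NDVI falls below the nearest-rank 5th percentile of that cell's own historic values, then counts flagged cells per group (North/South). The NDVI numbers and cell IDs are invented for illustration.

```python
import math
from collections import Counter


def nearest_rank_percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def count_low_cells(cells, p=5):
    """Count, per group, cells whose current NDVI is below the cell's own historic
    p-th percentile. `cells` maps cell id -> (group, historic values, current value)."""
    counts = Counter()
    for group, history, current in cells.values():
        if current < nearest_rank_percentile(history, p):
            counts[group] += 1
    return counts


# Invented NDVI values for three cells (not Sevilleta data).
cells = {
    "c1": ("North", [0.30, 0.32, 0.35, 0.31, 0.33], 0.25),  # below historic range
    "c2": ("North", [0.30, 0.32, 0.35, 0.31, 0.33], 0.34),  # within historic range
    "c3": ("South", [0.20, 0.22, 0.21, 0.24, 0.23], 0.18),  # below historic range
}
print(count_low_cells(cells))
```

With a 14-year record (1989-2002), ceil(0.05 x 14) = 1, so under this definition the 5th percentile is simply the historic minimum. Short records make "significantly low" hard to define, which is one reason the spatial and temporal context shown in the maps matters.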
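The Minimum Distance (MD) classifier on the Supervised Classifiers slide assigns each pixel to the class whose mean band vector is nearest in Euclidean distance. This is the "class means / Euclidean distance" element of the figure. Below is a minimal stdlib sketch. The two-band training values and class names are invented for illustration and are not taken from the AVIRIS scene.

```python
import math


def class_means(training):
    """Mean band vector per class; `training` maps class label -> list of pixel vectors."""
    means = {}
    for label, pixels in training.items():
        n = len(pixels)
        means[label] = tuple(sum(band) / n for band in zip(*pixels))
    return means


def classify(pixel, means):
    """Minimum-distance rule: pick the class whose mean is closest to the pixel."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))


# Invented two-band training pixels for two classes.
training = {
    "river": [(0.05, 0.02), (0.07, 0.03)],
    "arid upland": [(0.30, 0.35), (0.32, 0.37)],
}
means = class_means(training)
print(classify((0.08, 0.05), means))
print(classify((0.28, 0.30), means))
```

Because MD summarizes each class by its mean alone, a few mislabeled training pixels shift the means directly. Maximum likelihood extends the same idea by also modeling each class's spread (covariance).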
[Figure: where to start. Field data (SEEK infrastructure; semantic transformation to integrate field data), imagery (Spatial Data Workbench, a small NPACI project, to be linked with the SEEK pipeline), and ground sensors (Wireless Sensor Workshop; an unspecified ground sensor pipeline) feed future systems. Components shown: EcoGrid query, sample data, data calculation, validation, map generation, layer integration, generate metadata, archive to EcoGrid; SRB/MCAT and HPSS @ SDSC storage; radiometric corrections, georegistration, band indices, data transformation, band selection, supervised and unsupervised classification, segmentation, land cover (patch) metrics, climate/land cover integration; site field observations, ground truth, climate; user maps and integrated graphics; algorithm, image, geographic, spatial & temporal, signal processing, and domain ontologies; models of competition, connectivity, climate, urban expansion, et al.]

We start with you!
• Data sharing
• Metadata
• Databases
• Computer savvy

End!

Incorporating sensor processing
1. Build a generic image and signal processing knowledge base
2. Develop actors for these functions
3. Build knowledge bases for domains of interest, and relate them to the generic knowledge base:
   • ENM pipelines
   • NEON competition
   • Hazards (fire, flood, drought, disease)
4. Develop processing pipelines
5. Identify sensor (image and signal) data and analytical resources, and convert them to web services
6. When EcoGrid is ready, register them as nodes

National Center?
• Multidisciplinary staff
• Working groups (4-6 weeks)
• Multidisciplinary postdocs
• Summer school in ecoinformatics
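Steps 2 and 4 of "Incorporating sensor processing" (develop actors, then develop processing pipelines) can be pictured as small reusable steps that declare what data type they consume and produce. A pipeline is valid only if those types line up. The Python sketch below is hypothetical. The `Actor` class, the type tags, and the gain value are illustrative assumptions, not SEEK's design. NDVI is computed with its standard definition, (NIR - Red) / (NIR + Red).

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Actor:
    """A reusable processing step with declared input and output data types."""
    name: str
    input_type: str
    output_type: str
    run: Callable[[Any], Any]


def build_pipeline(actors: List[Actor]) -> Callable[[Any], Any]:
    """Check that each actor's output type matches the next actor's input type,
    then return a function that runs the actors in order."""
    for upstream, downstream in zip(actors, actors[1:]):
        if upstream.output_type != downstream.input_type:
            raise TypeError(f"{upstream.name} produces {upstream.output_type}, "
                            f"but {downstream.name} expects {downstream.input_type}")

    def pipeline(data):
        for actor in actors:
            data = actor.run(data)
        return data

    return pipeline


GAIN = 0.001  # hypothetical digital-number-to-reflectance gain

correct_actor = Actor("radiometric correction", "raw_dn", "reflectance",
                      lambda px: [(red * GAIN, nir * GAIN) for red, nir in px])
ndvi_actor = Actor("band index (NDVI)", "reflectance", "ndvi",
                   lambda px: [(nir - red) / (nir + red) for red, nir in px])
classify_actor = Actor("threshold classification", "ndvi", "land_cover",
                       lambda vals: ["vegetated" if v > 0.3 else "sparse" for v in vals])

run = build_pipeline([correct_actor, ndvi_actor, classify_actor])
print(run([(100, 400), (300, 350)]))  # (red, NIR) digital numbers per pixel
```

Declaring input and output types is a crude stand-in for the semantic typing the talk proposes. With ontologies, the check could relate "reflectance" to "raw_dn" through a known transformation instead of rejecting the pipeline outright.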