Grid-enabled Collaborative
Research Applications
Internet2 Member Meeting
Spring, 2003
Sara J. Graves
Director, Information Technology and Systems Center
University Professor, Computer Science Department
University of Alabama in Huntsville
Director, Information Technology Research Center
National Space Science and Technology Center
256-824-6064
[email protected]
http://www.itsc.uah.edu
“…drowning in data but starving for knowledge”
Data glut affects business, medicine, military, science
How do we leverage data to make BETTER decisions???
[Figure labels: Information; User Community]
Collaborative Research
Applications
• Enabling Technologies for Collaborative
Research
– Grid-Enabled Data Mining Services
– Interchange Technology Mark-ups
– Collaboration Tools
• Collaborative Research Applications on the
Grid
– TeraGrid Expeditions
– Linked Environments for Atmospheric Discovery
– Propulsion Research: Rocket Engine Advancement Program 2
Data Mining
• Automated discovery of patterns, anomalies from vast
observational data sets
• Derived knowledge for decision making, predictions and disaster
response
• ADaM – Algorithm Development and Mining System
http://datamining.itsc.uah.edu
Mining Environment:
When, Where, Who and Why?
WHEN
• Real Time
• On-Ingest
• On-Demand
• Repeatedly
WHERE
• User Workstation
• Data Mining Center
• GRID
WHO
• End Users
• Domain Experts
• Mining Experts
WHY
• Event
• Relationship
• Association
• Corroboration
• Collaboration
Iterative Nature of the Data Mining Process
[Figure: cycle diagram with stages labeled DATA, SELECTION and TRANSFORMATION, CLEANING and INTEGRATION, PREPROCESSING, MINING, DISCOVERY, EVALUATION and PRESENTATION, KNOWLEDGE]
ADaM Engine Architecture
Data flow: Data → Input → Translated Data → Preprocessing → Preprocessed Data → Analysis → Patterns/Models → Output → Results
Processing: Preprocessing and Analysis
• Input
– HDF; HDF-EOS; GIF PIP-2; SSM/I Pathfinder; SSM/I TDR; SSM/I NESDIS Lvl 1B; SSM/I MSFC Brightness Temp; US Rain; Landsat; ASCII Grass; Vectors (ASCII Text); Intergraph Raster; Others...
• Preprocessing
– Selection and Sampling: Subsetting, Subsampling, Select by Value, Coincidence Search
– Grid Manipulation: Grid Creation, Bin Aggregate, Bin Select, Grid Aggregate, Grid Select, Find Holes
– Image Processing: Cropping, Inversion, Thresholding
– Others...
• Analysis
– Clustering: K Means, Isodata, Maximum
– Pattern Recognition: Bayes Classifier, Min. Dist. Classifier
– Image Analysis: Boundary Detection, Cooccurrence Matrix, Dilation and Erosion, Histogram Operations, Polygon Circumscript, Spatial Filtering, Texture Operations
– Genetic Algorithms
– Neural Networks
– Others...
• Output
– GIF Images; HDF-EOS; HDF Raster Images; HDF SDS; Polygons (ASCII, DXF); SSM/I MSFC Brightness Temp; TIFF Images; Others...
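The four stages named on the architecture slide (Input, Preprocessing, Analysis, Output) can be illustrated with a minimal Python sketch. This is not ADaM code: the function names (`read_ascii_grid`, `threshold`, `histogram`, `write_report`, `run_pipeline`) and the sample values are hypothetical, chosen only to echo module names from the slide (Thresholding, Histogram).

```python
# Hypothetical Input -> Preprocessing -> Analysis -> Output chain.
# None of these functions belong to ADaM; they only mirror the slide's stages.

def read_ascii_grid(text):
    """Input: translate whitespace-separated ASCII rows into a 2-D list of floats."""
    return [[float(v) for v in line.split()] for line in text.strip().splitlines()]

def threshold(grid, cutoff):
    """Preprocessing: keep values >= cutoff, replace the rest with None."""
    return [[v if v >= cutoff else None for v in row] for row in grid]

def histogram(grid, bin_width):
    """Analysis: count surviving values per bin, keyed by the bin's lower edge."""
    counts = {}
    for row in grid:
        for v in row:
            if v is not None:
                edge = int(v // bin_width) * bin_width
                counts[edge] = counts.get(edge, 0) + 1
    return counts

def write_report(counts):
    """Output: format the histogram as plain-text lines."""
    return "\n".join(f"{edge}: {n}" for edge, n in sorted(counts.items()))

def run_pipeline(text, cutoff, bin_width):
    """Chain the four stages; each stage only sees the previous stage's result."""
    grid = read_ascii_grid(text)
    return write_report(histogram(threshold(grid, cutoff), bin_width))

sample = """
250 262 271
258 265 280
"""
report = run_pipeline(sample, cutoff=260, bin_width=10)
print(report)
```

Because each stage consumes only the previous stage's output, any one of them could be swapped for another module without touching the rest, which is the property the slide's modular layout suggests.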
Mining Environments
Multilevel Mining (ADaM)
– Complete System (Client and Engine)
– Mining Engine (User provides its own client)
– Application Specific Mining Systems
– Operations Tool Kit
– Stand Alone Mining Algorithms
– Data Fusion
Distributed/Federated Mining
– Distributed services
– Distributed data
– Chaining using Interchange Technologies
On-board Mining (EVE)
– Real time and distributed mining
– Processing environment constraints
Grid-Enabled Data Mining
Services
• Distributed
researchers, data
sources, storage and
computational
resources in a secure
environment
• ADaM data mining
modules as Open
Grid Services
Architecture (OGSA)
services
Data Mining / Earth Science Collaboration:
Tropical Cyclone Detection
Input: Advanced Microwave Sounding Unit (AMSU-A) data, after calibration, limb correction, and conversion to Tb (brightness temperature)
Mining Plan:
• Water cover mask to eliminate land
• Laplacian filter to compute temperature
gradients
• Science Algorithm to estimate wind speed
• Contiguous regions with wind speeds
above a desired threshold identified
• Additional test to eliminate false positives
• Maximum wind speed and location
produced
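The mining plan can be sketched in Python on a toy grid. Everything specific here is an assumption for illustration: the linear `estimate_wind` formula stands in for the unnamed science algorithm, the minimum region size stands in for the unspecified false-positive test, and the grid values and thresholds are invented.

```python
# Toy walk-through of the mining plan. The wind formula, thresholds, and
# data are placeholders, not the actual science algorithm.

def laplacian(tb, r, c):
    """4-neighbour discrete Laplacian of brightness temperature at (r, c)."""
    return tb[r-1][c] + tb[r+1][c] + tb[r][c-1] + tb[r][c+1] - 4 * tb[r][c]

def estimate_wind(grad):
    """Placeholder: assume wind speed grows linearly with |Laplacian|."""
    return 5.0 * abs(grad)

def detect_cyclones(tb, water, wind_threshold, min_cells):
    rows, cols = len(tb), len(tb[0])
    wind = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if water[r][c]:                      # water mask eliminates land
                wind[r][c] = estimate_wind(laplacian(tb, r, c))
    seen, results = set(), []
    for r in range(rows):
        for c in range(cols):
            if wind[r][c] <= wind_threshold or (r, c) in seen:
                continue
            # Flood-fill one contiguous region above the threshold.
            stack, region = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and wind[ny][nx] > wind_threshold):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            if len(region) >= min_cells:         # crude false-positive test
                peak = max(region, key=lambda p: wind[p[0]][p[1]])
                results.append((wind[peak[0]][peak[1]], peak))
    return results

# 5x5 field with a warm core in the centre; every cell is water.
tb = [[250, 250, 250, 250, 250],
      [250, 250, 252, 250, 250],
      [250, 252, 260, 252, 250],
      [250, 250, 252, 250, 250],
      [250, 250, 250, 250, 250]]
water = [[True] * 5 for _ in range(5)]
storms = detect_cyclones(tb, water, wind_threshold=5.0, min_cells=3)
print(storms)  # list of (maximum wind speed, (row, col)) per region
```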
[Figure: Hurricane Floyd example. Labels: Data Archive; Mining Environment; Result; Knowledge Base; Further Analysis]
Results are placed on the web, made available to
National Hurricane Center & Joint Typhoon Warning Center,
and stored for further analysis
http://pm-esip.msfc.nasa.gov/cyclone
Data Mining / Earth Science Collaboration:
Classification Based on Texture Features
Cumulus cloud fields have a very characteristic
texture signature in the GOES visible imagery

Science Rationale: Man-made changes to land use cause
changes in weather patterns, especially cumulus clouds

Comparison based on
– Accuracy of detection
– Amount of time required to classify
Parallel Version of Cloud Extraction
• GOES images can be used to recognize cumulus cloud fields
• Cumulus clouds are small and do not show up well in 4 km resolution IR channels
• Detection of cumulus cloud fields in GOES can be accomplished by using texture features or edge detectors
• Three edge detection filters are used together to detect cumulus clouds, an approach that lends itself to implementation on a parallel cluster
[Figure: a GOES image passes through Sobel horizontal, Sobel vertical, and Laplacian filters; energy computations feed a classifier that outputs the cloud image. Example panels: GOES image, cumulus cloud mask]
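A minimal Python sketch of the filter, energy, and classifier flow follows, with the three filters submitted as independent tasks. The kernels are the standard 3x3 Sobel and Laplacian masks; the energy definition (mean squared filter response), the threshold classifier `is_cumulus`, and its cutoff are assumptions made for illustration, not the method used in the project.

```python
from concurrent.futures import ThreadPoolExecutor

# Standard 3x3 masks. SOBEL_H responds to horizontal edges, SOBEL_V to vertical.
SOBEL_H = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
SOBEL_V = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]

def filter_energy(image, kernel):
    """Apply a 3x3 kernel to interior pixels; return the mean squared response."""
    rows, cols = len(image), len(image[0])
    total, count = 0.0, 0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            resp = sum(kernel[i][j] * image[r + i - 1][c + j - 1]
                       for i in range(3) for j in range(3))
            total += resp * resp
            count += 1
    return total / count

def texture_features(image):
    """Run the three filters as independent tasks; each yields one energy feature."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(filter_energy, image, k)
                   for k in (SOBEL_H, SOBEL_V, LAPLACIAN)]
        return [f.result() for f in futures]

def is_cumulus(features, cutoff=1000.0):
    """Placeholder classifier: flag a tile whose total edge energy is high."""
    return sum(features) > cutoff

# A smooth tile versus a fine-grained checkerboard tile.
flat = [[100] * 4 for _ in range(4)]
checker = [[100 if (r + c) % 2 == 0 else 0 for c in range(4)] for r in range(4)]
print(is_cumulus(texture_features(flat)), is_cumulus(texture_features(checker)))
```

Threads are used here only to show that the filters do not depend on one another. On a real cluster each filter, or each image tile, would more likely run in its own process or on its own node.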
Data Mining / Earth Science Collaboration:
Detecting Signatures
• Detecting mesocyclone signatures from radar data
• Science Rationale: Mesocyclone is an indicator of tornadic activity
• Developing an algorithm based on wind velocity shear signatures
– Improve accuracy and reduce false alarm rates
Data Mining / Space Science Collaboration:
Boundary Detection and Quantification
• Analysis of polar cap auroras in large volumes of spacecraft UV images
• Scientific Rationale:
– Indicators to predict geomagnetic storms, which damage satellites and disrupt radio connections
• Developing different mining algorithms to detect and quantify the polar cap boundary
[Figure: panels A, B, C, D; caption: Polar Cap Boundary]
Data Mining / BioInformatics Collaboration:
Genome Patterns
Text Pattern Recognition:
Used to search for text patterns in
bioscience data as well as other text
documents.
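As a small illustration of text pattern search over sequence data, the following Python sketch uses a regular expression. The helper name `find_motif`, the example sequence, and the motif are hypothetical and are not taken from the mining engine itself.

```python
import re

def find_motif(sequence, pattern):
    """Return (start, matched_text) for each non-overlapping match of pattern."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, sequence)]

# Hypothetical motif: TATA, then A or T, then A.
hits = find_motif("GGCTATAAAGGCTATATAGC", r"TATA[AT]A")
print(hits)
```

The same function works unchanged on ordinary text documents, since the pattern language is not specific to genome data.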
[Figure labels: Scientists; Mining Engine (Input Modules, Analysis Modules, Output Modules); Mining Results: MCSs; Event/Relationship Search System; Knowledge base; Genome DB]
Sensor Data Characteristics
• Many different formats, types and structures
• Different states of processing (raw, calibrated, derived, modeled or interpreted)
• Enormous volumes
• Heterogeneity leads to data usability problems
Interchange Technologies:
Accessing Heterogeneous Data
The Problem
[Figure: data in formats 1, 2, and 3 reach the application through separate readers (Reader 1, Reader 2) and a format converter]
The Solution
[Figure: data in formats 1, 2, and 3 each have an ESML file; the application reads them through the ESML library]
• Earth science data comes in:
– Different formats, types and structures
– Different states of processing (raw, calibrated, derived, modeled or interpreted)
– Enormous volumes
• Heterogeneity leads to data usability problems
• One approach: Standard data formats
– Difficult to implement and enforce
– Can’t anticipate all needs
– Some data can’t be modeled or is lost in translation
– The cost of converting legacy data
• A better approach: Interchange Technologies
– Earth Science Markup Language
What is ESML?
• It is a specialized markup language for Earth Science metadata based on XML - NOT another data format.
• It is a machine-readable and -interpretable representation of the structure, semantics and content of any data file, regardless of data format
• ESML description files contain external metadata that can be generated by either data producer or data consumer (at collection, data set, and/or granule level)
• ESML provides the benefits of a standard, self-describing data format (like HDF, HDF-EOS, netCDF, geoTIFF, …) without the cost of data conversion
• ESML is the basis for core Interchange Technology that allows data/application interoperability
• ESML complements and extends data catalogs such as FGDC and GCMD by providing the use/access information those directories lack.
http://esml.itsc.uah.edu
Components of the ESML Interchange Technology
ESML CONSISTS OF:
(1) MARKUPS: External description file for dataset or formats
(2) RULES FOR THE MARKUPS: Rules that govern the description of the data files
(3) MIDDLEWARE FOR AUTOMATION: Library parses and interprets the description file and figures out how to read the data
[Figure: data in formats 1, 2, 3 and other formats, each with an ESML file; ESML schema; ESML editor; ESML library serving the ESML data browser, the ADaM data mining system, and other applications]
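The idea of an external description that tells a library how to read a file can be sketched in Python. This is not ESML syntax or the ESML library API: the description is a plain dict standing in for a description file, and the keys (`encoding`, `delimiter`, `byte_order`, `type_code`) and the function `read_with_description` are hypothetical.

```python
import struct

def read_with_description(raw, desc):
    """Interpret raw bytes according to an external description (a dict here)."""
    if desc["encoding"] == "ascii":
        values = [float(tok) for tok in raw.decode("ascii").split(desc["delimiter"])]
    elif desc["encoding"] == "binary":
        item = desc["byte_order"] + desc["type_code"]
        count = len(raw) // struct.calcsize(item)
        fmt = f"{desc['byte_order']}{count}{desc['type_code']}"
        values = list(struct.unpack(fmt, raw))
    else:
        raise ValueError(f"unsupported encoding: {desc['encoding']}")
    return {"name": desc["name"], "units": desc["units"], "values": values}

# Two files in different formats holding the same field.
ascii_file = b"255.0,260.5,270.25"
binary_file = struct.pack(">3f", 255.0, 260.5, 270.25)

# One external description per file; the data files themselves are untouched.
ascii_desc = {"name": "brightness_temp", "units": "K",
              "encoding": "ascii", "delimiter": ","}
binary_desc = {"name": "brightness_temp", "units": "K",
               "encoding": "binary", "byte_order": ">", "type_code": "f"}

a = read_with_description(ascii_file, ascii_desc)
b = read_with_description(binary_file, binary_desc)
print(a["values"] == b["values"])
```

The application calls a single function for both files; only the descriptions differ, so supporting a new format means writing a new description rather than a new reader or a conversion step.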
ESML in Numerical Modeling
[Figure: GOES skin temperature, insolation products, soundings, and other data, each with an ESML file, reach the numerical weather models (MM5, ETA, RAMS) over the network through the ESML library]
Purpose:
• Use ESML to incorporate observational data into the numerical models for simulation
Scientists can:
• Select remote files across the network
• Select different observational data to increase the model prediction accuracy
[Figure: plot with axes Chn 5 Temperature (AMSU), Degree Kelvin, and Sea Surface Temperature (TMI), Degree Kelvin; label: Prediction]
Collaboration Tools
Technologies to coordinate complex projects
CAMEX-4 campaign
• Data acquisition and
integration from
multiple platforms and
instruments for quick
exploitation
• Intra-project
communications before,
during, and after CAMEX
campaigns
• Collaborators included
NASA, NOAA, USAF, and
multiple universities
http://camex.msfc.nasa.gov
[Figure: CAMEX-4 Distributed Mission Coordination. Labels: Coordination Clearinghouse (web-based interface, data management, RDBMS); NASA managers (review status); Experiment PI; Forecasters; Mission Managers; Radars; NASA Aircraft; USAF Aircraft; NOAA Aircraft; Aircraft Crew (maintenance and report status)]
Modeling Environment for Atmospheric Discovery
(MEAD): Use of the TeraGrid Infrastructure
MEAD:
• will develop/adapt a cyberinfrastructure that will enable simulation, data mining, and visualization of hurricanes and storms
• will integrate model and grid workflow management, data management, model coupling, and analysis/mining of large, ensemble datasets.
Participants:
• Argonne National Lab
• Georgia Tech University
• Indiana University
• Lawrence Berkeley National Lab
• NCSA
• NOAA/FSL
• NOAA/NSSL
• Northwestern University
• Ohio State University
• Oklahoma University
• Portland State University
• Rice University
• Rutgers
• UAH
• UCAR
• University of Wisconsin
• University of Minnesota
Primary MEAD Software Components
• WRF Model (Weather Research and Forecasting)
• ROMS Model (Regional Ocean Modeling System)
• Coupled WRF/ROMS Model
• D2K (Data to Knowledge)
• ADaM (Algorithm Development and Mining System)
• Visualization Engines (NCAR Graphics, Vis5D, IDV-VisAD, HVR, VTK)
• netCDF, HDF5, ESML
• Middleware (Globus, JavaCog, GridFTP)
• Metadata Catalogue Service
Example MEAD Workflow
[Figure: workflow in three phases]
• Initial Setup: initial data and parameters for the weather models and for the ocean models
• Model Execution: multiple WRF models (weather) and multiple ROMS models (ocean), with inter-model communications
• Post Run Analysis: model results go to data mining (ADaM) and to visualization
Need the Grid to support the huge computational,
data storage and post analysis requirements
Linked Environments for Atmospheric
Discovery (LEAD)
Create for the university community an
integrated, scalable framework for use
in accessing, preparing, assimilating,
predicting, managing,
mining/analyzing, and displaying a
broad array of meteorological and
related information independent of
format and physical location.
Collaborators:
– University of Oklahoma
– University of Alabama in Huntsville
– UCAR/Unidata
– Indiana University
– University of Illinois/NCSA
– Millersville University
– Howard University
– Colorado State University
LEAD Architecture
[Figure: layered LEAD architecture. Labels: MyLEAD Portal; MyLEAD Virtual Environment; Interchange Technologies; Workflow Orchestration; Semantics for data and services; Visualization tools; Models; Personal Data Space; Application Services; Data Mining; Others…; Middleware; Data Management; Workflow Management; Monitoring; Grid and Web infrastructure; Resource Allocation; Scheduling; Security; Others…; Distributed Resources (national supercomputer facilities, pools of workstations, tertiary storage, clusters, scientific instruments)]
Collaborative Environment for
Propulsion Research:
Rocket Engine Advancement Program 2
• Consortium of propulsion research centers:
– Auburn University
– Purdue University
– Pennsylvania State University
– Tuskegee University
– University of Alabama in Huntsville
– University of Tennessee
– NASA Marshall Space Flight Center
– NASA Glenn Research Center
• Grid configuration will make distributed computational and data
resources available to researchers without having to negotiate
separate access to each resource.
• Linking or integration of multiple distributed experiment steps
into a single investigation for more timely results and analysis.
• Will rely on the security capabilities of the Grid due to the
sensitive nature of the propulsion research.
Collaborative Environment for
Propulsion Research
[Figure: Rocket Engine Advancement Program 2. Labels: Cluster(s); Supercomputer; Test Equipment; Data and Results; REAP2 Grid Portal; REAP2 User Portal]
Evolution of Frameworks for
Advanced Applications
• Changing Computational Landscape
– GRIDS
– Clusters
– Web Services
– Pervasive Computing
– On-Board Processing
• Middleware for applications on GRID/Clusters
– Automate parallelization of mining tasks
– Estimate resource requirements using the computational complexity of the algorithms
• Federated Model for Mining
– Individual components that can be distributed and
can execute across different platforms
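One way middleware could estimate resource requirements from computational complexity is sketched below in Python. The cost models, the constant factors (all taken as 1), the assumed throughput figure, and the names `COST_MODELS`, `estimate_seconds`, and `nodes_needed` are illustrative assumptions, not part of any system described in the slides.

```python
import math

# Hypothetical cost models: operation counts as a function of input size n.
# k-means is taken as n * k * iters distance evaluations.
COST_MODELS = {
    "threshold": lambda n, **_: n,
    "sort": lambda n, **_: n * math.log2(n) if n > 1 else n,
    "kmeans": lambda n, k=1, iters=1, **_: n * k * iters,
}

def estimate_seconds(task, n, ops_per_second, **params):
    """Estimated single-node runtime for a task on n input elements."""
    return COST_MODELS[task](n, **params) / ops_per_second

def nodes_needed(task, n, ops_per_second, deadline_s, **params):
    """Nodes required to meet a deadline, assuming perfect parallel speedup."""
    runtime = estimate_seconds(task, n, ops_per_second, **params)
    return max(1, math.ceil(runtime / deadline_s))

# One million pixels, 10 clusters, 20 iterations, 1e8 operations per second.
runtime = estimate_seconds("kmeans", 1_000_000, 1e8, k=10, iters=20)
nodes = nodes_needed("kmeans", 1_000_000, 1e8, deadline_s=0.5, k=10, iters=20)
print(runtime, nodes)
```

The perfect-speedup assumption is optimistic; a scheduler would need to add communication and data-transfer costs, which this sketch leaves out.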