UAH GRIDS Center Middleware Testing
Sandra Redman
Information Technology and Systems Center and Information Technology Research Center
National Space Science and Technology Center
256-961-7806
[email protected] [email protected]
www.itsc.uah.edu

“…drowning in data but starving for knowledge”
• Data glut affects business, medicine, military, science
• How do we leverage data to make BETTER decisions?
[Diagram labels: Information, User Community]

Data Mining
• Automated discovery of patterns, anomalies from vast observational data sets
• Derived knowledge for decision making, predictions and disaster response
http://datamining.itsc.uah.edu

Mining Environment: When, Where, Who and Why?
WHEN
• Real Time
• On-Ingest
• On-Demand
• Repeatedly
WHERE
• User Workstation
• Data Mining Center
• GRID
WHO
• End Users
• Domain Experts
• Mining Experts
WHY
• Event
• Relationship
• Association
• Corroboration
• Collaboration

Algorithm Development and Mining (ADaM)
ADaM consists of:
• a data mining engine
• an extensible set of core functional applications to aid researchers in defining and performing data mining operations on spatial data sets
• data mining modules as Open Grid Services Architecture (OGSA) services

ADaM Engine Architecture
[Diagram labels: Data, Translated Data, Preprocessed Data, Patterns/Models, Results; processing stages: Input, Preprocessing, Analysis, Output]
Input
• HDF
• HDF-EOS
• GIF
• PIP-2
• SSM/I Pathfinder
• SSM/I TDR
• SSM/I NESDIS Lvl 1B
• SSM/I MSFC Brightness Temp
• US Rain
• Landsat
• ASCII Grass
• Vectors (ASCII Text)
• Intergraph Raster
• Others...
Preprocessing
• Selection and Sampling: Subsetting, Subsampling, Select by Value, Coincidence Search
• Grid Manipulation: Grid Creation, Bin Aggregate, Bin Select, Grid Aggregate, Grid Select, Find Holes
• Image Processing: Cropping, Inversion, Thresholding
• Others...
Analysis
• Clustering: K Means, Isodata, Maximum
• Pattern Recognition: Bayes Classifier, Min. Dist. Classifier
• Image Analysis: Boundary Detection, Concurrence Matrix, Dilation and Erosion, Histogram Operations, Polygon Circumscript, Spatial Filtering, Texture Operations
• Genetic Algorithms
• Neural Networks
• Others...
Output
• GIF Images
• HDF-EOS
• HDF Raster Images
• HDF SDS
• Polygons (ASCII, DXF)
• SSM/I MSFC Brightness Temp
• TIFF Images
• Others...

NMI Testing
• ADaM Feature Subset Selection application chosen for testing
• Supervised pattern classification is a technique important in many domains
• Used to improve both the runtime and accuracy of a supervised pattern classifier by eliminating noisy, irrelevant or redundant attributes or features from the data set
• Feature subset selection is the process of choosing a subset of the features from the original data set in order to maximize classifier accuracy
• Both processor and data-intensive

Parallel Version of Cloud Extraction
• GOES images can be used to recognize cumulus cloud fields
• Cumulus clouds are small and do not show up well in 4km resolution IR channels
• Detection of cumulus cloud fields in GOES can be accomplished by using texture features or edge detectors
[Diagram: GOES Image feeds a Laplacian Filter, a Sobel Horizontal Filter and a Sobel Vertical Filter; each filter is followed by an Energy Computation; the results feed a Classifier that produces the Cloud Image. Image panels: GOES Image, Cumulus Cloud Mask]
• Three edge detection filters are used together to detect cumulus clouds which lends itself to implementation on a parallel cluster

Feature Subset Selection Application
• Application ported to Linux
• Support Vector Machine downloaded and tested
• Developed application scripts
• Modified for Globus environment by writing simple Globus RSL file
• Ran each combination of tools on a different node on the grid
• Globus used to execute jobs on different machines
• Experimented with both real and synthetic data
[Diagram labels: Satellite Data, Archive X, Archive Y, Grid Mining Agent, Grid Processor]

Components used in testing
• Globus toolkit - the “de facto standard,” an open source software toolkit and libraries for building grid applications; resource management, scheduling, information services, file transfer
• GSI-OpenSSH - a modified version of OpenSSH that adds support for GSI authentication, providing a single sign-on remote login capability for the Grid
• Condor-G - workload management system for compute-intensive jobs; job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management
• Network Weather Service - monitors and dynamically forecasts the performance various network and computational resources can deliver over a given time interval

Some Lessons Learned
• Component testing went well
• Globus documentation improved, installation trouble-free, application port straightforward
• No problems encountered during Condor-G installation, but found a problem with Condor-G under Red Hat Linux 7.3 when using nss_ldap. Developer provided workaround: start the name service caching daemon (nscd)
• GSI-OpenSSH installed, but Kerberos authentication did not work since Linux was not compiled with PAM option (undocumented)
• Network Weather Service installed, but learned we are more interested in MDS

Some Lessons Learned
• NMI Testbed Process working well
• Answers found through NMI discussion lists from developers and other users
• Have to “sell” the grid concept to developers, administrators, users
• NMI work proven helpful in other grid work: TeraGrid, ISS Space-based Science Operations Grid, CEOS Grid
• Need more components!
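
Supplementary sketch for the "Parallel Version of Cloud Extraction" slide. The slide names three edge filters (Laplacian, Sobel horizontal, Sobel vertical), an energy computation per filter, and a classifier. The code below is only an illustration under stated assumptions, not the ADaM implementation: the 3x3 kernels are the textbook ones, "energy" is assumed to mean the mean squared filter response in a small window, the classifier is replaced by a plain threshold on the summed energies, and local threads stand in for cluster nodes. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Textbook 3x3 kernels (assumed; the slides do not give the coefficients).
SOBEL_HORIZONTAL = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)
KERNELS = {
    "laplacian": np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float),
    "sobel_horizontal": SOBEL_HORIZONTAL,   # responds to horizontal edges
    "sobel_vertical": SOBEL_HORIZONTAL.T,   # responds to vertical edges
}


def filter3x3(image, kernel):
    """Correlate a 2-D image with a 3x3 kernel, replicating border pixels."""
    padded = np.pad(image.astype(float), 1, mode="edge")
    rows, cols = image.shape
    out = np.zeros((rows, cols))
    for dr in range(3):
        for dc in range(3):
            out += kernel[dr, dc] * padded[dr:dr + rows, dc:dc + cols]
    return out


def local_energy(response, radius=1):
    """Mean squared response in a (2*radius+1)^2 window (assumed definition)."""
    padded = np.pad(response ** 2, radius, mode="edge")
    rows, cols = response.shape
    size = 2 * radius + 1
    total = np.zeros((rows, cols))
    for dr in range(size):
        for dc in range(size):
            total += padded[dr:dr + rows, dc:dc + cols]
    return total / (size * size)


def branch_energy(image, kernel):
    """One independent branch of the diagram: filter, then energy."""
    return local_energy(filter3x3(image, kernel))


def cumulus_mask(image, threshold):
    """Run the three branches concurrently, then threshold the summed energy.

    The threshold is a placeholder for the classifier stage on the slide.
    """
    with ThreadPoolExecutor(max_workers=len(KERNELS)) as pool:
        futures = [pool.submit(branch_energy, image, k) for k in KERNELS.values()]
        energies = [f.result() for f in futures]
    return np.sum(energies, axis=0) > threshold
```

Because each branch reads the same input image and writes its own energy map, the three branches share no intermediate state, which is the property that makes the pipeline easy to spread across nodes.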
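
Supplementary sketch for the "NMI Testing" and "Feature Subset Selection Application" slides. Feature subset selection is described there as choosing a subset of features to maximize classifier accuracy. The code below illustrates one simple way to do that (an exhaustive wrapper search), which may differ from what the ADaM application does. A minimum-distance (nearest centroid) classifier stands in for the Support Vector Machine mentioned on the slide, and the synthetic data generator and all function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Accuracy of a minimum-distance (nearest class centroid) classifier."""
    labels = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in labels])
    # Distance from every test row to every centroid: shape (n_test, n_classes).
    dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    predicted = labels[np.argmin(dists, axis=1)]
    return float(np.mean(predicted == test_y))


def best_feature_subset(train_x, train_y, test_x, test_y, max_size):
    """Score every feature subset up to max_size; return (subset, accuracy).

    Each subset is scored independently of the others, so in a grid setting
    every candidate could be submitted as its own job.
    """
    n_features = train_x.shape[1]
    best = ((), -1.0)
    for size in range(1, max_size + 1):
        for subset in combinations(range(n_features), size):
            cols = list(subset)
            acc = nearest_centroid_accuracy(
                train_x[:, cols], train_y, test_x[:, cols], test_y
            )
            if acc > best[1]:  # ties keep the earlier, smaller subset
                best = (subset, acc)
    return best


def make_synthetic(n_per_class=40, seed=0):
    """Two classes: feature 0 is informative, features 1 and 2 are noise."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_class)
    informative = rng.normal(loc=5.0 * y, scale=0.5)
    noise = rng.normal(scale=10.0, size=(2 * n_per_class, 2))
    x = np.column_stack([informative, noise])
    order = rng.permutation(2 * n_per_class)
    return x[order], y[order]


if __name__ == "__main__":
    x, y = make_synthetic()
    subset, acc = best_feature_subset(x[:40], y[:40], x[40:], y[40:], max_size=2)
    print("selected features:", subset, "holdout accuracy:", acc)
```

Exhaustive search grows exponentially with the number of features, which is consistent with the slide's remark that the task is both processor and data-intensive; larger problems would need a greedy or randomized search instead.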