ADaM System Architecture
Rahul Ramachandran, Sara Graves and Ken Keiser
Mathematical Challenges in Scientific Data Mining
IPAM, January 14-18, 2002
Information Technology and Systems Center
University of Alabama in Huntsville
[email protected]

Talk Overview
• Mining System Requirements
• ADaM System Architecture
• ADaM Plan Builder
• Research directions

Mining System Requirements: When, Where and Who
[Diagram label: Data Mining]
WHEN
• Real Time
• On-Ingest
• On-Demand
• Repeatedly
WHERE
• User Workstation
• Data Archive Center
• Data Mining Center
WHO
• Casual Users
• Domain Experts
• Mining Experts

Algorithm Development and Mining (ADaM) System
• ADaM system developed under NASA research grant
• The system provides knowledge discovery, feature detection and content-based searching for data values, as well as for metadata.
• It contains over 120 different operations that can be performed on the input data stream.
• Operations range from specialized, dataset-specific atmospheric science algorithms to digital image processing techniques and processing modules for automatic pattern recognition, machine perception, neural networks and genetic algorithms.

ADaM Features
• Handles science data set variability
• Multiple resolution/multiple scales
• Variability of formats
• Granularity of data
• Includes spatial/temporal dimensions
• Allows addition of new algorithms
• Allows scientists to select and sequence different operations

ADaM Engine Architecture
[Diagram labels: Data, Translated Data, Preprocessed Data, Patterns/Models, Results, Processing]
Input
• HDF
• HDF-EOS
• GIF
• PIP-2
• SSM/I Pathfinder
• SSM/I TDR
• SSM/I NESDIS Lvl 1B
• SSM/I MSFC Brightness Temp
• US Rain
• Landsat
• ASCII
• Grass Vectors (ASCII Text)
• Intergraph Raster
• Others...
Preprocessing
• Selection and Sampling
  - Subsetting
  - Subsampling
  - Select by Value
  - Coincidence Search
• Grid Manipulation
  - Grid Creation
  - Bin Aggregate
  - Bin Select
  - Grid Aggregate
  - Grid Select
  - Find Holes
• Image Processing
  - Cropping
  - Inversion
  - Thresholding
• Others...
Analysis
• Clustering
  - K Means
  - Isodata
  - Maximum
• Pattern Recognition
  - Bayes Classifier
  - Min. Dist. Classifier
• Image Analysis
  - Boundary Detection
  - Cooccurrence Matrix
  - Dilation and Erosion
  - Histogram Operations
  - Polygon Circumscript
  - Spatial Filtering
  - Texture Operations
• Genetic Algorithms
• Neural Networks
• Others...
Output
• GIF Images
• HDF-EOS
• HDF Raster Images
• HDF SDS
• Polygons (ASCII, DXF)
• SSM/I MSFC Brightness Temp
• TIFF Images
• Others...

ADaM Mining Environment
[Diagram components]
• Distributed Clients: Web-based, Workstation based
• Other Systems
• Analysis/Vis Tools
• Data Mining Server
• Common Client API
• Knowledge Base
• Mining Engine (ADaM): Input Modules, Analysis Modules, Output Modules
• Mining Results
• Data Stores
• Event/Relationship Search System

ADaM Architecture
[Diagram]

ADaM Miner Engine
• Manages the processing of data through a series of specified operations
• Loads input, processing and output modules dynamically as needed at execution time
• Allows for the addition of newly developed modules without the need to rebuild the engine
• Interprets a mining plan script that specifies the operations and the order in which they should be executed

ADaM Miner Database
• Used to store information that includes the names, locations and related metadata for input data sets available on the server
• Includes information about users, jobs, mining results, and other related information
• Simple relational database

ADaM Daemon and Scheduler
Scheduler
• Examines the list of jobs to be executed on the server and determines which job or jobs to execute at any given time
• Queues the requests and executes them sequentially
Daemon
• Handles all network communications with the mining system
• Is configured to listen on a specific port for any socket communications

ADaM Input/Operation Filters
• Input/Output Filters are data readers and writers
• Operations are the algorithms
• Each of the operations and (input/output) filters is implemented as a shared library
• New modules may be added to the system without recompiling or relinking
• All operations/filters either produce or operate on a data collection, which provides a common format for representing scientific data

General Mining Steps
• Select data files to be mined
• “Check-In” the data files into the Miner Database
• Write a “Mining Plan” consisting of a sequence of input filters and operations
• Execute the Mining Plan using the engine
• Check and save results
• Iterate

What is Check-In?
• Process of encoding information such as the names, locations and related metadata for input data sets available on the server
• Creates a complex data hierarchy in the database

ADaM Plan Builder: Check-In
• Two Modes of Operation
  - General: requires only minimal information
  - Advanced: requires more detailed information and allows the user to set up a structured database
• Path to the data file
• Data file name
• Input Filter associated with the data file
• Load an XML file containing existing Check-In specifications

ADaM Plan Builder – Layout
• Operation Menu contains the list of operations one can select
• Input Menu contains the list of Input Filters one can select
• Plan Menu allows one to:
  - Select a new plan
  - Load existing plan
  - Check-In data
• Panel where the Mining Plan can be viewed either as text or as a tree
• A description of the Operation/Input Filter can be viewed in this panel
• All the parameters needed for the Operation are described here
• Sample values for the Operation’s parameters are shown in this panel
• Allows user to select the operation and add it to the Mining Plan
• Go Mine the data using the Mining Plan

Research Directions
• Generic Data Reader for ADaM
  - ESML – Earth Science Markup Language
• Programmers Guide for ADaM
• Distributed Mining
• Grid Mining
  - Successful implementation and testing of the ADaM system on the NASA Information Power Grid
• Mining Onboard the Space Craft
  - The EnVironmEnt for On-Board Processing (EVE) system

ADaM Information
• Web site: datamining.itsc.uah.edu
• ADaM Lite beta version download
• Contact: [email protected]
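
Illustrative sketch: interpreting a Mining Plan
The Miner Engine is described in this talk as interpreting a mining plan script and looking up input filters and operations by name at execution time, with every module producing or operating on a common data collection. The Python sketch below illustrates that idea only. It is not ADaM code: the plan syntax (`name key=value ...`), the `DataCollection` class, the registry, and the three modules are all hypothetical, and a plain dictionary stands in for the shared-library loading that ADaM is said to use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DataCollection:
    """Hypothetical stand-in for ADaM's common data collection format."""
    values: List[float] = field(default_factory=list)


Params = Dict[str, str]
Module = Callable[[DataCollection, Params], DataCollection]

# Assumption: a name -> callable registry replaces shared-library loading.
REGISTRY: Dict[str, Module] = {}


def register(name: str) -> Callable[[Module], Module]:
    """Add a module to the registry without touching the engine code."""
    def wrap(fn: Module) -> Module:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("ascii_input")
def ascii_input(_data: DataCollection, params: Params) -> DataCollection:
    # Hypothetical input filter: parses comma-separated numbers.
    return DataCollection([float(v) for v in params["values"].split(",")])


@register("threshold")
def threshold(data: DataCollection, params: Params) -> DataCollection:
    # Hypothetical operation: keeps values at or above the cutoff.
    cutoff = float(params["min"])
    return DataCollection([v for v in data.values if v >= cutoff])


@register("subsample")
def subsample(data: DataCollection, params: Params) -> DataCollection:
    # Hypothetical operation: keeps every step-th value.
    return DataCollection(data.values[:: int(params["step"])])


def parse_plan(script: str) -> List[Tuple[str, Params]]:
    """Turn a plan script into an ordered list of (module name, params)."""
    steps: List[Tuple[str, Params]] = []
    for line in script.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        name, *pairs = line.split()
        steps.append((name, dict(p.split("=", 1) for p in pairs)))
    return steps


def run_plan(script: str) -> DataCollection:
    """Execute the plan steps in order, passing one collection along."""
    data = DataCollection()
    for name, params in parse_plan(script):
        module = REGISTRY[name]  # resolved by name at execution time
        data = module(data, params)
    return data


if __name__ == "__main__":
    plan = """
    ascii_input values=1,5,3,8,9   # input filter
    threshold min=4                # operation
    subsample step=2               # operation
    """
    print(run_plan(plan).values)  # prints [5.0, 9.0]
```

Because the engine only sees names and a shared collection type, a new module can be added by registering one more function, which mirrors the stated goal of adding modules without rebuilding the engine.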