GATES: A Grid-Based Middleware for Processing Distributed Data Streams
Liang Chen, Kolagatla Reddy, Gagan Agrawal
Department of Computer Science and Engineering
The Ohio State University
{chenlia, reddyk, agrawal}@cis.ohio-state.edu

Streaming Data Model
• Continuous data arrival and processing
• Emerging model for data processing
  • Sources that produce data continuously: sensors, long-running simulations
  • WAN bandwidths growing faster than disk bandwidths
• Active topic in many computer science communities
  • Databases
  • Data mining
  • Networking
  • …

Summary/Limitations of Current Work
• Focus on
  • centralized processing of a stream from a single source (databases, data mining)
  • communication only (networking)
• Many applications involve
  • distributed processing of streams
  • streams from multiple sources

Motivating Application
• Network Fault Management System
[Figure: Network Fault Management System; labels: Switch Network, X]

Motivating Application (2)
• Computer vision based surveillance

Motivating Application (3)
[Figure from Tatebe et al., CCGRID 2002]

Features of Distributed Streaming Processing Applications
• Data sources could be distributed
  • Over a WAN
• Continuous data arrival
• Enormous volume
  • Probably can't communicate it all to one site
• Results from analysis may be desired at multiple sites
• Real-time constraints
  • A real-time, high-throughput, distributed processing problem

Motivation: Challenges & Possible Solutions
• Challenge 1: Data-, communication-, and/or compute-intensive
  • Solution: Grid computing technologies
• Challenge 2: Real-time analysis is required
  • Solution: Self-adaptation functionality is desired
[Figure: Switch Network]

Need for a Grid-Based Stream Processing Middleware
• Application developers interested in data stream processing
  • Would like to have abstracted away
    • Grid standards and interfaces
    • Adaptation function
  • Would like to focus on algorithms only
• GATES is a middleware for Grid-based self-adapting data stream processing

Roadmap
• GATES architecture and API
• Adaptation algorithm
• Evaluation
• Related work
• Conclusion
• Ongoing and future work

GATES
• Grid-based AdapTive Execution on Streams
• Targets (distributed) processing of (distributed) data streams
• Built on the OGSA model
• Self-adaptation to meet real-time constraints on processing

GATES and Grid Standards
[Figure: layers listed as Applications, GATES, Globus-OGSA, Web service, Internet]

Using GATES
• Break down the analysis into several sub-tasks that make up a pipeline
• Implement each sub-task in Java
• Write an XML configuration file so that the sub-tasks are deployed automatically
• Launch the application by running a Java program (StreamClient.class) provided by GATES

System Architecture
[Figure]

Adaptation for Real-time Processing
• Analysis on streaming data is approximate
• The trade-off between accuracy and execution rate can be captured by certain parameters (adaptation parameters)
  • Sampling rate
  • Size of summary structure
• Application developers can expose these parameters and a range of values

API for Adaptation
    public class Sampling-Stage implements StreamProcessing {
        …
        void init() {…}
        …
        void work(buffer in, buffer out) {
            …
            // Expose the adjustment parameter and its range to GATES
            GATES.Information-About-Adjustment-Parameter(min, max, 1);
            while (true) {
                Image img = get-from-buffer-in-GATES(in);
                Image img-sample = Sampling(img, sampling-ratio);
                put-to-buffer-in-GATES(img-sample, out);
                // Ask GATES for the value to use on the next iteration
                sampling-ratio = GATES.getSuggestedParameter();
            }
            …
        }
    }

Self-Adaptation Approach
[Figure: Stage A, Stage B, Stage C placed on A, B, C; legend: buffers, queues, Grid services of GATES, stages of an application]

Adaptation Algorithm
• Goal
  [Figure: A, B, C]
• Issues
  • No specific information about applications
  • Filter out short-term bursts while staying sensitive to long-term behavior
  • Quickly find converged values of adjustment parameters
• Basic idea
  • Queueing theory and a heuristic algorithm

Adaptation Algorithm: Equations

Evaluation
• Two applications
  • A counting sample application
  • A computational steering application
• Three experiments were conducted
  • The first ran the counting sample application on GATES
  • The other two ran the computational steering application

Experiment One: Non-adaptive vs. Adaptive Version

Performance comparison

Network bandwidth (KB/s)   40 (sec.)   80 (sec.)   120 (sec.)   160 (sec.)   Adaptive version
1                          462.3       612.9       459.9        671          463.5
10                         187.7       193.3       509.1        302.1        234.9
100                        246.4       466.7       296.2        371.6        387.1
1000                       240.4       298.8       307.7        478          399.9

Accuracy comparison

Network bandwidth (KB/s)   40 (sec.)   80 (sec.)   120 (sec.)   160 (sec.)   Adaptive version
1                          0.891       0.962       0.981        0.987        0.986
10                         0.896       0.963       0.983        0.992        0.986
100                        0.887       0.957       0.979        0.988        0.974
1000                       0.879       0.963       0.983        0.989        0.988

Self-Adaptation with Different Processing Requirements
[Figure]

Self-Adaptation with Different Data Generation Rates
[Figure]

Related Work
• dQUOB (dynamic QUery OBjects)
• DataCutter
• A lot of work on adaptation
• Adaptation for real-time processing of streams
• Streaming database systems
  • Support DB operations, usually centralized

Conclusion
• High-volume, distributed stream processing is in our future
• Grid computing could be an effective solution for distributed data stream processing
• GATES
  • Distributed processing
  • Exploits grid web services
  • Self-adaptation to meet real-time constraints

Ongoing and Future Work
• Continuous (dynamic) resource discovery and monitoring
• Resource reallocation (self-mobility)
• Larger application (time-varying visualization)
• Generalize the adaptation algorithm
• More evaluation studies
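
Illustrative sketch: adaptation parameter feedback

The "API for Adaptation" and "Adaptation Algorithm" slides describe a stage that exposes a parameter range and then repeatedly asks the middleware for a suggested value. The Java sketch below shows one way such a feedback loop could look. It is not the GATES implementation or its algorithm, whose equations are not reproduced in this transcript. The names AdaptationSketch, ParameterAdvisor, and observeQueueLength, the smoothing factor, the load thresholds, and the fixed-step rule are all hypothetical choices made for illustration. The sketch only captures two ideas from the slides: smoothing the observed load so that short-term bursts are filtered out, and keeping the suggested value inside the range the developer declared.

```java
public class AdaptationSketch {

    /** Hypothetical stand-in for the middleware side of the adaptation API. */
    static final class ParameterAdvisor {
        // Assumed constants, chosen only for illustration.
        private static final double ALPHA = 0.2;      // weight of the newest observation
        private static final double HIGH_LOAD = 0.7;  // above this, trade accuracy for speed
        private static final double LOW_LOAD = 0.3;   // below this, trade speed for accuracy

        private final double min;
        private final double max;
        private final double step;
        private final int queueCapacity;
        private double smoothedLoad = 0.0;
        private double suggested;

        ParameterAdvisor(double min, double max, double step, int queueCapacity) {
            if (min > max || step <= 0 || queueCapacity <= 0) {
                throw new IllegalArgumentException("invalid range, step, or capacity");
            }
            this.min = min;
            this.max = max;
            this.step = step;
            this.queueCapacity = queueCapacity;
            this.suggested = max; // start at the most accurate setting
        }

        /** Record the current input-queue length and update the suggestion. */
        void observeQueueLength(int queueLength) {
            double load = Math.min(1.0, (double) queueLength / queueCapacity);
            // An exponential moving average damps short bursts but follows
            // sustained changes in load.
            smoothedLoad = ALPHA * load + (1.0 - ALPHA) * smoothedLoad;
            if (smoothedLoad > HIGH_LOAD) {
                suggested = Math.max(min, suggested - step);
            } else if (smoothedLoad < LOW_LOAD) {
                suggested = Math.min(max, suggested + step);
            }
        }

        /** Value the stage should use next; always within [min, max]. */
        double getSuggestedParameter() {
            return suggested;
        }
    }

    public static void main(String[] args) {
        // A sampling ratio that may vary between 0.1 and 1.0 in steps of 0.1.
        ParameterAdvisor advisor = new ParameterAdvisor(0.1, 1.0, 0.1, 100);

        advisor.observeQueueLength(100); // a single short burst
        System.out.println("after one burst: " + advisor.getSuggestedParameter());

        for (int i = 0; i < 30; i++) {
            advisor.observeQueueLength(100); // sustained overload
        }
        System.out.println("after sustained overload: " + advisor.getSuggestedParameter());

        for (int i = 0; i < 40; i++) {
            advisor.observeQueueLength(0); // load disappears
        }
        System.out.println("after recovery: " + advisor.getSuggestedParameter());
    }
}
```

In this sketch a processing stage would call getSuggestedParameter() once per loop iteration, in the same position as the GATES.getSuggestedParameter() call on the slide. A fixed step is the simplest rule that stays inside the declared range; an approach aiming for fast convergence, as listed under "Issues", would likely need a different update rule.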