Agent Technology for Data Analysis
Tony Johnson, SLAC
21st October 1998
Workshop on Scientific Data Management Problems and Solutions

Motivation and Disclaimer
• Many efforts to use supernetworks to link supercomputers and transfer huge datasets
• Few efforts to make effective use of existing real-world networks
  – Allow university users to access remote data
• I am not an agent technology expert
  – We do have a prototype application
  – I'm hoping some of you are!

Outline
• Overview of the problem
  – Network constraints
• Why agent technology?
• Why Java?
  – For agent technology?
  – For data analysis?
• Analysis Studio application
• More information

What Problem Are We Trying to Solve?
• Widely distributed users who need access to petabyte datasets
  – Many university users have mediocre networks
  – Most universities have no way to handle petabyte data samples
• Physicists need unfettered access to the data
  – They would like to make effective use of their desktop machines
  – Canned analysis won't do
• CPU and data-access requirements are effectively unlimited

Faster Networks?
• Faster networks will not solve our problems anytime soon
• No matter how fast networks are, they are always saturated
• As networks become saturated, latency becomes high

Why Agent Technology?
• By encapsulating the user's analysis code as a "user agent", we can send it to the data, so wide-area network bandwidth requirements become trivial
  – Analysis modules are typically small (under a few tens of kBytes)
  – HEP output is typically histograms (binned) and scatterplots, which are both small
• GUI-based analysis of large datasets becomes possible over a 28.8 kbit/s modem connection
• The user gets the impression that the analysis is running locally

Why Java for Agent Technology?
• Java produces machine-independent bytecodes
  – Trivial to move from one machine to another
• Network handling and Remote Method Invocation (RMI, cf. CORBA) built in
• (Remote) dynamic loading built in
• Multithreaded servers are easy to write
• The built-in Java "sandbox" can be used to restrict agents

Why Java for Data Analysis?
• Easy to learn yet very powerful, fully object-oriented language
• Very wide industry support
• Just-in-time compilation = fast
• Dynamic optimization = faster
• Very fast code, load, test, fix cycle
• Built-in debugger, including remote debugging
• Good numerical functionality
  – The Java Grande Forum is enhancing numerical support

"Java Analysis Studio"
[Figure: Java Analysis Studio configurations. Local data: desktop client with DIM. Remote data: desktop client connected over the network to a data server with DIM. Distributed data: a data controller in front of multiple data servers, each with DIM.]

Demo

Network Performance
[Figure: View (Histogram), View Adapter, Model (Data Source), Model Adapter]
• Caching
• Prefetching of data
• Data clumping, streaming

More Information
• Java
  – http://java.sun.com
• Java Analysis Studio
  – http://www-sldnt.slac.stanford.edu/jas
• Java Grande Forum (numeric computing in Java)
  – http://www.javagrande.org/
  – Desktop access to remote resources: http://www-fp.mcs.anl.gov/~gregor/datorr/
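The "Why Agent Technology?" slide argues that bandwidth stops mattering once the analysis code is sent to the data. Only the small agent and its binned result then cross the network. The sketch below illustrates that idea inside a single process under these assumptions:
• It is not the Java Analysis Studio implementation.
• `AnalysisAgent`, `EnergyAgent`, `Histogram` and `AgentDemo` are hypothetical names.
• Java object serialization stands in for the network.
• The agent's class is assumed to already be on the server's classpath. A real system would instead use the remote dynamic class loading mentioned on the "Why Java for Agent Technology?" slide.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;

// Fixed-width 1-D histogram: the kind of small, binned result an analysis sends back.
class Histogram implements Serializable {
    private static final long serialVersionUID = 1L;
    final double min;
    final double max;
    final long[] bins;

    Histogram(double min, double max, int nBins) {
        this.min = min;
        this.max = max;
        this.bins = new long[nBins];
    }

    void fill(double x) {
        if (x < min || x >= max) return;               // this sketch simply drops under/overflow
        int i = (int) ((x - min) / (max - min) * bins.length);
        bins[Math.min(i, bins.length - 1)]++;          // guard against rounding at the top edge
    }
}

// Hypothetical "user agent" interface: user analysis code that travels to the data.
interface AnalysisAgent extends Serializable {
    void process(double event);
    Histogram result();
}

// Hypothetical example agent: histograms one value per event.
class EnergyAgent implements AnalysisAgent {
    private static final long serialVersionUID = 1L;
    private final Histogram h = new Histogram(0.0, 100.0, 20);

    public void process(double event) { h.fill(event); }
    public Histogram result() { return h; }
}

final class AgentDemo {
    // Java serialization stands in for the network transfer in both directions.
    static byte[] toBytes(Object o) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static Object fromBytes(byte[] b) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(b))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            // Assumption: the agent class is already on the server's classpath.
            throw new IllegalStateException(e);
        }
    }

    // "Server side": run the received agent over data that never leaves the server.
    static Histogram runOnServer(byte[] agentBytes, double[] localData) {
        AnalysisAgent agent = (AnalysisAgent) fromBytes(agentBytes);
        for (double event : localData) agent.process(event);
        return agent.result();
    }

    public static void main(String[] args) {
        double[] data = new double[1_000_000];          // about 8 MB of doubles held "remotely"
        Random rng = new Random(42);
        for (int i = 0; i < data.length; i++) data[i] = rng.nextDouble() * 100.0;

        byte[] agentBytes = toBytes(new EnergyAgent());                // client -> server
        byte[] resultBytes = toBytes(runOnServer(agentBytes, data));   // server -> client
        Histogram h = (Histogram) fromBytes(resultBytes);

        System.out.println("agent bytes sent:    " + agentBytes.length);
        System.out.println("result bytes back:   " + resultBytes.length);
        System.out.println("raw data bytes:      " + (long) data.length * Double.BYTES);
        System.out.println("events histogrammed: " + Arrays.stream(h.bins).sum());
    }
}
```

Running `AgentDemo.main` prints the serialized sizes. The agent and the 20-bin histogram each come to well under a kilobyte, against roughly 8 MB of raw event data that stays on the server.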
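The "Distributed Data" configuration on the "Java Analysis Studio" slide places a data controller in front of several data servers. One plausible reading is that the controller runs the same analysis on every server in parallel and adds the binned results. That merge is valid because histogram counts with identical binning simply sum. The sketch below makes these assumptions:
• `BinnedCounts`, `PartitionServer` and `DataController` are hypothetical names.
• A thread pool inside one JVM stands in for separate machines. It also illustrates the "multithreaded servers are easy to write" point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Fixed-width binned counts; results with identical binning merge by adding bins.
class BinnedCounts {
    final double min;
    final double max;
    final long[] bins;

    BinnedCounts(double min, double max, int nBins) {
        this.min = min;
        this.max = max;
        this.bins = new long[nBins];
    }

    void fill(double x) {
        if (x < min || x >= max) return;               // drop under/overflow in this sketch
        int i = (int) ((x - min) / (max - min) * bins.length);
        bins[Math.min(i, bins.length - 1)]++;
    }

    void add(BinnedCounts other) {
        if (other.bins.length != bins.length || other.min != min || other.max != max) {
            throw new IllegalArgumentException("binning differs");
        }
        for (int i = 0; i < bins.length; i++) bins[i] += other.bins[i];
    }
}

// Hypothetical data server holding one partition of the dataset.
class PartitionServer {
    private final double[] partition;

    PartitionServer(double[] partition) { this.partition = partition; }

    // Runs the analysis next to the data and returns only the small binned result.
    BinnedCounts analyze(Supplier<BinnedCounts> newResult) {
        BinnedCounts result = newResult.get();
        for (double x : partition) result.fill(x);
        return result;
    }
}

// Hypothetical data controller: fans the analysis out in parallel and merges the results.
final class DataController {
    static BinnedCounts analyze(List<PartitionServer> servers, Supplier<BinnedCounts> newResult) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, servers.size()));
        try {
            List<Callable<BinnedCounts>> tasks = new ArrayList<>();
            for (PartitionServer s : servers) tasks.add(() -> s.analyze(newResult));
            BinnedCounts total = newResult.get();
            for (Future<BinnedCounts> f : pool.invokeAll(tasks)) total.add(f.get());
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for servers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a data server failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Each server returns only its bin counts. The controller's traffic therefore grows with the number of servers and bins, not with the number of events.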
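The "Network Performance" slide lists caching and prefetching of data. Below is a minimal sketch of one way a client-side model adapter could combine the two, assuming the remote data can be requested in numbered chunks. Chunks live in a small least-recently-used cache, and reading chunk n also requests chunk n + 1 in the background. `PrefetchingChunkCache` and the `fetchRemote` function are hypothetical stand-ins, not the Java Analysis Studio API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntFunction;

// Hypothetical client-side adapter: LRU cache of data chunks with one-ahead prefetching.
class PrefetchingChunkCache implements AutoCloseable {
    private final IntFunction<double[]> fetchRemote;   // stands in for a slow network fetch
    private final ExecutorService io = Executors.newSingleThreadExecutor();
    private final Map<Integer, CompletableFuture<double[]>> cache;

    PrefetchingChunkCache(IntFunction<double[]> fetchRemote, int capacity) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
        this.fetchRemote = fetchRemote;
        // accessOrder = true makes the eldest entry the least recently used chunk.
        this.cache = Collections.synchronizedMap(
                new LinkedHashMap<Integer, CompletableFuture<double[]>>(16, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(
                            Map.Entry<Integer, CompletableFuture<double[]>> eldest) {
                        return size() > capacity;
                    }
                });
    }

    // Starts a fetch only if the chunk is neither cached nor already in flight.
    private CompletableFuture<double[]> request(int index) {
        return cache.computeIfAbsent(index,
                i -> CompletableFuture.supplyAsync(() -> fetchRemote.apply(i), io));
    }

    // Returns chunk `index`, blocking only if it has not arrived yet,
    // and asks for the next chunk so it downloads while this one is analyzed.
    double[] get(int index) {
        CompletableFuture<double[]> current = request(index);
        request(index + 1);
        return current.join();
    }

    @Override
    public void close() {
        io.shutdown();
        try {
            io.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A try-with-resources block shuts the background fetch thread down when the analysis finishes. A single fetch thread keeps requests in order over one link. More threads or a deeper prefetch window are straightforward variations.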