Implementation of a streaming database management system on a Blue Gene architecture for measurement data processing

Erik Zeitler
Uppsala Database Lab (UDBL)
www.it.uu.se/research/group/udbl

Looking out into space
• Use large radio telescopes!

Problem: size matters
• We have hit the limit
• Use many large radio telescopes?
  – Augment the measurements using signal processing
  – They act together as a HUGE telescope
  – Look in one direction only
  – Expensive…

Solution
• Use a huge number of small antennas
• Broad-band, multi-direction receivers
• This enables new scientific applications (and challenges)

Scientific applications
• Re-ionization epoch
  – the first 10^5 years: hydrogen forming
• Deep extragalactic surveys
  – To boldly go…
• Transient sources
  – All-sky surveys of gamma bursts, flare stars, supernovae
• Ultra-high-energy cosmic rays
• Pulsars

Antennas, antennas, antennas…
• Broad-band radio receiver
  – 80–300 MHz, 3 dimensions
  – Produces 0.9 Gbps of raw data
• Central site + 20 outstations located within a circular area 350 km in diameter
• 13103 antennas

System overview
• Antennas
  – Basic beam forming
  – FPGAs
• Network
  – GbE, 10GbE
• Central processing facility
  – Linux clusters, IBM Blue Gene/L
• Off-line analysis
  – PCs, workstations, Blue Gene

Central processing tasks
• FFT
• Signal correlation
• Calibration
• RFI mitigation (noise from human activities)
• Stratosphere plasma
• Subtracting known objects
• Transient analysis
• Peak detection

Computing challenges
• Multiple incoming data streams
  – 20 Tbps
• Multiple experiments
• Complex computations
• Demand for rapid reconfiguration of computing systems
• Use case: on-line transient analysis

Central processing facilities
• On-line processing
  – Linux cluster (buffering)
  – Lightweight BG/L (beam)
  – 6 racks: 6144 compute nodes + 96 I/O nodes
• Off-line processing
  – Linux clusters, SAN, GRID, …

Blue Gene dataflow supercomputer
• LLNL installation: 64 racks (65536 CPUs), 70 TFLOPS in the floor space of a tennis court

BG/L architecture
• I/O node:
  – 2× PPC440 @ 700 MHz
  – Linux
  – Each I/O node coordinates 64 compute nodes
  – 512 MB RAM
• Compute node:
  – 2× PPC440 @ 700 MHz
  – Single-threaded lightweight OS
  – Typically: 1 CPU for computation, 1 CPU for communication
  – 512 MB RAM

[Figure: the BG/L dataflow computer. (Scientist) users submit continuous queries through a user agent; the BG/L dataflow computer processes the incoming measurement data streams and returns query result streams to the users.]

UDBL project
• Implement a very high-performance stream database manager
  – based on the AmosII DB kernel (http://user.it.uu.se/~udbl/amos/)
• Utilize the BG/L computing environment for
  – scalable data stream queries
  – involving user-defined computations
• Implement specialized query optimization:
  – Planning the BG/L node configuration for given stream queries
  – Re-configuration when interesting phenomena occur

So far (after 4 months)
• Implementing primitives for data
  – computation, aggregation, communication, fusion
• Proof-of-concept cases
  – Signal processing
  – Peak detection
  – Stream join
• Benchmark
  – Based on real LOFAR/LOIS data
  – Performance analysis for stream databases

A simple example
    gnuplot(peakdetect(vector_elements(winagg(vector_elements(readlofarvectorfile("temp.DAT")),256,256))));

Other application areas
• Other space physics research areas
  – projects at IRFU
• Network traffic analysis
• Financial (stock market) information
• Content analysis of streaming media

Questions?
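"Signal correlation" appears among the central processing tasks without further detail. The following is a minimal sketch of what correlating two antenna signals means, assuming two equal-length real-valued sample sequences and a direct time-domain cross-correlation over a bounded range of lags. The function names `cross_correlate` and `best_lag` are hypothetical. Production correlators usually compute this in the frequency domain after the FFT step, which is much cheaper for long sequences.

```python
from __future__ import annotations


def cross_correlate(a: list[float], b: list[float], max_lag: int) -> dict[int, float]:
    """Unnormalized cross-correlation r[k] = sum_i a[i] * b[i + k] for -max_lag <= k <= max_lag.

    Only index pairs that fall inside both sequences contribute to each lag.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("a and b must have the same length")
    r: dict[int, float] = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += a[i] * b[j]
        r[lag] = total
    return r


def best_lag(a: list[float], b: list[float], max_lag: int) -> int:
    """Lag with the largest correlation; positive means the signal arrives later in b."""
    r = cross_correlate(a, b, max_lag)
    return max(r, key=r.get)
```

For example, a pulse at index 3 in `a` and at index 5 in `b` gives `best_lag(a, b, 4) == 2`, i.e. `b` sees the same signal two samples later.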
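The node counts on the "Central processing facilities" and "BG/L architecture" slides are mutually consistent. Worked out from the slide figures alone (6 racks, 6144 compute nodes, 64 compute nodes per I/O node, 2 CPUs per compute node):

```latex
\frac{6144\ \text{compute nodes}}{6\ \text{racks}} = 1024\ \text{compute nodes per rack},
\qquad
\frac{6144\ \text{compute nodes}}{64\ \text{compute nodes per I/O node}} = 96\ \text{I/O nodes},
\qquad
6144 \times 2 = 12288\ \text{compute-node CPUs}.
```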
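"Stream join" is listed as a proof-of-concept case, but the slides do not say which join algorithm is used. As one common technique (an assumption, not necessarily the project's choice), here is a minimal windowed symmetric hash join: two streams are interleaved in timestamp order, each side keeps a per-key buffer, and every arriving tuple probes the other side's buffer for tuples with the same key that are at most `window` time units older. The function name `windowed_join` and the `(side, ts, key, value)` tuple layout are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict, deque
from typing import Iterable, Iterator


def windowed_join(
    events: Iterable[tuple[str, float, str, float]], window: float
) -> Iterator[tuple[str, float, float, float, float]]:
    """Symmetric hash join of two interleaved streams on key.

    events: (side, ts, key, value), side is "L" or "R", ts non-decreasing.
    Yields (key, l_ts, l_val, r_ts, r_val) whenever |l_ts - r_ts| <= window.
    """
    # One hash table per side: key -> deque of (ts, value) in arrival order.
    state = {"L": defaultdict(deque), "R": defaultdict(deque)}
    for side, ts, key, value in events:
        if side not in state:
            raise ValueError(f"unknown side {side!r}")
        other = "R" if side == "L" else "L"
        bucket = state[other][key]
        # Because ts never decreases, tuples older than ts - window can
        # never match this or any later event, so evict them from the front.
        while bucket and bucket[0][0] < ts - window:
            bucket.popleft()
        # Every remaining tuple in the bucket is within the window.
        for o_ts, o_val in bucket:
            if side == "L":
                yield key, ts, value, o_ts, o_val
            else:
                yield key, o_ts, o_val, ts, value
        state[side][key].append((ts, value))
```

Eviction here is lazy (only the probed key's buffer is trimmed), which keeps the sketch short; a long-running system would also need periodic eviction for keys that stop appearing.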
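The "A simple example" query composes stream operators: read vectors, flatten them to elements, aggregate over windows, flatten again, detect peaks, and plot. The following Python generator pipeline is a sketch of the same dataflow under stated assumptions. The functions mirror the query's names but are hypothetical stand-ins, not the AmosII implementations. `winagg(s, 256, 256)` is read as window size 256 with stride 256 (non-overlapping windows). The per-window aggregate is not stated on the slide, so it is passed in as `agg`, and the hypothetical `remove_mean` is used for illustration. Peak detection is taken to mean "strict local maximum above a threshold". File reading (`readlofarvectorfile`) and plotting (`gnuplot`) are left out.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator


def vector_elements(vectors: Iterable[list[float]]) -> Iterator[float]:
    """Flatten a stream of vectors into a stream of scalar elements."""
    for vec in vectors:
        yield from vec


def winagg(elements: Iterable[float], size: int, stride: int,
           agg: Callable[[list[float]], list[float]]) -> Iterator[list[float]]:
    """Emit agg(window) for every window of `size` elements, advancing by `stride`.

    Assumes 0 < stride <= size; size == stride gives non-overlapping windows.
    Trailing elements that never fill a complete window are dropped.
    """
    if not 0 < stride <= size:
        raise ValueError("expected 0 < stride <= size")
    buf: list[float] = []
    for x in elements:
        buf.append(x)
        if len(buf) == size:
            yield agg(list(buf))  # pass a copy so agg never aliases the buffer
            del buf[:stride]      # slide the window forward by `stride`


def remove_mean(window: list[float]) -> list[float]:
    """Hypothetical per-window aggregate: subtract the window mean from each sample."""
    m = sum(window) / len(window)
    return [x - m for x in window]


def peakdetect(samples: Iterable[float], threshold: float) -> Iterator[tuple[int, float]]:
    """Yield (index, value) for each strict local maximum above `threshold`."""
    prev2 = prev1 = None  # samples at positions i-2 and i-1
    for i, x in enumerate(samples):
        if prev2 is not None and prev1 is not None:
            if prev1 > threshold and prev1 > prev2 and prev1 > x:
                yield i - 1, prev1
        prev2, prev1 = prev1, x


def example_query(vectors: Iterable[list[float]], size: int = 256,
                  stride: int = 256, threshold: float = 0.0) -> Iterator[tuple[int, float]]:
    """Python analogue of the slide's query, without file reading and plotting."""
    windows = winagg(vector_elements(vectors), size, stride, remove_mean)
    return peakdetect(vector_elements(windows), threshold)
```

For example, `list(example_query([[0, 1, 0, 0], [0, 5, 0, 0]], size=4, stride=4, threshold=1.0))` returns `[(5, 3.75)]`: after per-window mean removal, only the second window's spike exceeds the threshold. Because every stage is a generator, data is pulled lazily one element at a time and memory is bounded by the window size, which matches the continuous-query style of the slides.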