Implementation of a
streaming database management system
on a Blue Gene architecture
for measurement data processing.
Erik Zeitler
Uppsala Database Laboratory
www.it.uu.se/research/group/udbl
Looking out into space:
Use large radio telescopes!
Problem:
Size matters
We have hit the limit
Use many large radio telescopes?
Augment the measurements using signal processing
They act together as a HUGE telescope
• Look in one direction only
• Expensive…
Solution
Use a huge amount of small antennas
• Broad band receivers
• Multi direction receivers
This enables new scientific applications (and challenges)
Scientific applications
• Re-ionization epoch
• the 1st 10^5 years – hydrogen forming
• Deep Extragalactic Surveys
• To boldly go…
• Transient Sources
• All-sky surveys of
– gamma bursts
– flare stars
– supernovae
• Ultra High Energy Cosmic Rays
• Pulsars
Antennas, antennas, antennas…
• Broad band radio receiver
• 80…300 MHz, 3 dimensions
• Produces 0.9 Gbps raw data
• Central site + 20 outstations
located within a circular area, diameter 350 km
⇒ 13103 antennas
System overview
• Antennas
• Basic beam forming
• FPGAs
• Network
• GbE, 10GbE
• Central Processing facility
• Linux clusters, IBM Blue Gene/L
• Off line analysis
• PCs, workstations, Blue Gene
Central processing tasks
• FFT
• Signal correlation
• Calibration
• RFI mitigation (noise from human activities)
• Stratosphere plasma
• Subtracting known objects
• Transient analysis
• Peak detection
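The signal correlation task can be illustrated with a small time-domain sketch. This is an assumption-laden illustration rather than the project's method: the function names are hypothetical, the data is synthetic, and a production correlator would likely work on the FFT output instead of looping over lags.

```python
def cross_correlation(a, b, max_lag):
    """Return {lag: sum(a[i] * b[i + lag])} for lags in [-max_lag, max_lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(len(a)):
            j = i + lag
            if 0 <= j < len(b):  # ignore samples that fall outside b
                total += a[i] * b[j]
        result[lag] = total
    return result


def best_lag(a, b, max_lag):
    """Return the lag at which the two signals are most strongly correlated."""
    corr = cross_correlation(a, b, max_lag)
    return max(corr, key=corr.get)


# Synthetic example: b is a copy of a delayed by 3 samples.
a = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0]
delay = best_lag(a, b, max_lag=5)
```

Here `delay` is the sample offset between the two synthetic signals; with real antenna data the same idea would be applied per antenna pair.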
Computing challenges
• Multiple incoming data streams
• 20 Tbps
• Multiple experiments
• Complex computations
• Demand for rapid reconfiguration of computing systems
• Use case: On-line transient analysis
Central processing facilities
• On line processing
• Linux cluster (buffering)
• Light weight BG/L (beam)
• 6 racks ⇒ 6144 compute nodes + 96 I/O nodes
• Off-line processing
• Linux clusters, SAN, GRID, …
Blue Gene
Dataflow supercomputer
• LLNL installation: 64 racks (65536 CPUs)
⇒ 70 TFLOPS in the area of a tennis court
BG/L architecture
• I/O node:
• 2x PPC440@700MHz
• Linux
• Each I/O node coordinates 64 compute nodes
• 512 MB RAM
• Compute node:
• 2x PPC440@700MHz
• Single threaded light weight OS
• Typically:
– 1 CPU for computation
– 1 CPU for communication
• 512 MB RAM
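The node counts quoted on the slides follow from two ratios. The sketch below reproduces that arithmetic; the constant and function names are hypothetical, 1024 compute nodes per rack is inferred from 6144 / 6, and applying the 64:1 I/O ratio to other installations is an assumption.

```python
NODES_PER_RACK = 1024       # compute nodes per rack (6144 / 6 on the slide)
COMPUTE_PER_IO_NODE = 64    # each I/O node coordinates 64 compute nodes


def partition_size(racks):
    """Return (compute_nodes, io_nodes) for an installation of `racks` racks."""
    compute = racks * NODES_PER_RACK
    io = compute // COMPUTE_PER_IO_NODE
    return compute, io


lofar_compute, lofar_io = partition_size(6)   # the 6-rack on-line facility
llnl_compute, _ = partition_size(64)          # the 64-rack LLNL installation
```

For 6 racks this gives the 6144 compute nodes and 96 I/O nodes listed for the on-line facility, and for 64 racks it gives the 65536 figure quoted for LLNL.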
[Figure: BG/L dataflow computer with incoming measurement data streams, a user agent, and (scientist) users who submit continuous queries and receive query result streams.]
UDBL project
• Implement a very high performance stream database manager
• based on AmosII DB kernel (http://user.it.uu.se/~udbl/amos/)
• Utilize the BG/L computing environment for
• scalable data stream queries
• involving user-defined computations
• Implement specialized query optimization:
• Planning BG/L node configuration for given stream queries
• Re-configuration when interesting phenomena occur
Thus far (after 4 months)
• Implementing primitives for data ~
• Computation
• Aggregation
• Communication
• Fusion
• Proof of concept cases
• Signal processing
• Peak detection
• Stream join
• Benchmark
• Based on real LOFAR/LOIS data
• Performance analysis for stream databases
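The peak detection proof of concept composes stream operators: window the incoming samples, then search each window for peaks. A minimal Python sketch of such a pipeline follows. It assumes non-overlapping windows of 256 samples and a simple k-sigma detection rule; the function names, the rule, and the synthetic data are hypothetical and are not taken from the AmosII implementation.

```python
from itertools import islice
from statistics import mean, pstdev


def tumbling_windows(stream, size):
    """Yield consecutive, non-overlapping windows of `size` elements."""
    it = iter(stream)
    while True:
        window = list(islice(it, size))
        if len(window) < size:
            return  # drop an incomplete trailing window
        yield window


def detect_peaks(window, k=3.0):
    """Yield (index, value) for samples more than k std devs above the mean.

    The k-sigma rule is an assumption made for this sketch.
    """
    mu = mean(window)
    sigma = pstdev(window)
    for i, x in enumerate(window):
        if sigma > 0 and x > mu + k * sigma:
            yield i, x


def peak_stream(samples, size=256, k=3.0):
    """Window the stream, then emit (global position, value) for each peak."""
    for n, window in enumerate(tumbling_windows(samples, size)):
        for i, x in detect_peaks(window, k):
            yield n * size + i, x


# Synthetic data standing in for a measurement stream (not real LOFAR/LOIS data).
samples = [0.0] * 512
samples[100] = 10.0
samples[300] = 8.0
peaks = list(peak_stream(samples))
```

Because every stage is a generator, results are produced as samples arrive, which mirrors the continuous-query style in which results are themselves streams.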
A simple example
• gnuplot(peakdetect(vector_elements(winagg(vector_elements(readlofarvectorfile("temp.DAT")),256,256))));
Other application areas
• Other space physics research areas
• projects at IRFU
• Network traffic analysis
• Financial (stock market) information
• Content analysis of streaming media
Questions?