The Australian Virtual Observatory
Clusters and Grids
David Barnes
Astrophysics Group
Overview
• What is a Virtual Observatory?
• Scientific motivation
• International scene
• Australian scene
• DataGrids for VOs
• ComputeGrids for VOs
• Sketch of AVO DataGrid and ComputeGrid
• Clustering experience at Swinburne
What is a Virtual Observatory?
• A Virtual Observatory (VO) is a distributed, uniform interface to the data archives of the world’s major astronomical observatories.
• A VO is explored with advanced data mining and visualisation tools which exploit the unified interface to enable cross-correlation and combined processing of distributed and diverse datasets.
• VOs will rely on, and provide motivation for, the development of national and international computational and data grids.
Scientific motivation
• Understanding of astrophysical processes depends on multi-wavelength observations and input from theoretical models.
• As telescopes and instruments grow in complexity, surveys generate massive databases which require increasing expertise to comprehend.
• Theoretical modelling codes are growing in sophistication and readily consume all available compute time.
• Major advances in astrophysics will be enabled by transparently cross-matching, cross-correlating and inter-processing otherwise disparate data.
[Figure: sample multi-wavelength data for the galaxy IC5332 (Ryan-Weber) – panels show blue optical, HI spectral-line column density, H-alpha, HI spectral-line velocity dispersion, infrared, HI spectral-line velocity field, and an HI profile from the public release.]
International scene
• AstroGrid (www.uk-vo.org) – phase A (1yr R&D) complete; phase B (3yr implementation) funded £3.7M.
• Astrophysical Virtual Observatory (www.euro-vo.org) – phase A (3yr R&D) funded €4.0M.
• National Virtual Observatory (www.us-vo.org) – (5yr framework development) funded USD 10M.
Australian scene
• Australian Virtual Observatory (www.aus-vo.org) – phase A (1yr common-format archive implementation) funded AUD 260K (2003 LIEF grant [Melb, Syd, ATNF, AAO]).
• Data archives are:
– HIPASS: 1.4 GHz continuum and HI spectral line survey
– SUMSS: 843 MHz continuum survey
– S4: digital images of the southern sky in five optical filters
– ATCA archive: continuum and spectral line images of the southern sky
– 2dFGRS: optical spectra of >200K southern galaxies
– and more...
DataGrids for VOs
• Archives listed on the previous slide range from ~10 GB to ~10 TB in processed (reduced) size.
• Providing just the processed images and spectra on-line requires a distributed, high-bandwidth network of data servers – that is, a DataGrid.
• Users may want simple operations, such as smoothing or filtering, applied at the data server. This is a Virtual DataGrid (a sketch follows below).
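As a concrete illustration of a Virtual DataGrid operation, the sketch below shows a smoothing step a data server could apply at request time, so that only the reduced product crosses the network. This is a minimal Python/numpy sketch; the name boxcar_smooth and the test spectrum are illustrative assumptions, not part of any actual AVO interface.

import numpy as np

def boxcar_smooth(spectrum, width=9):
    """Smooth a 1-D spectrum with a boxcar (moving-average) kernel."""
    kernel = np.ones(width) / width
    # mode="same" keeps the output aligned with the input channels
    return np.convolve(spectrum, kernel, mode="same")

# A data server could apply this at request time, shipping only the
# smoothed product to the user:
raw = np.random.default_rng(0).normal(size=1024)  # stand-in for an HI spectrum
smoothed = boxcar_smooth(raw)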
ComputeGrids for VOs
• More complex operations may be applied, requiring significant processing:
– source detection and parameterisation (sketched below)
– reprocessing of raw or intermediate data products with new calibration algorithms
– combined processing of raw, intermediate or "final product" data from different archives
• These operations require a distributed, high-bandwidth network of computational nodes – that is, a ComputeGrid.
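To make "source detection and parameterisation" concrete, here is a minimal sketch of the kind of job a ComputeGrid node might run: threshold an image at N sigma above the background and report the peak of each detected island. numpy and scipy are assumed available; detect_sources and its defaults are illustrative, not a real AVO pipeline.

import numpy as np
from scipy import ndimage

def detect_sources(image, nsigma=5.0):
    """Return (row, col, peak) for each pixel island above nsigma."""
    background = np.median(image)
    noise = np.std(image)
    # group pixels above the threshold into connected islands
    labels, nsrc = ndimage.label(image > background + nsigma * noise)
    sources = []
    for i in range(1, nsrc + 1):
        # brightest pixel of island i parameterises the source
        r, c = ndimage.maximum_position(image, labels, i)
        sources.append((int(r), int(c), float(image[r, c])))
    return sources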
Possible initial players in the Australian Virtual Observatory Data and Compute Grids...
[Diagram: candidate sites and resources linked by GrangeNet – Parkes (CPU?); ATNF/AAO (2dFGRS and RAVE data, CPU?); Canberra (ATCA data, CPU?; APAC CPU); Adelaide (theory?, CPU); Melbourne (HIPASS data, CPU?; VPAC CPU; Gemini?); Sydney (SUMSS data, theory); Swinburne (CPU, theory).]
Clustering @ Swinburne
• 1998 – 2000: 40 Compaq Alpha workstations
• 2001: +16 Dell dual PIII rackmount servers
• 2002: +30 Dell dual P4 workstations
• mid 2002: +60 Dell dual P4 rackmount servers
• November 2002: placed 180th in Top500 with 343 sustained Gflop/s (APAC 63rd with 825 Gflop/s).
• +30 Dell dual P4 rackmount servers installed mid 2002 at the Parkes telescope in NSW.
• Pseudo-Grid with data pre-processed in real time at the telescope, shipped back in “slowtime”.
Swinburne activities
• N-body simulation codes:
– galaxy formation
– stellar disk astrophysics
– cosmology
• Pulsar searching and timing
– (1 GB/min data recording)
• Survey processing as a coarse-grained problem (see the sketch after this list)
• Rendering of virtual reality content
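Because each survey field can be reduced independently, coarse-grained survey processing maps naturally onto a cluster. The sketch below uses a plain Python process pool; process_field and the field count are illustrative stand-ins, not Swinburne's actual pipeline.

from multiprocessing import Pool

def process_field(field_id):
    # placeholder for the real per-field pipeline (calibrate, image,
    # search); no field needs data from any other field
    return "field %03d done" % field_id

if __name__ == "__main__":
    fields = range(400)  # e.g. a survey sky cut into a few hundred cubes
    with Pool() as pool:  # one worker per local cpu by default
        for result in pool.imap_unordered(process_field, fields):
            print(result)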
Clustering costs
node configuration                                price/node   price/cpu
1 cpu, 256 MB std mem, 20 GB disk, ethernet       1.3K         1.3K
2 cpu, 1 GB fast mem, 20 GB disk, ethernet        4.4K         2.2K
2 cpu, 2 GB fast mem, 60 GB SCSI disk, ethernet   8.0K         4.0K
Giganet, Myrinet, ... (interconnect)              1.5K         1.5K (1 cpu), 0.8K (2 cpu)
(estimates incl. on-site warranty; 2nd-fastest cpu; excl. infrastructure)
Some ideas...
• “desktop cluster” – the astro group has 6 dual-cpu workstations.
– Add MPI, PVM, Nimrod libs and the Ganglia monitoring tool to get a 12-cpu loose cluster with 8 GB mem (a minimal MPI sketch follows this list).
– Use MOSIX to provide transparent job migration, with workstations joining the cluster at night-time.
• “pre-purchase cluster” – the university buys ~500 desktops/yr – use them for ~6 months!
– Build up a cluster of desktops purchased ahead of demand, and replace them as they are deployed to desktops.
– Gain the compute power of new CPUs without any real effect on end-users.
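For the “desktop cluster” idea, here is a minimal MPI job of the kind the 12-cpu loose cluster could run once the MPI libraries are installed. mpi4py, the script name pi_mpi.py and the hostfile are assumptions for the sketch, not part of the original proposal.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# each cpu integrates its own strided slice of the domain
# (midpoint rule for pi = integral of 4/(1+x^2) over [0,1])
n = 10_000_000
x = (np.arange(rank, n, size) + 0.5) / n
partial = np.sum(4.0 / (1.0 + x * x)) / n

# combine the partial sums on rank 0
pi = comm.reduce(partial, op=MPI.SUM, root=0)
if rank == 0:
    print("pi ~= %.10f" % pi)

Launched with, e.g., mpirun -np 12 -hostfile hosts python pi_mpi.py, this spreads one process per cpu across the six dual-cpu workstations.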