The Australian Virtual Observatory: Clusters and Grids
David Barnes, Astrophysics Group

Overview
• What is a Virtual Observatory?
• Scientific motivation
• International scene
• Australian scene
• DataGrids for VOs
• ComputeGrids for VOs
• Sketch of AVO DataGrid and ComputeGrid
• Clustering experience at Swinburne

What is a Virtual Observatory?
• A Virtual Observatory (VO) is a distributed, uniform interface to the data archives of the world’s major astronomical observatories.
• A VO is explored with advanced data mining and visualisation tools which exploit the unified interface to enable cross-correlation and combined processing of distributed and diverse datasets.
• VOs will rely on, and provide motivation for, the development of national and international computational and data grids.

Scientific motivation
• Understanding of astrophysical processes depends on multi-wavelength observations and input from theoretical models.
• As telescopes and instruments grow in complexity, surveys generate massive databases which require increasing expertise to comprehend.
• Theoretical modelling codes are growing in sophistication and readily consume all available compute time.
• Major advances in astrophysics will be enabled by transparently cross-matching, cross-correlating and inter-processing otherwise disparate data.

[Figure: sample multi-wavelength data for the galaxy IC5332 (Ryan-Weber) – blue, H-alpha and infrared images; HI spectral line column density, velocity field and velocity dispersion; and the HI profile from the public data release.]

International scene
• AstroGrid (www.uk-vo.org) – phase A (1 yr R&D) complete; phase B (3 yr implementation) funded at £3.7M.
• Astrophysical Virtual Observatory (www.euro-vo.org) – phase A (3 yr R&D) funded at €4.0M.
• National Virtual Observatory (www.usvo.org) – 5 yr framework development funded at USD 10M.

Australian scene
• Australian Virtual Observatory (www.aus-vo.org) – phase A (1 yr common-format archive implementation) funded at AUD 260K (2003 LIEF grant [Melb, Syd, ATNF, AAO]).
• Data archives are:
  – HIPASS: 1.4 GHz continuum and HI spectral line survey
  – SUMSS: 843 MHz continuum survey
  – S4: digital images of the southern sky in five optical filters
  – ATCA archive: continuum and spectral line images of the southern sky
  – 2dFGRS: optical spectra of >200K southern galaxies
  – and more...

DataGrids for VOs
• The archives listed on the previous slide range from ~10 GB to ~10 TB in processed (reduced) size.
• Providing just the processed images and spectra on-line requires a distributed, high-bandwidth network of data servers – that is, a DataGrid.
• Users may want simple operations such as smoothing or filtering applied at the data server: this is a Virtual DataGrid (a minimal sketch of such a server-side operation follows the site diagram below).

ComputeGrids for VOs
• More complex operations requiring significant processing may also be applied:
  – source detection and parameterisation
  – reprocessing of raw or intermediate data products with new calibration algorithms
  – combined processing of raw, intermediate or “final product” data from different archives
• These operations require a distributed, high-bandwidth network of computational nodes – that is, a ComputeGrid.

Possible initial players in the Australian Virtual Observatory Data and Compute Grids…
[Diagram: candidate sites and resources – Parkes, ATNF/AAO, Canberra, Adelaide, VPAC, Melbourne, Sydney, Swinburne, APAC and Gemini – linked by GrangeNet, each contributing some combination of data (2dFGRS, RAVE, ATCA, HIPASS, SUMSS), CPU and theory.]
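The Virtual DataGrid idea above amounts to running simple reductions where the data live, so only a small derived product crosses the network. The following minimal sketch is illustrative only, not part of the AVO design: it assumes modern Python with astropy and scipy available on the data server, and the function name serve_smoothed() and the file paths are made up for this example.

    # Hypothetical sketch: a derived data product computed at the data server
    # so only the (much smaller) result is shipped to the requesting user.
    # Assumes astropy and scipy; serve_smoothed() and file names are
    # illustrative only.
    from astropy.io import fits
    from scipy.ndimage import gaussian_filter

    def serve_smoothed(path, sigma_pix=2.0):
        """Return a Gaussian-smoothed copy of an archived FITS image."""
        with fits.open(path) as hdul:
            header = hdul[0].header.copy()
            data = hdul[0].data.astype("float32")
        smoothed = gaussian_filter(data, sigma=sigma_pix)  # server-side filter
        header["HISTORY"] = f"smoothed with sigma = {sigma_pix} pix"
        return fits.PrimaryHDU(data=smoothed, header=header)

    if __name__ == "__main__":
        # A DataGrid node would expose this behind its archive interface and
        # stream the result; the path here is a made-up example.
        hdu = serve_smoothed("archive/ic5332_hi_mom0.fits")
        hdu.writeto("ic5332_hi_mom0_smoothed.fits", overwrite=True)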
Clustering @ Swinburne
• 1998–2000: 40 Compaq Alpha workstations
• 2001: +16 Dell dual-PIII rackmount servers
• 2002: +30 Dell dual-P4 workstations
• mid 2002: +60 Dell dual-P4 rackmount servers
• November 2002: placed 180th in the Top500 with 343 sustained Gflop/s (APAC was 63rd with 825 Gflop/s)
• A further 30 Dell dual-P4 rackmount servers were installed mid 2002 at the Parkes telescope in NSW.
• Pseudo-Grid, with data pre-processed in real time at the telescope and shipped back in “slow time”.

Swinburne activities
• N-body simulation codes:
  – galaxy formation
  – stellar disk astrophysics
  – cosmology
• Pulsar searching and timing (1 GB/min data recording)
• Survey processing as a coarse-grained problem
• Rendering of virtual reality content

Clustering costs
  node configuration                                  price/node   price/cpu
  1 cpu, 256 MB std mem, 20 GB disk, ethernet         1.3K         1.3K
  2 cpu, 1 GB fast mem, 20 GB disk, ethernet          4.4K         2.2K
  2 cpu, 2 GB fast mem, 60 GB SCSI disk, ethernet     8.0K         4.0K
  Giganet, Myrinet, ... interconnect                  1.5K         1.5K (1 cpu) / 0.8K (2 cpu)
(estimates incl. on-site warranty; 2nd-fastest cpu; excl. infrastructure)

Some ideas...
• “Desktop cluster”
  – The astro group has 6 dual-cpu workstations.
  – Add MPI, PVM and Nimrod libraries and the Ganglia monitoring tool to get a 12-cpu loose cluster with 8 GB of memory (a minimal sketch of coarse-grained use of such a cluster follows below).
  – Use MOSIX to provide transparent job migration, with workstations joining the cluster at night-time.
• “Pre-purchase cluster”
  – The university buys ~500 desktops/yr – use them for ~6 months!
  – Build up a cluster of desktops purchased ahead of demand, and replace nodes as they are deployed to desktops.
  – Gain the compute power of new CPUs without any real effect on end-users.
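Both the “survey processing as a coarse-grained problem” activity and the “desktop cluster” idea rely on work that splits into independent per-field jobs with no communication between them. The sketch below is one way such a farm could be run; it assumes mpi4py, and process_field(), the directory layout and the script name survey_farm.py are placeholders rather than anything from the actual Swinburne pipelines.

    # Minimal sketch of coarse-grained survey processing on a loose cluster:
    # each MPI rank reduces a different survey field independently, so nothing
    # is communicated beyond the initial static work split. Assumes mpi4py;
    # process_field() and the directory layout are placeholders.
    import glob
    from mpi4py import MPI

    def process_field(path):
        # Stand-in for the real per-field pipeline, run entirely on one node.
        print(f"processing {path}")

    def main():
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        fields = sorted(glob.glob("survey/fields/*.fits"))  # hypothetical layout
        for path in fields[rank::size]:  # round-robin split across ranks
            process_field(path)

    if __name__ == "__main__":
        main()

Launched with, for example, mpirun -np 12 python survey_farm.py across the six dual-cpu workstations, this uses the whole loose cluster without needing shared memory or a fast interconnect.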