AstroGrid
http://www.astrogrid.ac.uk
Belfast, Cambridge, Edinburgh, Jodrell, Leicester, MSSL, RAL
NAM 2001, Andy Lawrence, Cambridge

Optical, Infrared, X-ray, Radio, Solar, Space Plasma

collectivisation
• thirty year trend.....
  – facility class (common-user) instruments
  – central development of data reduction s/w
  – calibrated archives with simple tools
  – information services (ADS, NED)
  – large consortium projects (MACHO, 2dF, SLOAN, VISTA...)
• next steps
  – inter-operable archives (joint queries)
  – communal exploration and analysis tools (data mining)
  – information discovery tools

the archive is the sky
  – large fraction of astro-papers based on archives
  – HST : retrieval growing faster than ingest
  – ISO : whole archive downloaded twice

[Figure: ingest and retrievals in Gbytes/Day (0 to 30) against year (1994.8 to 1999.3). Annotation: "Already more retrieval than ingest!" Graphics from US NVO project.]

Large database science
• Rare object searches
• modelling populations
• statistical manipulation
• large sample monitoring
• the UNKNOWN

next steps in use of archives
• inter-operability and joint queries
  – e.g. retrieve Sloan, UKIDSS and XMM images from single query
  – click on image and get spectrum
  – give me all objects redder than X that have no radio counterpart but already have a spectrum

next steps in use of archives
• exploration and visualisation tools
  – large image scrolling and projection
  – N-d parameter space plotting
  – VR headsets

next steps in use of archives
• large data-set manipulation tools
  – Fourier transforms
  – Finding outliers
  – Data compression
  – PCA analysis

next steps in use of archives
• information discovery tools
  – intelligent search agents
  – networked NED

the scary bit.....
• SDSS science archive a few TB
• WFCAM will produce a TB/week
• VISTA even worse...
• Peta-Byte databases coming your way ...
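
The joint query quoted under inter-operability (all objects redder than X that have no radio counterpart but already have a spectrum) can be sketched over toy in-memory catalogues. Everything in this sketch is an assumption made for illustration: the catalogue contents, the field names, the colour index, the 2 arcsec matching radius and the flat-sky separation formula. A real federation would send the query to remote archives rather than filter Python lists.

```python
import math
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    ra: float                   # right ascension, degrees
    dec: float                  # declination, degrees
    colour: float = 0.0         # hypothetical colour index; larger means redder
    has_spectrum: bool = False  # hypothetical flag: a spectrum is already archived


def separation_arcsec(a: Source, b: Source) -> float:
    """Small-angle separation in arcseconds.

    Adequate for arcsecond-scale matching away from the poles;
    it does not handle the RA wrap at 0/360 degrees.
    """
    dra = (a.ra - b.ra) * math.cos(math.radians((a.dec + b.dec) / 2))
    ddec = a.dec - b.dec
    return math.hypot(dra, ddec) * 3600


def red_radio_quiet_with_spectrum(optical, radio, colour_limit, radius_arcsec=2.0):
    """Optical sources redder than colour_limit, with a spectrum,
    and with no radio source inside radius_arcsec."""
    return [
        s for s in optical
        if s.colour > colour_limit
        and s.has_spectrum
        and not any(separation_arcsec(s, r) <= radius_arcsec for r in radio)
    ]


# Toy catalogues (invented values).
optical = [
    Source("A", 150.0000, 2.0000, colour=1.8, has_spectrum=True),
    Source("B", 150.0100, 2.0100, colour=1.9, has_spectrum=True),   # has a radio match
    Source("C", 150.0200, 2.0200, colour=0.4, has_spectrum=True),   # too blue
    Source("D", 150.0300, 2.0300, colour=2.1, has_spectrum=False),  # no spectrum
]
radio = [Source("R1", 150.0101, 2.0100)]

matches = red_radio_quiet_with_spectrum(optical, radio, colour_limit=1.5)
print([s.name for s in matches])  # ['A']
```

The brute-force loop compares every optical source with every radio source, which is only workable for toy lists; at the catalogue sizes discussed in this talk the cross-match would need a spatial index on the archive side.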
data intensive computing
• search SuperCOS data : few hours
• search VISTA DB : few months !
• need clever DB structures / query memory
• need parallel machines
  – simple PC farms for simple queries ?
  – shared memory architecture for manipulations ?

remote analysis services
• Janet delivers 10 Mb/s to door
  – 10 TByte dataset takes 93 days to download
• lesson : shift the results not the data
  – i.e. data centres must also be service providers
• data subset access
• database query service
• analysis tools on line OR ability to upload code
• remote visualisation service

Grids
• services remote .... also distributed ?
• computational grids
  – web is distributed information ; grid is distributed CPU
  – networked users have supercomputers at their fingertips
  – don't even need to know where they are
  – like plugging into the electrical power grid

technical issues
• data format standards
• metadata and annotation standards
• information exchange protocols
• presentation service standards
• request translation middleware
• workload scheduling, resource allocation
• mass storage management
• computing fabric management
• differentiated service network technology
• distributed data management - caching, file replication, file migration
• visualisation technology and algorithms
• data discovery methods
• search agents and AI
• database structure and query methods
• data mining algorithms
• s/w libraries and tools for upload requests
• data quality assurance (levels of club membership ?)
  – all science-wide and commerce-wide issues ...
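
The 93-day download figure quoted under remote analysis services can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 TByte = 10^12 bytes, 1 Mb/s = 10^6 bits per second) and a link that runs at its full rate with no protocol overhead; the function name is illustrative only.

```python
def download_days(size_bytes: float, rate_bits_per_s: float) -> float:
    """Days needed to move size_bytes over a link running at rate_bits_per_s."""
    seconds = size_bytes * 8 / rate_bits_per_s  # 8 bits per byte
    return seconds / 86_400                     # 86,400 seconds per day


# 10 TByte dataset over a 10 Mb/s connection.
days = download_days(10e12, 10e6)
print(f"{days:.1f} days")  # 92.6 days, i.e. roughly the 93 days quoted
```

Any real transfer would be slower than this ideal case, which only strengthens the lesson of shifting the results rather than the data.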
context
• Global Grids work
  – basic computer science and technology development
• grids work in other sciences
  – CERN grid
  – Earth observation grid
• international astro-plans
  – US National Virtual Observatory (NVO) project
  – UK AstroGrid project
  – European Astrophysical Virtual Observatory (AVO) project

AstroGrid project
• developed during LTSR
• proposal to PPARC October 2000
• three year project
• one year Phase A study
  – community consultation
  – science requirements analysis
  – benchmark tests
  – pilot database federations
• we need use-cases....

Phase B - preliminary
• uniform AstroGrid interface
• data-mining machines connected in grid
• tool for simultaneous browsing
  – plus advanced visualisation, links to spectra etc.
• tools for advanced database analysis
  – advanced querying, mixture fitting, statistical manipulations etc.
• tools for on-line data analysis
  – statistics, model fitting
• system for uploading code

FIN