Online Science
The World-Wide Telescope as a Prototype For the New Computational Science
Jim Gray, Microsoft Research, http://research.microsoft.com/~gray
Alex Szalay, Johns Hopkins University

The Evolution of Science
• Observational Science
  – Scientist gathers data by direct observation
  – Scientist analyzes data
• Analytical Science
  – Scientist builds analytical model
  – Makes predictions
• Computational Science
  – Simulate analytical model
  – Validate model and make predictions
• Data Exploration Science
  – Data captured by instruments or generated by simulator
  – Processed by software
  – Placed in a database / files
  – Scientist analyzes database / files

Information Avalanche
• In science, industry, government, ...
  – better observational instruments and
  – better simulations
  are producing a data avalanche
• Examples
  – BaBar: grows 1 TB/day (2/3 simulation information, 1/3 observational information)
  – CERN: LHC will generate 1 GB/s, ~10 PB/y
  – VLBA (NRAO) generates 1 GB/s today
  – Pixar: 100 TB/movie
• New emphasis on informatics:
  – Capturing, Organizing, Summarizing, Analyzing, Visualizing
[Images: courtesy C. Meneveau & A. Szalay @ JHU; BaBar, Stanford; P&E Gene Sequencer, from http://www.genome.uci.edu/]

Space Telescope
World Wide Telescope
Virtual Observatory
http://www.astro.caltech.edu/nvoconf/
http://www.voforum.org/
• Premise: Most data is (or could be) online
• The Internet is the world’s best telescope:
  – It has data on every part of the sky
  – In every measured spectral band: optical, x-ray, radio, ...
  – As deep as the best instruments (2 years ago)
  – It is up when you are up. The “seeing” is always great (no working at night, no clouds, no moons, no ...)
  – It’s a smart telescope: links objects and data to literature on them

Why Astronomy Data?
• It has no commercial value
  – No privacy concerns
  – Can freely share results with others
  – Great for experimenting with algorithms
• It is real and well documented
  – High-dimensional data (with confidence intervals)
  – Spatial data
  – Temporal data
• Many different instruments from many different places and many different times
• Federation is a goal
• There is a lot of it (petabytes)
• Great sandbox for data mining algorithms
  – Can share cross company
  – University researchers
• Great way to teach both Astronomy and Computational Science
[Sky images: IRAS 25 µm, 2MASS 2 µm, DSS Optical, IRAS 100 µm, WENSS 92cm, NVSS 20cm, ROSAT ~keV, GB 6cm]

SkyServer.SDSS.org
• A modern Astronomy archive
  – Raw pixel data lives in file servers
  – Catalog data (derived objects) lives in database
  – Online query to any and all
• Also used for education
  – 150 hours of online Astronomy
  – Implicitly teaches data analysis
• Interesting things
  – Spatial data search
  – Client query interface via Java Applet
  – Query interface via Emacs
  – Popular
  – Cloned by other surveys (a template design)
  – Web services are core of it

Federation: SkyQuery.Net
• Combine 4 archives initially
• Just added 6 more
• Send query to portal, portal joins data from archives
• Problem: want to do multi-step data analysis (not just single query)
• Solution: allow personal databases on portal
• Problem: some queries are monsters
• Solution: “batch schedule” on portal server, deposits answer in personal database

SkyQuery Structure
• Each SkyNode publishes
  – Schema Web Service
  – Database Web Service
• Portal
  – Plans Query (2 phase)
  – Integrates answers
  – Is itself a web service
[Diagram: SkyQuery Portal connected to SDSS, INT, FIRST, 2MASS, and Image Cutout]

Information Avalanche: science, business, personal
Astronomy data
SkyServer: http://SkyServer.SDSS.org
demo http://skyquery.net/
pixel space, record space, set space
Personal SkyServer download http://skyserver.org/myskyserver/
Mention data mining.
World-Wide Telescope
Federated web services demo http://skyquery.net/
Other web services
Interop with Linux/Python/…
Other stuff
Portal with batch job scheduler http://skyservice.pha.jhu.edu/devel/casjobs/
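
The federation idea from the SkyQuery slides (send a query to the portal, the portal joins data from the archives, and a batch job deposits the answer in a personal database) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SkyQuery implementation: `SkyNode`, `Portal`, `cross_match`, `run_batch`, and the catalog rows are all hypothetical, the archives are in-memory lists rather than web services, and the "2 phase" plan is assumed here to mean "count first, then join starting from the smallest result".

```python
import math
from dataclasses import dataclass


@dataclass
class SkyNode:
    """Hypothetical in-memory stand-in for one archive's web services."""
    name: str
    objects: list  # rows of (object_id, ra_deg, dec_deg)

    def query(self, box):
        """Return the rows inside an (ra_min, ra_max, dec_min, dec_max) box."""
        ra_min, ra_max, dec_min, dec_max = box
        return [row for row in self.objects
                if ra_min <= row[1] <= ra_max and dec_min <= row[2] <= dec_max]

    def count(self, box):
        return len(self.query(box))


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation of two sky positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600


class Portal:
    def __init__(self, nodes):
        self.nodes = nodes
        self.personal_db = {}  # table name -> rows; stands in for a personal database

    def cross_match(self, box, tolerance_arcsec):
        # Phase 1 (assumed): ask every node how many objects fall in the region.
        ordered = sorted(self.nodes, key=lambda node: node.count(box))
        # Phase 2 (assumed): seed with the smallest result, then join the others to it.
        matches = [{ordered[0].name: row} for row in ordered[0].query(box)]
        for node in ordered[1:]:
            candidates = node.query(box)
            joined = []
            for match in matches:
                _, ra, dec = next(iter(match.values()))  # position of the seed row
                for row in candidates:
                    if separation_arcsec(ra, dec, row[1], row[2]) <= tolerance_arcsec:
                        joined.append({**match, node.name: row})
            matches = joined
        return matches

    def run_batch(self, table, box, tolerance_arcsec):
        """Run a (possibly large) query and deposit the answer in a personal table."""
        self.personal_db[table] = self.cross_match(box, tolerance_arcsec)
        return len(self.personal_db[table])


# Made-up catalog rows, for illustration only.
portal = Portal([
    SkyNode("SDSS", [("s1", 185.0000, 2.0000), ("s2", 185.5000, 2.5000)]),
    SkyNode("2MASS", [("t1", 185.0002, 2.0001), ("t2", 186.2000, 2.9000)]),
    SkyNode("FIRST", [("f1", 185.0001, 1.9999)]),
])
portal.run_batch("my_matches", (184.0, 187.0, 1.0, 3.0), tolerance_arcsec=3.0)
```

Keeping the result in `personal_db` rather than returning it to the caller mirrors the multi-step analysis point on the federation slide: a later query can read `portal.personal_db["my_matches"]` instead of repeating the join.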