Virtual Observatory: A Quick Overview, and Some Lessons Learned
S. George Djorgovski, Caltech
ESIP Workshop, UCSB, July 2009

Astronomy Has Become Very Data-Rich
• A typical digital sky survey now generates ~ 10 - 100 TB, plus a comparable amount of derived data products
  – PB-scale data sets are on the horizon
• Astronomy today has ~ 1 - 2 PB of archived data, and generates a few TB/day
  – Both data volumes and data rates grow exponentially, with a doubling time of ~ 1.5 years
  – Even more important is the growth of data complexity
• For comparison:
  – Human memory ~ a few hundred MB
  – Human genome < 1 GB
  – 1 TB ~ 2 million books
  – Library of Congress (print only) ~ 30 TB
[Figure: growth curves labeled "CCDs" and "Glass", 1970 - 2000, log scale from 0.1 to 1000; doubling time t ≈ 1.5 yrs]

Exponential Growth in Data Volumes and Complexity
• TB’s to PB’s of data, 10^8 - 10^9 sources, 10^2 - 10^3 parameters/source

Multi-wavelength data fusion leads to a more complete, less biased picture (also: multi-scale, multi-epoch, …)
[Images: the Crab and a star-forming complex; panels labeled "Visible + X-ray" and "Radio + IR"]
• Understanding of complex phenomena requires complex data!
• Numerical simulations are also producing many TB’s of very complex “data”
• Data + Theory = Understanding

The Archive Archipelago
• As the data sets kept increasing, a number of archives, data depositories, and digital library services were created
• All of them are mission-, domain-, or observatory-specific; distinct and independent scientifically, technologically, and institutionally; and heterogeneous in look-and-feel, usage, etc.
  – There was considerable replication of effort
  – There was some functional redundancy
  – There was almost no interoperability
  – Some standards had been generally adopted (e.g., FITS)
• All of them were primarily designed for single-object (or single-pointing) queries, and are thus inherently unsuitable for the science enabled by massive and complex data sets
• The next step was clearly to connect them in a functional manner, and to develop interoperability standards, formats, etc.
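As an illustration of what a shared interoperability standard buys, here is a minimal Python sketch that builds requests for the IVOA Simple Cone Search protocol. This is one of the IVOA standards, although these slides do not name it. Any compliant archive accepts the same HTTP GET parameters: RA and DEC (an ICRS position in decimal degrees) and SR (a search radius in degrees), and it returns a VOTable. The service base URLs below are hypothetical placeholders. The sketch only constructs the request URLs and does not fetch or parse any results.

```python
from urllib.parse import urlencode


def cone_search_url(base_url: str, ra_deg: float, dec_deg: float, radius_deg: float) -> str:
    """Build a Simple Cone Search request URL (RA, DEC, SR in decimal degrees)."""
    # Basic range checks on the ICRS position and the search radius.
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError(f"RA out of range: {ra_deg}")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError(f"Dec out of range: {dec_deg}")
    if radius_deg <= 0.0:
        raise ValueError(f"search radius must be positive: {radius_deg}")
    # A base URL may already carry a query string (e.g. selecting a catalog),
    # so pick the separator accordingly.
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return base_url + sep + urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})


# Hypothetical services: one query shape fits every compliant archive.
SERVICES = [
    "https://optical-archive.example.org/scs",
    "https://xray-archive.example.org/cone?cat=sources&",
]

if __name__ == "__main__":
    # Roughly the position of the Crab Nebula, with a 0.1 degree radius.
    for url in SERVICES:
        print(cone_search_url(url, 83.633, 22.0145, 0.1))
```

The same (RA, DEC, SR) triple is sent unchanged to archives run by different missions. This is the kind of uniformity that the isolated archives in the "archipelago" lacked.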
The Virtual Observatory Concept
• A complete, dynamical, distributed, open research environment for the new astronomy with massive and complex data sets
  – Provide and federate content (data, metadata), services, standards, and analysis/compute services
  – Develop and provide data exploration and discovery tools
  – Not just the archives!
  – A part of a broader Cyber-Infrastructure and e-Science movement

From Traditional to Survey to VO Science
[Diagram comparing "Traditional" and "Survey-Based" workflows; boxes include Telescope, Survey Telescope, Archive, Data, Analysis, Data Mining, Target Selection, Follow-Up Telescopes, Results, and "Another Survey/Archive?"]
• Survey-based science has been highly successful, but it is inherently limited by the information content of individual sky surveys …
• What comes next, beyond survey science, is VO science

A Systemic View of the VO-Based Science
[Diagram with boxes: Primary Data Providers (Surveys, Observatories, Missions); Survey and Mission Archives; Secondary Data Providers; VO Data Services; Data Mining and Analysis, Target Selection; Follow-Up Telescopes and Missions; Results; Digital libraries]
• VO connects the whole system of astronomical research

A Brief History of the VO Concept
• Early (pre-web!) ideas were already present in the “Astrophysics Data System” (only the digital library part survives)
• The concept developed through the 1990s, mainly from large digital sky surveys (DPOSS, SDSS, …), with discussions at conferences and workshops in the late 1990s
• Top recommendation in the “small projects” category in the NAS Decadal Astronomy & Astrophysics survey (the McKee-Taylor report), 2001
• The first major VO conference at Caltech in 2000; the NVO White Paper
• National Virtual Observatory Science Definition Team, 2001 - 2002
• ESO conferences, 2001 - 2002
• Vigorous international efforts, coordinated via the International VO Alliance (IVOA)

VO Development and Status
• NSF-funded framework development project (2001 - 2008): the U.S. National Virtual Observatory (NVO)
• Now entering a facility regime: the Virtual Astronomical Observatory (VAO)
• Joint funding by the NSF and NASA
• Work largely done in the existing data archives, and thus very data-centric
• Vigorous international efforts (IVOA)
http://us-vo.org   http://ivoa.net

Scientific Roles and Benefits of a VO
• Facilitate science with massive data sets (observations and theory/simulations): an efficiency amplifier
• Provide added value from federated data sets (e.g., multi-wavelength, multi-scale, multi-epoch …)
  – Discover the knowledge that is present in the data but can be uncovered only through data fusion
• Enable and stimulate some qualitatively new science with massive data sets (not just old-but-bigger)
• Optimize the use of expensive resources (e.g., space missions, large ground-based telescopes, computing …)
• Provide R&D drivers, application testbeds, and stimulus to the partnering disciplines (CS/IT, statistics …)

VO Represents a New Type of Scientific Organization for the Era of Information Abundance
• It is not yet another data center, archive, mission, or traditional project
• It does not fit into any of the usual organizational structures
  – It is inherently distributed and web-centric
  – It is fundamentally based on a rapidly developing technology (IT/CS)
  – It transcends the traditional boundaries between different wavelength regimes and agency domains
  – It has an unusually broad range of constituents and interfaces
  – It is inherently multidisciplinary

Broader and Societal Benefits of a VO
• Professional Empowerment: scientists and students anywhere with an internet connection would be able to do first-rate science
  – A broadening of the talent pool in astronomy, and a democratization of the field
• Interdisciplinary Exchanges:
  – The challenges facing the VO are common to most sciences and other fields of modern human endeavor
  – Intellectual cross-fertilization, feedback to IT/CS
• Education and Public Outreach:
  – Unprecedented opportunities in terms of content and broad geographical and societal range, at all levels
  – Astronomy as a magnet for CS/IT education
“Weapons of Mass Instruction”

VO Education and Public Outreach
• Microsoft’s WorldWide Telescope and Google Sky use DSS, SDSS, HST data, etc., for easy sky browsing

VO Functionality Today
What we did so far:
• Lots of progress on interoperability, standards, etc.
• An incipient data grid of astronomy
• Some useful web services
• Community training, EPO
What we did not do (yet):
• Significant data exploration and mining tools
  – That is where the science will come from!
  – Thus, little VO-enabled science so far
  – Thus, a slow community buy-in
• Development of powerful, usable knowledge discovery tools should be a key priority

An Evolving Sociology
• We have transitioned from the data poverty regime into an era of exponential data abundance
  – Most astronomers do not seem to fully realize this
  – Proprietary periods should be rethought: are there other modes of data access, other “currencies” for data rights, different scenarios?
  – Data are cheap, but expertise is expensive (and creativity is priceless)
• Telescopes are just the hardware needed to generate the data, and data are just incidental to our real mission, which is knowledge creation
  – When the data and the exploration tools are on the web, the value of owning large facilities should be rethought
  – Computers are (relatively) cheap, but software is expensive, especially if you are not approaching it in a smart way

Information Technology → New Science
• The information volume grows exponentially
  – Most data will never be seen by humans!
  – The need for data storage, network, database-related technologies, standards, etc.
• Information complexity is also increasing greatly
  – Most data (and data constructs) cannot be comprehended by humans directly!
  – The need for data mining, KDD, data understanding technologies, hyperdimensional visualization, AI/machine-assisted discovery …
• We need to create a new scientific methodology on the basis of applied CS and IT
• VO is the framework to effect this for astronomy

Some Readings:
• A quick summary:
  – “Virtual Observatory: From Concept to Implementation”, Djorgovski, S.G., & Williams, R. 2005, A.S.P. Conf. Ser. 345, 517, available as http://arXiv.org/abs/astro-ph/0504006
• The original VO White Paper:
  – “Toward a National Virtual Observatory: Science Goals, Technical Challenges, and Implementation Plan”, in Virtual Observatories of the Future, A.S.P. Conf. Ser. 225, 353, available as http://arXiv.org/abs/astro-ph/0108115
• The NVO SDT report, from http://www.us-vo.org/sdt
• Many other good documents are available at http://us-vo.org (especially the summer school presentations)
• Technical documents at http://www.ivoa.net