Extremely Large Databases in Astronomy: LSST
Extremely Large Databases Workshop, SLAC, October 25, 2007
Kian-Tat Lim, Stanford Linear Accelerator Center

Outline
    Why?
    What?
    How?
    What does this mean?

LSST Overview
    Proposed telescope to be built in Chile

Telescope
    3.2 gigapixel camera
    8.4 meter diameter mirror

Survey
    Wide
    Fast
    Deep

Dark Matter and Energy
    Photo: J. A. Tyson, W. Colley, E. L. Turner, and NASA

Variable Objects

Transient Objects

Moving Objects
    Photo: D. Roddy, Lunar and Planetary Institute

Outline
    Why?
    What?
    How?
    What does this mean?

Data Challenge
    Images
    Database

Data Locations
    Mountain
    Base Camp
    Archive Center
    Data Access Center

Non-Image Data In Database
    Moving Objects Catalog
    Object Catalog
    Source Catalog
    Metadata
    Provenance
    Statistics
    Summaries
    Difference Image Source Catalog
    Calibration
    Engineering and Facility Database

Sagans of Rows
    49 billion objects
    2.8 trillion sources

Lots of Columns
    269 columns/Object
    56 columns/Source

Denormalization
    Load
    Query

Database Size
    Grows to 14 PB with indices

Managing The Database
    Similar to commercial analytical data warehouses

Comparisons
    [Chart; labeled points: LSST, Wal-mart, BaBar, SDSS (~0.02), AT&T; annotation: "Google and few others are here"]
    We have 12 years to reach today’s commercial limits
    Likely growth faster than linear
    * All numbers based on publicly available data

Outline
    Why?
    What?
    How?
    What does this mean?

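The row and column counts quoted on the "Sagans of Rows" and "Lots of Columns" slides can be combined into a rough sense of scale. The sketch below uses only those four figures. The function name and the idea of counting "cells" (rows times columns) are illustrative assumptions, not something the slides state, and the result says nothing about bytes on disk, since the slides give no column widths.

```python
# Back-of-envelope arithmetic from the figures quoted on the slides.
N_OBJECTS = 49 * 10**9        # "49 billion objects"
N_SOURCES = 28 * 10**11       # "2.8 trillion sources"
COLS_PER_OBJECT = 269         # "269 columns/Object"
COLS_PER_SOURCE = 56          # "56 columns/Source"


def catalog_scale():
    """Return rough scale figures for the Object and Source catalogs.

    'Cells' (rows x columns) is a hypothetical measure used only for
    illustration; it is not a quantity named in the talk.
    """
    object_cells = N_OBJECTS * COLS_PER_OBJECT
    source_cells = N_SOURCES * COLS_PER_SOURCE
    return {
        "sources_per_object": N_SOURCES / N_OBJECTS,
        "object_cells": object_cells,
        "source_cells": source_cells,
        "source_share_of_cells": source_cells / (object_cells + source_cells),
    }


if __name__ == "__main__":
    for name, value in catalog_scale().items():
        print(f"{name}: {value:.4g}")
```

On these numbers each object has roughly 57 sources on average, and the Source catalog accounts for most of the cells even though it has far fewer columns per row.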
Reliability
    Duplicate
    Buffer
    Replicate
    Backup
    Mirror

Integrity
    Never modify raw data
    Provenance
    Reprocessing

Scalability
    Horizontally scalable
    Portable code

    Specialized file system
    Database servers
    Map/Reduce + SQL

Outline
    Why?
    What?
    How?
    What does this mean?

What Can Vendors Do?
    Mine!
    Mining

What Can Vendors Do?
    Relax

What Can Vendors Do?
    Maintain
    Maintenance

Summary
    Many issues common to extremely large DBs
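
The slides name "Horizontally scalable" and "Map/Reduce + SQL" without saying how the two are combined. One common reading, offered here only as a minimal sketch, is to run the same SQL query on every partition of a table (map) and then merge the partial aggregates (reduce). Everything in the example is an assumption for illustration: the toy `Source` table with `objectId` and `flux` columns, the helper names, and in-memory SQLite standing in for real database servers.

```python
import sqlite3


def make_partition(rows):
    """Create an in-memory SQLite database holding one partition of a toy Source table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Source (objectId INTEGER, flux REAL)")
    conn.executemany("INSERT INTO Source VALUES (?, ?)", rows)
    return conn


def map_partition(conn):
    """Map step: run the same SQL on one partition, returning partial aggregates."""
    return conn.execute(
        "SELECT objectId, COUNT(*), SUM(flux) FROM Source GROUP BY objectId"
    ).fetchall()


def reduce_partials(partials):
    """Reduce step: merge per-partition (count, sum) pairs into a mean flux per object."""
    totals = {}
    for rows in partials:
        for object_id, n, flux_sum in rows:
            count, total = totals.get(object_id, (0, 0.0))
            totals[object_id] = (count + n, total + flux_sum)
    return {oid: total / count for oid, (count, total) in totals.items()}


# Two hypothetical partitions; sources for the same object may land in either one.
partitions = [
    make_partition([(1, 10.0), (2, 4.0)]),
    make_partition([(1, 14.0), (2, 6.0), (3, 5.0)]),
]
mean_flux = reduce_partials(map_partition(p) for p in partitions)
print(mean_flux)
```

Each partition returns counts and sums rather than means, because partial means cannot be merged correctly without their counts.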