Survey
The Sloan Digital Sky Survey
Alex Szalay
Department of Physics and Astronomy
The Johns Hopkins University

The Sloan Digital Sky Survey
A project run by the Astrophysical Research Consortium (ARC):
• The University of Chicago
• Princeton University
• The Johns Hopkins University
• The University of Washington
• Fermi National Accelerator Laboratory
• US Naval Observatory
• The Japanese Participation Group
• The Institute for Advanced Study
• Max Planck Inst, Heidelberg
SLOAN Foundation, NSF, DOE, NASA
Goal: To create a detailed multicolor map of the Northern Sky over 5 years, with a budget of approximately $80M
Data Size: 40 TB raw, 2 TB processed
Alex Szalay, JHU

Scientific Motivation
Create the ultimate map of the Universe: The Cosmic Genome Project!
Study the distribution of galaxies:
• What is the origin of fluctuations?
• What is the topology of the distribution?
Measure the global properties of the Universe:
• How much dark matter is there?
Local census of the galaxy population:
• How did galaxies form?
Find the most distant objects in the Universe:
• What are the highest quasar redshifts?

Cosmology Primer
The Universe is expanding: the galaxies move away from us => spectral lines are redshifted
  Hubble's law: $v = H_0 r$
The fate of the universe depends on the balance between gravity and the expansion velocity
  $\Omega$ = density/critical; if $\Omega < 1$, expand forever
Most of the mass in the Universe is dark matter, and it may be cold (CDM)
The spatial distribution of galaxies is correlated, due to small ripples in the early Universe
  $P(k)$: power spectrum

The 'Naught' Problem
What are the global parameters of the Universe?

Parameter     Meaning                     Range
$H_0$         the Hubble constant         55-75 km/s/Mpc
$\Omega_0$    the density parameter       0.25-1
$\Lambda_0$   the cosmological constant   0 - 0.7

Their values are still quite uncertain today...
Goal: measure these parameters with an accuracy of a few percent
High Precision Cosmology!
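The two relations on the Cosmology Primer slide can be tried out numerically. The sketch below evaluates Hubble's law and the density parameter for the quoted range of $H_0$. The slides do not give a formula for the critical density; the standard expression $\rho_c = 3H_0^2 / (8\pi G)$ is assumed here, and the function names and the sample distance of 100 Mpc are illustrative only.

```python
import math

# Physical constants in SI units.
G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
MPC_IN_M = 3.0857e22   # one megaparsec in metres


def recession_velocity(h0_km_s_mpc, distance_mpc):
    """Hubble's law v = H0 * r, returned in km/s."""
    return h0_km_s_mpc * distance_mpc


def critical_density(h0_km_s_mpc):
    """Critical density 3 H0^2 / (8 pi G) in kg/m^3 (assumed standard formula)."""
    h0_si = h0_km_s_mpc * 1000.0 / MPC_IN_M  # convert km/s/Mpc to 1/s
    return 3.0 * h0_si ** 2 / (8.0 * math.pi * G)


def density_parameter(density_kg_m3, h0_km_s_mpc):
    """Omega = density / critical density; Omega < 1 means expand forever."""
    return density_kg_m3 / critical_density(h0_km_s_mpc)


if __name__ == "__main__":
    # The slide quotes H0 between 55 and 75 km/s/Mpc.
    for h0 in (55.0, 75.0):
        v = recession_velocity(h0, 100.0)  # a galaxy 100 Mpc away (illustrative)
        rho_c = critical_density(h0)
        print(f"H0={h0:.0f}: v={v:.0f} km/s, rho_crit={rho_c:.2e} kg/m^3")
```

Because the critical density scales as $H_0^2$, the uncertainty in $H_0$ feeds directly into any estimate of $\Omega_0$ made from a measured mass density.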

The Cosmic Genome Project
The SDSS will create the ultimate map of the Universe, with much more detail than any other measurement before
[Figure panels: Gregory and Thompson 1978; deLapparent, Geller and Huchra 1986; daCosta et al. 1995; SDSS Collaboration 2002]

Area and Size of Redshift Surveys
[Figure: number of objects versus volume in Mpc^3 for SDSS photo-z, SDSS main, SDSS abs line, SDSS red, CfA+SSRS, 2dF, LCRS, SAPM, 2dFR and QDOT]

Clustering of Galaxies
We will measure the spectrum of the density fluctuations to high precision even on very large scales
The error in the amplitude of the fluctuation spectrum:

Year    Error
1970    x100
1990    x2
1995    ±0.4
1998    ±0.2
1999    ±0.1
2002    ±0.05

Relevant Scales
Distances measured in Mpc [megaparsec]
• 1 Mpc = 3 x 10^24 cm
• 5 Mpc = distance between galaxies
• 3000 Mpc = scale of the Universe
if >200 Mpc: fluctuations have a PRIMORDIAL shape
if <100 Mpc: gravity creates sharp features, like walls, filaments and voids
Biasing:
• conversion of mass into light is nonlinear
• light is much more clumpy than the mass

The Topology of Local Universe
Measure the Topology of the Universe
Does it consist of walls and voids or is it randomly distributed?

Finding the Most Distant Objects
Intermediate and high redshift QSOs
• Multicolor selection function.
• Luminosity functions and spatial clustering.
• High redshift QSOs (z>5).

Features of the SDSS
Special 2.5m telescope, located at Apache Point, NM
• 3 degree field of view.
• Zero distortion focal plane.
Two surveys in one:
• Photometric survey in 5 bands.
• Spectroscopic redshift survey.
Huge CCD Mosaic
• 30 CCDs 2K x 2K (imaging)
• 22 CCDs 2K x 400 (astrometry)
Two high resolution spectrographs
• 2 x 320 fibers, with 3 arcsec diameter.
• R=2000 resolution with 4096 pixels.
• Spectral coverage from 3900Å to 9200Å.
Automated data reduction
• Over 100 man-years of development effort (Fermilab + collaboration scientists)
Very high data volume
• Expect over 40 TB of raw data.
• About 2 TB processed products
• Data made available to the public

Apache Point Observatory
Located in New Mexico, near White Sands National Monument

The Telescope
Special 2.5m telescope
• 3 degree field of view
• Zero distortion focal plane
• Wind screen moved separately

The Photometric Survey
Northern Galactic Cap
• 5 broad-band filters (u', g', r', i', z')
• limiting magnitudes (22.3, 23.3, 23.1, 22.3, 20.8)
• drift scan of 10,000 square degrees
• 55 sec exposure time
• 40 TB raw imaging data -> pipeline -> 100,000,000 galaxies, 50,000,000 stars
• calibration to 2% at r'=19.8
• only done in the best seeing (20 nights/yr)
• pixel size is 0.4 arcsec, astrometric precision is 60 milliarcsec
Southern Galactic Cap
• multiple scans (> 30 times) of the same stripe
Continuous data rate of 8 Mbytes/sec

Survey Strategy
• Overlapping 2.5 degree wide stripes
• Avoiding the Galactic Plane (dust)
• Multiple exposures on the three Southern stripes

The Spectroscopic Survey
Measure redshifts of objects => distance
SDSS Redshift Survey:
• 1 million galaxies
• 100,000 quasars
• 100,000 stars
Two high throughput spectrographs
• spectral range 3900-9200 Å.
• 640 spectra simultaneously.
• R=2000 resolution.
Automated reduction of spectra
Very high sampling density and completeness
Objects in other catalogs also targeted

Optimal Tiling
• Fields have 3 degree diameter
• Centers determined by an optimization procedure
• A total of 2200 pointings
• 640 fibers assigned simultaneously

The Mosaic Camera
[Figure: the mosaic camera]

Photometric Calibrations
The SDSS will create a new photometric system: u' g' r' i' z'
Primary standards: observed with the USNO 40-inch telescope in Flagstaff
Secondary standards: observed with the SDSS 20-inch telescope at Apache Point, calibrating the SDSS imaging data

The Spectrographs
Two double spectrographs
• very high throughput
• two 2048x2048 CCD detectors
• mounted on the telescope
• light fed through slithead

The Fiber Feed System
• Galaxy images are captured by optical fibers lined up on the spectrograph slit
• Manually plugged during the day into Al plugboards
• 640 fibers in each bundle
• The largest fiber system today

First Light Images
Telescope: First light May 9th 1998
Equatorial scans

The First Stripes
Camera: 5 color imaging of >100 square degrees
• Multiple scans across the same fields
• Photometric limits as expected

[Images: NGC 2068, UGC 3214, NGC 6070]

The First Quasars
The four highest redshift quasars have been found in the first SDSS test data!

Methane/T Dwarf
Discovery of several new objects by SDSS & 2MASS (June 1999)
[Figure: SDSS T-dwarf]

Detection of Gravitational Lensing
28,000 foreground galaxies and 2,045,000 background galaxies in test data (McKay et al. 1999)

SDSS Data Flow
[Figure: SDSS data flow]

Distributed Collaboration
[Diagram: network links among the collaboration sites]
Fermilab, U. Washington, ESNET, U. Chicago, Institute for Advanced Study, VBNS, Japan, Apache Point Observatory, Princeton U.,
JHU, NMSU, USNO

Data Processing Pipelines
[Figure: data processing pipelines]

Concept of the SDSS Archive
• Operational Archive (raw + processed data)
• Science Archive (products accessible to users)
• Other Archives

SDSS Data Products

Product                  Contents                            Size
Object catalog           parameters of >10^8 objects         400 GB
Redshift catalog         parameters of 10^6 objects          1 GB
Atlas images             5 color cutouts of >10^8 objects    1.5 TB
Spectra                  in a one-dimensional form           60 GB
Derived catalogs         clusters, QSO absorption lines      20 GB
4x4 pixel all-sky map    heavily compressed                  60 GB

All raw data saved in a tape vault at Fermilab

Who will be using the archive?
Power Users
• sophisticated, with lots of resources
• research is centered around the archive data
• moderate number of very intensive queries
• mostly statistical, large output sizes
General Astronomy Public
• frequent, but casual lookup of objects/regions
• the archives help their research, but not central to it
• large number of small queries
• a lot of cross-identification requests
Wide Public
• browsing a 'Virtual Telescope'
• can have large public appeal
• need special packaging
• could be a very large number of requests

How will the data be analyzed?
The data are inherently multidimensional
  => positions, colors, size, redshift
Improved classifications result in complex N-dimensional volumes
  => complex constraints, not ranges
Spatial relations will be investigated
  => nearest neighbors
  => other objects within a radius
Data Mining: finding the 'needle in the haystack'
  => separate typical from rare
  => recognize patterns in the data
Output size can be prohibitively large for intermediate files
  => import output directly into analysis tools

Geometric Approach
The Main Problem:
• fast, indexed, complex searches of Terabytes in k-dim space
• searches are not necessarily parallel to the axes
  => traditional indexing (b-tree) does not work
Geometric Approach:
• Use the geometric nature of the k-dimensional data
• Quantize data into containers of 'friends': objects of similar colors, close on the sky, stored together
  => efficient cache performance
• Containers represent a coarse grained density map of the data
  multidimensional index tree: k-d tree + r-tree

Geometric Indexing
"Divide and Conquer" Partitioning

Attributes          Number      Indexing
Sky Position        3           Hierarchical Triangular Mesh
Multiband Fluxes    N = 5+      Split as k-d tree, stored as r-tree of bounding boxes
Other               M = 100+    Using regular indexing techniques

Sky coordinates
• Stored as Cartesian coordinates: projected onto a unit sphere
• Longitude and Latitude lines: intersections of planes and the sphere
• Boolean combinations: query polyhedron

Sky Partitioning
Hierarchical Triangular Mesh, based on octahedron

Hierarchical Subdivision
Hierarchical subdivision of spherical triangles, represented as a quadtree
In SDSS the tree is 5 levels deep: 8192 triangles

Result of the Query
[Figure: result of the query]

Magnitudes and Multicolor Searches
Galaxy fluxes
• large dynamic range
• errors divergent as $x \to 0$!
  $m = -2.5 \log_{10}(f / f_0) = -2.5 \log_{10} x$
  $\sigma_m^2 = \left( \frac{\partial m}{\partial x} \right)^2 \sigma_x^2 \propto \frac{\sigma_x^2}{x^2}$
For multicolor magnitudes the error contours can be very anisotropic and skewed: extremely poor localization!
But: this is an artifact of the logarithm at zero flux; in flux space the object is well localized

Novel Magnitude Scale
  $\mu = -\frac{2.5}{\ln 10} \sinh^{-1}\left( \frac{f}{b} \right) + c$
b: softness
c: set to match normal magnitudes
Advantages:
• monotonic
• degrades gracefully
• objects have small error ellipse
• unified handling of detections and upper limits!
Disadvantages:
• unusual
(Lupton, Gunn and Szalay, AJ 99)

Flux Indexing
• Split along alternating flux directions
• Create balanced partitions
• Store bounding boxes at each step
• Build a 10-12 level tree in each triangle

How to build compact cells?
The SDSS will measure fluxes in 5 bands
  => asinh magnitudes
Axis-parallel splits in median flux, in 8 separate zones in Galactic latitude
  => 5 dimensional bounding boxes
The fluxes are strongly correlated
  => 2+ dimensional distribution of typical objects
  => widely scattered rare objects
  => large density contrasts
Therefore: first create a local density and split on its value (Csabai et al. 96)
[Figure labels: typical (98%), rare (2%)]

Coarse Grained Design
[Diagram: User Interface, Analysis Engine, Query Support, Data Warehouse, Archive]

Distributed Implementation
[Diagram: User Interface and Analysis Engine connected to a Master SX Engine over an Objectivity Federation, with Slave engines each running Objectivity on RAID storage]

JHU Contributions
Fiber spectrographs: P. Feldman, A. Uomoto, S. Friedman, S. Smee
Management: T. Heckman, T. Poehler, A. Davidsen, A. Uomoto, A. Szalay
Science Archive: A. Szalay, A. Thakar, P. Kunszt, I. Csabai, Gy. Szokoly, A. Connolly, A.
Chaudhaury
A lot of help from Jim Gray, Microsoft

Processing Platforms
At Fermilab:
• 2 AlphaServer 8200 (data processing)
• 1 SGI Origin 2000 (data bases)
Archive at JHU:
• 1 AlphaServer 1000A (development)
• 10 Intel based servers w. LVD RAID
Software verified on Digital Unix, IRIX, Solaris, Linux

Exploring new methods
New spectral classification techniques
• galaxy spectra can be expressed as a superposition of a few (<5) principal components
  => objective classification of 1 million spectra!
Photometric redshifts
• galaxy colors systematically change with redshift; the SDSS photometry works like a 5-pixel spectrograph
  => $\Delta z = 0.05$, but with 100 million objects!
Measuring cosmological parameters
• before: data analysis was limited by small number statistics
• after: dominant errors are systematic (extinction)
  => new analysis methods are required!

Photometric redshifts
Multicolor photometry maps physical parameters (luminosity L, redshift z, spectral type T) => observed fluxes
Inversion: u', g', r', i', z' => z, L, T
Redshifts are statistical, with large errors: $\Delta z \approx 0.05$
The data set is huge, more than 100 million galaxies
Easy to subdivide into coarse z bins, and by type
  => study evolution
  => enormous volume: 1 Gpc^3

Measuring P(k)
Karhunen-Loeve transform: signal-to-noise eigenmodes of the redshift survey
• Optimal extraction of clustering signal
• Maximal rejection of systematic errors
(Vogeley and Szalay 96; Matsubara, Szalay and Landy 99)
Pilot project using the Las Campanas Redshift Survey with 22,000 galaxies
We simultaneously measure the values of the redshift-distortion parameter ($\beta = \Omega^{0.6}/b$), the normalization ($\sigma_8$) and the CDM shape parameter ($\Gamma = \Omega h$).

Parameter     North                     South                     Combined
$\beta$       $0.48^{+0.22}_{-0.20}$    $0.31^{+0.22}_{-0.19}$    $0.40^{+0.15}_{-0.14}$
$\sigma_8$    $0.82^{+0.06}_{-0.06}$    $0.75^{+0.05}_{-0.05}$    $0.78^{+0.04}_{-0.04}$
$\Gamma$      $0.15^{+0.05}_{-0.05}$    $0.14^{+0.05}_{-0.05}$    $0.14^{+0.03}_{-0.03}$

Trends
• Future dominated by detector improvements
• Moore's Law growth in CCD capabilities
• Gigapixel arrays on the horizon
• Improvements in computing and storage will track growth in data volume
• Investment in software is critical, and growing
[Figure: total area of 3m+ telescopes in the world in m^2 (Glass) and total number of CCD pixels in Megapix (CCDs), as a function of time, 1970-2000, on a logarithmic scale. Growth over 25 years is a factor of 30 in glass, 3000 in pixels.]

The Age of Mega-Surveys
The next generation of astronomical archives with Terabyte catalogs will dramatically change astronomy
• top-down design
• large sky coverage
• built on sound statistical plans
• uniform, homogeneous, well calibrated
• well controlled and documented systematics
The technology to acquire, store and index the data is here
• we are riding Moore's Law
Data mining in such vast archives will be a challenge, but possibilities are quite unimaginable
Integrating these archives into a single entity is a project for the whole community
  => National Virtual Observatory

New Astronomy: Different!
Systematic Data Exploration
• will have a central role in the New Astronomy
Digital Archives of the Sky
• will be the main access to data
Data "Avalanche"
• the flood of Terabytes of data is already happening, whether we like it or not!
Transition to the new
• may be organized or chaotic

NVO: The Challenges
Size of the archived data
• 40,000 square degrees is 2 trillion pixels
• One band: 4 Terabytes
• Multi-wavelength: 10-100 Terabytes
• Time dimension: few Petabytes
The development of
• new archival methods
• new analysis tools
• new standards (metadata, interchange formats)
Hardware/networking requirements
Training the next generation!
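The archive sizes quoted on the NVO slide can be checked with a few lines of arithmetic. The sketch below is only a plausibility check. The slide gives neither the pixel scale nor the storage per pixel, so 0.5 arcsec pixels and 2 bytes per pixel are assumed here because they roughly reproduce the quoted 2 trillion pixels and 4 Terabytes; the function names are illustrative.

```python
ARCSEC_PER_DEGREE = 3600.0


def pixel_count(area_sq_deg, pixel_arcsec):
    """Number of square pixels needed to tile an area given in square degrees."""
    pixels_per_degree = ARCSEC_PER_DEGREE / pixel_arcsec
    return area_sq_deg * pixels_per_degree ** 2


def size_terabytes(n_pixels, bytes_per_pixel):
    """Uncompressed size in Terabytes (1 TB = 10^12 bytes)."""
    return n_pixels * bytes_per_pixel / 1e12


if __name__ == "__main__":
    # Assumptions, not stated on the slide: 0.5 arcsec pixels, 2 bytes per pixel.
    n = pixel_count(40_000, 0.5)
    print(f"pixels: {n:.3e}")
    print(f"one band: {size_terabytes(n, 2):.1f} TB")
```

With the 0.4 arcsec pixel size quoted for the SDSS camera the same area would need about 3.2 trillion pixels, so the slide's round numbers presumably refer to a slightly coarser sampling.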

Summary
The SDSS project combines astronomy, physics, and computer science
• It promises to fundamentally change our view of the universe
• It will determine how the largest structures in the universe were formed
• It will serve as the standard astronomy reference for several decades
• Its 'virtual universe' can be explored by both scientists and the public
• Through its archive it will create a new paradigm in astronomy

www.sdss.org
www.sdss.jhu.edu