Databases@MPA, access methods and plans

With contributions from
• JHU: Alex Szalay, Jan Vanderberg
• MPA: Jeremy Blaizot, Jarle Brinchmann, Guinevere Kauffmann, Anja von der Linden, Ben Panter, Guo Qi, Volker Springel, Vivienne Wild

Toledo, 2006-02-25
Databases @ MPA

Last year, Budapest
• Presented milli-Millennium halo merger tree database
• Requests:
  – More properties (lambda, ...) ✗
  – Galaxies ✓
  – Correlation with environment (galaxies in voids) ✓
  – Millennium
• Why use databases? Ask Alex.

Current status
• milli-Millennium
  – Galaxies added: merger trees, links to their parent halos
  – Density field at various smoothings
  – Updated web site (demo)
• Millennium subset
  – Subset (~2%, 10x milli-Mil) of halo and galaxy trees
  – z=0 density field
• Millennium
  – Halo trees in database (proprietary)
  – SAM galaxies under way (settle on model etc.)
  – Density fields at all z will be added: 1056964608 rows
• Durham
  – milli-Millennium mirror (Postgres)
  – Durham halo tree and galaxy catalogues

Other databases
• ROSAT: source catalogues and RASS photons (~100 million)
• SDSS peripherals
  – SDSS_MPA (Brinchmann, Kauffmann, Tremonti et al.)
  – MOPED (Ben Panter)
  – SDSS_PCA (Vivienne Wild et al.)
• GalICS (Jeremy Blaizot)
• HEALPix all-sky maps (Alex Szalay, Tony Banday)
  – WMAP (3-year data soon!)
  – extinction maps
  – radio maps (Bonn)
  – ROSAT background (hopefully)

Access
• Public: http://www.g-vo.org/mpasims
• Local web apps to Millennium, BESTDR3 and peripherals: http://www.g-vo.org/sdssdr3/
• Public web browser queries limited (1 min, 10000 rows)
• Local databases + web apps less limited

Streaming
• Query results temporarily buffered on server: memory
• Streaming queries: faster, less limited (only timeout)
• Access:
  – IDL (with Ben Panter)
    • wget --http-user=*** --http-password=*** -O localfile.csv "http://www.g-vo.org/sdssdr3/DBQueryStream?SQL=select * from moped..agebin"
    • GUI asking for username/password
    • Interprets CSV stream and turns it into IDL components
  – TOPCAT

Plans: Millennium
• Millennium:
  – Tune database
    • 750000000 halos
    • N x 1000000000 galaxies
    • 63 x 256^3 density field grid cells
  – More halo properties (shape, λ, ...)
  – More galaxy catalogues
    • different parameters
    • different algorithms (GalICS, Durham, ...)
  – Light cone mock catalogues
  – Galaxy spectra (+ PCA)
  – Links to SDSS mirror and peripherals
  – Proper metadata handling (à la SkyServer)
  – "SAM online"
  – Move webapps to MPA
  – Use JHU services, install CAS jobs

Plans: SDSS mirror + peripherals
• Make mirror web site public
• Upgrade SDSS mirror to DR4 …
• Stabilize, document, publish SDSS peripherals
• Proper metadata handling
• Links to Millennium
• Personal databases: MyDB (à la SkyServer)
• Add logos

Theory VO: spectra
• Combine theory and observations
• Example: query-by-example on theory spectra
• Find similar spectra, and from these the actual galaxy formation history
• Chi-squared on all stored spectra?
  Slow, requires storing all of them
• Idea (not original, see HVO/JHU talks): use PCA to compress data

PCA
• Need training sample of theory spectra to create eigenspectra
• Project all spectra
• Store PCA amplitudes in DB
• Provide web service:
  – Upload (observational) spectrum (IVOA SSA/SED)
  – Project onto theory eigenspectra
  – Use amplitudes as parameters in query for "nearby" amplitudes
  – Return corresponding theory spectra
  – Return corresponding galaxy formation histories, or their halos, or their environment …

Issues
• Dealing with errors, gaps: "gappy PCA" (Connolly & Szalay)
• Normalization:
  – Incoming spectrum in general comes from a very different dataset, needs common normalization
  – Incoming set will have gaps, errors
  – Ad hoc normalization possible (and works quite well)
• Indexing of complex multi-dimensional point set for quick k-nearest-neighbours search (Voronoi? See Laszlo's work)

Normalized gappy PCA
• Fit normalization factor at same time as PCA amplitudes.
Model:
• Minimize (over a_i and N):

So far
• Ran PCA on BC03 stochastic bursts (Vivienne)
• On first GalICS+milli-Millennium spectra (Jeremy)
• Projected SDSS spectra on both
• Defined a PCA data model/schema
• Stored PCAs in database
• TOPCAT

PCA data model (RDB schema available)
[Figure: UML class diagram of the PCA data model. Classes shown: PCADecompositionAlgorithm, PCARun, SpectrumCatalogue, PCAPreProcessing, PCAEigenSpectrum, PCASpectrum, Spectrum, PCAProjectionRun, PCAAmplitudes, PCAProjectionAlgrithm, PhotometryPoint. Attributes and association roles legible in the diagram include restRedshift : double, pcaRank : int, assumedRedshift : double, normalization : double, redshiftShift : double, amplitudes : double, lambda, mean, variance, wavelengthMask, featureMask, redshift, target. PhotometryPoint lists lambda, bin, flux, error.]

[Figure: milliMil-GalICS, PC1 vs PC2; Voronoi tessellation]

Issues for query-by-example
• Overlap quite good, but good enough?
• GalICS spread less than SDSS.
• BC03 comparable with SDSS, but different slope.
• Systematics
  – Model:
    • physics very preliminary (see Blaizot & de Lucia?)
    • resolution effects
  – Preprocessing SDSS galaxies
    • Rebinning: different algorithms give comparable results
    • (slightly) wrong redshift? Can be easily simulated
  – Projection algorithm: normalization does not affect outcome
  – Observational systematics: use virtual telescope (+virtual spectrograph) to test on the theory spectra.
    Easier to blow up simulation than to shrink observation cloud

Comments
• Millennium database being used for science projects (Guo Qi)
• SDSS peripherals used for science projects (see Vivienne’s talk, Ben Panter)
• Use of MyDB for debugging and testing (Jeremy)
• Please give comments, feedback.
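
Illustration: normalized gappy PCA projection and amplitude lookup

The projection and "nearby amplitudes" steps of the proposed web service can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used at MPA. The fitted model is an assumption, since the slide's equations did not survive: here the observed flux is taken to be N * (mean + sum_i a_i * e_i), which makes the fit linear in N and in b_i = N * a_i. Gaps are handled by giving those pixels zero weight. The function names, the synthetic data, and the brute-force neighbour search (a stand-in for an indexed search such as the Voronoi approach mentioned in the talk) are all hypothetical.

```python
import numpy as np


def normalized_gappy_pca(flux, weights, mean, eigenspectra):
    """Fit flux ~ N * (mean + sum_i a_i * eigenspectra[i]) by weighted least squares.

    ASSUMED model form (the original slide equation is not available).
    weights: per-pixel 1/sigma^2, with 0 marking a gap.
    Returns (amplitudes a, normalization N).
    """
    # Pixels with zero weight may hold NaN; replace them so they cannot leak in.
    flux = np.where(weights > 0, flux, 0.0)
    # Columns: the mean spectrum (coefficient N), then each eigenspectrum
    # (coefficient b_i = N * a_i). Shape: (n_pix, 1 + k).
    design = np.column_stack([mean, eigenspectra.T])
    sqrt_w = np.sqrt(weights)
    coeffs, *_ = np.linalg.lstsq(design * sqrt_w[:, None], flux * sqrt_w, rcond=None)
    norm = coeffs[0]
    return coeffs[1:] / norm, norm


def nearest_amplitudes(query, stored, k=3):
    """Return indices of the k stored amplitude vectors closest to query.

    Brute-force Euclidean search; a database would use a spatial index instead.
    """
    dist = np.linalg.norm(stored - query, axis=1)
    return np.argsort(dist)[:k]


# Synthetic demonstration (all numbers are made up for illustration).
rng = np.random.default_rng(0)
n_pix, n_comp = 200, 3
q, _ = np.linalg.qr(rng.normal(size=(n_pix, n_comp)))
eigenspectra = q.T                                  # (n_comp, n_pix), orthonormal rows
mean = 1.0 + 0.1 * np.sin(np.linspace(0.0, 6.0, n_pix))
true_a = np.array([0.5, -0.2, 0.1])
true_n = 2.5

observed = true_n * (mean + true_a @ eigenspectra)
weights = np.ones(n_pix)
weights[50:80] = 0.0                                # a gap in the observed spectrum
observed[50:80] = np.nan

a_fit, n_fit = normalized_gappy_pca(observed, weights, mean, eigenspectra)

# A stand-in for the table of stored theory amplitudes; row 5 is the true match.
stored = rng.normal(size=(100, n_comp))
stored[5] = true_a
print("fitted normalization:", n_fit)
print("fitted amplitudes:", a_fit)
print("nearest stored rows:", nearest_amplitudes(a_fit, stored, k=3))
```

Because the normalization is fitted together with the amplitudes, the returned amplitudes do not depend on the overall flux scale of the incoming spectrum, so they can be compared directly with amplitudes stored for the theory spectra.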