Gerard Lemson

Databases@MPA,
access methods and plans
With contributions from
• JHU: Alex Szalay, Jan Vandenberg
• MPA: Jeremy Blaizot, Jarle Brinchmann,
Guinevere Kauffmann, Anja von der Linden,
Ben Panter, Guo Qi, Volker Springel,
Vivienne Wild
Toledo, 2006-02-25
Databases @ MPA
Last year, Budapest
• Presented milli-Millennium halo merger tree database
• Requests:
– More properties (lambda, ...) X
– Galaxies V
– Correlation with environment (galaxies in voids) V
– Millennium
• Why use databases ? Ask Alex.
Current status
• milli-Millennium
– Galaxies added: merger trees, links to their parent halos
– Density field at various smoothings
– Updated web site (demo)
• Millennium subset
– Subset (~2%, 10x milli-Mil) of halo and galaxy trees
– Z=0 density field
• Millennium
– Halo trees in database (proprietary)
– SAM galaxies under way (settle on model etc)
– Density fields at all Z will be added: 1056964608 rows
• Durham
– milli_Millennium mirror (Postgres)
– Durham halo tree and galaxy catalogues
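The row count quoted for the density fields can be checked by hand. A minimal sketch, assuming one database row per grid cell and 63 grids of 256^3 cells (the grid size quoted under the Millennium plans; reading 63 as the number of stored redshifts is an assumption):

```python
# Assumption: 63 density grids (e.g. one per stored redshift), each with
# 256^3 cells, and one database row per grid cell.
N_GRIDS = 63
CELLS_PER_SIDE = 256

cells_per_grid = CELLS_PER_SIDE ** 3
total_rows = N_GRIDS * cells_per_grid

print(cells_per_grid)  # 16777216
print(total_rows)      # 1056964608
```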
Other databases
• ROSAT: source catalogues and RASS photons (~100
million)
• SDSS Peripherals
– SDSS_MPA (Brinchmann, Kauffmann, Tremonti et al.)
– MOPED (Ben Panter)
– SDSS_PCA (Vivienne Wild et al)
• GalICS (Jeremy Blaizot)
• HEALPix all sky maps (Alex Szalay, Tony Banday)
– wmap (3 year data soon !)
– extinction maps
– radio maps (Bonn)
– ROSAT background (hopefully)
Access
• Public: http://www.g-vo.org/mpasims
• Local web apps to Millennium, BESTDR3 and
peripherals: http://www.g-vo.org/sdssdr3/
• Public web browser queries are limited (1 min, 10000 rows)
• Local databases + web apps less limited
Streaming
• Query results temporarily buffered on server:
memory
• Streaming queries: faster, less limited (only
timeout)
• Access:
– IDL (with Ben Panter)
• wget --http-user=*** --http-password=*** -O localfile.csv "http://www.g-vo.org/sdssdr3/DBQueryStream?SQL=select * from moped..agebin"
• GUI asking for username/password
• Interprets CSV stream, turned into IDL components
– TOPCAT
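The same streaming access can be sketched from a script. A minimal example, assuming the endpoint from the wget call, HTTP basic authentication (suggested by the --http-user/--http-password options, but not confirmed here), and a CSV stream whose first line holds the column names. The helper names build_stream_url, parse_csv_stream and stream_query are hypothetical:

```python
import csv
import io
import urllib.parse
import urllib.request

# Endpoint taken from the wget example on the slide.
STREAM_URL = "http://www.g-vo.org/sdssdr3/DBQueryStream"


def build_stream_url(sql, base=STREAM_URL):
    """Return the streaming URL with the SQL statement percent-encoded."""
    return base + "?" + urllib.parse.urlencode({"SQL": sql})


def parse_csv_stream(lines):
    """Yield one dict per CSV row; assumes the first line holds column names."""
    yield from csv.DictReader(lines)


def stream_query(sql, user, password):
    """Hypothetical helper: run a query and yield rows as they arrive.

    Assumes the server accepts HTTP basic authentication.
    """
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, STREAM_URL, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(manager))
    with opener.open(build_stream_url(sql)) as response:
        # Decode the byte stream lazily so rows are parsed while downloading.
        text = io.TextIOWrapper(response, encoding="utf-8", newline="")
        yield from parse_csv_stream(text)
```

Usage would look like `for row in stream_query("select * from moped..agebin", user, password): ...`. Because rows are yielded one at a time, the client never holds the full result in memory.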
Plans: Millennium
• Millennium:
– Tune database
• 750000000 halos
• N x 1000000000 galaxies
• 63 x 256^3 density field grid cells
– More halo properties (shape, λ, ...)
– More galaxy catalogues
• different parameters
• different algorithms (GalICS, Durham, ...)
– Light cone mock catalogues
– Galaxy spectra (+ PCA)
– Links to SDSS mirror and peripherals
– Proper metadata handling (à la SkyServer)
– "SAM online"
– Move webapps to MPA
– Use JHU services, install CAS jobs
Plans: SDSS mirror + peripherals
• Make mirror web site public
• Upgrade SDSS mirror to DR4 …
• Stabilize, document, publish SDSS
peripherals
• Proper metadata handling
• Links to Millennium
• Personal databases: MyDB (à la SkyServer)
• Add logos
Theory VO: spectra
• Combine theory and observations
• Example: query-by-example on theory
spectra
• Find similar spectra, from these the actual
galaxy formation history
• Chi-squared on all stored spectra ? Slow,
requires storing all of them
• Idea (not original, see HVO/JHU talks): use
PCA to compress data
PCA
• Need training sample of theory spectra to
create eigenspectra
• Project all spectra
• Store PCA amplitudes in DB
• Provide web service:
– Upload (observational) spectrum (IVOA SSA/SED)
– Project onto theory eigenspectra
– Use amplitudes as parameters in query for
“nearby” amplitudes
– Return corresponding theory spectra
– Return corresponding galaxy formation histories,
or their halos, or their environment …
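The steps above can be sketched end to end with NumPy. This is an illustration under stated assumptions, not the actual service: the data are synthetic, the eigenspectra come from an SVD of the mean-subtracted training set, and the "nearby" query is a brute-force Euclidean search. The function names are hypothetical.

```python
import numpy as np


def fit_eigenspectra(training, n_components):
    """Return (mean spectrum, eigenspectra) from a (n_spectra, n_lambda) array."""
    mean = training.mean(axis=0)
    # Rows of vt are orthonormal directions ordered by decreasing variance.
    _, _, vt = np.linalg.svd(training - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(spectra, mean, eigenspectra):
    """Return PCA amplitudes, one row per input spectrum."""
    return (np.atleast_2d(spectra) - mean) @ eigenspectra.T


def nearest(amplitudes, query, k):
    """Brute-force search: indices of the k stored amplitude vectors closest to query."""
    distances = np.linalg.norm(amplitudes - query, axis=1)
    return np.argsort(distances)[:k]


# Synthetic stand-in for a theory catalogue: 200 spectra on 50 wavelength bins.
rng = np.random.default_rng(0)
theory = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 50))

mean, eigenspectra = fit_eigenspectra(theory, n_components=5)
stored = project(theory, mean, eigenspectra)   # amplitudes the DB would hold

uploaded = theory[17]                          # stands in for an uploaded spectrum
query = project(uploaded, mean, eigenspectra)[0]
matches = nearest(stored, query, k=3)          # row ids of the closest theory spectra
print(matches)
```

The returned indices play the role of database keys: they point at the stored theory spectra, and through them at formation histories, halos or environment. A brute-force search scales linearly with catalogue size, which is why an index on the amplitude space matters for large tables.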
Issues
• Dealing with errors, gaps: “gappy PCA”
(Connolly & Szalay)
• Normalization:
– incoming spectrum in general from very different
dataset, needs common normalization
– Incoming set will have gaps, errors
– Ad hoc normalization possible (and works quite well)
• Indexing of complex multi-dimensional point
set for quick k-nearest-neighbours search
(Voronoi ? See Laszlo's work)
Normalized gappy PCA
• Fit normalization factor at same time as PCA
amplitudes. Model:
• Minimize (over a_i and N):
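The formulas on this slide did not survive in the transcript. One form consistent with the bullets, stated here as an assumption rather than as the slide's own expression, is f(λ) ≈ N · [m(λ) + Σ_i a_i e_i(λ)], with m the mean spectrum and e_i the eigenspectra, minimizing Σ_λ w(λ) · (f(λ) − N · [m(λ) + Σ_i a_i e_i(λ)])² with w = 0 in the gaps. Substituting b_i = N · a_i makes this linear in (N, b_i), so a single weighted least-squares solve gives both. A minimal NumPy sketch with synthetic data and a hypothetical function name:

```python
import numpy as np


def fit_normalized_gappy(flux, weights, mean, eigenspectra):
    """Fit N and a_i in flux ~ N * (mean + sum_i a_i * eigenspectra[i]).

    weights are per-bin weights (0 in gaps, e.g. 1/sigma^2 elsewhere).
    Assumed model, not taken from the slide: with b_i = N * a_i the problem
    is an ordinary weighted linear least-squares fit for (N, b_1, ..., b_k).
    """
    design = np.vstack([mean, eigenspectra]).T          # (n_lambda, k + 1)
    root_w = np.sqrt(weights)
    coeffs, *_ = np.linalg.lstsq(design * root_w[:, None],
                                 flux * root_w, rcond=None)
    norm = coeffs[0]
    return norm, coeffs[1:] / norm


# Synthetic check with made-up numbers: 40 bins, 3 orthonormal eigenspectra.
rng = np.random.default_rng(1)
eigenspectra = np.linalg.qr(rng.normal(size=(40, 3)))[0].T
mean = 1.0 + 0.1 * rng.normal(size=40)
true_a = np.array([0.5, -0.2, 0.1])
true_norm = 7.0
flux = true_norm * (mean + true_a @ eigenspectra)

weights = np.ones(40)
weights[10:18] = 0.0            # a gap: these bins carry no information
flux[10:18] = 0.0

norm, amplitudes = fit_normalized_gappy(flux, weights, mean, eigenspectra)
print(norm, amplitudes)
```

In this noise-free test the fit recovers the input normalization and amplitudes despite the gap, because the zero-weight bins drop out of the least-squares system.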
So far
• Ran PCA on BC03 stochastic bursts
(Vivienne)
• On first GalICS+milli-Millennium spectra
(Jeremy)
• Projected SDSS spectra on both
• Defined a PCA data model/schema
• Stored PCAs in database
• TOPCAT
PCA data model (RDB schema available)
[UML class diagram; only the labels survive in this transcript]
Classes: PCADecompositionAlgorithm, PCARun, SpectrumCatalogue, PCAPreProcessing, PCAEigenSpectrum, PCASpectrum, Spectrum, PCAProjectionRun, PCAAmplitudes, PCAProjectionAlgrithm
Typed attributes: restRedshift : double, pcaRank : int, assumedRedshift : double, normalization : double, redshiftShift : double, amplitudes : double
Other attributes and association roles: algorithm, pcaDecomposition, catalogue, preprocessing, lambda, mean, variance, wavelengthMask, eigenSpectra, inputSpectra, featureMask, spectrum, redshift, target, spectra, amplitudes
Multiplicities shown on the associations: 1 and *
PhotometryPoint
-lambda
-bin
-flux
-error
milliMil-GalICS
PC1 vs PC2 Voronoi tessellation
Issues for query-by-example
• Overlap quite good, but good enough ?
• GalICS spread less than SDSS.
• BC03 comparable with SDSS, but different slope.
• Systematics
– Model:
• physics very preliminary (see Blaizot & de Lucia?)
• resolution effects
– Preprocessing SDSS galaxies
• Rebinning: different algorithms give comparable results
• (slightly) wrong redshift ? Can be easily simulated
– Projection algorithm: normalization does not affect outcome
– Observational systematics: use virtual telescope (+virtual
spectrograph) to test on the theory spectra.
Easier to blow up simulation than to shrink observation
cloud
Comments
• Millennium database being used for science
projects (Guo Qi)
• SDSS peripherals used for science projects
(see Vivienne’s talk, Ben Panter)
• Use of mydb for debugging and testing
(Jeremy)
• Please give comments, feedback.