Spatial Indexing of Large Astronomical Databases
László Dobos, István Csabai, Márton Trencséni
ELTE, Hungary

Typical datasets
- ~200M data points
- multidimensional parameter space: real space, magnitudes, colors, redshift etc.
- over several hundred GB

Typical tasks
- Object types are classified by a set of linear inequalities in magnitude space (n-dimensional polyhedra)
- Compute a histogram of the whole parameter space
- Find similar objects
- Find clusters
- Compare the distributions of two very large datasets

Database servers
- Ideal for storing a large amount of data, even when the data structure is not very complex
- Optimized data access over file systems
- Clever caching methods
- MSSQL 2005 can be programmed efficiently

Problems
- In DB servers, evaluating data points against inequalities is done on a per-row basis
- Traditionally, points close in magnitude space may be far apart on the disk(s)
- When the expected result set is small, it is not optimal to run a table scan

Main idea
- Divide the parameter space into small cells
- Two main methods:
  - hierarchical: kd-tree, modified kd-tree etc.
  - adaptive: Voronoi tessellation
- Intersect the cells with the search polyhedra instead of checking every point
- Check on a per-point basis only when needed

Voronoi-Delaunay tessellation

kd-trees

Steps for creating the tessellation
1. For adaptive methods (like Voronoi): choose starting points (randomly?)
2. Calculate the cells and store them in the database
3. Look up the cell of each data point
4. Create a DB index on the cell ID (this orders the dataset on the disk)
5. Run queries

Technical details
- Done in MSSQL using the new SQL CLR features
- SQL CLR allows running programs within the process of the database server: very fast!
- Linear programming, Voronoi etc. libraries are ported to .NET/C#

Preliminary results
BoxTree vs. standard SQL query times
[Figure: query time [msec] as a function of # returned rows / # total rows; series: BoxTree durations, SQL durations]

Scientific ideas
- SDSS photometry (5D, 300M points)
  - finding all objects with similar colors
  - source classification
  - star-quasar separation
  - blue-red galaxy locus etc.
- Karhunen-Loève (PCA) coefficients of Bruzual-Charlot models (5-15D, 100K-100M points)
  - quick match with observed spectra

Scientific ideas cont.
- Magnitudes of spectral synthesis models (5-10D, 100K-100M points)
  - match with observations
  - photo-z
  - physical properties from photometry
  - check consistency of various models (BC-GRASIL)
- Multiresolution visualization of a large number of points
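To make the "Main idea" slide concrete, here is a minimal Python sketch of a cell-based polyhedron query. It is not the authors' implementation (which the slides say runs in MSSQL via SQL CLR with libraries ported to .NET/C#). The function names (`build_cells`, `classify`, `query`), the toy integer data, and the example cut are all assumptions made for illustration. The cells are kd-tree leaves built by median splits. Each cell is classified against the linear inequalities with a cheap corner test rather than the linear programming the slides mention, so a cell marked "partial" may in fact miss the polyhedron. The result is still exact, because points in such cells are checked one by one.

```python
import random

def build_cells(points, ids, lo, hi, leaf_size, cells, depth=0):
    """Split the box [lo, hi] at the median, cycling through the axes.
    Appends (lo, hi, point_ids) for every leaf cell to `cells`."""
    if len(ids) <= leaf_size:
        cells.append((tuple(lo), tuple(hi), ids))
        return
    axis = depth % len(lo)
    ids = sorted(ids, key=lambda i: points[i][axis])
    mid = len(ids) // 2
    split = points[ids[mid]][axis]
    left_hi = list(hi)
    left_hi[axis] = split
    right_lo = list(lo)
    right_lo[axis] = split
    build_cells(points, ids[:mid], lo, left_hi, leaf_size, cells, depth + 1)
    build_cells(points, ids[mid:], right_lo, hi, leaf_size, cells, depth + 1)

def classify(lo, hi, halfspaces):
    """Classify a box against inequalities a.x <= b given as (a, b) pairs.
    Uses the extreme values of a.x over the box (attained at corners)."""
    inside = True
    for a, b in halfspaces:
        smallest = sum(c * (l if c > 0 else h) for c, l, h in zip(a, lo, hi))
        largest = sum(c * (h if c > 0 else l) for c, l, h in zip(a, lo, hi))
        if smallest > b:
            return "outside"      # the whole box violates this inequality
        if largest > b:
            inside = False        # part of the box violates it
    return "inside" if inside else "partial"

def satisfies(p, halfspaces):
    """Per-point check of all inequalities."""
    return all(sum(c * x for c, x in zip(a, p)) <= b for a, b in halfspaces)

def query(points, cells, halfspaces):
    """Return matching point ids and the number of per-point checks made."""
    hits, tested = [], 0
    for lo, hi, ids in cells:
        kind = classify(lo, hi, halfspaces)
        if kind == "inside":
            hits.extend(ids)      # accept the whole cell without point checks
        elif kind == "partial":
            tested += len(ids)
            hits.extend(i for i in ids if satisfies(points[i], halfspaces))
    return sorted(hits), tested

# Toy data: 5000 points in 3D with integer coordinates (exact arithmetic).
random.seed(0)
pts = [tuple(random.randint(0, 999) for _ in range(3)) for _ in range(5000)]
cells = []
build_cells(pts, list(range(len(pts))), [0, 0, 0], [999, 999, 999], 64, cells)

# Hypothetical cut: x0 - x1 <= 100, x1 - x2 <= 50, x0 >= 200.
cut = [((1, -1, 0), 100), ((0, 1, -1), 50), ((-1, 0, 0), -200)]
hits, tested = query(pts, cells, cut)
brute = [i for i, p in enumerate(pts) if satisfies(p, cut)]
print(len(hits), "matches;", tested, "of", len(pts), "points checked individually")
```

In a database setting, storing the cell ID with each row and indexing on it would play the role of the `ids` lists here: rows of one cell sit together on disk, so accepting or rejecting a cell touches a contiguous range instead of triggering a table scan.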