László Dobos - magpop
Spatial Indexing of Large Astronomical Databases
László Dobos,
István Csabai,
Márton Trencséni
ELTE, Hungary

Typical datasets
- ~200M data points
- multidimensional parameter space: real space, magnitudes, colors, redshift, etc.
- over several hundred GB of data

Typical tasks
- Object types are classified by a set of linear inequalities in magnitude space (n-dimensional polyhedra)
- Compute a histogram of the whole parameter space
- Find similar objects
- Find clusters
- Compare the distributions of two very large datasets
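The inequality-based classification can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: a class is assumed to be a convex polyhedron written as A·m <= b, and the constraint values and the two-color example space (g-r, r-i) are hypothetical. Colors are differences of magnitudes, so inequalities on colors are still linear inequalities in magnitude space. Every point is tested row by row, which is the per-row evaluation a plain table scan performs.

```python
# Minimal sketch: classify objects with a set of linear inequalities
# A @ m <= b (a convex polyhedron). The constraints below are hypothetical.

def in_polyhedron(m, A, b):
    """True if point m satisfies every inequality row_k . m <= b_k."""
    return all(sum(a_i * m_i for a_i, m_i in zip(row, m)) <= b_k
               for row, b_k in zip(A, b))


def classify(points, A, b):
    """Per-row evaluation: test every point, as a full table scan would."""
    return [i for i, m in enumerate(points) if in_polyhedron(m, A, b)]


# Hypothetical class in (g-r, r-i) color space:
#   0 <= g-r <= 0.5  and  r-i <= 0.3
A = [[-1.0, 0.0],
     [1.0, 0.0],
     [0.0, 1.0]]
b = [0.0, 0.5, 0.3]

points = [(0.2, 0.1), (0.7, 0.1), (0.3, 0.4), (0.0, -0.2)]
print(classify(points, A, b))  # [0, 3]
```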

Database servers
- Ideal for storing large amounts of data, even when the data structure is not very complex
- Optimized data access compared with plain file systems
- Clever caching methods
- MSSQL 2005 can be programmed efficiently

Problems
- In DB servers, data points are evaluated against inequalities on a per-row basis
- Traditionally, points that are close in magnitude space may be far apart on the disk(s)
- When the expected result set is small, running a table scan is not optimal

Main idea
- Divide the parameter space into small cells
- Two main methods:
  - hierarchical: kd-tree, modified kd-tree, etc.
  - adaptive: Voronoi tessellation
- Intersect the cells with the search polyhedra instead of checking every point
- Check on a per-point basis only when needed
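For axis-aligned box cells, such as kd-tree leaves, intersecting a cell with a search polyhedron reduces to checking each inequality against the extreme values it takes over the box. The sketch below assumes that cell shape and uses hypothetical names (`classify_box`, `INSIDE`/`OUTSIDE`/`PARTIAL`); Voronoi cells would need a linear-programming test instead. The test is conservative: a box that misses the polyhedron without lying entirely beyond any single inequality is reported as PARTIAL, so its points are simply checked one by one.

```python
# Minimal sketch, assuming axis-aligned box cells lo <= x <= hi.
# INSIDE: take all points of the cell without testing them.
# OUTSIDE: skip the cell entirely.
# PARTIAL: fall back to per-point checks for this cell only.

INSIDE, OUTSIDE, PARTIAL = "inside", "outside", "partial"


def halfspace_range(row, lo, hi):
    """Minimum and maximum of row . x over the box lo <= x <= hi."""
    lo_val = sum(a * (l if a >= 0 else h) for a, l, h in zip(row, lo, hi))
    hi_val = sum(a * (h if a >= 0 else l) for a, l, h in zip(row, lo, hi))
    return lo_val, hi_val


def classify_box(lo, hi, A, b):
    all_inside = True
    for row, b_k in zip(A, b):
        lo_val, hi_val = halfspace_range(row, lo, hi)
        if lo_val > b_k:
            # Every point of the box violates this inequality.
            return OUTSIDE
        if hi_val > b_k:
            # Some points of the box may violate it.
            all_inside = False
    return INSIDE if all_inside else PARTIAL


# Hypothetical query polyhedron: 0 <= x <= 0.5 and y <= 0.3
A = [[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.5, 0.3]
print(classify_box((0.1, 0.0), (0.4, 0.2), A, b))  # inside
print(classify_box((0.6, 0.0), (0.9, 0.2), A, b))  # outside
print(classify_box((0.4, 0.0), (0.6, 0.2), A, b))  # partial
```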

Voronoi-Delaunay tessellation
kd-trees
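A textbook kd-tree, which splits at the median along each axis in turn, is sketched below as one possible hierarchical cell scheme. This is an assumption for illustration: the slides do not specify the split rule of their modified kd-tree, and `build_kdtree` and `leaf_size` are hypothetical names.

```python
# Minimal sketch of a kd-tree over points in an n-dimensional parameter
# space: split at the median along axis (depth mod n) until a cell holds
# at most leaf_size points. Each leaf is an axis-aligned box (lo, hi).

def build_kdtree(points, lo, hi, depth=0, leaf_size=2):
    """Return the leaf cells as a list of (lo, hi, points) tuples."""
    if len(points) <= max(leaf_size, 1):
        return [(lo, hi, points)]
    axis = depth % len(lo)
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    split = pts[mid][axis]
    # Left child keeps coordinates <= split, right child >= split.
    left_hi = list(hi)
    left_hi[axis] = split
    right_lo = list(lo)
    right_lo[axis] = split
    return (build_kdtree(pts[:mid], lo, tuple(left_hi), depth + 1, leaf_size)
            + build_kdtree(pts[mid:], tuple(right_lo), hi, depth + 1, leaf_size))


points = [(0.1, 0.2), (0.4, 0.9), (0.8, 0.3), (0.6, 0.7), (0.3, 0.5)]
leaves = build_kdtree(points, (0.0, 0.0), (1.0, 1.0), leaf_size=2)
print(len(leaves))  # 3
```

Median splits keep the cells roughly equally populated, which is what makes the cell sizes adapt to dense and sparse regions of the parameter space.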

Steps for creating the tessellation
1. For adaptive methods (like Voronoi): choose the starting points (randomly?)
2. Calculate the cells and store them in the database
3. Look up the cell of each data point
4. Create a DB index on the cell ID; this orders the dataset on the disk
5. Run queries
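The steps for the adaptive case can be sketched in Python, assuming the seeds are drawn at random from the data (the slide leaves the choice open). Here the cells are represented only by their seeds, so step 2 amounts to storing them; a point's cell is the Voronoi cell of its nearest seed, and sorting by cell ID stands in for the DB index that orders the rows on disk. All function names are hypothetical.

```python
import bisect
import random


def choose_seeds(points, k, rng):
    # Step 1: pick k starting points at random from the data (an assumption).
    return rng.sample(points, k)


def nearest_seed(p, seeds):
    # Step 3: a point belongs to the Voronoi cell of its nearest seed.
    return min(range(len(seeds)),
               key=lambda j: sum((a - s) ** 2 for a, s in zip(p, seeds[j])))


def build_index(points, k, seed=0):
    rng = random.Random(seed)
    seeds = choose_seeds(points, k, rng)  # step 2: the seeds define the cells
    # Steps 3-4: tag every point with its cell ID, then order by cell ID,
    # as a clustered index on the cell ID column would on disk.
    rows = sorted(((nearest_seed(p, seeds), p) for p in points),
                  key=lambda r: r[0])
    cell_ids = [c for c, _ in rows]
    return seeds, cell_ids, rows


def cell_rows(cell_ids, rows, cell_id):
    # Step 5: the points of one cell form one contiguous run of rows,
    # located by binary search instead of a table scan.
    start = bisect.bisect_left(cell_ids, cell_id)
    end = bisect.bisect_right(cell_ids, cell_id)
    return [p for _, p in rows[start:end]]


pts = [(0.1 * i, 0.07 * ((i * 7) % 10)) for i in range(20)]
seeds, cell_ids, rows = build_index(pts, 3)
print([len(cell_rows(cell_ids, rows, c)) for c in range(3)])
```

Because rows of the same cell are adjacent, a query that touches only a few cells reads only a few contiguous ranges, which addresses the "close in parameter space, far on disk" problem.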
Technical details
- Implemented in MSSQL using the new SQL CLR features
- SQL CLR allows running programs within the process of the database server: very fast!
- Linear programming, Voronoi, etc. libraries have been ported to .NET/C#
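Because a Voronoi cell is itself a convex polyhedron, asking whether a cell can contain points of a search polyhedron becomes a linear-programming feasibility problem. The sketch below derives the halfspace form of a cell from |x - s_i|^2 <= |x - s_j|^2, which expands to 2(s_j - s_i)·x <= |s_j|^2 - |s_i|^2. Stacking these rows with the query's A·x <= b gives a system that an LP solver can test for feasibility; the solver itself is not shown, and the function names are hypothetical.

```python
# Minimal sketch: halfspace representation (A, b) of a Voronoi cell.
# Cell i = { x : |x - s_i|^2 <= |x - s_j|^2 for all j != i }
#        = { x : 2 (s_j - s_i) . x <= |s_j|^2 - |s_i|^2 }

def voronoi_halfspaces(seeds, i):
    si = seeds[i]
    norm_i = sum(c * c for c in si)
    A, b = [], []
    for j, sj in enumerate(seeds):
        if j == i:
            continue
        A.append([2.0 * (cj - ci) for cj, ci in zip(sj, si)])
        b.append(sum(c * c for c in sj) - norm_i)
    return A, b


def satisfies(x, A, b):
    """True if x satisfies every row of A . x <= b."""
    return all(sum(a * xi for a, xi in zip(row, x)) <= bk
               for row, bk in zip(A, b))


seeds = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
A, b = voronoi_halfspaces(seeds, 0)
print(A, b)                         # [[4.0, 0.0], [0.0, 4.0]] [4.0, 4.0]
print(satisfies((0.5, 0.5), A, b))  # True
print(satisfies((1.5, 0.0), A, b))  # False
```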

Preliminary results
[Figure: BoxTree vs. standard SQL query times. Query time (ms, 0 to 80,000) for the BoxTree and SQL queries, plotted against the fraction of rows returned (# returned rows / # total rows, 0 to 0.35).]
Scientific ideas
- SDSS photometry: 5D, 300M points
  - finding all objects with similar colors
  - source classification
  - star-quasar separation
  - blue/red galaxy locus, etc.
- Karhunen-Loève (PCA) coefficients of Bruzual-Charlot models: 5-15D, 100K-100M points
  - quick match with observed spectra
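Finding all objects with similar colors can be illustrated with the simplest cell structure, a uniform grid in the four SDSS colors u-g, g-r, r-i, i-z. This is a sketch under assumptions: the grid stands in for the kd-tree or Voronoi cells, "similar" is taken to mean a Euclidean color distance of at most a hypothetical `eps`, and the magnitudes are made-up values. With a cell width of `eps`, every match lies in the query's cell or one of its neighbours, so only 3^4 = 81 cells are read instead of the whole table.

```python
import itertools
import math
from collections import defaultdict


def colors(mags):
    """The four colors u-g, g-r, r-i, i-z from the five magnitudes u, g, r, i, z."""
    return tuple(mags[k] - mags[k + 1] for k in range(4))


def grid_cell(c, eps):
    # Cell index along each color axis, with cell width eps.
    return tuple(math.floor(v / eps) for v in c)


def build_grid(objects, eps):
    grid = defaultdict(list)
    for idx, mags in enumerate(objects):
        c = colors(mags)
        grid[grid_cell(c, eps)].append((idx, c))
    return grid


def similar(grid, query_mags, eps):
    """Indices of objects whose colors lie within eps of the query's colors."""
    qc = colors(query_mags)
    base = grid_cell(qc, eps)
    hits = []
    # Any match differs by at most one cell index along each axis.
    for offset in itertools.product((-1, 0, 1), repeat=len(base)):
        cell = tuple(c + o for c, o in zip(base, offset))
        for idx, c in grid.get(cell, []):
            if math.dist(qc, c) <= eps:
                hits.append(idx)
    return sorted(hits)


# Hypothetical u, g, r, i, z magnitudes.
objects = [(19.0, 18.0, 17.5, 17.3, 17.2),
           (19.05, 18.02, 17.5, 17.3, 17.2),
           (20.0, 18.0, 17.0, 16.5, 16.0)]
grid = build_grid(objects, 0.1)
print(similar(grid, objects[0], 0.1))  # [0, 1]
```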

Scientific ideas cont.
- Magnitudes of spectral synthesis models: 5-10D, 100K-100M points
  - match with observations
  - photo-z
  - physical properties from photometry
  - check the consistency of various models (BC-GRASIL)
- Multiresolution visualization of a large number of points
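A multiresolution view, which also covers the histogram of the parameter space listed among the typical tasks, can be sketched as a count pyramid: a fine 2D histogram is built once, and each coarser level sums 2x2 blocks of the previous one. This is an illustrative assumption, not the method from the talk; the function names are hypothetical, `bins` must be a power of two, and points outside the given range are skipped.

```python
# Minimal sketch of a 2D count pyramid for multiresolution display.

def histogram_2d(points, lo, hi, bins):
    counts = [[0] * bins for _ in range(bins)]
    for x, y in points:
        if not (lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]):
            continue  # outside the plotted range
        # Clamp so that points exactly on the upper edge land in the last bin.
        i = min(int((x - lo[0]) / (hi[0] - lo[0]) * bins), bins - 1)
        j = min(int((y - lo[1]) / (hi[1] - lo[1]) * bins), bins - 1)
        counts[i][j] += 1
    return counts


def coarsen(counts):
    # Each coarse bin is the sum of a 2x2 block of finer bins.
    n = len(counts) // 2
    return [[counts[2 * i][2 * j] + counts[2 * i + 1][2 * j]
             + counts[2 * i][2 * j + 1] + counts[2 * i + 1][2 * j + 1]
             for j in range(n)] for i in range(n)]


def pyramid(points, lo, hi, bins):
    levels = [histogram_2d(points, lo, hi, bins)]
    while len(levels[-1]) > 1:
        levels.append(coarsen(levels[-1]))
    return levels


pts = [(0.1, 0.1), (0.9, 0.9), (0.6, 0.2), (0.3, 0.8)]
levels = pyramid(pts, (0.0, 0.0), (1.0, 1.0), 4)
print(levels[1])  # [[1, 1], [1, 1]]
```

A viewer can then pick the level whose resolution matches the screen, so zooming out never requires rereading the individual points.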
