Download Hierarchical Triangular Mesh

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Microsoft SQL Server wikipedia , lookup

Open Database Connectivity wikipedia , lookup

SQL wikipedia , lookup

Clusterpoint wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
How to speed up search of ILMT light curves using the
HTM (Hierarchical Triangular Mesh)
method in relational databases
Poels, J.
•ILMT Software
• Data acquisition
• Clustering Framework (HPC)
• Data reduction
• Image subtraction (application to GL)
• Databases (RDBMS)
• ….
•GAIA QSO Classifier Software
ARC Liège, 11 February 2010
Motivation
• We explore ways of doing spatial search within a relational database
• hierarchical triangular mesh and HEALPix (a tessellation of the
sphere) as a zoned bucketing system, representing areas as disjunctivenormal form constraints.
• The approach has the virtue that the zone mechanism works well on BTrees native to all SQL systems and integrates naturally with current
query optimizers
• Involved projects:
– SDSS (Sloan Digital Sky Survey)
– GSC (Guide Star Catalog) Palomar and UK Schmidt surveys
– COBE (Cosmic Background Explorer)
– WMAP (Wilkinson Microwave Anisotropy Probe)
– ….
ILMT_operating_param
time
temperature
CCD position
rotation (ppm)
........
ILMT_reference_cat
2MASS/SDSS cat
Iterative_cat_1
Iterative_cat_2
Iterative_cat_(N-1)
ILMT_reference_cat
Iterative_cat_N
Fits_files_cat
fits_file_id
filename
path
file_type
processing_level
x_global_0
y_global_0
alpha_0
Reference_img_cat
source_id
fits_file_id1
fits_file_id2
x_local
y_local
x_global
y_global
alpha
delta
flux_aperture
flux_psf_fit
Mag_R
Object_type
Processing_No
Fwhm
Isolated_flag
Night catalogs
Night_1_cat
Night_2_cat
source_id
fits_file_id1
Night_3_cat
source_id
fits_file_id2
fits_file_id1
source_id
x_localfits_file_id2
Night_N_cat
y_localx_localfits_file_id1
source_id
fits_file_id2
x_global
y_localx_localfits_file_id1
y_global
x_global
y_localfits_file_id2
alpha y_global
x_local
delta alpha x_global
y_local
y_global
flux_aperture
delta alpha x_global
flux_psf_fit
flux_aperture
delta y_global
Mag_Rflux_psf_fit
alpha
flux_aperture
Object_type
Mag_Rflux_psf_fit
delta
Processing_Nr
Object_type
Mag_Rflux_aperture
FwhmProcessing_Nr
flux_psf_fit
Object_type
Isolated_flag
FwhmProcessing_Nr
Mag_R
Isolated_flag
FwhmObject_type
Processing_Nr
Isolated_flag
Fwhm
Isolated_flag
Objects_rowid_ptr
source_id
Night1_rowid
Night2_rowid
…
…
…
…
…
…
…
NightN_rowid
Ref_img_cat_rowid
Is this (horizontal time) model suitable ?
•
•
•
•
•
•
•
•
At the beginning the RDBMS should run smoothly but after 5 years of
operation ?
Indexing is not an easy task
A given source will be measures an order of 10^3 where each measure set
featuring ~ 200 bytes
Assuming ~10x10^6 sources we get a multi-TB DB (for alphanumeric data
only)! Consider also ~ 25TB of image data
Example of performance bottleneck: Search all constant point-like sources. We
have to scan the whole DB and for each source, track its history. This means
that for each source we have to issue 10x10^6x(N-1) SQL statements ! Forget
it.
Beyond the query complexity, the DMS prefetch the rows which are more
likely to be read: useless and slow disk activity.
The data must be rearranged
Solution: HTM
HTM (Hierachical Triangular Mesh)
• HTM [18] maps triangular
regions of the sphere to unique
identifiers
• The technique for subdividing
the sphere in spherical triangles
is a recursive process
• At each level of the recursion, the
area of the resulting triangles is
roughly the same
• In areas with a larger data
density, the recursion process can
be applied with a greater level of
detail than in areas with lower
density
• The starting point is a spherical
octahedron which identifies 8
spherical triangles of equal size
•
The term quadtree is used to describe a class of hierarchical data structures
based on the principle of a fast recursive decomposition of the space.
•
Sky tessellation with various mapping functions have been proposed. It is a
matter of fact that the astronomical community is accepting the HTM and
HEALPix (Hierarchical, Equal Area, and iso-Latitude Pixelisation) schema as
the default for object catalogues and for maps visualization and analysis,
respectively. HEALPix gives a hierarchical iso-area and iso-latitude
tessellation of the sphere and so are convenient for harmonic data analysis on
the sphere (densities, integrals, spherical harmonics, Fourier transforms,etc.,)
• Using a 64 bit long integer to store the index IDs leads to a limit for
the pixels size of about 7.7 and 0.44 milli-arcsec on a side for HTM
and HEALPix, respectively. Being able to quickly retrieve the list of
objects in a given sky region is crucial in several projects.
Application to the ILMT
• The indexing scheme does not have to go as deep as the actual pixel resolution
• Each triangle is linked to its own database table (the GSC approach)
• SQL queries involving searches based on RA,DEC (+range) quickly provide the
HtmId(s) which in turn is used to build the table_name(s) to be accessed.
• We can dynamically choose a triangle surface coverage depending on the
maximum number of sources (e.g. max 100 sources per triangle)
• One single SQL statement returns a cursor with the whole source_id history
• The problem is now: for each triangle database table, select distinct all
source_id and fetch history rows ~ 10x10^6 SQL runs ! Manageable 
• The indexing C++ software is freely available at:
http://www.sdss.jhu.edu/