Introduction to the Millennium Database with an SQL tutorial
2007-01-17/19 Leiden
Millennium DB Tutorial
Overview
• Why a relational database?
• Overview of relational databases
– general
– Millennium DB design
• SQL Tutorial
• Science queries
• Tools
• Advanced subjects (not now)
Website documentation
http://www.g-vo.org/Millennium/Help
Why use a relational database?
• encapsulation of data in terms of a rigorous logical model
– no need to know about internals of data storage
– forces one to think carefully about data structure
• ANSI standard query language (SQL) for finding the information one is interested in
– remote filtering
– speeds up path from science question to answer
– facilitates communication
• many implementations, commercial and open source
– advanced query optimizers (indexes, clustering)
Relational Database concepts
Millennium database design
A relational database stores data in relations (= tables)
Tables
• Tables have names
• Related data values are stored in rows
• Rows have columns
– all the same for a given table
• Columns have names and data types
• Rows often have a unique identifier consisting of the values of >= 1 columns: the primary key
[Figure: example table with a row, a column, the primary key column and the foreign key columns labelled]
Foreign keys
• A database can contain many tables
• The set of table definitions in a database is called the schema of the database
• Tables can be related by foreign keys: pointers (by value) from a row in one table to a row in another (or possibly the same) table
• Why not combine these rows into one table?
• Consider storing galaxies, with info about their sub-halo as well as the FOF groups these live in.
Note: a subhalo contains >= 1 galaxies, a FOF halo >= 0 subhalos
One table: redundancy
GalaxyEtc
galId  mStar  magB   X      haloId  np   hX    vMax  fofId  nSub  m200    fX
112    0.215  -17.9  7.6    6625    100  7.6   165   123    2     445.77  7.6
113    0.038  -15.6  7.4    6625    100  7.6   165   123    2     445.77  7.6
154    0.173  -17.1  7.65   6626    65   7.9   130   123    2     445.77  7.6
221    1.20   -20.7  35.1   7883    452  35.1  200   456    2     101.32  35.1
223    0.225  -19.7  35.0   7883    452  35.1  200   456    2     101.32  35.1
225    0.04   -17.5  34.9   7883    452  35.1  200   456    2     101.32  35.1
278    1.54   -19.4  35.2   7884    255  35.2  190   456    2     101.32  35.1
…
Normalization
Galaxy
galId  haloId  mStar  magB   X      …
112    6625    0.215  -17.9  7.6    …
113    6625    0.038  -15.6  7.4    …
154    6626    0.173  -17.1  7.65   …
221    7883    1.20   -20.7  35.1   …
223    7883    0.225  -19.7  35.0   …
225    7883    0.04   -17.5  34.9   …
278    7884    1.54   -19.4  35.2   …
…
SubHalo
haloId  fofId  Np   X     vMax  …
6625    123    100  7.6   165   …
6626    123    65   7.9   130   …
7883    456    452  35.1  200   …
7884    456    255  35.2  190   …
9885    789    30   67.0  110   …
…
FOF
fofId  nSub  m200    X     …
123    2     445.77  7.6   …
456    2     101.32  35.1  …
789    1     70.0    67.0  …
…
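The normalized design can be tried out locally. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the real database server; the column types and the lowercase column names are illustrative assumptions, while the rows are the example values from the slides. It links the three tables by foreign keys and joins them back into the wide GalaxyEtc layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

conn.executescript("""
create table FOF (
  fofId integer primary key, nSub integer, m200 real, x real);
create table SubHalo (
  haloId integer primary key,
  fofId integer references FOF(fofId),        -- foreign key
  np integer, x real, vMax real);
create table Galaxy (
  galId integer primary key,
  haloId integer references SubHalo(haloId),  -- foreign key
  mStar real, magB real, x real);
""")

conn.executemany("insert into FOF values (?,?,?,?)", [
    (123, 2, 445.77, 7.6), (456, 2, 101.32, 35.1), (789, 1, 70.0, 67.0)])
conn.executemany("insert into SubHalo values (?,?,?,?,?)", [
    (6625, 123, 100, 7.6, 165), (6626, 123, 65, 7.9, 130),
    (7883, 456, 452, 35.1, 200), (7884, 456, 255, 35.2, 190),
    (9885, 789, 30, 67.0, 110)])
conn.executemany("insert into Galaxy values (?,?,?,?,?)", [
    (112, 6625, 0.215, -17.9, 7.6), (113, 6625, 0.038, -15.6, 7.4),
    (154, 6626, 0.173, -17.1, 7.65), (221, 7883, 1.20, -20.7, 35.1),
    (223, 7883, 0.225, -19.7, 35.0), (225, 7883, 0.04, -17.5, 34.9),
    (278, 7884, 1.54, -19.4, 35.2)])

# Joining along the foreign keys rebuilds the redundant one-table view.
wide = conn.execute("""
    select g.galId, g.mStar, g.magB, g.x,
           h.haloId, h.np, h.x, h.vMax,
           f.fofId, f.nSub, f.m200, f.x
    from Galaxy g, SubHalo h, FOF f
    where g.haloId = h.haloId and h.fofId = f.fofId
    order by g.galId
""").fetchall()

# A galaxy pointing at a non-existent subhalo is refused.
try:
    conn.execute("insert into Galaxy values (999, 1, 0.1, -15.0, 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Each halo and FOF group is stored once, so a corrected m200 has to be updated in a single row only.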
Millennium database
[Schema diagram; tables shown: FOF, DHalo, Bower2006a, MPAMocks, SubHalo, DSubHalo, MField, DeLucia2006a, MPAHalo]
Web browser:
http://www.g-vo.org/Millennium
http://www.g-vo.org/MyMillennium
SQL Tutorial
SQL
• Structured Query Language
• Filtering, combining, sub-setting of tables
• Functions, procedures, aggregations
• Data manipulation: insert/update/delete
• A query produces tabular results, which can be used as tables again in sub-queries, or stored in a database
• Table creation...
Table creation statement
create table MPAHalo (
haloId long not null,
descendantId long, -- foreign key
lastProgenitorId long, -- foreign key
snapnum integer, redshift real,
x real,y real,z real,
np integer, velDisp real, vmax real,
...,
primary key (haloId)
);
SELECT ... FROM ... WHERE ...
1. select *
   from MPAHalo
2. select snapnum, redshift, np
   from MPAHalo
3. select *
   from MPAHalo
   where redshift = 0
WHERE conditions
• = <> != < > <= >=
• np between 100 and 200
• name like '%Frenk'
• a=b and d=e
• a=b or e=d
• id in (1,2,3)
• a is null
• a is not null
• exists ... (later)
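The conditions above can be tried on a toy table. This is a minimal sketch using Python's built-in sqlite3 module rather than the Millennium server; the table `halo`, its rows and the helper `ids` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table halo (haloId integer primary key, np integer, name text)")
conn.executemany("insert into halo values (?,?,?)", [
    (1,  50, "C. Frenk"),
    (2, 150, "A. Nother"),
    (3, 200, None),          # None is stored as SQL null
    (4, 300, "C. S. Frenk"),
])

def ids(condition):
    """Return the haloIds of the rows satisfying a WHERE condition."""
    sql = "select haloId from halo where " + condition + " order by haloId"
    return [row[0] for row in conn.execute(sql)]

between_ids = ids("np between 100 and 200")          # both bounds are inclusive
like_ids    = ids("name like '%Frenk'")              # % matches any prefix
in_ids      = ids("haloId in (1,2,3)")
and_ids     = ids("np > 100 and name is not null")
is_null_ids = ids("name is null")
eq_null_ids = ids("name = null")                     # '=' never matches null

for label, result in [("between", between_ids), ("like", like_ids),
                      ("in", in_ids), ("and", and_ids),
                      ("is null", is_null_ids), ("= null", eq_null_ids)]:
    print(label, result)
```

The last two lines show why `is null` exists: a comparison with `=` against null is never true, so it selects no rows.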
Custom column names
select snapnum as snapshotIndex
,      redshift as z
,      np as numberOfParticles
from   MPAHalo
Demo queries
select *
from snapshots

select haloid,snapnum
from MPAHalo
where np = 100

select x,y
from MPAHalo
where z between 10 and 12
and np > 50
and snapnum = 63
ORDER BY ... [ASC | DESC]
select h.*
from MPAHalo h
order by h.snapnum desc
,        h.x asc
TOP
select top 10 haloid, np
from mpahalo
where snapnum = 63
order by np desc
Aggregation:
count, sum, max, min, avg, stddev
select count(*) as num
,      max(stellarmass) as maxmass
,      avg(stellarmass) as avgmass
from   delucia2006a
where  snapnum = 63
and    type = 1
JOIN
(note the aliases)
select h.haloid, g.stellarMass
from delucia2006a g
,    mpahalo h
where h.np = 1000
and g.haloid = h.haloid

galId  haloId  mStar  magB   X
112    6625    0.215  -17.9  7.6
113    6625    0.038  -15.6  7.4
154    6626    0.173  -17.1  7.65
221    7883    1.20   -20.7  35.1
223    7883    0.225  -19.7  35.0
225    7883    0.04   -17.5  34.9
278    7884    1.54   -19.4  35.2

haloId  fofId  Np   X     vMax
6625    123    100  7.6   165
6626    123    65   7.9   130
7883    456    452  35.1  200
7884    456    255  35.2  190
9885    789    30   67.0  110
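To see what the join does with the example rows above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the database server. The table names `Galaxy` and `SubHalo` and the reduced column set are assumptions made for brevity; the comma-style join is the one used on the slide, and the explicit `join ... on` form is shown only for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table SubHalo (haloId integer primary key, fofId integer, np integer)")
conn.execute("create table Galaxy (galId integer primary key, haloId integer, mStar real)")
conn.executemany("insert into SubHalo values (?,?,?)", [
    (6625, 123, 100), (6626, 123, 65), (7883, 456, 452),
    (7884, 456, 255), (9885, 789, 30),
])
conn.executemany("insert into Galaxy values (?,?,?)", [
    (112, 6625, 0.215), (113, 6625, 0.038), (154, 6626, 0.173),
    (221, 7883, 1.20), (223, 7883, 0.225), (225, 7883, 0.04),
    (278, 7884, 1.54),
])

# Comma-style join with the join condition in the WHERE clause.
implicit = conn.execute("""
    select h.haloId, g.mStar
    from Galaxy g
    ,    SubHalo h
    where h.np = 452
    and g.haloId = h.haloId
    order by g.galId
""").fetchall()

# The same query written with an explicit JOIN ... ON.
explicit = conn.execute("""
    select h.haloId, g.mStar
    from Galaxy g
    join SubHalo h on g.haloId = h.haloId
    where h.np = 452
    order by g.galId
""").fetchall()

# Halos without any galaxy drop out of this kind of (inner) join.
joined_halos = {row[0] for row in conn.execute(
    "select h.haloId from Galaxy g, SubHalo h where g.haloId = h.haloId")}

print(implicit)
print(sorted(joined_halos))
```

Halo 9885 has no galaxy pointing at it, so it never appears in the joined result.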
Demo: galaxies in massive halos
select h.haloId, g.*
from   DeLucia2006a g
,      MPAHalo h
where  h.snapnum = 63
and    h.np between 10000 and 11000
and    g.haloId = h.haloId
Demo: direct progenitors of massive halos
select prog.*
from   MPAHalo prog
,      MPAHalo des
where  des.haloId = prog.descendantId
and    des.np > 10000
and    des.snapnum = 63
GROUP BY
select redshift
,      type
,      count(*) as numGal
,      avg(stellarMass) as m_avg
,      max(stellarMass) as m_max
from DeLucia2006a
group by redshift, type
order by redshift, type
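A minimal sketch of the same GROUP BY pattern on a toy table, using Python's built-in sqlite3 module. The table `galaxy` and its four rows are invented for illustration; only the shape of the query follows the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table galaxy (redshift real, type integer, stellarMass real)")
conn.executemany("insert into galaxy values (?,?,?)", [
    (0.0, 0, 1.0),
    (0.0, 0, 3.0),
    (0.0, 1, 0.5),
    (1.0, 0, 2.0),
])

# One output row per distinct (redshift, type) combination.
groups = conn.execute("""
    select redshift
    ,      type
    ,      count(*) as numGal
    ,      avg(stellarMass) as m_avg
    ,      max(stellarMass) as m_max
    from galaxy
    group by redshift, type
    order by redshift, type
""").fetchall()

for row in groups:
    print(row)
```

Every column in the select list is either one of the grouping columns or an aggregate over the rows of the group.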
Sub-selects
select g.galaxyId
from DeLucia2006a g
,    (select top 10 haloId
      from mpahalo
      where snapnum = 63
      order by np desc) mh
where g.haloId = mh.haloId
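The sub-select can be tested locally with Python's built-in sqlite3 module. Note one dialect difference: `top N` is SQL Server syntax, and SQLite expresses the same thing with a trailing `limit N`. The tables `halo` and `galaxy` and their rows below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table halo (haloId integer primary key, snapnum integer, np integer)")
conn.execute("create table galaxy (galaxyId integer primary key, haloId integer)")
conn.executemany("insert into halo values (?,?,?)", [
    (1, 63, 500), (2, 63, 900), (3, 63, 100), (4, 62, 2000),
])
conn.executemany("insert into galaxy values (?,?)", [
    (10, 1), (11, 2), (12, 2), (13, 3), (14, 4),
])

# The inner query yields a two-row table "mh" (the two largest halos at
# snapnum 63); the outer query joins the galaxies against it.
rows = conn.execute("""
    select g.galaxyId
    from galaxy g
    ,    (select haloId
          from halo
          where snapnum = 63
          order by np desc
          limit 2) mh
    where g.haloId = mh.haloId
    order by g.galaxyId
""").fetchall()

galaxy_ids = [r[0] for r in rows]
print(galaxy_ids)
```

Halo 4 is the largest overall but lives at snapnum 62, so its galaxy is not returned.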
Science questions as SQL
Motivation for data model
1. Return the galaxies residing in halos of mass between 10^13 and 10^14 solar masses.
2. Return the galaxy content at z=3 of the progenitors of a halo identified at z=0.
3. Return all the galaxies within a sphere of radius 3 Mpc around a particular halo.
4. Return the complete halo merger tree for a halo identified at z=0.
5. Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals.
6. Find properties of all galaxies in haloes of mass 10**14 at redshift 1 which have had a major merger (mass-ratio < 4:1) since redshift 1.5.
7. Find all the z=3 progenitors of z=0 red ellipticals (i.e. B-V > 0.8, B/T > 0.5).
8. Find the descendants at z=1 of all LBGs (i.e. galaxies with SFR > 10 Msun/yr) at z=3.
9. Make a list of all haloes at z=3 which contain a galaxy of mass > 10**9 Msun which is a progenitor of BCGs in z=0 clusters of mass > 10**14.5.
10. Find all z=3 galaxies which have NO z=0 descendant.
11. Return the complete galaxy merging history for a given z=0 galaxy.
12. Find all the z=2 galaxies which were within 1 Mpc of an LBG (i.e. SFR > 10 Msun/yr) at some previous redshift.
13. Find the multiplicity function of halos depending on their environment (overdensity of density field smoothed on a certain scale).
14. Find the dependency of halo formation times on environment (“Gao-effect”).
5. Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals.
select x,y,z,velX, velY, velZ
from DeLucia2006a
where mag_b between -23 and -18
and bulgeMass >= .9*stellarMass
and snapnum = 63
4. Return the complete halo merger tree
for a halo identified at z=0
Efficient storage of merger trees in a relational database
• Goal: allow queries for the formation history of any object
• No recursion possible in RDB, nor desired
• Method:
– depth-first ordering of trees
– label each node by its rank in that order
– pointer to the “last progenitor” below each node
– all progenitors have a label BETWEEN the label of the root AND that of the last progenitor
– cluster table on label
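The method can be sketched in a few lines. The example below is an illustration under stated assumptions, not the code used to build the database: the toy tree, the halo names and the helper functions `label` and `tree_of` are invented, and Python's built-in sqlite3 module stands in for the database server. It labels the tree depth first, records the last progenitor of every node, and then retrieves a whole subtree with a single BETWEEN condition.

```python
import sqlite3

# Hypothetical toy tree: each halo lists its direct progenitors,
# main progenitor first. "A" is the root (the final halo).
progenitors = {
    "A": ["B", "E"],
    "B": ["C", "D"],
    "C": [],
    "D": [],
    "E": ["F"],
    "F": [],
}

labelled = {}  # name -> (haloId, descendantId, lastProgenitorId)

def label(name, descendant_id, halo_id):
    """Label the subtree rooted at `name` depth first, starting at halo_id.

    Returns the largest id used in the subtree, i.e. its lastProgenitorId.
    """
    last_id = halo_id
    for prog in progenitors[name]:
        last_id = label(prog, halo_id, last_id + 1)
    labelled[name] = (halo_id, descendant_id, last_id)
    return last_id

label("A", -1, 0)  # -1 marks "no descendant"

conn = sqlite3.connect(":memory:")
conn.execute("""create table halo (
    haloId integer primary key, descendantId integer,
    lastProgenitorId integer, name text)""")
conn.executemany("insert into halo values (?,?,?,?)",
                 [ids + (name,) for name, ids in labelled.items()])

def tree_of(name):
    """Names of a halo and all its progenitors, found without recursion."""
    rows = conn.execute("""
        select prog.name
        from halo des
        ,    halo prog
        where des.name = ?
        and prog.haloId between des.haloId and des.lastProgenitorId
        order by prog.haloId""", (name,))
    return [r[0] for r in rows]

print(labelled)
print(tree_of("A"))
print(tree_of("B"))
```

The recursion happens once, when the labels are assigned; every later query is a plain range scan on the label.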
Merger trees
select prog.snapnum
,      prog.x
,      prog.y
,      prog.np
from   millimil..mpahalo des
,      millimil..mpahalo prog
where  prog.haloId between des.haloId
                       and des.lastProgenitorId
and    des.haloId = 0
Some more features of the merger tree data model
Leaves :
select galaxyId as leaf
from galaxies des
where galaxyId = lastProgenitorId
Branching points :
select descendantId
from galaxies des
where descendantId != -1
group by descendantId
having count(*) > 1
Main branches
• Roots and leaves:
select des.galaxyId as rootId
,      min(prog.lastprogenitorid) as leafId
into   rootLeaf
from   mpagalaxies..delucia2006a des
,      mpagalaxies..delucia2006a prog
where  des.galaxyId = 0
and    prog.galaxyId between des.galaxyId and des.lastProgenitorId
group by des.galaxyId
• Main branch
select rl.rootId, b.*
from   rootLeaf rl
,      mpagalaxies..delucia2006a b
where  b.galaxyId between rl.rootId and rl.leafId
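The leaf, branching-point and main-branch queries can be checked on a six-node toy tree. This is a minimal sketch with Python's built-in sqlite3 module; the table contents are invented, and because SQLite has no `select ... into`, the sketch uses `create table ... as select` in its place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table galaxies (
    galaxyId integer primary key, descendantId integer, lastProgenitorId integer)""")
# Toy tree in depth-first order: 0 <- {1 <- {2, 3}, 4 <- {5}}
conn.executemany("insert into galaxies values (?,?,?)", [
    (0, -1, 5), (1, 0, 3), (2, 1, 2), (3, 1, 3), (4, 0, 5), (5, 4, 5),
])

def column(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# Leaves: nodes that are their own last progenitor.
leaves = column("""
    select galaxyId as leaf from galaxies
    where galaxyId = lastProgenitorId
    order by galaxyId""")

# Branching points: nodes that are the descendant of more than one node.
branching = column("""
    select descendantId from galaxies
    where descendantId != -1
    group by descendantId having count(*) > 1
    order by descendantId""")

# Root and first leaf of the tree rooted at galaxyId = 0.
conn.execute("""
    create table rootLeaf as
    select des.galaxyId as rootId
    ,      min(prog.lastProgenitorId) as leafId
    from galaxies des
    ,    galaxies prog
    where des.galaxyId = 0
    and prog.galaxyId between des.galaxyId and des.lastProgenitorId
    group by des.galaxyId""")

# Main branch: every label between the root and that first leaf.
main_branch = column("""
    select b.galaxyId
    from rootLeaf rl
    ,    galaxies b
    where b.galaxyId between rl.rootId and rl.leafId
    order by b.galaxyId""")

print(leaves, branching, main_branch)
```

The smallest lastProgenitorId in a tree belongs to the first leaf reached in depth-first order, which is why the range from root to that leaf is exactly the main branch.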
Find all halos in a subvolume of space:
15 <= x <= 20
20 <= y <= 25
5 <= z <= 10
select x,y,z
from   mpahalo
where  snapnum = 63
and    x between 10 and 20
and    y between 20 and 30
and    z between 0 and 10
Inefficient, even when indexed !
x          y          z
15.001083  42.471325  24.673561
15.001247  58.420914  42.722874
15.002215  38.042484  29.557423
15.002735  50.487785  57.716877
15.002753  20.000177  8.21466
15.005095  13.637599  16.135191
15.006593  22.170828  48.242783
15.011488  24.824438  19.773285
15.011741  48.099907  11.500685
15.011868  23.312265  27.858799
15.013065  23.969515  18.883507
15.013158  56.041866  40.82894
15.014361  59.503357  45.31733
15.017322  46.257664  44.37695
15.018202  27.333895  9.441319
Spatial indexes
• Performance of finding things is improved if those things are co-located on disk: ordering, indices
• Co-locating a 3D configuration of points on a 1D disk can only be done approximately
• Space-filling curves: Peano-Hilbert, Z-curve
Zones
Zone index
• Coarse sampling of points in multiple dimensions allows simple multidimensional ordering
• ix = floor(x/10Mpc)
iy = floor(y/10Mpc)
iz = floor(z/10Mpc)
• index on (snapnum,ix,iy,iz,x,y,z,galaxyId)
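A minimal sketch of the zone idea, assuming Python's built-in sqlite3 module as a stand-in for the database server. The table `galaxy`, the index name `zoneIdx`, the helpers `zone` and `box_query` and the galaxyId values are all hypothetical; the coordinates are taken from the example tables in these slides. The query first restricts the integer zone columns and then applies the exact coordinate ranges.

```python
import math
import sqlite3

ZONE_SIZE = 10.0  # Mpc, as in the formulas above

def zone(coord):
    """Zone number of a coordinate: floor(coord / 10 Mpc)."""
    return math.floor(coord / ZONE_SIZE)

points = [  # (galaxyId, x, y, z)
    (1, 15.061804, 20.891907, 4.4156647),
    (2, 15.069336, 23.437601, 9.812217),
    (3, 15.002753, 20.000177, 8.21466),
    (4, 16.047333, 28.93811, 5.414605),
    (5, 15.001083, 42.471325, 24.673561),
]

conn = sqlite3.connect(":memory:")
conn.execute("""create table galaxy (
    galaxyId integer, snapnum integer,
    ix integer, iy integer, iz integer,
    x real, y real, z real)""")
conn.execute("""create index zoneIdx
    on galaxy (snapnum, ix, iy, iz, x, y, z, galaxyId)""")
conn.executemany(
    "insert into galaxy values (?,?,?,?,?,?,?,?)",
    [(gid, 63, zone(x), zone(y), zone(z), x, y, z) for gid, x, y, z in points])

def box_query(lo, hi, snapnum=63):
    """galaxyIds inside the box lo <= (x,y,z) <= hi at the given snapshot."""
    sql = """
        select galaxyId from galaxy
        where snapnum = ?
          and ix between ? and ? and iy between ? and ? and iz between ? and ?
          and x between ? and ? and y between ? and ? and z between ? and ?
        order by galaxyId"""
    params = [snapnum]
    for a, b in zip(lo, hi):      # coarse zone limits first
        params += [zone(a), zone(b)]
    for a, b in zip(lo, hi):      # then the exact coordinate limits
        params += [a, b]
    return [row[0] for row in conn.execute(sql, params)]

inside = box_query((15, 20, 5), (20, 25, 10))
print(inside)
```

The zone conditions let the index skip whole blocks of rows; the exact coordinate conditions then trim the few zones that overlap the box.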
IX  IY  IZ  X           Y          Z
1   2   0   15.061804   20.891907  4.4156647
1   2   0   15.069336   23.437601  9.812217
1   2   0   15.100678   20.905642  4.613036
1   2   0   15.173968   22.36883   8.01832
1   2   0   15.194122   20.67583   4.8034463
1   2   0   15.2500305  24.246683  1.6651521
1   2   0   15.365576   23.290754  9.404872
1   2   0   15.372606   20.203691  2.0006201
1   2   0   15.524696   21.03997   4.280077
1   2   0   15.583943   22.344622  9.421347
1   2   0   15.6358385  26.785904  9.881406
1   2   0   15.66383    22.829983  7.137772
1   2   0   15.673803   26.918291  3.302736
1   2   0   15.717824   22.365341  9.221828
1   2   0   15.847992   24.700747  1.389664
1   2   0   15.883896   22.593819  7.277129
1   2   0   15.91041    26.531118  2.5693457
1   2   0   15.916905   27.137867  4.289855
1   2   0   16.047333   28.93811   5.414605
Return B-band luminosity function of galaxies residing in halos of mass between 10^13 and 10^14 solar masses.
select .2*floor(5*g.mag_b) as magB
,      count(*) as num
from   DeLucia2006a g
,      MPAHalo h
where  g.haloId = h.haloId
and    h.m_TopHat between 1000 and 10000
and    h.redshift=0
group by .2*floor(5*g.mag_b)
14. Find the dependency of halo formation times on environment
select zForm
,      avg(g5) as g5
,      avg(g10) as g10
from   MMField f
,      ( select des.haloId, des.phkey,
                max(PROG.redshift) as zForm
         from MPAHalo PROG,
              MPAHalo DES
         where DES.snapnum = 63
         and PROG.haloId between DES.haloId
                             and DES.lastProgenitorId
         and prog.np >= des.np/2
         and des.np between 100 and 200
         group by des.haloId, des.phkey ) t
where  t.phkey = f.phkey
and    f.snapnum=63
group by zForm
Tools
Other tools
• wget, UNIX/LINUX command
wget "http://www.g-vo.org/Millennium?action=doQuery&SQL=select top 10 haloid,snapnum, x,y,z,np from mpahalo"
• Use in R (similar in IDL) ...
• TOPCAT
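The wget call can also be prepared from a script. The sketch below only builds the URL and sends nothing; it assumes the `action=doQuery` and `SQL=` parameters shown in the wget line, and whether the service accepts this exact encoding is not verified here. The function name `build_query_url` is made up.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://www.g-vo.org/Millennium"  # from the wget example

def build_query_url(sql):
    """Return the query URL with the SQL text percent-encoded."""
    return BASE_URL + "?" + urlencode({"action": "doQuery", "SQL": sql})

sql = "select top 10 haloid,snapnum, x,y,z,np from mpahalo"
url = build_query_url(sql)
print(url)

# Decoding the query string gives back the original SQL text.
decoded = parse_qs(urlsplit(url).query)
print(decoded["SQL"][0])
```

Encoding the SQL text avoids problems with spaces and commas that a shell or an HTTP client might otherwise mangle.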
Thank you.