Introduction to the Millennium Database with an SQL tutorial
2007-01-17/19 Leiden Millennium DB Tutorial

Overview
• Why relational database?
• Overview relational databases
  – general
  – Millennium DB design
• SQL Tutorial
• Science queries
• Tools
• Advanced subjects (not now)

Website documentation
http://www.g-vo.org/Millennium/Help

Why use a relational database?
• encapsulation of data in terms of a rigorous logical model
  – no need to know about internals of data storage
  – forces one to think carefully about data structure
• ANSI standard query language (SQL) for finding the information one is interested in
  – remote filtering
  – speeds up the path from science question to answer
  – facilitates communication
• many implementations, commercial and open source
  – advanced query optimizers (indexes, clustering)

Relational Database concepts
Millennium database design

A relational database stores data in relations (= tables)

Tables
• Tables have names
• Related data values are stored in rows
• Rows have columns
  – all the same for a given table
• Columns have names and data types
• Rows often have a unique identifier consisting of the values of >= 1 columns: the primary key

(Figure: example table with its Primary Key Column, Column, Foreign Key Columns and Row labelled)

Foreign keys
• A database can contain many tables
• The set of table definitions in a database is called the schema of the database
• Tables can be related by foreign keys: pointers (by value) from a row in one table to a row in another (or possibly the same) table
• Why not combine these rows into one table?
• Consider storing galaxies, with info about their sub-halo as well as the FOF groups these live in.
Note: a subhalo contains >= 1 galaxies, a FOF halo >= 0 subhalos.

One table: redundancy

GalaxyEtc
galId  mStar  magB   X     haloId  np   hX    vMax  fofId  nSub  m200    fX
112    0.215  -17.9  7.6   6625    100  7.6   165   123    2     445.77  7.6
113    0.038  -15.6  7.4   6625    100  7.6   165   123    2     445.77  7.6
154    0.173  -17.1  7.65  6626    65   7.9   130   123    2     445.77  7.6
221    1.20   -20.7  35.1  7883    452  35.1  200   456    2     101.32  35.1
223    0.225  -19.7  35.0  7883    452  35.1  200   456    2     101.32  35.1
225    0.04   -17.5  34.9  7883    452  35.1  200   456    2     101.32  35.1
278    1.54   -19.4  35.2  7884    255  35.2  190   456    2     101.32  35.1
…

Normalization

Galaxy
galId  haloId  mStar  magB   X     …
112    6625    0.215  -17.9  7.6   …
113    6625    0.038  -15.6  7.4   …
154    6626    0.173  -17.1  7.65  …
221    7883    1.20   -20.7  35.1  …
223    7883    0.225  -19.7  35.0  …
225    7883    0.04   -17.5  34.9  …
278    7884    1.54   -19.4  35.2  …
…

SubHalo
haloId  fofId  Np   X     vMax  …
6625    123    100  7.6   165   …
6626    123    65   7.9   130   …
7883    456    452  35.1  200   …
7884    456    255  35.2  190   …
9885    789    30   67.0  110   …
…

FOF
fofId  nSub  m200    X     …
123    2     445.77  7.6   …
456    2     101.32  35.1  …
789    1     70.0    67.0  …
…

Millennium database
(Schema diagram showing the tables FOF, DHalo, Bower2006a, MPAMocks, SubHalo, DSubHalo, MField, DeLucia2006a, MPAHalo)

Web browser:
http://www.g-vo.org/Millennium
http://www.g-vo.org/MyMillennium

SQL Tutorial

SQL
• Structured Query Language
• Filtering, combining, sub-setting of tables
• Functions, procedures, aggregations
• Data manipulation: insert/update/delete
• A query produces tabular results, which can be used as tables again in sub-queries, or stored in a database
• Table creation...
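
The normalization of GalaxyEtc into the Galaxy, SubHalo and FOF tables can be tried out locally. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database rather than the Millennium server, the column types are guesses, and the rows are the sample values from the slides. It shows that joining the three tables on their foreign keys gives back the redundant one-table view.

```python
import sqlite3

# In-memory SQLite database; a stand-in for illustration, not the Millennium DB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table FOF (
    fofId integer primary key,
    nSub  integer,
    m200  real,
    x     real
);
create table SubHalo (
    haloId integer primary key,
    fofId  integer references FOF(fofId),      -- foreign key
    np     integer,
    x      real,
    vMax   real
);
create table Galaxy (
    galId  integer primary key,
    haloId integer references SubHalo(haloId), -- foreign key
    mStar  real,
    magB   real,
    x      real
);
""")

# Sample rows copied from the slides.
conn.executemany("insert into FOF values (?, ?, ?, ?)", [
    (123, 2, 445.77, 7.6), (456, 2, 101.32, 35.1), (789, 1, 70.0, 67.0)])
conn.executemany("insert into SubHalo values (?, ?, ?, ?, ?)", [
    (6625, 123, 100, 7.6, 165), (6626, 123, 65, 7.9, 130),
    (7883, 456, 452, 35.1, 200), (7884, 456, 255, 35.2, 190),
    (9885, 789, 30, 67.0, 110)])
conn.executemany("insert into Galaxy values (?, ?, ?, ?, ?)", [
    (112, 6625, 0.215, -17.9, 7.6), (113, 6625, 0.038, -15.6, 7.4),
    (154, 6626, 0.173, -17.1, 7.65), (221, 7883, 1.20, -20.7, 35.1),
    (223, 7883, 0.225, -19.7, 35.0), (225, 7883, 0.04, -17.5, 34.9),
    (278, 7884, 1.54, -19.4, 35.2)])

# Following the foreign keys rebuilds the wide GalaxyEtc rows on demand.
galaxy_etc = conn.execute("""
    select g.galId, g.mStar, g.magB, g.x,
           h.haloId, h.np, h.x, h.vMax,
           f.fofId, f.nSub, f.m200, f.x
    from Galaxy g, SubHalo h, FOF f
    where g.haloId = h.haloId
      and h.fofId = f.fofId
    order by g.galId
""").fetchall()

for row in galaxy_etc:
    print(row)
```

Each sub-halo and FOF property is now stored once instead of once per galaxy. SubHalo 9885 hosts no galaxy in this sample, so it does not appear in the joined result.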
Table creation statement

    create table MPAHalo (
      haloId long not null,
      descendantId long,      -- foreign key
      lastProgenitorId long,  -- foreign key
      snapnum integer,
      redshift real,
      x real, y real, z real,
      np integer,
      velDisp real,
      vmax real,
      ...,
      primary key (haloId)
    );

SELECT ... FROM ... WHERE ...

1.  select *
      from MPAHalo

2.  select snapnum, redshift, np
      from MPAHalo

3.  select *
      from MPAHalo
     where redshift = 0

WHERE conditions
• = <> != < > <= >=
• np between 100 and 200
• name like '%Frenk'
• a=b and d=e
• a=b or e=d
• id in (1,2,3)
• a is null
• a is not null
• exists ... (later)

Custom column names

    select snapnum as snapshotIndex
    ,      redshift as z
    ,      np as numberOfParticles
      from MPAHalo

Demo queries

    select * from snapshots

    select x,y
      from MPAHalo
     where z between 10 and 12
       and np > 50
       and snapnum = 63

    select haloid,snapnum
      from MPAHalo
     where np = 100

ORDER BY ... [ASC | DESC]

    select h.*
      from MPAHalo h
     order by h.snapnum desc
     ,        h.x asc

TOP

    select top 10 haloid, np
      from mpahalo
     where snapnum = 63
     order by np desc

Aggregation: count, sum, max, min, avg, stddev

    select count(*) as num
    ,      max(stellarmass) as maxmass
    ,      avg(stellarmass) as avgmass
      from delucia2006a
     where snapnum = 63
       and type = 1

JOIN (note the aliases)

    select h.haloid, g.stellarMass
      from delucia2006a g
      ,    mpahalo h
     where h.np = 1000
       and g.haloid = h.haloid

Galaxy
galId  haloId  mStar  magB   X
112    6625    0.215  -17.9  7.6
113    6625    0.038  -15.6  7.4
154    6626    0.173  -17.1  7.65
221    7883    1.20   -20.7  35.1
223    7883    0.225  -19.7  35.0
225    7883    0.04   -17.5  34.9
278    7884    1.54   -19.4  35.2

SubHalo
haloId  fofId  Np   X     vMax
6625    123    100  7.6   165
6626    123    65   7.9   130
7883    456    452  35.1  200
7884    456    255  35.2  190
9885    789    30   67.0  110

Demo: galaxies in massive halos

    select h.haloId, g.*
      from DeLucia2006a g
      ,    MPAHalo h
     where h.snapnum = 63
       and h.np between 10000 and 11000
       and g.haloId = h.haloId

Demo: direct progenitors of massive halos

    select prog.*
      from MPAHalo prog
      ,    MPAHalo des
     where des.haloId = prog.descendantId
       and des.np > 10000
       and des.snapnum = 63

GROUP BY

    select redshift
    ,      type
    ,      count(*) as numGal
    ,      avg(stellarMass) as m_avg
    ,      max(stellarMass) as m_max
      from DeLucia2006a
     group by redshift, type
     order by redshift, type

Sub-selects

    select g.galaxyId
      from DeLucia2006a g
      ,    (select top 10 haloId
              from mpahalo
             where snapnum = 63
             order by np desc) mh
     where g.haloId = mh.haloId

Science questions as SQL

Motivation for data model
1. Return the galaxies residing in halos of mass between 10^13 and 10^14 solar masses.
2. Return the galaxy content at z=3 of the progenitors of a halo identified at z=0.
3. Return all the galaxies within a sphere of radius 3 Mpc around a particular halo.
4. Return the complete halo merger tree for a halo identified at z=0.
5. Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals.
6. Find properties of all galaxies in haloes of mass 10**14 at redshift 1 which have had a major merger (mass-ratio < 4:1) since redshift 1.5.
7. Find all the z=3 progenitors of z=0 red ellipticals (i.e. B-V > 0.8, B/T > 0.5).
8. Find the descendants at z=1 of all LBGs (i.e. galaxies with SFR > 10 Msun/yr) at z=3.
9. Make a list of all haloes at z=3 which contain a galaxy of mass > 10**9 Msun which is a progenitor of BCGs in z=0 clusters of mass > 10**14.5.
10. Find all z=3 galaxies which have NO z=0 descendant.
11. Return the complete galaxy merging history for a given z=0 galaxy.
12. Find all the z=2 galaxies which were within 1 Mpc of an LBG (i.e. SFR > 10 Msun/yr) at some previous redshift.
13. Find the multiplicity function of halos depending on their environment (overdensity of density field smoothed on certain scale).
14. Find the dependency of halo formation times on environment (“Gao-effect”).

5. Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals.

    select x,y,z,velX,velY,velZ
      from DeLucia2006a
     where mag_b between -23 and -18
       and bulgeMass >= .9*stellarMass
       and snapnum = 50

4. Return the complete halo merger tree for a halo identified at z=0

Efficient storage of merger trees in a relational database
• Goal: allow queries for the formation history of any object
• No recursion possible in RDB, nor desired
• Method:
  – depth first ordering of trees
  – label by rank in order
  – pointer to “last progenitor” below each node
  – all progenitors have label BETWEEN label of root AND that of last progenitor
  – cluster table on label

Merger trees
(Figure: merger trees)

    select prog.snapnum
    ,      prog.x
    ,      prog.y
    ,      prog.np
      from millimil..mpahalo des
      ,    millimil..mpahalo prog
     where prog.haloId between des.haloId and des.lastProgenitorId
       and des.haloId = 0

Some more features of the merger tree data model

Leaves:

    select galaxyId as leaf
      from galaxies des
     where galaxyId = lastProgenitorId

Branching points:

    select descendantId
      from galaxies des
     where descendantId != -1
     group by descendantId
    having count(*) > 1

Main branches
• Roots and leaves:

    select des.galaxyId as rootId
    ,      min(prog.lastprogenitorid) as leafId
      into rootLeaf
      from mpagalaxies..delucia2006a des
      ,    mpagalaxies..delucia2006a prog
     where des.galaxyId = 0
       and prog.galaxyId between des.galaxyId and des.lastProgenitorId
     group by des.galaxyId

• Main branch:

    select rl.rootId, b.*
      from rootLeaf rl
      ,    mpagalaxies..delucia2006a b
     where b.galaxyId between rl.rootId and rl.leafId

Find all halos in a subvolume of space:
15 <= x <= 20
20 <= y <= 25
5 <= z <= 10

    select x,y,z
      from mpahalo
     where snapnum = 63
       and x between 10 and 20
       and y between 20 and 30
       and z between 0 and 10

Inefficient, even when indexed!
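
The depth-first labelling scheme listed under "Efficient storage of merger trees in a relational database" can be illustrated outside the database. The sketch below is not the code used to build the Millennium tables. It is a minimal Python illustration, assuming a tiny hypothetical tree given as a map from each node to its descendant; the node names and the function `label_depth_first` are invented for the example.

```python
from collections import defaultdict


def label_depth_first(descendant_of, root):
    """Label one merger tree in depth-first order.

    descendant_of maps a (hypothetical) node name to the name of its
    descendant; the root maps to None. Returns two dicts:
    ids[node]  = rank of the node in depth-first order,
    last[node] = id of the last progenitor below that node.
    """
    progenitors = defaultdict(list)
    for node, desc in descendant_of.items():
        if desc is not None:
            progenitors[desc].append(node)

    ids, last = {}, {}
    counter = 0

    def visit(node):
        # Plain recursion is fine for a toy tree; very deep trees would
        # need an explicit stack instead.
        nonlocal counter
        ids[node] = counter
        counter += 1
        for prog in progenitors[node]:
            visit(prog)
        last[node] = counter - 1  # highest id handed out within this subtree

    visit(root)
    return ids, last


# Toy tree: A is the z=0 root; B and C merge into A; D and E merge into B; F becomes C.
tree = {"A": None, "B": "A", "C": "A", "D": "B", "E": "B", "F": "C"}
ids, last = label_depth_first(tree, "A")


def subtree(node):
    """The node plus all its progenitors: ids BETWEEN ids[node] AND last[node]."""
    return sorted(n for n in ids if ids[node] <= ids[n] <= last[node])


print(subtree("B"))                             # the whole tree ending in B
print(sorted(n for n in ids if ids[n] == last[n]))  # leaves: id equals last-progenitor id
```

Because every subtree occupies a contiguous range of labels, the recursive question "give me all progenitors" becomes a single range condition, which is what the `between des.haloId and des.lastProgenitorId` clauses in the queries express.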
x          y          z
15.001083  42.471325  24.673561
15.001247  58.420914  42.722874
15.002215  38.042484  29.557423
15.002735  50.487785  57.716877
15.002753  20.000177  8.21466
15.005095  13.637599  16.135191
15.006593  22.170828  48.242783
15.011488  24.824438  19.773285
15.011741  48.099907  11.500685
15.011868  23.312265  27.858799
15.013065  23.969515  18.883507
15.013158  56.041866  40.82894
15.014361  59.503357  45.31733
15.017322  46.257664  44.37695
15.018202  27.333895  9.441319

Spatial indexes
• Performance of finding things is improved if those things are co-located on disk: ordering, indices
• Co-locating a 3D configuration of points on a 1D disk can only be done approximately
• Space filling curves: Peano-Hilbert, Z-curve

Zones
(Figure: zones)

Zone index
• Coarse sampling of points in multiple dimensions allows simple multidimensional ordering
• ix = floor(x/10Mpc)
  iy = floor(y/10Mpc)
  iz = floor(z/10Mpc)
• index on (snapnum,ix,iy,iz,x,y,z,galaxyId)

IX  IY  IZ  X           Y          Z
1   2   0   15.061804   20.891907  4.4156647
1   2   0   15.069336   23.437601  9.812217
1   2   0   15.100678   20.905642  4.613036
1   2   0   15.173968   22.36883   8.01832
1   2   0   15.194122   20.67583   4.8034463
1   2   0   15.2500305  24.246683  1.6651521
1   2   0   15.365576   23.290754  9.404872
1   2   0   15.372606   20.203691  2.0006201
1   2   0   15.524696   21.03997   4.280077
1   2   0   15.583943   22.344622  9.421347
1   2   0   15.6358385  26.785904  9.881406
1   2   0   15.66383    22.829983  7.137772
1   2   0   15.673803   26.918291  3.302736
1   2   0   15.717824   22.365341  9.221828
1   2   0   15.847992   24.700747  1.389664
1   2   0   15.883896   22.593819  7.277129
1   2   0   15.91041    26.531118  2.5693457
1   2   0   15.916905   27.137867  4.289855
1   2   0   16.047333   28.93811   5.414605

Return the B-band luminosity function of galaxies residing in halos of mass between 10^13 and 10^14 solar masses.
    select .2*floor(5*g.mag_b) as magB
    ,      count(*) as num
      from DeLucia2006a g
      ,    MPAHalo h
     where g.haloId = h.haloId
       and h.m_TopHat between 1000 and 10000
       and h.redshift = 0
     group by .2*floor(5*g.mag_b)

14. Find the dependency of halo formation times on environment

    select zForm
    ,      avg(g5) as g5
    ,      avg(g10) as g10
      from MMField f
      ,    ( select des.haloId, des.phkey, max(prog.redshift) as zForm
               from MPAHalo prog, MPAHalo des
              where des.snapnum = 63
                and prog.haloId between des.haloId and des.lastProgenitorId
                and prog.np >= des.np/2
                and des.np between 100 and 200
              group by des.haloId, des.phkey
           ) t
     where t.phkey = f.phkey
       and f.snapnum = 63
     group by zForm

Tools

Other tools
• wget, UNIX/LINUX command

    wget "http://www.g-vo.org/Millennium?action=doQuery&SQL=select top 10 haloid,snapnum,x,y,z,np from mpahalo"

• Use in R (similar in IDL) ...
• TOPCAT

Thank you.
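
The wget call under "Other tools" passes the query as URL parameters, so the same request URL can be assembled in any scripting language. As a minimal sketch, assuming only the endpoint and the two parameter names (`action`, `SQL`) shown in that wget example, the Python standard library can build a properly escaped URL; the helper name `build_query_url` is invented for the illustration.

```python
from urllib.parse import urlencode

# Endpoint and parameter names as they appear in the wget example on the slide.
BASE_URL = "http://www.g-vo.org/Millennium"


def build_query_url(sql, base_url=BASE_URL):
    """Return the GET URL that submits `sql` to the query service."""
    # urlencode escapes spaces, commas and other reserved characters in the SQL text.
    return base_url + "?" + urlencode({"action": "doQuery", "SQL": sql})


url = build_query_url("select top 10 haloid,snapnum,x,y,z,np from mpahalo")
print(url)
```

The resulting string can be handed to wget or to `urllib.request.urlopen`. Whether the service answers, and in which output format, depends on the server and is not shown here.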