PS1 PSPS Object Data Manager Design
PSPS Critical Design Review
November 5-6, 2007
IfA

Outline
slide 2
- ODM Overview
- Critical Requirements Driving Design
- Work Completed
- Detailed Design
  - Spatial Querying [AS]
  - ODM Prototype [MN]
  - Hardware/Scalability [JV]
- How Design Meets Requirements
- WBS and Schedule
- Issues/Risks
([AS] = Alex, [MN] = Maria, [JV] = Jan)

ODM Overview
The Object Data Manager will:
- Provide a scalable data archive for the Pan-STARRS data products
- Provide query access to the data for Pan-STARRS users
- Provide detailed usage tracking and logging
slide 3

ODM Driving Requirements
Total size 100 TB:
- 1.5 x 10^11 P2 detections
- 8.3 x 10^10 P2 cumulative-sky (stack) detections
- 5.5 x 10^9 celestial objects
Nominal daily rate (divide by 3.5 x 365):
- P2 detections: 120 million/day
- Stack detections: 65 million/day
- Objects: 4.3 million/day
Cross-match requirement: 120 million / 12 hrs ~ 2800/s
DB size requirement:
- 25 TB/yr
- ~100 TB by the end of PS1 (3.5 yrs)
slide 4

Work Completed So Far
- Built a prototype
- Scoped and built prototype hardware
- Generated simulated data
  - 300M SDSS DR5 objects, 1.5B Galactic plane objects
- Initial load done: created a 15 TB DB of simulated data
  - Largest astronomical DB in existence today
- Partitioned the data correctly using the Zones algorithm
- Able to run simple queries on the distributed DB
- Demonstrated critical steps of incremental loading
- It is fast enough
  - Cross-match > 60k detections/sec
  - Required rate is ~3k/sec
slide 5

Detailed Design
Reuse SDSS software as much as possible:
- Data Transformation Layer (DX) - interface to IPP
- Data Loading Pipeline (DLP)
- Data Storage (DS)
  - Schema and test queries
  - Database management system
  - Scalable data architecture
  - Hardware
- Query Manager (QM; CasJobs for prototype)
slide 6

High-Level Organization
(Architecture diagram: the Data Transformation Layer (DX) feeds the load servers (objZoneIndx, orphans, Detections_l1..ln, LnkToObj_l1..ln, Load Support1..n, LoadAdmin, PartitionMap) of the Data Loading Pipeline (DLP); linked servers connect the slice databases P1..Pm ([Objects_p1..pm], [LnkToObj_p1..pm], [Detections_p1..pm], Meta) and the PS1 database (PartitionsMap, Objects, LnkToObj, Detections, Meta) in Data Storage (DS); the Query Manager (QM) and Web Based Interface (WBI) sit on top. Legend: full table, [partitioned table], output table, partitioned view.)
slide 7

Detailed Design (section outline repeated; DX is next)
slide 8

Data Transformation Layer (DX)
- Based on the SDSS sqlFits2CSV package
  - Linux/C++ application
  - FITS reader driven off header files
- Convert IPP FITS files to
  - ASCII CSV format for ingest (initially)
  - SQL Server native binary later (3x faster)
- Follow the batch and ingest verification procedure described in the ICD
  - 4-step batch verification
  - Notification and handling of a broken publication cycle
- Deposit CSV or binary input files in a directory structure
  - Create a "ready" file in each batch directory
- Stage input data on the Linux side as it comes in from IPP
slide 9

DX Subtasks
(Task breakdown: Initialization - FITS schema, FITS reader, CSV converter, CSV writer, interface with IPP, naming convention; Job - uncompress batch, read batch; Batch Verification - verify batch, verify manifest, verify FITS integrity, verify FITS content, verify FITS data, handle broken cycle; Batch Conversion - CSV converter, binary converter, "batch_ready", interface with DLP.)
slide 10

DX-DLP Interface
Directory structure on the staging FS (Linux):
- Separate directory for each JobID_BatchID
- Contains a "batch_ready" manifest file: name, #rows, and destination table of each file
- Contains one file per destination table in the ODM: Objects, Detections, other tables
Creation of the "batch_ready" file is the signal to the loader to ingest the batch. Batch size and frequency of
ingest cycle TBD.
slide 11

Detailed Design (section outline repeated; DLP is next)
slide 12

Data Loading Pipeline (DLP)
- sqlLoader: the SDSS data loading pipeline
  - Pseudo-automated workflow system
  - Loads, validates, and publishes data (from CSV to SQL tables)
  - Maintains a log of every step of loading
  - Managed from the Load Monitor web interface
- Has been used to load every SDSS data release
  - EDR, DR1-6, ~15 TB of data altogether
  - Most of it (since DR2) loaded incrementally
  - Kept many data errors from getting into the database
    - Duplicate ObjIDs (symptom of other problems)
    - Data corruption (the CSV format is invaluable in catching this)
slide 13

sqlLoader Design
- Existing functionality
  - Shown for the SDSS version
  - Workflow, distributed loading, Load Monitor
- New functionality
  - Schema changes
  - Workflow changes
  - Incremental loading (cross-match and partitioning)
slide 14

sqlLoader Workflow
(Workflow diagram: EXP Export -> CHK Check CSV -> BLD Build Task DBs -> SQL Build SQL Schema -> VAL Validate -> BCK Backup -> DTC Detach -> PUB Publish -> CLN Cleanup -> FIN Finish.)
- Distributed design achieved with linked servers and SQL Server Agent
- The LOAD stage can be done in parallel by loading into temporary task databases
- The PUBLISH stage writes from the task DBs to the final DB
- The FINISH stage creates indices
- The loading pipeline is a system of VB and SQL scripts, stored procedures and functions, and (derived) tables
slide 15

Load Monitor Tasks Page (screenshot)
slide 16
Load Monitor Active Tasks (screenshot)
slide 17
Load Monitor Statistics Page (screenshot)
slide 18
Load Monitor - New Task(s) (screenshot)
slide 19

Data Validation
- Tests for data integrity and consistency
- Scrubs data and finds problems in upstream pipelines
- Most of the validation can be performed within the individual task DB (in parallel)
slide 20

Validation Tests
- Test uniqueness of primary keys: test the unique key in each table
- Test foreign keys: test for consistency of keys that link tables
- Test cardinalities: test consistency of numbers of various quantities
- Test HTM IDs: test the Hierarchical Triangular Mesh IDs used for spatial indexing
- Test link table consistency: ensure that links are consistent

Distributed Loading
(Diagram: Samba-mounted CSV/binary files feed the Load Monitor, the master server (LoadAdmin, master schema), and the slave servers (LoadSupport, task data, views of the task DBs); Publish and Finish steps move task data into the publish schema.)
slide 21

Schema Changes
- The schema in the task and publish DBs is driven off a list of schema DDL files to execute (xschema.txt)
- Requires replacing the DDL files in the schema/sql directory and updating xschema.txt with their names
- PS1 schema DDL files have already been built
- Index definitions have also been created
- Metadata tables will be automatically generated using the metadata scripts already in the loader
slide 22

Workflow Changes
- Cross-Match and Partition steps will be added to the workflow: Export -> Check CSVs -> Create Task DBs -> Build SQL Schema -> Validate -> XMatch -> Partition -> Publish
- Cross-match will match detections to objects
- Partition will horizontally partition the data, move it to the slice servers, and build DPVs on the main server
slide 23

Matching Detections with Objects
- Algorithm described fully in the prototype section
- Stored procedures to cross-match detections will be part of the LOAD stage in the loader pipeline
- A vertical partition of the Objects table is kept on the load server for matching with detections
- The Zones cross-match algorithm is used to do the 1" and 2" matches
- Detections with no matches are saved in an Orphans table
slide 24

XMatch and Partition Data Flow
(Data-flow diagram: loadsupport loads Detections and builds ObjZoneIndx; XMatch writes Detections_ln, LinkToObj_ln, and Orphans; Update Objects, Merge Partitions, and Pull Chunk build per-slice Objects_m, Detections_m, and LinkToObj_m via Detections_chunk / LinkToObj_chunk; the final steps are Pull Partition and Switch
Partition, which swaps the refreshed Objects and LnkToObj partitions into the PS1 database.)
slide 25

Detailed Design (section outline repeated; Data Storage is next)
slide 26

Data Storage - Schema (schema diagram)
slide 27

PS1 Table Sizes Spreadsheet
(Flattened spreadsheet; the recoverable totals, each including a 20% allowance for indexes, are: whole PS1 at 3.5 years (DR4) 83.72 TB; prototype (0.3 x DR1) 7.86 TB; DR1 (0.29 of final) 26.20 TB; DR2 (0.57) 49.21 TB; DR3 (0.86) 72.22 TB. The largest tables are P2PsfFits (27.5 TB), StackPsfFits (17.7 TB), StackHiSigDeltas (6.1 TB), P2ToObj (4.7 TB), StackModelFits (4.0 TB), StackToObj (3.2 TB), and Objects (2.3 TB). Each table is flagged 0 if its size is essentially the same for all data releases, or 1 if it grows; a fraction marks tables only partially kept in the Primary filegroup. 7 bytes are added to each row for overhead, as suggested by Alex.)
slide 28

PS1 Table Sizes - All Servers (sizes in TB)

  Table            Year 1   Year 2   Year 3   Year 3.5
  Objects            4.63     4.63     4.61     4.59
  StackPsfFits       5.08    10.16    15.20    17.76
  StackToObj         1.84     3.68     5.56     6.46
  StackModelFits     1.16     2.32     3.40     3.96
  P2PsfFits          7.88    15.76    23.60    27.60
  P2ToObj            2.65     5.31     8.00     9.35
  Other Tables       3.41     6.94    10.52    12.67
  Indexes (+20%)     5.33     9.76    14.18    16.48
  Total             31.98    58.56    85.07    98.87
slide 29

Data Storage - Test Queries
- Drawn from several sources
  - Initial set of SDSS 20 queries
  - SDSS SkyServer sample queries
  - Queries from PS scientists (Monet, Howell, Kaiser, Heasley)
- Two objectives
  - Find potential holes/issues in the schema
  - Serve as test queries: test DBMS integrity and performance
- Loaded into CasJobs (Query Manager) as sample queries for the prototype
slide 30

Data Storage - DBMS
- Microsoft SQL Server 2005: relational DBMS with an excellent query optimizer
- Plus:
  - Spherical/HTM (C# library + SQL glue): spatial index (Hierarchical Triangular Mesh)
  - Zones (SQL library): alternate spatial decomposition with declination zones
  - Many stored procedures and functions, from coordinate conversions to neighbor search functions
  - Self-extracting documentation (metadata) and diagnostics
slide 31

Documentation and Diagnostics (screenshot)
slide 32

Data Storage - Scalable Architecture
- A monolithic database design (a la SDSS) will not do it
- SQL Server does not have a cluster implementation, so do it by hand
- Partitions vs slices
  - Partitions are file-groups on the same server; they parallelize disk accesses on the same machine
  - Slices are data partitions on separate servers
  - We use both!
- Additional slices can be added for scale-out
- For PS1, use SQL Server Distributed Partitioned Views (DPVs)
slide 33

Distributed Partitioned Views
- Difference between DPVs and file-group partitioning
  - FGs live in the same database; DPVs span separate DBs
  - FGs are for scale-up; DPVs are for scale-out
- The main server has a view of a partitioned table that includes remote partitions (we call them slices to distinguish them from FG partitions)
- Accomplished with SQL Server's linked server technology
- NOT truly parallel, though
slide 34

Scalable Data Architecture
- Shared-nothing architecture
- Detections are split across the cluster
- Objects are replicated on the head node and on the slice DBs (Objects_S1..S3)
- DPVs of the Detections tables live on the head node DB
- Queries on Objects stay on the head node
- Queries on Detections use only local data on the slices (S1: Detections_S1 + Objects_S1; S2: Detections_S2 + Objects_S2; S3: Detections_S3 + Objects_S3)
slide 35

Hardware - Prototype
(Cluster diagram. Server naming convention: PS0x = 4-core, PS1x = 8-core. Functions: LX = Linux staging, L = load server, S/Head = DB server, M = MyDB server, W = web server. Machines: PS01 = LX, PS03 = S2, PS04 = S3, PS05 = L2/M, PS11 = Head, PS12 = S1, PS13 = L1, PS02 = W. Storage racks: 10 x [13 x 750 GB] and 3 x [12 x 500 GB]. Capacities: staging 10 TB RAID5; loading 9 TB RAID10; DB 39 TB RAID10; web server holds MyDB.)
slide 36

Hardware - PS1
- Ping-pong configuration to maintain high availability and query performance
- 2 copies of each slice and of the main (head) node database on fast hardware (hot spares)
- A 3rd spare copy on slow hardware (can be just disk)
- Updates/ingest run on the offline copy; copies are switched when ingest and replication finish
- The second copy is synchronized while the first copy is online serving queries
- Both copies are live when there is no ingest
- 3x the basic configuration for PS1
(Diagram: queries hit the live copy; ingest goes to the offline copy, which is then replicated to the spare copy 3; the live, offline, and spare roles rotate among copies 1, 2, and 3.)
slide 37

Detailed Design (section outline repeated; Query Manager is next)
slide 38

Query Manager
- Based on SDSS CasJobs
- Configured to work with the distributed database and DPVs
- Direct links (contexts) to slices can be added later if necessary
- Segregates quick queries from long ones
- Saves query results server-side in MyDB
- Gives users a powerful query workbench
- Can be scaled out to meet any query load
- PS1 sample queries available to users
- PS1 prototype QM demo
slide 39

ODM Prototype Components
- Data Loading Pipeline
- Data Storage
- CasJobs: Query Manager (QM) and Web Based Interface (WBI)
- Testing
slide 40

Spatial Queries (Alex)
slide 41

Spatial Searches in the ODM
slide 42

Common Spatial Questions
Points-in-region queries:
1. Find all objects in this region
2. Find all "good" objects (not in masked areas)
3. Is this point in any of the regions?
Region-in-region queries:
4. Find regions near this region and their area
5. Find all objects with error boxes intersecting the region
6. What is the common part of these regions?
Various statistical operations:
7. Find the object counts over a given region list
8. Cross-match these two catalogs in the region
slide 43

Sky Coordinates of Points
- Many different coordinate systems: Equatorial, Galactic, Ecliptic, Supergalactic
- Longitude-latitude constraints
- Searches are often in a mix of different coordinate systems
  - e.g. gb > 40 and dec between 10 and 20
  - Problem: coordinate singularities, transformations
- How can one describe constraints in an easy, uniform fashion?
- How can one perform fast database queries in an easy fashion?
- Fast: indexes
- Easy: simple query expressions
slide 44

Describing Regions
- Space-time metadata for the VO (Arnold Rots)
- Includes definitions of
  - Constraint: a single small or great circle
  - Convex: an intersection of constraints
  - Region: a union of convexes
- Supports both angle and Cartesian descriptions
- Constructors for CIRCLE, RECTANGLE, POLYGON, CONVEX HULL
- Boolean algebra (INTERSECTION, UNION, DIFF)
- A proper language to describe the abstract regions
- Similar to GIS, but much better suited for astronomy
slide 45

Things Can Get Complex
(Figure: overlapping regions A and B. Green area: A in (B - ε) should find B if it contains an A and is not masked. Yellow area: A in (B ± ε) is an edge case and may find B if it contains an A.)
slide 46

We Do Spatial 3 Ways
- Hierarchical Triangular Mesh (extension to SQL)
  - Uses table-valued functions
  - Acts as a new "spatial access method"
- Zones: fits SQL well; surprisingly simple and good
- 3D constraints: a novel idea; algebra on regions, can be implemented in pure SQL
slide 47

PS1 Footprint
Using the projection cell definitions as centers for tessellation (T. Budavari)
slide 48

CrossMatch: Zone Approach
- Divide space into declination zones
- Objects are ordered by zoneid, ra (on the sphere one needs a wrap-around margin)
- A point search looks in neighboring zones within an ~(ra ± Δ) bounding box
- All inside the relational engine
  - Avoids the "impedance mismatch"
  - Can "batch" comparisons
  - Automatically parallel
- Details in Maria's thesis
slide 49

Indexing Using Quadtrees
- Cover the sky with hierarchical pixels
- COBE started with a cube
- The Hierarchical Triangular Mesh (HTM) uses trixels (Samet, Fekete)
- Start with an octahedron and split each triangle into 4 children, down to 20 levels deep
- The smallest triangles are 0.3"
- Each trixel has a unique htmID
slide 50

Space-Filling Curve
(Figure: trixel ranges such as [0.12, 0.13) subdividing into [0.120, 0.121), [0.121, 0.122), [0.122, 0.123), [0.123, 0.130).)
- Triangles correspond to ranges
- All points inside a triangle are inside its range
slide 51

SQL HTM Extension
- Every object has a 20-deep htmID (44 bits)
- Clustered index on htmID
- Table-valued functions for spatial joins
  - Given a region definition, the routine returns up to 10 ranges of covering triangles
  - A spatial query is mapped to ~10 range queries
- Current implementation rewritten in C#
- Excellent performance, little calling overhead
- Three layers: general geometry library, HTM kernel, IO (parsing + SQL interface)
slide 52

Writing Spatial SQL

    -- region description is contained by @area
    DECLARE @cover TABLE (htmStart bigint, htmEnd bigint)
    INSERT @cover SELECT * FROM dbo.fHtmCover(@area)

    DECLARE @region TABLE (convexId bigint, x float, y float, z float, c float)
    INSERT @region SELECT dbo.fGetHalfSpaces(@area)

    SELECT o.ra, o.dec, 1 AS flag, o.objid
    FROM (SELECT objID AS objid, cx, cy, cz, ra, [dec]
          FROM Objects q
          JOIN @cover AS c ON q.htmID BETWEEN c.htmStart AND c.htmEnd) AS o
    WHERE NOT EXISTS (
          SELECT p.convexId FROM @region AS p
          WHERE (o.cx*p.x + o.cy*p.y + o.cz*p.z < p.c)
          GROUP BY p.convexId)
slide 53

Status
slide 54
- All three libraries have been extensively tested
- Zones used for Maria's
thesis, plus various papers
- New HTM code in production use since July on SDSS
- The same code is also used by the STScI HLA and Galex
- Systematic regression tests developed
- Footprints computed for all major surveys
- Complex mask computations done on SDSS
- Loading: zones used for the bulk cross-match
- Ad hoc queries: use the HTM-based search functions
- Excellent performance

Prototype (Maria)
slide 55
(Title slide repeated)
slide 56

Detail Design
slide 57
- General Concepts
- Distributed Database architecture
- Ingest Workflow
- Prototype

Zones
slide 58
- Zones (spatial partitioning and indexing algorithm)
- Partition and bin the data into declination zones
  - ZoneID = floor((dec + 90.0) / zoneHeight)
- A few tricks are required to handle spherical geometry
- Place the data close on disk: clustered index on ZoneID and RA
- Fully implemented in SQL
- Efficient for nearby searches and (especially) cross-match
- Fundamental role in addressing the critical requirements: data volume management, association speed, spatial capabilities

Zoned table example (ZoneHeight = 8 arcsec in this example):

  ObjID  ZoneID*  RA     Dec     CX  CY  CZ  ...
  1          0      0.0  -90.0
  2      20250    180.0    0.0
  3      20250    181.0    0.0
  4      40500    360.0  +90.0

  * ZoneID = floor((dec + 90.0) / zoneHeight)
slide 59

SQL CrossNeighbors

    SELECT *
    FROM prObj1 z1
    JOIN zoneZone ZZ ON ZZ.zoneID1 = z1.zoneID
    JOIN prObj2 z2 ON ZZ.zoneID2 = z2.zoneID
    WHERE z2.ra BETWEEN z1.ra - ZZ.alpha AND z1.ra + ZZ.alpha
      AND z2.dec BETWEEN z1.dec - @r AND z1.dec + @r
      AND (z1.cx*z2.cx + z1.cy*z2.cy + z1.cz*z2.cz) > cos(radians(@r))
slide 60

Good CPU Usage (screenshot)
slide 61

Partitions
- SQL Server 2005 introduces technology to handle tables that are partitioned across different disk volumes and managed by a single server
- Partitioning makes management and access of large tables and indexes more efficient
  - Enables parallel I/O
  - Reduces the amount of data that needs to be accessed
  - Related tables can be aligned and collocated in the same place, speeding up JOINs
slide 62

Partitions
- 2 key elements
  - Partitioning function: specifies how the table or index is partitioned
  - Partition scheme: using a partitioning function, specifies the placement of the partitions on file groups
- Data can be managed very efficiently using partition switching
  - Add a table as a partition to an existing table
  - Switch a partition from one partitioned table to another
  - Reassign a partition to form a single table
- Main requirement: the table must be constrained on the partitioning column
slide 63

Partitions
- For the PS1 design, partitions mean file-group partitions
- Tables are partitioned into ranges of ObjectID, which correspond to declination ranges
- ObjectID boundaries are selected so that each partition has a similar number of objects
slide 64

Distributed Partitioned Views
- Tables participating in a Distributed Partitioned View (DPV) reside in different databases, on different instances or different (linked) servers
slide 65

Concept: Slices
- In the PS1 design, the bigger tables will be partitioned across servers
- To avoid confusion with file-group partitioning, we call them "slices"
- Data is glued together using Distributed Partitioned Views
- The ODM will manage slices; using slices improves system scalability
- For the PS1 design, tables are sliced into ranges of ObjectID, which correspond to broad declination ranges
- Each slice is subdivided into partitions that correspond to narrower declination ranges
- ObjectID boundaries are selected so that each slice has a similar number of objects.
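The boundary-selection rule above (each slice gets a similar number of objects) can be sketched as a quantile split over the ObjectIDs. This is an illustrative in-memory version only; in the ODM the boundaries would be computed in SQL over the Objects table.

```python
# Sketch: choose ObjectID range boundaries so each slice holds a similar
# number of objects (illustrative; not the actual ODM procedure).
# Boundaries are evenly spaced quantiles of the sorted ObjectIDs.

def slice_boundaries(object_ids, n_slices):
    """Return n_slices-1 ObjectID boundaries splitting the objects
    into slices of near-equal population."""
    ids = sorted(object_ids)
    n = len(ids)
    return [ids[(k * n) // n_slices] for k in range(1, n_slices)]

# Because ObjectIDs encode declination (ZoneID is the most significant
# part of the ID), equal-count ObjectID ranges correspond to declination
# bands of varying width, as in the "Well-Balanced Partitions" table.
ids = list(range(0, 1000, 3))          # stand-in for real ObjectIDs
bounds = slice_boundaries(ids, 4)
counts = []
lo = float("-inf")
for hi in bounds + [float("inf")]:
    counts.append(sum(lo <= i < hi for i in ids))
    lo = hi
print(bounds, counts)   # [249, 501, 750] [83, 84, 83, 84]
```

The slice populations differ by at most one object, which is the property the design asks for.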
slide 66

Detail Design Outline
slide 67
- General Concepts
- Distributed Database architecture
- Ingest Workflow
- Prototype

PS1 Distributed DB System
(Architecture diagram: load servers (objZoneIndx, detections, orphans_l1..ln, Detections_l1..ln, LnkToObj_l1..ln, Load Support1..n, LoadAdmin, PartitionsMap) feed, via linked servers, the slice databases P1..Pm ([Objects_p1..pm], [LnkToObj_p1..pm], [Detections_p1..pm], PartitionsMap, Meta) and the PS1 database (PartitionsMap, Objects, LnkToObj, Detections, Meta); the Query Manager (QM) and Web Based Interface (WBI) sit on top. Legend: full table, output table, partitioned view, [partitioned table].)
slide 68

Design Decisions: ObjID
- Objects have their positional information encoded in their objID
  - fGetPanObjID (ra, dec, zoneH)
  - ZoneID is the most significant part of the ID
- It gives scalability, performance, and spatial functionality
- Object tables are range-partitioned according to their object ID
slide 69

ObjectID Clusters Data Spatially

  Dec = -16.71611583
  ZH  = 0.008333
  ZID = (Dec + 90) / ZH = 08794.0661
  RA  = 101.287155
  ObjectID = 087941012871550661

- ObjectID is unique when objects are separated by > 0.0043 arcsec
slide 70

Design Decisions: DetectID
- Detections have their positional information encoded in the detection identifier
  - fGetDetectID (dec, observationID, runningID, zoneH)
  - Primary key (objID, detectionID), to align detections with objects within partitions
- Provides efficient access to all detections associated with one object
- Provides efficient access to all detections of nearby objects
slide 71

DetectionID Clusters Data in Zones

  Dec = -16.71611583
  ZH  = 0.008333
  ZID = (Dec + 90) / ZH = 08794.0661
  ObservationID = 1050000
  Running ID    = 1234567
  DetectID = 0879410500001234567
slide 72

ODM Capacity
5.3.1.3 The PS1 ODM shall be able to ingest into the ODM a total of
- 1.5 x 10^11 P2 detections
- 8.3 x 10^10 cumulative sky (stack) detections
- 5.5 x 10^9 celestial objects
together with their linkages.
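The worked ObjectID and DetectID examples above imply a simple decimal layout: five digits of the integer zone ID, followed by the RA in microdegrees (ObjectID) or the observation and running IDs (DetectID). The digit layout below is inferred from those single worked examples, not from the actual fGetPanObjID/fGetDetectID SQL functions, whose definitions are not shown in the slides.

```python
# Decimal ID layouts inferred from the worked examples on the slides
# (zoneH = 1/120 deg = 30 arcsec, which reproduces ZID = 08794.0661).
# The real fGetPanObjID / fGetDetectID are SQL UDFs; this is a sketch.

ZONE_HEIGHT = 1.0 / 120.0  # degrees

def get_pan_obj_id(ra, dec, zone_h=ZONE_HEIGHT):
    """ObjectID = zzzzz rrrrrrrrr ffff: 5-digit zone, RA in microdegrees,
    4 digits of the fractional position within the zone."""
    zid = (dec + 90.0) / zone_h
    zone = int(zid)                       # most significant part: the zone
    frac = int(round((zid - zone) * 1e4))
    return zone * 10**13 + int(round(ra * 1e6)) * 10**4 + frac

def get_detect_id(dec, observation_id, running_id, zone_h=ZONE_HEIGHT):
    """DetectID = zzzzz ooooooo rrrrrrr: zone, observationID, runningID."""
    zone = int((dec + 90.0) / zone_h)
    return zone * 10**14 + observation_id * 10**7 + running_id

# Worked examples from the slides:
print(get_pan_obj_id(101.287155, -16.71611583))       # 87941012871550661
print(get_detect_id(-16.71611583, 1050000, 1234567))  # 879410500001234567
```

Putting the zone in the most significant digits is what makes a range scan over IDs equivalent to a declination-band scan, which is the property the range partitioning relies on.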
slide 73 PS1 Table Sizes - Monolithic Table Year 1 Year 2 Year 3 Year 3.5 Objects 2.31 2.31 2.31 2.31 StackPsfFits 5.07 10.16 15.20 17.74 StackToObj 0.92 1.84 2.76 3.22 StackModelFits 1.15 2.29 3.44 4.01 P2PsfFits 7.87 15.74 23.61 27.54 P2ToObj 1.33 2.67 4.00 4.67 Other Tables 3.19 6.03 8.87 10.29 Indexes +20% 4.37 8.21 12.04 13.96 26.21 49.24 72.23 83.74 Total Sizes are in TB slide 74 What goes into the main Server Linked servers P1 Pm PartitionsMap PS1 Objects LnkToObj Meta PS1 database Objects PartitionsMap LnkToObj Meta Legend Database Full table [partitioned table] Output table Distributed Partitioned View slide 75 What goes into slices Linked servers [Objects_p1] [LnkToObj_p1] [Detections_p1] [Objects_pm] P1 Pm [LnkToObj_pm] [Detections_pm] PartitionsMap PartitionsMap PartitionsMap Meta PS1 Meta Objects LnkToObj Meta PS1 database [Objects_p1] PartitionsMap [LnkToObj_p1] Legend [Detections_p1] Database Meta Full table [partitioned table] Output table Distributed Partitioned View slide 76 What goes into slices Linked servers [Objects_p1] [LnkToObj_p1] [Detections_p1] [Objects_pm] P1 Pm [LnkToObj_pm] [Detections_pm] PartitionsMap PartitionsMap PartitionsMap Meta PS1 Meta Objects LnkToObj Meta PS1 database [Objects_p1] PartitionsMap [LnkToObj_p1] Legend [Detections_p1] Database Meta Full table [partitioned table] Output table Distributed Partitioned View slide 77 Duplication of Objects & LnkToObj Objects are distributed across slices Objects, P2ToObj, and StackToObj are duplicated in the slices to parallelize “inserts” & “updates” Detections belong into their object’s slice Orphans belong to the slice where their position would allocate them • Orphans near slices’ boundaries will need special treatment Objects keep their original object identifier • Even though positional refinement might change their zoneID and therefore the most significant part of their identifier slide 78 Glue = Distributed Views Linked servers [Objects_p1] [LnkToObj_p1] [Detections_p1] 
[Objects_pm] P1 Pm [Detections_pm] PartitionsMap PartitionsMap Meta [LnkToObj_pm] PartitionsMap Detections PS1 Meta Objects LnkToObj Meta PS1 database Detections Legend Database Full table [partitioned table] Output table Distributed Partitioned View slide 79 Partitioning in Main Server Main server is partitioned (objects) and collocated (lnkToObj) by objid Slices are partitioned (objects) and collocated (lnkToObj) by objid Linked servers P1 Pm PS1 PS1 database Query Manager (QM) Web Based Interface (WBI) slide 80 PS1 Table Sizes - Main Server Table Year 1 Year 2 Year 3 Year 3.5 2.31 2.31 2.31 2.31 StackPsfFits StackToObj 0.92 1.84 2.76 3.22 StackModelFits P2PsfFits P2ToObj 1.33 2.67 4.00 4.67 Other Tables 0.41 0.46 0.52 0.55 Indexes +20% 0.99 1.46 1.92 2.15 Total 5.96 8.74 11.51 12.90 Objects Sizes are in TB slide 81 PS1 Table Sizes - Each Slice m=4 m=8 m=10 m=12 Year 1 Year 2 Year 3 Year 3.5 Objects 0.58 0.29 0.23 0.19 StackPsfFits 1.27 1.27 1.52 1.48 StackToObj 0.23 0.23 0.28 0.27 StackModelFits 0.29 0.29 0.34 0.33 P2PsfFits 1.97 1.97 2.36 2.30 P2ToObj 0.33 0.33 0.40 0.39 Other Tables 0.75 0.81 1.00 1.01 Indexes +20% 1.08 1.04 1.23 1.19 Total 6.50 6.23 7.36 7.16 Table Sizes are in TB slide 82 PS1 Table Sizes - All Servers Table Year 1 Year 2 Year 3 Year 3.5 Objects 4.63 4.63 4.61 4.59 StackPsfFits 5.08 10.16 15.20 17.76 StackToObj 1.84 3.68 5.56 6.46 StackModelFits 1.16 2.32 3.40 3.96 P2PsfFits 7.88 15.76 23.60 27.60 P2ToObj 2.65 5.31 8.00 9.35 Other Tables 3.41 6.94 10.52 12.67 Indexes +20% 5.33 9.76 14.18 16.48 31.98 58.56 85.07 98.87 Total Sizes are in TB slide 83 Detail Design Outline slide 84 General Concepts Distributed Database architecture Ingest Workflow Prototype PS1 Distributed DB system objZoneIndx detections objZoneIndx detections orphans_l1 Orphans_ln Linked servers Detections_l1 Load Support1 LnkToObj_l1 Detections_ln Load Supportn LoadAdmin LnkToObj_ln PartitionsMap Linked servers [Objects_p1] [LnkToObj_p1] [Detections_p1] [Objects_pm] P1 
PartitionsMap Meta Pm Detections PS1 [LnkToObj_pm] [Detections_pm] PartitionsMap PartitionsMap Objects Meta LnkToObj Meta PS1 database Query Manager (QM) Legend Database Web Based Interface (WBI) slide 85 Full table Output table Partitioned View [partitioned table] “Insert” & “Update” slide 86 SQL Insert and Update are expensive operations due to logging and re-indexing In the PS1 design, Insert and Update have been refactored into sequences of: Merge + Constrain + Switch Partition Frequency • f1: daily • f2: at least monthly • f3: TBD (likely to be every 6 months) Ingest Workflow DZone X(1”) ObjectsZ DXO_1a NoMatch CSV X(2”) Detect Resolve DXO_2a P2PsfFits Orphans slide 87 P2ToObj Ingest @ frequency = f1 11 12 13 1 2 Stack*_1 Objects ObjectsZ Objects_1 P2ToObj P2ToObj_1 P2PsfFits P2ToPsfFits_1 StackToObj Orphans Orphans_1 Orphans_1 Metadata+ LOADER SLICE_1 slide 88 P2ToObj MAIN 3 Updates @ frequency = f2 11 12 13 1 2 Stack*_1 Objects Objects LOADER slide 89 Objects_1 P2ToObj_1 P2ToObj P2ToPsfFits_1 StackToObj Orphans_1 Metadata+ SLICE_1 MAIN 3 Updates @ frequency = f2 11 12 13 1 2 Stack*_1 Objects Objects LOADER slide 90 Objects_1 P2ToObj_1 P2ToObj P2ToPsfFits_1 StackToObj Orphans_1 Metadata+ SLICE_1 MAIN 3 Snapshots @ frequency = f3 1 2 3 Objects Objects P2ToObj Snapshot StackToObj Metadata+ MAIN slide 91 Batch Update of a Partition select into 1 1 2 1 … 2 3 merged slide 92 select into … where select into … where select into … where B1 + PK index B2 + PK index B3 + PK index switch switch switch B1 A1 A2 A3 Scaling-out Apply Ping-Pong strategy to satisfy query performance during ingest 2 x ( 1 main + m slices) [Objects_p1] Linked servers [LnkToObj_p1] [Detections_p1] [Objects_p2] [LnkToObj_p2] P1 P2 [Objects_pm] P2 P3 Pm-1 Pm [Detections_p2] Detections PartitionsMap Meta Objects Pm P1 Detections PartitionsMap PS1 PS1 LnkToObj Meta Meta [Detections_pm] [Objects_p1] [LnkToObj_p1] [Detections_p1] Objects LnkToObj [LnkToObj_pm] Meta PS1 database Query Manager (QM) 
Legend: Database, Duplicate, Full table, [partitioned table], Partitioned View, Duplicate P view.)
slide 93

Scaling-out
• More robustness, fault tolerance, and reliability call for 3 x (1 main + m slices)
(Diagram: three copies of the main database and slices, as above. Legend: Database, Duplicate, Full table, [partitioned table], Partitioned View, Duplicate P view.)
slide 94

Adding New Slices
SQL Server range partitioning capabilities make it easy:
• Recalculate partitioning limits
• Transfer data to new slices
• Remove data from old slices
• Define and apply the new partitioning schema
• Add new partitions to the main server
• Apply the new partitioning schema to the main server
slide 95

Adding New Slices (diagram)
slide 96

Detail Design Outline
• General Concepts
• Distributed Database architecture
• Ingest Workflow
• Prototype
slide 97

ODM Ingest Performance
5.3.1.6 The PS1 ODM shall be able to ingest the data from the IPP at two times the nominal daily arrival rate.*
* The nominal daily data rate from the IPP is defined as the total data volume to be ingested annually by the ODM divided by 365.
Nominal daily data rate:
• 1.5×10^11 / 3.5 / 365 = 1.2×10^8 P2 detections / day
• 8.3×10^10 / 3.5 / 365 = 6.5×10^7 stack detections / day
slide 99

Number of Objects
                  miniProto    myProto     Prototype
SDSS* Stars       5.7×10^4     1.3×10^7    1.1×10^8
SDSS* Galaxies    9.1×10^4     1.1×10^7    1.7×10^8
Galactic Plane    1.5×10^6     3×10^6      1.0×10^9
TOTAL             1.6×10^6     2.6×10^7    1.3×10^9
(PS1: 5.5×10^9)
* "SDSS" includes a mirror of the 11.3° < Dec < 30° objects to Dec < 0°
• Total CSV data loaded: 300 GB
• CSV bulk insert load: 8 MB/s; binary bulk insert: 18–20 MB/s
• Creation started October 15th, 2007; finished October 29th, 2007 (??)
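The nominal-rate arithmetic above, spelled out. The totals and the 3.5-year baseline come from the requirement; the 12-hour window is the cross-match requirement quoted earlier in the talk:

```python
P2_TOTAL = 1.5e11      # P2 detections over the 3.5-year survey
STACK_TOTAL = 8.3e10   # stack detections over the survey
YEARS = 3.5

p2_per_day = P2_TOTAL / YEARS / 365        # ~1.2e8 detections/day
stack_per_day = STACK_TOTAL / YEARS / 365  # ~6.5e7 detections/day

# Cross-match rate needed to process one day's P2 detections in 12 hours
required_per_s = p2_per_day / (12 * 3600)  # ~2700-2800 detections/s
```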
Includes:
• 10 epochs of P2PsfFits detections
• 1 epoch of Stack detections
slide 100

Prototype in Context
Survey                Objects      Detections
SDSS DR6              3.8×10^8
2MASS                 4.7×10^8
USNO-B                1.0×10^9
Prototype             1.3×10^9     1.4×10^10
PS1 (end of survey)   5.5×10^9     2.3×10^11
slide 102

Size of Prototype Database (table sizes in billions of rows)
Table            Main   Slice1  Slice2  Slice3  Loader   Total
Objects          1.30    0.43    0.43    0.43    1.30     3.89
StackPsfFits     6.49                                     6.49
StackToObj       6.49                                     6.49
StackModelFits   0.87                                     0.87
P2PsfFits                4.02    3.90    3.35    0.37    11.64
P2ToObj                  4.02    3.90    3.35    0.12    11.39
Total           15.15    8.47    8.23    7.13    1.79    40.77
Extra Tables     0.87    4.89    4.77    4.22    6.86    21.61
Grand Total     16.02   13.36   13.00   11.35    8.65    62.38
slide 103

Size of Prototype Database (table sizes in GB)
Table             Main   Slice1  Slice2  Slice3  Loader    Total
Objects          547.6   165.4   165.3   165.3   137.1   1180.6
StackPsfFits     841.5                                    841.6
StackToObj       300.9                                    300.9
StackModelFits   476.7                                    476.7
P2PsfFits                879.9   853.0   733.5    74.7   2541.1
P2ToObj                  125.7   121.9   104.8     3.8    356.2
Total           2166.7  1171.0  1140.2  1003.6   215.6   5697.1
Extra Tables     207.9   987.1   960.2   840.7   957.3   3953.2
Allocated/Free  1878.0  1223.0  1300.0  1121.0   666.0   6188.0
Grand Total     4252.6  3381.1  3400.4  2965.3  1838.9  15838.3
9.6 TB of data in a distributed database
slide 104

Well-Balanced Partitions
Server    Partition   Rows          Fraction   Dec Range
Main      1           432,590,598   33.34%     32.59
Slice 1   1           144,199,105   11.11%     14.29
Slice 1   2           144,229,343   11.11%      9.39
Slice 1   3           144,162,150   11.12%      8.91
Main      2           432,456,511   33.33%     23.44
Slice 2   1           144,261,098   11.12%      8.46
Slice 2   2           144,073,972   11.10%      7.21
Slice 2   3           144,121,441   11.11%      7.77
Main      3           432,496,648   33.33%     81.98
Slice 3   1           144,270,093   11.12%     11.15
Slice 3   2           144,090,071   11.10%     14.72
Slice 3   3           144,136,484   11.11%     56.10
slide 105

Ingest and Association Times (measured minutes)
Task                                  Minutes
Create Detections Zone Table            39.62
X(0.2") 121M × 1.3B                     65.25
Build #noMatches Table                   1.50
X(1") 12k × 1.3B                         0.65
Build #allMatches Table (121M)           6.58
Build Orphans Table                      0.17
Create P2PsfFits Table                  11.63
Create P2ToObj Table                    14.00
Total of Measured Times
                                       140.40
slide 106

Ingest and Association Times (estimated minutes)
Task                                  Minutes
Compute DetectionID, HTMID                30
Remove NULLs                              15
Index P2PsfFits on ObjID                  15
Slices pulling data from Loader            5
Resolve 1 detection – N objects           10
Total of Estimated Times                  75
(Estimates range from educated guesses to wild guesses.)
slide 107

Total Time to I/A Daily Data
Task                               Binary (h)   CSV (h)
Ingest 121M detections                 0.32       0.98
Total of measured times                2.34       2.34
Total of estimated times               1.25       1.25
Total time to I/A daily data           3.91       4.57
• Requirement: less than 12 hours (more than 2800 detections/s)
• Detection processing rate: 8600 to 7400 detections/s
• Margin on requirement: 3.1 to 2.6
• Using multiple loaders would improve performance
slide 108

Insert Time @ slices (estimated minutes)
Task                                  Minutes
Import P2PsfFits (binary out/in)        20.45
Import P2ToObj (binary out/in)           2.68
Import Orphans                           0.00
Merge P2PsfFits                           193
Add constraint P2PsfFits                   58
Merge P2ToObj                              13
Add constraint P2ToObj                     54
Total                                     362
(Merge and constraint times are educated guesses.)
≈6 h with 8 partitions/slice (~1.3×10^9 detections/partition)
slide 109

Detections Per Partition
Years   Total Detections   Slices   Partitions/Slice   Total Partitions   Detections/Partition
0.0     0                    4            8                  32             0
1.0     4.29×10^10           4            8                  32             1.34×10^9
1.0     4.29×10^10           8            8                  64             6.7×10^8
2.0     8.57×10^10           8            8                  64             1.34×10^9
2.0     8.57×10^10          10            8                  80             1.07×10^9
3.0     1.29×10^11          10            8                  80             1.61×10^9
3.0     1.29×10^11          12            8                  96             1.34×10^9
3.5     1.50×10^11          12            8                  96             1.56×10^9
slide 110

Total Time for Insert @ slice
Task                           Time (hours)
Total of measured times            0.25
Total of estimated times           5.3
Total time for daily insert        6
• Daily insert may operate in parallel with daily ingest and association
• Requirement: less than 12 hours
• Margin on requirement: 2.0
• Using more slices will improve insert performance
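The quoted processing rates and margins follow directly from the ingest-and-association totals above:

```python
DETECTIONS = 1.21e8     # one nominal day of P2 detections
REQUIRED_RATE = 2800.0  # detections/s, i.e. a day's load in under 12 h

def rate_and_margin(total_hours):
    # Effective detections/s for a daily run, and its margin over the requirement
    rate = DETECTIONS / (total_hours * 3600.0)
    return rate, rate / REQUIRED_RATE

binary_rate, binary_margin = rate_and_margin(3.91)  # ~8600 det/s, ~3.1x margin
csv_rate, csv_margin = rate_and_margin(4.57)        # ~7400 det/s, ~2.6x margin
```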
slide 111

Summary
• Ingest + association: < 4 h using 1 loader (@ f1 = daily)
  – Scales with the number of servers
  – Current margin on requirement: 3.1
  – Room for improvement
• Detection insert @ slices (@ f1 = daily)
  – 6 h with 8 partitions/slice
  – May happen in parallel with loading
• Detection links insert @ main (@ f2 ≤ monthly)
  – Unknown; 6 h available
• Objects insert & update @ slices (@ f2 ≤ monthly)
  – Unknown; 6 h available
• Objects update @ main server (@ f2 ≤ monthly)
  – Unknown; 12 h available; the transfer can be pipelined as soon as objects have been processed
slide 112

Risks
• Insert & update estimates at the slices may be too optimistic
  – Need more empirical evaluation exercising parallel I/O
• Disk storage estimates and layout may be too optimistic
  – Merges and indexes require 2x the data size
slide 113

Hardware/Scalability (Jan)
slide 114

PS1 Prototype Systems Design
Jan Vandenberg, JHU
Early PS1 Prototype
slide 115

Engineering Systems to Support the Database Design
• Sequential read performance is our life-blood: virtually all science queries will be I/O-bound
• ~70 TB raw data: 5.9 hours for a full scan on IBM's fastest 3.3 GB/s champagne-budget SAN
  – Need a 20 GB/s I/O engine just to scan the full data in less than an hour; can't touch this on a monolith
• Data mining is a challenge even with good index coverage
  – ~14 TB worth of indexes: 4-odd times bigger than SDSS DR6
• Hopeless if we rely on any bulk network transfers: must do the work where the data is
• Loading/ingest is more CPU-bound, though we still need solid write performance
slide 116

Choosing I/O Systems
So killer sequential I/O performance is a key systems design goal. Which gear to use?
• FC/SAN?
• Vanilla SATA?
• SAS?
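The full-scan numbers above are simple division, worth checking before they drive the hardware choice below (decimal units, as in the talk):

```python
DATA_GB = 70_000.0  # ~70 TB of raw data

def full_scan_hours(read_gb_per_s: float) -> float:
    # Time to stream the whole archive once at a given aggregate read rate
    return DATA_GB / read_gb_per_s / 3600.0

def rate_needed_gb_per_s(hours: float) -> float:
    # Aggregate read rate needed to finish a full scan in the given time
    return DATA_GB / (hours * 3600.0)
```

full_scan_hours(3.3) reproduces the ~5.9 h SAN figure, and rate_needed_gb_per_s(1.0) the ~20 GB/s target.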
slide 117

Fibre Channel, SAN
• Expensive but not-so-fast physical links (4 Gbit, 10 Gbit)
• Expensive switch
• Potentially very flexible
• Industrial-strength manageability
• Little control over RAID controller bottlenecks
slide 118

Straight SATA
• Fast
• Pretty cheap
• Not so industrial-strength
slide 119

SAS
• Fast: 12 Gbit/s FD building blocks
• Nice and mature, stable
• SCSI's not just for swanky drives anymore: takes SATA drives! So we have a way to use SATA without all the "beige"
• Pricey? $4400 for a full 15 x 750 GB system ($296/drive, close to Newegg media cost)
slide 120

SAS Performance, Gory Details
(Chart: native SAS vs. SATA throughput, MB/s against number of disks; SAS is roughly 20% faster.)
slide 121

Per-Controller Performance
• One controller can't quite accommodate the throughput of an entire storage enclosure
(Chart: measured one-controller MB/s against number of disks, versus ideal scaling and the 6 Gbit limit.)
slide 122

Resulting PS1 Prototype I/O Topology
• 1100 MB/s single-threaded sequential reads per server
(Chart: aggregate design I/O performance with dual controllers, versus one controller, ideal scaling, and the 6 Gbit limit.)
slide 123

RAID-5 v. RAID-10?
• Primer, anyone?
• RAID-5 perhaps feasible with contemporary controllers…
• …but not a ton of redundancy
• But after we add enough disks to meet performance goals, we have enough storage to run RAID-10 anyway!
slide 124

RAID-10 Performance
• 0.5 x RAID-0 for single-threaded reads
• RAID-0 performance for 2-user/2-thread workloads
• 0.5 x RAID-0 writes
slide 125

PS1 Prototype Servers
(Diagram: Linux staging host H1, prototype loaders L1–L2, prototype DB servers S1–S3.)
slide 126

PS1 Prototype Servers
(Photo: the PS1 prototype.)
slide 127

PS1 Prototype Servers
slide 128

Projected PS1 Systems Design
(Diagram: Linux staging hosts and loaders L1…LN feeding replicas R1–R3, each with head nodes H1–H2 and slices S1…S8, plus G1.)
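The per-controller and dual-controller charts above boil down to a simple capacity model: streaming throughput scales with disk count until the controllers saturate. A sketch with illustrative numbers; the per-disk and per-controller MB/s figures are assumptions, not measurements from the talk:

```python
def sequential_mb_per_s(n_disks: int, n_controllers: int,
                        per_disk: float = 110.0,
                        per_controller: float = 650.0) -> float:
    # Ideal aggregate streaming rate, capped by total controller bandwidth
    return min(n_disks * per_disk, n_controllers * per_controller)
```

Under this model a full 15-disk enclosure overruns a single controller, which is why the prototype pairs two controllers per server to reach its ~1100 MB/s single-threaded reads.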
S1 slide 129 S2 S8 Backup/Recovery/Replication Strategies No formal backup • …except maybe for mydb’s, f(cost*policy) 3-way replication • Replication != backup – Little or no history (though we might have some point-intime capabilities via metadata – Replicas can be a bit too cozy: must notice badness before replication propagates it • Replicas provide redundancy and load balancing… • Fully online: zero time to recover • Replicas needed for happy production performance plus ingest, anyway Off-site geoplex • Provides continuity if we lose HI (local or trans-Pacific network outage, facilities outage) • Could help balance trans-Pacific bandwidth needs (service continental traffic locally) slide 130 Why No Traditional Backups? Money no object… do traditional backups too!!! Synergy, economy of scale with other collaboration needs (IPP?)… do traditional backups too!!! Not super pricey… …but not very useful relative to a replica for our purposes • Time to recover slide 131 Failure Scenarios (Easy Ones) Zero downtime, little effort: • Disks (common) – Simple* hotswap – Automatic rebuild from hotspare or replacement drive • Power supplies (not uncommon) – Simple* hotswap • Fans (pretty common) – Simple* hotswap * Assuming sufficiently non-beige gear slide 132 Failure Scenarios (Mostly Harmless Ones) Some downtime and replica cutover: • System board (rare) • Memory (rare and usually proactively detected and handled via scheduled maintenance) • Disk controller (rare, potentially minimal downtime via cold-spare controller) • CPU (not utterly uncommon, can be tough and time consuming to diagnose correctly) slide 133 Failure Scenarios (Slightly Spooky Ones) Database mangling by human or pipeline error • Gotta catch this before replication propagates it everywhere • Need lots of sanity checks before replicating • (and so off-the-shelf near-realtime replication tools don’t help us) • Need to run replication backwards from older, healthy replicas. 
    Probably less automated than healthy replication
• Catastrophic loss of datacenter
  – Okay, we have the geoplex…
  – …but we're dangling by a single copy 'till recovery is complete, and this may be a while
  – …but are we still in trouble? Depending on colo scenarios, did we also lose the IPP and flat-file archive?
slide 134

Failure Scenarios (Nasty Ones)
• Unrecoverable badness fully replicated before detection
• Catastrophic loss of datacenter without geoplex
• Can we ever catch back up with the data rate if we need to start over and rebuild with an ingest campaign? Don't bet on it!
slide 135

Operating Systems, DBMS?
• SQL Server 2005 EE x64 (Win2003 EE x64)
  – Why? Why not DB2, Oracle RAC, PostgreSQL, MySQL, <insert your favorite>?
  – Why EE? Because it's there. <indexed DPVs?>
• Scientific Linux 4.x/5.x, or local favorite
• Platform rant from JVV available over beers
slide 136

Systems/Database Management
• Active Directory infrastructure
• Windows patching tools, practices
• Linux patching tools, practices
• Monitoring
• Staffing requirements
slide 137

Facilities/Infrastructure Projections for PS1
• Power/cooling
  – Prototype is 9.2 kW (2.6 tons AC)
  – PS1: something like 43 kW, 12.1 tons
• Rack space
  – Prototype is 69 RU, <2 42U racks (includes 14U of rackmount UPS at JHU)
  – PS1: about 310 RU (9-ish racks)
• Networking: ~40 Gbit Ethernet ports
• …plus sundry infrastructure, ideally already in place (domain controllers, monitoring systems, etc.)
slide 138

Operational Handoff to UofH
Gulp.
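The "lots of sanity checks before replicating" called out in the spooky failure scenarios could take the shape of a gate like this hypothetical sketch; the table names, the row-count heuristic, and the shrink threshold are all made up for illustration:

```python
def safe_to_replicate(current_counts, last_good_counts, max_shrink=0.001):
    # Block propagation if any table lost a suspicious fraction of its rows
    # since the last known-good snapshot: survey tables should only grow,
    # or shrink very slightly through legitimate cleanup.
    for table, last_good in last_good_counts.items():
        now = current_counts.get(table)
        if now is None:
            return False  # a whole table vanished
        if last_good > 0 and (last_good - now) / last_good > max_shrink:
            return False  # rows disappeared: possible mangling
    return True
```

Gates like this are why off-the-shelf near-realtime replication does not fit: the check must run, and pass, before a day's changes are allowed to reach the other replicas.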
slide 139

How Design Meets Requirements
• Cross-matching detections with objects
  – Zone cross-match is part of the loading pipeline
  – Already exceeded the requirement with the prototype
• Query performance
  – Ping-pong configuration for querying during ingest
  – Spatial indexing and distributed queries
  – Query Manager can be scaled out as necessary
• Scalability
  – Shared-nothing architecture; scale out as needed
  – Beyond PS1 we will need truly parallel query plans
slide 140

WBS/Development Tasks
Refine Prototype/Schema              2 PM
Staging/Transformation               3 PM
Initial Load                         1 PM
Load/Resolve Detections              3 PM
Resolve/Synchronize Objects          3 PM
Create Snapshot                      1 PM
Replication Module                   2 PM
Query Processing                     2 PM
Hardware                             2 PM
Redistribute Data                    2 PM
Documentation                        4 PM
Testing                              4 PM
Workflow Systems (SSIS (?) + C#)     4 PM
QM/Logging                           2 PM
(Also covers logging and data scrubbing.)
Total Effort: 35 PM
Delivery: 9/2008
slide 141

Personnel Available
• 2 new hires (SW Engineers): 100%
• Maria: 80%
• Ani: 20%
• Jan: 10%
• Alainna: 15%
• Nolan Li: 25%
• Sam Carliles: 25%
• George Fekete: 5%
• Laszlo Dobos: 50% (for 6 months)
slide 142

Issues/Risks
• Versioning
  – Do we need to preserve snapshots of monthly versions?
  – How will users reproduce queries on subsequent versions?
  – Is it OK that a new version of the sky replaces the previous one every month?
• Backup/recovery
  – Will we need 3 local copies rather than 2 for safety?
  – Is restoring from the offsite copy feasible?
• Handoff to IfA is beyond the scope of the WBS shown
  – This will involve several PMs
slide 143

Mahalo!
Query Manager
(Screenshot callouts:)
• MyDB table that query results go into
• Context that the query is executed in
• Load one of the sample queries into the query buffer
• Check query syntax
• Name that this query job is given
• Get graphical query plan
• Query buffer
• Run query in quick (1-minute) mode
• Submit query to the long (8-hour) queue
slide 145

Query Manager
(Screenshot callouts:)
• Stored procedure arguments
• SQL code for the stored procedure
slide 146

Query Manager
• MyDB context is the default, but other contexts can be selected
• User can browse DB views, tables, functions, and procedures
• The space used and total space available are shown
• Multiple tables can be selected and dropped at once
• Table list can be sorted by name, size, or type
slide 147

Query Manager
• The query that created this table is shown
slide 148

Query Manager
• Context to run the search on
• Search radius
• Table to hold results
slide 149