Status of TAG Infrastructure
Jack Cranshaw
Argonne National Lab
July 6, 2006

TAG Content
• Focus of review: 11/05 to 3/06.
• Content recommendations implemented, modulo trigger and streaming information.
• Within the 1k limit, general improvement in usability.
• Infrastructure issues assigned to the DB group:
  – Extensibility of existing TAGs.
  – Improved query performance for relational TAGs.
  – Plan to deal with 'TAG' queries which include non-event metadata queries.

TAG Content (Review Results)
• Event References (310 -> 30)
• Global Event Information (128)
  – Quality Information
• Global Trigger Information (36)
• Electron/Photon/Muon Object Quantities (240)
• Tau/Jet Object Quantities (216)
• Analysis-Specific Information (44)
• Any connection to ConditionsDB data is resolved at initial TAG creation.
Milestone 1: Implement a trigger representation which is usable for the September streams test.
Milestone 2: Demonstrate that a full disconnect from other databases such as ConditionsDB is practical with new versions.

File Catalog / DDM
• The TAGs have an implicit dependence on a file catalog system.
• This was complicated when the DDM system placed 'dataset' between the TAG system and the file identifiers.
• The Glasgow group (CN, HM) has been instrumental in trying to understand how to handle this:
  – Some purely local use cases don't require DDM.
  – The DDM system has been evolving towards a system which eases the problem.
Milestone 1: Users able to define a DDM dataset based on a TAG selection.
Milestone 2: Users able to access the TAG database at a local site in Athena and read files.
Milestone 3: Users able to use a TAG as input to pathena.

File-Based TAG Data
• Files can be used as an event index for files if generated isomorphically to AOD files.
• The first-order TAG database goal is to distribute these TAG files to sites with data and use the site catalog to find files.
Milestone 1: Define DDM datasets composed of TAG files and transfer them to another site using DQ2 tools.
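The file-catalog dependence above can be illustrated with a minimal sketch: a TAG selection yields event references carrying logical file identifiers (GUIDs), and a site-local catalog maps each GUID to a physical file name. All names here (`tag_selection`, `site_catalog`, `resolve_selection`) are hypothetical, not part of the actual POOL or DDM APIs.

```python
# Minimal sketch of the TAG -> file catalog lookup.
# All names are hypothetical; the real path goes through POOL/DDM.

# A TAG selection returns event references: (run, event, file GUID).
tag_selection = [
    (5100, 17, "guid-aaa"),
    (5100, 42, "guid-aaa"),
    (5101, 3,  "guid-bbb"),
]

# Site-local file catalog: logical file GUID -> physical file name.
site_catalog = {
    "guid-aaa": "/castor/atlas/aod/run5100.pool.root",
    "guid-bbb": "/castor/atlas/aod/run5101.pool.root",
}

def resolve_selection(selection, catalog):
    """Map a TAG selection to the distinct physical files holding its events."""
    files = []
    for run, event, guid in selection:
        pfn = catalog.get(guid)
        if pfn is None:
            raise KeyError(f"GUID {guid} not found in site catalog")
        if pfn not in files:
            files.append(pfn)
    return files

print(resolve_selection(tag_selection, site_catalog))
```

The DDM 'dataset' layer mentioned above would sit between the GUIDs and the catalog, grouping files so that a selection defines a dataset rather than a bare file list.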
Milestone 2*: Run a job using TAG files as input to pathena using streamed data.
Milestone 3: Document the procedure for building a local relational TAG database from a TAG file dataset.

Collection Metadata
• As collections become more numerous and move into actual deployment, we must be able to locate instances and understand how and when they were created.
• Initial tests with AMI need follow-up.
• Related to the need for monitoring of TAG production.
Milestone 1: Be able to list all collections at a site, their attributes, their size, and their type, via a command-line tool and a web page.
Milestone 2: Move all attributes identified in the TAG Review into a collection metadata system. Then be able to select on one of the attributes, e.g. software version, and run a job which uses one of the results as input.

Partitioning Ideas
• Horizontal partitioning:
  – Partitioning by rows.
  – Put data from different periods or different datasets in separate stores.
  – Should allow parallelization of queries if the server is set up correctly.
• Vertical partitioning:
  – Partitioning by column.
  – Put data from different 'sources' in different stores.
  – Could speed up queries if many of them touch only one or two of the partitions.
  – Allows content to be extended in situ.
  – Requires optimization to avoid being killed by joins.

Tier 0 Test: Scope and Metadata Components
• Follow-up to the January Tier 0 tests, which did not include TAGs.
• Include TAG production and upload to a relational database in the Tier 0 production system.
• System faked at various levels:
  – Insufficient resources to run reconstruction means using pre-staged files on worker nodes to generate TAGs with an almost-new schema.
  – TAGs point to merged AOD files which are dummy files deleted on job completion; no follow-up navigation is possible.
  – AOD input will repeat, but the Tier 0 system will supply unique run numbers to TAG production.
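The horizontal-partitioning idea above can be sketched concretely, with SQLite standing in for Oracle and an illustrative (not real) TAG schema: rows are routed to a per-run-range partition table on insert, so a query over one run period only scans its own partition, and new partitions appear as new runs arrive.

```python
import sqlite3

# Sketch of horizontal (row-wise) partitioning by run range.
# SQLite stands in for Oracle; schema and names are illustrative only.
db = sqlite3.connect(":memory:")

def partition_for(run):
    """Route a run number to its partition table (one table per 1000 runs),
    creating the partition on first use."""
    name = f"tag_run_{run // 1000}"
    db.execute(f"CREATE TABLE IF NOT EXISTS {name} (run INT, event INT, nele INT)")
    return name

def insert(run, event, nele):
    db.execute(f"INSERT INTO {partition_for(run)} VALUES (?, ?, ?)", (run, event, nele))

for run, event, nele in [(1005, 1, 2), (1005, 2, 0), (2100, 1, 1)]:
    insert(run, event, nele)

# A query for one run period touches only one partition; a global query
# fans out over all partitions (and could run them in parallel).
print(db.execute("SELECT COUNT(*) FROM tag_run_1").fetchone()[0])  # prints 2
```

Vertical partitioning would instead split the columns (e.g. trigger quantities vs. physics-object quantities) into separate tables sharing a (run, event) key, which is where the join-cost caveat above comes in.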
Database Setup
[Diagram: Tier 0 database setup — Oracle instance "intr", Castor, load, archive.]

Accomplishments and Problems
• Managed to use POOL tools to load the Oracle database as a procedure called by the Tier 0 control process: O(10^7) rows.
• Was able to do this using a table partitioned by run; new partitions were defined as the Tier 0 process encountered new runs.
• Many problems found in the POOL tools in the areas of robustness and error reporting; being fed back into POOL.
• Monitoring best described as non-existent; much work needed.
• Upload speed insufficient (80 Hz); not fully understood.
• No metadata stored on collection status.
• Ability to recover a partially completed upload is very limited.

Follow-up
• O-stream testing: provide raw materials for tests in July/August.
• Performance studies:
  – Benchmark queries on the partitioned table.
  – Try various indexing strategies, e.g. bitmaps on all columns.
• Improve upload speed.
• Sanity check that MySQL export is still possible.
• Fix all the various problems identified.
• Show that these problems have been fixed in the September Tier 0 exercise.
Milestone 1: Demonstrate a system which can manage upload speeds of > 400 Hz.
Milestone 2: Show that benchmark queries on a 10^7-row table can be done at a speed similar to a 10^6-row table.
Milestone 3: Export the partitioned TAG database Oracle -> MySQL with 10^7 rows and run the benchmark queries.

All the stuff I didn't have time to include or forgot!
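One plausible lever for raising the upload rate toward the > 400 Hz milestone is batching: committing one row per transaction bounds the rate by per-transaction overhead, while bulk inserts with a single commit amortize it. A minimal sketch, with SQLite standing in for Oracle (the real upload path goes through the POOL tools, so this is only an illustration of the effect, not the actual fix):

```python
import sqlite3
import time

# Illustrative comparison of per-row commits vs. batched inserts.
# SQLite stands in for Oracle; the real loader uses POOL tools.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tags (run INT, event INT, met REAL)")
rows = [(1, i, 0.5 * i) for i in range(10000)]

# Row-at-a-time, one commit per row: rate limited by transaction overhead.
t0 = time.perf_counter()
for r in rows[:1000]:
    db.execute("INSERT INTO tags VALUES (?, ?, ?)", r)
    db.commit()
slow_hz = 1000 / (time.perf_counter() - t0)

# Batched: one executemany() and a single commit for the whole block.
t0 = time.perf_counter()
db.executemany("INSERT INTO tags VALUES (?, ?, ?)", rows)
db.commit()
fast_hz = len(rows) / (time.perf_counter() - t0)

print(f"per-row commits: {slow_hz:.0f} Hz, batched: {fast_hz:.0f} Hz")
```

The absolute numbers depend entirely on the backend; the point is that the 80 Hz observed may reflect per-row round-trip costs that batching (or array binding on the Oracle side) could amortize.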