UPPSALA DATABASE LABORATORY
Managing Scientific Queries over Distributed Data in a Grid Environment
Ruslan Fomkin
UU-IT-UDBL
January 20, 2006, NGN workshop, Uppsala

Uppsala DataBase Laboratory (UDBL)
Supervisor
  • prof. T. Risch
Database research
  • How to make extensible middleware query processing allowing scalable and application oriented search to different kinds of wrapped information sources
http://www.it.uu.se/research/group/udbl/

AMOS II
[Diagram labels: Applications (Simulation, Visualization, Analysis); Queries and views; Plug-ins; Virtual Mediator Database; Queries; Continuous Queries; Wrappers; Data sources (Relational Databases, Patient Monitoring, GRID hist., Measurements)]

Ongoing Research at UDBL
[Diagram: research topics arranged around UDBL, each labelled with a researcher.
Topics: Mediating Web Services; Stream Queries on BlueGene; Semantic Web Queries to Hidden Web; Stream Data Manager; FEM Databases; Expensive GRID Queries.
Researchers: Manivasakan Sabesan, BSc; Erik Zeitler, MSc; Milena Ivanova, PhD; Johan Petrini, MSc; Kjell Orsborn, PhD; Ruslan Fomkin, MSc]

Outline
Introduction
The project
Test application
Developed framework
Conclusion
Future work

Scientific Applications, Grid and Databases
A lot of scientific data
  • Complex structure
  • Stored in files distributed in Grid
Scientific analyses can be represented as declarative queries
  • Complex queries with numerical computations
  • Long running or batch queries
Utilization of computational resources of Grid

Parallel Object Query System for Expensive Computations (POQSEC)
Query processor for scientific applications
  • high-level interface to specify the analyses
  • automatically generates execution plans and evaluates them
Requirements
  • Scalable, efficient, flexible, transparent
Properties
  • Distributed and parallel

Layered Architecture of the System
POQSEC provides
  • scientific query management
Grid provides
  • computation management
  • file management
Application area provides
  • computational libraries
  • data management libraries
[Diagram labels: User; POQSEC; Application libraries; ROOT library; Grid; NorduGrid Middleware; NorduGrid; Data; Clusters]

Our Test Application From Particle Physics
Analysis of collision events for presence of Higgs particles
Data produced by ATLAS simulation software
  • stored in files
  • distributed in the Grid (e.g. NorduGrid)
  • managed by ROOT library

Object-Relational Schema of the Application Data
[Schema diagram, reconstructed from its labels:
Event, with attributes PxMiss, PyMiss
Event 1 : n Particle, through the relationship "particles"
Particle, with attributes Px, Py, Pz, Kf, Ee
Lepton and Jet inherit from Particle; Muon and Electron inherit from Lepton
Legend: inheritance, relationship]

General Query of the Analysis
Selection of those events that satisfy predicates containing numerical operations

    SELECT ev
    FROM Event ev
    WHERE jetvetocut(ev) AND zvetocut(ev) AND topcut(ev) AND
          misseecuts(ev) AND leptoncuts(ev) AND threeleptoncut(ev);

Each predicate is called a cut in the application area
Predicates are defined as queries

Example of a predicate: Z-veto cut
Either the event does not have a pair of oppositely charged leptons, or the invariant mass of the pair is not close to the mass of a Z particle

    CREATE FUNCTION zvetocut(Event ev) -> Event AS
      SELECT ev
      WHERE NOTANY(oppositeLeptons(ev)) OR
            abs(invMass(oppositeLeptons(ev)) - zMass) >= minZMass;

    CREATE FUNCTION oppositeLeptons(Event ev) -> bag of <Lepton, Lepton> AS
      SELECT l1, l2
      FROM Lepton l1, Lepton l2
      WHERE l1 = particles(ev) AND l2 = particles(ev) AND
            Kf(l1) = -Kf(l2);

Current Framework
Basic tool for utilizing NorduGrid through Advanced Resource Connector (ARC)
Submission mechanism
  • submit query
  • parallelize query to several subqueries
  • generate job scripts (one per subquery)
Babysitter functionality
Data exchange mechanism through files

Client and Coordinator Part
POQSEC client
  • personal database with application schema
  • ROOT wrapper
Coordinator server
  • receives queries
  • creates jobs
Grid Meta-Database
  • computational resources
  • data files
Submission Database
  • received submissions
  • created jobs
Babysitter
  • interactions with ARC
[Diagram labels: Grid Client Node; POQSEC Client; Coordinator server; Query Coordinator; Job queue; Submission Database; Grid Meta-Database; Babysitter; ARC Client; Local Storage]

Query Submission
Query submission
  • query
  • file name selection
  • degree of parallelism
  • CPU time for each job
Coordinator server creates jobs
  • same query
  • partitions of data with equal size
  • same CPU time provided by user
  • corresponding job script files
Submission and its jobs saved in Submission Database
Created jobs added to Job queue
Script files saved to Local Storage

Jobs Submission
Babysitter
  • takes jobs from Job queue
  • submits each job to ARC client
  • changes status of submitted jobs in Submission DB
ARC client
  • finds Computing Element
  • submits job to corresponding ARC Grid Manager
[Diagram: the components above, plus two CEs, each with an ARC Grid Manager]

Job Execution
ARC Grid Manager
  • downloads input files
  • submits job to Local Batch System (LBS)
After some delay LBS starts Executor on an allocated CE node
Executor during execution
  • executes given subquery
  • accesses data through ROOT wrapper
  • saves result to files on CE Storage
[Diagram labels: SE; CE; ARC Grid Manager; CE Storage; LBS Queue; CE node; Executor; wrapper]

Downloading Result
Babysitter
  • polls ARC client for job statuses
  • requests to download results for finished jobs
Results downloaded to Local Storage
User can retrieve result when all jobs are ready

Conclusion
We provide
  • declarative query interface for representing scientific queries
  • parallel query execution in Grid (generating scripts)
  • babysitter to keep track of job execution
Query parallelization is important

                        Standalone desktop   Grid, one job   Grid, four jobs
    Response time       190 min              225 min         24 min
    Requested CPU time  -                    200 min         20 min

Future work
Estimating the execution time of a query
Dealing with underestimation of execution time
Automatic decisions on degree of parallelism and resource brokering
  • adaptive
  • based on current load and job statistics
Dealing with failures in Grid
POOL wrapper

Thank you for your attention!
Your questions?
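
Supplementary sketch for the Query Submission slide: the Coordinator server creates one job per data partition, each with the same query and the same user-provided CPU time. The Python below is only an illustration of that step, not POQSEC code. The names Job, partition_files and create_jobs are hypothetical, and "equal size" is assumed here to mean an equal number of files (sizes differing by at most one); the slides do not say whether size is counted in files, bytes or events. Job script generation and the hand-over to the ARC client are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    """One subquery job: the same query text over one partition of the data files."""
    query: str
    files: List[str]
    cpu_minutes: int


def partition_files(files: List[str], degree: int) -> List[List[str]]:
    """Split files into `degree` contiguous partitions whose sizes differ by at most one."""
    if degree < 1:
        raise ValueError("degree of parallelism must be at least 1")
    base, extra = divmod(len(files), degree)
    partitions, start = [], 0
    for i in range(degree):
        # The first `extra` partitions take one additional file each.
        size = base + (1 if i < extra else 0)
        partitions.append(files[start:start + size])
        start += size
    return partitions


def create_jobs(query: str, files: List[str], degree: int, cpu_minutes: int) -> List[Job]:
    """Create one job per non-empty partition; every job gets the same query and CPU time."""
    return [Job(query, part, cpu_minutes)
            for part in partition_files(files, degree) if part]


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the sketch.
    files = [f"events_{i}.root" for i in range(10)]
    query = "SELECT ev FROM Event ev WHERE zvetocut(ev);"
    for job in create_jobs(query, files, degree=4, cpu_minutes=20):
        print(len(job.files), job.cpu_minutes)
```

Skipping empty partitions means that a requested degree of parallelism larger than the number of files yields fewer jobs than requested; that is a design choice of this sketch, not something the slides specify.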