Open Science Grid

Project DASH: Securing Direct MySQL Database Access for the Grid

D. Malon, E. May, D. Ratnikov, A. Vaniachine
Argonne National Laboratory

M. Vranicar, J. Weicher
PIOCON Technologies

XV International Conference on Computing in High Energy and Nuclear Physics
T.I.F.R., Mumbai, India
February 13-17, 2006

Databases and Grids
• In addition to petabytes of file-based event data, high energy physics applications require access to non-event data (detector conditions, calibrations, etc.) stored in relational databases
• Databases also play a critical role in grid middleware: file catalogues, monitoring, etc.
• Crosscutting the computational grid infrastructure, a database hyperinfrastructure emerges
[Diagram: World-Wide Federation of Computational Grids. Labels: OSG, WLCG and NorduGrid; ATLAS, CMS and non-LHC sites; cluster head node, worker nodes and edge services; RFT, RLS, PanDA, production, meta-data, monitoring and conditions databases.]

CHEP06 Mumbai India, Alexandre Vaniachine (ANL)

Project DASH
• As grid computing technologies mature, development must focus on database and grid integration
• New technologies are required to bridge the gap between data accessibility and the increasing power of grid computing used for distributed event production and processing
• The Database Access for Secure Hyperinfrastructure (DASH) project is funded by the DOE Small Business Innovation Research Program to build and test secure high-performance database access technology for distributed computing
www.piocon.com/DASH.php
A project of PIOCON Technologies, Inc and Argonne National Laboratory

Database Access on the Grid
Two different architectures:
• A separate middleware server does the grid authorization:
  – OGSA-DAI: SOAP/XML + XML binary extensions
  – Spitfire (EDG WP2): SOAP/XML text-only data transport
  – Perl DBI database proxy (ALICE): SQL data transport
  – Oracle 10g (separate authorization layer)
• Grid middleware is integrated in database server process:
  – Instead of surrounding database with external secure middleware layers the safety features are embedded inside of the code
  – By pushing secure authorization into the database engine the inefficient data transfer bottlenecks are eliminated

Embedded Security Approach
• The embedded security approach is listed among the top ten innovations in security by the panel of experts convened by Battelle:
  – “The Global Cyber Net: Communications and information are the lifeblood of security. Today we enjoy a worldwide web, which is open but unsecured. In the future, we will have a global cyber net that is faster and better protected than today… Software will contain embedded safety features inside of the code rather than just surrounding it.”
http://www.battelle.org/forecasts/defense.stm

End-to-End Secure Transport
• DASH technology bridges the gap between data accessibility and the increasing power of grid computing
• To overcome database access inefficiencies inherent in a traditional middleware approach the DASH project implements secure authorization on the transport level
• Pushing the grid authorization into the database engine eliminates the middleware message-level security layer and delivers transport-level efficiency of SSL/TLS protocols for grid applications
• The DASH proof-of-concept prototype provides Globus grid proxy certificate authorization technologies for MySQL database access control
• DASH technology brings database access efficiencies similar to the https advantages introduced in the Globus Toolkit 4.0
• The database architecture with embedded grid authorization provides a
foundation for secure end-to-end data processing solutions for the grids

Aspect-Oriented Programming
• To avoid a brittle, monolithic system DASH uses an aspect-oriented programming approach
• By localizing Globus security concerns in a software aspect, DASH achieves a clean separation of Globus Grid Security Infrastructure dependencies from the MySQL server code
• During the database server build, the AspectC++ tool automatically generates the transport-level code to support a grid security infrastructure
• www.aspectc.org

Automatic Code Generation
[Diagram: the DASH grid security aspects code (grid.ah), Globus GSI code, OpenSSL Transport Level Security code and MySQL database server code are combined into auto-generated grid-enabled MySQL database server code. File labels shown: cbk.c, tls.c, vio.c.]

AOP is the Next ‘Big Thing’
• A 2001 paper on Aspect Oriented Programming is on Top 10 Downloads from ACM’s Digital Library
  – Paper by our collaborators from Illinois Institute of Technology
• ATLAS experience with AOP was first reported at the previous CHEP04

Testing New Functionalities
• Prototype servers built with DASH technology are being tested at ANL, BNL, CERN and U Geneva
• We thank
  – Jason Smith (BNL)
  – Yuri Smirnov (BNL)
  – Frederik Orellana (U Geneva)
Among the new functionalities are
• Check for the proxy expiration time
• Host name checking (to reject impersonation)

Packaging Challenge
• Initial response from our beta-testers suggested that, because of the globus gsi library dependencies, the preferred distribution would be the static build
• However, tests showed that static builds work best on platforms (Linux distributions) very close to that of the build machine
• We
experienced unexpected sensitivities to the minor variations in the glibc library version
• We are now addressing that issue by developing the dynamic build that will have the static globus gsi and openssl libraries built in

Scalability Challenge
• Large-scale world-wide distributed simulations performed by the ATLAS Collaboration show steady progress in grid computing
• The chaotic nature of opportunistic grid computations results in variations in daily production rates
• Database services capacities should be adequate for peak demand
[Plot: production rate in jobs/day (0 to 14000) by month, July through May, for LCG/CondorG, LCG/Original, NorduGrid and Grid3, covering Data Challenge 2 (long jobs period), Data Challenge 2 (short jobs period) and Rome Production (mix of jobs).]

Why Dynamic Deployment?
• The high level of sharing of computational resources achieved on grids results in increased fluctuations in demand for database services, because of the chaotic nature of shared resource availability
• Static services deployment requires over-capacity
• Opportunistic production on non-LCG sites requires database services deployment on demand
• To provide on-demand database services capability for Open Science Grid, the Edge Services Framework activity builds the DASH mysql-gsi database server into the virtual machine image, which is dynamically deployed via Globus Virtual Workspaces

Edge Services
• Services executing on the edge of the public and private network
[Diagram labels: CMS, ATLAS, CDF, Guest VO; CE, SE; Site; Compute nodes and Storage nodes.]
• See CHEP06 contribution id # 214
http://indico.cern.ch/contributionDisplay.py?contribId=214&sessionId=7&confId=048

Synergistic Collaboration
CMS & ATLAS collaborate in OSG ESF Activity
http://www.opensciencegrid.org/esf
To achieve the ESF proof-of-concept milestone:
• The first ESF VM was deployed by CMS
• The first ESF service on that VM was by ATLAS:
  – Grid-enabled MySQL database built by the DASH project
• To access the server the grid job used a proxy certificate (instead of the clear-text passwords hardwired in the scripts that are distributed world-wide)

Collaboration Benefits
Celebrating ESF proof-of-concept milestone at Supercomputing 2005

Globus Folder at SC05
http://www.globus.org/alliance/events/sc05

Complementary Project
• A new collaborative project with the Globus team has just started at Argonne
  – to grid-enable the PostgreSQL database
• Both DASH and the new project target technology integration with OGSA-DAI
• Please contact us if you are interested in contributing to these projects

OGSA-DAI Complementarity
Why you might NOT want to use OGSA-DAI (from Neil P Chue Hong, OGSADAI Status Summary, Third OGSADAI Users Group Meeting, 6/1/2005):
• You want very fast data access
  – OGSA-DAI is slower than direct connection methods e.g., JDBC
  – But remember OGSA-DAI provides functionality “over and above” these methods, e.g. data delivery and transformation
• You need scalability
  – Depends on your intended usage of e.g., delivery mechanisms, number of clients etc.
• You don’t care about interoperability with other Grid software or are only using one type of data resource
  – OGSA-DAI may be overkill

• Through our continued interactions with the OGSA-DAI team we have established working relationships to achieve technological compatibility

Additional Benefits
• Direct access to database servers unleashes a broad range of vendor-specific server capabilities for data processing applications: distributed XA transactions, binary data transport, etc.
• Grid proxy certificate technology opens technical opportunities to enable fine-grained delegation of rights for access control (attribute certificates)
• Grid-enabled relational database server technology has the potential for application beyond the domain of high energy physics, and is of interest to bioinformatics and other data-intensive sciences

DASH Outreach
DASH Presentations at the Conferences and Workshops
• Supercomputing 2005, November 12-18, 2005, Washington State Convention and Trade Center, Seattle, Washington, USA
http://osg-docdb.opensciencegrid.org/cgi-bin/ShowDocument?docid=307
• First DIALOGUE Workshop: Applications-Driven Issues in Data Grids, August 1-2, 2005, The Ohio State University, Columbus, Ohio, USA
http://www.datagrids.org/ws/docs/High-performanceDatabaseAccess.ppt
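
Illustration: proxy expiration and host name checks
The two checks listed under Testing New Functionalities (proxy expiration time and host name checking) can be sketched in a few lines. This is only an illustration under stated assumptions, not the DASH implementation: the slides describe C code generated into the MySQL server with AspectC++ and Globus GSI, and do not show it. The function names (proxy_time_left, host_matches, authorize), the exact-match host comparison and the example host names are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def proxy_time_left(not_after: datetime, now: datetime) -> timedelta:
    """Remaining lifetime of a proxy certificate (zero or negative once expired)."""
    return not_after - now


def host_matches(cert_host: str, peer_host: str) -> bool:
    """Exact, case-insensitive host name comparison (assumption: no wildcards)."""
    def normalize(name: str) -> str:
        # Ignore surrounding whitespace, a trailing dot and letter case.
        return name.strip().rstrip(".").lower()
    return normalize(cert_host) == normalize(peer_host)


def authorize(not_after: datetime, cert_host: str, peer_host: str, now: datetime):
    """Apply both checks in order and return (allowed, reason)."""
    if proxy_time_left(not_after, now) <= timedelta(0):
        return False, "proxy certificate expired"
    if not host_matches(cert_host, peer_host):
        return False, "host name mismatch"
    return True, "ok"


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the three outcomes.
    now = datetime(2006, 2, 13, 12, 0, tzinfo=timezone.utc)
    expiry = now + timedelta(hours=12)
    print(authorize(expiry, "wn01.example.org", "WN01.example.org", now))
    print(authorize(now - timedelta(minutes=1), "wn01.example.org", "wn01.example.org", now))
    print(authorize(expiry, "wn01.example.org", "other.example.org", now))
```

Checking expiration first means an expired proxy is rejected before any name comparison is attempted. A real server would take both values from the certificate presented during the SSL/TLS handshake rather than from function arguments.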