Introduction to Grid Computing (Midwest Grid Workshop, Module 1)

Scaling up Social Science: Citation Network Analysis
[Figure: growth of the citation network, 1975-2002]
Work of James Evans, University of Chicago, Department of Sociology

Scaling up the analysis
- Database queries over 25+ million citations
- Work started on small workstations; queries grew to month-long duration
- With the database distributed across the U of Chicago TeraPort cluster, 50 (faster) CPUs gave a 100x speedup
- Many more methods and hypotheses can be tested
- Higher throughput and capacity enable deeper analysis and broader community access

Computing clusters have commoditized supercomputing
[Diagram of a typical cluster: a cluster-management "frontend"; I/O servers (typically a RAID fileserver) with disk arrays; a few head nodes, gatekeepers and other service nodes; lots of worker nodes; tape backup robots]

PUMA: Analysis of Metabolism
- PUMA Knowledge Base: information about proteins analyzed against ~2 million gene sequences
- Analysis on the Grid involves millions of BLAST, BLOCKS, and other processes
- Natalia Maltsev et al., http://compbio.mcs.anl.gov/puma2

Initial driver: High Energy Physics
[Diagram of the LHC tiered computing model: the online system emits ~PBytes/sec; there is a "bunch crossing" every 25 nsecs and ~100 "triggers" per second, each triggered event being ~1 MByte; an offline processor farm (~20 TIPS) feeds the Tier 0 CERN Computer Centre at ~100 MBytes/sec; data moves at ~622 Mbits/sec (or by air freight, deprecated) to Tier 1 regional centres (France, Germany, Italy, FermiLab at ~4 TIPS); Tier 2 centres (Caltech and others, ~1 TIPS each) and institute servers (~0.25 TIPS) hold physics data caches; Tier 4 is physicist workstations at ~1 MBytes/sec. 1 TIPS is approximately 25,000 SpecInt95 equivalents. Physicists work on analysis "channels"; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server. Image courtesy Harvey Newman, Caltech]

Grids Provide Global Resources To Enable e-Science

Ian Foster's Grid Checklist
A Grid is a system that:
- Coordinates resources that are not subject to centralized control
- Uses standard, open, general-purpose protocols and interfaces
- Delivers non-trivial qualities of service

Virtual Organizations
- Groups of organizations that use the Grid to share resources for specific purposes
- Support a single community
- Deploy compatible technology and agree on working policies (security policies are the difficult part)
- Deploy different network-accessible services: Grid Information, Grid Resource Brokering, Grid Monitoring, Grid Accounting

What is a grid made of? Middleware.
[Diagram: an application and user interface on a grid client reach, through grid middleware, protocols and storage, several kinds of resources: computing clusters dedicated by UC, IU and Boston; resources shared by OSG, LCG and NorduGRID; grid resource time purchased from a commercial provider; plus resource, workflow and data catalogs]
- Security to control access and protect communication (GSI)
- Directory to locate grid sites and services (VORS, MDS)
- Uniform interface to computing sites (GRAM)
- Facility to maintain and schedule queues of work (Condor-G)
- Fast and secure data set mover (GridFTP, RFT)
- Directory to track where datasets live (RLS)

We will focus on Globus and Condor
- Condor provides both client and server scheduling; in grids, Condor provides an agent to queue, schedule and manage work submission
- The Globus Toolkit provides the lower-level middleware:
  - Client tools which you can use from a command line
  - APIs (scripting languages, C, C++, Java, ...) to build your own tools, or to use directly from applications
  - Web service interfaces
  - Higher-level tools built from these basic components, e.g. Reliable File Transfer (RFT)

Grid components form a "stack" (top to bottom):
- Grid Application, often including a Portal (M6, M7, M8)
- Workflow system, explicit or ad hoc (M6)
- Condor Grid Client (M3)
- Job Management (M3), Data Management (M4), Information (config & status) Services (M5)
- Grid Security Infrastructure (M2)
- Core Globus Services
- Standard Network Protocols

Grid architecture is evolving to a Service-Oriented approach
...but this is beyond our workshop's scope; see "Service-Oriented Science" by Ian Foster.
- Service-oriented applications: wrap applications as services; compose applications into workflows
- Service-oriented Grid infrastructure: provision physical resources to support application workloads
[Diagram: users compose application services into workflows, invoke them, and provisioning maps them onto physical resources. "The Many Faces of IT as Service", Foster & Tuecke, 2005]

Security (Workshop Module 2)

Grid security is a crucial component
- Resources are typically valuable
- Problems being solved might be sensitive
- Resources are located in distinct administrative domains; each resource has its own policies, procedures, security mechanisms, etc.
- The implementation must be broadly available and applicable: standard, well-tested, well-understood protocols, integrated with a wide variety of tools

Security Services
- Form the underlying communication medium for all the services
- Secure authentication and authorization
- Single sign-on: the user explicitly authenticates only once, then single sign-on works for all service requests
- Uniform credentials
- Example: GSI (Grid Security Infrastructure)

Authentication means proving that you are who you claim to be
- Authentication stops imposters
- Examples of authentication: username and password, passport, ID card, public keys or certificates, fingerprint

Authorization controls what you are allowed to do
- Is this device allowed to access this service?
- Read, write, execute permissions in Unix
- Access control lists (ACLs) provide more flexible control
- Special "callouts" in the grid stack in job and data management perform authorization checks
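To make the authorization idea concrete, here is a minimal sketch (not taken from the slides) of the kind of check such a callout performs: mapping an authenticated certificate subject, a Distinguished Name (DN), to a local account using an access-control file in the style of a Globus grid-mapfile. The file path, DNs and account names are illustrative assumptions, not a real site's configuration.

```python
# Minimal sketch of grid-mapfile-style authorization (illustrative only).
# Assumes a file where each line maps a quoted certificate subject (DN)
# to a local account name, e.g.:
#   "/C=US/O=Example Grid/OU=People/CN=Jane Doe" jdoe

def load_gridmap(path="/etc/grid-security/grid-mapfile"):
    """Parse DN -> local account mappings from a grid-mapfile-style ACL."""
    mappings = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # The DN is quoted; the local account follows the closing quote.
            if line.startswith('"'):
                dn, _, account = line[1:].partition('" ')
                mappings[dn] = account.strip()
    return mappings

def authorize(subject_dn, gridmap):
    """Return the local account for an authenticated DN, or None if denied."""
    return gridmap.get(subject_dn)

if __name__ == "__main__":
    # Stand-in for load_gridmap(); the DN and account are made up.
    gridmap = {"/C=US/O=Example Grid/OU=People/CN=Jane Doe": "jdoe"}
    print(authorize("/C=US/O=Example Grid/OU=People/CN=Jane Doe", gridmap))  # -> jdoe
    print(authorize("/C=US/O=Example Grid/OU=People/CN=Eve", gridmap))       # -> None (denied)
```

In a real deployment the DN would come from the user's GSI proxy certificate, and the mapping file is maintained by the site administrator.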
Job and resource management (Workshop Module 3)

Job Management Services provide a standard interface to remote resources
- Resources include CPU, storage and bandwidth
- The Globus component is the Globus Resource Allocation Manager (GRAM)
- The primary Condor grid client component is Condor-G
- Other needs: scheduling, monitoring, job migration, notification

GRAM provides a uniform interface to diverse resource scheduling systems
[Diagram: a grid user submits through GRAM to VO sites A-D, which locally run Condor, PBS, LSF, or plain UNIX fork()]

GRAM: What is it?
Globus Resource Allocation Manager. Given a job specification, it can:
- Create an environment for a job
- Stage files to and from the environment
- Submit the job to a local resource manager
- Monitor the job
- Send notifications of job state changes
- Stream the job's stdout/stderr during execution

A "Local Resource Manager" is a batch system for running jobs across a computing cluster
- Examples used with GRAM: Condor, PBS, LSF, Sun Grid Engine
- Most systems also allow you to access "fork", the default behavior; it runs on the gatekeeper, which is a bad idea in general but okay for testing

Managing your jobs
- We need something more than the basic functionality of the Globus job submission commands
- Some desired features: job tracking; submission of a set of inter-dependent jobs; checkpointing and job resubmission; matchmaking to select an appropriate resource for executing the job
- Options: Condor, PBS, LSF, ...

Data Management (Workshop Module 4)

Data management services provide the mechanisms to find, move and share data
- Fast, flexible, secure, ubiquitous
- Grids are used for analyzing and manipulating large amounts of data
- Metadata (data about data): what is the data?
- Data location: where is the data?
- Data transport: how to move the data?

GridFTP is a secure, efficient and standards-based data transfer protocol
- Robust, fast and widely accepted
- Globus GridFTP server; Globus globus-url-copy GridFTP client
- Other clients exist (e.g., uberftp)

GridFTP is secure, reliable and fast
- Security through GSI: authentication and authorization; can also provide encryption
- Reliability by restarting failed transfers
- Fast: TCP buffers can be tuned for optimal performance; parallel transfers; striping (multiple endpoints)
- Not all features are accessible from the basic client
- Can be embedded in higher-level and application-specific frameworks

File catalogues tell you where the data is
- Replica Location Service (RLS)
- Phedex
- RefDB / PupDB

Requirements for a file catalogue (a toy sketch follows at the end of this module):
- Abstract out a logical file name (LFN) for each physical file
- Maintain the mappings between LFNs and PFNs (physical file names)
- Maintain the location information of a file

To avoid "hotspots", replicate data files in more than one location
- Makes effective use of grid resources
- Each LFN can have more than one PFN
- Avoids a single point of failure
- Replication may be manual or automatic; automatic replication considers the demand for a file, transfer bandwidth, etc.

The Globus-based LIGO Data Grid
- LIGO Gravitational Wave Observatory
- [Map of participating sites, including Birmingham, Cardiff and AEI/Golm]
- Replicating >1 Terabyte/day to 8 sites; >40 million replicas so far
- www.globus.org/solutions
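As a toy illustration of the catalogue requirements above (each LFN mapped to multiple PFNs, with replicas at several sites), here is a small sketch. It is not the RLS API; the site and file names are invented, and real catalogues are distributed services rather than in-memory dictionaries.

```python
# Toy replica catalogue: one logical file name (LFN) maps to several
# physical file names (PFNs). This only illustrates the idea from the
# slides; it is not how RLS is actually accessed.
replica_catalog = {
    "lfn://ligo/H1-frame-0001.gwf": [
        "gsiftp://se.uchicago.example.org/data/H1-frame-0001.gwf",
        "gsiftp://se.cardiff.example.org/ligo/H1-frame-0001.gwf",
        "gsiftp://se.aei.example.org/frames/H1-frame-0001.gwf",
    ],
}

def select_replica(lfn, preferred_domain=None):
    """Return one PFN for the LFN, preferring a replica whose URL mentions preferred_domain."""
    pfns = replica_catalog.get(lfn, [])
    if not pfns:
        raise LookupError(f"no replicas registered for {lfn}")
    if preferred_domain:
        for pfn in pfns:
            if preferred_domain in pfn:
                return pfn
    return pfns[0]  # fall back to the first registered replica

print(select_replica("lfn://ligo/H1-frame-0001.gwf", preferred_domain="cardiff"))
```

A client would resolve the LFN, pick a nearby replica, and then fetch it with a data mover such as GridFTP.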
National Grid Cyberinfrastructure (Workshop Module 5)

TeraGrid provides vast resources via a number of huge computing facilities.

Open Science Grid (OSG) provides shared computing resources, benefiting a broad set of disciplines
- A consortium of universities and national laboratories, building a sustainable grid infrastructure for science
- OSG incorporates advanced networking and focuses on general services, operations, and end-to-end performance
- Composed of a large number (>50 and growing) of shared computing facilities, or "sites"
- http://www.opensciencegrid.org/

Open Science Grid
- 50 sites (15,000 CPUs) and growing
- 400 to >1000 concurrent jobs
- Many applications plus CS experiments; includes long-running production operations
- Up since October 2003; few FTEs for central operations
- [Chart of jobs running in 2004]
- www.opensciencegrid.org

To use a Grid efficiently, you must locate and monitor its resources
- Check the availability of different grid sites
- Discover different grid services
- Check the status of "jobs"
- Make better scheduling decisions with information maintained on the "health" of sites

OSG Resource Selection Service: VORS

Monitoring and Discovery Service (MDS)
[Diagram: MDS Index services run inside GT4 containers; services such as GRAM, GridFTP and RFT register automatically with the index in their container (WS-ServiceGroup); indexes register with one another, and clients (e.g., WebMDS) access the information via WSRF/WS-Notification; custom protocols and adapters handle non-WSRF entities]

Grid Workflow (Workshop Module 6)

A typical workflow pattern in image analysis runs many filtering applications
[Workflow DAG: align_warp jobs (1, 3, 5, 7) process anatomy image pairs (3a, 4a, 5a, 6a .h/.i files) against a reference (ref.h, ref.i); reslice jobs (2, 4, 6, 8) produce resliced images; softmean/9 combines them into an atlas; slicer jobs (10, 12, 14) and convert jobs (11, 13, 15) produce atlas_x, atlas_y and atlas_z images in .ppm and .jpg form. Workflow courtesy James Dobson, Dartmouth Brain Imaging Center. A minimal sketch of this dependency graph appears at the end of this module.]

Workflows can process vast datasets
- Many HEP and astronomy experiments consist of: large datasets as inputs (find datasets); "transformations" which work on the input datasets (process); and output datasets (store and publish)
- The emphasis is on the sharing of the large datasets
- Transformations are usually independent and can be parallelized, but they can vary greatly in duration
- [Example: a mosaic of M42 created on TeraGrid with the Montage workflow: ~1200 jobs in 7 levels, mixing data-transfer and compute jobs]

A virtual data model enables workflows to abstract away grid details.
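The image-analysis workflow above is a directed acyclic graph (DAG) of jobs. The sketch below shows one way such dependencies might be represented and ordered; it uses a plain topological sort over a subset of the jobs named in the figure, and it is not Condor DAGMan or any real workflow engine, just the underlying idea.

```python
# Sketch of the image-analysis workflow as a DAG: each job lists the jobs
# it depends on. A real workflow system would also describe how to run each
# job; here we only compute a valid execution order.
from graphlib import TopologicalSorter  # Python 3.9+

dependencies = {
    "align_warp/1": [],
    "reslice/2": ["align_warp/1"],
    "align_warp/3": [],
    "reslice/4": ["align_warp/3"],
    "softmean/9": ["reslice/2", "reslice/4"],
    "slicer/10": ["softmean/9"],
    "convert/11": ["slicer/10"],
}

ts = TopologicalSorter(dependencies)
for job in ts.static_order():
    print("run", job)  # independent jobs could run in parallel once their parents finish
```

A workflow system additionally records, for each job, the executable, arguments and files to stage, and releases a job to the grid as soon as all of its parent jobs have finished.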
An important application pattern: Grid Data Mining (Workshop Module 7)

Mining seismic data for hazard analysis (Southern California Earthquake Center)
[Figure: inputs such as seismicity, paleoseismology, local site effects, geologic structure, faults, stress transfer, crustal motion, crustal deformation, seismic velocity structure and rupture dynamics feed a Seismic Hazard Model. Inset: a satellite-generated Interferometric Synthetic Aperture Radar (InSAR) image of the 1999 Hector Mine earthquake, showing the displacement field in the direction of radar imaging; each fringe (e.g., from red to red) corresponds to a few centimeters of displacement]

Data mining in the grid
- Decompose across the network
- Clients integrate dynamically: select and compose services; select "best of breed" providers; publish the result as a new service
- Decouple resource and service providers
- [Figure: users connect discovery tools, analysis tools and data archives. Fig: S. G. Djorgovski]

Conclusion: Why Grids?
- New approaches to inquiry based on: deep analysis of huge quantities of data; interdisciplinary collaboration; large-scale simulation and analysis; smart instrumentation
- Dynamically assemble the resources to tackle a new scale of problem
- Enabled by access to resources and services without regard for location and other barriers

Grids: Because Science Takes a Village ...
- Teams organized around common goals, with diverse membership and capabilities: expertise in multiple areas is required, along with geographic and political distribution
- People, resources, software, data, instruments... no single location or organization possesses all the required skills and resources
- Must adapt as a function of the situation: adjust membership, reallocate responsibilities, renegotiate resources

Based on: "Grid Intro and Fundamentals Review", Dr Gabrielle Allen, Center for Computation & Technology, Department of Computer Science, Louisiana State University, [email protected], Grid Summer Workshop, June 26-30, 2006