Financial Informatics –XVIII: Grid Computing
Khurshid Ahmad, Professor of Computer Science, Department of Computer Science, Trinity College, Dublin-2, IRELAND
November 19th, 2008
https://www.cs.tcd.ie/Khurshid.Ahmad/Teaching.html

Financial Services: Data and Compute Intense Activities
Financial programs that are data- and compute-intense (for example, giga-byte analysis using Monte Carlo)
Domain: Capital markets
Applications:
• Pricing or scenario analysis: buy-sell online decisions
• Risk management or portfolio analysis and re-evaluation: middle-office functions

The Evolution of Computing
[Figure: timeline of computing and communication, 1960 to 2000. Computing: Mainframes, Minicomputers, PCs, Workstations, PDAs, Crays, MPPs, PC Clusters, WS Clusters, XEROX PARC worm, HTC, P2P, Grids. Communication: Sputnik, ARPANET, TCP/IP, Ethernet, Email, IETF, Internet Era, HTML, Mosaic, W3C, WWW Era, XML, Web Services. Source: www.gridbus.org]

The Evolution of Computing
[Figure: performance and QoS plotted against administrative barriers (Individual, Group, Department, Campus, State, National, Globe, Inter Planet, Universe), with platforms ranging over Personal Device, SMPs or SuperComputers, Local Cluster, Enterprise Cluster/Grid and Global Grid. Source: www.gridbus.org]

DEFINITIONS: Grid?
ELECTRICITY GRID: A network of high-voltage transmission lines and connections that supply electricity from a number of generating stations to various distribution centres in a country or a region, so that no consumer is dependent on a single station.
UTILITY GRID: (Term) used of any network that serves a similar purpose for other services.

DEFINITIONS: Grid?
The Grid is envisaged to be ‘the computing and data management infrastructure that will provide the electronic underpinning for a global society in business, government, science and entertainment’
Berman F, Fox G. C, and Hey A. J. G. (Eds). (2003).
Grid Computing: Making the Global Infrastructure a Reality, Wiley: Chichester

DEFINITIONS: Grid?
A Grid is a virtual information processing environment where the user has the ‘illusion’ of a seamless single-source computing power which is actually distributed.

DEFINITIONS: Grid?
Grids have succeeded in providing an infrastructure for deploying parallel applications in a distributed setting with a high degree of automation.
R. Jiménez-Peris, M. Patiño-Martínez, and B. Kemme. (2007). Enterprise Grids: Challenges Ahead. Journal of Grid Computing, Vol 5, pp 283–294

DEFINITIONS: Grid?
Year and elaboration:
• 1998: Grid computing systems are “an infrastructure to provide easy and inexpensive access to high-end computing”
• 2001: “an infrastructure to share resources for collaborative problem solving”
• 2002: “an infrastructure to pool and virtualize resources and enable their use in a transparent fashion.”
R. Jiménez-Peris, M. Patiño-Martínez, and B. Kemme. (2007). Enterprise Grids: Challenges Ahead. Journal of Grid Computing, Vol 5, pp 283–294

The Evolution of the GRID
• 1980’s: Parallel computing clusters, with improved performance from tightly coupled clusters and data sharing
• 1990’s: Grid I: Extend the advances in parallel computing to geographically distributed systems
• 2000: Grid II: Grid is a platform for integrating loosely coupled applications: some components running in parallel and some for linking disparate resources largely developed in the serial-von-Neumann paradigm (storage, visualisation, a-d/d-a converters and sensors)

The Evolution of the Financial Services’ Computing
Stage 1
  Hardware: Mainframe with lots of distributed dumb terminals; 24/7 operation
  Software: Programs on a disk that were retrieved ‘manually’ for execution, data on tapes
  Reach/Service: Local operations; heavy-duty operations scheduled after hours
Stage 2
  Hardware: Client/server and remote access, LAN-based systems
  Software: Programs online, data on (unsecured) disks
  Reach/Service: National and limited transnational operations
Stage 3
  Hardware: Data distributed through N-tier architectures
  Software: Programs and data distributed, easily accessible, ad-hoc security
  Reach/Service: Quasi-globalised operations

The Evolution of the Financial Services’ Computing
Evolution in financial services computing: Grid Computing
  Hardware and Software: Pool and virtualize hardware and software resources
  Reach/Service: Globalised, quasi-secure operations

The Evolution of the GRID
Currently there are clusters of very powerful computing/communications systems:
(i) Systems for acquiring digital data and processing data (Amazon.com or Oracle clusters)
(ii) Systems for analysing and visualising information (CERN’s large hadron collider, Protein Synthesis systems)
(iii) Systems for imaging, analysis and visualisation for distributed data (weather prediction, satellite-based military and civilian systems)
(iv) Systems that can link sensors and predict on real-time information (military systems, video surveillance)

The Evolution of the GRID
Developments in networking technologies, operating systems, clustered databases, application services and device technologies have enabled developers to build systems with literally millions of distributed nodes for providing:
• Web-based services for personal commercial transactions
• Content delivery networks that can cache web-pages seamlessly
• Wireless networks, which have spawned ad-hoc distributed systems that, when linked to wide-area networks, lead to a complex distributed system.
Problems of efficiency, reliability, accessibility and security are not addressed in ‘global’ terms.

The Evolution of the GRID
Grid is being developed not only to make distributed resources available to the end-user but also to co-ordinate such usage for sharing and aggregation of resources.
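As a rough illustration of co-ordinating and aggregating distributed resources, the sketch below splits the kind of Monte Carlo analysis mentioned at the start of these notes across several workers and combines their partial results. It is a minimal sketch under stated assumptions, not a grid implementation: a local thread pool stands in for grid nodes, the pricing model (a European call under geometric Brownian motion) is chosen only for illustration, and the names `simulate_batch`, `grid_price` and `OPTION` are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_batch(seed, n_paths, s0, strike, rate, vol, maturity):
    """One 'node': return the sum of discounted call payoffs over n_paths draws."""
    rng = random.Random(seed)  # each node gets its own seeded generator
    drift = (rate - 0.5 * vol ** 2) * maturity
    diffusion = vol * math.sqrt(maturity)
    total = 0.0
    for _ in range(n_paths):
        terminal = s0 * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total += max(terminal - strike, 0.0)
    return math.exp(-rate * maturity) * total


def grid_price(n_nodes=4, paths_per_node=20_000, **option):
    """Coordinator: farm out one batch per node, then aggregate the partial sums."""
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        futures = [
            pool.submit(simulate_batch, seed, paths_per_node, **option)
            for seed in range(n_nodes)
        ]
        partial_sums = [f.result() for f in futures]  # collected in submission order
    return sum(partial_sums) / (n_nodes * paths_per_node)


# Assumed example parameters (not from the lecture).
OPTION = dict(s0=100.0, strike=100.0, rate=0.05, vol=0.2, maturity=1.0)
price = grid_price(**OPTION)
print(f"Estimated option price: {price:.2f}")
```

The coordinator only needs each node's partial sum, so the work is loosely coupled: nodes never talk to each other, which is the property that makes this kind of workload a natural fit for pooled resources.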
The Evolution of the GRID
• Moore’s law improvements in computing produce highly functional end-systems
• The internet and burgeoning wired and wireless networks provide wide-spread connectivity
• Changing modes of working and problem solving emphasise teamwork and computation
• Network growth produces dramatic changes in topology and geography

The Evolution of the GRID
• The first generation involved proprietary solutions for sharing high-performance computing resources
• The second generation introduced middleware to cope with scale and heterogeneity
• The third generation introduced a service-oriented approach, leading to commercial projects in addition to the scientific projects now collectively known as e-Science

The Evolution of the GRID
The first generation
• FAFNER, I-WAY
The second generation
• Technologies: Globus, Legion
• Distributed object systems (Jini and RMI, the Common Component Architecture Forum)
• Grid resource brokers and schedulers
• Grid portals
• Integrated systems
• Peer-to-Peer computing
The third generation
• Service-oriented architecture (web services, OGSA, Agents)
• Information aspects: relation with the World Wide Web
• Live information systems

Building blocks of the Grid
(1) Networks
(2) Computational ‘nodes’ on the Grid
(3) Pulling it all together
(4) Common infrastructure: standards

GRID: Key Issues
• Resources: Discovery, Allocation, Scheduling
• Availability: Access, Security, Networks
• Efficiency: Hardware Economy, Management
• Administration: Computers, Services, Networks
• Application: Development, Testing

GRID: Key Issues
Sharing
• Sharing issues are not adequately addressed by existing technologies
• Complicated requirements: “run program X at site Y subject to community policy P, providing access to data at Z according to policy Q”
• High performance: unique demands of advanced & high-performance systems.
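The requirement “run program X at site Y subject to community policy P, providing access to data at Z according to policy Q” can be read as two independent checks that must both pass before a job is admitted. The following is a minimal sketch of that reading, not the mechanism of any particular grid middleware: policies are modelled as simple allow-lists, and the names `Policy`, `Request` and `authorize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical allow-list policy: which users may act on which targets."""
    name: str
    allowed: frozenset  # set of (user, target) pairs

    def permits(self, user, target):
        return (user, target) in self.allowed


@dataclass(frozen=True)
class Request:
    user: str
    program: str      # X
    site: str         # Y
    data_source: str  # Z


def authorize(request, community_policy, data_policy):
    """Return (ok, reason); both policy P and policy Q must permit the request."""
    if not community_policy.permits(request.user, request.site):
        return False, f"policy {community_policy.name} denies {request.user} at {request.site}"
    if not data_policy.permits(request.user, request.data_source):
        return False, f"policy {data_policy.name} denies {request.user} access to {request.data_source}"
    return True, f"run {request.program} at {request.site} with data from {request.data_source}"


# Assumed example data: one user, one site, one data store.
policy_p = Policy("P", frozenset({("alice", "site-Y")}))
policy_q = Policy("Q", frozenset({("alice", "store-Z")}))

granted = authorize(Request("alice", "X", "site-Y", "store-Z"), policy_p, policy_q)
denied = authorize(Request("bob", "X", "site-Y", "store-Z"), policy_p, policy_q)
print(granted)
print(denied)
```

Keeping the compute policy and the data policy as separate objects mirrors the fact that the site and the data owner are usually different organisations, each of which retains control over its own resource.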
GRID: Key Issues
Sharing
• A biochemist will be able to exploit 10,000 computers to screen 100,000 compounds in an hour;
• 1,000 physicists worldwide will be able to pool resources for petaop analyses of petabytes of data;
• A multidisciplinary analysis in aerospace that couples code and data across geographically distributed organisations may be possible;
• Civil engineers collaborate to design, execute, and analyse shake table experiments;
• Climate scientists will be able to visualise, annotate, and analyse terabyte simulation datasets

GRID: Key Issues
Sharing: Online Access to Scientific Instruments
[Figure: Advanced Photon Source, with real-time collection, tomographic reconstruction, archival storage, wide-area dissemination, and desktop & VR clients with shared controls]
DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago

GRID: Key Issues
Sharing: Data Grids for High Energy Physics
[Figure: tiered data grid. Labels: Online System; Offline Processor Farm (~20 TIPS); Tier 0: CERN Computer Centre; Tier 1: France, Germany and Italy Regional Centres, FermiLab (~4 TIPS); Tier 2: Caltech (~1 TIPS) and four Tier2 Centres (~1 TIPS each); Institutes (~0.25 TIPS) with a physics data cache; Tier 4: physicist workstations. Link labels: ~PBytes/sec, ~100 MBytes/sec, ~622 Mbits/sec or Air Freight (deprecated), ~1 MBytes/sec. Image courtesy Harvey Newman, Caltech]
• There is a “bunch crossing” every 25 nsecs.
• There are 100 “triggers” per second.
• Each triggered event is ~1 MByte in size.
• 1 TIPS is approximately 25,000 SpecInt95 equivalents.
• Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.

GRID: Key Issues
Sharing: Network for Earthquake Engineering Simulation
• NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
• On-demand access to experiments, data streams, computing, archives, collaboration
NEESgrid: Argonne, Michigan, NCSA, UIUC, USC

GRID: Key Issues
Sharing: The 13.6 TF TeraGrid: Computing at 40 Gb/s
[Figure: four sites (Caltech, Argonne, SDSC at 4.1 TF and 225 TB, NCSA/PACI at 8 TF and 240 TB), each with site resources, archival storage (HPSS or UniTree) and external networks]
TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne
www.teragrid.org

GRID: Key Issues
Sharing: iVDGL: International Virtual Data Grid Laboratory
[Figure: map legend: Tier0/1 facility, Tier2 facility, Tier3 facility, 10 Gbps link, 2.5 Gbps link, 622 Mbps link, other link]
U.S. PIs: Avery, Foster, Gardner, Newman, Szalay
www.ivdgl.org

GRID: Key Issues
Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources.
Enable communities (“Virtual organisations”) to share geographically distributed resources as they pursue common goals, assuming the absence of:
• Central location
• Central control
• Omniscience
• Existing trust relationships

Components of the Grid
1. Resource
2. Network protocol
3. Network enabled service
4. Application Programming Interface (API)
5. Software Development Kit (SDK)
6. Syntax

Components of the Grid: Resource
• An entity that is to be shared. E.g., computers, storage, data, software
• Does not have to be a physical entity. E.g., Condor pool, distributed file system, …
• Defined in terms of interfaces, not devices. E.g.
schedulers such as LSF and PBS define a compute resource; open/close/read/write define access to a distributed file system, e.g. NFS, AFS, DFS

Components of the Grid: Network protocol
• A formal description of message formats and a set of rules for message exchange
• Rules may define the sequence of message exchanges
• A protocol may define a state change in an endpoint, e.g. a file system state change
• Good protocols are designed to do one thing
• Protocols can be layered
• Examples of protocols: IP, TCP, TLS (was SSL), HTTP, Kerberos

Components of the Grid: Network enabled services
• Implementation of a protocol that defines a set of capabilities
• The protocol defines interaction with the service
• All services require protocols
• Not all protocols are used to provide services (e.g. IP, TLS)
• Examples: FTP and Web servers

Components of the Grid: Application Programming Interface (API)
• A specification for a set of routines to facilitate application development
• The spec is often language specific (or IDL)
• Routine name, number, order and type of arguments; mapping to language constructs
• Behaviour or function of the routine
• Examples: GSS API (security), MPI (message passing)

Components of the Grid: Software Development Kit (SDK)
• A particular instantiation of an API
• An SDK consists of libraries and tools
• Provides an implementation of the API specification
• Can have multiple SDKs for an API
• Examples of SDKs: MPICH, Motif Widgets

Components of the Grid: Syntax
• Rules for encoding information, e.g. XML, Condor ClassAds, Globus RSL
• Distinct from protocols
• One syntax may be used by many protocols
• Syntaxes may be layered. E.g., Condor ClassAds -> XML -> ASCII

Components of the Grid: Syntax
The key impediments to the wider use of Grids in enterprises are the “interactive nature of business applications, the large amount of data that resides in database systems and requires transactional access, and the component-based and multi-tier architecture of current enterprise applications.” (Jiménez-Peris, Patiño-Martínez, and Kemme 2007:292)
R. Jiménez-Peris, M. Patiño-Martínez, and B. Kemme. (2007). Enterprise Grids: Challenges Ahead. Journal of Grid Computing, Vol 5, pp 283–294
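
The point made under Syntax, that syntaxes may be layered (Condor ClassAds -> XML -> ASCII) and are distinct from protocols, can be sketched with Python's standard library. This is an illustration under loud assumptions: the attribute/value record only loosely imitates a ClassAd, and the element names `ad` and `attr` are invented for the example; they are not the actual Condor ClassAd XML format.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Layer 1 -> 2: express attribute/value pairs as an XML element tree."""
    root = ET.Element("ad")  # hypothetical element name
    for name, value in sorted(record.items()):
        attr = ET.SubElement(root, "attr", name=name)  # hypothetical element name
        attr.text = str(value)
    return root


def xml_to_ascii(root):
    """Layer 2 -> 3: serialise the XML tree to ASCII-encoded bytes."""
    return ET.tostring(root, encoding="us-ascii")


def ascii_to_record(payload):
    """Reverse both layers: ASCII bytes -> XML tree -> attribute/value pairs."""
    root = ET.fromstring(payload)
    return {a.get("name"): a.text for a in root.findall("attr")}


# Assumed example record describing a compute resource.
machine = {"Machine": "node01", "Memory": 2048, "OpSys": "LINUX"}
payload = xml_to_ascii(record_to_xml(machine))
decoded = ascii_to_record(payload)
print(payload.decode("ascii"))
print(decoded)
```

Nothing here says how the bytes travel between machines. The same payload could be carried over HTTP, FTP or any other protocol, which is why a syntax is kept distinct from the protocols that use it. Note that values come back as strings, since this simple encoding does not carry type information.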