Financial Informatics –XVIII:
Grid Computing
Khurshid Ahmad,
Professor of Computer Science,
Department of Computer Science
Trinity College,
Dublin-2, IRELAND
November 19th, 2008.
https://www.cs.tcd.ie/Khurshid.Ahmad/Teaching.html
Financial Services:
Data and Compute Intense Activities
Financial programs that are data- and compute-intense
(for example, gigabyte-scale analysis using Monte Carlo methods)
Domain: Capital markets
Applications:
• Pricing or scenario analysis – buy-sell online decisions
• Risk management or portfolio analysis and re-evaluation – middle-office functions
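The Monte Carlo workload mentioned above can be sketched as follows. This is a minimal illustration, assuming a European call option on a stock that follows geometric Brownian motion; the function names and parameter values are hypothetical and not taken from the lecture. Because every simulated path is independent, the paths can be split into batches with separate random seeds, which is what makes such analyses a natural candidate for farming out across grid nodes.

```python
import math
import random
from statistics import mean


def mc_call_batch(s0, strike, rate, sigma, maturity, n_paths, seed):
    """Discounted mean payoff of a European call over one independent batch of paths.

    Assumes geometric Brownian motion under the risk-neutral measure (illustrative model).
    """
    rng = random.Random(seed)  # per-batch generator, so batches could run on different nodes
    drift = (rate - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)
    total = 0.0
    for _ in range(n_paths):
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(s_t - strike, 0.0)
    return math.exp(-rate * maturity) * total / n_paths


def mc_call(s0, strike, rate, sigma, maturity, n_batches=8, paths_per_batch=25_000):
    # Each batch is independent; on a grid, each call could be dispatched to a separate node.
    estimates = [
        mc_call_batch(s0, strike, rate, sigma, maturity, paths_per_batch, seed)
        for seed in range(n_batches)
    ]
    return mean(estimates)  # equal batch sizes, so the mean of batch means is the overall mean


def black_scholes_call(s0, strike, rate, sigma, maturity):
    """Closed-form benchmark used to check the simulation."""

    def norm_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * math.sqrt(maturity))
    d2 = d1 - sigma * math.sqrt(maturity)
    return s0 * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


if __name__ == "__main__":
    print(mc_call(100.0, 100.0, 0.05, 0.2, 1.0))
    print(black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0))
```

With 200,000 paths in total the simulated price should sit within a few cents of the closed-form value; adding batches (nodes) shrinks the error roughly with the square root of the path count.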
The Evolution of Computing
Figure: timeline, 1960 to 2000, of developments in computing (mainframes, minicomputers, workstations, PCs, PDAs, Crays, MPPs, PC and WS clusters, HTC, P2P, Grids) and in communication (Sputnik, ARPANET, Email, Ethernet, TCP/IP, XEROX PARC worm, Internet era, IETF, HTML, Mosaic, W3C, WWW era, XML, Web Services).
Source: www.gridbus.org
The Evolution of Computing
Figure: performance and QoS plotted against administrative barriers (individual, group, department, campus, state, national, globe, inter-planet, universe), rising from personal device through SMPs or supercomputers, local cluster and enterprise cluster/grid to global grid.
Source: www.gridbus.org
DEFINITIONS: Grid?
ELECTRICITY GRID:
A network of high-voltage transmission lines
and connections that supply electricity from a
number of generating stations to various
distribution centres in a country or a region, so
that no consumer is dependent on a single
station.
UTILITY GRID:
A term used for any network that serves a
similar purpose for other services.
DEFINITIONS: Grid?
The Grid is envisaged to be ‘the
computing and data management
infrastructure that will provide the
electronic underpinning for a global
society in business, government,
science and entertainment’
Berman, F., Fox, G. C., and Hey, A. J. G. (Eds.) (2003). Grid Computing: Making the Global Infrastructure a Reality. Wiley: Chichester.
DEFINITIONS: Grid?
A Grid is a virtual information
processing environment where
the user has the ‘illusion’ of a
seamless single-source
computing power which is
actually distributed.
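The ‘illusion’ of a seamless single source of computing power can be illustrated with a small facade: the caller hands work to one object and never sees which worker runs each piece. This is a minimal local sketch, assuming a thread pool stands in for distributed grid nodes; the class name `VirtualComputer` and its methods are hypothetical and are not part of any grid middleware API.

```python
from concurrent.futures import ThreadPoolExecutor


class VirtualComputer:
    """Presents several workers as if they were one machine (local stand-in for grid nodes)."""

    def __init__(self, n_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=n_workers)

    def run(self, func, inputs):
        # map() hides which worker handles each task and returns results in input order,
        # so the caller sees the same answer a single machine would have produced.
        return list(self._pool.map(func, inputs))

    def shutdown(self):
        self._pool.shutdown(wait=True)


def square(x):
    return x * x


if __name__ == "__main__":
    vc = VirtualComputer(n_workers=4)
    print(vc.run(square, range(10)))
    vc.shutdown()
```

On a real grid the workers would be remote and the facade would also handle data movement, failures and security, which is where most of the middleware effort goes.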
DEFINITIONS: Grid?
Grids have succeeded in
providing an infrastructure for
deploying parallel applications
in a distributed setting with a
high degree of automation.
Jiménez-Peris, R., Patiño-Martínez, M., and Kemme, B. (2007). Enterprise Grids: Challenges Ahead. Journal of Grid Computing, Vol. 5, pp. 283–294.
DEFINITIONS: Grid?
Elaboration by year: grid computing systems are
1998: “an infrastructure to provide easy and inexpensive access to high-end computing”
2001: “an infrastructure to share resources for collaborative problem solving”
2002: “an infrastructure to pool and virtualize resources and enable their use in a transparent fashion.”
Jiménez-Peris, R., Patiño-Martínez, M., and Kemme, B. (2007). Enterprise Grids: Challenges Ahead. Journal of Grid Computing, Vol. 5, pp. 283–294.
The Evolution of the GRID
1980s: Parallel computing clusters, with improved performance from tightly coupled clusters and data sharing
1990s: Grid I, extending the advances in parallel computing to geographically distributed systems
2000: Grid II, in which the Grid is a platform for integrating loosely coupled applications: some components run in parallel, while others link disparate resources largely developed in the serial von Neumann paradigm (storage, visualisation, A/D and D/A converters, and sensors)
The Evolution of the Financial
Services’ Computing
Hardware | Software | Reach/Service
Mainframe with lots of distributed dumb terminals; 24/7 operation | Programs on a disk that were retrieved ‘manually’ for execution, data on tapes | Local operations; heavy-duty operations scheduled after hours
Client/server and LAN-based systems | Remote access, programs online, data on (unsecured) disks | National and limited transnational operations
Data distributed through N-tier architectures | Programs and data distributed, easily accessible, ad-hoc security | Quasi-globalised operations
The Evolution of the Financial
Services’ Computing
Evolution in financial services computing
Hardware | Software | Reach/Service
Grid computing | Pool and virtualize hardware and software resources | Globalised, quasi-secure operations
The Evolution of the GRID
Currently there are clusters of very powerful
computing/communications systems:
(i) Systems for acquiring and processing digital data
(Amazon.com or Oracle clusters)
(ii) Systems for analysing and visualising information
(CERN’s Large Hadron Collider, protein synthesis
systems)
(iii) Systems for imaging, analysis and visualisation of
distributed data (weather prediction, satellite-based
military and civilian systems)
(iv) Systems that can link sensors and make predictions from real-time information (military systems, video surveillance)
The Evolution of the GRID
Developments in networking technologies, operating systems,
clustered databases, application services and device
technologies have enabled developers to build distributed systems
with literally millions of nodes, providing:
•Web-based services for personal commercial transactions
•Content delivery networks that can cache web pages
seamlessly
•Ad-hoc distributed systems, spawned by wireless networks, which
form a complex distributed system when linked to wide-area
networks.
Problems of efficiency, reliability, accessibility and security
are not addressed in ‘global’ terms.
The Evolution of the GRID
The Grid is being developed
not only to make
distributed resources
available to end users but
also to co-ordinate their
usage for the sharing and
aggregation of resources.
The Evolution of the GRID
Moore’s law improvements in computing
produce highly functional end-systems
The Internet and burgeoning wired and
wireless networks provide widespread connectivity
Changing modes of working and problem
solving emphasise teamwork and computation
Network growth produces dramatic changes in
topology and geography
The Evolution of the GRID
The first generation involved proprietary
solutions for sharing high-performance
computing resources
The second generation introduced middleware
to cope with scale and heterogeneity
The third generation introduced a service-oriented approach, leading to commercial
projects in addition to the scientific projects
now collectively known as e-Science
The Evolution of the GRID
The first generation
• FAFNER, I-WAY
The second generation
• Technologies: Globus, Legion
• Distributed object systems (Jini and RMI, the Common Component Architecture Forum)
• Grid resource brokers and schedulers
• Grid portals
• Integrated systems
• Peer-to-peer computing
The third generation
• Service-oriented architecture (web services, OGSA, agents)
• Information aspects: relation with the World Wide Web
• Live information systems
Building blocks of the Grid
(1) Networks
(2) Computational ‘nodes’ on the
Grid
(3) Pulling it all together
(4) Common infrastructure:
standards
GRID: Key Issues
Resources: discovery, allocation, scheduling
Availability: access, security, networks
Efficiency: hardware
Economy, management, administration: computers, services, networks
Application development, testing
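The first three resource issues listed (discovery, allocation and scheduling) can be sketched together. This is a simplified illustration, assuming each resource advertises only its operating system and memory, and that each job goes to the least-loaded matching resource; the field names, site names and greedy rule are assumptions, not a description of any particular grid scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    os: str
    memory_gb: int
    jobs: list = field(default_factory=list)  # names of jobs placed here


@dataclass
class Job:
    name: str
    os: str
    memory_gb: int


def discover(resources, job):
    """Discovery: return resources whose advertised attributes satisfy the job."""
    # Memory is treated as a per-job capability here, not consumed by placed jobs.
    return [r for r in resources if r.os == job.os and r.memory_gb >= job.memory_gb]


def schedule(resources, jobs):
    """Allocation and scheduling: place each job on the least-loaded matching resource."""
    placement = {}
    for job in jobs:
        candidates = discover(resources, job)
        if not candidates:
            placement[job.name] = None  # no resource can run this job
            continue
        target = min(candidates, key=lambda r: len(r.jobs))  # ties go to the first listed
        target.jobs.append(job.name)
        placement[job.name] = target.name
    return placement


if __name__ == "__main__":
    pool = [Resource("siteA", "linux", 8), Resource("siteB", "linux", 32), Resource("siteC", "windows", 64)]
    jobs = [Job("j1", "linux", 16), Job("j2", "linux", 4), Job("j3", "linux", 4), Job("j4", "solaris", 1)]
    print(schedule(pool, jobs))
```

A production scheduler would also weigh queue lengths, data location, cost and site policy; the greedy rule only shows where discovery ends and allocation begins.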
GRID: Key Issues – Sharing
Sharing issues are not adequately addressed
by existing technologies:
• Complicated requirements: “run program
X at site Y subject to community policy
P, providing access to data at Z according
to policy Q”
• High performance: the unique demands of
advanced, high-performance systems.
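The quoted requirement (run program X at site Y under community policy P, with access to data at Z under policy Q) amounts to two independent authorisation checks that must both pass. The sketch below is illustrative only: it assumes a policy is simply a set of permitted (user, item) pairs, and every name in it (`Policy`, `may_run`, the users and sites) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A hypothetical policy: the set of (user, item) pairs it permits."""
    allowed: frozenset

    def permits(self, user, item):
        return (user, item) in self.allowed


def may_run(user, program, site, dataset, data_site, compute_policies, data_policies):
    # Policy P: the community policy governing execution at the compute site.
    p = compute_policies.get(site)
    # Policy Q: the policy governing access to the data at its (possibly different) site.
    q = data_policies.get(data_site)
    if p is None or q is None:
        return False  # no policy known for a site: deny by default
    return p.permits(user, program) and q.permits(user, dataset)


if __name__ == "__main__":
    compute = {"Y": Policy(frozenset({("alice", "X")}))}
    data = {"Z": Policy(frozenset({("alice", "prices")}))}
    print(may_run("alice", "X", "Y", "prices", "Z", compute, data))
```

The hard part in practice is that P and Q are owned by different institutions with no central authority, so each check has to be evaluated where the policy lives.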
GRID: Key Issues – Sharing
A biochemist will be able to exploit 10,000
computers to screen 100,000 compounds in an hour;
1,000 physicists worldwide will be able to pool resources for
petaop analyses of petabytes of data;
A multidisciplinary analysis in aerospace that couples
code and data across geographically distributed
organisations may be possible;
Civil engineers will be able to collaborate to design, execute, and analyse
shake table experiments;
Climate scientists will be able to visualise, annotate,
and analyse terabyte simulation datasets.
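The biochemistry example implies a concrete per-machine workload. Assuming the compounds are spread evenly and each computer screens its share one after another, a quick calculation gives:

```latex
\frac{100\,000 \text{ compounds}}{10\,000 \text{ computers}} = 10 \text{ compounds per computer},
\qquad
\frac{3600 \text{ s}}{10 \text{ compounds}} = 360 \text{ s} = 6 \text{ min per compound}.
```

So each screening run may take at most about six minutes if the whole job is to finish within the hour, before allowing for data transfer and scheduling overhead.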
GRID: Key Issues – Sharing
Online Access to Scientific Instruments
Advanced Photon Source
Figure: real-time collection, tomographic reconstruction, archival storage, and wide-area dissemination to desktop & VR clients with shared controls.
DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
GRID: Key Issues – Sharing
Data Grids for High Energy Physics
Figure (tiered data flow):
• Detector to online system: ~PBytes/sec. There is a “bunch crossing” every 25 nsecs; there are 100 “triggers” per second; each triggered event is ~1 MByte in size.
• Online system to offline processor farm (~20 TIPS) and CERN Computer Centre (Tier 0): ~100 MBytes/sec
• Tier 0 to Tier 1 (FermiLab ~4 TIPS; France, Germany and Italy Regional Centres): ~622 Mbits/sec, or air freight (deprecated)
• Tier 2: Tier2 Centres (~1 TIPS each, e.g. Caltech ~1 TIPS), linked at ~622 Mbits/sec
• Institutes (~0.25 TIPS) with a physics data cache, linked at ~622 Mbits/sec
• Tier 4: physicist workstations, ~1 MBytes/sec
1 TIPS is approximately 25,000 SpecInt95 equivalents.
Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
Image courtesy Harvey Newman, Caltech
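The figures in the diagram can be cross-checked with a few lines of arithmetic. This sketch assumes decimal units (1 MByte = 10^6 bytes, 1 Mbit = 10^6 bits) and uses only numbers that appear on the slide; the constant and function names are hypothetical.

```python
# Decimal units assumed: 1 MByte = 10**6 bytes, 1 Mbit = 10**6 bits.
BUNCH_CROSSING_INTERVAL_S = 25e-9  # one "bunch crossing" every 25 ns
TRIGGERS_PER_S = 100               # triggered events kept per second
EVENT_SIZE_BYTES = 1e6             # ~1 MByte per triggered event
LINK_MBITS_PER_S = 622             # wide-area link speed quoted between tiers
SPECINT95_PER_TIPS = 25_000        # 1 TIPS ~ 25,000 SpecInt95 equivalents


def crossings_per_second():
    """Bunch crossings per second implied by the 25 ns spacing."""
    return 1.0 / BUNCH_CROSSING_INTERVAL_S


def recorded_rate_mbytes_per_s():
    """Data rate after triggering: events per second times event size."""
    return TRIGGERS_PER_S * EVENT_SIZE_BYTES / 1e6


def link_mbytes_per_s(mbits_per_s=LINK_MBITS_PER_S):
    """Convert a link speed from megabits to megabytes per second."""
    return mbits_per_s / 8


def tips_to_specint95(tips):
    """Express a TIPS rating in SpecInt95 equivalents."""
    return tips * SPECINT95_PER_TIPS


if __name__ == "__main__":
    print(crossings_per_second(), recorded_rate_mbytes_per_s(),
          link_mbytes_per_s(), tips_to_specint95(20))
```

The arithmetic shows 40 million bunch crossings per second being reduced to 100 recorded events per second, which yields the ~100 MBytes/sec leaving the online system; a ~622 Mbits/sec link carries about 78 MBytes/sec; and the ~20 TIPS offline farm corresponds to roughly 500,000 SpecInt95.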
GRID: Key Issues – Sharing
Network for Earthquake Engineering Simulation
NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
On-demand access to experiments, data streams, computing, archives, collaboration
NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
GRID: Key Issues – Sharing
The 13.6 TF TeraGrid: Computing at 40 Gb/s
Figure: four sites, each with site resources and external network connections: NCSA/PACI (8 TF, 240 TB), SDSC (4.1 TF, 225 TB), Caltech and Argonne; archival storage shown as HPSS and UniTree.
TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne
www.teragrid.org
GRID: Key Issues – Sharing
iVDGL: International Virtual Data Grid Laboratory
Figure legend: Tier0/1, Tier2 and Tier3 facilities; 10 Gbps, 2.5 Gbps, 622 Mbps and other links.
U.S. PIs: Avery, Foster, Gardner, Newman, Szalay
www.ivdgl.org
GRID: Key Issues
Flexible, secure, coordinated resource sharing among
dynamic collections of individuals, institutions,
and resources.
Enable communities (“virtual organisations”) to
share geographically distributed resources as they
pursue common goals, assuming the absence of:
• Central location,
• Central control,
• Omniscience,
• Existing trust relationships.
Components of the Grid
1. Resource
2. Network protocol
3. Network enabled service
4. Application Programming Interface (API)
5. Software Development Kit (SDK)
6. Syntax
Components of the Grid
Resource
An entity that is to be shared
• E.g., computers, storage, data, software
Does not have to be a physical entity
• E.g., Condor pool, distributed file system, …
Defined in terms of interfaces, not devices
• E.g., schedulers such as LSF and PBS define a compute resource
• Open/close/read/write define access to a distributed file system, e.g., NFS, AFS, DFS
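The point that a resource is defined by its interface rather than by the device behind it can be sketched with a Python protocol. This is an illustration under assumptions: the `ComputeResource` interface and its `submit`/`status` methods are hypothetical and do not reproduce the LSF or PBS command sets, and the two back-ends are stand-ins that only record jobs rather than running them.

```python
from typing import Protocol


class ComputeResource(Protocol):
    """Hypothetical interface: anything offering these methods counts as a compute resource."""

    def submit(self, command: str) -> str: ...

    def status(self, job_id: str) -> str: ...


class SingleWorkstation:
    """One physical machine (stand-in: records commands instead of running them)."""

    def __init__(self):
        self._jobs = {}

    def submit(self, command: str) -> str:
        job_id = f"ws-{len(self._jobs)}"
        self._jobs[job_id] = command
        return job_id

    def status(self, job_id: str) -> str:
        return "queued" if job_id in self._jobs else "unknown"


class BatchCluster:
    """A whole pool behind a batch scheduler (stand-in: round-robin over node names)."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._jobs = {}

    def submit(self, command: str) -> str:
        node = self._nodes[len(self._jobs) % len(self._nodes)]
        job_id = f"{node}-{len(self._jobs)}"
        self._jobs[job_id] = (node, command)
        return job_id

    def status(self, job_id: str) -> str:
        return "queued" if job_id in self._jobs else "unknown"


def run_everywhere(resources: list[ComputeResource], command: str) -> list[str]:
    # Client code depends only on the interface, not on what kind of device each resource is.
    return [r.submit(command) for r in resources]


if __name__ == "__main__":
    print(run_everywhere([SingleWorkstation(), BatchCluster(["n1", "n2"])], "echo hi"))
```

A single workstation and an entire cluster are interchangeable to the caller, which is exactly why a Condor pool can count as one resource.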
Components of the Grid
Network protocol
A formal description of message formats and a set of
rules for message exchange
• Rules may define the sequence of message exchanges
• A protocol may define a state change in an endpoint, e.g., a file system state change
Good protocols are designed to do one thing
• Protocols can be layered
Examples of protocols
• IP, TCP, TLS (was SSL), HTTP, Kerberos
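The definition above (message formats plus rules for exchanging them, possibly changing endpoint state) can be made concrete with a toy file-access protocol. This is a hypothetical sketch, not NFS or any real protocol: messages are assumed to be single text lines of the form `VERB argument`, and the rules say a file must be opened before it is read and closed afterwards.

```python
class ProtocolError(Exception):
    """Raised when a message breaks the format or sequence rules."""


class FileEndpoint:
    """Server side of a toy protocol: validates message format and message sequence."""

    VERBS = {"OPEN", "READ", "CLOSE"}

    def __init__(self, files):
        self._files = files      # name -> contents held by this endpoint
        self._open_name = None   # endpoint state: which file (if any) is open

    def handle(self, message: str) -> str:
        # Message format: "VERB" or "VERB argument", separated by a single space.
        verb, _, arg = message.partition(" ")
        if verb not in self.VERBS:
            raise ProtocolError(f"unknown verb {verb!r}")
        # Sequence rules: OPEN only when nothing is open; READ and CLOSE only after OPEN.
        if verb == "OPEN":
            if self._open_name is not None:
                raise ProtocolError("a file is already open")
            if arg not in self._files:
                return "ERR no such file"
            self._open_name = arg  # state change at the endpoint
            return "OK"
        if self._open_name is None:
            raise ProtocolError(f"{verb} before OPEN")
        if verb == "READ":
            return "DATA " + self._files[self._open_name]
        self._open_name = None  # CLOSE: state change back to idle
        return "OK"


if __name__ == "__main__":
    ep = FileEndpoint({"prices.csv": "100,101"})
    for msg in ("OPEN prices.csv", "READ", "CLOSE"):
        print(msg, "->", ep.handle(msg))
```

The format check and the sequence check are separate concerns, which mirrors the slide's distinction between message formats and exchange rules; a layered design could put the line framing itself in a lower protocol.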
Components of the Grid:
Network enabled services
Implementation of a protocol that defines a set
of capabilities
• The protocol defines the interaction with the service
• All services require protocols
• Not all protocols are used to provide
services (e.g., IP, TLS)
Examples: FTP and Web servers
Components of the Grid :
Application Programming Interface (API)
A specification for a set of routines to
facilitate application development
The specification is often language-specific (or written in an IDL)
• Routine name; number, order and type of
arguments; mapping to language constructs
• Behaviour or function of the routine
Examples
• GSS-API (security), MPI (message passing)
Components of the Grid
Software Development Kit (SDK)
A particular instantiation of an API
An SDK consists of libraries and tools
• Provides an implementation of the API
specification
There can be multiple SDKs for one API
Examples of SDKs
• MPICH, Motif widgets
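The distinction between an API (a specification) and an SDK (one implementation of it) can be sketched in Python, with a `typing.Protocol` standing in for the specification and two classes standing in for competing SDKs. The `Messaging` interface and both implementations are hypothetical and far simpler than MPI; the point is only that client code written against the API runs unchanged on either SDK.

```python
from collections import deque
from typing import Protocol


class Messaging(Protocol):
    """API: routine names, argument order and types, and the intended behaviour."""

    def send(self, dest: int, payload: bytes) -> None:
        """Deliver payload to the mailbox of process `dest`."""
        ...

    def recv(self, rank: int) -> bytes:
        """Return the oldest undelivered payload addressed to `rank`."""
        ...


class QueueSDK:
    """SDK 1: one FIFO queue per destination."""

    def __init__(self):
        self._boxes = {}

    def send(self, dest: int, payload: bytes) -> None:
        self._boxes.setdefault(dest, deque()).append(payload)

    def recv(self, rank: int) -> bytes:
        return self._boxes[rank].popleft()


class LogSDK:
    """SDK 2: a single shared log scanned on receive (different internals, same API)."""

    def __init__(self):
        self._log = []

    def send(self, dest: int, payload: bytes) -> None:
        self._log.append((dest, payload))

    def recv(self, rank: int) -> bytes:
        for i, (dest, payload) in enumerate(self._log):
            if dest == rank:
                del self._log[i]
                return payload
        raise KeyError(rank)


def ping(comm: Messaging) -> bytes:
    # Application code written only against the API.
    comm.send(1, b"ping")
    return comm.recv(1)


if __name__ == "__main__":
    print(ping(QueueSDK()), ping(LogSDK()))
```

Swapping one SDK for another changes performance and internals but not the application, which is the relationship the slide describes between, for example, MPI and MPICH.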
Components of the Grid
Syntax
Rules for encoding information, e.g.
• XML, Condor ClassAds, Globus RSL
Distinct from protocols
• One syntax may be used by many
protocols
Syntaxes may be layered
• E.g., Condor ClassAds -> XML -> ASCII
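Layered syntaxes can be illustrated by encoding one resource description in several ways: in a ClassAd-style text syntax, then as XML, then as ASCII bytes. The XML element names below are a hypothetical schema chosen for the example, not Condor's own XML ClassAd format, and the attribute names are illustrative.

```python
import xml.etree.ElementTree as ET


def classad_text(ad):
    """ClassAd-style text: one 'Name = value' line per attribute, strings quoted."""
    lines = []
    for name, value in ad.items():
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        lines.append(f"{name} = {rendered}")
    return "\n".join(lines)


def to_xml(ad):
    """Layer 1: the same attributes expressed in XML (hypothetical element names)."""
    root = ET.Element("ad")
    for name, value in ad.items():
        attr = ET.SubElement(root, "attr", name=name, type=type(value).__name__)
        attr.text = str(value)
    return ET.tostring(root, encoding="unicode")


def to_ascii(xml_text):
    """Layer 2: the XML document as ASCII bytes; non-ASCII becomes character references."""
    return xml_text.encode("ascii", errors="xmlcharrefreplace")


if __name__ == "__main__":
    ad = {"OpSys": "LINUX", "Memory": 2048}
    print(classad_text(ad))
    print(to_ascii(to_xml(ad)))
```

Each layer only needs to know about the one below it: the XML layer does not care which protocol carries it, and the ASCII layer does not care that the text is XML.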
Components of the Grid
Syntax
The key impediments to the wider use of Grids
in enterprises are the “interactive nature of
business applications, the large amount of data
that resides in database systems and requires
transactional access, and the component-based
and multi-tier architecture of current enterprise
applications.” (Jiménez-Peris, Patiño-Martínez,
and Kemme 2007: 292)
Jiménez-Peris, R., Patiño-Martínez, M., and Kemme, B. (2007). Enterprise Grids: Challenges Ahead. Journal of Grid Computing, Vol. 5, pp. 283–294.