Introduction to Farm & Grid
I. PC Farm: clustering local resources
The basic idea of a Farm is to cluster inexpensive commodity PCs together to achieve some of the capabilities of a supercomputer.
The three key components of a Farm are the disk server, the nodes, and the network connection:
1) The disk server provides large-volume, high-frequency data storage and transfer. A typical capacity is more than 1 terabyte, assembled by grouping many ~100 GB cheap commercial disks into a single "whole disk" behind a fast, reliable network link. A high-quality motherboard is required for the disks to survive large and frequent data transfers.
2) The nodes are where users' programs run and data is processed. They are high-performance PCs that together act as an integrated pool of CPU and memory.
3) Fast (gigabit-class) network links and switches are needed to tie the disk server and nodes together, and to communicate with the outside world for data transfer.
The Manchester DZero Farm is one example of such a composition.
In brief, a Farm is a local computer cluster that can act as a supercomputer: a large disk server and an array of CPU/memory nodes integrated by fast network links, as sketched below.
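A minimal sketch in Python of the composition described above; all numbers and names are purely illustrative, not taken from the Manchester or any other real Farm:

from dataclasses import dataclass, field

@dataclass
class DiskServer:
    disk_sizes_gb: list                  # many cheap ~100 GB commercial disks

    @property
    def total_capacity_tb(self) -> float:
        # the "whole disk" users see is simply the pooled capacity
        return sum(self.disk_sizes_gb) / 1000

@dataclass
class Node:
    cpus: int
    memory_gb: int

@dataclass
class Farm:
    disk_server: DiskServer
    nodes: list = field(default_factory=list)
    network_gbps: float = 1.0            # fast link/switch joining disk and nodes

    def total_cpus(self) -> int:
        return sum(n.cpus for n in self.nodes)

# Illustrative example: twelve ~100 GB disks pooled into >1 TB,
# plus twenty dual-CPU worker nodes behind a 1 Gb/s switch.
farm = Farm(DiskServer([100] * 12), [Node(cpus=2, memory_gb=2) for _ in range(20)])
print(f"{farm.disk_server.total_capacity_tb:.1f} TB disk, "
      f"{farm.total_cpus()} CPUs, {farm.network_gbps} Gb/s network")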
II. Grid:
Today, we scientists face a challenging problem: new generations of science, including particle physics experiments, astronomical satellites and telescopes, genome databases, the digitization of paper archives, and so on, are expected to produce a huge increase in the amount of data to be stored and processed over the next few years by increasingly dispersed groups of scientists and engineers. In particle physics, the LHC (Large Hadron Collider) is due to start operation in 2007-8 to probe fundamental questions such as the origin of mass in the Universe. The two general-purpose detectors at the LHC, ATLAS and CMS, contain over a hundred million individual electronic channels, each producing digital data at a rate of 40 million cycles a second. Even after event selection for interesting physics, the total amount of data produced is likely to be several petabytes per year (1 PB is a million GB; several PB corresponds to roughly 10 million CDs). Such a huge volume of data must be made available for analysis by hundreds of physicists all over the world looking for a handful of very rare events. The "Grid" is considered the solution to this kind of computational and data-intensive problem.
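A quick back-of-the-envelope check of those figures in Python, assuming roughly 700 MB per CD and taking "several" petabytes to mean about 7 PB per year (both are assumptions, not numbers from the text above):

GB_PER_PB = 1_000_000        # 1 PB = one million GB, as stated above
CD_CAPACITY_GB = 0.7         # assumed capacity of a single CD (~700 MB)
PB_PER_YEAR = 7              # "several petabytes per year" (assumed value)

total_gb = PB_PER_YEAR * GB_PER_PB
cds = total_gb / CD_CAPACITY_GB

print(f"{PB_PER_YEAR} PB/year = {total_gb:,} GB/year")
print(f"≈ {cds / 1e6:.0f} million CDs per year")   # of order 10 million CDs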
The Grid takes its name from the Electricity Grid that provides a ubiquitous supply of electricity
through a standard interface (plug and socket) throughout the country and with suitable
conversion across the world. The complexity of power stations, substations, power lines, and so on is hidden from the end-user, who simply plugs in an appliance.
In a computational Grid, the power stations are collections or clusters ("Farms") of computers and data storage centers, and the power lines are the fiber-optic network links. Special software, called "middleware", provides the interfaces through which users can submit their own programs to these computers and access the data stored there. The user does not need to know or care where his program actually runs or where his data is actually located, as long as he gets his results back as quickly and reliably as possible. Since the computing resources will have many different owners, economic models need to be established, or credits exchanged, within "Virtual Organizations" (VOs) such as the worldwide particle physics community, much as real money is charged for electricity.
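The location transparency that middleware provides can be sketched in Python; everything below is hypothetical (the class, method names, and Farm names are illustrative and do not belong to any real middleware):

import random

class GridMiddleware:
    """Toy stand-in for Grid middleware: it hides from the user which Farm runs the job."""

    def __init__(self, farms):
        self.farms = farms                   # resources owned by different VO members

    def submit(self, executable: str, dataset: str) -> str:
        # the scheduling policy (here just a random choice) is invisible to the user
        farm = random.choice(self.farms)
        return f"job '{executable}' on dataset '{dataset}' queued at {farm}"

# The user submits against logical names and gets results back,
# without knowing or caring where the job actually runs.
grid = GridMiddleware(["Manchester-Farm", "FNAL-central", "USTC-RAC"])
print(grid.submit("analysis_exe", "dzero/run2/dimuon"))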
Although the components of the Grid, computers, disks, networks and so on, have existed for many years, seamlessly integrating thousands of them into one distributed system that looks to users very much like one enormous PC is a severe challenge.
Standard protocols have been defined for the Grid as a means of enabling interoperability and a common infrastructure. A "Grid" is a system that:
 Coordinates resources that are not subject to centralized control. This is a vital point of the Grid; otherwise we are dealing with a local management system such as a Farm. It would be impracticable for everyone who wants to join the project to have to put all of his own investment into one place, e.g. CERN. The Grid should be the integration of local Farms spread all over the world, together with some large "Regional Centers" (such as CERN for particle physics). Clustering at the local level reflects the normal funding mechanism and also divides the hardware into maintainable chunks. Such clustering ensures that the resources remain under local control and can be switched in and out of the Grid at will without breaking the Grid. Of course, agreements must be set up among the dispersed Farms to build up the VO.
 Uses standard, open, general-purpose protocols and interfaces. A Grid is built from multi-purpose protocols and interfaces that address such fundamental issues as authentication, authorization, resource discovery, and resource access. It is important that these protocols and interfaces be standard and open; otherwise we are dealing with an application-specific system. In a word, the Grid would not and should not be some kind of "DZero" or "LHC" computer; although some constituent Farms may focus on these particle physics experiments, the Grid as a whole should be much more general-purpose and open to different users.
 Delivers nontrivial qualities of service. A Grid allows its constituent resources to be used in a coordinated fashion to deliver various qualities of service, relating for example to response time, throughput, availability, and security, and/or the co-allocation of multiple resource types to meet complex user demands, so that the utility of the combined system is significantly greater than the sum of its parts. That is, a user can access not only the data stored on the Grid but also the hardware capabilities (CPU, memory, etc.) of its constituent Farm resources, based on pre-established agreements. An interesting example is the Web: it satisfies the first two criteria, i.e. its open, general-purpose protocols support access to distributed resources, but it fails the last one of delivering high qualities of service, because you can only access or download the data you want; you cannot run programs on the remote machines.
In brief, the Grid is considered the next, and even more important, IT revolution after CERN's development of the World Wide Web. The standard protocols are clearly defined; management tools (e.g. "middleware") have been developed and are under test; the particle physics community's hierarchical VO system is being formed, and agreements have been set up among the VO's constituent Farms so that "membership" can be identified, much as the Web identifies hosts by IP (Internet Protocol) address. It is becoming more and more clear that the Grid has the potential to bring fundamental changes to the way large-scale computing is handled, both in academia and in industry. We particle physicists will build a Grid to enable us to analyze data from the LHC and other collider experiments, and use it as a test bed for the new technology.
III. DØ Application for Grid:
DØ is expected to accumulate a data and Monte Carlo (MC) sample in excess of 10 PB over the duration of the current Tevatron running period. Due to the complex nature of proton-antiproton collisions, the simulation and reconstruction of the events is a complicated and resource-intensive operation. The computing resources required to support this effort are far larger than can be supplied at a single central location (Fermilab). Thus, the DØ Collaboration calls for an increase in the amount of production-level computing carried out at off-site regional analysis centers (RAC Farms), including the reconstruction of data, MC generation, and the running of analysis software. In this way the Collaboration would integrate remote computing resources into the DØ computing model. This realization led to the instigation of the SAM (Sequential data Access via Meta-data, the Tevatron Run II data-handling database) project between DØ and the Fermilab Computing Division to develop an effective, functioning computational grid for the DØ experiment.
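The metadata-driven access idea behind SAM can be sketched in Python; this is only an illustration of the concept, and none of the names or calls below belong to the real SAM interface:

# Hypothetical metadata catalog: each entry records physics metadata and where the file lives.
catalog = [
    {"file": "reco_0001.root", "trigger": "dimuon", "site": "FNAL"},
    {"file": "reco_0002.root", "trigger": "dimuon", "site": "Manchester"},
    {"file": "mc_ttbar_01.root", "trigger": "mc",   "site": "USTC-RAC"},
]

def get_dataset(**criteria):
    """Return the files whose metadata match all of the given criteria."""
    return [entry for entry in catalog
            if all(entry.get(key) == value for key, value in criteria.items())]

# An analyst asks for all dimuon-triggered data by metadata, wherever it is stored.
for entry in get_dataset(trigger="dimuon"):
    print(entry["file"], "->", entry["site"])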
It will be a good opportunity for USTC/China to learn about and build a world first-class computer system by joining the DØ Collaboration, developing a local RAC Farm, and deploying Grid technology within China:
 to get the help of DØ/Manchester computing experts in building our local DØ RAC Farm;
 to provide a computational Grid for the USTC/China DØ groups;
 to contribute to DØ's overall computing needs and so receive credit from the Grid.
Appendix:
 Farm: a cluster of cheap commercial computing resources joined by fast network links to achieve some of the power of a supercomputer. It is essentially a local, high-capability computational network.
 Grid: Farms dispersed over the world constitute a Virtual Organization by committing to some common agreements. To an individual user who just wants to run his code on some specified data, the Grid looks like a hyper-computer with a single pool of hardware resources (CPU, memory, etc.) and all the data stored. A member Farm can potentially access any resource (data/information plus hardware) in the Grid. Farms remain under local control and can switch in and out of the Grid at will.
 SAM: Sequential data Access via Meta-data, a DØ-Fermilab Computing Division project to build the first fully functional HEP data Grid, moving petabytes of data among worldwide distributed computing resources. According to the first two-year Computing Model performance review, it has become apparent that the computing resources required to support the DØ physics program are larger than previously expected; thus, the off-site Grid solution based on SAM is crucial to the DØ Collaboration. Much effort has gone into SAM Grid development, and decisive progress is expected in the near future.