Cluster Building and Design
Vikas Singhal
VECC, Kolkata, India
Vikas Singhal, VECC
February 9, 2006
General View of HPC
Clustering Concept
Requirement for clustering
Quattor Description
Working of Condor
Glimpse of Ganglia
Current status of our cluster
High Performance Computing
A branch of computing that deals with extremely powerful computers and the applications that use them.
High computing power is required for data-intensive or compute-intensive applications, depending on the requirement.
E.g. a supercomputer is one answer to HPC needs.
A supercomputer is characterized by very high speed and very large memory.
Speed is measured in flops (floating-point operations per second).
Fastest computer in the world at the time of this talk: BlueGene/L (built by IBM), 280 Tflops.
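As a rough illustration of how a flops rating is estimated, theoretical peak is commonly computed as nodes x CPUs per node x clock rate x floating-point operations per cycle. The sketch below applies that formula to a hypothetical group of dual-CPU 2.4 GHz machines. The node count and the figure of 2 operations per cycle are assumptions chosen for illustration, not measurements of any real system.

```python
def peak_flops(nodes, cpus_per_node, clock_hz, flops_per_cycle):
    """Theoretical peak floating-point operations per second."""
    return nodes * cpus_per_node * clock_hz * flops_per_cycle

# Assumed example: 6 nodes, 2 CPUs each, 2.4 GHz, 2 flops per cycle (assumption).
peak = peak_flops(nodes=6, cpus_per_node=2, clock_hz=2.4e9, flops_per_cycle=2)
print(f"Peak: {peak / 1e9:.1f} Gflops")

# Ratio of the 280 Tflops figure quoted above to this small example.
print(f"BlueGene/L is roughly {280e12 / peak:.0f} times faster")
```

Sustained performance on real applications is normally well below the theoretical peak, so this number is an upper bound only.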
Technologies for HPC
Traditional: build faster CPUs
Special electronic technology for increasing clock speed
Advanced CPU architecture (pipelining, vector processing, multiple functional units, etc.)
E.g. CRAY
Very high clock speed
Expensive
Very high heat dissipation; advanced cooling techniques required (liquid Freon / liquid nitrogen)
Easy for the user: no special programming required

Parallel processing: harness a large number of ordinary CPUs and divide the job between them
Large number of conventional CPUs interconnected through a network
Cost effective
Program writing is difficult: the job has to be split into independently executable units
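Splitting a job into independently executable units can be sketched as follows. This is a minimal example using worker processes on a single machine via Python's standard library, standing in for the nodes of a cluster; the function names and the sum-of-squares workload are made up for illustration.

```python
from concurrent.futures import ProcessPoolExecutor

def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous, non-overlapping chunks."""
    step, extra = divmod(stop - start, parts)
    chunks, lo = [], start
    for i in range(parts):
        hi = lo + step + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks

def sum_of_squares(chunk):
    """One independent unit of work: it needs no data from other chunks."""
    lo, hi = chunk
    return sum(n * n for n in range(lo, hi))

def parallel_sum_of_squares(start, stop, workers=4):
    """Farm the chunks out to worker processes and combine the partial results."""
    chunks = split_range(start, stop, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))

if __name__ == "__main__":
    print(parallel_sum_of_squares(0, 1_000_000))
```

The difficulty mentioned above lies in `split_range` and the final `sum`: the programmer has to decide how to cut the problem and how to recombine the partial answers.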
Why Clustering

Lower-cost technology than a supercomputer.
Faster than a supercomputer of the same hardware cost.
Not bound by the technological limits of building a single faster CPU.
Scalable and simple.
For high-performance and high-availability computing, building a cluster of computers is one of the best solutions.
High Computing Power: Clustering of Computers
Application: Computing-Intensive Task
The main aim is High Performance Computing (HPC).
(Most of the TOP500 computers are built by clustering; BlueGene/L has approximately 131,000 processors.)
Single user and a single number-crunching problem.
Communication between nodes must be very fast, which requires a high-speed (and costly) network card.
Programs should be written in a parallel language or in a parallel environment:
Parallel languages: LINDA, OCCAM, etc.
Parallel extensions to serial languages: High Performance Fortran (HPF)
Parallel APIs: OpenMP, MPI
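To give a feel for the message-passing style that MPI uses, the sketch below imitates it with Python's standard `multiprocessing` module: a coordinator sends a slice of the data to each worker and receives a partial result back. This is only an analogy that runs on one machine; it is not MPI, and the function names are invented for the example.

```python
from multiprocessing import Pipe, Process

def worker(conn):
    """Receive a chunk of numbers, send back its partial sum, then exit."""
    chunk = conn.recv()
    conn.send(sum(chunk))
    conn.close()

def scatter_and_reduce(data, n_workers=2):
    """Send one slice of `data` to each worker and add up the replies."""
    links = []
    for rank in range(n_workers):
        parent_end, child_end = Pipe()
        proc = Process(target=worker, args=(child_end,))
        proc.start()
        parent_end.send(data[rank::n_workers])  # every n-th item for this rank
        links.append((parent_end, proc))
    total = 0
    for parent_end, proc in links:
        total += parent_end.recv()
        proc.join()
    return total

if __name__ == "__main__":
    print(scatter_and_reduce(list(range(1, 101))))
```

Every `send` and `recv` here is explicit, which is why communication speed between nodes matters so much for this class of application.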
High Computing Power: Clustering of Computers
Application: Data-Intensive Task
The main aim is not High Performance Computing (HPC) but high availability.
Multi-user and multi-job system:
7 collaborating institutes
More than 100 users (see Mr. S. K. Pal's talk)
It is part of a global grid such as EDG.
Security is the main concern.
High-bandwidth Internet connectivity is required.
(We have installed a 4 Mbps leased line (1:4).)
How to Build a Cluster for Our Requirement
Hardware
Purchase according to requirement and budget:
Processors
Memory (RAM)
Storage
No need to purchase a high-speed network card.
Software
Choose according to requirement.
Open-source availability.
The software area is very big:
Cluster building S/W
Cluster monitoring S/W
Job scheduling S/W
User management S/W
Our specific requirement
Procurement of hardware
Procurement of software
The full cluster is not procured at once; it is a step-by-step process.
Different hardware supports different software.
Present status of Tier2-Kol Cluster
[Network diagram. Labels: public address 125.20.3.11; DMZ; 4 Mbps (1:4) link; management nodes (HP ProLiant DL360 G3, dual-CPU Xeon 2.4 GHz); 192.168.x.x (standby); two Gigabit switches; computing nodes. The design is based on high availability.]
High Availability
Data-intensive and real-time critical systems require high availability.
High availability is achieved through redundancy (eliminating single points of failure):
Two Gigabit switches.
Each server has two NICs (eth0 and eth1).
The two NICs are combined using the bonding concept.
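The benefit of duplicating a component can be estimated with a simple formula: if one path is available a fraction a of the time and failures are independent, n redundant paths are available 1 - (1 - a)^n of the time. The sketch below works this out; the 99% figure is an assumed value for illustration, and real failures are often not fully independent.

```python
def parallel_availability(single, copies):
    """Availability of `copies` redundant components, assuming independent failures."""
    return 1.0 - (1.0 - single) ** copies

# Assumed figure for illustration: each NIC/switch path is up 99% of the time.
single_path = 0.99
print(f"one path : {single_path:.4%}")
print(f"two paths: {parallel_availability(single_path, 2):.4%}")
```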
Redundancy (cont.)
Two hard disks:
Both are mirrors of each other.
Both are hot swappable.
Implemented with hardware RAID-1 (mirroring).
Both are synchronized within milliseconds.
rsync:
We are trying to mirror the management node.
Software Requirement for making Cluster
Open Source Software for Cluster Building:
OSCAR : Free, but harnessing of client nodes is limited
SCALI : Not free software; paid for along with the network cards (as at IMSc)
Red Hat Cluster Suite : Not very suitable
CPM (Central Processor Manager) : IBM proprietary
Rocks : Not free software
Quattor : Free and best suited
To select which one is "best" for our requirement, one has to gain experience with all of them.
Installing a Quattor Server and Client
Quattor is an administration toolkit for optimizing resources.
Quattor is a large-scale management system for medium to very large (>1000 node) clusters.
Three sets of Quattor RPMs are available:
1. i386: for all Pentium or Xeon processors, or any processor with the IA-32 instruction set
2. IA64: for 64-bit Intel Itanium machines
3. i86x64: for 64-bit machines that also support the x86 instruction set, such as AMD Opteron
Site address: http://quattor.org
Package RPMs: http://quattorsw.web.cern.ch/quattorsw/software/quatttor
Requirements:
It supports SLC or RH Linux 7.3
Disk: 6.5 GB for the server, 2.5 GB per client OS
No specific hardware or software is required for building a Quattor cluster.
CDB: Configuration Data Base
Hierarchical template-based structure
Makes one common structure for different databases
Contains cluster descriptions, networking parameters, etc.

SPMA: Software Package Manager Agent for software deployment
Manages the installation of the different software packages
Handles multiple package formats
Manages the Software Repository (SWRep)

AII: Automated Installation Infrastructure
Works on top of the native RH/SL installer using PXE
Anaconda / KickStart
DHCP server (IP address + kernel location)
TFTP server (boot kernel)
HTTP server (OS images + packages)

NCM: Node Configuration Manager for system configuration
Framework in which service-specific plug-ins (components) make the necessary system changes
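The idea behind a hierarchical, template-based configuration database can be sketched with nested dictionaries: general settings are defined once and more specific templates override or extend them. Quattor itself expresses templates in its own Pan language, so the Python below is only an analogy; all names, addresses, and package lists are hypothetical.

```python
import copy

def merge(base, override):
    """Return a new dict where `override` values replace or extend `base`."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result

# Hypothetical templates, ordered from most general to most specific.
site = {"network": {"gateway": "192.168.0.1", "dns": "192.168.0.2"},
        "packages": ["openssh"]}
cluster = {"packages": ["openssh", "condor"]}
node001 = {"network": {"ip": "192.168.0.101"}}

profile = site
for template in (cluster, node001):
    profile = merge(profile, template)
print(profile)
```

Each node's final profile is the result of layering its own template over the cluster and site defaults, so a change to a shared template reaches every node that inherits from it.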
Basic Requirements for Installing a Cluster Site
Cluster Building: Quattor
Some basic steps after Quattor installation:
C3 commands
Bonding package, for high availability (if dual NIC)
LDAP (Lightweight Directory Access Protocol)
S/W firewall (make firewall rules)
Job Scheduling: Condor
A specialized workload management system.
Provides a job queuing mechanism, scheduling policy, resource monitoring, and resource management.
Can checkpoint a job and migrate it to a different machine.
Condor Daemons
Job Submission Steps
Condor Commands
condor_compile: re-links source or object files with the Condor libraries, which provide checkpointing, migration, and remote system calls
condor_submit: takes a submit description file as input and produces a job ClassAd for further processing by the central manager
condor_status: shows the state of the machines in the Condor pool
condor_q: shows job status
Submit description files
Directs the queuing of jobs
Contains:
Executable location
Command-line arguments to the job
stdin, stderr, stdout
Initial working directory
should_transfer_files = <YES | NO | IF_NEEDED>. NO disables the Condor file transfer mechanism
when_to_transfer_output = <ON_EXIT | ON_EXIT_OR_EVICT>
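A submit description file is a short text file of `name = value` lines ending in a `queue` statement. As a minimal sketch, the Python helper below renders one from a dictionary covering the items listed above. The program name, arguments, directory, and file names are hypothetical, and the choice of the vanilla universe is an assumption for the example.

```python
def render_submit_file(settings, queue_count=1):
    """Render 'name = value' lines followed by a queue statement."""
    lines = [f"{name} = {value}" for name, value in settings.items()]
    lines.append(f"queue {queue_count}" if queue_count != 1 else "queue")
    return "\n".join(lines) + "\n"

# Hypothetical job: the program name, arguments, and file names are made up.
job = {
    "universe": "vanilla",
    "executable": "analyse",
    "arguments": "run42.dat",
    "input": "analyse.in",       # stdin
    "output": "analyse.out",     # stdout
    "error": "analyse.err",      # stderr
    "log": "analyse.log",
    "initialdir": "run42",       # initial working directory
    "should_transfer_files": "IF_NEEDED",
    "when_to_transfer_output": "ON_EXIT",
}

print(render_submit_file(job), end="")
```

The rendered text can be saved to a file and passed to condor_submit.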
Cluster Monitoring & Job Throwing : Ganglia
Ganglia is a scalable distributed monitoring system for high-performance
computing systems.
Relies on a multicast-based listen/announce protocol to monitor state.
Very low per-node overheads and high concurrency.
It uses
XML for data representation
XDR for compact, portable data transport,
RRDtool for data storage and visualization.
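Because Ganglia represents its data as XML, a report can be read with any XML parser. The sketch below parses a small hand-written sample with Python's standard library. The element and attribute names (`HOST`, `METRIC`, `NAME`, `VAL`) are assumed to follow the general shape of a gmond report, and the host name, address, and values are invented; real reports carry more attributes.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed shape of a gmond report (values invented).
SAMPLE = """
<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
  <CLUSTER NAME="example">
    <HOST NAME="node001" IP="192.168.0.101">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""

def metrics_by_host(xml_text):
    """Map each host name to a {metric name: value string} dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        host.get("NAME"): {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        for host in root.iter("HOST")
    }

print(metrics_by_host(SAMPLE))
```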
Cluster Monitoring & Job Throwing : Ganglia
Ganglia Monitoring Daemon (gmond)
gmond is a multi-threaded daemon.
It runs on each cluster node that we want to monitor.
Ganglia Meta Daemon (gmetad)
Start it only on the management node.
Ganglia PHP Web Front-end
Displays Ganglia data in a meaningful way.
A new era of Internet use has started:
So far we have used the Internet / Web as an information / knowledge base.
Now we can use HTTP for computing as well.
Open the page, select an executable file, and submit it.
The file will execute on a cluster client node.
Cluster → Grid
With EDG Grid connectivity: AliEn, EGEE, gLite, LCG-2 ???
To become part of global monitoring: MonALISA, Lemon.
VECC Cluster Machine status
One interactive node: at this time we have only one interactive node; we will procure more in the near future.
# ssh interactive001
Other computing nodes: there are 6 computing nodes (node001 to node006).
One cannot log in to these nodes; they only run compute jobs.
They can be used for computing in batch mode, not in interactive mode.
Where We Have Landed Now
PC – Post Card
PC – Personal Computer
PC – Packed Cluster
Future Work
C++ and MPI (Message Passing Interface) will be the future for clusters.
For optimum use of the cluster, users have to learn MPI.
Questions?