HEP Computing Status
Sheffield University
Matt Robinson
Paul Hodgson
Andrew Beresford
Interactive Cluster
• 30 self-built Linux boxes
• AMD Athlon XP CPUs, 256/512 MB RAM
• OS: Scientific Linux 303
• 100 Mbit network
• Use NIS for authentication, NFS mount /home etc. (sketch below)
• System install using kickstart + post-install scripts
• Separate backup machine
• 15 laptops, mostly dual boot
• Some Macs and one Windows box
• 3 disk servers mounted as /data1, /data2 etc. (few TB)
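The NIS/NFS setup is only named in the slides, not spelled out. Below is a minimal sketch of how a post-install step might add the /home and /dataN mounts to a fresh node; the server hostnames ("homeserver", "diskserv1".."diskserv3") are hypothetical placeholders that do not appear in the report.

#!/usr/bin/env python
"""Add NFS mounts for /home and the /dataN disk servers to a new node."""
# Sketch only: the server hostnames below are hypothetical placeholders.
import os
import subprocess

MOUNTS = [("homeserver:/home", "/home")] + [
    ("diskserv%d:/data%d" % (i, i), "/data%d" % i) for i in (1, 2, 3)
]

def add_mounts(fstab="/etc/fstab"):
    existing = open(fstab).read()
    out = open(fstab, "a")
    for export, mountpoint in MOUNTS:
        if export in existing:
            continue                          # already configured
        if not os.path.isdir(mountpoint):
            os.makedirs(mountpoint)
        out.write("%s %s nfs rw,hard,intr 0 0\n" % (export, mountpoint))
    out.close()
    subprocess.call(["mount", "-a", "-t", "nfs"])   # mount everything now

if __name__ == "__main__":
    add_mounts()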
Batch Cluster
• 100-CPU farm, Athlon XP 2400/2800
• OS: Scientific Linux 303
• NFS-mounted /home and /data
• OpenPBS batch system for job submission (sketch below)
• Gigabit backbone with 100 Mbit to worker nodes
• Disk server provides 1.3 TB as /data (RAID 5)
• Entire cluster assembled in-house from OEM components for less than 50k
• The hard part was finding an air-conditioned room with sufficient power
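As a concrete illustration of the OpenPBS submission path (not taken from the slides), here is a small sketch that writes a PBS job script and hands it to qsub; the queue name and the command being run are illustrative.

#!/usr/bin/env python
"""Write a PBS job script and submit it with qsub (OpenPBS style)."""
# Sketch only: the queue name and the analysis command are illustrative.
import subprocess
import tempfile

JOB_TEMPLATE = """#!/bin/sh
#PBS -N %(name)s
#PBS -q %(queue)s
#PBS -l nodes=1
#PBS -j oe
cd $PBS_O_WORKDIR
%(command)s
"""

def submit(name, command, queue="short"):
    script = JOB_TEMPLATE % {"name": name, "queue": queue, "command": command}
    f = tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False)
    f.write(script)
    f.close()
    # qsub prints the new job id (e.g. "1234.headnode") on success
    return subprocess.check_output(["qsub", f.name]).decode().strip()

if __name__ == "__main__":
    print(submit("test-job", "./run_analysis.sh"))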
Cluster Usage
Software
• PAW, CERNLIB etc.
• Geant4
• ROOT
• ATLAS 10.0.1
• FLUKA
• ANSYS, LS-DYNA
Comments - Issues
• Have tightened up security in the last year
• Strict firewall policy, limited machine exemptions
• Blocking scripts prevent ssh access after 3 authentication failures within 1 hour (sketch below)
• Cheap disks allow construction of large disk arrays
• Very happy with SL3 for desktop machines
• Use FC3 for laptops (2.6 kernel)
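The blocking scripts themselves are not included in the slides. The sketch below illustrates the idea only, assuming sshd logs to /var/log/secure and offenders are dropped with an iptables rule; both are assumptions, not details from the report.

#!/usr/bin/env python
"""Block hosts with 3+ ssh authentication failures in the last hour."""
# Sketch of the idea only; the real Sheffield scripts are not shown here.
# Assumes sshd logs to /var/log/secure and blocking is done via iptables.
import re
import subprocess
import time
from collections import defaultdict

LOG = "/var/log/secure"
THRESHOLD = 3        # failures allowed ...
WINDOW = 3600        # ... within one hour
FAIL_RE = re.compile(r"Failed password for .* from (\d+\.\d+\.\d+\.\d+)")

def parse_ts(line):
    """Turn a syslog timestamp ('Mar  7 14:22:01 ...') into epoch seconds."""
    ts = time.strptime(line[:15], "%b %d %H:%M:%S")
    year = time.localtime().tm_year              # syslog omits the year
    return time.mktime((year,) + ts[1:6] + (0, 0, -1))

def block(ip):
    """Drop ssh traffic from this address (a real script would also keep a
    list of already-blocked IPs to avoid inserting duplicate rules)."""
    subprocess.call(["iptables", "-I", "INPUT", "-s", ip,
                     "-p", "tcp", "--dport", "22", "-j", "DROP"])

def main():
    counts = defaultdict(int)
    cutoff = time.time() - WINDOW
    for line in open(LOG):
        m = FAIL_RE.search(line)
        if m and parse_ts(line) >= cutoff:
            counts[m.group(1)] += 1
    for ip, n in counts.items():
        if n >= THRESHOLD:
            block(ip)

if __name__ == "__main__":
    main()        # intended to be run every few minutes from cron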
The Sheffield LCG Cluster
Division of Hardware
• 162 x AMD Opteron 250 (2.4 GHz)
• 4 GB RAM/box (2 GB/CPU)
• 72 GB U320 10K RPM local SCSI disk
• Currently running 32-bit SL303 for maximum compatibility with the grid
• ~2.5 TB storage for experiments
• Middleware: 2.4.0
• Probably the most purple cluster in the grid
Looking Sinister
Status
Usage so far
• We can take quite a bit more.
Monitoring
• Ganglia with a modified web frontend to present queue information (sketch below)
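The modified web frontend itself is not reproduced in the slides. As a complementary sketch only, the script below feeds per-queue job counts into Ganglia, assuming the default 'qstat -q' column layout and the standard gmetric tool; this is an assumption, not the actual Sheffield modification.

#!/usr/bin/env python
"""Publish per-queue running/queued job counts from PBS into Ganglia."""
# Sketch only: assumes the default 'qstat -q' column layout and the standard
# Ganglia 'gmetric' tool; the actual Sheffield change was to the web frontend.
import subprocess

def queue_counts():
    """Parse 'qstat -q' into {queue_name: (running, queued)}."""
    out = subprocess.check_output(["qstat", "-q"]).decode()
    counts = {}
    for line in out.splitlines():
        f = line.split()
        # Data rows: name Memory CPUTime Walltime Node Run Que Lm State
        if len(f) >= 9 and f[5].isdigit() and f[6].isdigit():
            counts[f[0]] = (int(f[5]), int(f[6]))
    return counts

def publish(counts):
    """Push each count to gmond as a named metric via gmetric."""
    for queue, (running, queued) in counts.items():
        for label, value in (("running", running), ("queued", queued)):
            subprocess.call(["gmetric",
                             "--name=pbs_%s_%s" % (queue, label),
                             "--value=%d" % value,
                             "--type=uint32",
                             "--units=jobs"])

if __name__ == "__main__":
    publish(queue_counts())   # run periodically, e.g. from cron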
Installation
• Service nodes connected to VPN and Internet
• PXE installation via VPN allows complete control of dhcpd and named (sketch below)
• RedHat kickstart + post-install script
• ssh servers not exposed
• R-GMA always the hardest part
• Stumbled across routing rules
• WN install takes about 30 minutes, can do up to 40 simultaneously
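None of the dhcpd/named configuration is shown in the slides. As an illustration of the "complete control of dhcpd" point, here is a sketch that generates PXE host stanzas for dhcpd.conf from a node list; the node-file format, paths and install-server address are hypothetical.

#!/usr/bin/env python
"""Generate dhcpd.conf host stanzas that PXE-boot worker nodes into kickstart."""
# Sketch only: the node list format, file paths and the TFTP/install server
# address are illustrative, not taken from the report.

HOST_STANZA = """host %(name)s {
  hardware ethernet %(mac)s;
  fixed-address %(ip)s;
  next-server 10.0.0.1;          # hypothetical TFTP/install server
  filename "pxelinux.0";         # standard PXELINUX boot loader
}
"""

def generate(node_file="nodes.txt"):
    """Read 'name mac ip' lines and emit one dhcpd host block per worker node."""
    stanzas = []
    for line in open(node_file):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, mac, ip = line.split()
        stanzas.append(HOST_STANZA % {"name": name, "mac": mac, "ip": ip})
    return "\n".join(stanzas)

if __name__ == "__main__":
    # Append the generated stanzas to the dhcpd config served over the VPN.
    print(generate())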
Matt Robinson:
Future plans
• Keep up with middleware updates
• Increase available storage as required in ~3-4 TB steps
• Fix things that break
• Try not to mess anything up by screwing around
• Look toward operating with a 64-bit OS