Introduction to the New High Performance Facility at Sick Kids
Service provided by
The Centre for Computational Medicine
Agenda
• The HPC resources upgrade process
• New HPF architecture
• Description of the nodes
• How to submit a single job
• How to submit an array job
• How to check the status of jobs
• Known issues
Old HPF Architecture 2006 – 2014
Update of the HPC Resources
• The process started in fall 2010 (RFI, RFQ and performance testing).
• HSC purchased the HPC infrastructure through RFP 418.
• The RFP requirements were:
  – An HPC cluster formed by:
    1- Compute nodes
    2- High-performance storage
    3- Fast network connections
  – Mandatory performance for the HPC system:
    – Storage: 80 Gb/s steady throughput from the compute nodes to the storage.
    – Compute:
      – Pass an acceptance test using a DNA sequencing pipeline (bfast). Benchmark criteria: time, results and power used per rack during the test.
      – Iozone test of the local hard drive (I/O to disk).
      – Network redundancy test: plug and unplug cables between all of the switches.
Proposed Solution
• RFP 418 award winner: Scalar Decisions Inc.
• Infrastructure:
  – 10,000 compute threads (246 nodes)
  – Network: Infiniband (IPoIB) for data and 10/1 Gb/s for management
  – Storage: 2.3 PB usable on Isilon systems (21 nodes)
Components of the proposed solution (from the architecture diagram):
• Resource manager/scheduler: Moab/Torque
• Authentication system: LDAP
• Image management and OS provisioning: Bright Computing
• Monitoring system
• Compute resources: SGI
• Network: Mellanox
• Storage: Isilon
Illustration of the Benchmark Results
[Charts: compute nodes to storage throughput, power per PDU in a rack, and Iozone test writing to the local hard drive]
PGCRL Infrastructure: Current Status
carbon.research.sickkids.ca
Ancillary Nodes Description
• 4x Login nodes
  – Cluster access
  – hpf.ccm.sickkids.ca: hpf23, hpf24, hpf25 and hpf26.ccm.sickkids.ca
  – Small nodes with /home read/write and /projects read-only
  – No tools directory
  – Operating system: CentOS 6.5
• 2x Data transfer nodes
  – Large data transfers into or out of the cluster (see the transfer sketch after this list)
  – data.ccm.sickkids.ca: data1 and data2.ccm.sickkids.ca
  – Small nodes with read/write on /projects and /home
  – Only secure shell protocols are allowed
  – Directly connected to the Isilon storage
  – Operating system: CentOS 6.5
• 4x Qlogin nodes (working or interactive nodes)
  – Working nodes for job submission and interactive sessions
  – qlogin1, qlogin2, qlogin3 and qlogin4
  – 96 GB RAM, 900 GB /localhd, 32 compute threads per node
  – Small nodes with read/write on /projects and /home
  – Operating system: CentOS 6.4
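Only ssh-based protocols are allowed on the data transfer nodes, so large transfers are normally done with rsync or scp over ssh. A minimal sketch (the project path and file names below are hypothetical):

# From your workstation, push a dataset into project space through a data node
rsync -av --progress ./my_dataset/ jgonza03@data.ccm.sickkids.ca:/hpf/projects/my_project/my_dataset/
# Or copy a single file with scp
scp results.tar.gz jgonza03@data.ccm.sickkids.ca:/home/jgonza03/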
Compute Nodes Description
• 100x Sandy Bridge nodes (rack 1 + 28 additional nodes)
  – Dual-socket Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
  – 32 threads per node
  – 120 GB RAM
  – 750 GB of /localhd
  – Infiniband (IPoIB) to the storage
  – Operating system: CentOS 6.4
• 3 racks of Ivy Bridge nodes (racks 2-4)
  – Dual-socket Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
  – 40 threads per node
  – 120 GB RAM
  – 750 GB of /localhd
  – Infiniband (IPoIB) to the storage
  – Operating system: CentOS 6.4
• 2x Large memory nodes
  – Dual-socket Intel(R) Xeon(R) CPU E5-4650L 0 @ 2.60GHz
  – 64 threads per node
  – 512 GB RAM
  – 1.6 TB of /localhd
  – Infiniband (IPoIB) to the storage
  – Operating system: CentOS 6.4
Beta Testers Period Stats
Period dedicated to testing and optimizing the computer cluster and storage.
20 users from different groups.

New HPF statistics   Moab 7.2.6/Torque 4.2.6 (90 days)   Moab 8/Torque 5 (25 days)
Total jobs           235059                              320945
Total CPU hours      176988                              190056
Total wall time      111556.5                            164527
How to Submit a Job
• Adaptive Computing software: Moab and Torque
• Commands: qlogin and qsub
Login Node
localhost:ssh_newhpf jorge gonzalez-outei$ ssh -A hpf.ccm.sickkids.ca
[root@hpf23 ~]# df -kh
Filesystem                                        Size  Used Avail Use% Mounted on
/dev/vda1                                          19G  6.7G   12G  38% /
none                                              303M     0  303M   0% /dev/shm
/dev/vda5                                          71G  181M   67G   1% /localhd
/dev/vda2                                         5.7G  664M  4.7G  13% /var
brightstorage:/export/cm/shared                    96G   46G   46G  50% /cm/shared
carbon.hpf.cluster:/ifs/CCM/home                  2.0P  1.7P  313T  85% /home
carbon.hpf.cluster:/ifs/CCM/os/login-image2/usr   2.0P  1.7P  313T  85% /usr

Note: No projects or tools directories on the login nodes. Use qlogin to reach a working node:
[jgonza03@hpf23 ~]$ qlogin
qsub: waiting for job 125636 to start
qsub: job 125636 ready
[jgonza03@qlogin2 ~]$
Qlogin Nodes: Interactive & Submit Nodes
[root@qlogin4 ~]# df -kh
Filesystem                                                     Size  Used Avail Use% Mounted on
/dev/vda1                                                       19G  7.4G   11G  42% /
none                                                            45G     0   45G   0% /dev/shm
/dev/vda3                                                      965G  200M  916G   1% /localhd
brightstorage:/export/cm/shared                                 96G   46G   46G  50% /cm/shared
carbon.hpf.cluster:/ifs/CCM/home                               2.0P  1.7P  313T  85% /home
carbon.hpf.cluster:/ifs/CCM/tools                              2.0P  1.7P  313T  85% /hpf/tools
carbon.hpf.cluster:/ifs/CCM/largeprojects                      2.0P  1.7P  313T  85% /hpf/largeprojects
carbon.hpf.cluster:/ifs/CCM/projects                           2.0P  1.7P  313T  85% /hpf/projects
carbon.hpf.cluster:/ifs/CCM/os/interactive-image2/usr          2.0P  1.7P  313T  85% /usr
carbon.hpf.cluster:/ifs/CCM/moabmaster8/torquefiles/job_logs   2.0P  1.7P  313T  85% /opt/torque_job_logs
carbon.hpf.cluster:/ifs/CCM/fs07                               2.0P  1.7P  313T  85% /hpf/fs07.original
carbon.hpf.cluster:/ifs/CCM/tcagstor.original                  2.0P  1.7P  313T  85% /hpf/tcagstor.original
carbon.hpf.cluster:/ifs/CCM/tcagstor.original.1                2.0P  1.7P  313T  85% /hpf/tcagstor.original.1
carbon.hpf.cluster:/ifs/CCM/tcagstor.original.2                2.0P  1.7P  313T  85% /hpf/tcagstor.original.2
carbon.hpf.cluster:/ifs/CCM/tcagstor.original.3                2.0P  1.7P  313T  85% /hpf/tcagstor.original.3

Note: The home (/home), tools (/hpf/tools) and projects (/hpf/projects, /hpf/largeprojects) directories are all mounted on the qlogin nodes.
Data Nodes: Data Transfer Nodes
[root@data2 ~]# df -kh
Filesystem                                               Size  Used Avail Use% Mounted on
/dev/vda1                                                 19G  2.1G   16G  12% /
none                                                      15G     0   15G   0% /dev/shm
/dev/vda6                                                 61G  180M   58G   1% /localhd
/dev/vda3                                                1.9G   35M  1.8G   2% /tmp
/dev/vda2                                                5.7G  717M  4.7G  14% /var
brightstorage:/export/cm/shared                           96G   46G   46G  50% /cm/shared
carbon.hpf.cluster:/ifs/CCM/os/data-transfer-image/usr   2.0P  1.7P  313T  85% /usr
carbon.hpf.cluster:/ifs/CCM/home                         2.0P  1.7P  313T  85% /home
carbon.hpf.cluster:/ifs/CCM/largeprojects                2.0P  1.7P  313T  85% /hpf/largeprojects
carbon.hpf.cluster:/ifs/CCM/projects                     2.0P  1.7P  313T  85% /hpf/projects
carbon.hpf.cluster:/ifs/CCM/fs07                         2.0P  1.7P  313T  85% /hpf/fs07.original
carbon.hpf.cluster:/ifs/CCM/tcagstor.original            2.0P  1.7P  313T  85% /hpf/tcagstor.original
carbon.hpf.cluster:/ifs/CCM/tcagstor.original.1          2.0P  1.7P  313T  85% /hpf/tcagstor.original.1
carbon.hpf.cluster:/ifs/CCM/tcagstor.original.2          2.0P  1.7P  313T  85% /hpf/tcagstor.original.2
carbon.hpf.cluster:/ifs/CCM/tcagstor.original.3          2.0P  1.7P  313T  85% /hpf/tcagstor.original.3

Note: The home (/home) and projects (/hpf/projects, /hpf/largeprojects) directories are mounted on the data nodes, but there is no tools directory.
Compute Node
[root@r2b-5 ~]# df -kh
Filesystem                                                Size  Used Avail Use% Mounted on
rootfs                                                    114G  3.9G  110G   4% /
tmpfs                                                     114G  3.9G  110G   4% /
none                                                       64G  156K   64G   1% /dev
/dev/sda1                                                 826G  201M  784G   1% /localhd
none                                                      1.3G     0  1.3G   0% /dev/shm
carbon.hpf.cluster:/ifs/CCM/os/clone-standard-image/usr   2.0P   69T  2.0P   4% /usr
brightstorage:/export/cm/shared                            96G   45G   47G  50% /cm/shared
carbon.hpf.cluster:/ifs/CCM/home                          2.0P   69T  2.0P   4% /home
carbon.hpf.cluster:/ifs/CCM/largeprojects                 2.0P   69T  2.0P   4% /hpf/largeprojects
carbon.hpf.cluster:/ifs/CCM/projects                      2.0P   69T  2.0P   4% /hpf/projects
carbon.hpf.cluster:/ifs/CCM/tools                         2.0P   69T  2.0P   4% /hpf/tools

Notes: Each job gets its own local scratch space on /localhd (TMPDIR=/localhd/$PBS_JOBID). The home and projects directories are mounted read/write; the tools directory is read-only.
Modules
There are two ways to load the executables and libraries needed to run your jobs:
1- Maintain long $PATH and $LD_LIBRARY_PATH settings in your .bashrc file.
2- Use modules.
List the available modules with: [jgonza03@qlogin4 demo]$ module avail
In your scripts, add: module load <package name> [<package name> ...]
For example:
[jgonza03@qlogin4 demo]$ gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[jgonza03@qlogin4 demo]$ module load gcc/4.9.1
[jgonza03@qlogin4 demo]$ gcc --version
gcc (GCC) 4.9.1
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[jgonza03@qlogin4 demo]$
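A small sketch of how a module load could be used inside a submission script instead of interactively (the job name, resource requests and demo directory are illustrative; gcc/4.9.1 is the module from the example above):

#!/bin/bash
#PBS -N module_demo
#PBS -l nodes=1:ppn=1,vmem=2g,walltime=00:05:00
# Load the compiler through the modules system instead of hard-coding paths in .bashrc
module load gcc/4.9.1
cd /home/jgonza03/demo
gcc --version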
Torque and Moab Commands

Command                 Description
qlogin (options)        Torque command to access an interactive node
qsub (options)          Torque command to submit jobs
qstat -t (options)      Torque command to check job status
showq (options)         Moab command to check job status
qstat -f JOBID          Torque command to check details about a job
checkjob -vvv JOBID     Moab command to check details about a job
qdel JOBID              Torque command to remove jobs
canceljob JOBID         Moab command to remove jobs
showjobs (options)      CCM command to check details of finished jobs
showq -c                Moab command to check details on jobs that finished within 1-3 days
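As a quick illustration of the status and removal commands (the job ID is simply the one from the single-job example later in this document):

[jgonza03@qlogin4 demo]$ qstat -f 125637
[jgonza03@qlogin4 demo]$ checkjob -vvv 125637
[jgonza03@qlogin4 demo]$ qdel 125637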
Parameters to Consider When Submitting a Job
REQUEST THE RESOURCES YOU NEED!
QUEUES ARE AUTOMATICALLY ASSIGNED!
Interesting parameters:
• #PBS -l nodes=1:ppn=X       Number of processors per job (default 1 processor). Max 40 per job!
• #PBS -l gres=localhd:XX     Amount of local space required (default 10 GB). Max 700 GB per job.
• #PBS -l vmem=Xg             Amount of virtual memory per job (total, equivalent to h_vmem).
                              Regular jobs max 124 GB; large memory jobs max 510 GB (default 2 GB).
• #PBS -l walltime=HR:MIN:SC  Walltime of your calculation (default 24 hours).
• #PBS -joe /path/to/directory/output/   "joe" joins output/error in the indicated directory.
                              The default is the working directory from which you submitted your job.
Submit a single job with the command: qsub script.sh or qsub [options] script.sh
[jgonza03@qlogin2 demo]$ qsub tutorial.sh
125637
mem: amount of physical memory used by the job.
vmem: amount of virtual memory used by all concurrent processes in the job.
Example Scripts: Things to Remember!
[jgonza03@qlogin2 demo]$ cat tutorial.sh
#!/bin/bash -x
#PBS -N tutorial
#PBS -l nodes=1:ppn=1
#PBS -l gres=localhd:20
#PBS -l vmem=4g
#PBS -l walltime=00:01:30
#PBS -joe /home/jgonza03/demo/output/
cd /home/jgonza03/demo/
ls -ld /localhd/*
export TMPDIR=/localhd/$PBS_JOBID
echo step1
touch $TMPDIR/jorgetest.txt
echo "JORGE TESTING" > $TMPDIR/jorgetest.txt
cat $TMPDIR/jorgetest.txt
ulimit -a

In tutorial.sh, -N sets the job name; -l nodes=1:ppn=1 the number of processors; -l gres=localhd:20 the amount of storage on the node's /localhd; -l vmem=4g the amount of memory (similar to h_vmem); -l walltime the length of your calculation; and -joe the output/error directory.
vmem: amount of virtual memory used by all concurrent processes in the job.
Request Resources and No Queues!
[jgonza03@qlogin2 demo]$ qsub -l nodes=1:ppn=2,walltime=48:00:00 tutorial.sh
125639
[jgonza03@qlogin2 demo]$ qsub -l nodes=1:ppn=2,walltime=8:00:00 tutorial.sh
125640
[jgonza03@qlogin2 demo]$ qsub -l walltime=03:00:00 tutorial.sh
125641
[jgonza03@qlogin2 demo]$ qsub -l walltime=00:05:00 tutorial.sh
125642
[jgonza03@qlogin2 demo]$ qsub -l walltime=30:05:00 tutorial.sh
125643
[jgonza03@qlogin2 demo]$ qsub -l nodes=1:ppn=2 tutorial.sh
125644
[jgonza03@qlogin2 demo]$ qsub tutorial.sh
125645
[jgonza03@qlogin2 demo]$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
125636                    STDIN            jgonza03        00:00:00 R qloginQ
125639                    tutorial         jgonza03        0        Q parallel_long
125640                    tutorial         jgonza03        0        Q parallel
125641                    tutorial         jgonza03        0        Q all
125642                    tutorial         jgonza03        0        Q short
125643                    tutorial         jgonza03        0        Q long
125644                    tutorial         jgonza03        0        Q parallel
125645                    tutorial         jgonza03        0        Q short
Example of Output
The tutorial.sh script body:
cd /home/jgonza03/demo/
ls -ld /localhd/*
export TMPDIR=/localhd/$PBS_JOBID
echo step1
touch $TMPDIR/jorgetest.txt
echo "JORGE TESTING" > $TMPDIR/jorgetest.txt
cat $TMPDIR/jorgetest.txt
ulimit -a
[jgonza03@qlogin2 demo]$ more tutorial.o125637
drwx------ 2 jgonza03 root 4096 Sep 9 20:18 /localhd/125637
drwx------ 2 root root 16384 Sep 9 11:06 /localhd/lost+found
drwxrwxrwt 2 root root 4096 Sep 9 11:10 /localhd/scratch
drwxrwxrwt 3 root root 4096 Sep 9 12:49 /localhd/tmp
step1
JORGE TESTING
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) 2097152
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1032600
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) 2097152
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1032600
virtual memory          (kbytes, -v) 4194304
file locks                      (-x) unlimited
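Note the correspondence with the resource request: tutorial.sh asked for -l vmem=4g, and the virtual memory limit reported above is 4194304 kB = 4 x 1024 x 1024 kB = 4 GB.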
Check Status of Finished Jobs
showjobs: command available on the qlogin nodes.
[jgonza03@qlogin2 ~]$ showjobs --help
Usage:
showjobs [-u *user_name*] [-g *group_name*] [-a *account_name*] [-q
*queue_name*] [-s *start date*] [-e *end date*] [-n *days*]
[-o|--oneonly] [--help] [--man] [[-j] <job id>]
[jgonza03@qlogin2 ~]$ showjobs -u jgonza03 -s 2014-08-27
Job Id            : 114760.moab8master.hpf.cluster
Job Name          : STDIN
Output File       : /dev/pts/2
Error File        : /dev/pts/2
Working Directory : /home/jgonza03
Home Directory    : /home/jgonza03
Submit Arguments  : -I -q qloginQ -l os=interactive2,hostlist=qlogin[1-4]
User Name         : jgonza03
Group Name        : ccm
Queue Name        : qloginQ
Wallclock Duration: 00:17:48
CPUTime           : 00:00:04
Memory Used       : 17044
Memory Limit      : 1
vmem Used         : 365632
Submit Time       : Wed Aug 27 09:41:32 2014
Start Time        : Wed Aug 27 09:41:32 2014
End Time          : Wed Aug 27 09:59:20 2014
Exit Code         : 265
Master Host       : qlogin3
Interactive       : True
--------------------------------------------------------------------------------
...
See next slide.
mem: amount of physical memory used by the job.
vmem: amount of virtual memory used by all concurrent processes in the job.
Exit Status
Once a job under TORQUE is complete, the exit status attribute will contain the result code
returned by the job script. If TORQUE was unable to start the job, this field will contain a
negative number produced by the pbs_mom. Otherwise, if the job script was successfully started,
the value in this field will be the return value of the script. TORQUE-supplied exit codes are
256 + the error code.
The C routine exit() passes only the low-order byte (mod 256) of its argument. In this case,
256 + 11 is really 267, but the resulting exit code is only 11, as seen in the output.
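A quick, generic shell check of this low-order-byte behaviour (not HPF-specific):

[jgonza03@qlogin4 demo]$ bash -c 'exit 267'; echo $?
11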
Array Jobs
[jgonza03@qlogin4 demo]$ qsub -t 1-200 -l vmem=4g,walltime=00:02:30 array.sh
125647[]
[jgonza03@qlogin4 demo]$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
125636                    STDIN            jgonza03        00:00:00 R qloginQ
125647[]                  array.sh         jgonza03        0        Q short

[jgonza03@qlogin4 demo]$ qstat -t
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
125636                    STDIN            jgonza03        00:00:00 R qloginQ
125647[1]                 array.sh-1       jgonza03        0        Q short
125647[2]                 array.sh-2       jgonza03        0        Q short
125647[3]                 array.sh-3       jgonza03        0        Q short
125647[4]                 array.sh-4       jgonza03        0        Q short
125647[5]                 array.sh-5       jgonza03        0        Q short
...

[jgonza03@qlogin4 demo]$ showq -s -v
active jobs: 201  eligible jobs: 0  blocked jobs: 0
Total jobs: 201
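The contents of array.sh are not shown in the slides; a minimal sketch of what such a script could look like (the per-task work is purely illustrative), using Torque's $PBS_ARRAYID to pick the task for each array element:

#!/bin/bash
#PBS -N array.sh
#PBS -l nodes=1:ppn=1
cd /home/jgonza03/demo
# $PBS_ARRAYID holds this element's index (1-200 for "qsub -t 1-200")
echo "Processing chunk $PBS_ARRAYID"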
Job with Dependencies
Parameter                      Application
afterok:jobid[:jobid...]       This job may be scheduled for execution only after jobs jobid have
                               terminated with no errors. See the csh warning under Extended
                               description.
afteranyarray:arrayid[count]   This job may be scheduled for execution after jobs in arrayid have
                               terminated, with or without errors.
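The submission commands are not shown on the slide; a plausible sketch that would produce the held array seen in the qstat output below (job IDs taken from that output; -W depend is standard Torque syntax):

[jgonza03@qlogin4 demo]$ qsub tutorial.sh
125648
[jgonza03@qlogin4 demo]$ qsub -t 1-20 -W depend=afterok:125648 array.sh
125649[]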
[jgonza03@qlogin4 demo]$ qstat -t
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
125636                    STDIN            jgonza03        00:00:00 R qloginQ
125648                    tutorial         jgonza03        0        Q short
125649[1]                 array.sh-1       jgonza03        0        H all
125649[2]                 array.sh-2       jgonza03        0        H all
125649[3]                 array.sh-3       jgonza03        0        H all
...
125649[19]                array.sh-19      jgonza03        0        H all
125649[20]                array.sh-20      jgonza03        0        H all
Known Issues
Scheduler issues:
• On occasion, the qstat command without options reports the wrong status for jobs.
We are working with Adaptive Computing to develop fixes for these software bugs.
Other Projects
RIT is working to deploy a tape archive system. The project is scheduled to go live during fall 2014.
We are also working with RIT on a project that will make the new cluster available to external users:
1- Collaborators from other institutions
2- People without a Sick Kids Windows domain account
Comments and Questions?
Please send comments, suggestions and requests to [email protected] and cc [email protected],
indicating in the subject that the request is for HPF.
Thank you!
The Centre for Computational Medicine
HPC team
Len Zaifman
Goran Marik
Jorge Gonzalez-Outeirino
Florin Stingaciu
Yuki Saito
Brian Phan
12th Floor, Room 129830