Introduction to the New High Performance Facility at Sick Kids
A service provided by The Centre for Computational Medicine

Agenda
• The HPC resources upgrade process.
• New HPF architecture.
• Description of the nodes.
• How to submit a single job.
• How to submit an array job.
• How to check the status of jobs.
• Known issues.

Old HPF Architecture: 2006–2014

Update of HPC Resources
• The process started in fall 2010 (RFI, RFQ and performance testing).
• HSC purchased the HPC infrastructure through RFP 418.
• The RFP required an HPC cluster formed by:
 1- Compute nodes
 2- High-performance storage
 3- Fast network connections
• Mandatory performance for the HPC system:
 – Storage: 80 Gb/s steady throughput from the compute nodes to the storage.
 – Compute:
   - Pass an acceptance test using a DNA sequencing pipeline (bfast); benchmark criteria were time, results and power used per rack during the test.
   - IOzone test of the local hard drive (I/O to disk).
   - Network redundancy test: plug and unplug cables between all of the switches.

Proposed Solution
• RFP 418 award winner: Scalar Decisions Inc.
• Infrastructure:
 – 10,000 compute threads (246 nodes)
 – Network: InfiniBand (IPoIB) for data and 10/1 Gb/s for management
 – Storage: 2.3 PB usable Isilon system (21 nodes)
• Components:
 – Resource manager/scheduler: Moab/Torque
 – Authentication system: LDAP
 – Image management / OS provisioning: Bright Computing
 – Compute resources: SGI
 – Network: Mellanox
 – Storage: Isilon
 – Monitoring system

[Figures: illustration of the benchmark results — compute nodes to storage throughput, power per PDU in a rack, and the IOzone test writing to the local hard drive.]

PGCRL Infrastructure: Current Status
carbon.research.sickkids.ca

Ancillary Nodes Description
• 4x Login nodes
 – Cluster access
 – hpf.ccm.sickkids.ca: hpf23, hpf24, hpf25 and hpf26.ccm.sickkids.ca
 – Small nodes with /home read/write and /projects read-only
 – No tools directory
 – Operating system CentOS 6.5
• 2x Data transfer nodes
 – Large data transfers into or out of the cluster
 – data.ccm.sickkids.ca: data1 and data2.ccm.sickkids.ca
 – Small nodes with read/write on /projects and /home
 – Secure shell protocols ONLY
 – Directly connected to the Isilon storage
 – Operating system CentOS 6.5
• 4x Qlogin nodes (working or interactive nodes)
 – Working nodes for job submission and interactive sessions
 – qlogin1, qlogin2, qlogin3 and qlogin4
 – 96 GB RAM, 900 GB /localhd, 32 compute threads per node
 – Small nodes with read/write on /projects and /home
 – Operating system CentOS 6.4

Compute Nodes Description (a quick way to verify these specifications from a node's shell is sketched after the beta-test summary below)
• 100x Sandy Bridge nodes (rack 1 + 28 additional nodes)
 – Dual-socket Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
 – 32 threads per node
 – 120 GB RAM
 – 750 GB of /localhd
 – InfiniBand (IPoIB) to the storage
 – Operating system CentOS 6.4
• 3x Ivy Bridge nodes (racks 2-4)
 – Dual-socket Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 – 40 threads per node
 – 120 GB RAM
 – 750 GB of /localhd
 – InfiniBand (IPoIB) to the storage
 – Operating system CentOS 6.4
• 2x Large-memory nodes
 – Dual-socket Intel(R) Xeon(R) CPU E5-4650L 0 @ 2.60GHz
 – 64 threads per node
 – 512 GB RAM
 – 1.6 TB of /localhd
 – InfiniBand (IPoIB) to the storage
 – Operating system CentOS 6.4

Beta Testers Period Stats
A period dedicated to testing and optimizing the compute cluster and storage, with 20 users from different groups.
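Before relying on a particular node type, it can be useful to confirm its resources from the shell. A minimal sketch using standard Linux commands; the figures reported will differ between the Sandy Bridge, Ivy Bridge and large-memory nodes described above:

  nproc                      # number of compute threads on this node
  free -g                    # installed RAM in GB
  df -h /localhd             # size of the node's local scratch disk
  cat /etc/centos-release    # operating system version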
New HPF Statistics

                   Moab 7.2.6 / Torque 4.2.6 (90 days)   Moab 8 / Torque 5 (25 days)
 Total jobs         235,059                               320,945
 Total CPU hours    176,988                               190,056
 Total wall time    111,556.5                             164,527

How to Submit a Job
• Adaptive Computing software: Moab and Torque
• Commands: qlogin and qsub

Login Node
localhost:ssh_newhpf jorge gonzalez-outei$ ssh -A hpf.ccm.sickkids.ca
[root@hpf23 ~]# df -kh
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 19G 6.7G 12G 38% /
none 303M 0 303M 0% /dev/shm
/dev/vda5 71G 181M 67G 1% /localhd
/dev/vda2 5.7G 664M 4.7G 13% /var
brightstorage:/export/cm/shared 96G 46G 46G 50% /cm/shared
carbon.hpf.cluster:/ifs/CCM/home 2.0P 1.7P 313T 85% /home
carbon.hpf.cluster:/ifs/CCM/os/login-image2/usr 2.0P 1.7P 313T 85% /usr
Note: no project or tools directories on the login nodes.

qlogin
[jgonza03@hpf23 ~]$ qlogin
qsub: waiting for job 125636 to start
qsub: job 125636 ready
[jgonza03@qlogin2 ~]$

Qlogin Nodes: Interactive & Submit Nodes
[root@qlogin4 ~]# df -kh
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 19G 7.4G 11G 42% /
none 45G 0 45G 0% /dev/shm
/dev/vda3 965G 200M 916G 1% /localhd
brightstorage:/export/cm/shared 96G 46G 46G 50% /cm/shared
carbon.hpf.cluster:/ifs/CCM/home 2.0P 1.7P 313T 85% /home   (home directory)
carbon.hpf.cluster:/ifs/CCM/tools 2.0P 1.7P 313T 85% /hpf/tools   (tools directory)
carbon.hpf.cluster:/ifs/CCM/largeprojects 2.0P 1.7P 313T 85% /hpf/largeprojects   (projects directories)
carbon.hpf.cluster:/ifs/CCM/projects 2.0P 1.7P 313T 85% /hpf/projects   (projects directories)
carbon.hpf.cluster:/ifs/CCM/os/interactive-image2/usr 2.0P 1.7P 313T 85% /usr
carbon.hpf.cluster:/ifs/CCM/moabmaster8/torquefiles/job_logs 2.0P 1.7P 313T 85% /opt/torque_job_logs
carbon.hpf.cluster:/ifs/CCM/fs07 2.0P 1.7P 313T 85% /hpf/fs07.original
carbon.hpf.cluster:/ifs/CCM/tcagstor.original 2.0P 1.7P 313T 85% /hpf/tcagstor.original
carbon.hpf.cluster:/ifs/CCM/tcagstor.original.1 2.0P 1.7P 313T 85% /hpf/tcagstor.original.1
carbon.hpf.cluster:/ifs/CCM/tcagstor.original.2 2.0P 1.7P 313T 85% /hpf/tcagstor.original.2
carbon.hpf.cluster:/ifs/CCM/tcagstor.original.3 2.0P 1.7P 313T 85% /hpf/tcagstor.original.3

Data Nodes: Data Transfer Nodes
[root@data2 ~]# df -kh
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 19G 2.1G 16G 12% /
none 15G 0 15G 0% /dev/shm
/dev/vda6 61G 180M 58G 1% /localhd
/dev/vda3 1.9G 35M 1.8G 2% /tmp
/dev/vda2 5.7G 717M 4.7G 14% /var
brightstorage:/export/cm/shared 96G 46G 46G 50% /cm/shared
carbon.hpf.cluster:/ifs/CCM/os/data-transfer-image/usr 2.0P 1.7P 313T 85% /usr
carbon.hpf.cluster:/ifs/CCM/home 2.0P 1.7P 313T 85% /home   (home directory)
carbon.hpf.cluster:/ifs/CCM/largeprojects 2.0P 1.7P 313T 85% /hpf/largeprojects   (projects directories)
carbon.hpf.cluster:/ifs/CCM/projects 2.0P 1.7P 313T 85% /hpf/projects   (projects directories)
carbon.hpf.cluster:/ifs/CCM/fs07 2.0P 1.7P 313T 85% /hpf/fs07.original
carbon.hpf.cluster:/ifs/CCM/tcagstor.original 2.0P 1.7P 313T 85% /hpf/tcagstor.original
carbon.hpf.cluster:/ifs/CCM/tcagstor.original.1 2.0P 1.7P 313T 85% /hpf/tcagstor.original.1
carbon.hpf.cluster:/ifs/CCM/tcagstor.original.2 2.0P 1.7P 313T 85% /hpf/tcagstor.original.2
carbon.hpf.cluster:/ifs/CCM/tcagstor.original.3 2.0P 1.7P 313T 85% /hpf/tcagstor.original.3
Note: no tools directory on the data nodes.
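Large transfers should go through the data nodes, which are directly connected to the Isilon storage and accept secure shell protocols only. A minimal sketch using rsync and scp over SSH; the local paths and the destination project directory are hypothetical:

  # Copy a directory into a project space (destination path is a placeholder)
  rsync -av ./my_dataset/ [email protected]:/hpf/projects/mygroup/my_dataset/
  # Copy a single file back to the local machine
  scp [email protected]:/home/jgonza03/results.tar.gz .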
Compute Node
[root@r2b-5 ~]# df -kh
Filesystem Size Used Avail Use% Mounted on
rootfs 114G 3.9G 110G 4% /
tmpfs 114G 3.9G 110G 4% /
none 64G 156K 64G 1% /dev
/dev/sda1 826G 201M 784G 1% /localhd   (job's local space: TMPDIR=/localhd/$PBS_JOBID)
none 1.3G 0 1.3G 0% /dev/shm
carbon.hpf.cluster:/ifs/CCM/os/clone-standard-image/usr 2.0P 69T 2.0P 4% /usr
brightstorage:/export/cm/shared 96G 45G 47G 50% /cm/shared
carbon.hpf.cluster:/ifs/CCM/home 2.0P 69T 2.0P 4% /home   (home directory, read/write)
carbon.hpf.cluster:/ifs/CCM/largeprojects 2.0P 69T 2.0P 4% /hpf/largeprojects   (projects directories, read/write)
carbon.hpf.cluster:/ifs/CCM/projects 2.0P 69T 2.0P 4% /hpf/projects   (projects directories, read/write)
carbon.hpf.cluster:/ifs/CCM/tools 2.0P 69T 2.0P 4% /hpf/tools   (tools directory, read-only)

Modules
There are two ways to make the executables and libraries needed by your jobs available (a sketch of loading a module inside a job script follows the example script below):
 1- Long $PATH and $LD_LIBRARY_PATH entries in your .bashrc file.
 2- Modules.
List the available modules:
[jgonza03@qlogin4 demo]$ module avail
In your scripts, add: module load <package name>
For example:
[jgonza03@qlogin4 demo]$ gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[jgonza03@qlogin4 demo]$ module load gcc/4.9.1
[jgonza03@qlogin4 demo]$ gcc --version
gcc (GCC) 4.9.1
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[jgonza03@qlogin4 demo]$

Torque and Moab Commands
 Command                Description
 qlogin (options)       Torque command to access an interactive node
 qsub (options)         Torque command to submit jobs
 qstat -t (options)     Torque command to check job status
 showq (options)        Moab command to check job status
 qstat -f JOBID         Torque command to check details about a job
 checkjob -vvv JOBID    Moab command to check details about a job
 qdel JOBID             Torque command to remove jobs
 canceljob JOBID        Moab command to remove jobs
 showjobs (options)     CCM command to check details of finished jobs
 showq -c               Moab command to check details on jobs that finished within 1-3 days

Parameters to Consider When Submitting a Job
REQUEST THE RESOURCES YOU NEED! QUEUES ARE AUTOMATICALLY ASSIGNED!
Parameters of interest:
 #PBS -l nodes=1:ppn=X                  Number of processors per job (default 1 processor; maximum 40 per job).
 #PBS -l gres=localhd:XX                Amount of local disk space required (default 10 GB; maximum 700 GB per job).
 #PBS -l vmem=Xg                        Amount of virtual memory per job (total, equivalent to h_vmem). Default 2 GB; regular jobs up to 124 GB, large-memory jobs up to 510 GB.
 #PBS -l walltime=HR:MIN:SEC            Walltime of your calculation (default 24 hours).
 #PBS -joe /path/to/directory/output/   "joe" joins output and error in the indicated directory. The default is the working directory from which you submitted your job.
Submit a single job with the command: qsub script.sh or qsub [options] script.sh
[jgonza03@qlogin2 demo]$ qsub tutorial.sh
125637
 mem: the amount of physical memory used by the job.
 vmem: the amount of virtual memory used by all concurrent processes in the job. (A sketch of checking these values on a running job follows the example script below.)

Example Scripts: Things to Remember!
[jgonza03@qlogin2 demo]$ cat tutorial.sh
#!/bin/bash -x
#PBS -N tutorial                        # job name
#PBS -l nodes=1:ppn=1                   # number of processors
#PBS -l gres=localhd:20                 # amount of storage on the node's /localhd
#PBS -l vmem=4g                         # amount of memory (similar to h_vmem)
#PBS -l walltime=00:01:30               # length of your calculation
#PBS -joe /home/jgonza03/demo/output/   # output/error directory

cd /home/jgonza03/demo/
ls -ld /localhd/*
export TMPDIR=/localhd/$PBS_JOBID
echo step1
touch $TMPDIR/jorgetest.txt
echo "JORGE TESTING" > $TMPDIR/jorgetest.txt
cat $TMPDIR/jorgetest.txt
ulimit -a
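As a concrete illustration of combining modules with the batch directives above, here is a minimal sketch of a job script that loads the gcc/4.9.1 module shown earlier; the job name and the final command are made up for illustration:

  #!/bin/bash
  #PBS -N gcc_demo                # hypothetical job name
  #PBS -l nodes=1:ppn=1
  #PBS -l vmem=2g
  #PBS -l walltime=00:10:00

  module load gcc/4.9.1           # module version shown in the example above
  cd $PBS_O_WORKDIR               # standard Torque variable: directory the job was submitted from
  gcc --version                   # output goes to the job's output file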
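To choose realistic vmem and walltime requests, you can inspect what a running job actually consumes with qstat -f, listed in the command table above. A sketch using a hypothetical job ID:

  # 125637 is a placeholder job ID
  qstat -f 125637 | grep -E 'resources_used\.(mem|vmem|walltime)'
  # checkjob -vvv 125637 gives the Moab view of the same job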
Request Resources and No Queues!
[jgonza03@qlogin2 demo]$ qsub -l nodes=1:ppn=2,walltime=48:00:00 tutorial.sh
125639
[jgonza03@qlogin2 demo]$ qsub -l nodes=1:ppn=2,walltime=8:00:00 tutorial.sh
125640
[jgonza03@qlogin2 demo]$ qsub -l walltime=03:00:00 tutorial.sh
125641
[jgonza03@qlogin2 demo]$ qsub -l walltime=00:05:00 tutorial.sh
125642
[jgonza03@qlogin2 demo]$ qsub -l walltime=30:05:00 tutorial.sh
125643
[jgonza03@qlogin2 demo]$ qsub -l nodes=1:ppn=2 tutorial.sh
125644
[jgonza03@qlogin2 demo]$ qsub tutorial.sh
125645
[jgonza03@qlogin2 demo]$ qstat
Job ID   Name      User      Time Use  S  Queue
125636   STDIN     jgonza03  00:00:00  R  qloginQ
125639   tutorial  jgonza03  0         Q  parallel_long
125640   tutorial  jgonza03  0         Q  parallel
125641   tutorial  jgonza03  0         Q  all
125642   tutorial  jgonza03  0         Q  short
125643   tutorial  jgonza03  0         Q  long
125644   tutorial  jgonza03  0         Q  parallel
125645   tutorial  jgonza03  0         Q  short

Example of Output
The tutorial.sh script:
cd /home/jgonza03/demo/
ls -ld /localhd/*
export TMPDIR=/localhd/$PBS_JOBID
echo step1
touch $TMPDIR/jorgetest.txt
echo "JORGE TESTING" > $TMPDIR/jorgetest.txt
cat $TMPDIR/jorgetest.txt
ulimit -a

[jgonza03@qlogin2 demo]$ more tutorial.o125637
drwx------ 2 jgonza03 root  4096 Sep  9 20:18 /localhd/125637
drwx------ 2 root     root 16384 Sep  9 11:06 /localhd/lost+found
drwxrwxrwt 2 root     root  4096 Sep  9 11:10 /localhd/scratch
drwxrwxrwt 3 root     root  4096 Sep  9 12:49 /localhd/tmp
step1
JORGE TESTING
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) 2097152
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1032600
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) 2097152
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1032600
virtual memory          (kbytes, -v) 4194304
file locks                      (-x) unlimited

Check the Status of Finished Jobs
showjobs: a command available on the qlogin nodes.
[jgonza03@qlogin2 ~]$ showjobs --help
Usage: showjobs [-u *user_name*] [-g *group_name*] [-a *account_name*] [-q *queue_name*] [-s *start date*] [-e *end date*] [-n *days*] [-o|--oneonly] [--help] [--man] [[-j] <job id>]
[jgonza03@qlogin2 ~]$ showjobs -u jgonza03 -s 2014-08-27
Job Id            : 114760.moab8master.hpf.cluster
Job Name          : STDIN
Output File       : /dev/pts/2
Error File        : /dev/pts/2
Working Directory : /home/jgonza03
Home Directory    : /home/jgonza03
Submit Arguments  : -I -q qloginQ -l os=interactive2,hostlist=qlogin[1-4]
User Name         : jgonza03
Group Name        : ccm
Queue Name        : qloginQ
Wallclock Duration: 00:17:48
CPUTime           : 00:00:04
Memory Used       : 17044
Memory Limit      : 1
vmem Used         : 365632
Submit Time       : Wed Aug 27 09:41:32 2014
Start Time        : Wed Aug 27 09:41:32 2014
End Time          : Wed Aug 27 09:59:20 2014
Exit Code         : 265
Master Host       : qlogin3
Interactive       : True
--------------------------------------------------------------------------------
.......
 mem: the amount of physical memory used by the job.
 vmem: the amount of virtual memory used by all concurrent processes in the job.

Exit Status
Once a job under TORQUE is complete, the exit status attribute contains the result code returned by the job script. If TORQUE was unable to start the job, this field contains a negative number produced by the pbs_mom. Otherwise, if the job script started successfully, the value in this field is the return value of the script. TORQUE-supplied exit codes are 256 plus the underlying error code, but the C routine exit passes only the low-order byte (values below 256) of its argument; in this case 256 + 11 is really 267, yet the resulting exit code seen in the output is only 11.

Array Jobs
[jgonza03@qlogin4 demo]$ qsub -t 1-200 -l vmem=4g,walltime=00:02:30 array.sh
125647[]
[jgonza03@qlogin4 demo]$ qstat
Job ID     Name      User      Time Use  S  Queue
125636     STDIN     jgonza03  00:00:00  R  qloginQ
125647[]   array.sh  jgonza03  0         Q  short
[jgonza03@qlogin4 demo]$ qstat -t
Job ID     Name        User      Time Use  S  Queue
125636     STDIN       jgonza03  00:00:00  R  qloginQ
125647[1]  array.sh-1  jgonza03  0         Q  short
125647[2]  array.sh-2  jgonza03  0         Q  short
125647[3]  array.sh-3  jgonza03  0         Q  short
125647[4]  array.sh-4  jgonza03  0         Q  short
125647[5]  array.sh-5  jgonza03  0         Q  short
......
[jgonza03@qlogin4 demo]$ showq -s -v
active jobs: 201
eligible jobs: 0
blocked jobs: 0
Total jobs: 201
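The contents of array.sh are not shown on the slides. A minimal sketch of what such a script could look like, using Torque's $PBS_ARRAYID so that each sub-job works on its own input; the job name and file names are hypothetical:

  #!/bin/bash
  #PBS -N array_demo              # hypothetical job name
  #PBS -l nodes=1:ppn=1
  #PBS -l vmem=4g
  #PBS -l walltime=00:02:30

  cd $PBS_O_WORKDIR               # directory the array was submitted from
  # $PBS_ARRAYID holds this sub-job's index (1..200 for qsub -t 1-200)
  INPUT=input_${PBS_ARRAYID}.txt  # hypothetical per-task input file
  wc -l "$INPUT" > result_${PBS_ARRAYID}.txt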
Jobs with Dependencies
 Parameter                     Application
 afterok:jobid[:jobid...]      This job may be scheduled for execution only after the jobs jobid have terminated with no errors (see the csh warning under "Extended description" in the qsub documentation).
 afteranyarray:arrayid[count]  This job may be scheduled for execution after the jobs in arrayid have terminated, with or without errors.
[jgonza03@qlogin4 demo]$ qstat -t
Job ID      Name         User      Time Use  S  Queue
125636      STDIN        jgonza03  00:00:00  R  qloginQ
125648      tutorial     jgonza03  0         Q  short
125649[1]   array.sh-1   jgonza03  0         H  all
125649[2]   array.sh-2   jgonza03  0         H  all
125649[3]   array.sh-3   jgonza03  0         H  all
....
125649[19]  array.sh-19  jgonza03  0         H  all
125649[20]  array.sh-20  jgonza03  0         H  all

Known Issues
Scheduler issues:
• On occasion, the qstat command without options reports the wrong status for jobs. We are working with Adaptive Computing to develop fixes for these software bugs.

Other Projects
• RIT is working to deploy an archive (tape) system. The project is scheduled to go live during fall 2014.
• We are working with RIT on a project that will make the new cluster available to external users:
 1- Collaborators from other institutions.
 2- People without a Sick Kids Windows domain account.

Comments and Questions?
Please send comments, suggestions and requests to [email protected] and cc [email protected], indicating in the subject that the request is for HPF.

Thank you!
The Centre for Computational Medicine HPC team:
Len Zaifman, Goran Marik, Jorge Gonzalez-Outeirino, Florin Stingaciu, Yuki Saito, Brian Phan
12th Floor, Room 129830