# $Id: pbs.txt,v 1.4 2002/05/10 00:26:59 tjn Exp $
#
# PBS Information, usage, examples, and references.
#

--------------------------------------------------------------
Introduction/Overview:

 OpenPBS is the free version of PBS; PBS Pro is the commercial
 version offered by Veridian.
 [see also: http://www.openpbs.org and http://www.pbspro.com]

 The basic structure is comprised of the 'pbs_server', the 'pbs_mom'
 and the scheduler.  The default FIFO scheduler, 'pbs_sched', can be
 used, but quite often the 'maui' scheduler is used instead.
 [see also: SAMPLE #4]

 There are also GUI-based tools: 'xpbsmon' and 'xpbs', for monitoring
 and managing, respectively.

 Note, the output of commands is spooled on the compute node and
 transferred back to the server via 'rcp' or 'scp'.  Therefore, these
 must be working properly for PBS to function properly.

 There are a number of commands that are not listed here, but the
 following gives a brief summary.  These should be sufficient for
 using a previously installed/configured PBS-enabled cluster.

--------------------------------------------------------------
PBS Usage:  {This section is mainly from [1, ch16].}

 + Commands:
    qsub      -- submit a job (script) to PBS
                 [see also: SAMPLE #1, 2]
    qstat     -- status of current jobs in the PBS queue
                 [see also: SAMPLE #1, 2]
    qdel      -- delete a job from the PBS queue
                 [see also: SAMPLE #3]
    pbsnodes  -- list/manipulate nodes (mark down, free, offline, etc.)
                 [see also: SAMPLE #5]
    qmgr      -- system manager interface

 NOTE TO TJN: I'm not clear on qstop, qenable, qdisable, qmgr and how
   you'd use these to take a node out of the pool, let the queues
   drain, do upgrades, etc. on that node, and then re-introduce it to
   the pool.  See the manual pages (/usr/local/pbs/man) and Ch16 of [1].
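 One plausible answer to the note above, sketched with only the commands
 documented in this file (the same 'pbsnodes -o'/'-c' pair is shown live
 in SAMPLE #5): mark the node OFFLINE so no new jobs land on it, wait for
 its running jobs to finish, do the maintenance, then clear the mark.
 The helper names 'drain_node'/'restore_node' are illustrative, not PBS
 commands, and this has not been tested against a live server:

```shell
# Sketch: hold a node out of the pool, wait for its jobs to drain,
# then return it.  Only 'pbsnodes' and 'qstat' are real PBS tools here.

drain_node() {
    node="$1"
    pbsnodes -o "$node" || return 1     # mark OFFLINE: no new jobs land here
    # Running jobs are unaffected, so wait until none are left on the node.
    # 'qstat -n' lists the nodes allocated to each running job.
    while qstat -n | grep -q "$node"; do
        sleep 60
    done
    echo "node $node drained"
}

restore_node() {
    pbsnodes -c "$1"                    # clear OFFLINE: schedulable again
}
```

 After 'drain_node' reports the node drained, do the upgrade, then run
 'restore_node' to re-introduce the node to the pool.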
--------------------------------------------------------------
PBS Samples:

 SAMPLE #1

    #!/bin/sh
    # FILENAME: "mpich-tst1.pbs"
    #
    # Resource directives for PBS; all '-l' lines are handed to 'qsub'.
    #PBS -N mpich-tst1
    #PBS -l walltime=1:00:00
    #PBS -l mem=400mb
    ##
    #PBS -l ncpus=4
    #PBS -j oe

    # Set up the environment on the OSCAR cluster (pre-switcher clusters).
    . /etc/profile.d/mpi-*mpich.sh

    cd ${HOME}/PBS/test
    $MPICH/bin/mpicc cpi.c -o tst1
    $MPICH/bin/mpirun -machinefile $PBS_NODEFILE -np 2 ./tst1

   Then from the command-line you would type:

    [tjn@oscar test]$ qstat
    [tjn@oscar test]$ qsub mpich-tst1.pbs
    44.oscar
    [tjn@oscar test]$ qstat
    Job id           Name             User             Time Use S Queue
    ---------------- ---------------- ---------------- -------- - -----
    44.oscar         mpich-tst1       tjn              00:00:00 R workq
    [tjn@oscar test]$ ls
    cpi.c  cpi.o  mpich-tst1.o44  mpich-tst1.pbs  mpich-tst2.pbs  tst1
    [tjn@oscar test]$ cat mpich-tst1.o44
    DBG: done sourcing /etc/profile (which sources profile.d/)
    Process 0 of 2 on oscarnode1.localdomain
    1000 iterations: pi is approx. 3.1415927370900434, error = 0.0000000835002503
    wall clock time = 0.000243
    Process 1 of 2 on oscarnode1.localdomain

 SAMPLE #2

    #!/bin/sh
    #PBS -N mpich-tst2
    #PBS -j oe
    #PBS -q workq

    cd ${HOME}/OSCAR_test/mpi
    #cd $PBS_O_WORKDIR

    echo "NODEFILE is at ($PBS_NODEFILE)"

    mpich_profile=/etc/profile.d/mpi-00mpich.sh
    . $mpich_profile

    $MPICH/bin/mpicc cpi.c -o tst2
    $MPICH/bin/mpirun -machinefile $PBS_NODEFILE -np 2 ./tst2
    echo
    exit

    [tjn@oscar test]$ qsub mpich-tst2.pbs
    46.oscar
    [tjn@oscar test]$ qstat
    Job id           Name             User             Time Use S Queue
    ---------------- ---------------- ---------------- -------- - -----
    46.oscar         mpich-tst2       tjn              00:00:00 R workq
    [tjn@oscar test]$ ls
    cpi.c  mpich-tst1.pbs  mpich-tst2.o46  mpich-tst2.pbs
    [tjn@oscar test]$ cat mpich-tst2.o46
    DBG: done sourcing /etc/profile (which sources profile.d/)
    NODEFILE is at (/usr/spool/PBS/aux/46.oscar)
    Process 0 of 2 on oscarnode1.localdomain
    1000 iterations: pi is approx. 3.1415927370900434, error = 0.0000000835002503
    wall clock time = 0.000270
    Process 1 of 2 on oscarnode1.localdomain
    [tjn@oscar test]$

 SAMPLE #3

    [tjn@oscar test]$ qsub myprogram
    52.oscar
    [tjn@oscar test]$ qstat
    Job id           Name             User             Time Use S Queue
    ---------------- ---------------- ---------------- -------- - -----
    52.oscar         myprogram        tjn              00:00:00 R workq
    [tjn@oscar test]$ qdel 52
    [tjn@oscar test]$ qstat
    [tjn@oscar test]$

 SAMPLE #4  {restarting PBS on the cluster, using C3-2.7.2}

    [root@oscar /root]# service pbs_server restart
    Shutting down PBS Server:                       [  OK  ]
    Starting PBS Server:                            [  OK  ]
    [root@oscar /root]# service maui restart
    Shutting down MAUI Scheduler:                   [  OK  ]
    Starting MAUI Scheduler:                        [  OK  ]
    [root@oscar /root]# cexec -c "service pbs_mom restart"
    Shutting down PBS Mom:                          [  OK  ]
    Starting PBS Mom:                               [  OK  ]
    [root@oscar /root]#

 SAMPLE #5  {list/manipulate the cluster nodes}

    # Using the 'linus' cluster b/c it has 6 nodes (oscar has only 1 node!).
    # {options from pbsnodes(8B)}
    #
    # OPTIONS
    #      -a   All nodes and their attributes are listed.  The
    #           attributes include "state" and "properties".
    #
    #      -c   Clear OFFLINE and DOWN from listed nodes.  The listed
    #           nodes are "free" to be allocated to jobs.
    #
    #      -l   List all nodes marked in any way.
    #
    #      -o   Mark listed nodes as OFFLINE even if currently in use.
    #           This is different from being marked DOWN.  An automated
    #           script that checks nodes being up or down and calls
    #           pbsnodes with a list of nodes down will not change the
    #           status of nodes marked OFFLINE.  This gives the
    #           administrator a tool to hold a node out of service
    #           without changing the automatic script.
    #
    #      -r   Clear OFFLINE from listed nodes.
    #
    #      -s   Specify the PBS server to which to connect.
    #

    [root@linus root]# pbsnodes -a
    node1.linus
         state = free
         np = 2
         ntype = cluster
    node2.linus
         state = free
         np = 2
         ntype = cluster
    node3.linus
         state = free
         np = 2
         ntype = cluster
    node4.linus
         state = free
         np = 2
         ntype = cluster
    node5.linus
         state = free
         np = 2
         ntype = cluster
    node6.linus
         state = free
         np = 2
         ntype = cluster
    [root@linus root]# pbsnodes -l
    [root@linus root]#
    [root@linus root]# pbsnodes -o node6
    Error marking node node6 - Unknown node
    [root@linus root]#
    [root@linus root]# pbsnodes -o node6.linus
    [root@linus root]# pbsnodes -l
    node6.linus          offline
    [root@linus root]#
    [root@linus root]# pbsnodes -c node6.linus
    [root@linus root]# pbsnodes -l
    [root@linus root]#

 ---tmp---
    echo "NODEFILE=($PBS_NODEFILE)"
    echo "PBS_O_WORKDIR=($PBS_O_WORKDIR)"

  produces...

    NODEFILE=(/usr/spool/PBS/aux/23.oscar)
    PBS_O_WORKDIR=(/home/tjn/OSCAR_test/mpi)
 ---tmp---

--------------------------------------------------------------
References:

 [1] Thomas Sterling (Editor) et al., "Beowulf Cluster Computing with
     Linux", Scientific and Engineering Computation Series, MIT Press,
     2002, ISBN 0-262-69274-0.

 [2] Mike Brim's PBS presentation.

 [3] Jeremy Enos' presentation at Trieste, Italy, 2002.

 [4] PBS manual pages (on oscar-1.2.1 systems, /usr/local/pbs/man/*)
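
 ---tmp (cont.)---
 The two variables above can be folded into a minimal job-script
 skeleton.  This is a sketch: the job name and walltime are arbitrary
 examples, and the ${VAR:-default} guards are only there so the script
 is harmless to run outside of PBS (a real job script would omit them):

```shell
#!/bin/sh
# Minimal PBS job skeleton (illustrative values, not a tested recipe).
#PBS -N env-demo
#PBS -j oe
#PBS -l walltime=0:10:00

# PBS_O_WORKDIR is the directory 'qsub' was run from; PBS_NODEFILE
# lists the hosts allocated to this job, one line per processor.
cd "${PBS_O_WORKDIR:-$PWD}"

echo "NODEFILE=(${PBS_NODEFILE:-unset})"
echo "PBS_O_WORKDIR=($PWD)"

# Count allocated processors by counting lines in the node file.
NPROCS=$(wc -l < "${PBS_NODEFILE:-/dev/null}")
echo "allocated processors: $NPROCS"
```

 Uncommenting 'cd $PBS_O_WORKDIR' in SAMPLE #2 achieves the same thing
 as the cd line here: the job runs where it was submitted from.
 ---tmp (cont.)---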