
PBS - Workload manager

by wycho 2022. 3. 13.

Submitting a job

$ qsub job_script

${PBS_O_WORKDIR}/${PBS_JOBNAME}.${PBS_JOBID}
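
A minimal sketch of how a job script might build that path. The fallback values after `:-` (`demo_job`, `0.localhost`) are placeholders assumed here so the script also runs outside PBS; inside a job, PBS sets the `PBS_*` variables itself.

```shell
#!/bin/sh
# Hypothetical job script. The ":-" fallbacks are only used when the
# script runs outside PBS, where the PBS_* variables are unset.
workdir="${PBS_O_WORKDIR:-$PWD}"
jobname="${PBS_JOBNAME:-demo_job}"
jobid="${PBS_JOBID:-0.localhost}"

# Move to the directory the job was submitted from.
cd "$workdir" || exit 1

# Per-job file name: <submit dir>/<job name>.<job id>
log_path="${workdir}/${jobname}.${jobid}"
echo "log file: ${log_path}"
```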

 

https://docs.oracle.com/cd/E19957-01/820-0699/i999036/index.html

  • HOME – Home directory on execution machine
  • USER – User ID of job owner
  • JOB_ID – Current job ID
  • JOB_NAME – Current job name; see the -N option
  • HOSTNAME – Name of the execution host
  • TASK_ID – Array job task index number
SYNOPSIS
qsub [-a date_time] [-A account_string] [-c interval] [-C directive_prefix] [-e path] [-h] [-I] [-j join] [-k keep] [-l resource_list] [-m mail_options] [-M user_list] [-N name] [-o path] [-p priority] [-q destination] [-r c] [-S path_list] [-u user_list] [-v variable_list] [-V] [-W additional_attributes] [-z] [script]

 

Checking jobs

$ qstat -u username

 

Deleting jobs

$ qselect -u userID | xargs qdel

$ qselect -u userID -s R | xargs qdel

$ qdel 2113{0..9}.JobID
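
Before deleting in bulk, it can help to preview the command that xargs would build. The sketch below replaces the qselect output with two made-up job IDs (`101.srv`, `102.srv`) and puts `echo` in front of `qdel`, so nothing is actually deleted.

```shell
# Dry run: printf stands in for "qselect -u userID" (the IDs are made up),
# and "echo" in front of qdel prints the command instead of running it.
preview=$(printf '%s\n' 101.srv 102.srv | xargs echo qdel)
echo "$preview"
```

Once the printed command looks right, drop `echo` and pipe the real qselect output instead.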

 

Checking node status

$ pbsnodes -a

 

Example

$ cat script.pbs
python running.py $VAR1 $VAR2 $VAR3

$ cat pbs_command.txt
qsub -l nodes=node01 -N test1 -v VAR1=variable_01,VAR2=variable_02,VAR3=variable_03 script.pbs
qsub -l nodes=node02 -N test2 -v VAR1=variable_01,VAR2=variable_02,VAR3=variable_03 script.pbs
qsub -l nodes=node03 -N test3 -v VAR1=variable_01,VAR2=variable_02,VAR3=variable_03 script.pbs

$ sh pbs_command.txt
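
The command file above could also be generated with a loop rather than written by hand. This is only a sketch: `make_commands` is a hypothetical helper, and it prints the qsub lines without submitting anything.

```shell
# Print one qsub command per node; job names are numbered test1, test2, ...
make_commands() {
    i=1
    for node in "$@"; do
        echo "qsub -l nodes=${node} -N test${i} -v VAR1=variable_01,VAR2=variable_02,VAR3=variable_03 script.pbs"
        i=$((i + 1))
    done
}

commands=$(make_commands node01 node02 node03)
echo "$commands"
```

Redirecting the output to pbs_command.txt gives the same file as in the example.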

 

qsub

Option Name Description
-a date_time Declares the time after which the job is eligible for execution.
The date_time argument is in the form:
[[[[CC]YY]MM]DD]hhmm[.SS]
where CC is the first two digits of the year (the century), YY is the second two digits of the year, MM is the two digits for the month, DD is the day of the month, hh is the hour, mm is the minute, and the optional SS is the seconds.
If the month (MM) is not specified, it will default to the current month if the specified day (DD) is in the future. Otherwise, the month will be set to next month. Likewise, if the day (DD) is not specified, it will default to today if the time (hhmm) is in the future. Otherwise, the day will be set to tomorrow.
For example, if you submit a job at 11:15 am with a time of -a 1110, the job will be eligible to run at 11:10 am tomorrow.
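
The day-defaulting rule can be sketched as a small shell function. `eligible_day` is a hypothetical helper, not part of PBS; it only mirrors the rule described above for an hhmm value given without a day.

```shell
# eligible_day NOW REQUESTED: both arguments are 4-digit hhmm strings.
# Prints "today" if REQUESTED is later than NOW, otherwise "tomorrow".
eligible_day() {
    now=$1
    requested=$2
    # Prefix a 1 so values such as 0910 are not read as octal numbers.
    if [ "1${requested}" -gt "1${now}" ]; then
        echo today
    else
        echo tomorrow
    fi
}

# The example from the text: submitted at 11:15 with -a 1110.
eligible_day 1115 1110
```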
-A account_string Defines the account string associated with the job. The account_string is an undefined string of characters and is interpreted by the server which executes the job. See section 2.7.1 of the PBS ERS.
-b seconds Defines the maximum number of seconds qsub will block attempting to contact pbs_server. If pbs_server is down, or for a variety of communication failures, qsub will continually retry connecting to pbs_server for job submission.
This value overrides the CLIENTRETRY parameter in torque.cfg. This is a non-portable TORQUE extension. Portability-minded users can use the PBS_CLIENTRETRY environmental variable. A negative value is interpreted as infinity. The default is 0.
-c checkpoint_options Defines the options that will apply to the job. If the job executes upon a host which does not support checkpoint, these options will be ignored.
Valid checkpoint options are:
  • none – No checkpointing is to be performed.
  • enabled – Specify that checkpointing is allowed but must be explicitly invoked by either the qhold or qchkpt commands.
  • shutdown – Specify that checkpointing is to be done on a job at pbs_mom shutdown.
  • periodic – Specify that periodic checkpointing is enabled. The default interval is 10 minutes and can be changed by the $checkpoint_interval option in the MOM config file or by specifying an interval when the job is submitted
  • interval=minutes – Checkpointing is to be performed at an interval of minutes, which is the integer number of minutes of wall time used by the job. This value must be greater than zero.
  • depth=number – Specify a number (depth) of checkpoint images to be kept in the checkpoint directory.
  • dir=path – Specify a checkpoint directory (default is /var/spool/torque/checkpoint).
-C directive_prefix Defines the prefix that declares a directive to the qsub command within the script file. (See the paragraph on script directives under Extended description.)
If the -C option is presented with a directive_prefix argument that is the null string, qsub will not scan the script file for directives.
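
For reference, a sketch of a job script that carries its options as directives, assuming the usual default prefix `#PBS`. The job name and mail address are placeholders; the directives themselves are options from this table.

```shell
#!/bin/sh
#PBS -N demo_job
#PBS -j oe
#PBS -m ae
#PBS -M user@example.com
# The lines above are read by qsub as options (name, merged output, mail
# on abort/end, mail recipient). To the shell they are plain comments.
msg="job ${PBS_JOBNAME:-demo_job} started"
echo "$msg"
```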
-d path Defines the working directory path to be used for the job. If the -d option is not specified, the default working directory is the home directory. This option sets the environment variable PBS_O_INITDIR.
-D path Defines the root directory to be used for the job. This option sets the environment variable PBS_O_ROOTDIR.
-e path Defines the path to be used for the standard error stream of the batch job. The path argument is of the form:
[hostname:]path_name
where hostname is the name of a host to which the file will be returned, and path_name is the path name on that host in the syntax recognized by POSIX. The argument will be interpreted as follows:
  • path_name – where path_name is not an absolute path name, then the qsub command will expand the path name relative to the current working directory of the command. The command will supply the name of the host upon which it is executing for the hostname component.
  • hostname:path_name – where path_name is not an absolute path name, then the qsub command will not expand the path name relative to the current working directory of the command. On delivery of the standard error, the path name will be expanded relative to the user's home directory on the hostname system.
  • path_name – where path_name specifies an absolute path name, then the qsub will supply the name of the host on which it is executing for the hostname.
  • hostname:path_name – where path_name specifies an absolute path name, the path will be used as specified.
If the -e option is not specified, the default file name for the standard error stream will be used. The default name has the following form:
  • job_name.esequence_number – where job_name is the name of the job (see the -N name option) and sequence_number is the job number assigned when the job is submitted.
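
The four path cases can be sketched as a shell function. `resolve_stream_path` is a hypothetical helper that imitates what qsub is described as doing at submission time; the host name and working directory are passed in as arguments so the sketch has no side effects.

```shell
# resolve_stream_path SPEC SUBMIT_HOST CWD
# Prints the hostname:path_name pair the job would carry for SPEC.
resolve_stream_path() {
    spec=$1
    host=$2
    cwd=$3
    case $spec in
        *:*) echo "$spec" ;;                   # host given: kept as specified
        /*)  echo "${host}:${spec}" ;;         # absolute path: add submit host
        *)   echo "${host}:${cwd}/${spec}" ;;  # relative path: expand against cwd
    esac
}

resolve_stream_path out.log node01 /home/u/run
```

A relative path with an explicit host is left untouched here because, per the text, it is only expanded at delivery time against the home directory on that host.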
-f --- Job is made fault tolerant. Jobs running on multiple nodes are periodically polled by mother superior. If one of the nodes fails to report, the job is canceled by mother superior and a failure is reported. If a job is fault tolerant, it will not be canceled based on failed polling (no matter how many nodes fail to report). This may be desirable if transient network failures are causing large jobs not to complete, where ignoring one failed polling attempt can be corrected at the next polling attempt.
If TORQUE is compiled with PBS_NO_POSIX_VIOLATION (there is no config option for this), you have to use -W fault_tolerant=true to mark the job as fault tolerant.
-F --- Specifies the arguments that will be passed to the job script when the script is launched. The accepted syntax is:
qsub -F "myarg1 myarg2 myarg3=myarg3value" myscript2.sh
Quotation marks are required. qsub will fail with an error message if the argument following -F is not a quoted value. The pbs_mom server will pass the quoted value as arguments to the job script when it launches the script.
-h --- Specifies that a user hold be applied to the job at submission time.
-I --- Declares that the job is to be run "interactively". The job will be queued and scheduled as any PBS batch job, but when executed, the standard input, output, and error streams of the job are connected through qsub to the terminal session in which qsub is running. Interactive jobs are forced to be not rerunable. See Extended description for additional information on interactive jobs.
-j join Declares if the standard error stream of the job will be merged with the standard output stream of the job.
An option argument value of oe directs that the two streams will be merged, intermixed, as standard output. An option argument value of eo directs that the two streams will be merged, intermixed, as standard error.
If the join argument is n or the option is not specified, the two streams will be two separate files.
-k keep Defines which (if either) of standard output or standard error will be retained on the execution host. If set for a stream, this option overrides the path name for that stream. If not set, neither stream is retained on the execution host.
The argument is either the single letter "e" or "o", or the letters "e" and "o" combined in either order. Or the argument is the letter "n".
  • e – The standard error stream is to be retained on the execution host. The stream will be placed in the home directory of the user under whose user id the job executed. The file name will be the default file name for that stream (see the -e option).
  • o – The standard output stream is to be retained on the execution host. The stream will be placed in the home directory of the user under whose user id the job executed. The file name will be the default file name for that stream (see the -o option).
  • eo – Both the standard output and standard error streams will be retained.
  • oe – Both the standard output and standard error streams will be retained.
  • n – Neither stream is retained.
-l resource_list Defines the resources that are required by the job and establishes a limit to the amount of resource that can be consumed. If not set for a generally available resource, such as CPU time, the limit is infinite. The resource_list argument is of the form:
resource_name[=[value]][,resource_name[=[value]],...]
When a request lists several resources, request the more inclusive resource first. For example, a request for procs should come before a gres request.
In TORQUE 3.0.2 or later, qsub supports the mapping of -l gpus=X to -l gres=gpus:X. This allows users who are using NUMA systems to make requests such as -l ncpus=20:gpus=5 indicating they are not concerned with the GPUs in relation to the NUMA nodes they request, they only want a total of 20 cores and 5 GPUs.
For more information, see Requesting resources.
For information on specifying multiple types of resources for allocation, see "Multi-Req Support" under "General Job Policies" in the Moab Workload Manager documentation.
-m mail_options Defines the set of conditions under which the execution server will send a mail message about the job. The mail_options argument is a string which consists of either the single character "n", or one or more of the characters "a", "b", and "e".
If the character "n" is specified, no normal mail is sent. Mail for job cancels and other events outside of normal job processing are still sent.
For the letters "a", "b", and "e":
  • a – Mail is sent when the job is aborted by the batch system.
  • b – Mail is sent when the job begins execution.
  • e – Mail is sent when the job terminates.
If the -m option is not specified, mail will be sent if the job is aborted.
-M user_list Declares the list of users to whom mail is sent by the execution server when it sends mail about the job.
The user_list argument is of the form:
user[@host][,user[@host],...]
If unset, the list defaults to the submitting user at the qsub host, i.e. the job owner.
-n node-exclusive Allows a user to specify an exclusive-node access/allocation request for the job. This affects only cpusets and compatible schedulers (see Linux cpuset support).
-N name Declares a name for the job. The name specified may be an unlimited number of characters in length. It must consist of printable, non white space characters with the first character alphabetic.
If the -N option is not specified, the job name will be the base name of the job script file specified on the command line. If no script file name was specified and the script was read from the standard input, then the job name will be set to STDIN.
-o path Defines the path to be used for the standard output stream of the batch job. The path argument is of the form:
[hostname:]path_name
where hostname is the name of a host to which the file will be returned, and path_name is the path name on that host in the syntax recognized by POSIX. The argument will be interpreted as follows:
  • path_name – where path_name is not an absolute path name, then the qsub command will expand the path name relative to the current working directory of the command. The command will supply the name of the host upon which it is executing for the hostname component.
  • hostname:path_name – where path_name is not an absolute path name, then the qsub command will not expand the path name relative to the current working directory of the command. On delivery of the standard output, the path name will be expanded relative to the user's home directory on the hostname system.
  • path_name – where path_name specifies an absolute path name, then the qsub will supply the name of the host on which it is executing for the hostname.
  • hostname:path_name – where path_name specifies an absolute path name, the path will be used as specified.
If the -o option is not specified, the default file name for the standard output stream will be used. The default name has the following form:
  • job_name.osequence_number – where job_name is the name of the job (see the -N name option) and sequence_number is the job number assigned when the job is submitted.
-p priority Defines the priority of the job. The priority argument must be an integer between -1024 and +1023 inclusive. The default is no priority which is equivalent to a priority of zero.
-P user[:group] Allows a root user or manager to submit a job as another user. TORQUE treats proxy jobs as though the jobs were submitted by the supplied username. This feature is available in TORQUE 2.4.7 and later, however, TORQUE 2.4.7 does not have the ability to supply the [:group] option; it is available in TORQUE 2.4.8 and later.
-q destination Defines the destination of the job. The destination names a queue, a server, or a queue at a server.
The qsub command will submit the script to the server defined by the destination argument. If the destination is a routing queue, the job may be routed by the server to a new destination.
If the -q option is not specified, the qsub command will submit the script to the default server. (See Environment variables and the PBS ERS section 2.7.4, "Default Server".)
If the -q option is specified, it is in one of the following three forms:
  • queue
  • @server
  • queue@server
If the destination argument names a queue and does not name a server, the job will be submitted to the named queue at the default server.
If the destination argument names a server and does not name a queue, the job will be submitted to the default queue at the named server.
If the destination argument names both a queue and a server, the job will be submitted to the named queue at the named server.
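
The three destination forms can be told apart with a `case` statement. The function below is a hypothetical helper for illustration only; it prints "default" where qsub would fall back to the default queue or server.

```shell
# describe_destination DEST: split a -q argument into queue and server.
describe_destination() {
    case $1 in
        @*)  echo "queue=default server=${1#@}" ;;       # @server
        *@*) echo "queue=${1%%@*} server=${1#*@}" ;;     # queue@server
        *)   echo "queue=$1 server=default" ;;           # queue
    esac
}

describe_destination batch
describe_destination @srv
describe_destination batch@srv
```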
-r y/n Declares whether the job is rerunable (see the qrerun command). The option argument is a single character, either y or n.
If the argument is "y", the job is rerunable. If the argument is "n", the job is not rerunable. The default value is y, rerunable.
-S path_list Declares the path to the desired shell for this job.
qsub script.sh -S /bin/tcsh
If the shell path is different on different compute nodes, use the following syntax:
path[@host][,path[@host],...]
qsub script.sh -S /bin/tcsh@node1,/usr/bin/tcsh@node2

Only one path may be specified for any host named. Only one path may be specified without the corresponding host name. The path selected will be the one with the host name that matched the name of the execution host. If no matching host is found, then the path specified without a host will be selected, if present.
If the -S option is not specified, the option argument is the null string, or no entry from the path_list is selected, the execution will use the user's login shell on the execution host.
-t array_request Specifies the task ids of a job array. Single task arrays are allowed.
The array_request argument is an integer id or a range of integers. Multiple ids or id ranges can be combined in a comma delimited list. Examples: -t 1-100 or -t 1,10,50-100
An optional slot limit can be specified to limit the amount of jobs that can run concurrently in the job array. The default value is unlimited. The slot limit must be the last thing specified in the array_request and is delimited from the array by a percent sign (%).
qsub script.sh -t 0-299%5
This sets the slot limit to 5. Only 5 jobs from this array can run at the same time.
You can use qalter to modify slot limits on an array. The server parameter max_slot_limit can be used to set a global slot limit policy.
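
To see which task ids an array_request covers, the syntax can be expanded by hand. `expand_array_request` is a hypothetical helper written for this note, not a PBS command; it drops the optional slot limit and lists every id.

```shell
# expand_array_request REQUEST: print the task ids named by a -t argument.
expand_array_request() {
    # Drop an optional "%slot_limit" suffix.
    request=$(printf '%s' "$1" | cut -d'%' -f1)
    ids=""
    old_ifs=$IFS
    IFS=,
    set -- $request            # split the list on commas
    IFS=$old_ifs
    for part in "$@"; do
        case $part in
            *-*)               # a range such as 50-52
                i=${part%-*}
                last=${part#*-}
                while [ "$i" -le "$last" ]; do
                    ids="${ids}${ids:+ }$i"
                    i=$((i + 1))
                done ;;
            *)                 # a single id
                ids="${ids}${ids:+ }$part" ;;
        esac
    done
    echo "$ids"
}

expand_array_request 1,10,50-52
```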
-u user_list Defines the user name under which the job is to run on the execution system.
The user_list argument is of the form:
user[@host][,user[@host],...]
Only one user name may be given per specified host. Only one of the user specifications may be supplied without the corresponding host specification. That user name will be used for execution on any host not named in the argument list. If unset, the user list defaults to the user who is running qsub.
-v variable_list Expands the list of environment variables that are exported to the job.
In addition to the variables described in the "Description" section above, variable_list names environment variables from the qsub command environment which are made available to the job when it executes. The variable_list is a comma separated list of strings of the form variable or variable=value. These variables and their values are passed to the job.
-V --- Declares that all environment variables in the qsub command's environment are to be exported to the batch job.
-W additional_attributes The -W option allows for the specification of additional job attributes. The general syntax of -W is in the form:
-W attr_name=attr_value[,attr_name=attr_value...]
If white space occurs anywhere within the option argument string or the equal sign, "=", occurs within an attribute_value string, then the string must be enclosed with either single or double quote marks.
PBS currently supports the following attributes within the -W option:
  • depend=dependency_list – Defines the dependency between this and other jobs. The dependency_list is in the form:
  • group_list=g_list – Defines the group name under which the job is to run on the execution system. The g_list argument is of the form:
  • interactive=true – If the interactive attribute is specified, the job is an interactive job. The -I option is an alternative method of specifying this attribute.
  • job_radix=<int> – To be used with parallel jobs. It directs the Mother Superior of the job to create a distribution radix of size <int> between sisters. See Managing multi-node jobs.
  • stagein=file_list
  • stageout=file_list – Specifies which files are staged (copied) in before job start or staged out after the job completes execution. On completion of the job, all staged-in and staged-out files are removed from the execution system. The file_list is in the form:
  • umask=XXX – Sets umask used to create stdout and stderr spool files in pbs_mom spool directory. Values starting with 0 are treated as octal values, otherwise the value is treated as a decimal umask value.
-x --- By default, if you submit an interactive job with a script, the script will be parsed for PBS directives but the rest of the script will be ignored since it's an interactive job. The -x option allows the script to be executed in the interactive job and then the job completes. For example:
script.sh
#!/bin/bash
ls
---end script---
qsub -I script.sh
qsub: waiting for job 5.napali to start
dbeer@napali:#
<displays the contents of the directory, because of the ls command>
qsub: job 5.napali completed
-X --- Enables X11 forwarding. The DISPLAY environment variable must be set.
-z --- Directs that the qsub command is not to write the job identifier assigned to the job to the command's standard output.

 

qstat

Option Description
-f Specifies that a full status display be written to standard out. The [time] value is the amount of walltime, in seconds, remaining for the job. [time] does not account for walltime multipliers.
-a All jobs are displayed in the alternative format (see Standard output). If the operand is a destination id, all jobs at that destination are displayed. If the operand is a job id, information about that job is displayed.
-e If the operand is a job id or not specified, only jobs in executable queues are displayed. Setting the PBS_QSTAT_EXECONLY environment variable will also enable this option.
-i Job status is displayed in the alternative format. For a destination id operand, status for jobs at that destination which are not running are displayed. This includes jobs which are queued, held or waiting. If an operand is a job id, status for that job is displayed regardless of its state.
-r If an operand is a job id, status for that job is displayed. For a destination id operand, status for jobs at that destination which are running are displayed, this includes jobs which are suspended.
-n In addition to the basic information, nodes allocated to a job are listed.
-1 In combination with -n, the -1 option puts all of the nodes on the same line as the job ID. In combination with -f, attributes are not folded to fit in a terminal window. This is intended to ease the parsing of the qstat output.
-s In addition to the basic information, any comment provided by the batch administrator or scheduler is shown.
-G Show size information in giga-bytes.
-M Show size information, disk or memory in mega-words. A word is considered to be 8 bytes.
-R In addition to other information, disk reservation information is shown. Not applicable to all systems.
-t Normal qstat output displays a summary of the array instead of the entire array, job for job. qstat -t expands the output to display the entire array. Note that arrays are now named with brackets following the array name; for example:
dbeer@napali:~/dev/torque/array_changes$ echo sleep 20 | qsub -t 0-299
189[].napali
Individual jobs in the array are now also noted using square brackets instead of dashes; for example, here is part of the output of qstat -t for the preceding array:
189[299].napali STDIN[299] dbeer 0 Q batch
-u Job status is displayed in the alternative format. If an operand is a job id, status for that job is displayed. For a destination id operand, status for jobs at that destination which are owned by the user(s) listed in user_list are displayed. The syntax of the user_list is:
user_name[@host][,user_name[@host],...]
Host names may be wild carded on the left end, e.g. "*.nasa.gov". User_name without a "@host" is equivalent to "user_name@*", that is at any host.
-Q Specifies that the request is for queue status and that the operands are destination identifiers.
-q Specifies that the request is for queue status which should be shown in the alternative format.
-B Specifies that the request is for batch server status and that the operands are the names of servers.

 

Reference

- http://docs.adaptivecomputing.com/torque/4-1-3/Content/topics/commands/qsub.htm

- http://docs.adaptivecomputing.com/torque/4-1-3/Content/topics/commands/qstat.htm

 

 

 
