Slurm is a program that manages jobs on a cluster server.
https://slurm.schedmd.com/sbatch.html
There are two ways to install it: through a package, or by downloading the files and building them. Package installation is more convenient, but because the latest version is not available as a package, download the installation files from the website and build them.
Because Slurm performs job management through communication between nodes, the firewall must be disabled on every compute node. Munge is also required for secure communication, and the master node needs MySQL (MariaDB) configured for the database.
Starting with Slurm 20.02, compute nodes do not need their own slurm.conf; when slurmd is activated it fetches the configuration from the master node (configless mode).
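A minimal sketch of starting a compute node in configless mode, assuming the controller host is g-master (as in the slurm.conf shown later on this page, which sets SlurmctldParameters=enable_configless); the --conf-server option, or a DNS SRV record, tells slurmd where to fetch the configuration from:
$ sudo slurmd --conf-server g-master:6817   # one-off test: point slurmd directly at the controller
# To make it permanent, pass the option through the slurmd service instead (packaging-dependent), e.g.
# SLURMD_OPTIONS="--conf-server g-master:6817" in /etc/sysconfig/slurmd or /etc/default/slurmd
$ sudo systemctl restart slurmd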
Basic usage
- Munge must be running to encrypt and authenticate communication, and the firewall is disabled on the compute nodes.
$ sudo systemctl start munge
$ sudo systemctl stop firewalld && sudo systemctl disable firewalld
- The master node runs slurmctld, the Slurm central management (controller) daemon.
$ sudo systemctl start slurmctld
- Compute nodes run slurmd.
$ sudo systemctl start slurmd
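To have the daemons come back after a reboot, they can also be enabled rather than only started; a minimal sketch assuming systemd:
$ sudo systemctl enable --now munge        # every node
$ sudo systemctl enable --now slurmctld    # master node only
$ sudo systemctl enable --now slurmd       # compute nodes only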
Checking status
$ sinfo # Commonly used options are -l (--long) and -N (--Node), e.g. $ sinfo -lN
- When a compute node is in the down or drain state rather than idle, enter the following on the master node:
$ sudo scontrol update NodeName=name State=RESUME
- If a node stays in the completing state even though the job has finished:
$ sudo scontrol update NodeName=name State=DOWN Reason=hung_completing
"Could not resolve hostname SERVER: Name or service not known" 이라는 문구가 나오면 hostfile에 추가해준다.
# Sample /etc/hosts file
127.0.0.1 localhost
127.0.1.1 computerhostnamehere
10.0.2.15 server
$ scontrol show nodes
If a problem occurs, check the log files. (The actual paths are set by SlurmdLogFile and SlurmctldLogFile in slurm.conf; the configuration below uses /var/log/slurm-llnl/.)
Compute node bugs: tail /var/log/slurmd.log
Server node bugs: tail /var/log/slurmctld.log
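Daemon status and recent log lines can also be checked through systemd, which helps when the configured log files are empty or missing; a small sketch:
$ systemctl status slurmd            # on a compute node
$ systemctl status slurmctld         # on the master node
$ sudo journalctl -u slurmd -n 50    # last 50 journal entries for slurmd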
$ sinfo --help
Usage: sinfo [OPTIONS]
-a, --all show all partitions (including hidden and those
not accessible)
-b, --bg show bgblocks (on Blue Gene systems)
-d, --dead show only non-responding nodes
-e, --exact group nodes only on exact match of configuration
--federation Report federated information if a member of one
-h, --noheader no headers on output
--hide do not show hidden or non-accessible partitions
-i, --iterate=seconds specify an iteration period
--local show only local cluster in a federation.
Overrides --federation.
-l, --long long output - displays more information
-M, --clusters=names clusters to issue commands to. Implies --local.
NOTE: SlurmDBD must be up.
-n, --nodes=NODES report on specific node(s)
--noconvert don't convert units from their original type
(e.g. 2048M won't be converted to 2G).
-N, --Node Node-centric format
-o, --format=format format specification
-O, --Format=format long format specification
-p, --partition=PARTITION report on specific partition
-r, --responding report only responding nodes
-R, --list-reasons list reason nodes are down or drained
-s, --summarize report state summary only
-S, --sort=fields comma separated list of fields to sort on
-t, --states=node_state specify the what states of nodes to view
-T, --reservation show only reservation information
-v, --verbose verbosity level
-V, --version output version information and exit
Help options:
--help show this help message
--usage display brief usage message
- Running jobs
$ srun python test.py
$ srun bash -c "python test.py"
$ sbatch --wrap="python test.py" # stdout is written to a file named "slurm-" + job ID, e.g. slurm-1234.out
There is a difference between running a job with srun and running it with sbatch.
- srun waits on the master for the job it launched to finish, so it keeps holding resources the whole time; with a very large number of jobs this can hang the master node or keep jobs from running at all.
- sbatch, in contrast, hands the job off to the compute nodes and does not wait for it to finish, so the problems seen with srun do not occur.
Strictly speaking, this difference concerns the running job itself. (Both srun and sbatch first put the job into the queue and then allocate it to a node.)
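For anything more than a one-liner, sbatch is usually given a batch script containing #SBATCH directives rather than --wrap. A minimal sketch (test.sh is a hypothetical script name; the partition, memory value, and test.py follow the examples on this page):
$ cat test.sh
#!/bin/bash
#SBATCH --job-name=test        # same as -J test
#SBATCH --partition=part1      # same as -p part1
#SBATCH --mem=20000            # memory per node, in MB
#SBATCH --cpus-per-task=2      # same as -c 2
#SBATCH --output=display.out   # stdout (default: slurm-%j.out, %j = job ID)
#SBATCH --error=display.err    # stderr
python test.py
$ sbatch test.sh               # prints "Submitted batch job <jobid>"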
Convenience options
srun option | Description
-J jobname | Name displayed when the job is sent to the queue.
--mem=20000 | Memory allocated to the job (per node), in MB. The number of jobs running concurrently on a node is determined by how much of the node's total memory each job is given; for example, on a node with 128 GB of memory, --mem=20000 (~20 GB) allows 6 jobs to run at the same time.
-Q | Suppress informational messages such as job allocation; error messages are still shown.
-N 2 | Number of nodes to use; the same command is run on 2 nodes at once.
-n 2 | Run the same command as 2 tasks.
-c 2 | Number of CPUs used per task.
-w nd-2 | Run the command on the node named nd-2.
-p part1 | If partitions have been defined, run the command only on the given partition.
-o display.out | srun normally prints stdout to the screen; write it to this file instead.
-e display.err | If the job produces errors, save the error output to this file.
$ srun -J test python test.py
$ squeue -S LIST -l
Mon Aug 17 14:44:36 2020
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
180193 part1 test user RUNNING 0:10 UNLIMITED 1 nd-1
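Several of these options can be combined in a single call; a sketch reusing the node, partition, and file names from the table above:
$ srun -J test -p part1 -w nd-2 -c 2 --mem=20000 -o display.out -e display.err python test.py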
$ for i in {1..6};do srun -J job${i} --mem=20000 bash -c "python test.py ${i}" & done
$ squeue -S LIST -l
Mon Aug 17 15:00:00 2020
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
180193 part1 job1 user RUNNING 0:01 UNLIMITED 1 nd-1
180194 part1 job3 user RUNNING 0:01 UNLIMITED 1 nd-1
180195 part1 job4 user RUNNING 0:01 UNLIMITED 1 nd-2
180196 part1 job2 user RUNNING 0:01 UNLIMITED 1 nd-2
180197 part1 job5 user PENDING 0:01 UNLIMITED 1 nd-1
180198 part1 job6 user PENDING 0:01 UNLIMITED 1 nd-2
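As an alternative sketch to the shell for-loop above, the same six jobs could be submitted as one job array via sbatch's -a/--array option (shown in the sbatch help below); SLURM_ARRAY_TASK_ID is the index Slurm sets for each array element:
$ sbatch --array=1-6 -J jobarr --mem=20000 --wrap='python test.py ${SLURM_ARRAY_TASK_ID}'
$ squeue -l   # each element appears as <jobid>_1 ... <jobid>_6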
$ srun -h
Usage: srun [OPTIONS...] executable [args...]
Parallel run options:
-A, --account=name charge job to specified account
--acctg-freq=<datatype>=<interval> accounting and profiling sampling
intervals. Supported datatypes:
task=<interval> energy=<interval>
network=<interval> filesystem=<interval>
--bb=<spec> burst buffer specifications
--bbf=<file_name> burst buffer specification file
--bcast=<dest_path> Copy executable file to compute nodes
--begin=time defer job until HH:MM MM/DD/YY
-c, --cpus-per-task=ncpus number of cpus required per task
--checkpoint=time job step checkpoint interval
--checkpoint-dir=dir directory to store job step checkpoint image
files
--comment=name arbitrary comment
--compress[=library] data compression library used with --bcast
--cpu-freq=min[-max[:gov]] requested cpu frequency (and governor)
-d, --dependency=type:jobid defer job until condition on jobid is satisfied
--deadline=time remove the job if no ending possible before
this deadline (start > (deadline - time[-min]))
--delay-boot=mins delay boot for desired node features
-D, --chdir=path change remote current working directory
--export=env_vars|NONE environment variables passed to launcher with
optional values or NONE (pass no variables)
-e, --error=err location of stderr redirection
--epilog=program run "program" after launching job step
-E, --preserve-env env vars for node and task counts override
command-line flags
--get-user-env used by Moab. See srun man page.
--gres=list required generic resources
--gres-flags=opts flags related to GRES management
-H, --hold submit job in held state
-i, --input=in location of stdin redirection
-I, --immediate[=secs] exit if resources not available in "secs"
--jobid=id run under already allocated job
-J, --job-name=jobname name of job
-k, --no-kill do not kill job on node failure
-K, --kill-on-bad-exit kill the job if any task terminates with a
non-zero exit code
-l, --label prepend task number to lines of stdout/err
--launch-cmd print external launcher command line if not SLURM
--launcher-opts= options for the external launcher command if not
SLURM
-L, --licenses=names required license, comma separated
-M, --clusters=names Comma separated list of clusters to issue
commands to. Default is current cluster.
Name of 'all' will submit to run on all clusters.
NOTE: SlurmDBD must up.
-m, --distribution=type distribution method for processes to nodes
(type = block|cyclic|arbitrary)
--mail-type=type notify on state change: BEGIN, END, FAIL or ALL
--mail-user=user who to send email notification for job state
changes
--mcs-label=mcs mcs label if mcs plugin mcs/group is used
--mpi=type type of MPI being used
--multi-prog if set the program name specified is the
configuration specification for multiple programs
-n, --ntasks=ntasks number of tasks to run
--nice[=value] decrease scheduling priority by value
-N, --nodes=N number of nodes on which to run (N = min[-max])
-o, --output=out location of stdout redirection
-O, --overcommit overcommit resources
--pack-group=value pack job allocation(s) in which to launch
application
-p, --partition=partition partition requested
--power=flags power management options
--priority=value set the priority of the job to value
--prolog=program run "program" before launching job step
--profile=value enable acct_gather_profile for detailed data
value is all or none or any combination of
energy, lustre, network or task
--propagate[=rlimits] propagate all [or specific list of] rlimits
--pty run task zero in pseudo terminal
--quit-on-interrupt quit on single Ctrl-C
-q, --qos=qos quality of service
-Q, --quiet quiet mode (suppress informational messages)
--reboot reboot block before starting job
-r, --relative=n run job step relative to node n of allocation
--restart-dir=dir directory of checkpoint image files to restart
from
-s, --oversubscribe over-subscribe resources with other jobs
-S, --core-spec=cores count of reserved cores
--signal=[B:]num[@time] send signal when time limit within time seconds
--slurmd-debug=level slurmd debug level
--spread-job spread job across as many nodes as possible
--switches=max-switches{@max-time-to-wait}
Optimum switches and max time to wait for optimum
--task-epilog=program run "program" after launching task
--task-prolog=program run "program" before launching task
--thread-spec=threads count of reserved threads
-T, --threads=threads set srun launch fanout
-t, --time=minutes time limit
--time-min=minutes minimum time limit (if distinct)
-u, --unbuffered do not line-buffer stdout/err
--use-min-nodes if a range of node counts is given, prefer the
smaller count
-v, --verbose verbose mode (multiple -v's increase verbosity)
-W, --wait=sec seconds to wait after first task exits
before killing job
--wckey=wckey wckey to run job under
-X, --disable-status Disable Ctrl-C status feature
Constraint options:
--cluster-constraint=list specify a list of cluster-constraints
--contiguous demand a contiguous range of nodes
-C, --constraint=list specify a list of constraints
--mem=MB minimum amount of real memory
--mincpus=n minimum number of logical processors (threads)
per node
--reservation=name allocate resources from named reservation
--tmp=MB minimum amount of temporary disk
-w, --nodelist=hosts... request a specific list of hosts
-x, --exclude=hosts... exclude a specific list of hosts
-Z, --no-allocate don't allocate nodes (must supply -w)
Consumable resources related options:
--exclusive[=user] allocate nodes in exclusive mode when
cpu consumable resource is enabled
or don't share CPUs for job steps
--exclusive[=mcs] allocate nodes in exclusive mode when
cpu consumable resource is enabled
and mcs plugin is enabled
or don't share CPUs for job steps
--mem-per-cpu=MB maximum amount of real memory per allocated
cpu required by the job.
--mem >= --mem-per-cpu if --mem is specified.
--resv-ports reserve communication ports
Affinity/Multi-core options: (when the task/affinity plugin is enabled)
-B, --extra-node-info=S[:C[:T]] Expands to:
--sockets-per-node=S number of sockets per node to allocate
--cores-per-socket=C number of cores per socket to allocate
--threads-per-core=T number of threads per core to allocate
each field can be 'min' or wildcard '*'
total cpus requested = (N x S x C x T)
--ntasks-per-core=n number of tasks to invoke on each core
--ntasks-per-socket=n number of tasks to invoke on each socket
Help options:
-h, --help show this help message
--usage display brief usage message
Other options:
-V, --version output version information and exit
$ sbatch -h
Usage: sbatch [OPTIONS...] executable [args...]
Parallel run options:
-a, --array=indexes job array index values
-A, --account=name charge job to specified account
--bb=<spec> burst buffer specifications
--bbf=<file_name> burst buffer specification file
--begin=time defer job until HH:MM MM/DD/YY
--comment=name arbitrary comment
--cpu-freq=min[-max[:gov]] requested cpu frequency (and governor)
-c, --cpus-per-task=ncpus number of cpus required per task
-d, --dependency=type:jobid defer job until condition on jobid is satisfied
--deadline=time remove the job if no ending possible before
this deadline (start > (deadline - time[-min]))
--delay-boot=mins delay boot for desired node features
-D, --chdir=directory set working directory for batch script
-e, --error=err file for batch script's standard error
--export[=names] specify environment variables to export
--export-file=file|fd specify environment variables file or file
descriptor to export
--get-user-env load environment from local cluster
--gid=group_id group ID to run job as (user root only)
--gres=list required generic resources
--gres-flags=opts flags related to GRES management
-H, --hold submit job in held state
--ignore-pbs Ignore #PBS options in the batch script
-i, --input=in file for batch script's standard input
-I, --immediate exit if resources are not immediately available
--jobid=id run under already allocated job
-J, --job-name=jobname name of job
-k, --no-kill do not kill job on node failure
-L, --licenses=names required license, comma separated
-M, --clusters=names Comma separated list of clusters to issue
commands to. Default is current cluster.
Name of 'all' will submit to run on all clusters.
NOTE: SlurmDBD must up.
-m, --distribution=type distribution method for processes to nodes
(type = block|cyclic|arbitrary)
--mail-type=type notify on state change: BEGIN, END, FAIL or ALL
--mail-user=user who to send email notification for job state
changes
--mcs-label=mcs mcs label if mcs plugin mcs/group is used
-n, --ntasks=ntasks number of tasks to run
--nice[=value] decrease scheduling priority by value
--no-requeue if set, do not permit the job to be requeued
--ntasks-per-node=n number of tasks to invoke on each node
-N, --nodes=N number of nodes on which to run (N = min[-max])
-o, --output=out file for batch script's standard output
-O, --overcommit overcommit resources
--profile=value enable acct_gather_profile for detailed data
value is all or none or any combination of
energy, lustre, network or task
--propagate[=rlimits] propagate all [or specific list of] rlimits
-q, --qos=qos quality of service
-Q, --quiet quiet mode (suppress informational messages)
--reboot reboot compute nodes before starting job
--requeue if set, permit the job to be requeued
-s, --oversubscribe over subscribe resources with other jobs
-S, --core-spec=cores count of reserved cores
--signal=[B:]num[@time] send signal when time limit within time seconds
--spread-job spread job across as many nodes as possible
--switches=max-switches{@max-time-to-wait}
Optimum switches and max time to wait for optimum
--thread-spec=threads count of reserved threads
-t, --time=minutes time limit
--time-min=minutes minimum time limit (if distinct)
--uid=user_id user ID to run job as (user root only)
--use-min-nodes if a range of node counts is given, prefer the
smaller count
-v, --verbose verbose mode (multiple -v's increase verbosity)
-W, --wait wait for completion of submitted job
--wckey=wckey wckey to run job under
--wrap[=command string] wrap command string in a sh script and submit
Constraint options:
--cluster-constraint=[!]list specify a list of cluster constraints
--contiguous demand a contiguous range of nodes
-C, --constraint=list specify a list of constraints
-F, --nodefile=filename request a specific list of hosts
--mem=MB minimum amount of real memory
--mincpus=n minimum number of logical processors (threads)
per node
--reservation=name allocate resources from named reservation
--tmp=MB minimum amount of temporary disk
-w, --nodelist=hosts... request a specific list of hosts
-x, --exclude=hosts... exclude a specific list of hosts
Consumable resources related options:
--exclusive[=user] allocate nodes in exclusive mode when
cpu consumable resource is enabled
--exclusive[=mcs] allocate nodes in exclusive mode when
cpu consumable resource is enabled
and mcs plugin is enabled
--mem-per-cpu=MB maximum amount of real memory per allocated
cpu required by the job.
--mem >= --mem-per-cpu if --mem is specified.
Affinity/Multi-core options: (when the task/affinity plugin is enabled)
-B --extra-node-info=S[:C[:T]] Expands to:
--sockets-per-node=S number of sockets per node to allocate
--cores-per-socket=C number of cores per socket to allocate
--threads-per-core=T number of threads per core to allocate
each field can be 'min' or wildcard '*'
total cpus requested = (N x S x C x T)
--ntasks-per-core=n number of tasks to invoke on each core
--ntasks-per-socket=n number of tasks to invoke on each socket
Help options:
-h, --help show this help message
-u, --usage display brief usage message
Other options:
-V, --version output version information and exit
$ squeue -h
Usage: squeue [OPTIONS]
-A, --account=account(s) comma separated list of accounts
to view, default is all accounts
-a, --all display jobs in hidden partitions
--array-unique display one unique pending job array
element per line
--federation Report federated information if a member
of one
-h, --noheader no headers on output
--hide do not display jobs in hidden partitions
-i, --iterate=seconds specify an interation period
-j, --job=job(s) comma separated list of jobs IDs
to view, default is all
--local Report information only about jobs on the
local cluster. Overrides --federation.
-l, --long long report
-L, --licenses=(license names) comma separated list of license names to view
-M, --clusters=cluster_name cluster to issue commands to. Default is
current cluster. cluster with no name will
reset to default. Implies --local.
-n, --name=job_name(s) comma separated list of job names to view
--noconvert don't convert units from their original type
(e.g. 2048M won't be converted to 2G).
-o, --format=format format specification
-O, --Format=format format specification
-p, --partition=partition(s) comma separated list of partitions
to view, default is all partitions
-q, --qos=qos(s) comma separated list of qos's
to view, default is all qos's
-R, --reservation=name reservation to view, default is all
-r, --array display one job array element per line
--sibling Report information about all sibling jobs
on a federated cluster. Implies --federation.
-s, --step=step(s) comma separated list of job steps
to view, default is all
-S, --sort=fields comma separated list of fields to sort on
--start print expected start times of pending jobs
-t, --states=states comma separated list of states to view,
default is pending and running,
'--states=all' reports all states
-u, --user=user_name(s) comma separated list of users to view
--name=job_name(s) comma separated list of job names to view
-v, --verbose verbosity level
-V, --version output version information and exit
-w, --nodelist=hostlist list of nodes to view, default is
all nodes
Help options:
--help show this help message
--usage display a brief summary of squeue options
/etc/slurm/slurm.conf
SlurmctldHost=g-master
SlurmctldParameters=enable_configless
#SlurmctldHost=
#
AuthType=auth/munge
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=999999
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobCheckpointDir=/var/lib/slurm-llnl/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=1
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/usr/bin/mail
#MaxJobCount=5000
#MaxStepCount=40000
#MaxTasksPerNode=128
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/cgroup
#Prolog=
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
#SallocDefaultCommand=
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/none
#TaskPlugin=task/affinity
#TaskPluginParam=Sched
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
# SCHEDULING
#FastSchedule=1
SchedulerType=sched/backfill
#SelectType=select/cons_tres
#SelectTypeParameters=CR_CPU_Memory,CR_CORE_DEFAULT_DIST_BLOCK
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
SchedulerParameters=max_rpc_cnt=0
MessageTimeout=30
# JOB PRIORITY
#PriorityFlags=
#PriorityType=priority/basic
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
# LOGGING AND ACCOUNTING
AccountingStorageEnforce=limits
AccountingStorageType=accounting_storage/slurmdbd
#AccountingStoragePort=7031
AccountingStoreJobComment=YES
AccountingStorageUser=slurm
ClusterName=NGScluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
JobCompType=jobcomp/mysql
JobCompLoc=slurm_comp_db
JobCompUser=slurm
JobCompPass=SLMbio0912$
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
MaxArraySize=1000000
MaxJobCount=1000000
MaxStepCount=1000000
MaxTasksPerNode=65500
#OverSubscribe=FORCE:40
#MaxCPUsPerTask=unlimit
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
GresTypes=gpu
# COMPUTE NODES
#NodeName=g-master CPUs=4 RealMemory=28000 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=adm-022 CPUs=40 RealMemory=200000 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN
NodeName=g-[11-12] CPUs=6 RealMemory=120000 Sockets=1 CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN Gres=gpu:1
# Partition Configurations
PartitionName=all Nodes=adm-022,g-[11-12] Default=YES MaxTime=INFINITE State=UP OverSubscribe=Yes
#PartitionName=master Nodes=g-master Default=NO MaxTime=INFINITE State=UP OverSubscribe=Yes
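After editing slurm.conf on the master node (in configless mode the compute nodes pick up the change automatically), the running daemons can be told to re-read it; a minimal sketch:
$ sudo scontrol reconfigure                  # re-read slurm.conf without restarting the daemons
$ scontrol show config | grep -i selecttype  # verify the value the controller is actually using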
/etc/slurm/gres.conf
##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Define GPU devices with MPS support
##################################################################
#AutoDetect=nvml
NodeName=g-[11-12] Name=gpu File=/dev/nvidia0
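To check that the GPU GRES defined above is visible to Slurm, the node record can be inspected and a GPU requested explicitly; a sketch assuming the node names above and that nvidia-smi is installed on the node:
$ scontrol show node g-11 | grep -i gres     # the node record should list Gres=gpu:1
$ srun -w g-11 --gres=gpu:1 nvidia-smi       # run on g-11 with one GPU allocated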
Reference
- Well-organized installation guide: https://wonwooddo.tistory.com/35
- KISTI SLURM administrator/user guide: https://repository.kisti.re.kr/bitstream/10580/6542/1/2014-147%20Slurm%20%EA%B4%80%EB%A6%AC%EC%9E%90%20%EC%9D%B4%EC%9A%A9%EC%9E%90%20%EA%B0%80%EC%9D%B4%EB%93%9C.pdf
- https://dandyrilla.github.io/2017-04-11/jobsched-slurm/
- https://curc.readthedocs.io/en/latest/running-jobs/slurm-commands.html