Description of Slurm partitions

The following subsections sum up the parameters of the different partitions available on ruche.

Note : partition is the Slurm equivalent of a PBS queue.

Default partition

Unlike PBS (the previous resource manager on Fusion), Slurm does not allow the definition of a queue/partition that automatically routes jobs depending on the resources requested.

You will have to choose the partition in which to submit your jobs. This is done through the directive #SBATCH -p <partition name>

If you don't select a partition, your job will be routed to cpu_short.
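As an illustration, a minimal submission script selecting a partition explicitly might look like this (the job name and payload are placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=demo    # placeholder job name
#SBATCH -p cpu_med         # target partition; jobs go to cpu_short if omitted
#SBATCH --time=02:00:00    # must not exceed the partition's Max Walltime

# Payload: replace with the actual computation
echo "Running on partition: ${SLURM_JOB_PARTITION:-cpu_short}"
```

Submit with `sbatch script.sh`; the `-p` flag can also be given on the command line (`sbatch -p cpu_long script.sh`), which overrides the directive in the script.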

CPU partitions

These are the common partitions for CPU computing on ruche.

Resources

See CPU resources for complete description of CPU nodes.

Pool name | Nb of nodes | Nb of cores/node | Max. memory per node
----------|-------------|------------------|---------------------
CPU       | 216         | 40               | 180 000 MB

Note : 180 000 MB is about 175 GB

Partitions

Partition name | Max Walltime | Max Jobs per user* | Max CPU per user* | Max Mem per user* | Node equivalent
---------------|--------------|--------------------|-------------------|-------------------|----------------
cpu_short      | 01:00:00     | 1                  | 1000              | 4 500 000 MB      | 25 nodes
cpu_med        | 04:00:00     | -                  | 1000              | 4 500 000 MB      | 25 nodes
cpu_long       | 168:00:00    | -                  | 160               | 720 000 MB        | 4 nodes
cpu_prod**     | 06:00:00     | -                  | 2000              | 9 000 000 MB      | 50 nodes
cpu_scale**    | 01:00:00    | 1                  | 4000              | 18 000 000 MB     | 100 nodes

* : This rule applies to the running jobs of a user on the specified partition. It does not apply to queued jobs or to jobs submitted on other partitions.

** : the partitions marked with two asterisks are only allowed to run jobs:

  • at night on weekdays (from 8 pm to 8 am)
  • all day during the weekend
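As a sketch, a multi-node job sized to the cpu_long per-user caps listed above could be requested as follows (the payload is a placeholder):

```shell
#!/bin/bash
#SBATCH -p cpu_long
#SBATCH --nodes=4              # 4 nodes x 40 cores = 160 CPUs, the cpu_long per-user cap
#SBATCH --ntasks-per-node=40   # one task per core
#SBATCH --mem=180000M          # per node; the 180 000 MB node maximum
#SBATCH --time=168:00:00       # cpu_long Max Walltime

srun hostname                  # placeholder payload: prints one hostname per task
```

Requesting exactly the per-user cap means a second job on the same partition will wait until this one finishes.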

Mem partitions

These are the partitions for shared-memory CPU computing on nodes with a large amount of memory or a large number of cores:

  • for very memory intensive jobs that require more than 4 GB per core or more than 175 GB per node,
  • or for shared-memory jobs that require more than 40 cores on a single node.
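For instance, a sketch of a single-node shared-memory job on the mem partition, using all 80 cores and the full node memory (the payload is a placeholder):

```shell
#!/bin/bash
#SBATCH -p mem
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=80     # all 80 cores of a mem node for one shared-memory process
#SBATCH --mem=1540000M         # far more than the 180 000 MB available on a CPU node
#SBATCH --time=24:00:00        # within the 72:00:00 limit of the mem partition

echo "Cores available to the job: ${SLURM_CPUS_PER_TASK:-80}"  # placeholder payload
```

Jobs fitting within 40 cores and 180 000 MB belong on the cpu_* partitions instead.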

Resources

See Mem resources and Fusion resources for complete description of Mem nodes.

Pool name | Nb of nodes | Nb of cores/node  | Max. memory per node
----------|-------------|-------------------|---------------------
mem       | 14          | 80                | 1 540 000 MB
fusion*   | 20          | 80 (x4), 40 (x16) | 1 540 000 MB (x1), 756 000 MB (x3), 180 000 MB (x16)

Note : 1 540 000 MB is about 1500 GB

* : this pool is made up of heterogeneous compute nodes. Check Hardware description (fusion resources) for further details.

Partitions

Partition name | Pool   | Max Walltime | Max Jobs per user* | Max CPU per user* | Max Mem per user* | Node equivalent
---------------|--------|--------------|--------------------|-------------------|-------------------|----------------
mem_short      | mem    | 01:00:00     | 1                  | 80                | 1 540 000 MB      | 1 node
mem            | mem    | 72:00:00     | -                  | 160               | 3 080 000 MB      | 2 nodes
fusion_shm     | fusion | 168:00:00    | -                  | 160               | 720 000 MB        | 4 fusion 768 GB nodes
fusion_shm_big | fusion | 168:00:00    | 1                  | 320               | 1 440 000 MB      | 8 fusion 192 GB nodes

* : This rule applies to the running jobs of a user on the specified partition. It does not apply to queued jobs or to jobs submitted on other partitions.

GPU partition

These partitions are meant for GPU computing and direct jobs onto GPU resources.

Resources

See GPU resources for a complete description of GPU nodes.

Pool name | Nb of nodes | Nb of cores/node | Max. memory per node | GPUs per node | GPU reference
----------|-------------|------------------|----------------------|---------------|------------------
V100      | 10          | 40               | 756 000 MB           | 4             | Nvidia Tesla V100
A100      | 9           | 32               | 1 012 000 MB         | 4             | Nvidia HGX A100
P100      | 4           | 24               | 180 000 MB           | 2             | Nvidia Tesla P100

Partitions

Partition name | Pool | Max Walltime | Max Jobs per user* | Max GPU per user* | Max CPU per GPU reserved**
---------------|------|--------------|--------------------|-------------------|---------------------------
gpu_test       | V100 | 01:00:00     | 1                  | 8                 | 10
gpu            | V100 | 24:00:00     | -                  | 8                 | 10
gpua100        | A100 | 24:00:00     | -                  | 4                 | 8
gpup100        | P100 | 168:00:00    | -                  | -                 | 12

* : This rule applies to the running jobs of a user on the specified partition. It does not apply to queued jobs or to jobs submitted on other partitions.

** : This limit is local to the job. For example, a job requesting 2 GPUs cannot request more than 20 CPU cores.
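As a sketch, a job on the gpu partition reserving 2 GPUs can pair them with at most 20 cores (2 GPUs x 10 CPUs per GPU; the payload is a placeholder):

```shell
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:2           # 2 V100 GPUs out of the 4 available per node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20     # 2 GPUs x 10 CPUs per GPU, the local job limit
#SBATCH --time=24:00:00        # gpu partition Max Walltime

nvidia-smi                     # placeholder payload: lists the GPUs visible to the job
```

On gpua100, the same 2-GPU request would be capped at 16 cores (2 x 8).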

SLURM Command

The sinfo Slurm command lists the partitions available on the supercomputer.

$ sinfo
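A few common variants may be useful (these require a working Slurm installation; the format specifiers shown are standard sinfo options):

```shell
# List all partitions with their node states and time limits
sinfo

# Restrict output to a single partition
sinfo -p cpu_long

# Custom layout: partition name, walltime limit, node count
sinfo --format="%P %l %D"
```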