Description of Slurm partitions
The following subsections summarize the parameters of the different partitions available on ruche.
Note : partition is the Slurm term for what PBS calls a queue.
Default partition
Unlike PBS (the previous resource manager on Fusion), Slurm does not allow the definition of a queue/partition that automatically routes jobs according to the requested resources.
You have to choose the partition in which to submit your jobs. This is done through the directive #SBATCH -p <partition name>.
If you do not select a partition, your job will be sent to cpu_short.
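As an illustration, a minimal submission script that explicitly selects the cpu_med partition could look like the following sketch (the job name, resource values and executable are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=my_job        # placeholder job name
#SBATCH -p cpu_med               # explicit partition choice (no automatic routing)
#SBATCH --ntasks=40              # resource request, to adapt to your application
#SBATCH --time=02:00:00          # must stay below the Max Walltime of the chosen partition

srun ./my_program                # placeholder executable
```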
CPU partitions
These are the common partitions for CPU computing on ruche.
Resources
See CPU resources for a complete description of the CPU nodes.
Pool name | Nb of nodes | Nb of cores/node | Max. memory per node |
---|---|---|---|
CPU | 216 | 40 | 180 000 MB |
Note : 180 000 MB is about 175 GB
Partitions
Partition name | Max Walltime | Max Jobs per user* | Max CPU per user* | Max Mem per user* | Node equivalent |
---|---|---|---|---|---|
cpu_short | 01:00:00 | 1 | 1000 | 4 500 000 MB | CPU/Mem limits equivalent to 25 nodes |
cpu_med | 04:00:00 | - | 1000 | 4 500 000 MB | CPU/Mem limits equivalent to 25 nodes |
cpu_long | 168:00:00 | - | 160 | 720 000 MB | CPU/Mem limits equivalent to 4 nodes |
cpu_prod** | 06:00:00 | - | 2000 | 9 000 000 MB | CPU/Mem limits equivalent to 50 nodes |
cpu_scale** | 01:00:00 | 1 | 4000 | 18 000 000 MB | CPU/Mem limits equivalent to 100 nodes |
* : This rule applies to a user's running jobs on the specified partition. It does not apply to queued jobs or to jobs submitted on other partitions.
** : the cpu_prod partition is allowed to run jobs :
- at night on weekdays (from 8:00 pm to 8:00 am),
- all day long during the weekend.
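For instance, a production job fitting these limits could be sketched as follows (the node count and executable are illustrative placeholders):

```bash
#!/bin/bash
#SBATCH -p cpu_prod              # production partition (runs at night and during the weekend)
#SBATCH --nodes=10               # 10 x 40 cores = 400 cores, well below the 2000-CPU user limit
#SBATCH --ntasks-per-node=40     # one MPI task per core of a 40-core CPU node
#SBATCH --time=06:00:00          # Max Walltime of cpu_prod

srun ./my_mpi_program            # placeholder MPI executable
```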
Mem partitions
These partitions are meant for shared-memory CPU computing that needs a large amount of memory or a large number of cores on a single node:
- for very memory-intensive jobs that require more than 4 GB per core or more than 175 GB per node,
- or for shared-memory jobs that require more than 40 cores on a single node (see the sketch after this list).
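A possible sketch of such a job on the mem partition, with illustrative resource values and a placeholder executable, is:

```bash
#!/bin/bash
#SBATCH -p mem                   # large-memory partition (see the tables below)
#SBATCH --nodes=1                # shared-memory job: a single node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=80       # more cores than the 40 of a standard CPU node
#SBATCH --mem=500000M            # more memory than the 180 000 MB of a standard CPU node
#SBATCH --time=24:00:00          # below the 72 h limit of the mem partition

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # standard Slurm environment variable
srun ./my_threaded_program       # placeholder shared-memory executable
```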
Resources
See Mem resources and Fusion resources for a complete description of the Mem and Fusion nodes.
Pool name | Nb of nodes | Nb of cores/node | Max. memory per node |
---|---|---|---|
mem | 14 | 80 | 1 540 000 MB |
fusion* | 20 | 80 (x4), 40 (x16) | 1 500 000 MB (x1), 756 000 MB (x3), 180 000 MB (x16) |
Note : 1 540 000 MB is about 1500 GB, 1 500 000 MB is about 1460 GB
* : this pool is made up of heterogeneous compute nodes. Check Hardware description (fusion resources) for further details.
Partitions
Partition name | Pool | Max Walltime | Max Jobs per user* | Max CPU per user* | Max Mem per user* | Node equivalent |
---|---|---|---|---|---|---|
mem_short | mem | 01:00:00 | 1 | 80 | 1 540 000 MB | CPU/Mem limits equivalent to 1 node |
mem | mem | 72:00:00 | - | 160 | 3 080 000 MB | CPU/Mem limits equivalent to 2 nodes |
fusion_shm | fusion | 168:00:00 | - | 160 | 720 000 MB | CPU/Mem limits equivalent to 4 fusion 768 GB nodes |
fusion_shm_big | fusion | 168:00:00 | 1 | 320 | 1 440 000 MB | CPU/Mem limits equivalent to 8 fusion 192 GB nodes |
* : This rule applies to a user's running jobs on the specified partition. It does not apply to queued jobs or to jobs submitted on other partitions.
GPU partitions
These partitions are meant for GPU computing and direct jobs onto the GPU resources.
Resources
See GPU resources for a complete description of the GPU nodes.
Pool name | Nb of nodes | Nb of cores/node | Max. memory per node | GPUs per node | GPU Reference |
---|---|---|---|---|---|
V100 | 10 | 40 | 756 000 MB | 4 | Nvidia Tesla V100 |
A100 | 9 | 32 | 1 012 000 MB | 4 | Nvidia HGX A100 |
P100 | 4 | 24 | 180 000 MB | 2 | Nvidia Tesla P100 |
Partitions
Partition name | Pool | Max Walltime | Max Jobs per user* | Max GPU per user* | Max CPU per GPU reserved** |
---|---|---|---|---|---|
gpu_test | V100 | 01:00:00 | 1 | 8 | 10 |
gpu | V100 | 24:00:00 | - | 8 | 10 |
gpua100 | A100 | 24:00:00 | - | 4 | 8 |
gpup100 | P100 | 168:00:00 | - | - | 12 |
* : This rule applies to a user's running jobs on the specified partition. It does not apply to queued jobs or to jobs submitted on other partitions.
** : This rule is local to each job. For example, a job requesting 2 GPUs cannot request more than 20 CPU cores.
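As an illustration of this rule, a job on the gpu partition requesting 2 GPUs could be sketched as follows (the executable is a placeholder):

```bash
#!/bin/bash
#SBATCH -p gpu                   # V100 pool, 24 h Max Walltime
#SBATCH --gres=gpu:2             # 2 GPUs for this job
#SBATCH --cpus-per-task=20       # at most 10 CPU cores per reserved GPU on this partition
#SBATCH --time=12:00:00          # below the 24 h limit of the gpu partition

srun ./my_gpu_program            # placeholder GPU executable
```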
SLURM Command
The sinfo SLURM command lists the available partitions of the supercomputer.
$ sinfo
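The output can also be restricted to a given partition, or summarized with one line per partition, using standard sinfo options:

```bash
$ sinfo -p cpu_med               # show only the cpu_med partition
$ sinfo -s                       # print one summary line per partition
```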