SLURM: Node lists

Most member nodes of large clusters are named using a simple repetitive pattern: a basename followed by a number (even though they may also have other aliases). Such names are easier to generate in scripts than unique names, lend themselves to abbreviated forms, and generally indicate that the nodes are related members of a cluster.

When host names follow this ${string}${number} pattern, Slurm commands can list the nodes used by a job or command in a compact notation in which square brackets ([ and ]) delimit lists and/or ranges of the numeric values. For example, node[1-3,7] stands for node1, node2, node3 and node7.

This compressed form is useful for compact displays, and it is generally easier to type on the command line than a long list of hostnames.

expand a compact list of hostnames

Here is how to expand a compact list into full names, one per line. The command only performs string manipulation, so the expanded node names do not need to exist; the example works wherever scontrol is installed, not just on a cluster where these hostnames exist:

$ scontrol show hostnames 'fy24-[1-3,5-9],fy25-[1,4,8]'
fy24-1
fy24-2
fy24-3
fy24-5
fy24-6
fy24-7
fy24-8
fy24-9
fy25-1
fy25-4
fy25-8

Leading zeros set the minimum number of digits of the numbers in a range (xargs -n 10 is only used here to print ten names per line):

$ scontrol show hostnames 'pgh[0001-20,0100,0200]' | xargs -n 10
pgh0001 pgh0002 pgh0003 pgh0004 pgh0005 pgh0006 pgh0007 pgh0008 pgh0009 pgh0010
pgh0011 pgh0012 pgh0013 pgh0014 pgh0015 pgh0016 pgh0017 pgh0018 pgh0019 pgh0020
pgh0100 pgh0200
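To make the bracket notation concrete, here is a minimal pure-bash sketch of the expansion. The function name expand_hostlist is hypothetical (it is not part of Slurm), and it assumes each name contains at most one bracket group whose items are N or N-M; treat scontrol show hostnames as the reference implementation.

```bash
#!/usr/bin/env bash
# expand_hostlist: hypothetical helper for illustration; NOT part of Slurm.
# Assumes comma-separated entries that are plain names or prefix[list]suffix
# with ONE bracket group; list items are N or N-M. Each range keeps the
# digit count of its first bound as a minimum width (zero padding).
expand_hostlist() {
  local input=$1 entry="" depth=0 ch i
  local -a entries=() items=()
  local re='^([^[]*)\[([^]]*)](.*)$'
  local prefix body suffix item lo hi width n

  # Split the input on commas that are outside [...].
  for (( i = 0; i < ${#input}; i++ )); do
    ch=${input:i:1}
    case $ch in
      '[') depth=$((depth + 1)); entry+=$ch ;;
      ']') depth=$((depth - 1)); entry+=$ch ;;
      ',') if (( depth == 0 )); then entries+=("$entry"); entry=""; else entry+=$ch; fi ;;
      *) entry+=$ch ;;
    esac
  done
  if [[ -n $entry ]]; then entries+=("$entry"); fi

  for entry in "${entries[@]}"; do
    if [[ $entry =~ $re ]]; then
      prefix=${BASH_REMATCH[1]} body=${BASH_REMATCH[2]} suffix=${BASH_REMATCH[3]}
      IFS=, read -ra items <<< "$body"
      for item in "${items[@]}"; do
        # A single number N is treated as the range N-N.
        lo=${item%%-*} hi=${item##*-} width=${#lo}
        # 10# forces base 10 so that "0008" is not read as octal.
        for (( n = 10#$lo; n <= 10#$hi; n++ )); do
          printf '%s%0*d%s\n' "$prefix" "$width" "$n" "$suffix"
        done
      done
    else
      # No brackets: the entry is already a full hostname.
      printf '%s\n' "$entry"
    fi
  done
}
```

For example, expand_hostlist 'pgh[0001-20,0100,0200]' | xargs -n 10 should print the same three lines as the scontrol example above.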

create a compact list from a list of hostnames

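The reverse operation is performed by "scontrol show hostlist", which generates a compact list from a list of full hostnames. Inside double quotes a backslash at the end of a line joins it to the next line, so a long list can be split across lines (inside single quotes the backslash and newline would be kept literally):

$ scontrol show hostlist "fy21-24-1,fy21-24-2,fy21-24-3,\
fy21-24-5,fy21-24-6,fy21-24-7,fy21-24-8,fy21-24-9,\
fy21-25-1,fy21-25-4,fy21-25-8"
fy21-24-[1-3,5-9],fy21-25-[1,4,8]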
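The reverse direction can be sketched in the same spirit. compress_hostlist is again a hypothetical helper, not a Slurm command: it splits each name into a prefix and a trailing number, merges consecutive numbers that share a prefix and zero padding into N-M ranges, and assumes that names with the same prefix are already adjacent and in ascending order.

```bash
#!/usr/bin/env bash
# compress_hostlist: hypothetical helper for illustration; NOT part of Slurm.
# Assumption: names sharing a prefix are adjacent and in ascending order.
compress_hostlist() {
  printf '%s\n' "$@" | awk '
    # Add a finished item to the comma-separated output.
    function add_out(s) { out = out (out == "" ? "" : ",") s }
    # Close the open run ("lo" or "lo-hi") and append it to the current group.
    function flush_range() {
      if (lo == "") return
      items = items (items == "" ? "" : ",") (lo == hi ? lo : lo "-" hi)
      nitems++
      if (lo != hi) ranged = 1
      lo = ""
    }
    # Emit prefix[items], or a plain name when the group holds one number.
    function flush_group() {
      flush_range()
      if (items == "") return
      add_out((nitems == 1 && !ranged) ? prefix items : prefix "[" items "]")
      items = ""; nitems = 0; ranged = 0
    }
    {
      # Names without a trailing number are passed through unchanged.
      if (!match($0, /[0-9]+$/)) { flush_group(); started = 0; add_out($0); next }
      p = substr($0, 1, RSTART - 1); d = substr($0, RSTART); n = d + 0
      if (!started || p != prefix) {
        flush_group(); prefix = p; started = 1
      } else if (lo != "" && n == hi_n + 1 && sprintf("%0" w "d", n) == d) {
        hi = d; hi_n = n; next   # next integer with the same padding: extend run
      } else {
        flush_range()
      }
      lo = d; hi = d; hi_n = n; w = length(d)
    }
    END { flush_group(); print out }'
}
```

Called with the eleven hostnames from the scontrol example above, it prints fy21-24-[1-3,5-9],fy21-25-[1,4,8]. A name only extends the current run if it is the next integer and prints with the run's width, which keeps padded names such as pgh0009 and pgh0010 in one range.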

list all the nodes in a cluster (compact and expanded)

The sinfo(1) command can print many subsets of the nodes in a cluster, for example by partition or by node state. To list every node in the current cluster:

 # list all nodes of the current cluster
 # (--Node prints one line per node and partition; sort -u drops repeats)
 sinfo --Node --format='%n' --noheader | sort -u  # expanded
 sinfo --format='%N' --noheader                   # compact

 # A more generic example shows that a list of nodes external to sinfo(1)
 # can be compressed (although in this case sinfo(1) is used to generate
 # the list):

 scontrol show hostlist "$(sinfo --format='%n' --noheader | xargs)"

list all the nodes in a job

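The list of nodes a job spans is often useful, for example as a machine file for a parallel launcher. From inside a job (where SLURM_JOB_ID is set), use one of: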

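    # expanded, one hostname per line (%N is the list of allocated nodes)
    scontrol show hostnames "$(squeue --jobs="$SLURM_JOB_ID" --format='%N' --noheader)" > machine_file
    # compact
    squeue --jobs="$SLURM_JOB_ID" --format='%N' --noheader > machine_file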
