SLURM: care and feeding of a node

show information about node state

scontrol show node=mercury
scontrol -a show node=mercury

# show all node information
scontrol --all show nodes
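
When only the state matters, the State= field can be picked out of the scontrol output. This is a minimal sketch, assuming the usual whitespace-separated key=value layout of "scontrol show node"; node_state is a made-up helper name, not a Slurm command.

```shell
# node_state is a hypothetical helper, not part of Slurm.
# Prints the State= value for one node (for example IDLE or MIXED+DRAIN).
node_state() {
    # split the output into one key=value token per line,
    # then keep the first token that starts with State=
    scontrol show node="$1" | tr -s ' ' '\n' | grep '^State=' | head -n 1 | cut -d= -f2
}

# usage: node_state mercury
```

Matching on ^State= after splitting avoids picking up other keys that merely end in "State".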

close a node to new jobs

# Slurm requires a Reason when draining a node
scontrol update NodeName=mercury State=DRAIN Reason="closed to new jobs"

open a node to new jobs

# RESUME returns a drained or down node to service
scontrol update NodeName=mercury State=RESUME

close a node for maintenance with a reason

export INITIALS=JSU  # a local convention for administrators who all log in as root
# the reason should identify the admin who closed the node and the date
scontrol update NodeName=mercury State=FAILING Reason="bad IB board:${INITIALS:-${SUDO_USER:-$USER}}:$(date)"
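
The reason convention above can be wrapped in a small function so the admin and date are always filled in the same way. This is only a sketch: drain_node is a hypothetical name, it uses State=DRAIN rather than FAILING, and the ISO date format is an assumption chosen to keep the reason short.

```shell
# drain_node is a hypothetical helper, not part of Slurm.
# $1 = node name, $2 = free-text reason
# INITIALS follows the local convention and falls back to SUDO_USER, then USER.
drain_node() {
    scontrol update NodeName="$1" State=DRAIN \
        Reason="$2:${INITIALS:-${SUDO_USER:-$USER}}:$(date +%Y-%m-%d)"
}

# usage: drain_node mercury "bad IB board"
```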

list jobs on a node

squeue --nodelist=mercury

# sinfo(1) views information about Slurm nodes and partitions; it shows the node's state, not its jobs
sinfo --nodes=mercury

kill jobs running on a node

scancel --nodelist=mercury
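
Because scancel acts immediately, it can help to look at what is running before killing it. This is a minimal sketch of a list-first wrapper; cancel_node_jobs and its "yes" argument are hypothetical, not Slurm features.

```shell
# cancel_node_jobs is a hypothetical helper, not part of Slurm.
# $1 = node name; pass "yes" as $2 to actually cancel, otherwise it only lists.
cancel_node_jobs() {
    # %i = job ID, %u = user, %T = job state
    squeue --nodelist="$1" --noheader --format='%i %u %T'
    if [ "${2:-}" = yes ]; then
        scancel --nodelist="$1"
    fi
}

# usage: cancel_node_jobs mercury        # list only
#        cancel_node_jobs mercury yes    # list, then cancel
```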
