SLURM: user tips

Memory requests

Groups may find themselves with jobs pending because the group has reached its memory limit (the pending reason shown is QOSGrpMemLimit).

While it is important to request somewhat more memory than a job will actually use (a 10-20% margin is usually sufficient), requesting far more than that only reduces the number of jobs a group can run as well as overall throughput on the cluster. Many groups, and the user community as a whole, will be able to run far more jobs if memory requests are kept to realistic amounts.

The sacct(1) command can show you the high-water mark (the maximum memory the job actually used), which can then be used to adjust the memory requests of future jobs.
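
For instance (a minimal sketch; the job ID 123456 is made up), compare the MaxRSS column, the high-water mark, with the ReqMem column, the amount requested:

# show what a finished job actually used versus what it requested
sacct -j 123456 --format=JobID,JobName,Elapsed,ReqMem,MaxRSS

Note that MaxRSS is reported on the job-step lines (such as the .batch step) rather than on the job's summary line.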

Example

The SLURM directives for memory requests are --mem (memory per node) and --mem-per-cpu (memory per allocated CPU).
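
A minimal sketch of a batch script using one of these directives (the job name, sizes, and program are made-up values):

#!/bin/bash
#SBATCH --job-name=memdemo
#SBATCH --ntasks=1
#SBATCH --mem=4G               # total memory per node
# alternatively, size the request per allocated CPU instead:
##SBATCH --mem-per-cpu=1G
./myprogram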

See the sbatch(1) man page for the full syntax of these options.

It is in the user’s best interest to adjust the memory request to a realistic value.

A job that starts paging (swapping) will run very slowly -- how can you tell that this is happening?

When does setting the request too low cause problems, and when can a job be killed?

An application that can use more memory will simply use more memory; only when the job crosses the limit based on its memory request does SLURM kill the job.
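
One way to check after the fact whether a job was killed for exceeding its request (a sketch; the job ID is made up, and the exact state name depends on the site's configuration and SLURM version) is to look at its final state:

# a job killed for exceeding its memory request typically ends in a
# failed state such as OUT_OF_MEMORY rather than COMPLETED
sacct -j 123456 --format=JobID,State,ExitCode,ReqMem,MaxRSS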

The pathname to stdout of the job

Here are three ways using bash(1) to get the pathname of the output file:

# 1) ask scontrol for the job's StdOut field
declare -x $(scontrol show jobid=$SLURM_JOBID | grep StdOut)

# 2) ask squeue (suppress the header line)
StdOut=$(squeue --noheader --Format=STDOUT --job $SLURM_JOBID)

# 3) resolve file descriptor 1 of the batch task via /proc
StdOut=$(realpath /proc/$SLURM_TASK_PID/fd/1)

If you query Slurm you have to worry about scalability: is it OK for 100,000 jobs to query Slurm simultaneously via squeue(1) or scontrol(1), and is Slurm even responding? The biggest problem is that not all filename macros are expanded: if you used something like "#SBATCH --output %J.out", "%J" is not expanded by the Slurm commands (although "%j" is). So commands like stat(1), ls(1), and find(1), getting PIDs with fuser(1), pidof(1), or ps(1), and working with full pathnames become attractive. I find that using the realpath(1) command at the top of the job together with $SLURM_TASK_PID works best. realpath(1), or the /proc/$PID/fd/1 file, may not be available on all platforms, in which case the other commands might be used to the same effect.
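
A sketch of what that looks like at the top of a batch script (falling back to scontrol is just one possible choice):

#!/bin/bash
#SBATCH --output=%j.out
# capture the pathname of this job's stdout once, at the start of the job
if command -v realpath > /dev/null && [ -e /proc/$SLURM_TASK_PID/fd/1 ]
then
   StdOut=$(realpath /proc/$SLURM_TASK_PID/fd/1)
else
   # fall back to querying Slurm itself
   declare -x $(scontrol show jobid=$SLURM_JOBID | grep StdOut)
fi
export StdOut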

SLURM_STDOUT, SLURM_STDERR, and SLURM_STDIN would be useful as standard environment variables for batch jobs. I might want the job itself to mail the output, archive it, or search it for certain strings with grep(1), for example (see the sketch after the softlink example below).

# create softlink OUTPUT to stdout of job
ln -s -f $StdOut OUTPUT
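
Once the pathname has been captured, the job can post-process its own output at the end of the script; a sketch (the search string and mail address are made up):

# search the job's own output for errors
grep -i 'error' "$StdOut" > errors.txt

# or mail the output when the job finishes
mail -s "output of job $SLURM_JOBID" user@example.com < "$StdOut"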

To minimize network traffic and file space on central servers, I might direct the output of a job to local storage or a memory-resident file system and then move it to another system (perhaps with scp(1)) or to a long-term archive, instead of to a central area that might be filled by other jobs or on which I have a quota.
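
A sketch of that pattern (the scratch path, archive host, and destination directory are made-up values):

#!/bin/bash
#SBATCH --job-name=localout
#SBATCH --output=/tmp/%x-%j.out      # node-local scratch instead of a shared area
# ... run the actual work here ...
# at the end of the job, copy the output off the node
scp /tmp/${SLURM_JOB_NAME}-${SLURM_JOB_ID}.out archive.example.com:/archive/slurm/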

Since the stdout might not be easily accessible from my other sessions, having something like "scontrol write batch_script $SLURM_JOBID" that showed the stdout of the job rather than the script would also be useful. The LSF bpeek(1) command, for example, provides a tail(1)-like interface to a job's stdout.

The current situation, where the output file sits in a globally mounted area and I can simply access it from other nodes with grep(1), tail(1), and so on, covers many users' needs in simple cluster configurations. But there are several scenarios, primarily encountered by people running many thousands of single-node jobs, where sending output to a local scratch device and then selectively processing it at job termination is preferable. Those users can specify such a local name with "#SBATCH --output" and, knowing it, process the file as desired; but having something like $SLURM_STDOUT hold the full pathname to the file would be much more generic.
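
When the output does live on a shared filesystem, the softlink created above gives other sessions an easy handle on it (a sketch; the submit directory path is made up):

# from a login node, follow a running job's output via the softlink
cd /shared/home/me/jobdir
tail -f OUTPUT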


category: index