Slurm Administration
Some preliminary notes on Slurm administration and advanced usage.
I am writing these as I learn Slurm, so reader beware: they are first
impressions.
External Slurm Reading List
HPC Links
Scripts and Auxiliary Commands
The "s" menu script allows for indexing and executing the commands in bin/
that start with "s-" as sub-commands:
- bin/s-cluster (INFO-SLURM) :HW: Identify master Slurm server running slurmctld(1)
- bin/s-contract (FUNCTIONS) :-: compact a list of names with number suffixes
- bin/s-docs (INFO-SLURM) :H: combine or process all Slurm man-pages
- bin/s-drain-cluster (ADMIN) :H: Create/modify a backfilling draindown via a Slurm Reservation
- bin/s-example (INFO-SLURM) :W: example Slurm job scripts and crib sheets
- bin/s-expand (FUNCTIONS) :-: expand a compact list of names with numeric suffixes to a column of values
- bin/s-expand-to-list (FUNCTIONS) :-: expand a compact list of names with numeric suffixes to a comma-delimited line of values
- bin/s-goto (INTER) :HW: start interactive shell in a running batch job
- bin/s-joblists (INFO-JOB) :H: list information on queued jobs grouping by various data classifications
- bin/s-login (INTER) :W: basic login using resources allocated via Slurm
- bin/s-mem (INFO-JOB) :W: list memory resources used by a completed job
- bin/s-nodelists (INFO-NODES) :W: list node information using various groupings
- bin/s-partitions (STATE-PARTITIONS) :*: display or turn Slurm partitions/queues on and off
- bin/s-pause (STATE-JOB) :W: place Slurm jobs on hold
- bin/s-peek (INFO-JOB) :H: display Slurm job script, job parameters, and job output for specified job IDs
- bin/s-qos (INFO-QOS) :W: list available QOS
- bin/s-requeue (STATE-JOB) :HW: requeue Slurm jobs
- bin/s-reservations (STATE-RESERVATIONS) :W: control and list reservations
- bin/s-restart-slurm (ADMIN) :W: restart Slurm
- bin/s-run-on-all (ADMIN) :HW: submit small confidence test to each node
- bin/s-select-jobid-args (FUNCTIONS) :-: parse common parameters for selecting jobIDs and output the numbers
- bin/s-slurm2sql (INFO-JOB) :W: dump squeue(1) job data into an SQLite3 table
- bin/s-state (STATE-JOB) :HW: suspend/resume, hold/release, freeze/thaw selected Slurm jobs
- bin/s-top (STATE-JOB) :W: force a Slurm job to the top of the queue
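To illustrate what "expand a compact list of names with numeric suffixes" means for s-expand and s-expand-to-list, here is a minimal bash sketch. It is not the code in bin/s-expand: the function name expand_names is hypothetical, and the sketch assumes a single bracket group such as node[1-3,7], keeping zero padding from the start of each range.

```shell
#!/usr/bin/env bash
# Hypothetical helper; NOT the actual bin/s-expand implementation.
# Expands a compact name list such as "node[1-3,7]" to one name per line.
# Assumption: at most one [..] group, containing numbers and ranges only.
expand_names() {
    local spec=$1 prefix body suffix item start end width i
    local -a items
    local re='^([^][]*)\[([0-9,-]+)\](.*)$'
    if [[ $spec =~ $re ]]; then
        prefix=${BASH_REMATCH[1]}
        body=${BASH_REMATCH[2]}
        suffix=${BASH_REMATCH[3]}
        IFS=',' read -ra items <<< "$body"
        for item in "${items[@]}"; do
            if [[ $item == *-* ]]; then
                start=${item%-*}
                end=${item#*-}
                width=${#start}   # keep zero padding, e.g. 08 -> width 2
                # 10# forces base 10 so that "08" is not parsed as octal
                for ((i = 10#$start; i <= 10#$end; i++)); do
                    printf '%s%0*d%s\n' "$prefix" "$width" "$i" "$suffix"
                done
            else
                printf '%s%s%s\n' "$prefix" "$item" "$suffix"
            fi
        done
    else
        # No bracket group: the name is already expanded
        printf '%s\n' "$spec"
    fi
}
```

For example, expand_names 'node[1-3,7]' prints node1, node2, node3, and node7 on separate lines; piping that through paste -sd, - gives the comma-delimited form. On a system with Slurm installed, scontrol show hostnames 'node[1-3,7]' performs the same expansion and scontrol show hostlist does the reverse.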
The s-example script contains discussions, issues, and tips concerning
Slurm. With the bin/ directory in your search path, enter "s exa" for more
information.
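The "s exa" form suggests that the menu script accepts abbreviated sub-command names. As a rough sketch of how such a dispatcher could work (this is not the actual bin/s script; the function name dispatch and the exact-match-then-unique-prefix rule are assumptions made for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an "s"-style dispatcher; NOT the actual bin/s script.
# Assumption: a sub-command may be given in full or as any unique prefix.
# Usage: dispatch DIR PREFIX [ARGS...]
dispatch() {
    local dir=$1 prefix=$2 path
    shift 2
    local -a matches=()
    if [[ -f $dir/s-$prefix && -x $dir/s-$prefix ]]; then
        # An exact name wins, so "expand" still works next to "expand-to-list"
        matches=("s-$prefix")
    else
        # Otherwise collect every executable whose name starts with s-PREFIX
        for path in "$dir"/s-"$prefix"*; do
            [[ -f $path && -x $path ]] && matches+=("${path##*/}")
        done
    fi
    case ${#matches[@]} in
        0) echo "s: no sub-command matches '$prefix'" >&2
           return 1 ;;
        1) "$dir/${matches[0]}" "$@" ;;
        *) echo "s: '$prefix' is ambiguous: ${matches[*]}" >&2
           return 1 ;;
    esac
}
```

With this sketch, dispatch bin exa would run bin/s-example, while dispatch bin ex would report an ambiguity between s-example, s-expand, and s-expand-to-list.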
Download