limiting groups of jobs or users via accounting and associations, as well as via job arrays
forcing a job to start on unused resources
reporting system utilization
listing number of nodes with different numbers of cores
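A node-by-core-count listing can be built by tallying per-node output. The real input would come from something like `sinfo -N -h -o "%c"` (one line per node, cores per node); sample values stand in for a live cluster below.

```shell
# Hypothetical sketch: count how many nodes have each core count.
# Sample data stands in for `sinfo -N -h -o "%c"` output.
per_node_cores='64
64
64
128
128
32'
node_counts=$(printf '%s\n' "$per_node_cores" | sort -n | uniq -c |
    awk '{ printf "%d nodes with %d cores\n", $1, $2 }')
echo "$node_counts"
```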
option to tightly pack jobs onto a specific subset of nodes, to maximize the number of open nodes for large parallel jobs
better understanding of scheduling priority
better understanding of reservations
summary of users -- how many jobs and cores each has running and pending, and what their limits (if any) are
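A per-user summary like this can be sketched by aggregating queue output. The real input would come from something like `squeue -h -o "%u %T %C"` (user, job state, CPU count); the sample lines below stand in for a live queue.

```shell
# Hypothetical per-user summary: jobs and cores, split by job state.
# Sample data stands in for `squeue -h -o "%u %T %C"` output.
queue='alice RUNNING 32
alice PENDING 64
bob RUNNING 16
bob RUNNING 16'
summary=$(printf '%s\n' "$queue" |
    awk '{ jobs[$1" "$2]++; cores[$1" "$2] += $3 }
         END { for (k in jobs) printf "%s: %d jobs, %d cores\n", k, jobs[k], cores[k] }' |
    sort)
echo "$summary"
```

Adding per-user limits would require a separate query against the accounting database (e.g. sacctmgr), which is not shown here.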
integrate local healthchecks, orphaned-process killer, tmp and scratch cleanup, job directory cleanup, and so on.
for users with millions of jobs, dayfiles need to be written locally and copied back at job termination; otherwise the shared filesystem suffers from too many inodes, simultaneous directory access over Ethernet, and so on.
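The write-locally, copy-at-termination pattern can be sketched with plain files; the temp paths below stand in for node-local /tmp and a shared dayfile directory.

```shell
# Sketch of "write locally, copy back at job termination" for a batch script.
# Temp paths are stand-ins: local_log for node-local scratch (e.g. /tmp),
# shared_dir for the shared filesystem.
shared_dir=$(mktemp -d)                   # stands in for a shared dayfile dir
local_log=$(mktemp)                       # node-local scratch file
echo "job output line" >> "$local_log"    # the job appends here during the run
cp "$local_log" "$shared_dir/job.out"     # one copy back, at termination
copied=$(cat "$shared_dir/job.out")
echo "$copied"
```

This turns millions of small appends against the shared filesystem into one copy per job.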
what happens if the specified output file is on local disk, particularly for a multi-host job?
we want only the owner and root to be able to see a job script, but the documentation says ... "The batch script can only be retrieved by an admin or operator, or by the owner of the job." Can the admin/operator access be prevented?
Can admins/users edit or change a job that is already submitted, e.g. by editing the cached job file as in LSF?
is there a central point where all the jobs are kept? It looks like /var/spool, which could easily be overrun. A healthcheck should verify permissions and mount options (/proc/mounts, /etc/fstab?)
what about space, inodes, or directory size when millions of jobs are queued?
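A cheap inode healthcheck for the spool area might look like the following; /var/spool is an assumed location, to be adjusted to wherever jobs are actually cached.

```shell
# Report inode usage on the filesystem holding the spool area.
# /var/spool is an assumed path; adjust to the real spool/state directory.
spool=/var/spool
inode_use=$(df -Pi "$spool" | awk 'NR == 2 { gsub("%", "", $5); print $5 }')
echo "inode use under $spool: ${inode_use}%"
```

A healthcheck would alarm when the reported percentage crosses a threshold.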
is anything checking who can see/edit the cached files?
when running, a job is executed from a copy placed under /var/spool/ on the node, so the /var/spool area needs to be mounted with execution allowed. This filesystem is often mounted with the "noexec" option (as well as "nosuid,nodev"). Check /proc/mounts, and remount to see whether this is a problem.
If exec is not allowed, we saw job submittals fail with "permission denied". The vendor could execute with read permission only, if launching the script were changed from relying on the shebang to invoking the interpreter directly, i.e. "bash /var/spool/..."
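The noexec check described above can be sketched as follows; /var/spool is the assumed spool path, and the mount options are read from /proc/mounts.

```shell
# Check whether the filesystem holding the spool area is mounted noexec.
# /var/spool is an assumed path; df locates its mount point, then the
# options for that mount point are read from /proc/mounts.
spool=/var/spool
mnt=$(df -P "$spool" | awk 'NR == 2 { print $6 }')
opts=$(awk -v m="$mnt" '$2 == m { o = $4 } END { print o }' /proc/mounts)
case ",$opts," in
    *,noexec,*) exec_status="WARNING: $mnt is mounted noexec" ;;
    *)          exec_status="OK: exec allowed on $mnt" ;;
esac
echo "$exec_status"
```

The same loop could also flag nosuid/nodev if those ever matter for the spool area.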
A built-in check of all file permissions and mounts might be useful
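A minimal permissions sweep for such a check might look like this; a temp directory stands in for the real spool area, and the world-writable test is just one example policy.

```shell
# Sketch of a permissions healthcheck: flag world-writable files under a
# spool-like directory. A temp directory stands in for the real spool area.
dir=$(mktemp -d)
touch "$dir/ok";  chmod 644 "$dir/ok"
touch "$dir/bad"; chmod 666 "$dir/bad"
world_writable=$(find "$dir" -type f -perm -0002)   # other-write bit set
echo "world-writable: $world_writable"
```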
a copy is made when a job is submitted, so changes to the input file after submission do not change the job. In LSF this is true only if the job is run from stdin, for example, so some users may not expect it; though in my opinion it is the better behavior (aside from duplicate files, extra storage, and a possibly huge number of cached files).
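Copy-at-submit semantics can be demonstrated with plain files: the spooled copy is what runs, even if the original script is edited after "submission".

```shell
# Demonstrates copy-at-submit semantics with plain files.
orig=$(mktemp)
spooled=$(mktemp)
echo 'echo v1' > "$orig"
cp "$orig" "$spooled"        # what copy-at-submit does internally
echo 'echo v2' > "$orig"     # user edits the script after submitting
ran=$(bash "$spooled")       # the job still runs the submitted version
echo "$ran"
```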