Access reason why Slurm stopped a job


Is there a way to find out why a job was canceled by Slurm? I would like to distinguish the cases where a resource limit was hit from all other reasons (like a manual cancellation). In case a resource limit was hit, I would also like to know which one.

The Slurm log file contains that information explicitly. It is also written to the job's output file, with a line like:

JOB <jobid> CANCELLED AT <time> DUE TO TIME LIMIT

or

Job <jobid> exceeded <mem> memory limit, being killed

or

JOB <jobid> CANCELLED AT <time> DUE TO NODE FAILURE

etc.
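To tell these cases apart automatically, the job's output file can be scanned for the messages listed above. The following is a minimal Python sketch, not an official Slurm tool: the function names `classify_stop_reason` and `classify_output_file`, the labels, and the regular expressions are all assumptions made for illustration, and the exact wording of the messages may differ between Slurm versions and configurations.

```python
import re

# Hypothetical patterns modelled on the message shapes quoted above.
# They are deliberately loose because the exact wording is an assumption
# and may vary between Slurm versions. Order matters: specific reasons
# are checked before the generic "cancelled" fallback.
PATTERNS = [
    ("time_limit", re.compile(r"CANCELLED.*TIME LIMIT", re.IGNORECASE)),
    ("node_failure", re.compile(r"CANCELLED.*NODE FAILURE", re.IGNORECASE)),
    ("memory_limit", re.compile(r"exceeded.*memory limit", re.IGNORECASE)),
    ("cancelled_other", re.compile(r"CANCELLED", re.IGNORECASE)),
]


def classify_stop_reason(text):
    """Return a label for the first matching message in text, or None."""
    for line in text.splitlines():
        for label, pattern in PATTERNS:
            if pattern.search(line):
                return label
    return None


def classify_output_file(path):
    """Read a job output file and classify why the job stopped."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return classify_stop_reason(handle.read())
```

A result of `cancelled_other` means a cancellation message was found without a recognised resource limit (for example a manual cancellation), and `None` means no such message was found in the file at all.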

