condor_shadow Exit Codes
When a condor_shadow daemon exits, the condor_shadow exit code is recorded in the condor_schedd log, and it identifies why the job exited. Prose in the log appears of the form
Shadow pid XXXXX for job XX.X exited with status YYY
where YYY
is the exit code, or
Shadow pid XXXXX for job XX.X reports job exit reason 100.
where the exit code is the value 100. The following table lists these codes:
Value |
Error Name |
Description |
4 |
JOB_EXCEPTION |
the job exited with an exception |
44 |
DPRINTF_ERROR |
there was a fatal error with dprintf() |
100 |
JOB_EXITED |
the job exited (not killed) |
101 |
JOB_CKPTED |
no longer used |
102 |
JOB_KILLED |
the job was killed |
103 |
JOB_COREDUMPED |
the job was killed and a core file was produced |
105 |
JOB_NO_MEM |
not enough memory to start the condor_shadow |
106 |
JOB_SHADOW_USAGE |
incorrect arguments to condor_shadow |
107 |
JOB_NOT_CKPTED |
no longer used |
107 |
JOB_SHOULD_REQUEUE |
same number as JOB_NOT_CKPTED, to achieve the same behavior. This exit code implies that we want the job to be put back in the job queue and run again. |
108 |
JOB_NOT_STARTED |
can not connect to the condor_startd or request refused |
109 |
JOB_BAD_STATUS |
job status != RUNNING on start up |
110 |
JOB_EXEC_FAILED |
exec failed for some reason other than ENOMEM |
111 |
JOB_NO_CKPT_FILE |
no longer used |
112 |
JOB_SHOULD_HOLD |
the job should be put on hold |
113 |
JOB_SHOULD_REMOVE |
the job should be removed |
114 |
JOB_MISSED_DEFERRAL_TIME |
the job goes on hold, because it did not run within the specified window of time |
115 |
JOB_EXITED_AND_CLAIM_CLOSING |
the job exited (not killed) but the condor_startd is not accepting any more jobs on this claim |
116 |
JOB_RECONNECT_FAILED |
the condor_shadow was started in reconnect mode, and yet failed to reconnect to the starter |