condor_shadow Exit Codes

When a condor_shadow daemon exits, its exit code is recorded in the condor_schedd log and identifies why the job exited. The log entry has the form

Shadow pid XXXXX for job XX.X exited with status YYY

where YYY is the exit code, or

Shadow pid XXXXX for job XX.X reports job exit reason 100.

where the exit code is the value 100. The following table lists these codes:

Value  Error Name                    Description
-----  ----------------------------  ------------------------------------------
4      JOB_EXCEPTION                 the job exited with an exception
44     DPRINTF_ERROR                 there was a fatal error with dprintf()
100    JOB_EXITED                    the job exited (not killed)
101    JOB_CKPTED                    no longer used
102    JOB_KILLED                    the job was killed
103    JOB_COREDUMPED                the job was killed and a core file was
                                     produced
105    JOB_NO_MEM                    not enough memory to start the
                                     condor_shadow
106    JOB_SHADOW_USAGE              incorrect arguments to condor_shadow
107    JOB_NOT_CKPTED                no longer used
107    JOB_SHOULD_REQUEUE            same number as JOB_NOT_CKPTED, to achieve
                                     the same behavior. This exit code implies
                                     that we want the job to be put back in the
                                     job queue and run again.
108    JOB_NOT_STARTED               can not connect to the condor_startd or
                                     request refused
109    JOB_BAD_STATUS                job status != RUNNING on start up
110    JOB_EXEC_FAILED               exec failed for some reason other than
                                     ENOMEM
111    JOB_NO_CKPT_FILE              no longer used
112    JOB_SHOULD_HOLD               the job should be put on hold
113    JOB_SHOULD_REMOVE             the job should be removed
114    JOB_MISSED_DEFERRAL_TIME      the job goes on hold, because it did not
                                     run within the specified window of time
115    JOB_EXITED_AND_CLAIM_CLOSING  the job exited (not killed) but the
                                     condor_startd is not accepting any more
                                     jobs on this claim
116    JOB_RECONNECT_FAILED          the condor_shadow was started in reconnect
                                     mode, and yet failed to reconnect to the
                                     starter
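
When scanning a condor_schedd log for these codes, a small script can extract the exit code from either message form and translate it to its error name. The following is a minimal sketch, not part of HTCondor: the names SHADOW_EXIT_CODES, SHADOW_EXIT_RE, and parse_shadow_exit are hypothetical, and the regular expression assumes the two message forms appear exactly as shown above, possibly preceded by other text on the line such as a timestamp.

```python
import re

# Exit codes as listed in the table above. Value 107 is shared by the retired
# JOB_NOT_CKPTED and by JOB_SHOULD_REQUEUE; this sketch reports the latter.
SHADOW_EXIT_CODES = {
    4: "JOB_EXCEPTION",
    44: "DPRINTF_ERROR",
    100: "JOB_EXITED",
    101: "JOB_CKPTED",
    102: "JOB_KILLED",
    103: "JOB_COREDUMPED",
    105: "JOB_NO_MEM",
    106: "JOB_SHADOW_USAGE",
    107: "JOB_SHOULD_REQUEUE",
    108: "JOB_NOT_STARTED",
    109: "JOB_BAD_STATUS",
    110: "JOB_EXEC_FAILED",
    111: "JOB_NO_CKPT_FILE",
    112: "JOB_SHOULD_HOLD",
    113: "JOB_SHOULD_REMOVE",
    114: "JOB_MISSED_DEFERRAL_TIME",
    115: "JOB_EXITED_AND_CLAIM_CLOSING",
    116: "JOB_RECONNECT_FAILED",
}

# Matches both message forms; assumes the wording shown in this section.
SHADOW_EXIT_RE = re.compile(
    r"Shadow pid (?P<pid>\d+) for job (?P<job>\d+\.\d+) "
    r"(?:exited with status|reports job exit reason) (?P<code>\d+)"
)


def parse_shadow_exit(line):
    """Return pid, job id, exit code, and error name, or None if no match."""
    match = SHADOW_EXIT_RE.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return {
        "pid": int(match.group("pid")),
        "job": match.group("job"),
        "code": code,
        # Codes not in the table are reported as UNKNOWN rather than guessed.
        "name": SHADOW_EXIT_CODES.get(code, "UNKNOWN"),
    }
```

Calling parse_shadow_exit on each line of the log and skipping the None results yields one record per condor_shadow exit. Codes absent from the table are labeled UNKNOWN so that they stand out for manual inspection.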