Job Event Log Codes
Table B.2 lists the codes that appear as the first field of each event
in a job event log. These are all of the events that can appear in a job
event log file:
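As a sketch of how these codes can be read programmatically, the Python below splits a plain-text job event log into events and extracts the code and job ID from the first line of each event. It assumes the common text layout in which each event begins with a line such as `001 (123.000.000) <timestamp> <message>` and ends with a line containing only `...`. The function names are hypothetical, and the timestamp is left unparsed so the sketch does not depend on its format.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed first line of an event: "NNN (cluster.proc.subproc) <timestamp> <text>"
HEADER_RE = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (.*)$")


class EventHeader(NamedTuple):
    code: int      # event number, e.g. 5 for "Job terminated"
    cluster: int
    proc: int
    subproc: int
    rest: str      # timestamp and message text, left unparsed


def parse_header(line: str) -> Optional[EventHeader]:
    """Return the parsed header, or None if the line is not an event header."""
    m = HEADER_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    code, cluster, proc, subproc, rest = m.groups()
    return EventHeader(int(code), int(cluster), int(proc), int(subproc), rest)


def split_events(text: str) -> List[List[str]]:
    """Group log lines into events, treating a line of '...' as the terminator."""
    events: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "...":
            if current:
                events.append(current)
            current = []
        else:
            current.append(line)
    if current:  # trailing event without a terminator (log still being written)
        events.append(current)
    return events


sample = (
    "000 (123.000.000) 05/25 19:10:03 Job submitted from host: <10.0.0.1:9618>\n"
    "...\n"
    "001 (123.000.000) 05/25 19:12:00 Job executing on host: <10.0.0.2:9618>\n"
    "...\n"
)
for event in split_events(sample):
    header = parse_header(event[0])
    print(header.code, header.cluster, header.proc)
```

Because the code is always the first three characters of an event, a reader can dispatch on it before interpreting the rest of the event text.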
Event Number: 000
Event Name: Job submitted
Event Description: This event occurs when a user submits a job. It
is the first event you will see for a job, and it should only occur
once.
Event Number: 001
Event Name: Job executing
Event Description: This event occurs when a job begins running. It
may occur more than once, for example when a job that was evicted
starts running again.
Event Number: 002
Event Name: Error in executable
Event Description: The job could not be run because the executable
was bad.
Event Number: 003
Event Name: Job was checkpointed
Event Description: No longer used.
Event Number: 004
Event Name: Job evicted from machine
Event Description: A job was removed from a machine before it
finished, usually for a policy reason: for example, an interactive user
has reclaimed the computer, or a higher-priority job has preempted it.
Event Number: 005
Event Name: Job terminated
Event Description: The job has completed.
Event Number: 006
Event Name: Image size of job updated
Event Description: An informational event that updates the amount of
memory the job is using while it runs. It does not indicate a change in
the state of the job.
Event Number: 007
Event Name: Shadow exception
Event Description: The condor_shadow, a program on the submit
computer that watches over the job and performs some services for the
job, failed for some catastrophic reason. The job will leave the machine
and go back into the queue.
Event Number: 008
Event Name: Generic log event
Event Description: Not used.
Event Number: 009
Event Name: Job aborted
Event Description: The user canceled the job.
Event Number: 010
Event Name: Job was suspended
Event Description: The job is still on the computer, but it is no
longer executing. This is usually for a policy reason, such as an
interactive user using the computer.
Event Number: 011
Event Name: Job was unsuspended
Event Description: The job has resumed execution after being
suspended.
Event Number: 012
Event Name: Job was held
Event Description: The job has transitioned to the hold state.
This might happen if the user applies the condor_hold command to the
job.
Event Number: 013
Event Name: Job was released
Event Description: The job was in the hold state and is to be
re-run.
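Several of the events above move a job between queue states. The sketch below folds a sequence of event codes for one job into its most recent state. The state names and the code-to-state mapping are assumptions drawn from the descriptions above (for example, eviction and a shadow exception both return the job to the queue), not an official HTCondor table.

```python
from typing import Dict, Iterable

# Hypothetical mapping from event code to the job state it implies,
# based on the event descriptions in this section.
STATE_AFTER: Dict[int, str] = {
    0: "idle",        # 000 Job submitted
    1: "running",     # 001 Job executing
    4: "idle",        # 004 Job evicted from machine: back in the queue
    5: "completed",   # 005 Job terminated
    7: "idle",        # 007 Shadow exception: back in the queue
    9: "removed",     # 009 Job aborted by the user
    10: "suspended",  # 010 Job was suspended
    11: "running",    # 011 Job was unsuspended
    12: "held",       # 012 Job was held
    13: "idle",       # 013 Job was released, to be re-run
}


def job_state(codes: Iterable[int]) -> str:
    """Return the state implied by the last state-changing event."""
    state = "unknown"
    for code in codes:
        # Informational events such as 006 (image size) leave the state unchanged.
        state = STATE_AFTER.get(code, state)
    return state


print(job_state([0, 1, 6, 4, 1, 5]))  # prints completed
```

Ignoring unmapped codes keeps informational events, and any codes the mapping does not cover, from disturbing the inferred state.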
Event Number: 014
Event Name: Parallel node executed
Event Description: A parallel universe program is running on a
node.
Event Number: 015
Event Name: Parallel node terminated
Event Description: A parallel universe program has completed on a
node.
Event Number: 016
Event Name: POST script terminated
Event Description: A node in a DAGMan workflow has a script that
should be run after its job. The script is run on the submit host. This
event signals that the POST script has completed.
Event Number: 021
Event Name: Remote error
Event Description: The condor_starter (which monitors the job
on the execution machine) has failed.
Event Number: 022
Event Name: Remote system call socket lost
Event Description: The condor_shadow and condor_starter
(which communicate while the job runs) have lost contact.
Event Number: 023
Event Name: Remote system call socket reestablished
Event Description: The condor_shadow and condor_starter
(which communicate while the job runs) have been able to resume contact
before the job lease expired.
Event Number: 024
Event Name: Remote system call reconnect failure
Event Description: The condor_shadow and condor_starter
(which communicate while the job runs) were unable to resume contact
before the job lease expired.
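Events 022 through 024 form sequences: a lost connection is followed either by a reestablished connection or by a reconnect failure once the job lease expires. A minimal sketch of pairing them, assuming the codes for one job are already in log order (the function name and the `"pending"` label are hypothetical):

```python
from typing import Iterable, List

JOB_DISCONNECTED = 22      # 022 Remote system call socket lost
JOB_RECONNECTED = 23       # 023 Remote system call socket reestablished
JOB_RECONNECT_FAILED = 24  # 024 Remote system call reconnect failure


def reconnect_outcomes(codes: Iterable[int]) -> List[str]:
    """Pair each disconnect (022) with the 023 or 024 event that resolves it."""
    outcomes: List[str] = []
    waiting = False
    for code in codes:
        if code == JOB_DISCONNECTED:
            waiting = True  # a repeated 022 before resolution counts as the same loss
        elif waiting and code == JOB_RECONNECTED:
            outcomes.append("reconnected")
            waiting = False
        elif waiting and code == JOB_RECONNECT_FAILED:
            outcomes.append("failed")
            waiting = False
    if waiting:
        outcomes.append("pending")  # neither 023 nor 024 has been logged yet
    return outcomes


print(reconnect_outcomes([1, 22, 23, 22, 24]))  # prints ['reconnected', 'failed']
```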
Event Number: 025
Event Name: Grid Resource Back Up
Event Description: A grid resource that was previously unavailable
is now available.
Event Number: 026
Event Name: Detected Down Grid Resource
Event Description: The grid resource that a job is to run on is
unavailable.
Event Number: 027
Event Name: Job submitted to grid resource
Event Description: A job has been submitted, and is under the
auspices of the grid resource.
Event Number: 028
Event Name: Job ad information event triggered.
Event Description: Extra job ClassAd attributes are noted. This
event is written as a supplement to other events when the configuration
parameter EVENT_LOG_JOB_AD_INFORMATION_ATTRS
is set.
Event Number: 029
Event Name: The job’s remote status is unknown
Event Description: No updates of the job’s remote status have been
received for 15 minutes.
Event Number: 030
Event Name: The job’s remote status is known again
Event Description: An update has been received for a job whose
remote status was previously logged as unknown.
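Events 029 and 030 bracket a period during which the remote status of a job was unknown. Assuming each event's timestamp has already been parsed into a `datetime` (the parsing step and the function name are hypothetical), the total time between the paired events can be summed as below. Because 029 is written only after 15 minutes without updates, this measures time since the event was logged, not the full period without updates.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

JOB_STATUS_UNKNOWN = 29
JOB_STATUS_KNOWN = 30


def time_status_unknown(
    events: Iterable[Tuple[int, datetime]], now: datetime
) -> timedelta:
    """Sum the intervals from each 029 event to the following 030 event.

    An interval still open at the end is counted up to ``now``.
    """
    total = timedelta(0)
    unknown_since: Optional[datetime] = None
    for code, when in events:
        if code == JOB_STATUS_UNKNOWN and unknown_since is None:
            unknown_since = when
        elif code == JOB_STATUS_KNOWN and unknown_since is not None:
            total += when - unknown_since
            unknown_since = None
    if unknown_since is not None:
        total += now - unknown_since
    return total


log = [
    (29, datetime(2024, 1, 1, 10, 0)),
    (30, datetime(2024, 1, 1, 10, 20)),
    (29, datetime(2024, 1, 1, 11, 0)),
]
print(time_status_unknown(log, now=datetime(2024, 1, 1, 11, 5)))  # prints 0:25:00
```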
Event Number: 031
Event Name: Job stage in
Event Description: A grid universe job is staging in its input
files.
Event Number: 032
Event Name: Job stage out
Event Description: A grid universe job is staging out its output
files.
Event Number: 033
Event Name: Job ClassAd attribute update
Event Description: A job ClassAd attribute has been changed by an
action of the condor_schedd daemon, including changes made with
condor_prio.
Event Number: 034
Event Name: Pre Skip event
Event Description: For DAGMan, this event is logged if a PRE
SCRIPT exits with the PRE_SKIP value defined in the DAG input file.
Without this event, the DAGMan node to which the script belongs would
have no event in the log, and DAGMan’s internal tables would become
corrupted during recovery. Logging it makes it possible for DAGMan to
recover a workflow that contains such a node.
Event Number: 035
Event Name: Cluster Submit
Event Description: This event occurs when a user submits a cluster
with multiple procs.
Event Number: 036
Event Name: Cluster Remove
Event Description: This event occurs after all the jobs in a multi-proc
cluster have completed, or when the cluster is removed (by condor_rm).
Event Number: 037
Event Name: Factory Paused
Event Description: This event occurs when job materialization for
a cluster has been paused.
Event Number: 038
Event Name: Factory Resumed
Event Description: This event occurs when job materialization for
a cluster has been resumed.
Event Number: 039
Event Name: None
Event Description: This event should never occur in a log but may
be returned by log reading code in certain situations (e.g., timing out
while waiting for a new event to appear in the log).
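Event 039 is a sentinel for log readers rather than something written to the log. The sketch below shows one way a reader of a growing log might use it: when the stream ends before an event's `...` terminator, it rewinds to the start of the incomplete event and returns code 039 so the caller can retry later. The function name and the rewind behavior are assumptions for illustration, not HTCondor's own reader.

```python
import io
from typing import List, Optional, TextIO, Tuple

NONE_EVENT = 39  # 039: no event could be returned


def read_next_event(stream: TextIO) -> Tuple[int, Optional[List[str]]]:
    """Return (code, lines) for the next complete event, or (39, None)."""
    start = stream.tell()
    lines: List[str] = []
    while True:
        line = stream.readline()
        if line == "":
            # End of stream before the "..." terminator: rewind so a later
            # call can reread the event once the writer has finished it.
            stream.seek(start)
            return NONE_EVENT, None
        if line.strip() == "...":
            if not lines:
                continue  # stray terminator; keep reading
            return int(lines[0][:3]), lines
        lines.append(line.rstrip("\n"))


log = io.StringIO(
    "001 (1.000.000) 05/25 19:12:00 Job executing on host: <10.0.0.2:9618>\n"
    "...\n"
    "005 (1.000.000) 05/25 19:20:00 Job terminated.\n"  # terminator not yet written
)
print(read_next_event(log)[0])  # prints 1
print(read_next_event(log))     # prints (39, None)
```

Returning a sentinel instead of raising lets a polling loop treat "nothing new yet" as an ordinary outcome and simply call again after a delay.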
Event Number: 040
Event Name: File Transfer
Event Description: This event is logged when a file transfer changes
state (transfer queued, transfer started, or transfer finished), for
both the input and output sandboxes.
Table B.2: Event Codes in a Job Event Log
Code | Symbolic name | Description
001 | EXECUTE | Execute
002 | EXECUTABLE_ERROR | Executable error
003 | CHECKPOINTED | No longer used
004 | JOB_EVICTED | Job evicted
005 | JOB_TERMINATED | Job terminated
006 | IMAGE_SIZE | Image size
007 | SHADOW_EXCEPTION | Shadow exception
009 | JOB_ABORTED | Job aborted
010 | JOB_SUSPENDED | Job suspended
011 | JOB_UNSUSPENDED | Job unsuspended
012 | JOB_HELD | Job held
013 | JOB_RELEASED | Job released
014 | NODE_EXECUTE | Node execute
015 | NODE_TERMINATED | Node terminated
016 | POST_SCRIPT_TERMINATED | Post script terminated
021 | REMOTE_ERROR | Remote error
022 | JOB_DISCONNECTED | Job disconnected
023 | JOB_RECONNECTED | Job reconnected
024 | JOB_RECONNECT_FAILED | Job reconnect failed
025 | GRID_RESOURCE_UP | Grid resource up
026 | GRID_RESOURCE_DOWN | Grid resource down
027 | GRID_SUBMIT | Grid submit
028 | JOB_AD_INFORMATION | Job ClassAd attribute values added to event log
029 | JOB_STATUS_UNKNOWN | Job status unknown
030 | JOB_STATUS_KNOWN | Job status known
031 | JOB_STAGE_IN | Grid job stage in
032 | JOB_STAGE_OUT | Grid job stage out
033 | ATTRIBUTE_UPDATE | Job ClassAd attribute update
034 | PRESKIP | DAGMan PRE_SKIP defined
035 | CLUSTER_SUBMIT | Cluster submitted
036 | CLUSTER_REMOVE | Cluster removed
037 | FACTORY_PAUSED | Factory paused
038 | FACTORY_RESUMED | Factory resumed
039 | NONE | No event could be returned
040 | FILE_TRANSFER | File transfer
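For code that reads these logs, the table translates naturally into an enumeration. The sketch below is a standalone Python stand-in built only from the rows of Table B.2, so 000 and 008, which the table does not list, are absent. The HTCondor Python bindings provide their own event-type enumeration (`htcondor.JobEventType`), which may be preferable where the bindings are installed; the `describe` helper here is hypothetical.

```python
from enum import IntEnum


class EventCode(IntEnum):
    """Job event log codes from Table B.2 (000 and 008 are not listed there)."""
    EXECUTE = 1
    EXECUTABLE_ERROR = 2
    CHECKPOINTED = 3  # no longer used
    JOB_EVICTED = 4
    JOB_TERMINATED = 5
    IMAGE_SIZE = 6
    SHADOW_EXCEPTION = 7
    JOB_ABORTED = 9
    JOB_SUSPENDED = 10
    JOB_UNSUSPENDED = 11
    JOB_HELD = 12
    JOB_RELEASED = 13
    NODE_EXECUTE = 14
    NODE_TERMINATED = 15
    POST_SCRIPT_TERMINATED = 16
    REMOTE_ERROR = 21
    JOB_DISCONNECTED = 22
    JOB_RECONNECTED = 23
    JOB_RECONNECT_FAILED = 24
    GRID_RESOURCE_UP = 25
    GRID_RESOURCE_DOWN = 26
    GRID_SUBMIT = 27
    JOB_AD_INFORMATION = 28
    JOB_STATUS_UNKNOWN = 29
    JOB_STATUS_KNOWN = 30
    JOB_STAGE_IN = 31
    JOB_STAGE_OUT = 32
    ATTRIBUTE_UPDATE = 33
    PRESKIP = 34
    CLUSTER_SUBMIT = 35
    CLUSTER_REMOVE = 36
    FACTORY_PAUSED = 37
    FACTORY_RESUMED = 38
    NONE = 39
    FILE_TRANSFER = 40


def describe(code: int) -> str:
    """Format a code as it appears in the log, with its symbolic name if known."""
    try:
        return f"{code:03d} {EventCode(code).name}"
    except ValueError:
        return f"{code:03d} (not in Table B.2)"


print(describe(5))   # prints 005 JOB_TERMINATED
print(describe(17))  # prints 017 (not in Table B.2)
```

Using an `IntEnum` keeps the members comparable with the plain integers parsed from a log line, so `EventCode.JOB_HELD == 12` holds.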