Job Event Log Codes

Table B.2 lists codes that appear as the first

These are all of the events that can show up in a job log file:

Event Number: 000
Event Name: Job submitted
Event Description: This event occurs when a user submits a job. It is the first event you will see for a job, and it should only occur once.
Event Number: 001
Event Name: Job executing
Event Description: This shows up when a job is running. It might occur more than once.
Event Number: 002
Event Name: Error in executable
Event Description: The job could not be run because the executable was bad.
Event Number: 003
Event Name: Job was checkpointed
Event Description: The job’s complete state was written to a checkpoint file. This might happen without the job being removed from a machine, because the checkpointing can happen periodically.
Event Number: 004
Event Name: Job evicted from machine
Event Description: A job was removed from a machine before it finished, usually for a policy reason. Perhaps an interactive user has claimed the computer, or perhaps another job is higher priority.
Event Number: 005
Event Name: Job terminated
Event Description: The job has completed.
Event Number: 006
Event Name: Image size of job updated
Event Description: An informational event, to update the amount of memory that the job is using while running. It does not reflect the state of the job.
Event Number: 007
Event Name: Shadow exception
Event Description: The condor_shadow, a program on the submit computer that watches over the job and performs some services for the job, failed for some catastrophic reason. The job will leave the machine and go back into the queue.
Event Number: 008
Event Name: Generic log event
Event Description: Not used.
Event Number: 009
Event Name: Job aborted
Event Description: The user canceled the job.
Event Number: 010
Event Name: Job was suspended
Event Description: The job is still on the computer, but it is no longer executing. This is usually for a policy reason, such as an interactive user using the computer.
Event Number: 011
Event Name: Job was unsuspended
Event Description: The job has resumed execution, after being suspended earlier.
Event Number: 012
Event Name: Job was held
Event Description: The job has transitioned to the hold state. This might happen if the user applies the condor_hold command to the job.
Event Number: 013
Event Name: Job was released
Event Description: The job was in the hold state and is to be re-run.
Event Number: 014
Event Name: Parallel node executed
Event Description: A parallel universe program is running on a node.
Event Number: 015
Event Name: Parallel node terminated
Event Description: A parallel universe program has completed on a node.
Event Number: 016
Event Name: POST script terminated
Event Description: A node in a DAGMan work flow has a script that should be run after a job. The script is run on the submit host. This event signals that the post script has completed.
Event Number: 021
Event Name: Remote error
Event Description: The condor_starter (which monitors the job on the execution machine) has failed.
Event Number: 022
Event Name: Remote system call socket lost
Event Description: The condor_shadow and condor_starter (which communicate while the job runs) have lost contact.
Event Number: 023
Event Name: Remote system call socket reestablished
Event Description: The condor_shadow and condor_starter (which communicate while the job runs) have been able to resume contact before the job lease expired.
Event Number: 024
Event Name: Remote system call reconnect failure
Event Description: The condor_shadow and condor_starter (which communicate while the job runs) were unable to resume contact before the job lease expired.
Event Number: 025
Event Name: Grid Resource Back Up
Event Description: A grid resource that was previously unavailable is now available.
Event Number: 026
Event Name: Detected Down Grid Resource
Event Description: The grid resource that a job is to run on is unavailable.
Event Number: 027
Event Name: Job submitted to grid resource
Event Description: A job has been submitted, and is under the auspices of the grid resource.
Event Number: 028
Event Name: Job ad information event triggered.
Event Description: Extra job ClassAd attributes are noted. This event is written as a supplement to other events when the configuration parameter EVENT_LOG_JOB_AD_INFORMATION_ATTRS is set.
Event Number: 029
Event Name: The job’s remote status is unknown
Event Description: No updates of the job’s remote status have been received for 15 minutes.
Event Number: 030
Event Name: The job’s remote status is known again
Event Description: An update has been received for a job whose remote status was previous logged as unknown.
Event Number: 031
Event Name: Job stage in
Event Description: A grid universe job is doing the stage in of input files.
Event Number: 032
Event Name: Job stage out
Event Description: A grid universe job is doing the stage out of output files.
Event Number: 033
Event Name: Job ClassAd attribute update
Event Description: A Job ClassAd attribute is changed due to action by the condor_schedd daemon. This includes changes by condor_prio.
Event Number: 034
Event Name: Pre Skip event
Event Description: For DAGMan, this event is logged if a PRE SCRIPT exits with the defined PRE_SKIP value in the DAG input file. This makes it possible for DAGMan to do recovery in a workflow that has such an event, as it would otherwise not have any event for the DAGMan node to which the script belongs, and in recovery, DAGMan’s internal tables would become corrupted.
Event Number: 035
Event Name: Cluster Submit
Event Description: This event occurs when a user submits a cluster with multiple procs.
Event Number: 036
Event Name: Cluster Remove
Event Description: This event occurs after all the jobs in a multi-proc cluster have completed, or when the cluster is removed (by condor_rm).
Event Number: 037
Event Name: Factory Paused
Event Description: This event occurs when job materialization for a cluster has been paused.
Event Number: 038
Event Name: Factory Resumed
Event Description: This event occurs when job materialization for a cluster has been resumed
Event Number: 039
Event Name: None
Event Description: This event should never occur in a log but may be returned by log reading code in certain situations (e.g., timing out while waiting for a new event to appear in the log).
Event Number: 040
Event Name: File Transfer
Event Description: This event occurs when a file transfer event occurs: transfer queued, transfer started, or transfer finished, for both the input and output sandboxes.

Table B.2: Event Codes in a Job Event Log

001

EXECUTE

Execute

002

EXECUTABLE_ERROR

Executable error

003

CHECKPOINTED

Checkpointed

004

JOB_EVICTED

Job evicted

005

JOB_TERMINATED

Job terminated

006

IMAGE_SIZE

Image size

007

SHADOW_EXCEPTION

Shadow exception

009

JOB_ABORTED

Job aborted

010

JOB_SUSPENDED

Job suspended

011

JOB_UNSUSPENDED

Job unsuspended

012

JOB_HELD

Job held

013

JOB_RELEASED

Job released

014

NODE_EXECUTE

Node execute

015

NODE_TERMINATED

Node terminated

016

POST_SCRIPT_TERMINATED

Post script terminated

021

REMOTE_ERROR

Remote error

022

JOB_DISCONNECTED

Job disconnected

023

JOB_RECONNECTED

Job reconnected

024

JOB_RECONNECT_FAILED

Job reconnect failed

025

GRID_RESOURCE_UP

Grid resource up

026

GRID_RESOURCE_DOWN

Grid resource down

027

GRID_SUBMIT

Grid submit

028

JOB_AD_INFORMATION

Job ClassAd attribute values added to event log

029

JOB_STATUS_UNKNOWN

Job status unknown

030

JOB_STATUS_KNOWN

Job status known

031

JOB_STAGE_IN

Grid job stage in

032

JOB_STAGE_OUT

Grid job stage out

033

ATTRIBUTE_UPDATE

Job ClassAd attribute update

034

PRESKIP

DAGMan PRE_SKIP defined

035

CLUSTER_SUBMIT

Cluster submitted

036

CLUSTER_REMOVE

Cluster removed

037

FACTORY_PAUSED

Factory paused

038

FACTORY_RESUMED

Factory resumed

039

NONE

No event could be returned

040

FILE_TRANSFER

File transfer