Job Event Log Codes

Table: Event Codes in a Job Event Log

Code

Constant Name

Description

000

SUBMIT

This event occurs when a user submits a job. It is the first event you will see for a job, and it should only occur once.

001

EXECUTE

This event occurs when a job begins running on a machine. It might occur more than once.

002

EXECUTABLE_ERROR

The job could not be run because the executable was bad.

003

CHECKPOINTED

No longer used.

004

JOB_EVICTED

A job was removed from a machine before it finished, usually for a policy reason. Perhaps an interactive user has claimed the computer, or perhaps another job is higher priority.

005

JOB_TERMINATED

The job has completed.

006

IMAGE_SIZE

An informational event, to update the amount of memory that the job is using while running. It does not reflect the state of the job.

007

SHADOW_EXCEPTION

The condor_shadow, a program on the submit computer that watches over the job and performs some services for the job, failed for some catastrophic reason. The job will leave the machine and go back into the queue.

009

JOB_ABORTED

The user canceled the job.

010

JOB_SUSPENDED

The job is still on the computer, but it is no longer executing. This is usually for a policy reason, such as an interactive user using the computer.

011

JOB_UNSUSPENDED

The job has resumed execution, after being suspended earlier.

012

JOB_HELD

The job has transitioned to the hold state. This might happen if the user applies the condor_hold command to the job.

013

JOB_RELEASED

The job was in the hold state and is to be re-run.

014

NODE_EXECUTE

A parallel universe program is running on a node.

015

NODE_TERMINATED

A parallel universe program has completed on a node.

016

POST_SCRIPT_TERMINATED

A node in a DAGMan workflow has a script that should be run after a job. The script is run on the submit host. This event signals that the post script has completed.

021

REMOTE_ERROR

The condor_starter (which monitors the job on the execution machine) has failed.

022

JOB_DISCONNECTED

The condor_shadow and condor_starter (which communicate while the job runs) have lost contact.

023

JOB_RECONNECTED

The condor_shadow and condor_starter (which communicate while the job runs) have been able to resume contact before the job lease expired.

024

JOB_RECONNECT_FAILED

The condor_shadow and condor_starter (which communicate while the job runs) were unable to resume contact before the job lease expired.

025

GRID_RESOURCE_UP

A grid resource that was previously unavailable is now available.

026

GRID_RESOURCE_DOWN

The grid resource that a job is to run on is unavailable.

027

GRID_SUBMIT

A job has been submitted, and is under the auspices of the grid resource.

028

JOB_AD_INFORMATION

Extra job ClassAd attributes are noted. This event is written as a supplement to other events when the configuration parameter EVENT_LOG_JOB_AD_INFORMATION_ATTRS is set.

029

JOB_STATUS_UNKNOWN

No updates of the job’s remote status have been received for 15 minutes.

030

JOB_STATUS_KNOWN

An update has been received for a job whose remote status was previously logged as unknown.

031

JOB_STAGE_IN

A grid universe job is staging in its input files.

032

JOB_STAGE_OUT

A grid universe job is staging out its output files.

033

ATTRIBUTE_UPDATE

A Job ClassAd attribute is changed due to action by the condor_schedd daemon. This includes changes by condor_prio.

034

PRESKIP

For DAGMan, this event is logged if a PRE SCRIPT exits with the defined PRE_SKIP value in the DAG input file. This makes it possible for DAGMan to do recovery in a workflow that has such an event, as it would otherwise not have any event for the DAGMan node to which the script belongs, and in recovery, DAGMan’s internal tables would become corrupted.

035

CLUSTER_SUBMIT

This event occurs when a user submits a cluster with multiple procs.

036

CLUSTER_REMOVE

This event occurs after all the jobs in a multi-proc cluster have completed, or when the cluster is removed (by condor_rm).

037

FACTORY_PAUSED

This event occurs when job materialization for a cluster has been paused.

038

FACTORY_RESUMED

This event occurs when job materialization for a cluster has been resumed.

039

NONE

This event should never occur in a log but may be returned by log reading code in certain situations (e.g., timing out while waiting for a new event to appear in the log).

040

FILE_TRANSFER

This event is logged at each stage of a file transfer: transfer queued, transfer started, or transfer finished, for both the input and output sandboxes.

047

COMMON_FILES

This event is logged at each stage of common files handling: when the common files transfer is queued, started, or finished; or when waiting for common files to transfer has started or finished.
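
The codes in the table lend themselves to a simple lookup when scanning a log by hand. The following is a minimal sketch in Python, not part of HTCondor itself. It assumes that each event in the log begins with a header line whose first three characters are the numeric code followed by whitespace; that layout is an assumption made for illustration and should be checked against the actual log files. The names `EVENT_NAMES` and `event_name`, and the sample lines, are hypothetical.

```python
import re
from typing import Optional

# Codes and constant names copied from the table above.
# Codes that the table does not list are intentionally absent.
EVENT_NAMES = {
    "000": "SUBMIT", "001": "EXECUTE", "002": "EXECUTABLE_ERROR",
    "003": "CHECKPOINTED", "004": "JOB_EVICTED", "005": "JOB_TERMINATED",
    "006": "IMAGE_SIZE", "007": "SHADOW_EXCEPTION", "009": "JOB_ABORTED",
    "010": "JOB_SUSPENDED", "011": "JOB_UNSUSPENDED", "012": "JOB_HELD",
    "013": "JOB_RELEASED", "014": "NODE_EXECUTE", "015": "NODE_TERMINATED",
    "016": "POST_SCRIPT_TERMINATED", "021": "REMOTE_ERROR",
    "022": "JOB_DISCONNECTED", "023": "JOB_RECONNECTED",
    "024": "JOB_RECONNECT_FAILED", "025": "GRID_RESOURCE_UP",
    "026": "GRID_RESOURCE_DOWN", "027": "GRID_SUBMIT",
    "028": "JOB_AD_INFORMATION", "029": "JOB_STATUS_UNKNOWN",
    "030": "JOB_STATUS_KNOWN", "031": "JOB_STAGE_IN", "032": "JOB_STAGE_OUT",
    "033": "ATTRIBUTE_UPDATE", "034": "PRESKIP", "035": "CLUSTER_SUBMIT",
    "036": "CLUSTER_REMOVE", "037": "FACTORY_PAUSED", "038": "FACTORY_RESUMED",
    "039": "NONE", "040": "FILE_TRANSFER", "047": "COMMON_FILES",
}

# Assumption: an event header line starts with exactly three digits
# followed by whitespace. Lines that do not match are treated as detail lines.
_HEADER = re.compile(r"^(\d{3})\s")


def event_name(line: str) -> Optional[str]:
    """Return the constant name for an event header line.

    Returns None if the line does not look like an event header, and
    "UNLISTED" if the code is not in the table above.
    """
    match = _HEADER.match(line)
    if match is None:
        return None
    return EVENT_NAMES.get(match.group(1), "UNLISTED")


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    samples = [
        "000 (42.000.000) sample submit line",
        "\tan indented detail line",
        "005 (42.000.000) sample termination line",
    ]
    for sample in samples:
        print(repr(sample), "->", event_name(sample))
```

Keeping the codes as zero-padded strings mirrors how they appear in the table, so no integer conversion is needed before the lookup.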