Quick Reference

DAG Commands

General

INCLUDE (see Full Description)

Parse the provided file as if its contents were inline in the current file.

INCLUDE filename
JOB (see Full Description)

Create a normal DAG node to execute a specified HTCondor job.

JOB NodeName SubmitDescription [DIR directory] [NOOP] [DONE]
PARENT/CHILD (see Full Description)

Create dependencies between two or more DAG nodes.

PARENT ParentNodeName [ParentNodeName2 ... ] CHILD  ChildNodeName [ChildNodeName2 ... ]
SPLICE (see Full Description)

Incorporate the specified DAG file into the structure of another DAG.

SPLICE SpliceName DagFileName [DIR directory]
SUBDAG (see Full Description)

Specify a DAG workflow to be submitted by condor_submit_dag and managed by a parent DAG.

SUBDAG EXTERNAL JobName DagFileName [DIR directory] [NOOP] [DONE]
SUBMIT-DESCRIPTION (see Full Description)

Create an inline job submit description that can be applied to multiple DAG nodes.

SUBMIT-DESCRIPTION DescriptionName {
    # submit attributes go here
}
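
DAG input files for larger workflows are often generated by a script rather than written by hand. The following is a minimal Python sketch, not an official HTCondor tool, that emits a diamond-shaped DAG using only the JOB and PARENT/CHILD commands summarized above. The function name build_diamond_dag, the node names A through D, and the submit file name job.sub are hypothetical placeholders.

```python
def build_diamond_dag(submit_file="job.sub"):
    """Return DAG input text for a diamond: A -> (B, C) -> D.

    Node names and the default submit file name are illustrative only.
    """
    lines = []
    # One JOB line per node; all nodes share one submit description here.
    for node in ("A", "B", "C", "D"):
        lines.append(f"JOB {node} {submit_file}")
    # A must finish before B and C start; D waits for both B and C.
    lines.append("PARENT A CHILD B C")
    lines.append("PARENT B C CHILD D")
    return "\n".join(lines) + "\n"


print(build_diamond_dag(), end="")
```

Writing the returned text to a file such as diamond.dag should give an input file suitable for condor_submit_dag, provided the referenced submit description exists.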

Node Behavior

DONE

Mark a DAG node as done, so that neither the associated job nor its scripts execute.

DONE NodeName
PRE_SKIP (see Full Description)

Inform DAGMan to skip the remainder of a node's execution if that node's PRE script exits with the specified code.

PRE_SKIP <NodeName | ALL_NODES> non-zero-exit-code
PRIORITY (see Full Description)

Assign a node priority to control DAGMan node submission.

PRIORITY <NodeName | ALL_NODES> PriorityValue
RETRY (see Full Description)

Inform DAGMan to retry a node up to a specified number of times when a failure occurs.

RETRY <NodeName | ALL_NODES> NumberOfRetries [UNLESS-EXIT value]
SCRIPT (see Full Description)

Specify a script to be executed on the Access Point (AP) for a specified node.

# PRE-Script
SCRIPT [DEFER status time] [DEBUG filename type] PRE <NodeName | ALL_NODES> ExecutableName [arguments]
# POST-Script
SCRIPT [DEFER status time] [DEBUG filename type] POST <NodeName | ALL_NODES> ExecutableName [arguments]
# HOLD-Script
SCRIPT [DEFER status time] [DEBUG filename type] HOLD <NodeName | ALL_NODES> ExecutableName [arguments]
VARS (see Full Description)

Specify a list of key="Value" pairs to be applied to the specified node as referable submit macros.

VARS <NodeName | ALL_NODES> [PREPEND | APPEND] macroname="string" [macroname2="string2" ... ]
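
As an illustration of the RETRY and VARS syntax above, the following Python sketch builds both lines for a single node. The helper names, the node name A, and the macro names input and mode are hypothetical. The escaping rule (a backslash before embedded double quotes and backslashes) is an assumption here and should be checked against the full VARS description.

```python
def quote_vars_value(value):
    """Wrap a string in double quotes for use as a VARS value.

    Assumption: embedded backslashes and double quotes are escaped
    with a backslash; verify against the full VARS description.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def node_behavior_lines(node, retries, macros):
    """Build a RETRY line and, if macros are given, a VARS line for one node."""
    lines = [f"RETRY {node} {retries}"]
    pairs = " ".join(f"{key}={quote_vars_value(val)}" for key, val in macros.items())
    if pairs:
        lines.append(f"VARS {node} {pairs}")
    return lines


# Hypothetical node "A" with two custom macros.
for line in node_behavior_lines("A", 3, {"input": "data_01.csv", "mode": "fast"}):
    print(line)
```

The macros defined this way would then be referenced in the node's submit description as $(input) and $(mode).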

Special Nodes

FINAL (see Full Description)

Create a DAG node guaranteed to run at the end of a DAG regardless of successful or failed execution.

FINAL NodeName SubmitDescription [DIR directory] [NOOP]
PROVISIONER (see Full Description)

Create a DAG node responsible for provisioning resources to be utilized by other DAG nodes. Guaranteed to start before all other nodes.

PROVISIONER NodeName SubmitDescription
SERVICE (see Full Description)

Create a DAG node for specialized management/monitoring tasks. All service nodes are submitted prior to normal nodes.

SERVICE NodeName SubmitDescription

Throttling

CATEGORY (see Full Description)

Assign a specified node to a DAG category.

CATEGORY <NodeName | ALL_NODES> CategoryName
MAXJOBS (see Full Description)

Set the maximum number of sets of jobs that can be submitted at once for a specified CATEGORY.

MAXJOBS CategoryName MaxJobsValue
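
The two throttling commands are typically used together: each node is assigned to a category, and one MAXJOBS line caps that category. The Python sketch below generates such lines for a batch of nodes. The node name prefix xfer, the category name transfer, and the submit file transfer.sub are hypothetical.

```python
def throttled_nodes(count, category, max_jobs, submit_file="transfer.sub"):
    """Build JOB and CATEGORY lines for `count` nodes plus one MAXJOBS limit."""
    lines = []
    for index in range(count):
        node = f"xfer{index}"  # hypothetical node naming scheme
        lines.append(f"JOB {node} {submit_file}")
        lines.append(f"CATEGORY {node} {category}")
    # A single MAXJOBS line applies to every node in the category.
    lines.append(f"MAXJOBS {category} {max_jobs}")
    return lines


print("\n".join(throttled_nodes(3, "transfer", 2)))
```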

DAG Control

ABORT-DAG-ON (see Full Description)

Inform DAGMan to write a rescue file and exit when the specified node exits with the specified value.

ABORT-DAG-ON <NodeName | ALL_NODES> AbortExitValue [RETURN DAGReturnValue]
CONFIG (see Full Description)

Specify custom DAGMan configuration file for the DAG.

CONFIG filename
ENV (see Full Description)

Modify the DAGMan proper job’s environment by explicitly setting environment variables or by filtering variables from condor_submit_dag’s environment at submit time.

ENV GET VAR-1 [VAR-2 ... ]
#  or
ENV SET Key=Value;Key=Value; ...
SET_JOB_ATTR (see Full Description)

Set a ClassAd attribute in the DAGMan proper job’s ad.

SET_JOB_ATTR AttributeName = AttributeValue
REJECT

Mark the DAG input file as rejected to prevent execution.

REJECT

Special Files

DOT (see Full Description)

Inform DAGMan to produce a Graphviz Dot file for visualizing a DAG.

DOT filename [UPDATE | DONT-UPDATE] [OVERWRITE | DONT-OVERWRITE] [INCLUDE <dot-file-header>]
JOBSTATE_LOG (see Full Description)

Inform DAGMan to produce a machine-readable history file.

JOBSTATE_LOG filename
NODE_STATUS_FILE (see Full Description)

Inform DAGMan to produce a snapshot status file for the DAG nodes.

NODE_STATUS_FILE filename [minimumUpdateTime] [ALWAYS-UPDATE]
SAVE_POINT_FILE (see Full Description)

Inform DAGMan to write a save file the first time the specified node starts.

SAVE_POINT_FILE NodeName [Filename]

Produced Files

The following files are always produced automatically by DAGMan on execution, where the primary DAG is the only or first DAG file specified at submit time.

  1. condor_dagman scheduler universe job files:
    <Primary DAG>.condor.sub | DAGMan proper job's submit description file.
    <Primary DAG>.dagman.log | DAGMan proper job's event log file.
    <Primary DAG>.lib.out    | DAGMan proper job's output file.
    <Primary DAG>.lib.err    | DAGMan proper job's error file.
  2. DAGMan informational files:
    <Primary DAG>.dagman.out | DAGMan process's debug log file.
    <Primary DAG>.nodes.log  | Shared job event log file for all jobs managed by DAGMan (Heart of DAGMan).
    <Primary DAG>.metrics    | JSON-formatted file containing DAGMan metrics, written when DAGMan exits.
  3. Other:
    <Primary DAG>.rescue<XXX> | Rescue DAG file denoting completed work from previous execution (see The Rescue DAG).
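
The naming pattern in the list above can be expressed as a small helper, for example when a wrapper script needs to locate the debug log or clean up after a run. This Python sketch only derives names by appending the listed suffixes to the primary DAG file name. The function name and dictionary keys are hypothetical, and the zero-padded three-digit rescue number is an assumption based on the <XXX> placeholder.

```python
def produced_files(primary_dag, rescue_number=None):
    """Map a primary DAG file name to the file names listed above."""
    names = {
        "submit_description": f"{primary_dag}.condor.sub",
        "event_log": f"{primary_dag}.dagman.log",
        "job_output": f"{primary_dag}.lib.out",
        "job_error": f"{primary_dag}.lib.err",
        "debug_log": f"{primary_dag}.dagman.out",
        "nodes_log": f"{primary_dag}.nodes.log",
        "metrics": f"{primary_dag}.metrics",
    }
    if rescue_number is not None:
        # Assumption: <XXX> is a zero-padded three-digit counter.
        names["rescue"] = f"{primary_dag}.rescue{rescue_number:03d}"
    return names


for label, name in produced_files("diamond.dag", rescue_number=1).items():
    print(f"{label}: {name}")
```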

Referable DAG Information

DAGMan provides various pieces of DAG information to scripts and jobs in the form of special referable macros and job ClassAd attributes.

Job Macros

Macros referable by job submit description as $(<macro>)

JOB          | Name of the node this job is associated with.
RETRY        | Current node retry attempt value. Set to 0 on first execution.
FAILED_COUNT | Number of failed nodes currently in the DAG (intended for Final Node).
DAG_STATUS   | Current DAG_Status (intended for Final Node).
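
One use of these macros is to keep the output of each node and retry attempt in separate files. The Python sketch below renders a submit description that references $(JOB) and $(RETRY). The template, the function name, and the executable name run.sh are hypothetical, and the file naming scheme is only one possible convention.

```python
# Submit description template; only {executable} is filled in by Python.
# $(JOB) and $(RETRY) are left as-is for DAGMan to supply at submit time.
SUBMIT_TEMPLATE = """\
executable = {executable}
arguments  = $(JOB) $(RETRY)
output     = $(JOB).$(RETRY).out
error      = $(JOB).$(RETRY).err
queue
"""


def render_submit_description(executable):
    """Return submit description text that uses the DAGMan job macros."""
    return SUBMIT_TEMPLATE.format(executable=executable)


print(render_submit_description("run.sh"), end="")
```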

Job ClassAd Attributes

ClassAd attributes added to the job ad of all jobs managed by DAGMan.

DAGManJobId        | Job-Id of the DAGMan job that submitted this job.
DAGNodeName        | Name of the node to which this job belongs.
DAGManNodeRetry    | The node's current retry number. First execution is 0. This is only included if DAGMAN_NODE_RECORD_INFO includes Retry.
DAGParentNodeNames | List of parent node names. Note: depending on the number of parent nodes, this may be left empty.
DAG_Status         | Current DAG status (Intended for Final Nodes).

Script Macros

Macros that can be passed to a script as optional arguments like $<macro>

For All Scripts:
    JOB               | Name of the node this script is associated with.
    RETRY             | The node's current retry number. Set to 0 on first execution.
    MAX_RETRIES       | Maximum number of retries allowed for the node.
    FAILED_COUNT      | Current number of failed nodes in the DAG.
    DAG_STATUS        | The current DAG_Status.
Only for POST Scripts:
    JOBID             | The Job-Id of the job executed by the node. It is the ClusterId and ProcId of the last job in the set.
    RETURN            | The exit code of the first failed job in the set or 0 for a successful job execution.
    PRE_SCRIPT_RETURN | Return value of the associated node's PRE Script.
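
To show how these macros reach a script, the following is a sketch of a POST script written in Python. It assumes a DAG line of the form SCRIPT POST A post_check.py $JOB $RETURN $RETRY $MAX_RETRIES, so the four values arrive as positional arguments in that order. The script name post_check.py, the function names, and the pass/fail policy are hypothetical.

```python
import sys


def parse_post_args(argv):
    """Parse arguments passed in the order: $JOB $RETURN $RETRY $MAX_RETRIES."""
    node, job_return, retry, max_retries = argv
    return {
        "node": node,
        "job_return": int(job_return),
        "retry": int(retry),
        "max_retries": int(max_retries),
    }


def summarize(info):
    """Describe the outcome of the node's job in one line."""
    status = "succeeded" if info["job_return"] == 0 else "failed"
    remaining = info["max_retries"] - info["retry"]
    return f"node {info['node']} {status}; retries remaining: {remaining}"


def main(argv):
    info = parse_post_args(argv)
    print(summarize(info))
    # Hypothetical policy: exit 0 only if the job itself returned 0.
    return 0 if info["job_return"] == 0 else 1


# Only act as a script when exactly four arguments were supplied.
if __name__ == "__main__" and len(sys.argv) == 5:
    sys.exit(main(sys.argv[1:]))
```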

DAG Submission and Management

For a more in-depth explanation of controlling a DAG, see Basic DAG Controls.

DAG Submission

To submit a DAGMan workflow, run condor_submit_dag with an input file describing the DAG.

$ condor_submit_dag diamond.dag

DAG Monitoring

All the jobs managed by the DAG, and the DAGMan proper job itself, can be monitored with the tools listed below. By default, condor_q returns a condensed overview of the jobs managed by DAGMan that are currently in the queue. To see all jobs individually, use the -nobatch flag.

condor_q

condor_watch_q

htcondor dag status

Stopping a DAG

Pause/Restart

A DAG can be stopped temporarily by using condor_hold on the DAGMan proper job. To restart the DAG, use condor_release.

Remove

To remove a DAG simply use condor_rm on the DAGMan proper job.
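
The three control actions above each map to one command applied to the DAGMan proper job's Job-Id. The Python sketch below builds the corresponding command lines. The action names pause, resume, and remove, the function names, and the job id 1234 are hypothetical. The helper that actually runs a command is defined but deliberately not called, so the example has no effect on a real queue.

```python
import shutil
import subprocess

# Hypothetical action names mapped to the tools named in this section.
TOOLS = {"pause": "condor_hold", "resume": "condor_release", "remove": "condor_rm"}


def dag_control_command(action, dagman_job_id):
    """Return the argument list for the given action on the DAGMan proper job."""
    return [TOOLS[action], str(dagman_job_id)]


def run_if_available(command):
    """Run the command if its tool is on PATH; otherwise only report it."""
    if shutil.which(command[0]) is None:
        print("would run:", " ".join(command))
        return None
    return subprocess.run(command, check=False).returncode


# Print the commands for a hypothetical DAGMan job id without running them.
for action in ("pause", "resume", "remove"):
    print(" ".join(dag_control_command(action, 1234)))
```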