DAGMan Introduction

Describing Workflows with DAGMan

A DAGMan workflow is described in a DAG input file. The input file specifies the nodes of the DAG as well as the dependencies that order the DAG.

A node within a DAG represents a unit of work. It contains the following:

  • Job: An HTCondor job, defined in a submit file.

  • PRE script (optional): A script that runs before the job starts. Typically used to verify that all inputs are valid.

  • POST script (optional): A script that runs after the job finishes. Typically used to verify outputs and clean up temporary files.

The following diagram illustrates the elements of a node – every node must contain a job, with an optional pre and an optional post script.

flowchart LR Start((Start)) --> Job Start --> PREscript subgraph DAG Node PREscript --> Job Job --> POSTscript end Job --> End((End)) POSTscript --> End((End))

An edge in DAGMan describes a dependency between two nodes. DAG edges are directional; each has a parent and a child, where the parent node must finish running before the child starts. Any node can have an unlimited number of parents and children.

Example: Diamond DAG

A simple diamond-shaped DAG, as shown in the following image is presented as a starting point for examples. This diamond DAG contains 4 nodes. This diamond-shaped DAG would be described like the following in the DAG input file:

# File name: diamond.dag

JOB  A  A.sub
JOB  B  B.sub
JOB  C  C.sub
JOB  D  D.sub
PARENT A CHILD B C
PARENT B C CHILD D

JOB

The JOB command specifies an HTCondor job. The syntax used for each JOB command is:

JOB JobName SubmitDescriptionFileName [DIR directory] [NOOP] [DONE]

A JOB entry maps a JobName to an HTCondor submit description. The JobName uniquely identifies nodes within the DAG input file and in output messages. Each node name, given by JobName, within the DAG must be unique.

The values defined for JobName and SubmitDescriptionFileName are case sensitive, as file names in a file system are case sensitive. The JobName can be any string that contains no white space, except for the strings PARENT and CHILD (in upper, lower, or mixed case). JobName also cannot contain special characters (. & +) which are reserved for system use.

The optional DIR keyword specifies a working directory for this node, from which the HTCondor job will be submitted, and from which a PRE and/or POST script will be run. If a relative directory is specified, it is relative to the current working directory as the DAG is submitted.

Note

DAG containing DIR specifications cannot be run in conjunction with the -usedagdir command-line argument to condor_submit_dag.

The optional NOOP keyword identifies a no-operation node. Meaning the nodes job will not be submitted to HTCondor. DAGMan will still execute any PRE and/or POST scripts associated with the node. Marking a node with NOOP is useful for debugging complex DAG structures without changing the flow of the DAG.

The optional DONE keyword identifies a node as being already completed. Meaning neither the nodes job nor scripts will be executed. This is mainly used by Rescue DAGs generated by DAGMan itself, in the event of a failure to complete the workflow.

PARENT/CHILD Relationships

The PARENT … CHILD … command specifies the dependencies within the DAG. Nodes are parents and/or children within the DAG. A parent node must be completed successfully before any of its children may be started. A child node may only be started once all its parents have successfully completed.

The syntax used for each dependency (PARENT/CHILD) command is

PARENT ParentJobName [ParentJobName2 ... ] CHILD  ChildJobName [ChildJobName2 ... ]

The PARENT keyword is followed by one or more ParentJobName(s). The CHILD keyword is followed by one or more ChildJobName(s). Each child job depends on every parent job within the line. A single line in the input file can specify the dependencies from one or more parents to one or more children. The diamond-shaped DAG example may specify the dependencies with

PARENT A CHILD B C
PARENT B C CHILD D

An alternative specification for the diamond-shaped DAG may specify some or all of the dependencies on separate lines:

PARENT A CHILD B C
PARENT B CHILD D
PARENT C CHILD D

As a further example, the following line produces four dependencies:

PARENT p1 p2 CHILD c1 c2
flowchart TD p1 & p2 --> c1 & c2

Node Job Submit File Contents

Inline Submit Descriptions

Instead of using a submit description file, you can alternatively include an inline submit description directly inside the .dag file. An inline submit description should be wrapped in { and } braces, with each argument appearing on a separate line, just like the contents of a regular submit file. Using the previous diamond-shaped DAG example, the diamond.dag file would look like this:

This can be helpful when trying to manage lots of submit descriptions, so they can all be described in the same file instead of needed to regularly shift between many files.

SUBMIT-DESCRIPTION command

In addition to declaring inline submit descriptions as part of a job, they can be declared independently of jobs using the SUBMIT-DESCRIPTION command. This can be helpful to reduce the size and readability of a .dag file when many nodes are running the same job.

A SUBMIT-DESCRIPTION can be defined using the following syntax:

SUBMIT-DESCRIPTION DescriptionName {
    # submit attributes go here
}

An independently declared submit description must have a unique name that is not used by any of the jobs. It can then be linked to a job as follows:

JOB JobName DescriptionName
classDiagram DiamondDesc --|> A DiamondDesc --|> B DiamondDesc --|> C DiamondDesc --|> D class DiamondDesc{ executable = /path/diamond.exe output = diamond.out.$(cluster) error = diamond.err.$(cluster) log = diamond_condor.log universe = vanilla request_cpus = 1 request_memory = 1024M request_disk = 10240K }

One SUBMIT-DESCRIPTION Applied to Multiple Nodes

Note

Both inline submit descriptions and the SUBMIT-DESCRIPTION command don’t allow a queue statement resulting in only a single instance of the job being submitted to HTCondor.

External File Descriptions

Each node in a DAG may use a unique submit description file. A key limitation is that each HTCondor submit description file must submit jobs described by a single cluster number; DAGMan cannot deal with a submit description file producing multiple job clusters.

Consider again the diamond-shaped DAG example, where each node job uses the same submit description file. Since each node uses the same HTCondor submit description file, this implies that each node within the DAG runs the same job. The $(Cluster) macro produces unique file names for each job’s output.

The job ClassAd attribute DAGParentNodeNames is also available for use within the submit description file. It defines a comma separated list of each JobName which is a parent node of this job’s node. This attribute may be used in the arguments command for all but scheduler universe jobs. For example, if the job has two parents, with JobNames B and C, the submit description file command

arguments = $$([DAGParentNodeNames])

will pass the string "B,C" as the command line argument when invoking the job.