Advanced DAGMan Functionality

Custom Job Macros for Nodes

HTCondor submit description files can include custom macros $(macroname) that are set at submit time by passing key=value pairs of information to condor_submit. DAGMan can be told what key=value pairs to pass at node job submit time, allowing a single submit description to easily be used for multiple nodes in a DAG that vary only slightly.

Macro Variables for Nodes

The VARS DAG command in the DAG description file defines variables using a key=value syntax. These variables can then be used in the node's submit description file. The complete VARS line looks like:

VARS <NodeName | ALL_NODES> [PREPEND | APPEND] macroname="string" [macroname2="string2" ... ]

A macroname may contain alphanumeric characters (a-z, A-Z, and 0-9) and the underscore character. A restriction is that the macroname itself cannot begin with the string queue, in any combination of upper or lower case letters.

Correct syntax requires that the value string must be enclosed in double quotes. To use a double quote mark within a string, escape the double quote mark with the backslash character (\"). To add the backslash character itself, use two backslashes (\\).

A single VARS line may contain multiple space-separated key=value pairs. Alternatively, variables for a node can be specified across multiple VARS lines.

Using VARS to provide information for submit description macros reduces the number of submit files needed when multiple nodes share the same submit description with simple variations. The following example shows this behavior for a DAG with jobs that vary only in filenames.

Example DAG description utilizing VARS and a shared submit description file
# File: example.dag
JOB A shared.sub
JOB B shared.sub
JOB C shared.sub

VARS A filename="alpha"
VARS B filename="beta"
VARS C filename="charlie"
Example shared submit description file referencing DAG VARS
# Generic submit description: shared.sub
executable   = progX
output       = $(filename).out
error        = $(filename).err
log          = $(filename).log
queue

For a DAG such as the one above, but with thousands of nodes, the ability to write and maintain a single submit description file together with a single DAG description file is worthwhile.

Prepend or Append Variables to Node

The VARS command can take either the optional PREPEND or APPEND keyword to specify how the variable information that follows is passed to the node's list of jobs at submission time.

  • APPEND will add the variable after the submit description is read, so the passed variable is added as a macro that overwrites any already existing variable values.

  • PREPEND will add the variable before the submit description file is read. This allows the variable to be used in submit description conditionals.

For example, a DAG such as the following, in conjunction with a submit description like the sketch after it, will result in the job's Arguments becoming A was prepended and the output file being named results-B.out.

Example DAG description specifying VARS prepend/append
JOB A conditional.sub

VARS A PREPEND var1="A"
VARS A APPEND  var2="B"
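
The submit description referenced above is not reproduced in this section; the following is a minimal sketch consistent with the described behavior. The executable progX and the default value C are assumptions made for illustration.

Example submit description sketch using conditionals
# File: conditional.sub
executable = progX

# Only a PREPENDed var1 is visible to this conditional, because
# APPENDed variables are added after the file is parsed.
if defined var1
    arguments = "$(var1) was prepended"
else
    arguments = "No variables prepended"
endif

# An APPENDed var2 overwrites this default after parsing, while a
# PREPENDed var2 is overwritten by this line instead.
var2   = C
output = results-$(var2).out

queue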

If instead var1 used APPEND and var2 used PREPEND, then Arguments would become No variables prepended and the output file would be named results-C.out.

Note

If neither PREPEND nor APPEND is used in the VARS line then the variable will either be prepended or appended based on the configuration variable DAGMAN_DEFAULT_APPEND_VARS.

Multiple macroname definitions

If a node defines the same macroname multiple times in a DAG, then a warning will be written to the log and the last defined instance will be used for the variable's value. Given the following example, custom_macro will be set to bar and the following warning message will be output.

Example DAG description declaring the same VARS variable multiple times
# File: example.dag
JOB ONLY sample.sub
VARS ONLY custom_macro="foo"
VARS ONLY custom_macro="bar"
Warning: VAR custom_macro is already defined in node ONLY
Discovered at file "example.dag", line 4

Variables for Job Arguments

The value provided for a variable can contain whitespace such as spaces and tabs, single and double quotes, and backslashes. To use these special characters in the arguments line passed to condor_submit, use the appropriate syntax and/or character-escaping mechanisms.

Note

Regardless of the chosen arguments syntax, the variable value is surrounded by double quotes, meaning proper double-quote escaping must be provided to utilize double quotes in a node job's arguments.

Single quotes can be used in the following ways for arguments:

  • in Old Syntax, within a macro's value specification

  • in New Syntax, within a macro's value specification

  • in New Syntax only, to delimit an argument containing white space

  • in New Syntax only, escaped by another single quote to pass a literal single quote as part of an argument. An example is provided in NodeA's fourth macro (see the sketch below).

Provided an example DAG description file like the sketch below, the following would occur:
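
The original example DAG description file is not reproduced here; the sketch below is reconstructed from the escaping rules above so that it is consistent with the outputs listed in each case. The submit file names are placeholders.

Example DAG description sketch passing special characters through VARS
# File: example.dag
JOB NodeA A.sub
VARS NodeA first="Alberto Contador"
VARS NodeA second="\"\"Andy Schleck\"\""
VARS NodeA third="Lance\\ Armstrong"
VARS NodeA fourth="Vincenzo ''The Shark' Nibali"
VARS NodeA misc="!@#$%^&*()_-=+=[]{}?/"

JOB NodeB B.sub
VARS NodeB first="Lance_Armstrong"
VARS NodeB second="\\\"Andreas_Kloden\\\""
VARS NodeB third="Ivan_Basso"
VARS NodeB fourth="Bernard_'The_Badger'_Hinault"
VARS NodeB misc="!@#$%^&*()_-=+=[]{}?/"

JOB NodeC C.sub
VARS NodeC args="'Nairo Quintana' 'Chris Froome'"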

  1. NodeA using the New Syntax:

    The following arguments line would produce the subsequent values passed to NodeA’s executable. The single quotes around each variable reference are only necessary if the variable value may contain spaces or tabs.

    arguments = "'$(first)' '$(second)' '$(third)' '($fourth)' '$(misc)'"
    
    Alberto Contador
    "Andy Schleck"
    Lance\ Armstrong
    Vincenzo 'The Shark' Nibali
    !@#$%^&*()_-=+=[]{}?/
    
  2. NodeB using the Old Syntax:

    The following arguments line would produce the subsequent values passed to NodeB’s executable.

    arguments = $(first) $(second) $(third) $(fourth) $(misc)
    
    Lance_Armstrong
    "Andreas_Kloden"
    Ivan_Basso
    Bernard_'The_Badger'_Hinault
    !@#$%^&*()_-=+=[]{}?/
    
  3. NodeC using the New Syntax for single quote delimiting:

    The following arguments line would produce the subsequent values passed to NodeC’s executable.

    arguments = "$(args)"
    
    Nairo Quintana
    Chris Froome
    

Referencing Macros Within a Definition

The variable's value can contain an HTCondor Job Description Language (JDL) macro expansion $(<macroname>), allowing the DAGMan-provided macros to utilize other existing macros, as in the following:

Example DAG description creating expandable macros with DAG VARS
# File: example.dag
JOB A sample.sub
VARS A test_case="$(JOB)-$(ClusterId)"
Example submit description file
# File: sample.sub
executable = progX
arguments  = $(args)
output     = $(test_case).out
error      = $(test_case).err
log        = $(test_case).log

queue

Given the example listed above, if the job's ClusterId is 42 then the output file would be A-42.out, the error file would be A-42.err, and the log file would be A-42.log.

Using VARS to Define ClassAd Attributes

The macroname may also begin with My., in which case it names a ClassAd attribute. For example, the VARS specification

VARS NodeA My.name="\"Greg\""

results in the NodeA job ClassAd attribute

A = "Greg"

Special Node Types

While most DAGMan nodes are the standard JOB type that run work jobs and possibly a PRE or POST script, special nodes can be specified in the DAG description file to help manage the DAG and its resources in various ways.

FINAL Node

The FINAL node is a single, special node that is always run at the end of the DAG, even if previous nodes in the DAG have failed or the DAG is removed via condor_rm (on Unix systems). The FINAL node can be used for tasks such as cleaning up intermediate files and checking the output of previous nodes. To declare a FINAL node, use the following syntax for the FINAL command:

FINAL NodeName SubmitDescription [DIR directory] [NOOP]

Like the JOB command, the FINAL command produces a node with name NodeName and an associated submit description. The DIR and NOOP keywords work exactly as detailed for the JOB command.

Warning

There can only be one FINAL node in a DAG. If multiple are defined then DAGMan will log a parse error and fail.

The success or failure of the FINAL node determines the success or failure of the entire DAG. This includes any status specified by any ABORT-DAG-ON specification that has taken effect. If some nodes of a DAG fail, but the FINAL node succeeds, the DAG will be considered successful. Therefore, it is important to be careful about setting the exit status of the FINAL node.

The FINAL node can utilize the special macros DAG_STATUS and/or FAILED_COUNT in the job submit description or the script (PRE/POST) arguments to help determine the correct exit behavior of the FINAL node, and subsequently the DAG as a whole.
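
For example, a FINAL node's submit description might pass these macros to its executable as arguments. This is a minimal sketch; progF and the file names are placeholders.

Example FINAL node submit description sketch
# File: final.sub
executable = progF
# DAG_STATUS and FAILED_COUNT are provided by DAGMan for the FINAL node
arguments  = "$(DAG_STATUS) $(FAILED_COUNT)"
output     = final.out
error      = final.err
log        = final.log

queue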

If DAGMan is removed via condor_rm then DAGMan will allow two submit attempts of the FINAL node (On Unix only).

PROVISIONER Node

The PROVISIONER node is a single and special node that is always run at the beginning of a DAG. It can be used to provision resources (e.g. Amazon EC2 instances, in-memory database servers, etc.) that can then be used by the remainder of the nodes in the workflow. The syntax used for the PROVISIONER command is

PROVISIONER NodeName SubmitDescription

Note

Unlike all other nodes in DAGMan, the PROVISIONER node is limited to running a single job. If more than one job is detected at the node's job submission time, DAGMan will exit without writing a Rescue file or running the FINAL node (if provided).

When a PROVISIONER node is defined in a DAG, DAGMan will run it before all other nodes and wait for the provisioner node's job to state that it is ready. To achieve this, the provisioner node's job must set its job ClassAd attribute ProvisionerState to the enumerated value ProvisionerState::PROVISIONING_COMPLETE (currently: 2). Once notified, DAGMan will begin running the other nodes.
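
As a sketch, one way for the provisioner job to signal readiness is with condor_chirp; this assumes the job's submit description enables the chirp proxy (+WantIOProxy = true), and the provisioning commands below are placeholders.

Example provisioner executable sketch signaling readiness
#!/bin/bash
# File: provision.sh (hypothetical provisioner executable)
acquire_cloud_resources           # placeholder for real provisioning work

# Mark this job ready: 2 == ProvisionerState::PROVISIONING_COMPLETE
condor_chirp set_job_attr ProvisionerState 2

sleep 14400                       # hold the resources while the DAG runs
release_cloud_resources           # placeholder for de-provisioning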

The PROVISIONER node runs for a set amount of time defined in its job. It does not get terminated automatically at the end of a DAG workflow. The expectation is that the job needs to explicitly de-provision any resources, such as expensive cloud computing instances that should not be allowed to run indefinitely.

Warning

Currently only one PROVISIONER node may exist for a DAG. If multiple are defined in a DAG then an error will be logged and the DAG will fail.

SERVICE Node

A SERVICE node is a special type of node that is always run at the beginning of a DAG. These are typically used to run tasks that need to run alongside a DAGMan workflow (e.g. progress monitoring) without any direct dependencies on the other nodes in the workflow.

The syntax used for the SERVICE command is

SERVICE NodeName SubmitDescription

If a DAGMan workflow finishes while there are SERVICE nodes still running, it will remove all running SERVICE nodes and exit.

While the SERVICE node is started before other nodes in the DAG, there is no guarantee that it will start running before any of the other nodes. However, running it directly on the access point by setting universe to Local will make it more likely to begin running prior to other nodes.

Note

A SERVICE node runs on a best-effort basis. If this node fails to submit correctly, this will not register as an error and the DAG workflow will continue normally.

Node Priorities

DAGMan workflows can assign a priority to each node in a DAG. Node priorities help determine which nodes, among those whose PARENT dependencies have completed, are submitted first. Just like the Job Priority for a job in the queue, the priority value is an integer (which can be negative), where a larger numerical priority is better. The default priority is 0. To assign a node's priority, use the PRIORITY command as follows:

PRIORITY <NodeName | ALL_NODES> PriorityValue

Node priorities are most relevant when DAGMan Throttling is being utilized or if there are not enough resources in the pool to run all recently submitted node jobs.

Properties of Setting Node Priority

  • If a node priority is set, then at job submission time DAGMan will set the job's JobPrio attribute via the priority submit command. This value is passed before the submit description is processed.

  • When a Sub-DAG has an associated node PRIORITY, the Sub-DAG priority will affect all priorities for nodes in the Sub-DAG. See Effective node priorities.

  • Splices cannot be assigned priorities, but individual nodes within a splice can.

  • DAGs containing PRE scripts may not submit the nodes in exact priority order, even if doing so would satisfy the DAG dependencies.

Note

When using an external submit file for a node (not an inline or shared submit-description), any priority declared in that file takes precedence over the DAGMan value passed at job submission time.

Note

Node priorities do not override DAG PARENT/CHILD dependencies and are not guarantees of the relative order in which node jobs are run.

Effective node priorities

When a Sub-DAG has an associated node priority, all of the node priorities within the Sub-DAG are modified to become effective node priorities. The effective node priority is calculated by adding the Sub-DAG's priority to each internal node's priority. The default Sub-DAG priority is 0.

Example DAG description declaring a Sub-DAG with node priorities
# File: priorities.dag
JOB A sample.sub
SUBDAG EXTERNAL B lower.dag

PRIORITY A 25
PRIORITY B 100
Example sub-DAG description using node priorities
# File: lower.dag
JOB lowA sample.sub
JOB lowB sample.sub

PRIORITY lowA 10
PRIORITY lowB 50

Given the DAGs described above, the effective node priorities (not including the Sub-DAG node B) are as follows:

Node    Effective Priority
A       25
lowA    110
lowB    150

DAGMan and Accounting Groups

condor_dagman will propagate its accounting_group and accounting_group_user values down to all nodes within the DAG (including Sub-DAGs). Any accounting group information explicitly set within a node's submit description takes precedence over the propagated accounting information. This allows accounting information to be set easily for all DAG nodes while giving specific nodes a way to run with different accounting information.
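
For example, accounting information can be attached to the DAGMan proper job at submit time, from which it propagates to the node jobs. The group and user names here are illustrative.

Example setting DAGMan accounting information at submit time
$ condor_submit_dag -append 'accounting_group = group_physics' \
      -append 'accounting_group_user = albert' workflow.dag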

For more information about HTCondor’s accounting behavior see Group Accounting and/or Accounting Groups with Hierarchical Group Quotas.

ALL_NODES Option

Certain DAG description file commands take the alternative, case-insensitive keyword ALL_NODES in place of a specific node name. This allows a common node property to be applied to all nodes (excluding SERVICE nodes and the FINAL node). The following commands can utilize ALL_NODES:

SCRIPT

PRE_SKIP

RETRY

VARS

PRIORITY

CATEGORY

ABORT-DAG-ON

When multiple commands set the same property of a DAG node, the last one defined takes precedence, overriding earlier definitions. For example:

Example DAG description using ALL_NODES keyword
# File: sample.dag
JOB A node.sub
JOB B node.sub
JOB C node.sub

SCRIPT PRE ALL_NODES my_script $JOB

VARS A name="alphaNode"

VARS ALL_NODES name="$(JOB)"

# This overrides the above VARS command for node B.
VARS B name="nodeB"

RETRY all_nodes 3

RETRY A 10

INCLUDE

The INCLUDE command allows the contents of one DAG file to be parsed inline as if they were physically included in the referencing DAG file. The syntax for INCLUDE is

INCLUDE FileName

The INCLUDE command allows for easier DAG management and the ability to change the DAG without losing the older setup. For example, a DAG could describe all the nodes to be executed in the workflow and include a file that describes the PARENT/CHILD relationships. If multiple alternative DAG structure files are created, then simply changing the INCLUDE line modifies the entire DAG structure between executions, without manually editing many lines. A sketch of this pattern follows.
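
Example DAG descriptions using INCLUDE to swap workflow structure; all file names here are placeholders.

# File: workflow.dag
JOB A node.sub
JOB B node.sub
JOB C node.sub

# Change only this line to swap in a different structure
INCLUDE structure-linear.dag

# File: structure-linear.dag
PARENT A CHILD B
PARENT B CHILD C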

All INCLUDE files must contain proper DAG syntax. Included files can be nested to any depth (be careful not to create a cycle).

Warning

Unlike splicing, INCLUDE does not modify node names, so a parse error will result if the same node name is used more than once across the including and included files.

DAG Manager Job Specifications

While most DAG commands modify/describe the DAG workflow and its various pieces, some commands modify the DAGMan proper job itself.

Setting Job Ad Attributes

The SET_JOB_ATTR command specifies an attribute/value pair to be set in the DAGMan proper job's ClassAd. The syntax is:

SET_JOB_ATTR AttributeName = AttributeValue

An attribute set with SET_JOB_ATTR is not propagated down to the node jobs of the DAG.

The provided value can contain spaces when contained in single or double quotes. These quote marks will become part of the value.
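
For example (the attribute names here are arbitrary):

SET_JOB_ATTR TestNumber = 17
# The quote marks below become part of the attribute value
SET_JOB_ATTR Greeting = "hello world"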

If the same attribute is specified multiple times then the last-specified value is utilized. An attribute set in the DAG file can be overridden at submit time as follows:

Example setting DAGMan job ad attribute at submit time
$ condor_submit_dag -append 'My.<attribute> = <value>'

Controlling the Job Environment

The ENV command specifies environment variables to set in the DAGMan job's environment, or to get from the environment that the DAGMan job was submitted from. It is important to know that the environment variables in the DAG manager job's environment affect scripts and node jobs that rely on environment variables, since scripts and node jobs are submitted from the DAGMan job's environment. The syntax is:

ENV GET VAR-1 [VAR-2 ... ]
#  or
ENV SET Key=Value;Key=Value; ...
  • GET Keyword:

    Takes a list of environment variable names to be added to the DAGMan job’s getenv command in the *.condor.sub file.

  • SET Keyword:

    Takes a semi-colon delimited list of key=value pairs of information to explicitly add to the DAGMan job’s environment command in the *.condor.sub file.

    Note

    The added key=value pairs must follow the normal HTCondor job environment rules.
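
For example (the variable names and values below are illustrative):

# Copy these variables from the environment condor_submit_dag runs in
ENV GET PATH PYTHONPATH

# Explicitly set variables in the DAGMan job's environment
ENV SET ANALYSIS_DIR=/data/run1;VERBOSITY=2;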

DAG Specific Configuration

DAGMan allows all DAGMan Configuration File Entries to be applied on a per-DAG basis. To apply custom configuration to a DAGMan workflow, simply create a custom configuration file and provide it to the CONFIG command.

Only one configuration file is permitted per DAGMan process. If multiple DAGs are submitted at one time, or a workflow is comprised of Splices, then a fatal error will occur upon detection of more than one configuration file. Sub-DAGs run as their own DAGMan process, allowing Sub-DAGs to have their own configuration files.

Custom configuration values are applied to the entire DAG workflow. So, if multiple DAGs are submitted at one time, then all of the DAGs will use the custom configuration even if some of them did not specify a custom configuration file.
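
For example, a DAG description could point at a small configuration file; the file name and values here are illustrative (DAGMAN_MAX_JOBS_SUBMITTED and DAGMAN_MAX_JOBS_IDLE are existing configuration entries).

Example per-DAG custom configuration
# In the DAG description file:
CONFIG my_dag.config

# File: my_dag.config
DAGMAN_MAX_JOBS_SUBMITTED = 100
DAGMAN_MAX_JOBS_IDLE = 25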

Note

Only configuration options that apply specifically to DAGMan or to DaemonCore (like debug log levels) take effect when added to a custom DAG configuration file.

Given that there are many layers of configuration processing, and that some condor_submit_dag options have the same effect as DAGMan configuration options, the values DAGMan uses are dictated by the following ordered list, where elements processed later take precedence:

  1. HTCondor system configuration as set up by the AP administrator(s).

  2. Configuration options passed as special HTCondor environment variables _CONDOR_<config option>=Value.

  3. Custom configuration provided by the CONFIG command or the condor_submit_dag -config option.

  4. condor_submit_dag options that control the same behavior as a configuration option, such as DAGMAN_MAX_JOBS_SUBMITTED and -maxjobs.

Visualizing DAGs

To help visualize a DAG, DAGMan has the ability to create a dot input file for the AT&T Research Labs Graphviz package to draw the DAG. DAGMan will produce dot files when the DOT command is declared with the following syntax:

DOT filename [UPDATE | DONT-UPDATE] [OVERWRITE | DONT-OVERWRITE] [INCLUDE <dot-file-header>]

The DOT command can take several optional parameters as follows:

  • UPDATE This will update the dot file every time a significant change happens.

  • DONT-UPDATE Creates a single dot file when DAGMan begins executing. This is the default if the UPDATE parameter is not used.

  • OVERWRITE Overwrites the dot file each time it is created. This is the default, unless DONT-OVERWRITE is specified.

  • DONT-OVERWRITE Creates a new dot file, named <filename>.<num>, each time one is written, where the number increases with each write (e.g. dag.dot.0, then dag.dot.1).

  • INCLUDE Includes the contents of the specified file in the produced dot file, after the graph's label line.
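
For example, the following DOT command writes an updated snapshot of the DAG on each significant change, keeping every version, and one of the produced files can then be rendered with the Graphviz dot tool. The file names are illustrative.

Example DOT command and rendering step
# In the DAG description file:
DOT dag.dot UPDATE DONT-OVERWRITE

# Render one of the produced snapshots to PostScript:
$ dot -Tps dag.dot.1 -o dag.ps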