Advanced DAGMan Functionality
Custom Job Macros for Nodes
HTCondor submit description files can include custom macros, $(macroname), that are set at submit time by passing key=value pairs of information to condor_submit. DAGMan can be told what key=value pairs to pass at node job submit time, allowing a single submit description to easily be used for multiple nodes in a DAG with variation.
Macro Variables for Nodes
The VARS DAG command in the DAG description file defines variables using a key=value syntax. These variables can then be used in the job's submit description file. The complete VARS line looks like:
VARS <NodeName | ALL_NODES> [PREPEND | APPEND] macroname="string" [macroname2="string2" ... ]
A macroname may contain alphanumeric characters (a-z, A-Z, and 0-9) and the underscore character. One restriction is that the macroname itself cannot begin with the string queue, in any combination of upper- or lowercase letters.
Correct syntax requires that the value string be enclosed in double quotes. To use a double quote mark within a string, escape the double quote mark with the backslash character (\"). To add the backslash character itself, use two backslashes (\\).
A single VARS line may contain multiple space-separated key=value pairs. Alternatively, a node may be specified in multiple VARS lines.
Using VARS to provide information for submit description macros greatly reduces the number of submit files needed when multiple nodes share the same submit description with only simple variations. The following example shows this behavior for a DAG whose jobs vary only in filenames.
# File: example.dag
JOB A shared.sub
JOB B shared.sub
JOB C shared.sub
VARS A filename="alpha"
VARS B filename="beta"
VARS C filename="charlie"
# Generic submit description: shared.sub
executable = progX
output = $(filename).out
error = $(filename).err
log = $(filename).log
queue
For a DAG such as the one above, but with thousands of nodes, the ability to write and maintain a single submit description file together with a single DAG description file is worthwhile.
Prepend or Append Variables to Node
The VARS command can take either the optional PREPEND or APPEND keyword to specify how the variable information that follows is passed to the node's list of jobs at submission time.
APPEND adds the variable after the submit description is read, so the passed variable is added as a macro that overwrites any existing variable value of the same name.
PREPEND adds the variable before the submit description file is read. This allows the variable to be used in submit description conditionals.
For example, a DAG such as the following, used with a submit description containing conditionals, will result in the job's Arguments being A was prepended and the output file being named results-B.out.
JOB A conditional.sub
VARS A PREPEND var1="A"
VARS A APPEND var2="B"
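The submit description for conditional.sub is not reproduced here; a hypothetical sketch that would produce the behavior described (the conditional structure and all names besides var1/var2 are assumptions) is:

```
# conditional.sub -- hypothetical sketch, not the original example file.
executable = progX

# var1 is only visible at parse time if it was PREPENDed.
if defined var1
    arguments = "$(var1) was prepended"
else
    arguments = "No variables prepended"
endif

# An APPENDed var2 overrides this value after the file is read,
# while a PREPENDed var2 is overridden by it.
var2 = C
output = results-$(var2).out
queue
```

With PREPEND var1="A" and APPEND var2="B", this sketch yields Arguments of A was prepended and an output file named results-B.out, matching the description above.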
If instead var1 used APPEND and var2 used PREPEND, then Arguments would become No variables prepended and the output file would be named results-C.out.
Note
If neither PREPEND nor APPEND is used in the VARS line then the variable will either be prepended or appended based on the configuration variable DAGMAN_DEFAULT_APPEND_VARS.
Multiple macroname definitions
If a node defines the same macroname multiple times in a DAG, then a warning is written to the log, and the last defined instance is used as the variable's value. Given the following example, custom_macro will be set to bar, and the following warning message is output.
# File: example.dag
JOB ONLY sample.sub
VARS ONLY custom_macro="foo"
VARS ONLY custom_macro="bar"
Warning: VAR custom_macro is already defined in node ONLY
Discovered at file "example.dag", line 4
Variables for Job Arguments
The value provided for a variable may contain whitespace (spaces and tabs), single and double quotes, and backslashes. To use these special characters in the arguments line for condor_submit, use the appropriate syntax and/or character escaping mechanisms.
Note
Regardless of the chosen arguments syntax, the variable value is surrounded in double quotes, meaning proper double quote escaping must be provided to utilize double quotes in a node job's arguments.
Single quotes can be used in three ways for arguments:
in Old Syntax, within a macro's value specification
in New Syntax, within a macro's value specification
in New Syntax only, to delimit an argument containing white space
Additionally, in New Syntax, a single quote can be escaped with a second single quote to pass a literal single quote as part of an argument, as shown in NodeA's fourth macro below.
Given an example DAG description file defining the macros first, second, third, fourth, and misc for NodeA and NodeB, and args for NodeC, the following would occur:
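A sketch of such a DAG file is below; the exact escaping is a plausible reconstruction and should be verified against the quoting rules above, and the submit file names are illustrative:

```
# Hypothetical reconstruction of the example DAG (escaping illustrative)
JOB NodeA A.submit
VARS NodeA first="Alberto Contador"
VARS NodeA second="\"\"Andy Schleck\"\""
VARS NodeA third="Lance\\ Armstrong"
VARS NodeA fourth="Vincenzo ''The Shark'' Nibali"
VARS NodeA misc="!@#$%^&*()_-=+=[]{}?/"

JOB NodeB B.submit
VARS NodeB first="Lance_Armstrong"
VARS NodeB second="\\\"Andreas_Kloden\\\""
VARS NodeB third="Ivan_Basso"
VARS NodeB fourth="Bernard_'The_Badger'_Hinault"
VARS NodeB misc="!@#$%^&*()_-=+=[]{}?/"

JOB NodeC C.submit
VARS NodeC args="'Nairo Quintana' 'Chris Froome'"
```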
- NodeA using the New Syntax:
The following arguments line would produce the subsequent values passed to NodeA’s executable. The single quotes around each variable reference are only necessary if the variable value may contain spaces or tabs.
arguments = "'$(first)' '$(second)' '$(third)' '$(fourth)' '$(misc)'"
Alberto Contador "Andy Schleck" Lance\ Armstrong Vincenzo 'The Shark' Nibali !@#$%^&*()_-=+=[]{}?/
- NodeB using the Old Syntax:
The following arguments line would produce the subsequent values passed to NodeB’s executable.
arguments = $(first) $(second) $(third) $(fourth) $(misc)
Lance_Armstrong "Andreas_Kloden" Ivan_Basso Bernard_'The_Badger'_Hinault !@#$%^&*()_-=+=[]{}?/
- NodeC using the New Syntax for single quote delimiting:
The following arguments line would produce the subsequent values passed to NodeC’s executable.
arguments = "$(args)"
Nairo Quintana Chris Froome
Referencing Macros Within a Definition
The variable's value can contain an HTCondor Job Description Language (JDL) macro expansion, $(<macroname>), allowing the DAGMan-provided macros to utilize other existing macros, as in the following:
# File: example.dag
JOB A sample.sub
VARS A test_case="$(JOB)-$(ClusterId)"
# File: sample.sub
executable = progX
arguments = $(args)
output = $(test_case).out
error = $(test_case).err
log = $(test_case).log
queue
Given the example listed above, if the node job's ClusterId is 42, then the output file would be A-42.out, the error file would be A-42.err, and the log file would be A-42.log.
Using VARS to Define ClassAd Attributes
The macroname may also begin with My., in which case it names a ClassAd attribute. For example, the VARS specification
VARS NodeA My.name="\"Greg\""
results in the NodeA job ClassAd attribute
name = "Greg"
Special Node Types
While most DAGMan nodes are the standard JOB type that run work jobs and possibly a PRE or POST script, special nodes can be specified in the DAG submit description to help manage the DAG and its resources in various ways.
FINAL Node
The FINAL node is a single, special node that is always run at the end of the DAG, even if previous nodes in the DAG have failed or the DAG is removed via condor_rm (on Unix systems). The FINAL node can be used for tasks such as cleaning up intermediate files and checking the output of previous nodes. To declare a FINAL node, use the following syntax for the FINAL command:
FINAL NodeName SubmitDescription [DIR directory] [NOOP]
Like the JOB command, the FINAL command produces a node with name NodeName and an associated submit description. The DIR and NOOP keywords work exactly as detailed for the JOB command.
Warning
There can only be one FINAL node in a DAG. If multiple are defined then DAGMan will log a parse error and fail.
The success or failure of the FINAL node determines the success or failure of the entire DAG. This includes any status specified by any ABORT-DAG-ON specification that has taken effect. If some nodes of a DAG fail, but the FINAL node succeeds, the DAG will be considered successful. Therefore, it is important to be careful about setting the exit status of the FINAL node.
The FINAL node can utilize the special macros DAG_STATUS and/or FAILED_COUNT in the job submit description or the script (PRE/POST) arguments to help determine the correct exit behavior of the FINAL node, and subsequently the DAG as a whole.
If DAGMan is removed via condor_rm then DAGMan will allow two submit attempts of the FINAL node (On Unix only).
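A hedged sketch of wiring these special macros through (the node, script, and executable names are illustrative):

```
# In the DAG file:
FINAL Cleanup final.sub
SCRIPT POST Cleanup summarize.sh $DAG_STATUS $FAILED_COUNT

# final.sub can reference the same macros directly:
executable = cleanup_job
arguments  = "$(DAG_STATUS) $(FAILED_COUNT)"
queue
```

The cleanup job or its POST script can then exit zero or non-zero based on these values, which in turn determines the success or failure of the whole DAG.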
PROVISIONER Node
The PROVISIONER node is a single, special node that is always run at the beginning of a DAG. It can be used to provision resources (i.e. Amazon EC2 instances, in-memory database servers, etc.) that can then be used by the remainder of the nodes in the workflow. The syntax used for the PROVISIONER command is
PROVISIONER NodeName SubmitDescription
Note
Unlike all other nodes in DAGMan, the PROVISIONER node is limited to running a single job. If more than one job is detected at the node's job submission time, DAGMan will exit without writing a Rescue file or running the FINAL node (if provided).
When a PROVISIONER node is defined in a DAG, DAGMan will run the PROVISIONER node before all other nodes and wait for the provisioner node's job to state that it is ready. To achieve this, the provisioner node's job must set its job ClassAd attribute ProvisionerState to the enumerated value ProvisionerState::PROVISIONING_COMPLETE (currently: 2). Once notified, DAGMan will begin running the other nodes.
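A minimal sketch of this wiring follows; all file and node names are illustrative, and the chirp-based attribute update is one possible mechanism (it assumes the job requests a chirp I/O proxy):

```
# In the DAG file:
PROVISIONER StartResources provision.sub
JOB A work.sub

# provision.sub (sketch):
universe     = vanilla
executable   = provision.sh
+WantIOProxy = true
queue
```

After provisioning succeeds, provision.sh could advertise readiness with, for example, condor_chirp set_job_attr ProvisionerState 2, after which DAGMan starts the remaining nodes.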
The PROVISIONER node runs for a set amount of time defined in its job. It does not get terminated automatically at the end of a DAG workflow. The expectation is that the job needs to explicitly de-provision any resources, such as expensive cloud computing instances that should not be allowed to run indefinitely.
Warning
Currently only one PROVISIONER node may exist for a DAG. If multiple are defined in a DAG then an error will be logged and the DAG will fail.
SERVICE Node
A SERVICE node is a special type of node that is always run at the beginning of a DAG. These are typically used to run tasks that need to run alongside a DAGMan workflow (i.e. progress monitoring) without any direct dependencies on the other nodes in the workflow.
The syntax used for the SERVICE command is
SERVICE NodeName SubmitDescription
If a DAGMan workflow finishes while there are SERVICE nodes still running, it will remove all running SERVICE nodes and exit.
While the SERVICE node is started before other nodes in the DAG, there is no guarantee that it will start running before any of the other nodes. However, running it directly on the access point by setting universe to local will make it more likely to begin running prior to other nodes.
Note
A SERVICE node runs on a best-effort basis. If this node fails to submit correctly, this will not register as an error and the DAG workflow will continue normally.
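As an illustrative sketch (the node and file names are hypothetical):

```
# In the DAG file:
SERVICE Monitor monitor.sub
JOB A work.sub

# monitor.sub: run on the access point so it starts promptly
universe   = local
executable = monitor.sh
queue
```

If the DAG finishes while monitor.sh is still running, DAGMan removes the Monitor job and exits.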
Node Priorities
DAGMan workflows can assign a priority to a node in the DAG. Doing so determines the order in which nodes whose PARENT dependencies have completed are submitted. Just like the Job Priority for a job in the queue, the priority value is an integer (which can be negative), where a larger numerical priority is better. The default priority is 0. To assign a node's priority, follow the syntax for the PRIORITY command:
PRIORITY <NodeName | ALL_NODES> PriorityValue
Node priorities are most relevant when DAGMan Throttling is being utilized or if there are not enough resources in the pool to run all recently submitted node jobs.
Properties of Setting Node Priority
If a node priority is set, then at submission time DAGMan sets the node job's JobPrio attribute via the priority submit command, which is passed before the submit description is processed.
When a Sub-DAG has an associated node PRIORITY, the Sub-DAG's priority affects the priorities of all nodes in the Sub-DAG. See Effective node priorities.
Splices cannot be assigned priorities, but individual nodes within a splice can.
DAGs containing PRE scripts may not submit the nodes in exact priority order, even if doing so would satisfy the DAG dependencies.
Note
When using an external submit file for a node (not inline or submit-description), any priority declared in that file takes precedence over the DAGMan value passed at job submission time.
Note
Node priorities do not override DAG PARENT/CHILD dependencies and are not guarantees of the relative order in which node jobs are run.
Effective node priorities
When a Sub-DAG has an associated node priority, all of the node priorities within the Sub-DAG are modified to become effective node priorities. The effective node priority is calculated by adding the Sub-DAG's priority to each internal node's priority. The default Sub-DAG priority is 0.
# File: priorities.dag
JOB A sample.sub
SUBDAG EXTERNAL B lower.dag
PRIORITY A 25
PRIORITY B 100
# File: lower.dag
JOB lowA sample.sub
JOB lowB sample.sub
PRIORITY lowA 10
PRIORITY lowB 50
Provided the DAGs described above, the effective node priorities (not including the Sub-DAG node B) are as follows:

Node | Effective Priority
---|---
A | 25
lowA | 110
lowB | 150
DAGMan and Accounting Groups
condor_dagman propagates its accounting_group and accounting_group_user values down to all nodes within the DAG (including Sub-DAGs). Any accounting group information explicitly set in a node's submit description takes precedence over the propagated accounting information. This allows easy setting of accounting information for all DAG nodes while giving specific nodes a way to run with different accounting information.
For more information about HTCondor’s accounting behavior see Group Accounting and/or Accounting Groups with Hierarchical Group Quotas.
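For example, accounting information might be attached to the DAGMan proper job at submit time so it propagates to all node jobs (the group and user names here are illustrative):

```
$ condor_submit_dag -append 'accounting_group = group_physics' \
      -append 'accounting_group_user = alice' workflow.dag
```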
ALL_NODES Option
Certain DAG description file commands take the case-insensitive keyword ALL_NODES in place of a specific node name. This allows a common node property to be applied to all nodes (excluding SERVICE nodes and the FINAL node). Commands such as SCRIPT, VARS, and RETRY, shown in the example below, can utilize ALL_NODES.
When multiple commands set a DAG node's property, the last one defined takes precedence, overriding earlier definitions. For example:
# File: sample.dag
JOB A node.sub
JOB B node.sub
JOB C node.sub
SCRIPT PRE ALL_NODES my_script $JOB
VARS A name="alphaNode"
VARS ALL_NODES name="$(JOB)"
# This overrides the above VARS command for node B.
VARS B name="nodeB"
RETRY all_nodes 3
RETRY A 10
INCLUDE
The INCLUDE command allows the contents of one DAG file to be parsed inline as if they were physically included in the referencing DAG file. The syntax for INCLUDE is
INCLUDE FileName
The INCLUDE command allows for easier DAG management and the ability to change the DAG without losing the older setup. For example, a DAG could describe all the nodes to be executed in the workflow and include a file that describes the PARENT/CHILD relationships. If multiple different DAG structure files were created, then simply changing the INCLUDE line modifies the entire DAG structure between executions without manually changing each line.
All INCLUDE files must contain proper DAG syntax. Included files can be nested to any depth (be careful not to create a cycle).
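As an illustrative sketch (the file and node names are hypothetical), the structure can be kept in a separate file:

```
# File: workflow.dag
JOB A step.sub
JOB B step.sub
JOB C step.sub
INCLUDE structure.dag

# File: structure.dag
PARENT A CHILD B
PARENT B CHILD C
```

Swapping structure.dag for an alternative file (say, one that runs B and C in parallel) changes the workflow shape without touching workflow.dag.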
Warning
INCLUDE does not rename nodes the way splicing does, so a parse error will result if the same node name is used more than once.
DAG Manager Job Specifications
While most DAG commands modify/describe the DAG workflow and its various pieces, some commands modify the DAGMan proper job itself.
Setting Job Ad Attributes
The SET_JOB_ATTR command sets an attribute/value pair in the DAGMan proper job's ClassAd. The syntax is:
SET_JOB_ATTR AttributeName = AttributeValue
An attribute set with SET_JOB_ATTR is not propagated down to the node jobs of the DAG.
The provided value can contain spaces when contained in single or double quotes. These quote marks will become part of the value.
If the same attribute is specified multiple times then the last-specified value is utilized. An attribute set in the DAG file can be overridden at submit time as follows:
$ condor_submit_dag -append 'My.<attribute> = <value>'
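A small illustrative example (the attribute name and values are hypothetical); the second line wins because last-specified values are used:

```
SET_JOB_ATTR WorkflowLabel = "nightly"
SET_JOB_ATTR WorkflowLabel = "nightly-v2"
```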
Controlling the Job Environment
The ENV command specifies environment variables to set in the DAGMan job's environment, or to get from the environment that the DAGMan job was submitted from. It is important to know that environment variables in the DAG manager job's environment affect scripts and node jobs that rely on environment variables, since scripts and node jobs are submitted from the DAGMan job's environment. The syntax is:
ENV GET VAR-1 [VAR-2 ... ]
# or
ENV SET Key=Value;Key=Value; ...
- GET Keyword: Takes a list of environment variable names to be added to the DAGMan job's getenv command in the *.condor.sub file.
- SET Keyword: Takes a semicolon-delimited list of key=value pairs of information to explicitly add to the DAGMan job's environment command in the *.condor.sub file.
Note
The added key=value pairs must follow the normal HTCondor job environment rules.
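A hypothetical sketch (the variable names and values are illustrative):

```
# Copy PATH and HOME from the submitting environment into DAGMan's
ENV GET PATH HOME
# Explicitly set variables seen by scripts and node job submission
ENV SET WORKFLOW_ROOT=/data/run1;DEBUG_LEVEL=2
```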
DAG Specific Configuration
DAGMan allows all DAGMan Configuration File Entries to be applied on a per-DAG basis. To apply custom configuration for a DAGMan workflow, simply create a custom configuration file to provide to the CONFIG command.
Only one configuration file is permitted per DAGMan process. If multiple DAGs are submitted at one time, or a workflow is comprised of Splices, then a fatal error will occur upon detection of more than one configuration file. Sub-DAGs run as their own DAGMan process, allowing Sub-DAGs to have their own configuration files.
Custom configuration values are applied for the entire DAG workflow. So, if multiple DAGs are submitted at one time, then all of the DAGs will use the custom configuration even if some DAGs didn't specify a custom config file.
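For instance, a per-DAG configuration might be attached as follows (the file names and the chosen option are illustrative):

```
# File: my.dag
CONFIG dagman.config
JOB A work.sub

# File: dagman.config
DAGMAN_MAX_JOBS_IDLE = 10
```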
Note
Only configuration options that apply specifically to DAGMan or to DaemonCore (like debug log levels) take effect when added to a custom DAG configuration file.
Given that there are many layers of configuration processing, and that some condor_submit_dag options have the same effect as DAGMan configuration options, the values DAGMan uses are dictated by the following ordered list, where elements processed later take precedence:
HTCondor system configuration as set up by the AP administrator(s).
Configuration options passed as special HTCondor environment variables, _CONDOR_<config option>=Value.
Custom configuration provided by the CONFIG command or the condor_submit_dag -config option.
condor_submit_dag options that control the same behavior as a configuration option, such as DAGMAN_MAX_JOBS_SUBMITTED and -maxjobs.
Visualizing DAGs
To help visualize a DAG, DAGMan can create a dot input file for the AT&T Research Labs Graphviz package to draw the DAG. DAGMan produces dot files when the DOT command is declared with the following syntax:
DOT filename [UPDATE | DONT-UPDATE] [OVERWRITE | DONT-OVERWRITE] [INCLUDE <dot-file-header>]
The DOT command can take several optional parameters as follows:
UPDATE Updates the dot file every time a significant change happens.
DONT-UPDATE Creates a single dot file when DAGMan begins executing. This is the default if the UPDATE parameter is not used.
OVERWRITE Overwrites the dot file each time it is created. This is the default, unless DONT-OVERWRITE is specified.
DONT-OVERWRITE Creates a new dot file each time one is written, named <filename>.<num>, where the number increases with each write (e.g., dag.dot.0, then dag.dot.1).
INCLUDE Includes the contents of the specified file in the produced dot file after the graph's label line.
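A hedged example (the filename is illustrative):

```
# Keep a numbered snapshot of the DAG's structure at each update
DOT workflow.dot UPDATE DONT-OVERWRITE
```

The resulting files (workflow.dot.0, workflow.dot.1, ...) can then be rendered with Graphviz, for example: dot -Tpdf workflow.dot.0 -o workflow.pdf.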