Special Node Types
While most DAGMan nodes are the standard JOB type that run a job and possibly a PRE or POST script, special nodes can be specified in the DAG submit description to help manage the DAG and its resources in various ways.
FINAL Node
A FINAL node is a single and special node that is always run at the end of the DAG, even if previous nodes in the DAG have failed. A FINAL node can be used for tasks such as cleaning up intermediate files and checking the output of previous nodes. The FINAL command in the DAG input file specifies a node job to be run at the end of the DAG.
The syntax used for the FINAL command is
FINAL JobName SubmitDescriptionFileName [DIR directory] [NOOP]
The FINAL node within the DAG is identified by JobName, and the HTCondor job is described by the contents of the HTCondor submit description file given by SubmitDescriptionFileName.
The keywords DIR and NOOP are as detailed in JOB command documentation. If both DIR and NOOP are used, they must appear in the order shown within the syntax specification.
There may only be one FINAL node in a DAG. A parse error will be logged
by the condor_dagman job in the dagman.out
file, if more than one
FINAL node is specified.
The FINAL node is virtually always run. It is run if the
condor_dagman job is removed with condor_rm. The only case in
which a FINAL node is not run is if the configuration variable
DAGMAN_STARTUP_CYCLE_DETECT is set to True
, and a
cycle is detected at start up time. If DAGMAN_STARTUP_CYCLE_DETECT
is set to False
and a cycle is detected during the course of
the run, the FINAL node will be run.
The success or failure of the FINAL node determines the success or failure of the entire DAG, overriding the status of all previous nodes. This includes any status specified by any ABORT-DAG-ON specification that has taken effect. If some nodes of a DAG fail, but the FINAL node succeeds, the DAG will be considered successful. Therefore, it is important to be careful about setting the exit status of the FINAL node.
The $DAG_STATUS
and $FAILED_COUNT
macros can be used both as PRE
and POST script arguments, and in node job submit description files. As
an example of this, here are the partial contents of the DAG input file,
FINAL final_node final_node.sub
SCRIPT PRE final_node final_pre.pl $DAG_STATUS $FAILED_COUNT
and here are the partial contents of the submit description file,
final_node.sub
arguments = "$(DAG_STATUS) $(FAILED_COUNT)"
If there is a FINAL node specified for a DAG, it will be run at the end
of the workflow. If this FINAL node must not do anything in certain
cases, use the $DAG_STATUS
and $FAILED_COUNT
macros to take
appropriate actions. Here is an example of that behavior. It uses a PRE
script that aborts if the DAG has been removed with condor_rm, which,
in turn, causes the FINAL node to be considered failed without actually
submitting the HTCondor job specified for the node. Partial contents of
the DAG input file:
FINAL final_node final_node.sub
SCRIPT PRE final_node final_pre.pl $DAG_STATUS
and partial contents of the Perl PRE script, final_pre.pl
:
#!/usr/bin/env perl
if ($ARGV[0] eq 4) {
exit(1);
}
There are restrictions on the use of a FINAL node. The DONE option is not allowed for a FINAL node. And, a FINAL node may not be referenced in any of the following specifications:
PARENT, CHILD
RETRY
ABORT-DAG-ON
PRIORITY
CATEGORY
As of HTCondor version 8.3.7, DAGMan allows at most two submit attempts of a FINAL node, if the DAG has been removed from the queue with condor_rm.
PROVISIONER Node
A PROVISIONER node is a single and special node that is always run at the beginning of a DAG. It can be used to provision resources (ie. Amazon EC2 instances, in-memory database servers) that can then be used by the remainder of the nodes in the workflow.
The syntax used for the PROVISIONER command is
PROVISIONER JobName SubmitDescriptionFileName
When a PROVISIONER is defined in a DAG, it gets run at the beginning of the
DAG, and no other nodes are run until the PROVISIONER has advertised that it
is ready. It does this by setting the ProvisionerState
attribute in its
job classad to the enumerated value ProvisionerState::PROVISIONING_COMPLETE
(currently: 2). Once DAGMan sees that it is ready, it will start running
other nodes in the DAG as usual. At this point the PROVISIONER job continues
to run, typically sleeping and waiting while other nodes in the DAG use its
resources.
A PROVISIONER runs for a set amount of time defined in its job. It does not get terminated automatically at the end of a DAG workflow. The expectation is that it needs to explicitly deprovision any resources, such as expensive cloud computing instances that should not be allowed to run indefinitely.
SERVICE Node
A SERVICE node is a special type of node that is always run at the beginning of a DAG. These are typically used to run tasks that need to run alongside a DAGMan workflow (ie. progress monitoring) without any direct dependencies to the other nodes in the workflow.
The syntax used for the SERVICE command is
SERVICE ServiceName SubmitDescriptionFileName
When a SERVICE is defined in a DAG, it gets started at the beginning of the
workflow. There is no guarantee that it will start running before any of the
other nodes, although running it directly from the access point using
universe = local
or universe = scheduler
will almost always make this
go first.
A SERVICE node runs on a best-effort basis. If this node fails to submit correctly, this will not register as an error and the DAG workflow will continue normally.
If a DAGMan workflow finishes while there are SERVICE nodes still running, it will shut these down and then exit the workflow successfully.