htcondor.dags API Reference
Attention
This is not documentation for DAGMan itself! If you run into DAGMan jargon that isn’t explained here, see DAGMan Introduction.
Creating DAGs
- class htcondor.dags.DAG(dagman_config=None, dagman_job_attributes=None, max_jobs_by_category=None, dot_config=None, jobstate_log=None, node_status_file=None)
This object represents the entire DAGMan workflow, including both the execution graph and miscellaneous configuration options.
It contains the individual NodeLayer and SubDAG objects that are the “logical” nodes in the graph, created by the layer() and subdag() methods respectively.
- Parameters
dagman_config (Optional[Mapping[str, Any]]) – A mapping of DAGMan configuration options.
dagman_job_attributes (Optional[Mapping[str, Any]]) – A mapping that describes additional HTCondor JobAd attributes for the DAGMan job itself.
max_jobs_by_category (Optional[Mapping[str, int]]) – A mapping that describes the maximum number of jobs (values) that should be run simultaneously from each category (keys).
dot_config (Optional[DotConfig]) – Configuration options for writing a DOT file, as a DotConfig.
jobstate_log (Optional[Path]) – The path to the jobstate log. If not given, the jobstate log will not be written.
node_status_file (Optional[NodeStatusFile]) – Configuration options for the node status file, as a NodeStatusFile.
- describe() str
Return a tabular description of the DAG’s structure.
- property edges: Iterator[Tuple[Tuple[BaseNode, BaseNode], BaseEdge]]
Iterate over ((parent, child), edge) tuples, for every edge in the graph.
- final(**kwargs) FinalNode
Create the FINAL node of the DAG. A DAG can only have one FINAL node; if you call this method multiple times, each call overrides any previous calls. To customize the FINAL node after creation, modify the FinalNode instance that it returns.
- glob(pattern) Nodes
Return a Nodes of the nodes in the DAG whose names match the glob pattern.
- layer(**kwargs) NodeLayer
Create a new NodeLayer in the graph with no parents or children. Keyword arguments are forwarded to NodeLayer.
- property leaves: Nodes
A Nodes of the nodes in the DAG that have no children.
- property node_to_children: Dict[BaseNode, Nodes]
Return a dictionary that maps each node to a Nodes containing its children. The Nodes will be empty if the node has no children.
- property node_to_parents: Dict[BaseNode, Nodes]
Return a dictionary that maps each node to a Nodes containing its parents. The Nodes will be empty if the node has no parents.
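The two mappings are inverses of each other. The following is a minimal standalone sketch of that relationship, not htcondor code: plain strings stand in for nodes, lists stand in for Nodes, and the edge list is hypothetical.

```python
from collections import defaultdict

# Hypothetical edge list: (parent, child) pairs, with strings standing in for nodes.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
all_nodes = ["A", "B", "C", "D"]

children_of = defaultdict(list)
parents_of = defaultdict(list)
for parent, child in edges:
    children_of[parent].append(child)
    parents_of[child].append(parent)

# Every node gets an entry, even when its collection is empty.
node_to_children = {node: children_of[node] for node in all_nodes}
node_to_parents = {node: parents_of[node] for node in all_nodes}

# Roots have no parents; leaves have no children.
roots = [node for node in all_nodes if not node_to_parents[node]]
leaves = [node for node in all_nodes if not node_to_children[node]]
```

In this sketch the roots and leaves fall out of the same two mappings, which mirrors how the roots and leaves properties are described above.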
- property nodes: Nodes
Iterate over all of the nodes in the DAG, in no particular order.
- property roots: Nodes
A Nodes of the nodes in the DAG that have no parents.
- select(selector) Nodes
Return a Nodes of the nodes in the DAG that satisfy selector. selector should be a function which takes a single BaseNode and returns True (the node will be included) or False (the node will not be included).
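A selector is just a one-argument predicate. The sketch below illustrates the idea without htcondor: node names (plain strings) stand in for BaseNode objects, and the select helper is a hypothetical stand-in for the method. The glob variant assumes shell-style matching as provided by Python's fnmatch module, which may differ in detail from what the library does.

```python
import fnmatch

# Strings stand in for node names; a real DAG holds BaseNode objects.
node_names = ["split", "process_a", "process_b", "merge"]


def select(names, selector):
    """Keep only the names for which selector returns True."""
    return [name for name in names if selector(name)]


# Any callable that takes one item and returns True or False works as a selector.
processing = select(node_names, lambda name: name.startswith("process_"))

# Glob matching expressed as a selector (assumption: fnmatch-style patterns).
globbed = select(node_names, lambda name: fnmatch.fnmatchcase(name, "process_*"))
```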
- subdag(**kwargs) SubDAG
Create a new SubDAG in the graph with no parents or children. Keyword arguments are forwarded to SubDAG.
- walk(order=WalkOrder.DEPTH_FIRST) Iterator[BaseNode]
Iterate over all of the nodes in the DAG, starting from the roots (i.e., the nodes with no parents), in either depth-first or breadth-first order.
Sibling order is not specified, and may be different in different calls to this method.
- Parameters
order (WalkOrder) – Walk depth-first (children before siblings) or breadth-first (siblings before children).
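The difference between the two orders can be illustrated on a toy graph. This is a standalone sketch, not the library's implementation: the graph, the walk function, and its depth_first flag are hypothetical stand-ins. The sibling order it produces is an artifact of the sketch, since the real method leaves sibling order unspecified.

```python
from collections import deque

# Hypothetical graph: each node maps to its children.
children = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
roots = ["A"]


def walk(roots, children, depth_first=True):
    """Yield each reachable node once, starting from the roots."""
    seen = set()
    pending = deque(roots)
    while pending:
        # Popping from the right treats pending as a stack (depth-first);
        # popping from the left treats it as a queue (breadth-first).
        node = pending.pop() if depth_first else pending.popleft()
        if node in seen:
            continue
        seen.add(node)
        yield node
        pending.extend(children[node])


depth = list(walk(roots, children, depth_first=True))
breadth = list(walk(roots, children, depth_first=False))
```

In the depth-first result, E (a child of C) appears before B (a sibling of C); in the breadth-first result, both B and C appear before either of their children.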
- walk_ancestors(node, order=WalkOrder.DEPTH_FIRST) Iterator[BaseNode]
Iterate over all of the ancestors (i.e., parents, parents of parents, etc.) of some node, in either depth-first or breadth-first order.
Sibling order is not specified, and may be different in different calls to this method.
- Parameters
node (BaseNode) – The node to begin walking from. It will not be included in the results.
order (WalkOrder) – Walk depth-first (parents before siblings) or breadth-first (siblings before parents).
- walk_descendants(node, order=WalkOrder.DEPTH_FIRST) Iterator[BaseNode]
Iterate over all of the descendants (i.e., children, children of children, etc.) of some node, in either depth-first or breadth-first order.
Sibling order is not specified, and may be different in different calls to this method.
- Parameters
node (BaseNode) – The node to begin walking from. It will not be included in the results.
order (WalkOrder) – Walk depth-first (children before siblings) or breadth-first (siblings before children).
- class htcondor.dags.WalkOrder(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
An enumeration for keeping track of which order to walk through a graph. Depth-first means that parents/children will be visited before siblings. Breadth-first means that siblings will be visited before parents/children.
- BREADTH_FIRST = 'BREADTH'
- DEPTH_FIRST = 'DEPTH'
Nodes and Node-likes
- class htcondor.dags.BaseNode(dag, *, name, dir=None, noop=False, done=False, retries=None, retry_unless_exit=None, pre=None, post=None, pre_skip_exit_code=None, priority=0, category=None, abort=None)
This is the superclass for all node-like objects (things that can be the logical nodes in a DAG, like NodeLayer and SubDAG).
Generally, you do not need to construct nodes yourself; instead, they are created by calling methods like DAG.layer(), DAG.subdag(), BaseNode.child_layer(), and so forth. These methods automatically attach the new node to the same DAG as the node you called the method on.
- Parameters
dag (dag.DAG) – Which DAG to attach this node to.
name (str) – The human-readable name of this node.
dir (Optional[Path]) – The directory to submit from. If None, it will be the directory the DAG itself was submitted from.
noop (Union[bool, Mapping[int, bool]]) – If this is True, this node will be skipped and marked as completed, no matter what it says it does. For a NodeLayer, this can be a dictionary mapping individual underlying node indices to their desired value.
done (Union[bool, Mapping[int, bool]]) – If this is True, this node will be considered already completed. For a NodeLayer, this can be a dictionary mapping individual underlying node indices to their desired value.
retries (Optional[int]) – The number of times to retry the node if it fails (defined by retry_unless_exit).
retry_unless_exit (Optional[int]) – If the node exits with this code, it will not be retried.
pre (Optional[Script]) – A Script to run before the node itself.
post (Optional[Script]) – A Script to run after the node itself.
pre_skip_exit_code (Optional[int]) – If the pre-script exits with this code, the node will be skipped.
priority (int) – The internal priority for DAGMan to run this node.
category (Optional[str]) – Which CATEGORY this node belongs to.
abort (Optional[DAGAbortCondition]) – A DAGAbortCondition which may cause the entire DAG to stop if this node exits in a certain way.
- add_children(*nodes, edge=None) BaseNode
Makes all of the nodes children of this node.
- Parameters
nodes – The nodes to make children of this node.
edge (Optional[BaseEdge]) – The type of edge to use; an instance of a concrete subclass of BaseEdge. If None, a ManyToMany edge will be used.
- Returns
self – This method returns self.
- Return type
BaseNode
- add_parents(*nodes, edge=None) BaseNode
Makes all of the nodes parents of this node.
- Parameters
nodes – The nodes to make parents of this node.
edge (Optional[BaseEdge]) – The type of edge to use; an instance of a concrete subclass of BaseEdge. If None, a ManyToMany edge will be used.
- Returns
self – This method returns self.
- Return type
BaseNode
- child_layer(edge=None, **kwargs) NodeLayer
Create a new NodeLayer which is a child of this node.
- Parameters
edge (Optional[BaseEdge]) – The type of edge to use; an instance of a concrete subclass of BaseEdge. If None, a ManyToMany edge will be used.
kwargs – Additional keyword arguments are passed to the NodeLayer constructor.
- Returns
node_layer – The newly-created node layer.
- Return type
NodeLayer
- child_subdag(edge=None, **kwargs) SubDAG
Create a new SubDAG which is a child of this node.
- Parameters
edge (Optional[BaseEdge]) – The type of edge to use; an instance of a concrete subclass of BaseEdge. If None, a ManyToMany edge will be used.
kwargs – Additional keyword arguments are passed to the SubDAG constructor.
- Returns
subdag – The newly-created sub-DAG.
- Return type
SubDAG
- property children: Nodes
Return a Nodes containing all of the children of this node.
- parent_layer(edge=None, **kwargs) NodeLayer
Create a new NodeLayer which is a parent of this node.
- Parameters
edge (Optional[BaseEdge]) – The type of edge to use; an instance of a concrete subclass of BaseEdge. If None, a ManyToMany edge will be used.
kwargs – Additional keyword arguments are passed to the NodeLayer constructor.
- Returns
node_layer – The newly-created node layer.
- Return type
NodeLayer
- parent_subdag(edge=None, **kwargs) SubDAG
Create a new SubDAG which is a parent of this node.
- Parameters
edge (Optional[BaseEdge]) – The type of edge to use; an instance of a concrete subclass of BaseEdge. If None, a ManyToMany edge will be used.
kwargs – Additional keyword arguments are passed to the SubDAG constructor.
- Returns
subdag – The newly-created sub-DAG.
- Return type
SubDAG
- property parents: Nodes
Return a Nodes containing all of the parents of this node.
- remove_children(*nodes) BaseNode
Makes sure that the nodes are not children of this node.
- Parameters
nodes – The nodes to remove edges from.
- Returns
self – This method returns self.
- Return type
BaseNode
- remove_parents(*nodes) BaseNode
Makes sure that the nodes are not parents of this node.
- Parameters
nodes – The nodes to remove edges from.
- Returns
self – This method returns self.
- Return type
BaseNode
- walk_ancestors(order=WalkOrder.DEPTH_FIRST) Iterator[BaseNode]
Walk over all of the ancestors of this node, in the given order.
- walk_descendants(order=WalkOrder.DEPTH_FIRST) Iterator[BaseNode]
Walk over all of the descendants of this node, in the given order.
- class htcondor.dags.NodeLayer(dag, *, submit_description=None, vars=None, **kwargs)
Bases: BaseNode
Represents a “layer” of actual JOB nodes that share a submit description and edge relationships. Each underlying actual node’s attributes may be customized using vars.
- Parameters
dag (dag.DAG) – The DAG to connect this node to.
submit_description (Union[Submit, None, Path]) – The HTCondor submit description for this node. Can be either an htcondor.Submit object or a Path to an existing submit file on disk.
vars (Optional[Iterable[Dict[str, str]]]) – The VARS for this logical node; one actual node will be created for each dictionary in the vars.
kwargs – Additional keyword arguments are passed to the BaseNode constructor.
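The relationship between vars and underlying nodes can be pictured as one node per dictionary, identified by its position. The sketch below is a plain-Python illustration only; the layer name, the variable names, and the dictionary layout of underlying_nodes are hypothetical and do not reflect how the library stores nodes internally.

```python
# Hypothetical layer: one submit description shared by every underlying node.
layer_name = "process"
layer_vars = [
    {"input": "a.txt", "seed": "1"},
    {"input": "b.txt", "seed": "2"},
    {"input": "c.txt", "seed": "3"},
]

# One underlying node per dictionary, identified by its index within vars.
underlying_nodes = [
    {"layer": layer_name, "index": index, "vars": node_vars}
    for index, node_vars in enumerate(layer_vars)
]
```

These indices are the "underlying node indices" that the per-index noop and done mappings, and the edge classes, refer to.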
- class htcondor.dags.SubDAG(dag, *, dag_file, **kwargs)
Bases: BaseNode
Represents a SUBDAG in the graph.
See SUBDAG EXTERNAL for more information on sub-DAGs.
- Parameters
dag (dag.DAG) – The DAG to connect this node to.
dag_file (Path) – The pathlib.Path to where the sub-DAG’s DAG description file is (or will be).
kwargs – Additional keyword arguments are passed to the BaseNode constructor.
- class htcondor.dags.FinalNode(dag, submit_description=None, **kwargs)
Bases: BaseNode
Represents the FINAL node in a DAG.
See Final Node for more information on the FINAL node.
- Parameters
dag (dag.DAG) – The DAG to connect this node to.
submit_description (Union[Submit, None, Path]) – The HTCondor submit description for this node. Can be either an htcondor.Submit object or a Path to an existing submit file on disk.
kwargs – Additional keyword arguments are passed to the BaseNode constructor.
- class htcondor.dags.Nodes(*nodes)
This class represents an arbitrary collection of BaseNode objects. In many cases, especially when manipulating the structure of the graph, it can be used as a replacement for directly iterating over collections of nodes.
- Parameters
nodes – The logical nodes that will be in this Nodes.
- Nodes.add_children(*nodes, type=None) Nodes
Makes all of the nodes children of all of the nodes in this Nodes.
- Parameters
nodes – The nodes to make children of this Nodes.
type (Optional[BaseEdge]) – The type of edge to use; an instance of a concrete subclass of BaseEdge. If None, a ManyToMany edge will be used.
- Returns
self – This method returns self.
- Return type
Nodes
- Nodes.add_parents(*nodes, type=None) Nodes
Makes all of the nodes parents of all of the nodes in this Nodes.
- Parameters
nodes – The nodes to make parents of this Nodes.
type (Optional[BaseEdge]) – The type of edge to use; an instance of a concrete subclass of BaseEdge. If None, a ManyToMany edge will be used.
- Returns
self – This method returns self.
- Return type
Nodes
- Nodes.child_layer(type=None, **kwargs) NodeLayer
Create a new NodeLayer which is a child of all of the nodes in this Nodes.
- Parameters
type (Optional[BaseEdge]) – The type of edge to use; an instance of a concrete subclass of BaseEdge. If None, a ManyToMany edge will be used.
kwargs – Additional keyword arguments are passed to the NodeLayer constructor.
- Returns
node_layer – The newly-created node layer.
- Return type
NodeLayer
- Nodes.child_subdag(type=None, **kwargs) SubDAG
Create a new SubDAG which is a child of all of the nodes in this Nodes.
- Parameters
type (Optional[BaseEdge]) – The type of edge to use; an instance of a concrete subclass of BaseEdge. If None, a ManyToMany edge will be used.
kwargs – Additional keyword arguments are passed to the SubDAG constructor.
- Returns
subdag – The newly-created sub-DAG.
- Return type
SubDAG
- Nodes.parent_layer(type=None, **kwargs) NodeLayer
Create a new NodeLayer which is a parent of all of the nodes in this Nodes.
- Parameters
type (Optional[BaseEdge]) – The type of edge to use; an instance of a concrete subclass of BaseEdge. If None, a ManyToMany edge will be used.
kwargs – Additional keyword arguments are passed to the NodeLayer constructor.
- Returns
node_layer – The newly-created node layer.
- Return type
NodeLayer
- Nodes.parent_subdag(type=None, **kwargs) SubDAG
Create a new SubDAG which is a parent of all of the nodes in this Nodes.
- Parameters
type (Optional[BaseEdge]) – The type of edge to use; an instance of a concrete subclass of BaseEdge. If None, a ManyToMany edge will be used.
kwargs – Additional keyword arguments are passed to the SubDAG constructor.
- Returns
subdag – The newly-created sub-DAG.
- Return type
SubDAG
- Nodes.remove_children(*nodes) Nodes
Makes sure that the nodes are not children of any of the nodes in this Nodes.
- Parameters
nodes – The nodes to remove edges from.
- Returns
self – This method returns self.
- Return type
Nodes
- Nodes.remove_parents(*nodes) Nodes
Makes sure that the nodes are not parents of any of the nodes in this Nodes.
- Parameters
nodes – The nodes to remove edges from.
- Returns
self – This method returns self.
- Return type
Nodes
- Nodes.walk_ancestors(order=WalkOrder.DEPTH_FIRST)
Walk over all of the ancestors of all of the nodes in this Nodes, in the given order.
- Nodes.walk_descendants(order=WalkOrder.DEPTH_FIRST)
Walk over all of the descendants of all of the nodes in this Nodes, in the given order.
Edges
- class htcondor.dags.BaseEdge
An abstract class that represents the edge between two logical nodes in the DAG.
- abstract get_edges(parent, child, join_factory) Iterable[Union[Tuple[Tuple[int], Tuple[int]], Tuple[Tuple[int], JoinNode], Tuple[JoinNode, Tuple[int]]]]
This abstract method is used by the writer to figure out which nodes in the parent and child should be connected by an actual DAGMan edge. It should yield (or simply return an iterable of) individual edge specifications.
Each edge specification is a tuple containing two elements: the first is a group of parent node indices, the second is a group of child node indices. Either (but not both) may be replaced by a special JoinNode object provided by JoinFactory.get_join_node(). An instance of JoinFactory is passed into this function by the writer; you should not create one yourself.
You may yield any number of edge specifications, but the more compact you can make the representation (i.e., fewer edge specifications, each with fewer elements), the better. This is where join nodes are helpful: they can turn “many-to-many” relationships into a significantly smaller number of actual edges (\(2N\) instead of \(N^2\)).
A SubDAG or a zero-vars NodeLayer both implicitly have a single node index, 0. See the source code of ManyToMany for a simple pattern for dealing with this.
- Parameters
parent (BaseNode) – The parent, a concrete subclass of BaseNode.
child (BaseNode) – The child, a concrete subclass of BaseNode.
join_factory (JoinFactory) – An instance of JoinFactory that will be provided by the writer.
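The \(2N\) versus \(N^2\) saving can be checked with a small counting sketch. Everything here is a hypothetical stand-in written for illustration: the JoinNode class is an empty placeholder rather than the library's class, and the edge count assumes that a specification connects every parent in its first group to every child in its second group.

```python
class JoinNode:
    """Placeholder for the join node object handed out by the writer."""


def dense_specs(num_parents, num_children):
    """One specification per (parent, child) pair."""
    return [((p,), (c,)) for p in range(num_parents) for c in range(num_children)]


def joined_specs(num_parents, num_children, join):
    """Route every parent through a single join node, then on to every child."""
    parents = tuple(range(num_parents))
    children = tuple(range(num_children))
    return [(parents, join), (join, children)]


def count_actual_edges(specs):
    """Count the edges implied by the specs; a join node is a single endpoint."""

    def size(group):
        return 1 if isinstance(group, JoinNode) else len(group)

    return sum(size(parent) * size(child) for parent, child in specs)


n = 10
dense = count_actual_edges(dense_specs(n, n))  # n * n edges
joined = count_actual_edges(joined_specs(n, n, JoinNode()))  # 2 * n edges
```

With ten nodes per layer the dense form implies 100 edges, while the joined form implies 20, and it needs only two specifications instead of 100.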
- class htcondor.dags.OneToOne
This edge connects two layers “linearly”: each underlying node in the child layer is a child of the corresponding underlying node with the same index in the parent layer. The parent and child layers must have the same number of underlying nodes.
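As a rough illustration of the index pairing, the following standalone sketch yields edge specifications in the (parent indices, child indices) form described for get_edges(). The one_to_one function and its error handling are assumptions made for this example, not the library's code.

```python
def one_to_one(num_parents, num_children):
    """Yield (parent_indices, child_indices) pairs linking equal indices."""
    if num_parents != num_children:
        raise ValueError("parent and child layers must have the same number of nodes")
    for index in range(num_parents):
        yield (index,), (index,)


pairs = list(one_to_one(3, 3))
```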
- class htcondor.dags.ManyToMany
This edge connects two layers “densely”: every node in the child layer is a child of every node in the parent layer.
- class htcondor.dags.Grouper(parent_chunk_size=1, child_chunk_size=1)
This edge connects two layers in “chunks”. The nodes in each layer are divided into chunks based on their respective chunk sizes (given in the constructor). Chunks are then connected like a OneToOne edge.
The number of chunks in each layer must be the same, and each layer must be evenly divided into chunks (no leftover underlying nodes).
When both chunk sizes are 1 this is identical to a OneToOne edge, and you should use that edge instead because it produces a more compact representation.
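The chunking rule can be sketched in a few lines of plain Python. The chunks and grouper helpers below are hypothetical and only model the constraints stated above (even division, equal chunk counts); they are not taken from the library.

```python
def chunks(count, size):
    """Split range(count) into consecutive tuples of length size."""
    if count % size != 0:
        raise ValueError("layer is not evenly divided by the chunk size")
    return [tuple(range(start, start + size)) for start in range(0, count, size)]


def grouper(num_parents, num_children, parent_chunk_size=1, child_chunk_size=1):
    """Pair the i-th parent chunk with the i-th child chunk."""
    parent_chunks = chunks(num_parents, parent_chunk_size)
    child_chunks = chunks(num_children, child_chunk_size)
    if len(parent_chunks) != len(child_chunks):
        raise ValueError("layers must have the same number of chunks")
    return list(zip(parent_chunks, child_chunks))


# Six parents in chunks of two, three children in chunks of one.
specs = grouper(6, 3, parent_chunk_size=2, child_chunk_size=1)
```

Here each child waits on a pair of parents: child 0 on parents 0 and 1, child 1 on parents 2 and 3, and so on.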
- class htcondor.dags.Slicer(parent_slice=slice(None, None, None), child_slice=slice(None, None, None))
This edge connects individual nodes in the layers, selected by slices. Each node from the parent layer that is in the parent slice is joined, one-to-one, with the matching node from the child layer that is in the child slice.
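Slices select indices the same way they do on any Python sequence. The sketch below is a standalone model of the pairing, with a hypothetical slicer function; how the library handles slices that select different numbers of nodes is not stated here, so the sketch simply pairs up to the shorter selection.

```python
def slicer(num_parents, num_children, parent_slice=slice(None), child_slice=slice(None)):
    """Pair the sliced parent indices with the sliced child indices, in order."""
    parent_indices = range(num_parents)[parent_slice]
    child_indices = range(num_children)[child_slice]
    # Assumption: zip stops at the shorter selection.
    return [((p,), (c,)) for p, c in zip(parent_indices, child_indices)]


# Even-indexed parents feed odd-indexed children.
specs = slicer(6, 6, parent_slice=slice(None, None, 2), child_slice=slice(1, None, 2))
```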
Node Configuration
- class htcondor.dags.Script(executable, arguments=None, retry=False, retry_status=1, retry_delay=0)
- Parameters
executable (Union[str, Path]) – The path to the executable to run.
arguments (Optional[List[str]]) – The individual arguments to the executable. Keep in mind that these are evaluated as soon as the Script is created!
retry (bool) – True if the script can be retried on failure.
retry_status (int) – If the script exits with this status, the script run will be considered a failure for the purposes of retrying.
retry_delay (int) – The number of seconds to wait after a script failure before retrying.
- class htcondor.dags.DAGAbortCondition(node_exit_value, dag_return_value=None)
Represents the configuration of a node’s DAG abort condition.
See ABORT-DAG-ON for more information about DAG aborts.
Writing a DAG to Disk
- htcondor.dags.write_dag(dag, dag_dir, dag_file_name='dagfile.dag', node_name_formatter=None) Path
Write out the given DAG to the given directory. This includes the DAG description file itself, as well as any associated submit descriptions.
- Parameters
dag (DAG) – The DAG to write the description for.
dag_dir (Path) – The directory to write the DAG files to.
dag_file_name (Optional[str]) – The name of the DAG description file itself.
node_name_formatter (Optional[NodeNameFormatter]) – The NodeNameFormatter to use for generating underlying node names. If not provided, the default is SimpleFormatter.
- Returns
dag_file_path – The path to the DAG description file; can be passed to htcondor.Submit.from_dag() if you convert it to a string, like Submit.from_dag(str(write_dag(...))).
- Return type
Path
- class htcondor.dags.NodeNameFormatter
An abstract base class that represents a certain way of formatting and parsing underlying node names.
- abstract generate(layer_name, node_index) str
This method should generate a single node name, given the name of the layer and the index of the underlying node inside the layer.
- class htcondor.dags.SimpleFormatter(separator=':', index_format='{, offset= 0)
A no-frills NodeNameFormatter that produces underlying node names like LayerName-5.
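The following standalone sketch shows one plausible way a separator, an index format, and an offset could combine into a node name, and how such a name could be parsed back. The generate and parse functions are hypothetical, and the "{:d}" default for index_format is an assumption made for this example.

```python
def generate(layer_name, node_index, separator=":", index_format="{:d}", offset=0):
    """Join the layer name and the formatted, offset index with the separator."""
    # Assumption: "{:d}" as the default index format is chosen for this sketch.
    return separator.join((layer_name, index_format.format(node_index + offset)))


def parse(node_name, separator=":", offset=0):
    """Recover (layer_name, node_index) from a generated name."""
    layer_name, index = node_name.rsplit(separator, 1)
    return layer_name, int(index) - offset


default_name = generate("LayerName", 5)
dashed_name = generate("LayerName", 5, separator="-")
padded_name = generate("LayerName", 5, index_format="{:03d}", offset=1)
round_trip = parse(padded_name, offset=1)
```

A round trip only works if the layer name itself does not contain the separator, which is worth keeping in mind when choosing layer names.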
DAG Configuration
- class htcondor.dags.DotConfig(path, update=False, overwrite=True, include_file=None)
A DotConfig holds the configuration options for whether and how DAGMan will produce a DOT file representing its execution graph.
See Visualizing DAGs for more information.
- Parameters
path (Path) – The path to write the DOT file to.
update (bool) – If True, the DOT file will be updated as the DAG executes. If False, it will be written once at startup.
overwrite (bool) – If True, the DOT file will be updated in-place. If False, new DOT files will be created next to the original.
include_file (Optional[Path]) – Include the contents of the file at this path in the DOT file.
- class htcondor.dags.NodeStatusFile(path, update_time=None, always_update=False)
A NodeStatusFile holds the configuration options for whether and how DAGMan will write a file containing node status information.
See Current Node Status File for more information.
- Parameters
path (Path) – The path to write the node status file to.
update_time (Optional[int]) – The minimum interval to write new information to the node status file.
always_update (Optional[bool]) – Always update the node status file after the update_time, even if there are no changes from the previous update.
Rescue DAGs
htcondor.dags can read information from a DAGMan rescue file and apply
it to your DAG as it is being constructed.
See The Rescue DAG for more information on Rescue DAGs.
- htcondor.dags.rescue(dag, rescue_file, formatter=None) None
Applies state recorded in a DAGMan rescue file to the dag. The dag will be modified in-place.
Warning
Running this function on a DAG replaces any existing DONE information on all of its nodes. Every node will have a dictionary for its done attribute. If you want to edit this information manually, always run this function first, then make the desired changes on top.
Warning
This function cannot detect changes in node names. If node names are different in the rescue file compared to the DAG, this function will not behave as expected.
- Parameters