DAG Save Point Files
A DAG can be set up to write the current progress of the DAG at specified nodes to a save point file. These files are written the first time the designated node starts running, so any retries of that node will not save the DAG progress again. The save point file is written in the same format as a partial Rescue DAG, except that all node retry values are reset to their maximum value. A save point file can then be specified when re-running a DAG to start the DAG at that point of progress.
To specify a save point file, use the DAG submit description keyword
SAVE_POINT_FILE followed by the name of the node designated as the save
point and, optionally, a filename. If a filename is not
specified, the file will be written as
[Node Name]-[DAG filename].save
where the DAG filename is the name of the DAG file that contains the save point declaration.
If the specified save point filename includes a path then DAGMan will attempt
to write the file to that location. If the condor_submit_dag
flag is used and a path is specified for a save point then the file will be
written to that path relative to a DAG’s working directory. Any save point
files without a specified path will be written to a sub-directory called
save_files created near all other DAGMan produced files.
    # File: savepointEx.dag
    JOB A node.sub
    JOB B node.sub
    JOB C node.sub
    JOB D node.sub
    PARENT A B C CHILD D
    #SAVE_POINT_FILE NodeName Filename
    SAVE_POINT_FILE A
    SAVE_POINT_FILE B Node-B_custom.save
    SAVE_POINT_FILE C ../example/subdir/Node-C_custom.save
    SAVE_POINT_FILE D ./Node-D_custom.save
Given the above example DAG file, if
condor_submit_dag savepointEx.dag was run
from the my_work directory shown below,
then the produced files appear in the
directory tree as follows:
    Directory Tree Visualized
    └─Home
      ├─example
      │ └─subdir
      │   └─Node-C_custom.save
      └─my_work
        ├─savepointEx.dag
        ├─savepointEx.dag.condor.sub
        ├─savepointEx.dag.dagman.out
        ├─...
        ├─Node-D_custom.save
        └─save_files
          ├─ A-savepointEx.dag.save
          └─ Node-B_custom.save
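The placement rules described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not DAGMan's actual implementation: the helper name resolve_save_path and the absolute root /Home/my_work are hypothetical, the save_files directory is assumed to sit in the DAG's working directory as in the example tree, and the condor_submit_dag flag case mentioned earlier is not modeled.

```python
import posixpath
from typing import Optional


def resolve_save_path(node: str, dag_file: str, work_dir: str,
                      filename: Optional[str] = None) -> str:
    """Hypothetical helper: where a SAVE_POINT_FILE declaration would be written."""
    if filename is None:
        # Default name: [Node Name]-[DAG filename].save
        filename = f"{node}-{posixpath.basename(dag_file)}.save"
    if "/" in filename:
        # A filename that includes a path is written to that location.
        return posixpath.normpath(posixpath.join(work_dir, filename))
    # A bare filename goes into the save_files sub-directory (assumed location).
    return posixpath.normpath(posixpath.join(work_dir, "save_files", filename))


if __name__ == "__main__":
    declarations = [
        ("A", None),
        ("B", "Node-B_custom.save"),
        ("C", "../example/subdir/Node-C_custom.save"),
        ("D", "./Node-D_custom.save"),
    ]
    for node, name in declarations:
        print(node, resolve_save_path(node, "savepointEx.dag", "/Home/my_work", name))
```

Running the sketch with the four declarations from savepointEx.dag yields the same locations as the directory tree above.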
Once a DAG has run and produced save point files, the DAG can then be re-run from
a save file by passing a filename via the
-load_save flag for condor_submit_dag.
If the save point file is passed with a specified path, then DAGMan will attempt to
read the file from that path. If just a save point filename is given, then DAGMan will
assume the file is located in the save_files directory. The path to save point
files will be checked relative to the current working directory that condor_submit_dag
was run from.
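As a rough sketch of the lookup rule just described, the following Python function maps the value given to -load_save onto a file location. The helper name resolve_load_path is hypothetical, and resolving the save_files directory relative to the current working directory is an assumption drawn from the paragraph above rather than a statement about DAGMan internals.

```python
import posixpath


def resolve_load_path(save_arg: str, cwd: str) -> str:
    """Hypothetical helper: where a -load_save argument would be read from."""
    if "/" not in save_arg:
        # A bare filename is assumed to live in the save_files directory.
        save_arg = posixpath.join("save_files", save_arg)
    # Paths are checked relative to the directory condor_submit_dag was run from.
    return posixpath.normpath(posixpath.join(cwd, save_arg))


if __name__ == "__main__":
    for arg in ("Node-B_custom.save", "./Node-D_custom.save"):
        print(arg, "->", resolve_load_path(arg, "/Home/my_work"))
```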
When DAGMan writes save point files, if a save file with the same name already exists
then DAGMan will rotate the file to
[filename].old before writing the new save.
Any already existing “old” save files will be removed prior to rotation and saving.
So, if the above example DAG was re-run with
condor_submit_dag -load_save ./Node-D_custom.save savepointEx.dag from the same directory, then once node D starts
the previous save would become
Node-D_custom.save.old. This behavior does not only
affect save point files when re-running a DAG. If a DAG was set up as follows:
    # File: progressSavefile.dag
    JOB A node.sub
    JOB B node.sub
    JOB C node.sub
    ...
    SAVE_POINT_FILE A dag-progress.save
    SAVE_POINT_FILE B dag-progress.save
    SAVE_POINT_FILE C dag-progress.save
Then, assuming the parent/child relationships are
A->B->C, the first save written at
the start of node A will be written to
dag-progress.save. Then, when node B starts,
dag-progress.save will become
dag-progress.save.old and a new
dag-progress.save will be written. Finally, once node C starts, the existing
dag-progress.save.old will be deleted, the present
dag-progress.save will become
dag-progress.save.old,
and a new
dag-progress.save will be written. This allows a single save file that progresses
with the DAG to be created.
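The rotation sequence can be simulated with a short Python sketch. It only mimics the documented file handling: write_save_point and simulate_progress are hypothetical names, and the file contents are placeholders rather than the partial Rescue DAG format a real save point uses.

```python
import os
import tempfile


def write_save_point(path: str, contents: str) -> None:
    """Rotate any existing save to <path>.old, then write the new save."""
    old_path = path + ".old"
    if os.path.exists(path):
        # Remove a previous ".old" file before rotating the current save.
        if os.path.exists(old_path):
            os.remove(old_path)
        os.rename(path, old_path)
    with open(path, "w") as f:
        f.write(contents)


def simulate_progress(directory: str, nodes=("A", "B", "C")) -> dict:
    """Write one save per node start and return the final file contents."""
    path = os.path.join(directory, "dag-progress.save")
    for node in nodes:
        # Placeholder contents standing in for a real save point file.
        write_save_point(path, f"save written at start of node {node}\n")
    result = {}
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name)) as f:
            result[name] = f.read()
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, text in simulate_progress(tmp).items():
            print(name, "->", text.strip())
```

After the three writes only two files remain: dag-progress.save holding the save from node C and dag-progress.save.old holding the save from node B.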