DAGMan Configuration Options
These macros affect the operation of DAGMan and DAGMan jobs within HTCondor.
Note
Many of these configuration variables are appropriate to set on a per DAG basis. For more information see DAG Specific Configuration.
Warning
Configuration settings do not get applied to running DAGMan workflows when executing condor_reconfig.
General
- DAGMAN_CONFIG_FILE¶
The path to the configuration file to be used by condor_dagman. This option is set by condor_submit_dag automatically and should not be set explicitly by the user. Defaults to an empty string.
- DAGMAN_USE_OLD_FILE_PARSER¶
A boolean that defaults to
False, whenTruecondor_dagman will use the old file parser to process DAG files.
Note
This option is intended to be a fall back to the known working DAG file parser while transitioning to the new style parser. This will be deprecated in the future.
- DAGMAN_USE_STRICT¶
An integer defining the level of strictness condor_dagman will apply when turning warnings into fatal errors, as follows:
0: no warnings become errors
1: severe warnings become errors
2: medium-severity warnings become errors
3: almost all warnings become errors
Using a strictness value greater than 0 may help find problems with a DAG that may otherwise escape notice. The default value if not defined is 1.
- DAGMAN_STARTUP_CYCLE_DETECT¶
A boolean value that defaults to
False. WhenTrue, causes condor_dagman to check for cycles in the DAG before submitting DAG node jobs, in addition to its run time cycle detection.
Note
The startup process for DAGMan is much slower for large DAGs when set
to True.
- DAGMAN_ABORT_DUPLICATES¶
A boolean that defaults to
True. WhenTrue, upon startup DAGMan will check to see if a previous DAGMan process for a specific DAG is still running and prevent the new DAG instance from executing. This check is done by checking for a DAGMan lock file and verifying whether or not the recorded PID’s associated process is still running.
Note
This value should rarely be changed, especially by users, since have multiple DAGMan processes executing the same DAG in the same directory can lead to strange behavior and issues.
- DAGMAN_USE_SHARED_PORT¶
A boolean that defaults to
False. WhenTrue, condor_dagman will attempt to connect to the shared port daemon.
Note
This value should never be changed since it was added to prevent spurious shared port related error messages from appearing the DAGMan debug log.
- DAGMAN_USE_DIRECT_SUBMIT¶
A boolean value that defaults to
True. WhenTrue, condor_dagman will open a direct connection to the local condor_schedd to submit jobs rather than spawning the condor_submit process.- DAGMAN_PRODUCE_JOB_CREDENTIALS¶
A boolean value that defaults to
True. WhenTrue, condor_dagman will attempt to produce needed credentials for jobs at submit time when using direct submission.- DAGMAN_USE_JOIN_NODES¶
A boolean value that defaults to
True. WhenTrue, condor_dagman will create special join nodes for PARENT/CHILD relationships between multiple parent nodes and multiple child nodes.
Note
This should never be changed since it reduces the number of dependencies in the graph resulting in a significant improvement to the memory footprint, parse time, and submit speed of condor_dagman (Especially for large DAGs).
- DAGMAN_PUT_FAILED_JOBS_ON_HOLD¶
A boolean value that defaults to
False. WhenTrue, DAGMan will automatically retry a node with its job submitted on hold if any of the nodes jobs fail. Script failures do not cause this behavior. The job is only put on hold if the node has no more declared RETRY attempts.- DAGMAN_NODE_JOB_FAILURE_TOLERANCE¶
An integer value representing the number of jobs in a single cluster that can fail before DAGMan considers the cluster as failed and removes any remaining jobs. This value is applied to all nodes in the DAG for each execution. The default value is
0meaning no jobs should fail.Warning
If the tolerance value is greater than or equal to the total number of jobs in a cluster then DAGMan will consider the cluster as successful even if all jobs fail.
- DAGMAN_DEFAULT_APPEND_VARS¶
A boolean value that defaults to
False. WhenTrue, variables parsed in the DAG file VARS line will be appended to the given Job submit description file unless VARS specifies PREPEND or APPEND. WhenFalse, the parsed variables will be prepended unless specified.- DAGMAN_MANAGER_JOB_APPEND_GETENV¶
A comma separated list of variable names to add to the DAGMan
*.condor.subfile’s getenv option. This will in turn add any found matching environment variables to the DAGMan proper jobs Environment. Setting this value toTruewill result ingetenv = true. The Base*.condor.subvalues for getenv are the following:General Shell
PATH
HOME
USER
TZ
LANG
LC_ALL
PYTHONPATH
PERL*
HTCondor
CONDOR_CONFIG
CONDOR_*
Scitoken
BEARER_TOKEN
BEAERER_TOKEN_FILE
Misc.
PEGASUS_*
- DAGMAN_NODE_RECORD_INFO¶
A string that when set to
RETRYwill cause DAGMan to insert a nodes current retry attempt number into the nodes job ad as the attribute DAGManNodeRetry at submission time. This knob is not set by default.- DAGMAN_RECORD_MACHINE_ATTRS¶
A comma separated list of machine attributes that DAGMan will insert into a node jobs submit description for job_ad_information_attrs and job_machine_attrs. This will result in the listed machine attributes to be injected into the nodes produced job ads and userlog. This knob is not set by default.
- DAGMAN_METRICS_FILE_VERSION¶
An integer value that represents the version of metrics file to write (see info below). This value defaults to
2.- V1 Metrics File (1):
Original metric file output that refers to DAG nodes as
jobs.- V2 Metrics File (2):
New metric file using updated terminology (i.e. using the word
nodes).
- DAGMAN_REPORT_GRAPH_METRICS¶
A boolean that defaults to
False. WhenTrue, DAGMan will write additional information regarding graph metrics to*.metricsfile. The included graph metrics are as follows:Graph Height
Graph Width
Number of edges (dependencies)
Number of vertices (nodes)
- DAGMAN_DISABLE_PORT¶
A boolean that defaults to
False. WhenTrue, DAGMan will not open up a command port.Warning
Disabling the command port will make tools, such as htcondor dag halt that talk directly to a running DAGMan process fail.
Throttling
- DAGMAN_DISABLE_ADMIN_THROTTLE_LIMITING¶
A boolean value that defaults to
False. WhenTrueusers can set any value for the various DAGMan throttles. WhenFalseDAGMan will limit any user specified throttle values to the bounds of the configured throttles (maximum/minimum) unless the configured value is0. This applies to the following options:- DAGMAN_MAX_JOBS_IDLE¶
An integer value that controls the maximum number of idle procs allowed within the DAG before condor_dagman temporarily stops submitting jobs. condor_dagman will resume submitting jobs once the number of idle procs falls below the specified limit. DAGMAN_MAX_JOBS_IDLE currently counts each individual proc within a cluster as a job, which is inconsistent with DAGMAN_MAX_JOBS_SUBMITTED. Note that submit description files that queue multiple procs can cause the DAGMAN_MAX_JOBS_IDLE limit to be exceeded. If a submit description file contains
queue 5000and DAGMAN_MAX_JOBS_IDLE is set to 250, this will result in 5000 procs being submitted to the condor_schedd, not 250; in this case, no further jobs will then be submitted by condor_dagman until the number of idle procs falls below 250. The default value is 1000. To disable this limit, set the value to 0. This configuration option can be overridden by the condor_submit_dag -maxidle command-line argument (see condor_submit_dag).- DAGMAN_MAX_JOBS_SUBMITTED¶
An integer value that controls the maximum number of node jobs (clusters) within the DAG that will be submitted to HTCondor at one time. A single invocation of condor_submit by condor_dagman counts as one job, even if the submit file produces a multi-proc cluster. The default value is 0 (unlimited). This configuration option can be overridden by the condor_submit_dag -maxjobs command-line argument (see condor_submit_dag).
- DAGMAN_MAX_PRE_SCRIPTS¶
An integer defining the maximum number of PRE scripts that any given condor_dagman will run at the same time. The value 0 allows any number of PRE scripts to run. The default value if not defined is 20. Note that the DAGMAN_MAX_PRE_SCRIPTS value can be overridden by the condor_submit_dag -maxpre command line option.
- DAGMAN_MAX_POST_SCRIPTS¶
An integer defining the maximum number of POST scripts that any given condor_dagman will run at the same time. The value 0 allows any number of POST scripts to run. The default value if not defined is 20. Note that the DAGMAN_MAX_POST_SCRIPTS value can be overridden by the condor_submit_dag -maxpost command line option.
- DAGMAN_MAX_HOLD_SCRIPTS¶
An integer defining the maximum number of HOLD scripts that any given condor_dagman will run at the same time. The default value 0 allows any number of HOLD scripts to run.
- DAGMAN_REMOVE_JOBS_AFTER_LIMIT_CHANGE¶
A boolean that determines if after changing some of these throttle limits, condor_dagman should forceably remove jobs to meet the new limit. Defaults to
False.
Priority, node semantics
- DAGMAN_DEFAULT_PRIORITY¶
An integer value defining the minimum priority of node jobs running under this condor_dagman job. Defaults to 0.
- DAGMAN_SUBMIT_DEPTH_FIRST¶
A boolean value that controls whether to submit ready DAG node jobs in (more-or-less) depth first order, as opposed to breadth-first order. Setting DAGMAN_SUBMIT_DEPTH_FIRST to
Truedoes not override dependencies defined in the DAG. Rather, it causes newly ready nodes to be added to the head, rather than the tail, of the ready node list. If there are no PRE scripts in the DAG, this will cause the ready nodes to be submitted depth-first. If there are PRE scripts, the order will not be strictly depth-first, but it will tend to favor depth rather than breadth in executing the DAG. If DAGMAN_SUBMIT_DEPTH_FIRST is set toTrue, consider also setting DAGMAN_RETRY_SUBMIT_FIRST and DAGMAN_RETRY_NODE_FIRST toTrue. If not defined, DAGMAN_SUBMIT_DEPTH_FIRST defaults toFalse.- DAGMAN_ALWAYS_RUN_POST¶
A boolean value defining whether condor_dagman will ignore the return value of a PRE script when deciding whether to run a POST script. The default is
False, which means that the failure of a PRE script causes the POST script to not be executed. Changing this toTruewill restore the previous behavior of condor_dagman, which is that a POST script is always executed, even if the PRE script fails.
Node job submission/removal
- DAGMAN_USER_LOG_SCAN_INTERVAL¶
An integer value representing the number of seconds that condor_dagman waits between checking the workflow log file for status updates. Setting this value lower than the default increases the CPU time condor_dagman spends checking files, perhaps fruitlessly, but increases responsiveness to nodes completing or failing. The legal range of values is 1 to INT_MAX. If not defined, it defaults to 5 seconds. This default may be automatically decreased if DAGMAN_MAX_JOBS_IDLE is set to a small value. If so, this will be noted in the
dagman.outfile.- DAGMAN_MAX_SUBMITS_PER_INTERVAL¶
An integer value that controls the maximum number of successful node submissions in a single submit interval before doing other work such as scanning the
*.nodes.logfile for new status information. This value has a minimum value of 1 which will be used if any lower value is provided. This value defaults to 100 but may be decreased automatically if DAGMAN_MAX_JOBS_IDLE is set to a small value,Note
The maximum rate at which DAGMan can submit jobs is DAGMAN_MAX_SUBMITS_PER_INTERVAL / DAGMAN_USER_LOG_SCAN_INTERVAL
- DAGMAN_MAX_SUBMIT_ATTEMPTS¶
An integer that controls how many times in a row condor_dagman will attempt to execute condor_submit for a given job before giving up. Note that consecutive attempts use an exponential backoff, starting with 1 second. The legal range of values is 1 to 16. If defined with a value less than 1, the value 1 will be used. If defined with a value greater than 16, the value 16 will be used. Note that a value of 16 would result in condor_dagman trying for approximately 36 hours before giving up. If not defined, it defaults to 6 (approximately two minutes before giving up).
- DAGMAN_MAX_JOB_HOLDS¶
An integer value defining the maximum number of times a node job is allowed to go on hold. As a job goes on hold this number of times, it is removed from the queue. For example, if the value is 2, as the job goes on hold for the second time, it will be removed. At this time, this feature is not fully compatible with node jobs that have more than one
ProcID. The number of holds of each process in the cluster count towards the total, rather than counting individually. So, this setting should take that possibility into account, possibly using a larger value. A value of 0 allows a job to go on hold any number of times. The default value if not defined is 100.- DAGMAN_HOLD_CLAIM_TIME¶
An integer defining the number of seconds that condor_dagman will cause a hold on a claim after a job is finished, using the job ClassAd attribute KeepClaimIdle. The default value is 20. A value of 0 causes condor_dagman not to set the job ClassAd attribute.
- DAGMAN_SUBMIT_DELAY¶
An integer that controls the number of seconds that condor_dagman will sleep before submitting consecutive jobs. It can be increased to help reduce the load on the condor_schedd daemon. The legal range of values is any non negative integer. If defined with a value less than 0, the value 0 will be used.
- DAGMAN_PROHIBIT_MULTI_JOBS¶
A boolean value that controls whether condor_dagman prohibits node job submit description files that queue multiple job procs other than parallel universe. If a DAG references such a submit file, the DAG will abort during the initialization process. If not defined, DAGMAN_PROHIBIT_MULTI_JOBS defaults to
False.- DAGMAN_GENERATE_SUBDAG_SUBMITS¶
A boolean value specifying whether condor_dagman itself should create the
.condor.subfiles for nested DAGs. If set toFalse, nested DAGs will fail unless the.condor.subfiles are generated manually by running condor_submit_dag -no_submit on each nested DAG, or the -do_recurse flag is passed to condor_submit_dag for the top-level DAG. DAG nodes specified with theSUBDAG EXTERNALkeyword or with submit description file names ending in.condor.subare considered nested DAGs. The default value if not defined isTrue.- DAGMAN_REMOVE_NODE_JOBS¶
A boolean value that controls whether condor_dagman removes its node jobs itself when it is removed (in addition to the condor_schedd removing them). Note that setting DAGMAN_REMOVE_NODE_JOBS to
Trueis the safer option (setting it toFalsemeans that there is some chance of ending up with “orphan” node jobs). Setting DAGMAN_REMOVE_NODE_JOBS toFalseis a performance optimization (decreasing the load on the condor_schedd when a condor_dagman job is removed). Note that even if DAGMAN_REMOVE_NODE_JOBS is set toFalse, condor_dagman will remove its node jobs in some cases, such as a DAG abort triggered by an ABORT-DAG-ON command. Defaults toTrue.- DAGMAN_MUNGE_NODE_NAMES¶
A boolean value that controls whether condor_dagman automatically renames nodes when running multiple DAGs. The renaming is done to avoid possible name conflicts. If this value is set to
True, all node names have the DAG number followed by the period character (.) prepended to them. For example, the first DAG specified on the condor_submit_dag command line is considered DAG number 0, the second is DAG number 1, etc. So if DAG number 2 has a node named B, that node will internally be renamed to 2.B. If not defined, DAGMAN_MUNGE_NODE_NAMES defaults toTrue. Note: users should rarely change this setting.- DAGMAN_SUPPRESS_JOB_LOGS¶
A boolean value specifying whether events should be written to a log file specified in a node job’s submit description file. The default value is
False, such that events are written to a log file specified by a node job.- DAGMAN_SUPPRESS_NOTIFICATION¶
A boolean value that controls whether jobs submitted by condor_dagman can send email notifications. If True then no submitted jobs will send email notifications. This is equivalent to setting notification to
Never. Defaults to False.- DAGMAN_CONDOR_SUBMIT_EXE¶
The executable that condor_dagman will use to submit HTCondor jobs. If not defined, condor_dagman looks for condor_submit in the path. Note: users should rarely change this setting.
- DAGMAN_CONDOR_RM_EXE¶
The executable that condor_dagman will use to remove HTCondor jobs. If not defined, condor_dagman looks for condor_rm in the path. Note: users should rarely change this setting.
- DAGMAN_AGGRESSIVE_SUBMIT¶
A boolean value that defaults to
False. WhenTrueDAGMan prioritizes job submission such that it does not stop early if the time spent submitting jobs is greater than the DAGMAN_USER_LOG_SCAN_INTERVAL. Assuming there are still jobs to be submitted within the DAGs throttle limits.- DAGMAN_ABORT_ON_SCARY_SUBMIT¶
A boolean value that controls whether to abort a DAG upon detection of a scary submit event. An example of a scary submit event is one in which the HTCondor ID does not match the expected value. Note that in all HTCondor versions prior to 6.9.3, condor_dagman did not abort a DAG upon detection of a scary submit event. This behavior is what now happens if DAGMAN_ABORT_ON_SCARY_SUBMIT is set to
False. If not defined, DAGMAN_ABORT_ON_SCARY_SUBMIT defaults toTrue. Note: users should rarely change this setting.
Rescue/retry
- DAGMAN_AUTO_RESCUE¶
A boolean value that controls whether condor_dagman automatically runs Rescue DAGs. If DAGMAN_AUTO_RESCUE is
Trueand the DAG input filemy.dagis submitted, and if a Rescue DAG such as the examplesmy.dag.rescue001ormy.dag.rescue002exists, then the largest magnitude Rescue DAG will be run. If not defined, DAGMAN_AUTO_RESCUE defaults toTrue.- DAGMAN_MAX_RESCUE_NUM¶
An integer value that controls the maximum Rescue DAG number that will be written, in the case that
DAGMAN_OLD_RESCUEisFalse, or run if DAGMAN_AUTO_RESCUE isTrue. The maximum legal value is 999; the minimum value is 0, which prevents a Rescue DAG from being written at all, or automatically run. If not defined, DAGMAN_MAX_RESCUE_NUM defaults to 100.- DAGMAN_RESET_RETRIES_UPON_RESCUE¶
A boolean value that controls whether node retries are reset in a Rescue DAG. If this value is
False, the number of node retries written in a Rescue DAG is decreased, if any retries were used in the original run of the DAG; otherwise, the original number of retries is allowed when running the Rescue DAG. If not defined, DAGMAN_RESET_RETRIES_UPON_RESCUE defaults toTrue.- DAGMAN_RETRY_SUBMIT_FIRST¶
A boolean value that controls whether a failed submit is retried first (before any other submits) or last (after all other ready jobs are submitted). If this value is set to
True, when a job submit fails, the job is placed at the head of the queue of ready jobs, so that it will be submitted again before any other jobs are submitted. This had been the behavior of condor_dagman. If this value is set toFalse, when a job submit fails, the job is placed at the tail of the queue of ready jobs. If not defined, it defaults toTrue.- DAGMAN_RETRY_NODE_FIRST¶
A boolean value that controls whether a failed node with retries is retried first (before any other ready nodes) or last (after all other ready nodes). If this value is set to
True, when a node with retries fails after the submit succeeded, the node is placed at the head of the queue of ready nodes, so that it will be tried again before any other jobs are submitted. If this value is set toFalse, when a node with retries fails, the node is placed at the tail of the queue of ready nodes. This had been the behavior of condor_dagman. If not defined, it defaults toFalse.
Log files
- DAGMAN_DEFAULT_NODE_LOG¶
The default name of a file to be used as a job event log by all node jobs of a DAG.
This configuration variable uses a special syntax in which @ instead of $ indicates an evaluation of special variables. Normal HTCondor configuration macros may be used with the normal $ syntax.
Special variables to be used only in defining this configuration variable:
@(DAG_DIR): The directory in which the primary DAG input file resides. If more than one DAG input file is specified to condor_submit_dag, the primary DAG input file is the leftmost one on the command line.@(DAG_FILE): The name of the primary DAG input file. It does not include the path.@(CLUSTER): The ClusterId attribute of the condor_dagman job.@(OWNER): The user name of the user who submitted the DAG.@(NODE_NAME): For SUBDAGs, this is the node name of the SUBDAG in the upper level DAG; for a top-level DAG, it is the string"undef".
If not defined,
@(DAG_DIR)/@(DAG_FILE).nodes.logis the default value.Notes:
Using
$(LOG)in defining a value for DAGMAN_DEFAULT_NODE_LOG will not have the expected effect, because$(LOG)is defined as"."for condor_dagman. To place the default log file into the log directory, write the expression relative to a known directory, such as$(LOCAL_DIR)/log(see examples below).A default log file placed in the spool directory will need extra configuration to prevent condor_preen from removing it; modify VALID_SPOOL_FILES. Removal of the default log file during a run will cause severe problems.
The value defined for DAGMAN_DEFAULT_NODE_LOG must ensure that the file is unique for each DAG. Therefore, the value should always include
@(DAG_FILE). For example,DAGMAN_DEFAULT_NODE_LOG = $(LOCAL_DIR)/log/@(DAG_FILE).nodes.log
is okay, but
DAGMAN_DEFAULT_NODE_LOG = $(LOCAL_DIR)/log/dag.nodes.log
will cause failure when more than one DAG is run at the same time on a given access point.
- DAGMAN_LOG_ON_NFS_IS_ERROR¶
A boolean value that controls whether condor_dagman prohibits a DAG workflow log from being on an NFS file system. This value is ignored if CREATE_LOCKS_ON_LOCAL_DISK and ENABLE_USERLOG_LOCKING are both
True. If a DAG uses such a workflow log file file and DAGMAN_LOG_ON_NFS_IS_ERROR isTrue(and not ignored), the DAG will abort during the initialization process. If not defined, DAGMAN_LOG_ON_NFS_IS_ERROR defaults toFalse.- DAGMAN_ALLOW_ANY_NODE_NAME_CHARACTERS¶
Allows any characters to be used in DAGMan node names, even characters that are considered illegal because they are used internally as separators. Turning this feature on could lead to instability when using splices or munged node names. The default value is
False.- DAGMAN_ALLOW_EVENTS¶
An integer that controls which bad events are considered fatal errors by condor_dagman. This macro replaces and expands upon the functionality of the
DAGMAN_IGNORE_DUPLICATE_JOB_EXECUTIONmacro. If DAGMAN_ALLOW_EVENTS is set, it overrides the setting ofDAGMAN_IGNORE_DUPLICATE_JOB_EXECUTION. Note: users should rarely change this setting.The DAGMAN_ALLOW_EVENTS value is a logical bitwise OR of the following values:
0 = allow no bad events
1 = allow all bad events, except the event
"job re-run after terminated event"2 = allow terminated/aborted event combination
4 = allow a
"job re-run after terminated event"bug8 = allow garbage or orphan events
16 = allow an execute or terminate event before job’s submit event
32 = allow two terminated events per job, as sometimes seen with grid jobs
64 = allow duplicated events in general
The default value is 114, which allows terminated/aborted event combination, allows an execute and/or terminated event before job’s submit event, allows double terminated events, and allows general duplicate events.
As examples, a value of 6 instructs condor_dagman to allow both the terminated/aborted event combination and the
"job re-run after terminated event"bug. A value of 0 means that any bad event will be considered a fatal error.A value of 5 will never abort the DAG because of a bad event. But this value should almost never be used, because the
"job re-run after terminated event"bug breaks the semantics of the DAG.
Debug output
- DAGMAN_DEBUG¶
This variable is described in <SUBSYS>_DEBUG.
- DAGMAN_VERBOSITY¶
An integer value defining the verbosity of output to the
dagman.outfile, as follows (each level includes all output from lower debug levels):level = 0; never produce output, except for usage info
level = 1; very quiet, output severe errors
level = 2; output errors and warnings
level = 3; normal output
level = 4; internal debugging output
level = 5; internal debugging output; outer loop debugging
level = 6; internal debugging output; inner loop debugging
level = 7; internal debugging output; rarely used
The default value if not defined is 3.
- DAGMAN_DEBUG_CACHE_ENABLE¶
A boolean value that determines if log line caching for the
dagman.outfile should be enabled in the condor_dagman process to increase performance (potentially by orders of magnitude) when writing thedagman.outfile to an NFS server. Currently, this cache is only utilized in Recovery Mode. If not defined, it defaults toFalse.- DAGMAN_DEBUG_CACHE_SIZE¶
An integer value representing the number of bytes of log lines to be stored in the log line cache. When the cache surpasses this number, the entries are written out in one call to the logging subsystem. A value of zero is not recommended since each log line would surpass the cache size and be emitted in addition to bracketing log lines explaining that the flushing was happening. The legal range of values is 0 to INT_MAX. If defined with a value less than 0, the value 0 will be used. If not defined, it defaults to 5 Megabytes.
- DAGMAN_PENDING_REPORT_INTERVAL¶
An integer value representing the number of seconds that controls how often condor_dagman will print a report of pending nodes to the
dagman.outfile. The report will only be printed if condor_dagman has been waiting at least DAGMAN_PENDING_REPORT_INTERVAL seconds without seeing any node job events, in order to avoid cluttering thedagman.outfile. This feature is mainly intended to help diagnose condor_dagman processes that are stuck waiting indefinitely for a job to finish. If not defined, DAGMAN_PENDING_REPORT_INTERVAL defaults to 600 seconds (10 minutes).- DAGMAN_CHECK_QUEUE_INTERVAL¶
An integer value representing the number of seconds DAGMan will wait pending on nodes to make progress before querying the local condor_schedd queue to verify that the jobs the DAG is pending on are in said queue. If jobs are missing, DAGMan will write a rescue DAG and abort. When set to a value equal to or less than 0 DAGMan will not query the condor_schedd. Default value is 28800 seconds (8 Hours).
- DAGMAN_PRINT_JOB_TABLE_INTERVAL¶
An integer value representing the number of seconds DAGMan will wait before printing out the job state table to the debug log. If set to
0then the table is never printed. This value defaults to 900 seconds (15 minutes).- MAX_DAGMAN_LOG¶
This variable is described in MAX_<SUBSYS>_LOG. If not defined, MAX_DAGMAN_LOG defaults to 0 (unlimited size).
HTCondor attributes
- DAGMAN_INSERT_SUB_FILE¶
A file name of a file containing submit description file commands to be inserted into the
.condor.subfile created by condor_submit_dag. The specified file is inserted into the.condor.subfile before the queue command and before any commands specified with the -append condor_submit_dag command line option. Note that the DAGMAN_INSERT_SUB_FILE value can be overridden by the condor_submit_dag -insert_sub_file command line option.- DAGMAN_ON_EXIT_REMOVE¶
Defines the
OnExitRemoveClassAd expression placed into the condor_dagman submit description file by condor_submit_dag. The default expression is designed to ensure that condor_dagman is automatically re-queued by the condor_schedd daemon if it exits abnormally or is killed (for example, during a reboot). If this results in condor_dagman staying in the queue when it should exit, consider changing to a less restrictive expression, as in the example(ExitBySignal == false || ExitSignal =!= 9)
If not defined, DAGMAN_ON_EXIT_REMOVE defaults to the expression
( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
- DAGMAN_INHERIT_ATTRS¶
A list of job ClassAd attributes from the root DAGMan job’s ClassAd to be passed down to all managed jobs including SubDAGs. Empty by default.
- DAGMAN_INHERIT_ATTRS_PREFIX¶
A string to be prefixed to the job ClassAd attributes described in DAGMAN_INHERIT_ATTRS. Empty by default.