Shadow Daemon Configuration Options

These settings affect the condor_shadow.

SHADOW_LOCK¶

This macro specifies the lock file to be used for access to the ShadowLog file. It must be a separate file from the ShadowLog, since the ShadowLog may be rotated and you want to synchronize access across log file rotations. This macro is defined relative to the $(LOCK) macro.
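As a minimal sketch, a lock file defined relative to $(LOCK) might look like the following. The file name ShadowLock is an illustrative assumption, not a value stated in this section.

```
# Lock file used to synchronize access to the ShadowLog across rotations.
# The file name "ShadowLock" is an assumed, illustrative name.
SHADOW_LOCK = $(LOCK)/ShadowLock
```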

SHADOW_DEBUG¶

This macro (and other settings related to debug logging in the shadow) is described in <SUBSYS>_DEBUG.

SHADOW_QUEUE_UPDATE_INTERVAL¶

The amount of time (in seconds) between ClassAd updates that the condor_shadow daemon sends to the condor_schedd daemon. Defaults to 900 (15 minutes).

SHADOW_LAZY_QUEUE_UPDATE¶

This boolean macro specifies whether the condor_shadow should immediately update the job queue for certain attributes (at this time, it only affects the NumJobStarts and NumJobReconnects counters) or wait and update the job queue only on the next periodic update. There is a trade-off between performance and the semantics of these attributes, which is why the behavior is controlled by a configuration macro.

If the condor_shadow does not use a lazy update, and instead immediately ensures that changes to the job attributes are written to the job queue on disk, the semantics of the attributes are very solid (there’s only a tiny chance that the counters will be out of sync with reality), but this introduces a potentially large performance and scalability problem for a busy condor_schedd. If the condor_shadow uses a lazy update, there is no additional cost to the condor_schedd, but condor_q will not immediately see the changes to the job attributes, and if the condor_shadow happens to crash or be killed before the next periodic update, the attributes are never incremented.

The most obvious use of these counter attributes is in the periodic user policy expressions, which are evaluated directly by the condor_shadow using its own copy of the job’s ClassAd, and that copy is immediately updated in either case. Because the additional cost of aggressive updates to a busy condor_schedd could potentially cause major problems, the default is True, which performs lazy, periodic updates.
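The two update settings are typically considered together. The fragment below is a sketch of a site that prefers fresher counters in condor_q over schedd scalability. The 300-second interval is an illustrative assumption, not a recommendation.

```
# Send periodic ClassAd updates to the condor_schedd every 5 minutes
# instead of the default 900 seconds (illustrative value).
SHADOW_QUEUE_UPDATE_INTERVAL = 300

# Write NumJobStarts and NumJobReconnects to the job queue immediately.
# This trades condor_schedd performance for stronger counter semantics;
# the default is True (lazy updates).
SHADOW_LAZY_QUEUE_UPDATE = False
```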

SHADOW_WORKLIFE¶

The integer number of seconds after which the condor_shadow will exit when the current job finishes, instead of fetching a new job to manage. Having the condor_shadow continue managing jobs helps reduce overhead and can allow the condor_schedd to achieve higher job completion rates. The default is 3600, one hour. The value 0 causes condor_shadow to exit after running a single job.
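For example, a configuration that gives every job a fresh condor_shadow, at the cost of the extra overhead described above, could be sketched as:

```
# Exit after running a single job rather than fetching another one.
# The default is 3600 (one hour).
SHADOW_WORKLIFE = 0
```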

SHADOW_JOB_CLEANUP_RETRY_DELAY¶

This integer specifies the number of seconds to wait between tries to commit the final update to the job ClassAd in the condor_schedd’s job queue. The default is 30.

SHADOW_MAX_JOB_CLEANUP_RETRIES¶

This integer specifies the number of times to try committing the final update to the job ClassAd in the condor_schedd’s job queue. The default is 5.
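The retry delay and retry count work as a pair. The fragment below is a sketch for a heavily loaded condor_schedd where the final update may need more patience. Both values are illustrative assumptions.

```
# Wait 60 seconds between attempts to commit the final job update
# (default is 30).
SHADOW_JOB_CLEANUP_RETRY_DELAY = 60

# Try up to 10 times before giving up (default is 5).
SHADOW_MAX_JOB_CLEANUP_RETRIES = 10
```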

SHADOW_CHECKPROXY_INTERVAL¶

The number of seconds between tests to see if the job proxy has been updated or should be refreshed. The default is 600 seconds (10 minutes). This variable’s value should be small in comparison to the refresh interval required to keep delegated credentials from expiring (configured via DELEGATE_JOB_GSI_CREDENTIALS_REFRESH and DELEGATE_JOB_GSI_CREDENTIALS_LIFETIME). If this variable’s value is too small, proxy updates could happen very frequently, potentially creating a lot of load on the submit machine.
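A sketch of a shorter check interval follows. The value is an illustrative assumption, and a suitable setting depends on the credential refresh interval configured at the site.

```
# Check the job proxy every 5 minutes instead of the default 600 seconds.
# Smaller values mean more frequent checks and more load on the submit machine.
SHADOW_CHECKPROXY_INTERVAL = 300
```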

SHADOW_RUN_UNKNOWN_USER_JOBS¶

A boolean that defaults to False. When True, it allows the condor_shadow daemon to run jobs as user nobody when the jobs are remotely submitted by users who are not in the local password file.

SHADOW_STATS_LOG¶

The full path and file name of a file that stores TCP statistics for shadow file transfers. (Note that the shadow logs TCP statistics to this file by default. Adding D_STATS to the SHADOW_DEBUG value will cause TCP statistics to be logged to the normal shadow log file ($(SHADOW_LOG)).) If not defined, SHADOW_STATS_LOG defaults to $(LOG)/XferStatsLog. Setting SHADOW_STATS_LOG to /dev/null disables logging of shadow TCP file transfer statistics.

MAX_SHADOW_STATS_LOG¶

Controls the maximum size in bytes or amount of time that the shadow TCP statistics log will be allowed to grow. If not defined, MAX_SHADOW_STATS_LOG defaults to $(MAX_DEFAULT_LOG), which currently defaults to 10 MiB in size. Values are specified with the same syntax as MAX_DEFAULT_LOG.
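The statistics log settings above might be combined as in this sketch. The 1 MiB limit is an illustrative assumption, given here as a plain byte count.

```
# Keep the statistics log in its default location, but cap it at 1 MiB
# (illustrative value, in bytes).
SHADOW_STATS_LOG = $(LOG)/XferStatsLog
MAX_SHADOW_STATS_LOG = 1048576

# Alternatively, disable logging of shadow TCP transfer statistics:
# SHADOW_STATS_LOG = /dev/null
```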

SHADOW_LOG_RECONNECT¶

A boolean that defaults to True. When True, the condor_shadow appends a CSV record to the file specified by SHADOW_RECONNECT_LOG each time it successfully reconnects to a disconnected starter, or when it gives up trying to reconnect because the job lease expired. Each record contains the following comma-separated fields, in order: the cluster.proc job id, the epoch time the job was activated, the last epoch time the shadow heard from the starter, whether the reconnect succeeded, the epoch time of the reconnect attempt, the configured lease duration in seconds, a timeout version identifier, and whether the starter process is known to be dead.

SHADOW_RECONNECT_LOG¶

The full path and file name of the CSV file where the condor_shadow writes reconnect records when SHADOW_LOG_RECONNECT is True. Defaults to $(LOG)/ShadowReconnectLog.

SHADOW_RECONNECT_LOG_MAX¶

The maximum size in bytes of the ShadowReconnectLog file before it is rotated. Defaults to 10485760 (10 MiB).

SHADOW_RECONNECT_LOG_MAX_NUM¶

The maximum number of rotated ShadowReconnectLog files to keep. When more than this many rotated files exist, the oldest are removed. Defaults to 4.

SHADOW_RECONNECT_TIMEOUT_VERSION¶

An optional string identifier that is included in each reconnect log record. This can be used to tag records with a version when experimenting with different reconnect timeout values. Defaults to an empty string.
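Taken together, the reconnect logging settings could be written out as in the sketch below. All values are the documented defaults except the version tag, which is a hypothetical label chosen for illustration.

```
# Record reconnect attempts as CSV (this is the default).
SHADOW_LOG_RECONNECT = True
SHADOW_RECONNECT_LOG = $(LOG)/ShadowReconnectLog

# Rotate at 10 MiB and keep at most 4 rotated files (the defaults).
SHADOW_RECONNECT_LOG_MAX = 10485760
SHADOW_RECONNECT_LOG_MAX_NUM = 4

# Hypothetical tag written into each record, useful for telling apart
# records gathered under different reconnect timeout experiments.
SHADOW_RECONNECT_TIMEOUT_VERSION = lease-test-1
```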

ALLOW_TRANSFER_REMAP_TO_MKDIR¶

A boolean value that, when True, allows the condor_shadow to create directories in a transfer output remap path when the directory does not already exist. The condor_shadow cannot create directories if the remap is an absolute path or if the remap tries to write to a directory specified within LIMIT_DIRECTORY_ACCESS.

JOB_EPOCH_HISTORY¶

The full path and file name of a file to which the condor_shadow writes a per-run job history, analogous to the history file defined by the configuration variable HISTORY. It is rotated in the same way, and parameters similar to those that control HISTORY file rotation apply to the condor_shadow epoch history as well. The file can be read with the condor_history command using the -epochs option. The default value is $(SPOOL)/epoch_history.

$ condor_history -epochs

MAX_EPOCH_HISTORY_LOG¶

Defines the maximum size for the epoch history file, in bytes. It defaults to 20MB.

MAX_EPOCH_HISTORY_ROTATIONS¶

Controls the maximum number of backup epoch history files to be kept. It defaults to 2, which means that there may be up to three epoch history files (two backups, plus the epoch history file that is being currently written to). When the epoch history file is rotated, and this rotation would cause the number of backups to be too large, the oldest file is removed.
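The epoch history settings might be combined as in this sketch. The size and rotation count are illustrative assumptions. With four backups, up to five epoch history files may exist at once (four backups plus the file currently being written).

```
# Per-run job history written by the condor_shadow (default location).
JOB_EPOCH_HISTORY = $(SPOOL)/epoch_history

# Rotate the file once it reaches roughly 50 million bytes (illustrative).
MAX_EPOCH_HISTORY_LOG = 50000000

# Keep four backups instead of the default two.
MAX_EPOCH_HISTORY_ROTATIONS = 4
```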

JOB_EPOCH_HISTORY_DIR¶

A full path to an existing directory in which the condor_shadow writes the job’s current job ad to a per-job run history file named job.runs.X.Y.ads, where X is the job’s cluster id and Y is the job’s process id. For example, job 35.2 would write a job ad for each run to the file job.runs.35.2.ads. These files can be read through condor_history when run with the -epochs and -directory options.

$ condor_history -epochs -directory

HTCondor does not automatically delete these files, so if left unchecked, the directory can grow very large. Either an external process must clean up the files, or condor_history can read and delete them using the -epochs option’s optional :d extension.

$ condor_history -epochs:d -directory
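
Enabling the per-job files could be sketched as follows. The directory path is a hypothetical example, and the directory must already exist because the setting expects an existing directory.

```
# Hypothetical directory for per-job run history files such as
# job.runs.35.2.ads; create it before the condor_shadow needs it.
JOB_EPOCH_HISTORY_DIR = /var/lib/condor/epoch_ads
```

Because nothing removes these files automatically, a periodic cleanup (for example, the -epochs:d form shown above) is usually paired with this setting.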