Version 24.0 LTS Releases

These are Long Term Support (LTS) versions of HTCondor. As usual, only bug fixes (and potentially, ports to new platforms) will be provided in future 24.0.y versions. New features will be added in the 24.x.y feature versions.

The details of each version are described below.

Version 24.0.2

Release Notes:

  • HTCondor version 24.0.2 released on November 26, 2024.

New Features:

  • Added a new configuration parameter, STARTER_ALWAYS_HOLD_ON_OOM, which defaults to true. When true, a job killed by an OOM signal is put on hold. When false, HTCondor tries to determine whether the job or the system ran out of memory; if it was the system, the job is evicted and set back to idle. (HTCONDOR-2686)
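
The decision described above can be sketched as a small Python function. This is only an illustration of the rule as stated in this note, not HTCondor source code: the function name and the job_exceeded_memory argument are hypothetical, and the sketch assumes that a job found to be at fault is still put on hold when the parameter is false.

```python
def oom_outcome(always_hold_on_oom: bool, job_exceeded_memory: bool) -> str:
    """Return "hold" or "evict" for a job that was killed by an OOM signal.

    Hypothetical helper, not part of HTCondor. job_exceeded_memory stands in
    for the starter's best guess about whether the job (True) or the system
    as a whole (False) ran out of memory.
    """
    if always_hold_on_oom:
        # Default behavior: every OOM kill puts the job on hold.
        return "hold"
    # Assumption: when the job itself is at fault it still goes on hold;
    # only a system-level shortage sends the job back to idle.
    return "hold" if job_exceeded_memory else "evict"


if __name__ == "__main__":
    for always_hold in (True, False):
        for job_at_fault in (True, False):
            print(always_hold, job_at_fault, oom_outcome(always_hold, job_at_fault))
```

With the default of true, the second argument never matters; it only changes the result once the parameter is set to false.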

Bugs Fixed:

  • Fixed a bug that prevented condor_ssh_to_job from working with sftp and scp modes. (HTCONDOR-2687)

  • Fixed a bug where a daemon would repeatedly try to use its family security session when authenticating with another daemon that doesn’t know about the session. (HTCONDOR-2685)

  • Fixed a bug where a job would sometimes match but then fail to start on a machine with a START expression that referenced the KeyboardIdle attribute. (HTCONDOR-2689)

  • htcondor2.Submit.itemdata() now correctly accepts an optional qargs parameter (as in version 1). (HTCONDOR-2618)

  • HTCondor no longer signals the condor_credmon_oauth daemon on every job submission when there is no work for it to do. This should reduce the frequency of some errors in condor_credmon_oauth. (HTCONDOR-2653)

  • Docker universe jobs now check the Architecture field in the image, and if it doesn’t match the architecture of the EP, the job is put on hold. The new configuration parameter DOCKER_SKIP_IMAGE_ARCH_CHECK skips this check. (HTCONDOR-2661)
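
The Docker architecture check above (HTCONDOR-2661) can be illustrated with a short Python sketch. The name mapping and the function are hypothetical and exist only for this example. Docker images report names such as amd64 and arm64, and the sketch assumes the EP architecture is given as x86_64 or aarch64. How HTCondor actually normalizes the two naming schemes is not described in this note.

```python
# Hypothetical mapping from Docker image architecture names to machine names.
DOCKER_TO_MACHINE = {"amd64": "x86_64", "arm64": "aarch64"}


def image_arch_ok(image_arch: str, ep_arch: str, skip_check: bool = False) -> bool:
    """Return True if the job may run, False if it should be put on hold.

    skip_check plays the role of DOCKER_SKIP_IMAGE_ARCH_CHECK in this sketch.
    """
    if skip_check:
        # The administrator has opted out of the comparison entirely.
        return True
    image = image_arch.lower()
    # Translate Docker's name if we know it; otherwise compare it as given.
    normalized = DOCKER_TO_MACHINE.get(image, image)
    return normalized == ep_arch.lower()


if __name__ == "__main__":
    print(image_arch_ok("amd64", "X86_64"))
    print(image_arch_ok("arm64", "X86_64"))
    print(image_arch_ok("arm64", "X86_64", skip_check=True))
```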

Version 24.0.1

Release Notes:

New Features:

  • condor_gpu_discovery can now detect GPUs using AMD’s HIP 6 library. HIP detection will be used if the new -hip option is used or if no detection method is specified and no CUDA devices are detected. (HTCONDOR-2509)
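
The rule for when HIP detection is used can be written out as a boolean expression. The following Python sketch only restates the sentence above. The function and argument names are hypothetical and do not correspond to anything in the condor_gpu_discovery source.

```python
def use_hip_detection(hip_requested: bool,
                      method_specified: bool,
                      cuda_device_count: int) -> bool:
    """Decide whether HIP detection runs, following the rule in this note.

    hip_requested: True if the -hip option was given.
    method_specified: True if any detection method was chosen explicitly.
    cuda_device_count: number of CUDA devices that were detected.
    """
    if hip_requested:
        return True
    # Fallback: nothing was requested explicitly and CUDA found no devices.
    return (not method_specified) and cuda_device_count == 0


if __name__ == "__main__":
    print(use_hip_detection(True, True, 2))
    print(use_hip_detection(False, False, 0))
    print(use_hip_detection(False, False, 1))
```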

Bugs Fixed:

  • On Windows, the htcondor tool now uses the Python C API to try to launch the Python interpreter. This fails with a message box about installing Python if Python 3.9 is not in the path. (HTCONDOR-2650)

  • htcondor2.Submit.from_dag() now recognizes DoRecov as a synonym for DoRecovery. This improves compatibility with version 1. (HTCONDOR-2613)

  • htcondor2.Submit.itemdata() now (correctly) returns an iterator over dictionaries if the htcondor2.Submit object specified variable names in its queue statement. (HTCONDOR-2613)

  • When you specify item data using a dict, HTCondor will now correctly reject values containing newlines. (HTCONDOR-2616)

  • When docker universe jobs failed with a multi-line error from docker run, the job used to fail with an “unable to inspect container” message. Now the proper hold message is set and the job goes on hold as expected. (HTCONDOR-2679)

  • htcondor annex now reports a proper error if you request an annex from a GPU-enabled queue but don’t specify how many GPUs per node you want (and the queue does not always allocate whole nodes). (HTCONDOR-2633)

  • Fixed a bug where HTCondor systems configured to use cgroups on Linux to measure memory would reuse the peak memory from the previous job in a slot, if any process in the former job was unkillable. This can happen if the job is stuck in NFS or running GPU code. (HTCONDOR-2647)

  • Fixed a bug where the -divide flag to condor_gpu_discovery would be ignored on servers with only one type of GPU device. (HTCONDOR-2669)

  • Fixed a bug introduced in HTCSS v23.8.1 which prevented an EP from running multiple jobs on a single GPU device when -divide or -repeat was added to the configuration knob GPU_DISCOVERY_EXTRA. Also fixed problems with any non-fungible machine resource inventory that contained repeated identifiers. (HTCONDOR-2678)

  • Fixed a bug where condor_watch_q would display None for jobs with no JobBatchName instead of the expected ClusterId. (HTCONDOR-2625)

  • htcondor2.Schedd.submit() now correctly raises a TypeError when passed a description that is not a htcondor2.Submit object. (HTCONDOR-2631)

  • When submitting jobs to an SGE cluster via the grid universe, the blahp no longer saves the output of its wrapper script in the user’s home directory (where the files would accumulate and never be cleaned up). (HTCONDOR-2630)

  • Improved the error message when job submission as a disallowed user fails (i.e. submitting as the ‘condor’ or ‘root’ user). (HTCONDOR-2638)

  • Fixed a bug in htcondor server status that caused incorrect output if DAEMON_LIST contained commas. (HTCONDOR-2667)

  • Fixed the new default security configuration to work with older binaries. (HTCONDOR-2701)

  • An unresponsive libvirtd daemon no longer causes the condor_startd to block indefinitely. (HTCONDOR-2644)
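
As an illustration of the DAEMON_LIST fix above (HTCONDOR-2667), the Python sketch below splits a list value that may use commas, whitespace, or both as separators. It assumes that all three forms are valid ways to write the list. The function is hypothetical and is not the code used by htcondor server status.

```python
import re


def split_daemon_list(value: str) -> list[str]:
    """Split a DAEMON_LIST-style string on commas and/or whitespace.

    Empty entries (for example from a trailing comma) are dropped.
    """
    return [name for name in re.split(r"[,\s]+", value.strip()) if name]


if __name__ == "__main__":
    print(split_daemon_list("MASTER, SCHEDD,STARTD"))
    print(split_daemon_list("MASTER SCHEDD STARTD"))
```

Both calls produce the same three names, which is the behavior a status tool needs in order to report each daemon once regardless of how the list was written.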