Version 24.0 LTS Releases
These are Long Term Support (LTS) versions of HTCondor. As usual, only bug fixes (and potentially, ports to new platforms) will be provided in future 24.0.y versions. New features will be added in the 24.x.y feature versions.
The details of each version are described below.
Version 24.0.3
Release Notes:
HTCondor version 24.0.3 released on January 6, 2025.
New Features:
Added a new knob, CGROUP_POLLING_INTERVAL, which defaults to 5 (seconds) and controls how often a cgroup system polls for resource usage. (HTCONDOR-2802)
Bugs Fixed:
EPs spawned by htcondor annex no longer crash on start-up. (HTCONDOR-2745)
When resolving a hostname to a list of IP addresses, avoid using IPv6 link-local addresses. This change was done incorrectly in 23.9.6. (HTCONDOR-2746)
htcondor2.Submit.from_dag() and htcondor.Submit.from_dag() now correctly raise an HTCondor exception when the processing of DAGMan options and submit time DAG commands fails. (HTCONDOR-2736)
Fixed confusing job hold message that would state a job requested 0.0 GB of disk via request_disk when exceeding disk usage on Execution Points using STARTD_ENFORCE_DISK_LIMITS. (HTCONDOR-2753)
You can now locate a collector daemon in the htcondor2 Python bindings. (HTCONDOR-2738)
Fixed a bug in the condor_qusers tool where the add argument would always enable rather than add a user. (HTCONDOR-2775)
Fixed a bug where cgroup systems reported current instantaneous memory instead of peak memory, as was intended. (HTCONDOR-2800) (HTCONDOR-2804)
Fixed an inconsistency in cgroup v1 systems where the memory reported by condor included memory used by the kernel to cache disk pages. (HTCONDOR-2807)
Fixed a bug on cgroup v1 systems where jobs that were killed by the Out of Memory killer did not go on hold. (HTCONDOR-2806)
Fixed incompatibility of condor_adstash with v2.x of the OpenSearch Python Client. (HTCONDOR-2614)
The -subsystem argument of condor_status is once again case-insensitive for credd and defrag subsystem types. (HTCONDOR-2796)
Version 24.0.2
Release Notes:
HTCondor version 24.0.2 released on November 26, 2024.
New Features:
Added a new configuration parameter, STARTER_ALWAYS_HOLD_ON_OOM, which defaults to true. When true, a job that is killed with an OOM signal is put on hold. When false, the system tries to determine whether the job or the system was out of memory and, if it was the system, evicts the job and sets it back to idle. (HTCONDOR-2686)
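The decision described for STARTER_ALWAYS_HOLD_ON_OOM can be sketched roughly as follows. This is an illustration only, not HTCondor's code: the function name, the enum, and the job_exceeded_own_limit flag are hypothetical, and the sketch assumes that a job found to be out of memory itself is still put on hold when the parameter is false (the note above only states what happens when the system was out of memory).

```python
from enum import Enum


class OomAction(Enum):
    """Hypothetical outcomes for a job killed with an OOM signal."""
    HOLD = "hold"
    EVICT_TO_IDLE = "evict_to_idle"


def action_after_oom_kill(always_hold_on_oom: bool,
                          job_exceeded_own_limit: bool) -> OomAction:
    """Pick what happens to a job that was killed with an OOM signal.

    always_hold_on_oom mirrors STARTER_ALWAYS_HOLD_ON_OOM (default true).
    job_exceeded_own_limit is a stand-in for whatever check decides that
    the job, rather than the whole system, ran out of memory.
    """
    if always_hold_on_oom:
        # Default behavior: any OOM kill puts the job on hold.
        return OomAction.HOLD
    if job_exceeded_own_limit:
        # Assumption: the job itself was out of memory, so it is held.
        return OomAction.HOLD
    # The system was out of memory: evict and return the job to idle.
    return OomAction.EVICT_TO_IDLE
```

With the default setting, the second argument never affects the result; it only matters once the parameter is set to false.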
Bugs Fixed:
Fixed a bug that prevented condor_ssh_to_job from working with sftp and scp modes. (HTCONDOR-2687)
Fixed a bug where a daemon would repeatedly try to use its family security session when authenticating with another daemon that doesn’t know about the session. (HTCONDOR-2685)
Fixed a bug where a job would sometimes match but then fail to start on a machine with a START expression that referenced the KeyboardIdle attribute. (HTCONDOR-2689)
htcondor2.Submit.itemdata() now correctly accepts an optional qargs parameter (as in version 1). (HTCONDOR-2618)
Stop signaling the condor_credmon_oauth daemon on every job submission when there’s no work for it to do. This will hopefully reduce the frequency of some errors in the condor_credmon_oauth. (HTCONDOR-2653)
Fixed a bug that could cause the condor_schedd to crash if a job’s ClassAd contained a $$() macro that couldn’t be expanded. (HTCONDOR-2730)
Docker universe jobs now check the Architecture field in the image, and if it doesn’t match the architecture of the EP, the job is put on hold. The new parameter DOCKER_SKIP_IMAGE_ARCH_CHECK skips this. (HTCONDOR-2661)
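The Docker architecture check in HTCONDOR-2661 above amounts to comparing the image's Architecture field with the EP's architecture unless the check is skipped. The following is a minimal sketch of that comparison, not HTCondor's implementation: the function name, the name mapping table, and the wording of the hold reason are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# Hypothetical mapping from Docker image "Architecture" values to the
# architecture names an EP might advertise; the real mapping may differ.
DOCKER_TO_EP_ARCH = {"amd64": "X86_64", "arm64": "aarch64"}


def docker_arch_check(image_arch: str, ep_arch: str,
                      skip_check: bool = False) -> Tuple[bool, Optional[str]]:
    """Return (ok, hold_reason) for a Docker universe job.

    skip_check mirrors DOCKER_SKIP_IMAGE_ARCH_CHECK: when set, the
    image is accepted without comparing architectures.
    """
    if skip_check:
        return True, None
    # Translate the Docker name if we know it, otherwise compare as-is.
    normalized = DOCKER_TO_EP_ARCH.get(image_arch.lower(), image_arch)
    if normalized.lower() == ep_arch.lower():
        return True, None
    reason = (f"image architecture {image_arch!r} does not match "
              f"EP architecture {ep_arch!r}")
    return False, reason
```

A mismatch returns a reason string that a caller could use as the hold message; the exact message HTCondor sets is not given in these notes.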
Version 24.0.1
Release Notes:
HTCondor version 24.0.1 released on October 31, 2024.
LVM_USE_THIN_PROVISIONING now defaults to False. This affects Execution Points using STARTD_ENFORCE_DISK_LIMITS.
HTCondor tarballs now contain Pelican 7.10.11
New Features:
condor_gpu_discovery can now detect GPUs using AMD’s HIP 6 library. HIP detection will be used if the new -hip option is used or if no detection method is specified and no CUDA devices are detected. (HTCONDOR-2509)
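The rule for when HIP detection runs can be read as a small boolean condition. The sketch below restates it in Python; it is not taken from condor_gpu_discovery, and the function name and the representation of requested methods as a set of strings are hypothetical.

```python
from typing import Set


def use_hip_detection(requested_methods: Set[str],
                      cuda_device_count: int) -> bool:
    """Decide whether HIP detection should run.

    requested_methods holds the detection methods named on the command
    line (for example {"hip"} when -hip is given); the string values
    are illustrative. An empty set means no method was specified.
    """
    if "hip" in requested_methods:
        # The -hip option was given explicitly.
        return True
    # Fallback: nothing was requested and CUDA found no devices.
    return not requested_methods and cuda_device_count == 0
```

Under this reading, requesting some other method explicitly never triggers HIP detection, even when no CUDA devices are found.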
Bugs Fixed:
On Windows, the htcondor tool now uses the Python C API to try to launch the Python interpreter. This will fail with a message box about installing Python if Python 3.9 is not in the path. (HTCONDOR-2650)
htcondor2.Submit.from_dag() now recognizes DoRecov as a synonym for DoRecovery. This improves compatibility with version 1. (HTCONDOR-2613)
htcondor2.Submit.itemdata() now (correctly) returns an iterator over dictionaries if the htcondor2.Submit object specified variable names in its queue statement. (HTCONDOR-2613)
When you specify item data using a dict, HTCondor will now correctly reject values containing newlines. (HTCONDOR-2616)
When docker universe jobs failed with multi-line errors from docker run, the job used to fail with an “unable to inspect container” message. Now the proper hold message is set and the job goes on hold as expected. (HTCONDOR-2679)
htcondor annex now reports a proper error if you request an annex from a GPU-enabled queue but don’t specify how many GPUs per node you want (and the queue does not always allocate whole nodes). (HTCONDOR-2633)
Fixed a bug where HTCondor systems configured to use cgroups on Linux to measure memory would reuse the peak memory from the previous job in a slot, if any process in the former job was unkillable. This can happen if the job is stuck in NFS or running GPU code. Instead, HTCondor polls the current memory and keeps the peak itself internally. (HTCONDOR-2647)
Fixed a bug where the -divide flag to condor_gpu_discovery would be ignored on servers with only one type of GPU device. (HTCONDOR-2669)
Fixed a bug introduced in HTCSS v23.8.1 which prevented an EP from running multiple jobs on a single GPU device when -divide or -repeat was added to the configuration knob GPU_DISCOVERY_EXTRA. Also fixed problems with any non-fungible machine resource inventory that contained repeated identifiers. (HTCONDOR-2678)
Fixed a bug where condor_watch_q would display None for jobs with no JobBatchName instead of the expected ClusterId. (HTCONDOR-2625)
htcondor2.Schedd.submit() now correctly raises a TypeError when passed a description that is not a htcondor2.Submit object. (HTCONDOR-2631)
When submitting jobs to an SGE cluster via the grid universe, the blahp no longer saves the output of its wrapper script in the user’s home directory (where the files would accumulate and never be cleaned up). (HTCONDOR-2630)
Improved the error message when job submission as a disallowed user fails (i.e. submitting as the ‘condor’ or ‘root’ user). (HTCONDOR-2638)
Fixed a bug in htcondor server status that caused incorrect output if DAEMON_LIST contained commas. (HTCONDOR-2667)
Fixed the new default security configuration to work with older binaries. (HTCONDOR-2701)
An unresponsive libvirtd daemon no longer causes the condor_startd to block indefinitely. (HTCONDOR-2644)
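The fix for HTCONDOR-2647 above describes HTCondor polling the current memory and keeping the peak itself, rather than trusting a peak value that may be left over from a previous job in the slot. A minimal sketch of that idea follows. The class and method names are hypothetical and the samples are plain integers; how HTCondor actually reads cgroup memory is not shown here.

```python
class PeakMemoryTracker:
    """Keep a running peak over polled memory samples for one job."""

    def __init__(self) -> None:
        self.peak_bytes = 0

    def record(self, current_bytes: int) -> int:
        """Fold one polled sample into the peak and return the peak."""
        if current_bytes > self.peak_bytes:
            self.peak_bytes = current_bytes
        return self.peak_bytes

    def reset(self) -> None:
        """Start over for the next job so no old peak carries across."""
        self.peak_bytes = 0


def peak_of_samples(samples) -> int:
    """Convenience helper: peak over an iterable of polled samples."""
    tracker = PeakMemoryTracker()
    for sample in samples:
        tracker.record(sample)
    return tracker.peak_bytes
```

Because the tracker is reset per job, a lingering unkillable process from the previous job cannot contribute its old peak to the new job's figure. The trade-off of any polling approach is that a spike shorter than the polling interval can be missed.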