Version 24.0 LTS Releases
These are Long Term Support (LTS) versions of HTCondor. As usual, only bug fixes (and potentially, ports to new platforms) will be provided in future 24.0.y versions. New features will be added in the 24.x.y feature versions.
The details of each version are described below.
Version 24.0.7
Release Notes:
HTCondor version 24.0.7 released on April 22, 2025.
New Features:
The condor_startd now distributes the LoadAvg assigned to a partitionable slot to the idle resources of the partitionable slot, and then to the dynamic slots. Machines that have only a single partitionable slot will now have the same behavior under a use policy:DESKTOP as they did in versions 23.10.18 and 24.0.1. (HTCONDOR-2901)
HTCondor tarballs now contain Pelican 7.15.1
The condor package now requires pelican-7.15.1. The weak dependency is no longer used, because dnf would not update to the requested pelican version.
HTCondor tarballs now contain Apptainer 1.4.0
The condor RPM package now requires at least apptainer version 1.3.6.
Bugs Fixed:
When using delegated cgroup v2, HTCondor no longer reports that the main job (often a pilot) has an out of memory condition when only the sub-job has hit an OOM. (HTCONDOR-2944)
Fixed a bug that could cause the condor_starter to crash when running docker universe jobs with custom volume mounts. (HTCONDOR-2890)
Fixed a bug preventing spooled or remote jobs using preserve_relative_paths from working. (HTCONDOR-2877)
The condor_kbdd now also looks in the XDG_RUNTIME_DIRECTORY when trying to find an XAuthority file to use to connect to a local X server. (HTCONDOR-2921)
Fixed a bug that prevented daemons from updating their ads in the condor_collector when authentication is disabled but encryption or integrity is enabled. (HTCONDOR-2888)
Fixed a bug in condor_adstash that caused it to fail to discover condor_startd daemons using ENABLE_STARTD_DAEMON_AD (enabled by default since HTCondor 23.9). (HTCONDOR-2908)
Fixed a bug with transfer_output_remaps when given an erroneous trailing semicolon. (HTCONDOR-2910)
Fixed inflated memory usage reporting for docker universe jobs on hosts using cgroups V2. The reported memory no longer includes the cached memory. (HTCONDOR-2961)
Fixed a bug where specifying transfer_output_remaps from a path which didn’t exist to a file:// URL would cause HTCondor to report a useless (albeit correct) error. (HTCONDOR-2790)
Fixed a bug that could cause the condor_shadow daemon to crash when the transfer_input_files list was very long (thousands of characters). (HTCONDOR-2859)
Fixed a bug where two different condor_gridmanager processes could attempt to manage the same jobs when GRIDMANAGER_SELECTION_EXPR evaluated to UNDEFINED or an empty string for any job. (HTCONDOR-2895)
Fixed a rare bug in the condor_schedd that could cause a repeated restart loop when PER_JOB_HISTORY_DIR is set. (HTCONDOR-2902)
X.509 proxy delegation no longer fails when using OpenSSL 3.4.0 or later. (HTCONDOR-2904)
Fixed a bug that could cause the condor_gridmanager to crash when there were ARC CE jobs with no X509UserProxy. (HTCONDOR-2907)
Fixed a bug that usually prevented manifest from populating the in and out files. (HTCONDOR-2916)
Fixed a bug that could cause a job submission to fail if a previous job submission to the same condor_schedd failed. (HTCONDOR-2917)
Fixed a bug where daemons wouldn’t immediately apply new security policy to incoming commands after a reconfigure. (HTCONDOR-2929)
Fixed a bug where condor_history would crash when reading a history file larger than 2GB in the default mode (backwards). (HTCONDOR-2933)
Fixed a bug that caused the ce-audit plugin to fail. (HTCONDOR-2963)
Removed a scary-looking message in the log of the condor_collector about denying NEGOTIATOR-level authorization when the client wasn’t requesting that authorization level. (HTCONDOR-2964)
Fixed a bug that caused most updates of collector ads via UDP to be rejected. (HTCONDOR-2975)
Fixed a bug where the condor_shadow would wait for the job lease to expire (usually 40 minutes) before returning a job to idle status when the condor_starter failed to initialize. (HTCONDOR-2997)
The condor_startd now checks to see if the START expression of a static slot still evaluates to true before it allows a slot to be claimed. This helps to give an accurate reply to the condor_schedd when it tries to claim a slot with a START expression that changes frequently. (HTCONDOR-3013)
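As a rough illustration of the transfer_output_remaps fix above (HTCONDOR-2910): a remap list is a semicolon-separated sequence of name=target pairs, so a trailing semicolon leaves an empty final entry. The Python sketch below is not HTCondor's parser. The helper parse_remaps is hypothetical and shows one tolerant approach, assumed here for illustration, in which empty entries are skipped. It does not attempt to handle any escaping rules.

```python
def parse_remaps(remaps):
    """Parse 'name=target;name2=target2' into a dict, skipping empty entries.

    Hypothetical helper for illustration only; not part of HTCondor.
    """
    result = {}
    for entry in remaps.split(";"):
        entry = entry.strip()
        if not entry:
            # An empty entry is what a trailing (or doubled) semicolon produces.
            continue
        # Split on the first '=' only, so targets such as URLs may contain '='.
        name, sep, target = entry.partition("=")
        if not sep or not name.strip() or not target.strip():
            raise ValueError(f"malformed remap entry: {entry!r}")
        result[name.strip()] = target.strip()
    return result
```

With this approach, a value such as out.dat=results/out.dat; parses the same way as it would without the trailing semicolon, while an entry with no = sign is still rejected.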
Version 24.0.6
Release Notes:
HTCondor version 24.0.6 released on March 27, 2025.
New Features:
None.
Bugs Fixed:
Security Item: This release of HTCondor fixes a security-related bug described at
Version 24.0.5
Release Notes:
HTCondor version 24.0.5 released on March 4, 2025.
New Features:
HTCondor tarballs now contain Pelican 7.13.0
Bugs Fixed:
Fixed a bug where chirp would not work in container universe jobs using the docker runtime. (HTCONDOR-2866)
Fixed a bug where referencing htcondor2.JobEvent.cluster could crash if the processed log event was not associated with job(s) (i.e. had a negative value). (HTCONDOR-2881)
Fixed a bug that caused the condor_gridmanager to abort if a job that it was managing disappeared from the job queue (i.e. due to someone running condor_rm -force). (HTCONDOR-2845)
Fixed a bug that caused grid ads from different Access Points to overwrite each other in the collector. (HTCONDOR-2876)
Fixed a memory leak that can occur in any HTCondor daemon when an invalid ClassAd expression is encountered. (HTCONDOR-2847)
Fixed a bug that caused daemons to go into infinite recursion, eventually crashing when they ran out of stack memory. (HTCONDOR-2873)
Version 24.0.4
Release Notes:
HTCondor version 24.0.4 released on February 4, 2025.
New Features:
For arc grid universe jobs, the new submit command arc_data_staging can be used to supply additional elements to the DataStaging block of the ARC ADL that HTCondor constructs. (HTCONDOR-2774)
Bugs Fixed:
Fixed a bug in the negotiator that caused it to crash when matching offline ads. (HTCONDOR-2819)
Fixed a memory leak in the schedd that could be caused by SCHEDD_CRON scripts that generate standard error output. (HTCONDOR-2817)
Fixed a bug that caused the condor_schedd to crash with a segmentation fault if a condor_off -fast command was run while a schedd cron script was running. (HTCONDOR-2815)
Fixed an issue where EPs using STARTD_ENFORCE_DISK_LIMITS would fill up the EP’s filesystem due to excessive saving of metadata to /etc/lvm/archive. (HTCONDOR-2791)
Fixed a bug where container_service_names did not work. (HTCONDOR-2829)
Fixed a very rare bug that could cause the condor_startd to crash when the condor_collector times out queries and DNS is running very slowly. (HTCONDOR-2831)
Updated condor_upgrade_check to test for use of PASSWORD authentication and warn about the authenticated identity changing. (HTCONDOR-2823)
Version 24.0.3
Release Notes:
HTCondor version 24.0.3 released on January 6, 2025.
New Features:
Added a new configuration knob, CGROUP_POLLING_INTERVAL, which defaults to 5 (seconds) and controls how often a cgroup system polls for resource usage. (HTCONDOR-2802)
Bugs Fixed:
EPs spawned by htcondor annex no longer crash on start-up. (HTCONDOR-2745)
When resolving a hostname to a list of IP addresses, avoid using IPv6 link-local addresses. This change was done incorrectly in 23.9.6. (HTCONDOR-2746)
htcondor2.Submit.from_dag() and htcondor.Submit.from_dag() now correctly raise an HTCondor exception when the processing of DAGMan options and submit time DAG commands fails. (HTCONDOR-2736)
Fixed a confusing job hold message that would state a job requested 0.0 GB of disk via request_disk when exceeding disk usage on Execution Points using STARTD_ENFORCE_DISK_LIMITS. (HTCONDOR-2753)
You can now locate a collector daemon in the htcondor2 Python bindings. (HTCONDOR-2738)
Fixed a bug in the condor_qusers tool where the add argument would always enable rather than add a user. (HTCONDOR-2775)
Fixed a bug where cgroup systems reported the current instantaneous memory instead of the intended peak memory. (HTCONDOR-2800) (HTCONDOR-2804)
Fixed an inconsistency in cgroup v1 systems where the memory reported by condor included memory used by the kernel to cache disk pages. (HTCONDOR-2807)
Fixed a bug on cgroup v1 systems where jobs that were killed by the Out of Memory killer did not go on hold. (HTCONDOR-2806)
Fixed incompatibility of condor_adstash with v2.x of the OpenSearch Python Client. (HTCONDOR-2614)
The -subsystem argument of condor_status is once again case-insensitive for credd and defrag subsystem types. (HTCONDOR-2796)
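The hostname resolution fix above (HTCONDOR-2746) avoids IPv6 link-local addresses, that is, addresses in fe80::/10. The Python sketch below only illustrates that filtering idea using the standard ipaddress module. It is not HTCondor's implementation, and the function name drop_ipv6_link_local is a hypothetical one chosen for this example.

```python
import ipaddress


def drop_ipv6_link_local(addresses):
    """Return the addresses that are not IPv6 link-local (fe80::/10).

    Hypothetical helper for illustration only; not part of HTCondor.
    """
    kept = []
    for text in addresses:
        # Strip a zone/scope suffix such as '%eth0' before parsing.
        ip = ipaddress.ip_address(text.split("%", 1)[0])
        if ip.version == 6 and ip.is_link_local:
            continue
        kept.append(text)
    return kept
```

Given the list 192.0.2.10, fe80::1%eth0, 2001:db8::5, the sketch keeps the first and last entries. IPv4 addresses are passed through unchanged, since the fix described above concerns only IPv6 link-local addresses.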
Version 24.0.2
Release Notes:
HTCondor version 24.0.2 released on November 26, 2024.
New Features:
Added a new configuration parameter, STARTER_ALWAYS_HOLD_ON_OOM which defaults to true. When true, if a job is killed with an OOM signal, it is put on hold. When false, the system tries to determine if the job was out of memory, or the system was, and if the latter, evicts the job and sets it back to idle. (HTCONDOR-2686)
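The decision described above can be sketched as a small Python function. This is a simplified model rather than the condor_starter's code: the function name, its arguments, and the returned strings are all hypothetical, and how HTCondor actually decides whether the job or the system ran out of memory is not shown. The note above does not spell out what happens when the knob is false and the job itself was out of memory; the sketch assumes the job is still put on hold in that case.

```python
def action_after_oom_kill(always_hold_on_oom, job_was_out_of_memory):
    """Model the outcome for a job killed with an OOM signal.

    always_hold_on_oom    -- value of STARTER_ALWAYS_HOLD_ON_OOM
    job_was_out_of_memory -- True if the job (not the system) ran out of
                             memory; how this is determined is out of scope
    Hypothetical helper for illustration only; not part of HTCondor.
    """
    if always_hold_on_oom:
        # Default behavior: any OOM kill puts the job on hold.
        return "hold"
    if job_was_out_of_memory:
        # Assumption: the job is at fault, so it is held.
        return "hold"
    # The system was out of memory: evict the job and set it back to idle.
    return "evict_to_idle"
```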
Bugs Fixed:
Fixed a bug that prevented condor_ssh_to_job from working with sftp and scp modes. (HTCONDOR-2687)
Fixed a bug where a daemon would repeatedly try to use its family security session when authenticating with another daemon that doesn’t know about the session. (HTCONDOR-2685)
Fixed a bug where a job would sometimes match but then fail to start on a machine with a START expression that referenced the KeyboardIdle attribute. (HTCONDOR-2689)
htcondor2.Submit.itemdata() now correctly accepts an optional qargs parameter (as in version 1). (HTCONDOR-2618)
Stop signaling the condor_credmon_oauth daemon on every job submission when there’s no work for it to do. This will hopefully reduce the frequency of some errors in the condor_credmon_oauth. (HTCONDOR-2653)
Fixed a bug that could cause the condor_schedd to crash if a job’s ClassAd contained a $$() macro that couldn’t be expanded. (HTCONDOR-2730)
Docker universe jobs now check the Architecture field in the image, and if it doesn’t match the architecture of the EP, the job is put on hold. The new parameter DOCKER_SKIP_IMAGE_ARCH_CHECK skips this. (HTCONDOR-2661)
Version 24.0.1
Release Notes:
HTCondor version 24.0.1 released on October 31, 2024.
LVM_USE_THIN_PROVISIONING now defaults to False. This affects Execution Points using STARTD_ENFORCE_DISK_LIMITS.
HTCondor tarballs now contain Pelican 7.10.11
New Features:
condor_gpu_discovery can now detect GPUs using AMD’s HIP 6 library. HIP detection will be used if the new -hip option is used or if no detection method is specified and no CUDA devices are detected. (HTCONDOR-2509)
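The rule for when HIP detection is used can be written as a short predicate. The Python sketch below simply restates the sentence above as code. It is not taken from condor_gpu_discovery, and the function and argument names are hypothetical.

```python
def use_hip_detection(hip_option_given, other_method_specified, cuda_devices_found):
    """Return True when HIP detection would be used under the stated rule.

    hip_option_given       -- the -hip option was passed
    other_method_specified -- some detection method was explicitly requested
    cuda_devices_found     -- number of CUDA devices detected
    Hypothetical helper for illustration only; not part of HTCondor.
    """
    if hip_option_given:
        return True
    # Fallback: no method was requested and CUDA detection found nothing.
    return (not other_method_specified) and cuda_devices_found == 0
```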
Bugs Fixed:
On Windows the htcondor tool now uses the Python C API to try and launch the python interpreter. This will fail with a message box about installing python if python 3.9 is not in the path. (HTCONDOR-2650)
htcondor2.Submit.from_dag() now recognizes DoRecov as a synonym for DoRecovery. This improves compatibility with version 1. (HTCONDOR-2613)
htcondor2.Submit.itemdata() now (correctly) returns an iterator over dictionaries if the htcondor2.Submit object specified variable names in its queue statement. (HTCONDOR-2613)
When you specify item data using a dict, HTCondor will now correctly reject values containing newlines. (HTCONDOR-2616)
When docker universe jobs failed with multi-line errors from docker run, the job used to fail with an “unable to inspect container” message. Now the proper hold message is set and the job goes on hold as expected. (HTCONDOR-2679)
htcondor annex now reports a proper error if you request an annex from a GPU-enabled queue but don’t specify how many GPUs per node you want (and the queue does not always allocate whole nodes). (HTCONDOR-2633)
Fixed a bug where HTCondor systems configured to use cgroups on Linux to measure memory would reuse the peak memory from the previous job in a slot, if any process in the former job was unkillable. This can happen if the job is stuck in NFS or running GPU code. Instead, HTCondor polls the current memory and keeps the peak itself internally. (HTCONDOR-2647)
Fixed a bug where the -divide flag to condor_gpu_discovery would be ignored on servers with only one type of GPU device. (HTCONDOR-2669)
Fixed a bug introduced in HTCSS v23.8.1 which prevented an EP from running multiple jobs on a single GPU device when -divide or -repeat was added to the configuration knob GPU_DISCOVERY_EXTRA. Also fixed problems with any non-fungible machine resource inventory that contained repeated identifiers. (HTCONDOR-2678)
Fixed a bug where condor_watch_q would display None for jobs with no JobBatchName instead of the expected ClusterId. (HTCONDOR-2625)
htcondor2.Schedd.submit() now correctly raises a TypeError when passed a description that is not a htcondor2.Submit object. (HTCONDOR-2631)
When submitting jobs to an SGE cluster via the grid universe, the blahp no longer saves the output of its wrapper script in the user’s home directory (where the files would accumulate and never be cleaned up). (HTCONDOR-2630)
Improved the error message when job submission as a disallowed user fails (i.e. submitting as the ‘condor’ or ‘root’ user). (HTCONDOR-2638)
Fixed a bug in htcondor server status that caused incorrect output if DAEMON_LIST contained commas. (HTCONDOR-2667)
Fixed the new default security configuration to work with older binaries. (HTCONDOR-2701)
An unresponsive libvirtd daemon no longer causes the condor_startd to block indefinitely. (HTCONDOR-2644)
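The peak memory fix above (HTCONDOR-2647) describes polling the current memory and keeping the peak internally, rather than relying on a peak value that may carry over from a previous job in the slot. The Python sketch below shows the general idea of such a running maximum. The class PeakMemoryTracker is hypothetical and is not HTCondor code; the sample values stand in for whatever the cgroup reports at each poll.

```python
class PeakMemoryTracker:
    """Keep a running peak over polled memory samples for one job.

    Hypothetical helper for illustration only; not part of HTCondor.
    """

    def __init__(self):
        # A new tracker starts at zero, so nothing carries over between jobs.
        self.peak_bytes = 0

    def record(self, current_bytes):
        """Record one polled sample and return the peak seen so far."""
        if current_bytes > self.peak_bytes:
            self.peak_bytes = current_bytes
        return self.peak_bytes


# Usage: one tracker per job, fed with each polled sample.
tracker = PeakMemoryTracker()
for sample in (100, 300, 200):
    tracker.record(sample)
```

Creating a fresh tracker for each job is what keeps the previous job's peak from being reused in this model.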