Version 24 Feature Releases
We release new features in these releases of HTCondor. The details of each version are described below.
Version 24.7.3
Release Notes:
HTCondor version 24.7.3 released on April 22, 2025.
New Features:
Improved the ability of condor_who to query condor_startd processes when condor_who is running as root or as the same user as the Startd, and added formatting options for use when the condor_startd is running as a job on another batch system. (HTCONDOR-2927)
htcondor credential add oauth2 can now be used to store tokens that can be used by jobs via use_oauth_services. The user is responsible for updating tokens that can expire. (HTCONDOR-2803)
Added OSHomeDir to starter’s copy of the job ad. (HTCONDOR-2972)
Added SYSTEM_MAX_RELEASES, which implements an upper bound on the number of times any job can be released by a user or periodic expression. (HTCONDOR-2926)
Added the ability for an EP administrator to disable access to the network by a job, by setting NO_JOB_NETWORKING to true. (HTCONDOR-2967)
Added the ability for a docker universe job to fetch an authenticated image from the docker repository. (HTCONDOR-2870)
Improved condor_watch_q to display information about the number of jobs actively transferring input or output files. (HTCONDOR-2958)
The default value for DISABLE_SWAP_FOR_JOB has been changed to True. This provides a more predictable and uniform user experience for jobs running on different EPs. (HTCONDOR-2960)
Added the htcondor annex login verb, which opens a shared SSH connection to the named HPC system. If you’ve recently created or added an annex at a particular system, it will re-use that cached connection; otherwise, you’ll have to log in again, but that connection will then be re-usable by other htcondor annex commands. (HTCONDOR-2809)
Updated htcondor annex to work with Expanse’s new requirements for its gpu and gpu-shared queues. (HTCONDOR-2634)
Enhanced htcondor job status to also show the time to transfer the job input sandbox. (HTCONDOR-2959)
Jobs that use concurrency_limits can now re-use claims in the schedd. (HTCONDOR-2937)
Added shell for Linux systems. (HTCONDOR-2918)
START_VANILLA_UNIVERSE expressions may now refer to attributes in the schedd ad using the prefix SCHEDD. (HTCONDOR-2919)
Hold messages generated by failure to transfer output now include how many files failed to transfer. (HTCONDOR-2903)
Added -transfer-history flag to condor_history to query historical Input, Output, and Checkpoint transfer ClassAds stored in the JOB_EPOCH_HISTORY files. (HTCONDOR-2878)
Improved the parsing and handling of syntax errors in the transfer_output_remaps submit command. (HTCONDOR-2920)
DAGMan SERVICE nodes will no longer be removed automatically when a DAG contains a FINAL node and condor_rm is used on the DAGMan scheduler job. (HTCONDOR-2938)
The list of files generated by the manifest submit command now recursively includes subdirectories. (HTCONDOR-2903)
Added new option -extract to condor_history to copy historical ClassAd entries that match a provided constraint to a specified file. (HTCONDOR-2923)
EPs using disk enforcement via LVM and LVM_HIDE_MOUNT = True will now advertise HasVM = False due to VM universe jobs being incompatible with mount namespaces. (HTCONDOR-2945)
Added support for running Docker universe on ARM hosts. (HTCONDOR-2906)
The CLAIMTOBE authentication protocol now fully qualifies user names with the system’s $(UID_DOMAIN). To revert to the former semantics, set SEC_CLAIMTOBE_INCLUDE_DOMAIN to false. (HTCONDOR-2915)
The condor_startd now distributes the LoadAvg assigned to a partitionable slot to the idle resources of the partitionable slot, and then to the dynamic slots. Machines that have only a single partitionable slot will now have the same behavior under a use policy:DESKTOP as they did in version 23.10.18 and 24.0.1. (HTCONDOR-2901)
HTCondor tarballs now contain Pelican 7.15.1
The condor package now requires pelican-7.15.1. The weak dependency is no longer used, because dnf would not update to the requested pelican version.
HTCondor tarballs now contain Apptainer 1.4.0
The condor RPM package now requires at least apptainer version 1.3.6.
Bugs Fixed:
Fixed a bug in the local issuer credential monitor that prevented the issuance of tokens using the WLCG profile. (HTCONDOR-2954)
Fixed bug where DAGMan would output an error message containing garbage when dumping failed node information to the debug log. (HTCONDOR-2899)
Fixed a bug where EPs using STARTD_ENFORCE_DISK_LIMITS would mark a slot as broken when the condor_starter fails to remove the ephemeral logical volume but the condor_startd successfully removes the LV. (HTCONDOR-2953)
Fixed a bug in the Vault credential monitor that kept credentials from being fetched if VAULT_CREDMON_PROVIDER_NAMES was unset. Introduced in HTCondor 24.3.0. (HTCONDOR-2912)
Fixed a bug in the local issuer credential monitor that kept credentials from being issued if LOCAL_CREDMON_TOKEN_VERSION (or named variant) was not set. (HTCONDOR-2965)
When using delegated cgroup v2, HTCondor no longer reports that the main job (often a pilot) has an out-of-memory condition when only the sub-job has hit an OOM. (HTCONDOR-2944)
Fixed a bug that could cause the condor_starter to crash when running docker universe jobs with custom volume mounts. (HTCONDOR-2890)
Fixed a bug preventing spooled or remote jobs using preserve_relative_paths from working. (HTCONDOR-2877)
The condor_kbdd now also looks in the XDG_RUNTIME_DIRECTORY when trying to find an XAuthority file to use to connect to a local X server. (HTCONDOR-2921)
Fixed a bug that prevented daemons from updating their ads in the condor_collector when authentication is disabled but encryption or integrity is enabled. (HTCONDOR-2888)
Fixed a bug in condor_adstash that caused it to fail to discover condor_startd daemons using ENABLE_STARTD_DAEMON_AD (enabled by default since HTCondor 23.9). (HTCONDOR-2908)
Fixed a bug with transfer_output_remaps when given an erroneous trailing semicolon. (HTCONDOR-2910)
Fixed inflated memory usage reporting for docker universe jobs on hosts using cgroups V2. The reported memory no longer includes the cached memory. (HTCONDOR-2961)
Fixed a bug where specifying transfer_output_remaps from a path which didn’t exist to a file:// URL would cause HTCondor to report a useless (albeit correct) error. (HTCONDOR-2790)
Fixed a bug that could cause the condor_shadow daemon to crash when the transfer_input_files list was very long (thousands of characters). (HTCONDOR-2859)
Fixed a bug where two different condor_gridmanager processes could attempt to manage the same jobs when GRIDMANAGER_SELECTION_EXPR evaluated to UNDEFINED or an empty string for any job. (HTCONDOR-2895)
Fixed a rare bug in the condor_schedd that could cause a repeated restart loop when PER_JOB_HISTORY_DIR is set. (HTCONDOR-2902)
X.509 proxy delegation no longer fails when using OpenSSL 3.4.0 or later. (HTCONDOR-2904)
Fixed a bug that could cause the condor_gridmanager to crash when there were ARC CE jobs with no X509UserProxy. (HTCONDOR-2907)
Fixed a bug that usually prevented manifest from populating the in and out files. (HTCONDOR-2916)
Fixed a bug that could cause a job submission to fail if a previous job submission to the same condor_schedd failed. (HTCONDOR-2917)
Fixed a bug where daemons wouldn’t immediately apply new security policy to incoming commands after a reconfigure. (HTCONDOR-2929)
Fixed a bug where condor_history would crash when reading a history file larger than 2GB in the default mode (backwards). (HTCONDOR-2933)
Fixed a bug that caused the ce-audit plugin to fail. (HTCONDOR-2963)
Removed a scary-looking message in the log of the condor_collector about denying NEGOTIATOR-level authorization when the client wasn’t requesting that authorization level. (HTCONDOR-2964)
Fixed a bug that caused most updates of collector ads via UDP to be rejected. (HTCONDOR-2975)
Fixed a bug where the condor_shadow would wait for the job lease to expire (usually 40 minutes) before returning a job to idle status when the condor_starter failed to initialize. (HTCONDOR-2997)
The condor_startd now checks to see if the START expression of a static slot still evaluates to true before it allows a slot to be claimed. This helps to give an accurate reply to the condor_schedd when it tries to claim a slot with a START expression that changes frequently. (HTCONDOR-3013)
Version 24.6.1
Release Notes:
HTCondor version 24.6.1 released on March 27, 2025.
New Features:
None.
Bugs Fixed:
Security Item: This release of HTCondor fixes a security-related bug described at
Version 24.5.2
Release Notes:
HTCondor version 24.5.2 released on March 20, 2025.
New Features:
None.
Bugs Fixed:
The default value for STARTD_LEFTOVER_PROCS_BREAK_SLOTS has been changed to ‘False’. When ‘True’, the EP was erroneously marking slots as broken. (HTCONDOR-2946)
Version 24.5.1
Release Notes:
HTCondor version 24.5.1 released on March 4, 2025.
New Features:
The condor_starter now advertises StdoutMtime and StderrMtime, which represent the most recent modification time, in seconds since the epoch, of the standard output and standard error of a job which uses file transfer. (HTCONDOR-2837)
The condor_startd, when running on a machine with Nvidia GPUs, now advertises the Nvidia driver version. (HTCONDOR-2856)
Increased the default width of condor_qusers output when redirected to a file or piped to another command to prevent truncation. (HTCONDOR-2861)
The condor_startd will no longer lose track of, and thereby leak, logical volumes that failed to be cleaned up when using STARTD_ENFORCE_DISK_LIMITS. The condor_startd will now periodically retry removal of logical volumes with an exponential back-off. (HTCONDOR-2852)
The condor_startd will now keep dynamic slots that have a SlotBrokenReason attribute in Unclaimed state rather than deleting them when they change state to Unclaimed. A new configuration variable CONTINUE_TO_ADVERTISE_BROKEN_DYNAMIC_SLOTS controls this behavior. It defaults to true but can be set to false to preserve the old behavior. This change also adds a new attribute BrokenContextAds to the daemon ad of the condor_startd. This attribute has a ClassAd for each broken resource in the startd. condor_status has been enhanced to use this new attribute to display more information about the context of broken resources when both -startd and -broken arguments are used. (HTCONDOR-2844)
The condor_startd will now permanently reduce the total slot resources advertised by a partitionable slot when a dynamic slot is deleted while it is marked as broken. The amount of reduction will be advertised in new attributes such as BrokenSlotCpus so that the original size of the slot can be computed. (HTCONDOR-2865)
Daemons will now more quickly discover when a non-responsive condor_collector has recovered and resume advertising to it. (HTCONDOR-2605)
Jobs can now request user credentials generated by any combination of the OAuth2, Local Issuer, and Vault credential monitors on the AP. Remote submitters can request these credentials without having any of the CREDMON-related parameters in their configuration files. (HTCONDOR-2851)
HTCondor tarballs now contain Pelican 7.13.0
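The retry-with-exponential-back-off behavior described above for logical volume removal (HTCONDOR-2852) can be illustrated with a small, generic sketch. This is not the condor_startd implementation: the function names, the initial delay, the growth factor, and the cap are all assumptions chosen for illustration, and the release notes do not state the actual intervals.

```python
def backoff_delays(initial=1.0, factor=2.0, cap=600.0):
    """Yield retry delays (seconds) that grow geometrically up to a cap.

    All three parameters are illustrative assumptions, not HTCondor defaults.
    """
    delay = min(initial, cap)
    while True:
        yield delay
        delay = min(delay * factor, cap)


def retry_cleanup(remove, delays, max_attempts, sleep):
    """Call remove() until it returns True or max_attempts is reached.

    remove: hypothetical callable that attempts the cleanup and returns
            True on success.
    sleep:  callable taking a delay in seconds (time.sleep in real use).
    Returns the attempt number that succeeded, or None if all failed.
    """
    for attempt in range(1, max_attempts + 1):
        if remove():
            return attempt
        if attempt < max_attempts:
            # Wait longer after each consecutive failure.
            sleep(next(delays))
    return None
```

In real use, `sleep` would be `time.sleep` and `remove` would wrap whatever cleanup operation is being retried; passing them in as arguments keeps the sketch easy to exercise without waiting.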
Bugs Fixed:
Fixed a bug where the condor_gridmanager would write to log file GridmanagerLog.root after a reconfiguration. (HTCONDOR-2846)
htcondor annex shutdown now works again. (HTCONDOR-2808)
Fixed a bug where the job state table DAGMan prints to its debug file could contain a negative number for the count of failed jobs. (HTCONDOR-2872)
Fixed a bug where chirp would not work in container universe jobs using the docker runtime. (HTCONDOR-2866)
Fixed a bug where referencing htcondor2.JobEvent.cluster could crash if the processed log event was not associated with any job (i.e. had a negative value). (HTCONDOR-2881)
Fixed a bug that caused the condor_gridmanager to abort if a job that it was managing disappeared from the job queue (e.g. due to someone running condor_rm -force). (HTCONDOR-2845)
Fixed a bug that caused grid ads from different Access Points to overwrite each other in the collector. (HTCONDOR-2876)
Fixed a memory leak that can occur in any HTCondor daemon when an invalid ClassAd expression is encountered. (HTCONDOR-2847)
Fixed a bug that caused daemons to go into infinite recursion, eventually crashing when they ran out of stack memory. (HTCONDOR-2873)
Version 24.4.0
Release Notes:
HTCondor version 24.4.0 released on February 4, 2025.
New Features:
Improved validation and cleanup of EXECUTE directories. The EXECUTE directory must now be owned by the condor user when the daemons are started as root. The condor_startd will not attempt to clean an invalid EXECUTE directory nor will it alter the file permissions of an EXECUTE directory. (HTCONDOR-2789)
For batch grid universe jobs, the PATH environment variable values from the job ad and the worker node environment are now combined. Previously, only the PATH value from the job ad was used. The old behavior can be restored by setting blah_merge_paths=no in the blah.config file. (HTCONDOR-2793)
Many small improvements to condor_q -analyze and -better-analyze for pools that use partitionable slots. As a part of this, the condor_schedd was changed to provide match information for the auto-cluster of the job being analyzed, which condor_q will report if it is available. (HTCONDOR-2720)
The condor_startd now advertises a new attribute, SingularityUserNamespaces, which is true when apptainer or singularity works and is using Linux user namespaces, and false when it is using setuid mode. (HTCONDOR-2818)
The condor_startd daemon ad now contains attributes showing the average and total bytes transferred to and from jobs during its lifetime. (HTCONDOR-2721)
The condor_credd daemon no longer listens on port 9620 by default, but rather uses the condor_shared_port daemon. (HTCONDOR-2763)
DAGMan will now periodically print a table of the states of jobs placed to the Access Point to the debug log (*.dagman.out). The rate at which this table is printed is dictated by DAGMAN_PRINT_JOB_TABLE_INTERVAL. (HTCONDOR-2794)
For arc grid universe jobs, the new submit command arc_data_staging can be used to supply additional elements to the DataStaging block of the ARC ADL that HTCondor constructs. (HTCONDOR-2774)
Bugs Fixed:
Changed the numeric output of htcondor job status so that the rounding to megabytes, gigabytes, etc. matches the binary definitions the rest of the tools use. (HTCONDOR-2788)
Fixed a bug in the negotiator that caused it to crash when matching offline ads. (HTCONDOR-2819)
Fixed a memory leak in the schedd that could be caused by SCHEDD_CRON scripts that generate standard error output. (HTCONDOR-2817)
Fixed a bug that caused the condor_schedd to crash with a segmentation fault if a condor_off -fast command was run while a schedd cron script was running. (HTCONDOR-2815)
Fixed an issue where EPs using STARTD_ENFORCE_DISK_LIMITS would fill up the EP’s filesystem due to excessive saving of metadata to /etc/lvm/archive. (HTCONDOR-2791)
Fixed a bug where container_service_names did not work. (HTCONDOR-2829)
Fixed very rare bug that could cause the condor_startd to crash when the condor_collector times out queries and DNS is running very slowly. (HTCONDOR-2831)
Updated condor_upgrade_check to test for use of PASSWORD authentication and warn about the authenticated identity changing. (HTCONDOR-2823)
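The rounding change to htcondor job status (HTCONDOR-2788) refers to binary unit definitions. As a rough illustration of what binary rounding means, the sketch below scales a byte count by powers of 1024 rather than powers of 1000. The function name, the unit labels, and the one-decimal formatting are assumptions made for this example; it does not reproduce the tool's actual output format.

```python
def format_binary_size(num_bytes):
    """Format a byte count using binary (power-of-1024) unit steps.

    Hypothetical helper for illustration only; unit labels and precision
    are assumptions, not the htcondor CLI's real formatting.
    """
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value fits below 1024,
        # or at the largest unit available.
        if value < 1024.0 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024.0
```

With this definition, 1,000,000,000 bytes falls short of one binary gigabyte (1024 to the third power bytes), which is the kind of discrepancy that arises when one tool rounds with decimal units and another with binary ones.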
Version 24.3.0
Release Notes:
HTCondor version 24.3.0 released on January 6, 2025.
New Features:
Updated the condor_credmon_oauth and created a new condor-credmon-multi RPM package which, when installed, allows user credentials added via Vault and user credentials generated via a local issuer to exist simultaneously without conflict (e.g. the Vault credmon will not attempt to refresh locally issued credentials). (HTCONDOR-2408)
Added singularity launcher wrapper script that runs inside the container and launches the job proper. If this fails to run, HTCondor detects there is a problem with the container runtime, not the job, and reruns the job elsewhere. Controlled by parameter SINGULARITY_USE_LAUNCHER. (HTCONDOR-1446)
EPs using STARTD_ENFORCE_DISK_LIMITS will now advertise IsEnforcingDiskUsage in the machine ad. (HTCONDOR-2734)
Added new AUTO option to LVM_HIDE_MOUNT that creates a mount namespace for ephemeral logical volumes if the job is compatible with mount hiding (i.e. not Docker jobs). The AUTO value is now the default value. (HTCONDOR-2717)
Added new submit command for container universe, mount_under_scratch, that allows users to create writable ephemeral directories in their otherwise read-only container images. (HTCONDOR-2728)
Environment variables from the job that start with PELICAN_ will now be set in the environment of the pelican file transfer plugin when it is invoked to do file transfer. This is intended to allow jobs to turn on enhanced logging in the plugin. (HTCONDOR-2674)
When the condor_startd interrupts a job’s execution, the specific reason is now reflected in the job attributes VacateReason and VacateReasonCode. (HTCONDOR-2713)
Improved performance of condor_history by using the in-memory sort order of job attributes used by the condor_schedd. (HTCONDOR-2729)
If the startd detects that an exited or evicted job has leftover, unkillable processes, it now marks that slot as “broken”, and will not reassign the resources for that slot to any other jobs. Disabled if STARTD_LEFTOVER_PROCS_BREAK_SLOTS is set to false. (HTCONDOR-2756)
Methods in htcondor2.Schedd which take job_spec arguments now accept a cluster ID in the form of an int. These functions (htcondor2.Schedd.act(), htcondor2.Schedd.edit(), htcondor2.Schedd.export_jobs(), htcondor2.Schedd.retrieve(), and htcondor2.Schedd.unexport_jobs()) now also raise TypeError if their job_spec argument is not a str, list of str, classad2.ExprTree, or int. (HTCONDOR-2745)
Added new knob CGROUP_POLLING_INTERVAL, which defaults to 5 (seconds), to control how often a cgroup system polls for resource usage. (HTCONDOR-2802)
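The job_spec type rule described above for the htcondor2.Schedd methods (HTCONDOR-2745) can be sketched in plain Python. This is only an illustration of the accepted-types rule as stated in the notes, not the bindings' code: check_job_spec is a hypothetical helper, and ExprTreeStandIn is a placeholder class standing in for classad2.ExprTree so that the example runs without HTCondor installed.

```python
class ExprTreeStandIn:
    """Hypothetical placeholder for classad2.ExprTree (assumption)."""


def check_job_spec(job_spec, expr_type=ExprTreeStandIn):
    """Return a label for an accepted job_spec type, else raise TypeError.

    Mirrors the rule stated in the release note: a str, a list of str,
    an expression tree, or an int (cluster ID) is accepted.
    """
    if isinstance(job_spec, int):
        return "int"
    if isinstance(job_spec, str):
        return "str"
    if isinstance(job_spec, list) and all(isinstance(s, str) for s in job_spec):
        return "list[str]"
    if isinstance(job_spec, expr_type):
        return "ExprTree"
    raise TypeError(
        "job_spec must be a str, list of str, ExprTree, or int, "
        f"not {type(job_spec).__name__}"
    )
```

How the real bindings interpret each accepted form is not covered here; the sketch only shows which argument types pass and which raise.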
Bugs Fixed:
Fixed a bug introduced in 24.2.0 where the daemons failed to start if configured to use only a network interface that didn’t have an IPv6 address. Also, the daemons will no longer bind and advertise an address that doesn’t match the value of NETWORK_INTERFACE. (HTCONDOR-2799)
The htcondor job submit command now issues credentials like condor_submit. (HTCONDOR-2745)
EPs spawned by htcondor annex no longer crash on start-up. (HTCONDOR-2745)
When resolving a hostname to a list of IP addresses, avoid using IPv6 link-local addresses. This change was done incorrectly in 23.9.6. (HTCONDOR-2746)
htcondor2.Submit.from_dag() and htcondor.Submit.from_dag() now correctly raise an HTCondor exception when the processing of DAGMan options and submit time DAG commands fails. (HTCONDOR-2736)
Fixed confusing job hold message that would state a job requested 0.0 GB of disk via request_disk when exceeding disk usage on Execution Points using STARTD_ENFORCE_DISK_LIMITS. (HTCONDOR-2753)
You can now locate a collector daemon in the htcondor2 Python bindings. (HTCONDOR-2738)
Fixed a bug in the condor_qusers tool where the add argument would always enable rather than add a user. (HTCONDOR-2775)
Fixed a bug where cgroup systems reported current instantaneous memory instead of peak memory, as intended. (HTCONDOR-2800) (HTCONDOR-2804)
Fixed an inconsistency in cgroup v1 systems where the memory reported by condor included memory used by the kernel to cache disk pages. (HTCONDOR-2807)
Fixed a bug on cgroup v1 systems where jobs that were killed by the Out of Memory killer did not go on hold. (HTCONDOR-2806)
Fixed incompatibility of condor_adstash with v2.x of the OpenSearch Python Client. (HTCONDOR-2614)
The -subsystem argument of condor_status is once again case-insensitive for credd and defrag subsystem types. (HTCONDOR-2796)
Version 24.2.2
Release Notes:
HTCondor version 24.2.2 released on December 4, 2024.
New Features:
None.
Bugs Fixed:
Fixed a bug where, if the knob EXECUTE was explicitly set to a blank string in the configuration file for whatever reason, the execution point (startd) could attempt to remove all files from the root partition (everything in /) upon startup. (HTCONDOR-2760)
Version 24.2.1
Release Notes:
HTCondor version 24.2.1 released on November 26, 2024.
This version includes all the updates from Version 24.0.2.
The DAGMan metrics file has renamed metrics that referred to jobs so that they use the modern term, nodes. To revert to the old terminology, set DAGMAN_METRICS_FILE_VERSION = 1. (HTCONDOR-2682)
New Features:
DAGMan will now correctly submit late materialization jobs to an Access Point when DAGMAN_USE_DIRECT_SUBMIT = True. (HTCONDOR-2673)
Added new submit command primary_unix_group, which takes a string which must be one of the user’s supplemental groups, and sets the primary group to that value. (HTCONDOR-2702)
Improved DAGMan metrics file to use updated terminology and contain more metrics. (HTCONDOR-2682)
A condor_startd which has ENABLE_STARTD_DAEMON_AD enabled will no longer abort when it cannot create the required number of slots of the correct size on startup. It will now continue to run, reporting the failure to the collector in the daemon ad. Slots that can be fully provisioned will work normally. Slots that cannot be fully provisioned will exist but advertise themselves as broken. This is now the default behavior because daemon ads are enabled by default. The condor_status tool has a new option -broken which displays broken slots and their reason for being broken. Use this option with the -startd option to display machines that are fully or partly broken. (HTCONDOR-2500)
A new job attribute FirstJobMatchDate will be set for all jobs of a single submission to the current time when the first job of that submission is matched to a slot. (HTCONDOR-2676)
Added new job ad attribute InitialWaitDuration, recording the number of seconds from when a job was queued to when the first launch happened. (HTCONDOR-2666)
condor_ssh_to_job when entering an Apptainer container now sets the supplemental unix group ids in the same way that vanilla jobs have them set. (HTCONDOR-2695)
IPv6 networking is now fully supported on Windows. (HTCONDOR-2601)
Daemons will no longer block trying to invalidate their ads in a dead collector when shutting down. (HTCONDOR-2709)
Added option FAST to configuration parameter MASTER_NEW_BINARY_RESTART. This will cause the condor_master to do a fast restart of all the daemons when it detects new binaries. (HTCONDOR-2708)
Bugs Fixed:
None.
Version 24.1.1
Release Notes:
HTCondor version 24.1.1 released on October 31, 2024.
This version includes all the updates from Version 24.0.1.
New Features:
Added get to the htcondor credential noun, which prints the contents of a stored OAuth2 credential. (HTCONDOR-2626)
Added htcondor2.set_ready_state() for those brave few writing daemons in the Python bindings. (HTCONDOR-2615)
When blah_debug_save_submit_info is set in blah.config, the stdout and stderr of the blahp’s wrapper script are saved under the given directory. (HTCONDOR-2636)
The DAG command SUBMIT-DESCRIPTION and node inline submit descriptions now work when DAGMAN_USE_DIRECT_SUBMIT = False. (HTCONDOR-2607)
Docker universe jobs now check the Architecture field in the image, and if it doesn’t match the architecture of the EP, the job is put on hold. The new parameter DOCKER_SKIP_IMAGE_ARCH_CHECK skips this. (HTCONDOR-2661)
Added a configuration template, use feature:DefaultCheckpointDestination. (HTCONDOR-2403)
Bugs Fixed:
If HTCondor detects that an invalid checkpoint has been downloaded for a self-checkpointing job using third-party storage, that checkpoint is now marked for deletion and the job is rescheduled. (HTCONDOR-1258)