Version 24 Feature Releases

We release new features in these releases of HTCondor. The details of each version are described below.

Version 24.5.1

Release Notes:

  • HTCondor version 24.5.1 released on March 4, 2025.

New Features:

  • The condor_starter now advertises StdoutMtime and StderrMtime, which represent the most recent modification times, in seconds since the epoch, of the standard output and standard error files of a job that uses file transfer. (HTCONDOR-2837)

  • The condor_startd, when running on a machine with NVIDIA GPUs, now advertises the NVIDIA driver version. (HTCONDOR-2856)

  • Increased the default width of condor_qusers output when redirected to a file or piped to another command to prevent truncation. (HTCONDOR-2861)

  • The condor_startd will no longer lose track of, and thereby leak, logical volumes that it failed to clean up when using STARTD_ENFORCE_DISK_LIMITS. It now periodically retries removal of those logical volumes with an exponential backoff. (HTCONDOR-2852)

  • The condor_startd will now keep dynamic slots that have a SlotBrokenReason attribute in the Unclaimed state rather than deleting them when they change state to Unclaimed. A new configuration variable, CONTINUE_TO_ADVERTISE_BROKEN_DYNAMIC_SLOTS, controls this behavior. It defaults to true but can be set to false to preserve the old behavior. This change also adds a new attribute, BrokenContextAds, to the daemon ad of the condor_startd. This attribute holds a ClassAd for each broken resource in the startd. condor_status has been enhanced to use this new attribute to display more information about the context of broken resources when both the -startd and -broken arguments are used. (HTCONDOR-2844)

  • The condor_startd will now permanently reduce the total slot resources advertised by a partitionable slot when a dynamic slot is deleted while it is marked as broken. The amount of reduction will be advertised in new attributes such as BrokenSlotCpus so that the original size of the slot can be computed. (HTCONDOR-2865)

  • Daemons will now more quickly discover when a non-responsive condor_collector has recovered and resume advertising to it. (HTCONDOR-2605)

  • Jobs can now request user credentials generated by any combination of the OAuth2, Local Issuer, and Vault credential monitors on the AP. Remote submitters can request these credentials without having any of the CREDMON-related parameters in their configuration files. (HTCONDOR-2851)

  • HTCondor tarballs now contain Pelican 7.13.0.
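
As a rough illustration of the BrokenSlotCpus attribute mentioned in the list above, the original size of a partitionable slot could be recovered by adding the advertised reduction back to the current total. This is a minimal sketch, not HTCondor code. It assumes the slot ad is available as a plain dictionary, that TotalSlotCpus holds the already-reduced total, and that a missing BrokenSlotCpus means nothing was removed. The function name original_slot_cpus and the sample values are hypothetical.

```python
def original_slot_cpus(slot_ad: dict) -> int:
    """Estimate the CPU count a partitionable slot had before any reduction.

    Assumption (not confirmed by the release notes): TotalSlotCpus is the
    current, already-reduced total and BrokenSlotCpus is the amount removed.
    """
    current = slot_ad.get("TotalSlotCpus", 0)
    broken = slot_ad.get("BrokenSlotCpus", 0)  # treated as 0 when absent
    return current + broken


# Example ad fragment with made-up values.
ad = {"Name": "slot1@ep.example.org", "TotalSlotCpus": 14, "BrokenSlotCpus": 2}
print(original_slot_cpus(ad))  # 16
```

The same addition would apply to other resources if matching attributes are advertised for them; only the Cpus variant is named in these notes.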

Bugs Fixed:

  • Fixed a bug where the condor_gridmanager would write to log file GridmanagerLog.root after a reconfig. (HTCONDOR-2846)

  • htcondor annex shutdown now works again. (HTCONDOR-2808)

  • Fixed a bug where the job state table DAGMan prints to its debug file could contain a negative number for the count of failed jobs. (HTCONDOR-2872)

  • Fixed a bug where chirp would not work in container universe jobs using the docker runtime. (HTCONDOR-2866)

  • Fixed a bug where referencing htcondor2.JobEvent.cluster could crash if the processed log event was not associated with a job (i.e. had a negative value). (HTCONDOR-2881)

  • Fixed a bug that caused the condor_gridmanager to abort if a job that it was managing disappeared from the job queue (for example, because someone ran condor_rm -force). (HTCONDOR-2845)

  • Fixed a bug that caused grid ads from different Access Points to overwrite each other in the collector. (HTCONDOR-2876)

  • Fixed a memory leak that can occur in any HTCondor daemon when an invalid ClassAd expression is encountered. (HTCONDOR-2847)

  • Fixed a bug that caused daemons to go into infinite recursion, eventually crashing when they ran out of stack memory. (HTCONDOR-2873)

Version 24.4.0

Release Notes:

  • HTCondor version 24.4.0 released on February 4, 2025.

New Features:

  • Improved validation and cleanup of EXECUTE directories. The EXECUTE directory must now be owned by the condor user when the daemons are started as root. The condor_startd will not attempt to clean an invalid EXECUTE directory nor will it alter the file permissions of an EXECUTE directory. (HTCONDOR-2789)

  • For batch grid universe jobs, the PATH environment variable values from the job ad and the worker node environment are now combined. Previously, only the PATH value from the job ad was used. The old behavior can be restored by setting blah_merge_paths=no in the blah.config file. (HTCONDOR-2793)

  • Made many small improvements to condor_q -analyze and -better-analyze for pools that use partitionable slots. As a part of this, the condor_schedd was changed to provide match information for the autocluster of the job being analyzed, which condor_q will report if it is available. (HTCONDOR-2720)

  • The condor_startd now advertises a new attribute, SingularityUserNamespaces, which is true when Apptainer or Singularity works and is using Linux user namespaces, and false when it is using setuid mode. (HTCONDOR-2818)

  • The condor_startd daemon ad now contains attributes showing the average and total bytes transferred to and from jobs during its lifetime. (HTCONDOR-2721)

  • The condor_credd daemon no longer listens on port 9620 by default, but rather uses the condor_shared_port daemon. (HTCONDOR-2763)

  • DAGMan will now periodically print a table of the states of jobs placed on the Access Point to the debug log (*.dagman.out). The rate at which this table is printed is dictated by DAGMAN_PRINT_JOB_TABLE_INTERVAL. (HTCONDOR-2794)

  • For arc grid universe jobs, the new submit command arc_data_staging can be used to supply additional elements to the DataStaging block of the ARC ADL that HTCondor constructs. (HTCONDOR-2774)
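
To make the PATH change for batch grid universe jobs more concrete, the following sketch shows one way two PATH values can be combined. It is an illustration only: the ordering (job ad entries first, then worker node entries) and the removal of duplicates are assumptions, since these notes do not say how the merge is performed. The function name merge_paths and the sample directories are hypothetical.

```python
import os


def merge_paths(job_path: str, node_path: str, sep: str = os.pathsep) -> str:
    """Combine two PATH strings, keeping first occurrences and dropping empties."""
    seen = set()
    merged = []
    # Assumed order: entries from the job ad, then entries from the worker node.
    for entry in job_path.split(sep) + node_path.split(sep):
        if entry and entry not in seen:
            seen.add(entry)
            merged.append(entry)
    return sep.join(merged)


print(merge_paths("/opt/app/bin:/usr/bin", "/usr/local/bin:/usr/bin:/bin", sep=":"))
# /opt/app/bin:/usr/bin:/usr/local/bin:/bin
```

With the old behavior described above, only the first argument would have been used as the job's PATH.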

Bugs Fixed:

  • Changed the numeric output of htcondor job status so that the rounding to megabytes, gigabytes, etc. matches the binary definitions the rest of the tools use. (HTCONDOR-2788)

  • Fixed a bug in the negotiator that caused it to crash when matching offline ads. (HTCONDOR-2819)

  • Fixed a memory leak in the schedd that could be caused by SCHEDD_CRON scripts that generate standard error output. (HTCONDOR-2817)

  • Fixed a bug that caused the condor_schedd to crash with a segmentation fault if a condor_off -fast command was run while a schedd cron script was running. (HTCONDOR-2815)

  • Fixed an issue where EPs using STARTD_ENFORCE_DISK_LIMITS would fill up the EP’s filesystem due to excessive saving of metadata to /etc/lvm/archive. (HTCONDOR-2791)

  • Fixed a bug where container_service_names did not work. (HTCONDOR-2829)

  • Fixed a very rare bug that could cause the condor_startd to crash when the condor_collector times out queries and DNS is running very slowly. (HTCONDOR-2831)

  • Updated condor_upgrade_check to test for use of PASSWORD authentication and warn about the authenticated identity changing. (HTCONDOR-2823)
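
The rounding change to htcondor job status in the list above concerns binary unit definitions, where each unit is 1024 times the previous one. The sketch below shows that convention in isolation. It is not the tool's code: the unit labels, the single decimal place, and the function name format_binary_size are assumptions made for illustration.

```python
def format_binary_size(num_bytes: int) -> str:
    """Render a byte count using 1024-based units with one decimal place."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(num_bytes)
    index = 0
    # Step up one unit for every factor of 1024, stopping at the largest unit.
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    return f"{value:.1f} {units[index]}"


print(format_binary_size(1_000_000))  # 976.6 KB
print(format_binary_size(1_048_576))  # 1.0 MB
```

Under the binary definition, one million bytes is still below one megabyte, which is the kind of difference a decimal (1000-based) rounding would hide.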

Version 24.3.0

Release Notes:

  • HTCondor version 24.3.0 released on January 6, 2025.

New Features:

Bugs Fixed:

  • Fixed a bug introduced in 24.2.0 where the daemons failed to start if configured to use only a network interface that didn’t have an IPv6 address. Also, the daemons will no longer bind and advertise an address that doesn’t match the value of NETWORK_INTERFACE. (HTCONDOR-2799)

  • The htcondor job submit command now issues credentials like condor_submit. (HTCONDOR-2745)

  • EPs spawned by htcondor annex no longer crash on start-up. (HTCONDOR-2745)

  • When resolving a hostname to a list of IP addresses, HTCondor now avoids using IPv6 link-local addresses. This change was done incorrectly in 23.9.6. (HTCONDOR-2746)

  • htcondor2.Submit.from_dag() and htcondor.Submit.from_dag() now correctly raise an HTCondor exception when the processing of DAGMan options and submit time DAG commands fails. (HTCONDOR-2736)

  • Fixed a confusing job hold message that would state a job requested 0.0 GB of disk via request_disk when exceeding disk usage on Execution Points using STARTD_ENFORCE_DISK_LIMITS. (HTCONDOR-2753)

  • You can now locate a collector daemon in the htcondor2 Python bindings. (HTCONDOR-2738)

  • Fixed a bug in the condor_qusers tool where the add argument would always enable rather than add a user. (HTCONDOR-2775)

  • Fixed a bug where cgroup systems did not report peak memory as intended, but reported current instantaneous memory instead. (HTCONDOR-2800) (HTCONDOR-2804)

  • Fixed an inconsistency in cgroup v1 systems where the memory reported by HTCondor included memory used by the kernel to cache disk pages. (HTCONDOR-2807)

  • Fixed a bug on cgroup v1 systems where jobs that were killed by the Out of Memory killer did not go on hold. (HTCONDOR-2806)

  • Fixed incompatibility of condor_adstash with v2.x of the OpenSearch Python Client. (HTCONDOR-2614)

  • The -subsystem argument of condor_status is once again case-insensitive for credd and defrag subsystem types. (HTCONDOR-2796)
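
The link-local item in the list above can be pictured as a filter over the addresses a hostname resolves to. The following sketch uses Python's standard ipaddress module to drop IPv6 link-local addresses (fe80::/10) from a list. It only illustrates the idea and says nothing about how HTCondor implements it; the function name drop_link_local and the sample addresses are hypothetical.

```python
import ipaddress


def drop_link_local(addresses):
    """Return the addresses that are not IPv6 link-local (fe80::/10)."""
    kept = []
    for text in addresses:
        ip = ipaddress.ip_address(text)
        # Only IPv6 link-local addresses are skipped; IPv4 is left untouched.
        if ip.version == 6 and ip.is_link_local:
            continue
        kept.append(text)
    return kept


print(drop_link_local(["192.0.2.10", "fe80::1", "2001:db8::5"]))
# ['192.0.2.10', '2001:db8::5']
```

Link-local addresses are only meaningful on a single network segment, so advertising one to remote daemons would generally not give them a usable address.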

Version 24.2.2

Release Notes:

  • HTCondor version 24.2.2 released on December 4, 2024.

New Features:

  • None.

Bugs Fixed:

  • Fixed a bug where, if the configuration knob EXECUTE was explicitly set to a blank string in the configuration file, the execution point (startd) could attempt to remove all files from the root partition (everything in /) upon startup. (HTCONDOR-2760)

Version 24.2.1

Release Notes:

  • HTCondor version 24.2.1 released on November 26, 2024.

  • This version includes all the updates from Version 24.0.2.

  • The DAGMan metrics file now names the metrics that referred to jobs using the modern term, nodes. To revert to the old terminology, set DAGMAN_METRICS_FILE_VERSION = 1. (HTCONDOR-2682)

New Features:

  • DAGMan will now correctly submit late materialization jobs to an Access Point when DAGMAN_USE_DIRECT_SUBMIT = True. (HTCONDOR-2673)

  • Added a new submit command, primary_unix_group, which takes a string that must be one of the user’s supplemental groups and sets the primary group to that value. (HTCONDOR-2702)

  • Improved DAGMan metrics file to use updated terminology and contain more metrics. (HTCONDOR-2682)

  • A condor_startd that has ENABLE_STARTD_DAEMON_AD enabled will no longer abort when it cannot create the required number of slots of the correct size on startup. It will now continue to run, reporting the failure to the collector in the daemon ad. Slots that can be fully provisioned will work normally. Slots that cannot be fully provisioned will exist but advertise themselves as broken. This is now the default behavior because daemon ads are enabled by default. The condor_status tool has a new option, -broken, which displays broken slots and their reason for being broken. Use this option with the -startd option to display machines that are fully or partly broken. (HTCONDOR-2500)

  • A new job attribute FirstJobMatchDate will be set for all jobs of a single submission to the current time when the first job of that submission is matched to a slot. (HTCONDOR-2676)

  • Added new job ad attribute InitialWaitDuration, recording the number of seconds from when a job was queued to when the first launch happened. (HTCONDOR-2666)

  • condor_ssh_to_job, when entering an Apptainer container, now sets the supplemental Unix group IDs in the same way that vanilla jobs have them set. (HTCONDOR-2695)

  • IPv6 networking is now fully supported on Windows. (HTCONDOR-2601)

  • Daemons will no longer block trying to invalidate their ads in a dead collector when shutting down. (HTCONDOR-2709)

  • Added option FAST to configuration parameter MASTER_NEW_BINARY_RESTART. This will cause the condor_master to do a fast restart of all the daemons when it detects new binaries. (HTCONDOR-2708)
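
The InitialWaitDuration attribute in the list above is described as the number of seconds from when a job was queued to its first launch. The arithmetic can be sketched as a difference of two Unix timestamps. This is an illustration under assumptions: the function name initial_wait_duration, its parameters, and the sample times are hypothetical, and the notes do not say which existing attributes HTCondor uses to derive the value.

```python
from datetime import datetime, timezone


def initial_wait_duration(queued_at: int, first_launch_at: int) -> int:
    """Seconds between queueing and first launch, both as Unix timestamps."""
    if first_launch_at < queued_at:
        raise ValueError("first launch cannot precede queue time")
    return first_launch_at - queued_at


# Made-up example: queued at 12:00:00 UTC, first launched at 12:07:30 UTC.
queued = int(datetime(2024, 11, 26, 12, 0, 0, tzinfo=timezone.utc).timestamp())
launched = int(datetime(2024, 11, 26, 12, 7, 30, tzinfo=timezone.utc).timestamp())
print(initial_wait_duration(queued, launched))  # 450
```

A job that has never launched has no meaningful value under this definition, so a caller would need to handle that case separately.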

Bugs Fixed:

  • None.

Version 24.1.1

Release Notes:

  • HTCondor version 24.1.1 released on October 31, 2024.

  • This version includes all the updates from Version 24.0.1.

New Features:

Bugs Fixed:

  • If HTCondor detects that an invalid checkpoint has been downloaded for a self-checkpointing job using third-party storage, that checkpoint is now marked for deletion and the job is rescheduled. (HTCONDOR-1258)