Version 8.9 Feature Releases¶
We release new features in these releases of HTCondor. The details of each version are described below.
Version 8.9.13¶
Release Notes:
HTCondor version 8.9.13 released on March 30, 2021.
We fixed a bug in how the IDTOKENS authentication method reads its signing keys. You can use the new condor_check_password tool to determine if your signing keys were truncated by this bug. If they were, we recommend that you reissue all tokens signed by the affected key(s). The condor_check_password tool may also be used to truncate the affected key(s) before the first byte causing the problem, which will allow existing tokens to continue to work, but is, perhaps substantially, less cryptographically secure. For more information, see https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=UpgradingToEightNineThirteen. (HTCONDOR-295) (HTCONDOR-369)
We have changed the default configuration file. It no longer sets
use security : host_based
. This may cause your existing, insecure security configuration to stop working. Consider updating your configuration and see the next item. (HTCONDOR-301)We have added a new configuration file (
config.d/00-htcondor-9.0.config
) to the packaging. It won’t be replaced by future upgrades, but it sets a new default for security,use security : recommended_v9_0
, whose contents could be updated in future point releases. You may safely delete this file if you’re upgrading. This file has an extensive explanatory comment, which includes the work-around for the previous item’s problem. (HTCONDOR-339)HTCondor will now access tokens in directory
/etc/condor/tokens.d
as user root, meaning this directory and its contents should be only accessible by user root for maximum security. If upgrading from an earlier v8.9.x release, it may currently be accessible by usercondor
, so recommend that admins issue commandchown -R root:root /etc/condor/tokens.d
. (HTCONDOR-266)As an improved security measure, HTCondor can now prohibit Linux jobs from running setuid executables by default. We believe the only common setuid program this impacts is
ping
. To prohibit jobs from running setuid binaries, setDISABLE_SETUID
totrue
in the configuration of the worker node. (HTCONDOR-256)SCITOKENS is now in the default list of authentication methods. (HTCONDOR-47)
Added configuration parameter
LEGACY_ALLOW_SEMANTICS
, which re-enables the behavior of HTCondor 8.8 and prior whenALLOW_DAEMON
orDENY_DAEMON
is not defined.This parameter is intended to ease the transition of existing HTCondor configurations from 8.8 to 9.0, and should not be used long-term or in new installations. (HTCONDOR-263)
On Windows, the condor_master now prefers TCP over UDP when forwarding
condor_reconfig
andcondor_off
commands to other daemons. (HTCONDOR-273)The condor_credmon_oauth now writes logs to the path in
CREDMON_OAUTH_LOG
if defined, which matches the typical log config parameter naming pattern, instead of using the path defined inSEC_CREDENTIAL_MONITOR_OAUTH_LOG
. (HTCONDOR-122)
Known Issues:
The MUNGE security method is not working at this time.
New Features:
By default, all HTCondor communications and HTCondor file transfers are protected by hardware-accelerated integrity and AES encryption. HTCondor automatically falls back to Triple-DES or Blowfish when inter-operating with a previous version. (HTCONDOR-222)
HTCondor now detects instances of multi-instance GPUs. These devices will be reported as individual GPUs, each named as required to access them via
CUDA_VISIBLE_DEVICES
. Some of the-extra
and-dynamic
properties are unavailable for these devices. Due to driver limitations, if MIG is enabled on any GPU, HTCondor will detect all devices on the system using NVML; some-extra
properties will not be reported for these systems as a result.This feature does not include reporting the utilization of MIG instances. (HTCONDOR-137)
If
SCITOKENS_FILE
is not specified, HTCondor will now follow WLCG’s bearer token discovery protocol to find it. See https://zenodo.org/record/3937438 for details. (HTCONDOR-92)Improvements made to error messages when jobs go on hold due to timeouts transferring files via HTTP. (HTCONDOR-355)
HTCondor now creates a number of directories on start-up, rather than fail later on when it needs them to exist. See the ticket for details. (HTCONDOR-73)
The default value for the knob DISABLE_SETUID is now false, so that condor_ssh_to_job works on machines with SELinux enabled. (HTCONDOR-358)
HTCondor daemons that read the configuration files as root when they start up will now also read the configuration files as root when they are reconfigured. (HTCONDOR-314)
To make the debugging logs more consistent, the slot name is always appended to the StarterLog. Previously, a single slot startd’s StarterLog would have no suffix. Now it will be called StarterLog.slot1. (HTCONDOR-178)
Added command-line options to condor_gpu_discovery to report GPUs multiple times. If your GPU jobs are small and known to be well-behaved, this makes it easier for them to share a GPU. (HTCONDOR-106)
Enhanced the optional job completion sent to the submitter to now include the batch name, if defined, and the submitting directory, so the user has a better idea of which job the job id is. (HTCONDOR-71)
Added submit commands
batch_project
andbatch_runtime
, used to set job attributesBatchProject
andBatchRuntime
for batch grid universe jobs. (HTCONDOR-44)Added a new tool/daemon condor_adstash that polls for job history ClassAds and pushes their contents to Elasticsearch. Setting
use feature: adstash
will run condor_adstash as a daemon, though care should be taken to configure it for your pool and Elasticsearch instance. See the Elasticsearch documentation in the admin manual for more detail.When token authentication (IDTOKENS or SCITOKENS) is used, HTCondor will now record the subject, issuer, scopes, and groups, from the token used to submit jobs. (HTCONDOR-90)
The python
schedd.submit
method now honors the spool argument even when the first argument is aSubmit
object. (HTCONDOR-131)When singularity is enabled, when there is an error running singularity test before the job, the first line of singularity stderr is logged to the hold message in the job. (HTCONDOR-133)
When the startd initializes, it runs the
condor_starter
with the -classad option to probe the features this starter support. As a side-effect, the starter logs some information to a StarterLog file. This StarterLog is almost never of interest when debugging jobs. To make that more clear, this starter log is now named StarterLog.testing. (HTCONDOR-132)The condor_collector can now use a projection when forwarding ads to a View Collector. A new configuration variable
COLLECTOR_FORWARD_PROJECTION
can be configured to enabled this. (HTCONDOR-51)The condor_drain command now has a
-reason
argument and will supply a default reason value if it is not used. The condor_defrag daemon will always passdefrag
as the reason so that draining initiated by the administrator can be distinguished by draining initiated by condor_defrag. (HTCONDOR-77)The condor_defrag daemon will now supply a
-reason
argument ofdefrag
and will ignore machines that have have a draining reason that is notdefrag
. (HTCONDOR-89)Added a new a ClassAd function to help write submit transforms. You can now use unresolved() to check for existing constraints on a particular attribute (or attribute regex). (HTCONDOR-66)
Added TensorFlow environment variables
TF_NUM_THREADS
andTF_LOOP_PARALLEL_ITERATIONS
to the list of environment variables exported by the condor_starter per these recommendations. (HTCONDOR-185)Certificate map files can now use the
@include
directive to include another file or all of the files in a directory. (HTCONDOR-46)The ClassAd returned by Python binding
htcondor.SecMan.ping()
now includes extra information about the daemon’s X.509 certificate if SSL or GSI authentication was used. (HTCONDOR-43)Added configuration parameter
GRIDMANAGER_LOG_APPEND_SELECTION_EXPR
, which allows each condor_gridmanager process to write to a separate daemon log file. When this parameter is set toTrue
, the evaluated value ofGRIDMANAGER_SELECTION_EXPR
(if set) will be appended to the filename specified byGRIDMANAGER_LOG
. (HTCONDOR-102)Added command-line argument
-jsonl
to condor_history. This will print the output in JSON Lines format (one ClassAd per line in JSON format). (HTCONDOR-35)Running
condor_ping
with no arguments now defaults to doing a ping against the condor_schedd at authorization level WRITE. (Ticket #7892) (HTCONDOR-246)Adjusted configuration defaults for condor_c-gahp so that a restart of the remote condor_collector or condor_schedd doesn’t result in a prolonged interruption of communication. Previously, communication could be interrupted for up to a day. (HTCONDOR-313)
Bugs Fixed:
Fixed a bug where daemons would leak memory whenever sending updates to the collector, at the rate of a few megabytes per day. This leak was introduced in an earlier HTCondor v8.9.x release. (HTCONDOR-323)
Malformed or missing SciToken can result in a schedd abort when using
SCHEDD_AUDIT_LOG
. (HTCONDOR-315)Fixed a problem where
condor_watch_q
would crash when updating totals for DAGman jobs. (HTCONDOR-201)Sufficiently old versions of the CUDA libraries no longer cause condor_gpu_discovery to segfault. (HTCONDOR-343)
Fixed a bug introduced in version 8.9.12, where non-root clients could not use IDTOKENS. (HTCONDOR-370)
Fixed a bug where an IDTOKEN could be sent to a user who had authenticated with the ANONYMOUS method after the auto-approval period had expired. (HTCONDOR-231)
HTCondor daemons used to access tokens in
/etc/condor/tokens.d
as usercondor
, now instead tokens are accessed as userroot
. (HTCONDOR-266)Fixed a bug where jobs that asked for
transfer_output_files = .
would be put on hold if they were evicted and restarted. (HTCONDOR-267)The
preserve_relative_paths
submit command now properly allows jobs to run on HTCondor versions 8.9.10 and later. (HTCONDOR-189)Utilization is now properly reported if
GPU_DISCOVERY_EXTRA
includes-uuid
. (HTCONDOR-137)Fixed a bug with singularity support where the job’s cwd wasn’t being set to the scratch directory when SINGULARITY_TARGET_DIR wasn’t also set. (HTCONDOR-91)
The tool
condor_store_cred
will now accept and use a handle for an OAuth cred, and the condor_credd will now honor the handle in the stored filename. (HTCONDOR-291)Condor-C (grid universe type condor) now works correctly when jobs use different SciTokens. (HTCONDOR-99)
Fixed a bug where file transfers using HTTP may fail unnecessarily after 30 seconds if the HTTP server does not include a Content-Length header. (HTCONDOR-354)
The local issuer in the
condor_credmon_oauth
gives more useful log output if it detects that the private key was generated with something other than the expected EC algorithm. (HTCONDOR-305)Fixed a bug with condor_dagman direct job submission where certain submit errors were not getting reported in the debug output. (HTCONDOR-85)
The minihtcondor DEB package no longer aborts if installed on systems that do not have systemd installed, as is common with Debian and Ubuntu docker containers. (HTCONDOR-362)
The
condor_procd
now detects and compensates for incomplete reads of the/proc
file system. (HTCONDOR-33)
Version 8.9.12¶
Release Notes:
HTCondor version 8.9.12 was released on March 25, 2021, and pulled the next day for significant incompatibilities with previous releases.
New Features:
None.
Bugs Fixed:
None.
Version 8.9.11¶
Release Notes:
HTCondor version 8.9.11 released on January 27, 2021.
New Features:
None.
Bugs Fixed:
Security Item: This release of HTCondor fixes security-related bugs described at
Version 8.9.10¶
Release Notes:
HTCondor version 8.9.10 released on November 24, 2020.
For condor_annex users: Amazon Web Services is deprecating support for the Python 2.7 runtime used by condor_annex. If you ran the condor_annex setup command with a previous version of HTCondor, you should update your setup to use the new runtime. (Go to the AWS Lambda console and look for the
HTCondorAnnex-CheckConnectivity
function; click on it. Scroll down to “Runtime settings”; click the “Edit” button. Select “Python 3.8” from the drop-down list under “Runtime”. Then hit the “Save” button. You’ll have to repeat this for each region you’re using.) (HTCONDOR-24)
New Features:
Added support for OAuth, SciTokens, and Kerberos credentials in local universe jobs. (Ticket #7693)
The python
schedd.submit
method now accepts aSubmit
object and itemdata to define the jobs, to be submitted. The use of a ClassAd to define the job is now deprecated for this method (Ticket #7853)A new Python method
schedd.edit
can be used to set multiple attributes for a job specification with a single call to this method. (HTCONDOR-28)Added a new
SCRIPT HOLD
feature to DAGMan, allowing users to define a script executable that runs when a job goes on hold. (HTCONDOR-65)Added a new
SUBMIT-DESCRIPTION
command to DAGMan, which allows inline jobs to share submit descriptions. (HTCONDOR-64)You may now tag instances from the command line of condor_annex. Use the
-tag <name> <value>
command-line option once for each tag. (Ticket #7834)When running a singularity job, the starter first runs singularity test if this returns non-zero, the job is put on hold. (Ticket #7801)
Added support for requesting GPUs with grid universe jobs of type batch. (Ticket #7757)
Added new configuration variable MIN_FLOCK_LEVEL, which can be used to specify how many of the remote HTCondor pools listed in
FLOCK_COLLECTOR_HOSTS
should always be flocked to. The default is 0. (HTCONDOR-62)Job attributes set by the job using the Chirp command
set_job_attr_delayed
are now propagated back to the originating condor_schedd by the Job Router and Condor-C (a.k.a grid universe typecondor
). (HTCONDOR-63)A new configuration variable DEFAULT_DRAINING_START_EXPR can be used to define what the
START
value of a slot should be while it is draining. This configuration variable is used when the command to drain does not have an override value forSTART
. (HTCONDOR-67)When a SEC_CREDENTIAL_PRODUCER is configured for condor_submit it now assumes that the CREDD is the current version when does not know what version it is, which is common when the CREDD is running on a different machine than condor_submit. (HTCONDOR-76)
The
--add
option of bosco_cluster now attempts to install a version of HTCondor on the remote cluster that closely matches the version installed locally. The new--url
option can be used to specify the URL from which the HTCondor binaries should be fetched. (HTCONDOR-21)The Python scripts distributed with HTCondor (except those dealing with the OAuth credmon) have been upgraded to run under Python 3. (Ticket #7698) (Ticket #7844) (Ticket #7872)
Added the ability to have finer grain control over the SSH connection when using the remote gahp. One can now specify the SSH port and also whether or not SSH BatchMode is used. (HTCONDOR-18) (HTCONDOR-19)
The condor_useprio tool now displays any submitter ceilings that are set. (Ticket #7837)
Added statistics to the collector ad about CCB. (Ticket #7842)
Bugs Fixed:
Fixed a bug introduced in 8.9.9 that, only when accounting groups with quotas were defined that caused the matchmaker to stop making new matches after several negotiation cycles. (HTCONDOR-83)
The condor_credd now signals the OAuth credmon, not the Kerberos credmon, when processing a locally-issued credential. (Ticket #7889)
Fixed a bug in DAGMan where a
_gotEvents
warning kept appearing incorrectly in the output file. (HTCONDOR-15)Fixed a bug which caused the
condor-annex-ec2
script to exit prematurely on some systemd platforms. (HTCONDOR-22)Fixed a bug specific to MacOS X which could cause the shared port daemon’s initial childalive message to be lost. This would cause condor_who to wrongly think that HTCondor hadn’t started up until the shared port daemon sent its second childalive message. (Ticket #7866)
Version 8.9.9¶
Known Issues:
If group quotas are in use, the negotiator will eventually stop making matches. This defect was introduced in HTCondor 8.9.9. It will be fixed in HTCondor 8.9.10 to be released on November 24, 2020. In the meantime, one may revert the Central Manager machine to HTCondor 8.9.8, leaving the remainder of the pool at HTCondor 8.9.9.
Release Notes:
HTCondor version 8.9.9 released on October 26, 2020.
The RPMs have been restructured to require additional packages from EPEL. In addition to the boost libraries, the RPMs depend on the Globus, munge, SciTokens, and VOMS libraries in EPEL. (Ticket #7681)
When the condor_startd is running as root on a Linux machine, unless CGROUP_MEMORY_LIMIT_POLICY is
none
, HTCondor now always sets both the soft and hard cgroup memory limit for a job. When CGROUP_MEMORY_LIMIT_POLICY issoft
, the soft limit is set to the slot size, and the hard limit is set to the TotalMemory of the whole startd. When CGROUP_MEMORY_LIMIT_POLICY ishard
, the hard limit is set to the slot size, and the soft limit is set 90% lower. Also added knob DISABLE_SWAP_FOR_JOB, which when set totrue
, prevents the job from using any swap space. This knob defaults tofalse
. (Ticket #7882)When running on a Linux system with cgroups enabled, the
MemoryUsage
attribute of a job no longer includes the memory used by the kernel disk cache. (Ticket #7882)We deprecated the exceptions raised by the Python Bindings. The new exceptions all inherit from
HTCondorException
orClassAdException
, according to the originating module. For backwards-compatibility, the new exceptions all also inherit the class of each exception type they replaced. (Ticket #6935)We changed the default value of
PROCD_ADDRESS
on Windows to make it less likely for multiple instances of HTCondor on the machine to collide. (Ticket #7789)The condor_schedd will no longer modify a job’s
User
attribute when the job’sNiceUser
attribute is set. Thenice_user
submit keyword is now implemented entirely by condor_submit. Because of this change thenice_user
mechanism will only work when condor_submit and the condor_schedd are both version 8.9.9 or later. (Ticket #7783)
New Features:
You may now instruct HTCondor to record certain information about the files present in the top level of a job’s sandbox and the job’s environment variables. The list of files is recorded when transfer-in completes and again when transfer-out starts. Set
manifest
totrue
in your submit file to enable, ormanifest_dir
to specify where the lists are recorded. See the condor_submit man page for details. (Ticket #7381)This feature is not presently available on Windows.
DAGMan now waits for
PROVISIONER
nodes to reach a ready status before submitting any other jobs. (Ticket #7610)Added a
-Dot
argument to condor_dagman which tells DAGMan to simply output a .dot file graphic representation of the dag, then exit immediately without submitting any jobs. (Ticket #7796)Set a variety of defaults into condor_dagman so it can now easily be invoked directly from the command line using
condor_dagman mydag.dag
(Ticket #7806)Singularity jobs now ignore bind mount directories if the source directory for the bind mount does not exist on the host machine (Ticket #7807)
Singularity jobs now ignore bind mount directories if the target directory for the bind mount does not exist in the image and SINGULARITY_IGNORE_MISSING_BIND_TARGET is set to
true
(default isfalse
). (Ticket #7846)Improved startup time of the daemons. (Ticket #7799)
Added a machine-ad attribute,
LastDrainStopTime
, which records the last time a drain command was cancelled. Added two attributes to the defrag daemon’s ad,RecentCancelsList
andRecentDrainsList
, which record information about the last ten cancel or drain commands, respectively, that the defrag daemon sent. (Ticket #7732)The accounting group that the
nice_user
submit command puts jobs into is now configurable by settingNICE_USER_ACCOUNTING_GROUP_NAME
in the configuration of condor_submit. (Ticket #7792)Python 3 bindings are now available on macOS. They are linked against Python 3.8 provided by python.org. (Ticket #7090)
Added oauth-services method to the python-bindings
Submit
class. The python-bindingsCredCheck
class can now be used to check if the OAuth services that a job needs are present before the job is submitted. (Ticket #7606)The Python API daemon objects
Schedd
,Startd
,Negotiator
andCredd
now have a location member whose value can be passed to the constructor of a class of the same type to create a new object pointing to the same HTCondor daemon. (Ticket #7670)The Python API daemon object
Schedd
constructor now accepts None and interprets that to be the address of the local HTCondor Schedd. (Ticket #7668)The Python API now includes the job status enumeration. (Ticket #7726)
The Python API methods that take a constraint argument will now accept an :class:
~classad.ExprTree
in addition to the native Python types, string, bool, int and None. (Ticket #7657)Updated the
htcondor.Submit.from_dag()
Python binding to support the full range of command-line arguments available to condor_submit_dag. (Ticket #7823)Added the
htcondor.personal
module to the Python bindings. Its primary feature is thehtcondor.personal.PersonalPool
class, which is responsible for managing the life-cycle of a “personal” single-machine HTCondor pool. A personal pool can (for example) be used for testing and development of HTCondor workflows before deploying to a larger pool. Personal pools do not require administrator/root privileges. HTCondor itself must still be installed on your system. (Ticket #7745)Added a family of version comparison functions to ClassAds. (Ticket #7504)
Added the OAuth2 Credmon (aka “SciTokens Credmon”) daemon (condor_credmon_oauth), WSGI application, helper libraries, example configuration, and documentation to HTCondor for Enterprise Linux 7 platforms. (Ticket #7741)
The bosco_cluster can optionally specify the remote installation directory. (Ticket #7843)
HTCondor lets the administrator know when a SciToken mapping contains a trailing slash and optionally allow it to map. It is easy for an administrator to overlook the trailing slash when cutting a pasting from a browser. (Ticket #7557)
Bugs Fixed:
Fixed a bug that could cause the condor_schedd to abort if a SUBMIT_REQUIREMENT prevented a late materialization job from materializing. (Ticket #7874)
condor_annex -check-setup
now respects the configuration settingANNEX_DEFAULT_AWS_REGION
. In addition,condor_annex -setup
now setsANNEX_DEFAULT_AWS_REGION
if it hasn’t already been set. This makes first-time setup in a non-default region much less confusing. (Ticket #7832)Fixed a bug introduced in 8.9.6 where enabling pid namespaces in the startd would make every job go on hold. (Ticket #7797)
condor_watch_q now correctly groups jobs submitted by DAGMan after condor_watch_q has started running. (Ticket #7800)
Fixed a bug in the ClassAd library where calling the ClassAd sum function on an empty list returned undefined. It now returns 0. (Ticket #7838)
Fixed a bug in Docker Universe that caused a confusing warning message about an unaccessible file in /root/.docker (Ticket #7805)
Fixed a bug in the condor_collector that caused it to handle queries from the condor_negotiator at normal priority instead of high priority. (Ticket #7729)
Fixed attribute
ProportionalSetSizeKb
to behave the same asResidentSetSize
in the slot ad. (Ticket #7787)Removed the Java benchmark
JavaMFlops
from the machine ad. (Ticket #7795)Read IDTOKENS used by daemons with the correct UID. (Ticket #7767)
Fixed the Python
htcondor.Submit.from_dag()
binding so it now throws anIOError
exception when the specified .dag file is not found. (Ticket #7808)Fixed a bug that would cause a job to go on hold with a memory usage exceeded message in the rare case where the usage could not be obtained. (Ticket #7886)
condor_q no longer prints misleading message about the matchmaker when asked to analyze a job. (Ticket #5834)
Version 8.9.8¶
Release Notes:
HTCondor version 8.9.8 released on August 6, 2020.
Fixed some issues with the condor_schedd validating attribute values and actions from condor_qedit. Certain edits could cause the condor_schedd to enter an invalid state and in some cases would required editing of the job queue to restore the condor_schedd to operation. While no security exploits are known to be possible, mischievous users could potentially disrupt the operation of the condor_schedd. A more detailed description and workaround for these issues can be found in the ticket. (Ticket #7784)
The
SHARED_PORT_PORT
setting is now honored. If you are using a non-standard port on machines other than the Central Manager, this bug fix will a require configuration change in order to specify the non-standard port. (Ticket #7697)API change in the Python bindings. The
classad.ExprTree
constructor now tries to parse the entire string passed to it. Failure results in aSyntaxError
. This prevents strings like"foo = bar"
from silently being parsed as justfoo
and causing unexpected results. (Ticket #7607)API change in the Python bindings. The
classad.ExprTree
constructor now acceptsclassad.ExprTree
(creating an identical copy) in addition to strings, making it easier to handle inputs uniformly. (Ticket #7654)API change in the Python bindings: we deprecated
Schedd.negotiate()
. (Ticket #7524)API change in the Python bindings: we deprecated the classes
htcondor.Negotiator
,htcondor.FileLock
,htcondor.EventIterator
, andhtcondor.LogReader
, as well as the functionshtcondor.lock()
andhtcondor.read_events()
. (Ticket #7690)API change in the Python bindings: the methods
htcondor.Schedd.query()
,htcondor.Schedd.xquery()
, andhtcondor.Schedd.history()
now use the argument namesconstraint
andprojection
(for the query condition and the attributes to return from the query) consistently. The old argument names (requirements
andattr_list
) are deprecated, but will still work (raising aFutureWarning
when used) until a future release. (Ticket #7630)Removed the condor_dagman
node_scheduler
module, which contains earlier implementations of several DAGMan components and has not been used in a long time. (Ticket #7674)
New Features:
Added a new Python bindings sub-package,
htcondor.dags
, which contains tools for writing DAGMan input files programmatically using high-level abstractions over the basic DAGMan constructs. There is a new tutorial at HTCondor Python Bindings Tutorials walking through a basic use case.htcondor.dags
is very new and its API has not fully stabilized; it is possible that there will be deprecations and breaking changes in the near future. Bug reports and feature requests greatly encouraged! (Ticket #7682)Added a new Python bindings subpackage,
htcondor.htchirp
. This subpackage provides theHTChirp
andcondor_chirp()
objects for using the Chirp protocol inside a+WantIOProxy = true
job. (Ticket #7330)Added a new tool, condor_watch_q, a live-updating job status tracker that does not repeatedly query the condor_schedd like
watch condor_q
would. It includes options for colored output, progress bars, and a minimal language for exiting when certain conditions are met. The man page can be found here: condor_watch_q. condor_watch_q is still under development; several known issues are summarized in the ticket. (Ticket #7343)When the condor_master starts in background mode, which is the default, control is not returned until the background condor_master has created the MasterLog and is ready to accept commands. (Ticket #7667)
Added options
-short-uuid
and-uuid
to the condor_gpu_discovery tool. These options use the NVIDIA uuid assigned to each GPU to produce stable identifiers for each GPU so that devices can be taken offline without causing confusion about which of the remaining devices a job is using. (Ticket #7696)Configuration variables of the form OFFLINE_MACHINE_RESOURCE_<TAG> such as OFFLINE_MACHINE_RESOURCE_GPUs will now take effect on a condor_reconfig. (Ticket #7651)
HTCondor now supports setting an upper bound on the number of cores user can be given. This is called the submitter ceiling. The ceiling can be set with the
condor_userprio -setceiling
command line option. (Ticket #7702)The condor_startd now detects whether user namespaces can be created by unprivileged processes. If so, it advertises the ClassAd attribute
HasUserNamespaces
. In this case, container managers like singularity can be run without setuid root. (Ticket #7625)Added a SEC_CREDENTIAL_SWEEP_DELAY configuration parameter which specifies how long, in seconds, we should wait before cleaning up unused credentials. (Ticket #7484)
classad_eval now allows its first (ClassAd) argument to be just the interior of a single ClassAd. That is, you no longer need to surround the first argument with square brackets. This means that
classad_eval 'x = y; y = 7' 'x'
will now correctly return7
. (Ticket #7621)classad_eval now allows you to freely mix (partial) ClassAds, single attribute assignments, and the expressions you want to evaluate. This means that
classad_eval 'x = y' 'y = 7' 'x'
will now return7
. The ad used to evaluate an expression will be printed before the expression’s result, unless doing so would repeat the previous expression’s ad; use the-quiet
flag to disable. (Ticket #7341)Improved the efficiency of process monitoring in macOS. (Ticket #7708)
The condor_startd now handles STARTD_SLOT_ATTRS after STARTD_ATTRS and STARTD_PARTITIONABLE_SLOT_ATTRS so that custom slot attributes describing the resources of dynamic children can be referred to by STARTD_SLOT_ATTRS (Ticket #7588)
Updated condor_q so when called with the
-dag
flag and a DAGMan job ID, it will display all jobs running under any nested sub-DAGs. (Ticket #7483)Direct job submission in condor_dagman now reports warning messages related to job submission (for example, possible typos in submit arguments) to help debug problems with jobs not running correctly. (Ticket #7568)
condor_dagman now allows jobs to be described with an inline submit description, instead of referencing a separate submit file. See the Inline Submit Descriptions section for more details. (Ticket #7352)
Improved messaging for the condor_drain tool to indicate that it is only draining the single specified condor_startd. If the target host has multiple condor_startd daemons running, the other instances will not be drained. (Ticket #7664)
Added new authentication method names
FAMILY
andMATCH
. These represent automated establishment of trust between daemons. They can not be used as values for configuration parameters such as SEC_DEFAULT_AUTHENTICATION_METHODS.FAMILY
represents a security session between daemons within the same family of OS processes.MATCH
represents a security session between daemons mediated through a central manager (condor_collector and condor_negotiator) that both daemons trust. These values will be most visible in the attributeAuthenticationMethod
in ClassAds advertised in the condor_collector. (Ticket #7683)Added a new submit file option,
docker_network_type = none
, which causes a docker universe job to not have any network connectivity. (Ticket #7701)Docker jobs now respect CPU Affinity. (Ticket #7627)
Added a
debug
option to bosco_cluster to help diagnose ssh failures. (Ticket #7712)The condor_submit executable will not abort if the submitting user has a gid of 0. Jobs still will not run with root privileges, but this allows jobs to be submitted which are assigned an
Owner
via the result of user mapping from authentication. (Ticket #7662)The condor_store_cred tool can now be used to manage different kinds of credentials, including Password, Kerberos, and OAuth. (Ticket #6868)
Bugs Fixed:
Fixed a segmentation fault in the condor_schedd that could happen on some platforms when handling certain condor_startd failures after invoking condor_now. (Ticket #7692)
classad_eval no longer ignores trailing garbage in its first (ClassAd) argument. This prevents
classad_eval 'x = y; y = 7' 'x'
from incorrectly returningundefined
. (Ticket #7621)An ID token at the end of a file lacking a trailing newline is no longer ignored. (Ticket #7499)
condor_token_request_list will now correctly list requests with request IDs starting with the number
0
. (Ticket #7641)Fixed a bug introduced in 8.9.3 that cause the condor_chirp tool to crash when passed the
getfile
argument. (Ticket #7612)Added
OMP_THREAD_LIMIT
to list of environment variables to let programs likeR
know the maximum number of threads it should use. (Ticket #7649)Fixed a bug in Docker Universe that prevented administrator-defined bind-mounts from working correctly. (Ticket #7635)
If the administrator of an execute machine has disabled file transfer plugins by setting ENABLE_URL_TRANSFERS to
False
, then the machine Ad in the collector will no longer advertise support, which will prevent jobs from matching there and attempting to run. (Ticket #7707)Fixed a bug in condor_dagman where completed jobs incorrectly showed a warning message related to job events. (Ticket #7548)
Stopped HTCondor from sweeping OAuth credentials too aggressively, during the window between credential creation and job submission. The condor_credd will now wait SEC_CREDENTIAL_SWEEP_INTERVAL seconds before cleaning them up, and the default is 300 seconds. (Ticket #7484)
When authenticating, clients now only suggest methods that it supports, rather than providing a list of methods where it will reject some. This improves the initial security handshake. (Ticket #7500)
For RPM installations, the HTCondor Python bindings RPM will now be automatically installed whenever the condor RPM is installed. (Ticket #7647)
Bosco will use the newer version (1.3) of the tarballs on Enterprise Linux 7 and 8. (Ticket #7753)
HTCondor no longer probes the file transfer plugins except in the starter and then only if they are actually being used. This was potentially adding delays to starting individual shadows, which when starting a lot of shadows could lead to scalability issues on a submit machine. (Ticket #7688)
Version 8.9.7¶
Release Notes:
HTCondor version 8.9.7 released on May 20, 2020.
The
TOKEN
authentication method has been renamed toIDTOKENS
to better differentiate it from theSCITOKENS
method. All sites are encouraged to update their configurations accordingly; however, the configuration files and wire protocol remains backward compatible with prior releases. (Ticket #7540)HTCondor now advertises
CUDAMaxSupportedVersion
(when appropriate). This attribute is an integer representation of the highest CUDA version the machine’s driver supports. HTCondor no longer advertises the attributeCUDARuntimeVersion
. (Ticket #7413)If you know what a shared port ID is, it may interest you to learn that starters in this version of HTCondor use their slot names, if available, in their shared port IDs. (Ticket #7510)
New Features:
You may now specify that HTCondor only transfer files when the job succeeds (as defined by
success_exit_code
). Setwhen_to_transfer_output
toON_SUCCESS
. When you do, HTCondor will transfer files only when the job exits (in the sense ofON_EXIT
) with the specified success code. This is intended to prevent unsuccessful jobs from going on hold because they failed to produce the expected output (file(s)). (Ticket #7270)HTCondor may now preserve the relative paths you specify when transferring files. See the condor_submit man page about
preserve_relative_paths
. (Ticket #7338)You may now specify a distinct list of files for use with the vanilla universe’s support for application-level checkpointing (
checkpoint_exit_code
). Usetransfer_checkpoint_files
if you’d like to shorten yourtransfer_output_files
list by removing files only needed for checkpoints. See the condor_submit man page. (Ticket #7269)The condor_job_router configuration and transform language has changed. The Job Router will still read the old configuration and transforms, but the new configuration syntax is much more flexible and powerful.
Routes are now a modified form of job transform. JOB_ROUTER_ROUTE_NAMES` defines both the order and which routes are enabled
Multiple pre-route and post-route transforms that apply to all routes can be defined.
The Routes and transforms use the same syntax and transform engine as SUBMIT_TRANSFORM_NAMES.
HTCondor now offers a submit command,
cuda_version
, so that jobs can describe which CUDA version (if any) they use. HTCondor will use that information to match the job with a machine whose driver supports that version of CUDA. See the condor_submit man page. (Ticket #7413)Tokens can be blacklisted by setting the SEC_TOKEN_BLACKLIST_EXPR configuration parameter to an expression matching the token contents. Further, a unique ID has been added to all generated tokens, allowing individual tokens to be blacklisted. (Ticket #7449) (Ticket #7450)
If the condor_master cannot authenticate with the collector then it will automatically attempt to request an ID token (which the collector administrator can subsequently approve). This now matches the behavior of the condor_schedd and condor_startd. (Ticket #7447)
The condor_token_request_list can now print out pending token requests when invoked with the
-json
flag. (Ticket #7454)Request IDs used for condor_token_request are now zero-padded, ensuring they are always a fixed length. (Ticket #7461)
All token generation and usage is now logged using HTCondor’s audit log mechanism. (Ticket #7450)
The new SEC_TOKEN_REQUEST_LIMITS configuration parameter allows administrators to limit the authorizations available to issued tokens. (Ticket #7455)
HTCondor now allows OAuth tokens and Kerberos credentials to be enabled on the same machine. This involves some changes to the way these two features are configured. condor_store_cred and the Python bindings has new commands to allow Kerberos and OAuth credentials to be stored and queried. (Ticket #7462)
The submit command
getenv
can now be a list of environment variables to import and not justTrue
orFalse
. (Ticket #7572)The condor_history command now has a
startd
option to query the condor_startd history file. This works for both local and remote queries. (Ticket #7538)The
-submitters
argument to condor_q` now correctly shows jobs for the given submitter name, even when the submitter name is an accounting group. (Ticket #7616)The accountant ads that condor_userprio displays have two new attributes. The
SubmitterLimit
contains the fair share, in number of cores, that this submitter should have access to, if they have sufficient jobs, and they all match. TheSubmitterShares
is the percentage of the pool they should have access to. (Ticket #7626) (Ticket #7453)When running on a Linux system with cgroups enabled, the MemoryUsage attribute of a job now includes the memory used by the kernel disk cache. This helps users set Request_Memory to more useful values. (Ticket #7442)
Docker universe now works inside an unprivileged personal HTCondor, if you give the user starting the personal condor rights to run the docker commands. (Ticket #7485)
The condor_master and other condor daemons can now run as PID 1. This is useful when starting HTCondor inside a container. (Ticket #7472)
When worker nodes are running on CPUs that support the AVX512 instructions, the condor_startd now advertises that fact with has_avx512 attributes. (Ticket #7528)
Added
GOMAXPROCS
to the default list of environment variables that are set to the number of CPU cores allocated to the job. (Ticket #7418)Added the option for condor_dagman to remove jobs after reducing MaxJobs to a value lower than the number of currently running jobs. This behavior is controlled by the DAGMAN_REMOVE_JOBS_AFTER_LIMIT_CHANGE macro, which defaults to False. (Ticket #7368)
The new configuration parameter NEGOTIATOR_SUBMITTER_CONSTRAINT defines an expression which constrains which submitter ads are considered for matchmaking by the condor_negotiator. (Ticket #7490)
Removed the unused and always set to zero job attribute LocalUserCpu and LocalSysCpu (Ticket #7546)
condor_submit now treats
request_gpu
as a typo and suggests thatrequest_gpus
may have been what was intended. This is the same way that it treatsrequest_cpu
. (Ticket #7421)Feature to enhance the reliability of condor_ssh_to_job is now on by default: CONDOR_SSH_TO_JOB_FAKE_PASSWD_ENTRY is now
true
(Ticket #7536)Enhanced the dataflow jobs that we introduced in version 8.9.5. In addition to output files, we now also check the executable and stdin files. If any of these are newer than the input files, we consider this to be a dataflow job and we skip it if SHADOW_SKIP_DATAFLOW_JOBS set to
True
. (Ticket #7488)When HTCondor is running as root on a Linux machine, it now makes /dev/shm a private mount for jobs. This means that files written to /dev/shm in one job aren’t visible to other jobs, and that HTCondor now cleans up any leftover files in /dev/shm when the job exits. If you want to the old behavior of a shared /dev/shm, you can set MOUNT_PRIVATE_DEV_SHM to
false
. (Ticket #7443)When configuration parameter HAD_USE_PRIMARY is set to
True
, the collectors will be queried in the order in which they appear in HAD_LIST. Otherwise, the order in which the collectors are queried will be randomized (before, this was always done). (Ticket #7556)Added a very basic
PROVISIONER
node type to the condor_dagman parse language and plumbing. When this work is completed in a future release, it will allow users to provision remote compute resources (ie. Amazon EC2, Argonne Cooley) as part of their DAG workflows, then run their jobs on these resources. (Ticket #5622)A new attribute
ScratchDirFileCount
was added to the Job ClassAd and to the Startd ClassAd. It contains the number of files in the job sandbox for the current job. This attribute will be refreshed as the same time thatDiskUsage
is refreshed. (Ticket #7486)A new configuration macro SUBMIT_GENERATE_CUSTOM_RESOURCE_REQUIREMENTS can be used to disable the behavior of condor_submit to generate Requirements clauses for job attributes that begin with Request (Ticket #7513)
Made some performance improvements in the condor_collector. This includes new configuration parameter COLLECTOR_FORWARD_CLAIMED_PRIVATE_ADS, which reduces the amount of data forwarded between condor_collectors. (Ticket #7440) (Ticket #7423)
condor_install can now generate a script to set environment variables for the “fish” shell. (Ticket #7505)
Bugs Fixed:
The Box.com file transfer plugin now implements the chunked upload method, which means that uploads of 50 MB or greater are now possible. Prior to this implementation, jobs uploading large files would unexpectedly go on hold. (Ticket #7531)
The curl_plugin previously implemented a minimum speed timeout with an option flag that caused memory problems in older versions of libcurl. We’ve reimplemented timeouts now using a callback that manually enforces a minimum 1 byte/second transfer speed. (Ticket #7414)
Some URLs for keys in AWS S3 buckets were previously of the form
s3://<bucket>.s3-<region>.amazonaws.com/<key>
. Not all regions support this form of address; instead, you must use URLs of the forms3://<bucket>.s3.<region>.amazonaws.com/<key>
. HTCondor now allows and requires the latter; you will have to change older submit files. (Ticket #7517)Amazon’s S3 service used to allow bucket names with underscores or capital letters. HTCondor can now download from and upload to buckets with this sort of name. (Ticket #7477)
The condor_token family of tools now respect the
-debug
command line flag. (Ticket #7448)The condor_token_request_list tool now respects the
-reqid
flag. (Ticket #7448)Tokens with authorization limits no longer need to explicitly list the
ALLOW
authorization, fixing a regression from 8.9.4. (Ticket #7456)Fixed a bug where Kerberos principals were being set incorrectly when KERBEROS_SERVER_PRINCIPAL was set. (Ticket #7577)
The packaged versions of HTCondor automatically creates the directories to hold pool passwords, tokens, and Kerberos and OAuth credentials. (Ticket #7117)
The HTCondor central manager will generate a pool password if needed on startup or reconfiguration. (Ticket #7634)
Fixed a bug in reading service account credentials when submitting to Google Compute Engine (grid universe, grid-type
gce
). (Ticket #7555)To work around an issue where long-running gce_gahp process enter a state where they can no longer authenticate with GCE, the daemon now restarts once every 24 hours. This does not affect the jobs themselves. See GCE Configuration Variables. (Ticket #7401)
Fixed a bug that prevented the condor_schedd from effectively flocking to pools when resource request list prefetching is enabled, which is the default in HTCondor version 8.9 (Ticket #7549) (Ticket #7539)
It is now safe to call functions from the Python bindings
htcondor
module on multiple threads simultaneously. See the Thread Safety section in the Python bindings documentation for more details. (Ticket #7359)Our
htcondor.Submit.from_dag()
Python binding now throws an exception when it fails, giving the programmer a chance to catch and recover. Previously this just caused Python to fall over and die immediately. (Ticket #7337)The RPM packaging now obsoletes the standard universe package so that it will deleted upon upgrade. (Ticket #7444)
Restored setting RUNPATH instead of RPATH for the libcondor_utils shared library and the Python bindings. The accidental change to setting RPATH in 8.9.5 altered how libraries were found when
LD_LIBRARY_PATH
is set. (Ticket #7584)The location for the CA certificates on Debian and Ubuntu systems is now properly set. (Ticket #7569)
Fixed a bug where the condor_schedd and condor_negotiator couldn’t talk to each other if one was version 8.9.3 and the other was version 8.9.4 or later. (Ticket #7615)
Version 8.9.6¶
Release Notes:
HTCondor version 8.9.6 released on April 6, 2020.
New Features:
None.
Bugs Fixed:
Security Item: This release of HTCondor fixes security-related bugs described at
Version 8.9.5¶
Release Notes:
HTCondor version 8.9.5 released on January 2, 2020.
New Features:
Implemented a dataflow mode for jobs. When enabled, a job whose 1) pre-declared output files already exist, and 2) output files are more recent than its input files, is considered a dataflow job and gets skipped. This feature can be enabled by setting the SHADOW_SKIP_DATAFLOW_JOBS configuration option to
True
. (Ticket #7231)Added a new tool, classad_eval, that can evaluate a ClassAd expression in the context of ClassAd attributes, and print the result in ClassAd format. (Ticket #7339)
You may now specify ports to forward into your Docker container. See Docker and Networking for details. (Ticket #7322)
Added the ability to edit certain properties of a running condor_dagman workflow: MaxJobs, MaxIdle, MaxPreScripts, MaxPostScripts. A user can call condor_qedit to set new values in the job ad, which will then be updated in the running workflow. (Ticket #7236)
Jobs which must use temporary credentials for S3 access may now specify the “session token” in their submit files. Set
+EC2SessionToken
to the name of a file whose only content is the session token. Temporary credentials have a limited lifetime, which HTCondor does not help you manage; as a result, file transfers may fail because the temporary credentials expired. (Ticket #7407)Improved the performance of the negotiator by simplifying the definition of the condor_startd’s
WithinResourceLimits
attribute when custom resources are defined. (Ticket #7323)If you configure a condor_startd with different SLOT_TYPEs, you can use the SLOT_TYPE as a prefix for configuration entries. This can be useful to set different BASE_GROUPs for different slot types within the same condor_startd. For example,
SLOT_TYPE_1.BASE_CGROUP = hi_prio
(Ticket #7390)Added a new knob SUBMIT_ALLOW_GETENV. This defaults to
true
. When set tofalse
, a submit file with getenv = true will become an error. Administrators may want to set this tofalse
to prevent users from submitting jobs that depend on the local environment of the submit machine. (Ticket #7383)condor_submit will no longer set the
Owner
attribute of jobs it submits to the name of the current user. It now leaves this attribute up to the condor_schedd to set. This change was made because the condor_schedd will reject the submission if theOwner
attribute is set but does not match the name of the mapped authenticated user submitting the job, and it is difficult for condor_submit to know what the mapped name is when there is a map file configured. (Ticket #7355)Added ability for a condor_startd to log the state of Ads when shutting down using STARTD_PRINT_ADS_ON_SHUTDOWN and STARTD_PRINT_ADS_FILTER. (Ticket #7328)
Bugs Fixed:
condor_submit -i
now works with Docker universe jobs. (Ticket #7394)Fixed a bug that happened on a Linux condor_startd running as root where a running job getting close to the
RequestMemory
limit, could get stuck, and neither get held with an out of memory error, nor killed, nor allowed to run. (Ticket #7367)The Python 3 bindings no longer cause a segmentation fault when putting a
ClassAd
constructed from a Python dictionary into anotherClassAd
. (Ticket #7371)The Python 3 bindings were missing the division operator for
ExprTree
. (Ticket #7372)When calling
classad.ClassAd.setdefault()
without a default, or with a default of None, if the default is used, it is now treated as theclassad.Value.Undefined
ClassAd value. (Ticket #7370)Fixed a bug where when file transfers fail with an error message containing a newline (
\n
) character, the error message would not be propagated to the job’s hold message. (Ticket #7395)SciTokens support is now available on all Linux and MacOS platforms. (Ticket #7406)
Fixed a bug that caused the Python bindings included in the tarball package to fail due to a missing library dependency. (Ticket #7435)
Fixed a bug where the library that is pre-loaded to provide a sane passwd entry when using condor_ssh_to_job was placed in the wrong directory in the RPM packaging. (Ticket #7408)
Version 8.9.4¶
Release Notes:
HTCondor version 8.9.4 released on November 19, 2019.
The Python bindings are now packaged as extendable modules. (Ticket #6907)
The format of the aborted event has changed. This will only affect you if you’re not using one the readers provided by HTCondor. (Ticket #7191)
DAGMAN_USE_JOIN_NODES is now on by default. (Ticket #7271)
New Features:
HTCondor now supports secure download and upload to and from S3. See the condor_submit man page and File Transfer Using a URL. (Ticket #7289)
Reduced the memory needed for condor_dagman to load a DAG that has a large number of PARENT and CHILD statements. (Ticket #7170)
Optimized condor_dagman startup speed by removing unnecessary 3-second sleep. (Ticket #7273)
Added a new option to condor_q. -idle shows only idle jobs and their requested resources. (Ticket #7241)
SciTokens support is now available. (Ticket #7248)
Added a new tool, condor_evicted_files, to help users find files that HTCondor is holding on to for them (as a result of a job being evicted when
when_to_transfer_output = ON_EXIT_OR_EVICT
, or checkpointing whenCheckpointExitCode
is set). (Ticket #7038)Added
erase_output_and_error_on_restart
as a new submit command. It defaults totrue
; if set tofalse
, andwhen_to_transfer_output
isON_EXIT_OR_EVICT
, HTCondor will append to the output and error logs when the job restarts, instead of erasing them (and starting the logs over). This may make the output and error logs more useful when the job self-checkpoints. (Ticket #7189)Added
$(SUBMIT_TIME)
,$(YEAR)
,$(MONTH)
, and$(DAY)
as built-in submit variables. These expand to the time of submission. (Ticket #7283)GPU monitoring is now on by default. It reports
DeviceGPUsAverageUsage
andDeviceGPUsMemoryPeakUsage
for slots with GPUs assigned. These values are for the lifetime of the condor_startd. Also, we renamedGPUsUsage
toGPUsAverageUsage
because all other usage values are peaks. We also now report GPU memory usage in the job termination event. (Ticket #7201)Added new configuration parameter for execute machines, CONDOR_SSH_TO_JOB_FAKE_PASSWD_ENTRY, which defaults to
false
. Whentrue
, condor LD_PRELOADs into unprivileged sshd it condor_startd a special version of the Linux getpwnam() library call, which forces the user’s shell to /bin/bash and the home directory to the scratch directory. This allows condor_ssh_to_job to work on sites that don’t create login shells for slots users, or who want to run as nobody. (Ticket #7260)The
htcondor.Submit.from_dag()
static method in the Python bindings, which creates a Submit description from a DAG file, now supports keyword arguments (in addition to positional arguments), and theoptions
argument is now optional:dag_args = { "maxidle": 10, "maxpost": 5 } # with keyword arguments for filename and options dag_submit = htcondor.Submit.from_dag(filename = "mydagfile.dag", options = dag_args) # or like this, with no options dag_submit = htcondor.Submit.from_dag(filename = "mydagfile.dag")
Added an example of a multi-file plugin to transfer files from a locally mounted Gluster file system. This script is also designed to be a template for other file transfer plugins, as the logic to download or upload files is clearly indicated and could be easily changed to support different file services. (Ticket #7212)
Added a new multi-file transfer plugin for downloading files from Microsoft OneDrive user accounts. This supports URLs like “onedrive://path/to/file” and using the plugin requires the administrator configure the condor_credd to allow users to obtain Microsoft OneDrive tokens and requires the user request Microsoft OneDrive tokens in their submit file. (Ticket #7171)
Externally-issued SciTokens can be exchanged for an equivalent HTCondor-issued token, enabling authorization flows in some cases where SciTokens could not otherwise be used (such as when the remote daemon has no host certificate). (Ticket #7281)
The condor_annex tool will now check during setup for instance credentials if none were specified. (Ticket #7097)
The condor_schedd now keeps track of which submitters it has advertised to flocked pools. The condor_schedd will only honor matchmaking requests from flocked pool for submitters it did not advertise to the flock pool. This new logic only applies to auto-created authorizations (introduced in 8.9.3) and not NEGOTIATOR-level authorizations setup by pool administrators. (Ticket #7100)
Added Python bindings for the TOKEN request API. (Ticket #7162)
In addition to administrators, token requests can be approved by the user whose identity is requested. (Ticket #7159)
Bugs Fixed:
The curl_plugin now correctly advertises
file
andftp
as supported methods. (Ticket #7357)Fixed a bug where condor_ssh_to_job to a Docker universe job landed outside the container if the container had not completely started. (Ticket #7246)
Fixed a bug where Docker universe jobs were always hard-killed (sent SIGKILL). The appropriate signals are now being sent for hold, remove, and soft kill (defaulting to SIGTERM). This gives Docker jobs a chance to shut down cleanly. (Ticket #7247)
condor_submit and the python bindings
Submit
object will no longer treat submit commands that begin withrequest_<tag>
as custom resource requests unless<tag>
does not begin with an underscore, and is at least 2 characters long. (Ticket #7172)The python bindings
Submit
object now converts keys of the form+Attr
toMY.Attr
when setting and getting values into theSubmit
object. TheSubmit
object had been storing+Attr
keys and then converting these keys to the correctMY.Attr
form on an ad-hoc basis, this could lead to some very strange error conditions. (Ticket #7261)In some situations, notably with Amazon AWS, our curl_plugin requests URLs which return an HTTP 301 or 302 redirection but do not include a Location header. These were previously considered successful transfers. We’ve fixed this so they are now considered failures, and the jobs go on hold. (Ticket #7292)
Our curl_plugin is designed to partially retry downloads which did not complete successfully (HTTP Content-Length header reporting a different number than bytes downloaded). However partial retries do not work with some proxy servers, causing jobs to go on hold. We’ve updated the plugin to not attempt partial retries when a proxy is detected. (Ticket #7259)
The timeout for condor_ssh_to_job connection has been restored to the previous setting of 20 seconds. Shortening the timeout avoids getting into a deadlock between the condor_schedd, condor_starter, and condor_shadow. (Ticket #7193)
Fixed a performance issue in the curl_plugin, where our low-bandwidth timeout caused 100% CPU utilization due to an old libcurl bug. (Ticket #7316)
The Condor Connection Broker (CCB) will allow daemons to register at the
ADVERTISE_STARTD
,ADVERTISE_SCHEDD
, andADVERTISE_MASTER
authorization level. This reduces the minimum authorization needed by daemons that are located behind NATs. (Ticket #7225)
Version 8.9.3¶
Release Notes:
HTCondor version 8.9.3 released on September 12, 2019.
If you run a CCB server, please note that the default value for CCB_RECONNECT_FILE has changed. If your configuration does not set CCB_RECONNECT_FILE, CCB will forget about existing connections after you upgrade. To avoid this problem, set CCB_RECONNECT_FILE to its default path before upgrading. (Look in the
SPOOL
directory for a file ending in.ccb_reconnect
. If you don’t see one, you don’t have to do anything.) (Ticket #7135)The Log file specified by a job, and by the EVENT_LOG configuration variable will now have the year in the event time. Formerly, only the day and month were printed. This change makes these logs unreadable by versions of DAGMan and condor_wait that are older 8.8.4 or 8.9.2. The configuration variable DEFAULT_USERLOG_FORMAT_OPTIONS can be used to revert to the old time format or to opt in to UTC time and/or fractional seconds. (Ticket #6940)
The format of the terminated and aborted events has changed. This will only affect you if you’re not using one the readers provided by HTCondor. (Ticket #6984)
New Features:
TOKEN
authentication is enabled by default if the HTCondor administrator does not specify a preferred list of authentication methods. In this case,TOKEN
is only used if the user has at least one usable token available. (Ticket #7070) Similarly,SSL
authentication is enabled by default and used if there is a server certificate available. (Ticket #7074)The condor_collector daemon will automatically generate a pool password file at the location specified by SEC_PASSWORD_FILE if no file is already present. This should ease the setup of
TOKEN
andPOOL
authentication for a new HTCondor pool. (Ticket #7069)Added a new multifile transfer plugin for downloading and uploading files from/to Google Drive user accounts. This supports URLs like “gdrive://path/to/file” and using the plugin requires the administrator configure the condor_credd to allow users to obtain Google Drive tokens and requires the user request Google Drive tokens in their submit file. (Ticket #7136)
The Box.com multifile transfer plugin now supports uploads. The plugin will be used when a user lists a “box://path/to/file” URL as the output location of file when using
transfer_output_remaps
. (Ticket #7085)Added a Python binding for condor_submit_dag. A new method,
htcondor.Submit.from_dag()
class creates a Submit description based on a .dag file:dag_args = { "maxidle": 10, "maxpost": 5 } dag_submit = htcondor.Submit.from_dag("mydagfile.dag", dag_args)
The resulting
dag_submit
object can be submitted to a condor_schedd and monitored just like any other Submit description object in the Python bindings. (Ticket #6275)The Python binding’s
JobEventLog
can now be pickled and unpickled, allowing users to preserve job-reading progress between process restarts. (Ticket #6944)A number of ease-of-use changes were made for submitting jobs from Python. In the Python method
Schedd::queue_with_itemdata
, the keyword argument was renamed fromfrom
(which, unfortunately, is also a Python keyword) toitemdata
. (Ticket #7064) Both this method and theSubmit
object can now accept a wider range of objects, as long as they can be converted to strings. (Ticket #7065) TheSubmit
class’s constructor now behaves in the same way as a Python dictionary (Ticket #7067)The
Undefined
andError
values in Python no longer cast silently to integers. Previously,Undefined
andError
evaluated toTrue
when used in a conditional; now,Undefined
evaluates toFalse
and evaluatingError
results in aRuntimeError
exception. (Ticket #7109)Improved the speed of matchmaking in pools with partitionable slots by simplifying the slot’s WithinResourceLimits expression. This new definition for this expression now ignores the job’s _condor_RequestXXX attributes, which were never set. In pools with simple start expressions, this can double the speed of matchmaking. (Ticket #7131)
Improved the speed of matchmaking in pools that don’t support standard universe by unconditionally removing standard universe related expressions in the slot START expression. (Ticket #7123)
Reduced DAGMan’s memory footprint when running DAGs with nodes that use the same submit file and/or current working directory. (Ticket #7121)
The terminated and abort events now include “Tickets of Execution”, which specify when the job terminated, who requested the termination, and the mechanism used to make the request (as both a string an integer). This information is also present in the job ad (in the
ToE
attribute). Presently, tickets are only issued for normal job terminations (when the job terminated itself of its own accord), and for terminations resulting from theDEACTIVATE_CLAIM
command. We expect to support tickets for the other mechanisms in future releases. (Ticket #6984)Added new submit parameters
cloud_label_names
andcloud_label_<name>
, which allowing the setting of labels on the cloud instances created for gce grid jobs. (Ticket #6993)The condor_schedd automatically creates a security session for the negotiator if SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION is enabled (the default setting). HTCondor pool administrators no longer need to setup explicit authentication from the negotiator to the condor_schedd; any negotiator trusted by the collector is automatically trusted by the collector. (Ticket #6956)
Daemons will now print a warning in their log file when a client uses an X.509 credential for authentication that contains VOMS extensions that cannot be verified. These warnings can be silenced by setting configuration parameter USE_VOMS_ATTRIBUTES to
False
. (Ticket #5916)When submitting jobs to a multi-cluster Slurm configuration under the grid universe, the cluster to submit to can be specified using the
batch_queue
submit attribute (e.g.batch_queue = debug@cluster1
). (Ticket #7167)HTCondor now sets numerous environment variables to tell the job (or libraries being used by the job) how many CPU cores have been provisioned. Also added the configuration knob STARTER_NUM_THREADS_ENV_VARS to allow the administrator to customize this set of environment variables. (Ticket #7296)
Bugs Fixed:
Fixed a bug where condor_schedd would not start if the history file size, named by MAX_HISTORY_SIZE was more than 2 Gigabytes. (Ticket #7023)
The default CCB_RECONNECT_FILE name now includes the shared port ID instead of the port number, if available, which prevents multiple CCBs behind the same shared port from interfering with each other’s state file. (Ticket #7135)
Fixed a large memory leak when using SSL authentication. (Ticket #7145)
The
TOKEN
authentication method no longer fails if the/etc/condor/passwords.d
is missing. (Ticket #7138)Hostname-based verification for SSL now works more reliably from command-line tools. In some cases, the hostname was dropped internally in HTCondor, causing the SSL certificate verification to fail because only an IP address was available. (Ticket #7073)
Fixed a bug that could cause the condor_schedd to crash when handling a query for the slot ads that it has claimed. (Ticket #7210)
Eliminated needless work done by the condor_schedd when contacted by the negotiator when CURB_MATCHMAKING or MAX_JOBS_RUNNING prevent the condor_schedd from accepting any new matches. (Ticket #6749)
HTCondor’s Docker Universe jobs now more reliably disable the setuid capability from their jobs. Docker Universe has also done this, but the method used has recently changed, and the new way should work going forward. (Ticket #7111)
HTCondor users and daemons can request security tokens used for authentication. This allows the HTCondor pool administrator to simply approve or deny token requests instead of having to generate tokens and copy them between hosts. The condor_schedd and condor_startd will automatically request tokens from any collector they cannot authenticate with; authorizing these daemons can be done by simply having the collector administrator approve the request from the collector. Strong security for new pools can be bootstrapped by installing an auto-approval rule for host-based security while the pool is being installed. (Ticket #7006) (Ticket #7094) (Ticket #7080)
Changed the condor_annex default AMIs to run Docker jobs. As a result, they no longer default to encrypted execute directories. (Ticket #6690)
Improved the handling of parallel universe Docker jobs and the ability to rm and hold them. (Ticket #7076)
Singularity jobs no longer mount the user’s home directory by default. To re-enable this, set the knob
SINGULARITY_MOUNT_HOME = true
. (Ticket #6676)
Version 8.9.2¶
Release Notes:
HTCondor version 8.9.2 released on June 4, 2019.
The default setting for CREDD_OAUTH_MODE is now
true
. This only affects people who were using the condor_credd to manage Kerberos credentials in the SEC_CREDENTIAL_DIRECTORY. (Ticket #7046)
Known Issues:
This release introduces a large memory leak when SSL authentication fails. This will be fixed in the next release. (Ticket #7145)
New Features:
The default file transfer plugin for HTTP/HTTPS will timeout transfers that make no progress as opposed to waiting indefinitely. (Ticket #6971)
Added a new multifile transfer plugin for downloading files from Box.com user accounts. This supports URLs like “box://path/to/file” and using the plugin requires the administrator to configure the condor_credd to allow users to obtain Box.com tokens and requires the user request Box.com tokens in their submit file. (Ticket #7007)
The HTCondor manual has been migrated to Read the Docs. (Ticket #6908)
Python bindings docstrings have been improved. The Python built-in
help
function should now give better results on objects and function in the bindings. (Ticket #6953)The system administrator can now configure better time stamps for the global event log and for all jobs that specify a user log or DAGMan nodes log. There are two new configuration variables that control this; EVENT_LOG_FORMAT_OPTIONS controls the format of the global event log and DEFAULT_USERLOG_FORMAT_OPTIONS controls formatting of user log and DAGMan nodes logs. These configuration variables can individually enable UTC time, ISO 8601 time stamps, and fractional seconds. (Ticket #6941)
The implementation of SSL authentication has been made non-blocking, improving scalability and responsiveness when this method is used. (Ticket #6981)
SSL authentication no longer requires a client X509 certificate present in order to establish a security session. If no client certificate is available, then the client is mapped to the user
unauthenticated
. (Ticket #7032)During SSL authentication, clients now verify that the server hostname matches the host’s X509 certificate, using the rules from RFC 2818. This matches the behavior most users expected in the first place. To restore the prior behavior, where any valid certificate (regardless of hostname) is accepted by default, set SSL_SKIP_HOST_CHECK to
true
. (Ticket #7030)HTCondor will now utilize OpenSSL for random number generation when cryptographically secure (e.g., effectively impossible to guess beforehand) random numbers are needed. Previous random number generation always utilized a method that was not appropriate for cryptographic contexts. As a side-effect of this change, HTCondor can no longer be built without OpenSSL support. (Ticket #6990)
A new authentication method,
TOKEN
, has been added. This method provides the pool administrator with more fine-grained authorization control (making it appropriate for end-user use) and provides the ability for multiple pool passwords to exist within a single setup. (Ticket #6947)Authentication can be done using SciTokens. If the client saves the token to the file specified in SCITOKENS_FILE, that token will be used to authenticate with the remote server. Further, for HTCondor-C jobs, the token file can be specified by the job attribute
ScitokensFile
. (Ticket #7011)condor_submit and the python bindings submit now use a table to convert most submit keywords to job attributes. This should make adding new submit keywords in the future quicker and more reliable. (Ticket #7044)
File transfer plugins can now be supplied by the job. (Ticket #6855)
Add job ad attribute
JobDisconnectedDate
. When the condor_shadow and condor_starter are disconnected from each other, this attribute is set to the time at which the disconnection happened. (Ticket #6978)HTCondor EC2 components are now packaged for Debian and Ubuntu. (Ticket #7043)
Bugs Fixed:
condor_status -af:r now properly prints nested ClassAds. The handling of undefined attribute references has also been corrected, so that that they print
undefined
instead of the name of the undefined attribute. (Ticket #6979)X.509 proxies now work properly with job materialization. In particular, the job attributes describing the X.509 credential are now set properly. (Ticket #6972)
Argument names for all functions in the Python bindings (including class constructors and methods) have been normalized. We don’t expect any compatibility problems with existing code. (Ticket #6963)
In the Python bindings, the default argument for
use_tcp
inCollector.advertise
is nowTrue
(it was previouslyFalse
, which was very outdated). (Ticket #6983)Reduced the number of DNS resolutions that may be performed while establishing a network connection. Slow DNS queries could cause a connection to fail due to the peer timing out. (Ticket #6968)
Version 8.9.1¶
Release Notes:
HTCondor version 8.9.1 released on April 17, 2019.
New Features:
The deprecated
HOSTALLOW...
andHOSTDENY...
configuration knobs have been removed. Please useALLOW...
andDENY...
. (Ticket #6921)Implemented a new version of the curl_plugin with multi-file support, allowing it to transfer many files in a single invocation of the plugin. (Ticket #6499) (Ticket #6859)
The performance of HTCondor’s File Transfer mechanism has improved when sending multiple files, especially in wide-area network settings. (Ticket #6884)
Added support for passing HTTPS authentication credentials to file transfer plugins, using specially customized protocols. (Ticket #6858)
If a job requests GPUs and is a Docker Universe job, HTCondor automatically mounts the nVidia GPU devices. (Ticket #6910)
If a job requests GPUs, and Singularity is enabled, HTCondor automatically passes the -nv flag to Singularity to tell it to mount the nVidia GPUs. (Ticket #6898)
Added a new submit file option,
docker_network_type = host
, which causes a docker universe job to use the host’s network, instead of the default NATed interface. (Ticket #6906)Added a new configuration knob, DOCKER_EXTRA_ARGUMENTS, to allow administrators to add arbitrary docker command line options to the docker create command. (Ticket #6900)
We’ve added six new events to the job event log, recording details about file transfer. For both file transfer -in (before/to the job) and -out (after/from the job), we log if the transfer was queued, when it started, and when it finished. If the event was queued, the start event will note for how long; the first transfer event written will additionally include the starter’s address, which has not otherwise been printed.
We’ve also added several transfer-related attributes to the job ad. For jobs which do file transfer, we now set
JobCurrentFinishTransferOutputDate
, to complementJobCurrentStartTransferOutputDate
, as well as the corresponding attributes for input transfer:JobCurrentStartTransferInputDate
andJobCurrentFinishTransferInputDate
. The new attributes are added at the same time asJobCurrentStartTransferOutputDate
, that is, at job termination. This set of attributes use the older and more deceptive definitions of file transfer timing. To obtain the times recorded by the new events, instead referenceTransferInQueued
,TransferInStarted
,TransferInFinished
,TransferOutQueued
,TransferOutStarted
, andTransferOutFinished
. HTCondor sets these attributes (roughly) at the time they occur. (Ticket #6854)Added support for output file remaps for URLs. This allows users to specify a URL where they want individual output files to go, and once a job is complete, we automatically uploads the files there. We are preserving the older implementation (OutputDestination), which puts all output files in the same place, for backwards compatibility. (Ticket #6876)
Added options
f
(return full target string) andg
(perform multiple substitutions) to ClassAd functionregexps()
. Added new ClassAd functionsreplace()
(equivalent toregexps()
withf
option) andreplaceall()
(equivalent toregexps()
withfg
options). (Ticket #6848)When jobs are run without file transfer on, usually because there is a shared file system, HTCondor used to unconditionally set the jobs argv[0] to the string condor_exec.exe. This breaks jobs that look at their own argv[0], in ways that are very hard to debug. In this release of HTCondor, we no longer do this. (Ticket #6943)
Bugs Fixed:
Avoid killing jobs using between 90% and 99% of memory limit. (Ticket #6925)
Improved how
"Chirp"
handles a network disconnection between the condor_starter and condor_shadow."Chirp"
commands now return a error and no longer cause the condor_starter to exit (killing the job). (Ticket #6873)Fixed a bug that could cause condor_submit to send invalid job ClassAds to the condor_schedd when the executable attribute was not the same for all jobs in that submission. (Ticket #6719)
Version 8.9.0¶
Release Notes:
HTCondor version 8.9.0 released on February 28, 2019.
Known Issues:
This release may require configuration changes to work as before. During the 8.9 development series, we are making changes to make it easier to deploy secure pools. This release contains two security related configuration changes.
Absent any configuration, the default behavior is to deny authorization to all users.
In the configuration files, if
ALLOW_DAEMON
orDENY_DAEMON
are omitted,ALLOW_WRITE
orDENY_WRITE
are no longer used in their place.On most pools, the easiest way to get the previous behavior is to add the following to your configuration:
ALLOW_READ = * ALLOW_DAEMON = $(ALLOW_WRITE)
The main configuration file (
/etc/condor/condor_config
) already implements the above change by callinguse SECURITY : HOST_BASED
.With the addition of the automatic security session for a family of HTCondor daemons and the existing match password authentication between the execute and submit daemons, most hosts in a pool may not require changes to the configuration files. On the central manager, you do need to ensure
DAEMON
level access for your submit nodes. Also, CCB requiresDAEMON
level access.
New Features:
Changed the default security behavior to deny authorization by default. Also, neither
ALLOW_DAEMON
norDENY_DAEMON
fall back to using the correspondingALLOW_WRITE
orDENY_WRITE
when reading configuration files. (Ticket #6824)A family of HTCondor daemons can now share a security session that allows them to trust each other without doing a security negotiation when a network connection is made amongst them. This “family” security session can be disabled by setting the new configuration parameter SEC_USE_FAMILY_SESSION to
False
. (Ticket #6788)Scheduler Universe jobs now start in order of priority, instead of random order. This is most typically used for DAGMan. When running condor_submit_dag against a .dag file, you can use the -priority <N> flag to set the priority for the overall condor_dagman job. When the condor_schedd is starting new Scheduler Universe jobs, the highest priority queued job will start first. If all queued Scheduler Universe jobs have equal priority, they get started in order of submission. (Ticket #6703)
Normally, HTCondor requires the user to specify their credentials when using EC2 (via the grid universe or via condor_annex). This allows users to use different accounts from the same machine. However, if a user started an EC2 instance with the privileges necessary to start other instances, and ran HTCondor in that instance, HTCondor was unable to use that instance’s privileges; the user still had to specify their credentials. Instead, the user may now specify
FROM INSTANCE
instead of the name of a credential file to indicate that HTCondor should use the instance’s credentials.By default, any user with access to a privileged EC2 instance has access to that instance’s privileges. If you would like to make use of this feature, please read HTCondor Annex Customization Guide before adding privileges (an instance role) to an instance which allows access by other users, specifically including the submitting of jobs to or running jobs on that instance. (Ticket #6789)
The condor_now tool now supports vacating more than one job; the additional jobs’ resources will be coalesced into a single slot, on which the now-job will be run. (Ticket #6694)
In the Python bindings, the
JobEventLog
class now has aclose
method. It is also now its own iterable context manager (implements__enter__
and__exit__
). TheJobEvent
class now implements__str__
and__repr__
. (Ticket #6814)the condor_hdfs daemon which allowed the hdfs daemons to run under the condor_master has been removed from the contributed source. (Ticket #6809)
Bugs Fixed:
Fixed potential authentication failures between the condor_schedd and condor_startd when multiple condor_startd s are using the same shared port server. (Ticket #5604)