Networking (includes sections on Port Usage and CCB)
This section on network communication in HTCondor discusses which network ports are used, how HTCondor behaves on machines with multiple network interfaces and IP addresses, and how to facilitate functionality in a pool that spans firewalls and private networks.
The security section of the manual contains some information that is relevant to the discussion of network communication which will not be duplicated here, so please see the Security section as well.
Firewalls, private networks, and network address translation (NAT) pose special problems for HTCondor. There are currently two main mechanisms for dealing with firewalls within HTCondor:
Restrict HTCondor to use a specific range of port numbers, and allow connections through the firewall that use any port within the range.
Use HTCondor Connection Brokering (CCB).
Each method has its own advantages and disadvantages, as described below.
Port Usage in HTCondor
IPv4 Port Specification
The general form for IPv4 port specification is
These parameters and values are URL-encoded. This means any special character is encoded with %, followed by two hexadecimal digits specifying the ASCII value. Special characters are any non-alphanumeric character.
HTCondor currently recognizes the following parameters with an IPv4 port specification:
Provides contact information for forming a CCB connection to a daemon, or a space separated list, if the daemon is registered with more than one CCB server. Each contact information is specified in the form of IP:port#ID. Note that spaces between list items will be URL encoded by
Provides the name of the daemon’s private network. This value is specified in the configuration with
Provides the name of condor_shared_port daemon named socket.
Provides the daemon’s private address in form of
Default Port Usage
Every HTCondor daemon listens on a network port for incoming commands. (Using condor_shared_port, this port may be shared between multiple daemons.) Most daemons listen on a dynamically assigned port. In order to send a message, HTCondor daemons and tools locate the correct port to use by querying the condor_collector, extracting the port number from the ClassAd. One of the attributes included in every daemon’s ClassAd is the full IP address and port number upon which the daemon is listening.
To access the condor_collector itself, all HTCondor daemons and tools must know the port number where the condor_collector is listening. The condor_collector is the only daemon with a well-known, fixed port. By default, HTCondor uses port 9618 for the condor_collector daemon. However, this port number can be changed (see below).
As an optimization for daemons and tools communicating with another
daemon that is running on the same host, each HTCondor daemon can be
configured to write its IP address and port number into a well-known
file. The file names are controlled using the
configuration variables, as described in the
DaemonCore Configuration File Entries
NOTE: In the 6.6 stable series, and HTCondor versions earlier than 6.7.5, the condor_negotiator also listened on a fixed, well-known port (the default was 9614). However, beginning with version 6.7.5, the condor_negotiator behaves like all other HTCondor daemons, and publishes its own ClassAd to the condor_collector which includes the dynamically assigned port the condor_negotiator is listening on. All HTCondor tools and daemons that need to communicate with the condor_negotiator will either use the NEGOTIATOR_ADDRESS_FILE or will query the condor_collector for the condor_negotiator ‘s ClassAd.
Using a Non Standard, Fixed Port for the condor_collector
By default, HTCondor uses port 9618 for the condor_collector daemon. To use a different port number for this daemon, the configuration variables that tell HTCondor these communication details are modified. Instead of
CONDOR_HOST = machX.cs.wisc.edu COLLECTOR_HOST = $(CONDOR_HOST)
the configuration might be
CONDOR_HOST = machX.cs.wisc.edu COLLECTOR_HOST = $(CONDOR_HOST):9650
If a non standard port is defined, the same value of COLLECTOR_HOST
(including the port) must be used for all machines in the HTCondor pool.
Therefore, this setting should be modified in the global configuration
condor_config file), or the value must be duplicated across
all configuration files in the pool if a single configuration file is
not being shared.
When querying the condor_collector for a remote pool that is running on a non standard port, any HTCondor tool that accepts the -pool argument can optionally be given a port number. For example:
$ condor_status -pool foo.bar.org:1234
Using a Dynamically Assigned Port for the condor_collector
On single machine pools, it is permitted to configure the condor_collector daemon to use a dynamically assigned port, as given out by the operating system. This prevents port conflicts with other services on the same machine. However, a dynamically assigned port is only to be used on single machine HTCondor pools, and only if the COLLECTOR_ADDRESS_FILE configuration variable has also been defined. This mechanism allows all of the HTCondor daemons and tools running on the same machine to find the port upon which the condor_collector daemon is listening, even when this port is not defined in the configuration file and is not known in advance.
To enable the condor_collector daemon to use a dynamically assigned
port, the port number is set to 0 in the
COLLECTOR_HOST variable. The
configuration variable must also be defined, as it provides a known file
where the IP address and port information will be stored. All HTCondor
clients know to look at the information stored in this file. For
COLLECTOR_HOST = $(CONDOR_HOST):0 COLLECTOR_ADDRESS_FILE = $(LOG)/.collector_address
Restricting Port Usage to Operate with Firewalls
If an HTCondor pool is completely behind a firewall, then no special consideration or port usage is needed. However, if there is a firewall between the machines within an HTCondor pool, then configuration variables may be set to force the usage of specific ports, and to utilize a specific range of ports.
By default, HTCondor uses port 9618 for the condor_collector daemon, and dynamic (apparently random) ports for everything else. See Port Usage in HTCondor, if a dynamically assigned port is desired for the condor_collector daemon.
All of the HTCondor daemons on a machine may be configured to share a single port. See the condor_shared_port Configuration File Macros section for more information.
The configuration variables HIGHPORT and
LOWPORT facilitate setting a restricted range
of ports that HTCondor will use. This may be useful when some machines
are behind a firewall. The configuration macros
LOWPORT will restrict dynamic ports to the range specified. The
configuration variables are fully defined in the
Network-Related Configuration File Entries section. All of these ports must be greater than 0 and less than 65,536.
Note that both
LOWPORT must be at least 1024 for HTCondor
version 6.6.8. In general, use ports greater than 1024, in order to avoid port
conflicts with standard services on the machine. Another reason for
using ports greater than 1024 is that daemons and tools are often not
run as root, and only root may listen to a port lower than 1024. Also,
the range must include enough ports that are not in use, or HTCondor
The range of ports assigned may be restricted based on incoming (listening) and outgoing (connect) ports with the configuration variables IN_HIGHPORT, IN_LOWPORT, OUT_HIGHPORT, and OUT_LOWPORT See the Network-Related Configuration File Entries section for complete definitions of these configuration variables. A range of ports lower than 1024 for daemons running as root is appropriate for incoming ports, but not for outgoing ports. The use of ports below 1024 (versus above 1024) has security implications; therefore, it is inappropriate to assign a range that crosses the 1024 boundary.
LOWPORT will not automatically force
the condor_collector to bind to a port within the range. The only way
to control what port the condor_collector uses is by setting the
COLLECTOR_HOST (as described above).
The total number of ports needed depends on the size of the pool, the usage of the machines within the pool (which machines run which daemons), and the number of jobs that may execute at one time. Here we discuss how many ports are used by each participant in the system. This assumes that condor_shared_port is not being used. If it is being used, then all daemons can share a single incoming port.
The central manager of the pool needs
5 + number of condor_schedd daemons ports for outgoing connections
and 2 ports for incoming connections for daemon communication.
Each execute machine (those machines running a condor_startd daemon) requires `` 5 + (5 * number of slots advertised by that machine)`` ports. By default, the number of slots advertised will equal the number of physical CPUs in that machine.
Submit machines (those machines running a condor_schedd daemon) require `` 5 + (5 * MAX_JOBS_RUNNING``) ports. The configuration variable MAX_JOBS_RUNNING limits (on a per-machine basis, if desired) the maximum number of jobs. Without this configuration macro, the maximum number of jobs that could be simultaneously executing at one time is a function of the number of reachable execute machines.
Also be aware that
LOWPORT only impact dynamic port
selection used by the HTCondor system, and they do not impact port
selection used by jobs submitted to HTCondor. Thus, jobs submitted to
HTCondor that may create network connections may not work in a port
restricted environment. For this reason, specifying
LOWPORT is not going to produce the expected results if a user
submits MPI applications to be executed under the parallel universe.
Where desired, a local configuration for machines not behind a firewall
can override the usage of
LOWPORT, such that the
ports used for these machines are not restricted. This can be
accomplished by adding the following to the local configuration file of
those machines not behind a firewall:
HIGHPORT = UNDEFINED LOWPORT = UNDEFINED
If the maximum number of ports allocated using
LOWPORT is too few, socket binding errors of the form
failed to bind any port within <$LOWPORT> - <$HIGHPORT>
are likely to appear repeatedly in log files.
This section has not yet been written
This section has not yet been written
Configuring HTCondor for Machines With Multiple Network Interfaces
HTCondor can run on machines with multiple network interfaces. Starting with HTCondor version 6.7.13 (and therefore all HTCondor 6.8 and more recent versions), new functionality is available that allows even better support for multi-homed machines, using the configuration variable BIND_ALL_INTERFACES. A multi-homed machine is one that has more than one NIC (Network Interface Card). Further improvements to this new functionality will remove the need for any special configuration in the common case. For now, care must still be given to machines with multiple NICs, even when using this new configuration variable.
Machines can be configured such that whenever HTCondor daemons or tools
bind(), the daemons or tools use all network interfaces on the
machine. This means that outbound connections will always use the
appropriate network interface to connect to a remote host, instead of
being forced to use an interface that might not have a route to the
given destination. Furthermore, sockets upon which a daemon listens for
incoming connections will be bound to all network interfaces on the
machine. This means that so long as remote clients know the right port,
they can use any IP address on the machine and still contact a given
This functionality is on by default. To disable this functionality, the
boolean configuration variable BIND_ALL_INTERFACES is defined and
BIND_ALL_INTERFACES = FALSE
This functionality has limitations. Here are descriptions of the limitations.
- Using all network interfaces does not work with Kerberos.
Every Kerberos ticket contains a specific IP address within it. Authentication over a socket (using Kerberos) requires the socket to also specify that same specific IP address. Use of
BIND_ALL_INTERFACEScauses outbound connections from a multi-homed machine to originate over any of the interfaces. Therefore, the IP address of the outbound connection and the IP address in the Kerberos ticket will not necessarily match, causing the authentication to fail. Sites using Kerberos authentication on multi-homed machines are strongly encouraged not to enable
BIND_ALL_INTERFACES, at least until HTCondor’s Kerberos functionality supports using multiple Kerberos tickets together with finding the right one to match the IP address a given socket is bound to.
- There is a potential security risk.
Consider the following example of a security risk. A multi-homed machine is at a network boundary. One interface is on the public Internet, while the other connects to a private network. Both the multi-homed machine and the private network machines comprise an HTCondor pool. If the multi-homed machine enables
BIND_ALL_INTERFACES, then it is at risk from hackers trying to compromise the security of the pool. Should this multi-homed machine be compromised, the entire pool is vulnerable. Most sites in this situation would run an sshd on the multi-homed machine so that remote users who wanted to access the pool could log in securely and use the HTCondor tools directly. In this case, remote clients do not need to use HTCondor tools running on machines in the public network to access the HTCondor daemons on the multi-homed machine. Therefore, there is no reason to have HTCondor daemons listening on ports on the public Internet, causing a potential security threat.
- Up to two IP addresses will be advertised.
At present, even though a given HTCondor daemon will be listening to ports on multiple interfaces, each with their own IP address, there is currently no mechanism for that daemon to advertise all of the possible IP addresses where it can be contacted. Therefore, HTCondor clients (other HTCondor daemons or tools) will not necessarily able to locate and communicate with a given daemon running on a multi-homed machine where
BIND_ALL_INTERFACEShas been enabled.
Currently, HTCondor daemons can only advertise two IP addresses in the ClassAd they send to their condor_collector. One is the public IP address and the other is the private IP address. HTCondor tools and other daemons that wish to connect to the daemon will use the private IP address if they are configured with the same private network name, and they will use the public IP address otherwise. So, even if the daemon is listening on 3 or more different interfaces, each with a separate IP, the daemon must choose which two IP addresses to advertise so that other daemons and tools can connect to it.
By default, HTCondor advertises the most public IP address available on the machine. The NETWORK_INTERFACE configuration variable can be used to specify the public IP address HTCondor should advertise, and PRIVATE_NETWORK_INTERFACE, along with PRIVATE_NETWORK_NAME can be used to specify the private IP address to advertise.
Sites that make heavy use of private networks and multi-homed machines should consider if using the HTCondor Connection Broker, CCB, is right for them. More information about CCB and HTCondor can be found in the HTCondor Connection Brokering (CCB) section.
Central Manager with Two or More NICs
Often users of HTCondor wish to set up compute farms where there is one machine with two network interface cards (one for the public Internet, and one for the private net). It is convenient to set up the head node as a central manager in most cases and so here are the instructions required to do so.
Setting up the central manager on a machine with more than one NIC can be a little confusing because there are a few external variables that could make the process difficult. One of the biggest mistakes in getting this to work is that either one of the separate interfaces is not active, or the host/domain names associated with the interfaces are incorrectly configured.
Given that the interfaces are up and functioning, and they have good host/domain names associated with them here is how to configure HTCondor:
In this example,
farm-server.farm.org maps to the private interface.
In the central manager’s global (to the cluster) configuration file:
CONDOR_HOST = farm-server.farm.org
In the central manager’s local configuration file:
NETWORK_INTERFACE = <IP address of farm-server.farm.org> NEGOTIATOR = $(SBIN)/condor_negotiator COLLECTOR = $(SBIN)/condor_collector DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD
Now, if the cluster is set up so that it is possible for a machine name
to never have a domain name (for example, there is machine name but no
fully qualified domain name in
DEFAULT_DOMAIN_NAME to be the domain that is to be added on
to the end of the host name.
A Client Machine with Multiple Interfaces
If client machine has two or more NICs, then there might be a specific network interface on which the client machine desires to communicate with the rest of the HTCondor pool. In this case, the local configuration file for the client should have
NETWORK_INTERFACE = <IP address of desired interface>
HTCondor Connection Brokering (CCB)
HTCondor Connection Brokering, or CCB, is a way of allowing HTCondor components to communicate with each other when one side is in a private network or behind a firewall. Specifically, CCB allows communication across a private network boundary in the following scenario: an HTCondor tool or daemon (process A) needs to connect to an HTCondor daemon (process B), but the network does not allow a TCP connection to be created from A to B; it only allows connections from B to A. In this case, B may be configured to register itself with a CCB server that both A and B can connect to. Then when A needs to connect to B, it can send a request to the CCB server, which will instruct B to connect to A so that the two can communicate.
As an example, consider an HTCondor execute node that is within a private network. This execute node’s condor_startd is process B. This execute node cannot normally run jobs submitted from a machine that is outside of that private network, because bi-directional connectivity between the submit node and the execute node is normally required. However, if both execute and access point can connect to the CCB server, if both are authorized by the CCB server, and if it is possible for the execute node within the private network to connect to the submit node, then it is possible for the submit node to run jobs on the execute node.
To effect this CCB solution, the execute node’s condor_startd within the private network registers itself with the CCB server by setting the configuration variable CCB_ADDRESS. The submit node’s condor_schedd communicates with the CCB server, requesting that the execute node’s condor_startd open the TCP connection. The CCB server forwards this request to the execute node’s condor_startd, which opens the TCP connection. Once the connection is open, bi-directional communication is enabled.
If the location of the execute and submit nodes is reversed with respect to the private network, the same idea applies: the submit node within the private network registers itself with a CCB server, such that when a job is running and the execute node needs to connect back to the submit node (for example, to transfer output files), the execute node can connect by going through CCB to request a connection.
If both A and B are in separate private networks, then CCB alone cannot provide connectivity. However, if an incoming port or port range can be opened in one of the private networks, then the situation becomes equivalent to one of the scenarios described above and CCB can provide bi-directional communication given only one-directional connectivity. See Port Usage in HTCondor for information on opening port ranges. Also note that CCB works nicely with condor_shared_port.
Any condor_collector may be used as a CCB server. There is no
requirement that the condor_collector acting as the CCB server be the
same condor_collector that a daemon advertises itself to (as with
COLLECTOR_HOST). However, this is often a convenient choice.
This example assumes that there is a pool of machines in a private network that need to be made accessible from the outside, and that the condor_collector (and therefore CCB server) used by these machines is accessible from the outside. Accessibility might be achieved by a special firewall rule for the condor_collector port, or by being on a dual-homed machine in both networks.
The configuration of variable CCB_ADDRESS on machines in the private network causes registration with the CCB server as in the example:
CCB_ADDRESS = $(COLLECTOR_HOST) PRIVATE_NETWORK_NAME = cs.wisc.edu
The definition of
PRIVATE_NETWORK_NAME ensures that all
communication between nodes within the private network continues to
happen as normal, and without going through the CCB server. The name
PRIVATE_NETWORK_NAME should be different from the private
network name chosen for any HTCondor installations that will be
communicating with this pool.
Under Unix, and with large HTCondor pools, it is also necessary to give
the condor_collector acting as the CCB server a large enough limit of
file descriptors. This may be accomplished with the configuration
variable MAX_FILE_DESCRIPTORS or
an equivalent. Each HTCondor process configured to use CCB with
CCB_ADDRESS requires one persistent TCP connection to the CCB
server. A typical execute node requires one connection for the
condor_master, one for the condor_startd, and one for each running
job, as represented by a condor_starter. A typical access point
requires one connection for the condor_master, one for the
condor_schedd, and one for each running job, as represented by a
condor_shadow. If there will be no administrative commands required
to be sent to the condor_master from outside of the private network,
then CCB may be disabled in the condor_master by assigning
MASTER.CCB_ADDRESS to nothing:
Completing the count of TCP connections in this example: suppose the pool consists of 500 8-slot execute nodes and CCB is not disabled in the configuration of the condor_master processes. In this case, the count of needed file descriptors plus some extra for other transient connections to the collector is 500*(1+1+8)=5000. Be generous, and give it twice as many descriptors as needed by CCB alone:
COLLECTOR.MAX_FILE_DESCRIPTORS = 10000
Security and CCB
The CCB server authorizes all daemons that register themselves with it (using CCB_ADDRESS) at the DAEMON authorization level (these are playing the role of process A in the above description). It authorizes all connection requests (from process B) at the READ authorization level. As usual, whether process B authorizes process A to do whatever it is trying to do is up to the security policy for process B; from the HTCondor security model’s point of view, it is as if process A connected to process B, even though at the network layer, the reverse is true.
Errors registering with CCB or requesting connections via CCB are logged
D_ALWAYS in the debugging log. These errors may be
identified by searching for “CCB” in the log message. Command-line tools
require the argument -debug for this information to be visible. To
see details of the CCB protocol add
D_FULLDEBUG to the debugging
options for the particular HTCondor subsystem of interest. Or, add
ALL_DEBUG to get extra debugging from all
A daemon that has successfully registered itself with CCB will advertise
this fact in its address in its ClassAd. The ClassAd attribute
MyAddress will contain information about its
Scalability and CCB
Any number of CCB servers may be used to serve a pool of HTCondor daemons. For example, half of the pool could use one CCB server and half could use another. Or for redundancy, all daemons could use both CCB servers and then CCB connection requests will load-balance across them. Typically, the limit of how many daemons may be registered with a single CCB server depends on the authentication method used by the condor_collector for DAEMON-level and READ-level access, and on the amount of memory available to the CCB server. We are not able to provide specific recommendations at this time, but to give a very rough idea, a server class machine should be able to handle CCB service plus normal condor_collector service for a pool containing a few thousand slots without much trouble.
Using TCP to Send Updates to the condor_collector
TCP sockets are reliable, connection-based sockets that guarantee the delivery of any data sent. However, TCP sockets are fairly expensive to establish, and there is more network overhead involved in sending and receiving messages.
UDP sockets are datagrams, and are not reliable. There is very little overhead in establishing or using a UDP socket, but there is also no guarantee that the data will be delivered. The lack of guaranteed delivery of UDP will negatively affect some pools, particularly ones comprised of machines across a wide area network (WAN) or highly-congested network links, where UDP packets are frequently dropped.
By default, HTCondor daemons will use TCP to send updates to the condor_collector, with the exception of the condor_collector forwarding updates to any condor_collector daemons specified in CONDOR_VIEW_HOST, where UDP is used. These configuration variables control the protocol used:
When set to
False, the HTCondor daemons will use UDP to update the condor_collector, instead of the default TCP. Defaults to
When set to
True, the HTCondor collector will use TCP to forward updates to condor_collector daemons specified by
CONDOR_VIEW_HOST, instead of the default UDP. Defaults to
A list of condor_collector daemons which will be updated with TCP instead of UDP, when
UPDATE_VIEW_COLLECTOR_WITH_TCPis set to
When there are sufficient file descriptors, the condor_collector leaves established TCP sockets open, facilitating better performance. Subsequent updates can reuse an already open socket.
Each HTCondor daemon that sends updates to the condor_collector will have 1 socket open to it. So, in a pool with N machines, each of them running a condor_master, condor_schedd, and condor_startd, the condor_collector would need at least 3*N file descriptors. If the condor_collector is also acting as a CCB server, it will require an additional file descriptor for each registered daemon. In the default configuration, the number of file descriptors available to the condor_collector is 10240. For very large pools, the number of descriptor can be modified with the configuration:
COLLECTOR_MAX_FILE_DESCRIPTORS = 40960
If there are insufficient file descriptors for all of the daemons
sending updates to the condor_collector, a warning will be printed in
the condor_collector log file. The string
"file descriptor safety level exceeded" identifies this warning.
Running HTCondor on an IPv6 Network Stack
HTCondor supports using IPv4, IPv6, or both.
The default setting for ENABLE_IPV4 and
auto. If HTCondor does
not find an interface with an address of the corresponding protocol,
that protocol will not be used. Additionally, if only one of the
protocols has a private or public address, the other protocol will be
disabled. For instance, a machine with a private IPv4 address and a
loopback IPv6 address will only use IPv4; there’s no point trying to
contact some other machine via IPv6 over a loopback interface.
If both IPv4 and IPv6 networking are enabled, HTCondor runs in mixed mode. In mixed mode, HTCondor daemons have at least one IPv4 address and at least one IPv6 address. Other daemons and the command-line tools choose between these addresses based on which protocols are enabled for them; if both are, they will prefer the first address listed by that daemon.
A daemon may be listening on one, some, or all of its machine’s addresses. NETWORK_INTERFACE Daemons may presently list at most two addresses, one IPv6 and one IPv4. Each address is the “most public” address of its protocol; by default, the IPv6 address is listed first. HTCondor selects the “most public” address heuristically.
Nonetheless, there are two cases in which HTCondor may not use an IPv6 address when one is available:
When given a literal IP address, HTCondor will use that IP address.
When looking up a host name using DNS, HTCondor will use the first address whose protocol is enabled for the tool or daemon doing the look up.
You may force HTCondor to prefer IPv4 in all three of these situations by setting the macro PREFER_IPV4 to true; this is the default. With PREFER_IPV4 set, HTCondor daemons will list their “most public” IPv4 address first; prefer the IPv4 address when choosing from another’s daemon list; and prefer the IPv4 address when looking up a host name in DNS.
In practice, both an HTCondor pool’s central manager and any submit machines within a mixed mode pool must have both IPv4 and IPv6 addresses for both IPv4-only and IPv6-only condor_startd daemons to function properly.
IPv6 and Host-Based Security
You may freely intermix IPv6 and IPv4 address literals. You may also
specify IPv6 netmasks as a legal IPv6 address followed by a slash
followed by the number of bits in the mask; or as the prefix of a legal
IPv6 address followed by two colons followed by an asterisk. The latter
is entirely equivalent to the former, except that it only allows you to
(implicitly) specify mask bits in groups of sixteen. For example,
fe8f:1234::* specify the same network mask.
The HTCondor security subsystem resolves names in the ALLOW and DENY
lists and uses all of the resulting IP addresses. Thus, to allow or deny
IPv6 addresses, the names must have IPv6 DNS entries (AAAA records), or
NO_DNS must be enabled.
IPv6 Address Literals
When you specify an IPv6 address and a port number simultaneously, you must separate the IPv6 address from the port number by placing square brackets around the address. For instance:
COLLECTOR_HOST = [2607:f388:1086:0:21e:68ff:fe0f:6462]:5332
If you do not (or may not) specify a port, do not use the square brackets. For instance:
NETWORK_INTERFACE = 1234:5678::90ab
IPv6 without DNS
When using the configuration variable NO_DNS,
IPv6 addresses are turned into host names by taking the IPv6 address,
changing colons to dashes, and appending