# Networking (includes sections on Port Usage and CCB)¶

This section on network communication in HTCondor discusses which network ports are used, how HTCondor behaves on machines with multiple network interfaces and IP addresses, and how to facilitate functionality in a pool that spans firewalls and private networks.

The security section of the manual contains some information that is relevant to the discussion of network communication which will not be duplicated here, so please see the Security section as well.

Firewalls, private networks, and network address translation (NAT) pose special problems for HTCondor. There are currently two main mechanisms for dealing with firewalls within HTCondor:

1. Restrict HTCondor to use a specific range of port numbers, and allow connections through the firewall that use any port within the range.

2. Use HTCondor Connection Brokering (CCB).

## Port Usage in HTCondor¶

### IPv4 Port Specification¶

The general form for IPv4 port specification is

<IP:port?param1name=value1&param2name=value2&param3name=value3&...>


These parameters and values are URL-encoded. This means any special character is encoded with %, followed by two hexadecimal digits specifying the ASCII value. Special characters are any non-alphanumeric character.

HTCondor currently recognizes the following parameters with an IPv4 port specification:

CCBID

Provides contact information for forming a CCB connection to a daemon, or a space separated list, if the daemon is registered with more than one CCB server. Each contact information is specified in the form of IP:port#ID. Note that spaces between list items will be URL encoded by %20.

PrivNet

Provides the name of the daemon’s private network. This value is specified in the configuration with PRIVATE_NETWORK_NAME.

sock

Provides the name of condor_shared_port daemon named socket.

PrivAddr

Provides the daemon’s private address in form of IP:port.

### Default Port Usage¶

Every HTCondor daemon listens on a network port for incoming commands. (Using condor_shared_port, this port may be shared between multiple daemons.) Most daemons listen on a dynamically assigned port. In order to send a message, HTCondor daemons and tools locate the correct port to use by querying the condor_collector, extracting the port number from the ClassAd. One of the attributes included in every daemon’s ClassAd is the full IP address and port number upon which the daemon is listening.

To access the condor_collector itself, all HTCondor daemons and tools must know the port number where the condor_collector is listening. The condor_collector is the only daemon with a well-known, fixed port. By default, HTCondor uses port 9618 for the condor_collector daemon. However, this port number can be changed (see below).

As an optimization for daemons and tools communicating with another daemon that is running on the same host, each HTCondor daemon can be configured to write its IP address and port number into a well-known file. The file names are controlled using the <SUBSYS>_ADDRESS_FILE configuration variables, as described in the DaemonCore Configuration File Entries section.

NOTE: In the 6.6 stable series, and HTCondor versions earlier than 6.7.5, the condor_negotiator also listened on a fixed, well-known port (the default was 9614). However, beginning with version 6.7.5, the condor_negotiator behaves like all other HTCondor daemons, and publishes its own ClassAd to the condor_collector which includes the dynamically assigned port the condor_negotiator is listening on. All HTCondor tools and daemons that need to communicate with the condor_negotiator will either use the NEGOTIATOR_ADDRESS_FILE or will query the condor_collector for the condor_negotiator ‘s ClassAd.

Sites that configure any checkpoint servers will introduce other fixed ports into their network. Each condor_ckpt_server will listen to 4 fixed ports: 5651, 5652, 5653, and 5654. There is currently no way to configure alternative values for any of these ports.

### Using a Non Standard, Fixed Port for the condor_collector¶

By default, HTCondor uses port 9618 for the condor_collector daemon. To use a different port number for this daemon, the configuration variables that tell HTCondor these communication details are modified. Instead of

CONDOR_HOST = machX.cs.wisc.edu
COLLECTOR_HOST = $(CONDOR_HOST)  the configuration might be CONDOR_HOST = machX.cs.wisc.edu COLLECTOR_HOST =$(CONDOR_HOST):9650


If a non standard port is defined, the same value of COLLECTOR_HOST (including the port) must be used for all machines in the HTCondor pool. Therefore, this setting should be modified in the global configuration file (condor_config file), or the value must be duplicated across all configuration files in the pool if a single configuration file is not being shared.

When querying the condor_collector for a remote pool that is running on a non standard port, any HTCondor tool that accepts the -pool argument can optionally be given a port number. For example:

COLLECTOR = $(SBIN)/condor_collector DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD  Now, if the cluster is set up so that it is possible for a machine name to never have a domain name (for example, there is machine name but no fully qualified domain name in /etc/hosts), configure DEFAULT_DOMAIN_NAME to be the domain that is to be added on to the end of the host name. ### A Client Machine with Multiple Interfaces¶ If client machine has two or more NICs, then there might be a specific network interface on which the client machine desires to communicate with the rest of the HTCondor pool. In this case, the local configuration file for the client should have NETWORK_INTERFACE = <IP address of desired interface>  ### A Checkpoint Server on a Machine with Multiple NICs¶ If a checkpoint server is on a machine with multiple interfaces, then 2 items must be correct to get things to work: 1. The different interfaces have different host names associated with them. 2. In the global configuration file, set configuration variable CKPT_SERVER_HOST to the host name that corresponds with the IP address desired for the pool. Configuration variable NETWORK_INTERFACE must still be specified in the local configuration file for the checkpoint server. ## HTCondor Connection Brokering (CCB)¶ HTCondor Connection Brokering, or CCB, is a way of allowing HTCondor components to communicate with each other when one side is in a private network or behind a firewall. Specifically, CCB allows communication across a private network boundary in the following scenario: an HTCondor tool or daemon (process A) needs to connect to an HTCondor daemon (process B), but the network does not allow a TCP connection to be created from A to B; it only allows connections from B to A. In this case, B may be configured to register itself with a CCB server that both A and B can connect to. Then when A needs to connect to B, it can send a request to the CCB server, which will instruct B to connect to A so that the two can communicate. As an example, consider an HTCondor execute node that is within a private network. This execute node’s condor_startd is process B. This execute node cannot normally run jobs submitted from a machine that is outside of that private network, because bi-directional connectivity between the submit node and the execute node is normally required. However, if both execute and submit machine can connect to the CCB server, if both are authorized by the CCB server, and if it is possible for the execute node within the private network to connect to the submit node, then it is possible for the submit node to run jobs on the execute node. To effect this CCB solution, the execute node’s condor_startd within the private network registers itself with the CCB server by setting the configuration variable CCB_ADDRESS . The submit node’s condor_schedd communicates with the CCB server, requesting that the execute node’s condor_startd open the TCP connection. The CCB server forwards this request to the execute node’s condor_startd, which opens the TCP connection. Once the connection is open, bi-directional communication is enabled. If the location of the execute and submit nodes is reversed with respect to the private network, the same idea applies: the submit node within the private network registers itself with a CCB server, such that when a job is running and the execute node needs to connect back to the submit node (for example, to transfer output files), the execute node can connect by going through CCB to request a connection. If both A and B are in separate private networks, then CCB alone cannot provide connectivity. However, if an incoming port or port range can be opened in one of the private networks, then the situation becomes equivalent to one of the scenarios described above and CCB can provide bi-directional communication given only one-directional connectivity. See Port Usage in HTCondor for information on opening port ranges. Also note that CCB works nicely with condor_shared_port. Any condor_collector may be used as a CCB server. There is no requirement that the condor_collector acting as the CCB server be the same condor_collector that a daemon advertises itself to (as with COLLECTOR_HOST). However, this is often a convenient choice. ### Example Configuration¶ This example assumes that there is a pool of machines in a private network that need to be made accessible from the outside, and that the condor_collector (and therefore CCB server) used by these machines is accessible from the outside. Accessibility might be achieved by a special firewall rule for the condor_collector port, or by being on a dual-homed machine in both networks. The configuration of variable CCB_ADDRESS on machines in the private network causes registration with the CCB server as in the example: CCB_ADDRESS =$(COLLECTOR_HOST)
PRIVATE_NETWORK_NAME = cs.wisc.edu


The definition of PRIVATE_NETWORK_NAME ensures that all communication between nodes within the private network continues to happen as normal, and without going through the CCB server. The name chosen for PRIVATE_NETWORK_NAME should be different from the private network name chosen for any HTCondor installations that will be communicating with this pool.

Under Unix, and with large HTCondor pools, it is also necessary to give the condor_collector acting as the CCB server a large enough limit of file descriptors. This may be accomplished with the configuration variable MAX_FILE_DESCRIPTORS or an equivalent. Each HTCondor process configured to use CCB with CCB_ADDRESS requires one persistent TCP connection to the CCB server. A typical execute node requires one connection for the condor_master, one for the condor_startd, and one for each running job, as represented by a condor_starter. A typical submit machine requires one connection for the condor_master, one for the condor_schedd, and one for each running job, as represented by a condor_shadow. If there will be no administrative commands required to be sent to the condor_master from outside of the private network, then CCB may be disabled in the condor_master by assigning MASTER.CCB_ADDRESS to nothing:

MASTER.CCB_ADDRESS =


Completing the count of TCP connections in this example: suppose the pool consists of 500 8-slot execute nodes and CCB is not disabled in the configuration of the condor_master processes. In this case, the count of needed file descriptors plus some extra for other transient connections to the collector is 500*(1+1+8)=5000. Be generous, and give it twice as many descriptors as needed by CCB alone:

COLLECTOR.MAX_FILE_DESCRIPTORS = 10000


### Security and CCB¶

The CCB server authorizes all daemons that register themselves with it (using CCB_ADDRESS ) at the DAEMON authorization level (these are playing the role of process A in the above description). It authorizes all connection requests (from process B) at the READ authorization level. As usual, whether process B authorizes process A to do whatever it is trying to do is up to the security policy for process B; from the HTCondor security model’s point of view, it is as if process A connected to process B, even though at the network layer, the reverse is true.

### Troubleshooting CCB¶

Errors registering with CCB or requesting connections via CCB are logged at level D_ALWAYS in the debugging log. These errors may be identified by searching for “CCB” in the log message. Command-line tools require the argument -debug for this information to be visible. To see details of the CCB protocol add D_FULLDEBUG to the debugging options for the particular HTCondor subsystem of interest. Or, add D_FULLDEBUG to ALL_DEBUG to get extra debugging from all HTCondor components.

A daemon that has successfully registered itself with CCB will advertise this fact in its address in its ClassAd. The ClassAd attribute MyAddress will contain information about its "CCBID".

### Scalability and CCB¶

Any number of CCB servers may be used to serve a pool of HTCondor daemons. For example, half of the pool could use one CCB server and half could use another. Or for redundancy, all daemons could use both CCB servers and then CCB connection requests will load-balance across them. Typically, the limit of how many daemons may be registered with a single CCB server depends on the authentication method used by the condor_collector for DAEMON-level and READ-level access, and on the amount of memory available to the CCB server. We are not able to provide specific recommendations at this time, but to give a very rough idea, a server class machine should be able to handle CCB service plus normal condor_collector service for a pool containing a few thousand slots without much trouble.

## Using TCP to Send Updates to the condor_collector¶

TCP sockets are reliable, connection-based sockets that guarantee the delivery of any data sent. However, TCP sockets are fairly expensive to establish, and there is more network overhead involved in sending and receiving messages.

UDP sockets are datagrams, and are not reliable. There is very little overhead in establishing or using a UDP socket, but there is also no guarantee that the data will be delivered. The lack of guaranteed delivery of UDP will negatively affect some pools, particularly ones comprised of machines across a wide area network (WAN) or highly-congested network links, where UDP packets are frequently dropped.

By default, HTCondor daemons will use TCP to send updates to the condor_collector, with the exception of the condor_collector forwarding updates to any condor_collector daemons specified in CONDOR_VIEW_HOST, where UDP is used. These configuration variables control the protocol used:

UPDATE_COLLECTOR_WITH_TCP

When set to False, the HTCondor daemons will use UDP to update the condor_collector, instead of the default TCP. Defaults to True.

UPDATE_VIEW_COLLECTOR_WITH_TCP

When set to True, the HTCondor collector will use TCP to forward updates to condor_collector daemons specified by CONDOR_VIEW_HOST, instead of the default UDP. Defaults to False.

TCP_UPDATE_COLLECTORS

A list of condor_collector daemons which will be updated with TCP instead of UDP, when UPDATE_COLLECTOR_WITH_TCP or UPDATE_VIEW_COLLECTOR_WITH_TCP is set to False.

When there are sufficient file descriptors, the condor_collector leaves established TCP sockets open, facilitating better performance. Subsequent updates can reuse an already open socket.

Each HTCondor daemon that sends updates to the condor_collector will have 1 socket open to it. So, in a pool with N machines, each of them running a condor_master, condor_schedd, and condor_startd, the condor_collector would need at least 3*N file descriptors. If the condor_collector is also acting as a CCB server, it will require an additional file descriptor for each registered daemon. In the default configuration, the number of file descriptors available to the condor_collector is 10240. For very large pools, the number of descriptor can be modified with the configuration:

COLLECTOR_MAX_FILE_DESCRIPTORS = 40960


If there are insufficient file descriptors for all of the daemons sending updates to the condor_collector, a warning will be printed in the condor_collector log file. The string "file descriptor safety level exceeded" identifies this warning.

## Running HTCondor on an IPv6 Network Stack¶

HTCondor supports using IPv4, IPv6, or both.

To require IPv4, you may set ENABLE_IPV4 to true; if the machine does not have an interface with an IPv4 address, HTCondor will not start. Likewise, to require IPv6, you may set ENABLE_IPV6 to true.

If you set ENABLE_IPV4 to false, HTCondor will not use IPv4, even if it is available; likewise for ENABLE_IPV6 and IPv6.

The default setting for ENABLE_IPV4 and ENABLE_IPV6 is auto. If HTCondor does not find an interface with an address of the corresponding protocol, that protocol will not be used. Additionally, if only one of the protocols has a private or public address, the other protocol will be disabled. For instance, a machine with a private IPv4 address and a loopback IPv6 address will only use IPv4; there’s no point trying to contact some other machine via IPv6 over a loopback interface.

If both IPv4 and IPv6 networking are enabled, HTCondor runs in mixed mode. In mixed mode, HTCondor daemons have at least one IPv4 address and at least one IPv6 address. Other daemons and the command-line tools choose between these addresses based on which protocols are enabled for them; if both are, they will prefer the first address listed by that daemon.

A daemon may be listening on one, some, or all of its machine’s addresses. (See NETWORK_INTERFACE) Daemons may presently list at most two addresses, one IPv6 and one IPv4. Each address is the “most public” address of its protocol; by default, the IPv6 address is listed first. HTCondor selects the “most public” address heuristically.

Nonetheless, there are two cases in which HTCondor may not use an IPv6 address when one is available:

• When given a literal IP address, HTCondor will use that IP address.

• When looking up a host name using DNS, HTCondor will use the first address whose protocol is enabled for the tool or daemon doing the look up.

You may force HTCondor to prefer IPv4 in all three of these situations by setting the macro PREFER_IPV4 to true; this is the default. With PREFER_IPV4 set, HTCondor daemons will list their “most public” IPv4 address first; prefer the IPv4 address when choosing from another’s daemon list; and prefer the IPv4 address when looking up a host name in DNS.

In practice, both an HTCondor pool’s central manager and any submit machines within a mixed mode pool must have both IPv4 and IPv6 addresses for both IPv4-only and IPv6-only condor_startd daemons to function properly.

### IPv6 and Host-Based Security¶

You may freely intermix IPv6 and IPv4 address literals. You may also specify IPv6 netmasks as a legal IPv6 address followed by a slash followed by the number of bits in the mask; or as the prefix of a legal IPv6 address followed by two colons followed by an asterisk. The latter is entirely equivalent to the former, except that it only allows you to (implicitly) specify mask bits in groups of sixteen. For example, fe8f:1234::/60 and fe8f:1234::* specify the same network mask.

The HTCondor security subsystem resolves names in the ALLOW and DENY lists and uses all of the resulting IP addresses. Thus, to allow or deny IPv6 addresses, the names must have IPv6 DNS entries (AAAA records), or NO_DNS must be enabled.

When you specify an IPv6 address and a port number simultaneously, you must separate the IPv6 address from the port number by placing square brackets around the address. For instance:

COLLECTOR_HOST = [2607:f388:1086:0:21e:68ff:fe0f:6462]:5332


If you do not (or may not) specify a port, do not use the square brackets. For instance:

NETWORK_INTERFACE = 1234:5678::90ab


### IPv6 without DNS¶

When using the configuration variable NO_DNS , IPv6 addresses are turned into host names by taking the IPv6 address, changing colons to dashes, and appending \$(DEFAULT_DOMAIN_NAME). So,

2607:f388:1086:0:21b:24ff:fedf:b520


becomes

2607-f388-1086-0-21b-24ff-fedf-b520.example.com


assuming

DEFAULT_DOMAIN_NAME=example.com
`