GridManager Configuration Options

These macros affect the condor_gridmanager.

GRIDMANAGER_LOG

Defines the path and file name for the log of the condor_gridmanager. The owner of the file is the condor user.

GRIDMANAGER_CHECKPROXY_INTERVAL

The number of seconds between checks for an updated X.509 proxy credential. The default is 10 minutes (600 seconds).

GRIDMANAGER_PROXY_REFRESH_TIME

For remote schedulers that allow for X.509 proxy refresh, the condor_gridmanager will not forward a refreshed proxy until the lifetime left for the proxy on the remote machine falls below this value. The value is in seconds and the default is 21600 (6 hours).
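The refresh rule comes down to a single comparison. The following is a minimal Python sketch of that comparison, not HTCondor code. The function name `should_forward_proxy` is hypothetical, and the sketch assumes a refreshed proxy is already available locally.

```python
# Hypothetical model of the GRIDMANAGER_PROXY_REFRESH_TIME rule.
# should_forward_proxy is an illustrative name, not an HTCondor API.

PROXY_REFRESH_TIME = 21600  # documented default: 6 hours, in seconds


def should_forward_proxy(remote_lifetime_left: int,
                         refresh_time: int = PROXY_REFRESH_TIME) -> bool:
    """Return True if a refreshed proxy should be sent to the remote scheduler.

    Assumes a newer proxy is already available locally. It is forwarded only
    once the remaining lifetime of the remote copy drops below refresh_time.
    """
    return remote_lifetime_left < refresh_time


if __name__ == "__main__":
    print(should_forward_proxy(10 * 3600))  # 10 hours left remotely: False
    print(should_forward_proxy(5 * 3600))   # 5 hours left remotely: True
```

With the default, a remote proxy that still has more than 6 hours left is not replaced. This avoids sending a new proxy every time one is refreshed locally.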

GRIDMANAGER_MINIMUM_PROXY_TIME

The minimum number of seconds before expiration of the X.509 proxy credential for the condor_gridmanager to continue operating. If the number of seconds until expiration is less than this value, the condor_gridmanager shuts down and waits for a refreshed proxy credential. The default is 3 minutes (180 seconds).

HOLD_JOB_IF_CREDENTIAL_EXPIRES

A boolean value that defaults to True. If True, and for grid universe jobs only, HTCondor-G places a job on hold GRIDMANAGER_MINIMUM_PROXY_TIME seconds before the proxy expires. If False, the job stays in its last known state, and HTCondor-G periodically checks whether the job’s proxy has been refreshed, at which point management of the job resumes.
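The next minimal Python sketch shows how GRIDMANAGER_MINIMUM_PROXY_TIME and HOLD_JOB_IF_CREDENTIAL_EXPIRES combine for a single grid universe job. The function `proxy_action` and its return strings are hypothetical labels. The sketch also assumes that both behaviors trigger once the remaining lifetime is strictly less than the minimum, which mirrors the shutdown rule above.

```python
# Hypothetical model of how GRIDMANAGER_MINIMUM_PROXY_TIME and
# HOLD_JOB_IF_CREDENTIAL_EXPIRES interact for one grid universe job.
# proxy_action and its return strings are illustrative only.

MINIMUM_PROXY_TIME = 180  # documented default, in seconds


def proxy_action(seconds_until_expiration: int,
                 hold_if_credential_expires: bool = True,
                 minimum_proxy_time: int = MINIMUM_PROXY_TIME) -> str:
    """Return what happens to a job given its proxy's remaining lifetime."""
    # Assumption: the threshold is crossed when the remaining lifetime is
    # strictly less than minimum_proxy_time.
    if seconds_until_expiration >= minimum_proxy_time:
        return "continue"          # normal management continues
    if hold_if_credential_expires:
        return "hold"              # job is placed on hold
    return "wait_for_refresh"      # job keeps its last known state


if __name__ == "__main__":
    print(proxy_action(3600))                                   # continue
    print(proxy_action(120))                                    # hold
    print(proxy_action(120, hold_if_credential_expires=False))  # wait_for_refresh
```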

GRIDMANAGER_SELECTION_EXPR

By default, the condor_gridmanager operates on a per-Owner basis. That is, the condor_schedd starts a distinct condor_gridmanager for each distinct Owner of grid universe jobs. For additional isolation or scalability, you may set this macro to a ClassAd expression. The expression is evaluated against each grid universe job, and jobs with the same evaluated result go to the same condor_gridmanager. For instance, to isolate jobs going to different remote sites from each other, use the following expression:

GRIDMANAGER_SELECTION_EXPR = GridResource
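The following minimal Python sketch models how jobs are split among condor_gridmanager instances. A Python callable stands in for the ClassAd expression, and the job attributes and host names are made up. It keys each instance on the pair (Owner, evaluated value). This reflects a reading of "additional isolation" as a refinement of the per-Owner split rather than a replacement for it, and that reading is an assumption.

```python
# Hypothetical model of how grid universe jobs are divided among
# condor_gridmanager instances. A Python callable stands in for the
# ClassAd expression; job attributes and host names are made up.
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Job = Dict[str, str]
Key = Tuple[str, Optional[str]]


def group_jobs(jobs: List[Job],
               selection: Optional[Callable[[Job], str]] = None) -> Dict[Key, List[str]]:
    """Map each (Owner, selection value) key to the IDs of the jobs it receives."""
    groups: Dict[Key, List[str]] = defaultdict(list)
    for job in jobs:
        # Without a selection expression, the Owner alone decides the instance.
        value = selection(job) if selection is not None else None
        groups[(job["Owner"], value)].append(job["JobId"])
    return dict(groups)


jobs = [
    {"JobId": "1.0", "Owner": "alice", "GridResource": "arc https://ce1.example.org"},
    {"JobId": "1.1", "Owner": "alice", "GridResource": "arc https://ce2.example.org"},
    {"JobId": "2.0", "Owner": "bob", "GridResource": "arc https://ce1.example.org"},
]

if __name__ == "__main__":
    print(len(group_jobs(jobs)))                                   # 2 instances
    print(len(group_jobs(jobs, lambda job: job["GridResource"])))  # 3 instances
```

Selecting on GridResource splits alice's jobs across two instances, because they target different sites.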

GRIDMANAGER_LOG_APPEND_SELECTION_EXPR

A boolean value that defaults to False. When True, the evaluated value of GRIDMANAGER_SELECTION_EXPR (if set) is appended to the value of GRIDMANAGER_LOG for each condor_gridmanager instance. The value is sanitized to remove characters that have special meaning to the shell. This allows each condor_gridmanager instance that runs concurrently to write to a separate daemon log.
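The naming effect can be sketched in Python. This is only an approximation. The exact set of characters HTCondor removes and the separator it uses are not stated here, so the sketch assumes that anything outside `[A-Za-z0-9._-]` is removed and that a "." joins the log path and the sanitized value. The function name `per_instance_log` is hypothetical.

```python
# Hypothetical model of GRIDMANAGER_LOG_APPEND_SELECTION_EXPR.
# The allowed character set and the "." separator are assumptions.
import re

_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")  # assumed set of characters to remove


def per_instance_log(gridmanager_log: str, selection_value: str) -> str:
    """Return the log path for one condor_gridmanager instance."""
    suffix = _UNSAFE.sub("", selection_value)  # drop shell-special characters
    return f"{gridmanager_log}.{suffix}"


if __name__ == "__main__":
    print(per_instance_log("/var/log/condor/GridmanagerLog",
                           "arc https://ce1.example.org"))
    # /var/log/condor/GridmanagerLog.archttpsce1.example.org
```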

GRIDMANAGER_CONTACT_SCHEDD_DELAY

The minimum number of seconds between connections to the condor_schedd. The default is 5 seconds.

GRIDMANAGER_JOB_PROBE_INTERVAL

The number of seconds between active probes for the status of a submitted job. The default is 1 minute (60 seconds). Intervals specific to grid types can be set by appending the name of the grid type to the configuration variable name, as in this example:

GRIDMANAGER_JOB_PROBE_INTERVAL_ARC = 300
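The following Python sketch shows the implied lookup order: the grid-type-specific name first, then the general name, then the built-in default. The precedence of the specific name over the general one follows from the description above. Representing the configuration as a plain dictionary, and ignoring case-insensitivity and macro expansion, are simplifying assumptions. The function name is hypothetical.

```python
# Hypothetical model of the lookup order for a grid-type-specific setting.
# The uppercase suffix mirrors the GRIDMANAGER_JOB_PROBE_INTERVAL_ARC example.
from typing import Mapping

DEFAULT_JOB_PROBE_INTERVAL = 60  # documented default, in seconds


def job_probe_interval(config: Mapping[str, str], grid_type: str) -> int:
    base = "GRIDMANAGER_JOB_PROBE_INTERVAL"
    specific = f"{base}_{grid_type.upper()}"
    if specific in config:             # grid-type-specific value wins
        return int(config[specific])
    if base in config:                 # then the general setting
        return int(config[base])
    return DEFAULT_JOB_PROBE_INTERVAL  # finally the built-in default


config = {"GRIDMANAGER_JOB_PROBE_INTERVAL_ARC": "300"}

if __name__ == "__main__":
    print(job_probe_interval(config, "arc"))    # 300
    print(job_probe_interval(config, "batch"))  # 60
```

The same pattern applies to the other settings in this section that accept a grid-type suffix.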

GRIDMANAGER_JOB_PROBE_RATE

The maximum number of job status probes per second that will be issued to a given remote resource. The time between status probes for individual jobs may be lengthened beyond GRIDMANAGER_JOB_PROBE_INTERVAL to enforce this rate. The default is 5 probes per second. Rates specific to grid types can be set by appending the name of the grid type to the configuration variable name, as in this example:

GRIDMANAGER_JOB_PROBE_RATE_ARC = 15
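Probing each of N jobs on one resource once, at R probes per second, takes N / R seconds. When that exceeds GRIDMANAGER_JOB_PROBE_INTERVAL, each job is probed less often than the interval requests. The following Python sketch works through that arithmetic, assuming probes are spread evenly across the resource's jobs. The function name is hypothetical.

```python
# Simplified model: assumes probes are spread evenly across all jobs on one
# remote resource. effective_probe_interval is an illustrative name.

def effective_probe_interval(jobs_on_resource: int,
                             probe_interval: float = 60.0,
                             probe_rate: float = 5.0) -> float:
    """Approximate seconds between probes of any single job on one resource."""
    # One full pass over all jobs at probe_rate takes this many seconds.
    full_pass = jobs_on_resource / probe_rate
    return max(probe_interval, full_pass)


if __name__ == "__main__":
    print(effective_probe_interval(100))   # 60.0: 100 / 5 = 20 s is under the interval
    print(effective_probe_interval(1000))  # 200.0: 1000 / 5 = 200 s exceeds the interval
```

With the defaults, the rate limit starts to lengthen the per-job interval once a resource has more than 300 jobs (60 s × 5 probes/s).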

GRIDMANAGER_RESOURCE_PROBE_INTERVAL

The number of seconds between pings that the condor_gridmanager sends to a resource that appears to be down, to test whether it is up again. The default is 5 minutes (300 seconds).

GRIDMANAGER_EMPTY_RESOURCE_DELAY

The number of seconds that the condor_gridmanager retains information about a grid resource, once the condor_gridmanager has no active jobs on that resource. An active job is a grid universe job that is in the queue, for which JobStatus is anything other than Held. Defaults to 300 seconds.
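The definition of an active job can be expressed as a predicate. The following Python sketch uses that predicate to decide when resource information could be discarded. It relies on JobStatus 5 being HTCondor's code for Held. The function names, and the choice of `>=` at the boundary, are assumptions made for illustration.

```python
# Hypothetical model of when resource information may be discarded.
# JobStatus 5 is HTCondor's code for Held; function names are illustrative.
from typing import Iterable

HELD = 5


def has_active_jobs(job_statuses: Iterable[int]) -> bool:
    """A queued grid universe job is active unless its JobStatus is Held."""
    return any(status != HELD for status in job_statuses)


def should_forget_resource(job_statuses: Iterable[int],
                           seconds_without_active_jobs: float,
                           empty_resource_delay: float = 300.0) -> bool:
    """True once the resource has had no active jobs for the full delay."""
    if has_active_jobs(job_statuses):
        return False
    return seconds_without_active_jobs >= empty_resource_delay


if __name__ == "__main__":
    print(should_forget_resource([HELD, HELD], 301))  # True: only held jobs remain
    print(should_forget_resource([2, HELD], 1000))    # False: JobStatus 2 (Running) is active
```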

GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE

An integer value that limits the number of jobs that a condor_gridmanager daemon will submit to a resource. This integer may be followed by a comma-separated list of pairs that sets limits for specific remote resources; each pair is a host name and the job limit for that host. Consider the example:

GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE = 200, foo.edu, 50, bar.com, 100

In this example, all resources have a job limit of 200, except foo.edu, which has a limit of 50, and bar.com, which has a limit of 100.

Limits specific to grid types can be set by appending the name of the grid type to the configuration variable name, as in this example:

GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE_PBS = 300

In this example, the job limit for all PBS resources is 300. The default limit is 1000.
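The following Python sketch parses the general form of this value into a default limit and per-host overrides, using the example above. It is a hypothetical parser. It does not model grid-type suffixes, how those interact with host-specific limits, or how HTCondor itself handles malformed input.

```python
# Hypothetical parser for the generic GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE
# form. It does not model grid-type suffixes or HTCondor's own error handling.
from typing import Dict, Tuple


def parse_resource_limits(value: str) -> Tuple[int, Dict[str, int]]:
    """Split the value into a default limit and per-host overrides."""
    fields = [field.strip() for field in value.split(",") if field.strip()]
    if not fields:
        raise ValueError("no limit given")
    default = int(fields[0])
    pairs = fields[1:]
    if len(pairs) % 2 != 0:
        raise ValueError("host names and limits must come in pairs")
    overrides = {pairs[i]: int(pairs[i + 1]) for i in range(0, len(pairs), 2)}
    return default, overrides


def limit_for(host: str, value: str) -> int:
    default, overrides = parse_resource_limits(value)
    return overrides.get(host, default)


SETTING = "200, foo.edu, 50, bar.com, 100"

if __name__ == "__main__":
    for host in ("foo.edu", "bar.com", "baz.org"):
        print(host, limit_for(host, SETTING))  # foo.edu 50, bar.com 100, baz.org 200
```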

GAHP_DEBUG_HIDE_SENSITIVE_DATA

A boolean value that determines whether sensitive data, such as security keys and passwords, is hidden when communication to or from a GAHP server is written to a daemon log. The default is True, hiding sensitive data.

GRIDMANAGER_GAHP_CALL_TIMEOUT

The number of seconds after which a pending GAHP command should time out. The default is 5 minutes (300 seconds).

GRIDMANAGER_GAHP_RESPONSE_TIMEOUT

The condor_gridmanager will assume a GAHP is hung if this many seconds pass without a response. The default is 20 seconds.

GRIDMANAGER_MAX_PENDING_REQUESTS

The maximum number of GAHP commands that can be pending at any time. The default is 50.

GRIDMANAGER_CONNECT_FAILURE_RETRY_COUNT

The number of times to retry a command that failed due to a timeout or a failed connection. The default is 3.
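The following Python sketch illustrates the retry count. It reads "retry" as additional attempts after the first one, so the default of 3 allows up to 4 attempts in total. That reading is an assumption. `RetryableError` and `call_with_retries` are hypothetical names, and the exception stands in for a timeout or a failed connection.

```python
# Hypothetical model of GRIDMANAGER_CONNECT_FAILURE_RETRY_COUNT.
# RetryableError stands in for a timeout or a failed connection.
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """A failure that qualifies for a retry (timeout or failed connection)."""


def call_with_retries(command: Callable[[], T], retry_count: int = 3) -> T:
    """Run command, retrying up to retry_count times after the first attempt."""
    for attempt in range(retry_count + 1):
        try:
            return command()
        except RetryableError:
            if attempt == retry_count:
                raise  # retries exhausted: report the last failure
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts = []

    def flaky() -> str:
        attempts.append(1)
        if len(attempts) <= 3:  # fail on the first three attempts
            raise RetryableError("connection failed")
        return "ok"

    print(call_with_retries(flaky), len(attempts))  # ok 4
```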

EC2_RESOURCE_TIMEOUT

The number of seconds after which an EC2 grid universe job that fails to ping the EC2 service is put on hold. Defaults to -1, which means an infinite timeout, so a failure to ping the service never puts the job on hold.

EC2_GAHP_RATE_LIMIT

The minimum interval, in whole milliseconds, between requests to the same EC2 service with the same credentials. Defaults to 100.
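The following minimal Python sketch shows a minimum-interval limiter keyed on the (service, credentials) pair, as this setting describes. The class name, the use of a blocking sleep to enforce the gap, and the endpoint and credential strings are assumptions. The clock and sleep functions are injectable so the behavior can be checked without real delays.

```python
# Hypothetical minimum-interval limiter in the spirit of EC2_GAHP_RATE_LIMIT.
# Requests sharing a (service, credentials) key are spaced at least
# min_interval_ms apart; other keys are not delayed.
import time
from typing import Callable, Dict, List, Tuple


class MinIntervalLimiter:
    def __init__(self, min_interval_ms: int = 100,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval_ms / 1000.0  # milliseconds to seconds
        self.clock = clock
        self.sleep = sleep
        self.last: Dict[Tuple[str, str], float] = {}

    def wait(self, service: str, credentials: str) -> None:
        """Block until a request to service with credentials may be sent."""
        key = (service, credentials)
        now = self.clock()
        previous = self.last.get(key)
        if previous is not None:
            remaining = previous + self.min_interval - now
            if remaining > 0:
                self.sleep(remaining)
                now += remaining  # time at which the request is released
        self.last[key] = now


if __name__ == "__main__":
    fake_now = [0.0]
    slept: List[float] = []
    limiter = MinIntervalLimiter(100, clock=lambda: fake_now[0], sleep=slept.append)
    limiter.wait("https://ec2.example.com", "credential-A")
    limiter.wait("https://ec2.example.com", "credential-A")  # same key: must wait
    limiter.wait("https://ec2.example.com", "credential-B")  # different key: no wait
    print(slept)  # [0.1]
```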

BATCH_GAHP_CHECK_STATUS_ATTEMPTS

The number of times a failed status command issued to the blahpd should be retried. These retries allow the condor_gridmanager to tolerate short-lived failures of the underlying batch system. The default value is 5.

C_GAHP_LOG

The complete path and file name of the HTCondor GAHP server’s log. The default value is /tmp/CGAHPLog.$(USERNAME).

MAX_C_GAHP_LOG

The maximum size of the C_GAHP_LOG.

C_GAHP_WORKER_THREAD_LOG

The complete path and file name of the HTCondor GAHP worker process’s log. The default value is /tmp/CGAHPWorkerLog.$(USERNAME).

C_GAHP_CONTACT_SCHEDD_DELAY

The number of seconds that the condor_C-gahp daemon waits between consecutive connections to the remote condor_schedd in order to send batched sets of commands to be executed on that remote condor_schedd daemon. The default value is 5.

C_GAHP_MAX_FILE_REQUESTS

Limits the number of file transfer commands of each type (input, output, proxy refresh) that are performed before other (potentially higher-priority) commands are read and performed. The default value is 10.

BLAHPD_LOCATION

The complete path to the directory containing the blahp software, which is required for grid-type batch jobs. The default value is $(RELEASE_DIR).

GAHP_SSL_CADIR

The path to a directory that may contain the certificates (each in its own file) for multiple trusted CAs to be used by GAHP servers when authenticating with remote services.

GAHP_SSL_CAFILE

The path and file name of a file containing one or more trusted CA certificates to be used by GAHP servers when authenticating with remote services.

CONDOR_GAHP

The complete path and file name of the HTCondor GAHP executable. The default value is $(SBIN)/condor_c-gahp.

EC2_GAHP

The complete path and file name of the EC2 GAHP executable. The default value is $(SBIN)/ec2_gahp.

BATCH_GAHP

The complete path and file name of the batch GAHP executable, to be used for Slurm, PBS, LSF, SGE, and similar batch systems. The default location is $(BIN)/blahpd.

ARC_GAHP

The complete path and file name of the ARC GAHP executable. The default value is $(SBIN)/arc_gahp.

ARC_GAHP_COMMAND_LIMIT

On systems where libcurl uses NSS for security, a new arc_gahp process is started when the existing one has handled this number of commands. The default is 1000.

ARC_GAHP_USE_THREADS

Controls whether the arc_gahp should run multiple HTTPS requests in parallel in different threads. The default is False.

GCE_GAHP

The complete path and file name of the GCE GAHP executable. The default value is $(SBIN)/gce_gahp.

AZURE_GAHP

The complete path and file name of the Azure GAHP executable. The default value is $(SBIN)/AzureGAHPServer.py on Windows and $(SBIN)/AzureGAHPServer on other platforms.

BOINC_GAHP

The complete path and file name of the BOINC GAHP executable. The default value is $(SBIN)/boinc_gahp.