condor_suspend

Suspend running jobs in the HTCondor queue.

Synopsis

condor_suspend [-help | -version]

condor_suspend [OPTIONS] [cluster… | cluster.proc… | user…]

condor_suspend [-debug] [-long] [-totals] [-all] [-constraint expression] [-pool hostname[:portnumber] | -name scheddname | -addr "<a.b.c.d:port>"]

Description

All the processes in a suspended job are sent the SIGSTOP signal, or equivalent. They consume no CPU time, but continue to use memory and scratch disk space. A suspended job still occupies the slot it runs in, and is still charged to its submitter in terms of user priority. For any given job, only the owner of the job or one of the queue super users (defined by the QUEUE_SUPER_USERS macro) can suspend the job.

Options

-help

Display usage information.

-version

Display version information.

-long

Display result ClassAd.

-totals

Display success/failure totals.

-pool hostname[:portnumber]

Specify a pool by giving the central manager’s host name and an optional port number.

-name scheddname

Send the command to a machine identified by scheddname.

-addr "<a.b.c.d:port>"

Send the command to a machine located at "<a.b.c.d:port>".

-debug

Causes debugging information to be sent to stderr, based on the value of the configuration variable TOOL_DEBUG.

-constraint expression

Suspend all jobs which match the job ClassAd expression constraint.

-all

Suspend all the jobs in the queue.

cluster

Suspend all jobs in the specified cluster.

cluster.process

Suspend the specific job in the cluster.

user

Suspend jobs belonging to the specified user.

General Remarks

If the -name option is specified, the named condor_schedd is targeted for processing. Otherwise, the local condor_schedd is targeted.

When a job is suspended, the match between the condor_schedd and the machine is not broken, so the claim remains valid.

An administrator might suspend the jobs in a pool to quickly reduce its power draw. This is useful when the jobs can be resumed with condor_continue after a short while, and the administrator does not want them to be vacated and then restarted from the beginning or from their last checkpoint.

Use condor_continue to resume suspended jobs.
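
The suspend-then-resume workflow described above could be scripted roughly as follows. This is only a sketch: the helper names run and pause_and_resume, the DRY_RUN variable, and the 600-second pause are illustrative assumptions, not part of HTCondor. It also assumes condor_continue accepts -all in the same way condor_suspend does. By default the script only prints the commands it would run.

```shell
# DRY_RUN=1 (the default in this sketch) prints each command instead of running it.
# DRY_RUN, run and pause_and_resume are hypothetical names, not HTCondor features.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Suspend every job in the local queue, wait, then resume them.
# Assumes condor_continue supports -all like condor_suspend.
pause_and_resume() {
    pause_seconds="$1"
    run condor_suspend -all
    run sleep "$pause_seconds"
    run condor_continue -all
}

pause_and_resume 600
```

Setting DRY_RUN=0 would execute the commands for real, which suspends every job the caller is allowed to suspend, so reviewing the printed commands first is the safer habit.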

Exit Status

0 - Success

1 - Failure has occurred
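
A script can branch on this exit status. The following is a minimal sketch; report_status is a hypothetical helper name, and the job ID 432.1 is simply the one used in the Examples section.

```shell
# Run a command and report whether it exited 0 (success) or non-zero (failure).
# report_status is a hypothetical helper, not an HTCondor tool.
report_status() {
    if "$@"; then
        echo "ok: $*"
    else
        echo "failed: $*" >&2
        return 1
    fi
}

# Intended use (requires an HTCondor installation):
#   report_status condor_suspend 432.1
```

The helper returns 1 on any non-zero exit, so it can itself be used in an if statement or chained with && in a larger script.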

Examples

To suspend a specific job:

$ condor_suspend 432.1

To suspend all jobs except those belonging to a specific user:

# condor_suspend -constraint 'Owner =!= "foo"'
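
When the user name comes from a variable, the nested quoting in the constraint is easy to get wrong. The sketch below builds the expression with printf and prints the resulting command line for review instead of running it. The function name owner_exclusion_constraint is a hypothetical helper, and foo is the placeholder user from the example above.

```shell
# Build a ClassAd constraint matching every job NOT owned by the given user.
# =!= is the ClassAd "is not identical to" operator; it evaluates to true or
# false even when the attribute is undefined.
# owner_exclusion_constraint is a hypothetical helper name.
owner_exclusion_constraint() {
    printf 'Owner =!= "%s"' "$1"
}

# Print (rather than run) the command so it can be checked first.
printf "condor_suspend -constraint '%s'\n" "$(owner_exclusion_constraint foo)"
```

The helper does no escaping, so it is only suitable for plain user names that contain no quote characters.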

See Also

condor_continue, condor_rm, condor_hold, condor_release, condor_vacate_job, condor_vacate

Availability

Linux, macOS, Windows