Advanced Schedd Interaction


The introductory tutorial only scratches the surface of what the Python bindings can do with the condor_schedd. This module covers a wider range of its functionality:

  • Job and history querying.

  • Advanced job submission.

  • Python-based negotiation with the Schedd.

As usual, we start by importing the relevant modules:

[1]:
import htcondor
import classad

Job and History Querying

In HTCondor Introduction, we covered the Schedd.query method and its two most important keywords:

  • constraint: A ClassAd expression that filters which jobs the schedd returns.

  • projection: Filters the attributes returned for each job.

For those familiar with SQL, constraint is the equivalent of the WHERE clause, and projection is the equivalent of the column list in SELECT.
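To make that mapping concrete, here is a minimal sketch. It uses a hypothetical helper, held_job_ids, which is not part of the bindings. The helper asks for the cluster and process IDs of the held jobs (JobStatus 5) in one cluster. In SQL terms this is roughly `SELECT ClusterId, ProcId FROM queue WHERE ClusterId == N AND JobStatus == 5`. The schedd is passed in as a parameter. That is an assumption made so that any object with a compatible query method can stand in for htcondor.Schedd() while experimenting.

```python
def held_job_ids(schedd, cluster_id):
    """Return (ClusterId, ProcId) pairs for the held jobs in one cluster.

    Hypothetical helper: `schedd` is assumed to be an htcondor.Schedd or
    any object with a compatible query(constraint=..., projection=...) method.
    """
    # constraint ~ SQL WHERE: a ClassAd expression evaluated against each job ad.
    # JobStatus == 5 means the job is in the Held state.
    constraint = f"ClusterId == {int(cluster_id)} && JobStatus == 5"
    # projection ~ SQL column list: only these attributes are returned per ad.
    projection = ["ClusterId", "ProcId"]
    ads = schedd.query(constraint=constraint, projection=projection)
    # Job ads support dictionary-style attribute lookup.
    return [(ad["ClusterId"], ad["ProcId"]) for ad in ads]
```

With a real schedd, you would call it as `held_job_ids(htcondor.Schedd(), cluster_id)`, using a cluster ID returned by a submission.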

There are two other keywords worth mentioning:

  • limit: Limits the number of returned ads; equivalent to SQL’s LIMIT.

  • opts: Additional flags sent to the schedd to alter query behavior. One such flag is QueryOpts.AutoCluster, which groups the returned results by the current set of “auto-cluster” attributes used by the pool. This is analogous to GROUP BY in SQL, except that the schedd, not the caller, controls which columns are used for grouping.
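The sketch below shows how limit and opts sit alongside the other keywords. It uses a hypothetical helper, sample_jobs, which forwards projection and opts only when they are given. The flag value itself, for example htcondor.QueryOpts.AutoCluster, is supplied by the caller. This is a choice made so that the helper does not need to import the bindings.

```python
def sample_jobs(schedd, n, projection=None, opts=None):
    """Return at most `n` ads from the queue (hypothetical helper).

    `schedd` is assumed to be an htcondor.Schedd; `opts`, if given, is passed
    through unchanged (e.g. htcondor.QueryOpts.AutoCluster).
    """
    kwargs = {
        "constraint": "true",  # match every job (WHERE true)
        "limit": n,            # ~ SQL LIMIT n
    }
    if projection is not None:
        kwargs["projection"] = projection  # ~ SELECT column list
    if opts is not None:
        kwargs["opts"] = opts              # extra query flags
    return list(schedd.query(**kwargs))
```

For example, `sample_jobs(htcondor.Schedd(), 5, projection=["ClusterId", "ProcId"])` peeks at five jobs. Passing `opts=htcondor.QueryOpts.AutoCluster` instead asks for results grouped by auto-cluster rather than one ad per job.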

To illustrate these additional keywords, let’s first submit a few jobs:

[2]:
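schedd = htcondor.Schedd()  # connect to the local condor_schedd
sub = htcondor.Submit({
    "executable": "/bin/sleep",
    "arguments": "5m",
    "hold": "True",  # start the jobs in the Held state
})
submit_result = schedd.submit(sub, count=10)  # queue 10 copies in one cluster
print(submit_result.cluster())  # the new cluster ID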
19
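The SubmitResult returned by Schedd.submit identifies the cluster. It also identifies the range of process IDs created in that cluster. The sketch below turns that range into "cluster.proc" job IDs, the form HTCondor uses to name individual jobs. The helper name job_ids is hypothetical. It relies only on the cluster(), first_proc() and num_procs() methods of SubmitResult. It also assumes that the procs created by one submission are numbered consecutively.

```python
def job_ids(result):
    """Build "cluster.proc" job IDs for every job in one submission.

    Hypothetical helper: `result` is assumed to be the htcondor.SubmitResult
    returned by Schedd.submit().
    """
    cluster = result.cluster()
    first = result.first_proc()
    # Procs from a single submission are assumed to be consecutive.
    return [f"{cluster}.{proc}" for proc in range(first, first + result.num_procs())]
```

For the submission above, `job_ids(submit_result)` should produce ten IDs, one for each queued job.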

Note: The hold submit command places the jobs in the Held state as soon as they enter the condor_schedd's queue. We use it only to keep the jobs from running to completion while you work through the tutorial.
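Because the jobs stay held, you will eventually want to release them or clean them up. Schedd.act applies an action to a set of jobs. That set can be given as a constraint string or as a list of job IDs. The sketch below wraps this in a hypothetical helper, act_on_cluster. The action is passed in by the caller, for example htcondor.JobAction.Release or htcondor.JobAction.Remove, so that the helper does not depend on the bindings being importable.

```python
def act_on_cluster(schedd, action, cluster_id):
    """Apply a job action to every job in one cluster (hypothetical helper).

    `schedd` is assumed to be an htcondor.Schedd and `action` a member of
    htcondor.JobAction, such as JobAction.Release or JobAction.Remove.
    """
    # Select the jobs with a constraint rather than listing each job ID.
    return schedd.act(action, f"ClusterId == {int(cluster_id)}")
```

After the tutorial, `act_on_cluster(schedd, htcondor.JobAction.Remove, submit_result.cluster())` would remove the held jobs. The return value is the schedd's summary of the action's results.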

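We now have 10 held jobs in the queue under the cluster ID printed above. Apart from their ProcId values they are identical, and counting the jobs in that cluster confirms that all 10 were submitted: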

[3]:
print(len(schedd.query(projection=["ProcId"], constraint=f"ClusterId=={submit_result.cluster()}")))
10
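Counting ads confirms how many jobs exist but not what state they are in. A projection of just JobStatus is enough to tally the states. The hypothetical helper status_counts sketches this. The mapping from numeric JobStatus codes to names follows HTCondor's standard job states.

```python
from collections import Counter

# Names for HTCondor's numeric JobStatus codes.
JOB_STATUS = {1: "Idle", 2: "Running", 3: "Removed", 4: "Completed",
              5: "Held", 6: "Transferring Output", 7: "Suspended"}

def status_counts(schedd, cluster_id):
    """Count the jobs in one cluster by state (hypothetical helper)."""
    ads = schedd.query(
        constraint=f"ClusterId == {int(cluster_id)}",
        projection=["JobStatus"],  # fetch only the attribute we need
    )
    return Counter(JOB_STATUS.get(ad["JobStatus"], "Unknown") for ad in ads)
```

For the cluster submitted above, `status_counts(schedd, submit_result.cluster())` should report all ten jobs as Held.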

History Queries

Once a job leaves the Schedd's queue (for example, because it completed or was removed), its ClassAd moves to the history file. The history can be queried, locally or remotely, with the Schedd.history method:

[4]:
schedd = htcondor.Schedd()
for ad in schedd.history(
    constraint='true',
    projection=['ProcId', 'ClusterId', 'JobStatus'],
    match=2,  # limit to 2 returned results
):
    print(ad)

    [
        JobStatus = 3;
        ProcId = 99;
        ClusterId = 18
    ]

    [
        JobStatus = 3;
        ProcId = 98;
        ClusterId = 18
    ]
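History queries accept the same kind of ClassAd constraint as queue queries. It is therefore easy to look at a single cluster's past jobs. The sketch below uses a hypothetical helper, cluster_history, which caps the number of ads with match and sorts the results by ProcId. The output above lists them newest first. In that output, JobStatus = 3 corresponds to the Removed state.

```python
def cluster_history(schedd, cluster_id, max_ads=10):
    """Return sorted (ProcId, JobStatus) pairs from one cluster's history.

    Hypothetical helper: `schedd` is assumed to be an htcondor.Schedd.
    """
    ads = schedd.history(
        constraint=f"ClusterId == {int(cluster_id)}",
        projection=["ProcId", "JobStatus"],
        match=max_ads,  # stop after this many matching ads
    )
    return sorted((ad["ProcId"], ad["JobStatus"]) for ad in ads)
```

For instance, `cluster_history(schedd, 18, max_ads=2)` asks for at most two ads from cluster 18.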