Advanced Schedd Interaction
The introductory tutorial only scratches the surface of what the Python bindings can do with the condor_schedd; this module covers a wider range of functionality:
Job and history querying.
Advanced job submission.
Python-based negotiation with the Schedd.
As usual, we start by importing the relevant modules:
[1]:
import htcondor
import classad
Job and History Querying
In HTCondor Introduction, we covered the Schedd.query method and its two most important keywords:
constraint: Filters the jobs the schedd should return.
projection: Filters the attributes returned for each job.
For those familiar with SQL queries, constraint is the equivalent of the WHERE clause, while projection is the equivalent of the column listing in SELECT.
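To make the SQL analogy concrete, here is a plain-Python sketch that mimics row filtering and attribute projection on a list of dictionaries. It does not contact HTCondor: the `fake_query` function and the sample job records are hypothetical stand-ins, and the real schedd evaluates a ClassAd expression string, not a Python callable.

```python
def fake_query(jobs, row_filter=lambda job: True, projection=None):
    """Mimic query semantics on a list of dicts (illustration only)."""
    # Row filtering plays the role of SQL's WHERE clause.
    selected = [job for job in jobs if row_filter(job)]
    # In this sketch, an empty projection means "keep every attribute".
    if not projection:
        return [dict(job) for job in selected]
    # Projection plays the role of the column list in SELECT.
    return [{key: job[key] for key in projection if key in job} for job in selected]


# Made-up job records, not output from a real pool.
jobs = [
    {"ClusterId": 19, "ProcId": 0, "JobStatus": 5, "Owner": "alice"},
    {"ClusterId": 19, "ProcId": 1, "JobStatus": 5, "Owner": "alice"},
    {"ClusterId": 18, "ProcId": 0, "JobStatus": 2, "Owner": "bob"},
]

result = fake_query(
    jobs,
    row_filter=lambda job: job["ClusterId"] == 19,
    projection=["ProcId"],
)
print(result)  # [{'ProcId': 0}, {'ProcId': 1}]
```

Only the jobs that pass the filter are returned, and each returned record carries only the requested attributes.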
There are two other keywords worth mentioning:
limit: Limits the number of returned ads; equivalent to SQL’s LIMIT.
opts: Additional flags to send to the schedd to alter query behavior. The only flag currently defined is QueryOpts.AutoCluster; this groups the returned results by the current set of “auto-cluster” attributes used by the pool. It’s analogous to GROUP BY in SQL, except the columns used for grouping are controlled by the schedd.
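As a minimal sketch of how limit might be combined with the other keywords, the helper below asks for at most a few ads from one cluster. The name `first_jobs` and its parameters are hypothetical; the function only assumes it is handed an object with a `query` method that accepts the `constraint`, `projection`, and `limit` keywords, as `htcondor.Schedd` does.

```python
def first_jobs(schedd, cluster_id, max_ads=3):
    """Return at most max_ads job ads from one cluster (hypothetical helper)."""
    return list(
        schedd.query(
            # Select only jobs belonging to the given cluster.
            constraint=f"ClusterId == {int(cluster_id)}",
            # Ask for just the attributes we intend to read.
            projection=["ClusterId", "ProcId", "JobStatus"],
            # Stop after max_ads ads, like SQL's LIMIT.
            limit=max_ads,
            # An opts=htcondor.QueryOpts.AutoCluster argument could be added
            # here to group results by auto-cluster instead.
        )
    )
```

With a live pool, this would be called as `first_jobs(htcondor.Schedd(), 19)`, where 19 stands for whatever cluster ID your submission returned.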
To illustrate these additional keywords, let’s first submit a few jobs:
[2]:
schedd = htcondor.Schedd()
sub = htcondor.Submit({
    "executable": "/bin/sleep",
    "arguments": "5m",
    "hold": "True",
})
submit_result = schedd.submit(sub, count=10)
print(submit_result.cluster())
19
Note: In this example, we used the hold submit command to indicate that the jobs should start out in the condor_schedd in the Hold state; this is used simply to prevent the jobs from running to completion while you are running the tutorial.
We now have 10 held jobs in the queue under the cluster ID returned by submit_result.cluster(); they should all be identical:
[3]:
print(len(schedd.query(projection=["ProcID"], constraint=f"ClusterId=={submit_result.cluster()}")))
10
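Beyond counting ads, it can be useful to confirm that the jobs are in the state you expect. The sketch below tallies ads by their JobStatus attribute; `summarize_status` and `JOB_STATUS_NAMES` are hypothetical names introduced here, and the function only assumes each ad supports `ad["JobStatus"]` lookups, as both ClassAds and plain dictionaries do.

```python
from collections import Counter

# Standard HTCondor JobStatus codes and their usual names.
JOB_STATUS_NAMES = {
    1: "Idle",
    2: "Running",
    3: "Removed",
    4: "Completed",
    5: "Held",
    6: "Transferring Output",
    7: "Suspended",
}


def summarize_status(ads):
    """Count job ads per JobStatus and return a {status name: count} dict."""
    counts = Counter(int(ad["JobStatus"]) for ad in ads)
    return {
        JOB_STATUS_NAMES.get(code, f"Unknown({code})"): n
        for code, n in counts.items()
    }
```

Passing it the result of a query with `projection=["JobStatus"]` and the same cluster constraint as above should report all 10 jobs as Held, given the hold submit command used earlier.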
History Queries
After a job has finished in the Schedd, it moves from the queue to the history file. The history can be queried (locally or remotely) with the Schedd.history method:
[4]:
schedd = htcondor.Schedd()
for ad in schedd.history(
    constraint='true',
    projection=['ProcId', 'ClusterId', 'JobStatus'],
    match=2,  # limit to 2 returned results
):
    print(ad)
[
JobStatus = 3;
ProcId = 99;
ClusterId = 18
]
[
JobStatus = 3;
ProcId = 98;
ClusterId = 18
]
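In the output above, a JobStatus of 3 indicates that the job was removed from the queue. History queries are often narrowed to one cluster; the following sketch wraps that pattern. The helper name `finished_job_ids` is hypothetical, and it assumes only that the object passed in has a `history` method accepting the `constraint`, `projection`, and `match` keywords used in the cell above.

```python
def finished_job_ids(schedd, cluster_id, max_ads=2):
    """Return 'ClusterId.ProcId' strings for up to max_ads history ads."""
    ads = schedd.history(
        # Restrict the search to a single cluster.
        constraint=f"ClusterId == {int(cluster_id)}",
        # Only the attributes needed to build the job ID.
        projection=["ClusterId", "ProcId"],
        # Stop after max_ads matching ads.
        match=max_ads,
    )
    return [f'{ad["ClusterId"]}.{ad["ProcId"]}' for ad in ads]
```

Keeping the projection short and setting match matters more here than for queue queries, since the history file can grow much larger than the active queue.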