{ "cells": [ { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "# Advanced Schedd Interaction\n", "\n", "Launch this tutorial in a Jupyter Notebook on Binder: \n", "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/htcondor/htcondor-python-bindings-tutorials/master?urlpath=lab/tree/Advanced-Schedd-Interactions.ipynb)\n", "\n", "The introductory tutorial only scratches the surface of what the Python bindings\n", "can do with the ``condor_schedd``; this module focuses on covering a wider range\n", "of functionality:\n", "\n", "* Job and history querying.\n", "* Advanced job submission.\n", "* Python-based negotiation with the Schedd.\n", "\n", "As usual, we start by importing the relevant modules:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2021-09-16T13:14:16.915503Z", "iopub.status.busy": "2021-09-16T13:14:16.914602Z", "iopub.status.idle": "2021-09-16T13:14:16.985521Z", "shell.execute_reply": "2021-09-16T13:14:16.986912Z" }, "pycharm": {} }, "outputs": [], "source": [ "import htcondor\n", "import classad" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "## Job and History Querying\n", "\n", "In [HTCondor Introduction](HTCondor-Introduction.ipynb), we covered the `Schedd.xquery` method\n", "and its two most important keywords:\n", "\n", "* ``requirements``: Filters the jobs the schedd should return.\n", "* ``projection``: Filters the attributes returned for each job.\n", "\n", "For those familiar with SQL queries, ``requirements`` performs the equivalent\n", "as the ``WHERE`` clause while ``projection`` performs the equivalent of the column\n", "listing in ``SELECT``.\n", "\n", "There are two other keywords worth mentioning:\n", "\n", "* ``limit``: Limits the number of returned ads; equivalent to SQL's ``LIMIT``.\n", "* ``opts``: Additional flags to send to the schedd to alter query behavior.\n", " The only flag currently defined is `QueryOpts.AutoCluster`; this\n", " groups the returned results by the current set of \"auto-cluster\" attributes\n", " used by the pool. It's analogous to ``GROUP BY`` in SQL, except the columns\n", " used for grouping are controlled by the schedd.\n", "\n", "To illustrate these additional keywords, let's first submit a few jobs:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2021-09-16T13:14:16.992260Z", "iopub.status.busy": "2021-09-16T13:14:16.991039Z", "iopub.status.idle": "2021-09-16T13:14:17.018593Z", "shell.execute_reply": "2021-09-16T13:14:17.019422Z" }, "pycharm": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5\n" ] } ], "source": [ "schedd = htcondor.Schedd()\n", "sub = htcondor.Submit({\n", " \"executable\": \"/bin/sleep\",\n", " \"arguments\": \"5m\",\n", " \"hold\": \"True\",\n", "})\n", "submit_result = schedd.submit(sub, count=10)\n", "print(submit_result.cluster())" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "**Note:** In this example, we used the ``hold`` submit command to indicate that\n", "the jobs should start out in the ``condor_schedd`` in the *Hold* state; this\n", "is used simply to prevent the jobs from running to completion while you are\n", "running the tutorial.\n", "\n", "We now have 10 jobs running under ``cluster_id``; they should all be identical:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2021-09-16T13:14:17.026534Z", "iopub.status.busy": "2021-09-16T13:14:17.024289Z", "iopub.status.idle": "2021-09-16T13:14:17.035542Z", "shell.execute_reply": "2021-09-16T13:14:17.036427Z" }, "pycharm": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10\n" ] } ], "source": [ "print(len(schedd.query(projection=[\"ProcID\"], constraint=f\"ClusterId=={submit_result.cluster()}\")))" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "The ``sum(1 for _ in ...)`` syntax is a simple way to count the number of items\n", "produced by an iterator without buffering all the objects in memory." ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "## Querying many Schedds\n", "\n", "On larger pools, it's common to write Python scripts that interact with not one but many schedds. For example,\n", "if you want to implement a \"global query\" (equivalent to ``condor_q -g``; concatenates all jobs in all schedds),\n", "it might be tempting to write code like this:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2021-09-16T13:14:17.043221Z", "iopub.status.busy": "2021-09-16T13:14:17.041614Z", "iopub.status.idle": "2021-09-16T13:14:17.058488Z", "shell.execute_reply": "2021-09-16T13:14:17.059466Z" }, "pycharm": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10\n" ] } ], "source": [ "jobs = []\n", "for schedd_ad in htcondor.Collector().locateAll(htcondor.DaemonTypes.Schedd):\n", " schedd = htcondor.Schedd(schedd_ad)\n", " jobs += schedd.xquery()\n", "print(len(jobs))" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "This is sub-optimal for two reasons:\n", "\n", "* ``xquery`` is not given any projection, meaning it will pull all attributes for all jobs -\n", " much more data than is needed for simply counting jobs.\n", "* The querying across all schedds is serialized: we may wait for painfully long on one or two\n", " \"bad apples.\"\n", "\n", "We can instead begin the query for all schedds simultaneously, then read the responses as\n", "they are sent back. First, we start all the queries without reading responses:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2021-09-16T13:14:17.064017Z", "iopub.status.busy": "2021-09-16T13:14:17.063175Z", "iopub.status.idle": "2021-09-16T13:14:17.071287Z", "shell.execute_reply": "2021-09-16T13:14:17.072088Z" }, "pycharm": {} }, "outputs": [], "source": [ "queries = []\n", "coll_query = htcondor.Collector().locateAll(htcondor.DaemonTypes.Schedd)\n", "for schedd_ad in coll_query:\n", " schedd_obj = htcondor.Schedd(schedd_ad)\n", " queries.append(schedd_obj.xquery())" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "The iterators will yield the matching jobs; to return the autoclusters instead of jobs, use\n", "the ``AutoCluster`` option (``schedd_obj.xquery(opts=htcondor.QueryOpts.AutoCluster)``). One\n", "auto-cluster ad is returned for each set of jobs that have identical values for all significant\n", "attributes. A sample auto-cluster looks like:\n", "\n", " [\n", " RequestDisk = DiskUsage;\n", " Rank = 0.0;\n", " FileSystemDomain = \"hcc-briantest7.unl.edu\";\n", " MemoryUsage = ( ( ResidentSetSize + 1023 ) / 1024 );\n", " ImageSize = 1000;\n", " JobUniverse = 5;\n", " DiskUsage = 1000;\n", " JobCount = 1;\n", " Requirements = ( TARGET.Arch == \"X86_64\" ) && ( TARGET.OpSys == \"LINUX\" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( ( TARGET.HasFileTransfer ) || ( TARGET.FileSystemDomain == MY.FileSystemDomain ) );\n", " RequestMemory = ifthenelse(MemoryUsage isnt undefined,MemoryUsage,( ImageSize + 1023 ) / 1024);\n", " ResidentSetSize = 0;\n", " ServerTime = 1483758177;\n", " AutoClusterId = 2\n", " ]\n", "\n", "We use the `poll` function, which will return when a query has available results:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2021-09-16T13:14:17.076522Z", "iopub.status.busy": "2021-09-16T13:14:17.075609Z", "iopub.status.idle": "2021-09-16T13:14:17.086183Z", "shell.execute_reply": "2021-09-16T13:14:17.086783Z" }, "pycharm": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Got 10 results from jovyan@abae0fbbde81.\n", "{'jovyan@abae0fbbde81': 10}\n" ] } ], "source": [ "job_counts = {}\n", "for query in htcondor.poll(queries):\n", " schedd_name = query.tag()\n", " job_counts.setdefault(schedd_name, 0)\n", " count = len(query.nextAdsNonBlocking())\n", " job_counts[schedd_name] += count\n", " print(\"Got {} results from {}.\".format(count, schedd_name))\n", "print(job_counts)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "The `QueryIterator.tag` method is used to identify which query is returned; the\n", "tag defaults to the Schedd's name but can be manually set through the ``tag`` keyword argument\n", "to `Schedd.xquery`." ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "## History Queries\n", "\n", "After a job has finished in the Schedd, it moves from the queue to the history file. The\n", "history can be queried (locally or remotely) with the `Schedd.history` method:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2021-09-16T13:14:17.090919Z", "iopub.status.busy": "2021-09-16T13:14:17.090063Z", "iopub.status.idle": "2021-09-16T13:14:17.112956Z", "shell.execute_reply": "2021-09-16T13:14:17.113612Z" }, "pycharm": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " [\n", " JobStatus = 3; \n", " ProcId = 0; \n", " ClusterId = 1\n", " ]\n", "\n", " [\n", " JobStatus = 3; \n", " ProcId = 9; \n", " ClusterId = 3\n", " ]\n" ] } ], "source": [ "schedd = htcondor.Schedd()\n", "for ad in schedd.history(\n", " constraint='true',\n", " projection=['ProcId', 'ClusterId', 'JobStatus'],\n", " match=2, # limit to 2 returned results\n", "):\n", " print(ad)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 4 }