{ "cells": [ { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "# HTCondor Introduction\n", "\n", "Launch this tutorial in a Jupyter Notebook on Binder: \n", "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/htcondor/htcondor-python-bindings-tutorials/master?urlpath=lab/tree/HTCondor-Introduction.ipynb)\n", "\n", "Let's start interacting with the HTCondor daemons!\n", "\n", "We'll cover the basics of two daemons, the _Collector_ and the _Schedd_:\n", "\n", "- The **Collector** maintains an inventory of all the pieces of the HTCondor pool. For example, each machine that can run jobs will advertise a ClassAd describing its resources and state. In this module, we'll learn the basics of querying the collector for information and displaying results.\n", "- The **Schedd** maintains a queue of jobs and is responsible for managing their execution. We'll learn the basics of querying the schedd.\n", "\n", "There are several other daemons - particularly, the _Startd_ and the _Negotiator_ - that the Python bindings can interact with. We'll cover those in the advanced modules.\n", "\n", "If you are running these tutorials in the provided Docker container or on Binder, a local HTCondor pool has been started in the background for you to interact with.\n", "\n", "To get started, let's import the `htcondor` and `classad` modules." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "pycharm": {} }, "outputs": [], "source": [ "import htcondor\n", "import classad" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "## Collector\n", "\n", "We'll start with the _Collector_, which gathers descriptions of the states of all the daemons in your HTCondor pool. The collector provides both **service discovery** and **monitoring** for these daemons.\n", "\n", "Let's try to find the Schedd information for your HTCondor pool. 
First, we'll create a `Collector` object, then use the `locate` method:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "pycharm": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " [\n", " CondorPlatform = \"$CondorPlatform: X86_64-CentOS_7.9 $\"; \n", " MyType = \"Scheduler\"; \n", " Machine = \"fa6c829ace67\"; \n", " Name = \"jovyan@fa6c829ace67\"; \n", " CondorVersion = \"$CondorVersion: 10.7.0 2023-07-31 BuildID: UW_Python_Wheel_Build $\"; \n", " MyAddress = \"<172.17.0.2:9618?addrs=172.17.0.2-9618&alias=fa6c829ace67&noUDP&sock=schedd_82_2342>\"\n", " ]\n" ] } ], "source": [ "coll = htcondor.Collector() # create the object representing the collector\n", "schedd_ad = coll.locate(htcondor.DaemonTypes.Schedd) # locate the default schedd\n", "\n", "print(schedd_ad)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "The `locate` method takes a type of daemon and (optionally) a name, returning a ClassAd that describes how to contact the daemon.\n", "\n", "A few interesting points about the above example:\n", " - Because we didn't pass any arguments to the `Collector` constructor, we used the default collector in the container's configuration file.\n", " If we wanted to instead query a non-default collector, we could have done `htcondor.Collector(\"collector.example.com\")`.\n", " - We used the `DaemonTypes` enumeration to pick the kind of daemon to return.\n", " - If there were multiple schedds in the pool, the `locate` query would have failed.\n", " In such a case, we need to provide an explicit name to the method. 
E.g., `coll.locate(htcondor.DaemonTypes.Schedd, \"schedd.example.com\")`.\n", " - The `MyAddress` field in the ad is the actual address information.\n", " You may be surprised that this is not simply a `hostname:port`; \n", " to help manage addressing in today's complicated Internet (full of NATs, private networks, and firewalls), a more flexible structure was needed.\n", " HTCondor developers sometimes refer to this as the _sinful string_; here, _sinful_ is a play on a Unix data structure, not a moral judgement.\n", " \n", "The `locate` method often returns only enough data to contact a remote daemon. Typically, a ClassAd records significantly more attributes. For example, if we wanted to query for a few specific attributes, we would use the `query` method instead:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "pycharm": {} }, "outputs": [ { "data": { "text/plain": [ "[[ DaemonCoreDutyCycle = 1.150843433815885E-02; Name = \"jovyan@fa6c829ace67\"; MyAddress = \"<172.17.0.2:9618?addrs=172.17.0.2-9618&alias=fa6c829ace67&noUDP&sock=schedd_82_2342>\" ]]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coll.query(htcondor.AdTypes.Schedd, projection=[\"Name\", \"MyAddress\", \"DaemonCoreDutyCycle\"])" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "Here, `query` takes an `AdType` (slightly more generic than the `DaemonTypes`, as many kinds of ads are in the collector) and several optional arguments, then returns a list of ClassAds.\n", "\n", "We used the `projection` keyword argument; this indicates what attributes you want returned.\n", "The collector may automatically insert additional attributes (here, only `MyType`); \n", "if an ad is missing a requested attribute, it is simply not set in the returned ClassAd object.\n", "If no projection is specified, then all attributes are returned.\n", "\n", "**WARNING**: when possible, utilize the projection to limit the data 
returned. Some ads may have hundreds of attributes, making returning the entire ad an expensive operation.\n", "\n", "The projection filters the returned _keys_; to filter out unwanted _ads_, utilize the `constraint` option. Let's do the same query again, but specify our hostname explicitly:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "pycharm": {} }, "outputs": [ { "data": { "text/plain": [ "[[ DaemonCoreDutyCycle = 1.150843433815885E-02; Name = \"jovyan@fa6c829ace67\"; MyAddress = \"<172.17.0.2:9618?addrs=172.17.0.2-9618&alias=fa6c829ace67&noUDP&sock=schedd_82_2342>\" ]]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import socket # We'll use this to automatically fill in our hostname\n", "\n", "name = classad.quote(f\"jovyan@{socket.getfqdn()}\")\n", "coll.query(\n", " htcondor.AdTypes.Schedd, \n", " constraint=f\"Name =?= {name}\", \n", " projection=[\"Name\", \"MyAddress\", \"DaemonCoreDutyCycle\"],\n", ")" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "Notes:\n", "- `constraint` accepts either an `ExprTree` or `string` object; the latter is automatically parsed as an expression.\n", "- We used the `classad.quote` function to properly quote the hostname string. In this example, we're relatively certain the hostname won't contain quotes. However, it is good practice to use the `quote` function to avoid possible SQL-injection-type attacks. Consider what would happen if the host's FQDN contained spaces and double quotes, such as `foo.example.com\" || true`!" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "## Schedd\n", "\n", "Let's try our hand at querying the `schedd`!\n", "\n", "First, we'll need a schedd object. 
You may either create one out of the ad returned by `locate` above or use the default in the configuration file:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "pycharm": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "schedd = htcondor.Schedd(schedd_ad)\n", "print(schedd)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "Unfortunately, as there are no jobs in our personal HTCondor pool, querying the `schedd` will be boring. Let's submit a few jobs (**note** the API used below will be covered by the next module; it's OK if you don't understand it now):" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "pycharm": {} }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sub = htcondor.Submit(\n", " executable = \"/bin/sleep\",\n", " arguments = \"5m\",\n", ")\n", "schedd.submit(sub, count=10)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "We should now have 10 jobs in queue, each of which should take 5 minutes to complete.\n", "\n", "Let's query for the jobs, paying attention to the jobs' ID and status:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "pycharm": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ ProcId = 8; ClusterId = 7; JobStatus = 1; ServerTime = 1695159746 ]\n", "[ ProcId = 9; ClusterId = 7; JobStatus = 1; ServerTime = 1695159746 ]\n", "[ ProcId = 0; ClusterId = 7; JobStatus = 2; ServerTime = 1695159746 ]\n", "[ ProcId = 1; ClusterId = 7; JobStatus = 2; ServerTime = 1695159746 ]\n", "[ ProcId = 2; ClusterId = 7; JobStatus = 2; ServerTime = 1695159746 ]\n", "[ ProcId = 3; ClusterId = 7; JobStatus = 1; ServerTime = 1695159746 ]\n", "[ ProcId = 4; ClusterId = 7; JobStatus = 1; ServerTime = 1695159746 ]\n", "[ ProcId = 5; ClusterId = 7; JobStatus = 1; ServerTime = 1695159746 ]\n", "[ 
ProcId = 6; ClusterId = 7; JobStatus = 1; ServerTime = 1695159746 ]\n", "[ ProcId = 7; ClusterId = 7; JobStatus = 1; ServerTime = 1695159746 ]\n" ] } ], "source": [ "for job in schedd.query(projection=['ClusterId', 'ProcId', 'JobStatus']):\n", " print(repr(job))" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "The `JobStatus` is an integer; the integers map into the following states:\n", "- `1`: Idle (`I`)\n", "- `2`: Running (`R`)\n", "- `3`: Removed (`X`)\n", "- `4`: Completed (`C`)\n", "- `5`: Held (`H`)\n", "- `6`: Transferring Output\n", "- `7`: Suspended\n", "\n", "Depending on how quickly you executed the above cell, you might see all jobs idle (`JobStatus = 1`) or some jobs running (`JobStatus = 2`) above.\n", "\n", "As with the Collector's `query` method, we can also filter jobs by passing a `constraint` to the Schedd's `query` method:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "pycharm": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "8\n", "9\n", "5\n", "6\n", "7\n" ] } ], "source": [ "for ad in schedd.query(constraint = 'ProcId >= 5', projection=['ProcId']):\n", " print(ad.get('ProcId'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's clean up after ourselves (this will remove all of the jobs you own from the queue)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "[ TotalChangedAds = 1; TotalJobAds = 0; TotalPermissionDenied = 0; TotalAlreadyDone = 0; TotalBadStatus = 0; TotalNotFound = 0; TotalSuccess = 10; TotalError = 0 ]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import getpass\n", "\n", "schedd.act(htcondor.JobAction.Remove, f'Owner == \"{getpass.getuser()}\"')" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "## On Job Submission\n", "\n", "Congratulations! 
You can now perform simple queries against the collector for worker and submit hosts, as well as simple job queries against the submit host!\n", "\n", "It is now time to move on to [advanced job submission and management](Advanced-Job-Submission-And-Management.ipynb)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 4 }