Personal Pools

Launch this tutorial in a Jupyter Notebook on Binder.

A Personal HTCondor Pool is an HTCondor pool that has a single owner, who is:

- The pool’s administrator.
- The only submitter allowed to submit jobs to the pool.
- The owner of all resources managed by the pool.

The HTCondor Python bindings provide a submodule, htcondor.personal, which allows you to manage personal pools from Python. Personal pools are useful for:

- Utilizing local computational resources (e.g., all of the cores on a lab server).
- Creating an isolated testing/development environment for HTCondor workflows.
- Serving as an entrypoint to other computational resources, like annexes or flocked pools (not yet implemented).

We can start a personal pool by instantiating a PersonalPool. This object represents the personal pool and lets us manage its “lifecycle”: starting it up and shutting it down. We can also use the PersonalPool to interact with the HTCondor pool once it has started.

Each Personal Pool must have a unique “local directory”, corresponding to the HTCondor configuration parameter LOCAL_DIR. For this tutorial, we’ll put it in the current working directory so that it’s easy to find.

Advanced users can configure the personal pool using the PersonalPool constructor. See the documentation for details on the available options.
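
For example, extra HTCondor configuration can be supplied when the pool is constructed. The sketch below assumes the constructor accepts a config mapping of HTCondor configuration macros; check the PersonalPool documentation for the exact parameter names before relying on it:

# A minimal sketch, assuming PersonalPool takes a `config` mapping of
# HTCondor configuration macros (consult the documentation for details).
from pathlib import Path
from htcondor.personal import PersonalPool

configured_pool = PersonalPool(
    local_dir = Path.cwd() / "configured-personal-condor",
    config = {"NUM_CPUS": "4"},  # e.g., advertise only 4 CPUs to the pool
)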

[1]:
import htcondor
from htcondor.personal import PersonalPool
from pathlib import Path
[2]:
pool = PersonalPool(local_dir = Path.cwd() / "personal-condor")
pool
[2]:
PersonalPool(local_dir=./personal-condor, state=INITIALIZED)

To tell the personal pool to start running, call the start() method:

[3]:
pool.start()
[3]:
PersonalPool(local_dir=./personal-condor, state=READY)

start() doesn’t return until the personal pool is READY, which means that it can accept commands (e.g., job submission).

Schedd and Collector objects for the personal pool are available as properties on the PersonalPool:

[4]:
pool.schedd
[4]:
<htcondor.htcondor.Schedd at 0x7faf80bfa040>
[5]:
pool.collector
[5]:
<htcondor.htcondor.Collector at 0x7faf80bf6e00>

For example, we can submit jobs using pool.schedd:

[6]:
sub = htcondor.Submit(
    executable = "/bin/sleep",
    arguments = "$(ProcID)s",  # each job sleeps for its ProcID in seconds (0s, 1s, ...)
)

schedd = pool.schedd
submit_result = schedd.submit(sub, count=10)

print(f"ClusterID is {submit_result.cluster()}")
ClusterID is 2

And we can query for the state of those jobs:

[7]:
for ad in pool.schedd.query(
    constraint = f"ClusterID == {submit_result.cluster()}",
    projection = ["ClusterID", "ProcID", "JobStatus"]
):
    print(repr(ad))
[ ProcID = 0; ClusterID = 2; JobStatus = 1; ServerTime = 1695159761 ]
[ ProcID = 1; ClusterID = 2; JobStatus = 1; ServerTime = 1695159761 ]
[ ProcID = 2; ClusterID = 2; JobStatus = 1; ServerTime = 1695159761 ]
[ ProcID = 3; ClusterID = 2; JobStatus = 1; ServerTime = 1695159761 ]
[ ProcID = 4; ClusterID = 2; JobStatus = 1; ServerTime = 1695159761 ]
[ ProcID = 5; ClusterID = 2; JobStatus = 1; ServerTime = 1695159761 ]
[ ProcID = 6; ClusterID = 2; JobStatus = 1; ServerTime = 1695159761 ]
[ ProcID = 7; ClusterID = 2; JobStatus = 1; ServerTime = 1695159761 ]
[ ProcID = 8; ClusterID = 2; JobStatus = 1; ServerTime = 1695159761 ]
[ ProcID = 9; ClusterID = 2; JobStatus = 1; ServerTime = 1695159761 ]
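
JobStatus is an integer code: 1 = Idle, 2 = Running, 3 = Removed, 4 = Completed, 5 = Held, 6 = Transferring Output, 7 = Suspended. A small helper (not part of the bindings, shown only for illustration) can translate the codes when printing query results:

# Hypothetical helper mapping HTCondor JobStatus codes to readable names.
JOB_STATUS_NAMES = {
    1: "Idle",
    2: "Running",
    3: "Removed",
    4: "Completed",
    5: "Held",
    6: "Transferring Output",
    7: "Suspended",
}

for ad in pool.schedd.query(
    constraint = f"ClusterID == {submit_result.cluster()}",
    projection = ["ClusterID", "ProcID", "JobStatus"],
):
    print(ad["ProcID"], JOB_STATUS_NAMES.get(ad["JobStatus"], "Unknown"))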

We can use the collector to query the state of the pool:

[8]:
# print 3 of the ads from the daemons in the pool (query order is arbitrary)
for ad in pool.collector.query()[:3]:
    print(ad)

    [
        AuthenticatedIdentity = "condor@family";
        EffectiveQuota = 0.0;
        Priority = 5.000000000000000E+02;
        Requested = 0.0;
        UpdateSequenceNumber = 3;
        PriorityFactor = 1.000000000000000E+03;
        AuthenticationMethod = "FAMILY";
        AccountingGroup = "<none>";
        Name = "<none>";
        SubtreeQuota = 0.0;
        IsAccountingGroup = true;
        MyType = "Accounting";
        NegotiatorName = "jovyan@fa6c829ace67";
        GroupSortKey = 0.0;
        ResourcesUsed = 0;
        ConfigQuota = 0.0;
        DaemonStartTime = 1695159756;
        BeginUsageTime = 0;
        LastHeardFrom = 1695159760;
        WeightedAccumulatedUsage = 0.0;
        AccumulatedUsage = 0.0;
        TargetType = "none";
        WeightedResourcesUsed = 0.0;
        DaemonLastReconfigTime = 1695159756;
        SurplusPolicy = "byquota";
        LastUsageTime = 0;
        LastUpdate = 1695159760
    ]

    [
        UpdatesLost_Collector = 0;
        UpdatesInitial_Collector = 1;
        ActiveQueryWorkers = 0;
        SubmitterAds = 0;
        RecentUpdatesLostRatio = 0.0;
        DetectedCpus = 16;
        UpdatesLost = 0;
        CCBReconnects = 0;
        MachineAdsPeak = 0;
        MaxJobsRunningPVM = 0;
        RecentUpdatesLost_Accouting = 0;
        RecentCCBReconnects = 0;
        MaxJobsRunningPipe = 0;
        UpdatesInitial_Accouting = 1;
        UpdatesInitial_Schedd = 1;
        StatsLastUpdateTime = 1695159757;
        CurrentJobsRunningLinda = 0;
        StatsLifetime = 1;
        MonitorSelfTime = 1695159756;
        RecentUpdatesInitial_Negotiator = 1;
        CurrentJobsRunningStandard = 0;
        MaxJobsRunningAll = 0;
        CondorPlatform = "$CondorPlatform: X86_64-Ubuntu_20.04 $";
        CCBRequests = 0;
        MaxJobsRunningVM = 0;
        AddressV1 = "{[ p=\"primary\"; a=\"172.17.0.2\"; port=46071; n=\"Internet\"; alias=\"fa6c829ace67\"; spid=\"collector\"; noUDP=true; ], [ p=\"IPv4\"; a=\"172.17.0.2\"; port=46071; n=\"Internet\"; alias=\"fa6c829ace67\"; spid=\"collector\"; noUDP=true; ]}";
        UpdatesTotal_Accouting = 1;
        HostsUnclaimed = 0;
        MaxJobsRunningJava = 0;
        UpdatesInitial = 4;
        MaxJobsRunningGrid = 0;
        DetectedMemory = 32180;
        MaxJobsRunningPVMD = 0;
        RecentUpdatesLostMax = 0;
        RecentUpdatesTotal = 4;
        MaxJobsRunningStandard = 0;
        UpdatesTotal_Negotiator = 1;
        RecentUpdatesInitial_Accouting = 1;
        CurrentJobsRunningVM = 0;
        RecentUpdatesLost_Negotiator = 0;
        RecentUpdatesLost_Collector = 0;
        MaxJobsRunningUnknown = 0;
        CurrentJobsRunningPipe = 0;
        RecentCCBRequestsSucceeded = 0;
        CurrentJobsRunningLocal = 0;
        RecentUpdatesInitial = 4;
        RunningJobs = 0;
        CollectorIpAddr = "<172.17.0.2:46071?addrs=172.17.0.2-46071&alias=fa6c829ace67&noUDP&sock=collector>";
        UpdatesLost_Schedd = 0;
        Machine = "fa6c829ace67";
        CCBRequestsFailed = 0;
        CurrentJobsRunningPVMD = 0;
        MyCurrentTime = 1695159756;
        UpdatesLostRatio = 0.0;
        TargetType = "";
        MonitorSelfSecuritySessions = 2;
        LastHeardFrom = 1695159757;
        UpdateInterval = 21600;
        CurrentJobsRunningParallel = 0;
        CCBRequestsSucceeded = 0;
        MonitorSelfImageSize = 16092;
        CurrentJobsRunningScheduler = 0;
        CurrentJobsRunningAll = 0;
        MonitorSelfCPUUsage = 2.600000000000000E+01;
        UpdatesLost_Accouting = 0;
        SubmitterAdsPeak = 0;
        ForkQueriesFromCOLLECTOR = 2;
        UpdatesLost_Negotiator = 0;
        CurrentJobsRunningVanilla = 0;
        CCBEndpointsConnected = 0;
        CurrentJobsRunningPVM = 0;
        DaemonCoreDutyCycle = 4.107288524954678E-03;
        RecentStatsLifetime = 1;
        MonitorSelfRegisteredSocketCount = 2;
        RecentUpdatesLost = 0;
        RecentUpdatesInitial_Schedd = 1;
        MaxJobsRunningParallel = 0;
        RecentForkQueriesFromCOLLECTOR = 2;
        MaxJobsRunningLinda = 0;
        DroppedQueries = 0;
        CurrentJobsRunningUnknown = 0;
        HostsTotal = 0;
        CCBEndpointsRegistered = 0;
        UpdatesTotal = 4;
        RecentUpdatesTotal_Accouting = 1;
        ActiveQueryWorkersPeak = 1;
        MyType = "Collector";
        MonitorSelfResidentSetSize = 10940;
        HostsOwner = 0;
        RecentUpdatesLost_Schedd = 0;
        RecentUpdatesTotal_Negotiator = 1;
        RecentCCBRequestsNotFound = 0;
        CondorAdmin = "root@fa6c829ace67";
        UpdatesTotal_Collector = 1;
        CCBEndpointsConnectedPeak = 0;
        RecentCCBRequests = 0;
        UpdatesTotal_Schedd = 1;
        CCBRequestsNotFound = 0;
        RecentDroppedQueries = 0;
        MachineAds = 0;
        RecentUpdatesTotal_Schedd = 1;
        IdleJobs = 0;
        CCBEndpointsRegisteredPeak = 0;
        CurrentJobsRunningJava = 0;
        RecentDaemonCoreDutyCycle = 4.107288524954678E-03;
        CurrentJobsRunningMPI = 0;
        PendingQueriesPeak = 0;
        RecentUpdatesInitial_Collector = 1;
        PendingQueries = 0;
        UpdatesLostMax = 0;
        CondorVersion = "$CondorVersion: 8.9.11 Dec 29 2020 BuildID: Debian-8.9.11-1.2 PackageID: 8.9.11-1.2 Debian-8.9.11-1.2 $";
        RecentUpdatesTotal_Collector = 1;
        MaxJobsRunningLocal = 0;
        RecentCCBRequestsFailed = 0;
        MaxJobsRunningVanilla = 0;
        Name = "My Pool - 127.0.0.1@fa6c829ace67";
        MyAddress = "<172.17.0.2:46071?addrs=172.17.0.2-46071&alias=fa6c829ace67&noUDP&sock=collector>";
        CurrentJobsRunningGrid = 0;
        MaxJobsRunningMPI = 0;
        HostsClaimed = 0;
        MaxJobsRunningScheduler = 0;
        UpdatesInitial_Negotiator = 1;
        MonitorSelfAge = 1
    ]

    [
        LastHeardFrom = 1695159760;
        LastBenchmark = 0;
        TotalVirtualMemory = 32952448;
        HasReconnect = true;
        has_sse4_2 = true;
        OpSysMajorVer = 20;
        has_sse4_1 = true;
        DaemonCoreDutyCycle = 0.0;
        Disk = 362403396;
        CondorVersion = "$CondorVersion: 8.9.11 Dec 29 2020 BuildID: Debian-8.9.11-1.2 PackageID: 8.9.11-1.2 Debian-8.9.11-1.2 $";
        SlotTypeID = 1;
        Machine = "fa6c829ace67";
        HasPerFileEncryption = true;
        TotalSlotGPUs = 0;
        TotalGPUs = 0;
        Activity = "Idle";
        TotalCondorLoadAvg = 0.0;
        CpuCacheSize = 512;
        MonitorSelfCPUUsage = 7.000000000000001E+00;
        OpSys = "LINUX";
        SlotType = "Partitionable";
        UtsnameVersion = "#1 SMP Debian 4.19.132-1 (2020-07-24)";
        AuthenticationMethod = "FAMILY";
        CpuModelNumber = 1;
        MyCurrentTime = 1695159760;
        Name = "slot1@fa6c829ace67";
        Unhibernate = MY.MachineLastMatchTime isnt undefined;
        IsWakeOnLanSupported = false;
        HasJobDeferral = true;
        UtsnameNodename = "fa6c829ace67";
        ChildDSlotId =
           {
           };
        ChildRemoteUser =
           {
           };
        HasJICLocalConfig = true;
        DaemonStartTime = 1695159756;
        ChildRetirementTimeRemaining =
           {
           };
        HibernationSupportedStates = "S3,S4,S5";
        NextFetchWorkDelay = -1;
        TotalMemory = 32180;
        has_avx2 = true;
        HasTransferInputRemaps = true;
        RetirementTimeRemaining = 0;
        FileSystemDomain = "fa6c829ace67";
        StartdIpAddr = "<172.17.0.2:46071?addrs=172.17.0.2-46071&alias=fa6c829ace67&noUDP&sock=startd_326_2ff6>";
        RecentJobRankPreemptions = 0;
        ClockDay = 2;
        TotalLoadAvg = 6.800000000000000E-01;
        HasJobTransferPlugins = true;
        Cpus = 16;
        CondorLoadAvg = 0.0;
        MaxJobRetirementTime = 0;
        NumDynamicSlots = 0;
        StarterAbilityList = "HasFileTransferPluginMethods,HasVM,HasMPI,HasFileTransfer,HasJobDeferral,HasJobTransferPlugins,HasPerFileEncryption,HasReconnect,HasTDP,HasJICLocalStdin,HasTransferInputRemaps,HasSelfCheckpointTransfers,HasJICLocalConfig";
        HardwareAddress = "02:42:ac:11:00:02";
        ChildMemory =
           {
           };
        HasTDP = true;
        ClockMin = 1302;
        AcceptedWhileDraining = false;
        TimeToLive = 2147483647;
        EnteredCurrentActivity = 1695159760;
        Arch = "X86_64";
        SlotWeight = Cpus;
        MyType = "Machine";
        JobRankPreemptions = 0;
        HasIOProxy = true;
        TotalSlotMemory = 32180;
        Requirements = START && (WithinResourceLimits);
        UtsnameSysname = "Linux";
        NumPids = 0;
        TargetType = "Job";
        JobUserPrioPreemptions = 0;
        LastFetchWorkCompleted = 0;
        UpdatesHistory = "00000000000000000000000000000000";
        RecentJobUserPrioPreemptions = 0;
        COLLECTOR_HOST_STRING = "127.0.0.1:0";
        PslotRollupInformation = true;
        ChildAccountingGroup =
           {
           };
        Rank = 0.0;
        CpuBusyTime = 0;
        ExpectedMachineQuickDrainingCompletion = 1695159760;
        MonitorSelfResidentSetSize = 11572;
        MachineMaxVacateTime = 10 * 60;
        IsWakeOnLanEnabled = false;
        ChildName =
           {
           };
        has_ssse3 = true;
        HasSelfCheckpointTransfers = true;
        GPUs = 0;
        MachineResources = "Cpus Memory Disk Swap GPUs";
        WakeOnLanEnabledFlags = "NONE";
        has_avx = true;
        ExpectedMachineGracefulDrainingBadput = 0;
        CurrentRank = 0.0;
        HasFileTransfer = true;
        EnteredCurrentState = 1695159760;
        MonitorSelfSecuritySessions = 3;
        HasJICLocalStdin = true;
        CpuBusy = ((LoadAvg - CondorLoadAvg) >= 5.000000000000000E-01);
        DetectedMemory = 32180;
        MonitorSelfTime = 1695159760;
        CpuFamily = 23;
        OpSysShortName = "Ubuntu";
        HasVM = false;
        CanHibernate = true;
        ChildGPUs =
           {
           };
        DetectedCpus = 16;
        MonitorSelfRegisteredSocketCount = 0;
        CpuIsBusy = true;
        SlotID = 1;
        OpSysLongName = "Ubuntu 20.04.2 LTS";
        UtsnameMachine = "x86_64";
        ExpectedMachineGracefulDrainingCompletion = 1695159760;
        AuthenticatedIdentity = "condor@family";
        OpSysVer = 2004;
        OpSysAndVer = "Ubuntu20";
        UpdatesSequenced = 0;
        HibernationState = "NONE";
        UpdateSequenceNumber = 1;
        RecentJobPreemptions = 0;
        HibernationLevel = 0;
        HasMPI = true;
        WithinResourceLimits = (MY.Cpus > 0 && TARGET.RequestCpus <= MY.Cpus && MY.Memory > 0 && TARGET.RequestMemory <= MY.Memory && MY.Disk > 0 && TARGET.RequestDisk <= MY.Disk && (TARGET.RequestGPUs is undefined || MY.GPUs >= TARGET.RequestGPUs));
        MonitorSelfImageSize = 16564;
        OpSysLegacy = "LINUX";
        ChildCurrentRank =
           {
           };
        LoadAvg = 6.800000000000000E-01;
        JobPreemptions = 0;
        ChildDisk =
           {
           };
        Memory = 32180;
        MonitorSelfAge = 4;
        ChildRemoteOwner =
           {
           };
        UpdatesLost = 0;
        ChildCpus =
           {
           };
        ChildState =
           {
           };
        TotalDisk = 362403396;
        TotalSlotDisk = 3.624033960000000E+08;
        OpSysName = "Ubuntu";
        UpdatesTotal = 1;
        ChildActivity =
           {
           };
        CondorPlatform = "$CondorPlatform: X86_64-Ubuntu_20.04 $";
        HasFileTransferPluginMethods = "box,https,gdrive,dav,davs,http,onedrive,data,ftp,file,s3";
        DaemonLastReconfigTime = 1695159756;
        MyAddress = "<172.17.0.2:46071?addrs=172.17.0.2-46071&alias=fa6c829ace67&noUDP&sock=startd_326_2ff6>";
        DetectedGPUs = 0;
        KeyboardIdle = 1695159760;
        WakeOnLanSupportedFlags = "NONE";
        State = "Unclaimed";
        PartitionableSlot = true;
        JobStarts = 0;
        RecentDaemonCoreDutyCycle = 0.0;
        Start = true;
        UtsnameRelease = "4.19.0-10-amd64";
        TotalSlots = 1;
        UidDomain = "fa6c829ace67";
        SubnetMask = "255.255.0.0";
        IsWakeAble = false;
        RecentJobStarts = 0;
        AddressV1 = "{[ p=\"primary\"; a=\"172.17.0.2\"; port=46071; n=\"Internet\"; alias=\"fa6c829ace67\"; spid=\"startd_326_2ff6\"; noUDP=true; ], [ p=\"IPv4\"; a=\"172.17.0.2\"; port=46071; n=\"Internet\"; alias=\"fa6c829ace67\"; spid=\"startd_326_2ff6\"; noUDP=true; ]}";
        TotalCpus = 1.600000000000000E+01;
        TotalSlotCpus = 16;
        ChildEnteredCurrentState =
           {
           };
        IsLocalStartd = false;
        LastFetchWorkSpawned = 0;
        VirtualMemory = 32952448;
        ExpectedMachineQuickDrainingBadput = 0;
        ConsoleIdle = 1695159760
    ]
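
Collector queries can also be narrowed to a specific ad type with a projection, which is usually more practical than printing entire ads. For example, to look at just the machine (startd) ads; the exact attributes advertised may vary between HTCondor versions:

# Query only the startd (machine) ads and project a few attributes.
for ad in pool.collector.query(
    ad_type = htcondor.AdTypes.Startd,
    projection = ["Name", "State", "Activity", "Cpus", "Memory"],
):
    print(repr(ad))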

When you’re done using the personal pool, you can stop() it:

[9]:
pool.stop()
[9]:
PersonalPool(local_dir=./personal-condor, state=STOPPED)

stop(), like start(), will not return until the personal pool has actually stopped running. The personal pool will also be stopped automatically if the PersonalPool object is garbage-collected, or when the Python interpreter stops running.

To prevent the pool from being automatically stopped in these situations, call the detach() method. The corresponding attach() method can be used to “re-connect” to a detached personal pool.
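
A rough sketch of that pattern, assuming attach() is called on a PersonalPool pointed at the same local directory (consult the documentation for the exact usage):

# Sketch only: keep the pool running after this process exits...
pool.detach()

# ...then, in a later script or session, re-connect to the running pool by
# pointing a PersonalPool at the same local directory and attaching to it
# instead of starting a new one.
reattached = PersonalPool(local_dir = Path.cwd() / "personal-condor")
reattached.attach()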

When working with a personal pool in a script, you may want to use it as a context manager. The pool will automatically start and stop at the beginning and end of the with block:

[10]:
with PersonalPool(local_dir = Path.cwd() / "another-personal-condor") as pool:  # note: no need to call start()
    print(pool.get_config_val("LOCAL_DIR"))
/home/jovyan/tutorials/another-personal-condor