Worker Stuck and Cannot be Deleted

I have three workers that I cannot delete. When I try to delete them I get this error:

There was a problem with your request.

  • The worker machine is being used in one or more deployment tasks. Set the worker to disabled, wait for all tasks using the work to complete and then try again.

I have had the workers set to disabled for a few days now and I still cannot delete them.

Is there a way to “Force Delete” them?
(I don’t really care about whatever it thinks it is working on.)

NOTE: I have an automated process that removes all the workers and then creates new ones, so I would prefer a way to force delete them via the Octopus API, if possible.

Hi @OctopusSchaff,

Thanks for getting in touch! We don’t have a force delete option for workers as far as I’m aware. I’ve asked our developers whether this is an intentional omission or whether we might consider adding it (or whether we do and I’m just unaware of it). However, the worker being set to disabled for a few days should be more than enough to clear any associated / active tasks.

I am aware that you’re mostly just after an API solution to automate this process, but I think the best approach to resolving this right now is to find out why the worker still thinks there’s a task being executed on it and see if we can get things unstuck. It could point towards a larger issue or bug that needs our attention.

To start, would you be able to let me know which version of Octopus Server you’re currently using, along with details about the worker machine with this issue? (Is it a Windows or SSH target? If it is running Tentacle, which version? And so on.)

Do you see this issue frequently, sporadically, or is this the first occurrence of a worker disabled for multiple days refusing to release tasks?

Finally, are you able to share the relevant parts of the script you use to remove the workers?

The above information will help provide some additional context around this problem while waiting for feedback from our developers.

Looking forward to hearing from you and getting to the bottom of this.

Best Regards,
Daniel

Thank you for your response!

This is the first time this issue has happened to me. Normally my script is able to delete the Workers just fine.

I found this thread: “Unable to delete worker” (reply #2 by trond), which suggested deleting the relevant rows from the WorkerTaskLease table. While it is not an easily automatable solution, we tried it out manually and it fixed my current issue (I was able to delete the workers).

If this issue repeats then I think it will be worth digging into, but I am hoping it was an isolated incident. For now, let’s not worry about it unless it happens again.


For reference:

  1. Both tasks that caused the workers to stay locked had timed out. My guess is that this error caused the lock to not get released (locking and releasing can be tricky business). This was the error message for both tasks:

A request was sent to a polling endpoint, but the polling endpoint did not collect the request within the allowed time (00:02:00), so the request timed out.

Error Call Stack
Server exception: System.TimeoutException: A request was sent to a polling endpoint, but the polling endpoint did not collect the request within the allowed time (00:02:00), so the request timed out.
Halibut.HalibutClientException
 at Halibut.ServiceModel.HalibutProxy.EnsureNotError(ResponseMessage responseMessage)
 at Halibut.ServiceModel.HalibutProxy.Invoke(MethodInfo targetMethod, Object[] args)
 at System.Reflection.DispatchProxyGenerator.Invoke(Object[] args)
 at generatedProxy_1.StartScript(StartScriptCommand )
 at Octopus.Server.Orchestration.Targets.Tentacles.TentacleRemoteEndpointFacade.ExecuteCommand(StartScriptCommand command, ITaskLog taskLog) in TentacleRemoteEndpointFacade.cs:line 62
 at Octopus.Server.Orchestration.ServerTasks.Deploy.ActionExecution.Immediate.ExecutionTargets.TentacleExecutionTarget.Execute(ScriptCollection bootstrapperScripts, IReadOnlyList`1 bootstrapperArguments, IReadOnlyList`1 files, Nullable`1 forceIsolationLevel, Boolean raw, ITaskLog taskLog, String isolationMutexName, Nullable`1 isolationMutexTimeout) in TentacleExecutionTarget.cs:line 69
 at Octopus.Server.Orchestration.ServerTasks.Deploy.ActionExecution.Immediate.ImmediateExecutor.ExecuteCalamari(CalamariFlavour calamariFlavour, String calamariCommand, IReadOnlyList`1 calamariArguments, IReadOnlyList`1 files, IReadOnlyList`1 deploymentTools, VariableCollection extraVariables, TargetManifest targetManifest, CalamariPlatformConstraint calamariPlatformConstraint, Nullable`1 isolationMutexTimeout, String isolationMutexName, ITaskLog taskLog) in ImmediateExecutor.cs:line 160
 at Octopus.Server.Orchestration.ServerTasks.Deploy.ActionExecution.CommandBuilders.CalamariCommandBuilder.Execute(ITaskLog taskLog) in CalamariCommandBuilder.cs:line 172
 at Octopus.Server.Orchestration.ServerTasks.Deploy.ActionDispatch.AdHocActionDispatcher.InvokeActionHandler(Machine target, ActionHandlerInvocation actionHandler, ActionAndTargetScopedVariables actionAndTargetScopedVariables, IExecutor executor, TargetManifest targetManifest, ITaskLog taskLog) in AdHocActionDispatcher.cs:line 210
 at Octopus.Server.Orchestration.ServerTasks.Deploy.ActionDispatch.AdHocActionDispatcher.ExecuteOnWorker(TargetManifest targetManifest, Worker worker, ActionHandlerInvocation actionHandler, ITaskLog taskLog) in AdHocActionDispatcher.cs:line 109
 at Octopus.Server.Orchestration.ServerTasks.Deploy.ActionDispatch.AdHocActionDispatcher.Dispatch(Machine machine, ActionHandlerInvocation actionHandler, ITaskLog taskLog, VariableCollection variables) in AdHocActionDispatcher.cs:line 57
 at Octopus.Server.Orchestration.ServerTasks.Deploy.PackageLockReleaser.ReleasePackageLock(Machine machine, String taskId, ITaskLog taskLog) in PackageLockReleaser.cs:line 37
 at Octopus.Core.Extensions.EnumerableExtensionMethods.<>c__DisplayClass6_0`1.<Do>b__0(T item) in EnumerableExtensionMethods.cs:line 89
 at System.Linq.Enumerable.SelectEnumerableIterator`2.MoveNext()
 at Octopus.Core.Extensions.EnumerableExtensionMethods.Done[T](IEnumerable`1 items) in EnumerableExtensionMethods.cs:line 108
 at Octopus.Server.Orchestration.Workers.WorkerReleaser.Release(ITaskLog taskLog, ITaskContext taskContext, AcquiredPackageMap acquiredPackages) in WorkerReleaser.cs:line 27
 at Octopus.Server.Orchestration.ServerTasks.Deploy.ExecutionTaskController`3.<ExecuteBase>b__16_0(IWorkerReleaser workerReleaser, CancellationToken _) in ExecutionTaskController.cs:line 97
 at Octopus.Server.Infrastructure.Orchestration.UnitsOfWork.UnitOfWorkExecutor.<>c__DisplayClass3_0`1.<Execute in UnitOfWorkExecutor.cs:line 87
 at Octopus.Core.Infrastructure.UnitsOfWork.UnitOfWorkExtensionMethods.Do(IUnitOfWork unitOfWork, Func`1 action, CancellationToken cancellationToken, String name) in UnitOfWorkExtensionMethods.cs:line 75
 at Octopus.Core.Infrastructure.UnitsOfWork.UnitOfWorkExtensionMethods.Do(IUnitOfWork unitOfWork, Func`1 action, CancellationToken cancellationToken, String name) in UnitOfWorkExtensionMethods.cs:line 75
 at Octopus.Server.Infrastructure.Orchestration.UnitsOfWork.UnitOfWorkExecutor.Execute[T](Func`3 action, CancellationToken cancellationToken, String name) in UnitOfWorkExecutor.cs:line 90
 at Octopus.Server.Orchestration.ServerTasks.Deploy.ExecutionTaskController`3.ExecuteBase(CancellationToken cancellationToken) in ExecutionTaskController.cs:line 92
 at Octopus.Server.Orchestration.ServerTasks.Deploy.RunbookRunTaskController.Execute(CancellationToken cancellationToken) in RunbookRunTaskController.cs:line 56
 at Octopus.Server.Orchestration.ServerTasks.RunningTask.<>c__DisplayClass31_1.<WorkerTask in RunningTask.cs:line 188
 at Octopus.Core.Infrastructure.UnitsOfWork.UnitOfWorkExtensionMethods.Do(IUnitOfWork unitOfWork, Func`1 action, CancellationToken cancellationToken, String name) in UnitOfWorkExtensionMethods.cs:line 75
 at Octopus.Core.Infrastructure.UnitsOfWork.UnitOfWorkExtensionMethods.Do(IUnitOfWork unitOfWork, Func`1 action, CancellationToken cancellationToken, String name) in UnitOfWorkExtensionMethods.cs:line 75
 at Octopus.Server.Orchestration.ServerTasks.RunningTask.WorkerTask(CancellationToken cancellationToken) in RunningTask.cs:line 221
  2. I am using Octopus Server 2022.2.6895. The worker is a Docker container running in Kubernetes, using version 6.1.1409 of the Docker Tentacle container image. (I do update Calamari when it starts up, though; I am not sure whether the Calamari version is related to the Tentacle version.)

  3. This is the part of the script that I use to delete the workers:

# Get the workers in the WorkerPool
$workers = Invoke-RestMethod -Method Get -Uri "$octopusUrl/api/$spaceId/workerpools/$kubernetesWorkerPoolId/workers" -Headers $headers

if ($isTestRun -eq $false)
{
    # Delete the workers in the WorkerPool
    $workers.Items | ForEach-Object -Process {
        Write-Output "Deleting Worker $($_.Id)"
        Invoke-RestMethod -Method Delete -Uri "$octopusUrl/api/$spaceId/workers/$($_.Id)" -Headers $headers
    }
}
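
Since a delete can be rejected with the "being used in one or more deployment tasks" error, one possible refinement is to catch the failure per worker and collect the IDs that are stuck. This is only a rough sketch: `Remove-WorkerSafely` and its `-DeleteAction` parameter are hypothetical names invented for illustration, and the default action assumes the same `$octopusUrl`, `$spaceId` and `$headers` variables as the script above.

```powershell
# Hypothetical helper (not part of Octopus): tries to delete one worker and
# returns a result object instead of only writing the error to the console.
function Remove-WorkerSafely {
    param(
        [Parameter(Mandatory)] $Worker,

        # The delete call is injectable so it can be swapped out for a dry run.
        # By default it issues the same DELETE request as the script above.
        [scriptblock] $DeleteAction = {
            param($w)
            Invoke-RestMethod -Method Delete -Uri "$octopusUrl/api/$spaceId/workers/$($w.Id)" -Headers $headers
        }
    )

    try {
        & $DeleteAction $Worker | Out-Null
        return [pscustomobject]@{ Id = $Worker.Id; Deleted = $true; Error = $null }
    }
    catch {
        # Invoke-RestMethod throws on a non-success response, e.g. when Octopus
        # still believes a task is using the worker.
        return [pscustomobject]@{ Id = $Worker.Id; Deleted = $false; Error = $_.Exception.Message }
    }
}
```

The loop could then call `Remove-WorkerSafely -Worker $_` for each item in `$workers.Items`, filter the results on `Deleted -eq $false`, and fail the run (or raise an alert) with the list of stuck worker IDs, so that a leftover lease gets noticed straight away rather than days later.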

Hey @OctopusSchaff,

Just jumping in for Daniel, who is currently off shift as part of our Australian-based team. Thank you for confirming that you managed to delete the workers manually. Although we do not usually recommend removing rows directly from the database, it is fine to do so in the WorkerTaskLease table, as it will just drop the worker’s connection to Octopus until its next check-in (via a deployment, or on a timer if it is a polling Tentacle).

To fix cloud worker issues we usually remove the lease from that instance, so it’s the same thing for Server.

I am also hoping this is a one-off issue. As you mentioned, releasing DB locks can be complicated at times, so this looks to be an anomaly. Please do get in touch if it happens again, though, and we will look at improving the code in whatever area we find is holding the lock.

Thank you for the detailed explanation and the script too. If it does happen again, you can just reference this forum post and we will be up to speed and can get it in front of the engineers straight away.

The “request was sent to a polling endpoint” timeout error does not usually result in a stuck lock, as Octopus should just kill the task on that worker when it can’t communicate with it. Again, though, we can’t know for sure unless we go through the SQL DB logs.

Just to address your Tentacle and Calamari query: Tentacle is a service that facilitates communication between the Octopus Server and the deployment target, and it passes commands on to Calamari to execute. So they are two separate entities, but they are linked in that certain Tentacle versions will only work with certain Calamari versions, if that makes sense.

Let us know if it happens again and we will jump into action!

Kind Regards,
Clare
