Runbook Timeout Settings

Hi,

I have a runbook that runs a chocolatey upgrade across all deployment targets. Whilst some of the runs finish successfully, others hang indefinitely and disrupt regular deployments.

The runbook script is a simple PowerShell script calling a choco command.
I am wondering if I can set a timeout (in minutes) on the runbook, so that the run is cancelled if it takes longer than that.

Hey @Hossein.Margani,

Thanks for reaching out to Octopus Support!

It’s a great question, and we have a script that we’ve created to cancel long-running tasks that may fit what you’re looking for.

You can set it to run as a Runbook and set the timeout window as you see fit:
GitHub/Octopusdeploy-Api/PoSh/Deployments/CancelLongRunningTasks.ps1

I hope this helps! Let me know if you have any questions or concerns.

Kind Regards,
Adam

I created another runbook with that script. But because the long-running runbook is still running in that environment, the runbook for cancelling it can’t even start. It shows this message:

Cannot start this task yet because "ServerTasks-349002" (has a write lock) tasks are currently running and this task cannot be run in conjunction with any other tasks. Please wait...

Hi @Hossein.Margani,

Sorry to hear that script is giving you issues.

Are you running the script on a worker pool with workers available? Or on the server itself?

What type of task is ServerTasks-349002? Where is that task running?

Looking forward to hearing back.

Best,
Jeremy

Both are running on the deployment target. Both are runbook runs, and they are running on the same deployment target.

Hi @Hossein.Margani,

Thank you for explaining your setup. Would you be able to run the canceling task script on a worker (or the server itself) rather than the deployment target and see if that works?

Please let me know.

Best,
Jeremy

Thanks, I did it. But there is a problem.

I had cancelled the task manually, so it was in the Cancelling state. The script only looks for tasks in the Executing state, so I modified it to look for Cancelling as well.

Now the script shows an error when trying to stop the task: The task was canceled

So my question is: how can I stop this long-running task that is now in the Cancelling state?
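
For anyone adapting the script to match both Executing and Cancelling tasks, the filtering step might look roughly like the sketch below. Select-LongRunningTask is a hypothetical helper, and the State and StartTime property names and the /api/tasks query in the comment are assumptions about the task resource, not taken from CancelLongRunningTasks.ps1 itself. As described above, asking the server to cancel a task that is already Cancelling only returns an error, so this merely identifies the stuck tasks.

```powershell
# Hypothetical helper: returns tasks that have been in one of the given
# states for longer than $MaxMinutes.
function Select-LongRunningTask {
    param(
        [Parameter(Mandatory)] [object[]] $Tasks,
        [string[]] $States = @('Executing', 'Cancelling'),
        [int] $MaxMinutes = 60,
        [datetime] $Now = (Get-Date)
    )
    $Tasks | Where-Object {
        $_.State -in $States -and
        $_.StartTime -and
        ($Now - [datetime]$_.StartTime).TotalMinutes -gt $MaxMinutes
    }
}

# Assumed usage; $octopusUrl and $apiKey are placeholders you would supply:
# $tasks = (Invoke-RestMethod "$octopusUrl/api/tasks?states=Executing,Cancelling" `
#             -Headers @{ 'X-Octopus-ApiKey' = $apiKey }).Items
# Select-LongRunningTask -Tasks $tasks -MaxMinutes 60 | Select-Object Id, State, StartTime
```

Passing the current time in as a parameter keeps the age check easy to try out against sample data before pointing it at a real server.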

Hi @Hossein.Margani,

You’re very welcome.

If tasks are stuck in a canceling state, it is usually because the Tentacle needs to be restarted. Are you able to restart the Tentacle and tell me whether the task moves from canceling to fully canceled? If you can’t reboot the entire machine for some reason, restarting the Octopus Tentacle service usually suffices, but if you can reboot the whole machine, we should do that instead.

If that doesn’t work, you may need to restart the Octopus Server itself.

Please let me know how it goes.

Best,
Jeremy

Yes, I knew restarting the Tentacle would fix this, but I was looking for a more manageable solution.

Hi,

Does this task get stuck every time you try to cancel it? I assumed this was the first time this had happened.

If so, how far along in the chocolatey process does it get before it gets stuck and needs to be canceled?

Best,
Jeremy

It happens on some of the machines running this chocolatey command. The command gets stuck somewhere, and the next day I see that Octopus cannot accept any deployments because the task capacity is used up by tasks that are running but doing nothing. Then I cancel them, but that doesn’t stop them; they stay in the Cancelling state forever, until I restart the Tentacle.

This script didn’t help me, as it does the same thing as manually cancelling the task. I think that even if I don’t cancel the task and just restart the Tentacle, the task will finish.

How can I implement this solution:

  1. Run the chocolatey upgrade
  2. Run the runbook for cancelling long-running tasks that have been running for more than an hour.
  3. Run the runbook for restarting the Tentacle after an hour

At the moment, I can’t do the above, because 3 will not happen: it has to run on the deployment target, and the deployment target is already busy with 1. Can we have more than one deployment task for an environment or deployment target?

Hi,

You can definitely bypass the mutex by following the documentation here: Run multiple processes on a target simultaneously - Octopus Deploy

Are you able to try using Start-Process and Wait-Process with the -Timeout parameter for your chocolatey commands? This way, if your chocolatey script breaks, the command should end and the task shouldn’t hang.

Please let me know what you think.

Best,
Jeremy

I am using the following command:

Start-Process `
  -NoNewWindow `
  -FilePath "$env:ChocolateyInstall\bin\choco.exe" `
  -ArgumentList "upgrade all -y"

I will add the timeout option and test it.

But in the documentation, I don’t see any Timeout option.

Nope, the Timeout option is not valid.

Hi @Hossein.Margani ,

You need to use Wait-Process to add a timeout: Wait-Process (Microsoft.PowerShell.Management) - PowerShell | Microsoft Docs

Please let me know if that works for you.

Best,
Jeremy

My Start-Process command already has an issue: it doesn’t stop when chocolatey is stopped. So I don’t think executing Wait-Process after it will help.

By the way, I couldn’t restart the Tentacle via a runbook running on the same target after setting the mutex bypass variable to True. The error is:

The step failed: Activity Restart Tentacle on XXXX failed with error 'An error occurred when sending a request to 'https://XXXX:10933/', after the request began: messageEnvelope is null
messageEnvelope is null'. 

I guess this means the Tentacle cannot restart itself. Or it may be able to, but then an error occurs when it responds to the Octopus Server.

Hi,

Here is an example of what I tested on my side:


# Leave this low: if the process is still running after the Start-Sleep below,
# you want it to be killed in the catch block.
$maximumRuntimeSeconds = 5

# Put your chocolatey command in the argument list.
$process = Start-Process -FilePath powershell.exe -ArgumentList '-Command Start-Sleep -Seconds 10' -PassThru

# Set this to a time by which you think the chocolatey update should reasonably have finished.
Start-Sleep -Seconds 15
try
{
    # With -ErrorAction Stop, a timeout raises an error that lands in the catch block.
    $process | Wait-Process -Timeout $maximumRuntimeSeconds -ErrorAction Stop
    Write-Output 'Process successfully completed within timeout.'
}
catch
{
    Write-Warning -Message 'Process exceeded timeout, will be killed now.'
    $process | Stop-Process -Force
}

Setting the Start-Sleep to the number of seconds or minutes after which you’d like the task to be killed should create a timeout period. If the process has finished by the time the Start-Sleep is over, the script does nothing except report “Process successfully completed within timeout.” If the process is still running at that point, the script waits up to a further $maximumRuntimeSeconds, then kills the process, and the task should end.

You can test this by changing the seconds portion of the argument list in the Start-Process line of the script from 10 to 30, then running it both ways and comparing the results.
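
A possible variation, assuming you would rather not sit through a fixed Start-Sleep when the process finishes early: let the -Timeout of Wait-Process act as the only ceiling. Invoke-ProcessWithTimeout is a hypothetical name, and the one-hour default is only an illustration.

```powershell
# Hypothetical helper: runs a program and stops it if it outlives the timeout.
# Returns $true if the process exited on its own, $false if it had to be stopped.
function Invoke-ProcessWithTimeout {
    param(
        [Parameter(Mandatory)] [string] $FilePath,
        [Parameter(Mandatory)] [string[]] $ArgumentList,
        [int] $TimeoutSeconds = 3600
    )
    $process = Start-Process -FilePath $FilePath -ArgumentList $ArgumentList -PassThru -NoNewWindow
    try {
        # Returns as soon as the process exits; raises an error once the timeout passes.
        $process | Wait-Process -Timeout $TimeoutSeconds -ErrorAction Stop
        return $true
    }
    catch {
        Write-Warning -Message "Process exceeded $TimeoutSeconds seconds, will be killed now."
        $process | Stop-Process -Force -ErrorAction SilentlyContinue
        return $false
    }
}

# Assumed usage with the chocolatey command from earlier in the thread:
# Invoke-ProcessWithTimeout -FilePath "$env:ChocolateyInstall\bin\choco.exe" `
#     -ArgumentList 'upgrade all -y' -TimeoutSeconds 3600
```

The boolean result lets the calling step decide whether to fail the runbook or just log the timeout.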

Please let me know if that works for you.

Best,
Jeremy

Thank you Jeremy,

I did this, but I don’t get the output from the chocolatey command.
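
One way to get the output back, sketched under the assumption that writing it to a temporary file is acceptable, is the -RedirectStandardOutput parameter of Start-Process. Invoke-ProcessWithLog is a hypothetical name. Only standard output is captured here; -RedirectStandardError would need its own, separate file.

```powershell
# Hypothetical helper: runs a program with a timeout and returns whatever
# it wrote to standard output, read back from a temporary file.
function Invoke-ProcessWithLog {
    param(
        [Parameter(Mandatory)] [string] $FilePath,
        [Parameter(Mandatory)] [string[]] $ArgumentList,
        [int] $TimeoutSeconds = 3600
    )
    $logFile = [System.IO.Path]::GetTempFileName()
    $process = Start-Process -FilePath $FilePath -ArgumentList $ArgumentList `
        -PassThru -NoNewWindow -RedirectStandardOutput $logFile
    try {
        $process | Wait-Process -Timeout $TimeoutSeconds -ErrorAction Stop
    }
    catch {
        Write-Warning -Message "Process exceeded $TimeoutSeconds seconds, will be killed now."
        $process | Stop-Process -Force -ErrorAction SilentlyContinue
    }
    # Emit the captured lines so they appear in the calling script's output.
    Get-Content -Path $logFile
    Remove-Item -Path $logFile -ErrorAction SilentlyContinue
}

# Assumed usage:
# Invoke-ProcessWithLog -FilePath "$env:ChocolateyInstall\bin\choco.exe" -ArgumentList 'upgrade all -y'
```

Because the log is read after the wait ends, the output appears all at once rather than streaming while chocolatey runs.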

And if it exceeds the timeout, the step fails with an error:

The step failed: Activity Run Upgrade All on XXX failed with error 'The process cannot access the file 'C:\Octopus\Work\20211228092818-349309-34' because it is being used by another process.

Server exception: 
System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.AggregateException: One or more errors occurred. ---> System.IO.IOException: The process cannot access the file 'C:\Octopus\Work\20211228092818-349309-34' because it is being used by another process.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.Directory.DeleteHelper(String fullPath, String userPath, Boolean recursive, Boolean throwOnTopLevelDirectoryNotFound, WIN32_FIND_DATA& data)
   at System.IO.Directory.Delete(String fullPath, String userPath, Boolean recursive, Boolean checkHost)
   at Octopus.Shared.Util.OctopusPhysicalFileSystem.<>c__DisplayClass43_0.<PurgeDirectoryAsync>b__1()
   at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Octopus.Shared.Util.OctopusPhysicalFileSystem.<PurgeDirectoryAsync>d__43.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Octopus.Shared.Util.OctopusPhysicalFileSystem.<DeleteDirectory>d__14.MoveNext()
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at Octopus.Shared.Util.OctopusPhysicalFileSystem.DeleteDirectory(String path, DeletionOptions options)
   at Octopus.Tentacle.Services.Scripts.ScriptService.CompleteScript(CompleteScriptCommand command)
   --- End of inner exception stack trace ---
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
   at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Halibut.ServiceModel.ServiceInvoker.Invoke(RequestMessage requestMessage)
   at Halibut.Transport.Protocol.MessageExchangeProtocol.InvokeAndWrapAnyExceptions(RequestMessage request, Func`2 incomingRequestProcessor)'.
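
The error above says something still holds the work directory when the Tentacle tries to delete it. One possible cause, which is an assumption and not confirmed in this thread, is that child processes started by choco.exe outlive Stop-Process, which ends only the single process it is given. A minimal sketch that ends the whole process tree with the Windows taskkill utility instead (Stop-ProcessTree is a hypothetical name):

```powershell
# Hypothetical helper: ends a process and any child processes it started.
# Windows only, since it relies on taskkill.exe.
function Stop-ProcessTree {
    param([Parameter(Mandatory)] [int] $ProcessId)
    # /T also ends child processes, /F forces termination.
    & taskkill.exe /PID $ProcessId /T /F | Out-Null
    return ($LASTEXITCODE -eq 0)
}

# Assumed usage inside the catch block, in place of Stop-Process:
# Stop-ProcessTree -ProcessId $process.Id
```

If the directory is still locked afterwards, the cause lies elsewhere and this sketch will not help.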