Terminating long-running tasks in Octopus Deploy

prabhjotkour.91 · 13 March 2023 11:32

Hi Team,

I am Prabhjot, and I work with Octopus Deploy. I am looking for an option to terminate long-running tasks that have been running for more than 2 hours and have still not completed.

Please suggest an option for doing this.
Thanks in advance.

Regards,
Prabhjot

clare.martin · 13 March 2023 12:11

Good afternoon @prabhjotkour.91,

Thank you for contacting Octopus Support, and great question on cancelling long-running tasks. We have a PowerShell script here which will cancel long-running tasks for you; you just need to change the variables at the top, such as the API key, Octopus Server URL and Space ID.

Let me know if that would suit your needs. Please check what the script does before you run it, so you are comfortable with which tasks it will cancel.
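For illustration only, here is a minimal Python sketch of what a script like this typically does (it is not the script linked above): list the executing tasks, pick the ones that started more than a given time ago, and ask the server to cancel them. The endpoint paths (`/api/{spaceId}/tasks?states=Executing`, `/api/{spaceId}/tasks/{taskId}/cancel`), the `X-Octopus-ApiKey` header and the `Id`/`StartTime` task fields are assumptions to verify against your server's API documentation; the function names are hypothetical.

```python
# Minimal sketch, not the script linked above. Endpoint paths, the
# X-Octopus-ApiKey header and the "Id"/"StartTime" task fields are
# assumptions to check against your Octopus Server's API documentation.
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def parse_time(value):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_long_running(tasks, now, max_run_time):
    """Return the IDs of tasks that started more than max_run_time before now."""
    overdue = []
    for task in tasks:
        start = task.get("StartTime")
        if not start:  # queued tasks have no start time yet
            continue
        if now - parse_time(start) > max_run_time:
            overdue.append(task["Id"])
    return overdue


def _call(url, api_key, method="GET"):
    """Send an authenticated request and decode the JSON reply, if any."""
    request = urllib.request.Request(
        url, method=method, headers={"X-Octopus-ApiKey": api_key}
    )
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else None


def cancel_long_running(octopus_url, api_key, space_id, max_run_time=timedelta(hours=2)):
    """Cancel executing tasks older than max_run_time; return their IDs."""
    base = f"{octopus_url.rstrip('/')}/api/{space_id}/tasks"
    page = _call(f"{base}?states=Executing", api_key)  # first page of results only
    overdue = find_long_running(
        page["Items"], datetime.now(timezone.utc), max_run_time
    )
    for task_id in overdue:
        _call(f"{base}/{task_id}/cancel", api_key, method="POST")
    return overdue
```

Calling `cancel_long_running("https://your-octopus-url", "API-XXXX", "Spaces-1")` (placeholder values) would cancel anything that has been executing for more than 2 hours. It only reads the first page of results the server returns, so a real script would need to page through `Items`.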

Kind Regards,
Clare

prabhjotkour.91 · 14 March 2023 15:46

Hi Clare,

Thanks for the reply. Will the provided PowerShell script run in the background, or do we have to create a new project with a "Run a Script" process step and deploy it to terminate the other project?

Please help me understand this in detail.

Regards,
Prabhjot

clare.martin · 14 March 2023 16:20

Hey @prabhjotkour.91,

The script needs to be run outside of Octopus in a PowerShell command prompt, ideally on the Octopus Server, using the same account the Octopus Server Service is running under.

You then need to change those variables I mentioned (shown below) to match your instance details and have an API key that has the correct permissions to view the tasks in Octopus (I recommend having an API key from an account that is in the Octopus Managers team if possible).

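# Inside an Octopus deployment these resolve from Octopus variables; when running
# from a PowerShell prompt outside Octopus, replace them with literal values.
$OctopusURL = $OctopusParameters["Global.Base.Url"]      # Octopus Server URL
$APIKey = $OctopusParameters["Global.Api.Key"]           # API key allowed to view and cancel tasks
$CurrentSpaceId = $OctopusParameters["Octopus.Space.Id"] # Space ID, e.g. Spaces-1
$MaxRunTime = 15                                         # run-time limit the script checks tasks against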

You could also run this from the Octopus Server script console if you wanted to, but you would still need to change the values above to match your instance details and have an API key with the correct permissions.
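When the script runs outside an Octopus deployment, those values have to be supplied some other way. As a hedged Python sketch, they could be read from environment variables instead of being edited into the file. The variable names (`OCTOPUS_URL`, `OCTOPUS_API_KEY`, `OCTOPUS_SPACE_ID`, `MAX_RUN_TIME_MINUTES`) are hypothetical, and treating the limit as minutes with a 120-minute default (the 2-hour limit asked about in this thread) is an assumption.

```python
import os
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical environment variable names for the required settings.
REQUIRED = ("OCTOPUS_URL", "OCTOPUS_API_KEY", "OCTOPUS_SPACE_ID")


@dataclass
class Settings:
    octopus_url: str
    api_key: str
    space_id: str
    max_run_time: timedelta


def load_settings(env=None):
    """Build the script settings from environment variables (or a given dict)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    # Assumed unit: minutes. 120 matches the 2-hour limit discussed in this thread.
    minutes = int(env.get("MAX_RUN_TIME_MINUTES", "120"))
    return Settings(
        octopus_url=env["OCTOPUS_URL"].rstrip("/"),
        api_key=env["OCTOPUS_API_KEY"],
        space_id=env["OCTOPUS_SPACE_ID"],
        max_run_time=timedelta(minutes=minutes),
    )
```

Failing fast on missing values avoids the script silently calling the wrong server or running with an empty API key.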

I hope that helps,
Kind Regards,
Clare

prabhjotkour.91 · 17 March 2023 09:18

Hi Clare,

Thanks for the details. One more thing: if I cancel a deployment, it shows as cancelling for more than 2 hours, which means it is hanging. In this case, is there any option to force-cancel it?

Thanks,
Prabhjot

clare.martin · 17 March 2023 10:15

Hey @prabhjotkour.91,

I am sorry to hear your deployments are still hanging. May I ask some follow-up questions so I can help you get to the bottom of why this is happening?

What version of Octopus are you currently running?
Does this happen to all deployments or a select few?
If it is a select few, have you noticed a pattern in the hanging deployments (i.e. they are all deploying to a certain target, they are all deploying a package, or they are all using a certain step)?
If they are all deploying to a certain target, what Tentacle version is that target running?
Also, could you send us a few task logs for the deployments that are hanging, so we can compare them and see if we can get to the bottom of why this is happening?

If you are able to get us some logs, I have created a link to our secure file upload site here where you can upload them. Please let me know once you have uploaded them, though, as we don't get notified when customers upload to our secure site.

You are able to force cancel a task, however there are some caveats which I will go into below:

You can try restarting the Tentacle service on the target you deployed the stuck task to and see if that makes a difference. This should force-cancel that task in Octopus for you.

However, if that Tentacle has been updated to version 6.3.x, the code has unfortunately changed, so you would need to restart the Octopus Server service instead to get that task cancelled.
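The caveat above can be written down as a small Python function. It encodes only what this thread says (before Tentacle 6.3.x, restarting the Tentacle releases the stuck task; from 6.3.x on, the Octopus Server service has to be restarted instead) and is not an official compatibility check; the function name is hypothetical.

```python
def service_to_restart(tentacle_version):
    """Which service to restart to force-cancel a stuck task, per this thread:
    before Tentacle 6.3 a Tentacle restart works; from 6.3 on it does not,
    so the Octopus Server service has to be restarted instead."""
    major, minor = (int(part) for part in tentacle_version.split(".")[:2])
    return "Tentacle" if (major, minor) < (6, 3) else "Octopus Server"


print(service_to_restart("6.2.277"))  # Tentacle
```

Comparing `(major, minor)` tuples rather than strings avoids ordering mistakes such as `"6.10"` sorting before `"6.3"`.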

Ideally though we want to get to the bottom of why tasks are getting stuck so if you could send us over some logs that would be great!

I look forward to hearing from you,
Kind Regards,
Clare

prabhjotkour.91 · 17 March 2023 11:47

Hi Clare,

Answers to the questions are below:

Octopus version - 2022.3
When one deployment hangs on a target and we try to run another deployment on the same target, it says it is waiting for a script.
The deployment hangs because we have a script that hangs, and we are testing whether there is an option to force-stop the deployment.
Tentacle version is 6.2.277

FYI: I am deploying a script that contains a loop, and it hangs. When we try to cancel it, it shows as cancelling for a long time and never actually gets cancelled.

One thing I observed: while a deployment is still cancelling, if we try to deploy another project to the same target, it goes into a queue and waits for the cancelling task to be cancelled. Please correct me if I am wrong.

Thanks,
Prabhjot

clare.martin · 17 March 2023 13:16

Hey @prabhjotkour.91,

Thank you for getting back to me with those answers.

You are correct: if you cancel a task on the Octopus Server and it hangs instead of cancelling, the cancel request won't be sent to the Tentacle. If you then send another task to that Tentacle, it will queue, because a task is already running and Tentacles only perform one task at a time, queueing the rest.

We do have a way to mitigate this, though, with our OctopusBypassDeploymentMutex feature, which you can read about here. If you set that variable on the projects you want to deploy in parallel to the target, their tasks won't queue behind one you are struggling to cancel.
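To make the queueing behaviour concrete, here is a toy Python model (not Octopus's implementation): one task holds a target at a time, later tasks queue behind it, and a task from a project with `OctopusBypassDeploymentMutex` set to `True` skips the queue. The class and method names are hypothetical.

```python
from collections import deque


class TargetMutex:
    """Toy model, not Octopus code: one task holds the target at a time and
    later tasks queue, unless their project sets OctopusBypassDeploymentMutex."""

    def __init__(self):
        self.holder = None       # task currently holding the target
        self.waiting = deque()   # tasks queued behind the holder
        self.parallel = []       # tasks that bypassed the mutex

    def submit(self, task_id, bypass_mutex=False):
        """Return "running" or "queued" for a newly submitted task."""
        if bypass_mutex:
            self.parallel.append(task_id)
            return "running"
        if self.holder is None:
            self.holder = task_id
            return "running"
        self.waiting.append(task_id)
        return "queued"

    def release(self):
        """The holder finished or was finally cancelled; start the next one."""
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


target = TargetMutex()
print(target.submit("stuck-deployment"))              # running
print(target.submit("project-b"))                     # queued
print(target.submit("project-c", bypass_mutex=True))  # running
```

In this model, `project-b` only starts once `release()` is called for the stuck deployment, which mirrors the behaviour described in this thread when a cancel never completes.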

As your Tentacles are on 6.2.277, you should be able to cancel those tasks by restarting the Tentacle. It's not ideal, but it's the only way to get them unstuck in Octopus itself.

The other thing to mention is that this might be one of a few old bugs we had in 2022.3.x versions where cancelling tasks was an issue. I imagine you might be running into this one:

Cancelled Tasks (e.g. a deployment) completes but the Task is never marked as completed and stays in the cancelling state with SQL Error 1222.

You can check your Octopus Server logs for something like SQL Error 1222 - Lock request time out period exceeded to see if you are bumping into that old GitHub issue. It was fixed in 2022.3.10827, though, and I am not sure whether you are on a version higher than that, as you mentioned 2022.3 but not which hotfix.
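Both checks can be sketched in Python: comparing your full server version against 2022.3.10827, and scanning a server log for the 1222 lock time-out. The function names are hypothetical, and because the log location depends on your installation, the path is left as a parameter.

```python
import re
from pathlib import Path

FIXED_IN = (2022, 3, 10827)  # fix version quoted in this thread

# Matches "Error 1222" or SQL Server's lock time-out message text.
LOCK_TIMEOUT = re.compile(
    r"error\W*1222\b|lock request time out period exceeded", re.IGNORECASE
)


def parse_version(version):
    """'2022.3.10827' -> (2022, 3, 10827); missing parts count as 0."""
    parts = [int(p) for p in version.split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def has_cancel_fix(server_version):
    """True if the server version is at or above the fixed build."""
    return parse_version(server_version) >= FIXED_IN


def find_sql_1222(lines):
    """Return log lines that mention SQL error 1222 / lock request time-outs."""
    return [line.rstrip("\n") for line in lines if LOCK_TIMEOUT.search(line)]


def scan_log(path):
    """Scan one server log file; the path depends on your installation."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return find_sql_1222(handle)
```

Note that `has_cancel_fix("2022.3")` returns `False`, because a bare 2022.3 is treated as 2022.3.0, so check the full build number your Octopus Server reports.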

It looks like you are forcing a hang with a script for testing. Am I right in saying this issue currently only occurs when you force the deployment to hang? If so, that is good news, as it means there is no underlying issue with your Octopus instance; you are deliberately trying to get this to break so you can test whether you can cancel hanging deployments.

If so, it looks like you may be affected by one of the bugs in the 2022.3 versions. It might be worth upgrading to at least 2022.4 to see if that mitigates the cancelling issue. Otherwise, you will have to keep restarting Tentacles to force-cancel those hanging tasks (and once you upgrade your Tentacles past 6.2.277, a Tentacle restart will no longer work, so you will have to restart the Octopus Server instead).

I hope that makes sense and answers all of your questions. Let me know what you think of the GitHub issue and whether you can see any SQL 1222 errors in your Octopus Server logs; if so, you will need to upgrade your instance to fix this going forward.

Kind Regards,
Clare

system · 17 April 2023 13:16

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.