2.3.2 upgrade - tasks stuck in cancelling state

Hi,

After upgrading to 2.3.2 we’re experiencing exactly the same situation as in the link below, with an old task stuck in the “cancelling” state, preventing Tentacle upgrades and other tasks from running.

http://help.octopusdeploy.com/discussions/problems/16710-tentacle-health-check-db-backup-tentacle-upgrade-tasks-timeout-after-23-upgrade

Any suggestions?

Thanks,
Igal

Hi Igal,

Can you follow these steps to mark the cancelling task as failed?

http://help.octopusdeploy.com/discussions/problems/16357-octopus-backup-failure#comment_31907647
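If you’re not sure which task is the stuck one, you can also list recent tasks through the API and look for anything still reporting a Cancelling state. This is only a rough sketch; the server URL and API key below are placeholders for your own installation:

```python
# Sketch: list recent tasks and print any stuck in the Cancelling state.
# OCTOPUS_URL and API_KEY are placeholders; substitute your own values.
import requests

OCTOPUS_URL = "http://your-octopus-server"
API_KEY = "API-XXXXXXXXXXXXXXXX"

resp = requests.get(
    f"{OCTOPUS_URL}/api/tasks",
    headers={"X-Octopus-ApiKey": API_KEY},
)
resp.raise_for_status()

for task in resp.json().get("Items", []):
    if task.get("State") == "Cancelling":
        print(task["Id"], task.get("Description"))
```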

Paul

Hi Paul,

Manual status change helped.

Thanks,
Igal

+1 same problem happened to me

I think this same thing is happening to us. Tentacle upgrades are taking a long time to complete on some nodes (10+ minutes). So I cancel the upgrade, planning to come back to it later, only to find out that the Tentacle service on that node is locked up. Some nodes fail to upgrade and I allow them to fail on their own after 30+ minutes (without clicking the cancel button). Those nodes do NOT have the service locked up, which leads me to believe that cancelling the upgrade is what’s causing the Tentacle service to lock up. Rebooting the node seems to “fix” it. This is when upgrading 3.3.1 Tentacles to match the server on 3.3.11.

Is there a way to get the Tentacle service back up by using the same RavenDB fix above but applying it to a SQL database instead? Rebooting is not a good option on some nodes. Or maybe this problem has been resolved and we’re seeing something else.
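To make it concrete, something along the lines of the sketch below is what I have in mind. The table name, column names and state values are just my guesses at the SQL schema, so I wouldn’t run anything like this without a database backup and confirmation from the Octopus team:

```python
# Hypothetical sketch only: flip a task stuck in Cancelling to Failed directly
# in the SQL Server database. dbo.ServerTask, the State column and the state
# values are assumptions about the schema, not confirmed details.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=your-sql-server;"
    "DATABASE=OctopusDeploy;Trusted_Connection=yes;"
)
task_id = "ServerTasks-10230"  # substitute the id of the stuck task

with conn:
    cur = conn.cursor()
    cur.execute(
        "UPDATE dbo.ServerTask SET State = 'Failed' "
        "WHERE Id = ? AND State = 'Cancelling'",
        task_id,
    )
    print("Rows updated:", cur.rowcount)
```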

Hi,

@rjhansen Could you send through your task log for the upgrade? Also, are your Tentacles Polling or Listening?
This thread is actually quite old, and the process has been completely rewritten in the two years since.

Vanessa

Where do I find the task log for a Tentacle upgrade?

Hi @Rjhansen,

The upgrade task will have the same logs as any task: http://docs.octopusdeploy.com/display/OD/Get+the+raw+output+from+a+task
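If it’s easier to pull programmatically, appending /raw to the task’s API URL should return the plain-text log. A minimal sketch, assuming an API key with permission to read tasks (URL, key and task id are placeholders):

```python
# Sketch: download the raw log for a task and save it to a file.
# OCTOPUS_URL, API_KEY and TASK_ID are placeholders for your installation.
import requests

OCTOPUS_URL = "http://your-octopus-server"
API_KEY = "API-XXXXXXXXXXXXXXXX"
TASK_ID = "ServerTasks-10230"

resp = requests.get(
    f"{OCTOPUS_URL}/api/tasks/{TASK_ID}/raw",
    headers={"X-Octopus-ApiKey": API_KEY},
)
resp.raise_for_status()

with open(f"{TASK_ID}.log.txt", "w", encoding="utf-8") as f:
    f.write(resp.text)
```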

Vanessa

Oh I see. I’m so used to looking at task logs at the project level I forgot all about the Tasks menu. Task log attached.

ServerTasks-10230.log.txt (7 KB)

Hi @Rjhansen,

I have seen this error in connection with AWS and the way VMs are spun up and down around restarts.
Are your machines on AWS? Were they polling or listening?

Vanessa

We are not using Amazon Web Services, but we are using virtual machines. We are using listening Tentacles.

Hi @Rjhansen,

Thanks for confirming. I will have to run some tests here and see if I can replicate it.

Vanessa

Hi @Rjhansen,

We will need some more information to correctly identify the issue with the first upgrade, as the logs you have provided for the retry will not help us isolate the cause.
What we will require next time you upgrade:

  1. the logs from the upgrade task that you wait 10 minutes on
  2. the logs from the second attempt
  3. the Tentacle logs from when this occurs (C:\Octopus\Logs)

Correcting the issue should only require a restart (or a stop and start) of the Tentacle service, not a full restart of the server/machine.
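If restarting the service by hand on many nodes is a pain, something like the rough sketch below can do it. It assumes the default service name (“OctopusDeploy Tentacle”); a custom instance will be registered under a different name, so check services.msc first:

```python
# Sketch: restart the Tentacle Windows service without rebooting the machine.
# Assumes the default service name; run from an elevated prompt on the node.
import subprocess
import time

SERVICE = "OctopusDeploy Tentacle"

subprocess.run(["sc", "stop", SERVICE], check=False)  # non-zero exit if already stopped
time.sleep(10)                                        # give the service time to shut down
subprocess.run(["sc", "start", SERVICE], check=True)
```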
Could you also confirm where your VMs are hosted?

Thanks!
Vanessa

  1. See task 10208 (attached). This is the second upgrade attempt and was canceled by the user. It ran for more than 10 minutes before it was canceled.
  2. Task 10230 was the most recent upgrade attempt (fifth time) and was canceled by the user. The first five upgrade attempts were all canceled by the user. The subsequent four upgrade attempts failed on their own, without user intervention, with exit code 1.
  3. I think logs have a short lifespan and have already been overwritten. We’ll have to wait until next time to get these.

Some of our nodes were able to be “fixed” by restarting the Tentacle. For others it required a full machine reboot. Our VMs are hosted in-house.

ServerTasks-10208-cleaned_for_web.log_-_Copy.txt (117 KB)

Hi,

In 3.3.21 we released a change to make sure that a cancelled state properly propagates through child processes, which should help ease this issue.
Next time you need to upgrade, if you are past this version, please let me know whether the situation changes.
If it doesn’t, be sure to capture those logs!

Vanessa