Octopus repair/reboot needed any time a task is manually cancelled

When I deploy to a machine in Octopus and immediately realize that I needed to deploy to a different machine, I cancel the task while it is still in step 1. This causes the task to get stuck in a “Cancelling” state, and if I press cancel again, it goes into a “Timed out” state. However, if I then try to deploy to any machine, the new deployments get stuck in a “Queued” state.

To fix this, I have to back up Octopus (which doesn’t always work, because the backup can also get stuck in a queued state), then run the repair script.

Has anyone ever experienced this issue? If so, is there a fix?

Hi Jenna,

Thanks for getting in touch! What version of Octopus are you running? I’d like to try to replicate this.

Thanks!
Vanessa

Vanessa -

Thanks for your quick response! I am running Octopus Deploy 2.5.12.666. I plan to upgrade to 2.6.0.778 soon, just want to get all of my ducks in a row first.

Hi Jenna,

Thanks for the extra info. I’ve created a ticket to investigate this and try to replicate the behavior.

Vanessa

+1 to this.
We see this problem almost daily in our environment, which is also running 2.5.12.666.
We have had to automate cancelling all other deployments and restarting the service to get Octopus running smoothly again.

-J

Hi Jenna (and Jeff),

To help me replicate this, I have a couple of questions for you:

  1. How long is it between hitting cancel and hitting it a second time?
  2. Is it typically the one environment?

Vanessa

Vanessa,
We have seen this with variable time frames. We recommend that users do not hit cancel a second time at all and leave that to our Ops team, so the delay can be as much as an hour or two before we force-cancel the task with the second click. It may be helpful to add a click timer to the button with a 15-second delay so that users cannot make the second click immediately.

-Jeff

Jeffrey Feinberg
Director | Release Engineering
Eze Software Group / Eze OMS
12 Farnsworth St / Boston, MA 02210
T 617.897.2357 / E jfeinberg@ezesoft.com
www.ezesoft.com

Vanessa -

I agree with Jeff, the delay can be an hour or two before I hit cancel
again to time it out. As far as environments, I’ve seen this happen in at
least 2 different environments and at least 5 different machines have
triggered it at different times. On a similar note, the users are hitting
cancel because they accidentally forgot to select a machine within the
environment before deploying and end up deploying to the whole
environment. Is there a trick to make the 2nd green "Deploy Release"
button gray out until the user selects a specific machine? In my case, we
don’t have any reason to deploy to all machines in an environment at once.

Thanks for your help!!

- Jenna

Thanks Jenna and Jeff

That actually really helps, because I was under the impression it was quick, then everything Queued. But I think this should be easy to replicate now.

Vanessa

Jenna,

We have a similar issue and created a custom step template to stop users from deploying to all machines in an environment at once. We use it in our general development environment, where it limits them to deploying to a single machine at a time. Hope this helps you out.
-Jeff

# Environment and machine selection for the current deployment
$environment = $OctopusParameters['Octopus.Environment.Name']
Write-Output "Selected environment = $environment"

$deploymachineList = $OctopusParameters['Octopus.Deployment.SpecificMachines']
Write-Output "Specific machine list = $deploymachineList"

# Fail the step when no specific machine was chosen for the guarded environment
if (($environment -eq 'EnvironmentName') -and [string]::IsNullOrEmpty($deploymachineList))
{
    Write-Output "Deployment stopped. You must select a specific machine target when deploying to the EnvironmentName environment."
    Exit -1
}

Jeffrey Feinberg

Thank you so much, Jeff! I am going to work on implementing this very soon!

This issue is affecting us too (2.5.12.666). Do you have any updates on this?

Hi All,

As an update, looking into this issue is part of our current sprint.

Vanessa

Hi all,

I’m having trouble reproducing this on 2.6.3, and I’d really appreciate some more information:

  1. Does this happen every single time you cancel a deployment, or just occasionally?
  2. Does it eventually return to normal if you click cancel twice (the task should be marked as timed out, but other tasks should then be able to run)?
  3. When it does happen, can you look in RavenDB (http://localhost:10931 on the Octopus server) and see if there are any stale indexes or errors?

If this is something you can reproduce easily/on demand, we’d love to do a screen sharing session to see it and try to get to the bottom of it. You can pick a time that works here:

https://octopusdeploy.acuityscheduling.com/schedule.php

Thanks,
Paul
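
For anyone who would rather script the RavenDB check in question 3 than click through the studio, here is a minimal sketch. It rests on assumptions that are not confirmed anywhere in this thread: that the embedded RavenDB answers unauthenticated requests from localhost, and that it serves the usual RavenDB statistics document at /stats with StaleIndexes and Errors fields. The function name Get-RavenHealthSummary is made up for illustration, and Invoke-RestMethod needs PowerShell 3.0 or later.

```powershell
# Sketch only. Assumes the /stats document has StaleIndexes (index names)
# and Errors (indexing errors); neither is confirmed in this thread.
function Get-RavenHealthSummary {
    param(
        [Parameter(Mandatory = $true)]
        $Stats
    )

    # @() keeps the counts correct when a field is missing or holds one item.
    $stale  = @($Stats.StaleIndexes | Where-Object { $_ })
    $errors = @($Stats.Errors | Where-Object { $_ })

    New-Object PSObject -Property @{
        StaleIndexes = $stale
        ErrorCount   = $errors.Count
        Healthy      = ($stale.Count -eq 0 -and $errors.Count -eq 0)
    }
}

# Usage on the Octopus server itself (URL taken from the post above):
#   Get-RavenHealthSummary -Stats (Invoke-RestMethod -Uri 'http://localhost:10931/stats')
```

If Healthy comes back False while tasks are stuck in the queue, the StaleIndexes list and the error count are probably the details worth including in a reply here.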

Hi Paul -

Today I saw that I had a task that had been sitting in a “deploying” state
for one month. I was afraid to cancel it because then I would have to shut
down the Octopus server and run the repair script. Today, I got tired of
seeing it sitting there and took the chance of cancelling it. It worked!
I had no issues with it getting stuck in a queued state, didn’t have to
click ‘cancel’ again to get it to actually cancel, and I was able to
successfully deploy other projects after I cancelled it.

So it seems that this is no longer an issue for me. I didn’t do any
updates to Octopus, other than a normal Windows update a few days ago.

Thanks to you and Vanessa for all of your help.

- Jenna