Deployments fail with error "Stack Empty"

Hi,
Some of our deployments intermittently fail with the error “Stack empty”, raised somewhere inside the Octopus Server stack. The full call stack is included in the paste below.

This seems to happen especially if the deployment happens right after another deploy in the previous environment, or right after the release is created. I am not 100% sure that this is a reliable pattern though.

The process being deployed in this example is a Windows Service deployment, but it never reaches the actual steps in the process; it fails even before pulling packages. The failing deployment is part of a larger parent project that has multiple “deploy a release” steps. Only one of them failed with this error; the rest passed. At other times we have seen different releases fail to deploy, so it seems somewhat random.

Is this a known issue? It seems to have started after we migrated to a new Octopus server running a newer version of Octopus Deploy. We currently use 2021.2 (Build 7727).

Any tips would be greatly appreciated.

Hi hallgeir.osterbo,

Thank you for reaching out to us. I’m sorry to hear that your deployment is failing.

It sounds like you may be encountering the following issue:

Can you please advise if the workaround listed on the GitHub page allows your deployment to proceed?

Best Regards,

Charles

Thanks a lot for the quick reply! I will certainly try this out.

Is it just a matter of adding a variable named OctopusBypassDeploymentMutex to the parent (orchestrator) project and setting it to false? I assume it defaults to “true”, since I haven’t added it with any value so far?

Update: I tried setting this variable to false. This did not have an effect - the next release we created and deployed failed with “Stack empty”.

Hi @hallgeir.osterbo,

Sorry that changing the variable setting didn’t resolve the issue for you.

Which version of Octopus did you upgrade from?

Have you done any resource monitoring on the Octopus server while the deployments are taking place?

I’m not sure what might be causing this yet, but I was wondering if you’d be willing to send us a task log from the failed deployment along with your server log from around the same time?

Also, if you’re willing to provide us with an export of your process steps as JSON, that would also be helpful for our understanding and testing.

You can use the following link to upload these to us securely:
Support - Octopus Deploy

Once we have those we will review them and continue troubleshooting with you.

Best,
Patrick

We upgraded from 2021.1.7738.

We haven’t done any resource monitoring. We can do that - however, this does seem like a bug in the code to my eyes. But we’ll have a look at the resource usage.

We have some strict rules when it comes to what we can include in logs we are sharing. So I have “anonymized” the project names and server names by hashing (so you should be able to correlate the different logs). But in any case, I’ve uploaded:

  • Task execution log for the orchestrator project that fails (note that even though the final “Deploy a release” task failed in the orchestrator project, the subproject deployment itself actually succeeded).
  • Process definition JSON for the orchestrator project.
  • Server logs for the same time period.
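
For anyone who needs to share logs under similar restrictions, here is a minimal sketch of the kind of hashing described above. It is an illustration only, not the exact script used for the uploaded files: the function names, the `anon-` prefix, the salt, and the sample names are all hypothetical. The idea is that each sensitive name always maps to the same token, so the same project or server can still be matched across the task log and the server log.

```python
import hashlib
from typing import Iterable


def pseudonym(name: str, salt: str = "") -> str:
    """Return a stable, non-reversible token for a sensitive name."""
    digest = hashlib.sha256((salt + name).encode("utf-8")).hexdigest()
    # A truncated digest keeps the logs readable; 12 hex chars is an arbitrary choice.
    return "anon-" + digest[:12]


def anonymize(text: str, sensitive_names: Iterable[str], salt: str = "") -> str:
    """Replace every occurrence of each sensitive name with its pseudonym."""
    # Replace longer names first so a name containing another is handled whole.
    for name in sorted(sensitive_names, key=len, reverse=True):
        text = text.replace(name, pseudonym(name, salt))
    return text


if __name__ == "__main__":
    names = ["ProjectA", "SERVER01"]  # hypothetical project and server names
    task_log = "Deploying ProjectA to SERVER01"
    server_log = "SERVER01 picked up task for ProjectA"
    print(anonymize(task_log, names, salt="example-salt"))
    print(anonymize(server_log, names, salt="example-salt"))
```

Using the same salt for every file is what keeps the tokens consistent between logs; keeping the salt private makes it harder to guess the original names by hashing likely candidates.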

Let me know if you need anything else related to this. Unfortunately, this is currently causing a lot of disruption for us, so we are really hoping for a workaround or a quick solution.

Thanks!

Sincerely,
Hallgeir

Just some more info - one of the environments we deploy to only has two of the steps in this orchestration project enabled (using the environment conditions). And even that environment often fails with the same error. So this doesn’t seem to be load related, and does not seem to be related to a high number of simultaneous jobs.

Hi @hallgeir.osterbo,

Thanks for that additional information.

We agree that the issue is within the code somewhere; we’re just aiming to gather as much data as possible to compare your situation with that of the other users who have encountered this, so that our engineers can track down the cause more quickly.

If you track the issue my colleague linked earlier, our engineers will update it with a note on which versions the fix is released to once complete.

Regards,
Paul

Thanks Paul!

As mentioned, let me know if I can help with more info on this. And I’ve subscribed to the GitHub issue.

Sincerely,
Hallgeir

Are there any updates on this issue that could be shared? This is getting a bit painful for us, so knowing when a fix is planned would be very helpful.

I am following the GitHub issue, but so far the only activity has been more confirmed cases, and the person who was assigned is no longer assigned. Is there anything you can share about what’s going on behind the scenes?

Hi Hallgeir,

Thank you for your message.

I can completely understand that this issue could be awkward to work around. Unfortunately I don’t have any updates at the moment. The issue is with our engineering team and needs to work through our usual processes (e.g. triage, assignment, development, testing and so on). I’ll let the team know you’re keen for this to be fixed and I will let you know if I can provide any further context.

I hope this is helpful. Please let me know if you have any questions.

Best Regards,

Charles
