Some of our deployments intermittently fail with the error “Stack empty”, thrown somewhere down in the Octopus Server stack. The full call stack is included in the paste below.
This seems to happen especially when the deployment runs right after another deployment in the previous environment, or right after the release is created, although I am not 100% sure this is a reliable pattern.
The process being deployed in this example is a Windows Service deployment, but execution never reaches the actual steps in the process; it fails even before pulling packages. The failing deployment is part of a larger parent project in which we have multiple “Deploy a release” steps. Only one of them failed with this error; the rest passed. On other occasions we have seen different releases fail to deploy, so it seems a bit random.
Is this a known issue? It seems to have started after we migrated to a new Octopus server running a newer version of Octopus Deploy. We currently use 2021.2 (Build 7727).
Thanks a lot for the quick reply! I will certainly try this out.
Is it just a matter of adding a variable named OctopusBypassDeploymentMutex to the parent (orchestrator) project and setting it to false? I assume it defaults to “true”, since I haven’t added it with any value so far.
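For anyone who wants to script this rather than use the UI, here is a minimal sketch of adding the variable through the Octopus REST API. The server URL, API key, and variable-set ID are placeholders, and the exact payload fields are assumptions based on the standard variable-set endpoint, so treat this as illustrative only:

```python
import json
import urllib.request

OCTOPUS_URL = "https://octopus.example.com"   # placeholder server URL
API_KEY = "API-XXXXXXXX"                      # placeholder API key
VARIABLE_SET_ID = "variableset-Projects-1"    # placeholder variable-set ID


def bypass_mutex_variable(value: bool) -> dict:
    """Build the variable entry for OctopusBypassDeploymentMutex."""
    return {
        "Name": "OctopusBypassDeploymentMutex",
        "Value": str(value),   # Octopus stores variable values as strings
        "Type": "String",
        "IsSensitive": False,
    }


def add_variable(variable: dict) -> None:
    """Fetch the project's variable set, append the variable, and PUT it back."""
    headers = {"X-Octopus-ApiKey": API_KEY, "Content-Type": "application/json"}
    url = f"{OCTOPUS_URL}/api/variables/{VARIABLE_SET_ID}"
    with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as resp:
        variable_set = json.load(resp)
    variable_set["Variables"].append(variable)
    req = urllib.request.Request(
        url,
        data=json.dumps(variable_set).encode(),
        headers=headers,
        method="PUT",
    )
    urllib.request.urlopen(req)
```

Note that the variable value must be the string "True" or "False" rather than a boolean, since Octopus variable values are plain strings.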
We haven’t done any resource monitoring. We can do that; however, to my eyes this does seem like a bug in the code. But we’ll have a look at the resource usage anyway.
We have strict rules about what we can include in logs we share, so I have “anonymized” the project and server names by hashing them (you should still be able to correlate the different logs). In any case, I’ve uploaded:
Task execution log for the orchestrator project that fails (note that even though the final “Deploy a release” task failed in the orchestrator project, the subproject deployment itself actually succeeded).
Process definition JSON for the orchestrator project.
Server logs covering the time period in question.
Let me know if you need anything else related to this. It is unfortunately causing a lot of disruption for us at the moment, so we’re really hoping for a workaround or a quick solution.
Just some more info: one of the environments we deploy to has only two of the steps in this orchestration project enabled (via environment conditions), and even that environment often fails with the same error. So this doesn’t seem to be load-related, nor tied to a high number of simultaneous jobs.
We agree that the issue lies somewhere in the code; we’re just aiming to gather as much data as possible to compare your situation with the other users who have encountered this, which will help our engineers track down the cause more quickly.
If you follow the issue my colleague linked earlier, our engineers will update it with a note on which versions the fix is released in once it’s complete.
Are there any updates on this issue that could be shared? This is getting a bit painful for us, so knowing when a fix is planned would be very helpful.
I am following the GitHub issue, but so far the only activity has been more confirmed cases, and the engineers who were assigned have since been unassigned. Is there anything you can share about what’s going on behind the scenes?
I can completely understand that this issue is awkward to work around. Unfortunately I don’t have any updates at the moment. The issue is with our engineering team and needs to work through our usual processes (e.g. triage, assignment, development, testing, and so on). I’ll let the team know you’re keen for this to be fixed, and I will let you know if I can provide any further context.
I hope this is helpful. Please let me know if you have any questions.