Deployment fails although all steps succeed

Hello,

Yesterday, we updated our Octopus Server to the latest version, 2021.2.7650.

Since then, a majority of our deployments have failed with a strange error (see screenshot below), although all steps were successful:

Both the tentacle and the server versions are up-to-date.

Exception occurred while executing a reader for `SELECT TOP 1 * FROM [dbo].[Deployment] WHERE [Id] = @Id`

System.Exception

   at Nevermore.Transient.DbCommandExtensions.ExecuteReaderWithRetryAsync(DbCommand command, RetryPolicy commandRetryPolicy, CommandBehavior commandBehavior, RetryPolicy connectionRetryPolicy, String operationName, CancellationToken cancellationToken)

   at Nevermore.CommandExecutor.ExecuteReaderAsync(CancellationToken cancellationToken)

   at Nevermore.Advanced.ReadTransaction.ExecuteReaderAsync(PreparedCommand preparedCommand, CancellationToken cancellationToken)

   at Nevermore.Advanced.ReadTransaction.StreamAsync[TRecord](PreparedCommand command, CancellationToken cancellationToken)+MoveNext()

   at Nevermore.Advanced.ReadTransaction.LoadAsync[TDocument,TKey](TKey id, CancellationToken cancellationToken)

   at Nevermore.Advanced.ReadTransaction.LoadAsync[TDocument,TKey](TKey id, CancellationToken cancellationToken)

   at Octopus.Core.RelationalStorage.RawRelationalTransaction.LoadAsync[TDocument](String id, CancellationToken cancellationToken) in RawRelationalTransaction.cs:line 26

   at Octopus.Server.Web.Infrastructure.OctopusQueryExecutor.LoadAsync[TDocument](String id, CancellationToken cancellationToken) in OctopusQueryExecutor.cs:line 225

   at Octopus.Core.Persistence.Database.ProjectPathDecorators.ProjectPathQueryExecutorDecorator.LoadAsync[TDocument](String id, CancellationToken cancellationToken) in ProjectPathQueryExecutorDecorator.cs:line 39

   at Octopus.Core.Persistence.Database.DatabaseDocumentStore`1.GetAsync(String id, CancellationToken cancellationToken) in DatabaseDocumentStore.cs:line 45

   at Octopus.Core.Persistence.Database.FullTableCacheDocumentStoreDecorator`1.GetAsync(String id, CancellationToken cancellationToken) in FullTableCacheDocumentStoreDecorator.cs:line 53

   at Octopus.Core.Persistence.Database.ProjectPathDecorators.ProjectPathDocumentStoreDecorator`1.GetAsync(String id, CancellationToken cancellationToken) in ProjectPathDocumentStoreDecorator.cs:line 42

   at Octopus.Core.Persistence.EntityTracking.EntityTrackingDocumentStoreDecorator`1.GetAsync(String id, CancellationToken cancellationToken) in EntityTrackingDocumentStoreDecorator.cs:line 42

   at Octopus.Core.Persistence.Auditing.AuditingDocumentStoreDecorator`1.GetAsync(String id, CancellationToken cancellationToken) in AuditingDocumentStoreDecorator.cs:line 48

   at Octopus.Core.Persistence.Database.Deletion.DeleteRelatedDocumentsDocumentStoreDecorator`1.GetAsync(String id, CancellationToken cancellationToken) in DeleteRelatedDocumentsDocumentStoreDecorator.cs:line 36

   at Octopus.Core.Persistence.Database.Deletion.VetoDocumentStoreDecorator`1.GetAsync(String id, CancellationToken cancellationToken) in VetoDocumentStoreDecorator.cs:line 35

   at Octopus.Core.Persistence.DocumentStore`1.GetAsync(String id, CancellationToken cancellationToken) in DocumentStore.cs:line 83

   at Octopus.Server.Orchestration.ServerTasks.Deploy.DeploymentPlanService.LoadStateContainer(DeploymentPlan plan, CancellationToken cancellationToken) in DeploymentPlanService.cs:line 83

   at Octopus.Server.Orchestration.ServerTasks.Deploy.ExecutionPlanService`3.Persist(DeploymentPlan plan, CancellationToken cancellationToken) in ExecutionPlanService.cs:line 60

   at Octopus.Server.Orchestration.ServerTasks.Deploy.DeploymentTaskController.<>c__DisplayClass5_0.<PersistPlan in DeploymentTaskController.cs:line 70

   at Octopus.Server.Infrastructure.Orchestration.UnitsOfWork.UnitOfWorkExecutor.<>c__DisplayClass3_0`1.<Execute in UnitOfWorkExecutor.cs:line 61

   at Octopus.Core.Infrastructure.UnitsOfWork.UnitOfWorkExtensionMethods.DoAsync(IUnitOfWork unitOfWork, Func`1 action, CancellationToken cancellationToken, String name) in UnitOfWorkExtensionMethods.cs:line 73

   at Octopus.Core.Infrastructure.UnitsOfWork.UnitOfWorkExtensionMethods.DoAsync(IUnitOfWork unitOfWork, Func`1 action, CancellationToken cancellationToken, String name) in UnitOfWorkExtensionMethods.cs:line 73

   at Octopus.Server.Infrastructure.Orchestration.UnitsOfWork.UnitOfWorkExecutor.Execute[T](Func`3 action, CancellationToken cancellationToken, String name) in UnitOfWorkExecutor.cs:line 62

   at Octopus.Server.Orchestration.ServerTasks.Deploy.DeploymentTaskController.PersistPlan(DeploymentPlan plan, CancellationToken cancellationToken) in DeploymentTaskController.cs:line 71

   at Octopus.Server.Orchestration.ServerTasks.Deploy.ExecutionTaskController`1.ExecuteBase(ITaskLog taskLogRoot, CancellationToken cancellationToken) in ExecutionTaskController.cs:line 106

   at Octopus.Server.Orchestration.ServerTasks.Deploy.DeploymentTaskController.Execute(ITaskLog taskLog, CancellationToken cancellationToken) in DeploymentTaskController.cs:line 55

   at Octopus.Server.Orchestration.ServerTasks.RunningTask.<>c__DisplayClass32_0.<RunMainThread in RunningTask.cs:line 154

   at Octopus.Core.Infrastructure.UnitsOfWork.UnitOfWorkExtensionMethods.DoAsync(IUnitOfWork unitOfWork, Func`1 action, CancellationToken cancellationToken, String name) in UnitOfWorkExtensionMethods.cs:line 73

   at Octopus.Core.Infrastructure.UnitsOfWork.UnitOfWorkExtensionMethods.DoAsync(IUnitOfWork unitOfWork, Func`1 action, CancellationToken cancellationToken, String name) in UnitOfWorkExtensionMethods.cs:line 73

   at Nito.AsyncEx.Synchronous.TaskExtensions.WaitAndUnwrapException(Task task)

   at Octopus.Server.Orchestration.ServerTasks.RunningTask.RunMainThread() in RunningTask.cs:line 181



--Inner Exception--

SQL Error 995 - A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The I/O operation has been aborted because of either a thread exit or an application request.)

Microsoft.Data.SqlClient.SqlException

   at Microsoft.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__188_0(Task`1 result)

   at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()

   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)

   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)

   at Nevermore.Transient.DbCommandExtensions.<>c__DisplayClass3_0.<ExecuteReaderWithRetryAsync

   at Nevermore.Transient.DbCommandExtensions.<>c__DisplayClass3_0.<ExecuteReaderWithRetryAsync

   at Nevermore.Transient.RetryPolicy.ExecuteActionAsync[TResult](Func`1 func)

   at Nevermore.Transient.DbCommandExtensions.ExecuteReaderWithRetryAsync(DbCommand command, RetryPolicy commandRetryPolicy, CommandBehavior commandBehavior, RetryPolicy connectionRetryPolicy, String operationName, CancellationToken cancellationToken)



--Inner Exception--

The I/O operation has been aborted because of either a thread exit or an application request.

System.ComponentModel.Win32Exception 

Could this issue be related to a bug in the latest release? Thanks in advance for your help. Selmir

Hi @BizyDev,

Thank you for contacting Octopus Support. I’m sorry you are running into this issue.

It looks like the Octopus Server may be having trouble with the task log. Is it possible the Octopus Server is running out of memory? Memory usage did go up a bit in Octopus Server 2021.2.X.

A couple of quick questions:

  • Can you confirm the specs of your Octopus Server?
  • Is SQL on a different machine or the same machine as the Octopus Server?

Let me know at your earliest convenience.

Best Regards,
Donny

Hi @donny.bell, thank you very much for the quick support.

Our Octopus Server and the MSSQL Server are not on the same host.

Here are the specs of our Octopus Server:

[Screenshot: Octopus Server specs]

According to our monitoring, memory usage was under 40% for the whole day. And yes, we have noticed a small memory usage spike since yesterday, but as you can see, the server is pretty comfortable regarding RAM.

Our database server also has 16 GB of RAM, but since it’s a SQL Server, it uses almost the whole RAM available (85% RAM usage average for today).

I am testing deployments as I write this, and memory doesn’t go up on the SQL Server when a deployment fails.

Our DBA is offline and I have no access to this production server, so I can’t try the failing SQL request on my own machine with SSMS, but I’ll certainly do it tomorrow.

Hope those details will help you (and us) to address this annoying issue :confused:

Hi @BizyDev,

Thank you for getting back to me.

Can you tell me if the Octopus Server “Tasklogs” folder is on the local machine or on network storage?

Let me know at your earliest convenience.

Best Regards,
Donny

Re, yes, it’s on the same machine, in C:\Octopus\TaskLogs.

Hi @BizyDev,

Thank you for getting back to me.

Would you mind running a System Integrity Check?

Let me know if it finds anything.

Best Regards,
Donny

Re,

Looks fine on this side:

I just noticed another issue: Octopus indicates that a few Tentacles are not up to date:

However, when I upgraded all Tentacles, it didn’t work as expected. No errors were thrown, but the “Upgrade available” message remained :confused:

Finally, if I try to upgrade all Tentacles again, no new upgrade task is created, but “Upgrade available” is still shown for a majority of Tentacles… Strange :ghost:

Hi @BizyDev,

Thank you for the quick reply.

Are you, by chance, running SQL Express? We’ve had a few other customers report similar issues with SQL Error 995 due to the memory limitations of SQL Express. If you are not running SQL Express, can you tell me what version and edition of SQL you are currently running? If RAM is sitting at an average of 85% usage, would it be possible to increase the memory resources on this machine to see if the issue still appears?

It may also be worth checking the fragmentation of your Octopus SQL db. We have a Step Template that can do that here: https://library.octopus.com/step-templates/b362bd69-4a69-42c1-bcb5-2a134549ef3f
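If the Step Template cannot be run, a fragmentation check can also be done by hand. The sketch below is only an illustration and is not the query the Step Template uses: it reads SQL Server's `sys.dm_db_index_physical_stats` view through any Python DB-API cursor (for example one obtained from a driver such as pyodbc, which is an assumption here). The function name `fragmented_indexes` and the thresholds of 30% and 100 pages are hypothetical, illustrative choices.

```python
# Illustrative T-SQL: per-index fragmentation for the current database.
# This is NOT the query used by the Octopus Step Template.
FRAGMENTATION_QUERY = """
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;
"""


def fragmented_indexes(cursor, threshold_percent=30.0, min_pages=100):
    """Return (table, index, fragmentation %, pages) rows above the thresholds.

    `cursor` is any DB-API cursor already connected to the Octopus database.
    The default thresholds are illustrative assumptions, not Octopus guidance.
    """
    cursor.execute(FRAGMENTATION_QUERY)
    return [
        (table, index, fragmentation, pages)
        for table, index, fragmentation, pages in cursor.fetchall()
        if fragmentation >= threshold_percent and pages >= min_pages
    ]
```

Small indexes are filtered out because their fragmentation percentage tends to be high without having much practical effect.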

Regarding the Tentacle upgrade message, this is a known bug:

I look forward to hearing back from you.

Best Regards,
Donny

Re Donny,

Sorry for the late answer.

No, it’s not a SQL Express instance. Unfortunately, I’m not able to run your “SQL - Query Octopus Database for Fragmentation” step from our Octopus Server, as I have a connection issue:

I’ll check with the infra team to try to run this step on the database.

I’ll keep you posted ASAP. Thank you very much for your support.

Hi,
My team have been having the same issue after updating to 2021.2.7650.
We also get errors from the health check, usually only for 1-3 of the first targets to be checked.

Our database runs in Azure. There is no shortage of resources.

A workaround seems to be to set Pooling=false in the connection string.
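For anyone unsure what that change looks like, here is a minimal sketch of adding `Pooling=false` to a SQL Server connection string. The helper name `set_connection_option` and the server and database names are hypothetical placeholders. How the edited string is applied to your Octopus Server instance is not covered here.

```python
def set_connection_option(connection_string: str, key: str, value: str) -> str:
    """Return the connection string with `key` set to `value`.

    Replaces an existing entry (keys compared case-insensitively) or appends
    a new one. This naive split on ';' assumes no quoted values contain ';'.
    """
    parts = [p.strip() for p in connection_string.split(";") if p.strip()]
    result = []
    replaced = False
    for part in parts:
        name, _, _ = part.partition("=")
        if name.strip().lower() == key.lower():
            result.append(f"{key}={value}")
            replaced = True
        else:
            result.append(part)
    if not replaced:
        result.append(f"{key}={value}")
    return ";".join(result)


# Hypothetical connection string; server and database names are placeholders.
original = "Server=sql.example.local;Database=Octopus;Integrated Security=True"
updated = set_connection_option(original, "Pooling", "false")
print(updated)
```

Keep in mind that with pooling disabled, every connection the server opens is a new physical connection to SQL Server, so this is best treated as a temporary workaround rather than a permanent setting.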

Listings from the two errors we get from the health check:


                |     System.InvalidOperationException: Invalid operation. The connection is closed.
                |     at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ValidateConnectionForExecute(SqlCommand command)
                |     at Microsoft.Data.SqlClient.SqlInternalTransaction.Rollback()
                |     at Microsoft.Data.SqlClient.SqlTransaction.Dispose(Boolean disposing)
                |     at Nevermore.Advanced.ReadTransaction.Dispose()
                |     at Octopus.Server.Orchestration.ServerTasks.HealthCheck.HealthResultRecorder.SaveResult(MachineHealthResult result, ITaskLog taskLog, CancellationToken cancellationToken) in C:\BuildAgent\work\eb99e602a96dd63\source\Octopus.Server\Orchestration\ServerTasks\HealthCheck\HealthResultRecorder.cs:line 85
                |     at Octopus.Server.Orchestration.ServerTasks.HealthCheck.HealthResultRecorder.Record(MachineHealthResult result, ITaskLog taskLog, CancellationToken cancellationToken, IHealthResultCollator healthResultCollator) in C:\BuildAgent\work\eb99e602a96dd63\source\Octopus.Server\Orchestration\ServerTasks\HealthCheck\HealthResultRecorder.cs:line 56
                |     at Octopus.Server.Orchestration.ServerTasks.HealthCheck.HealthCheckService.PerformHealthCheck(Machine machine, ITaskLog taskLogForMachine, CancellationToken cancellationToken, IHealthResultCollator healthResultCollator, ExceptionHandling exceptionHandling, Action`2 customAction) in C:\BuildAgent\work\eb99e602a96dd63\source\Octopus.Server\Orchestration\ServerTasks\HealthCheck\HealthCheckService.cs:line 92
                |     Octopus.Server version 2021.2.7650 (2021.2.7650+Branch.release-2021.2.Sha.0f176810b157f3de525844fbb6bd147186c70893)

                |     Exception occurred while executing a reader for `SELECT TOP 1 * FROM [dbo].[Machine] WHERE [Id] = @Id`
                |     System.Exception: Exception occurred while executing a reader for `SELECT TOP 1 * FROM [dbo].[Machine] WHERE [Id] = @Id`
                |     ---> Microsoft.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The I/O operation has been aborted because of either a thread exit or an application request.)
                |     ---> System.ComponentModel.Win32Exception (995): The I/O operation has been aborted because of either a thread exit or an application request.
                |     at Microsoft.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__188_0(Task`1 result)
                |     at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
                |     at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
                |     --- End of stack trace from previous location ---
                |     at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
                |     --- End of stack trace from previous location ---
                |     at Nevermore.Transient.DbCommandExtensions.<>c__DisplayClass3_0.<<ExecuteReaderWithRetryAsync>b__0>d.MoveNext()
                |     --- End of stack trace from previous location ---
                |     at Nevermore.Transient.DbCommandExtensions.<>c__DisplayClass3_0.<<ExecuteReaderWithRetryAsync>b__0>d.MoveNext()
                |     --- End of stack trace from previous location ---
                |     at Nevermore.Transient.RetryPolicy.ExecuteActionAsync[TResult](Func`1 func)
                |     at Nevermore.Transient.DbCommandExtensions.ExecuteReaderWithRetryAsync(DbCommand command, RetryPolicy commandRetryPolicy, CommandBehavior commandBehavior, RetryPolicy connectionRetryPolicy, String operationName, CancellationToken cancellationToken)
                |     ClientConnectionId:689bb87c-aa3f-4c28-8621-cea63dea142d
                |     Error Number:995,State:0,Class:20
                |     ClientConnectionId before routing:4ed1a8ba-1f2c-4ec7-870b-01e24b467987
                |     Routing Destination:b5e9fe70b3a9.tr1.norwayeast1-a.worker.database.windows.net,11047
                |     --- End of inner exception stack trace ---
                |     at Nevermore.Transient.DbCommandExtensions.ExecuteReaderWithRetryAsync(DbCommand command, RetryPolicy commandRetryPolicy, CommandBehavior commandBehavior, RetryPolicy connectionRetryPolicy, String operationName, CancellationToken cancellationToken)
                |     at Nevermore.CommandExecutor.ExecuteReaderAsync(CancellationToken cancellationToken)
                |     at Nevermore.Advanced.ReadTransaction.ExecuteReaderAsync(PreparedCommand preparedCommand, CancellationToken cancellationToken)
                |     at Nevermore.Advanced.ReadTransaction.StreamAsync[TRecord](PreparedCommand command, CancellationToken cancellationToken)+MoveNext()
                |     at Nevermore.Advanced.ReadTransaction.LoadAsync[TDocument,TKey](TKey id, CancellationToken cancellationToken)
                |     at Nevermore.Advanced.ReadTransaction.LoadAsync[TDocument,TKey](TKey id, CancellationToken cancellationToken)
                |     at Octopus.Core.RelationalStorage.RawRelationalTransaction.LoadAsync[TDocument](String id, CancellationToken cancellationToken) in C:\BuildAgent\work\eb99e602a96dd63\source\Octopus.Core\RelationalStorage\RawRelationalTransaction.cs:line 26
                |     at Octopus.Server.Web.Infrastructure.OctopusQueryExecutor.LoadAsync[TDocument](String id, CancellationToken cancellationToken) in C:\BuildAgent\work\eb99e602a96dd63\source\Octopus.Server\Web\Infrastructure\OctopusQueryExecutor.cs:line 225
                |     at Octopus.Server.Orchestration.ServerTasks.HealthCheck.HealthResultRecorder.SaveResult(MachineHealthResult result, ITaskLog taskLog, CancellationToken cancellationToken) in C:\BuildAgent\work\eb99e602a96dd63\source\Octopus.Server\Orchestration\ServerTasks\HealthCheck\HealthResultRecorder.cs:line 85
                |     at Octopus.Server.Orchestration.ServerTasks.HealthCheck.HealthResultRecorder.Record(MachineHealthResult result, ITaskLog taskLog, CancellationToken cancellationToken, IHealthResultCollator healthResultCollator) in C:\BuildAgent\work\eb99e602a96dd63\source\Octopus.Server\Orchestration\ServerTasks\HealthCheck\HealthResultRecorder.cs:line 56
                |     at Octopus.Server.Orchestration.ServerTasks.HealthCheck.HealthCheckService.PerformHealthCheck(Machine machine, ITaskLog taskLogForMachine, CancellationToken cancellationToken, IHealthResultCollator healthResultCollator, ExceptionHandling exceptionHandling, Action`2 customAction) in C:\BuildAgent\work\eb99e602a96dd63\source\Octopus.Server\Orchestration\ServerTasks\HealthCheck\HealthCheckService.cs:line 92
                |     Octopus.Server version 2021.2.7650 (2021.2.7650+Branch.release-2021.2.Sha.0f176810b157f3de525844fbb6bd147186c70893)

Thank you very much @tsandbukt, Pooling=false fixed the issue for us.

Hi @BizyDev,

Can you confirm if your SQL db is self-hosted or cloud hosted? If cloud hosted, which provider are you using?

Let me know at your earliest convenience.

Best Regards,
Donny

Re @donny.bell, it’s self-hosted but not on the same host as the Octopus Server.

Hi @BizyDev,

Thank you for the quick response.

We have submitted an inquiry with our Development Team concerning the SQL 995 errors and the relationship with the Pooling setting. I appreciate @tsandbukt sharing a potential fix. However, we’ll need to get confirmation from our Development Team before making a specific recommendation to adjust this setting.

I’ll let you know once I hear back. If you have any questions until then, please let me know.

Best Regards,
Donny


Hi @BizyDev and @tsandbukt,

I’ll jump in here for Donny, as he’s currently offline as part of our US-based team. Just a quick update to let you know the dev team has raised a bug report for this at the following link.

We greatly appreciate your report and troubleshooting efforts. If you have any questions or concerns moving forward, please don’t hesitate to reach out.

Best regards,

Kenny