Release showing as success but steps showing as not run

Hi,
We are seeing an issue where the release is showing as success but the steps are still running or not showing as complete.

We upgraded our version yesterday to 2023.1.10235.

The issue was reported before and after the upgrade.

I can send the log, so if you can send me a link to upload.

Kind Regards,
Micheál Power

Hey @mikepower79,

Sorry you are running into this issue, I have seen this before but only when Octopus is in a HA configuration and the instances are using shared storage. I do not think that is the case with you is it?

I have created you a secure link here you can upload the task log to, is it just that one release that is having the issue?

Let us know once the logs are uploaded and I can take a look for you,
Kind Regards,
Clare

Hey @mikepower79,

Sorry for the double post here but a college of mine just pointed out can you check if the deployment has available targets as if it does not it will skip all the steps. It should show that in the task log.

If that is not the case I will await the logs and take a look!

Kind Regards,
Clare

Hi @clare.martin,
Our octopus is a HA instance and using Azure files storage.

Kind Regards,
Micheál Power

@mikepower79,

We have seen it before with shared storage where the deployment completes in the database, which is why you are seeing it as green in the UI, but the logs are still being written to, which is why it can take a bit of time for the UI to catch up. When we see this its usually intermittent and does not happen on all projects.

If you can send us the logs over just so we can check there is nothing else going wrong here but it might just be that it may take a bit of time for the logs to be written to for that release. If that is the case how are the metrics looking for the shared storage do you notice a lag there or is it slower than what it is normally?

I look forward to hearing from you,
Kind Regards,
Clare

Hi @clare.martin,
I have uploaded the deployment log and the system diagnostic report.

Kind Regards,
Micheál Power

Hey @mikepower79,

Thanks for those logs, it does look like they are full logs and the steps that are supposed to run are showing as successful, which does seem to point to the shared storage issue I mentioned.

I did also find this forum post you put up which is the same issue and Justin mentioned shared storage and also this post here which was shared storage related and Justin gave you some tips as to how to pinpoint what the issue was, if you have not followed those already that would be a good start to the troubleshooting process.

I also noticed you put in another request which looks shared storage related so I would definitely take a look at your shared storage and see if the metrics are on par with how they are supposed to look.

Let me know if they look ok but the faults you are putting in at the moment all seem to point to a shared storage issue I am afraid.

Kind Regards,
Clare

1 Like

Hi @clare.martin,
Cant see any issues with our Azure storage.
If it was an issue with our storage catching up with the UI then the deployment would catch up and the steps would show as complete in the UI.
But this is not the case the steps are still showing not complete after a week, the release is showing as complete but steps are not.

I will investigate further but for now our Storage looks good.

Kind Regards,
Micheál Power

Hey @mikepower79,

I took another look at your server logs, specifically for ServerTasks-3366253 as that is the task you are seeing as still showing as not completed and there are a few errors relating to what I think is the task log, this does point to permissions on the storage. I am unsure on what the first error means so I can get that in front of our engineers but it does look like you may need to take a look at your storage permissions as Octopus is struggling to write to task logs:

Access to the path '/taskLogs/servertasks-3366253_xfxfrqu8t9.txt' is denied.

System.ArgumentException: CorrelationId ServerTasks-3366253_XFXFRQU8T9 is not currently active
   at Octopus.Core.Diagnostics.TaskLog.TaskLogFactory.Get(CorrelationId correlationId) in ./source/Octopus.Core/Diagnostics/TaskLog.cs:line 377
   at Octopus.Server.Extensibility.ITSM.Core.Precondition.ChangeControlledDeploymentStartHandlerBase.Handle(DeploymentEvent e, CancellationToken cancellationToken) in ./source/Octopus.Server.Extensibility.ITSM.Core/Precondition/ChangeControlledDeploymentStartHandlerBase.cs:line 53
   at Octopus.Core.Infrastructure.MessageBus.Middleware.UnitOfWorkMiddleware.Invoke(MessageContext context, Func`2 next, CancellationToken cancellationToken) in ./source/Octopus.Core/Infrastructure/MessageBus/Middleware/UnitOfWorkMiddleware.cs:line 29
   at Octopus.Core.Infrastructure.MessageBus.Middleware.UnitOfWorkMiddleware.Invoke(MessageContext context, Func`2 next, CancellationToken cancellationToken) in ./source/Octopus.Core/Infrastructure/MessageBus/Middleware/UnitOfWorkMiddleware.cs:line 29
   at Octopus.Core.Infrastructure.MessageBus.Middleware.LoggingMiddleware.Invoke(MessageContext context, Func`2 next, CancellationToken cancellationToken) in ./source/Octopus.Core/Infrastructure/MessageBus/Middleware/LoggingMiddleware.cs:line 39
   at Octopus.Core.Infrastructure.MessageBus.Middleware.CorrelationIdMiddleware.Invoke(MessageContext context, Func`2 next, CancellationToken cancellationToken) in ./source/Octopus.Core/Infrastructure/MessageBus/Middleware/CorrelationIdMiddleware.cs:line 33
   at Octopus.Core.Infrastructure.MessageBus.PipelineFactory.BuildAndRun[TEvent](Int64 sequenceNumber, TEvent e, IDictionary`2 metadata, String consumerGroupId, Type handlerType, CancellationToken cancellationToken) in ./source/Octopus.Core/Infrastructure/MessageBus/PipelineFactory.cs:line 51
   at Octopus.Core.Infrastructure.MessageBus.MessagePump.Dispatch[TEvent](Int64 sequenceNumber, TEvent e, IDictionary`2 metadata, CancellationToken cancellationToken) in ./source/Octopus.Core/Infrastructure/MessageBus/MessagePump.cs:line 135
2023-05-18 10:26:47.6595      1     74  WARN  An unhandled exception occurred while dispatching an event via the message bus: "CorrelationId ServerTasks-3366253_XFXFRQU8T9 is not currently active"
System.ArgumentException: CorrelationId ServerTasks-3366253_XFXFRQU8T9 is not currently active
"CorrelationId: "ServerTasks-3366253_XFXFRQU8T9/6ec38cf4a5854df198facc13655a1f3a""
"Message: "Enable Maintenance completed"
System.UnauthorizedAccessException: Access to the path '/taskLogs/servertasks-3366253_xfxfrqu8t9.txt' is denied.
 ---> System.IO.IOException: Permission denied
   --- End of inner exception stack trace ---
"CorrelationId: "ServerTasks-3366253_XFXFRQU8T9/85fe2e16c7224a33aad3a76cf403d11f/60ec2f6113ff4f64bff995305632c6a8/b9ff3e936ae4446580144d9d6066aa37""
"Message: "Calamari Version: 25.3.3"
System.UnauthorizedAccessException: Access to the path '/taskLogs/servertasks-3366253_xfxfrqu8t9.txt' is denied.
 ---> System.IO.IOException: Permission denied
   --- End of inner exception stack trace ---
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
   at System.IO.Strategies.BufferedFileStreamStrategy.Dispose(Boolean disposing)
   at System.IO.StreamWriter.CloseStreamFromDispose(Boolean disposing)
   at System.IO.StreamWriter.Dispose(Boolean disposing)
   at Octopus.Core.Orchestration.Logging.Processing.AppendToLogFile.InitializeInternal() in ./source/Octopus.Core/Orchestration/Logging/Processing/AppendToLogFile.cs:line 49
   at Octopus.Core.Orchestration.Logging.Processing.AppendToLogFile.Process(CorrelationId correlationId, ActivityLogEntry entry) in ./source/Octopus.Core/Orchestration/Logging/Processing/AppendToLogFile.cs:line 71
   at Nevermore.Transient.RetryPolicy.<>c__DisplayClass26_0.<ExecuteAction>b__0()
   at Nevermore.Transient.RetryPolicy.ExecuteAction[TResult](Func`1 func)
   at Nevermore.Transient.RetryPolicy.ExecuteAction(Action action)
   at Octopus.Server.Orchestration.Logging.ServerLogWriter.DoSafeAppend(ILogEntryProcessor processor, CorrelationId correlationId, ActivityLogEntry entry) in ./source/Octopus.Server/Orchestration/Logging/ServerLogWriter.cs:line 171
2023-05-18 11:18:19.9081      1    138 ERROR  Failed to append the following log entry:"
"CorrelationId: "ServerTasks-3366253_XFXFRQU8T9/85fe2e16c7224a33aad3a76cf403d11f/60ec2f6113ff4f64bff995305632c6a8/b9ff3e936ae4446580144d9d6066aa37""
"Message: "Exit code: 0"
System.UnauthorizedAccessException: Access to the path '/taskLogs/servertasks-3366253_xfxfrqu8t9.txt' is denied.
 ---> System.IO.IOException: Permission denied
   --- End of inner exception stack trace ---

You can see here too its failing to write to the logs:

2023-05-18 11:18:19.9081      1    138 ERROR  Failed to append the following log entry:"
"CorrelationId: "ServerTasks-3366253_XFXFRQU8T9/85fe2e16c7224a33aad3a76cf403d11f/60ec2f6113ff4f64bff995305632c6a8/b9ff3e936ae4446580144d9d6066aa37""

Which would show as incomplete in the UI as the database has to get the information from the storage in order to update the steps in the UI. It will show as complete for the deployment as when that completes it updates directly from the database, but the logs themselves for the steps need to be taken from the storage and Octopus is having difficulty accessing them.

I will get the first error looked at by the engineers but in the meantime can you look at the permissions for that folder and ensure the service account Octopus is using for your HA nodes has access to that task logs folder.

I found at least 50 of those access denied errors so I would try and get those resolved and that might fix the issue.

I will update you on what the engineers say about the first error message.

Kind Regards,
Clare

Hi @clare.martin,
Ok will take a look at permissions on our storage.
Thanks for the quick response.

Kind Regards,
Micheál Power

1 Like

Hey @mikepower79,

Just to let you know our engineers got back to me last night and did mention we have another customer seeing this issue (we know their permissions are correct on their shared storage).

I will keep you updated on any discussions but as a precaution just to tripple check I would still check the permissions on the task logs folder just so we can rule that out. If they look ok at least we can say its not down to permissions as it may just be a co-incidence that this other customer has the same issue as you but its not entirely related to what may be the cause of your issue here.

Once I hear back from the engineers I will let you know.

Kind Regards,
Clare

Hey @mikepower79,

Sorry for the double post but I just took another look at the shortcut story the engineers have open and the other customer was getting the same error you are but only the first one -

System.ArgumentException: CorrelationId ServerTasks-3366253_XFXFRQU8T9 is not currently active

They think that is related to service now and Jira but that is what they are investigating, so the other customer is not seeing the tasks UI issue you are, I apologise for mixing that up!

So your issue looks to be permissions related but the engineers are looking into the ‘not currently active’ error.

Kind Regards,
Clare

Hey @mikepower79,

One last thing on that error:

System.ArgumentException: CorrelationId ServerTasks-3366253_XFXFRQU8T9 is not currently active

The engineers mentioned that error is is not impacting the ability to use Octopus. They think it’s caused by how task logs are being resolved when the message pump sends a message but it can be safety ignored.

The engineers will work on seeing what the cause of that issue is but it doesn’t factor into the issues you are seeing in the UI.

Hopefully the permissions check gets you somewhere but reach out if not and we can see if we can dig into this a little more.

Kind Regards,
Clare