Lifecycle automatic deploy not firing

pixman20 · 28 February 2019 03:48

We are currently using 2018.8.12 and are having an issue where a release does not automatically trigger the deploy to the subsequent environment. Our best guess at the cause is that we had a failed deploy (due to connectivity issues); after we re-ran it and it deployed successfully, the automatic deploys appear to have stopped working.

Our setup is as follows:

We are deploying Project A, which deploys projects 1, 2 and 3 to several different environments in sequence.
We have several environments, some of which are manual deploys and some of which are automatic. Let's say we have environments 1, 2, 3, 4 … up to 10, where 1 is manual, 2 fires automatically, 3 is manual, and 4 - 10 should fire automatically in series.
We are not using tenants.

What we saw is that environment 1 was selected to deploy, which worked successfully. Env 2 then correctly fired off and deployed successfully. Env 3 was manually kicked off to deploy, but failed due to connectivity issues. After re-running it successfully (using the main deploy button in the top right), env 4 did not automatically fire as it should have. After env 4 was manually kicked off and completed, environment 5 also did not fire, and so on for all remaining environments, such that we had to manually click each one.
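
To make the expected behaviour concrete, here is a minimal Python sketch of the lifecycle described above. It is only a model of what we expect to happen, not Octopus code; the names `AUTO` and `expected_auto_chain` are made up for illustration.

```python
# Hypothetical model of the lifecycle described above (not Octopus code).
# Environment number -> True if it should deploy automatically.
AUTO = {1: False, 2: True, 3: False, **{n: True for n in range(4, 11)}}


def expected_auto_chain(completed_env, auto=AUTO):
    """Return the environments expected to deploy automatically, in order,
    once `completed_env` has deployed successfully."""
    chain = []
    env = completed_env + 1
    # Keep promoting while the next environment exists and is automatic.
    while auto.get(env, False):
        chain.append(env)
        env += 1
    return chain
```

Under this model, a successful deploy to env 3 should be followed by envs 4 through 10 without any manual clicks, which is what did not happen for us.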

I have not had a chance to attempt to reproduce this on our test instance, but was wondering if this is a known issue that’s been addressed in later versions or will be addressed. I could not find anything related when searching on my own through issues.

Also related to the lifecycle automatic deploys, I was wondering if there is a way to check in the logs when a trigger fires that kicks off the next environment's deploy (or doesn't, as in the case above). We are noticing a considerable delay, where it sometimes takes around 25 seconds between one environment’s completion and the next environment’s start. We have many different projects deploying to many environments, so this adds up considerably. I’m hoping there’s a way to decrease the delay or check if there’s some kind of performance issue on our end.

Thank you!
Andrew

Shane_Gill · 1 March 2019 05:40

Hi Andrew,

Thanks for getting in touch and providing a detailed description of the problem.

I have tried reproducing by configuring a project that deploys 3 child projects. I have a lifecycle with auto, manual, auto environments. I let the auto deploy occur to the first environment and then fail the manual deploy to the second environment… After a successful deploy to the manual environment the release is automatically promoted to the next environment, as you were expecting to happen.

You can find the logs in Configuration > Diagnostics > Auto deploy logs. If you could send those through to support@octopus.com they may help figure out what is happening.

The delay you are seeing is because auto deployments are processed on a schedule that triggers every 30 seconds. The interval is not currently configurable.
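
As a rough illustration of why a 30-second schedule produces delays like the 25 seconds reported, the following Python sketch models the wait between a deployment finishing and the next scheduler tick. It assumes ticks fall on exact multiples of the interval and that a finished deployment is picked up at the very next tick; the actual server scheduling may differ, and the function names are hypothetical.

```python
import math

POLL_INTERVAL_S = 30  # interval quoted above; the tick alignment is an assumption


def wait_until_next_poll(finish_time_s, interval_s=POLL_INTERVAL_S):
    """Seconds between a deployment finishing and the next scheduler tick.

    Assumes ticks at 0, interval, 2*interval, ... and that a deployment
    finishing exactly on a tick is picked up by that tick.
    """
    next_tick = math.ceil(finish_time_s / interval_s) * interval_s
    return next_tick - finish_time_s


def chain_overhead(deploy_durations_s, interval_s=POLL_INTERVAL_S):
    """Total time spent waiting for ticks across a chain of deployments,
    where each listed deployment is followed by an auto-deployed environment."""
    clock = 0.0
    waited = 0.0
    for duration in deploy_durations_s:
        clock += duration  # the deployment itself runs
        wait = wait_until_next_poll(clock, interval_s)
        waited += wait
        clock += wait  # the next environment starts on the tick
    return waited
```

For example, under these assumptions a deployment that finishes 5 seconds after a tick waits 25 seconds, and three such 5-second deployments in a row accumulate 75 seconds of waiting.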

Cheers,
Shane

pixman20 · 1 March 2019 20:16

Hi Shane,

Thanks for attempting to reproduce the issue. After looking further in the logs as you’d mentioned, I see the following error:

February 27th 2019 10:42:10 Error
Failed to flush Octopus.Server.Orchestration.Logging.Processors.AppendToLogFile System.IO.IOException: An unexpected network error occurred.

   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.FileStream.WriteCore(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.FileStream.Write(Byte[] array, Int32 offset, Int32 count)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at Octopus.Server.Orchestration.Logging.ServerLogWriter.DoSafeFlush(ILogEntryProcessor processor)

It looks like this error repeated every 30 seconds for about 12 hours, which makes sense if it occurs every time the 30-second timer triggers. It may have been caused by a network issue on our end when attempting to write to a log file during that time, or perhaps by another issue. As of now I’m not aware of any network issues, though, and the deploys seemed to work regardless.
The errors stopped on their own prior to the issue that we ran into, but they may have contributed to it. I’ve restarted the service and now that I know where to look I can check the logging on the server if it happens again or check if we’re having connectivity issues with our shared drive where the other logs reside.
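
For checking whether errors like this line up with the 30-second timer, a minimal Python sketch along these lines could help. It assumes the log has been saved as plain text with header lines in the same format as the one quoted above (for example "February 27th 2019 10:42:10 Error"); the function names are hypothetical and nothing here uses an Octopus API.

```python
import re
from datetime import datetime

# Matches header lines such as "February 27th 2019 10:42:10 Error".
HEADER = re.compile(
    r"^(\w+) (\d{1,2})(?:st|nd|rd|th) (\d{4}) (\d{2}:\d{2}:\d{2}) (\w+)$"
)


def parse_headers(lines):
    """Return (timestamp, level) pairs for every line that looks like a header."""
    entries = []
    for line in lines:
        match = HEADER.match(line.strip())
        if not match:
            continue
        month, day, year, clock, level = match.groups()
        try:
            stamp = datetime.strptime(
                f"{month} {day} {year} {clock}", "%B %d %Y %H:%M:%S"
            )
        except ValueError:
            continue  # first word was not a month name; skip the line
        entries.append((stamp, level))
    return entries


def error_gaps(entries):
    """Seconds between consecutive entries whose level is "Error"."""
    times = sorted(stamp for stamp, level in entries if level == "Error")
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
```

If most of the gaps come out close to 30 seconds, that would support the idea that the error is raised on each run of the auto-deploy schedule.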

As for the delay, are there any plans to make this configurable, or is it possible to work around this another way using triggers? To work around the issue “Allow Deploy Release step to be used in rolling deployment”, we’ve had to create an environment per deployment machine, so we have a substantial number of “environments” and the delay is growing as we connect more machines.

Thanks Shane!

Andrew

Shane_Gill · 4 March 2019 22:29

Hi Andrew,

There are currently no plans to make the delay configurable. It sounds like a better solution would be to allow “deploy a release” in a rolling deployment. An environment per machine using auto-deploy is horrible; I’ll see what I can do to get that issue resolved.

Cheers,
Shane

system · 3 April 2019 22:29

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.