Deployment timeout did not occur when the machine was offline

Brent · 2 January 2015 16:07

We deploy some of our services via octo.exe after successful packaging. Recently one of our nodes responsible for handling notifications went offline, and all the builds that were kicked off with octo.exe waited for that one machine forever. The deployment never timed out while the node was offline, nor was the machine skipped when the deployment was kicked off with octo.exe. I'm not sure whether we have something configured incorrectly or whether this is a flaw in what the timeout is used for.

Thanks!

Vanessa_Love · 7 January 2015 03:08

Hi Brent,

Thanks for getting in touch! I will see if I can explain this well. Octopus will continue to try deployments with offline machines. The timeout is large, in some cases up to 30 days; how long we wait depends on the type of step. But if a deployment starts and it can’t contact the machine, it will try (and try and try) for a really long time. The main assumption is that the machine is expected to come back online.

So really you didn’t do anything wrong or have settings wrong. There is no setting to time out deployments, and the timeout cannot be relied upon to fail a deployment (depending on the step type).
We did add the feature for skipping offline machines in the UI with 2.6. But secretly all it does is show potentially offline machines and then exclude them from the specific machines that the deployment will target. This functionality is not in Octo.exe yet. I’ve created a UserVoice (see below) to add this feature to the tool, but I’m not really sure it would help in this situation. This does appear to be mostly a misunderstanding of how the deployment could fail from the timeout. If that machine had come back up, the deployments would have continued.
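To make the "show offline machines, then exclude them" idea concrete, here is a minimal Python sketch. It is not how Octopus implements the feature: the machine records, the `status` field, and both function names are hypothetical, and how the resulting list would be handed to a deployment is deliberately left out.

```python
def online_machine_names(machines):
    """Return the names of machines whose (hypothetical) status field says online."""
    return [m["name"] for m in machines if m.get("status", "").lower() == "online"]


def specific_machines_argument(machines):
    """Join the online machine names into one comma-separated string."""
    names = online_machine_names(machines)
    if not names:
        # Deploying to nothing is almost certainly a mistake, so fail loudly.
        raise ValueError("no online machines to deploy to")
    return ",".join(names)


# Illustrative data only; these machine names are made up.
machines = [
    {"name": "web-01", "status": "Online"},
    {"name": "notify-01", "status": "Offline"},
    {"name": "web-02", "status": "Online"},
]
print(specific_machines_argument(machines))  # prints "web-01,web-02"
```

The deployment would then be limited to the machines in that list, so the offline node is never waited on.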

Hopefully this information helps.
Vanessa

Brent · 7 January 2015 17:51

Hmm, this does not seem ideal from our perspective. Is there any way to set the timeout for this wait per project or globally? I feel like waiting indefinitely is not the optimal solution here. To give you more context, let me explain what prompted this question.

We use one machine that handles deploying assets and notifying other systems of deployments. This method works decently well and, thanks to your suggestion, has improved performance-wise. We lost this node in an outage for a day because of a VM issue. Every job that ran for a testing region with this node then queued on the first step that required the machine that went offline. We ended up with about 60 tasks queued because of automated deployments kicking off. I would think that making this value configurable and allowing a timeout to fail the build would be a better solution than waiting indefinitely. This applies more specifically to the case where a machine becomes unavailable after a deployment has begun.
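As a rough illustration of the "timeout fails the build" behavior described above, a build script could put its own deadline around whatever command it waits on. This is a sketch only, using Python's standard `subprocess` module; `run_with_deadline` is a hypothetical helper, and the slow command below is a stand-in rather than a real octo.exe invocation.

```python
import subprocess
import sys


def run_with_deadline(command, timeout_seconds):
    """Run command; return its exit code, or None if the deadline passed.

    subprocess.run kills the child process when the timeout expires
    and then raises TimeoutExpired, which is caught here.
    """
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for a deployment command that never returns in time.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    result = run_with_deadline(slow, timeout_seconds=1)
    print("timed out" if result is None else f"exit code {result}")
```

Note that this only stops the build from waiting. It does nothing about a task that is already queued on the server, which would still need to be cancelled separately.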

Not sure if our assumption is a fair one, but I would love some feedback, because I imagine other people would agree with this proposed behavior.

Thanks!

Vanessa_Love · 9 January 2015 08:59

Hi Brent,

Thanks for the reply. We ran a test for this on our demo server: a PowerShell step deployment to an offline machine timed out in 10 minutes.
Could you provide one or two of the deployment logs for the tasks you had to cancel?

Vanessa