Multiple deployments of the same project concurrently

Hi,

We’re trying to deploy a relatively small project to a large number (2000+) of deployment targets. The project is a collection of files, rather than an IIS application or anything else too complex.

So far, we’ve found using Octopus’s inbuilt tools for creating deployments to be relatively unsuccessful for deploying releases of this project. It’s almost impossible to guarantee that all 2000+ nodes are available and healthy, and by default a failure on 1 machine then fails the whole deployment.

We experimented with guided failure mode, but this isn’t really practical either when a deployment contains 2000+ machines.

We’ve also seen that Octopus seems to consider each deployment a single task, so despite having Octopus set up in HA mode with 4 server nodes, and a task cap of 10 per server, Octopus was only ever running 1 “task” to try and deploy our project.

It’s probably worth mentioning we’ve tried different variations of rolling deployment windows, max parallelism, etc, to try and mitigate some of these problems, but generally this hasn’t improved the situation.

We’ve started testing out using the API to create multiple deployments for this project, each one targeting a subset of the deployment targets, for example: instead of 1 deployment of 2000 machines, we create 5 deployments of 400 machines. Our hope was that when these deployments were created through the API, Octopus would recognise the multiple large deployments and load balance them across our 4 nodes.
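For illustration, the batching described above can be sketched roughly as follows. This is a minimal sketch that only builds the request bodies and sends nothing; the field names (ReleaseId, EnvironmentId, SpecificMachineIds) are assumed to match the Octopus deployment resource, and all IDs are hypothetical placeholders.

```python
from typing import Dict, List


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_deployment_bodies(release_id: str, environment_id: str,
                            machine_ids: List[str],
                            batch_size: int) -> List[Dict]:
    """Build one deployment request body per batch of targets.

    Nothing is sent to a server here. The keys are assumptions about
    the deployment resource, not confirmed by this thread.
    """
    return [
        {
            "ReleaseId": release_id,          # hypothetical ID
            "EnvironmentId": environment_id,  # hypothetical ID
            "SpecificMachineIds": batch,      # assumed field name
        }
        for batch in chunk(machine_ids, batch_size)
    ]


if __name__ == "__main__":
    # 2000 placeholder target IDs split into batches of 400.
    ids = [f"Machines-{n}" for n in range(1, 2001)]
    bodies = build_deployment_bodies("Releases-1", "Environments-1", ids, 400)
    print(len(bodies), "deployments of",
          len(bodies[0]["SpecificMachineIds"]), "targets each")
```

Each body would then be posted to the deployments endpoint as a separate deployment.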

While the deployments get created successfully, only 1 deployment for this project runs, while the others are queued. Once again, only 1/10 tasks on 1 node is in use. I can however see deployments for other projects seem to “skip” this queue and start deploying straight away.

How can we configure Octopus in such a way that multiple deployments of the same project to different targets aren’t stuck in their own queue? Ideally I’d like to be able to split this into many small deployments across all 4 nodes.

Additionally, is there any way to configure Octopus so that it ignores failures and continues the deployment, without using guided failure mode?

We’re running Octopus Server 2018.4.0, with tentacles on 3.19.

Thanks.

Ben,

Thank you for contacting us. I think there are two different questions to answer here.

With regards to the failing nodes, have you tried enabling “Skip Deployment Targets” in your project’s settings? This should allow the deployment to proceed and succeed if any nodes are not contactable. You can find more information about deploying to transient targets in our documentation.

As for running deployments in parallel, by default a deployment step will run on up to 10 targets simultaneously, which is determined by the Octopus.Action.MaxParallelism variable. Note that unless you are using tenants, you’ll only see one deployment task. During a deployment, do you see the step running against only one target at a time?

Looking forward to your reply.

Regards,
Jayden

Hi Jayden,

Often these nodes are contactable but fail for other reasons - sometimes OS misconfiguration, for example. Is there a way to successfully continue the deployment in this scenario?

I’ll double check our Max Parallelism setting and whether deployment steps are running in parallel. What is the best way to check this? I’m looking at the “Acquire package” step of one of our deployments, and it tells me which targets are complete, but only lists “Started x minutes ago” on the others, with no indication as to which are actually running.

Thanks.

Hi Ben,

If your nodes are failing the deployment for other reasons, it becomes a little more complicated. Octopus is all about repeatable deployments, and having the outcome of a deployment vary from one instance to the next doesn’t really fit the pattern. That isn’t to say your scenario is an isolated one: there is certainly some demand, as evidenced by the support for this suggestion on our UserVoice site, but so far it hasn’t been implemented.

If there is any way to predict ahead of time that a machine will fail a deployment, then you can use a custom health check script to check for common issues such as OS misconfiguration ahead of time, and automatically remove those from the deployment using the methods described in the deploying to transient targets documentation linked in my earlier response. This is the best possible solution I can think of for you, if hardening the deployment itself against failures isn’t an option.

There are some other possible options, such as deploying to tenants rather than targets (which creates a separate deployment for each tenant), or creating another project with a script step that triggers the deployment of your original project to one target at a time. I would like to stress that I would only use these options if they were absolutely necessary, as they have configuration, performance, and maintenance implications that may not be palatable.

If you haven’t customised the MaxParallelism value, it will be the default, which is 10. You can read more about it in our documentation. Note that there is a separate value for Acquire steps, Octopus.Acquire.MaxParallelism, which also defaults to 10. These values aren’t exposed in the UI; to set one, create a project variable with the matching name and the value you want.

The deployment log for your task should give you timing information that shows you when each target acquire operation started and how long it ran for, and so you should be able to see if a number of them are running in parallel.
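As a rough illustration of that check, suppose the start time and duration of each target’s acquire operation have been copied out of the log by hand (the log format itself is not assumed here, and the timestamps below are made up). The peak number of overlapping operations can then be computed like this:

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


def peak_concurrency(intervals: List[Interval]) -> int:
    """Return the largest number of intervals that overlap at any moment."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # an operation begins
        events.append((end, -1))    # an operation finishes
    # Sort by time; at identical times, finishes (-1) sort before starts (+1),
    # so back-to-back operations are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    running = 0
    peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


if __name__ == "__main__":
    t0 = datetime(2018, 5, 1, 12, 0, 0)  # made-up timestamp
    sample = [
        (t0, t0 + timedelta(seconds=30)),
        (t0 + timedelta(seconds=10), t0 + timedelta(seconds=40)),
        (t0 + timedelta(seconds=30), t0 + timedelta(seconds=50)),
    ]
    print("peak parallel operations:", peak_concurrency(sample))
```

A peak of 1 would suggest the targets are being processed one at a time; a peak close to the configured MaxParallelism value would suggest the setting is taking effect.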

I hope that information helps. Let me know how it all goes!

Regards,
Jayden

Hi Jayden.

Thanks - I’ll take a look at the deployment log in more detail. We have customised Octopus.Acquire.MaxParallelism; it’s currently set to 32 (which, as far as I know, was an arbitrary choice). I might also have a play around with Octopus.Action.MaxParallelism to see how that impacts our deployment speed.

I think a custom health check might also be a promising avenue to look into, if we can narrow it down to a subset of problems. Is there a way to put a tentacle into some sort of debug mode? All the deployment logs really tell us right now is that it exited with an error code (often a PowerShell error, off the top of my head). It’d be good if we could get a little more detail from the tentacles / calamari.

Thanks again.

Ben,

What type(s) of deployment steps are you using? There aren’t any specific debug switches we can enable, but do note that there is an option in the deployment log to include verbose output, rather than info, which is the default.

If you are using any of your own PowerShell scripts you can always add additional logging output to those.

Regards,
Jayden
