We have what I would think is a common requirement, but we can’t see from the documentation how to achieve it.
We are deploying 5 steps to a machine, and there are 10 machines in the environment. The requirement is that if one step fails, the installation on that machine fails, but the other machines in the environment continue installing. The default behaviour we are seeing is that all machines are deployed to in parallel, and if one step fails the whole release is aborted and failed.
We seem to have achieved what we need by setting the deployment up with guided failure. This allows us to ignore the failed step and skip the current machine, but continue with the rest.
However, it would be preferable to have this behaviour without requiring human interaction. An example would be a scheduled deployment to the QA environment: we might want to set it running at 03:00, when all users are out of the system. The desired default behaviour would be, on step failure, to skip that machine and carry on with all the others; but as no one would be around to guide the failure, our current solution won’t work.
My questions are:
Can we set up the system so that, by default, a step failure fails that machine but the deployment continues to the next machine?
Can you point us to any documentation that explains the step “run conditions”, or that shows the relationship between step failure and machine/release failure?
Thanks for getting in touch! To answer your questions: no, there is currently no way to do what you are asking on failure. We also don’t really have any documentation on the run conditions, because at the moment they aren’t terribly complicated; but I can see that this area is expanding and differs slightly between scenarios, so I will look at adding something.
There could be a way to do what you want (automatic guided failure) using the API: poll a deployment and perform the tasks that you currently do manually.
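To illustrate the idea, here is a minimal, hypothetical sketch of that polling approach in Python. The endpoint paths (`/api/interruptions`, the `responsible` and `submit` sub-resources), the `Guidance: "Exclude"` payload, and the server URL and API key are all assumptions for illustration — check your server’s API before relying on them. The intent is: find the pending guided-failure interruptions for a deployment task, take responsibility for each, and answer with “Exclude” so the failing machine is skipped and the rest continue.

```python
# Hypothetical sketch: auto-answering guided-failure interruptions via the
# Octopus Deploy REST API. Endpoints and payload fields are assumptions.
import json
import urllib.request

OCTOPUS = "https://octopus.example.com"   # placeholder server URL
API_KEY = "API-XXXXXXXX"                  # placeholder API key

def api(path, method="GET", body=None, opener=urllib.request.urlopen):
    """Minimal helper for authenticated Octopus API calls."""
    req = urllib.request.Request(
        OCTOPUS + path,
        data=json.dumps(body).encode() if body is not None else None,
        headers={"X-Octopus-ApiKey": API_KEY,
                 "Content-Type": "application/json"},
        method=method,
    )
    with opener(req) as resp:
        return json.loads(resp.read().decode())

def exclude_failed_machines(task_id, api=api):
    """Poll pending interruptions for a deployment task and answer each
    with 'Exclude', skipping the failed machine so the rest continue."""
    pending = api(f"/api/interruptions?regarding={task_id}&pendingOnly=true")
    for interruption in pending["Items"]:
        # Take responsibility, then submit the guided-failure response.
        api(f"/api/interruptions/{interruption['Id']}/responsible",
            method="PUT")
        api(f"/api/interruptions/{interruption['Id']}/submit",
            method="POST",
            body={"Instructions": "auto-excluded", "Guidance": "Exclude"})
    return len(pending["Items"])
```

A scheduled job could run `exclude_failed_machines` in a loop against the deployment’s task ID for the duration of the 03:00 window, which would remove the need for a human to guide the failure.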
We can’t recall being asked for similar functionality and would really like to understand your scenario. Could you explain a bit more about wanting one machine to fail while all the others succeed?
Do you then fix or redeploy to that machine? Would you expect any kind of rollback to complete on that machine after the failure?
We are interested in seeing what the Community thinks about this idea, and want to add it to UserVoice but I would first like to understand your point of view.
We are using Octopus to deploy applications to servers distributed throughout the country (connected over a WAN), and there are times when one server’s connection is lost. Right now, this means that the entire environment will fail (even if 10+ servers are available), or we have to fall back to the guided failure process, which is a bit of a pain for us. We have also considered having one environment per physical location, but that means a lot more manual work when completing the deployment (unless I’m missing something in the lifecycles setup).
I’m not sure if Jon’s situation is similar, but the ability for us to fail one machine in an environment would make deployments much easier for us to manage.
Thanks for getting in touch! Could you explain the following in a bit more detail: “We have also considered having one environment per physical location, but that means a lot more manual work when completing the deployment (unless I’m missing something in the lifecycles setup).”
If you could also explain in a bit more detail how you would like the deployment to work when a machine loses its connection and fails, it would give me an idea of whether we have had similar features suggested.
Since we have servers spread across the country, I have our production environment split across 4 different geographic environments. For example:
- Production East
  - Server 1
  - Server 2
  - Server 3
- Production Central
  - Server 4
  - Server 5
  - Server 6
Each server is in a different physical location, but they are geographically close and all are connected over a WAN. Right now, if Server 2 has a connection issue (internet down or something of that nature), the entire environment will fail in Octopus Deploy. And if Server 1 is already complete, it is not rolled back. So now I have an environment running 2 different versions of the software, and Server 3 is left on the old version just because Server 2 failed.
In a perfect world, there would be two options in that scenario: we could roll back Server 1 to the previous version, or Server 2 would fail and Server 3 would attempt the install.
For the rollback scenario, we could handle it through complex deployment processes, but that is a lot of work. The only way I know of to force Server 3 to continue is to use guided failure mode (hard when we schedule our updates for late-day deployments) or to manually redeploy to just that server.
We considered having each physical location be its own environment:
- Location 1
  - Server 1
- Location 2
  - Server 2
That would allow us to deploy to all “Production” environments, and if one fails, everything else would continue (as each is more or less a separate task). However, we currently have 30+ servers in our production environment (it varies by project), so the work required to set up all the environments would be very high.

It would also be more work for us to schedule deployments in a rolling fashion. As an example, right now we generally schedule the deployment to East first, Central an hour later, etc. This allows it to roll across the time zones and gives us an opportunity to review and possibly cancel later deployments if there is an issue. I was hoping there would be something in lifecycles where we could “chain” the environments and add a delay between them, so that when we deploy to production we would schedule the start and a delay period would be automatically inserted between each environment.
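For what it’s worth, the rolling schedule described above (East first, Central an hour later, and so on) could be approximated today by creating each environment’s deployment through the API with a staggered queue time, rather than waiting on a lifecycle feature. A minimal sketch, assuming the deployment resource accepts a queue-time field when created (the environment names and start time are placeholders):

```python
# Sketch: compute staggered queue times for a rolling multi-environment
# deployment. Each (environment, queue_time) pair would then be submitted
# to the Octopus API as a separately scheduled deployment -- the exact
# field name for the scheduled time is an assumption; check your server.
from datetime import datetime, timedelta

def staggered_schedule(environments, start, delay=timedelta(hours=1)):
    """Return (environment, queue_time) pairs, each one step later than
    the previous, so the rollout rolls across time zones."""
    return [(env, start + i * delay) for i, env in enumerate(environments)]

schedule = staggered_schedule(
    ["Production East", "Production Central",
     "Production Mountain", "Production West"],
    datetime(2016, 5, 1, 3, 0),   # placeholder start: 03:00
)
```

Because each environment becomes its own task, a failure in one would not stop the others, and the delay still leaves a window to cancel the later deployments if the earlier ones go wrong.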
It also makes monitoring more complicated: the Dashboard screen would be so wide as to be useless for us unless we constantly scrolled it (or unless we had an option to show environments vertically).
Hope that helps. If you want more information let me know and I will gladly help out.
Sorry for the delay in getting back to you, I have been thinking about how best this could translate to a feature.
I can see benefits and pain points in both scenarios, and I also can’t think of a better workaround.
What I have done is create a UserVoice suggestion for the idea of a ‘continue on failure’ automated deployment mode:
I think this would help your situation: you could both schedule the deployment and tell it to continue to the next machine even if one fails.
So please go through to UserVoice and vote and comment, so we can see what community support there is behind this idea.