Issues with Octopus Rolling deployment (PowerShell script step)

(Rock26in) #1

Hi,

I need to run a ‘PowerShell Script Step’ (it has 10 child steps) on 20 target servers.

I have configured a Rolling Deployment with a window size of ‘5’, so I expect Octopus to run the ‘PS Script Step’ on 5 servers at a time, in 4 batches.

However, I’m facing an issue as mentioned below:

When I run the ‘PS Script Step’, it runs on the ‘first 5’ target servers in parallel regardless of deployment failure/success on any target server. Only after successful completion on the ‘first 5’ target servers does it run on the next ‘5 servers’, and so on.

However, if the ‘PS Script Step’ fails on any 1 server out of the ‘first 5’ servers, then the step will not run on any of the remaining ‘15 servers’ and the overall deployment will be marked as ‘Failed’.

I would expect ‘PS Script Step’ to run on all 20 target servers irrespective of deployment failure/success on any target server.

Note:

  1. ‘PS Script Step’ - whenever there is any exception or error while running the PS script, I’m using either throw “exception” or Write-Error “error” to fail the script.
  2. I can use ‘Octopus.Action.MaxParallelism’ variable to increase default value from ‘10’ to ‘20’ and set ‘parallel deployment’ in the Step, but it would put load on the Octopus Server. I need to increase the no. of target servers to ‘100’ or more in future so don’t want to do this.

Any help would be highly appreciated!

Thanks!

(Paul Calvert) #3

Hi @rock26in,

Thanks for getting in touch!

The behaviour that you’re seeing is expected. Any failure during deployment will prevent any further steps from running unless they have a run condition of always run. And in a rolling deployment, each batch is treated as an individual step.

The logic behind this is that if you’re rolling out a deployment to 20 servers, and it fails on the first 5, you wouldn’t usually want it to continue on the remaining 15 and end up taking your entire environment offline.
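
To make the batch behaviour concrete, here is a minimal Python sketch of the rule described above: every target in the current batch runs, but a failure anywhere in a batch stops all later batches. This is only a simplified model for illustration, not how Octopus is implemented; `rolling_deploy`, `run_step` and the `server-NN` names are hypothetical.

```python
from typing import Callable, Dict, List


def rolling_deploy(targets: List[str], window_size: int,
                   run_step: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Simplified model: run targets batch by batch; stop after a failed batch."""
    # Split the targets into batches of at most `window_size`.
    batches = [targets[i:i + window_size]
               for i in range(0, len(targets), window_size)]
    result: Dict[str, List[str]] = {"succeeded": [], "failed": [], "skipped": []}
    halted = False
    for batch in batches:
        if halted:
            # An earlier batch failed, so this batch never starts.
            result["skipped"].extend(batch)
            continue
        # Every target in the current batch runs, even if a sibling fails.
        for target in batch:
            key = "succeeded" if run_step(target) else "failed"
            result[key].append(target)
        if any(t in result["failed"] for t in batch):
            halted = True
    return result


# 20 targets, window size 5; the step fails on one server in the first batch.
targets = [f"server-{n:02d}" for n in range(1, 21)]
outcome = rolling_deploy(targets, 5, lambda t: t != "server-03")
print(len(outcome["succeeded"]), len(outcome["failed"]), len(outcome["skipped"]))
# prints: 4 1 15
```

With one failure in the first batch, the other 4 servers in that batch still complete, and the remaining 15 are never attempted, which matches what was reported in the first post.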

The way to continue a deployment would be through enabling Guided Failures on either the environment or project. With guided failure enabled, any failures will only pause the deployment and allow you to ignore the failure and continue with the deployment.

I hope this helps clarify the behaviour, let me know if you have any further questions.

Best regards,
Paul

(Rock26in) #4

Hi @paul.calvert,

Thanks for your quick response!

I’ve explored Guided Failures and it looks to be a very good feature in Octopus. But somehow I’m not able to fully achieve what I want. I have 5-6 child steps within the parent step, and if the 1st child step fails for some reason, then the ‘EXCLUDE MACHINE FROM DEPLOYMENTS’ guided failure option still executes all the remaining child steps of the parent step, whereas I expect the parent step to stop executing when the 1st child step fails.

Below is summary of what I’ve done so far and wish to achieve. Please suggest what could be the best possible solution:

I have created a project in Octopus that has 3 Steps as below:

Step-1: Health Check (octo in-built step)

Step-2 (Parent): Pre-checks, Install Patches and Perform Testing

  • Step 2.1: Check if enough free disk space is available on target server (powershell script)
  • Step 2.2: Transfer nuget package from artifactory to target server (octo in-built step)
  • Step 2.3: Check if nuget package is available at directory path on target server (powershell script)
  • Step 2.4: Install patch (powershell script)
  • Step 2.5: Post deployment testing (powershell script)

Step-3: Deployment Notification

Conditions:

  1. Step-2 will run only if Step-1 runs successfully.
  2. Within Step-2, each child step should run only if previous child step executes successfully.
  3. Step-3 is configured to always run.
  4. Step-1, Step-3 run on Octopus Master server whereas Step-2 runs on deployment targets.

Requirement:
During execution, Step-2 should run in parallel (maybe in slots of 5 or 10) on all the target servers (say 20) such that:

  1. Step should run on each Target server.
  2. If any child step (say Step 2.1) fails for any server, then Octopus should immediately fail Step-2 for that server only and not impact deployment of other servers.

There might be cases where Step 2.1 (Check if enough free disk space is available on target server) fails only on 1 server out of 20, so I don’t want the whole deployment to fail due to this reason because remaining 19 servers might have enough free space to proceed further.
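
The requirement above can be modelled with a short Python sketch: each target runs the child steps in order and stops at its own first failure, while other targets are unaffected and at most 5 run at a time. This describes the behaviour being asked for, not something Octopus is known to do out of the box; `deploy_isolated`, `run_child_steps`, `make_step` and the server names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple

# A child step is a (name, function) pair; the function returns True on success.
ChildStep = Tuple[str, Callable[[str], bool]]


def run_child_steps(target: str, steps: List[ChildStep]) -> Optional[str]:
    """Run child steps in order on one target; return the first failed step name."""
    for name, step in steps:
        if not step(target):
            return name  # stop here: later child steps do not run on this target
    return None


def deploy_isolated(targets: List[str], steps: List[ChildStep],
                    max_parallel: int = 5) -> Dict[str, Optional[str]]:
    """Run every target, at most `max_parallel` at a time, isolating failures."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        failures = pool.map(lambda t: run_child_steps(t, steps), targets)
        return dict(zip(targets, failures))


executed = []  # (target, step) pairs, recording which steps actually ran


def make_step(name: str, failing=()) -> Callable[[str], bool]:
    def step(target: str) -> bool:
        executed.append((target, name))
        return target not in failing
    return step


steps = [
    ("2.1 disk space check", make_step("2.1", failing={"server-07"})),
    ("2.4 install patch", make_step("2.4")),
]
targets = [f"server-{n:02d}" for n in range(1, 21)]
results = deploy_isolated(targets, steps)
failed = {t: s for t, s in results.items() if s is not None}
print(failed)  # {'server-07': '2.1 disk space check'}
```

Here only `server-07` is reported as failed, the patch step never runs on it, and the other 19 servers complete both steps.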

If above is not possible with Rolling deployments and instead I need to go with setting-up ‘Octopus.Action.MaxParallelism’ variable at the project level, then what kind of Octopus Server Configuration is required for it to efficiently handle parallel deployments at a time for up to say ‘100’ or ‘1000’ servers?

(Paul Calvert) #5

Hi @rock26in,

I have a few initial thoughts on this.

The primary purpose of grouping steps and rolling deployments is to handle situations where you don’t want all of your servers to go offline at the same time, for example, a load-balanced web server farm. The rolling deployment would allow you to run through a full deployment on a small batch of servers, bring them back online, then work on the next set.
If this isn’t a concern for you, then I would consider moving away from the parent-child grouping and rolling deployment and instead have all the 2.x steps as individual full steps.

You can look at increasing the max parallelism above the default 10 if you feel that you want the process to move quickly. It seems like most of your steps are being run on the deployment target rather than the Octopus Server, so running against an increased number of targets at a time shouldn’t have that much of an impact on your main Octopus Server.

Another thing you may want to consider is amending the default health check script to include the Step 2.1 check. With it being a simple disk space check, I can’t imagine it adding any overhead to the regular health check run times, and you could then configure your Health Check step to exclude any machines that fail automatically.
This would remove the possibility of a failed step due to disk space altogether.

Regards,
Paul

(Rock26in) #6

Hi @paul.calvert,

Thanks for your valuable inputs!

Firstly, to make things more clear, I’m using Octopus to automate patching on multiple target servers (could be 10, 100 or more).

Secondly, I have many more pre-checks to perform in Step-2, just like Step 2.1, which I did not include in the project details above. And all of those pre-checks are PS scripts which need to run on deployment targets. So clubbing all those pre-checks with Step-1 (Health check) might not be feasible as Health-check (connection test only) will run on Octopus Master Server whereas all other pre-checks on deployment targets.

Reason why I have grouped steps in Step-2 is to make sure that if Step-2 has started to execute/run for a particular target server, then it should execute fully till completion or till the point where any child step fails.

If I move away from the parent-child grouping and instead have all the 2.x steps as individual full steps, then while running the deployment in parallel for, say, 10 servers at a time, if any individual step fails for any server, the whole deployment is marked as failed and all subsequent steps do not execute for any target server. There might be cases where a pre-check fails only for 1 server and succeeds for the remaining 9 servers, but this will stop execution on all servers.

Talking about max parallelism, is there an upper limit for this? Say 1000, or 5000?

Considering that most of the steps need to execute on the deployment targets, and that running against an increased number of targets at a time shouldn’t have much of an impact on the main Octopus Server, does that mean I can go for a higher number like ‘100’? Is there any document or guide that I could refer to?

I went through the link https://octopus.com/docs/deployment-process/performance, which talks about certain constraints on the following but doesn’t give any concrete numbers:

  1. Consider How Many Targets You Deploy to in Parallel (suggests using Rolling deployment – but it has certain constraints for my requirement as discussed above)
  2. Consider How Many Targets Acquire Packages in Parallel

Thanks!

(Paul Calvert) #7

Hi @rock26in,

Whichever way you configure the steps, any failure will cause the deployment to cease unless guided failure is enabled. Once this is enabled you’d have the option to exclude the failing machine from the rest of the deployment and continue.

As every environment is different, we don’t provide any suggestions for max parallelism; it would come down to some trial and error until you find the best setting.

Regards,
Paul

(Rock26in) #8

Hi @paul.calvert,

Thank you so much for your help and advice on this. Much appreciated!

Thanks!
