Adding HA after the fact can be very frustrating when having to work with firewalls. From experience, it can take weeks or even months for firewall changes to be approved.
This is something we want to solve for Octopus Cloud. The solution proposed here is a workaround until that is solved.
In a nutshell, you will configure an HA cluster with three (or more) nodes:
- Node #1: your existing instance, the one customers are connecting to with their polling tentacles. This will continue to orchestrate deployments as it always has.
- Nodes #2 - N: new nodes with the task cap set to zero. These will handle all UI and API requests from your users.
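As a rough sketch of that layout, the snippet below models the cluster as a list of nodes and builds the command line that would set each node's task cap. The node labels, the `plan_cluster` and `node_command` helpers, and the `OctopusServer` instance name are assumptions made for illustration. The `node` command with `--instance` and `--taskCap` is how the Octopus Server command line is understood to work here, so check it against the documentation for your version before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str      # hypothetical label, not an Octopus identifier
    task_cap: int  # number of tasks the node may run concurrently


def plan_cluster(existing_task_cap: int, new_node_count: int) -> list[Node]:
    """Node #1 keeps its task cap; every new node gets a task cap of zero."""
    if new_node_count < 2:
        raise ValueError("the layout described above needs at least two new nodes")
    nodes = [Node("node-1", existing_task_cap)]
    nodes += [Node(f"node-{i}", 0) for i in range(2, new_node_count + 2)]
    return nodes


def node_command(node: Node, instance: str = "OctopusServer") -> list[str]:
    """Build the argv that sets the task cap; it must be run on that node itself."""
    if node.task_cap < 0:
        raise ValueError("task cap cannot be negative")
    return [
        "Octopus.Server",
        "node",
        f"--instance={instance}",  # assumed instance name
        f"--taskCap={node.task_cap}",
    ]


if __name__ == "__main__":
    for node in plan_cluster(existing_task_cap=20, new_node_count=2):
        print(node.name, " ".join(node_command(node)))
```

The code only builds the commands and does not execute anything, so it is safe to run anywhere while you review the plan.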
The benefits this configuration provides are:
- UI related work is offloaded onto other nodes, freeing up resources for the instance processing tasks.
- You have hot backups ready to go in the event node #1 were to go down.
- Adding new nodes to the cluster is trivial at this point.
In the event Octopus Node #1 goes down (and doesn’t come back up), you could switch the load balancers over so that node #2 takes its place.
There are some caveats to this:
- The new nodes will need to be started with the task cap set to 0. This can be accomplished via the command line; see the "Node" page in the Octopus Deploy documentation.
- You’ll need some automation in place to switch over to node #2 when #1 goes down.
- When node #1 comes back online you’ll need to determine which load balancer to add it back to.
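The switch-over automation mentioned above could start from a decision rule like the one sketched here: fail over only after node #1 has missed several health checks in a row, so a single blip does not trigger a switch. The function name, the node labels, and the threshold of three are all hypothetical. How you collect the health results and how you repoint the load balancer depend on your environment and are left out.

```python
def choose_polling_target(
    node1_checks: list[bool],
    failure_threshold: int = 3,
    primary: str = "node-1",
    standby: str = "node-2",
) -> str:
    """Decide which node the polling tentacles should be routed to.

    node1_checks holds health-check results for node #1, oldest first,
    where True means the check passed.
    """
    if failure_threshold < 1:
        raise ValueError("failure_threshold must be at least 1")
    recent = node1_checks[-failure_threshold:]
    # Fail over only when we have enough samples and every one of them failed.
    if len(recent) == failure_threshold and not any(recent):
        return standby
    return primary


if __name__ == "__main__":
    history = [True, True, False, False, False]
    print(choose_polling_target(history))
```

Keep in mind that a node with a task cap of 0 does not run tasks, so the standby would presumably also need its task cap raised before it can take over deployments.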
The question then becomes: how big should the task cap be for an instance, and how many polling tentacles can connect to a single VM? Here are some considerations:
- If you are in the cloud, smaller instances (1 CPU/2 GB of RAM) have fewer network IOPS. That limits the number of polling tentacles that can be connected without suffering timeouts.
- In our experience with Octopus Cloud v1, each customer got their own EC2 VM with 2 CPUs/4 GB of RAM, which could handle 1500-2000 polling tentacles before hitting resource contention.
- There is a point of diminishing returns: once you go over 8 CPUs/12 GB of RAM, the amount of network IOPS allocated hits a cap. Our largest customer has EC2 instances with 8 CPUs/10 GB of RAM for 5000+ polling tentacles.
- We don’t recommend going over 40 tasks for the task cap. Ideally 20-25 is the “sweet spot” for the vast majority of people.
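The task cap guidance above can be turned into a small sanity check for a planned configuration. This is only a sketch: the function name and the returned labels are made up for illustration, and the thresholds are the rules of thumb from the list (0 for UI-only nodes, 20-25 as the sweet spot, 40 as the recommended ceiling), not hard limits enforced by Octopus.

```python
def assess_task_cap(task_cap: int) -> str:
    """Classify a planned task cap against the rules of thumb above."""
    if task_cap < 0:
        raise ValueError("task cap cannot be negative")
    if task_cap == 0:
        return "ui-only"  # node serves UI/API requests and runs no tasks
    if task_cap > 40:
        return "above recommended maximum"
    if 20 <= task_cap <= 25:
        return "sweet spot"
    return "within recommended range"


if __name__ == "__main__":
    for cap in (0, 10, 22, 40, 60):
        print(cap, assess_task_cap(cap))
```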