I’m converting our self-hosted (relatively static) infrastructure to AWS, and in the process thinking about the future and the more dynamic nature of AWS environments. I am becoming aware of Auto Scaling Groups, “Scheduled Retirement” of instances, and other AWS concepts that require our software deployment to be more robust and even more hands-off.
Currently (in our data centre) we use Octopus to deploy our software across a number of pre-provisioned VMs behind a load balancer, using hide/expose to get as close as possible to blue-green deployment. We release our software many times per week. Once a quarter, our ops team replaces all the VMs in the DC: adding new VMs, deploying the current green releases, then hiding and shutting down the legacy VMs. We use DSC and custom Chocolatey packages to register new VMs with Octopus and to help with automation.
In converting to AWS, I initially want to replicate this pattern: a static number of servers that are replaced at planned times. I have this working using Auto Scaling Groups and Launch Templates with UserData + choco packages. This works well: I can scale up, and VMs are registered in Octopus in the correct environments with the correct roles. However, I want to move away from this pattern and leverage the more dynamic abilities of AWS.
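For context, the registration part of my UserData looks roughly like this (a simplified sketch; the server URL, environment, and role are placeholders, and the Tentacle itself is installed beforehand by the choco package):

```powershell
<powershell>
# Sketch of UserData registration; Tentacle is already installed by the choco package.
$serverUrl = "https://octopus.example.com"   # placeholder Octopus Server URL
$apiKey    = "API-XXXXXXXX"                  # in reality, pulled from a secure store (elided)

# Use the instance ID as the machine name so the target is traceable back to EC2.
# (IMDSv1 shown for brevity.)
$instanceId = Invoke-RestMethod -Uri "http://169.254.169.254/latest/meta-data/instance-id"

& "C:\Program Files\Octopus Deploy\Tentacle\Tentacle.exe" register-with `
    --instance "Tentacle" `
    --server $serverUrl `
    --apiKey $apiKey `
    --name $instanceId `
    --environment "Production" `
    --role "web-server" `
    --comms-style "TentaclePassive" `
    --console
</powershell>
```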
The main issue I’m struggling with is AWS dynamic scale-out combined with Manual Intervention steps. I would like the current green version to be deployed and come online without intervention whenever a new target lights up. I have read and broadly understand Automatic Deployment Triggers and Unattended Release Behaviour, but given that our teams release many times a week, I want to minimize the overhead: remembering to run octo create-autodeployoverride --version by hand after every release seems like something that will be forgotten. I think I am looking for a pattern that allows a step to be skipped or not, depending on whether a human or an Automatic Deployment Trigger is doing the releasing.
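Two ideas I am considering, neither of which I have verified: first, if trigger-initiated deployments populate a system variable such as Octopus.Deployment.Trigger.Name (which I believe they do, but please correct me), a variable run condition on the Manual Intervention step could skip it for automatic deployments:

```
#{unless Octopus.Deployment.Trigger.Name}True#{/unless}
```

Second, the override could be set by the deployment itself, as a final “Run a Script” step, so that nobody has to remember it. A rough sketch, assuming octo.exe is available on the server/worker, the usual system variables, and a hypothetical sensitive project variable OctopusApiKey:

```powershell
# Final step of the deployment process: pin the auto-deploy override
# to the version just deployed, so newly created targets get this release.
& octo create-autodeployoverride `
    --server "#{Octopus.Web.BaseUrl}" `
    --apiKey "#{OctopusApiKey}" `
    --project "#{Octopus.Project.Name}" `
    --environment "#{Octopus.Environment.Name}" `
    --version "#{Octopus.Release.Number}"
```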
Secondly, I am concerned about scale-in. I am aware that Machine Policies can remove targets that are no longer available, but the fact that a Tentacle has been de-registered does not mean the VM has been removed from the AWS load balancer / scaling group. It could be just a comms problem, with the VM still there running an older version of the software and potentially clashing with later deployments. In this case, I think I want the AWS infrastructure to explicitly de-register the Tentacle when the instance is terminated. I realize this would be an AWS feature, so I wonder if you are aware of anything in this space (i.e. a way to run a script on instance stop/terminate)?
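The closest AWS feature I have found is Auto Scaling lifecycle hooks: an autoscaling:EC2_INSTANCE_TERMINATING hook can pause termination and notify something (via EventBridge/SNS) that cleans up before letting the instance go. A rough, untested sketch of the clean-up script (the ASG name and hook name are placeholders, it assumes the AWS Tools for PowerShell are installed, and it assumes Tentacles are registered with the instance ID as the machine name, as above):

```powershell
param(
    [Parameter(Mandatory)] [string] $InstanceId,
    [string] $OctopusUrl = "https://octopus.example.com",   # placeholder
    [string] $ApiKey     = $env:OCTOPUS_API_KEY
)

$headers = @{ "X-Octopus-ApiKey" = $ApiKey }

# Find the machine record whose name matches the terminating instance
$machines = Invoke-RestMethod -Uri "$OctopusUrl/api/machines/all" -Headers $headers
$machine  = $machines | Where-Object { $_.Name -eq $InstanceId }

if ($machine) {
    # Delete the machine so later deployments can never target it
    Invoke-RestMethod -Method Delete -Uri "$OctopusUrl/api/machines/$($machine.Id)" -Headers $headers
}

# Signal the lifecycle hook so the ASG can finish terminating the instance
Complete-ASLifecycleAction -AutoScalingGroupName "my-asg" `
    -LifecycleHookName "octopus-deregister" `
    -LifecycleActionResult "CONTINUE" `
    -InstanceId $InstanceId
```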
Any pointers to best practice would be appreciated…