Best practice for cloud dynamic scaling (AWS/ScaleGroup)

peter_m_mcevoy · 6 August 2019 10:32

Hi there,
I’m converting our self-hosted (relatively static) infrastructure to AWS and in the process thinking about the future and the more dynamic nature of AWS environments. In the process, I am becoming aware of Scale Groups and “Scheduled Retirement” of instances and other AWS concepts that require our software deployment to be more robust and even more hands-off.

Currently (in our data centre) we use Octopus to deploy our software over a number of pre-provisioned VMs that sit behind a load balancer. We use hide/expose to get as close as possible to blue-green deployment. We release our software many times per week. Once a quarter, our ops team replaces all the VMs in the DC: adding new VMs, deploying the current green releases, then hiding and shutting down the legacy VMs. We use DSC and custom Chocolatey packages to register new VMs with Octopus and help with automation.

In converting to AWS, I initially want to replicate this pattern - a static number of servers that are replaced at planned times. I have this working using Scale Groups, Launch Templates with UserData + choco packages. This works well: I can scale up, and VMs are registered in Octopus in the correct environments with the correct roles. However I want to move away from this pattern and leverage the more dynamic abilities of AWS.

The main issue I’m struggling with is AWS dynamic scale-out and Manual Intervention steps. When a new target lights up, I would like the current green version to be deployed and come online without intervention. I have read and broadly understand Automatic Deployment Triggers and Unattended Release Behaviour, but our teams release many times a week and I want to minimize the overhead: remembering to run octo create-autodeployoverride --version every time they release a new version seems like something that will be forgotten. I think I am looking for a pattern that allows a step to be skipped or not, depending on whether a human or an Automatic Deployment Trigger is releasing.

Secondly, I am concerned about scale-in. I am aware that Machine Policies can remove targets that are no longer available, but just because a tentacle is de-registered does not mean that the VM has been removed from the AWS load-balancer/scale group. It could be just a comms problem, with the VM still there running an older version of the software and potentially clashing with later deployments. In this case, I think I want the AWS infrastructure to explicitly de-register the tentacle when the instance is terminated. I realize this would be an AWS feature, so I wonder if you are aware of anything in this space (i.e. a way to run a script on instance stop/terminate)?

Any pointers to best-practice would be appreciated…

Pete

Bob_Walker · 7 August 2019 23:01

Hi Pete,

Thank you for reaching out. I did a recent talk at KCDC on scaling out production using Infrastructure as Code for Azure. The same core concepts will apply for you. There was no way anyone would remember everything so I wrote a lot of extensive doco.

I think you would be specifically interested in this section: https://github.com/OctopusSamples/KCDCIaC/blob/master/ScalingOutProduction.md#skip-specific-deployment-steps-during-trigger-deploy

I also have a practice recording of the presentation showing that in action: https://youtu.be/I8k2ox9QtXw

Onto your next question: what about removing targets when the auto scaling group removes them? Well, you can call a Lambda in that case. That Lambda can then invoke the Octopus API to either deregister a machine, or deploy a project that removes the machine and takes it out of the load balancer. We did an Ask Octopus video to help answer that: https://www.youtube.com/watch?v=5zYkkkFLwzY

AWS Doco: https://aws.amazon.com/blogs/compute/using-aws-lambda-with-auto-scaling-lifecycle-hooks/

Example Script the Lambda Function Could call: https://github.com/OctopusDeploy/OctopusDeploy-Api/blob/master/REST/PowerShell/Workers/DeleteWorkerDuringDeployment.ps1. It’s a PowerShell script, so it probably won’t work for AWS Lambda, but hopefully gives you an idea on what you can accomplish with it.
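
For a Python Lambda runtime, the deregistration idea could be sketched roughly as below. This is a minimal sketch, not a tested production function, and it rests on several assumptions that are not from this thread: the server URL and API key are placeholders, the event is assumed to have the EventBridge lifecycle-hook shape (with the instance ID under detail.EC2InstanceId), each Tentacle is assumed to be registered with a name that contains its EC2 instance ID, and the machines are assumed to live in the default space so the /api/machines endpoints apply.

```python
import json
import urllib.request

# Placeholders (assumptions): replace with your own server URL and API key.
OCTOPUS_URL = "https://octopus.example.com"
API_KEY = "API-XXXXXXXX"


def instance_id_from_event(event):
    """Read the EC2 instance ID from an Auto Scaling lifecycle-hook event."""
    return event["detail"]["EC2InstanceId"]


def find_machine_id(machines, instance_id):
    """Return the Id of the machine whose Name contains the instance ID, or None.

    Assumes the Tentacle was registered with the instance ID in its name.
    """
    for machine in machines:
        if instance_id in machine.get("Name", ""):
            return machine["Id"]
    return None


def build_delete_request(machine_id):
    """Build (but do not send) the DELETE request that deregisters a machine."""
    return urllib.request.Request(
        f"{OCTOPUS_URL}/api/machines/{machine_id}",
        method="DELETE",
        headers={"X-Octopus-ApiKey": API_KEY},
    )


def fetch_all_machines():
    """Fetch every registered machine from the Octopus REST API."""
    request = urllib.request.Request(
        f"{OCTOPUS_URL}/api/machines/all",
        headers={"X-Octopus-ApiKey": API_KEY},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def handler(event, context, fetch=fetch_all_machines, send=urllib.request.urlopen):
    """Lambda entry point: deregister the machine for a terminating instance.

    `fetch` and `send` are injectable so the logic can be exercised offline.
    """
    instance_id = instance_id_from_event(event)
    machine_id = find_machine_id(fetch(), instance_id)
    if machine_id is None:
        return {"deregistered": None}
    send(build_delete_request(machine_id))
    return {"deregistered": machine_id}
```

The sketch leaves out completing the lifecycle action afterwards (for example with the Auto Scaling CompleteLifecycleAction API), error handling, and secure storage of the API key, all of which a real function would need.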

I hope that helps!

peter_m_mcevoy · 8 August 2019 15:47

Hi Bob,
Thanks for the detailed reply. I had a feeling that the Run Condition based on a trigger name variable would be a solution - but then I second-guessed myself, thinking that the result of the run condition would be snap-shotted and stored with the deployment. Good to hear that this is the correct approach - I particularly like the “unless” semantic so will explore. I’ll also explore if more complicated expressions are possible.

I hadn’t considered the lambda for deregister so this also is very useful.

Thanks again for a considered response - feeling a lot more confident now!

Bob_Walker · 8 August 2019 21:42

Hi Peter,

You are welcome. In all fairness, I was able to provide an answer this detailed because I had just given a presentation on it.

Please reach out if you have any other questions!

Best Regards,
Bob

peter_m_mcevoy · 16 August 2019 10:30

Hi Bob,
Apologies for re-opening this item, but I have a follow-up question that only came up once I saw auto-deploy in action, and it seems appropriate to keep the question in context.

I have enabled the auto-deploy triggers and added the unless run condition to avoid Manual Intervention on Scale-Out. However I ran into an issue where two of our projects have an implicit deployment ordering dependency and I’m trying to ensure that one is installed before the other: Project A depends on a specific known version of Project B.

I discovered the “Deploy a Release” step and after reading, introduced Project C, which will “Deploy a release from” B followed by A.

Because I want Project C to control deployment ordering, I enabled the auto-deploy trigger on C, and removed the trigger from A and B - however the manual intervention steps in A & B still have the “unless” logic.

I created a release of C, which takes a specific version of B and the (current) latest of A. I deployed that release to my AWS environment and it completed all the way through (as it was kicked off by me, I had to manually intervene)

I then scaled out the environment and the trigger on C fired. Unfortunately the Manual Intervention steps were not ignored on A and B… Perhaps there is an alternate variable I should be using?

(As an aside, I also note that the “Task Progress - Summary” screen for project C shows the two separate deploys of the release - the initial followed by the scale out. However this is not the case on the task summary pages of A and B - it’s two separate deploys of the chosen releases).

Pete

Bob_Walker · 16 August 2019 14:41

Hi Peter,

No worries. So I have a similar “traffic cop” project.

With the deploy a release step you can send down a variable to the child project.

So you could send a variable down to the child projects, and unless that variable is set, a manual intervention is required in the child project. In the parent project you can have the run condition look at the trigger. That way you can use the traffic cop project to handle all your deployments to prod.
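
To make the flow concrete, here is a small Python model of that decision logic. It is only an illustration of the idea, not Octopus code: the variable name TrafficCop.TriggerName is hypothetical, and is_truthy is an approximation of how a run condition's "unless" treats a variable (undefined, empty, "False" and "0" are assumed to count as not set).

```python
def is_truthy(value):
    """Approximate run-condition truthiness: None, empty, "False" and "0" are falsy."""
    return value is not None and value.strip() not in ("", "False", "0")


def parent_variables_for_child(parent_vars):
    """Variables the traffic cop passes down through its Deploy a Release step.

    "TrafficCop.TriggerName" is a hypothetical name chosen for this sketch.
    It carries the parent's trigger name, which is empty for a manual deploy.
    """
    trigger_name = parent_vars.get("Octopus.Deployment.Trigger.Name", "")
    return {"TrafficCop.TriggerName": trigger_name}


def child_needs_manual_intervention(child_vars):
    """The child's Manual Intervention step runs unless the parent's variable is set."""
    return not is_truthy(child_vars.get("TrafficCop.TriggerName"))


# Scale-out: the trigger on the traffic cop fired, so the child skips the step.
triggered = parent_variables_for_child({"Octopus.Deployment.Trigger.Name": "Scale-out trigger"})
print(child_needs_manual_intervention(triggered))

# A person deployed the traffic cop, so no trigger name is passed down.
manual = parent_variables_for_child({})
print(child_needs_manual_intervention(manual))
```

The point of the design is that the child projects never look at their own trigger, which is empty when the parent deploys them. They only look at what the parent sends down.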

BTW, for my traffic cop project I typically have them use a different lifecycle. For the child projects they have the standard lifecycle dev->test->staging->prod. But the traffic cop project only has staging->prod. Or test->staging->prod, depending on your scenarios.

I hope that helps!