Octopus version: 2019.5.10
I have a project that was recently updated to deploy to an Azure VM Scale Set (AZVMSS). The project is set up so that whenever a new deployment target appears, it automatically deploys the last successful release to the new server, with Slack and email notifications disabled since the trigger isn't a manual one. This has been working fine until I tried to deploy a new release.
I've noticed that new releases now take an hour, when they used to take 10 minutes. From what I can see in the deploy log, the reason is that Octopus decides to run a health check on the servers, and those checks take much longer than they do for my managed instances on-prem and my regular VMs in Azure. Since Octopus can't do two things at once, the deployment waits for the health check to finish. That would be fine, except the health checks sometimes take far longer than they should, and during the wait the AZVMSS may decide to scale in the number of running machines. When the deployment moves to the next step, those removed machines take a long time to fail and eventually drop themselves out of the deployment (I'm set up to ignore machines in a deployment if they can't be found). This becomes a never-ending cycle, and my most recent deployment for this project now takes an hour.
What are some things I can do to optimize this and avoid the long wait times? I've tried disabling autoscaling, but Azure PowerShell doesn't support the rule I created in the portal, so it errors out when I try to do a command-line override.
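For reference, the kind of command-line override I was attempting looks roughly like the sketch below. The resource group and autoscale setting names are placeholders, and the exact parameter names may differ depending on the Az.Monitor module version:

```powershell
# Placeholder names - swap in the real resource group and autoscale setting
$rg      = "my-resource-group"
$setting = Get-AzAutoscaleSetting -ResourceGroupName $rg -Name "vmss-autoscale-setting"

# Re-submit the existing setting with autoscale disabled so the scale set
# can't scale in mid-deployment (profiles and rules are kept, just switched off)
Add-AzAutoscaleSetting -InputObject $setting -DisableSetting -ResourceGroupName $rg
```

The idea was simply to switch autoscale off for the duration of the deployment and re-enable it afterwards.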
Some other details:
I have a different machine policy for these specific VMs. It uses the default health check the system provides to verify the machines, and the scale set policy will auto-delete any VM it can't find after 30 minutes.
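In case the exact policy settings matter, this is roughly how I'd dump them over the Octopus REST API to compare against the default policy; the server URL and API key below are placeholders:

```powershell
# Placeholders - point these at the real Octopus server and a valid API key
$octopusUrl = "https://my-octopus-server"
$headers    = @{ "X-Octopus-ApiKey" = "API-XXXXXXXXXXXXXXXX" }

# Pull every machine policy so the health check and auto-delete settings
# on the scale set policy can be compared against the default policy
$policies = Invoke-RestMethod -Uri "$octopusUrl/api/machinepolicies/all" -Headers $headers
ConvertTo-Json -InputObject $policies -Depth 10
```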