Deploying to AZVMSS is more complicated than I expected. Can I get some pointers?

Octopus version: 2019.5.10

I have a project that was recently updated to deploy to an Azure VM Scale Set (AZVMSS). The project is set up so that whenever a new deployment target appears, it automatically deploys the last successful release to that server, with Slack and email notifications disabled since the trigger isn't a manual one. This worked fine until I tried to deploy a new release.

I’ve noticed that new releases now take an hour, when they used to take 10 minutes. From what I can see in the deployment log, Octopus decides to run a health check on the scale set servers, and those checks take much longer than they do on my on-prem managed instances and my regular Azure VMs. Since Octopus can't do both at once, the deployment waits for the health check to finish. That would be fine, except that the health checks sometimes take far longer than they should, and during the wait the AZVMSS may scale in and reduce the number of running machines. When the deployment then moves on to the next step, multiple machines take a long time to fail and ultimately drop out of the deployment (I’m set up to skip machines that can’t be found during a deployment). This becomes a never-ending cycle, and my most recent deployment for this project took an hour.

What can I do to optimize this and avoid the long wait times? I’ve tried disabling autoscaling, but Azure PowerShell doesn’t support the rule I created in the portal, so it errors out when I try to override it from the command line.
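One angle on the autoscale problem, sketched under assumptions: instead of re-expressing the portal rule on the command line, the Azure CLI command `az monitor autoscale update` accepts an `--enabled` flag that switches an existing autoscale setting on or off, which should leave its profiles and rules as they are. The Python below only builds that command. The resource group and autoscale setting names are hypothetical placeholders, and it's worth confirming the flag with `az monitor autoscale update --help` for your CLI version.

```python
import shlex


def autoscale_toggle_command(resource_group: str, setting_name: str, enabled: bool) -> list[str]:
    """Build an Azure CLI command that flips the 'enabled' flag of an existing autoscale setting.

    Assumes the autoscale setting created in the portal can be addressed by name;
    the rules inside it are not restated here, only the on/off switch.
    """
    return [
        "az", "monitor", "autoscale", "update",
        "--resource-group", resource_group,
        "--name", setting_name,
        "--enabled", "true" if enabled else "false",
    ]


if __name__ == "__main__":
    # Hypothetical names; replace with your resource group and autoscale setting.
    cmd = autoscale_toggle_command("my-rg", "my-vmss-autoscale", enabled=False)
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

The idea would be to disable autoscale just before a release deployment and re-enable it afterwards, so the scale set can't scale in while Octopus is waiting on health checks.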

Some other details:

I have a separate machine policy for these VMs. It uses the default health check that Octopus provides to verify the machines. The scale set policy automatically deletes any VM it can’t find after 30 minutes.
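To see which scale set instances Octopus is still holding on to while they wait out that 30-minute cleanup window, one option is to filter the machine list returned by the Octopus REST API (for example the `Items` array of `GET /api/machines`) by health status. This is a minimal sketch that assumes each machine resource carries `Name`, `HealthStatus` and `IsDisabled` fields; the sample data is hypothetical.

```python
def unavailable_targets(machines: list[dict]) -> list[str]:
    """Return names of enabled targets whose last health check marked them Unavailable.

    Assumes each dict mirrors an Octopus machine resource with 'Name',
    'HealthStatus' and 'IsDisabled' keys.
    """
    return [
        m["Name"]
        for m in machines
        if m.get("HealthStatus") == "Unavailable" and not m.get("IsDisabled", False)
    ]


if __name__ == "__main__":
    # Hypothetical sample shaped like the 'Items' array of GET /api/machines.
    sample = [
        {"Name": "vmss-000001", "HealthStatus": "Healthy", "IsDisabled": False},
        {"Name": "vmss-000002", "HealthStatus": "Unavailable", "IsDisabled": False},
        {"Name": "vmss-000003", "HealthStatus": "Unavailable", "IsDisabled": True},
    ]
    print(unavailable_targets(sample))  # ['vmss-000002']
```

Disabling or removing those targets before starting a release deployment is one possible way to keep the deployment from waiting on machines the scale set has already removed.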

Hi @Zoren_Manteuffel,

Thanks for getting in touch! This sounds painful. For us to help you we need some more information from you. Once you send these details through, someone can take the next steps to figure out a way forward with you.

Gather the following things and send them to

  1. The raw task log for a deployment to this environment that performed normally (10 minutes).
  2. The raw task log for a deployment which shows the bad behaviour (1 hour). More examples may help isolate problems, so send through as many as you feel may help.
  3. The raw task log for any other tasks which are running around the same time which may be interfering, such as Health Checks.
  4. A screenshot of your deployment process.
  5. A screenshot of your trigger configuration.
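If it's easier to collect several raw logs by script than through the UI, here is a rough Python sketch against the Octopus REST API. It assumes the raw log for a task is served from `/api/tasks/{task-id}/raw` and that an API key is accepted in the `X-Octopus-ApiKey` header; the server URL, API key and task ID are placeholders.

```python
import urllib.request


def raw_task_log_request(server_url: str, api_key: str, task_id: str) -> urllib.request.Request:
    """Build a GET request for the raw log of a single Octopus server task."""
    url = f"{server_url.rstrip('/')}/api/tasks/{task_id}/raw"
    # Assumed: Octopus reads the API key from the X-Octopus-ApiKey request header.
    return urllib.request.Request(url, headers={"X-Octopus-ApiKey": api_key})


def save_raw_task_log(server_url: str, api_key: str, task_id: str, path: str) -> None:
    """Download one raw task log and write it to a local file."""
    request = raw_task_log_request(server_url, api_key, task_id)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

For example, `save_raw_task_log("https://octopus.example.com", "API-XXXX", "ServerTasks-12345", "slow-deploy.log")` would save one log, and calling it once per task ID covers the normal, slow and health-check tasks in the list above.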

Hope that helps!

Thanks Mike, I’ll send them an email and see what happens.
