Chef config deployments to Windows VMs are slow

tcochran · 22 September 2022 14:55

We use Chef for our configuration management, and the cookbooks and recipes get deployed to VMs using Octopus and the Octopus Chef step templates. Currently, our converge Chef recipe step template, which is in the deployment process for certain Octopus projects that contain Windows deployment targets, takes a long time to run: about 30 minutes or more per VM. Chef has historically been notorious for being slow on Windows machines, since it was more optimized for Linux machines.

We tried updating the Chef Infra Client on some of the Windows VMs to the latest version, since 17.x looked to improve Chef's speed on Windows, but we only saw a very minor improvement. So we're wondering if there are certain things we can look at between Octopus and our Windows VMs to improve the Chef deploy speed?

cory.reid · 22 September 2022 17:26

Hey @tcochran , thanks for reaching out!

I’m familiar with Chef, but want to make sure I’m correctly understanding your context here:

You have Chef recipes/cookbooks that are packaged in Octopus
You’re deploying these to your Windows targets, and using a step template in Octopus to converge the resources and ensure they meet the intended declared configuration

Beyond that, you’re seeing some slowness in the way these are being managed when deploying through Octopus. Is all of this correct? Want to make sure I understand your current state so I’m asking the right questions/providing the right information and resources!

tcochran · 22 September 2022 17:48

Thanks for the response,

Yep that’s all correct.

cory.reid · 23 September 2022 12:21

Perfect, thanks for confirming!

I have a few additional questions -

Have you tried converging your resources on the server locally (without running via Octopus/on the Tentacle)? Interested in how long that takes compared to the Octopus execution.
In your task log, about how much time is being spent on the Chef execution itself? 30+ minutes is a hefty chunk of time, so I’m interested in how much of that is being spent on Octopus tasks (transferring the package, extracting, setting up Calamari/execution) compared to the Chef-specific tasks.
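
For the local comparison, a minimal timing sketch in Python could look like the following. It assumes `chef-client` is installed and on the PATH of the target VM, and that running it with no arguments converges the node's normal run list. The `time_command` and `main` helpers are hypothetical names used for illustration; they are not part of Chef or Octopus.

```python
import shutil
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd)  # output goes straight to the console
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


def main():
    # Assumption: chef-client is on PATH. shutil.which resolves the full path,
    # including the .bat wrapper that Windows installs typically use.
    chef = shutil.which("chef-client")
    if chef is None:
        print("chef-client was not found on PATH")
        return
    code, seconds = time_command([chef])
    print(f"chef-client exited with {code} after {seconds / 60:.1f} minutes")


if __name__ == "__main__":
    main()
```

Running this from an elevated prompt on the VM gives a wall-clock figure that can be set against the duration of the Chef step in the Octopus task log.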

Happy to hear any additional information or context around the above so we can hopefully dig into where the slowness is occurring!

tcochran · 23 September 2022 13:53

Have not tried on the server locally but will do that and get back to you. Will check task log to see what is taking the longest amount of time as well.

tcochran · 23 September 2022 15:49

So I tested locally and it did run quicker: instead of the 30+ mins it takes in Octopus (usually between 30 and 35 mins), it went down to about 24 mins locally. I also checked the task log for the Chef-specific task and didn’t see anything out of the ordinary, but I’ll attach the output below. The log below actually shows one of the quicker Chef deploys from Octopus. The other tasks don’t take long (acquiring the packages only takes a minute, for example), so I’m thinking it’s something with the Chef step.

Chef log.txt (90.5 KB)
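
As a rough worked comparison of those timings, the sketch below computes how much of each Octopus run is not accounted for by the 24-minute local converge. It uses only the figures quoted in this post; the variable and function names are illustrative, and a single local run is a small sample, so treat the percentages as approximate.

```python
# Timings quoted in this post, in minutes.
octopus_runs = (30, 35)  # typical range when run through Octopus
local_run = 24           # single local converge


def overhead(octopus_minutes, local_minutes):
    """Return (extra_minutes, share_of_octopus_run) for one Octopus timing."""
    extra = octopus_minutes - local_minutes
    return extra, extra / octopus_minutes


for minutes in octopus_runs:
    extra, share = overhead(minutes, local_run)
    print(f"{minutes} min in Octopus: {extra} min extra ({share:.0%} of the run)")
```

That works out to roughly 6 to 11 extra minutes per VM, or about 20% to 31% of the Octopus run, with the remaining 24 minutes spent in the Chef converge itself.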

cory.reid · 23 September 2022 16:31

Thanks for the added context, that matches up with what I generally expected.

The good news is that Octopus isn’t adding a tremendous amount of slowness; the run time is largely down to the way the Chef recipes are converging. Unfortunately, that doesn’t help when it’s eating up one slot of your task capacity for 30 minutes while Chef is running. Have you considered using a scheduled project trigger (or scheduled runbook trigger) to run these at off hours? That would let them run with the same auditability through Octopus, but take up less time and capacity during the active deployment day.

tcochran · 23 September 2022 18:04

Have not looked at scheduled project triggers yet. Will definitely look into it though. Thanks for your help.