Query timeouts and slow server start time

We’ve been using our Octopus Deploy server for over 2 years now and are currently at version 3.13.8. In the last couple of weeks we’ve noticed extremely slow response times via the web UI, with lots of query timeout error messages. During our last 2 server reboots, Octopus Server has taken 30-40 minutes to come up and start accepting requests (via the web UI). Nothing seems out of the ordinary, and our projects have remained fairly static over the last 3 months or so (just minor process tweaks here and there). The culprit query timeout occurs when trying to view the overview tab of just about any project:

SELECT * FROM dbo.[MultiTenancyDashboard] WHERE ([ProjectId] = @projectId) ORDER BY [Id]

What’s interesting is that we only have one project that uses the multi-tenant feature, but every project overview fails with the above query timeout.

Other observations:

  • Server startup time ~30-40 minutes
  • CPU is almost always pegged at 100%, with Octopus Server by far the heaviest user, even though 0 tasks are running
  • Health Checks will run for over 24 hours even though everything comes back healthy

Any ideas what might be wrong or where to start troubleshooting? It’s almost becoming unusable in its current state.

Thanks!

Hi Brian,

Those are indeed disconcerting numbers. Let’s see if we can determine the cause.

In case you haven’t come across it, we have a documentation page on performance. It is worth reading this to see if anything is applicable.

To give us an idea of the scale of your instance, could you supply rough counts of:

  • Projects
  • Tenants
  • Machines

What are your hardware specs for the machine Octopus is running on?

Would you also be able to attach an Octopus server log? Preferably one which covers the time during a server start.

I would also recommend inspecting your Retention Policies. Do you have these configured?

Finally, if you are willing to capture a performance trace during a time when your server is experiencing high CPU usage, this would be a big help.

The information above will hopefully provide some clues to the problem.

You can upload files (e.g. server logs or performance trace) to this secure location, but please also respond to this thread, as I don’t get automatically notified when files are uploaded.

Regards,
Michael

Hi Michael –

Here are the answers to your questions:

Projects = ~50 (just over the limit so the dashboard has the group/name filters)
Tenants = 8
Machines = ~150

Server size = 6 x 2.3GHz w/ 64GB RAM

I did review retention policies and updated all our lifecycles to include a phase for PROD so I could drop release retention on lower environments. This appears to have helped with the query timeouts and server startup time. We still experience high CPU utilization even when no tasks are queued.

Attached is the server log during last startup.

I will attempt to get a dotTrace ASAP.

Thanks,

Brian

OctopusServer.zip (9 KB)

Brian,

Thank you for the logs.

Your logs suggest you are on version 3.7.18.

Octopus Deploy: Server version 3.7.18 (3.7.18+Branch.master.Sha.6146586b187ed08164b3db8f3becba4f364ad767) instance OctopusServer
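
If it helps to confirm this on your side, here is a minimal Python sketch that pulls the version out of a server log. It assumes the startup line always has the same shape as the one quoted above; the function name `find_server_version` and the regular expression are illustrative and not part of any Octopus tooling.

```python
import re

# Assumption: the startup line looks like the one quoted in this thread,
# i.e. "Octopus Deploy: Server version <major>.<minor>.<patch> (...)".
VERSION_LINE = re.compile(r"Octopus Deploy: Server version (\d+)\.(\d+)\.(\d+)")


def find_server_version(log_text):
    """Return the first (major, minor, patch) tuple found in the log, or None."""
    match = VERSION_LINE.search(log_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


sample = (
    "Octopus Deploy: Server version 3.7.18 "
    "(3.7.18+Branch.master.Sha.6146586b187ed08164b3db8f3becba4f364ad767) "
    "instance OctopusServer"
)
print(find_server_version(sample))
```

Returning a tuple of integers rather than a string means versions compare correctly, so 3.7.18 sorts before 3.13.8 where a plain string comparison would get it wrong.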

Assuming this is correct, I would definitely recommend upgrading. We have made a number of performance improvements since that release.
In particular, we introduced better caching for the dashboard queries, which, judging from your logs, looks likely to help in this case.

I would love to know the results if you can schedule an upgrade.