Server Memory Issue

Octopus 3.8.6

Recently we have started noticing Octopus slowing down, then hanging until the service was restarted manually. It seems to coincide with Applying Retention Policies, specifically the ‘Apply built-in deployment manifest retention policy’ step. During this step Octopus will take up all available memory, currently at over 10GB commit.

Last few attempts have timed out on this step.

Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
The wait operation timed out

Hi Jeff,

I’m sorry to hear that applying retention policies is causing performance issues for you.

To assist our investigations, would you be able to supply:

  • The task log from one of the “Apply retention policies” tasks (You can find these under the “Tasks” top-level menu).
  • The Octopus Server logs for the corresponding time period.

This will help us to determine the problem.

Regards,
Michael

Thanks for the reply. Here is some more information from debugging today. We had previously been running the server with 4GB of memory; after bumping it to 8GB, the task still times out, but the service usually recovers afterward.

Analyzing a memory snapshot taken after the Deployment Manifest Retention stage begins shows where the memory goes. The server is loading a list of variable sets into memory, and the load doesn’t complete before it has exhausted all available memory and the task times out. The last snapshot I took shows that single list retaining 7.18 GB of memory.

Looking at the DeploymentManifestCleaner I see that it first queries all deployments (our installation currently has 11544) and then loads all related variable sets before checking the dashboard for candidate variable sets to delete. From the last snapshot there were only ~3100 variable sets loaded when the machine ran out of memory and the task timed out.

We could give the machine more memory so the task can complete, and perhaps our variable sets are large (script templates are using quite a lot of space), but it looks like this task could easily be updated to run through the deployments in batches by changing the code to use a paginated query.
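
A minimal sketch of the batching idea, written in Python rather than Octopus's own code. Every name here (paginate, deletable_variable_set_ids, fetch_deployments, load_variable_sets, ids_on_dashboard) is hypothetical and only illustrates the approach; it is not how DeploymentManifestCleaner is actually written. It assumes the deployment query can take skip/take style paging arguments.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Set


def paginate(
    fetch_page: Callable[[int, int], Sequence[int]], page_size: int
) -> Iterator[Sequence[int]]:
    """Yield successive pages from a skip/take query until a page comes back empty."""
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        if not page:
            return
        yield page
        skip += len(page)


def deletable_variable_set_ids(
    fetch_deployments: Callable[[int, int], Sequence[int]],
    load_variable_sets: Callable[[Sequence[int]], Dict[str, str]],
    ids_on_dashboard: Set[str],
    page_size: int = 100,
) -> List[str]:
    """Collect ids of variable sets that are not referenced by the dashboard.

    Only one page of variable sets is held in memory at a time; the
    (small) ids of deletion candidates are all that is kept across pages.
    """
    candidates: List[str] = []
    for deployments in paginate(fetch_deployments, page_size):
        # Hypothetical loader: returns {variable_set_id: payload} for this page only.
        variable_sets = load_variable_sets(deployments)
        for set_id in variable_sets:
            if set_id not in ids_on_dashboard:
                candidates.append(set_id)
        # variable_sets goes out of scope here, so the large payloads can be freed.
    return candidates


if __name__ == "__main__":
    # Fake in-memory data standing in for the database.
    all_deployments = list(range(10))
    result = deletable_variable_set_ids(
        fetch_deployments=lambda skip, take: all_deployments[skip:skip + take],
        load_variable_sets=lambda page: {f"vs-{d}": "payload" for d in page},
        ids_on_dashboard={"vs-3", "vs-7"},
        page_size=4,
    )
    print(result)
```

The sketch collects candidate ids first and leaves deletion for afterward. Deleting rows while paging by offset would shift later pages and could skip records, so keeping the read pass separate from the delete pass avoids that.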

ServerTasks-476586.log.txt (1 KB)

OctopusServer.txt (1 MB)

Hi Jeff,

Thank you for supplying the log files.

Thank you also for your investigations. I agree with your diagnosis. I have created an issue to resolve this, which you can follow.
We will implement this as soon as possible; hopefully within the next few days.

Thanks again, and we sincerely apologize for any inconvenience.

Regards,
Michael

Jeff,

This has now been implemented and will be released in Octopus version 3.10.1, which should be available early next week.

I would love to hear if it alleviates your memory usage.

Michael,

Thanks for the fix, 3.10.1 has solved the memory issue :) The retention task is still timing out every other run or so; successful runs take about an hour.

Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
The wait operation timed out