Apply retention policies task causing slowness in the UI

Hi,
We are currently running version 2022.2.6971, and we are seeing an issue when the Apply retention policies tasks run.
While the tasks are running, the UI is very slow, and I have noticed that a large number of tasks appear as waiting in the queue. When I select one of the waiting tasks, the UI does not respond. Then the number of tasks in the queue suddenly drops to single figures.
The Apply retention policies tasks run every 4 hours, and each run impacts the UI's responsiveness.

The database DTUs are also spiking to 100% while this is happening.

Kind Regards,
Micheál Power

Hi @mikepower79,

Thank you for contacting Octopus Support. I’m sorry you are running into this issue.

I hope you don’t mind, but I have a few clarifying questions:

  • Approximately how many “Apply retention policies” tasks are being created at once?
  • Is this only occurring once every 4 hours?
  • Do you have any automation in place that re-runs tasks?
  • Are any other tasks exhibiting similar behavior?

You may get additional information by clicking on one of the “Apply retention policies” tasks and checking the pane on the right side; that pane will show, for example, whether a user re-ran the task.
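
If the UI is too slow to click through, a quick script against the Octopus REST API may be easier for counting these tasks. This is only an untested sketch: the server URL and API key are placeholders, and filtering on the task description is an assumption about how the tasks are labelled.

```python
# Rough sketch: count "Apply retention policies" tasks and show their states,
# so you don't have to click through the slow UI. URL/API key are placeholders.
import requests

OCTOPUS_URL = "https://your-octopus-server"  # placeholder
API_KEY = "API-XXXXXXXXXXXXXXXX"             # placeholder

resp = requests.get(
    f"{OCTOPUS_URL}/api/tasks",
    headers={"X-Octopus-ApiKey": API_KEY},
    params={"take": 200},  # most recent 200 tasks
)
resp.raise_for_status()
items = resp.json()["Items"]

retention = [t for t in items if "Apply retention policies" in t.get("Description", "")]
for t in retention:
    print(t["Id"], t["State"], t.get("QueueTime"), t.get("StartTime"))
print(f"{len(retention)} of the last {len(items)} tasks are retention tasks")
```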

I look forward to hearing back from you.

Best Regards,
Donny

Hi @donny.bell,
There were 24 Apply Retention Policies tasks that ran at 2:15 PM today.
Yes, it only occurs every 4 hours.
There is no automation in place that re-runs tasks.

The Apply Retention Policies tasks take about 20 minutes to complete.

Also, the task history is empty on each task, so the tasks were not re-run.

Kind Regards,
Micheál Power

Hi @mikepower79,

Thank you for the quick response.

Could you upload a copy of your Server Logs via this secure upload link?

If we don’t find any hints in the Server Logs, I may need to lean on our Development Team to get their eyes on this issue.

Let me know once you are able to upload the logs and I’ll have a look.

Best Regards,
Donny

Hi @donny.bell,

I will send the logs on to you; I have to gather them from each node, as Octopus is running on AKS.
I have noticed in Diagnostics that it shows spaces = 1185, but we only have about 30 spaces.
Why is it showing such a high number of spaces?

Would this have anything to do with the slowness when Apply Retention Policies is running?

Kind Regards,
Micheál Power

Hey @mikepower79,

Donny is away today, so we will await the logs you send over. In the meantime, I wanted to address your concern about the number of spaces your Octopus instance is showing.

I imagine you have quite a few users on your instance? We introduced a feature called ‘Private Spaces’, which was rolled out to our cloud customers. The idea was to give each user their own private space where they could test out their own projects away from production spaces; each user’s private space was automatically created when their account was created.

Our engineers rolled this feature out to our on-prem customers a little too early, and there were a few issues surrounding its introduction. They disabled private spaces for on-prem customers in the DB and in parts of Octopus (each user still got a space in the DB, but they couldn’t access it in the Octopus UI). Unfortunately this still caused issues, so the engineers have now disabled the feature completely so that a space is no longer created in the DB for each user.

That change has yet to reach current on-prem installs, though. So, long story short, I imagine those spaces are just your users’ private spaces, which is why the number is so high. I hope that helps alleviate any concerns you have here.

We have not seen private spaces influence retention policies, but we will keep that in mind when looking at your logs. Since they are private spaces, retention policies won’t apply to them anyway, so that should not be the issue here.

Let us know when you are able to upload the logs and we can take a look at them.

Kind Regards,

Clare

Hi @clare.martin,
Thanks a lot for the feedback. That’s good to know.

Kind Regards,
Micheál Power

Hi @donny.bell,
Can you share a link to upload the server logs, please?

Kind Regards,
Micheál Power

Hey @mikepower79,

Here is your secure link to send us the logs.

Let us know once you have uploaded them and I will take a look.

Kind Regards,

Clare

Hi @clare.martin,
Octopus container logs uploaded.

Kind Regards,
Micheál Power

Hey @mikepower79,

Thanks for sending those over. It seems the retention policy task can sometimes take over 5 seconds to load, which is quite a long time; I found instances taking 15 and 19 seconds.

I am going to get this in front of our engineers, as I don’t think that query should take that long to execute. I imagine this is what is causing the slow page loads and crashes.
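
If you want to spot these yourself while we investigate, a rough filter over the Server Logs can surface the slow entries. This is only a sketch: the millisecond-duration pattern is a guess at the log format, so adjust the regex to whatever your log lines actually contain.

```python
# Hypothetical sketch: print log lines that mention a duration over a
# threshold. The r"(\d+)\s*ms" pattern is an assumption about the log format.
import re
import sys

THRESHOLD_MS = 5_000  # flag anything slower than 5 seconds

pattern = re.compile(r"(\d+)\s*ms")
with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
    for line in f:
        match = pattern.search(line)
        if match and int(match.group(1)) >= THRESHOLD_MS:
            print(line.rstrip())
```

Run it as `python slow_lines.py <log file>` against each node’s log.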

I will let you know any outcomes or suggestions our engineers have. In the meantime, please reach out if you have any other questions.

Kind Regards,

Clare

Hi @clare.martin,
Any update on this ticket?

Kind Regards,
Micheál Power

Hey @mikepower79,

I have just requested an update from the engineers regarding this issue, and I will let you know when I receive anything from them.

Kind Regards,

Clare

Hey @mikepower79,

I got an update from the engineers that I want to share with you. They have looked through your logs and found the following:

  • You have one retention policy task, in the default space Spaces-1, that genuinely does take a while. Presumably this is the primary space you use?
  • All the other retention policy tasks complete in 0-1 seconds. We are not sure why they show up in the UI as running until the first one completes; that is probably something to look into further, but it will be mitigated by the fix in the next bullet point.
  • There are so many retention policy tasks because one runs for each private space. One of our teams already has a PR up to fix this; the PR is private, and while the change is in place and merged, there is a conflict, so the engineers need to run their tests before it gets rolled out.
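
If it is useful in the meantime, here is a rough, untested way to see that per-space breakdown yourself via the REST API. As before, the URL and API key are placeholders, and matching on the task description is an assumption:

```python
# Untested sketch: tally "Apply retention policies" tasks by SpaceId to see
# how many are attributable to the hidden private spaces.
from collections import Counter
import requests

OCTOPUS_URL = "https://your-octopus-server"  # placeholder
API_KEY = "API-XXXXXXXXXXXXXXXX"             # placeholder

resp = requests.get(
    f"{OCTOPUS_URL}/api/tasks",
    headers={"X-Octopus-ApiKey": API_KEY},
    params={"take": 500},  # widen the window to catch a full 4-hourly batch
)
resp.raise_for_status()

counts = Counter(
    t.get("SpaceId", "unknown")
    for t in resp.json()["Items"]
    if "Apply retention policies" in t.get("Description", "")
)
for space_id, n in counts.most_common():
    print(space_id, n)
```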

The engineers have said it would be good to get your task logs for:

  • The “Apply retention policies” task from the default space Spaces-1
  • Any other “Apply retention policies” task, just to see what’s going on there

So, unfortunately, it’s the dreaded private spaces causing this (we think), because a retention policy runs for each space and you have over a thousand of them. Luckily, the engineers were already aware this was happening, so they are a few days into getting a fix out for it.

I am really sorry all this is happening to you because of those Private Spaces. It looks like we are quite close to getting them removed for our on-prem customers, which is some good news at least.

Are you able to get the files the engineers requested? They may be able to provide you with a workaround until they can merge the PR and get the fix into a build. The secure link I gave you earlier still works, so let me know once the files have been uploaded and I will get them to the engineers.

Kind Regards,

Clare

Hi @clare.martin,
Can you send a link, and I will post the Apply retention policies logs for another space?
The default space (Spaces-1) in this instance is not actually in use; its teams are still on the old on-prem Octopus instance and will be migrated to this instance in a few weeks. So the Spaces-1 Apply Retention Policies task does not really run; it runs for the other spaces, and I will upload one of those logs.

Kind Regards,
Micheál Power

Hey @mikepower79,

Thank you for that extra information; I will take it back to the engineers, who can use it to further their investigation.

Here is a new secure link for you. As always, let us know when the logs are uploaded and I shall get them to the engineers.

Kind Regards,

Clare

Hey @mikepower79,

The engineers have looked at your retention policy logs and said that at the start of the task it is taking 18 minutes to do… something. They are not sure if it is private-space related or if it always takes that long.

Private spaces are actually being removed, hopefully at the start of next week, with a fix coming out around then. The removal fix is being backported to 2022.2 and 2022.1, so it will be as if Private Spaces never existed.

Your Octopus instance will run background tasks that delete each private space separately, both from the DB and from the Octopus UI itself. The engineers did say the removal process runs every minute and deletes a single space per run, so 1,000 private spaces will take 1,000 minutes.
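
As a rough back-of-the-envelope using the ~1,185 spaces you saw in Diagnostics:

```python
# Estimate cleanup time: one private space deleted per run, one run per minute.
spaces = 1185                 # approximate count reported in Diagnostics
minutes = spaces * 1          # one deletion per minute
print(f"{minutes} minutes ≈ {minutes / 60:.1f} hours")  # 1185 minutes ≈ 19.8 hours
```

So expect the cleanup to take the better part of a day after you upgrade.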

The engineers have asked for your full Octopus Server logs, if you are happy to upload those to the secure link I gave you above. However, I wonder if you would rather wait until next week for the new 2022.2 build with the Private Spaces removal in it?

Once you upgrade, that may help determine whether the issue is due to the large number of spaces or something else.

If the issue still persists, we can get your logs and troubleshoot without needing to look into Private Spaces; let me know what you think. If you do wait and run the upgrade, remember that you will need to allow a day or so for the removal to complete fully. Hopefully, a lot of the issues you have logged over the past few weeks will then be resolved.

Again, I can only apologise, as this has been a bit of a nightmare for you! Hopefully the new release will bring your instance back to a more normal state (your log size should shrink too).

Kind Regards,

Clare

Hi @clare.martin,
Yes, I think we will wait for the new release; hopefully the issues will be resolved.
Can you let me know when the release is available?
Thanks again for all your help.

Kind Regards,
Micheál Power

Hey @mikepower79,

I will keep checking on the engineering channel we have.

If you keep an eye on this page too (that’s where we put the new downloads), you might see it before I do.

No problem on the help; that’s what we are here for. I really do hope the Private Spaces are the issue here, but if it turns out they are not, we can get your server logs and start digging through them. We have a lot of information from you already, so that will be half the battle won.

Kind Regards,

Clare

Hey @mikepower79,

Just to let you know, the latest version of Octopus Server, 2022.2.7897, is out now on the downloads page, and it is the build with Private Spaces removed.

As we mentioned, a script will run every minute and remove one private space at a time, so once you install the new version it would be worth waiting a week to see the full benefits. If the issues you are experiencing still exist, we can go back to the engineers and re-tackle them with Private Spaces ruled out as the potential cause.

Kind Regards,

Clare
