Apply retention policies task causing slowness in the UI

Hey @mikepower79,

I got an update from the engineers that I want to share with you. They have looked through your logs and found:

  • You have one retention policy task that genuinely does take a while: the one in the default space, Spaces-1. Presumably this is the primary space you use?
  • All the other retention policy tasks complete in 0–1 seconds. We are not sure why they show up in the UI as running until the first one completes; that is probably something we need to look into further, but it will be mitigated by the fix in the next bullet point.
  • There are so many retention policy tasks running because one runs for each private space. One of our teams already has a PR up to fix this. The PR is private, and although a change is in place and merged, there is a conflict, so the engineers need to run their tests before that change gets rolled out.

The engineers have said it would be good to get your task logs for:

  • The “Apply retention policies” task from the default space Spaces-1
  • Any other “Apply retention policies” task, just to see what’s going on there

So, unfortunately, it's the dreaded private spaces causing this (we think), because a retention policy runs for each space and you have thousands of them. Luckily, the engineers were already aware this was happening, so they are a few days into getting a fix out for it.

I am really sorry all of this is happening to you because of those Private Spaces. It looks like we are quite close to getting them removed for our on-prem customers, which is some good news at least.

Are you able to get us the files the engineers requested? They may be able to provide you with a workaround until they can merge the PR and get the fix into a build. The secure link I gave you earlier still works, so let me know once those files have uploaded and I will get them to the engineers.

Kind Regards,

Clare


Hi @clare.martin,
Can you send a link and I will post the Apply Retention Policies logs for another space.
Our default space in this instance of Octopus is not in use; the default space (Spaces-1) is still using the on-prem Octopus instance and will be migrated to a new instance in a few weeks. So the Spaces-1 Apply Retention Policies task does not actually run. It does run for the other spaces, and I will upload one of those logs.

Kind Regards,
Micheal Power

Hey @mikepower79,

Thank you for that extra information. I will take it back to the engineers, who can use it to further their investigation.

Here is a new secure link for you. As always, let us know when the logs are uploaded and I will get them to the engineers.

Kind Regards,

Clare


Hey @mikepower79,

The engineers have looked at your retention policy logs and said it does look like the task is taking 18 minutes at the start to do… something. They are not sure if it's private-space related or if it always takes that long.

Private Spaces are actually being removed, hopefully at the start of next week, with a fix coming out around then. The removal is being backported to 2022.2 and 2022.1, so it will be like Private Spaces never existed.

Your Octopus instance will run background tasks that delete each private space separately, both from the DB and within the Octopus UI itself. The engineers did say the removal process runs every minute and deletes a single space per run, so 1,000 private spaces will take roughly 1,000 minutes (about 17 hours).

The engineers have asked for your full Octopus Server logs, if you are happy to upload those to the secure link I gave you above, but I wonder if you would rather wait until next week for the new 2022.2 build to come out with the Private Spaces removal in it?

Once you upgrade that may help determine whether the issue is due to the large number of spaces or something else.

If the issue still persists after that, we can get your logs and troubleshoot without needing to look into Private Spaces; let me know what you think. If you are willing to wait and then run the upgrade, remember you will need to wait a day or so for the removal to complete fully. Hopefully a lot of the issues you have logged over the past few weeks will then be solved for you.

Again I can only apologise because this has been a bit of a nightmare for you! Hopefully the new release will mean your instance is back to a more normal state (your log size should hopefully shrink too).

Kind Regards,

Clare

Hi @clare.martin,
Yes, I think we will wait for the new release. Hopefully the issues will be resolved.
Can you let me know when the release will be available?
Thanks again for all your help.

Kind Regards,
Micheál Power


Hey @mikepower79,

I will keep checking on the engineering channel we have.

If you keep an eye on this page too (that's where we put the new downloads), you might see it before I do.

No problem on the help, that's what we are here for. I really do hope the Private Spaces are the issue here, but if it turns out they are not, we can get your server logs and start digging through them. We have a lot of information from you already, so that will be half the battle won.

Kind Regards,

Clare


Hey @mikepower79,

Just to let you know, the latest version of Octopus Server, 2022.2.7897, is out now on the downloads page and is the one with Private Spaces removed.

As we mentioned, a script will run every minute that gets rid of one private space at a time, so once you install the new version, it would be worth waiting a week to fully see the benefits. If the issues you are experiencing still exist, we can go back to the engineers and re-tackle them, with Private Spaces no longer in the frame as the potential cause.

Kind Regards,

Clare


Hi @clare.martin,
Thanks for letting me know.
We will upgrade to the latest 2022.2.8011.

Kind Regards,
Micheál Power


Hi @clare.martin,

We have upgraded our Octopus version to 2022.2.8011. This did remove all the individual spaces, but we are still seeing the issue with slow Octopus UI response while the Apply Retention Policies job is running for each space. The job runs for approximately 20 minutes, every 4 hours. (Can this job be disabled?) The DTUs in our DB also still spike to 100% for the duration of the Apply Retention Policies job.

I have run a query on the DB to get the longest-running queries that are executing while the Apply Retention Policies job is running, and get the below returned.

I have attached the full text of what the query returned.
DB_Query.txt (10.1 KB)
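
In case it helps the engineers, the query was along these lines. This is only an illustrative sketch rather than the exact text in the attachment; it assumes the standard SQL Server / Azure SQL DMVs and is run here via Python/pyodbc with placeholder connection details:

```python
# Illustrative sketch: list the longest-running active statements on the
# Octopus database while the Apply Retention Policies task is running.
# Assumes SQL Server / Azure SQL DMVs; connection details are placeholders.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=your-sql-server.database.windows.net;"
    "DATABASE=OctopusDeploy;UID=readonly_user;PWD=..."  # placeholder credentials
)

QUERY = """
SELECT TOP (20)
    r.session_id,
    r.status,
    r.total_elapsed_time / 1000.0 AS elapsed_seconds,
    r.cpu_time,
    r.wait_type,
    SUBSTRING(t.text, 1, 400) AS query_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID
ORDER BY r.total_elapsed_time DESC;
"""

with pyodbc.connect(CONN_STR) as conn:
    for row in conn.cursor().execute(QUERY):
        print(f"{row.session_id:>6} {row.elapsed_seconds:>10.1f}s "
              f"{row.wait_type or '-':<20} {row.query_text}")
```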

Kind regards,
Micheál Power

Hey @mikepower79,

Good to know you managed to upgrade. There were a few issues with your instance that could potentially be narrowed down to Private Spaces, so I hope the upgrade at least had some benefit for you, even if it just cleaned up the DB a bit.

I am really sorry to hear the upgrade has not solved this issue for you, though. It does completely rule out Private Spaces, which is good, as our engineers can focus their attention on other possible causes. That is a huge DTU spike; there is definitely something not quite right there.

I am sorry to have to ask you this, but I have created a new secure link here as the old one expired. Are you able to get us a copy of your Octopus Server logs, please? I can then update the engineers with what you have found.

Can you let us know once the logs have been uploaded? I will get them to the engineers ASAP.

Kind Regards,

Clare

Hi @clare.martin,
I have uploaded Octopus server logs.

Kind regards,
Micheál Power


Hi Micheál,

Thank you for uploading this. I'm just stepping in for Clare as she has gone offline for the day.

I have relayed this log to our engineering team via the thread Clare created earlier, and we will reach back out as soon as we have any updates.

Regards,

Britton

Hey @mikepower79,

Another update for one of your tickets has come through, so I thought I would share it.

The engineer has put together a GitHub Issue (which is private at the moment whilst we investigate so I cannot share it with you).

He is going to tackle this issue as part of the wider work the engineers are doing on improving performance, but does not have a set date for this as of yet. The issue seems to be related to the number of releases a customer has, as we load all releases when running our retention policies. The more releases you have, the slower the policy will run and the more impact it will have on the DB.

Do you have lots of releases for each project? If so, this is potentially why the retention policy is taking so long and making the UI slow.

There is no way to remove or disable the retention policies, but you could potentially change your retention policies for releases and see if that helps. What are your retention policies for releases? Do you need to keep lots of them per project, or could you perhaps scale it down to keep the last 3 or 4?

The fix for this will not be worked on just yet, as it is part of a larger piece of work and is not just a small code change; it will impact one of the ways Octopus runs. So I wanted to let you know this will be worked on soon, but for now we need to find more of a workaround for you.

Are you able to apply stricter retention to your releases? If not, how many do you have roughly per project?
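
If it helps to get a rough count, something along these lines should do it. This is only an untested sketch against the Octopus REST API; the server URL, API key, and space ID are placeholders you would swap for your own:

```python
# Untested sketch: count releases per project via the Octopus REST API.
# OCTOPUS_URL, API_KEY and SPACE_ID are placeholders for your own values.
import requests

OCTOPUS_URL = "https://your-octopus-server"   # placeholder
API_KEY = "API-XXXXXXXXXXXXXXXX"              # placeholder
SPACE_ID = "Spaces-1"

headers = {"X-Octopus-ApiKey": API_KEY}

# /projects/all returns every project in the space as a flat list.
projects = requests.get(
    f"{OCTOPUS_URL}/api/{SPACE_ID}/projects/all", headers=headers
).json()

counts = []
for project in projects:
    # The releases collection is paginated; TotalResults gives the full count
    # without having to page through every release.
    releases = requests.get(
        f"{OCTOPUS_URL}/api/{SPACE_ID}/projects/{project['Id']}/releases",
        headers=headers,
        params={"take": 1},
    ).json()
    counts.append((project["Name"], releases["TotalResults"]))

for name, total in sorted(counts, key=lambda c: c[1], reverse=True):
    print(f"{total:>6}  {name}")
```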

I look forward to hearing from you,

Kind Regards,

Clare

Hi @clare.martin,
Yes, we do have a lot of releases for each project, so this may be the reason.
How do I go about changing the retention policies for releases? Is this at a project level or space level?

Kind regards,
Micheál Power

Hey @mikepower79,

Good question on release retention policies. You may have seen this already, since you are an experienced Octopus user, but our documentation on retention policies has sections on releases and how you can clean them up, and it explains the process a lot better than I could over a forum post. We also have this section, which further explains the process. I hope you don't mind me linking them.

Let me know if you need further clarification on setting those up. I would start high first: if you have a lot of releases and set your policies to, say, 'Only keep the last 3 releases', your retention policies will massively slow your instance down (more than they already do), because they have to go through each project and delete a lot of releases.

If you set it to keep, say, 10 releases and you have 13 per project, it will go through and delete three from each project. Once a few days have passed, you can check your retention policy tasks and see if they show that no releases need to be deleted. Then drop the value from 10 to 6, repeat, etc., until you have the right number of releases. Note the part of our doc which says Octopus will never delete releases on a dashboard and will always keep the current and previous release (for rollback), so you will never be able to delete a current release or the one prior, which makes this fairly safe to implement.
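
If you end up wanting to change a lot of lifecycles at once rather than clicking through the UI, a script roughly like this could do it. This is only an untested sketch against the REST API (release retention lives on lifecycles and their phases), with the URL, API key, space ID and the 'keep 10' value as placeholders:

```python
# Untested sketch: lower the release retention on every lifecycle in a space
# via the Octopus REST API. URL, API key, space ID and the target count are
# placeholders/examples - adjust before running, and test on a copy first.
import requests

OCTOPUS_URL = "https://your-octopus-server"   # placeholder
API_KEY = "API-XXXXXXXXXXXXXXXX"              # placeholder
SPACE_ID = "Spaces-1"
KEEP_LAST = 10                                 # example starting value

headers = {"X-Octopus-ApiKey": API_KEY}

lifecycles = requests.get(
    f"{OCTOPUS_URL}/api/{SPACE_ID}/lifecycles/all", headers=headers
).json()

for lifecycle in lifecycles:
    # Assumes the lifecycle resource exposes a ReleaseRetentionPolicy block
    # with Unit / QuantityToKeep / ShouldKeepForever, per the docs linked above.
    lifecycle["ReleaseRetentionPolicy"] = {
        "Unit": "Items",
        "QuantityToKeep": KEEP_LAST,
        "ShouldKeepForever": False,
    }
    requests.put(
        f"{OCTOPUS_URL}/api/{SPACE_ID}/lifecycles/{lifecycle['Id']}",
        headers=headers,
        json=lifecycle,
    ).raise_for_status()
    print(f"Updated {lifecycle['Name']} to keep the last {KEEP_LAST} releases")
```

Any phase-level overrides on a lifecycle would still need to be updated separately, and as always I would try this against a non-production copy first.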

Hopefully cleaning those up helps, let us know if it doesn’t though.

Kind Regards,

Clare

Hi @clare.martin,
Thanks for the detailed feedback.
I will pass on the relevant information to the space managers so we can do a tidy-up and hopefully see some improvement.
Also, please keep me posted on the GitHub issue.

Kind regards,
Micheál Power


Hi @clare.martin,
Is there also any default setting for the Runbook retention?
It is set to 100 per environment by default, and this does not seem to be linked to the lifecycle.

Kind regards,
Micheal Power

Hey Micheál,

Runbooks have retention policies that are separate from the Lifecycle retention policies, so you’re right they’re not linked to Lifecycles.

If you’re looking for where to configure the runbook retention policies, you can find these under Project > Runbook > Settings > Retention Policy:

[screenshot: Runbook Retention Policy setting in the Octopus UI]

Unfortunately there isn’t a way via the UI to globally configure this setting, but this could be done via the REST API. The following help post demonstrates how to do this:
https://help.octopus.com/t/can-i-set-a-global-runbook-retention-policy/25577/2
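
To give a rough idea of the shape of such a script (the help post above is the authoritative example; this is only an untested sketch with placeholder URL, API key, space ID and retention count), it boils down to iterating each project's runbooks and updating their run retention policy:

```python
# Untested sketch: set a runbook run retention policy across every project in
# a space via the Octopus REST API. See the linked help post for the supported
# approach; URL, API key, space ID and the retention count are placeholders.
import requests

OCTOPUS_URL = "https://your-octopus-server"   # placeholder
API_KEY = "API-XXXXXXXXXXXXXXXX"              # placeholder
SPACE_ID = "Spaces-1"
KEEP_RUNS = 10                                 # example value per environment

headers = {"X-Octopus-ApiKey": API_KEY}

projects = requests.get(
    f"{OCTOPUS_URL}/api/{SPACE_ID}/projects/all", headers=headers
).json()

for project in projects:
    runbooks = requests.get(
        f"{OCTOPUS_URL}/api/{SPACE_ID}/projects/{project['Id']}/runbooks",
        headers=headers,
    ).json()["Items"]
    for runbook in runbooks:
        # Assumes the runbook resource exposes a RunRetentionPolicy block,
        # as shown in the linked help post.
        runbook["RunRetentionPolicy"] = {
            "QuantityToKeep": KEEP_RUNS,
            "ShouldKeepForever": False,
        }
        requests.put(
            f"{OCTOPUS_URL}/api/{SPACE_ID}/runbooks/{runbook['Id']}",
            headers=headers,
            json=runbook,
        ).raise_for_status()
        print(f"{project['Name']} / {runbook['Name']}: keep last {KEEP_RUNS} runs")
```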

I hope that’s helpful, but let me know if you were looking for something else or if I didn’t fully answer your question.

Best,
Patrick


Hi @patrick.smergut,
Thanks for the detailed response.

Kind Regards,
Micheál Power


This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.