Linux tentacle not applying retention policy

Hi,
I have defined a retention policy in a lifecycle. In the retention policy I set:

  1. Releases: Keep a limited number (5 releases)
  2. How long should we keep extracted packages and files on disk on Tentacles? Keep a limited number (1 release)

When this runs on Windows targets, I can see the last step “Apply retention policies on Tentacles”.

But I don't see this on any of my Linux targets or worker pools. Obviously, disk space keeps growing :slight_smile:

Good afternoon @andres.colodrero,

Thank you for contacting Octopus Support and sorry to hear you are not seeing retention running on your Linux tentacles.

Unfortunately, my Linux box has decided to give up on itself, so I am re-installing Ubuntu on it now and will test this myself. In the meantime, would you be able to share a deployment log from one of your Linux boxes so I can take a look?

We do have an open GitHub issue at the moment where runbooks are not respecting the configured retention policy and are defaulting to 90 days for file (package) retention (not release retention from project deployments), so I wanted to check it's not a runbook you are running on the Linux target?

If I can get a Tentacle log from your Linux box (this will be located in the directory where you installed your Tentacle) and the raw deployment log from that deployment, I can take a look at this for you.

I have created a secure link here that you can upload the files to. Let me know once they have been uploaded and I will take a look at them.

Kind Regards,
Clare

Hi Clare and thanks for the answer.

I uploaded the deployment logs and Tentacle logs to that link (they don't show much).
I can see that DeploymentJournal.xml is 11 MB and PackageRetentionJournal.json is 1 MB.

Hey @andres.colodrero,

Thank you for uploading those logs. I took a look, and it looks like you are using workers to run the deployments here. I did some testing, and if I run a script directly on a worker with a package reference, I see what you are seeing:

If I run the same deployment on a worker on behalf of a target I see this:

And if I run the same deployment on a Linux Target directly then the package retention kicks in:

So you can see retention only happens if you are running the task directly on a target, not on a worker or on a worker on behalf of a target.

This is because it's the worker that stores the package, not the target itself. In the specific deployment you sent over, it looks like you are getting the packages from an external feed:

Downloading XXXX.Claims.XXXX.Server.ReleaseScripts v0.1.10.66 from NuGet feed 'OctopusDeploymentPackages' using cache policy UseCache

Octopus does not perform package retention on external feeds, which is why I think your workers are not removing them: the package is grabbed from the external feed and placed on the worker.

You can test this out by setting up a ‘Deploy a package’ step and deploying a test package from the Octopus built-in feed to one of your Linux boxes (deployment target, not worker); you should then see it perform package retention.

Unfortunately, it looks like you are going to have to script the removal of those external-feed packages from your workers. We have an example of how to do this in the documentation I linked about retention on external feeds.
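As a rough illustration of the kind of script I mean (this is only a sketch, not something we ship: the Files path and the number of versions to keep are assumptions you would adjust for your own workers, and the PackageId@Sversion@hash.zip naming matches the cache file names shown later in this thread):

#!/usr/bin/env python3
# Sketch only: trim a worker's package cache, keeping the newest few cached
# versions of each package. FILES_DIR and KEEP_PER_PACKAGE are assumptions.
from collections import defaultdict
from pathlib import Path

FILES_DIR = Path("/etc/octopus/<instance-name>/Files")  # cache location (adjust for your instance)
KEEP_PER_PACKAGE = 5  # cached versions of each package to keep (adjust)

# Cached packages are named like 'PackageId@Sversion@hash.zip'
by_package = defaultdict(list)
for f in FILES_DIR.glob("*.zip"):
    by_package[f.name.split("@", 1)[0]].append(f)

for package_id, files in by_package.items():
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)  # newest first
    for old in files[KEEP_PER_PACKAGE:]:
        print(f"Removing {old}")
        old.unlink()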

Hopefully that helps, let me know if you have any other questions at all,
Kind Regards,
Clare

Thanks for your response.

I'm not expecting to remove the packages from the feed, but from the “/Files” folder that keeps growing.

I'm also using an external feed on the Windows servers, and the packages still get removed on the server where the Tentacle runs. The only difference is that on the Windows servers I'm not using workers.

In conclusion, don't workers apply retention policies?

Hey @andres.colodrero,

I can't find anything in our documentation that suggests we don't run retention on workers. I also found this GitHub issue which suggests we should run retention on workers, so I have sent that to our developers, who should be able to confirm whether this is a bug or not.

Thank you for raising this, I will get back to you as soon as I have more information.

Kind Regards,
Clare

Hey @andres.colodrero,

I have an update from our engineers about retention on workers: they said package retention does run on workers, but it's not part of the retention policy.

The way it works is that the worker will keep storing packages until it starts to run out of space. Once it is low on space, an algorithm will run and remove the packages that are “least used” to make room for the new packages.

They pointed to your logs, where you still have space for packages, which is why no clean-up is happening:

10:34:33 Verbose | Detected enough space for new packages. (4.662 GB/9.745 GB)

I assume you have a third-party program or script that runs on each server and warns you when disk space is below a certain threshold, and those workers have hit that threshold. Is that why you noticed this originally?

I have enquired about putting something in our documentation that shows how retention on workers works, as even I assumed we ran the retention policy on them because of the way our documentation describes it working on Tentacles (not specifically on deployment targets / workers).

I ran some more tests to improve my understanding of this. On a Windows worker, using the same project as for the Linux ones, I do actually see the cache being cleared, but it's not where you would expect it in the logs, as it's not a ‘step’ that runs a policy. It's a process that runs at the start of the deployment.

The spot in your logs where it shows you have enough space is where it would show the clean-up if you met the threshold criteria for disk space usage.

You can see my first deployment run below, which cleans up the JhubSite package ending in BDBB:

Cleaning 222.302 GB space from the package cache.
Removing package file 'C:\Octopus\Main Computer Tentacle Worker\Files\JhubSite@S1.0.3@C4D282796B747840BA663F054B89BDBB.zip'

And if I re-run that deployment, a different package gets removed (ending in B711C):

Cleaning 221.931 GB space from the package cache.
Removing package file 'C:\Octopus\Main Computer Tentacle Worker\Files\JhubSite@S1.0.3@C006024D803EB54581D1D8E01C7B711C.zip'

It looks like this is the code we run for the package retention, so by the looks of it, retention runs once the disk goes under 30% free space. So, in theory, your servers can get down to 1.949 GB of free disk space before the clean-up runs on your 9.745 GB server. I have 16 GB free on my 1 TB laptop, which the worker is on, which is why it's running the clean-up on my deployments.
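To make that concrete, the disk-space check is roughly equivalent to something like this (my own sketch of the idea, not the actual Calamari code; the threshold percentage is simply the figure discussed above):

import shutil

def needs_cleanup(path: str, percent_free_threshold: float = 30.0) -> bool:
    # Does the partition containing `path` have less free space than the threshold?
    usage = shutil.disk_usage(path)
    percent_free = usage.free / usage.total * 100
    print(f"Detected {usage.free / 1e9:.3f} GB free of {usage.total / 1e9:.3f} GB ({percent_free:.0f}% free)")
    return percent_free < percent_free_threshold

# On your ~9.745 GB partition with ~4.7 GB free (roughly 48% free) this returns False,
# which lines up with the "Detected enough space for new packages" line in your logs.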

Hopefully we can get something put in our documentation regarding this, so it's a bit more transparent what we do for workers regarding retention. But if you need your servers to stay above a certain threshold, you would need to run a script manually to remove those files from the worker folder.

If you have any more questions surrounding this, let me know and I will be happy to answer them for you.

Kind Regards,
Clare

Thanks, Clare, for your detailed response, I have a better understanding now.
I have the workers running on /opt, and I assigned this partition a lot of disk space, but I very often got notifications about the disk space of /opt being at 98%. Then I did some clean-up of /opt/octopus/tentacles/(tentacle or worker name)/files, because it was keeping 16,000 files.

Where does this message come from? From the temp directory? I guess this is only cache.
TempDirectory: /tmp/
04:34:12 Verbose | HostProcess: Calamari (747190)
04:34:13 Verbose | Detected enough space for new packages. (5.035 GB/9.745 GB)

That's actually my disks after the clean-up:
Filesystem Size Used Avail Use% Mounted on
udev 8.4G 0 8.4G 0% /dev
tmpfs 1.7G 1.4M 1.7G 1% /run
/dev/mapper/ubuntu--vg-root 11G 5.1G 4.9G 51% /
tmpfs 8.4G 13k 8.4G 1% /dev/shm
tmpfs 5.3M 0 5.3M 0% /run/lock
tmpfs 8.4G 0 8.4G 0% /sys/fs/cgroup
/dev/sda2 1.1G 219M 732M 23% /boot
/dev/mapper/ubuntu--vg-opt 518G 345G 153G 70% /opt
/dev/mapper/ubuntu--vg-home 100G 28G 68G 29% /home
/dev/mapper/ubuntu--vg-var--lib 32G 9.8G 21G 33% /var/lib
/dev/mapper/ubuntu--vg-tmp 22G 168k 21G 1% /tmp
/dev/mapper/ubuntu--vg-var--log 27G 4.0G 22G 16% /var/log

Hey @andres.colodrero,

Thank you for getting back to me. Yes, the /tmp/ directory is for the cache, so you would not need to worry about that.

The packages themselves do get put in /etc/octopus/{name-of-tentacle-instance}/Files/ by default, as you mentioned.

It looks like the scan you sent through is showing the folder below (which is where the /etc/ folder lives) as 51% used:

/dev/mapper/ubuntu--vg-root 11G 5.1G 4.9G 51%

So if you wanted to clear up any files on the worker manually, you would need to run clean-up scripts against the /etc/octopus/{name-of-tentacle-instance}/Files/ folder, taking into account the name of the worker instance and the different instances if you have more than one on that worker.

I would err on the side of caution with any file-deletion scripts and only delete files that are over 10 days old or so, just so you have some left in the cache and your deployments still run quickly if you are using the same package from a few days ago.
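For illustration, an age-based clean-up along those lines could look something like this (just a sketch: the instance path and the 10-day cut-off are things you would adjust, and you would run it once per instance):

#!/usr/bin/env python3
# Sketch only: delete cached package files older than MAX_AGE_DAYS from one
# Tentacle/worker instance's Files folder. The path is an assumption.
import time
from pathlib import Path

FILES_DIR = Path("/etc/octopus/<instance-name>/Files")  # adjust per instance
MAX_AGE_DAYS = 10
cutoff = time.time() - MAX_AGE_DAYS * 24 * 60 * 60

for f in FILES_DIR.iterdir():
    if f.is_file() and f.stat().st_mtime < cutoff:
        print(f"Removing {f} (older than {MAX_AGE_DAYS} days)")
        f.unlink()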

One thing I did notice when looking at your disk space: the /opt/ folder is using 345 GB, which is huge for that folder, as it should only contain the Octopus install files for the Tentacle. In contrast, mine is only 83 MB with one Tentacle installed on it.

Are you able to check that folder and see what else is in there? Did you change the install folder from /etc/ to /opt/ when you installed the Tentacle?

One thing you could potentially do is have the files from the Tentacle workers put onto a smaller drive on the Linux boxes, so the file clean-up runs more frequently on those servers without you having to script the removal manually. Your servers won't run out of disk space then, as the packages should only go onto the new, smaller drive, which is partitioned separately and so won't affect the main Linux box partitions.

Hopefully that helps but let me know if you have any more questions regarding this.
Kind Regards,
Clare

OK, I will try to set up the Tentacle on a smaller partition, but I'm not sure it will work. As I mentioned, the /opt/ folder was at 99% disk usage and the Tentacles didn't execute the clean-up.
Is that perhaps because, even if I select /opt/octopus for “store log files”, the workers will still be checking disk space on /etc/octopus?

Anyway, I appreciate your help.

Hi @andres.colodrero

I’m just stepping in for Clare to keep things moving.

Keeping the home directories for workers and deployment targets separate should resolve any issues with normal deployment targets in the future.

We have had a reply from our engineers saying that workers will automatically detect if there is not enough room on the drive and will automatically clean up any files that are not used in any release or runbook snapshot.

Using a smaller drive will force the worker to clean up the files more frequently as the drive will fill up quicker.

Hopefully this gives a bit more clarity.

Kind Regards,
Dom.

Thanks,
Then, in my case, I can assume that the worker never cleaned up disk space, and I got issues on the server because of that.
So my question is: which filesystem and threshold is the worker monitoring? If the worker is monitoring the whole disk or /etc, it won't catch any issues on other filesystems/mounts.

Hey @andres.colodrero,

Thanks for the follow-up questions. From the code I sent over in the link, it looks like the retention threshold is checked across the whole disk partition:

const string PackageRetentionPercentFreeDiskSpace = "OctopusPackageRetentionPercentFreeDiskSpace";
const int DefaultPercentFreeDiskSpace = 20;
const int FreeSpacePercentBuffer = 30;

In my case, my C drive is 1 TB and has 16 GB left, which is why it's running the clean-up. But it only runs the clean-up when the disk has less than 30% space left, and it will only delete one package at a time from the worker instance's /etc/octopus/{name-of-tentacle-instance}/Files/ location; it won't touch any other folders and will only delete one file per deployment.
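To illustrate the "one least-used file per deployment" part, here is my own rough sketch of that behaviour (not the real Calamari code; I am using last-access time as a stand-in for whatever "least used" means internally, and the path is the assumed default):

from pathlib import Path

FILES_DIR = Path("/etc/octopus/<instance-name>/Files")  # only this folder is ever touched (path assumed)

def remove_least_used_package() -> None:
    # Remove a single cached package file - the least recently used one.
    packages = sorted(FILES_DIR.glob("*.zip"), key=lambda p: p.stat().st_atime)
    if packages:
        target = packages[0]
        print(f"Removing package file '{target}'")
        target.unlink()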

So, in your case, you are going to have to look at your /opt/ area, work out why it is so large, and manually reduce its size so it frees up space on the server.

Putting the files onto a different partition will mean the clean-up runs on just that partition; as that partition is smaller, it will fill up quicker.

The Octopus clean-up will only clean packages that it has put in that Files folder; all other files are either Tentacle install/configuration files or files not related to the Tentacle, so Octopus won't touch them.

Your best bet would be to use a program like TreeSize (if it can be used on a Linux box) to see where the larger files are located, then work out what they are and delete them manually if required.

I hope that helps,
Clare

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.