Deployment process attempts to delete folders deployed in the same release

hallgeir.osterbo · 9 August 2022 13:38

Hi,
We have a very odd issue where the Apply Retention Policy step deletes the folders that were just deployed in the steps before it.

The deployment process is itself a bit unusual in that it deploys the same package multiple times to different roles, with slightly different configuration for each. In our dev environment, those roles happen to be on the same server, and that’s where we see this issue.

In short, the deployment process does the following:

1. Deploy package as a windows service on RoleA
2. Deploy package as a second windows service on RoleA (same config - just having two services per server)
3. Deploy package as a windows service on RoleB (various different config compared to previous)
4. Deploy package as a second windows service on RoleB
5. Deploy package as a windows service on RoleC (various different config compared to previous)
6. Deploy package as a second windows service on RoleC
There can be multiple targets with each of these labels, but not in the failing case.

After this, the retention policy steps start cleaning up previously deployed packages. There’s one retention policy step for each “Deploy a Windows Service” step, created automatically by Octopus. Sometimes, it identifies packages that were from the previous release - great! But sometimes, each of the Apply Retention Policy steps starts deleting the packages that were just deployed in the other steps. It seems that, for instance, the Apply Retention Policy step “belonging” to step X starts cleaning up the package folder that was deployed in step Y. And because all steps in our dev environment are deployed on the same server, it all goes down the drain, with Apply Retention Policy steps deleting package folders left and right. Of course, the Apply Retention Policy step fails to completely delete the folders, because they are in use, but we’re left with a bunch of non-functioning services.

For the record, we have a fairly tight retention policy – keeping 1 release in addition to the current, and package retention policy of 1 day.

I have a log if needed (if you have a secure place I can upload it).

I think this must be a bug, right? And is there a workaround that we can use here?

Thank you!

Justin_Walsh · 9 August 2022 15:12

Hi @hallgeir.osterbo!

Thanks for reaching out, and sorry to hear that you’re having issues with your package retention. Owing to the multiple ways of modeling this scenario, I do think that taking a look at your task logs and deployment process JSON would be a great start to help get you answers. In general, if it’s 6 steps within a single process, retention shouldn’t run on those while the current deployment is in progress. I’ve created a secure upload location for you HERE, if you would like to send that data through, and I’ll take a look.

Looking forward to hearing from you soon!

hallgeir.osterbo · 10 August 2022 05:09

Thanks for getting back to me on this!

I’ve uploaded both the deployment process and the log (I’ve anonymized the server names in the logs due to our security policy). Note that for the dev environment only steps 1, 2, 5 and 6 are being run. You will clearly see in the logs that the retention policy is deleting other steps’ deployment folders.

paul.calvert · 10 August 2022 08:20

Hi @hallgeir.osterbo,

Thanks for sending those over. I’ve looked through the logs, and this seems like odd behaviour.

The retention policy works by referencing the DeploymentJournal.xml file on the tentacle to determine when a package was deployed. Each package folder within that file is linked to the step that deployed it, so, even though your steps are using the same package, on the disk, each step downloads that package to a separate folder that should then “belong” to that step.

e.g. this is from my deployment journal when I deployed the same package with four steps

<Deployment Id="5d8b1c25-efb1-48dd-8a05-6ed3bb851c7e" EnvironmentId="Environments-742" TenantId="" ProjectId="Projects-1026" InstalledOn="2022-08-10 07:38:59" RetentionPolicySet="Environments-742/Projects-1026/Step-Deploy a Package/Machines-1481/<default>" ExtractedTo="C:\Octopus\Applications\Cloud Poll\Dev\sleep\0.3_10" CustomInstallationDirectory="C:\Octopus\Applications\Cloud Poll\Dev\sleep\0.3_10" WasSuccessful="True">
<Package PackageId="sleep" PackageVersion="0.3" DeployedFrom="C:\Octopus\Cloud Poll\Files\sleep@S0.3@9A1956F25FF0A446B5B79513073DE97E.zip"/>
</Deployment>
<Deployment Id="aa32ee87-201f-435f-bcf8-f87263c95438" EnvironmentId="Environments-742" TenantId="" ProjectId="Projects-1026" InstalledOn="2022-08-10 07:39:07" RetentionPolicySet="Environments-742/Projects-1026/Step-Deploy a Package - clone (3)/Machines-1481/<default>" ExtractedTo="C:\Octopus\Applications\Cloud Poll\Dev\sleep\0.3_11" CustomInstallationDirectory="C:\Octopus\Applications\Cloud Poll\Dev\sleep\0.3_11" WasSuccessful="True">
<Package PackageId="sleep" PackageVersion="0.3" DeployedFrom="C:\Octopus\Cloud Poll\Files\sleep@S0.3@9A1956F25FF0A446B5B79513073DE97E.zip"/>
</Deployment>
<Deployment Id="c352eaee-ca24-4c94-8d8d-37b53fe0beb4" EnvironmentId="Environments-742" TenantId="" ProjectId="Projects-1026" InstalledOn="2022-08-10 07:39:15" RetentionPolicySet="Environments-742/Projects-1026/Step-Deploy a Package - clone (2)/Machines-1481/<default>" ExtractedTo="C:\Octopus\Applications\Cloud Poll\Dev\sleep\0.3_12" CustomInstallationDirectory="C:\Octopus\Applications\Cloud Poll\Dev\sleep\0.3_12" WasSuccessful="True">
<Package PackageId="sleep" PackageVersion="0.3" DeployedFrom="C:\Octopus\Cloud Poll\Files\sleep@S0.3@9A1956F25FF0A446B5B79513073DE97E.zip"/>
</Deployment>
<Deployment Id="b1ab838d-c64b-44c4-a3c5-4b678182d21e" EnvironmentId="Environments-742" TenantId="" ProjectId="Projects-1026" InstalledOn="2022-08-10 07:39:23" RetentionPolicySet="Environments-742/Projects-1026/Step-Deploy a Package - clone (1)/Machines-1481/<default>" ExtractedTo="C:\Octopus\Applications\Cloud Poll\Dev\sleep\0.3_13" CustomInstallationDirectory="C:\Octopus\Applications\Cloud Poll\Dev\sleep\0.3_13" WasSuccessful="True">
<Package PackageId="sleep" PackageVersion="0.3" DeployedFrom="C:\Octopus\Cloud Poll\Files\sleep@S0.3@9A1956F25FF0A446B5B79513073DE97E.zip"/>
</Deployment>

So, when the retention runs on this deployment, package C:\Octopus\Applications\Cloud Poll\Dev\sleep\0.3_13 should only be able to be deleted when the retention policy for step Deploy a Package - clone (1) runs.
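To illustrate the ownership described above, here is a minimal, read-only Python sketch that groups extracted folders by the `RetentionPolicySet` that recorded them. It is not Octopus code. The `Deployments` root element is an assumption (the excerpt above does not show the root), as is storing `<default>` XML-escaped so that the sample parses; the sample data is trimmed from two of the entries above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed sample based on the journal excerpt above. The <Deployments> root and
# the escaped &lt;default&gt; are assumptions made so that the sample parses.
SAMPLE = r"""<Deployments>
  <Deployment RetentionPolicySet="Environments-742/Projects-1026/Step-Deploy a Package/Machines-1481/&lt;default&gt;" ExtractedTo="C:\Octopus\Applications\Cloud Poll\Dev\sleep\0.3_10" />
  <Deployment RetentionPolicySet="Environments-742/Projects-1026/Step-Deploy a Package - clone (1)/Machines-1481/&lt;default&gt;" ExtractedTo="C:\Octopus\Applications\Cloud Poll\Dev\sleep\0.3_13" />
</Deployments>"""


def folders_by_policy_set(journal_xml):
    """Group ExtractedTo folders by the RetentionPolicySet that recorded them."""
    root = ET.fromstring(journal_xml)
    owned = defaultdict(list)
    for deployment in root.iter("Deployment"):
        owned[deployment.get("RetentionPolicySet")].append(deployment.get("ExtractedTo"))
    return dict(owned)


if __name__ == "__main__":
    for policy_set, folders in folders_by_policy_set(SAMPLE).items():
        print(policy_set, "->", folders)
```

In a healthy journal, each folder shows up under exactly one retention policy set.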

It would seem that somehow your tentacle thinks that some of your packages belong to multiple steps, allowing it to remove them incorrectly.
The only way I’ve been able to reproduce this behaviour is by manually deleting some folders from the applications directory (e.g. C:\Octopus\Applications\Stage\convert-worker).
e.g.

I run a deployment with four deploy package steps which creates folders C:\Octopus\Applications\Cloud Poll\Dev\sleep\0.3_2 to C:\Octopus\Applications\Cloud Poll\Dev\sleep\0.3_5
I run this again, which creates the next four incremented folders - this sets up 0.3_2 - 0.3_5 to be deleted on the next run
I manually delete C:\Octopus\Applications\Cloud Poll\Dev\sleep\0.3_3
I run the deployment, and the first step uses the lowest available increment (0.3_3)
This results in 0.3_3 existing in the deployment journal twice, once for step 2 and once for step 1
When the retention policy runs, it deletes 0.3_3 because the journal entry for step 2 says it is outside the retention policy; it isn’t able to check that the package also belongs to step 1
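The sequence above can be mimicked with a small Python simulation. This is only a model of the behaviour described in this post, not Octopus code: the "lowest unused suffix" rule, the function names, and the in-memory journal and disk are all assumptions made for illustration.

```python
def next_folder(version, disk):
    """Return the lowest unused '<version>_<n>' name (assumed naming rule)."""
    n = 1
    while f"{version}_{n}" in disk:
        n += 1
    return f"{version}_{n}"


def deploy(journal, disk, steps, version="0.3"):
    """Each step extracts to a fresh folder and records (step, folder) in the journal."""
    for step in steps:
        folder = next_folder(version, disk)
        disk.add(folder)
        journal.append((step, folder))


def shared_folders(journal):
    """Folders recorded in the journal under more than one step."""
    owners = {}
    for step, folder in journal:
        owners.setdefault(folder, set()).add(step)
    return {folder: steps for folder, steps in owners.items() if len(steps) > 1}


def simulate():
    steps = ["step 1", "step 2", "step 3", "step 4"]
    disk = {"0.3_1"}             # an earlier folder already exists, so the first run starts at 0.3_2
    journal = []
    deploy(journal, disk, steps)  # creates 0.3_2 to 0.3_5
    deploy(journal, disk, steps)  # creates 0.3_6 to 0.3_9
    disk.remove("0.3_3")          # manual deletion; the journal is not updated
    deploy(journal, disk, steps)  # step 1 reuses 0.3_3
    return shared_folders(journal)


if __name__ == "__main__":
    print(simulate())
```

In this model, `0.3_3` ends up recorded for both step 1 and step 2, which is the duplicate that lets the retention run for step 2 remove a folder step 1 has just deployed.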

Is it possible that something like this could have occurred on your tentacle? That some of these folders may have been manually deleted at some point?

One possible workaround for this would be to change your tentacle retention policy to use one day instead of one release. This should keep the folder size relatively small without the risk of deleting packages from the same deployment.

I will also pass my findings to our engineers to see if we can adjust this process to account for folders existing within the deployment journal more than once.

Regards,
Paul

hallgeir.osterbo · 10 August 2022 10:05

Hi @paul.calvert ,

This is very interesting. And I think you may have hit the nail on the head here!

On our dev environments, we do have some extra cleanup that we do, because we’ve had issues with space being filled up on our deployment targets, especially when there’s a bunch of failed deploys (then it seems the retention steps are not run, at least that used to be the case). That means that if we had introduced some issue that failed the deployment, and it ran e.g. across the weekend (multiple deploys per day), the disk would be full when we were back. So to avoid that we have a runbook that runs nightly that deletes all folders under C:\Octopus\Applications that are not used by any of our services.

What would be your advice here? Is there some other way we can ensure the retention policy is applied when we have e.g. a series of failing deployments, so that we don’t have to keep running that cleanup runbook? Or would the best plan be to also delete the Deployment nodes that contain the deleted folders from DeploymentJournal.xml when deleting the folders, to prevent this?

Sincerely
Hallgeir

paul.calvert · 10 August 2022 10:37

The easiest options here would be to either increase the number of releases to be kept slightly to mitigate the problem or switch it to 1 day instead of 1 release, which should circumvent the problem entirely.

I wouldn’t want to advise manually editing the deployment journal for this task. Any errors in that file could have a much larger impact.
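If it helps to gauge how many journal entries are affected before choosing an option, a read-only check along these lines could list entries whose folder no longer exists on disk. This is a hedged Python sketch, not an Octopus tool: it never writes to the journal, the function name is hypothetical, and you would need to supply the journal contents yourself, since the file location on your Tentacle is not stated in this thread.

```python
import os
import xml.etree.ElementTree as ET


def dangling_entries(journal_xml, folder_exists=os.path.isdir):
    """Return (RetentionPolicySet, ExtractedTo) pairs whose folder is missing.

    Read-only: parses the journal text and checks each folder, nothing is modified.
    folder_exists is injectable so the check can be tried without touching a real disk.
    """
    root = ET.fromstring(journal_xml)
    missing = []
    for deployment in root.iter("Deployment"):
        folder = deployment.get("ExtractedTo")
        if folder and not folder_exists(folder):
            missing.append((deployment.get("RetentionPolicySet"), folder))
    return missing
```

Entries reported here are the ones whose folder names could be reused by a later deployment.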

This would be a more complicated option, but you could look at using Custom Installation Directories within your deployment steps. This would result in your applications running from a different location than the applications folder, making them immune to any deletions from retention policies.

The only other solution would be to increase the disk size to accommodate a weekend of failed deploys until you can successfully run them and have the retention trigger.

hallgeir.osterbo · 10 August 2022 12:14

Ok, then I think we can solve it. We’ll look into your proposed solutions on our end. Thanks a lot for your help on this!

paul.calvert · 10 August 2022 12:24

No problem.

As mentioned, I have passed this along to our engineers for review. It would be nice if we could check the deployment journal to ensure that we don’t create a folder that already exists, but I’m unsure how feasible that is (or how high up the priority list it would get). If they commit to making any changes in this area, we’ll update this thread with details.

hallgeir.osterbo · 10 August 2022 12:35

That’s great! Thanks a lot for that
