Getting frequent errors from Octopus during deployment, why?

during our deployments, on steps that run a powershell script on a server, we often (once or twice in each deployment) get an error back from a tentacle that says:

The actor ProcedureCallOrchestrator-ATk-AeOL8gxntA@SQ-ENCSTAGEBAT01-0CEB3E41 cannot handle failure Pipefish.Messages.Timing.SetTimeoutCommand: The message could not be handled
System.IO.IOException: Unable to move the replacement file to the file to be replaced. The file to be replaced has been renamed using the backup name.

telling octo to retry usually works, but why is this happening so often?

Hi Mike,

Thanks for getting in touch!

This may be due to a permission error or an issue with antivirus or something similar.

Is there an error message that immediately precedes this one?

Damo

sometimes the error happens repeatedly, requiring several retries before it resumes working properly. I’m attaching a file with the full text of the error, the same output in verbose mode, and the raw log.

it actually occurred three times during this deployment

error1.txt (2 KB)

verbose.txt (8 KB)

ServerTasks-18588.log.txt (287 KB)

Hi Mike,

Thanks for the reply.

We sometimes see this problem when monitoring or anti-virus software is running on the machine. It can lock files that Octopus needs to write to so that tasks can persist if the machine restarts.

Do you have any anti-virus or monitoring software that could be locking files by reading/scanning them? If so, are you able to disable it to test, or exclude the Octopus installation folders from the scans?

Damo

we have set our antivirus to exclude the c:\octopus folder on all servers. these errors continue to occur.

any other thoughts? thanks.

Hi Mike,

Can you try deleting the C:\Octopus\Tentacle\Actors\Clock.pfa file? There may be some corruption of permissions. The Tentacle should be able to recreate it.

Can you also tell me whether there are multiple deployments to that tentacle or whether you have steps running in parallel on that machine? Either of these could potentially cause a clash when accessing that file.

Thanks,
Damo

on most of our servers, we have multiple tentacles installed, one for each tenant (currently 9), as well as the original default tentacle.

when we do an install, yes, things are often operating in parallel on different tentacles.

Hi Mike,

Thanks for the reply.

Multiple Tentacle instances on the same machine shouldn’t clash with each other. A clash should only happen if a single Tentacle is running multiple things in parallel.

Can I ask whether your deployment process has any steps that are set to run in parallel that will target the same tentacle instance?
For example, you may have a nuget package step and a powershell step that both run on machines with the same role and are set to run in parallel. That could be a potential cause of a clash as both tasks will try to run at the same time on each tentacle.

Damo

no, we don’t do any parallel execution with the same tentacle.

it just seems to be a rather fundamental problem if Octopus is having problems maintaining its own files, the ones it uses to track its own work. and it happens reasonably frequently for us: as I said, at least once, and usually several times, per deployment on our production network.

as an aside, we do not get this problem on our QA network. I don’t know the details, but I know fewer network policies and such are applied on the QA network.

Since it doesn’t appear to be anti-virus caused, are you aware of anything else, like network policy settings, that might be interfering with Octopus’ operation?

Hi Mike,

Apologies for the delay in responding. This slipped through my net and I didn’t see your response.

In general, the only issues we have like this are caused by other processes locking the file when the Tentacle is working with it. Usually it’s antivirus software or monitoring.

As long as a Tentacle is doing only one task at a time, it should have extremely predictable control over that file, and no other Tentacles should interfere with it (they all have their own Clock.pfa).

If it’s happening in your production environment but not QA, you may have to have a deep look at the differences between those environments, in particular any monitoring or auditing tools.
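
As a rough way to compare the two environments, something like the following Python sketch could be run against the file on a production server and a QA server while a deployment is in progress. It is only an illustration under assumptions: `probe_file_access` is a made-up helper, not part of Octopus, and it simply tries to open the file for read/write a few times and collects any errors. It only catches the case where another process has opened the file without sharing it, and it cannot tell you which process that is.

```python
import time


def probe_file_access(path, attempts=5, delay=0.2):
    """Try to open `path` for read/write `attempts` times.

    Returns the list of OSError exceptions raised along the way.
    An empty list means every attempt succeeded.
    Hypothetical helper for illustration; not an Octopus tool.
    """
    failures = []
    for _ in range(attempts):
        try:
            # "r+b" opens an existing file for reading and writing
            # without truncating it, so the contents are left alone.
            with open(path, "r+b"):
                pass
        except OSError as exc:
            # On Windows a file held exclusively by another process
            # typically surfaces here as PermissionError.
            failures.append(exc)
        time.sleep(delay)
    return failures
```

For example, `probe_file_access(r"C:\Octopus\Tentacle\Actors\Clock.pfa", attempts=50)` returning a non-empty list on production but an empty one on QA would suggest something else on the production machine is holding the file.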

Thanks,
Damian Brady

the problem was caused by the fact that we had multiple tentacles installed on the server to allow parallel deployments, but because they had been installed with an early version of the tentacle installer, all tentacles were pointing to the SAME actors folder. this caused lots of file access problems as each tentacle fought for the same clock files, and others.

we have reconfigured these servers’ tentacles to use separate folders, and the problem has gone away.
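
for anyone wanting to check their own servers for the same thing, here is a minimal Python sketch of the check. it assumes you have already collected each tentacle instance's actors folder by hand; the instance names and the TenantB path below are made up for illustration, and `find_shared_folders` is just a helper written for this post, not an Octopus command. it does not read any Octopus configuration itself.

```python
import ntpath
from collections import defaultdict


def find_shared_folders(instance_dirs):
    """Return {normalized_folder: [instance names]} for every folder
    that more than one Tentacle instance points at.

    `instance_dirs` maps an instance name to its actors folder path.
    Paths are compared Windows-style: case-insensitive, with slashes
    and trailing separators normalized.
    """
    by_folder = defaultdict(list)
    for name, folder in instance_dirs.items():
        key = ntpath.normcase(ntpath.normpath(folder))
        by_folder[key].append(name)
    # Keep only the folders claimed by two or more instances.
    return {
        folder: sorted(names)
        for folder, names in by_folder.items()
        if len(names) > 1
    }


if __name__ == "__main__":
    # Hypothetical instance names and paths, for illustration only.
    instances = {
        "Tentacle": r"C:\Octopus\Tentacle\Actors",
        "TenantA": "c:/octopus/tentacle/actors/",
        "TenantB": r"C:\Octopus\TenantB\Actors",
    }
    for folder, names in find_shared_folders(instances).items():
        print(folder, "is shared by", ", ".join(names))
```

any folder it reports is one where two or more tentacles would be competing for the same files, which is the situation we had.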

Hi Mike,

Thanks for letting us know, and that’s a good catch!

I’m glad you got it sorted.

Kind Regards,
Damo