Tentacle gets stuck waiting on a non-existent queued job

Occasionally during our deployments, a deployment gets stuck on a step, with Octopus reporting that deployment-related work is already in progress on the machine and that the current work has been added to the queue. But there is no other work being done, and the job just sits there forever.

To get things moving, we've had to cancel the deployment, stop the Tentacle service, clear out the c:\octopus\tentacle\actors folder, and restart the Tentacle service. Then we re-run the deployment, starting at the step where it had hung.
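For reference, the manual recovery is roughly this PowerShell sketch (the service name and paths assume a default single-instance install; adjust them for your environment):

# Cancel the stuck deployment in the Octopus web portal first.
$service    = 'OctopusDeploy Tentacle'       # assumed Windows service name
$actorsPath = 'C:\Octopus\Tentacle\actors'   # the queued-work/actor state folder

Stop-Service -Name $service

# Clear out the actors folder so the stale queue entry is discarded.
Remove-Item -Path (Join-Path $actorsPath '*') -Recurse -Force

Start-Service -Name $service

# Then re-run the deployment, starting at the step that hung.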

  1. Why is it doing this?
  2. What can I do to get the deployment working again without stopping and restarting the deployment?

I have attached a raw log of a deployment where this (and other problematic errors) occurred.

ServerTasks-18896.log.txt (232 KB)

Hi Mike,

Thanks for getting in touch! It appears you are hitting a conflict caused by trying to run multiple tasks on the one Tentacle without the bypass set.
This is the last message in your logs:
18:28:41 Info | Another deployment-related activity is in progress on this machine (Running script '# clean out Past fol...'); the request has been added to a queue. This is a safety feature to prevent multiple deployments from running at the same time and causing conflicts. To override this behavior, create a project variable with the name 'OctopusBypassDeploymentMutex' and a value of 'True' and create a new release.
So it appears your deployment isn’t hanging so much as queued and unable to recover.
You will need to do the following: http://docs.octopusdeploy.com/display/OD/Run+multiple+processes+on+a+Tentacle+Simultaneously
If you have more than one deployment running to the machine(s) in question, you will need to add the variable to all of those projects.

Let me know how that goes
Vanessa

The point is that there ISN'T another task trying to run on that Tentacle.

The script starting with "# clean out Past folder…" had completed a long time prior. There is only the one deployment running when this error occurs. It's intermittent; most of the time it runs clean with no phantom task blocking things. But on occasion, something "sticks" in the Tentacle's task queue, and it thinks it is still working on a prior step when it actually isn't.

So, how can I clear things so that the Tentacle gives up waiting for this non-existent task to complete and moves on to the next REAL step?

Hi Mike,

Sorry for not getting back to you sooner; this got moved out of my queue accidentally.
To find out what is running at the same time and causing this block, you could set up more detailed logging on the Tentacle so we can see what is running in parallel.

  1. Stop the Tentacle service.
  2. Under c:\Program Files\Octopus Deploy\Tentacle, find the NLog.config file and change:
     from: <logger name="*" minlevel="Info" writeTo="octopus-log-file" />
     to:   <logger name="*" minlevel="Trace" writeTo="octopus-log-file" />
  3. Restart the Tentacle service.

After this error appears, grab the latest Tentacle logs from c:\Octopus\Logs and send them over.

You do not want this logging turned on for long, as it could affect performance, so enable it around a deployment where you predict this error will happen.
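If it helps, here is a rough PowerShell sketch of those steps (it assumes the default install path and the default instance's service name; adjust both for named instances):

# Assumed default install path and service name -- adjust for named instances.
$nlogConfig = 'C:\Program Files\Octopus Deploy\Tentacle\NLog.config'
$service    = 'OctopusDeploy Tentacle'

$from = '<logger name="*" minlevel="Info" writeTo="octopus-log-file" />'
$to   = '<logger name="*" minlevel="Trace" writeTo="octopus-log-file" />'

Stop-Service -Name $service

# Switch the octopus-log-file logger from Info to Trace
# (literal string replace, so the "*" needs no regex escaping).
(Get-Content $nlogConfig -Raw).Replace($from, $to) | Set-Content $nlogConfig

Start-Service -Name $service

# Remember to put the level back to Info once the error has been captured.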

Vanessa

Here's another raw log. See the last two steps; they might give a clue.

ServerTasks-20929.log.txt (298 KB)

Hi Mike,

I am still going to need the Tentacle logging change and the Tentacle logs. I would need them for all of the instances on that machine to determine what other process is running at the same time and causing the lock.
I think this should also help answer the other ticket you have open, so let's treat them both the same and use this ticket as the master.

Vanessa

Unfortunately, we have lots of servers (about 300) and multiple tenant Tentacles per server (up to 9 on some), and no way to predict which machine or Tentacle is going to encounter this, so changing the logging setting on so many Tentacles is really not feasible.

There is one server where it has happened twice, but it's not isolated to just that server, so that could be a coincidence. Still, once we're past this weekend's major release, I'll change the logging settings on that server and try to trigger it.

The problem was caused by the fact that we had multiple Tentacle instances installed on the server to allow parallel deployments, but because they had been installed with an early version of the Tentacle installer, all of the Tentacles were pointing to the SAME actors folder. This caused lots of file access problems, as each Tentacle fought over the same clock file, mutex file, and others.

We have reconfigured these servers' Tentacles to use separate folders, and the problem has gone away.
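For anyone who runs into the same thing, the reconfiguration was along these lines (the instance names and home folders below are illustrative, and the Tentacle.exe options are best verified against your Tentacle version):

# Give each Tentacle instance its own home folder, so each gets its own actors folder.
# Instance names and paths here are examples only.
$tentacleExe = 'C:\Program Files\Octopus Deploy\Tentacle\Tentacle.exe'

$instances = @{
    'TenantA' = 'C:\Octopus\TenantA'
    'TenantB' = 'C:\Octopus\TenantB'
}

foreach ($name in $instances.Keys) {
    & $tentacleExe service   --instance $name --stop
    & $tentacleExe configure --instance $name --home $instances[$name]
    & $tentacleExe service   --instance $name --start
}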

Hi Mike,

That is excellent news, thanks for letting us know the solution.

Vanessa