Make Octopus Tentacles more reliable

SKB_Kontur · 7 November 2016 09:09

Hi guys,

We’ve been using Octopus Deploy for a year and a half. From time to time the Octopus Tentacle service stops. When this happens, the Tentacle target becomes inactive and unreachable, and the only way to bring it back to life is to connect to the machine via RDP and restart the service.

Fortunately, we’ve found a solution. We created two tasks in Task Scheduler on each Tentacle target. These tasks run the following commands:

Get-Service "OctopusDeploy Tentacle" | ForEach-Object -Process { sc.exe config $_.Name start= delayed-auto }
Get-Service "OctopusDeploy Tentacle" | ForEach-Object -Process { sc.exe failure $_.Name reset= 86400 actions= restart/5000/restart/5000/""/ }

These tasks solve two issues:

The first sets the service to delayed automatic start, so it doesn’t fail when Windows starts.
The second makes the service restart when it fails.
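The two scheduled-task commands could also be generated from a script, for example as part of a Tentacle install. The following is a minimal Python sketch, not something from the thread: it assumes Python is available on the target and that the service is named "OctopusDeploy Tentacle" as in the commands above, and the helper names and the `--apply` switch are hypothetical. It builds the same `sc.exe` argument lists (reset period in seconds, restart delays in milliseconds) and only prints them unless `--apply` is passed. It emits restart actions only, so it does not reproduce the empty third action in the original `actions=` string.

```python
import subprocess
import sys

# Assumption: the default Tentacle service name used in the commands above.
# Named Tentacle instances may be registered under a different service name.
SERVICE = "OctopusDeploy Tentacle"


def delayed_start_command(service: str) -> list[str]:
    """Build: sc.exe config <service> start= delayed-auto"""
    # sc.exe expects "start=" and its value as two separate arguments.
    return ["sc.exe", "config", service, "start=", "delayed-auto"]


def restart_on_failure_command(service: str, reset_seconds: int,
                               delays_ms: list[int]) -> list[str]:
    """Build: sc.exe failure <service> reset= <s> actions= restart/<ms>/..."""
    if not delays_ms:
        raise ValueError("at least one restart delay is required")
    # Each failure action is an "action/delay" pair; pairs are joined with "/".
    actions = "/".join(f"restart/{ms}" for ms in delays_ms)
    return ["sc.exe", "failure", service,
            "reset=", str(reset_seconds), "actions=", actions]


def apply(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command (dry run) or execute it and fail on a non-zero exit."""
    for cmd in commands:
        if dry_run:
            # list2cmdline quotes arguments containing spaces, as Windows expects.
            print(subprocess.list2cmdline(cmd))
        else:
            # Changing service configuration typically needs an elevated prompt.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    commands = [
        delayed_start_command(SERVICE),
        # Reset the failure count after one day; restart 5 s after the
        # first and second failures.
        restart_on_failure_command(SERVICE, 86400, [5000, 5000]),
    ]
    apply(commands, dry_run="--apply" not in sys.argv)
```

Run it without arguments to print the commands for review, and with `--apply` from an elevated prompt on the Tentacle host to execute them. A one-minute retry interval like the one described later in the thread could be expressed as `restart_on_failure_command(SERVICE, 86400, [60000, 60000, 60000])`.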

It seems these tricks could be useful for many people using Octopus Deploy. Could you include them in an upcoming release?

Thank you in advance!

Kind regards,
Denis Titusov

StephenS · 7 November 2016 14:03

We had the same problem and did something similar. We set the on-failure action to keep trying to start the service every minute if it failed to start. We used Octopus’s Script Console feature to push this out to all the Tentacles that were already installed, and our Tentacle install script now applies the same setting after it installs the Tentacle.

Vanessa_Love · 8 November 2016 04:55

Hi Denis and Stephen,

Thanks for the information; it will be very beneficial to Octopus users.
One question: does this only happen after a Windows restart, or does the Tentacle service also stop without one?

Vanessa

SKB_Kontur · 8 November 2016 09:32

Hi Vanessa,

It happens both when Windows starts and after Tentacle has been running for a while. To be clear, it happens rarely, but we would like to reduce it to zero.

Kind regards,
Denis Titusov

StephenS · 8 November 2016 13:50

Vanessa,

We usually see it after a server reboot. The Event Viewer log says something along the lines of “The service failed to start in the time allotted (30000ms)”.

We have also seen it crash without a server reboot, but that may have had other causes, such as the virtual drive it was installed on being pulled out from under it.

Since we implemented the restart-on-failure settings we have not seen the issue again, though it’s only been a week or so.

Vanessa_Love · 11 November 2016 22:58

Hi Stephen and Denis,

Thanks so much for the info. So it seems to happen mostly around reboots, which is something we are continually trying to understand and fix (it has been through four developers here so far).

However, the occasional crashing does need to be resolved. It would be great if you could pick a couple of Tentacles (ideally any that go down more consistently) and turn on crash dumping for the service, so that hopefully we can stop it from happening. http://docs.octopus.com/display/OD/Capture+a+crash+dump

Vanessa

SKB_Kontur · 15 November 2016 10:45

Hi Vanessa,

Thank you for your response!
We will capture a dump and send it to you as soon as the issue occurs again.

Kind regards,
Denis Titusov

StephenS · 15 November 2016 14:16

Vanessa,

Sadly, it’s not as simple for us. It seems to occur randomly across all of our agents, so there is no way for me to predict when, or on which agent, the problem will appear.

Vanessa_Love · 15 November 2016 22:55

Hi Stephen,

Yes, I thought that would be the case. My hope was more that you could turn on crash dumping for a few Tentacles and leave it on, in case one of them eventually crashes. When the problem appears random and cannot be predicted, that’s the best you can ask for. Turning on the feature does not affect performance at all.

Vanessa