Post-upgrade failures in Octopus Tentacle health check

When we initiate an upgrade of tentacles via the Octopus Portal, we’ve seen the tentacles go offline and not come back up on their own. Usually, at least 30% of our tentacles do not come back up. When we query the service in these instances, we find that its status is stopped and that we can start it without any problem.

We’ve experienced this with at least three separate upgrades, including one we executed yesterday, from 1.0.26.1320 to 1.0.30.1340. The upgrade task itself completed with “success”, but all subsequent health checks failed, according to our Tasks history. On one server whose tentacle service was not running after the upgrade, we saw this in the event logs:

2012-08-28 18:33:25,104 [1] ERROR Octopus [(null)] - System.InvalidOperationException: Cannot stop Octopus Tentacle service on computer '.'. ---> System.ComponentModel.Win32Exception: The pipe has been ended
   --- End of inner exception stack trace ---
   at System.ServiceProcess.ServiceController.Stop()
   at Octopus.Shared.Startup.ServiceInstaller.StopAndWaitForStop(String name) in c:\BuildAgent\work\7bf5272a44079f5\source\Octopus.Shared\Startup\ServiceInstaller.cs:line 597
   at Octopus.Shared.Startup.ServiceInstaller.Restart(String serviceName) in c:\BuildAgent\work\7bf5272a44079f5\source\Octopus.Shared\Startup\ServiceInstaller.cs:line 43
   at Octopus.Shared.Startup.CommandProcessor.Process(String[] args) in c:\BuildAgent\work\7bf5272a44079f5\source\Octopus.Shared\Startup\CommandProcessor.cs:line 40

Six non-error Octopus events were recorded immediately before this one:

2012-08-28 18:32:10,170 [27] INFO  Octopus [(null)] - Deploying package: 'Octopus.Tentacle', version '1.0.30.1340'
2012-08-28 18:32:21,301 [4] INFO  Octopus [(null)] - Deleting package: E:\Octopus\Applications\.Tentacle\Octopus.Tentacle\1.0.30.1340\Octopus.Tentacle.1.0.30.1340.nupkg
2012-08-28 18:32:21,302 [4] INFO  Octopus [(null)] - Looking for any configuration transformation files
2012-08-28 18:32:21,304 [4] INFO  Octopus [(null)] - Looking for appSettings and connectionStrings in any .config files
2012-08-28 18:32:21,974 [4] INFO  Octopus [(null)] - Calling PowerShell script: 'E:\Octopus\Applications\.Tentacle\Octopus.Tentacle\1.0.30.1340\Deploy.ps1'
2012-08-28 18:32:58,228 [4] INFO  Octopus [(null)] - Octopus agent is shutting down as part of an upgrade.

Is there an action we could take that would prevent this error? At the moment, an upgrade of tentacles means we need to budget time to manually start the tentacle service on several servers.

Thanks.
Lisa
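
Editor's sketch: the manual check described above (query the service, find it stopped, start it) could be scripted along these lines. This is only an illustration, not something from the thread. The service name is an assumption taken from the wording of the log message, the server name in the usage note is made up, and the parsing assumes the English output of `sc query`.

```python
import re
import subprocess

# Assumption: guessed from the log text "Cannot stop Octopus Tentacle service";
# check the real service name on your own servers.
SERVICE_NAME = "Octopus Tentacle"


def parse_state(sc_output):
    """Return the state word (e.g. 'STOPPED', 'RUNNING') from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def ensure_running(server, run=subprocess.run):
    """Query the service on `server` and start it if it reports STOPPED.

    `run` is injectable so the logic can be exercised without a Windows host.
    """
    target = "\\\\" + server  # sc expects the server as \\name
    result = run(["sc", target, "query", SERVICE_NAME],
                 capture_output=True, text=True)
    state = parse_state(result.stdout)
    if state == "STOPPED":
        run(["sc", target, "start", SERVICE_NAME],
            capture_output=True, text=True)
        return "started"
    return state
```

Calling `ensure_running("web01")` for each server after an upgrade would replace the manual pass; `web01` is a placeholder name.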

Hi Lisa,

Thanks for letting me know. In the next release I’ll add some logic so that Tentacles try harder when stopping/starting (as this is probably a timing issue - services stopping or starting too quickly), and I’ll add a policy so that the tentacle service automatically starts up again if it crashes.

Paul
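
Editor's sketch: one way to picture the "try harder when stopping/starting" idea is a restart loop that waits for each state change and retries the cycle when a stop or start call fails. This is not Tentacle's actual implementation. The `stop`, `start`, and `get_state` callables, the timeouts, and the back-off are all hypothetical and would need to be wired to whatever service API is in use.

```python
import time


def wait_for_state(get_state, wanted, timeout=30.0, poll=0.5, sleep=time.sleep):
    """Poll get_state() until it returns `wanted` or `timeout` seconds pass."""
    waited = 0.0
    while True:
        if get_state() == wanted:
            return True
        if waited >= timeout:
            return False
        sleep(poll)
        waited += poll


def restart_with_retries(stop, start, get_state, attempts=3, sleep=time.sleep):
    """Stop then start a service, retrying the cycle when a step fails.

    Returns the attempt number that succeeded, or raises RuntimeError.
    """
    for attempt in range(1, attempts + 1):
        try:
            # Skip the stop call if an earlier attempt already stopped the service.
            if get_state() != "STOPPED":
                stop()
                if not wait_for_state(get_state, "STOPPED", sleep=sleep):
                    raise RuntimeError("service did not stop in time")
            start()
            if wait_for_state(get_state, "RUNNING", sleep=sleep):
                return attempt
        except (OSError, RuntimeError):
            pass  # treat as transient, e.g. the control pipe closing mid-stop
        if attempt < attempts:
            sleep(2 * attempt)  # simple linear back-off before the next try
    raise RuntimeError("service did not come back after %d attempts" % attempts)
```

The key design choice is checking the actual service state before each step rather than trusting the result of the previous call, so a stop call that fails after the service has already gone down does not leave it stopped.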

Thanks, Paul.

This might be relevant: before our most recent upgrade, we had modified the failure configuration of all tentacle services, so that the failure action was to restart the service after 60 seconds (for 1st, 2nd, and subsequent failures) - but this didn’t seem to help with our last upgrade.
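
Editor's sketch: the failure configuration described above (restart after 60 seconds for the first, second, and subsequent failures) corresponds to the Windows `sc failure` command, and the helper below builds that command line. The service name and the one-day reset period are assumptions, since the thread does not state them. One possible reason the setting did not help: Windows applies recovery actions when a service terminates without reporting a stopped status, so a service that was stopped cleanly during the upgrade and never started again would not count as a failure.

```python
def recovery_command(service_name, delay_ms=60000, reset_seconds=86400):
    """Build an `sc failure` command that restarts the service after each failure.

    delay_ms: wait before each restart (60000 ms = the 60 seconds mentioned above).
    reset_seconds: failure-count reset period; 86400 is an assumed value.
    """
    # Three actions cover the first, second, and subsequent failures.
    actions = "/".join(["restart/%d" % delay_ms] * 3)
    return ["sc", "failure", service_name,
            "reset=", str(reset_seconds),
            "actions=", actions]


# Assumed service name, taken from the wording of the log message.
print(" ".join(recovery_command('"Octopus Tentacle"')))
```

The list form can be passed straight to `subprocess.run` on the target server (without the extra quotes around the name); `sc` expects the value as a separate token after `reset=` and `actions=`, which is why they are separate list items.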

A follow-up: we haven’t seen this issue in recent months, through a handful of upgrades. Thanks.