When a deployment gets stuck for some unknown reason, I would like the ability to cancel that deployment. That deployment would be marked as “cancelled” or “failed”. Any operations that Octopus and the Tentacles were performing for that deployment would also be aborted.
I am using Topshelf to install/uninstall/start/stop my Windows services and this past weekend I had an issue with my Deploy.ps1 script. Due to no fault of Octopus, the installation of the Windows service hung and I had to kill my service process manually. Unfortunately, there was nothing to tell Octopus that the install failed, and it just got stuck in the “in progress” state. I had to manually change the Task table in order to show the task as “Failed”. I also restarted the Octopus and Tentacle services to be safe since I had no idea what state they were currently in.
Good suggestion. Tasks do actually have cancellation abilities, but I haven’t implemented a UI button for it yet.
‘In Progress’ tasks will automatically ‘time out’ to a failed state if the deployment falls over (either an Octopus or Tentacle crash) - I think it is set to 5 minutes. Editing the task in the database shouldn’t strictly have been necessary, though I know it wasn’t clear from the UI.
Thanks for the quick response! What do you classify as an Octopus or Tentacle crash? Those services just stopping entirely? Is that 5 minute timeout after the last update (so if something goes wrong 3 minutes in, will it fail at the 8 minute mark)? In my case, when the Deploy.ps1 ran the command “Project.Service.exe install”, there was a problem (with it, not Octopus) and it just hung there. There was nothing I could do besides manually kill that .exe process. Having a UI for cancelling those tasks would be awesome.
While the Octopus process is executing the task, it updates the task in the database every couple of seconds by flushing the log and setting an update time.
If the Tentacle service crashes, the Octopus will usually notice that and log it, before failing the deployment.
If the Octopus service crashes, the last update time on the task won’t get updated. Once the Octopus UI notices the task is ‘in progress’ but hasn’t been updated for 5 minutes, the UI itself will mark the task as failed. (So the calculation is (Now - LastProgressUpdate) > 5 mins)
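To make that calculation concrete, here is a rough Python sketch of the staleness check. It is not the actual Octopus code; the function names and state strings are made up for illustration, and the 5 minute threshold is the figure quoted above.

```python
from datetime import datetime, timedelta

# Threshold quoted in this thread; the real value is an assumption.
STALE_AFTER = timedelta(minutes=5)


def is_stale(last_progress_update, now, stale_after=STALE_AFTER):
    """Return True when (now - last_progress_update) > stale_after."""
    return (now - last_progress_update) > stale_after


def resolve_state(state, last_progress_update, now):
    """Hypothetical helper: treat a stale 'in progress' task as 'failed'."""
    if state == "in progress" and is_stale(last_progress_update, now):
        return "failed"
    return state


if __name__ == "__main__":
    started = datetime(2012, 1, 1, 12, 0, 0)
    # Last heartbeat written 3 minutes into the deployment.
    last_update = started + timedelta(minutes=3)
    for minutes in (7, 9):
        now = started + timedelta(minutes=minutes)
        print(minutes, resolve_state("in progress", last_update, now))
```

Under these assumptions, a task whose last update landed 3 minutes in would still read as in progress at the 7 minute mark and would be marked failed once the clock passes 8 minutes, because the window is measured from the last update rather than from the start of the deployment.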
In the case of the call to InstallUtil, Tentacle actually calls it with a timeout (all external processes are launched with a timeout) - this is 10 minutes by default. If InstallUtil hadn’t returned after 10 minutes, Tentacle would have aborted InstallUtil.exe and returned to the Octopus.
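The general pattern of launching an external process with a timeout looks roughly like the Python sketch below. This is only an illustration of the idea, not how Tentacle is implemented; `run_with_timeout` is a hypothetical helper, and the 10 minute default simply mirrors the figure mentioned above.

```python
import subprocess
import sys

# Default mirrors the 10 minutes mentioned in this thread (an assumption).
DEFAULT_TIMEOUT_SECONDS = 10 * 60


def run_with_timeout(command, timeout_seconds=DEFAULT_TIMEOUT_SECONDS):
    """Run command and return (exit_code, timed_out).

    If the process has not exited within timeout_seconds, it is killed
    and (None, True) is returned so the caller can fail the step.
    """
    process = subprocess.Popen(command)
    try:
        exit_code = process.wait(timeout=timeout_seconds)
        return exit_code, False
    except subprocess.TimeoutExpired:
        process.kill()   # abort the hung process
        process.wait()   # reap it so no zombie is left behind
        return None, True


if __name__ == "__main__":
    # Simulate a hung installer with a child process that sleeps too long.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(hung, timeout_seconds=1))
```

Note that this sketch only kills the process it launched; any child processes that process started would need separate handling.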
So the system is pretty self-healing; it just needs time for the various timeouts to trigger. Having an explicit cancellation button that aborts the task in a safe way would make for a better UX.
That’s awesome! I didn’t know about the 10 minute timeout for InstallUtil. In addition to the explicit cancellation button, I think having all that information from your reply in a “What happens when something goes wrong” knowledge base article would be valuable for everyone.