One of our deployments hangs intermittently, and it’s always on the same project. This particular deployment is triggered automatically when TeamCity builds our develop branch (with a 5-minute quiet period). Our developers typically push commits to that branch 5-10 times a day, so this deployment to our DEV platform is our most frequent deployment.
Because it is intermittent (on 17th Oct 2018 it was cancelled after 16 hours, and on 12th Oct 2018 after 3 days; no earlier occurrences are shown in the Task history), I’ll try to give as much detail as I can.
The project being deployed consists of an IIS web site with another application inside it. The deployment seems to hang while creating that IIS application.
On this occasion I just cancelled the stalled deployment task (and one other that was queued behind it) and clicked Try Again. On other occasions I’ve resorted to restarting the Tentacle. One time, when that didn’t work and restarting Octopus Server didn’t work either, I found that rebooting the target server solved the issue.
The occasion when I rebooted the server also coincided with our server backup process hanging on that server, so I suspect a Windows issue rather than an Octopus issue. The stalled deployments have all been to the same server, running Windows Server 2016.
We are using version 2018.8.8. Previous occurrences were on version 2018.8.4.
Snippet of the deployment output:
17:26:26 Info | “H5” does not exist. Creating Web Application pointing to IIS:\Sites\ClaimControl-DEV-3.1.15\H5 …
17:26:26 Verbose | Acquired mutex Global\Octopus-IIS-Metabase-Mutex
09:18:31 Verbose | Process C:\Windows\system32\WindowsPowershell\v1.0\PowerShell.exe in D:\Alphatec\Apps\Octopus\Work\20181016162535-5631-814 exited with code -1
09:18:31 Verbose | Updating manifest with output variables
09:18:31 Verbose | Updating manifest with action evaluated variables
09:18:31 Error | The remote script failed with exit code -1
09:18:31 Error | The task was canceled
09:18:31 Verbose | The task was canceled
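In the excerpt above, the stall shows up only as the jump from 17:26:26 (right after the IIS metabase mutex was acquired) to 09:18:31. When sifting through longer task or Tentacle logs, a small script can point straight at the longest silence. The sketch below is a hypothetical helper (`longest_gap` is not part of Octopus). It assumes each line starts with an `HH:MM:SS` time followed by a level and a `|`, as in this excerpt, and that a time earlier than the previous one means the log rolled past midnight. Because only the time of day is logged, a stall longer than 24 hours (such as the 3-day one on 12th Oct) would be under-reported.

```python
import re
from datetime import datetime, timedelta

# ASSUMED line format: "HH:MM:SS <Level> | <message>", as in the excerpt above.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+\w+\s+\|\s+(.*)$")


def longest_gap(log_text):
    """Return (gap, message_before, message_after) for the longest pause
    between consecutive timestamped lines; untimestamped lines are skipped.

    Assumes lines are in chronological order and that a time earlier than
    its predecessor means the log crossed midnight (one day is added).
    """
    previous = None              # (datetime, message) of the last parsed line
    day_offset = timedelta(0)
    best = (timedelta(0), None, None)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S") + day_offset
        if previous is not None:
            if stamp < previous[0]:          # wrapped past midnight
                day_offset += timedelta(days=1)
                stamp += timedelta(days=1)
            gap = stamp - previous[0]
            if gap > best[0]:
                best = (gap, previous[1], match.group(2))
        previous = (stamp, match.group(2))
    return best


# Abridged copy of the excerpt above.
excerpt = r"""17:26:26 Info | "H5" does not exist. Creating Web Application ...
17:26:26 Verbose | Acquired mutex Global\Octopus-IIS-Metabase-Mutex
09:18:31 Verbose | Process ... PowerShell.exe ... exited with code -1
09:18:31 Error | The task was canceled"""

gap, before, after = longest_gap(excerpt)
print(gap)     # 15:52:05
print(before)  # Acquired mutex Global\Octopus-IIS-Metabase-Mutex
```

Reporting the message before the gap is the useful part: here it names the last thing the script did (acquiring Global\Octopus-IIS-Metabase-Mutex) before going quiet.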
There’s nothing obvious in the Tentacle log file. Can you offer any advice on turning on more logging to try to track this down?
Thanks for getting in touch, and I’m sorry to hear that you are having these issues.
This has been reported to us previously; it seems to be a Windows issue, and the only way to resolve it is to reboot the physical server.
Next time a deployment hangs, you can log on to the Tentacle server, try running the following command, Get-WindowsFeature Web-WebServer -ErrorAction Stop, and see whether it completes successfully.
You can turn on trace logging by following the instructions here (but edit tentacle.exe.nlog rather than octopus.server.exe.nlog).
OK, it happened again this morning. My colleague cancelled the deployment and retried it, and it still hung on the same step. Unfortunately I hadn’t gotten around to setting the Tentacle log file level to Trace, but while the deployment was hung I ran the Get-WindowsFeature Web-WebServer -ErrorAction Stop command from PowerShell on the target machine and it returned immediately:
PS C:\Windows\system32> Get-WindowsFeature Web-WebServer -ErrorAction Stop
Display Name        Name             Install State
------------        ----             -------------
[X] Web Server      Web-WebServer        Installed
I cancelled the deployment again and retried it, and it was fine (suggesting that whatever was blocking it had gone away).
I’ve now set the Tentacle log to Trace level, so maybe we’ll be able to get more insight into what is going on next time it occurs.
In case it’s not clear, this is only a minor inconvenience for us, but I’d like to get to the bottom of it.
It has happened again today. I’ve got the logging in Trace mode now but couldn’t see anything obvious (I’m happy to send the logs over; I took a copy after cancelling, but before retrying).
This time the retried deployment hung at the same point.
I restarted the tentacle and tried again. This time it succeeded.
Is there anything more I can do to help determine what is causing this? Do we need to set the Octopus Server log to Trace mode too?
The next time it hangs (which unfortunately sounds like it’s happening quite frequently), can you grab a process dump of the Tentacle.exe process and upload it here? After you’ve uploaded the file it will appear to have disappeared, but this is only because that link has upload rights, not list/download rights. Hopefully the dump can shed some light on what is causing the deployment to hang.
It failed again this morning on the same step, but unfortunately my colleague had retried it before I read your reply. On retry it hung on a step that normally takes about 20 seconds, and it wasn’t the one I’d seen it fail on previously: it was the step before, which creates the IIS website.
I’ve uploaded tentacle.dmp using the link provided.
After rebooting the server, the deployment hung on the step it had originally hung on. I deleted the website in IIS Manager and tried again. The Deploy Landing Website step took 17 seconds (previously it had hung for 25 minutes) and the Deploy H5 step took 42 seconds, so I get the impression that our deployment is somehow leaving IIS in an invalid state.
Thanks for uploading the process dump for the Tentacle. I will have a look and see if it gives any clues as to what is causing the hanging.
A co-worker of mine said today that it would also be good if you could take a memory dump of the powershell.exe process that is running when the deployment hangs. This should help us track down which command is causing the process to hang.
You might need to use Process Explorer to find the PowerShell process that gets started by the Tentacle.
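Picking out the right powershell.exe by hand can be fiddly when several PowerShell sessions are open on the server. The sketch below is a hypothetical helper (`powershell_under` and the sample PIDs are invented for illustration) that walks a parent/child process snapshot and lists every powershell.exe descended from Tentacle.exe, together with its chain of ancestors. It assumes you have already collected (ProcessId, ParentProcessId, Name) triples, for example from the Win32_Process class via Get-CimInstance. The parentage in the sample (Calamari.exe as a direct child of Tentacle.exe) is an assumption; the walk handles any depth.

```python
from collections import defaultdict


def powershell_under(processes, root_name="Tentacle.exe"):
    """List every powershell.exe that descends from a process named root_name.

    processes: iterable of (pid, parent_pid, name) tuples, e.g. the
    ProcessId/ParentProcessId/Name properties of Win32_Process.
    Returns a list of (pid, [ancestor names..., "powershell.exe"]) sorted by pid.
    """
    processes = list(processes)
    children = defaultdict(list)  # parent pid -> child pids
    names = {}                    # pid -> image name
    for pid, parent_pid, name in processes:
        children[parent_pid].append(pid)
        names[pid] = name

    found = []
    seen = set()  # Windows reuses PIDs, so guard against parent-pointer cycles
    stack = [(pid, [name]) for pid, _, name in processes
             if name.lower() == root_name.lower()]
    while stack:
        pid, chain = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        for child in children.get(pid, []):
            child_chain = chain + [names[child]]
            if names[child].lower() == "powershell.exe":
                found.append((child, child_chain))
            stack.append((child, child_chain))
    return sorted(found)


# Hypothetical snapshot; PIDs and parentage are invented for illustration.
snapshot = [
    (4000, 1, "Tentacle.exe"),
    (4100, 4000, "powershell.exe"),
    (4200, 4000, "Calamari.exe"),
    (4300, 4200, "powershell.exe"),
    (5000, 1, "powershell.exe"),  # unrelated session, not under Tentacle
]
for pid, chain in powershell_under(snapshot):
    print(pid, " > ".join(chain))
```

With this sample it reports PIDs 4100 and 4300 and skips 5000; the one whose chain passes through Calamari.exe would be the process to dump.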
It happened again this morning, so I’ve dumped tentacle.exe and the PowerShell instance launched by tentacle.exe, along with the one launched by Calamari.exe at that time. I’ve compressed them with 7-Zip to reduce my upload time and named the two PowerShell dump files so you can see which is which.
Looking at the dump of the PowerShell process launched by Calamari.exe (which would be the one causing the hang), it looks like something deep in the Microsoft.IIs.PowerShell.Framework.Configuration code is causing the hang when the IIS configuration is changed (see the output from the process dump of the hung thread below).
I’m happy to help if you need us to try anything else.
This may be happening to us because, for historical reasons, we do things a little differently: our site isn’t contained in a single IIS website. Our original site (which was Adobe Flash) is in the IIS website, and we later added new functionality using MVC/HTML5 as the H5 application inside that IIS website.
It’s no longer strictly necessary; we could move it all into a single website project now, we just haven’t gotten around to it.