We are running version 2022.3.10828 (upgraded yesterday). We seem to have a case where the deployment succeeded, but there is a step that is still executing.
Thanks for reaching out - this is certainly an interesting issue!
The “Smoke Test Instance” Run a Script step does seem to be where the issue lies.
Invoking target script C:\Octopus\Work\20221205212911-1202692-3250\Script.ps1 with parameters.
21:29:08 Info | Starting verification request to http://localhost:8080/
21:29:08 Info | Expecting response code 200.
21:29:08 Info | Expecting response: .
21:29:08 Info | Attempt 1 of 5 to make request
Can you please send through the script that is being referenced in this step? Would you please also check the Windows Event logs for any related events that show whether the process was killed off by Windows in a way our logging didn’t catch? If that turns out to be the case, a process dump may also be beneficial to send through. When capturing the process dump for Tentacle.exe, please also capture any child Calamari.exe processes.
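If it helps with the event log check, here is a minimal sketch that should also work on Server Core since it only uses PowerShell. The function name `Get-EventsAroundDeployment` is made up for illustration, and the choice of the Application and System logs is an assumption rather than a confirmed list of where a relevant event would appear.

```powershell
# Hypothetical helper: lists Critical/Error/Warning events in a time window.
# The default log names are an assumption; adjust them as needed.
function Get-EventsAroundDeployment {
    param(
        [Parameter(Mandatory)] [datetime]$Start,
        [Parameter(Mandatory)] [datetime]$End,
        [string[]]$LogName = @('Application', 'System')
    )

    # Level 1 = Critical, 2 = Error, 3 = Warning
    $filter = @{
        LogName   = $LogName
        Level     = 1, 2, 3
        StartTime = $Start
        EndTime   = $End
    }

    # Get-WinEvent raises an error when nothing matches, so suppress that case
    Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, LogName, ProviderName, Id, LevelDisplayName, Message
}

# Example (the window is a guess based on the task log timestamps above):
# Get-EventsAroundDeployment -Start '2022-12-05 21:20' -End '2022-12-05 21:45' |
#     Export-Csv "$Env:TEMP\events.csv" -NoTypeInformation
```

The time window in the commented example is only a guess from the log excerpt, so widen it if the server clock or time zone differs.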
I’ve created a secure upload link for you here to upload the above files to.
The step is from a step template. The template has been updated, but not yet applied to the project. Is there a way, other than going through the database, to get the older version of the script?
All of our servers run Windows Server Core. I have not found a reliable way to generate a process dump in that space. I can probably get the event logs. Do you want them from the worker pool instance or the server node, and which log do you want?
Regarding the process dump, it would be great if you could retrieve this from the Tentacle side whilst this issue is ongoing.
We’ve got some PowerShell code that can be run to extract a mini dump that may be useful. You just need the process ID of Tentacle.exe to feed into the function:
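function Create-MiniDump
{
    # $tentacleProcessId: ID of the Tentacle.exe (or child Calamari.exe) process to dump
    param($tentacleProcessId)
    # comsvcs.dll's MiniDump export writes a full memory dump; this typically needs an elevated session
    $dumpPath = "$Env:TEMP\tentacle-$((Get-Date).ToString("yyyyMMddTHHmmss")).bin"
    rundll32.exe C:\Windows\System32\comsvcs.dll, MiniDump $tentacleProcessId $dumpPath full
    Write-Host "Process dumped to $dumpPath"
}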
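To find the process IDs to feed into that function, something along these lines may help. It is only a sketch: `Get-TentacleProcessIds` is a made-up helper name, and it assumes the processes show up under the names `Tentacle` and `Calamari` (that is, Tentacle.exe and Calamari.exe).

```powershell
# Hypothetical helper: returns the IDs of any running Tentacle.exe and
# Calamari.exe processes (process names are an assumption).
function Get-TentacleProcessIds {
    Get-Process -Name 'Tentacle', 'Calamari' -ErrorAction SilentlyContinue |
        Select-Object -ExpandProperty Id
}

# Example usage, assuming Create-MiniDump is already defined in the session:
# foreach ($id in Get-TentacleProcessIds) {
#     Create-MiniDump $id
#     # The dump file name only has one-second resolution, so pause
#     # between dumps to avoid two processes writing to the same file.
#     Start-Sleep -Seconds 2
# }
```

The pause matters because the dump file name is built from a timestamp alone, so two dumps taken in the same second would target the same path.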
When it comes to the step template, as the step template itself has not been updated within the process (and has only been updated within Library → Step Templates), an export of the project’s process JSON should provide us with the step template information.
I hope this makes sense, if you have any questions then please don’t hesitate to get back to me and I’ll do my best to help how I can.
The UI has finally caught up and stopped saying the steps were still running. This was after the default pool workers were all replaced (we replace all instances weekly). That step was running on a deployment target, not a worker pool. Since that is a server in active production service, I can’t log in to generate a dump file.
If you’re not able to create a dump file, is there a chance you could send over the Tentacle logs from that Tentacle, please?
It would be incredibly helpful if you could either zip all the available logs when uploading or pinpoint the log from the date/time of the issue occurring.
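For zipping the logs, a rough sketch like the one below could be used. `Compress-TentacleLogs` is a made-up name, and the default log folder of `C:\Octopus\Logs` is an assumption based on a default Tentacle home directory, so please point it at wherever your instance actually writes its logs.

```powershell
# Hypothetical helper: copies the Tentacle log folder to a staging folder
# and zips the copy. The default log path is an assumption.
function Compress-TentacleLogs {
    param(
        [string]$LogDirectory = 'C:\Octopus\Logs',
        [string]$OutputDirectory = $Env:TEMP
    )

    $stamp   = (Get-Date).ToString('yyyyMMddTHHmmss')
    $staging = Join-Path $OutputDirectory "tentacle-logs-$stamp"
    $zipPath = "$staging.zip"

    # Copy first so the archive is built from files the Tentacle is not writing to
    New-Item -ItemType Directory -Path $staging | Out-Null
    Copy-Item -Path (Join-Path $LogDirectory '*') -Destination $staging -Recurse

    Compress-Archive -Path (Join-Path $staging '*') -DestinationPath $zipPath
    Remove-Item -Path $staging -Recurse -Force

    Write-Host "Logs archived to $zipPath"
}

# Example usage:
# Compress-TentacleLogs -LogDirectory 'C:\Octopus\Logs'
```

Copying before compressing is a precaution in case the running Tentacle still has the current log file open.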
Thanks a lot for providing the logs and the minidump file, I’ve passed this over to our engineers to look at, to see if they can identify anything that may be causing the task to not complete as expected.
We’ll be back in touch as soon as we have more information to share.
In the meantime, please don’t hesitate to reach back out if you have any questions or concerns.
I’ve had a response back from our engineering team with a question regarding the issue:
Is this issue a one-off occurrence? I.e. if they re-run the step, does it all report back correctly?
They’ve mentioned that if this is a one-off occurrence it’ll be unlikely that they’ll be successful in tracking down the cause and it may be best to restart the Tentacle in this scenario to continue using the target.
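If you do go down the restart route, a sketch along these lines could be run on the target. The helper name `Restart-TentacleService` is made up, and the service name `OctopusDeploy Tentacle` is an assumption that only applies to a default instance; named instances are registered under a different service name, so check with `Get-Service` first.

```powershell
# Hypothetical helper: restarts the Tentacle Windows service and reports its status.
# The default service name is an assumption for a default Tentacle instance.
function Restart-TentacleService {
    param(
        [string]$ServiceName = 'OctopusDeploy Tentacle'
    )

    Restart-Service -Name $ServiceName
    Get-Service -Name $ServiceName | Select-Object Name, Status
}

# To see which Tentacle services exist on the machine before restarting:
# Get-Service -Name 'OctopusDeploy Tentacle*'
#
# Example usage (elevated session):
# Restart-TentacleService
```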
If the issue is a recurring issue, please let me know and I can feed that back to the engineers who will take another look at the problem.
Thanks for confirming that. It’s bittersweet that it’s a one-off (mostly sweet), as it likely means we won’t be able to get to the bottom of what caused the issue. That being said, I’ve forwarded the thoughts from your last reply to our engineers to see what they think, as it could be a plausible cause for this scenario.
Regarding Windows Server Core: as far as I’m aware, all of our Calamari processes and sub-processes run in a silent, non-interactive window. This should mean that all operations can be performed via the CLI without the need for prompts. Nevertheless, it doesn’t mean that something funky hasn’t happened along the way.
I’ll keep you updated and let you know of any feedback from our engineers. Thanks again for your patience and co-operation.