Deployment Successful with steps still running

lbrody · 5 December 2022 22:46

We are running version 2022.3.10828 (upgraded yesterday). We seem to have a case where the deployment succeeded, but there is a step that is still executing.

Here is the log
ServerTasks-1202692.log.txt (206.8 KB)

this seems like it might be relevant

Lauren.Cagney · 6 December 2022 00:43

Hi Leslie,

Thanks for reaching out - this is certainly an interesting issue!

The “Smoke Test Instance” Run a Script step does seem to be where the issue lies.

Invoking target script C:\Octopus\Work\20221205212911-1202692-3250\Script.ps1 with  parameters.
21:29:08   Info     |         Starting verification request to http://localhost:8080/
21:29:08   Info     |         Expecting response code 200.
21:29:08   Info     |         Expecting response: .
21:29:08   Info     |         Attempt 1 of 5 to make request

Can you please send through the script that is being referenced in this step? Could you also check the Windows Event logs for any related events? They may show whether the process was killed off by Windows without our logging catching it. If that is the case, a process dump may also be useful to send through. When capturing the process dump for Tentacle.exe, please also capture any child Calamari.exe processes.
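For the event log check, something along these lines might work as a starting point. It is only a sketch: the choice of the Application and System logs, the time window (built around the 21:29 step start on 5 December 2022 shown in the task log), the keyword filter, and the output file name are all assumptions, so adjust them for the machine's time zone and setup.

```powershell
# Assumed window around the step's start time from the task log;
# adjust for the local time zone of the machine being checked.
$start = Get-Date '2022-12-05T21:00:00'
$end   = Get-Date '2022-12-05T22:30:00'

# Query the Application and System logs (an assumed choice) for that window,
# keep entries that mention Tentacle, Calamari, or PowerShell, and save them to a CSV.
Get-WinEvent -FilterHashtable @{
    LogName   = 'Application', 'System'
    StartTime = $start
    EndTime   = $end
} -ErrorAction SilentlyContinue |
    Where-Object { $_.Message -match 'Tentacle|Calamari|powershell' } |
    Select-Object TimeCreated, LogName, Id, ProviderName, LevelDisplayName, Message |
    Export-Csv -Path "$Env:TEMP\tentacle-events.csv" -NoTypeInformation
```

Get-WinEvent does not need a desktop session, so this should also be usable on a machine without a UI.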

I’ve created a secure upload link for you here to upload the above files to.

I look forward to hearing back!

Kind regards,
Lauren

lbrody · 6 December 2022 15:38

The step is from a step template. The template has been updated, but the update has not yet been applied to the project. Is there a way, other than going through the database, to get the older version of the script?

All of our servers run Windows Server Core. I have not found a reliable way to generate a process dump in that environment. I can probably get the event logs. Do you want them from the worker pool instance or the server node, and which log do you want?

adam.hollow · 6 December 2022 16:22

Hi @lbrody,

Thanks for getting back in touch!

Regarding the process dump, it would be great if you could retrieve this from the Tentacle side whilst the issue is ongoing.
We’ve got some PowerShell code that can be run to extract a mini dump, which may be useful. You just need the process ID of Tentacle.exe to feed into the function:

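function Create-MiniDump
{
    param($tentacleProcessId)

    # Build a timestamped dump path in the current user's temp directory.
    $dumpPath = "$Env:TEMP\tentacle-$((Get-Date).ToString('yyyyMMddTHHmmss')).bin"

    # comsvcs.dll exposes a MiniDump entry point; "full" requests a full memory dump.
    # Run this from an elevated session.
    rundll32.exe C:\Windows\System32\comsvcs.dll, MiniDump $tentacleProcessId $dumpPath full
    Write-Host "Process dumped to $dumpPath"
}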
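As a usage sketch, the function could be called for Tentacle and any Calamari processes in one go. This assumes the function above has already been loaded into an elevated PowerShell session and that the processes are named Tentacle and Calamari on the target; the two-second pause is only there so that the timestamped file names do not collide.

```powershell
# Find Tentacle and any Calamari processes; names that are not running are skipped.
$targets = Get-Process -Name Tentacle, Calamari -ErrorAction SilentlyContinue

foreach ($p in $targets) {
    Write-Host "Dumping $($p.ProcessName) (PID $($p.Id))"
    Create-MiniDump -tentacleProcessId $p.Id

    # The dump file name is stamped to the second, so wait before the next one.
    Start-Sleep -Seconds 2
}
```

It may be worth checking that each .bin file has finished growing before uploading it, since rundll32.exe can still be writing after the function returns.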

When it comes to the step template, as the step template itself has not been updated within the process (and has only been updated under Library → Step Templates), an export of the project’s process JSON should provide us with the step template information.

I hope this makes sense. If you have any questions, please don’t hesitate to get back to me and I’ll do my best to help however I can.

Kind Regards,
Adam

lbrody · 6 December 2022 17:12

I have uploaded the JSON for the process. I will work on generating a minidump from the worker.

lbrody · 7 December 2022 03:38

The UI has finally caught up and stopped saying the steps were still running. This was after the default pool workers were all replaced (we replace all instances weekly). That step was running on a deployment target, not a worker pool. Since that is a server in active production service, I can’t log in to generate a dump file.

adam.hollow · 7 December 2022 10:57

Hi @lbrody,

No worries, thanks for getting back to us.

If you’re not able to create a dump file, is there a chance you could send over the Tentacle logs from that Tentacle, please?
It would be incredibly helpful if you could either zip all the available logs when uploading or pinpoint the log from the date/time the issue occurred.
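If it helps, zipping the logs could look roughly like the sketch below. The log folder C:\Octopus\Logs is an assumption based on a default Tentacle home directory, the staging folder and archive names are made up for illustration, and Compress-Archive needs PowerShell 5.0 or later.

```powershell
# Assumed default Tentacle log folder; adjust if the Tentacle home directory differs.
$logDir  = 'C:\Octopus\Logs'
$stamp   = (Get-Date).ToString('yyyyMMddTHHmmss')
$staging = Join-Path $Env:TEMP "tentacle-logs-$stamp"
$zipPath = "$staging.zip"

# Copy first so that a log file still held open by the running service
# is less likely to block the archive step.
New-Item -ItemType Directory -Path $staging | Out-Null
Copy-Item -Path (Join-Path $logDir '*') -Destination $staging -Recurse

# Archive the staged copies and report where the zip ended up.
Compress-Archive -Path (Join-Path $staging '*') -DestinationPath $zipPath
Write-Host "Logs archived to $zipPath"
```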

Kind Regards,
Adam

lbrody · 7 December 2022 16:47

I was wrong - I looked at the wrong deployment. The UI is still showing that the tasks are running, and the log entries for those tasks are not complete.

I have uploaded the Tentacle logs as well as the minidump (I was able to grab it).

adam.hollow · 9 December 2022 16:17

Hi Leslie,

Thanks a lot for providing the logs and the minidump file. I’ve passed these over to our engineers to see if they can identify anything that may be causing the task to not complete as expected.

We’ll be back in touch as soon as we have more information to share.
In the meantime, please don’t hesitate to reach back out if you have any questions or concerns.

Kind Regards,
Adam

adam.hollow · 14 December 2022 12:02

Hey @lbrody,

I’ve had a response back from our engineering team with a question regarding the issue:

Is this issue a one-off occurrence? I.e. if they re-run the step, does it all report back correctly?

They’ve mentioned that if this is a one-off occurrence it’ll be unlikely that they’ll be successful in tracking down the cause and it may be best to restart the Tentacle in this scenario to continue using the target.

If the issue is a recurring issue, please let me know and I can feed that back to the engineers who will take another look at the problem.
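If restarting the Tentacle is the route taken, a sketch along these lines may do it from an elevated session. The service name pattern is an assumption (a default Tentacle instance is usually registered as "OctopusDeploy Tentacle"), so it is worth checking the listed services before letting the restart run.

```powershell
# List candidate Tentacle services first; the name pattern is an assumption.
$services = Get-Service -Name 'OctopusDeploy Tentacle*' -ErrorAction SilentlyContinue
$services | Format-Table Name, Status, DisplayName

# Restart each matching service and print its status afterwards.
foreach ($svc in $services) {
    Restart-Service -InputObject $svc -Force
    (Get-Service -Name $svc.Name).Status
}
```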

Kind Regards,
Adam

lbrody · 15 December 2022 02:33

So far this is a one-off occurrence. All of our Octopus infrastructure is running on Windows Server Core (no UI). I have never seen this before.

I am wondering if somehow a new PowerShell session was launched, possibly running as the system account, and then tried to pop up a login dialog.

thoughts?

adam.hollow · 15 December 2022 13:03

Hey @lbrody,

Thanks for confirming that, it’s bittersweet that it’s a one-off (mostly sweet) as it likely means we won’t be able to get to the bottom of what caused the issue. That being said, I’ve forwarded your thoughts in your last reply to our engineers to see what they think as it could be a plausible cause for this scenario.

Regarding Windows Server Core: as far as I’m aware, all of our Calamari processes and sub-processes run in a silent, non-interactive window. This should mean that all operations can be performed via the CLI without the need for prompts. Nevertheless, it doesn’t mean that something funky hasn’t happened along the way.

I’ll keep you updated and let you know of any feedback from our engineers. Thanks again for your patience and co-operation.

Kind Regards,
Adam