Octopus Deploy master service stuck

Hi Team,

We are facing an issue with our Octopus Deploy master, where the service gets stuck and the UI keeps loading indefinitely. We have to restart the service to make it work again. This has happened twice in the last week. I extracted a process dump on both occasions.

Details of the Octopus master:
Version: 2020.2.15
OS: Windows 2012
EC2: c4.2xlarge

Can someone please help us troubleshoot this issue?

Thanks and Regards,
Devan

Hi @d.jain,

Thank you for contacting Octopus Support.

Are you running HA nodes? If so, have you tried swapping the leader?

I’m happy to look into this for you. In addition to your process dump, the Octopus Server logs would be helpful as well.

You may upload these files via Octopus.com -> Sign-In (or create account) -> Import License

Once that is done, you may upload files via the Support tab for your Org.

If you feel up to it, you can also enable trace level logging via the instructions here to see the last operation before encountering an issue.

I look forward to hearing back from you.

Regards,
Donny

Hey Donny,

Thanks for the instructions; I will upload the dump files. We are running a single master node only, no HA. I will check whether we can enable trace level logging. The only problem I see is that, according to the documentation, it may impact Octopus master performance if left on for too long. As the issue is intermittent and usually doesn’t occur for days, are you asking us to enable trace logs for an arbitrary period?

Thanks,
Devan

I have uploaded the server logs and the crash analysis report from the process dump. Each process dump file is >1.5GB, so I have shared only the crash analysis report.

Thanks and Regards,
Devan

Hi @d.jain,

Thank you for getting back to me.

Regarding the dump files, you may upload the files. The size is fine.

For the trace level logging, you can enable it and see how it goes. If it starts affecting performance too much, you can always turn it back off.
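
Since the hang is intermittent, one way to limit how long trace logging needs to stay on is to watch for the hang from outside the server. The following is a minimal sketch, not an Octopus-provided tool: it polls a URL on the server and prints a timestamped line whenever the response is slow or missing, which can later be lined up against the server logs. The URL, the thresholds, and the names `probe` and `classify` are assumptions for illustration; any endpoint that normally answers quickly would do.

```python
import http.client
import time
import urllib.request
from datetime import datetime, timezone

# Hypothetical values: adjust them for your own environment.
OCTOPUS_URL = "http://localhost/api"  # assumed address of the Octopus Server
TIMEOUT_SECONDS = 10                  # give up on a request after this long
SLOW_AFTER_SECONDS = 5.0              # responses slower than this are flagged
POLL_INTERVAL_SECONDS = 60            # pause between checks


def probe(url, timeout):
    """Send one HTTP GET and return (ok, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            ok = 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Covers timeouts, refused connections, and HTTP error statuses.
        ok = False
    return ok, time.monotonic() - start


def classify(ok, elapsed, slow_after):
    """Map a probe result to 'ok', 'slow', or 'down'."""
    if not ok:
        return "down"
    if elapsed > slow_after:
        return "slow"
    return "ok"


if __name__ == "__main__":
    while True:
        ok, elapsed = probe(OCTOPUS_URL, TIMEOUT_SECONDS)
        state = classify(ok, elapsed, SLOW_AFTER_SECONDS)
        if state != "ok":
            stamp = datetime.now(timezone.utc).isoformat()
            print(f"{stamp} state={state} elapsed={elapsed:.1f}s")
        time.sleep(POLL_INTERVAL_SECONDS)
```

Running this from a separate machine avoids adding load to the server itself. It only records when the UI stopped responding; it does not enable trace logging or restart anything.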

Let me know if you have any questions.

Regards,
Donny

Hey Donny,

We have uploaded one process dump (zipped) and will upload the second one soon. The size of the dumps is causing delays in uploading.

Thanks,
Devan

Hi @d.jain,

Thank you for uploading that.

Be sure to include raw logs with trace logging enabled if at all possible. Let me know once you finish uploading and I’ll send this over to our engineering team.

Have a nice weekend.

Regards,
Donny

Hey Donny,

I have uploaded both process dump files now. Please go ahead with the analysis. I have yet to work on the trace logs and will get back to you once I have made some progress on them.

Thanks and Regards,
Devan

Hi @d.jain,

Thank you for the update.

I’ll take a look at the dump files. Please be sure to send over the trace logs at your earliest convenience.

Regards,
Donny

Hey @donny.bell,

Is there any update on the dump file analysis?
Trace logs will take some time, as we had an incident this week where our master node was terminated and we had to re-provision it. Luckily, apart from the downtime, there was no data loss. We are still working on the stability of the node and the root cause of the incident, so enabling trace logging may take some time.

We would appreciate it if the dump file analysis could be expedited.

Thanks and Regards,
Devan

Hi @d.jain,

Thank you for your patience.

The dump file seems to be pointing toward a stuck .dll file relating to .NET Core. I would recommend uninstalling .NET Core and then reinstalling it. It may also be beneficial to uninstall old, unused, or redundant .NET Framework versions. After that, re-run the installer for your current version of Octopus Server and use the “Repair” option.

If that doesn’t resolve the issue, we will need the trace level logs in order to investigate further.

Let me know if the above yields positive results.

Regards,
Donny

Hi @donny.bell,

Thanks for sharing the analysis. Can you share the name of the .dll file that was stuck? As we have a new server in place, we will monitor it, and if we run into a similar situation, we will enable the trace logs.

Regards,
Devan

Hi @d.jain,

Thank you for getting back to me. The DLL file in question was “System.Private.CoreLib.dll”.

Keep me posted if the issue comes up again.

Regards,
Donny