We are facing an issue with our Octopus Deploy master node: the service gets stuck and the UI keeps loading indefinitely. We have to restart the service to get it working again. This has happened twice in the last week, and I captured a process dump on both occasions.
Details of the Octopus master:
Version: 2020.2.15
OS: Windows Server 2012
EC2 instance type: c4.2xlarge
Can someone please help us troubleshoot this issue?
Thanks for the instructions; I will upload the dump files. We are running a single master node only, with no HA. I will check whether we can enable trace-level logging. My only concern is that, according to the documentation, it may impact the performance of the Octopus master if left on for too long. As the issue is intermittent and often doesn't occur for days at a time, are you asking us to enable trace logging for an arbitrary period?
I have uploaded the server logs and the crash analysis report generated from the process dumps. Each process dump file is larger than 1.5 GB, so I have shared only the crash analysis report.
Be sure to include raw logs with trace logging enabled if at all possible. Let me know once you finish uploading and I’ll send this over to our engineering team.
I have now uploaded both process dump files, so please go ahead with the analysis. I haven't started on the trace logs yet and will get back to you once I have made some progress.
Is there any update on the dump file analysis?
The trace logs will take some time: we had an incident this week in which our master node was terminated and we had to re-provision it. Fortunately, apart from the downtime, there was no data loss. We are still working on stabilizing the node and finding the root cause, so enabling trace logging may be delayed.
We would appreciate it if the dump file analysis could be expedited.
The dump file seems to point toward a stuck .dll related to .NET Core. I would recommend uninstalling .NET Core and then reinstalling it. It may also be beneficial to uninstall any old, unused, or redundant .NET Framework versions. After that, re-run the installer for your current version of Octopus Server and choose the “Repair” option.
If that doesn’t resolve the issue, we will need the trace level logs in order to investigate further.
Thanks for sharing the analysis. Could you share the name of the .dll that was stuck? As we now have a new server in place, we will monitor it, and if we run into the same situation again, we will enable trace logging.