Octopus crashed

Hi, we got an error and the server went down
Faulting application name: Octopus.Server.exe, version: 2022.2.6895.0, time stamp: 0x62577b55
Faulting module name: coreclr.dll, version: 5.0.1722.21314, time stamp: 0x62577019
Exception code: 0xc00000fd
Fault offset: 0x0000000000030f1b
Faulting process id: 0xe58
Faulting application start time: 0x01d888b6f182f630
Faulting application path: C:\Octopus\Octopus.Server.exe
Faulting module path: C:\Octopus\coreclr.dll
Report Id: 905a98c2-500c-4825-a2bb-aedf7506e372
Faulting package full name:
Faulting package-relative application ID:

Hi @dennis.ananyin

Sorry to hear you’re getting a crash on your server! Does this happen in any particular part of Octopus? Can the Octopus Server service actually be started?

Also, could you please upload your Octopus Server logs to our secure repo here: Support Files. With the logs we should be able to see what is causing the issue.

Kind Regards
Sean

It happened only once, but it stopped the whole company. A reboot helped.
Is this chat private? I don’t want to send the support file in public.

Hi @dennis.ananyin

It is odd that this only happened once. Were there any other issues on the machine running your server at that time?

The support files are private, so only we at Octopus can see these.

Kind Regards,
Sean

Hi @dennis.ananyin

Sorry, I should have been more clear. These forums themselves are not private, but the link I provided for you to upload your files to is. I’ve deleted your previous post so that no one can see this anymore and access your logs.

If you could upload them to the link I previously posted, anything there is completely private.

Kind Regards
Sean

uploaded

Thanks, @dennis.ananyin. I’ll have a look through your logs and get back to you ASAP.

Kind Regards
Sean

Hi @dennis.ananyin

Can you tell me what time roughly this fault happened? I’ve looked through the server logs and can’t see anything that stands out particularly. It all looks standard. If the application/service crashed, I’m not even sure if it would push any data into the logs. From what you’ve posted, it looks like the fault was in the core library, but that contains many modules.

It may be that this could have been a hiccup in the system itself. Did you notice anything else that was running at the time (AV, maintenance, etc.)? Are the resources sufficient for the server?

Kind Regards
Sean

12:40

Resources are sufficient:
8 CPUs
15 GB RAM

> Did you notice anything else that was running at the time (AV, maintenance, etc.)?

Nope, usual work day.

The error I sent is from the event log.

CPU usage was around 15%,
but the RAM looks strange:
usual usage is 6.6 GB, but since yesterday it has doubled to 11.5 GB.
Sorry, the error was at 12:40 on 13.09.2022.

<Provider Name="Application Error" />
<EventID Qualifiers="0">1000</EventID>
<Level>2</Level>
<Task>100</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2022-09-13T09:44:18.284065400Z" />
<EventRecordID>304296</EventRecordID>
<Channel>Application</Channel>
<EventData>
  <Data>Octopus.Server.exe</Data>
  <Data>2022.2.6895.0</Data>
  <Data>62577b55</Data>
  <Data>coreclr.dll</Data>
  <Data>5.0.1722.21314</Data>
  <Data>62577019</Data>
  <Data>c00000fd</Data>
  <Data>0000000000030f1b</Data>
  <Data>e58</Data>
  <Data>01d888b6f182f630</Data>
  <Data>C:\Octopus\Octopus.Server.exe</Data>
  <Data>C:\Octopus\coreclr.dll</Data>
  <Data>905a98c2-500c-4825-a2bb-aedf7506e372</Data>
</EventData>

Hi @dennis.ananyin

I’ve looked at the logs around the time the error was reported, but I see nothing in the logs indicating that the server was down or even had a fault.

From the logs, everything was running normally between 12:37 pm and 12:45 pm. API calls were returning 200s, and I can see 315 successful API calls, with at least two going to the database to get information.

Even looking at the TimeCreated SystemTime for the fault shows everything working as usual between 9:41 am and 9:46 am.

As for the memory usage doubling, is that something that has happened in the past or only recently? Also, have you upgraded your Octopus Server recently?

Kind Regards
Sean

I’ve just noticed in your task logs @dennis.ananyin that there was package reindexing from 13:11:33 on the 13th that was still going on until today. This might explain the growth in memory usage since I can see about 40,000 lines of packages.

Do you have package retention running on your Library > Packages? If not, I would highly recommend using it to clear down some packages, as there are a lot in there, which could cause unforeseen issues if reindexing has to run over that many packages.

Kind Regards
Sean

Screenshot_1
Nope, the last upgrade was a couple of months ago.

Yes, we have.

Can I ask what that retention is currently set at? I can see there are about 4000+ entries for your EC.CP.API package versions alone, which is enormous.

As for the fault that occurred, I would say keep an eye on it for the moment. I can’t pinpoint what might have caused this from the logs, which should contain any failures or dropouts. I’ll run this by some more senior colleagues and see if there is anything else they can think of that might have caused this, but it is extremely difficult to diagnose in this case.

Kind Regards
Sean

@dennis.ananyin could you send over the latest task log for your last package retention task? You should be able to find it by going to Tasks, selecting the filter for Task Type and selecting “Apply Retention Policies”. This is done per space, so if you have a lot of spaces it would be best to get these tasks for the most used ones.
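
If it’s easier to pull these programmatically, here’s a rough sketch against the REST API. The server URL, API key and space ID below are placeholders, and I’m assuming the task type filter name is ApplyRetentionPolicies (matching the Task Type shown in the UI), so please adjust it to suit your instance:

```python
# Rough sketch: download recent "Apply Retention Policies" task logs via the
# Octopus REST API. OCTOPUS_URL, API_KEY and SPACE_ID are placeholders.
import requests

OCTOPUS_URL = "https://your-octopus-server"  # placeholder
API_KEY = "API-XXXXXXXXXXXXXXXX"             # placeholder
SPACE_ID = "Spaces-1"                        # placeholder

headers = {"X-Octopus-ApiKey": API_KEY}

# List the most recent retention tasks, filtered by task name and space.
tasks = requests.get(
    f"{OCTOPUS_URL}/api/tasks",
    params={"name": "ApplyRetentionPolicies", "spaces": SPACE_ID, "take": 5},
    headers=headers,
).json()["Items"]

for task in tasks:
    # The /raw endpoint returns the plain-text log for a task.
    raw_log = requests.get(
        f"{OCTOPUS_URL}/api/tasks/{task['Id']}/raw", headers=headers
    ).text
    with open(f"{task['Id']}.txt", "w", encoding="utf-8") as log_file:
        log_file.write(raw_log)
    print(f"Saved {task['Id']}: {task['Description']}")
```

That just saves each task log to a text file, which you could then drop into the same support link.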

You should still be able to use the previous support link I gave you to upload these.
Kind Regards
Sean

uploaded

Thanks for that @dennis.ananyin. It might be worth bringing down the release and package retention so that some of these older releases/package versions get cleared out. This should stop any future issues cropping up if the reindexing runs again, which I suspect may have caused the initial memory spike and possible crash that you experienced.

Would you mind making the following changes to clear down the release/packages?

  1. Disable the package reindexing inside Library > Packages
  2. Set the lifecycle release retention to be around 30-60 days (this can be lower or higher, but I’d recommend not keeping any more than 3 months’ worth).

Once those two changes are done, you’ll have to let the lifecycle retention run through and clear up all the old releases, and then let the package retention run, cleaning up old versions that used to be related to these releases. After that is all said and done, the reindexing can be switched back on, which should go through relatively fast.
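
If you have a lot of lifecycles, here’s a very rough sketch of doing step 2 through the REST API rather than the UI. The URL and API key are placeholders, and I’m assuming the standard lifecycle resource shape (a top-level ReleaseRetentionPolicy block), so please sanity-check it against your instance before running anything:

```python
# Rough sketch: cap release retention on every lifecycle at 30 days.
# OCTOPUS_URL and API_KEY are placeholders; field names assume the standard
# lifecycle resource (ReleaseRetentionPolicy with Unit/QuantityToKeep).
import requests

OCTOPUS_URL = "https://your-octopus-server"  # placeholder
API_KEY = "API-XXXXXXXXXXXXXXXX"             # placeholder
headers = {"X-Octopus-ApiKey": API_KEY}

# Fetch every lifecycle in the space/instance.
lifecycles = requests.get(
    f"{OCTOPUS_URL}/api/lifecycles/all", headers=headers
).json()

for lifecycle in lifecycles:
    # Top-level retention applies to any phase that doesn't override it.
    lifecycle["ReleaseRetentionPolicy"] = {
        "Unit": "Days",
        "QuantityToKeep": 30,
        "ShouldKeepForever": False,
    }
    resp = requests.put(
        f"{OCTOPUS_URL}/api/lifecycles/{lifecycle['Id']}",
        json=lifecycle,
        headers=headers,
    )
    resp.raise_for_status()
    print(f"Updated {lifecycle['Name']} to keep 30 days of releases")
```

Note that individual phases can override the lifecycle-level retention, so any phase-level policies would need the same treatment.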

Let me know if you have any questions!
Kind Regards
Sean