Octopus server crashes after Windows 2012 patch and reboot

The latest Windows Server 2012 patches were installed on our Octopus deployment server, and the server was rebooted. Since then, whenever a task executes, the Octopus service immediately crashes. I see the following two messages in the Event Log:

Event Log #1
Application: Octopus.Server.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.AccessViolationException
Stack:
at System.Net.UnsafeNclNativeMethods+SafeNetHandles_SECURITY.AcquireCredentialsHandleW(System.String, System.String, Int32, Void*, System.Net.SecureCredential ByRef, Void*, Void*, System.Net.SSPIHandle ByRef, Int64 ByRef)
at System.Net.UnsafeNclNativeMethods+SafeNetHandles_SECURITY.AcquireCredentialsHandleW(System.String, System.String, Int32, Void*, System.Net.SecureCredential ByRef, Void*, Void*, System.Net.SSPIHandle ByRef, Int64 ByRef)
at System.Net.SafeFreeCredentials.AcquireCredentialsHandle(System.Net.SecurDll, System.String, System.Net.CredentialUse, System.Net.SecureCredential ByRef, System.Net.SafeFreeCredentials ByRef)
at System.Net.SSPIWrapper.AcquireCredentialsHandle(System.Net.SSPIInterface, System.String, System.Net.CredentialUse, System.Net.SecureCredential)
at System.Net.Security.SecureChannel.AcquireCredentialsHandle(System.Net.CredentialUse, System.Net.SecureCredential ByRef)
at System.Net.Security.SecureChannel.AcquireClientCredentials(Byte[] ByRef)
at System.Net.Security.SecureChannel.GenerateToken(Byte[], Int32, Int32, Byte[] ByRef)
at System.Net.Security.SslState.StartSendBlob(Byte[], Int32, System.Net.AsyncProtocolRequest)
at System.Net.Security.SslState.ProcessReceivedBlob(Byte[], Int32, System.Net.AsyncProtocolRequest)
at System.Net.Security.SslState.StartReceiveBlob(Byte[], System.Net.AsyncProtocolRequest)
at System.Net.Security.SslState.StartSendBlob(Byte[], Int32, System.Net.AsyncProtocolRequest)
at System.Net.Security.SslState.ForceAuthentication(Boolean, Byte[], System.Net.AsyncProtocolRequest)
at System.Net.Security.SslState.ProcessAuthentication(System.Net.LazyAsyncResult)
at Halibut.Transport.SecureClient.EstablishNewConnection()
at Halibut.Transport.SecureClient.AcquireConnection()
at Halibut.Transport.SecureClient.ExecuteTransaction(System.Action`1<Halibut.Transport.Protocol.MessageExchangeProtocol>)
at Halibut.HalibutRuntime.SendOutgoingHttpsRequest(Halibut.Transport.Protocol.RequestMessage)
at Halibut.ServiceModel.HalibutProxy.Invoke(System.Runtime.Remoting.Messaging.IMessage)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(System.Runtime.Remoting.Proxies.MessageData ByRef, Int32)
at Octopus.Shared.Contracts.IScriptService.StartScript(Octopus.Shared.Contracts.StartScriptCommand)
at Octopus.Worker.Tentacles.TentacleRemoteEndpointFacade.ExecuteCommand(Octopus.Shared.Contracts.StartScriptCommand)
at Octopus.Worker.Tentacles.TentacleWorker.CheckHealth()
at Octopus.Server.Orchestration.Health.MachineTaskController.PerformTask(Octopus.Core.Model.Environments.Machine)
at Octopus.Shared.Tasks.OctoThreadClosure`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].Execute()
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
at System.Threading.ThreadHelper.ThreadStart()

Event Log #2
Faulting application name: Octopus.Server.exe, version: 3.2.24.0, time stamp: 0x56ca7f2e
Faulting module name: ncryptprov.dll, version: 6.3.9600.16384, time stamp: 0x5215e237
Exception code: 0xc0000005
Fault offset: 0x00000000000146a5
Faulting process id: 0x1954
Faulting application start time: 0x01d17310a2bab97d
Faulting application path: E:------\Tools\Octopus Deploy\Octopus\Octopus.Server.exe
Faulting module path: C:\Windows\system32\ncryptprov.dll
Report Id: f860f4d4-df03-11e5-80cd-005056a410bf
Faulting package full name:
Faulting package-relative application ID:

We were able to work around the issue by uninstalling the Windows Server 2012 patches KB3123479, KB3109853, and KB3081320.

We’d still appreciate any advice on how to address this on the Octopus side so that the patches can be reinstalled later. Thanks!

Hi Brian,

Thanks for getting in touch.

One of the updates that you installed (KB3123479) deprecated SHA1. My guess would be that you have an old certificate for Server -> Tentacle comms. Has your Octopus Server been around for a while?

You may need to generate a new certificate on your Octopus Server and configure your Tentacles to trust that certificate. Do you have a lot of Tentacles?

Cheers,
Shane

Hi Shane –

We’ve been using Octopus for a couple of years now, but many of our servers are still Windows Server 2003 (upgrades are in progress).
I’ll try generating a new certificate from the Octopus server and see if that addresses the issue. Thanks!

Brian

I see commands on how to update the Tentacle, but not the Server. How does one update the Octopus Server certificate?

Brian

Hi Brian,

To generate a new certificate on your Octopus Server you will need to run the command Octopus.Server.exe regenerate-certificate --octopus-tentacle. That will break all of the trust between your Octopus Server and Tentacles which means you will need to re-configure all of your Tentacles with Tentacle.exe configure --trust <new thumbprint>.

If you have a large number of Tentacles, you may be able to generate the new certificate without using it in Octopus yet, use the script console to add its thumbprint to each Tentacle, and only then switch your Server over to that certificate. I haven’t tried it yet, but if you need help doing it I could put together a step-by-step guide for you.
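As background for the thumbprint step: on Windows, a certificate thumbprint (the value .NET's `X509Certificate2.Thumbprint` reports) is the uppercase hex SHA-1 digest of the certificate's DER encoding. Assuming Octopus follows that convention for the value passed to `Tentacle.exe configure --trust`, the minimal Python sketch below computes it from a PEM-encoded copy of the certificate. The thumbprint is only an identifier: it is SHA-1 regardless of the hash algorithm used to sign the certificate, so it says nothing about whether the certificate itself is SHA-1 signed.

```python
import hashlib
import ssl


def thumbprint_from_der(der_bytes: bytes) -> str:
    """Windows-style thumbprint: uppercase hex SHA-1 of the DER-encoded certificate."""
    return hashlib.sha1(der_bytes).hexdigest().upper()


def thumbprint_from_pem(pem_text: str) -> str:
    """Decode one '-----BEGIN CERTIFICATE-----' block and return its thumbprint."""
    # ssl.PEM_cert_to_DER_cert strips the PEM header/footer and base64-decodes the body.
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_text)
    return thumbprint_from_der(der_bytes)
```

For example, calling `thumbprint_from_pem` on the text of a hypothetical exported file `server.pem` returns the 40-character value to compare with what each Tentacle trusts.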

Hope this helps.

Cheers,
Shane

Hi Shane –

I tried your suggestion but unfortunately it didn’t work.
I ran “octopus.server.exe regenerate-certificate --octopus-tentacle” and then reconfigured all the Tentacles.
I then re-applied the Windows Server 2012 R2 patches and am back where I started.
Is there a way to determine what type of certificate is being generated by the Octopus server? Or how to force it to use SHA2 vs SHA1?

Brian

Hi Brian,

I’m sad to hear that it didn’t work; I thought we might get an easy win. I’ve set up a Windows 2012 Server with Windows 2003 Tentacles and installed the updates that you mentioned, but unfortunately I can’t reproduce the crash.

Would you be able to provide a crash dump when you install the updates and the server crashes? We have instructions here: http://docs.octopusdeploy.com/display/OD/Capture+a+crash+dump

You can upload it here: https://file.ac/IMcQRm7pS2s/

Cheers,
Shane

Hi Shane -

I’ve uploaded the dump file and a Word document showing the Application and System event logs at the time of the crash.

Brian

Hi Brian,

Thank you for sending so much detail. You have us completely stumped.

Would you be able to run the following from a command prompt to query what .NET versions you have installed on that machine:

reg query "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP"
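That query lists only which major versions are present (v2.0.50727 through v4). For the exact 4.x release, Microsoft's documented approach is to read the Release DWORD under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full (for example with reg query "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full" /v Release) and compare it with a table of minimum values. The Python sketch below encodes that table as published by Microsoft; treat the numbers as something to double-check against the current documentation.

```python
from typing import Optional

# Minimum value of the "Release" DWORD under
# HKLM\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full for each
# .NET Framework 4.5+ version, newest first.
RELEASE_THRESHOLDS = [
    (533320, "4.8.1"),
    (528040, "4.8"),
    (461808, "4.7.2"),
    (461308, "4.7.1"),
    (460798, "4.7"),
    (394802, "4.6.2"),
    (394254, "4.6.1"),
    (393295, "4.6"),
    (379893, "4.5.2"),
    (378675, "4.5.1"),
    (378389, "4.5"),
]


def framework_version_from_release(release: Optional[int]) -> str:
    """Return the .NET Framework 4.x version implied by a Release value.

    None means the v4\\Full key has no Release value, which indicates 4.0.
    """
    if release is None:
        return "4.0"
    for minimum, version in RELEASE_THRESHOLDS:
        if release >= minimum:
            return version
    # Values below the 4.5 threshold are not expected; report 4.0 conservatively.
    return "4.0"
```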

Also run this utility to show the security setup on that machine (after installing the updates from Microsoft if possible):
https://www.nartac.com/Products/IISCrypto

I am also wondering if the problem is related to the machine or the Octopus instance. Would you be able to set up a fresh Octopus Server and Tentacle instance on the same machine and see if the problem persists on the new instance?

Thanks,
Shane

Hi Shane –

  1.   Here are the results of the reg query command:
    

C:\>reg query "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\CDF
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v2.0.50727
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v3.0
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v3.5
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4.0

  2.   I ran some experiments with different combinations of the three Microsoft patches installed (Y = installed, N = not installed):

KB3123479   KB3109853   KB3081320   Result
Y           Y           Y           Fail
Y           Y           N           Fail
Y           N           Y           Fail
N           Y           Y           Fail
Y           N           N           Pass
N           Y           N           Fail
N           N           Y           Fail
N           N           N           Pass

Based on this experiment, the two patches that trigger the crash are KB3109853 and KB3081320. KB3123479 does not appear to affect Octopus.

  3.   I ran Wireshark traces during a “Check health” from the Admin server to one of the web servers: one with the patches installed on the Admin server, and one without.
    

Successful health check (without patches on the Admin server):
[Wireshark screenshot]

Failed health check (with patches on the Admin server):
[Wireshark screenshot]

On the failed “Check health”, when Octopus crashes, it appears to be grabbing the Octopus Tentacle certificate (I can see the 2015-03-23 expiration date) and the web server’s list of root certificates (the screenshot shows only some of them).
Since the Octopus certificate is self-signed and does not have a certificate chain up to a root certificate, that could be a clue to the failure.

  4.   I’ll see what I can do about installing a fresh copy of Octopus Server on the Admin box. I didn’t think you could install multiple copies of Octopus on the same server, so I’ll have to see what I can do without losing the install that I already have.
    
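The elimination behind the patch experiment above can be checked mechanically. The Python sketch below transcribes the eight rows of the results table and searches for the smallest set of patches such that "the health check crashes whenever at least one patch in the set is installed" agrees with every row. That either-patch-triggers model is an assumption made for illustration, not something established in this thread; under it, the only consistent set is KB3109853 plus KB3081320, matching the conclusion that KB3123479 is not involved.

```python
from itertools import combinations

PATCHES = ("KB3123479", "KB3109853", "KB3081320")

# Rows transcribed from the experiment table: (installed flags per patch, crashed?)
RESULTS = [
    ((True, True, True), True),
    ((True, True, False), True),
    ((True, False, True), True),
    ((False, True, True), True),
    ((True, False, False), False),
    ((False, True, False), True),
    ((False, False, True), True),
    ((False, False, False), False),
]


def explaining_sets(results):
    """Smallest patch sets S for which 'crash iff any patch in S is installed'
    matches every observed row."""
    for size in range(len(PATCHES) + 1):
        matches = []
        for subset in combinations(range(len(PATCHES)), size):
            consistent = all(
                any(flags[i] for i in subset) == crashed
                for flags, crashed in results
            )
            if consistent:
                matches.append(tuple(PATCHES[i] for i in subset))
        if matches:
            return matches
    return []
```

Calling `explaining_sets(RESULTS)` returns `[("KB3109853", "KB3081320")]`.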

Brian


Hi Brian,

Thanks for all the info. Will review and get back to you shortly.

In the meantime, you can install multiple Octopus instances on the same machine: http://docs.octopusdeploy.com/display/OD/Managing+multiple+instances

Cheers,
Shane

Hi Shane –

I was able to reproduce the issue with a fresh installation of SQL Server 2014, Octopus Server (3.2.13), and Octopus Tentacle on a single, separate Windows Server 2012 R2 machine.
I created a single environment with one server (the local machine) as the deployment target.
With the patches KB3109853 and KB3081320 installed, the Octopus server crashed after a health check to the target.
With the patches removed, the health check was successful.

Brian

Hi Shane –

I have some good news: it looks like this is a Microsoft bug, as we were able to reproduce the same error on a machine without Octopus.
Steps to reproduce:

  1.   On a Windows 2012 R2 Server, ensure the patches KB3109853 and KB3081320 are installed (use Control Panel --> View Installed Updates)
    
  2.   If not already installed, install IIS
    
  3.   Start IIS Manager, select the local server, and double-click “Server Certificates”
    
  4.   Click on “Create Certificate Request”
    
  5.   On the Request Certificate window, enter random values for each field (the values don’t matter) and click Next
    
  6.   Accept the default Cryptographic Service Provider Properties (the values don’t matter) and click Next
    
  7.   Save the file to somewhere on the local server (like C:\temp\a.txt)
    
  8.   Click Finish
    

[Screenshot of the resulting error]

This looks very similar to the error that occurs before Octopus crashes:
“A fatal error occurred when attempting to access the SSL client credential private key. The error code returned from the cryptographic module is 0xC002001B. The internal error state is 10003.”

I suspect the two issues have the same cause. Please see if you can reproduce these steps on your side. I’ll be opening a ticket with Microsoft on our side, but Octopus may want to consider opening one as well if the issues look related.

Brian


My teammate found one other item: the Windows service “CNG Key Isolation” had recently been disabled. When we enabled this service, the issue was resolved. I’ll also try that step on the Octopus servers to see whether it resolves the issue there.

Brian

That fixed the issue: when the “CNG Key Isolation” service was enabled, the issue went away.
The service had been disabled the whole time, but once these two patches were installed, they collided with the disabled service and caused the certificate failure.
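For checking other servers for the same condition: the CNG Key Isolation service's internal name is KeyIso, so sc qc KeyIso shows its startup type, and a START_TYPE of 4 DISABLED matches the state described here; sc config KeyIso start= demand followed by sc start KeyIso would set it back to Manual and start it. The Python sketch below is a hypothetical helper for scripted checks that parses captured sc qc output and reports whether the service is disabled. It assumes the usual "START_TYPE : <code> <name>" line layout of sc qc output; the numeric codes are the standard Win32 service start types.

```python
import re

# Standard Win32 service start types as reported in the START_TYPE line of `sc qc`.
START_TYPES = {
    0: "BOOT_START",
    1: "SYSTEM_START",
    2: "AUTO_START",
    3: "DEMAND_START",
    4: "DISABLED",
}


def start_type_from_sc_qc(output: str) -> int:
    """Extract the numeric START_TYPE from captured `sc qc <service>` output."""
    match = re.search(r"^\s*START_TYPE\s*:\s*(\d+)", output, re.MULTILINE)
    if match is None:
        raise ValueError("START_TYPE line not found in sc qc output")
    return int(match.group(1))


def is_disabled(output: str) -> bool:
    """True when the queried service's startup type is DISABLED (code 4)."""
    return START_TYPES.get(start_type_from_sc_qc(output)) == "DISABLED"
```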
Thanks for your time in looking at this - hopefully this will save someone else the same heartache!

Brian

Hi Brian,

I just wanted to say a big thank you for all your findings and persistence with this - it will help someone else. These Microsoft patches clashing with environment settings are just so hard for us to reproduce, so we really do appreciate it.

Vanessa