Some nodes could not work

zhang.shangwu · 3 August 2020 06:46

We have 3 Octopus Server nodes and 2 newly set up listening Tentacles. Strangely enough, the health check passes when it runs on one of these 3 nodes, but it always fails on the other two with the following log. What could be the reason?

2020-07-31 03:27:31.4407 9648 15 INFO listen://[::]:10933/ 15 Accepted TCP client: [::ffff:10.178.161.202]:53940
2020-07-31 03:27:31.4407 9648 15 INFO listen://[::]:10933/ 15 Performing TLS server handshake
2020-07-31 03:27:31.4563 9648 19 INFO listen://[::]:10933/ 19 Secure connection established, client is not yet authenticated, client connected with Tls12
2020-07-31 03:27:31.4875 9648 19 INFO listen://[::]:10933/ 19 Client at [::ffff:10.178.161.202]:53940 authenticated as 96539F6EA3317551E318FFBFD84A22C6FCC3C88C
2020-07-31 03:28:09.7157 9648 3 INFO listen://[::]:10933/ 3 Unhandled error when handling request from client: [::ffff:10.178.161.202]:53621
System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
--- End of inner exception stack trace ---
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at System.Net.FixedSizeReader.ReadPacket(Byte[] buffer, Int32 offset, Int32 count)
at System.Net.Security._SslStream.StartFrameHeader(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security._SslStream.StartReading(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security._SslStream.ProcessRead(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslStream.Read(Byte[] buffer, Int32 offset, Int32 count)
at Halibut.Transport.Protocol.BufferedStream.Read(Byte[] array, Int32 offset, Int32 count)
at System.IO.Compression.DeflateStream.Read(Byte[] array, Int32 offset, Int32 count)
at System.IO.BinaryReader.FillBuffer(Int32 numBytes)
at System.IO.BinaryReader.ReadInt32()
at Newtonsoft.Json.Bson.BsonDataReader.ReadNormal()
at Newtonsoft.Json.Bson.BsonDataReader.Read()
at Newtonsoft.Json.JsonReader.ReadForType(JsonContract contract, Boolean hasConverter)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(JsonReader reader, Type objectType, Boolean checkAdditionalContent)
at Newtonsoft.Json.JsonSerializer.DeserializeInternal(JsonReader reader, Type objectType)
at Newtonsoft.Json.JsonSerializer.Deserialize[T](JsonReader reader)
at Halibut.Transport.Protocol.MessageExchangeStream.ReadBsonMessage[T]()
at Halibut.Transport.Protocol.MessageExchangeStream.Receive[T]()
at Halibut.Transport.Protocol.MessageExchangeProtocol.ProcessClientRequests(Func`2 incomingRequestProcessor)
at Halibut.Transport.Protocol.MessageExchangeProtocol.d__12.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Halibut.Transport.SecureListener.d__18.MoveNext()

Justin_Walsh · 3 August 2020 15:04

Hi @zhang.shangwu!

Thanks for reaching out - this looks like an issue where your listening Tentacles might only be configured to trust the thumbprint of one of your three Octopus Server nodes. While they are a HA cluster, each node does have an individual thumbprint that the Tentacle needs to trust.

You can configure this on your deployment targets by running the Tentacle.exe configure --trust command to add another trust relationship for each of your non-trusted nodes.

I hope this helps, and please don’t hesitate to reach out if you have any further questions.

zhang.shangwu · 4 August 2020 07:40

Thank you for your reply.
But all three nodes tried to connect to the Tentacle with the same thumbprint **, and only one of them works. So I think the thumbprint should already be in the Tentacle trust list.

paul.calvert · 4 August 2020 07:50

Hi @zhang.shangwu,

That information is showing that the tentacle will only accept communications from an Octopus Server that is using the thumbprint 96539F6EA3317551E318FFBFD84A22C6FCC3C88C. Each of your Octopus Server nodes should be using a different thumbprint, so you will need to add two more thumbprints to the tentacle trust list in order for all three nodes to be able to communicate with it.

If you run the show-thumbprint command on each of the three Octopus Server nodes you should be able to determine what the missing thumbprints are.
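As a rough illustration of that comparison, here is a minimal Python sketch. It assumes you have already collected each node's thumbprint by hand (for example from the show-thumbprint output) along with the thumbprints the Tentacle currently trusts. The node names, the two placeholder thumbprints, and the function name `untrusted_nodes` are all hypothetical; only the first thumbprint comes from the log above.

```python
def untrusted_nodes(node_thumbprints, trusted):
    """Return the nodes whose thumbprint is not in the Tentacle's trust list."""
    # Normalise case and whitespace so copy-pasted values compare equal.
    trusted_norm = {t.strip().upper() for t in trusted}
    return {
        node: thumb
        for node, thumb in node_thumbprints.items()
        if thumb.strip().upper() not in trusted_norm
    }


# Hypothetical inventory: only the first thumbprint appears in this thread.
nodes = {
    "node-1": "96539F6EA3317551E318FFBFD84A22C6FCC3C88C",
    "node-2": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",  # placeholder
    "node-3": "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB",  # placeholder
}
trusted_by_tentacle = ["96539f6ea3317551e318ffbfd84a22c6fcc3c88c"]

missing = untrusted_nodes(nodes, trusted_by_tentacle)
for node, thumb in sorted(missing.items()):
    print(f"{node}: {thumb} is not trusted")
```

If every node reports the same thumbprint, the result is empty and the trust list can be ruled out as the cause.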

Regards,
Paul

zhang.shangwu · 4 August 2020 07:58

I understand your idea, but from the log I’ve posted, the failed node also tried to connect with the new listening tentacle using thumbprint …

2020-07-31 03:27:31.4875 9648 19 INFO listen://[::]:10933/ 19 Client at [::ffff:10.178.161.202]:53940 authenticated as …

paul.calvert · 4 August 2020 08:04

That is true, and it may be that the issue is being caused by something else, but to start troubleshooting further we need to at least ensure the basics are in place.

So, if you could provide the thumbprint of the node that successfully performs the health check, and then list the thumbprints for the two nodes that fail we can progress from there.

zhang.shangwu · 4 August 2020 09:45

Hi, as shown in the following screenshot, I’ve checked the thumbprints and found that all three nodes have the same thumbprint.

paul.calvert · 4 August 2020 10:06

Perfect, thanks for confirming that.

The next step would be to begin working through the troubleshooting steps listed here. The initial check of the port will need to be run once from each node.

If checking the tentacle port URL from each node is successful, then the next step would be to use the TentaclePing tool to see if any additional information is revealed.
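For a quick scripted version of the port check, something like the following Python sketch could be run from each node. It is only an approximation of the documented check: it confirms that a TCP connection to the Tentacle's listening port (10933, as shown in the log above) can be opened, and it does not attempt the TLS handshake. The function name `port_is_reachable` is made up for this example.

```python
import socket


def port_is_reachable(host, port=10933, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling `port_is_reachable("your-tentacle-host")` on each of the three nodes (with your own host name substituted) shows whether any node is blocked at the network level before TLS even comes into play.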

I look forward to the results of these tests.

Regards,
Paul

zhang.shangwu · 4 August 2020 11:16

Hi, I’ve run the TentaclePing/TentaclePong tests. Here’s the log file from the Octopus Server and a screenshot from the deployment target.
Output1.txt (160.3 KB)

paul.calvert · 4 August 2020 11:21

I’m assuming the URL checks on the listening port were successful from each node to the tentacle?

The error being reported there, "The client and server cannot communicate, because they do not possess a common algorithm", typically indicates an issue with the TLS/SSL or cipher suite settings.

You will need to confirm that each node is configured in the same way. IISCrypto can help you check and compare these settings across the three nodes and the tentacle. Ideally, they should all be identical.

Also, are all three nodes running the same operating system version? I’ve seen issues similar to this when running between older and newer versions of Windows Server.
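To make the "common algorithm" idea concrete, here is a small Python sketch that compares the enabled protocols you have noted down for each machine. It is a simplification: a real handshake is negotiated between two peers and also involves cipher suites, and the machine names and protocol lists below are hypothetical values, not the settings of the machines in this thread.

```python
def common_protocols(settings):
    """Return the protocols enabled on every machine in `settings`."""
    enabled_sets = [set(protocols) for protocols in settings.values()]
    if not enabled_sets:
        return set()
    return set.intersection(*enabled_sets)


def differing_machines(settings):
    """Group machines by their enabled set so mismatches stand out."""
    groups = {}
    for machine, protocols in settings.items():
        groups.setdefault(frozenset(protocols), []).append(machine)
    return groups


# Hypothetical values, not taken from the machines in this thread.
settings = {
    "node-1": ["TLS 1.0", "TLS 1.1", "TLS 1.2"],
    "node-2": ["TLS 1.0", "TLS 1.1", "TLS 1.2"],
    "node-3": ["TLS 1.0"],
    "tentacle": ["TLS 1.2"],
}

shared = common_protocols(settings)
print("Shared by all machines:", sorted(shared) or "none")
for protocols, machines in differing_machines(settings).items():
    print(sorted(protocols), "->", ", ".join(machines))
```

An empty shared set corresponds to the situation the error message describes, and the grouping shows which machines deviate from the rest.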

zhang.shangwu · 4 August 2020 11:27

Strangely enough, when we ran the ping/pong tests on the working node, we got the same error. It’s a little confusing.

Justin_Walsh · 4 August 2020 16:39

As @paul.calvert surmised, this does appear to be a TLS cipher suite mismatch. Are you able to let us know what operating systems you’re running for both your server nodes and the target machines?

Is this also, by chance, an older Octopus install? We previously used SHA1 for our certificate hashing, up until 3.14. After that, we moved to SHA256, but changing the certificate to the newer algorithm required manually regenerating the certificate. If your Octopus install started before then, it’s possible that it is still using the SHA1 certificate, which will cause some modern operating systems to reject the connection for being insecure.

Look forward to hearing from you soon!

zhang.shangwu · 5 August 2020 02:26

Hi, here are the settings from one of the server nodes and from the deployment target VM. Please review them to see if there’s any problem.

paul.calvert · 5 August 2020 04:56

The items in the Schannel and cipher suite tabs typically need to be identical across all of the machines.

zhang.shangwu · 5 August 2020 06:00

Yes, the 3 Octopus Server nodes have the same config, and on the deployment target side I’ve enabled all the options, so I think TLS should not be the problem, right?

zhang.shangwu · 5 August 2020 06:48

And I have another deployment target with the same Schannel and TLS settings and the same OS version, Windows Server 2016 (version 1607, OS build 14393.3808), running Tentacle version 3.8.9, which can communicate with all three Octopus Server nodes without any issue.
So what could be the reason?

zhang.shangwu · 5 August 2020 08:02

Just forget about the TentaclePing/TentaclePong tests; the results were the same even with the deployment targets that can communicate with all 3 Octopus Server nodes.
We might need to think about other reasons.

zhang.shangwu · 5 August 2020 08:10

We have checked with IISCrypto that SHA is enabled on both the server side and the deployment target VMs.
The OS version of the deployment targets is Windows Server 2016 (version 1607, OS build 14393.3808), and the Octopus Servers are running on Windows Server 2012 R2 (version 6.3, build 9600).
The Octopus Server version is 2018.6.9+Branch.master.Sha.382d6372d807b99497a0ba07292a53e22299f341.
The Tentacle version is 3.8.9. (We also tried version 3.22.0, with the same result.)

zhang.shangwu · 5 August 2020 10:58

Anyway, the problem was resolved after restarting the other two Octopus Server nodes…
