Octopus Tentacle - Read Remote identity failed

michal.wilanowski · 16 May 2023 04:01

Hi,

After an EC2 stop and start, Octopus Tentacle was restarted as a process and the following error occurred.
The quick fix is a manual service restart.
I’ve compared pollid and cert thumbprint and everything looks fine. As mentioned above, a restart works, so it’s not that.
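
For reference, the manual restart can be scripted. The following is a minimal sketch, not an official Octopus tool. It assumes the Tentacle runs as a Windows service under the name `OctopusDeploy Tentacle` (an assumption; check the actual name in the Services console) and that the script runs with administrator rights. It shells out to the standard Windows `net stop` and `net start` commands.

```python
import subprocess

# Assumed Windows service name for the Tentacle; verify it on your machine.
DEFAULT_SERVICE_NAME = "OctopusDeploy Tentacle"


def restart_commands(service_name=DEFAULT_SERVICE_NAME):
    """Return the two Windows commands that stop and then start the service."""
    return [["net", "stop", service_name], ["net", "start", service_name]]


def restart_service(service_name=DEFAULT_SERVICE_NAME, run=subprocess.run):
    """Stop and then start the service.

    `run` is injectable so the sequence can be dry-run without touching Windows.
    """
    for command in restart_commands(service_name):
        # check=True raises CalledProcessError if net.exe reports a failure,
        # for example when the service is already stopped.
        run(command, check=True)
```

Calling `restart_service()` from an elevated prompt on the affected machine performs the same stop and start that was done by hand here.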

Current Version 6.1.1403
Part of the logs:

Secure connection established… , using protocol Tls12
Unexpected exception executing transaction.
Halibut.Transport.Protocol.ProtocolException: Unable to receive the remote identity; the identity line was empty.
at Halibut.Transport.Protocol.MessageExchangeStream.ReadRemoteIdentity()
at Halibut.Transport.Protocol.MessageExchangeStream.ExpectServerIdentity()
at Halibut.Transport.Protocol.MessageExchangeStream.IdentifyAsSubscriber(String subscriptionId)
at Halibut.Transport.Protocol.MessageExchangeProtocol.ExchangeAsSubscriber(Uri subscriptionId, Func`2 incomingRequestProcessor, Int32 maxAttempts)
at Halibut.Transport.SecureClient.ExecuteTransaction(ExchangeAction protocolHandler, CancellationToken cancellationToken)
2023-05-16 03:43:39.5737

Any advice?

BR,
M

Daniel_Fischer · 16 May 2023 04:26

Hi Michal,

Thanks for getting in touch! The issue you’re describing sounds an awful lot like this.

github.com/OctopusDeploy/Issues

Polling tentacle fails to present client certificate following 2020.1.4 upgrade

opened 06:32PM - 18 Mar 20 UTC

closed 06:41AM - 23 Mar 20 UTC

Justin-Walsh

kind/bug

# Prerequisites

- [X] I have verified the problem exists in the latest version
- [X] I have searched [open](https://github.com/OctopusDeploy/Issues/issues) and [closed](https://github.com/OctopusDeploy/Issues/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aclosed) issues to make sure it isn't already reported
- [X] I have written a descriptive issue title
- [X] I have linked the original source of this report
- [X] I have tagged the issue appropriately (area/*, kind/bug, tag/regression?)

# The bug

Following the cloud rollout of 2020.1.4, we've seen a small number of cases where tentacles fail to authenticate to the Octopus Server, with a `Halibut.Transport.Protocol.ProtocolException: Unable to receive the remote identity; the identity line was empty.` error.

At this time, we believe this issue stems from older tentacle installs that would have generated a certificate with a weaker cipher, which is presenting issues when trying to negotiate a secure connection over modern encryption protocols (TLS).

## How do I know if I'll be affected?

You may be affected by this bug if you use Octopus Cloud, and you use a polling tentacle that was created in the distant past (e.g. v3.0.25).

## Steps to reproduce

TBD

## Related bugs

We saw a similar issue affecting _listening_ tentacles in the same release, but with slightly different symptoms: #6266.

### Log excerpt

#### Tentacle log

```
https://snip.octopus.app:10943/ 4 Unexpected exception executing transaction.
Halibut.Transport.Protocol.ProtocolException: Unable to receive the remote identity; the identity line was empty.
   at Halibut.Transport.Protocol.MessageExchangeStream.ReadRemoteIdentity() in C:\buildAgent\work\fe2b45bbd4978f75\source\Halibut\Transport\Protocol\MessageExchangeStream.cs:line 169
   at Halibut.Transport.Protocol.MessageExchangeStream.ExpectServerIdentity() in C:\buildAgent\work\fe2b45bbd4978f75\source\Halibut\Transport\Protocol\MessageExchangeStream.cs:line 242
   at Halibut.Transport.Protocol.MessageExchangeProtocol.ExchangeAsSubscriber(Uri subscriptionId, Func`2 incomingRequestProcessor, Int32 maxAttempts) in C:\buildAgent\work\fe2b45bbd4978f75\source\Halibut\Transport\Protocol\MessageExchangeProtocol.cs:line 75
   at Halibut.Transport.SecureClient.ExecuteTransaction(Action`1 protocolHandler) in C:\buildAgent\work\fe2b45bbd4978f75\source\Halibut\Transport\SecureClient.cs:line 66
```

#### Server log

```
listen://[::]:10943/ 82 A client at [::ffff:10.1.0.115]:52944 connected, and attempted a message exchange, but did not present a client certificate
```

## Affected versions

**Octopus Server: 2020.1.2 -> (Suspected)**

## Workarounds

### Workaround 1 (recreate tentacle)

1. Log into the Octopus server and note the target name, environments, and target roles.
2. Open tentacle manager on the impacted machine.
3. Take note of the "Home Directory" setting for your instance (C:\Octopus by default)
4. Delete the Tentacle instance via the Tentacle Manager: ![image](https://user-images.githubusercontent.com/49404281/76994464-cbf82d80-6924-11ea-97e4-954c20ad3aea.png)
5. Ensure that the `Tentacle.config` file has been removed from the home directory noted in step 3.
6. Recreate the tentacle instance with the previously noted name, environments, and target roles. Click the "Overwrite existing" checkbox on the registration page to prevent duplicate targets being created.

### Workaround 2 (new certificate)

1. Run `tentacle new-certificate` on the target (https://octopus.com/docs/octopus-rest-api/tentacle.exe-command-line/new-certificate)
2. Update the thumbprint for the target accordingly on the Server as shown: ![image](https://user-images.githubusercontent.com/1407488/77127331-c66f2480-6a97-11ea-84d5-1ecb557514c2.png)

### Workaround 3 (patch server)

1. Ensure that your servers running Tentacle are updated with the Windows Cumulative updates for March 2020
2. Reboot the servers running Tentacle

## Internal links

### Reported cases

- https://secure.helpscout.net/conversation/1112186056/58987 (recreated polling tentacle)
- https://secure.helpscout.net/conversation/1113171485/59057 (new cert for polling tentacle)
- https://secure.helpscout.net/conversation/1113360699/59108 (recreated polling tentacle)

### Slack discussion

- https://octopusdeploy.slack.com/archives/CUCG4LAF8/p1584465465243200

I couldn’t find any mention of restarting the service as a workaround, so it’s possible that you’re hitting some other issue with similar symptoms. I still think it’s worth reading through the issue and trying the workarounds listed.

Let me know if this helps or if you’re still stuck after trying the fixes listed in the issue.

Best regards,
Daniel

michal.wilanowski · 18 May 2023 04:50

Hi Daniel,

Thanks for the answer. You are right; however, in our use case we don’t need to do anything more than restart the service.

Aside from that, we patch our servers regularly.
I am now testing an update of the Tentacle from 6.1.1403 to 6.3.305; the test results will be available on Monday.

michal.wilanowski · 23 May 2023 05:58

Hi Daniel,

Well, updating the Tentacle version didn’t change anything.
Can you try to reproduce it?

1. Install a working Tentacle on EC2.
2. Stop the instance over the weekend.
3. Start the instance. (Failure as mentioned above)

Thanks,
M

clare.martin · 23 May 2023 12:25

Hey @michal.wilanowski,

Just jumping in for Daniel as he is currently offline as part of our Australian based team. I am going to see if I can replicate this, but I want to make sure I create the EC2 instance like for like, otherwise it won’t be a valid test.

Can you let me know which OS your EC2 instance is running (Amazon Linux, Windows or Ubuntu) and which version of that OS (e.g. Windows Server 2019)?

Just out of curiosity, which Octopus Server version are you running?

We often see:

Halibut.Transport.Protocol.ProtocolException: Unable to receive the remote identity; **the identity line was empty.**

when there is a TLS or SSL cipher mismatch between the Octopus Server and the Tentacle, but since a service restart fixed the issue, I don’t think that is the case here.
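
One way to see what actually gets negotiated is to probe the server’s polling port from the Tentacle machine. The sketch below is only an illustration and assumes Python is available on that machine; the host name is whatever your Octopus Server address is, and 10943 is the polling port that appears in the logs in this thread. It does not present a client certificate, so it only reports the TLS protocol and cipher and does not exercise the Halibut identity exchange itself.

```python
import socket
import ssl


def probe_tls(host, port=10943, timeout=5.0):
    """Open a TLS connection and return (protocol_version, cipher_name)."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Verification is disabled because this probe only inspects what gets
    # negotiated. Do not reuse these settings for real traffic.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            # cipher() returns (name, protocol, secret_bits); keep the name.
            return tls.version(), tls.cipher()[0]
```

Running the probe once while the Tentacle is failing and once after the service restart would show whether the negotiated protocol or cipher differs between the two states.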

Have you noticed this on just one machine, or on several? And has it happened again after a later reboot of the server, or only once, on the initial reboot when you raised this ticket?

It would also be good to get a copy of those Tentacle logs so we can take a closer look. I have created a secure link for you here that you can send the files to. Let us know once they have been uploaded and we will look at them and start trying to reproduce this issue.

I look forward to hearing from you,
Kind Regards,
Clare

michal.wilanowski · 26 May 2023 05:06

Hi Clare,

It’s r5.large.
I have 6 machines that show the same behavior, and one of them has the latest Tentacle version.
I would like to roll this cost-optimization mechanism out to more EC2 instances, but this issue is blocking me.

We stop the machines on Friday and start them on Monday, and this is what happens:

1. Identity issue (a service restart helps)
2. Check health does not work until the fix from step 1 is performed

I’ve checked the logs and this entry is repeatable; there are no other entries that say more, aside from “vulnerable configuration parts”.

Before I raised the ticket I did some research, and I am confident that it’s not TLS / SSL. It has to be some sort of bug with I/O.

I’ve shared the logs; actually, it is one entry that repeats over and over.
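
To show how often the entry repeats, the log can be summarised with a few lines of Python. This is a rough sketch under two assumptions: that each log entry starts with a timestamp shaped like the `2023-05-16 03:43:39.5737` seen in the excerpt earlier in this thread, and that the exception message appears on the same line or on the lines that follow it.

```python
import re
from collections import Counter

# The message text is taken from the log excerpt in this thread.
IDENTITY_ERROR = "Unable to receive the remote identity; the identity line was empty."
# Matches timestamps like "2023-05-16 03:43:39.5737" and captures the date part.
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2}")


def count_identity_errors(lines):
    """Count identity errors per day, keyed by the most recent timestamp seen."""
    per_day = Counter()
    current_day = "unknown"
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            current_day = match.group(1)
        if IDENTITY_ERROR in line:
            per_day[current_day] += 1
    return dict(per_day)
```

The function accepts any iterable of lines, so an open log file can be passed directly. The log location depends on the Tentacle home directory (`C:\Octopus` by default, according to the linked issue), so the exact path is left to the reader.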

Thanks,
M

finnian.dempsey · 26 May 2023 06:27

Hi @michal.wilanowski,

Just stepping in for Clare while she’s offline, cheers for confirming that and uploading the logs!

From a quick look it seems that you’re using a Polling Tentacle (port 10943) and I also saw the following error which does imply it’s I/O:

System.IO.EndOfStreamException: Attempted to read past the end of the stream
at Halibut.Transport.Protocol.ControlMessageReader.ReadControlMessage(Stream stream)

I’d like to double check if you are running Octopus in HA or just a single node? Are there any network appliances (e.g. LoadBalancers or Proxies) between Octopus and the Tentacle?

You don’t happen to be using Polling Tentacles over WebSockets, do you? I recall an issue with SSL offloading, but looking closer I don’t really think that’s what’s happening here: Halibut - Polling Tentacle WebSocket via LoadBalancer Error - The identity line was empty · Issue #7340 · OctopusDeploy/Issues · GitHub

I’d also like to double check if you are starting and stopping the instances via the Console UI or another method? Instance lifecycle - Amazon Elastic Compute Cloud

I find this post from our Solutions Engineers is great for outlining their recommendation with using ASG’s, which uses Lambda to remove the instance from Octopus when down-scaling: Using Octopus Deploy AWS Auto-Scaling Groups - #2 by Bob_Walker

The code for Halibut is public, and this seems to be where the error is coming from. I noticed that it checks the NETFRAMEWORK value: Halibut/MessageExchangeStream.cs at 377a0e5d2d467fa9d3d045520e988264acace943 · OctopusDeploy/Halibut · GitHub

So could you please confirm which .NET version you have installed? Tentacle installation requirements - Octopus Deploy

We’ll look into reproducing this on our end and will keep you posted with any updates or questions!

Best Regards,

michal.wilanowski · 30 May 2023 08:09

Hi Finnian,

OS Windows Server 2019

Octopus Server in cloud
Deployment Targets
No proxy between EC2 and server
Programmatically via boto3, using the stop and start instance functions
We are using Octopus.Tentacle.6.1.1403-x64.msi. As far as I can see, we don’t install anything additional.
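
The actual script was not posted in this thread, so the following is only a guess at its general shape: a minimal sketch that stops an instance, waits for it to reach the stopped state, and starts it again using the boto3 EC2 client calls `stop_instances`, `start_instances` and the `instance_stopped` / `instance_running` waiters. The function name and the `downtime_seconds` parameter are made up for illustration.

```python
import time


def stop_then_start(ec2, instance_id, downtime_seconds=0, sleep=time.sleep):
    """Stop an instance, wait until it is stopped, optionally pause, then start it.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Stands in for the time the machine spends switched off
    # (a whole weekend in the scenario described above).
    if downtime_seconds > 0:
        sleep(downtime_seconds)

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

With boto3 installed and AWS credentials configured, this could be called as `stop_then_start(boto3.client("ec2"), "i-0123456789abcdef0")`, where the instance ID is a placeholder.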

Thanks,
M

clare.martin · 30 May 2023 14:55

Hey @michal.wilanowski,

Thanks for that extra information. I have set up a reproduction environment in our AWS account with the settings below (please let me know if any are incorrect):

Windows Server 2019, R5 EC2 Instance
Polling tentacle on 6.1.1403 setup as a Deployment Target (not a worker)

This connects fine to my Octopus Cloud instance on 2023.2.10947.

I notice you mentioned that you are hosting Octopus in the cloud, but you have a few different Octopus licences (all Octopus Server, so you must be self-hosting Octopus), and I am unsure which one belongs to the instance you are working on. Could you please let us know which Octopus Server version you are running? This should not make a difference, but if we can’t replicate it on 2023.2.10947 we can try your version.

Also, would you mind uploading the boto3 script you are using to start and stop your instances? I have never used boto3 before, and from some googling it seems there are a few ways to start and stop an EC2 instance with it (one user was doing it through a Lambda function).

I assume you are using the official way from the AWS documentation but just in case you are not I would rather use your script so we have a like for like test.

It seems your secure link has expired so I generated you a new one here you can use for the script upload (please redact any sensitive names or passwords).

What I have tested so far:

I did a direct reboot of the EC2 machine via the Windows Start menu, and it health checks fine in Octopus when it comes back up, without me having to restart the Tentacle service. If I reboot it via the AWS console, it comes up fine in Octopus too.
I then stopped the instance via the AWS console and started it again via the same console, and it’s health checking fine in Octopus. I then shut the machine down via the Windows Start menu and brought it back up via the AWS console, and that connects fine to Octopus too.

From your responses, it seems the instances need to be off for a few days for this issue to appear. Have you tried stopping an instance via boto3 and then starting it again almost straight away, and does it health check fine in that case?

Once we get the script and the rest of the details from you, we can really start to dig into this issue. I apologise that it’s taken us a bit of time to get to this point with the reproduction, but we wanted to make sure we were setting this up right.

I look forward to hearing from you,
Kind Regards,
Clare

system · 30 June 2023 14:56

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.