After an EC2 stop and start, Octopus Tentacle was restarted as a process and the following error occurred.
The quick fix is a manual service restart.
I’ve compared pollid and cert thumbprint and everything looks fine. As mentioned above, restart works so it’s not that.
Current Version 6.1.1403
Part of the logs:
Secure connection established… , using protocol Tls12
Unexpected exception executing transaction.
Halibut.Transport.Protocol.ProtocolException: Unable to receive the remote identity; the identity line was empty.
at Halibut.Transport.Protocol.MessageExchangeStream.ReadRemoteIdentity()
at Halibut.Transport.Protocol.MessageExchangeStream.ExpectServerIdentity()
at Halibut.Transport.Protocol.MessageExchangeStream.IdentifyAsSubscriber(String subscriptionId)
at Halibut.Transport.Protocol.MessageExchangeProtocol.ExchangeAsSubscriber(Uri subscriptionId, Func`2 incomingRequestProcessor, Int32 maxAttempts)
at Halibut.Transport.SecureClient.ExecuteTransaction(ExchangeAction protocolHandler, CancellationToken cancellationToken)
2023-05-16 03:43:39.5737
Thanks for getting in touch! The issue you’re describing sounds an awful lot like this.
I couldn’t find any mentions of restarting the service as a workaround though, so it’s possible that you’re hitting some other issue with similar symptoms. I think it’s worth checking it out and trying the workaround methods listed though.
Let me know if this helps or if you’re still stuck after trying the fixes listed in the issue.
Thanks for the answer. You are right; however, in our use case we don’t need to do anything more than restart the service.
Aside from that, we are patching servers regularly.
I am now testing an update of the Tentacle version from 6.1.1403 to 6.3.305; the test results will be available on Monday.
Just jumping in for Daniel as he is currently offline as part of our Australian based team. I am going to see if I can replicate this, but I wanted to make sure I was creating the EC2 instance like for like, otherwise it won’t be a valid test.
Can you let me know what OS your EC2 instance is running (Amazon Linux, Windows or Ubuntu) and which version of that OS (e.g. Windows Server 2019)?
Just out of curiosity too what Octopus Server version are you running?
We often see:
Halibut.Transport.Protocol.ProtocolException: Unable to receive the remote identity; **the identity line was empty.**
when there is a TLS or SSL cipher mismatch between the Octopus Server and the Tentacle, but since a service restart fixed the issue I don’t think that is the case here.
Have you noticed this on just one machine, or are there a few it has happened on? Has it happened again after a reboot of the server, or only once, on the initial reboot when you put this ticket in?
It would also be good to get a copy of those tentacle logs so we can take a closer look if you can send them over. I have created you a secure link here you can send the files to, let us know once they have been sent over and we will look at them and get to trying to reproduce this issue.
I look forward to hearing from you,
Kind Regards,
Clare
It’s r5.large.
I have 6 machines that have the same behavior and one of them has the latest tentacle version.
I would like to roll out this cost-optimization mechanism to more EC2 instances, but this issue is blocking me.
We stop the machines on Friday and start them on Monday, and this is what happens:
1. Identity issue (a service restart helps)
2. Check health issue (it does not work until step 1 is resolved)
I’ve checked the logs and this entry is repeated; there are no other entries that could say more, aside from “vulnerable configuration parts”.
Before I raised the ticket I did my research, and I am confident that it’s not TLS / SSL; it has to be some sort of bug with I/O.
I’ve shared the logs; actually, it is one entry that repeats over and over.
Just stepping in for Clare while she’s offline, cheers for confirming that and uploading the logs!
From a quick look it seems that you’re using a Polling Tentacle (port 10943) and I also saw the following error which does imply it’s I/O:
System.IO.EndOfStreamException: Attempted to read past the end of the stream
at Halibut.Transport.Protocol.ControlMessageReader.ReadControlMessage(Stream stream)
I’d like to double check if you are running Octopus in HA or just a single node? Are there any network appliances (e.g. LoadBalancers or Proxies) between Octopus and the Tentacle?
Thanks for that extra information, I have a reproduction of this in our AWS with the below settings (please let me know if any are incorrect):
Windows Server 2019, R5 EC2 Instance
Polling tentacle on 6.1.1403 setup as a Deployment Target (not a worker)
This connects fine to my Octopus Cloud instance on 2023.2.10947.
I notice you mentioned you are hosting Octopus in Cloud, but you have a few different Octopus licences (all Octopus Server, so you must be self hosting Octopus) and I am unsure which one belongs to the instance you are working on. Could you please let us know what Octopus Server version you are running? This should not make a difference, but if we can’t replicate it on 2023.2.10947 we can try your version.
Also, would you mind uploading the boto3 script you are using to start / stop your instances? I have never used boto3 before, so I did some googling, and it seems there are a few ways to start/stop an EC2 instance with it (one user was doing it through a Lambda function).
I assume you are using the official way from the AWS documentation, but just in case you are not, I would rather use your script so we have a like for like test.
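For reference, a minimal sketch of what such a stop/start cycle might look like with the boto3 EC2 client. This is only an assumption about the approach, not the script used in this thread: the function name `stop_then_start`, the `downtime_seconds` parameter and the instance ID in the usage comment are all hypothetical. The client is passed in as an argument so the same function works with any boto3 session or region.

```python
import time


def stop_then_start(ec2, instance_id, downtime_seconds=0):
    """Stop an EC2 instance, optionally keep it off, then start it again.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    `instance_id` and `downtime_seconds` are illustrative parameters.
    """
    ids = [instance_id]

    # Request the stop and block until EC2 reports the instance as stopped.
    ec2.stop_instances(InstanceIds=ids)
    ec2.get_waiter("instance_stopped").wait(InstanceIds=ids)

    # Optionally keep the instance off for a while (0 = start straight away).
    time.sleep(downtime_seconds)

    # Request the start and block until EC2 reports the instance as running.
    ec2.start_instances(InstanceIds=ids)
    ec2.get_waiter("instance_running").wait(InstanceIds=ids)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   stop_then_start(boto3.client("ec2"), "i-0123456789abcdef0")
```

Calling it with `downtime_seconds=0` gives the "stop and start again almost straight after" case, while a larger value approximates a longer shutdown window.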
It seems your secure link has expired so I generated you a new one here you can use for the script upload (please redact any sensitive names or passwords).
What I have tested so far:
I did a direct reboot of the EC2 machine via Windows Start menu and it health checks fine in Octopus when it comes back up without me having to do a service restart of the tentacle. And if I reboot it via the AWS console it comes up fine too in Octopus.
I then stopped the instance via the AWS console and started it again via the same console, and it’s health checking fine in Octopus. I then shut the machine down via the Windows Start menu and brought it back up via the AWS console, and that connects fine to Octopus too.
From your responses it seems like you need to have the instances off for a few days in order for this issue to be presented. Have you tried to stop the instance via boto3 and then start it again almost straight after and does that health check fine?
Once we get the script and the rest of the details from you we can really start to dig into this issue. I apologise it’s taken us a bit of time to get to this point with the reproduction, but we wanted to make sure we were setting this up right.
I look forward to hearing from you,
Kind Regards,
Clare