Tentacles cannot be reached (health check fails) after migration

alexanderdancho · 1 January 2014 00:28

Hey folks,

So after migrating from 1.6 to 2, it looks like everything took just fine, but after upgrading tentacles I’m getting the following error in my health checks:

Delivery of a Octopus.Platform.Deployment.Messages.Health.TentacleReportHealthRequest failed
The remote machine could not be reached.
Pipefish.PipefishException: The request failed: BadRequest
The incoming request was on a communication link (subscription) that is no longer valid. Reset connectivity to perform a new handshake and reestablish communication.
   at Pipefish.Transport.SecureTcp.MessageExchange.Client.ClientWorker.PerformExchange() in c:\TeamCity\buildAgent\work\cf0b1f41263b24b9\source\Pipefish.Transport.SecureTcp\MessageExchange\Client\ClientWorker.cs:line 303
   at Pipefish.Transport.SecureTcp.MessageExchange.Client.ClientWorker.Run() in c:\TeamCity\buildAgent\work\cf0b1f41263b24b9\source\Pipefish.Transport.SecureTcp\MessageExchange\Client\ClientWorker.cs:line 173
19:25:15   Verbose
Delivery of a Octopus.Platform.Deployment.Messages.Health.TentacleReportHealthRequest failed

Any ideas where that may be stemming from? I’ve restarted the server, restarted the tentacles, and I’m still a little perplexed as to what to do. Thanks!

Nicholas_Blumhardt · 1 January 2014 03:27

Hi Alexander,

This can sometimes happen as a result of reinstalls etc.

On the Environments page, click the problematic machine and select the Connectivity tab. In the upper right corner of the page there should be a Reset button; press it and wait while communication is re-established (the connection will drop out while the Tentacle restarts).

Let me know how you go!
Regards,
Nick

alexanderdancho · 1 January 2014 03:31

Perfect. Thanks Nick!

alexanderdancho · 1 January 2014 19:48

Hey Nick,

So everything seems to be fine, but on two of my health checks I’m getting the warning “This machine shares its endpoint.” Any idea what could be causing that? I set them up as fresh machines with new tentacles, so there’s not any legacy 1.6 stuff there any more.

Nicholas_Blumhardt · 1 January 2014 21:15

We usually emit this warning if the same physical “Tentacle” is mapped to more than one “Machine” in Octopus; do you have one machine shared in multiple environments?

If so, in 2.0 you can add the same machine to multiple environments via its settings page. Just delete one of the machines and map the remaining one to all the environments it appears in.

If not, can you please send the “Raw” task log for the health check? (nblumhardt at our domain reaches me if you’d rather not post here.)

alexanderdancho · 2 January 2014 20:04

Here’s the log. It looks like it’s assuming the tentacles are on the same box. If I go into the server and RDP into those addresses, I get the proper machines, so the computer names and IPs are definitely correct. Not sure why it’d think that they’re located on the same endpoint. Any thoughts?

                    |   Success: Check machine: IP-0AC00A9D at https://ip-0ac00a9d:10933/
15:02:48   Info     |     Sending health check request to IP-0AC00A9D at https://ip-0ac00a9d:10933/ with SQUID SQ-IP-0AC00BB6-FE23360C...
15:02:48   Info     |     Health check successful. Running version: 2.0.8.977.
15:02:48   Info     |     Drive C:\ has 23 GB of available free space remaining
15:02:48   Info     |     Drive D:\ has 22 GB of available free space remaining
15:02:48   Info     |     Drive Z:\ has 11 GB of available free space remaining
                    |   
                    |   Warning: Check machine: IP-0AC00C4A at https://ip-0ac00c4a:10933/
15:02:48   Warning  |     This machine shares its endpoint.
15:02:48   Info     |     Health check successful. Running version: 2.0.8.977.
15:02:48   Info     |     Drive C:\ has 23 GB of available free space remaining
15:02:48   Info     |     Drive D:\ has 22 GB of available free space remaining
15:02:48   Info     |     Drive Z:\ has 11 GB of available free space remaining
                    |   
                    |   Success: Health results:
15:02:48   Info     |     - ONLINE:  AWS Thrawn at https://10.192.10.215:10933/, running version 2.0.7.966 on machine WIN-VCT3OUA7Q2O as WORKGROUP\SYSTEM (Local Administrator: True)
15:02:48   Info     |     - ONLINE:  IP-0AC00A9D at https://ip-0ac00a9d:10933/, running version 2.0.8.977 on machine IP-0AC00A9D as WORKGROUP\SYSTEM (Local Administrator: True)
15:02:48   Info     |     - ONLINE:  IP-0AC00C4A at https://ip-0ac00c4a:10933/, running version 2.0.8.977 on machine IP-0AC00A9D as WORKGROUP\SYSTEM (Local Administrator: True)

Parenthetically, if I switch to the private IP instead of computer name, it still thinks they live on the same endpoint. It’s kind of baffling.
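For anyone who wants to check a raw task log for the same symptom, here is a minimal sketch in Python that groups the "ONLINE" lines by the machine name each Tentacle reports. The regular expression is only inferred from the three lines above, and `machines_by_reported_host` is a made-up helper name, not anything that ships with Octopus:

```python
import re
from collections import defaultdict

# Pattern inferred from the "ONLINE" lines in the log above (an assumption,
# not a documented log format).
ONLINE_LINE = re.compile(
    r"- ONLINE:\s+(?P<name>.+?) at (?P<uri>\S+), "
    r"running version (?P<version>\S+) on machine (?P<host>\S+) as"
)

def machines_by_reported_host(log_text):
    """Map each reported machine name to the Octopus machines that returned it."""
    hosts = defaultdict(list)
    for match in ONLINE_LINE.finditer(log_text):
        hosts[match["host"]].append(match["name"])
    return dict(hosts)

# The three health result lines copied from the log above.
log_text = r"""
15:02:48   Info     |     - ONLINE:  AWS Thrawn at https://10.192.10.215:10933/, running version 2.0.7.966 on machine WIN-VCT3OUA7Q2O as WORKGROUP\SYSTEM (Local Administrator: True)
15:02:48   Info     |     - ONLINE:  IP-0AC00A9D at https://ip-0ac00a9d:10933/, running version 2.0.8.977 on machine IP-0AC00A9D as WORKGROUP\SYSTEM (Local Administrator: True)
15:02:48   Info     |     - ONLINE:  IP-0AC00C4A at https://ip-0ac00c4a:10933/, running version 2.0.8.977 on machine IP-0AC00A9D as WORKGROUP\SYSTEM (Local Administrator: True)
"""

# Any host reported by more than one machine is a candidate for the warning.
for host, names in machines_by_reported_host(log_text).items():
    if len(names) > 1:
        print(f"{host} was reported by: {', '.join(names)}")
```

With the log above, both IP-0AC00A9D and IP-0AC00C4A come back as running on machine IP-0AC00A9D, which is the mismatch the warning seems to be about.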

alexanderdancho · 2 January 2014 20:14

Another strange part: on the connectivity page of the machine that has the “same endpoint” warning, it’s got

Secure TCP connection to https://ip-0ac00a9d:10933/mx/v1

…which is the TCP connection for the other machine. I’ve triple checked to make sure the IP address is mapped fine on the settings page, and it is.

alexanderdancho · 2 January 2014 20:37

Digging into it even more, here are the two records in Raven:

{
  "Name": "IP-0AC00A9D",
  "Uri": "https://ip-0ac00a9d:10933/",
  "Thumbprint": "40A7BBF685A2566575941D4925648E223CDD6944",
  "Squid": "SQ-IP-0AC00BB6-FE23360C",
  "CommunicationStyle": "TentaclePassive",
  "Health": {
    "Status": "Online",
    "Version": "2.0.8.977",
    "LastChecked": "2014-01-02T20:26:45.5760187+00:00"
  },
  "IsDisabled": false,
  "Roles": [
    "web"
  ],
  "EnvironmentIds": [
    "Environments-2"
  ]
}

{
  "Name": "IP-0AC00C4A",
  "Uri": "https://ip-0ac00c4a:10933/",
  "Thumbprint": "F488742DE7AD9825172E1D4495EDF7F8DE4F3D14",
  "Squid": "SQ-IP-0AC00BB6-FE23360C",
  "CommunicationStyle": "TentaclePassive",
  "Health": {
    "Status": "Online",
    "Version": "2.0.8.977",
    "LastChecked": "2014-01-02T20:26:45.5760187+00:00"
  },
  "IsDisabled": false,
  "Roles": [
    "web"
  ],
  "EnvironmentIds": [
    "Environments-2"
  ]
}

The one thing the two records have in common is the Squid ID, which, judging by some of my machines that aren’t having this issue, appears to be the cause. When does this get set, and am I doing something in setup that’s causing them to be duplicated?
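A quick way to check every machine record for this is to group them by Squid. This is a minimal Python sketch, assuming the records have been exported as a list of dicts shaped like the two above; `find_shared_squids` is a hypothetical helper name, not part of Octopus or Raven:

```python
from collections import defaultdict

def find_shared_squids(machines):
    """Return {squid: [machine names]} for every Squid used by more than one record."""
    by_squid = defaultdict(list)
    for machine in machines:
        by_squid[machine["Squid"]].append(machine["Name"])
    return {squid: names for squid, names in by_squid.items() if len(names) > 1}

# Trimmed copies of the two records above (only the fields the check needs).
machines = [
    {
        "Name": "IP-0AC00A9D",
        "Uri": "https://ip-0ac00a9d:10933/",
        "Squid": "SQ-IP-0AC00BB6-FE23360C",
    },
    {
        "Name": "IP-0AC00C4A",
        "Uri": "https://ip-0ac00c4a:10933/",
        "Squid": "SQ-IP-0AC00BB6-FE23360C",
    },
]

for squid, names in find_shared_squids(machines).items():
    print(f"{squid} is shared by: {', '.join(names)}")
```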

alexanderdancho · 2 January 2014 20:41

It’s definitely on my end. Trying to figure out where this gets set, so if you have an idea of where that’s getting pulled from (so far ruling out Computer Name and Full Computer Name, since they’re different on each machine), I’d appreciate it.

Your product is super cool. Just thought I’d throw that out there.

alexanderdancho · 2 January 2014 20:44

So it’s in tentacle.config, and my guess is when I uninstalled / reinstalled, it didn’t wipe that properly. Going to blow that away and see what happens.

(Sorry for sort of talking to myself in this thread, I’m basically just forum-rubber-ducking at this point.)

alexanderdancho · 2 January 2014 20:48

So I did the following:

Uninstalled the service from the tentacle manager
Uninstalled from Add/Remove Programs

… and Tentacle.config still lives in the original install directory. I’m wondering if maybe the Squid ID that’s in there doesn’t get overwritten on a reinstall if you don’t blow that directory away?

Nicholas_Blumhardt · 2 January 2014 22:13

Aaah - this makes sense, thanks for all the detail. Yes, the machines need to have unique SQUIDs, so when cloning VM images you need to delete the Tentacle.config file before running the Tentacle Manager or a script to register the machine.
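If you want to script that clean-up step, something along these lines would do it. This is only a sketch in Python: the install directory is whatever your image uses (the demo below runs against a throwaway temp directory), and `remove_stale_config` is a made-up helper, not a Tentacle command:

```python
import tempfile
from pathlib import Path

def remove_stale_config(install_dir, filename="Tentacle.config"):
    """Delete a leftover config file if present; return True if one was removed."""
    config_path = Path(install_dir) / filename
    if config_path.is_file():
        config_path.unlink()
        return True
    return False

# Demo against a throwaway directory standing in for the real install directory.
with tempfile.TemporaryDirectory() as install_dir:
    (Path(install_dir) / "Tentacle.config").write_text("<stale/>")
    first = remove_stale_config(install_dir)   # file is present, so it is removed
    second = remove_stale_config(install_dir)  # already gone, nothing to do

print(first, second)
```

Run it on the cloned machine before the Tentacle Manager or registration script, so the file is gone by the time the machine is registered.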

Up and running now?

Regards,
Nick

alexanderdancho · 3 January 2014 18:46

Yeah, we’re up and running with a bit of a caveat - I’m happy to open a new ticket for it because it isn’t really related to this conversation, but we’re not seeing prerelease packages show up in the create release screen at all. We’ve got a lot of packages that use git branch names for development branches like:

3.1.36-plannerlanding

… and if I create a release using packages with that version number and manually put the release in, it’s fine, but when trying to browse the package list, we’re not seeing any of our prerelease packages whether or not we have the option ticked off. Just for frame of reference, we’re using TeamCity 8 as our NuGet server and if I create/deploy the release when manually putting the version number in, it works fine, so we can work around it, but I wasn’t sure if that was a known issue for y’all.

Edit: just a thought - are y’all only showing the most recent release number? For example, our master branch is on SemVer 5.0.100, but our development branches are 3.1.X-branchname. Just thought maybe that played into it.

Nicholas_Blumhardt · 5 January 2014 23:18

Ah, yes, that would make sense. We sort by package version number descending, and only bring back the top 30 results. Not sure what we can do to improve this one; open to ideas, though. Thanks for the follow-up!
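To illustrate why the 3.1.x packages drop off the list, here is a minimal Python sketch of "sort descending, take the top 30". The version parsing is a simplification that ignores prerelease ordering, the 5.0.71 to 5.0.100 range is invented for the example, and none of this is the actual Octopus code:

```python
PAGE_SIZE = 30  # the "top 30" limit mentioned above

def version_key(version):
    """Sort key from the numeric part only, e.g. '3.1.36-plannerlanding' -> (3, 1, 36)."""
    numeric = version.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))

def top_versions(versions, limit=PAGE_SIZE):
    """Sort by version number descending and keep only the first `limit` entries."""
    return sorted(versions, key=version_key, reverse=True)[:limit]

# Hypothetical feed: 30 master builds on 5.0.x plus one development-branch build.
feed = [f"5.0.{n}" for n in range(71, 101)] + ["3.1.36-plannerlanding"]

shown = top_versions(feed)
print("3.1.36-plannerlanding" in shown)
```

Once there are 30 or more packages with a higher version number, the 3.1.x ones never make the cut, whether or not prerelease packages are included.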

alexanderdancho · 6 January 2014 00:34

Seems like a “load more” option (with a skip/take on the back end) could do it? It definitely loaded all of 'em in 1.6, right?
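Roughly what I mean by skip/take, as a minimal Python sketch. The `page` function and the 65-item feed are made up for illustration, and I’m assuming the feed is already sorted:

```python
def page(items, skip, take):
    """Return `take` items starting at offset `skip` (an empty list once past the end)."""
    return items[skip:skip + take]

# Hypothetical, already-sorted feed of 65 package versions.
packages = [f"pkg-{n}" for n in range(65)]

loaded = []
while True:
    batch = page(packages, skip=len(loaded), take=30)
    if not batch:
        break  # nothing left to load
    loaded.extend(batch)  # each "load more" click appends the next batch

print(len(loaded))
```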

Nicholas_Blumhardt · 6 January 2014 00:47

Sounds reasonable - but if you have a large number of 5.x packages it could be a pretty poor experience clicking through all of them to get to the 3.x packages you need.

Not sure about 1.6’s behaviour here, I’ll check it out ASAP and see what we do there. Cheers!

alexanderdancho · 6 January 2014 02:12

Maybe even not a “load more” but if an exact match is found when searching, display that package’s availability? Dunno, just kicking tires. Or even just querying the whole of the feed when searching.

alexanderdancho · 17 February 2014 20:57

Hey there Nicholas,

Not to drag up an ancient thread, but I have a question on resetting Tentacle connectivity. I know it can be done from the server side using the GUI, but is it possible to do it from the Tentacle side using Tentacle.exe? Just trying to see if I can work that into some of my posh scripts. Thanks!

Nicholas_Blumhardt · 17 February 2014 21:54

Hi Alexander,

There are a couple of different Tentacle commands for connectivity-related things; most are sub-commands of Tentacle.exe config.

Can you give me an idea of what you need the result to be (reset trust, or just reset connection, etc.)?

Cheers!
Nick

alexanderdancho · 17 February 2014 22:20

So when you go into:

Environments > Machines > [name] > Connectivity

…there’s a button for “Reset”. Is that available through Tentacle.exe config?