Mysterious links between machines residing on the same server

We ran into some really weird behaviour with Octopus Deploy. I’ll describe it here and leave it up to you to decide whether it’s a bug or not.

Here’s our setup.
We have six environments: dev, test, dev-release, test-release, staging, production.
These each have a host name: dev-secure.reqtest.com, test-secure.reqtest.com etc.
These reside on three servers: (1) dev & dev-release, (2) test & test-release & staging, (3) production.
In OD, each environment has exactly one machine, connecting by the appropriate host name.

We now wanted to migrate the test environment to a new server, which was a clone of the original test server. We disabled the tentacle on the old server, enabled the tentacle on the new server, and updated DNS settings so test pointed to the new server.

But for some reason OD was still looking for test on the old server. It wasn’t immediately obvious from the UI, but when the tentacle health check failed with a SocketException (“A connection attempt failed”), the IP address listed in the exception details was still the old address.
We then changed the tentacle settings so it would connect by IP address and not host name, thinking it was a DNS issue. But the SocketException still listed the old IP address.

We tried lots of things, not all of which I can even remember now. We restarted the OD service. We created a new machine configuration and disabled the old one. We removed the tentacle on the new server and reinstalled it to get a fresh certificate with a new thumbprint. Despite all this, OD somehow still tried to connect to the old server.

Then we discovered that the Connectivity tab for the test machine was referring to another host: in the Connectivity tab for machine “test-secure.reqtest.com”, the status message referred to “staging-secure.reqtest.com”.

Our guess is that somehow, because the different machine configurations all originally resolved to the same IP address, OD had linked them all together. So although we had migrated the test machine to a new server, it was still linked via the other machines to the old server. We finally solved the issue by disabling the other machines that reside on the same server as test, and then suddenly everything worked.
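
In hindsight, a quick check of which host names share an address would have made the overlap easier to spot. The following is only a rough sketch of that idea: the function name `group_hosts_by_address` and its `resolve` parameter are made up for illustration, and it looks only at what DNS returns, not at anything OD stores internally.

```python
import socket
from collections import defaultdict


def group_hosts_by_address(hostnames, resolve=socket.gethostbyname):
    """Group host names by the IPv4 address they currently resolve to.

    `resolve` is a hypothetical hook (defaulting to the standard
    socket.gethostbyname) so the lookup can be swapped out if needed.
    """
    groups = defaultdict(list)
    for name in hostnames:
        try:
            groups[resolve(name)].append(name)
        except OSError:
            # socket.gaierror (failed lookup) is a subclass of OSError.
            groups["unresolved"].append(name)
    return dict(groups)
```

Calling it with a list such as `["test-secure.reqtest.com", "staging-secure.reqtest.com"]` returns a dictionary keyed by address, so any two machine names that still point at the same server end up in the same list.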

I hope this makes sense and that this is enough to recreate the issue (should you wish to do so). We didn’t save any screenshots or anything because we were too busy trying to solve the problem.

Hi - thanks for all the info. We haven’t hit any similar issues recently, but we’ll keep an eye on it in case it pops up again.

Best regards,
Nick