Octopus Server having problems connecting with Tentacles in Staging environment in DMZ

Our Octopus server is having problems communicating with our Tentacle setup in our staging environment.

Environments:
We have 3 environments: Development, Staging and Production. Below are more details on Dev and Staging. We won’t worry about Production in this thread, but staging is assumed to be a mirror of production, so we would expect the same issues there as well.

Octopus Server and Tentacle Version: 2.0.9.1020


About the Development Environment…
MachineName: DEVBOX (for the purposes of this conversation)
DNS Name: dev-web1

Development is located on a server inside of our local network and is a member of our Active Directory domain. The Octopus server has no problems connecting with the Development Tentacle instance and everything works as expected. Note that since this server is located on our internal network, system policy and network configurations are not very restricted.


About the Staging Environment…
MachineName: STAGEBOX (for the purposes of this conversation)
DNS Name: stage-web1

Staging is located on a server in the DMZ and is NOT part of our Windows domain. It is an exact mirror of our production environment with identical network and application configurations in place. The configurations for the DMZ machines are much more restricted than that of our Development server since they are on the DMZ. One key thing to note, which could be part of our problem is that the DMZ servers have no inbound access to machines on our internal network and also have no Internet access… so if I go to a browser and type www.google.com, I got nothin’. Another thing to note is that as of now, the Windows Firewall is turned off, so that shouldn’t be an issue.

It’s been explained to me by our network administrators that the Octopus server should have access to port 10933 on the staging server. I verified this by logging onto the machine that the Octopus Server is installed on via Remote Desktop and testing the connection by typing this at a command prompt:
telnet stage-web1 10933

It connected just fine.
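
For anyone repeating this check on a box without a telnet client, here is a minimal Python sketch of the same test. The function name `can_connect` and the timeout value are made up for illustration; it only proves that a TCP connection opens, nothing about whether Octopus and the Tentacle can actually talk to each other afterwards.

```python
import socket


def can_connect(host: str, port: int = 10933, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and opens the socket, like telnet does.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refusals and timeouts alike.
        return False
```

Calling `can_connect("stage-web1")` from the Octopus server would be the rough equivalent of the telnet command above.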


Here’s what happened when I tried setting up staging in the Octopus control panel…

In the Octopus control panel, I created my Staging Environment and clicked “Add machine”. It’s interesting because I got different results when I entered different values in the hostname input box.

Attempt #1: I used the DNS Name for the hostname value. I typed “stage-web1” into the hostname field and a port value of 10933. I then pressed “Discover” and it spun for 30 seconds before timing out with: A response was not received within 00:00:30. I verified that stage-web1 resolves correctly to the IP Address of the staging server by testing a ping from a command prompt on the Octopus server, so it’s odd that it didn’t work. (Technically, our telnet test did the same thing, but I pinged to double check)

Attempt #2: I used the machine name next. I typed “STAGEBOX” into the hostname field and left the port value of 10933. I then pressed “Discover” and after only about a second, I received this error message: The requested name is valid, but no data of the requested type was found.

Attempt #3: I used the IP Address of the server. I typed the IP Address into the hostname field and again, left the default port of 10933. I pressed “Discover” and less than one second later, I received this 3rd different error message: No connection could be made because the target machine actively refused it x.x.x.x:10933
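
The three attempts fail in three different ways, and it can help to tell those apart outside of the Octopus UI. Below is a rough Python sketch that classifies a connection attempt at the TCP level. The function name `probe` and the result labels are invented for this example. It also cannot reproduce the 30 second “response was not received” error from Attempt #1, since that may well happen after the TCP connection is already open.

```python
import socket


def probe(host: str, port: int = 10933, timeout: float = 5.0) -> str:
    """Try one TCP connection and describe how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.gaierror:
        # Name lookup failed (compare the Attempt #2 style of error).
        return "name-resolution-failed"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on the port (Attempt #3).
        return "refused"
    except socket.timeout:
        # No answer at all within `timeout` seconds.
        return "timed-out"
    except OSError as exc:
        return f"other-error: {exc}"
```

Running `probe` against the DNS name, the machine name and the IP address in turn gives one label per hostname form, which makes the three cases easier to compare.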


Another interesting observation:
Installation of the Tentacle on our staging environment was painfully slow. It took the installation script nearly 5 minutes to run, whereas it only took seconds to run on our Development server. Is it making calls back home or trying to hit the Internet or anything like that? If so, that could be causing some sort of time-out within the script since the staging server hosting Tentacle doesn’t have direct Internet access.

More questions:
Does the Tentacle need to have access back to the Octopus server through some sort of different port number? As I mentioned before, while our Octopus server has outbound access to our staging server, our staging server does not have inbound access to the Octopus server. We may be able to get a network rule changed to allow for this, but could this be part of the problem? If so, what port(s) would need to be granted access?


Thanks for any help or suggestions you can provide!

Hi Mike,

Octopus/Tentacle communication is pretty simple - Octopus makes a direct TCP connection to the Tentacle on 10933, that’s about it.

Since inserting images here is not so fun, I’ve written up a response as a how-to on our docs page: http://docs.octopusdeploy.com/display/OD/Troubleshoot+Listening+Tentacles

Can you please look over the suggestions in that document and let me know if you turn anything up?

Best regards,
Nick

Hi Nick,

Thank you for the quick response and the link to the article. I reviewed the contents of the article and am unfortunately still not having any luck.

I agree, posting screenshots isn’t fun here, but in this post, I am attaching some and their descriptions listed below. Hopefully that will help to convey any information you need in helping to troubleshoot.

01 - This is a screenshot on the troubled server running the tentacle service. You will notice that it says the tentacle service is in listening mode.

02 - Another screenshot from the server running the tentacle service. It’s running under the Local System account.

03 - Another screenshot from the server running the tentacle service. I was able to navigate to https://localhost:10933 and get the “Octopus Tentacle configured successfully” message.

04 - On the server running the main Octopus web management application, I was able to navigate to https://stage-web1:10933 and see the same “Octopus Tentacle configured successfully” message.

One thing I did notice is that the title of the web page that I get to reads “Unauthorized” (Look in the tab title on the browser screenshots)

Thanks again for the quick response and hopefully we’ll figure out what it is soon.

Mike


Two other interesting notes to point out:

  • In the Services screenshot (02) I notice that there’s a Stopped service called “Octopus Tentacle” … I’m guessing this is a remnant from 1.6 when I had that setup in the past. I used the Uninstaller to remove 1.6, but if my guess is correct, then this somehow remained.

  • Another thing to note is that I did have Tentacle version 1.6 working on this machine when I had that setup. I was able to deploy from the same machine that’s giving me issues now.

Thanks again for any input you have to offer. I’m not sure if your company does phone support at all, but we’re equipped with all the necessary Voice and Remote screen sharing tools if that is an option.

Thanks,
Mike Joseph

Thanks for the follow up. It’s possible there’s a proxy server between the server and the Tentacles. Can you please check the proxy server configuration on the Octopus Server box (IE LAN settings, or ISA configuration if that’s in use)?

Octopus/Tentacle communication in 2.0 is via TCP, we support a minimal subset of HTTP to make the services easier to work with, but we don’t support proxies at this point. (The 1.6 stack could handle this.)

If the proxy investigation comes up blank, can you please send (by email if preferred) your OctopusServer.txt file from C:\Octopus\Logs?

We don’t formally offer phone support, but we’re always happy to connect up if necessary to work through issues. At the moment I don’t think I have enough ideas to make a sync up worthwhile, so let’s see if we can turn up a bit more information (or ideas on my part) first.

Regards,
Nick

That’s cool on the phone support thing. I figured that was the policy… I was just putting it out there as an option.

I changed the proxy configuration as you suggested (see the attached screenshot) and gave the service time to restart. I tried adding the machine and got the same results (by trying the DNS name, Machine name and IP address for the hostname values) that were in my initial post.

I sent the log file(s) to nick@octopusdeploy.com … but I got a bounceback. Could you send me the correct address?

Thanks again,
Mike

Thanks Mike - I’m nblumhardt at that domain.

Cheers,
Nick

Sent! Thanks again!

PS… I don’t see the attachment I cited in Post #6 showing up. I’m trying to re-attach to this message.

Thanks Mike!

We already ignore these settings when making TCP requests; the proxy situation will be more of a problem if the DMZ network is only accessible via the proxy, or if traffic can get out of your dev network only via the proxy (we won’t use it).

The firewall between your dev network and the DMZ is probably the one to check.

Hope this helps and I’m not too far off into the rough! :wink:

Cheers,
Nick

To anyone viewing this discussion -

After many days of trial and error and email dialog with our sys admins and the Octopus support team, Nick was able to come through with an explanation and a fix.

The Diagnosis:
First an explanation as to what was actually happening… Basically anything “tentacle” related was running very slowly because every time a tentacle executable was being run, the Tentacle Server’s security settings were attempting to hit Verisign servers to check the integrity of the executables. Since our DMZ servers do not have Internet access, the operations were timing out between each step, which was the root of the slow-downs.

The reason we were having what seemed to be inconsistent network related issues is that the Tentacle Service was constantly being restarted automatically: whatever mechanism was restarting it thought the service had crashed because it was so unresponsive. So for the moments that the service was running, I was able to discover a tentacle, but the service would promptly crash and then the connection to the Tentacle server would be actively refused on port 10933 until the service came back up. It was a vicious cycle!
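
A single telnet or ping test can easily land in one of the “up” windows and hide this kind of cycle. Here is a small Python sketch of how the port could be polled over time to make the flapping visible. The helper names (`port_is_open`, `poll_port`, `is_flapping`) and the default attempt count and interval are arbitrary choices for illustration, not anything Octopus provides.

```python
import socket
import time
from typing import List


def port_is_open(host: str, port: int = 10933, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def poll_port(host: str, port: int = 10933,
              attempts: int = 30, interval: float = 2.0) -> List[bool]:
    """Check the port `attempts` times, sleeping `interval` seconds between checks."""
    results = []
    for i in range(attempts):
        results.append(port_is_open(host, port))
        if i < attempts - 1:
            time.sleep(interval)
    return results


def is_flapping(results: List[bool]) -> bool:
    """True if the port was seen both open and closed during the polling run."""
    return True in results and False in results
```

If `is_flapping(poll_port("stage-web1"))` comes back True, the listener is going up and down rather than being blocked by a firewall rule, which points at the service itself.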

The Fix:
After an accurate diagnosis was made, the fix was simple: Go to “Internet Options” on your computer. Select the “Advanced” tab, scroll down to the “Security” section and un-check the option that reads, “Check for publisher’s certificate revocation”.

After accepting the changes, the Tentacle software worked as expected. More information can be found in the email from support, which I included below.
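
If you need to script this rather than click through the dialog, the following Python sketch shows one possible approach. It rests on an assumption that is not stated anywhere in this thread: that the checkbox corresponds to the `State` value under `HKCU\Software\Microsoft\Windows\CurrentVersion\WinTrust\Trust Providers\Software Publishing`, with bit `0x200` meaning “ignore revocation” (so `0x23C00` becomes `0x23E00`). Please verify that on your own machine before relying on it. The function names are made up, and since the value lives under the current user’s hive, the account the Tentacle service runs as may matter.

```python
import sys

# Assumed registry location and flag; not taken from the thread, verify before use.
KEY_PATH = (r"Software\Microsoft\Windows\CurrentVersion"
            r"\WinTrust\Trust Providers\Software Publishing")
IGNORE_REVOCATION = 0x200


def without_revocation_check(state: int) -> int:
    """Return `state` with the assumed 'ignore revocation' bit switched on."""
    return state | IGNORE_REVOCATION


def disable_revocation_check_for_current_user() -> int:
    """Windows only: update the State value for the current user and return it."""
    if sys.platform != "win32":
        raise RuntimeError("This sketch only applies to Windows.")
    import winreg  # imported here so the module still loads on other platforms

    access = winreg.KEY_READ | winreg.KEY_SET_VALUE
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, KEY_PATH, 0, access) as key:
        current, _value_type = winreg.QueryValueEx(key, "State")
        updated = without_revocation_check(current)
        winreg.SetValueEx(key, "State", 0, winreg.REG_DWORD, updated)
        return updated
```

The pure helper is split out so the bit arithmetic can be checked without touching the registry.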

I hope this saves someone some time in the future!

Best,
Mike Joseph


Hi Mike!

We use VeriSign to sign our installer and binaries. Windows will check certificate revocation lists when it loads the executables, which has caused reports of slowdowns from others. The time taken to load the executables does look pretty close to a classic network timeout as you suspected.

This article seems to cover it:
https://www.agressonet.com/Files/extranet_techguide/watcmf20awcoftte09/Agr552/html/certificate_revocation_check.htm

Can you let me know if this turns up any clues?

Cheers!
Nick

Glad you got this sorted and thanks for sharing the solution Mike!

Paul