Check machine status, if back online - continue, if not - wait and check again

Hello!

During our deployment process we sometimes have machines that fail to start properly or take a while to come up. To handle this, we want to poll the machine and continue the deployment only once it is back online, polling again if it is not (Step 3 below). We are having some trouble with this…

For instance:

  1. Start Service1
  2. Restart Service1 machine if Step 1 fails (variable run condition on Step 1 failure, working successfully)
  3. Wait for Service1 machine to come back online (variable run condition on Step 2 failure)
  4. Start Service1 (try starting again)

For Step 3 I am using the “Run a Script” step template.
This is the script:
while (!(Test-Connection -ComputerName $OctopusParameters["Octopus.Machine.Hostname"] -Count 1 -Quiet)) {
    echo "Machine off, sleeping for 5s"
    Start-Sleep -s 5
}

Sometimes the script works and there are no errors. Other times I’m getting these errors (sometimes one, sometimes the other):

The step failed: Activity Check PF1 after machine restart on DPFWeb01 failed with error ‘An error occurred when sending a request to ‘machineURL’, before the request could begin: No connection could be made because the target machine actively refused it [IP]:10933
No connection could be made because the target machine actively refused it [IP]:10933’.
DPFWeb01
Ran for 40 seconds
July 9th 2019 14:31:38 Info
Executing script on ‘DPFWeb01’
July 9th 2019 14:32:08 Error
An error occurred when sending a request to ‘machineURL’, before the request could begin: No connection could be made because the target machine actively refused it [IP]:10933
No connection could be made because the target machine actively refused it [IP]:10933
July 9th 2019 14:32:18 Info
Guidance received: Fail
July 9th 2019 14:32:18 Fatal
The action Check PF1 after machine restart on DPFWeb01 failed

The step failed: Activity Check PF1 after machine restart - clone (1) on DPFWeb01 failed with error ‘An error occurred when sending a request to ‘machineURL’, before the request could begin: The client was unable to establish the initial connection within 00:01:00
The client was unable to establish the initial connection within 00:01:00’.
DPFWeb01
Ran for 6 minutes and 48 seconds
July 9th 2019 15:09:53 Info
Executing script on ‘DPFWeb01’
July 9th 2019 15:11:38 Error
An error occurred when sending a request to ‘machineURL’, before the request could begin: The client was unable to establish the initial connection within 00:01:00
The client was unable to establish the initial connection within 00:01:00
July 9th 2019 15:16:41 Info
Guidance received: Fail
July 9th 2019 15:16:41 Fatal
The action Check PF1 after machine restart - clone (1) on DPFWeb01 failed

Best Regards,

DG

Hi DG,

Thanks for getting in touch! This is a complex process indeed. Octopus wasn’t originally built to natively handle a deployment target being rebooted mid-deployment. We have added enough options now to make this possible, but it’s still a complex approach.

The specific error you are seeing usually means the Tentacle.exe service hasn’t fully started listening on TCP port 10933 by the time you attempt to connect. What I noticed in your script for Step 3 is that you poll the machine by hostname, but you don’t poll the Tentacle itself. If you change this to poll the Tentacle on its HTTPS test address and wait for an HTTP 200 OK result, that should mean the Tentacle service is started and ready to accept connections.
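
As a rough illustration of that idea, here is a minimal sketch of a polling loop with a timeout. The function names Wait-ForCondition and Test-TentacleListening are hypothetical (they are not Octopus cmdlets), and the sketch assumes the step runs somewhere other than the rebooting machine, that the Tentacle listens on port 10933 as in the errors above, and that certificate validation has already been dealt with, since the Tentacle's certificate will normally not validate.

```powershell
# Hypothetical helper: run $Probe until it returns $true or the timeout elapses.
function Wait-ForCondition {
    param(
        [Parameter(Mandatory)] [scriptblock] $Probe,
        [int] $IntervalSeconds = 5,
        [int] $TimeoutSeconds = 600
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ($true) {
        if (& $Probe) { return $true }
        if ((Get-Date) -ge $deadline) { return $false }
        Write-Host "Not ready yet, sleeping for $IntervalSeconds s"
        Start-Sleep -Seconds $IntervalSeconds
    }
}

# Hypothetical probe: $true only when the Tentacle answers with HTTP 200.
# Assumption: certificate validation is already relaxed for this session;
# otherwise the request fails and the probe simply reports $false.
function Test-TentacleListening {
    param(
        [Parameter(Mandatory)] [string] $HostName,
        [int] $Port = 10933
    )
    try {
        $response = Invoke-WebRequest -Uri "https://${HostName}:$Port" -Method Head -UseBasicParsing -TimeoutSec 10
        return ($response.StatusCode -eq 200)
    } catch {
        # Connection refused, timeout, or certificate failure all count as "not ready".
        return $false
    }
}

# Example usage inside a script step (left commented out so the sketch has no side effects):
# $target = $OctopusParameters["Octopus.Machine.Hostname"]
# if (-not (Wait-ForCondition -Probe { Test-TentacleListening -HostName $target })) {
#     throw "Tentacle on $target did not come back within the timeout"
# }
```

Returning $false on timeout, rather than looping forever, lets the step fail with a clear message if the machine never comes back.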

If you don’t have good success with this approach, please send through a few things to support@octopus.com so we can help further:

  1. A screenshot of your deployment process
  2. A JSON export of any important steps with file names matching the process steps so we can correlate
  3. The raw task log of a successful process where there is a reboot, and a raw task log where the process fails this way.

Hope that helps!
Mike

I’ve been playing with this for a bit. I’m trying to poll the Tentacle itself using the Octopus Deploy Tentacle service name “OctopusDeploy Tentacle” and am not having much luck. I get errors related to not being able to find the service:
#ERROR: Get-Service : Cannot find any service with service name ‘OctopusDeploy Tentacle’.
#Get-Service : Cannot find any service with service name ‘W32Time’. (This is Windows Time service)

Here is the script I’ve been testing:

Question: what is the Tentacle’s “HTTPS test address” you mentioned, and how do I find it?

Thanks for the help!

I’m talking with my team a little more on this. The command below works when run locally against certain machines, but not others. This may have something to do with the way our dev/prod environments differ.
Get-Service -ComputerName machineName -Name "OctopusDeploy Tentacle" | fl name, status

However, the command that works locally still doesn’t work when run from Octopus. This may have something to do with the “HTTPS test address” you mentioned.

Hi!

Thanks for keeping in touch! @Alex.Rolley was able to provide some pointers. Here is a previous post where he helped a customer achieve the same kind of outcome: How to Gracefully Handle a Reboot Step In Deploy Project

And here is the PowerShell script body:

# Define the helper type only once per PowerShell session.
if (-not ([System.Management.Automation.PSTypeName]'ServerCertificateValidationCallback').Type)
{
    $certCallback = @"
    using System;
    using System.Net;
    using System.Net.Security;
    using System.Security.Cryptography.X509Certificates;
    public class ServerCertificateValidationCallback
    {
        // Installs a callback that accepts any server certificate.
        public static void Ignore()
        {
            if (ServicePointManager.ServerCertificateValidationCallback == null)
            {
                ServicePointManager.ServerCertificateValidationCallback +=
                    delegate
                    (
                        Object obj,
                        X509Certificate certificate,
                        X509Chain chain,
                        SslPolicyErrors errors
                    )
                    {
                        return true;
                    };
            }
        }
    }
"@
    Add-Type $certCallback
}
# Skip certificate validation for the rest of this session.
[ServerCertificateValidationCallback]::Ignore()

(Invoke-WebRequest -Method Head -Uri "https://localhost:10933").StatusCode

Hope that helps!
Mike


Thanks for the help!

It looks like the issue we were having was that when the scripts were executed on the “Octopus Server”, they would run under the server’s own machine account. We determined that we either need to give the server machine the necessary rights or change Octopus to run as a service account that has the necessary rights.
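
For anyone hitting the same thing, a small diagnostic like the one below can show which account a script step is actually running as. This is only a sketch: Get-ScriptIdentity is a hypothetical helper name, and it relies on the standard .NET Environment properties rather than anything Octopus-specific.

```powershell
# Hypothetical helper: report the account of the current process as DOMAIN\User.
function Get-ScriptIdentity {
    return "{0}\{1}" -f [System.Environment]::UserDomainName, [System.Environment]::UserName
}

# Log the identity so it appears in the task log of the step.
Write-Host "This script is running as: $(Get-ScriptIdentity)"
```

A service running as Local System typically shows up as SYSTEM here, and it authenticates to other machines as the computer account, which would match the behaviour we saw.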

Best Regards

Hi @DG

Thanks for the update and glad to hear that you worked out the issue.

Let me know if there is anything else we can help with!

Regards,
Alex
