Octopus Check Health in Parallel

carlos.moreira · 17 May 2016 08:26

Hi,

I have one Octopus Server instance that deploys and installs applications to several Tentacles at the same time. I noticed that if I launch update batches to all machines at the same time, the Octopus health check takes too long because the checks wait in a queue for the different servers.
Only when one Tentacle answers OK does it move on to check the next server. All machines are ready to deploy at once, and this is causing a lot of delay. I do not want to remove the health check, because it ensures the Tentacle is ready.

Is there any way I can set Octopus to health-check multiple Tentacles at once?

Thank you!

Damian_Brady · 19 May 2016 02:36

Hi Carlos,

Thanks for getting in touch!

I just wanted to clarify a few things to help with your issue.

Can you explain what you mean by launching update batches? Do you mean starting deployments, upgrading tentacle machines or something else?

When we run the scheduled health checks from within Octopus, we do it on all machines in parallel so they shouldn’t block each other. Are you explicitly triggering a health check from your deployment process or are you talking about the scheduled health checks running at the same time?

If this is during a deployment, it would help if you could send us the raw deployment log as well as a screenshot of your process. The more information the better!

Kind Regards,
Damian Brady

carlos.moreira · 13 June 2016 16:37

Hi Damian

The health check I am sending is via the API, and if I send manual health check requests they both stay pending, waiting for something.

Check attached image.

If I connect to the machine by remote desktop, the Tentacle is up and running. I cannot tell what it is waiting for, but it is holding up almost all of my builds.

My process is the following:
1- Deploy an application to a machine using Octopus (via the API)
2- After a successful deploy, I restore the machine to its previous state
3- The machine boots up in the previous state
4- Using PowerShell and the API, send a health check request before releasing the machine to accept more deploys

At this point (most of the time) it hangs in a loop checking health.

The machine has the Tentacle up and running.
Sometimes (not always, check attached image) a health check run from the Octopus interface is OK, but via the API it keeps looping (waiting for a response from the health check).

Vanessa_Love · 16 June 2016 04:07

Hi Carlos,

Damian is headed off to a conference so I am taking over this ticket. Could you explain to me a bit about why you are restoring the machines? Health checks are designed to run in the background and it is possible it is waiting for the machine to come back up.
Mostly I'd like to understand the why of your process, as I think one of our new features might handle this better.
Also, are they polling or listening Tentacles?

Vanessa

carlos.moreira · 16 June 2016 11:24

Hi Vanessa

I’ll describe the restore process to see if you can help me. These are Continuous Integration machines designed to test builds with automated tests. The process is controlled by a PowerShell script that runs the following steps:

[State 1] The machine is up and Octopus Tentacle Manager is all green (service is running). Octopus Server can see this Tentacle and detects it as OK.

1 – Deploy the app to the C.I. server using Octopus Server (via the API). The deployment works perfectly, and the app is now installed on the remote C.I. machine
2 – The application is working OK on the server (Web App using IIS)
3 – After the tests we restore the machine to its previous state [State 1] using the Microsoft Azure cloud restore process
4 – As part of the restore process the machine restarts

[State 2] The machine finishes booting and the Octopus Tentacle is green and running

5 – There is a while loop in PowerShell that waits, before resuming the process, until the Tentacle answers OK to a health check:

  # Request body for a health check task against the restored machine
  $body = @{
      Name = "Health"
      Description = "Checking health of $($targetServer.Name)"
      Arguments = @{
          Timeout = "00:05:00"
          MachineIds = @($targetServer.Id)
      }
  } | ConvertTo-Json

  # Queue the health check task on the Octopus Server
  $result = Invoke-WebRequest -Uri "$OctopusURL/api/tasks" -Body $body -Headers $header -Method Post

The process sometimes hangs here in an infinite loop because the health check does not return OK.
If I go to the Octopus Server interface, click on the C.I. machine and check health, the answer is OK.
The API request keeps looping (waiting for the Tentacle to answer).

Our tentacles are in Listening mode

carlos.moreira · 30 June 2016 09:53

Hi!

Any news on this subject?

Michael_Noonan · 4 July 2016 04:07

Hi Carlos,

Thanks for getting back to us and I apologise for the delay in responding. I have read over your description a few times and I can see the value in what you are doing with automated testing, and in using the infrastructure support in Azure to reset to a known baseline after each test run.

I can’t give you a direct answer as to why the health check for that machine seems to be stalling without more information. If you feel like my suggestions below will not help in your situation, I will ask you for some more information so I can investigate further.

A (potentially) silly question to start with: Have you considered creating a new machine in Azure for each test run? This will end up being more straightforward from an Octopus point of view. My guess is the “restore” approach is to improve performance?

How strongly do you feel about the machine waiting for a successful health check when it comes back online? I understand the idea behind triggering a health check as quickly as possible, but from the pseudo-code you sent, it looks like you could end up enqueuing multiple health checks for the same machine for each iteration of the loop?

In this instance I would recommend POSTing the health check once (to get the machine back “online” ASAP) and not waiting on the result of the health check task. Once the health check completes, Octopus will recognise the machine as being online and available for new deployments. Perhaps I am missing something important here?
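
A minimal sketch of that idea, in case it is useful. It assumes $OctopusURL, $header and $targetServer are set up as in the script posted earlier in this thread. Wait-OctopusTask and Start-HealthCheckOnce are hypothetical helper names, and the Id, IsCompleted and FinishedSuccessfully fields on the task resource, as well as the /api/tasks/{id} route, are assumptions to verify against your Octopus version. The POST happens exactly once. If the script really has to block, it re-reads that same task rather than queueing a new one on each iteration.

```powershell
# Sketch only: helper names and task resource fields are assumptions.

function Wait-OctopusTask {
    param(
        [Parameter(Mandatory = $true)] [string] $TaskId,
        # Script block that takes a task Id and returns the task resource.
        [Parameter(Mandatory = $true)] [scriptblock] $GetTask,
        [int] $TimeoutSeconds = 300,
        [int] $PollSeconds = 5
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        # Re-read the SAME task; nothing new is queued here.
        $task = & $GetTask $TaskId
        if ($task.IsCompleted) { return $task }   # assumed field
        Start-Sleep -Seconds $PollSeconds
    } while ((Get-Date) -lt $deadline)

    throw "Timed out after $TimeoutSeconds seconds waiting for task $TaskId"
}

function Start-HealthCheckOnce {
    param($OctopusURL, $header, $targetServer)

    $body = @{
        Name        = "Health"
        Description = "Checking health of $($targetServer.Name)"
        Arguments   = @{
            Timeout    = "00:05:00"
            MachineIds = @($targetServer.Id)
        }
    } | ConvertTo-Json

    # POST exactly once, outside any loop, and keep the returned task.
    $task = Invoke-RestMethod -Uri "$OctopusURL/api/tasks" -Body $body -Headers $header -Method Post

    # Optional: only if the script must block until the check finishes.
    $final = Wait-OctopusTask -TaskId $task.Id -GetTask {
        param($id)
        Invoke-RestMethod -Uri "$OctopusURL/api/tasks/$id" -Headers $header
    }
    return $final.FinishedSuccessfully   # assumed field
}
```

Calling Start-HealthCheckOnce -OctopusURL $OctopusURL -header $header -targetServer $targetServer would then replace the body of the existing while loop. Invoke-RestMethod is used here instead of Invoke-WebRequest only because it parses the JSON response into an object.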

We have been working hard on Octopus Deploy 3.4 (currently in Beta) which actually includes features to improve how Octopus treats elastic and transient environments/machines. I think some of the features we are building will come in very handy for your situation. The features I think will be interesting are:

You can add a Health Check step to your deployment process, and take certain actions based on the result
You can configure a deployment to automatically skip Unavailable or Unhealthy machines
You can tailor the health check scripts that are run (and how often they are run) using Machine Policies

I would highly recommend looking at our guide for working with elastic and transient environments and trying out Octopus Deploy 3.4 Beta 1.

I hope that helps and please don’t hesitate to reach out if I can help you further!
Mike