Octopus fails to download docker containers when acquiring packages

dgard1981 · 10 March 2021 09:47

When using Octopus Cloud, there is often an error during the Acquire packages step when attempting to acquire a docker container. The issue appears to be that the Octopus worker does not have docker installed and running.

Raw logs - docker-not-available.txt (40.2 KB)

docker command not available 
NotSpecified: You will need docker installed and running to pull docker images 
At C:\Octopus\Tentacle\Work\20210310091735-922121-117\Bootstrap.Octopus.DockerPull.ps1:906 char:2 
+     . '.\Octopus.DockerPull.ps1' 
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
at <ScriptBlock>, C:\Octopus\Tentacle\Work\20210310091735-922121-117\Octopus.DockerPull.ps1: line 30 
at <ScriptBlock>, C:\Octopus\Tentacle\Work\20210310091735-922121-117\Bootstrap.Octopus.DockerPull.ps1: line 906 
at <ScriptBlock>, <No file>: line 1 
at <ScriptBlock>, <No file>: line 1 
Failed to download package my/container vlatest from feed: 'https://my.registry.com' 
Unable to pull Docker image

The issue is intermittent and seems to happen about 10% of the time. Hitting “TRY AGAIN…” typically fixes the issue, but obviously many deployments are automated and there isn’t necessarily someone watching who will be able to do that.
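For automated pipelines that trigger deployments from their own scripts, the manual "TRY AGAIN…" click can be approximated with a retry wrapper. The following is only a generic sketch, not an Octopus feature: `retry` and its parameters are hypothetical names, and the action passed in stands for whatever call your automation uses to start the deployment. Retrying only helps when the failure is transient.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` until it succeeds or `attempts` calls have failed.

    Hypothetical helper: `action` is any callable that raises on failure,
    for example a function that triggers a deployment and raises if it fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait a little longer after each failed attempt.
            sleep(delay_seconds * attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the wrapper can be exercised without real delays.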

Given that the Acquire packages step is automatically applied to all projects that require one or more packages, I don’t think there is anything that can be done from a user perspective, but please correct me if I’m wrong.

If the above statement is true, is this a known issue for Octopus? If so, are there plans to fix the issue?

Thanks,
David

paul.calvert · 10 March 2021 09:59

Hi David,

Thanks for getting in touch!

This is something that we have noticed occurring recently.
We are planning to amend the worker build process to have an additional check on the Docker process to ensure it is available and running before the worker is leased to an instance.
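As an illustration of what such a check could look like (this is a sketch under assumptions, not the actual worker build code), the function below reports whether the Docker CLI is on the PATH and whether the daemon answers `docker info`. The function name `docker_available` and the timeout value are hypothetical.

```python
import shutil
import subprocess


def docker_available(command: str = "docker", timeout_seconds: float = 10.0) -> bool:
    """Return True only if the Docker CLI exists and the daemon responds.

    Assumption: `docker info` exits with a non-zero status when the daemon
    cannot be reached, so a zero exit status is treated as "running".
    """
    # The CLI must be installed and on the PATH.
    if shutil.which(command) is None:
        return False
    try:
        result = subprocess.run(
            [command, "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The binary could not be started, or the daemon did not answer in time.
        return False
    return result.returncode == 0
```

A script step could call `docker_available()` before any step that needs a container image and fail early with a clear message if it returns `False`.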

I will reach out to the cloud team to see where they are with this.

Regards,
Paul

kaleb · 10 March 2021 20:50

Our team also started experiencing this issue sometime after 10:00 AM CST on March 10th. Our first failure with this error was at 12:38 PM CST, and we have been unable to deploy successfully even once since then.

The version 18.0.0 of the Calamari.linux-x64 tool has not been extracted, it will be extracted automatically.
March 10th 2021 12:43:48 Info
Version 18.0.0 of the Calamari.linux-x64 tool has been extracted successfully
March 10th 2021 12:43:55 Info
docker pull octopusdeploy/worker-tools:2.0.1-ubuntu.18.04
March 10th 2021 12:43:55 Error
Cannot connect to the Docker daemon at unix:///home/tentacle/.docker/run/docker.sock. Is the docker daemon running?
March 10th 2021 12:43:55 Error
Failed to download package octopusdeploy/worker-tools v2.0.1-ubuntu.18.04 from feed: 'https://index.docker.io'
March 10th 2021 12:43:55 Error
Unable to pull Docker image

Can we get this issue escalated?

dgard1981 · 11 March 2021 09:33

Thanks for the reply, Paul.

Unfortunately, this issue still persists. So far this morning I’ve made 20 attempts to deploy a project that includes a step which runs in a custom docker container, and all of them have failed with the aforementioned error:

docker command not available

Without wishing to sound too dramatic, this issue basically makes Octopus unusable in this scenario.

Hopefully there will be a resolution soon.

Thanks,
David

paul.calvert · 11 March 2021 09:37

Hi David,

If you let me know your instance name/URL, I can locate the Worker being used and deactivate it. This will then trigger a new worker to be leased that should hopefully not have this issue.

Regards,
Paul

dgard1981 · 11 March 2021 09:59

Hi Paul,

The logs suggest that DynamicWorker 21-03-11-0534-4qlzq is being used every time.

Thanks,
David

paul.calvert · 11 March 2021 10:03

OK, I’ve queued that worker for deletion, so your next deployment should lease a brand new worker.

Once a dynamic worker is leased to your instance, it remains there until it has been inactive for 1 hour or 24 hours have passed, whichever comes first.
This means that if you start encountering this problem the only options at this point are to stop using the Worker for an hour or drop us a message to queue it for deletion and force a new worker to be leased.
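The lease rule described above can be sketched as a small calculation. This is only an illustration, not Octopus code: the name `lease_expiry` is hypothetical, and it assumes the 24-hour limit is counted from the moment the worker was leased.

```python
from datetime import datetime, timedelta

IDLE_LIMIT = timedelta(hours=1)   # released after 1 hour of inactivity
MAX_LEASE = timedelta(hours=24)   # assumption: counted from the lease start


def lease_expiry(leased_at: datetime, last_activity: datetime) -> datetime:
    """Return when a worker would be released: whichever limit is hit first."""
    return min(last_activity + IDLE_LIMIT, leased_at + MAX_LEASE)


# Example with made-up times: leased at 05:34, last used at 09:59 the same day.
leased = datetime(2021, 3, 11, 5, 34)
last_used = datetime(2021, 3, 11, 9, 59)
expiry = lease_expiry(leased, last_used)
```

In this example the idle limit applies, so the worker would be released one hour after the last activity unless another deployment uses it first.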

Regards,
Paul

dgard1981 · 11 March 2021 10:14

A new worker was indeed used, but the same issue persists.

DynamicWorker 21-03-11-0908-vx9tc

It looks like this is an issue with Windows only. Other dynamic workers that run on Linux always seem to acquire docker images successfully.

Thanks,
David

paul.calvert · 11 March 2021 10:19

Ah, since you mentioned that, I took a look at the OS running on the worker. It is using our Windows 2016 image, which doesn’t have Docker installed on it.
Only our 2019 images have Docker on them.

To use the 2019 image, you can either edit the Default Worker Pool in Infrastructure > Worker Pools and select the 2019 image, or create a new Windows 2019 worker pool and amend the relevant steps to use the new worker pool.

dgard1981 · 11 March 2021 10:27

Thanks Paul, that looks to have done the trick.

Thanks,
David

kaleb · 25 March 2021 14:51

This problem is still a huge issue for us. Several times a day for the last 2 weeks, our deployment process has been completely blocked by this, with no way to address it other than waiting for the lease to expire or contacting support.

Can we get an update on this issue or some way to resolve it ourselves without having to reach out to support?

paul.calvert · 25 March 2021 14:56

Hi @kaleb,

As of this morning, a new version of the Ubuntu worker image was released which should resolve this problem.
If you continue to experience this issue going forwards please let us know and provide your Cloud instance name and we will investigate further.

Regards,
Paul
