Octopus Server version 3.1.2
Tentacle version 18.104.22.168
Octo.exe version 22.214.171.124
I have 4 environments with n machines in each. Each machine is an EC2 instance that runs a bootstrap script on instantiation, registering itself with the Octopus server and using Octo.exe deploy-release to trigger an initial deployment. The bootstrap script calls api/dashboard to get the last release that was deployed to that particular environment, and specifies itself as the single machine to target so as not to affect any other machines currently running in the environment.
./octo.exe deploy-release --server $server --apiKey $api --project $project --specificmachines $machinesToTarget --deployto $env --version $latestReleaseForEnv
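For reference, this is roughly how the bootstrap script arrives at $latestReleaseForEnv and $machinesToTarget. It's a trimmed-down sketch rather than a verbatim paste: the dashboard field names (Items, ProjectId, EnvironmentId, ReleaseVersion) are from memory of the 3.x API, and OctopusServerTentacle is a placeholder for the name of the Tentacle that sits on the Octopus server itself.

$headers = @{ "X-Octopus-ApiKey" = $api }

# Resolve the project and environment names to their IDs.
$projectRes = Invoke-RestMethod "$server/api/projects/all" -Headers $headers | Where-Object { $_.Name -eq $project }
$envRes = Invoke-RestMethod "$server/api/environments/all" -Headers $headers | Where-Object { $_.Name -eq $env }

# api/dashboard holds the most recent deployment per project/environment pair;
# pick out the release version for this project in this environment.
$dashboard = Invoke-RestMethod "$server/api/dashboard" -Headers $headers
$latestReleaseForEnv = ($dashboard.Items | Where-Object { $_.ProjectId -eq $projectRes.Id -and $_.EnvironmentId -eq $envRes.Id } | Select-Object -First 1).ReleaseVersion

# Target only this new machine plus the Tentacle on the Octopus server (placeholder name).
$machinesToTarget = "$env:COMPUTERNAME,OctopusServerTentacle"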
I also have a Tentacle running on the central Octopus server which runs scripts that check the health of the currently registered machines in the environment and delete any unhealthy machines. These two actions are the first two steps of the project’s deployment process. This Tentacle is also referenced in the --specificmachines value on the octo.exe call above.
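The delete-unhealthy-machines part of that step looks roughly like this (again simplified; the Status value I filter on and the exact endpoints are my recollection of the 3.x API rather than a verbatim paste):

$headers = @{ "X-Octopus-ApiKey" = $api }

# Look up the environment this deployment is running against.
$envRes = Invoke-RestMethod "$server/api/environments/all" -Headers $headers | Where-Object { $_.Name -eq $env }

# List the machines registered in the environment and delete any that report as offline.
$machines = Invoke-RestMethod "$server/api/environments/$($envRes.Id)/machines" -Headers $headers
foreach ($machine in $machines.Items) {
    if ($machine.Status -eq "Offline") {
        Invoke-RestMethod "$server/api/machines/$($machine.Id)" -Headers $headers -Method Delete
    }
}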
This whole configuration is an effort to support immutable infrastructure, self-healing, and auto-scaling, with all the added orchestration control and niceties that Octopus Deploy offers out of the box.
When writing the bootstrap script and testing it on a single box, this works just fine. The script runs, the machines get health checked, and the unhealthy ones are deleted. Then the correct release gets deployed to the new machine, the load balancers successfully ping the application, and the new machine is brought into the pool. “I win!” I thought… somewhat prematurely…
For the next set of testing I ran a single machine in each environment, so 4 machines across 4 separate environments. This is where I ran into trouble. When terminating all 4 instances at once and watching the self-healing mechanism bring up 4 new instances, I was met with mixed results. Sometimes 2 of the 4 instances would run their scripts successfully, other times 3 of the 4. It was not consistently the same instances, and the order I terminated them in didn’t seem to make a difference. Each time an instance failed its bootstrapping I would RDP into the offending instance and trigger the bootstrap script manually; the instance would then bootstrap correctly and trigger the desired deployment.
The logs in the failing instances showed the following:
Handshaking with Octopus server: http://xxx.xxx.xxx.xxx:8081
Handshake successful. Octopus version: 3.1.2; API version: 3.0.0
Authenticated as: me <>
Finding project: MyProject
Finding release: 2015.12.553
Release ‘2015.12.553’ of project ‘MyProject’ cannot be deployed to environment ‘test’ because the environment is not in the list of environments that this release can be deployed to. This may be because a) the environment does not exist, b) the name is misspelled, c) you don’t have permission to deploy to this environment, or d) the environment is not in the list of environments defined by the project group.
Exit code: -1
This environment is definitely in the list of environments that the specified release can be deployed to… the command works with exactly the same parameters when I trigger it manually.
I am assuming this has something to do with the concurrent calls to the Octopus server not being queued correctly.
The Tentacle running on the server also fails to health check and delete the instances consistently. The server logs at C:\Octopus\Logs show the health check timing out three times when it does fire correctly, but there are no errors to indicate the failed deletion of machines. I get this popping up in the TaskLogs…
["ServerTasks-3051_V48S7ZB4Q3/3af6a9c2366242d9b9bea32960d3e7e2","INF","2015-12-01T21:44:08.9060500+00:00","This Tentacle is currently busy performing a task that cannot be run in conjunction with any other task. Please wait…","",0]
Is this being treated as a failure in the project’s deployment process?
Does Octopus support this sort of concurrent use of octo.exe? Does it have a queuing mechanism for this?