curtis.b
(Curtis Boeschen)
6 April 2022 21:07
a) tentacle upgrade blocks all tasks
b) only 8 tentacles are upgraded at a time
We currently have over 2000 tentacles, and the last 2 updates have caused the ‘tentacle upgrade’ task to run, which is quite time consuming. We have an HA configuration with 4 app tier servers and either need the block on all tasks removed or the ability to configure the number of tentacles upgraded at a time (maybe 100 vs 8).
Any suggestions or timeline for when these types of changes might be coming? (Or is there a value in the DB we could edit to change the number of tentacles being updated?)
thanks
Hi Curtis,
Thanks for getting in touch and detailing your situation. My apologies for the confusion and inconvenience this is causing you. We are aware of what looks like the same issue around Tentacle upgrades blocking system tasks, linked below.
Opened 08:09PM - 22 Oct 21 UTC
Labels: kind/bug, priority/low, state/backlog, team/fire-and-motion
### Team
- [X] I've assigned a team label to this issue
### Severity
Any customer who upgrades will have all their deployments blocked during any tentacle upgrades. The bigger the customer, the worse it is.
### Version
2021.2 but based on git history this might've been going on for months and months.
### Latest Version
I could reproduce the problem in the latest build
### What happened?
You can choose to upgrade all the tentacles in a specific environment rather than _all_ the tentacles.
![image](https://user-images.githubusercontent.com/38642825/138514326-4db3b506-d091-4a08-bfa0-2f28e131f6ab.png)
That triggers a POST to the /tasks endpoint with the following payload:
```JSON
{
  "Arguments": {
    "SpaceId": "Spaces-1",
    "RestrictedTo": "DeploymentTargets",
    "EnvironmentId": "Environments-1"
  },
  "Description": "Upgrade Tentacles in Development",
  "Name": "Upgrade"
}
```
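For reference, here is a minimal sketch of sending that same request outside the UI. This is an assumption-laden example: the server URL and API key are placeholders, and the route is assumed to be `/api/tasks` relative to the server (the exact path may vary by version).

```csharp
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class UpgradeTaskExample
{
    // Hypothetical server URL and API key; substitute your own values.
    const string ServerUrl = "https://octopus.example.com";
    const string ApiKey = "API-XXXXXXXXXXXXXXXX";

    static async Task Main()
    {
        // The same payload the UI sends when you choose
        // "Upgrade Tentacles in this Environment".
        var payload = @"{
            ""Arguments"": {
                ""SpaceId"": ""Spaces-1"",
                ""RestrictedTo"": ""DeploymentTargets"",
                ""EnvironmentId"": ""Environments-1""
            },
            ""Description"": ""Upgrade Tentacles in Development"",
            ""Name"": ""Upgrade""
        }";

        using var client = new HttpClient();
        client.DefaultRequestHeaders.Add("X-Octopus-ApiKey", ApiKey);

        var response = await client.PostAsync(
            $"{ServerUrl}/api/tasks",
            new StringContent(payload, Encoding.UTF8, "application/json"));

        response.EnsureSuccessStatusCode();
    }
}
```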
We can see this message in the log:
![image](https://user-images.githubusercontent.com/38642825/138514734-02bd684a-b866-41ea-a70e-a353ca368f90.png)
That message indicates the code got to this line: https://github.com/OctopusDeploy/OctopusDeploy/blob/master/source/Octopus.Server/Orchestration/ServerTasks/Upgrade/UpgradeTaskController.cs#L104
The upgrade is for all tentacles in a specific environment. However, when I run a runbook in my `Test` environment, it is blocked by that upgrade task:
![image](https://user-images.githubusercontent.com/38642825/138514905-5a1f1dd5-1c9d-4d32-b266-30c3d2f12161.png)
Looking at the database, we can see why this is happening. The EnvironmentID column is populated for the runbook run, but _not_ for the upgrade task (even though it was limited to an environment). The upgrade's environment ID is stored in the arguments JSON.
![image](https://user-images.githubusercontent.com/38642825/138515259-494958ee-d069-448a-9f44-a831fa28ea63.png)
Diving into the code, the function `IsTaskBlockedByCurrentlyExecutingTasks` on the `TaskQueue` object compares Task.EnvironmentId with UpgradeTask.EnvironmentId (the column), not with UpgradeTask.Arguments.EnvironmentId.
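A rough illustration of that mismatch (an illustrative sketch only, not the actual Octopus source; everything here except the `IsTaskBlockedByCurrentlyExecutingTasks` name is made up):

```csharp
// Illustrative sketch: why an environment-scoped upgrade task still blocks
// tasks in unrelated environments.
class ServerTaskRow
{
    public string EnvironmentId;   // NULL (unset) for the upgrade task
    public string ArgumentsJson;   // the upgrade's environment id lives in here instead
}

static class TaskQueueSketch
{
    static bool IsTaskBlockedByCurrentlyExecutingTasks(ServerTaskRow incoming, ServerTaskRow runningUpgrade)
    {
        // The check compares against the EnvironmentId *column* of the upgrade task...
        var upgradeEnvironment = runningUpgrade.EnvironmentId;

        // ...but that column is never populated for environment-scoped upgrades
        // (the value only exists inside the Arguments JSON), so this effectively
        // becomes "block every incoming task" instead of "block only tasks in the
        // same environment".
        return upgradeEnvironment == null || upgradeEnvironment == incoming.EnvironmentId;
    }
}
```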
When I change the payload to also send EnvironmentId in the root object, that additional environment ID is ignored:
```JSON
{
  "Arguments": {
    "SpaceId": "Spaces-1",
    "RestrictedTo": "DeploymentTargets",
    "EnvironmentId": "Environments-1"
  },
  "Description": "Upgrade Tentacles in Development",
  "EnvironmentId": "Environments-1",
  "Name": "Upgrade"
}
```
### Reproduction
Steps to reproduce:
Prep-work: Create a runbook to run a "Hello-World" script. Configure it to run in any environment.
1. Install an older version of the tentacle (I was choosing `6.0.0`)
2. Do a health check to ensure that tentacle is showing up as requiring an upgrade
3. Go to Infrastructure -> Environments and click the overflow menu (`...`) on the environment containing the tentacle to upgrade.
4. Choose the option `Upgrade 1 Tentacle in this Environment` to start the upgrade.
5. While the upgrade is running, run the runbook you created earlier in any other environment. You'll see it block.
### Error and Stacktrace
_No response_
### More Information
Octopus Code I looked at:
- `Octopus.Server.Orchestration.TaskQueue.TaskQueue`
- `Octopus.Server.Orchestration.ServerTasks.Upgrade.UpgradeTaskController`
- `Octopus.Server.Orchestration.ServerTasks.HealthCheck.MachineTaskController`
### Workaround
Schedule tentacle upgrades during off-hours. For large companies with 100s of tentacles running 24/7 this is incredibly difficult, and they have to bite the bullet and block everyone.
Opened 02:18AM - 31 May 17 UTC
Labels: kind/enhancement, size/medium, feature/deployments, state/backlog, team/fire-and-motion
Currently the Tentacle upgrade will block tasks queued behind it from running so they don't attempt to run on a restarting Tentacle. It is somewhat selective but clumsy. If there is a problem with the Tentacle upgrade (for example, updating Calamari gets stuck), the Tentacle upgrade task can block other tasks indefinitely.
We have a script isolation mutex but it does not help Tentacle upgrade because each step of the upgrade takes out the mutex. Another task can acquire the mutex while Tentacle upgrade is between steps and run on a restarting Tentacle.
I think this issue can be resolved by wrapping the entire Tentacle upgrade process in the script isolation mutex. Each script step would then need to run without acquiring the mutex, which I believe is different from anything we do currently. With this approach, if a machine is blocking the Tentacle upgrade from progressing, only tasks that involve that particular machine will be blocked rather than the entire task queue.
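For what it's worth, a rough sketch of what that could look like (illustrative only; the mutex and step types here are placeholders, not the real Octopus implementation):

```csharp
// Illustrative sketch of the proposed approach: hold the machine's script
// isolation mutex for the whole Tentacle upgrade, and run each upgrade step
// without re-acquiring it, so other tasks only wait on that one machine.
using System;
using System.Collections.Generic;
using System.Threading;

static class TentacleUpgradeSketch
{
    // Placeholder for the per-machine script isolation mutex.
    static readonly Dictionary<string, SemaphoreSlim> MachineMutexes =
        new Dictionary<string, SemaphoreSlim>();

    static SemaphoreSlim MutexFor(string machineId)
    {
        if (!MachineMutexes.TryGetValue(machineId, out var mutex))
        {
            mutex = new SemaphoreSlim(1, 1);
            MachineMutexes[machineId] = mutex;
        }
        return mutex;
    }

    static void UpgradeTentacle(string machineId, IEnumerable<Action> upgradeSteps)
    {
        var mutex = MutexFor(machineId);
        mutex.Wait();            // acquire once for the entire upgrade
        try
        {
            foreach (var step in upgradeSteps)
                step();          // each step runs WITHOUT taking the mutex itself,
                                 // so there is no gap where another task can slip in
        }
        finally
        {
            mutex.Release();     // only tasks targeting this machine were ever blocked
        }
    }
}
```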
We have had other reports of issues in this area recently, so it does look like the team is actively looking at this as we speak. Hopefully we'll see some significant improvements in the not-too-distant future.
For the time being, you might be able to mitigate this somewhat in your case by gradually increasing your task cap (the task cap can be adjusted per node under Configuration -> Nodes).
I hope that helps. Please don’t hesitate to reach out with any questions or concerns.
Best regards,
Kenny
system
(system)
Closed
8 May 2022 01:04
This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.