We occasionally see performance issues with the Octopus Deploy UI. This typically happens during a large Prod or Staging deployment with about 10-15 people monitoring the deployment from their own browsers. Octopus seems to get bogged down by the amount of data it is receiving and then having to update the 10-15 dashboards it has running. We can check the Tentacle server and see that a step has completed, but the dashboard still reports the step as in progress while it updates the task log it is displaying. This slows down the overall release time because Octopus doesn't move on to the next step; instead it is stuck trying to render the task log.
Occasionally, a Health Check will kick off in the middle of the release, which seems to cause a lot of traffic issues: we see the number of TCP connections spike from about 50 to 500 (it never seems to go above that).
Steps we have taken to mitigate this issue:
- Reporting less data back to the Octopus server; the extra detail wasn't being used anyway. The log went from about 11k lines to 1k lines.
- Asking everyone except the person deploying the release to close their Octopus sessions. This helps but is obviously less than ideal as it sidesteps one of the main benefits of the tool.
- We are working on creating rolling deployments so that we don’t have 10 servers reporting back to the Octopus server at one time.
- We are working on just creating the log on the server and then collecting it as an artifact.
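The first mitigation above (reporting less data back) can be sketched in a deployment script: write the verbose detail to a local file on the target and print only a short summary to stdout, since Octopus captures stdout into the task log. This is a minimal illustration under that assumption; the file path, function name, and summary format are made up for the example and are not part of any Octopus API.

```python
# Sketch: keep verbose detail out of the task log by writing it to a local
# file on the deployment target and printing only a one-line summary to
# stdout (which Octopus captures into the task log). Paths and messages
# here are illustrative, not Octopus-specific.
from pathlib import Path

def run_step(detail_lines, log_path="deploy-detail.log"):
    log = Path(log_path)
    # Full detail goes to the local file, not the task log.
    log.write_text("\n".join(detail_lines) + "\n", encoding="utf-8")
    # Only a summary is printed, so the dashboards have little to render.
    print(f"Step complete: {len(detail_lines)} detail lines written to {log}")
    return log

run_step([f"processed item {i}" for i in range(11000)])
```

The local file can then be picked up later (for example, as the artifact mentioned in the last bullet) without the dashboards ever having to render it.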
Thanks for getting in touch! It looks like you have already taken some really good steps to speed up the process by reducing both the quantity of data being logged and the number of people monitoring the deployment. Unnecessary logging while files are being written and read can put a lot of stress on the server and can easily cause performance issues. Limiting what is logged to only the necessary information is an effective way of reducing that stress, and limiting the number of people watching the deployment is another.
Perhaps as a more efficient alternative, you could have one or two people watching the task progress from the task or release page; from there they would be able to investigate if Octopus flags an issue and then act on it.
As for the health checks, these are low-priority background tasks, so they should have no impact on performance during a deployment.
If you do require more in-depth logging but do not want to impact performance, you can record the data using artifacts so that it does not need to be written into the Octopus task logs.
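As a sketch of that approach: a script step can write its detailed log locally and then publish the file as an artifact by emitting Octopus's `createArtifact` service message on stdout. The message shape below (attribute values base64-encoded) follows the service-message convention in the Octopus documentation, but please verify the exact format against the docs for your Octopus version before relying on it; the helper names are mine.

```python
# Sketch: publish a local file as an Octopus artifact by emitting the
# "createArtifact" service message on stdout. Attribute values are
# base64-encoded UTF-8 per the Octopus service-message convention
# (verify against current Octopus docs before relying on this).
import base64
import os

def b64(value):
    # Service-message attribute values are base64-encoded UTF-8 strings.
    return base64.b64encode(value.encode("utf-8")).decode("ascii")

def create_artifact(path, name=None):
    name = name or os.path.basename(path)
    length = str(os.path.getsize(path))
    msg = ("##octopus[createArtifact "
           f"path='{b64(os.path.abspath(path))}' "
           f"name='{b64(name)}' "
           f"length='{b64(length)}']")
    print(msg)  # Octopus parses this line from the task output.
    return msg
```

In a PowerShell step the built-in `New-OctopusArtifact` cmdlet does the same job, so you only need something like this for script types without a built-in helper.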
Let me know if this helps.
Is it really Octopus’ position that only one or two people should watch task progress at a time? I consider it a huge design flaw if this application can’t handle more than a small handful of concurrent users. Don’t get me wrong, I like this tool, but it needs to allow more than just a few people to monitor a deployment without causing performance issues.
I understand there is a balance between the performance of the tool and what we are asking it to do. Is there any documented guidance on how to handle these scenarios and/or decide how to mitigate them? For example: if you have 10 power users, a single server instance is fine, but with 40 power users you need an HA setup; or if you have more than X Tentacles, you should consider either HA or a separate Octopus instance per Dev team. Octopus is clearly great for small to medium teams, but we are trying to roll it out across an enterprise, so it would be nice to see some guidance directed toward large enterprises.
Additionally, is there any documentation about how to troubleshoot these performance issues?
Apparently the way to mitigate performance issues is “don’t use Octopus”… no response…