I have an Octopus setup with HA (2 nodes). In case this matters, it was set up over a year ago on a much lower version of Octopus (probably 2019.12.4). Every time I update permissions, one server doesn't pick up the new permissions straight away.
I'm able to replicate this with a test account: after I update its permissions, the server where I made the change shows the new permissions immediately, while the other server takes a few minutes.
I had a look at the database for the cluster configuration and can see everything pointing at our network DFS shares.
From each server, I’ve tried browsing to the directory and everything is in sync.
Both servers have config showing them pointing at the same database.
Any idea why we’re seeing a delay in permissions from one server?
Thanks for getting in touch!
As you mention, both servers read and write directly to the same database, so there shouldn't be any replication delay or anything like that.
I'm wondering if this could be some form of caching issue within the UI.
Have you tried performing a full browser refresh (Ctrl-F5) on the second node after making the changes?
When you change the permissions, is a user able to perform an action requiring the new permissions straight away, or does that also take a few minutes?
It would also be worth upgrading to the latest version when possible to see whether the issue persists.
Ctrl + F5 does not load anything differently. I actually only started investigating this when I realised that API calls from a service account were seeing the same behaviour, so I don't think this is caching within the UI. Also, since I can connect a browser to each node individually, I can see one node show the updated permissions immediately without me having to refresh the page at all.
I forgot to mention that we are currently on version 2020.6.4701. I mentioned the setup version because I noticed the HA setup process has changed since we set this up. Last night I tried running
`Octopus.Server.exe path --clusterShared \\OctoShared\OctopusData`
with the relevant actual path. I restarted the Octopus service on both servers and tested again.
Still the same issue.
I’ve run some tests on this using my local environment and so far have been unable to replicate any kind of delay.
What kind of changes are you making to the permissions? Is it adding or removing a user from a team? Changing the roles assigned to a team? Or changing the permissions assigned to a user role?
Do you also see this kind of delay for any other data changes, for example, if you create a new project on one node does it immediately appear on the other?
I’ve been changing permissions in a user role, changing members in a team and the roles assigned to a team. All similarly face delays on one node.
I also tried creating a project, and no, it doesn't appear on the other node. If I take the URL of the project and open it on the non-updated server, I also get a permissions error.
So far, I’ve only seen the same node always lag behind. I’m currently waiting for permissions to filter through so that I can try creating a project from my test account on one node. We are using OKTA so my normal admin account can’t actually connect to individual nodes right now.
Another test that would be worth trying would be to restart the Octopus service on the second node after making a change to see if it immediately picks up the change when it is back online.
This would highlight whether it is a caching issue within our software.
Permissions filtered through. Once I had admin permission, the projects started appearing fine. Creating or deleting a project on either node was immediately reflected in the other node.
I added a project and could see it on one node. The other (what I'll refer to as node A from now on) didn't show it.
I restarted the service, and when it came back it immediately showed the right project.
Same with permissions. Restarted the service and it immediately had the right permissions.
Do you see the same behaviour in both directions?
Modify Node A - Node B lags
Modify Node B - Node A lags
I just created a local admin account to be able to test this.
I created a project in node A. Node B lags
Most of the earlier tests were the other way, with A lagging behind B, so yes, the lag is happening both ways.
Ok, thanks for testing that.
I’ll bring this up with our engineers to look into further.
Thank you for getting in touch and for your detailed report. I got some additional details around how this currently works from some engineers that I’ll pass along.
Each node has its own cache of user permissions, which Octopus will invalidate when an API request comes through that would affect it, like a permissions change. But since the cache is local to the node that received the request, we don’t have a way to invalidate the separate caches across multiple nodes.
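The behaviour described above can be sketched roughly like this (a simplified illustration only, not Octopus's actual implementation; all names are made up):

```python
class NodePermissionCache:
    """Per-node cache. Invalidation only happens on the node that
    received the API request, per the description above."""

    def __init__(self):
        self._cache = {}

    def get(self, user, loader):
        # Populate from the shared database on first access.
        if user not in self._cache:
            self._cache[user] = loader(user)
        return self._cache[user]

    def invalidate(self, user):
        self._cache.pop(user, None)


# Two nodes share one database but keep separate caches.
db = {"alice": {"ProjectView"}}
load = lambda u: set(db[u])  # simulate a DB read

node_a, node_b = NodePermissionCache(), NodePermissionCache()
node_a.get("alice", load)
node_b.get("alice", load)

db["alice"].add("ProjectEdit")  # permission change made via node A
node_a.invalidate("alice")      # node A drops its stale entry...

assert "ProjectEdit" in node_a.get("alice", load)
assert "ProjectEdit" not in node_b.get("alice", load)  # ...node B is still stale
```

This is why the change shows up immediately on whichever node handled the request, while the other node keeps serving its stale cache.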
I think the only sure-fire workaround to this is to bounce the machines in the cluster.
I’ve also heard there’s some work in progress to improve this area, which will be continuing as part of the upcoming config as code feature.
Sorry I couldn’t give you better news! Please let us know if you have any questions going forward.
What is the length of time before the cache is reverified? As I’m not bouncing the services every time, the cache on both nodes must be syncing up / being rebuilt occasionally.
Is there any way to configure the cache lifetime?
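Since the lagging node does eventually catch up on its own, a time-based expiry would explain the behaviour. A minimal sketch of how such a TTL cache could work (purely illustrative; the actual lifetime and mechanism inside Octopus aren't documented):

```python
import time


class TTLPermissionCache:
    """Hypothetical time-based expiry: an entry is re-read from the
    database once it is older than ttl_seconds, so a node that never
    sees the invalidating request still catches up eventually."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._cache = {}  # key -> (value, stored_at)

    def get(self, key, loader):
        entry = self._cache.get(key)
        if entry is None or time.monotonic() - entry[1] > self.ttl:
            self._cache[key] = (loader(key), time.monotonic())
        return self._cache[key][0]


db = {"alice": {"ProjectView"}}
cache = TTLPermissionCache(ttl_seconds=0.1)
cache.get("alice", lambda u: set(db[u]))

db["alice"].add("ProjectEdit")  # change made on another node
assert "ProjectEdit" not in cache.get("alice", lambda u: set(db[u]))  # still stale

time.sleep(0.15)  # wait past the TTL
assert "ProjectEdit" in cache.get("alice", lambda u: set(db[u]))  # caught up
```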
Also, could I request that documentation be updated to reflect this until this is improved? Specifically:
This page should note the cache limitation.
It may also be useful to say that load balancers should be configured with some form of session persistence. Without it, you see a page that constantly flips back and forth between the components you already had access to and the components you just gained access to.
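For example, cookie-based persistence in HAProxy might look like the following (node names and addresses are hypothetical; most load balancers have an equivalent setting):

```
backend octopus_nodes
    balance roundrobin
    cookie OCTOPUS_NODE insert indirect nocache
    server node-a 10.0.0.11:80 check cookie node-a
    server node-b 10.0.0.12:80 check cookie node-b
```

With this, each browser session sticks to one node, so users at least see a consistent view of their permissions.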
If you have other suggestions for ways to mitigate this, please let me know.
Unfortunately, at this point, the cache lifetime isn’t something that can be configured.
As touched upon by Kenny, upcoming changes made as part of the config as code feature should make it possible for us to invalidate the cache across all nodes when a change is made rather than only on the local node.
Session persistence does seem to be the best option at this point, and I have passed along the suggestion for amending our documentation to include this information.