Octopus HA TaskLog read delay

Hi

I was wondering if there is any history of TaskLog delays in an HA design. We are using AWS EFS shared volumes between 30+ server containers, and when we run tasks the logs take up to 30 seconds to show up. Until they appear, the TaskLog is empty even though the task has already succeeded (or failed).

Thanks

Hey @swalsh1,

Thanks for reaching out on our support forum. It looks like you may be hitting this known bug directly affecting the server tasks endpoint.

Can you review that GitHub issue and let me know if your Octopus version falls in the affected range? If it does, you’d need to upgrade to the release containing the fix (or later) to resolve the slowness.

Let us know if you have any other questions.

Best,
Brent

Hi @brent_kinney

Our version is currently 2022.2 (build 7512). It doesn’t look like it falls in the range listed there, but it’s possible. Can you provide more context on the ServerTasks/details endpoint mentioned in the GitHub issue? Is that a database endpoint, or does it just go to the local path (a network AWS EFS share in this case, shared amongst all 30 server node containers)?

I’m not seeing any 5XX errors related to this behavior. The logs just take about 30 seconds, and after a refresh they show up. Is there a location in the UI where we could find the 5XX errors mentioned in that GitHub issue?

Hey @swalsh1,

It’s both the DB and the local file system. Your Octopus home folder contains the actual server task logs, and your Octopus database has a server tasks table with a record for every server task. When you view that endpoint, Octopus looks up the record in the DB table and then associates it with the task log file on disk; the same goes for artifacts and packages.

If you’re not hitting that error, you may still experience slowness if your server tasks table is very large. What is your retention policy like? Are you clearing out older releases and runbook runs?
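If it helps to quantify what you’re seeing, a small script like the one below can time that endpoint directly from one of your nodes. This is only a sketch: the server URL, API key, and task ID are placeholders, and it assumes the /api/tasks/{id}/details route that backs the task screen, so double-check the path against your own instance.

```python
# Rough sketch: time the task details endpoint a few times in a row.
# OCTOPUS_URL, API_KEY, and TASK_ID are placeholders for your environment.
import time
import requests

OCTOPUS_URL = "https://your-octopus-server"  # placeholder
API_KEY = "API-XXXXXXXXXXXXXXXX"             # placeholder API key
TASK_ID = "ServerTasks-12345"                # placeholder task ID

headers = {"X-Octopus-ApiKey": API_KEY}

for attempt in range(5):
    start = time.monotonic()
    response = requests.get(
        f"{OCTOPUS_URL}/api/tasks/{TASK_ID}/details",
        headers=headers,
        timeout=120,
    )
    elapsed = time.monotonic() - start
    # A 5XX here would line up with the GitHub issue; a slow 200 points
    # more toward DB table size or the file share.
    print(f"attempt {attempt + 1}: HTTP {response.status_code} in {elapsed:.1f}s")
    time.sleep(2)
```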

Best,
Brent

We are currently in the process of migrating our single Octopus Server instance to an HA design (almost done), and because of that we’ve been using a copy of the existing RDS database that contains all of our task logs for the last 2 years.

The matching local files haven’t been migrated to the EFS file share yet, so I thought that could be a possible cause of this behavior: the database has references to TaskLogs that won’t have a matching local file on the EFS share until we perform the final migration.
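For what it’s worth, I’m planning to spot-check that gap with something like the sketch below. The paths are placeholders for our environment, and it assumes the task logs are .txt files under the default TaskLogs folder in the Octopus home directory.

```python
# Rough sketch: list task logs present in the old home folder but not yet
# on the EFS share. Both paths are placeholders.
from pathlib import Path

OLD_TASKLOGS = Path("/old-octopus-home/TaskLogs")      # placeholder
EFS_TASKLOGS = Path("/mnt/efs/octopus-home/TaskLogs")  # placeholder

old_files = {p.relative_to(OLD_TASKLOGS) for p in OLD_TASKLOGS.rglob("*.txt")}
efs_files = {p.relative_to(EFS_TASKLOGS) for p in EFS_TASKLOGS.rglob("*.txt")}

missing = sorted(old_files - efs_files)
print(f"{len(missing)} task log files not yet on the EFS share")
for rel_path in missing[:20]:  # print a small sample
    print(rel_path)
```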

I believe we are retaining all logs (part of company requirements).

@brent_kinney

Hey @swalsh1,

Yes, that could definitely contribute to those load times. Depending on how often you deploy, and with retention not cleaning anything up, you could have several hundred thousand records in that DB table. Add the split file storage on top of that, and it makes sense that your load time is on the higher side; to be honest, I would have expected it to be a bit worse given that scenario. I’m curious to see whether your load time drops once you move the physical files over to your EFS share. If it doesn’t, you may want to see whether you can set some type of retention, perhaps copying task logs to another location and letting retention clean up the releases and logs in the DB and on disk, though I understand that company policy can be a bit restrictive.
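If you do end up going down that archive-then-clean-up route, the general shape would be something like the sketch below. The paths, cutoff, and .txt naming are assumptions you’d verify against your own home folder on the EFS share; it copies rather than moves, so retention remains the thing that actually deletes.

```python
# Minimal sketch: copy task logs older than a cutoff to an archive location
# before retention removes them from the DB/disk. Paths and cutoff are
# placeholders; assumes the default TaskLogs folder layout.
import shutil
import time
from pathlib import Path

TASKLOGS = Path("/mnt/efs/octopus-home/TaskLogs")  # placeholder
ARCHIVE = Path("/mnt/archive/octopus-tasklogs")    # placeholder
CUTOFF_DAYS = 365                                  # placeholder retention window

cutoff = time.time() - CUTOFF_DAYS * 24 * 60 * 60
ARCHIVE.mkdir(parents=True, exist_ok=True)

for log_file in TASKLOGS.rglob("*.txt"):
    if log_file.stat().st_mtime < cutoff:
        destination = ARCHIVE / log_file.relative_to(TASKLOGS)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(log_file, destination)  # copy, don't move; let retention delete
```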

Best,
Brent

I’ll let you know if it improves once all of the disk files have been migrated. I’ll also look into a retention policy or an archival system if we can. Does Octopus have any suggestions in those areas?

We could help suggest how to structure your retention policy around what you want to keep, but most of it is fairly self-explanatory and is covered in our retention documentation. In general, we suggest keeping your retention as aggressive as possible so your table sizes stay down and your instance keeps performing well. Our solutions team at advice@octopus.com might be able to give you some pointers in that area, but I’m not sure how much they’d be able to help, as that would fall outside of what Octopus itself can do.

I’m sorry I don’t have more for you on that front.
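If it helps in the meantime, a quick row count against the server tasks table will at least show how much data retention would be working with. This is only a sketch: it assumes SQL Server access via pyodbc and that the table is named dbo.ServerTask, so please confirm the table name and connection details against your own Octopus database first.

```python
# Rough sketch: count rows in the server tasks table to gauge its size.
# The connection string values are placeholders; the table name is an
# assumption to verify against your own database.
import pyodbc

CONNECTION_STRING = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-rds-endpoint;DATABASE=Octopus;"  # placeholders
    "UID=readonly_user;PWD=change-me;"            # placeholders
)

with pyodbc.connect(CONNECTION_STRING) as connection:
    cursor = connection.cursor()
    cursor.execute("SELECT COUNT(*) FROM dbo.ServerTask")
    (row_count,) = cursor.fetchone()
    print(f"Server task rows: {row_count}")
```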

Best,
Brent
