Problem taking scheduled backups in Octopus 2.6

Hi,

We have an issue every two days where the backup keeps running until we stop it and re-run it.
I am attaching a picture.

Thanks,
Manoj.

Hi Manoj,

Thanks for getting in touch! When the UI shows the backup at 20%, that is when Raven is doing its part of the backup and export process. It doesn't report progress or errors particularly well.
The next steps to diagnose this are to go into the Raven studio via the Octopus Manager and do the following:

  • Check in the footer how many documents you have
  • Go to Tasks -> Export Database and export the database

Post back the results of both. The export is what runs when the backup reaches 20%, so this should give you an estimate of both how long it takes and whether any errors are reported.
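
If it is easier than opening the studio each time, the document count can also be read straight from Raven's statistics endpoint. Below is a minimal C# sketch; the URL assumes Octopus 2.6's default embedded Raven port (10930), so adjust it to match your instance.

    using System;
    using System.Net;

    class RavenDocumentCount
    {
        static void Main()
        {
            // Assumption: Octopus 2.6's embedded RavenDB listens on port 10930;
            // adjust the URL to match your instance.
            var url = "http://localhost:10930/stats";
            using (var client = new WebClient())
            {
                // /stats returns a JSON document that includes CountOfDocuments,
                // the same number shown in the Raven studio footer.
                Console.WriteLine(client.DownloadString(url));
            }
        }
    }
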
Vanessa

Hi Vanessa,

We have 156,661 documents, and I have attached the output of Tasks -> Export Database.

–Manojreddy

Sometimes it says the backup completed, but it is failing with the error below:

This task started 8 hours ago and ran for 2 hours
Scheduled database backup
Fatal 09:52:22
The remote server returned an error: (404) Not Found.
System.Net.WebException: The remote server returned an error: (404) Not Found.

Server stack trace:
at System.Net.HttpWebRequest.GetResponse()
at Raven.Abstractions.Connection.HttpRavenRequest.SendRequestToServer(Action`1 action) in c:\Builds\RavenDB-Stable\Raven.Abstractions\Connection\HttpRavenRequest.cs:line 205
at Raven.Smuggler.SmugglerApi.d__27.MoveNext() in c:\Builds\RavenDB-Stable\Raven.Smuggler\SmugglerApi.cs:line 259

Exception rethrown at [0]:
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Raven.Abstractions.Smuggler.SmugglerApiBase.d__1.MoveNext() in c:\Builds\RavenDB-Stable\Raven.Abstractions\Smuggler\SmugglerApiBase.cs:line 142

Exception rethrown at [1]:
at Microsoft.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at Microsoft.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccess(Task task)
at Octopus.Server.Orchestration.Backup.BackupOrchestrator.d__0.MoveNext() in y:\work\refs\heads\master\source\Octopus.Server\Orchestration\Backup\BackupOrchestrator.cs:line 51
Octopus.Server version 2.6.5.1010

Hi Manoj,

You have definitely exceeded the number of documents that Raven can realistically handle inside Octopus.
You are also having trouble with your migrations in the other ticket.

So I very strongly suggest that you start applying retention policies to get that document count down.
Even once the data is reduced, you will need to control what is imported into 3.x, as the migration would otherwise simply take too long.

But let's start by getting that document count down.
Do you run any data retention policies? Do you have any internal policy that requires keeping release data for a certain period?

Vanessa

Hi Vanessa,

We actually have a retention policy for the repository, and it is set to 60 days.
We don't have any other internal policy.
Is there any option to bring the data down?

–Manojreddy

(Attachment: rep.PNG)

Hi Vanessa,

Can you provide info on how we can purge the data? Our database is 16 GB, and we would like to remove all the unused documents.
Do you have a script to do this, or any other option for purging the data?

Thanks,
Manoj.

Hi Manoj,

The best and most complete way to get your data and document count down (which you will need for the 3.x migration you are currently attempting) is via the on-server retention policies.

You want to add a Releases retention policy. As you have so many records, I suggest you start with a very large number such as 730 days (approximately two years).
The next day set it 100 days lower, and so on (730, then 630, 530, 430, and finally 365), until you end up somewhere around 365 days; then we can see how many documents you have left.

This is the only approach I would recommend, as it performs a cascading-style deletion and will not leave you with orphaned documents and a potentially broken Octopus UI.

I have never seen a successful migration with a backup over 2 GB, so we need to aim well under that.

I am also happy to get on a call with you to discuss a strategy for this, as with this much data it will take some time to clean up correctly.

Vanessa

Hi Vanessa,

Glad to hear it. I'll discuss with my team and block out a time so other team members can join the call.
We are actually in Philadelphia, PA (EST).

Thanks,
Manojreddy

Hi Manoj,

Not a problem. Even though we are in Australia, the scheduler works in your local time while blocking out the available slots in our calendar, so we don't need to do any crazy conversions; just pick a time that works for you in the scheduler.

Vanessa

I have scheduled it for Monday at 6 PM.

Thanks,
Manojreddy.

Hi Manoj,

Here are the call details:

  1. Please join my meeting.
    https://global.gotomeeting.com/join/156467733

  2. Use your microphone and speakers (VoIP) - a headset is recommended. Or, call in using your telephone.

Australia: +61 2 8355 1034
United States: +1 (571) 317-3116
Access Code: 156-467-733

Audio PIN: Shown after joining the meeting
Meeting ID: 156-467-733


Speak Soon,
Vanessa

Hi Manoj,

Thanks for the call yesterday. Below is a summary of my suggestion for your upgrade strategy.

Test run:

  1. Restore a current backup from your production instance to a local instance.
  2. Stop the service (you do not need the service to be running, and maintenance mode does not stop health checks or retention policies).
  3. Browse Raven and delete from the Events collection; aim for a total of 70-90k documents. (There is no query you can run to make this easier; deleting from the documents view is fine, as you can multi-select and right-click delete.)
  4. Start the service and take a backup, then stop the service again; note the file size.
  5. Attempt a migration to your 3.0 test instance.

Keep note of how long it takes you to delete from the Events collection, and how long the service takes to start again, as Raven re-indexes at that point after all that data has been deleted. A sketch for checking the document counts as you go is below.
Also note how long the migration takes, as the production run should be similar.
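
To keep an eye on your progress toward the 70-90k total as you delete, here is a rough C# sketch using the RavenDB .NET client to read the total document count and the size of the Events collection via Raven's built-in Raven/DocumentsByEntityName index. The URL, and the "Events" collection tag, are assumptions drawn from this thread; adjust both for your instance.

    using System;
    using Raven.Abstractions.Data;
    using Raven.Client.Document;

    class DocumentCounts
    {
        static void Main()
        {
            // Assumption: Octopus 2.6's embedded RavenDB default port.
            using (var store = new DocumentStore { Url = "http://localhost:10930" }.Initialize())
            {
                // Total documents, the same number shown in the Raven studio footer.
                var stats = store.DatabaseCommands.GetStatistics();
                Console.WriteLine("Total documents: {0}", stats.CountOfDocuments);

                // Count only the Events collection via Raven's built-in
                // Raven/DocumentsByEntityName index (Tag holds the collection name).
                var result = store.DatabaseCommands.Query(
                    "Raven/DocumentsByEntityName",
                    new IndexQuery { Query = "Tag:Events", PageSize = 0 },
                    new string[0]);
                Console.WriteLine("Events documents: {0}", result.TotalResults);
            }
        }
    }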

When it comes to your actual production upgrade, you will need to schedule a window in which the Octopus Server is unavailable, as you will need to delete the records and still have a working set of data to migrate. The timings from the test run above (minus step 1) should tell you how long that window needs to be.

Please let me know if you have any additional questions.

Vanessa

Hi Vanessa,

We have upgraded to Octopus 3.3.4 successfully, and the whole process took about 6 hours.
I have a question regarding the Tentacle upgrade: I see an Upgrade All Tentacles option under Environments. Is that something we can use to upgrade Tentacles rather than using Hydra?

Please let us know the options.

Thanks for all your help.

Thanks,
Manojreddy.

Hi Manoj,

That is fantastic news!

Octopus 2.x communicated with Tentacles via Pipefish, while 3.x communicates via Halibut.
2.x Tentacles will not understand communication from 3.x, so Hydra needs to be run from 2.6 to upgrade them to 3.x.
So no, unfortunately Hydra is the only way to upgrade the Tentacles.

Vanessa

Hi Vanessa,

We have a problem with the RavenDB backup: every backup has been timing out for the last two days.
We were trying to delete the Events collection and restore the DB so that the 63 GB RavenDB comes down to 2 GB. We do this about once every two months until we upgrade to 3.x.

We had maintenance this week and are encountering the DB backup timeout issue, and unless we get a backup we cannot do the maintenance.

I am attaching the screenshots from the dashboard as well as from RavenDB.

Hope to hear from you soon.

Thanks,
Manojreddy

Hi Manojreddy,

In your 5.jpg, while it shows an error it also says the task completed. Did it manage to export a dump file?
You may be in a catch-22: you cannot do the maintenance unless you have a backup, and you cannot get a backup until you do the maintenance.

What does the footer of your Raven manager look like currently?

When is your upgrade planned? The information we have given you to keep it stable is not sustainable for a lengthy duration. What can we do to help expedite this upgrade?

Vanessa

The issue with backups has been resolved.
However, after maintenance, one of the backup tasks timed out and we cancelled it twice.
It is not moving to Recent and still stays in Active Tasks, because of which our automated backups are not triggering.

Hi Manojreddy,

If that task has not moved out of the active tasks, you may need to rebuild the Raven indexes.
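
For reference, an index can also be reset programmatically rather than through the studio. A minimal sketch, assuming the default embedded Raven URL and the client's ResetIndex command; the index name here is only an example, so reset whichever index the studio shows as stale:

    using System;
    using Raven.Client.Document;

    class ResetRavenIndex
    {
        static void Main()
        {
            // Assumption: Octopus 2.6's embedded RavenDB default port.
            using (var store = new DocumentStore { Url = "http://localhost:10930" }.Initialize())
            {
                // ResetIndex discards the index's processed data and forces a full
                // rebuild the next time Raven indexes. The index name below is only
                // an illustrative example.
                store.DatabaseCommands.ResetIndex("Raven/DocumentsByEntityName");
                Console.WriteLine("Index reset requested.");
            }
        }
    }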

Vanessa