SQL Execution Timeout Expired After upgrade

j.keulartz · 11 July 2023 10:56

Hi Octopus support,

We upgraded to version 2023.2 (build 12209) some weeks ago,
and since then one runbook generates the error given below.

Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Current transactions:
Transaction 'GetEvents.GetEvents|80007b9d-0002-fb00-b63f-84710c7967bb|T1027' Open with 2 commands started at 2023-07-06T12:19:15 (60.66 seconds ago)
2023-07-06T12:19:15 SELECT COUNT(*) FROM [dbo].[Event] WHERE (((([SpaceId] in ('Spaces-1'))) OR (([SpaceId] is null)))) AND ([Id] in (SELECT er.EventId from EventRelatedDocument er where er.RelatedDocumentId = @regarding0))
2023-07-06T12:19:15 SELECT * FROM ( SELECT Id,RelatedDocumentIds,ProjectId,EnvironmentId,Category,UserId,Username,Occurred,Message,TenantId,AutoId,DataVersion,UserAgent,SpaceId,JSONBlob,ChangeDetails,IpAddress,JSON, ROW_NUMBER() OVER (ORDER BY [Occurred] DESC, [AutoId] DESC) AS RowNum FROM [dbo].[Event] WHERE (((([SpaceId] in ('Spaces-1'))) OR (([SpaceId] is null)))) AND ([Id] in (SELECT er.EventId from EventRelatedDocument er where er.RelatedDocumentId = @regarding0)) ) ALIAS_GENERATED_1 WHERE ([RowNum] >= @_minrow) AND ([RowNum] <= @_maxrow) ORDER BY [RowNum]
SQL Error -2 - Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
The wait operation timed out.

We have run the System Integrity Check and it reported no errors.
Any advice?

donny.bell · 11 July 2023 12:22

Hi @j.keulartz,

Thank you for contacting Octopus Support. I’m sorry to hear you are having timeout issues when trying to run a particular Runbook.

Are you able to generate a System Diagnostics Report and share that with us via this secure upload link?

Let us know once you have uploaded that and we’ll have a look to see what the issue is.

Best Regards,
Donny

j.keulartz · 11 July 2023 12:35

Hi Octopus support,

I have just uploaded the files

donny.bell · 11 July 2023 13:50

Hi @j.keulartz,

Thank you for getting back to me.

I don’t see anything in particular that jumps out, other than that the timeouts seem to be occurring throughout the logs.

Is this Runbook the only “thing” that seems to be generating this error? If so, are you able to try cloning the Runbook and see if the clone behaves in the same way?

Let me know regarding the above at your earliest convenience.

Best Regards,
Donny

j.keulartz · 11 July 2023 14:18

Hi Donny

I just spoke to the end user.

It appears that when viewing the snapshots in the runbook’s history, the screen sometimes opens immediately, but at other times it needs more than 1 minute or shows the previously reported error (timeout).

I have included a movie (mp4) that shows a long period of waiting

[Image: screenshot of a web page with text (automatically generated description)]

donny.bell · 11 July 2023 14:35

Hi @j.keulartz,

Thank you for letting me know. Unfortunately, the video didn’t come through. However I suspect this may be due to many repeated runs of the same Runbook Snapshots. This is the type of behavior I would expect if the Runbook Retention Policy is set to Keep All.

Can you confirm the current Retention Policy setting for the Runbook in question?

Let me know at your earliest convenience.

Best Regards,
Donny

paraic.oceallaigh · 11 July 2023 23:52

Hi @j.keulartz
Just stepping in for Donny who is offline at present as part of the UK team.

I came across this issue recently and we do have a GitHub issue for this which is still open:

github.com/OctopusDeploy/Issues

SQL timeouts when searching events, publishing or loading a runbook snapshot - Execution Timeout Expired

opened 12:26AM - 09 Aug 21 UTC

adam-mccoy

kind/bug area/core

# Prerequisites

- [x] I have verified the problem exists in the latest versio…n
- [x] I have searched [open](https://github.com/OctopusDeploy/Issues/issues) and [closed](https://github.com/OctopusDeploy/Issues/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aclosed) issues to make sure it isn't already reported
- [x] I have written a descriptive issue title
- [x] I have linked the original source of this report
- [x] I have labelled the value stream (area/core, area/steps, ...)
- [x] I have tagged the issue appropriately (kind/bug, kind/enhancement, feature/ ...)

# The bug

An Octopus instance with many events can experience SQL timeouts when searching for events in particular ways, like by user, project, environment, etc.

## Steps to reproduce

This requires an Octopus instance with many millions of events.

1. Go to Audit under Configuration
2. Add filters for last 30 days and a user
3. See error

### Screen capture

![image](https://user-images.githubusercontent.com/10055302/128649821-c613c77a-73a3-405b-a175-3838e384e9a1.png)

```
Exception occurred while executing a reader for `SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (ORDER BY [Occurred] DESC, [AutoId] DESC) AS RowNum FROM [dbo].[Event] WHERE ((([SpaceId] = 'Spaces-1') OR ([SpaceId] = 'Spaces-2') OR ([SpaceId] = 'Spaces-3'))) AND ([Occurred] >= @from) AND ([Occurred] < @to) AND (([UserId] in (@userids_1) OR [Id] in (SELECT er.EventId from EventRelatedDocument er where er.RelatedDocumentId in (@userids_1)))) ) ALIAS_GENERATED_1 WHERE ([RowNum] >= @_minrow) AND ([RowNum] <= @_maxrow) ORDER BY [RowNum]`
SQL Error -2 - Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Unknown error 258
```

## Affected versions

**Octopus Server:** 2021.1 onwards

## Workarounds for on-premise customers

You need to clear the cached execution plans:

```
DBCC FREEPROCCACHE
```

## Workarounds for Cloud (Azure SQL) customers

There are a few things that can improve the performance of searching events:

- Add more compute power to the SQL Server instance, if possible
- Rebuild the indexes on the `EventRelatedDocument` tables, see example script below
- Re-running the query several times can allow the query to complete within the timeout limit due to internal caching done by SQL Server

See [detailed steps here](https://stackoverflow.com/c/octopusdeploy/a/1390/20).

### Rebuild `EventRelatedDocument` indexes

```sql
ALTER INDEX [IX_EventRelatedDocument_RelatedDocumentId] ON [dbo].[EventRelatedDocument] REBUILD;
ALTER INDEX [IX_EventRelatedDocument_EventId] ON [dbo].[EventRelatedDocument] REBUILD;
```

Note: this can take a long time to execute, depending on the number of events.

## Links

Internal report: https://trello.com/c/faJFrpEl/37-sql-execution-timeout-on-deploy

Reading through the issue, the timeouts seem to come from the SQL Server side, caused by poor execution plans. There is a workaround for on-prem servers, which is simply to clear the execution plan cache:

DBCC FREEPROCCACHE

If you get a chance to run this on your SQL server it may resolve the timeouts.
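
For reference, `DBCC FREEPROCCACHE` with no arguments clears the plan cache for the whole SQL Server instance, so every database hosted on it recompiles its queries afterwards. A narrower variant is to look up only the cached plans that mention `EventRelatedDocument` and evict those individually. The following is a minimal sketch, assuming you have `VIEW SERVER STATE` permission; the `LIKE` filter is an illustrative choice and is not taken from the GitHub issue.

```sql
-- List cached plans whose statement text references EventRelatedDocument.
SELECT cp.plan_handle,
       cp.usecounts,
       cp.objtype,
       st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%EventRelatedDocument%'
  AND st.text NOT LIKE N'%dm_exec_cached_plans%';  -- skip this lookup query itself

-- To evict one plan, paste a plan_handle value from the result above in place
-- of the placeholder (left commented out on purpose):
-- DBCC FREEPROCCACHE (<plan_handle>);
```

Either form only forces SQL Server to compile a fresh plan on the next execution; it does not change the data or the indexes.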

Let us know how you get on.

Kind regards,
Paraic

j.keulartz · 12 July 2023 06:24

Hi, good morning Paraic,

donny.bell · 12 July 2023 08:40

Hi @j.keulartz,

Thank you for getting back to us.

You are running Octopus Server 2023.2.12209, and I don’t believe 2023.2.12998 contains any specific performance updates relating to the timeout issue you are experiencing.

Were you able to find the Runbook Retention Policy? If so, what is the current setting?

Let us know at your earliest convenience.

Best Regards,
Donny

j.keulartz · 12 July 2023 13:20

Hi Donny,

We’ve tried changing the retention policy, going from 20 down to 2, but this doesn’t seem to help.

How can we see in the database how many items are actually being retained?

Do you have the location (table name) or a script we could use to query it? Your help is appreciated.

donny.bell · 12 July 2023 15:22

Hi @j.keulartz,

Thank you for getting back to me.

Are you able to run 2 SQL queries against your Octopus SQL DB so we can get a better idea of what may be causing this issue?

Index Fragmentation Report:

SELECT S.name as 'Schema',
T.name as 'Table',
I.name as 'Index',
DDIPS.avg_fragmentation_in_percent,
DDIPS.page_count
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) AS DDIPS
INNER JOIN sys.tables T on T.object_id = DDIPS.object_id
INNER JOIN sys.schemas S on T.schema_id = S.schema_id
INNER JOIN sys.indexes I ON I.object_id = DDIPS.object_id
AND DDIPS.index_id = I.index_id
WHERE DDIPS.database_id = DB_ID()
and I.name is not null
AND DDIPS.avg_fragmentation_in_percent > 0
ORDER BY DDIPS.avg_fragmentation_in_percent desc

Table Size Report:

-- Gets the size of all the tables
SELECT 
    t.NAME AS TableName,
    p.rows AS RowCounts,
    SUM(a.total_pages) * 8 AS TotalSpaceKB, 
    CAST(ROUND(((SUM(a.total_pages) * 8) / 1024.00), 2) AS NUMERIC(36, 2)) AS TotalSpaceMB,
    SUM(a.used_pages) * 8 AS UsedSpaceKB, 
    CAST(ROUND(((SUM(a.used_pages) * 8) / 1024.00), 2) AS NUMERIC(36, 2)) AS UsedSpaceMB, 
    (SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB,
    CAST(ROUND(((SUM(a.total_pages) - SUM(a.used_pages)) * 8) / 1024.00, 2) AS NUMERIC(36, 2)) AS UnusedSpaceMB
FROM 
    sys.tables t
INNER JOIN      
    sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN 
    sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN 
    sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN 
    sys.schemas s ON t.schema_id = s.schema_id
WHERE 
    t.NAME NOT LIKE 'dt%' 
    AND t.is_ms_shipped = 0
    AND i.OBJECT_ID > 255 
GROUP BY 
    t.Name, s.Name, p.Rows
ORDER BY 
    2 desc

Once you have run those, please export the results in .CSV format, then upload them using the secure upload link.
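
If the full reports are too much to share, a narrower check of just the two tables named in the timeout error can be useful as a first look. This is a sketch only: it assumes both tables live in the `dbo` schema (as the error text and the GitHub issue suggest) and uses the standard `sp_spaceused` procedure and `STATS_DATE` function.

```sql
-- Row counts and space used for the two tables that appear in the failing query.
EXEC sp_spaceused N'dbo.Event';
EXEC sp_spaceused N'dbo.EventRelatedDocument';

-- When each statistics object on those tables was last updated.
-- Stale statistics are one common reason for poor execution plans.
SELECT OBJECT_NAME(s.object_id)            AS TableName,
       s.name                              AS StatisticsName,
       STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
FROM sys.stats AS s
WHERE s.object_id IN (OBJECT_ID(N'dbo.Event'), OBJECT_ID(N'dbo.EventRelatedDocument'))
ORDER BY LastUpdated;
```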

Let me know once you have uploaded them and we’ll have a look at the results.

Best Regards,
Donny

j.keulartz · 17 July 2023 10:52

Hi Donny

Have you received the files for further investigation?

donny.bell · 17 July 2023 11:07

Hi @j.keulartz,

Thank you for getting back to me and for sharing the results.

Upon reviewing the information, the fragmentation of the SQL DB indexes does appear high. The database would likely see a noticeable benefit from defragmentation, which will hopefully fix the timeout issue.

Below is a script you can run against your Octopus SQL DB to reorganize or rebuild the fragmented indexes. I recommend running it at a time when users won’t be impacted, as rebuilding indexes will affect SQL DB performance while it is running:

DECLARE @indexName NVARCHAR(MAX), @schemaName NVARCHAR(MAX), @tableName NVARCHAR(MAX), @current SMALLINT, @avgFragmentationPercentage FLOAT, @start DATETIMEOFFSET, @globalStart DATETIMEOFFSET

SET @globalStart = SYSDATETIMEOFFSET()

PRINT 'Performing maintenance on SQL indexes'
DECLARE @indexFragmentation TABLE (
                                      [TableName] sysname,
                                      [SchemaName] sysname,
                                      [IndexName] sysname,
                                      [AvgFragmentationPercentage] FLOAT
                                  )

PRINT 'Querying index fragmentation'
INSERT INTO @indexFragmentation ([TableName], [SchemaName], [IndexName], [AvgFragmentationPercentage])
SELECT
    tbl.name as [TableName],
    SCHEMA_NAME (tbl.schema_id) as [SchemaName],
    idx.Name as [IndexName],
    pst.avg_fragmentation_in_percent as [AvgFragmentationPercentage]
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL , 'SAMPLED') as pst
         INNER JOIN sys.tables as tbl ON pst.object_id = tbl.object_id
         INNER JOIN sys.indexes idx ON pst.object_id = idx.object_id AND pst.index_id = idx.index_id
WHERE pst.index_id != 0
  AND pst.alloc_unit_type_desc IN ( N'IN_ROW_DATA', N'ROW_OVERFLOW_DATA');


SET @current = 1
DECLARE indexName_cursor CURSOR
    FOR
    SELECT [IndexName]
    FROM @indexFragmentation
    WHERE [AvgFragmentationPercentage] > 10 AND [AvgFragmentationPercentage] <= 30
    ORDER BY AvgFragmentationPercentage DESC;

OPEN indexName_cursor;
PRINT 'Reorganizing ' + LTRIM(STR(@@CURSOR_ROWS)) + ' fragmented indexes'
FETCH NEXT FROM indexName_cursor INTO @indexName;
WHILE @@fetch_status = 0
    BEGIN
        SELECT
                @schemaName = [SchemaName],
                @tableName = [TableName],
                @avgFragmentationPercentage = [AvgFragmentationPercentage]
        FROM @indexFragmentation
        WHERE [IndexName] = @indexName

        PRINT 'Reorganizing index ' + @indexName + ' (' + LTRIM(STR(@avgFragmentationPercentage)) + '%) on table ' + @schemaName + '.' + @tableName + ' (' + LTRIM(STR(@current)) + '/' + LTRIM(STR(@@CURSOR_ROWS)) + ')';
        SET @start = SYSDATETIMEOFFSET()
        EXEC('ALTER INDEX [' + @indexName + '] ON ['+ @schemaName +'].[' + @tableName + '] REORGANIZE;');

        PRINT 'Reorganizing index ' + @indexName + ' took ' + CONVERT(varchar(40), DATEDIFF(second, @start, SYSDATETIMEOFFSET())) + ' seconds';

        SET @current = @current + 1
        FETCH NEXT FROM indexName_cursor INTO @indexName;
    END;
CLOSE indexName_cursor;
DEALLOCATE indexName_cursor;

SET @current = 1

DECLARE indexName_cursor CURSOR
    FOR
    SELECT [IndexName]
    FROM @indexFragmentation
    WHERE [AvgFragmentationPercentage] > 30
    ORDER BY AvgFragmentationPercentage DESC;

OPEN indexName_cursor;
PRINT 'Rebuilding ' + LTRIM(STR(@@CURSOR_ROWS)) + ' heavily fragmented indexes'
FETCH NEXT FROM indexName_cursor INTO @indexName;
WHILE @@fetch_status = 0
    BEGIN
        SELECT
                @schemaName = [SchemaName],
                @tableName = [TableName],
                @avgFragmentationPercentage = [AvgFragmentationPercentage]
        FROM @indexFragmentation
        WHERE [IndexName] = @indexName

        PRINT 'Rebuilding index ' + @indexName + ' (' + LTRIM(STR(@avgFragmentationPercentage)) + '%) on table ' + @schemaName + '.' + @tableName + ' (' + LTRIM(STR(@current)) + '/' + LTRIM(STR(@@CURSOR_ROWS)) + ')';

        SET @start = SYSDATETIMEOFFSET()
        EXEC('ALTER INDEX [' + @indexName + '] ON ['+ @schemaName +'].[' + @tableName + '] REBUILD WITH (FILLFACTOR = 80, SORT_IN_TEMPDB = ON, STATISTICS_NORECOMPUTE = OFF);');

        PRINT 'Rebuilding index ' + @indexName + ' took ' + CONVERT(varchar(40), DATEDIFF(second, @start, SYSDATETIMEOFFSET())) + ' seconds';

        SET @current = @current + 1
        FETCH NEXT FROM indexName_cursor INTO @indexName;
    END;
CLOSE indexName_cursor;
DEALLOCATE indexName_cursor;

PRINT 'Finished performing maintenance on SQL indexes. The whole process took ' + CONVERT(varchar(40), DATEDIFF(second, @globalStart, SYSDATETIMEOFFSET())) + ' seconds'

Let me know if you have any questions prior to running the script. Once the script has run, please confirm if you are able to access the Runbook snapshots for the Runbook in question without timing out.
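
Before running the script, a read-only preview of what it would do can help with scheduling. The query below is a sketch that reuses the same fragmentation source and the same thresholds as the script above (more than 10% up to 30% is reorganized, more than 30% is rebuilt); the `PlannedAction` and `PageCount` columns are added here for illustration and are not part of the original script.

```sql
-- Read-only preview: which indexes the maintenance script would touch, and how.
SELECT SCHEMA_NAME(tbl.schema_id)       AS SchemaName,
       tbl.name                         AS TableName,
       idx.name                         AS IndexName,
       pst.avg_fragmentation_in_percent AS AvgFragmentationPercentage,
       pst.page_count                   AS PageCount,
       CASE
           WHEN pst.avg_fragmentation_in_percent > 30 THEN 'REBUILD'
           WHEN pst.avg_fragmentation_in_percent > 10 THEN 'REORGANIZE'
           ELSE 'none'
       END                              AS PlannedAction
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS pst
INNER JOIN sys.tables  AS tbl ON pst.object_id = tbl.object_id
INNER JOIN sys.indexes AS idx ON pst.object_id = idx.object_id
                             AND pst.index_id  = idx.index_id
WHERE pst.index_id <> 0
  AND pst.alloc_unit_type_desc IN (N'IN_ROW_DATA', N'ROW_OVERFLOW_DATA')
ORDER BY pst.avg_fragmentation_in_percent DESC;
```

Indexes with a large `PageCount` and a `REBUILD` action are the ones most likely to dominate the script’s run time.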

Best Regards,
Donny

j.keulartz · 17 July 2023 12:29

Hi Donny,

I can confirm that the script had the desired result.

Thank you for your support.
You can close the topic.

system · 17 August 2023 12:30

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.