We had a team member who was partitioning large tables based on a CreateDate column. The goal was to improve performance by keeping only the last 7 months of data for some of the larger tables on Azure SQL Database Hyperscale.
The query usually takes about 30 to 40 minutes to complete on the other tables, but on this table it never finishes. This is an example query:
INSERT [dbo].[Transactions_paritioned] WITH(TABLOCK)
([Id]
,[EmployeeId]
,[FirstName]
,[LastName]
,[PhoneNumber]
,[Email]
,[NameOfFirstBorn]
,[CreateDate]
,[ModifyDate]
,[Deceased])
SELECT [Id]
,[EmployeeId]
,[FirstName]
,[LastName]
,[PhoneNumber]
,[Email]
,[NameOfFirstBorn]
,[CreateDate]
,[ModifyDate]
,[Deceased]
FROM [dbo].[Transactions]
WHERE CreateDate >= '2022-10-01'
Services that touch this table were confirmed disabled before the query was run. Checking the table for indexed columns, I could see that the only index was on Id, the primary key, which is clustered. With nothing indexed on CreateDate, the filter cannot seek and has to read the whole table.
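For anyone wanting to repeat the index check, here is a minimal sketch using the standard catalog views. It only assumes the table is dbo.Transactions as in the query above; the column aliases are just for readability.

```sql
-- List every index on dbo.Transactions together with its columns.
-- A heap (no clustered index) has no rows in sys.index_columns,
-- so it would not show up in this result.
SELECT i.name                AS IndexName,
       i.type_desc           AS IndexType,      -- CLUSTERED / NONCLUSTERED / ...
       i.is_primary_key      AS IsPrimaryKey,
       c.name                AS ColumnName,
       ic.key_ordinal        AS KeyOrdinal,     -- 0 for included (non-key) columns
       ic.is_included_column AS IsIncluded
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id  = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.Transactions')
ORDER BY i.index_id, ic.key_ordinal;
```

If CreateDate does not appear in the ColumnName column, no index covers it.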
To work around this, I got close to the 2022-10-01 date by querying by Id instead, with a bit of binary search sprinkled in:
SELECT TOP(1) Id, CreateDate FROM dbo.Transactions WHERE Id = 1851000000
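The manual probing could be automated roughly like this. This is only a sketch, and it leans on assumptions that are not confirmed above: that Id increases along with CreateDate (rows inserted in date order), that CreateDate is NOT NULL, and that Id fits in a bigint. The variable names are made up for illustration.

```sql
DECLARE @cutoff    datetime2 = '2022-10-01';
DECLARE @lo        bigint,
        @hi        bigint,
        @mid       bigint,
        @probeDate datetime2;

-- Bounds of the search, read from the clustered primary key.
SELECT @lo = MIN(Id) FROM dbo.Transactions;
SELECT @hi = MAX(Id) FROM dbo.Transactions;

-- Search the half-open range [@lo, @hi + 1) so "no row qualifies" is representable.
SET @hi = @hi + 1;

WHILE @lo < @hi
BEGIN
    SET @mid = @lo + (@hi - @lo) / 2;

    -- Probe the first existing row at or after @mid. Using >= instead of =
    -- tolerates gaps in Id. Reset first: the SELECT leaves the variable
    -- unchanged when it returns no rows.
    SET @probeDate = NULL;
    SELECT TOP (1) @probeDate = CreateDate
    FROM dbo.Transactions
    WHERE Id >= @mid
    ORDER BY Id;

    IF @probeDate IS NULL OR @probeDate >= @cutoff
        SET @hi = @mid;        -- boundary is at or before @mid
    ELSE
        SET @lo = @mid + 1;    -- boundary is after @mid
END;

-- Every row with Id >= this value has CreateDate >= @cutoff (under the assumptions above).
SELECT @lo AS FirstIdAtOrAfterCutoff;
```

Each probe is a seek on the clustered key, so the loop needs only a few dozen cheap lookups even on a very large table. The resulting value can then replace the date predicate in the INSERT ... SELECT, i.e. WHERE Id >= the value returned here. If Id and CreateDate are not ordered together, the result is only approximate and the CreateDate filter should be kept alongside the Id filter.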
Should this column be indexed? It doesn't look like anything else is querying by CreateDate, so it's probably best not to add the overhead.
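For comparison, if the index were wanted only for the one-off copy, a temporary one might look like the sketch below. The index name is hypothetical, and whether the build time on a table this size is worth it has not been measured here.

```sql
-- Hypothetical temporary index, built online so the table stays available.
CREATE NONCLUSTERED INDEX IX_Transactions_CreateDate
    ON dbo.Transactions (CreateDate)
    WITH (ONLINE = ON);

-- ... run the INSERT ... SELECT filtered on CreateDate here ...

-- Drop it afterwards so regular writes do not pay for maintaining it.
DROP INDEX IX_Transactions_CreateDate ON dbo.Transactions;
```

Since the index contains only CreateDate plus the clustering key, the copy would still need a key lookup per row to fetch the other columns, so it is not obviously faster than filtering by Id.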