Efficiently Deleting Rows from Azure Table Storage

Azure Table storage is a way of storing structured NoSQL data in the cloud, as such it’s more geared towards rapid read access rather than manipulation of data in the table.

If you want to perform any operations such as updates or deletes on data in your table then the most efficient way to do this is as a batch operation which groups up to 100 operations into a single batch provided they all have the same partition key.

In my case I wanted to download all rows from a history table before the current point in time and to then delete them from the history table.

Initially I get all of the history messages (the row key in this table is a timestamp and the partition key is a customer ID).

public async Task<List<HistoryEntity>> GetHistoryMessagesAsync(string tableName, CloudStorageAccount storageAccount, string timestamp)
{	
	List<HistoryEntity> historyEntities = new List<HistoryEntity>();
	
	CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
	CloudTable historyTable = tableClient.GetTableReference(tableName);

	TableQuery<historyTable> dowloadQuery = new TableQuery<historyTable>().Where(TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThan, timestamp));
	TableContinuationToken continuationToken = null;

	do
	{
		var tableQueryResult = historyTable.ExecuteQuerySegmentedAsync(dowloadQuery, continuationToken);

		continuationToken = tableQueryResult.Result.ContinuationToken;

		foreach (HistoryEntity result in tableQueryResult.Result)
		{
			historyEntities.Add(result);
		}
	}
	while (continuationToken != null);

	return historyEntities;
}

I then get all of the distinct partition keys from my entities so that I can batch up all of my deletes into as few operations as possible.

List<HistoryEntity> historyEntities = GetHistoryMessagesAsync("HistoryTable", storageAccount, "2019-03-07 22:10:53.8785130");
List<string> partitionKeys = historyEntities.Select(x => x.PartitionKey).Distinct();

In order to most efficiently delete these keys I want to create batches of 100 or fewer entities, if there’s more than 100 records belonging to a partition key then I want to split these into separate batches.

public async Task DeleteHistoryMessagesAsync(string partitionKey, CloudTable historyTable)
{
	TableQuery<TableEntity> deleteQuery = new TableQuery<TableEntity>()
		.Where(TableQuery.CombineFilters(
			TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey),
			TableOperators.And,
			TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThan, timestamp)
			)
		)
		.Select(new string[] { "PartitionKey", "RowKey" });

	TableContinuationToken continuationToken = null;

	do
	{
		var tableQueryResult = historyTable.ExecuteQuerySegmentedAsync(deleteQuery, continuationToken);

		continuationToken = tableQueryResult.Result.ContinuationToken;

		// Split into chunks of 100 for batching
		List<List<TableEntity>> rowsChunked = tableQueryResult.Result.Select((x, index) => new { Index = index, Value = x })
			.Where(x => x.Value != null)
			.GroupBy(x => x.Index / 100)
			.Select(x => x.Select(v => v.Value).ToList())
			.ToList();

		// Delete each chunk of 100 in a batch
		foreach (List<TableEntity> rows in rowsChunked)
		{
			TableBatchOperation tableBatchOperation = new TableBatchOperation();
			rows.ForEach(x => tableBatchOperation.Add(TableOperation.Delete(x)));

			await historyTable.ExecuteBatchAsync(tableBatchOperation);
		}
	}
	while (continuationToken != null);
}

We can also run these deletes in parallel using TPL to further speed up the processing.

// Create a block with an asynchronous action
var block = new ActionBlock<(string partitionKey, CloudTable historyTable)>(
	async x => await DeleteHistoryMessagesAsync(x.partitionKey, x.historyTable),
	new ExecutionDataflowBlockOptions
	{
		BoundedCapacity = 100, // Cap the item count
		MaxDegreeOfParallelism = 16
	});

foreach (string partitionKey in partitionKeys)
{
	await block.SendAsync((partitionKey, historyTable));
}

block.Complete();
await block.Completion;

2 Comments

Kanag · 23 August 2022 at 08:08

What was the time taken overall to delete 1000 records?

Shinigami · 24 August 2022 at 13:08

Sorry, I can’t remember exact times but I think 1000 records was probably only a couple of seconds assuming they’re all on the same partition key. If they’re spread accross multiple partition keys then that’s going to require more API calls and will take longer.

Published by Shinigami on 25 March 2019

2 Comments

Kanag · 23 August 2022 at 08:08

Shinigami · 24 August 2022 at 13:08

Leave a Reply Cancel reply

Pass Through an iGPU to Frigate from Proxmox

Creating a Frigate VM on Proxmox

Enable Remote Root SSH Login

Efficiently Deleting Rows from Azure Table Storage

Published by Shinigami on 25 March 2019

2 Comments

Kanag · 23 August 2022 at 08:08

Shinigami · 24 August 2022 at 13:08

Leave a Reply Cancel reply

Related Posts

Pass Through an iGPU to Frigate from Proxmox

Creating a Frigate VM on Proxmox

Enable Remote Root SSH Login