BulkInsertOrUpdateAsync performance questions
bkulcsar opened this issue · comments
Hi Guys!
I need some help regarding the BulkInsertOrUpdateAsync operation. I could not find any performance benchmarks for BulkInsertOrUpdateAsync, but I don't think it is as fast as I would expect from a SQL MERGE operation. I am using Azure SQL with 200 DTU, and for 250k records BulkInsertOrUpdateAsync takes around 6 minutes on the first run (when essentially all records are new, so it only inserts); when we load the same records again with some changes, it takes 2 minutes. Of course increasing the database's DTU would improve performance, but first I want to optimize the BulkInsertOrUpdateAsync call itself as much as possible.
So my question is: is 6 minutes normal for loading 250k records with BulkInsertOrUpdateAsync on a 200 DTU Azure SQL database? Can I improve its performance with configuration? I cannot find anything in BulkConfig that might help.
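For reference, here is a sketch of the BulkConfig settings that are commonly tuned for large batches (assuming EFCore.BulkExtensions; whether each option helps depends on your schema and library version, so treat the values as starting points, not a recommendation):

```csharp
// Sketch only: BulkConfig options that commonly affect large-batch performance.
// Verify each option against the version of EFCore.BulkExtensions you are using.
var bulkConfig = new BulkConfig
{
    BatchSize = 10_000,        // rows per SqlBulkCopy batch
    UseTempDB = true,          // stage rows in tempdb instead of the user database
    SetOutputIdentity = false, // skip reading generated identities back into entities
    CalculateStats = false,    // skip the inserted/updated counters when not needed
    BulkCopyTimeout = 0        // no timeout for long-running loads
};
await _dbContext.BulkInsertOrUpdateAsync(entities, bulkConfig);
```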
Code for the merge operation:
```csharp
public virtual async Task BulkMergeAsync(
    IEnumerable<T> entities,
    BulkConfig? bulkConfig = null,
    CancellationToken cancellationToken = default)
{
    try
    {
        bulkConfig ??= new BulkConfig();
        bulkConfig.CalculateStats = true;
        bulkConfig.BulkCopyTimeout = 0;

        await _dbContext.BulkInsertOrUpdateAsync(entities, bulkConfig, cancellationToken: cancellationToken);

        if (bulkConfig.StatsInfo != null)
        {
            _logger.LogInformation(
                "Records inserted: {Inserted}, records updated: {Updated}",
                bulkConfig.StatsInfo.StatsNumberInserted,
                bulkConfig.StatsInfo.StatsNumberUpdated);
        }
    }
    catch (Exception ex)
    {
        // Pass the exception so the full stack trace is logged, not just the message.
        _logger.LogError(ex, "Error during BulkInsertOrUpdate");
        throw;
    }
}
```
I was able to find out that it is not the MERGE operation that is slow, but the INSERT into the temporary table the library creates.
Is there a way to omit the COLLATE clause in the insert statement?
Both Insert and Update should be much faster; as you can check from the ReadMe stats, even 1 million records take under a minute.
But that was measured on a local SQL Server. I'm not sure about Azure, but it still should not take over a minute.
Where did you find the COLLATE statement in the code?
I've changed the pricing model from DTU-based to vCore-based and it became much faster, but it's still not fast enough.
I've tested the operations with 2 million records:
| vCores | Insert | Merge |
|---|---|---|
| 80 | 55 seconds | 190 seconds |
| 16 | 58 seconds | 206 seconds |
| 8 | 100 seconds | 300 seconds |
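Since the merge is consistently 3x slower than the plain insert, one workaround worth trying (my own sketch, not something the library documents for this case) is to split the records yourself and avoid the MERGE path entirely by calling BulkInsertAsync and BulkUpdateAsync separately. This assumes you can load the existing keys cheaply and that `T` exposes an `Id` property identifying each record (both are assumptions about your model):

```csharp
// Hypothetical split of the merge into separate insert and update passes.
// Assumes T has an Id key and that loading all existing Ids is feasible.
var existingKeys = (await _dbContext.Set<T>()
        .Select(e => e.Id)
        .ToListAsync(cancellationToken))
    .ToHashSet();

var toInsert = entities.Where(e => !existingKeys.Contains(e.Id)).ToList();
var toUpdate = entities.Where(e => existingKeys.Contains(e.Id)).ToList();

await _dbContext.BulkInsertAsync(toInsert, bulkConfig, cancellationToken: cancellationToken);
await _dbContext.BulkUpdateAsync(toUpdate, bulkConfig, cancellationToken: cancellationToken);
```

Whether this wins depends on how expensive loading the key set is relative to the MERGE; for 2 million rows it may or may not pay off.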
I've profiled the SQL statements generated by the library and saw a COLLATE clause on every column:

```sql
insert bulk <table> (<column1> NVarChar(40) COLLATE SQL_Latin1_General_CP1_CI_AS, <column2> NVarChar(100) COLLATE SQL_Latin1_General_CP1_CI_AS ....)
```

This may be causing the performance issues.