borisdj / EFCore.BulkExtensions

Entity Framework EF Core efcore Bulk Batch Extensions with BulkCopy in .Net for Insert Update Delete Read (CRUD), Truncate and SaveChanges operations on SQL Server, PostgreSQL, MySQL, SQLite


BulkInsertOrUpdateAsync performance questions

bkulcsar opened this issue

Hi Guys!

I need some help with the BulkInsertOrUpdateAsync operation. I could not find any performance benchmarks for it, but I think it is not as fast as I would expect from a SQL MERGE operation. I am using an Azure SQL database with 200 DTU, and for 250k records BulkInsertOrUpdateAsync took around 6 minutes on the first run (when basically all records are new, so it only inserts). When we loaded the same records again with some changes, it took 2 minutes. Of course, increasing the database's DTU would help performance, but first I want to optimize BulkInsertOrUpdateAsync as much as possible.

So my question is: is 6 minutes normal for loading 250k records with BulkInsertOrUpdateAsync on a 200 DTU Azure SQL database? Can I improve its performance through configuration? I cannot find any setting in BulkConfig that might help.

Code for the merge operation:

public virtual async Task BulkMergeAsync(
    IEnumerable<T> entities,
    BulkConfig? bulkConfig = null,
    CancellationToken cancellationToken = default)
{
    try
    {
        if (bulkConfig == null)
            bulkConfig = new BulkConfig();

        bulkConfig.CalculateStats = true; // fill StatsInfo with inserted/updated counts
        bulkConfig.BulkCopyTimeout = 0;   // 0 = no timeout

        await _dbContext.BulkInsertOrUpdateAsync(entities, bulkConfig, cancellationToken: cancellationToken);

        if (bulkConfig.StatsInfo != null)
            _logger.LogInformation($"Record inserted: {bulkConfig.StatsInfo.StatsNumberInserted}, Record updated: {bulkConfig.StatsInfo.StatsNumberUpdated}");
    }
    catch (Exception ex)
    {
        _logger.LogError($"Error during BulkInsertOrUpdate: {ex.Message}");
    }
}

I found out that it is not the MERGE operation that is slow, but the INSERT into the temporary table that the library creates.
Is there a way to omit the COLLATE clause from the insert statement?

Both insert and update should be much faster; as you can see from the ReadMe stats, even 1 million records takes under a minute.
That was measured on a local SQL Server, though. I'm not sure about Azure, but it still should not take over 1 minute.
Where did you find the COLLATE statement in the code?

I've changed the pricing model from DTU-based to vCore-based and it became much faster, but it's still not fast enough.
I've tested the operations with 2 million records:

80 vCores - 55 seconds - insert
80 vCores - 190 seconds - merge
16 vCores - 58 seconds - insert
16 vCores - 206 seconds - merge
8 vCores - 100 seconds - insert
8 vCores - 300 seconds - merge

I've profiled the SQL statements generated by the library and saw a COLLATE clause on every column:

insert bulk <table> (<column1> NVarChar(40) COLLATE SQL_Latin1_General_CP1_CI_AS, <column2> NVarChar(100) COLLATE SQL_Latin1_General_CP1_CI_AS ....)

This may be causing the performance issues.
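
On the earlier question about configuration, the following is a minimal sketch of BulkConfig settings that might be worth experimenting with. It is not a confirmed fix for the timings above. The helper name `MergeAsync` and the class `BulkMergeTuning` are hypothetical, the batch size of 10,000 is an arbitrary starting value, and whether any of these settings helps on Azure SQL has not been measured in this thread.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using EFCore.BulkExtensions;
using Microsoft.EntityFrameworkCore;

// Hypothetical helper for illustration; not part of EFCore.BulkExtensions.
public static class BulkMergeTuning
{
    public static async Task MergeAsync<T>(
        DbContext dbContext,
        IList<T> entities,
        CancellationToken cancellationToken = default) where T : class
    {
        var bulkConfig = new BulkConfig
        {
            BatchSize = 10_000,     // assumed value to experiment with, larger than the library default
            BulkCopyTimeout = 0,    // no timeout, as in the original post
            CalculateStats = false, // assumption: skipping stats avoids some extra work; enable when counts are needed
            UseTempDB = true        // stage rows in tempdb; the ReadMe requires a surrounding transaction
        };

        // UseTempDB only works when the bulk operation runs inside an explicit transaction.
        await using var transaction =
            await dbContext.Database.BeginTransactionAsync(cancellationToken);

        await dbContext.BulkInsertOrUpdateAsync(
            entities, bulkConfig, cancellationToken: cancellationToken);

        await transaction.CommitAsync(cancellationToken);
    }
}
```

Changing one setting at a time and re-running the 2 million record test would show which of them, if any, affects the insert into the temporary table.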