dnlnln / generate-sql-merge

Generate SQL MERGE statements with Table data


'System.OutOfMemoryException' was thrown on a Large Table

deyshin opened this issue · comments

The script has been very helpful to my project. Thank you very much.

I have run into one problem - When I run the script on a very large table, I get the following error:

Exception of type 'System.OutOfMemoryException' was thrown.

The particular table has 4 columns - 2 Int32 and 2 String type - with about 1.7 million rows of data. Is it something that we can solve, or is it an inherent system limitation?

The tool was originally designed for synchronising smaller data sets between environments via source control, so typically this meant that a few thousand rows of static data would be the max. Others have used it for slightly different purposes, such as for performing simple ETL operations between databases.

To allow it to be used for datasets of this size, with the data fully synchronised (i.e. all rows in the target inserted/updated/deleted to reflect the source data set), a fundamentally different approach is needed in order to merge the data in a reliable and efficient way.

The @include_values bit parameter, added recently (see #37, #47 & #34), might be a better option for working with larger datasets. When set to 0, it omits the VALUES clause containing all the data, in favour of simply joining onto the @table_name at merge time. This would allow you to populate the source table before running the merge, in whichever way is appropriate for your dataset (e.g. a single BULK INSERT statement containing all data, batches of INSERT statements containing thousands of rows at a time, etc).

For example:

EXEC [AdventureWorks2017].dbo.sp_generate_merge
  @table_name = 'CurrencyRate', 
  @schema = 'Sales',
  @include_values = 0

Generates the following statement:

MERGE INTO [Sales].[CurrencyRate] AS [Target]
USING [Sales].[CurrencyRate] AS [Source]
ON ([Target].[CurrencyRateID] = [Source].[CurrencyRateID])
WHEN MATCHED AND (
	NULLIF([Source].[CurrencyRateDate], [Target].[CurrencyRateDate]) IS NOT NULL OR NULLIF([Target].[CurrencyRateDate], [Source].[CurrencyRateDate]) IS NOT NULL OR 
	NULLIF([Source].[FromCurrencyCode], [Target].[FromCurrencyCode]) IS NOT NULL OR NULLIF([Target].[FromCurrencyCode], [Source].[FromCurrencyCode]) IS NOT NULL OR 
	NULLIF([Source].[ToCurrencyCode], [Target].[ToCurrencyCode]) IS NOT NULL OR NULLIF([Target].[ToCurrencyCode], [Source].[ToCurrencyCode]) IS NOT NULL OR 
	NULLIF([Source].[AverageRate], [Target].[AverageRate]) IS NOT NULL OR NULLIF([Target].[AverageRate], [Source].[AverageRate]) IS NOT NULL OR 
	NULLIF([Source].[EndOfDayRate], [Target].[EndOfDayRate]) IS NOT NULL OR NULLIF([Target].[EndOfDayRate], [Source].[EndOfDayRate]) IS NOT NULL OR 
	NULLIF([Source].[ModifiedDate], [Target].[ModifiedDate]) IS NOT NULL OR NULLIF([Target].[ModifiedDate], [Source].[ModifiedDate]) IS NOT NULL) THEN
 UPDATE SET
  [CurrencyRateDate] = [Source].[CurrencyRateDate], 
  [FromCurrencyCode] = [Source].[FromCurrencyCode], 
  [ToCurrencyCode] = [Source].[ToCurrencyCode], 
  [AverageRate] = [Source].[AverageRate], 
  [EndOfDayRate] = [Source].[EndOfDayRate], 
  [ModifiedDate] = [Source].[ModifiedDate]
WHEN NOT MATCHED BY TARGET THEN
 INSERT([CurrencyRateID],[CurrencyRateDate],[FromCurrencyCode],[ToCurrencyCode],[AverageRate],[EndOfDayRate],[ModifiedDate])
 VALUES([Source].[CurrencyRateID],[Source].[CurrencyRateDate],[Source].[FromCurrencyCode],[Source].[ToCurrencyCode],[Source].[AverageRate],[Source].[EndOfDayRate],[Source].[ModifiedDate])
WHEN NOT MATCHED BY SOURCE THEN 
 DELETE
;

The downside is that you'd need to manually edit the generated statement to use the pre-populated table object (rather than the original table object), so that the USING clause reads something like this:

USING [#currencyRateBulkInserted] AS [Source]
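As a rough sketch of that workflow (the temp table name, file path and BULK INSERT options here are placeholders for illustration, not something the tool generates), the pre-population step might look like:

-- Create an empty staging table with the same structure as the target
SELECT TOP (0) *
INTO [#currencyRateBulkInserted]
FROM [AdventureWorks2017].[Sales].[CurrencyRate];

-- Load the exported data in a single bulk operation; adjust the file path,
-- terminators and FIRSTROW to match your actual export format
BULK INSERT [#currencyRateBulkInserted]
FROM 'C:\exports\CurrencyRate.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);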

I'm curious to know if this kind of general approach would suit your needs? If so then there's always the possibility of adding a new parameter to override the source table name.

commented

"The downside is that you'd need to manually edit the generated statement to use the pre-populated table object (rather than the original table object), "

A way to make it more dynamic would be to add an additional parameter that takes in a string - a custom source table name (e.g. @custom_source_name = '#currencyRateBulkInserted') - and the procedure would use this parameter's value instead when it is NOT NULL (a validation check could be added at the top of the merge to throw an error if an object with that name doesn't exist). Just a thought off the top of my head. It's possible that I might be missing something.
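A minimal sketch of what that validation might look like (purely hypothetical - @custom_source_name is the proposed parameter, not something the proc currently supports; the DECLARE is only there to make the snippet self-contained):

-- Proposed parameter, shown here as a local variable for illustration
DECLARE @custom_source_name NVARCHAR(776) = '#currencyRateBulkInserted';

-- Fail fast if the caller-supplied source object doesn't exist
-- (the second OBJECT_ID check covers #temp tables in tempdb)
IF @custom_source_name IS NOT NULL
   AND OBJECT_ID(@custom_source_name) IS NULL
   AND OBJECT_ID('tempdb..' + @custom_source_name) IS NULL
BEGIN
  RAISERROR('Source object ''%s'' does not exist.', 16, 1, @custom_source_name);
  RETURN;
END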

An update on this: @EitanBlumin has very helpfully implemented a new parameter that allows you to split the source rows into multiple MERGE statements. To use it, specify @max_rows_per_batch=1000 (or whatever batch size you need) and be sure to also include the @delete_if_not_matched=0 param.
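For example, a call along these lines (the batch size is illustrative, and presumably @delete_if_not_matched=0 is needed so that each batch's MERGE doesn't delete the rows belonging to the other batches):

EXEC [AdventureWorks2017].dbo.sp_generate_merge
  @table_name = 'CurrencyRate',
  @schema = 'Sales',
  @max_rows_per_batch = 1000,
  @delete_if_not_matched = 0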

This should avoid the out-of-memory exception. If it recurs in spite of this, please comment here and I will re-open the issue.