'System.OutOfMemoryException' was thrown on a Large Table
deyshin opened this issue
The script has been very helpful to my project. Thank you very much.
I have run into one problem - When I run the script on a very large table, I get the following error:
Exception of type 'System.OutOfMemoryException' was thrown.
The particular table has 4 columns (2 Int32 and 2 String) with about 1.7 million rows of data. Is this something that we can solve, or is it an inherent system limitation?
The tool was originally designed for synchronising smaller data sets between environments via source control, so typically this meant that a few thousand rows of static data would be the max. Others have used it for slightly different purposes, such as for performing simple ETL operations between databases.
To allow it to be used for datasets of this size, with the data fully synchronised (i.e. all rows in the target inserted/updated/deleted to reflect the source data set), a fundamentally different approach is needed to merge the data in a reliable and efficient way.
The @include_values bit parameter, added recently (see #37, #47 & #34), might be a better option for working with larger datasets. When set to 0, it omits the VALUES clause containing all the data, in favour of simply joining onto the @table_name at merge time. This would allow you to populate the source table before running the merge, in whichever way is appropriate for your dataset (e.g. a single BULK INSERT statement containing all data, batches of INSERT statements containing thousands of rows at a time, etc).
For example:
EXEC [AdventureWorks2017].dbo.sp_generate_merge
@table_name = 'CurrencyRate',
@schema = 'Sales',
@include_values = 0
Generates the following statement:
MERGE INTO [Sales].[CurrencyRate] AS [Target]
USING [Sales].[CurrencyRate] AS [Source]
ON ([Target].[CurrencyRateID] = [Source].[CurrencyRateID])
WHEN MATCHED AND (
NULLIF([Source].[CurrencyRateDate], [Target].[CurrencyRateDate]) IS NOT NULL OR NULLIF([Target].[CurrencyRateDate], [Source].[CurrencyRateDate]) IS NOT NULL OR
NULLIF([Source].[FromCurrencyCode], [Target].[FromCurrencyCode]) IS NOT NULL OR NULLIF([Target].[FromCurrencyCode], [Source].[FromCurrencyCode]) IS NOT NULL OR
NULLIF([Source].[ToCurrencyCode], [Target].[ToCurrencyCode]) IS NOT NULL OR NULLIF([Target].[ToCurrencyCode], [Source].[ToCurrencyCode]) IS NOT NULL OR
NULLIF([Source].[AverageRate], [Target].[AverageRate]) IS NOT NULL OR NULLIF([Target].[AverageRate], [Source].[AverageRate]) IS NOT NULL OR
NULLIF([Source].[EndOfDayRate], [Target].[EndOfDayRate]) IS NOT NULL OR NULLIF([Target].[EndOfDayRate], [Source].[EndOfDayRate]) IS NOT NULL OR
NULLIF([Source].[ModifiedDate], [Target].[ModifiedDate]) IS NOT NULL OR NULLIF([Target].[ModifiedDate], [Source].[ModifiedDate]) IS NOT NULL) THEN
UPDATE SET
[CurrencyRateDate] = [Source].[CurrencyRateDate],
[FromCurrencyCode] = [Source].[FromCurrencyCode],
[ToCurrencyCode] = [Source].[ToCurrencyCode],
[AverageRate] = [Source].[AverageRate],
[EndOfDayRate] = [Source].[EndOfDayRate],
[ModifiedDate] = [Source].[ModifiedDate]
WHEN NOT MATCHED BY TARGET THEN
INSERT([CurrencyRateID],[CurrencyRateDate],[FromCurrencyCode],[ToCurrencyCode],[AverageRate],[EndOfDayRate],[ModifiedDate])
VALUES([Source].[CurrencyRateID],[Source].[CurrencyRateDate],[Source].[FromCurrencyCode],[Source].[ToCurrencyCode],[Source].[AverageRate],[Source].[EndOfDayRate],[Source].[ModifiedDate])
WHEN NOT MATCHED BY SOURCE THEN
DELETE
;
The downside is that you'd need to manually edit the generated statement to use the pre-populated table object (rather than the original table object), so that the USING clause reads something like this:
USING [#currencyRateBulkInserted] AS [Source]
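Putting the pieces together, the overall workflow might look something like the sketch below. This is only an illustration: the staging table name comes from the snippet above, while the BULK INSERT file path and format options are placeholders you'd replace with whatever suits your data.

```sql
-- Sketch only: file path and format options below are placeholders.

-- 1. Create an empty staging table matching the target's schema.
SELECT TOP (0) *
INTO [#currencyRateBulkInserted]
FROM [Sales].[CurrencyRate];

-- 2. Populate it outside of the generated script, e.g. via BULK INSERT.
BULK INSERT [#currencyRateBulkInserted]
FROM 'C:\data\CurrencyRate.csv'   -- placeholder path
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- 3. Run the generated MERGE, with its USING clause edited to read:
--    USING [#currencyRateBulkInserted] AS [Source]
```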
I'm curious to know if this kind of general approach would suit your needs? If so then there's always the possibility of adding a new parameter to override the source table name.
"The downside is that you'd need to manually edit the generated statement to use the pre-populated table object (rather than the original table object), "
A way to make it more dynamic would be to add an additional parameter that would take in a string - a custom table name (e.g. @custom_source_name = '#currencyRateBulkInserted') - and the procedure would use this parameter's value instead if NOT NULL (a validation can be added at the top of the merge to throw an error if an object with that name doesn't exist). Just a thought off the top of my head. It's possible that I might be missing something.
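As a rough illustration of that idea, the validation and override inside the procedure might look something like the following. Note that @custom_source_name and @source_object are hypothetical names, not existing parameters of sp_generate_merge.

```sql
-- Hypothetical sketch only: @custom_source_name is a proposed parameter,
-- not part of the current sp_generate_merge signature.
IF @custom_source_name IS NOT NULL
   AND OBJECT_ID('tempdb..' + @custom_source_name) IS NULL  -- temp tables live in tempdb
   AND OBJECT_ID(@custom_source_name) IS NULL               -- permanent tables/views
BEGIN
  RAISERROR('The object specified in @custom_source_name does not exist.', 16, 1);
  RETURN -1;
END

-- When generating the USING clause, prefer the override if supplied:
SET @source_object = COALESCE(@custom_source_name,
                              QUOTENAME(@schema) + '.' + QUOTENAME(@table_name));
```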
An update on this: @EitanBlumin has very helpfully implemented a new parameter that allows you to split source rows into multiple MERGE statements. To use it, specify @max_rows_per_batch=1000 (or whatever batch size you need) and be sure to also include the @delete_if_not_matched=0 parameter.
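For example, a batched invocation for the table discussed above might look like this (parameter names as described; the batch size of 1000 is arbitrary):

```sql
EXEC [AdventureWorks2017].dbo.sp_generate_merge
  @table_name = 'CurrencyRate',
  @schema = 'Sales',
  @max_rows_per_batch = 1000,
  @delete_if_not_matched = 0
```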
This should avoid the out-of-memory exception. If it recurs in spite of this, please comment here and I will re-open the issue.