immerjs / immer

Create the next immutable state by mutating the current one

Home Page: https://immerjs.github.io/immer/

Immer patches for Array are correct, yet inefficient. Suggesting a new algorithm for optimized patches

yoavaa opened this issue · comments

I am thinking of offering this as a pull request; let me know what you think.

🚀 Feature Proposal

Today, when computing patches on arrays, Immer compares items by index, which produces a patch for every index of the array when unshift or splice is used. An example is the following test taken from Immer's test suite:

https://github.com/immerjs/immer/blob/main/__tests__/patch.js#L543-L559

describe("arrays - multiple prepend", () => {
	runPatchTest(
		{x: [1, 2, 3]},
		d => {
			d.x.unshift(4)
			d.x.unshift(5)
			// 4,5,1,2,3
		},
		[
			{op: "replace", path: ["x", 0], value: 5},
			{op: "replace", path: ["x", 1], value: 4},
			{op: "replace", path: ["x", 2], value: 1},
			{op: "add", path: ["x", 3], value: 2},
			{op: "add", path: ["x", 4], value: 3}
		]
	)
})

The expected result should instead be:

		[
			{op: "add", path: ["x", 0], value: 5},
			{op: "add", path: ["x", 1], value: 4}
		]

Motivation

I am working on a new framework that also serializes patches, and the above is a performance issue.

Consider the above case for an array of objects, let's say an array of 100 objects with 10 attributes each: the current algorithm will generate 1000 patches, while my suggestion produces only one patch per added / removed object.
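
As a rough illustration (a minimal sketch using Immer's produceWithPatches / enablePatches API; the data shape here is made up), prepending a single item to a large array of objects already yields a patch for every index rather than a single add:

import {enablePatches, produceWithPatches} from "immer"

enablePatches()

// 100 objects with 10 attributes each (attribute values are arbitrary)
const base = {
	x: Array.from({length: 100}, (_, i) =>
		Object.fromEntries(Array.from({length: 10}, (_, j) => [`attr${j}`, i * 10 + j]))
	)
}

const [, patches] = produceWithPatches(base, draft => {
	draft.x.unshift({added: true})
})

// With index-based diffing, patches.length grows with the array length
// (every shifted index is reported as replaced), instead of the single
// "add" at index 0 that would fully describe the change.
console.log(patches.length)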

Can this be solved in user-land code?

No.

Example

See the example above.

Usage: immer -> serialize -> deserialize -> patched object
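
For concreteness, here is a minimal sketch of that flow using Immer's public produceWithPatches / applyPatches API, assuming JSON as the wire format:

import {enablePatches, produceWithPatches, applyPatches} from "immer"

enablePatches()

const base = {x: [1, 2, 3]}
const [next, patches] = produceWithPatches(base, d => {
	d.x.unshift(4)
	d.x.unshift(5)
})

// serialize the patches, e.g. to send them to another process
const wire = JSON.stringify(patches)

// deserialize on the other side and rebuild the patched object
const rebuilt = applyPatches(base, JSON.parse(wire))
// rebuilt.x is [5, 4, 1, 2, 3], the same as next.x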

Suggested algorithm

(pseudo code)

The problem with JSON diffing is that, given two arrays and no additional knowledge, figuring out whether an item was
pushed to the front requires === comparisons between the array items, which is O(n^2):

for (let a = 0; a < A.length; a++) {
    for (let b = 0; b < B.length; b++) {
        if (A[a] === B[b]) {
            // matching items found, compute the diff between them
        }
    }
}

However, we can create an algorithm focused on small changes that runs in roughly O(n) by using a cutoff.
The algorithm assumes that small changes can be found quickly and require only a small number of mutations to describe.
The cutoff is set at limit = log(min(A.length, B.length)), so the seek at each mismatch costs at most O(limit^2), keeping the overall run close to O(n).

The algorithm:

let limit = Math.log(Math.min(A.length, B.length));
let a = 0, b = 0;
mainLoop: while (a < A.length && b < B.length) {
    if (A[a] === B[b]) {
        // items match, advance both cursors
        a += 1;
        b += 1;
        continue;
    }
    // look for a nearby match within the cutoff window
    for (let seekSize = 1; seekSize <= limit; seekSize++) {
        for (let index = 0; index <= seekSize; index++) {
            if (A[a + index] === B[b + seekSize - index]) {
                // we have a match: the elements of A between (a..a+index) were
                // replaced by the items of B between (b..b+seekSize-index),
                // so emit remove/add patches for just those items
                a += index;
                b += seekSize - index;
                continue mainLoop;
            }
        }
    }
    // no nearby match found within the cutoff: revert to the direct
    // index-by-index comparison (the standard model used today)
    break;
}
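
To sanity-check the idea, here is a small self-contained toy implementation of the sketch above; the diffArrayPatches name and the {op, path, value} patch shape are mine (chosen to mirror Immer's patch format), and this is not Immer's code:

function diffArrayPatches(A, B, basePath = []) {
	const patches = []
	// cutoff as suggested above, nudged up so tiny arrays still get a window
	const limit = Math.max(1, Math.round(Math.log2(Math.min(A.length, B.length) + 1)))
	let a = 0
	let b = 0

	mainLoop: while (a < A.length && b < B.length) {
		if (A[a] === B[b]) {
			a++
			b++
			continue
		}
		for (let seekSize = 1; seekSize <= limit; seekSize++) {
			for (let index = 0; index <= seekSize; index++) {
				if (
					a + index < A.length &&
					b + seekSize - index < B.length &&
					A[a + index] === B[b + seekSize - index]
				) {
					// A[a..a+index) was replaced by B[b..b+seekSize-index)
					for (let k = 0; k < index; k++) {
						patches.push({op: "remove", path: [...basePath, b]})
					}
					for (let k = 0; k < seekSize - index; k++) {
						patches.push({op: "add", path: [...basePath, b + k], value: B[b + k]})
					}
					a += index
					b += seekSize - index
					continue mainLoop
				}
			}
		}
		// no nearby match within the cutoff: plain index-wise replace
		patches.push({op: "replace", path: [...basePath, b], value: B[b]})
		a++
		b++
	}
	// whatever remains is a pure removal or a pure addition at the tail
	for (let k = a; k < A.length; k++) {
		patches.push({op: "remove", path: [...basePath, b]})
	}
	for (; b < B.length; b++) {
		patches.push({op: "add", path: [...basePath, b], value: B[b]})
	}
	return patches
}

diffArrayPatches([1, 2, 3], [5, 4, 1, 2, 3], ["x"])
// -> [{op: "add", path: ["x", 0], value: 5},
//     {op: "add", path: ["x", 1], value: 4}]
// i.e. exactly the expected result listed earlier for the double unshift.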

Update:

Reading the Immer source again, I can see that for objects there is the state.assigned_ member which marks which object properties have been updated or deleted.

I suggest extending this idea to arrays so that added and removed array items are also tracked, thus making the above algorithm redundant and allowing for a simpler solution (sketched below).
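
To make that concrete, here is a purely illustrative sketch; the edits_ field and the editsToPatches helper are made-up names, not Immer internals, while base_ / copy_ just mirror the existing draft-state fields:

// Hypothetical bookkeeping recorded while the recipe runs,
// for the {x: [1, 2, 3]} example with the two unshift calls:
const arrayState = {
	base_: [1, 2, 3],
	copy_: [5, 4, 1, 2, 3],
	// the two unshift calls, consolidated into one structural edit
	edits_: [{type: "insert", index: 0, count: 2}]
}

// Patch generation then becomes a direct translation, no diffing needed:
function editsToPatches(state, path) {
	const patches = []
	for (const edit of state.edits_) {
		if (edit.type === "insert") {
			for (let i = 0; i < edit.count; i++) {
				patches.push({
					op: "add",
					path: [...path, edit.index + i],
					value: state.copy_[edit.index + i]
				})
			}
		}
		// "remove" edits would map to {op: "remove"} patches the same way
	}
	return patches
}

editsToPatches(arrayState, ["x"])
// -> [{op: "add", path: ["x", 0], value: 5},
//     {op: "add", path: ["x", 1], value: 4}]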

What do you think?