lodash / lodash

A modern JavaScript utility library delivering modularity, performance, & extras.

Home Page: https://lodash.com/

Memoizing functions with multiple arguments

willpiers opened this issue

The docs mention that "By default, the first argument provided to the memoized function is used as the map cache key".

It would be great if the cache were multi-dimensional so that _.memoize worked well out of the box with longer functions, i.e. functions with more arguments.

If there is a simple way to do this already, we can close. If this sounds useful, I can look into submitting a PR.

You can use the resolver function to compose your own.

Dup of #680.

Is there any reason why _.memoize does not support caching on multiple arguments?

@VictorQueiroz Serializing isn't cheap or straightforward, so we punt on that with a resolver function that you can customize for your particular scenario.

@jdalton It's for a good reason then. Thanks for the quick reply.

@jdalton Can you please provide an example of a simple resolver to do this? Thanks.

@anton6 Sure thing!

const example = _.memoize((...args) => args, (...args) => JSON.stringify(args))
example(1,3,2)
// => [1, 3, 2]
example(1,2,3)
// => [1, 2, 3]

Is there a resolver that works out of the box for sets of hashable arguments? That is, right now I can use a Function as my memoize key, but I can't use a [Function, string] pair, because that array has a different identity than the previous one, even if its elements are the same.
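
One approach (a sketch, not anything built into lodash; keyFor and pairKeys are hypothetical names) is to intern each (fn, str) pair into a canonical key object, so the default identity-based cache still works:

const pairKeys = new Map(); // fn -> Map(str -> canonical key object)

const keyFor = (fn, str) => {
  let byStr = pairKeys.get(fn);
  if (!byStr) pairKeys.set(fn, (byStr = new Map()));
  let key = byStr.get(str);
  if (!key) byStr.set(str, (key = { fn, str }));
  return key;
};

// The resolver returns the same key object for the same (fn, str) pair,
// so the memoize cache can look results up by identity.
const memoizedCall = _.memoize((fn, str) => fn(str), keyFor);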

It does not work with function arguments. And sometimes we just need === comparison of the arguments, not content equality.

@jdalton the original suggestion in this issue was to create a multi-dimensional cache for memoize, which is very different from serializing multiple arguments into a single key, and in many cases much more performant.

To be clear: a multidimensional cache would be a Map of Map of Maps (one layer of nesting per argument).

Would you consider a PR for a change like this? The resolver argument could be left in place for those who want to serialize, and the multidimensional cache would be used when the "resolver" argument is not provided.
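
For a two-argument function, the proposed shape would look roughly like this (a sketch of the idea, not existing lodash behavior; lookupOrCompute is a hypothetical name):

// cache: Map(a -> Map(b -> result))
const cache = new Map();

function lookupOrCompute(f, a, b) {
  if (!cache.has(a)) cache.set(a, new Map());
  const inner = cache.get(a);
  if (!inner.has(b)) inner.set(b, f(a, b));
  return inner.get(b);
}

Each argument is compared with the same semantics a Map key has (essentially ===, modulo NaN), which is what the default single-argument cache already does.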

To add to the comments above.

If you try implementing memoize on your own in vanilla JS, you'll understand why they may have decided to provide a resolver.

Root cause

Objects and arrays are passed by reference in JavaScript.

Example

Imagine you have a function that accepts three args a, b, and c.

myCachedFunction(1,2,3)

So your arguments are [1,2,3].
If you store [1,2,3] in your cache (which is an instance of Map) and then do cache.get([1,2,3]), you will get undefined, because this [1,2,3] is not the same instance as the one that was stored in the Map.
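
A quick illustration of that reference problem:

const cache = new Map();
cache.set([1, 2, 3], "result");
cache.get([1, 2, 3]); // => undefined, this is a brand-new array
[1, 2, 3] === [1, 2, 3]; // => false, different references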

Conclusion

For this reason, I guess it's up to us to define how we want to hash the args, e.g. using JSON.stringify.

Hope it makes sense.

@giulioambrogi That's precisely the problem, though; it's quite difficult to come up with your own resolver in plain vanilla JS, even though the INTENT of memoize(function(a, b, c, d)) without a custom resolver is quite clear (we want to memoize on the unique set (a, b, c, d)). That's why I think this should be a feature; this is a problem worth solving once for everyone.

I know this is an old PR, and I came here hoping there could be some such functionality, but really I am siding heavily with the maintainers in that this should not be implemented. Two reasons:

  • It's hard to consistently serialize arguments by value (as opposed to by reference). You would need to tell the memoizing function which arguments should be memoized by value and which by reference only, so a simple === is the only thing that makes sense. In my opinion, the chances that that's what the user means when trying to memoize by multiple arguments are minimal.
  • Computing complex cache keys is itself a performance problem. The reason I came here is that I am trying to memoize code that generates synthetic keys for many objects and is killing the server CPU, so I am trying to figure out how to optimize and simplify it; but really, if you are throwing complex logic at key generation, you might be hiding some CPU-heavy work.

So, while I wish this were a feature, unless somebody can propose an intuitive interface and an efficient implementation, I do not see how this could be a usable lodash feature.

No one wants complex or multidimensional or recursive logic, as far as I know. I don't think anyone would argue against requiring a resolver for those cases. I'm gonna go out on a limb and say that the only thing people want out of this feature is that memoize(f(a, b, c)) works exactly the same as memoize(f(a)): by comparing each argument separately, in order, using ===. I totally understand arguments against this feature that amount to "it's actually impossible to efficiently key a map in this way due to deficiencies in the language", but otherwise I have to say it seems like a perfectly obvious feature to add.

I think this feature request still makes a lot of sense. The multidimensional map mentioned in the description by @willpiers and in this comment by @ahfarmer would definitely work well for multi-argument functions. Here is the crude implementation I put together:

export default function memoize(func) {
  const cache = new Map();

  const memoized = function (...args) {
    // Walk (and lazily create) one nested Map per argument except the last.
    // Note: this uses func.length, so it assumes callers pass exactly as
    // many arguments as the function declares.
    let innerCache = cache;
    for (let i = 0; i < func.length - 1; i++) {
      const key = args[i];
      if (!innerCache.has(key)) {
        innerCache.set(key, new Map());
      }
      innerCache = innerCache.get(key);
    }

    // The last argument keys the innermost map, which stores the result.
    const key = args[args.length - 1];
    if (innerCache.has(key)) {
      return innerCache.get(key);
    }

    const result = func(...args);
    innerCache.set(key, result);
    return result;
  };

  return memoized;
}

This should work the same as _.memoize(func, (...args) => JSON.stringify(args));, but for my use case it was much more performant to go with the multidimensional version above.
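
For illustration, here is how the implementation above behaves (the distance function is just a made-up example, not from the thread):

const distance = memoize((x, y) => Math.hypot(x, y));

distance(3, 4); // => 5, computed and stored under cache.get(3).get(4)
distance(3, 4); // => 5, served from the nested cache
distance(3, 5); // => ~5.83, only the inner map for 3 gains a new entry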

A problem I noticed here is that, when looping through the arguments to create the inner levels of maps, I was unsure whether to use args.length or func.length. In JS you can declare a function that accepts a certain number of arguments and then call it with a different number, and in my case I found instances where different calls to the same function passed a different number of arguments. In those cases the map structure wouldn't be built correctly if I used args.length, so func.length seemed more reliable for me. But for a more comprehensive solution we would need to consider something like const func = (...args) => {}, where func.length would be 0 but callers would definitely pass more arguments. I thought about doing something like const length = Math.max(func.length, args.length), which would help a bit, but if a function is called with a variable number of arguments this would still be tricky.

I've been using the suggestion from @shermam, but I've modified it to support functions that can be passed a varying number of arguments, such as variadic functions (...args) => {}.

export default function memoize(func) {
  const cache = new Map();

  const memoized = function (...args) {
    let innerCache = cache;

    // first layer of the map is the arguments length
    // if two calls have different number of arguments
    // then they cannot be the same call
    if (!innerCache.has(args.length)) {
        innerCache.set(args.length, new Map());
    }
    innerCache = innerCache.get(args.length);

    // using args.length because func.length is 0 for variadic functions
    for (let i = 0; i < args.length - 1; i++) {
      const key = args[i];
      if (!innerCache.has(key)) {
        innerCache.set(key, new Map());
      }
      innerCache = innerCache.get(key);
    }

    const key = args[args.length - 1];
    if (innerCache.has(key)) {
      return innerCache.get(key);
    }

    const result = func(...args);
    innerCache.set(key, result);
    return result;
  };

  return memoized;
}

Here's how I've been visualizing the data structure (cache).

const fn = memoize((...args) => { });

fn(1, 2, 3)
fn(1, 2)
{
  3: { // first call
    1: {
      2: {
        3: [memoized-result]
      }
    }
  },
  2: { // second call
    1: {
      2: [memoized-result]
    }
  }
}
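
As a usage sketch of this modified version (the sum function is hypothetical):

const sum = memoize((...args) => args.reduce((a, b) => a + b, 0));

sum(1, 2, 3); // => 6, stored under cache.get(3).get(1).get(2).get(3)
sum(1, 2);    // => 3, stored under cache.get(2).get(1).get(2)
sum(1, 2, 3); // => 6, cache hit; the function body does not run again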

Memoizing based on all function arguments should definitely be the standard case.

A (pure) function maps its inputs to an output. You cannot memoize the output based on only a part of the inputs; doing so implies non-determinism!

We just debugged our code for 3 days with 4 senior developers in our company, and removing memoize() from one function was the solution.
It is totally counter-intuitive that a utility function from lodash only operates on one parameter. A function with multiple parameters is like a function that takes a tuple as a single argument, and the uniqueness of a tuple comes from all of its elements.
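
For example, a minimal reproduction of this kind of bug (the add function is hypothetical, not our actual code):

const add = _.memoize((a, b) => a + b);

add(1, 2); // => 3, cached under the key 1 (the first argument)
add(1, 5); // => 3, stale: the key 1 already exists, so b is ignored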

This (and some other strange behaviors of utility functions in lodash) discourages me from using this library any further.

The memoize function should throw by default if supplied with a multi-argument function and no resolver. This is trivial to implement, and the utility as it is now can only be categorized as a potential footgun.
I would open a PR, but I think it would either sit for a long time until a major version is imminent, or maybe be frowned upon. It seems simple enough to fix that this behavior could be caught during CI.
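
A rough sketch of that kind of guard, written as a wrapper rather than a change to lodash itself (safeMemoize is a hypothetical name):

function safeMemoize(func, resolver) {
  // func.length counts only declared parameters before the first default
  // or rest parameter, so this check is a heuristic, not a guarantee.
  if (func.length > 1 && resolver === undefined) {
    throw new TypeError(
      "memoize: provide a resolver for functions with more than one parameter"
    );
  }
  return _.memoize(func, resolver);
}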

Could this, or some variant of this change, be applied to make the utility safe?
Or at least document the full behavior in the function's JSDoc description with a warning.
#5858