z-pattern-matching / z

Pattern Matching for JavaScript

Home Page: https://z-pattern-matching.github.io/


z is slow because it reparses function source code every time

jedwards1211 opened this issue · comments

I mean pattern matching is cool when it's a core language feature, but...

Test Code

const { matches } = require('z')
 
const zCompress = numbers => matches(numbers)(
  (x, y, xs) => x === y
    ? zCompress([x].concat(xs))
    : [x].concat(zCompress([y].concat(xs))),
  (x, xs) => x // stopping condition
)

const jsCompress = numbers => numbers.filter((n, i) => n !== numbers[i - 1])

const input = [1, 1, 2, 3, 4, 4, 4]

const count = 100000
function time(fn) {
  const start = Date.now()
  for (let i = 0; i < count; i++) fn(input)
  return Date.now() - start
}
 
const zTime = time(zCompress)
const jsTime = time(jsCompress)

console.log('z: ', zTime + ' ms')
console.log('js: ', jsTime + ' ms')

console.log(`z took ${zTime / jsTime} times longer`)
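As an aside, a variant of the harness above with sub-millisecond resolution (a sketch using Node's process.hrtime.bigint(); Date.now() only resolves to whole milliseconds, which is coarse for the fast cases):

```javascript
// Same loop as the time() harness above, but timed with
// process.hrtime.bigint() for nanosecond resolution (Node >= 10.7).
function timeNs(fn, input, count) {
  const start = process.hrtime.bigint()
  for (let i = 0; i < count; i++) fn(input)
  return Number(process.hrtime.bigint() - start) / 1e6 // elapsed ms
}

const double = x => x * 2
console.log(`${timeNs(double, 21, 1000).toFixed(3)} ms`)
```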

Results

MacBook Pro Mid 2014, macOS 10.15.5. Node 12.16.2.

$ node index.js 
z:  12560 ms
js:  23 ms
z took 546.0869565217391 times longer

@jedwards1211 Hi Andy, let's talk about it:

  • The compress example is only an illustration of how pattern matching works, aimed at people who don't know pattern matching yet. That code was never intended to be fast or to replace .filter in real-life scenarios where the job is only filtering numbers. You may know that pattern matching is much more powerful than that, but we can make that clear in the docs to avoid such misunderstandings 👍

  • About your benchmark: honestly, it's comparing apples and oranges 😕. The z example deliberately uses recursion and array .concat (again, only for didactic purposes), while your code doesn't do the same; you are mostly benchmarking JS recursion rather than the z library itself!

In that regard, I'll respectfully change the issue title.

Nevertheless, I believe z may really have a performance drawback even in a fair comparison, so in that case please send a benchmark comparing the z code only against similar code that provides all the powerful features pattern matching does.

Thanks!

@leonardiwagner the main problem is that z reparses the match function source code on every call to matches, which slows any possible use case down. The compress example demonstrates that perfectly well even if it's not a real-world example. Yes, the example should be something where the code is clearer with pattern matching, but let's focus on the catastrophic slowdown.

To prove that .concat is not the main bottleneck (even though it's an inefficient way to do things in JS) I made this version:

const concatCompress = numbers =>
  numbers.length <= 1
    ? numbers
    : numbers[0] === numbers[1]
      ? concatCompress([numbers[0]].concat(numbers.slice(2)))
      : [numbers[0]].concat(concatCompress(numbers.slice(1)))

Results:

z:  12017 ms
concat:  117 ms
js:  19 ms

So z still took roughly 100x longer than the equivalent algorithm that doesn't use z. The only place that overhead can be coming from is reparsing the function source code.

the main problem is that z reparses the match function source code on every call to matches, which slows any possible use case down.

@jedwards1211 Andy, you conflated the compress example with what z, or any pattern matching, actually does. It's the didactic example that performs as you state, not z itself! Let's look at it together:

So z still took 100X longer than the equivalent algorithm that doesn't use z.

@jedwards1211 That is not true, they're not equivalent, not even close:

  • The first algorithm (the didactic example) deliberately recurses on each array element, unnecessarily, in a language that doesn't provide tail-call optimization; and in each recursion z is called to destructure and match the partial array and apply a function for the matched pattern.
  • The second algorithm (yours) just applies a function to each element of the array.

This would be an equivalent code:

const zCompress = numbers => matches(numbers)(
   (xs) => xs.filter((n, i) => n !== xs[i - 1])
)

As you can see, it's not the job of z, or any pattern matching, to loop through an array or filter elements; I think that's what you missed.

Sorry for using a harsh tone to explain, but you came in way too biased toward proving it's hundreds of times slower, without caring about the fairness of the comparison! 😅

I strongly believe there's room to find performance issues, and it would be amazing to have your help finding bottlenecks and improvements in this library; any kind of feedback is welcome!

Yes, I am biased. I dislike the idea of working magic by parsing functions' source code, both because it's brittle (it doesn't work with transpilers) and because it's computationally expensive, and it might be a nasty surprise to people who use this library, since there's no warning in the readme. I think it would be better to implement this some other way (unfortunately, I don't think there's a way to get an equally convenient syntax with good performance), or at least to add a warning to the README about it.
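To illustrate the brittleness, here is a minimal sketch of the general source-parsing technique (paramNames is an illustrative helper, not z's actual implementation): extracting parameter names via Function.prototype.toString only works as long as the source survives intact, and a minifier or transpiler can rename parameters out from under it.

```javascript
// Naive parameter-name extraction from a function's source text.
// Assumes a simple parenthesized arrow function; real parsers
// handle many more cases.
function paramNames(fn) {
  const src = fn.toString()
  const params = src.slice(src.indexOf('(') + 1, src.indexOf(')'))
  return params.split(',').map(p => p.trim()).filter(Boolean)
}

const handler = (x, y, xs) => x
console.log(paramNames(handler)) // [ 'x', 'y', 'xs' ]

// A minifier may rewrite this handler as (a, b, c) => a, so any
// dispatch keyed on the names "x"/"xs" silently stops matching.
```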

For some reason your equivalent zCompress example outputs undefined, though it seems like it should work.

In any case, even the overhead of just one matches call per repetition is 65X slower than without it, which to me is very significant:

const { matches } = require('z')

const zCompress = numbers => matches(numbers)(
   (xs) => xs.filter((n, i) => n !== xs[i - 1])
)

const jsCompress = numbers => numbers.filter((n, i) => n !== numbers[i - 1])

const input = [1, 1, 2, 3, 4, 4, 4]

console.log(zCompress(input))
console.log(jsCompress(input))

const count = 100000
function time(fn) {
  const start = Date.now()
  for (let i = 0; i < count; i++) fn(input)
  return Date.now() - start
}
 
const zTime = time(zCompress)
const jsTime = time(jsCompress)

console.log('z: ', zTime + ' ms')
console.log('js: ', jsTime + ' ms')

console.log(`z took ${zTime / jsTime} times longer than pure js`)
$ node index.js
undefined
[ 1, 2, 3, 4 ]
z:  1500 ms
js:  23 ms
z took 65.21739130434783 times longer than pure js

@jedwards1211 you're right, I mostly agree with you, but tell me about that:

it might be a nasty surprise to people who use this library because there's no warning in the readme.

Do you have a suggestion for the exact text that should be stated? I just fear people will think that using z just a few times will make the whole application 65x slower!

I don't see any harm in calling z 100 or 1000 times in an execution and paying a 150ms overhead in exchange for lots of conveniences: less code, fewer bugs, readability, etc. (psst, let's see the bright side too! 🤣)

For 100k calls? I've already had trouble with other libraries under those conditions as well, and my reaction was more like "ok, I have to remove the convenience from that piece" rather than "OMG! They should have warned me before". I think whenever you do something 100k times, it's implicit that some code optimization will probably be necessary.

That code-translation overhead comes from the js-function-reflector dependency; maybe we can improve how things work there, or come up with another way to reflect JS functions. JS now has the Reflect API, but I'm not sure it does the job. Help with that subject is also appreciated.
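One possible mitigation, purely as a sketch (reflectOnce and countParams are hypothetical names, not part of z or js-function-reflector): cache the reflection result per function object in a WeakMap, so each handler is parsed at most once no matter how many times matches is called with it.

```javascript
// Cache reflection results keyed by the function object itself;
// a WeakMap lets cached entries be garbage-collected with the function.
const reflectionCache = new WeakMap()

function reflectOnce(fn, parse) {
  let info = reflectionCache.get(fn)
  if (info === undefined) {
    info = parse(fn) // the expensive toString() + parse step
    reflectionCache.set(fn, info)
  }
  return info
}

// Illustrative "parse": just count declared parameters,
// while tracking how many times parsing actually ran.
let parseCalls = 0
const countParams = fn => { parseCalls++; return fn.length }

const handler = (x, y) => x + y
reflectOnce(handler, countParams)
reflectOnce(handler, countParams)
console.log(parseCalls) // 1 — second call hit the cache
```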

You're right, I'm being too negative; it would only be a problem for large datasets or for something called constantly on a busy web server.

The closest experience I've had to this was a validation library that memoized my callback functions by source code, so variables I was accessing via closure never updated, because it always used an old instance of my function with old closure bindings. That was confusing at first and really frustrating once I figured out what was going on, and it's probably where my dislike of relying on source code comes from. That's a bigger deal than what I'm complaining about with z, though; it probably wouldn't take long to discover whether z is a bottleneck in an app.

I think a good improvement would be an extra, optional API that takes a bit more typing to use but doesn't rely on parsing the functions, so it would be fast. Then the readme could say that the current API is not the best choice for large datasets, performance-critical applications, or use with a transpiler, but that this alternative API covers those cases, so people wouldn't be scared away. I'm tempted to suggest a syntax that could also match duck types, like:

const matches = require('z')
const {Tail} = matches

matches(input)(
  [Number, number => ...],
  [String, string => ...],
  [[Number], arrayOfNumbers => ...],
  [[MyClass], instanceOfMyClass => ...],
  [[Number, String, Tail], (number, string, tailOfArray) => ...],
  [[Number, Boolean, Tail], (number, boolean, tailOfArray) => ...],
)
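A rough sketch of how such a parse-free API could dispatch (matchValue, checkType, and this Tail are all hypothetical names, not z's API): each pattern is a [spec, handler] pair checked with plain runtime type tests, so no function source is ever parsed.

```javascript
// Marker for "the rest of the array", as in the proposal above.
const Tail = Symbol('Tail')

// Runtime type test: primitive constructors by typeof, classes by instanceof.
function checkType(value, Spec) {
  if (Spec === Number) return typeof value === 'number'
  if (Spec === String) return typeof value === 'string'
  if (Spec === Boolean) return typeof value === 'boolean'
  return value instanceof Spec
}

function matchValue(input) {
  return (...patterns) => {
    for (const [spec, handler] of patterns) {
      if (Array.isArray(spec)) {
        if (!Array.isArray(input)) continue
        const tailIndex = spec.indexOf(Tail)
        if (tailIndex === -1) {
          // e.g. [Number]: homogeneous array of the given type
          if (spec.length === 1 && input.every(v => checkType(v, spec[0])))
            return handler(input)
        } else {
          // e.g. [Number, String, Tail]: typed head plus array tail
          const head = spec.slice(0, tailIndex)
          if (input.length >= head.length &&
              head.every((s, i) => checkType(input[i], s)))
            return handler(...input.slice(0, head.length),
                           input.slice(head.length))
        }
      } else if (checkType(input, spec)) {
        return handler(input)
      }
    }
    // No pattern matched: returns undefined in this sketch.
  }
}

console.log(matchValue(42)([Number, n => n * 2])) // 84
console.log(matchValue([1, 'a', true, false])(
  [[Number, String, Tail], (n, s, rest) => rest.length]
)) // 2
```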

@jedwards1211 that suggestion is really great! I'm planning to do a v2 and I'm gonna put that on the roadmap, thank you! 😄

Cool, glad I could propose something constructive out of this :)

brittle

Valid critique. It is a hack. I do think it is a neat hack.

It would be possible to flip the function arguments. This has two benefits I can think of:

  1. When the pattern-matching function is defined, the patterns can be parsed and stored in the function scope (if necessary). The returned function can reuse these parsed patterns.
  2. It allows one to define a function with pattern matching without having to provide its arguments.
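Benefit 1 could look roughly like this (matchesFor is a hypothetical name, and the arity-based dispatch is only a stand-in for real pattern parsing): the expensive analysis runs once when the matcher is defined, and every call reuses it.

```javascript
// Flipped-argument sketch: handlers first, input later.
function matchesFor(...handlers) {
  // The expensive parse step would happen once, here; this sketch
  // just records each handler's declared parameter count.
  const parsed = handlers.map(h => ({ arity: h.length, handler: h }))
  return input => {
    // Cheap per-call dispatch: pick the handler whose arity matches
    // the input length, falling back to the last one.
    const m = parsed.find(p => p.arity === input.length) ||
              parsed[parsed.length - 1]
    return m.handler(...input)
  }
}

const describe = matchesFor(
  (x) => `one: ${x}`,
  (x, y) => `two: ${x}, ${y}`
)
console.log(describe([1]))    // one: 1
console.log(describe([1, 2])) // two: 1, 2
```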

I have written a proof-of-concept pattern matching utility myself. Maybe we can share ideas between the projects.

https://github.com/bas080/Patroon.js