Migrate the code to zephir

Question

alrik11es opened this issue 7 years ago

Hey! I've tested this code and it looks nice. I tried using it just for fun, but performance on big trees becomes really slow.

I think this project should be migrated to a PHP C extension with Zephir, or just rewritten in another language like Go.

https://zephir-lang.com/

What do you think? Worth a try?

Matt Nagi · Answer 1 · Sun Mar 04 2018 05:20:56 GMT+0800 (China Standard Time)

Could you point me to an example script that shows off the performance problem? I'd like to at least verify that there isn't some issue with the current implementation before looking into other options like Zephir.

Matt Nagi · Answer 2 · Sun Mar 04 2018 05:22:16 GMT+0800 (China Standard Time)

Also: if you are looking for a Go implementation of a merkle tree, many already exist.

Marcos · Answer 3 · Mon Mar 05 2018 16:54:45 GMT+0800 (China Standard Time)

Yes, I know, but none are as easy to use as this one. I was playing with this example:

$tree = new FixedSizeTree(10000, $hasher, $finished);
$tree->set(0, 'genesis');
for($i=1; $i<10000; $i++) {
    $result = $tree->set($i, md5(mt_rand(0,99999)));
    $output->write($i."\r");
}

It's a real possibility that I was doing something wrong... Thanks for taking the time on this.

Matt Nagi · Answer 4 · Tue Mar 06 2018 02:29:55 GMT+0800 (China Standard Time)

Okay, so the problem is that I'm iterating over basically the entire tree every time ->set() is called, in order to decide whether I should execute the "complete" callback. For a very large tree like 10000, filling it takes at least 10000 ->set() calls, which means at least 10000 * 10000 iterations... so O(n^2) time in the size of the tree (and actually more, since I'm only counting the base nodes). This is not ideal.
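To make the quadratic cost concrete, here is a minimal sketch that counts leaf visits. It assumes a plain array as a stand-in for the tree's base nodes; the function names are hypothetical and this is not the library's actual code. The second function shows one possible way to avoid the rescan (a filled-slot counter), which is not necessarily how the library fixed it.

```php
<?php
// Hypothetical stand-in for a fixed-size tree's base nodes; not the library's real code.

/** Counts leaf visits when every set() rescans all leaves to test for completeness. */
function countScanVisits(int $width): int
{
    $leaves = array_fill(0, $width, null);
    $visits = 0;
    for ($i = 0; $i < $width; $i++) {
        $leaves[$i] = "value-$i";
        // Naive completeness check: look at every leaf after each write.
        $complete = true;
        foreach ($leaves as $leaf) {
            $visits++;
            if ($leaf === null) {
                $complete = false;
            }
        }
    }
    return $visits;
}

/** Counts completeness checks when a filled-slot counter replaces the rescan. */
function countCounterChecks(int $width): int
{
    $leaves = array_fill(0, $width, null);
    $filled = 0;
    $checks = 0;
    for ($i = 0; $i < $width; $i++) {
        if ($leaves[$i] === null) {
            $filled++; // first write to this slot
        }
        $leaves[$i] = "value-$i";
        $checks++; // a single integer comparison per set()
        $complete = ($filled === $width);
    }
    return $checks;
}

$scanVisits = countScanVisits(100);       // 100 * 100 leaf visits
$counterChecks = countCounterChecks(100); // 100 comparisons
```

With a width of 10000 the first function performs 100,000,000 leaf visits, which matches the 10000 * 10000 estimate above.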

The only other reason I try to calculate the hash on ->set() is if the value being passed to ->set() is very large. For example, if you had a tree size of 32 and your ->set() values were 1GB each or something, then there would be no reason to keep them around once the hash has been made of those values, so I try to re-calc every time.

So basically in the case of very large values passed to ->set(), you want to re-calc more often, and in the case of very large trees, you want to re-calc less often. Right now I don't let the user choose. I'll look into a way to allow the choice for the user so the 10000 node case can be covered.
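As a rough illustration of what letting the user choose could look like, the sketch below uses a constructor flag. `SketchTree`, `$recalcOnSet`, and the `$scans` counter are hypothetical names, SHA-256 and the pairing of an odd node with itself are assumptions, and none of this is the library's real API. It only models the scan-cost side of the trade-off; it does not release large values early.

```php
<?php
// Hypothetical sketch: SketchTree and $recalcOnSet are illustrative names,
// not part of the library's API.
final class SketchTree
{
    /** @var array<int, string|null> leaf values, null while unset */
    private $leaves;
    /** @var bool whether set() tries to finish the tree on every call */
    private $recalcOnSet;
    /** @var int number of full leaf scans performed */
    public $scans = 0;

    public function __construct(int $width, bool $recalcOnSet)
    {
        $this->leaves = array_fill(0, $width, null);
        $this->recalcOnSet = $recalcOnSet;
    }

    public function set(int $index, string $value): ?string
    {
        $this->leaves[$index] = $value;
        // Eager mode: try to finish after every write. Lazy mode: do nothing yet.
        return $this->recalcOnSet ? $this->root() : null;
    }

    /** Returns the root hash once every leaf is set, otherwise null. */
    public function root(): ?string
    {
        $this->scans++;
        if (in_array(null, $this->leaves, true)) {
            return null;
        }
        $level = array_map(function (string $v): string {
            return hash('sha256', $v);
        }, $this->leaves);
        while (count($level) > 1) {
            $next = [];
            foreach (array_chunk($level, 2) as $pair) {
                // An odd node out is paired with itself (an assumption of this sketch).
                $next[] = hash('sha256', $pair[0] . ($pair[1] ?? $pair[0]));
            }
            $level = $next;
        }
        return $level[0];
    }
}

$eager = new SketchTree(8, true);
$lazy = new SketchTree(8, false);
$eagerRoot = null;
for ($i = 0; $i < 8; $i++) {
    $eagerRoot = $eager->set($i, "v$i"); // scans all leaves on every call
    $lazy->set($i, "v$i");               // no scan
}
$lazyRoot = $lazy->root();               // one scan at the end
```

Both trees end up with the same root, but the eager one scans the leaves once per ->set() call while the lazy one scans only when the root is requested.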

Just as a note, it took my computer 2 minutes to do a tree size of 10000 with re-calc on. I hacked it to turn the re-calc off just to see how fast it was, and it took 300ms instead, so about a 400x improvement (between 2 and 3 orders of magnitude).

Look for an update soon.

Marcos · Answer 5 · Tue Mar 06 2018 19:32:23 GMT+0800 (China Standard Time)

That was fast! Nice.

Seems right to me to try to optimize this library before any possible port to other languages.

I have at least one other question, but I'm going to open another issue for that.

Matt Nagi · Answer 6 · Mon Nov 02 2020 21:17:58 GMT+0800 (China Standard Time)

So I know this was a LONG time ago, but I released 2.0 and it definitely fixes this issue by default now. So I'm going to close this as fixed.