I made your regex 2X faster.

Question

I made your regex 2X faster.

kyx0r opened this issue 3 years ago · comments

Take a look at my regex.c https://github.com/kyx0r/neatvi/blob/master/regex.c
I can't think of any more ways to make it faster now, (unless rewrite completely from scatch using a diffrent approach described in many research papers.) I was able to cut down cpu usage by twice which is very substantial already. Overall the net patch will probably cost you +25 more lines on code, but it will definitely be worth it. Also you will have way less active memory usage and allocations, and thus faster compile times despite the fact that I precompute the bracket expressions. It's just so much better, also the problem is solved ireratively, but it still behaves like recursion would, but zero overhead of stackframe allocations and useless copy of data. Also the marks array for subs is reset to -1 for every character, well that was a huge bottleneck, so instead of memsetting the whole array each time I only reset a max portion that was used and it causes a massive speed up!

Kyryl · Answer 1 · Wed Apr 21 2021 02:07:58 GMT+0800 (China Standard Time)

By the way, @aligrudi it's actually usable on raspberry pi now, and other weak cpus.

Ali Gholami Rudi · Answer 2 · Sat Apr 24 2021 03:09:40 GMT+0800 (China Standard Time)

Kyryl ***@***.***> wrote:

Take a look at my regex.c https://github.com/kyx0r/neatvi/blob/master/regex.c I can't think of any more ways to make it faster now, (unless rewrite completely from scatch using a diffrent approach described in many research papers.) I was able to cut down cpu usage by twice which is very substantial already. Overall the net patch will probably cost you +25 more lines on code, but it will definitely be worth it. Also you will have way less active memory usage and allocations, and thus faster compile times despite the fact that I precompute the bracket expressions. It's just so much better, also the problem is solved ireratively, but it still behaves like recursion would, but zero overhead of stackframe allocations and useless copy of data. Also the marks array for subs is reset to -1 for every character, well that was a huge bottleneck, so instead of memsetting the whole array each time I only reset a max portion that was used and it causes a massive speed up!

Nice job and very interesting. I'll try to examine it minutely. Note that in Neatvi, you can use libc's version of regex.c without any change; just remove vi.h and omit regex.o from OBJS in Makefile. Although I rarely do that, it is very useful for testing or comparing its performance. Thanks, Ali

Kyryl · Answer 3 · Sat Apr 24 2021 04:13:37 GMT+0800 (China Standard Time)

@aligrudi Hello Ali, yes I am aware that it's possible to switch out the regex version to libc, but I never really liked the idea, so my fork's regex.c actually doesn't really try to be replaceable, I don't have regex.h and everything is embeded in vi.h for better coherency of the codebase. I did not change the ABI, so if you want to try swaping it you can but probably it won't be as easy as in your version. However you are probably very familiar with this code anyway, so there should be no problem to look over and see what exactly I changed, but to be honest there are lots of changes to regex.c.

Kyryl · Answer 4 · Sun Aug 08 2021 06:32:00 GMT+0800 (China Standard Time)

@aligrudi Here is a follow up about regexes: kyx0r/nextvi@45ac822

Hopefully you find it interesting to take a look at.

Regards,
Kyryl

Ali Gholami Rudi · Answer 5 · Mon Aug 16 2021 01:09:09 GMT+0800 (China Standard Time)

Kyryl ***@***.***> wrote:

@aligrudi Here is a follow up about regexes: kyx0r/nextvi@45ac822 Hopefully you find it interesting to take a look at.

Interesting indeed. I will examine it. Ali

Kyryl · Answer 6 · Fri Sep 03 2021 05:36:43 GMT+0800 (China Standard Time)

@aligrudi Hello Ali, any thoughts? Also take a look at this repo https://github.com/kyx0r/pikevm it has it as a library + test cases. I am getting very close to having NFA straight up beating DFA on all inputs with arbitrary sub-match complexity. I believe there is still some room for improvement, for example just looking at how sub->ref and splits work, also nlist doesn't always need to be recomputed, some states in regular expression don't need to be revisited every-time. I also found some interesting nfa implementation here, but it segfaults in some inputs, or gives wrong results. https://github.com/spewspews/bsp/blob/master/bspregexp.h