aligrudi / neatvi

A small vi/ex editor for editing UTF-8 text

Home Page: http://litcave.rudi.ir/

I made your regex 2X faster.

kyx0r opened this issue · comments

commented

Take a look at my regex.c https://github.com/kyx0r/neatvi/blob/master/regex.c
I can't think of any more ways to make it faster now (short of rewriting it completely from scratch using a different approach, as described in many research papers). I was able to cut CPU usage in half, which is already very substantial. Overall, the net patch will probably cost you about 25 more lines of code, but it will definitely be worth it. You will also have far less active memory usage and far fewer allocations, and thus faster compile times, even though I precompute the bracket expressions. The problem is now solved iteratively: it still behaves the way recursion would, but with zero overhead from stack frame allocation and needless copying of data. Also, the marks array for subs was reset to -1 for every character, which was a huge bottleneck. Instead of memsetting the whole array each time, I now reset only the maximum portion that was actually used, and that gives a massive speedup!
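To illustrate the last point for anyone skimming the thread, here is a minimal sketch of the "reset only what was used" idea. It is not the code from regex.c: the names `struct marks`, `marks_set`, `marks_reset` and the capacity `NSUB` are made up for this example, and the real engine may track the used portion differently.

```c
#include <stddef.h>

#define NSUB 32	/* hypothetical capacity of the marks array */

/* Hypothetical container: sub-match offsets plus a high-water mark. */
struct marks {
	int pos[NSUB];	/* -1 means "unset" */
	int used;	/* only slots [0, used) may hold a value other than -1 */
};

/* Full initialization, done once rather than once per character. */
static void marks_init(struct marks *m)
{
	for (int i = 0; i < NSUB; i++)
		m->pos[i] = -1;
	m->used = 0;
}

/* Record an offset and remember how far into the array we have written. */
static int marks_set(struct marks *m, int idx, int off)
{
	if (idx < 0 || idx >= NSUB)
		return -1;
	m->pos[idx] = off;
	if (idx + 1 > m->used)
		m->used = idx + 1;
	return 0;
}

/* Cheap reset: touch only the slots written since the last reset. */
static void marks_reset(struct marks *m)
{
	for (int i = 0; i < m->used; i++)
		m->pos[i] = -1;
	m->used = 0;
}
```

The saving comes from `marks_reset` doing work proportional to the highest slot written rather than to `NSUB`, so patterns with few groups pay almost nothing per character.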

commented

By the way, @aligrudi, it's actually usable on a Raspberry Pi now, and on other weak CPUs.

commented

@aligrudi Hello Ali, yes, I am aware that it's possible to switch the regex implementation out for the libc one, but I never really liked the idea, so my fork's regex.c doesn't try to be replaceable: I don't have regex.h, and everything is embedded in vi.h for better coherence of the codebase. I did not change the ABI, so you can try swapping it if you want, but it probably won't be as easy as in your version. You are probably very familiar with this code anyway, so it should be no problem to look it over and see exactly what I changed, though to be honest there are a lot of changes to regex.c.

commented

@aligrudi Here is a follow-up about regexes: kyx0r/nextvi@45ac822

Hopefully you find it interesting to take a look at.

Regards,
Kyryl

commented

@aligrudi Hello Ali, any thoughts? Also take a look at https://github.com/kyx0r/pikevm which has it as a library plus test cases. I am getting very close to having the NFA flat-out beat a DFA on all inputs with arbitrary sub-match complexity. I believe there is still some room for improvement: for example, in how sub->ref and splits work. Also, nlist doesn't always need to be recomputed, and some states in a regular expression don't need to be revisited every time. I also found an interesting NFA implementation here, but it segfaults on some inputs or gives wrong results: https://github.com/spewspews/bsp/blob/master/bspregexp.h