aligrudi / neatvi

A small vi/ex editor for editing UTF-8 text

Home Page: http://litcave.rudi.ir/

Poor performance with huge lines

kyx0r opened this issue · comments

commented

On my hardware, neatvi hits a major performance issue when faced with huge lines: about
6000 characters is enough to make it unusably slow, even with syntax highlighting disabled.
Moving around is really slow and pins a CPU core at 100%.

@aligrudi I took some time to investigate the cause, and it turns out
the function dir_reorder() is extremely slow in such cases. The algorithm is O(n),
where n is the number of characters in the line. What is worse, every motion key
has to go through dir_reorder(). But that wasn't the only cause of the bad performance:
with it disabled, things still did not look good and the lag was easy to feel. syn_highlight()
is also extremely inefficient and runs an O(n) regex search on the entire line for no reason. Finally,
a lot of the code has redundant computations; uc_slen() does not scale well either.

I managed to solve most of these problems in my fork.
First, I run dir_fix() only on the visible part of the line, so instead of
O(n) it becomes O(c), where c is the number of terminal cells.
Secondly, the same strategy is used for syntax highlighting:
the regex is matched only from cell 0 up to the number of terminal cells.
As you can see, this kind of clipping essentially reduces the
number of operations required to a constant; there is no
need to process text that will never be displayed on the
terminal anyway.
Finally, I have cleaned up the redundant mallocs and uc_slen() calls (not all of them yet).
After solving all of these, scrolling works fine no matter how
huge the line gets, and performance is back at a reasonable level.
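The clipping idea above can be sketched in a few lines. This is a hypothetical illustration, not neatvi's actual code: `visible_span()` is an invented helper that clamps the region to be reordered or highlighted to the cells that fit on screen, starting at the horizontal scroll offset.

```c
#include <assert.h>

/* Hypothetical sketch (not neatvi's real code): instead of scanning the
 * whole line, clamp the work to the window of cells actually on screen.
 * line_len: total cells in the line; xoff: horizontal scroll offset;
 * cols: terminal width. Writes the clipped [beg, end) range and returns
 * its size, which is bounded by cols regardless of line length. */
static int visible_span(int line_len, int xoff, int cols, int *beg, int *end)
{
	*beg = xoff < line_len ? xoff : line_len;
	*end = *beg + cols < line_len ? *beg + cols : line_len;
	return *end - *beg;
}
```

However long the line grows, the returned span never exceeds the terminal width, which is what turns the per-keystroke cost from O(n) into O(c).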

Ali, what do you think about these changes? Maybe you can take a look at
what else can be improved; would it be reasonable to merge these
changes into your base version?

Regards and Happy New Year!

commented

Kyryl notifications@github.com wrote:
This patch increases them for several functions. Can the performance of functions like uc_slen(), uc_chop(), or ren_position() be improved by storing the value returned from the previous call?

Best wishes for 2021, Ali

Yes, the best way to improve the performance of these functions is to call them once and then reuse the result throughout the code; adding extra parameters everywhere seems to be the easiest way to do this. Compilers are not smart enough to optimize away this call redundancy, so I guess they still have a very long way to go :) . Though I don't like to rely on compiler optimizations anyway; I don't like things being hidden from my sight.

Also, I don't quite see how duplicating state across functions helps readability: you essentially end up writing more code, whereas adding a couple more parameters is more atomic in complexity, for the computer and for the human brain alike.
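The compute-once-and-pass-down idea can be shown with a toy pair of functions. This is a hypothetical sketch with invented names (`count_tabs_slow`, `count_tabs_fast`), not the actual uc_slen() refactor; it only illustrates the extra-parameter pattern being discussed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: a helper that rescans the string itself,
 * paying for strlen() on every call. */
static int count_tabs_slow(const char *s)
{
	int n = 0;
	for (size_t i = 0; i < strlen(s); i++)
		n += s[i] == '\t';
	return n;
}

/* The refactored form: the caller computes the length once and
 * passes it as an extra parameter, so repeated calls stay O(len)
 * total instead of recomputing the length each time. */
static int count_tabs_fast(const char *s, size_t len)
{
	int n = 0;
	for (size_t i = 0; i < len; i++)
		n += s[i] == '\t';
	return n;
}
```

The behavior is identical; only the redundant length computation moves up to the caller, which is exactly the trade-off between "more parameters" and "hidden recomputation" debated above.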

@aligrudi I was still working on this. Apparently the optimizations I did when writing the original post did not take tabs into account: a tab has a variable display width but is only one character in the text, so it becomes non-trivial to figure out the proper offset where syn_highlight() should start. In the end I was able to solve this; the patch became more optimal, though it now requires more lines of code. The amount of work to process is far less, though. Nonetheless, the movement keys are still pretty slow with huge lines. I don't really feel like looking into how they are implemented, but I did see about three redundant ren_position() calls there, so that's likely the cause. I find this optimization work really boring, but it has to be done for a quality product. Maybe I'll do another big patch and fix the rest of the problems in a week or so. I'll keep you up to date ;)
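The tab problem can be made concrete with a small sketch. This is a hypothetical helper (`cell_offset` is an invented name, not a neatvi function) showing why mapping a character index to a screen-cell offset requires walking the prefix: a tab is one character in the buffer but expands to a variable number of cells, up to the next tab stop.

```c
#include <assert.h>

/* Hypothetical sketch: convert a character index into a screen-cell
 * offset. A tab advances to the next multiple of tabstop, so its
 * width depends on the cells already consumed before it. */
static int cell_offset(const char *s, int chr_idx, int tabstop)
{
	int cell = 0;
	for (int i = 0; i < chr_idx && s[i]; i++) {
		if (s[i] == '\t')
			cell += tabstop - cell % tabstop;	/* jump to next tab stop */
		else
			cell++;
	}
	return cell;
}
```

Because the tab's width depends on everything before it, the highlight start offset cannot be derived from the character index alone, which is the complication described above.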

Kyryl.

commented

Hello @aligrudi, it has been a while, and I have had some more time to think about how to solve this problem. Right now I think I have the most optimal solution, and it works without any regressions: tabs, UTF-8 text, and double-width characters no longer break syntax highlighting. I don't think there is anything left that can break my algorithm for running syntax highlighting only on the visible terminal cells (thus reducing syn_highlight's cost to a constant). I would appreciate any feedback on whether you think this is good or not.

The code is very simple and is just this block: https://github.com/kyx0r/neatvi/blob/master/led.c#L347-L366 . As you can see, I had to split the code by text direction, because it only works for td > 0 right now. I would assume that if someone ever wanted to implement the same idea for td < 0, the algorithm would have to be quite different; but td > 0 is used far more often, so reverse text may not even need optimizations like this. I would also point out that this optimization becomes more valuable the bigger your regex set in conf.h is. Scary to think about, because just the cost of adding a couple more regexes for syn_highlight to chug through probably scales as O(n^2) or worse, and I have quite a few extra rules in my conf.h.

Regards

commented

@aligrudi Yet another update on this issue. I clearly found a problem with my previous solution yet again: apparently a line may contain placeholders, which are tricky to work with since each is one character wide on screen but may be any number of bytes. This caused invalid writes and occasional segmentation faults, so I had to redesign the solution once more. The new version is much better: it handles placeholders gracefully and also works in both text directions, so this is a complete solution. Determining when to box out the string is now extremely fast too; it is just one if statement per line, so it costs almost nothing. The bounds-calculation code is also really fast now, because it no longer has to look through the entire string at all. Take a look at my led.c file to see how this magic is implemented. Honestly, I did not expect that I could also make reverse-direction text work, but I took a more universal approach this time.

commented

@aligrudi You may be wondering: how much did I actually gain from this optimization? I ran a test on an Intel Pentium Silver N5000, whose single-core performance is pretty weak compared to today's latest-generation CPUs. The test used pure ASCII characters with C syntax highlighting.

With xorder set to 1, appending a character at the end of the line starts to lag at about 16,000 characters in the base version. In my fork, the same lag point is only reached at 160,000 characters, and cursor movement stays just as fast with no perceptible lag.

The second test was with xorder set to 0. The base neatvi version starts to lag very badly on append at 60,000 characters. In my version, at 1,800,000+ characters the cursor movement lagged slightly, but it was still possible to edit that line fast enough. Also, the base version at 60k has a five-second delay when reaching the edge of the line, while the delay at 1,800,000 characters feels like less than a second!

Not all of this can be credited to this one patch; I have also done extensive refactoring of a lot of other code that was badly written or redundant elsewhere. But I have to say that syn_highlight() is the biggest bottleneck in the base version. xorder is also pretty bad, but unfortunately it has to go through the entire line, because Arabic script may extend past the screen and still has to be reordered, or the visible part may get messed up.