aligrudi / neatvi

A small vi/ex editor for editing bidirectional UTF-8 text

Home Page: http://litcave.rudi.ir/

ren_placeholder to find character width is wrong

kyx0r opened this issue · comments

commented

Hello @aligrudi, I was looking at the commit history and found that commit 0d6850f has a bug. The problem is that conf_placeholder() can modify the wid variable even when the character is not actually a placeholder. For example, if the character is 'a' and the last placeholder in the placeholders array has a width of 5, then 'a' will be treated as having a width of 5, which is obviously wrong. The only thing that keeps this bug from showing up is that currently no placeholder in the array is wider than 1.
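To make the failure mode concrete, here is a minimal sketch of the pattern described above. The struct, the table contents, and the function names (placeholder_buggy, placeholder_fixed) are hypothetical and chosen for illustration only; they are not neatvi's actual code, and the width-5 entry is invented to expose the bug.

```c
#include <string.h>

/* Hypothetical placeholder table; layout and names are assumptions,
 * not neatvi's real data structures. */
struct placeholder {
	const char *src;	/* character to match */
	const char *dst;	/* text displayed instead */
	int wid;		/* display width of dst */
};

static const struct placeholder placeholders[] = {
	{"\t", " ", 1},
	{"\x01", "<SOH>", 5},	/* invented wide entry, last in the table */
};

#define NPLACEHOLDERS (sizeof(placeholders) / sizeof(placeholders[0]))

/* Buggy pattern: *wid is overwritten for every entry visited,
 * so a non-matching character leaves with the last entry's width.
 * Returns 0 on a match, 1 otherwise. */
int placeholder_buggy(const char *s, int *wid)
{
	size_t i;
	for (i = 0; i < NPLACEHOLDERS; i++) {
		*wid = placeholders[i].wid;
		if (!strcmp(s, placeholders[i].src))
			return 0;
	}
	return 1;
}

/* Fixed pattern: *wid is written only when s really is a placeholder.
 * Returns 0 on a match, 1 otherwise. */
int placeholder_fixed(const char *s, int *wid)
{
	size_t i;
	for (i = 0; i < NPLACEHOLDERS; i++) {
		if (!strcmp(s, placeholders[i].src)) {
			*wid = placeholders[i].wid;
			return 0;
		}
	}
	return 1;
}
```

With the caller initializing the width to 1, placeholder_buggy("a", &w) leaves w at 5 in this sketch, while placeholder_fixed("a", &w) leaves it at 1.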

Also see my code for a proper implementation:
https://github.com/kyx0r/neatvi/blob/fee6b5a1acccc4bec8ed14eff61f5a362b1b8d75/ren.c#L176

To be honest, I am quite skeptical of this change in the first place, as those extra checks have an astounding performance cost. What is your take on the issue?

commented

Are those uc_iscomb(s) || uc_isbell(s) calls even necessary? For now I have only mirrored the change to keep the same functionality, but I don't understand what problem they solve. uc_isbell() may help with badly encoded data, where the width should be 1, which looks right. But what is the uc_iscomb() call for? Is that something that was overlooked?

commented

uc_wid() has three possible return values: 0, 1, and 2. A binary search is performed on dwchars, and a character that is not found there safely gets 1; zwchars are also handled. I don't see a reason for those checks, and they are quite a performance hurdle!
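The three return states can be pictured with a small sketch of a range-table lookup. The names dwchars_demo, zwchars_demo, in_ranges, and wid_demo are made up for this illustration, and the tables hold only a few sample ranges; this is not neatvi's uc_wid() or its actual tables.

```c
#include <stddef.h>

/* Sample double-width ranges, sorted; illustrative only. */
static const int dwchars_demo[][2] = {
	{0x1100, 0x115f},	/* Hangul Jamo initial consonants */
	{0x4e00, 0x9fff},	/* CJK Unified Ideographs */
	{0xac00, 0xd7a3},	/* Hangul syllables */
};

/* Sample zero-width ranges, sorted; illustrative only. */
static const int zwchars_demo[][2] = {
	{0x0300, 0x036f},	/* combining diacritical marks */
	{0x200b, 0x200f},	/* zero width space .. right-to-left mark */
};

/* Binary search for c in a sorted table of inclusive [lo, hi] ranges. */
int in_ranges(const int tab[][2], size_t n, int c)
{
	size_t l = 0, h = n;
	while (l < h) {
		size_t m = l + (h - l) / 2;
		if (c < tab[m][0])
			h = m;
		else if (c > tab[m][1])
			l = m + 1;
		else
			return 1;
	}
	return 0;
}

/* Returns 2 for double-width, 0 for zero-width, 1 for everything else. */
int wid_demo(int c)
{
	if (in_ranges(dwchars_demo, sizeof(dwchars_demo) / sizeof(dwchars_demo[0]), c))
		return 2;
	if (in_ranges(zwchars_demo, sizeof(zwchars_demo) / sizeof(zwchars_demo[0]), c))
		return 0;
	return 1;
}
```

In this sketch the worst case is two binary searches per character, and any code point found in neither table falls through to width 1.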

commented

Thanks for the quick response. Does uc_wid() report the wrong width in some cases? By that I mean, is a character of width 2 sometimes taken as width 1, or vice versa? If so, the lookup table is incomplete or wrong and should be fixed. As for zero-width characters, they can be handled this way:

int wid = uc_wid(s);
return !wid ? 1 : wid;

This would solve the issue of the width being 0, and more importantly it takes only one if statement, which is 100000x faster than those is_comb and is_bell checks. Am I missing some important piece of information?

commented

Now there are only 2 states: uc_wid() can be either 1 or 2. However, 2 should be returned only for double-width characters, and if the function handles double width correctly, that leaves only one possible state for everything else. So 1 will be returned regardless, and it doesn't matter what is_comb and is_bell do.

commented

It certainly is a bottleneck! You are doing, in the worst case, 4 binary search lookups for every character in the line plus something like 15 more if statements, when it could clearly be 2 binary searches and 3 if statements. Even worse, if the character happens to pass the is_comb condition, it runs into sprintf(), which has something like 100+ if statements inside it, because it is a universal function that needs to check for all kinds of string conversions.

I imagine you won't notice any performance difference if you are editing files with ~50 - 80 characters per line, but what happens if there are 10000 characters in a line? I tried this on my version, which has features that require the screen to be redrawn completely all the time, and even on ASCII lines shorter than 80 characters it lags pretty badly. Granted, that is because I have complex syntax highlighting rules that take 90% of the CPU time there, but precisely because that is a problem, there is no reason for other things to be slow. I need all the performance I can get so that it can be spent on more important things!

commented

Do you mean in the implementation of ren_cwid()?

Yes.

I am a perfectionist; by nature I want things to be done the way they should be done. Even if it doesn't seem like a big performance issue right now, trust me, those things add up quickly as time goes on. I have worked on projects that are very bloated, with so much crap inside them that no profiler can ever figure out the bottleneck! That's because there actually isn't one; there are just a million things that programmers took as negligible, and once there were so many of them they ended up costing a lot. It ends up pretty bad, and this is why I think software should get better over time, not worse. That's why I refactored a lot of code in neatvi that I did not have to: it was fine as is, but it wasn't optimal.

commented

I think that placeholder support is actually a great feature in general. But it can definitely be implemented with lower overhead, as it was in the past, before that commit. I understand the motivation of trying to reduce the number of lines of code to make it simpler, but in this case the benefits don't outweigh the downsides. ren_placeholder() shouldn't be trying to get the width in the first place; it's crammed-in functionality that is going to be used on only one occasion. Personally, it made the code hard for me to understand, which is why I opened this issue: it's unclear why is_comb and is_bell have to be invoked. And it will be unclear to anybody else who tries to read the code because, let's be real, nobody is going to run the binary search of a lookup table in their head to figure out whether those code paths affect the execution or not. It turns out they don't affect anything, but they will mislead readers into thinking they are necessary, and they also hurt performance. The Unix philosophy is to do one thing and do it well, and the same applies to functions in code.

If I didn't miss anything important about how the uc_wid() function works, this is my final solution: https://github.com/kyx0r/neatvi/blob/208fa09f4094416d613a42d415b7a035c6ea9d35/ren.c#L149-L178

And as you can see, it's not that much more code; we are talking about a net of ~3 more lines.

commented

Thank you for your time. I think that settles the issue, then. I will open another one soon, though; it's a different question.