aligrudi / neatvi

A small vi/ex editor for editing bidirectional UTF-8 text

Home Page:http://litcave.rudi.ir/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Segmentation fault or undefined behavior is possible.

kyx0r opened this issue · comments

commented

Hello @aligrudi I have found a way to break vi by looking at files with all kinds of weird encodings and characters, and basically somehow it is possible for the function uc_code(s); in many places to read past the malloc buffer. The good news is though that it's only a off by 1 error from what I saw on valgrind. Valgrind did not find any bad writes, so it may never crash, but the screen surely does show random artifacts. Overall I don't think it's that worth the hassle ( I cound not solve this myself) but maybe you will have a quick way to place a check somewhere? Generally I think uc_chop may be the root cause but I digress.

==31038== Invalid read of size 1
==31038==    at 0x11BF4C: uc_code (in /usr/bin/vi)
==31038==    by 0x11C5C2: uc_shape (in /usr/bin/vi)
==31038==    by 0x11AD30: led_render (in /usr/bin/vi)
==31038==    by 0x11B7F6: led_print (in /usr/bin/vi)
==31038==    by 0x112A78: vi_drawrow (in /usr/bin/vi)
==31038==    by 0x112BF8: vi_drawagain (in /usr/bin/vi)
==31038==    by 0x110546: main (in /usr/bin/vi)
==31038==  Address 0x49b1268 is 0 bytes after a block of size 104 alloc'd
==31038==    at 0x48D080F: malloc (vg_replace_malloc.c:307)
==31038==    by 0x11772F: lbuf_replace (in /usr/bin/vi)
==31038==    by 0x117DFF: lbuf_rd (in /usr/bin/vi)
==31038==    by 0x116CB6: ec_edit (in /usr/bin/vi)
==31038==    by 0x117382: ex_init (in /usr/bin/vi)
==31038==    by 0x110403: main (in /usr/bin/vi)
==31038==
==31038== Invalid read of size 1
==31038==    at 0x11BF4C: uc_code (in /usr/bin/vi)
==31038==    by 0x11C5C2: uc_shape (in /usr/bin/vi)
==31038==    by 0x11AD4A: led_render (in /usr/bin/vi)
==31038==    by 0x11B7F6: led_print (in /usr/bin/vi)
==31038==    by 0x112A78: vi_drawrow (in /usr/bin/vi)
==31038==    by 0x112BF8: vi_drawagain (in /usr/bin/vi)
==31038==    by 0x110546: main (in /usr/bin/vi)
==31038==  Address 0x49b1268 is 0 bytes after a block of size 104 alloc'd
==31038==    at 0x48D080F: malloc (vg_replace_malloc.c:307)
==31038==    by 0x11772F: lbuf_replace (in /usr/bin/vi)
==31038==    by 0x117DFF: lbuf_rd (in /usr/bin/vi)
==31038==    by 0x116CB6: ec_edit (in /usr/bin/vi)

This kind of stuff but not only for uc_shape.

vi_test.zip

Here is the offending file to reproduce the bug. I had to zip because github is dum dum.
Actually, this file is from mpv build source, it has some python at the top but the offending
part is at the bottom of the file, which I believe is a binary blob, but still I don't think vi should
just give up on that.

commented

I does say something about latin-1 encoding,
So I think this is because of 128 bit codepoint???
https://en.wikipedia.org/wiki/Latin-1_Supplement_%28Unicode_block%29#Number_of_symbols,_letters_and_control_codes

commented

Thank you. I will try running this patch in my version. I also thought about adding null checks, but I did not try it because for whatever reason I considered that this wasn't uninitialized memory read but a past the buffer memory read? Generally this would not work against those types of bugs, but now I see that it was not right in my assumption.

I don't think the performance would suffer too badly from this, especially if the code is written like this:

/* the unicode codepoint of the given utf-8 character */
int uc_code(char *s)
{
        int c = (unsigned char) s[0];
        int cnot = ~c;
        if (cnot & 0xc0)        /* ASCII or invalid */
                return c;
        if (cnot & 0x20)
                return ((c & 0x1f) << 6) | (s[1] & 0x3f);
        if (s[1])
        {
                if (cnot & 0x10)
                        return ((c & 0x0f) << 12) | ((s[1] & 0x3f) << 6) | (s[2] & 0x3f);
                if (cnot & 0x08 && s[2])
                        return ((c & 0x07) << 18) | ((s[1] & 0x3f) << 12) |
                                ((s[2] & 0x3f) << 6) | (s[3] & 0x3f);
        }
        return 0;
}
commented

@aligrudi Thanks for the help again. As you can see I like to take a more streamlined approach to writing code, assuming 0 optimizations from the compiler (while imagining the assembly code in my head), so that the compiler can focus on doing the "hard" optimizations that actually matter. I would probably merge this patch for safe than sorry reasons, but I leave this one completely up to you. Closing this issue.