NiLuJe / FBInk

FrameBuffer eInker, a small tool & library to print text & images to an eInk Linux framebuffer

Home Page: https://www.mobileread.com/forums/showthread.php?t=299110

ICU for Unicode handling?

shermp opened this issue · comments

Hi @NiLuJe

I'm mulling over the idea of adding basic freetype2 support, and was having a look at the FBInk codebase to figure out how to go about it, when I noticed your rant on Kobo's broken libc with regard to Unicode support.

I notice that the Kobo firmware appears to include the ICU library (libicu*.so, vers. 4.6). Have you looked into using this library for dealing with strings in FBInk?

The API documentation for ICU 4.6.1 is here

I ended up skirting the issue with libu8, and, provided no-one tries to feed us hopelessly broken encoding, that does the job just fine without having to massively rework how strings are handled ;).

(ICU is a very very large hammer to take care of the Unicode issue, and the fact that wchar_t is just hopelessly broken on Kobo probably doesn't help. Plus, the fact that some of our target devices either don't ship it, or ship wildly different versions is another thing against it, because bundling it is not an option: besides the fact that it's C++, and takes forever to build, libicudata is over 25MB in ICU 60.2 ;)).

Ah, fair enough. Carry on...

/me keeps forgetting kindles exist 😈

I read a blog post a while back, where the author advocated using UTF-8 internally, and therefore sticking with the standard char * data type. The author argued that many of the most common string operations only care about bytes, and not characters. Also, UTF-8 is a sequence of bytes, so endianness doesn't matter. I found it a rather fascinating read.
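To illustrate the bytes-versus-characters point, here is a minimal sketch in C. It is not taken from FBInk or from the blog post; `utf8_count_codepoints` is a hypothetical helper, and it assumes its input is already valid UTF-8. It counts code points by skipping continuation bytes (those of the form 10xxxxxx), and since it only ever looks at single bytes, byte order never comes into it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the code points in a NUL-terminated UTF-8 string.
 * Assumes the input is valid UTF-8. Every byte that is NOT a continuation
 * byte (10xxxxxx) starts a new code point. */
static size_t utf8_count_codepoints(const char *s)
{
    assert(s != NULL);

    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *) s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For a string such as "héllo" encoded as UTF-8, strlen() reports 6 bytes while this helper reports 5 code points. Copying, concatenating, or searching for a substring still works on the plain bytes.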

That's essentially what I ended up going with ;).

I think I may have read that very same article (if it mentioned doing sanitization/conversions at I/O boundaries, that's the one). But with the hobbled libc, I can't really do the sanitization/conversion bit, since any libc-based locale/multibyte/widechar stuff is basically borked ;).
So I'm just skipping that, and hoping really hard no-one will feed us KOI8-R or something xD.
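For what it's worth, a validity check at the input boundary does not strictly need the libc locale machinery. The following is a rough sketch under that assumption, not something FBInk does; `utf8_is_valid` is a hypothetical name. It rejects stray or invalid lead bytes, truncated sequences, overlong encodings, surrogates, and values above U+10FFFF, using nothing but byte arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if buf[0..len) is well-formed UTF-8.
 * Uses no locale, multibyte, or wchar_t functions from libc. */
static bool utf8_is_valid(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t need;        /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed at this length */

        if (c < 0x80) {
            i++;            /* plain ASCII */
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;   /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= need) {
            return false;   /* sequence is cut off by the end of the buffer */
        }
        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) {
                return false;   /* expected a continuation byte */
            }
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;   /* overlong, out of range, or a surrogate */
        }
        i += need + 1;
    }
    return true;
}
```

Text in a legacy 8-bit encoding such as KOI8-R will usually fail this check, because its high bytes rarely line up as valid lead/continuation sequences. It only detects the problem, though; converting such input would still need a separate table or library.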

It probably was the same article :p

I've been looking into this area a bit lately, because I'm trying to see if I can add differential support to my VHD library, and filepath strings there are encoded as UTF-16BE.
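As a rough idea of what handling such strings involves, here is a sketch of a UTF-16BE to UTF-8 conversion in plain C. It is purely illustrative and not part of any VHD library; the name `utf16be_to_utf8` and its error convention (returning `(size_t) -1`) are assumptions made for this example. Unlike UTF-8, the input has to be read two bytes at a time in a fixed byte order, and surrogate pairs have to be combined by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert in_len bytes of UTF-16BE into UTF-8.
 * Returns the number of bytes written to out (no NUL terminator is added),
 * or (size_t) -1 on malformed input or if out_cap is too small. */
static size_t utf16be_to_utf8(const uint8_t *in, size_t in_len,
                              uint8_t *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);

    if (in_len % 2 != 0) {
        return (size_t) -1;             /* UTF-16 comes in 2-byte units */
    }

    size_t o = 0;
    for (size_t i = 0; i < in_len; i += 2) {
        /* Big-endian: the most significant byte comes first. */
        uint32_t cp = ((uint32_t) in[i] << 8) | in[i + 1];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            if (in_len - i < 4) {
                return (size_t) -1;
            }
            uint32_t lo = ((uint32_t) in[i + 2] << 8) | in[i + 3];
            if (lo < 0xDC00 || lo > 0xDFFF) {
                return (size_t) -1;
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i += 2;                     /* consumed the second unit too */
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t) -1;         /* lone low surrogate */
        }

        size_t n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out_cap - o < n) {
            return (size_t) -1;         /* output buffer too small */
        }
        if (n == 1) {
            out[o++] = (uint8_t) cp;
        } else if (n == 2) {
            out[o++] = (uint8_t) (0xC0 | (cp >> 6));
            out[o++] = (uint8_t) (0x80 | (cp & 0x3F));
        } else if (n == 3) {
            out[o++] = (uint8_t) (0xE0 | (cp >> 12));
            out[o++] = (uint8_t) (0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (uint8_t) (0x80 | (cp & 0x3F));
        } else {
            out[o++] = (uint8_t) (0xF0 | (cp >> 18));
            out[o++] = (uint8_t) (0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (uint8_t) (0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (uint8_t) (0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

Converting once when the path is read, and keeping it as UTF-8 in a char * from then on, matches the "convert at the I/O boundary" approach mentioned earlier in the thread.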

Incidentally, do you know of any good cross-platform C file path library?

Not really, the only thing that comes to mind is C++ (namely, boost) :/.

And I really don't want to say glib on the C side of things, because glib's weird, and I'm not even sure it'd do what you need ;).

You might also find something interesting either in stb or some other small libs like that ;).

Thanks for the suggestions. I didn't see anything that really struck me as being suitable for my requirements (simple though they may be; path joining and normalization).

I had another look at that STB link, and noticed I had missed the stb.h file the first time around.

Oh my... that looks just about perfect :)