rtf reader: unichr() causes ValueError

Question

joka opened this issue 14 years ago

I have an RTF file with strange Unicode strings (I have sent it to you by email).

This causes the RTF reader to raise a ValueError:

* Module pyth.plugins.rtf15.reader, line 93, in read
* Module pyth.plugins.rtf15.reader, line 113, in go
* Module pyth.plugins.rtf15.reader, line 141, in parse
* Module pyth.plugins.rtf15.reader, line 369, in handle
* Module pyth.plugins.rtf15.reader, line 476, in handle_u
ValueError: unichr() arg not in range(0x10000) (narrow Python build)

The reason is that my Python was built without support for "wide" Unicode characters (http://www.python.org/dev/peps/pep-0261/). Still, some exception handling would be nice.

Brendon Hogger · Answer 1 · Tue Aug 17 2010 22:49:32 GMT+0800 (China Standard Time)

This is a legitimate bug but I'm not sure what the correct fix is. It's possible to construct a surrogate pair to represent the character, with something like:

struct.pack('<L', 0x10000).decode('utf-32-le')

This will "work", but it may cause other bugs down the line, e.g. with slicing, because on a narrow build the character then occupies two positions in the string. So maybe we shouldn't do it.
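
For reference, the surrogate pair that the decode trick produces can also be computed by hand. The following is only a sketch of the standard UTF-16 arithmetic, not code from pyth; the function name to_surrogate_pair is made up for illustration.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogate values.

    Hypothetical helper for illustration; not part of pyth.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary range")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


# The smallest supplementary code point maps to the first surrogate of each range.
print([hex(v) for v in to_surrogate_pair(0x10000)])  # ['0xd800', '0xdc00']
```

The two values are what a narrow build stores as two separate string positions, which is why slicing can cut a character in half.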

Other alternatives include replacing it with '?', or just raising a different exception type. I haven't decided.

Joscha Krutzki · Answer 2 · Wed Aug 18 2010 16:21:28 GMT+0800 (China Standard Time)

I like the solution of replacing it with '?' and logging a message.
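
A minimal sketch of what that could look like, assuming a wrapper around unichr(). The names safe_unichr and the logger name are made up for illustration, and this is not necessarily how pyth ended up handling it. The try/except at the top only exists so the sketch also runs on Python 3, where unichr() is called chr().

```python
import logging

logger = logging.getLogger("rtf_reader_sketch")  # hypothetical logger name

try:
    _to_char = unichr  # Python 2
except NameError:
    _to_char = chr     # Python 3, where chr() covers all of Unicode


def safe_unichr(code_point, replacement=u"?"):
    """Return the character for code_point, or replacement if it cannot be built.

    Hypothetical helper for illustration; not part of pyth.
    """
    try:
        return _to_char(code_point)
    except (ValueError, OverflowError):
        # On a narrow Python 2 build this triggers for anything above 0xFFFF.
        logger.warning("cannot represent code point %r, substituting %r",
                       code_point, replacement)
        return replacement
```

With this, a call such as safe_unichr(0x41) returns the character as before, and an out-of-range value produces '?' plus a warning in the log instead of a ValueError.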

Brendon Hogger · Answer 3 · Thu Aug 19 2010 05:32:47 GMT+0800 (China Standard Time)

I've tried the struct trick above. It means that plugins should never trust len() of unicode strings, or slice them. But that's probably true anyway.
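
To illustrate the len() problem: a string that holds surrogate pairs has more positions than characters. The sketch below counts characters by skipping low surrogates. The name count_code_points is made up for illustration, and the sample string uses explicit surrogate escapes to stand in for what a narrow build would hold. It assumes the surrogates are well-formed pairs.

```python
def count_code_points(text):
    """Count characters, treating each high+low surrogate pair as one.

    Hypothetical helper for illustration; assumes surrogates are properly paired.
    """
    count = 0
    for ch in text:
        # A low surrogate is the second half of a pair, so it is not counted.
        if not 0xDC00 <= ord(ch) <= 0xDFFF:
            count += 1
    return count


# 'a', then U+1F600 written as a surrogate pair, then 'b'.
sample = u"a\ud83d\ude00b"
print(len(sample))                # 4 positions
print(count_code_points(sample))  # 3 characters
# sample[:2] would end with the high surrogate alone, i.e. half a character.
```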