No safe `decodeASCII :: ByteString -> Maybe Text`
raehik opened this issue
I would like to efficiently parse a bytestring as an ASCII character string, rejecting any non-ASCII input (including multi-byte UTF-8). text-2.0 improved `Data.Text.Encoding.decodeASCII`, giving it its own definition rather than piggybacking off `decodeUtf8`; but it's partial, and has worse error handling. It's not easy to implement this efficiently as a local user, because text relies on a bundled C function for checking that a buffer is valid ASCII. It feels like a useful function for low-level conversions (I certainly have a use in binrep).
The bytestring decoding in text feels clunky overall. I have to copy an unexposed snippet that catches thrown exceptions to convert them to `Either UnicodeException Text`. Could the interface here be improved? I would gladly take part in implementing any changes.
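For readers unfamiliar with the pattern mentioned above, here is a minimal sketch of turning a throwing decoder into a total one. The names `tryDecode` and `decodeUtf8Either` are hypothetical and chosen for illustration; this is not necessarily the exact snippet used inside text. It only catches `UnicodeException`, so a decoder that fails with a different exception type would need a different handler.

```haskell
import Control.Exception (evaluate, try)
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (UnicodeException, strictDecode)
import System.IO.Unsafe (unsafeDupablePerformIO)

-- Hypothetical helper: run a partial decoder and capture a thrown
-- UnicodeException as a Left. Strict Text is fully decoded once it is
-- evaluated to WHNF, so `evaluate` is enough to surface the exception.
tryDecode :: (B.ByteString -> T.Text)
          -> B.ByteString
          -> Either UnicodeException T.Text
tryDecode decode = unsafeDupablePerformIO . try . evaluate . decode

-- Example: a total UTF-8 decoder built from the throwing one.
decodeUtf8Either :: B.ByteString -> Either UnicodeException T.Text
decodeUtf8Either = tryDecode (decodeUtf8With strictDecode)
```

Having to reach for `unsafeDupablePerformIO` in user code just to get an `Either` is the clunkiness being described.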
Somewhat related, I feel like an efficient `isAscii :: Text -> Bool` could be exported with the new UTF-8 internal representation. text-short has it at `Data.Text.Short.Internal.isAscii :: ShortText -> Bool`. `Text.all Char.isAscii` is fine, but I'm conscious that it does a lot more work than it needs to.
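To make the comparison concrete, here is a small sketch of the per-character check next to a byte-level one. The names `isAsciiByChar` and `isAsciiByBytes` are hypothetical. The byte-level version goes through `encodeUtf8`, which copies the buffer, so it only illustrates the idea that ASCII text contains no byte at or above 0x80 in UTF-8; it is not the efficient implementation being requested.

```haskell
import Data.Char (isAscii)
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)

-- Baseline from the discussion: decode every Char and test it.
isAsciiByChar :: T.Text -> Bool
isAsciiByChar = T.all isAscii

-- Hypothetical byte-level variant: in UTF-8, every non-ASCII character
-- is encoded using only bytes >= 0x80, so scanning raw bytes suffices.
-- Illustrative only: encodeUtf8 allocates a copy of the data.
isAsciiByBytes :: T.Text -> Bool
isAsciiByBytes = B.all (< 0x80) . encodeUtf8
```

Both functions agree on every input; the second simply avoids decoding code points one at a time.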
Yes, there really should be total versions of all the decoding functions.
Nice. This is on my radar, particularly `isAscii :: Text -> Bool`.
Fast `isAscii` is tracked at #497.
Safe `decodeASCII' :: ByteString -> Either Int Text` is tracked at #499.
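As a rough illustration of what a total ASCII decoder with this shape could look like in user code, here is a naive sketch. The names `decodeAsciiEither` and `decodeAsciiMaybe` are hypothetical, and reading the `Int` as the index of the first non-ASCII byte is an assumption, since the thread only gives the type. It uses a plain byte scan rather than the bundled C routine, so it makes no claim to match the performance or exact behaviour of whatever was merged.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1)

-- Hypothetical total decoder. Assumption: Left carries the index of
-- the first byte outside the ASCII range (>= 0x80).
decodeAsciiEither :: B.ByteString -> Either Int T.Text
decodeAsciiEither bs =
  case B.findIndex (>= 0x80) bs of
    Just i  -> Left i
    -- decodeLatin1 is total, and ASCII is a subset of Latin-1,
    -- so this cannot throw once the scan has succeeded.
    Nothing -> Right (decodeLatin1 bs)

-- The Maybe-returning shape from the issue title, derived from the above.
decodeAsciiMaybe :: B.ByteString -> Maybe T.Text
decodeAsciiMaybe = either (const Nothing) Just . decodeAsciiEither
```

The `Either Int` form keeps the failure position for error reporting, while the `Maybe` form discards it.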
Both functions discussed here have been given efficient implementations and merged. On a larger scale, the decoding received an overhaul in #448. Thanks to the maintainers and everyone else who helped me for the feedback and speedy turnaround!