haskell / text

Haskell library for space- and time-efficient operations over Unicode text.

Home Page: http://hackage.haskell.org/package/text

No safe `decodeASCII :: ByteString -> Maybe Text`

raehik opened this issue

I would like to efficiently parse a bytestring as an ASCII character string, rejecting any non-ASCII bytes (so multi-byte UTF-8 sequences are disallowed). text-2.0 improved Data.Text.Encoding.decodeASCII, giving it its own definition rather than piggybacking off decodeUtf8, but it's partial and has worse error handling. It's not easy to implement this efficiently in user code, because text checks that a buffer is valid ASCII with a bundled C function. It feels like a useful function for low-level conversions (I certainly have a use for it in binrep).
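For illustration, a total version can be sketched with only the public bytestring and text APIs. This is a minimal sketch, not the library's implementation: `decodeAsciiMaybe` is a hypothetical name, and it scans with `Data.ByteString.all` instead of the bundled C routine, so it is likely slower than what text could do internally. It decodes with `decodeLatin1` because Latin-1 and ASCII agree on bytes below 0x80 and that function is total.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.Text (Text)
import Data.Text.Encoding (decodeLatin1)

-- Hypothetical helper, not part of the text API.
-- Returns Nothing if any byte is outside the ASCII range (>= 0x80).
decodeAsciiMaybe :: ByteString -> Maybe Text
decodeAsciiMaybe bs
  | B.all (< 0x80) bs = Just (decodeLatin1 bs) -- safe: every byte is ASCII
  | otherwise         = Nothing
```

The input is traversed twice (once to validate, once to decode), which is the overhead a built-in function could avoid.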

The bytestring decoding in text feels clunky overall. I have to copy an unexposed snippet that catches thrown exceptions and converts them to Either UnicodeException Text. Could the interface here be improved? I would gladly take part in implementing any changes.
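The kind of wrapper being described looks roughly like the following. This is a sketch under assumptions: `decodeUtf16LE'` is a hypothetical name (text does not export it), and the pattern relies on strict `Text` being fully decoded once evaluated to weak head normal form, so that `evaluate` surfaces any `UnicodeException` thrown by `strictDecode`.

```haskell
import Control.Exception (evaluate, try)
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.Text (Text)
import Data.Text.Encoding (decodeUtf16LEWith)
import Data.Text.Encoding.Error (UnicodeException, strictDecode)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical total wrapper around a partial decoder.
-- strictDecode throws a UnicodeException on invalid input; forcing the
-- result inside `try` turns that exception into a Left.
decodeUtf16LE' :: ByteString -> Either UnicodeException Text
decodeUtf16LE' =
  unsafePerformIO . try . evaluate . decodeUtf16LEWith strictDecode
```

Having to reach for `unsafePerformIO` just to get an `Either` is the clunkiness in question; a total decoder would report the error directly.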

Somewhat related, I feel like an efficient isAscii :: Text -> Bool could be exported with the new UTF-8 internal representation. text-short has it at Data.Text.Short.Internal.isAscii :: ShortText -> Bool. Text.all Char.isAscii is fine, but I'm conscious that it does a lot more work than it needs to.
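To make the comparison concrete, here is a small sketch of the character-level baseline next to a byte-level check. Both names are hypothetical. The byte-level version relies on the fact that a UTF-8 string is pure ASCII exactly when every byte is below 0x80; it goes through `encodeUtf8`, which copies the buffer, so it only illustrates the idea and is not meant as a faster alternative.

```haskell
import qualified Data.ByteString as B
import qualified Data.Char as Char
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)

-- Baseline from the discussion: decode each Char, then test it.
isAsciiByChar :: Text -> Bool
isAsciiByChar = T.all Char.isAscii

-- Byte-level check: no Char decoding, but encodeUtf8 copies the data.
isAsciiByByte :: Text -> Bool
isAsciiByByte = B.all (< 0x80) . encodeUtf8
```

A version inside text could run the same byte-level scan directly over the internal UTF-8 array without the copy.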

Yes, there really should be total versions of all the decoding functions.

Nice. This is on my radar, particularly isAscii :: Text -> Bool.

Fast isAscii is tracked at #497.

Safe decodeASCII' :: ByteString -> Either Int Text is tracked at #499.
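One way to read the proposed `Either Int Text` shape is sketched below. The meaning of the `Int` is an assumption here (the offset of the first non-ASCII byte), `decodeAsciiEither` is a hypothetical name, and the code uses only public bytestring and text functions instead of whatever the tracked implementation does.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.Text (Text)
import Data.Text.Encoding (decodeLatin1)

-- Hypothetical helper. Left carries the offset of the first byte >= 0x80
-- (an assumed interpretation of the Int in the proposed signature).
decodeAsciiEither :: ByteString -> Either Int Text
decodeAsciiEither bs =
  case B.findIndex (>= 0x80) bs of
    Just i  -> Left i                   -- first non-ASCII byte
    Nothing -> Right (decodeLatin1 bs)  -- all ASCII, so Latin-1 decoding is exact
```

Returning the offset lets a caller report where the input stopped being ASCII, or decode the valid prefix with `B.take`.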

Both functions discussed here have been given efficient implementations and merged. On a larger scale, decoding received an overhaul in #448. Thanks to the maintainers and everyone else who helped for the feedback and speedy turnaround!