haskell / text

Haskell library for space- and time-efficient operations over Unicode text.

Home Page: http://hackage.haskell.org/package/text

Feature request: Stream decoding with "stop" as the error behavior

chris-martin opened this issue

I am considering a situation where I have a ByteString stream that may be valid UTF-8 only up to some unknown point. I'd like to do a streaming decode that produces as much Text as possible while the input is valid and stops at the first invalid byte, yielding both the decoded Text and the non-UTF-8 ByteString remainder.

I envision something like this:

streamDecodeUtf8' :: ByteString -> Decoding'

data Decoding' = Some'
    Text -- ^ What was decoded
    ByteString -- ^ Remainder that was not decoded
    (Maybe UnicodeException)
        -- ^ 'Just' an exception if the remainder is non-empty
        -- because it begins with invalid input.
        -- 'Nothing' if the remainder is empty or is non-empty
        -- but could become valid with more input.

This is being worked on in #448.

The API there is more complicated, because (1) returning a Text forces you to do a copy and (2) returning the remainder as a ByteString forces you to append it to the next chunk to resume. But I think it's still possible to make it look closer to what you are proposing while leaving the user in control of how the copying to Text is done.

> returning the remainder as a ByteString forces you to append to the next chunk to resume

Yes, the existing stream API gives you, in addition to the ByteString remainder, a function that lets you continue without having to concatenate, and there's no reason I should have proposed changing that aspect. A better attempt would be:

streamDecodeUtf8Strict :: ByteString -> StrictDecoding

data StrictDecoding = StrictDecoding
    Text -- ^ What was decoded
    ByteString -- ^ Remainder that was not decoded
    (Either UnicodeException (ByteString -> StrictDecoding))
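To make the intended semantics concrete, here is a minimal sketch of how such a function could behave. It is not the API from #448 and not part of text; the names End, shape, scan and decodeChunks are hypothetical helpers invented for illustration. It validates UTF-8 by hand, decodes the valid prefix, and either stops with an exception or hands back a continuation. Note that this naive version resumes by appending the (at most three) leftover bytes to the next chunk, which is exactly the copy discussed above, so treat it as a specification of behavior rather than an efficient implementation.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (UnicodeException (..))
import Data.Word (Word8)

-- | The type proposed in this issue (hypothetical, not exported by text).
data StrictDecoding = StrictDecoding
    Text       -- what was decoded
    ByteString -- remainder that was not decoded
    (Either UnicodeException (ByteString -> StrictDecoding))

-- | How the scan of one chunk ended (hypothetical helper type).
data End = Clean | Truncated | Bad

-- | For a lead byte: sequence length and allowed range of the second byte.
shape :: Word8 -> Maybe (Int, Word8, Word8)
shape b
    | b <= 0x7F = Just (1, 0x80, 0xBF)
    | b >= 0xC2 && b <= 0xDF = Just (2, 0x80, 0xBF)
    | b == 0xE0 = Just (3, 0xA0, 0xBF) -- rejects overlong 3-byte forms
    | b == 0xED = Just (3, 0x80, 0x9F) -- rejects surrogates
    | b >= 0xE1 && b <= 0xEF = Just (3, 0x80, 0xBF)
    | b == 0xF0 = Just (4, 0x90, 0xBF) -- rejects overlong 4-byte forms
    | b >= 0xF1 && b <= 0xF3 = Just (4, 0x80, 0xBF)
    | b == 0xF4 = Just (4, 0x80, 0x8F) -- rejects code points above U+10FFFF
    | otherwise = Nothing

-- | Length of the longest valid prefix, and why the scan stopped there.
scan :: ByteString -> (Int, End)
scan bs = go 0
  where
    n = B.length bs
    go i
        | i >= n = (i, Clean)
        | otherwise = case shape (B.index bs i) of
            Nothing -> (i, Bad)
            Just (len, lo, hi) -> conts i len lo hi 1
    conts i len lo hi k
        | k == len = go (i + len)
        | i + k >= n = (i, Truncated) -- could become valid with more input
        | b < l || b > h = (i, Bad)
        | otherwise = conts i len lo hi (k + 1)
      where
        b = B.index bs (i + k)
        (l, h) = if k == 1 then (lo, hi) else (0x80, 0xBF)

streamDecodeUtf8Strict :: ByteString -> StrictDecoding
streamDecodeUtf8Strict bs = StrictDecoding (TE.decodeUtf8 good) rest next
  where
    (i, end) = scan bs
    (good, rest) = B.splitAt i bs -- 'good' has been validated above
    next = case end of
        Bad -> Left (DecodeError "streamDecodeUtf8Strict: invalid UTF-8"
                                 (fst <$> B.uncons rest))
        -- Naive resumption: prepends the leftover bytes to the next chunk.
        _ -> Right (\more -> streamDecodeUtf8Strict (rest <> more))

-- | Feed chunks in order, stopping at the first invalid byte.
decodeChunks :: [ByteString] -> (Text, ByteString, Maybe UnicodeException)
decodeChunks = go T.empty B.empty streamDecodeUtf8Strict
  where
    go acc rest _ [] = (acc, rest, Nothing)
    go acc _ f (c : cs) = case f c of
        StrictDecoding t rest (Left e) -> (acc <> t, rest, Just e)
        StrictDecoding t rest (Right k) -> go (acc <> t) rest k cs
```

Loaded in GHCi, decodeChunks applied to the chunks [0x68, 0xC3] and [0xA9, 0x21, 0xFF, 0x41] should decode the text h, é, ! (the é being split across the chunk boundary), then stop at 0xFF with a Just exception and return the two bytes 0xFF 0x41 as the remainder. A chunk that ends mid-character instead yields Nothing and a non-empty remainder, matching the "could become valid with more input" case in the first proposal.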