streaming-chunked

Working with streams of packed data.

What's this for?

Sometimes we want to work with sequences of bytes or chars without keeping them wholly in memory at any point. Streaming libraries like streaming (of course!), streamly, conduit and pipes help with that.

However, it turns out that yielding individual bytes or chars downstream is not very efficient. It's better to yield whole chunks of packed data, inside which the bytes or chars sit contiguously in memory, with less indirection.

For many functions, we might still want to refer to the individual items. The typical example is length: we usually don't want to count the number of yielded chunks, but the number of bytes or chars! Similarly, we usually want to split a stream at the nth byte or char, not at the nth chunk.
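To make the chunk/item distinction concrete, here is a minimal sketch. It does not use this library's API: a plain list of strict ByteStrings stands in for a stream of chunks, and the names `Chunks`, `lengthBytes` and `splitAtByte` are hypothetical, chosen only for illustration.

```haskell
import qualified Data.ByteString.Char8 as B

-- Assumption: a plain list of strict chunks stands in for a real stream.
type Chunks = [B.ByteString]

-- Count bytes across all chunks, rather than counting the chunks themselves.
lengthBytes :: Chunks -> Int
lengthBytes = sum . map B.length

-- Split at the nth byte, cutting a chunk in two when the split point
-- falls inside it.
splitAtByte :: Int -> Chunks -> (Chunks, Chunks)
splitAtByte n cs | n <= 0 = ([], cs)
splitAtByte _ [] = ([], [])
splitAtByte n (c : cs)
  | n >= B.length c =
      -- The whole chunk belongs to the prefix; keep going with what is left.
      let (before, after) = splitAtByte (n - B.length c) cs
       in (c : before, after)
  | otherwise =
      -- The split point is inside this chunk.
      let (l, r) = B.splitAt n c
       in ([l], r : cs)
```

For the chunks "hel", "lo wo" and "rld", the ordinary list `length` is 3, while `lengthBytes` is 11. Splitting at byte 4 cuts the second chunk, leaving "hel" and "l" on one side and "o wo" and "rld" on the other.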

streaming-chunked builds on streaming and allows working with packed datatypes in a more natural way.

Comparison with conduit

conduit has -E suffixed versions of functions (like takeWhileE) that let you work with streams of packed data. What counts as "packed data" is defined by the IsSequence typeclass.

Instead of a typeclass, this library uses a module signature to define what counts as "packed data".
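As a rough illustration of the idea, a module signature for a chunk type might look something like the sketch below. It assumes GHC's Backpack signature syntax. The file name, the type names and the listed operations are all hypothetical and are not taken from the library's actual signature.

```haskell
-- Hypothetical file: Chunk.hsig (illustrative only)
signature Chunk where

-- Abstract packed type and the type of the items it contains.
-- An implementation could fill these in with, say, strict ByteString
-- and Word8, or strict Text and Char.
data Chunk
data Item

instance Semigroup Chunk
instance Monoid Chunk

-- Item-level operations that the streaming functions would rely on.
null    :: Chunk -> Bool
length  :: Chunk -> Int
splitAt :: Int -> Chunk -> (Chunk, Chunk)
uncons  :: Chunk -> Maybe (Item, Chunk)
```

With a typeclass, the packed type is picked by instance resolution at each use site. With a signature, the client picks it once, when instantiating the package against a concrete implementation module.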

Comparison with streaming-bytestring

streaming-chunked is similar in philosophy to streaming-bytestring with some differences:

It aims to be more general, using a module signature to allow clients to configure the packed representation they want, instead of hardcoding it to bytestring.
Fewer dependencies: it doesn't depend on exceptions or resourcet. The main library doesn't depend on bytestring or text, either.
It doesn't have (at the moment at least) the focus on performance that streaming-bytestring has. In particular, the library internally uses a newtyped Streaming.Stream instead of a more specialized representation. The wrapper exists only for expressivity.

Comparison with monoid-subclasses

monoid-subclasses is a package with, well, subclasses of Monoid.

streaming-chunked doesn't depend on monoid-subclasses, but it takes inspiration from it (for example from classes like FactorialMonoid or LeftReductiveMonoid) to decide which operations should be listed in the signature of the "chunk" datatype.

Random design notes

Don't include a Builder -> Chunk function in the signature, as the text and bytestring builders don't return strict chunks, but lazy ones. Including the lazy versions in the signature would overcomplicate it.
Perhaps reexport most of the ByteString and Text apis from the chunk implementations? That way users wouldn't need to import the original modules in many cases.

danidiaz / streaming-chunked