Working with streams of packed data.
Sometimes we want to work with sequences of bytes or chars without keeping them wholly in memory at any point. Streaming libraries like streaming (of course!), streamly, conduit and pipes help with that.
However, it turns out that yielding individual bytes or chars downstream is not very efficient. Instead, it's better to yield whole chunks of packed data, inside which the bytes or chars sit contiguously in memory, with less indirection.
For many functions, we might still want to refer to the individual items. The typical example is length: we usually don't want to count the number of yielded chunks, but the number of bytes or chars! Similarly, we usually want to split a stream at the nth byte or char, not at the nth chunk.
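To make the distinction concrete, here is a minimal sketch that uses only the plain streaming and bytestring packages (not the streaming-chunked API). The names chunked, chunkCount and byteCount are hypothetical helpers introduced for illustration.

```haskell
import qualified Data.ByteString.Char8 as B
import Streaming (Of, Stream)
import qualified Streaming.Prelude as S

-- A stream that yields three chunks of packed data.
chunked :: Monad m => Stream (Of B.ByteString) m ()
chunked = S.each (map B.pack ["hello ", "streaming ", "world"])

-- The stock length function counts yielded chunks.
chunkCount :: Monad m => Stream (Of B.ByteString) m () -> m Int
chunkCount = S.length_

-- What we usually want instead: the total number of bytes,
-- obtained here by summing the length of every chunk.
byteCount :: Monad m => Stream (Of B.ByteString) m () -> m Int
byteCount = S.sum_ . S.map B.length

main :: IO ()
main = do
  chunkCount chunked >>= print -- 3 chunks
  byteCount chunked >>= print  -- 21 bytes
```

Writing this kind of adapter by hand for every operation (length, splitting, taking a prefix) is the repetition that a chunk-aware layer is meant to remove.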
streaming-chunked builds on streaming and allows working with packed datatypes in a more natural way.
conduit has -E suffixed versions of functions (like takeWhileE) that let you work with streams of packed data. What counts as "packed data" is defined by the IsSequence typeclass.
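For comparison, this is roughly how conduit's element-wise variant behaves. The sketch assumes the conduit and text packages; the name prefix is a hypothetical binding used only for illustration.

```haskell
import Conduit (runConduitPure, sinkList, takeWhileE, yieldMany, (.|))
import Data.Text (Text)
import qualified Data.Text as T

-- The stream yields two Text chunks. takeWhileE tests the predicate
-- against individual Chars, not against whole chunks, so it can stop
-- in the middle of a chunk.
prefix :: [Text]
prefix =
  runConduitPure $
    yieldMany (map T.pack ["abc", "def"])
      .| takeWhileE (/= 'e')
      .| sinkList

main :: IO ()
main = print prefix -- ["abc","d"]
```

The second chunk is cut at the first char that fails the predicate, which is the per-element behaviour a chunk-aware API is expected to provide.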
Instead of a typeclass, this library uses a module signature to define what counts as "packed data".
streaming-chunked is similar in philosophy to streaming-bytestring with some differences:
- It aims to be more general, using a module signature to let clients configure the packed representation they want, instead of hardcoding it to bytestring.
- Fewer dependencies: it doesn't depend on exceptions or resourcet. The main library doesn't depend on bytestring or text, either.
- It doesn't have (at the moment, at least) the focus on performance that streaming-bytestring has. In particular, the internals of the library use a newtyped Streaming.Stream instead of a more specialized representation. We only wrap it for expressivity.
monoid-subclasses is a package with, well, subclasses of Monoid. streaming-chunked doesn't depend on monoid-subclasses, but it takes inspiration from it (for example, from classes like FactorialMonoid or LeftReductiveMonoid) to decide which operations should be listed in the signature of the "chunk" datatype.
- Don't include a Builder -> Chunk function in the signature, as the text and bytestring builders don't return strict chunks, but lazy ones. And including the lazy versions in the signature would overcomplicate it.
- Perhaps reexport most of the ByteString and Text APIs from the chunk implementations? That way users wouldn't need to import the original modules in many cases.