xguerin / bitstring

OCaml Bitstring - bitstring matching for OCaml

Matching against Bigarray/Bigstring/Lwt_bytes.t/etc.

thisismiller opened this issue:

IO frameworks in OCaml outside of stdlib seem to be increasingly offering read/write functions that return or accept a Bigarray.Array1.t (wrapped as Lwt_bytes.t, Core.Bigstring.t, etc.). There doesn't seem to be a zero-copy way of converting that Bigarray into bytes so that one can supply it to bitstring for matching.

Have I missed something obvious, or is there some unsafe_bytes_of_bigarray somewhere which I haven't found? Or is there some suggested way forward? Not being overly familiar with OCaml yet, is there a sensible refactoring of bitstring to allow it to operate on either type, or would I be best forking and beginning a s/bytes/Bigstring.t/ rewrite?

It does not look like there is any generic way to get bytes from a Bigarray.Array1.t without copying. However, both Lwt_bytes.t and Core.Bigstring.t offer a to_string function, which looks like the easiest way to interface with bitstring for now.
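To make the cost of that route concrete, here is a minimal sketch of what a copying conversion amounts to, using only the standard Bigarray module. The `bigstring` alias and the `bytes_of_bigstring` helper are hypothetical names for illustration; they are not part of bitstring, Lwt, or Core.

```ocaml
(* Hypothetical alias for the char Bigarray that Lwt_bytes.t and
   Core.Bigstring.t wrap. *)
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Hypothetical helper: copy every byte of the Bigarray into fresh bytes.
   Bytes.init allocates a new buffer, so this is an O(n) copy per call. *)
let bytes_of_bigstring (src : bigstring) : bytes =
  let len = Bigarray.Array1.dim src in
  Bytes.init len (fun i -> Bigarray.Array1.get src i)

let () =
  let page = Bigarray.Array1.create Bigarray.char Bigarray.c_layout 4 in
  String.iteri (fun i c -> page.{i} <- c) "\xCA\xFE\x00\x01";
  let copy = bytes_of_bigstring page in
  assert (Bytes.length copy = 4);
  assert (Bytes.get copy 0 = '\xCA')
```

The resulting bytes are independent of the Bigarray, so later writes to the Bigarray are not visible through the copy.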

Converting to string and then to bytes (the internal bitstring representation) adds an extra copy compared to bitstring_of_string. It should be trivial to expose a bitstring_of_bytes function whose cost, once the to_bytes conversion from the bigarray is counted, would be comparable.

As for using bigarray as the backend type, I think using bytes makes the library generally more compatible. That being said, it should be possible to turn the library into a functor so that the backend can be specialized to either bigarray or bytes, though such a change would require a major rewrite of the library.
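A rough sketch of the functor idea, under the assumption that the backend only needs to expose a length and a byte accessor. The `BACKEND` signature, the `Make` functor, and the two reader functions are hypothetical and much smaller than what bitstring would actually need; they only illustrate how the same reading code could be instantiated over bytes or a Bigarray without copying.

```ocaml
(* Hypothetical signature: the minimum a byte store must provide. *)
module type BACKEND = sig
  type t
  val length : t -> int
  val get : t -> int -> char
end

(* Field readers written once against the abstract backend. *)
module Make (B : BACKEND) = struct
  let uint8 buf off = Char.code (B.get buf off)

  (* Read a big-endian 16-bit unsigned integer starting at byte [off]. *)
  let uint16_be buf off =
    if off < 0 || off + 2 > B.length buf then invalid_arg "uint16_be";
    (uint8 buf off lsl 8) lor uint8 buf (off + 1)
end

module Bytes_backend = struct
  type t = bytes
  let length = Bytes.length
  let get = Bytes.get
end

module Bigstring_backend = struct
  type t =
    (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t
  let length = Bigarray.Array1.dim
  let get = Bigarray.Array1.get
end

module On_bytes = Make (Bytes_backend)
module On_bigstring = Make (Bigstring_backend)

let () =
  let page = Bigarray.Array1.create Bigarray.char Bigarray.c_layout 2 in
  page.{0} <- '\x12';
  page.{1} <- '\x34';
  (* Both instantiations decode the same two bytes identically. *)
  assert (On_bigstring.uint16_be page 0 = 0x1234);
  assert (On_bytes.uint16_be (Bytes.of_string "\x12\x34") 0 = 0x1234)
```

With this shape, the Bigarray instantiation reads directly from the buffer handed back by the IO layer, which is the zero-copy property the issue asks for.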

I'm trying to match against a database page structure, so I'd prefer to not do a 4k copy to a string each time. If I end up putting in the effort to get a nicer syntax for picking out bytes, I'll follow the functor approach, and post for your consideration if it happens and looks alright. Thanks!