idris-lang / Idris-dev

A Dependently Typed Functional Programming Language

Home Page:http://idris-lang.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Implement byte-arrays

IreneKnapp opened this issue · comments

The below is a design proposal which shlevy, Ralith, and myself discussed on IRC at great length just now. Ralith sees an issue with the size of the store object not being erased, but since we had this nice writeup already... I thought I'd put it here to begin a discussion, anyhow. So, this is waiting for comment from Ralith before implementation begins.

PLEASE NOTE that this describes the portion of this data object and structure which is to be implemented as part of Idris itself. Linked lists of these buffers, mutation semantics both pure and impure, and so forth, can be implemented in Idris “user code”, on top of these primitives. This allows users to experiment with different implementations of them and find good tradeoffs.

If you want to see and participate in some discussion of higher-level interfaces, please see the google document at:

https://docs.google.com/document/d/1MTvYOXuoumTw1N32pXRm0k8MtcM04PZQIceTO7vhO-k/edit?usp=sharing

The following is the basic interface:

Buffer : Nat -> Type
allocate : Nat -> Buffer 0
copy : Buffer n -> Buffer n

For each primitive type with a well-known representation:

peekSomeType : (a : SomeType) -> (offset : Fin n) -> Buffer n -> a
appendSomeType : (a : SomeType) -> (count : Nat) -> Buffer n -> a -> Buffer (size of a * count + n)

Specifically, these “viewable types” are to include the signed and unsigned bounded integer types, the bit types, and Buffer itself for constructing substrings. They are not to include the unbounded integer type, nor the string type.

Representation: Two objects, a front-end and a store, with the store “behind the curtain” in that no primitives expose it to the typesystem. The front-end object contains a pointer to a store object, as well as indices (or pointers; implementation detail) giving a window within it. The store object contains a “fill pointer” which tells how much of it is used. It does not contain an explicit size, because that’s in the typesystem.

Fast append operation: Check if the front-end object’s end coincides with the end of the store object. If it does not, copy into a new store object. If it does, advance the fill pointer of the existing store object, and return a new front-end object with the appropriate stuff. If advancing the fill pointer would move it past the end of the store object, allocate a new buffer with double the size, and copy into it.

Allocate: The Nat parameter is a size hint for the underlying buffer, which probably gets rounded up to the next power of two.

Copy: Make a fresh one, because we're splitting the buffer into a front-end object and a store object, with the latter "behind the curtain" because it can be mutated in the specific fashion of appending to it, which would break referential transparency if the object had any primitives that acknowledged its existence. Yes, even if we only had append and read operations on it, because read has to deal with failure somehow, which allows the fill pointer to be interrogated, and the front-end interface has to be pure functions. Making a fresh front-end object pointing to a fresh store object allows freeing the (possibly very large) old store object.

Planning to take a crack at this this week.

Also important to note: If the buffer is too small, append always allocates a new buffer rather than doing a realloc.

After some further thought last night it became clear that there probably isn't a problem with erasure, so no blocker there.