terminusdb / terminusdb-store

a tokio-enabled data store for triple data

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Use u32 for log array indexing and length

spl opened this issue · comments

The log array buffer format uses a u32 in the control word to determine the length (number of elements) in the log array. But this is exposed as a usize publicly for indexing and getting the length.

I'm guessing you have no plans to develop terminus-store for an architecture in which std::mem::size_of::<usize>() < 4 (i.e. the bus width is < 32 bits), so you won't have a problem with usize being too small. But maybe another problem will appear due to casting or running on a 32-bit platform.

It seems like it would be better to more accurately represent the index and length types. For one thing, given that a log array is read from a disk and (I think) should be portable across systems, it will never have a length determined by a particular system's architecture.

Unfortunately, changing usize to u32 in the appropriate places in logarray.rs introduces a cascade of errors around the code. I haven't looked into these. Maybe a more thorough audit of of data structure lengths and indexing is needed.

This is possibly a minor issue, but I just wanted to make a note of it before I moved on and forgot about it.

Right, I have no intention of supporting smaller architectures.

Much of rust's api dealing with slices or offsets or indexes expect an usize. Exposing len as an usize is therefore convenient. The alternative would be to cast whatever comes out of len in such situations. The problem with smaller architectures doesn't go away by moving the cast to the caller. Making logarray work on 16 bit would require some more effort. So I'm in favor of keeping this a usize.

Your assert is a good idea and should let us get away with this. That said, does rust even support small architectures?

I agree with not supporting anything less than 32-bit. Yes, I believe Rust does support (or would not be averse to supporting) some architectures with smaller bus sizes. There's a lot of movement toward making it work well in embedded systems.

Having looked more at the data structures used in terminus-store, I think they are definitely optimized for 64-bit architectures. I think this is the right way to go. I'm rather concerned surprising things might happen on a 32-bit architecture. I suppose there are too options:

  1. Attempt to support 32-bit and 64-bit. This should involve added 32-bit to CI.
  2. Forget about 32-bit for now and focus on 64-bit. If needed, come back to 32-bit later.

I'm guessing 32-bit would be a rather minor thing to focus on, and so option 2 is probably the better way to go. If that's the case, just in case anybody who happened to try running terminus-store on a 32-bit machine, I think we should add a static check similar to #43. I would put it in bitarray.rs, since that's an obvious case where you have a 64-bit length that is cast to usize.