littlefs-project / littlefs

A little fail-safe filesystem designed for microcontrollers

pread/pwrite

yamt opened this issue

it would be nice to support pread/pwrite natively.

while an emulation with seek is possible, i guess it can have a few drawbacks:

  • embedders likely need a recursive lock to avoid races (see the sketch below)
  • it's likely inefficient as littlefs's seek also moves cache position
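
to illustrate the first point, the emulation is a seek/read/seek sequence on a shared file handle, so the embedder has to bracket the whole thing with a lock of their own. a rough sketch follows; lfs_file_seek/lfs_file_read are the real littlefs calls, while embedder_lock/embedder_unlock just stand in for whatever mutex the embedder uses. if littlefs is built with LFS_THREADSAFE, each lfs_file_* call below also takes the configured lock, which is why that lock would need to be recursive (or a separate outer mutex is needed):

// not atomic without the outer lock: another thread could move the
// file position between the seek and the read
lfs_ssize_t emu_pread(lfs_t *lfs, lfs_file_t *file,
        void *buf, lfs_size_t size, lfs_off_t off) {
    embedder_lock();
    lfs_soff_t pos = lfs_file_seek(lfs, file, 0, LFS_SEEK_CUR); // save position
    lfs_file_seek(lfs, file, off, LFS_SEEK_SET);
    lfs_ssize_t res = lfs_file_read(lfs, file, buf, size);
    lfs_file_seek(lfs, file, pos, LFS_SEEK_SET);                // restore position
    embedder_unlock();
    return res;
}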

Hi @yamt, thanks for opening an issue.

I'm glad you proposed this, because I only recently learned that pread/pwrite are part of standard POSIX. It makes sense as an API and I think it would be good to add.

pread/pwrite also shouldn't add much code cost as most of the logic should be shared behind the scenes.

it's likely inefficient as littlefs's seek also moves cache position

The reason littlefs's seek moves the cache is that we usually need it for reading/writing, so I don't think pread/pwrite will escape this cost. Maybe the exception is when pread/pwrite bypasses the cache.


That being said, API changes that don't touch the disk format are low priority at the moment, so it may take some time for this to land.

it's likely inefficient as littlefs's seek also moves cache position

The reason littlefs's seek moves the cache is that we usually need it for reading/writing, so I don't think pread/pwrite will escape this cost. Maybe the exception is when pread/pwrite bypasses the cache.

i meant that a straightforward emulation with lseek, like the following, would be inefficient as it moves the position back and forth. a native implementation could do better.

lfs_ssize_t lfs_file_pread(lfs_t *lfs, lfs_file_t *file,
        void *buffer, lfs_size_t size, lfs_off_t off) {
    lfs_soff_t orig_pos = lfs_file_seek(lfs, file, 0, LFS_SEEK_CUR); // remember position
    lfs_file_seek(lfs, file, off, LFS_SEEK_SET);                     // jump to requested offset
    lfs_ssize_t res = lfs_file_read(lfs, file, buffer, size);
    lfs_file_seek(lfs, file, orig_pos, LFS_SEEK_SET);                // jump back
    return res;
}

A native implementation could save a CTZ skip-list lookup, that's true. And there may be other opportunities for minor optimizations.

But caching is a bit of a harder problem, since lfs_file_pread doesn't know what the next operation will be. It may be followed immediately by another pread if upper layers emulate read with multiple pread calls, for example.
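
Something like this sketch of an upper layer that only has pread underneath and fakes a sequential read by tracking its own offset (lfs_file_pread being the proposed call; wrapped_file/wrapped_read are made up for illustration):

// every sequential read from the upper layer turns into a pread at a tracked offset
struct wrapped_file {
    lfs_file_t *file;
    lfs_off_t pos;
};

lfs_ssize_t wrapped_read(lfs_t *lfs, struct wrapped_file *wf,
        void *buf, lfs_size_t size) {
    lfs_ssize_t res = lfs_file_pread(lfs, wf->file, buf, size, wf->pos);
    if (res > 0) {
        wf->pos += res;
    }
    return res;
}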

But caching is a bit of a harder problem, since lfs_file_pread doesn't know what the next operation will be. It may be followed immediately by another pread if upper layers emulate read with multiple pread calls, for example.

are you concerned about the cache-hit rate here?

Ah yeah. Cases like these:

// write blob
pwrite(f, <data_blob>, <data_blob_len>, 0);

// scan for start of payload
char buf[sizeof("<header>") - 1];
lfs_off_t payload_start = 0;
while (true) {
    pread(f, buf, sizeof(buf), payload_start);
    if (memcmp(buf, "<header>", sizeof(buf)) == 0) {
        break;
    }

    payload_start += 1;
}

You could argue this should be done with seek + read, but I'm not sure that's always possible when other OS layers get involved. FUSE for example only talks in pread/pwrite (fuse_lowlevel_ops.read).
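To make that concrete, a littlefs-backed FUSE read handler ends up looking roughly like this sketch (using the high-level fuse_operations.read for brevity rather than the lowlevel API; lfsfuse_read and the global lfs handle are made up here, and lfs_file_pread is the API proposed in this issue):

#define FUSE_USE_VERSION 31
#include <fuse.h>
#include <errno.h>
#include <stdint.h>
#include "lfs.h"

extern lfs_t lfs; // mounted littlefs instance, assumed global for the sketch

// FUSE only ever hands us an absolute offset, never a "current position",
// so the natural mapping is a pread-style call
static int lfsfuse_read(const char *path, char *buf, size_t size,
        off_t off, struct fuse_file_info *fi) {
    (void)path;
    lfs_file_t *file = (lfs_file_t *)(uintptr_t)fi->fh; // handle stashed by open()
    lfs_ssize_t res = lfs_file_pread(&lfs, file, buf, size, (lfs_off_t)off);
    return (res < 0) ? -EIO : (int)res;
}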

Though, as an "optimization problem", there may not be a single correct impl here.

Though I suppose you could also argue the FUSE impl should still call seek + read if only for cache reasons.

Maybe there's no clear best option here. Not moving the cache in pread/pwrite at least gives users more control, even if the performance differences may initially be confusing to users.