mar-file-system / marfs

MarFS provides a scalable near-POSIX file system by using one or more POSIX file systems as a scalable metadata component and one or more data stores (object, file, etc) as a scalable data component.

POSIX DAL Does not read packed files correctly

wfvining opened this issue · comments

The tests I ran on the POSIX DAL were insufficient and did not reveal a severe bug in reading packed files. Because they used copies of the same data for each packed file, the tests always appeared to succeed, even though every read occurred at offset 0.

Read offsets are specified through the AWS4C IOBuf installed in the ObjectStream, but the POSIX DAL does not check the offset in the IOBuf (since the IOBuf is not used for reading in the POSIX DAL). To address this read problem I can get the offset from ObjectStream->iob->context->byte_range.offset. There are actually many places in marfs_read() where we expect the IOBuf to be side-effected by libaws4c, but trying to mimic all the possible side-effects from libaws4c in DALs that do not use aws4c is probably (definitely) a bad idea. We should consider refactoring to isolate dependencies on libaws4c and its side-effects in the OBJECT DAL.
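A minimal sketch of the proposed fix follows, assuming a POSIX-backed read that takes its starting offset from the stream rather than always reading from 0. The struct definitions are hypothetical stand-ins that only mirror the access path `ObjectStream->iob->context->byte_range.offset` named above; the real MarFS and aws4c types differ, and `posix_dal_read_at()` is an illustrative name, not a function in the code base.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins for the real types; only the fields needed
 * to follow os->iob->context->byte_range.offset are modeled here. */
typedef struct { off_t offset; size_t length; } ByteRange;
typedef struct { ByteRange byte_range; } IOContext;
typedef struct { IOContext *context; } IOBuf;
typedef struct { IOBuf *iob; } ObjectStream;

/* Read up to `size` bytes from `fd`, starting at the byte-range offset
 * recorded in the stream. Returns the number of bytes read (short only
 * at end of file), or -1 on error. */
ssize_t posix_dal_read_at(int fd, const ObjectStream *os,
                          char *buf, size_t size)
{
    off_t  base = os->iob->context->byte_range.offset;
    size_t done = 0;

    while (done < size) {
        /* pread() does not move the file position, so the offset is
         * always computed from the packed file's base offset. */
        ssize_t n = pread(fd, buf + done, size - done, base + (off_t)done);
        if (n < 0)
            return -1;
        if (n == 0)
            break;              /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

A test for this path should give each packed file distinct contents, so that a read which ignores the offset returns visibly wrong data.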

I haven't looked, but I would think the file-handle read/write-state would be a natural place to keep state like that. The IOBuf would maintain its own state as needed, but I agree we shouldn't be perpetuating dependencies on that state outside of the object-IO use-cases.


From: Will Vining [notifications@github.com]
Sent: Friday, September 23, 2016 10:02 AM
To: mar-file-system/marfs
Subject: [mar-file-system/marfs] POSIX DAL Does not read packed files correctly (#165)


Jeff, It looks like this may have been resolved by your updates in the gc-dal branch. There are other problems that the changes in that branch have brought up. I am working on those now.

I think you're right about the file handle read status field. Having a file handle as part of the dal context makes this much easier to resolve and should make future DALs easier to write.

Sorry for any new trouble, but at least we'll have some advantages going forward.



The trouble I'm having will need to be resolved for the MC DAL as well, so not a problem to address it now.

It turns out a large part of the problem stems from a log message in update_url() that expects the stream to have been initialized. I wonder if we can move the call to update_url() to obj_open() and remove it from everywhere else it is called in marfs_ops? (If we did that, we would probably have to add preprocessor directives to leave it in place when compiled with the DAL disabled, but they could be restricted to open_data().) I can update the POSIX DAL (and the skeleton I worked up for the MC DAL) to use fh->info.pre.objid for its file names rather than os->url. I think it might be cleaner overall to do it this way and avoid calling update_url() except in the case of the OBJECT DAL.

Alternatively we can move the update_url() call out of stream_init() and back into marfs_open() and rewrite it to make the log message safe if the stream is not initialized. The benefit there is that we don't have to make as many changes to the rest of the code (update_url() gets called in marfs_open(), marfs_ftruncate(), marfs_read_internal(), and marfs_write()).
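The second option could look roughly like the sketch below, assuming the log message only needs to avoid touching stream state that has not been set up yet. `SketchStream`, its `initialized` flag, and `sketch_update_url()` are all hypothetical names for illustration; the real update_url() signature and the fields its log message reads are not shown in this thread.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_URL_MAX 256

/* Hypothetical stand-in for the stream; `initialized` is an assumed
 * flag that is nonzero once the stream has been set up. */
typedef struct {
    int  initialized;
    char url[SKETCH_URL_MAX];
} SketchStream;

/* Build the URL from its parts. Returns 0 on success, or -1 if the
 * result would not fit in the buffer. The log message only mentions
 * stream state after checking that the stream is initialized. */
int sketch_update_url(SketchStream *os, const char *host,
                      const char *bucket, const char *objid)
{
    int n = snprintf(os->url, sizeof os->url, "%s/%s/%s",
                     host, bucket, objid);
    if (n < 0 || (size_t)n >= sizeof os->url)
        return -1;

    if (os->initialized)
        fprintf(stderr, "generated URL %s (stream open)\n", os->url);
    else
        fprintf(stderr, "generated URL %s (stream not yet initialized)\n",
                os->url);
    return 0;
}
```

With a guard like this, the function can be called before the stream is initialized, as the second option requires.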

I have implemented the first option and I am running tests on it now for the OBJECT DAL. I will test for the POSIX DAL next week.

The log message can be dropped, or moved somewhere else, whatever is convenient.

But generating URLs (object-IDs, or filenames in the case of MC DAL) is something I was figuring we would want to integrate into the DAL (somehow). You've seen the sketch of MC configurations where pathnames will need to be generated for blocks in a stripe. We want to push that into somewhere that DALs can do it, or at least influence it.

I was guessing it might work to have DALs take over something like update_url(), perhaps as a DAL "method" in the interface, or else as a special-purpose function that uses the configuration in some magical generic way to generate pathnames for everyone. The first way sounds easier, but the important thing is just to get the functionality we want, and methodology is open to whatever makes sense.

Thanks for digging in to this.



I have fixed the issue above, and the POSIX DAL is now up to date and working. I also tested garbage collection on a POSIX DAL repo, and it works correctly. I'll look at adding update_url() (or something like it) to the DAL interface. That does seem like the easiest option, and it makes sense as an interface function.

I have added a function update_object_location() to the DAL interface. I also moved all the update_url() calls to happen in open_data() since they always precede that call anyway. That facilitates using the DAL operation without having to add #if USE_DAL all over the place.
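As a rough illustration of that arrangement, the sketch below models a DAL as a table of function pointers with an update_object_location() entry that is called from one place. Only the names update_object_location() and open_data() come from this thread; the `Sketch*` types, the signature, and the path and URL formats are assumptions made for the example and do not reflect the actual MarFS interface.

```c
#include <stdio.h>

#define SKETCH_LOC_MAX 256

/* Hypothetical per-open context: the object ID plus a buffer that the
 * DAL fills in with wherever that object lives. */
typedef struct {
    const char *objid;
    char        location[SKETCH_LOC_MAX];
} SketchContext;

/* Hypothetical DAL interface with a location-update "method". */
typedef struct {
    const char *name;
    int (*update_object_location)(SketchContext *ctx);
} SketchDAL;

/* Object-store flavor: the location is a URL-like string. */
static int object_update_location(SketchContext *ctx)
{
    int n = snprintf(ctx->location, sizeof ctx->location,
                     "object-store/%s", ctx->objid);
    return (n < 0 || (size_t)n >= sizeof ctx->location) ? -1 : 0;
}

/* POSIX flavor: the location is a file name derived from the object ID. */
static int posix_update_location(SketchContext *ctx)
{
    int n = snprintf(ctx->location, sizeof ctx->location,
                     "/sketch/posix-repo/%s", ctx->objid);
    return (n < 0 || (size_t)n >= sizeof ctx->location) ? -1 : 0;
}

static const SketchDAL sketch_object_dal = { "OBJECT", object_update_location };
static const SketchDAL sketch_posix_dal  = { "POSIX",  posix_update_location };

/* Stand-in for open_data(): the single place where the location is
 * refreshed, so callers need no per-DAL conditionals. */
int sketch_open_data(const SketchDAL *dal, SketchContext *ctx)
{
    return dal->update_object_location(ctx);
}
```

Because the call goes through the interface, a new DAL (such as the MC DAL discussed above) would only need to supply its own location function.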

The changes are in the dal-update_url branch. I am testing now.

Completed: 47c6981