moosichu / zar

An attempt to write an archiver using zig

API Discussions

iddev5 opened this issue · comments

commented

Currently, the API has some problems, as also mentioned in my last PR.

  1. During a delete operation, the finalize() method overwrites the contents of the file itself, which results in malformed archives.

Solution: I need some time to work this out.

  2. Once this project is complete, it has to be integrated into zig as well; one such instance is currently here: src/link.zig#L668. As you can see, it provides a slice of objects that are allocated in memory. Put simply, even though we don't plan to use a memory-based implementation in the frontend, we still need to support it for the zig-facing parts of the library.

Solution: This would mean introducing an Archive.addFilesFromMemory() function and restoring the Contents struct to its older state, but perhaps as a tagged union.

  3. Some API names are not right. The one that stands out to me most is Archive.create(), because the name "create" is ambiguous in an application that actually creates files.

Solution: I'm actually planning to solve this together with 1.
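
For illustration, the tagged-union idea for Contents mentioned above could look roughly like this. It is only a sketch: the variant names `memory` and `file` and the `length` helper are assumptions for the example, not what zar actually has.

```zig
const std = @import("std");

// Hypothetical shape; the real Contents struct in zar may differ.
const Contents = union(enum) {
    // Bytes already held in memory (e.g. objects handed over by the zig linker).
    memory: []const u8,
    // A file on disk whose bytes are only read when they are needed.
    file: std.fs.File,

    // Number of bytes this entry would occupy in the archive body.
    fn length(self: Contents) !u64 {
        return switch (self) {
            .memory => |bytes| @as(u64, bytes.len),
            .file => |f| (try f.stat()).size,
        };
    }
};
```

Code that consumes a Contents value has to switch on the tag, so the memory-backed and file-backed paths stay explicit at every use site.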


It also once occurred to me that we might hit the max open file limit during addFiles operations (see ulimit -n on your machine), but even zld creates and holds on to a bunch of file handles at the same time and does not seem to suffer from this problem. I briefly checked the llvm build files, and the number of object files used for static linking there seems to top out at close to 100, which is well below the common limits.

commented

Proposal of a new API:

  1. The archive structure would be created using: Archive.init(allocator)

  2. Writing to file using: Archive.finalize(self, allocator, file_handle)
    Instead of writing to the file given in the constructor (which is the present style), explicitly ask for the file it should be written to.

  3. Parsing using: Archive.parse(self, allocator, file_handle, stderr)
    As you can see, the file has moved from the constructor to the parse function itself, for a few reasons:

  • The file handle in the constructor has no special use once the new finalize function is introduced.
  • We need to be able to call parse multiple times on different files, because the MRI script has a command called ADDLIB.
  4. Introduce Archive.addFilesFromMemory() as mentioned above.
  5. All other functions remain unchanged.
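
A rough sketch of the call shape this proposal implies. The bodies are placeholders that only handle the `!<arch>\n` magic that starts every ar archive; the `parsed_archives` field and the `NotArchive` error are made up for the example, and the proposed stderr parameter is left out to keep it short.

```zig
const std = @import("std");

// Hypothetical sketch of the proposed API shape, not zar's implementation.
const Archive = struct {
    // Counts successful parse() calls; stands in for real member data.
    parsed_archives: usize = 0,

    const magic = "!<arch>\n"; // global header at the start of an ar archive

    // The proposal passes an allocator here; this sketch has nothing to allocate.
    pub fn init(allocator: std.mem.Allocator) Archive {
        _ = allocator;
        return .{};
    }

    // The file is an argument, so parse() can be called once per input
    // archive (the ADDLIB case).
    pub fn parse(self: *Archive, allocator: std.mem.Allocator, file: std.fs.File) !void {
        _ = allocator;
        var buf: [magic.len]u8 = undefined;
        const n = try file.readAll(&buf);
        if (!std.mem.eql(u8, buf[0..n], magic)) return error.NotArchive;
        self.parsed_archives += 1;
    }

    // The output file is chosen by the caller instead of being fixed at
    // construction time.
    pub fn finalize(self: *Archive, allocator: std.mem.Allocator, file: std.fs.File) !void {
        _ = self;
        _ = allocator;
        try file.writeAll(magic);
    }
};
```

With this shape, a caller can parse several input archives into one Archive value and then pick a separate output handle for finalize, so the input is never overwritten by accident.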

Another internal detail:
We also need a way to check for duplicate entries, as it's something other ar implementations do. And I think the best way to handle this is to use an ArrayHashMap instead of our current ArrayList-based approach.
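
As a minimal sketch of the duplicate check, assuming members are keyed by name and a duplicate replaces the existing entry (as ar's r operation does). The `Members` alias and the `addOrReplace` helper are hypothetical names for the example.

```zig
const std = @import("std");

// Hypothetical: member name -> contents, iterated in insertion order.
const Members = std.StringArrayHashMap([]const u8);

// Adds a member, replacing the contents if the name is already present.
// Returns true when an existing entry was replaced.
fn addOrReplace(members: *Members, name: []const u8, contents: []const u8) !bool {
    const result = try members.getOrPut(name);
    result.value_ptr.* = contents;
    return result.found_existing;
}
```

An array hash map keeps insertion order, so member order in the archive is preserved while the duplicate lookup no longer needs a linear scan of a list.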

Suggestions very welcome.

Some of these changes are already what I have in progress locally (such as using a hash map). Hopefully I will be able to check it in this morning. Apart from that, I think the changes look like they should work.

Although as we work on the project I expect we will learn quite a lot about what a good API might look like - so it shouldn't be a top focus, as it is likely something we will want to change again.

So having had a little look at other implementations - our current approach (my bad on this one, you did the right thing initially!) of storing file contents as file references (as opposed to just copying them into memory) is probably the bad one.

At least for now we should probably be copying them into memory (simpler for sure!) - so I will do that next.

(I won't be able to start until this evening (if I do) or tomorrow morning, so if this takes your fancy please give that a crack!)

commented

probably the bad one

Not necessarily. It is indeed useful in some cases, but the main reason I didn't start with it in the beginning is that it would have been complicated.

In case you haven't started, I will get it done right now. Then we can merge either yours or mine if we collide; it's not such a big thing that my time would be wasted even if mine isn't merged.

so it shouldn't be a top focus

I agree, but the only real reason I proposed a new API now is that using file handles for contents broke certain things. But since we are going back to the in-memory representation, it won't be a problem for now.

Edit: So I just saw that you pushed your changes, nice.

commented

Just to let you know, the recent changes introduced one more regression. Since we now either create a file OR append to an existing one, an insert operation on a pre-existing file also replaces the existing file contents. Basically the same problem as with "d", but now also with "r". Using the memory-based representation (as already decided above) should solve it.

commented

Saving this issue for all API related discussion.

commented

I have started working on MRI script support here, and it has brought up the need to parse and query the contents of a file without writing all of it when finalizing. So it's one more thing added to the TODOs. Or maybe not: llvm-ar's MRI scripts don't support such complex commands, so we aren't under pressure to do it.