Question about methodology
kayabaNerve opened this issue
Sorry to be opening an issue for a question. I was curious why this library exists compared to std::fs::canonicalize, which has the guarantee of Rust to return the real path. Is it a race condition commentary where
- open file
- move file
- move other file to that location
would cause distinct files to return as matches due to having the same path across time, an edge case this lib handles? What stronger guarantees exactly does this library aim to offer?
This may benefit from a follow-up to the README clarifying its utility.
It's a good question. It has been a long time since I wrote this library when I was steeped in the motivation for it. I wish I had written it down somewhere in the docs, but it looks like I didn't. I may have avoided doing so because it's full of platform specific details. With that said, I think I can describe the gist of it.
First of all, comparing by file paths means you need, well, a file path. Sometimes you might just have a file descriptor. The docs do give an example of this where one can determine whether stdout corresponds to the same file as some other path.
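To make the file descriptor point concrete, here is a minimal std-only sketch for Unix. It is not this crate's implementation; the helper name open_file_matches_path is hypothetical. It assumes that a (device, inode) pair identifies a file, and it asks the open descriptor for its metadata rather than looking anything up by name.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical helper (Unix only): does the already-open `file` refer to
/// the same underlying file as `path`?
fn open_file_matches_path(file: &File, path: &Path) -> io::Result<bool> {
    // `File::metadata` queries the open descriptor, so no path is needed
    // for this side of the comparison.
    let from_handle = file.metadata()?;
    // `std::fs::metadata` follows symlinks to whatever the path names now.
    let from_path = std::fs::metadata(path)?;
    // Assumption: device number plus inode number identifies the file.
    Ok(from_handle.dev() == from_path.dev() && from_handle.ino() == from_path.ino())
}
```

A path-only approach such as canonicalization has nothing to work with on the handle side of this comparison.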
Secondly, canonicalization doesn't deal with hard links at all. It only resolves symbolic links. So if foo and bar are hard-linked to each other, they are the same file. But your equality comparison using std::fs::canonicalize will say they are different.
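The hard link case can be sketched with two comparisons side by side. This is a std-only, Unix-only illustration; both function names are hypothetical and neither is this crate's API.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Path-based comparison: true only if both paths canonicalize to the
/// same absolute path.
fn same_by_canonical_path(a: &Path, b: &Path) -> io::Result<bool> {
    Ok(fs::canonicalize(a)? == fs::canonicalize(b)?)
}

/// Identity-based comparison (Unix): same device and same inode.
fn same_by_inode(a: &Path, b: &Path) -> io::Result<bool> {
    let (ma, mb) = (fs::metadata(a)?, fs::metadata(b)?);
    Ok(ma.dev() == mb.dev() && ma.ino() == mb.ino())
}
```

If bar is created with std::fs::hard_link("foo", "bar"), the first function returns false because the two names are distinct paths, while the second returns true because both names point at one inode.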
Thirdly, yes, there is the potential for race conditions here. Take a look at the internal documentation for the Windows implementation for example. It specifically talks about keeping a handle to the file open during the comparison, otherwise the underlying file ID numbers being used can be recycled. I believe the same is true on Unix systems too.
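As a rough illustration of the "keep a handle open" idea on Unix, the sketch below stores the open File next to the identity it was read from. The OpenIdentity type is hypothetical and is not how this crate is written. It relies on the Unix behavior that an inode is not freed while a descriptor to it remains open, so its number cannot be handed to a new file during the comparison.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical handle type (Unix only). The file stays open for as long
/// as the value lives, which keeps the inode from being recycled.
struct OpenIdentity {
    _file: File,
    dev: u64,
    ino: u64,
}

impl OpenIdentity {
    fn from_path(path: &Path) -> io::Result<Self> {
        let file = File::open(path)?;
        // Read the identity from the open descriptor, not from the path,
        // so a rename between the two steps cannot change the answer.
        let md = file.metadata()?;
        Ok(Self { dev: md.dev(), ino: md.ino(), _file: file })
    }

    fn is_same(&self, other: &Self) -> bool {
        self.dev == other.dev && self.ino == other.ino
    }
}
```

With this shape, opening a path, replacing the file at that path, and opening the path again yields two values that compare as different, even though the path string never changed.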
I believe this crate is currently used by myself in the following places:
- In walkdir and ignore for detecting file system loops. Another reason to do things this way versus std::fs::canonicalize is that canonicalization is likely quite a bit more expensive.
- In ripgrep for detecting accidents like rg . > out.txt. Without knowing that out.txt is actually ripgrep's stdout, ripgrep might actually search out.txt as it's writing to it, creating an infinite loop that will make out.txt grow without bound.
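For the loop detection use case, a simplified sketch of the idea might look like the following. It is an assumption-laden illustration, not the code in walkdir or ignore: find_loops is a hypothetical function, it is Unix only, and it compares (device, inode) pairs read by path rather than holding handles open.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::{Path, PathBuf};

/// Walk `dir` while following symlinks. Any directory whose identity
/// matches one of its ancestors is recorded in `loops` and not entered.
fn find_loops(
    dir: &Path,
    ancestors: &mut Vec<(u64, u64)>,
    loops: &mut Vec<PathBuf>,
) -> io::Result<()> {
    // `fs::metadata` follows symlinks, so a link back to an ancestor
    // reports the ancestor's own device and inode.
    let md = fs::metadata(dir)?;
    let id = (md.dev(), md.ino());
    if ancestors.contains(&id) {
        loops.push(dir.to_path_buf());
        return Ok(());
    }
    ancestors.push(id);
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Broken symlinks fail the metadata call and are skipped.
        if fs::metadata(&path).map(|m| m.is_dir()).unwrap_or(false) {
            find_loops(&path, ancestors, loops)?;
        }
    }
    ancestors.pop();
    Ok(())
}
```

Each check costs one metadata call per directory and a scan of the current ancestor stack, with no need to resolve every path component the way canonicalization does.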
It looks like Cargo also uses it, but I'm unsure of the details there.
Thanks for the details!
I definitely get why there isn't a guaranteed list of functionality, hence asking "aim to offer". I appreciate the effort in this lib and in your response :)