BurntSushi / same-file

Cross platform Rust library for checking whether two file paths are the same file.

Question about methodology

kayabaNerve opened this issue

Sorry to be opening an issue for a question. I was curious why this library exists compared to std::fs::canonicalize, which Rust guarantees will return the real path. Is the concern a race condition, where

  • open file
  • move file
  • move other file to that location

would cause distinct files to be reported as matches because they occupied the same path at different times, an edge case this lib handles? What stronger guarantees exactly does this library aim to offer?

This may benefit from a follow-up to the README clarifying its utility.

It's a good question. It has been a long time since I wrote this library, back when I was steeped in the motivation for it. I wish I had written it down somewhere in the docs, but it looks like I didn't. I may have avoided doing so because it's full of platform-specific details. With that said, I think I can describe the gist of it.

First of all, comparing by file paths means you need, well, a file path. Sometimes you might just have a file descriptor. The docs do give an example of this where one can determine whether stdout corresponds to the same file as some other path.
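To make the descriptor case concrete, here is a minimal std-only sketch for Unix. It assumes that comparing device and inode numbers is an acceptable identity check on the target platform; `file_id` and `fd_is_path` are hypothetical names for illustration, not this crate's API.

```rust
use std::fs::{self, File};
use std::io;
use std::os::fd::AsFd;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Returns the (device, inode) pair used here as a file's identity on Unix.
fn file_id(meta: &fs::Metadata) -> (u64, u64) {
    (meta.dev(), meta.ino())
}

/// Hypothetical helper: does the open descriptor `fd` refer to the same
/// file as `path`? No path for `fd` is ever needed.
fn fd_is_path<F: AsFd>(fd: &F, path: &Path) -> io::Result<bool> {
    // Duplicate the descriptor so its metadata can be queried without
    // taking ownership of (or closing) the caller's descriptor.
    let owned = fd.as_fd().try_clone_to_owned()?;
    let fd_meta = File::from(owned).metadata()?;
    // fs::metadata follows symbolic links, like stat(2).
    let path_meta = fs::metadata(path)?;
    Ok(file_id(&fd_meta) == file_id(&path_meta))
}
```

Because the helper accepts anything implementing `AsFd`, a call such as `fd_is_path(&std::io::stdout(), Path::new("out.txt"))` would cover the stdout case.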

Secondly, canonicalization doesn't deal with hard links at all. It only resolves symbolic links. So if foo and bar are hard-linked to each other, they are the same file. But your equality comparison using std::fs::canonicalize will say they are different.
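The hard-link difference can be shown with two small comparison functions. This is a std-only sketch for Unix; both function names are made up for illustration, and the identity check assumes device and inode numbers are a sufficient notion of "same file".

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Path-based comparison: equal only if both paths canonicalize to the
/// same absolute path. This resolves symbolic links but not hard links.
fn same_by_canonical_path(a: &Path, b: &Path) -> io::Result<bool> {
    Ok(fs::canonicalize(a)? == fs::canonicalize(b)?)
}

/// Identity-based comparison: equal if device and inode numbers match.
/// Two hard links to one file share both numbers.
fn same_by_identity(a: &Path, b: &Path) -> io::Result<bool> {
    let (ma, mb) = (fs::metadata(a)?, fs::metadata(b)?);
    Ok(ma.dev() == mb.dev() && ma.ino() == mb.ino())
}
```

If bar is created with `std::fs::hard_link("foo", "bar")`, the first function should report the two paths as different and the second should report them as the same. Note that `same_by_identity` does not hold either file open while comparing, so it does nothing about recycled file IDs.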

Thirdly, yes, there is the potential for race conditions here. Take a look at the internal documentation for the Windows implementation for example. It specifically talks about keeping a handle to the file open during the comparison, otherwise the underlying file ID numbers being used can be recycled. I believe the same is true on Unix systems too.
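The idea of holding the file open for the duration of the comparison might look roughly like this on Unix. `OpenHandle` is a hypothetical type written for this thread, not the crate's actual implementation, and it assumes the file system will not reuse an inode number while a descriptor to that file is still open.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical handle: an open file plus the identity read from it.
struct OpenHandle {
    // Never read; kept only so the file stays open for as long as the
    // handle lives, which should prevent its inode number being reused.
    _file: File,
    dev: u64,
    ino: u64,
}

impl OpenHandle {
    fn from_path(path: &Path) -> io::Result<OpenHandle> {
        let file = File::open(path)?;
        // Read metadata through the open descriptor rather than the path,
        // so the numbers describe the file that was actually opened.
        let meta = file.metadata()?;
        Ok(OpenHandle { dev: meta.dev(), ino: meta.ino(), _file: file })
    }
}

impl PartialEq for OpenHandle {
    fn eq(&self, other: &OpenHandle) -> bool {
        (self.dev, self.ino) == (other.dev, other.ino)
    }
}
```

Two handles that are alive at the same time and compare equal must refer to the same underlying file, even if one of the paths has since been moved or replaced.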

I believe this crate is currently used by myself in the following places:

  1. In walkdir and ignore for detecting file system loops. Another reason to do things this way rather than with std::fs::canonicalize is that canonicalization is likely quite a bit more expensive.
  2. In ripgrep for detecting accidents like rg . > out.txt. Without knowing that out.txt is actually ripgrep's stdout, ripgrep might actually search out.txt as it's writing to it, creating an infinite loop that will make out.txt grow without bound.
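For the loop-detection use, a simplified sketch of the general technique follows. It is not how walkdir or ignore are actually written: the `walk` function is hypothetical, Unix-only, and records bare (device, inode) pairs for ancestors instead of keeping handles open.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::{Path, PathBuf};

/// Recursively collects non-directory paths under `dir`, following
/// symbolic links. `ancestors` holds the (device, inode) pair of every
/// directory between the root and the current directory.
fn walk(
    dir: &Path,
    ancestors: &mut Vec<(u64, u64)>,
    out: &mut Vec<PathBuf>,
) -> io::Result<()> {
    let meta = fs::metadata(dir)?; // follows symbolic links
    let id = (meta.dev(), meta.ino());
    if ancestors.contains(&id) {
        // This directory is one of our own ancestors: a loop. Skip it.
        return Ok(());
    }
    ancestors.push(id);
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // A broken symbolic link makes fs::metadata fail; this sketch
        // simply propagates that error.
        if fs::metadata(&path)?.is_dir() {
            walk(&path, ancestors, out)?;
        } else {
            out.push(path);
        }
    }
    ancestors.pop();
    Ok(())
}
```

Each check costs one metadata call per directory plus a scan of the ancestor stack, with no path resolution or string comparison involved.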

It looks like Cargo also uses it, but I'm unsure of the details there.

Thanks for the details!

I definitely get why there isn't a guaranteed list of functionality, hence asking "aim to offer". I appreciate the effort in this lib and in your response :)