gregl83 / paq

Hash file or directory recursively using Rust.

Windows benchmark and issues

p10tyr opened this issue · comments

I needed to know if the contents of a folder had changed since the last time I checked.
[screenshot]

The first run produced errors like this, and it must have taken 4 minutes:

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 32, kind: Uncategorized, message: "The process cannot access the file because it is being used by another process." }', src/lib.rs:79:58
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 32, kind: Uncategorized, message: "The process cannot access the file because it is being used by another process." }', src/lib.rs:79:58
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 32, kind: Uncategorized, message: "The process cannot access the file because it is being used by another process." }', src/lib.rs:79:58

The second time around it seems to have taken 1 second, but with the same errors, and neither run returned a hash.

Hey @p10tyr!

Apologies for the delayed response, yesterday was a busy one.

Thank you for trying out paq and taking the time to report your findings. I use paq to identify file system changes as well, mostly with the Bazel build system.

The error you're encountering is caused by a file system lock on one or more files under the source root directory (C:\Repos). This is an intentional operating system restriction that prevents integrity and consistency issues: files under that root are open in an IDE, held by a daemon, or in use by some other program. The lock prevents paq from opening (streaming) those files' contents to generate their hashes.
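To illustrate what happens at that `unwrap()`, here is a minimal sketch (not paq's actual code; the helper name and skip-on-error behavior are my own) of reading a file for hashing while treating the Windows sharing violation (OS error 32) as a skippable condition instead of panicking:

```rust
use std::fs::File;
use std::io::Read;

// Hypothetical helper: returns None for files that cannot be opened
// because another process holds them (Windows OS error 32,
// ERROR_SHARING_VIOLATION), instead of calling `.unwrap()` and panicking.
fn read_for_hash(path: &std::path::Path) -> Option<Vec<u8>> {
    match File::open(path) {
        Ok(mut f) => {
            let mut buf = Vec::new();
            f.read_to_end(&mut buf).ok()?;
            Some(buf)
        }
        // The sharing-violation case reported in this issue.
        Err(e) if e.raw_os_error() == Some(32) => None,
        // Whether to propagate, log, or skip other errors is a design
        // choice; this sketch simply skips them too.
        Err(_) => None,
    }
}
```

Whether skipping locked files (and hashing a partial view of the tree) is acceptable depends on what the hash is used for, which is part of why paq currently fails loudly instead.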

paq uses parallelism where it can, so it may encounter a locked file at different points in a run, but 4 minutes is high. These errors can significantly slow down hash computation; there may also be an option to return errors sooner. I do have a work item open to clean up the error messages.
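On the parallelism point, here is a std-only sketch (scoped threads plus `DefaultHasher`, not paq's actual hashing or thread pool) of hashing several inputs concurrently while collecting a result per item, so one problematic entry can be reported without aborting the other workers:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread;

// Hypothetical sketch: hash each input on its own scoped thread and
// collect every result; in a real tool each worker would return a
// Result so locked files surface as errors rather than panics.
fn hash_all(items: &[Vec<u8>]) -> Vec<u64> {
    thread::scope(|s| {
        let handles: Vec<_> = items
            .iter()
            .map(|bytes| {
                s.spawn(move || {
                    let mut h = DefaultHasher::new();
                    bytes.hash(&mut h);
                    h.finish()
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```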

For your specific case, is it possible to close all applications using the files when computing the hash?

Thanks for your reply. It's not convenient to close all the IDEs (yes, multiple) if we're just trying to rebuild part of something and release it locally.

These could be IDE files that we don't even care about. I wonder if it's worth reading .gitignore and applying the same glob ignore patterns? That would help with node_modules inside repos, but we also have .NET projects with bin directories that I'm not interested in either.
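The ignore idea could look roughly like this; a std-only sketch (the walker and ignore list are hypothetical, and real .gitignore semantics are much richer than exact name matching):

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical walker: recursively collect files, skipping any entry
// whose name matches an ignore list, similar in spirit to honoring
// .gitignore entries like `node_modules` or `bin`.
fn collect_files(dir: &Path, ignored: &[&str], out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let name = entry.file_name();
        if ignored.iter().any(|i| name.to_string_lossy() == *i) {
            continue; // skip e.g. node_modules, bin
        }
        let path = entry.path();
        if path.is_dir() {
            collect_files(&path, ignored, out)?;
        } else {
            out.push(path);
        }
    }
    Ok(())
}
```

Skipping ignored directories before descending into them also avoids the cost of hashing large trees like node_modules at all.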

I just remembered a chat with a colleague; I think we came to the conclusion that we can just use git to tell us whether anything has changed. Thanks again.
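For reference, the git-based check can be as simple as shelling out to `git status --porcelain`; a sketch with minimal error handling, assuming `git` is on PATH:

```rust
use std::io;
use std::process::Command;

// Sketch of the git-based alternative: porcelain output is empty when
// the working tree is clean, so any output means something changed
// (this only covers files git tracks or would track, unlike paq).
fn tree_is_dirty(repo: &std::path::Path) -> io::Result<bool> {
    let out = Command::new("git")
        .args(["status", "--porcelain"])
        .current_dir(repo)
        .output()?;
    if !out.status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "git status failed"));
    }
    Ok(!out.stdout.is_empty())
}
```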

@p10tyr, agreed. It'd be very disruptive to terminate processes constantly in a development environment. I am curious which process was locking the files in your case; being able to reproduce the issue might reveal some design improvements for paq.

Thus far, I've used paq in hermetic builds that don't need to be concerned with concurrent access. One goal was independence from other software tools such as Git. There is support for ignoring file or folder names starting with `.`, but that is all.

Worth mentioning: the build tool referenced earlier, Bazel, was built by Google and maintains state so it rebuilds only what changed, which makes it really efficient. It's well suited to massive mono-repository or mono-directory cases. Even then, you'll still need to manage deployment of changes, which is actually one thing I used paq for, and you'll still need to trigger builds. State can be persisted locally or remotely on a build cluster.