shundhammer / qdirstat

QDirStat - Qt-based directory statistics (KDirStat without any KDE - from the original KDirStat author)

Option to store uid, gid and perms in cache file

jonjacksonma opened this issue.

Hi.
It would be great to have an option for qdirstat-cache-writer to store uid, gid and permissions in the cache file. The use case is a large shared filesystem where, after identifying the largest files and folders, you typically want to know who owns them, and where generating the cache file ahead of time via cron is necessary to load the tree in a reasonable time.
thanks,
Jon

That would mean making the file format incompatible with older versions, which I am very reluctant to do: it is also used by some backup software because it's so simple.

In environments where large shared filesystems are still a thing (I haven't seen many of those over the last 20 or so years), isn't directory ownership typically implicit in the path? Do you really have directory trees where various users create files and directories all over the place? Don't departments, teams and users get some subtree assigned where they have permissions to create their own files and directories?

The file format specification:

https://github.com/shundhammer/qdirstat/blob/master/doc/cache-file-format.txt

Adding another three fields is pretty trivial, of course. They would be a numeric (!) UID, a numeric GID, and octal permissions. I recall an NFS service that maps UIDs to user names in environments where they differ between machines; supporting anything like that would be out of the question.
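A minimal Python sketch of how a cache writer could collect those three fields for one file. This is not the code of QDirStat or qdirstat-cache-writer; the helper name `ownership_fields()` is hypothetical. It deliberately returns the numeric IDs as they are, with no user or group name lookup, in line with the point above.

```python
import os
import stat


def ownership_fields(path: str) -> tuple[int, int, str]:
    """Hypothetical helper: numeric UID, numeric GID and octal permissions.

    os.lstat() is used so that a symlink reports its own metadata
    instead of that of its target, which is what a directory scanner
    usually wants.
    """
    st = os.lstat(path)
    # S_IMODE() keeps the permission bits (including setuid, setgid and
    # sticky) and drops the file type bits that st_mode also contains.
    perms = "0%03o" % stat.S_IMODE(st.st_mode)
    # No pwd/grp lookup: the numbers are stored as-is.
    return st.st_uid, st.st_gid, perms
```

Storing raw numbers keeps the writer simple and avoids depending on the name databases of whichever machine later reads the cache file.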

The file format already has a version number in its header, which helps to identify which parser to use. But QDirStat would need to retain backwards compatibility with the old file format; that makes the code a bit uglier.
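The version-based dispatch could look roughly like this Python sketch. The header shape `[qdirstat 1.0 cache file]` is an assumption here (check cache-file-format.txt for the exact wording), and `parse_v1()`, `parse_v2()` and `select_parser()` are hypothetical placeholders, not QDirStat's actual reader.

```python
import re

# Assumed header shape, modelled on a first line like
# "[qdirstat 1.0 cache file]"; the format spec is authoritative.
HEADER_RE = re.compile(r"^\[qdirstat (\d+)\.(\d+) cache file\]$")


def parse_v1(line: str) -> list[str]:
    """Placeholder for a V1.0 line parser (no uid/gid/perm fields)."""
    return line.split()


def parse_v2(line: str) -> list[str]:
    """Placeholder for a V2.0 line parser (with uid/gid/perm fields)."""
    return line.split()


# Major version -> line parser; old files keep using the old parser.
PARSERS = {1: parse_v1, 2: parse_v2}


def select_parser(header_line: str):
    """Pick a line parser from the major version in the header line."""
    m = HEADER_RE.match(header_line.strip())
    if m is None:
        raise ValueError("not a QDirStat cache file header")
    major = int(m.group(1))
    try:
        return PARSERS[major]
    except KeyError:
        raise ValueError(f"unsupported cache file version {major}") from None
```

Keeping one parser per major version is what preserves backwards compatibility, at the cost of some code duplication.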

Writing UID, GID and permissions works now in the Perl qdirstat-cache-writer, and the QDirStat binary can read them. Please check out and build the huha-cache-uid branch and do some initial testing.

Writing the new format with QDirStat's built-in cache writer will come tomorrow. Update: Done.

Docs for the new file format are here. The format has also become a bit prettier and easier for humans to read.

qdirstat-cache-writer now also has two new command line options: -1 to enforce the old format V1.0, and -2 to enforce the new format V2.0 (for completeness; V2.0 is the new default).

This is now merged to master.

Thanks for adding these options; they will be extremely helpful. The new options -1 and -2 for qdirstat-cache-writer work perfectly. There seems to be an issue with -l, though: when the cache file is read into the QDirStat GUI, there are spurious files whose names are 3- or 4-digit numbers, each with a size of 1.8 GB.
[Screenshot: qdirstat-cache-write-l-option]

Tested with QDirStat 1.9.01-git

Oops... that was a missing whitespace delimiter in the long format between the type and the name/full path, so both were conflated into what appeared to be a single field like F/foo/bar/myfile, and all subsequent fields on that line moved up one position.
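To see why one missing delimiter shifts everything, here is a small Python illustration of a parser that splits on whitespace and assigns fields by position. The four-field layout (type, path, size, mtime) and the sample lines are simplified assumptions for the demo, not the full long-format specification.

```python
# Simplified, assumed field order for a long-format line.
FIELD_NAMES = ("type", "path", "size", "mtime")


def parse_line(line: str) -> dict[str, str]:
    """Split on whitespace and map fields to names by position."""
    # zip() stops at the shorter sequence, so a missing field simply
    # leaves the last name unassigned.
    return dict(zip(FIELD_NAMES, line.split()))


good = parse_line("F /foo/bar/myfile 4096 0x65e1d6ff")
bad = parse_line("F/foo/bar/myfile 4096 0x65e1d6ff")  # delimiter missing

print(good["path"], good["size"])  # /foo/bar/myfile 4096
print(bad["path"], bad["size"])    # 4096 0x65e1d6ff
print(int(bad["size"], 0))         # 1709299455: the mtime, read as bytes
```

The real size ends up as the "name" (hence the 3- and 4-digit file names), and the mtime ends up as the "size".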

This is now fixed.

BTW you can easily look into a cache file, even if it's gzipped: Just use zless, zcat or zgrep. You don't even need to gunzip it.
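The same transparent handling is easy to get in a script. A minimal Python sketch, similar in spirit to zgrep, that reads a cache file whether or not it is gzipped by checking for the gzip magic bytes 1f 8b. The helper names are hypothetical, and treating the content as UTF-8 is an assumption (undecodable bytes are replaced rather than raising).

```python
import gzip
from typing import IO

GZIP_MAGIC = b"\x1f\x8b"


def open_cache_file(path: str) -> IO[str]:
    """Open a cache file as text, decompressing it if it is gzipped."""
    with open(path, "rb") as f:
        magic = f.read(2)
    # Assumed UTF-8; errors="replace" keeps odd file names from aborting.
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def grep_cache_file(path: str, needle: str) -> list[str]:
    """Return the lines containing needle, roughly like zgrep -F."""
    with open_cache_file(path) as f:
        return [line.rstrip("\n") for line in f if needle in line]
```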

Fun fact: that whole thing moved the fields of the affected lines one position up, so another field, the modification time, was interpreted as the size; and today's time_t is around 0x65e1d6ff (seconds since 1970-01-01 00:00 UTC), which translates to just about 1.6 GB. :-)
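The arithmetic is easy to check in Python. The "about 1.6" figure corresponds to binary units (2^30 bytes per GiB); the snippet also decodes the same number as a UTC timestamp.

```python
from datetime import datetime, timezone

MTIME = 0x65e1d6ff  # the time_t value that was misread as a file size

size_bytes = MTIME
size_gib = size_bytes / 2**30  # binary gigabytes
when = datetime.fromtimestamp(MTIME, tz=timezone.utc)

print(size_bytes)              # 1709299455
print(f"{size_gib:.2f} GiB")   # 1.59 GiB
print(when.isoformat())        # 2024-03-01T13:24:15+00:00
```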

I was wondering how the fields ended up mapped... Thanks for the fix.

This doesn't affect functionality at all, but while testing I noticed that the cache file itself gets listed in the cache with a size of 257 bytes, i.e. its size while it was still being written. Excluding the cache file from the listing could be a cosmetic improvement.
[Screenshot: qdirstat-cache-file]

Well, it's there in that directory, so of course it will be listed. And yes, of course this is just a snapshot in time, and a moment later the size may be different, as with all files on a modern OS.