python-cachier / cachier

Persistent, stale-free, local and cross-machine caching for Python functions.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Feature request: use sha or md5 to hash a pathlib.Path

saludes opened this issue · comments

Assume we have a lenghty operation depending on the content of a big file.
Hashing the path (as in pathlib.Path) will give us only a hash on the file name.
What if the file grows in time by appends?

It will be useful to have it hashed automatically by sha/md5 on the content or maybe a combination of date/size as suggested in https://stackoverflow.com/questions/1761607/what-is-the-fastest-hash-algorithm-to-check-if-two-files-are-equal

commented

you can use hash_params

def md5(args, kwargs):
    """
    :return md5 string
    """
    filepath = args[0]
    hash_md5 = hashlib.md5()

    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_md5.update(chunk)
    res = hash_md5.hexdigest()
    return res

@cachier(hash_params=md5)
def foo(filepath):
    return 111

Yes, I know; but I think it will be a fine feature to provide it as default for Path s

It would definitely need to be a VERY optional feature, and not a default one, because the assumption that if someone provides the wrapped function with a Path object your'e allowed to just open it and read it through (which is what hashing file content requires) is EXTREMELY problematic. Functions can do a lot of things with Path objects which are not reading file content, and other parts of the program or code might even assume that they do not read them, or perform ANY king of I/O.

So what's missing here, before this specific feature, is a feature that defines "plugins", or some other way to customize argument handling, that is a bit more fine-grained than the existing hash_params features. For most use cases, hash_params sounds like enough.

I'm closing this because I feel this is ill-fitted for a feature, but users should feel free to open this issue again and engage in a discussion and provide suggestions.