zkat / cacache

💩💵 but for your data. If you've got the hash, we've got the cache ™

Home Page: https://github.com/npm/cacache


Protect against hash conflicts

zkat opened this issue · comments

I'm starting to think cacache should store secondary integrity hashes in the index. Direct content-address reads can still potentially yield bad data, but if you provide a key, cacache will treat a checksum conflict as a regular checksum failure (by using the stronger algorithm for data verification), and then it's up to the user to figure out what to do with it.

In the case of, say, pacote, what would happen on a tarball conflict is simply treating the conflict as corruption and then it would re-fetch the data.

idk if this is worth the effort -- if you're using cacache with weak checksums (it defaults to sha512!), you're basically asking for trouble, but the reality is the npm registry still relies on sha1, and alternative registries will likely continue to do so for some time.

commented

Another use case that unfortunately is restricted to weak hashes is Google Drive.

I have a project that needs to locally cache several thousand files from Google Drive, which only offers md5 checksums via its API. This can sort of be worked around with a Google Apps Script that computes an alternative checksum over a string representation of the file in question, but even then you end up with a checksum of that string, not of the file itself. In short: for caching files from Google Drive, I'm pretty much stuck with md5.