udhos / equalfile

Go package to compare files

Should the 'equal' program return true for partial match with MaxSize error?

udhos opened this issue · comments

commented

Regarding the behavior when the MaxSize setting is exceeded: CompareReader() itself returns "true", together with an error, presumably on the assumption that the inputs were equal up to that point. Your 'equal' program, however, then sets the match to "false".

Currently, the number of bytes read can be greater than MaxSize, and nothing stops the byte comparison from being performed on the excess read() data beyond MaxSize. So it's possible for two Readers to agree right up to MaxSize, where the return value should be "true", and then disagree starting at MaxSize + 1, in which case the result will be "false". In other words, if MaxSize is set, I'd expect that no bytes beyond MaxSize are compared (or at least that they aren't used to determine the equality result), which is currently not the case.

Rather than attempting to work out the correct buffer slicing so that the bytes.Equal doesn't exceed maxSize bytes, we can instead use LimitReader on the given readers (if they aren't already LimitedReaders) to ensure that the bytes returned during the initial bytes comparison phase don't exceed MaxSize. And then if the comparison up to that point is determined to be true, we can check if there is anything else to Read() by attempting to read one more byte from the original non-LimitedReaders. If it's still EOF, then we know we have equal readers, and if the reads succeed, we return an error for exceeding MaxSize (matching current behavior).

I have test code demonstrating the current undesired behavior, and I think I can provide a proof-of-concept fix shortly; if not, I'll publish just the tests for review.

commented

I'm used to thinking of MaxSize as a rough protection against reading from infinite streams.
It's never occurred to me it could also be seen as an actual limit to prevent comparisons beyond it.

Yeah, it might be another item to document better, since users currently have the option of setting MaxSize to a low value (even 1), and having that low value act as a "limit" for both the CompareFile() and CompareReader() calls. Even if a user wanted to limit the comparison amount, having it as a global Option is (imo) a bit clunky, as it seems more natural as a function call argument.

If it's meant to be a safety guard for CompareReader(), would you consider deprecating it as a limit for regular files in CompareFile(), for example? Currently an uninitialized MaxSize is set up to be the known file size, but deliberately low MaxSize values will affect the calculated hash value, and potentially the equality value returned with the MaxSize exceeded error.

This particular thread actually seems better suited for discussion in issue #12, so I'm going to move the rest of my response over there instead...