Hashing a file (MD5) without having to read the whole file in memory
saket opened this issue
Hello, what would be an efficient way of comparing file content that does not involve loading the entire content into memory?
fun equalsContent(content: String): Boolean {
    TODO()
}
If the content you want to compare against is a String, I think the best approach is to load the file into memory first, since you are already spending that memory on the String. You may also want to specify a charset explicitly, in case the file is not UTF-8.
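As a rough illustration of the charset point, here is a minimal JVM-only sketch that uses the Kotlin standard library rather than korio. The `File.equalsContent` extension and its `charset` parameter are hypothetical names chosen for this example; they are not part of korio.

```kotlin
import java.io.File
import java.nio.charset.Charset

// Hypothetical JVM-only helper (not a korio API): decodes the whole file with an
// explicit charset and compares the result with the expected String.
fun File.equalsContent(content: String, charset: Charset = Charsets.UTF_8): Boolean =
    readText(charset) == content

fun main() {
    val file = File.createTempFile("equals-demo", ".txt")
    try {
        // "h\u00e9llo" is "hello" with an accented e, written as ISO-8859-1 bytes.
        file.writeText("h\u00e9llo", Charsets.ISO_8859_1)
        println(file.equalsContent("h\u00e9llo", Charsets.ISO_8859_1)) // true
        println(file.equalsContent("h\u00e9llo")) // false: decoded with the UTF-8 default
    } finally {
        file.delete()
    }
}
```

The second call returns false because the same bytes decode to different text under the wrong charset, which is why passing the charset explicitly is the safer habit.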
Gotcha. Let me ask a different question: is there a fast way to calculate the MD5 hash of a file using korio? Something along the lines of this okio approach: https://stackoverflow.com/a/61217039/2511884
Initially there was an MD5 implementation in KorIO, but I moved it to krypto: https://github.com/korlibs/krypto/blob/master/krypto/src/commonMain/kotlin/com/soywiz/krypto/MD5.kt
Since krypto is not aware of KorIO, it has no concept of a stream, so you would need an extension method to connect the two.
I haven't checked this against the code, but it would look something like this:
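// Opens the file for reading, hashes the stream, and closes it afterwards.
suspend fun VfsFile.hash(algo: HasherFactory) = openRead().use { hash(algo) }

// Feeds the stream to the hasher in 4 KiB chunks, so the whole file is never held in memory.
suspend fun AsyncStream.hash(algo: HasherFactory): Hash {
    return algo.create().also {
        val temp = ByteArray(0x1000)
        while (true) {
            val count = read(temp)
            if (count <= 0) break // end of stream
            it.update(temp, 0, count)
        }
    }.digest()
}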
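For comparison, the same chunked-update pattern can be sketched with only the JVM standard library, which may help when trying the idea out without korio or krypto. This is a JVM-only illustration under that assumption, not the korio API: `md5Hex` and its `bufferSize` parameter are hypothetical names, and `java.security.MessageDigest` stands in for krypto's hasher.

```kotlin
import java.io.File
import java.io.InputStream
import java.security.MessageDigest

// Hypothetical helper (not part of korio or krypto): feeds the stream to the
// digest in fixed-size chunks, so only one buffer is held in memory at a time.
fun InputStream.md5Hex(bufferSize: Int = 0x1000): String {
    val digest = MessageDigest.getInstance("MD5")
    val temp = ByteArray(bufferSize)
    while (true) {
        val count = read(temp)
        if (count <= 0) break // -1 signals end of stream
        digest.update(temp, 0, count)
    }
    // Render the 16 digest bytes as lowercase hex.
    return digest.digest().joinToString("") { "%02x".format(it) }
}

// Hypothetical convenience wrapper: opens the file, hashes it, and closes the stream.
fun File.md5Hex(): String = inputStream().use { it.md5Hex() }

fun main() {
    val file = File.createTempFile("md5-demo", ".txt")
    try {
        file.writeText("hello")
        println(file.md5Hex()) // prints 5d41402abc4b2a76b9719d911017c592
    } finally {
        file.delete()
    }
}
```

Memory use stays bounded by the buffer size regardless of file size, which is the same property the suspend version relies on.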
If there were an artifact that included both korio and krypto, the extension methods above could live there.
Maybe we can include krypto in korio 2.0 and add those methods. Since all targets have some kind of dead code elimination (DCE), I think it shouldn't be a problem.
Really nice. I look forward to the next release for using this, thank you!
Sure. Keep in mind that you can already use it by including krypto and adding the extension methods above to your project.
2.0.0 is scheduled for Kotlin 1.4.20