lovell / farmhash

Node.js implementation of FarmHash, Google's family of high performance hash functions

fingerprint64 results inconsistent with Google BigQuery

jakelowen opened this issue

Google BigQuery has FARM_FINGERPRINT, a 64-bit FarmHash fingerprint, as a built-in function: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#farm_fingerprint

BigQuery:

SELECT FARM_FINGERPRINT("1footrue");

Returns -1541654101129638711

lovell/farmhash:

farmhash.fingerprint64("1footrue")

yields 16905089972579912905

Shouldn't they produce the same value? For any string where BQ produces a positive hash, e.g. "2applefalse", both this package and BigQuery agree (here on 2794438866806483259).

Hello, the method signature of Fingerprint64 returns an unsigned 64-bit integer via the uint64_t type so it would appear BigQuery is wrong to return a negative value.
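To illustrate the unsigned versus signed point: both results carry the same 64 bits, and reading those bits as a two's complement signed integer subtracts 2^64 whenever the top bit is set. The following is a minimal sketch in plain JavaScript; toSigned64 is a hypothetical helper name used only for illustration and is not part of this library.

```javascript
// Hypothetical helper, not part of lovell/farmhash: reinterpret an unsigned
// 64-bit value as a signed 64-bit (two's complement) value.
const TWO_63 = 2n ** 63n;
const TWO_64 = 2n ** 64n;

function toSigned64(unsigned) {
  const value = BigInt(unsigned); // accepts a BigInt or a decimal string
  if (value < 0n || value >= TWO_64) {
    throw new RangeError('expected an unsigned 64-bit value');
  }
  // Values with the top bit set wrap around into the negative range;
  // values below 2^63 are unchanged, which is why positive hashes agree.
  return value >= TWO_63 ? value - TWO_64 : value;
}

console.log(toSigned64('16905089972579912905').toString()); // -1541654101129638711
console.log(toSigned64('2794438866806483259').toString());  // 2794438866806483259
```

The two inputs are the "1footrue" and "2applefalse" values quoted above; only the first has its top bit set, so only the first changes.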

Thanks @lovell - your analysis makes sense. Feel free to close the issue since it seems to be an issue on BQ side and not your library.

On that note: can you think of a helper function that would take the result of your farmhash.fingerprint64 and transform in the same way that BQ does? In essence to recreate their error reliably?

Not a big deal, as I can just fall back to to_hex(md5(...)) for consistent hashing across my scripts and BigQuery, but it would be cool to use the farm approach.

I took the liberty of adding the "1footrue" example as a test case in commit c9e44b7.

Happy to accept a PR that exposes a fingerprint64signed function performing an unsigned-to-signed cast in C++ land, if you're able. It would be good to support those using BigQuery, albeit with a tinge of sadness at Google pushing their architectural trade-offs onto you and me.

Sorry @lovell - I would love to help but my C++ chops are weak to nonexistent. I do appreciate your work on this library though! I'll keep an eye on it in case anyone else is able to help.

Thanks for the clarification on this int type issue. I was taken by surprise as well. In Python:

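import farmhash  # assumed here: the Python pyfarmhash package, not the Node.js library
import numpy as np

# Reinterpret the unsigned 64-bit fingerprint as a signed int64.
np.uint64(farmhash.fingerprint64(x)).astype('int64')

This gives the same result as BigQuery for the input x.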

This also works, no?

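const farmhash = require('farmhash');

// Reinterpret the unsigned 64-bit fingerprint as signed, matching BigQuery.
const fingerprint64signed = input =>
  BigInt.asIntN(64, farmhash.fingerprint64(input)).toString();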

fingerprint64signed("1footrue") yields "-1541654101129638711"
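For completeness, the reverse direction may also be useful when comparing values exported from BigQuery against the unsigned output of this library. A minimal sketch, assuming the BigQuery value is available as a decimal string (a plain Number would lose precision above Number.MAX_SAFE_INTEGER); toUnsigned64 is a hypothetical helper name, not part of this library.

```javascript
// Hypothetical helper, not part of lovell/farmhash: map a signed 64-bit
// value (as returned by BigQuery's FARM_FINGERPRINT) back to unsigned.
function toUnsigned64(signed) {
  // BigInt() accepts a BigInt or a decimal string, including a leading "-".
  return BigInt.asUintN(64, BigInt(signed));
}

console.log(toUnsigned64('-1541654101129638711').toString()); // 16905089972579912905
console.log(toUnsigned64('2794438866806483259').toString());  // 2794438866806483259
```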