Node ID return type int -> uint64

Question

baijum opened this issue 9 years ago · comments

Is it possible to change the return type of the Node ID method from int to uint64? The size of type int depends on the architecture, whereas uint64 would be more portable.

Side note: Hash algorithms like SipHash return uint64 by default ( https://github.com/dchest/siphash ). I am thinking of using that if you change the return type.

Dan Kortschak · Answer 1 · Wed Jul 01 2015 05:04:59 GMT+0800 (China Standard Time)

I would rather not. Negative node IDs can be useful (e.g. 2-SAT problem) and I'm not convinced by the portability claim - the source is portable and we don't currently have any binary graph representation.
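To illustrate why negative node IDs can be convenient for 2-SAT, here is a minimal sketch. It assumes one common encoding (not confirmed by this thread): variable i is node +i and its negation is node -i, so each clause (a OR b) contributes the implication edges (not a -> b) and (not b -> a). The function name implications is hypothetical and not part of any library.

```go
package main

import "fmt"

// implications returns the two directed edges that encode the clause (a OR b)
// in a 2-SAT implication graph. Assumed encoding: variable i is node +i and
// its negation is node -i, so negating a literal is just a sign flip.
func implications(a, b int) [][2]int {
	return [][2]int{
		{-a, b}, // not a implies b
		{-b, a}, // not b implies a
	}
}

func main() {
	// Clause (x1 OR not x2): x1 is node 1, "not x2" is node -2.
	for _, e := range implications(1, -2) {
		fmt.Printf("%d -> %d\n", e[0], e[1])
	}
}
```

With an unsigned ID type, the negated literal would need a separate mapping (for example 2i and 2i+1) instead of a sign flip.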

Can you explain how you want to use SipHash and how that is relevant here?

Baiju Muthukadan · Answer 2 · Wed Jul 01 2015 12:29:20 GMT+0800 (China Standard Time)

Well, I am very new to graphs. Right now I am trying to solve a dependency analysis problem using a DAG.

As per the spec, the size of int is implementation-specific. So, if the IDs are retrieved from a persistent store, there is a chance of getting the "constant NNN overflows int" error.
https://golang.org/ref/spec#Numeric_types
If negative numbers are really required, why not use int32 or int64?
(BTW, int64 is supported on 32 bit platforms: https://groups.google.com/forum/#!topic/golang-nuts/mtnn-01Dh_I )
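A minimal sketch of guarding against that overflow when loading IDs, assuming they are stored as int64. Note that "constant NNN overflows int" is a compile-time error for constants; a value converted at run time is silently truncated instead, so an explicit check is needed. The helper name toNodeID is hypothetical.

```go
package main

import "fmt"

// toNodeID converts an ID read from storage (assumed to be an int64) into an
// int. It returns an error if the value does not survive the round trip,
// which can only happen where int is narrower than 64 bits.
func toNodeID(stored int64) (int, error) {
	id := int(stored) // truncates silently on 32-bit platforms
	if int64(id) != stored {
		return 0, fmt.Errorf("node ID %d overflows int on this platform", stored)
	}
	return id, nil
}

func main() {
	id, err := toNodeID(42)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("node ID:", id)
}
```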

Regarding SipHash, I was thinking I could create unique IDs for my nodes from certain string parameters using that hash. Maybe this design is not really required. I am at an early stage of my development.

Dan Kortschak · Answer 3 · Wed Jul 01 2015 12:53:30 GMT+0800 (China Standard Time)

If you are writing code that you expect to create persistent data and you intend to share that between 32- and 64-bit archs, then you need to ensure that your node IDs fit within 32 bits. How you serialise the node data is up to you, but there are plenty of arch-agnostic methods available if you follow the above restriction.

Why not the fixed-size ints? Because int matches the size of data store available for the arch without doubling the size of the ID needed (for some graphs this may be a significant burden).

Using a hash to generate IDs seems foolhardy. How do you know that you will not get a hash collision? Sure, the probability is low, but why not just make a pool of IDs that you can draw from and keep those as well as whatever data you are storing in the node? I'm also wondering why you want to use a cryptographic hash for this.
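A minimal sketch of the ID pool idea, assuming nodes are identified by a string key (for example a package name in a dependency graph). The type idPool and its methods are hypothetical names, not part of the graph library.

```go
package main

import "fmt"

// idPool hands out sequential node IDs and remembers which ID belongs to
// which key, so distinct keys can never collide.
type idPool struct {
	next int
	ids  map[string]int
}

// newIDPool returns an empty pool whose first allocated ID is 0.
func newIDPool() *idPool {
	return &idPool{ids: make(map[string]int)}
}

// idFor returns the ID for key, allocating a fresh one the first time the
// key is seen.
func (p *idPool) idFor(key string) int {
	if id, ok := p.ids[key]; ok {
		return id
	}
	id := p.next
	p.next++
	p.ids[key] = id
	return id
}

func main() {
	p := newIDPool()
	fmt.Println(p.idFor("pkg/a"), p.idFor("pkg/b"), p.idFor("pkg/a")) // prints 0 1 0
}
```

Unlike hashing, this needs the key-to-ID map to be kept (or persisted) alongside the graph, but distinct keys are guaranteed distinct IDs.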

I really can't see a good justification for making this change.

Baiju Muthukadan · Answer 4 · Wed Jul 01 2015 13:16:20 GMT+0800 (China Standard Time)

Thanks for the feedback, you can close this issue.

BTW, SipHash is not a cryptographic hash: https://131002.net/siphash/ (Users include Python, Ruby, JRuby, Rust, Redis etc.)

Dan Kortschak · Answer 5 · Wed Jul 01 2015 13:20:13 GMT+0800 (China Standard Time)

It's a fast cryptographic hash. Yes, it's used as a hashing system for many languages' associative arrays, but this is principally because it prevents the kind of things a crypto hash prevents, just faster.