suzaku-io / boopickle

Binary serialization library for efficient network communication

Add binary format [version | hash | digest] constant accessible at runtime.

sumnulu opened this issue.

Conveniently, BooPickle can generate picklers for sealed traits using composite picklers.

However, incompatible binary formats throw java.lang.IllegalStateException: Index NN is not defined in this CompositePickler

It would be nice to have a macro-generated digest constant for checking whether the other node's binary format is compatible with the current one, so that versions could be compared before serialization or deserialization.

This binary incompatibility is a design choice in BooPickle to keep overhead as low as possible. It's best to add versioning information to the binary stream before BooPickle's data and act on it accordingly. You can do this easily by reusing the same PickleState: pickle the version information first, then the actual data.

For an example use case, check out Arteria's channel protocol system in https://github.com/suzaku-io/arteria/blob/master/arteria-core/shared/src/main/scala/arteria/core/MessageRouter.scala
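A minimal sketch of that idea, assuming the pickled payload is treated as an opaque ByteBuffer (for example the result of BooPickle's Pickle.intoBytes) and that the protocol version is a hand-maintained Int. The names VersionedFrame and ProtocolVersion are hypothetical. Instead of pickling the version into the same PickleState, this variant writes it with plain java.nio calls, so the check does not depend on BooPickle at all:

```scala
import java.nio.ByteBuffer

object VersionedFrame {
  // Hypothetical, hand-maintained protocol version; bump it whenever the message types change.
  final val ProtocolVersion = 3

  // Produces [version: Int][payload bytes], positioned for reading.
  def wrap(payload: ByteBuffer, version: Int = ProtocolVersion): ByteBuffer = {
    val body = payload.duplicate() // leave the caller's buffer position untouched
    val out  = ByteBuffer.allocate(4 + body.remaining())
    out.putInt(version)
    out.put(body)
    out.flip()
    out
  }

  // Reads the version header and returns only the payload if the version matches.
  def unwrap(frame: ByteBuffer, expected: Int = ProtocolVersion): Either[String, ByteBuffer] = {
    val in = frame.duplicate()
    if (in.remaining() < 4) Left("frame too short for a version header")
    else {
      val version = in.getInt()
      if (version == expected) Right(in.slice())
      else Left(s"unsupported protocol version $version, expected $expected")
    }
  }
}
```

With this framing, a receiver can reject a mismatched frame before calling Unpickle at all, rather than failing inside a CompositePickler.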

Yep, I understand, and I agree that it is a good choice. What I mean is something like this:

val someSealedTrait_picklerHash = Unpickle[SomeSealedTrait].getHash // proposed: a macro-generated constant

I could then send someSealedTrait_picklerHash out of band to the other node right after the connection is established. If both hashes match, we can start sending actual messages.
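The proposed handshake could look roughly like the following sketch. It assumes the hash is already available as a String (the getHash macro above is a proposal, not part of BooPickle) and leaves the transport out; HashHandshake and its methods are hypothetical names:

```scala
// Hypothetical connection-level gate: each side sends its protocol hash once,
// and application messages are allowed only after the peer's hash matches.
final class HashHandshake(localHash: String) {
  // None = still waiting for the peer, Some(true) = compatible, Some(false) = mismatch
  private var state: Option[Boolean] = None

  // The first frame to send right after the connection is established.
  def hello: String = localHash

  // Called with the peer's hash; returns true if both sides agree.
  def onRemoteHash(remoteHash: String): Boolean = {
    val ok = remoteHash == localHash
    state = Some(ok)
    ok
  }

  // Only true after a matching hash has been received.
  def canSendMessages: Boolean = state.contains(true)
}
```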

The hash can be as simple as the following (statically generated at compile time):

// pseudocode inside a macro: Digest and findConcreteTypes are placeholders
final val hash = Digest(
  findConcreteTypes[SomeSealedTrait].map(showCode).mkString
)
// some 128-bit string
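A runtime approximation of that digest, assuming the macro has already produced one descriptor string per concrete type (for example via showCode). Here the descriptors are simply passed in as strings, and ProtocolDigest is a hypothetical name. MD5 is used only because it yields the 128 bits mentioned above; it is not meant as a security measure:

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object ProtocolDigest {
  // Hashes the descriptors of all concrete types into a 32-character hex string (128 bits).
  def digest(descriptors: Seq[String]): String = {
    // Join with a separator so ("ab", "c") and ("a", "bc") do not collide;
    // keep the given order, since reordering subtypes could change composite pickler indices.
    val canonical = descriptors.mkString("\n")
    val bytes = MessageDigest.getInstance("MD5").digest(canonical.getBytes(StandardCharsets.UTF_8))
    bytes.map(b => f"${b & 0xff}%02x").mkString
  }
}
```

Using a separator instead of plain concatenation is the main deviation from the fold (_ + _) idea, which would give the same string for different splits of the same characters.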

We could then get that hash from something like this:

val myPickler = boopickle.DefaultBasic.PicklerGenerator.generatePickler[SomeSealedTrait]
final val myAutoGeneratedProtocolVersion = myPickler.hash // proposed member, not an existing Pickler method

This way there won't be any runtime overhead.

Does that make sense?

I still doubt this kind of "versioning" would be very useful in practice. For example, if you had case class A(x: Int, y: Int) extends T and later changed it to case class A(y: Int, x: Int) extends T, it would be "binary compatible" from BooPickle's point of view, but old data would still be decoded wrongly, with x and y swapped. Or, if you included field names in the digest, renaming a field would produce a different hash even though the change is actually binary compatible. The same goes for changing a Seq[X] to a List[X], and so on.
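The objection can be made concrete with two hypothetical fingerprint functions over a simplified model of a case class (an ordered list of field names and types). Shape, Fingerprints and Examples are illustrative names, and the fingerprint strings stand in for what a digest would hash. Hashing only the types misses the x/y swap, while hashing names and types flags a harmless rename:

```scala
// Hypothetical model of a case class as a digest macro might see it: ordered (name, type) pairs.
final case class Shape(name: String, fields: List[(String, String)])

object Fingerprints {
  // Field types only, in declaration order: roughly what determines the binary layout.
  def typesOnly(s: Shape): String =
    s.name + s.fields.map(_._2).mkString("(", ",", ")")

  // Names and types: catches reordering, but is also sensitive to renames.
  def namesAndTypes(s: Shape): String =
    s.name + s.fields.map { case (n, t) => s"$n:$t" }.mkString("(", ",", ")")
}

object Examples {
  val original = Shape("A", List("x" -> "Int", "y" -> "Int"))
  val swapped  = Shape("A", List("y" -> "Int", "x" -> "Int")) // same layout, different meaning
  val renamed  = Shape("A", List("x" -> "Int", "z" -> "Int")) // different name, same layout
}
```

Neither fingerprint matches the programmer's intent in both cases, which is the point being made above.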

It's impossible to automatically infer the programmer's intent behind such changes. That's why it's best to leave versioning decisions to the programmer :)

You have just described what is wrong with the JVM-generated serialVersionUID.
But... IMO auto-generated versions are not bad if you have:

  • a fast client update cycle (web/Scala.js)
    • most of the clients will eventually reload the page
    • or the app can be reloaded programmatically
    • or even hot-reloaded
  • serialized objects that are ephemeral
  • a homogeneous platform (in this case, just the compiler macro) to generate the version number.

On the other hand, protocols should not change frequently, and perhaps they deserve explicit versioning by the developer. For now I will go with explicit versioning (simpler than a PR). Thanks for the lib and the discussion.
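A sketch of the explicit alternative, assuming the developer bumps a version constant by hand on every protocol change and decides which older versions the current code can still decode. The object, constants, and Decision values are hypothetical; UpgradeClient corresponds to the reload strategy from the list above for Scala.js clients:

```scala
// Hypothetical explicit versioning policy maintained by the developer.
object ExplicitProtocolVersion {
  final val Current = 4
  final val OldestSupported = 3 // assumption: v3 messages can still be decoded

  sealed trait Decision
  case object Accept extends Decision        // versions in [OldestSupported, Current]
  case object UpgradeClient extends Decision // peer is too old, e.g. ask it to reload
  case object Reject extends Decision        // peer is newer than this node understands

  def decide(remote: Int): Decision =
    if (remote >= OldestSupported && remote <= Current) Accept
    else if (remote < OldestSupported) UpgradeClient
    else Reject
}
```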