ahamez / protox

A fast, easy to use and 100% conformant Elixir library for Google Protocol Buffers (aka protobuf)

[BUG] The code generated is not deterministic

ananthakumaran opened this issue · comments

We recently switched from the macro to mix protox.generate to improve compile time. One thing we noticed is that the code generated by the command is not deterministic. Even when the source proto files are unchanged, the order of the key/value pairs inside the defs_by_name() function changes. This only happens for proto files with many fields (> 50 fields) and when there are changes in unrelated areas (for example, after checking out a different branch). I can't share my project's proto file here, but if you have difficulty reproducing the issue, let me know.

     def(defs_by_name()) do
       %{
        ...

Thanks for the report! I'll try to reproduce this tomorrow.

At first I could not reproduce this 😓. Update: I've been able to reproduce this! See next comment.

I did the following:

  • I've generated a protobuf message with many fields using the following code:
    File.write(
       "big_message.proto",
       ["syntax = \"proto3\";\nmessage BigMessage {\n", Enum.map((1..100), fn x -> "int32 a_#{x} = #{x};\n" end), "}\n"]
    )
  • I then generated the Elixir code with:
    MIX_ENV=prod mix protox.generate --output-path=./big_message.ex ./big_message.proto

However, the result of the code generation doesn't change 🤔.
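One way to check whether two generation runs produced the same file, including runs done on different machines, is to compare content digests. The following is a minimal sketch; the module OutputCheck and its functions are hypothetical helpers for illustration and are not part of protox.

```elixir
defmodule OutputCheck do
  # Hypothetical helper (not part of protox): hex-encoded SHA-256 digest of a file.
  def digest(path) do
    :crypto.hash(:sha256, File.read!(path)) |> Base.encode16(case: :lower)
  end

  # True when both files are byte-for-byte identical.
  def same?(path_a, path_b), do: digest(path_a) == digest(path_b)
end
```

Printing OutputCheck.digest/1 for the generated file on each machine, or calling OutputCheck.same?/2 on two local copies, shows whether the output changed between runs.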

I guess this is related to the fact that maps with more than 32 entries are implemented as a hash array mapped trie, as per this SO answer. Maybe the hash function is not deterministic across different BEAM VM instances…
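The ordering concern can be illustrated with a small sketch. It assumes only the documented behaviour that Elixir maps do not guarantee any key order; the field names mirror the big_message.proto example, and the stable function is a hypothetical helper, not something protox provides. The script prints whether each map happens to iterate in sorted order, then shows that sorting the entries yields an order that depends only on the data.

```elixir
# Maps with up to 32 keys and maps with more keys use different internal
# representations, so their iteration order can differ.
small = Map.new(1..32, fn i -> {:"a_#{i}", i} end)
large = Map.new(1..100, fn i -> {:"a_#{i}", i} end)

# Iteration order is an implementation detail and should not be relied on.
IO.inspect(Map.keys(small) == Enum.sort(Map.keys(small)), label: "small map keys sorted?")
IO.inspect(Map.keys(large) == Enum.sort(Map.keys(large)), label: "large map keys sorted?")

# Hypothetical helper: sorting the entries gives an order independent of
# how the map was built or which VM built it.
stable = fn map -> map |> Map.to_list() |> Enum.sort() end

# Rebuild the same map from shuffled entries; the sorted views are identical.
shuffled = large |> Map.to_list() |> Enum.shuffle() |> Map.new()
IO.inspect(stable.(large) == stable.(shuffled), label: "stable order matches?")
```

Any text produced by walking a large map directly inherits whatever order the VM happens to use, which is consistent with the differences reported in this issue.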

If that's the case, I think the only fix, if you don't use defs_by_name(), is to make its generation optional (or even remove it completely, as it's now deprecated).

So, as I suspected, the order changes when generating code using different VMs, one running on macOS and one on Linux:

[Image: Screenshot 2021-12-10 at 22 25 29]

The question is now: do you need defs_by_name()?

Thanks for taking the effort to reproduce this. I got the issue with the same VM (Linux), but not on consecutive runs. It usually happens when I switch branches or regenerate the code after making changes to unrelated parts, though I never spent the time to figure out a step-by-step recipe. I suspected the unordered nature of the map type and the different implementations based on map size.

The question is now: do you need defs_by_name()?

No, we don't depend on this. Either a flag to disable the generation of the function or the removal of the function is ok for me.

Great! I'll take care of this in a few days.

And it's now fixed in 1.6.6!
You can use the option generate-defs-funs to deactivate the generation of defs_by_name:

MIX_ENV=prod mix protox.generate --generate-defs-funs=false --output-path=./big_message.ex ./big_message.proto