Python SimpleSparkSerializer: support protobuf as a serialization format

Question

austinzh opened this issue 2 years ago · comments

SimpleSparkSerializer uses the JSON format by default.
But for models like Word2Vec, the serialized model can be huge, and decoding the JSON with the current design causes a heap OOM.
To ease this problem, a protobuf format would give a smaller model size and better read/write performance with lower memory consumption.
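
To give a rough sense of the size difference, here is a minimal sketch using only the Python standard library. It does not use MLeap or protobuf: `struct` packing of 8-byte doubles stands in for how protobuf stores a packed `repeated double` field, and the 1000-element random vector is a made-up stand-in for model weights. The function names are hypothetical.

```python
import json
import random
import struct


def encode_json(vector):
    # Text encoding: every double becomes a decimal string plus separators.
    return json.dumps(vector).encode("utf-8")


def encode_packed_doubles(vector):
    # Binary encoding: fixed 8 bytes per value, little-endian.
    # This is only a stand-in for a protobuf packed repeated double field.
    return struct.pack("<%dd" % len(vector), *vector)


def decode_packed_doubles(data):
    # Inverse of encode_packed_doubles; the round trip is exact for doubles.
    return list(struct.unpack("<%dd" % (len(data) // 8), data))


random.seed(0)
# Hypothetical stand-in for one block of model weights.
vector = [random.uniform(-1.0, 1.0) for _ in range(1000)]

json_bytes = encode_json(vector)
binary_bytes = encode_packed_doubles(vector)

print("json bytes:  ", len(json_bytes))
print("binary bytes:", len(binary_bytes))
```

The binary form is always 8 bytes per value, while the JSON text for a full-precision double is typically more than twice that. This only illustrates the encoding size; the heap pressure during decoding depends on how the reader is implemented.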