Python SimpleSparkSerializer: support protobuf as a serialization format
austinzh opened this issue · comments
austinzh commented
SimpleSparkSerializer uses the default JSON format.
But for models like Word2Vec, the serialized model can be huge, and decoding the JSON with the current design causes a heap OOM.
To ease this problem, the protobuf format would give a smaller model size and better read/write performance with less memory consumption.
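
To give a rough sense of the size difference, here is a minimal sketch in plain Python. It is not MLeap code and does not use protobuf itself: the `struct` module stands in for a binary encoding with fixed 8-byte doubles (which is also how protobuf stores packed `double` fields), and the 1000 x 100 matrix of random values is a made-up stand-in for Word2Vec vectors. The function names are hypothetical.

```python
import json
import random
import struct


def encode_json(vectors):
    """Serialize a list of float vectors as UTF-8 JSON text."""
    return json.dumps(vectors).encode("utf-8")


def encode_binary(vectors):
    """Serialize float vectors as a small header plus raw 8-byte doubles.

    Hypothetical layout for illustration only: two little-endian uint32
    values (rows, cols) followed by rows * cols little-endian doubles.
    """
    rows = len(vectors)
    cols = len(vectors[0]) if rows else 0
    out = bytearray(struct.pack("<II", rows, cols))
    for row in vectors:
        out += struct.pack("<%dd" % cols, *row)
    return bytes(out)


def decode_binary(payload):
    """Inverse of encode_binary; reads one row at a time."""
    rows, cols = struct.unpack_from("<II", payload, 0)
    offset = 8  # size of the two uint32 header fields
    vectors = []
    for _ in range(rows):
        vectors.append(list(struct.unpack_from("<%dd" % cols, payload, offset)))
        offset += 8 * cols
    return vectors


# Made-up data standing in for word vectors: 1000 words, 100 dimensions.
random.seed(0)
vectors = [[random.uniform(-1, 1) for _ in range(100)] for _ in range(1000)]

json_bytes = encode_json(vectors)
binary_bytes = encode_binary(vectors)

print("JSON bytes:  ", len(json_bytes))
print("binary bytes:", len(binary_bytes))
```

The binary payload is exactly 8 bytes per value plus the header, while JSON spends one byte per decimal digit plus separators, and a JSON decoder typically has to hold the text and the parsed structure in memory at the same time.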