google / seq2seq

A general-purpose encoder-decoder framework for TensorFlow

Home Page: https://google.github.io/seq2seq/

Freezing the graph + optimizing + quantizing a model for inference purposes

walmsley opened this issue

Has anyone had any success with:
a) freezing the graph of your seq2seq model,
b) optimizing it for inference, e.g. using optimize_for_inference_lib.optimize_for_inference(...) as in https://gist.github.com/omimo/5d393ed5b64d2ca0c591e4da04af6009, and
c) quantizing it, e.g. following https://www.tensorflow.org/performance/quantization? (Sketches of what I'm attempting for each step are below.)
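For (a) and (b), this is the generic TF 1.x recipe I've been trying to adapt. It's only a sketch: the checkpoint path and the node names (source_tokens, source_len, predicted_tokens) are placeholder guesses rather than names I've confirmed in this library's graph, and locating the real input/output tensors is part of what I'm stuck on.

```python
import tensorflow as tf
from tensorflow.python.framework import graph_util
from tensorflow.python.tools import optimize_for_inference_lib

# Placeholders -- replace with your real checkpoint path and tensor names.
CHECKPOINT = "/path/to/model.ckpt"
INPUT_NODES = ["source_tokens", "source_len"]
OUTPUT_NODES = ["predicted_tokens"]

with tf.Session() as sess:
    saver = tf.train.import_meta_graph(CHECKPOINT + ".meta")
    saver.restore(sess, CHECKPOINT)

    # (a) Freeze: bake all variable values into constants in the GraphDef.
    frozen = graph_util.convert_variables_to_constants(
        sess, sess.graph_def, OUTPUT_NODES)

# (b) Strip training-only nodes and fold ops for inference.
# The placeholder dtype is a guess; seq2seq inputs may not be float32.
optimized = optimize_for_inference_lib.optimize_for_inference(
    frozen, INPUT_NODES, OUTPUT_NODES, tf.float32.as_datatype_enum)

with tf.gfile.GFile("frozen_optimized.pb", "wb") as f:
    f.write(optimized.SerializeToString())
```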

I have a fully trained seq2seq model, but I need it to use less GPU memory. I was happy to discover that TensorFlow tools already exist for optimizing and quantizing a model, but this seq2seq library is complex enough that I'm having difficulty understanding how to apply those optimization scripts to it.
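For (c), this is the weight-quantization step I plan to try once freezing works. Again just a sketch with the same placeholder node names: TransformGraph is the Python wrapper around the graph_transforms tool that the quantization guide describes, and as I understand it, quantize_weights stores the weight constants as 8-bit values, which should mainly shrink the weights' memory footprint.

```python
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

# Load the frozen, optimized graph produced by the sketch above.
graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_optimized.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# (c) Quantize weight constants to 8 bits (same placeholder node names).
quantized = TransformGraph(
    graph_def,
    inputs=["source_tokens", "source_len"],
    outputs=["predicted_tokens"],
    transforms=["quantize_weights"])

with tf.gfile.GFile("quantized.pb", "wb") as f:
    f.write(quantized.SerializeToString())
```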

Thanks! I'd love to chat offline with anyone who has expertise with this library, and I'll post updates if I get this working.