Knowledge Distillation (Paper)
Knowledge distillation is a form of transfer learning in which a smaller model learns from a larger pretrained model
The learning process of knowledge distillation resembles how human beings learn
Students learn by watching and imitating what the teacher does
If the teacher is highly capable, the student can become more capable as well
Conversely, if the teacher lacks ability, the student cannot achieve good performance either
We implemented the response-based offline distillation described in the paper
The student model is trained to reproduce the teacher model's predictions, i.e. the teacher's outputs serve as the training targets
It does not matter whether the model being distilled is a classification model, an object detection model, or an RNN
No preprocessing logic is needed to build the training target tensors, because the teacher generates them
All you need is a teacher model and a student model that share the same input and output shapes
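
The loop below is a minimal sketch of this idea in PyTorch, not the repo's actual code: `teacher`, `student`, and `loader` are placeholder names for a frozen pretrained model, the model to be trained, and an iterable of input batches. The frozen teacher's predictions are used directly as the regression targets for the student.

```python
import torch
import torch.nn as nn

def distill(teacher: nn.Module,
            student: nn.Module,
            loader,                  # assumed to yield input tensors
            epochs: int = 5,
            lr: float = 1e-3,
            device: str = "cpu"):
    teacher.to(device).eval()        # pretrained teacher stays frozen
    student.to(device).train()
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    criterion = nn.MSELoss()         # student output should match teacher output

    for _ in range(epochs):
        for x in loader:
            x = x.to(device)
            with torch.no_grad():    # offline: teacher predictions become the targets
                target = teacher(x)
            pred = student(x)
            loss = criterion(pred, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

Because the loss only compares the two models' outputs, the same loop works for any task as long as the student's input and output shapes match the teacher's.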