
Knowledge Distillation

PyTorch implementations of algorithms for knowledge distillation.
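
For orientation, the canonical knowledge distillation objective combines a temperature-softened copy of the teacher's output distribution with the usual cross-entropy on hard labels. The sketch below is a minimal PyTorch illustration of that objective only; the names distillation_loss, temperature and alpha are illustrative and do not necessarily match the code in this repository.

import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    # Soften both distributions with the temperature and compare them with KL divergence.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

    # Standard cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    # Blend the two terms; alpha controls how much weight the teacher gets.
    return alpha * kd + (1.0 - alpha) * ce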

Setup

build

$ docker build -t kd -f Dockerfile .

run

$ docker run -v local_data_path:/data -v project_path:/app -p 0.0.0.0:8084:8084 -it kd

Experiments

  1. Task-specific distillation from BERT into a BiLSTM student. Data: SST-2 binary sentiment classification (see the sketch below).
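
A minimal sketch of how this experiment can be set up, following Tang et al. (paper 3 below): a single-layer BiLSTM student is trained to regress the logits of a BERT teacher fine-tuned on SST-2, via an MSE term added to the usual cross-entropy. The class and variable names here (BiLSTMStudent, distillation_step, teacher_logits) are illustrative and do not necessarily match this repository's code.

import torch
import torch.nn as nn
import torch.nn.functional as F


class BiLSTMStudent(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 300,
                 hidden_dim: int = 150, num_classes: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(token_ids)             # (batch, seq, emb)
        _, (h_n, _) = self.bilstm(embedded)              # h_n: (2, batch, hidden)
        sentence = torch.cat([h_n[0], h_n[1]], dim=-1)   # concat both directions
        return self.classifier(sentence)                 # (batch, num_classes)


def distillation_step(student: nn.Module,
                      token_ids: torch.Tensor,
                      labels: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      alpha: float = 0.5) -> torch.Tensor:
    # MSE against the teacher's logits plus cross-entropy on the hard labels.
    student_logits = student(token_ids)
    mse = F.mse_loss(student_logits, teacher_logits)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * mse + (1.0 - alpha) * ce


# Toy usage with random tensors; in practice teacher_logits come from a
# BERT model fine-tuned on SST-2.
if __name__ == "__main__":
    student = BiLSTMStudent(vocab_size=30522)
    tokens = torch.randint(1, 30522, (8, 32))
    labels = torch.randint(0, 2, (8,))
    teacher_logits = torch.randn(8, 2)
    loss = distillation_step(student, tokens, labels, teacher_logits)
    loss.backward()
    print(loss.item())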

Papers

  1. Cristian Bucila, Rich Caruana, Alexandru Niculescu-Mizil "Model Compression" (2006).

  2. Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter" (2019) https://arxiv.org/abs/1910.01108.

  3. Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin "Distilling Task-Specific Knowledge from BERT into Simple Neural Networks" (2019) https://arxiv.org/abs/1903.12136.

  4. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations" (2019) https://arxiv.org/abs/1909.11942.

  5. Rafael Müller, Simon Kornblith, Geoffrey Hinton "Subclass Distillation" (2020) https://arxiv.org/abs/2002.03936.

  6. Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova "Well-Read Students Learn Better: On the Importance of Pre-training Compact Models" (2019) https://arxiv.org/abs/1908.08962.


License

MIT License

