- Model Compression, Cristian Bucilă, Rich Caruana, Alexandru Niculescu-Mizil, 2006
- Distilling the Knowledge in a Neural Network, Geoffrey Hinton, Oriol Vinyals, Jeff Dean, 2015
- Knowledge Acquisition from Examples Via Multiple Models, Pedro Domingos, 1997
- Combining labeled and unlabeled data with co-training, A. Blum, T. Mitchell, 1998
- Using A Neural Network to Approximate An Ensemble of Classifiers, Xinchuan Zeng and Tony R. Martinez, 2000
- Do Deep Nets Really Need to be Deep?, Lei Jimmy Ba, Rich Caruana, 2014
- FitNets: Hints for Thin Deep Nets, Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio, 2015
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, Sergey Zagoruyko, Nikos Komodakis, 2016
- A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning, Junho Yim, Donggyu Joo, Jihoon Bae, Junmo Kim, 2017
- Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks, Zheng Xu, Yen-Chang Hsu, Jiawei Huang, 2017
- Born Again Neural Networks, Tommaso Furlanello, Zachary C. Lipton, Michael Tschannen, Laurent Itti, Anima Anandkumar, 2018
- Net2Net: Accelerating Learning Via Knowledge Transfer, Tianqi Chen, Ian Goodfellow, Jonathon Shlens, 2016
- Unifying distillation and privileged information, David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, Vladimir Vapnik, 2015
- Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks, Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, Ananthram Swami, 2016
- Large scale distributed neural network training through online distillation, Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormandi, George E. Dahl, Geoffrey E. Hinton, 2018
- Deep Mutual Learning, Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu, 2017
- Learning Loss for Knowledge Distillation with Conditional Adversarial Networks, Zheng Xu, Yen-Chang Hsu, Jiawei Huang, 2017
- Data-Free Knowledge Distillation for Deep Neural Networks, Raphael Gontijo Lopes, Stefano Fenu, Thad Starner, 2017
- Quantization Mimic: Towards Very Tiny CNN for Object Detection, Yi Wei, Xinyu Pan, Hongwei Qin, Wanli Ouyang, Junjie Yan, 2018
- Knowledge Projection for Deep Neural Networks, Zhi Zhang, Guanghan Ning, Zhihai He, 2017
- Moonshine: Distilling with Cheap Convolutions, Elliot J. Crowley, Gavin Gray, Amos Storkey, 2017
- Training a Binary Weight Object Detector by Knowledge Transfer for Autonomous Driving, Jiaolong Xu, Peng Wang, Heng Yang and Antonio M. López, 2018
- Rocket Launching: A Universal and Efficient Framework for Training Well-performing Light Net, Guorui Zhou, Ying Fan, Runpeng Cui, Weijie Bian, Xiaoqiang Zhu, Kun Gai, 2017
- Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher, Seyed-Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Hassan Ghasemzadeh, 2019
- ResKD: Residual-Guided Knowledge Distillation, Xuewei Li, Songyuan Li, Bourahla Omar, and Xi Li, 2020
- Rethinking Data Augmentation: Self-Supervision and Self-Distillation, Hankook Lee, Sung Ju Hwang, Jinwoo Shin, 2019
- MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks, Yunteng Luan, Hanyu Zhao, Zhi Yang, Yafei Dai, 2019
- Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation, Linfeng Zhang, Jiebo Song, Anni Gao, Jingwei Chen, Chenglong Bao, Kaisheng Ma, 2019
- Cross Modal Distillation for Supervision Transfer, Saurabh Gupta, Judy Hoffman, Jitendra Malik, CVPR 2016
- Deep Model Compression: Distilling Knowledge from Noisy Teachers, Bharat Bhusan Sau, Vineeth N. Balasubramanian, 2016
- Knowledge Distillation for Small-footprint Highway Networks, Liang Lu, Michelle Guo, Steve Renals, 2016
- Sequence-Level Knowledge Distillation, Yoon Kim, Alexander M. Rush, 2016
- Recurrent Neural Network Training with Dark Knowledge Transfer, Zhiyuan Tang, Dong Wang, Zhiyong Zhang, 2016
- Face Model Compression by Distilling Knowledge from Neurons, Ping Luo, Zhenyao Zhu, Ziwei Liu, Xiaogang Wang, and Xiaoou Tang, 2016
- Data Distillation: Towards Omni-Supervised Learning, Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He, CVPR 2018
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer, Zehao Huang, Naiyan Wang, 2017
- DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang, 2017
- Adapting Models to Signal Degradation using Distillation, Jong-Chyi Su, Subhransu Maji, BMVC 2017
- Learning Global Additive Explanations for Neural Nets Using Model Distillation, Sarah Tan, Rich Caruana, Giles Hooker, Paul Koch, Albert Gordo, 2018
- YASENN: Explaining Neural Networks via Partitioning Activation Sequences, Yaroslav Zharov, Denis Korzhenkov, Pavel Shvechikov, Alexander Tuzhilin, 2018
- Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results, Antti Tarvainen, Harri Valpola, 2017
- Local Affine Approximators for Improving Knowledge Transfer, Suraj Srinivas & François Fleuret, 2018
- Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit?, Shilin Zhu, Xin Dong, Hao Su, 2018
- Probabilistic Knowledge Transfer for deep representation learning, Nikolaos Passalis, Anastasios Tefas, 2018
- Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons, Byeongho Heo, Minsik Lee, Sangdoo Yun, Jin Young Choi, 2018
- Paraphrasing Complex Network: Network Compression via Factor Transfer, Jangho Kim, SeongUk Park, Nojun Kwak, NIPS, 2018
- KDGAN: Knowledge Distillation with Generative Adversarial Networks, Xiaojie Wang, Rui Zhang, Yu Sun, Jianzhong Qi, NeurIPS 2018
- Learning Efficient Detector with Semi-supervised Adaptive Distillation, Shitao Tang, Litong Feng, Zhanghui Kuang, Wenqi Shao, Quanquan Li, Wei Zhang, Yimin Chen, 2019
- Dataset Distillation, Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. Efros, 2019
- Relational Knowledge Distillation, Wonpyo Park, Dongju Kim, Yan Lu, Minsu Cho, 2019
- Knowledge Adaptation for Efficient Semantic Segmentation, Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan, 2019
- A Comprehensive Overhaul of Feature Distillation, Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyojin Park, Nojun Kwak, Jin Young Choi, 2019, code
- Towards Understanding Knowledge Distillation, Mary Phuong, Christoph Lampert, ICML, 2019
- Knowledge Distillation from Internal Representations, Gustavo Aguilar, Yuan Ling, Yu Zhang, Benjamin Yao, Xing Fan, Edward Guo, 2019
- Knowledge Flow: Improve Upon Your Teachers, Iou-Jen Liu, Jian Peng, Alexander G. Schwing, 2019
- Similarity-Preserving Knowledge Distillation, Frederick Tung, Greg Mori, 2019
- Correlation Congruence for Knowledge Distillation, Baoyun Peng, Xiao Jin, Jiaheng Liu, Shunfeng Zhou, Yichao Wu, Yu Liu, Dongsheng Li, Zhaoning Zhang, 2019
- Variational Information Distillation for Knowledge Transfer, Sungsoo Ahn, Shell Xu Hu, Andreas Damianou, Neil D. Lawrence, Zhenwen Dai, 2019
- Knowledge Distillation via Instance Relationship Graph, Yufan Liu, Jiajiong Cao, Bing Li, Chunfeng Yuan, Weiming Hu, Yangxi Li, Yunqiang Duan, CVPR 2019
- Structured Knowledge Distillation for Semantic Segmentation, Yifan Liu, Changyong Shu, Jingdong Wang, Chunhua Shen, CVPR 2019
- Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion, Hongxu Yin, Pavlo Molchanov, Zhizhong Li, Jose M. Alvarez, Arun Mallya, Derek Hoiem, Niraj K. Jha, Jan Kautz, 2020
- Reducing the Teacher-Student Gap via Spherical Knowledge Distillation, Jia Guo, Minghao Chen, Yao Hu, Chen Zhu, Xiaofei He, Deng Cai, 2020
- Data-Free Adversarial Distillation, Gongfan Fang, Jie Song, Chengchao Shen, Xinchao Wang, Da Chen, Mingli Song, 2020
- Contrastive Representation Distillation, Yonglong Tian, Dilip Krishnan, Phillip Isola, ICLR 2020, code
- StyleGAN2 Distillation for Feed-forward Image Manipulation, Yuri Viazovetskyi, Vladimir Ivashkin, and Evgeny Kashin, ECCV 2020
- Distilling Knowledge from Graph Convolutional Networks, Yiding Yang, Jiayan Qiu, Mingli Song, Dacheng Tao, Xinchao Wang, CVPR 2020
- Self-supervised Knowledge Distillation for Few-shot Learning, Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah, 2020, code
- Online Knowledge Distillation with Diverse Peers, Defang Chen, Jian-Ping Mei, Can Wang, Yan Feng and Chun Chen, AAAI, 2020
- Intra-class Feature Variation Distillation for Semantic Segmentation, Yukang Wang, Wei Zhou, Tao Jiang, Xiang Bai, and Yongchao Xu, ECCV 2020
- Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition, Xiaobo Wang, Tianyu Fu, Shengcai Liao, Shuo Wang, Zhen Lei, and Tao Mei, ECCV 2020
- Improving Face Recognition from Hard Samples via Distribution Distillation Loss, Yuge Huang, Pengcheng Shen, Ying Tai, Shaoxin Li, Xiaoming Liu, Jilin Li, Feiyue Huang, Rongrong Ji, ECCV 2020
- Dataset Distillation with Infinitely Wide Convolutional Networks, Timothy Nguyen, Roman Novak, Lechao Xiao, Jaehoon Lee, 2021
- Dataset Meta-Learning from Kernel Ridge-Regression, Timothy Nguyen, Zhourong Chen, Jaehoon Lee, 2021
- Up to 100× Faster Data-free Knowledge Distillation, Gongfan Fang, Kanya Mo, Xinchao Wang, Jie Song, Shitao Bei, Haofei Zhang, Mingli Song, 2021
- Robustness and Diversity Seeking Data-Free Knowledge Distillation, Pengchao Han, Jihong Park, Shiqiang Wang, Yejun Liu, 2021
- Data-Free Knowledge Transfer: A Survey, Yuang Liu, Wei Zhang, Jun Wang, Jianyong Wang, 2021
- Undistillable: Making A Nasty Teacher That CANNOT teach students, Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang, ICLR 2021
- QuPeD: Quantized Personalization via Distillation with Applications to Federated Learning, Kaan Ozkara, Navjot Singh, Deepesh Data, Suhas Diggavi, NeurIPS 2021
- KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation, Yongfei Liu, Chenfei Wu, Shao-yen Tseng, Vasudev Lal, Xuming He, Nan Duan, 2021
- Online Knowledge Distillation for Efficient Pose Estimation, Zheng Li, Jingwen Ye, Mingli Song, Ying Huang, Zhigeng Pan, ICCV 2021
- Does Knowledge Distillation Really Work?, Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson, NeurIPS 2021
- Hierarchical Self-supervised Augmented Knowledge Distillation, Chuanguang Yang, Zhulin An, Linhang Cai, Yongjun Xu, IJCAI 2021
- DarkGAN: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis With GANs, Javier Nistal, Stefan Lattner, Gaël Richard, ISMIR 2021
- On Self-Distilling Graph Neural Network, Yuzhao Chen, Yatao Bian, Xi Xiao, Yu Rong, Tingyang Xu, Junzhou Huang, IJCAI 2021
- Graph-Free Knowledge Distillation for Graph Neural Networks, Xiang Deng, Zhongfei Zhang, IJCAI 2021
- Self Supervision to Distillation for Long-Tailed Visual Recognition, Tianhao Li, Limin Wang, Gangshan Wu, ICCV 2021
- Cross-Layer Distillation with Semantic Calibration, Defang Chen, Jian-Ping Mei, Yuan Zhang, Can Wang, Zhe Wang, Yan Feng, Chun Chen, AAAI 2021
- Channel-wise Knowledge Distillation for Dense Prediction, Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, Chunhua Shen, ICCV 2021
- Training data-efficient image transformers & distillation through attention, Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou, ICML 2021
- Exploring Inter-Channel Correlation for Diversity-preserved Knowledge Distillation, Li Liu, Qingle Huang, Sihao Lin, Hongwei Xie, Bing Wang, Xiaojun Chang, Xiaodan Liang, ICCV 2021, code
- torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation, Yoshitomo Matsubara, International Workshop on Reproducible Research in Pattern Recognition 2021, code
- LGD: Label-guided Self-distillation for Object Detection, Peizhen Zhang, Zijian Kang, Tong Yang, Xiangyu Zhang, Nanning Zheng, Jian Sun, AAAI 2022
- MonoDistill: Learning Spatial Features for Monocular 3D Object Detection, Anonymous, ICLR 2022
- Bag of Instances Aggregation Boosts Self-supervised Distillation, Haohang Xu, Jiemin Fang, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian, ICLR 2022
- Meta Learning for Knowledge Distillation, Wangchunshu Zhou, Canwen Xu, Julian McAuley, 2022
- Focal and Global Knowledge Distillation for Detectors, Zhendong Yang, Zhe Li, Xiaohu Jiang, Yuan Gong, Zehuan Yuan, Danpei Zhao, Chun Yuan, CVPR 2022
- Self-Distilled StyleGAN: Towards Generation from Internet Photos, Ron Mokady, Michal Yarom, Omer Tov, Oran Lang, Daniel Cohen-Or, Tali Dekel, Michal Irani, Inbar Mosseri, 2022
- Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-guided Feature Imitation, Gang Li, Xiang Li, Yujie Wang, Shanshan Zhang, Yichao Wu, Ding Liang, AAAI 2022
- Decoupled Knowledge Distillation, Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, Jiajun Liang, CVPR 2022, code
- Graph Flow: Cross-layer Graph Flow Distillation for Dual-Efficient Medical Image Segmentation, Wenxuan Zou, Muyi Sun, 2022
- Dataset Distillation by Matching Training Trajectories, George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, Jun-Yan Zhu, CVPR 2022
- Knowledge Distillation with the Reused Teacher Classifier, Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, Chun Chen, CVPR 2022
- Self-Distillation from the Last Mini-Batch for Consistency Regularization, Yiqing Shen, Liwu Xu, Yuzhe Yang, Yaqian Li, Yandong Guo, CVPR 2022, code
- DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers, Xianing Chen, Qiong Cao, Yujie Zhong, Shenghua Gao, CVPR 2022
- Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning, Lin Zhang, Li Shen, Liang Ding, Dacheng Tao, Ling-Yu Duan, CVPR 2022
- LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection, Yi Wei, Zibu Wei, Yongming Rao, Jiaxin Li, Jiwen Lu, Jie Zhou, 2022
- Localization Distillation for Dense Object Detection, Zhaohui Zheng, Rongguang Ye, Ping Wang, Dongwei Ren, Wangmeng Zuo, Qibin Hou, Ming-Ming Cheng, CVPR 2022, code
- Localization Distillation for Object Detection, Zhaohui Zheng, Rongguang Ye, Qibin Hou, Dongwei Ren, Ping Wang, Wangmeng Zuo, Ming-Ming Cheng, 2022, code
- Cross-Image Relational Knowledge Distillation for Semantic Segmentation, Chuanguang Yang, Helong Zhou, Zhulin An, Xue Jiang, Yongjun Xu, Qian Zhang, CVPR 2022, code
- Knowledge distillation: A good teacher is patient and consistent, Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, Alexander Kolesnikov, CVPR 2022
- Spot-adaptive Knowledge Distillation, Jie Song, Ying Chen, Jingwen Ye, Mingli Song, TIP 2022, code
- MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning, Shiming Chen, Ziming Hong, Guo-Sen Xie, Wenhan Yang, Qinmu Peng, Kai Wang, Jian Zhao, Xinge You, CVPR 2022
- Knowledge Distillation via the Target-aware Transformer, Sihao Lin, Hongwei Xie, Bing Wang, Kaicheng Yu, Xiaojun Chang, Xiaodan Liang, Gang Wang, CVPR 2022
- PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection, Linfeng Zhang, Runpei Dong, Hung-Shuo Tai, Kaisheng Ma, arxiv 2022, code
- Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation, Linfeng Zhang, Xin Chen, Xiaobing Tu, Pengfei Wan, Ning Xu, Kaisheng Ma, CVPR 2022
- Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation, Yixuan Wei, Han Hu, Zhenda Xie, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo, Tech Report 2022, code
- BERT Learns to Teach: Knowledge Distillation with Meta Learning, Wangchunshu Zhou, Canwen Xu, Julian McAuley, ACL 2022, code
- Nearest Neighbor Knowledge Distillation for Neural Machine Translation, Zhixian Yang, Renliang Sun, Xiaojun Wan, NAACL 2022
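For orientation, most of the papers above build on the softened-softmax recipe from Distilling the Knowledge in a Neural Network (Hinton et al., 2015): the student matches the teacher's temperature-scaled class distribution while also fitting the hard labels. A minimal dependency-free sketch of that loss (variable names and the default `temperature`/`alpha` values are illustrative, not from any specific paper):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.9):
    """Weighted sum of (a) KL divergence between the softened teacher and
    student distributions and (b) cross-entropy on the hard label.
    The T^2 factor keeps soft-target gradient magnitudes comparable
    across temperatures, as suggested by Hinton et al. (2015)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft_loss = sum(pt * math.log(pt / ps)
                    for pt, ps in zip(p_teacher, p_student))
    hard_probs = softmax(student_logits)  # T = 1 for the hard-label term
    hard_loss = -math.log(hard_probs[true_label])
    return alpha * (temperature ** 2) * soft_loss + (1 - alpha) * hard_loss
```

When the student's logits equal the teacher's, the soft term vanishes; the many variants catalogued above differ mainly in what is matched (logits, intermediate features, attention maps, sample relations) rather than in this basic structure.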