Awesome Dataset Distillation Papers

https://github.com/Guang000/Awesome-Dataset-Distillation

Awesome-Dataset-Distillation

A curated list of awesome papers on dataset distillation and related applications, inspired by awesome-computer-vision.

Dataset distillation is the task of synthesizing a small dataset such that models trained on it achieve high performance on the original large dataset. A dataset distillation algorithm takes as input a large real dataset to be distilled (training set) and outputs a small synthetic distilled dataset. The distilled dataset is evaluated by training models on it and testing them on a separate real dataset (validation/test set). A good small distilled dataset is not only useful for understanding the dataset, but also has various applications (e.g., continual learning, privacy, and neural architecture search). This task was first introduced in the 2018 paper Dataset Distillation [Tongzhou Wang et al., '18], along with a proposed algorithm using backpropagation through optimization steps.
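To make "backpropagation through optimization steps" concrete, here is a minimal toy sketch in plain Python. It is not the algorithm from the paper: it assumes a one-parameter linear model, a single synthetic point, a single inner gradient step, and hand-derived gradients instead of automatic differentiation. All function names, learning rates, and the toy dataset are illustrative assumptions.

```python
# Toy sketch (assumptions: model f(x) = w * x, squared loss, one synthetic
# point (xs, ys), one inner SGD step from a fixed initial weight w0).

def inner_step(w0, xs, ys, lr):
    """Train on the synthetic point for one gradient step; return the new weight."""
    grad_w = 2.0 * xs * (w0 * xs - ys)  # d/dw (w * xs - ys)^2 evaluated at w0
    return w0 - lr * grad_w


def outer_loss_and_grads(w0, xs, ys, lr, real):
    """Loss of the trained weight on real data, and its gradients w.r.t. xs and ys."""
    w1 = inner_step(w0, xs, ys, lr)
    n = len(real)
    loss = sum((w1 * x - y) ** 2 for x, y in real) / n
    dloss_dw1 = sum(2.0 * x * (w1 * x - y) for x, y in real) / n
    # Chain rule through the inner step: w1 = w0 - lr * (2*w0*xs^2 - 2*xs*ys)
    dw1_dxs = -lr * (4.0 * w0 * xs - 2.0 * ys)
    dw1_dys = 2.0 * lr * xs
    return loss, dloss_dw1 * dw1_dxs, dloss_dw1 * dw1_dys


def distill(real, w0=0.0, inner_lr=0.1, outer_lr=0.05, steps=200):
    """Gradient descent on the synthetic point itself (the outer loop)."""
    xs, ys = 1.0, 1.0  # arbitrary initialization of the synthetic point
    for _ in range(steps):
        _, g_xs, g_ys = outer_loss_and_grads(w0, xs, ys, inner_lr, real)
        xs -= outer_lr * g_xs
        ys -= outer_lr * g_ys
    return xs, ys


# "Large" real dataset generated by y = 3x (tiny here, for illustration only).
real_data = [(x, 3.0 * x) for x in (-2.0, -1.0, 1.0, 2.0)]
xs, ys = distill(real_data)
w1 = inner_step(0.0, xs, ys, 0.1)
print(f"synthetic point: ({xs:.3f}, {ys:.3f}); weight after one step: {w1:.3f}")
```

In this toy setting, one gradient step on the single learned point moves the weight from 0 to approximately the true slope of 3. Practical methods replace the hand-derived chain rule with automatic differentiation and unroll many inner steps over neural networks.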

In recent years (2019-now), dataset distillation has gained increasing attention in the research community, across many institutes and labs, and more papers are being published each year. This research has steadily improved dataset distillation and explored its variants and applications.

This project is curated and maintained by Guang Li, Bo Zhao, and Tongzhou Wang.

How to submit a pull request?

🌐 Project Page
Code
📖 bibtex

Contents

Main
Applications

Media Coverage
Acknowledgments

Main

Dataset Distillation (Tongzhou Wang et al., 2018) 🌐 📖

Early Work

Gradient-Based Hyperparameter Optimization Through Reversible Learning (Dougal Maclaurin et al., ICML 2015) 📖

Gradient/Trajectory Matching Surrogate Objective

Dataset Condensation with Gradient Matching (Bo Zhao et al., ICLR 2021) 📖
Dataset Condensation with Differentiable Siamese Augmentation (Bo Zhao et al., ICML 2021) 📖
Dataset Distillation by Matching Training Trajectories (George Cazenavette et al., CVPR 2022) 🌐 📖
Dataset Condensation with Contrastive Signals (Saehyung Lee et al., ICML 2022) 📖
Delving into Effective Gradient Matching for Dataset Condensation (Zixuan Jiang et al., 2022) 📖

Distribution/Feature Matching Surrogate Objective

Dataset Condensation with Distribution Matching (Bo Zhao et al., 2021) 📖
CAFE: Learning to Condense Dataset by Aligning Features (Kai Wang & Bo Zhao et al., CVPR 2022) 📖
Dataset Factorization for Condensation (Songhua Liu et al., NeurIPS 2022) 📖

Better Optimization

Optimizing Millions of Hyperparameters by Implicit Differentiation (Jonathan Lorraine et al., AISTATS 2020) 📖
Dataset Meta-Learning from Kernel Ridge-Regression (Timothy Nguyen et al., ICLR 2021) 📖
Dataset Distillation with Infinitely Wide Convolutional Networks (Timothy Nguyen et al., NeurIPS 2021) 📖
On Implicit Bias in Overparameterized Bilevel Optimization (Paul Vicol et al., ICML 2022) 📖
Dataset Distillation using Neural Feature Regression (Yongchao Zhou et al., NeurIPS 2022) 🌐 📖
Efficient Dataset Distillation using Random Feature Approximation (Noel Loo et al., NeurIPS 2022) 📖

Distilled Dataset Parametrization

Synthesizing Informative Training Samples with GAN (Bo Zhao et al., 2022) 📖
Dataset Condensation via Efficient Synthetic-Data Parameterization (Jang-Hyun Kim et al., ICML 2022) 📖
Remember the Past: Distilling Datasets into Addressable Memories for Neural Networks (Zhiwei Deng et al., NeurIPS 2022) 📖
PRANC: Pseudo RAndom Networks for Compacting deep models (Parsa Nooralinejad et al., 2022) 📖
Dataset Condensation with Latent Space Knowledge Factorization and Sharing (Hae Beom Lee et al., 2022) 📖

Label Distillation

Flexible Dataset Distillation: Learn Labels Instead of Images (Ondrej Bohdal et al., NeurIPS 2020 Workshop) 📖
Soft-Label Dataset Distillation and Text Dataset Distillation (Ilia Sucholutsky et al., IJCNN 2021) 📖

Benchmark

DC-BENCH: Dataset Condensation Benchmark (Justin Cui et al., NeurIPS 2022) 📖

Applications

Continual Learning

Reducing Catastrophic Forgetting with Learning on Synthetic Data (Wojciech Masarczyk et al., CVPR 2020 Workshop) 📖
Condensed Composite Memory Continual Learning (Felix Wiewel et al., IJCNN 2021) 📖
Distilled Replay: Overcoming Forgetting through Synthetic Samples (Andrea Rosasco et al., 2021) 📖
Sample Condensation in Online Continual Learning (Mattia Sangermano et al., IJCNN 2022) 📖

Privacy

SecDD: Efficient and Secure Method for Remotely Training Neural Networks (Ilia Sucholutsky et al., AAAI 2021 Student Abstract) 📖
Privacy for Free: How does Dataset Condensation Help Privacy? (Tian Dong et al., ICML 2022) 📖
Can We Achieve Robustness from Data Alone? (Nikolaos Tsilivis et al., 2022) 📖

Medical

Soft-Label Anonymous Gastric X-ray Image Distillation (Guang Li et al., ICIP 2020) 📖
Compressed Gastric Image Generation Based on Soft-Label Dataset Distillation for Medical Data Sharing (Guang Li et al., CMPB 2022) 📖
Dataset Distillation for Medical Dataset Sharing (Guang Li et al., 2022) 📖
Dataset Distillation using Parameter Pruning (Guang Li et al., 2022) 📖

Federated Learning

Federated Learning via Synthetic Data (Jack Goetz et al., 2020) 📖
Distilled One-Shot Federated Learning (Yanlin Zhou et al., 2020) 📖
FedSynth: Gradient Compression via Synthetic Data in Federated Learning (Shengyuan Hu et al., 2022) 📖
FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning (Yuanhao Xiong et al., 2022) 📖
Federated Learning via Decentralized Dataset Distillation in Resource-Constrained Edge Environments (Rui Song et al., 2022) 📖

Graph Neural Network

Graph Condensation for Graph Neural Networks (Wei Jin et al., ICLR 2022) 📖
Condensing Graphs via One-Step Gradient Matching (Wei Jin et al., KDD 2022) 📖
Graph Condensation via Receptive Field Distribution Matching (Mengyang Liu et al., 2022) 📖

Neural Architecture Search

Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data (Felipe Petroski Such et al., ICML 2020) 📖

Fashion, Art, and Design

Wearable ImageNet: Synthesizing Tileable Textures via Dataset Distillation (George Cazenavette et al., CVPR 2022 Workshop) 🌐 📖

Knowledge Distillation

Knowledge Condensation Distillation (Chenxin Li et al., ECCV 2022) 📖

Recommender Systems

Infinite Recommendation Networks: A Data-Centric Approach (Noveen Sachdeva et al., NeurIPS 2022) 📖

Blackbox Optimization

Bidirectional Learning for Offline Infinite-width Model-based Optimization (Can Chen et al., NeurIPS 2022) 📖

Text

Data Distillation for Text Classification (Yongqi Li et al., 2021) 📖

Media Coverage

Acknowledgments

We would like to thank Nikolaos Tsilivis, Wei Jin, Yongchao Zhou, Noveen Sachdeva, Can Chen and Guangxiang Zhao for their valuable suggestions and contributions.

MIT License