DISTIL is an active learning toolkit that implements a number of state-of-the-art active learning strategies, with a particular focus on active learning in the deep learning setting. DISTIL is built on PyTorch and decouples the training loop from the active learning algorithm, so users keep full control over the training procedure and the model. Users can incorporate new active learning algorithms into their existing code with minimal changes. DISTIL also supports active learning on custom datasets and lets you experiment on well-known datasets. We are continuously adding newer and better active learning selection strategies to DISTIL, and we plan to extend the supported algorithms beyond the current supervised classification setting.
- Decouples the active learning strategy from the training loop, allowing users to modify the training loop and/or the active learning strategy independently
- Implements faster and more efficient versions of several active learning strategies
- Contains most state-of-the-art active learning algorithms
- Allows running basic experiments with just one command
- Provides an interface to various active learning strategies in only a couple of lines of code
- Requires only minimal changes to the configuration files to run your own experiments
- Achieves higher test accuracies with less training data, enabling a large reduction in labeling cost and time
- Requires minimal changes to integrate into existing training pipelines
- Contains recipes, tutorials, and benchmarks for all active learning algorithms on many deep learning datasets
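The decoupling described above can be pictured as a loop in which the training routine and the selection strategy are passed in as independent callables. The following is a minimal, framework-free sketch under that assumption; it does not use DISTIL's actual classes, and the names `active_learning_loop`, `train`, `select`, `random_select`, and `dummy_train` are hypothetical. A real setup would train a PyTorch model and obtain labels from an oracle at the point marked in the comments.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical callable types for this sketch (not DISTIL's API):
#   train(labeled_indices) -> model
#   select(model, unlabeled_indices, budget) -> indices chosen for labeling
TrainFn = Callable[[List[int]], object]
SelectFn = Callable[[object, List[int], int], List[int]]


def active_learning_loop(train: TrainFn, select: SelectFn, pool_size: int,
                         initial: int, budget: int, rounds: int,
                         seed: int = 0) -> Tuple[object, List[int], List[int]]:
    """Generic pool-based AL loop; training and selection are independent."""
    rng = random.Random(seed)
    indices = list(range(pool_size))
    rng.shuffle(indices)
    labeled, unlabeled = indices[:initial], indices[initial:]

    model = train(labeled)  # train on the initial labeled seed set
    for _ in range(rounds):
        if not unlabeled:
            break
        chosen = select(model, unlabeled, min(budget, len(unlabeled)))
        # In practice, labels for `chosen` would be requested from an oracle here.
        chosen_set = set(chosen)
        labeled = labeled + chosen
        unlabeled = [i for i in unlabeled if i not in chosen_set]
        model = train(labeled)  # retrain with the enlarged labeled set
    return model, labeled, unlabeled


# Usage with placeholder components (hypothetical):
def random_select(model, unlabeled, budget):
    """Random sampling baseline: ignores the model entirely."""
    return random.Random(1).sample(unlabeled, budget)


def dummy_train(labeled):
    """Stand-in for a real training routine; only records the set size."""
    return {"n_train": len(labeled)}


model, labeled, unlabeled = active_learning_loop(
    dummy_train, random_select, pool_size=100, initial=10, budget=5, rounds=3)
print(model["n_train"], len(unlabeled))  # 25 75
```

Swapping `random_select` for a different strategy leaves `dummy_train` untouched, and vice versa, which is the kind of flexibility the decoupled design is meant to provide.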
DISTIL can be installed in one of two ways. To install from source:
git clone https://github.com/decile-team/distil.git
cd distil
pip install -r requirements/requirements.txt
Alternatively, install the released package from PyPI:
pip install decile-distil
To better understand DISTIL's functionality, we provide example Jupyter notebooks in the tutorials folder, which can be run directly in Google Colab. We also provide a simple active learning (AL) training loop that runs experiments from a configuration file. To run this loop, execute the following from the base folder:
python train.py --config_path=/content/distil/configs/config_svhn_resnet_randomsampling.json
The path above assumes the repository was cloned to /content/distil, the default working location in Google Colab; adjust it to match your setup. You can use the default configurations provided in the configs folder or write your own. To create a custom configuration file for training, refer to the DISTIL Configuration File Documentation.
Some of the algorithms currently implemented in DISTIL include the following:
- Uncertainty Sampling [1]
- Margin Sampling [2]
- Least Confidence Sampling [2]
- FASS [3]
- BADGE [4]
- GLISTER ACTIVE [6]
- CoreSet-based Active Learning [5]
- Random Sampling
- Submodular Sampling [3,6,7]
- Adversarial DeepFool [9]
- BALD [10]
- K-Means Sampling [5]
- Adversarial BIM
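Several of the strategies above rank unlabeled points by how uncertain the model's predicted class distribution is. The following is a minimal, framework-free sketch of three common scores: least confidence (one minus the top class probability), margin (the gap between the two most likely classes), and predictive entropy. The function names are hypothetical and not DISTIL's API, and treating entropy as the score behind "Uncertainty Sampling" is an assumption of this sketch.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scoring helpers: each maps one predicted class distribution
# to a score where HIGHER means MORE uncertain (a better labeling candidate).


def least_confidence(probs: Sequence[float]) -> float:
    """One minus the probability of the most likely class."""
    return 1.0 - max(probs)


def margin(probs: Sequence[float]) -> float:
    """Negated gap between the two most likely classes (needs >= 2 classes)."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return -(top1 - top2)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_top_k(prob_matrix: Sequence[Sequence[float]], k: int,
                 score_fn: Callable[[Sequence[float]], float]) -> List[int]:
    """Return indices of the k unlabeled points with the highest scores."""
    scores = [score_fn(p) for p in prob_matrix]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Softmax outputs for four unlabeled points over three classes (toy values).
probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # probability spread across all classes
    [0.50, 0.49, 0.01],  # two classes almost tied
    [0.70, 0.20, 0.10],
]
print(select_top_k(probs, 1, least_confidence))  # [1]
print(select_top_k(probs, 1, margin))            # [2]
print(select_top_k(probs, 1, entropy))           # [1]
```

Least confidence and entropy both favor point 1, whose probability mass is spread out, while margin favors point 2, whose top two classes are nearly tied; which score works best depends on the data and model.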
To learn more about the different active learning algorithms, check out the Active Learning Strategies Survey Blog.
Learn more about DISTIL by reading our documentation.
To receive updates about DISTIL and to be a part of the community, join the Decile_DISTIL_Dev group.
https://groups.google.com/forum/#!forum/Decile_DISTIL_Dev/join
This library takes inspiration from, builds upon, and uses pieces of code from several open-source codebases. These include Kuan-Hao Huang's deep active learning repository, Jordan Ash's BADGE repository, and Andreas Kirsch's and Joost van Amersfoort's BatchBALD repository. DISTIL also uses submodlib for submodular optimization.
DISTIL was created and is maintained by Nathan Beck, Suraj Kothawade, Durga Sivasubramanian, Apurva Dani, Rishabh Iyer, and Ganesh Ramakrishnan. We look forward to making DISTIL more community-driven. Please use it and contribute to it for your active learning research, and feel free to use it in your commercial projects. We will add the major contributors here.
YouTube Tutorials on DISTIL:
- Tutorial on Active Learning
- Tutorial and Setup of DISTIL
- Benchmarking Active Learning through DISTIL
[1] Burr Settles. "Active learning literature survey." University of Wisconsin-Madison, Department of Computer Sciences, 2009.
[2] Dan Wang and Yi Shang. "A new active labeling method for deep learning." 2014 International Joint Conference on Neural Networks (IJCNN), IEEE, 2014.
[3] Kai Wei, Rishabh Iyer, and Jeff Bilmes. "Submodularity in data subset selection and active learning." International Conference on Machine Learning (ICML), 2015.
[4] Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, and Alekh Agarwal. "Deep batch active learning by diverse, uncertain gradient lower bounds." CoRR, 2019. arXiv:1906.03671. URL: http://arxiv.org/abs/1906.03671
[5] Ozan Sener and Silvio Savarese. "Active learning for convolutional neural networks: A core-set approach." ICLR, 2018.
[6] Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, and Rishabh Iyer. "GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning." 35th AAAI Conference on Artificial Intelligence (AAAI), 2021.
[7] Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Rohan Mahadev, Khoshrav Doctor, and Ganesh Ramakrishnan. "Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision." 7th IEEE Winter Conference on Applications of Computer Vision (WACV), Hawaii, USA, 2019.
[8] Kai Wei et al. "Submodular subset selection for large-scale speech training data." 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2014.
[9] Melanie Ducoffe and Frederic Precioso. "Adversarial active learning for deep networks: a margin based approach." arXiv preprint arXiv:1802.09841, 2018.
[10] Yarin Gal, Riashat Islam, and Zoubin Ghahramani. "Deep bayesian active learning with image data." International Conference on Machine Learning (ICML), PMLR, 2017.
[11] Suraj Kothawade, Nathan Beck, Krishnateja Killamsetty, and Rishabh Iyer. "SIMILAR: Submodular Information Measures based Active Learning in Realistic Scenarios." Neural Information Processing Systems (NeurIPS), 2021.
[12] Suraj Kothawade, Vishal Kaushal, Ganesh Ramakrishnan, Jeff Bilmes, and Rishabh Iyer. "PRISM: A Rich Class of Parameterized Submodular Information Measures for Guided Subset Selection." To appear in 36th AAAI Conference on Artificial Intelligence (AAAI), 2022.