Clustering of a random subset of 600,000 papers drawn from the 1,747,307 papers hosted on arXiv. Papers are clustered using the techniques described in COVID-19 Literature Clustering.
Dataset: arXiv Dataset (Kaggle)
If you use arXiv Literature Clustering, please cite the original paper and the code:
@inproceedings{EREN2020,
author = {Eren, E. Maksim and Solovyev, Nick and Nicholas, Charles and Raff, Edward and Johnson, Ben},
title = {COVID-19 Kaggle Literature Organization},
year = {2020},
month = {April},
location = {San Jose, CA, USA},
note={Malware Research Group, University of Maryland Baltimore County. \url{https://github.com/MaksimEkin/COVID19-Literature-Clustering}},
url = {TBA},
doi = {TBA},
howpublished = {DocEng'20: ACM Symposium on Document Engineering}
}