ACoLi Document Clustering
Clustering documents according to document similarity, with a focus on scientific publications.
It's probably best to write this from scratch, but several earlier in-house implementations in the old/ directory can be taken into consideration:
old/2019-beta-writer: plain cosine-based document clustering
old/2021-beta-writer: minor revision of 2019-beta-writer with duplicate avoidance
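
For orientation, here is a minimal sketch of what plain cosine-based document clustering can look like. It is not taken from either implementation in old/. The function names (`tf_vector`, `cosine`, `cluster`), the whitespace tokenization, the raw term-frequency weighting, the greedy single-pass strategy, and the `threshold` value of 0.5 are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split).

    Assumption: no stemming, stop-word removal, or IDF weighting.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def cluster(docs, threshold=0.5):
    """Greedy single-pass clustering (hypothetical strategy).

    Each document joins the first cluster whose seed (first) document
    has cosine similarity >= threshold; otherwise it starts a new
    cluster. Returns a list of clusters, each a list of document indices.
    """
    vectors = [tf_vector(d) for d in docs]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Called on the three titles "neural network training for image classification", "training a neural network for image recognition", and "medieval trade routes in northern europe", `cluster` groups the first two together and leaves the third on its own. The result of a greedy pass depends on document order and on the threshold, so a from-scratch implementation may well prefer a different weighting scheme or clustering algorithm.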