acoli-repo / document-clustering

ACoLi Document Clustering

Clustering documents according to document similarity, with a focus on scientific publications.

It is probably best to write this from scratch, but the old/ directory contains earlier in-house implementations that can be taken into consideration:

  • old/2019-beta-writer: plain cosine-based document clustering
  • old/2021-beta-writer: minor revision of 2019-beta-writer with duplicate avoidance
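
For orientation, the two ideas named above (cosine-based clustering and duplicate avoidance) can be sketched in a few lines of standard-library Python. This is not the code in old/; it is a minimal illustration under stated assumptions: plain bag-of-words term frequencies, a greedy single-pass assignment to the most similar cluster seed, and near-identical documents treated as duplicates. The function names (`tf_vector`, `cosine`, `cluster_documents`) and both threshold values are hypothetical.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector over lowercased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_documents(docs, threshold=0.5, duplicate_threshold=0.99):
    """Greedy single-pass clustering with duplicate avoidance.

    Returns (clusters, duplicates): clusters is a list of lists of document
    indices, duplicates is a list of indices that were skipped because they
    are near-identical to a document that was already kept.
    Both thresholds are illustrative assumptions, not tuned values.
    """
    vectors = [tf_vector(doc) for doc in docs]
    clusters = []    # each cluster is a list of indices; the first is its seed
    kept = []        # indices of all documents that were not skipped
    duplicates = []

    for i, vec in enumerate(vectors):
        # Duplicate avoidance: skip documents that nearly match a kept one.
        if any(cosine(vec, vectors[j]) >= duplicate_threshold for j in kept):
            duplicates.append(i)
            continue
        kept.append(i)

        # Assign to the cluster whose seed document is most similar.
        best_cluster, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, vectors[cluster[0]])
            if sim > best_sim:
                best_cluster, best_sim = cluster, sim

        if best_cluster is not None and best_sim >= threshold:
            best_cluster.append(i)
        else:
            clusters.append([i])  # start a new cluster seeded by this document

    return clusters, duplicates


if __name__ == "__main__":
    example = [
        "graph neural networks for molecules",
        "neural networks for molecular graphs",
        "Graph neural networks for molecules",
        "medieval trade routes in the baltic",
    ]
    print(cluster_documents(example))  # ([[0, 1], [3]], [2])
```

A from-scratch implementation would likely replace raw term frequencies with TF-IDF or embeddings and compare against cluster centroids rather than seed documents; the sketch keeps the simplest variant so that the control flow stays visible.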

About

License: Apache License 2.0


Languages

Language: Python 100.0%