IlyaGusev / purano

News annotation and clustering

PuraNo - news annotation and clustering

Installation

Install Git, DVC and pip:

$ sudo wget https://dvc.org/deb/dvc.list -O /etc/apt/sources.list.d/dvc.list
$ sudo apt-get update
$ sudo apt-get install git dvc python3-pip

Clone repo and install Python requirements (Python 3.6+ recommended):

$ git clone https://github.com/IlyaGusev/purano
$ python3 -m pip install -r purano/requirements.txt

Run pipeline

$ dvc pull
$ dvc repro
$ cat output/metrics.json
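If you would rather inspect the metrics from Python than with `cat`, the following is a minimal sketch. It assumes only that `output/metrics.json` is a valid JSON file; the helper name `load_metrics` is hypothetical, and no particular keys are assumed because the README does not document them.

```python
import json
from pathlib import Path


def load_metrics(path="output/metrics.json"):
    """Load the metrics file written by the pipeline (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    metrics_path = Path("output/metrics.json")
    if metrics_path.exists():
        # Pretty-print whatever the file contains; no keys are assumed.
        print(json.dumps(load_metrics(metrics_path), indent=2,
                         ensure_ascii=False, sort_keys=True))
    else:
        print(f"{metrics_path} not found; run `dvc repro` first")
```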

WARNING: Clustering requires more than 8 GB of RAM because it keeps all N^2 pairwise distances in memory.
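To get a rough sense of how the N^2 term grows, here is a back-of-the-envelope sketch. It assumes a dense N x N matrix with one floating-point value per pair (8 bytes for float64, 4 for float32); the README does not state how the distances are actually stored, so treat the numbers as an estimate only. The function name `pairwise_matrix_gib` is hypothetical.

```python
def pairwise_matrix_gib(n_items: int, bytes_per_value: int = 8) -> float:
    """Estimate memory (GiB) for a dense n_items x n_items distance matrix.

    Assumption: every pair is stored explicitly, one value per pair.
    """
    return n_items * n_items * bytes_per_value / 2**30


if __name__ == "__main__":
    # Example sizes are illustrative, not taken from the project's dataset.
    for n in (10_000, 30_000, 50_000):
        print(f"N={n:>6}: {pairwise_matrix_gib(n):.1f} GiB as float64, "
              f"{pairwise_matrix_gib(n, 4):.1f} GiB as float32")
```

Under these assumptions the matrix alone reaches 8 GiB at about 32,768 items in float64, and doubling the number of items quadruples the requirement.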

About

License: Apache License 2.0


Languages

Jupyter Notebook 91.4%, Python 6.3%, HTML 1.8%, Jsonnet 0.4%, Shell 0.1%