r2en / wikipedia_es_similarity

Text Similarity Search Using Elasticsearch

Preparation

$ wget https://dumps.wikimedia.org/other/cirrussearch/20190826/jawiki-20190826-cirrussearch-content.json.gz

$ wget https://github.com/singletongue/WikiEntVec/releases/download/20190520/jawiki.word_vectors.200d.txt.bz2
$ bunzip2 jawiki.word_vectors.200d.txt.bz2
$ docker-compose up
$ python build_index_wikipedia.py
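
The contents of build_index_wikipedia.py are not shown here, so the following is only a minimal sketch of the kind of preprocessing the downloaded files suggest. It assumes that the CirrusSearch dump is newline-delimited JSON in which bulk action lines ({"index": ...}) alternate with page documents carrying "title" and "text" fields, and that the word vector file is in word2vec text format (a "<vocab_size> <dim>" header followed by one "word v1 v2 ..." line per word). The function names iter_cirrus_docs, load_word_vectors, and average_vector are hypothetical, and tokenization of Japanese text is left to the caller.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_cirrus_docs(lines: Iterable[str]) -> Iterator[dict]:
    """Yield page documents from dump lines, skipping bulk action lines.

    Assumption: action lines look like {"index": {...}} and page lines
    carry fields such as "title" and "text".
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "index" in record and "title" not in record:
            continue  # bulk action metadata, not a page
        yield record


def load_word_vectors(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse word2vec-style text lines into a word -> vector mapping."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) <= 2:
            continue  # header line "<vocab_size> <dim>" or a blank line
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def average_vector(tokens: Iterable[str],
                   vectors: Dict[str, List[float]],
                   dim: int) -> List[float]:
    """Average the vectors of known tokens; return zeros if none are known."""
    total = [0.0] * dim
    hits = 0
    for token in tokens:
        vec = vectors.get(token)
        if vec is None or len(vec) != dim:
            continue
        for i, value in enumerate(vec):
            total[i] += value
        hits += 1
    if hits == 0:
        return total
    return [value / hits for value in total]
```

With the real files, the line iterables could come from gzip.open("jawiki-20190826-cirrussearch-content.json.gz", "rt", encoding="utf-8") and open("jawiki.word_vectors.200d.txt", encoding="utf-8"), with dim=200. Averaging word vectors is one simple way to obtain a fixed-size text embedding; the repository's script may do something different.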

Text Similarity Search

$ python search.py
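
search.py itself is not reproduced here. As a hedged illustration of how a vector similarity search can be expressed in Elasticsearch, the sketch below builds a script_score query body that ranks documents by cosine similarity against a dense_vector field. The helper name build_similarity_query and the field name text_vector are hypothetical, and the exact script syntax depends on the Elasticsearch version (some early 7.x releases expect doc['text_vector'] instead of the quoted field name).

```python
from typing import Any, Dict, List


def build_similarity_query(query_vector: List[float],
                           field: str = "text_vector",
                           size: int = 10) -> Dict[str, Any]:
    """Build a search body that scores every document by cosine similarity.

    Assumption: `field` is mapped as a dense_vector with the same
    dimensionality as `query_vector`.
    """
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # Adding 1.0 keeps scores non-negative, which
                    # Elasticsearch requires for script scores.
                    "source": (
                        "cosineSimilarity(params.query_vector, '"
                        + field + "') + 1.0"
                    ),
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }
```

The returned dictionary can be passed as the request body of a search call against the index created in the preparation step, using a query vector computed the same way as the indexed document vectors.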
