NidhoggurZ / simtext

simtext: the most accurate Chinese text similarity calculation tool.

text2vec

text2vec: a text-to-vector tool and the most accurate Chinese text similarity calculation tool.

Install

  • pip3 install text2vec

or, to install from source:

git clone https://github.com/shibing624/text2vec.git
cd text2vec
python3 setup.py install

Usage

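import text2vec

# Two paraphrased Chinese questions:
# a = "How do I change the bank card linked to Huabei?"
# b = "Change the bank card linked to Huabei"
a = '如何更换花呗绑定银行卡'
b = '花呗更改绑定银行卡'
emb = text2vec.encode(a)  # sentence embedding (vector) for a
print(emb)
s = text2vec.score(a, b)  # similarity score between a and b
print(s)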

Output of print(s) (the embedding printed by print(emb) is omitted here):

0.9569100456524151
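The README does not say how text2vec.score turns two sentences into a number. A common way to compare two sentence embeddings is the cosine of the angle between them, which lies in [-1, 1] and is close to 1 for near-paraphrases like the pair above. The sketch below assumes that approach; `cosine_similarity` is a hypothetical helper written for illustration and is not part of text2vec.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real sentence embeddings are much longer.
    emb_a = [0.2, 0.8, 0.1]
    emb_b = [0.25, 0.7, 0.05]
    print(round(cosine_similarity(emb_a, emb_b), 4))
```

Because cosine similarity ignores vector length, it compares only the direction of the two embeddings, so sentences of different lengths can still score close to 1.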

Reference

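  1. "Representing Sentences as Vectors (Part 1): Unsupervised Sentence Representation Learning (Sentence Embedding)"
  2. "Representing Sentences as Vectors (Part 2): Unsupervised Sentence Representation Learning (Sentence Embedding)"
  3. Sanjeev Arora, Yingyu Liang, Tengyu Ma. "A Simple but Tough-to-Beat Baseline for Sentence Embeddings." 2017.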
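Reference 3 describes the "smooth inverse frequency" (SIF) baseline: average a sentence's word vectors, weighting each word w by a / (a + p(w)) so that frequent words count less, then subtract from every sentence vector its projection onto the first singular vector of the whole set (the shared "common component"). The README does not say whether or how text2vec.encode uses this, so the pure-Python sketch below illustrates the cited paper only. The function names, the default a = 1e-3, the choice to default unseen word probabilities to 0, and the use of power iteration in place of a library SVD are all assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def sif_weighted_average(
    tokens: Sequence[str],
    word_vectors: Dict[str, Vector],
    word_probs: Dict[str, float],
    dim: int,
    a: float = 1e-3,
) -> Vector:
    """Average of known word vectors, each scaled by the SIF weight a / (a + p(w))."""
    known = [w for w in tokens if w in word_vectors]
    vec = [0.0] * dim
    for w in known:
        # Frequent words (large p(w)) get small weights; missing probabilities default to 0.
        weight = a / (a + word_probs.get(w, 0.0))
        for i, x in enumerate(word_vectors[w]):
            vec[i] += weight * x
    return [x / len(known) for x in vec] if known else vec


def remove_first_component(rows: List[Vector], iterations: int = 100) -> List[Vector]:
    """Subtract from each row its projection onto the top singular vector of the row matrix."""
    if not rows:
        return []
    dim = len(rows[0])
    u = [1.0] * dim  # power-iteration start; assumed not orthogonal to the top singular vector
    for _ in range(iterations):
        # One power-iteration step on X^T X: u <- X^T (X u), then normalise.
        xu = [sum(r[i] * u[i] for i in range(dim)) for r in rows]
        nxt = [sum(r[i] * s for r, s in zip(rows, xu)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return [list(r) for r in rows]  # all-zero matrix: nothing to remove
        u = [x / norm for x in nxt]
    out = []
    for r in rows:
        proj = sum(r[i] * u[i] for i in range(dim))
        out.append([r[i] - proj * u[i] for i in range(dim)])
    return out


def sif_embeddings(
    sentences: Sequence[Sequence[str]],
    word_vectors: Dict[str, Vector],
    word_probs: Dict[str, float],
    a: float = 1e-3,
) -> List[Vector]:
    """SIF sentence embeddings for pre-tokenised sentences (tokens may be words or characters)."""
    if not word_vectors:
        raise ValueError("word_vectors must not be empty")
    dim = len(next(iter(word_vectors.values())))
    raw = [sif_weighted_average(s, word_vectors, word_probs, dim, a) for s in sentences]
    return remove_first_component(raw)
```

Common-component removal is computed over the whole batch of sentences, so embedding a single sentence on its own removes its entire direction; the step only makes sense for a collection. Power iteration also converges slowly, or to an arbitrary direction, when the top two singular values are nearly equal; a library SVD is the more robust choice in practice.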


License: Apache License 2.0


Languages

Language: Python 100.0%