Chinese Open Information Extraction (Zhopenie)

Installation

This module makes heavily use of pyltp

Install pyltp
```
pip install pyltp
```
Download NLP model from 百度雲

Why use LTP?

LTP has an excellent semantic parsing module shown below:

Also, in general, LTP performs better than other open-source Chinese NLP libraries,like Jieba ,here's the comparison on word tokenization for SIGHAN Bakeoff 2005 PKU, 510KB dataset:

Usage

The extractor module tries to break down a Chinese sentence into a Triple relation (e1, e2, r), which can be understood by computer
e.g. 星展集团是亚洲最大的金融服务集团之一, 拥有约3千5百亿美元资产和超过280间分行, 业务遍及18个市场。
are parsed as follows:

e1:星展集团, e2:亚洲最大的金融服务集团之一, r:是
e1:星展集团, e2:约3千5百亿美元资产, r:拥有
e1:业务, e2:18个市场, r:遍及

However, this extractor is about ~70% accurate and is still under improvement at this moment. Feel free to comment and make pull request.

Credits

哈工大社会计算与信息检索研究中心研制的语言技术平台 LTP

About

Chinese Open Information Extraction (Tree-based Triple Relation Extraction Module)

Languages

Language:Python 100.0%