wengbenjue / zhopenie

Chinese Open Information Extraction (Tree-based Triple Relation Extraction Module)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Chinese Open Information Extraction (Zhopenie)

Installation

This module makes heavily use of pyltp

  1. Install pyltp
    pip install pyltp
    
  2. Download NLP model from 百度雲

Why use LTP?

LTP has an excellent semantic parsing module shown below: Alt text

Also, in general, LTP performs better than other open-source Chinese NLP libraries,like Jieba ,here's the comparison on word tokenization for SIGHAN Bakeoff 2005 PKU, 510KB dataset: Alt text

Usage

The extractor module tries to break down a Chinese sentence into a Triple relation (e1, e2, r), which can be understood by computer
e.g. 星展集团是亚洲最大的金融服务集团之一, 拥有约3千5百亿美元资产和超过280间分行, 业务遍及18个市场。
are parsed as follows:

e1:星展集团, e2:亚洲最大的金融服务集团之一, r:是
e1:星展集团, e2:约3千5百亿美元资产, r:拥有
e1:业务, e2:18个市场, r:遍及

However, this extractor is about ~70% accurate and is still under improvement at this moment. Feel free to comment and make pull request.

Credits

哈工大社会计算与信息检索研究中心研制的语言技术平台 LTP

About

Chinese Open Information Extraction (Tree-based Triple Relation Extraction Module)


Languages

Language:Python 100.0%