nlp named-entity-recognition pos-tagging vietnamese-tokenizer sentence-segmentation ner parsing vietnamese-nlp nlp-toolkit

VnCoreNLP now has a Python wrapper in its official repository.

VnCoreNLP: https://github.com/vncorenlp/VnCoreNLP

Setup

$ pip install py4j

Copy VnCoreNLP.jar, vncorenlp.py, and the models directory into the same directory of your project.
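
A quick way to confirm the layout before importing the wrapper is a small check like the one below. This is only a sketch: the helper name `missing_files` is hypothetical and not part of this repository, and it assumes the three items are named exactly VnCoreNLP.jar, vncorenlp.py, and models, as listed above.

```python
from pathlib import Path

# Names taken from the setup step above; adjust if your copies are named differently.
REQUIRED = ("VnCoreNLP.jar", "vncorenlp.py", "models")


def missing_files(project_dir="."):
    """Return the required entries that are absent from project_dir (hypothetical helper)."""
    root = Path(project_dir)
    return [name for name in REQUIRED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing from the project directory:", ", ".join(missing))
    else:
        print("All required files are in place.")
```

Run it from the project directory; an empty result means the wrapper should be able to find the jar and the models next to it.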

Example

See example.py

from vncorenlp import VnCoreNLP

txt = 'học sinh học sinh học'

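# Initialize the wrapper and load the models for all four annotators
vncore_nlp = VnCoreNLP(annotators="wseg pos ner parse")

# Word segmentation only, returned as a single string
print(vncore_nlp.tokenize(txt, str=True))
print()
# Word segmentation only, returned as a list of tokens
print(vncore_nlp.tokenize(txt, str=False))
print()
# Full annotation: one row per token
print(vncore_nlp.extract(txt))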

Output:

học_sinh học_sinh học

['học_sinh', 'học_sinh', 'học']

[
    ['học_sinh', 'N', 'O', '3', 'sub'], 
    ['học_sinh', 'N', 'O', '1', 'nmod'], 
    ['học', 'V', 'O', '0', 'root']
]
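
The rows in the last output can be easier to work with once the fields are named. The sketch below assumes each row is [word, POS tag, NER label, head index, dependency label] and that head indices are 1-based with 0 marking the root. Both points are inferred from the sample output and the annotator order "wseg pos ner parse", not from documented wrapper behavior. The `Token` class and the `to_tokens` helper are hypothetical and not part of the wrapper.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str   # segmented word, syllables joined by "_"
    pos: str    # part-of-speech tag
    ner: str    # named-entity label ("O" in the sample output)
    head: int   # assumed: 1-based index of the head token, 0 for the root
    dep: str    # dependency relation to the head


def to_tokens(rows: List[List[str]]) -> List[Token]:
    """Convert rows shaped like the sample output into Token tuples."""
    return [Token(word, pos, ner, int(head), dep) for word, pos, ner, head, dep in rows]


# Rows copied from the sample output above.
rows = [
    ['học_sinh', 'N', 'O', '3', 'sub'],
    ['học_sinh', 'N', 'O', '1', 'nmod'],
    ['học', 'V', 'O', '0', 'root'],
]
tokens = to_tokens(rows)

# Find the token whose head index is 0, i.e. the root of the parse.
root = next(t for t in tokens if t.head == 0)
print(root.word)  # prints: học
```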

Updating to a new VnCoreNLP version

Clone or Download VnCoreNLP

$ git clone https://github.com/vncorenlp/VnCoreNLP

Build VnCoreNLP.jar from the VnCoreNLP project

Copy Tokenizer.java into the VnCoreNLP project

$ cp Tokenizer.java /path/VnCoreNLP/src/main/java/vn/

Build a jar whose main class is the one in Tokenizer.java

Copy the ./models directory and the new .jar file into this repository
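
The two copy steps above can be scripted. The following is a minimal sketch and not part of this repository: the function names are hypothetical, the path to the built jar must be supplied by you because the build output location depends on how you build, and it assumes that the models directory sits at the root of the VnCoreNLP checkout and that the wrapper expects the jar to be named VnCoreNLP.jar. It needs Python 3.8 or later for `dirs_exist_ok`.

```python
import shutil
from pathlib import Path


def install_tokenizer(wrapper_repo, vncorenlp_repo):
    """Copy Tokenizer.java into the VnCoreNLP source tree (same target as the cp command above)."""
    target = Path(vncorenlp_repo) / "src" / "main" / "java" / "vn"
    target.mkdir(parents=True, exist_ok=True)
    return shutil.copy2(Path(wrapper_repo) / "Tokenizer.java", target)


def pull_artifacts(vncorenlp_repo, built_jar, wrapper_repo):
    """Copy the models directory and the freshly built jar back into the wrapper repository."""
    wrapper = Path(wrapper_repo)
    # Assumption: models/ lives at the root of the VnCoreNLP checkout.
    shutil.copytree(Path(vncorenlp_repo) / "models", wrapper / "models", dirs_exist_ok=True)
    # Assumption: the wrapper expects the jar to be named VnCoreNLP.jar.
    shutil.copy2(built_jar, wrapper / "VnCoreNLP.jar")
```

Call `install_tokenizer` before building the jar and `pull_artifacts` afterwards, passing the path of the jar your build produced.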

About

A Python wrapper for VnCoreNLP

https://github.com/vncorenlp/VnCoreNLP
