mocobeta / janome

Japanese morphological analysis engine written in pure Python

Home Page: https://mocobeta.github.io/janome/en/


Janome


Janome is a Japanese morphological analysis engine written in pure Python.

General documentation:

https://mocobeta.github.io/janome/en/ (English)

https://mocobeta.github.io/janome/ (Japanese)

Requirements

Python 3.7+ is required.

Install

[Note] Building the package consumes about 500 MB of memory.

(venv) $ pip install janome

Run

(venv) $ python
>>> from janome.tokenizer import Tokenizer
>>> t = Tokenizer()
>>> for token in t.tokenize('すもももももももものうち'):
...     print(token)
...
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も    助詞,係助詞,*,*,*,*,も,モ,モ
もも  名詞,一般,*,*,*,*,もも,モモ,モモ
も    助詞,係助詞,*,*,*,*,も,モ,モ
もも  名詞,一般,*,*,*,*,もも,モモ,モモ
の    助詞,連体化,*,*,*,*,の,ノ,ノ
うち  名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
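
Each printed line above is a surface form followed by comma-separated features, with the top-level part of speech first. As a minimal sketch, the helper below keeps only tokens whose top-level part of speech is in a chosen set. The function name `content_words` and its `keep_pos` default are hypothetical and not part of Janome; the code only assumes that each token exposes `surface` and `part_of_speech` attributes, as Janome's `Token` objects do.

```python
from typing import Iterable, List, Tuple


def content_words(
    tokens: Iterable,
    keep_pos: Tuple[str, ...] = ("名詞", "動詞", "形容詞"),
) -> List[str]:
    """Return the surface forms of tokens whose top-level POS is in keep_pos.

    Hypothetical helper for illustration; not part of the Janome API.
    """
    words = []
    for token in tokens:
        # part_of_speech is a comma-separated string such as "名詞,一般,*,*";
        # the first field is the top-level category (noun, particle, ...).
        top_level = token.part_of_speech.split(",")[0]
        if top_level in keep_pos:
            words.append(token.surface)
    return words
```

With Janome installed, passing it `Tokenizer().tokenize('すもももももももものうち')` should keep only the tokens tagged 名詞 in the output above and drop the particles.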

License

Janome is licensed under the Apache License 2.0 and uses the MeCab-IPADIC dictionary and statistical model.

See LICENSE.txt and NOTICE.txt for license details.

Acknowledgement

Special thanks to @ikawaha, @takuyaa, @nakagami and @janome_oekaki.

Copyright

Copyright(C) 2015-2023, Tomoko Uchida. All rights reserved.
