mbanon / loomchild-segment-py

Python module to interface with the Loomchild Java sentence segmenter

loomchild-segment

A Python module for interfacing with the Loomchild Java sentence splitter. This package is intended for use in Bifixer and/or Bitextor.

Building and using this package requires two system dependencies: Maven and Java.
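Because the package depends on Java and Maven being installed on the system, a quick pre-flight check can save a confusing build failure. The sketch below is a minimal example that assumes the executables are named `java` and `mvn` and are found on `PATH`; `missing_system_dependencies` is a hypothetical helper written for illustration and is not part of this package.

```python
import shutil

# Executable names are an assumption: `java` for the Java runtime, `mvn` for Maven.
REQUIRED_TOOLS = ("java", "mvn")


def missing_system_dependencies(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_system_dependencies()
    if missing:
        print("Missing system dependencies:", ", ".join(missing))
    else:
        print("Java and Maven found on PATH.")
```

Running it before `pip install` reports which of the two tools, if any, still needs to be installed.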

Installation

This package can be installed with pip from PyPI:

pip install loomchild-segment

Usage

Splitting a text into sentences:

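from loomchild.segmenter import LoomchildSegmenter

lang = "en"  # language of the input text (a two-letter code is assumed here)
segmenter = LoomchildSegmenter(lang)

# segmenting a single line:
input_line = "This is the first sentence. This is the second one."
segments = segmenter.get_segmentation(input_line)
print("\n".join(segments))

# segmenting a document (i.e. input containing multiple line breaks):
input_text = "First paragraph. It has two sentences.\nSecond paragraph."
segments = segmenter.get_document_segmentation(input_text)
print("\n".join(segments))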
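The difference between the two calls is that `get_document_segmentation` accepts text containing line breaks. One way to picture it, shown here as a hypothetical sketch rather than the package's actual implementation, is to segment each line independently and concatenate the results. `segment_document` and the toy `naive_line_segmenter` are illustrative stand-ins; with this package, `segment_line` would be `segmenter.get_segmentation`. The real method may treat empty lines or paragraph boundaries differently.

```python
import re
from typing import Callable, List


def naive_line_segmenter(line: str) -> List[str]:
    """Toy stand-in for get_segmentation: splits after '.', '!' or '?'
    followed by whitespace. Real segmentation rules are far richer."""
    parts = re.split(r"(?<=[.!?])\s+", line.strip())
    return [p for p in parts if p]


def segment_document(text: str, segment_line: Callable[[str], List[str]]) -> List[str]:
    """Segment every non-empty line of `text` and return all sentences in order."""
    segments: List[str] = []
    for line in text.splitlines():
        if line.strip():
            segments.extend(segment_line(line))
    return segments


doc = "Hello world. How are you?\nFine, thanks!"
print("\n".join(segment_document(doc, naive_line_segmenter)))
```

Passing the line-level function in as a parameter keeps the document-level logic independent of any particular segmenter, so the toy splitter can be swapped for the Loomchild-backed one without other changes.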

A command-line tool, py-segment, is provided for working with base64-encoded documents:

cat b64encoded_input | py-segment -l $LANG > b64encoded_output
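Preparing input for the command-line tool and reading its output only takes the standard library. The sketch below assumes, without confirmation from this README, that the input file holds one base64-encoded UTF-8 document per line and that the output uses the same one-document-per-line layout; `encode_documents` and `decode_documents` are hypothetical helpers, not part of the package.

```python
import base64
from typing import Iterable, List


def encode_documents(documents: Iterable[str]) -> str:
    """Encode each document as one base64 line (assumed input format)."""
    lines = [base64.b64encode(doc.encode("utf-8")).decode("ascii") for doc in documents]
    return "".join(line + "\n" for line in lines)


def decode_documents(b64_lines: str) -> List[str]:
    """Decode one base64 document per non-empty line back to text (assumed output format)."""
    return [
        base64.b64decode(line).decode("utf-8")
        for line in b64_lines.splitlines()
        if line.strip()
    ]


docs = ["First sentence. Second sentence.", "Another document.\nWith two lines."]
encoded = encode_documents(docs)
print(encoded, end="")
print(decode_documents(encoded) == docs)
```

Because base64 output never contains newline characters, a document with internal line breaks still occupies exactly one line of the encoded file.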

About

License: GNU General Public License v3.0

Languages: Python 100.0%