deepanprabhu / fastbpe

Java library implementing Byte-Pair Encoding Tokenization

fastbpe

Java implementation of the byte-pair encoding (BPE) algorithm described in "Neural Machine Translation of Rare Words with Subword Units" (Sennrich et al., 2016).

fastbpe implements byte-pair encoding (BPE) tokenization in Java. The library aims to be:

  • simple,
  • fast,
  • able to handle large volumes of text (at least 1 GB),
  • thoroughly tested,
  • ready for alpha testing soon.
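
For readers new to BPE, the following is a minimal sketch of the merge-learning loop described in the paper. It is not fastbpe's API: the class name BpeSketch and the methods toSymbols, applyMerge, learnMerges, and encode are hypothetical names chosen for this illustration. The code favors readability over the speed and scale goals listed above, and it assumes the input map has a stable iteration order (a LinkedHashMap) so that ties between equally frequent pairs are broken deterministically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: BpeSketch and its methods are hypothetical names,
// not part of fastbpe.
class BpeSketch {
    static final String END = "</w>"; // end-of-word marker, as in the paper

    // Splits a word into single-character symbols and appends the end marker.
    static List<String> toSymbols(String word) {
        List<String> symbols = new ArrayList<>();
        word.codePoints().forEach(cp -> symbols.add(new String(Character.toChars(cp))));
        symbols.add(END);
        return symbols;
    }

    // Replaces every adjacent (left, right) pair with the merged symbol.
    static List<String> applyMerge(List<String> symbols, String left, String right) {
        List<String> out = new ArrayList<>();
        int j = 0;
        while (j < symbols.size()) {
            if (j + 1 < symbols.size()
                    && symbols.get(j).equals(left)
                    && symbols.get(j + 1).equals(right)) {
                out.add(left + right);
                j += 2;
            } else {
                out.add(symbols.get(j));
                j++;
            }
        }
        return out;
    }

    // Learns up to numMerges merge rules from a word -> frequency map.
    static List<String[]> learnMerges(Map<String, Integer> wordFreqs, int numMerges) {
        List<List<String>> words = new ArrayList<>();
        List<Integer> freqs = new ArrayList<>();
        for (Map.Entry<String, Integer> e : wordFreqs.entrySet()) {
            words.add(toSymbols(e.getKey()));
            freqs.add(e.getValue());
        }
        List<String[]> merges = new ArrayList<>();
        for (int step = 0; step < numMerges; step++) {
            // Count adjacent symbol pairs, weighted by word frequency.
            Map<List<String>, Integer> pairCounts = new LinkedHashMap<>();
            for (int w = 0; w < words.size(); w++) {
                List<String> s = words.get(w);
                for (int j = 0; j + 1 < s.size(); j++) {
                    pairCounts.merge(Arrays.asList(s.get(j), s.get(j + 1)),
                            freqs.get(w), Integer::sum);
                }
            }
            // Pick the most frequent pair; ties go to the pair counted first.
            List<String> best = null;
            int bestCount = 0;
            for (Map.Entry<List<String>, Integer> e : pairCounts.entrySet()) {
                if (e.getValue() > bestCount) {
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            if (best == null) break; // no adjacent pairs left to merge
            merges.add(new String[] {best.get(0), best.get(1)});
            for (int w = 0; w < words.size(); w++) {
                words.set(w, applyMerge(words.get(w), best.get(0), best.get(1)));
            }
        }
        return merges;
    }

    // Tokenizes one word by replaying the merges in the order they were learned.
    static List<String> encode(String word, List<String[]> merges) {
        List<String> symbols = toSymbols(word);
        for (String[] m : merges) {
            symbols = applyMerge(symbols, m[0], m[1]);
        }
        return symbols;
    }

    public static void main(String[] args) {
        // Toy word frequencies in the style of the paper's example.
        Map<String, Integer> corpus = new LinkedHashMap<>();
        corpus.put("low", 5);
        corpus.put("lower", 2);
        corpus.put("newest", 6);
        corpus.put("widest", 3);
        List<String[]> merges = learnMerges(corpus, 5);
        System.out.println(encode("lowest", merges)); // prints [low, est</w>]
    }
}
```

With this toy corpus the five learned merges are (e, s), (es, t), (est, </w>), (l, o), and (lo, w), so the unseen word "lowest" is split into the two known subwords "low" and "est</w>". Recounting every pair on each step is quadratic in practice; a library targeting gigabyte-scale input would likely maintain pair counts incrementally instead.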

License: MIT License

Languages: Java 100.0%