phiradet / torchtext-th

A Thai word segmentation library, built on PyTorch

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

TorchText TH

A (beta version) Thai word segmentation library built on PyTorch

Usage

>>> from typing import List
>>> from torchtext_th.tokenizer import get_tokenizer
>>> tokenizer = get_tokenizer("artifact/emb150_bilstm512_1layer.pt")
>>>
>>> input_text: str = "ลองทดสอบโปรแกรมตัดคำด้วย PyTorch ง่ายๆ จร้าาาา ถถถ"
>>> output: List[str] = tokenizer.tokenize(input_text)
>>> print("|".join(output))
ลอง|ทดสอบ|โปรแกรม|ตัด|คำ|ด้วย| |PyTorch| |ง่าย|| |จร้าาาา| |ถถถ

About

A Thai word segmentation library, built on PyTorch

License:MIT License


Languages

Language:Python 100.0%