KurdishBLARK / Kurdish-BLARK-Basic

Kurdish-BLARK

This project consists of two components. The first is a set of basic tools developed as part of the Kurdish BLARK project (see https://www.researchgate.net/profile/Hossein_Hassani11). The second is a set of corpora for the Kurmanji and Sorani dialects of Kurdish. The tools are written in Python 2.7 and currently include: a transliterator that converts Persian/Arabic-script texts into Latin script; a tokenizer that splits texts into tokens and uses regular expressions (RE) to remove special characters and numeral tokens; a stemmer that finds Kurmanji and Sorani stems; a word-level literal translator that uses a bidialectal dictionary to translate from Kurmanji to Sorani and vice versa; a Kurdish proper-name recognizer; and several other tools for building dictionaries and keeping them sorted. The code is commented to help explain its logic.
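To make the tokenizer and the word-level translator more concrete, here is a minimal sketch of how such steps could look. It is not the project's actual code: the names tokenize, translate_literal, SPECIAL_CHARS and KURMANJI_TO_SORANI are hypothetical, the two dictionary entries are toy examples rather than entries from the project's bidialectal dictionary, and passing unknown words through unchanged is an assumption. The sketch is written for Python 3, whereas the project itself targets Python 2.7.

```python
# -*- coding: utf-8 -*-
import re

# Assumption: a "special character" is anything that is neither a word
# character nor whitespace. The project's own pattern may differ.
SPECIAL_CHARS = re.compile(r"[^\w\s]", re.UNICODE)


def tokenize(text):
    """Replace special characters with spaces, split on whitespace,
    and drop tokens that consist only of digits."""
    cleaned = SPECIAL_CHARS.sub(" ", text)
    return [tok for tok in cleaned.split() if not tok.isdigit()]


# Toy Kurmanji-to-Sorani entries for illustration only; a real run would
# load the project's bidialectal dictionary instead.
KURMANJI_TO_SORANI = {
    "av": "aw",   # "water"
    "ez": "min",  # "I"
}


def translate_literal(tokens, dictionary):
    """Translate word by word; words missing from the dictionary are
    returned unchanged (an assumption of this sketch)."""
    return [dictionary.get(tok.lower(), tok) for tok in tokens]


if __name__ == "__main__":
    tokens = tokenize("Ez av, 123!")
    print(tokens)
    print(translate_literal(tokens, KURMANJI_TO_SORANI))
```

A literal translation in the other direction would use the same function with a dictionary whose keys and values are swapped.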

About

License: GNU Affero General Public License v3.0


Languages

Python 98.0%, MATLAB 2.0%