Kurdish-BLARK
This project consists of two components. The first is a set of basic tools developed as part of the Kurdish BLARK project (see https://www.researchgate.net/profile/Hossein_Hassani11). The second is a set of corpora for the Kurmanji and Sorani dialects of Kurdish. The tools are written in Python 2.7. They currently include: a transliterator that converts texts in Persian/Arabic script into Latin script; a tokenizer that splits texts into tokens and uses regular expressions to remove special characters and numeral tokens; a stemmer that finds Kurmanji and Sorani stems; a word-level literal translator that uses a bidialectal dictionary to translate from Kurmanji to Sorani and vice versa; a Kurdish proper-name recognizer; and several other tools for building dictionaries and keeping them sorted. The code includes comments that help explain its logic.
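
To give a rough idea of what the tokenizer step described above involves, here is a minimal sketch of regular-expression-based tokenization that drops special characters and numeral tokens. It is not the project's actual code: the names `SPECIAL_CHARS`, `NUMERAL_TOKEN` and `tokenize` are hypothetical, the patterns are assumptions about what counts as a "special character", and the sketch is written for Python 3 rather than the Python 2.7 used by the tools.

```python
import re

# Assumption: a "special character" is anything that is neither a word
# character nor whitespace (punctuation, symbols, and so on).
SPECIAL_CHARS = re.compile(r"[^\w\s]")

# Assumption: a "numeral token" is a token made up entirely of digits.
NUMERAL_TOKEN = re.compile(r"\d+")


def tokenize(text):
    """Split text on whitespace after removing special characters,
    then drop tokens that consist only of digits."""
    cleaned = SPECIAL_CHARS.sub(" ", text)
    return [tok for tok in cleaned.split() if not NUMERAL_TOKEN.fullmatch(tok)]


if __name__ == "__main__":
    # Illustrative sample string in Latin script.
    print(tokenize("Ez 3 sêvan dixwim, tu jî!"))
    # → ['Ez', 'sêvan', 'dixwim', 'tu', 'jî']
```

In Python 3, `\w` and `\d` are Unicode-aware for `str` patterns, so the same sketch should also handle letters and digits in Persian/Arabic script. The actual tokenizer in this repository may treat such cases differently.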