ijazul-haq / pashto_pos

Pashto Part of Speech Tagger

Pashto Part-of-speech (POS) Tagger

The Pashto POS Tagger is a machine learning model trained on the Pashto Corpus using the Conditional Random Fields (CRF) algorithm.
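To give a sense of what a CRF tagger consumes, the sketch below builds per-token feature dictionaries from a tokenized sentence. This is only an illustration: the function names (`token_features`, `sentence_features`), the chosen features (prefixes, suffixes, neighbouring words), and the example sentence are assumptions, not the feature set used in this repository or in the paper.

```python
# A minimal, hypothetical feature extractor for CRF-based POS tagging.
# The feature set here is an assumption for illustration only.

def token_features(sentence, i):
    """Return a feature dict for the token at position i of a tokenized sentence."""
    word = sentence[i]
    features = {
        "bias": 1.0,
        "word": word,
        "prefix2": word[:2],    # first two characters
        "suffix2": word[-2:],   # last two characters
        "suffix3": word[-3:],   # last three characters
        "is_digit": word.isdigit(),
        "length": len(word),
    }
    # Context features: a linear-chain CRF benefits from neighbouring words.
    if i > 0:
        features["prev_word"] = sentence[i - 1]
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(sentence) - 1:
        features["next_word"] = sentence[i + 1]
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_features(sentence):
    """Return one feature dict per token in the sentence."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# Illustrative tokenized Pashto sentence (not taken from the corpus).
example = ["زه", "ښوونځي", "ته", "ځم"]
features = sentence_features(example)
```

A list of such dictionaries per sentence, paired with a list of tags of the same length, is the input shape that CRF toolkits such as sklearn-crfsuite accept for training. Whether this repository uses that library is not stated here.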

This repository contains the source code for the paper “The Pashto Corpus and Machine Learning Model for Automatic POS Tagging”.

The Pashto Corpus used to train the model consists of 2 million words manually annotated with POS tags. A sample of the corpus is available in the “data” directory, and a pre-trained model is in the “models” directory.

Languages

Language:Jupyter Notebook 100.0%