ammarasmro / Kurdish-Language

Applications of NLP on the Kurdish language

Kurdish Language

This repo is an attempt to apply different NLP techniques to the Kurdish language.

The main challenge is that only two or three official datasets exist for this language.

The current work is focused on the speech recognition task.

Short term plan

  • Text Preprocessing
  • Audio Preprocessing
  • Train a simple RNN for this task
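As a rough illustration of the text preprocessing step, the sketch below normalizes transcript lines and maps characters to integer labels, which is a common input format for a character-level RNN. The function names, the set of kept characters, and the reserved index 0 are assumptions made for this example, not code from this repo; adjust the kept characters to the script used in your transcripts.

```python
import re
import unicodedata


def normalize_transcript(text):
    """Normalize one transcript line (hypothetical rules, tune for your data)."""
    # Compose characters so accented letters are single code points.
    text = unicodedata.normalize("NFC", text).lower()
    # Keep letters of any script, apostrophes, and whitespace; blank out the rest.
    text = "".join(
        ch if (ch.isalpha() or ch.isspace() or ch == "'") else " " for ch in text
    )
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def build_char_map(transcripts):
    """Assign an integer to every character seen; index 0 is left unused
    (assumed here to be reserved, e.g. for a CTC blank label)."""
    chars = sorted({ch for line in transcripts for ch in line})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(text, char_map):
    """Turn a normalized transcript into a list of integer labels."""
    return [char_map[ch] for ch in text]


if __name__ == "__main__":
    lines = [normalize_transcript("Ez  diçim, malê!")]
    char_map = build_char_map(lines)
    print(lines[0])
    print(encode(lines[0], char_map))
```

Building the character map from the normalized training transcripts only keeps the label set small, which matters when the amount of data is limited.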

Long term goals

  • Build an end-to-end ASR pipeline
  • Use a language model

The Pipeline

  1. Get raw .sph files
  2. Convert .sph to .wav format (sph to wav converter)
  3. Convert .wav to PCM-16 (wav pcm)
  4. Turn audio and transcripts into a JSON representation (Data Jsonifier)
  5. Split data into training and validation corpora
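Steps 4 and 5 above can be sketched as follows, assuming steps 1 to 3 have already produced PCM-16 .wav files. This is not the repo's Data Jsonifier: the JSON field names (`key`, `duration`, `text`), the one-object-per-line layout, and the `transcripts` dictionary keyed by file stem are all assumptions made for illustration.

```python
import json
import random
import wave
from pathlib import Path


def wav_duration(path):
    """Length of a .wav file in seconds, read from its header."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def build_manifest(wav_dir, transcripts):
    """Pair each .wav file with its transcript.

    transcripts: dict mapping file stem -> text (hypothetical layout).
    """
    entries = []
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        text = transcripts.get(wav_path.stem)
        if text is None:
            continue  # skip audio that has no transcript
        entries.append(
            {"key": str(wav_path), "duration": wav_duration(wav_path), "text": text}
        )
    return entries


def split_manifest(entries, valid_fraction=0.1, seed=0):
    """Shuffle with a fixed seed and split into (training, validation)."""
    shuffled = entries[:]
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


def write_jsonl(entries, path):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

A typical call would be `train, valid = split_manifest(build_manifest("wav", transcripts))` followed by one `write_jsonl` call per corpus. Fixing the seed keeps the split reproducible across runs, so validation results stay comparable.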

Languages

HTML 81.3%, Jupyter Notebook 17.1%, Python 1.6%, Shell 0.0%