0xdolan / wordcloud-for-kurdish-poems

Word-cloud for Kurdish Poems

Kurdish Poems Word-Cloud Project

This project generates word-cloud images for Kurdish poems. It covers 117 poets from the allekok repository and, as an example, compares the word-frequency results for all poems with those of the famous Kurdish poet Mamosta Hemin.


To use this project, follow these steps:

Install the required Python packages:

pip install -r requirements.txt

The following packages need to be built so that the matplotlib and wordcloud libraries render Kurdish characters correctly:


Library/Package Installation:

- name: freetype
    - Clone the freetype repository:
      - git clone https://gitlab.freedesktop.org/freetype/freetype.git
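    - Standard build with `configure`:
      - Building from the git checkout depends on the following packages:
        - automake (1.10.1)
        - libtool (2.2.4)
        - autoconf (2.62)
      - To install them, run:
        - sudo apt install libtool autotools-dev automake
      - Inside the cloned freetype directory, generate the `configure` script:
        - sh autogen.sh
      - Then configure, build, and install:
        - ./configure
        - make
        - sudo make install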

- name: graphite2-1.3.14
    - Note: graphite2 is the SIL Graphite font-shaping engine. It is unrelated to the Graphite monitoring stack (the graphiteapp/graphite-statsd Docker image), which is not needed for this project.
    - On Debian/Ubuntu, install the development package:
      - sudo apt install libgraphite2-dev
    - Or get the source from:
      - https://github.com/silnrsi/graphite

- name: harfbuzz
    - For detailed build instructions, refer to:
      - https://github.com/harfbuzz/harfbuzz/blob/main/BUILD.md
    - Install the required packages:
      - sudo apt install meson pkg-config ragel gtk-doc-tools gcc g++ libfreetype6-dev libglib2.0-dev libcairo2-dev
    - Clone the harfbuzz repository:
      - git clone https://github.com/harfbuzz/harfbuzz
    - Build and test:
      - meson build
      - meson test -C build

- name: libraqm
    - Install the required packages:
      - sudo apt install libfreetype6-dev libharfbuzz-dev libfribidi-dev meson gtk-doc-tools
    - For fribidi, clone the repository:
      - git clone https://github.com/fribidi/fribidi
      - sh autogen.sh
    - Clone the libraqm repository:
      - git clone https://github.com/HOST-Oman/libraqm
    - Build and install:
      - meson build
      - ninja -C build
      - ninja -C build install
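One quick way to see whether complex text layout is available from Python is to ask Pillow, the imaging library that wordcloud draws text with, whether it can use Raqm. The snippet below is a small sketch under that assumption; `raqm_available` is a hypothetical helper name and is not part of this project.

```python
def raqm_available():
    """Report whether Pillow can use the Raqm layout engine.

    Returns True or False when Pillow is installed, and None when
    Pillow itself cannot be imported.
    """
    try:
        # Imported lazily so the check degrades gracefully without Pillow.
        from PIL import features
    except ImportError:
        return None
    # "raqm" is one of the feature names Pillow's features module knows about.
    return bool(features.check("raqm"))


if __name__ == "__main__":
    print("Raqm support in Pillow:", raqm_available())
```

If this reports False, Pillow falls back to its basic layout engine, and Arabic-script letters may not be joined or ordered correctly in the generated images.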

Source of the Vazirmatn font used in this project:



To generate word-cloud images, run the following commands:

python read_poems.py && python get_and_generate_wordclouds.py

This clones the allekok-poems repository, creates word-frequency files in JSON format, and generates word-cloud images for each poet (stored in a directory named after the poet) as well as one image for each poem.
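The word-frequency step can be sketched roughly as follows. This is an illustration only, not the code in read_poems.py: the whitespace tokenization, the `PUNCTUATION` set, and the function names are assumptions, and the project's actual cleaning rules may differ. The output uses the same "entry"/"frequency" keys as the project's JSON results.

```python
import json
from collections import Counter

# Assumed punctuation to strip from token edges (Latin and Arabic-script marks).
PUNCTUATION = ".,;:!?()[]{}\"'«»،؛؟"


def count_words(lines):
    """Count whitespace-separated words across an iterable of text lines."""
    counter = Counter()
    for line in lines:
        for token in line.split():
            word = token.strip(PUNCTUATION)
            if word:
                counter[word] += 1
    return counter


def to_entries(counter, limit=None):
    """Convert a Counter into a list of entry/frequency records, most frequent first."""
    return [
        {"entry": word, "frequency": count}
        for word, count in counter.most_common(limit)
    ]


if __name__ == "__main__":
    sample = ["بۆ گوڵ و بۆ دڵ", "گوڵ و بۆ"]
    entries = to_entries(count_words(sample))
    # ensure_ascii=False keeps Kurdish text readable in the JSON output.
    print(json.dumps(entries, ensure_ascii=False, indent=4))
```

Splitting on whitespace rather than on a `\w+` pattern keeps words that contain a zero-width non-joiner in one piece, which matters for Kurdish written in Arabic script.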


117 poets
341 directories
10,658 poem files
261,788 lines (after cleaning)
1,849,262 words
10,151,576 characters

The top five words used throughout all poems are:

    [
        { "entry": "و", "frequency": 94940 },
        { "entry": "لە", "frequency": 47199 },
        { "entry": "بە", "frequency": 37435 },
        { "entry": "بۆ", "frequency": 20258 },
        { "entry": "کە", "frequency": 18956 }
    ]

To provide an example of the project's functionality, I conducted a comparison between all the poems and only those written by Mamosta Hemin. The first 8 lines of the results are displayed in the following screenshot:

[Screenshot: comparing all poems and Mamosta Hemin's poems]
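A side-by-side comparison like this can be produced from two word-frequency lists. The sketch below is a hypothetical helper, not the project's own code; it assumes both inputs are lists of records with "entry" and "frequency" keys, and the demo data is placeholder text, not real results.

```python
def compare_top(all_entries, poet_entries, n=8):
    """Pair the n most frequent words of two frequency lists by rank.

    Each returned row is
    (rank, word_all, frequency_all, word_poet, frequency_poet).
    """
    def by_frequency(entries):
        return sorted(entries, key=lambda e: e["frequency"], reverse=True)

    top_all = by_frequency(all_entries)[:n]
    top_poet = by_frequency(poet_entries)[:n]
    rows = []
    # zip stops at the shorter list, so fewer than n rows are possible.
    for rank, (a, b) in enumerate(zip(top_all, top_poet), start=1):
        rows.append((rank, a["entry"], a["frequency"], b["entry"], b["frequency"]))
    return rows


if __name__ == "__main__":
    # Placeholder data for illustration only.
    all_poems = [{"entry": "x", "frequency": 5}, {"entry": "y", "frequency": 9}]
    one_poet = [{"entry": "y", "frequency": 2}, {"entry": "z", "frequency": 1}]
    for row in compare_top(all_poems, one_poet, n=2):
        print(row)
```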

All Poems Word-Cloud

  • including one-character words


  • excluding words of one to three characters


Mamosta Hemin's Poems Word-Cloud

  • including one-character words


  • excluding words of one to three characters
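The two variants differ only in a minimum word length applied before drawing. A minimal sketch of such a filter, assuming the entry/frequency records shown earlier (the name `filter_by_length` is hypothetical and not taken from the project):

```python
def filter_by_length(entries, min_length=1):
    """Keep only records whose word has at least min_length characters.

    Length is measured in Unicode code points, so an invisible character
    such as a zero-width non-joiner inside a word also counts.
    """
    return [e for e in entries if len(e["entry"]) >= min_length]


if __name__ == "__main__":
    top_five = [
        {"entry": "و", "frequency": 94940},
        {"entry": "لە", "frequency": 47199},
        {"entry": "بە", "frequency": 37435},
        {"entry": "بۆ", "frequency": 20258},
        {"entry": "کە", "frequency": 18956},
    ]
    # min_length=4 corresponds to dropping words of one to three characters.
    print(filter_by_length(top_five, min_length=4))
```

Because the most frequent words are short function words, raising the minimum length changes which words dominate the cloud.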


Additionally, the project includes a JSON file with the word frequency results for all poems.


This project uses the allekok repository, which includes 117 Kurdish poets. Word-cloud generation is based on the Python package wordcloud, with help from other packages that are needed to render Kurdish characters correctly:

  • freetype-2.13.0
  • graphite2-1.3.14
  • harfbuzz
  • libraqm
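For orientation, rendering a cloud from a frequency list with the wordcloud package might look like the sketch below. It is not the code in get_and_generate_wordclouds.py: the function names, image size, and background color are assumptions, and `font_path` must point to a font file that covers Kurdish letters (for example a local copy of Vazirmatn; the exact file name is up to you).

```python
def to_frequency_dict(entries):
    """Turn entry/frequency records into the {word: count} mapping wordcloud expects."""
    return {e["entry"]: e["frequency"] for e in entries}


def render_cloud(entries, font_path, out_path, width=1200, height=800):
    """Draw a word cloud from frequency records and save it as an image file."""
    # Imported lazily so the helper above works without wordcloud installed.
    from wordcloud import WordCloud

    cloud = WordCloud(
        font_path=font_path,       # font that contains Kurdish glyphs
        width=width,
        height=height,
        background_color="white",  # assumed styling choice
    )
    cloud.generate_from_frequencies(to_frequency_dict(entries))
    cloud.to_file(out_path)
    return out_path
```

A call would look like `render_cloud(entries, "path/to/font.ttf", "all_poems.png")`, where both paths are placeholders.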

