mzmmoazam / kashmiri_dataset

Data and tool to fetch Kashmiri text

Kashmiri Dataset

This repository contains the Kashmiri dataset and the tool used to collect it.

Data folders

  • downloaded_content/

    This folder contains word pronunciations, PDFs, docs, and HTML files that contain Kashmiri text.

  • csv_files/

    This folder contains CSV files with data from Kashmiri dictionaries and text from websites.

Note: Find the zip files for these folders in compressed_data/

Some of the zip files may be split into multiple parts. To rebuild the actual zip file, run the following command:

cat zip_filename.zip.part* > zip_filename.zip
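
On systems without `cat`, the same concatenation can be sketched in Python. This is only an illustration, not part of the repository's tool: `join_parts` is a hypothetical helper name, and it assumes the parts are named so that sorting them by name gives the correct order, as the `part*` glob in the command above implies.

```python
import glob
import shutil


def join_parts(zip_path):
    """Concatenate zip_path.part* files, in sorted name order, into zip_path.

    Hypothetical helper; assumes name order equals the original split order.
    """
    # glob.escape keeps special characters in the path from acting as wildcards.
    parts = sorted(glob.glob(glob.escape(zip_path) + ".part*"))
    if not parts:
        raise FileNotFoundError(f"no parts found for {zip_path}")
    with open(zip_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each part so large files are not loaded into memory.
                shutil.copyfileobj(src, out)
    return parts
```

Calling `join_parts("zip_filename.zip")` from inside compressed_data/ would write `zip_filename.zip` next to its parts and return the list of part files it used.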


To learn how the tool works, click here.

Last but not least, I would like to thank the websites from which I collected this data.


Languages

HTML 99.9%, Python 0.1%