j0ma / in-geveb-corpus

Corpus of Yiddish based on literary articles published in "In Geveb"

Home Page: https://ingeveb.org

in-geveb-corpus

Corpus of Yiddish based on https://ingeveb.org

TODO

  • [ ] Clean up yud-yud => tsvey-yudn ligatures
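
One possible way to approach the ligature cleanup is sketched below. It is a minimal illustration, not the repository's actual script: the function name normalize_tsvey_yudn is hypothetical, and it assumes the goal is to replace two consecutive yud characters (U+05D9 U+05D9) with the single tsvey-yudn ligature (U+05F2). It also assumes that a pair followed by a khirik point (U+05B4), as in the word "yidish", should be left alone, because there the second yud is a khirik-yud rather than half of a ligature.

```python
import re

YUD = "\u05D9"         # HEBREW LETTER YOD
TSVEY_YUDN = "\u05F2"  # HEBREW LIGATURE YIDDISH DOUBLE YOD
KHIRIK = "\u05B4"      # HEBREW POINT HIRIQ

# Two yuds in a row, unless the second one carries a khirik
# (assumption: that second yud is then a khirik-yud, not part of a ligature).
_YUD_YUD = re.compile(f"{YUD}{YUD}(?!{KHIRIK})")


def normalize_tsvey_yudn(text: str) -> str:
    """Replace yud-yud sequences with the tsvey-yudn ligature (hypothetical helper)."""
    return _YUD_YUD.sub(TSVEY_YUDN, text)
```

The rule is deliberately conservative; other exceptions (for example, already-ligated text or Hebrew-origin words) would need to be checked against the scraped corpus before applying it in bulk.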

Directory structure

.
├── LICENSE
├── README.md
├── corpus
│   └── scraped articles and CSVs go here
├── data
│   └── other data files (e.g. article links) go here
├── src
│   └── various scripts used to create the corpus go here


License: GNU General Public License v3.0


Languages

Python 69.5%, Shell 30.5%