Tarikhoza / JapaneseDataHorder

Data scraper for various Japanese learning tools (Takoboto, Tatoeba, Ichimoe and OJAD)

JapaneseDataHorder

  • This is a work in progress

A collection of web-scraping modules written to collect data for the UltimateJLPTDeck. It consists of four modules:

Takoboto - Works

To get the vocabulary lists from the website
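A minimal sketch of the parsing half of such a scraper, using only the standard library's `html.parser`. It assumes that each vocabulary entry on a list page is wrapped in an element with a distinguishing CSS class. The class name `vocab-word` and the class `VocabListParser` are hypothetical placeholders, not Takoboto's actual markup, so inspect the real page before relying on them. Fetching the page (for example with `urllib.request.urlopen`) is left out.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class VocabListParser(HTMLParser):
    """Collect the text of every element that carries `target_class`.

    Hypothetical: the class name is a placeholder, not Takoboto's markup.
    """

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.words: list[str] = []
        self._depth = 0               # > 0 while inside a matching element
        self._buffer: list[str] = []  # text pieces of the current match

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1          # nested tag inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:          # the matching element just closed
            self.words.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


parser = VocabListParser("vocab-word")
parser.feed('<li class="vocab-word">食べる</li><li class="vocab-word">飲む</li>')
parser.close()
print(parser.words)  # ['食べる', '飲む']
```

Tracking a nesting depth instead of a simple on/off flag keeps the text of child tags such as `<b>` that sit inside a matching element.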

Tatoeba - Works

To download the sentence-pair TSV file from the website
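Once the TSV file is downloaded, it can be read with the standard `csv` module. This sketch assumes the four-column layout of Tatoeba's sentence-pair export (sentence ID, sentence text, translation ID, translation text) with no header row; check your own file, since the export options may differ. `SentencePair` and `read_sentence_pairs` are illustrative names, not part of this repository.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class SentencePair:
    source_id: int
    source_text: str
    target_id: int
    target_text: str


def read_sentence_pairs(lines: Iterable[str]) -> Iterator[SentencePair]:
    """Yield one SentencePair per row of an (assumed) four-column TSV."""
    # QUOTE_NONE keeps quote characters inside sentences as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # skip blank or malformed rows
        src_id, src_text, tgt_id, tgt_text = row
        yield SentencePair(int(src_id), src_text, int(tgt_id), tgt_text)


sample = "1\t今日は。\t2\tHello.\n"
for pair in read_sentence_pairs(io.StringIO(sample)):
    print(pair.source_text, "->", pair.target_text)  # 今日は。 -> Hello.
```

For a real download, open the file with `open(path, encoding="utf-8", newline="")`, as the `csv` documentation recommends, and pass the file object instead of the `StringIO`.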

Ichimoe - Works

For the sentence deconstruction feature
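A rough sketch of preparing input for the deconstruction step: split a text into sentences and build one query URL per sentence, so each request stays small. The endpoint `https://ichi.moe/cl/qr/` and the `q` parameter are assumptions based on the URLs the site shows after a manual search; verify them before use. `split_sentences` and `build_query_url` are hypothetical helpers.

```python
import re
from urllib.parse import urlencode

# Assumption: endpoint and "q" parameter inferred from the site's search URLs.
ICHIMOE_QUERY_URL = "https://ichi.moe/cl/qr/"

# Zero-width split point right after sentence-final punctuation.
_SENTENCE_END = re.compile(r"(?<=[。！？!?])")


def split_sentences(text: str) -> list[str]:
    """Split Japanese text into sentences, keeping the final punctuation."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def build_query_url(sentence: str) -> str:
    """Percent-encode one sentence into a query URL (hypothetical helper)."""
    return f"{ICHIMOE_QUERY_URL}?{urlencode({'q': sentence})}"


for sentence in split_sentences("今日は晴れ。明日は雨？"):
    print(build_query_url(sentence))
```

`urlencode` percent-encodes the sentence as UTF-8, which is what a browser sends for Japanese text in a query string.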

OJAD - Not finished yet

For the Suzuki-kun pitch accent audio and graph generator

There are other features on these websites that my scripts do not implement. I did not need them, but they could be added if there is demand. If you use these scripts, keep in mind that they can put a lot of load on the websites, so be mindful of how many requests you send. Please use the downloaded data in accordance with the licenses of the websites it comes from.
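One way to follow this advice is to space out requests and honour each site's `robots.txt`. This sketch uses `urllib.robotparser` from the standard library together with a simple fixed-interval limiter. `RateLimiter`, `allowed_by_robots` and the 2-second delay are hypothetical and do not reflect any limit published by these sites.

```python
import time
from typing import Callable, Optional
from urllib import robotparser


class RateLimiter:
    """Keep at least `min_interval` seconds between consecutive requests."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock      # injectable so the timing can be tested
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now += remaining
        self._last = now


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against the rules of an already-downloaded robots.txt."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


limiter = RateLimiter(min_interval=2.0)  # hypothetical 2-second gap
```

Calling `limiter.wait()` before every request keeps at least `min_interval` seconds between them. The injectable `clock` and `sleep` parameters exist only so the timing can be tested without actually sleeping.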

Languages

Python 100.0%