mdd-repo / anki-card-scraping-tools

A collection of language-learning Anki cards and the Python scripts used for generating them.


Anki Card Scraping Tools

A collection of Python scripts used for generating Anki cards through web scraping and file scraping.
To download a file, click its link and then press Ctrl + Shift + S. The key files are linked below.

I love language learning and using Anki.
Ideally, I would just type entire dictionaries [1] into the card creator by hand. Unfortunately, life is short and I do not have time to do that for everything, so I automate the process in Python whenever I run into a resource I need to scrape for vocabulary.


Wiktionary: Persian lemmas (Persian / Farsi Anki Cards)

This card deck was last scraped on April 30th, 2024.
I cannot fully explain what possessed me to do this, beyond "I will use it."
Will anyone else? Maybe. Maybe not. I still wanted to do it.

This is a web-scraped Anki deck pack that includes the entire Wiktionary database of words and phrases in Farsi/Persian. It is incredibly hefty at 13,383 cards. The cards are sorted by part of speech, just as the Persian lemmas page divides them. For each entry, the script picks out the word, pronunciation, definitions, and etymology; redundant entries are passed over in favor of those with definitions. Given the sheer quantity, expect minor formatting errors. Because Wiktionary is community-edited, treat individual entries critically and verify anything that looks off.
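The "redundant entries are passed over in favor of those with definitions" step can be sketched roughly as below. This is a minimal illustration and not the repository's actual script: the `Entry` class, the `dedupe_entries` and `to_anki_tsv` helpers, the field order, and the sample data are all assumptions made for the example. It writes tab-separated text, which is a plain-text format Anki can import.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical record for one scraped lemma; field names are assumptions.
    word: str
    part_of_speech: str
    pronunciation: str = ""
    definitions: str = ""
    etymology: str = ""


def dedupe_entries(entries):
    """Keep one entry per (word, part of speech), preferring one with definitions."""
    best = {}
    for entry in entries:
        key = (entry.word, entry.part_of_speech)
        current = best.get(key)
        # Replace the stored entry only when it lacks definitions and the new one has them.
        if current is None or (not current.definitions and entry.definitions):
            best[key] = entry
    return list(best.values())


def to_anki_tsv(entries):
    """Render entries as tab-separated text, one card per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for e in entries:
        writer.writerow(
            [e.word, e.pronunciation, e.definitions, e.etymology, e.part_of_speech]
        )
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample data standing in for scraped rows.
    sample = [
        Entry("کتاب", "noun"),
        Entry("کتاب", "noun", "ketâb", "book", "From Arabic"),
        Entry("خوب", "adjective", "xub", "good"),
    ]
    print(to_anki_tsv(dedupe_entries(sample)))
```

Keying on the word together with its part of speech means the same spelling can still appear once per category, which matches how the deck is split up.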

Included: the Python code, the spreadsheets, and the Anki deck pack.
Categories: adjectives, adverbs, conjunctions, determiners, interjections, morphemes, multi-word terms, nouns, numerals, particles, phrases, prepositions, pronouns, verbs.

I have taken the liberty of removing any racial and ethnic slurs that I could find; this process may not have been perfect. Words that indicate sexual behavior and contact remain in the pack from a standpoint of linguistic knowledge. Please be aware of this when using the cards.

Available on AnkiWeb #1446159529.


Muskogee / Mvskoke Language Web Dictionary (Mvskoke Anki Cards)

This card deck was last scraped from the April 24th, 2024 update.
The wonderful teachers at Mvskoke Opunvkv have included an online edition of the 2000 "A Dictionary of Creek / Muskogee" reference book compiled by Jack B. Martin and Margaret McKane Mauldin, viewable here [2].

Included: the Python code, the spreadsheet, and the Anki deck made from the current release.

Available on AnkiWeb #2044931447.

Footnotes

  1. Recommendation for the whole-dictionary method: After importing into Anki, click on the deck, then Browse, then shift-click the first and last cards to select everything. Right-click and choose Toggle Suspend. Then go section by section, or word by word as you learn them, and toggle the suspension back off so that you are not overwhelmed. You can also use this to construct your own frequency lists, particularly for languages that do not have readily available lists to study.

  2. This web edition is still in its drafting stages. According to the roadmap, there will be several more rounds of community review before its final version is made public.

License: Creative Commons Zero v1.0 Universal