ym1234 / yomichan-dictionaries

A comprehensive collection of Japanese and Chinese dictionaries for Yomichan, including terms, kanji/hanzi info, frequency, and variants.

Yomichan Dictionaries

This repository contains dictionaries for Yomichan/Yomitan, a Japanese dictionary browser extension for Chrome, Firefox, and Edge. The repository was originally created to host the dictionaries I created, but I have since adapted this repository to serve as a hub for other dictionaries as well. If you have a dictionary you would like to share, please open an issue or pull request.

Please note that this repository is not any kind of ranking or endorsement of the listed dictionaries. I use some but not all of the dictionaries listed. In general, though, I recommend installing as many dictionaries as possible for maximum coverage.

Check out my JP Resources

Japanese

Terms

Do check out yomichan-dict-css for CSS that colors some term dictionaries to make them more immediately distinguishable.

JP-EN Term Dictionaries

JMDict

Download

The most extensive JP-EN dictionary, using data from the EDRDG Project created by Jim Breen. The linked version should be the most up to date, with changes by stephenmk that display crucial information that wasn't previously shown, greatly improve formatting, and add example sentences.

JMnedict

Download

A dictionary of Japanese proper names. The linked version is advantageous over the one linked on the Yomichan homepage as it clutters the search page much less when searching kana, so it's highly recommended.

Shoui Bilingual Dictionaries Collection

Download

There are various bilingual dictionaries in Shoui's bilingual folder. Check the readme in the folder for further information.

  • 新和英 (Recommended)
    • Same as the 研究社 新和英大辞典 第5版 with better deconjugation but lacking some additional sentences.
  • 研究社 新和英大辞典 第5版

New Saitou Japanese-English Dictionary

Download NEW 斎藤和英大辞典

A bilingual dictionary by an anon, with lots of example sentences. To avoid cluttering the search page, you may want to limit the number of example sentences with the following CSS, which hides every list item from the fifth onward (the number 5 can be changed):

[data-dictionary='NEW斎藤和英大辞典'] ul.gloss-sc-ul > li:nth-child(n + 5) {
  display: none;
}

Japanese Monolingual Dictionaries

Shoui Monolingual Dictionaries Collection

Download

There are various monolingual dictionaries in Shoui's monolingual folder, authored by various people. Check the readme in the folder for further information, and check the explanation on learnjapanese.moe on how to use them. Currently contains:

  • 広辞苑 第七版
  • 三省堂国語辞典 第七版 (Recommended)
  • 実用日本語表現辞典 (Recommended)
  • 新明解国語辞典 第七版 (Recommended)
  • 明鏡国語辞典 第二版 (Recommended)
  • 旺文社国語辞典 第十一版 (Recommended)
    • Converted by irhello and shoui.
  • Weblio 古語辞典
    • Scraped/converted by 昔男/mk68.
  • 精選版 日本国語大辞典
  • 明鏡国語辞典
  • 旺文社国語辞典 第十一版 画像無し
  • 新明解国語辞典 第五版
  • 故事ことわざの辞典
    • Converted by Thermosphere with Yomichan Import
  • 広辞苑 第六版
  • 岩波国語辞典 第六版
  • 大辞林 第三版
  • ハイブリッド新辞林 v2
  • デジタル大辞泉
    • Converted by ッツ.
  • 新明解四字熟語辞典
    • Converted by ッツ.
  • 学研 四字熟語辞典
    • Converted by ッツ.
  • 日本語俗語辞書
    • Scraped/converted by Kartoffel.
  • 漢字源

Iwanami Kokugo Jiten

Download 岩波国語辞典 第八版

A monolingual dictionary made by an anon, with very nice formatting and links for related terms.

Images

1 2 3

Grammar Dictionaries

aiko-tanaka Grammar Dictionaries

Download

A collection of grammar dictionaries scraped and converted by aiko-tanaka. A lot of manual work went into making them parse well, so I'd recommend installing all of them. Contains:

  • Nihongo no sensei 毎日のんびり日本語教師
  • E de wakaru 絵でわかる日本語
  • Nihongo Kyoshi JLPT 文法解説まとめ
  • Donna Toki どんなときどう使う 日本語表現文型辞典
  • DoJG 日本語文法辞典(全集)

Other Term Dictionaries

Pixiv

Download

A complete scrape of the public dic.pixiv.net, with approximately 500,000 entries, containing a brief summary and links to related articles for each entry. This dictionary is quite extensive and contains entries for a vast number of proper nouns that would not be in traditional dictionaries. For instance, 和泉妃愛 has an entry, as does likely every notable voice actor, manga, mountain, and VTuber in Japan.

Note This is the lite version without readings. A version with kana readings is in the works, but it will require a lot of time to scrape.

Warning This dictionary is quite large and may take a long time to import.

niconico-pixiv Terms

Note This dictionary is obsolete.

Download

Using the information gathered by ncaq for use in an IME, this is a dictionary that can help parse terms that are in both niconico and pixiv's online dictionaries. These online dictionaries are sort of like encyclopedias of the internet, so many terms such as proper nouns not in traditional dictionaries will be found.

Words that seem unlikely to be useful in an IME dictionary are excluded using rule-based filtering.

surasura Onomatopoeia

Download

A dictionary of onomatopoeia from surasura.com. Contains some onomatopoeia that are not in any other dictionaries. Credit to stephenmk, whose improved JMDict gave the idea of marking information with emojis.

For each entry, it contains:

  • A few definitions
  • An extended explanation if available, marked with the ℹ️ emoji
  • A few example sentences marked with the 🇯🇵 flag emoji

複合語起源 Term Origins

Download | List of words

Compound kunyomi word origins, for example 陥る -> 落ち入る(おち|いる). Information comes from anonymous forum posts, so it may not be 100% accurate.

Sources:

Gogen Yurai

Download

語源由来辞典 parsed from https://gogen-yurai.jp/ by Seikou. Contains information about the origins of words.

Term Frequency

jpdb Frequency Dictionary

Download

A frequency dictionary based on information scraped from https://jpdb.io in May of 2022. More information can be found here.

Due to the way the data was scraped, some terms are missing frequencies and the jpdb dictionary itself is limited to terms in JMDict. For example, 経緯 only has an entry for the いきさつ reading so it should not be used as a dictionary for sorting (the more common/correct reading is けいい). However, the corpus of JPDB is quite good for immersion learners as it covers anime, dramas, light novels, visual novels, and web novels so the frequencies will be relatively accurate to what you're actually reading. This dictionary is notable for displaying the frequencies of kana readings separately, so you can often get a sense of how often a word is written with kanji or not.

Aozora Bunko Jukugo Frequency

Download

A frequency dictionary created using data collected by vrtm based on the Aozora Bunko. Due to the methodology used, this dictionary does not cover words with kana in them but it covers many rare 熟語 not covered by other frequency dictionaries, such as 睽乖.

CC100

Download

Made by the mind behind arujisho, this uses the CC100 dataset which was made by crawling the web. Coverage is very wide, and there is reason behind the way readings are differentiated which is why I use this as my Yomichan sort dictionary.

Original message by Seikou

Hello everyone! Recently I tokenized the CC-100 Japanese dataset (which is a high quality dataset filtered from Commoncrawl web crawl data, and is about 70GB large) as a corpus using mecab(fugashi) and sudachi, resulting a frequency rank list of about 900k words. After filtering it using several monolingual dictionaries, I got a freq rank list of roughly 160k words.

BCCWJ

Download

From the publication:

The balanced corpus of contemporary written Japanese (BCCWJ) is Japan’s first 100 million words balanced corpus. It consists of three subcorpora (publication subcorpus, library subcorpus, and special-purpose subcorpus) and covers a wide range of text registers including books in general, magazines, newspapers, governmental white papers, best-selling books, an internet bulletin-board, a blog, school textbooks, minutes of the national diet, publicity newsletters of local governments, laws, and poetry verses.

It has extremely wide coverage with most terms you'll encounter having an entry in this list even if other frequency lists don't. In addition, it differentiates between readings quite well. Make sure to install the LUW version as it has more terms.

Innocent Ranked

Download

The Innocent Corpus from the Yomichan page but reordered to be sorted by rank. It is based on data from 5000+ novels. A weakness is that it does not differentiate based on reading, so all readings of a term will show the same value.

jpDicts Frequencies

Download

A frequency dictionary created using monolingual dictionary definitions as the corpus, so it might be useful for those who really like reading dictionaries. Made by Avratzzz.

Dictionaries used:
  • ハイブリッド新辞林 v2
  • 故事ことわざの辞典
  • 漢字源
  • 精選版 日本国語大辞典
  • 新明解四字熟語辞典
  • 学研 四字熟語辞典
  • 実用日本語表現辞典
  • 明鏡国語辞典
  • 旺文社国語辞典 第十一版
  • 新明解国語辞典 第五版
  • 大辞林 第三版
  • デジタル大辞泉
  • 岩波国語辞典 第六版
  • 広辞苑 第六版

Youtube Frequency Dictionaries

Download the full Youtube Frequency Dictionary

Download all domain-specific dictionaries

Using data from 40k manually transcribed YouTube videos we have created 16 domain specific frequency lists for YomiChan. Enjoy and feel free to share around. Created by @Zetta @Vexxed @Anonymous

Domain-specific frequency lists from Youtube Videos:

Domains:
  • Vlogs
  • Vehicles
  • Travel
  • TEDx
  • Sports
  • SciTech
  • Pets/Animals
  • Nonprofits
  • News
  • Music
  • HowtoStyle
  • Gaming
  • Film/Anime
  • Entertainment
  • Education
  • Comedy

Corpus of Everyday Japanese Conversation

Download

Converted by n-manas, based on the Corpus of Everyday Japanese Conversation.

The Corpus of Everyday Japanese Conversation (CEJC) is a vocabulary and word count table based on 200 hours of recorded data (approximately from April 2016 to 2020).

Our project will develop a large-scale corpus of Japanese everyday conversation in a balanced manner. Since informants record their conversations in everyday situations by themselves, naturally occurring conversations can be collected. To build an empirical foundation for the corpus design, we conducted a survey of ordinary conversational behavior of about 250 adults.

Since there were several ranks included in the file, the overall rank was chosen to generate this frequency dictionary.

Shoui Dictionaries Collection Misc. Frequency Dictionaries

Some other miscellaneous frequency dictionaries in the Shoui Dictionaries Collection.

  • Anime & J-drama
  • Narou Freq
  • Novels
  • VN Freq v2
  • Wikipedia v2
  • 国語辞典
  • Nier

OhTalkWho オタク Frequency Dictionaries

Download

Some frequency dictionaries made by the YouTuber OhTalkWho オタク.

  • Netflix
  • Top 100 Shonen
  • Top 100 Slice of Life
  • JLPT Level Tags
  • Novel 5k
    • This might just be innocent corpus with stars?
  • Visual Novels
    • Might be based off vnstats? It's different than the VN Freq v2 in Shoui's Dictionaries Collection.

Anacreon's Frequency Dictionaries

Download

Some frequency dictionaries made by Anacreon that are not rank-based but percentage-based: the displayed value is the percentage of that corpus you would be able to read if you knew every word with that value or lower. They are somewhat redundant with other previously mentioned dictionaries, but some people may prefer the percentage-based approach.

Frequency is displayed as a number between 0 (most frequent) and 100 (least frequent). Check out this graph: the number in these dictionaries is essentially the Y axis of the graph. So if you are aiming to understand 95% of the words you come across, the most efficient way would be to mine all the words with a frequency less than or equal to 95.
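As a rough illustration of the percentage-based idea, the sketch below turns raw occurrence counts into a cumulative coverage value per word. It is not Anacreon's actual method, which is not described here; the function name `coveragePercentages` and the toy counts are made up for the example.

```javascript
// Hypothetical helper: maps each word to the percentage of the corpus
// covered by that word plus every more frequent word.
function coveragePercentages(counts) {
  // Sort words from most to least frequent.
  const entries = Object.entries(counts).sort((a, b) => b[1] - a[1]);
  const total = entries.reduce((sum, [, count]) => sum + count, 0);

  const result = {};
  let running = 0;
  for (const [word, count] of entries) {
    running += count;
    // Multiply before dividing to limit floating-point noise.
    result[word] = (running * 100) / total;
  }
  return result;
}

// Toy counts, not real corpus data.
const coverageExample = coveragePercentages({ common: 50, medium: 30, rare: 20 });
console.log(coverageExample); // { common: 50, medium: 80, rare: 100 }
```

With values like these, knowing every word at or below 80 would cover 80% of the running words in the toy corpus.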

Kanji

Yomichan CSS for Kanji Dictionaries

Yomichan and KANJIDIC by default have a lot of bloat in the kanji dictionary viewer, like repeating the kanji stroke order image, frequency information, and unused table rows for every entry. For using multiple kanji dictionaries, you can use some CSS to make the kanji display more compact like it is for terms.

In Settings -> Popup Appearance -> Configure custom CSS... input the following CSS for more compact display of entries.

/* remove misc dict classifications/codepoints/stats */
.kanji-glyph-data > tbody > tr:nth-child(n + 3) {
  display: none;
}

/* remove stroke diagram, freq, header for next entries */
div.entry[data-type='kanji']:nth-child(n + 2) .kanji-glyph-container,
div.entry[data-type='kanji']:nth-child(n + 2) [data-section-type='frequencies'],
div.entry[data-type='kanji']:nth-child(n + 2) table.kanji-glyph-data > tbody > tr:first-child {
  display: none;
}

/* remove 'No data found' */
.kanji-info-table-item-value-empty {
  display: none;
}

/* reduce extra padding */
.kanji-glyph-data,
div.entry[data-type='kanji'],
div.entry[data-type='kanji']:nth-child(n + 2) .kanji-glyph-data > tbody > tr > *,
.kanji-glyph-data dl.kanji-readings-japanese,
div.entry[data-type='kanji']:nth-child(n + 2)
  .kanji-glyph-data
  dl.kanji-readings-chinese[data-count='0'] {
  padding-top: 0 !important;
  padding-bottom: 0 !important;
  margin-bottom: 0em;
  margin-top: 0 !important;
}
/* remove horizontal lines */
.entry + .entry[data-type='kanji'],
div#dictionary-entries > div.entry:nth-child(n + 2) .kanji-glyph-data > tbody > tr > * {
  border-top: none !important;
}
/* change decimal list */
.kanji-gloss-list {
  list-style-type: circle;
}

Kanji Info

KANJIDIC

Download

The KANJIDIC Project's KANJIDIC is the primary English kanji dictionary used in Yomichan and contains information about most kanji, notably English definitions, readings, and some other statistics like stroke count, JLPT, grade level.

Wiktionary Kanji

Download

Kanji information of around 18,000 characters from Wiktionary, notably:

  • 呉音, 漢音, 唐音, 宋音, 慣用音 onyomi readings of kanji (further reading)
  • 字源 - information about how and why a kanji is composed the way it is, including the type of composition it is
  • The meaning of the kanji (in Japanese)
  • The various 異体字 of the kanji

jpdb Kanji

Download

Kanji information of around 6,000 characters from https://jpdb.io:

  • The 15 most common vocab applicable
  • The kanji decomposition according to jpdb (has inaccuracies because it's meant for memorizing keywords)
  • 漢字検定 level
  • 旧字体/新字体/拡張新字体 character form

TheKanjiMap

Download | List of possible phonetic components

Information from TheKanjiMap:

  • Radical information for all radicals
  • Kanji decomposition (more accurate than JPDB)
  • List of all kanji that contain a kanji/component/radical
  • Reading hints based on possible phonetic components (computed based on information from KANJIDIC and the decomposition here)

Kanji Variants

mozc

Download

A kanji dictionary made from the kanji variant information in Google's mozc Japanese IME. Includes information about:

  • 異体字
  • 印刷標準字体
  • 簡易慣用字体
  • 旧字体
  • 略字
  • 正字
  • 俗字
  • 別字
  • 本字

jitai

Download

A kanji dictionary made using the data from jitai. This allows you to see information about 旧字体, 新字体, 拡張新字体, and 標準字体 variants from the kanji page in Yomichan.

Kanji Frequency

Aozora Bunko Kanji Frequency

Download

A kanji frequency dictionary created using data collected by vrtm based on the Aozora Bunko.

Innocent Corpus Kanji Frequency

Download

Uses the Innocent Corpus frequency list that is distributed with Yomichan to create a rank-based kanji frequency dictionary. This was created because the existing one is an occurrence-based list and does not display ranks.

  • The displayed frequency in Yomichan will contain the frequency rank followed by the occurrence count, for example 4686 (57) for 壟, indicating it's the 4686th most common kanji and appeared 57 times total in the 5000+ novels in Innocent Corpus.
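The conversion from an occurrence-based list to a rank-based one can be sketched roughly as follows. This is only an illustration, not the script actually used for this dictionary; the function name `rankByOccurrence`, the output shape, and the toy counts are assumptions.

```javascript
// Hypothetical helper: assigns ranks by descending occurrence count and
// builds a "rank (count)" display string like the one described above.
function rankByOccurrence(counts) {
  const sorted = Object.entries(counts).sort((a, b) => b[1] - a[1]);
  return sorted.map(([kanji, count], index) => ({
    kanji,
    rank: index + 1,
    display: `${index + 1} (${count})`,
  }));
}

// Toy counts, not real Innocent Corpus data.
const rankExample = rankByOccurrence({ 人: 900, 日: 1200, 壟: 57 });
console.log(rankExample.map((entry) => `${entry.kanji}: ${entry.display}`));
// [ '日: 1 (1200)', '人: 2 (900)', '壟: 3 (57)' ]
```

Ties keep their input order here; a real conversion may prefer to give tied kanji the same rank.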

Wikipedia Kanji Frequency

Download

Rank-based kanji frequency data from a May 2015 dump of Japanese Wikipedia, containing around 20,000 kanji. Data gathered by scriptin.

jpdb Kanji Frequency

Download

Kanji frequency data from https://jpdb.io.

Mandarin Chinese

Terms

Term Dictionaries

Shoui's Chinese Yomichan Setup

Shoui's guide to setting up Yomichan for Chinese, which includes links for:

  • [ZH-EN] CEDICT
  • [ZH-JA] 中日大辞典 第二版
  • [ZH-ZH] 兩岸詞典
  • [ZH-ZH] 漢語大詞典
  • [ZH-ZH] 萌典国语辞典 (简体字)

Other Chinese Dictionaries

I'm not sure who made these, but some dictionaries are available on this Google Drive. Includes:

  • 萌典.pinyin
  • 萌典
  • 牛津英汉汉英词典
  • 现代汉语规范词典
  • 譯典通英漢雙向字典
  • 五南國語活用辭典

Chinese Frequency

General Global Chinese Frequency

Frequency

A general Chinese frequency dictionary that is likely based off of "the chinese internet, movies, books, etc as a whole" according to its author Kamui.

Hanzi

See Yomichan CSS for Kanji Dictionaries for CSS used to reduce the clutter included by default in Yomichan.

Warning The default kanji stroke order font included with Yomichan is made for Japanese kanji stroke orders, and thus will contain incorrect glyphs and stroke orders for Chinese that may be misleading. You can change this by using some CSS:

.kanji-glyph {
  font-family: sans-serif; /* or whatever font you prefer for Chinese */
}

Wiktionary Hanzi

Download

Hanzi information for nearly 100,000 characters from ZH Wiktionary. Due to the complexity of the Wiktionary pages, the dictionary displays most of the text on each page but excludes tables and similar elements, so pinyin readings may be missing for many characters. In addition, note that little information is available for some uncommonly used characters, as their wiki pages often consist of just Unicode information and code points, which were stripped from the dictionary.

Cantonese

Terms

CantoDict

Download

CantoDict was a Cantonese-English dictionary created and maintained by Adam Sheik and public contributors. It was abandoned, but the data was archived thanks to awong-dev at https://github.com/awong-dev/cantodict-archive. This dictionary is based off of the archived data.

Misc Dictionaries

Download

Thanks to richter_belmont on the Refold Cantonese Discord:

I converted all of the Migaku dictionaries from the "Learn Cantonese!" shared folder on Google Drive into Yomichan dictionaries. List of dictionaries available are:

  • Canto CEDICT
  • CC-Canto
  • CE Wiktionary
  • Words.hk C-C
  • Words.hk C-E

Other

Japanese-Mongolian

Japanese-Mongolian/日・モ辞典

Download | No example sentences version

A Japanese to Mongolian dictionary scraped from 栗林均's site. It contains about 19,000 entries.

現代日・モ辞典 橋本勝、エルデネ・プレブジャブ『現代日本語モンゴル語辞典』春風社、2001.
