brainsqueezer / hunspell_dictionaries

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Name: hunspell dictionaries
URL: http://bgoffice.sourceforge.net/
     http://code.google.com/p/tr-spell/
     http://extensions.services.openoffice.org/dictionary
     http://extensions.services.openoffice.org/en/project/dict_et
     http://extensions.services.openoffice.org/en/project/dict_lv_LV
     http://extensions.services.openoffice.org/en/project/dict-nl
     http://extensions.services.openoffice.org/project/dict_ru_RU
     http://extensions.services.openoffice.org/project/pl-dict
     http://ftp.linux.ee/pub/openoffice/contrib/dictionaries/et_EE.zip
     http://ftp.services.openoffice.org/pub/OpenOffice.org/contrib/dictionaries/de_DE_neu.zip
     https://github.com/brown-uk/dict_uk
     http://openoffice.rs/dict-sr/
     https://addons.mozilla.org/en-US/firefox/addon/albanian-dictionary/
     https://addons.mozilla.org/en-US/firefox/addon/thamizha-solthiruthi/
     https://addons.mozilla.org/en-US/firefox/addon/turkce-imla-denetimi/
     https://dsso.googlecode.com/files/sv-2.12.zip
     http://sourceforge.net/projects/magyarispell/
     https://spellcheck-ko.googlecode.com/files/ko-aff-dic-0.5.6.zip
     http://tajlingvo.tj/project/TajikHunSpellDictionary/tg_TG.zip
     https://tr-spell.googlecode.com/files/firefox-tr-dict-v0.3.2.xpi
     http://wiki.services.openoffice.org/wiki/Dictionaries
     http://wordlist.sourceforge.net/
     http://www.broffice.org/
     http://www.dicollecte.org/
     http://www.justlocal.com.au/
     http://lilak-project.com/
     http://techiaith.cymru/hunspell/cy_GB/
     https://github.com/hyspell/HySpell_3.0.1/tree/master/Dictionaries/Dictr
     https://github.com/msmiljan/srdict_chromium

License: MPL 1.1/GPL 2.0/LGPL 2.1/LPGL 3.0
Security Critical: no
Version: unknown

This folder contains the following:

.dic files
.aff files
.bdic files
.dic_delta files
README_<language>_<region>.txt files

The .bdic files are binary files, generated from the corresponding .dic and .aff
files, using convert_dict (chrome\tools\convert_dict). These binary files are
used by the spellchecker. The .dic_delta files are used to add words which are
not there in the .dic files. Irrespective of the encoding of the corresponding
.dic file, the .dic_delta files are encoded as UTF-8.

The final binary file, .bdic, is generated with words from the .dic and
additional words from the .dic_delta file. In order to get the current-most bdic
file versions, it is a good idea to rebuild them using convert_dict from the
.dic, .aff and .dic_delta files in this folder. convert_dict takes care of
duplicate entries present both in .dic and .dic_delta files.

The README_<language>_<region>.txt files contain information about the
individual dictionaries, including copyright information.

The .bdic files are versioned to force clients to download new versions when
necessary. Use the same version for all the dictionaries that you add at the
same time. Increment the major version number (5) if you're updating either dic
or aff files. Increment the minor version number (0) if you're updating only
dic_delta files. If you add or update dictionaries, make sure to update the
constants in chrome\common\spellcheck_common.cc.

Note: the encodings for these files are usually only UTF-8 for English
dictionaries. Otherwise, they could be anything. This will lead to
errors when trying to upload so in general, you'll just have to upload
your change and cross your fingers.

The following 39 dictionaries have been appended with or without additional
words using the .dic_delta files (as of December 14th 2012), and are covered
under the existing GPL/LGPL/MPL tri-license in COPYING:

af_ZA:  No changes

bg_BG:  No changes

ca_ES:  Added words
        Added NOSUGGEST flag = ! to .aff file
        Added two words to .dic file with the ! flag to mark them forbidden/nosuggest.

cs_CZ:  Added words

cy_GB:  No changes

da_DK:  Added words
        Changed "øvrigt/mk" to "øvrigt/" in da_DK.dic to make convert_dict work.

de_DE_neu:  Added words
        Removed "Information" from dic_delta in favor of "Information/P" to fix
            the spelling of "Informationen"

en_AU:  Added the same words as en_CA.

en_CA:  Added words

en_GB:  Added the same words as en_CA.

en_US:  Added the same words as en_CA.

es_ES:  Added words

et_EE:  No changes

fa_IR:  Removed IGNORE from affix file, because we don't support it.

fo_FO:  No changes

fr_FR:  Added words to 4.8 (modern) downloaded from
        http://www.dicollecte.org/download.php?prj=fr

he_IL:  Added words

hi_IN:  Added words

hr_HR:  Added words

hu-HU:  Added words

hy:     Removed a newline on line 6002 (word and affixes were split on 2 lines,
          probably by mistake, and that broke our convert_dict)

id_ID:  Added words

it_IT:  Added words
        Added NOSUGGEST flag = % to .aff file
        Added three words to .dic file with the ! flag to mark them forbidden/nosuggest.

ko:     Removed a line from dic that was longer than 128 characters

lt_LT:  Added words.
        Added NOSUGGEST flag = ! to .aff file
        Added some words to .dic file with the ! flag to mark them forbidden/nosuggest.

lv_LV:  Added words

nb_NO:  Added words

nl_NL:  Added words

pl_PL:  Added words

pt_BR:  Added words

pt_PT:  Added words

ro_RO:  Added words

sh:     Replaced with new rule based dictionary

sk_SK:  Added words

sl_SI:  Added words
        Added NOSUGGEST flag = ! to .aff file
        Added five words to .dic file with the ! flag to mark them forbidden/nosuggest.
        Changed "Ÿvrklji/N" to "Ÿvrklji/" in sl_SI.dic to make convert_dict work.

sq:     No changes

sr:     Replaced with new rule based dictionary

sv_SE:  Added words
        Changed "ω/r" to "ω/" in sv_SE.dic to make convert_dict work.

ta_IN:  Removed the word "சர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர" to make convert_dict work.

tr_TR:  No changes

uk_UA:  No changes

vi_VN:  Added words

The following dictionary has a BSD-like license in README_ru_RU.txt:

ru_RU:  Added words

The following dictionary has an Apache license in COPYING.Apache:

tg_TG:  No changes

On Jan 20, 2011, en_US.dic_delta was copied to en_AU.dic_delta and
en_GB.dic_delta so these locales would get the same additional words.

On Dec 26, 2012, we reran convert_dict on all dictionaries to add MD5 checksum
to the .bdic files, added sq, ta, and ko dictionaries, and bumped the versions
up to 3-0.

On Jan 8, 2013, we added back et_EE and tr_TR .dic and .aff files to add MD5
checksums to their .bdic files. We used version 3-0 again, because these files
were not updated with the rest of the dictionaries on Dec 26, 2012.

On Jan 9, 2013, we changed the download location of tr_TR from a Firefox website
to a Google code website. The only difference between the two locations is that
the tr_TR affix file from the Google code website includes a "FLAG num"
directive, which prevents Chrome's heapcheck tests from crashing. Given that aff
file changed, we used version 4-0.

On Mar 4, 2014, we added tg_TG dictionaries for Tajik language. We used 5-0
version to denote a new batch of dictionaries.

On Oct 28, 2014, we updated the en dictionaries to the latest upstream version.
We also added a bunch of words from bugs that were being flagged as incorrect.
Given that the aff & dic files changed, we used version 4-0.

On Feb 26, 2015, we updated the en-US and en-CA dictionaries to the latest
upstream version from SCOWL, but made sure that "alot -> a lot" correction does
not disappear, as it is not in SCOWL yet. The updated dictionaries support
typographical apostrophes. Because both aff and dic files changed, we
incremented major version to 6-0. That's 1 higher than the highest major version
of any dictionary.

On Mar 10, 2015, we reran convert_dict on en-US and en-CA dictionaries to fix
the missing REP rules.

On Mar 18, 2016, we added "Lilak" dictionary for Persian Language (fa-IR) and
updated en-US, en-CA and en-GB dictionaries from SCOWL. Added several words to
en*.dic_delta files. Because both aff and dic files changed, we incremented
major version to 7-0. That's 1 higher than the highest major version of any
dictionary.

On Oct 13, 2017, we updated the en dictionaries to the latest upstream version.
Since SCOWL now has an AU dictionary, we changed to using that to generate our
AU dictionary instead of copying the CA version. Because dic files changed, we
incremented the major version to 8-0.

On Aug 20, 2019, we replaced existing no aff rules based Serbian dictionaries
with new aff rule based dictionaries (1800+ rules). That applies to both sr-*
and sh-* dictionaries (Cyrillic and Latin alphabets respectively). New
dictionaries are not continuation of old ones but keep the same file names,
so we just incremented major version from 3-0 to 4-0. Original unaltered new
dictionaries are located at: https://github.com/msmiljan/srdict_chromium

On Jan 8, 2020, we updated the en and fa dictionaries to the latest upstream
versions. Because dic files changed, we incremented the major version to 9-0.

About

License:Apache License 2.0


Languages

Language:Python 100.0%