Use GitHub Releases for assets
wooorm opened this issue
👋
GitHub Releases makes it easy to:
- watch releases
- see the changelog
- download assets, which is the main reason I’m opening this issue: I’ve been trying for a couple hours on macOS to build the hunspell .aff and .dic files but can’t seem to make it work 😩
Extra info: I package hunspell dictionaries for use in the JS ecosystem (wooorm/dictionaries), and similar projects (such as this one) do offer this feature.
Can you post the error message? Until an official version is available you can use an unofficial version from here.
Sweet, thanks!
Setup
So, I’m on macOS (latest), and updated to use GNU stuff:
- ispell:
brew install ispell
- hunspell:
brew install hunspell
- coreutils:
brew install coreutils
- sed:
brew install gnu-sed
- awk:
brew install gawk
- m4:
brew install m4
Where the installation logs said so, I’ve put them in PATH to override the Apple versions of the tools.
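For reference, a minimal sketch of that PATH setup. It assumes Homebrew's usual layout, where the unprefixed GNU commands live under opt/<formula>/libexec/gnubin and the keg-only m4 under opt/m4/bin. The helper name gnu_path is made up for illustration.

```shell
# Build a PATH fragment that places Homebrew's unprefixed GNU tools first.
# Assumption: Homebrew's standard opt/<formula>/libexec/gnubin layout.
gnu_path() {
  prefix="$1"   # e.g. the output of `brew --prefix`
  out=""
  for formula in coreutils gnu-sed gawk; do
    out="${out}${prefix}/opt/${formula}/libexec/gnubin:"
  done
  # m4 is keg-only, so its binary sits in opt/m4/bin rather than gnubin.
  printf '%s%s\n' "$out" "${prefix}/opt/m4/bin"
}

# Usage (only where Homebrew is installed):
# PATH="$(gnu_path "$(brew --prefix)"):$PATH"; export PATH
```

The export has to happen in the same shell that later runs make, otherwise the Apple tools in /usr/bin are still found first.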
I’ve updated the max open files:
$ launchctl limit maxfiles
maxfiles 65536 2000000
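Note that launchctl reports launchd's limits, while the shell that runs make has its own soft and hard limits, which may differ. A small sketch to print them (show_fd_limits is a hypothetical helper name):

```shell
# Print the open-file limits of the current shell session.
show_fd_limits() {
  printf 'soft: %s\n' "$(ulimit -Sn)"
  printf 'hard: %s\n' "$(ulimit -Hn)"
}

# Usage:
# show_fd_limits
# ulimit -n 4096   # raise the soft limit for this shell only; cannot exceed the hard limit
```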
Usage
After cloning this repo I do:
$ LC_ALL=C make myspell
Yields:
===> magyar myspell alapszótár (magyar4myspell.dict) előállÃtása
==> szimbolikus kötések létrehozása a szotar.konf alapján
konfigurációs állomány nincs megadva, alapértelmezett a szotar.konf
/Users/tilde/Downloads/oss/magyarispell
. Rendben.
==> szótárak egybemásolása
.recode: Invalid input in step `UTF-8..ISO-8859-2'
recode: Invalid input in step `UTF-8..ISO-8859-2'
.recode: Invalid input in step `UTF-8..ISO-8859-2'
recode: Invalid input in step `UTF-8..ISO-8859-2'
.recode: Invalid input in step `UTF-8..ISO-8859-2'
recode: Invalid input in step `UTF-8..ISO-8859-2'
..............recode: Invalid input in step `UTF-8..ISO-8859-2'
recode: Invalid input in step `UTF-8..ISO-8859-2'
..................recode: Invalid input in step `UTF-8..ISO-8859-2'
recode: Invalid input in step `UTF-8..ISO-8859-2'
................recode: Invalid input in step `UTF-8..ISO-8859-2'
recode: Invalid input in step `UTF-8..ISO-8859-2'
.recode: Invalid input in step `UTF-8..ISO-8859-2'
recode: Invalid input in step `UTF-8..ISO-8859-2'
.recode: Invalid input in step `UTF-8..ISO-8859-2'
recode: Invalid input in step `UTF-8..ISO-8859-2'
.recode: Invalid input in step `UTF-8..ISO-8859-2'
recode: Invalid input in step `UTF-8..ISO-8859-2'
.recode: Invalid input in step `UTF-8..ISO-8859-2'
recode: Invalid input in step `UTF-8..ISO-8859-2'
.recode: Invalid input in step `UTF-8..ISO-8859-2'
recode: Invalid input in step `UTF-8..ISO-8859-2'
.sed: 1: "/Users/tilde/Downloads/ ...": undefined label 'ilde/Downloads/oss/magyarispell/tmp/fonev_osszetett.1'
Rendben.
# hy: összetétel
-n .
==> igébÅ‘l képzett alakok előállÃtása
.......... Rendben.
==> igék
... Rendben.
==> kivételek
-n .
-n .
-n .
-n .
Rendben.
==> névszók
.../usr/bin/awk: fonev_igekoto.1 makes too many open files
source line number 30
./usr/bin/awk: fonev_igekoto.1 makes too many open files
source line number 30
............/usr/bin/awk: fonev_igekoto.1 makes too many open files
source line number 30
./usr/bin/awk: fonev_igekoto.1 makes too many open files
source line number 30
/usr/bin/awk: fonev_igekoto.1 makes too many open files
source line number 30
/usr/bin/awk: fonev_igekoto.1 makes too many open files
source line number 30
.. Rendben.
==> morfológiai kódok
-n .
==> tiltott szavak
.recode: Invalid input in step `UTF-8..ISO-8859-2'
..... Rendben.
Rendben.
===> ragozási táblázat (magyar.aff) előállÃtása
===> myspell ragozási táblázat (hu_HU.aff) előállÃtása
===> myspell szótár (hu_HU.dic) előállÃtása
awk: cmd. line:2: (FILENAME=tmp/allomorf.txt FNR=1) warning: Invalid multibyte data detected. There may be a mismatch between your data and your locale.
===> Unicode karakterkódolású állományok előállÃtása
===> TömörÃtett Hunspell szótárak elkészÃtése
output: hu_HU_u8_alias.dic, hu_HU_u8_alias.aff
output: hu_HU_u8_gen_alias.dic, hu_HU_u8_gen_alias.aff
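The log above mentions /usr/bin/awk, and the sed error looks like the BSD sed message format, so the Apple tools may still be the ones being picked up. A rough sketch to check which sed and awk are resolved, assuming GNU tools mention "GNU" in their --version output (is_gnu and report_tools are illustrative names):

```shell
# Succeeds if the command identifies itself as GNU in its --version output.
is_gnu() {
  "$1" --version </dev/null 2>/dev/null | grep -q 'GNU'
}

# Print where each tool resolves on PATH and whether it looks like GNU.
report_tools() {
  for tool in "$@"; do
    if is_gnu "$tool"; then kind="GNU"; else kind="not GNU"; fi
    printf '%s -> %s (%s)\n' "$tool" "$(command -v "$tool")" "$kind"
  done
}

# Usage:
# report_tools sed awk m4
```

If awk still resolves to /usr/bin/awk here, the "too many open files" messages probably come from the Apple awk, not from gawk.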
Results
Actual
$ tail hu_HU_u8.aff -n20
PFX ! 0 nyolc/% . [adj_num]+
PFX ! 0 nulla/% . [adj_num]+
PFX ! 0 negyven/% . [adj_num]+
PFX ! 0 millió/% . [adj_num]+
PFX ! 0 milliárd/% . [adj_num]+
PFX ! 0 két/% . [adj_num]+
PFX ! 0 kilencven/% . [adj_num]+
PFX ! 0 kilenc/% . [adj_num]+
PFX ! 0 húsz/% . [adj_num]+
PFX ! 0 hét/% . [adj_num]+
PFX ! 0 három/% . [adj_num]+
PFX ! 0 hetven/% . [adj_num]+
PFX ! 0 hatvan/% . [adj_num]+
PFX ! 0 hat/% . [adj_num]+
PFX ! 0 harminc/% . [adj_num]+
PFX ! 0 fél/% . [adj_num]+
PFX ! 0 ezer/% . [adj_num]+
PFX ! 0 egy/% . [adj_num]+
PFX ! 0 billió/% . [adj_num]+
Expected
$ tail hu_HU_u8.aff -n20
PFX ! 0 nyolc/% .
PFX ! 0 nulla/% .
PFX ! 0 negyven/% .
PFX ! 0 millió/% .
PFX ! 0 milliárd/% .
PFX ! 0 két/% .
PFX ! 0 kilencven/% .
PFX ! 0 kilenc/% .
PFX ! 0 húsz/% .
PFX ! 0 hét/% .
PFX ! 0 három/% .
PFX ! 0 hetven/% .
PFX ! 0 hatvan/% .
PFX ! 0 hat/% .
PFX ! 0 harminc/% .
PFX ! 0 fél/% .
PFX ! 0 ezer/% .
PFX ! 0 egy/% .
PFX ! 0 billió/% .
It looks like the release at https://github.com/crash5/mozilla-hungarian-spellchecker/releases/tag/2023.12.25.04.07 has HTML entities embedded in the file: ‰
FORBIDDENWORD w
WORDCHARS -.‰§%°0123456789–€''&ffi;&ffl;&ff;&fi;&fl;
I'm simply building it with the commands from the README, so I can't really do anything about it.
From a quick look at the git history and the makefiles, these codes are replaced with their Unicode counterparts during the Unicode output generation (see bin/l1_u8.sed and bin/u8myspell).
I don't know MySpell, so I can't tell whether it is a bug that they are left in the output of the non-Unicode version. Maybe @laszlonemeth or @tgyurci can answer this question.
As far as I can see, the Unicode outputs (hu_HU_u8*) are fine. Can you use these instead of the non-Unicode versions?
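To check a given output file for leftover entity-style codes, something like the following might help. It is only a sketch: list_entities is a made-up helper, and the pattern simply matches anything of the form &name; without knowing which codes the build scripts actually handle.

```shell
# Count each distinct &name; style code found in a file.
list_entities() {
  # LC_ALL=C treats the file as raw bytes, so an ISO-8859-2 file
  # does not trip locale-related errors; -o prints only the matches.
  LC_ALL=C grep -o '&[A-Za-z0-9][A-Za-z0-9]*;' "$1" | sort | uniq -c
}

# Usage:
# list_entities hu_HU.aff
# list_entities hu_HU_u8.aff   # expected to print nothing if the codes were replaced
```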