Fanaen / Hunspell2WordList

Java library/tool generating every possible word from Hunspell dictionaries

Help to use

Alessio-R opened this issue

Hi, this isn't an issue but a request for help.
Sorry about that, but I'm not an expert. I think your application could help me a lot, but I don't know how to use it. I have both the .dic and .aff files of the dictionary whose word list I need to obtain.

No need for apologies. The instructions are rather succinct.

First, you'll need a Java runtime on your computer. Depending on your platform, you can get it here or install it with your package manager (on GNU/Linux for example).

Then, download the latest version and extract it to the folder of your choice. Put the .dic/.aff files in a subfolder named data.

Then, open a terminal/shell/command line (cmd.exe on Windows; you can find it by searching in the Start menu) and go to this folder with the cd command. Now you can type the command as described in the project readme, where dictionaryName is the name of your files without the .dic/.aff extension.

Is that better?

Thank you, I did everything you said, but it returns this error: "unable to access jarfile hunspell2wordlist"

Which platform are you working on (Windows/Linux/Mac OSX/etc.)?

Two possibilities: either the terminal isn't in the folder containing the .jar file, or it doesn't recognise the file name.
Try with the exact case instead.

java -jar Hunspell2WordList.jar data/dictionaryName outputFile.txt

I'm on Windows. The terminal is in the main folder (which I called "Hunspell2" and which contains Hunspell2WordList.jar, license, readme.md and the data folder). It doesn't work with the exact case either.

Weird. I have to admit I'm a bit lost.
Maybe you can try with the absolute path, like this, although it isn't supposed to change anything:

java -jar "C:\path\to\folder\Hunspell2WordList.jar" data/dictionaryName outputFile.txt

Another possibility is to try with the other version (v0.1.0).

Nothing: it gives me the same error, even with the older version. For now I would like to thank you for your help. I hope to resolve this problem, because your application could be very useful to me.

I finally resolved the problem by reinstalling the Java environment. It seems to work correctly with an Italian dictionary (https://github.com/elastic/hunspell/tree/master/dicts/it_IT-moz), although I can't find the output, even when giving the complete path of the desired file (c:...). However, it seems to have problems with this Ancient Greek dictionary: http://extensions.openoffice.org/en/project/ancient-greek-spell-checker. It is very important for me to have a complete output of that dictionary. Again, thank you for your help.

You're welcome.
The program was not designed to work with absolute paths, so you should avoid them if possible. The output goes either to the console or to the named file.

Can you send the errors, please?

Yes, but it didn't create the named file! Here is a pic of the result: http://imgur.com/vtzKWc9

It seems that the it_IT files use the ISO8859-15 encoding and H2WL can't handle it.
However, it is possible to convert these files to UTF-8 to fix this issue. The simplest way to do that is to install Notepad++, open each file and click on Encoding > Convert to UTF-8.
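
If you prefer not to install an editor, the same conversion can be sketched in a few lines of Java. This is a hypothetical helper, not part of H2WL: the class name ConvertToUtf8 and its two arguments are assumptions made for illustration, and it assumes the input really is ISO-8859-15 and that your Java installation can compile and run source files (a JDK, not only a runtime).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; it is not shipped with Hunspell2WordList.
class ConvertToUtf8 {

    // Decode the bytes with the source charset, then re-encode them as UTF-8.
    static byte[] toUtf8(byte[] input, Charset source) {
        return new String(input, source).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: ConvertToUtf8 <inputFile> <outputFile>");
            return;
        }
        // Assumption: the dictionary files are encoded in ISO-8859-15.
        Charset source = Charset.forName("ISO-8859-15");
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        // Write to a separate output file so the original stays untouched.
        Files.write(Paths.get(args[1]), toUtf8(original, source));
    }
}
```

Run it once for the .dic file and once for the .aff file. Hunspell .aff files usually declare their encoding on a SET line, which will still name the old encoding after the conversion; whether H2WL reads that line is not something I can confirm here.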

Where are the .dic/.aff files for the Ancient Greek dictionary?

I converted them to UTF-8 and it worked for it_IT (I had to write data/ita.txt in the shell to obtain the txt file). The Ancient Greek package is here: http://members.hellug.gr/sng/ancientgreekoxt/download.html; you must open it as a zip archive and extract the .dic and .aff files.

Thanks. The new version (v0.1.2) should solve the issue.

Note that you can also use the command without any argument, like this:

$ java -jar Hunspell2WordList.jar
Command help: hunspell2wordlist [InputFilePath] [OutputFilePath]
Enter the input file's path: data/grc_GR
Enter the output file's path: grc.txt
References stats:
 * 0 AM references
 * 0 AF references
 * 5000 affixes
 * 5000 affix options
DicParser stats:
 * 760186 lines
WordListGenerator stats:
 * 1784034 words

Thank you very much, it worked perfectly. This application solved a big problem for me: obtaining a multilingual spell checker to use with TeXStudio (a LaTeX editor). While all the other dictionaries merged correctly with hunspell-merge (https://github.com/arty-name/hunspell-merge), the Ancient Greek one had a problem with its affixes. Now I have put all the Greek words in the .dic file of the merged dictionaries, without affixes, and it works perfectly in combination with the other languages. Your help is much appreciated; I hope your application gets the appreciation it deserves.
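
For anyone wanting to do the same, here is a minimal sketch of turning an H2WL word list into the lines of an affix-free .dic file. It relies on the usual Hunspell .dic layout, where the first line holds the (approximate) number of entries and each following line holds one word. The class name WordListToDic and its arguments are hypothetical, and the sketch assumes the words contain no "/" character, since that character introduces flags in a .dic entry.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; it is not part of H2WL or hunspell-merge.
class WordListToDic {

    // Build the lines of an affix-free .dic file: entry count first, then one word per line.
    static List<String> toDicLines(List<String> words) {
        // Drop blank lines and duplicates while keeping the original order.
        Set<String> unique = new LinkedHashSet<>();
        for (String word : words) {
            String trimmed = word.trim();
            if (!trimmed.isEmpty()) {
                unique.add(trimmed);
            }
        }
        List<String> lines = new ArrayList<>();
        lines.add(Integer.toString(unique.size()));
        lines.addAll(unique);
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: WordListToDic <wordList.txt> <output.dic>");
            return;
        }
        // Assumption: the word list is a UTF-8 text file with one word per line.
        List<String> words = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), toDicLines(words), StandardCharsets.UTF_8);
    }
}
```

When the words are appended to an existing merged .dic instead, the count on its first line would need to be updated by hand.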

This merge problem is probably the same as the one H2WL had. The Greek dictionary uses numeric references for its affixes, a notation that is usually used for something else.

Just keep in mind that the result of H2WL is far from perfect. However, I am glad that my little side-project helped.
Thank you for your kind words.