Help to use

Question

Alessio-R opened this issue 8 years ago · comments

Hi, this isn't an issue but a request for help.
Sorry about that, but I'm not an expert. I think your application could help me a lot, but I don't know how to use it. I have both the .dic and .aff files of the dictionary I need.

Elouan Poupard-Cosquer · Answer 1 · Sat Apr 09 2016 21:02:18 GMT+0800 (China Standard Time)

No need for apologies. The instructions are rather succinct.

First, you'll need a Java runtime on your computer. Depending on your platform, you can get it here or install it with your package manager (on GNU/Linux for example).

Then, download the latest version and extract it to the folder of your choice. Put the .dic/.aff files in a subfolder named data.

Then, open a terminal/shell/command line (cmd.exe on Windows, which you can find by searching in the Start menu) and go to this folder with the cd command. Now you can type the command as described in the project readme, where dictionaryName is the name of your files without the extension.

Is that better?
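
The steps above can be sketched as a small Python check, for readers who want to verify the folder layout before running the tool. This is only an illustration: `build_command` is a hypothetical helper, not part of the project, and it assumes the jar is named Hunspell2WordList.jar and takes the two arguments shown in the command later in this thread.

```python
from pathlib import Path


def build_command(folder, dictionary_name, output_file):
    """Check the expected layout and return the command as a list of arguments."""
    folder = Path(folder)
    jar = folder / "Hunspell2WordList.jar"
    data = folder / "data"
    required = (
        jar,
        data / f"{dictionary_name}.dic",
        data / f"{dictionary_name}.aff",
    )
    missing = [str(p) for p in required if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing: " + ", ".join(missing))
    # Relative paths only: the command is meant to be run from inside `folder`.
    return ["java", "-jar", jar.name, f"data/{dictionary_name}", output_file]
```

The returned list can be passed to `subprocess.run(..., cwd=folder)`, which runs the command from inside the folder, as `cd` does in a terminal.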

Alessio-R · Answer 2 · Sat Apr 09 2016 21:12:58 GMT+0800 (China Standard Time)

Thank you. I did everything you said, but it returns this error: "unable to access jarfile hunspell2wordlist"

Elouan Poupard-Cosquer · Answer 3 · Sat Apr 09 2016 21:20:43 GMT+0800 (China Standard Time)

Which platform are you on (Windows/Linux/Mac OSX/etc.)?

There are two possibilities: either the terminal isn't in the folder containing the .jar file, or it doesn't recognise the file name.
Try with the exact case instead.

java -jar Hunspell2WordList.jar data/dictionaryName outputFile.txt
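
Both possibilities can be checked with a short script. The sketch below is an assumption-laden illustration (`find_jar` is a hypothetical helper): it looks in a folder for a file whose name matches Hunspell2WordList.jar when case is ignored, and returns the name exactly as it is stored on disk.

```python
from pathlib import Path


def find_jar(folder, wanted="Hunspell2WordList.jar"):
    """Return the on-disk name matching `wanted` (ignoring case), or None."""
    for entry in Path(folder).iterdir():
        if entry.is_file() and entry.name.lower() == wanted.lower():
            return entry.name
    return None


if __name__ == "__main__":
    print("Current folder:", Path.cwd())
    print("Jar found as:", find_jar("."))
```

If the function returns None, the terminal is not in the folder that holds the jar. If it returns a name with different capitalisation, use that exact spelling in the command.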

Alessio-R · Answer 4 · Sat Apr 09 2016 21:27:36 GMT+0800 (China Standard Time)

I'm on Windows. The terminal is in the main folder (which I called "Hunspell2" and which contains Hunspell2WordList.jar, license, readme.md and the data folder). It doesn't work with the exact case either.

Elouan Poupard-Cosquer · Answer 5 · Sat Apr 09 2016 21:40:11 GMT+0800 (China Standard Time)

Weird. I have to admit I'm a bit lost.
Maybe you can try with the absolute path like this, but this isn't supposed to change anything:

java -jar "C:\path\to\folder\Hunspell2WordList.jar" data/dictionaryName outputFile.txt

Another possibility is to try with the other version (v0.1.0).

Alessio-R · Answer 6 · Sat Apr 09 2016 23:43:48 GMT+0800 (China Standard Time)

Nothing, it gives me the same error, even with the older version. For now I would like to thank you for your help: I hope to resolve this problem, because your application could be very useful for me.

Alessio-R · Answer 7 · Sun Apr 10 2016 00:40:51 GMT+0800 (China Standard Time)

I finally resolved the problem by reinstalling the Java environment. It seems to work correctly with an Italian dictionary (https://github.com/elastic/hunspell/tree/master/dicts/it_IT-moz), although I can't find the output, even when I give the complete path of the desired file (c:...). However, it seems to have problems with this Ancient Greek dictionary: http://extensions.openoffice.org/en/project/ancient-greek-spell-checker. It is very important for me to have a complete output of that dictionary. Again, thank you for your help.

Elouan Poupard-Cosquer · Answer 8 · Sun Apr 10 2016 00:45:42 GMT+0800 (China Standard Time)

You're welcome.
The program was not designed to work with absolute paths, so you should avoid this if possible. The output is either the console or the named file.

Can you send the errors, please?

Alessio-R · Answer 9 · Sun Apr 10 2016 00:54:15 GMT+0800 (China Standard Time)

Yes, but it didn't create the named file! Here is a pic of the result: http://imgur.com/vtzKWc9

Elouan Poupard-Cosquer · Answer 10 · Sun Apr 10 2016 21:00:05 GMT+0800 (China Standard Time)

It seems that the it_IT files use the ISO-8859-15 encoding, which H2WL can't handle.
However, converting these files to UTF-8 fixes the issue. The simplest way to do that is to install Notepad++, open each file and click on Encoding > Convert to UTF-8.
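
For anyone who prefers a script to Notepad++, the same conversion can be sketched in Python. This is an illustration rather than anything H2WL provides: `convert_to_utf8` is a hypothetical helper, and it assumes the files really are ISO-8859-15, as suggested above.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding="iso-8859-15"):
    """Rewrite `path` in place as UTF-8, keeping the original as a .bak copy."""
    path = Path(path)
    raw = path.read_bytes()
    # Keep a backup: decoding an already converted file would garble it.
    path.with_name(path.name + ".bak").write_bytes(raw)
    # Working on bytes avoids any change to the line endings.
    path.write_bytes(raw.decode(source_encoding).encode("utf-8"))
```

Call it once for each of the .dic and .aff files, for example `convert_to_utf8("data/it_IT.dic")`. Running it a second time on the same file would double-encode the accented characters, so restore the .bak copy before retrying.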

Where are the .dic/.aff files for the Ancient Greek dictionary?

Alessio-R · Answer 11 · Sun Apr 10 2016 21:14:16 GMT+0800 (China Standard Time)

I converted them to UTF-8 and it worked for it_IT (I had to write data/ita.txt in the shell to obtain the txt file). The Ancient Greek package is here: http://members.hellug.gr/sng/ancientgreekoxt/download.html. You must open it as a zip archive and extract the .dic and .aff files.

Elouan Poupard-Cosquer · Answer 12 · Sun Apr 10 2016 21:59:25 GMT+0800 (China Standard Time)

Thanks. The new version (v0.1.2) should solve the issue.

Note that you can also use the command without any argument, like this:

$ java -jar Hunspell2WordList.jar
Command help: hunspell2wordlist [InputFilePath] [OutputFilePath]
Enter the input file's path: data/grc_GR
Enter the output file's path: grc.txt
References stats:
 * 0 AM references
 * 0 AF references
 * 5000 affixes
 * 5000 affix options
DicParser stats:
 * 760186 lines
WordListGenerator stats:
 * 1784034 words

Alessio-R · Answer 13 · Sun Apr 10 2016 23:05:00 GMT+0800 (China Standard Time)

Thank you very much, it worked perfectly. This application solved a big problem for me: obtaining a multilingual spell checker to use with TeXStudio (a LaTeX editor). While all the other dictionaries were merged correctly with hunspell-merge (https://github.com/arty-name/hunspell-merge), the Ancient Greek one had a problem with the affixes. Now I have put all the Greek words, without affixes, in the .dic file of the merged dictionaries, and it works perfectly in combination with the other languages. Your help is much appreciated, and I hope your application gets the appreciation it deserves.
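
The last step described above, adding the generated words to a merged .dic file, might look like the following Python sketch. It rests on assumptions not confirmed in this thread: that the word list holds one word per line, that both files are UTF-8, and that the first line of the .dic file is the entry count, as is conventional for Hunspell dictionaries. `append_words` is a hypothetical helper.

```python
from pathlib import Path


def append_words(dic_path, wordlist_path, encoding="utf-8"):
    """Append plain words to a .dic file and refresh the count on its first line."""
    dic_path = Path(dic_path)
    # Skip the first line, assumed to be the old entry count.
    old_lines = dic_path.read_text(encoding=encoding).splitlines()[1:]
    entries = [line for line in old_lines if line.strip()]
    new_lines = Path(wordlist_path).read_text(encoding=encoding).splitlines()
    entries += [word.strip() for word in new_lines if word.strip()]
    dic_path.write_text(
        "\n".join([str(len(entries))] + entries) + "\n", encoding=encoding
    )
    return len(entries)
```

The sketch does not remove duplicates or touch the .aff file. The appended words carry no flags, which matches the "without affixes" approach described above.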

Elouan Poupard-Cosquer · Answer 14 · Sun Apr 10 2016 23:39:07 GMT+0800 (China Standard Time)

This merge problem probably has the same cause as the one in H2WL. The Greek dictionary uses numeric references for affixes, a notation that is usually used for something else.

Just keep in mind that the result of H2WL is far from perfect. However, I am glad that my little side-project helped.
Thank you for your kind words.