mammothb / symspellpy

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

Dictionary loading test is returning an empty list

mzeidhassan opened this issue

Hi @mammothb ,

I am testing this piece of code from your example page, but I am getting an empty list.

from itertools import islice
import pkg_resources
from symspellpy import SymSpell

sym_spell = SymSpell()
dictionary_path = "sample_dictionary.txt"
sym_spell.load_dictionary(dictionary_path, 0, 1, encoding='utf-8')

# Print out first 5 elements to demonstrate that dictionary is successfully loaded
print(list(islice(sym_spell.words.items(), 5)))

No dictionary entries are printed; all I get is an empty list, [].
I am also attaching the sample dictionary file.

Please note that I also created the dictionary by using this code:

from symspellpy import SymSpell

sym_spell = SymSpell()
corpus_path = 'all_in_one.out.txt'
sym_spell.create_dictionary(corpus_path, encoding='utf-8')
print(sym_spell.words)

Any idea why I am getting an empty list, []?

Thanks
sample_dictionary.txt

When I test the same code with a dictionary in the old format, which I created a while ago, the content is displayed correctly. I mean this format:

وطلبت 29573
مزيدا 17978
من 8529179
التوضيح 3279
لما 93283
ذكر 32825
في 13752698
التقرير 291978
فرض 31737
غرامات 1438
على 5633698

mammothb commented

load_dictionary() requires the dictionary file to contain one entry per line, in the format

<term> <count>
<term> <count>
...
<term> <count>
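
A quick way to check a file against this format before loading it is sketched below. It uses only the standard library; `check_dictionary_format` is a hypothetical helper, not part of symspellpy, and it assumes a single space as the separator and that a line is usable only if it splits into a term followed by an integer count.

```python
import os
import tempfile


def check_dictionary_format(path, separator=" ", encoding="utf-8"):
    """Return (valid, invalid) line counts for a '<term> <count>' file.

    Hypothetical helper for illustration; not part of symspellpy.
    """
    valid, invalid = 0, 0
    with open(path, encoding=encoding) as infile:
        for line in infile:
            parts = line.rstrip().split(separator)
            # A usable line has a non-empty term followed by an integer count.
            if len(parts) >= 2 and parts[0] and parts[1].isdigit():
                valid += 1
            else:
                invalid += 1
    return valid, invalid


# Demo: two well-formed lines, then one line holding a dict literal.
sample = "من 8529179\nفي 13752698\n{'تلاحظ': 35371}\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", encoding="utf-8", delete=False
) as tmp:
    tmp.write(sample)

result = check_dictionary_format(tmp.name)
os.remove(tmp.name)
print(result)
```

If the valid count comes back as zero, an empty `sym_spell.words` after `load_dictionary()` is the likely result.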

The sample_dictionary.txt you have provided looks like a Python dictionary saved as text. If you want to use a Python dictionary, you first have to read it into memory and then wrap it with DictIO, like so:

from itertools import islice
from symspellpy import SymSpell
from symspellpy.helpers import DictIO

sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
dict_obj = {'تلاحظ': 35371, 'اللجنة': 970280, 'الاستشارية': 85578,
            'من': 8216720, 'تقرير': 304199, 'الأمين': 331331,
            'العام': 710173, 'أن': 3914662, 'الاحتياجات': 75750,
            'الموظفين': 151569, 'تشكل': 73455, 'الحصة': 4340,
            'الكبرى': 35200, 'وهي': 19}
dict_stream = DictIO(dict_obj)

sym_spell.load_dictionary_stream(dict_stream, 0, 1)
# Print out first 5 elements to demonstrate that dictionary is successfully loaded
print(list(islice(sym_spell.words.items(), 5)))

Output:

[('تلاحظ', 35371), ('اللجنة', 970280), ('الاستشارية', 85578), ('من', 8216720), ('تقرير', 304199)]
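
For the "read it into memory" step, one possible approach is sketched here, assuming the saved text really is a single Python dict literal (which is only how the attached file appears to look). `parse_saved_dict` is a hypothetical helper name; `ast.literal_eval` is from the standard library and evaluates literals only, so it is safer than `eval` on file contents.

```python
import ast


def parse_saved_dict(text):
    """Parse text holding a Python dict literal into a {term: count} dict.

    Hypothetical helper for illustration; not part of symspellpy.
    """
    parsed = ast.literal_eval(text.strip())
    if not isinstance(parsed, dict):
        raise ValueError("expected a dict literal")
    # Normalize to string terms and integer counts.
    return {str(term): int(count) for term, count in parsed.items()}


# In practice the text would come from reading the saved file;
# a short literal stands in for it here.
saved_text = "{'تلاحظ': 35371, 'اللجنة': 970280, 'من': 8216720}"
dict_obj = parse_saved_dict(saved_text)
print(list(dict_obj.items())[:2])
```

The resulting `dict_obj` can then be wrapped with DictIO and passed to load_dictionary_stream() as in the snippet above.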

Thanks @mammothb for your detailed answer.

I was basically following your example code here.

I thought that doing so would create a ready-to-use dictionary. Do you think the instructions should be updated, or at least mention the format the file needs to be in so that the created dictionary can be loaded?

Thanks again for your continuous support.

I'd like to take this opportunity to wish you a Happy New Year.
Thanks
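
On the end-format question, a minimal sketch of saving term counts in the `<term> <count>` layout that load_dictionary() expects might look like this. `save_term_counts` is a hypothetical helper, not a symspellpy function, and the small `words` dict is a stand-in for the mapping that `sym_spell.words` holds after create_dictionary().

```python
import os
import tempfile


def save_term_counts(words, path, separator=" ", encoding="utf-8"):
    """Write a {term: count} mapping, one '<term><separator><count>' per line.

    Hypothetical helper for illustration; not part of symspellpy.
    """
    with open(path, "w", encoding=encoding) as outfile:
        for term, count in words.items():
            outfile.write(f"{term}{separator}{count}\n")


# Stand-in for sym_spell.words (assumed to map each term to its count).
words = {"من": 8529179, "في": 13752698, "على": 5633698}
out_path = os.path.join(tempfile.mkdtemp(), "term_counts.txt")
save_term_counts(words, out_path)

# Read the file back to show the layout that was written.
with open(out_path, encoding="utf-8") as infile:
    lines = infile.read().splitlines()
print(lines)
```

A file written this way should load with the same call used at the top of the thread, `sym_spell.load_dictionary(path, 0, 1, encoding='utf-8')`, since each line is a term followed by its count.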