libindic / soundex

Soundex Phonetic Code Algorithm Demo for Indian Languages. Supports all Indian languages and English. Provides intra-Indic string comparison.

Soundex calculation for English language text is wrong

copyninja opened this issue · comments

The current Soundex module outputs the wrong string for English text. For example, for the input vasudeva the output is vA2C3D1A, but according to the Soundex rules the code for this string should be v231. For more information on Soundex, check the wiki link.

The rule is

Retain the first letter of the name and drop all other occurrences of a, e, i, o, u, y, h, w.

So it seems we are not dropping occurrences of a, e, i, o, u, y, h, w.
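To make the expected value concrete, here is a minimal sketch of classic American Soundex for plain ASCII input. It is not the libindic implementation: the `english_soundex` name and the `CODES` table are hypothetical and written only for this illustration. It keeps the first letter in lowercase to match the `v231` form used in this issue.

```python
# Standard American Soundex digit groups (NOT the libindic charmap tables).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def english_soundex(word, length=4):
    """Hypothetical helper: classic Soundex code for an ASCII word."""
    word = word.lower()
    if not word:
        return ""
    result = word[0]                 # retain the first letter
    prev = CODES.get(word[0], "")    # its code still suppresses a repeat
    for ch in word[1:]:
        code = CODES.get(ch, "")     # a, e, i, o, u, y, h, w have no code
        if code and code != prev:
            result += code
        # h and w do not separate equal codes; vowels and y do.
        if ch not in "hw":
            prev = code
        if len(result) == length:
            break
    return result.ljust(length, "0")  # pad short codes with zeros


print(english_soundex("vasudeva"))
```

For `vasudeva`, the letters a, u, e, a are dropped and s, d, v map to 2, 3, 1, which gives `v231`.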

@santhoshtr No, this actually happens because we have the ISO15919 dict alongside the others, so instead of using English we end up using ISO15919.

We probably need to use the logic from my Go charmap module in our charmap module.

@santhoshtr But check this line https://github.com/Project-SILPA/soundex/blob/master/soundex/core.py#L34 and especially the definition at https://github.com/Project-SILPA/soundex/blob/master/soundex/charmap.py#L66. Here we are looping over the keys, and a dictionary doesn't have a particular order, so we end up getting ISO15919 before en_US.
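One possible way to avoid depending on dictionary key order is to walk the tables in an explicit priority list. The sketch below is only an illustration: `CHARMAP`, `LOOKUP_ORDER` and `language_of` are hypothetical names, the tables are heavily trimmed stand-ins for the real ones in charmap.py, and the overlap between the two tables is an assumption made to reproduce the problem.

```python
# Hypothetical, trimmed stand-ins for the real charmap tables.
# Assumption: ISO15919 also contains plain ASCII letters, so it overlaps en_US.
CHARMAP = {
    "ISO15919": ["a", "\u0101", "i", "\u012b", "u", "\u016b", "v", "s", "d"],
    "en_US": list("abcdefghijklmnopqrstuvwxyz"),
}

# Explicit lookup order instead of iterating over CHARMAP's keys:
# en_US is checked before the ISO15919 transliteration table.
LOOKUP_ORDER = ["en_US", "ISO15919"]


def language_of(char):
    """Return the first language in LOOKUP_ORDER whose table contains char."""
    for lang in LOOKUP_ORDER:
        if char in CHARMAP[lang]:
            return lang
    return None  # character not found in any table


print(language_of("v"))       # plain ASCII letter
print(language_of("\u0101"))  # a with macron, present only in ISO15919
```

With a fixed order, a plain ASCII letter always resolves to en_US, and only characters missing from en_US fall through to ISO15919.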