Glench / fuzzyset.js

fuzzyset.js - A fuzzy string set for JavaScript

Home Page: http://glench.github.io/fuzzyset.js/

[Question] Support of multiple languages

I went through the demo here, and I don't know why, but at the "Calculating the dot product of two vectors" step the score is 0 when the values are Arabic, as in this example:

لمادا

لماذا

Thanks a lot @Glench !!!!

Thanks for testing this out! You can actually use the rest of that demo to see where things go awry: http://glench.github.io/fuzzyset.js/ui/

Looks like the gramCounter function isn't working. I'll check if this is a bug in the actual library later.

Thanks @Glench. Indeed, gramCounter is not working.

Note: I'm not good at NLP, so I'm sorry I can't help; I'm just trying to use this amazing library.

It is because of this pattern, which strips every character outside the listed Latin ranges, Arabic included:

var _nonWordRe = /[^a-zA-Z0-9\u00C0-\u00FF, ]+/g;

It is used by

var _iterateGrams = function(value, gramSize) {
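
To illustrate why that pattern matters, here is a minimal sketch of gram extraction. It is not the library's actual code: the helper name iterateGrams and the sliding-window step are assumptions for illustration. Only the pattern and the '-' padding come from this thread.

```javascript
// The pattern from this thread: keeps Latin letters, digits,
// Latin-1 accented characters, commas and spaces; strips the rest.
const nonWordRe = /[^a-zA-Z0-9\u00C0-\u00FF, ]+/g;

// Hypothetical helper, written for illustration only.
function iterateGrams(value, gramSize = 2) {
  // Lowercase, strip disallowed characters, pad with '-' on both sides.
  let simplified = '-' + value.toLowerCase().replace(nonWordRe, '') + '-';
  // Pad very short strings so that at least one gram exists.
  while (simplified.length < gramSize) simplified += '-';
  // Assumed step: slide a window of gramSize characters over the string.
  const grams = [];
  for (let i = 0; i + gramSize <= simplified.length; i++) {
    grams.push(simplified.slice(i, i + gramSize));
  }
  return grams;
}

console.log(iterateGrams('hello')); // [ '-h', 'he', 'el', 'll', 'lo', 'o-' ]
console.log(iterateGrams('لمادا')); // [ '--' ]
console.log(iterateGrams('لماذا')); // [ '--' ]
```

Under these assumptions, both Arabic words collapse to the single gram '--', so none of their letters take part in the comparison.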

Here is something that points in that direction, though I'm not sure:

var _nonWordRe = /[^a-zA-Z0-9\u00C0-\u00FF, ]+ | ([\u0600-\u06ff]+)([^\u0600-\u06ff]+)?/g;
'-' + "hello".toLowerCase().replace(_nonWordRe, '') + '-';
'-hello-'
'-' + "مرحبا".toLowerCase().replace(_nonWordRe, '') + '-';
'-مرحبا-'
var _nonWordRe = /[^a-zA-Z0-9\u00C0-\u00FF, ]+/g;
'-' + "مرحبا".toLowerCase().replace(_nonWordRe, '') + '-';
'--'
'-' + "hello".toLowerCase().replace(_nonWordRe, '') + '-';
'-hello-'

I could use something similar to handle both English and Arabic, but I'm 90% sure I'm getting something wrong.
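
One possible alternative is sketched below. It is an assumption, not necessarily the fix the library adopted. It adds the Arabic block (\u0600-\u06FF) inside the negated character class rather than as an alternation. The spaces around `|` in the first pattern above are literal characters, so that pattern only matches when a space is present. It leaves "مرحبا" intact, but it would also leave punctuation intact. The names nonWordReWithArabic and normalize are hypothetical.

```javascript
// Assumption: U+0600-U+06FF (the Unicode "Arabic" block) is the range to keep.
const nonWordReWithArabic = /[^a-zA-Z0-9\u00C0-\u00FF\u0600-\u06FF, ]+/g;

// Hypothetical helper mirroring the expressions tried above.
function normalize(value) {
  return '-' + value.toLowerCase().replace(nonWordReWithArabic, '') + '-';
}

console.log(normalize('hello'));   // '-hello-'
console.log(normalize('مرحبا'));   // '-مرحبا-'
console.log(normalize('hello!?')); // '-hello-' (punctuation is still stripped)
```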

// Arguments: values, useLevenshtein = false, gramSizeLower = 3, gramSizeUpper = 3
const fuzzyset = FuzzySet(['Mississippi', 'Missouri', 'California'], false, 3, 3);
const similarity = fuzzyset.get('mossisippi');
console.log(similarity);

This is exactly what I needed, by the way 🥇

Okay, I added Arabic support. I don't think other alphabets are supported at the moment, but if someone needs one, please write a comment here.

Thanks a lot @Glench for the addition 🥳