[Question] Support of multiple languages
Thanks for testing this out! You can actually use the rest of that demo to see where things go awry: http://glench.github.io/fuzzyset.js/ui/
Looks like the gramCounter function isn't working. I'll check if this is a bug in the actual library later.
Thanks @Glench. Indeed, gramCounter is not working.
Note: I'm not good at NLP, so I'm sorry I can't help much; I'm just trying to use this amazing library.
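It is because of this pattern, which strips every character other than ASCII letters and digits, U+00C0 to U+00FF, commas, and spaces, so Arabic letters are removed entirely:
var _nonWordRe = /[^a-zA-Z0-9\u00C0-\u00FF, ]+/g;
which is used by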
Line 56 in 916ce62
There is this suggestion, though I'm not sure about it:
var _nonWordRe = /[^a-zA-Z0-9\u00C0-\u00FF, ]+ | ([\u0600-\u06ff]+)([^\u0600-\u06ff]+)?/g;
'-' + "hello".toLowerCase().replace(_nonWordRe, '') + '-';
'-hello-'
'-' + "مرحبا".toLowerCase().replace(_nonWordRe, '') + '-';
'-مرحبا-'
var _nonWordRe = /[^a-zA-Z0-9\u00C0-\u00FF, ]+/g;
'-' + "مرحبا".toLowerCase().replace(_nonWordRe, '') + '-';
'--'
'-' + "hello".toLowerCase().replace(_nonWordRe, '') + '-';
'-hello-'
I could use something similar to handle both English and Arabic, but I'm 90% sure I'm messing things up.
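A possibly simpler route, sketched here under assumptions rather than taken from the library: add the Arabic Unicode block (U+0600 to U+06FF) directly to the negated character class instead of using alternation. The names `nonWordRe`, `normalize`, and `grams` are hypothetical, and `grams` is only a rough stand-in for how n-grams might be produced from the normalized string.

```javascript
// Hypothetical variant of the pattern: the Arabic block U+0600-U+06FF is
// added to the set of characters that are kept.
const nonWordRe = /[^a-zA-Z0-9\u00C0-\u00FF\u0600-\u06FF, ]+/g;

// Lowercase the input, strip everything outside the kept ranges,
// and pad both ends with '-'.
function normalize(value) {
  return '-' + value.toLowerCase().replace(nonWordRe, '') + '-';
}

// Illustrative helper (not the library's code): split the normalized
// string into overlapping n-grams of the given size.
function grams(value, gramSize = 2) {
  const s = normalize(value);
  const out = [];
  for (let i = 0; i + gramSize <= s.length; i++) {
    out.push(s.slice(i, i + gramSize));
  }
  return out;
}

console.log(normalize('hello'));
console.log(normalize('مرحبا'));
console.log(grams('مرحبا'));
```

With this class, the Arabic word survives normalization, so there is something left to build grams from. Whether the full block or a narrower range of letters and digits is the better choice is a judgment call.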
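// Arguments after the array: useLevenshtein = false, gramSizeLower = 3, gramSizeUpper = 3
const fuzzyset = FuzzySet(['Mississippi', 'Missouri', 'California'], false, 3, 3);
const similarity = fuzzyset.get('mossisippi');
console.log(similarity);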
This is exactly what I needed, by the way 🥇
Okay, I added Arabic support. I don't think other alphabets are supported at the moment, but if someone needs this, please write a comment here.
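For anyone who needs other alphabets in the meantime, one possible approach is a Unicode-aware pattern. This is only a sketch and not what the library does: it assumes a runtime with Unicode property escapes (ES2018 or later), and `anyScriptNonWordRe` and `normalizeAnyScript` are hypothetical names.

```javascript
// Assumption: the runtime supports Unicode property escapes (ES2018+).
// \p{L} matches letters in any script, \p{N} numbers, \p{M} combining marks.
const anyScriptNonWordRe = /[^\p{L}\p{N}\p{M}, ]+/gu;

// Same normalization shape as in the thread: lowercase, strip, pad with '-'.
function normalizeAnyScript(value) {
  return '-' + value.toLowerCase().replace(anyScriptNonWordRe, '') + '-';
}

console.log(normalizeAnyScript('Привет!'));
console.log(normalizeAnyScript('مرحبا'));
console.log(normalizeAnyScript('hello'));
```

The trade-off is that this keeps letters from every script, which may or may not be what a given use case wants.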
Thanks a lot @Glench for the addition 🥳