Not taking custom tokenizer?
bitwombat opened this issue · comments
Bit Wombat commented
I'm rusty on my JS so I'm probably doing something dumb here, but I can't get your classifier to take a custom tokenizer.
const classifier = bayes({'tokenizer': tokenizer});
var tokenizer = function (text) {
var rgxPunctuation = /[^(a-zA-Z)+\s]/g
var sanitized = text.replace(rgxPunctuation, ' ').toLowerCase();
return sanitized.split(/\s+/)
}
If I put a console.log in there, it's clear it's not getting executed.
Jason Wohlgemuth commented
I am just passing through, but you might try moving your var tokenizer above const classifier (where it is used) and adding a new in front of bayes({...:
var tokenizer = function (text) {
var rgxPunctuation = /[^(a-zA-Z)+\s]/g
var sanitized = text.replace(rgxPunctuation, ' ').toLowerCase();
return sanitized.split(/\s+/)
};
const classifier = new bayes({'tokenizer': tokenizer});
I tried the above code in RunKit and it appeared to work as expected.
Note: You could also use a function declaration for tokenizer to keep its position in your code; function declarations are hoisted in full, so they can be referenced before they appear:
// since you are already using ES6, you might consider the object properties shorthand ;)
// the "new" is needed either way
const classifier = new bayes({tokenizer});
function tokenizer(text) {
var rgxPunctuation = /[^(a-zA-Z)+\s]/g
var sanitized = text.replace(rgxPunctuation, ' ').toLowerCase();
return sanitized.split(/\s+/)
}
Bit Wombat commented
Ah, thanks so much @jhwohlgemuth !
I hate getting rusty on things.
All worky.