mozilla / fathom

A framework for extracting meaning from web pages

Home Page: http://mozilla.github.io/fathom/

Add naive Bayes classifier for text

erikrose opened this issue

We'll probably need one to distinguish Project Smoot's Article class from its Techie Article class. My initial musings have the classifier represented as a scoring callback:

rule(..., score(bayes(...)))

It, like any scoring callback, would take a fnode and probably operate on its innerText. Also, like any scoring callback, it would return a value in 0..1, which the rest of Fathom could threshold or mix into other types, as usual. An open question is how to train it.
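As one possible shape for this, here is a minimal sketch of a two-class multinomial naive Bayes classifier wrapped as a scoring callback. It assumes training happens offline from labeled example texts (for instance, true for Techie Article and false for plain Article). It also assumes the callback can reach the DOM node as `fnode.element`. The names `tokenize`, `trainBayes`, `probability`, and `bayes` are hypothetical illustrations, not part of Fathom's API. The training question stays open; this shows only one simple answer.

```javascript
// Hypothetical sketch: two-class multinomial naive Bayes as a Fathom-style
// scoring callback. None of these function names are Fathom API.

// Split text into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Train from labeled examples: [{text, label}], where label is true/false.
// Assumes a non-empty training set.
function trainBayes(examples) {
  const wordCounts = [new Map(), new Map()]; // per-class word frequencies
  const wordTotals = [0, 0];                 // total tokens seen per class
  const docCounts = [0, 0];                  // documents seen per class
  const vocab = new Set();                   // all distinct words across classes
  for (const {text, label} of examples) {
    const c = label ? 1 : 0;
    docCounts[c] += 1;
    for (const word of tokenize(text)) {
      wordCounts[c].set(word, (wordCounts[c].get(word) || 0) + 1);
      wordTotals[c] += 1;
      vocab.add(word);
    }
  }
  return {wordCounts, wordTotals, docCounts, vocabSize: vocab.size};
}

// Posterior probability (0..1) that text belongs to class 1 (label true).
// Uses add-one (Laplace) smoothing and sums logs to avoid underflow.
function probability(model, text) {
  const {wordCounts, wordTotals, docCounts, vocabSize} = model;
  const totalDocs = docCounts[0] + docCounts[1];
  const logScore = [0, 1].map(c => {
    let s = Math.log((docCounts[c] + 1) / (totalDocs + 2)); // smoothed class prior
    for (const word of tokenize(text)) {
      const n = wordCounts[c].get(word) || 0;
      s += Math.log((n + 1) / (wordTotals[c] + vocabSize)); // smoothed likelihood
    }
    return s;
  });
  // P(1 | text) = e^L1 / (e^L0 + e^L1) = 1 / (1 + e^(L0 - L1))
  return 1 / (1 + Math.exp(logScore[0] - logScore[1]));
}

// Wrap a trained model as a scoring callback over a fnode's innerText.
// Assumes fnode.element is the underlying DOM element.
function bayes(model) {
  return fnode => probability(model, fnode.element.innerText);
}
```

Under these assumptions it would slot into the rule above as `score(bayes(model))`. Returning a posterior rather than a raw log-likelihood keeps the output in 0..1, so it can be thresholded or mixed like any other score.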