src-d / enry

A faster file programming language detector

Home Page:https://blog.sourced.tech/post/enry/

Bayesian classifier can't distinguish "SQL" vs "PLpgSQL"

bzz opened this issue

Part of #155.

After updating to the latest samples in #189, the Bayesian classifier test fails to distinguish "SQL" from "PLpgSQL" based on content alone. The classifier weights differ between enry and linguist for the same document: #189 (comment)

This most probably has to do with the difference in tokenization between the two projects, which is going to be addressed in #193.