Predictions differ from those of the official C++ implementation
ericxsun opened this issue
I just tested this repo and the official implementation on the same set of samples, using the same model (trained with the official code, in .ftz format).
C++ (official):
fasttext predict-prob test-example.txt
Java (this repo), API call:
("equal" means the predicted label is the same; the probability is ignored)
all samples: 21513
equal: 19236
not-equal: 2219
null(in this repo): 58
Java (this repo), command line:
java -jar jfasttext-0.4-jar-with-dependencies.jar predict-prob test-example.txt
all samples: 21513
equal: 18825
not-equal: 2688
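For anyone who wants to reproduce these counts, here is a minimal sketch of the comparison described above. It assumes each prediction line looks like `__label__x 0.98` (label first, then the probability) and that a blank line stands for a missing (null) prediction. The class name `PredictionDiff` and the sample lines are hypothetical, not taken from either project.

```java
import java.util.Arrays;
import java.util.List;

public class PredictionDiff {
    /** Counts of matching, differing, and missing predictions. */
    public static final class Counts {
        public int equal;
        public int notEqual;
        public int missing;
    }

    /** Returns the first whitespace-separated token (the label), or null for a blank line. */
    static String labelOf(String line) {
        if (line == null) {
            return null;
        }
        String trimmed = line.trim();
        return trimmed.isEmpty() ? null : trimmed.split("\\s+")[0];
    }

    /** Compares labels line by line; probabilities are ignored. */
    public static Counts compare(List<String> reference, List<String> candidate) {
        if (reference.size() != candidate.size()) {
            throw new IllegalArgumentException("both outputs must have the same number of lines");
        }
        Counts counts = new Counts();
        for (int i = 0; i < reference.size(); i++) {
            String expected = labelOf(reference.get(i));
            String actual = labelOf(candidate.get(i));
            if (actual == null) {
                counts.missing++;          // no prediction from the candidate
            } else if (actual.equals(expected)) {
                counts.equal++;            // same label
            } else {
                counts.notEqual++;         // different label
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample output lines, for illustration only.
        List<String> cpp = Arrays.asList("__label__a 0.91", "__label__b 0.55", "__label__a 0.70");
        List<String> java = Arrays.asList("__label__a 0.90", "__label__c 0.40", "");
        Counts c = compare(cpp, java);
        System.out.println("equal: " + c.equal + ", not-equal: " + c.notEqual + ", null: " + c.missing);
    }
}
```

In practice the two lists would be read from the saved outputs of each tool, for example with `java.nio.file.Files.readAllLines`.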
So, what's wrong?
Another thing: the predictions from the Java command line are unstable; they change on every run.
I found this one: https://github.com/linkfluence/fastText4j. Its predictions are almost the same as the official ones.
Based on what @carschno mentioned in #49, I used this to get the right results:
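// Assumed imports (Apache Commons Lang 3, Commons Collections 4, JFastText):
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.apache.commons.collections4.CollectionUtils;
import org.apache.commons.lang3.StringUtils;
import com.github.jfasttext.JFastText;

public Map<String, Double> predictTopLabel(String text, int k) {
    Map<String, Double> scoreMap = new LinkedHashMap<>();
    // Append a trailing newline, following the suggestion in #49.
    text = StringUtils.trimToEmpty(text) + "\n";
    final List<JFastText.ProbLabel> pl = model.predictProba(text, k);
    for (JFastText.ProbLabel i : CollectionUtils.emptyIfNull(pl)) {
        // The model returns log-probabilities; convert back to a probability.
        final double prob = Math.exp(i.logProb);
        // Divide by a double: Math.round returns a long, and long / int would truncate to 0 or 1.
        final double score = Math.round(prob * 100000000) / 100000000.0;
        scoreMap.put(i.label, score);
    }
    return scoreMap;
}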
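One detail worth checking in a helper like this is the rounding step. In Java, `Math.round(double)` returns a `long`, so dividing that result by an integer literal is integer division and truncates any probability below 1 to 0. The sketch below is a standalone illustration that does not depend on JFastText; the class and method names are hypothetical.

```java
public class ScoreRounding {
    // Scale for rounding to 8 decimal places.
    static final long SCALE = 100_000_000L;

    /** Converts a log-probability back to a probability. */
    static double toProbability(double logProb) {
        return Math.exp(logProb);
    }

    /** Pitfall: long / long is integer division, so the fractional part is lost. */
    static double roundWithIntegerDivision(double prob) {
        return Math.round(prob * SCALE) / SCALE;
    }

    /** Casting the divisor to double keeps the fractional part. */
    static double roundWithFloatingDivision(double prob) {
        return Math.round(prob * SCALE) / (double) SCALE;
    }

    public static void main(String[] args) {
        double prob = toProbability(Math.log(0.75));
        System.out.println(roundWithIntegerDivision(prob));  // prints 0.0
        System.out.println(roundWithFloatingDivision(prob)); // prints 0.75
    }
}
```

If the scores in the returned map all come out as 0.0 or 1.0, this truncation is a likely cause.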