vinhkhuc / JFastText

Java interface for fastText

Predictions are not the same as those of the official C++ implementation

ericxsun opened this issue

I tested this repo and the official implementation on the same set of samples, using the same model (trained with the official code, in .ftz format).

C++

fasttext predict-prob test-example.txt

Java (this repo), API call

("equal" means the predicted label is the same; the probability is ignored)
all samples: 21513
equal: 19236
not-equal: 2219
null(in this repo): 58

Java (this repo), command line

java -jar jfasttext-0.4-jar-with-dependencies.jar predict-prob test-example.txt
all samples: 21513
equal: 18825
not-equal: 2688
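
Counts like the ones above can be produced by comparing the two outputs position by position. The following is only a minimal sketch of such a comparison, not the script actually used here: the class name `PredictionDiff` and its methods are hypothetical, and it assumes each `predict-prob` output line has the form `label probability`.

```java
import java.util.Arrays;
import java.util.List;

public class PredictionDiff {
    /** Position-by-position counts for two label lists. */
    public static final class Counts {
        public int equal;
        public int notEqual;
        public int missing; // candidate produced no label (null)
    }

    /** Compares candidate labels against reference labels, ignoring probabilities. */
    public static Counts compare(List<String> reference, List<String> candidate) {
        if (reference.size() != candidate.size()) {
            throw new IllegalArgumentException("lists must have the same length");
        }
        Counts c = new Counts();
        for (int i = 0; i < reference.size(); i++) {
            String got = candidate.get(i);
            if (got == null) {
                c.missing++;
            } else if (got.equals(reference.get(i))) {
                c.equal++;
            } else {
                c.notEqual++;
            }
        }
        return c;
    }

    /** Extracts the label from an assumed "label probability" line; null if the line is blank. */
    public static String labelOf(String line) {
        if (line == null || line.trim().isEmpty()) {
            return null;
        }
        return line.trim().split("\\s+")[0];
    }

    public static void main(String[] args) {
        // Made-up labels, for illustration only.
        List<String> cpp = Arrays.asList("__label__a", "__label__b", "__label__c");
        List<String> java = Arrays.asList("__label__a", "__label__x", null);
        Counts c = compare(cpp, java);
        // prints "1 equal, 1 not-equal, 1 null"
        System.out.println(c.equal + " equal, " + c.notEqual + " not-equal, " + c.missing + " null");
    }
}
```

In practice the two lists would be filled by reading both output files line by line and passing each line through `labelOf`.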

So, what's wrong?

Another thing: the predictions of java-cmd are unstable; they change on every run.

Found this one: https://github.com/linkfluence/fastText4j. Its predictions are nearly the same as the official ones.

Based on what @carschno mentioned in #49, I used this to get the right results:

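// Assumes commons-lang3 (StringUtils), commons-collections4 (CollectionUtils) and a loaded JFastText `model`.
public Map<String, Double> predictTopLabel(String text, int k) {
    Map<String, Double> scoreMap = new LinkedHashMap<>();
    // Append a trailing newline so the input looks like a line read from a file by the CLI.
    text = StringUtils.trimToEmpty(text) + "\n";
    final List<JFastText.ProbLabel> pl = model.predictProba(text, k);
    for (JFastText.ProbLabel i : CollectionUtils.emptyIfNull(pl)) {
        // predictProba returns log-probabilities; convert back to a probability.
        final double prob = Math.exp(i.logProb);
        // Round to 8 decimal places; divide by a double, since Math.round returns a long.
        final double score = Math.round(prob * 1e8) / 1e8;
        scoreMap.put(i.label, score);
    }
    return scoreMap;
}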
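
One detail worth checking when rounding probabilities in a snippet like this: in Java, `Math.round(double)` returns a `long`, so dividing its result by an integer literal is integer division and drops the fraction. Dividing by a double such as `1e8` keeps it. A small standalone sketch of the difference (the class `RoundProb` and its method names are hypothetical, for illustration only):

```java
public class RoundProb {
    /** Rounds a probability to 8 decimal places. */
    public static double round8(double prob) {
        // long divided by double: the long is widened to double, so the fraction is kept.
        return Math.round(prob * 1e8) / 1e8;
    }

    /** Same expression with an integer divisor: long / int truncates toward zero. */
    public static double round8Truncating(double prob) {
        return Math.round(prob * 100000000) / 100000000;
    }

    public static void main(String[] args) {
        double prob = Math.exp(-0.5); // probability for a log-probability of -0.5
        System.out.println(round8(prob));           // about 0.60653066
        System.out.println(round8Truncating(prob)); // prints 0.0
    }
}
```

With the integer divisor, every probability below 1 ends up as 0.0, which does not change the predicted labels but makes the scores unusable.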