JesperTerkelsen / blayze

A fast and flexible Naive Bayes implementation for the JVM

blayze

A fast and flexible Bayesian Naive Bayes implementation for the JVM written in Kotlin.

  • Fully supports the online learning paradigm, in which data, and even new features, are added as they become available.
  • Reasonably fast and memory efficient. We've trained a document classifier with tens of thousands of classes on hundreds of thousands of documents, and ironed out most of the hotspots.
  • Works naturally with few samples by integrating out the uncertainty in the estimated parameters.
  • Models and data structures are immutable, which makes them concurrency-friendly.
  • Efficient serialization and deserialization using protobuf.
  • Missing and unknown features at prediction time are properly handled.
  • Minimal dependencies.
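Several of these points (online updates, immutability, few samples, unknown values) can be illustrated with a single categorical feature. The following is a minimal sketch of the general Bayesian technique, not blayze's implementation: the class `CategoricalCounts`, its methods, the symmetric Dirichlet prior with concentration `alpha`, and the extra "never seen" slot are all hypothetical choices made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not blayze's code: counts of one categorical feature
// (e.g. "sender") observed under one outcome.
final class CategoricalCounts {
    private final Map<String, Integer> counts; // value -> number of observations
    private final int total;                   // sum of all counts

    CategoricalCounts() {
        this(Collections.emptyMap(), 0);
    }

    private CategoricalCounts(Map<String, Integer> counts, int total) {
        this.counts = counts;
        this.total = total;
    }

    // Online update by copy-on-write: returns a new table and leaves this one
    // untouched, so readers can share it across threads without locking.
    CategoricalCounts add(String value) {
        Map<String, Integer> next = new HashMap<>(counts);
        next.merge(value, 1, Integer::sum);
        return new CategoricalCounts(Collections.unmodifiableMap(next), total + 1);
    }

    // Posterior predictive P(value | data) with the multinomial parameters
    // integrated out under a symmetric Dirichlet(alpha) prior:
    //   (n_value + alpha) / (total + alpha * K)
    // K counts the seen values plus one assumed slot for "never seen", so an
    // unknown value gets a small non-zero probability instead of 0.
    double predictive(String value, double alpha) {
        int k = counts.size() + 1;
        int n = counts.getOrDefault(value, 0);
        return (n + alpha) / (total + alpha * k);
    }
}
```

For example, after calling `add("a")` twice on an empty table, `predictive("a", 1.0)` is (2 + 1) / (2 + 1 * 2) = 0.75, and an unseen value still gets 0.25 rather than 0. The intermediate table returned by the first `add` still reports 2/3 for `"a"`, because it was never modified.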

Usage

Get the latest artifact from Maven Central.

// Java 9+ (uses List.of and Map.of). Model, Update and Inputs come from blayze.
import java.util.List;
import java.util.Map;

Model model = new Model().batchAdd(List.of(new Update( // models are immutable: batchAdd returns a new Model
        new Inputs( // supports multiple feature types
                Map.of( // text features
                        "subject", "Attention, is it true?", // features are named
                        "body", "Good day dear beneficiary. This is Secretary to president of Benin republic is writing this email ..." // multiple features of the same type need distinct names
                ),
                Map.of( // categorical features
                        "sender", "WWW.@galaxy.ocn.ne.jp"
                ),
                Map.of( // Gaussian features
                        "n_words", 482.0
                )
        ),
        "spam" // the outcome, in this case spam
)));

Map<String, Double> predictions = model.predict(new Inputs(/*...*/)); // e.g. {"spam": 0.624, "ham": 0.376}
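The map returned by `predict` assigns each outcome a probability, and the probabilities sum to one. A common way to get there in naive Bayes is to add a log prior and one log-likelihood per feature for each outcome, then normalize with the log-sum-exp trick so that products of many small probabilities do not underflow. The sketch below assumes that approach and is not taken from blayze's source; the class `NaiveBayesCombine`, the method `posterior` and the shape of its input maps are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not blayze's internals: turn per-outcome log priors and
// per-feature log-likelihoods into normalized posterior probabilities.
final class NaiveBayesCombine {
    static Map<String, Double> posterior(Map<String, Double> logPrior,
                                         Map<String, Map<String, Double>> logLikelihoods) {
        // Unnormalized log joint per outcome: log P(c) + sum_f log P(x_f | c).
        // The naive assumption (features independent given the outcome) is what
        // lets the log-likelihoods simply add. In this sketch a feature missing
        // from the input has no entry and therefore contributes no term.
        Map<String, Double> logJoint = new HashMap<>();
        for (Map.Entry<String, Double> e : logPrior.entrySet()) {
            double s = e.getValue();
            for (double ll : logLikelihoods.getOrDefault(e.getKey(), Map.of()).values()) {
                s += ll;
            }
            logJoint.put(e.getKey(), s);
        }

        // log-sum-exp: subtract the maximum before exponentiating to avoid underflow.
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logJoint.values()) {
            max = Math.max(max, v);
        }
        double sum = 0.0;
        for (double v : logJoint.values()) {
            sum += Math.exp(v - max);
        }
        double logZ = max + Math.log(sum);

        // Posterior P(c | x) = exp(log joint - log normalizer).
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Double> e : logJoint.entrySet()) {
            result.put(e.getKey(), Math.exp(e.getValue() - logZ));
        }
        return result;
    }
}
```

With equal priors and a `subject` likelihood of 0.3 under spam versus 0.1 under ham, `posterior` returns 0.75 for spam and 0.25 for ham. Working in log space means that even joint log-probabilities around -1000, which would underflow to 0 if exponentiated directly, still normalize correctly.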

Built With

Versioning

We use SemVer for versioning.

Authors

About

License: MIT License