neomatrix369 / nlp-java-jvm-example

A repo with NLP examples of libraries/packages/framework written in Java/JVM

NLP Java/JVM

NLP Java | NLP Clojure | NLP Kotlin | NLP Scala


Run a docker container with NLP libraries/frameworks written in Java/JVM languages, running under the traditional Java 11 (from OpenJDK or another source) or GraalVM.

Find out more about Natural Language Processing in the NLP section.

Goals

  • Run docker container containing NLP libraries/frameworks written in Java/JVM languages
  • Ability to create custom docker images (scripts & docs provided)
  • Ability to debug the docker container
  • Run using the traditional JDK 11 (OpenJDK or vendor specific versions)
  • Run using the polyglot JVM i.e. GraalVM JDK (Community version from Oracle Labs), when performing operations from the CLI
  • Play with and learn from some examples for each of the libraries provided

Libraries / frameworks provided

Java

Clojure

  • Clojure-openNLP - Natural Language Processing in Clojure (opennlp)
  • Inflections-clj - Rails-like inflection library for Clojure and ClojureScript
  • postagga - A library to parse natural language in Clojure and ClojureScript

Kotlin

  • Lingua - A language detection library for Kotlin and Java, suitable for long and short text alike
  • Kotidgy - An index-based text data generator written in Kotlin

Scala

  • Saul - Library for developing NLP systems, including built in modules like SRL, POS, etc.
  • ATR4S - Toolkit with state-of-the-art automatic term recognition methods.
  • tm - Implementation of topic modeling based on regularized multilingual PLSA.
  • word2vec-scala - Scala interface to word2vec model; includes operations on vectors like word-distance and word-analogy.
  • Epic - Epic is a high performance statistical parser written in Scala, along with a framework for building complex structured prediction models.

Scripts provided

The scripts listed below are provided in the repo (scroll up to the file listing to find them):

  • docker-runner.sh: performs one or more of the actions below, depending on the flags passed to it:
    • runs the container and brings you to the command prompt inside it
    • builds the base and language-specific (i.e. java, clojure, kotlin, scala) docker images; this takes under 5 minutes on a decent connection
    • pushes pre-built docker images to Docker Hub (pass in your own Docker username and enter your Docker login details when prompted, see usage below)
    • removes dangling images and terminated containers as housekeeping (helps save some disk space)
  • Base Dockerfile | Java Dockerfile: Dockerfiles that build the base and language-specific (i.e. java, clojure, kotlin, scala) docker images of NLP Java/JVM in an isolated environment with the necessary dependencies.
  • images folder: contains the scripts used to build the base and language-specific docker images, along with the scripts that are included in the containers.
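
The housekeeping action can also be approximated by hand with standard Docker CLI commands. The sketch below is an illustration only and is not taken from docker-runner.sh; the function name cleanup_docker is hypothetical.

```shell
# Hypothetical helper (NOT copied from docker-runner.sh): removes exited
# containers and dangling images using standard Docker CLI filters.
cleanup_docker() {
  local exited dangling

  # IDs of containers that have already exited
  exited="$(docker ps --all --quiet --filter "status=exited")"
  if [ -n "${exited}" ]; then
    # Unquoted on purpose: each ID becomes its own argument
    docker rm ${exited}
  fi

  # IDs of untagged (dangling) images left behind by rebuilds
  dangling="$(docker images --quiet --filter "dangling=true")"
  if [ -n "${dangling}" ]; then
    docker rmi ${dangling}
  fi
}
```

Calling cleanup_docker with nothing to remove is a no-op, because each removal only runs when its ID list is non-empty.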

Usage

Help:

$ ./docker-runner.sh --help

       Usage: ./docker-runner.sh --dockerUserName [docker user name]
                                 --language [language id]
                                 --detach
                                 --buildImage
                                 --runContainer
                                 --pushImageToHub
                                 --cleanup
                                 --help

       --dockerUserName      docker user name as on Docker Hub
                             (mandatory with build and push commands)
       --language            language id as in java, clojure, scala, etc...
       --detach              run container and detach from it,
                             return control to console
       --jdk                 name of the JDK to use (currently supports 
                             GRAALVM only, default is blank which 
                             enables the traditional JDK)
       --javaopts            sets the JAVA_OPTS environment variable
                             inside the container as it starts
       --cleanup             (command action) remove exited containers and
                             dangling images from the local repository
       --buildImage          (command action) build the docker image
       --runContainer        (command action) run the docker image as a docker container
       --pushImageToHub      (command action) push the docker image built to Docker Hub
       --help                shows the script usage help text

Run the NLP Java/JVM docker container:

$ ./docker-runner.sh --runContainer

or

$ ./docker-runner.sh --runContainer --dockerUserName [your docker user name]

or run in GraalVM mode

$ ./docker-runner.sh --runContainer --jdk "GRAALVM"

or switch off the JVMCI flag (default: on) when running in GraalVM mode

$ ./docker-runner.sh --javaopts "-XX:-UseJVMCINativeLibrary"

Build the docker image:

Ensure your environment has the below variable set, or set it in your .bashrc or .bash_profile or the relevant startup script:

export DOCKER_USER_NAME="your_docker_username"

You must have an account on Docker hub under the above user name.
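
Before building or pushing, it can help to confirm that the variable is actually visible in the current shell. The following is a minimal sketch of such a pre-flight check; the function name require_docker_user is hypothetical and the check is not part of the provided scripts.

```shell
# Hypothetical pre-flight check (not part of docker-runner.sh):
# fails early when DOCKER_USER_NAME is missing or empty.
require_docker_user() {
  if [ -z "${DOCKER_USER_NAME:-}" ]; then
    echo "DOCKER_USER_NAME is not set; export it or pass --dockerUserName" >&2
    return 1
  fi
  echo "Using Docker user: ${DOCKER_USER_NAME}"
}
```

Running require_docker_user in a fresh terminal shows whether the export in your startup script has taken effect.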

$ ./docker-runner.sh --buildImage

or

$ ./docker-runner.sh --buildImage --dockerUserName "your_docker_username"

or

$ ./docker-runner.sh --buildImage --language [language_id]

[language_id] - defaults to java when not provided. Accepts: java, clojure, kotlin, scala
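
The defaulting rule above can be pictured with a small shell function. This is a sketch under assumptions: the accepted ids and the java default come from the text above, while the function name resolve_language and the error message are hypothetical, and the script itself may validate differently.

```shell
# Hypothetical helper: echoes the language id to use, falling back to
# "java" when no argument is given, and rejects unknown ids.
resolve_language() {
  local language="${1:-java}"
  case "${language}" in
    java|clojure|kotlin|scala)
      echo "${language}"
      ;;
    *)
      echo "Unsupported language id: ${language}" >&2
      return 1
      ;;
  esac
}
```

For example, resolve_language with no argument yields java, and resolve_language scala yields scala.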

Push the built NLP Java/JVM docker image to Docker Hub:

$ ./docker-runner.sh --pushImageToHub

or

$ ./docker-runner.sh --pushImageToHub --dockerUserName "your_docker_username"

The above will prompt for your Docker login name and password before it can push your image to Docker Hub (you must have an account on Docker Hub).

Docker image on Docker Hub

Find the NLP Java/JVM Docker Image on Docker Hub. Running docker-runner.sh --pushImageToHub pushes the image to Docker Hub, and running docker-runner.sh --runContainer runs it from the local repository. If the image is absent from the local repository, it is downloaded from Docker Hub.
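
The "use the local copy, otherwise download" behaviour can be reproduced manually with the Docker CLI. The sketch below only illustrates the idea and is not the script's own logic; the function name ensure_image is hypothetical, and the image name is whatever the caller passes in.

```shell
# Hypothetical helper: checks the local repository first and pulls
# from the registry only when the image is not found locally.
ensure_image() {
  local image="$1"
  if docker image inspect "${image}" >/dev/null 2>&1; then
    echo "found locally: ${image}"
  else
    echo "pulling: ${image}"
    docker pull "${image}"
  fi
}
```

Note that docker run performs the same fallback on its own, so an explicit check like this is mainly useful when you want to see which path was taken.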

Contributing

Contributions are very welcome. Please share back with the wider community (and get credited for it)!

Please have a look at the CONTRIBUTING guidelines, and also read about our licensing policy.


Go to NLP page
