Run a Docker container with NLP libraries/frameworks written in Java/JVM languages, running under either a traditional Java 11 JDK (from OpenJDK or another vendor) or GraalVM.
Find out more about Natural Language Processing in the NLP section.
- Run a Docker container containing NLP libraries/frameworks written in Java/JVM languages
- Ability to create custom Docker images (scripts & docs provided)
- Ability to debug the Docker container
- Run using the traditional JDK 11 (OpenJDK or vendor-specific versions)
- Run using the polyglot JVM, i.e. the GraalVM JDK (Community Edition from Oracle Labs), when performing operations from the CLI
- Play with and learn from the examples provided for each of the libraries
- Stanford CoreNLP
- Apache OpenNLP | See README for usage and examples
- NLP4J: NLP Toolkit for JVM Languages
- Word2vec in Java
- ReVerb: Web-Scale Open Information Extraction
- OpenRegex: An efficient and flexible token-based regular expression language and engine
- CogcompNLP: Core libraries developed in the University of Illinois' Cognitive Computation Group
- MALLET - MAchine Learning for LanguagE Toolkit
- RDRPOSTagger - A robust POS tagging toolkit, available in both Java and Python, together with pre-trained models for 40+ languages
- Clojure-openNLP - Natural Language Processing in Clojure (opennlp)
- Inflections-clj - Rails-like inflection library for Clojure and ClojureScript
- postagga - A library to parse natural language in Clojure and ClojureScript
- Lingua - A language detection library for Kotlin and Java, suitable for long and short text alike
- Kotidgy - An index-based text data generator written in Kotlin
- Saul - Library for developing NLP systems, including built in modules like SRL, POS, etc.
- ATR4S - Toolkit with state-of-the-art automatic term recognition methods.
- tm - Implementation of topic modeling based on regularized multilingual PLSA.
- word2vec-scala - Scala interface to word2vec model; includes operations on vectors like word-distance and word-analogy.
- Epic - A high-performance statistical parser written in Scala, along with a framework for building complex structured prediction models.
The following scripts are provided in the repository:
- docker-runner.sh: performs the actions below, depending on the flags passed to it:
  - runs the container and brings you to the command prompt inside the container
  - builds the Docker base image and the language-specific (i.e. java, clojure, kotlin, scala) image; this takes under 5 minutes on a decent connection
  - pushes the built Docker images to Docker Hub (pass in your own Docker user name, then enter your Docker login details when prompted; see usage below)
- a housekeeping script to remove dangling images and terminated containers (helps save some disk space)
- Base Dockerfile | Java Dockerfile: Dockerfiles used to build the base image and the language-specific (i.e. java, clojure, kotlin, scala) NLP Java/JVM Docker images in an isolated environment with the necessary dependencies
- images folder: contains the build scripts, and the scripts copied into the container, for both the base image and the language-specific (i.e. java, clojure, kotlin, scala) Docker images
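The exact commands used by the housekeeping step are not listed here. As a minimal sketch, assuming only the standard Docker CLI, the same cleanup could be written as a shell function; the name `cleanup_docker` is hypothetical and not taken from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a housekeeping step; not the repository's actual script.
# Relies only on standard Docker CLI subcommands.

cleanup_docker() {
  # Remove every stopped container (e.g. exited ones) without a confirmation prompt.
  docker container prune --force || return 1
  # Remove dangling images only: untagged layers left behind by rebuilds.
  docker image prune --force
}
```

Running `cleanup_docker` after a few rebuilds reclaims the disk space held by old layers. Adding `--all` to `docker image prune` would also remove unused tagged images, which is more aggressive than the cleanup described above.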
Help:
$ ./docker-runner.sh --help

    Usage: ./docker-runner.sh --dockerUserName [docker user name]
                              --language [language id]
                              --detach
                              --buildImage
                              --runContainer
                              --pushImageToHub
                              --cleanup
                              --help

    --dockerUserName     docker user name as on Docker Hub
                         (mandatory with build and push commands)
    --language           language id as in java, clojure, scala, etc...
    --detach             run container and detach from it,
                         return control to console
    --jdk                name of the JDK to use (currently supports
                         GRAALVM only, default is blank which
                         enables the traditional JDK)
    --javaopts           sets the JAVA_OPTS environment variable
                         inside the container as it starts
    --cleanup            (command action) remove exited containers and
                         dangling images from the local repository
    --buildImage         (command action) build the docker image
    --runContainer       (command action) run the docker image as a docker container
    --pushImageToHub     (command action) push the docker image built to Docker Hub
    --help               shows the script usage help text
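The help text above implies a simple flag-parsing loop: value flags consume the next argument, and action flags select what to do. The function below is a hypothetical sketch of such a loop, not the real `docker-runner.sh` code; the variable names (`ACTION`, `JDK`, `JAVA_OPTS_VALUE`, `DETACH`, `LANGUAGE`) are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of parsing the flags from the help text above.
# Variable names are illustrative; the real docker-runner.sh may differ.

parse_args() {
  DOCKER_USER_NAME="${DOCKER_USER_NAME:-}"  # may already be exported in the environment
  LANGUAGE="java"                           # documented default
  JDK=""                                    # blank selects the traditional JDK
  JAVA_OPTS_VALUE=""                        # later forwarded as JAVA_OPTS into the container
  DETACH="false"
  ACTION=""

  while [ "$#" -gt 0 ]; do
    case "$1" in
      --dockerUserName|--language|--jdk|--javaopts)
        # Value flags need a following argument; refuse to shift past the end.
        if [ "$#" -lt 2 ]; then
          echo "Missing value for $1" >&2
          return 1
        fi
        case "$1" in
          --dockerUserName) DOCKER_USER_NAME="$2" ;;
          --language)       LANGUAGE="$2" ;;
          --jdk)            JDK="$2" ;;
          --javaopts)       JAVA_OPTS_VALUE="$2" ;;
        esac
        shift 2
        ;;
      --detach)         DETACH="true";    shift ;;
      --buildImage)     ACTION="build";   shift ;;
      --runContainer)   ACTION="run";     shift ;;
      --pushImageToHub) ACTION="push";    shift ;;
      --cleanup)        ACTION="cleanup"; shift ;;
      --help)           ACTION="help";    shift ;;
      *)
        echo "Unknown option: $1" >&2
        return 1
        ;;
    esac
  done
}
```

In this sketch the last action flag wins when several are given; a stricter script might reject that combination instead.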
Run the NLP Java/JVM docker container:
$ ./docker-runner.sh --runContainer
or
$ ./docker-runner.sh --runContainer --dockerUserName [your docker user name]
or run in GraalVM mode
$ ./docker-runner.sh --runContainer --jdk "GRAALVM"
or, in GRAALVM mode, switch off the UseJVMCINativeLibrary JVMCI flag (default: on):
$ ./docker-runner.sh --runContainer --jdk "GRAALVM" --javaopts "-XX:-UseJVMCINativeLibrary"
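Because `--javaopts` only sets `JAVA_OPTS` inside the container, the value has to be passed in as an environment variable when the container starts. A minimal sketch using `docker run --env`, assuming a placeholder image name and a hypothetical `JDK_TO_USE` variable to carry the `--jdk` choice (the real script may select the JDK differently):

```shell
#!/usr/bin/env bash
# Hypothetical sketch: starting the container with the chosen JDK and JAVA_OPTS.
# JDK_TO_USE and the image name are assumptions, not taken from docker-runner.sh.

run_container() {
  image="$1"      # placeholder, e.g. "<user>/<image>:<tag>"
  jdk="$2"        # "GRAALVM", or empty for the traditional JDK
  java_opts="$3"  # exported as JAVA_OPTS inside the container

  # --interactive --tty drops you at a prompt inside the container;
  # --rm deletes the container once that session exits.
  docker run --rm --interactive --tty \
    --env JDK_TO_USE="$jdk" \
    --env JAVA_OPTS="$java_opts" \
    "$image"
}
```

For example: `run_container "<user>/<image>:latest" GRAALVM "-XX:-UseJVMCINativeLibrary"`.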
Build the Docker image:
Ensure your environment has the variable below set, or set it in your .bashrc, .bash_profile or the relevant startup script:
export DOCKER_USER_NAME="your_docker_username"
You must have an account on Docker Hub under the above user name.
$ ./docker-runner.sh --buildImage
or
$ ./docker-runner.sh --buildImage --dockerUserName "your_docker_username"
or
$ ./docker-runner.sh --buildImage --language [language_id]
[language_id] defaults to java when not provided. Accepts: java, clojure, kotlin, scala
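The user-name and language rules above (flag over environment variable, `java` as the default, four accepted ids) can be sketched as follows. This is a simplification that builds a single image; the tag `<user>/nlp-<language>:latest` and the `Dockerfile.<language>` naming are hypothetical, and the real script also builds a separate base image.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the build step; tag and Dockerfile names are assumptions.

build_image() {
  user="${1:-${DOCKER_USER_NAME:-}}"  # --dockerUserName value, else the environment variable
  language="${2:-java}"               # defaults to java when not provided

  if [ -z "$user" ]; then
    echo "Set DOCKER_USER_NAME or pass --dockerUserName" >&2
    return 1
  fi

  # Accept only the documented language ids.
  case "$language" in
    java|clojure|kotlin|scala) ;;
    *)
      echo "Unsupported language id: $language" >&2
      return 1
      ;;
  esac

  docker build --tag "$user/nlp-$language:latest" --file "Dockerfile.$language" .
}
```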
Push the built NLP Java/JVM Docker image to Docker Hub:
$ ./docker-runner.sh --pushImageToHub
or
$ ./docker-runner.sh --pushImageToHub --dockerUserName "your_docker_username"
The above prompts for your Docker login name and password before pushing your image to Docker Hub (you must have an account on Docker Hub).
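A minimal sketch of the push step, assuming only the standard `docker login` and `docker push` commands; the function name `push_image` and the image tag are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of pushing an image; names are illustrative.

push_image() {
  user="${1:-${DOCKER_USER_NAME:-}}"  # Docker Hub account that owns the repository
  image="$2"                          # full tag, e.g. "<user>/<image>:latest"

  if [ -z "$user" ] || [ -z "$image" ]; then
    echo "Usage: push_image <docker user name> <image tag>" >&2
    return 1
  fi

  # Prompts for the password interactively before anything is pushed.
  docker login --username "$user" || return 1
  docker push "$image"
}
```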
Docker image on Docker Hub
Find the NLP Java/JVM Docker image on Docker Hub. The docker-runner.sh --pushImageToHub script pushes the image to Docker Hub, and the docker-runner.sh --runContainer script runs it from the local repository. If the image is absent from the local repository, it is downloaded from Docker Hub.
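Plain `docker run` already pulls an image that is missing locally, so this behaviour needs no extra code. Spelled out explicitly, as a sketch with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: use the local image if present, otherwise pull it.

ensure_local_image() {
  image="$1"
  # `docker image inspect` exits non-zero when the image is not in the local store.
  if docker image inspect "$image" >/dev/null 2>&1; then
    echo "Using local image: $image"
  else
    echo "Not found locally, pulling from Docker Hub: $image"
    docker pull "$image"
  fi
}
```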
Contributions are very welcome; please share them back with the wider community (and get credited for it)!
Please have a look at the CONTRIBUTING guidelines, and also read about our licensing policy.