ChenZhangg / BlackLab

A corpus retrieval engine based on Apache Lucene

Home Page: http://inl.github.io/BlackLab/


What is BlackLab?

BlackLab is a corpus retrieval engine built on top of Apache Lucene. It allows fast, complex searches with accurate hit highlighting on large bodies of tagged and annotated text. It was developed at the Institute of Dutch Lexicology (INL) to provide a fast and feature-rich search interface on our historical and contemporary text corpora.

We're also working on BlackLab Server, a web service interface to BlackLab, so you can access it from any programming language. BlackLab Server is included in the repository as well.

BlackLab and BlackLab Server are licensed under the Apache License 2.0.

To learn how to index and search your data, see the official project site.

Using BlackLab with Docker

An experimental Docker setup is now provided. It is likely to change in the future.

We assume here that you are familiar with the BlackLab indexing process; see indexing with BlackLab to learn more.

Create a file named test.env with your indexing configuration:

IMAGE_VERSION=latest
BLACKLAB_FORMATS_DIR=/path/to/my/formats
INDEX_NAME=my-index
INDEX_FORMAT=my-file-format
INDEX_INPUT_DIR=/path/to/my/input-files
JAVA_OPTS=-Xmx10G
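Before running the indexer, it can help to confirm that the env file defines each variable shown above. The following is a minimal sketch, not part of BlackLab: the function name missing_env_vars is hypothetical, and treating all six variables as required is an assumption (the Docker setup may supply defaults for some of them).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with BlackLab): print the names of
# variables from the example test.env that are absent or empty in a file.
# Assumption: all six variables are required by the Docker setup.

missing_env_vars() {
    local env_file="$1"
    local name
    for name in IMAGE_VERSION BLACKLAB_FORMATS_DIR INDEX_NAME \
                INDEX_FORMAT INDEX_INPUT_DIR JAVA_OPTS; do
        # A variable counts as set when a line starts with NAME=
        # followed by at least one character.
        if ! grep -Eq "^${name}=.+" "$env_file"; then
            echo "$name"
        fi
    done
}
```

Calling missing_env_vars test.env prints one variable name per line for anything missing, and prints nothing when the file is complete.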

To index your data:

docker-compose --env-file test.env run --rm indexer

Now start the server:

docker-compose up -d

Your index should now be accessible at http://localhost:8080/blacklab-server/my-index.
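Once the server is up, the index can be queried over HTTP. The sketch below assumes curl is installed and that the server exposes a hits endpoint accepting patt (a Corpus Query Language pattern) and outputformat parameters; check the BlackLab Server documentation for the image version you run. The names BLS_BASE, index_url, and search_hits are hypothetical helpers introduced for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for querying a running BlackLab Server.
# Assumption: the base URL matches the address given above.
BLS_BASE="${BLS_BASE:-http://localhost:8080/blacklab-server}"

# Build the URL of an index, tolerating a trailing slash on BLS_BASE.
index_url() {
    printf '%s/%s' "${BLS_BASE%/}" "$1"
}

# Search an index for a pattern and print the JSON response.
# $1: index name, $2: Corpus Query Language pattern
search_hits() {
    # --get sends the url-encoded data as a query string
    curl --silent --fail --get \
        --data-urlencode "patt=$2" \
        --data-urlencode "outputformat=json" \
        "$(index_url "$1")/hits"
}
```

For example, search_hits my-index '"the"' would request all hits for the word "the" in the index created above. Letting curl handle the URL encoding avoids quoting problems with patterns that contain quotes or brackets.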

See the Docker README for more details.

Special thanks


Languages

Java 98.5%, JavaScript 0.6%, C 0.5%, HTML 0.2%, Shell 0.1%, Dockerfile 0.0%, CSS 0.0%, Makefile 0.0%