The Blast is a full text search and indexing server written in Go built on top of the Bleve, Bolt and Raft.
Blast server provides functions through gRPC (HTTP/2 + Protocol Buffers) or traditional RESTful API (HTTP/1.1 + JSON), and uses Raft to achieve consensus across all the instances of the nodes, ensuring that every change made to the system is made to a quorum of nodes, or none at all.
Blast makes it easy for programmers to develop search applications with advanced features.
- Full-text search and indexing
- Faceting
- Result highlighting
- Easy deployment
- Bringing up cluster
- Index replication
- An easy-to-use HTTP API
- CLI is also available
- Docker container image is also available
Blast requires Bleve and Bleve Extensions. Some Bleve Extensions requires C/C++ libraries. The following sections are instructions for satisfying dependencies on particular platforms.
- Go
- Git
$ sudo apt-get install -y libicu-dev \
libleveldb-dev \
libstemmer-dev \
libgflags-dev \
libsnappy-dev \
zlib1g-dev \
libbz2-dev \
liblz4-dev \
libzstd-dev \
librocksdb-dev \
gcc-4.8 \
g++-4.8 \
build-essential
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 80
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 80
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 90
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 90
$ export GOPATH=${HOME}/go
$ go get -u -v github.com/blevesearch/cld2
$ cd ${GOPATH}/src/github.com/blevesearch/cld2
$ git clone https://github.com/CLD2Owners/cld2.git
$ cd cld2/internal
$ ./compile_libs.sh
$ sudo cp *.so /usr/local/lib
$ brew install icu4c \
leveldb \
rocksdb \
zstd
$ CGO_LDFLAGS="-L/usr/local/opt/icu4c/lib" \
CGO_CFLAGS="-I/usr/local/opt/icu4c/include" \
go get -u -v github.com/blevesearch/cld2
$ cd ${GOPATH}/src/github.com/blevesearch/cld2
$ git clone https://github.com/CLD2Owners/cld2.git
$ cd cld2/internal
$ perl -p -i -e 's/soname=/install_name,/' compile_libs.sh
$ ./compile_libs.sh
$ sudo cp *.so /usr/local/lib
Build Blast for Linux as following:
$ git clone git@github.com:mosuka/blast.git
$ cd blast
$ make build
If you want to build for other platform, set GOOS
, GOARCH
. For example, build for macOS like following:
$ GOOS=darwin \
make build
If you want to build Blast with Bleve and Bleve Extentions (blevex), please set CGO_LDFLAGS
, CGO_CFLAGS
, CGO_ENABLED
and BUILD_TAGS
. For example, enable Japanese Language Analyzer like following:
$ BUILD_TAGS=kagome \
make build
You can enable all Bleve Extensions for macOS like following:
$ GOOS=darwin \
CGO_LDFLAGS="-L/usr/local/opt/icu4c/lib -L/usr/local/opt/rocksdb/lib -lrocksdb -lstdc++ -lm -lz -lbz2 -lsnappy -llz4 -lzstd" \
CGO_CFLAGS="-I/usr/local/opt/icu4c/include -I/usr/local/opt/rocksdb/include" \
CGO_ENABLED=1 \
BUILD_TAGS="full" \
make build
Also, you can enable all Bleve Extensions for Linux like following:
$ GOOS=linux \
CGO_LDFLAGS="-L/usr/lib -lrocksdb -lstdc++ -lm -lz -lbz2 -lsnappy -llz4 -lzstd" \
CGO_CFLAGS="-I/usr/include/rocksdb" \
CGO_ENABLED=1 \
BUILD_TAGS=full \
make build
Please refer to the following table for details of Bleve Extensions:
CGO_ENABLED | BUILD_TAGS | |
---|---|---|
Compact Language Detector | 1 | cld2 |
cznicb KV store | cznicb | |
Thai Language Analyser | 1 | icu |
Japanese Language Analyser | kagome | |
LevelDB | 1 | leveldb |
Danish, German, English, Spanish, Finnish, French, Hungarian, Italian, Dutch, Norwegian, Portuguese, Romanian, Russian, Swedish, Turkish Language Stemmer | 1 | libstemmer |
RocksDB | 1 | rocksdb |
You can see the binary file when build successful like so:
$ ls ./bin
blast
Running a Blast node is easy. Start Blast node like so:
$ ./bin/blast start --bind-addr=localhost:10000 \
--grpc-addr=localhost:10001 \
--http-addr=localhost:10002 \
--node-id=node1 \
--raft-dir=/tmp/blast/noade1/raft \
--store-dir=/tmp/blast/node1/store \
--index-dir=/tmp/blast/node1/index \
--index-mapping=./example/wikipedia_index_mapping.json
You can now put, get, search and delete the document(s) via CLI.
Putting a document is as following:
$ cat ./example/enwiki_doc1.json | xargs -0 ./bin/blast put --grpc-addr=localhost:10001 --pretty-print enwiki_doc1
You can see the result in JSON format. The result of the above command is:
{
"success": true
}
Getting a document is as following:
$ ./bin/blast get --grpc-addr=localhost:10001 --pretty-print enwiki_doc1
You can see the result in JSON format. The result of the above command is:
{
"id": "enwiki_doc1",
"fields": {
"_type": "enwiki",
"contributor": "unknown",
"text_en": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload.",
"timestamp": "2018-07-04T05:41:00Z",
"title_en": "Search engine (computing)"
},
"success": true
}
Deleting a document is as following:
$ ./bin/blast delete --grpc-addr=localhost:10001 --pretty-print enwiki_doc1
You can see the result in JSON format. The result of the above command is:
{
"success": true
}
Bulk update is as following:
$ cat ./example/bulk_put_request.json | xargs -0 ./bin/blast bulk --pretty-print
You can see the result in JSON format. The result of the above command is:
{
"put_count": 3,
"success": true
}
Searching documents is as like following:
$ cat ./example/wikipedia_search_request.json | xargs -0 ./bin/blast search --grpc-addr=localhost:10001 --pretty-print
You can see the result in JSON format. The result of the above command is:
{
"search_result": {
"facets": {
"Contributor count": {
"field": "contributor",
"missing": 0,
"other": 0,
"terms": [
{
"count": 3,
"term": "unknown"
}
],
"total": 3
},
"Timestamp range": {
"date_ranges": [
{
"count": 3,
"end": "2020-12-31T23:59:59Z",
"name": "2011 - 2020",
"start": "2011-01-01T00:00:00Z"
}
],
"field": "timestamp",
"missing": 0,
"other": 0,
"total": 3
}
},
"hits": [
{
"fields": {
"_type": "enwiki",
"contributor": "unknown",
"text_en": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload.",
"timestamp": "2018-07-04T05:41:00Z",
"title_en": "Search engine (computing)"
},
"id": "enwiki_doc1",
"index": "/tmp/blast/node0/index",
"locations": {
"text_en": {
"search": [
{
"array_positions": null,
"end": 8,
"pos": 2,
"start": 2
},
{
"array_positions": null,
"end": 124,
"pos": 20,
"start": 118
},
{
"array_positions": null,
"end": 201,
"pos": 33,
"start": 195
}
]
},
"title_en": {
"search": [
{
"array_positions": null,
"end": 6,
"pos": 1,
"start": 0
}
]
}
},
"score": 0.18706386191354732,
"sort": [
"_score"
]
},
{
"fields": {
"_type": "ptwiki",
"contributor": "unknown",
"text_pt": "Motor de pesquisa (português europeu) ou ferramenta de busca (português brasileiro) ou buscador (em inglês: search engine) é um programa desenhado para procurar palavras-chave fornecidas pelo utilizador em documentos e bases de dados. No contexto da internet, um motor de pesquisa permite procurar palavras-chave em documentos alojados na world wide web, como aqueles que se encontram armazenados em websites.",
"timestamp": "2018-07-04T05:41:00Z",
"title_pt": "Motor de busca"
},
"id": "ptwiki_doc1",
"index": "/tmp/blast/node0/index",
"locations": {
"text_pt": {
"search": [
{
"array_positions": null,
"end": 117,
"pos": 16,
"start": 111
}
]
}
},
"score": 0.09273589609475594,
"sort": [
"_score"
]
},
{
"fields": {
"_type": "jawiki",
"contributor": "unknown",
"text_ja": "検索エンジン(けんさくエンジン、英語: search engine)は、狭義にはインターネットに存在する情報(ウェブページ、ウェブサイト、画像フみを提供していたウェブサイトそのものを検索エンジンと呼んだが、現在では様々なサービスが加わったポータルサイト化が進んだため、検索をサービスの一つとしてトを単に検索サイトと呼ぶことはなくなっている。広義には、インターネットに限定せず情報を検索するシステム全般を含む。",
"timestamp": "2018-05-30T00:52:00Z",
"title_ja": "検索エンジン"
},
"id": "jawiki_doc1",
"index": "/tmp/blast/node0/index",
"locations": {
"text_ja": {
"search": [
{
"array_positions": null,
"end": 62,
"pos": 11,
"start": 56
}
]
}
},
"score": 0.05875099443799347,
"sort": [
"_score"
]
}
],
"max_score": 0.18706386191354732,
"request": {
"explain": false,
"facets": {
"Contributor count": {
"field": "contributor",
"size": 10
},
"Timestamp range": {
"date_ranges": [
{
"end": "2010-12-31T23:59:59Z",
"name": "2001 - 2010",
"start": "2001-01-01T00:00:00Z"
},
{
"end": "2020-12-31T23:59:59Z",
"name": "2011 - 2020",
"start": "2011-01-01T00:00:00Z"
}
],
"field": "timestamp",
"size": 10
}
},
"fields": [
"*"
],
"from": 0,
"highlight": {
"fields": [
"title",
"text"
],
"style": "html"
},
"includeLocations": false,
"query": {
"query": "+_all:search"
},
"size": 10,
"sort": [
"-_score"
]
},
"status": {
"failed": 0,
"successful": 1,
"total": 1
},
"took": 318296,
"total_hits": 3
},
"success": true
}
Also you can do above commands via HTTP REST API that listened port 10001 (address port + 1).
Putting a document via HTTP is as following:
$ curl -X PUT 'http://localhost:10002/rest/enwiki_doc1?pretty-print=true' -d @./example/enwiki_doc1.json
You can see the result in JSON format. The result of the above request is:
{
"success": true
}
Getting a document via HTTP is as following:
$ curl -X GET 'http://localhost:10002/rest/enwiki_doc1?pretty-print=true'
You can see the result in JSON format. The result of the above request is:
{
"id": "enwiki_doc1",
"fields": {
"_type": "enwiki",
"contributor": "unknown",
"text_en": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload.",
"timestamp": "2018-07-04T05:41:00Z",
"title_en": "Search engine (computing)"
},
"success": true
}
Deleting a document via HTTP is as following:
$ curl -X DELETE 'http://localhost:10002/rest/enwiki_doc1?pretty-print=true'
You can see the result in JSON format. The result of the above request is:
{
"success": true
}
Bulk update is as following:
$ curl -X POST 'http://localhost:10002/rest/_bulk?pretty-print=true' -d @./example/bulk_put_request.json
You can see the result in JSON format. The result of the above command is:
{
"put_count": 3,
"success": true
}
Searching documents via HTTP is as following:
$ curl -X POST 'http://localhost:10002/rest/_search?pretty-print=true' -d @./example/wikipedia_search_request.json
You can see the result in JSON format. The result of the above request is:
{
"search_result": {
"facets": {
"Contributor count": {
"field": "contributor",
"missing": 0,
"other": 0,
"terms": [
{
"count": 3,
"term": "unknown"
}
],
"total": 3
},
"Timestamp range": {
"date_ranges": [
{
"count": 3,
"end": "2020-12-31T23:59:59Z",
"name": "2011 - 2020",
"start": "2011-01-01T00:00:00Z"
}
],
"field": "timestamp",
"missing": 0,
"other": 0,
"total": 3
}
},
"hits": [
{
"fields": {
"_type": "enwiki",
"contributor": "unknown",
"text_en": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload.",
"timestamp": "2018-07-04T05:41:00Z",
"title_en": "Search engine (computing)"
},
"id": "enwiki_doc1",
"index": "/tmp/blast/node0/index",
"locations": {
"text_en": {
"search": [
{
"array_positions": null,
"end": 8,
"pos": 2,
"start": 2
},
{
"array_positions": null,
"end": 124,
"pos": 20,
"start": 118
},
{
"array_positions": null,
"end": 201,
"pos": 33,
"start": 195
}
]
},
"title_en": {
"search": [
{
"array_positions": null,
"end": 6,
"pos": 1,
"start": 0
}
]
}
},
"score": 0.18706386191354732,
"sort": [
"_score"
]
},
{
"fields": {
"_type": "ptwiki",
"contributor": "unknown",
"text_pt": "Motor de pesquisa (português europeu) ou ferramenta de busca (português brasileiro) ou buscador (em inglês: search engine) é um programa desenhado para procurar palavras-chave fornecidas pelo utilizador em documentos e bases de dados. No contexto da internet, um motor de pesquisa permite procurar palavras-chave em documentos alojados na world wide web, como aqueles que se encontram armazenados em websites.",
"timestamp": "2018-07-04T05:41:00Z",
"title_pt": "Motor de busca"
},
"id": "ptwiki_doc1",
"index": "/tmp/blast/node0/index",
"locations": {
"text_pt": {
"search": [
{
"array_positions": null,
"end": 117,
"pos": 16,
"start": 111
}
]
}
},
"score": 0.09273589609475594,
"sort": [
"_score"
]
},
{
"fields": {
"_type": "jawiki",
"contributor": "unknown",
"text_ja": "検索エンジン(けんさくエンジン、英語: search engine)は、狭義にはインターネットに存在する情報(ウェブページ、ウェブサイト、画像フム。インターネットの普及初期には、検索としての機能のみを提供していたウェブサイトそのものを検索エンジンと呼んだが、現在では様々なサービスが加わったポー情報を検索するシステム全般を含む。",
"timestamp": "2018-05-30T00:52:00Z",
"title_ja": "検索エンジン"
},
"id": "jawiki_doc1",
"index": "/tmp/blast/node0/index",
"locations": {
"text_ja": {
"search": [
{
"array_positions": null,
"end": 62,
"pos": 11,
"start": 56
}
]
}
},
"score": 0.05875099443799347,
"sort": [
"_score"
]
}
],
"max_score": 0.18706386191354732,
"request": {
"explain": false,
"facets": {
"Contributor count": {
"field": "contributor",
"size": 10
},
"Timestamp range": {
"date_ranges": [
{
"end": "2010-12-31T23:59:59Z",
"name": "2001 - 2010",
"start": "2001-01-01T00:00:00Z"
},
{
"end": "2020-12-31T23:59:59Z",
"name": "2011 - 2020",
"start": "2011-01-01T00:00:00Z"
}
],
"field": "timestamp",
"size": 10
}
},
"fields": [
"*"
],
"from": 0,
"highlight": {
"fields": [
"title",
"text"
],
"style": "html"
},
"includeLocations": false,
"query": {
"query": "+_all:search"
},
"size": 10,
"sort": [
"-_score"
]
},
"status": {
"failed": 0,
"successful": 1,
"total": 1
},
"took": 332682,
"total_hits": 3
},
"success": true
}
Blast is easy to bring up the cluster. Blast node is already running, but that is not fault tolerant. If you need to increase the fault tolerance, bring up 2 more nodes like so:
$ ./bin/blast start --bind-addr=localhost:11000 \
--grpc-addr=localhost:11001 \
--http-addr=localhost:11002 \
--node-id=node2 \
--raft-dir=/tmp/blast/node2/raft \
--store-dir=/tmp/blast/node2/store \
--index-dir=/tmp/blast/node2/index \
--peer-grpc-addr=localhost:10001 \
--index-mapping=./example/wikipedia_index_mapping.json
$ ./bin/blast start --bind-addr=localhost:12000 \
--grpc-addr=localhost:12001 \
--http-addr=localhost:12002 \
--node-id=node3 \
--raft-dir=/tmp/blast/node3/raft \
--store-dir=/tmp/blast/node3/store \
--index-dir=/tmp/blast/node3/index \
--peer-grpc-addr=localhost:10001 \
--index-mapping=./example/wikipedia_index_mapping.json
Above example shows each Blast node running on the same host, so each node must listen on different ports. This would not be necessary if each node ran on a different host.
So you have a 3-node cluster. That way you can tolerate the failure of 1 node. You can check the peers with the following command:
$ ./bin/blast peers --grpc-addr=localhost:10001 --pretty-print
You can see the result in JSON format. The result of the above command is:
{
"peers": [
{
"address": "127.0.0.1:10000",
"leader": true,
"metadata": {
"grpc_address": "localhost:10001",
"http_address": "localhost:10002"
},
"node_id": "node1"
},
{
"address": "127.0.0.1:11000",
"metadata": {
"grpc_address": "localhost:11001",
"http_address": "localhost:11002"
},
"node_id": "node2"
},
{
"address": "127.0.0.1:12000",
"metadata": {
"grpc_address": "localhost:12001",
"http_address": "localhost:12002"
},
"node_id": "node3"
}
],
"success": true
}
Recommend 3 or more odd number of nodes in the cluster. In failure scenarios, data loss is inevitable, so avoid deploying single nodes.
This tells each new node to join the existing node. Once joined, each node now knows about the key:
Following command puts a document to node0:
$ cat ./example/enwiki_doc1.json | xargs -0 ./bin/blast put --grpc-addr=localhost:10001 --pretty-print enwiki_doc1
You can see the result in JSON format. The result of the above command is:
{
"success": true
}
So, you can get a document from node1 like following:
$ ./bin/blast get --grpc-addr=localhost:10001 --pretty-print enwiki_doc1
You can see the result in JSON format. The result of the above command is:
{
"id": "enwiki_doc1",
"fields": {
"_type": "enwiki",
"contributor": "unknown",
"text_en": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload.",
"timestamp": "2018-07-04T05:41:00Z",
"title_en": "Search engine (computing)"
},
"success": true
}
Also, you can get same document from node2 (localhost:11001) like following:
$ ./bin/blast get --grpc-addr=localhost:11001 --pretty-print enwiki_doc1
You can see the result in JSON format. The result of the above command is:
{
"id": "enwiki_doc1",
"fields": {
"_type": "enwiki",
"contributor": "unknown",
"text_en": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload.",
"timestamp": "2018-07-04T05:41:00Z",
"title_en": "Search engine (computing)"
},
"success": true
}
Lastly, you can get same document from node3 (localhost:12001) like following:
$ ./bin/blast get --grpc-addr=localhost:12001 --pretty-print enwiki_doc1
You can see the result in JSON format. The result of the above command is:
{
"id": "enwiki_doc1",
"fields": {
"_type": "enwiki",
"contributor": "unknown",
"text_en": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload.",
"timestamp": "2018-07-04T05:41:00Z",
"title_en": "Search engine (computing)"
},
"success": true
}
You can build the Docker container image like so:
$ make docker
You can also use the Docker container image already registered in docker.io like so:
$ docker pull mosuka/blast:v0.2.0
See https://hub.docker.com/r/mosuka/blast/tags/
Running a Blast node on Docker. Start Blast node like so:
$ docker run --rm --name blast1 \
-p 10000:10000 \
-p 10001:10001 \
-p 10002:10002 \
mosuka/blast:v0.3.0 start \
--bind-addr=:10000 \
--grpc-addr=:10001 \
--http-addr=:10002 \
--node-id=node1
You can also bringing up cluster with Docker Compose. First, you need to start 1st node of cluster like so:
$ docker-compose up -d blast1
Then, you can start other nodes like so:
$ docker-compose up -d blast2 blast3
All nodes are stopped as follows:
$ docker-compose stop