RediSearchBenchmarks

Source code for benchmarking the RediSearch module, which provides scalable, high-performance full-text search.

What's in here

This is a Go application that can ingest data into a search engine and benchmark the throughput and latency of queries against it.

It supports reading Wikipedia Abstract Data Dumps and indexing them in three search engines: RediSearch, Elasticsearch, and Solr.
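
For orientation, the abstract dumps are large XML files made up of <doc> entries with <title>, <url> and <abstract> children. The following is only an illustrative sketch of streaming such a dump with Go's standard encoding/xml decoder, not the repository's own parsing code:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"os"
)

// doc mirrors one <doc> entry in a Wikipedia abstract dump.
type doc struct {
	Title    string `xml:"title"`
	URL      string `xml:"url"`
	Abstract string `xml:"abstract"`
}

func main() {
	f, err := os.Open("enwiki-20160305-abstract.xml")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Stream the dump token by token instead of loading the
	// multi-gigabyte file into memory at once.
	dec := xml.NewDecoder(f)
	for {
		tok, err := dec.Token()
		if err != nil {
			break // io.EOF ends the stream
		}
		se, ok := tok.(xml.StartElement)
		if !ok || se.Name.Local != "doc" {
			continue
		}
		var d doc
		if err := dec.DecodeElement(&d, &se); err != nil {
			continue // skip malformed entries
		}
		// At this point the document would be handed to the chosen
		// search engine for indexing.
		fmt.Printf("%s (%d chars of abstract)\n", d.Title, len(d.Abstract))
	}
}
```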

Some Results

Output of the benchmark, run on a full abstracts dump of the English Wikipedia, against 5 Redis shards on a single c4.4xlarge EC2 instance:

| Benchmark | Concurrent Clients | Throughput (requests/sec) | Average Latency (ms) |
|---|---|---|---|
| search: hello | 1 | 2737.57 | 0.36 |
| search: hello | 8 | 9706.18 | 0.80 |
| search: hello | 16 | 11201.99 | 1.39 |
| search: hello | 32 | 13198.80 | 2.36 |
| search: hello | 64 | 17663.32 | 3.51 |
| search: barack obama | 64 | 16482.96 | 3.77 |
| search: barack obama | 1 | 2505.68 | 0.40 |
| search: barack obama | 8 | 8522.50 | 0.91 |
| search: barack obama | 16 | 10346.83 | 1.50 |
| search: barack obama | 32 | 11720.58 | 2.66 |
| exact: "united states of america" | 1 | 618.86 | 1.62 |
| exact: "united states of america" | 8 | 816.22 | 9.38 |
| exact: "united states of america" | 16 | 816.47 | 18.97 |
| exact: "united states of america" | 32 | 815.41 | 37.44 |
| exact: "united states of america" | 64 | 801.75 | 75.61 |
| search: manchester united | 1 | 1513.97 | 0.66 |
| search: manchester united | 4 | 3192.11 | 1.23 |
| search: manchester united | 16 | 3485.66 | 4.43 |
| search: manchester united | 32 | 3512.08 | 8.80 |
| search: manchester united | 64 | 3559.03 | 17.29 |
| autocomplete (2-3 letters) | 1 | 4145.90 | 0.24 |
| autocomplete (2-3 letters) | 4 | 9691.04 | 0.41 |
| autocomplete (2-3 letters) | 8 | 12129.34 | 0.64 |
| autocomplete (2-3 letters) | 16 | 15268.47 | 1.00 |
| autocomplete (2-3 letters) | 32 | 16064.66 | 1.90 |
| autocomplete (2-3 letters) | 64 | 17255.77 | 3.51 |
| autocomplete (2-3 letters) | 128 | 17935.49 | 6.47 |

Benchmark output

For each benchmark, we append a single line to a CSV file containing the engine used, the benchmark type and query, the concurrency, the throughput, and the average latency.

The default file name is benchmark.csv; running the app with -o - prints the results to stdout instead.

The output of running a benchmark on the queries "foo,bar,baz" with 4 concurrent clients looks like this:

redis,"search: foo,bar,baz",4,14997.81,0.27

Usage

Usage of ./RediSearchBenchmark:
  -benchmark string
    	[search|suggest] - if set, we run the given benchmark
  -c int
    	benchmark concurrency (default 4)
  -duration int
    	number of seconds to run the benchmark (default 5)
  -engine string
        [redis|elastic|solr] The search backend to run (default "redis")
  -file string
    	Input file to ingest data from (wikipedia abstracts)
  -fuzzy
    	For redis only - benchmark fuzzy auto suggest
  -hosts string
    	comma separated list of host:port to redis nodes (default "localhost:6379")
  -o string
    	results output file. set to - for stdout (default "benchmark.csv")
  -queries string
    	comma separated list of queries to benchmark (default "hello world")
  -scores string
    	read scores of documents CSV for indexing
  -shards int
    	the number of partitions we want (AT LEAST the number of cluster shards) (default 1)

Example: Indexing documents into RediSearch

./RediSearchBenchmark -engine redis -shards 4 -hosts "localhost:6379,localhost:6380,localhost:6381,localhost:6382" \
    -file ~/wiki/enwiki-20160305-abstract.xml -scores ~/wiki/scores.csv

Example: Benchmarking RediSearch with 32 concurrent clients

./RediSearchBenchmark -engine redis -shards 4 -hosts "localhost:6379,localhost:6380,localhost:6381,localhost:6382" \
    -benchmark search -queries "hello world,foo bar" -c 32 -o out.csv
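
Example: Benchmarking fuzzy auto-suggestions (illustrative)

The suggest benchmark is driven by the same flags. The following command is an illustrative combination of the flags documented above, not one taken from the repository; it runs a fuzzy auto-suggest benchmark against a single local shard and prints the results to stdout:

./RediSearchBenchmark -engine redis -hosts "localhost:6379" \
    -benchmark suggest -fuzzy -queries "barack,manchester" -c 16 -o -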
