Simple benchmarks for the insertion of 1 million distinct sorted values

Question


healiseu opened this issue 6 years ago

HEALIS - Healthy Information Systems/Services commented 6 years ago

Hi, I would like to start experimenting with your search engine to see how I can embed it in the TRIADB project. I have already built an implementation on top of Josiah Carlson's Redis Object-Mapper (ROM), but it is rather slow on inserts and its query functionality is limited.

Nevertheless, TRIADB is a Python database framework, and it is very convenient for developers to use an object mapper such as ROM, Walrus or Stdnet; I think something similar could be built on top of redisearch-py. In particular, I am interested in unique indexes, composite indexes and many-to-many (ManyToMany) relationships, which are essential, basic functionality for TRIADB.

To be more specific: as a first step, I want to use RediSearch to measure the memory footprint and the insertion speed for the values of a single table column. Each column of data in TRIADB belongs to a data type, and each data type stores distinct (i.e. single-instance) values of a specific format, e.g. integers, floats, dates or categorical data. I therefore want to learn the most efficient way (speed is the first priority, memory size the second) to create and apply a unique index constraint in the RediSearch engine.

For example, I wrote a simple demo that inserts 1 million distinct sorted floats in the range [-5000, 5000].
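A minimal sketch of a loader along these lines (not the original demo) might look like this. It assumes redisearch-py's Client.batch_indexer(chunk_size=...), whose BatchIndexer, as we understand it, exposes add_document(doc_id, **fields) and commit(). The index name triadb_floats, the field name value and the v: document-id prefix are hypothetical. The indexer is passed in as a parameter so the value generation can be checked without a running server.

```python
from typing import Any, Iterator


def sorted_floats(n: int = 1_000_000, lo: float = -5000.0,
                  hi: float = 5000.0) -> Iterator[float]:
    """Yield n distinct, evenly spaced floats from lo to hi (inclusive), ascending."""
    if n < 2:
        raise ValueError("n must be at least 2")
    span = hi - lo
    for i in range(n):
        # span * i is exact for n up to 1e6, so the last value lands exactly on hi
        yield lo + span * i / (n - 1)


def load_values(indexer: Any, values: Iterator[float], field: str = "value",
                prefix: str = "v:") -> int:
    """Queue one document per value, flush the remainder and return the count.

    Assumption: indexer behaves like redisearch-py's BatchIndexer, which sends
    its buffered documents every chunk_size calls to add_document().
    """
    count = 0
    for i, v in enumerate(values):
        indexer.add_document(f"{prefix}{i}", **{field: v})  # hypothetical id scheme
        count += 1
    indexer.commit()  # send the last, partially filled chunk
    return count


# Usage against a live RediSearch server (assumed redisearch-py API, not run here):
#   from redisearch import Client, NumericField
#   client = Client("triadb_floats")
#   client.create_index([NumericField("value")])
#   load_values(client.batch_indexer(chunk_size=1000), sorted_floats())
```

Because the generator yields values lazily, the 1 million floats never have to sit in a Python list at once; as we understand redisearch-py, chunk_size then only sets how many add-document commands are buffered in a pipeline before each round trip.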

1. How can I specify the representation of a number in RediSearch? Does it account internally for differences between numeric types?
2. What is the optimum value for chunk_size, and is there a general guideline?
3. How do the memory sizes reported by info() compare to those reported by the INFO memory command of the Redis client? In my case I get:
a. used_memory_human: 256 MB
b. used_memory_dataset: 209 MB
4. What is the most efficient way to build a unique-constraint index without a try-except block?
5. What other ways would you suggest to optimize data insertion? A bitmap index for an auto-increment primary key, perhaps?
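A quick back-of-the-envelope check turns the two memory figures into a per-value cost. It assumes the MB figures are binary megabytes (2^20 bytes, the unit Redis's *_human fields use), that the dataset holds exactly the 1 million inserted values, and compares against 8 bytes for one raw IEEE-754 double.

```python
MIB = 1024 * 1024          # assumed unit of the reported "MB" figures
N_VALUES = 1_000_000       # values inserted by the demo
RAW_DOUBLE_BYTES = 8       # size of one IEEE-754 double


def per_value_bytes(total_mib: float, n: int = N_VALUES) -> float:
    """Average bytes per inserted value for a memory figure given in MiB."""
    return total_mib * MIB / n


dataset = per_value_bytes(209)  # used_memory_dataset
total = per_value_bytes(256)    # used_memory_human
print(f"dataset: {dataset:.1f} B/value, {dataset / RAW_DOUBLE_BYTES:.1f}x a raw double")
print(f"total:   {total:.1f} B/value")
```

Under these assumptions each value costs roughly 219 bytes of dataset memory, about 27 times a raw double. Since Redis defines used_memory_dataset as used_memory minus used_memory_overhead, that figure covers everything stored for the documents and the index, not just the numbers themselves.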
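One option that avoids try-except sits beside the RediSearch index rather than inside it: a plain Redis set used as a uniqueness guard. SADD replies 1 when the member was added and 0 when it was already present, so that reply alone decides whether a value is new. The sketch below assumes a redis-py client (pipeline(transaction=False), sadd, execute); the key name and the default chunk size are hypothetical, and the check is not atomic with the subsequent index write.

```python
from typing import Any, Iterable, List


def filter_new_values(client: Any, set_key: str, values: Iterable[float],
                      chunk_size: int = 10_000) -> List[float]:
    """Return only the values that were not yet members of the set set_key.

    Assumption: client is a redis-py Redis instance (or exposes the same
    pipeline/sadd/execute calls). No exception handling is needed because
    each SADD reply (1 = added, 0 = already present) marks new values.
    """
    new_values: List[float] = []
    batch: List[float] = []

    def flush() -> None:
        pipe = client.pipeline(transaction=False)  # one round trip per batch
        for v in batch:
            pipe.sadd(set_key, repr(v))  # repr() round-trips a float exactly
        # Replies come back in command order, so duplicates within one batch
        # are also caught: only their first occurrence gets a reply of 1.
        for v, added in zip(batch, pipe.execute()):
            if added:
                new_values.append(v)
        batch.clear()

    for v in values:
        batch.append(v)
        if len(batch) >= chunk_size:
            flush()
    if batch:
        flush()
    return new_values
```

The values returned can then be handed to the indexer, for example with redis.Redis() as client and a key such as triadb:float:unique (hypothetical).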