RediSearch / redisearch-py

RediSearch python client

Home Page:https://redisearch.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Inconsistent index drop behavior

maguec opened this issue · comments

I added the following test to ensure that the data is not deleted, but on a fresh restart it fails maybe 1 in 25 or so tries.
Am not able to consistently reproduce, so I assume it's some sort of race condition.

    def testDropIndex(self):
        """
        Ensure the index gets dropped by data remains by default
        """
        for x in range(200):
            print("Run ", x)
            conn = self.redis()
            with conn as r:
                if check_version(r, 20000):
                    idx = "HaveIt-%d" %(int(time.time()))
                    index = Client(idx, port=conn.port)
                    index.redis.hset("index:haveit", mapping = {'name': 'haveit'})
                    idef = IndexDefinition(prefix=['index:'],score_field='name')
                    index.create_index((TextField('name'),),definition=idef)
                    index.drop_index()
                    i = index.redis.hgetall("index:haveit")
                    self.assertEqual(i, {'name': 'haveit'})

When I spin up a new fresh docker container

docker run -p 6379:6379 redislabs/redisearch:2.0.0

Running the test it fails intermittently

 chris   handle-index-drop-like-cli U:2 ?:1  ❲p❳ venv3  ~/Code/RedisLabs/Forks/redisearch-py  1  REDIS_PORT=6379 python test/test.py 
.........Run  0
Run  1
Run  2
Run  3
Run  4
Run  5
Run  6
Run  7
Redis output: 
F...................
======================================================================
FAIL: testDropIndex (__main__.RedisSearchTestCase)
Ensure the index gets dropped by data remains by default
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test.py", line 218, in testDropIndex
    self.assertEqual(i, {'name': 'haveit'})
AssertionError: {} != {'name': 'haveit'}
- {}
+ {'name': 'haveit'}

----------------------------------------------------------------------
Ran 29 tests in 4.561s

FAILED (failures=1)
 chris   handle-index-drop-like-cli U:2 ?:1  ❲p❳ venv3  ~/Code/RedisLabs/Forks/redisearch-py  1  REDIS_PORT=6379 python test/test.py 
.........Run  0
Run  1
Run  2
Run  3
Run  4
Run  5
Run  6
Run  7
Run  8
Run  9
Run  10
Run  11
Run  12
Run  13
Run  14
Run  15
Redis output: 
F...................
======================================================================
FAIL: testDropIndex (__main__.RedisSearchTestCase)
Ensure the index gets dropped by data remains by default
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test.py", line 218, in testDropIndex
    self.assertEqual(i, {'name': 'haveit'})
AssertionError: {} != {'name': 'haveit'}
- {}
+ {'name': 'haveit'}

----------------------------------------------------------------------
Ran 29 tests in 4.803s

FAILED (failures=1)

Running the CLI I clearly see the delete, but this does not happen every time.

> redis-cli MONITOR |grep -i haveit
1608328450.450124 [0 172.17.42.1:58372] "FT.DROP" "HaveIt-1608328450"
1608328450.450419 [0 172.17.42.1:58372] "HGETALL" "index:haveit"
1608328450.451187 [0 172.17.42.1:58376] "HSET" "index:haveit" "name" "haveit"
1608328450.451406 [0 172.17.42.1:58376] "FT.CREATE" "HaveIt-1608328450" "ON" "HASH" "PREFIX" "1" "index:" "SCORE_FIELD" "name" "SCORE" "1.0" "SCHEMA" "name" "TEXT" "WEIGHT" "1.0"
1608328450.451633 [0 172.17.42.1:58376] "FT.DROP" "HaveIt-1608328450"
1608328450.451815 [0 172.17.42.1:58376] "HGETALL" "index:haveit"
1608328450.452470 [0 172.17.42.1:58380] "HSET" "index:haveit" "name" "haveit"
1608328450.452672 [0 172.17.42.1:58380] "FT.CREATE" "HaveIt-1608328450" "ON" "HASH" "PREFIX" "1" "index:" "SCORE_FIELD" "name" "SCORE" "1.0" "SCHEMA" "name" "TEXT" "WEIGHT" "1.0"
1608328450.452938 [0 172.17.42.1:58380] "FT.DROP" "HaveIt-1608328450"
1608328450.453144 [0 172.17.42.1:58380] "HGETALL" "index:haveit"
1608328450.453807 [0 172.17.42.1:58384] "HSET" "index:haveit" "name" "haveit"
1608328450.454004 [0 172.17.42.1:58384] "FT.CREATE" "HaveIt-1608328450" "ON" "HASH" "PREFIX" "1" "index:" "SCORE_FIELD" "name" "SCORE" "1.0" "SCHEMA" "name" "TEXT" "WEIGHT" "1.0"
1608328450.454236 [0 172.17.42.1:58384] "FT.DROP" "HaveIt-1608328450"
1608328450.454413 [0 172.17.42.1:58384] "HGETALL" "index:haveit"
1608328450.455031 [0 172.17.42.1:58388] "HSET" "index:haveit" "name" "haveit"
1608328450.455209 [0 172.17.42.1:58388] "FT.CREATE" "HaveIt-1608328450" "ON" "HASH" "PREFIX" "1" "index:" "SCORE_FIELD" "name" "SCORE" "1.0" "SCHEMA" "name" "TEXT" "WEIGHT" "1.0"
1608328450.456367 [0 172.17.42.1:58388] "FT.DROP" "HaveIt-1608328450"
1608328450.456389 [0 ?:0] "DEL" "index:haveit"
1608328450.456602 [0 172.17.42.1:58388] "HGETALL" "index:haveit"

My docker output appears as follows:

 docker run -p 6379:6379 redislabs/redisearch:2.0.0
1:C 18 Dec 2020 21:44:57.557 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 18 Dec 2020 21:44:57.557 # Redis version=6.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 18 Dec 2020 21:44:57.557 # Configuration loaded
1:M 18 Dec 2020 21:44:57.558 * Running mode=standalone, port=6379.
1:M 18 Dec 2020 21:44:57.559 # Server initialized
1:M 18 Dec 2020 21:44:57.559 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 18 Dec 2020 21:44:57.559 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 18 Dec 2020 21:44:57.559 * <search> Redis version found by RedisSearch : 6.0.5 - oss
1:M 18 Dec 2020 21:44:57.559 * <search> RediSearch version 2.0.0 (Git=v1.99.5-17-gd421035)
1:M 18 Dec 2020 21:44:57.559 * <search> Low level api version 1 initialized successfully
1:M 18 Dec 2020 21:44:57.559 * <search> concurrent writes: OFF, gc: ON, prefix min length: 2, prefix max expansions: 200, query timeout (ms): 500, timeout policy: return, cursor read size: 1000, cursor max idle (ms): 300000, max doctable size: 1000000, max number of search results:  1000000, search pool size: 20, index pool size: 8, 
1:M 18 Dec 2020 21:44:57.559 * <search> Initialized thread pool!
1:M 18 Dec 2020 21:44:57.559 * Module 'search' loaded from /usr/lib/redis/modules/redisearch.so
1:M 18 Dec 2020 21:44:57.560 * Ready to accept connections
1:M 18 Dec 2020 21:45:13.463 * <module> Scanning index testAlias in background
1:M 18 Dec 2020 21:45:13.463 # <module> invalid score for key index1:lonestar
1:M 18 Dec 2020 21:45:13.464 * <module> Scanning indexes in background: done (scanned=2)
1:M 18 Dec 2020 21:45:13.464 * <module> Scanning index testAlias2 in background
1:M 18 Dec 2020 21:45:13.464 # <module> invalid score for key index2:yogurt
1:M 18 Dec 2020 21:45:13.464 * <module> Scanning indexes in background: done (scanned=2)
1:M 18 Dec 2020 21:45:14.296 * DB saved on disk
1:M 18 Dec 2020 21:45:14.306 * DB saved on disk
1:M 18 Dec 2020 21:45:14.307 * Loading RDB produced by version 6.0.5
1:M 18 Dec 2020 21:45:14.307 * RDB age 0 seconds
1:M 18 Dec 2020 21:45:14.307 * RDB memory usage when created 1.43 Mb
1:M 18 Dec 2020 21:45:14.311 * <module> Scanning all indexes
1:M 18 Dec 2020 21:45:14.311 # DB reloaded by DEBUG RELOAD
1:M 18 Dec 2020 21:45:14.311 * <module> Scanning indexes in background
1:M 18 Dec 2020 21:45:14.311 * <module> Scanning indexes in background: done (scanned=2)
1:M 18 Dec 2020 21:45:14.630 * DB saved on disk
1:M 18 Dec 2020 21:45:14.645 * DB saved on disk
1:M 18 Dec 2020 21:45:14.655 * Loading RDB produced by version 6.0.5
1:M 18 Dec 2020 21:45:14.655 * RDB age 0 seconds
1:M 18 Dec 2020 21:45:14.655 * RDB memory usage when created 5.15 Mb
1:M 18 Dec 2020 21:45:14.659 * <module> Scanning all indexes
1:M 18 Dec 2020 21:45:14.659 # DB reloaded by DEBUG RELOAD
1:M 18 Dec 2020 21:45:14.660 * <module> Scanning indexes in background
1:M 18 Dec 2020 21:45:14.748 * <module> Scanning indexes in background: done (scanned=500)
1:M 18 Dec 2020 21:45:15.078 * <module> Scanning index idx in background
1:M 18 Dec 2020 21:45:15.081 * <module> Scanning indexes in background: done (scanned=501)
1:M 18 Dec 2020 21:45:15.200 * DB saved on disk
1:M 18 Dec 2020 21:45:15.216 * DB saved on disk
1:M 18 Dec 2020 21:45:15.224 * Loading RDB produced by version 6.0.5
1:M 18 Dec 2020 21:45:15.224 * RDB age 0 seconds
1:M 18 Dec 2020 21:45:15.224 * RDB memory usage when created 5.36 Mb
1:M 18 Dec 2020 21:45:15.228 * <module> Scanning all indexes
1:M 18 Dec 2020 21:45:15.228 # DB reloaded by DEBUG RELOAD
1:M 18 Dec 2020 21:45:15.228 * <module> Scanning indexes in background
1:M 18 Dec 2020 21:45:15.316 * <module> Scanning indexes in background: done (scanned=501)
1:M 18 Dec 2020 21:45:15.334 * <module> Scanning index HaveIt-1608327915 in background
1:M 18 Dec 2020 21:45:15.334 # <module> invalid score for key index:haveit
1:M 18 Dec 2020 21:45:15.335 * <module> Scanning indexes in background: done (scanned=502)
1:M 18 Dec 2020 21:45:15.350 * <module> Scanning index idx in background
1:M 18 Dec 2020 21:45:15.350 * <module> Scanning indexes in background: done (scanned=1)
1:M 18 Dec 2020 21:45:15.365 * DB saved on disk
1:M 18 Dec 2020 21:45:15.370 * DB saved on disk
1:M 18 Dec 2020 21:45:15.370 * Loading RDB produced by version 6.0.5
1:M 18 Dec 2020 21:45:15.370 * RDB age 0 seconds
1:M 18 Dec 2020 21:45:15.370 * RDB memory usage when created 1.40 Mb
1:M 18 Dec 2020 21:45:15.370 * <module> Scanning all indexes
1:M 18 Dec 2020 21:45:15.370 # DB reloaded by DEBUG RELOAD
1:M 18 Dec 2020 21:45:15.370 * <module> Scanning indexes in background
1:M 18 Dec 2020 21:45:15.370 * <module> Scanning indexes in background: done (scanned=2)
1:M 18 Dec 2020 21:45:15.669 * <module> Scanning index idx in background
1:M 18 Dec 2020 21:45:15.676 * <module> Scanning indexes in background: done (scanned=500)
1:M 18 Dec 2020 21:45:15.800 * DB saved on disk
1:M 18 Dec 2020 21:45:15.820 * DB saved on disk
1:M 18 Dec 2020 21:45:15.821 * Loading RDB produced by version 6.0.5
1:M 18 Dec 2020 21:45:15.821 * RDB age 0 seconds
1:M 18 Dec 2020 21:45:15.821 * RDB memory usage when created 2.23 Mb
1:M 18 Dec 2020 21:45:15.824 * <module> Scanning all indexes
1:M 18 Dec 2020 21:45:15.824 # DB reloaded by DEBUG RELOAD
1:M 18 Dec 2020 21:45:15.824 * <module> Scanning indexes in background
1:M 18 Dec 2020 21:45:15.834 * <module> Scanning indexes in background: done (scanned=502)
1:M 18 Dec 2020 21:45:15.948 * DB saved on disk
1:M 18 Dec 2020 21:45:15.953 * DB saved on disk
1:M 18 Dec 2020 21:45:15.953 * Loading RDB produced by version 6.0.5
1:M 18 Dec 2020 21:45:15.953 * RDB age 0 seconds
1:M 18 Dec 2020 21:45:15.954 * RDB memory usage when created 1.58 Mb
1:M 18 Dec 2020 21:45:15.954 * <module> Scanning all indexes
1:M 18 Dec 2020 21:45:15.954 # DB reloaded by DEBUG RELOAD
1:M 18 Dec 2020 21:45:15.954 * <module> Scanning indexes in background
1:M 18 Dec 2020 21:45:15.954 * <module> Scanning indexes in background: done (scanned=2)
1:M 18 Dec 2020 21:45:15.974 * DB saved on disk
1:M 18 Dec 2020 21:45:15.978 * DB saved on disk
1:M 18 Dec 2020 21:45:15.979 * Loading RDB produced by version 6.0.5
1:M 18 Dec 2020 21:45:15.979 * RDB age 0 seconds
1:M 18 Dec 2020 21:45:15.979 * RDB memory usage when created 1.42 Mb
1:M 18 Dec 2020 21:45:15.979 * <module> Scanning all indexes
1:M 18 Dec 2020 21:45:15.979 # DB reloaded by DEBUG RELOAD
1:M 18 Dec 2020 21:45:15.979 * <module> Scanning indexes in background
1:M 18 Dec 2020 21:45:15.979 * <module> Scanning indexes in background: done (scanned=2)
1:M 18 Dec 2020 21:45:16.037 * DB saved on disk
1:M 18 Dec 2020 21:45:16.043 * DB saved on disk
1:M 18 Dec 2020 21:45:16.045 * Loading RDB produced by version 6.0.5
1:M 18 Dec 2020 21:45:16.045 * RDB age 0 seconds
1:M 18 Dec 2020 21:45:16.045 * RDB memory usage when created 2.22 Mb
1:M 18 Dec 2020 21:45:16.045 * <module> Scanning all indexes
1:M 18 Dec 2020 21:45:16.045 # DB reloaded by DEBUG RELOAD
1:M 18 Dec 2020 21:45:16.045 * <module> Scanning indexes in background
1:M 18 Dec 2020 21:45:16.058 * <module> Scanning indexes in background: done (scanned=100)
1:M 18 Dec 2020 21:45:16.166 * DB saved on disk
1:M 18 Dec 2020 21:45:16.170 * DB saved on disk
1:M 18 Dec 2020 21:45:16.170 * Loading RDB produced by version 6.0.5
1:M 18 Dec 2020 21:45:16.170 * RDB age 0 seconds
1:M 18 Dec 2020 21:45:16.170 * RDB memory usage when created 1.51 Mb
1:M 18 Dec 2020 21:45:16.171 * <module> Scanning all indexes
1:M 18 Dec 2020 21:45:16.171 # DB reloaded by DEBUG RELOAD
1:M 18 Dec 2020 21:45:16.171 * <module> Scanning indexes in background
1:M 18 Dec 2020 21:45:16.171 * <module> Scanning indexes in background: done (scanned=2)
^C1:signal-handler (1608327919) Received SIGINT scheduling shutdown...
1:M 18 Dec 2020 21:45:19.281 # User requested shutdown...
1:M 18 Dec 2020 21:45:19.281 # Redis is now ready to exit, bye bye...

@maguec Thank you for a detailed issue.

By default, FT.DROP which is used by redisearch-py, will delete all documents unless KEEPDOCS is given and therefore the documents should be dropped.

The issue here seems to be that you are dropping the index before the hashes were indexed into it and therefore you don't see the DEL command in the monitor. In our tests, we use waitForIndex to ensure indexing has completed before issuing additional commands. You can view the tests in PR #108.

I have taken the opportunity to add a keep_documents option to drop_index with a default of False in order to maintain the current behavior.

@K-Jo @emmanuelkeller we have deprecated FT.DROP for FT.DROPINDEX where the default is not deleting docs by defaults (unless DD is provided). What would you like the behavior to be here?

@ashtul If I understand well, when we call FT.DROPINDEX with the DD option, it only deletes the hashes that are in the index. The hashes that were not indexed before the call to FT.DROPINDEX are not deleted.

So, it does not look like a bug, but it requires to be documented.

If so, then we should explain this in the documentation of FT.DROPINDEX

@ashtul @emmanuelkeller Can't we show some kind of warning to user when they try to delete document before indexing if waitForIndex is not provided.
Though mentioning this thing in Doc is necessary, but not all users read entire doc line by line.

@EduardoSaverin
Unfortunately, this isn't an option since the delete command is a Redis command, and we get a notification. However, we could add a log. @emmanuelkeller , what do you think?