scoder / acora

Fast multi-keyword search engine for text strings

Home Page: http://pypi.python.org/pypi/acora

Segmentation fault

andresriancho opened this issue

We're running into a segmentation fault when using acora in w3af:

kernel: [12705926.991919] python[21044]: segfault at 10 ip 00007f78305d7231 sp 00007f780ffb90a0 error 4 in _cacora.so[7f78305d1000+23000]

We replaced esm with acora a few days ago and the library was working fine until we used w3af to scan a specific target.

What information do you need in order to debug and fix this issue? I'm collecting kernel version, python version, and trying to come up with a minimalist PoC (1 file, ~30 lines of code) that will trigger the segmentation fault. Anything else?

Python 2.7.9, GCC 4.9.2, kernel 3.16.43-2+deb8u2

Still trying to figure this out. The library is segfaulting on @qnrq's workstation but not on mine.

It seems the code is crashing in _search_in_bytes (https://github.com/scoder/acora/blob/master/acora/_cacora.pyx#L654) when it is called from _BytesAcoraIter here: https://github.com/scoder/acora/blob/master/acora/_cacora.pyx#L642

Upgrading Cython 0.21.1 to 0.27.3 did not make any difference.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff894b5700 (LWP 20948)]
__pyx_f_5acora_7_cacora__search_in_bytes (__pyx_v_start_node=0x7fffb0003270, __pyx_v_data_end=__pyx_v_data_end@entry=0x7fffc00eb2f2 "", __pyx_v__data_char=__pyx_v__data_char@entry=0x7fff988efae0, 
    __pyx_v__current_node=__pyx_v__current_node@entry=0x7fff988efab8) at acora/_cacora.c:10034
10034   acora/_cacora.c: No such file or directory.
(gdb) bt
#0  __pyx_f_5acora_7_cacora__search_in_bytes (__pyx_v_start_node=0x7fffb0003270, __pyx_v_data_end=__pyx_v_data_end@entry=0x7fffc00eb2f2 "", __pyx_v__data_char=__pyx_v__data_char@entry=0x7fff988efae0, 
    __pyx_v__current_node=__pyx_v__current_node@entry=0x7fff988efab8) at acora/_cacora.c:10034
#1  0x00007fffac3bc3cc in __pyx_pf_5acora_7_cacora_15_BytesAcoraIter_4__next__ (__pyx_v_self=0x7fff988efaa0) at acora/_cacora.c:9773
#2  __pyx_pw_5acora_7_cacora_15_BytesAcoraIter_5__next__ (__pyx_v_self=<acora._cacora._BytesAcoraIter at remote 0x7fff988efaa0>) at acora/_cacora.c:9641
#3  0x00000000004c95a3 in PyEval_EvalFrameEx () at ../Python/ceval.c:2510

@qnrq could you share the core dump?

@scoder while we wait for the core dump from @qnrq I was wondering... do you have time this / next week to look into this issue? We're trying to use acora and this is a huge blocker.

I believe the bug is triggered when using an empty search engine (i.e., one without any keywords). I could be wrong, but I don't have the original test script to validate this assumption. The information in the core file seems to suggest this situation.

An empty search engine has only one node (the start_node) without any characters (char_count == 0).

The problem is in:

return current_node.targets[end-1] if current_char == test_chars[end-1] else start_node

When _search_in_bytes() is called, it in turn calls _step_to_next_node(). Line 693 assumes that current_node.char_count is always greater than 0. With an empty search engine, however, start_node == current_node, current_node.char_count == 0, and test_chars[0] == '\0'. That makes the condition on line 697 true and triggers undefined behavior on line 698: because end-1 == -1, reading current_node.targets[end-1] is UB, and it returns a pointer to an invalid location (0x24 in this case).
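To make the off-by-one concrete, here is a minimal pure-Python model of the return expression quoted above. It is only a sketch: the Node class, the checked() helper, and step_last_slot() are hypothetical names of mine, not code from _cacora.pyx, and checked() raises IndexError where the C code would silently read out of bounds.

```python
class Node(object):
    """Hypothetical stand-in for a trie node: characters plus parallel targets."""

    def __init__(self, characters=(), targets=()):
        self.characters = list(characters)
        self.targets = list(targets)
        self.char_count = len(self.characters)


def checked(seq, index):
    """Index the way C would, but raise instead of reading out of bounds."""
    if not 0 <= index < len(seq):
        raise IndexError("out-of-bounds read at index %d" % index)
    return seq[index]


def step_last_slot(start_node, current_node, current_char):
    """Model of the quoted return line; the real function does more than this."""
    end = current_node.char_count
    test_chars = current_node.characters
    # With char_count == 0, end - 1 is -1 and both reads below are invalid.
    if current_char == checked(test_chars, end - 1):
        return checked(current_node.targets, end - 1)
    return start_node


if __name__ == "__main__":
    empty = Node()  # what an engine without keywords is assumed to look like
    try:
        step_last_slot(empty, empty, "a")
    except IndexError as exc:
        print("empty engine: %s" % exc)
```

With a non-empty node the same function behaves as expected: it returns the matching target, or falls back to start_node when the character does not match.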

I'm not completely sure of the right way to detect an empty search engine, but I assume that a start_node with char_count == 0 is one way to identify it.

PR #19 contains a fix for this bug. Basically, it checks whether the engine is empty in the __next__() methods of _UnicodeAcoraIter, _BytesAcoraIter, and _FileAcoraIter and raises StopIteration if that's the case.
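The shape of that guard can be sketched in plain Python. This is not the code from the PR: SketchIter is a hypothetical class, and a naive startswith() scan stands in for the compiled automaton. It only shows where the early exit sits in __next__().

```python
class SketchIter(object):
    """Hypothetical iterator; `keywords` stands in for the compiled engine."""

    def __init__(self, keywords, data):
        self.keywords = list(keywords)
        self.data = data
        self.pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Guard: an engine without keywords can never match, so stop
        # before any per-character stepping happens.
        if not self.keywords:
            raise StopIteration
        while self.pos < len(self.data):
            pos = self.pos
            self.pos += 1
            # Naive scan; reports at most one keyword per position.
            for keyword in self.keywords:
                if self.data.startswith(keyword, pos):
                    return keyword, pos
        raise StopIteration

    next = __next__  # Python 2 spelling of the iterator protocol


if __name__ == "__main__":
    print(list(SketchIter([], "abcb")))     # no keywords: nothing to report
    print(list(SketchIter(["b"], "abcb")))  # matches at positions 1 and 3
```

In this model the guard is just an early exit. In the Cython iterators, as described above, it is what keeps the empty start node from ever being stepped through.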

This script used to trigger the bug:

import random

# Create some dummy binary data
data = ''.join(chr(random.randint(0, 255)) for _ in xrange(100000))

import acora
builder = acora.AcoraBuilder()   # No keywords!
ac = builder.build()
r = ac.findall(data)  # <<< Segmentation fault

print 'Result', r

but now it runs fine.

@gosella thanks for your help!

First of all, the segmentation fault can be reproduced on my laptop with the code you sent, which is great since I was unable to reproduce it myself.

This means that you found and fixed "a segmentation fault"; hopefully it is the same one that was affecting the w3af scans run by @qnrq. @qnrq, can you confirm whether this fixes our issues with w3af?

Also, I confirm that after installing the patched version from https://github.com/gosella/acora/tree/handle-empty-tries the segmentation fault is no longer reproducible with the code sent by @gosella.

Thanks for the investigation. I just uploaded 2.1 with a fix.