Add search/split iterators for Python
ashvardanian opened this issue
In C++ we have special smart iterators for bulk search and split operations. They lazily report the matches, avoiding heap allocations for the array of match offsets.
For that, an arbitrary matcher (string / character / character set; in normal / reverse order) is combined with search / split ranges. Similar functionality should be added in Python, where we currently materialize the matches into a "compressed" Strs object.
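To illustrate the lazy behaviour being requested, here is a minimal sketch in plain Python. It is not the StringZilla API: `lazy_split` is a hypothetical name, and the built-in `str.find` stands in for the library's matchers. Each piece is produced only when the caller asks for it, so no array of match offsets is ever built.

```python
from typing import Iterator


def lazy_split(text: str, separator: str) -> Iterator[str]:
    """Yield the pieces of `text` between occurrences of `separator`, one at a time.

    Hypothetical helper for illustration; `str.find` replaces the real matcher.
    """
    if not separator:
        raise ValueError("empty separator")
    start = 0
    while True:
        # str.find returns -1 once there are no further matches.
        end = text.find(separator, start)
        if end == -1:
            yield text[start:]  # trailing piece after the last separator
            return
        yield text[start:end]
        start = end + len(separator)


parts = lazy_split("a,b,,c", ",")
first = next(parts)  # only the first separator has been searched for so far
rest = list(parts)   # the remaining pieces are produced on demand
```

A reverse-order variant could be sketched the same way with `str.rfind`, walking from the end of the text toward the start.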
I'm very interested in contributing to this project as my first step into open-source. I believe I could start by addressing this issue. To clarify, are we aiming to replace the Strs type with something like StrIterator that yields strings lazily? As a first step, should I focus on modifying the split function to eliminate the use of realloc and ensure it returns an iterator instead? Any guidance on this would be greatly appreciated.
Hi @ghazariann! I don't think we should replace Strs. We should keep both. The split should provide an iterator which, if materialized, is converted to Strs. How does that sound?
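One way to picture "keep both" is an iterator object that yields pieces lazily and can also be collected into a container on request. The sketch below is only an assumption about the shape of such a design: `SplitIterator` and `materialize` are hypothetical names, and a plain Python list stands in for Strs.

```python
from typing import List


class SplitIterator:
    """Hypothetical lazy split iterator; not the StringZilla API."""

    def __init__(self, text: str, separator: str) -> None:
        if not separator:
            raise ValueError("empty separator")
        self._text = text
        self._separator = separator
        self._start = 0
        self._done = False

    def __iter__(self) -> "SplitIterator":
        return self

    def __next__(self) -> str:
        if self._done:
            raise StopIteration
        end = self._text.find(self._separator, self._start)
        if end == -1:
            # No more separators: emit the trailing piece, then stop.
            self._done = True
            return self._text[self._start:]
        piece = self._text[self._start:end]
        self._start = end + len(self._separator)
        return piece

    def materialize(self) -> List[str]:
        """Collect the remaining pieces; a list stands in for Strs here."""
        return list(self)


it = SplitIterator("k1=v1;k2=v2;k3=v3", ";")
head = next(it)          # consume one piece lazily
tail = it.materialize()  # collect whatever has not been consumed yet
```

Callers that only need the first few pieces never pay for the full container, while callers that want everything still get it from a single call.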
Added in 3b6cddd