Try out new regex APIs
divergentdave opened this issue · comments
New versions of regex add APIs that allow cutting down on allocations when matching and returning captures. See `Regex::captures_read` and `CaptureLocations`. Allocating a `CaptureLocations` object once, and using `captures_read` with it several times, will save allocations over merely calling `captures` repeatedly. Using these APIs would require giving up on accessing captures by the names assigned to their capturing groups, and would require slicing the input text with the returned locations to get the actual captured string.
This sounds worth investigating, as it would remove some allocations from the tightest loop. Allocating `CaptureLocations` once per `carve_stream` invocation would be a win, but allocating it only once (per thread) ahead of time would be a bigger win. (This could use `thread_local!` when parallelization is introduced.)
`thread_local!` may be too heavyweight. Consider making a new struct per worker thread, moving the carving functions into its impl, and storing the `CaptureLocations` object inside the new struct.