divergentdave / certificate_carver

Digging through the attic of the Web PKI

Try out new regex APIs

divergentdave opened this issue

Newer versions of the regex crate add APIs that cut down on allocations when matching and returning captures. See Regex.captures_read and CaptureLocations. Allocating a CaptureLocations object once and reusing it across several calls to captures_read saves allocations compared with calling captures repeatedly. The trade-off is that captures can no longer be accessed by the names of their capturing groups, only by index, and the captured strings must be obtained by slicing the input text with the returned locations.

This seems worth investigating, since it would remove some allocations from the tightest loop. Allocating a CaptureLocations once per carve_stream invocation would be a win, but allocating it only once per thread, ahead of time, would be a bigger one. (This could use thread_local! once parallelization is introduced.)

thread_local! may be too heavyweight. Instead, consider creating a new struct per worker thread, moving the carving functions into its impl, and storing the CaptureLocations object inside that struct.