Large regression in word boundary regexes
rctcwyvrn opened this issue
I was running the benchmarker on the regex `#"<(\w*)\b[^>]*>(.*?)<\/\1>"#`, which uses `\b` to match the end of the tag name in an HTML tag, and noticed it was running really slowly.
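For reference, the pattern can be run through Swift's run-time `Regex` API roughly as follows. This is a sketch assuming Swift 5.7+; the input string and the names `html`, `tagRegex`, and `elements` are made up for illustration and are not part of the benchmarker.

```swift
// Sketch only; requires Swift 5.7+ (macOS 13+ on Apple platforms).
let pattern = #"<(\w*)\b[^>]*>(.*?)<\/\1>"#

// Made-up input: two well-formed elements and one mismatched pair.
let html = #"<p class="intro">Hello</p> <b>bold</b> <i>x</b>"#

// The pattern is a constant known to parse, so try! is acceptable here.
let tagRegex = try! Regex(pattern)

// Capture 1 is the tag name; capture 2 is the lazily matched body.
var elements: [(tag: String, body: String)] = []
for match in html.matches(of: tagRegex) {
    let tag = match.output[1].substring.map { String($0) } ?? ""
    let body = match.output[2].substring.map { String($0) } ?? ""
    elements.append((tag: tag, body: body))
}

for element in elements {
    print("<\(element.tag)>: \(element.body)")
}
```

The mismatched `<i>x</b>` produces no match because the backreference `\1` requires the closing tag to repeat the captured tag name. Every attempted match also evaluates the `\b` assertion at least once, so the cost of that check is paid across the whole input.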
Benchmark results for `htmlAll`:
- Baseline: 11.8ms
- `main`: 3.08s
Some regression was expected from the new word-breaking algorithm, but a roughly 260x slowdown (11.8ms to 3.08s) seems unacceptable. A quick profile shows that ~99% of the time is spent in `AssertFunction`, with 90% of that in `String._wordIndex(after:)` and 10% in `Set.insert`.
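Since the time goes into the Unicode word-break check behind `\b`, one possible workaround for patterns that only need ASCII-style boundaries is to opt into simple word boundaries. This sketch assumes the `wordBoundaryKind(_:)` option that shipped with `Regex` in Swift 5.7, which may postdate this issue; the repeated input string is synthetic.

```swift
// Sketch only; requires Swift 5.7+ (macOS 13+ on Apple platforms).
let pattern = #"<(\w*)\b[^>]*>(.*?)<\/\1>"#

// Synthetic input: 1,000 copies of a small ASCII-only element.
let html = String(repeating: #"<span id="x">text</span> "#, count: 1_000)

// Default: \b follows the Unicode word-break rules (UAX #29).
let unicodeBoundaries = try! Regex(pattern)

// Opt-in: \b only checks for a \w / \W transition, as in most other engines.
let simpleBoundaries = try! Regex(pattern).wordBoundaryKind(.simple)

let defaultCount = html.matches(of: unicodeBoundaries).count
let simpleCount = html.matches(of: simpleBoundaries).count
print(defaultCount, simpleCount)
```

On this ASCII-only input both regexes should find the same matches. The trade-off is that `.simple` ignores the Unicode word-break rules, so results can differ on text such as contractions like "can't", where UAX #29 places no boundary around the apostrophe.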
@Azoy, is this because the SPI is inefficient? Any thoughts on what to do here?
Yeah, the current implementation of `String.isOnWordBoundary` in this repo is really inefficient, and I was fully expecting performance to be pretty bad. Once `_nearestWordIndex(atOrBelow:)` is fixed, I think this operation will get considerably faster.
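If a boundary check walks word indices forward from the start of the string on every call (one plausible reading of a profile dominated by `_wordIndex(after:)`), checking every position costs roughly quadratic time, whereas a "nearest boundary at or below" query only needs to look back to the start of the current word. The toy model below illustrates that difference. It is entirely hypothetical: `ToyText`, `isOnBoundaryNaive`, and `isOnBoundaryLocal` are made-up names, the word rule is a crude ASCII stand-in for UAX #29, and it is not the code in this repo.

```swift
// Toy model, NOT the real implementation: "words" are maximal runs of
// ASCII letters/digits/underscore, and every other character is its own
// segment. Real \b uses the Unicode word-break rules (UAX #29).
func isWordChar(_ c: Character) -> Bool {
    c == "_" || (c.isASCII && (c.isLetter || c.isNumber))
}

struct ToyText {
    let chars: [Character]
    var steps = 0  // characters examined, as a rough cost measure

    // Next segment boundary strictly after boundary offset i (i < chars.count).
    mutating func wordIndex(after i: Int) -> Int {
        var j = i + 1
        steps += 1
        guard isWordChar(chars[i]) else { return j }
        while j < chars.count && isWordChar(chars[j]) {
            j += 1
            steps += 1
        }
        return j
    }

    // Naive check: walk boundaries forward from the start on every call,
    // so the cost grows with i.
    mutating func isOnBoundaryNaive(_ i: Int) -> Bool {
        var b = 0
        while b < i { b = wordIndex(after: b) }
        return b == i
    }

    // Nearest boundary at or below i: step backwards only while inside a
    // run of word characters, so the cost is bounded by the word length.
    mutating func nearestWordIndex(atOrBelow i: Int) -> Int {
        var j = i
        while j > 0 && j < chars.count && isWordChar(chars[j - 1]) && isWordChar(chars[j]) {
            j -= 1
            steps += 1
        }
        return j
    }

    mutating func isOnBoundaryLocal(_ i: Int) -> Bool {
        nearestWordIndex(atOrBelow: i) == i
    }
}

// Usage: check every offset of a synthetic HTML-ish string both ways.
let sample = String(repeating: "<span class=\"x\">hello world</span> ", count: 50)
var naive = ToyText(chars: Array(sample))
var local = ToyText(chars: Array(sample))
var agree = true
for i in 0...naive.chars.count {
    if naive.isOnBoundaryNaive(i) != local.isOnBoundaryLocal(i) { agree = false }
}
print(agree, naive.steps, local.steps)
```

Both checks agree on every offset, but the naive one examines on the order of n²/2 characters across all positions, while the local one stays proportional to n times the longest word length.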