apple / swift-experimental-string-processing

An early experimental general-purpose pattern matching engine for Swift.

Large regression in word boundary regexes

rctcwyvrn opened this issue

I was running the benchmarker for this regex, #"<(\w*)\b[^>]*>(.*?)<\/\1>"#, which uses \b to match the end of an HTML tag, and noticed it was running really slowly.

ed842cb

Running
- htmlAll 11.8ms

main

Running
- htmlAll 3.08s
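
For anyone who wants to try the pattern outside the benchmarker, here is a minimal timing sketch. It assumes the Regex API as shipped in Swift 5.7 (macOS 13+ on Apple platforms). The input string and repeat count are made up for illustration; the actual htmlAll corpus is not shown in this issue, so timings will not be comparable to the numbers above.

```swift
// Minimal sketch: time the regex from this issue on a synthetic input.
// Assumes Swift 5.7+; `html` is a hypothetical stand-in for the benchmark corpus.
let tagPattern = try Regex(#"<(\w*)\b[^>]*>(.*?)<\/\1>"#)

// Each repetition contains two well-formed elements: <p ...>...</p> and <b>...</b>.
let html = String(repeating: "<p class=\"x\">hello</p><b>bold</b>", count: 100)

let clock = ContinuousClock()
var matchCount = 0
let elapsed = clock.measure {
    // Find every non-overlapping match; \b is evaluated once per candidate tag name.
    matchCount = html.matches(of: tagPattern).count
}
print("matches: \(matchCount), elapsed: \(elapsed)")
```

Running this at the two commits (or with and without \b in the pattern) should give a rough idea of how much of the time is attributable to the word boundary assertion.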

Some regression was expected with the implementation of the new word breaking algorithm, but a roughly 260x slowdown (11.8ms to 3.08s) seems unacceptable. A quick profile shows that ~99% of the time is spent in AssertFunction, with 90% of that being String._wordIndex(after:) and 10% being Set.insert.

cc @Azoy @milseman

@Azoy is this because the SPI is inefficient, or any thoughts on what to do here?

Yeah, the current implementation of String.isOnWordBoundary in this repo is really inefficient, and I was fully expecting perf to be pretty bad. Once _nearestWordIndex(atOrBelow:) is fixed, I think this operation will get considerably faster.
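
As a point of comparison while that fix is pending, the pattern can be compiled with both word boundary modes. This is only a sketch, assuming the wordBoundaryKind(_:) method on Regex as shipped in Swift 5.7. Whether the simple mode avoids the cost described in this issue is an assumption that would need to be measured, and the two modes can disagree on non-ASCII text.

```swift
// Sketch: same pattern under the default (Unicode word breaking) and
// simple word boundary modes. Assumes Swift 5.7+.
let pattern = #"<(\w*)\b[^>]*>(.*?)<\/\1>"#

let defaultRegex = try Regex(pattern)                           // default \b semantics
let simpleRegex = try Regex(pattern).wordBoundaryKind(.simple)  // simple \b semantics

// Hypothetical sample input, not taken from the benchmark corpus.
let html = "<p class=\"x\">hello</p>"

let defaultCount = html.matches(of: defaultRegex).count
let simpleCount = html.matches(of: simpleRegex).count
print("default: \(defaultCount), simple: \(simpleCount)")
```

For ASCII tag names like the one above, both modes should find the same match, so timing the two variants side by side is a cheap way to see how much the boundary algorithm itself contributes.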