swiftlang / swift-experimental-string-processing

An early experimental general-purpose pattern matching engine for Swift.

Regex does not match isolated combining mark as whitespace if preceded by whitespace

digitalheir opened this issue

Description

I believe regexes should operate on Unicode scalars, not on Swift Characters. Here is a failure case: <space> + <combining mark> (such as " ̃", U+0020 U+0303) is treated as a single whitespace character, whereas every other programming language I know of treats it as a whitespace character followed by a separate nonspacing combining character.
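Both views of such a string can be inspected with standard library APIs alone. The sketch below is illustrative (the variable names are not from the report): String.count and Character work at the grapheme-cluster level, where the space and the tilde form one Character that reports isWhitespace as true, while unicodeScalars and Unicode.Scalar.Properties still expose U+0303 as a separate code point without the White_Space property.

```swift
// A space (U+0020) followed by a combining tilde (U+0303).
let spaceWithTilde = " \u{0303}"

// Grapheme-cluster view: the combining mark extends the space into one Character.
let characterCount = spaceWithTilde.count                    // 1
let firstIsWhitespace = spaceWithTilde.first!.isWhitespace   // true

// Unicode-scalar view: the two code points remain distinct.
let scalars = Array(spaceWithTilde.unicodeScalars)           // [U+0020, U+0303]
let tilde = scalars[1]
let tildeIsWhitespace = tilde.properties.isWhitespace        // false: no White_Space property
let tildeIsNonspacingMark = tilde.properties.generalCategory == .nonspacingMark  // true: Mn

print(characterCount, scalars.count, firstIsWhitespace, tildeIsWhitespace, tildeIsNonspacingMark)
// prints "1 2 true false true"
```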

Reproduction

let aTilde = "\u{0061}\u{0303}" // "ã" written as a + combining tilde
let aMatch = try! /\S/.firstMatch(in: aTilde)
print(aMatch?.output ?? "nil") // "ã" hm... I would have expected only the scalar 'a' to match
let combiningTilde = "\u{0303}" // lone combining tilde
let tildeMatch = try! /\S/.firstMatch(in: combiningTilde)
print(tildeMatch?.output ?? "nil") // "̃" correct to me
let spaceWithTilde = " \u{0303}" // space + combining tilde
let spaceTildeMatch = try! /\S/.firstMatch(in: spaceWithTilde)
print(spaceTildeMatch?.output ?? "nil") // nil, but I would expect \u{0303} to match

Expected behavior

The tilde scalar (U+0303) was expected to match \S: according to the Unicode specification it is not a whitespace code point (it lacks the White_Space property) but a nonspacing mark (general category Mn).
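For comparison, Swift's Regex already offers the scalar-level behavior described here as an opt-in: matchingSemantics(.unicodeScalar) replaces the default grapheme-cluster semantics. The following is a minimal sketch assuming Swift 5.7 or later; firstNonWhitespace is a hypothetical helper written only for this illustration. It uses the #/.../# literal form because, in Swift 5 language mode, the bare /.../ form additionally requires the BareSlashRegexLiterals feature to be enabled.

```swift
// Hypothetical helper: returns the text of the first \S match in `input`,
// or nil, evaluated at the given semantic level.
func firstNonWhitespace(in input: String, level: RegexSemanticLevel) -> String? {
    let base = #/\S/#  // same pattern as /\S/ in the reproduction
    let regex = base.matchingSemantics(level)
    guard let match = try? regex.firstMatch(in: input) else { return nil }
    return String(match.output)
}

let aTilde = "\u{0061}\u{0303}"  // "ã" as a + combining tilde
let spaceWithTilde = " \u{0303}"  // space + combining tilde

// Default grapheme-cluster semantics: each Character is matched as a unit.
let aTildeGrapheme = firstNonWhitespace(in: aTilde, level: .graphemeCluster)         // "a\u{0303}"
let spaceGrapheme = firstNonWhitespace(in: spaceWithTilde, level: .graphemeCluster)  // nil, as reported

// Unicode-scalar semantics: \S tests one scalar at a time.
let aTildeScalar = firstNonWhitespace(in: aTilde, level: .unicodeScalar)             // "a"
let spaceScalar = firstNonWhitespace(in: spaceWithTilde, level: .unicodeScalar)      // "\u{0303}"
```

With scalar semantics, both expectations from the reproduction hold: only "a" matches in the first string, and the isolated tilde matches in the last one.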

Environment

Swift 5.9