WB3d: Keep horizontal whitespace together.
scottmcm opened this issue
From the examples in https://docs.rs/unic-segment/0.9.0/unic_segment/ it appears that this crate (like unicode-segmentation) treats the position between two adjacent spaces as a word boundary:
assert_eq!(
WordBounds::new("The quick (\"brown\")  fox").collect::<Vec<&str>>(),
&["The", " ", "quick", " ", "(", "\"", "brown", "\"", ")", " ", " ", "fox"]
);
However, WB3d says "Keep horizontal whitespace together.", with the rule "WSegSpace × WSegSpace". The test file https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/WordBreakTest.txt confirms that there should be no break between consecutive spaces:
÷ 0020 × 0020 ÷ # ÷ [0.2] SPACE (WSegSpace) × [3.4] SPACE (WSegSpace) ÷ [0.3]
Is this a bug, or am I misunderstanding something?
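To make the expected behavior concrete, here is a rough sketch of what WB3d would do to the output above. It uses only the standard library and post-processes the segments instead of touching the crate's internals. Both `is_wseg_space` and `merge_wseg_space` are hypothetical names for illustration, not part of unic-segment, and the code point list is my reading of the WSegSpace property as of Unicode 11.

```rust
/// Returns true for characters with Word_Break=WSegSpace.
/// Assumption: the set is Zs minus the no-break spaces
/// (U+00A0, U+2007, U+202F), as listed for Unicode 11.
fn is_wseg_space(c: char) -> bool {
    matches!(
        c,
        '\u{0020}'
            | '\u{1680}'
            | '\u{2000}'..='\u{2006}'
            | '\u{2008}'..='\u{200A}'
            | '\u{205F}'
            | '\u{3000}'
    )
}

/// Hypothetical helper: joins neighbouring segments wherever the last
/// character of one and the first character of the next are both
/// WSegSpace, i.e. wherever WB3d ("WSegSpace × WSegSpace") forbids a break.
fn merge_wseg_space(segments: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for seg in segments {
        // Look at the characters on either side of the candidate boundary.
        let prev_last = out.last().and_then(|p| p.chars().last());
        let next_first = seg.chars().next();
        let joins_previous = match (prev_last, next_first) {
            (Some(a), Some(b)) => is_wseg_space(a) && is_wseg_space(b),
            _ => false,
        };
        if joins_previous {
            // No break allowed here: append to the previous segment.
            out.last_mut().unwrap().push_str(seg);
        } else {
            out.push(seg.to_string());
        }
    }
    out
}
```

Feeding it the twelve segments from the doc example would give eleven, with the last two `" "` entries joined into a single `"  "` before `"fox"`. A space followed by a combining mark and then another space stays split after the mark, which matches the `0020 × 0308 ÷ 0020` line in the test file as far as I can tell.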
Ah this is a dup of #259 because the rule showed up in the re-issue of UAX#29 for Unicode 11 (http://www.unicode.org/reports/tr29/tr29-33.html#Modifications).