composewell / unicode-transforms

Fast Unicode normalization in Haskell

Hackage revision for old versions

parsonsmatt opened this issue · comments

I just debugged an entirely nasty problem and eventually traced it to unicode-transforms-0.1.0.1. The cabal solver had selected it because it was the only version that worked with base-4.17 and text > 2.

Would y'all mind putting an upper bound on text for this version: https://hackage.haskell.org/package/unicode-transforms-0.1.0.1 ?

@parsonsmatt What's the problem with this combination of packages? I'm getting a clean build with u-t-0.1.0.1, GHC 9.4.1 and text-2.0.1.

I'd get non-deterministic TransformErrors that go away in the newer versions. I haven't diagnosed it exactly, but I'd guess it's a bug in how the C code works, given text's change from UTF-16 to UTF-8.

Yup - it's using UTF16Mode.
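To make the encoding mismatch concrete, here is a minimal sketch (not taken from the package) that prints the code units of U+1E0A, the character from the failing test case, in both encodings. text-2.0 switched its internal representation from UTF-16 to UTF-8, so code that walks the buffer expecting 16-bit units would see different data. The helper names `utf16Bytes` and `utf8Bytes` are made up for illustration, and this only shows the two encodings; it does not reproduce the bug itself.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- Hypothetical helper: the bytes of a Text encoded as UTF-16 (little endian).
utf16Bytes :: T.Text -> [Word8]
utf16Bytes = B.unpack . TE.encodeUtf16LE

-- Hypothetical helper: the bytes of a Text encoded as UTF-8.
utf8Bytes :: T.Text -> [Word8]
utf8Bytes = B.unpack . TE.encodeUtf8

main :: IO ()
main = do
  -- LATIN CAPITAL LETTER D WITH DOT ABOVE, the input of the failing test
  let dDotAbove = T.singleton '\x1E0A'
  print (utf16Bytes dDotAbove)  -- [10,30]        one 16-bit unit, 0x1E0A
  print (utf8Bytes dDotAbove)   -- [225,184,138]  three bytes, E1 B8 8A
```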

Using text >= 2 also breaks the testsuite:

$ cabal test --constraint 'text >= 2'
...
Test suite test: RUNNING...
@Part0 # Specific cases
toNFD 1E0A = 0005 0011 006D 0011 00BC; Expected: 0044 0307
toNFD 1E0A = 0005 0011 006D 0011 00BC; Expected: 0044 0307
toNFD 0044 0307 = 000E 0011 0062 0011; Expected: 0044 0307
toNFD 1E0A = 0005 0011 006D 0011 00BC; Expected: 0044 0307
toNFD 0044 0307 = 000E 0011 0062 0011; Expected: 0044 0307
Failed at line: 40
1E0A;1E0A;0044 0307;1E0A;0044 0307; # (Ḋ; Ḋ; D◌̇; Ḋ; D◌̇; ) LATIN CAPITAL LETTER D WITH DOT ABOVE
1E0A;1E0A;0044 0307;1E0A;0044 0307; # (Ḋ; Ḋ; Ḋ; Ḋ; Ḋ
test: Bailing out
CallStack (from HasCallStack):
  error, called at test/NormalizationTest.hs:92:17 in main:Main

Test suite test: FAIL

As a Hackage trustee, I have revised v0.1.0.1: https://hackage.haskell.org/package/unicode-transforms-0.1.0.1/revisions/

I just debugged an entirely nasty problem and eventually traced it to unicode-transforms-0.1.0.1. The cabal solver had selected it because it was the only version that worked with base-4.17 and text > 2.

Sorry about that, @parsonsmatt. This was the first version of my first Haskell package! I missed proper bounds. Thanks for the revision, @sjakobi; it will save debugging time for others.

It assumes UTF-16 encoding. We will have to put UTF-8 handling in place before bumping to text 2.0.

Text-2.0 seems to be supported already via 6418a41 by @Bodigrim. @sjakobi, does the test failure occur only for the old version, or for the latest version as well?

No worries, @harendra-kumar! 😄 I think I've still spent more time dealing with restrictive upper bounds than lax upper bounds.

@sjakobi, does the test failure occur only for the old version, or for the latest version as well?

In the latest v0.4.0.1, tests are passing both with text >= 2 and text < 2.
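For a quick local sanity check without running the whole suite, something like the sketch below should do. It assumes a recent unicode-transforms whose `Data.Text.Normalize` module exports `normalize` and the `NFD` mode; `codePoints` is a helper made up here. The expected value is the one the test log above reports for 1E0A.

```haskell
import qualified Data.Text as T
import Data.Text.Normalize (NormalizationMode (NFD), normalize)

-- Hypothetical helper: code points of a Text, to compare with the hex
-- values printed in the test log.
codePoints :: T.Text -> [Int]
codePoints = map fromEnum . T.unpack

main :: IO ()
main = do
  let input = T.singleton '\x1E0A'   -- LATIN CAPITAL LETTER D WITH DOT ABOVE
      expected = [0x0044, 0x0307]    -- "Expected: 0044 0307" in the log
      actual = codePoints (normalize NFD input)
  putStrLn $
    if actual == expected
      then "NFD ok"
      else "NFD mismatch: " ++ show actual
```

Running it once with `--constraint 'text >= 2'` and once with `--constraint 'text < 2'` exercises both representations.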